PR #251 made the OAuth usage endpoint the PRIMARY rate-limit signal and the log
429 only a fallback for when the endpoint is unreachable. Observed 2026-06-15: the
endpoint reported the window <100% (available) while the claude CLI kept 429-ing
("session limit"). The supervisor then read 'rate_limited=false', classified the
drain 'hung', and restart-churned it — RE-EXTRACTING already-completed precedents
under the rate limit and DEGRADING them (e.g. 4624/21 lost halachot 3→1, only
4/18 chunks). delta_done went negative (completed cases reverting).
Fix: a FRESH CLI 429 is ground truth — the call is literally failing.
• ENTER cooldown on EITHER signal (endpoint-exhausted OR fresh 429), so a 429
overrides an endpoint that wrongly reports the window available.
• VETO the early resume while a fresh 429 remains (the endpoint can lie
"available" mid-storm → without the veto we'd bounce straight back to churn).
• DEFAULT_COOLDOWN_MIN=30 when a fresh 429 has no parseable reset time.
While limited the drain STOPS (no 429-hammering, no degrading completed cases) and
re-ignites only once quota is back AND no fresh 429 remains.
Tested: 8 unit-tests over the decision matrix (endpoint×429×stored-cooldown),
incl. the exact tonight case and the veto. py_compile clean.
Immediate mitigation already applied out-of-band: drain stopped + disabled
(drain_controls.disabled) to halt the degradation until this deploys.
Invariants: G1 (fix at source — trust the failing call, not a lagging endpoint),
G2 (same cooldown path, no parallel control). Builds on PR #251.
Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>