1340bff6f1a2932d91fd73c7ed5182e534a3a3b7
PR #251 made the OAuth usage endpoint the PRIMARY rate-limit signal and the log 429 only a fallback for when the endpoint is unreachable. Observed 2026-06-15: the endpoint reported the window <100% (available) while the claude CLI kept 429-ing ("session limit"). The supervisor then read 'rate_limited=false', classified the drain 'hung', and restart-churned it — RE-EXTRACTING already-completed precedents under the rate limit and DEGRADING them (e.g. 4624/21 lost halachot 3→1, only 4/18 chunks). delta_done went negative (completed cases reverting). Fix: a FRESH CLI 429 is ground truth — the call is literally failing. • ENTER cooldown on EITHER signal (endpoint-exhausted OR fresh 429), so a 429 overrides an endpoint that wrongly reports the window available. • VETO the early resume while a fresh 429 remains (the endpoint can lie "available" mid-storm → without the veto we'd bounce straight back to churn). • DEFAULT_COOLDOWN_MIN=30 when a fresh 429 has no parseable reset time. While limited the drain STOPS (no 429-hammering, no degrading completed cases) and re-ignites only once quota is back AND no fresh 429 remains. Tested: 8 unit-tests over the decision matrix (endpoint×429×stored-cooldown), incl. the exact tonight case and the veto. py_compile clean. Immediate mitigation already applied out-of-band: drain stopped + disabled (drain_controls.disabled) to halt the degradation until this deploys. Invariants: G1 (fix at source — trust the failing call, not a lagging endpoint), G2 (same cooldown path, no parallel control). Builds on PR #251. Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
Description
AI Legal Decision Drafting System — MCP server, web upload, RAG search
Languages
Python
64.1%
TypeScript
33.7%
JavaScript
1.2%
Shell
0.7%
CSS
0.2%