legal-ai/scripts/halacha_drain_supervisor.py at 013fe39ea75e63169e958980f929b1ebe63a10a4

Files

G12 Leak-Guard / leak-guard (pull_request) Successful in 8s

Details

fix(supervisor): re-probe claude.ai quota instead of waiting blindly for the reported reset

When the halacha drain hit a 429, the supervisor recorded the reset time the
error reported (e.g. "resets 6:50pm UTC") and then HELD until that timestamp,
re-reading it from its own state every tick without ever checking whether quota
had actually returned. claude.ai usually frees up quota earlier than the message
claims, so the drain sat idle for hours after it could have resumed — and only a
manual kick (clear cooldown + trigger) got it going again.

Now, on any tick where we'd otherwise hold on a cooldown, run a cheap live probe
(`quota_available()` → a tiny `claude -p` call, cost ~0) and resume the instant
it succeeds — at most one probe per 15-min tick, only while we believe we're
limited. Conservative on failure (non-zero exit / timeout / limit message →
stay held), so a flaky probe never resumes the drain into a real 429. Adds a
claude_bin() resolver so the probe works under pm2 cron where PATH is bare.

Invariants: G1 (resume decision driven by actual quota state, not a guessed
timestamp); no new control path.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>

2026-06-13 09:35:45 +00:00

22 KiB

Raw Blame History

View Raw

22 KiB Raw Blame History

22 KiB

Raw Blame History