All checks were successful
G12 Leak-Guard / leak-guard (pull_request) Successful in 8s
When the halacha drain hit a 429, the supervisor recorded the reset time the error reported (e.g. "resets 6:50pm UTC") and then HELD until that timestamp, re-reading it from its own state every tick without ever checking whether quota had actually returned. claude.ai usually frees up quota earlier than the message claims, so the drain sat idle for hours after it could have resumed — and only a manual kick (clear cooldown + trigger) got it going again. Now, on any tick where we'd otherwise hold on a cooldown, run a cheap live probe (`quota_available()` → a tiny `claude -p` call, cost ~0) and resume the instant it succeeds — at most one probe per 15-min tick, only while we believe we're limited. Conservative on failure (non-zero exit / timeout / limit message → stay held), so a flaky probe never resumes the drain into a real 429. Adds a claude_bin() resolver so the probe works under pm2 cron where PATH is bare. Invariants: G1 (resume decision driven by actual quota state, not a guessed timestamp); no new control path. Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
22 KiB
22 KiB