fix(operations): disabling the halacha drain now stops a running process immediately
All checks were successful
G12 Leak-Guard / leak-guard (pull_request) Successful in 6s

The /operations "disabled" toggle only wrote drain_controls.disabled, which the
drain checks at STARTUP — so a drain already mid-run kept going until the queue
emptied or the night window closed. Disabling did not stop a running drain.

Three layers, immediate + backstops:
- web/app.py operations_drain_toggle: on disable, also stop the running process
  immediately via the host pm2 bridge (_ops_pm2_control). Best-effort — a bridge
  failure doesn't fail the toggle.
- halacha_drain_supervisor.py: each tick now reads the disabled flag (added to
  db_snapshot) and, when set, stops the drain and never re-triggers it —
  regardless of burst/window. Backstop if the UI path failed (≤ one tick).
- drain_halacha_queue.py: re-check is_drain_disabled at the top of every round,
  so a drain disabled mid-run halts at the next round boundary. Per-chunk
  checkpoints mean the in-flight case loses nothing.

SCRIPTS.md updated for both drain and supervisor.

Invariants: G1 (fix at source — the disable control honoured along every path,
not just at startup); G2 (no parallel control path — same drain_controls flag).

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
This commit is contained in:
2026-06-13 09:03:07 +00:00
parent 72f81734f1
commit a44827c3dd
4 changed files with 43 additions and 7 deletions

View File

@@ -98,7 +98,8 @@ def db_snapshot() -> dict:
" ck=await c.fetchval('SELECT count(*) FROM precedent_chunks WHERE halacha_extracted_at IS NOT NULL')\n"
" pend_rev=await c.fetchval(\"SELECT count(*) FROM halachot WHERE review_status='pending_review'\")\n"
" bu=await c.fetchval(\"SELECT burst_until FROM drain_controls WHERE name='legal-halacha-drain'\")\n"
" print(json.dumps({'status_counts':st,'processing_cases':procs,'halachot_total':hal,'checkpointed_chunks':ck,'pending_review':pend_rev,'burst_until':bu.isoformat() if bu else None}))\n"
" dis=await c.fetchval(\"SELECT disabled FROM drain_controls WHERE name='legal-halacha-drain'\")\n"
" print(json.dumps({'status_counts':st,'processing_cases':procs,'halachot_total':hal,'checkpointed_chunks':ck,'pending_review':pend_rev,'burst_until':bu.isoformat() if bu else None,'disabled':bool(dis)}))\n"
"asyncio.run(m())\n"
)
return json.loads(_venv_py(code).splitlines()[-1])
@@ -255,6 +256,23 @@ def tick():
"mode": "db_error", "next_wake_sec": 900}, ensure_ascii=False))
return
# /operations "disabled" switch — highest priority. The drain self-guards at
# startup, but a process mid-run wouldn't notice the flag, and the cron keeps
# firing it; so when disabled we stop any running drain and never re-trigger,
# regardless of burst/window. (The UI toggle also stops it immediately via the
# bridge; this is the backstop + the "don't re-ignite a disabled drain" gate.)
if snap.get("disabled"):
stopped = stop_drain()
save_state({**prev, "tick_at": now.isoformat(), "mode": "disabled",
"action": "stopped-disabled" if stopped else "already-stopped"})
print(f"🛑 {now.astimezone(IDT):%H:%M:%S IDT} | מצב: disabled | "
f"מסומן לא-פעיל ב-/operations — הדריינר {'נעצר' if stopped else 'כבר עצור'}.")
print("JSON:" + json.dumps(
{"ok": True, "mode": "disabled",
"action": "stopped-disabled" if stopped else "already-stopped",
"next_wake_sec": 900}, ensure_ascii=False))
return
# burst state from the DB + auto-expiry (manual on; auto-off at the deadline)
burst_until = datetime.fromisoformat(snap["burst_until"]) if snap.get("burst_until") else None
if burst_until and now >= burst_until: