fix(operations): disabling the halacha drain now stops a running process immediately

The /operations "disabled" toggle only wrote drain_controls.disabled, which the
drain checks only at startup. A drain already mid-run therefore kept going until
the queue emptied or the night window closed.

Three layers (one immediate stop, two backstops):
- web/app.py operations_drain_toggle: on disable, also stop the running process
  immediately via the host pm2 bridge (_ops_pm2_control). Best-effort — a bridge
  failure doesn't fail the toggle.
- halacha_drain_supervisor.py: each tick now reads the disabled flag (added to
  db_snapshot) and, when it is set, stops the drain and never re-triggers it,
  regardless of burst/window. Backstop if the UI path failed; takes effect
  within one tick.
- drain_halacha_queue.py: re-check is_drain_disabled at the top of every round,
  so a drain disabled mid-run halts at the next round boundary. Per-chunk
  checkpoints mean the in-flight case loses nothing.
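
The two backstops above can be sketched roughly as follows. This is a minimal,
self-contained illustration and not the project's code: FakeControls,
run_drain, supervisor_tick, the "should_run" key and the process name
"legal-example-drain" are hypothetical stand-ins. Only the name
is_drain_disabled and the idea of a disabled flag in the supervisor's snapshot
come from the description above.

```python
from dataclasses import dataclass, field


@dataclass
class FakeControls:
    """In-memory stand-in for the drain_controls table (hypothetical)."""
    disabled: set = field(default_factory=set)

    def is_drain_disabled(self, name: str) -> bool:
        return name in self.disabled


def run_drain(name, queue, controls, round_size=2, on_round=None):
    """Process `queue` in rounds, re-checking the flag before every round.

    Items not yet processed stay in `queue`, which stands in for the
    checkpoint a later run would resume from.
    """
    processed = []
    while queue:
        if controls.is_drain_disabled(name):
            break  # disabled mid-run: halt at the round boundary
        batch, queue[:] = queue[:round_size], queue[round_size:]
        processed.extend(batch)
        if on_round:
            on_round(len(processed))  # hook used here to flip the flag mid-run
    return processed


def supervisor_tick(name, snapshot, stop, trigger):
    """One supervisor tick: a set flag wins over any burst/window decision."""
    if snapshot.get("disabled"):
        stop(name)
        return "stopped"
    if snapshot.get("should_run"):  # hypothetical burst/window outcome
        trigger(name)
        return "triggered"
    return "idle"
```

In this sketch, disabling the drain after its first round leaves the remaining
items in the queue untouched, and a tick that sees the flag stops the process
even when the window logic would otherwise have triggered it.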

SCRIPTS.md updated for both drain and supervisor.

Invariants: G1 (fix at source — the disable control honoured along every path,
not just at startup); G2 (no parallel control path — same drain_controls flag).

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
2026-06-13 09:03:07 +00:00
parent 72f81734f1
commit a44827c3dd
4 changed files with 43 additions and 7 deletions


@@ -6709,14 +6709,24 @@ async def operations_service_action(name: str, action: str):
 async def operations_drain_toggle(name: str, body: dict = Body(...)):
     """Switch a cron drain on/off (the 'startup type' in the services panel).
-    Written straight to drain_controls — no host roundtrip; the drain reads the
-    flag at startup and no-ops when disabled (pm2 cron_restart can't be trusted
-    to stay stopped)."""
+    Written to drain_controls so the drain no-ops at its NEXT startup (pm2
+    cron_restart can't be trusted to stay stopped). On disable we ALSO stop any
+    currently-running process immediately via the host pm2 bridge — the DB flag
+    alone wouldn't halt a drain mid-run. Best-effort: a bridge failure doesn't
+    fail the toggle (the supervisor stops it on its next tick as a backstop)."""
     if not name.startswith("legal-"):
         raise HTTPException(403, "ניתן לשלוט רק בשירותי legal-*")
     disabled = bool(body.get("disabled"))
     await db.set_drain_disabled(name, disabled)
-    return {"ok": True, "name": name, "disabled": disabled}
+    stopped = None
+    if disabled:
+        try:
+            await _ops_pm2_control(name, "stop")
+            stopped = True
+        except Exception as e:
+            logger.warning("disable %s: immediate pm2 stop failed: %s", name, e)
+            stopped = False
+    return {"ok": True, "name": name, "disabled": disabled, "stopped": stopped}
 def _next_saturday_18_il() -> datetime:
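
The new "stopped" field in the response is effectively tri-state: None when the
drain was enabled (no stop attempted), True when the immediate pm2 stop
succeeded, and False when it failed and the supervisor backstop is relied on.
A caller could interpret it along these lines; describe_toggle is a
hypothetical helper for illustration and is not part of this commit.

```python
def describe_toggle(resp: dict) -> str:
    """Turn the toggle response into a short human-readable status."""
    if not resp["disabled"]:
        return "enabled"
    stopped = resp.get("stopped")
    if stopped is True:
        return "disabled; running process stopped"
    if stopped is False:
        return "disabled; immediate stop failed, supervisor will stop it"
    # Field missing or None: e.g. a response from before this change.
    return "disabled"
```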