fix(operations): disabling the halacha drain now stops a running process immediately
All checks were successful
G12 Leak-Guard / leak-guard (pull_request) Successful in 6s
The /operations "disabled" toggle only wrote drain_controls.disabled, which the drain checks at STARTUP, so a drain already mid-run kept going until the queue emptied or the night window closed. Disabling did not stop a running drain.

Three layers, immediate + backstops:
- web/app.py operations_drain_toggle: on disable, also stop the running process immediately via the host pm2 bridge (_ops_pm2_control). Best-effort: a bridge failure doesn't fail the toggle.
- halacha_drain_supervisor.py: each tick now reads the disabled flag (added to db_snapshot) and, when set, stops the drain and never re-triggers it, regardless of burst/window. Backstop if the UI path failed (≤ one tick).
- drain_halacha_queue.py: re-check is_drain_disabled at the top of every round, so a drain disabled mid-run halts at the next round boundary. Per-chunk checkpoints mean the in-flight case loses nothing.

SCRIPTS.md updated for both drain and supervisor.

Invariants: G1 (fix at source: the disable control honoured along every path, not just at startup); G2 (no parallel control path: same drain_controls flag).

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
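The round-boundary re-check described for drain_halacha_queue.py is not part of this diff, so the following is only a minimal sketch of the idea under stated assumptions. `run_drain`, `is_disabled`, and `process_chunk` are hypothetical names, and the in-memory flag stands in for the real drain_controls lookup.

```python
import asyncio
from typing import Awaitable, Callable, List


async def run_drain(
    rounds: List[List[str]],
    is_disabled: Callable[[], Awaitable[bool]],
    process_chunk: Callable[[str], Awaitable[None]],
) -> int:
    """Process rounds of chunks; return how many chunks were completed."""
    done = 0
    for chunk_batch in rounds:
        # Re-check the flag at the top of every round, not only at startup.
        if await is_disabled():
            break
        for chunk in chunk_batch:
            # Assumption: each chunk is checkpointed once processed, so
            # stopping at a round boundary loses no finished work.
            await process_chunk(chunk)
            done += 1
    return done


async def _demo() -> int:
    state = {"disabled": False}  # hypothetical stand-in for the DB flag

    async def is_disabled() -> bool:
        return state["disabled"]

    async def process_chunk(chunk: str) -> None:
        if chunk == "b1":
            # Simulate an operator disabling the drain mid-round.
            state["disabled"] = True

    rounds = [["a1", "a2"], ["b1", "b2"], ["c1"]]
    return await run_drain(rounds, is_disabled, process_chunk)


demo_done = asyncio.run(_demo())
```

In this sketch the round in which the flag flips still finishes, and the drain halts before the next round starts, which is the "next round boundary" behaviour the message describes.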
18
web/app.py
@@ -6709,14 +6709,24 @@ async def operations_service_action(name: str, action: str):
 async def operations_drain_toggle(name: str, body: dict = Body(...)):
     """Switch a cron drain on/off (the 'startup type' in the services panel).
 
-    Written straight to drain_controls — no host roundtrip; the drain reads the
-    flag at startup and no-ops when disabled (pm2 cron_restart can't be trusted
-    to stay stopped)."""
+    Written to drain_controls so the drain no-ops at its NEXT startup (pm2
+    cron_restart can't be trusted to stay stopped). On disable we ALSO stop any
+    currently-running process immediately via the host pm2 bridge — the DB flag
+    alone wouldn't halt a drain mid-run. Best-effort: a bridge failure doesn't
+    fail the toggle (the supervisor stops it on its next tick as a backstop)."""
     if not name.startswith("legal-"):
         raise HTTPException(403, "ניתן לשלוט רק בשירותי legal-*")
     disabled = bool(body.get("disabled"))
     await db.set_drain_disabled(name, disabled)
-    return {"ok": True, "name": name, "disabled": disabled}
+    stopped = None
+    if disabled:
+        try:
+            await _ops_pm2_control(name, "stop")
+            stopped = True
+        except Exception as e:
+            logger.warning("disable %s: immediate pm2 stop failed: %s", name, e)
+            stopped = False
+    return {"ok": True, "name": name, "disabled": disabled, "stopped": stopped}
 
 
 def _next_saturday_18_il() -> datetime:
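The best-effort behaviour of the changed handler can be exercised outside the web app. This is a hedged sketch rather than the project's code: `toggle_drain`, `set_flag`, and `pm2_stop` are hypothetical stand-ins for the handler, `db.set_drain_disabled`, and `_ops_pm2_control`, whose real signatures are only partly visible in the diff.

```python
import asyncio
import logging
from typing import Awaitable, Callable, Dict, Optional

logger = logging.getLogger("ops-sketch")


async def toggle_drain(
    name: str,
    disabled: bool,
    set_flag: Callable[[str, bool], Awaitable[None]],
    pm2_stop: Callable[[str], Awaitable[None]],
) -> dict:
    # Persist the flag first so the backstops see it even if the stop fails.
    await set_flag(name, disabled)
    stopped: Optional[bool] = None  # None means no stop was attempted
    if disabled:
        try:
            await pm2_stop(name)
            stopped = True
        except Exception as exc:
            # Best-effort: log and report, but do not fail the toggle.
            logger.warning("disable %s: immediate stop failed: %s", name, exc)
            stopped = False
    return {"ok": True, "name": name, "disabled": disabled, "stopped": stopped}


async def _demo():
    flags: Dict[str, bool] = {}  # in-memory stand-in for drain_controls

    async def set_flag(name: str, disabled: bool) -> None:
        flags[name] = disabled

    async def ok_stop(name: str) -> None:
        return None

    async def broken_stop(name: str) -> None:
        raise RuntimeError("bridge unreachable")

    enabled = await toggle_drain("legal-demo", False, set_flag, ok_stop)
    stopped = await toggle_drain("legal-demo", True, set_flag, ok_stop)
    failed = await toggle_drain("legal-demo", True, set_flag, broken_stop)
    return enabled, stopped, failed, flags


enabled_result, stopped_result, failed_result, final_flags = asyncio.run(_demo())
```

The three-valued `stopped` field lets a caller tell "nothing to stop" (`None`) apart from a stop that succeeded (`True`) or one that failed and was left to the backstops (`False`).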