feat(operations): manual burst control for the halacha drain + permanent supervisor
All checks were successful
G12 Leak-Guard / leak-guard (pull_request) Successful in 6s
The halacha-extraction backlog needs to be worked off on demand, using whatever is
left of the chair's weekly Claude quota. This adds a MANUAL, time-boxed "burst": run
the drain continuously from now until a chosen deadline (default: the upcoming
Saturday 18:00 IL), managed interactively from /operations. It also adds the
permanent health supervisor that enforces it.
Backend (this PR; deploys via Coolify + host pm2):
- db: drain_controls.burst_until (SCHEMA_V37) + set_drain_burst/get_drain_burst/
get_drain_bursts. Single source of truth shared by the container-side /operations
API and the host-side supervisor.
- web: POST /api/operations/drains/{name}/burst (on→until|next-Sat-18:00, off→NULL),
and burst_until surfaced per-service in the /operations snapshot.
- scripts/halacha_drain_supervisor.py + legal-halacha-supervisor.config.cjs: pm2 cron
(*/15, zero Claude quota) — re-triggers idle drain, restarts a HUNG run (liveness =
per-chunk checkpoints, NOT log mtime), backs off on 429 until the parsed reset
(fresh-gated), verifies crash-safe staging. Reads burst_until from the DB; burst
auto-expires at the deadline (never bleeds into a fresh week).
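
The supervisor behaviour described in the bullets above can be sketched as one pure decision function per tick. This is an illustration only, not the code in scripts/halacha_drain_supervisor.py: the `DrainState` fields, the action names, and the 30-minute hang threshold are all assumptions made for the example.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta, timezone
from typing import Optional


@dataclass
class DrainState:
    """Hypothetical snapshot gathered by the supervisor on each tick."""
    running: bool                            # is a drain run currently alive?
    last_checkpoint: datetime                # newest per-chunk checkpoint (or run start)
    rate_limited_until: Optional[datetime]   # parsed 429 reset time, if any
    burst_until: Optional[datetime]          # drain_controls.burst_until from the DB


def decide(state: DrainState, now: datetime,
           hang_after: timedelta = timedelta(minutes=30)) -> str:
    """Return one of 'skip', 'backoff', 'restart', 'ok', 'trigger'."""
    # No burst set, or the deadline has passed: the burst expires on its own.
    if state.burst_until is None or now >= state.burst_until:
        return "skip"
    # A 429 reset time still in the future means wait instead of retrying.
    if state.rate_limited_until is not None and now < state.rate_limited_until:
        return "backoff"
    if state.running:
        # Liveness is judged by checkpoint age, not by log file mtime.
        stale = now - state.last_checkpoint > hang_after
        return "restart" if stale else "ok"
    # Burst is active and the drain is idle: start it again.
    return "trigger"


if __name__ == "__main__":
    now = datetime.now(timezone.utc)
    idle = DrainState(running=False, last_checkpoint=now,
                      rate_limited_until=None,
                      burst_until=now + timedelta(hours=2))
    print(decide(idle, now))
```

Keeping the decision free of I/O means a cron-driven script can gather state, call the function, and act on the result, and each branch can be tested without pm2 or a database.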
UI (separate follow-up PR, after Claude Design approval): the /operations toggle +
date-picker that calls the burst endpoint.
Invariants: G1 (normalize at source — burst lives once in the DB, read by both
surfaces), G2 (no parallel control path — CAPTURE field on the existing
drain_controls + orchestrates the existing drain, not a new one), G12 (no Paperclip
touch), §6 (no silent error-swallow — burst-clear failure is surfaced as a note).
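
To illustrate the G1 point (one stored value, read by both the API and the supervisor), here is a minimal sketch using an in-memory SQLite table. The helper names are borrowed from the commit message, but the table layout, the use of SQLite, the `burst_active` helper, and the service name `legal-halacha-drain` are assumptions for the example, not the project's schema or implementation.

```python
import sqlite3
from datetime import datetime, timezone
from typing import Optional


def open_db() -> sqlite3.Connection:
    """Create a throwaway table shaped like a hypothetical drain_controls."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE drain_controls ("
        " name TEXT PRIMARY KEY,"
        " disabled INTEGER NOT NULL DEFAULT 0,"
        " burst_until TEXT)"  # UTC ISO-8601 string; NULL means no burst
    )
    return conn


def set_drain_burst(conn: sqlite3.Connection, name: str,
                    until: Optional[datetime]) -> None:
    """Store the deadline (timezone-aware) or clear it with None."""
    value = until.astimezone(timezone.utc).isoformat() if until is not None else None
    conn.execute(
        "INSERT INTO drain_controls (name, burst_until) VALUES (?, ?) "
        "ON CONFLICT(name) DO UPDATE SET burst_until = excluded.burst_until",
        (name, value),
    )


def get_drain_burst(conn: sqlite3.Connection, name: str) -> Optional[datetime]:
    """Read the deadline back; both surfaces would call this same helper."""
    row = conn.execute(
        "SELECT burst_until FROM drain_controls WHERE name = ?", (name,)
    ).fetchone()
    if row is None or row[0] is None:
        return None
    return datetime.fromisoformat(row[0])


def burst_active(conn: sqlite3.Connection, name: str, now: datetime) -> bool:
    """A burst counts only while its deadline is still in the future."""
    until = get_drain_burst(conn, name)
    return until is not None and now < until


if __name__ == "__main__":
    conn = open_db()
    set_drain_burst(conn, "legal-halacha-drain",
                    datetime(2030, 1, 5, 16, 0, tzinfo=timezone.utc))
    print(burst_active(conn, "legal-halacha-drain", datetime.now(timezone.utc)))
```

Because expiry is evaluated at read time against the stored deadline, nothing has to run at the deadline itself for the burst to stop counting.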
web/app.py (49 lines changed)
@@ -6637,8 +6637,10 @@ async def operations_snapshot():
 
     pm2 = await _ops_pm2_services()
     controls = await db.get_drain_controls()
+    bursts = await db.get_drain_bursts()
     for svc in pm2["services"]:
         svc["disabled"] = controls.get(svc.get("name", ""), False)
+        svc["burst_until"] = bursts.get(svc.get("name", ""))
 
     def _iso(rows: list[dict]) -> list[dict]:
         for d in rows:
@@ -6717,6 +6719,53 @@ async def operations_drain_toggle(name: str, body: dict = Body(...)):
     return {"ok": True, "name": name, "disabled": disabled}
 
 
+def _next_saturday_18_il() -> datetime:
+    """Upcoming Saturday 18:00 Israel time (DST-safe)."""
+    from datetime import timedelta
+    from zoneinfo import ZoneInfo
+    il = ZoneInfo("Asia/Jerusalem")
+    now = datetime.now(il)
+    days = (5 - now.weekday()) % 7  # Mon=0 .. Sat=5 .. Sun=6
+    cand = now.replace(hour=18, minute=0, second=0, microsecond=0) + timedelta(days=days)
+    if cand <= now:
+        cand += timedelta(days=7)
+    return cand
+
+
+@app.post("/api/operations/drains/{name}/burst")
+async def operations_drain_burst(name: str, body: dict = Body(...)):
+    """Start/stop a drain's MANUAL burst window (chair-controlled, from /operations).
+
+    ``action='on'`` → ``burst_until`` = body ``until`` (ISO) or the upcoming
+    Saturday 18:00 Israel time. ``action='off'`` → NULL. The host supervisor
+    (legal-halacha-supervisor) reads this from the DB and lifts/restores the
+    drain's window accordingly (takes effect within one supervisor tick, ≤15 min).
+    Never set automatically: manual only."""
+    if not name.startswith("legal-"):
+        raise HTTPException(403, "ניתן לשלוט רק בשירותי legal-*")  # "Only legal-* services can be controlled"
+    action = (body.get("action") or "").lower()
+    if action == "off":
+        await db.set_drain_burst(name, None)
+        return {"ok": True, "name": name, "burst_until": None}
+    if action == "on":
+        until = body.get("until")
+        if until:
+            try:
+                until_dt = datetime.fromisoformat(until)
+            except (ValueError, TypeError):
+                raise HTTPException(400, "until חייב להיות ISO-8601")  # "until must be ISO-8601"
+            if until_dt.tzinfo is None:
+                from zoneinfo import ZoneInfo
+                until_dt = until_dt.replace(tzinfo=ZoneInfo("Asia/Jerusalem"))
+        else:
+            until_dt = _next_saturday_18_il()
+        if until_dt <= datetime.now(timezone.utc):
+            raise HTTPException(400, "until חייב להיות בעתיד")  # "until must be in the future"
+        await db.set_drain_burst(name, until_dt)
+        return {"ok": True, "name": name, "burst_until": until_dt.isoformat()}
+    raise HTTPException(400, "action חייב להיות on|off")  # "action must be on|off"
+
+
 # ── Live agents (/operations "סוכנים פעילים", i.e. "Active agents") ──────────
 # What the pm2/queue panels can't show: WHICH agent is doing the work right now
 # and its live output. An agent-driven drain (e.g. the CEO heartbeat draining
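
The deadline logic in the hunk above reads the clock internally, which makes it awkward to exercise. The sketch below repeats the same arithmetic with `now` passed in, so the Saturday rollover and the naive-timestamp default can be checked in isolation. The names `next_saturday_18` and `resolve_until` are hypothetical and do not appear in the commit, and a plain `ValueError` stands in for the endpoint's HTTP 400 responses.

```python
from datetime import datetime, timedelta
from typing import Optional
from zoneinfo import ZoneInfo

IL = ZoneInfo("Asia/Jerusalem")


def next_saturday_18(now: datetime) -> datetime:
    """Upcoming Saturday 18:00 Israel time, relative to an aware `now`."""
    now = now.astimezone(IL)
    days = (5 - now.weekday()) % 7  # Mon=0 .. Sat=5 .. Sun=6
    cand = now.replace(hour=18, minute=0, second=0, microsecond=0) + timedelta(days=days)
    if cand <= now:
        # Already Saturday at or past 18:00: roll over to next week.
        cand += timedelta(days=7)
    return cand


def resolve_until(until: Optional[str], now: datetime) -> datetime:
    """Turn an optional ISO-8601 string into an aware, future deadline."""
    if until:
        until_dt = datetime.fromisoformat(until)  # ValueError on malformed input
        if until_dt.tzinfo is None:
            # Naive timestamps are interpreted as Israel local time.
            until_dt = until_dt.replace(tzinfo=IL)
    else:
        until_dt = next_saturday_18(now)
    if until_dt <= now:
        raise ValueError("until must be in the future")
    return until_dt


if __name__ == "__main__":
    wednesday = datetime(2025, 1, 1, 12, 0, tzinfo=IL)
    print(resolve_until(None, wednesday).isoformat())
    print(resolve_until("2025-01-02T09:00", wednesday).isoformat())
```

Adding a `timedelta` to a `zoneinfo`-aware datetime moves the wall clock, so the candidate stays at 18:00 local time even when a DST change falls between `now` and Saturday.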