legal-halacha-drain crashed 29× with asyncpg DeadlockDetectedError.

Root cause: every short-lived cron drain re-runs the idempotent schema migrations on startup (get_pool → _run_schema_migrations), and three jobs (metadata-drain, halacha-drain, halacha-supervisor) all fired on the same minute (*/15 / top-of-hour). Two processes running the DDL concurrently took AccessExclusiveLock in opposite order → Postgres killed one with a deadlock.

Two-layer fix:
- Root cause: wrap _run_schema_migrations in a session-level pg_advisory_lock so only one process applies DDL at a time; concurrent migrators wait instead of deadlocking. DDL body extracted to _apply_schema_ddl. Idempotent, schema unchanged.
- Defence-in-depth: give each cron drain a distinct firing minute (metadata :00, supervisor :05, halacha-drain :10, digest :12, court-fetch :17) so siblings no longer start at the same instant. SCRIPTS.md updated to match.

Invariants: G1 (fix at source, i.e. the single migration path, not the symptom); G2 (no parallel control path introduced).

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
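The advisory-lock pattern named in the commit message can be illustrated roughly as follows. The real fix lives in the Python migration path (asyncpg) and is not shown on this page, so this is only a JavaScript sketch of the same idea. `withMigrationLock`, `MIGRATION_LOCK_KEY` and the `client.query(sql, params)` shape are assumptions made for illustration; only the Postgres functions `pg_advisory_lock` and `pg_advisory_unlock` are real.

```javascript
// Hypothetical lock key: any fixed integer that every migrator agrees on.
const MIGRATION_LOCK_KEY = 727001;

// `client` is an assumed object exposing async query(sql, params) on ONE
// dedicated connection; session-level advisory locks belong to the session,
// so lock, DDL and unlock must all go through the same connection.
async function withMigrationLock(client, applyDdl) {
  // Blocks until no other session holds the key, so concurrent migrators
  // queue up here instead of interleaving their DDL.
  await client.query("SELECT pg_advisory_lock($1)", [MIGRATION_LOCK_KEY]);
  try {
    return await applyDdl(client);
  } finally {
    // Release even when the DDL throws, so waiting migrators can proceed.
    await client.query("SELECT pg_advisory_unlock($1)", [MIGRATION_LOCK_KEY]);
  }
}
```

Because the second migrator waits on the advisory lock before it touches any table, it never holds one AccessExclusiveLock while requesting another, which is the condition the deadlock needed.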
49 lines
2.3 KiB
JavaScript
/**
 * pm2 ecosystem entry for legal-halacha-supervisor: the permanent health
 * manager for the halacha-extraction drain (legal-halacha-drain).
 *
 * Runs ONE health tick per cron fire (~every 15 min) and takes ZERO Claude quota
 * itself (only the drain calls Opus); it just reads the DB/logs/pm2 and pokes the
 * existing drain via the established run-now mechanism. Each tick:
 *   • re-triggers the one-shot drain when idle and the queue is non-empty
 *   • restarts a HUNG run (online but no new chunk-checkpoint > 25 min; the
 *     out-log only updates when a whole CASE finishes, so mtime is not liveness)
 *   • backs off on rate-limit (claude_session 429) until the CLI's parsed reset
 *   • verifies crash-safe per-chunk staging is committing (nothing lost)
 *
 * BURST (manual "run continuously now" window): source of truth is
 * drain_controls.burst_until in the DB, the SAME value the /operations page
 * reads/writes (G1 single source; G2 no parallel control path). While it is in
 * the future the supervisor LIFTS the drain's night-window; otherwise the drain
 * keeps its normal 23:00–05:00 IDT window. Burst auto-expires at its deadline.
 * Manual front-ends: the /operations toggle, or:
 *   mcp-server/.venv/bin/python scripts/halacha_drain_supervisor.py burst-on
 *   mcp-server/.venv/bin/python scripts/halacha_drain_supervisor.py burst-off
 *
 * Pattern (mirrors legal-halacha-drain): cron_restart fires the script;
 * autorestart:false → one-shot per tick (pm2 shows "stopped" between ticks).
 *
 * Install (once):
 *   pm2 start /home/chaim/legal-ai/scripts/legal-halacha-supervisor.config.cjs
 *   pm2 save
 */
// Staggered to minute :05 of the */15 cycle (:05,:20,:35,:50) so it never shares
// a firing minute with legal-metadata-drain (:00) or legal-halacha-drain (:10);
// this avoids the schema-migration DDL deadlock when sibling drains start together.
const cron = process.env.HALACHA_SUPERVISOR_CRON || "5-59/15 * * * *";

module.exports = {
  apps: [
    {
      name: "legal-halacha-supervisor",
      cwd: "/home/chaim/legal-ai",
      script: "/home/chaim/legal-ai/mcp-server/.venv/bin/python",
      args: "scripts/halacha_drain_supervisor.py",
      env: { HOME: "/home/chaim", PYTHONUNBUFFERED: "1" },
      autorestart: false, // one-shot per cron tick
      cron_restart: cron,
      max_memory_restart: "300M",
    },
  ],
};
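The staggering claim in the comment above the `cron` constant can be checked mechanically. The following is a minimal sketch, not part of the repository: `expandMinuteField` and `sharedMinutes` are hypothetical helpers, and the sibling expressions `*/15` (legal-metadata-drain) and `10-59/15` (legal-halacha-drain) are assumptions inferred from the `:00` and `:10` offsets mentioned in the comment; only `5-59/15` appears in this file.

```javascript
// Expand one cron minute field ("5-59/15", "*/15", "0", "12,42") into the
// sorted list of minutes (0-59) on which it fires.
function expandMinuteField(field) {
  const minutes = new Set();
  for (const part of field.split(",")) {
    const [range, stepText] = part.split("/");
    const step = stepText === undefined ? 1 : Number(stepText);
    let start;
    let end;
    if (range === "*") {
      start = 0;
      end = 59;
    } else if (range.includes("-")) {
      [start, end] = range.split("-").map(Number);
    } else {
      start = Number(range);
      // A bare "N" is that minute only; "N/step" is read here as N..59.
      end = stepText === undefined ? start : 59;
    }
    for (let m = start; m <= end; m += step) minutes.add(m);
  }
  return [...minutes].sort((a, b) => a - b);
}

// Minutes on which two minute fields both fire (empty means no collision).
function sharedMinutes(fieldA, fieldB) {
  const b = new Set(expandMinuteField(fieldB));
  return expandMinuteField(fieldA).filter((m) => b.has(m));
}

console.log(expandMinuteField("5-59/15"));       // [ 5, 20, 35, 50 ]
console.log(sharedMinutes("5-59/15", "*/15"));     // []
console.log(sharedMinutes("5-59/15", "10-59/15")); // []
```

A check like this is worth re-running whenever `HALACHA_SUPERVISOR_CRON` is overridden, since an override can silently put the supervisor back on a sibling's minute.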