fix(db): serialise schema migrations with an advisory lock + stagger drain crons
All checks were successful
G12 Leak-Guard / leak-guard (pull_request) Successful in 6s

legal-halacha-drain crashed 29× with asyncpg DeadlockDetectedError. Root cause:
every short-lived cron drain re-runs the idempotent schema migrations on startup
(get_pool → _run_schema_migrations), and three jobs (metadata-drain, halacha-drain,
halacha-supervisor) all fired on the same minute (*/15 / top-of-hour). Two
processes running the DDL concurrently took AccessExclusiveLock in opposite order
→ Postgres killed one with a deadlock.

Two-layer fix:
- Root cause: wrap _run_schema_migrations in a session-level pg_advisory_lock so
  only one process applies DDL at a time; concurrent migrators wait instead of
  deadlocking. DDL body extracted to _apply_schema_ddl. Idempotent, schema
  unchanged.
- Defence-in-depth: give each cron drain a distinct firing minute —
  metadata :00, supervisor :05, halacha-drain :10, digest :12, court-fetch :17 —
  so siblings no longer start at the same instant. SCRIPTS.md updated to match.

Invariants: G1 (fix at source — the single migration path — not the symptom);
G2 (no parallel control path introduced).

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
This commit is contained in:
2026-06-13 08:08:19 +00:00
parent 387cd37255
commit 49acde591e
5 changed files with 35 additions and 8 deletions

View File

@@ -27,7 +27,10 @@
* pm2 start /home/chaim/legal-ai/scripts/legal-halacha-supervisor.config.cjs
* pm2 save
*/
const cron = process.env.HALACHA_SUPERVISOR_CRON || "*/15 * * * *";
// Staggered to minute :05 of the */15 cycle (:05,:20,:35,:50) so it never shares
// a firing minute with legal-metadata-drain (:00) or legal-halacha-drain (:10) —
// avoids the schema-migration DDL deadlock when sibling drains start together.
const cron = process.env.HALACHA_SUPERVISOR_CRON || "5-59/15 * * * *";
module.exports = {
apps: [