fix(db): serialise schema migrations with an advisory lock + stagger drain crons
All checks were successful
G12 Leak-Guard / leak-guard (pull_request) Successful in 6s

legal-halacha-drain crashed 29× with asyncpg DeadlockDetectedError. Root cause:
every short-lived cron drain re-runs the idempotent schema migrations on startup
(get_pool → _run_schema_migrations), and three jobs (metadata-drain, halacha-drain,
halacha-supervisor) all fired on the same minute (*/15 / top-of-hour). Two
processes running the DDL concurrently took AccessExclusiveLock in opposite order
→ Postgres killed one with a deadlock.

Two-layer fix:
- Root cause: wrap _run_schema_migrations in a session-level pg_advisory_lock so
  only one process applies DDL at a time; concurrent migrators wait instead of
  deadlocking. DDL body extracted to _apply_schema_ddl. Idempotent, schema
  unchanged.
- Defence-in-depth: give each cron drain a distinct firing minute —
  metadata :00, supervisor :05, halacha-drain :10, digest :12, court-fetch :17 —
  so siblings no longer start at the same instant. SCRIPTS.md updated to match.

Invariants: G1 (fix at source — the single migration path — not the symptom);
G2 (no parallel control path introduced).

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
This commit is contained in:
2026-06-13 08:08:19 +00:00
parent 387cd37255
commit 49acde591e
5 changed files with 35 additions and 8 deletions

View File

@@ -36,7 +36,10 @@
* (Israel hours, default 23/5) · HALACHA_DRAIN_TZ.
*/
// UTC band covering Israel 23:0005:00 across DST; script trims to exact window.
const cron = process.env.HALACHA_DRAIN_CRON || "0 20,21,22,23,0,1,2,3 * * *";
// Fires at minute :10 (not :00) so it never shares a firing minute with
// legal-metadata-drain (:00) or legal-halacha-supervisor (:05) — avoids the
// schema-migration DDL deadlock when sibling drains start at the same instant.
const cron = process.env.HALACHA_DRAIN_CRON || "10 20,21,22,23,0,1,2,3 * * *";
module.exports = {
apps: [