legal-ai

ezer-mishpati/legal-ai

Fork 0

Commit Graph

Author	SHA1	Message	Date
Chaim	49acde591e	fix(db): serialise schema migrations with an advisory lock + stagger drain crons All checks were successful G12 Leak-Guard / leak-guard (pull_request) Successful in 6s Details legal-halacha-drain crashed 29× with asyncpg DeadlockDetectedError. Root cause: every short-lived cron drain re-runs the idempotent schema migrations on startup (get_pool → _run_schema_migrations), and three jobs (metadata-drain, halacha-drain, halacha-supervisor) all fired on the same minute (*/15 / top-of-hour). Two processes running the DDL concurrently took AccessExclusiveLock in opposite order → Postgres killed one with a deadlock. Two-layer fix: - Root cause: wrap _run_schema_migrations in a session-level pg_advisory_lock so only one process applies DDL at a time; concurrent migrators wait instead of deadlocking. DDL body extracted to _apply_schema_ddl. Idempotent, schema unchanged. - Defence-in-depth: give each cron drain a distinct firing minute — metadata :00, supervisor :05, halacha-drain :10, digest :12, court-fetch :17 — so siblings no longer start at the same instant. SCRIPTS.md updated to match. Invariants: G1 (fix at source — the single migration path — not the symptom); G2 (no parallel control path introduced). Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>	2026-06-13 08:08:19 +00:00
Chaim	c7c402e7ef	feat(operations): manual burst control for the halacha drain + permanent supervisor All checks were successful G12 Leak-Guard / leak-guard (pull_request) Successful in 6s Details The halacha-extraction backlog needs to be worked off the chair's leftover weekly Claude quota on demand. This adds a MANUAL, time-boxed "burst" — run the drain continuously now until a chosen deadline (default the upcoming Saturday 18:00 IL), managed interactively from /operations — plus the permanent health-supervisor that enforces it. Backend (this PR; deploys via Coolify + host pm2): - db: drain_controls.burst_until (SCHEMA_V37) + set_drain_burst/get_drain_burst/ get_drain_bursts. Single source of truth shared by the container-side /operations API and the host-side supervisor. - web: POST /api/operations/drains/{name}/burst (on→until\|next-Sat-18:00, off→NULL), and burst_until surfaced per-service in the /operations snapshot. - scripts/halacha_drain_supervisor.py + legal-halacha-supervisor.config.cjs: pm2 cron (*/15, zero Claude quota) — re-triggers idle drain, restarts a HUNG run (liveness = per-chunk checkpoints, NOT log mtime), backs off on 429 until the parsed reset (fresh-gated), verifies crash-safe staging. Reads burst_until from the DB; burst auto-expires at the deadline (never bleeds into a fresh week). UI (separate follow-up PR, after Claude Design approval): the /operations toggle + date-picker that calls the burst endpoint. Invariants: G1 (normalize at source — burst lives once in the DB, read by both surfaces), G2 (no parallel control path — CAPTURE field on the existing drain_controls + orchestrates the existing drain, not a new one), G12 (no Paperclip touch), §6 (no silent error-swallow — burst-clear failure is surfaced as a note).	2026-06-12 11:11:13 +00:00

Author

SHA1

Message

Date

Chaim

49acde591e

fix(db): serialise schema migrations with an advisory lock + stagger drain crons

G12 Leak-Guard / leak-guard (pull_request) Successful in 6s

Details

legal-halacha-drain crashed 29× with asyncpg DeadlockDetectedError. Root cause:
every short-lived cron drain re-runs the idempotent schema migrations on startup
(get_pool → _run_schema_migrations), and three jobs (metadata-drain, halacha-drain,
halacha-supervisor) all fired on the same minute (*/15 / top-of-hour). Two
processes running the DDL concurrently took AccessExclusiveLock in opposite order
→ Postgres killed one with a deadlock.

Two-layer fix:
- Root cause: wrap _run_schema_migrations in a session-level pg_advisory_lock so
  only one process applies DDL at a time; concurrent migrators wait instead of
  deadlocking. DDL body extracted to _apply_schema_ddl. Idempotent, schema
  unchanged.
- Defence-in-depth: give each cron drain a distinct firing minute —
  metadata :00, supervisor :05, halacha-drain :10, digest :12, court-fetch :17 —
  so siblings no longer start at the same instant. SCRIPTS.md updated to match.

Invariants: G1 (fix at source — the single migration path — not the symptom);
G2 (no parallel control path introduced).

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>

2026-06-13 08:08:19 +00:00

Chaim

c7c402e7ef

feat(operations): manual burst control for the halacha drain + permanent supervisor

G12 Leak-Guard / leak-guard (pull_request) Successful in 6s

Details

The halacha-extraction backlog needs to be worked off the chair's leftover weekly
Claude quota on demand. This adds a MANUAL, time-boxed "burst" — run the drain
continuously now until a chosen deadline (default the upcoming Saturday 18:00 IL),
managed interactively from /operations — plus the permanent health-supervisor that
enforces it.

Backend (this PR; deploys via Coolify + host pm2):
- db: drain_controls.burst_until (SCHEMA_V37) + set_drain_burst/get_drain_burst/
  get_drain_bursts. Single source of truth shared by the container-side /operations
  API and the host-side supervisor.
- web: POST /api/operations/drains/{name}/burst (on→until|next-Sat-18:00, off→NULL),
  and burst_until surfaced per-service in the /operations snapshot.
- scripts/halacha_drain_supervisor.py + legal-halacha-supervisor.config.cjs: pm2 cron
  (*/15, zero Claude quota) — re-triggers idle drain, restarts a HUNG run (liveness =
  per-chunk checkpoints, NOT log mtime), backs off on 429 until the parsed reset
  (fresh-gated), verifies crash-safe staging. Reads burst_until from the DB; burst
  auto-expires at the deadline (never bleeds into a fresh week).

UI (separate follow-up PR, after Claude Design approval): the /operations toggle +
date-picker that calls the burst endpoint.

Invariants: G1 (normalize at source — burst lives once in the DB, read by both
surfaces), G2 (no parallel control path — CAPTURE field on the existing
drain_controls + orchestrates the existing drain, not a new one), G12 (no Paperclip
touch), §6 (no silent error-swallow — burst-clear failure is surfaced as a note).

2026-06-12 11:11:13 +00:00

2 Commits