feat(halacha): UNIQUE(case_law_id, halacha_index) backstop + task tracking (#83)

#83 pipeline robustness — the index-numbering correctness guarantee:
- Add CREATE UNIQUE INDEX idx_halachot_unique_index ON halachot(case_law_id,
  halacha_index). The extractor assigns the index as MAX+1 under an in-process
  store-lock + a cross-process pg advisory lock, so collisions shouldn't occur
  in normal operation — but per the research (FireHydrant/OneUptime) the
  constraint is the actual correctness guarantee while the lock is the
  optimization. A racing/double run now fails LOUDLY (UniqueViolation, chunk
  left un-checkpointed → clean resume) instead of silently appending the
  duplicates that were the 2026-05/06 over-extraction root cause.
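
A minimal sketch of why the constraint, rather than the lock, is the backstop. It is not the project's extractor: SQLite stands in for Postgres, the table is reduced to a few columns, and `make_db`, `next_index`, `insert_halacha` and `simulate_race` are hypothetical names. Starting the index at 1 is also an assumption.

```python
import sqlite3


def make_db() -> sqlite3.Connection:
    """In-memory stand-in for the halachot table (columns reduced, hypothetical)."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE halachot ("
        " id INTEGER PRIMARY KEY,"
        " case_law_id INTEGER NOT NULL,"
        " halacha_index INTEGER NOT NULL,"
        " principle TEXT NOT NULL)"
    )
    # Same index definition as in the schema change.
    conn.execute(
        "CREATE UNIQUE INDEX IF NOT EXISTS idx_halachot_unique_index "
        "ON halachot(case_law_id, halacha_index)"
    )
    return conn


def next_index(conn: sqlite3.Connection, case_law_id: int) -> int:
    """MAX+1 assignment; assumes numbering starts at 1 for an empty precedent."""
    row = conn.execute(
        "SELECT COALESCE(MAX(halacha_index), 0) + 1 FROM halachot "
        "WHERE case_law_id = ?",
        (case_law_id,),
    ).fetchone()
    return row[0]


def insert_halacha(conn: sqlite3.Connection, case_law_id: int,
                   halacha_index: int, principle: str) -> None:
    conn.execute(
        "INSERT INTO halachot (case_law_id, halacha_index, principle) "
        "VALUES (?, ?, ?)",
        (case_law_id, halacha_index, principle),
    )


def simulate_race(conn: sqlite3.Connection, case_law_id: int) -> bool:
    """Two writers compute MAX+1 before either inserts (i.e. the lock failed).

    Returns True if the second insert is rejected by the unique index.
    """
    first = next_index(conn, case_law_id)
    second = next_index(conn, case_law_id)  # stale read: same value as `first`
    insert_halacha(conn, case_law_id, first, "first writer")
    try:
        insert_halacha(conn, case_law_id, second, "second writer")
    except sqlite3.IntegrityError:
        return True  # loud failure instead of a silent duplicate
    return False
```

In Postgres with asyncpg the equivalent failure surfaces as a unique-violation error rather than `sqlite3.IntegrityError`; the point of the sketch is only that the duplicate row never lands.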

Data prep (run against the live DB before adding the constraint, backed up to
data/audit/halacha-reindex-backup-*.sql): the 6 precedents that still carried
colliding halacha_index values (9 groups of distinct principles that shared a
number, NOT content duplicates) were renumbered to unique sequential indices.
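
The data prep can be sketched roughly as follows. This is an illustration under stated assumptions, not the script that was actually run: SQLite stands in for Postgres, the `id` column, the sample rows, and the helper names `demo_db`, `colliding_groups` and `renumber` are hypothetical, and ordering by the existing index with `id` as tie-breaker is an assumed renumbering policy.

```python
import sqlite3


def demo_db() -> sqlite3.Connection:
    """Toy data: precedent 1 has two distinct principles sharing index 2."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE halachot ("
        " id INTEGER PRIMARY KEY,"
        " case_law_id INTEGER NOT NULL,"
        " halacha_index INTEGER NOT NULL,"
        " principle TEXT NOT NULL)"
    )
    conn.executemany(
        "INSERT INTO halachot (case_law_id, halacha_index, principle) "
        "VALUES (?, ?, ?)",
        [
            (1, 1, "principle A"),
            (1, 2, "principle B"),
            (1, 2, "principle C"),  # collides with B, but different content
            (1, 3, "principle D"),
            (2, 1, "principle E"),
            (2, 2, "principle F"),
        ],
    )
    conn.commit()
    return conn


def colliding_groups(conn: sqlite3.Connection) -> list:
    """Return (case_law_id, halacha_index, row_count) for every shared index."""
    return conn.execute(
        "SELECT case_law_id, halacha_index, COUNT(*) FROM halachot "
        "GROUP BY case_law_id, halacha_index HAVING COUNT(*) > 1 "
        "ORDER BY case_law_id, halacha_index"
    ).fetchall()


def renumber(conn: sqlite3.Connection, case_law_id: int) -> None:
    """Assign 1..N to one precedent's rows, keeping their existing order."""
    rows = conn.execute(
        "SELECT id FROM halachot WHERE case_law_id = ? "
        "ORDER BY halacha_index, id",
        (case_law_id,),
    ).fetchall()
    with conn:  # one transaction: all rows renumbered, or none
        for new_index, (row_id,) in enumerate(rows, start=1):
            conn.execute(
                "UPDATE halachot SET halacha_index = ? WHERE id = ?",
                (new_index, row_id),
            )
```

Once `colliding_groups` returns an empty list, the unique index can be created without error.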

Verified: advisory lock holds cross-process and the DB path is direct asyncpg
(no transaction-pooler), so the session lock is safe (83.1); force=True does
delete+checkpoint-clear in one transaction (83.5); constraint rejects a
duplicate-index insert (integration-checked). Full suite 156 passed.

Also commits the TaskMaster tracking for the whole halacha-quality initiative
(#81-#84 + research-backed subtasks, statuses).

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
2026-06-03 13:06:58 +00:00
parent 8e3d14abee
commit 0f64b4c062
2 changed files with 380 additions and 3 deletions


@@ -676,6 +676,15 @@ CREATE INDEX IF NOT EXISTS idx_halachot_practice ON halachot USING gin(practice_
CREATE INDEX IF NOT EXISTS idx_halachot_tags ON halachot USING gin(subject_tags);
CREATE INDEX IF NOT EXISTS idx_halachot_vec
ON halachot USING ivfflat (embedding vector_cosine_ops) WITH (lists = 50);
-- #83: halacha_index must be unique per precedent. The extractor assigns it as
-- MAX(halacha_index)+1 under an in-process store-lock + a cross-process advisory
-- lock, so collisions shouldn't occur — but per FireHydrant/OneUptime the
-- constraint is the actual correctness guarantee (the lock is the optimization).
-- A racing/double run now fails LOUDLY instead of silently appending duplicates
-- (the 2026-05/06 over-extraction root cause). Requires clean data first (see
-- scripts: the 6 colliding precedents were renumbered 2026-06-03).
CREATE UNIQUE INDEX IF NOT EXISTS idx_halachot_unique_index
ON halachot(case_law_id, halacha_index);
"""