feat(halacha): UNIQUE(case_law_id, halacha_index) backstop + task tracking (#83)
#83 pipeline robustness: the index-numbering correctness guarantee.

- Add CREATE UNIQUE INDEX idx_halachot_unique_index ON halachot(case_law_id, halacha_index). The extractor assigns the index as MAX+1 under an in-process store-lock plus a cross-process pg advisory lock, so collisions shouldn't occur in normal operation. Per the research (FireHydrant/OneUptime), however, the constraint is the actual correctness guarantee, while the lock is the optimization. A racing/double run now fails LOUDLY (UniqueViolation, chunk left un-checkpointed, so the resume is clean) instead of silently appending the duplicates that were the 2026-05/06 over-extraction root cause.

Data prep (run against the live DB before adding the constraint, backed up to data/audit/halacha-reindex-backup-*.sql): the 6 precedents that still carried colliding halacha_index values (9 groups of distinct principles that shared a number, NOT content dups) were renumbered to unique sequential indices.

Verified:
- The advisory lock holds cross-process, and the DB path is direct asyncpg (no transaction-pooler), so the session lock is safe (83.1).
- force=True does delete + checkpoint-clear in one transaction (83.5).
- The constraint rejects a duplicate-index insert (integration-checked).

Full suite: 156 passed.

Also commits the TaskMaster tracking for the whole halacha-quality initiative (#81-#84 plus research-backed subtasks and statuses).

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
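The data-prep step described in the message (find the groups that share a halacha_index, then renumber the affected precedents) could look roughly like the sketch below. It is not the script that was actually run. It uses the standard-library sqlite3 module as a stand-in for the live PostgreSQL database, and the `id` and `principle` columns, the 1-based numbering, and the tie-break by `id` are all assumptions made for illustration.

```python
import sqlite3


def find_collisions(conn):
    """Return (case_law_id, halacha_index, count) for every index shared by 2+ rows."""
    return conn.execute(
        """
        SELECT case_law_id, halacha_index, COUNT(*) AS n
        FROM halachot
        GROUP BY case_law_id, halacha_index
        HAVING COUNT(*) > 1
        ORDER BY case_law_id, halacha_index
        """
    ).fetchall()


def renumber(conn, case_law_id):
    """Give one precedent's rows the indices 1..N, keeping their current order.

    Ties on halacha_index are broken by the (hypothetical) id column.
    """
    rows = conn.execute(
        "SELECT id FROM halachot WHERE case_law_id = ? ORDER BY halacha_index, id",
        (case_law_id,),
    ).fetchall()
    with conn:  # one transaction per precedent: all rows renumbered or none
        for new_index, (row_id,) in enumerate(rows, start=1):
            conn.execute(
                "UPDATE halachot SET halacha_index = ? WHERE id = ?",
                (new_index, row_id),
            )


# Hypothetical sample data: precedent 1 has two distinct principles sharing index 2.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE halachot ("
    "id INTEGER PRIMARY KEY, case_law_id INTEGER, halacha_index INTEGER, principle TEXT)"
)
conn.executemany(
    "INSERT INTO halachot (case_law_id, halacha_index, principle) VALUES (?, ?, ?)",
    [(1, 1, "a"), (1, 2, "b"), (1, 2, "c"), (2, 1, "d")],
)
conn.commit()

collisions_before = find_collisions(conn)
for case_law_id in sorted({row[0] for row in collisions_before}):
    renumber(conn, case_law_id)
collisions_after = find_collisions(conn)
indices_case_1 = [
    r[0]
    for r in conn.execute(
        "SELECT halacha_index FROM halachot WHERE case_law_id = 1 ORDER BY id"
    )
]
```

Only precedents that actually contain a collision are touched, so unaffected precedents keep their existing numbering. Running this kind of renumbering before the unique index exists avoids transient conflicts while rows are being shifted.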
@@ -676,6 +676,15 @@ CREATE INDEX IF NOT EXISTS idx_halachot_practice ON halachot USING gin(practice_
CREATE INDEX IF NOT EXISTS idx_halachot_tags ON halachot USING gin(subject_tags);
CREATE INDEX IF NOT EXISTS idx_halachot_vec
ON halachot USING ivfflat (embedding vector_cosine_ops) WITH (lists = 50);
-- #83: halacha_index must be unique per precedent. The extractor assigns it as
-- MAX(halacha_index)+1 under an in-process store-lock + a cross-process advisory
-- lock, so collisions shouldn't occur — but per FireHydrant/OneUptime the
-- constraint is the actual correctness guarantee (the lock is the optimization).
-- A racing/double run now fails LOUDLY instead of silently appending duplicates
-- (the 2026-05/06 over-extraction root cause). Requires clean data first (see
-- scripts: the 6 colliding precedents were renumbered 2026-06-03).
CREATE UNIQUE INDEX IF NOT EXISTS idx_halachot_unique_index
ON halachot(case_law_id, halacha_index);
"""
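To illustrate why the unique index is the backstop and the lock is only the optimization, here is a minimal sketch of a MAX+1 race with no lock at all. It uses the standard-library sqlite3 module as a stand-in for PostgreSQL/asyncpg so that it runs anywhere. The helper names `next_index` and `insert_halacha`, the `principle` column, and the 1-based starting index are hypothetical and are not taken from the extractor.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE halachot ("
    "case_law_id INTEGER NOT NULL, halacha_index INTEGER NOT NULL, principle TEXT)"
)
# Same index definition as in the diff above.
conn.execute(
    "CREATE UNIQUE INDEX IF NOT EXISTS idx_halachot_unique_index "
    "ON halachot(case_law_id, halacha_index)"
)


def next_index(conn, case_law_id):
    """MAX(halacha_index)+1 for one precedent; 1 if it has no rows yet (assumed)."""
    (current_max,) = conn.execute(
        "SELECT COALESCE(MAX(halacha_index), 0) FROM halachot WHERE case_law_id = ?",
        (case_law_id,),
    ).fetchone()
    return current_max + 1


def insert_halacha(conn, case_law_id, index, principle):
    """Insert one row in its own transaction; a failure rolls back and re-raises."""
    with conn:
        conn.execute(
            "INSERT INTO halachot VALUES (?, ?, ?)",
            (case_law_id, index, principle),
        )


# Simulated race: both runs read MAX before either one writes.
stale_a = next_index(conn, 7)
stale_b = next_index(conn, 7)

insert_halacha(conn, 7, stale_a, "first run")
try:
    insert_halacha(conn, 7, stale_b, "second run")
    rejected = False
except sqlite3.IntegrityError:
    # The duplicate (case_law_id, halacha_index) pair is refused by the index.
    rejected = True

row_count = conn.execute("SELECT COUNT(*) FROM halachot").fetchone()[0]
```

The second insert raises instead of landing as a silent duplicate, and the table still holds a single row. On PostgreSQL the equivalent failure is a unique-violation error (SQLSTATE 23505), which is the loud failure mode the commit message describes.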