fix(halacha): re-extraction preserves chair-approved halachot (INV-G10, #108)

Data-loss fix: reset_halacha_extraction ran an unconditional DELETE before re-extraction.
A crash between that DELETE and the first store wiped every chair approval and left the record
stuck at status='processing' with 0 rows (Amiel incident, 8126-03-25, 2026-06-08).

The DELETE now excludes rows with review_status IN ('approved','published'), so a human
approval is never silently deleted (INV-G10). The dedup-on-insert in store_halachot_for_chunk
skips a freshly extracted halacha that duplicates a preserved approved one, so no duplicates
appear. reset now returns {deleted, preserved}, and the extractor logs how many approved
halachot were preserved (provenance, G9).
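A minimal, runnable sketch of the two mechanisms described above (selective reset plus dedup-on-insert), using an in-memory SQLite table instead of PostgreSQL/asyncpg. The schema (`case_law_id`, `text`, `review_status`), the helper names `reset`, `store_dedup` and `make_db`, and the exact-text duplicate check are all hypothetical; the real criteria used by `store_halachot_for_chunk` are not shown in this commit.

```python
import sqlite3

# Review states that a re-extraction must never delete (per INV-G10).
PROTECTED = ("approved", "published")


def make_db() -> sqlite3.Connection:
    """Hypothetical minimal schema, seeded with one case's halachot."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE halachot (case_law_id INTEGER, text TEXT, review_status TEXT)")
    conn.executemany(
        "INSERT INTO halachot VALUES (?, ?, ?)",
        [
            (1, "rule A", "approved"),
            (1, "rule B", "pending"),
            (1, "rule C", "published"),
            (1, "rule E", None),  # NULL status: see the note after this block
        ],
    )
    return conn


def reset(conn: sqlite3.Connection, case_id: int) -> dict:
    """Delete only un-reviewed rows; count (and keep) the protected ones."""
    preserved = conn.execute(
        "SELECT COUNT(*) FROM halachot WHERE case_law_id = ? AND review_status IN (?, ?)",
        (case_id, *PROTECTED),
    ).fetchone()[0]
    cur = conn.execute(
        "DELETE FROM halachot WHERE case_law_id = ? AND review_status NOT IN (?, ?)",
        (case_id, *PROTECTED),
    )
    return {"deleted": cur.rowcount, "preserved": preserved}


def store_dedup(conn: sqlite3.Connection, case_id: int, texts: list[str]) -> int:
    """Insert fresh extractions, skipping any whose text already exists (assumed dedup key)."""
    inserted = 0
    for text in texts:
        if conn.execute(
            "SELECT 1 FROM halachot WHERE case_law_id = ? AND text = ?", (case_id, text)
        ).fetchone():
            continue  # duplicate of a preserved halacha: keep the reviewed row
        conn.execute("INSERT INTO halachot VALUES (?, ?, 'pending')", (case_id, text))
        inserted += 1
    return inserted


if __name__ == "__main__":
    db = make_db()
    print(reset(db, 1))  # {'deleted': 1, 'preserved': 2}
    print(store_dedup(db, 1, ["rule A", "rule B", "rule D"]))  # 2
```

The sketch also surfaces an SQL detail worth checking against the real schema: under three-valued logic, `NULL NOT IN (...)` evaluates to NULL rather than true, so a row whose `review_status` is NULL ("rule E") is neither deleted nor counted as preserved. PostgreSQL behaves the same way. If the real column is nullable (the schema is not part of this diff), a predicate such as `review_status IS NULL OR review_status NOT IN (...)` would be needed for NULL rows to be treated as un-reviewed.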

Full resilience against process death (OOM) is deferred to X16/#114 (durable resume); this change is a prerequisite for it.

Testing: test_halacha_reextract_preserves_approved.py (offline SQL capture) verifies that the
DELETE excludes approved/published rows; the 64 existing halacha tests pass.
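The offline SQL-capture approach can be sketched roughly as follows: a fake pool and connection record every statement instead of talking to PostgreSQL, and the check asserts on the captured SQL text. `CapturingConn`, `CapturingPool`, `check_delete_excludes_approved`, and the inline copy of the function (which takes the pool as a parameter) are hypothetical; the real test file's structure isn't shown here, and in the project one would more likely monkeypatch the module's `get_pool`.

```python
import asyncio
from contextlib import asynccontextmanager


class CapturingConn:
    """Hypothetical stand-in for an asyncpg connection: records SQL, runs nothing."""

    def __init__(self) -> None:
        self.statements: list[str] = []

    @asynccontextmanager
    async def transaction(self):
        yield self  # no real transaction; the body simply runs

    async def execute(self, query: str, *args) -> str:
        self.statements.append(query)
        return f"{query.split()[0].upper()} 0"  # mimic a command tag such as "DELETE 0"

    async def fetchval(self, query: str, *args):
        self.statements.append(query)
        return 0


class CapturingPool:
    def __init__(self) -> None:
        self.conn = CapturingConn()

    @asynccontextmanager
    async def acquire(self):
        yield self.conn


async def reset_halacha_extraction(pool: CapturingPool, case_law_id) -> dict:
    """Copy of the committed function's SQL, taking the pool as a parameter for the test."""
    async with pool.acquire() as conn:
        async with conn.transaction():
            preserved = await conn.fetchval(
                "SELECT COUNT(*) FROM halachot WHERE case_law_id = $1 "
                "AND review_status IN ('approved', 'published')", case_law_id,
            )
            tag = await conn.execute(
                "DELETE FROM halachot WHERE case_law_id = $1 "
                "AND review_status NOT IN ('approved', 'published')", case_law_id,
            )
            await conn.execute(
                "UPDATE precedent_chunks SET halacha_extracted_at = NULL "
                "WHERE case_law_id = $1", case_law_id,
            )
    try:
        deleted = int(str(tag).split()[-1])
    except (ValueError, IndexError):
        deleted = 0
    return {"deleted": deleted, "preserved": int(preserved or 0)}


def check_delete_excludes_approved() -> None:
    """Run the reset against the capturing pool and inspect every DELETE it issued."""
    pool = CapturingPool()
    result = asyncio.run(reset_halacha_extraction(pool, "case-1"))
    deletes = [s for s in pool.conn.statements if s.lstrip().upper().startswith("DELETE")]
    assert deletes, "expected at least one DELETE"
    for sql in deletes:
        assert "NOT IN ('approved', 'published')" in sql, sql
    assert result == {"deleted": 0, "preserved": 0}


if __name__ == "__main__":
    check_delete_excludes_approved()
    print("ok")
```

Asserting on SQL text is brittle to reformatting, but it needs no database, which fits a fast offline test; behavioral coverage of the same rule would need a real or embedded database.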

Invariants: G10 (chair gate: an approval is never deleted), G1 (fix at the source), G9 (provenance).

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
2026-06-10 09:08:16 +00:00
parent 81171983e4
commit 26e0219219
5 changed files with 173 additions and 11 deletions


@@ -4157,17 +4157,44 @@ async def store_halachot(case_law_id: UUID, halachot: list[dict]) -> int:
     return len(halachot)
 
 
-async def reset_halacha_extraction(case_law_id: UUID) -> None:
-    """Force a clean re-extraction: wipe halachot + clear per-chunk checkpoints
-    so every chunk is re-processed (used by explicit re-extract, not resume)."""
+async def reset_halacha_extraction(case_law_id: UUID) -> dict:
+    """Prepare a clean re-extraction WITHOUT destroying chair-approved work.
+
+    Deletes only un-reviewed halachot (``review_status NOT IN ('approved',
+    'published')``) and clears per-chunk checkpoints so every chunk is
+    re-processed. Chair-approved / published halachot are PRESERVED — INV-G10:
+    a human approval is never silently deleted by a re-extraction. The
+    re-extractor's dedup-on-insert (:func:`store_halachot_for_chunk`) skips any
+    freshly extracted halacha that duplicates a preserved one, so approvals
+    survive without producing duplicates.
+
+    History: this once wiped ALL halachot first, then re-extracted — a crash
+    between the wipe and the first chunk's store lost every approval and left
+    the row stuck ``status='processing'`` with 0 rows (the 2026-06-08 amiel
+    incident, TaskMaster #108). Durable resume of the whole pipeline is X16/#114.
+
+    Returns ``{"deleted": N, "preserved": M}``.
+    """
     pool = await get_pool()
     async with pool.acquire() as conn:
         async with conn.transaction():
-            await conn.execute("DELETE FROM halachot WHERE case_law_id = $1", case_law_id)
+            preserved = await conn.fetchval(
+                "SELECT COUNT(*) FROM halachot WHERE case_law_id = $1 "
+                "AND review_status IN ('approved', 'published')", case_law_id,
+            )
+            tag = await conn.execute(
+                "DELETE FROM halachot WHERE case_law_id = $1 "
+                "AND review_status NOT IN ('approved', 'published')", case_law_id,
+            )
             await conn.execute(
                 "UPDATE precedent_chunks SET halacha_extracted_at = NULL "
                 "WHERE case_law_id = $1", case_law_id,
             )
+    try:
+        deleted = int(str(tag).split()[-1])
+    except (ValueError, IndexError):
+        deleted = 0
+    return {"deleted": deleted, "preserved": int(preserved or 0)}
 
 
 async def mark_all_chunks_extracted(case_law_id: UUID) -> int:
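The `deleted` count comes from the status string that asyncpg's `Connection.execute` returns, which is PostgreSQL's command tag (for example `DELETE 3`). A small standalone version of the same parsing, with a few tags worked through, shows why the last token is used and why the `except` branch exists. The helper name `rows_from_tag` is hypothetical.

```python
def rows_from_tag(tag: object) -> int:
    """Row count from a PostgreSQL command tag such as 'DELETE 3'.

    For INSERT the tag is 'INSERT <oid> <rows>', so the row count is always the
    last token. Tags with no trailing number (e.g. 'CREATE TABLE') or an empty
    string fall back to 0, mirroring the try/except in the diff.
    """
    try:
        return int(str(tag).split()[-1])
    except (ValueError, IndexError):
        return 0


if __name__ == "__main__":
    for tag in ["DELETE 3", "INSERT 0 5", "UPDATE 0", "CREATE TABLE", ""]:
        print(repr(tag), "->", rows_from_tag(tag))
    # 'DELETE 3' -> 3
    # 'INSERT 0 5' -> 5
    # 'UPDATE 0' -> 0
    # 'CREATE TABLE' -> 0
    # '' -> 0
```

Falling back to 0 keeps an unexpected tag from turning a successful reset into an exception, at the cost of under-reporting `deleted` in the provenance log.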