fix(halacha): re-extraction preserves chair-approved halachot (INV-G10, #108)

תיקון data-loss: reset_halacha_extraction ביצע DELETE ללא-תנאי לפני חילוץ-מחדש;
קריסה בין המחיקה לאחסון הראשון מחקה את כל אישורי-היו"ר והשאירה את הרשומה תקועה
status='processing' עם 0 שורות (תקרית עמיאל 8126-03-25, 2026-06-08).

עכשיו המחיקה מחריגה review_status IN ('approved','published') — אישור אנושי לא
נמחק בשקט (INV-G10). ה-dedup-on-insert של store_halachot_for_chunk מדלג על חילוץ
טרי שמשכפל מאושרת שנשמרה, כך שאין כפילות. reset מחזיר {deleted, preserved},
וה-extractor מתעד כמה מאושרות נשמרו (provenance, G9).

עמידות מלאה מול מוות-תהליך (OOM) נשארת ל-X16/#114 (durable resume) — זה תנאי-מקדים.

בדיקה: test_halacha_reextract_preserves_approved.py (offline SQL-capture) מאמת
שה-DELETE מחריג approved/published; 64 בדיקות-הלכה קיימות עוברות.

Invariants: G10 (שער-יו"ר — אישור לא נמחק), G1 (תיקון במקור), G9 (provenance).

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
This commit is contained in:
2026-06-10 09:08:16 +00:00
parent 81171983e4
commit 26e0219219
5 changed files with 173 additions and 11 deletions

View File

@@ -6,8 +6,10 @@ structured list of halachot, validates each one against the source text,
embeds the rule statement, and stores everything as ``pending_review`` in
the ``halachot`` table.
All extraction is idempotent — calling ``extract(case_law_id)`` twice
deletes prior rows for that precedent first.
All extraction is idempotent — calling ``extract(case_law_id, force=True)``
twice drops the precedent's un-reviewed rows and re-extracts. Chair-approved /
published halachot are PRESERVED across a re-extract (INV-G10); see
``db.reset_halacha_extraction``.
Trust model:
Per chair decision, NO halacha is auto-published. Every extracted
@@ -530,8 +532,20 @@ async def _extract_impl(case_law_id: UUID, force: bool = False,
return {"status": "no_chunks", "extracted": 0, "stored": 0}
# force = clean slate; otherwise resume (skip already-checkpointed chunks).
# "Clean slate" preserves chair-approved/published halachot (INV-G10) — only
# un-reviewed rows are dropped; the per-chunk dedup-on-insert skips fresh
# extractions that duplicate a preserved approval, so approvals survive a
# re-extract without duplicating. See db.reset_halacha_extraction / #108.
preserved_approved = 0
if force:
await db.reset_halacha_extraction(case_law_id)
reset = await db.reset_halacha_extraction(case_law_id)
preserved_approved = reset.get("preserved", 0)
if preserved_approved:
logger.info(
"halacha_extractor: case_law=%s force re-extract — preserved %d "
"approved/published halachot (INV-G10), dropped %d un-reviewed.",
case_law_id, preserved_approved, reset.get("deleted", 0),
)
for c in chunks:
c["halacha_extracted_at"] = None
@@ -686,5 +700,6 @@ async def _extract_impl(case_law_id: UUID, force: bool = False,
"folded": folded,
"stored": stored,
"stored_this_run": stored_total,
"preserved_approved": preserved_approved,
"total_chunks": len(chunks),
}