feat(halacha): crash-safe incremental extraction + resume (A + resume)
Halacha extraction held ALL chunk results in memory and stored once at the very end — a crash/interrupt mid-run (e.g. the 2026-05-31 freeze) lost everything and re-paid the full LLM cost on retry. Now each chunk's halachot are stored AND the chunk is checkpointed (precedent_chunks.halacha_extracted_at) the moment it finishes: - V25 schema: precedent_chunks.halacha_extracted_at (per-chunk checkpoint). - db.store_halachot_for_chunk: atomic per-chunk insert (halacha_index continues from MAX, caller serializes via an in-process store-lock) + checkpoint mark. - db.reset_halacha_extraction (force) / mark_all_chunks_extracted (legacy backfill). - _extract_impl rewritten: resume by default (skip checkpointed chunks; failed chunks stay pending and are retried; status stays 'processing' until all done); force=True wipes + redoes all. reextract_halachot passes force=True; the queue drain (process_pending) resumes by default. - Legacy guard: a pre-V25 precedent (halachot exist, no checkpoints) is backfilled and treated as complete — never re-extracted (would duplicate). Verified on 9002-24 (55 halachot, legacy): resume → legacy-backfill, NO duplication (stays 55), all chunks checkpointed. Index continuation: store at 55,56 after max 54, no collision. Tracks #72. Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
This commit is contained in:
@@ -156,7 +156,10 @@ async def reextract_halachot(
|
||||
# bad data. See note in db.request_metadata_extraction.
|
||||
|
||||
await progress("extracting_halachot", 50, "מחלץ הלכות מחדש")
|
||||
result = await halacha_extractor.extract(case_law_id)
|
||||
# Explicit re-extraction = clean slate (force): wipe prior halachot +
|
||||
# per-chunk checkpoints and redo all. (Queue draining / resume uses the
|
||||
# default force=False so an interrupted run continues where it stopped.)
|
||||
result = await halacha_extractor.extract(case_law_id, force=True)
|
||||
# Clear the queue timestamp on completion so the UI badge / worker queue
|
||||
# don't keep showing this row. The queue worker (process_pending_extractions)
|
||||
# already does this; mirror it here so per-record extraction drains too.
|
||||
|
||||
Reference in New Issue
Block a user