feat(nevo): backfill leaked preamble + ratio gold-set benchmark (#86)

#86.2 backfill + #86.3 benchmark, plus a #86.1 over-strip fix found en route.

extractor.py
- extract_nevo_ratio(): capture Nevo's מיני-רציו (mini-ratio) block, the editorial holdings summary, before it is stripped. It serves as a free professional gold-set (#86.3).
- _DECISION_START hardening (#86.2): the merged #86.1 regex over-stripped.
  (a) פסק-דין (judgment) headers are markdown-wrapped (**פסק דין**); the old anchor required the keyword as the first character on the line with one separator, so it missed the header and matched a citation 32K deep (עמ"נ 50567-07-21, losing 45% of the body). It now tolerates leading markdown + 0-3 separators, and the final-nun form (דין ן vs דינו נ).
  (b) bare השופט/הנשיא (Justice/President) matched CITATIONS ("השופט מ' חשין, פסקה 23"). The authoring-judge line ends with a colon; we now require it.

ingest.py
- capture the ratio before stripping and store it on the row (best-effort, non-fatal); also strip the text-upload path (stripping was file-only).

db.py
- add case_law.nevo_ratio column (additive); allow it in update_case_law.

scripts/backfill_nevo_preamble.py (#86.2)
- dry-run-by-default data migration: finds historically-leaked rulings, captures ratio→nevo_ratio, rewrites full_text (+content_hash), reindexes, and FLAGS (never deletes) halachot whose quote lives in the removed preamble (review_status=pending_review + nevo_preamble_leak flag).
- Safety guard: rows with keep% < --min-keep (60) are excluded from --apply as suspected over-strip.
- --apply writes backup+manifest to data/audit/ first.
- Chair-gated: NOT applied here.

scripts/nevo_ratio_benchmark.py (#86.3)
- LLM-as-judge (local claude_session, zero cost) measures recall/precision/granularity of our halachot vs the Nevo ratio.
- Works pre- and post-backfill (reads nevo_ratio, falls back to full_text).

Verified:
- pytest tests/test_nevo_preamble.py: 12 passed (incl. citation/markdown over-strip regressions).
- backfill dry-run: 19 leaked rulings, 27 contaminated halachot, all ≥75% keep (the 32K over-strip is gone).
- benchmark on בג"ץ 1764/05: recall=0.875 precision=1.0 granularity=1.75x.

Invariants: G1 (normalize at source: strip/capture at ingest, not at read); no silent swallow (contaminated halachot flagged + reported, not dropped); data-migration is dry-run-default with backup+manifest, chair-gated.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
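
The two anchoring rules described for the _DECISION_START hardening can be illustrated with a minimal sketch. This is not the project's actual regex: the names HEADER_RE, JUDGE_LINE_RE and find_decision_start are hypothetical, "0-3 separators" is assumed to mean spaces, hyphens or maqaf between the two words of the header, the final-nun variant is not covered, and the judge name in the usage note is made up.

```python
import re
from typing import Optional

# Hypothetical patterns for illustration only; not the real _DECISION_START.
# Header rule: the line holds nothing but the keyword, optionally wrapped in
# markdown markers, with 0-3 separators (space, tab, hyphen, maqaf U+05BE)
# between the two words.
HEADER_RE = re.compile(
    r"^[ \t]*[*_#>]*[ \t]*פסק[ \t\-\u05be]{0,3}דין[*_]*[ \t]*$",
    re.MULTILINE,
)

# Judge rule: the line starts with the title and must END with a colon, so a
# citation such as "השופט מ' חשין, פסקה 23" is rejected.
JUDGE_LINE_RE = re.compile(
    r"^[ \t]*[*_#>]*[ \t]*(?:השופט|הנשיא)[^\n]*:[ \t]*[*_]*[ \t]*$",
    re.MULTILINE,
)


def find_decision_start(text: str) -> Optional[int]:
    """Return the offset of the earliest header or judge line, or None."""
    matches = (HEADER_RE.search(text), JUDGE_LINE_RE.search(text))
    offsets = [m.start() for m in matches if m is not None]
    return min(offsets) if offsets else None
```

With a body such as "מיני-רציו:\n* ...\n\n**פסק דין**\n\nהשופט א' כהן:\n...", the function returns the offset of the markdown-wrapped header, while a line that merely cites a judge without a trailing colon yields no match. Requiring the whole line to be the header is what keeps a mid-sentence citation deep in the body from being picked as the anchor.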
2026-06-06 19:45:43 +00:00
parent 12bdec10fa
commit fb51a0e869
7 changed files with 552 additions and 8 deletions
--- a/mcp-server/src/legal_mcp/services/db.py
+++ b/mcp-server/src/legal_mcp/services/db.py
@@ -619,6 +619,12 @@ ALTER TABLE case_law ADD COLUMN IF NOT EXISTS practice_area TEXT DEFAULT '';
 ALTER TABLE case_law ADD COLUMN IF NOT EXISTS appeal_subtype TEXT DEFAULT '';
 ALTER TABLE case_law ADD COLUMN IF NOT EXISTS headnote TEXT DEFAULT '';
    -- chair-editable abstract shown in search results.
+ALTER TABLE case_law ADD COLUMN IF NOT EXISTS nevo_ratio TEXT DEFAULT '';
+    -- The Nevo editorial מיני-רציו block, captured at ingest *before* it is
+    -- stripped from the body (#86.3). Kept separate from `headnote` (which is
+    -- our own abstract) so it can serve as a free professional gold-set for
+    -- benchmarking halacha-extraction recall/precision. Empty when the source
+    -- is not a Nevo export or carries no mini-ratio.
 ALTER TABLE case_law ADD COLUMN IF NOT EXISTS source_type TEXT DEFAULT '';
    -- 'court_ruling' | 'appeals_committee'

@@ -3263,7 +3269,7 @@ async def update_case_law(case_law_id: UUID, **fields) -> dict | None:
    """
    allowed = {
        "case_number", "case_name", "court", "date", "practice_area", "appeal_subtype",
-        "subject_tags", "summary", "headnote", "key_quote", "source_url",
+        "subject_tags", "summary", "headnote", "nevo_ratio", "key_quote", "source_url",
        "source_type", "precedent_level", "is_binding", "district", "chair_name",
        "proceeding_type", "citation_formatted",
    }