feat(#99 / T10): get_style_guide shows corpus-measured golden ratios alongside the target

style_distance.measure_corpus_ratios(): splits every decision in style_corpus into sections (chunker) and computes the mean percentage per section, both as an "_all" aggregate and per outcome (when available). The result is cached. get_style_guide shows a "measured in practice" line, with ⚠️ on any deviation from the target range. Current state: style_corpus.outcome is not populated, so the all-decisions aggregate is shown (n=48: background 26.4% / claims 9.7% / discussion 43.8% / summary 20.1%); the per-outcome split is future-ready. The measurement also exposes the limits of section detection (the intent of T10: flag gaps for review). Partly overlaps T7, which measures adherence per draft; this one measures the corpus. Measurement failures are surfaced, not swallowed.
Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
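The measurement described above boils down to averaging per-decision section proportions. A minimal sketch, assuming each decision has already been split into named sections and that a section's share is measured in characters; `section_shares`, `mean_section_shares` and `out_of_range` are hypothetical helpers, not the actual `measure_corpus_ratios` code, which may weigh or count differently.

```python
from collections import defaultdict


def section_shares(sections: dict[str, str]) -> dict[str, float]:
    """Percentage of one decision's characters that falls in each section."""
    total = sum(len(text) for text in sections.values())
    if total == 0:
        return {}
    return {name: 100.0 * len(text) / total for name, text in sections.items()}


def mean_section_shares(decisions: list[dict[str, str]]) -> dict[str, float]:
    """Unweighted mean of the per-decision percentages (one vote per decision).

    A section missing from a decision contributes 0% for that decision.
    """
    sums: dict[str, float] = defaultdict(float)
    counted = 0
    for sections in decisions:
        shares = section_shares(sections)
        if not shares:
            continue  # an empty decision would otherwise divide by zero
        counted += 1
        for name, pct in shares.items():
            sums[name] += pct
    if not counted:
        return {}
    return {name: round(total / counted, 1) for name, total in sums.items()}


def out_of_range(measured: dict[str, float],
                 targets: dict[str, tuple[float, float]]) -> list[str]:
    """Sections whose measured share falls outside their (low, high) target band."""
    return [name for name, (low, high) in targets.items()
            if name in measured and not low <= measured[name] <= high]


corpus = [
    {"background": "a" * 30, "claims": "b" * 10, "discussion": "c" * 40, "summary": "d" * 20},
    {"background": "a" * 20, "claims": "b" * 10, "discussion": "c" * 50, "summary": "d" * 20},
]
measured = mean_section_shares(corpus)
print(measured)
print(out_of_range(measured, {"claims": (10.0, 15.0), "discussion": (35.0, 40.0)}))
```

Averaging per-decision percentages, rather than pooling all characters, keeps one very long decision from dominating the corpus figure.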
fix(#88+#87): automatic DB↔file sync + claims_coverage distinguishes the notice of appeal from correspondence
2026-06-06 21:01:42 +00:00 · 2026-06-06 20:54:31 +00:00 · 2026-06-06 20:44:41 +00:00 · 2026-06-06 20:08:54 +00:00 · 2026-06-06 20:01:27 +00:00 · 2026-06-06 20:00:52 +00:00
26 changed files with 1784 additions and 59 deletions
--- a/docs/legal-decision-lessons.md
+++ b/docs/legal-decision-lessons.md
@@ -463,6 +463,7 @@ The draft's biggest structural error was adding the "נבאר" doctrinal paragra
 - **Problem:** legal-writer updates `decision_blocks` in the DB, but legal-qa reads from `drafts/decision.md` on disk. In CMPA-62 the writer reported updating block headers in DB but the file did not re-sync, causing QA-2 to fail on exactly the same issue twice.
 - **Lesson:** Single source of truth is mandatory — either the writer must write to BOTH the DB and the decision.md file in one atomic step, or there must be an automatic `regenerate-draft` hook that runs after every block update so the file always reflects the latest DB state. Two unsynchronized sources will keep producing the same false-fail loop.
 - **Owner:** Infrastructure task — not a writer/QA prompt fix.
+- **✅ RESOLVED (GAP-88, 2026-06-06):** `block_writer._update_draft_file` is now an automatic regenerate hook called from `store_block` (every persist) **and** `renumber_all_blocks` — so `drafts/decision.md` always reflects `decision_blocks`. legal-qa already validates against the DB; both sides are now identical.

 ---

--- a/docs/spec/gap-audit.md
+++ b/docs/spec/gap-audit.md
@@ -88,7 +88,7 @@
 | GAP-51 | `set_outcome` enum mismatch (3≠4); conflicting vocabularies | INV-TOOL1/UI1 | Medium | `block_writer.py:442` vs `lessons.py:11`, `workflow.py:145` | single SSoT for outcome |
 | GAP-52 | Most tools are not idempotent (case_create/document_upload/precedent_attach) | INV-TOOL3, G3 | Medium | `server.py`, tools/ | upsert/ON CONFLICT |
 | GAP-53 | No limit caps (precedent_library_list/search_*/list_chair_feedback) | INV-TOOL5 | Low | tools/ | clamp to max |
-| GAP-54 | 3 precedent-ingestion paths with asymmetric validation; citation-guard undocumented | INV-ING1, G2 | Medium | `precedent_library.py`, `internal_decisions.py` | unify (aligned with GAP-01/05) |
+| GAP-54 | 3 precedent-ingestion paths with asymmetric validation; citation-guard undocumented | INV-ING1, G2 | Medium | `precedent_library.py`, `internal_decisions.py` | ✅ **Resolved by FU-1**: both precedent paths (library + internal) go through the canonical `ingest.ingest_document` (symmetric enum validation + citation-guard, documented in 01-ingest §4); the 3rd path (training→`style_corpus`) is a deliberately separate corpus (style, not precedent). Verified in `test_unified_ingest.py` |
 | GAP-55 | Infisical dead code; config source undocumented (Coolify-only) | INV-ENV2, G2 | Medium | `mcp-server/.../config.py` | document Coolify as SSoT / isolate Infisical |
 | GAP-56 | Hard-coded UUIDs (company/agent); same issue as GAP-26 | INV-ENV3/INT5 | High | `web/paperclip_client.py:36-62`, `web/app.py:3976` | config-driven |
 | GAP-57 | Plaintext creds by default (`paperclip:paperclip`) | INV-ENV4, G9, §6 | High | `web/paperclip_client.py:21`, `web/app.py:3789,3964` | empty default + fail-loud |
@@ -207,6 +207,7 @@
 - **Slice 7, 2026-06-06: ✅ GAP-48 done.** The `drafting` family (18 tools) was converted to the envelope. export_docx/revise_draft/apply_user_edit use `err` for failure (so the agent and the user see the failure at the envelope level), with `failed_gates` riding in `data`; 6 app.py consumers (get_decision_template/apply_user_edit×2/revise_draft/list_bookmarks/export_docx) were wired with an envelope-status check; `test_export_qa_gate` was updated to the contract (182/182 passing). **GAP-48 closed: all ~12 families are now uniform.**
 - **Slice 8, 2026-06-06: ✅ GAP-49 (the critical part).** The misleading name `precedent_search_library` (citations attached to the case) was renamed to `search_case_precedents`, removing the dangerous inversion against `search_precedent_library` (the authoritative library, the CREAC source). The old name is kept as a deprecated alias (in server.py), so live agents see zero breakage. Docstrings were clarified; app.py (typeahead), the legal-researcher/legal-writer docs and the precedent_library docstring were updated. The 5 remaining search tools search distinct corpora under reasonable names, so no mass rename was done (high churn, low value). 182/182 passing. **⚠ After merge+deploy:** cross-company sync of the agent doc (frontmatter `search_case_precedents`). Remaining in FU-14: GAP-50 (merging the block tools; touches the writing process and needs a chair decision), GAP-54, GAP-47 part B.
 - **Slice 9, 2026-06-06: ✅ GAP-50 (chair decision).** Mapping showed the block tools are not "redundant duplicates": `write_block`/`write_all_blocks`/`save_block_content`/`write_interim_draft` serve different flows (CLI/initial draft vs. the writer's "fix it in the file, not in the DB" process). The only real duplication is `draft_section` (per-section context, nearly abandoned), which overlaps `get_block_context` (per-block, canonical). Decided (chair): **draft_section deprecated** (the docstring in server.py+drafting.py points to get_block_context; draft-decision.md updated), with no removal and no merging of the writing tools (preserving the deliberate writing process). 182/182 passing. **GAP-49+50 closed.** Remaining in FU-14: GAP-54 (unifying precedent ingestion), GAP-47 part B (chair guidelines→DB).
+- **Slice 10, 2026-06-06: ✅ GAP-54 (closed as resolved-by-FU-1).** Verification (G2: do not re-solve): `ingest.ingest_document` is the canonical path; `precedent_library` and `internal_decisions` both go through it with symmetric enum validation + citation-guard (documented in 01-ingest §4); training→`style_corpus` is a deliberately separate corpus. 9/9 `test_unified_ingest` pass, so there is no code to write. **FU-14 nearly complete: only GAP-47 part B remains** (moving the chair's guidelines from `analysis-and-research.md` to the DB), a separate UI + analyst-flow feature that is not urgent.

 ### FU-15 — deploy/env/secrets
 - **Covers:** GAP-55..62 · **invariants:** INV-ENV1–ENV5 · **effort:** M · **dependencies:** none
--- a/mcp-server/src/legal_mcp/config.py
+++ b/mcp-server/src/legal_mcp/config.py
@@ -154,6 +154,14 @@ HALACHA_AUTO_APPROVE_THRESHOLD = float(
 # principle. Set > 1.0 to disable semantic dedup (exact-quote dedup still runs).
 HALACHA_DEDUP_COSINE = float(os.environ.get("HALACHA_DEDUP_COSINE", "0.93"))

+# Halacha dedup TAIL band (#82.3) — the [BAND_COSINE, DEDUP_COSINE) range is too
+# low to auto-skip but suspicious. A halacha whose nearest same-precedent
+# neighbor sits in this band AND has high LEXICAL overlap (Jaccard/Levenshtein
+# on rule_statement) is flagged 'near_duplicate' (blocks auto-approve → review),
+# not skipped — catching paraphrases the cosine threshold misses without
+# dropping a possibly-distinct principle unreviewed. 0.83 from the same cleanup.
+HALACHA_DEDUP_BAND_COSINE = float(os.environ.get("HALACHA_DEDUP_BAND_COSINE", "0.83"))
+
 # Halacha NLI entailment validator (#81.3) — after extraction, a claude_session
 # judge checks each halacha's rule_statement is entailed by its supporting_quote.
 # Non-entailed (neutral/contradiction) → quality flag 'nli_unsupported' that
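The two cosine thresholds split the nearest-neighbor distance into three zones. A minimal sketch of that decision, assuming pgvector's `<=>` operator (cosine distance, i.e. 1 minus cosine similarity) and a lexical-overlap verdict computed beforehand; `classify_halacha` is a hypothetical helper, not part of the codebase.

```python
from __future__ import annotations

DEDUP_COSINE = 0.93  # default of HALACHA_DEDUP_COSINE
BAND_COSINE = 0.83   # default of HALACHA_DEDUP_BAND_COSINE


def classify_halacha(nearest_distance: float | None, lexically_near: bool) -> str:
    """Return 'skip', 'flag' or 'keep' for a newly extracted halacha.

    nearest_distance is the cosine distance to the closest halacha of the same
    precedent (None when that precedent has no embedded halachot yet).
    """
    if nearest_distance is None:
        return "keep"
    dedup_distance = 1.0 - DEDUP_COSINE  # about 0.07
    band_distance = 1.0 - BAND_COSINE    # about 0.17
    if nearest_distance <= dedup_distance:
        return "skip"  # near-identical: dropped without review
    if nearest_distance <= band_distance and lexically_near:
        return "flag"  # tail band + reworded text: 'near_duplicate', sent to review
    return "keep"      # distinct enough: the normal auto-approve rules apply


for dist, lex in [(0.05, False), (0.12, True), (0.12, False), (0.20, True)]:
    print(dist, lex, classify_halacha(dist, lex))
```

A neighbor at cosine similarity 0.88 (distance 0.12) lands in the band, so it is only flagged when the wording also overlaps; nothing in the band is ever dropped unreviewed.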
--- a/mcp-server/src/legal_mcp/services/block_writer.py
+++ b/mcp-server/src/legal_mcp/services/block_writer.py
@@ -1088,37 +1088,39 @@ async def save_block_content(case_id: UUID, block_id: str, content: str) -> dict
    result["generation_type"] = "claude-code"
    result["model_used"] = "claude-code"

-    await store_block(UUID(decision["id"]), result)
+    await store_block(UUID(decision["id"]), result)  # store_block syncs the file (#35)
    await db.mark_blocks_stale(case_id, False)

-    # Also write/update the draft file on disk
-    await _update_draft_file(case_id, UUID(decision["id"]))
-
    return result


-async def _update_draft_file(case_id: UUID, decision_id: UUID) -> None:
-    """Rebuild drafts/decision.md from all blocks in DB."""
-    from pathlib import Path
-
-    case = await db.get_case(case_id)
-    if not case:
-        return
-
-    case_dir = config.find_case_dir(case["case_number"])
-    draft_dir = case_dir / "drafts"
-    draft_dir.mkdir(parents=True, exist_ok=True)
-
+async def _update_draft_file(decision_id: UUID) -> None:
+    """Rebuild drafts/decision.md from all blocks in DB — the single
+    regenerate-draft hook (lessons #35 / GAP-88). Called after EVERY
+    decision_blocks mutation (store_block, renumber) so the on-disk file never
+    drifts from the DB. legal-qa validates against the DB; export and the chair
+    read the file — keeping them identical kills the "QA fails twice on the same
+    already-fixed issue" loop (CMPA-62). Resolves case from decision_id so no
+    caller has to thread case_id through."""
    pool = await db.get_pool()
    async with pool.acquire() as conn:
+        case_row = await conn.fetchrow(
+            "SELECT c.case_number FROM decisions d JOIN cases c ON c.id = d.case_id "
+            "WHERE d.id = $1",
+            decision_id,
+        )
+        if not case_row:
+            return
        rows = await conn.fetch(
            "SELECT content FROM decision_blocks WHERE decision_id = $1 AND content != '' ORDER BY block_index",
            decision_id,
        )

+    draft_dir = config.find_case_dir(case_row["case_number"]) / "drafts"
+    draft_dir.mkdir(parents=True, exist_ok=True)
    draft_path = draft_dir / "decision.md"
    draft_path.write_text("\n\n".join(row["content"] for row in rows if row["content"]), encoding="utf-8")
-    logger.info("Draft file updated: %s (%d blocks)", draft_path, len(rows))
+    logger.info("Draft file synced: %s (%d blocks)", draft_path, len(rows))


 # ── Renumbering ───────────────────────────────────────────────────
@@ -1172,6 +1174,11 @@ async def renumber_all_blocks(decision_id: UUID) -> dict:
                )
            updated += 1

+    # #35 — renumber mutates content via raw UPDATE (bypasses store_block), so
+    # sync the draft file here too, otherwise the file keeps stale numbering.
+    if updated:
+        await _update_draft_file(decision_id)
+
    return {"total_paragraphs": current_num - 1, "blocks_updated": updated}


@@ -1204,6 +1211,9 @@ async def store_block(decision_id: UUID, block_result: dict) -> None:
            block_result["model_used"],
            block_result["temperature"],
        )
+    # #35 — regenerate the on-disk draft on every persist so DB and file stay
+    # identical (legal-qa reads DB; export/chair read the file).
+    await _update_draft_file(decision_id)


 async def write_and_store_block(
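The invariant behind `_update_draft_file` is that every write path, including bulk rewrites that bypass `store_block`, ends by regenerating the file from the DB. A toy in-memory sketch of that pattern, assuming a dict stands in for `decision_blocks` and a leading "N. " stands in for paragraph numbering; `DraftStore` and its methods are hypothetical.

```python
import re
import tempfile
from pathlib import Path


class DraftStore:
    """Toy stand-in: `blocks` plays decision_blocks, `draft_path` plays drafts/decision.md."""

    def __init__(self, draft_path: Path) -> None:
        self.draft_path = draft_path
        self.blocks: dict[int, str] = {}  # block_index -> content

    def _sync(self) -> None:
        # The one regenerate hook: rebuild the whole file from the stored blocks,
        # skipping empty blocks like the real query does (content != '').
        body = "\n\n".join(self.blocks[i] for i in sorted(self.blocks) if self.blocks[i])
        self.draft_path.parent.mkdir(parents=True, exist_ok=True)
        self.draft_path.write_text(body, encoding="utf-8")

    def store_block(self, index: int, content: str) -> None:
        self.blocks[index] = content
        self._sync()  # every persist regenerates the file

    def renumber(self) -> int:
        """Renumber 'N. ' paragraphs across blocks; a bulk path that skips store_block."""
        counter, updated = 1, 0
        for i in sorted(self.blocks):
            lines = []
            for line in self.blocks[i].split("\n"):
                m = re.match(r"\d+\.\s", line)
                if m:
                    line = f"{counter}. {line[m.end():]}"
                    counter += 1
                lines.append(line)
            new_content = "\n".join(lines)
            if new_content != self.blocks[i]:
                self.blocks[i] = new_content
                updated += 1
        if updated:
            self._sync()  # without this the file would keep the old numbers
        return updated


with tempfile.TemporaryDirectory() as tmp:
    store = DraftStore(Path(tmp) / "drafts" / "decision.md")
    store.store_block(0, "1. Background.")
    store.store_block(1, "7. Discussion.")
    store.renumber()
    print(store.draft_path.read_text(encoding="utf-8"))
```

Rebuilding the whole file on each write costs O(blocks) per persist, which is cheap for a single decision and removes any chance of partial drift.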
--- a/mcp-server/src/legal_mcp/services/claude_session.py
+++ b/mcp-server/src/legal_mcp/services/claude_session.py
@@ -29,6 +29,7 @@ from __future__ import annotations
 import asyncio
 import json
 import logging
+import os

 from legal_mcp.config import parse_llm_json

@@ -40,15 +41,39 @@ logger = logging.getLogger(__name__)
 DEFAULT_TIMEOUT = 1800
 LONG_TIMEOUT = 3600  # opus block writing on full case context

-# #85 — `claude -p` fails intermittently with a fast non-zero exit and empty
-# stderr (observed on large/slow cold prompts: CEO write_interim_draft,
-# learning_loop distillation). The SAME prompt succeeds on retry, so the bail is
-# transient — retry with linear backoff. Timeouts and "CLI not found" are
-# deterministic and are NOT retried.
+# #85 — two complementary hardenings for the same symptom (`claude -p` failing
+# with a fast non-zero exit + empty stderr on large/slow cold prompts: CEO
+# write_interim_draft, learning_loop distillation):
+#
+# 1. CLEAN ENV (defensive): a running Claude Code session exports markers into
+#    child processes; a *nested* ``claude -p`` inherits them. Stripping them lets
+#    every nested invocation launch as a clean top-level session. Could not be
+#    reproduced deterministically, so it's a suspect, not a proven cause. Auth/
+#    config (CLAUDE_CONFIG_DIR, ANTHROPIC_*, PATH, HOME) are kept.
+# 2. RETRY (the real fix): the SAME large prompt that exits 1 once succeeds on a
+#    plain retry — the bail is transient. Retry with linear backoff. Timeouts and
+#    "CLI not found" stay deterministic and are NOT retried.
+# See TaskMaster legal-ai #85.
+_SESSION_MARKER_PREFIXES = ("CLAUDECODE", "CLAUDE_CODE_", "CLAUDE_AGENT_")
+_SESSION_MARKER_EXACT = frozenset({"AI_AGENT", "CLAUDE_EFFORT"})
+
 MAX_RETRIES = 3
 RETRY_BACKOFF_BASE = 5  # seconds; sleep = base * attempt_number


+def _clean_subprocess_env() -> dict[str, str]:
+    """Copy the current env minus Claude Code session markers.
+
+    Lets a nested ``claude -p`` start fresh instead of detecting it is
+    already inside a Claude Code session (#85).
+    """
+    env = dict(os.environ)
+    for key in list(env):
+        if key in _SESSION_MARKER_EXACT or key.startswith(_SESSION_MARKER_PREFIXES):
+            del env[key]
+    return env
+
+
 async def query(
    prompt: str,
    timeout: int = DEFAULT_TIMEOUT,
@@ -112,6 +137,8 @@ async def query(
                stdin=asyncio.subprocess.PIPE,
                stdout=asyncio.subprocess.PIPE,
                stderr=asyncio.subprocess.PIPE,
+                env=_clean_subprocess_env(),
+                cwd=os.path.expanduser("~"),
            )
        except FileNotFoundError:
            # Deterministic — never retry.
@@ -139,8 +166,11 @@ async def query(
            raise RuntimeError(f"Claude CLI timed out after {timeout}s")

        if proc.returncode != 0:
-            stderr = stderr_b.decode("utf-8", errors="replace").strip()[:500] or "unknown error"
-            last_err = f"exit {proc.returncode}: {stderr}"
+            # The CLI sometimes writes its diagnostic to stdout (or nowhere)
+            # rather than stderr (#85) — surface whichever is present.
+            stderr = stderr_b.decode("utf-8", errors="replace").strip()
+            stdout = stdout_b.decode("utf-8", errors="replace").strip()
+            last_err = f"exit {proc.returncode}: {(stderr or stdout or 'no output')[:500]}"
        else:
            stdout = stdout_b.decode("utf-8", errors="replace").strip()
            if stdout:
@@ -256,6 +286,7 @@ async def query_streaming(
            stdout=asyncio.subprocess.PIPE,
            stderr=asyncio.subprocess.PIPE,
            cwd=cwd,
+            env=_clean_subprocess_env(),
        )
    except FileNotFoundError:
        yield {
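The three #85 measures (clean environment, stdout fallback for diagnostics, retry with linear backoff) can be exercised without spawning a real `claude` process. A minimal sketch, assuming `max_retries` counts total attempts and that a transient CLI failure surfaces as `RuntimeError`; `clean_env`, `describe_failure` and `run_with_retry` are hypothetical names, and the real `query()` loop may differ in detail.

```python
from __future__ import annotations

import asyncio
from collections.abc import Awaitable, Callable, Mapping

MARKER_PREFIXES = ("CLAUDECODE", "CLAUDE_CODE_", "CLAUDE_AGENT_")
MARKER_EXACT = frozenset({"AI_AGENT", "CLAUDE_EFFORT"})


def clean_env(environ: Mapping[str, str]) -> dict[str, str]:
    """Copy of environ without session markers (auth/config vars pass through)."""
    return {k: v for k, v in environ.items()
            if k not in MARKER_EXACT and not k.startswith(MARKER_PREFIXES)}


def describe_failure(returncode: int, stderr_b: bytes, stdout_b: bytes) -> str:
    """Prefer stderr, fall back to stdout, then to a fixed placeholder."""
    stderr = stderr_b.decode("utf-8", errors="replace").strip()
    stdout = stdout_b.decode("utf-8", errors="replace").strip()
    return f"exit {returncode}: {(stderr or stdout or 'no output')[:500]}"


async def run_with_retry(
    attempt: Callable[[], Awaitable[str]],
    max_retries: int = 3,
    backoff_base: float = 5.0,
    sleep: Callable[[float], Awaitable[None]] = asyncio.sleep,
) -> str:
    """Retry transient RuntimeErrors with linear backoff (base * attempt number).

    FileNotFoundError (CLI missing) and TimeoutError are deterministic and
    propagate on the first occurrence.
    """
    last_err: Exception | None = None
    for n in range(1, max_retries + 1):
        try:
            return await attempt()
        except (FileNotFoundError, asyncio.TimeoutError):
            raise
        except RuntimeError as e:
            last_err = e
            if n < max_retries:
                await sleep(backoff_base * n)  # 5s, then 10s, ...
    raise RuntimeError(f"failed after {max_retries} attempts: {last_err}")


print(clean_env({"PATH": "/usr/bin", "CLAUDECODE": "1", "CLAUDE_CONFIG_DIR": "/cfg"}))
print(describe_failure(1, b"", b"diagnostic on stdout\n"))
```

Note that `CLAUDE_CONFIG_DIR` survives the filter: it shares the `CLAUDE_` stem but not any of the marker prefixes, which is what keeps auth/config intact.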
--- a/mcp-server/src/legal_mcp/services/db.py
+++ b/mcp-server/src/legal_mcp/services/db.py
@@ -619,6 +619,12 @@ ALTER TABLE case_law ADD COLUMN IF NOT EXISTS practice_area TEXT DEFAULT '';
 ALTER TABLE case_law ADD COLUMN IF NOT EXISTS appeal_subtype TEXT DEFAULT '';
 ALTER TABLE case_law ADD COLUMN IF NOT EXISTS headnote TEXT DEFAULT '';
    -- chair-editable abstract shown in search results.
+ALTER TABLE case_law ADD COLUMN IF NOT EXISTS nevo_ratio TEXT DEFAULT '';
+    -- The Nevo editorial מיני-רציו block, captured at ingest *before* it is
+    -- stripped from the body (#86.3). Kept separate from `headnote` (which is
+    -- our own abstract) so it can serve as a free professional gold-set for
+    -- benchmarking halacha-extraction recall/precision. Empty when the source
+    -- is not a Nevo export or carries no mini-ratio.
 ALTER TABLE case_law ADD COLUMN IF NOT EXISTS source_type TEXT DEFAULT '';
    -- 'court_ruling' | 'appeals_committee'

@@ -3263,7 +3269,7 @@ async def update_case_law(case_law_id: UUID, **fields) -> dict | None:
    """
    allowed = {
        "case_number", "case_name", "court", "date", "practice_area", "appeal_subtype",
-        "subject_tags", "summary", "headnote", "key_quote", "source_url",
+        "subject_tags", "summary", "headnote", "nevo_ratio", "key_quote", "source_url",
        "source_type", "precedent_level", "is_binding", "district", "chair_name",
        "proceeding_type", "citation_formatted",
    }
@@ -3693,6 +3699,7 @@ async def store_halachot_for_chunk(
    """
    threshold = config.HALACHA_AUTO_APPROVE_THRESHOLD
    dedup_distance = 1.0 - config.HALACHA_DEDUP_COSINE  # cosine sim → distance
+    band_distance = 1.0 - config.HALACHA_DEDUP_BAND_COSINE  # tail-band ceiling (#82.3)
    pool = await get_pool()
    inserted = 0
    skipped = 0
@@ -3716,21 +3723,32 @@ async def store_halachot_for_chunk(
                if norm_quote and norm_quote in existing_quotes:
                    skipped += 1
                    continue
-                # 2) semantic near-duplicate (rule embedding cosine)
+                # 2) semantic near-duplicate (rule embedding cosine) — fetch the
+                #    nearest same-precedent neighbor once so we can both auto-skip
+                #    (cosine ≥ DEDUP) and flag the lexical tail (#82.3).
                emb = h.get("embedding")
+                flags = list(h.get("quality_flags") or [])
                if emb is not None and config.HALACHA_DEDUP_COSINE <= 1.0:
-                    dup = await conn.fetchval(
-                        "SELECT 1 FROM halachot WHERE case_law_id = $1 "
-                        "AND embedding IS NOT NULL AND (embedding <=> $2) <= $3 "
-                        "LIMIT 1",
-                        case_law_id, emb, dedup_distance,
+                    neighbor = await conn.fetchrow(
+                        "SELECT rule_statement, (embedding <=> $2) AS dist "
+                        "FROM halachot WHERE case_law_id = $1 "
+                        "AND embedding IS NOT NULL "
+                        "ORDER BY embedding <=> $2 LIMIT 1",
+                        case_law_id, emb,
                    )
-                    if dup:
+                    if neighbor is not None:
+                        dist = float(neighbor["dist"])
+                        if dist <= dedup_distance:
                            skipped += 1
                            continue
+                        # tail band: below auto-skip but lexically near → flag.
+                        if (dist <= band_distance
+                                and halacha_quality.FLAG_NEAR_DUPLICATE not in flags
+                                and halacha_quality.lexical_near_duplicate(
+                                    h["rule_statement"], neighbor["rule_statement"])):
+                            flags.append(halacha_quality.FLAG_NEAR_DUPLICATE)

                confidence = float(h.get("confidence", 0.0))
-                flags = h.get("quality_flags") or []
                auto_approve = confidence >= threshold and not flags
                review_status = "approved" if auto_approve else "pending_review"
                reviewer = (
@@ -3774,7 +3792,19 @@ async def list_halachot(
    practice_area: str | None = None,
    limit: int = 200,
    offset: int = 0,
+    exclude_low_quality: bool = False,
+    order_by_priority: bool = False,
 ) -> list[dict]:
+    """List halachot with optional triage controls (#84).
+
+    exclude_low_quality — drop items carrying ANY quality_flag (application /
+      truncated_quote / quote_unverified / non_decision / thin_restatement /
+      nli_unsupported / near_duplicate). These belong in a 'needs extraction
+      fix' bucket, not the chair's approve queue (#84.1).
+    order_by_priority — replace FIFO with an active-learning order (#84.3):
+      negatively-treated first, then most-uncertain (lowest confidence), then
+      oldest — so the chair sees the highest-value decisions first.
+    """
    pool = await get_pool()
    conditions = []
    params: list = []
@@ -3791,7 +3821,16 @@ async def list_halachot(
        conditions.append(f"${idx} = ANY(h.practice_areas)")
        params.append(practice_area)
        idx += 1
+    if exclude_low_quality:
+        # a clean item has an empty/NULL quality_flags array
+        conditions.append("COALESCE(array_length(h.quality_flags, 1), 0) = 0")
    where_sql = f"WHERE {' AND '.join(conditions)}" if conditions else ""
+    order_sql = (
+        "ORDER BY corroboration_negative DESC NULLS LAST, h.confidence ASC NULLS LAST, "
+        "h.created_at ASC"
+        if order_by_priority
+        else "ORDER BY h.case_law_id, h.halacha_index"
+    )
    params.extend([limit, offset])
    sql = f"""
        SELECT h.id, h.case_law_id, h.halacha_index, h.rule_statement,
@@ -3819,7 +3858,7 @@ async def list_halachot(
            GROUP BY halacha_id
        ) cor ON cor.halacha_id = h.id
        {where_sql}
-        ORDER BY h.case_law_id, h.halacha_index
+        {order_sql}
        LIMIT ${idx} OFFSET ${idx + 1}
    """
    rows = await pool.fetch(sql, *params)
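The priority order in `list_halachot` can be mirrored in plain Python to check the intended ranking. A minimal sketch, assuming each row carries `quality_flags`, `corroboration_negative` (treated here as a count, with a missing value counted as 0), `confidence` and `created_at`; `triage` is a hypothetical helper.

```python
from datetime import datetime


def triage(rows: list[dict], exclude_low_quality: bool = True) -> list[dict]:
    """Approve-queue order: negatively-treated first, then least confident, then oldest."""
    if exclude_low_quality:
        rows = [r for r in rows if not r.get("quality_flags")]  # clean items only

    def priority(r: dict) -> tuple:
        conf = r.get("confidence")
        return (
            -(r.get("corroboration_negative") or 0),            # DESC
            (conf is None, conf if conf is not None else 0.0),   # ASC, NULLs last
            r["created_at"],                                      # ASC (oldest first)
        )

    return sorted(rows, key=priority)


rows = [
    {"id": "a", "corroboration_negative": 0, "confidence": 0.9, "created_at": datetime(2026, 1, 1)},
    {"id": "b", "corroboration_negative": 2, "confidence": 0.8, "created_at": datetime(2026, 1, 2)},
    {"id": "c", "corroboration_negative": 0, "confidence": 0.4, "created_at": datetime(2026, 1, 3)},
    {"id": "d", "corroboration_negative": None, "confidence": None, "created_at": datetime(2025, 12, 1)},
    {"id": "e", "corroboration_negative": 0, "confidence": 0.1, "created_at": datetime(2026, 1, 4),
     "quality_flags": ["application"]},
]
print([r["id"] for r in triage(rows)])
```

Row `d` is the oldest but still comes last: with no confidence score there is nothing to say it is uncertain, so it yields to rows that are.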
--- a/mcp-server/src/legal_mcp/services/extractor.py
+++ b/mcp-server/src/legal_mcp/services/extractor.py
@@ -362,12 +362,24 @@ _NEVO_MARKERS = ("ספרות:", "חקיקה שאוזכרה:", "מיני-רציו
 # preamble: bibliography + מיני-רציו). Two families:
 #   - ועדת ערר / district openings (בפנינו / הערר שבנדון / ...)
 #   - COURT-RULING openings (#86.1): a פסק-דין header or the authoring judge's
-#     line ("השופט/ת X:", "כב' השופט", "הנשיא"). Without these, Nevo court
-#     judgments — exactly the ones carrying a מיני-רציו — slipped through unstripped
-#     (e.g. בג"ץ 1764/05), risking that the extractor reads Nevo's answer key.
+#     line. Without these, Nevo court judgments — exactly the ones carrying a
+#     מיני-רציו — slipped through unstripped (e.g. בג"ץ 1764/05).
+#
+# #86.2 hardening — two over-strip bugs found while backfilling:
+#   1. ``פסק-דין`` headers are often markdown-wrapped (``**פסק  דין**``); the old
+#      ``^פסק[- ]דין`` required the keyword to be the very first char of the line
+#      and allowed only one separator, so it missed the header and fell through
+#      to a citation 32K deep (עמ"נ 50567-07-21). We now tolerate leading
+#      markdown/whitespace and 0-3 separators.
+#   2. Bare ``השופט``/``הנשיא`` matched *citations* ("השופט מ' חשין, פסקה 23"),
+#      stripping real decision body. The authoring-judge line ends with a COLON
+#      ("השופט י' עמית:"); citations use a comma. We now require the colon.
 _DECISION_START = re.compile(
-    r"^(בפנינו|לפנינו|לפניי|הערר שבנדון|ועדת הערר לתכנון|רקע עובדתי|עסקינן|"
-    r"פסק[- ]דין|פסק[- ]דינו|כב(?:וד)?['׳]?\s*השופט|המשנה לנשיא|הנשיא|השופט)",
+    r"^[ \t>*_#]{0,6}(?:"
+    r"בפנינו|לפנינו|לפניי|הערר שבנדון|ועדת הערר לתכנון|רקע עובדתי|עסקינן|"
+    r"פסק[ \t\-]{0,3}די(?:ן|נו)|"  # פסק-דין / פסק דין / **פסק  דין** header (final-nun ן vs דינו)
+    r"(?:כב(?:וד)?['׳\"]?\s*)?(?:ה?שופט[ת]?|ה?נשיא[ה]?|המשנה לנשיא)\s+[^\n,]{1,40}:"  # author line → colon
+    r")",
    re.MULTILINE,
 )

@@ -388,3 +400,41 @@ def strip_nevo_preamble(text: str) -> str:
        logger.debug("Stripped %d chars of Nevo preamble", m.start())
        return stripped
    return text
+
+
+_RATIO_MARKER = "מיני-רציו:"
+
+
+def extract_nevo_ratio(text: str) -> str:
+    """Return the Nevo מיני-רציו block (editorial holdings summary), or ''.
+
+    The mini-ratio is Nevo's own headnote — a concise, professionally-written
+    list of the holdings. We capture it *before* :func:`strip_nevo_preamble`
+    discards it, to serve as a free gold-set for benchmarking how well our
+    halacha extractor covers the real holdings (#86.3).
+
+    The block runs from the ``מיני-רציו:`` marker to whichever comes first:
+    the decision body (``_DECISION_START``) or the next preamble marker
+    (bibliography / legislation). Returns '' when there is no mini-ratio.
+    """
+    if not text:
+        return ""
+    start = text.find(_RATIO_MARKER)
+    if start == -1:
+        return ""
+    body = text[start + len(_RATIO_MARKER):]
+
+    # End at the earliest of: decision body start, or a following preamble
+    # marker (ספרות: / חקיקה שאוזכרה: / ...). Both are measured relative to
+    # the ratio body so we never run past it into the judgment itself.
+    end = len(body)
+    dm = _DECISION_START.search(body)
+    if dm:
+        end = min(end, dm.start())
+    for marker in _NEVO_MARKERS:
+        if marker == _RATIO_MARKER:
+            continue
+        pos = body.find(marker)
+        if pos != -1:
+            end = min(end, pos)
+    return body[:end].strip()
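A self-contained check of the two #86.2 regex fixes and of where `extract_nevo_ratio` stops. The pattern is copied from the diff; the marker tuple is an assumed subset (the full `_NEVO_MARKERS` is only partly visible here), and the sample text is invented.

```python
import re

NEVO_MARKERS = ("ספרות:", "חקיקה שאוזכרה:", "מיני-רציו:")  # assumed subset of _NEVO_MARKERS
RATIO_MARKER = "מיני-רציו:"

DECISION_START = re.compile(
    r"^[ \t>*_#]{0,6}(?:"
    r"בפנינו|לפנינו|לפניי|הערר שבנדון|ועדת הערר לתכנון|רקע עובדתי|עסקינן|"
    r"פסק[ \t\-]{0,3}די(?:ן|נו)|"
    r"(?:כב(?:וד)?['׳\"]?\s*)?(?:ה?שופט[ת]?|ה?נשיא[ה]?|המשנה לנשיא)\s+[^\n,]{1,40}:"
    r")",
    re.MULTILINE,
)


def extract_nevo_ratio(text: str) -> str:
    """Text between the mini-ratio marker and the first body/preamble boundary."""
    start = text.find(RATIO_MARKER) if text else -1
    if start == -1:
        return ""
    body = text[start + len(RATIO_MARKER):]
    end = len(body)
    match = DECISION_START.search(body)
    if match:
        end = min(end, match.start())
    for marker in NEVO_MARKERS:
        if marker != RATIO_MARKER and (pos := body.find(marker)) != -1:
            end = min(end, pos)
    return body[:end].strip()


# Fix 1: a markdown-wrapped header with a double space is recognised.
print(bool(DECISION_START.search("**פסק  דין**")))
# Fix 2: the author line (colon) matches; a citation (comma) does not.
print(bool(DECISION_START.search("השופט י' עמית:")))
print(bool(DECISION_START.search("השופט מ' חשין, פסקה 23")))

sample = (
    "מיני-רציו:\n* הלכה ראשונה.\n* הלכה שנייה.\n"
    "חקיקה שאוזכרה:\nחוק התכנון והבניה\n"
    "פסק-דין\nהשופט י' עמית:\nגוף ההחלטה"
)
print(extract_nevo_ratio(sample))  # stops at the legislation marker, before the body
```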
--- a/mcp-server/src/legal_mcp/services/halacha_extractor.py
+++ b/mcp-server/src/legal_mcp/services/halacha_extractor.py
@@ -592,10 +592,16 @@ async def _extract_impl(case_law_id: UUID, force: bool = False,
            flags = halacha_quality.compute_quality_flags(
                coerced["rule_statement"], coerced["supporting_quote"],
                coerced["reasoning_summary"], coerced["quote_verified"],
+                coerced["rule_type"],
            )
            coerced["quality_flags"] = flags
            if halacha_quality.FLAG_NON_DECISION in flags and coerced["rule_type"] != "obiter":
                coerced["rule_type"] = "obiter"
+            # #81.4 — a binding-labeled rule that reads as a case-application is
+            # re-typed application (it carries FLAG_APPLICATION either way).
+            elif (halacha_quality.FLAG_APPLICATION in flags
+                  and coerced["rule_type"] == "binding"):
+                coerced["rule_type"] = "application"
            cleaned.append(coerced)
        # #81.3 NLI entailment — one batched judge call per chunk (fail-open).
        if config.HALACHA_NLI_ENABLED and cleaned:
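The re-typing step gives `non_decision` precedence over `application`. A minimal pure-function sketch of that precedence, assuming flags arrive as the list `compute_quality_flags` returns; `resolve_rule_type` is a hypothetical name for logic that lives inline in `_extract_impl`.

```python
FLAG_NON_DECISION = "non_decision"
FLAG_APPLICATION = "application"


def resolve_rule_type(rule_type: str, flags: list[str]) -> str:
    """Final rule_type after quality flags are applied (same branches as the diff)."""
    if FLAG_NON_DECISION in flags:
        return "obiter"  # not a holding at all: obiter wins over everything
    if FLAG_APPLICATION in flags and rule_type == "binding":
        return "application"  # 'binding' label on a THIS-case application
    return rule_type  # e.g. an 'obiter' item keeps its label even if flagged


for rt, fl in [("binding", ["application"]),
               ("binding", ["non_decision", "application"]),
               ("obiter", ["application"]),
               ("binding", [])]:
    print(rt, fl, "->", resolve_rule_type(rt, fl))
```

Either way the item keeps its flags, so re-typing changes only the label; the flag alone already blocks auto-approval.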
--- a/mcp-server/src/legal_mcp/services/halacha_quality.py
+++ b/mcp-server/src/legal_mcp/services/halacha_quality.py
@@ -128,6 +128,91 @@ def is_thin_restatement(rule_statement: str, supporting_quote: str) -> bool:
    return overlap >= _THIN_OVERLAP and len_ratio <= _THIN_LEN_RATIO


+# ── Fact-dependent application: not a generalizable holding (#81.4) ──
+#
+# The strict rubric's cut_application (docs/halacha-strict-rubric.md §3, §27):
+# a determination that rests on the case's specific facts/parties/amounts is an
+# illustration, not a holding — it must not enter the corpus as a binding rule.
+# The extractor already classifies ``rule_type='application'``; this is a
+# HIGH-PRECISION secondary catch for rules the model mislabeled as binding,
+# using only the unambiguous "applied to THIS case" deixis (bare party words
+# like "המערער" appear in genuine rules too, so they are deliberately excluded).
+
+_FACT_DEPENDENT_MARKERS = (
+    "במקרה דנן",
+    "במקרה שבפנינו",
+    "במקרה שלפנינו",
+    "במקרה שלפניי",
+    "בענייננו",
+    "בנדון דידן",
+    "בנדון דנן",
+    "במקרה שלנו",
+    "בנסיבות המקרה שלפנינו",
+    "בנסיבות תיק זה",
+    "בתיק שלפנינו",
+    "בערר שלפנינו",
+    "בערר דנן",
+)
+
+
+def is_fact_dependent(rule_statement: str) -> bool:
+    """True when the rule is phrased as an application to THIS case (not a holding)."""
+    norm = normalize_text(rule_statement)
+    return any(marker in norm for marker in _FACT_DEPENDENT_MARKERS)
+
+
+# ── Lexical near-duplicate signal for the 0.83–0.93 cosine tail (#82.3) ──
+#
+# Embedding cosine alone misses paraphrases that float just below the dedup
+# threshold (0.93). A secondary lexical signal — Jaccard over word-shingles +
+# normalized Levenshtein on the rule_statement — catches "same rule, reworded"
+# in that band without lowering the global cosine threshold. Hybrid
+# lexical+semantic beats either alone (arXiv:1805.11611). Pure functions.
+
+def _shingles(text: str, k: int = 2) -> set[str]:
+    words = [w for w in re.split(r"[^א-ת0-9]+", normalize_text(text)) if w]
+    if len(words) < k:
+        return {" ".join(words)} if words else set()
+    return {" ".join(words[i : i + k]) for i in range(len(words) - k + 1)}
+
+
+def jaccard_shingles(a: str, b: str, k: int = 2) -> float:
+    sa, sb = _shingles(a, k), _shingles(b, k)
+    if not sa or not sb:
+        return 0.0
+    return len(sa & sb) / len(sa | sb)
+
+
+def normalized_levenshtein(a: str, b: str) -> float:
+    """1.0 == identical, 0.0 == fully different (edit distance / max len)."""
+    a, b = normalize_text(a), normalize_text(b)
+    if not a and not b:
+        return 1.0
+    if not a or not b:
+        return 0.0
+    # classic DP edit distance (rule_statements are short — a few hundred chars)
+    prev = list(range(len(b) + 1))
+    for i, ca in enumerate(a, 1):
+        cur = [i]
+        for j, cb in enumerate(b, 1):
+            cur.append(min(prev[j] + 1, cur[j - 1] + 1, prev[j - 1] + (ca != cb)))
+        prev = cur
+    return 1.0 - prev[-1] / max(len(a), len(b))
+
+
+_LEX_JACCARD_MIN = 0.55
+_LEX_LEVENSHTEIN_MIN = 0.70
+
+
+def lexical_near_duplicate(
+    a: str, b: str, jaccard_min: float = _LEX_JACCARD_MIN,
+    levenshtein_min: float = _LEX_LEVENSHTEIN_MIN,
+) -> bool:
+    """High lexical overlap → likely the same rule reworded (for the cosine tail)."""
+    return (jaccard_shingles(a, b) >= jaccard_min
+            or normalized_levenshtein(a, b) >= levenshtein_min)
+
+
 # ── Aggregate ──

 FLAG_NON_DECISION = "non_decision"
@@ -135,6 +220,8 @@ FLAG_TRUNCATED_QUOTE = "truncated_quote"
 FLAG_THIN_RESTATEMENT = "thin_restatement"
 FLAG_QUOTE_UNVERIFIED = "quote_unverified"
 FLAG_NLI_UNSUPPORTED = "nli_unsupported"  # rule not entailed by its quote (#81.3)
+FLAG_APPLICATION = "application"           # fact-dependent, not a holding (#81.4)
+FLAG_NEAR_DUPLICATE = "near_duplicate"     # cosine-tail lexical dup (#82.3)


 # ── NLI entailment check (rule_statement ⊨ supporting_quote) — #81.3 ──
@@ -250,6 +337,7 @@ def compute_quality_flags(
    supporting_quote: str,
    reasoning_summary: str = "",
    quote_verified: bool = True,
+    rule_type: str = "binding",
 ) -> list[str]:
    """Return the list of quality flags for one halacha (empty == clean).

@@ -264,4 +352,9 @@ def compute_quality_flags(
        flags.append(FLAG_THIN_RESTATEMENT)
    if not quote_verified:
        flags.append(FLAG_QUOTE_UNVERIFIED)
+    # #81.4 — an application (fact-dependent) item is an illustration, not a
+    # generalizable holding: never auto-approve it. Trust the model's
+    # rule_type='application' and add a high-precision deixis catch.
+    if rule_type == "application" or is_fact_dependent(rule_statement):
+        flags.append(FLAG_APPLICATION)
    return flags
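The two lexical signals are easiest to judge on a concrete pair. A self-contained sketch of the same shingle-Jaccard and normalized-Levenshtein logic, assuming `normalize_text` only collapses whitespace (the real helper may do more) and using a reduced marker list for `is_fact_dependent`; the sample sentences are invented.

```python
import re

_FACT_MARKERS = ("במקרה דנן", "בענייננו", "בערר שלפנינו")  # reduced subset


def normalize_text(text: str) -> str:
    # Assumption: the real normalize_text may do more than collapse whitespace.
    return " ".join(text.split())


def is_fact_dependent(rule: str) -> bool:
    norm = normalize_text(rule)
    return any(marker in norm for marker in _FACT_MARKERS)


def shingles(text: str, k: int = 2) -> set[str]:
    words = [w for w in re.split(r"[^א-ת0-9]+", normalize_text(text)) if w]
    if len(words) < k:
        return {" ".join(words)} if words else set()
    return {" ".join(words[i:i + k]) for i in range(len(words) - k + 1)}


def jaccard(a: str, b: str) -> float:
    sa, sb = shingles(a), shingles(b)
    return len(sa & sb) / len(sa | sb) if sa and sb else 0.0


def levenshtein_sim(a: str, b: str) -> float:
    a, b = normalize_text(a), normalize_text(b)
    if not a and not b:
        return 1.0
    if not a or not b:
        return 0.0
    prev = list(range(len(b) + 1))  # one DP row of edit distances
    for i, ca in enumerate(a, 1):
        cur = [i]
        for j, cb in enumerate(b, 1):
            cur.append(min(prev[j] + 1, cur[j - 1] + 1, prev[j - 1] + (ca != cb)))
        prev = cur
    return 1.0 - prev[-1] / max(len(a), len(b))


def lexical_near_duplicate(a: str, b: str) -> bool:
    return jaccard(a, b) >= 0.55 or levenshtein_sim(a, b) >= 0.70


rule = "הוועדה רשאית לסטות מן התכנית"
reworded = "הוועדה רשאית לסטות מן התכנית המאושרת"  # 4 of 5 bigrams shared
unrelated = "אין להתערב בשיקול דעת התכנוני"

print(jaccard(rule, reworded))                     # 4 shared / 5 distinct bigrams
print(round(levenshtein_sim(rule, reworded), 3))   # 8 inserted chars over 36
print(lexical_near_duplicate(rule, unrelated))
print(is_fact_dependent("בענייננו הוועדה סטתה מן התכנית"))
```

The OR means either signal alone is enough to flag; since a flag only routes the item to review, a lenient combination costs reviewer time rather than data.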
--- a/mcp-server/src/legal_mcp/services/ingest.py
+++ b/mcp-server/src/legal_mcp/services/ingest.py
@@ -158,9 +158,14 @@ async def ingest_document(
        except Exception as e:
            await progress("failed", 100, f"כשל בחילוץ טקסט: {e}")
            raise
-        raw_text = extractor.strip_nevo_preamble((raw_text or "")).strip()
+        raw_text = (raw_text or "")
    else:
-        raw_text = (text or "").strip()
+        raw_text = (text or "")
+    # Capture the Nevo מיני-רציו (editorial holdings summary) BEFORE stripping
+    # it out — it is a free professional gold-set for benchmarking halacha
+    # extraction (#86.3). Stored on the case_law row below once we have its id.
+    nevo_ratio = extractor.extract_nevo_ratio(raw_text)
+    raw_text = extractor.strip_nevo_preamble(raw_text).strip()
    if not raw_text:
        await progress("failed", 100, "לא נמצא טקסט בקובץ")
        raise ValueError("no extractable text in file")
@@ -180,6 +185,13 @@ async def ingest_document(
    )
    case_law_id = UUID(str(record["id"]))

+    # Persist the captured mini-ratio (best-effort; never block ingest on it).
+    if nevo_ratio:
+        try:
+            await db.update_case_law(case_law_id, nevo_ratio=nevo_ratio)
+        except Exception as e:  # noqa: BLE001 — additive metadata, non-fatal
+            logger.warning("could not store nevo_ratio for %s: %s", case_law_id, e)
+
    try:
        stored_chunks = await _chunk_embed_store(case_law_id, raw_text, page_offsets, page_count, progress)
        await db.mark_indexed(case_law_id)
--- a/mcp-server/src/legal_mcp/services/metrics.py
+++ b/mcp-server/src/legal_mcp/services/metrics.py
@@ -117,12 +117,33 @@ async def halacha_backlog(conn) -> dict:
    oldest = await conn.fetchval(
        "SELECT MIN(created_at) FROM halachot WHERE review_status = 'pending_review'"
    )
+    # #84.7 — split the pending bucket: how many are genuine candidates (clean)
+    # vs flagged 'needs extraction fix', and the breakdown by flag, so the chair
+    # sees how much of the backlog is real review vs extraction noise.
+    pending_clean = await conn.fetchval(
+        "SELECT COUNT(*) FROM halachot WHERE review_status = 'pending_review' "
+        "AND COALESCE(array_length(quality_flags, 1), 0) = 0"
+    )
+    flag_rows = await conn.fetch(
+        "SELECT flag, COUNT(*) AS n FROM ("
+        "  SELECT unnest(quality_flags) AS flag FROM halachot "
+        "  WHERE review_status = 'pending_review'"
+        ") t GROUP BY flag ORDER BY n DESC"
+    )
+    pending_total = counts.get("pending_review", 0)
+    reviewed = counts.get("approved", 0) + counts.get("rejected", 0) + counts.get("published", 0)
    return {
-        "pending_review": counts.get("pending_review", 0),
+        "pending_review": pending_total,
+        "pending_clean": pending_clean,           # real review candidates (#84.1)
+        "pending_flagged": pending_total - pending_clean,  # needs-fix bucket
        "approved": counts.get("approved", 0),
        "rejected": counts.get("rejected", 0),
+        "deferred": counts.get("deferred", 0),
        "published": counts.get("published", 0),
        "total": sum(counts.values()),
+        "reviewed_total": reviewed,
+        "approve_ratio": round(counts.get("approved", 0) / reviewed, 3) if reviewed else None,
+        "pending_by_flag": {r["flag"]: r["n"] for r in flag_rows},
        "oldest_pending_at": oldest.isoformat() if oldest else None,
    }
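The new backlog fields can be modeled in plain Python, which makes the SQL semantics easier to check. The sketch assumes a hypothetical in-memory list of halacha dicts carrying `review_status` and `quality_flags`. `backlog_summary` is an illustrative name, not part of `metrics.py`. Like the `COALESCE(array_length(...), 0) = 0` predicate, it treats both `NULL` and empty flag arrays as clean.

```python
from __future__ import annotations

from collections import Counter


def backlog_summary(rows: list[dict]) -> dict:
    """In-memory mirror of the pending split computed by halacha_backlog (sketch)."""
    counts = Counter(r["review_status"] for r in rows)
    pending = [r for r in rows if r["review_status"] == "pending_review"]
    # None and [] both count as "no flags", matching the COALESCE(...) = 0 check.
    pending_clean = sum(1 for r in pending if not r.get("quality_flags"))
    # Like SQL unnest(): an item with two flags contributes to two buckets.
    by_flag = Counter(f for r in pending for f in (r.get("quality_flags") or []))
    reviewed = counts["approved"] + counts["rejected"] + counts["published"]
    return {
        "pending_review": len(pending),
        "pending_clean": pending_clean,
        "pending_flagged": len(pending) - pending_clean,
        "reviewed_total": reviewed,
        "approve_ratio": round(counts["approved"] / reviewed, 3) if reviewed else None,
        "pending_by_flag": dict(by_flag.most_common()),
    }
```

Note that `pending_by_flag` values can sum to more than `pending_flagged`, because a single item may carry several flags.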

--- a/mcp-server/src/legal_mcp/services/qa_validator.py
+++ b/mcp-server/src/legal_mcp/services/qa_validator.py
@@ -104,7 +104,7 @@ CLAIMS_CHECK_PROMPT = """אתה בודק איכות החלטות משפטיות.
 """


-async def check_claims_coverage(blocks: list[dict], claims: list[dict]) -> dict:
+async def check_claims_coverage(blocks: list[dict], claims: list[dict], outcome: str = "") -> dict:
    """בדיקה סמנטית (Claude) שכל טענה נענתה בדיון."""
    yod = next((b for b in blocks if b["block_id"] == "block-yod"), None)
    if not yod or not yod.get("content"):
@@ -114,16 +114,26 @@ async def check_claims_coverage(blocks: list[dict], claims: list[dict]) -> dict:
    if not claims:
        return {"name": "claims_coverage", "passed": True, "errors": [], "severity": "critical"}

-    # Filter: only APPELLANT claims from original pleadings.
-    # Committee/permit_applicant claims are defensive positions, not claims
-    # that need to be "addressed" in the discussion.
+    # #87/GAP-87 — only the appellant's claims from the APPEAL PLEADING itself
+    # must be addressed. claim_type: 'claim'=כתב ערר (mandatory), 'response'=כתב
+    # תשובה, 'reply'=תגובה/השלמת-טיעון/תכתובת (supplementary correspondence — NOT
+    # a standalone duty to answer, especially on full acceptance). Counting reply/
+    # correspondence claims as "unanswered" produced false QA fails (1033-25).
    source_claims = [
        c for c in claims
        if c.get("source_document", "") != "block-zayin"
+        and c.get("claim_type") == "claim"
+        and c.get("party_role") == "appellant"
+    ]
+    if not source_claims:
+        # Fallback: appellant/respondent pleadings, excluding supplementary replies.
+        source_claims = [
+            c for c in claims
+            if c.get("source_document", "") != "block-zayin"
+            and c.get("claim_type") != "reply"
            and c.get("party_role") in ("appellant", "respondent")
        ]
    if not source_claims:
-        # Fallback: all non-block-zayin claims
        source_claims = [c for c in claims if c.get("source_document", "") != "block-zayin"]
    if not source_claims:
        source_claims = claims
@@ -165,9 +175,14 @@ async def check_claims_coverage(blocks: list[dict], claims: list[dict]) -> dict:
    total = len(source_claims)
    covered = len(addressed) + len(partial)

+    # On full acceptance the appellant prevailed in full — not every sub-claim
+    # needs individual treatment (the chair noted this for correspondence claims,
+    # 1033-25). Relax the missing-tolerance accordingly.
+    allowed_missing_ratio = 0.4 if outcome == "full_acceptance" else 0.2
+
    return {
        "name": "claims_coverage",
-        "passed": len(missing) <= total * 0.2,  # Allow up to 20% missing
+        "passed": len(missing) <= total * allowed_missing_ratio,
        "errors": errors,
        "severity": "critical",
        "details": f"{covered}/{total} טענות נענו ({covered/total*100:.0f}%), {len(partial)} חלקית, {len(missing)} חסרות",
@@ -361,8 +376,10 @@ async def validate_decision(case_id: UUID) -> dict:
    # Get claims
    claims = await db.get_claims(case_id)

-    # Determine appeal type
+    # Determine appeal type + outcome (outcome relaxes claims coverage on full acceptance — #87)
    appeal_type = case.get("appeal_type", "licensing")
+    from legal_mcp.services.lessons import canonical_outcome
+    outcome = canonical_outcome(decision.get("outcome", "") or "")

    # Run all checks
    # Run sync checks
@@ -370,7 +387,7 @@ async def validate_decision(case_id: UUID) -> dict:
        check_neutral_background(blocks),
    ]
    # Async check: claims coverage with Claude
-    results.append(await check_claims_coverage(blocks, claims))
+    results.append(await check_claims_coverage(blocks, claims, outcome))
    # More sync checks
    results.extend([
        check_weight_compliance(blocks, appeal_type),
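The claim-selection logic in `check_claims_coverage` is a cascade. Each tier is tried in turn, and the first non-empty one wins. The outcome then sets the missing-claim tolerance. The sketch below restates both steps as standalone functions. `select_source_claims` and `coverage_passes` are hypothetical names for illustration; the dict keys match the ones the diff reads.

```python
from __future__ import annotations


def select_source_claims(claims: list[dict]) -> list[dict]:
    """Fallback cascade used by check_claims_coverage, restated (sketch)."""
    not_zayin = [c for c in claims if c.get("source_document", "") != "block-zayin"]
    tiers = (
        # 1. The appellant's claims from the appeal pleading itself (כתב ערר).
        [c for c in not_zayin
         if c.get("claim_type") == "claim" and c.get("party_role") == "appellant"],
        # 2. Appellant/respondent pleadings, excluding supplementary replies.
        [c for c in not_zayin
         if c.get("claim_type") != "reply"
         and c.get("party_role") in ("appellant", "respondent")],
        # 3. Anything outside block-zayin; 4. everything.
        not_zayin,
        claims,
    )
    return next((tier for tier in tiers if tier), [])


def coverage_passes(n_missing: int, total: int, outcome: str = "") -> bool:
    """Full acceptance tolerates 40% missing claims; otherwise 20%."""
    allowed = 0.4 if outcome == "full_acceptance" else 0.2
    return n_missing <= total * allowed
```

For example, 3 missing claims out of 10 fail the default 20% bar but pass under `full_acceptance`.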
--- a/mcp-server/src/legal_mcp/services/style_distance.py
+++ b/mcp-server/src/legal_mcp/services/style_distance.py
@@ -27,6 +27,62 @@ _BLOCK_TO_SECTION = {
    "block-yod-alef": "summary",
 }

+# chunker section_type → golden-ratio section (for corpus measurement, T10)
+_CHUNK_SECTION_TO_GOLDEN = {
+    "facts": "background", "intro": "background",
+    "appellant_claims": "claims", "respondent_claims": "claims",
+    "legal_analysis": "discussion",
+    "conclusion": "summary", "ruling": "summary",
+}
+
+_CORPUS_RATIOS_CACHE: dict | None = None
+
+
+async def measure_corpus_ratios() -> dict:
+    """Measure the ACTUAL section %-of-total in Dafna's style_corpus (T10), the
+    empirical counterpart to lessons.GOLDEN_RATIOS. Each decision is split via the
+    chunker (accurate, unlike the filtered exemplars); decisions whose mapped
+    sections total fewer than 100 words are skipped. Cached for the process.
+    Returns {bucket: {"n": int, "sections": {sec: pct}}}, where bucket is a
+    canonical outcome or "_all" (the aggregate over every parsed decision)."""
+    global _CORPUS_RATIOS_CACHE
+    if _CORPUS_RATIOS_CACHE is not None:
+        return _CORPUS_RATIOS_CACHE
+
+    from legal_mcp.services.chunker import _split_into_sections
+    from legal_mcp.services.lessons import canonical_outcome
+    pool = await db.get_pool()
+    async with pool.acquire() as conn:
+        rows = await conn.fetch("SELECT full_text, outcome FROM style_corpus WHERE full_text <> ''")
+
+    # Per-outcome AND an "_all" aggregate. style_corpus.outcome is currently
+    # unpopulated for the imported corpus, so per-outcome may be empty — "_all"
+    # is the meaningful signal today, and per-outcome becomes live once outcomes
+    # are backfilled. No silent loss: callers see which buckets have data via n.
+    by_outcome: dict[str, list[dict]] = {}
+    for r in rows:
+        sect_words: dict[str, int] = {}
+        for stype, stext in _split_into_sections(r["full_text"]):
+            g = _CHUNK_SECTION_TO_GOLDEN.get(stype)
+            if g:
+                sect_words[g] = sect_words.get(g, 0) + len(stext.split())
+        total = sum(sect_words.values())
+        if total < 100:  # sections didn't parse — skip
+            continue
+        pct = {s: w / total * 100 for s, w in sect_words.items()}
+        by_outcome.setdefault("_all", []).append(pct)
+        outcome = canonical_outcome(r["outcome"] or "")
+        if outcome:
+            by_outcome.setdefault(outcome, []).append(pct)
+
+    result: dict = {}
+    for outcome, decs in by_outcome.items():
+        avg = {}
+        for sec in ("background", "claims", "discussion", "summary"):
+            vals = [d.get(sec, 0.0) for d in decs]
+            if vals:
+                avg[sec] = round(sum(vals) / len(vals), 1)
+        result[outcome] = {"n": len(decs), "sections": avg}
+    _CORPUS_RATIOS_CACHE = result
+    return result
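The aggregation in `measure_corpus_ratios` works in three steps. It converts each decision's per-section word counts into percentages, appends them to the `"_all"` bucket and, when the outcome is known, to that outcome's bucket, and then averages each section. A section absent from a decision counts as 0%. The sketch below reproduces that arithmetic on pre-counted words. `average_section_pcts` and its input shape (a list of `(word_counts, outcome)` pairs) are assumptions for illustration; the chunker and DB steps are omitted.

```python
from __future__ import annotations

SECTIONS = ("background", "claims", "discussion", "summary")


def average_section_pcts(
    decisions: list[tuple[dict[str, int], str]], min_words: int = 100
) -> dict:
    """Average %-of-total per section, per outcome plus an "_all" aggregate."""
    buckets: dict[str, list[dict[str, float]]] = {}
    for words, outcome in decisions:
        total = sum(words.values())
        if total < min_words:  # sections did not parse well enough; skip
            continue
        pct = {s: w / total * 100 for s, w in words.items()}
        buckets.setdefault("_all", []).append(pct)
        if outcome:
            buckets.setdefault(outcome, []).append(pct)
    return {
        key: {
            "n": len(decs),
            # A section missing from a decision contributes 0% to the mean.
            "sections": {s: round(sum(d.get(s, 0.0) for d in decs) / len(decs), 1)
                         for s in SECTIONS},
        }
        for key, decs in buckets.items()
    }
```

Because absent sections count as zero rather than being dropped, a decision whose claims section failed to parse pulls the claims average down. The measurement surfaces that kind of section-detection gap.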
+

 def count_anti_patterns(text: str) -> dict:
    """Count each anti-pattern occurrence in text. Lower = closer to Dafna."""
--- a/mcp-server/src/legal_mcp/tools/drafting.py
+++ b/mcp-server/src/legal_mcp/tools/drafting.py
@@ -170,6 +170,41 @@ async def get_style_guide() -> str:
    )
    result += "\n"

+    # T10 — measured-from-corpus ratios alongside the targets, ⚠️ flags a gap
+    # (actual average outside the target range → revisit the target or the corpus).
+    try:
+        from legal_mcp.services.style_distance import measure_corpus_ratios
+        measured = await measure_corpus_ratios()
+        if measured:
+            result += "### נמדד מהקורפוס בפועל (ממוצע) — ⚠️ = פער מהיעד\n\n"
+            result += "| קבוצה | רקע | טענות | דיון | סיכום |\n|---|------|-------|------|-------|\n"
+            # Per-outcome rows (flagged vs that outcome's target), when outcomes exist.
+            for outcome in VALID_OUTCOMES:
+                m = measured.get(outcome)
+                if not m:
+                    continue
+                tgt = GOLDEN_RATIOS[outcome]
+                cells = []
+                for sec in ("background", "claims", "discussion", "summary"):
+                    val = m["sections"].get(sec)
+                    if val is None:
+                        cells.append("—")
+                        continue
+                    lo, hi = tgt[sec]
+                    cells.append(f"{val}%" + ("" if lo <= val <= hi else " ⚠️"))
+                result += f"| {outcome_labels[outcome]} (n={m['n']}) | " + " | ".join(cells) + " |\n"
+            # "_all" aggregate — the meaningful row today (corpus outcome unpopulated);
+            # shown informationally (no single target to flag against).
+            allm = measured.get("_all")
+            if allm:
+                secs = allm["sections"]
+                cells = [f"{secs[s]}%" if secs.get(s) is not None else "—"
+                         for s in ("background", "claims", "discussion", "summary")]
+                result += f"| כל ההחלטות (n={allm['n']}) | " + " | ".join(cells) + " |\n"
+            result += ("\n_⚠️ = הממוצע בפועל חורג מטווח-היעד; שקול לעדכן יעד ב-/methodology או לבדוק את הקורפוס. "
+                       "פיצול לפי-תוצאה יופיע כש-`style_corpus.outcome` יאוכלס._\n\n")
+    except Exception as e:  # surfaced, not swallowed
+        result += f"_מדידת יחסי-זהב מהקורפוס נכשלה: {e}_\n\n"
+
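The table logic above has two cases. A per-outcome cell is flagged with ⚠️ when the measured average falls outside the target `(lo, hi)` range, while the `"_all"` row is shown unflagged. The sketch below factors that into two small helpers. `flag_cell`, `table_row`, and the target ranges used in the checks are hypothetical; the real ranges live in `GOLDEN_RATIOS`.

```python
from __future__ import annotations

SECTIONS = ("background", "claims", "discussion", "summary")
MISSING = "\u2014"  # the placeholder glyph the style-guide table uses
WARN = " \u26a0\ufe0f"  # the warning marker appended to out-of-range cells


def flag_cell(val: float | None, target: tuple[float, float] | None = None) -> str:
    """Format one measured percentage; flag it if outside the target range."""
    if val is None:
        return MISSING
    if target is None:  # informational row (e.g. "_all"): no flagging
        return f"{val}%"
    lo, hi = target
    return f"{val}%" + ("" if lo <= val <= hi else WARN)


def table_row(label: str, n: int, sections: dict, targets: dict | None = None) -> str:
    """Render one Markdown table row in the style-guide layout."""
    cells = [flag_cell(sections.get(s), targets[s] if targets else None)
             for s in SECTIONS]
    return f"| {label} (n={n}) | " + " | ".join(cells) + " |\n"
```

Keeping the `"_all"` row unflagged reflects the comment in the diff: there is no single target to compare an all-outcomes average against.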
    # Opening and summary strategies
    result += "## אסטרטגיות פתיחה וסיכום לפי תוצאה\n\n"
    for outcome in VALID_OUTCOMES:
--- a/mcp-server/src/legal_mcp/tools/precedent_library.py
+++ b/mcp-server/src/legal_mcp/tools/precedent_library.py
@@ -356,7 +356,22 @@ async def halacha_review(
    return _ok(row)


-async def halachot_pending(limit: int = 100) -> str:
-    """תור ההלכות הממתינות לאישור (review_status='pending_review')."""
-    rows = await db.list_halachot(review_status="pending_review", limit=limit)
+async def halachot_pending(limit: int = 100, include_low_quality: bool = False) -> str:
+    """Queue of halachot awaiting approval (review_status='pending_review').
+
+    By default (#84.1, #84.3) the queue is **filtered**: halachot carrying any
+    quality flag (application / unverified quote / truncated / obiter / thin
+    restatement / unsupported / near-duplicate) are hidden, since they belong in
+    the 'needs extraction fix' bucket rather than the approval queue. It is also
+    **sorted by priority**: negatively treated first, then the least certain,
+    then the oldest.
+
+    Args:
+        limit: maximum number of rows.
+        include_low_quality: True to also expose quality-flagged items (the
+            'needs fix' bucket).
+    """
+    rows = await db.list_halachot(
+        review_status="pending_review",
+        limit=limit,
+        exclude_low_quality=not include_low_quality,
+        order_by_priority=True,
+    )
    return _ok(rows)
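The docstring describes both a filter and a three-level priority order. The sketch below models that in memory. Everything here is hypothetical: the real ordering is done in SQL inside `db.list_halachot(..., order_by_priority=True)`, and the field names `negatively_treated`, `confidence`, and `created_at` are illustrative assumptions, not confirmed columns.

```python
from __future__ import annotations

from datetime import datetime


def pending_queue(rows: list[dict], limit: int = 100,
                  include_low_quality: bool = False) -> list[dict]:
    """Filter + priority-sort the pending queue in memory (hypothetical fields)."""
    pending = [r for r in rows if r["review_status"] == "pending_review"]
    if not include_low_quality:
        # Any quality flag moves the item to the 'needs extraction fix' bucket.
        pending = [r for r in pending if not r.get("quality_flags")]
    pending.sort(key=lambda r: (
        not r.get("negatively_treated", False),  # False sorts first: flagged precedents lead
        r.get("confidence", 1.0),                # lower confidence = less certain = earlier
        r["created_at"],                         # then oldest first
    ))
    return pending[:limit]
```

Sorting on a tuple key applies the three criteria lexicographically, so a later criterion only breaks ties left by the earlier ones.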
--- a/mcp-server/tests/test_claude_session.py
+++ b/mcp-server/tests/test_claude_session.py
@@ -0,0 +1,44 @@
+from __future__ import annotations
+
+import os
+
+from legal_mcp.services import claude_session as cs
+
+
+def test_clean_env_strips_session_markers(monkeypatch):
+    """Nested claude -p must not inherit the parent session markers (#85)."""
+    for k in (
+        "CLAUDECODE",
+        "CLAUDE_CODE_ENTRYPOINT",
+        "CLAUDE_CODE_SESSION_ID",
+        "CLAUDE_CODE_EXECPATH",
+        "CLAUDE_CODE_SSE_PORT",
+        "CLAUDE_AGENT_SDK_VERSION",
+        "AI_AGENT",
+        "CLAUDE_EFFORT",
+    ):
+        monkeypatch.setenv(k, "x")
+
+    env = cs._clean_subprocess_env()
+
+    assert "CLAUDECODE" not in env
+    assert "AI_AGENT" not in env
+    assert "CLAUDE_EFFORT" not in env
+    assert not any(k.startswith("CLAUDE_CODE_") for k in env)
+    assert not any(k.startswith("CLAUDE_AGENT_") for k in env)
+
+
+def test_clean_env_keeps_auth_and_path(monkeypatch):
+    """Auth/config + PATH/HOME must survive — they are needed by the CLI."""
+    monkeypatch.setenv("CLAUDECODE", "1")
+    monkeypatch.setenv("CLAUDE_CONFIG_DIR", "/home/chaim/.claude")
+    monkeypatch.setenv("ANTHROPIC_BASE_URL", "https://example")
+    monkeypatch.setenv("PATH", os.environ.get("PATH", "/usr/bin"))
+
+    env = cs._clean_subprocess_env()
+
+    # CLAUDE_CONFIG_DIR carries credentials — must NOT be stripped.
+    assert env.get("CLAUDE_CONFIG_DIR") == "/home/chaim/.claude"
+    assert env.get("ANTHROPIC_BASE_URL") == "https://example"
+    assert "PATH" in env
+    assert "CLAUDECODE" not in env
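These tests pin down a simple contract: drop the session-marker variables (exact names plus the `CLAUDE_CODE_` and `CLAUDE_AGENT_` prefixes), and keep everything else, including `CLAUDE_CONFIG_DIR`, `ANTHROPIC_BASE_URL`, and `PATH`. The sketch below satisfies that contract. It is not the real `claude_session._clean_subprocess_env`. The `base` parameter is a hypothetical addition so the function can be exercised without touching `os.environ`.

```python
from __future__ import annotations

import os

# Exact names and prefixes the tests expect to be stripped.
_EXACT_MARKERS = {"CLAUDECODE", "AI_AGENT", "CLAUDE_EFFORT"}
_MARKER_PREFIXES = ("CLAUDE_CODE_", "CLAUDE_AGENT_")


def clean_subprocess_env(base: dict[str, str] | None = None) -> dict[str, str]:
    """Copy the environment without parent-session markers (sketch)."""
    src = os.environ if base is None else base
    return {
        k: v
        for k, v in src.items()
        # str.startswith accepts a tuple; CLAUDE_CONFIG_DIR matches neither prefix.
        if k not in _EXACT_MARKERS and not k.startswith(_MARKER_PREFIXES)
    }
```

An allow-by-default filter like this keeps unrelated credentials and paths intact. A stricter allowlist would be safer against new markers, but it risks dropping variables the CLI needs.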
--- a/mcp-server/tests/test_halacha_quality.py
+++ b/mcp-server/tests/test_halacha_quality.py
@@ -181,3 +181,75 @@ def test_consolidation_priority_prefers_approved_then_confidence():
                  "quote_verified": True, "rule_statement": "x"}
    # approved sorts before higher-confidence pending → kept as canonical
    assert min([approved, pending_hi], key=he._consolidation_priority)["id"] == "a"
+
+
+# ── #81.4 fact-dependent / application ──
+
+@pytest.mark.parametrize("rule", [
+    "במקרה דנן ועדת הערר קבעה כי ההיתר בטל",
+    "בענייננו אין הצדקה לפיצוי",
+    "בערר שלפנינו הוכח כי השומה שגויה",
+])
+def test_is_fact_dependent_hits(rule):
+    assert hq.is_fact_dependent(rule) is True
+
+
+@pytest.mark.parametrize("rule", [
+    "ועדת הערר מוסמכת לדון בהיטל השבחה",
+    "נטל ההוכחה מוטל על המבקש",
+    "פגיעה תכנונית מזכה בפיצוי לפי סעיף 197",
+])
+def test_is_fact_dependent_misses(rule):
+    assert hq.is_fact_dependent(rule) is False
+
+
+def test_application_flag_from_rule_type():
+    flags = hq.compute_quality_flags(
+        "נטל ההוכחה על המבקש", "נטל ההוכחה על המבקש כאמור",
+        rule_type="application",
+    )
+    assert hq.FLAG_APPLICATION in flags
+
+
+def test_application_flag_from_deixis_even_if_binding():
+    flags = hq.compute_quality_flags(
+        "במקרה דנן נדחה הערר", "כפי שקבענו במקרה דנן נדחה הערר",
+        rule_type="binding",
+    )
+    assert hq.FLAG_APPLICATION in flags
+
+
+def test_clean_binding_rule_has_no_flags():
+    flags = hq.compute_quality_flags(
+        "ועדת הערר מוסמכת לדון בטענות חוקתיות הנוגעות לתכנית",
+        "הוועדה מוסמכת לדון אף בטענות מסוג זה, ככל שהן נוגעות לתכנית שבנדון.",
+        rule_type="binding",
+    )
+    assert flags == []
+
+
+# ── #82.3 lexical near-duplicate signal ──
+
+def test_jaccard_high_for_reworded_same_rule():
+    a = "נטל ההוכחה בהיטל השבחה מוטל על הוועדה המקומית"
+    b = "נטל ההוכחה בהיטל השבחה מוטל על הוועדה המקומית בלבד"
+    assert hq.jaccard_shingles(a, b) >= 0.5
+
+
+def test_jaccard_low_for_distinct_rules():
+    a = "ועדת הערר מוסמכת לדון בהיטל השבחה"
+    b = "המועד להגשת ערר הוא שלושים יום"
+    assert hq.jaccard_shingles(a, b) < 0.2
+
+
+def test_normalized_levenshtein_identical_and_disjoint():
+    assert hq.normalized_levenshtein("אבג", "אבג") == 1.0
+    assert hq.normalized_levenshtein("", "אבג") == 0.0
+
+
+def test_lexical_near_duplicate_band():
+    a = "נטל ההוכחה בהיטל השבחה מוטל על הוועדה המקומית"
+    b = "נטל ההוכחה בהיטל השבחה מוטל על הוועדה המקומית, כך נפסק"
+    assert hq.lexical_near_duplicate(a, b) is True
+    c = "המועד להגשת ערר על שומה הוא שלושים ימים"
+    assert hq.lexical_near_duplicate(a, c) is False
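The lexical tests above constrain three helpers. The sketch below shows one way to satisfy them, under stated assumptions. Shingles are word bigrams over `\w+` tokens. Levenshtein similarity is `1 - distance / max(len)`. The near-duplicate band requires both scores to clear the `(0.55, 0.70)` operating point mentioned for `calibrate_halacha_dedup.py`. The real `halacha_quality` functions may use different shingling or combine the signals differently.

```python
from __future__ import annotations

import re


def _shingles(text: str, k: int = 2) -> set[tuple[str, ...]]:
    """Word k-grams over \\w+ tokens (punctuation is dropped)."""
    words = re.findall(r"\w+", text)
    if len(words) < k:
        return {tuple(words)} if words else set()
    return {tuple(words[i:i + k]) for i in range(len(words) - k + 1)}


def jaccard_shingles(a: str, b: str, k: int = 2) -> float:
    sa, sb = _shingles(a, k), _shingles(b, k)
    if not sa and not sb:
        return 1.0
    return len(sa & sb) / len(sa | sb)


def levenshtein(a: str, b: str) -> int:
    """Classic two-row dynamic-programming edit distance."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, 1):
        cur = [i]
        for j, cb in enumerate(b, 1):
            cur.append(min(prev[j] + 1, cur[j - 1] + 1, prev[j - 1] + (ca != cb)))
        prev = cur
    return prev[-1]


def normalized_levenshtein(a: str, b: str) -> float:
    """1.0 for identical strings, 0.0 when nothing can be aligned."""
    if not a and not b:
        return 1.0
    return 1.0 - levenshtein(a, b) / max(len(a), len(b))


def lexical_near_duplicate(a: str, b: str,
                           jaccard_min: float = 0.55,
                           levenshtein_min: float = 0.70) -> bool:
    return (jaccard_shingles(a, b) >= jaccard_min
            and normalized_levenshtein(a, b) >= levenshtein_min)
```

Requiring both signals favours precision over recall. That matches its role as a secondary signal that blocks auto-approve rather than triggering a merge.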
--- a/mcp-server/tests/test_nevo_preamble.py
+++ b/mcp-server/tests/test_nevo_preamble.py
@@ -55,3 +55,64 @@ def test_markers_past_400_chars_still_detected():
    text = header + _PREAMBLE + "השופטת ע' ארבל:\n\nגוף ההחלטה..."
    out = ex.strip_nevo_preamble(text)
    assert out.startswith("השופטת ע' ארבל:")
+
+
+# ── extract_nevo_ratio (#86.3 gold-set capture) ──
+
+def test_extract_ratio_returns_block_before_body():
+    text = _PREAMBLE + "השופט ס' ג'ובראן:\n\nגוף ההחלטה..."
+    ratio = ex.extract_nevo_ratio(text)
+    assert "העותרים לא הוכיחו טעם מיוחד" in ratio
+    assert "המחוקק הגביל את הזמן" in ratio
+    # must not bleed into the judgment body
+    assert "גוף ההחלטה" not in ratio
+    assert "השופט ס' ג'ובראן" not in ratio
+
+
+def test_extract_ratio_stops_at_following_marker():
+    # ratio first, then a bibliography marker AFTER it
+    text = (
+        "מיני-רציו:\n* עיקרון אחד בלבד.\n\n"
+        "פסקי דין שאוזכרו:\nבג\"ץ 1/00\n\n"
+        "פסק-דין\nגוף..."
+    )
+    ratio = ex.extract_nevo_ratio(text)
+    assert "עיקרון אחד בלבד" in ratio
+    assert "פסקי דין שאוזכרו" not in ratio
+    assert "בג\"ץ 1/00" not in ratio
+
+
+def test_extract_ratio_empty_when_no_marker():
+    assert ex.extract_nevo_ratio("פסק דין\nהשופט כהן: ...") == ""
+    assert ex.extract_nevo_ratio("") == ""
+
+
+# ── #86.2 over-strip regressions ──
+
+def test_citation_judge_line_is_not_a_decision_start():
+    # "השופט מ' חשין, פסקה 23" is a CITATION (comma, no colon) — must NOT be
+    # treated as the decision opening, or 32K of real body gets stripped.
+    body = (
+        "**פסק דין**\n\n"
+        "שני ערעורים לפניי. כפי שנפסק מפי כבוד \n\n"
+        "השופט מ' חשין, פסקה 23 (להלן עניין קהתי), יש לבחון...\n"
+    )
+    text = _PREAMBLE + body
+    out = ex.strip_nevo_preamble(text)
+    assert out.startswith("**פסק דין**")
+    assert "השופט מ' חשין, פסקה" in out  # citation kept inside body
+    assert "מיני-רציו" not in out
+
+
+def test_markdown_wrapped_pdin_header_is_stripped():
+    text = _PREAMBLE + "**פסק  דין**\n\nשני ערעוריה הנדונים..."
+    out = ex.strip_nevo_preamble(text)
+    assert out.startswith("**פסק  דין**")
+    assert "מיני-רציו" not in out
+
+
+def test_author_line_with_colon_still_strips():
+    text = _PREAMBLE + "כב' השופטת ד' ברק-ארז:\n\nגוף ההחלטה..."
+    out = ex.strip_nevo_preamble(text)
+    assert out.startswith("כב' השופטת ד' ברק-ארז:")
+    assert "מיני-רציו" not in out
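The ratio-extraction tests describe a bounded block. It starts at the מיני-רציו marker and ends at the first following heading: a bibliography list, a `פסק דין` header (optionally bold), or an author line ending in a colon. A line with a comma before any colon is a citation, so it must not end the block, per the #86.2 regression. The sketch below encodes those rules with regular expressions. The marker list is a hypothetical approximation, not the real `extractor` patterns.

```python
from __future__ import annotations

import re

# Hypothetical approximations of the real extractor markers.
_RATIO_START = re.compile(r"מיני[- ]?רציו\s*:?")
_RATIO_END = re.compile(
    r"^(?:"
    r"פסקי[- ]דין שאוזכרו"                    # bibliography heading
    r"|\*{0,2}פסק[- ]+דין\*{0,2}"             # judgment heading, optionally **bold**
    r"|(?:כב'\s*)?השופטת?\s[^\n,]*:[ \t]*$"   # author line: colon, no comma
    r")",
    re.MULTILINE,
)


def toy_extract_nevo_ratio(text: str) -> str:
    """Return the mini-ratio block, or "" when there is no marker (sketch)."""
    start = _RATIO_START.search(text or "")
    if not start:
        return ""
    end = _RATIO_END.search(text, start.end())
    return text[start.end():end.start() if end else len(text)].strip()
```

Excluding commas from the author-line pattern is what stops a citation such as "השופט מ' חשין, פסקה 23" from being mistaken for the decision opening.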
--- a/scripts/SCRIPTS.md
+++ b/scripts/SCRIPTS.md
@@ -36,6 +36,11 @@
 | `multimodal_backfill.py` | python | Backfill voyage-multimodal-3 page embeddings על מסמכי תיקים קיימים. idempotent (skips by default), forces `MULTIMODAL_ENABLED=true` ל-run, רץ מהקונטיינר. שלב C — ראה `docs/voyage-upgrades-plan.md` | ידני per-case (`python multimodal_backfill.py 8174-24 8137-24`) |
 | `backfill_chunk_pages.py` | python | Backfill `page_number` ב-`document_chunks` קיימים. legacy chunker לא tracked עמודים → `page_number=NULL` חוסם boost של multimodal hybrid (text+image join על אותו עמוד). re-extracts כל PDF (re-OCR אם צריך, ~$0.0015/page), מחשב page_offsets, ומעדכן chunks. idempotent | ידני per-case (`python backfill_chunk_pages.py 8174-24 8137-24`) |
 | `rechunk_legacy_precedents.py` | python | **#57** — re-chunk + re-embed פסיקה שהוטמעה לפני תיקון ה-chunker (#55). בוחר כל `case_law` עם chunk זעיר (`length(trim(content))<50` — טביעת-האצבע של ה-chunker הישן) ומריץ `ingest.reindex_case_law` (re-chunk+re-embed מ-`full_text` שמור בלבד — ללא re-OCR/LLM, feedback_no_reocr_retrofit; idempotent DELETE-then-INSERT). idempotent ברמת-הבאטץ' (שואב מחדש את הסט המושפע בכל ריצה). דגל `--limit N`. רץ עם venv של mcp-server (`cd mcp-server && .venv/bin/python ../scripts/rechunk_legacy_precedents.py`) | חד-פעמי — מיגרציית-נתונים של פסיקה legacy (תוקן 2026-06-03) |
+| `backfill_nevo_preamble.py` | python | **#86.2**. Data migration: strips the Nevo preamble/ratio that leaked into rulings ingested before the #86.1 fix. Finds every `case_law` row that `strip_nevo_preamble(full_text)` still shortens (a historical leak) and: (1) captures the mini-ratio (מיני-רציו) into `case_law.nevo_ratio` (gold-set for #86.3); (2) rewrites the stripped `full_text` and recomputes `content_hash`; (3) runs `reindex_case_law` (re-chunk+embed, no re-OCR/LLM); (4) **flags (never deletes)** halachot whose `supporting_quote` lies inside the removed preamble → `pending_review` + quality_flag `nevo_preamble_leak`. **Safety guard:** rows with keep% < `--min-keep` (default 60) are excluded from `--apply` as suspected over-strips (unless `--include-suspicious`). **Dry-run by default**; `--apply` first writes a JSON backup + CSV manifest to `data/audit/`. Idempotent. Runs with the mcp-server venv. **Chair-gated** (verify the manifest before applying) | Data migration: dry-run done (19 rulings, 27 contaminated halachot); apply awaiting approval |
+| `nevo_ratio_benchmark.py` | python | **#86.3**. Measures halacha-extraction quality against Nevo's mini-ratio (a free professional gold-set). For each ruling with `nevo_ratio` (or derived from `full_text` if the backfill has not run yet), a local LLM judge (`claude_session`, zero cost) semantically maps the system's halachot to Nevo's and reports **recall** (coverage of Nevo's halachot), **precision** (share of our halachot that map), and **granularity** (split ratio; an over-extraction signal for #81.5). `--case <num>` / `--all [--limit N]` / `--model` / `--out`. Writes CSV to `data/audit/`. Runs with the mcp-server venv (requires the local Claude CLI). Verified on בג"ץ 1764/05: recall 0.875, precision 1.0, granularity 1.75x | Manual: quality measurement (CI/ad-hoc) |
+| `halacha_goldset.py` | python | **#81.7**. Gold-set harness for halacha-extraction quality. `export --n N` exports a stratified sample (by precedent×rule_type) to CSV with empty label columns (`is_holding`/`correct_type`/`quote_complete`) for manual labeling (Chaim/Dafna). `score --in <csv>` reads the labeled CSV and scores each validator (`compute_quality_flags`/`is_fact_dependent`/`is_quote_truncated`/`is_thin_restatement`) against the human ground truth: P/R/F1 + confusion matrix. Basis for #81.8 (calibrating the approval threshold). Imports the same validators the extractor runs. Runs with the mcp-server venv | Manual: export→label→score |
+| `halacha_batch_reconcile.py` | python | **#82.7**. Offline cross-ruling dedup (conservative, **dry-run only**). Dedup-on-insert compares only within a single ruling; this script uses a stricter threshold (cosine ≥0.95, `--cosine`) and is non-destructive: it finds near-duplicate halacha pairs across different rulings (exact pgvector `<=>`) with a lexical signal (Jaccard/Levenshtein) and reports them to a CSV in `data/audit/` for the chair's review. It never skips, merges, or deletes. `--include-pending`. Runs with the mcp-server venv. Verified: 819 halachot → 5 candidate pairs | Manual: review report |
+| `calibrate_halacha_dedup.py` | python | **#82.1**. Calibrates the lexical dedup thresholds (#82.3) against the cleanup gold-set. Reads `halacha-cleanup-manifest-*.csv` (human-labeled duplicate↔survivor pairs), loads survivor text from the DB, and sweeps (jaccard_min × levenshtein_min) with P/R/F1, marking the configured operating point. Confirmed that (0.55, 0.70) → **precision 1.0** (zero false merges), recall 0.30, which suits a secondary signal that blocks auto-approve. `--manifest <path>`. Runs with the mcp-server venv | One-off: calibration (done 2026-06-06) |
 | `audit_corpus_integrity.py` | python | בדיקה תקופתית של עקביות הקורפוס — 3 בדיקות SQL read-only על `case_law` ו-`cases`: (A) `external_upload` עם prefix פנימי `ערר`/`בל"מ`; (B) `internal_committee` חסר `chair_name`/`district`; (C) `cases.practice_area` מחוץ ל-{`rishuy_uvniya`, `betterment_levy`, `compensation_197`, `''`}. כותב log מצטבר ל-`data/logs/corpus_integrity_audit.log` ובמצב הפרות שולח wakeup ל-CEO ב-Paperclip (best-effort, רק אם `PAPERCLIP_API_URL`+`PAPERCLIP_API_KEY` מוגדרים). דגל: `--no-notify`. Idempotent, יוצא 0. **Cron יומי 07:00**: `0 7 * * * /home/chaim/legal-ai/mcp-server/.venv/bin/python /home/chaim/legal-ai/scripts/audit_corpus_integrity.py` | `0 7 * * *` (cron) |
 | `backfill_legal_arguments.py` | python | Backfill `legal_arguments` לתיקים עם `claims` קיימים (TaskMaster #36). מקבץ פרופוזיציות גולמיות לטיעונים משפטיים מובחנים (~6-12 לכל צד) דרך `argument_aggregator.aggregate_claims_to_arguments` (Claude CLI). תומך `--dry-run`/`--apply`/`--force`/`--case <num>...`. **חייב לרוץ מהמכונה המקומית** (לא קונטיינר) — `claude_session` דורש Claude CLI | ידני per-case (`python scripts/backfill_legal_arguments.py --apply --case 1017-03-26`) |
 | `upload_blam_decisions.py` | python | חד-פעמי (2026-05-26) — העלאת 2 החלטות בל"מ ל-`case_law` (8126/24 סופר נוח, 8047/23 הרנון) דרך `ingest_internal_decision` ישיר, עוקף MCP server שטרם נטען מחדש אחרי הוספת `proceeding_type`. **לא להריץ שוב** | חד-פעמי — להעביר ל-`.archive/` בהזדמנות |
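The `backfill_nevo_preamble.py` entry describes two guards: a keep% threshold that excludes suspected over-strips from `--apply`, and a contamination test for halachot whose quote lies entirely inside the removed preamble. The sketch below isolates both rules. `plan_apply`, `contaminated`, and `_norm` are hypothetical helpers. `_norm` only collapses whitespace, a simplified stand-in for `halacha_quality.normalize_text`.

```python
from __future__ import annotations

import re


def _norm(s: str) -> str:
    """Hypothetical stand-in for normalize_text: collapse whitespace."""
    return re.sub(r"\s+", " ", s).strip()


def keep_pct(full: str, stripped: str) -> int:
    return round(100 * len(stripped) / len(full)) if full else 0


def plan_apply(hits: list[dict], min_keep: int = 60,
               include_suspicious: bool = False) -> list[dict]:
    """Mark rows keeping < min_keep% as suspicious; exclude them unless forced."""
    for h in hits:
        h["keep_pct"] = keep_pct(h["full_text"], h["stripped"])
        h["suspicious"] = h["keep_pct"] < min_keep
    return hits if include_suspicious else [h for h in hits if not h["suspicious"]]


def contaminated(quotes: list[str], removed: str, min_len: int = 20) -> list[str]:
    """Quotes (>= min_len chars) that sit entirely inside the removed preamble."""
    region = _norm(removed)
    if not region:
        return []
    return [q for q in quotes if len(_norm(q)) >= min_len and _norm(q) in region]
```

The minimum quote length keeps short, generic phrases from matching the preamble by coincidence. Matched halachot are flagged for review rather than deleted.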
--- a/scripts/backfill_nevo_preamble.py
+++ b/scripts/backfill_nevo_preamble.py
@@ -0,0 +1,240 @@
+#!/usr/bin/env python3
+"""#86.2 — backfill: strip leaked Nevo preamble/ratio from already-ingested rulings.
+
+Court rulings ingested BEFORE the #86.1 fix kept their Nevo preamble
+(bibliography + מיני-רציו) because the old ``_DECISION_START`` regex only
+matched ועדת-ערר openings, not ``פסק-דין``/judge openings. For those rows the
+preamble is baked into the stored ``full_text`` AND into the chunks — and the
+מיני-רציו (Nevo's editorial answer-key) may have leaked into extracted
+halachot, contaminating the corpus.
+
+This script finds every case_law row whose stored ``full_text`` would still be
+shortened by the CURRENT ``strip_nevo_preamble`` (i.e. a pre-fix leak), and:
+
+  1. captures the מיני-רציו into ``case_law.nevo_ratio`` (gold-set for #86.3),
+     unless that column is already populated;
+  2. rewrites ``full_text`` to the stripped body + recomputes ``content_hash``;
+  3. re-chunks + re-embeds via ``ingest.reindex_case_law`` (no re-OCR, no LLM);
+  4. flags — never deletes — halachot whose supporting_quote lives entirely in
+     the removed preamble region: review_status -> 'pending_review' plus a
+     'nevo_preamble_leak' quality_flag, so the chair can re-judge them (#84).
+
+DRY-RUN BY DEFAULT. ``--apply`` performs the migration and first writes a JSON
+backup + CSV manifest to ``data/audit/`` (per the code-protocol data-migration
+rule). Idempotent: a re-run finds nothing because stripped rows no longer match.
+
+Run with the MCP server venv (config loads ~/.env / Infisical for POSTGRES +
+VOYAGE, same as the live MCP tools):
+
+    cd ~/legal-ai/mcp-server
+    .venv/bin/python ../scripts/backfill_nevo_preamble.py            # dry-run
+    .venv/bin/python ../scripts/backfill_nevo_preamble.py --apply    # migrate
+    .venv/bin/python ../scripts/backfill_nevo_preamble.py --limit 3  # smoke
+"""
+from __future__ import annotations
+
+import argparse
+import asyncio
+import csv
+import json
+import sys
+from datetime import datetime, timezone
+from pathlib import Path
+
+from legal_mcp.services import db, ingest
+from legal_mcp.services.extractor import extract_nevo_ratio, strip_nevo_preamble
+from legal_mcp.services.halacha_quality import normalize_text
+
+REPO_ROOT = Path(__file__).resolve().parent.parent
+AUDIT_DIR = REPO_ROOT / "data" / "audit"
+
+# Safety: a clean strip removes only the Nevo preamble (a small head). If the
+# strip would discard more than this fraction of the document, treat it as a
+# suspected over-strip (a citation/heading false-match) and DO NOT auto-apply
+# — surface it for manual review instead. Destroying real decision body is
+# far worse than leaving a preamble in place.
+DEFAULT_MIN_KEEP_PCT = 60
+
+
+async def _scan(conn, limit: int | None) -> list[dict]:
+    """Return rows whose stored full_text still carries a Nevo preamble."""
+    rows = await conn.fetch(
+        "SELECT id, case_number, full_text, nevo_ratio "
+        "FROM case_law WHERE full_text <> '' ORDER BY case_number"
+    )
+    hits: list[dict] = []
+    for r in rows:
+        full = r["full_text"] or ""
+        stripped = strip_nevo_preamble(full)
+        if stripped == full:
+            continue  # no leak (already clean, or never had a preamble)
+        removed = full[: len(full) - len(stripped)]
+        ratio = extract_nevo_ratio(full)
+        keep_pct = round(100 * len(stripped) / len(full)) if full else 0
+        hits.append({
+            "id": r["id"],
+            "case_number": r["case_number"],
+            "full_text": full,
+            "stripped": stripped,
+            "removed": removed,
+            "ratio": ratio,
+            "keep_pct": keep_pct,
+            "had_ratio_stored": bool((r["nevo_ratio"] or "").strip()),
+        })
+        if limit and len(hits) >= limit:
+            break
+    return hits
+
+
+async def _contaminated_halachot(conn, case_law_id, removed: str) -> list[dict]:
+    """Halachot whose supporting_quote sits entirely inside the removed preamble."""
+    norm_removed = normalize_text(removed)
+    if not norm_removed:
+        return []
+    rows = await conn.fetch(
+        "SELECT id, halacha_index, supporting_quote, review_status, quality_flags "
+        "FROM halachot WHERE case_law_id = $1",
+        case_law_id,
+    )
+    bad = []
+    for r in rows:
+        q = normalize_text(r["supporting_quote"] or "")
+        if len(q) >= 20 and q in norm_removed:
+            bad.append(dict(r))
+    return bad
+
+
+async def main(args: argparse.Namespace) -> int:
+    ts = datetime.now(timezone.utc).strftime("%Y%m%dT%H%M%SZ")
+    pool = await db.get_pool()
+    async with pool.acquire() as conn:
+        hits = await _scan(conn, args.limit)
+        for h in hits:
+            h["contaminated"] = await _contaminated_halachot(conn, h["id"], h["removed"])
+
+    # Partition into safe (auto-appliable) vs suspicious (manual review).
+    for h in hits:
+        h["suspicious"] = h["keep_pct"] < args.min_keep
+    safe = [h for h in hits if not h["suspicious"]]
+    suspicious = [h for h in hits if h["suspicious"]]
+
+    n = len(hits)
+    total_contam = sum(len(h["contaminated"]) for h in hits)
+    print(f"leaked rulings found: {n}  (contaminated halachot: {total_contam}; "
+          f"safe: {len(safe)}, suspicious<{args.min_keep}%: {len(suspicious)})", flush=True)
+    for h in hits:
+        print(
+            f"  {'⚠ ' if h['suspicious'] else '  '}{h['case_number']}: "
+            f"keep {h['keep_pct']}%, -{len(h['removed']):,} preamble chars, "
+            f"ratio={len(h['ratio'])} chars, "
+            f"{len(h['contaminated'])} contaminated halachot"
+            + ("" if h["ratio"] else "  [no mini-ratio]")
+            + ("  [ratio already stored]" if h["had_ratio_stored"] else ""),
+            flush=True,
+        )
+    if suspicious:
+        print(f"\n⚠ {len(suspicious)} ruling(s) below {args.min_keep}% keep — "
+              "EXCLUDED from --apply (suspected over-strip). Review manually or "
+              "pass --include-suspicious to force.", flush=True)
+
+    if not hits:
+        print("nothing to backfill — corpus clean ✓", flush=True)
+        return 0
+
+    apply_set = hits if args.include_suspicious else safe
+
+    # Always write a manifest (dry-run included) for the audit trail.
+    AUDIT_DIR.mkdir(parents=True, exist_ok=True)
+    manifest = AUDIT_DIR / f"nevo-backfill-manifest-{ts}.csv"
+    with manifest.open("w", encoding="utf-8", newline="") as f:
+        w = csv.writer(f)
+        w.writerow(["case_law_id", "case_number", "keep_pct", "preamble_chars",
+                    "ratio_chars", "contaminated_halachot", "suspicious", "applied"])
+        for h in hits:
+            will_apply = args.apply and (not h["suspicious"] or args.include_suspicious)
+            w.writerow([h["id"], h["case_number"], h["keep_pct"], len(h["removed"]),
+                        len(h["ratio"]), len(h["contaminated"]), h["suspicious"], will_apply])
+    print(f"manifest: {manifest}", flush=True)
+
+    if not args.apply:
+        print("\nDRY-RUN — no changes written. Re-run with --apply to migrate.", flush=True)
+        return 0
+
+    # Backup the BEFORE state before mutating anything.
+    backup = AUDIT_DIR / f"nevo-backfill-backup-{ts}.json"
+    with backup.open("w", encoding="utf-8") as f:
+        json.dump([
+            {
+                "id": str(h["id"]),
+                "case_number": h["case_number"],
+                "full_text": h["full_text"],
+                "ratio": h["ratio"],
+                "contaminated": [
+                    {"id": str(c["id"]), "halacha_index": c["halacha_index"],
+                     "review_status": c["review_status"],
+                     "quality_flags": list(c["quality_flags"] or [])}
+                    for c in h["contaminated"]
+                ],
+            }
+            for h in apply_set
+        ], f, ensure_ascii=False, indent=2)
+    print(f"backup: {backup}", flush=True)
+
+    n_apply = len(apply_set)
+    ok, failed = 0, []
+    for i, h in enumerate(apply_set, 1):
+        cid, cn = h["id"], h["case_number"]
+        try:
+            async with pool.acquire() as conn:
+                async with conn.transaction():
+                    # 1+2: rewrite full_text + content_hash; store ratio if absent.
+                    await conn.execute(
+                        "UPDATE case_law SET full_text = $2, content_hash = $3 WHERE id = $1",
+                        cid, h["stripped"], db._content_hash(h["stripped"]),
+                    )
+                    if h["ratio"] and not h["had_ratio_stored"]:
+                        await conn.execute(
+                            "UPDATE case_law SET nevo_ratio = $2 WHERE id = $1",
+                            cid, h["ratio"],
+                        )
+                    # 4: flag (never delete) contaminated halachot.
+                    for c in h["contaminated"]:
+                        flags = list(c["quality_flags"] or [])
+                        if "nevo_preamble_leak" not in flags:
+                            flags.append("nevo_preamble_leak")
+                        await conn.execute(
+                            "UPDATE halachot SET review_status = 'pending_review', "
+                            "quality_flags = $2 WHERE id = $1",
+                            c["id"], flags,
+                        )
+            # 3: reindex outside the txn (its own DELETE-then-INSERT + embeddings).
+            res = await ingest.reindex_case_law(cid)
+            ok += 1
+            print(f"[{i}/{n_apply}] OK  {cn}: -> {res['chunks']} chunks, "
+                  f"{len(h['contaminated'])} halachot flagged", flush=True)
+        except Exception as e:  # noqa: BLE001 — per-row, keep going
+            failed.append((cn, str(e)))
+            print(f"[{i}/{n_apply}] FAIL {cn}: {e}", flush=True)
+
+    print(f"\nDONE — {ok}/{n_apply} migrated, {len(failed)} failed"
+          + (f", {len(suspicious)} suspicious skipped" if suspicious and not args.include_suspicious else ""),
+          flush=True)
+    for cn, e in failed:
+        print(f"  FAILED {cn}: {e}", flush=True)
+    return 0 if not failed else 1
+
+
+if __name__ == "__main__":
+    ap = argparse.ArgumentParser(description=__doc__,
+                                 formatter_class=argparse.RawDescriptionHelpFormatter)
+    ap.add_argument("--apply", action="store_true",
+                    help="perform the migration (default: dry-run)")
+    ap.add_argument("--limit", type=int, default=None,
+                    help="process only the first N leaked rulings")
+    ap.add_argument("--min-keep", type=int, default=DEFAULT_MIN_KEEP_PCT,
+                    help=f"min%% of doc that must remain after strip to auto-apply "
+                         f"(default {DEFAULT_MIN_KEEP_PCT}); lower = suspected over-strip")
+    ap.add_argument("--include-suspicious", action="store_true",
+                    help="force --apply on rows below --min-keep (use with care)")
+    args = ap.parse_args()
+    sys.exit(asyncio.run(main(args)))
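The backfill's selection rule can be checked on its own. It decides which hits are applied and what the manifest's `applied` column records. Below is a minimal sketch. It assumes `keep_pct` is the share of the original `full_text` that survives stripping; that value is computed earlier in the script and is not shown in this hunk. `plan_backfill` is a hypothetical helper, not part of the script.

```python
def keep_pct(full_text: str, stripped: str) -> int:
    """Assumed definition: % of the original text left after the strip."""
    if not full_text:
        return 100
    return round(100 * len(stripped) / len(full_text))


def plan_backfill(hits: list, min_keep: int, apply: bool,
                  include_suspicious: bool) -> tuple:
    """Hypothetical helper mirroring the script's safe/suspicious split."""
    for h in hits:
        h["keep_pct"] = keep_pct(h["full_text"], h["stripped"])
        # Strictly below --min-keep means a suspected over-strip.
        h["suspicious"] = h["keep_pct"] < min_keep
        # Same predicate the manifest uses for its "applied" column.
        h["applied"] = apply and (not h["suspicious"] or include_suspicious)
    safe = [h for h in hits if not h["suspicious"]]
    suspicious = [h for h in hits if h["suspicious"]]
    apply_set = hits if include_suspicious else safe
    return apply_set, suspicious


hits = [
    {"case_number": "A", "full_text": "x" * 100, "stripped": "x" * 90},
    {"case_number": "B", "full_text": "x" * 100, "stripped": "x" * 20},
]
apply_set, suspicious = plan_backfill(hits, min_keep=50, apply=True,
                                      include_suspicious=False)
print([h["case_number"] for h in apply_set])   # ['A']
print([h["case_number"] for h in suspicious])  # ['B']
```

The manifest records every hit, including suspicious ones that were skipped. That property is worth preserving, because the audit trail then shows what was deliberately left out as well as what changed.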
--- a/scripts/calibrate_halacha_dedup.py
+++ b/scripts/calibrate_halacha_dedup.py
@@ -0,0 +1,115 @@
+#!/usr/bin/env python3
+"""#82.1 — calibrate the lexical dedup thresholds against the cleanup gold-set.
+
+The 2026-06-03 cleanup manifest (data/audit/halacha-cleanup-manifest-*.csv)
+records, for each removed halacha, a ``reason`` and a ``survivor_id`` — i.e. a
+human-labeled set of TRUE duplicate pairs (deleted rule ↔ its survivor). This
+script uses them to validate the lexical near-duplicate thresholds introduced
+in #82.3 (``HALACHA`` Jaccard/Levenshtein), so the numbers in
+``halacha_quality.lexical_near_duplicate`` are calibrated, not guessed.
+
+It sweeps (jaccard_min × levenshtein_min) and reports precision/recall against:
+  * positives — duplicate-labeled pairs (deleted rule ↔ survivor rule)
+  * negatives — deterministically-spread pairs of rules from the same manifest
+    (no RNG; identical texts skipped; assumed ≈all non-duplicates)
+
+and marks the currently-configured operating point.
+
+    cd ~/legal-ai/mcp-server
+    .venv/bin/python ../scripts/calibrate_halacha_dedup.py \
+        --manifest ../data/audit/halacha-cleanup-manifest-20260603T101747Z.csv
+"""
+from __future__ import annotations
+
+import argparse
+import asyncio
+import csv
+import sys
+from pathlib import Path
+from uuid import UUID
+
+from legal_mcp.services import db, halacha_quality as hq
+
+
+async def _survivor_text(survivor_id: str, manifest_map: dict) -> str:
+    if survivor_id in manifest_map:
+        return manifest_map[survivor_id]
+    try:
+        row = await db.get_halacha(UUID(survivor_id)) if hasattr(db, "get_halacha") else None
+    except Exception:
+        row = None
+    if row:
+        return row.get("rule_statement", "")
+    # fallback: direct query
+    try:
+        pool = await db.get_pool()
+        r = await pool.fetchrow("SELECT rule_statement FROM halachot WHERE id = $1", UUID(survivor_id))
+        return r["rule_statement"] if r else ""
+    except Exception:
+        return ""
+
+
+async def main(args: argparse.Namespace) -> int:
+    path = Path(args.manifest)
+    if not path.is_absolute():
+        path = (Path.cwd() / path).resolve()
+    with path.open(encoding="utf-8") as f:
+        rows = list(csv.DictReader(f))
+    by_id = {r["id"]: r.get("rule_statement", "") for r in rows}
+
+    positives: list[tuple[str, str]] = []
+    for r in rows:
+        if "duplicate" in (r.get("reason") or "").lower() and r.get("survivor_id"):
+            a = r.get("rule_statement", "")
+            b = await _survivor_text(r["survivor_id"], by_id)
+            if a and b:
+                positives.append((a, b))
+
+    # negatives: pair manifest rules by a deterministic index spread (no RNG),
+    # skipping identical texts and any pair already labeled as a duplicate.
+    rules = [r.get("rule_statement", "") for r in rows if r.get("rule_statement")]
+    labeled = {frozenset(p) for p in positives}
+    negatives: list[tuple[str, str]] = []
+    for i in range(len(positives)):
+        a = rules[i % len(rules)]
+        b = rules[(i * 7 + 3) % len(rules)]
+        if a and b and a != b and frozenset((a, b)) not in labeled:
+            negatives.append((a, b))
+
+    print(f"positives (labeled dup pairs): {len(positives)}  "
+          f"negatives: {len(negatives)}", flush=True)
+    if not positives:
+        print("no labeled duplicate pairs found in manifest — cannot calibrate", flush=True)
+        return 1
+
+    # precompute lexical scores per pair
+    def scores(pairs):
+        return [(hq.jaccard_shingles(a, b), hq.normalized_levenshtein(a, b)) for a, b in pairs]
+    pos_s, neg_s = scores(positives), scores(negatives)
+
+    print(f"\n{'jac_min':>8}{'lev_min':>8}{'P':>8}{'R':>8}{'F1':>8}", flush=True)
+    best = None
+    for jm in (0.40, 0.45, 0.50, 0.55, 0.60, 0.65, 0.70):
+        for lm in (0.60, 0.65, 0.70, 0.75, 0.80, 0.85):
+            tp = sum(1 for j, l in pos_s if j >= jm or l >= lm)
+            fp = sum(1 for j, l in neg_s if j >= jm or l >= lm)
+            fn = len(pos_s) - tp
+            p = tp / (tp + fp) if (tp + fp) else 0.0
+            r = tp / (tp + fn) if (tp + fn) else 0.0
+            f1 = 2 * p * r / (p + r) if (p + r) else 0.0
+            mark = "  <- configured" if (abs(jm - hq._LEX_JACCARD_MIN) < 1e-9
+                                         and abs(lm - hq._LEX_LEVENSHTEIN_MIN) < 1e-9) else ""
+            print(f"{jm:>8.2f}{lm:>8.2f}{p:>8.3f}{r:>8.3f}{f1:>8.3f}{mark}", flush=True)
+            if best is None or f1 > best[0]:
+                best = (f1, jm, lm, p, r)
+    print(f"\nbest F1={best[0]:.3f} at jaccard_min={best[1]}, levenshtein_min={best[2]} "
+          f"(P={best[3]:.3f}, R={best[4]:.3f})", flush=True)
+    print("note: positives may include obiter/application cuts (not pure dups); "
+          "use precision as the guard against false-merges.", flush=True)
+    return 0
+
+
+if __name__ == "__main__":
+    ap = argparse.ArgumentParser(description=__doc__,
+                                 formatter_class=argparse.RawDescriptionHelpFormatter)
+    ap.add_argument("--manifest", required=True, help="path to halacha-cleanup-manifest-*.csv")
+    args = ap.parse_args()
+    sys.exit(asyncio.run(main(args)))
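The sweep's scoring rule can be factored out and unit-tested without a manifest or a database. The sketch below restates the loop in `main`. A pair counts as a predicted duplicate when either score clears its threshold, which is the script's OR rule. This hunk does not show whether `halacha_quality.lexical_near_duplicate` combines the two scores the same way. The names `sweep` and `prf` are illustrative.

```python
def prf(tp: int, fp: int, fn: int) -> tuple:
    """Precision, recall and F1, with 0.0 for empty denominators."""
    p = tp / (tp + fp) if (tp + fp) else 0.0
    r = tp / (tp + fn) if (tp + fn) else 0.0
    f1 = 2 * p * r / (p + r) if (p + r) else 0.0
    return p, r, f1


def sweep(pos_s, neg_s, jac_grid, lev_grid):
    """pos_s / neg_s: (jaccard, levenshtein) score pairs for labeled
    duplicates and non-duplicates. Returns every grid row and the best-F1 row."""
    rows = []
    for jm in jac_grid:
        for lm in lev_grid:
            # Predicted duplicate when EITHER score clears its threshold.
            tp = sum(1 for jac, lev in pos_s if jac >= jm or lev >= lm)
            fp = sum(1 for jac, lev in neg_s if jac >= jm or lev >= lm)
            p, r, f1 = prf(tp, fp, len(pos_s) - tp)
            rows.append((jm, lm, p, r, f1))
    # max() keeps the first row on ties, like the script's strict `>`.
    best = max(rows, key=lambda row: row[4])
    return rows, best


pos_s = [(0.9, 0.9), (0.5, 0.7)]
neg_s = [(0.3, 0.5), (0.6, 0.4)]
rows, best = sweep(pos_s, neg_s, (0.5, 0.7), (0.6, 0.8))
print(best)  # (0.7, 0.6, 1.0, 1.0, 1.0)
```

Because the rule is an OR, lowering either threshold can only add predicted duplicates. This is why the script's closing note treats precision as the guard against false merges.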
--- a/scripts/halacha_batch_reconcile.py
+++ b/scripts/halacha_batch_reconcile.py
@@ -0,0 +1,106 @@
+#!/usr/bin/env python3
+"""#82.7 — offline CROSS-precedent halacha dedup (conservative, dry-run reporter).
+
+Dedup-on-insert (db.store_halachot_for_chunk) only compares within a single
+precedent — the 2026-06-03 audit showed cosine ≥0.90 is reliable only
+within-precedent. Across precedents the same principle legitimately recurs, so
+this batch job is deliberately STRICTER (cosine ≥0.95) and NON-DESTRUCTIVE: it
+only reports candidate cross-precedent near-duplicate pairs to a CSV for the
+chair to review. Nothing is skipped, merged, or deleted.
+
+For each halacha, its single nearest neighbour in ANOTHER precedent is found with
+pgvector's cosine-distance operator (``<=>``, distance = 1 - cosine), so the
+threshold is applied as ``dist <= 1 - cosine``. A secondary lexical check
+(Jaccard/Levenshtein) is reported alongside so the reviewer can tell "same rule"
+from "same topic".
+
+    cd ~/legal-ai/mcp-server
+    .venv/bin/python ../scripts/halacha_batch_reconcile.py            # cosine ≥0.95
+    .venv/bin/python ../scripts/halacha_batch_reconcile.py --cosine 0.97
+"""
+from __future__ import annotations
+
+import argparse
+import asyncio
+import csv
+import sys
+from datetime import datetime, timezone
+from pathlib import Path
+
+from legal_mcp.services import db, halacha_quality as hq
+
+REPO_ROOT = Path(__file__).resolve().parent.parent
+AUDIT_DIR = REPO_ROOT / "data" / "audit"
+
+
+async def main(args: argparse.Namespace) -> int:
+    cosine = args.cosine
+    max_dist = 1.0 - cosine
+    statuses = ("approved", "published") if not args.include_pending else (
+        "approved", "published", "pending_review")
+
+    pool = await db.get_pool()
+    async with pool.acquire() as conn:
+        rows = await conn.fetch(
+            "SELECT h.id, h.case_law_id, cl.case_number, h.rule_statement "
+            "FROM halachot h JOIN case_law cl ON cl.id = h.case_law_id "
+            "WHERE h.embedding IS NOT NULL AND h.review_status = ANY($1::text[]) "
+            "ORDER BY h.case_law_id, h.halacha_index",
+            list(statuses),
+        )
+        print(f"scanning {len(rows)} halachot for cross-precedent pairs "
+              f"(cosine ≥ {cosine})...", flush=True)
+
+        seen: set[frozenset] = set()
+        pairs: list[dict] = []
+        for r in rows:
+            # nearest neighbor in a DIFFERENT precedent
+            nb = await conn.fetchrow(
+                "SELECT h2.id, cl2.case_number, h2.rule_statement, "
+                "       (h2.embedding <=> (SELECT embedding FROM halachot WHERE id = $1)) AS dist "
+                "FROM halachot h2 JOIN case_law cl2 ON cl2.id = h2.case_law_id "
+                "WHERE h2.embedding IS NOT NULL AND h2.case_law_id <> $2 "
+                "      AND h2.review_status = ANY($3::text[]) "
+                "ORDER BY h2.embedding <=> (SELECT embedding FROM halachot WHERE id = $1) "
+                "LIMIT 1",
+                r["id"], r["case_law_id"], list(statuses),
+            )
+            if nb is None or float(nb["dist"]) > max_dist:
+                continue
+            key = frozenset({str(r["id"]), str(nb["id"])})
+            if key in seen:
+                continue
+            seen.add(key)
+            pairs.append({
+                "case_a": r["case_number"], "id_a": r["id"], "rule_a": r["rule_statement"],
+                "case_b": nb["case_number"], "id_b": nb["id"], "rule_b": nb["rule_statement"],
+                "cosine": round(1.0 - float(nb["dist"]), 4),
+                "jaccard": round(hq.jaccard_shingles(r["rule_statement"], nb["rule_statement"]), 3),
+                "levenshtein": round(hq.normalized_levenshtein(r["rule_statement"], nb["rule_statement"]), 3),
+            })
+
+    pairs.sort(key=lambda p: -p["cosine"])
+    print(f"found {len(pairs)} cross-precedent candidate pair(s)", flush=True)
+    for p in pairs[:30]:
+        print(f"  cos={p['cosine']} jac={p['jaccard']} lev={p['levenshtein']}  "
+              f"{p['case_a']} ↔ {p['case_b']}: {p['rule_a'][:60]}...", flush=True)
+
+    if pairs:
+        ts = datetime.now(timezone.utc).strftime("%Y%m%dT%H%M%SZ")
+        AUDIT_DIR.mkdir(parents=True, exist_ok=True)
+        out = AUDIT_DIR / f"halacha-cross-precedent-{ts}.csv"
+        with out.open("w", encoding="utf-8", newline="") as f:
+            w = csv.DictWriter(f, fieldnames=list(pairs[0].keys()))
+            w.writeheader()
+            w.writerows(pairs)
+        print(f"\nreport: {out}  (review-only — nothing changed)", flush=True)
+    return 0
+
+
+if __name__ == "__main__":
+    ap = argparse.ArgumentParser(description=__doc__,
+                                 formatter_class=argparse.RawDescriptionHelpFormatter)
+    ap.add_argument("--cosine", type=float, default=0.95,
+                    help="min cosine for a cross-precedent candidate (default 0.95)")
+    ap.add_argument("--include-pending", action="store_true",
+                    help="also scan pending_review halachot (default: approved/published only)")
+    args = ap.parse_args()
+    sys.exit(asyncio.run(main(args)))
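The pairing logic can be exercised without Postgres. For each halacha the script takes its nearest neighbour in a different precedent, keeps it when `1 - distance >= --cosine`, and reports each unordered pair once. The sketch below shows that logic over in-memory vectors. It is a brute-force stand-in for the pgvector query, and `cross_precedent_pairs` is a hypothetical name. It assumes every embedding is non-zero.

```python
import math


def cosine_distance(u, v) -> float:
    """1 - cosine similarity, the quantity pgvector's `<=>` returns.
    Assumes non-zero vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return 1.0 - dot / (norm_u * norm_v)


def cross_precedent_pairs(items, min_cosine: float):
    """items: (halacha_id, precedent_id, embedding) triples."""
    max_dist = 1.0 - min_cosine
    seen = set()
    pairs = []
    for hid, prec, emb in items:
        # Nearest neighbour restricted to OTHER precedents.
        candidates = [(cosine_distance(emb, emb2), hid2)
                      for hid2, prec2, emb2 in items if prec2 != prec]
        if not candidates:
            continue
        dist, nb = min(candidates)
        if dist > max_dist:
            continue
        key = frozenset({hid, nb})  # unordered: a<->b and b<->a are one pair
        if key in seen:
            continue
        seen.add(key)
        pairs.append((hid, nb, round(1.0 - dist, 4)))
    pairs.sort(key=lambda p: -p[2])
    return pairs


items = [
    ("a1", "A", [1.0, 0.0]),
    ("a2", "A", [0.0, 1.0]),
    ("b1", "B", [1.0, 0.01]),
    ("c1", "C", [0.0, 1.0]),
]
print(cross_precedent_pairs(items, min_cosine=1.0))  # [('a2', 'c1', 1.0)]
```

Keeping only the single nearest neighbour is conservative. A halacha that nearly duplicates rules in two other precedents contributes one pair, not two, which suits a review-only report.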
--- a/scripts/halacha_goldset.py
+++ b/scripts/halacha_goldset.py
@@ -0,0 +1,149 @@
+#!/usr/bin/env python3
+"""#81.7 — gold-set harness for halacha-extraction quality.
+
+Two modes — the human tagging in between is the only manual step:
+
+  export  — dump a stratified sample of halachot to a CSV with EMPTY label
+            columns for חיים/דפנה to fill (is_holding, correct_type,
+            quote_complete). Stratified across precedents and rule_types so
+            the set isn't dominated by one ruling.
+
+  score   — read the tagged CSV back and measure each pure validator
+            (compute_quality_flags / is_fact_dependent / is_quote_truncated /
+            is_thin_restatement) against the human labels: precision, recall,
+            F1 per validator + a confusion summary. This is the ground-truth
+            #81.8 needs to recalibrate the auto-approve threshold.
+
+The validators here are the SAME ones the live extractor runs, imported
+directly — so the score reflects production behavior, not a reimplementation.
+
+    cd ~/legal-ai/mcp-server
+    .venv/bin/python ../scripts/halacha_goldset.py export --n 150
+    #   ... חיים/דפנה fill is_holding / correct_type / quote_complete ...
+    .venv/bin/python ../scripts/halacha_goldset.py score --in ../data/audit/halacha-goldset-<ts>.csv
+"""
+from __future__ import annotations
+
+import argparse
+import asyncio
+import csv
+import sys
+from collections import defaultdict
+from datetime import datetime, timezone
+from pathlib import Path
+
+from legal_mcp.services import db, halacha_quality as hq
+
+REPO_ROOT = Path(__file__).resolve().parent.parent
+AUDIT_DIR = REPO_ROOT / "data" / "audit"
+
+# Columns the human fills. is_holding: 1 if a real generalizable holding, 0 if
+# obiter/application/fact-recitation/non-rule. correct_type: binding/interpretive/
+# obiter/application. quote_complete: 1 if the quote is a whole, untruncated span.
+LABEL_COLS = ["is_holding", "correct_type", "quote_complete"]
+EXPORT_COLS = [
+    "id", "case_number", "halacha_index", "rule_type", "review_status",
+    "confidence", "rule_statement", "supporting_quote", *LABEL_COLS,
+]
+
+
+async def _export(n: int) -> int:
+    rows = await db.list_halachot(limit=5000)
+    # stratify: round-robin across (case_law_id, rule_type) buckets.
+    buckets: dict = defaultdict(list)
+    for r in rows:
+        buckets[(r["case_law_id"], r.get("rule_type"))].append(r)
+    sample: list[dict] = []
+    bucket_lists = list(buckets.values())
+    i = 0
+    while len(sample) < n and any(bucket_lists):
+        b = bucket_lists[i % len(bucket_lists)]
+        if b:
+            sample.append(b.pop())
+        i += 1
+        if i > n * 50:
+            break
+    ts = datetime.now(timezone.utc).strftime("%Y%m%dT%H%M%SZ")
+    AUDIT_DIR.mkdir(parents=True, exist_ok=True)
+    out = AUDIT_DIR / f"halacha-goldset-{ts}.csv"
+    with out.open("w", encoding="utf-8", newline="") as f:
+        w = csv.DictWriter(f, fieldnames=EXPORT_COLS, extrasaction="ignore")
+        w.writeheader()
+        for r in sample:
+            w.writerow({**{k: r.get(k, "") for k in EXPORT_COLS},
+                        **{lc: "" for lc in LABEL_COLS}})
+    print(f"exported {len(sample)} halachot for tagging → {out}", flush=True)
+    print(f"fill columns: {', '.join(LABEL_COLS)} (is_holding/quote_complete = 1/0)", flush=True)
+    return 0
+
+
+def _prf(tp: int, fp: int, fn: int) -> tuple[float, float, float]:
+    p = tp / (tp + fp) if (tp + fp) else 0.0
+    r = tp / (tp + fn) if (tp + fn) else 0.0
+    f1 = 2 * p * r / (p + r) if (p + r) else 0.0
+    return round(p, 3), round(r, 3), round(f1, 3)
+
+
+def _score(path: Path) -> int:
+    with path.open(encoding="utf-8") as f:
+        rows = [r for r in csv.DictReader(f) if (r.get("is_holding") or "").strip() != ""]
+    if not rows:
+        print("no labeled rows (is_holding empty everywhere) — nothing to score", flush=True)
+        return 1
+
+    # A validator FLAG is a prediction of "NOT a clean holding" (should be
+    # rejected/reviewed). Ground truth NOT-holding = is_holding == 0.
+    # We score each validator as a detector of not-holding.
+    counters: dict[str, dict[str, int]] = defaultdict(lambda: {"tp": 0, "fp": 0, "fn": 0, "tn": 0})
+
+    def tally(name: str, predicted_bad: bool, truly_bad: bool):
+        c = counters[name]
+        if predicted_bad and truly_bad:
+            c["tp"] += 1
+        elif predicted_bad and not truly_bad:
+            c["fp"] += 1
+        elif not predicted_bad and truly_bad:
+            c["fn"] += 1
+        else:
+            c["tn"] += 1
+
+    for r in rows:
+        rule = r.get("rule_statement", "")
+        quote = r.get("supporting_quote", "")
+        rtype = r.get("rule_type", "binding")
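+        # Blank quote_complete defaults to complete. Labels are case-insensitive,
+        # so spreadsheet exports of TRUE/FALSE are read correctly.
+        quote_complete = (r.get("quote_complete") or "").strip().lower() not in ("0", "false")
+        truly_not_holding = (r.get("is_holding") or "").strip().lower() in ("0", "false")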
+
+        flags = hq.compute_quality_flags(rule, quote, "", quote_complete, rtype)
+        tally("any_flag", bool(flags), truly_not_holding)
+        tally("application", hq.FLAG_APPLICATION in flags, truly_not_holding)
+        tally("non_decision", hq.FLAG_NON_DECISION in flags, truly_not_holding)
+        tally("thin_restatement", hq.FLAG_THIN_RESTATEMENT in flags, truly_not_holding)
+        # quote-truncation scored against quote_complete label specifically
+        tally("truncated_quote", hq.is_quote_truncated(quote), not quote_complete)
+
+    print(f"scored {len(rows)} labeled halachot\n", flush=True)
+    print(f"{'validator':<18}{'P':>7}{'R':>7}{'F1':>7}   tp/fp/fn/tn", flush=True)
+    for name, c in counters.items():
+        p, rec, f1 = _prf(c["tp"], c["fp"], c["fn"])
+        print(f"{name:<18}{p:>7}{rec:>7}{f1:>7}   "
+              f"{c['tp']}/{c['fp']}/{c['fn']}/{c['tn']}", flush=True)
+    return 0
+
+
+async def main(args: argparse.Namespace) -> int:
+    if args.mode == "export":
+        return await _export(args.n)
+    return _score(Path(args.infile))
+
+
+if __name__ == "__main__":
+    ap = argparse.ArgumentParser(description=__doc__,
+                                 formatter_class=argparse.RawDescriptionHelpFormatter)
+    sub = ap.add_subparsers(dest="mode", required=True)
+    pe = sub.add_parser("export", help="dump a sample CSV for human tagging")
+    pe.add_argument("--n", type=int, default=150, help="sample size (default 150)")
+    ps = sub.add_parser("score", help="measure validators against a tagged CSV")
+    ps.add_argument("--in", dest="infile", required=True, help="tagged CSV path")
+    args = ap.parse_args()
+    sys.exit(asyncio.run(main(args)))
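The scoring core of `score` treats each validator flag as a prediction of "NOT a clean holding". It checks that prediction against the human `is_holding` label and tallies a confusion matrix. The sketch below shows the same tally and P/R/F1 formulas in isolation, so a tagged CSV's numbers can be sanity-checked by hand. `parse_label` is a hypothetical tolerant label reader; the script reads the columns inline. Rows with a blank `is_holding` are dropped by the script before scoring, so the default here applies only in this sketch.

```python
from typing import Iterable, Optional


def parse_label(raw: Optional[str], default: bool) -> bool:
    """Hypothetical: '1'/'true' -> True, '0'/'false' -> False, else default.
    Case-insensitive, so spreadsheet TRUE/FALSE exports are read correctly."""
    v = (raw or "").strip().lower()
    if v in ("1", "true"):
        return True
    if v in ("0", "false"):
        return False
    return default


def confusion(pairs: Iterable) -> dict:
    """Tally (predicted_bad, truly_bad) booleans into tp/fp/fn/tn."""
    c = {"tp": 0, "fp": 0, "fn": 0, "tn": 0}
    for predicted_bad, truly_bad in pairs:
        if predicted_bad:
            c["tp" if truly_bad else "fp"] += 1
        else:
            c["fn" if truly_bad else "tn"] += 1
    return c


def prf(tp: int, fp: int, fn: int) -> tuple:
    # Same formulas and rounding as _prf in the script.
    p = tp / (tp + fp) if (tp + fp) else 0.0
    r = tp / (tp + fn) if (tp + fn) else 0.0
    f1 = 2 * p * r / (p + r) if (p + r) else 0.0
    return round(p, 3), round(r, 3), round(f1, 3)


labels = [
    {"is_holding": "0", "flagged": True},       # caught
    {"is_holding": "FALSE", "flagged": False},  # missed
    {"is_holding": "1", "flagged": True},       # false alarm
    {"is_holding": "true", "flagged": False},   # correctly left alone
    {"is_holding": "0", "flagged": True},       # caught
]
c = confusion((row["flagged"], not parse_label(row["is_holding"], True))
              for row in labels)
print(c)                               # {'tp': 2, 'fp': 1, 'fn': 1, 'tn': 1}
print(prf(c["tp"], c["fp"], c["fn"]))  # (0.667, 0.667, 0.667)
```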
--- a/scripts/nevo_ratio_benchmark.py
+++ b/scripts/nevo_ratio_benchmark.py
@@ -0,0 +1,173 @@
+#!/usr/bin/env python3
+"""#86.3 — benchmark halacha-extraction quality against Nevo's מיני-רציו gold-set.
+
+Nevo's editorial מיני-רציו is a free, professionally-written list of a ruling's
+holdings. By comparing the halachot WE extracted against it we get an honest,
+zero-cost measurement of extraction quality per ruling:
+
+  * recall      — fraction of Nevo's holdings that our halachot cover
+  * precision   — fraction of our halachot that map to a Nevo holding
+  * granularity — our_count / nevo_holding_count (over-decomposition signal,
+                  the #81.5 concern: e.g. 14 ours vs 4 Nevo = 3.5x)
+
+The gold-truth ratio is read from ``case_law.nevo_ratio`` (populated by
+``backfill_nevo_preamble.py`` / ingest). For rulings not yet backfilled it
+falls back to computing the ratio on-the-fly from the stored ``full_text``,
+so the harness works before and after the migration.
+
+An LLM-as-judge (local ``claude_session``, zero API cost) does the semantic
+mapping — string overlap can't tell "same holding, different words" from a
+genuinely new holding. The judge is asked to count, not to rewrite.
+
+Run with the MCP server venv (needs the local ``claude`` CLI):
+
+    cd ~/legal-ai/mcp-server
+    .venv/bin/python ../scripts/nevo_ratio_benchmark.py --case 'בג"ץ 1764/05'
+    .venv/bin/python ../scripts/nevo_ratio_benchmark.py --all --limit 5
+    .venv/bin/python ../scripts/nevo_ratio_benchmark.py --all   # full corpus
+"""
+from __future__ import annotations
+
+import argparse
+import asyncio
+import csv
+import json
+import sys
+from datetime import datetime, timezone
+from pathlib import Path
+
+from legal_mcp.services import claude_session, db
+from legal_mcp.services.extractor import extract_nevo_ratio
+
+REPO_ROOT = Path(__file__).resolve().parent.parent
+AUDIT_DIR = REPO_ROOT / "data" / "audit"
+
+_JUDGE_SYSTEM = (
+    "אתה בוחן-איכות משפטי. נתונים לך (א) רשימת ההלכות (מיני-רציו) שכתב עורך נבו "
+    "עבור פסק-דין — אמת-המידה; (ב) רשימת ההלכות שמערכת אוטומטית חילצה מאותו "
+    "פסק-דין. משימתך: למפות סמנטית בין השתיים (אותו עיקרון משפטי בניסוח שונה = "
+    "התאמה), ולספור. החזר JSON בלבד, ללא טקסט נוסף."
+)
+
+
+def _judge_prompt(ratio: str, ours: list[str]) -> str:
+    ours_block = "\n".join(f"{i}. {s}" for i, s in enumerate(ours, 1)) or "(אין)"
+    return (
+        f"מיני-רציו של נבו (אמת-מידה):\n{ratio}\n\n"
+        f"ההלכות שחולצו על-ידי המערכת ({len(ours)}):\n{ours_block}\n\n"
+        "החזר JSON עם המפתחות:\n"
+        '{"nevo_holdings": <מספר העקרונות הנפרדים במיני-רציו>,\n'
+        ' "covered": <כמה מעקרונות נבו מכוסים ע"י לפחות הלכה אחת שלנו>,\n'
+        ' "ours_total": <מספר ההלכות שלנו>,\n'
+        ' "ours_mapped": <כמה מההלכות שלנו ממופות לעיקרון נבו כלשהו>,\n'
+        ' "notes": "<עד 2 משפטים: מה הוחמץ / מה עודף>"}'
+    )
+
+
+async def _bench_one(row: dict, model: str | None) -> dict:
+    cn = row["case_number"]
+    ratio = (row.get("nevo_ratio") or "").strip() or extract_nevo_ratio(row.get("full_text") or "")
+    result = {"case_number": cn, "nevo_holdings": 0, "covered": 0,
+              "ours_total": 0, "ours_mapped": 0, "recall": None,
+              "precision": None, "granularity": None, "notes": "", "error": ""}
+    if not ratio:
+        result["error"] = "no mini-ratio"
+        return result
+
+    halachot = await db.list_halachot(case_law_id=row["id"], limit=500)
+    ours = [h["rule_statement"] for h in halachot
+            if h.get("review_status") in ("approved", "published", "pending_review")
+            and (h.get("rule_statement") or "").strip()]
+    result["ours_total"] = len(ours)
+    if not ours:
+        result["error"] = "no extracted halachot"
+        return result
+
+    try:
+        verdict = await claude_session.query_json(
+            _judge_prompt(ratio, ours), system=_JUDGE_SYSTEM, model=model, effort="low",
+        )
+    except Exception as e:  # noqa: BLE001
+        result["error"] = f"judge failed: {e}"
+        return result
+    if not isinstance(verdict, dict):
+        result["error"] = "judge returned non-dict"
+        return result
+
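+    try:
+        nh = int(verdict.get("nevo_holdings") or 0)
+        cov = int(verdict.get("covered") or 0)
+        om = int(verdict.get("ours_mapped") or 0)
+    except (TypeError, ValueError):
+        result["error"] = "judge returned non-numeric counts"
+        return result
+    # Our halacha count is known exactly, so it (not the judge's recount) is
+    # the precision/granularity denominator.
+    ot = len(ours)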
+    result.update({
+        "nevo_holdings": nh, "covered": cov, "ours_total": ot, "ours_mapped": om,
+        "recall": round(cov / nh, 3) if nh else None,
+        "precision": round(om / ot, 3) if ot else None,
+        "granularity": round(ot / nh, 2) if nh else None,
+        "notes": str(verdict.get("notes") or "")[:300],
+    })
+    return result
+
+
+async def main(args: argparse.Namespace) -> int:
+    pool = await db.get_pool()
+    async with pool.acquire() as conn:
+        if args.case:
+            rows = await conn.fetch(
+                "SELECT id, case_number, nevo_ratio, full_text FROM case_law "
+                "WHERE case_number = $1", args.case,
+            )
+        else:
+            # rulings that have (or can derive) a ratio
+            rows = await conn.fetch(
+                "SELECT id, case_number, nevo_ratio, full_text FROM case_law "
+                "WHERE nevo_ratio <> '' OR full_text LIKE '%מיני-רציו:%' "
+                "ORDER BY case_number"
+            )
+    rows = [dict(r) for r in rows]
+    if args.limit:
+        rows = rows[: args.limit]
+    if not rows:
+        print("no rulings with a mini-ratio found", flush=True)
+        return 0
+
+    print(f"benchmarking {len(rows)} ruling(s)...", flush=True)
+    results = []
+    for i, row in enumerate(rows, 1):
+        res = await _bench_one(row, args.model)
+        results.append(res)
+        if res["error"]:
+            print(f"[{i}/{len(rows)}] {res['case_number']}: SKIP ({res['error']})", flush=True)
+        else:
+            print(f"[{i}/{len(rows)}] {res['case_number']}: "
+                  f"recall={res['recall']} precision={res['precision']} "
+                  f"granularity={res['granularity']}x "
+                  f"(nevo={res['nevo_holdings']}, ours={res['ours_total']})", flush=True)
+
+    scored = [r for r in results if r["recall"] is not None]
+    if scored:
+        avg = lambda k: round(sum(r[k] for r in scored) / len(scored), 3)  # noqa: E731
+        print(f"\n=== {len(scored)} scored — mean recall={avg('recall')} "
+              f"precision={avg('precision')} granularity={avg('granularity')}x ===", flush=True)
+
+    ts = datetime.now(timezone.utc).strftime("%Y%m%dT%H%M%SZ")
+    AUDIT_DIR.mkdir(parents=True, exist_ok=True)
+    out = Path(args.out) if args.out else AUDIT_DIR / f"nevo-ratio-benchmark-{ts}.csv"
+    with out.open("w", encoding="utf-8", newline="") as f:
+        w = csv.DictWriter(f, fieldnames=list(results[0].keys()))
+        w.writeheader()
+        w.writerows(results)
+    print(f"report: {out}", flush=True)
+    return 0
+
+
+if __name__ == "__main__":
+    ap = argparse.ArgumentParser(description=__doc__,
+                                 formatter_class=argparse.RawDescriptionHelpFormatter)
+    g = ap.add_mutually_exclusive_group(required=True)
+    g.add_argument("--case", help="benchmark a single case_number")
+    g.add_argument("--all", action="store_true", help="benchmark all rulings with a mini-ratio")
+    ap.add_argument("--limit", type=int, default=None, help="cap the number of rulings")
+    ap.add_argument("--model", default=None, help="judge model (default: CLI session default)")
+    ap.add_argument("--out", default=None, help="output CSV path (default: data/audit/)")
+    args = ap.parse_args()
+    sys.exit(asyncio.run(main(args)))
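The per-ruling metrics are simple ratios of the judge's counts. The sketch below restates `_bench_one`'s formulas and the summary line's averaging, which covers only rulings that have a recall, so the numbers in a report can be reproduced by hand. `ratio_metrics` and `corpus_means` are illustrative names. In the example, the `covered` and `ours_mapped` values are made-up inputs; only the 14-vs-4 counts come from the docstring.

```python
from typing import Optional


def ratio_metrics(nevo_holdings: int, covered: int,
                  ours_total: int, ours_mapped: int) -> dict:
    """Per-ruling metrics, same formulas as _bench_one."""
    return {
        "recall": round(covered / nevo_holdings, 3) if nevo_holdings else None,
        "precision": round(ours_mapped / ours_total, 3) if ours_total else None,
        # >1 means our halachot split Nevo's holdings into finer pieces.
        "granularity": round(ours_total / nevo_holdings, 2) if nevo_holdings else None,
    }


def corpus_means(results: list) -> Optional[dict]:
    """Mean over rulings that have a recall, as the summary line does."""
    scored = [r for r in results if r["recall"] is not None]
    if not scored:
        return None
    return {k: round(sum(r[k] for r in scored) / len(scored), 3)
            for k in ("recall", "precision", "granularity")}


# Docstring example: 14 of ours vs 4 Nevo holdings (covered/mapped made up).
print(ratio_metrics(4, 3, 14, 10))
# {'recall': 0.75, 'precision': 0.714, 'granularity': 3.5}
```

Granularity is reported alongside recall and precision rather than folded into them. A ruling can score perfect recall and precision while still being over-decomposed, which is the #81.5 concern.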
--- a/web-ui/src/lib/api/types.ts
+++ b/web-ui/src/lib/api/types.ts
@@ -1113,6 +1113,52 @@ export interface paths {
        patch?: never;
        trace?: never;
    };
+    "/api/cases/{case_number}/decision-blocks": {
+        parameters: {
+            query?: never;
+            header?: never;
+            path?: never;
+            cookie?: never;
+        };
+        /**
+         * Api Get Decision Blocks
+         * @description Return all 12 decision blocks as JSON (empty blocks included).
+         *
+         *     Read path for the interactive block viewer — content lives in
+         *     decision_blocks but was previously only reachable via DOCX export.
+         */
+        get: operations["api_get_decision_blocks_api_cases__case_number__decision_blocks_get"];
+        put?: never;
+        post?: never;
+        delete?: never;
+        options?: never;
+        head?: never;
+        patch?: never;
+        trace?: never;
+    };
+    "/api/cases/{case_number}/decision-blocks/{block_id}": {
+        parameters: {
+            query?: never;
+            header?: never;
+            path?: never;
+            cookie?: never;
+        };
+        get?: never;
+        /**
+         * Api Update Decision Block
+         * @description Save inline-edited content for a single decision block.
+         *
+         *     Writes to decision_blocks (upsert, status='draft') and rebuilds the
+         *     on-disk decision.md. Creates a decision row if none exists yet.
+         */
+        put: operations["api_update_decision_block_api_cases__case_number__decision_blocks__block_id__put"];
+        post?: never;
+        delete?: never;
+        options?: never;
+        head?: never;
+        patch?: never;
+        trace?: never;
+    };
    "/api/cases/{case_number}/learn": {
        parameters: {
            query?: never;
@@ -1959,6 +2005,88 @@ export interface paths {
        patch?: never;
        trace?: never;
    };
+    "/api/learning/pairs": {
+        parameters: {
+            query?: never;
+            header?: never;
+            path?: never;
+            cookie?: never;
+        };
+        /**
+         * Api Learning Pairs
+         * @description Matching ledger (INV-LRN4): every decision and its comparison status against the final version.
+         *     Optional status filter: final_received / analyzed / lessons_folded.
+         */
+        get: operations["api_learning_pairs_api_learning_pairs_get"];
+        put?: never;
+        post?: never;
+        delete?: never;
+        options?: never;
+        head?: never;
+        patch?: never;
+        trace?: never;
+    };
+    "/api/learning/style-distance/{case_number}": {
+        parameters: {
+            query?: never;
+            header?: never;
+            path?: never;
+            cookie?: never;
+        };
+        /**
+         * Api Learning Style Distance
+         * @description Style-distance metric (T7) for a case: whether the draft is converging toward דפנה's style.
+         */
+        get: operations["api_learning_style_distance_api_learning_style_distance__case_number__get"];
+        put?: never;
+        post?: never;
+        delete?: never;
+        options?: never;
+        head?: never;
+        patch?: never;
+        trace?: never;
+    };
+    "/api/learning/pairs/{pair_id}": {
+        parameters: {
+            query?: never;
+            header?: never;
+            path?: never;
+            cookie?: never;
+        };
+        /**
+         * Api Learning Pair Detail
+         * @description Ledger-row detail, including the distillation proposal (analysis) for chair approval (T14).
+         */
+        get: operations["api_learning_pair_detail_api_learning_pairs__pair_id__get"];
+        put?: never;
+        post?: never;
+        delete?: never;
+        options?: never;
+        head?: never;
+        patch?: never;
+        trace?: never;
+    };
+    "/api/learning/pairs/{pair_id}/promote": {
+        parameters: {
+            query?: never;
+            header?: never;
+            path?: never;
+            cookie?: never;
+        };
+        get?: never;
+        put?: never;
+        /**
+         * Api Learning Promote
+         * @description Chair gate (INV-G10/LRN1): approves style lessons + transition phrases from the distillation proposal
+         *     and embeds them in the channels the writer consumes (methodology overrides → T15). Advances status.
+         */
+        post: operations["api_learning_promote_api_learning_pairs__pair_id__promote_post"];
+        delete?: never;
+        options?: never;
+        head?: never;
+        patch?: never;
+        trace?: never;
+    };
    "/api/admin/skills": {
        parameters: {
            query?: never;
@@ -2254,7 +2382,14 @@ export interface paths {
        head?: never;
        /**
         * Api Resolve Feedback
-         * @description Mark feedback as resolved.
+         * @description Mark feedback as resolved. When ``fold`` is true (default) and the entry
+         *     has an extracted lesson, also wake the CEO to fold that lesson into the
+         *     right knowledge file (the feedback→agent-knowledge loop).
+         *
+         *     The fold is fire-and-forget (BackgroundTask) and best-effort — resolving
+         *     never fails because Paperclip is down. Pass ``fold=false`` for pure
+         *     bookkeeping resolves (e.g. from the per-case drafts panel) to avoid
+         *     spawning a CEO run per click.
         */
        patch: operations["api_resolve_feedback_api_feedback__feedback_id__resolve_patch"];
        trace?: never;
@@ -2566,7 +2701,13 @@ export interface paths {
            path?: never;
            cookie?: never;
        };
-        /** Halachot List */
+        /**
+         * Halachot List
+         * @description List halachot. ``exclude_low_quality`` hides flagged items (#84.1) and
+         *     ``order_by_priority`` switches to the active-learning order (#84.3). Both
+         *     default off so existing callers are unaffected; the review-queue view opts
+         *     in.
+         */
        get: operations["halachot_list_api_halachot_get"];
        put?: never;
        post?: never;
@@ -2746,6 +2887,11 @@ export interface components {
            /** Issue Id */
            issue_id?: string | null;
        };
+        /** BlockUpdateRequest */
+        BlockUpdateRequest: {
+            /** Content */
+            content: string;
+        };
        /** Body_api_create_feedback_api_feedback_post */
        Body_api_create_feedback_api_feedback_post: {
            /**
@@ -3475,6 +3621,19 @@ export interface components {
            /** Citation Formatted */
            citation_formatted?: string | null;
        };
+        /** PromoteLearningRequest */
+        PromoteLearningRequest: {
+            /**
+             * Lessons
+             * @default []
+             */
+            lessons: string[];
+            /**
+             * Phrases
+             * @default []
+             */
+            phrases: string[];
+        };
        /** ReviseRequest */
        ReviseRequest: {
            /** Revisions */
@@ -5263,6 +5422,73 @@ export interface operations {
            };
        };
    };
+    api_get_decision_blocks_api_cases__case_number__decision_blocks_get: {
+        parameters: {
+            query?: never;
+            header?: never;
+            path: {
+                case_number: string;
+            };
+            cookie?: never;
+        };
+        requestBody?: never;
+        responses: {
+            /** @description Successful Response */
+            200: {
+                headers: {
+                    [name: string]: unknown;
+                };
+                content: {
+                    "application/json": unknown;
+                };
+            };
+            /** @description Validation Error */
+            422: {
+                headers: {
+                    [name: string]: unknown;
+                };
+                content: {
+                    "application/json": components["schemas"]["HTTPValidationError"];
+                };
+            };
+        };
+    };
+    api_update_decision_block_api_cases__case_number__decision_blocks__block_id__put: {
+        parameters: {
+            query?: never;
+            header?: never;
+            path: {
+                case_number: string;
+                block_id: string;
+            };
+            cookie?: never;
+        };
+        requestBody: {
+            content: {
+                "application/json": components["schemas"]["BlockUpdateRequest"];
+            };
+        };
+        responses: {
+            /** @description Successful Response */
+            200: {
+                headers: {
+                    [name: string]: unknown;
+                };
+                content: {
+                    "application/json": unknown;
+                };
+            };
+            /** @description Validation Error */
+            422: {
+                headers: {
+                    [name: string]: unknown;
+                };
+                content: {
+                    "application/json": components["schemas"]["HTTPValidationError"];
+                };
+            };
+        };
+    };
    api_learn_api_cases__case_number__learn_post: {
        parameters: {
            query?: never;
@@ -6575,6 +6801,135 @@ export interface operations {
            };
        };
    };
+    api_learning_pairs_api_learning_pairs_get: {
+        parameters: {
+            query?: {
+                status?: string;
+                limit?: number;
+            };
+            header?: never;
+            path?: never;
+            cookie?: never;
+        };
+        requestBody?: never;
+        responses: {
+            /** @description Successful Response */
+            200: {
+                headers: {
+                    [name: string]: unknown;
+                };
+                content: {
+                    "application/json": unknown;
+                };
+            };
+            /** @description Validation Error */
+            422: {
+                headers: {
+                    [name: string]: unknown;
+                };
+                content: {
+                    "application/json": components["schemas"]["HTTPValidationError"];
+                };
+            };
+        };
+    };
+    api_learning_style_distance_api_learning_style_distance__case_number__get: {
+        parameters: {
+            query?: never;
+            header?: never;
+            path: {
+                case_number: string;
+            };
+            cookie?: never;
+        };
+        requestBody?: never;
+        responses: {
+            /** @description Successful Response */
+            200: {
+                headers: {
+                    [name: string]: unknown;
+                };
+                content: {
+                    "application/json": unknown;
+                };
+            };
+            /** @description Validation Error */
+            422: {
+                headers: {
+                    [name: string]: unknown;
+                };
+                content: {
+                    "application/json": components["schemas"]["HTTPValidationError"];
+                };
+            };
+        };
+    };
+    api_learning_pair_detail_api_learning_pairs__pair_id__get: {
+        parameters: {
+            query?: never;
+            header?: never;
+            path: {
+                pair_id: string;
+            };
+            cookie?: never;
+        };
+        requestBody?: never;
+        responses: {
+            /** @description Successful Response */
+            200: {
+                headers: {
+                    [name: string]: unknown;
+                };
+                content: {
+                    "application/json": unknown;
+                };
+            };
+            /** @description Validation Error */
+            422: {
+                headers: {
+                    [name: string]: unknown;
+                };
+                content: {
+                    "application/json": components["schemas"]["HTTPValidationError"];
+                };
+            };
+        };
+    };
+    api_learning_promote_api_learning_pairs__pair_id__promote_post: {
+        parameters: {
+            query?: never;
+            header?: never;
+            path: {
+                pair_id: string;
+            };
+            cookie?: never;
+        };
+        requestBody: {
+            content: {
+                "application/json": components["schemas"]["PromoteLearningRequest"];
+            };
+        };
+        responses: {
+            /** @description Successful Response */
+            200: {
+                headers: {
+                    [name: string]: unknown;
+                };
+                content: {
+                    "application/json": unknown;
+                };
+            };
+            /** @description Validation Error */
+            422: {
+                headers: {
+                    [name: string]: unknown;
+                };
+                content: {
+                    "application/json": components["schemas"]["HTTPValidationError"];
+                };
+            };
+        };
+    };
    api_list_skills_api_admin_skills_get: {
        parameters: {
            query?: never;
@@ -7580,6 +7935,8 @@ export interface operations {
                practice_area?: string;
                limit?: number;
                offset?: number;
+                exclude_low_quality?: boolean;
+                order_by_priority?: boolean;
            };
            header?: never;
            path?: never;
--- a/web/app.py
+++ b/web/app.py
@@ -6031,7 +6031,13 @@ async def halachot_list(
    practice_area: str = "",
    limit: int = 200,
    offset: int = 0,
+    exclude_low_quality: bool = False,
+    order_by_priority: bool = False,
 ):
+    """List halachot. ``exclude_low_quality`` hides flagged items (#84.1) and
+    ``order_by_priority`` switches to the active-learning order (#84.3). Both
+    default off so existing callers are unaffected; the review-queue view opts
+    in."""
    cid: UUID | None = None
    if case_law_id:
        try:
@@ -6043,6 +6049,8 @@ async def halachot_list(
        review_status=review_status or None,
        practice_area=practice_area or None,
        limit=limit, offset=offset,
+        exclude_low_quality=exclude_low_quality,
+        order_by_priority=order_by_priority,
    )
    return {"items": rows, "count": len(rows)}
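The two opt-in flags in the `halachot_list` docstring can be illustrated with a small in-memory sketch. This is hypothetical. The real filtering and ordering happen in SQL inside `db.list_halachot`; the #84 commit message describes this as gating at source. The `Halacha` row shape and its fields (`quality_flags`, `negatively_treated`, `confidence`, `created_at`) are assumptions made for illustration. The priority order follows the #84 commit message: negatively-treated first, then lowest confidence, then oldest.

```python
from dataclasses import dataclass, field
from datetime import datetime


@dataclass
class Halacha:
    # Hypothetical row shape; field names are assumptions, not the real schema.
    id: str
    confidence: float
    created_at: datetime
    negatively_treated: bool = False
    quality_flags: list = field(default_factory=list)


def triage(rows, exclude_low_quality=False, order_by_priority=False):
    """In-memory sketch of the two opt-in controls (both default off)."""
    items = list(rows)  # copy, so the caller's list is never reordered
    if exclude_low_quality:
        # #84.1: any quality flag keeps the item out of this view
        # (hidden here, not deleted).
        items = [h for h in items if not h.quality_flags]
    if order_by_priority:
        # #84.3: negatively-treated first (False sorts before True),
        # then lowest confidence, then oldest created_at.
        items.sort(key=lambda h: (not h.negatively_treated, h.confidence, h.created_at))
    return items


rows = [
    Halacha("a", 0.90, datetime(2026, 1, 1)),
    Halacha("b", 0.55, datetime(2026, 2, 1)),
    Halacha("c", 0.70, datetime(2026, 1, 15), negatively_treated=True),
    Halacha("d", 0.40, datetime(2026, 1, 5), quality_flags=["thin"]),
]
print([h.id for h in triage(rows, exclude_low_quality=True, order_by_priority=True)])
# → ['c', 'b', 'a']
```

With both flags at their defaults, `triage` returns the rows unchanged. This mirrors why the endpoint leaves existing callers unaffected while the review-queue view opts in.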
Author	SHA1	Message	Date
Chaim	e4651a9d06	feat(#99 / T10): get_style_guide: golden ratios measured from the corpus, shown alongside the target. style_distance.measure_corpus_ratios(): splits every decision in style_corpus into sections (chunker) and computes the mean per-section percentage, as an "_all" aggregate plus per outcome (when available). Cached. get_style_guide shows a "measured in practice" line, with ⚠️ on any gap from the target range. Current state: style_corpus.outcome is not populated → the all-decisions aggregate is shown (n=48: background 26.4% / arguments 9.7% / discussion 43.8% / summary 20.1%); the per-outcome split is future-ready. The measurement also exposes section-detection limitations (the intent of T10: flag the gap for review). Partially overlaps T7, which measures adherence per draft; this one measures the corpus. A measurement failure is surfaced, not swallowed. Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>	2026-06-06 21:01:42 +00:00
Chaim	a571ad535b	fix(#88+#87): automatic DB↔file sync + claims_coverage distinguishes the notice of appeal from correspondence. #88 (DB↔file, lessons #35): drafts/decision.md was regenerated only in save_block_content; renumber_all_blocks and the other store_block paths left the file stale → QA failed twice on the same issue (CMPA-62). Fix: _update_draft_file became an automatic hook (takes decision_id, resolves the case internally) called from store_block (every persist) and from renumber_all_blocks. legal-qa reads from the DB anyway → both sides are always identical. #87 (claims_coverage, 1033-25): claims from correspondence (claim_type='reply': a response or supplementary argument) were marked "not addressed" as false positives. Fix: check_claims_coverage requires a response only to claims in the notice of appeal (claim_type='claim', appellant); reply/correspondence claims are excluded. On full acceptance the threshold is relaxed (0.2→0.4), because the appellant prevailed in full. Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>	2026-06-06 20:54:31 +00:00
Chaim	afc1548bca	chore(style-acq T11): regen API types (learning + methodology endpoints). npm run api:types syncs the generated types.ts with the new endpoints (/api/learning/pairs, style-distance, promote). The code uses hand-written types (learning.ts), so this is hygiene for the future, not a dependency. Closes T11. Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>	2026-06-06 20:44:41 +00:00
Chaim	e096c51037	fix(#85): claude_session: retry on transient claude -p failures. The root cause of #85 turned out to be transient: `claude -p` occasionally fails with a fast exit + empty stderr on large/slow prompts (CEO write_interim_draft, learning_loop distillation), and the same prompt succeeds on a rerun. It is a transient failure, not nesting (verified: nested claude from bash and a 70K prompt both succeeded; the failure is non-deterministic). query() wraps spawn+communicate in a retry loop (MAX_RETRIES=3, linear backoff 5s*attempt). FileNotFoundError + timeout stay deterministic (no retry). An empty response is also treated as transient. Verified e2e: distillation on 1130-25 ran successfully → pair=analyzed (9 changes, 6 style_method, 33.8% diff). Also fixes the CEO's write_interim_draft. Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>	2026-06-06 20:08:54 +00:00
chaim	85c5a4aacb	Merge pull request 'feat(halacha-triage): quality-gated + prioritized review queue + metrics (#84)' (#93) from worktree-task84-halacha-triage into main. All checks were successful: Build & Deploy / build-and-deploy (push) Successful in 1m25s	2026-06-06 20:01:27 +00:00
Chaim	420cb819f5	feat(halacha-triage): quality-gated + prioritized review queue + metrics (#84). Backend for the halacha approval-queue triage (#84). The keyboard UI, batch actions and defer/reject (#84.4–6) already shipped; this adds the gating, prioritization and metrics the queue was missing. db.list_halachot gets two opt-in triage controls: * exclude_low_quality (#84.1): drop items carrying ANY quality_flag (application / quote_unverified / truncated / non_decision / thin / nli_unsupported / near_duplicate); they belong in a 'needs extraction fix' bucket, not the chair's approve queue. * order_by_priority (#84.3): active-learning order (negatively-treated first, then most uncertain, i.e. lowest confidence, then oldest) instead of FIFO, so the highest-value decisions surface first. halachot_pending (MCP): now gated + prioritized BY DEFAULT; include_low_quality=true reveals the needs-fix bucket. The agent review path benefits immediately. GET /api/halachot: same two params, default OFF (non-breaking; the UI opts in). metrics.halacha_backlog (#84.7): splits pending into clean vs flagged, adds deferred, reviewed_total, approve_ratio, and a pending_by_flag breakdown, so the backlog distinguishes real review work from extraction noise. Deferred (documented): #84.2 near-duplicate cluster cards and wiring the UI fetch to the new params require frontend work + an api:types regen AFTER this deploys (the new query params aren't in prod's OpenAPI until then), which makes a clean follow-up. The backend fully supports both now. Verified against the live DB (read-only): - pending 177 → gated-clean 110, 0 flagged items leak into the clean queue. - priority order surfaces the lowest-confidence items first (0.55, 0.55, ...). - backlog: pending_clean=110 / pending_flagged=67 / approve_ratio=0.916, pending_by_flag={nli_unsupported:59, quote_unverified:3, thin:3, truncated:2}. - pytest tests/test_halacha_quality.py: 52 passed (no regression). 
Invariants: G1 (gate at source — SQL filter, not post-hoc); G2 (no parallel path — same list_halachot); §6 (flagged items routed to a bucket, never dropped). Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>	2026-06-06 20:00:52 +00:00
chaim	32ef259843	Merge pull request 'feat(halacha): application gate + lexical dedup tail + quality harnesses (#81,#82)' (#92) from worktree-task81-82-halacha-engine into main. All checks were successful: Build & Deploy / build-and-deploy (push) Successful in 1m25s	2026-06-06 19:56:22 +00:00
Chaim	1286a1e60d	feat(halacha): application gate + lexical dedup tail + quality harnesses (#81,#82) Halacha-extraction quality (#81) and dedup-on-insert (#82) — engine changes (pure + tested) plus measurement/ops tooling. halacha_quality.py - #81.4 application gate: is_fact_dependent() (high-precision "applied to THIS case" deixis per the strict rubric §3/§27) + FLAG_APPLICATION. compute_quality_flags now takes rule_type and flags rule_type=='application' OR fact-dependent — blocking auto-approve (an illustration is not a generalizable holding). - #82.3 lexical tail signal: jaccard_shingles / normalized_levenshtein / lexical_near_duplicate + FLAG_NEAR_DUPLICATE, for the 0.83–0.93 cosine band. halacha_extractor.py — pass rule_type to the flag computation; re-type a binding-labeled fact-application to 'application' (mirrors non_decision→obiter). db.py (store_halachot_for_chunk) — dedup now fetches the nearest same-precedent neighbor once: cosine ≥ DEDUP → skip (unchanged); cosine in [BAND, DEDUP) with high lexical overlap → FLAG_NEAR_DUPLICATE (review, not skip — never drop a possibly-distinct principle unreviewed). config.py — HALACHA_DEDUP_BAND_COSINE (0.83). Scripts: - scripts/halacha_goldset.py (#81.7) — export stratified sample for human tagging; score validators (P/R/F1) against the tags. Backbone for #81.8. - scripts/halacha_batch_reconcile.py (#82.7) — conservative cross-precedent dedup (cosine ≥0.95), dry-run report only. - scripts/calibrate_halacha_dedup.py (#82.1) — calibrate the lexical thresholds against the 2026-06-03 cleanup gold-set. Deferred (documented): #82.4 merge-provenance and #82.5 DB ON CONFLICT/UNIQUE on normalized quote are NOT included — the current skip+flag behavior is safe, whereas a UNIQUE on normalized_quote would fail on existing dups and a blind merge risks losing provenance; they need their own chair-reviewed migration. #82.6 over-merge guard is moot until merge lands. 
#81.6 full rhetorical-role classifier deferred (section pre-filter + application flag cover the practical case); #81.8 blocked on the human-tagged gold-set (harness now provided). Verified: - pytest tests/test_halacha_quality.py — 52 passed (14 new). - calibrate: configured (0.55,0.70) → precision 1.0 (zero false-merge), recall 0.30 — correct profile for an auto-approve-blocking signal. - goldset export: 15-row sample CSV. batch reconcile: 819 halachot → 5 cross-precedent candidate pairs. Invariants: G1 (normalize at source — flag at insert, not at read); §6 (no silent swallow — suspect items flagged to review, never dropped); G2 (no parallel path — same store_halachot_for_chunk / compute_quality_flags). Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>	2026-06-06 19:55:45 +00:00
chaim	366d89e6bb	Merge pull request 'feat(nevo): backfill leaked preamble + ratio gold-set benchmark (#86)' (#91) from worktree-task86-nevo-backfill-benchmark into main. All checks were successful: Build & Deploy / build-and-deploy (push) Successful in 1m25s	2026-06-06 19:46:25 +00:00
Chaim	fb51a0e869	feat(nevo): backfill leaked preamble + ratio gold-set benchmark (#86). #86.2 backfill + #86.3 benchmark, plus a #86.1 over-strip fix found en route. extractor.py - extract_nevo_ratio(): capture Nevo's מיני-רציו ("mini-ratio") block (editorial holdings summary) before it is stripped, as a free professional gold-set (#86.3). - _DECISION_START hardening (#86.2): the merged #86.1 regex over-stripped. (a) פסק-דין ("judgment") headers are markdown-wrapped (פסק דין); the old anchor required the keyword to be the first thing on the line with exactly one separator, so it missed the header and matched a citation 32K deep (עמ"נ 50567-07-21, losing 45% of the body). It now tolerates leading markdown + 0-3 separators, and the final-nun form (דין ן vs דינו נ). (b) bare השופט/הנשיא ("the judge"/"the president") matched CITATIONS ("השופט מ' חשין, פסקה 23"). The authoring-judge line ends with a colon; we now require it. ingest.py - capture the ratio before stripping and store it on the row (best-effort, non-fatal); also strip the text-upload path (was file-only). db.py - add case_law.nevo_ratio column (additive); allow it in update_case_law. scripts/backfill_nevo_preamble.py (#86.2): dry-run-by-default data migration. It finds historically-leaked rulings, captures ratio→nevo_ratio, rewrites full_text (+content_hash), reindexes, and FLAGS (never deletes) halachot whose quote lives in the removed preamble (review_status=pending_review + nevo_preamble_leak flag). Safety guard: rows with keep% below --min-keep (60) are excluded from --apply as suspected over-strip. --apply writes backup+manifest to data/audit/ first. Chair-gated; NOT applied here. scripts/nevo_ratio_benchmark.py (#86.3): LLM-as-judge (local claude_session, zero cost) measures recall/precision/granularity of our halachot vs the Nevo ratio. Works pre- and post-backfill (reads nevo_ratio, falls back to full_text). Verified: - pytest tests/test_nevo_preamble.py: 12 passed (incl. citation/markdown over-strip regressions). 
- backfill dry-run: 19 leaked rulings, 27 contaminated halachot, all ≥75% keep (the 32K over-strip is gone). - benchmark on בג"ץ 1764/05: recall=0.875 precision=1.0 granularity=1.75x. Invariants: G1 (normalize at source — strip/capture at ingest, not at read); no silent swallow (contaminated halachot flagged + reported, not dropped); data-migration is dry-run-default with backup+manifest, chair-gated. Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>	2026-06-06 19:45:43 +00:00
chaim	12bdec10fa	Merge pull request 'fix(claude_session): surface real CLI error + sanitize nested env (#85)' (#90) from worktree-task85-claude-session-nested into main. All checks were successful: Build & Deploy / build-and-deploy (push) Successful in 1m27s	2026-06-06 19:30:22 +00:00
Chaim	8ec24cf822	fix(claude_session): surface real CLI error + sanitize nested env (#85). write_interim_draft failed for all blocks from the CEO MCP instance with "Claude CLI failed (exit 1): unknown error". Two fixes: 1. Error surfacing (the certain win): on non-zero exit, capture and log both stderr AND stdout (the CLI sometimes writes its diagnostic to stdout or nowhere), so the next occurrence is diagnosable instead of collapsing to "unknown error". This is why #85 was unsolved: the real error was swallowed (engineering rule §6: no silent swallow). 2. Defensive hardening: strip Claude Code session markers (CLAUDECODE, CLAUDE_CODE_, CLAUDE_AGENT_, AI_AGENT, CLAUDE_EFFORT) from the env of nested `claude -p` calls and run them from $HOME, decoupling them from the parent agent's session/project state. Aligns query() with the existing query_streaming() path (which already sets cwd=HOME). Auth/config vars are preserved. Note: the original adapter-context failure could not be reproduced in a plain interactive session (nested claude -p succeeds there in both old and new code), so the env markers are a suspect, not a proven cause. The real value is the diagnostics. Verified: nested query() returns PONG from inside a CLAUDECODE=1 session; unit tests cover env sanitization. Invariants: G1 (normalize at source: fix the spawn, not readers), G2 (no parallel path: same query()), §6 (no silent error swallow). INV: feedback_claude_session_local_only preserved (all calls stay local). Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>	2026-06-06 19:29:36 +00:00
chaim	3b9f77daa8	Merge pull request 'feat(style-acq T8): analyze_corpus: remove LIMIT 20 (full coverage)' (#89) from worktree-style-acquisition-mvp into main. All checks were successful: Build & Deploy / build-and-deploy (push) Successful in 1m24s	2026-06-06 19:25:40 +00:00
chaim	32a6e2b57b	Merge pull request 'fix(style-acq T9): real auto-numbering in DOCX export' (#88) from worktree-style-acquisition-mvp into main. All checks were successful: Build & Deploy / build-and-deploy (push) Successful in 1m28s	2026-06-06 19:24:02 +00:00
chaim	37c00bac13	Merge pull request 'feat(style-acq T14): chair gate for approving curator proposals → embedding into the profile' (#87) from worktree-style-acquisition-mvp into main. All checks were successful: Build & Deploy / build-and-deploy (push) Successful in 1m42s	2026-06-06 19:18:13 +00:00
chaim	6313fcd316	Merge pull request 'feat(style-acq T6+T13): matching ledger + style-distance metric in the UI' (#86) from worktree-style-acquisition-mvp into main. All checks were successful: Build & Deploy / build-and-deploy (push) Successful in 38s	2026-06-06 19:13:32 +00:00
chaim	7b1c0c1a32	Merge pull request 'feat(style-acq T12): /methodology: editable transition phrases + anti-patterns' (#85) from worktree-style-acquisition-mvp into main. All checks were successful: Build & Deploy / build-and-deploy (push) Successful in 1m25s	2026-06-06 19:09:15 +00:00
chaim	3b3e1e3bbf	Merge pull request 'docs: FU-14 GAP-54: closure as resolved-by-FU-1 (case-law ingestion already unified)' (#84) from docs/gap54-closure into main. All checks were successful: Build & Deploy / build-and-deploy (push) Successful in 9s	2026-06-06 19:03:14 +00:00
Chaim	37dcb30604	docs: FU-14 GAP-54: closure as resolved-by-FU-1 (case-law ingestion unification). Verification (G2: do not re-solve): case-law ingestion is already unified by FU-1. Both case-law paths (precedent_library + internal_decisions) go through the canonical ingest.ingest_document with symmetric enum validation + citation-guard (documented in 01-ingest §4). The third path (training→style_corpus) is a deliberately separate corpus. Verified by test_unified_ingest (9/9). No code, closure documentation only. Invariants: confirms INV-ING1 + G2 are upheld. doc-only. Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>	2026-06-06 19:02:55 +00:00
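The retry policy from the #85 commit (e096c51037) can be sketched as a synchronous approximation. The real `claude_session.query()` wraps an asynchronous spawn+communicate. The names `TransientCLIError`, `run_once` and `query_with_retry`, the default timeout, and the reading of MAX_RETRIES=3 as three total attempts are all assumptions. The sketch takes its behaviour from the commit message:

- Non-zero exits and empty responses are retried with a linear 5s*attempt backoff.
- FileNotFoundError and timeouts are re-raised immediately.
- Following 8ec24cf822, the error message keeps both stderr and stdout.

```python
import subprocess
import time
from typing import Callable

MAX_RETRIES = 3       # from the #85 commit; read here as total attempts (assumption)
BACKOFF_SECONDS = 5   # linear backoff: sleep 5s * attempt between attempts


class TransientCLIError(RuntimeError):
    """Hypothetical: fast non-zero exit or empty output, worth retrying."""


def run_once(prompt: str, timeout: float) -> str:
    # Hypothetical single attempt. A missing binary raises FileNotFoundError and
    # a timeout raises subprocess.TimeoutExpired; both propagate unchanged.
    proc = subprocess.run(
        ["claude", "-p", prompt], capture_output=True, text=True, timeout=timeout
    )
    if proc.returncode != 0:
        # Keep stderr AND stdout so the failure is diagnosable, not "unknown error".
        raise TransientCLIError(
            f"exit {proc.returncode}: stderr={proc.stderr!r} stdout={proc.stdout!r}"
        )
    if not proc.stdout.strip():
        raise TransientCLIError("empty response")
    return proc.stdout


def query_with_retry(
    prompt: str,
    timeout: float = 600.0,  # placeholder value, not taken from the source
    run: Callable[[str, float], str] = run_once,
    sleep: Callable[[float], None] = time.sleep,
) -> str:
    last_error = None
    for attempt in range(1, MAX_RETRIES + 1):
        try:
            return run(prompt, timeout)
        except (FileNotFoundError, subprocess.TimeoutExpired):
            raise  # deterministic failures: retrying would not help
        except TransientCLIError as exc:
            last_error = exc
            if attempt < MAX_RETRIES:
                sleep(BACKOFF_SECONDS * attempt)  # 5s, then 10s
    raise RuntimeError(f"claude -p failed after {MAX_RETRIES} attempts") from last_error
```

Injecting `run` and `sleep` keeps the policy testable without the CLI installed. For example, a fake runner that fails twice and then returns "PONG" makes `query_with_retry` return "PONG" after sleeping 5 and then 10 seconds.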