feat(halacha): over-extraction consolidation — fold facets via claude_session (#81.5)

After a precedent finishes extracting, a claude_session pass folds facets of the SAME legal question (those below #82's dedup cosine: the שפר 14-vs-4 / 403-17→89 granularity gap) into one canonical; the rest are marked 'rejected' (reversible: out of the active corpus AND the review queue, but recoverable). FOLD-ONLY: never merges distinct legal questions, never invents.

- Engine: claude_session-as-judge (local CLI, zero cost), 'high' effort, because folding needs careful judgment. One pass per precedent, run in _extract_impl once all chunks are done (the prompt dedups within a chunk; this pass catches duplicates across chunks).
- Pure, unit-tested helpers in halacha_quality: CONSOLIDATE_SYSTEM, build_consolidation_prompt, parse_fold_groups (fails SAFE → [] on any malformed shape; drops <2-member groups; coerces/dedups indices).
- halacha_extractor._consolidate_precedent picks the canonical per group (approved>pending, higher confidence, quote_verified, longer) and rejects the rest via the existing update_halachot_batch (#84). Never rejects a canonical. Fails OPEN on any error (no CLI / parse fail → 0 folds, data untouched).
- config: HALACHA_CONSOLIDATE_ENABLED/MODEL/EFFORT.

Verified: suite 176 passed (10 new); integration vs dev DB: a 2-facet group folds to 1 canonical + 1 rejected (tagged), distinct rules untouched, claude error → 0 folds (fail-open).

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
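
The canonical-selection order named in the message (approved>pending, higher confidence, quote_verified, longer) could be expressed as a sort key. This is a minimal sketch, not the code in halacha_extractor._consolidate_precedent: the field names review_status, confidence and quote_verified, and the reading of "longer" as the length of rule_statement, are assumptions made for illustration.

```python
def canonical_sort_key(h: dict) -> tuple:
    """Rank a halacha for canonical selection; larger tuples win.

    Assumed (hypothetical) fields: review_status, confidence, quote_verified.
    rule_statement is the field used by build_consolidation_prompt.
    """
    return (
        1 if h.get("review_status") == "approved" else 0,  # approved > pending
        float(h.get("confidence") or 0.0),                 # higher confidence
        1 if h.get("quote_verified") else 0,               # verified quote preferred
        len((h.get("rule_statement") or "").strip()),      # longer statement as tie-break
    )


def split_group(members: list[dict]) -> tuple[dict, list[dict]]:
    """Return (canonical, to_reject) for one fold group; the canonical is never rejected."""
    canonical = max(members, key=canonical_sort_key)
    to_reject = [h for h in members if h is not canonical]
    return canonical, to_reject


group = [
    {"halacha_index": 2, "review_status": "pending", "confidence": 0.9, "rule_statement": "a"},
    {"halacha_index": 5, "review_status": "approved", "confidence": 0.7, "rule_statement": "a"},
    {"halacha_index": 9, "review_status": "approved", "confidence": 0.7,
     "quote_verified": True, "rule_statement": "a"},
]
canonical, to_reject = split_group(group)
print(canonical["halacha_index"], [h["halacha_index"] for h in to_reject])  # 9 [2, 5]
```

A tuple key keeps the priority order explicit: each later criterion only matters when every earlier one ties.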
2026-06-03 16:26:44 +00:00
parent 5efb8cf915
commit fb60dca796
4 changed files with 176 additions and 3 deletions
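
The fail-open behaviour described in the commit message (no CLI / parse fail → 0 folds, data untouched) can be illustrated with a small wrapper. This is a sketch under stated assumptions, not the extractor's actual code: `judge` is a hypothetical callable standing in for the claude_session call and is assumed to return JSON text, and the inline parsing follows the rules listed for parse_fold_groups rather than reproducing it.

```python
import json
from typing import Callable


def fold_groups_fail_open(prompt: str, judge: Callable[[str], str]) -> list[list[int]]:
    """Return fold groups, or [] if the judge call or the parsing fails for any reason."""
    try:
        raw = json.loads(judge(prompt))  # assumption: the judge returns JSON text
        if not isinstance(raw, list):
            return []
        groups: list[list[int]] = []
        for g in raw:
            if not isinstance(g, list):
                continue
            members: list[int] = []
            for x in g:
                try:
                    m = int(x)
                except (TypeError, ValueError, OverflowError):
                    continue  # skip members that are not index-like
                if m not in members:  # dedup within the group, preserving order
                    members.append(m)
            if len(members) >= 2:  # a fold needs at least two members
                groups.append(members)
        return groups
    except Exception:
        # Fail OPEN: any error means "fold nothing", so stored data stays untouched.
        return []


def broken_judge(prompt: str) -> str:
    raise RuntimeError("CLI not available")


print(fold_groups_fail_open("prompt text", lambda p: "[[2, 5, 9], [14], [3, 3]]"))  # [[2, 5, 9]]
print(fold_groups_fail_open("prompt text", broken_judge))  # []
```

Returning an empty list on every failure path means the caller needs no special error handling: zero groups simply results in zero rejections.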
--- a/mcp-server/src/legal_mcp/services/halacha_quality.py
+++ b/mcp-server/src/legal_mcp/services/halacha_quality.py
@@ -185,6 +185,66 @@ def parse_nli_verdicts(raw, n: int) -> list[str]:
    return out


+# ── Over-extraction consolidation (fold facets of one legal question) — #81.5 ──
+#
+# #82 dedup-on-insert removes near-EXACT dups (cosine ≥ 0.93). #81.5 handles the
+# remaining over-extraction: facets of the SAME legal question, phrased
+# differently, that sit BELOW the dedup threshold (the שפר 14-vs-4 / 403-17→89
+# granularity gap). A per-precedent claude_session pass groups such facets; the
+# extractor keeps one canonical per group and marks the rest rejected (reversible,
+# out of the active corpus + review queue). FOLD-ONLY — never merges distinct
+# legal questions, never invents. Fails OPEN (parse error → no folds).
+
+CONSOLIDATE_SYSTEM = (
+    "אתה מאחד פנים-כפולים של הלכות שחולצו מאותו פסק דין. בהינתן רשימה ממוספרת של הלכות, "
+    "זהה קבוצות של הלכות שהן **אותה שאלה משפטית** בניסוחים או פנים שונים. "
+    "כללים: (1) אַחֵד רק הלכות שעונות על אותה שאלה משפטית בדיוק; (2) **אל תאַחֵד** הלכות "
+    "שעונות על שאלות משפטיות שונות (גם אם קרובות בנושא); (3) הלכה ייחודית — אל תכלול בשום קבוצה. "
+    'החזר JSON array של קבוצות, כל קבוצה = array של מספרי-האינדקס שיש לאַחֵד (לפחות 2 חברים). '
+    "לדוגמה: [[2,5,9],[14,18]]. אם אין מה לאַחֵד החזר []. ללא markdown, ללא הסבר."
+)
+
+
+def build_consolidation_prompt(items: list[dict]) -> str:
+    """Numbered list of a precedent's halachot (index + rule + reasoning)."""
+    blocks = []
+    for h in items:
+        idx = h.get("halacha_index")
+        rule = (h.get("rule_statement") or "").strip()
+        reason = (h.get("reasoning_summary") or "").strip()
+        line = f"[{idx}] {rule}"
+        if reason:
+            line += f"  (היגיון: {reason})"
+        blocks.append(line)
+    return "\n".join(blocks)
+
+
+def parse_fold_groups(raw) -> list[list[int]]:
+    """Coerce judge output into a list of fold-groups (≥2 int indices each).
+
+    Fails SAFE: any malformed shape → [] (no folding). Members that cannot be
+    coerced to int are skipped; groups left with <2 members are dropped.
+    """
+    if not isinstance(raw, list):
+        return []
+    groups: list[list[int]] = []
+    for g in raw:
+        if not isinstance(g, list):
+            continue
+        members: list[int] = []
+        for x in g:
+            try:
+                members.append(int(x))
+            except (TypeError, ValueError, OverflowError):  # int(float("inf")) raises OverflowError
+                continue
+        # dedup within group, preserve order
+        seen: set[int] = set()
+        members = [m for m in members if not (m in seen or seen.add(m))]
+        if len(members) >= 2:
+            groups.append(members)
+    return groups
+
+
 def compute_quality_flags(
    rule_statement: str,
    supporting_quote: str,