feat(halacha): over-extraction consolidation — fold facets via claude_session (#81.5)

After a precedent finishes extracting, a claude_session pass folds facets of the SAME legal question (below #82's dedup cosine: the שפר 14-vs-4 / 403-17→89 granularity gap) into one canonical; the rest are marked 'rejected' (reversible: out of the active corpus AND the review queue, but recoverable). FOLD-ONLY: never merges distinct legal questions, never invents.

- Engine: claude_session-as-judge (local CLI, zero cost), 'high' effort, because folding needs careful judgment. One pass per precedent, run in _extract_impl once all chunks are done (the prompt dedups within a chunk; this catches across chunks).
- Pure, unit-tested helpers in halacha_quality: CONSOLIDATE_SYSTEM, build_consolidation_prompt, parse_fold_groups (fails SAFE → [] on any malformed shape; drops <2-member groups; coerces/dedups indices).
- halacha_extractor._consolidate_precedent picks the canonical per group (approved>pending, higher confidence, quote_verified, longer) and rejects the rest via the existing update_halachot_batch (#84). Never rejects a canonical. Fails OPEN on any error (no CLI / parse fail → 0 folds, data untouched).
- config: HALACHA_CONSOLIDATE_ENABLED/MODEL/EFFORT.

Verified: suite 176 passed (10 new); integration vs dev DB: a 2-facet group folds to 1 canonical + 1 rejected (tagged), distinct rules untouched, claude error → 0 folds (fail-open).

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
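
The fold-group parsing rules named in the message (safe on malformed input, indices coerced and deduplicated, groups with fewer than two members dropped) can be illustrated with a minimal sketch. This is not the code in halacha_quality: `parse_fold_groups_sketch` is a hypothetical name, and skipping a non-list group rather than discarding the whole reply is an assumption.

```python
from typing import Any


def parse_fold_groups_sketch(raw: Any) -> list[list[int]]:
    """Hypothetical stand-in for parse_fold_groups; returns [] when unsure."""
    if not isinstance(raw, list):
        return []  # None, a string, a dict: nothing to fold
    groups: list[list[int]] = []
    for group in raw:
        if not isinstance(group, list):
            continue  # assumption: ignore a malformed group, keep the others
        members: list[int] = []
        for value in group:
            if isinstance(value, bool):
                continue  # True/False are not halacha indices
            try:
                index = int(value)  # coerces "2" -> 2
            except (TypeError, ValueError):
                continue  # non-numeric member such as "x"
            if index not in members:  # dedup, preserving first-seen order
                members.append(index)
        if len(members) >= 2:  # a fold needs a canonical plus at least one facet
            groups.append(members)
    return groups
```

Dropping only the invalid member and then re-checking the group size is what makes a group like `[1, "x"]` disappear while a sibling group `[3, 4]` survives.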
2026-06-03 16:26:44 +00:00
parent 5efb8cf915
commit fb60dca796
4 changed files with 176 additions and 3 deletions
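
The canonical-selection order from the commit message (approved>pending, higher confidence, quote_verified, longer) could be expressed as a sort key used with `min`. The sketch below is an assumption about the shape of such a key, not the actual `_consolidation_priority`; the function names and the exact tuple layout are hypothetical, and the dict fields are the ones used in the tests of this commit.

```python
def consolidation_priority_sketch(halacha: dict) -> tuple:
    """Hypothetical sort key: the smallest tuple is the best canonical."""
    return (
        0 if halacha.get("review_status") == "approved" else 1,  # approved first
        -float(halacha.get("confidence") or 0.0),                # then higher confidence
        0 if halacha.get("quote_verified") else 1,               # then verified quotes
        -len(halacha.get("rule_statement") or ""),               # then the longer statement
    )


def split_fold_group_sketch(group: list[dict]) -> tuple[dict, list[dict]]:
    """Return (canonical, members_to_reject) for one fold group."""
    canonical = min(group, key=consolidation_priority_sketch)
    rejected = [h for h in group if h is not canonical]  # the canonical is never rejected
    return canonical, rejected


approved = {"id": "a", "review_status": "approved", "confidence": 0.7,
            "quote_verified": True, "rule_statement": "x"}
pending_hi = {"id": "b", "review_status": "pending_review", "confidence": 0.95,
              "quote_verified": True, "rule_statement": "x"}
canonical, rejected = split_fold_group_sketch([pending_hi, approved])
```

Because tuples compare element by element, review status dominates confidence, so the approved row is kept even though the pending one scores higher.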
--- a/mcp-server/tests/test_halacha_quality.py
+++ b/mcp-server/tests/test_halacha_quality.py
@@ -146,3 +146,38 @@ def test_nli_check_empty():
    import asyncio
    from legal_mcp.services import halacha_extractor as he
    assert asyncio.run(he._nli_check([])) == []
+
+
+# ── #81.5 consolidation — pure prompt + fold-group parser ──
+
+def test_build_consolidation_prompt():
+    items = [
+        {"halacha_index": 3, "rule_statement": "כלל גימל", "reasoning_summary": "כי"},
+        {"halacha_index": 7, "rule_statement": "כלל זין", "reasoning_summary": ""},
+    ]
+    p = hq.build_consolidation_prompt(items)
+    assert "[3] כלל גימל" in p and "[7] כלל זין" in p and "היגיון: כי" in p
+
+
+@pytest.mark.parametrize("raw,expected", [
+    ([[2, 5, 9], [14, 18]], [[2, 5, 9], [14, 18]]),
+    ([[2, 5], [7]], [[2, 5]]),                  # singleton group dropped
+    ([["2", "5"]], [[2, 5]]),                    # string ints coerced
+    ([[2, 2, 5]], [[2, 5]]),                     # dedup within group
+    ([], []),                                    # nothing to fold
+    ("garbage", []),                             # non-list -> safe
+    (None, []),                                  # None -> safe
+    ([[1, "x"], [3, 4]], [[3, 4]]),             # drop group that falls below 2 valid
+])
+def test_parse_fold_groups(raw, expected):
+    assert hq.parse_fold_groups(raw) == expected
+
+
+def test_consolidation_priority_prefers_approved_then_confidence():
+    from legal_mcp.services import halacha_extractor as he
+    approved = {"id": "a", "review_status": "approved", "confidence": 0.7,
+                "quote_verified": True, "rule_statement": "x"}
+    pending_hi = {"id": "b", "review_status": "pending_review", "confidence": 0.95,
+                  "quote_verified": True, "rule_statement": "x"}
+    # approved sorts before higher-confidence pending → kept as canonical
+    assert min([approved, pending_hi], key=he._consolidation_priority)["id"] == "a"