feat(halacha): over-extraction consolidation — fold facets via claude_session (#81.5)

After a precedent finishes extracting, a claude_session pass folds facets of the
SAME legal question into one canonical halacha. These facets are phrased
differently enough to fall below #82's dedup cosine threshold (the שפר 14-vs-4 /
403-17→89 granularity gap). The rest of each group is marked 'rejected', which is
reversible: out of the active corpus AND the review queue, but recoverable.
FOLD-ONLY: never merges distinct legal questions, never invents.

- Engine: claude_session-as-judge (local CLI, zero cost) at 'high' effort, since
  folding needs careful judgment. One pass per precedent, run in _extract_impl
  once all chunks are done (the prompt dedups within a chunk; this pass catches
  duplicates across chunks).
- Pure, unit-tested helpers in halacha_quality: CONSOLIDATE_SYSTEM,
  build_consolidation_prompt, parse_fold_groups (fails SAFE: a non-list top
  level → []; skips malformed groups and non-int members; drops <2-member
  groups; coerces/dedups indices).
- halacha_extractor._consolidate_precedent picks the canonical per group
  (approved>pending, higher confidence, quote_verified, longer) and rejects the
  rest via the existing update_halachot_batch (#84). Never rejects a canonical.
  Fails OPEN on any error (no CLI / parse fail → 0 folds, data untouched).
- config: HALACHA_CONSOLIDATE_ENABLED/MODEL/EFFORT.
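
A minimal sketch of the canonical-pick ordering described in the list above, not
the actual _consolidate_precedent code. The field name review_status, the helper
names canonical_sort_key / split_group, and reading "longer" as a longer
rule_statement are assumptions made for illustration.

```python
from typing import Any

# Assumed status ranking: approved outranks pending; anything else ranks lowest.
_STATUS_RANK = {"approved": 2, "pending": 1}


def canonical_sort_key(h: dict[str, Any]) -> tuple[int, float, int, int]:
    """Bigger tuple = better canonical candidate (field names are assumptions)."""
    return (
        _STATUS_RANK.get(h.get("review_status"), 0),  # approved > pending
        float(h.get("confidence") or 0.0),            # higher confidence
        1 if h.get("quote_verified") else 0,          # verified quote preferred
        len(h.get("rule_statement") or ""),           # longer statement as tiebreak
    )


def split_group(
    members: list[dict[str, Any]],
) -> tuple[dict[str, Any], list[dict[str, Any]]]:
    """Return (canonical, rest); the canonical is never part of `rest`."""
    canonical = max(members, key=canonical_sort_key)
    rest = [h for h in members if h is not canonical]
    return canonical, rest
```

Comparing tuples applies the criteria in priority order, so a later criterion
only matters when all earlier ones tie.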

Verified: suite 176 passed (10 new); integration vs dev DB — a 2-facet group
folds to 1 canonical + 1 rejected (tagged), distinct rules untouched, claude
error → 0 folds (fail-open).
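
The fail-open behaviour checked above can be sketched roughly as follows. This is
an illustration under assumptions, not the committed code: call_judge is a
hypothetical stand-in for the claude_session call, and the inline validation is a
simplified substitute for parse_fold_groups (it drops a whole group if any member
is not an int, rather than coercing members).

```python
import json
from typing import Callable


def fold_groups_or_empty(
    call_judge: Callable[[str], str], prompt: str
) -> list[list[int]]:
    """Ask the judge for fold groups; on any failure return [] (nothing folds)."""
    try:
        raw = json.loads(call_judge(prompt))
    except Exception:
        # No CLI, timeout, or non-JSON output: fail open, data stays untouched.
        return []
    if not isinstance(raw, list):
        return []
    groups: list[list[int]] = []
    for g in raw:
        if not isinstance(g, list):
            continue
        # Simplified check: every member must be a real int (bool excluded).
        if not all(isinstance(x, int) and not isinstance(x, bool) for x in g):
            continue
        uniq = list(dict.fromkeys(g))  # dedup within group, preserve order
        if len(uniq) >= 2:
            groups.append(uniq)
    return groups
```

With a judge that raises or returns non-JSON text, the function yields an empty
list, so the caller rejects nothing.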

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
2026-06-03 16:26:44 +00:00
parent 5efb8cf915
commit fb60dca796
4 changed files with 176 additions and 3 deletions

@@ -185,6 +185,66 @@ def parse_nli_verdicts(raw, n: int) -> list[str]:
    return out


# ── Over-extraction consolidation (fold facets of one legal question) — #81.5 ──
#
# #82 dedup-on-insert removes near-EXACT dups (cosine ≥ 0.93). #81.5 handles the
# remaining over-extraction: facets of the SAME legal question, phrased
# differently, that sit BELOW the dedup threshold (the שפר 14-vs-4 / 403-17→89
# granularity gap). A per-precedent claude_session pass groups such facets; the
# extractor keeps one canonical per group and marks the rest rejected (reversible,
# out of the active corpus + review queue). FOLD-ONLY — never merges distinct
# legal questions, never invents. Fails OPEN (parse error → no folds).
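# English gloss of the Hebrew system prompt below: "You consolidate duplicate
# facets of halachot extracted from the same ruling. Given a numbered list,
# identify groups that are the SAME legal question in different wordings.
# Rules: (1) merge only halachot answering exactly the same legal question;
# (2) do NOT merge different legal questions, even on a close topic; (3) leave a
# unique halacha out of every group. Return a JSON array of groups, each an array
# of indices (at least 2), e.g. [[2,5,9],[14,18]]; return [] if nothing to merge.
# No markdown, no explanation."
CONSOLIDATE_SYSTEM = (
    "אתה מאחד פנים-כפולים של הלכות שחולצו מאותו פסק דין. בהינתן רשימה ממוספרת של הלכות, "
    "זהה קבוצות של הלכות שהן **אותה שאלה משפטית** בניסוחים או פנים שונים. "
    "כללים: (1) אַחֵד רק הלכות שעונות על אותה שאלה משפטית בדיוק; (2) **אל תאַחֵד** הלכות "
    "שעונות על שאלות משפטיות שונות (גם אם קרובות בנושא); (3) הלכה ייחודית — אל תכלול בשום קבוצה. "
    'החזר JSON array של קבוצות, כל קבוצה = array של מספרי-האינדקס שיש לאַחֵד (לפחות 2 חברים). '
    "לדוגמה: [[2,5,9],[14,18]]. אם אין מה לאַחֵד החזר []. ללא markdown, ללא הסבר."
)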


def build_consolidation_prompt(items: list[dict]) -> str:
    """Numbered list of a precedent's halachot (index + rule + reasoning)."""
    blocks = []
    for h in items:
        idx = h.get("halacha_index")
        rule = (h.get("rule_statement") or "").strip()
        reason = (h.get("reasoning_summary") or "").strip()
        line = f"[{idx}] {rule}"
        if reason:
            # "היגיון" is Hebrew for "reasoning"
            line += f" (היגיון: {reason})"
        blocks.append(line)
    return "\n".join(blocks)


def parse_fold_groups(raw) -> list[list[int]]:
    """Coerce judge output into a list of fold-groups (≥2 int indices each).

    Fails SAFE: a non-list top level → [] (no folding). Non-list groups and
    members that cannot be coerced to int are skipped; groups left with fewer
    than 2 distinct indices are dropped.
    """
    if not isinstance(raw, list):
        return []
    groups: list[list[int]] = []
    for g in raw:
        if not isinstance(g, list):
            continue
        members: list[int] = []
        for x in g:
            try:
                members.append(int(x))
            except (TypeError, ValueError, OverflowError):
                # OverflowError covers int(float("inf")); skip it like other bad members
                continue
        # dedup within group, preserve order
        members = list(dict.fromkeys(members))
        if len(members) >= 2:
            groups.append(members)
    return groups


def compute_quality_flags(
    rule_statement: str,
    supporting_quote: str,