feat(halacha): over-extraction consolidation — fold facets via claude_session (#81.5)
After a precedent finishes extracting, a claude_session pass folds facets of the SAME legal question (those below #82's dedup cosine: the שפר 14-vs-4 / 403-17→89 granularity gap) into one canonical. The rest are marked 'rejected', which is reversible: they leave the active corpus AND the review queue but stay recoverable. FOLD-ONLY: never merges distinct legal questions, never invents.

- Engine: claude_session-as-judge (local CLI, zero cost) at 'high' effort, because folding needs careful judgment. One pass per precedent, run in _extract_impl once all chunks are done (the prompt dedups within a chunk; this pass catches duplicates across chunks).
- Pure, unit-tested helpers in halacha_quality: CONSOLIDATE_SYSTEM, build_consolidation_prompt, parse_fold_groups (fails SAFE → [] on any malformed shape; drops <2-member groups; coerces/dedups indices).
- halacha_extractor._consolidate_precedent picks the canonical per group (approved>pending, higher confidence, quote_verified, longer) and rejects the rest via the existing update_halachot_batch (#84). Never rejects a canonical. Fails OPEN on any error (no CLI / parse fail → 0 folds, data untouched).
- config: HALACHA_CONSOLIDATE_ENABLED/MODEL/EFFORT.

Verified: suite 176 passed (10 new); integration vs dev DB: a 2-facet group folds to 1 canonical + 1 rejected (tagged), distinct rules untouched, claude error → 0 folds (fail-open).

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
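The canonical-selection rule in the message (approved>pending, higher confidence, quote_verified, longer) can be read as a lexicographic sort key. The sketch below is one possible reading, not the code in halacha_extractor: the field names `review_status`, `confidence` and `quote_verified`, the helper names, and the strict priority order are all assumptions made for illustration.

```python
from typing import Any

# Hypothetical status ranking; the real schema is not shown in this commit.
_STATUS_RANK = {"approved": 1, "pending": 0}


def canonical_sort_key(h: dict[str, Any]) -> tuple[int, float, int, int]:
    """Higher tuple wins: status, then confidence, then verified quote, then length."""
    return (
        _STATUS_RANK.get(h.get("review_status"), -1),
        float(h.get("confidence") or 0.0),
        1 if h.get("quote_verified") else 0,
        len(h.get("rule_statement") or ""),
    )


def split_group(
    members: list[dict[str, Any]],
) -> tuple[dict[str, Any], list[dict[str, Any]]]:
    """Return (canonical, to_reject) for one fold group; the canonical is never rejected."""
    canonical = max(members, key=canonical_sort_key)
    rejected = [m for m in members if m is not canonical]
    return canonical, rejected


group = [
    {"halacha_index": 2, "review_status": "pending", "confidence": 0.9,
     "quote_verified": True, "rule_statement": "short"},
    {"halacha_index": 5, "review_status": "approved", "confidence": 0.7,
     "quote_verified": False, "rule_statement": "a longer statement"},
]
canonical, rejected = split_group(group)
print(canonical["halacha_index"], [m["halacha_index"] for m in rejected])
```

Under this reading an approved item beats a pending one even when the pending item has higher confidence, because status is compared first.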
@@ -185,6 +185,66 @@ def parse_nli_verdicts(raw, n: int) -> list[str]:
    return out


# ── Over-extraction consolidation (fold facets of one legal question) — #81.5 ──
#
# #82 dedup-on-insert removes near-EXACT dups (cosine ≥ 0.93). #81.5 handles the
# remaining over-extraction: facets of the SAME legal question, phrased
# differently, that sit BELOW the dedup threshold (the שפר 14-vs-4 / 403-17→89
# granularity gap). A per-precedent claude_session pass groups such facets; the
# extractor keeps one canonical per group and marks the rest rejected (reversible,
# out of the active corpus + review queue). FOLD-ONLY — never merges distinct
# legal questions, never invents. Fails OPEN (parse error → no folds).

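# English gloss of the Hebrew system prompt (the Hebrew text is what the judge receives):
#   You merge duplicate facets of halachot extracted from the same ruling. Given a numbered
#   list, identify groups that are the SAME legal question in different wordings or facets.
#   Rules: (1) merge only halachot answering exactly the same legal question; (2) do NOT merge
#   halachot answering different legal questions, even if close in topic; (3) leave a unique
#   halacha out of every group. Return a JSON array of index groups (>= 2 members each),
#   e.g. [[2,5,9],[14,18]], or [] if nothing merges. No markdown, no explanation.
CONSOLIDATE_SYSTEM = (
    "אתה מאחד פנים-כפולים של הלכות שחולצו מאותו פסק דין. בהינתן רשימה ממוספרת של הלכות, "
    "זהה קבוצות של הלכות שהן **אותה שאלה משפטית** בניסוחים או פנים שונים. "
    "כללים: (1) אַחֵד רק הלכות שעונות על אותה שאלה משפטית בדיוק; (2) **אל תאַחֵד** הלכות "
    "שעונות על שאלות משפטיות שונות (גם אם קרובות בנושא); (3) הלכה ייחודית — אל תכלול בשום קבוצה. "
    'החזר JSON array של קבוצות, כל קבוצה = array של מספרי-האינדקס שיש לאַחֵד (לפחות 2 חברים). '
    "לדוגמה: [[2,5,9],[14,18]]. אם אין מה לאַחֵד החזר []. ללא markdown, ללא הסבר."
)

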
def build_consolidation_prompt(items: list[dict]) -> str:
    """Numbered list of a precedent's halachot (index + rule + reasoning)."""
    blocks = []
    for h in items:
        idx = h.get("halacha_index")
        rule = (h.get("rule_statement") or "").strip()
        reason = (h.get("reasoning_summary") or "").strip()
        line = f"[{idx}] {rule}"
        if reason:
            # "היגיון" is Hebrew for "reasoning"; appended only when a summary exists.
            line += f" (היגיון: {reason})"
        blocks.append(line)
    return "\n".join(blocks)


def parse_fold_groups(raw) -> list[list[int]]:
    """Coerce judge output into a list of fold-groups (≥2 int indices each).

    Fails SAFE: a non-list top level → [] (no folding). Non-list groups,
    members that cannot be coerced to int, and groups left with <2 distinct
    members are dropped.
    """
    if not isinstance(raw, list):
        return []
    groups: list[list[int]] = []
    for g in raw:
        if not isinstance(g, list):
            continue
        members: list[int] = []
        for x in g:
            try:
                members.append(int(x))
            except (TypeError, ValueError, OverflowError):
                # OverflowError covers int(float("inf")), which json.loads can
                # produce from a bare Infinity token.
                continue
        # dedup within group, preserve order
        seen: set[int] = set()
        members = [m for m in members if not (m in seen or seen.add(m))]
        if len(members) >= 2:
            groups.append(members)
    return groups


def compute_quality_flags(
    rule_statement: str,
    supporting_quote: str,
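To see the fail-safe parsing end to end, here is a minimal standalone sketch of how a judge's text reply might be turned into fold groups. `groups_from_reply` is hypothetical glue that does not appear in this commit, and `parse_fold_groups` below is a condensed stand-in for the helper in the diff (it also guards against `OverflowError`, which is this sketch's own choice).

```python
import json


def parse_fold_groups(raw) -> list[list[int]]:
    """Condensed stand-in for the helper in this diff, so the sketch runs alone."""
    if not isinstance(raw, list):
        return []
    groups: list[list[int]] = []
    for g in raw:
        if not isinstance(g, list):
            continue
        members: list[int] = []
        for x in g:
            try:
                n = int(x)
            except (TypeError, ValueError, OverflowError):
                continue
            if n not in members:  # dedup, preserving first-seen order
                members.append(n)
        if len(members) >= 2:
            groups.append(members)
    return groups


def groups_from_reply(reply_text) -> list[list[int]]:
    """Hypothetical glue: decode the judge's reply, failing open to no folds."""
    try:
        raw = json.loads(reply_text)
    except (TypeError, ValueError):  # json.JSONDecodeError subclasses ValueError
        return []
    return parse_fold_groups(raw)


well_formed = groups_from_reply("[[2, 5, 9], [14, 18]]")
messy = groups_from_reply('[[2, "5", 2], [7], "x"]')
not_json = groups_from_reply("not json")
print(well_formed, messy, not_json)
```

In the messy reply the string index is coerced, the repeated index is dropped, and both the one-member group and the non-list entry are discarded, so only one group survives. A reply that is not JSON at all yields no groups, which matches the fail-open behavior described in the commit message.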