feat(halacha): over-extraction consolidation — fold facets via claude_session (#81.5)

After a precedent finishes extracting, a claude_session pass folds facets of the
SAME legal question (below #82's dedup cosine — the שפר 14-vs-4 / 403-17→89
granularity gap) into one canonical; the rest are marked 'rejected' (reversible:
out of the active corpus AND the review queue, but recoverable). FOLD-ONLY —
never merges distinct legal questions, never invents.

- Engine: claude_session-as-judge (local CLI, zero cost), 'high' effort — folding
  needs careful judgment. One pass per precedent, runs in _extract_impl once all
  chunks are done (the prompt dedups within a chunk; this catches across chunks).
- Pure, unit-tested helpers in halacha_quality: CONSOLIDATE_SYSTEM,
  build_consolidation_prompt, parse_fold_groups (fails SAFE → [] on any malformed
  shape; drops <2-member groups; coerces/dedups indices).
- halacha_extractor._consolidate_precedent picks the canonical per group
  (approved>pending, higher confidence, quote_verified, longer) and rejects the
  rest via the existing update_halachot_batch (#84). Never rejects a canonical.
  Fails OPEN on any error (no CLI / parse fail → 0 folds, data untouched).
- config: HALACHA_CONSOLIDATE_ENABLED/MODEL/EFFORT.

Verified: suite 176 passed (10 new); integration vs dev DB — a 2-facet group
folds to 1 canonical + 1 rejected (tagged), distinct rules untouched, claude
error → 0 folds (fail-open).

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
This commit is contained in:
2026-06-03 16:26:44 +00:00
parent 5efb8cf915
commit fb60dca796
4 changed files with 176 additions and 3 deletions

View File

@@ -163,6 +163,15 @@ HALACHA_NLI_ENABLED = os.environ.get("HALACHA_NLI_ENABLED", "true").lower() == "
HALACHA_NLI_MODEL = os.environ.get("HALACHA_NLI_MODEL", HALACHA_EXTRACT_MODEL)
HALACHA_NLI_EFFORT = os.environ.get("HALACHA_NLI_EFFORT", "low")
# Halacha over-extraction consolidation (#81.5) — after a precedent finishes
# extracting, a claude_session pass folds facets of the SAME legal question
# (below the #82 dedup cosine) into one canonical; the rest are marked rejected
# (reversible). Cross-chunk safety net for over-splitting. Runs through the local
# CLI (zero cost); fails OPEN. 'high' effort — folding needs careful judgment.
HALACHA_CONSOLIDATE_ENABLED = os.environ.get("HALACHA_CONSOLIDATE_ENABLED", "true").lower() == "true"
HALACHA_CONSOLIDATE_MODEL = os.environ.get("HALACHA_CONSOLIDATE_MODEL", HALACHA_EXTRACT_MODEL)
HALACHA_CONSOLIDATE_EFFORT = os.environ.get("HALACHA_CONSOLIDATE_EFFORT", "high")
# Google Cloud Vision (OCR for scanned PDFs)
GOOGLE_CLOUD_VISION_API_KEY = os.environ.get("GOOGLE_CLOUD_VISION_API_KEY", "")