fix(halacha): split authority (derived) from rule_role — stop source-conflation (INV-DM7)

The extractor classified rule_type by SOURCE bindingness (higher-court→binding, committee→persuasive) instead of by rule KIND. The gold-set proved it: 'binding' appeared on 19/19 external rulings & 0 committees; 'persuasive' on 13/13 committees & 0 external — only 58% agreement with the human role tags. The two axes (authority vs rule role) were crammed into one enum. This splits them per INV-DM7: - authority (binding/persuasive) — DERIVED from case_law.precedent_level (עליון/מנהלי→binding, ועדת_ערר_מחוזית→persuasive), never stored, never LLM-guessed. New helper halacha_quality.derive_authority; surfaced read-only in list_halachot / goldset_list / search results. - rule_type — now the rule ROLE only: holding/interpretive/procedural/ application/obiter. Both extractor prompts unified to this vocabulary; _coerce_halacha no longer defaults rule_type from the source; legacy binding→holding / persuasive→interpretive fold for safety. UI: authority shown as a separate read-only badge (gold=מחייב / muted=משכנע) across the review queue, precedent detail, and gold-set; the gold-set role selector drops binding/persuasive and adds מהותי (holding). Migration: scripts/halacha_rule_role_backfill.py re-classifies the 276 pre-split binding/persuasive rows into a genuine role via local claude_session (run after deploy). Gold-set correct_type/ai_correct_type 'binding'→'holding' via SQL. Sources (≥3, per research-decision policy): OASIS LegalRuleML v1.0 (appliesAuthority/Strength as metadata orthogonal to rule logic) · SemEval-2023 Task 6 LegalEval (rhetorical roles by function, authority kept separate) · Bluebook signals (weight-of-authority is a separate dimension). Invariants: ESTABLISHES INV-DM7. Upholds G1 (normalize at source — extractor classifies role, system derives authority) and G2 (single source of truth — authority derived, not a parallel stored field). Tests: 211 pass + new derive_authority/coerce coverage. web-ui build + tsc clean. Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
2026-06-07 18:18:41 +00:00
parent 955675eb1f
commit 2e33cac043
16 changed files with 407 additions and 92 deletions
--- a/mcp-server/src/legal_mcp/services/halacha_quality.py
+++ b/mcp-server/src/legal_mcp/services/halacha_quality.py
@@ -18,6 +18,37 @@ from __future__ import annotations

 import re

+# ── Authority axis — DERIVED from the source, never LLM-classified (INV-DM7) ──
+#
+# A halacha's *authority* (binding vs persuasive) is a property of WHERE it came
+# from, not of the rule's content. It is therefore derived deterministically
+# from ``case_law.precedent_level`` and never stored on ``halachot`` or guessed
+# by the extractor — keeping it orthogonal to ``rule_type`` (the rule ROLE).
+# Higher courts (עליון/מנהלי) bind the appeals committee; another committee is
+# only persuasive. See docs/spec/02-data-model.md INV-DM7.
+
+AUTHORITY_BINDING = "binding"
+AUTHORITY_PERSUASIVE = "persuasive"
+
+_BINDING_LEVELS = {"עליון", "מנהלי"}
+_PERSUASIVE_LEVELS = {"ועדת_ערר_מחוזית"}
+
+
+def derive_authority(precedent_level: str | None) -> str | None:
+    """Map a source's precedent_level to its authority over the committee.
+
+    Returns ``"binding"`` for higher courts (עליון/מנהלי), ``"persuasive"`` for
+    another appeals committee (ועדת_ערר_מחוזית), or ``None`` when the level is
+    unknown/empty (never guesses). Pure — the single source of truth for the
+    authority axis (INV-DM7).
+    """
+    level = (precedent_level or "").strip()
+    if level in _BINDING_LEVELS:
+        return AUTHORITY_BINDING
+    if level in _PERSUASIVE_LEVELS:
+        return AUTHORITY_PERSUASIVE
+    return None
+
 # ── Hebrew text normalization (shared with the extractor's quote check) ──

 _HEB_QUOTE_VARIANTS = "\"'׳״‘’“”«»„′″"
@@ -337,7 +368,7 @@ def compute_quality_flags(
    supporting_quote: str,
    reasoning_summary: str = "",
    quote_verified: bool = True,
-    rule_type: str = "binding",
+    rule_type: str = "interpretive",
 ) -> list[str]:
    """Return the list of quality flags for one halacha (empty == clean).