feat(precedents): metadata auto-fill, edit sheet, persuasive extraction

Three improvements to the precedent library based on usage feedback: 1. Auto-fill metadata at upload time. New service precedent_metadata_extractor reads the ruling's full_text and suggests case_name (short), summary, headnote, key_quote, subject_tags, appeal_subtype. The merge policy fills only empty fields, preserving everything the chair typed in the upload form. Wired into the ingest pipeline; also exposed as a re-run endpoint POST /api/precedent-library/{id}/extract-metadata for existing records. 2. Edit sheet in the UI. Pencil icon on each library row opens a pre-populated form covering every field. A Sparkles button on the sheet runs the metadata extractor on demand and refreshes the form. The case_number is read-only because halachot are FK'd to it; renaming requires delete + re-upload. 3. Halacha extractor branches on is_binding. Sources marked binding (Supreme/Administrative) keep the strict halacha prompt. Non-binding sources (other appeals committees, district courts on planning matters) get a different prompt that extracts applications, interpretive principles, and persuasive conclusions — labeled with new rule_types 'application' and 'persuasive'. The fallback also widens chunk selection: if the chunker labeled nothing as legal_analysis/ruling/conclusion, we now run on all chunks rather than returning zero halachot for a usable ruling. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-05-03 10:19:35 +00:00
parent b51163b67c
commit 73a79ea7e8
10 changed files with 841 additions and 21 deletions
--- a/mcp-server/src/legal_mcp/services/halacha_extractor.py
+++ b/mcp-server/src/legal_mcp/services/halacha_extractor.py
@@ -41,7 +41,23 @@ CHUNK_RETRY_ATTEMPTS = 1
 EXTRACTABLE_SECTIONS = ("legal_analysis", "ruling", "conclusion")


-HALACHA_EXTRACTION_PROMPT = """אתה משפטן בכיר המתמחה בדיני תכנון ובניה (ועדות ערר, היטל השבחה, פיצויים לפי סעיף 197 לחוק התכנון והבניה). תפקידך: לחלץ הלכות מחייבות מתוך פסק דין/החלטה משפטית.
+# Two prompts — choose by source's is_binding flag.
+#
+# The binding prompt extracts strict halachot (rules a future panel MUST
+# follow). It rejects obiter dicta, factual findings, and citations of
+# other rulings that the present court only mentioned in passing.
+#
+# The persuasive prompt is for sources that don't establish binding law
+# (most appeals committee decisions, district courts on planning matters,
+# etc.). For those, the value is in **how the panel reasoned and applied**
+# established law to facts — not in new halachot. The user explicitly
+# wants to be able to cite "another committee reached the same conclusion"
+# even though it is not binding.
+#
+# The schema's rule_type field accepts six values:
+#   binding | interpretive | procedural | obiter | application | persuasive
+
+HALACHA_EXTRACTION_PROMPT_BINDING = """אתה משפטן בכיר המתמחה בדיני תכנון ובניה (ועדות ערר, היטל השבחה, פיצויים לפי סעיף 197 לחוק התכנון והבניה). תפקידך: לחלץ הלכות מחייבות מתוך פסק דין/החלטה משפטית של ערכאה עליונה (עליון / מנהלי).

 ## הגדרות מחייבות

@@ -94,8 +110,60 @@ HALACHA_EXTRACTION_PROMPT = """אתה משפטן בכיר המתמחה בדינ
 """


+HALACHA_EXTRACTION_PROMPT_PERSUASIVE = """אתה משפטן בכיר המתמחה בדיני תכנון ובניה. תפקידך: לחלץ עקרונות, יישומים ומסקנות מתוך החלטה של ועדת ערר אחרת או של בית משפט שאינו ערכאה עליונה לסוגיה.
+
+## חשוב — מה לחלץ ומה לא
+
+המקור הזה **אינו** מקור להלכות מחייבות חדשות (binding rules). הלכות מחייבות מגיעות מהעליון/מנהלי. עם זאת, יש כאן ערך משמעותי שצריך לחלץ — איך הפנל הזה ניתח ויישם את הדין הקיים. כשנכתוב החלטה עתידית, נצטט מהמקור הזה כ"גם ועדת הערר ב-X הגיעה למסקנה דומה" — לא כסמכות מחייבת, אלא כתמיכה משכנעת.
+
+**יש לחלץ:**
+- **יישום של הלכה ידועה** (rule_type=`application`) — הפנל החיל הלכה ידועה (של עליון/מנהלי) על עובדות הנידונות. תצטט את ניסוח הכלל **כפי שהוצג כאן** (לא בהכרח כפי שנקבע במקור) ואת התוצאה.
+- **עקרון פרשני שאומץ** (rule_type=`interpretive`) — איך הפנל פירש סעיף חוק / תכנית, באופן שניתן לאמץ.
+- **כלל פרוצדורלי** (rule_type=`procedural`) — קביעות בנושאי סמכות, מועדים, הליך.
+- **מסקנה מנומקת ומשכנעת** (rule_type=`persuasive`) — מסקנה שלמה של הפנל בסוגיה, עם ההיגיון התומך, ניתנת לציטוט כאסמכתא משכנעת.
+
+**אין לחלץ:**
+- ממצאים עובדתיים ספציפיים לתיק ("העורר לא הוכיח X").
+- ציטוטים מפסקי דין אחרים ללא ניתוח של הפנל.
+- אמרות אגב חסרות חשיבות.
+
+## תחומים אפשריים (practice_areas) — תחומי ועדת הערר בלבד
+- rishuy_uvniya — רישוי ובניה (תיקי 1xxx: היתרים, שימוש חורג, תכניות, קווי בניין, גובה, חניה)
+- betterment_levy — היטל השבחה (תיקי 8xxx: שומה, מערכות, תכניות המקנות בה, מועד קובע, סופיות ההחלטה)
+- compensation_197 — פיצויים לפי ס' 197 (תיקי 9xxx: פגיעה במקרקעין, ירידת ערך, ס' 200/פטור)
+
+## פלט נדרש
+החזר JSON array בלבד, ללא markdown, ללא הסברים:
+[
+  {
+    "rule_statement": "ניסוח הכלל / המסקנה / היישום בלשון משפטית מדויקת, 1-3 משפטים.",
+    "rule_type": "application",
+    "reasoning_summary": "תמצית ההיגיון של הפנל (1-2 משפטים).",
+    "supporting_quote": "ציטוט מילולי מדויק מהקלט שתומך בכלל. חייב להופיע מילה במילה.",
+    "page_reference": "פס' 12 / עמ' 8 — ככל שניתן לזהות.",
+    "practice_areas": ["betterment_levy"],
+    "subject_tags": ["מועד_קביעת_שומה", "תכנית_רחביה"],
+    "cites": ["עע\\"מ 3975/22"],
+    "confidence": 0.85
+  }
+]
+
+## כללי איכות
+1. **נאמנות מוחלטת לציטוט** — supporting_quote חייב להיות הדבקה מדויקת מהקלט. אם אין ציטוט מתאים — אל תוסיף את ההלכה.
+2. **מספר הלכות** — החלטה ארוכה של ועדת ערר יכולה להניב 2-8 פריטים (יישומים + מסקנות). אם אין מה לחלץ — החזר [].
+3. **rule_type מדויק** — application = יישום הלכה ידועה. interpretive = פרשנות. procedural = פרוצדורה. persuasive = מסקנה כללית בעלת ערך כאסמכתא.
+4. **לא לפצל יתר על המידה** — שני סעיפים זהים מבחינה רעיונית = פריט אחד.
+5. **שפה** — עברית משפטית מקצועית, גוף שלישי.
+6. **subject_tags** — 2-5 תגיות בעברית, snake_case.
+7. **confidence** — 0..1. דייק.
+"""
+
+
 _VALID_PRACTICE_AREAS = {"rishuy_uvniya", "betterment_levy", "compensation_197"}
-_VALID_RULE_TYPES = {"binding", "interpretive", "procedural", "obiter"}
+_VALID_RULE_TYPES = {
+    "binding", "interpretive", "procedural", "obiter",
+    "application", "persuasive",
+}


 def _normalize_for_comparison(text: str) -> str:
@@ -135,10 +203,13 @@ def _verify_quote(supporting_quote: str, full_text: str) -> bool:
    return False


-def _coerce_halacha(raw: dict) -> dict | None:
+def _coerce_halacha(raw: dict, is_binding: bool = True) -> dict | None:
    """Validate and normalize one LLM-returned halacha dict.

-    Returns ``None`` if the entry is missing required fields.
+    Returns ``None`` if the entry is missing required fields. ``is_binding``
+    only affects the default rule_type when the LLM returned an unknown
+    value — for binding sources we default to ``binding``, otherwise to
+    ``persuasive`` (never pretend an appeals committee created halacha).
    """
    if not isinstance(raw, dict):
        return None
@@ -147,9 +218,13 @@ def _coerce_halacha(raw: dict) -> dict | None:
    if not rule_statement or not supporting_quote:
        return None

-    rule_type = (raw.get("rule_type") or "binding").strip().lower()
+    default_rule_type = "binding" if is_binding else "persuasive"
+    rule_type = (raw.get("rule_type") or default_rule_type).strip().lower()
    if rule_type not in _VALID_RULE_TYPES:
-        rule_type = "binding"
+        rule_type = default_rule_type
+    # Guard: don't let a non-binding source produce 'binding' rule_type
+    if not is_binding and rule_type == "binding":
+        rule_type = "persuasive"

    practice_areas_raw = raw.get("practice_areas") or []
    if isinstance(practice_areas_raw, str):
@@ -191,11 +266,21 @@ async def _extract_chunk(
    chunk_index: int,
    chunk_total: int,
    context: str,
+    is_binding: bool,
 ) -> list[dict]:
-    """Run the halacha extractor on one chunk with retry."""
+    """Run the halacha extractor on one chunk with retry.
+
+    The prompt branches on ``is_binding`` so that non-binding sources
+    (other appeals committees, district courts) yield application /
+    persuasive entries rather than a forced 0-result strict halacha pass.
+    """
+    base_prompt = (
+        HALACHA_EXTRACTION_PROMPT_BINDING if is_binding
+        else HALACHA_EXTRACTION_PROMPT_PERSUASIVE
+    )
    chunk_label = f" (חלק {chunk_index + 1}/{chunk_total})" if chunk_total > 1 else ""
    prompt = (
-        f"{HALACHA_EXTRACTION_PROMPT}\n\n"
+        f"{base_prompt}\n\n"
        f"## הקלט\n"
        f"סוג קטע: {section_type}\n"
        f"{context}{chunk_label}\n\n"
@@ -241,9 +326,24 @@ async def extract(case_law_id: UUID | str) -> dict:
    if not record:
        return {"status": "not_found", "extracted": 0, "stored": 0}

+    is_binding = bool(record.get("is_binding"))
+
+    # Try the targeted sections first (legal_analysis / ruling / conclusion).
+    # If the chunker labeled everything as 'other' (common when a ruling
+    # uses non-standard headings or the section markers aren't bracketed
+    # cleanly), fall back to ALL chunks — better to over-include than to
+    # silently skip a ruling that has reasoning under an unexpected label.
    chunks = await db.list_precedent_chunks(
        case_law_id, section_types=EXTRACTABLE_SECTIONS,
    )
+    if not chunks:
+        chunks = await db.list_precedent_chunks(case_law_id)
+        if chunks:
+            logger.info(
+                "halacha_extractor: case_law=%s — no targeted sections, "
+                "falling back to all %d chunks",
+                case_law_id, len(chunks),
+            )
    if not chunks:
        await db.set_case_law_halacha_status(case_law_id, "completed")
        return {"status": "no_chunks", "extracted": 0, "stored": 0}
@@ -262,7 +362,7 @@ async def extract(case_law_id: UUID | str) -> dict:
        async with sem:
            return await _extract_chunk(
                chunk_row["content"], chunk_row["section_type"],
-                idx, len(chunks), context,
+                idx, len(chunks), context, is_binding,
            )

    chunk_results = await asyncio.gather(
@@ -281,7 +381,7 @@ async def extract(case_law_id: UUID | str) -> dict:

    cleaned: list[dict] = []
    for raw in raw_halachot:
-        coerced = _coerce_halacha(raw)
+        coerced = _coerce_halacha(raw, is_binding=is_binding)
        if coerced is None:
            continue
        coerced["quote_verified"] = _verify_quote(