feat(precedents): metadata auto-fill, edit sheet, persuasive extraction
All checks were successful
Build & Deploy / build-and-deploy (push) Successful in 1m28s

Three improvements to the precedent library based on usage feedback:

1. Auto-fill metadata at upload time. New service
   precedent_metadata_extractor reads the ruling's full_text and
   suggests case_name (short), summary, headnote, key_quote,
   subject_tags, appeal_subtype. The merge policy fills only empty
   fields, preserving everything the chair typed in the upload form.
   Wired into the ingest pipeline; also exposed as a re-run endpoint
   POST /api/precedent-library/{id}/extract-metadata for existing
   records.

2. Edit sheet in the UI. Pencil icon on each library row opens a
   pre-populated form covering every field. A Sparkles button on the
   sheet runs the metadata extractor on demand and refreshes the
   form. The case_number is read-only because halachot are FK'd to
   it; renaming requires delete + re-upload.

3. Halacha extractor branches on is_binding. Sources marked binding
   (Supreme/Administrative) keep the strict halacha prompt. Non-binding
   sources (other appeals committees, district courts on planning
   matters) get a different prompt that extracts applications,
   interpretive principles, and persuasive conclusions — labeled with
   new rule_types 'application' and 'persuasive'. The fallback also
   widens chunk selection: if the chunker labeled nothing as
   legal_analysis/ruling/conclusion, we now run on all chunks rather
   than returning zero halachot for a usable ruling.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
This commit is contained in:
2026-05-03 10:19:35 +00:00
parent b51163b67c
commit 73a79ea7e8
10 changed files with 841 additions and 21 deletions

View File

@@ -0,0 +1,216 @@
"""Auto-extract precedent metadata from a freshly-uploaded ruling.
Runs after chunking. Reads the precedent's full_text and asks Claude to
fill in the metadata fields that an upload form usually leaves empty:
short case_name, summary, headnote, key_quote, subject_tags,
appeal_subtype.
Caller policy: only empty user-supplied fields are filled. Anything the
chair already typed in the upload form is preserved. This is enforced
in ``apply_to_record``.
"""
from __future__ import annotations
import logging
from uuid import UUID
from legal_mcp.config import parse_llm_json
from legal_mcp.services import claude_session, db
logger = logging.getLogger(__name__)
# The prompt is short — we only need the first 12K chars of the ruling
# (header + opening of discussion is enough for naming + summary). For
# subject tags we sample the discussion section too.
_HEAD_CHARS = 12_000
_TAIL_CHARS = 6_000
METADATA_EXTRACTION_PROMPT = """אתה מסייע משפטי בכיר. קרא את פסק הדין/ההחלטה הבא וחלץ ממנו מטא-דאטה לקטלוג הקורפוס.
המטרה: למלא שדות בטופס העלאה שהמשתמש הזין באופן חלקי. **אל תמציא** — אם המידע לא מופיע בטקסט, השאר ריק (מחרוזת ריקה / מערך ריק).
## פלט נדרש
החזר JSON אחד (object — לא array) בפורמט הבא, ללא markdown וללא הסברים:
{
"case_name_short": "שם קצר ל-3-6 מילים (למשל 'אהרון ברק' או 'ב. קרן-נכסים'). אל תכלול מספר תיק. שם המבקש/העורר העיקרי. אם זו החלטה מאוחדת — שם הצד המוביל.",
"appeal_subtype": "תת-סוג ספציפי בתוך תחום המשפט (למשל 'תכנית רחביה', 'מימוש במכר', 'תמ\\"א 38', 'שימוש חורג', 'סופיות ההחלטה'). מילה אחת או צירוף קצר.",
"summary": "תקציר עניני 2-3 משפטים: מה הייתה השאלה, מה הוכרע. בלי שיפוט.",
"headnote": "headnote בסגנון נבו: 1-2 משפטים שמסכמים את העיקרון שנקבע/יושם בפסק. למשל 'תכנית רחביה — היטל השבחה במימוש במכר — אין לחייב כשהזכויות צפות'.",
"key_quote": "ציטוט מילולי בודד, 30-100 מילים, שמייצג את לב הפסק. חייב להופיע מילה במילה בטקסט. אם אין ציטוט מתאים — מחרוזת ריקה.",
"subject_tags": ["תגיות", "נושא", "בעברית"]
}
## כללי איכות
1. **case_name_short** — שם בולט וקצר. בלי 'נ\\'' / 'נגד' / מספרי תיק.
2. **appeal_subtype** — אופציונלי. אם הסוגיה רחבה ולא מסווגת — השאר ריק.
3. **summary** — תיאור ניטרלי, גוף שלישי.
4. **headnote** — לא מצטטים, מסכמים. סגנון נבו: ביטוי קצר אחד.
5. **key_quote** — חייב להיות הדבקה מילולית מהקלט. אם אין ציטוט בולט — השאר ריק.
6. **subject_tags** — 3-7 תגיות בעברית, snake_case (חניה, קווי_בניין, שיקול_דעת, פגם_פרוצדורלי, סמכות, מועדים, פגיעה_במקרקעין, ירידת_ערך, תכנית_רחביה, מימוש_במכר, וכד'). שייך לתחום של ועדת ערר תכנון ובניה.
## הקלט
{context}
--- תחילת הטקסט ---
{text_window}
--- סוף הטקסט ---
"""
def _build_text_window(full_text: str) -> str:
"""Return the head + tail of the ruling, with a marker if truncated.
Most rulings have the parties/subject in the head and the conclusion
in the tail; the middle is the discussion which is captured via the
halacha extractor independently. Sending head+tail keeps the prompt
cheap while preserving naming and conclusion context.
"""
if len(full_text) <= _HEAD_CHARS + _TAIL_CHARS:
return full_text
return (
full_text[:_HEAD_CHARS]
+ "\n\n[... חלק האמצע הושמט עקב אורך — ראה את החלק האחרון של הפסק להלן ...]\n\n"
+ full_text[-_TAIL_CHARS:]
)
async def extract_metadata(case_law_id: UUID | str) -> dict:
"""Run metadata extraction. Returns a dict with the suggested values.
Does NOT write to the DB — caller decides what to merge.
"""
if isinstance(case_law_id, str):
case_law_id = UUID(case_law_id)
record = await db.get_case_law(case_law_id)
if not record:
return {}
full_text = (record.get("full_text") or "").strip()
if not full_text:
return {}
citation = record.get("case_number") or ""
court = record.get("court") or ""
date_str = str(record.get("date") or "")
practice_area = record.get("practice_area") or ""
context = (
f"מראה מקום: {citation}\n"
f"ערכאה: {court}\n"
f"תאריך: {date_str}\n"
f"תחום: {practice_area}"
)
prompt = METADATA_EXTRACTION_PROMPT.format(
context=context, text_window=_build_text_window(full_text),
)
try:
result = await claude_session.query_json(prompt)
except Exception as e:
logger.warning("precedent_metadata_extractor: query failed: %s", e)
return {}
if not isinstance(result, dict):
logger.warning(
"precedent_metadata_extractor: expected dict, got %s",
type(result).__name__,
)
return {}
# Normalize keys / types
out: dict = {}
if isinstance(result.get("case_name_short"), str):
out["case_name_short"] = result["case_name_short"].strip()
if isinstance(result.get("appeal_subtype"), str):
out["appeal_subtype"] = result["appeal_subtype"].strip()
if isinstance(result.get("summary"), str):
out["summary"] = result["summary"].strip()
if isinstance(result.get("headnote"), str):
out["headnote"] = result["headnote"].strip()
if isinstance(result.get("key_quote"), str):
out["key_quote"] = result["key_quote"].strip()
tags = result.get("subject_tags") or []
if isinstance(tags, list):
out["subject_tags"] = [str(t).strip() for t in tags if str(t).strip()]
return out
async def apply_to_record(
case_law_id: UUID | str,
suggested: dict,
) -> dict:
"""Merge suggested metadata into the case_law row, filling ONLY empty fields.
Empty rules:
- string field == "" → fill from suggested
- list field == [] → fill from suggested
- if suggested key is missing or empty, skip
case_name has special handling: if the current case_name equals the
case_number (a tell-tale sign of the upload form sending the long
citation into both fields), treat it as empty and overwrite.
"""
if isinstance(case_law_id, str):
case_law_id = UUID(case_law_id)
record = await db.get_case_law(case_law_id)
if not record:
return {"updated": False, "fields": []}
fields_to_update: dict = {}
cur_case_name = (record.get("case_name") or "").strip()
cur_case_number = (record.get("case_number") or "").strip()
suggested_case_name = (suggested.get("case_name_short") or "").strip()
if suggested_case_name and (
not cur_case_name or cur_case_name == cur_case_number
):
fields_to_update["case_name"] = suggested_case_name
if not (record.get("appeal_subtype") or "").strip():
s = (suggested.get("appeal_subtype") or "").strip()
if s:
fields_to_update["appeal_subtype"] = s
if not (record.get("summary") or "").strip():
s = (suggested.get("summary") or "").strip()
if s:
fields_to_update["summary"] = s
if not (record.get("headnote") or "").strip():
s = (suggested.get("headnote") or "").strip()
if s:
fields_to_update["headnote"] = s
if not (record.get("key_quote") or "").strip():
s = (suggested.get("key_quote") or "").strip()
if s:
fields_to_update["key_quote"] = s
cur_tags = record.get("subject_tags") or []
if not cur_tags:
sug_tags = suggested.get("subject_tags") or []
if sug_tags:
fields_to_update["subject_tags"] = sug_tags
if not fields_to_update:
return {"updated": False, "fields": []}
await db.update_case_law(case_law_id, **fields_to_update)
return {"updated": True, "fields": list(fields_to_update.keys())}
async def extract_and_apply(case_law_id: UUID | str) -> dict:
"""Convenience wrapper: extract → merge into row → return summary."""
suggested = await extract_metadata(case_law_id)
if not suggested:
return {"status": "no_metadata", "fields": []}
result = await apply_to_record(case_law_id, suggested)
return {
"status": "completed" if result["updated"] else "no_changes",
"fields": result["fields"],
"suggested": suggested,
}