feat(precedents): minimum-effort upload — file+citation, rest auto-extracted

The missing-precedents drawer and the general precedent upload both
required the user to type chair_name, district, practice_area, court,
date, etc. up front, even though the LLM can extract those fields from
the document text (and already does so post-upload). The
metadata-extraction wakeup also fired only for the
/precedent-library/upload path, leaving missing-precedents committee
uploads stuck with whatever stub the user typed.

Changes:
- Extractor learns chair_name + district, overwrites the new
  PLACEHOLDER_PENDING_EXTRACTION sentinel for internal_committee rows
  (the DB CHECK forces non-empty; we stamp the placeholder at insert).
- missing_precedent_upload no longer 400s on missing chair/district;
  it infers district from the citation when possible, falls back to
  the placeholder, and always fires pc_wake_for_precedent_extraction
  so the LLM can fill in the rest.
- Both upload sheets default to file (+ citation) only; every other
  field is tucked into a closed <details> labeled "אופציונלי — דריסה
  ידנית של שדות שיחולצו אוטומטית" ("Optional: manual override of
  fields that will be extracted automatically"). Required validators
  on chair/district/practice_area are dropped; the LLM fills them.
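
The district fallback described in the second bullet can be sketched as below. This is a minimal illustration, not the code in this commit: `district_from_text` is a hypothetical stand-in for `int_decisions_service._district_from_court`, the lookup table and district values are invented, and the sentinel's string value is assumed because the real one is not shown here.

```python
from typing import Optional

# Assumed value: the real PLACEHOLDER_PENDING_EXTRACTION string is not shown.
PLACEHOLDER_PENDING_EXTRACTION = "PENDING_EXTRACTION"

# Hypothetical lookup table; the real district names/keys may differ.
_KNOWN_DISTRICTS = {"תל אביב": "tel_aviv", "ירושלים": "jerusalem"}


def district_from_text(text: str) -> Optional[str]:
    """Hypothetical stand-in for int_decisions_service._district_from_court."""
    for needle, district in _KNOWN_DISTRICTS.items():
        if needle in text:
            return district
    return None


def resolve_district(district: str, court: str, citation: str) -> str:
    """Prefer user input, then the court, then the citation, then the sentinel."""
    return (
        district.strip()
        or district_from_text(court)
        or district_from_text(citation)
        or PLACEHOLDER_PENDING_EXTRACTION
    )
```

Chaining with `or` keeps the precedence explicit: an explicit user value always wins, and the sentinel is used only when nothing can be inferred, so the non-empty DB CHECK is still satisfied.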

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-05-26 14:43:25 +00:00
parent b01722b1b4
commit a02a4e3a64
4 changed files with 291 additions and 225 deletions


@@ -4416,6 +4416,9 @@ async def _process_training_document(task_id: str, source: Path, req: ClassifyRe
 # corpus) and /api/cases/{n}/precedents (chair-attached quotes).
 from legal_mcp.services import precedent_library as plib_service  # noqa: E402
+from legal_mcp.services.precedent_metadata_extractor import (  # noqa: E402
+    PLACEHOLDER_PENDING_EXTRACTION,
+)
 _PRACTICE_AREAS = {"", "rishuy_uvniya", "betterment_levy", "compensation_197"}
@@ -5237,11 +5240,20 @@ async def missing_precedent_upload(
     try:
         if is_committee:
-            if not chair_name.strip() or not district.strip():
-                raise HTTPException(
-                    400,
-                    "החלטת ועדת ערר דורשת chair_name + district",
-                )
+            # The DB CHECK forces chair_name + district to be non-empty for
+            # internal_committee rows. The UX goal is "upload file + citation
+            # only" — so if the user didn't fill those, infer district from
+            # the citation text (often contains the committee name, e.g.
+            # "ועדות ערר - תכנון ובנייה תל אביב-יפו") and fall back to a
+            # placeholder. The metadata extractor wakeup fired below will
+            # overwrite both placeholders once the LLM reads the file.
+            resolved_chair = chair_name.strip() or PLACEHOLDER_PENDING_EXTRACTION
+            resolved_district = (
+                district.strip()
+                or int_decisions_service._district_from_court(court)
+                or int_decisions_service._district_from_court(citation)
+                or PLACEHOLDER_PENDING_EXTRACTION
+            )
             # case_number for the committee decision (not the cited-in case)
             committee_case_number = case_number.strip() or citation
             result = await int_decisions_service.ingest_internal_decision(
@@ -5249,8 +5261,8 @@ async def missing_precedent_upload(
                 case_name=(case_name.strip() or mp.get("case_name") or "").strip(),
                 court=court.strip(),
                 decision_date=decision_date or None,
-                chair_name=chair_name.strip(),
-                district=district.strip(),
+                chair_name=resolved_chair,
+                district=resolved_district,
                 practice_area=practice_area,
                 appeal_subtype=appeal_subtype.strip(),
                 subject_tags=tags,
@@ -5298,6 +5310,21 @@ async def missing_precedent_upload(
         except Exception as e:
             logger.exception("missing-precedent close failed")
             raise HTTPException(500, f"קישור הרשומה נכשל: {e}")
+        # Fire metadata-extraction wakeup so the placeholder fields above
+        # (and any other empty user-supplied fields) get filled in from the
+        # file's text. Best-effort: mirrors the precedent_library_upload
+        # contract — failures are logged, not surfaced.
+        try:
+            await pc_wake_for_precedent_extraction(
+                case_law_id=case_law_id,
+                citation=citation,
+                practice_area=practice_area,
+            )
+        except Exception:
+            logger.exception(
+                "missing-precedent: precedent-extraction wakeup failed (non-fatal)"
+            )
     finally:
         staged.unlink(missing_ok=True)
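
The best-effort contract in the last hunk (a failed wakeup is logged but never fails the upload) can be illustrated in isolation. This is a sketch under stated assumptions: `wake_extractor` and `upload` are hypothetical names, and the stand-in always raises to simulate an unavailable extractor; the real `pc_wake_for_precedent_extraction` is not reproduced here.

```python
import asyncio
import logging

logger = logging.getLogger("upload_sketch")


async def wake_extractor(case_law_id: str) -> None:
    """Hypothetical stand-in for pc_wake_for_precedent_extraction; always fails."""
    raise RuntimeError("extractor unavailable")


async def upload(case_law_id: str) -> dict:
    """Store the record, then try to wake the extractor without failing the upload."""
    result = {"case_law_id": case_law_id, "status": "stored"}
    try:
        await wake_extractor(case_law_id)
    except Exception:
        # Non-fatal: the row keeps its placeholder fields until a later run.
        logger.exception("precedent-extraction wakeup failed (non-fatal)")
    return result


outcome = asyncio.run(upload("demo-1"))
```

The trade-off is that a swallowed failure leaves placeholder values in the row, so some later retry or manual override is presumably still needed for those uploads.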