feat(X13 Tier-0): decode supremedecisions API — fetch serial-format Supreme verdicts

The 211 open missing_precedents include 99 Supreme serial-format rulings
(בג"ץ/בר"מ/עע"מ NNNN/YY) with no נט-format triple — fetchable only from
supremedecisions.court.gov.il. Decoded its public JSON API (no browser, no
CAPTCHA, no smart-card); validated live on בג"ץ 3483/05 + בר"מ 10212/16.

- court_fetch_supreme.py: rewrite. POST Home/SearchVerdicts with a structured
  `document` ({Year:"YYYY", CaseNum, OldMainNumFormat:true, SearchText:[…]}) +
  X-Requested-With header → records; GET Home/Download?path=&fileName=&type=4 →
  PDF. The earlier attempt failed only on the request shape (string vs object).
  2-digit→4-digit year; try candidate docs best-first (פסק-דין→pages), skipping
  the published-report 's'-prefix files the free endpoint WAF-blocks.
- orchestrator: on successful ingest, close matching open missing_precedents
  (link to the new case_law). End-to-end validated (בר"מ 10212/16 → corpus).
- backfill_missing_precedents.py: enqueue fetchable open gaps (supreme + net)
  into court_fetch_jobs; the drainer fetches+ingests+closes. dry-run default.
- X13 spec + SCRIPTS.md updated (Tier-0 decoded, no longer a limitation).

Very old un-digitized Supreme cases (e.g. בג"ץ 389/87 → 0 records) → manual.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
This commit is contained in:
2026-06-08 06:53:31 +00:00
parent 36319a8d75
commit 8d2f1ea0a2
5 changed files with 220 additions and 122 deletions

View File

@@ -0,0 +1,56 @@
"""Backfill: enqueue publicly-fetchable open missing_precedents for auto-fetch.
The citation graph records cited-but-absent rulings in ``missing_precedents``.
The ones with a public source — Supreme serial (בג"ץ/בר"מ/עע"מ NNNN/YY) → Tier-0
supremedecisions; district/Supreme with a נט-format triple → Tier-1 נט המשפט —
can be fetched + ingested automatically. ועדת-ערר (needs Nevo) and serial cases
with no public record are left for the chair.
This stamps a ``court_fetch_jobs`` row for each fetchable gap; the court-fetch
drainer (``drain_court_fetch.py`` / pm2 cron) then fetches, ingests, and closes
the gap. Idempotent (upsert on the canonical case number).
scripts/backfill_missing_precedents.py # dry-run (report only)
scripts/backfill_missing_precedents.py --apply # enqueue
"""
import asyncio
import os
import sys
sys.path.insert(0, os.path.join(os.path.dirname(__file__), "..", "mcp-server", "src"))
from legal_mcp.services import court_citation, db
async def main() -> int:
apply = "--apply" in sys.argv
gaps = await db.list_missing_precedents(status="open", limit=2000)
enq = skipped = 0
by_tier: dict[str, int] = {}
for g in gaps:
cit = court_citation.classify(g.get("citation", ""))
net = bool(cit.file_number and cit.month and cit.year)
# Fetchable: Supreme serial (Tier-0) or anything with a נט triple (Tier-1).
if cit.tier == "supreme" or (cit.tier == "admin" and net):
route = "Tier-0/supreme" if (cit.tier == "supreme" and not net) else "Tier-1/net"
by_tier[route] = by_tier.get(route, 0) + 1
if apply:
await db.court_fetch_job_upsert(
case_number_norm=cit.case_number_norm,
citation_raw=g.get("citation", ""),
tier=cit.tier, court=cit.court_prefix,
)
enq += 1
else:
skipped += 1
verb = "enqueued" if apply else "would enqueue"
print(f"{verb}: {enq} (routes: {by_tier})", flush=True)
print(f"skipped (ועדת-ערר/serial-no-record/unrecognized): {skipped}", flush=True)
if not apply:
print("dry-run — re-run with --apply to enqueue.", flush=True)
return 0
if __name__ == "__main__":
sys.exit(asyncio.run(main()))