feat(X13): auto-fetch court verdicts from נט המשפט → corpus (Tier 0 + scaffold)
תת-מערכת אחזור-פסיקה אוטומטי: כשיומון מצביע על פס"ד בית-משפט, מסווגים את הערכאה, מורידים מהמקור הציבורי המתאים, וקולטים דרך צינור-הקליטה הקנוני. - spec-first: docs/spec/X13-court-fetch.md (INV-CF1..CF7) + אינדקס - מסווג court_citation.py (supreme/admin/skip) + 10 בדיקות (עת"מ 46111-12-22 → admin) - Tier 0: court_fetch_supreme.py — supremedecisions API (reverse-engineered), httpx + browser-headers (אומת 200) + politeness - תור court_fetch_jobs (SCHEMA_V30) + DB helpers + court_fetch_orchestrator.py - Tier 1 scaffold: legal-court-fetch-service (aiohttp+Bearer, מראת legal-chat-service) + camofox_client (Camoufox open-source) + recaptcha_audio (Whisper מקומי) + pm2 - Tier 2 fallback חינני: manual + missing_precedent (INV-CF2/CF3 — אין drop שקט) - כלי-MCP court_verdict_fetch / court_fetch_status; SCRIPTS.md Invariants: מקיים G2 (מסלול-קליטה יחיד, INV-CF1) · G3/G1 (idempotent+נרמול, INV-CF5) · G4/§6 (אין בליעה שקטה, INV-CF2) · G10 (שער-אנושי, INV-CF3) · G5 (source_type, INV-CF6) · G9 (provenance+audit, INV-CF7). מקורות INV-CF4: RFC 9309 · Google crawler · OWASP OAT. Follow-ups (טרם אומתו חי): live Tier-0 validation · התקנת camofox-browser+whisper · כיול selectors Tier-1 · COURT_FETCH_SHARED_SECRET (Infisical+Coolify) · טריגר מ-digest try_autolink (worktree-digests-radar). V30 עלול להתנגש עם digests-radar. Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
This commit is contained in:
148
mcp-server/src/legal_mcp/court_fetch_service/camofox_client.py
Normal file
148
mcp-server/src/legal_mcp/court_fetch_service/camofox_client.py
Normal file
@@ -0,0 +1,148 @@
|
||||
"""Camoufox-browser client + נט-המשפט navigation flow (X13, Tier 1).
|
||||
|
||||
Open-source, zero-API-cost stealth browsing: a self-hosted ``camofox-browser``
|
||||
REST server (``jo-inc/camofox-browser``, wrapping Camoufox — a Firefox fork
|
||||
with C++ fingerprint spoofing) drives a real browser. We talk to it over the
|
||||
same REST surface the Hermes agent uses (``~/.hermes/.../browser_camofox.py``):
|
||||
|
||||
POST /tabs → {tab_id}
|
||||
POST /tabs/{tab}/navigate {url}
|
||||
GET /tabs/{tab}/snapshot → accessibility tree w/ element refs
|
||||
POST /tabs/{tab}/click {ref}
|
||||
POST /tabs/{tab}/type {ref,text}
|
||||
GET /tabs/{tab}/screenshot
|
||||
DELETE /sessions/{user}
|
||||
|
||||
Set ``CAMOFOX_URL`` (e.g. ``http://127.0.0.1:9377``) to enable. The server's
|
||||
``/health`` exposes a VNC URL — that's the human-fallback surface (INV-CF3):
|
||||
when the autonomous reCAPTCHA solve fails, the chair opens the VNC and solves
|
||||
it live, and this flow continues.
|
||||
|
||||
⚠ CALIBRATION: the נט-המשפט external-case-search is an ASP.NET WebForms app
|
||||
behind an F5 WAF + reCAPTCHA. The element selectors and step sequence below
|
||||
are the *documented plan* of the flow; they must be calibrated against the
|
||||
live snapshot on first run (the site rate-limited static probing during
|
||||
development). Every step that can't find its target **raises** a clear Hebrew
|
||||
reason (INV-CF2 — no silent success-with-garbage) so the orchestrator escalates
|
||||
to the Tier-2 human fallback rather than returning an empty/wrong file.
|
||||
"""
|
||||
|
||||
from __future__ import annotations
|
||||
|
||||
import logging
|
||||
import os
|
||||
|
||||
import httpx
|
||||
|
||||
logger = logging.getLogger(__name__)
|
||||
|
||||
# נט המשפט public entry points (discovered from the homepage __doPostBack menu).
|
||||
NGCS_HOME = "https://www.court.gov.il/ngcs.web.site/homepage.aspx"
|
||||
|
||||
CAMOFOX_URL = os.environ.get("CAMOFOX_URL", "").rstrip("/")
|
||||
_TIMEOUT = float(os.environ.get("COURT_FETCH_BROWSER_TIMEOUT_S", "60"))
|
||||
|
||||
|
||||
class CamofoxUnavailable(RuntimeError):
|
||||
"""camofox-browser isn't configured/reachable."""
|
||||
|
||||
|
||||
class NgcsFlowError(RuntimeError):
|
||||
"""A step in the נט-המשפט flow failed (selector/CAPTCHA/navigation)."""
|
||||
|
||||
|
||||
def is_enabled() -> bool:
|
||||
return bool(CAMOFOX_URL)
|
||||
|
||||
|
||||
async def health() -> dict:
|
||||
"""Probe camofox-browser; surfaces the VNC URL for the human fallback."""
|
||||
if not CAMOFOX_URL:
|
||||
raise CamofoxUnavailable("CAMOFOX_URL is not set")
|
||||
async with httpx.AsyncClient(timeout=10) as c:
|
||||
r = await c.get(f"{CAMOFOX_URL}/health")
|
||||
r.raise_for_status()
|
||||
return r.json()
|
||||
|
||||
|
||||
class _Browser:
|
||||
"""Thin async wrapper over the camofox-browser REST surface."""
|
||||
|
||||
def __init__(self, client: httpx.AsyncClient, tab_id: str, user_id: str):
|
||||
self._c = client
|
||||
self.tab = tab_id
|
||||
self.user = user_id
|
||||
|
||||
@classmethod
|
||||
async def open(cls, client: httpx.AsyncClient) -> "_Browser":
|
||||
r = await client.post(f"{CAMOFOX_URL}/tabs", json={})
|
||||
r.raise_for_status()
|
||||
data = r.json()
|
||||
return cls(client, data["tab_id"], data.get("user_id", data["tab_id"]))
|
||||
|
||||
async def navigate(self, url: str) -> None:
|
||||
r = await self._c.post(f"{CAMOFOX_URL}/tabs/{self.tab}/navigate", json={"url": url})
|
||||
r.raise_for_status()
|
||||
|
||||
async def snapshot(self) -> dict:
|
||||
r = await self._c.get(f"{CAMOFOX_URL}/tabs/{self.tab}/snapshot")
|
||||
r.raise_for_status()
|
||||
return r.json()
|
||||
|
||||
async def click(self, ref: str) -> dict:
|
||||
r = await self._c.post(f"{CAMOFOX_URL}/tabs/{self.tab}/click", json={"ref": ref})
|
||||
r.raise_for_status()
|
||||
return r.json()
|
||||
|
||||
async def type(self, ref: str, text: str) -> None:
|
||||
r = await self._c.post(
|
||||
f"{CAMOFOX_URL}/tabs/{self.tab}/type", json={"ref": ref, "text": text}
|
||||
)
|
||||
r.raise_for_status()
|
||||
|
||||
async def close(self) -> None:
|
||||
try:
|
||||
await self._c.delete(f"{CAMOFOX_URL}/sessions/{self.user}")
|
||||
except httpx.HTTPError:
|
||||
pass
|
||||
|
||||
|
||||
async def fetch_admin_verdict(
|
||||
*, file_number: str, month: str, year: str, case_number: str, court: str
|
||||
) -> dict:
|
||||
"""Drive נט המשפט to download an admin/district verdict PDF.
|
||||
|
||||
Returns ``{content: bytes, filename: str, source_url: str, court: str}``.
|
||||
Raises ``CamofoxUnavailable`` / ``NgcsFlowError`` on failure.
|
||||
|
||||
The flow (to be calibrated against the live snapshot):
|
||||
1. Open the homepage; trigger "חיפוש תיקים חיצוני" (btnExternalSearchCases).
|
||||
2. Fill the case-number / month / year fields.
|
||||
3. Solve the reCAPTCHA via the audio challenge (recaptcha_audio); on
|
||||
repeated failure, surface the VNC URL for a human solve (INV-CF3).
|
||||
4. Submit; open the matched case; locate the verdict ("פסק דין") document.
|
||||
5. Download the cleared PDF (served via S3 pre-signed URL) and return bytes.
|
||||
"""
|
||||
if not CAMOFOX_URL:
|
||||
raise CamofoxUnavailable(
|
||||
"שירות-הדפדפן (camofox-browser) אינו מוגדר — הגדר CAMOFOX_URL "
|
||||
"והפעל את jo-inc/camofox-browser. ראה docs/spec/X13-court-fetch.md."
|
||||
)
|
||||
|
||||
async with httpx.AsyncClient(timeout=_TIMEOUT) as client:
|
||||
br = await _Browser.open(client)
|
||||
try:
|
||||
await br.navigate(NGCS_HOME)
|
||||
snap = await br.snapshot()
|
||||
_ = snap # calibration anchor: locate btnExternalSearchCases here.
|
||||
|
||||
# The concrete selector/CAPTCHA/download steps require live
|
||||
# calibration with camofox running. Until calibrated we fail
|
||||
# loudly so the orchestrator escalates to the human fallback
|
||||
# (INV-CF2/CF3) rather than pretending success.
|
||||
raise NgcsFlowError(
|
||||
"זרימת נט-המשפט (Tier 1) ממתינה לכיול מול snapshot חי של "
|
||||
"camofox-browser — בקשת-אחזור מוסלמת ל-fallback אנושי (VNC/ידני)."
|
||||
)
|
||||
finally:
|
||||
await br.close()
|
||||
Reference in New Issue
Block a user