diff --git a/docs/spec/X13-court-fetch.md b/docs/spec/X13-court-fetch.md
index acde7c9..5431fab 100644
--- a/docs/spec/X13-court-fetch.md
+++ b/docs/spec/X13-court-fetch.md
@@ -20,8 +20,21 @@
 **Two public source routes:**
 - **Supreme Court** (עע"מ/בג"ץ/ע"א/רע"א/בר"מ/דנ"א) → `supremedecisions.court.gov.il`: direct download (httpx), no CAPTCHA.
 - **Administrative/District/Magistrates' courts** (עת"מ/עמ"נ/...) → the case viewer of **נט המשפט** (Net HaMishpat): ASP.NET WebForms
-  (`__doPostBack`/VIEWSTATE), F5 anti-bot, reCAPTCHA on the public search, documents as S3 cleared URLs.
-  Requires a **real browser** (host-side), hence a host service under pm2 (following the `legal-chat-service` pattern).
+  (`__doPostBack`/VIEWSTATE), F5 anti-bot, documents rendered in a page viewer (turn.js). Requires a
+  **real browser** (host-side), hence a host service under pm2 (following the `legal-chat-service` pattern).
+
+> **Verified end-to-end (2026-06-07) on עת"מ 46111-12-22**: a 34-page verdict was downloaded **fully autonomously,
+> with open-source code only, no smart card and no CAPTCHA solving**. Key findings from the calibration:
+> - **Search and navigation to the case involve no reCAPTCHA at all.** Path: homepage → `btnExternalSearchCases`
+>   → fill `BamaCaseNumberTextBoxH` (= case number) + `BamaMonthYearTextBoxHT` (= "MM-YY") →
+>   `CaseDetails.aspx` → "פסקי דין" (verdicts) tab → `DecisionList.aspx` → viewer `NGCSViewerPage.aspx`.
+> - **reCAPTCHA exists only in the viewer, and only on explicit save/print**, *not* on displaying the document. The viewer
+>   serves the pages as PNG through the PageMethod **`GetImages`** (4 pages per batch) **without CAPTCHA**.
+>   Fetch = capture `documentNumber` from the first call + pull all batches via `fetch` with the header
+>   **`X-Requested-With: XMLHttpRequest`** (mandatory: the WAF blocks AJAX without it) → assemble a PDF (Pillow).
+> - Browser: **Camoufox via the Python package** (`camoufox.async_api`, in-process, not a Node server).
+>   A screenless server requires **Xvfb** (otherwise Firefox crashes). The audio reCAPTCHA solver (Whisper) is kept
+>   as a fallback for the explicit-save path only; the image path does not need it.

 ---

@@ -31,9 +44,9 @@
 underlying_citation → [classifier] → tier ∈ {supreme, admin, skip}
   skip(ערר/בל"מ) → missing_precedent (manual Nevo lookup), no fetch
   supreme → Tier 0: httpx in the container → supremedecisions (fully autonomous)
-  admin   → Tier 1: legal-court-fetch-service (host/pm2) (autonomous-first)
-            → Camoufox stealth browser → external-search → reCAPTCHA(audio/Whisper)
-            → download cleared PDF
+  admin   → Tier 1: legal-court-fetch-service (host/pm2 + Xvfb) (autonomous-first)
+            → Camoufox(python) → external-search → CaseDetails → פסקי דין
+            → NGCSViewerPage → GetImages(X-Requested-With) → PNGs → PDF
             → Tier 2 fallback: manual VNC / missing_precedent + alert (human gate)
   (all tiers) → precedent_library_upload(source_type=court_ruling)
               → ingest_precedent → chunks+embeddings+halachot(pending) → relink digest / close gap
diff --git a/mcp-server/pyproject.toml b/mcp-server/pyproject.toml
index 092713a..afb9cee 100644
--- a/mcp-server/pyproject.toml
+++ b/mcp-server/pyproject.toml
@@ -23,6 +23,17 @@ dependencies = [
     "infisicalsdk>=1.0.0",
 ]
 
+[project.optional-dependencies]
+# Tier-1 court-verdict fetch (X13) — host-only. The container can't run a
+# browser, so these are NOT in the base deps; install on the host venv with
+# `pip install -e ".[court-fetch]" && python -m camoufox fetch`. faster-whisper
+# is only for the explicit-PDF-download reCAPTCHA fallback (the primary
+# image-API path needs no solving).
+court-fetch = [
+    "camoufox>=0.4.11",
+    "faster-whisper>=1.0.0",
+]
+
 [build-system]
 requires = ["setuptools>=68.0"]
 build-backend = "setuptools.build_meta"
diff --git a/mcp-server/src/legal_mcp/court_fetch_service/camofox_client.py b/mcp-server/src/legal_mcp/court_fetch_service/camofox_client.py
index f21c436..ba604ce 100644
--- a/mcp-server/src/legal_mcp/court_fetch_service/camofox_client.py
+++ b/mcp-server/src/legal_mcp/court_fetch_service/camofox_client.py
@@ -1,148 +1,251 @@
-"""Camoufox-browser client + נט-המשפט navigation flow (X13, Tier 1).
+"""Camoufox driver for נט המשפט — calibrated, proven flow (X13, Tier 1).
 
-Open-source, zero-API-cost stealth browsing: a self-hosted ``camofox-browser``
-REST server (``jo-inc/camofox-browser``, wrapping Camoufox — a Firefox fork
-with C++ fingerprint spoofing) drives a real browser. We talk to it over the
-same REST surface the Hermes agent uses (``~/.hermes/.../browser_camofox.py``):
+Open-source, zero-API-cost: drives a **Camoufox** stealth browser (a Firefox
+fork with C++ fingerprint spoofing) via its official Python package
+(``camoufox.async_api``) — in-process, no separate Node server. The full flow
+was reverse-engineered and validated end-to-end against עת"מ 46111-12-22
+(2026-06-07): a 34-page verdict PDF retrieved with **no smart-card and no
+CAPTCHA-solving**.
 
-    POST /tabs → {tab_id}
-    POST /tabs/{tab}/navigate {url}
-    GET /tabs/{tab}/snapshot → accessibility tree w/ element refs
-    POST /tabs/{tab}/click {ref}
-    POST /tabs/{tab}/type {ref,text}
-    GET /tabs/{tab}/screenshot
-    DELETE /sessions/{user}
+The proven path:
+  1. homepage → DOM-click ``btnExternalSearchCases`` ("תיקים לפי מס' תיק מקור").
+  2. Fill the visible header case-locator: ``BamaCaseNumberTextBoxH`` = case
+     number, ``BamaMonthYearTextBoxHT`` = "MM-YY"; click ``SearchHeaderCaseButton``.
+     → lands on ``FolderCaseDetails/CaseDetails.aspx`` for the case.
+  3. Click the "פסקי דין" sidebar tab → ``Decisions/DecisionList.aspx``.
+  4. Click the document → popup ``Viewer/NGCSViewerPage.aspx?DocumentNumber=…``.
+  5. The viewer renders pages as PNG images via the ``GetImages`` PageMethod —
+     **served without reCAPTCHA** (the reCAPTCHA on the viewer only gates the
+     explicit save/print, which we don't use). Capture the internal
+     ``documentNumber`` from the viewer's first ``GetImages`` call, then pull
+     every 4-page batch via ``fetch`` **with header ``X-Requested-With:
+     XMLHttpRequest``** (required — the F5 WAF blocks AJAX calls without it).
+  6. Decode the base64 PNGs → assemble a PDF (Pillow).
The existing ingest
+     pipeline OCRs it (Google Vision) → text → corpus.
 
-Set ``CAMOFOX_URL`` (e.g. ``http://127.0.0.1:9377``) to enable. The server's
-``/health`` exposes a VNC URL — that's the human-fallback surface (INV-CF3):
-when the autonomous reCAPTCHA solve fails, the chair opens the VNC and solves
-it live, and this flow continues.
+Operational requirements (see scripts/legal-court-fetch-service.config.cjs):
+  * a virtual display — Camoufox/Firefox crashes headless on this server
+    without one. Set ``DISPLAY`` to a running Xvfb (e.g. ``:99``).
+  * RAM — a Firefox content process loading the heavy ASP.NET pages needs
+    ~0.5–1 GB; keep the box from swapping.
 
-⚠ CALIBRATION: the נט-המשפט external-case-search is an ASP.NET WebForms app
-behind an F5 WAF + reCAPTCHA. The element selectors and step sequence below
-are the *documented plan* of the flow; they must be calibrated against the
-live snapshot on first run (the site rate-limited static probing during
-development). Every step that can't find its target **raises** a clear Hebrew
-reason (INV-CF2 — no silent success-with-garbage) so the orchestrator escalates
-to the Tier-2 human fallback rather than returning an empty/wrong file.
+reCAPTCHA note: ``recaptcha_audio`` (local Whisper) remains as a fallback for
+the explicit-PDF-download path, but the primary image-API path needs no
+solving, so it is normally unused.
 """
 
 from __future__ import annotations
 
+import asyncio
+import base64
+import io
+import json
 import logging
 import os
-
-import httpx
+import re
 
 logger = logging.getLogger(__name__)
 
-# נט המשפט public entry points (discovered from the homepage __doPostBack menu).
 NGCS_HOME = "https://www.court.gov.il/ngcs.web.site/homepage.aspx"
 
-CAMOFOX_URL = os.environ.get("CAMOFOX_URL", "").rstrip("/")
-_TIMEOUT = float(os.environ.get("COURT_FETCH_BROWSER_TIMEOUT_S", "60"))
+# Headless Camoufox needs a virtual display on this server.
+_DISPLAY = os.environ.get("DISPLAY", "") +_NAV_TIMEOUT_MS = int(float(os.environ.get("COURT_FETCH_BROWSER_TIMEOUT_S", "60")) * 1000) +_PAGE_BATCH = 4 # the viewer's GetImages batch size +_MAX_PAGES = 400 # hard cap on a single document class CamofoxUnavailable(RuntimeError): - """camofox-browser isn't configured/reachable.""" + """Camoufox (or its virtual display) isn't available.""" class NgcsFlowError(RuntimeError): - """A step in the נט-המשפט flow failed (selector/CAPTCHA/navigation).""" + """A step in the נט-המשפט flow failed (navigation / not found / blocked).""" def is_enabled() -> bool: - return bool(CAMOFOX_URL) + """True if the Camoufox package imports (browser binary present).""" + try: + import camoufox.async_api # noqa: F401 + return True + except Exception: + return False async def health() -> dict: - """Probe camofox-browser; surfaces the VNC URL for the human fallback.""" - if not CAMOFOX_URL: - raise CamofoxUnavailable("CAMOFOX_URL is not set") - async with httpx.AsyncClient(timeout=10) as c: - r = await c.get(f"{CAMOFOX_URL}/health") - r.raise_for_status() - return r.json() + return {"camoufox_import": is_enabled(), "display": _DISPLAY or "(none)"} -class _Browser: - """Thin async wrapper over the camofox-browser REST surface.""" - - def __init__(self, client: httpx.AsyncClient, tab_id: str, user_id: str): - self._c = client - self.tab = tab_id - self.user = user_id - - @classmethod - async def open(cls, client: httpx.AsyncClient) -> "_Browser": - r = await client.post(f"{CAMOFOX_URL}/tabs", json={}) - r.raise_for_status() - data = r.json() - return cls(client, data["tab_id"], data.get("user_id", data["tab_id"])) - - async def navigate(self, url: str) -> None: - r = await self._c.post(f"{CAMOFOX_URL}/tabs/{self.tab}/navigate", json={"url": url}) - r.raise_for_status() - - async def snapshot(self) -> dict: - r = await self._c.get(f"{CAMOFOX_URL}/tabs/{self.tab}/snapshot") - r.raise_for_status() - return r.json() - - async def click(self, ref: str) 
-> dict: - r = await self._c.post(f"{CAMOFOX_URL}/tabs/{self.tab}/click", json={"ref": ref}) - r.raise_for_status() - return r.json() - - async def type(self, ref: str, text: str) -> None: - r = await self._c.post( - f"{CAMOFOX_URL}/tabs/{self.tab}/type", json={"ref": ref, "text": text} - ) - r.raise_for_status() - - async def close(self) -> None: +async def _fill_visible(page, id_substr: str, value: str) -> bool: + for el in await page.locator(f"input[id*='{id_substr}']").all(): try: - await self._c.delete(f"{CAMOFOX_URL}/sessions/{self.user}") - except httpx.HTTPError: - pass + if await el.is_visible() and await el.is_editable(): + await el.fill(value) + return True + except Exception: + continue + return False + + +async def _reach_viewer(page, *, case_number: str, month_year: str): + """Drive home → search → case → פסקי דין → viewer popup. Returns the popup page.""" + await page.goto(NGCS_HOME, wait_until="domcontentloaded", timeout=_NAV_TIMEOUT_MS) + await page.wait_for_timeout(2500) + await page.eval_on_selector( + "#Header1_UpperMenu1_btnExternalSearchCases", "el => el.click()" + ) + try: + await page.wait_for_load_state("domcontentloaded", timeout=_NAV_TIMEOUT_MS) + except Exception: + pass + await page.wait_for_timeout(4500) + + if not await _fill_visible(page, "BamaCaseNumberTextBoxH", case_number): + raise NgcsFlowError("שדה מספר-תיק לא נמצא בעמוד החיפוש") + my_filled = False + for el in await page.locator("input[id*='BamaMonthYearTextBoxHT']").all(): + if await el.is_visible(): + await el.click() + await page.keyboard.type(month_year, delay=60) + my_filled = True + break + if not my_filled: + raise NgcsFlowError("שדה חודש-שנה לא נמצא") + clicked = False + for b in await page.locator("[id*='SearchHeaderCaseButton']").all(): + if await b.is_visible(): + await b.click() + clicked = True + break + if not clicked: + raise NgcsFlowError("כפתור החיפוש לא נמצא") + await page.wait_for_timeout(6000) + if "CaseDetails" not in page.url: + raise NgcsFlowError( + 
f"לא הגענו לעמוד-התיק (URL={page.url[:80]}) — ייתכן שהתיק לא נמצא/לא פתוח לעיון" + ) + + # פסקי דין tab → DecisionList + psak = page.locator("a:has-text('פסקי דין')") + opened = False + for i in range(await psak.count()): + el = psak.nth(i) + if await el.is_visible(): + await el.click() + opened = True + break + if not opened: + raise NgcsFlowError("לשונית 'פסקי דין' לא נמצאה בעמוד-התיק") + await page.wait_for_timeout(6000) + + # open the verdict document viewer (popup) + viewers = page.locator( + "a[href*='Viewer'],[onclick*='Viewer'],a[href*='Document'],a:has-text('צפייה')" + ) + async with page.context.expect_page(timeout=15000) as pop: + clicked = False + for i in range(await viewers.count()): + el = viewers.nth(i) + if await el.is_visible(): + await el.click() + clicked = True + break + if not clicked: + raise NgcsFlowError("לא נמצא מסמך פסק-דין לצפייה") + return await pop.value async def fetch_admin_verdict( *, file_number: str, month: str, year: str, case_number: str, court: str ) -> dict: - """Drive נט המשפט to download an admin/district verdict PDF. + """Fetch an admin/district court verdict as a PDF. Returns + ``{content: bytes, filename, source_url, court}``; raises on failure. - Returns ``{content: bytes, filename: str, source_url: str, court: str}``. - Raises ``CamofoxUnavailable`` / ``NgcsFlowError`` on failure. - - The flow (to be calibrated against the live snapshot): - 1. Open the homepage; trigger "חיפוש תיקים חיצוני" (btnExternalSearchCases). - 2. Fill the case-number / month / year fields. - 3. Solve the reCAPTCHA via the audio challenge (recaptcha_audio); on - repeated failure, surface the VNC URL for a human solve (INV-CF3). - 4. Submit; open the matched case; locate the verdict ("פסק דין") document. - 5. Download the cleared PDF (served via S3 pre-signed URL) and return bytes. + ``file_number``/``month``/``year`` are the נט-המשפט triple (e.g. 46111/12/22). 
""" - if not CAMOFOX_URL: + try: + from camoufox.async_api import AsyncCamoufox + except Exception as e: raise CamofoxUnavailable( - "שירות-הדפדפן (camofox-browser) אינו מוגדר — הגדר CAMOFOX_URL " - "והפעל את jo-inc/camofox-browser. ראה docs/spec/X13-court-fetch.md." + "חבילת camoufox אינה מותקנת/זמינה. הרץ `pip install camoufox` ו-" + "`python -m camoufox fetch`. ראה docs/spec/X13-court-fetch.md." + ) from e + if not _DISPLAY: + # Headless Firefox crashes here without a virtual display. + raise CamofoxUnavailable( + "אין DISPLAY — Camoufox דורש Xvfb על שרת ללא מסך. הפעל Xvfb (למשל :99) " + "והגדר DISPLAY (ראה pm2 config)." ) - async with httpx.AsyncClient(timeout=_TIMEOUT) as client: - br = await _Browser.open(client) - try: - await br.navigate(NGCS_HOME) - snap = await br.snapshot() - _ = snap # calibration anchor: locate btnExternalSearchCases here. + month_year = f"{int(month):02d}-{year[-2:]}" + doc_num = {"v": None} - # The concrete selector/CAPTCHA/download steps require live - # calibration with camofox running. Until calibrated we fail - # loudly so the orchestrator escalates to the human fallback - # (INV-CF2/CF3) rather than pretending success. - raise NgcsFlowError( - "זרימת נט-המשפט (Tier 1) ממתינה לכיול מול snapshot חי של " - "camofox-browser — בקשת-אחזור מוסלמת ל-fallback אנושי (VNC/ידני)." 
- ) - finally: - await br.close() + async def on_resp(resp): + if "GetImages" in resp.url and not doc_num["v"]: + try: + doc_num["v"] = json.loads(resp.request.post_data).get("documentNumber") + except Exception: + pass + + async with AsyncCamoufox( + headless=True, geoip=False, humanize=True, locale="he-IL" + ) as browser: + page = await browser.new_page() + page.context.on("response", lambda r: asyncio.create_task(on_resp(r))) + vp = await _reach_viewer(page, case_number=file_number, month_year=month_year) + source_url = vp.url + await vp.wait_for_timeout(9000) + if not doc_num["v"]: + raise NgcsFlowError("לא נלכד documentNumber מהצופה (ייתכן שהמסמך לא נטען)") + + # Pull every page batch through fetch() with X-Requested-With (WAF-safe). + imgs = await vp.evaluate( + """async (args) => { + const [dn, maxPages, batch] = args; + const url = window.location.href.split('?')[0] + '/GetImages'; + const out = {}; + for (let f = 0; f < maxPages; f += batch) { + let d; + try { + const r = await fetch(url, {method:'POST', credentials:'include', + headers:{'Content-Type':'application/json; charset=utf-8', + 'X-Requested-With':'XMLHttpRequest'}, + body: JSON.stringify({documentNumber:dn, fromIndex:f, toIndex:f+batch-1})}); + if (!r.ok) break; + const j = await r.json(); d = (j.d !== undefined) ? 
j.d : j; + } catch (e) { break; } + if (!Array.isArray(d) || d.length === 0) break; + d.forEach((html, k) => { if (html) out[f+k] = html; }); + if (d.length < batch) break; + await new Promise(r => setTimeout(r, 350)); + } + return out; + }""", + [doc_num["v"], _MAX_PAGES, _PAGE_BATCH], + ) + + if not imgs: + raise NgcsFlowError("לא התקבלו עמודי-מסמך מ-GetImages") + from PIL import Image + + pages = [] + for idx in sorted(imgs, key=lambda x: int(x)): + m = re.search(r"base64,([A-Za-z0-9+/=]+)", imgs[idx] or "") + if not m: + continue + pages.append(Image.open(io.BytesIO(base64.b64decode(m.group(1)))).convert("RGB")) + if not pages: + raise NgcsFlowError("עמודי-המסמך לא ניתנים לפענוח (base64)") + + buf = io.BytesIO() + pages[0].save(buf, format="PDF", save_all=True, append_images=pages[1:]) + content = buf.getvalue() + logger.info("נט המשפט: fetched %s — %d pages, %d bytes", + case_number, len(pages), len(content)) + return { + "content": content, + "filename": f"{case_number}.pdf", + "source_url": source_url, + "court": court or "בית משפט מחוזי", + "pages": len(pages), + } diff --git a/scripts/SCRIPTS.md b/scripts/SCRIPTS.md index 32e2f91..d95587f 100644 --- a/scripts/SCRIPTS.md +++ b/scripts/SCRIPTS.md @@ -19,7 +19,7 @@ | `fu2c_reconcile_external_case_numbers.py` | python | **FU-2c (GAP-08, #68) — תיאום `case_number` של פסיקה חיצונית** (`source_kind <> internal_committee`) מציטוט-מלא לצורה קנונית **מציין-הליך + docket** (החלטת-יו"ר 2026-05-31, Option A: `/` נשמר, *לא* `-`; תואם db.py:369 ו-INV-ID2). דטרמיניסטי (designator+docket; 0/>1 docket → flag). `--dry-run` (ברירת-מחדל) מפיק `data/audit/fu2c-reconciliation-*.{csv,md}` עם flags (MISMATCH / NO_CITATION / CIT_NO_DOCKET / DESIG_MISMATCH / DUP_CHECK). `--apply --approved ` מגבה ואז מעדכן שורות לא-חוסמות (כולל ADVISORY/NO_CITATION). `--overrides ` (id,proposed_canonical,reason) פותח שורות-חוסמות בהכרעת-יו"ר מפורשת (למשל פס"ד מאוחד — ראה `data/audit/fu2c-overrides.csv` לרשומת לויתן/קלמנוביץ). 
לוגיקת-החילוץ + פיצול flags אומתו offline על 24 רשומות. scope: external בלבד (internal = FU-2b). FK-safe. | חד-פעמי, **chair-gated** (apply רק אחרי אישור דפנה) | | `eval_gold_bootstrap.py` | python | **FU-5 (GAP-11) — bootstrap ל-gold-set** של הערכת-אחזור ל-`data/eval/gold-set.jsonl`. שני מקורות: `--source citations` (cited==relevant מ-`search_relevance_feedback`; ריק עד שייצברו ציטוטים) ו-`--source known_item` (query=שם-תיק → relevant=עצמו; אות אמיתי היום). Idempotent — שומר שורות `source=chair`, מחדש `bootstrap_*`. דורש POSTGRES. | לפני eval; חוזר כשנצבר ground-truth | | `eval_retrieval.py` | python | **FU-5 (GAP-11, INV-RET4/G8) — harness הערכת-אחזור** — מריץ את מסלול-האחזור בייצור (`search_library`/`search_internal`) על ה-gold-set, מחשב precision@k/recall@k/MRR/nDCG@k (k=5,10), מצרף overall+per-corpus+per-PA ל-`data/eval/eval-report-.{json,md}` + delta מול `data/eval/baseline.json` (מתעד retrieval_config). `--self-test` בודק את המטריקות offline; `--update-baseline` מאמץ snapshot. **שער-CI במשמעת:** הרץ לפני/אחרי כל שינוי בשכבת-האחזור באותו קונפיג. דורש POSTGRES+VOYAGE_API_KEY. | לפני/אחרי שינוי RRF/k/embedder/rerank | -| `legal-court-fetch-service.config.cjs` | pm2/js | **שירות-מארח Tier-1 לאחזור פסקי-דין מנט המשפט (X13)** — מריץ `python -m legal_mcp.court_fetch_service.server` ב-pm2, bound ל-`10.0.1.1:8771`, Bearer-auth (`COURT_FETCH_SHARED_SECRET` מ-`~/.legal-court-fetch-service.env`). מריץ דפדפן Camoufox (open-source) כי הקונטיינר לא יכול. תלות לאחזור-בפועל: `camofox-browser` רץ (`CAMOFOX_URL`) + `faster-whisper` ל-reCAPTCHA אודיו; אחרת מחזיר ok:false וה-orchestrator מסלים ל-fallback אנושי. מראָה לדפוס `legal-chat-service.config.cjs`. ספ: `docs/spec/X13-court-fetch.md`. התקנה: `pm2 start scripts/legal-court-fetch-service.config.cjs && pm2 save`. בריאות: `curl http://10.0.1.1:8771/health`. 
| pm2 (host-side) |
+| `legal-court-fetch-service.config.cjs` | pm2/js | **Tier-1 host service for fetching verdicts from נט המשפט (X13)**: 2 apps: (a) `legal-court-fetch-xvfb` (Xvfb :99, virtual display for Camoufox); (b) `legal-court-fetch-service` (`python -m legal_mcp.court_fetch_service.server`, bound to `10.0.1.1:8771`, Bearer `COURT_FETCH_SHARED_SECRET` from `~/.legal-court-fetch-service.env`, `DISPLAY=:99`). Runs Camoufox via the Python package (in-process) because the container cannot run a browser. Dependencies: `pip install -e "mcp-server[court-fetch]" && python -m camoufox fetch`. Fetch = navigate → viewer → `GetImages` (X-Requested-With) → PDF, no CAPTCHA; failure → `ok:false` → the orchestrator escalates to the human fallback. **Verified on עת"מ 46111-12-22 (34 pages).** Mirrors the `legal-chat-service.config.cjs` pattern. Spec: `docs/spec/X13-court-fetch.md`. Install: `pm2 start scripts/legal-court-fetch-service.config.cjs && pm2 save`. Health: `curl http://10.0.1.1:8771/health`. | pm2 (host-side) |
 | `auto-sync-cases.sh` | bash | Syncs appeal case files to Gitea; runs every minute | `* * * * *` (cron) |
 | `backup-db.sh` | bash | Daily PostgreSQL backup to `data/backups/` (gzip) | To schedule: `0 2 * * *` |
 | `restore-db.sh` | bash | Restores the DB from a backup (companion to backup-db.sh) | Manual |
diff --git a/scripts/legal-court-fetch-service.config.cjs b/scripts/legal-court-fetch-service.config.cjs
index 2cc6ec8..c76d7aa 100644
--- a/scripts/legal-court-fetch-service.config.cjs
+++ b/scripts/legal-court-fetch-service.config.cjs
@@ -34,7 +34,9 @@ const env = {
   HOME: "/home/chaim",
   PATH: "/home/chaim/.local/bin:/usr/local/bin:/usr/bin:/bin",
   PYTHONUNBUFFERED: "1",
-  // CAMOFOX_URL: "http://127.0.0.1:9377", // set when camofox-browser is up
+  // Camoufox (headless Firefox) crashes on this server without a virtual
+  // display, so the service points at the Xvfb companion app below (:99).
+  DISPLAY: ":99",
 };
 try {
   const text = fs.readFileSync(ENV_FILE, "utf8");
@@ -50,6 +52,16 @@ try {
 
 module.exports = {
   apps: [
+    {
+      // Persistent virtual display for Camoufox (headless Firefox needs it on
+      // this screenless server). Bound to :99 to match DISPLAY above.
+      name: "legal-court-fetch-xvfb",
+      script: "/usr/bin/Xvfb",
+      args: ":99 -screen 0 1920x1080x24 -nolisten tcp",
+      autorestart: true,
+      max_restarts: 10,
+      restart_delay: 3000,
+    },
     {
       name: "legal-court-fetch-service",
       cwd: "/home/chaim/legal-ai/mcp-server",
@@ -59,7 +71,9 @@ module.exports = {
       restart_delay: 5000,
       max_restarts: 10,
       autorestart: true,
-      max_memory_restart: "1G",
+      // A Firefox content process loading the heavy ASP.NET pages can spike;
+      // give headroom but cap so a leak can't threaten Postgres.
+      max_memory_restart: "1500M",
     },
   ],
 };
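
A minimal, standard-library-only sketch of three small pieces of logic that `fetch_admin_verdict` relies on in the diff above: building the "MM-YY" search value, enumerating the inclusive `fromIndex`/`toIndex` ranges sent to `GetImages`, and pulling the base64 payload out of each returned fragment. The helper names (`month_year`, `batch_ranges`, `decode_pages`) and the `total_pages` parameter are hypothetical and are not part of the change. The actual code does not know the page count in advance: it loops up to `_MAX_PAGES` and stops at the first empty or short batch.

```python
import base64
import re

PAGE_BATCH = 4  # batch size the diff attributes to the viewer's GetImages call


def month_year(month: str, year: str) -> str:
    """Build the "MM-YY" value typed into the month/year search field."""
    return f"{int(month):02d}-{year[-2:]}"


def batch_ranges(total_pages: int, batch: int = PAGE_BATCH):
    """Yield inclusive (fromIndex, toIndex) pairs covering total_pages pages.

    total_pages is an assumption for illustration; the real loop discovers
    the end of the document from an empty or short response instead.
    """
    for start in range(0, total_pages, batch):
        yield start, start + batch - 1


def decode_pages(fragments: dict) -> list:
    """Decode base64 image payloads in page order, skipping empty fragments."""
    pages = []
    # Keys arrive as strings (JSON object keys), so sort them numerically.
    for key in sorted(fragments, key=int):
        match = re.search(r"base64,([A-Za-z0-9+/=]+)", fragments[key] or "")
        if match:
            pages.append(base64.b64decode(match.group(1)))
    return pages
```

For the 34-page verdict mentioned in the spec, `list(batch_ranges(34))` yields nine ranges ending in `(32, 35)`, and `month_year("12", "2022")` returns `"12-22"`. The bytes returned by `decode_pages` are raw image data that Pillow can then open, as the diff does before assembling the PDF.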