Verified end-to-end: the full 34-page verdict in עת"מ (administrative petition) 46111-12-22 was downloaded autonomously, using only open-source tooling, with no smart card and no CAPTCHA solving.

Key calibration findings:
- Search and navigation to the case involve no reCAPTCHA at all. reCAPTCHA appears only in the viewer, and only on an explicit save/print. Displaying the document does not trigger it.
- The viewer serves pages as PNGs through the GetImages PageMethod (4 per batch). They are pulled with fetch, sending the header X-Requested-With: XMLHttpRequest (mandatory: the F5 WAF blocks requests without it), then assembled into a PDF with Pillow.

Changes:
- camofox_client.py: full rewrite. Camoufox now runs in-process through its Python package instead of via a Node REST server. Calibrated path: home → btnExternalSearchCases → Bama fields → CaseDetails → פסקי דין (verdicts) → DecisionList → NGCSViewerPage → GetImages → PDF.
- pm2 config: Xvfb app on :99 plus DISPLAY=:99 (Camoufox crashes headless without a virtual display).
- pyproject: extra [court-fetch] = camoufox + faster-whisper (host-only; the container doesn't run a browser). Pillow is already a base dependency.
- X13 spec + SCRIPTS.md: updated with the findings (image API, Xvfb, verification).

The reCAPTCHA audio solver (Whisper) is kept only as a fallback for the explicit-save path; the main path doesn't need it.

Invariants: satisfies INV-CF1/CF4/CF6 (unchanged).

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
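The viewer step described above (GetImages in batches of four, each request carrying X-Requested-With: XMLHttpRequest) can be sketched as a small in-page helper. This is a minimal sketch under stated assumptions, not the code in camofox_client.py. It assumes the PageMethod follows the standard ASP.NET AJAX convention (JSON POST to `<page>.aspx/GetImages`, result wrapped in `d`). The parameter names `startPage`/`count`, the base64-PNG response shape, and the helper names `pageBatches`, `getImagesRequest`, and `fetchAllPages` are all hypothetical.

```javascript
// Sketch only: the endpoint path, the parameter names (startPage, count) and
// the response shape (an array of base64 PNG strings under `d`) are
// assumptions, not the real NGCSViewerPage contract.
const BATCH_SIZE = 4; // the viewer serves 4 pages per GetImages call

// Split pages [0, totalPages) into consecutive batches of at most `size`.
function pageBatches(totalPages, size = BATCH_SIZE) {
  const batches = [];
  for (let start = 0; start < totalPages; start += size) {
    batches.push({ start, count: Math.min(size, totalPages - start) });
  }
  return batches;
}

// Build one PageMethod call. ASP.NET AJAX page methods take a JSON POST to
// "<page>.aspx/<MethodName>"; X-Requested-With is required here because the
// F5 WAF blocks the call without it.
function getImagesRequest(viewerUrl, batch) {
  return {
    url: `${viewerUrl}/GetImages`,
    init: {
      method: "POST",
      credentials: "same-origin", // reuse the viewer session's cookies
      headers: {
        "Content-Type": "application/json; charset=utf-8",
        "X-Requested-With": "XMLHttpRequest",
      },
      // Hypothetical parameter names.
      body: JSON.stringify({ startPage: batch.start, count: batch.count }),
    },
  };
}

// Fetch every page in order. `fetchFn` defaults to the page's own fetch
// (e.g. when evaluated inside the browser); tests can inject a fake.
async function fetchAllPages(viewerUrl, totalPages, fetchFn = fetch) {
  const pages = [];
  for (const batch of pageBatches(totalPages)) {
    const { url, init } = getImagesRequest(viewerUrl, batch);
    const res = await fetchFn(url, init);
    if (!res.ok) throw new Error(`GetImages failed: HTTP ${res.status}`);
    const { d } = await res.json(); // page-method results arrive wrapped in `d`
    pages.push(...d);
  }
  return pages;
}
```

For the 34-page verdict, `pageBatches(34)` yields nine calls: eight of four pages and a final one of two. Running the fetch inside the browser page (Camoufox is driven through Playwright, so `page.evaluate` is one plausible route) keeps the viewer's cookies on each request. The collected PNGs would then go to Pillow for PDF assembly on the Python side.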
80 lines
3.1 KiB
JavaScript
/**
 * pm2 ecosystem entry for legal-court-fetch-service, the host-side Tier-1
 * verdict fetcher (X13). It drives a Camoufox stealth browser against
 * נט המשפט to download administrative/district-court verdicts that the
 * Supreme Court portal (Tier 0) doesn't carry. It lives on the host because
 * the legal-ai container can't run a browser. See docs/spec/X13-court-fetch.md.
 *
 * Mirrors legal-chat-service.config.cjs (same security model):
 *   1. Bind to 10.0.1.1 (the docker0 bridge gateway): reachable from the host
 *      and docker-bridge containers only, never from outside the host.
 *   2. Bearer-token auth: COURT_FETCH_SHARED_SECRET is loaded from
 *      /home/chaim/.legal-court-fetch-service.env (chmod 600) and mirrored in
 *      Coolify so the FastAPI proxy sends a matching Authorization header.
 *      The service refuses to start without the secret.
 *
 * Prereqs for Tier-1 to actually fetch (otherwise it returns ok:false and the
 * orchestrator escalates to the human fallback, INV-CF3):
 *   - The pyproject [court-fetch] extra (camoufox + faster-whisper) installed
 *     in the venv. Camoufox runs in-process via its Python package; the old
 *     camofox-browser Node server (https://github.com/jo-inc/camofox-browser)
 *     is no longer needed.
 *   - The legal-court-fetch-xvfb app below running, since the service renders
 *     on DISPLAY=:99.
 *   faster-whisper only powers the reCAPTCHA audio fallback for the viewer's
 *   explicit save/print path; the main GetImages path needs no CAPTCHA solving.
 *
 * Install (once; starts both apps):
 *   pm2 start /home/chaim/legal-ai/scripts/legal-court-fetch-service.config.cjs
 *   pm2 save
 * Smoke test:
 *   curl http://10.0.1.1:8771/health
 * Update:
 *   pm2 restart legal-court-fetch-service --update-env
 */
const fs = require("fs");

const ENV_FILE = "/home/chaim/.legal-court-fetch-service.env";
const env = {
  HOME: "/home/chaim",
  PATH: "/home/chaim/.local/bin:/usr/local/bin:/usr/bin:/bin",
  PYTHONUNBUFFERED: "1",
  // Camoufox (headless Firefox) crashes on this server without a virtual
  // display, so the service points at the Xvfb companion app below (:99).
  DISPLAY: ":99",
};
// Merge KEY=VALUE lines from ENV_FILE into env (file values win). Values are
// taken literally: surrounding quotes and trailing "# ..." are not stripped,
// and lowercase or "export"-prefixed keys are ignored.
try {
  const text = fs.readFileSync(ENV_FILE, "utf8");
  for (const line of text.split("\n")) {
    if (!line || line.trim().startsWith("#")) continue;
    const m = line.match(/^\s*([A-Z_][A-Z0-9_]*)\s*=\s*(.*?)\s*$/);
    if (m) env[m[1]] = m[2];
  }
} catch (e) {
  console.error(`legal-court-fetch-service: failed to load ${ENV_FILE}: ${e.message}`);
  console.error("Service will refuse to start without COURT_FETCH_SHARED_SECRET.");
}

module.exports = {
  apps: [
    {
      // Persistent virtual display for Camoufox (headless Firefox needs it on
      // this screenless server). Bound to :99 to match DISPLAY above.
      name: "legal-court-fetch-xvfb",
      script: "/usr/bin/Xvfb",
      interpreter: "none", // native binary: exec it directly, not through node
      args: ":99 -screen 0 1920x1080x24 -nolisten tcp",
      autorestart: true,
      max_restarts: 10,
      restart_delay: 3000,
    },
    {
      name: "legal-court-fetch-service",
      cwd: "/home/chaim/legal-ai/mcp-server",
      script: "/home/chaim/legal-ai/mcp-server/.venv/bin/python",
      interpreter: "none", // run the venv's python itself, not through node
      args: "-m legal_mcp.court_fetch_service.server --port 8771 --host 10.0.1.1",
      env,
      restart_delay: 5000,
      max_restarts: 10,
      autorestart: true,
      // A Firefox content process loading the heavy ASP.NET pages can spike;
      // give it headroom, but cap it so a leak can't threaten Postgres.
      max_memory_restart: "1500M",
    },
  ],
};
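The env-file loader in this config takes values literally, which is easy to trip over when editing the .env file. The sketch below lifts the same regex and skip rules into a standalone pure function so a file's effect can be checked before restarting pm2. `parseEnvText` is a hypothetical name, not part of the repo, and the sample lines are made up for illustration.

```javascript
// Same regex and skip rules as the loader in the config above, factored into
// a pure function (hypothetical name) so a file's effect can be checked.
function parseEnvText(text) {
  const out = {};
  for (const line of text.split("\n")) {
    if (!line || line.trim().startsWith("#")) continue; // empty line or comment
    const m = line.match(/^\s*([A-Z_][A-Z0-9_]*)\s*=\s*(.*?)\s*$/);
    if (m) out[m[1]] = m[2]; // a later duplicate key overwrites an earlier one
  }
  return out;
}

// Illustrative sample with CRLF endings: the trailing \s* also eats the \r.
const sample = [
  "# secrets for legal-court-fetch-service",
  "COURT_FETCH_SHARED_SECRET = abc123  ",
  'QUOTED="kept as-is"', // quotes stay part of the value
  "export EXPORTED=ignored", // "export " prefix does not match the key pattern
  "lower_case=ignored", // keys must be uppercase
  "WITH_HASH=value # not a comment", // inline "#" is kept in the value
  "",
].join("\r\n");

console.log(parseEnvText(sample));
```

Only three keys survive the sample: the secret (trimmed), `QUOTED` with its quotes intact, and `WITH_HASH` including the `# not a comment` tail. This is why the secret in the env file should be written bare, without quotes or trailing comments.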