Commit Graph

9 Commits

Author SHA1 Message Date
69b34f1c3f fix(X13): route by נט-format availability; robust fetch error handling
Live drain surfaced three issues:
1. Tier-0 needed `h2` (httpx http2) — added to the court-fetch extra.
2. Supreme cases that carry a נט-format number (e.g. בר"מ 72182-06-25) were
   routed to the unvalidated Tier-0 and failed, even though נט המשפט serves
   Supreme cases too. classify() now parses the file-month-year triple for
   Supreme prefixes; the orchestrator routes by triple-availability:
     נט-format present → Tier-1 (validated, all courts)
     serial-only Supreme (עע"מ 5886/24) → Tier-0
     neither → clear "no public route" failure
   Validated live: בר"מ 72182-06-25 fetched via Tier-1 (5-page PDF).
3. A non-`RuntimeError` fetch exception (the h2 import error) left jobs stuck
   in 'running'. The fetch block now catches any Exception → _record_failure
   (INV-CF2/CF3), so a job always reaches a terminal state.

+ test_supreme_with_net_format_triple. Suite 11/11.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
2026-06-07 20:45:20 +00:00
781f24c643 feat(X13 Tier-1): calibrate נט המשפט fetch — Camoufox python, proven on 46111-12-22
אומת end-to-end: פס"ד 34 עמ' של עת"מ 46111-12-22 הורד אוטונומית מלא, נטו
קוד-פתוח, ללא כרטיס-חכם וללא פתרון-CAPTCHA.

ממצאי-כיול עיקריים:
- החיפוש+הניווט-לתיק ללא reCAPTCHA כלל. reCAPTCHA קיים רק בצופה ורק על
  שמירה/הדפסה מפורשת — לא על הצגת המסמך.
- הצופה מגיש עמודים כ-PNG דרך PageMethod GetImages (4/batch); משיכה ב-fetch
  עם הכותרת X-Requested-With: XMLHttpRequest (חובה — F5 WAF חוסם בלעדיה) →
  הרכבת PDF (Pillow).

שינויים:
- camofox_client.py: שכתוב מלא — Camoufox דרך חבילת-הפייתון (in-process,
  לא שרת-Node REST). מסלול מכויל: home→btnExternalSearchCases→Bama fields→
  CaseDetails→פסקי דין→DecisionList→NGCSViewerPage→GetImages→PDF.
- pm2 config: app Xvfb :99 + DISPLAY=:99 (Camoufox קורס headless בלי צג וירטואלי).
- pyproject: extra [court-fetch] = camoufox + faster-whisper (host-only; הקונטיינר
  לא מריץ דפדפן). Pillow כבר בבסיס.
- X13 spec + SCRIPTS.md: עודכנו לממצאים (image-API, Xvfb, אימות).

reCAPTCHA audio (Whisper) נשמר כ-fallback למסלול-השמירה-המפורש בלבד; המסלול
הראשי אינו זקוק לו. Invariants: מקיים INV-CF1/CF4/CF6 (ללא שינוי).

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
2026-06-07 19:32:13 +00:00
69bdf7b30a fix(settings): harden PATCH/redeploy per code review
- Add infisicalsdk dependency
- Narrow update→create fallback to NotFound errors only (no silent swallow)
- Truncate Coolify error response text to 200 chars
- Add 60s cooldown to redeploy endpoint
- Move httpx to top-level import
2026-05-04 06:33:01 +00:00
2cfdf35191 refactor(precedents): keep all LLM calls on the local-MCP path
All checks were successful
Build & Deploy / build-and-deploy (push) Successful in 1m28s
Architectural correction: every claude_session caller in this project
runs through the local MCP server (~/.claude.json points at
/home/chaim/legal-ai/mcp-server/.venv/bin/python). The Coolify container
has no `claude` CLI and no claude.ai session, so any LLM call originating
from web/ FastAPI fails with "Claude CLI not found" — which is exactly
what we hit on 403-17.

The earlier Anthropic SDK fallback would have made it work, but at
direct API cost. The chair's preference is to stay on the claude.ai
session for everything. So:

- claude_session.py: removed the SDK fallback, restored CLI-only.
  The error message now points the next person at the architectural
  rule in the module docstring instead of papering over it.
- precedent_library.py:ingest_precedent (called from FastAPI on upload)
  now does only the non-LLM half: extract → chunk → embed → store.
  Sets halacha_extraction_status='pending' for the chair to act on.
- reextract_halachot / reextract_metadata kept, but lazy-import their
  extractors so the FastAPI path can't accidentally pull them in. They
  are reachable only via the MCP tools precedent_extract_halachot /
  precedent_extract_metadata, which run locally with CLI.
- Removed POST /api/precedent-library/{id}/extract-halachot and
  /extract-metadata — they were dead ends from the container.
- Dropped the `anthropic` Python dep that the SDK fallback required.
- UI: removed the "refresh halachot" and "sparkles metadata" buttons
  that called those endpoints. Edit sheet now points the chair at the
  MCP tool names instead.

Halacha and metadata extraction for an uploaded precedent now happen
when the chair (via Claude Code) runs:
  mcp__legal-ai__precedent_extract_metadata <case_law_id>
  mcp__legal-ai__precedent_extract_halachot <case_law_id>

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-05-03 11:06:08 +00:00
5d836ca414 fix(precedents): Anthropic SDK fallback, format() crash, UI refresh
All checks were successful
Build & Deploy / build-and-deploy (push) Successful in 1m31s
Three fixes to the precedent library after the first end-to-end test on
403-17 surfaced runtime issues:

1. Anthropic SDK fallback in claude_session. The legal-ai Docker container
   does not ship the `claude` CLI, so every halacha and metadata extraction
   was failing with "Claude CLI not found." Module now tries the CLI first
   (zero-cost local path) and falls back to the Anthropic SDK with
   ANTHROPIC_API_KEY when the binary is absent. Default model is
   claude-sonnet-4-6, overridable via CLAUDE_SDK_MODEL env. The system
   message gets cache_control: ephemeral so multi-chunk runs reuse the
   cached instruction prefix at ~10% read cost. Adds `anthropic` to
   pyproject deps.

2. precedent_metadata_extractor crashed with KeyError because the JSON
   example inside the prompt template contained literal { } characters
   that str.format() interpreted as placeholders. Switched to f-string
   concatenation; the prompt template no longer needs format() at all.

3. Library list query stays stale after upload because the upload
   mutation's onSuccess fires when the POST returns task_id, not when
   SSE reports completion. Added a second invalidate inside the SSE
   watcher in PrecedentUploadSheet so the new row appears with up-to-date
   chunk and halachot counts the moment processing finishes.

Halacha and metadata extractors now route the long static prompt through
the new `system=` parameter so the SDK path actually caches it; the CLI
path concatenates and behaves as before.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-05-03 10:52:31 +00:00
82ba4663ba Fix case repo sync + auto-create Gitea repos + add sync indicator
All checks were successful
Build & Deploy / build-and-deploy (push) Successful in 1m30s
- auto-sync-cases.sh: fix broken directory scan (was looking for
  status subdirs that don't exist), fix env var word-splitting bug,
  add safe.directory handling and error logging
- cases.py: auto-create Gitea repo on case_create, fix
  documents/original → documents/originals naming mismatch
- app.py: add GET /api/cases/{case_number}/git-status endpoint
- web-ui: add SyncIndicator component in case header showing
  sync status (synced/pending/no remote) with last commit time
- pyproject.toml: add httpx dependency
- CLAUDE.md: update Paperclip wakeup API docs
- settings page: switch tag input from Select to free-text with datalist

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
2026-04-14 15:28:16 +00:00
94bc66d7c1 Bundle FastAPI backend into Next.js Docker container
The Next.js app was proxying /api/* to the old Flask/FastAPI server
at legal-ai.nautilus.marcusgroup.org. When that server went down,
the Next.js app's API calls failed with 503.

Now both services run in the same container:
- FastAPI (uvicorn) on :8000 — the API backend
- Next.js (node) on :3000 — proxies /api/* to localhost:8000

Changes:
- Dockerfile: multi-stage build with Python 3.12 + Node.js
- next.config.ts: default proxy target is now 127.0.0.1:8000
- start.sh: launches uvicorn in background + node in foreground
- pyproject.toml: add fastapi + uvicorn as explicit deps

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
2026-04-13 14:33:52 +00:00
6aaca14e31 Replace Claude Vision OCR with Google Cloud Vision
Benchmark results on Hebrew legal docs (case 1130-25):
- Google Vision: 1s/page, $0.001/page, high accuracy
- Claude Opus Vision: 90s/page, $0.05/page, poor accuracy
- PyMuPDF broken OCR layers now detected via quality check

Changes:
- extractor.py: Google Vision OCR with Hebrew language hint (300 DPI)
- extractor.py: text quality detection (word length, words-per-line, Hebrew ratio)
- extractor.py: Hebrew abbreviation quote fixer (15 known patterns)
- config.py: add GOOGLE_CLOUD_VISION_API_KEY, remove ANTHROPIC_API_KEY
- pyproject.toml: add google-cloud-vision, remove anthropic

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
2026-04-08 20:17:58 +00:00
6f515dc2cb Initial commit: MCP server + web upload interface
Ezer Mishpati - AI legal decision drafting system with:
- MCP server (FastMCP) with document processing pipeline
- Web upload interface (FastAPI) for file upload and classification
- pgvector-based semantic search
- Hebrew legal document chunking and embedding
2026-03-23 12:33:07 +00:00