feat(extraction): חילוץ-מטא של פסיקה דרך Gemini Flash + drainer מתוזמן #138
@@ -92,12 +92,14 @@ NCSC/JTC — *AI in Courts* (verifiable citation) | סטטוס: verified
|
||||
**אכיפה:** `proofreader.verify_quote` בעת חילוץ → `quote_verified`.
|
||||
**הפרה ידועה:** — (קיים; ה-flag נכתב, אך אין חיווי ב-UI — ראה [X6 INV-UI6](X6-ui-api-contract.md)).
|
||||
|
||||
### INV-FP5: חילוץ אסינכרוני דרך claude_session מקומי
|
||||
**כלל:** חילוץ-LLM (מטא, הלכות) רץ **אסינכרוני, מתור**, דרך `claude_session` **מקומי בלבד** — לא חוסם את
|
||||
ה-web, ולא קורא ל-LLM מהקונטיינר. מופע של [G2](00-constitution.md#inv-g2-מקור-אמת-יחיד--אין-מסלולים-מקבילים-מתפצלים)
|
||||
(מסלול-LLM קנוני יחיד). **פרויקטלי-תפעולי.** תואם זיכרון `feedback_claude_session_local_only`.
|
||||
**מקור-סמכות:** [ingest.py](../../mcp-server/src/legal_mcp/services/ingest.py) (queue בצעד 12 → `process_pending_extractions`); [legal-ai/CLAUDE.md](../../CLAUDE.md) (claude_session local-only).
|
||||
**אכיפה:** queue + `precedent_process_pending`; קריאות-LLM רק מ-MCP מקומי.
|
||||
### INV-FP5: חילוץ אסינכרוני, מתור, צד-מארח (לא מהקונטיינר)
|
||||
**כלל:** חילוץ-LLM (מטא, הלכות) רץ **אסינכרוני, מתור, מצד-המארח** — לא חוסם את ה-web ולא קורא ל-LLM
|
||||
מהקונטיינר. **בחירת-מנוע לפי אופי-המשימה (לא מסלול מקביל):** חילוץ-מטא הוא משימה *תחומה* (טקסט→JSON)
|
||||
ולכן רץ על **Gemini Flash** (`gemini_session`, structured JSON) — ה-claude CLI ה-agentic פגע ב-
|
||||
`error_max_turns`; חילוץ-הלכות (רגיש-קול/agentic) נשאר על **`claude_session`** (CLI מקומי, מנוי דפנה).
|
||||
שני המנועים מתנקזים לתור-החילוץ הקנוני היחיד ([G2](00-constitution.md#inv-g2-מקור-אמת-יחיד--אין-מסלולים-מקבילים-מתפצלים)). **פרויקטלי-תפעולי.**
|
||||
**מקור-סמכות:** [ingest.py](../../mcp-server/src/legal_mcp/services/ingest.py) (queue → `process_pending_extractions`); [gemini_session.py](../../mcp-server/src/legal_mcp/services/gemini_session.py) (מטא); [legal-ai/CLAUDE.md](../../CLAUDE.md) (claude_session local-only להלכות). `GEMINI_API_KEY` בצד-המארח בלבד — לא בקונטיינר (תואם `feedback_claude_session_local_only`: אין קריאות-LLM מהקונטיינר).
|
||||
**אכיפה:** queue + `precedent_process_pending` + drainers מתוזמנים (`legal-metadata-drain`/CEO); קריאות-LLM רק מצד-המארח.
|
||||
**הפרה ידועה:** תור-החילוץ **סמוי** (אין הבחנה pending-initial מול pending-review; אין extraction-job table) ([gap-audit GAP-45](gap-audit.md); [X9](X9-mcp-tool-contract.md)).
|
||||
|
||||
---
|
||||
|
||||
97
mcp-server/src/legal_mcp/services/gemini_session.py
Normal file
97
mcp-server/src/legal_mcp/services/gemini_session.py
Normal file
@@ -0,0 +1,97 @@
|
||||
"""Gemini structured-output helper — a drop-in for ``claude_session.query_json``
|
||||
for BOUNDED extraction tasks (text → JSON).
|
||||
|
||||
Why a second LLM path: metadata extraction is a single structured call (fill
|
||||
case_name/summary/headnote/tags from a verdict's text), not an agentic loop. The
|
||||
``claude -p`` CLI behind ``claude_session`` is agentic — it reaches for tools and
|
||||
hits ``error_max_turns`` on a task that should be one shot — so it was slow and
|
||||
flaky for the precedent metadata queue. Gemini Flash with JSON mode
|
||||
(``responseMimeType: application/json``) is the right tool: one call, schema-
|
||||
clean JSON, fast, and ~$0.10/1M tokens (negligible for this volume).
|
||||
|
||||
Scope: **bounded extraction only** (precedent metadata). The agentic, voice-
|
||||
sensitive work — decision writing, analysis, halacha extraction — stays on
|
||||
``claude_session`` (Daphna's subscription, zero API cost). This is a deliberate
|
||||
per-task provider choice, not a wholesale move off Claude.
|
||||
|
||||
Key: ``GEMINI_API_KEY`` (host ~/.env; SoT Infisical nautilus:/external-apis/gemini
|
||||
as ``GOOGLE_GEMINI_API_KEY``). Model: ``GEMINI_MODEL`` (default gemini-2.5-flash).
|
||||
Direct REST via httpx — no extra SDK dependency.
|
||||
"""
|
||||
|
||||
from __future__ import annotations
|
||||
|
||||
import json
|
||||
import logging
|
||||
import os
|
||||
|
||||
import httpx
|
||||
|
||||
logger = logging.getLogger(__name__)
|
||||
|
||||
_BASE = "https://generativelanguage.googleapis.com/v1beta"
|
||||
_DEFAULT_MODEL = os.environ.get("GEMINI_MODEL", "gemini-2.5-flash")
|
||||
_DEFAULT_TIMEOUT = float(os.environ.get("GEMINI_TIMEOUT_S", "120"))
|
||||
|
||||
|
||||
class GeminiError(RuntimeError):
|
||||
"""Gemini API call failed or returned an unexpected shape."""
|
||||
|
||||
|
||||
def _api_key() -> str:
|
||||
key = os.environ.get("GEMINI_API_KEY", "").strip()
|
||||
if not key:
|
||||
raise GeminiError(
|
||||
"GEMINI_API_KEY אינו מוגדר (host ~/.env / Infisical "
|
||||
"nautilus:/external-apis/gemini)."
|
||||
)
|
||||
return key
|
||||
|
||||
|
||||
async def query_json(
|
||||
prompt: str,
|
||||
timeout: float | int = _DEFAULT_TIMEOUT,
|
||||
*,
|
||||
system: str | None = None,
|
||||
model: str | None = None,
|
||||
# Accepted for drop-in parity with claude_session.query_json; ignored here.
|
||||
effort: str | None = None,
|
||||
tools: str | None = None,
|
||||
) -> dict | list | None:
|
||||
"""Single structured-output call → parsed JSON. Drop-in for
|
||||
``claude_session.query_json``. Raises ``GeminiError`` on failure (the caller
|
||||
treats that like any extraction failure — recorded, never silently wrong).
|
||||
"""
|
||||
model = model or _DEFAULT_MODEL
|
||||
body: dict = {
|
||||
"contents": [{"role": "user", "parts": [{"text": prompt}]}],
|
||||
"generationConfig": {
|
||||
"responseMimeType": "application/json",
|
||||
"temperature": 0,
|
||||
},
|
||||
}
|
||||
if system:
|
||||
body["system_instruction"] = {"parts": [{"text": system}]}
|
||||
|
||||
url = f"{_BASE}/models/{model}:generateContent"
|
||||
try:
|
||||
async with httpx.AsyncClient(timeout=float(timeout)) as client:
|
||||
resp = await client.post(url, params={"key": _api_key()}, json=body)
|
||||
except httpx.HTTPError as e:
|
||||
raise GeminiError(f"Gemini request failed: {e}") from e
|
||||
if resp.status_code != 200:
|
||||
raise GeminiError(f"Gemini HTTP {resp.status_code}: {resp.text[:200]}")
|
||||
|
||||
data = resp.json()
|
||||
# Surface an explicit safety/finish block rather than returning empty.
|
||||
cand = (data.get("candidates") or [{}])[0]
|
||||
if cand.get("finishReason") in ("SAFETY", "RECITATION", "PROHIBITED_CONTENT"):
|
||||
raise GeminiError(f"Gemini blocked output: finishReason={cand['finishReason']}")
|
||||
try:
|
||||
text = cand["content"]["parts"][0]["text"]
|
||||
except (KeyError, IndexError, TypeError) as e:
|
||||
raise GeminiError(f"Gemini unexpected response: {str(data)[:200]}") from e
|
||||
try:
|
||||
return json.loads(text)
|
||||
except json.JSONDecodeError as e:
|
||||
raise GeminiError(f"Gemini returned non-JSON: {text[:200]}") from e
|
||||
@@ -15,6 +15,7 @@ from __future__ import annotations
|
||||
|
||||
import asyncio
|
||||
import logging
|
||||
import os
|
||||
from pathlib import Path
|
||||
from typing import Awaitable, Callable
|
||||
from uuid import UUID
|
||||
@@ -179,6 +180,9 @@ async def reextract_halachot(
|
||||
# precedent into a 429 storm. Observed 2026-05-03: 1110/20 succeeded with 9
|
||||
# halachot, 317/10 immediately after returned silent no_halachot.
|
||||
INTER_PRECEDENT_COOLDOWN_SEC = 30
|
||||
# Metadata extraction is on Gemini (fast, high rate limits) — a brief spacer is
|
||||
# enough; the 30s above is for the Claude-backed halacha path.
|
||||
METADATA_COOLDOWN_SEC = float(os.environ.get("METADATA_COOLDOWN_SEC", "2"))
|
||||
|
||||
# How many times to retry a precedent that came back as 'extraction_failed'
|
||||
# (i.e. >50% chunks crashed). Each retry uses a longer cooldown.
|
||||
@@ -226,11 +230,14 @@ async def process_pending_extractions(kind: str = "metadata", limit: int = 20) -
|
||||
cid, effort=config.HALACHA_BULK_EXTRACT_EFFORT,
|
||||
)
|
||||
|
||||
# Metadata extraction runs on Gemini (high rate limits, fast) — the long
|
||||
# cooldown is only needed for halacha (Claude/Anthropic rate limits).
|
||||
cooldown = METADATA_COOLDOWN_SEC if kind == "metadata" else INTER_PRECEDENT_COOLDOWN_SEC
|
||||
results: list[dict] = []
|
||||
processed = 0
|
||||
for idx, row in enumerate(pending):
|
||||
if idx > 0:
|
||||
await asyncio.sleep(INTER_PRECEDENT_COOLDOWN_SEC)
|
||||
await asyncio.sleep(cooldown)
|
||||
cid = UUID(str(row["id"]))
|
||||
attempts = 0
|
||||
result: dict = {}
|
||||
|
||||
@@ -19,7 +19,7 @@ from datetime import date as date_type
|
||||
from uuid import UUID
|
||||
|
||||
from legal_mcp.config import parse_llm_json
|
||||
from legal_mcp.services import claude_session, db
|
||||
from legal_mcp.services import db, gemini_session
|
||||
|
||||
logger = logging.getLogger(__name__)
|
||||
|
||||
@@ -150,7 +150,10 @@ async def extract_metadata(case_law_id: UUID | str) -> dict:
|
||||
)
|
||||
|
||||
try:
|
||||
result = await claude_session.query_json(
|
||||
# Bounded structured extraction → Gemini Flash (JSON mode). The agentic
|
||||
# claude CLI hit error_max_turns on this single-shot task; see
|
||||
# gemini_session.py. Voice-sensitive/agentic work stays on claude_session.
|
||||
result = await gemini_session.query_json(
|
||||
user_msg, system=METADATA_EXTRACTION_PROMPT,
|
||||
)
|
||||
except Exception as e:
|
||||
|
||||
@@ -24,6 +24,8 @@
|
||||
| `legal-reaper.config.cjs` | pm2/js | **דמון pm2 ל-`reap_orphan_procs.py --loop`** (ברירת-מחדל 180ש', `REAP_INTERVAL_S` לעקיפה). `max_memory_restart 100M` (ה-reaper עצמו לא ידלוף). התקנה: `pm2 start scripts/legal-reaper.config.cjs && pm2 save`. לוגים: `pm2 logs legal-reaper`. | pm2 (host-side) |
|
||||
| `drain_court_fetch.py` | python | **ריקון תור-אחזור הפסיקה (X13)** — קורא ל-`court_fetch_orchestrator.drain_pending(limit)` שמוריד+קולט כל job ממתין שהיומונים מילאו, וקושר חזרה ליומון. מקומי בלבד (ingest = claude CLI). no-op מהיר כשהתור ריק. הרצה ידנית: `mcp-server/.venv/bin/python scripts/drain_court_fetch.py [limit]`. | דרך `legal-court-fetch-drain.config.cjs` (pm2 cron) |
|
||||
| `legal-court-fetch-drain.config.cjs` | pm2/js | **תזמון שעתי של `drain_court_fetch.py`** (cron `17 * * * *`, `COURT_FETCH_DRAIN_CRON` לעקיפה) — הופך את לולאת יומון→אחזור→קליטה ל-fully-autonomous. `autorestart:false` (one-shot per tick). דורש `legal-court-fetch-service` רץ. התקנה: `pm2 start scripts/legal-court-fetch-drain.config.cjs && pm2 save`. | pm2 cron (host-side) |
|
||||
| `drain_metadata_queue.py` | python | **ריקון תור חילוץ-המטא של הפסיקה** — `process_pending_extractions(kind='metadata')` ב-batches עד ריק. רץ על **Gemini Flash** (structured JSON, `gemini_session`) — מהיר ואמין, במקום ה-claude CLI ה-agentic שפגע ב-`error_max_turns`. no-op מהיר כשריק. הרצה ידנית: `mcp-server/.venv/bin/python scripts/drain_metadata_queue.py [batch]`. | דרך `legal-metadata-drain.config.cjs` (pm2 cron) |
|
||||
| `legal-metadata-drain.config.cjs` | pm2/js | **תזמון כל 15 דק' של `drain_metadata_queue.py`** (cron `*/15 * * * *`, `METADATA_DRAIN_CRON` לעקיפה) — מונע סתימה של תור חילוץ-המטא ב-/precedents. דורש `GEMINI_API_KEY` ב-`~/.env`. התקנה: `pm2 start scripts/legal-metadata-drain.config.cjs && pm2 save`. | pm2 cron (host-side) |
|
||||
| `auto-sync-cases.sh` | bash | סנכרון תיקי ערר ל-Gitea — רץ כל דקה | `* * * * *` (cron) |
|
||||
| `backup-db.sh` | bash | גיבוי PostgreSQL יומי ל-`data/backups/` (gzip) | לתזמן: `0 2 * * *` |
|
||||
| `restore-db.sh` | bash | שחזור DB מגיבוי (companion ל-backup-db.sh) | ידני |
|
||||
|
||||
48
scripts/drain_metadata_queue.py
Normal file
48
scripts/drain_metadata_queue.py
Normal file
@@ -0,0 +1,48 @@
|
||||
"""Drain the precedent metadata-extraction queue.
|
||||
|
||||
Calls ``process_pending_extractions(kind='metadata')`` in batches until the
|
||||
queue is empty (two consecutive zero-progress rounds). Metadata extraction runs
|
||||
on **Gemini Flash** (structured JSON) — fast and reliable, unlike the agentic
|
||||
claude CLI which hit ``error_max_turns`` on this bounded task. A no-op (fast)
|
||||
when the queue is empty.
|
||||
|
||||
Host-only (reads GEMINI_API_KEY + POSTGRES_URL from ~/.env via legal_mcp.config).
|
||||
Scheduled by ``legal-metadata-drain`` (pm2 cron); also runnable by hand:
|
||||
|
||||
mcp-server/.venv/bin/python scripts/drain_metadata_queue.py [batch]
|
||||
"""
|
||||
|
||||
import asyncio
|
||||
import os
|
||||
import sys
|
||||
|
||||
sys.path.insert(0, os.path.join(os.path.dirname(__file__), "..", "mcp-server", "src"))
|
||||
|
||||
from legal_mcp.services import precedent_library as pl
|
||||
|
||||
|
||||
async def main() -> int:
|
||||
batch = int(sys.argv[1]) if len(sys.argv) > 1 else 10
|
||||
total = 0
|
||||
empty_rounds = 0
|
||||
rnd = 0
|
||||
while empty_rounds < 2:
|
||||
rnd += 1
|
||||
out = await pl.process_pending_extractions(kind="metadata", limit=batch)
|
||||
processed = out.get("processed", 0)
|
||||
total += processed
|
||||
print(f"[round {rnd}] processed={processed} total_pending={out.get('total_pending', 0)} "
|
||||
f"status={out.get('status')}", flush=True)
|
||||
for r in out.get("results", []):
|
||||
print(f" {str(r.get('case_number',''))[:42]}: {r.get('status')}", flush=True)
|
||||
if processed == 0:
|
||||
empty_rounds += 1
|
||||
await asyncio.sleep(3)
|
||||
else:
|
||||
empty_rounds = 0
|
||||
print(f"===DONE=== metadata extracted (cumulative cases handled={total})", flush=True)
|
||||
return 0
|
||||
|
||||
|
||||
if __name__ == "__main__":
|
||||
sys.exit(asyncio.run(main()))
|
||||
34
scripts/legal-metadata-drain.config.cjs
Normal file
34
scripts/legal-metadata-drain.config.cjs
Normal file
@@ -0,0 +1,34 @@
|
||||
/**
|
||||
* pm2 ecosystem entry for legal-metadata-drain — scheduled (every 15 min) drain
|
||||
* of the precedent metadata-extraction queue (Gemini Flash). Keeps the
|
||||
* /precedents metadata queue from clogging (the prior agentic claude-CLI path
|
||||
* hit error_max_turns and nothing drained it autonomously).
|
||||
*
|
||||
* Pattern: cron_restart fires the script on schedule; autorestart:false → runs
|
||||
* once and exits (pm2 shows "stopped" between ticks — expected). Cheap no-op
|
||||
* when the queue is empty; Gemini Flash ≈ $0.10/1M tokens.
|
||||
*
|
||||
* Requires (host ~/.env via legal_mcp.config): GEMINI_API_KEY, POSTGRES_URL.
|
||||
*
|
||||
* Install (once):
|
||||
* pm2 start /home/chaim/legal-ai/scripts/legal-metadata-drain.config.cjs
|
||||
* pm2 save
|
||||
* Run now (manual): mcp-server/.venv/bin/python scripts/drain_metadata_queue.py
|
||||
* Schedule override: METADATA_DRAIN_CRON (default every 15 min).
|
||||
*/
|
||||
const cron = process.env.METADATA_DRAIN_CRON || "*/15 * * * *";
|
||||
|
||||
module.exports = {
|
||||
apps: [
|
||||
{
|
||||
name: "legal-metadata-drain",
|
||||
cwd: "/home/chaim/legal-ai",
|
||||
script: "/home/chaim/legal-ai/mcp-server/.venv/bin/python",
|
||||
args: "scripts/drain_metadata_queue.py 10",
|
||||
env: { HOME: "/home/chaim", PYTHONUNBUFFERED: "1" },
|
||||
autorestart: false, // one-shot per cron tick
|
||||
cron_restart: cron,
|
||||
max_memory_restart: "500M",
|
||||
},
|
||||
],
|
||||
};
|
||||
Reference in New Issue
Block a user