legal-ai/mcp-server/src/legal_mcp/services/gemini_session.py
Chaim a40c4ee828
fix(metadata): accept GOOGLE_GEMINI_API_KEY (canonical) in gemini_session — host metadata extraction broke
_api_key() read ONLY `GEMINI_API_KEY`, but the canonical secret (host ~/.env and
Infisical SoT nautilus:/external-apis/gemini) is `GOOGLE_GEMINI_API_KEY`. The key
was present but under the canonical name, so `_api_key()` raised "GEMINI_API_KEY אינו
מוגדר" (Hebrew for "GEMINI_API_KEY is not set") on every call, and ALL host
precedent-metadata extraction via Gemini failed silently (186 such errors in the
legal-metadata-drain err log, latest 2026-06-14).

Fix: read GEMINI_API_KEY if set, else fall back to GOOGLE_GEMINI_API_KEY. No new
secret, no duplication — aligns the code to the existing SoT name (G1: fix at
source). Verified live: _api_key() resolves (len=53) and a real gemini query_json
call returns {"ok": true}.
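
The precedence rule described in the fix can be sketched as a pure function over a mapping, which makes it easy to check without touching the real environment. This is a minimal sketch, not the committed code: `resolve_key` and `KEY_NAMES` are hypothetical names introduced here for illustration, and the actual `_api_key()` reads `os.environ` directly and raises `GeminiError` instead of returning `None`.

```python
from __future__ import annotations

from collections.abc import Mapping

# Hypothetical constant (illustration only): lookup order from the fix above.
# The alias is tried first, then the canonical SoT name.
KEY_NAMES = ("GEMINI_API_KEY", "GOOGLE_GEMINI_API_KEY")


def resolve_key(env: Mapping[str, str]) -> str | None:
    """Return the first non-blank key in precedence order, or None if unset."""
    for name in KEY_NAMES:
        value = env.get(name, "").strip()
        if value:  # a whitespace-only value counts as "not set"
            return value
    return None
```

Passing the mapping explicitly (for example `resolve_key(os.environ)`) keeps the lookup order testable in isolation; a blank `GEMINI_API_KEY` falls through to the canonical name rather than masking it.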

Invariants: G1 (fix at source — code reads the canonical secret name, not a
parallel/duplicated env var) · X10 (deploy-env-secrets: single SoT name honored).

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
2026-06-14 20:33:15 +00:00

107 lines
4.3 KiB
Python

"""Gemini structured-output helper — a drop-in for ``claude_session.query_json``
for BOUNDED extraction tasks (text → JSON).

Why a second LLM path: metadata extraction is a single structured call (fill
case_name/summary/headnote/tags from a verdict's text), not an agentic loop. The
``claude -p`` CLI behind ``claude_session`` is agentic: it reaches for tools and
hits ``error_max_turns`` on a task that should be one shot, so it was slow and
flaky for the precedent metadata queue. Gemini Flash with JSON mode
(``responseMimeType: application/json``) is the right tool: one call,
schema-clean JSON, fast, and ~$0.10/1M tokens (negligible for this volume).

Scope: **bounded extraction only** (precedent metadata). The agentic,
voice-sensitive work (decision writing, analysis, halacha extraction) stays on
``claude_session`` (Daphna's subscription, zero API cost). This is a deliberate
per-task provider choice, not a wholesale move off Claude.

Key: ``GOOGLE_GEMINI_API_KEY`` (the canonical host ~/.env / Infisical name, SoT
nautilus:/external-apis/gemini); ``GEMINI_API_KEY`` is also accepted as an alias.
Model: ``GEMINI_MODEL`` (default gemini-2.5-flash).

Direct REST via httpx, no extra SDK dependency.
"""
from __future__ import annotations

import json
import logging
import os

import httpx

logger = logging.getLogger(__name__)

_BASE = "https://generativelanguage.googleapis.com/v1beta"
_DEFAULT_MODEL = os.environ.get("GEMINI_MODEL", "gemini-2.5-flash")
_DEFAULT_TIMEOUT = float(os.environ.get("GEMINI_TIMEOUT_S", "120"))


class GeminiError(RuntimeError):
    """Gemini API call failed or returned an unexpected shape."""


def _api_key() -> str:
    # Accept BOTH names: the canonical Infisical / host-~/.env secret is
    # ``GOOGLE_GEMINI_API_KEY`` (SoT nautilus:/external-apis/gemini), while older
    # call sites / container envs may export ``GEMINI_API_KEY``. Reading only the
    # latter silently broke ALL host metadata extraction (the key is present but
    # under the canonical name). Prefer GEMINI_API_KEY if set, else the SoT name.
    key = (
        os.environ.get("GEMINI_API_KEY", "").strip()
        or os.environ.get("GOOGLE_GEMINI_API_KEY", "").strip()
    )
    if not key:
        # The Hebrew phrase in the message below means "is not set".
        raise GeminiError(
            "GEMINI_API_KEY/GOOGLE_GEMINI_API_KEY אינו מוגדר (host ~/.env / "
            "Infisical nautilus:/external-apis/gemini)."
        )
    return key


async def query_json(
    prompt: str,
    timeout: float | int = _DEFAULT_TIMEOUT,
    *,
    system: str | None = None,
    model: str | None = None,
    # Accepted for drop-in parity with claude_session.query_json; ignored here.
    effort: str | None = None,
    tools: str | None = None,
) -> dict | list | None:
    """Single structured-output call → parsed JSON. Drop-in for
    ``claude_session.query_json``. Raises ``GeminiError`` on failure (the caller
    treats that like any extraction failure: recorded, never silently wrong).
    """
    model = model or _DEFAULT_MODEL
    body: dict = {
        "contents": [{"role": "user", "parts": [{"text": prompt}]}],
        "generationConfig": {
            "responseMimeType": "application/json",
            "temperature": 0,
        },
    }
    if system:
        body["system_instruction"] = {"parts": [{"text": system}]}

    url = f"{_BASE}/models/{model}:generateContent"
    try:
        async with httpx.AsyncClient(timeout=float(timeout)) as client:
            resp = await client.post(url, params={"key": _api_key()}, json=body)
    except httpx.HTTPError as e:
        raise GeminiError(f"Gemini request failed: {e}") from e

    if resp.status_code != 200:
        raise GeminiError(f"Gemini HTTP {resp.status_code}: {resp.text[:200]}")

    data = resp.json()
    # Surface an explicit safety/finish block rather than returning empty.
    cand = (data.get("candidates") or [{}])[0]
    if cand.get("finishReason") in ("SAFETY", "RECITATION", "PROHIBITED_CONTENT"):
        raise GeminiError(f"Gemini blocked output: finishReason={cand['finishReason']}")

    try:
        text = cand["content"]["parts"][0]["text"]
    except (KeyError, IndexError, TypeError) as e:
        raise GeminiError(f"Gemini unexpected response: {str(data)[:200]}") from e

    try:
        return json.loads(text)
    except json.JSONDecodeError as e:
        raise GeminiError(f"Gemini returned non-JSON: {text[:200]}") from e
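
The response handling at the end of `query_json` can be exercised offline by separating it from the HTTP call. The sketch below is an illustration under stated assumptions rather than part of the module: `extract_json` is a hypothetical helper, it raises plain `ValueError` where the module raises `GeminiError`, and the sample payloads in the usage note are hand-written to match the shape the code above reads (`candidates[0].content.parts[0].text`), not recorded API responses.

```python
import json

# Finish reasons that the module above treats as an explicit block.
BLOCKED_REASONS = ("SAFETY", "RECITATION", "PROHIBITED_CONTENT")


def extract_json(data: dict):
    """Hypothetical helper: pull the JSON payload out of a response-shaped dict."""
    # Same defensive default as the module: no candidates behaves like one empty candidate.
    cand = (data.get("candidates") or [{}])[0]
    if cand.get("finishReason") in BLOCKED_REASONS:
        raise ValueError(f"blocked output: finishReason={cand['finishReason']}")
    try:
        text = cand["content"]["parts"][0]["text"]
    except (KeyError, IndexError, TypeError) as e:
        raise ValueError("unexpected response shape") from e
    # json.JSONDecodeError subclasses ValueError, so callers can catch one type.
    return json.loads(text)
```

For example, `extract_json({"candidates": [{"finishReason": "STOP", "content": {"parts": [{"text": "{\"ok\": true}"}]}}]})` yields the parsed dict, while an empty dict or a `SAFETY` candidate raises instead of returning an empty result, which matches the "recorded, never silently wrong" intent stated in the docstring.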