Files
legal-ai/mcp-server/src/legal_mcp/services/claude_session.py
Chaim 2cfdf35191
All checks were successful
Build & Deploy / build-and-deploy (push) Successful in 1m28s
refactor(precedents): keep all LLM calls on the local-MCP path
Architectural correction: every claude_session caller in this project
runs through the local MCP server (~/.claude.json points at
/home/chaim/legal-ai/mcp-server/.venv/bin/python). The Coolify container
has no `claude` CLI and no claude.ai session, so any LLM call originating
from web/ FastAPI fails with "Claude CLI not found" — which is exactly
what we hit on 403-17.

The earlier Anthropic SDK fallback would have made it work, but at
direct API cost. The chair's preference is to stay on the claude.ai
session for everything. So:

- claude_session.py: removed the SDK fallback, restored CLI-only.
  The error message now points the next person at the architectural
  rule in the module docstring instead of papering over it.
- precedent_library.py:ingest_precedent (called from FastAPI on upload)
  now does only the non-LLM half: extract → chunk → embed → store.
  Sets halacha_extraction_status='pending' for the chair to act on.
- reextract_halachot / reextract_metadata kept, but lazy-import their
  extractors so the FastAPI path can't accidentally pull them in. They
  are reachable only via the MCP tools precedent_extract_halachot /
  precedent_extract_metadata, which run locally with CLI.
- Removed POST /api/precedent-library/{id}/extract-halachot and
  /extract-metadata — they were dead ends from the container.
- Dropped the `anthropic` Python dep that the SDK fallback required.
- UI: removed the "refresh halachot" and "sparkles metadata" buttons
  that called those endpoints. Edit sheet now points the chair at the
  MCP tool names instead.

Halacha and metadata extraction for an uploaded precedent now happen
when the chair (via Claude Code) runs:
  mcp__legal-ai__precedent_extract_metadata <case_law_id>
  mcp__legal-ai__precedent_extract_halachot <case_law_id>

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-05-03 11:06:08 +00:00

141 lines
5.1 KiB
Python

"""Claude Code session bridge — runs prompts via the local `claude` CLI.
All LLM calls in legal-ai go through this module. We shell out to the local
Claude Code CLI which uses the developer's claude.ai session — zero direct
API cost.
**Architectural rule (do not violate):** this module only works when invoked
from the local MCP server (the Python process at
`/home/chaim/legal-ai/mcp-server/`, launched per `~/.claude.json`). It will
**not** work when called from the legal-ai Docker container — that container
has no `claude` CLI and no claude.ai session. Any code path under `web/`
(FastAPI) that calls this module — directly or via an extractor like
`halacha_extractor`, `claims_extractor`, `precedent_metadata_extractor`,
`block_writer`, `qa_validator`, `learning_loop`, `local_classifier`,
`appraiser_facts_extractor`, `brainstorm`, `style_analyzer` — is wrong.
LLM-dependent operations must be exposed as MCP tools and triggered from
agents (or the chair via Claude Code), where this module runs locally with
CLI access.
Async history: originally synchronous (``subprocess.run``) with a 120 s
timeout. That broke for large legal documents — sync subprocess stalled the
asyncio loop, and 120 s was far too short for cold-cache Hebrew prompts
(case 8174-24 hit three timeouts in a row). Fixed by going async with a
30-minute ceiling.
"""
from __future__ import annotations
import asyncio
import json
import logging
from legal_mcp.config import parse_llm_json
logger = logging.getLogger(__name__)
# Default ceiling for any single ``claude -p`` invocation, in seconds.
# 30 min covers any single-document call we make in practice (chunking
# handles the rest); the bound exists only to prevent runaway zombies.
DEFAULT_TIMEOUT = 1800
LONG_TIMEOUT = 3600 # opus block writing on full case context
async def query(
prompt: str,
timeout: int = DEFAULT_TIMEOUT,
max_turns: int = 1,
*,
system: str | None = None,
) -> str:
"""Send a prompt to Claude Code headless and return the text response.
Passes the prompt via stdin (not argv) to avoid the OS ARG_MAX limit —
prompts can be 500K+ chars when analyzing a full style corpus.
Args:
prompt: The prompt to send.
timeout: Max seconds before the subprocess is killed.
max_turns: Max conversation turns (1 = single response).
system: Optional repeated-instruction text. Prepended to ``prompt``
for the CLI; we don't pass it as a separate arg because the
CLI doesn't expose API-level caching. The parameter exists so
extractors can structure their calls cleanly today, and to make
a future SDK-backed path drop-in.
Returns:
The text response from Claude.
Raises:
RuntimeError: if the CLI is unavailable (e.g., called from the
container — see module docstring), or fails, or times out.
"""
full_prompt = f"{system}\n\n{prompt}" if system else prompt
cmd = [
"claude", "-p",
"--output-format", "json",
"--max-turns", str(max_turns),
]
try:
proc = await asyncio.create_subprocess_exec(
*cmd,
stdin=asyncio.subprocess.PIPE,
stdout=asyncio.subprocess.PIPE,
stderr=asyncio.subprocess.PIPE,
)
except FileNotFoundError:
raise RuntimeError(
"Claude CLI not found. This module only works when invoked "
"from the local MCP server — see the architectural rule in "
"the module docstring. If this error came from a FastAPI "
"endpoint in the container, refactor the call into an MCP "
"tool that the chair triggers from Claude Code."
)
try:
stdout_b, stderr_b = await asyncio.wait_for(
proc.communicate(input=full_prompt.encode("utf-8")),
timeout=timeout,
)
except asyncio.TimeoutError:
# wait_for cancellation alone leaves the child running.
try:
proc.kill()
await proc.wait()
except ProcessLookupError:
pass
raise RuntimeError(f"Claude CLI timed out after {timeout}s")
if proc.returncode != 0:
stderr = stderr_b.decode("utf-8", errors="replace").strip()[:500] or "unknown error"
raise RuntimeError(f"Claude CLI failed (exit {proc.returncode}): {stderr}")
stdout = stdout_b.decode("utf-8", errors="replace").strip()
if not stdout:
raise RuntimeError("Claude CLI returned empty response")
# claude -p --output-format json returns {"type":"result","result":"..."}
try:
data = json.loads(stdout)
if isinstance(data, dict) and "result" in data:
return data["result"]
return stdout
except json.JSONDecodeError:
return stdout
async def query_json(
prompt: str,
timeout: int = DEFAULT_TIMEOUT,
*,
system: str | None = None,
) -> dict | list | None:
"""Send a prompt and parse the response as JSON.
Uses parse_llm_json for robust parsing (handles markdown wrapping, truncation).
"""
raw = await query(prompt, timeout=timeout, system=system)
return parse_llm_json(raw)