Architectural correction: every claude_session caller in this project
runs through the local MCP server (~/.claude.json points at
/home/chaim/legal-ai/mcp-server/.venv/bin/python). The Coolify container
has no `claude` CLI and no claude.ai session, so any LLM call originating
from web/ FastAPI fails with "Claude CLI not found" — which is exactly
what we hit on 403-17.
The earlier Anthropic SDK fallback would have made it work, but at
direct API cost. The chair's preference is to stay on the claude.ai
session for everything. So:
- claude_session.py: removed the SDK fallback, restored CLI-only.
  The error message now points the next person at the architectural
  rule in the module docstring instead of papering over it.
- precedent_library.py:ingest_precedent (called from FastAPI on upload)
  now does only the non-LLM half: extract → chunk → embed → store.
  Sets halacha_extraction_status='pending' for the chair to act on.
- reextract_halachot / reextract_metadata kept, but lazy-import their
  extractors so the FastAPI path can't accidentally pull them in. They
  are reachable only via the MCP tools precedent_extract_halachot /
  precedent_extract_metadata, which run locally with CLI.
- Removed POST /api/precedent-library/{id}/extract-halachot and
  /extract-metadata; they were dead ends from the container.
- Dropped the `anthropic` Python dep that the SDK fallback required.
- UI: removed the "refresh halachot" and "sparkles metadata" buttons
  that called those endpoints. Edit sheet now points the chair at the
  MCP tool names instead.
Halacha and metadata extraction for an uploaded precedent now happen
when the chair (via Claude Code) runs:
    mcp__legal-ai__precedent_extract_metadata <case_law_id>
    mcp__legal-ai__precedent_extract_halachot <case_law_id>
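
The split between upload-time ingestion and later extraction could look roughly like the sketch below. It is illustrative only: `PrecedentRecord`, `STORE`, `chunk_text`, `embed`, the `_sketch` function names, and the `"done"` status value are hypothetical stand-ins, not the project's real code. Only the extract → chunk → embed → store order, the `'pending'` status, and the lazy import of the extractor come from the description above.

```python
from dataclasses import dataclass


@dataclass
class PrecedentRecord:
    """Hypothetical stored row; the real schema is not shown in this commit."""
    case_law_id: str
    chunks: list[str]
    embeddings: list[list[float]]
    halacha_extraction_status: str = "pending"


# Hypothetical in-memory stand-in for the real database.
STORE: dict[str, PrecedentRecord] = {}


def chunk_text(text: str, size: int = 40) -> list[str]:
    """Split text into fixed-size pieces (toy chunker)."""
    return [text[i:i + size] for i in range(0, len(text), size)]


def embed(chunk: str) -> list[float]:
    """Toy embedding; a real pipeline would call an embedding model here."""
    return [float(len(chunk)), float(sum(map(ord, chunk)) % 997)]


def ingest_precedent_sketch(case_law_id: str, raw_text: str) -> PrecedentRecord:
    """Non-LLM half only, so it is safe to call from the web container."""
    text = raw_text.strip()                      # extract (stand-in)
    chunks = chunk_text(text)                    # chunk
    vectors = [embed(c) for c in chunks]         # embed
    record = PrecedentRecord(case_law_id, chunks, vectors)
    STORE[case_law_id] = record                  # store, status stays 'pending'
    return record


def reextract_halachot_sketch(case_law_id: str) -> str:
    """LLM half; meant to run only where the CLI exists (the MCP server)."""
    # Lazy import: importing this module never loads the extractor, only
    # calling this function does. `textwrap` stands in for the real extractor.
    from textwrap import shorten

    record = STORE[case_law_id]
    summary = shorten(" ".join(record.chunks), width=60)
    record.halacha_extraction_status = "done"    # hypothetical status value
    return summary
```

Under these assumptions, the upload path only ever calls `ingest_precedent_sketch`, and the record stays `'pending'` until something with CLI access calls the re-extract function.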
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
141 lines
5.1 KiB
Python
"""Claude Code session bridge — runs prompts via the local `claude` CLI.

All LLM calls in legal-ai go through this module. We shell out to the local
Claude Code CLI which uses the developer's claude.ai session — zero direct
API cost.

**Architectural rule (do not violate):** this module only works when invoked
from the local MCP server (the Python process at
`/home/chaim/legal-ai/mcp-server/`, launched per `~/.claude.json`). It will
**not** work when called from the legal-ai Docker container — that container
has no `claude` CLI and no claude.ai session. Any code path under `web/`
(FastAPI) that calls this module — directly or via an extractor like
`halacha_extractor`, `claims_extractor`, `precedent_metadata_extractor`,
`block_writer`, `qa_validator`, `learning_loop`, `local_classifier`,
`appraiser_facts_extractor`, `brainstorm`, `style_analyzer` — is wrong.
LLM-dependent operations must be exposed as MCP tools and triggered from
agents (or the chair via Claude Code), where this module runs locally with
CLI access.

Async history: originally synchronous (``subprocess.run``) with a 120 s
timeout. That broke for large legal documents — sync subprocess stalled the
asyncio loop, and 120 s was far too short for cold-cache Hebrew prompts
(case 8174-24 hit three timeouts in a row). Fixed by going async with a
30-minute ceiling.
"""

from __future__ import annotations

import asyncio
import json
import logging

from legal_mcp.config import parse_llm_json

logger = logging.getLogger(__name__)

# Default ceiling for any single ``claude -p`` invocation, in seconds.
# 30 min covers any single-document call we make in practice (chunking
# handles the rest); the bound exists only to prevent runaway zombies.
DEFAULT_TIMEOUT = 1800
LONG_TIMEOUT = 3600  # opus block writing on full case context


async def query(
    prompt: str,
    timeout: int = DEFAULT_TIMEOUT,
    max_turns: int = 1,
    *,
    system: str | None = None,
) -> str:
    """Send a prompt to Claude Code headless and return the text response.

    Passes the prompt via stdin (not argv) to avoid the OS ARG_MAX limit —
    prompts can be 500K+ chars when analyzing a full style corpus.

    Args:
        prompt: The prompt to send.
        timeout: Max seconds before the subprocess is killed.
        max_turns: Max conversation turns (1 = single response).
        system: Optional repeated-instruction text. Prepended to ``prompt``
            for the CLI; we don't pass it as a separate arg because the
            CLI doesn't expose API-level caching. The parameter exists so
            extractors can structure their calls cleanly today, and to make
            a future SDK-backed path drop-in.

    Returns:
        The text response from Claude.

    Raises:
        RuntimeError: if the CLI is unavailable (e.g., called from the
            container — see module docstring), or fails, or times out.
    """
    full_prompt = f"{system}\n\n{prompt}" if system else prompt

    cmd = [
        "claude", "-p",
        "--output-format", "json",
        "--max-turns", str(max_turns),
    ]

    try:
        proc = await asyncio.create_subprocess_exec(
            *cmd,
            stdin=asyncio.subprocess.PIPE,
            stdout=asyncio.subprocess.PIPE,
            stderr=asyncio.subprocess.PIPE,
        )
    except FileNotFoundError:
        raise RuntimeError(
            "Claude CLI not found. This module only works when invoked "
            "from the local MCP server — see the architectural rule in "
            "the module docstring. If this error came from a FastAPI "
            "endpoint in the container, refactor the call into an MCP "
            "tool that the chair triggers from Claude Code."
        )

    try:
        stdout_b, stderr_b = await asyncio.wait_for(
            proc.communicate(input=full_prompt.encode("utf-8")),
            timeout=timeout,
        )
    except asyncio.TimeoutError:
        # wait_for cancellation alone leaves the child running.
        try:
            proc.kill()
            await proc.wait()
        except ProcessLookupError:
            pass
        raise RuntimeError(f"Claude CLI timed out after {timeout}s")

    if proc.returncode != 0:
        stderr = stderr_b.decode("utf-8", errors="replace").strip()[:500] or "unknown error"
        raise RuntimeError(f"Claude CLI failed (exit {proc.returncode}): {stderr}")

    stdout = stdout_b.decode("utf-8", errors="replace").strip()
    if not stdout:
        raise RuntimeError("Claude CLI returned empty response")

    # claude -p --output-format json returns {"type":"result","result":"..."}
    try:
        data = json.loads(stdout)
        if isinstance(data, dict) and "result" in data:
            return data["result"]
        return stdout
    except json.JSONDecodeError:
        return stdout


async def query_json(
    prompt: str,
    timeout: int = DEFAULT_TIMEOUT,
    *,
    system: str | None = None,
) -> dict | list | None:
    """Send a prompt and parse the response as JSON.

    Uses parse_llm_json for robust parsing (handles markdown wrapping, truncation).
    """
    raw = await query(prompt, timeout=timeout, system=system)
    return parse_llm_json(raw)
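
The stdin-plus-timeout pattern used by `query` can be tried in isolation, without the `claude` CLI, with a sketch like the one below. It is a minimal demonstration and not part of the module: `run_with_ceiling`, `ECHO`, and `SLEEP` are hypothetical names, and the child processes are plain Python interpreters standing in for the CLI.

```python
import asyncio
import sys


async def run_with_ceiling(cmd: list[str], stdin_text: str, timeout: float) -> str:
    """Run cmd, feed stdin_text on stdin, and kill the child if it overruns."""
    proc = await asyncio.create_subprocess_exec(
        *cmd,
        stdin=asyncio.subprocess.PIPE,
        stdout=asyncio.subprocess.PIPE,
        stderr=asyncio.subprocess.PIPE,
    )
    try:
        out, _err = await asyncio.wait_for(
            proc.communicate(input=stdin_text.encode("utf-8")),
            timeout=timeout,
        )
    except asyncio.TimeoutError:
        # Cancelling communicate() does not stop the child: kill it, then
        # wait so it is reaped and does not linger as a zombie.
        try:
            proc.kill()
            await proc.wait()
        except ProcessLookupError:
            pass  # child already exited between the timeout and the kill
        raise RuntimeError(f"child timed out after {timeout}s")
    return out.decode("utf-8", errors="replace")


# Stand-in children: one echoes stdin back, one sleeps past the ceiling.
ECHO = [sys.executable, "-c", "import sys; sys.stdout.write(sys.stdin.read())"]
SLEEP = [sys.executable, "-c", "import time; time.sleep(30)"]


if __name__ == "__main__":
    print(asyncio.run(run_with_ceiling(ECHO, "hello", timeout=10)))
    try:
        asyncio.run(run_with_ceiling(SLEEP, "", timeout=0.5))
    except RuntimeError as exc:
        print(exc)
```

Sending the payload over stdin rather than argv is what keeps very large prompts clear of the OS argument-length limit. The explicit `kill()` followed by `wait()` is what stops a timed-out child from continuing to run in the background.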