LLM session: async, 30min timeout, semantic chunking + parallel
All checks were successful
Build & Deploy / build-and-deploy (push) Successful in 1m28s

The claude_session bridge had two structural defects that made any
non-trivial document extraction unreliable:

  1. subprocess.run() blocked the asyncio event loop in the MCP server
     for the full duration of every LLM call (60-180s typical).
  2. The 120-second timeout was below the cold-cache cost of any
     document over ~12K Hebrew characters. Three back-to-back timeouts
     on case 8174-24 caused 43 appellant claims to be lost.
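
To make the first defect concrete, here is a minimal sketch (not project code) that counts how often a background coroutine gets to run while a slow child process is in flight. The child is a stand-in Python process that sleeps rather than a real `claude -p` call, and `count_ticks`, `measure`, `run_blocking`, and `run_async` are hypothetical names chosen for this illustration.

```python
import asyncio
import subprocess
import sys
from typing import Awaitable, Callable

# Stand-in for a slow LLM call: a child Python process that sleeps.
SLOW_CMD = [sys.executable, "-c", "import time; time.sleep(0.5)"]


async def count_ticks(stop: asyncio.Event) -> int:
    """Count how many times the event loop schedules this coroutine."""
    ticks = 0
    while not stop.is_set():
        ticks += 1
        await asyncio.sleep(0.05)
    return ticks


async def run_blocking() -> None:
    # subprocess.run() never yields, so the loop is frozen until the child exits.
    subprocess.run(SLOW_CMD, check=True)


async def run_async() -> None:
    # Both awaits yield to the loop, so other coroutines keep running.
    proc = await asyncio.create_subprocess_exec(*SLOW_CMD)
    await proc.wait()


async def measure(call: Callable[[], Awaitable[None]]) -> int:
    """Return how many ticks the background counter managed during call()."""
    stop = asyncio.Event()
    ticker = asyncio.create_task(count_ticks(stop))
    await asyncio.sleep(0)  # let the ticker start before the call begins
    await call()
    stop.set()
    return await ticker


if __name__ == "__main__":
    print("ticks while blocking:", asyncio.run(measure(run_blocking)))
    print("ticks while async:   ", asyncio.run(measure(run_async)))
```

With the blocking variant the counter runs once and then starves until the child exits; with the async variant it keeps ticking for the whole duration of the child.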

This is phase 1 of the remediation plan. It keeps claude_session as the
engine (no switch to the Anthropic API) and restructures around it:

claude_session.py
  • query / query_json are now async: they use
    asyncio.create_subprocess_exec instead of subprocess.run, so the MCP
    server can serve other coroutines while a call is in flight.
  • DEFAULT_TIMEOUT 120 → 1800 (30 min). High enough that no realistic
    document hits it; bounded so a runaway never zombifies forever.
  • LONG_TIMEOUT 300 → 3600 for opus block writing on full case context.
  • On timeout, the subprocess is now explicitly killed
    (asyncio.wait_for cancellation alone leaves the child running).
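
The timeout handling described above can be tried in isolation with a small, self-contained sketch. `run_with_timeout` is a hypothetical helper written for this illustration; it follows the same shape as the change (async subprocess, `asyncio.wait_for`, explicit kill) but is not the project's `query()` function and takes an arbitrary command instead of `claude -p`.

```python
from __future__ import annotations

import asyncio
import sys


async def run_with_timeout(cmd: list[str], stdin_text: str, timeout: float) -> str:
    """Run cmd, feed stdin_text on stdin, and return stripped stdout."""
    proc = await asyncio.create_subprocess_exec(
        *cmd,
        stdin=asyncio.subprocess.PIPE,
        stdout=asyncio.subprocess.PIPE,
        stderr=asyncio.subprocess.PIPE,
    )
    try:
        out, err = await asyncio.wait_for(
            proc.communicate(input=stdin_text.encode("utf-8")),
            timeout=timeout,
        )
    except asyncio.TimeoutError:
        # wait_for only cancels the communicate() coroutine; the child
        # process keeps running unless it is killed explicitly.
        try:
            proc.kill()
        except ProcessLookupError:
            pass  # the child exited on its own in the meantime
        await proc.wait()  # reap the child so no zombie is left behind
        raise RuntimeError(f"command timed out after {timeout}s")

    if proc.returncode != 0:
        detail = err.decode("utf-8", errors="replace").strip()[:500] or "unknown error"
        raise RuntimeError(f"command failed (exit {proc.returncode}): {detail}")
    return out.decode("utf-8", errors="replace").strip()


if __name__ == "__main__":
    # Assumed demo command: a child Python process that echoes stdin in upper case.
    echo = [sys.executable, "-c", "import sys; print(sys.stdin.read().upper())"]
    print(asyncio.run(run_with_timeout(echo, "hello", timeout=30)))
```

Passing the text through stdin rather than argv matches what the module's docstring says about avoiding the OS ARG_MAX limit.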

claims_extractor.py
  • _split_by_sections: chunks at numbered sections / Hebrew letter
    headings / "פרק" (chapter) markers / markdown ##, falls back to
    paragraph breaks, then to hard splits. Targets 12K chars per chunk,
    small enough that each chunk reliably finishes inside the timeout.
  • _extract_chunk: per-chunk retry (1 attempt by default) with
    structured logging on failure. Failed chunks no longer crash the
    overall extraction; they're skipped with a partial-result warning.
  • extract_claims_with_ai now runs chunks in parallel via
    asyncio.gather, bounded by a semaphore (CHUNK_CONCURRENCY=3).
    For a 25K-char appeal: previously 150-300s sequential, now ~70-90s.
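
The three-level fallback in the chunking bullet (headings, then paragraph breaks, then hard splits) might look roughly like the sketch below. This is not the real `_split_by_sections`: the heading regex, `split_text`, and `_pack` are assumptions made for illustration, and only the 12K target comes from this commit.

```python
from __future__ import annotations

import re

TARGET_CHARS = 12_000  # per-chunk target stated in the commit message

# Assumed heading patterns (the real ones may differ): "12. ", a Hebrew
# letter followed by "." or ")", the word "פרק" (written with \u escapes
# to avoid bidi rendering issues), or a markdown "## " heading.
HEADING_RE = re.compile(
    r"^(?:\d+\.\s|[\u05d0-\u05ea][.)]\s|\u05e4\u05e8\u05e7\s|##\s)",
    re.MULTILINE,
)


def _pack(pieces: list[str], limit: int) -> list[str]:
    """Greedily merge consecutive pieces while the result stays within limit."""
    chunks: list[str] = []
    current = ""
    for piece in pieces:
        if current and len(current) + len(piece) > limit:
            chunks.append(current)
            current = ""
        current += piece
    if current:
        chunks.append(current)
    return chunks


def split_text(text: str, limit: int = TARGET_CHARS) -> list[str]:
    """Split at headings, then paragraph breaks, then fixed-size slices."""
    if len(text) <= limit:
        return [text] if text else []
    # 1. Cut in front of every heading line.
    starts = sorted({0, *(m.start() for m in HEADING_RE.finditer(text))})
    sections = [text[a:b] for a, b in zip(starts, starts[1:] + [len(text)])]
    pieces: list[str] = []
    for section in sections:
        if len(section) <= limit:
            pieces.append(section)
            continue
        # 2. Oversized section: split after blank lines, keeping the separator.
        for para in re.split(r"(?<=\n\n)", section):
            if len(para) <= limit:
                pieces.append(para)
            else:
                # 3. Last resort: hard slices of at most `limit` characters.
                pieces.extend(para[i:i + limit] for i in range(0, len(para), limit))
    return _pack(pieces, limit)
```

In this sketch every chunk is at most `limit` characters and concatenating the chunks reproduces the input exactly, so no text is dropped at chunk boundaries.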

Updated all 9 callers (claims, appraiser facts, block writer, QA
validator, brainstorm, learning loop, style analyzer × 3) to await
the now-async API.
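
For anyone awaiting the new API, the semaphore-bounded fan-out with per-chunk retry described for claims_extractor.py behaves roughly like the following sketch. `extract_all` and the `extract` callback are hypothetical names, and reading "1 attempt by default" as "one retry after the first failure" is an assumption; only `CHUNK_CONCURRENCY = 3` is taken from this commit.

```python
from __future__ import annotations

import asyncio
import logging
from typing import Awaitable, Callable

logger = logging.getLogger(__name__)

CHUNK_CONCURRENCY = 3  # value from the commit message
RETRIES = 1            # assumption: one retry after the first failure


async def extract_all(
    chunks: list[str],
    extract: Callable[[str], Awaitable[list[str]]],
) -> list[str]:
    """Run extract() over all chunks, at most CHUNK_CONCURRENCY at a time."""
    sem = asyncio.Semaphore(CHUNK_CONCURRENCY)

    async def one(index: int, chunk: str) -> list[str] | None:
        async with sem:
            for attempt in range(1 + RETRIES):
                try:
                    return await extract(chunk)
                except Exception:
                    logger.warning(
                        "chunk %d failed (attempt %d)", index, attempt + 1, exc_info=True
                    )
        return None  # every attempt failed; this chunk is skipped

    results = await asyncio.gather(*(one(i, c) for i, c in enumerate(chunks)))
    failed = [i for i, r in enumerate(results) if r is None]
    if failed:
        logger.warning("partial result: skipped chunks %s", failed)
    # gather() preserves input order, so claims stay in document order.
    return [claim for r in results if r is not None for claim in r]
```

Catching the failure inside each task, rather than letting `asyncio.gather` propagate it, is what allows one bad chunk to be skipped without discarding the results of the others.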

The one-shot script scripts/extract_claims_8174.py, which was used to
recover the 43 appellant claims on case 8174-24, has been moved to
.archive/ because phase 1 makes it obsolete. SCRIPTS.md is updated.

Phase 2 (a background-task wrapper around LLM-bound MCP tools, a
persistent llm_tasks table, SSE progress) is the structural follow-up
and will come in a separate PR.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
@@ -1,27 +1,41 @@
 """Claude Code session bridge — runs prompts via `claude -p` instead of API.
 
-All LLM calls in the project should use this module instead of calling
-the Anthropic API directly. This uses the local Claude Code CLI which
-runs on the user's claude.ai session — zero API cost.
+All LLM calls in the project go through this module. We shell out to the
+local Claude Code CLI which uses the developer's claude.ai session — zero
+direct Anthropic API cost.
+
+History: this module was originally synchronous (``subprocess.run``) with
+a 120-second timeout. That broke for large legal documents:
+
+1. Sync subprocess stalled the asyncio event loop in the MCP server
+   while a single LLM call was in flight.
+2. 120 seconds was far too short. A 25K-character Hebrew appeal on cold
+   prompt cache routinely takes 130-180 seconds; we proved this in case
+   8174-24 (three timeouts in a row).
+
+The fix: switch to async subprocess (non-blocking) and raise the default
+ceiling to 30 minutes — long enough that no realistic document hits it,
+but bounded so a runaway never zombifies forever.
 """
 
 from __future__ import annotations
 
+import asyncio
 import json
 import logging
-import subprocess
 from pathlib import Path
 
 from legal_mcp.config import parse_llm_json
 
 logger = logging.getLogger(__name__)
 
-# Default timeout for claude -p calls (seconds)
-DEFAULT_TIMEOUT = 120
-LONG_TIMEOUT = 300  # For complex tasks like block writing
+# Default ceiling for any single ``claude -p`` invocation, in seconds.
+# 30 min covers any single-document call we make in practice (chunking
+# handles the rest); the bound exists only to prevent runaway zombies.
+DEFAULT_TIMEOUT = 1800
+LONG_TIMEOUT = 3600  # opus block writing on full case context
 
 
-def query(prompt: str, timeout: int = DEFAULT_TIMEOUT, max_turns: int = 1) -> str:
+async def query(prompt: str, timeout: int = DEFAULT_TIMEOUT, max_turns: int = 1) -> str:
     """Send a prompt to Claude Code headless and return the text response.
 
     Passes the prompt via stdin (not argv) to avoid the OS ARG_MAX limit —
@@ -29,14 +43,14 @@ def query(prompt: str, timeout: int = DEFAULT_TIMEOUT, max_turns: int = 1) -> st
 
     Args:
         prompt: The prompt to send.
-        timeout: Max seconds to wait.
+        timeout: Max seconds before the subprocess is killed.
         max_turns: Max conversation turns (1 = single response).
 
     Returns:
         The text response from Claude.
 
     Raises:
-        RuntimeError: If claude CLI is not available or fails.
+        RuntimeError: If claude CLI is not available, fails, or times out.
     """
     cmd = [
         "claude", "-p",
@@ -45,23 +59,34 @@ def query(prompt: str, timeout: int = DEFAULT_TIMEOUT, max_turns: int = 1) -> st
     ]
 
     try:
-        result = subprocess.run(
-            cmd,
-            input=prompt,
-            capture_output=True,
-            text=True,
-            timeout=timeout,
+        proc = await asyncio.create_subprocess_exec(
+            *cmd,
+            stdin=asyncio.subprocess.PIPE,
+            stdout=asyncio.subprocess.PIPE,
+            stderr=asyncio.subprocess.PIPE,
         )
     except FileNotFoundError:
         raise RuntimeError("Claude CLI not found. Install Claude Code or add 'claude' to PATH.")
-    except subprocess.TimeoutExpired:
+
+    try:
+        stdout_b, stderr_b = await asyncio.wait_for(
+            proc.communicate(input=prompt.encode("utf-8")),
+            timeout=timeout,
+        )
+    except asyncio.TimeoutError:
+        # wait_for cancellation alone leaves the child running.
+        try:
+            proc.kill()
+            await proc.wait()
+        except ProcessLookupError:
+            pass
         raise RuntimeError(f"Claude CLI timed out after {timeout}s")
 
-    if result.returncode != 0:
-        stderr = result.stderr.strip()[:500] if result.stderr else "unknown error"
-        raise RuntimeError(f"Claude CLI failed (exit {result.returncode}): {stderr}")
+    if proc.returncode != 0:
+        stderr = stderr_b.decode("utf-8", errors="replace").strip()[:500] or "unknown error"
+        raise RuntimeError(f"Claude CLI failed (exit {proc.returncode}): {stderr}")
 
-    stdout = result.stdout.strip()
+    stdout = stdout_b.decode("utf-8", errors="replace").strip()
     if not stdout:
         raise RuntimeError("Claude CLI returned empty response")
 
@@ -75,10 +100,10 @@ def query(prompt: str, timeout: int = DEFAULT_TIMEOUT, max_turns: int = 1) -> st
     return stdout
 
 
-def query_json(prompt: str, timeout: int = DEFAULT_TIMEOUT) -> dict | list | None:
+async def query_json(prompt: str, timeout: int = DEFAULT_TIMEOUT) -> dict | list | None:
     """Send a prompt and parse the response as JSON.
 
     Uses parse_llm_json for robust parsing (handles markdown wrapping, truncation).
     """
-    raw = query(prompt, timeout=timeout)
+    raw = await query(prompt, timeout=timeout)
     return parse_llm_json(raw)