LLM session: async, 30min timeout, semantic chunking + parallel
All checks were successful
Build & Deploy / build-and-deploy (push) Successful in 1m28s

The claude_session bridge had two structural defects that made any
non-trivial document extraction unreliable:

  1. subprocess.run() blocks the asyncio event loop in the MCP server
     for the full duration of every LLM call (60-180s typical).
  2. The 120-second timeout was below the cold-cache cost of any
     document over ~12K Hebrew characters. Three back-to-back timeouts
     on case 8174-24 dropped 43 appellant claims on the floor.

Phase 1 of the remediation plan — keeps claude_session as the engine
(no Anthropic API switch) and restructures around it:

claude_session.py
  • query / query_json are now async — asyncio.create_subprocess_exec
    instead of subprocess.run, so MCP server can serve other coroutines
    while a call is in flight.
  • DEFAULT_TIMEOUT 120 → 1800 (30 min). High enough that no realistic
    document hits it; bounded so a runaway never zombifies forever.
  • LONG_TIMEOUT 300 → 3600 for opus block writing on full case context.
  • TimeoutError now actually kills the subprocess (asyncio.wait_for
    cancellation alone leaves the child running).

claims_extractor.py
  • _split_by_sections: chunks at numbered sections / Hebrew letter
    headings / "פרק" markers / markdown ##, falls back to paragraph
    breaks, then to hard splits. Targets 12K chars per chunk — small
    enough that each chunk reliably finishes inside the timeout.
  • _extract_chunk: per-chunk retry (1 attempt by default) with
    structured logging on failure. Failed chunks no longer crash the
    overall extraction; they're skipped with a partial-result warning.
  • extract_claims_with_ai now runs chunks in parallel via
    asyncio.gather bounded by a semaphore (CHUNK_CONCURRENCY=3).
    For a 25K-char appeal: was sequential 150-300s, now ~70-90s.

Updated all 9 callers (claims, appraiser facts, block writer, qa
validator, brainstorm, learning loop, style analyzer × 3) to await
the now-async API.

The one-shot scripts/extract_claims_8174.py used to recover 43
appellant claims on case 8174-24 has been moved to .archive/ — phase 1
makes it obsolete. SCRIPTS.md updated.

Phase 2 (background-task wrapper around LLM-bound MCP tools, persistent
llm_tasks table, SSE progress) is the structural follow-up — separate PR.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
This commit is contained in:
2026-04-30 14:21:35 +00:00
parent 9bdfb05350
commit 28f49defff
10 changed files with 329 additions and 82 deletions

View File

@@ -1,27 +1,41 @@
"""Claude Code session bridge — runs prompts via `claude -p` instead of API.
All LLM calls in the project should use this module instead of calling
the Anthropic API directly. This uses the local Claude Code CLI which
runs on the user's claude.ai session — zero API cost.
All LLM calls in the project go through this module. We shell out to the
local Claude Code CLI which uses the developer's claude.ai session — zero
direct Anthropic API cost.
History: this module was originally synchronous (``subprocess.run``) with
a 120-second timeout. That broke for large legal documents:
1. Sync subprocess stalled the asyncio event loop in the MCP server
while a single LLM call was in flight.
2. 120 seconds was far too short. A 25K-character Hebrew appeal on cold
prompt cache routinely takes 130-180 seconds; we proved this in case
8174-24 (three timeouts in a row).
The fix: switch to async subprocess (non-blocking) and raise the default
ceiling to 30 minutes — long enough that no realistic document hits it,
but bounded so a runaway never zombifies forever.
"""
from __future__ import annotations
import asyncio
import json
import logging
import subprocess
from pathlib import Path
from legal_mcp.config import parse_llm_json
logger = logging.getLogger(__name__)
# Default timeout for claude -p calls (seconds)
DEFAULT_TIMEOUT = 120
LONG_TIMEOUT = 300 # For complex tasks like block writing
# Default ceiling for any single ``claude -p`` invocation, in seconds.
# 30 min covers any single-document call we make in practice (chunking
# handles the rest); the bound exists only to prevent runaway zombies.
DEFAULT_TIMEOUT = 1800
LONG_TIMEOUT = 3600 # opus block writing on full case context
def query(prompt: str, timeout: int = DEFAULT_TIMEOUT, max_turns: int = 1) -> str:
async def query(prompt: str, timeout: int = DEFAULT_TIMEOUT, max_turns: int = 1) -> str:
"""Send a prompt to Claude Code headless and return the text response.
Passes the prompt via stdin (not argv) to avoid the OS ARG_MAX limit —
@@ -29,14 +43,14 @@ def query(prompt: str, timeout: int = DEFAULT_TIMEOUT, max_turns: int = 1) -> st
Args:
prompt: The prompt to send.
timeout: Max seconds to wait.
timeout: Max seconds before the subprocess is killed.
max_turns: Max conversation turns (1 = single response).
Returns:
The text response from Claude.
Raises:
RuntimeError: If claude CLI is not available or fails.
RuntimeError: If claude CLI is not available, fails, or times out.
"""
cmd = [
"claude", "-p",
@@ -45,23 +59,34 @@ def query(prompt: str, timeout: int = DEFAULT_TIMEOUT, max_turns: int = 1) -> st
]
try:
result = subprocess.run(
cmd,
input=prompt,
capture_output=True,
text=True,
timeout=timeout,
proc = await asyncio.create_subprocess_exec(
*cmd,
stdin=asyncio.subprocess.PIPE,
stdout=asyncio.subprocess.PIPE,
stderr=asyncio.subprocess.PIPE,
)
except FileNotFoundError:
raise RuntimeError("Claude CLI not found. Install Claude Code or add 'claude' to PATH.")
except subprocess.TimeoutExpired:
try:
stdout_b, stderr_b = await asyncio.wait_for(
proc.communicate(input=prompt.encode("utf-8")),
timeout=timeout,
)
except asyncio.TimeoutError:
# wait_for cancellation alone leaves the child running.
try:
proc.kill()
await proc.wait()
except ProcessLookupError:
pass
raise RuntimeError(f"Claude CLI timed out after {timeout}s")
if result.returncode != 0:
stderr = result.stderr.strip()[:500] if result.stderr else "unknown error"
raise RuntimeError(f"Claude CLI failed (exit {result.returncode}): {stderr}")
if proc.returncode != 0:
stderr = stderr_b.decode("utf-8", errors="replace").strip()[:500] or "unknown error"
raise RuntimeError(f"Claude CLI failed (exit {proc.returncode}): {stderr}")
stdout = result.stdout.strip()
stdout = stdout_b.decode("utf-8", errors="replace").strip()
if not stdout:
raise RuntimeError("Claude CLI returned empty response")
@@ -75,10 +100,10 @@ def query(prompt: str, timeout: int = DEFAULT_TIMEOUT, max_turns: int = 1) -> st
return stdout
def query_json(prompt: str, timeout: int = DEFAULT_TIMEOUT) -> dict | list | None:
async def query_json(prompt: str, timeout: int = DEFAULT_TIMEOUT) -> dict | list | None:
"""Send a prompt and parse the response as JSON.
Uses parse_llm_json for robust parsing (handles markdown wrapping, truncation).
"""
raw = query(prompt, timeout=timeout)
raw = await query(prompt, timeout=timeout)
return parse_llm_json(raw)