LLM session: async, 30min timeout, semantic chunking + parallel

The claude_session bridge had two structural defects that made any
non-trivial document extraction unreliable:

  1. subprocess.run() blocks the asyncio event loop in the MCP server
     for the full duration of every LLM call (60-180s typical).
  2. The 120-second timeout was below the cold-cache cost of any
     document over ~12K Hebrew characters. Three back-to-back timeouts
     on case 8174-24 dropped 43 appellant claims on the floor.
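
Defect 1 can be reproduced in isolation. The sketch below is illustrative
only and is not taken from the MCP server: `count_ticks`, `measure` and the
sleeping child command are hypothetical stand-ins for the server's other
coroutines and for a `claude -p` call. It counts how often a background
coroutine gets to run while a child process is in flight.

```python
import asyncio
import subprocess
import sys

# Hypothetical stand-in for a slow LLM call: a child that sleeps 0.5s.
SLEEP_CMD = [sys.executable, "-c", "import time; time.sleep(0.5)"]


async def count_ticks(stop: asyncio.Event) -> int:
    """Count how many times this coroutine is scheduled until `stop` is set."""
    ticks = 0
    while not stop.is_set():
        await asyncio.sleep(0.05)
        ticks += 1
    return ticks


async def measure(blocking: bool) -> int:
    """Run the child either synchronously or asynchronously and return
    the number of ticks the background coroutine managed meanwhile."""
    stop = asyncio.Event()
    ticker = asyncio.create_task(count_ticks(stop))
    await asyncio.sleep(0)  # let the ticker start before the child runs
    if blocking:
        # Blocks the whole event loop: the ticker cannot run at all.
        subprocess.run(SLEEP_CMD, check=True)
    else:
        # Yields to the event loop while the child is running.
        proc = await asyncio.create_subprocess_exec(*SLEEP_CMD)
        await proc.wait()
    stop.set()
    return await ticker


if __name__ == "__main__":
    print("ticks with subprocess.run:", asyncio.run(measure(blocking=True)))
    print("ticks with create_subprocess_exec:", asyncio.run(measure(blocking=False)))
```

With the blocking call the ticker only advances once the child has exited;
with the async call it keeps ticking throughout, which is the property the
MCP server needs.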

This is Phase 1 of the remediation plan. It keeps claude_session as the
engine (no switch to the Anthropic API) and restructures around it:

claude_session.py
  • query / query_json are now async — asyncio.create_subprocess_exec
    instead of subprocess.run, so MCP server can serve other coroutines
    while a call is in flight.
  • DEFAULT_TIMEOUT 120 → 1800 (30 min). High enough that no realistic
    document hits it; bounded so a runaway never zombifies forever.
  • LONG_TIMEOUT 300 → 3600 for opus block writing on full case context.
  • A timeout now actually kills the subprocess (cancellation by
    asyncio.wait_for alone leaves the child running).
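
The async call plus kill-on-timeout pattern described above can be sketched
as follows. This is a minimal sketch under assumptions, not the code in
claude_session.py: `run_with_timeout` and its parameters are hypothetical
names, and the real module's argv, prompt handling and error types may
differ.

```python
import asyncio
import sys


async def run_with_timeout(argv: list[str], stdin_text: str, timeout: float) -> str:
    """Run `argv`, feed it `stdin_text`, and return its stdout.

    Hypothetical helper: kills the child if it exceeds `timeout` seconds.
    """
    proc = await asyncio.create_subprocess_exec(
        *argv,
        stdin=asyncio.subprocess.PIPE,
        stdout=asyncio.subprocess.PIPE,
        stderr=asyncio.subprocess.PIPE,
    )
    try:
        stdout, _stderr = await asyncio.wait_for(
            proc.communicate(stdin_text.encode("utf-8")),
            timeout=timeout,
        )
    except asyncio.TimeoutError:
        # wait_for only cancels communicate(); the child keeps running
        # unless it is killed explicitly.
        if proc.returncode is None:
            proc.kill()
        await proc.wait()  # reap the child so it does not linger as a zombie
        raise
    if proc.returncode != 0:
        raise RuntimeError(f"child exited with status {proc.returncode}")
    return stdout.decode("utf-8")


if __name__ == "__main__":
    echo = [sys.executable, "-c", "import sys; print(sys.stdin.read().upper())"]
    print(asyncio.run(run_with_timeout(echo, "ok", timeout=10)))
```

Re-raising after the kill keeps the timeout visible to the caller, so a
per-chunk retry can still react to it.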

claims_extractor.py
  • _split_by_sections: chunks at numbered sections / Hebrew letter
    headings / "פרק" markers / markdown ##, falls back to paragraph
    breaks, then to hard splits. Targets 12K chars per chunk — small
    enough that each chunk reliably finishes inside the timeout.
  • _extract_chunk: per-chunk retry (1 retry by default, so up to 2
    attempts) with structured logging on failure. Failed chunks no
    longer crash the overall extraction; they're skipped with a
    partial-result warning.
  • extract_claims_with_ai now runs chunks in parallel via
    asyncio.gather bounded by a semaphore (CHUNK_CONCURRENCY=3).
    For a 25K-char appeal: was sequential 150-300s, now ~70-90s.
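
The ~70-90s figure follows from simple arithmetic, sketched below as a rough
estimate. `estimate_wall_seconds` is a hypothetical helper, the 60-90s
per-chunk range is the cold-cache figure quoted for 12K chunks, and the
"waves" model assumes all chunks take about equally long. The real chunk
count depends on where section boundaries fall, so treat the result as an
approximation.

```python
import math


def estimate_wall_seconds(
    doc_chars: int,
    chunk_chars: int = 12_000,
    concurrency: int = 3,
    per_chunk_s: tuple[int, int] = (60, 90),
) -> tuple[int, int]:
    """Rough (low, high) wall-clock estimate for a parallel extraction.

    Assumes equal-sized chunks processed in waves of `concurrency`.
    """
    n_chunks = max(1, math.ceil(doc_chars / chunk_chars))
    waves = math.ceil(n_chunks / concurrency)
    return waves * per_chunk_s[0], waves * per_chunk_s[1]


if __name__ == "__main__":
    # 25K chars -> 3 chunks -> 1 wave at concurrency 3
    print(estimate_wall_seconds(25_000))
    # 100K chars -> 9 chunks -> 3 waves
    print(estimate_wall_seconds(100_000))
```

Under these assumptions a 25K-char appeal fits in a single wave, so its wall
time is roughly that of the slowest chunk.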

Updated all 9 callers (claims, appraiser facts, block writer, qa
validator, brainstorm, learning loop, style analyzer × 3) to await
the now-async API.
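
The caller-side change is mechanical, and a missed `await` is the main
pitfall. The sketch below uses a stub in place of claude_session.query_json;
the stub's return shape and the names `extract` and `forgot_await` are
assumptions made for illustration only.

```python
import asyncio


async def query_json(prompt: str) -> list[dict]:
    """Stub standing in for the now-async claude_session.query_json."""
    await asyncio.sleep(0)
    return [{"claim_text": prompt}]


async def extract(prompt: str) -> list[dict]:
    # Before: claims = claude_session.query_json(prompt)
    # After:  the call must be awaited from an async function.
    return await query_json(prompt)


async def forgot_await(prompt: str) -> bool:
    # Without `await` the caller gets a coroutine object, not a list.
    result = query_json(prompt)
    is_coro = asyncio.iscoroutine(result)
    result.close()  # avoid the "coroutine was never awaited" warning
    return is_coro


if __name__ == "__main__":
    print(asyncio.run(extract("example")))
    print(asyncio.run(forgot_await("example")))
```

A caller that forgets the `await` fails any `isinstance(claims, list)` check,
which is one reason every call site needs updating in the same change.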

The one-shot script scripts/extract_claims_8174.py, which was used to
recover the 43 appellant claims on case 8174-24, has been moved to
.archive/ because Phase 1 makes it obsolete. SCRIPTS.md is updated
accordingly.

Phase 2 (background-task wrapper around LLM-bound MCP tools, persistent
llm_tasks table, SSE progress) is the structural follow-up — separate PR.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-04-30 14:21:35 +00:00
parent 9bdfb05350
commit 28f49defff
10 changed files with 329 additions and 82 deletions


@@ -7,6 +7,7 @@
from __future__ import annotations
import asyncio
import logging
import re
from uuid import UUID
@@ -17,6 +18,21 @@ from legal_mcp.services import db, claude_session
logger = logging.getLogger(__name__)
# Each chunk targets ~12K chars (≈3K tokens of Hebrew). Smaller than the
# previous 25K because:
# • A single ``claude -p`` call on a 25K-char Hebrew prompt with cold
# cache routinely hit ~150-180s. 12K chunks finish in ~60-90s.
# • Per-chunk retry costs less when chunks are smaller.
# • Parallel chunks benefit more — see CHUNK_CONCURRENCY.
CHUNK_TARGET_CHARS = 12000
# How many chunks to send to Claude in parallel. Each subprocess holds
# ~300 MB RSS plus its own MCP stack; concurrency=3 keeps the box usable.
CHUNK_CONCURRENCY = 3
# How many retry attempts per failed chunk before giving up on it.
CHUNK_RETRY_ATTEMPTS = 1
EXTRACT_CLAIMS_PROMPT = """אתה מנתח מסמכים משפטיים בתחום תכנון ובניה. תפקידך לחלץ טענות מכתב טענות.
@@ -43,6 +59,103 @@ EXTRACT_CLAIMS_PROMPT = """אתה מנתח מסמכים משפטיים בתחו
"""
# Section markers we treat as natural chunk boundaries when present.
# Hebrew legal briefs almost always use numbered sections like "10." or
# letter-section headings (".א", ".ב"). Splitting between sections keeps
# every chunk a self-contained argumentative unit.
_SECTION_BOUNDARY_RE = re.compile(
    r"\n\s*("
    r"\d+\.\s+\S"  # numbered section: "10. טענות"
    r"|[א-ת]\.\s+\S"  # Hebrew letter section: "א. רקע"
    r"|##\s+\S"  # markdown heading
    r"|פרק\s+\S"  # "פרק" headings
    r")"
)
def _split_by_sections(text: str, target: int = CHUNK_TARGET_CHARS) -> list[str]:
    """Split a long document into roughly ``target``-sized chunks at section
    boundaries. Falls back to paragraph breaks, then to hard splits if a
    section happens to be larger than ``target`` on its own.
    """
    if len(text) <= target:
        return [text]
    boundaries = [m.start() for m in _SECTION_BOUNDARY_RE.finditer(text)]
    boundaries = [0, *boundaries, len(text)]
    chunks: list[str] = []
    start = 0
    for cut in boundaries[1:]:
        # Greedy: keep adding sections to the current chunk until it
        # reaches ``target``, then cut at that boundary. A chunk can
        # therefore overshoot ``target``; the 1.5× check below bounds it.
        if cut - start < target:
            continue
        end = cut
        if end - start > target * 1.5:
            # Section group exceeds 1.5× target, so fall back to a
            # paragraph break inside it to avoid one chunk being far
            # too big.
            soft = text.rfind("\n\n", start, start + target)
            if soft > start + target // 2:
                end = soft
        chunks.append(text[start:end].strip())
        start = end
    if start < len(text):
        chunks.append(text[start:].strip())
    # Hard splits for any chunk that is still too large (rare, but
    # documents without any section markers can fall through).
    final: list[str] = []
    for c in chunks:
        if len(c) <= target * 1.5:
            final.append(c)
            continue
        for i in range(0, len(c), target):
            final.append(c[i:i + target])
    return [c for c in final if c.strip()]
async def _extract_chunk(
    chunk: str,
    chunk_index: int,
    chunk_total: int,
    context: str,
) -> tuple[int, list[dict] | None]:
    """Run extraction on one chunk with retry. Returns ``(chunk_index, claims_or_None)``.
    None means the chunk failed both the initial call and every retry
    (caller can use this to mark the result as partial).
    """
    chunk_label = f" (חלק {chunk_index + 1}/{chunk_total})" if chunk_total > 1 else ""
    prompt = (
        f"{EXTRACT_CLAIMS_PROMPT}\n\n"
        f"{context}{chunk_label}\n\n"
        f"--- תחילת מסמך ---\n{chunk}\n--- סוף מסמך ---"
    )
    last_err: Exception | None = None
    for attempt in range(CHUNK_RETRY_ATTEMPTS + 1):
        try:
            claims = await claude_session.query_json(prompt)
        except Exception as e:
            last_err = e
            logger.warning(
                "extract_claims chunk %d/%d attempt %d raised: %s",
                chunk_index + 1, chunk_total, attempt + 1, e,
            )
            continue
        if isinstance(claims, list):
            return chunk_index, claims
        logger.warning(
            "extract_claims chunk %d/%d attempt %d returned non-list (%s)",
            chunk_index + 1, chunk_total, attempt + 1, type(claims).__name__,
        )
    logger.error(
        "extract_claims chunk %d/%d failed after %d attempts: %s",
        chunk_index + 1, chunk_total, CHUNK_RETRY_ATTEMPTS + 1, last_err,
    )
    return chunk_index, None
async def extract_claims_with_ai(
    text: str,
    doc_type: str = "appeal",
@@ -50,68 +163,62 @@ async def extract_claims_with_ai(
 ) -> list[dict]:
     """חילוץ טענות מכתב טענות באמצעות Claude.
+    Splits ``text`` at section boundaries, runs every chunk through
+    Claude in parallel (bounded by ``CHUNK_CONCURRENCY``), retries each
+    failed chunk once, and merges the results in original document order.
+    Failed chunks are logged but don't block the overall extraction —
+    we return what we got and surface the gap via the logs.
     Args:
         text: טקסט המסמך
         doc_type: סוג המסמך (appeal/response)
         party_hint: רמז לזהות הצד (אם ידוע)
     Returns:
-        רשימת טענות עם party_role, claim_text, topic
+        רשימת טענות עם party_role, claim_text, topic, claim_index.
     """
     context = f"סוג המסמך: {doc_type}"
     if party_hint:
         context += f"\nהצד המגיש: {party_hint}"
-    # For very long documents, split into chunks and merge results
-    max_chars_per_call = 25000
-    chunks = []
-    if len(text) > max_chars_per_call:
-        # Split at paragraph boundaries
-        pos = 0
-        while pos < len(text):
-            end = min(pos + max_chars_per_call, len(text))
-            if end < len(text):
-                # Find paragraph break near the limit
-                break_pos = text.rfind("\n\n", pos, end)
-                if break_pos > pos + max_chars_per_call // 2:
-                    end = break_pos
-            chunks.append(text[pos:end])
-            pos = end
-        logger.info("Document split into %d chunks (%d chars total)", len(chunks), len(text))
-    else:
-        chunks = [text]
-    all_claims = []
-    for i, chunk in enumerate(chunks):
-        chunk_label = f" (חלק {i+1}/{len(chunks)})" if len(chunks) > 1 else ""
-        prompt = (
-            f"{EXTRACT_CLAIMS_PROMPT}\n\n"
-            f"{context}{chunk_label}\n\n"
-            f"--- תחילת מסמך ---\n{chunk}\n--- סוף מסמך ---"
+    chunks = _split_by_sections(text)
+    if len(chunks) > 1:
+        logger.info(
+            "extract_claims: split %d chars into %d chunks (target=%d, concurrency=%d)",
+            len(text), len(chunks), CHUNK_TARGET_CHARS, CHUNK_CONCURRENCY,
         )
-        claims = claude_session.query_json(prompt, timeout=120)
-        if claims is None:
-            logger.warning("Failed to parse claims for chunk %d: %s", i, raw[:200])
+    sem = asyncio.Semaphore(CHUNK_CONCURRENCY)
+    async def _bounded(idx: int, c: str) -> tuple[int, list[dict] | None]:
+        async with sem:
+            return await _extract_chunk(c, idx, len(chunks), context)
+    results = await asyncio.gather(*[_bounded(i, c) for i, c in enumerate(chunks)])
+    # Merge in original order. Skip chunks that failed entirely.
+    failed = [i for i, r in results if r is None]
+    if failed:
+        logger.warning(
+            "extract_claims: %d/%d chunks failed (indices=%s) — returning partial result",
+            len(failed), len(chunks), failed,
+        )
+    merged: list[dict] = []
+    for idx, claims in sorted(results, key=lambda x: x[0]):
+        if not claims:
             continue
-        if isinstance(claims, list):
-            all_claims.extend(claims)
+        merged.extend(claims)
-    claims = all_claims
-    if not claims:
-        return []
-    if not isinstance(claims, list):
-        return []
-    # Add claim_index
-    for i, claim in enumerate(claims):
-        claim["claim_index"] = i
-    # Validate required fields
+    # Add claim_index and drop entries missing required fields.
+    cleaned: list[dict] = []
+    for i, claim in enumerate(merged):
+        if not isinstance(claim, dict):
+            continue
+        if "party_role" not in claim or "claim_text" not in claim:
+            continue
-    return [c for c in claims if "party_role" in c and "claim_text" in c]
+        claim["claim_index"] = i
+        cleaned.append(claim)
+    return cleaned
def _infer_claim_type(doc_type: str, source_name: str) -> str: