LLM session: async, 30min timeout, semantic chunking + parallel
All checks were successful
Build & Deploy / build-and-deploy (push) Successful in 1m28s
All checks were successful
Build & Deploy / build-and-deploy (push) Successful in 1m28s
The claude_session bridge had two structural defects that made any
non-trivial document extraction unreliable:
1. subprocess.run() blocks the asyncio event loop in the MCP server
for the full duration of every LLM call (60-180s typical).
2. The 120-second timeout was below the cold-cache cost of any
document over ~12K Hebrew characters. Three back-to-back timeouts
on case 8174-24 dropped 43 appellant claims on the floor.
Phase 1 of the remediation plan — keeps claude_session as the engine
(no Anthropic API switch) and restructures around it:
claude_session.py
• query / query_json are now async — asyncio.create_subprocess_exec
instead of subprocess.run, so MCP server can serve other coroutines
while a call is in flight.
• DEFAULT_TIMEOUT 120 → 1800 (30 min). High enough that no realistic
document hits it; bounded so a runaway never zombifies forever.
• LONG_TIMEOUT 300 → 3600 for opus block writing on full case context.
• TimeoutError now actually kills the subprocess (asyncio.wait_for
cancellation alone leaves the child running).
claims_extractor.py
• _split_by_sections: chunks at numbered sections / Hebrew letter
headings / "פרק" markers / markdown ##, falls back to paragraph
breaks, then to hard splits. Targets 12K chars per chunk — small
enough that each chunk reliably finishes inside the timeout.
• _extract_chunk: per-chunk retry (1 attempt by default) with
structured logging on failure. Failed chunks no longer crash the
overall extraction; they're skipped with a partial-result warning.
• extract_claims_with_ai now runs chunks in parallel via
asyncio.gather bounded by a semaphore (CHUNK_CONCURRENCY=3).
For a 25K-char appeal: was sequential 150-300s, now ~70-90s.
Updated all 9 callers (claims, appraiser facts, block writer, qa
validator, brainstorm, learning loop, style analyzer × 3) to await
the now-async API.
The one-shot scripts/extract_claims_8174.py used to recover 43
appellant claims on case 8174-24 has been moved to .archive/ — phase 1
makes it obsolete. SCRIPTS.md updated.
Phase 2 (background-task wrapper around LLM-bound MCP tools, persistent
llm_tasks table, SSE progress) is the structural follow-up — separate PR.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
This commit is contained in:
@@ -159,7 +159,7 @@ async def _analyze_single_pass(rows, appeal_subtype: str = "") -> dict:
|
||||
decisions_text += f"\n\n--- החלטה {row['decision_number'] or 'ללא מספר'} ---\n"
|
||||
decisions_text += row["full_text"]
|
||||
|
||||
raw = claude_session.query(
|
||||
raw = await claude_session.query(
|
||||
ANALYSIS_PROMPT.format(decisions=decisions_text),
|
||||
timeout=claude_session.LONG_TIMEOUT,
|
||||
)
|
||||
@@ -176,7 +176,7 @@ async def _analyze_multi_pass(rows, appeal_subtype: str = "") -> dict:
|
||||
decision_text = f"--- החלטה {row['decision_number'] or 'ללא מספר'} ---\n"
|
||||
decision_text += row["full_text"]
|
||||
|
||||
raw = claude_session.query(
|
||||
raw = await claude_session.query(
|
||||
SINGLE_DECISION_PROMPT.format(decision=decision_text),
|
||||
timeout=claude_session.LONG_TIMEOUT,
|
||||
)
|
||||
@@ -189,7 +189,7 @@ async def _analyze_multi_pass(rows, appeal_subtype: str = "") -> dict:
|
||||
return {"error": "לא הצלחתי לחלץ דפוסים מההחלטות"}
|
||||
|
||||
# Pass 2: Synthesize across all decisions
|
||||
raw = claude_session.query(
|
||||
raw = await claude_session.query(
|
||||
SYNTHESIS_PROMPT.format(
|
||||
num_decisions=len(rows),
|
||||
patterns=json.dumps(all_patterns, ensure_ascii=False, indent=2),
|
||||
|
||||
Reference in New Issue
Block a user