legal-ai/mcp-server/src/legal_mcp/services/learning_loop.py
Chaim 28f49defff
LLM session: async, 30min timeout, semantic chunking + parallel
The claude_session bridge had two structural defects that made any
non-trivial document extraction unreliable:

  1. subprocess.run() blocks the asyncio event loop in the MCP server
     for the full duration of every LLM call (60-180s typical).
  2. The 120-second timeout was below the cold-cache cost of any
     document over ~12K Hebrew characters. Three back-to-back timeouts
     on case 8174-24 dropped 43 appellant claims on the floor.

Phase 1 of the remediation plan keeps claude_session as the engine
(no switch to the Anthropic API) and restructures the code around it:

claude_session.py
  • query / query_json are now async: they use
    asyncio.create_subprocess_exec instead of subprocess.run, so the
    MCP server can serve other coroutines while a call is in flight.
  • DEFAULT_TIMEOUT 120 → 1800 (30 min). High enough that no realistic
    document hits it; bounded so a runaway never zombifies forever.
  • LONG_TIMEOUT 300 → 3600 for opus block writing on full case context.
  • TimeoutError now actually kills the subprocess (asyncio.wait_for
    cancellation alone leaves the child running).
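
The claude_session change above can be sketched as follows. This is a
minimal illustration, not the code in claude_session.py: the helper name
run_with_timeout and its parameters are hypothetical, and the child command
is whatever the caller passes in. It shows asyncio.create_subprocess_exec
awaited under asyncio.wait_for, with an explicit kill on timeout.

```python
from __future__ import annotations

import asyncio


async def run_with_timeout(argv: list[str], stdin_text: str, timeout: float) -> str:
    """Run a child process without blocking the event loop (hypothetical helper)."""
    proc = await asyncio.create_subprocess_exec(
        *argv,
        stdin=asyncio.subprocess.PIPE,
        stdout=asyncio.subprocess.PIPE,
        stderr=asyncio.subprocess.PIPE,
    )
    try:
        stdout, _stderr = await asyncio.wait_for(
            proc.communicate(stdin_text.encode("utf-8")), timeout=timeout
        )
    except asyncio.TimeoutError:
        # wait_for only cancels the communicate() coroutine; the child keeps
        # running unless it is killed and reaped explicitly.
        proc.kill()
        await proc.wait()
        raise
    if proc.returncode != 0:
        raise RuntimeError(f"child exited with status {proc.returncode}")
    return stdout.decode("utf-8")
```

While the child runs, the event loop stays free for other coroutines. A
caller would await it with the new default, for example
`await run_with_timeout(argv, prompt, 1800)`.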

claims_extractor.py
  • _split_by_sections: chunks at numbered sections / Hebrew letter
    headings / "פרק" markers / markdown ##, falls back to paragraph
    breaks, then to hard splits. Targets 12K chars per chunk — small
    enough that each chunk reliably finishes inside the timeout.
  • _extract_chunk: per-chunk retry (1 attempt by default) with
    structured logging on failure. Failed chunks no longer crash the
    overall extraction; they're skipped with a partial-result warning.
  • extract_claims_with_ai now runs chunks in parallel via
    asyncio.gather bounded by a semaphore (CHUNK_CONCURRENCY=3).
    For a 25K-char appeal: was sequential 150-300s, now ~70-90s.
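
The bounded parallelism described above can be sketched like this. It is
an illustration under assumptions, not the code in claims_extractor.py:
process_chunks, the extract callable and the retries parameter are
hypothetical names, "1 attempt by default" is read here as one retry after
the first attempt, and only CHUNK_CONCURRENCY=3 comes from this commit.

```python
from __future__ import annotations

import asyncio
import logging
from typing import Awaitable, Callable

logger = logging.getLogger(__name__)

CHUNK_CONCURRENCY = 3  # value stated in the commit message


async def process_chunks(
    chunks: list[str],
    extract: Callable[[str], Awaitable[list[dict]]],
    retries: int = 1,  # assumption: one retry after the first attempt
) -> list[dict]:
    """Run extract() over all chunks, at most CHUNK_CONCURRENCY at a time."""
    semaphore = asyncio.Semaphore(CHUNK_CONCURRENCY)

    async def worker(index: int, chunk: str) -> list[dict]:
        async with semaphore:
            for attempt in range(1 + retries):
                try:
                    return await extract(chunk)
                except Exception:
                    logger.warning(
                        "chunk %d failed (attempt %d)", index, attempt + 1, exc_info=True
                    )
            # Every attempt failed: skip this chunk instead of failing the run.
            return []

    # gather() returns results in input order, so claims keep document order.
    results = await asyncio.gather(*(worker(i, c) for i, c in enumerate(chunks)))
    return [claim for chunk_claims in results for claim in chunk_claims]
```

Because each worker swallows its own failures, one bad chunk yields a
partial result rather than cancelling the other in-flight chunks.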

Updated all 9 callers (claims, appraiser facts, block writer, qa
validator, brainstorm, learning loop, style analyzer × 3) to await
the now-async API.

The one-shot script scripts/extract_claims_8174.py, which was used to
recover the 43 appellant claims on case 8174-24, has been moved to
.archive/ because phase 1 makes it obsolete. SCRIPTS.md updated.

Phase 2 (background-task wrapper around LLM-bound MCP tools, persistent
llm_tasks table, SSE progress) is the structural follow-up and will land
in a separate PR.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-04-30 14:21:35 +00:00


"""Learning loop: compare a draft to the final version and extract lessons.

Stage 7 of the specification:
1. Ingest the final version (the one Dafna signed)
2. Compare the draft to the final version and identify changes
3. Extract lessons: new expressions, changed patterns, recurring errors
4. Update the style model
"""

from __future__ import annotations

import logging
from uuid import UUID

from legal_mcp import config
from legal_mcp.config import parse_llm_json
from legal_mcp.services import db, claude_session

logger = logging.getLogger(__name__)


def compute_diff_stats(draft_text: str, final_text: str) -> dict:
    """Compute comparison statistics between the draft and the final version."""
    draft_words = draft_text.split()
    final_words = final_text.split()
    draft_len = len(draft_words)
    final_len = len(final_words)

    # Simple word-level comparison (not a full diff algorithm, but good enough
    # for stats). Sets count each distinct word once and ignore word order.
    draft_set = set(draft_words)
    final_set = set(final_words)
    common = draft_set & final_set
    added = final_set - draft_set
    removed = draft_set - final_set

    # Estimate change percentage
    if draft_len == 0:
        change_pct = 100.0
    else:
        change_pct = (len(added) + len(removed)) / max(draft_len, final_len) * 100

    return {
        "draft_words": draft_len,
        "final_words": final_len,
        "change_percent": round(change_pct, 1),
        "words_added": len(added),
        "words_removed": len(removed),
        "words_common": len(common),
    }


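# English summary of the Hebrew prompt below, which is sent to the model verbatim:
# "You analyze changes in legal decisions. You received a draft (generated by
# AI) and a final version (edited by Dafna)." The task: identify substantive
# changes (not typos or formatting), classify each one by the listed types, and
# derive lessons that can be applied to future decisions. Output is JSON.
LESSONS_PROMPT = """אתה מנתח שינויים בהחלטות משפטיות. קיבלת טיוטה (שנוצרה ע"י AI) וגרסה סופית (שעברה עריכת דפנה).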
## משימה:
1. זהה את השינויים המהותיים (לא הקלדה/פורמט)
2. סווג כל שינוי:
- expression_change — ביטוי שהוחלף (הצע כלקח לעתיד)
- structure_change — שינוי מבני (סדר, חלוקה)
- content_addition — תוכן שנוסף (מה חסר?)
- content_removal — תוכן שהוסר (מה מיותר?)
- tone_change — שינוי טון (רשמי יותר/פחות)
- error_fix — תיקון שגיאה עובדתית/משפטית
3. הסק לקחים שניתן להפעיל בהחלטות עתידיות
## פלט JSON:
{
"changes": [
{"type": "...", "description": "תיאור השינוי", "draft_text": "...", "final_text": "...", "lesson": "לקח לעתיד"}
],
"new_expressions": ["ביטוי חדש שדפנה הוסיפה"],
"overall_assessment": "הערכה כללית (1-2 משפטים)"
}
"""
async def analyze_changes(draft_text: str, final_text: str) -> dict:
    """Analyze the changes between the draft and the final version with Claude."""
    # Truncate for context window
    max_chars = 15000
    draft_sample = draft_text[:max_chars]
    final_sample = final_text[:max_chars]

    # The Hebrew section markers below read "draft" and "final version".
    prompt = f"""{LESSONS_PROMPT}
--- טיוטה ---
{draft_sample}
--- גרסה סופית ---
{final_sample}
"""
    result = await claude_session.query_json(prompt)
    if result is None:
        logger.warning("Failed to parse lessons response")
        return {"changes": [], "new_expressions": [], "overall_assessment": ""}
    return result


async def process_final_version(
    case_id: UUID,
    final_text: str,
) -> dict:
    """Ingest the final version, compare it to the draft, and extract lessons.

    Args:
        case_id: case identifier
        final_text: text of the final version

    Returns:
        dict with diff stats, changes, lessons
    """
    decision = await db.get_decision_by_case(case_id)
    if not decision:
        raise ValueError(f"No decision for case {case_id}")

    # Get draft text (combine all blocks)
    pool = await db.get_pool()
    async with pool.acquire() as conn:
        rows = await conn.fetch(
            """SELECT content FROM decision_blocks
               WHERE decision_id = $1 AND word_count > 0
               ORDER BY block_index""",
            UUID(decision["id"]),
        )
    draft_text = "\n\n".join(r["content"] for r in rows if r["content"])
    if not draft_text:
        raise ValueError("No draft content to compare")

    # Compute stats
    diff_stats = compute_diff_stats(draft_text, final_text)

    # Analyze changes with AI
    analysis = await analyze_changes(draft_text, final_text)

    # Store new expressions as style patterns
    for expr in analysis.get("new_expressions", []):
        if expr and len(expr) > 3:
            await db.upsert_style_pattern(
                pattern_type="characteristic_phrase",
                pattern_text=expr,
                # Stored value is Hebrew for "learned from final version".
                context="למד מגרסה סופית",
            )

    # Update decision status
    await db.update_decision(
        UUID(decision["id"]),
        status="final",
    )

    # Update case status
    case = await db.get_case(case_id)
    if case:
        await db.update_case(case_id, status="final")

    return {
        "diff_stats": diff_stats,
        "analysis": analysis,
        "lessons_count": len(analysis.get("changes", [])),
        "new_expressions": len(analysis.get("new_expressions", [])),
    }
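
A short worked example may help when reading the change_percent value
produced by compute_diff_stats. The sketch below is self-contained:
set_change_percent is a condensed, hypothetical copy of the same set-based
estimate, and sequence_change_percent is an order-aware alternative built
on the standard library's difflib, shown only for contrast and not used
anywhere in this file.

```python
from __future__ import annotations

import difflib


def set_change_percent(draft_text: str, final_text: str) -> float:
    """Same estimate as compute_diff_stats: distinct words, order ignored."""
    draft_words, final_words = draft_text.split(), final_text.split()
    if not draft_words:
        return 100.0
    draft_set, final_set = set(draft_words), set(final_words)
    changed = len(final_set - draft_set) + len(draft_set - final_set)
    return round(changed / max(len(draft_words), len(final_words)) * 100, 1)


def sequence_change_percent(draft_text: str, final_text: str) -> float:
    """Order-aware alternative (an assumption, not part of the file above)."""
    matcher = difflib.SequenceMatcher(None, draft_text.split(), final_text.split())
    return round((1 - matcher.ratio()) * 100, 1)


draft = "the court rejects the appeal"
final = "the court accepts the appeal in full"
# 3 distinct words added + 1 removed, divided by the longer text (7 words).
print(set_change_percent(draft, final))  # 57.1

# Reordering alone is invisible to the set-based estimate.
print(set_change_percent("a b c d", "d c b a"))       # 0.0
print(sequence_change_percent("a b c d", "d c b a"))  # greater than 0
```

The set-based figure is cheap and adequate for a rough statistic, but it
reports no change when paragraphs are only reordered, which matters if
structure_change edits are expected to show up in the numbers.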