Add training corpus UI with Nevo proofreading pipeline

- New proofreader service strips Nevo editorial additions (front matter, postamble, page headers, watermarks, inline codes) from DOCX/PDF/MD - PDF pages use Google Vision OCR for clean Hebrew RTL extraction - New training page at #/training with drag-and-drop upload, automatic metadata extraction (decision number, date, categories), reviewable preview, and style pattern report grouped by type - API endpoints: /api/training/{analyze,upload,corpus,patterns, analyze-style,analyze-style/status} - Fix claude_session.query to pipe prompt via stdin, avoiding ARG_MAX overflow when analyzing 900K+ char corpus - CLI scripts for batch proofreading and corpus upload Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
2026-04-11 11:04:58 +00:00
parent ecda95d610
commit 32f18de049
6 changed files with 1960 additions and 3 deletions
--- a/mcp-server/src/legal_mcp/services/claude_session.py
+++ b/mcp-server/src/legal_mcp/services/claude_session.py
@@ -24,6 +24,9 @@ LONG_TIMEOUT = 300  # For complex tasks like block writing
 def query(prompt: str, timeout: int = DEFAULT_TIMEOUT, max_turns: int = 1) -> str:
    """Send a prompt to Claude Code headless and return the text response.

+    Passes the prompt via stdin (not argv) to avoid the OS ARG_MAX limit —
+    prompts can be 500K+ chars when analyzing a full style corpus.
+
    Args:
        prompt: The prompt to send.
        timeout: Max seconds to wait.
@@ -36,14 +39,18 @@ def query(prompt: str, timeout: int = DEFAULT_TIMEOUT, max_turns: int = 1) -> st
        RuntimeError: If claude CLI is not available or fails.
    """
    cmd = [
-        "claude", "-p", prompt,
+        "claude", "-p",
        "--output-format", "json",
        "--max-turns", str(max_turns),
    ]

    try:
        result = subprocess.run(
-            cmd, capture_output=True, text=True, timeout=timeout,
+            cmd,
+            input=prompt,
+            capture_output=True,
+            text=True,
+            timeout=timeout,
        )
    except FileNotFoundError:
        raise RuntimeError("Claude CLI not found. Install Claude Code or add 'claude' to PATH.")