Add training corpus UI with Nevo proofreading pipeline
- New proofreader service strips Nevo editorial additions (front matter,
postamble, page headers, watermarks, inline codes) from DOCX/PDF/MD
- PDF pages use Google Vision OCR for clean Hebrew RTL extraction
- New training page at #/training with drag-and-drop upload, automatic
metadata extraction (decision number, date, categories), reviewable
preview, and style pattern report grouped by type
- API endpoints: /api/training/{analyze,upload,corpus,patterns,
analyze-style,analyze-style/status}
- Fix claude_session.query to pipe prompt via stdin, avoiding ARG_MAX
overflow when analyzing 900K+ char corpus
- CLI scripts for batch proofreading and corpus upload
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
This commit is contained in:
@@ -24,6 +24,9 @@ LONG_TIMEOUT = 300 # For complex tasks like block writing
|
||||
def query(prompt: str, timeout: int = DEFAULT_TIMEOUT, max_turns: int = 1) -> str:
|
||||
"""Send a prompt to Claude Code headless and return the text response.
|
||||
|
||||
Passes the prompt via stdin (not argv) to avoid the OS ARG_MAX limit —
|
||||
prompts can be 500K+ chars when analyzing a full style corpus.
|
||||
|
||||
Args:
|
||||
prompt: The prompt to send.
|
||||
timeout: Max seconds to wait.
|
||||
@@ -36,14 +39,18 @@ def query(prompt: str, timeout: int = DEFAULT_TIMEOUT, max_turns: int = 1) -> st
|
||||
RuntimeError: If claude CLI is not available or fails.
|
||||
"""
|
||||
cmd = [
|
||||
"claude", "-p", prompt,
|
||||
"claude", "-p",
|
||||
"--output-format", "json",
|
||||
"--max-turns", str(max_turns),
|
||||
]
|
||||
|
||||
try:
|
||||
result = subprocess.run(
|
||||
cmd, capture_output=True, text=True, timeout=timeout,
|
||||
cmd,
|
||||
input=prompt,
|
||||
capture_output=True,
|
||||
text=True,
|
||||
timeout=timeout,
|
||||
)
|
||||
except FileNotFoundError:
|
||||
raise RuntimeError("Claude CLI not found. Install Claude Code or add 'claude' to PATH.")
|
||||
|
||||
Reference in New Issue
Block a user