LLM session: async, 30min timeout, semantic chunking + parallel

The claude_session bridge had two structural defects that made any
non-trivial document extraction unreliable:

  1. subprocess.run() blocks the asyncio event loop in the MCP server
     for the full duration of every LLM call (60-180s typical).
  2. The 120-second timeout was below the cold-cache cost of any
     document over ~12K Hebrew characters. Three back-to-back timeouts
     on case 8174-24 dropped 43 appellant claims on the floor.

Phase 1 of the remediation plan keeps claude_session as the engine
(no Anthropic API switch) and restructures around it:

claude_session.py
  • query / query_json are now async — asyncio.create_subprocess_exec
    instead of subprocess.run, so MCP server can serve other coroutines
    while a call is in flight.
  • DEFAULT_TIMEOUT 120 → 1800 (30 min). High enough that no realistic
    document hits it; bounded so a runaway never zombifies forever.
  • LONG_TIMEOUT 300 → 3600 for opus block writing on full case context.
  • TimeoutError now actually kills the subprocess (asyncio.wait_for
    cancellation alone leaves the child running).
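
A minimal sketch of the pattern described in the bullets above, not the actual claude_session.py code. The function name `run_with_timeout` and its parameters are hypothetical; only `DEFAULT_TIMEOUT = 1800` comes from this commit message. It shows a child process awaited without blocking the event loop, and killed explicitly when the timeout fires.

```python
import asyncio
import sys
from typing import Sequence

DEFAULT_TIMEOUT = 1800  # seconds; the value named in the commit message


async def run_with_timeout(
    argv: Sequence[str], stdin_text: str, timeout: float = DEFAULT_TIMEOUT
) -> str:
    """Run a child process without blocking the event loop; kill it on timeout.

    Hypothetical helper for illustration only.
    """
    proc = await asyncio.create_subprocess_exec(
        *argv,
        stdin=asyncio.subprocess.PIPE,
        stdout=asyncio.subprocess.PIPE,
        stderr=asyncio.subprocess.PIPE,
    )
    try:
        stdout, stderr = await asyncio.wait_for(
            proc.communicate(stdin_text.encode("utf-8")), timeout=timeout
        )
    except asyncio.TimeoutError:
        # wait_for only cancels the awaiting coroutine. The child process
        # keeps running unless it is killed explicitly, so kill and reap it.
        proc.kill()
        await proc.wait()
        raise
    if proc.returncode != 0:
        message = stderr.decode("utf-8", "replace")
        raise RuntimeError(f"child exited with {proc.returncode}: {message}")
    return stdout.decode("utf-8")


if __name__ == "__main__":
    # Stand-in child process: upper-cases whatever arrives on stdin.
    child = [sys.executable, "-c", "import sys; print(sys.stdin.read().upper())"]
    print(asyncio.run(run_with_timeout(child, "hello")))
```

While `communicate()` is awaited, the event loop stays free to run other coroutines, which is the property the first bullet relies on.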

claims_extractor.py
  • _split_by_sections: chunks at numbered sections / Hebrew letter
    headings / "פרק" markers / markdown ##, falls back to paragraph
    breaks, then to hard splits. Targets 12K chars per chunk — small
    enough that each chunk reliably finishes inside the timeout.
  • _extract_chunk: per-chunk retry (1 attempt by default) with
    structured logging on failure. Failed chunks no longer crash the
    overall extraction; they're skipped with a partial-result warning.
  • extract_claims_with_ai now runs chunks in parallel via
    asyncio.gather bounded by a semaphore (CHUNK_CONCURRENCY=3).
    For a 25K-char appeal: was sequential 150-300s, now ~70-90s.
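
The chunking and bounded-parallel steps above can be sketched roughly as follows. This is an illustration under stated assumptions, not the code in claims_extractor.py: the names `split_by_sections`, `extract_all`, `extract_one`, and the heading regex are hypothetical, and "retry" is read here as one extra attempt per chunk. Only the 12K target and CHUNK_CONCURRENCY=3 come from the commit message.

```python
import asyncio
import logging
import re
from typing import Awaitable, Callable, List, Optional

TARGET_CHUNK_CHARS = 12_000  # from the commit message
CHUNK_CONCURRENCY = 3        # from the commit message
logger = logging.getLogger("claims_extractor_sketch")

# Assumed heading shapes: "1." / "12)", a Hebrew letter plus "." or ")",
# a "פרק" marker, or a markdown "##" heading, each at the start of a line.
HEADING_RE = re.compile(r"^(?:\d+[.)]\s|[א-ת][.)]\s|פרק\s|##\s)", re.MULTILINE)


def split_by_sections(text: str, target: int = TARGET_CHUNK_CHARS) -> List[str]:
    """Split at headings, then paragraph breaks, then hard cuts."""
    starts = sorted({0, *(m.start() for m in HEADING_RE.finditer(text))})
    pieces = [text[a:b] for a, b in zip(starts, starts[1:] + [len(text)])]

    small: List[str] = []
    for piece in pieces:
        if len(piece) <= target:
            small.append(piece)
            continue
        # Fallback 1: paragraph breaks (the separator stays with the text).
        for para in re.split(r"(?<=\n\n)", piece):
            # Fallback 2: hard split for paragraphs that are still too long.
            small.extend(para[i:i + target] for i in range(0, len(para), target))

    # Greedily pack adjacent pieces; a chunk never exceeds the target.
    chunks: List[str] = []
    current = ""
    for piece in small:
        if current and len(current) + len(piece) > target:
            chunks.append(current)
            current = ""
        current += piece
    if current:
        chunks.append(current)
    return chunks


async def extract_all(
    chunks: List[str],
    extract_one: Callable[[str], Awaitable[list]],
    retries: int = 1,
) -> list:
    """Run extract_one over all chunks, at most CHUNK_CONCURRENCY at a time."""
    sem = asyncio.Semaphore(CHUNK_CONCURRENCY)

    async def guarded(index: int, chunk: str) -> Optional[list]:
        async with sem:
            for attempt in range(1 + retries):
                try:
                    return await extract_one(chunk)
                except Exception as exc:
                    logger.warning(
                        "chunk %d attempt %d failed: %s", index, attempt + 1, exc
                    )
        return None  # chunk skipped; the caller gets a partial result

    results = await asyncio.gather(*(guarded(i, c) for i, c in enumerate(chunks)))
    return [claim for r in results if r is not None for claim in r]
```

Because `asyncio.gather` returns results in input order, claims from surviving chunks stay in document order even when a chunk in the middle is skipped.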

Updated all 9 callers (claims, appraiser facts, block writer, qa
validator, brainstorm, learning loop, style analyzer × 3) to await
the now-async API.

The one-shot scripts/extract_claims_8174.py used to recover 43
appellant claims on case 8174-24 has been moved to .archive/ — phase 1
makes it obsolete. SCRIPTS.md updated.

Phase 2 (background-task wrapper around LLM-bound MCP tools, persistent
llm_tasks table, SSE progress) is the structural follow-up — separate PR.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

2026-04-30 14:21:35 +00:00

scripts/: Script Guide

Rule: any update, creation, or deletion of a script in this directory requires updating this file.

Active scripts

| Script | Type | Purpose | Scheduled |
| --- | --- | --- | --- |
| `auto-sync-cases.sh` | bash | Syncs appeal case files to Gitea; runs every minute | `* * * * *` (cron) |
| `backup-db.sh` | bash | Daily PostgreSQL backup to `data/backups/` (gzip) | To be scheduled: `0 2 * * *` |
| `restore-db.sh` | bash | Restores the DB from a backup (companion to `backup-db.sh`) | Manual |
| `notify.py` | python | Sends alert emails from agents via SMTP (Gmail) | Called by agents |
| `bidi_table.py` | python | Builds box-drawing tables with BiDi support (Hebrew + English) | Helper library |
| `convert_decision_template.py` | python | Converts `data/training/טיוטת החלטה.dotx` → `skills/docx/decision_template.docx` for loading in python-docx | Run when the template is updated |
| `deploy-track-changes.sh` | bash | Skills sync CMP↔CMPA, checks, and deploy instructions for the Track Changes architecture | Manual |
| `retrofit_case.py` | python | Retroactive retrofit: injects bookmarks into an existing file of a specific case and sets it as active_draft | Manual (one-time per case) |

`.archive/` directory: completed scripts

One-off scripts whose functionality has been absorbed into the MCP server or the API. They are kept in git for history; do not run them.

| Script | Original Purpose | Superseded By |
| --- | --- | --- |
| `backfill_pattern_frequency.py` | Updated style-pattern frequencies in the DB | `web/app.py::_extract_pattern_variants()` |
| `batch_upload_training.py` | Uploaded the training corpus (16 files) | Web UI: `/api/training/upload` |
| `benchmark_embeddings.py` | Compared embedding models (voyage-3 vs voyage-4) | Completed; voyage-3-large was selected |
| `benchmark_new_vs_old.py` | Compared Google Vision against the existing markdown | Completed; one-off check for case 1130-25 |
| `decompose-decisions.py` | Decomposed final decisions into 12 blocks | MCP: `write_block()`, `write_all_blocks()` |
| `export-decision-docx.py` | Exported a decision to DOCX | MCP: `export_docx()` |
| `extract-citations.py` | Extracted case-law citations from block י | MCP service: `references_extractor.py` |
| `extract-claims.py` | Extracted claims from block ז | MCP: `extract_claims()` + `claims_extractor.py` |
| `extract_claims_8174.py` | One-off: extracted the missing claims for case 8174-24 after the analyst timed out (43 appellant claims added on 30/04/26) | phase 1: `claude_session` async + 30min timeout + semantic chunking |
| `extract_all_google_vision.py` | Bulk OCR with Google Vision | MCP: `document_upload()` pipeline |
| `extract_originals.py` | Extracted text from PDFs with Claude Opus | MCP service: `extractor.py` |
| `extract_originals_ocr.py` | Full OCR extraction from PDFs | MCP service: `extractor.py` |
| `generate-embeddings.py` | Generated embeddings for blocks and case law | Automatic; created when blocks are created |
| `link-claims-to-discussion.py` | Linked claims to discussion paragraphs | MCP service: `qa_validator.py` |
| `proofread_training_corpus.py` | Nevo cleanup from DOCX/PDF to Markdown | MCP service: `proofreader.py` + Web UI |
| `seed-appeals.py` | Seeded initial appeal cases into the DB | MCP: `case_create()` |
| `seed-knowledge.py` | Seeded lessons, transition phrases, and case law | MCP: `record_chair_feedback()`, `precedent_attach()` |
| `validate-decision.py` | Validated against block-schema | MCP: `validate_decision()` + `qa_validator.py` |

Deleted scripts (git history only)

| Script | Reason |
| --- | --- |
| `import-final-decisions.py` | Migration completed; all decisions are in `data/training/` |
| `compare_extractions.py` | One-off check for case 1130-25 |
| `decompose-decisions-v2.py` | Duplicate of v1 |
| `extract_google_vision.py` | Hardcoded to a single case |
| `extract_google_vision_single.py` | One-off wrapper |
| `test-search.py` | Debug script |
