Phase A — voyage-3 migration (executed):
- VOYAGE_MODEL=voyage-3 set in Coolify (legal-ai app) and ~/.env
- scripts/reembed_voyage.py: re-embeds document_chunks (6157),
case_law_embeddings (9), precedent_chunks (385), and halachot (400)
using the new model. paragraph_embeddings was empty. 6951 rows
re-embedded in 93s, ~75 rows/sec.
- Same 1024 dim → no schema change needed.
Why voyage-3 over voyage-law-2: benchmark on 3 Hebrew legal queries
with real passages from the corpus gave voyage-3 perfect ordering on
3/3 tests AND the largest separation (+0.483 vs voyage-law-2's
+0.238). voyage-4 family had bigger separation but missed top-1 on
the hardest test.
Phase B (voyage-context-3) and Phase C (voyage-multimodal-3.5 for
scanned + appraiser docs) are designed in docs/voyage-upgrades-plan.md
but deferred — to be picked up in a fresh conversation. The plan
includes:
- Phase B: contextualized embeddings refactor (~49% recall lift on
legal docs per Anthropic's research). Same dim, but ingestion
pipeline must pass full doc context per chunk.
- Phase C: page-level image embeddings via voyage-multimodal-3.5,
stored in a parallel *_image_embeddings table. Hybrid text+image
search. Targets appraiser report tables and scanned PDFs where
current OCR loses layout.
After this commit: MCP server needs a /mcp reconnect to pick up the
new VOYAGE_MODEL env, and the legal-ai container will pick it up on
its next redeploy.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
The claude_session bridge had two structural defects that made any
non-trivial document extraction unreliable:
1. subprocess.run() blocks the asyncio event loop in the MCP server
for the full duration of every LLM call (60-180s typical).
2. The 120-second timeout was below the cold-cache cost of any
document over ~12K Hebrew characters. Three back-to-back timeouts
on case 8174-24 dropped 43 appellant claims on the floor.
Phase 1 of the remediation plan — keeps claude_session as the engine
(no Anthropic API switch) and restructures around it:
claude_session.py
• query / query_json are now async — asyncio.create_subprocess_exec
instead of subprocess.run, so MCP server can serve other coroutines
while a call is in flight.
• DEFAULT_TIMEOUT 120 → 1800 (30 min). High enough that no realistic
document hits it; bounded so a runaway never zombifies forever.
• LONG_TIMEOUT 300 → 3600 for opus block writing on full case context.
• TimeoutError now actually kills the subprocess (asyncio.wait_for
cancellation alone leaves the child running).
claims_extractor.py
• _split_by_sections: chunks at numbered sections / Hebrew letter
headings / "פרק" markers / markdown ##, falls back to paragraph
breaks, then to hard splits. Targets 12K chars per chunk — small
enough that each chunk reliably finishes inside the timeout.
• _extract_chunk: per-chunk retry (1 attempt by default) with
structured logging on failure. Failed chunks no longer crash the
overall extraction; they're skipped with a partial-result warning.
• extract_claims_with_ai now runs chunks in parallel via
asyncio.gather bounded by a semaphore (CHUNK_CONCURRENCY=3).
For a 25K-char appeal: was sequential 150-300s, now ~70-90s.
Updated all 9 callers (claims, appraiser facts, block writer, qa
validator, brainstorm, learning loop, style analyzer × 3) to await
the now-async API.
The one-shot scripts/extract_claims_8174.py used to recover 43
appellant claims on case 8174-24 has been moved to .archive/ — phase 1
makes it obsolete. SCRIPTS.md updated.
Phase 2 (background-task wrapper around LLM-bound MCP tools, persistent
llm_tasks table, SSE progress) is the structural follow-up — separate PR.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Fixes critical bug in 1033-25: user-uploaded עריכה-*.docx files were
orphaned on disk while exports kept rebuilding from stale DB blocks.
New architecture:
- User-uploaded DOCX becomes the source of truth (cases.active_draft_path)
- System edits via XML surgery with real Word <w:ins>/<w:del> revisions
- User can Accept/Reject each change from within Word
Components:
- docx_reviser.py: XML surgery for Track Changes (15 tests)
- docx_retrofit.py: retroactive bookmark injection with Hebrew marker
detection + heading heuristic (9 tests)
- docx_exporter.py: emits bookmarks around each of the 12 blocks
- 3 new MCP tools: apply_user_edit, list_bookmarks, revise_draft
- 4 new/updated endpoints: upload (auto-registers active draft),
/exports/revise, /exports/bookmarks, /exports/{filename}/retrofit,
/active-draft
- DB migration: cases.active_draft_path column
- UI: correct banner using real v-numbers, "מקור האמת" badge,
detailed upload toast with bookmarks_added/missing_blocks
- agents: legal-exporter (3 export modes), legal-ceo (stage G for
revision handling), legal-writer (revision mode)
Multi-tenancy:
- Works for both CMP (1xxx cases) and CMPA (8xxx/9xxx cases)
- New revise-draft skill added to both companies
- deploy-track-changes.sh syncs skills CMP ↔ CMPA
- retrofit_case.py: one-off retrofit of existing files
Tests: 34 passing (15 reviser + 9 retrofit + 4 exporter bookmarks + 6 e2e)
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Active scripts (5): auto-sync-cases.sh, backup-db.sh, restore-db.sh,
notify.py, bidi_table.py
Archived (17): one-time migration/seeding scripts whose functionality
is now in MCP server or web API. Moved to scripts/.archive/
Deleted (5): zero-value scripts (duplicates, hardcoded single-case,
debug scripts)
Added scripts/SCRIPTS.md — registry of all scripts with purpose,
status, and what superseded them. CLAUDE.md updated with rule:
any script change requires SCRIPTS.md update.
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
- CLAUDE.md: clarify vault was deleted, knowledge is in docs/+training/
- Remove import-final-decisions.py (migration completed, all decisions in DB)
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Assets live in ezer-mishpati/paperclip-config (cloned at ~/.paperclip).
Deploy via: ~/.paperclip/hebrew/apply-hebrew.sh
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
- CEO agent now sends email via notify.py when awaiting human response
- CEO creates child issues (parentId) instead of flat disconnected issues
- Fix notify.py email address to chaim+paperclip@marcus-law.co.il
- Move Paperclip UI assets (RTL CSS + Hebrew JS) into repo under scripts/
- Add deploy.sh script to push assets to live Paperclip instance
- Fix comment box positioning: newest comment on top, input below it
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
- auto-sync-cases.sh: fix broken directory scan (was looking for
status subdirs that don't exist), fix env var word-splitting bug,
add safe.directory handling and error logging
- cases.py: auto-create Gitea repo on case_create, fix
documents/original → documents/originals naming mismatch
- app.py: add GET /api/cases/{case_number}/git-status endpoint
- web-ui: add SyncIndicator component in case header showing
sync status (synced/pending/no remote) with last commit time
- pyproject.toml: add httpx dependency
- CLAUDE.md: update Paperclip wakeup API docs
- settings page: switch tag input from Select to free-text with datalist
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Visual dashboard at #/style-report with 4 sections:
- Hero: 24 decisions, char counts, subject donut, timeline
- Anatomy: average section-length breakdown (intro → ruling → conclusion)
- Signature Phrases Wall: pattern cards with real corpus frequencies, filter
chips by type, click → modal with examples
- Contribution: per-decision "new vs confirmed" patterns, growth curve SVG
Backend:
- /api/training/style-report endpoint computes all 4 sections in one call
- Headlines in Hebrew are computed server-side from real data
- Backfill script for style_patterns.frequency using _strip_nikud +
pattern-variant extraction (templates with [placeholders], / alternatives,
ellipsis all handled)
Real findings from the 24-decision corpus:
- דיון משפטי = 49% of avg decision (the focus)
- 23/24 use "לפנינו ערר" opening formula
- 21/24 use "ניתנה פה אחד" closing
- After 7 decisions we already learned 85% of her style patterns
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
- New proofreader service strips Nevo editorial additions (front matter,
postamble, page headers, watermarks, inline codes) from DOCX/PDF/MD
- PDF pages use Google Vision OCR for clean Hebrew RTL extraction
- New training page at #/training with drag-and-drop upload, automatic
metadata extraction (decision number, date, categories), reviewable
preview, and style pattern report grouped by type
- API endpoints: /api/training/{analyze,upload,corpus,patterns,
analyze-style,analyze-style/status}
- Fix claude_session.query to pipe prompt via stdin, avoiding ARG_MAX
overflow when analyzing 900K+ char corpus
- CLI scripts for batch proofreading and corpus upload
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
- Add delete_document_chunks for reprocessing, save extracted text to disk
- Expand case directory structure (original/extracted/proofread/backup)
- Update classifier patterns (תגובה, הודעת עמדה)
- Fix proofreader agent paths for new directory layout
- Update HEARTBEAT to notify on every task completion
- Improve bidi_table with LRE/PDF directional embedding
- Add Paperclip project verification and auto-close setup issue
- Add auto-sync-cases.sh for Gitea synchronization
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
- Remove cases/new|in-progress|completed subdivision (status managed in DB)
- Rename documents/original → documents/originals (consistent plural)
- Move exports from global data/exports/ into cases/{num}/exports/
- Add documents/research/ for case law and analysis files
- Update all agents, scripts, config, web API endpoints, and DB paths
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
New: scripts/notify.py — sends via SMTP (notify@marcus-law.co.il → paperclip+chaim@marcus-law.co.il)
Updated: HEARTBEAT.md — agents must send email when waiting for human decision
Triggers: outcome choice, direction approval, QA failures, review ready.
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>