legal-ai

Author	SHA1	Message	Date
Chaim	8a815ecff5	fix(retrieval): rewrite chunk-page retrofit to skip OCR All checks were successful Build & Deploy / build-and-deploy (push) Successful in 16s Details The first-pass retrofit re-extracted via extractor.extract_text, which re-runs Google Vision OCR on scanned pages. OCR is non-deterministic, so the new text didn't match the chunk content stored in the DB (produced by the original OCR run) — only ~7% of chunks were located. New approach (no OCR cost): 1. Use the stored documents.extracted_text from the DB — the exact text the chunks were produced from, so chunk lookups match. 2. Anchor page boundaries via PyMuPDF direct text reads (free, no OCR). Pages with usable direct text are anchored by snippet match; OCR-only pages are linearly interpolated between anchors. 3. Search each chunk in extracted_text using a whitespace-tolerant helper — needed because the chunker joins paragraphs with single '\\n' while extracted_text uses '\\n\\n' as page separators. Verified on 8174-24 (5 docs, 307 chunks) + 8137-24 (9 docs, 512 chunks): 100% chunks tagged, 13s total, $0 cost. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>	2026-05-03 20:04:33 +00:00
Chaim	81ccf3a888	feat(retrieval): track page_number on text chunks for multimodal hybrid boost All checks were successful Build & Deploy / build-and-deploy (push) Successful in 6m33s Details The legacy chunker did not track which PDF page each chunk came from. Stored chunks had page_number=NULL, which blocked the multimodal hybrid retriever's text+image boost — it joins (chunk, image) on (document_id, page_number) and the join could never fire. This change: - extractor.extract_text now returns (text, page_count, page_offsets); page_offsets[i] is the start char offset of page (i+1) in the joined text. None for non-PDFs. - chunker.chunk_document accepts an optional page_offsets and tags each chunk with the page that contains its first character (uses the existing chunker logic; pages assigned post-hoc by content search to keep the diff minimal). - processor.process_document and precedent_library.ingest_precedent forward page_offsets through the chunker. New uploads now carry accurate page_number on every chunk. - Other extract_text callers (tools/documents, tools/workflow, web/app.py) updated to unpack the third element (ignored). - scripts/backfill_chunk_pages.py: per-case retrofit. Re-extracts each PDF (re-OCRs via Google Vision if needed, ~$0.0015/page), computes page_offsets, and updates page_number on every chunk by content search. Idempotent; --force re-runs on already-tagged docs. Forward-only would leave the 419 image embeddings backfilled on cases 8174-24 + 8137-24 unable to boost their corresponding text chunks. The retrofit script closes that gap (cost ~$0.60). Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>	2026-05-03 19:49:41 +00:00
Chaim	242f668319	feat(retrieval): add voyage-multimodal-3 page-image embeddings (feature flag) All checks were successful Build & Deploy / build-and-deploy (push) Successful in 1m50s Details Stage C: per-page image embeddings via voyage-multimodal-3 + hybrid text+image search. Off by default; enable with MULTIMODAL_ENABLED=true. - Schema V9: document_image_embeddings + precedent_image_embeddings (vector(1024), page_number, image_thumbnail_path) - extractor.render_pages_for_multimodal renders PDF pages at MULTIMODAL_DPI (144) for embedding + JPEG thumbnails at MULTIMODAL_THUMB_DPI (96) for UI preview, in one pass - embeddings.embed_images calls voyage-multimodal-3 in 50-page batches - services/hybrid_search.py orchestrator: rerank applied to text side first (rerank-2 is text-only); image side cosine; weighted merge with text_weight 0.65 (env-tunable); image-only pages surface as match_type='image' so dense scanned content still appears - processor.process_document and precedent_library.ingest_precedent gated by flag — non-fatal on multimodal failure - scripts/multimodal_backfill.py — idempotent per-case CLI to embed existing documents without re-extracting text Validated locally on a 5-page response brief: render 0.31s, embed 8.32s, hybrid merge surfaces image rows correctly. Production rollout starts with flag=false (no behavior change), then per-case A/B. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>	2026-05-03 19:24:52 +00:00
Chaim	26c3fddf41	feat(retrieval): add voyage rerank-2 cross-encoder stage (feature flag) All checks were successful Build & Deploy / build-and-deploy (push) Successful in 1m29s Details Stage B of voyage-upgrades-plan rewritten: instead of context-3 (which 4 POCs showed inconsistent improvement), add a cross-encoder rerank layer on top of voyage-3. Default off (VOYAGE_RERANK_ENABLED=false). POC validation (785-doc corpus, 12 queries, claude-haiku-4-5 judge): - mean@3 +4.5% (4.306 → 4.500) - practical-category queries +11.6% (3.78 → 4.22) - latency +702ms per query - no schema change, no re-embed, no double storage Plumbing: - config: VOYAGE_RERANK_ENABLED / _MODEL / _FETCH_K env vars - embeddings.voyage_rerank() wraps voyageai client.rerank - services/rerank.py: maybe_rerank() helper — fetches FETCH_K candidates via the bi-encoder then reranks to top-K. Fail-open if Voyage rerank is unavailable. - tools/search.py: search_decisions, search_case_documents, find_similar_cases all wrapped - services/precedent_library.search_library wrapped Smoke-tested locally with flag on/off — produces expected behaviour and latency profile. Ready for production rollout via Coolify env flip after deploy. POCs (kept under scripts/ for reference): - voyage_context3_poc{_long}.py — context-3 evaluation (rejected) - voyage_multimodal_poc.py — multimodal-3 (stage C, deferred) - voyage_rerank_judge_poc.py — single-case rerank benchmark - voyage_rerank_corpus_poc.py — full-corpus rerank validation Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>	2026-05-03 18:43:41 +00:00
Chaim	da0a385d9c	docs: register reembed_voyage.py in SCRIPTS.md All checks were successful Build & Deploy / build-and-deploy (push) Successful in 7s Details	2026-05-03 16:44:07 +00:00
Chaim	cb0b4b6a8b	ops: switch embeddings to voyage-3 + plan for context-3 + multimodal-3.5 All checks were successful Build & Deploy / build-and-deploy (push) Successful in 7s Details Phase A — voyage-3 migration (executed): - VOYAGE_MODEL=voyage-3 set in Coolify (legal-ai app) and ~/.env - scripts/reembed_voyage.py: re-embeds document_chunks (6157), case_law_embeddings (9), precedent_chunks (385), and halachot (400) using the new model. paragraph_embeddings was empty. 6951 rows re-embedded in 93s, ~75 rows/sec. - Same 1024 dim → no schema change needed. Why voyage-3 over voyage-law-2: benchmark on 3 Hebrew legal queries with real passages from the corpus gave voyage-3 perfect ordering on 3/3 tests AND the largest separation (+0.483 vs voyage-law-2's +0.238). voyage-4 family had bigger separation but missed top-1 on the hardest test. Phase B (voyage-context-3) and Phase C (voyage-multimodal-3.5 for scanned + appraiser docs) are designed in docs/voyage-upgrades-plan.md but deferred — to be picked up in a fresh conversation. The plan includes: - Phase B: contextualized embeddings refactor (~49% recall lift on legal docs per Anthropic's research). Same dim, but ingestion pipeline must pass full doc context per chunk. - Phase C: page-level image embeddings via voyage-multimodal-3.5, stored in a parallel *_image_embeddings table. Hybrid text+image search. Targets appraiser report tables and scanned PDFs where current OCR loses layout. After this commit: MCP server needs a /mcp reconnect to pick up the new VOYAGE_MODEL env, and the legal-ai container will pick it up on its next redeploy. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>	2026-05-03 16:43:48 +00:00
Chaim	28f49defff	LLM session: async, 30min timeout, semantic chunking + parallel All checks were successful Build & Deploy / build-and-deploy (push) Successful in 1m28s Details The claude_session bridge had two structural defects that made any non-trivial document extraction unreliable: 1. subprocess.run() blocks the asyncio event loop in the MCP server for the full duration of every LLM call (60-180s typical). 2. The 120-second timeout was below the cold-cache cost of any document over ~12K Hebrew characters. Three back-to-back timeouts on case 8174-24 dropped 43 appellant claims on the floor. Phase 1 of the remediation plan — keeps claude_session as the engine (no Anthropic API switch) and restructures around it: claude_session.py • query / query_json are now async — asyncio.create_subprocess_exec instead of subprocess.run, so MCP server can serve other coroutines while a call is in flight. • DEFAULT_TIMEOUT 120 → 1800 (30 min). High enough that no realistic document hits it; bounded so a runaway never zombifies forever. • LONG_TIMEOUT 300 → 3600 for opus block writing on full case context. • TimeoutError now actually kills the subprocess (asyncio.wait_for cancellation alone leaves the child running). claims_extractor.py • _split_by_sections: chunks at numbered sections / Hebrew letter headings / "פרק" markers / markdown ##, falls back to paragraph breaks, then to hard splits. Targets 12K chars per chunk — small enough that each chunk reliably finishes inside the timeout. • _extract_chunk: per-chunk retry (1 attempt by default) with structured logging on failure. Failed chunks no longer crash the overall extraction; they're skipped with a partial-result warning. • extract_claims_with_ai now runs chunks in parallel via asyncio.gather bounded by a semaphore (CHUNK_CONCURRENCY=3). For a 25K-char appeal: was sequential 150-300s, now ~70-90s. Updated all 9 callers (claims, appraiser facts, block writer, qa validator, brainstorm, learning loop, style analyzer × 3) to await the now-async API. The one-shot scripts/extract_claims_8174.py used to recover 43 appellant claims on case 8174-24 has been moved to .archive/ — phase 1 makes it obsolete. SCRIPTS.md updated. Phase 2 (background-task wrapper around LLM-bound MCP tools, persistent llm_tasks table, SSE progress) is the structural follow-up — separate PR. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>	2026-04-30 14:21:35 +00:00
Chaim	726498126d	Add Track Changes architecture for draft revisions (CMP + CMPA) All checks were successful Build & Deploy / build-and-deploy (push) Successful in 1m29s Details Fixes critical bug in 1033-25: user-uploaded עריכה-*.docx files were orphaned on disk while exports kept rebuilding from stale DB blocks. New architecture: - User-uploaded DOCX becomes the source of truth (cases.active_draft_path) - System edits via XML surgery with real Word <w:ins>/<w:del> revisions - User can Accept/Reject each change from within Word Components: - docx_reviser.py: XML surgery for Track Changes (15 tests) - docx_retrofit.py: retroactive bookmark injection with Hebrew marker detection + heading heuristic (9 tests) - docx_exporter.py: emits bookmarks around each of the 12 blocks - 3 new MCP tools: apply_user_edit, list_bookmarks, revise_draft - 4 new/updated endpoints: upload (auto-registers active draft), /exports/revise, /exports/bookmarks, /exports/{filename}/retrofit, /active-draft - DB migration: cases.active_draft_path column - UI: correct banner using real v-numbers, "מקור האמת" badge, detailed upload toast with bookmarks_added/missing_blocks - agents: legal-exporter (3 export modes), legal-ceo (stage G for revision handling), legal-writer (revision mode) Multi-tenancy: - Works for both CMP (1xxx cases) and CMPA (8xxx/9xxx cases) - New revise-draft skill added to both companies - deploy-track-changes.sh syncs skills CMP ↔ CMPA - retrofit_case.py: one-off retrofit of existing files Tests: 34 passing (15 reviser + 9 retrofit + 4 exporter bookmarks + 6 e2e) Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>	2026-04-16 18:49:30 +00:00
Chaim	28daff58be	Pre-existing agent updates + analysis DOCX export Updates accumulated from prior sessions: - HEARTBEAT: company-based filtering (CMP/CMPA) rules - legal-qa, legal-researcher: routine updates - analysis_docx_exporter: new service for analysis DOCX export - compose page: "הורד כ-DOCX" button for analysis - decision_template.docx: template for exporter Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>	2026-04-16 18:49:10 +00:00
Chaim	5c9a5d702a	Clean up scripts/: archive 17, delete 5, add SCRIPTS.md registry Active scripts (5): auto-sync-cases.sh, backup-db.sh, restore-db.sh, notify.py, bidi_table.py Archived (17): one-time migration/seeding scripts whose functionality is now in MCP server or web API. Moved to scripts/.archive/ Deleted (5): zero-value scripts (duplicates, hardcoded single-case, debug scripts) Added scripts/SCRIPTS.md — registry of all scripts with purpose, status, and what superseded them. CLAUDE.md updated with rule: any script change requires SCRIPTS.md update. Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>	2026-04-14 16:30:19 +00:00
Chaim	891f20dbb9	Clean up legacy references: update CLAUDE.md, remove dead import script - CLAUDE.md: clarify vault was deleted, knowledge is in docs/+training/ - Remove import-final-decisions.py (migration completed, all decisions in DB) Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>	2026-04-14 16:16:35 +00:00
Chaim	ad3c2b7117	Remove duplicate paperclip-assets — source of truth is paperclip-config repo Assets live in ezer-mishpati/paperclip-config (cloned at ~/.paperclip). Deploy via: ~/.paperclip/hebrew/apply-hebrew.sh Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>	2026-04-14 15:57:18 +00:00
Chaim	11c73a7c60	CEO: add email notifications, subtask parentId, and Paperclip UI assets - CEO agent now sends email via notify.py when awaiting human response - CEO creates child issues (parentId) instead of flat disconnected issues - Fix notify.py email address to chaim+paperclip@marcus-law.co.il - Move Paperclip UI assets (RTL CSS + Hebrew JS) into repo under scripts/ - Add deploy.sh script to push assets to live Paperclip instance - Fix comment box positioning: newest comment on top, input below it Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>	2026-04-14 15:55:55 +00:00
Chaim	82ba4663ba	Fix case repo sync + auto-create Gitea repos + add sync indicator All checks were successful Build & Deploy / build-and-deploy (push) Successful in 1m30s Details - auto-sync-cases.sh: fix broken directory scan (was looking for status subdirs that don't exist), fix env var word-splitting bug, add safe.directory handling and error logging - cases.py: auto-create Gitea repo on case_create, fix documents/original → documents/originals naming mismatch - app.py: add GET /api/cases/{case_number}/git-status endpoint - web-ui: add SyncIndicator component in case header showing sync status (synced/pending/no remote) with last commit time - pyproject.toml: add httpx dependency - CLAUDE.md: update Paperclip wakeup API docs - settings page: switch tag input from Select to free-text with datalist Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>	2026-04-14 15:28:16 +00:00
Chaim	858333b386	Add style report dashboard — Dafna's style portrait Visual dashboard at #/style-report with 4 sections: - Hero: 24 decisions, char counts, subject donut, timeline - Anatomy: average section-length breakdown (intro → ruling → conclusion) - Signature Phrases Wall: pattern cards with real corpus frequencies, filter chips by type, click → modal with examples - Contribution: per-decision "new vs confirmed" patterns, growth curve SVG Backend: - /api/training/style-report endpoint computes all 4 sections in one call - Headlines in Hebrew are computed server-side from real data - Backfill script for style_patterns.frequency using _strip_nikud + pattern-variant extraction (templates with [placeholders], / alternatives, ellipsis all handled) Real findings from the 24-decision corpus: - דיון משפטי = 49% of avg decision (the focus) - 23/24 use "לפנינו ערר" opening formula - 21/24 use "ניתנה פה אחד" closing - After 7 decisions we already learned 85% of her style patterns Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>	2026-04-11 11:34:37 +00:00
Chaim	32f18de049	Add training corpus UI with Nevo proofreading pipeline - New proofreader service strips Nevo editorial additions (front matter, postamble, page headers, watermarks, inline codes) from DOCX/PDF/MD - PDF pages use Google Vision OCR for clean Hebrew RTL extraction - New training page at #/training with drag-and-drop upload, automatic metadata extraction (decision number, date, categories), reviewable preview, and style pattern report grouped by type - API endpoints: /api/training/{analyze,upload,corpus,patterns, analyze-style,analyze-style/status} - Fix claude_session.query to pipe prompt via stdin, avoiding ARG_MAX overflow when analyzing 900K+ char corpus - CLI scripts for batch proofreading and corpus upload Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>	2026-04-11 11:04:58 +00:00
Chaim	3f759d3610	Improve document processing pipeline and agent workflows - Add delete_document_chunks for reprocessing, save extracted text to disk - Expand case directory structure (original/extracted/proofread/backup) - Update classifier patterns (תגובה, הודעת עמדה) - Fix proofreader agent paths for new directory layout - Update HEARTBEAT to notify on every task completion - Improve bidi_table with LRE/PDF directional embedding - Add Paperclip project verification and auto-close setup issue - Add auto-sync-cases.sh for Gitea synchronization Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>	2026-04-09 16:45:49 +00:00
Chaim	22e819363e	Flatten cases directory structure and unify paths - Remove cases/new\|in-progress\|completed subdivision (status managed in DB) - Rename documents/original → documents/originals (consistent plural) - Move exports from global data/exports/ into cases/{num}/exports/ - Add documents/research/ for case law and analysis files - Update all agents, scripts, config, web API endpoints, and DB paths Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>	2026-04-09 14:33:27 +00:00
Chaim	e1d2e18ea8	Add email notifications: agents send mail when human action needed New: scripts/notify.py — sends via SMTP (notify@marcus-law.co.il → paperclip+chaim@marcus-law.co.il) Updated: HEARTBEAT.md — agents must send email when waiting for human decision Triggers: outcome choice, direction approval, QA failures, review ready. Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>	2026-04-07 17:07:43 +00:00
Chaim	911c797eb2	Reorganize: skills/ directory + move memory to docs/ skill-legal-decision/ → skills/decision/ skill-legal-assistant/ → skills/assistant/ skill-legal-docx/ → skills/docx/ memory/*.md → docs/ Also removed: TASKS.md (use TaskMaster), classifier.py (replaced by local_classifier.py) Updated all references in CLAUDE.md, scripts, PRDs, docs. Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>	2026-04-04 14:27:07 +00:00
Chaim	d5ccf03e4c	Add docs, scripts, skills, commands, and taskmaster config to repo Includes: - docs/: architecture, block-schema, migration-plan, product-specification - scripts/: bidi_table, decompose-decisions, extract-claims, seed-knowledge, etc. - skill-legal-decision/: SKILL.md + references + block-schema - skill-legal-assistant/: SKILL.md - skill-legal-docx/: SKILL.md + references - .claude/commands/: bidi-table skill - .taskmaster/: task config + PRDs - .gitignore: exclude legacy/, kiryat-yearim/, node_modules/, memory/ Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>	2026-04-04 14:19:17 +00:00
Chaim	d9e5ef0f46	Add full decision writing pipeline: classify, extract, brainstorm, write, QA, export New services (11 files): - classifier.py: auto doc-type classification + party identification (Claude Haiku) - claims_extractor.py: claim extraction from pleadings (Claude Sonnet + regex) - references_extractor.py: plan/case-law/legislation detection (regex) - brainstorm.py: direction generation with 2-3 options (Claude Sonnet) - block_writer.py: 12-block decision writer (template + Claude Sonnet/Opus) - docx_exporter.py: DOCX export with David font, RTL, headings - qa_validator.py: 6 QA checks with export blocking on critical failure - learning_loop.py: draft vs final comparison + lesson extraction - metrics.py: KPIs dashboard per case and global - audit.py: action audit log - cli.py: standalone CLI with 11 commands Updated pipeline: extract → classify → chunk → embed → store → extract_references New MCP tools: 29 total (was 16) New DB tables: audit_log, decisions CRUD, claims CRUD Config: Infisical support, external service allowlist Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>	2026-04-03 10:21:47 +00:00

22 Commits