legal-ai

Author	SHA1	Message	Date
Chaim	d12cdb1fad	docs(voyage): mark stage C complete + record empirical fixes. Stage C of the voyage-upgrades-plan shipped to production on 2026-05-03. The doc now leads with the final state and the two empirical corrections vs the original plan: 1. Reciprocal Rank Fusion replaces the weighted-sum hybrid merge. voyage-3 cosines (~0.4-0.5) systematically outscale voyage-multimodal-3 cosines (~0.20-0.25); a weighted sum lets text dominate even when the image is the better signal. RRF is rank-based and robust to scale differences. 2. The chunker now propagates page_number end-to-end (the extractor returns per-page offsets, and the chunker tags each chunk by its first character's page). A retrofit script backfills page_number on existing document_chunks without re-OCR; it uses the stored documents.extracted_text plus PyMuPDF direct text reads as page anchors (linear interpolation for OCR-only pages). Production state on cases 8174-24 + 8137-24: 419 page-image embeddings, 819 chunks tagged with page_number, MULTIMODAL_ENABLED=true in Coolify env, hybrid search verified A/B against text-only baseline. The original stage C plan section is retained below for reference. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>	2026-05-03 20:16:13 +00:00
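The two empirical fixes are easy to state in code. Below is a minimal Python sketch, not the repo's implementation: `rrf_merge`, `page_for_offset` and `tag_chunks` are hypothetical names, and `k=60` is the constant conventionally used with RRF, assumed here rather than taken from the commit. It assumes each retriever returns a ranked list of IDs and that the extractor exposes ascending per-page start offsets.

```python
from bisect import bisect_right


def rrf_merge(rankings, k=60, top_n=10):
    """Fuse ranked ID lists with Reciprocal Rank Fusion (RRF).

    Each list adds 1 / (k + rank) to an item's score (rank starts at 1), so
    only positions matter: a text cosine of ~0.45 and an image cosine of ~0.22
    are never compared directly.
    """
    scores = {}
    for ranking in rankings:
        for rank, doc_id in enumerate(ranking, start=1):
            scores[doc_id] = scores.get(doc_id, 0.0) + 1.0 / (k + rank)
    # Highest fused score first; ties broken by ID so the order is deterministic.
    ordered = sorted(scores.items(), key=lambda kv: (-kv[1], kv[0]))
    return [doc_id for doc_id, _ in ordered[:top_n]]


def page_for_offset(page_starts, char_offset):
    """1-based page containing char_offset.

    page_starts[i] is the offset where page i + 1 begins in the concatenated
    extracted text, so page_starts[0] == 0 and the list is ascending.
    """
    return bisect_right(page_starts, char_offset)


def tag_chunks(chunks, page_starts):
    """Tag each (start_offset, text) chunk with the page of its first character."""
    return [
        {"start": start, "text": text, "page_number": page_for_offset(page_starts, start)}
        for start, text in chunks
    ]


# Text ranking first, image ranking second.
print(rrf_merge([["t1", "t2", "img3"], ["img3", "t2"]]))  # ['img3', 't2', 't1']
```

Because the fused score depends only on rank positions, no normalization of the two cosine scales is needed: in the example, a result ranked first by image search and third by text search edges out one ranked second by both.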
Chaim	26c3fddf41	feat(retrieval): add voyage rerank-2 cross-encoder stage (feature flag). Stage B of voyage-upgrades-plan rewritten: instead of context-3 (for which 4 POCs showed inconsistent improvement), add a cross-encoder rerank layer on top of voyage-3. Default off (VOYAGE_RERANK_ENABLED=false). POC validation (785-doc corpus, 12 queries, claude-haiku-4-5 judge): - mean@3 +4.5% (4.306 → 4.500) - practical-category queries +11.6% (3.78 → 4.22) - latency +702ms per query - no schema change, no re-embed, no double storage Plumbing: - config: VOYAGE_RERANK_ENABLED / _MODEL / _FETCH_K env vars - embeddings.voyage_rerank() wraps voyageai client.rerank - services/rerank.py: maybe_rerank() helper — fetches FETCH_K candidates via the bi-encoder then reranks to top-K. Fail-open if Voyage rerank is unavailable. - tools/search.py: search_decisions, search_case_documents, find_similar_cases all wrapped - services/precedent_library.search_library wrapped Smoke-tested locally with the flag on and off; both produce the expected behaviour and latency profile. Ready for production rollout via Coolify env flip after deploy. POCs (kept under scripts/ for reference): - voyage_context3_poc{_long}.py — context-3 evaluation (rejected) - voyage_multimodal_poc.py — multimodal-3 (stage C, deferred) - voyage_rerank_judge_poc.py — single-case rerank benchmark - voyage_rerank_corpus_poc.py — full-corpus rerank validation Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>	2026-05-03 18:43:41 +00:00
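A fail-open rerank wrapper of the kind described for `services/rerank.py` might look like the following sketch. The real helper's signature is not shown in the commit: here `search` and `rerank` are injected callables (hypothetical stand-ins for the bi-encoder query and the Voyage rerank call), and the `fetch_k=30` default is an assumption, not the configured `VOYAGE_RERANK_FETCH_K` value.

```python
import logging
from typing import Callable, List, Sequence

log = logging.getLogger(__name__)

# search(query, limit) -> candidate texts in bi-encoder order (hypothetical shape)
SearchFn = Callable[[str, int], List[str]]
# rerank(query, candidates, top_k) -> indices into candidates, best first (hypothetical shape)
RerankFn = Callable[[str, Sequence[str], int], List[int]]


def maybe_rerank(query: str, search: SearchFn, rerank: RerankFn, enabled: bool,
                 top_k: int = 5, fetch_k: int = 30) -> List[str]:
    """Bi-encoder recall, then an optional cross-encoder rerank (fail-open)."""
    if not enabled:
        # Flag off: plain bi-encoder search, exactly top_k results.
        return search(query, top_k)
    # Over-fetch so the cross-encoder has room to promote lower-ranked hits.
    candidates = search(query, fetch_k)
    try:
        order = rerank(query, candidates, top_k)
    except Exception:
        # Fail open: a rerank outage degrades quality, never availability.
        log.warning("rerank unavailable; using bi-encoder order", exc_info=True)
        return candidates[:top_k]
    return [candidates[i] for i in order[:top_k]]
```

Injecting the two stages keeps the flag logic testable without network access; the extra latency the POC measured comes entirely from the `rerank` call on the over-fetched candidates.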
Chaim	cb0b4b6a8b	ops: switch embeddings to voyage-3 + plan for context-3 + multimodal-3.5. Phase A — voyage-3 migration (executed): - VOYAGE_MODEL=voyage-3 set in Coolify (legal-ai app) and ~/.env - scripts/reembed_voyage.py: re-embeds document_chunks (6157), case_law_embeddings (9), precedent_chunks (385), and halachot (400) using the new model. paragraph_embeddings was empty. 6951 rows re-embedded in 93s, ~75 rows/sec. - Same 1024 dim → no schema change needed. Why voyage-3 over voyage-law-2: benchmark on 3 Hebrew legal queries with real passages from the corpus gave voyage-3 perfect ordering on 3/3 tests AND the largest separation (+0.483 vs voyage-law-2's +0.238). The voyage-4 family had bigger separation but missed top-1 on the hardest test. Phase B (voyage-context-3) and Phase C (voyage-multimodal-3.5 for scanned + appraiser docs) are designed in docs/voyage-upgrades-plan.md but deferred — to be picked up in a fresh conversation. The plan includes: - Phase B: contextualized embeddings refactor (~49% recall lift on legal docs per Anthropic's research). Same dim, but ingestion pipeline must pass full doc context per chunk. - Phase C: page-level image embeddings via voyage-multimodal-3.5, stored in a parallel *_image_embeddings table. Hybrid text+image search. Targets appraiser report tables and scanned PDFs where current OCR loses layout. After this commit: MCP server needs a /mcp reconnect to pick up the new VOYAGE_MODEL env, and the legal-ai container will pick it up on its next redeploy. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>	2026-05-03 16:43:48 +00:00
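The re-embed pass can be pictured as a batched loop with a dimension guard. This is a sketch under assumptions: `reembed`, `embed_batch` and the batch size of 128 are hypothetical, and the provider call is injected rather than shown. The guard encodes the commit's point that voyage-3 keeps 1024 dimensions, so the existing vector columns need no schema change; the last lines check the reported totals.

```python
from typing import Callable, Dict, List, Sequence, Tuple

EMBED_DIM = 1024  # voyage-3 output size; per the commit, same as the model it replaced


def reembed(rows: Sequence[Tuple[int, str]],
            embed_batch: Callable[[List[str]], List[List[float]]],
            batch_size: int = 128) -> Dict[int, List[float]]:
    """Re-embed (row_id, text) pairs batch by batch; returns {row_id: vector}."""
    vectors: Dict[int, List[float]] = {}
    for i in range(0, len(rows), batch_size):
        batch = rows[i:i + batch_size]
        embedded = embed_batch([text for _, text in batch])
        if len(embedded) != len(batch):
            raise ValueError("embedding count does not match batch size")
        for (row_id, _), vec in zip(batch, embedded):
            # A dimension change would need a schema migration; fail loudly instead.
            if len(vec) != EMBED_DIM:
                raise ValueError(f"row {row_id}: expected {EMBED_DIM} dims, got {len(vec)}")
            vectors[row_id] = vec
    return vectors


# Row counts and timing reported in the commit message.
ROWS = {"document_chunks": 6157, "case_law_embeddings": 9,
        "precedent_chunks": 385, "halachot": 400}
total = sum(ROWS.values())
print(total, round(total / 93, 1))  # 6951 74.7
```

The totals agree with the message: the four tables sum to 6951 rows, and 6951 rows in 93 s is about 74.7 rows/sec, i.e. the quoted ~75.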
Chaim	6a38789379	docs+heartbeat: paperclip quirks + temp-file pattern + self-recovery. Two latent issues surfaced today while watching the end-to-end run of case 8174-24; both are worth documenting and engineering around because they will recur on every future case. Bug 1 — issue.released flips done→todo. After an agent successfully PATCHes its issue to "done", Paperclip's internal issue.released action reverts the status to "todo" within ~30 seconds. This triggers a fresh wakeup of the same agent on a task that is already complete. Reproduced on CMPA-18 (30/04/26): 18:14:57 agent PATCH → status: done 18:15:35 Paperclip → issue.released → status: todo 18:15:54 new researcher run started. The fix at the right altitude (Paperclip itself) is outside our repo. Mitigation in HEARTBEAT.md §3 — when an agent boots and finds the issue in `todo` while expected outputs (file, DB rows) already exist, it must short-circuit: post a "no change" comment, PATCH back to done, and exit. This costs ~$0.20 per false wakeup but breaks the loop. Bug 2 — Bash backtick trap on long comment bodies. The researcher agent built a curl pipeline like: curl ... -d "$(python3 -c "body = '''... 📁 קובץ מחקר: `/path/to/file.md` '''")" The backticks around the file path (markdown convention) get evaluated by the OUTER bash $(...) as command substitution. Bash then tries to exec /path/to/file.md, which is not executable, and prints "Permission denied", a misleading error since the actual file ownership is fine. The curl itself succeeded; only the bash prelude added noise to the log. Fix in HEARTBEAT.md §4א: long bodies must go via Write→tempfile, then `curl -d @file`. This avoids every shell quoting edge case. Files: • docs/paperclip-quirks.md — new. Full writeup of both bugs plus two previously known quirks (CEO auto-block in_progress, INSERT vs API for wakeups).
Each section: what happens, empirical evidence from logs, impact, workaround, status. • .claude/agents/HEARTBEAT.md — added the self-recovery section to §3 and the temp-file pattern to §4א. The temp-file pattern is the canonical answer for any agent posting markdown comments — applies to all 7 agents in this skill set. • CLAUDE.md — referenced the new doc from the docs index. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>	2026-04-30 18:23:32 +00:00
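Both mitigations reduce to a few lines. In this Python sketch (hypothetical names throughout; the `{"body": ...}` JSON shape and the comment URL are assumptions, not Paperclip's documented API), the guard decides when a woken agent should short-circuit, and the comment is written to a temp file and handed to curl as an argv list, so no shell ever parses the backticks. It uses `--data-binary @file` rather than the `-d @file` named in HEARTBEAT.md: curl's `-d` strips newlines from file input while `--data-binary` sends the file unchanged, though with a compact JSON body (newlines already escaped as `\n`) both behave the same.

```python
import json
import os
import tempfile
from typing import List


def should_short_circuit(issue_status: str, outputs_exist: bool) -> bool:
    """Self-recovery guard: woken on a 'todo' issue whose work is already done."""
    return issue_status == "todo" and outputs_exist


def write_comment_file(body: str) -> str:
    """Write the comment as JSON to a temp file; no shell ever sees the text."""
    fd, path = tempfile.mkstemp(suffix=".json")
    with os.fdopen(fd, "w", encoding="utf-8") as f:
        # json.dump escapes quotes and newlines; backticks stay literal characters.
        json.dump({"body": body}, f, ensure_ascii=False)
    return path


def curl_argv(url: str, body_path: str) -> List[str]:
    """argv list for subprocess.run (no shell=True, so nothing is substituted)."""
    return ["curl", "-sS", "-X", "POST",
            "-H", "Content-Type: application/json",
            "--data-binary", f"@{body_path}", url]


# Usage (not executed here):
#   subprocess.run(curl_argv(comments_url, write_comment_file(body)), check=True)
```

Passing an argv list instead of a command string is what removes the whole class of quoting bugs: the body never appears on a command line at all, only its file path does.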
Chaim	cd4eed0045	docs: case-deletion runbook (legal-ai + Paperclip + Gitea). Captures the full deletion procedure we worked out empirically while wiping case 8174-24 for a clean rerun. Covers all four systems where case state lives, in dependency order: 1. legal-ai DB + on-disk dir — DELETE /api/cases?remove_files=true (now actually works after `903fb4d` added the missing db.delete_case) 2. Paperclip DB — no API; raw SQL with explicit FK-blocker ordering (issue_comments, cost_events, finance_events, feedback_votes, issue_inbox_archives, issue_read_states must go before issues; heartbeat_runs.wakeup_request_id must be NULLed before agent_wakeup_requests can be deleted) 3. Gitea — DELETE /api/v1/repos/cases/{N} 4. Verification queries for each system Two gotchas worth highlighting in the doc: • The case directory inside /data/cases is owned by root because the container runs as root — host-side rm needs sudo, or use the API (rmtree happens inside the container). • Paperclip projects are referenced via name LIKE '%{N}%' since there's no slug column. Stricter matching is recommended if N appears in multiple project names. Linked from legal-ai/CLAUDE.md docs index. A future scripts/delete-case.sh that automates the runbook with a confirmation prompt is noted as TODO inside the runbook itself. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>	2026-04-30 14:54:21 +00:00
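The Paperclip FK-blocker ordering is a topological sort. This sketch uses Python's standard `graphlib` (3.9+) to derive a safe order from the two constraints named in the runbook. The statements are abbreviated for illustration, and the real runbook's case-scoped WHERE clauses (such as the project `name LIKE '%{N}%'` match) are deliberately omitted.

```python
from graphlib import TopologicalSorter

# step -> steps that must run before it (rows that reference a table go first)
BLOCKERS = {
    "DELETE FROM issues": {
        "DELETE FROM issue_comments",
        "DELETE FROM cost_events",
        "DELETE FROM finance_events",
        "DELETE FROM feedback_votes",
        "DELETE FROM issue_inbox_archives",
        "DELETE FROM issue_read_states",
    },
    "DELETE FROM agent_wakeup_requests": {
        "UPDATE heartbeat_runs SET wakeup_request_id = NULL",
    },
}


def deletion_plan(blockers):
    """Return every step in an order that never violates a foreign key."""
    return list(TopologicalSorter(blockers).static_order())


for step in deletion_plan(BLOCKERS):
    print(step)
```

Encoding the blockers as data rather than as a hand-ordered script means a newly discovered FK only adds one entry to the mapping, and `TopologicalSorter` raises `CycleError` if the constraints ever become contradictory.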
Chaim	4a297f910c	Lessons from 1033-25 (clean acceptance — first in training corpus). Comparison of our draft (טיוטה-v6, i.e. draft v6, 2,126 words) against Dafna's final decision (עריכה-v2, i.e. edit v2, 2,299 words). 14 lessons (#20-#33) covering what the draft got right and where she rebuilt the discussion. Key findings: - Lesson #20: Match doctrinal depth to legal uncertainty. In a clean acceptance, the committee's OWN conditions provide the anchor — no CREAC framework needed. The draft's 101-word "נבאר" ("we shall explain") doctrinal paragraph was deleted entirely. - Lesson #21: Plant analytical seeds in the background ("ודוק", "note carefully", as foreshadowing) for technical planning distinctions. - Lesson #23: Concrete documentary evidence (specific permits in buildings 5, 7, 11) beats generic statements. - Lesson #25: Counter-factual reasoning — "approved by mistake" gives the committee the benefit of the doubt while strengthening the reversal. - Lesson #26: Engineer counter-factual — "had he known the shadow plan was not feasible, his opposition would have been even stronger". - Lesson #27: "אכן...אולם" ("indeed...however") / "לא נעלם מעינינו" ("it has not escaped our notice") patterns are for rejection, NOT acceptance. Don't use them prophylactically. - Lesson #28: "ונפרט;" (ו "and" prefix + semicolon), never "נפרט." with a period. - Lesson #33: Full acceptance against the permit applicant → no expenses to either side. New transition phrases catalogued: "דיון עקר" ("a moot discussion"), "אושרה מתוך טעות כי הרי לא נוכל להניח כי אושרה למראית עין" ("it was approved by mistake, for we cannot assume it was approved merely for show"), "ועדת הערר אפשרה מרחב של זמן בתקווה כי ההחלטה תתייתר" ("the appeals committee allowed time in the hope that the decision would become unnecessary"), "להלן כדוגמא מתוך" ("below, by way of example, from"), "ברי כי הכוונה ל..." ("clearly the reference is to..."). Several of these lessons fed directly into daphna-acceptance-architecture.md (template A) and daphna-decision-tree.md from the recent voice corpus work; this file remains the case-study record. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>	2026-04-28 17:37:38 +00:00
Chaim	c2fb4ca08e	Voice corpus: acceptance architecture + block-zayin + decision tree. Three new voice docs based on a deep reading of 1033-25 (full acceptance) and 7 representative cases for block-zayin (claims summary): - daphna-acceptance-architecture.md: 5 distinct templates for case acceptance (A: internal flaw + voiding; B: remand to committee; C: corrections in request; D: substantive 8xxx; E: appraiser remand). Fixes the wrong reference in architecture-by-outcome that treated full acceptance as a variation of partial acceptance. - daphna-block-zayin-claims.md: rules for the claims summary block — order by procedural role, neutrality, sub-headings per party, anti-patterns (numbered lists, evaluation words, premature conclusion). - daphna-decision-tree.md: operational tool that unifies all 5 voice docs into a short analytical process. Starts with the decisive question: "what is the winning evidence?". Decision trees for architecture selection, opening mode, citation choice, length by weight. Updates legal-writer.md to read decision-tree first, then the 5 voice docs, plus block-zayin.md before block ז (zayin). Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>	2026-04-28 16:41:25 +00:00
Chaim	8b816c8b61	Voice corpus deep read: precedent network + architecture-by-outcome. After reading all 23 1xxx decisions from the style_corpus DB (in addition to the 10 training files and 1130-25/1194-25 deep reads), synthesized two new operational documents: docs/daphna-precedent-network.md - Maps each legal issue to the specific precedent Daphna cites - 9 threshold issues (standing, השפר, סעיף 152 (section 152), קנייני (proprietary claims), פגמי פרסום (publication defects), פסילה (disqualification), עבירות בנייה (building violations)) with her preferred quotes for each - 8 substantive issues (תכנון נקודתי vs כולל (spot vs comprehensive planning), חיקוק תכנית, סטייה ניכרת (substantial deviation), 62א (section 62A), חניה (parking), תמ"א 38 (TAMA 38), תכניות ישנות (old plans), שימוש חורג (non-conforming use)) - Lists ~30 external precedents she cites consistently + ~15 personal precedents (her own canon — 1110/20 בעלז, 1112/22 שקופה, 1181/22 אדלר, 1130-25, etc.) - Distinguishes precedents she cites vs. those she does NOT cite docs/daphna-architecture-by-outcome.md - 7 distinct block-yod architectures keyed to outcome type: 1. Pure rejection (short, 555-2000 words) 2. Rejection after complex analysis (2500-4500) 3. Threshold dismissal + merits "ועל מנת לא לצאת בחסר" (lit. "so as not to fall short") (mode F) 4. Three or more distinct issues (sub-headings) 5. Partial acceptance (full funnel architecture) 6. Joined appeals 7. Remand follow-up - Decision tree for the agent (4 questions → architecture choice) - Internal proportions table (opening 5-10%, doctrine 15-25%, etc.) - Costs matrix with 6 scenarios Updated docs/daphna-voice-fingerprint.md with section 6 (additions from the 23-file corpus read): 2 new opening modes (F: threshold+merits, G: remand follow-up), nuanced sub-heading rule, self-citation of full analytical blocks, 10 new "we" verbs, 11 traditional phrases with sources, expanded costs matrix, transparency about petition outcomes, warning that 1015-24 is a dissent (not Daphna's voice).
Updated .claude/agents/legal-writer.md to require reading all 4 voice docs before block-yod (the "voice quartet"), with explicit decision tree integration. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>	2026-04-27 07:26:45 +00:00
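The proportions table can back an automated check on a finished block-yod. A minimal sketch: only the two ranges quoted in the commit are encoded (the rest of the table is elided there with "etc."), `proportion_report` is a hypothetical helper, and per-section word counts are assumed to come from an earlier parsing step.

```python
from typing import Dict, Tuple

# Only the ranges quoted in the commit; other rows of the table are not reproduced.
TARGETS: Dict[str, Tuple[float, float]] = {
    "opening": (0.05, 0.10),
    "doctrine": (0.15, 0.25),
}


def proportion_report(section_words: Dict[str, int],
                      targets: Dict[str, Tuple[float, float]] = TARGETS
                      ) -> Dict[str, Tuple[float, bool]]:
    """Share of block-yod words per target section, and whether it is in range."""
    total = sum(section_words.values())
    report = {}
    for name, (lo, hi) in targets.items():
        share = section_words.get(name, 0) / total if total else 0.0
        report[name] = (share, lo <= share <= hi)
    return report


# A 2,000-word block-yod with a 150-word opening and a 600-word doctrine section.
print(proportion_report({"opening": 150, "doctrine": 600, "analysis": 1250}))
```

Here the opening (7.5%) falls inside its range while the doctrine section (30%) overshoots it, which is the kind of signal a post-write review could surface to the writer agent.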
Chaim	bccc0a132f	Refine voice fingerprint with full 1xxx corpus (24 cases). After analyzing all 24 building_permit decisions in the style_corpus DB (not just the 2 local files), refined two anti-patterns: 1. Sub-headings: actually permitted when block-yod handles 3+ distinct legal issues (e.g., 1079-24 had "הבקשות לפסילה" (disqualification requests) / "מעמד המבקשת וזכות עמידה" (the applicant's status and standing) / "עותרים ציבוריים" (public petitioners)). The earlier rule of "no sub-headings except academic cases" was too strict, since it was based only on a small local sample. 2. Paragraph numbering: discovered it's an evolutionary pattern, not a static rule. Pre-2025 decisions had sequential paragraph numbers (1, 2, 3 throughout); recent decisions (1126-25, 1128-25, 1130-25, 1194-25) abandoned it for narrative flow. The agent should NOT add paragraph numbers, following the new style. The (1)...(2)...(3)... in-paragraph enumeration ban remains absolute — 0/33 final decisions used it. Distinction now made explicit: in-paragraph enumeration ≠ paragraph-level numbering (the former is always forbidden; the latter is evolutionary). Updated: - docs/daphna-voice-fingerprint.md — corpus stats, refined anti-patterns - .claude/agents/legal-writer.md — checklist with new distinctions Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>	2026-04-27 06:45:55 +00:00
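The distinction between in-paragraph enumeration and paragraph-level numbering is mechanical enough to lint. A hedged sketch with hypothetical names and regexes: it flags a `(1) ... (2)` run inside one paragraph, which the commit says is always forbidden, and separately notes a leading paragraph number, which the current style omits. A lone `(1)`, such as a statutory sub-section reference, is not flagged.

```python
import re
from typing import List

INLINE_ENUM = re.compile(r"\((\d{1,2})\)")  # "(1)", "(2)", ... anywhere in the text
PARA_NUMBER = re.compile(r"^\s*\d+\.\s")    # "12. " at the very start of a paragraph


def lint_paragraph(paragraph: str) -> List[str]:
    """Return style warnings for one paragraph of block-yod."""
    warnings = []
    nums = [int(n) for n in INLINE_ENUM.findall(paragraph)]
    # A "(1) ... (2)" run inside one paragraph is the always-forbidden pattern.
    if any(a == 1 and b == 2 for a, b in zip(nums, nums[1:])):
        warnings.append("in-paragraph enumeration (1)...(2)")
    # Paragraph-level numbering was used historically; the current style drops it.
    if PARA_NUMBER.match(paragraph):
        warnings.append("paragraph number (current style omits these)")
    return warnings


print(lint_paragraph("הטענות: (1) ראשית; (2) שנית."))
print(lint_paragraph("לפי סעיף 152(1) לחוק"))
```

Keeping the two checks separate mirrors the commit's rule: one is an absolute ban, the other is a style trend, so a reviewer can weigh them differently.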
Chaim	deb8baab5d	Inject Daphna's voice into legal-writer + corpus fingerprint. Synthesized two voice documents from corpus reading: - docs/voice-1130-25.md: deep read of case 1130-25 block-yod (5000 words), extracting the 9-movement funnel architecture, 8 reasoning templates, 10 'we' verbs with their distinct functions, the 'akhen...ulam' pattern, pacing/silence principles, and the deliberative meta-narrative. - docs/daphna-voice-fingerprint.md: cross-corpus synthesis of 10 finals (1 planning + 9 appraisal-levy). Identifies 10 invariants, 5 opening modes mapped to outcome certainty, mandatory ברמ 3644/13 preamble for shamai cases, copy-paste templates, and 7 anti-patterns to avoid. Updated .claude/agents/legal-writer.md: - Added voice docs as MUST-READ before block-yod (the agent was missing the deep voice layer and had only surface style_guide patterns) - Replaced the '(1)...(2)...(3)...' enumeration template with the 5 opening modes (the enumeration was a known anti-pattern Daphna always removes) - Added the 'we' verbs catalog with explicit functions - Made the 'אכן...אולם' ('indeed...however') pattern mandatory for issues with substantial counter-arguments (was vaguely 'אמנם...אולם', 'admittedly...however') - Added mandatory ברמ 3644/13 preamble for 8xxx shamai cases - Added self-citation triple-mode (refer/defer/distinguish) — Daphna's emerging practice of building personal jurisprudence - Added 8-item anti-patterns checklist for post-write review - Replaced block-yod-alef section with a proper 4-paragraph closing template (process narrative → outcome → costs → date) Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>	2026-04-26 19:38:17 +00:00
Chaim	c619c22a51	Add pre-ruling interim draft (טיוטת ביניים) for the appeals committee. Lets the chair generate a partial decision DOCX before the discussion-and-ruling block is decided. Same template, skill and DOCX styling as the final decision (David, RTL, bookmarks) — only the block selection and order differ: רקע/background (ו) → תכניות+היתרים/plans+permits (ט) → טענות/claims (ז) → הליכים/proceedings (ח). The opening (ה), ruling (י), summary (יא), and signatures (יב) are omitted. - New appraiser_facts table + CRUD + conflict detection in db.py (V5 schema). Conflict = same plan/permit identifier reported differently by 2+ appraisers. - New appraiser_facts_extractor service: per-appraisal Claude extraction of plans + permits with raw quotes and page numbers. - block-tet prompt extended with a permits sub-section sourced from the extracted facts, plus an explicit instruction to flag inter-appraiser conflicts in neutral wording without resolving them (deferred to block-yod). - block-chet prompt extended with a post-hearing materials context sourced from documents.metadata.is_post_hearing. - docx_exporter.export_decision now accepts mode='interim', which reorders the blocks per the chair's mental model and writes טיוטת-ביניים-v{N}.docx (versioned independently of regular drafts). - 3 new MCP tools: extract_appraiser_facts, write_interim_draft, export_interim_draft. write_interim_draft auto-runs extraction if the appraiser_facts table is empty for the case.	2026-04-18 13:28:04 +00:00
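The conflict rule ("same plan/permit identifier reported differently by 2+ appraisers") can be sketched as a grouping pass. This is not the `db.py` implementation: the `(appraiser, identifier, value)` triple shape and `find_conflicts` are assumptions, and values are compared after trimming whitespace only.

```python
from collections import defaultdict
from typing import Dict, Iterable, List, Set, Tuple


def find_conflicts(facts: Iterable[Tuple[str, str, str]]) -> Dict[str, Dict[str, List[str]]]:
    """facts: (appraiser, plan_or_permit_id, reported_value) triples.

    Returns {identifier: {value: [appraisers]}} for identifiers that at least
    two appraisers reported with different values.
    """
    # identifier -> reported value -> appraisers who reported it
    seen: Dict[str, Dict[str, Set[str]]] = defaultdict(lambda: defaultdict(set))
    for appraiser, ident, value in facts:
        seen[ident][value.strip()].add(appraiser)
    conflicts = {}
    for ident, by_value in seen.items():
        appraisers = set().union(*by_value.values())
        # One appraiser contradicting itself is not an inter-appraiser conflict.
        if len(by_value) > 1 and len(appraisers) >= 2:
            conflicts[ident] = {v: sorted(a) for v, a in by_value.items()}
    return conflicts
```

Keeping every reported value alongside its appraisers, rather than just a boolean, matches the block-tet instruction: the draft can describe who said what in neutral wording and leave the resolution to block-yod.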
Chaim	e068a611e7	Rewrite architecture.md — add Track Changes edit flow + 8 stages. The old architecture.md was out of date (it mentioned n8n, which isn't used, had wrong embedding dimensions, and lacked multi-tenancy and the edit loop). The rewrite documents the full process end-to-end: 1. Document upload + OCR + embedding 2. Analysis (proofreader, researcher) 3. Outcome + direction decision (CEO + human) 4. Deep analysis (pass 2) 5. Drafting (writer writes 12 blocks) 6. QA 7. Initial DOCX export (with bookmarks for future revisions) 8. Edit loop with Track Changes — the new architecture: a. User downloads + edits in Word + uploads עריכה-v{N}.docx b. Backend auto-retrofits bookmarks + registers it as active_draft c. User asks CEO for a specific change in a Paperclip comment d. CEO stage G: calls writer in revision mode → builds revisions JSON e. docx_reviser applies <w:ins>/<w:del>, preserving the user's template f. User accepts/rejects changes from Word's Review tab g. Repeat until marked final. Also covers the MCP tool reference, API endpoints, DB schema, multi-tenancy, and technology stack. Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>	2026-04-16 19:10:11 +00:00
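Step e relies on standard WordprocessingML revision markup: deleted text sits in `w:delText` inside `w:del`, inserted text in an ordinary run inside `w:ins`, each carrying `w:id`, `w:author` and `w:date`. The sketch below builds such a pair with `xml.etree.ElementTree`. It is an illustration, not `docx_reviser`'s code, and it omits what a real reviser must also do, such as locating the target runs via the bookmarks and copying their `w:rPr` formatting so the user's template survives.

```python
import xml.etree.ElementTree as ET
from typing import List

W_NS = "http://schemas.openxmlformats.org/wordprocessingml/2006/main"
XML_SPACE = "{http://www.w3.org/XML/1998/namespace}space"
ET.register_namespace("w", W_NS)  # serialize as w:ins / w:del, as Word expects


def w(tag: str) -> str:
    """Qualified WordprocessingML name, e.g. w('ins')."""
    return f"{{{W_NS}}}{tag}"


def _run(text: str, text_tag: str) -> ET.Element:
    run = ET.Element(w("r"))
    t = ET.SubElement(run, w(text_tag))
    t.text = text
    t.set(XML_SPACE, "preserve")  # keep leading/trailing spaces in the run
    return run


def tracked_replacement(old: str, new: str, author: str, date: str,
                        rev_id: int) -> List[ET.Element]:
    """<w:del> for the old text followed by <w:ins> for the new text."""
    deleted = ET.Element(w("del"), {w("id"): str(rev_id), w("author"): author, w("date"): date})
    deleted.append(_run(old, "delText"))  # deleted text lives in w:delText, not w:t
    inserted = ET.Element(w("ins"), {w("id"): str(rev_id + 1), w("author"): author, w("date"): date})
    inserted.append(_run(new, "t"))
    return [deleted, inserted]


for el in tracked_replacement("the committee erred", "the committee acted lawfully",
                              "Legal-AI", "2026-04-16T00:00:00Z", rev_id=100):
    print(ET.tostring(el, encoding="unicode"))
```

Because both elements carry an author and date, Word lists them in the Review pane, which is what lets the user accept or reject each change individually in step f.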
Chaim	1e4c5c1518	Add Paperclip agent activity mirror to case detail page. New "Agents" tab in case detail shows all Paperclip agent comments, issue status, and agent status for each case — eliminating the need to switch between Legal-AI and Paperclip UIs. Backend: 4 new DB query functions in paperclip_client.py (issues, comments, agents, post_comment) + 2 new API endpoints (GET/POST /api/cases/{case_number}/agents). Comment posting uses Board API with DB+wakeup fallback to ensure CEO routing. Frontend: agents.ts hooks (10s polling), AgentActivityFeed component (markdown timeline + comment input), AgentStatusWidget (sidebar), 4th tab in case detail page. Also includes new-company-setup-guide.md documenting the process for setting up the betterment levy (CMPA) company. Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>	2026-04-15 10:44:42 +00:00
Chaim	be9fa9e712	Add decision-writing methodology based on FJC, Garner, Posner sources. "בית ספר להחלטות" ("decision-writing school") Phase 2 — the system now has a formal analytical methodology for building quasi-judicial decisions, separate from Dafna's writing style (SKILL.md) and content checklists. What was done: - Downloaded 5 authoritative sources (~341K words): FJC Judicial Writing Manual (1991+2020), Garner's Legal Writing in Plain English, Posner's How Judges Think, Scalia/Garner's Making Your Case - Extracted principles from all sources into intermediate docs - Synthesized into docs/decision-methodology.md (3,400 words, 12 sections, 10 guiding principles) - Integrated methodology into block-yod prompt via {methodology_guidance} - Restructured legal-writer agent workflow to follow analytical stages - Made "answer all claims" flexible (bundle/skip via chair_directions) - Added methodology compliance check (#7) to legal-qa agent - Updated all knowledge files (CLAUDE.md, SKILL.md, lessons, corpus) Three-layer architecture: 1. Methodology (decision-methodology.md) — universal, how to think 2. Content checklists (lessons.py) — specific per appeal subtype 3. Style (SKILL.md) — Dafna's personal writing patterns Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>	2026-04-12 23:29:16 +00:00
Chaim	ed8502d46b	Update knowledge files with corpus analysis and feedback system docs - CLAUDE.md: added corpus-analysis.md to reference table, documented chair feedback system - block-schema.md: added content_checklist constraint to block-yod - legal-decision-lessons.md: added lessons 12-16 from corpus analysis (planning discussion, 5 subtypes, feedback system) - SKILL.md: added section 12 (content checklists, planning discussion patterns, chair feedback) Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>	2026-04-12 21:25:54 +00:00
Chaim	0fef20e272	Add content checklists for block-yod and chair feedback system. Addresses Dafna's observation that licensing decisions lack comprehensive planning discussion. Systematic corpus analysis of all 24 training decisions revealed that the system had learned the writing style but not the substantive content. Changes: - Corpus analysis of all 24 decisions (docs/corpus-analysis.md) - 5 content checklists by appeal subtype injected into block-yod prompt - chair_feedback DB table + API endpoints + MCP tools - Feedback management page in Next.js UI (/feedback) - Navigation updated with a "הערות יו״ר" ("chair's notes") link Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>	2026-04-12 20:58:28 +00:00
Chaim	22e819363e	Flatten cases directory structure and unify paths - Remove cases/new|in-progress|completed subdivision (status managed in DB) - Rename documents/original → documents/originals (consistent plural) - Move exports from global data/exports/ into cases/{num}/exports/ - Add documents/research/ for case law and analysis files - Update all agents, scripts, config, web API endpoints, and DB paths Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>	2026-04-09 14:33:27 +00:00
Chaim	911c797eb2	Reorganize: skills/ directory + move memory to docs/ skill-legal-decision/ → skills/decision/ skill-legal-assistant/ → skills/assistant/ skill-legal-docx/ → skills/docx/ memory/*.md → docs/ Also removed: TASKS.md (use TaskMaster), classifier.py (replaced by local_classifier.py) Updated all references in CLAUDE.md, scripts, PRDs, docs. Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>	2026-04-04 14:27:07 +00:00
Chaim	d5ccf03e4c	Add docs, scripts, skills, commands, and taskmaster config to repo Includes: - docs/: architecture, block-schema, migration-plan, product-specification - scripts/: bidi_table, decompose-decisions, extract-claims, seed-knowledge, etc. - skill-legal-decision/: SKILL.md + references + block-schema - skill-legal-assistant/: SKILL.md - skill-legal-docx/: SKILL.md + references - .claude/commands/: bidi-table skill - .taskmaster/: task config + PRDs - .gitignore: exclude legacy/, kiryat-yearim/, node_modules/, memory/ Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>	2026-04-04 14:19:17 +00:00

19 Commits