legal-ai

Author	SHA1	Message	Date
Chaim	9bfb912bdf	fix(audit): _collect_block_sources mirrors None-doc-types (provenance accuracy, FU-7 review) Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>	2026-05-30 21:40:42 +00:00
Chaim	769f5020eb	feat(audit): block→source provenance via write_block audit event (GAP-19, FU-7) Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>	2026-05-30 21:33:36 +00:00
Chaim	a9cd8aeb12	fix: prevent write_interim_draft context overflow (465K → ≤300K chars) Two bugs caused all 5 interim blocks to fail with "Claude CLI failed (exit 1): unknown error": 1. source_context was embedded BOTH inside the prompt template (via {source_context}) AND prepended again in write_block — doubling every block's context size (232K chars × 2 = 465K chars). 2. _build_source_context loaded all 9 case documents for every block regardless of relevance. Fixes: - Remove the duplicate source_context prepend in write_block; the template already contains it via {source_context} - Add per-block document filtering (_BLOCK_DOC_TYPES): block-he/zayin → empty, block-chet → protocol only, block-tet → appraisals only - Add 400K char guard before calling claude -p with a descriptive error (vs opaque "exit 1: unknown error") - Add prompt-size warning and size info in claude_session error messages Result: block-he 0 chars, block-zayin 0 chars, block-vav ~172K, block-chet ~45K, block-tet ~300K (all under 400K limit) Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>	2026-05-10 10:49:47 +00:00
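The per-block filtering and size guard described in commit a9cd8aeb12 could be sketched roughly as follows. This is a minimal illustration, not the project's code: the doc-type labels `protocol` and `appraisal`, the function names, and the rule that unlisted blocks receive every document are assumptions; only the block-to-document rules and the 400K limit come from the commit message.

```python
# Hypothetical doc-type labels; the commit only names the block-to-document rules.
BLOCK_DOC_TYPES = {
    "block-he": [],              # no source documents
    "block-zayin": [],           # no source documents
    "block-chet": ["protocol"],  # protocol only
    "block-tet": ["appraisal"],  # appraisals only
}
MAX_PROMPT_CHARS = 400_000


def build_source_context(block_id, documents):
    """Join only the documents relevant to this block.

    Assumption: blocks without an entry in the map receive every document.
    """
    allowed = BLOCK_DOC_TYPES.get(block_id)
    if allowed is None:
        selected = documents
    else:
        selected = [d for d in documents if d["doc_type"] in allowed]
    return "\n\n".join(d["text"] for d in selected)


def check_prompt_size(block_id, prompt):
    """Fail early with a descriptive error instead of an opaque CLI exit code."""
    if len(prompt) > MAX_PROMPT_CHARS:
        raise ValueError(
            f"{block_id}: prompt is {len(prompt):,} chars, "
            f"over the {MAX_PROMPT_CHARS:,}-char limit"
        )
    return prompt
```

Running the guard before the CLI call means an oversized prompt is reported with its block name and size rather than as "exit 1: unknown error".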
Chaim	28f49defff	LLM session: async, 30min timeout, semantic chunking + parallel The claude_session bridge had two structural defects that made any non-trivial document extraction unreliable: 1. subprocess.run() blocks the asyncio event loop in the MCP server for the full duration of every LLM call (60-180s typical). 2. The 120-second timeout was below the cold-cache cost of any document over ~12K Hebrew characters. Three back-to-back timeouts on case 8174-24 dropped 43 appellant claims on the floor. Phase 1 of the remediation plan — keeps claude_session as the engine (no Anthropic API switch) and restructures around it: claude_session.py • query / query_json are now async — asyncio.create_subprocess_exec instead of subprocess.run, so MCP server can serve other coroutines while a call is in flight. • DEFAULT_TIMEOUT 120 → 1800 (30 min). High enough that no realistic document hits it; bounded so a runaway never zombifies forever. • LONG_TIMEOUT 300 → 3600 for opus block writing on full case context. • TimeoutError now actually kills the subprocess (asyncio.wait_for cancellation alone leaves the child running). claims_extractor.py • _split_by_sections: chunks at numbered sections / Hebrew letter headings / "פרק" markers / markdown ##, falls back to paragraph breaks, then to hard splits. Targets 12K chars per chunk — small enough that each chunk reliably finishes inside the timeout. • _extract_chunk: per-chunk retry (1 attempt by default) with structured logging on failure. Failed chunks no longer crash the overall extraction; they're skipped with a partial-result warning. • extract_claims_with_ai now runs chunks in parallel via asyncio.gather bounded by a semaphore (CHUNK_CONCURRENCY=3). For a 25K-char appeal: was sequential 150-300s, now ~70-90s. Updated all 9 callers (claims, appraiser facts, block writer, qa validator, brainstorm, learning loop, style analyzer × 3) to await the now-async API. The one-shot scripts/extract_claims_8174.py used to recover 43 appellant claims on case 8174-24 has been moved to .archive/ — phase 1 makes it obsolete. SCRIPTS.md updated. Phase 2 (background-task wrapper around LLM-bound MCP tools, persistent llm_tasks table, SSE progress) is the structural follow-up — separate PR. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>	2026-04-30 14:21:35 +00:00
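The two asyncio techniques named in commit 28f49defff (a non-blocking subprocess that is killed on timeout, and a semaphore-bounded asyncio.gather) might look like the sketch below. The function names `run_with_timeout` and `gather_bounded` are hypothetical and the real claude_session.py may differ; only the asyncio calls themselves are standard-library APIs.

```python
import asyncio


async def run_with_timeout(args, timeout):
    """Run a command without blocking the event loop; kill it on timeout."""
    proc = await asyncio.create_subprocess_exec(
        *args,
        stdout=asyncio.subprocess.PIPE,
        stderr=asyncio.subprocess.PIPE,
    )
    try:
        stdout, _ = await asyncio.wait_for(proc.communicate(), timeout=timeout)
    except asyncio.TimeoutError:
        # Cancelling communicate() does not stop the child, so kill and reap it.
        proc.kill()
        await proc.wait()
        raise
    return stdout.decode()


async def gather_bounded(worker, items, concurrency=3):
    """Run worker(item) for every item, at most `concurrency` at a time."""
    semaphore = asyncio.Semaphore(concurrency)

    async def guarded(item):
        async with semaphore:
            return await worker(item)

    # gather preserves input order regardless of completion order.
    return await asyncio.gather(*(guarded(i) for i in items))
```

The explicit `proc.kill()` followed by `proc.wait()` is what keeps a timed-out child from lingering, and the semaphore caps in-flight calls while still letting gather collect results in order.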
Chaim	c536ed0e63	Edit document doc_type and appraiser side from the case UI Until now changing a document's doc_type required a manual SQL update. Adds an inline editor on the document badge so the chair can retag without leaving the case page, and threads an appraiser_side tag (committee / appellant / deciding) through the appraisal pipeline so betterment-levy cases — which usually have 2-3 appraisers — render conflicts with the deciding appraiser's view marked as governing. Backend - New appraiser_facts.appraiser_side column (V5.1) populated from documents.metadata.appraiser_side at extraction time. - extract_appraiser_facts now returns status='sides_missing' with the list of untagged appraisals instead of running with empty side labels — chair must tag every appraisal first via the UI. - Conflict detection orders entries committee → appellant → deciding so the deciding appraiser appears last; block-tet's prompt instructs the writer to phrase the deciding appraiser's view as the governing factual finding ("ואולם, השמאי המכריע קבע..."). - New PATCH /api/cases/{n}/documents/{doc_id} (Pydantic model with whitelist validation) and matching document_update MCP tool. Both merge appraiser_side into metadata JSONB instead of touching the schema. UI - New shared doc-types module exports the canonical 11 doc_type options plus the 3 appraiser-side options; both upload-sheet and the document badge now read from it instead of duplicating Hebrew labels. - New DocumentTypeEditor renders a Popover off the doc-type Badge with two Selects. The save button stays disabled while doc_type is appraisal but no side has been picked, mirroring the backend enforcement so the user finds out before triggering extraction. - usePatchDocument React-Query mutation invalidates the case detail on success so the badge updates without a manual refresh.	2026-04-19 06:26:51 +00:00
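As a rough illustration of the backend rules in commit c536ed0e63, the sketch below refuses to proceed while any appraisal is untagged and orders entries so the deciding appraiser comes last. The dictionary shapes, the function names, and the "ok" status are hypothetical; only the three side labels, the ordering, and the 'sides_missing' status come from the commit message.

```python
# Ordering stated in the commit: committee, then appellant, then deciding.
SIDE_ORDER = {"committee": 0, "appellant": 1, "deciding": 2}


def check_sides(appraisals):
    """Return a sides_missing status if any appraisal lacks a valid side tag."""
    untagged = [
        a["id"] for a in appraisals if a.get("appraiser_side") not in SIDE_ORDER
    ]
    if untagged:
        return {"status": "sides_missing", "untagged": untagged}
    return {"status": "ok", "untagged": []}  # "ok" is a placeholder status


def order_by_side(entries):
    """Sort so the deciding appraiser's entry appears last."""
    return sorted(entries, key=lambda e: SIDE_ORDER[e["appraiser_side"]])
```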
Chaim	c619c22a51	Add pre-ruling interim draft (טיוטת ביניים) for appeals committee Lets the chair generate a partial decision DOCX before the discussion-and- ruling block is decided. Same template, skill and DOCX styling as the final decision (David, RTL, bookmarks) — only the block selection and order differ: רקע (ו) → תכניות+היתרים (ט) → טענות (ז) → הליכים (ח). The opening (ה), ruling (י), summary (יא), and signatures (יב) are omitted. - New appraiser_facts table + CRUD + conflict detection in db.py (V5 schema). Conflict = same plan/permit identifier reported differently by 2+ appraisers. - New appraiser_facts_extractor service: per-appraisal Claude extraction of plans + permits with raw quotes and page numbers. - block-tet prompt extended with a permits sub-section sourced from the extracted facts, plus an explicit instruction to flag inter-appraiser conflicts in neutral wording without resolving them (deferred to block-yod). - block-chet prompt extended with a post-hearing materials context sourced from documents.metadata.is_post_hearing. - docx_exporter.export_decision now accepts mode='interim' which reorders the blocks per the chair's mental model and writes טיוטת-ביניים-v{N}.docx (versioned independently of regular drafts). - 3 new MCP tools: extract_appraiser_facts, write_interim_draft, export_interim_draft. write_interim_draft auto-runs extraction if the appraiser_facts table is empty for the case.	2026-04-18 13:28:04 +00:00
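The conflict rule stated in commit c619c22a51 (the same plan/permit identifier reported differently by 2+ appraisers) can be illustrated with a small in-memory sketch. The fact keys `identifier`, `appraiser`, and `value` are hypothetical, and the actual detection lives in db.py and may be implemented in SQL.

```python
from collections import defaultdict


def find_conflicts(facts):
    """Return identifiers whose reported value differs between appraisers.

    Each fact is a dict with hypothetical keys: identifier, appraiser, value.
    """
    by_identifier = defaultdict(dict)
    for fact in facts:
        by_identifier[fact["identifier"]][fact["appraiser"]] = fact["value"]

    conflicts = {}
    for identifier, reports in by_identifier.items():
        # A conflict needs 2+ appraisers and at least two distinct values.
        if len(reports) >= 2 and len(set(reports.values())) > 1:
            conflicts[identifier] = reports
    return conflicts
```

An identifier reported identically by every appraiser, or reported by only one, is not flagged.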
Chaim	3da4d73498	Upgrade agents to Claude Opus 4.7 - legal-analyst: opus 4.6 → opus 4.7 - legal-proofreader: opus 4.6 → opus 4.7 - legal-writer: sonnet 4.6 → opus 4.7 (complex block writing benefits from stronger model) - block_writer MODEL_MAP: updated opus ID to 4.7 Opus 4.7 brings: high-res images (2576px), better file-based memory, improved DOCX generation, and task budgets for agentic loops. Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>	2026-04-16 16:10:56 +00:00
Chaim	be9fa9e712	Add decision-writing methodology based on FJC, Garner, Posner sources "בית ספר להחלטות" Phase 2 — the system now has formal analytical methodology for building quasi-judicial decisions, separate from Dafna's writing style (SKILL.md) and content checklists. What was done: - Downloaded 5 authoritative sources (~341K words): FJC Judicial Writing Manual (1991+2020), Garner Legal Writing in Plain English, Posner How Judges Think, Scalia/Garner Making Your Case - Extracted principles from all sources into intermediate docs - Synthesized into docs/decision-methodology.md (3,400 words, 12 sections, 10 guiding principles) - Integrated methodology into block-yod prompt via {methodology_guidance} - Restructured legal-writer agent workflow to follow analytical stages - Made "answer all claims" flexible (bundle/skip via chair_directions) - Added methodology compliance check (#7) to legal-qa agent - Updated all knowledge files (CLAUDE.md, SKILL.md, lessons, corpus) Three-layer architecture: 1. Methodology (decision-methodology.md) — universal, how to think 2. Content checklists (lessons.py) — specific per appeal subtype 3. Style (SKILL.md) — Dafna's personal writing patterns Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>	2026-04-12 23:29:16 +00:00
Chaim	0fef20e272	Add content checklists for block-yod and chair feedback system Addresses Dafna's observation that licensing decisions lack comprehensive planning discussion. Systematic corpus analysis of all 24 training decisions revealed the system learned writing style but not substantive content. Changes: - Corpus analysis of all 24 decisions (docs/corpus-analysis.md) - 5 content checklists by appeal subtype injected into block-yod prompt - chair_feedback DB table + API endpoints + MCP tools - Feedback management page in Next.js UI (/feedback) - Navigation updated with "הערות יו״ר" link Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>	2026-04-12 20:58:28 +00:00
Chaim	4df2040a40	Fix: save_block_content now writes draft file + writer must update status Two issues that caused QA agent to fail: 1. save_block_content saved to DB only — now also rebuilds drafts/decision.md 2. legal-writer.md now has explicit mandatory step: case_update(status="drafted") Without these, workflow_status reports has_draft=false and QA can't run. Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>	2026-04-07 15:25:53 +00:00
Chaim	bacb330a2a	Replace all Anthropic API calls with Claude Code session (claude -p) New module claude_session.py provides query() and query_json() that run prompts via `claude -p` CLI — uses the claude.ai session, zero API cost. Converted 6 services: - claims_extractor.py: extract_claims_with_ai - brainstorm.py: brainstorm_directions - block_writer.py: write_block (was streaming+thinking, now simple) - qa_validator.py: claims_coverage check - style_analyzer.py: 3 API calls (single pass, multi pass, synthesis) - learning_loop.py: extract_lessons Only extractor.py still uses Anthropic API (for PDF OCR with Vision). Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>	2026-04-04 14:14:08 +00:00
Chaim	9d0a73a1dc	Add context-only mode: Claude Code writes blocks, no API needed New architecture: MCP provides context, Claude Code writes. New functions: - get_block_context(case_id, block_id) → returns full context package (prompt, source docs, claims, direction, precedents, style guide) WITHOUT calling Anthropic API - save_block_content(case_id, block_id, content) → saves block to DB New MCP tools: get_block_context, save_block_content The old write_block (API-based) still works as fallback. The new flow uses Claude Code's own model (Opus 4.6, 1M context) which has no separate API billing. Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>	2026-04-03 16:18:25 +00:00
Chaim	7033d2d3ee	Embed full style guide in block prompts for Dafna's voice _build_style_context rewritten from 10-line summary to comprehensive style guide including: - Tone rules per appeal type (warm for licensing, cold for levy) - 15 mandatory expressions ("כידוע", "ברי כי", "אין בידנו לקבל") - Discussion structure rules (continuous prose, conclusion first) - Per-party phrasing templates (appellants, committee, permit applicants) - DB patterns grouped by type (phrases, transitions, openings, closings) This addresses the main quality gap: style rated 2/5 because the output was "dry and overly formal" vs Dafna's "direct and clear" voice. Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>	2026-04-03 16:12:09 +00:00
Chaim	7d1dc73112	Fix max_tokens to 16K for Opus (API limit is 32K, need room for thinking) block-yod max_tokens reduced from 32K to 16K — the API returned "max_tokens: 32768 > 32000" error. With thinking enabled, the actual limit for output is lower. 16K is sufficient for discussion blocks. Also: extractor.py now supports .md files (was missing, blocked Beit HaKerem upload). Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>	2026-04-03 16:00:49 +00:00
Chaim	e24e24dac5	Maximize context and output per Anthropic best practices Per official Anthropic documentation (April 2026): Output tokens increased to match model capabilities: - block-yod (discussion): 8K → 32K (Opus supports 128K) - block-zayin (claims): 4K → 16K - block-vav (background): 4K → 16K - claims_extractor: 4K → 8K (fixes truncated JSON) - qa_validator: 4K → 8K Source documents sent in full (not truncated): - Was: 3000 chars per doc, 15K total - Now: full document text, no truncation - Reduces hallucinations: "extract word-for-word quotes first" Prompt structure follows long-context tips: - Source documents placed FIRST (top of prompt) - Instructions and query placed LAST - "Queries at the end improve quality by up to 30%" Extended thinking uses adaptive mode for Opus 4.6. Streaming enabled for all requests > 21K tokens. Unified JSON parsing via parse_llm_json() helper in config.py. Applied to: classifier, claims_extractor, brainstorm, qa_validator, learning_loop (5 files). Also: extractor.py now supports .md files. Sources: - https://docs.anthropic.com/en/docs/build-with-claude/extended-thinking - https://docs.anthropic.com/en/docs/build-with-claude/prompt-engineering/long-context-tips - https://docs.anthropic.com/en/docs/minimizing-hallucinations - https://docs.anthropic.com/en/docs/about-claude/models/overview Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>	2026-04-03 14:17:43 +00:00
Chaim	bed9d5c7e9	Improve block-zayin: synthesize claims by topic + fix markdown JSON parsing block_writer: Rewrote block-zayin prompt to require synthesis by topic instead of listing each claim separately. Now produces 3 organized sections (appellants 8, committee 6, permit applicants 3+) instead of 40 scattered paragraphs. Target: 800-1500 words. claims_extractor: Fix markdown code block stripping (same bug as qa_validator had). Enables parsing claims from Claude responses wrapped in ```json blocks. Tested on Hecht: block-zayin from 40 paragraphs/1049 words to 17 organized paragraphs/1039 words. Structure now matches Dafna's original (3 parties, grouped by topic). Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>	2026-04-03 12:54:42 +00:00
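The code-fence stripping fix mentioned in commit bed9d5c7e9 could work along these lines. This is a sketch under assumptions: the name `parse_fenced_json` and the regular expression are illustrative, not the project's implementation.

```python
import json
import re

# Matches a reply wrapped in a code fence, with or without a language tag.
_FENCE = re.compile(r"^\s*`{3}[a-zA-Z]*\s*\n(.*?)\n?\s*`{3}\s*$", re.DOTALL)


def parse_fenced_json(text):
    """Parse JSON from a model reply, unwrapping a surrounding code fence if present."""
    match = _FENCE.match(text)
    payload = match.group(1) if match else text
    return json.loads(payload.strip())
```

Replies that are already bare JSON pass straight through to `json.loads`, so the helper is safe to apply to every response.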
Chaim	e438740ab4	Add renumber_all_blocks + fix sequential_numbering check for bold format block_writer: new renumber_all_blocks() function that renumbers all paragraphs across all blocks sequentially (1, 2, 3...). Handles both plain "N." and bold "**N.**" formats. Added missing 'import re'. qa_validator: sequential_numbering check now matches bold-formatted numbers (**N.**) in addition to plain (N.). Tested on Hecht: renumbered 115 paragraphs across 7 blocks, QA 6/6. Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>	2026-04-03 12:30:31 +00:00
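Sequential renumbering across blocks, as described in commit e438740ab4, might be sketched as below. It assumes bold numbers are written in Markdown as two asterisks on each side of "N." and that numbered paragraphs start at the beginning of a line; the function name `renumber_blocks` is hypothetical and is not the project's renumber_all_blocks().

```python
import re

# Leading paragraph number at line start: bold Markdown form or plain "N.".
_NUMBER = re.compile(r"^(?:\*\*(\d+)\.\*\*|(\d+)\.)(?=\s)", re.MULTILINE)


def renumber_blocks(blocks):
    """Renumber numbered paragraphs 1, 2, 3... across an ordered list of block texts."""
    counter = 0

    def replace(match):
        nonlocal counter
        counter += 1
        # Keep the bold markers when the original number had them.
        if match.group(1) is not None:
            return f"**{counter}.**"
        return f"{counter}."

    return [_NUMBER.sub(replace, text) for text in blocks]
```

The lookahead for whitespace keeps a line that starts with a decimal such as "3.5" from being treated as a paragraph number.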
Chaim	7781987c3a	Fix precedents search + auto-update case parties block_writer: _build_precedents_context now searches both paragraph_embeddings (other decisions by Dafna) and case_law_embeddings (precedent case law). Previously only searched document_chunks which had no cross-case data. Now returns ~2400 chars from 3 other decisions. processor: Step 1.6 auto-updates case appellants/respondents from classifier results when they're empty. Prevents blank party fields. Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>	2026-04-03 11:59:33 +00:00
Chaim	018b5936a1	Fix claims handling: filter block-zayin duplicates, improve QA matching block_writer: _build_claims_context now filters out block-zayin claims (from final decision) and uses only claims from original pleadings. Reduces noise from 78 to 48 real claims for Hecht case. qa_validator: claims_coverage check rewritten: - Filter block-zayin claims (same reason) - Keyword-based matching instead of 3-word phrase matching - 25% keyword overlap threshold (was: any 3-word match) - Allow up to 20% uncovered claims before failing - Check both block-yod and block-zayin for coverage Result: Hecht case QA goes from 4/6 to 6/6, 47/48 claims covered (98%). Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>	2026-04-03 11:32:29 +00:00
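The keyword-overlap coverage check from commit 018b5936a1 could be approximated as follows. The 25% overlap threshold and the 20% allowance for uncovered claims come from the commit message; the tokenization, the minimum word length, and all function names are assumptions made for illustration.

```python
import re


def keywords(text, min_len=3):
    """Lowercased word set, dropping very short tokens (min_len is an assumption)."""
    return {w for w in re.findall(r"\w+", text.lower()) if len(w) >= min_len}


def claim_is_covered(claim, decision_text, threshold=0.25):
    """A claim counts as covered when >= threshold of its keywords appear in the text."""
    claim_words = keywords(claim)
    if not claim_words:
        return True
    overlap = len(claim_words & keywords(decision_text)) / len(claim_words)
    return overlap >= threshold


def coverage_check(claims, decision_text, max_uncovered=0.20):
    """Pass when at most max_uncovered of the claims are left uncovered."""
    uncovered = [c for c in claims if not claim_is_covered(c, decision_text)]
    ratio = len(uncovered) / len(claims) if claims else 0.0
    return {"passed": ratio <= max_uncovered, "uncovered": uncovered}
```

To check both block-yod and block-zayin, as the commit describes, the two block texts can be concatenated and passed as `decision_text`.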
Chaim	570f745823	Improve block-yod prompt: require minimum length, numbered claims, precedent citations - Add minimum word count guidance (2000-4000 words) - Number each claim in claims_context for explicit tracking - Require 3-5 case law citations minimum - Fix max_tokens > budget_tokens for extended thinking - Use streaming for opus+thinking requests (>10min timeout) Tested on Hecht case: block-yod improved from 1039 to 1927 words. Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>	2026-04-03 11:28:12 +00:00
Chaim	d9e5ef0f46	Add full decision writing pipeline: classify, extract, brainstorm, write, QA, export New services (11 files): - classifier.py: auto doc-type classification + party identification (Claude Haiku) - claims_extractor.py: claim extraction from pleadings (Claude Sonnet + regex) - references_extractor.py: plan/case-law/legislation detection (regex) - brainstorm.py: direction generation with 2-3 options (Claude Sonnet) - block_writer.py: 12-block decision writer (template + Claude Sonnet/Opus) - docx_exporter.py: DOCX export with David font, RTL, headings - qa_validator.py: 6 QA checks with export blocking on critical failure - learning_loop.py: draft vs final comparison + lesson extraction - metrics.py: KPIs dashboard per case and global - audit.py: action audit log - cli.py: standalone CLI with 11 commands Updated pipeline: extract → classify → chunk → embed → store → extract_references New MCP tools: 29 total (was 16) New DB tables: audit_log, decisions CRUD, claims CRUD Config: Infisical support, external service allowlist Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>	2026-04-03 10:21:47 +00:00

21 Commits