legal-ai

Author	SHA1	Message	Date
Chaim	81ccf3a888	feat(retrieval): track page_number on text chunks for multimodal hybrid boost All checks were successful Build & Deploy / build-and-deploy (push) Successful in 6m33s Details The legacy chunker did not track which PDF page each chunk came from. Stored chunks had page_number=NULL, which blocked the multimodal hybrid retriever's text+image boost — it joins (chunk, image) on (document_id, page_number) and the join could never fire. This change: - extractor.extract_text now returns (text, page_count, page_offsets); page_offsets[i] is the start char offset of page (i+1) in the joined text. None for non-PDFs. - chunker.chunk_document accepts an optional page_offsets and tags each chunk with the page that contains its first character (uses the existing chunker logic; pages assigned post-hoc by content search to keep the diff minimal). - processor.process_document and precedent_library.ingest_precedent forward page_offsets through the chunker. New uploads now carry accurate page_number on every chunk. - Other extract_text callers (tools/documents, tools/workflow, web/app.py) updated to unpack the third element (ignored). - scripts/backfill_chunk_pages.py: per-case retrofit. Re-extracts each PDF (re-OCRs via Google Vision if needed, ~$0.0015/page), computes page_offsets, and updates page_number on every chunk by content search. Idempotent; --force re-runs on already-tagged docs. Forward-only would leave the 419 image embeddings backfilled on cases 8174-24 + 8137-24 unable to boost their corresponding text chunks. The retrofit script closes that gap (cost ~$0.60). Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>	2026-05-03 19:49:41 +00:00
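The page tagging described in this commit can be sketched as a binary search over page start offsets. This is a minimal illustration, not the repository's code: `page_for_offset` and `tag_chunks` are hypothetical helper names, and the only thing taken from the commit message is that `page_offsets[i]` is the start character offset of page `i+1` and that each chunk is located by content search.

```python
from bisect import bisect_right


def page_for_offset(page_offsets, char_offset):
    """Return the 1-based page containing char_offset, or None for non-PDFs."""
    if not page_offsets:
        return None
    # page_offsets[i] is the start of page i+1, so the count of page starts
    # at or before char_offset is the page number.
    return max(1, bisect_right(page_offsets, char_offset))


def tag_chunks(text, chunks, page_offsets):
    """Assign each chunk the page that contains its first character."""
    tagged = []
    cursor = 0  # search forward so repeated passages map to later pages
    for chunk in chunks:
        pos = text.find(chunk, cursor)
        if pos == -1:
            pos = text.find(chunk)  # fall back to a search from the start
        page = page_for_offset(page_offsets, pos) if pos != -1 else None
        tagged.append({"content": chunk, "page_number": page})
        if pos != -1:
            cursor = pos + 1
    return tagged
```

A chunk that spans a page boundary gets the page where it starts, which matches the "first character" rule stated in the commit.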
Chaim	242f668319	feat(retrieval): add voyage-multimodal-3 page-image embeddings (feature flag) All checks were successful Build & Deploy / build-and-deploy (push) Successful in 1m50s Details Stage C: per-page image embeddings via voyage-multimodal-3 + hybrid text+image search. Off by default; enable with MULTIMODAL_ENABLED=true. - Schema V9: document_image_embeddings + precedent_image_embeddings (vector(1024), page_number, image_thumbnail_path) - extractor.render_pages_for_multimodal renders PDF pages at MULTIMODAL_DPI (144) for embedding + JPEG thumbnails at MULTIMODAL_THUMB_DPI (96) for UI preview, in one pass - embeddings.embed_images calls voyage-multimodal-3 in 50-page batches - services/hybrid_search.py orchestrator: rerank applied to text side first (rerank-2 is text-only); image side cosine; weighted merge with text_weight 0.65 (env-tunable); image-only pages surface as match_type='image' so dense scanned content still appears - processor.process_document and precedent_library.ingest_precedent gated by flag — non-fatal on multimodal failure - scripts/multimodal_backfill.py — idempotent per-case CLI to embed existing documents without re-extracting text Validated locally on a 5-page response brief: render 0.31s, embed 8.32s, hybrid merge surfaces image rows correctly. Production rollout starts with flag=false (no behavior change), then per-case A/B. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>	2026-05-03 19:24:52 +00:00
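The weighted merge in this commit might look roughly like the sketch below. The 0.65 text weight, the `(document_id, page_number)` join key, and `match_type='image'` for image-only pages come from the commit messages; the function name, the dict shape of the hits, and the `'text'` and `'hybrid'` labels are assumptions made for illustration.

```python
def merge_hybrid(text_hits, image_hits, text_weight=0.65):
    """Blend text and image scores for hits that share (document_id, page_number)."""
    image_by_page = {
        (h["document_id"], h["page_number"]): h["score"] for h in image_hits
    }
    merged = []
    boosted_pages = set()
    for hit in text_hits:
        key = (hit["document_id"], hit.get("page_number"))
        # A chunk with page_number=None can never join an image row.
        image_score = image_by_page.get(key) if key[1] is not None else None
        if image_score is None:
            score = text_weight * hit["score"]
            match_type = "text"  # assumed label
        else:
            score = text_weight * hit["score"] + (1 - text_weight) * image_score
            match_type = "hybrid"  # assumed label
            boosted_pages.add(key)
        merged.append({**hit, "score": score, "match_type": match_type})
    # Pages matched only by the image side still surface, as in the commit.
    for key, image_score in image_by_page.items():
        if key not in boosted_pages:
            merged.append({
                "document_id": key[0],
                "page_number": key[1],
                "score": (1 - text_weight) * image_score,
                "match_type": "image",
            })
    return sorted(merged, key=lambda h: h["score"], reverse=True)
```

The `page_number is None` branch is why chunks without a page tag cannot receive the image boost.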
Chaim	26c3fddf41	feat(retrieval): add voyage rerank-2 cross-encoder stage (feature flag) All checks were successful Build & Deploy / build-and-deploy (push) Successful in 1m29s Details Stage B of voyage-upgrades-plan rewritten: instead of context-3 (which 4 POCs showed inconsistent improvement), add a cross-encoder rerank layer on top of voyage-3. Default off (VOYAGE_RERANK_ENABLED=false). POC validation (785-doc corpus, 12 queries, claude-haiku-4-5 judge): - mean@3 +4.5% (4.306 → 4.500) - practical-category queries +11.6% (3.78 → 4.22) - latency +702ms per query - no schema change, no re-embed, no double storage Plumbing: - config: VOYAGE_RERANK_ENABLED / _MODEL / _FETCH_K env vars - embeddings.voyage_rerank() wraps voyageai client.rerank - services/rerank.py: maybe_rerank() helper — fetches FETCH_K candidates via the bi-encoder then reranks to top-K. Fail-open if Voyage rerank is unavailable. - tools/search.py: search_decisions, search_case_documents, find_similar_cases all wrapped - services/precedent_library.search_library wrapped Smoke-tested locally with flag on/off — produces expected behaviour and latency profile. Ready for production rollout via Coolify env flip after deploy. POCs (kept under scripts/ for reference): - voyage_context3_poc{_long}.py — context-3 evaluation (rejected) - voyage_multimodal_poc.py — multimodal-3 (stage C, deferred) - voyage_rerank_judge_poc.py — single-case rerank benchmark - voyage_rerank_corpus_poc.py — full-corpus rerank validation Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>	2026-05-03 18:43:41 +00:00
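The fail-open behaviour of `maybe_rerank()` can be illustrated with injected callables. This sketch does not reproduce the real signature in `services/rerank.py`: `fetch_candidates` and `rerank` are hypothetical stand-ins for the bi-encoder search and the Voyage rerank wrapper, and `rerank` is assumed to return candidate indices in relevance order.

```python
def maybe_rerank(query, fetch_candidates, rerank, top_k, fetch_k=50, enabled=False):
    """Fetch fetch_k candidates, rerank to top_k, and fall back on failure."""
    if not enabled:
        # Flag off: plain bi-encoder search, no behaviour change.
        return fetch_candidates(query, top_k)
    candidates = fetch_candidates(query, max(fetch_k, top_k))
    try:
        order = rerank(query, [c["content"] for c in candidates])
    except Exception:
        # Fail open: keep the bi-encoder ordering if the reranker is unavailable.
        return candidates[:top_k]
    return [candidates[i] for i in order[:top_k]]
```

Over-fetching before reranking is what lets the cross-encoder promote a candidate that the bi-encoder ranked below the final top-K.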
Chaim	4a9a6b7970	feat(precedents): UI button queues extraction for local MCP worker All checks were successful Build & Deploy / build-and-deploy (push) Successful in 1m27s Details The chair wanted a one-click "extract metadata" button on the edit sheet. The constraint stays the same — claude_session needs the local CLI which the container doesn't have, so the button can't run the extractor itself. Compromise: button stamps a queue marker; the local MCP server drains the queue on demand. DB (V8): two nullable timestamps on case_law, metadata_extraction_requested_at and halacha_extraction_requested_at, with partial indexes for cheap "find pending" scans. API: POST /api/precedent-library/{id}/request-metadata → stamp the row POST /api/precedent-library/{id}/request-halachot → same for halacha GET /api/precedent-library/queue/pending?kind=... → read-only view UI: Sparkles button in the edit sheet header. Click → toast tells the chair what to run from Claude Code. The button never triggers the extractor directly from the container. MCP tool: precedent_process_pending(kind, limit) — runs from Claude Code with the local CLI, picks up everything stamped, calls the extractor for each, clears the timestamp on success. Failures keep the timestamp so the next invocation retries them. Architectural rule (claude_session local-only) is preserved end-to-end and called out in the new endpoint comment + tool docstring. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>	2026-05-03 12:32:25 +00:00
Chaim	73a79ea7e8	feat(precedents): metadata auto-fill, edit sheet, persuasive extraction All checks were successful Build & Deploy / build-and-deploy (push) Successful in 1m28s Details Three improvements to the precedent library based on usage feedback: 1. Auto-fill metadata at upload time. New service precedent_metadata_extractor reads the ruling's full_text and suggests case_name (short), summary, headnote, key_quote, subject_tags, appeal_subtype. The merge policy fills only empty fields, preserving everything the chair typed in the upload form. Wired into the ingest pipeline; also exposed as a re-run endpoint POST /api/precedent-library/{id}/extract-metadata for existing records. 2. Edit sheet in the UI. Pencil icon on each library row opens a pre-populated form covering every field. A Sparkles button on the sheet runs the metadata extractor on demand and refreshes the form. The case_number is read-only because halachot are FK'd to it; renaming requires delete + re-upload. 3. Halacha extractor branches on is_binding. Sources marked binding (Supreme/Administrative) keep the strict halacha prompt. Non-binding sources (other appeals committees, district courts on planning matters) get a different prompt that extracts applications, interpretive principles, and persuasive conclusions — labeled with new rule_types 'application' and 'persuasive'. The fallback also widens chunk selection: if the chunker labeled nothing as legal_analysis/ruling/conclusion, we now run on all chunks rather than returning zero halachot for a usable ruling. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>	2026-05-03 10:19:35 +00:00
Chaim	7ee90dce31	feat: external precedent library with auto halacha extraction All checks were successful Build & Deploy / build-and-deploy (push) Successful in 1m27s Details Adds a third corpus of legal authority distinct from style_corpus (Daphna's prior decisions for voice) and case_precedents (chair-attached quotes per case). The new corpus holds chair-uploaded court rulings and other appeals committee decisions, with binding rules (הלכות) extracted automatically and queued for chair approval. Pipeline (web/app.py + services/precedent_library.py): file → extract → chunk → Voyage embed → halacha_extractor → store + publish progress over the existing Redis SSE channel. Schema V7 (services/db.py): extends case_law with source_kind + extraction status fields under a CHECK constraint pinning practice_area to the three appeals committee domains (rishuy_uvniya, betterment_levy, compensation_197). New precedent_chunks (vector(1024)) and halachot tables (vector(1024) over rule_statement, IVFFlat indexes, gin on practice_areas/subject_tags). Halachot start as pending_review; only approved/published rows are visible to search_precedent_library. Agents: legal-writer, legal-researcher, legal-analyst, legal-ceo, legal-qa get search_precedent_library. legal-writer prompt explains the three-corpus distinction and CREAC use; legal-qa now verifies that every cited halacha resolves to an approved row in the corpus. UI: /precedents page with four tabs — library / semantic search / pending review (J/K nav, A/R/E shortcuts, badge count) / stats. Reuses the existing upload-sheet progress + SSE pattern. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>	2026-05-03 08:38:18 +00:00
Chaim	f256eddbb1	git_sync: full case-dir backup to Gitea (sweep + explicit commits) All checks were successful Build & Deploy / build-and-deploy (push) Successful in 1m25s Details The case repo is the user's backup, so anything in the dir must end up on Gitea. Two layers: 1. Periodic sweep (every 30s) — git_sync.sweep_loop runs as a FastAPI background task. It scans every case dir, runs git status --porcelain on each, and commit_and_push's any dirty changes with an auto-built Hebrew message ("אוטו: טיוטות (2) · מסמכים"). Catches files written outside the API path: agent research artefacts, manual edits, etc. 2. Explicit commits at known write paths — DOCX export, interim draft, apply_user_edit, revise_draft, mark-final, analysis DOCX export. These give immediate feedback with descriptive messages instead of waiting up to 30s for the sweep. safe.directory injection added to _git_env so sweep + explicit commits work even when the running uid differs from the case-dir owner (host runs vs. uniform-root container). Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>	2026-04-30 18:27:36 +00:00
Chaim	fa70944ed4	case-create: surface Gitea repo result + UI retry button All checks were successful Build & Deploy / build-and-deploy (push) Successful in 1m29s Details The auto-creation in case_create had two failure modes that combined to make repos silently missing: a stale GITEA_TOKEN returning 401, and the outer try/except in case_create that swallowed every exception with a bare pass. Result: cases like 8174-24 ended up with a local git repo and Paperclip project but no Gitea repo, with no signal anywhere. _setup_gitea_remote now returns {ok, url, error} and never raises; the result is attached to the case JSON and the FastAPI endpoint logs a warning when ok=false. The UI gets a "צור ריפו ב-Gitea" button on the case header that appears only when the repo or remote is missing. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>	2026-04-30 18:12:05 +00:00
Chaim	5e4c03d0cd	Case sync: refresh remote URL with current token before each push All checks were successful Build & Deploy / build-and-deploy (push) Successful in 1m28s Details Cases failed to push silently after the Gitea token in Infisical was rotated: the embedded credential in each case repo's origin URL was the old token, the rotation never propagated, and capture_output=True hid the auth failure as a logger.warning. Three cases (1033-25, 1130-25, 1194-25) accumulated unpushed commits over weeks before this was noticed. Fixes the root cause in two places: web/gitea_client.py for uploads through the FastAPI endpoint, and mcp-server/services/git_sync.py for case_update / document_upload through MCP tools (which previously committed but never pushed at all). The new commit_and_push helper: - re-injects the current GITEA_ACCESS_TOKEN into the existing origin URL on every call, so pushes survive token rotation - logs push failures at WARNING with the actual stderr (the previous code suppressed errors entirely) - continues to push even when the commit was a no-op, in case earlier commits are still unpushed Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>	2026-04-28 17:14:57 +00:00
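Re-injecting the current token into an existing origin URL can be done with the standard library alone. The sketch below is an assumption about the shape of that step, not the code in `commit_and_push`: the helper name is hypothetical, and it assumes the credential is embedded as `https://<token>@host/...`.

```python
from urllib.parse import urlsplit, urlunsplit


def with_current_token(origin_url, token):
    """Replace any credential embedded in origin_url with the current token."""
    parts = urlsplit(origin_url)
    host = parts.hostname or ""
    if parts.port:
        host = f"{host}:{parts.port}"
    # Drop whatever userinfo was there (a stale token) and insert the new one.
    netloc = f"{token}@{host}"
    return urlunsplit((parts.scheme, netloc, parts.path, parts.query, parts.fragment))
```

Rebuilding the URL from its host on every push means a rotated token takes effect without touching each case repo by hand.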
Chaim	36ca713dfa	Retrofit: tighten yod-bet pattern, add cover-block fallback All checks were successful Build & Deploy / build-and-deploy (push) Successful in 6s Details The "על כן" pattern for block-yod-bet was too greedy and matched mid-discussion transitional sentences (e.g. "על כן, במקום בו..."), which caused forward-scan to skip block-yod-alef ("סוף דבר") via the pointer advance. Tightened to require an operative subject (אנו / הערר / הוועדה / ועדת הערר) so terminal "על כן, אנו מחליטים" still matches but mid-block transitions don't. Added structural_fallback for cover blocks (alef/bet/gimel/dalet) — these are template metadata not present in user-edited DOCX bodies. Inject zero-content anchors so apply_user_edit can still target them later. The frontend toast distinguishes real content gaps from fallback anchors. Also expanded heading patterns based on training corpus inspection: - block-vav: על המקרקעין חלות / במצב התכנוני / התכניות החלות - block-zayin: טענות העוררת - block-chet: עיקר תגובת המשיב - block-tet: הדיון בוועדת הערר For case 1130-25, this raises detection from 6/12 to 11/12 blocks — only block-yod-bet remains missing (Daphna's edit ends at "סוף דבר" + numbered ruling, no terminal "ההחלטה" or "על כן אנו מחליטים" paragraph). Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>	2026-04-26 06:57:41 +00:00
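The tightened "על כן" rule can be approximated with a single regular expression. The pattern below is a guess at the shape of the fix, not the one in `docx_retrofit.py`; it only encodes what the commit message states, namely that the phrase must be followed by one of the operative subjects אנו / הערר / הוועדה / ועדת הערר.

```python
import re

# Hypothetical pattern: "על כן" at the start of a paragraph, an optional comma,
# then an operative subject as a whole word.
TERMINAL_AL_KEN = re.compile(r"^\s*על כן,?\s+(?:אנו|הערר|הוועדה|ועדת הערר)(?!\w)")


def is_terminal_ruling(paragraph):
    """True when the paragraph opens the operative ruling, not a transition."""
    return TERMINAL_AL_KEN.match(paragraph) is not None
```

With this sketch, a paragraph opening "על כן, אנו מחליטים" matches while the transitional "על כן, במקום בו..." does not.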
Chaim	c536ed0e63	Edit document doc_type and appraiser side from the case UI All checks were successful Build & Deploy / build-and-deploy (push) Successful in 1m26s Details Until now changing a document's doc_type required a manual SQL update. Adds an inline editor on the document badge so the chair can retag without leaving the case page, and threads an appraiser_side tag (committee / appellant / deciding) through the appraisal pipeline so betterment-levy cases — which usually have 2-3 appraisers — render conflicts with the deciding appraiser's view marked as governing. Backend - New appraiser_facts.appraiser_side column (V5.1) populated from documents.metadata.appraiser_side at extraction time. - extract_appraiser_facts now returns status='sides_missing' with the list of untagged appraisals instead of running with empty side labels — chair must tag every appraisal first via the UI. - Conflict detection orders entries committee → appellant → deciding so the deciding appraiser appears last; block-tet's prompt instructs the writer to phrase the deciding appraiser's view as the governing factual finding ("ואולם, השמאי המכריע קבע..."). - New PATCH /api/cases/{n}/documents/{doc_id} (Pydantic model with whitelist validation) and matching document_update MCP tool. Both merge appraiser_side into metadata JSONB instead of touching the schema. UI - New shared doc-types module exports the canonical 11 doc_type options plus the 3 appraiser-side options; both upload-sheet and the document badge now read from it instead of duplicating Hebrew labels. - New DocumentTypeEditor renders a Popover off the doc-type Badge with two Selects. The save button stays disabled while doc_type is appraisal but no side has been picked, mirroring the backend enforcement so the user finds out before triggering extraction. - usePatchDocument React-Query mutation invalidates the case detail on success so the badge updates without a manual refresh.	2026-04-19 06:26:51 +00:00
Chaim	c619c22a51	Add pre-ruling interim draft (טיוטת ביניים) for appeals committee All checks were successful Build & Deploy / build-and-deploy (push) Successful in 1m26s Details Lets the chair generate a partial decision DOCX before the discussion-and-ruling block is decided. Same template, skill and DOCX styling as the final decision (David, RTL, bookmarks) — only the block selection and order differ: רקע (ו) → תכניות+היתרים (ט) → טענות (ז) → הליכים (ח). The opening (ה), ruling (י), summary (יא), and signatures (יב) are omitted. - New appraiser_facts table + CRUD + conflict detection in db.py (V5 schema). Conflict = same plan/permit identifier reported differently by 2+ appraisers. - New appraiser_facts_extractor service: per-appraisal Claude extraction of plans + permits with raw quotes and page numbers. - block-tet prompt extended with a permits sub-section sourced from the extracted facts, plus an explicit instruction to flag inter-appraiser conflicts in neutral wording without resolving them (deferred to block-yod). - block-chet prompt extended with a post-hearing materials context sourced from documents.metadata.is_post_hearing. - docx_exporter.export_decision now accepts mode='interim' which reorders the blocks per the chair's mental model and writes טיוטת-ביניים-v{N}.docx (versioned independently of regular drafts). - 3 new MCP tools: extract_appraiser_facts, write_interim_draft, export_interim_draft. write_interim_draft auto-runs extraction if the appraiser_facts table is empty for the case.	2026-04-18 13:28:04 +00:00
Chaim	726498126d	Add Track Changes architecture for draft revisions (CMP + CMPA) All checks were successful Build & Deploy / build-and-deploy (push) Successful in 1m29s Details Fixes critical bug in 1033-25: user-uploaded עריכה-*.docx files were orphaned on disk while exports kept rebuilding from stale DB blocks. New architecture: - User-uploaded DOCX becomes the source of truth (cases.active_draft_path) - System edits via XML surgery with real Word <w:ins>/<w:del> revisions - User can Accept/Reject each change from within Word Components: - docx_reviser.py: XML surgery for Track Changes (15 tests) - docx_retrofit.py: retroactive bookmark injection with Hebrew marker detection + heading heuristic (9 tests) - docx_exporter.py: emits bookmarks around each of the 12 blocks - 3 new MCP tools: apply_user_edit, list_bookmarks, revise_draft - 4 new/updated endpoints: upload (auto-registers active draft), /exports/revise, /exports/bookmarks, /exports/{filename}/retrofit, /active-draft - DB migration: cases.active_draft_path column - UI: correct banner using real v-numbers, "מקור האמת" badge, detailed upload toast with bookmarks_added/missing_blocks - agents: legal-exporter (3 export modes), legal-ceo (stage G for revision handling), legal-writer (revision mode) Multi-tenancy: - Works for both CMP (1xxx cases) and CMPA (8xxx/9xxx cases) - New revise-draft skill added to both companies - deploy-track-changes.sh syncs skills CMP ↔ CMPA - retrofit_case.py: one-off retrofit of existing files Tests: 34 passing (15 reviser + 9 retrofit + 4 exporter bookmarks + 6 e2e) Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>	2026-04-16 18:49:30 +00:00
Chaim	5dd24729e2	Auto-strip Nevo preambles and separate style analysis per appeal subtype - Add strip_nevo_preamble() to extractor.py — auto-removes Nevo database headers (bibliography, legislation, mini-ratio) during training upload - Add appeal_subtype column to style_patterns table — patterns are now stored per subtype instead of globally mixed - Update clear_style_patterns() to support subtype-scoped deletion - Pass appeal_subtype through analyze_corpus → store → upsert pipeline Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>	2026-04-15 14:03:06 +00:00
Chaim	ba39707c70	Add CMPA (betterment levy) training support and update methodology Support ingestion of betterment levy (היטל השבחה) decisions into a separate training corpus (CMPA). Key changes: - Add .doc file extraction via LibreOffice conversion in extractor - Add practice_area/appeal_subtype columns to style_corpus table - Route training files to cmp/ or cmpa/ subdirs based on appeal subtype - Fix derive_subtype to handle ARAR-YY-NNNN format (was matching year digit) - Expose practice_area/appeal_subtype params in MCP upload_training tool - Add appeal_subtype filter to analyze_style for per-type style analysis - Update betterment levy methodology in lessons.py: checklist (from generic to corpus-based), opening/closing strategies, and discussion rules Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>	2026-04-15 14:00:35 +00:00
Chaim	684a4cfd3b	Fix 500 error on precedents API — add default=str to json.dumps All checks were successful Build & Deploy / build-and-deploy (push) Successful in 1m41s Details UUID and datetime objects from PostgreSQL RETURNING * were not serializable. All other tool files already used default=str. Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>	2026-04-15 12:11:30 +00:00
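The serialization failure and its fix are easy to reproduce in isolation. The row below is made-up sample data standing in for a PostgreSQL `RETURNING *` result; only the `default=str` argument comes from the commit.

```python
import json
import uuid
from datetime import datetime, timezone

# Sample row with the two types that json.dumps cannot encode by default.
row = {
    "id": uuid.UUID("12345678-1234-5678-1234-567812345678"),
    "created_at": datetime(2026, 4, 15, 12, 11, 30, tzinfo=timezone.utc),
}

try:
    json.dumps(row)
    plain_dump_failed = False
except TypeError:
    # Without a default hook, UUID and datetime raise TypeError.
    plain_dump_failed = True

# default=str is called for every object json cannot encode natively.
payload = json.dumps(row, default=str)
```

Note that `default=str` yields `str(datetime)` output (space-separated), not ISO 8601 with a `T`, so clients that parse the timestamp need to accept that form.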
Chaim	2e2d2d42b6	Prevent status regression in case_update All checks were successful Build & Deploy / build-and-deploy (push) Successful in 1m32s Details CEO agent was reverting case status from "processing" to "new" when updating metadata fields. Added ordered status list — case_update now silently ignores status changes that would move backwards. Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>	2026-04-14 17:05:40 +00:00
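A forward-only status guard of the kind described here can be sketched with an ordered list. Only "new" and "processing" appear in the commit message; the later statuses and the function name are hypothetical placeholders.

```python
# Hypothetical ordering; only "new" and "processing" are named in the commit.
STATUS_ORDER = ["new", "processing", "drafted", "final"]


def resolve_status(current, requested):
    """Return the status to store, silently ignoring backward moves."""
    if requested not in STATUS_ORDER:
        return current  # unknown target: keep what we have
    if current not in STATUS_ORDER:
        return requested  # nothing to compare against
    if STATUS_ORDER.index(requested) < STATUS_ORDER.index(current):
        return current  # regression, e.g. processing -> new
    return requested
```

Because regressions are ignored rather than rejected, a metadata-only update that happens to carry a stale status still succeeds for its other fields.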
Chaim	82ba4663ba	Fix case repo sync + auto-create Gitea repos + add sync indicator All checks were successful Build & Deploy / build-and-deploy (push) Successful in 1m30s Details - auto-sync-cases.sh: fix broken directory scan (was looking for status subdirs that don't exist), fix env var word-splitting bug, add safe.directory handling and error logging - cases.py: auto-create Gitea repo on case_create, fix documents/original → documents/originals naming mismatch - app.py: add GET /api/cases/{case_number}/git-status endpoint - web-ui: add SyncIndicator component in case header showing sync status (synced/pending/no remote) with last commit time - pyproject.toml: add httpx dependency - CLAUDE.md: update Paperclip wakeup API docs - settings page: switch tag input from Select to free-text with datalist Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>	2026-04-14 15:28:16 +00:00
Chaim	e698419faf	Fix git not found error crashing document uploads in container All checks were successful Build & Deploy / build-and-deploy (push) Successful in 3m13s Details Install git in Docker image and wrap all subprocess git calls in try/except so a missing or failing git binary never kills an upload that already succeeded. Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>	2026-04-14 12:38:40 +00:00
Chaim	50eaa887db	Add chair feedback system and content checklists for block-yod Backend changes cherry-picked from ui-rewrite branch to enable feedback API endpoints for the Next.js staging UI. - chair_feedback DB table + API endpoints (GET/POST/PATCH) - Content checklists by appeal subtype injected into block-yod prompt - MCP tools for recording and listing chair feedback - Corpus analysis documentation (24 decisions) Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>	2026-04-12 21:05:53 +00:00
Chaim	e2088a4f60	Add case_precedents: attached legal support for the compose phase New self-contained table + MCP tools + FastAPI endpoints for letting the chair attach external case-law quotes (quote + citation מראה מקום, optional chair note, optional archived PDF) to either a specific threshold_claim / issue or the case as a whole. Data model - case_precedents (SCHEMA_V5_SQL) — case_id, section_id NULL/ "threshold_N"/"issue_N", quote, citation (free-text), chair_note, pdf_document_id FK to documents, denormalized practice_area for cross-case library filtering. - Deliberately NOT linked to the existing case_law table — that one has UNIQUE(case_number) which would force parsing the free-text citation into a structured key. A backfill pass into case_law is a later follow-up once the UI stabilizes. - db.py gains 4 helpers: create_case_precedent, list_case_precedents, delete_case_precedent, search_precedent_library. The last uses DISTINCT ON (citation) for the cross-case typeahead so each precedent appears once even if reused across many cases. MCP tools (legal_mcp/tools/precedents.py) - precedent_attach, precedent_list, precedent_remove, precedent_search_library — registered in server.py. FastAPI (web/app.py) - POST /api/cases/{n}/precedents — create, with PrecedentCreateRequest - POST /api/cases/{n}/precedents/upload-pdf — one-shot PDF upload to a dedicated documents/precedents/ subdirectory, creates a documents row with doc_type="precedent_archive" and no text extraction (archive only) - GET /api/cases/{n}/precedents — list - DELETE /api/precedents/{id} — uses path param since precedent_id is a UUID (slash-safe, unlike case numbers) - GET /api/precedents/search?q=...&practice_area=... — library typeahead Block-writer integration into _build_precedents_context is a deferred follow-up — Phase 1 surfaces the feature in the compose UI only. 
Plan: ~/.claude/plans/woolly-cooking-graham.md Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>	2026-04-11 19:16:48 +00:00
Chaim	8989ad9a9b	Add case_delete: MCP tool + DELETE endpoint + DB helper Wires a new case-deletion path across the three layers that needed it: - db.delete_case(case_id) — single SQL DELETE; documents, chunks, and qa_results cascade via existing schema FKs, audit_log nullifies. - cases_tools.case_delete(case_number, remove_files=False) — MCP tool wrapper. File tree on disk is kept by default (audit trail); pass remove_files=True for a hard delete. - DELETE /api/cases?case_number=... — FastAPI endpoint taking the case number as a QUERY param rather than a path segment. Case numbers like "1000/0426" can't be passed through a path parameter because FastAPI routing decodes %2F before matching, so a query param is the only shape that works for historical data. Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>	2026-04-11 16:47:50 +00:00
Chaim	26d09d648f	Practice area separation: multi-tenant axis across DB, RAG, and UI Adds two orthogonal columns — practice_area (top-level legal domain: appeals_committee / national_insurance / labor_law) and appeal_subtype (building_permit / betterment_levy / compensation_197) — denormalized into cases, documents, document_chunks, decisions, and style_corpus so vector searches can filter without JOINs. Why: the system handles two unrelated sub-domains under the same appeals committee (1xxx building permits and 8xxx/9xxx betterment/197), with different rules and writing style. Without a separation axis, search_similar() and the block-writer's precedent lookup were free to surface betterment-levy paragraphs while drafting a building-permit decision — a real risk of cross-domain contamination. The same axis also lets future domains (national insurance, labor law) coexist without separate schemas. Schema (V4 migration in db.py): - ALTER ... ADD COLUMN IF NOT EXISTS on all five tables + composite indexes (practice_area first). - Idempotent backfill: case_number ~ '^1' → building_permit, '^8' → betterment_levy, '^9' → compensation_197; propagated to documents, chunks, and decisions via case_id; training-corpus rows (case_id NULL) default to appeals_committee. Code: - New services/practice_area.py with derive_subtype, validate, and is_override + enum constants. - db.create_case / create_document / store_chunks / create_decision inherit practice_area from the parent case (or take an explicit override for the case_id=None training corpus). - db.search_similar and search_similar_paragraphs accept practice_area + appeal_subtype filters using the denormalized columns. - tools/search.py auto-resolves the filter from case_number when given. - block_writer._build_precedents_context now passes the active case's practice_area to search_similar_paragraphs — closes the contamination hole for the discussion-block precedent fetch. 
- tools/cases.case_create auto-derives subtype from case_number; an explicit override that disagrees writes a case_subtype_override entry to audit_log so we can spot bad classifications later. - tools/documents.document_upload_training tags new training material with practice_area + subtype end-to-end (corpus, document, chunks). UI (web/static/index.html + web/app.py): - New-case wizard gets a practice_area dropdown (others disabled until national_insurance / labor_law arrive) and an appeal_subtype dropdown with JS auto-fill from the case-number prefix; manual edits stick. - Case header shows a blue badge with practice_area · subtype. - CaseCreateRequest plumbs both fields through to cases_tools.case_create. Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>	2026-04-11 16:36:48 +00:00
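The prefix rule used by the V4 backfill (`^1`, `^8`, `^9`) can be expressed as a small lookup. This is a sketch of that rule only: `subtype_from_prefix` is a hypothetical name, and the real `derive_subtype` in `services/practice_area.py` handles more formats (a later commit mentions ARAR-YY-NNNN) that are not covered here.

```python
# Mapping taken from the backfill rule in the commit message.
PREFIX_TO_SUBTYPE = {
    "1": "building_permit",
    "8": "betterment_levy",
    "9": "compensation_197",
}


def subtype_from_prefix(case_number):
    """Return the appeal_subtype implied by the first digit, or None."""
    return PREFIX_TO_SUBTYPE.get(case_number.strip()[:1])


def is_override(case_number, explicit_subtype):
    """True when an explicit subtype disagrees with the derived one."""
    derived = subtype_from_prefix(case_number)
    return derived is not None and explicit_subtype != derived
```

A `True` result from `is_override` corresponds to the situation where the commit writes a `case_subtype_override` entry to `audit_log`.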
Chaim	0c4886afe6	Wire legal-writer to chair directions from analysis-and-research.md Closes the loop so דפנה's positions (written inline in the UI and saved to analysis-and-research.md) automatically become binding direction for the legal-writer agent — no manual copy-paste, no bypass. Backend: - research_md.extract_chair_directions(path) returns a compact dict with status (missing/empty/partial/complete), filled_count, empty_count, and a reduced list of threshold_claims + issues each with {id, number, title, direction}. Designed to be directly usable as direction_doc by the writer. - New MCP tool: drafting.get_chair_directions(case_number) wraps the helper, resolves the case research file path via config.find_case_dir, returns formatted JSON. - Registered in server.py as mcp__legal-ai__get_chair_directions. legal-writer agent update: - Adds get_chair_directions to the tools list. - New mandatory "שלב 1ב" before any block writing: call get_chair_directions, branch on status. - missing → halt, report "legal-analyst לא רץ עדיין" - empty → halt, instruct Dafna to fill positions via the UI URL - partial → halt unless user confirms; write only filled sections - complete → proceed - New "שלב 1ג" constructs an internal direction_doc from the received chair rulings before writing block י. - Block י section expanded with 5 binding rules: 1. Open each discussion with Dafna's ruling as the thesis 2. Frame the reasoning in her style (use get_style_guide phrases) 3. Match her tone (decisive vs nuanced) 4. Must NOT contradict her position — if she disagreed with your own inclination, her position rules 5. Use legal_questions from the analysis file as the analytical structure (principle question first, concrete application second) - New bullet section for block יא: summarize each chair ruling briefly, state final outcome, close with the signed date formula. Verified all four status paths (missing/empty/partial/complete) via local test. 
Now Dafna's workflow is fully end-to-end: she reads the analyst report in the UI, fills "עמדת ועדת הערר" in each card, hits blur to auto-save, then triggers legal-writer — which picks up her positions as direction without any file shuffle. Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>	2026-04-11 13:04:30 +00:00
Chaim	3f759d3610	Improve document processing pipeline and agent workflows - Add delete_document_chunks for reprocessing, save extracted text to disk - Expand case directory structure (original/extracted/proofread/backup) - Update classifier patterns (תגובה, הודעת עמדה) - Fix proofreader agent paths for new directory layout - Update HEARTBEAT to notify on every task completion - Improve bidi_table with LRE/PDF directional embedding - Add Paperclip project verification and auto-close setup issue - Add auto-sync-cases.sh for Gitea synchronization Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>	2026-04-09 16:45:49 +00:00
Chaim	22e819363e	Flatten cases directory structure and unify paths - Remove cases/new|in-progress|completed subdivision (status managed in DB) - Rename documents/original → documents/originals (consistent plural) - Move exports from global data/exports/ into cases/{num}/exports/ - Add documents/research/ for case law and analysis files - Update all agents, scripts, config, web API endpoints, and DB paths Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>	2026-04-09 14:33:27 +00:00
Chaim	5fc52ce530	Switch to cases/{new,in-progress,completed}/ directory structure Replace single CASES_DIR with find_case_dir() that searches across all status directories. New cases created in cases/new/{number}/. Config: CASES_BASE, CASES_NEW, CASES_IN_PROGRESS, CASES_COMPLETED Docker: added -v /home/chaim/legal-ai/cases:/cases volume mount Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>	2026-04-04 10:45:47 +00:00
Chaim	9d0a73a1dc	Add context-only mode: Claude Code writes blocks, no API needed New architecture: MCP provides context, Claude Code writes. New functions: - get_block_context(case_id, block_id) → returns full context package (prompt, source docs, claims, direction, precedents, style guide) WITHOUT calling Anthropic API - save_block_content(case_id, block_id, content) → saves block to DB New MCP tools: get_block_context, save_block_content The old write_block (API-based) still works as fallback. The new flow uses Claude Code's own model (Opus 4.6, 1M context) which has no separate API billing. Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>	2026-04-03 16:18:25 +00:00
Chaim	d9e5ef0f46	Add full decision writing pipeline: classify, extract, brainstorm, write, QA, export New services (11 files): - classifier.py: auto doc-type classification + party identification (Claude Haiku) - claims_extractor.py: claim extraction from pleadings (Claude Sonnet + regex) - references_extractor.py: plan/case-law/legislation detection (regex) - brainstorm.py: direction generation with 2-3 options (Claude Sonnet) - block_writer.py: 12-block decision writer (template + Claude Sonnet/Opus) - docx_exporter.py: DOCX export with David font, RTL, headings - qa_validator.py: 6 QA checks with export blocking on critical failure - learning_loop.py: draft vs final comparison + lesson extraction - metrics.py: KPIs dashboard per case and global - audit.py: action audit log - cli.py: standalone CLI with 11 commands Updated pipeline: extract → classify → chunk → embed → store → extract_references New MCP tools: 29 total (was 16) New DB tables: audit_log, decisions CRUD, claims CRUD Config: Infisical support, external service allowlist Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>	2026-04-03 10:21:47 +00:00
Chaim	39089dcef5	Add outcome-aware drafting, lessons system, and improved style analysis - Add expected_outcome field to cases (rejection/partial/full/betterment_levy) - New lessons.py module with golden ratios, templates, and drafting guidance per outcome type - Style analyzer now uses Opus with full decision text (no truncation), with multi-pass fallback for large corpora - Drafting tool provides outcome-specific templates, section guidance, and ratio comments - Improved JSON extraction with bracket-matching fallback Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>	2026-03-24 18:58:42 +00:00
Chaim	6f515dc2cb	Initial commit: MCP server + web upload interface Ezer Mishpati - AI legal decision drafting system with: - MCP server (FastMCP) with document processing pipeline - Web upload interface (FastAPI) for file upload and classification - pgvector-based semantic search - Hebrew legal document chunking and embedding	2026-03-23 12:33:07 +00:00

31 Commits