# PRD: Legal Decision Assistant (עוזר משפטי, "Ezer Mishpati")

## Project Overview

An AI-powered system that assists the Chair of the Jerusalem District Planning Appeals Committee (Adv. Dafna Tamir) in writing formal legal decisions. The system migrates knowledge from a legacy Obsidian vault to a structured PostgreSQL + pgvector + n8n platform on the Nautilus server.

## Current State (What Already Exists)

### Infrastructure (Completed)

- PostgreSQL with pgvector on Nautilus (`legal-ai-postgres`)
- 16 database tables in 4 layers: Core, Decision, Knowledge, RAG
- MCP server (`legal-ai`) with document upload, case management, search, and style analysis
- Web upload interface (`ezer-mishpati-web`) at legal-ai.nautilus.marcusgroup.org
- Voyage AI embeddings (`voyage-3-large`, dim=1024), with 323 existing embeddings from 4 training decisions
- Coolify, Gitea, Redis, n8n (no workflows yet), and Infisical on Nautilus

### Data Already Imported

- 19 appeal cases with basic metadata (case numbers, titles, parties, addresses, status)
- 15 lessons learned from 3 analyzed decisions (הכט, בית הכרם, קרית יערים)
- 44 transition phrases from Dafna's writing style
- 9 case law references (precedents)
- 7 statutory provisions
- 4 training decisions in the style corpus, with 90 style patterns

### Legacy Vault (Read-Only Reference)

Located at `legacy/dafna-tamir/`. It contains:

- 16 archived case folders with source materials (~280 documents total)
- 3 active case folders
- 9 completed decisions (PDF/DOCX)
- The original SKILL.md style guide
- The original Claude Code skills

## What Needs to Be Done

### Phase 1: Full Case Audit (Priority: HIGH)

Systematically audit all 19 case folders in the legacy vault:

- For each case folder: list every document, classify it by type (appeal/response/decision/exhibit/protocol/expert-opinion), and record dates and page counts
- Identify completed decisions vs. in-progress vs.
  not-started
- Identify gaps (missing documents, incomplete metadata)
- Produce an audit report per case

### Phase 2: Document Import (Priority: HIGH)

Import all documents from the legacy vault into the database:

- Register each document in the `documents` table with the correct `case_id`, `doc_type`, `title`, and `file_path`
- Track which documents have been imported and which are still pending
- Priority: completed cases first (הכט, בית הכרם, אפרים אבי, etc.)

### Phase 3: Text Extraction (Priority: HIGH)

Extract text from all imported documents:

- PDF extraction using PyMuPDF (already in the MCP server dependencies)
- DOCX extraction
- Hebrew OCR for scanned PDFs (Claude Vision or Tesseract)
- Store extracted text in `documents.extracted_text`
- Update `extraction_status` for each document

### Phase 4: Decision Decomposition (Priority: HIGH)

Parse the 9 completed decisions into the 12-block structure. This is critical training data for the system.

- For each completed decision: create a `decisions` record
- Identify and extract each of the 12 blocks (alef through yod-bet)
- Store blocks in `decision_blocks` with the correct `block_id`, content, word counts, and weights
- Extract individual paragraphs to `decision_paragraphs` with paragraph numbers
- Track citations (case law references) within paragraphs

### Phase 5: Claims Extraction (Priority: MEDIUM)

Extract party claims from appeal documents and responses:

- Parse appeal letters (כתבי ערר) to extract appellant claims
- Parse responses (כתבי תשובה) to extract respondent/committee claims
- Store claims in the `claims` table with `party_role`, `claim_text`, and `source_document`
- Link each claim to the discussion-block paragraph where it is addressed (`addressed_in_paragraph`)

### Phase 6: Embeddings & RAG (Priority: MEDIUM)

Generate embeddings for all extracted content:

- Chunk extracted document text (600 tokens with 100-token overlap; already configured)
- Generate Voyage embeddings for document chunks
- Generate embeddings for decision paragraphs → `paragraph_embeddings`
- Generate embeddings for
  case law summaries → `case_law_embeddings`
- Build semantic search functions in the MCP server
- Test: "find similar precedents for this case"

### Phase 7: n8n Workflow Automation (Priority: LOW)

Create automated workflows:

- Document upload → classify document type → store in DB → generate embeddings
- New appeal creation → auto-create the 12-block structure → generate a DOCX template
- Precedent search → RAG query → return ranked results
- Draft validation → check against block-schema constraints

### Phase 8: Enhanced Web UI (Priority: LOW)

Extend `ezer-mishpati-web` with:

- A case management dashboard (list all cases, status, documents)
- A decision writing interface (block by block, with live preview)
- A precedent search interface with semantic results
- A style guide reference panel
- DOCX export from decision blocks

## Technical Architecture

### Database: 4 Layers, 16 Tables

- Layer 1 (Core): `cases`, `documents`, `document_chunks`, `style_corpus`, `style_patterns`
- Layer 2 (Decision): `decisions`, `decision_blocks`, `decision_paragraphs`, `claims`
- Layer 3 (Knowledge): `case_law`, `case_law_citations`, `statutory_provisions`, `transition_phrases`, `lessons_learned`
- Layer 4 (RAG): `paragraph_embeddings`, `case_law_embeddings`

### Key Design Decisions

- Embedding model: Voyage `voyage-3-large` (1024 dimensions)
- Chunk size: 600 tokens with 100-token overlap
- Decision structure: 12 blocks based on CREAC/DITA/Akoma Ntoso/Federal Judicial Center
- All content is in Hebrew, so RTL support is required in DOCX export
- Style guide: SKILL.md (Dafna's writing patterns, tone per appeal type, transition phrases)

### MCP Server Stack

- Python `asyncpg` for PostgreSQL
- FastMCP for tool registration
- PyMuPDF for PDF extraction
- Anthropic API for Claude Vision OCR (scanned PDFs)

## Critical Rules

1. "Judge Test": every decision must be readable by a judge who is unfamiliar with the case
2. "Neutral Background": Block ו (vav) contains only objective facts, with no party quotes or value judgments
3. "No Duplication": Block י (yod) references previous blocks rather than repeating them
4.
   "Original Claims Only": Block ז (zayin) uses only the original appeal/response documents; supplements go to Block ח (het)
5. 12-Block Architecture: see docs/block-schema.md for the full specification
6. Work methodically: audit before import, validate after each step, no shortcuts

## File Locations

- Project root: `/home/chaim/legal-ai/`
- Legacy vault: `legacy/dafna-tamir/` (read-only)
- MCP server: `mcp-server/src/legal_mcp/`
- Documentation: `docs/` (architecture.md, block-schema.md, migration-plan.md)
- Scripts: `scripts/` (seed-knowledge.py, seed-appeals.py)
- Style guide: `skills/decision/SKILL.md`
- Lessons: `docs/legal-decision-lessons.md`
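
For the Phase 1 audit, a first-pass document classifier can work from file names alone. This is a minimal heuristic sketch, not the project's method. The Hebrew keywords in `KEYWORDS` and both helper names are hypothetical assumptions, not taken from the legacy vault. Anything unmatched falls back to `unknown` for manual review. Dates and page counts are left out because they would need PyMuPDF or file metadata.

```python
from pathlib import Path

# Hypothetical keyword map: Hebrew filename fragments -> doc_type.
# Order matters: e.g. "כתב תשובה לערר" must hit "תשובה" before "ערר".
KEYWORDS = [
    ("חוות דעת", "expert-opinion"),
    ("פרוטוקול", "protocol"),
    ("החלטה", "decision"),
    ("תשובה", "response"),
    ("נספח", "exhibit"),
    ("ערר", "appeal"),
]


def classify_document(path: Path) -> str:
    """Guess a doc_type from the file name; 'unknown' means manual review."""
    name = path.stem
    for keyword, doc_type in KEYWORDS:
        if keyword in name:
            return doc_type
    return "unknown"


def audit_case_folder(folder: Path) -> list[dict]:
    """List every file under a case folder with its extension and guessed type."""
    rows = []
    for path in sorted(p for p in folder.rglob("*") if p.is_file()):
        rows.append({
            "file": str(path.relative_to(folder)),
            "ext": path.suffix.lower(),
            "doc_type": classify_document(path),
        })
    return rows
```

Because keyword order decides ties, an exhibit named after the appeal (for example "נספח 3 לכתב הערר") is classified as an exhibit rather than an appeal.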
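
Phase 2's imported-vs-pending tracking reduces to a set comparison between files in the vault and the `file_path` values already in the `documents` table. The following is a minimal sketch. It assumes paths are stored relative to the vault root, which is an assumption because the storage convention is not specified here. The function names are hypothetical.

```python
from pathlib import PurePosixPath


def normalize(path: str) -> str:
    """Normalize a vault-relative path so disk and DB values compare equal."""
    # Assumption: backslashes may appear in paths copied from Windows tools.
    return str(PurePosixPath(path.replace("\\", "/")))


def import_status(files_on_disk: list[str], registered_paths: list[str]) -> dict:
    """Split vault files into imported and pending, and flag DB rows with no file."""
    disk = {normalize(p) for p in files_on_disk}
    db = {normalize(p) for p in registered_paths}
    return {
        "imported": sorted(disk & db),
        "pending": sorted(disk - db),
        # Registered rows whose file is not in the vault: worth checking by hand.
        "orphaned": sorted(db - disk),
    }
```

In practice, `registered_paths` would come from a `SELECT file_path FROM documents` query, and `pending` is the work list for the next import batch.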
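
For Phase 3, DOCX text can be extracted with only the standard library, because a .docx file is a ZIP archive whose main body is stored in `word/document.xml`. The sketch below reads each `w:p` paragraph and joins its `w:t` text runs. Headers and footnotes live in other parts of the archive and are ignored, so a dedicated library such as python-docx may be preferable in production. `needs_ocr` is a hypothetical heuristic for spotting scanned PDFs from per-page text, for example text returned by PyMuPDF's `page.get_text()`. The 50-character threshold is an assumption.

```python
import xml.etree.ElementTree as ET
import zipfile

# WordprocessingML namespace used in word/document.xml
W_NS = "{http://schemas.openxmlformats.org/wordprocessingml/2006/main}"


def paragraphs_from_document_xml(xml_bytes: bytes) -> list[str]:
    """Return the text of each w:p paragraph, concatenating its w:t runs."""
    root = ET.fromstring(xml_bytes)
    return [
        "".join(t.text or "" for t in p.iter(f"{W_NS}t"))
        for p in root.iter(f"{W_NS}p")
    ]


def extract_docx_text(path: str) -> str:
    """Read word/document.xml from a .docx archive; one line per paragraph."""
    with zipfile.ZipFile(path) as zf:
        xml_bytes = zf.read("word/document.xml")
    return "\n".join(paragraphs_from_document_xml(xml_bytes))


def needs_ocr(page_texts: list[str], min_chars_per_page: int = 50) -> bool:
    """Hypothetical heuristic: treat a PDF as scanned if most pages have little text."""
    if not page_texts:
        return True  # no extractable text at all
    sparse = sum(1 for t in page_texts if len(t.strip()) < min_chars_per_page)
    return sparse / len(page_texts) > 0.5
```

Documents for which `needs_ocr` returns true would be routed to Claude Vision or Tesseract, and their `extraction_status` would be updated accordingly.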
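
Phase 4 needs a way to find block boundaries inside a finished decision. The following is a minimal sketch. It hypothetically assumes that each block opens with a line containing its Hebrew letter label followed by a period, such as `ו.` or `יב.`; docs/block-schema.md is the authority on the real format. The function returns `(block_id, content, word_count)` tuples that could map onto `decision_blocks`. Weights, paragraph numbering, and citation tracking are left out.

```python
import re

# Block labels alef through yod-bet, in document order.
BLOCK_LABELS = ["א", "ב", "ג", "ד", "ה", "ו", "ז", "ח", "ט", "י", "יא", "יב"]

# A heading line: optional spaces, a label, a period, then an optional title.
# The required period keeps labels unambiguous: "יא." cannot match as "י",
# and an ordinary line that merely starts with "ה" does not match at all.
_HEADING_RE = re.compile(
    r"^[ \t]*(" + "|".join(BLOCK_LABELS) + r")\.[ \t]*(.*)$",
    re.MULTILINE,
)


def split_into_blocks(text: str) -> list[tuple[str, str, int]]:
    """Split decision text at block headings; return (block_id, content, word_count)."""
    matches = list(_HEADING_RE.finditer(text))
    blocks = []
    for i, m in enumerate(matches):
        end = matches[i + 1].start() if i + 1 < len(matches) else len(text)
        # Content = the heading's title plus everything up to the next heading.
        content = (m.group(2) + text[m.end():end]).strip()
        blocks.append((m.group(1), content, len(content.split())))
    return blocks
```

Any text before the first heading is dropped. A real import should log it so that a missed or malformed heading is noticed during validation.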
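
The configured Phase 6 chunking (600 tokens with a 100-token overlap) is a sliding window that advances 500 tokens at a time. The sketch below works over a pre-tokenized list. How tokens are counted (the Voyage tokenizer, a whitespace split, or something else) is an assumption left to the caller, and the function name is hypothetical.

```python
def chunk_tokens(tokens: list[str], size: int = 600, overlap: int = 100) -> list[list[str]]:
    """Split tokens into windows of `size`, each sharing `overlap` tokens with the previous one."""
    if not 0 <= overlap < size:
        raise ValueError("overlap must be in [0, size)")
    if not tokens:
        return []
    step = size - overlap  # 500 with the configured values
    chunks = []
    start = 0
    while True:
        chunks.append(tokens[start:start + size])
        if start + size >= len(tokens):  # this window reached the end of the text
            break
        start += step
    return chunks
```

For example, a 1,200-token document yields three chunks starting at tokens 0, 500, and 1,000. The last chunk is shorter (200 tokens), and each chunk repeats the final 100 tokens of the one before it.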
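
For the Phase 6 test query "find similar precedents for this case", the core operation is ranking stored vectors by cosine similarity to a query embedding. In PostgreSQL this would typically be an `ORDER BY embedding <=> $1` query, because pgvector's `<=>` operator returns cosine distance. The pure-Python sketch below reproduces that ordering so that results can be checked offline. The function names are hypothetical, and any small vectors can stand in for the 1024-dimensional Voyage embeddings.

```python
import math


def cosine_similarity(a: list[float], b: list[float]) -> float:
    """Cosine of the angle between two vectors; 0.0 if either is all zeros."""
    if len(a) != len(b):
        raise ValueError("vectors must have the same dimension")
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


def top_k(query: list[float], items: dict[str, list[float]], k: int = 5) -> list[tuple[str, float]]:
    """Return the k item ids most similar to the query, best first.

    Same ordering as pgvector's `ORDER BY embedding <=> query`,
    because cosine distance = 1 - cosine similarity.
    """
    scored = [(item_id, cosine_similarity(query, vec)) for item_id, vec in items.items()]
    scored.sort(key=lambda pair: pair[1], reverse=True)
    return scored[:k]
```

At the scale of this project (hundreds to a few thousand vectors), an exact scan like this is cheap. A pgvector index becomes relevant only if the corpus grows substantially.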
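
The Phase 7 draft-validation workflow could start with structural checks before any style checks. The following is a minimal sketch. It assumes that a draft arrives as an ordered list of `(block_id, text)` pairs and uses hypothetical word limits. The real constraints live in docs/block-schema.md and are not reproduced here.

```python
BLOCK_ORDER = ["א", "ב", "ג", "ד", "ה", "ו", "ז", "ח", "ט", "י", "יא", "יב"]

# Hypothetical per-block word limits (min, max); replace with block-schema values.
WORD_LIMITS = {"ו": (50, 1500), "י": (20, 800)}


def validate_draft(blocks: list[tuple[str, str]]) -> list[str]:
    """Return human-readable problems for an ordered list of (block_id, text)."""
    problems = []
    ids = [block_id for block_id, _ in blocks]
    # Every one of the 12 blocks must be present.
    for label in BLOCK_ORDER:
        if label not in ids:
            problems.append(f"missing block {label}")
    # Labels outside alef..yod-bet are rejected.
    problems.extend(f"unknown block {b}" for b in ids if b not in BLOCK_ORDER)
    # Known blocks must appear in canonical order.
    known = [b for b in ids if b in BLOCK_ORDER]
    if known != sorted(known, key=BLOCK_ORDER.index):
        problems.append("blocks are out of order")
    # Word-count limits, where defined.
    for block_id, text in blocks:
        if block_id in WORD_LIMITS:
            lo, hi = WORD_LIMITS[block_id]
            n = len(text.split())
            if not lo <= n <= hi:
                problems.append(f"block {block_id}: word count {n} outside {lo}-{hi}")
    return problems
```

An empty list means the draft passed these checks. The content rules (for example, no party quotes in Block ו) would need separate checks.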
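
For the RTL requirement in DOCX export (Phase 8), WordprocessingML marks a right-to-left paragraph with `<w:bidi/>` in its paragraph properties and a right-to-left run with `<w:rtl/>` in its run properties. The following is a minimal sketch of building the XML for one Hebrew paragraph with the standard library. It assumes that the enclosing document declares the usual `w:` namespace prefix. Assembling the full package (document.xml, styles, relationships) is omitted, and the helper name is hypothetical.

```python
from xml.sax.saxutils import escape


def rtl_paragraph_xml(text: str) -> str:
    """Return a <w:p> element for one right-to-left (Hebrew) paragraph.

    Assumes the surrounding document.xml declares the `w:` namespace prefix.
    """
    return (
        "<w:p>"
        "<w:pPr><w:bidi/></w:pPr>"    # paragraph direction: right-to-left
        "<w:r><w:rPr><w:rtl/></w:rPr>"  # run direction: right-to-left
        f'<w:t xml:space="preserve">{escape(text)}</w:t>'
        "</w:r></w:p>"
    )
```

Escaping matters because decision text can contain `&`, `<`, or `>`, which would otherwise produce invalid XML.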