legal-ai/.taskmaster/docs/prd.txt
Chaim 911c797eb2 Reorganize: skills/ directory + move memory to docs/
skill-legal-decision/ → skills/decision/
skill-legal-assistant/ → skills/assistant/
skill-legal-docx/ → skills/docx/
memory/*.md → docs/

Also removed: TASKS.md (use TaskMaster), classifier.py (replaced by local_classifier.py)
Updated all references in CLAUDE.md, scripts, PRDs, docs.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
2026-04-04 14:27:07 +00:00

# PRD — Legal Decision Assistant (עוזר משפטי)
## Project Overview
AI-powered system to assist the Chair of the Jerusalem District Planning Appeals Committee (Adv. Dafna Tamir) in writing formal legal decisions. The system migrates knowledge from a legacy Obsidian vault to a structured PostgreSQL + pgvector + n8n platform on the Nautilus server.
## Current State (What Already Exists)
### Infrastructure (Completed)
- PostgreSQL with pgvector on Nautilus (legal-ai-postgres)
- 16 database tables in 4 layers: Core, Decision, Knowledge, RAG
- MCP server (legal-ai) with document upload, case management, search, style analysis
- Web upload interface (ezer-mishpati-web) at legal-ai.nautilus.marcusgroup.org
- Voyage AI embeddings (voyage-3-large, dim=1024) — 323 existing embeddings from 4 training decisions
- Coolify, Gitea, Redis, n8n (empty), Infisical on Nautilus
### Data Already Imported
- 19 appeal cases with basic metadata (case numbers, titles, parties, addresses, status)
- 15 lessons learned from 3 analyzed decisions (הכט, בית הכרם, קרית יערים)
- 44 transition phrases from Dafna's writing style
- 9 case law references (precedents)
- 7 statutory provisions
- 4 training decisions in style corpus with 90 style patterns
### Legacy Vault (Read-Only Reference)
Located at legacy/dafna-tamir/. Contains:
- 16 archived case folders with source materials (~280 documents total)
- 3 active case folders
- 9 completed decisions (PDF/DOCX)
- Original SKILL.md style guide
- Original Claude Code skills
## What Needs to Be Done
### Phase 1: Full Case Audit (Priority: HIGH)
Systematically audit all 19 case folders in the legacy vault:
- For each case folder: list every document, classify by type (appeal/response/decision/exhibit/protocol/expert-opinion), record dates and page counts
- Identify completed decisions vs. in-progress vs. not-started
- Identify gaps (missing documents, incomplete metadata)
- Produce audit report per case
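
A first pass at the per-case inventory could be scripted along these lines. This is a minimal sketch, not the project's audit tool: the filename keywords, the `classify` helper and the `AuditEntry` fields are hypothetical, dates and page counts are left out, and every automatic classification would still need manual review.

```python
from dataclasses import dataclass
from pathlib import Path

# Document types come from the Phase 1 list; the filename keywords are
# illustrative guesses. More specific types are listed before "appeal",
# because a response or decision filename may also mention the appeal.
TYPE_KEYWORDS = {
    "decision": ("decision", "החלטה"),
    "response": ("response", "תשובה"),
    "protocol": ("protocol", "פרוטוקול"),
    "expert-opinion": ("opinion", "חוות דעת"),
    "exhibit": ("exhibit", "נספח"),
    "appeal": ("appeal", "ערר"),
}


def classify(filename: str) -> str:
    """Guess a document type from the filename; first matching type wins."""
    name = filename.lower()
    for doc_type, keywords in TYPE_KEYWORDS.items():
        if any(keyword in name for keyword in keywords):
            return doc_type
    return "unclassified"


@dataclass
class AuditEntry:
    case_folder: str
    filename: str
    doc_type: str
    size_bytes: int


def audit_case(case_dir: Path) -> list[AuditEntry]:
    """List every file under one case folder with a guessed type."""
    entries = []
    for path in sorted(case_dir.rglob("*")):
        if path.is_file():
            entries.append(
                AuditEntry(case_dir.name, path.name, classify(path.name), path.stat().st_size)
            )
    return entries
```

Entries that come back as `unclassified` are the natural starting point for the gap list in the audit report.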
### Phase 2: Document Import (Priority: HIGH)
Import all documents from legacy vault to the database:
- Register each document in the `documents` table with correct case_id, doc_type, title, file_path
- Track which documents have been imported vs. pending
- Priority: completed cases first (הכט, בית הכרם, אפרים אבי, etc.)
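
Registering one document might look like the sketch below. The four column names come from the bullet above; everything else is an assumption: that `case_id` is an integer, that the title defaults to the file stem, and that `conn` is an asyncpg connection (asyncpg is listed in the MCP server stack). The real `documents` table probably has more columns.

```python
from pathlib import Path

# Only the four columns named in Phase 2 are used; $1..$4 are asyncpg-style
# positional placeholders.
INSERT_DOCUMENT_SQL = """
INSERT INTO documents (case_id, doc_type, title, file_path)
VALUES ($1, $2, $3, $4)
"""


def document_row(case_id: int, doc_type: str, path: Path) -> tuple:
    """Build the parameter tuple for one document (title = file stem, an assumption)."""
    return (case_id, doc_type, path.stem, path.as_posix())


async def register_document(conn, case_id: int, doc_type: str, path: Path) -> None:
    """Insert one row; `conn` is assumed to be an asyncpg connection."""
    await conn.execute(INSERT_DOCUMENT_SQL, *document_row(case_id, doc_type, path))
```

Keeping the row-building step separate from the database call makes it possible to dry-run an import and review the rows before anything is written.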
### Phase 3: Text Extraction (Priority: HIGH)
Extract text from all imported documents:
- PDF extraction using PyMuPDF (already in MCP server dependencies)
- DOCX extraction
- Hebrew OCR for scanned PDFs (Claude Vision or Tesseract)
- Store extracted text in documents.extracted_text
- Update extraction_status for each document
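
The PDF path of this phase can be sketched as follows. PyMuPDF is named in the plan; the status values (`extracted`, `needs_ocr`, `failed`) and the 50-character threshold are hypothetical, chosen only to show how scanned PDFs could be routed to OCR.

```python
# Hypothetical threshold: pages averaging fewer characters than this are
# treated as scanned images that need OCR.
MIN_CHARS_PER_PAGE = 50


def extract_pdf_pages(path: str) -> list[str]:
    """Return the text of each page using PyMuPDF."""
    import fitz  # PyMuPDF; imported lazily so the helper below runs without it

    with fitz.open(path) as doc:
        return [page.get_text() for page in doc]


def classify_extraction(pages: list[str]) -> str:
    """Pick a value for extraction_status (the value names are assumptions)."""
    if not pages:
        return "failed"
    average = sum(len(page.strip()) for page in pages) / len(pages)
    return "extracted" if average >= MIN_CHARS_PER_PAGE else "needs_ocr"
```

A document flagged `needs_ocr` would then go to whichever OCR option is chosen (Claude Vision or Tesseract) instead of being stored with near-empty text.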
### Phase 4: Decision Decomposition (Priority: HIGH)
Parse the 9 completed decisions into the 12-block structure:
- For each completed decision: create a `decisions` record
- Identify and extract each of the 12 blocks (alef through yod-bet)
- Store blocks in `decision_blocks` with correct block_id, content, word counts, weights
- Extract individual paragraphs to `decision_paragraphs` with paragraph numbers
- Track citations within paragraphs (case law references)
- This is critical training data for the system
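
Once a decision has been split into its 12 blocks, the per-block bookkeeping could be computed as in this sketch. Two things are assumptions: the transliterated spelling of the block IDs between alef and yod-bet, and the reading of "weight" as each block's share of the total word count. docs/block-schema.md remains the authority on both.

```python
# The 12 block IDs as Hebrew numerals 1..12 (transliteration is an assumption).
BLOCK_IDS = [
    "alef", "bet", "gimel", "dalet", "he", "vav",
    "zayin", "chet", "tet", "yod", "yod-alef", "yod-bet",
]


def block_stats(blocks: dict[str, str]) -> list[dict]:
    """Return block_id, word_count and weight for each of the 12 blocks."""
    missing = [block_id for block_id in BLOCK_IDS if block_id not in blocks]
    if missing:
        raise ValueError(f"missing blocks: {missing}")
    counts = {block_id: len(blocks[block_id].split()) for block_id in BLOCK_IDS}
    total = sum(counts.values()) or 1  # avoid division by zero on empty input
    return [
        {"block_id": block_id, "word_count": counts[block_id], "weight": counts[block_id] / total}
        for block_id in BLOCK_IDS
    ]
```

Raising on a missing block keeps an incompletely parsed decision from entering the training data unnoticed.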
### Phase 5: Claims Extraction (Priority: MEDIUM)
Extract party claims from appeal documents and responses:
- Parse appeal letters (כתבי ערר) to extract appellant claims
- Parse responses (כתבי תשובה) to extract respondent/committee claims
- Store in `claims` table with party_role, claim_text, source_document
- Link claims to paragraphs in discussion blocks where they are addressed (addressed_in_paragraph)
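
A claim record with the fields named above might be modelled like this before it is written to the `claims` table. The field types and the set of allowed roles are assumptions read off the bullets, and `unaddressed` is a hypothetical helper.

```python
from dataclasses import dataclass
from typing import Optional

# Roles inferred from the Phase 5 bullets; the real column may allow others.
PARTY_ROLES = {"appellant", "respondent", "committee"}


@dataclass
class Claim:
    party_role: str
    claim_text: str
    source_document: str
    # Paragraph number in the discussion block, once the claim is linked.
    addressed_in_paragraph: Optional[int] = None

    def __post_init__(self) -> None:
        if self.party_role not in PARTY_ROLES:
            raise ValueError(f"unknown party_role: {self.party_role}")


def unaddressed(claims: list[Claim]) -> list[Claim]:
    """Claims not yet linked to a paragraph in the discussion."""
    return [claim for claim in claims if claim.addressed_in_paragraph is None]
```

Listing unaddressed claims gives a simple completeness check for a draft: every extracted claim should eventually point at a paragraph.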
### Phase 6: Embeddings & RAG (Priority: MEDIUM)
Generate embeddings for all extracted content:
- Chunk extracted document text (600 tokens with a 100-token overlap, already configured)
- Generate Voyage embeddings for document chunks
- Generate embeddings for decision paragraphs → paragraph_embeddings
- Generate embeddings for case law summaries → case_law_embeddings
- Build semantic search functions in MCP server
- Test: "find similar precedents for this case"
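
The chunking rule (600 tokens, 100 overlap) amounts to a sliding window with a step of 500. The sketch below illustrates it on a pre-tokenized list; how the configured chunker actually tokenizes is not stated here, so whitespace splitting is a stand-in and `chunk_tokens` is a hypothetical name.

```python
def chunk_tokens(tokens: list[str], size: int = 600, overlap: int = 100) -> list[list[str]]:
    """Split tokens into windows of `size` that share `overlap` tokens."""
    if not 0 <= overlap < size:
        raise ValueError("overlap must be non-negative and smaller than size")
    step = size - overlap
    chunks = []
    for start in range(0, len(tokens), step):
        chunks.append(tokens[start:start + size])
        if start + size >= len(tokens):
            break  # the last window already reaches the end of the text
    return chunks


# Whitespace splitting stands in for the real tokenizer (an assumption).
sample_tokens = ("word " * 1300).split()
sample_chunks = chunk_tokens(sample_tokens)
```

With 1300 tokens this yields three chunks of 600, 600 and 300 tokens, and each chunk starts with the last 100 tokens of the one before it.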
### Phase 7: n8n Workflow Automation (Priority: LOW)
Create automated workflows:
- Document upload → classify document type → store in DB → generate embeddings
- New appeal creation → auto-create 12-block structure → generate DOCX template
- Precedent search → RAG query → return ranked results
- Draft validation → check against block-schema constraints
### Phase 8: Enhanced Web UI (Priority: LOW)
Extend ezer-mishpati-web:
- Case management dashboard (list all cases, status, documents)
- Decision writing interface (block-by-block with live preview)
- Precedent search interface with semantic results
- Style guide reference panel
- DOCX export from decision blocks
## Technical Architecture
### Database: 4 Layers, 16 Tables
- Layer 1 (Core): cases, documents, document_chunks, style_corpus, style_patterns
- Layer 2 (Decision): decisions, decision_blocks, decision_paragraphs, claims
- Layer 3 (Knowledge): case_law, case_law_citations, statutory_provisions, transition_phrases, lessons_learned
- Layer 4 (RAG): paragraph_embeddings, case_law_embeddings
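
The layer listing can be held as data and checked against the stated totals. The table names are copied from the list above; `LAYERS` and `layer_of` are hypothetical names used only for this illustration.

```python
# Table names copied from the four layers listed above.
LAYERS = {
    "Core": ["cases", "documents", "document_chunks", "style_corpus", "style_patterns"],
    "Decision": ["decisions", "decision_blocks", "decision_paragraphs", "claims"],
    "Knowledge": [
        "case_law", "case_law_citations", "statutory_provisions",
        "transition_phrases", "lessons_learned",
    ],
    "RAG": ["paragraph_embeddings", "case_law_embeddings"],
}


def layer_of(table: str) -> str:
    """Return the layer a table belongs to."""
    for layer, tables in LAYERS.items():
        if table in tables:
            return layer
    raise KeyError(table)


# 5 + 4 + 5 + 2 tables, matching the "4 Layers, 16 Tables" heading.
assert sum(len(tables) for tables in LAYERS.values()) == 16
```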
### Key Design Decisions
- Embedding model: Voyage voyage-3-large (1024 dimensions)
- Chunk size: 600 tokens with a 100-token overlap
- Decision structure: 12 blocks based on CREAC/DITA/Akoma Ntoso/Federal Judicial Center
- All Hebrew content — RTL support required in DOCX export
- Style guide: SKILL.md (Dafna's writing patterns, tone per appeal type, transition phrases)
### MCP Server Stack
- Python asyncpg for PostgreSQL
- FastMCP for tool registration
- PyMuPDF for PDF extraction
- Anthropic API for Claude Vision OCR (scanned PDFs)
## Critical Rules
1. "Judge Test": every decision must be readable by a judge unfamiliar with the case
2. "Neutral Background": Block ו (vav) contains only objective facts, with no party quotes or value judgments
3. "No Duplication": Block י (yod) references previous blocks and does not repeat them
4. "Original Claims Only": Block ז (zayin) uses only the original appeal/response documents; supplements go to Block ח (chet)
5. 12-Block Architecture: see docs/block-schema.md for the full specification
6. Work methodically: audit before import, validate after each step, no shortcuts
## File Locations
- Project root: /home/chaim/legal-ai/
- Legacy vault: legacy/dafna-tamir/ (read-only)
- MCP server: mcp-server/src/legal_mcp/
- Documentation: docs/ (architecture.md, block-schema.md, migration-plan.md)
- Scripts: scripts/ (seed-knowledge.py, seed-appeals.py)
- Style guide: skills/decision/SKILL.md
- Lessons: docs/legal-decision-lessons.md