Add docs, scripts, skills, commands, and taskmaster config to repo
Includes:
- docs/: architecture, block-schema, migration-plan, product-specification
- scripts/: bidi_table, decompose-decisions, extract-claims, seed-knowledge, etc.
- skill-legal-decision/: SKILL.md + references + block-schema
- skill-legal-assistant/: SKILL.md
- skill-legal-docx/: SKILL.md + references
- .claude/commands/: bidi-table skill
- .taskmaster/: task config + PRDs
- .gitignore: exclude legacy/, kiryat-yearim/, node_modules/, memory/

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
.taskmaster/docs/prd.txt (Normal file, +132 lines)
@@ -0,0 +1,132 @@
# PRD: Legal Decision Assistant (Ezer Mishpati)

## Project Overview

An AI-powered system that assists the Chair of the Jerusalem District Planning Appeals Committee, Adv. Dafna Tamir, in writing formal legal decisions. The system migrates knowledge from a legacy Obsidian vault to a structured PostgreSQL + pgvector + n8n platform on the Nautilus server.

## Current State (What Already Exists)

### Infrastructure (Completed)

- PostgreSQL with pgvector on Nautilus (legal-ai-postgres)
- 16 database tables in 4 layers: Core, Decision, Knowledge, RAG
- MCP server (legal-ai) with document upload, case management, search, and style analysis
- Web upload interface (ezer-mishpati-web) at legal-ai.nautilus.marcusgroup.org
- Voyage AI embeddings (voyage-3-large, dim=1024): 323 existing embeddings from 4 training decisions
- Coolify, Gitea, Redis, n8n (no workflows yet), and Infisical on Nautilus

### Data Already Imported

- 19 appeal cases with basic metadata (case numbers, titles, parties, addresses, status)
- 15 lessons learned from 3 analyzed decisions (הכט, בית הכרם, קרית יערים)
- 44 transition phrases from Dafna's writing style
- 9 case law references (precedents)
- 7 statutory provisions
- 4 training decisions in the style corpus, with 90 style patterns

### Legacy Vault (Read-Only Reference)

Located at legacy/dafna-tamir/. Contains:

- 16 archived case folders with source materials (~280 documents total)
- 3 active case folders
- 9 completed decisions (PDF/DOCX)
- Original SKILL.md style guide
- Original Claude Code skills
## What Needs to Be Done

### Phase 1: Full Case Audit (Priority: HIGH)

Systematically audit all 19 case folders in the legacy vault (16 archived, 3 active):

- For each case folder, list every document, classify it by type (appeal/response/decision/exhibit/protocol/expert-opinion), and record its date and page count
- Classify each case as completed, in progress, or not started
- Identify gaps (missing documents, incomplete metadata)
- Produce an audit report per case
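The per-folder inventory could start from a filename-based classifier. This is a minimal sketch, not the project's implementation. It has three limitations:

- The keyword table, `classify`, `CaseAudit`, `audit_case`, and `audit_case_folder` are hypothetical.
- The assumption that filenames reveal the document type has not been checked against the vault.
- Dates and page counts are left out, since they need file metadata or PyMuPDF.

```python
from collections import Counter
from collections.abc import Iterable
from dataclasses import dataclass, field
from pathlib import Path

# Hypothetical filename keywords per document type. Checked in order, so more
# specific types come before "appeal" ("תשובה לערר" is a response, not an appeal).
DOC_TYPE_KEYWORDS: dict[str, tuple[str, ...]] = {
    "response": ("תשובה", "response"),
    "decision": ("החלטה", "decision"),
    "protocol": ("פרוטוקול", "protocol"),
    "expert-opinion": ("חוות דעת", "expert"),
    "exhibit": ("נספח", "exhibit"),
    "appeal": ("ערר", "appeal"),
}
# Assumed minimum set of documents every case folder should contain.
REQUIRED_TYPES = ("appeal", "response")


def classify(filename: str) -> str:
    """Guess a document type from its filename; 'unclassified' if nothing matches."""
    name = filename.lower()
    for doc_type, keywords in DOC_TYPE_KEYWORDS.items():
        if any(keyword in name for keyword in keywords):
            return doc_type
    return "unclassified"


@dataclass
class CaseAudit:
    case_folder: str
    documents: dict[str, str] = field(default_factory=dict)  # relative path -> doc type

    @property
    def type_counts(self) -> Counter:
        return Counter(self.documents.values())

    @property
    def gaps(self) -> list[str]:
        """Required document types with no matching file in the folder."""
        return [t for t in REQUIRED_TYPES if self.type_counts[t] == 0]


def audit_case(case_folder: str, relative_paths: Iterable[str]) -> CaseAudit:
    audit = CaseAudit(case_folder)
    for rel in sorted(relative_paths):
        audit.documents[rel] = classify(Path(rel).name)
    return audit


def audit_case_folder(folder: Path) -> CaseAudit:
    """Walk one case folder on disk and audit every file in it."""
    files = (str(p.relative_to(folder)) for p in folder.rglob("*") if p.is_file())
    return audit_case(folder.name, files)
```

A `decision` file in a folder is only a hint that the case is completed; the audit report would still need a human check for in-progress cases.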
### Phase 2: Document Import (Priority: HIGH)

Import all documents from the legacy vault into the database:

- Register each document in the `documents` table with the correct case_id, doc_type, title, and file_path
- Track which documents have been imported and which are still pending
- Priority: completed cases first (הכט, בית הכרם, אפרים אבי, etc.)
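The import queue for this phase can be derived from the audit. It skips anything whose file_path is already registered and puts completed cases first. This is a minimal sketch: `PendingDocument` and `build_import_queue` are hypothetical names, and the fields simply mirror the `documents` columns listed above.

```python
from collections.abc import Iterable
from dataclasses import dataclass


@dataclass(frozen=True)
class PendingDocument:
    # Mirrors the documents-table columns named in this PRD.
    case_id: int
    doc_type: str
    title: str
    file_path: str


def build_import_queue(
    candidates: Iterable[PendingDocument],
    imported_paths: set[str],
    completed_case_ids: set[int],
) -> list[PendingDocument]:
    """Return documents not yet imported, ordered completed cases first.

    imported_paths: file_path values already present in `documents`.
    completed_case_ids: cases whose decision is finished (e.g. from the Phase 1 audit).
    """
    pending = [d for d in candidates if d.file_path not in imported_paths]
    # False sorts before True, so documents from completed cases come first;
    # case_id and file_path make the rest of the order deterministic.
    return sorted(
        pending,
        key=lambda d: (d.case_id not in completed_case_ids, d.case_id, d.file_path),
    )
```

Re-running the function after each batch gives the "imported vs. pending" view, because imported paths drop out of the queue.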
### Phase 3: Text Extraction (Priority: HIGH)

Extract text from all imported documents:

- PDF extraction using PyMuPDF (already in the MCP server dependencies)
- DOCX extraction
- Hebrew OCR for scanned PDFs (Claude Vision or Tesseract)
- Store extracted text in documents.extracted_text
- Update extraction_status for each document
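For the DOCX step, the standard library is enough for plain-text extraction. A .docx file is a ZIP package whose main body is `word/document.xml`, with paragraphs in `w:p` elements and text in `w:t` runs.

This is a minimal sketch with two caveats:

- The status strings `completed`, `empty`, and `failed` are placeholders, not the actual `extraction_status` values.
- Headers, footers, and footnotes live in other parts of the package and are skipped.

```python
import io
import xml.etree.ElementTree as ET
import zipfile

W_NS = "{http://schemas.openxmlformats.org/wordprocessingml/2006/main}"


def extract_docx_text(data: bytes) -> str:
    """Return the main-body text of a .docx file, one line per paragraph."""
    with zipfile.ZipFile(io.BytesIO(data)) as package:
        root = ET.fromstring(package.read("word/document.xml"))
    lines = []
    for paragraph in root.iter(f"{W_NS}p"):
        # Join every text run in the paragraph. Text boxes nest w:p inside w:p,
        # so their text can appear twice; acceptable for a first pass.
        lines.append("".join(t.text or "" for t in paragraph.iter(f"{W_NS}t")))
    return "\n".join(lines)


def extract_with_status(data: bytes) -> tuple[str, str]:
    """Return (text, status); the status strings are placeholders for extraction_status."""
    try:
        text = extract_docx_text(data)
    except (zipfile.BadZipFile, KeyError, ET.ParseError):
        return "", "failed"
    return text, ("completed" if text.strip() else "empty")
```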
### Phase 4: Decision Decomposition (Priority: HIGH)

Parse the 9 completed decisions into the 12-block structure:

- For each completed decision, create a `decisions` record
- Identify and extract each of the 12 blocks (alef through yod-bet)
- Store blocks in `decision_blocks` with the correct block_id, content, word counts, and weights
- Extract individual paragraphs to `decision_paragraphs` with their paragraph numbers
- Track citations (case law references) within paragraphs
- This is critical training data for the system
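Paragraph extraction within a block might look like the sketch below. It rests on two assumptions:

- Each paragraph starts on a new line with an Arabic numeral and a period. This is unverified against the actual decisions.
- Citation detection uses a deliberately crude docket-number pattern such as `1234/05`.

`Paragraph`, `split_paragraphs`, and both regexes are hypothetical. Real citation parsing would need the court and case-type prefixes, and this pattern will also match dates such as 12/05.

```python
import re
from dataclasses import dataclass, field

PARA_START = re.compile(r"^\s*(\d+)\.\s+(.*)$")  # "12. text..." starts paragraph 12
CASE_NUMBER = re.compile(r"\b\d{1,5}/\d{2,4}\b")  # crude docket-number heuristic


@dataclass
class Paragraph:
    number: int
    text: str
    citations: list[str] = field(default_factory=list)

    @property
    def word_count(self) -> int:
        return len(self.text.split())


def split_paragraphs(block_text: str) -> list[Paragraph]:
    """Split one block's text into numbered paragraphs with candidate citations."""
    paragraphs: list[Paragraph] = []
    for line in block_text.splitlines():
        match = PARA_START.match(line)
        if match:
            paragraphs.append(Paragraph(int(match.group(1)), match.group(2).strip()))
        elif paragraphs and line.strip():
            # Unnumbered line: treat it as a continuation of the current paragraph.
            # Lines before the first numbered paragraph (e.g. headings) are dropped.
            paragraphs[-1].text += " " + line.strip()
    for p in paragraphs:
        p.citations = CASE_NUMBER.findall(p.text)
    return paragraphs
```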
### Phase 5: Claims Extraction (Priority: MEDIUM)

Extract party claims from appeal documents and responses:

- Parse appeal letters (כתבי ערר) to extract appellant claims
- Parse responses (כתבי תשובה) to extract respondent/committee claims
- Store claims in the `claims` table with party_role, claim_text, and source_document
- Link each claim to the discussion-block paragraph where it is addressed (addressed_in_paragraph)
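Linking claims to paragraphs also yields a simple completeness check: any claim without an addressed_in_paragraph has not yet been answered in the discussion. This is a minimal sketch. `Claim`, `link_claim`, and `unaddressed` are hypothetical, and the set of discussion-paragraph numbers is assumed to come from the Phase 4 `decision_paragraphs` data.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Claim:
    # Field names follow the claims-table columns listed in this PRD.
    party_role: str  # e.g. "appellant", "respondent", "committee" (assumed values)
    claim_text: str
    source_document: str
    addressed_in_paragraph: Optional[int] = None


def link_claim(claim: Claim, paragraph_number: int, discussion_paragraphs: set[int]) -> None:
    """Record where a claim is answered, refusing paragraphs outside the discussion blocks."""
    if paragraph_number not in discussion_paragraphs:
        raise ValueError(f"paragraph {paragraph_number} is not in a discussion block")
    claim.addressed_in_paragraph = paragraph_number


def unaddressed(claims: list[Claim]) -> list[Claim]:
    """Claims that no discussion paragraph answers yet; each is a review item."""
    return [c for c in claims if c.addressed_in_paragraph is None]
```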
### Phase 6: Embeddings & RAG (Priority: MEDIUM)

Generate embeddings for all extracted content:

- Chunk extracted document text (600 tokens with 100-token overlap; already configured)
- Generate Voyage embeddings for document chunks
- Generate embeddings for decision paragraphs → paragraph_embeddings
- Generate embeddings for case law summaries → case_law_embeddings
- Build semantic search functions in the MCP server
- Test: "find similar precedents for this case"
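The 600/100 chunking is a sliding window with a stride of 600 - 100 = 500 tokens. In the sketch below, whitespace-separated words stand in for tokens. The existing configured chunker may count tokens with a real tokenizer, which this does not reproduce. `chunk_tokens` and `chunk_text` are hypothetical names.

```python
def chunk_tokens(tokens: list[str], size: int = 600, overlap: int = 100) -> list[list[str]]:
    """Split tokens into windows of `size`, each sharing `overlap` tokens with the previous one."""
    if size <= 0 or not 0 <= overlap < size:
        raise ValueError("require size > 0 and 0 <= overlap < size")
    step = size - overlap  # 500 with the configured values
    chunks: list[list[str]] = []
    for start in range(0, len(tokens), step):
        chunks.append(tokens[start:start + size])
        if start + size >= len(tokens):
            break  # this window already reaches the end of the text
    return chunks


def chunk_text(text: str, size: int = 600, overlap: int = 100) -> list[str]:
    # Whitespace-separated words stand in for model tokens (an assumption).
    return [" ".join(chunk) for chunk in chunk_tokens(text.split(), size, overlap)]
```

With a 100-token overlap, any passage of up to 100 tokens that straddles a chunk boundary appears whole in the following chunk.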
### Phase 7: n8n Workflow Automation (Priority: LOW)

Create automated workflows:

- Document upload → classify document type → store in DB → generate embeddings
- New appeal creation → auto-create 12-block structure → generate DOCX template
- Precedent search → RAG query → return ranked results
- Draft validation → check against block-schema constraints

### Phase 8: Enhanced Web UI (Priority: LOW)

Extend ezer-mishpati-web:

- Case management dashboard (list all cases, status, documents)
- Decision writing interface (block-by-block with live preview)
- Precedent search interface with semantic results
- Style guide reference panel
- DOCX export from decision blocks
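For RTL in the DOCX export, WordprocessingML uses two markers:

- `<w:bidi/>` in the paragraph properties marks a right-to-left paragraph.
- `<w:rtl/>` in the run properties marks a right-to-left run.

The sketch below builds one such paragraph with ElementTree. `rtl_paragraph` and `w` are hypothetical helpers. A complete .docx also needs `[Content_Types].xml`, relationship parts, and a `w:document`/`w:body` wrapper, all omitted here.

```python
import xml.etree.ElementTree as ET

W_NS = "http://schemas.openxmlformats.org/wordprocessingml/2006/main"
XML_SPACE = "{http://www.w3.org/XML/1998/namespace}space"
ET.register_namespace("w", W_NS)  # serialize with the conventional "w:" prefix


def w(tag: str) -> str:
    """Qualified WordprocessingML tag name, e.g. w('p') -> '{...main}p'."""
    return f"{{{W_NS}}}{tag}"


def rtl_paragraph(text: str) -> ET.Element:
    """Build a <w:p> whose paragraph and run are both marked right-to-left."""
    p = ET.Element(w("p"))
    p_pr = ET.SubElement(p, w("pPr"))
    ET.SubElement(p_pr, w("bidi"))  # paragraph reads right to left
    run = ET.SubElement(p, w("r"))
    r_pr = ET.SubElement(run, w("rPr"))
    ET.SubElement(r_pr, w("rtl"))  # run text is right-to-left
    t = ET.SubElement(run, w("t"))
    t.text = text
    if text != text.strip():
        t.set(XML_SPACE, "preserve")  # keep leading/trailing spaces
    return p
```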
## Technical Architecture

### Database: 4 Layers, 16 Tables

- Layer 1 (Core): cases, documents, document_chunks, style_corpus, style_patterns
- Layer 2 (Decision): decisions, decision_blocks, decision_paragraphs, claims
- Layer 3 (Knowledge): case_law, case_law_citations, statutory_provisions, transition_phrases, lessons_learned
- Layer 4 (RAG): paragraph_embeddings, case_law_embeddings
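The Layer 4 tables exist to answer nearest-neighbour queries. In pgvector, `<=>` is the cosine-distance operator, so a similar-paragraph query orders by `embedding <=> query`. The pure-Python `top_k` below reproduces that ordering for testing. The SQL's column names (`paragraph_id`, `embedding`) are assumptions, not the actual schema, and binding the query vector as a parameter is not shown.

```python
import math
from collections.abc import Iterable, Sequence

# Hypothetical pgvector query; <=> is pgvector's cosine-distance operator.
SIMILAR_PARAGRAPHS_SQL = """
SELECT paragraph_id, embedding <=> $1 AS distance
FROM paragraph_embeddings
ORDER BY embedding <=> $1
LIMIT $2
"""


def cosine_distance(a: Sequence[float], b: Sequence[float]) -> float:
    """1 - cosine similarity, the quantity pgvector's <=> returns."""
    if len(a) != len(b):
        raise ValueError("vectors must have the same dimension (1024 for voyage-3-large)")
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0 or norm_b == 0:
        raise ValueError("cosine distance is undefined for zero vectors")
    return 1.0 - dot / (norm_a * norm_b)


def top_k(
    query: Sequence[float],
    rows: Iterable[tuple[str, Sequence[float]]],
    k: int,
) -> list[tuple[str, float]]:
    """In-memory equivalent of ORDER BY embedding <=> query LIMIT k."""
    scored = [(row_id, cosine_distance(query, vector)) for row_id, vector in rows]
    return sorted(scored, key=lambda item: item[1])[:k]
```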
### Key Design Decisions

- Embedding model: Voyage voyage-3-large (1024 dimensions)
- Chunk size: 600 tokens with 100-token overlap
- Decision structure: 12 blocks, drawing on CREAC, DITA, Akoma Ntoso, and Federal Judicial Center conventions
- All content is in Hebrew; DOCX export must support right-to-left (RTL) text
- Style guide: SKILL.md (Dafna's writing patterns, tone per appeal type, transition phrases)

### MCP Server Stack

- Python asyncpg for PostgreSQL
- FastMCP for tool registration
- PyMuPDF for PDF extraction
- Anthropic API for Claude Vision OCR (scanned PDFs)

## Critical Rules

1. "Judge Test": every decision must be readable by a judge unfamiliar with the case
2. "Neutral Background": Block ו (vav) contains only objective facts, with no party quotes or value judgments
3. "No Duplication": Block י (yod) refers back to earlier blocks instead of repeating them
4. "Original Claims Only": Block ז (zayin) draws only on the original appeal and response documents; supplementary submissions go in Block ח (het)
5. 12-Block Architecture: see docs/block-schema.md for the full specification
6. Work methodically: audit before import, validate after each step, no shortcuts
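Rules 2, 3, and 5 lend themselves to automated checks on a draft, along the lines of the "draft validation" workflow in Phase 7. The sketch below is heuristic and hypothetical. It flags:

- missing blocks,
- quoted spans in Block ו,
- sentences in Block י copied verbatim from earlier blocks.

`validate_draft`, the quote-detection regex, and the five-word sentence threshold are all assumptions. Rules 1, 4, and 6 depend on judgement or on document provenance and are not checked.

```python
import re

# Block ids alef through yod-bet, in order.
BLOCK_IDS = ["א", "ב", "ג", "ד", "ה", "ו", "ז", "ח", "ט", "י", "יא", "יב"]
# A quote mark preceded by whitespace (or the start) opens a quoted span; Hebrew
# acronyms written with a quote mark inside a word (e.g. בג"ץ) do not match.
QUOTED_SPAN = re.compile(r'(?:^|(?<=\s))["“„][^"“”„]+["”“]')
MIN_SENTENCE_WORDS = 5  # hypothetical: ignore short, formulaic sentences


def _sentences(text: str) -> set[str]:
    parts = re.split(r"[.!?]\s*", text)
    return {" ".join(p.split()) for p in parts if len(p.split()) >= MIN_SENTENCE_WORDS}


def validate_draft(blocks: dict[str, str]) -> list[str]:
    """Return human-readable problems; an empty list means no check was tripped."""
    problems: list[str] = []
    # Rule 5: all 12 blocks present and non-empty.
    missing = [b for b in BLOCK_IDS if not blocks.get(b, "").strip()]
    if missing:
        problems.append("missing blocks: " + ", ".join(missing))
    # Rule 2 (heuristic): Block ו should not quote the parties.
    if QUOTED_SPAN.search(blocks.get("ו", "")):
        problems.append("block ו: contains a quoted span (possible party quote)")
    # Rule 3 (heuristic): Block י should not repeat earlier blocks verbatim.
    earlier: set[str] = set()
    for b in BLOCK_IDS[: BLOCK_IDS.index("י")]:
        earlier |= _sentences(blocks.get(b, ""))
    repeated = _sentences(blocks.get("י", "")) & earlier
    if repeated:
        problems.append(f"block י: {len(repeated)} sentence(s) repeated from earlier blocks")
    return problems
```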
## File Locations

- Project root: /home/chaim/legal-ai/
- Legacy vault: legacy/dafna-tamir/ (read-only)
- MCP server: mcp-server/src/legal_mcp/
- Documentation: docs/ (architecture.md, block-schema.md, migration-plan.md)
- Scripts: scripts/ (seed-knowledge.py, seed-appeals.py)
- Style guide: skill-legal-decision/SKILL.md
- Lessons: memory/legal-decision-lessons.md