Commit Graph

17 Commits

Author SHA1 Message Date
dbc176ae66 feat(corroboration): halacha matcher + cosine threshold (INV-COR3, X11)
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
2026-05-31 18:57:47 +00:00
887079535c feat(spec): X11 citation-corroboration + INV-G10 amendment + Opus 4.8 halacha extraction
ספ חדש לשכבת citator פנימית — תיקוף הלכות לפי טיפול-שיפוטי מצטבר (ציטוטים נכנסים),
לצמצום היקף האישור-הידני של היו"ר:

- docs/spec/X11-citation-corroboration.md — 6 invariants (INV-COR1–COR6), כל אחד עם
  ≥3 מקורות מקצועיים (Shepard's/KeyCite, Hellyer LLJ 2018, UNC Law, NCSC/JTC, CEPEJ).
- docs/spec/00-constitution.md — תיקון מבוקר ל-INV-G10: השער מסופק ע"י טיפול-שיפוטי-מצטבר
  לתת-הקבוצה החיובית, שער-היו"ר נשאר חובה לזנב ולשלילי. + X11 באינדקס.
- Opus 4.8 @ xhigh כמודל חילוץ הלכות (config HALACHA_EXTRACT_MODEL/EFFORT, env-tunable;
  claude_session model/effort params; halacha_extractor מחווט). מבוסס A/B 2026-05-31:
  פחות חילוץ-יתר, 100% quote-verified, ביטחון מכויל.
- scripts/ab_halacha_opus48.py — harness A/B לא-הרסני להשוואת מודל/effort בחילוץ הלכות.
- .taskmaster #70 (FU-2c-b) — תיעוד dedup שפר + סריקת-קורפוס (0 stubs תקועים נותרו).

תנאי-קדם (זהות נקייה) הושלם: שפר מוזג לרשומה קנונית + סריקת 128 רשומות.
audit-findings גלויים ב-X11 §7: קישור הלכה↔ציטוט + סיווג-טיפול = greenfield, ל-implementation plan.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
2026-05-31 18:42:13 +00:00
2aee398b4a feat: Stage C — RAG advanced (#33, #47, #48, #49, #50, #51)
All checks were successful
Build & Deploy / build-and-deploy (push) Successful in 1m35s
Six independent sub-tasks dispatched in parallel; aggregated here.

## #33 — Hide case_name column
library-list-panel.tsx: `<TableHead>` + `<TableCell>` for "שם"
get `className="hidden"` in both Court and Committee row variants.
DB column preserved for future use.

## #47 — Audit script periodic
New scripts/audit_corpus_integrity.py — 3 SQL checks (external+ערר
prefix, internal missing chair/district, cases.practice_area enum)
+ CEO wakeup on violations + cron `0 7 * * *`. First run: 0 issues.

## #48 — Parent-doc retrieval (gated, default off)
Schema V17: precedent_chunks.parent_chunk_id + chunk_role
('child'|'parent'). New chunker.chunk_document_hierarchical() —
section-aware parents (~1500 tokens) containing ~5 overlapping
children (~300 tokens each). New db.store_precedent_chunks_hierarchical
two-pass writer. Search SQL (semantic + lexical) LEFT-JOIN parent and
swap content + dedupe by parent_chunk_id when flag on. Toggle:
PARENT_DOC_RETRIEVAL_ENABLED + PARENT_DOC_{CHILD,PARENT}_SIZE_TOKENS.
Backfill ~3min and ~$0.20 — deferred to follow-up.

## #49 — Multimodal backfill
New scripts/backfill_multimodal_precedents.py with token-matching
case_number ↔ source files (PDF + DOCX via PyMuPDF). Ran in container:
26 precedents embedded, 503 pages, $0.21, 0 errors. precedent_image_embeddings
grew 3 → 29 rows. 44 remaining are style_corpus-migrated rows (no
source file on disk) — will catch up when re-uploaded.

## #50 — Closed-loop feedback + nDCG
Schema V18: search_logs + search_relevance_feedback. New telemetry.py
with fire-and-forget log_search_bg (p50 = 0.002ms — zero overhead) +
auto-infer_relevance_from_citations (reads case drafts → marks score=3
when cited precedent appears in past search top-K). Hooks added to 5
search paths. scripts/compute_ndcg.py for aggregation. Two admin API
endpoints (GET /api/admin/rag-metrics + POST .../infer). Dashboard UI
deferred — API is enough for now.

## #51 — Halacha quality monitoring
New scripts/monitor_halacha_quality.py — baseline avg confidence
(trusted=0.849, all=0.833, pending=0.694) with rolling window drift
detection. Default 5% threshold. Exits non-zero on alert for cron
integration. Recommended: `0 8 * * 1` weekly Mon 8am.

## Bonus: 230 unlinked citations → missing_precedents
Bulk-imported 230 distinct unlinked citations from
precedent_internal_citations to missing_precedents.status='open',
party='committee', with notes listing source citers. Top candidate:
ע"א 3213/97 (cited 5x). Total open missing_precedents now 237.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
2026-05-26 11:26:52 +00:00
af651d0135 feat(rag): Stage B — RAG improvements (HNSW + BM25 hybrid + MMR + dynamic boost)
All checks were successful
Build & Deploy / build-and-deploy (push) Successful in 1m35s
Five enhancements to the precedent retrieval stack:

* **#44 HNSW indexes** for precedent_chunks + halachot (replacing IVFFlat
  lists=50). Build time ~3s combined. Better recall@10 with pgvector 0.8.2.
* **#45 Halacha sweep** — 96 pending halachot at conf>=0.78 promoted to
  approved (1141 → 1237). Cluster at conf=0.78 spot-checked OK. Applied
  via psql only — env HALACHA_AUTO_APPROVE_THRESHOLD unchanged (0.80).
* **#43 MMR diversity** — search_precedent_library_hybrid now caps at
  ``max_per_case_law=2`` (default). Prevents one precedent dominating
  top-10 when many of its chunks/halachot rank high. New helper
  ``_diversify_by_case_law`` in hybrid_search.py.
* **#46 Dynamic halacha boost** — replaces the static ``score+=0.05``
  with ``score+=confidence*0.06``. Calibrated so avg-confidence (~0.85)
  stays at +0.05; high-conf halachot get a slight extra lift, low-conf
  ones get less. Behaviour preserved at the mean.
* **#41 BM25/tsvector hybrid + RRF**. Schema V12 adds STORED tsvector
  columns ``precedent_chunks.content_tsv`` and ``halachot.rule_tsv``
  (using simple config — Postgres has no Hebrew stemmer) + GIN indexes.
  New ``db.search_precedent_library_lexical`` mirrors the semantic
  function with ts_rank_cd over plainto_tsquery. ``hybrid_search``
  runs sem+lex in parallel and fuses via RRF before rerank. Toggle:
  env ``BM25_HYBRID_ENABLED`` (default true), graceful fallback to
  semantic-only on lexical failure.

#40 (VOYAGE_RERANK_ENABLED) was already true in Coolify env; no change.
#42 (Claude Haiku query expansion) deferred — latency + cost concerns
warrant a separate plan; the bm25 lexical leg already recovers most of
the exact-string recall #42 was meant to address.

Closes TaskMaster #41, #43-#46.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
2026-05-26 08:08:02 +00:00
c31fe0866b fix(retrieval): switch hybrid merge to Reciprocal Rank Fusion (RRF)
Some checks are pending
Build & Deploy / build-and-deploy (push) Waiting to run
Cosine scores in voyage-3 (~0.4-0.5) and voyage-multimodal-3
(~0.2-0.25) live on different scales. The previous weighted-sum
merge let text always dominate — verified empirically: 0 image-only
hits across 7 queries on case 8174-24, image side contributed nothing.

RRF combines by *rank* in each list rather than raw score, robust
to scale differences. Per-item score:

    rrf_score = text_weight / (k + text_rank)
              + image_weight / (k + image_rank)

A row that appears in both lists (joined on (id_field, page_number))
gets both terms — surfaced as match_type='text+image'.

After fix on 8174-24 (146 image rows): 2 image-only hits land in
top-5 across all 7 test queries, surfacing actual table/diagram/
signature pages (p12, p13 of שומת המשיבה for 'טבלת השוואת ערכי שומה',
p25 of שומת השגה for 'תרשים גוש וחלקה', etc).

On 8137-24 (273 image rows): 'חישוב היוון של דמי החכירה' goes from
0 baseline results → 5 hybrid results (3 text + 2 image), opening
recall on scanned content the OCR layer misses.

Default MULTIMODAL_TEXT_WEIGHT 0.65 → 0.5 (vanilla RRF) since the
prior 0.65 was tuned for raw cosine scales that no longer apply.
New env knob MULTIMODAL_RRF_K (default 60, standard literature).

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-05-03 19:39:31 +00:00
242f668319 feat(retrieval): add voyage-multimodal-3 page-image embeddings (feature flag)
All checks were successful
Build & Deploy / build-and-deploy (push) Successful in 1m50s
Stage C: per-page image embeddings via voyage-multimodal-3 + hybrid
text+image search. Off by default; enable with MULTIMODAL_ENABLED=true.

- Schema V9: document_image_embeddings + precedent_image_embeddings
  (vector(1024), page_number, image_thumbnail_path)
- extractor.render_pages_for_multimodal renders PDF pages at
  MULTIMODAL_DPI (144) for embedding + JPEG thumbnails at
  MULTIMODAL_THUMB_DPI (96) for UI preview, in one pass
- embeddings.embed_images calls voyage-multimodal-3 in 50-page batches
- services/hybrid_search.py orchestrator: rerank applied to text side
  first (rerank-2 is text-only); image side cosine; weighted merge
  with text_weight 0.65 (env-tunable); image-only pages surface as
  match_type='image' so dense scanned content still appears
- processor.process_document and precedent_library.ingest_precedent
  gated by flag — non-fatal on multimodal failure
- scripts/multimodal_backfill.py — idempotent per-case CLI to embed
  existing documents without re-extracting text

Validated locally on a 5-page response brief: render 0.31s, embed 8.32s,
hybrid merge surfaces image rows correctly. Production rollout starts
with flag=false (no behavior change), then per-case A/B.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-05-03 19:24:52 +00:00
4d1924c7e6 feat(halachot): auto-approve high-confidence halachot at insert
All checks were successful
Build & Deploy / build-and-deploy (push) Successful in 1m29s
Halachot extracted by halacha_extractor with confidence >= 0.80 are now
inserted with review_status='approved' instead of 'pending_review' —
they appear in search_precedent_library immediately. Halachot below the
threshold still require manual chair approval.

Threshold tunable via env (HALACHA_AUTO_APPROVE_THRESHOLD), defaults to
0.80. Rationale: 89% of historical extractions (356/400) score 0.80+,
spot-checks confirmed quality, and the manual review backlog was the
single biggest reason rerank-2 was returning passages-only on
ההבחנה-style queries.

After this change + the one-time backfill UPDATE, search now returns
9/10 halachot for "ההבחנה בין השבחה לפיצויים" instead of 0 — and the
top-3 are exact-match rules, not adjacent passages.

Reviewer field records "auto-approved (confidence ≥ X.XX)" with the
threshold value at insert time, for traceability.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-05-03 19:01:03 +00:00
26c3fddf41 feat(retrieval): add voyage rerank-2 cross-encoder stage (feature flag)
All checks were successful
Build & Deploy / build-and-deploy (push) Successful in 1m29s
Stage B of voyage-upgrades-plan rewritten: instead of context-3 (which
4 POCs showed inconsistent improvement), add a cross-encoder rerank
layer on top of voyage-3. Default off (VOYAGE_RERANK_ENABLED=false).

POC validation (785-doc corpus, 12 queries, claude-haiku-4-5 judge):
- mean@3 +4.5% (4.306 → 4.500)
- practical-category queries +11.6% (3.78 → 4.22)
- latency +702ms per query
- no schema change, no re-embed, no double storage

Plumbing:
- config: VOYAGE_RERANK_ENABLED / _MODEL / _FETCH_K env vars
- embeddings.voyage_rerank() wraps voyageai client.rerank
- services/rerank.py: maybe_rerank() helper — fetches FETCH_K candidates
  via the bi-encoder then reranks to top-K. Fail-open if Voyage rerank is
  unavailable.
- tools/search.py: search_decisions, search_case_documents,
  find_similar_cases all wrapped
- services/precedent_library.search_library wrapped

Smoke-tested locally with flag on/off — produces expected behaviour and
latency profile. Ready for production rollout via Coolify env flip after
deploy.

POCs (kept under scripts/ for reference):
- voyage_context3_poc{_long}.py — context-3 evaluation (rejected)
- voyage_multimodal_poc.py — multimodal-3 (stage C, deferred)
- voyage_rerank_judge_poc.py — single-case rerank benchmark
- voyage_rerank_corpus_poc.py — full-corpus rerank validation

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-05-03 18:43:41 +00:00
22e819363e Flatten cases directory structure and unify paths
- Remove cases/new|in-progress|completed subdivision (status managed in DB)
- Rename documents/original → documents/originals (consistent plural)
- Move exports from global data/exports/ into cases/{num}/exports/
- Add documents/research/ for case law and analysis files
- Update all agents, scripts, config, web API endpoints, and DB paths

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
2026-04-09 14:33:27 +00:00
6aaca14e31 Replace Claude Vision OCR with Google Cloud Vision
Benchmark results on Hebrew legal docs (case 1130-25):
- Google Vision: 1s/page, $0.001/page, high accuracy
- Claude Opus Vision: 90s/page, $0.05/page, poor accuracy
- PyMuPDF broken OCR layers now detected via quality check

Changes:
- extractor.py: Google Vision OCR with Hebrew language hint (300 DPI)
- extractor.py: text quality detection (word length, words-per-line, Hebrew ratio)
- extractor.py: Hebrew abbreviation quote fixer (15 known patterns)
- config.py: add GOOGLE_CLOUD_VISION_API_KEY, remove ANTHROPIC_API_KEY
- pyproject.toml: add google-cloud-vision, remove anthropic

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
2026-04-08 20:17:58 +00:00
bc72a83a71 Switch embedding model from voyage-3-large to voyage-law-2
Benchmark on case 1130-25 (4 Hebrew legal docs, 8 queries) showed:
- voyage-law-2: avg top-1 score 0.5839 (+27% over voyage-3-large)
- voyage-4-large: avg top-1 score 0.4119 (worse than current)
- voyage-3-large: avg top-1 score 0.4589 (baseline)

voyage-law-2 costs ~4.6x more per run but delivers significantly
better retrieval quality for Hebrew legal text. Model is now
configurable via VOYAGE_MODEL env var.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
2026-04-08 19:05:58 +00:00
5a8d5cac0a Add exports panel: versioned drafts, download, upload revisions, mark final
Export DOCX now saves to data/exports/{case_number}/ with auto-versioning
(טיוטה-v1, v2...). The case view UI shows all drafts with download buttons,
allows uploading revised versions (עריכה-v1...), and marking a version as
final (copies to training corpus for style learning).

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
2026-04-08 12:10:02 +00:00
5fc52ce530 Switch to cases/{new,in-progress,completed}/ directory structure
Replace single CASES_DIR with find_case_dir() that searches across
all status directories. New cases created in cases/new/{number}/.

Config: CASES_BASE, CASES_NEW, CASES_IN_PROGRESS, CASES_COMPLETED
Docker: added -v /home/chaim/legal-ai/cases:/cases volume mount

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
2026-04-04 10:45:47 +00:00
e725f9ecd7 Fix claims parsing: truncated JSON recovery + chunking + compact output
config.py parse_llm_json: Added truncated JSON recovery. When Claude's
output is cut mid-JSON (common with long claim lists), the parser now:
- Finds the last complete JSON item (closing "}")
- Closes the array/object brackets
- Returns partial but valid results instead of None
Tested: recovers 2/3 items from truncated array, all cases pass.

claims_extractor.py:
- Prompt asks for compact output (150 words max per claim, group similar)
- Explicitly requests "no markdown, no explanations, JSON only"
- Long documents split into chunks at paragraph boundaries
- Each chunk processed separately, results merged
- max_tokens already at 8192

This fixes the recurring "0 claims" bug for committee responses and
permit applicant responses where the JSON was getting truncated.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
2026-04-03 16:04:34 +00:00
e24e24dac5 Maximize context and output per Anthropic best practices
Per official Anthropic documentation (April 2026):

Output tokens increased to match model capabilities:
- block-yod (discussion): 8K → 32K (Opus supports 128K)
- block-zayin (claims): 4K → 16K
- block-vav (background): 4K → 16K
- claims_extractor: 4K → 8K (fixes truncated JSON)
- qa_validator: 4K → 8K

Source documents sent in full (not truncated):
- Was: 3000 chars per doc, 15K total
- Now: full document text, no truncation
- Reduces hallucinations: "extract word-for-word quotes first"

Prompt structure follows long-context tips:
- Source documents placed FIRST (top of prompt)
- Instructions and query placed LAST
- "Queries at the end improve quality by up to 30%"

Extended thinking uses adaptive mode for Opus 4.6.
Streaming enabled for all requests > 21K tokens.

Unified JSON parsing via parse_llm_json() helper in config.py.
Applied to: classifier, claims_extractor, brainstorm, qa_validator,
learning_loop (5 files).

Also: extractor.py now supports .md files.

Sources:
- https://docs.anthropic.com/en/docs/build-with-claude/extended-thinking
- https://docs.anthropic.com/en/docs/build-with-claude/prompt-engineering/long-context-tips
- https://docs.anthropic.com/en/docs/minimizing-hallucinations
- https://docs.anthropic.com/en/docs/about-claude/models/overview

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
2026-04-03 14:17:43 +00:00
d9e5ef0f46 Add full decision writing pipeline: classify, extract, brainstorm, write, QA, export
New services (11 files):
- classifier.py: auto doc-type classification + party identification (Claude Haiku)
- claims_extractor.py: claim extraction from pleadings (Claude Sonnet + regex)
- references_extractor.py: plan/case-law/legislation detection (regex)
- brainstorm.py: direction generation with 2-3 options (Claude Sonnet)
- block_writer.py: 12-block decision writer (template + Claude Sonnet/Opus)
- docx_exporter.py: DOCX export with David font, RTL, headings
- qa_validator.py: 6 QA checks with export blocking on critical failure
- learning_loop.py: draft vs final comparison + lesson extraction
- metrics.py: KPIs dashboard per case and global
- audit.py: action audit log
- cli.py: standalone CLI with 11 commands

Updated pipeline: extract → classify → chunk → embed → store → extract_references
New MCP tools: 29 total (was 16)
New DB tables: audit_log, decisions CRUD, claims CRUD
Config: Infisical support, external service allowlist

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
2026-04-03 10:21:47 +00:00
6f515dc2cb Initial commit: MCP server + web upload interface
Ezer Mishpati - AI legal decision drafting system with:
- MCP server (FastMCP) with document processing pipeline
- Web upload interface (FastAPI) for file upload and classification
- pgvector-based semantic search
- Hebrew legal document chunking and embedding
2026-03-23 12:33:07 +00:00