Cosine scores in voyage-3 (~0.4-0.5) and voyage-multimodal-3
(~0.2-0.25) live on different scales. The previous weighted-sum
merge let text always dominate. Verified empirically on case 8174-24:
0 image-only hits across 7 queries; the image side contributed nothing.
Reciprocal Rank Fusion (RRF) combines results by their *rank* in each
list rather than by raw score, so it is robust to scale differences.
Per-item score:
rrf_score = text_weight / (k + text_rank)
+ image_weight / (k + image_rank)
A row that appears in both lists (joined on (id_field, page_number))
gets both terms and is surfaced as match_type='text+image'.
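A minimal sketch of the fusion formula above, assuming each side arrives as a best-first list of (id_field, page_number) keys and that ranks are 1-based. The name rrf_merge and the 'text' label for text-only rows are illustrative assumptions, not taken from services/hybrid_search.py.

```python
from collections import defaultdict
from typing import Hashable, List, Sequence, Tuple

Key = Tuple[Hashable, int]  # (id_field, page_number)


def rrf_merge(
    text_hits: Sequence[Key],
    image_hits: Sequence[Key],
    text_weight: float = 0.5,
    image_weight: float = 0.5,
    k: int = 60,
) -> List[Tuple[Key, float, str]]:
    """Fuse two best-first result lists by weighted reciprocal rank.

    Raw cosine scores are never used, so the ~0.4-0.5 text scale and the
    ~0.2-0.25 image scale cannot bias the merge.
    """
    scores = defaultdict(float)
    sources = defaultdict(set)
    for rank, key in enumerate(text_hits, start=1):  # ranks are 1-based
        scores[key] += text_weight / (k + rank)
        sources[key].add("text")
    for rank, key in enumerate(image_hits, start=1):
        scores[key] += image_weight / (k + rank)
        sources[key].add("image")

    def match_type(key: Key) -> str:
        # Rows found by both searches carry both terms and are labeled 'text+image'.
        return "text+image" if len(sources[key]) == 2 else next(iter(sources[key]))

    ranked = sorted(scores, key=scores.__getitem__, reverse=True)
    return [(key, scores[key], match_type(key)) for key in ranked]
```

With the defaults, a page ranked 1st by text and 2nd by image scores 0.5/61 + 0.5/62, which beats any page found by only one side, even at rank 1 (0.5/61).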
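After the fix on 8174-24 (146 image rows): 2 image-only hits land in the
top-5 across all 7 test queries, surfacing actual table, diagram, and
signature pages (p12 and p13 of שומת המשיבה [respondent's appraisal] for
'טבלת השוואת ערכי שומה' [appraisal value comparison table], p25 of
שומת השגה [objection appraisal] for 'תרשים גוש וחלקה' [block and parcel
diagram], etc.).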
On 8137-24 (273 image rows): 'חישוב היוון של דמי החכירה' [capitalization
calculation of the lease fees] goes from 0 baseline results → 5 hybrid
results (3 text + 2 image), opening recall on scanned content that the
OCR layer misses.
Default MULTIMODAL_TEXT_WEIGHT changed 0.65 → 0.5 (vanilla RRF), since
the prior 0.65 was tuned for raw cosine scales that no longer apply.
New env knob MULTIMODAL_RRF_K (default 60, the standard value in the RRF literature).
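A hedged sketch of how the two knobs might be read, assuming plain environment lookups. load_rrf_config is a hypothetical helper, and deriving the image weight as the complement of the text weight is an assumption: the commit does not say how image_weight is set.

```python
import os
from typing import Mapping, Optional


def load_rrf_config(env: Optional[Mapping[str, str]] = None) -> dict:
    """Read the fusion knobs from the environment, using the defaults stated above."""
    env = os.environ if env is None else env
    text_weight = float(env.get("MULTIMODAL_TEXT_WEIGHT", "0.5"))
    k = int(env.get("MULTIMODAL_RRF_K", "60"))
    if not 0.0 <= text_weight <= 1.0:
        raise ValueError(f"MULTIMODAL_TEXT_WEIGHT must be in [0, 1], got {text_weight}")
    if k < 1:
        raise ValueError(f"MULTIMODAL_RRF_K must be >= 1, got {k}")
    # Assumption: image weight is the complement of the text weight, so 0.5
    # gives equal weighting (vanilla RRF up to a constant factor).
    return {"text_weight": text_weight, "image_weight": 1.0 - text_weight, "k": k}
```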
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Stage C: per-page image embeddings via voyage-multimodal-3 + hybrid
text+image search. Off by default; enable with MULTIMODAL_ENABLED=true.
- Schema V9: document_image_embeddings + precedent_image_embeddings
(vector(1024), page_number, image_thumbnail_path)
- extractor.render_pages_for_multimodal renders PDF pages at
MULTIMODAL_DPI (144) for embedding and JPEG thumbnails at
MULTIMODAL_THUMB_DPI (96) for UI preview, in one pass
- embeddings.embed_images calls voyage-multimodal-3 in 50-page batches
- services/hybrid_search.py orchestrator: rerank is applied to the text
side first (rerank-2 is text-only); the image side is ranked by cosine
similarity; the two are combined by a weighted merge with text_weight
0.65 (env-tunable); image-only pages surface as match_type='image' so
dense scanned content still appears
- processor.process_document and precedent_library.ingest_precedent
are gated by the flag; a multimodal failure is non-fatal
- scripts/multimodal_backfill.py — idempotent per-case CLI to embed
existing documents without re-extracting text
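The 50-page batching and the flag-gated, non-fatal behavior from the list above can be sketched as follows. embed_batch stands in for whatever wraps the voyage-multimodal-3 call (its real signature is not shown here), and embed_pages / maybe_embed_pages are hypothetical names.

```python
import logging
from typing import Any, Callable, Iterator, List, Optional, Sequence

logger = logging.getLogger(__name__)

# Hypothetical signature: takes up to batch_size page images, returns one vector per page.
EmbedBatch = Callable[[Sequence[Any]], List[List[float]]]


def batched(items: Sequence[Any], size: int) -> Iterator[Sequence[Any]]:
    """Yield consecutive slices of at most `size` items."""
    for start in range(0, len(items), size):
        yield items[start:start + size]


def embed_pages(pages: Sequence[Any], embed_batch: EmbedBatch, batch_size: int = 50) -> List[List[float]]:
    """Embed rendered pages in fixed-size batches, preserving page order."""
    vectors: List[List[float]] = []
    for batch in batched(pages, batch_size):
        result = embed_batch(batch)
        if len(result) != len(batch):
            raise ValueError(f"expected {len(batch)} vectors, got {len(result)}")
        vectors.extend(result)
    return vectors


def maybe_embed_pages(
    pages: Sequence[Any], embed_batch: EmbedBatch, enabled: bool, batch_size: int = 50
) -> Optional[List[List[float]]]:
    """Return None when the flag is off or when any multimodal step fails."""
    if not enabled:
        return None
    try:
        return embed_pages(pages, embed_batch, batch_size)
    except Exception:
        # Text processing must continue even if image embedding fails.
        logger.exception("multimodal embedding failed; continuing with text only")
        return None
```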
Validated locally on a 5-page response brief: render 0.31s, embed 8.32s,
hybrid merge surfaces image rows correctly. Production rollout starts
with flag=false (no behavior change), then per-case A/B.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Halachot extracted by halacha_extractor with confidence >= 0.80 are now
inserted with review_status='approved' instead of 'pending_review' —
they appear in search_precedent_library immediately. Halachot below the
threshold still require manual chair approval.
The threshold is tunable via env (HALACHA_AUTO_APPROVE_THRESHOLD) and
defaults to 0.80. Rationale: 89% of historical extractions (356/400)
score 0.80+, spot-checks confirmed quality, and the manual review backlog
was the single biggest reason rerank-2 returned passages only on
ההבחנה-style ("the distinction between...") queries.
After this change and the one-time backfill UPDATE, search now returns
9/10 halachot for "ההבחנה בין השבחה לפיצויים" ("the distinction between
betterment and compensation") instead of 0, and the top-3 are exact-match
rules, not adjacent passages.
Reviewer field records "auto-approved (confidence ≥ X.XX)" with the
threshold value at insert time, for traceability.
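The approval rule and the reviewer note can be sketched as below. auto_approve_threshold and review_fields are hypothetical helpers; leaving reviewer empty for below-threshold rows is an assumption.

```python
import os
from typing import Mapping, Optional

DEFAULT_AUTO_APPROVE_THRESHOLD = 0.80


def auto_approve_threshold(env: Optional[Mapping[str, str]] = None) -> float:
    """HALACHA_AUTO_APPROVE_THRESHOLD if set and non-empty, else 0.80."""
    env = os.environ if env is None else env
    raw = env.get("HALACHA_AUTO_APPROVE_THRESHOLD")
    return float(raw) if raw else DEFAULT_AUTO_APPROVE_THRESHOLD


def review_fields(confidence: float, threshold: float) -> dict:
    """Choose review_status and reviewer for a newly extracted halacha."""
    if confidence >= threshold:
        return {
            "review_status": "approved",
            # Record the threshold in force at insert time, for traceability.
            "reviewer": f"auto-approved (confidence ≥ {threshold:.2f})",
        }
    # Below threshold: left for manual chair approval (empty reviewer is an assumption).
    return {"review_status": "pending_review", "reviewer": None}
```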
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
- Remove cases/new|in-progress|completed subdivision (status managed in DB)
- Rename documents/original → documents/originals (consistent plural)
- Move exports from global data/exports/ into cases/{num}/exports/
- Add documents/research/ for case law and analysis files
- Update all agents, scripts, config, web API endpoints, and DB paths
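Assuming every per-case path hangs off cases/{num}/, the new layout resolves as below. case_paths is a hypothetical helper, and placing research/ under each case's documents/ directory is an inference from the list above.

```python
from pathlib import Path
from typing import Dict


def case_paths(cases_base: Path, case_number: str) -> Dict[str, Path]:
    """Per-case directories under the flattened layout (no status subdirectories)."""
    case_dir = cases_base / case_number
    documents = case_dir / "documents"
    return {
        "case": case_dir,
        "originals": documents / "originals",  # renamed from documents/original
        "research": documents / "research",    # case law and analysis files
        "exports": case_dir / "exports",       # moved from global data/exports/
    }
```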
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Benchmark on case 1130-25 (4 Hebrew legal docs, 8 queries) showed:
- voyage-law-2: avg top-1 score 0.5839 (+27% over voyage-3-large)
- voyage-4-large: avg top-1 score 0.4119 (worse than current)
- voyage-3-large: avg top-1 score 0.4589 (baseline)
voyage-law-2 costs ~4.6x more per run but delivers significantly
better retrieval quality for Hebrew legal text. Model is now
configurable via VOYAGE_MODEL env var.
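A sketch of env-driven model selection, assuming the voyageai Python SDK's `Client.embed` call. The fallback to voyage-law-2 is an assumption based on the benchmark above; the commit does not state the default.

```python
import os
from typing import List, Mapping, Optional

# Assumption: fallback follows the benchmark winner; the real default may differ.
DEFAULT_VOYAGE_MODEL = "voyage-law-2"


def resolve_voyage_model(env: Optional[Mapping[str, str]] = None) -> str:
    """Return VOYAGE_MODEL if set and non-blank, else the default."""
    env = os.environ if env is None else env
    return env.get("VOYAGE_MODEL", "").strip() or DEFAULT_VOYAGE_MODEL


def embed_documents(texts: List[str]) -> List[List[float]]:
    """Embed document chunks with the configured model (needs the SDK and an API key)."""
    import voyageai  # imported lazily so resolve_voyage_model works without the SDK

    client = voyageai.Client()  # reads VOYAGE_API_KEY from the environment
    result = client.embed(texts, model=resolve_voyage_model(), input_type="document")
    return result.embeddings
```

Switching between the benchmarked models is then a matter of setting VOYAGE_MODEL, with no code change.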
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Export DOCX now saves to data/exports/{case_number}/ with auto-versioning
(טיוטה-v1, v2...; טיוטה = "draft"). The case view UI lists all drafts with
download buttons, allows uploading revised versions (עריכה-v1...; עריכה =
"revision"), and marking a version as final (which copies it to the
training corpus for style learning).
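A sketch of the auto-versioning rule, assuming files are named {prefix}-v{N}.docx directly inside the case's export directory (real names may carry more, such as the case number). next_version_path and mark_final are hypothetical helpers, and the training corpus location is passed in rather than guessed.

```python
import re
import shutil
from pathlib import Path


def next_version_path(export_dir: Path, prefix: str = "טיוטה", suffix: str = ".docx") -> Path:
    """Return export_dir/{prefix}-v{N+1}{suffix}, where N is the highest existing version."""
    pattern = re.compile(rf"^{re.escape(prefix)}-v(\d+){re.escape(suffix)}$")
    versions = []
    if export_dir.is_dir():
        for path in export_dir.iterdir():
            match = pattern.match(path.name)
            if match:
                versions.append(int(match.group(1)))
    return export_dir / f"{prefix}-v{max(versions, default=0) + 1}{suffix}"


def mark_final(version_path: Path, training_dir: Path) -> Path:
    """Copy the chosen version into the training corpus directory."""
    training_dir.mkdir(parents=True, exist_ok=True)
    return Path(shutil.copy2(version_path, training_dir / version_path.name))
```

Drafts and uploaded revisions keep separate counters because the prefix is part of the match.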
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Replace the single CASES_DIR with find_case_dir(), which searches across
all status directories. New cases are created in cases/new/{number}/.
Config: CASES_BASE, CASES_NEW, CASES_IN_PROGRESS, CASES_COMPLETED
Docker: added -v /home/chaim/legal-ai/cases:/cases volume mount
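find_case_dir() might look like the sketch below, assuming the status directories are named new, in-progress, and completed and are searched in that order. The signature and the create_case_dir helper are assumptions.

```python
from pathlib import Path
from typing import Optional

# Status subdirectories under CASES_BASE (CASES_NEW, CASES_IN_PROGRESS, CASES_COMPLETED).
STATUS_DIRS = ("new", "in-progress", "completed")


def find_case_dir(cases_base: Path, case_number: str) -> Optional[Path]:
    """Return the first status directory that contains this case, or None."""
    for status in STATUS_DIRS:
        candidate = cases_base / status / case_number
        if candidate.is_dir():
            return candidate
    return None


def create_case_dir(cases_base: Path, case_number: str) -> Path:
    """New cases always start under cases/new/{number}/."""
    case_dir = cases_base / "new" / case_number
    case_dir.mkdir(parents=True, exist_ok=True)
    return case_dir
```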
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
config.py parse_llm_json: Added truncated JSON recovery. When Claude's
output is cut mid-JSON (common with long claim lists), the parser now:
- Finds the last complete JSON item (closing "}")
- Closes the array/object brackets
- Returns partial but valid results instead of None
Tested: recovers 2/3 items from truncated array, all cases pass.
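The recovery steps can be sketched as below, assuming plain JSON text with no markdown fences. recover_llm_json and _closing_brackets are illustrative names, not the config.py code. The sketch trims back to the last "}" whose prefix can be closed into valid JSON, then appends the missing "]" and "}" closers, ignoring brackets that appear inside strings.

```python
import json
from typing import Any, Optional


def _closing_brackets(prefix: str) -> str:
    """Closers needed to balance `prefix`, skipping brackets inside JSON strings."""
    stack = []
    in_string = escaped = False
    for ch in prefix:
        if in_string:
            if escaped:
                escaped = False
            elif ch == "\\":
                escaped = True
            elif ch == '"':
                in_string = False
        elif ch == '"':
            in_string = True
        elif ch == "[":
            stack.append("]")
        elif ch == "{":
            stack.append("}")
        elif ch in "]}" and stack:
            stack.pop()
    return "".join(reversed(stack))


def recover_llm_json(text: str) -> Optional[Any]:
    """Parse JSON; if it was cut off, keep everything up to the last complete item."""
    text = text.strip()
    try:
        return json.loads(text)
    except json.JSONDecodeError:
        pass
    # Try each "}" from the end backwards until prefix + closers parses.
    end = text.rfind("}")
    while end != -1:
        prefix = text[: end + 1]
        try:
            return json.loads(prefix + _closing_brackets(prefix))
        except json.JSONDecodeError:
            end = text.rfind("}", 0, end)
    return None
```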
claims_extractor.py:
- Prompt asks for compact output (150 words max per claim, group similar claims)
- Explicitly requests "no markdown, no explanations, JSON only"
- Long documents split into chunks at paragraph boundaries
- Each chunk processed separately, results merged
- max_tokens already at 8192
This fixes the recurring "0 claims" bug for committee responses and
permit applicant responses where the JSON was getting truncated.
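The chunk-and-merge flow can be sketched as below, assuming paragraphs are separated by blank lines and a character budget stands in for the real size limit. max_chars=12000, plain concatenation as the merge step, and extract_chunk (which would call Claude and parse its JSON) are all assumptions.

```python
from typing import Callable, List


def split_into_chunks(text: str, max_chars: int = 12000) -> List[str]:
    """Group blank-line-separated paragraphs into chunks of at most max_chars.

    A single paragraph longer than max_chars becomes its own (oversized)
    chunk rather than being cut mid-paragraph.
    """
    paragraphs = [p.strip() for p in text.split("\n\n") if p.strip()]
    chunks: List[str] = []
    current: List[str] = []
    size = 0
    for para in paragraphs:
        added = len(para) + (2 if current else 0)  # 2 for the "\n\n" joiner
        if current and size + added > max_chars:
            chunks.append("\n\n".join(current))
            current, size, added = [], 0, len(para)
        current.append(para)
        size += added
    if current:
        chunks.append("\n\n".join(current))
    return chunks


def extract_claims_chunked(text: str, extract_chunk: Callable[[str], list], max_chars: int = 12000) -> list:
    """Run the extractor on each chunk and concatenate the claim lists."""
    claims: list = []
    for chunk in split_into_chunks(text, max_chars):
        claims.extend(extract_chunk(chunk) or [])
    return claims
```

Smaller chunks keep each model response short enough to avoid the truncation described above.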
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Ezer Mishpati - AI legal decision drafting system with:
- MCP server (FastMCP) with document processing pipeline
- Web upload interface (FastAPI) for file upload and classification
- pgvector-based semantic search
- Hebrew legal document chunking and embedding