legal-ai

Author	SHA1	Message	Date
Chaim	92a2763b86	feat: add internal committee decisions corpus (source_kind='internal_committee') Three-layer separation: style learning (style_corpus), appeals-committee decisions (internal_committee), and court rulings (external_upload). - SCHEMA_V10: chair_name + district columns on case_law and cases, partial indexes - create_internal_committee_decision() DB upsert function - search_precedent_library_semantic() now accepts source_kind/district/chair_name params - search_precedent_library_hybrid() passes through new params - services/internal_decisions.py: ingest_internal_decision, migrate_from_style_corpus, migrate_from_external_corpus (identifies rows via source_type='appeals_committee') - search_internal_decisions() MCP tool (server.py + tools/search.py) - internal_decision_migrate() MCP admin tool - Web endpoints: POST /api/internal-decisions/upload, POST /api/internal-decisions/migrate, GET /api/internal-decisions - ingest_final_version auto-ingests finalized decisions into internal corpus - SKILL.md updated: agents now search internal + external in parallel, present separately Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>	2026-05-04 18:33:39 +00:00
Chaim	f6bb46dc4a	fix(retrieval): restore _base(limit=) contract in hybrid precedent search `rerank.maybe_rerank` calls `base_search(limit=…, **base_kwargs)` on both the rerank-on and rerank-off paths. Commit `242f668` moved the closure into hybrid_search.py and renamed its parameter to `limit_inner`, so every call to `/api/precedent-library/search` raised a TypeError (HTTP 500) regardless of the VOYAGE_RERANK_ENABLED flag. Sibling `search_documents_hybrid` was unaffected because it uses `lambda **kw:`, which absorbs the kwarg. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>	2026-05-04 05:19:53 +00:00
Chaim	c31fe0866b	fix(retrieval): switch hybrid merge to Reciprocal Rank Fusion (RRF) Cosine scores in voyage-3 (~0.4-0.5) and voyage-multimodal-3 (~0.2-0.25) live on different scales. The previous weighted-sum merge let text always dominate. Verified empirically: 0 image-only hits across 7 queries on case 8174-24, so the image side contributed nothing. RRF combines by rank in each list rather than by raw score, which makes it robust to scale differences. Per-item score: rrf_score = text_weight / (k + text_rank) + image_weight / (k + image_rank) A row that appears in both lists (joined on (id_field, page_number)) gets both terms and is surfaced as match_type='text+image'. After the fix on 8174-24 (146 image rows): 2 image-only hits land in top-5 across all 7 test queries, surfacing actual table/diagram/signature pages (p12, p13 of the respondent's appraisal (שומת המשיבה) for 'appraisal value comparison table' ('טבלת השוואת ערכי שומה'), p25 of the objection appraisal (שומת השגה) for 'block and parcel diagram' ('תרשים גוש וחלקה'), etc). On 8137-24 (273 image rows): 'capitalization calculation of the lease fees' ('חישוב היוון של דמי החכירה') goes from 0 baseline results → 5 hybrid results (3 text + 2 image), opening recall on scanned content the OCR layer misses. Default MULTIMODAL_TEXT_WEIGHT 0.65 → 0.5 (vanilla RRF) since the prior 0.65 was tuned for raw cosine scales that no longer apply. New env knob MULTIMODAL_RRF_K (default 60, the standard value in the literature). Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>	2026-05-03 19:39:31 +00:00
Chaim	242f668319	feat(retrieval): add voyage-multimodal-3 page-image embeddings (feature flag) Stage C: per-page image embeddings via voyage-multimodal-3 + hybrid text+image search. Off by default; enable with MULTIMODAL_ENABLED=true. - Schema V9: document_image_embeddings + precedent_image_embeddings (vector(1024), page_number, image_thumbnail_path) - extractor.render_pages_for_multimodal renders PDF pages at MULTIMODAL_DPI (144) for embedding + JPEG thumbnails at MULTIMODAL_THUMB_DPI (96) for UI preview, in one pass - embeddings.embed_images calls voyage-multimodal-3 in 50-page batches - services/hybrid_search.py orchestrator: rerank applied to text side first (rerank-2 is text-only); image side cosine; weighted merge with text_weight 0.65 (env-tunable); image-only pages surface as match_type='image' so dense scanned content still appears - processor.process_document and precedent_library.ingest_precedent gated by flag, non-fatal on multimodal failure - scripts/multimodal_backfill.py: idempotent per-case CLI to embed existing documents without re-extracting text Validated locally on a 5-page response brief: render 0.31s, embed 8.32s, hybrid merge surfaces image rows correctly. Production rollout starts with flag=false (no behavior change), then per-case A/B. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>	2026-05-03 19:24:52 +00:00
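
The per-item RRF score described in commit c31fe0866b can be illustrated with a minimal sketch. This is not the repository's code: the function name `rrf_merge`, the 1-based ranks, and the choice `image_weight = 1 - text_weight` are assumptions made for illustration (the commit only names MULTIMODAL_TEXT_WEIGHT and MULTIMODAL_RRF_K). Rows are keyed by `(id, page_number)` as in the commit message.

```python
from typing import Dict, Hashable, List, Sequence, Tuple


def rrf_merge(
    text_ranked: Sequence[Hashable],
    image_ranked: Sequence[Hashable],
    text_weight: float = 0.5,  # mirrors the MULTIMODAL_TEXT_WEIGHT default
    k: int = 60,               # mirrors the MULTIMODAL_RRF_K default
) -> List[Tuple[Hashable, float, str]]:
    """Merge two ranked key lists by weighted Reciprocal Rank Fusion.

    Each list is ordered best-first. Returns (key, rrf_score, match_type)
    tuples sorted by descending score.
    """
    image_weight = 1.0 - text_weight  # assumption: the two weights sum to 1
    scores: Dict[Hashable, float] = {}
    sources: Dict[Hashable, List[str]] = {}

    for weight, label, ranked in (
        (text_weight, "text", text_ranked),
        (image_weight, "image", image_ranked),
    ):
        # Assumption: ranks start at 1, so the best item contributes weight / (k + 1).
        for rank, key in enumerate(ranked, start=1):
            scores[key] = scores.get(key, 0.0) + weight / (k + rank)
            sources.setdefault(key, []).append(label)

    merged = [(key, scores[key], "+".join(sources[key])) for key in scores]
    merged.sort(key=lambda row: row[1], reverse=True)
    return merged


# Hypothetical ranked lists of (id, page_number) keys.
text_hits = [("doc-1", 3), ("doc-2", 1)]
image_hits = [("doc-2", 1), ("doc-3", 12)]
for key, score, match_type in rrf_merge(text_hits, image_hits):
    print(key, round(score, 5), match_type)
```

Because only ranks enter the score, the differing cosine scales of the two embedding models no longer matter; a row present in both lists collects both terms and sorts ahead of rows found on one side only.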

4 Commits
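
The keyword-argument contract restored in commit f6bb46dc4a can be reproduced in isolation. The sketch below is hypothetical throughout: `maybe_rerank` here is a stand-in that only forwards its arguments, and the three callables are simplified imitations of the closures the commit message describes, assuming the caller passes `limit` by keyword together with `**base_kwargs`.

```python
def maybe_rerank(base_search, limit, **base_kwargs):
    """Stand-in for the reranker entry point: always passes `limit` by keyword."""
    return base_search(limit=limit, **base_kwargs)


def base_renamed(limit_inner, **kwargs):
    """Imitates the regression: the parameter name no longer matches the caller."""
    return [f"hit-{i}" for i in range(limit_inner)]


def base_fixed(limit, **kwargs):
    """Imitates the fix: the parameter is named `limit` again."""
    return [f"hit-{i}" for i in range(limit)]


# Imitates the unaffected sibling: **kw absorbs `limit` along with everything else.
base_absorbing = lambda **kw: [f"hit-{i}" for i in range(kw["limit"])]


try:
    maybe_rerank(base_renamed, limit=3, query="example")
except TypeError as exc:
    # `limit` lands in **kwargs while `limit_inner` stays unfilled.
    print("TypeError:", exc)

print(maybe_rerank(base_fixed, limit=3, query="example"))
print(maybe_rerank(base_absorbing, limit=3, query="example"))
```

A callable that is invoked by keyword has its parameter names as part of its interface, which is why a rename inside the closure broke every caller while the `**kw` lambda kept working.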