ezer-mishpati/legal-ai

Chaim c31fe0866b

fix(retrieval): switch hybrid merge to Reciprocal Rank Fusion (RRF)

Cosine scores from voyage-3 (~0.4-0.5) and voyage-multimodal-3
(~0.2-0.25) live on different scales, so the previous weighted-sum
merge let text always dominate. Verified empirically: 0 image-only
hits across 7 queries on case 8174-24; the image side contributed
nothing.
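
A minimal sketch of why the weighted sum starves the image side. It assumes the old merge multiplied each raw cosine score by its weight and that image_weight = 1 - text_weight; neither detail is confirmed by this message. The scores are made-up values inside the ranges quoted above, and weighted_sum_merge and the row keys are hypothetical names.

```python
def weighted_sum_merge(text_hits, image_hits, text_weight=0.65):
    """Rank hits by weight * raw cosine score (assumed form of the old merge)."""
    image_weight = 1.0 - text_weight  # assumption: the two weights sum to 1
    scored = [(text_weight * score, "text", key) for key, score in text_hits]
    scored += [(image_weight * score, "image", key) for key, score in image_hits]
    # Highest weighted score first.
    return sorted(scored, reverse=True)


# Illustrative scores only: text in ~0.4-0.5, image in ~0.2-0.25.
text_hits = [
    (("doc-a", 1), 0.50),
    (("doc-a", 2), 0.47),
    (("doc-b", 4), 0.44),
    (("doc-b", 7), 0.42),
    (("doc-c", 3), 0.40),
]
image_hits = [
    (("doc-a", 12), 0.25),
    (("doc-a", 13), 0.23),
    (("doc-c", 25), 0.20),
]

merged = weighted_sum_merge(text_hits, image_hits)
top5_kinds = [kind for _, kind, _ in merged[:5]]
print(top5_kinds)
```

With these numbers the weakest text hit scores 0.65 * 0.40 = 0.26 while the strongest image hit scores 0.35 * 0.25 = 0.0875, so no image row can reach the top 5 whenever at least five text rows exist.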

RRF combines by *rank* in each list rather than raw score, which
makes it robust to scale differences. Per-item score:

    rrf_score = text_weight / (k + text_rank)
              + image_weight / (k + image_rank)

A row that appears in both lists (joined on (id_field, page_number))
gets both terms and is surfaced as match_type='text+image'.
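
The scoring rule above can be sketched as follows. This is an illustration under stated assumptions, not the code in src/legal_mcp: ranks are taken to start at 1, image_weight is taken to be 1 - text_weight, and rrf_merge plus the dict layout are hypothetical. Each input is a list of (id_field, page_number) keys already sorted best-first.

```python
from collections import defaultdict


def rrf_merge(text_keys, image_keys, text_weight=0.5, k=60):
    """Merge two ranked key lists with weighted Reciprocal Rank Fusion."""
    image_weight = 1.0 - text_weight  # assumption: weights sum to 1
    scores = defaultdict(float)
    sources = defaultdict(set)

    # Ranks are 1-based here (an assumption; the message does not say).
    for rank, key in enumerate(text_keys, start=1):
        scores[key] += text_weight / (k + rank)
        sources[key].add("text")
    for rank, key in enumerate(image_keys, start=1):
        scores[key] += image_weight / (k + rank)
        sources[key].add("image")

    merged = []
    for key, score in scores.items():
        # Yields 'text', 'image', or 'text+image' for rows in both lists.
        match_type = "+".join(s for s in ("text", "image") if s in sources[key])
        merged.append({"key": key, "rrf_score": score, "match_type": match_type})
    merged.sort(key=lambda row: row["rrf_score"], reverse=True)
    return merged


# Hypothetical keys: ("a", 2) appears in both lists at rank 2.
text_keys = [("a", 1), ("a", 2), ("b", 4)]
image_keys = [("a", 12), ("a", 2), ("c", 25)]
merged = rrf_merge(text_keys, image_keys)
for row in merged:
    print(row["key"], row["match_type"], round(row["rrf_score"], 5))
```

Because only ranks enter the score, the top image hit earns the same term as the top text hit when the weights are equal, regardless of how low its cosine score is.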

After the fix on 8174-24 (146 image rows): 2 image-only hits land in
top-5 across all 7 test queries, surfacing actual table/diagram/
signature pages (p12, p13 of שומת המשיבה [the respondent's appraisal]
for 'טבלת השוואת ערכי שומה' [appraisal value comparison table],
p25 of שומת השגה [objection appraisal] for 'תרשים גוש וחלקה'
[block and parcel diagram], etc).

On 8137-24 (273 image rows): 'חישוב היוון של דמי החכירה'
[capitalization calculation of the lease fees] goes from
0 baseline results → 5 hybrid results (3 text + 2 image), opening
recall on scanned content the OCR layer misses.

Default MULTIMODAL_TEXT_WEIGHT changes 0.65 → 0.5 (vanilla,
equal-weight RRF) since the prior 0.65 was tuned for raw cosine
scales that no longer apply. New env knob MULTIMODAL_RRF_K
(default 60, the value commonly used in the RRF literature).
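
A hedged sketch of how the two knobs might be read. Only the variable names and defaults come from this message; load_rrf_settings, the env parameter, and the range checks are assumptions added for illustration.

```python
import os


def load_rrf_settings(env=None):
    """Return (text_weight, k) from the environment, using the new defaults."""
    env = os.environ if env is None else env
    text_weight = float(env.get("MULTIMODAL_TEXT_WEIGHT", "0.5"))
    k = int(env.get("MULTIMODAL_RRF_K", "60"))
    # Validation below is an assumption, not described in the commit.
    if not 0.0 <= text_weight <= 1.0:
        raise ValueError("MULTIMODAL_TEXT_WEIGHT must be between 0 and 1")
    if k <= 0:
        raise ValueError("MULTIMODAL_RRF_K must be a positive integer")
    return text_weight, k


# Passing a plain dict keeps the example independent of the real environment.
defaults = load_rrf_settings({})
overridden = load_rrf_settings(
    {"MULTIMODAL_TEXT_WEIGHT": "0.65", "MULTIMODAL_RRF_K": "10"}
)
print(defaults, overridden)
```

A smaller k sharpens the gap between adjacent ranks, while a larger k flattens it; 60 keeps the top few ranks close enough that a hit ranked well in either list stays competitive.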

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

2026-05-03 19:39:31 +00:00

src/legal_mcp

fix(retrieval): switch hybrid merge to Reciprocal Rank Fusion (RRF)

2026-05-03 19:39:31 +00:00

tests

DOCX exporter: 3-layer RTL + David font on all slots

2026-04-28 17:37:52 +00:00

pyproject.toml

refactor(precedents): keep all LLM calls on the local-MCP path

2026-05-03 11:06:08 +00:00