chore(#15): adopt MULTIMODAL_TEXT_WEIGHT=0.65 + close #15, open #80 #46

chaim · 2026-06-03T08:45:23Z

## Background

An objective A/B evaluation (`eval_retrieval.py`, 86 gold-set queries) for #15. The original premise is stale: multimodal is already the production default (110 documents), not a 2-case A/B experiment.
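For readers unfamiliar with the metrics reported in this PR (R@5, nDCG@5, MRR), the sketch below shows one common way to compute them for a single query with binary relevance. It is an illustration only, not the contents of `eval_retrieval.py`; the function names and the binary-relevance assumption are hypothetical.

```python
import math
from typing import Sequence, Set


def recall_at_k(ranked: Sequence[str], relevant: Set[str], k: int = 5) -> float:
    """Fraction of the relevant ids that appear in the top-k results."""
    if not relevant:
        return 0.0
    hits = sum(1 for doc_id in ranked[:k] if doc_id in relevant)
    return hits / len(relevant)


def reciprocal_rank(ranked: Sequence[str], relevant: Set[str]) -> float:
    """1 / rank of the first relevant result, or 0.0 if none is retrieved."""
    for rank, doc_id in enumerate(ranked, start=1):
        if doc_id in relevant:
            return 1.0 / rank
    return 0.0


def ndcg_at_k(ranked: Sequence[str], relevant: Set[str], k: int = 5) -> float:
    """nDCG@k with binary gains (assumes ids in `ranked` are unique)."""
    dcg = sum(
        1.0 / math.log2(rank + 1)
        for rank, doc_id in enumerate(ranked[:k], start=1)
        if doc_id in relevant
    )
    ideal_hits = min(len(relevant), k)
    idcg = sum(1.0 / math.log2(rank + 1) for rank in range(1, ideal_hits + 1))
    return dcg / idcg if idcg else 0.0
```

Averaging each function over all gold-set queries gives the per-configuration numbers; MRR is the mean of `reciprocal_rank` across queries.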

## Finding

The default weight of 0.5 was **mis-tuned**: the image side was weighted too heavily, cutting `precedent_library` recall from 0.971 to 0.885. Sweep over 0.5→0.75:

| weight | R@5 | nDCG@5 | MRR |
|---|:---:|:---:|:---:|
| OFF (text-only) | 0.989 | 0.944 | 0.936 |
| 0.5 (current) | 0.957 ❌ | 0.936 | 0.955 |
| **0.65** | **0.994** | **0.960** | **0.954** |

At 0.65, multimodal beats text-only **on every metric and on every corpus**. Dafna approved.
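To make "the image side was too heavy" concrete, here is a minimal sketch of how a text weight could trade off the two score sources. It assumes a simple convex combination in which `MULTIMODAL_TEXT_WEIGHT` is the text share and the image share is its complement; the PR does not state how the retriever actually fuses scores, so `fuse_scores`, `top_k`, and the sample scores are hypothetical.

```python
from typing import Dict, List


def fuse_scores(
    text_scores: Dict[str, float],
    image_scores: Dict[str, float],
    text_weight: float = 0.65,
) -> Dict[str, float]:
    """Blend per-document scores: text_weight * text + (1 - text_weight) * image.

    Assumed fusion rule for illustration; a missing score counts as 0.0.
    """
    if not 0.0 <= text_weight <= 1.0:
        raise ValueError("text_weight must be in [0, 1]")
    doc_ids = set(text_scores) | set(image_scores)
    return {
        doc_id: text_weight * text_scores.get(doc_id, 0.0)
        + (1.0 - text_weight) * image_scores.get(doc_id, 0.0)
        for doc_id in doc_ids
    }


def top_k(fused: Dict[str, float], k: int = 5) -> List[str]:
    """Ids sorted by descending fused score, ties broken by id."""
    ordered = sorted(fused.items(), key=lambda item: (-item[1], item[0]))
    return [doc_id for doc_id, _ in ordered[:k]]


# Made-up scores: "A" is the strong text match, "B" the strong image match.
text = {"A": 0.9, "B": 0.5}
image = {"A": 0.2, "B": 0.8}
ranking_at_050 = top_k(fuse_scores(text, image, text_weight=0.5))
ranking_at_065 = top_k(fuse_scores(text, image, text_weight=0.65))
```

With these made-up scores, the image-favored document ranks first at 0.5, while the text-favored one ranks first at 0.65. Under this assumed rule, a weight of 1.0 reduces to text-only ranking.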

## Done

- `MULTIMODAL_TEXT_WEIGHT=0.65` set in the Coolify env (runtime) + redeploy.
- `baseline.json` updated to the 0.65 config.
- **#15 → done** (the win was tuning the weight, not the backfill).
- **#80 opened**: the backfill of the 140 legacy cases is deferred until a targeted image-answer value check (not tested here).
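The two config-related items above (a runtime env var and a baseline used as a regression reference) could be wired roughly as follows. This is a sketch under stated assumptions: the helper names, the fallback default of 0.5, the tolerance, and the flat metric-name-to-value shape of the baseline are all hypothetical, since the PR does not show the service code or the `baseline.json` schema.

```python
import os
from typing import Dict, Mapping, Optional, Tuple


def read_text_weight(
    env: Optional[Mapping[str, str]] = None, default: float = 0.5
) -> float:
    """Read MULTIMODAL_TEXT_WEIGHT from the environment and validate it."""
    env = os.environ if env is None else env
    raw = env.get("MULTIMODAL_TEXT_WEIGHT")
    if raw is None or raw.strip() == "":
        return default  # assumed fallback; the PR calls 0.5 the old default
    value = float(raw)
    if not 0.0 <= value <= 1.0:
        raise ValueError(f"MULTIMODAL_TEXT_WEIGHT out of range: {value}")
    return value


def find_regressions(
    baseline: Mapping[str, float],
    current: Mapping[str, float],
    tolerance: float = 0.005,
) -> Dict[str, Tuple[float, float]]:
    """Return metrics where `current` falls below `baseline` by more than `tolerance`."""
    return {
        name: (expected, current.get(name, 0.0))
        for name, expected in baseline.items()
        if current.get(name, 0.0) < expected - tolerance
    }


# Values taken from the table in this PR: 0.65 as baseline, 0.5 as the candidate.
baseline_065 = {"R@5": 0.994, "nDCG@5": 0.960, "MRR": 0.954}
candidate_050 = {"R@5": 0.957, "nDCG@5": 0.936, "MRR": 0.955}
regressions = find_regressions(baseline_065, candidate_050)
```

Run against the numbers from the table, the check flags R@5 and nDCG@5 for the 0.5 configuration and leaves MRR alone.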

🤖 Generated with Claude Code

chaim added 1 commit 2026-06-03 08:45:24 +00:00

chore(#15): adopt MULTIMODAL_TEXT_WEIGHT=0.65 + close #15, open #80 4debe9995b

A/B eval (eval_retrieval.py, 86-query gold-set) showed the 0.5 default was
mis-tuned: the image side was too heavy and dragged precedent_library recall
0.971 -> 0.885. Sweep 0.5..0.75 — at 0.65 multimodal beats text-only on every
overall metric AND every corpus (R@5 0.994 vs 0.989, nDCG@5 0.960 vs 0.944,
MRR 0.954 vs 0.936). Dafna approved.

- MULTIMODAL_TEXT_WEIGHT=0.65 set in Coolify (legal-ai, runtime) + redeploy.
- baseline.json updated to the 0.65 config (future regression reference).
- #15 done (premise was stale — multimodal already default on 110 docs; the
  win was tuning the weight, not the backfill).
- #80 opened: the costly 140-doc legacy backfill is deferred until a targeted
  image-answer gold-set proves the table/image value prop (untested here).

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>

chaim merged commit 5a00a0ef47 into main

2026-06-03 08:45:30 +00:00

chaim deleted branch chore/15-multimodal-weight-065

2026-06-03 08:45:30 +00:00

chaim referenced this issue from a commit

2026-06-03 08:45:30 +00:00

Merge pull request 'chore(#15): adopt MULTIMODAL_TEXT_WEIGHT=0.65 + close #15, open #80' (#46) from chore/15-multimodal-weight-065 into main

Reference: ezer-mishpati/legal-ai#46