fix(retrieval): switch hybrid merge to Reciprocal Rank Fusion (RRF)
Cosine scores from voyage-3 (~0.4-0.5) and voyage-multimodal-3
(~0.2-0.25) live on different scales, so the previous weighted-sum
merge let text always dominate. Verified empirically: 0 image-only
hits across 7 queries on case 8174-24; the image side contributed
nothing.
RRF combines results by *rank* in each list rather than by raw score,
which makes it robust to scale differences. Per-item score:

    rrf_score = text_weight / (k + text_rank)
              + image_weight / (k + image_rank)

A row that appears in both lists (joined on (id_field, page_number))
gets both terms and is surfaced as match_type='text+image'.
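The per-item formula above can be sketched in Python roughly as follows. This is a minimal illustration, not the repository's implementation: the function name `rrf_merge`, the 1-based ranks, the tuple keys, the `top_n` cut-off, and the `'text'` / `'image'` labels for single-list hits are all assumptions. So is `image_weight = 1 - text_weight`, which is inferred from the config comments. Only the formula, the join key, and the `match_type='text+image'` label come from this commit message.

```python
from typing import Dict, List, Tuple

# Hypothetical join key: (id_field value, page_number).
Key = Tuple[str, int]


def rrf_merge(
    text_hits: List[Key],
    image_hits: List[Key],
    text_weight: float = 0.5,
    k: int = 60,
    top_n: int = 5,
) -> List[Tuple[Key, float, str]]:
    """Fuse two ranked lists (best hit first) with weighted RRF.

    Assumes each list contains a given key at most once.
    Returns (key, rrf_score, match_type) tuples, best first.
    """
    image_weight = 1.0 - text_weight  # assumption: weights sum to 1
    scores: Dict[Key, float] = {}
    sources: Dict[Key, List[str]] = {}

    for label, weight, hits in (
        ("text", text_weight, text_hits),
        ("image", image_weight, image_hits),
    ):
        # Ranks are 1-based: the best hit contributes weight / (k + 1).
        for rank, key in enumerate(hits, start=1):
            scores[key] = scores.get(key, 0.0) + weight / (k + rank)
            sources.setdefault(key, []).append(label)

    ranked = sorted(scores.items(), key=lambda kv: kv[1], reverse=True)
    # A key seen in both lists is labelled 'text+image'.
    return [(key, score, "+".join(sources[key])) for key, score in ranked[:top_n]]


if __name__ == "__main__":
    text = [("doc-a", 3), ("doc-a", 12), ("doc-b", 1)]
    image = [("doc-a", 12), ("doc-a", 25)]
    for key, score, match_type in rrf_merge(text, image):
        print(key, round(score, 5), match_type)
```

In this toy input, `("doc-a", 12)` appears in both lists and collects both terms, so it ranks first even though it is only second in the text list. Raw cosine scores never enter the computation, which is why the scale mismatch between the two models no longer matters.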
After the fix on 8174-24 (146 image rows): 2 image-only hits land in
the top 5 across all 7 test queries, surfacing actual table/diagram/
signature pages (p12, p13 of שומת המשיבה [the respondent's appraisal]
for 'טבלת השוואת ערכי שומה' [appraisal value comparison table],
p25 of שומת השגה [the objection appraisal] for 'תרשים גוש וחלקה'
[block and parcel diagram], etc).
On 8137-24 (273 image rows): 'חישוב היוון של דמי החכירה'
[capitalization calculation of the lease fees] goes from
0 baseline results → 5 hybrid results (3 text + 2 image), opening
recall on scanned content the OCR layer misses.
Default MULTIMODAL_TEXT_WEIGHT 0.65 → 0.5 (vanilla RRF), since the
prior 0.65 was tuned for raw cosine scales that no longer apply.
New env knob MULTIMODAL_RRF_K (default 60, the standard value in the
literature).
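To get a feel for what the k knob does, the small sketch below compares how much more a rank-1 hit contributes than a rank-10 hit from the same list, for a few values of k. The helper name `rank_ratio` and the choice of ranks 1 and 10 are illustrative assumptions; the ratio follows directly from the `weight / (k + rank)` term, and the weight cancels out.

```python
def rank_ratio(k: int, top_rank: int = 1, low_rank: int = 10) -> float:
    """How many times larger the RRF term at top_rank is than at low_rank.

    (w / (k + top_rank)) / (w / (k + low_rank)) == (k + low_rank) / (k + top_rank)
    """
    return (k + low_rank) / (k + top_rank)


if __name__ == "__main__":
    for k in (1, 10, 60, 200):
        print(k, round(rank_ratio(k), 3))
    # 1 5.5
    # 10 1.818
    # 60 1.148
    # 200 1.045
```

A small k makes the top rank count several times more than rank 10, while at the default of 60 the gap is only about 15%, so agreement between the two lists matters more than the exact position within either one.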
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
@@ -73,13 +73,19 @@ MULTIMODAL_DPI = int(os.environ.get("MULTIMODAL_DPI", "144"))
 # Separate, lower DPI for the JPEG thumbnail saved to disk for UI
 # preview. ~96dpi → ~20KB/page; ingestion-time, no re-render at view.
 MULTIMODAL_THUMB_DPI = int(os.environ.get("MULTIMODAL_THUMB_DPI", "96"))
-# Hybrid merge weight for the *text* side. The image side gets
-# (1 - this). POC found text dominates most queries; image wins only
-# on table/visual queries — slight text bias starting point, tunable
-# per env without redeploy.
+# Hybrid merge: Reciprocal Rank Fusion (RRF) bias for the *text* side.
+# voyage-3 cosine scores (~0.4-0.5) and voyage-multimodal-3 scores
+# (~0.20-0.25) live on different scales; a direct weighted sum lets
+# text always dominate. RRF is rank-based and robust to that. The
+# weight here biases the contribution of each side: 0.5 = balanced
+# (vanilla RRF), >0.5 favours text, <0.5 favours image. Tunable per
+# env without redeploy.
 MULTIMODAL_TEXT_WEIGHT = float(
-    os.environ.get("MULTIMODAL_TEXT_WEIGHT", "0.65")
+    os.environ.get("MULTIMODAL_TEXT_WEIGHT", "0.5")
 )
+# RRF damping constant. Standard literature value is 60: lower values
+# concentrate weight at top ranks; higher values flatten the curve.
+MULTIMODAL_RRF_K = int(os.environ.get("MULTIMODAL_RRF_K", "60"))
 
 # Halacha extraction — auto-approve threshold. Halachot with extractor
 # confidence >= this value are inserted with review_status='approved'