feat(retrieval): add voyage-multimodal-3 page-image embeddings (feature flag)
All checks were successful
Build & Deploy / build-and-deploy (push) Successful in 1m50s
Stage C: per-page image embeddings via voyage-multimodal-3 + hybrid
text+image search. Off by default; enable with MULTIMODAL_ENABLED=true.

- Schema V9: document_image_embeddings + precedent_image_embeddings
  (vector(1024), page_number, image_thumbnail_path)
- extractor.render_pages_for_multimodal renders PDF pages at
  MULTIMODAL_DPI (144) for embedding + JPEG thumbnails at
  MULTIMODAL_THUMB_DPI (96) for UI preview, in one pass
- embeddings.embed_images calls voyage-multimodal-3 in 50-page batches
- services/hybrid_search.py orchestrator: rerank applied to text side
  first (rerank-2 is text-only); image side cosine; weighted merge with
  text_weight 0.65 (env-tunable); image-only pages surface as
  match_type='image' so dense scanned content still appears
- processor.process_document and precedent_library.ingest_precedent
  gated by the flag; multimodal failure is non-fatal
- scripts/multimodal_backfill.py: idempotent per-case CLI to embed
  existing documents without re-extracting text

Validated locally on a 5-page response brief: render 0.31s, embed 8.32s,
hybrid merge surfaces image rows correctly. Production rollout starts
with flag=false (no behavior change), then per-case A/B.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
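
The weighted merge described in the commit message could look roughly like the sketch below. It is an illustration only, not the code in services/hybrid_search.py. It assumes both sides already produce scores normalised to [0, 1] and keyed by (document id, page number). The names `Hit` and `hybrid_merge` and the labels 'text' and 'hybrid' are hypothetical; only match_type='image' and the 0.65 default come from the commit.

```python
from dataclasses import dataclass


@dataclass
class Hit:
    doc_id: str
    page: int
    score: float
    match_type: str  # 'image' is from the commit; 'text'/'hybrid' are assumed labels


def hybrid_merge(text_scores, image_scores, text_weight=0.65):
    """Merge per-page text and image scores into one ranked list.

    text_scores / image_scores: dict mapping (doc_id, page) -> score in [0, 1]
    (assumed shape). A page missing on one side contributes 0 for that side.
    """
    if not 0.0 <= text_weight <= 1.0:
        raise ValueError("text_weight must be within [0, 1]")

    hits = []
    # Sorted keys keep the output deterministic when scores tie.
    for key in sorted(text_scores.keys() | image_scores.keys()):
        t = text_scores.get(key)
        i = image_scores.get(key)
        score = text_weight * (t or 0.0) + (1.0 - text_weight) * (i or 0.0)
        if t is None:
            match_type = "image"   # image-only page, e.g. a dense scan
        elif i is None:
            match_type = "text"
        else:
            match_type = "hybrid"
        hits.append(Hit(key[0], key[1], score, match_type))

    # Highest combined score first; sort is stable, so ties keep key order.
    hits.sort(key=lambda h: h.score, reverse=True)
    return hits
```

With this shape an image-only page still gets a non-zero score, namely (1 - text_weight) times its cosine score, so it can appear in the results even when the text side found nothing for it.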
@@ -58,6 +58,29 @@ VOYAGE_RERANK_ENABLED = (
# 50 was the depth used in the POC; balances recall vs rerank cost.
VOYAGE_RERANK_FETCH_K = int(os.environ.get("VOYAGE_RERANK_FETCH_K", "50"))

# Multimodal: page-image embeddings via voyage-multimodal-3. Off by
# default; flip with env to enable per-page image embedding during
# ingestion + hybrid (text+image) ranking at search time. POC #3
# validated on an 89-page appraisal PDF (38s, 312K tokens, recovered
# table structure + image-only scanned pages that text-OCR misses).
MULTIMODAL_ENABLED = (
    os.environ.get("MULTIMODAL_ENABLED", "false").lower() == "true"
)
MULTIMODAL_MODEL = os.environ.get("MULTIMODAL_MODEL", "voyage-multimodal-3")
# Render DPI for the image fed to the embedder. POC used 144: the sweet
# spot between embedding quality and tokens/page (144 ≈ 3.5K tok/page).
MULTIMODAL_DPI = int(os.environ.get("MULTIMODAL_DPI", "144"))
# Separate, lower DPI for the JPEG thumbnail saved to disk for UI
# preview. ~96dpi → ~20KB/page; ingestion-time, no re-render at view.
MULTIMODAL_THUMB_DPI = int(os.environ.get("MULTIMODAL_THUMB_DPI", "96"))
# Hybrid merge weight for the *text* side. The image side gets
# (1 - this). POC found text dominates most queries; image wins only
# on table/visual queries, so start with a slight text bias, tunable
# per env without redeploy.
MULTIMODAL_TEXT_WEIGHT = float(
    os.environ.get("MULTIMODAL_TEXT_WEIGHT", "0.65")
)

# Halacha extraction — auto-approve threshold. Halachot with extractor
# confidence >= this value are inserted with review_status='approved'
# instead of 'pending_review' (so they immediately appear in
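
As a rough sanity check on the multimodal figures in the hunk above: at about 3.5K tokens per page (144 DPI), the 89-page POC document comes to roughly 311.5K tokens, in line with the quoted 312K. The sketch below repeats that arithmetic and adds the number of embedding calls implied by the 50-page batch size from the commit message. The function name and the assumption that token cost scales linearly with page count are illustrative, not taken from the codebase.

```python
import math


def estimate_embedding_cost(pages, tokens_per_page=3500, batch_size=50):
    """Return (approx_tokens, batch_count) for embedding `pages` page images.

    tokens_per_page: the ~3.5K tok/page figure quoted for 144 DPI.
    batch_size: the 50-page batch size mentioned in the commit message.
    Linear scaling with page count is an assumption.
    """
    if pages < 0 or batch_size <= 0:
        raise ValueError("pages must be >= 0 and batch_size must be > 0")
    approx_tokens = pages * tokens_per_page
    batch_count = math.ceil(pages / batch_size)
    return approx_tokens, batch_count


# The 89-page appraisal PDF from POC #3:
print(estimate_embedding_cost(89))  # (311500, 2)
```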