feat(retrieval): add voyage rerank-2 cross-encoder stage (feature flag)
All checks were successful
Build & Deploy / build-and-deploy (push) Successful in 1m29s
All checks were successful
Build & Deploy / build-and-deploy (push) Successful in 1m29s
Stage B of voyage-upgrades-plan rewritten: instead of context-3 (which
4 POCs showed inconsistent improvement), add a cross-encoder rerank
layer on top of voyage-3. Default off (VOYAGE_RERANK_ENABLED=false).
POC validation (785-doc corpus, 12 queries, claude-haiku-4-5 judge):
- mean@3 +4.5% (4.306 → 4.500)
- practical-category queries +11.6% (3.78 → 4.22)
- latency +702ms per query
- no schema change, no re-embed, no double storage
Plumbing:
- config: VOYAGE_RERANK_ENABLED / _MODEL / _FETCH_K env vars
- embeddings.voyage_rerank() wraps voyageai client.rerank
- services/rerank.py: maybe_rerank() helper — fetches FETCH_K candidates
via the bi-encoder then reranks to top-K. Fail-open if Voyage rerank is
unavailable.
- tools/search.py: search_decisions, search_case_documents,
find_similar_cases all wrapped
- services/precedent_library.search_library wrapped
Smoke-tested locally with flag on/off — produces expected behaviour and
latency profile. Ready for production rollout via Coolify env flip after
deploy.
POCs (kept under scripts/ for reference):
- voyage_context3_poc{_long}.py — context-3 evaluation (rejected)
- voyage_multimodal_poc.py — multimodal-3 (stage C, deferred)
- voyage_rerank_judge_poc.py — single-case rerank benchmark
- voyage_rerank_corpus_poc.py — full-corpus rerank validation
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
This commit is contained in:
@@ -6,7 +6,7 @@ import json
|
||||
import logging
|
||||
from uuid import UUID
|
||||
|
||||
from legal_mcp.services import db, embeddings
|
||||
from legal_mcp.services import db, embeddings, rerank
|
||||
|
||||
logger = logging.getLogger(__name__)
|
||||
|
||||
@@ -43,8 +43,9 @@ async def search_decisions(
|
||||
)
|
||||
|
||||
query_emb = await embeddings.embed_query(query)
|
||||
results = await db.search_similar(
|
||||
query_embedding=query_emb,
|
||||
results = await rerank.maybe_rerank(
|
||||
query=query,
|
||||
base_search=lambda **kw: db.search_similar(query_embedding=query_emb, **kw),
|
||||
limit=limit,
|
||||
section_type=section_type or None,
|
||||
practice_area=practice_area or None,
|
||||
@@ -86,8 +87,9 @@ async def search_case_documents(
|
||||
|
||||
query_emb = await embeddings.embed_query(query)
|
||||
# Restricted to case_id — practice_area filter would be redundant.
|
||||
results = await db.search_similar(
|
||||
query_embedding=query_emb,
|
||||
results = await rerank.maybe_rerank(
|
||||
query=query,
|
||||
base_search=lambda **kw: db.search_similar(query_embedding=query_emb, **kw),
|
||||
limit=limit,
|
||||
case_id=UUID(case["id"]),
|
||||
)
|
||||
@@ -137,9 +139,13 @@ async def find_similar_cases(
|
||||
)
|
||||
|
||||
query_emb = await embeddings.embed_query(description)
|
||||
results = await db.search_similar(
|
||||
query_embedding=query_emb,
|
||||
limit=limit * 3, # Get more to deduplicate by case
|
||||
# Use description as the query text for rerank too.
|
||||
# Note: even with rerank we ask for ``limit*3`` so the dedup-by-case
|
||||
# step downstream still has enough rows to pick the best per case.
|
||||
results = await rerank.maybe_rerank(
|
||||
query=description,
|
||||
base_search=lambda **kw: db.search_similar(query_embedding=query_emb, **kw),
|
||||
limit=limit * 3,
|
||||
practice_area=practice_area or None,
|
||||
appeal_subtype=appeal_subtype or None,
|
||||
)
|
||||
|
||||
Reference in New Issue
Block a user