feat(retrieval): add voyage rerank-2 cross-encoder stage (feature flag)

Stage B of voyage-upgrades-plan rewritten: instead of context-3 (for which
4 POCs showed inconsistent improvement), add a cross-encoder rerank layer
on top of voyage-3. Default off (VOYAGE_RERANK_ENABLED=false).

POC validation (785-doc corpus, 12 queries, claude-haiku-4-5 judge):
- mean@3 +4.5% (4.306 → 4.500)
- practical-category queries +11.6% (3.78 → 4.22)
- latency +702ms per query
- no schema change, no re-embed, no double storage

Plumbing:
- config: VOYAGE_RERANK_ENABLED / _MODEL / _FETCH_K env vars
- embeddings.voyage_rerank() wraps voyageai client.rerank
- services/rerank.py: maybe_rerank() helper. Fetches FETCH_K candidates
  via the bi-encoder, then reranks to top-K. Fail-open if Voyage rerank
  is unavailable.
- tools/search.py: search_decisions, search_case_documents,
  find_similar_cases all wrapped
- services/precedent_library.search_library wrapped

Smoke-tested locally with flag on/off: produces expected behaviour and
latency profile. Ready for production rollout via Coolify env flip after
deploy.

POCs (kept under scripts/ for reference):
- voyage_context3_poc{_long}.py: context-3 evaluation (rejected)
- voyage_multimodal_poc.py: multimodal-3 (stage C, deferred)
- voyage_rerank_judge_poc.py: single-case rerank benchmark
- voyage_rerank_corpus_poc.py: full-corpus rerank validation

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-05-03 18:43:41 +00:00
parent 688ba37d9c
commit 26c3fddf41
13 changed files with 1578 additions and 100 deletions
--- a/mcp-server/src/legal_mcp/services/precedent_library.py
+++ b/mcp-server/src/legal_mcp/services/precedent_library.py
@@ -22,7 +22,7 @@ from typing import Awaitable, Callable
 from uuid import UUID, uuid4

 from legal_mcp import config
-from legal_mcp.services import chunker, db, embeddings, extractor
+from legal_mcp.services import chunker, db, embeddings, extractor, rerank

 # Note: halacha_extractor and precedent_metadata_extractor are NOT imported
 # at module load. They are imported lazily inside the dedicated re-extract
@@ -403,18 +403,29 @@ async def search_library(

    Only ``approved`` / ``published`` halachot are returned, per chair-review
    policy. Chunks are returned regardless of halacha review status.
+
+    When ``VOYAGE_RERANK_ENABLED`` is set, results are passed through
+    voyage rerank-2 (cross-encoder). The +0.05 halacha boost from
+    ``search_precedent_library_semantic`` is preserved before rerank
+    but the rerank scores ultimately decide the order.
    """
    if not query.strip():
        return []
    query_vec = await embeddings.embed_query(query)
-    return await db.search_precedent_library_semantic(
-        query_embedding=query_vec,
-        practice_area=practice_area,
-        court=court,
-        precedent_level=precedent_level,
-        appeal_subtype=appeal_subtype,
-        is_binding=is_binding,
-        subject_tag=subject_tag,
-        limit=limit,
-        include_halachot=include_halachot,
+
+    async def _base(limit: int) -> list[dict]:
+        return await db.search_precedent_library_semantic(
+            query_embedding=query_vec,
+            practice_area=practice_area,
+            court=court,
+            precedent_level=precedent_level,
+            appeal_subtype=appeal_subtype,
+            is_binding=is_binding,
+            subject_tag=subject_tag,
+            limit=limit,
+            include_halachot=include_halachot,
+        )
+
+    return await rerank.maybe_rerank(
+        query=query, base_search=_base, limit=limit,
    )
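
For readers without services/rerank.py at hand, the over-fetch, rerank, fail-open flow that the commit message describes for maybe_rerank() might look roughly like the sketch below. It is not the code from this commit: the `reranker` parameter, the `text_key` field name, the `enabled` / `fetch_k` arguments and the demo helpers are all hypothetical stand-ins (the real helper presumably reads the VOYAGE_RERANK_* config and calls embeddings.voyage_rerank()). Only the `query` / `base_search` / `limit` keywords come from the call site in the diff.

```python
import asyncio
from typing import Awaitable, Callable

# Hypothetical types: base_search takes a limit and returns candidate rows;
# reranker returns (candidate_index, score) pairs, best first.
BaseSearch = Callable[[int], Awaitable[list[dict]]]
Reranker = Callable[[str, list[str], int], Awaitable[list[tuple[int, float]]]]


async def maybe_rerank_sketch(
    query: str,
    base_search: BaseSearch,
    limit: int,
    reranker: Reranker,          # assumption: injected instead of a Voyage call
    enabled: bool = True,        # stands in for VOYAGE_RERANK_ENABLED
    fetch_k: int = 20,           # stands in for VOYAGE_RERANK_FETCH_K
    text_key: str = "content",   # assumption: field holding the chunk text
) -> list[dict]:
    # Flag off: behave exactly like the plain bi-encoder search.
    if not enabled:
        return await base_search(limit)

    # Over-fetch so the cross-encoder has more candidates than it must return.
    candidates = await base_search(max(fetch_k, limit))
    if not candidates:
        return candidates

    try:
        texts = [str(c.get(text_key, "")) for c in candidates]
        ranked = await reranker(query, texts, limit)
    except Exception:
        # Fail open: keep the bi-encoder order if the reranker is unavailable.
        return candidates[:limit]

    # Reorder by rerank score and attach the score to each row.
    return [{**candidates[i], "rerank_score": score} for i, score in ranked]


# --- Demo helpers (all hypothetical, no network access) ---
async def demo_base(limit: int) -> list[dict]:
    docs = [
        {"content": "zoning appeal"},
        {"content": "building permit fee"},
        {"content": "permit appeal deadline"},
    ]
    return docs[:limit]


async def overlap_reranker(query, texts, top_k):
    # Toy scorer: number of words shared between query and text.
    q = set(query.split())
    scored = [(i, float(len(q & set(t.split())))) for i, t in enumerate(texts)]
    scored.sort(key=lambda pair: pair[1], reverse=True)
    return scored[:top_k]


async def broken_reranker(query, texts, top_k):
    raise RuntimeError("rerank unavailable")


if __name__ == "__main__":
    top = asyncio.run(
        maybe_rerank_sketch(
            query="permit appeal deadline",
            base_search=demo_base,
            limit=1,
            reranker=overlap_reranker,
            fetch_k=3,
        )
    )
    print(top[0]["content"])
```

Passing `base_search` as a closure, as search_library does with `_base`, lets the helper choose how many candidates to fetch without knowing any of the search filters.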