feat(retrieval): add voyage rerank-2 cross-encoder stage (feature flag)
All checks were successful
Build & Deploy / build-and-deploy (push) Successful in 1m29s

Stage B of voyage-upgrades-plan rewritten: instead of context-3 (which
4 POCs showed inconsistent improvement), add a cross-encoder rerank
layer on top of voyage-3. Default off (VOYAGE_RERANK_ENABLED=false).

POC validation (785-doc corpus, 12 queries, claude-haiku-4-5 judge):
- mean@3 +4.5% (4.306 → 4.500)
- practical-category queries +11.6% (3.78 → 4.22)
- latency +702ms per query
- no schema change, no re-embed, no double storage

Plumbing:
- config: VOYAGE_RERANK_ENABLED / _MODEL / _FETCH_K env vars
- embeddings.voyage_rerank() wraps voyageai client.rerank
- services/rerank.py: maybe_rerank() helper — fetches FETCH_K candidates
  via the bi-encoder then reranks to top-K. Fail-open if Voyage rerank is
  unavailable.
- tools/search.py: search_decisions, search_case_documents,
  find_similar_cases all wrapped
- services/precedent_library.search_library wrapped

Smoke-tested locally with flag on/off — produces expected behaviour and
latency profile. Ready for production rollout via Coolify env flip after
deploy.

POCs (kept under scripts/ for reference):
- voyage_context3_poc{_long}.py — context-3 evaluation (rejected)
- voyage_multimodal_poc.py — multimodal-3 (stage C, deferred)
- voyage_rerank_judge_poc.py — single-case rerank benchmark
- voyage_rerank_corpus_poc.py — full-corpus rerank validation

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
This commit is contained in:
2026-05-03 18:43:41 +00:00
parent 688ba37d9c
commit 26c3fddf41
13 changed files with 1578 additions and 100 deletions

View File

@@ -40,7 +40,7 @@ Local (developer machine, pm2):
External:
← Claude API (Opus 4.7 for agents)
← Voyage AI (voyage-3-large, 1024-dim embeddings)
← Voyage AI (voyage-3, 1024-dim embeddings)
← Infisical (secret management)
← Gmail SMTP (agent notifications)
```
@@ -59,7 +59,7 @@ External:
- מפעיל OCR (Google Vision) אם PDF ללא טקסט
- מריץ proofreader להסרת artifacts מ-Nevo
- מחלץ טקסט ל-`documents.extracted_text`
- מפצל ל-chunks של ~500 מילים, מחשב embeddings (voyage-3-large, 1024D), שומר ב-`document_chunks`
- מפצל ל-chunks של ~500 מילים, מחשב embeddings (voyage-3, 1024D), שומר ב-`document_chunks`
4. סטטוס תיק: `new``proofread`
### שלב 2 — ניתוח משפטי (legal-researcher + analyst)
@@ -223,7 +223,7 @@ legal-qa מריץ 6 בדיקות איכות:
`case_law`, `statutory_provisions`, `transition_phrases`, `lessons_learned`, `style_corpus`, `style_patterns`
### Layer 4: Semantic Search (RAG)
`document_embeddings`, `paragraph_embeddings`, `case_law_embeddings` (pgvector 1024-dim, voyage-3-large)
`document_embeddings`, `paragraph_embeddings`, `case_law_embeddings` (pgvector 1024-dim, voyage-3)
### Layer 5 — Multi-tenancy
`companies`, `tag_company_mappings` (appeal_subtype → company_id)
@@ -283,7 +283,9 @@ legal-qa מריץ 6 בדיקות איכות:
## טכנולוגיות עיקריות
- **Database**: PostgreSQL 15 + pgvector 0.8.1
- **Embeddings**: Voyage AI (`voyage-3-large`, 1024-dim)
- **Embeddings**: Voyage AI (`voyage-3`, 1024-dim) + cross-encoder rerank (`rerank-2`)
- bi-encoder: voyage-3 לכל chunk (חד-פעמי בעת ingestion)
- cross-encoder: rerank-2 לכל query (top-50 → top-K), feature flag `VOYAGE_RERANK_ENABLED`
- **Agents**: Claude Opus 4.7 (via Paperclip pm2)
- **DOCX manipulation**: `python-docx` 1.2+ ו-`lxml` 5.2+ (XML surgery)
- **Frontend**: Next.js + TanStack Query + Tailwind