ezer-mishpati/legal-ai

Commit Graph


Author

SHA1

Message

Date

Chaim

d12cdb1fad

docs(voyage): mark stage C complete + record empirical fixes

Build & Deploy / build-and-deploy (push) Successful in 10s

Stage C of the voyage-upgrades-plan shipped to production on
2026-05-03. The doc now leads with the final state and the two
empirical corrections vs the original plan:

1. Reciprocal Rank Fusion replaces weighted-sum hybrid merge.
   voyage-3 cosines (~0.4-0.5) systematically outscale
   voyage-multimodal-3 cosines (~0.20-0.25); a weighted sum lets
   text dominate even when image is the better signal. RRF is
   rank-based and robust to scale differences.

2. Chunker now propagates page_number end-to-end (extractor returns
   per-page offsets, chunker tags each chunk by its first character's
   page). A retrofit script backfills page_number on existing
   document_chunks without re-OCR — uses the stored
   documents.extracted_text plus PyMuPDF direct text reads as page
   anchors (linear interpolation for OCR-only pages).
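
To illustrate point 1, here is a minimal sketch of Reciprocal Rank Fusion over a text ranking and an image ranking. The function name, the constant k=60, and the page ids and scores are illustrative assumptions, not the repository's actual code. Only the order of each list is used, so the differing cosine scales of the two models do not matter.

```python
from collections import defaultdict

def reciprocal_rank_fusion(rankings, k=60):
    """Fuse ranked lists of ids. k=60 is a commonly used constant (assumed here)."""
    scores = defaultdict(float)
    for ranking in rankings:
        for rank, doc_id in enumerate(ranking, start=1):
            # Each list contributes 1 / (k + rank); raw similarity is ignored.
            scores[doc_id] += 1.0 / (k + rank)
    # Highest fused score first; ties broken by id for determinism.
    return sorted(scores, key=lambda d: (-scores[d], d))

# Hypothetical hits: text cosines sit near 0.4-0.5, image cosines near 0.20-0.25.
text_hits = [("p1", 0.50), ("p2", 0.48), ("p3", 0.40)]
image_hits = [("p3", 0.25), ("p4", 0.24), ("p2", 0.20)]

text_ranked = [pid for pid, _ in sorted(text_hits, key=lambda h: -h[1])]
image_ranked = [pid for pid, _ in sorted(image_hits, key=lambda h: -h[1])]

fused = reciprocal_rank_fusion([text_ranked, image_ranked])
```

In this toy data an equal-weight sum of the raw cosines would put p2 ahead of p3 because the text scores are numerically larger, while RRF ranks p3 first since it is the top image hit and also appears in the text list.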

Production state on cases 8174-24 + 8137-24: 419 page-image
embeddings, 819 chunks tagged with page_number, MULTIMODAL_ENABLED=true
in Coolify env, hybrid search verified A/B against text-only baseline.
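
The page_number tagging from point 2 can be sketched as follows, assuming the extractor returns a sorted list of character offsets at which each page begins. The names page_for_offset, tag_chunks, and the sample offsets are hypothetical; the interpolation used for OCR-only pages is not shown.

```python
from bisect import bisect_right

def page_for_offset(page_starts, offset):
    """Return the 1-based page containing a character offset.

    page_starts[i] is the offset where page i+1 begins (sorted, first is 0).
    """
    return bisect_right(page_starts, offset)

def tag_chunks(chunks, page_starts):
    """Tag each (start, end) chunk with the page of its first character."""
    return [
        {"start": start, "end": end,
         "page_number": page_for_offset(page_starts, start)}
        for start, end in chunks
    ]

# Hypothetical document: three pages beginning at offsets 0, 1200 and 2500.
page_starts = [0, 1200, 2500]
chunks = [(0, 800), (800, 1600), (1600, 2400), (2400, 3200)]
tagged = tag_chunks(chunks, page_starts)
```

A chunk that straddles a page boundary, such as (800, 1600), is assigned to the page where it starts, which matches the "first character's page" rule described above.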

The original stage C plan section is retained below for reference.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

2026-05-03 20:16:13 +00:00

Chaim

26c3fddf41

feat(retrieval): add voyage rerank-2 cross-encoder stage (feature flag)

Build & Deploy / build-and-deploy (push) Successful in 1m29s

Stage B of voyage-upgrades-plan rewritten: instead of context-3 (for
which 4 POCs showed inconsistent improvement), add a cross-encoder rerank
layer on top of voyage-3. Default off (VOYAGE_RERANK_ENABLED=false).

POC validation (785-doc corpus, 12 queries, claude-haiku-4-5 judge):
- mean@3 +4.5% (4.306 → 4.500)
- practical-category queries +11.6% (3.78 → 4.22)
- latency +702ms per query
- no schema change, no re-embed, no double storage

Plumbing:
- config: VOYAGE_RERANK_ENABLED / _MODEL / _FETCH_K env vars
- embeddings.voyage_rerank() wraps voyageai client.rerank
- services/rerank.py: maybe_rerank() helper — fetches FETCH_K candidates
  via the bi-encoder then reranks to top-K. Fail-open if Voyage rerank is
  unavailable.
- tools/search.py: search_decisions, search_case_documents,
  find_similar_cases all wrapped
- services/precedent_library.search_library wrapped
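
The fetch-then-rerank flow with fail-open behaviour described in the plumbing list can be sketched roughly as below. This is not the code in services/rerank.py: the signature, the injected retrieve and rerank callables, and the default values are assumptions made so the example runs without the Voyage API.

```python
def maybe_rerank(query, retrieve, rerank, *, enabled, top_k=3, fetch_k=20):
    """Return top_k results, optionally reordered by a cross-encoder.

    retrieve(query, k) -> list of candidates from the bi-encoder (best first).
    rerank(query, candidates) -> list of candidate indices, best first.
    Both callables are hypothetical stand-ins for the real search and rerank.
    """
    if not enabled:
        # Flag off: behave exactly like the plain bi-encoder search.
        return list(retrieve(query, top_k))
    candidates = list(retrieve(query, fetch_k))
    try:
        order = rerank(query, candidates)
    except Exception:
        # Fail-open: if the reranker is unavailable, keep the bi-encoder order.
        return candidates[:top_k]
    return [candidates[i] for i in order[:top_k]]

# Toy stand-ins for demonstration only.
docs = ["a", "b", "c", "d", "e"]

def retrieve(query, k):
    return docs[:k]

def reverse_rerank(query, candidates):
    return list(range(len(candidates) - 1, -1, -1))

def broken_rerank(query, candidates):
    raise RuntimeError("rerank service unavailable")

reranked = maybe_rerank("q", retrieve, reverse_rerank, enabled=True, top_k=2, fetch_k=4)
fallback = maybe_rerank("q", retrieve, broken_rerank, enabled=True, top_k=2, fetch_k=4)
disabled = maybe_rerank("q", retrieve, reverse_rerank, enabled=False, top_k=2, fetch_k=4)
```

Fetching more candidates than are finally returned gives the cross-encoder room to promote results the bi-encoder ranked lower, and the except branch keeps search working when the rerank call fails.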

Smoke-tested locally with flag on/off — produces expected behaviour and
latency profile. Ready for production rollout via Coolify env flip after
deploy.

POCs (kept under scripts/ for reference):
- voyage_context3_poc{_long}.py — context-3 evaluation (rejected)
- voyage_multimodal_poc.py — multimodal-3 (stage C, deferred)
- voyage_rerank_judge_poc.py — single-case rerank benchmark
- voyage_rerank_corpus_poc.py — full-corpus rerank validation

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

2026-05-03 18:43:41 +00:00

Chaim

cb0b4b6a8b

ops: switch embeddings to voyage-3 + plan for context-3 + multimodal-3.5

Build & Deploy / build-and-deploy (push) Successful in 7s

Phase A — voyage-3 migration (executed):

- VOYAGE_MODEL=voyage-3 set in Coolify (legal-ai app) and ~/.env
- scripts/reembed_voyage.py: re-embeds document_chunks (6157),
  case_law_embeddings (9), precedent_chunks (385), and halachot (400)
  using the new model. paragraph_embeddings was empty. 6951 rows
  re-embedded in 93s, ~75 rows/sec.
- Same 1024 dim → no schema change needed.

Why voyage-3 over voyage-law-2: a benchmark on 3 Hebrew legal queries
with real passages from the corpus gave voyage-3 perfect ordering on
3/3 tests and a larger separation than voyage-law-2 (+0.483 vs
+0.238). The voyage-4 family had an even larger separation but missed
top-1 on the hardest test.

Phase B (voyage-context-3) and Phase C (voyage-multimodal-3.5 for
scanned + appraiser docs) are designed in docs/voyage-upgrades-plan.md
but deferred — to be picked up in a fresh conversation. The plan
includes:
- Phase B: contextualized embeddings refactor (~49% recall lift on
  legal docs per Anthropic's research). Same dim, but ingestion
  pipeline must pass full doc context per chunk.
- Phase C: page-level image embeddings via voyage-multimodal-3.5,
  stored in a parallel *_image_embeddings table. Hybrid text+image
  search. Targets appraiser report tables and scanned PDFs where
  current OCR loses layout.

After this commit: MCP server needs a /mcp reconnect to pick up the
new VOYAGE_MODEL env, and the legal-ai container will pick it up on
its next redeploy.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

2026-05-03 16:43:48 +00:00

3 Commits