legal-ai/scripts at 41742171792d0d78ee8b07efdc738394377e2e34 - legal-ai - Dafna Tamir Vault

ezer-mishpati/legal-ai

Chaim 434341cc29 chore(#57): re-chunk+re-embed legacy precedents (pre-#55 chunker remediation)

Adds scripts/rechunk_legacy_precedents.py: selects every case_law with a tiny
chunk (content<50 — the pre-fix chunker fingerprint) and runs
ingest.reindex_case_law (re-chunk+re-embed from stored full_text only, no
re-OCR/LLM, idempotent). Batch-idempotent (re-queries the affected set).

Run result (2026-06-03): 73 precedents reindexed, 0 failed. Tiny chunks
483 -> 4 (99.2%); total precedent_chunks 5019 -> 3115 (fragments merged).
Search verified healthy (substantial coherent passages, no errors).

The 4 residual tiny chunks are isolated section headings ('דיון' ["Discussion"],
'טענות המשיבים' ["Respondents' arguments"], ...) emitted by the CURRENT (fixed)
chunker, not legacy fragments, and are already filtered at query time
(>=50, #55). Minor chunker edge case, candidate #55 follow-up.

The DB chunk migration is already applied to prod; this commit is the script
+ SCRIPTS.md entry only (no app code change, no deploy needed).

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>

2026-06-03 07:55:42 +00:00
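
The select-and-reindex loop described in the commit message above could look roughly like the sketch below. It is not the contents of scripts/rechunk_legacy_precedents.py: the in-memory dict stands in for the real chunk table, `reindex` stands in for ingest.reindex_case_law (whose actual signature is not shown on this page), and the names affected_case_ids, rechunk_legacy and batch_size are hypothetical. The only detail taken from the source is the content<50 threshold.

```python
from typing import Callable, Dict, List, Tuple

TINY_CHUNK_LEN = 50  # threshold from the commit message (content<50)


def affected_case_ids(chunks_by_case: Dict[str, List[str]]) -> List[str]:
    """Return ids of precedents that own at least one tiny chunk."""
    return sorted(
        case_id
        for case_id, chunks in chunks_by_case.items()
        if any(len(chunk) < TINY_CHUNK_LEN for chunk in chunks)
    )


def rechunk_legacy(
    chunks_by_case: Dict[str, List[str]],
    reindex: Callable[[str], List[str]],  # hypothetical stand-in for the real reindex call
    batch_size: int = 10,
) -> Tuple[List[str], List[str]]:
    """Reindex affected precedents in batches, re-querying the affected set each round."""
    done: List[str] = []
    failed: List[str] = []
    attempted = set()  # guards against looping on chunks that stay tiny after reindexing
    while True:
        batch = [
            case_id
            for case_id in affected_case_ids(chunks_by_case)
            if case_id not in attempted
        ][:batch_size]
        if not batch:
            break
        for case_id in batch:
            attempted.add(case_id)
            try:
                chunks_by_case[case_id] = reindex(case_id)
                done.append(case_id)
            except Exception:
                failed.append(case_id)
    return done, failed


if __name__ == "__main__":
    # Toy data: the "reindex" here simply merges a case's fragments into one chunk.
    store = {
        "a": ["x" * 10, "y" * 200],
        "b": ["z" * 120],
        "c": ["short", "q" * 80],
    }
    print(rechunk_legacy(store, lambda case_id: ["".join(store[case_id])]))
```

Because the affected set is recomputed from the current chunks on every round, a second run over already-remediated data selects nothing, which is the sense in which the commit message calls the script idempotent. The `attempted` set is an addition of this sketch so that headings that remain tiny after reindexing do not cause an endless loop.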

feat(curator): switch Hermes Curator to DeepSeek V4-Pro via deepseek_local adapter

2026-05-10 05:58:52 +00:00

ab_halacha_opus48.py

feat(spec): X11 citation-corroboration + INV-G10 amendment + Opus 4.8 halacha extraction

2026-05-31 18:42:13 +00:00

audit_corpus_integrity.py

feat: Stage C — RAG advanced (#33, #47, #48, #49, #50, #51)

2026-05-26 11:26:52 +00:00

audit_training_corpus.py

feat(training): Style Studio — upload, rich corpus, lessons, curator portrait, chat

2026-05-27 10:06:22 +00:00

auto-sync-cases.sh

Fix case repo sync + auto-create Gitea repos + add sync indicator

2026-04-14 15:28:16 +00:00

backfill_chunk_pages.py

fix(retrieval): rewrite chunk-page retrofit to skip OCR

2026-05-03 20:04:33 +00:00

backfill_legal_arguments.py

feat: Stage A finalizers + #35/#36/#37 — critical-gap closure

2026-05-26 08:34:40 +00:00

backfill_multimodal_precedents.py

feat: Stage C — RAG advanced (#33, #47, #48, #49, #50, #51)

2026-05-26 11:26:52 +00:00

backup-db.sh

Add full decision writing pipeline: classify, extract, brainstorm, write, QA, export

2026-04-03 10:21:47 +00:00

bidi_table.py

Improve document processing pipeline and agent workflows

2026-04-09 16:45:49 +00:00

compute_ndcg.py

feat: Stage C — RAG advanced (#33, #47, #48, #49, #50, #51)

2026-05-26 11:26:52 +00:00

convert_decision_template.py

Pre-existing agent updates + analysis DOCX export

2026-04-16 18:49:10 +00:00

deploy-track-changes.sh

Add Track Changes architecture for draft revisions (CMP + CMPA)

2026-04-16 18:49:30 +00:00

eval_gold_bootstrap.py

feat(eval): FU-5 — retrieval eval harness + halacha backlog visibility (#63)

2026-05-31 14:58:13 +00:00

eval_retrieval.py

feat(eval): FU-5 — retrieval eval harness + halacha backlog visibility (#63)

2026-05-31 14:58:13 +00:00

fix_paperclipai_skills_drift.py

chore(skills): remove paperclip-dev, scope converting-plans-to-tasks

2026-05-04 17:47:05 +00:00

fu2b_reconcile_internal_case_numbers.py

feat(fu2b): flag PROC_MISMATCH (case_number prefix vs proceeding_type) for chair

2026-05-31 08:57:42 +00:00

fu2c_reconcile_external_case_numbers.py

feat(migration): FU-2c — reconcile external case_law identifiers (GAP-08, #68)

2026-05-31 14:12:45 +00:00

legal-chat-service.config.cjs

security(chat): bind chat service to docker bridge + require Bearer auth

2026-05-27 10:22:14 +00:00

monitor_halacha_quality.py

feat: Stage C — RAG advanced (#33, #47, #48, #49, #50, #51)

2026-05-26 11:26:52 +00:00

multimodal_backfill.py

feat(retrieval): add voyage-multimodal-3 page-image embeddings (feature flag)

2026-05-03 19:24:52 +00:00

notify.py

CEO: add email notifications, subtask parentId, and Paperclip UI assets

2026-04-14 15:55:55 +00:00

pc.sh

feat(paperclip): close 11 integration gaps (#16-#28)

2026-05-04 17:25:45 +00:00

process_pending_blam.py

fix(cases): בל"מ badge reads proceeding_type, not just appeal_subtype

2026-05-26 09:34:23 +00:00

rechunk_legacy_precedents.py

chore(#57): re-chunk+re-embed legacy precedents (pre-#55 chunker remediation)

2026-06-03 07:55:42 +00:00

reembed_voyage.py

ops: switch embeddings to voyage-3 + plan for context-3 + multimodal-3.5

2026-05-03 16:43:48 +00:00

restore-db.sh

Add full decision writing pipeline: classify, extract, brainstorm, write, QA, export

2026-04-03 10:21:47 +00:00

retrofit_case.py

Add Track Changes architecture for draft revisions (CMP + CMPA)

2026-04-16 18:49:30 +00:00

SCRIPTS.md

chore(#57): re-chunk+re-embed legacy precedents (pre-#55 chunker remediation)

2026-06-03 07:55:42 +00:00

sync_agents_across_companies.py

feat(sync): --verify exits non-zero on drift; adapter mismatch = loud drift (GAP-21, FU-8a)

2026-05-31 11:14:44 +00:00

sync_missing_agent_skills.py

feat(paperclip): close 11 integration gaps (#16-#28)

2026-05-04 17:25:45 +00:00

test_retrieval_by_name.py

fix(retrieval): make decisions findable by name + unhide committee uploads

2026-05-30 11:26:19 +00:00

upload_blam_decisions.py

feat(proceeding-type): explicit ערר/בל"מ field for cases + corpus

2026-05-26 09:17:33 +00:00

voyage_context3_poc_long.py

feat(retrieval): add voyage rerank-2 cross-encoder stage (feature flag)

2026-05-03 18:43:41 +00:00

voyage_context3_poc.py

feat(retrieval): add voyage rerank-2 cross-encoder stage (feature flag)

2026-05-03 18:43:41 +00:00

voyage_multimodal_poc.py

feat(retrieval): add voyage rerank-2 cross-encoder stage (feature flag)

2026-05-03 18:43:41 +00:00

voyage_rerank_corpus_poc.py

feat(retrieval): add voyage rerank-2 cross-encoder stage (feature flag)

2026-05-03 18:43:41 +00:00

voyage_rerank_judge_poc.py

feat(retrieval): add voyage rerank-2 cross-encoder stage (feature flag)

2026-05-03 18:43:41 +00:00