legal-ai

Author	SHA1	Message	Date
Chaim	ea8b48c6ac	feat(mcp): FU-14 GAP-45 — extraction_status (חשיפת תור-החילוץ הסמוי) INV-TOOL4 (visibility / persistence). תור בקשות-החילוץ (metadata/halacha) נשמר ב-case_law.{metadata,halacha}_extraction_requested_at ומרוקן ע"י precedent_process_pending — אבל לא היה כלי לראות את עומק-התור. נוסף: - db.extraction_queue_status() — count + גיל הבקשה הוותיקה לכל kind (read-only). - plib.extraction_status() — tool wrapper (envelope _ok/_err). - רישום extraction_status ב-server.py ליד precedent_process_pending. - precedent_process_pending קיבל _clamp_limit (עקביות עם GAP-53). תוספתי, read-only, אפס שבירה. עודכנו X9 (INV-TOOL4 ✅) ו-gap-audit (GAP-45 ✅). py_compile עבר על 3 קבצי הקוד. Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>	2026-06-06 15:00:25 +00:00
Chaim	8a0c206ecd	feat(reindex): precedent_reindex MCP tool (GAP-09, FU-3) Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>	2026-05-30 22:09:44 +00:00
Chaim	58ab003206	fix(retrieval): make decisions findable by name + unhide committee uploads All checks were successful Build & Deploy / build-and-deploy (push) Successful in 3m57s Details Root cause of "agent can't find the Agasi decision in the corpus" (CMPA-55): the decision was fully ingested, but the retrieval layer failed on the realistic agent query — searching by case name. - RC-A (#52): lexical tsvector covered only chunk content + halacha text, so a bare-name query ("אגסי") matched decisions that cite the case, not the case itself. Add meta_tsv on case_law(case_name, case_number) (SCHEMA V20) and OR it into the lexical halacha/chunk SQL with a match boost, so a name/number hit surfaces the case's own rows. Agasi: rank 4 → rank 1. - RC-B (#53): precedent_library_list hard-defaulted source_kind=external_upload and never exposed the param, hiding uploaded ערר/בל"מ (internal_committee) decisions. Thread source_kind through service → tool → MCP tool (supports 'internal_committee' / 'all_committees'). - #54: agent instructions (researcher/analyst/writer) — search-by-name protocol: add content/case-number, search both corpora, use all_committees before declaring "not in corpus". - #55: chunker produced tiny fragment chunks ("דיון", "החלטה") from header keywords matched mid-sentence. Anchor SECTION_PATTERNS to line start + merge sub-min sections; exclude <50-char fragments at query time (484 existing fragments hidden; full re-chunk tracked as #57). Tests: scripts/test_retrieval_by_name.py (name ranks case above citer + substantive regressions); chunker unit checks (0 tiny chunks). New findings filed as tasks #56 (halacha source_kind leak) and #57 (re-chunk migration). Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>	2026-05-30 11:26:19 +00:00
Chaim	2aee398b4a	feat: Stage C — RAG advanced (#33 , #47 , #48 , #49 , #50 , #51 ) All checks were successful Build & Deploy / build-and-deploy (push) Successful in 1m35s Details Six independent sub-tasks dispatched in parallel; aggregated here. ## #33 — Hide case_name column library-list-panel.tsx: `<TableHead>` + `<TableCell>` for "שם" get `className="hidden"` in both Court and Committee row variants. DB column preserved for future use. ## #47 — Audit script periodic New scripts/audit_corpus_integrity.py — 3 SQL checks (external+ערר prefix, internal missing chair/district, cases.practice_area enum) + CEO wakeup on violations + cron `0 7 * * `. First run: 0 issues. ## #48 — Parent-doc retrieval (gated, default off) Schema V17: precedent_chunks.parent_chunk_id + chunk_role ('child'\|'parent'). New chunker.chunk_document_hierarchical() — section-aware parents (~1500 tokens) containing ~5 overlapping children (~300 tokens each). New db.store_precedent_chunks_hierarchical two-pass writer. Search SQL (semantic + lexical) LEFT-JOIN parent and swap content + dedupe by parent_chunk_id when flag on. Toggle: PARENT_DOC_RETRIEVAL_ENABLED + PARENT_DOC_{CHILD,PARENT}_SIZE_TOKENS. Backfill ~3min and ~$0.20 — deferred to follow-up. ## #49 — Multimodal backfill New scripts/backfill_multimodal_precedents.py with token-matching case_number ↔ source files (PDF + DOCX via PyMuPDF). Ran in container: 26 precedents embedded, 503 pages, $0.21, 0 errors. precedent_image_embeddings grew 3 → 29 rows. 44 remaining are style_corpus-migrated rows (no source file on disk) — will catch up when re-uploaded. ## #50 — Closed-loop feedback + nDCG Schema V18: search_logs + search_relevance_feedback. New telemetry.py with fire-and-forget log_search_bg (p50 = 0.002ms — zero overhead) + auto-infer_relevance_from_citations (reads case drafts → marks score=3 when cited precedent appears in past search top-K). Hooks added to 5 search paths. scripts/compute_ndcg.py for aggregation. Two admin API endpoints (GET /api/admin/rag-metrics + POST .../infer). Dashboard UI deferred — API is enough for now. ## #51 — Halacha quality monitoring New scripts/monitor_halacha_quality.py — baseline avg confidence (trusted=0.849, all=0.833, pending=0.694) with rolling window drift detection. Default 5% threshold. Exits non-zero on alert for cron integration. Recommended: `0 8 * 1` weekly Mon 8am. ## Bonus: 230 unlinked citations → missing_precedents Bulk-imported 230 distinct unlinked citations from precedent_internal_citations to missing_precedents.status='open', party='committee', with notes listing source citers. Top candidate: ע"א 3213/97 (cited 5x). Total open missing_precedents now 237. Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>	2026-05-26 11:26:52 +00:00
Chaim	c6e368e4f7	feat(corpus): Stage A — corpus tagging fixes + prevention layer All checks were successful Build & Deploy / build-and-deploy (push) Successful in 3m8s Details מתקן את הבאג של תיוג שגוי לועדות ערר ומונע חזרתו: Code changes: * New MCP tool `internal_decision_upload` (chair_name+district required) — sole supported path for ingesting committee decisions; tags source_kind='internal_committee' automatically. * Citation guard in `precedent_library_upload` rejects citations starting with "ערר" or "בל\"מ" with a directive to use internal_decision_upload. * `practice_area.py` taxonomy unification: PRACTICE_AREAS now accepts both multi-tenant (appeals_committee/national_insurance/labor_law) and domain (rishuy_uvniya/betterment_levy/compensation_197) values. New helper `to_db_practice_area(multi_tenant, subtype) -> domain`. Agent docs: * legal-researcher (+5K): upload-tool decision flowchart, code samples per source_kind, district enum (ירושלים/מרכז/תל אביב/צפון/דרום/חיפה/ארצי) * legal-ceo, legal-analyst, legal-writer, legal-qa, HEARTBEAT — taxonomy awareness + source_kind-aware citation patterns + research_complete as valid status. * Fixed two pre-existing wrong practice_area values in examples (histael_hashbacha→betterment_levy, pitsuim_197→compensation_197). Closes TaskMaster #30(parts), #38(parts), #39 (root cause). DB-side backfill + CHECK constraints applied directly via psql: * 11 cases.practice_area corrected (1xxx→rishuy, 8xxx→betterment) * 6 case_law records reclassified external_upload→internal_committee with inferred district * 6 chair_name backfilled from full_text (5 שרית אריאלי + 1 דפנה תמיר) * 88 new halachot extracted for newly-uploaded precedents (אנטרים + ירושלים שקופה 1112/22 + אגא וכט) * CHECK constraints: cases.practice_area enum, case_law internal⇒district Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>	2026-05-26 07:40:18 +00:00
Chaim	3e14cd6798	feat: link related precedents across court instances (SCHEMA_V11) Add ability to mark case_law records as related (e.g. same appeal through ועדת ערר → מנהלי → עליון): - DB: case_law_relations join table (bidirectional, V11 migration) - DB CRUD: add/remove/get_case_law_relations - Service: get_precedent() now returns related_cases[] - MCP: precedent_link_cases + precedent_unlink_cases tools - REST: POST/DELETE /api/precedent-library/{id}/relations - UI: RelatedCasesSection on detail page with search dialog and unlink Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>	2026-05-10 07:52:29 +00:00
Chaim	4a9a6b7970	feat(precedents): UI button queues extraction for local MCP worker All checks were successful Build & Deploy / build-and-deploy (push) Successful in 1m27s Details The chair wanted a one-click "extract metadata" button on the edit sheet. The constraint stays the same — claude_session needs the local CLI which the container doesn't have, so the button can't run the extractor itself. Compromise: button stamps a queue marker; the local MCP server drains the queue on demand. DB (V8): two nullable timestamps on case_law, metadata_extraction_requested_at and halacha_extraction_requested_at, with partial indexes for cheap "find pending" scans. API: POST /api/precedent-library/{id}/request-metadata → stamp the row POST /api/precedent-library/{id}/request-halachot → same for halacha GET /api/precedent-library/queue/pending?kind=... → read-only view UI: Sparkles button in the edit sheet header. Click → toast tells the chair what to run from Claude Code. The button never triggers the extractor directly from the container. MCP tool: precedent_process_pending(kind, limit) — runs from Claude Code with the local CLI, picks up everything stamped, calls the extractor for each, clears the timestamp on success. Failures keep the timestamp so the next invocation retries them. Architectural rule (claude_session local-only) is preserved end-to-end and called out in the new endpoint comment + tool docstring. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>	2026-05-03 12:32:25 +00:00
Chaim	73a79ea7e8	feat(precedents): metadata auto-fill, edit sheet, persuasive extraction All checks were successful Build & Deploy / build-and-deploy (push) Successful in 1m28s Details Three improvements to the precedent library based on usage feedback: 1. Auto-fill metadata at upload time. New service precedent_metadata_extractor reads the ruling's full_text and suggests case_name (short), summary, headnote, key_quote, subject_tags, appeal_subtype. The merge policy fills only empty fields, preserving everything the chair typed in the upload form. Wired into the ingest pipeline; also exposed as a re-run endpoint POST /api/precedent-library/{id}/extract-metadata for existing records. 2. Edit sheet in the UI. Pencil icon on each library row opens a pre-populated form covering every field. A Sparkles button on the sheet runs the metadata extractor on demand and refreshes the form. The case_number is read-only because halachot are FK'd to it; renaming requires delete + re-upload. 3. Halacha extractor branches on is_binding. Sources marked binding (Supreme/Administrative) keep the strict halacha prompt. Non-binding sources (other appeals committees, district courts on planning matters) get a different prompt that extracts applications, interpretive principles, and persuasive conclusions — labeled with new rule_types 'application' and 'persuasive'. The fallback also widens chunk selection: if the chunker labeled nothing as legal_analysis/ruling/conclusion, we now run on all chunks rather than returning zero halachot for a usable ruling. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>	2026-05-03 10:19:35 +00:00
Chaim	7ee90dce31	feat: external precedent library with auto halacha extraction All checks were successful Build & Deploy / build-and-deploy (push) Successful in 1m27s Details Adds a third corpus of legal authority distinct from style_corpus (Daphna's prior decisions for voice) and case_precedents (chair-attached quotes per case). The new corpus holds chair-uploaded court rulings and other appeals committee decisions, with binding rules (הלכות) extracted automatically and queued for chair approval. Pipeline (web/app.py + services/precedent_library.py): file → extract → chunk → Voyage embed → halacha_extractor → store + publish progress over the existing Redis SSE channel. Schema V7 (services/db.py): extends case_law with source_kind + extraction status fields under a CHECK constraint pinning practice_area to the three appeals committee domains (rishuy_uvniya, betterment_levy, compensation_197). New precedent_chunks (vector(1024)) and halachot tables (vector(1024) over rule_statement, IVFFlat indexes, gin on practice_areas/subject_tags). Halachot start as pending_review; only approved/published rows are visible to search_precedent_library. Agents: legal-writer, legal-researcher, legal-analyst, legal-ceo, legal-qa get search_precedent_library. legal-writer prompt explains the three-corpus distinction and CREAC use; legal-qa now verifies that every cited halacha resolves to an approved row in the corpus. UI: /precedents page with four tabs — library / semantic search / pending review (J/K nav, A/R/E shortcuts, badge count) / stats. Reuses the existing upload-sheet progress + SSE pattern. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>	2026-05-03 08:38:18 +00:00

9 Commits