legal-ai

Author	SHA1	Message	Date
Chaim	a61495f5ef	fix(api): export endpoint returns 409 when QA gate blocks (FU-6 UX — avoid false success toast) Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>	2026-05-30 18:03:21 +00:00
Chaim	084b31cd9b	fix(qa): enforce critical-QA gate on export + fix neutral_background critical-but-passed (GAP-15/16, INV-QA3/EX3) Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>	2026-05-30 17:58:50 +00:00
Chaim	1473bdf3c2	merge: FU-4/GAP-10 corpus-isolation fix. Enforce source_kind on halacha_filters (db.py), closing the cross-corpus halacha leak (#56). Verified by offline regression test (mcp-server/tests/test_precedent_corpus_isolation.py).	2026-05-30 17:53:46 +00:00
Chaim	f51036bd98	merge: System Spec-set + gap-audit (sub-projects 1+2) Adds docs/spec/ (14-file living system spec, 11 invariants) + gap-audit (23 findings → 8 fix-units) + TaskMaster tasks 59-66. Closes PR #8. Docs/tasks only — no runtime code.	2026-05-30 17:53:46 +00:00
Chaim	1af689a969	fix(retrieval): enforce source_kind on halacha_filters — close cross-corpus leak (GAP-10, INV-RET1) Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>	2026-05-30 17:46:59 +00:00
Chaim	80d1c5ff27	tasks(legal-ai): reconcile #56 (cancel→superseded by 62.1) + #57 (link to FU-3) Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>	2026-05-30 17:43:12 +00:00
Chaim	d72d5429ed	tasks(legal-ai): 8 fix-unit tasks (59-66) + 23 GAP subtasks from gap-audit Granularity (epic-per-fix-unit + subtask-per-gap) and dependency-aware/WSJF prioritization both backed by ≥3 authoritative sources (SAFe/Pichler/OWASP/CVSS; Wake-INVEST/Cohn/Agile-Alliance/Atlassian/SAFe). Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>	2026-05-30 17:38:31 +00:00
Chaim	28bed4906c	docs(spec): gap-audit — 23 findings mapped to invariants + proposed fix-units (sub-project 2)	2026-05-30 17:27:06 +00:00
Chaim	ebfda74575	docs(spec): X1 — canonical case_number = official assigned number (no month invention); mixed-form reconciliation is a migration task	2026-05-30 17:23:14 +00:00
Chaim	e3880aef4e	docs(spec): sign-off fixes — 06 index row (G2,G9), refresh stale §7 note, fix X3 G9 anchor niqqud Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>	2026-05-30 17:15:00 +00:00
Chaim	380998da17	docs(spec): X5 — file:line/name precision (log_search_bg, user param, active_draft_path)	2026-05-30 17:09:33 +00:00
Chaim	8c4b8cf19e	docs(spec): X5-audit-provenance Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>	2026-05-30 17:05:43 +00:00
Chaim	b0351958db	docs(spec): X4-agents map + reserved process-agents section Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>	2026-05-30 16:59:31 +00:00
Chaim	c881665b7c	docs(spec): constitution index — X3 enforces G2,G9 (operational) Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>	2026-05-30 16:56:39 +00:00
Chaim	7fd6d8cb95	docs(spec): X3 — replace out-of-repo memory links with plain mentions (self-containment)	2026-05-30 16:56:20 +00:00
Chaim	951f2366e6	docs(spec): X3-integration-deploy Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>	2026-05-30 16:53:01 +00:00
Chaim	a0004f0274	docs(spec): constitution - document third authority model (project-operational). X2/X3/X4 invariants are facts about this system's own integration/ops (no external authority); they use מקור-סמכות (source of authority) = project runbooks, tied to a global engineering invariant. Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>	2026-05-30 16:49:58 +00:00
Chaim	f0fd405f4e	docs(spec): X2-multi-company sync rules Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>	2026-05-30 16:47:19 +00:00
Chaim	b0e4e14832	docs(spec): X1-identifiers canonical model Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>	2026-05-30 16:41:37 +00:00
Chaim	b46d25f605	docs(spec): 07-learning loop Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>	2026-05-30 15:21:34 +00:00
Chaim	0fd06659da	docs(spec): 06-export DOCX contract Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>	2026-05-30 15:16:00 +00:00
Chaim	c0ef90d722	docs(spec): 05-qa-review — clarify neutral_background dual return path (critical fallback w/ passed=True); fix line ref Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>	2026-05-30 15:12:30 +00:00
Chaim	c1872aa214	docs(spec): 05-qa-review — QA gates + human gates	2026-05-30 15:09:42 +00:00
Chaim	1582556b0b	docs(spec): 04-analysis-writing — 12 blocks + reasoned-decision invariants Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>	2026-05-30 15:03:56 +00:00
Chaim	5e80bf560d	docs(spec): constitution index — add G9 to 03-retrieval row (consistency) Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>	2026-05-30 15:00:30 +00:00
Chaim	72737df154	docs(spec): 03-retrieval corpora + retrieval invariants Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>	2026-05-30 14:57:11 +00:00
Chaim	998194462f	docs(spec): 02-data-model entities + completeness contract Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>	2026-05-30 14:50:06 +00:00
Chaim	9199214b7c	docs(spec): 01-ingest — trim §4 redundancy (reference INV-ING3) Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>	2026-05-30 14:46:23 +00:00
Chaim	da80bcf0fe	docs(spec): 01-ingest unified intake contract Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>	2026-05-30 14:42:26 +00:00
Chaim	6afd155dc1	docs(spec): scope ≥3-source rule to engineering decisions; reframe legal-content (G11). Per chair clarification: the ≥3-authoritative-source verification protocol governs ENGINEERING/architecture decisions only (G1–G10). Legal-domain content (G11) is the authority of the chair + project docs (block-schema, decision-methodology, lessons, skills/decision), NOT externally triple-sourced. - §2/§4/§5 scoped to engineering invariants; added the two-authority distinction - G11 reframed: source-of-authority = chair + project docs; removed FJC/South Bucks/1958-statute as "sources to verify" and the UNVERIFIED flag - Removed the "open items — primary-source verification" section (the over-application) - Pruned now-orphaned legal sources from the appendix (kept NCSC/CEPEJ/FJC for G9/G10) Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>	2026-05-30 14:37:54 +00:00
Chaim	1daaa4861b	docs(spec): reframe G2 example as structural asymmetry + note forthcoming files Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>	2026-05-30 14:21:00 +00:00
Chaim	fd682d130f	docs(spec): 00-constitution — mission, 11 global invariants, engineering rules Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>	2026-05-30 14:15:28 +00:00
Chaim	c351d6d714	docs(spec): scaffold docs/spec/ living spec-set Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>	2026-05-30 14:12:25 +00:00
Chaim	1d01135e32	docs(plan): implementation plan for system spec-set (sub-project 1) 13 tasks across 3 phases (keystone constitution → lifecycle files → cross-cutting), each verification-gated (≥3 sources or UNVERIFIED+escalate) with review checkpoints. Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>	2026-05-30 14:08:31 +00:00
Chaim	a5b22dadf3	docs(spec): master design for system spec + integrity layer Establishes the foundation to fix a recurring root-cause failure class (non-canonical identifiers, asymmetric ingest paths, silent manual gates): - Confirmed system mission (quasi-judicial decision assistant; human decides) - Decomposition into 5 sub-projects (spec → audit → integrity layer → re-check → process agents) - spec-set structure under docs/spec/ (lifecycle-organized + cross-cutting files) - 11 global invariants + engineering rules, each backed by ≥3 authoritative sources (NCSC/JTC, FJC, CEPEJ, South Bucks; RAG/Lewis, Manning IR, Elastic/Pinecone/Weaviate; DAMA-DMBOK, ISO 8000, ISO 15489, Kleppmann, Codd, Fowler) - 3-source verification protocol; UNVERIFIED items escalated, not decided solo Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>	2026-05-30 14:05:06 +00:00
Chaim	7826ff4910	fix(cases): tolerant case_number lookup so agents see case documents Reported: an agent claimed the case had no documents because document_list returned empty, but the documents exist. Root cause: get_case_by_number did an exact `WHERE case_number = $1`, so any formatting variant of the number silently failed to resolve. Verified on 8137-24 (9 docs): "8137/24", "ערר 8137-24", leading/trailing space, and "בל\"מ 8126/03/25" all returned "תיק לא נמצא" ("case not found"). The agent read that as "no documents" and was left blind. Add _normalize_case_number (strip leading proceeding-type prefix to the first digit, trim, unify '/'→'-') and a normalized fallback in the lookup query (exact match preferred via ORDER BY). One fix covers every case_number-scoped tool (document_list, extract_references, search_case_documents, get_claims, drafting, ...). Bogus numbers still correctly resolve to "not found". (#58) Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>	2026-05-30 11:54:52 +00:00
Chaim	58ab003206	fix(retrieval): make decisions findable by name + unhide committee uploads Root cause of "agent can't find the Agasi decision in the corpus" (CMPA-55): the decision was fully ingested, but the retrieval layer failed on the realistic agent query, which searches by case name. - RC-A (#52): the lexical tsvector covered only chunk content + halacha text, so a bare-name query ("אגסי", i.e. "Agasi") matched decisions that cite the case, not the case itself. Add meta_tsv on case_law(case_name, case_number) (SCHEMA V20) and OR it into the lexical halacha/chunk SQL with a match boost, so a name/number hit surfaces the case's own rows. Agasi: rank 4 → rank 1. - RC-B (#53): precedent_library_list hard-defaulted source_kind=external_upload and never exposed the param, hiding uploaded ערר/בל"מ (internal_committee) decisions. Thread source_kind through service → tool → MCP tool (supports 'internal_committee' / 'all_committees'). - #54: agent instructions (researcher/analyst/writer) gain a search-by-name protocol: add content/case-number, search both corpora, use all_committees before declaring "not in corpus". - #55: the chunker produced tiny fragment chunks ("דיון" [Discussion], "החלטה" [Decision]) from header keywords matched mid-sentence. Anchor SECTION_PATTERNS to line start + merge sub-min sections; exclude <50-char fragments at query time (484 existing fragments hidden; full re-chunk tracked as #57). Tests: scripts/test_retrieval_by_name.py (name ranks case above citer + substantive regressions); chunker unit checks (0 tiny chunks). New findings filed as tasks #56 (halacha source_kind leak) and #57 (re-chunk migration). Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>	2026-05-30 11:26:19 +00:00
Chaim	165efc62b0	docs(claude): correct canonical tasks.json path + add CLI cwd footgun warning TaskMaster's --tag selects the logical group inside a file, not which tasks.json to write; the CLI resolves the file from cwd. Document the canonical project-root-relative path and the cwd footgun. Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>	2026-05-30 11:19:47 +00:00
Chaim	d3c6baf9e2	security(chat): bind chat service to docker bridge + require Bearer auth Address security-review finding: the host-side legal-chat-service was binding 0.0.0.0:8770 with no authentication. The service spawns the claude CLI, whose tool set includes Bash + Edit, so an unauthenticated /chat/start is effectively RCE. Oracle Cloud's security list closes the port externally, but defense-in-depth requires two independent layers: 1. Bind defaults to 10.0.1.1 (docker0 bridge gateway). Reachable from containers on docker bridges (the legal-ai container has a route via the coolify network), invisible to anything outside the host. The --host flag is still configurable for local-dev (127.0.0.1) or special-case deployments, but 0.0.0.0 is explicitly discouraged in the docstring. 2. /chat/start requires Authorization: Bearer <LEGAL_CHAT_SHARED_SECRET>. The secret is loaded from /home/chaim/.legal-chat-service.env (chmod 600, off-repo) by the pm2 ecosystem and mirrored as a Coolify env var so the FastAPI chat_proxy sends a matching header. hmac.compare_digest prevents timing oracles. /health stays unauthenticated (static OK, no subprocess) so the FastAPI proxy can probe liveness without the secret. The service refuses to start if LEGAL_CHAT_SHARED_SECRET is empty or shorter than 24 chars; there is no silent fallback to an open mode. When the Infisical MCP comes back, migrate the secret into the vault at /_GUIDELINES per the project secrets policy. Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>	2026-05-27 10:22:14 +00:00
Chaim	5ad541e54c	ui(precedents): upload sheet routes ערר/בל"מ to internal-decisions endpoint Citations starting with ערר/בל"מ/ARAR are committee decisions and must carry chair_name + district. The /precedents upload form previously errored out for these (precedent_library service rejects them) with no in-UI path forward; internal_decision_upload was only reachable via the /missing-precedents flow. The form now auto-detects committee citations, reveals chair_name + district fields, hides the irrelevant source_type/precedent_level (derived server-side), and posts to /api/internal-decisions/upload. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>	2026-05-27 10:22:03 +00:00
Chaim	a3454bcb57	fix(training): bundle reference content + use docker bridge gateway The Style Studio's curator-prompt + chat features read reference docs from disk at runtime. Two issues from the initial production run: 1. Dockerfile + .dockerignore excluded .claude/, docs/, and most of skills/. Now COPY the four specific files the new endpoints need: - .claude/agents/hermes-curator.md - skills/decision/SKILL.md - docs/legal-decision-lessons.md - docs/corpus-analysis.md .dockerignore opens whitelists for just those files. 2. Coolify's custom_docker_run_options=--add-host=host.docker.internal:host-gateway is not honored on dockerimage build_pack apps (ExtraHosts stayed []). Switch chat_proxy.py default to http://10.0.1.1:8770, the docker0 bridge gateway (same pattern Paperclip uses for 3100). Bind the host pm2 service to 0.0.0.0:8770 so the container can reach it via the bridge IP. Oracle Cloud's security list keeps the port unreachable from the public internet. Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>	2026-05-27 10:15:27 +00:00
Chaim	bb0cd7c6a2	feat(training): Style Studio (upload, rich corpus, lessons, curator portrait, chat) Six-phase upgrade of /training from a read-only dashboard into a full Style Studio for managing Daphna's style corpus. - Upload Sheet on /training: file → proofread preview → commit (no more CLI-only `upload-training` skill). - Rich corpus metadata: GET /api/training/corpus returns summary, outcome, key_principles, page_count, parties (regex), legal_citation, lessons_count. PATCH endpoint for chair edits. CorpusDetailDrawer with 4 tabs (details/content/lessons/patterns) replaces the bare table row. - LLM metadata enrichment: style_metadata_extractor + MCP tools (style_corpus_enrich, style_corpus_pending_enrichment) fill summary/outcome/key_principles via claude_session (free, host-side). - Per-decision lessons: new decision_lessons table + 4 REST endpoints + LessonsTab in drawer; hermes-curator now auto-posts findings as decision_lessons(source=curator). - Curator Portrait tab: prompt rendered with link to Gitea, recent curator findings, style_analyzer training prompts, propose-change form that writes proposals to data/curator-proposals/ for manual chair review (no auto-mutation of the agent file). - Style chat tab: SSE-streamed conversations with the style agent. New host-side pm2 service (legal-chat-service, port 8770) wraps claude CLI with stream-json + --resume continuation; FastAPI proxies via host.docker.internal. Zero API cost (uses chaim's claude.ai subscription). chat_conversations + chat_messages persist history. Architecture: keeps the existing rule that claude_session only runs on the host (not the container). The new legal-chat-service is the canonical bridge between the container and the local CLI for the chat feature; everything else (upload, metadata, lessons) stays within the container's existing capabilities. Audit script (scripts/audit_training_corpus.py) included for verifying which corpus rows still need enrichment. Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>	2026-05-27 10:06:22 +00:00
Chaim	0629f19d5f	ui(missing-precedents): drawer = notes + upload only The drawer was showing a full metadata form (legal topic, case name, legal issue, cited-by-party + name, status); most of it duplicated fields that get auto-extracted from the file once it's uploaded, or that are already known from when the row was detected. The visible placeholder text ('לינדאב בע"מ', 'אנטרים', 'זכות עמידה' ["standing"]) looked like real data and confused readers. Strip the form down to a single "הערות" ("Notes") textarea, the only field the chair actually needs to edit. Who cited the decision, and in what context, belongs there too. Everything else (shape of the precedent on the case_law side) is the LLM extractor's job. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>	2026-05-27 09:58:23 +00:00
Chaim	f920cfc738	ui(precedents): edit sheet - make citation_formatted editable The "ערוך פרטים" ("Edit details") sheet labeled the case_number field "מראה מקום" ("citation") and marked it read-only. That was confusing because the formal citation IS supposed to be editable. Rename the read-only field to "מספר תיק (מזהה ייחודי)" ("case number (unique identifier)") to clarify it's the system key, and add a separate Textarea for the true formal citation (citation_formatted) with the same markdown-bold convention used by the inline editor on the detail page. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>	2026-05-27 09:40:08 +00:00
Chaim	c4046cc0a0	ui(precedents): citation action buttons icon-only Drop the visible "העתק" ("Copy") / "ערוך" ("Edit") labels and keep just the icon, matching the editorial/judicial restraint of the surrounding card. Tooltip + aria-label preserve the affordance for hover and assistive tech. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>	2026-05-27 09:33:55 +00:00
Chaim	cbc7a1e336	feat(precedents): formal citation per Israeli citation rules + copy/edit UI Until now, "case_number" was the only stored identifier for a precedent. But a citation per the Israeli unified citation rules is a different beast: it has bold parties, an unbold prefix (court abbrev + panel/district parenthetical + case number), and an unbold trailing reporter (נבו / פ"ד...). Without storing it as a first-class field we couldn't hand the chair a one-click "copy as citation" experience for pasting into decisions. Changes: - Schema V19: case_law.citation_formatted TEXT (Markdown; parties wrapped in … so the copy helper can render <strong> for Word/Docs paste and keep plain-text fallback meaningful). - Metadata extractor: composes citation_formatted from the document text per the unified citation rules, with worked examples for ע"א / עת"מ / ערר / בל"מ in the prompt. Refuses to store half-formed strings. - PATCH /api/precedent-library/{id} accepts citation_formatted so the chair can correct LLM mistakes. - /precedents/[id]: dedicated "מראה מקום" ("citation") block with bold rendering, a copy-to-clipboard button (text/html + text/plain so Word keeps the bolds), and an inline edit textarea. - /precedents list rows: link displays the formatted citation when available, with a small inline copy button, and falls back to the bare case_number for older rows. Backfill of existing rows happens by re-stamping the extraction queue once V19 has rolled out and the new field is reachable. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>	2026-05-27 07:14:34 +00:00
Chaim	a02a4e3a64	feat(precedents): minimum-effort upload (file + citation; rest auto-extracted) The missing-precedents drawer + general precedent upload both required the user to type chair_name, district, practice_area, court, date etc. upfront, even though those fields can be (and already are, post-upload) extracted from the document text by the LLM. The metadata-extraction wakeup also only fired for the /precedent-library/upload path, leaving missing-precedents committee uploads stuck with whatever stub the user typed. Changes: - Extractor learns chair_name + district and overwrites the new PLACEHOLDER_PENDING_EXTRACTION sentinel for internal_committee rows (the DB CHECK forces non-empty; we stamp the placeholder at insert). - missing_precedent_upload no longer 400s on missing chair/district; it infers district from the citation when possible, falls back to the placeholder, and always fires pc_wake_for_precedent_extraction so the LLM can fill in the rest. - Both upload sheets default to file (+ citation) only; every other field is tucked into a closed <details> labeled "אופציונלי — דריסה ידנית של שדות שיחולצו אוטומטית" ("Optional: manual override of fields that will be auto-extracted"). Required validators on chair/district/practice_area dropped; the LLM fills them. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>	2026-05-26 14:43:25 +00:00
Chaim	b01722b1b4	feat: emit missing_precedent + export_complete webhooks to plugin Adds two webhook emitters in paperclip_api.py that the plugin's onWebhook handler now routes by ``eventType``: * ``emit_missing_precedent_webhook(...)``: fires from POST /api/missing-precedents on first insert (non-duplicate). The plugin surfaces an askUserQuestions interaction on the linked issue so Daphna can choose upload / irrelevant / defer without needing to open the legal-ai UI. * ``emit_export_complete_webhook(...)``: fires from POST /api/cases/{n}/export-docx after a successful export. The plugin attaches a "final-decision" markdown document with a download link to the linked Paperclip issue. Both are fire-and-forget BackgroundTasks; failures are logged but never block the originating request. Company resolution follows the same 1xxx→licensing / 8-9xxx→betterment rule used by emit_case_status_webhook. Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>	2026-05-26 13:29:04 +00:00
Chaim	1d4f214abe	chore(taskmaster): mark #26 + #27 done (Paperclip SDK upgrade + host already on 525)	2026-05-26 12:19:16 +00:00
Chaim	2aee398b4a	feat: Stage C - RAG advanced (#33, #47, #48, #49, #50, #51) Six independent sub-tasks dispatched in parallel; aggregated here. ## #33: Hide case_name column library-list-panel.tsx: `<TableHead>` + `<TableCell>` for "שם" ("Name") get `className="hidden"` in both Court and Committee row variants. DB column preserved for future use. ## #47: Periodic audit script New scripts/audit_corpus_integrity.py: 3 SQL checks (external+ערר prefix, internal missing chair/district, cases.practice_area enum) + CEO wakeup on violations + cron `0 7 * * *`. First run: 0 issues. ## #48: Parent-doc retrieval (gated, default off) Schema V17: precedent_chunks.parent_chunk_id + chunk_role ('child'|'parent'). New chunker.chunk_document_hierarchical(): section-aware parents (~1500 tokens) containing ~5 overlapping children (~300 tokens each). New db.store_precedent_chunks_hierarchical two-pass writer. When the flag is on, search SQL (semantic + lexical) LEFT-JOINs the parent, swaps in its content, and dedupes by parent_chunk_id. Toggle: PARENT_DOC_RETRIEVAL_ENABLED + PARENT_DOC_{CHILD,PARENT}_SIZE_TOKENS. Backfill (~3 min, ~$0.20) deferred to follow-up. ## #49: Multimodal backfill New scripts/backfill_multimodal_precedents.py with token-matching case_number ↔ source files (PDF + DOCX via PyMuPDF). Ran in container: 26 precedents embedded, 503 pages, $0.21, 0 errors. precedent_image_embeddings grew 3 → 29 rows. 44 remaining are style_corpus-migrated rows (no source file on disk); they will catch up when re-uploaded. ## #50: Closed-loop feedback + nDCG Schema V18: search_logs + search_relevance_feedback. New telemetry.py with fire-and-forget log_search_bg (p50 = 0.002 ms, negligible overhead) + automatic infer_relevance_from_citations (reads case drafts → marks score=3 when a cited precedent appears in a past search's top-K). Hooks added to 5 search paths. scripts/compute_ndcg.py for aggregation. Two admin API endpoints (GET /api/admin/rag-metrics + POST .../infer). Dashboard UI deferred; the API is enough for now. ## #51: Halacha quality monitoring New scripts/monitor_halacha_quality.py: baseline avg confidence (trusted=0.849, all=0.833, pending=0.694) with rolling-window drift detection. Default 5% threshold. Exits non-zero on alert for cron integration. Recommended: `0 8 * * 1` (weekly, Mon 8am). ## Bonus: 230 unlinked citations → missing_precedents Bulk-imported 230 distinct unlinked citations from precedent_internal_citations to missing_precedents with status='open', party='committee', and notes listing the source citers. Top candidate: ע"א 3213/97 (cited 5x). Total open missing_precedents now 237. Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>	2026-05-26 11:26:52 +00:00


400 Commits
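Commits a61495f5ef and 084b31cd9b stop export when a critical QA check fails and report this as HTTP 409 rather than as a success. Below is a minimal sketch of such a gate. It assumes each QA check produces a hypothetical `QAResult(check, severity, passed)` record. Those names and the severity vocabulary are illustrative and are not the project's actual types.

```python
from dataclasses import dataclass
from typing import Dict, Iterable, List, Tuple


@dataclass(frozen=True)
class QAResult:
    check: str       # e.g. "neutral_background" (hypothetical record shape)
    severity: str    # assumed vocabulary: "critical", "warning", ...
    passed: bool


def export_gate(results: Iterable[QAResult]) -> Tuple[int, Dict[str, object]]:
    """Return (HTTP status, body): 409 if any critical check failed, else 200."""
    blocking: List[str] = [
        r.check for r in results if r.severity == "critical" and not r.passed
    ]
    if blocking:
        # 409 Conflict: the draft's current state does not allow export.
        return 409, {"exported": False, "blocked_by": blocking}
    return 200, {"exported": True}
```

The gate keys on `passed`. A check that reports critical severity but returns `passed=True` would therefore never block export. That is why the critical-but-passed neutral_background return path noted in c0ef90d722 matters here.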
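Commit 7826ff4910 describes `_normalize_case_number` in three steps:

- strip the leading proceeding-type prefix up to the first digit;
- trim whitespace;
- unify '/' to '-'.

An exact match is preferred over the normalized fallback. The sketch below is a hypothetical in-memory re-implementation of that rule. `normalize_case_number` and `lookup` are illustrative names; the real lookup runs in SQL with `ORDER BY`.

```python
import re
from typing import Iterable, Optional


def normalize_case_number(raw: str) -> str:
    """Hypothetical version of the normalization rule from commit 7826ff4910."""
    s = raw.strip()
    first_digit = re.search(r"\d", s)
    if first_digit is None:
        return s  # nothing numeric: leave it so the lookup reports "not found"
    # Drop any proceeding-type prefix (e.g. 'ערר ', 'בל"מ ') before the first digit,
    # then unify separators so '8137/24' and '8137-24' compare equal.
    return s[first_digit.start():].replace("/", "-").strip()


def lookup(known: Iterable[str], query: str) -> Optional[str]:
    """Exact match first; otherwise fall back to comparing normalized forms."""
    candidates = list(known)
    if query in candidates:
        return query
    target = normalize_case_number(query)
    for case_number in candidates:
        if normalize_case_number(case_number) == target:
            return case_number
    return None
```

For example, `lookup(["8137-24"], "ערר 8137/24")` resolves to `"8137-24"`. A number with no counterpart still returns `None`, which matches the commit's "bogus numbers still resolve to not found".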
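Commit 58ab003206 (#55) makes two chunker changes:

- It anchors SECTION_PATTERNS to line start, so a header keyword such as "דיון" (Discussion) no longer opens a section when it appears mid-sentence.
- It merges sections that fall below a minimum size.

Below is a rough sketch using Python's `re.MULTILINE`. It rests on assumptions:

- The two-keyword pattern is hypothetical and stands in for the real SECTION_PATTERNS.
- The `MIN_CHARS = 50` merge threshold is also assumed. The commit's 50-character figure belongs to the query-time fragment filter.

```python
import re
from typing import List

# Assumed header keywords ("Discussion", "Decision"). '^' with re.MULTILINE matches
# only at the start of a line, so the same word mid-sentence does not open a section.
SECTION_RE = re.compile(r"^(?:דיון|החלטה)\b", re.MULTILINE)
MIN_CHARS = 50  # assumed merge threshold, borrowed from the commit's fragment cutoff


def split_sections(text: str) -> List[str]:
    """Split text at line-initial header keywords; text before the first header is kept."""
    starts = [m.start() for m in SECTION_RE.finditer(text)]
    if not starts or starts[0] != 0:
        starts.insert(0, 0)
    bounds = starts + [len(text)]
    pieces = (text[a:b].strip() for a, b in zip(bounds, bounds[1:]))
    return [p for p in pieces if p]


def merge_small(sections: List[str], min_chars: int = MIN_CHARS) -> List[str]:
    """Append any section shorter than min_chars to the previous one."""
    merged: List[str] = []
    for section in sections:
        if merged and len(section) < min_chars:
            merged[-1] += "\n" + section
        else:
            merged.append(section)
    return merged
```

Merging into the previous section keeps a short trailing header attached to the content it belongs to, so it is not emitted as a tiny chunk on its own.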
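Commit d3c6baf9e2 requires `Authorization: Bearer <LEGAL_CHAT_SHARED_SECRET>`, compares the token with `hmac.compare_digest`, and refuses to start with a secret shorter than 24 characters. The sketch below shows those two checks without any web framework. The function names are hypothetical; the real service wires the checks into its HTTP handler.

```python
import hmac
from typing import Optional

MIN_SECRET_LEN = 24  # from the commit: shorter secrets are rejected at startup


def load_secret(value: Optional[str]) -> str:
    """Fail closed: refuse to run with an empty or short shared secret."""
    if not value or len(value) < MIN_SECRET_LEN:
        raise SystemExit("LEGAL_CHAT_SHARED_SECRET is empty or too short; refusing to start")
    return value


def is_authorized(auth_header: Optional[str], secret: str) -> bool:
    """Check an 'Authorization: Bearer <token>' header value against the secret."""
    if not auth_header:
        return False
    scheme, _, token = auth_header.partition(" ")
    if scheme.lower() != "bearer" or not token:
        return False
    # compare_digest avoids content-based short-circuiting, so response time does not
    # reveal how many leading characters matched; bytes avoid a TypeError on non-ASCII.
    return hmac.compare_digest(token.encode("utf-8"), secret.encode("utf-8"))
```

Raising `SystemExit` at load time mirrors the commit's "no silent fallback to an open mode". A misconfigured deployment fails loudly instead of serving /chat/start unauthenticated.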
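Commit b01722b1b4 resolves the target company from the case number: 1xxx maps to licensing, and 8xxx/9xxx map to betterment. The sketch below applies that rule under three assumptions:

- The input is already a bare number such as `8137-24`.
- The leading serial has exactly four digits.
- Any other input maps to `None`.

The function name is hypothetical.

```python
import re
from typing import Optional

# Leading four-digit serial, not followed by another digit (assumed format, e.g. "8137-24").
_SERIAL_RE = re.compile(r"^(\d{4})(?!\d)")


def resolve_company(case_number: str) -> Optional[str]:
    """1xxx -> licensing, 8xxx/9xxx -> betterment, anything else -> None."""
    match = _SERIAL_RE.match(case_number.strip())
    if match is None:
        return None
    first = match.group(1)[0]
    if first == "1":
        return "licensing"
    if first in ("8", "9"):
        return "betterment"
    return None
```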
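Commit 2aee398b4a (#50) aggregates search relevance into nDCG via scripts/compute_ndcg.py and scores citation-derived judgments as 3. Below is a minimal nDCG@k sketch. It assumes graded relevance on a 0 to 3 scale and linear gain rel / log2(rank + 1). It also normalizes against the retrieved list sorted best-first. The actual script may use exponential gain or normalize against the full judged set.

```python
import math
from typing import Sequence


def dcg_at_k(rels: Sequence[float], k: int) -> float:
    """Linear-gain DCG: sum of rel_i / log2(i + 1) over 1-based ranks i <= k."""
    return sum(rel / math.log2(rank + 2) for rank, rel in enumerate(rels[:k]))


def ndcg_at_k(rels: Sequence[float], k: int) -> float:
    """DCG of the observed order divided by DCG of the same judgments sorted best-first."""
    ideal = dcg_at_k(sorted(rels, reverse=True), k)
    return dcg_at_k(rels, k) / ideal if ideal > 0 else 0.0
```

For example, a search whose only relevant hit (score 3) sits at rank 2 scores `1 / log2(3)`, about 0.63. The same hit at rank 1 scores 1.0.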
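Commit 2aee398b4a (#51) monitors halacha confidence against a baseline (trusted=0.849) with a default 5% threshold and exits non-zero on alert. The sketch below implements that check under two assumptions:

- The threshold is a relative drop of the window mean below the baseline.
- Only downward drift raises an alert.

The function names are hypothetical.

```python
import statistics
from typing import Sequence

BASELINE_TRUSTED = 0.849  # baseline avg confidence for 'trusted', quoted in the commit
DEFAULT_THRESHOLD = 0.05  # 5% default threshold from the commit


def drifted(window: Sequence[float], baseline: float,
            threshold: float = DEFAULT_THRESHOLD) -> bool:
    """True if the window mean sits more than `threshold` (relative) below the baseline."""
    if not window:
        return False
    drop = (baseline - statistics.fmean(window)) / baseline
    return drop > threshold


def main(window: Sequence[float], baseline: float = BASELINE_TRUSTED) -> int:
    """Process exit status for cron: 1 on alert, 0 otherwise."""
    return 1 if drifted(window, baseline) else 0
```

A cron wrapper would call `sys.exit(main(window))` so that the scheduler sees the non-zero status. For example, a window averaging 0.80 against 0.849 is a drop of about 5.8% and triggers the alert.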