legal-ai

Author	SHA1	Message	Date
Chaim	6ff2e36bf9	feat(eval): FU-5 — retrieval eval harness + halacha backlog visibility (#63) Covers GAP-11 (INV-RET4/G8) and GAP-14 (INV-QA1/G10). Retrieval quality was never measured (only telemetry observation) and the halacha review backlog was invisible (the 10/19 gap was found by accident). Unit B — backlog visibility (pure code, container): - metrics.halacha_backlog(conn) → {pending_review, approved, rejected, published, total, oldest_pending_at}; surfaced in metrics.get_dashboard() (get_metrics MCP tool) and /api/system/diagnostics. Live count revealed 178 pending / 1552 total, oldest from 2026-05-03 — previously invisible. Unit A — retrieval eval harness (host-side scripts): - scripts/eval_gold_bootstrap.py — seeds data/eval/gold-set.jsonl. Two sources: citations (cited==relevant via search_relevance_feedback — empty until decisions cite precedents) and known_item (query=case_name → relevant=self; a real citation-free signal, the methodology #52 checked by hand). Idempotent; preserves source='chair' rows. - scripts/eval_retrieval.py — runs the production retrieval path (search_library / search_internal) over the gold-set; computes precision@k, recall@k, MRR, nDCG@k (k=5,10); aggregates overall + per-corpus + per-practice_area; writes a report and a delta vs committed baseline.json (which records the retrieval_config it reflects). --self-test unit-checks the metric math offline. Gold-set strategy = hybrid (chair decision): bootstrap + chair review. The citation source is empty today (0 cited precedents in decisions), so the seed is known-item (77 queries: 54 internal_decisions + 23 precedent_library). The gold-set is PROVISIONAL until Dafna reviews it (the domain chair-gate). Baseline (production config: multimodal+rerank on): R@10=0.987, MRR=0.837, nDCG@10=0.872. Finding: MULTIMODAL_ENABLED=true slightly lowers known-item recall (image-page results displace exact name matches) — relevant to #15. precedent_library weaker than internal (R@10 0.957 vs 1.0) — one external precedent unfindable by name. "CI gate" realized as discipline (re-runnable harness + committed baseline + run before/after any retrieval-layer change) — retrieval needs prod DB + Voyage, no CI runner has that access. Spec: docs/superpowers/specs/2026-05-31-fu5-eval-harness-design.md Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>	2026-05-31 14:58:13 +00:00
Chaim	e8431a2adf	docs(spec): FU-8a process→code guards design (GAP-21/22) + split GAP-23 to #69 GAP-21: sync_agents --verify exits non-zero on drift; adapter_type mismatch counted as drift (loud), not silent skip — makes it an enforceable gate (INV-MC1). GAP-22: fitness-function pytest guarding against raw Paperclip HTTP + direct agent_wakeup_requests INSERT (INV-INT1/INT3). Repo pre-scanned: 0 existing violations → clean forward-fence. Verified vs 3+ sources (architectural fitness functions; drift-verify non-zero exit). GAP-23 (spec→agents) split to #69. Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>	2026-05-31 10:48:15 +00:00
Chaim	105d9626ca	docs(spec): FU-2b internal identifier reconciliation design (GAP-07/08) + split external to #68 Deterministic migration of ~52 internal_committee rows whose case_number holds a full citation → normalized bare number (citation_formatted already correct). DB analysis (2026-05-31): clean 1-token extraction, 0 key-collisions, 0 citation↔case_number mismatches, no month-padding dups. Chair-gated reversible migration (backup→dry-run→approve→apply). One edge for chair: 8047/23 ערר vs בל"מ. External (#68/FU-2c) split out — its citation_formatted is inconsistent. Verified all 11 case_law FKs use id(UUID), not case_number → rename is FK-safe. Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>	2026-05-31 06:12:43 +00:00
Chaim	a62116a571	docs(spec): FU-3 re-index on content change design (GAP-09) + close #61.2 not-applicable content_hash/indexed_hash change detection + reindex_case_law from stored full_text (no re-OCR) + drift health-check. Verified vs 3+ sources (content-hash change detection, RAG re-embed-on-edit). #61.2 multimodal backfill closed: 42 rows are text-ingested (document_id NULL, no source PDF) — page-images impossible without a PDF to render. Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>	2026-05-30 21:52:40 +00:00
Chaim	99cd6bc4dd	docs(spec): FU-7 audit-trail + provenance design (GAP-17/18/19/20) Reuse audit_log.log_action with details JSONB (X5 §4, no new table) for end-to-end audit + block→source provenance. GAP-17 drift = blocks_stale flag + health-check (not fragile DOCX→blocks reparse). GAP-20 = structural case_law_id resolution (not Hebrew citation NLP). Verified vs 3+ sources (append-only lineage event; GitOps drift detect-don't-auto-remediate). Pure-code, no migration. Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>	2026-05-30 21:15:50 +00:00
Chaim	a8b780765d	docs(spec): FU-2a idempotent-ingest design + split FU-2b migration to #67 FU-2 split (chair decision 2026-05-30): FU-2a = pure-code (GAP-03 ON CONFLICT upsert, GAP-06 write-time type-aware normalization, GAP-13 materialized searchable flag); FU-2b (#67) = data-migration for GAP-07/08 (identifier reconciliation + dedup) deferred as separate chair-involved task. DB check 2026-05-30: ~52/56 internal_committee rows hold full citation in case_number, >=1 duplicate (8047-23). Architecture verified vs 3+ sources (PostgreSQL ON CONFLICT, DDD write-boundary normalization, materialized validity flag). No identifier migration in FU-2a. Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>	2026-05-30 19:56:07 +00:00
Chaim	90728ccb3e	docs(spec): FU-1 documented drift notes + mark TaskMaster #59 done Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>	2026-05-30 19:28:04 +00:00
Chaim	357a5238c4	docs(spec): FU-1 unified-ingest design + FU-3 backfill task (#61.2) Design for unifying the two parallel ingest paths (ingest_precedent / ingest_internal_decision) into one canonical pipeline parameterized by an IntakeSpec config object — Template Method skeleton + Strategy injection. Closes the GAP-02 root cause (missing metadata queue on internal path) by making a skipped step structurally impossible. Architecture choice verified against 3+ authoritative sources (refactoring.guru Template-Method/Replace-Conditional, Fowler FlagArgument, Strategy pattern). DB check (2026-05-30): no migration needed — 0/56 internal rows lack metadata, 0 invalid enums; multimodal backfill (42 rows) tracked as TaskMaster #61.2 / FU-3. Covers GAP-01/02/04/05 · provides INV-ING1/ING3/G2/G4 · TaskMaster #59. Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>	2026-05-30 19:00:30 +00:00
Chaim	a5b22dadf3	docs(spec): master design for system spec + integrity layer Establishes the foundation to fix a recurring root-cause failure class (non-canonical identifiers, asymmetric ingest paths, silent manual gates): - Confirmed system mission (quasi-judicial decision assistant; human decides) - Decomposition into 5 sub-projects (spec → audit → integrity layer → re-check → process agents) - spec-set structure under docs/spec/ (lifecycle-organized + cross-cutting files) - 11 global invariants + engineering rules, each backed by ≥3 authoritative sources (NCSC/JTC, FJC, CEPEJ, South Bucks; RAG/Lewis, Manning IR, Elastic/Pinecone/Weaviate; DAMA-DMBOK, ISO 8000, ISO 15489, Kleppmann, Codd, Fowler) - 3-source verification protocol; UNVERIFIED items escalated, not decided solo Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>	2026-05-30 14:05:06 +00:00
Chaim	70052b0133	docs(specs): add design for MCP settings page Settings page extension to view and edit MCP server config (env vars, tools, client registrations) — hybrid edit model: non-secrets editable through Infisical, secrets read-only with drift detection vs container. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>	2026-05-04 05:44:31 +00:00
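
Commit a62116a571 describes change detection by comparing a content_hash against an indexed_hash and re-indexing from the stored full_text. A minimal sketch of that comparison follows. The log does not say which hash function the design uses or how rows are shaped, so SHA-256, the dict row layout, and the helper name needs_reindex are all assumptions made for illustration only.

```python
import hashlib


def content_hash(full_text: str) -> str:
    """Hash the stored text. SHA-256 is an assumption, not taken from the spec."""
    return hashlib.sha256(full_text.encode("utf-8")).hexdigest()


def needs_reindex(row: dict) -> bool:
    """Return True when the text has changed since it was last indexed.

    `row` is a hypothetical dict with `full_text` and `indexed_hash` keys;
    a missing or NULL `indexed_hash` counts as "never indexed".
    """
    return row.get("indexed_hash") != content_hash(row["full_text"])


# Usage: a row whose text was edited after indexing is flagged for re-embedding.
row = {"full_text": "original text", "indexed_hash": content_hash("original text")}
row["full_text"] = "edited text"
if needs_reindex(row):
    print("row is stale; re-embed from stored full_text")
```

Because the check only reads the stored text, it fits the commit's "no re-OCR" constraint: drift can be detected and repaired without the source PDF.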

10 Commits
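
Commit 6ff2e36bf9 names four retrieval metrics (precision@k, recall@k, MRR, nDCG@k) that scripts/eval_retrieval.py computes over the gold-set. The sketch below shows the textbook binary-relevance definitions of those metrics. It is not the code from that script: the function names, the use of plain string ids, and the assumption that each ranked list contains no duplicate ids are all illustrative.

```python
import math
from typing import Sequence, Set


def precision_at_k(ranked: Sequence[str], relevant: Set[str], k: int) -> float:
    """Fraction of the top-k slots holding a relevant id (denominator is k)."""
    return sum(1 for doc in ranked[:k] if doc in relevant) / k


def recall_at_k(ranked: Sequence[str], relevant: Set[str], k: int) -> float:
    """Fraction of all relevant ids that appear in the top k."""
    if not relevant:
        return 0.0
    return sum(1 for doc in ranked[:k] if doc in relevant) / len(relevant)


def reciprocal_rank(ranked: Sequence[str], relevant: Set[str]) -> float:
    """1 / rank of the first relevant id, or 0.0 if none is returned.

    MRR is the mean of this value over all queries.
    """
    for rank, doc in enumerate(ranked, start=1):
        if doc in relevant:
            return 1.0 / rank
    return 0.0


def ndcg_at_k(ranked: Sequence[str], relevant: Set[str], k: int) -> float:
    """Binary-gain nDCG: DCG of the ranking divided by the ideal DCG."""
    dcg = sum(
        1.0 / math.log2(rank + 1)
        for rank, doc in enumerate(ranked[:k], start=1)
        if doc in relevant
    )
    ideal_hits = min(len(relevant), k)
    idcg = sum(1.0 / math.log2(rank + 1) for rank in range(1, ideal_hits + 1))
    return dcg / idcg if idcg else 0.0


# Usage: a known-item query has exactly one relevant id (the case itself).
ranked_ids = ["case-b", "case-a", "case-c"]  # hypothetical search output
gold = {"case-a"}
print(recall_at_k(ranked_ids, gold, 10), reciprocal_rank(ranked_ids, gold))
```

For known-item queries, where each query has a single relevant document, recall@k is either 0 or 1 per query, so the aggregate R@10 is the share of queries whose target appears in the top 10.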