legal-ai

Author	SHA1	Message	Date
Chaim	1286a1e60d	feat(halacha): application gate + lexical dedup tail + quality harnesses (#81,#82) Halacha-extraction quality (#81) and dedup-on-insert (#82) — engine changes (pure + tested) plus measurement/ops tooling. halacha_quality.py - #81.4 application gate: is_fact_dependent() (high-precision "applied to THIS case" deixis per the strict rubric §3/§27) + FLAG_APPLICATION. compute_quality_flags now takes rule_type and flags rule_type=='application' OR fact-dependent — blocking auto-approve (an illustration is not a generalizable holding). - #82.3 lexical tail signal: jaccard_shingles / normalized_levenshtein / lexical_near_duplicate + FLAG_NEAR_DUPLICATE, for the 0.83–0.93 cosine band. halacha_extractor.py — pass rule_type to the flag computation; re-type a binding-labeled fact-application to 'application' (mirrors non_decision→obiter). db.py (store_halachot_for_chunk) — dedup now fetches the nearest same-precedent neighbor once: cosine ≥ DEDUP → skip (unchanged); cosine in [BAND, DEDUP) with high lexical overlap → FLAG_NEAR_DUPLICATE (review, not skip — never drop a possibly-distinct principle unreviewed). config.py — HALACHA_DEDUP_BAND_COSINE (0.83). Scripts: - scripts/halacha_goldset.py (#81.7) — export stratified sample for human tagging; score validators (P/R/F1) against the tags. Backbone for #81.8. - scripts/halacha_batch_reconcile.py (#82.7) — conservative cross-precedent dedup (cosine ≥0.95), dry-run report only. - scripts/calibrate_halacha_dedup.py (#82.1) — calibrate the lexical thresholds against the 2026-06-03 cleanup gold-set. Deferred (documented): #82.4 merge-provenance and #82.5 DB ON CONFLICT/UNIQUE on normalized quote are NOT included — the current skip+flag behavior is safe, whereas a UNIQUE on normalized_quote would fail on existing dups and a blind merge risks losing provenance; they need their own chair-reviewed migration. #82.6 over-merge guard is moot until merge lands. #81.6 full rhetorical-role classifier deferred (section pre-filter + application flag cover the practical case); #81.8 blocked on the human-tagged gold-set (harness now provided). Verified: - pytest tests/test_halacha_quality.py — 52 passed (14 new). - calibrate: configured (0.55,0.70) → precision 1.0 (zero false-merge), recall 0.30 — correct profile for an auto-approve-blocking signal. - goldset export: 15-row sample CSV. batch reconcile: 819 halachot → 5 cross-precedent candidate pairs. Invariants: G1 (normalize at source — flag at insert, not at read); §6 (no silent swallow — suspect items flagged to review, never dropped); G2 (no parallel path — same store_halachot_for_chunk / compute_quality_flags). Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>	2026-06-06 19:55:45 +00:00
Chaim	fb60dca796	feat(halacha): over-extraction consolidation — fold facets via claude_session (#81.5) After a precedent finishes extracting, a claude_session pass folds facets of the SAME legal question (below #82's dedup cosine — the שפר 14-vs-4 / 403-17→89 granularity gap) into one canonical; the rest are marked 'rejected' (reversible: out of the active corpus AND the review queue, but recoverable). FOLD-ONLY — never merges distinct legal questions, never invents. - Engine: claude_session-as-judge (local CLI, zero cost), 'high' effort — folding needs careful judgment. One pass per precedent, runs in _extract_impl once all chunks are done (the prompt dedups within a chunk; this catches across chunks). - Pure, unit-tested helpers in halacha_quality: CONSOLIDATE_SYSTEM, build_consolidation_prompt, parse_fold_groups (fails SAFE → [] on any malformed shape; drops <2-member groups; coerces/dedups indices). - halacha_extractor._consolidate_precedent picks the canonical per group (approved>pending, higher confidence, quote_verified, longer) and rejects the rest via the existing update_halachot_batch (#84). Never rejects a canonical. Fails OPEN on any error (no CLI / parse fail → 0 folds, data untouched). - config: HALACHA_CONSOLIDATE_ENABLED/MODEL/EFFORT. Verified: suite 176 passed (10 new); integration vs dev DB — a 2-facet group folds to 1 canonical + 1 rejected (tagged), distinct rules untouched, claude error → 0 folds (fail-open). Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>	2026-06-03 16:26:44 +00:00
Chaim	f196bed564	feat(halacha): NLI entailment validator via claude_session (#81.3) + task #86 #81.3 — a post-extraction validator that flags halachot whose rule_statement is NOT entailed by its supporting_quote (the model over-reaching beyond its source). - Engine: claude_session-as-judge (local CLI, zero API cost) per chaim's standing preference — one batched judge call per chunk, NOT a hosted NLI model. - Pure, unit-tested helpers in halacha_quality: NLI_SYSTEM, build_nli_prompt, parse_nli_verdicts (fails OPEN — any shape/label ambiguity → 'entailed'). - halacha_extractor._nli_check wraps the call; fails OPEN on any error (e.g. no CLI in the container) so a flaky judge never blocks a genuine halacha. - Non-entailed (neutral/contradiction) → quality_flag 'nli_unsupported' which blocks auto-approve (routes to pending_review) via the existing store gate. - config: HALACHA_NLI_ENABLED/MODEL/EFFORT (effort 'low' — entailment is simple). Verified: suite 166 passed (10 new); LIVE smoke test against the real claude CLI returned ['entailed','neutral'] for a supported vs unsupported rule. Also commits TaskMaster #86 (Nevo preamble/ratio: anti-contamination strip fix + gold-set benchmark) capturing today's strip_nevo_preamble findings. Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>	2026-06-03 14:46:12 +00:00
Chaim	ca959d4a9c	feat(halacha): strict-rubric quality gate + dedup-on-insert (#81,#82) Bake the 2026-06-03 strict-cleanup rubric into the extraction pipeline so the corpus stays clean at the source instead of accumulating duplicates, obiter dicta, truncated quotes and thin restatements that clog the review queue. #81 — quality gate: - New pure module halacha_quality.py with unit-tested validators: non-decision/obiter (Wambaugh markers), truncated-quote (mid-word cut), thin-restatement (rule≈quote), quote-unverified. - Validators run in halacha_extractor._process; a non-decision is re-typed obiter; flags persist in new halachot.quality_flags column. - Auto-approve now requires confidence>=threshold AND no quality flags; flagged items route to pending_review regardless of confidence. - Both extraction prompts hardened: reject undecided dicta, exclude case-specific applications, require abstraction, forbid over-splitting. #82 — dedup-on-insert (store_halachot_for_chunk): - Within the same precedent, skip a halacha whose normalized supporting_quote already exists, or whose rule-embedding has cosine>=HALACHA_DEDUP_COSINE (0.93) against an already-stored one. Makes re-runs idempotent. Migration: halachot.quality_flags TEXT[] (additive, idempotent ALTER). Tests: 19 new unit tests; full suite 156 passed. Validated end-to-end against dev DB (dedup skips dups, flag blocks auto-approve, re-run inserts 0). Calibration: flags fire on only ~10% of current survivors (low false-positive). Spec: docs/halacha-strict-rubric.md Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>	2026-06-03 12:30:38 +00:00
Chaim	885cba543e	feat(halacha): lighter effort for BULK queue-drain extraction (speed at scale) xhigh is the quality sweet-spot for a single precedent but very slow at scale (64-chunk case ≈ 20 min). Bulk queue-drains (process_pending over many precedents) now use a lighter effort to cut wall-clock; interactive single re-extraction keeps xhigh quality. - config.HALACHA_BULK_EXTRACT_EFFORT (env, default 'high'; set 'medium' for max speed, 'xhigh' to match single). - extract()/_extract_impl()/_extract_chunk() take an `effort` override threaded to claude_session.query_json; None falls back to HALACHA_EXTRACT_EFFORT (xhigh). - process_pending_extractions(kind='halacha') passes the bulk effort; single reextract_halachot keeps xhigh. Verified end-to-end (mocked LLM): _extract_chunk(effort='medium') → query_json effort='medium'; effort=None → 'xhigh' fallback. Closes the open item in #72. Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>	2026-05-31 21:34:13 +00:00
Chaim	8e4ea23882	feat(halacha): crash-safe incremental extraction + resume (A + resume) Halacha extraction held ALL chunk results in memory and stored once at the very end — a crash/interrupt mid-run (e.g. the 2026-05-31 freeze) lost everything and re-paid the full LLM cost on retry. Now each chunk's halachot are stored AND the chunk is checkpointed (precedent_chunks.halacha_extracted_at) the moment it finishes: - V25 schema: precedent_chunks.halacha_extracted_at (per-chunk checkpoint). - db.store_halachot_for_chunk: atomic per-chunk insert (halacha_index continues from MAX, caller serializes via an in-process store-lock) + checkpoint mark. - db.reset_halacha_extraction (force) / mark_all_chunks_extracted (legacy backfill). - _extract_impl rewritten: resume by default (skip checkpointed chunks; failed chunks stay pending and are retried; status stays 'processing' until all done); force=True wipes + redoes all. reextract_halachot passes force=True; the queue drain (process_pending) resumes by default. - Legacy guard: a pre-V25 precedent (halachot exist, no checkpoints) is backfilled and treated as complete — never re-extracted (would duplicate). Verified on 9002-24 (55 halachot, legacy): resume → legacy-backfill, NO duplication (stays 55), all chunks checkpointed. Index continuation: store at 55,56 after max 54, no collision. Tracks #72. Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>	2026-05-31 21:27:46 +00:00
Chaim	807053ec54	fix(halacha): global advisory lock — one extraction at a time (prevents box freeze) 2026-05-31: opus-4-8 @ xhigh extraction + overlapping driver processes (agent fallback retries each spawn an independent `python -c` driver; process_pending is serial WITHIN a process but the box ran 4-5 drivers in parallel) → 12-16 concurrent xhigh `claude -p` procs → load 69 → hard reboot. Fix: halacha_extractor.extract() now takes a Postgres advisory lock (pg_try_advisory_lock, key 'HALA') before any work. If another extraction (any process/agent/driver — all share the legal-ai DB) holds it, the call returns status='busy' and the precedent stays pending for the next drain. Guarantees ONE extraction at a time ACROSS PROCESSES — an in-process Semaphore cannot (drivers are separate OS processes). Core logic moved to _extract_impl (unchanged) under the lock. CHUNK_CONCURRENCY now env-tunable (HALACHA_CHUNK_CONCURRENCY, default 3). Verified: while a lock is held, extract() returns 'busy' with no LLM call; lock releases cleanly and the next extraction proceeds. Tracks #72. Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>	2026-05-31 20:42:15 +00:00
Chaim	887079535c	feat(spec): X11 citation-corroboration + INV-G10 amendment + Opus 4.8 halacha extraction ספ חדש לשכבת citator פנימית — תיקוף הלכות לפי טיפול-שיפוטי מצטבר (ציטוטים נכנסים), לצמצום היקף האישור-הידני של היו"ר: - docs/spec/X11-citation-corroboration.md — 6 invariants (INV-COR1–COR6), כל אחד עם ≥3 מקורות מקצועיים (Shepard's/KeyCite, Hellyer LLJ 2018, UNC Law, NCSC/JTC, CEPEJ). - docs/spec/00-constitution.md — תיקון מבוקר ל-INV-G10: השער מסופק ע"י טיפול-שיפוטי-מצטבר לתת-הקבוצה החיובית, שער-היו"ר נשאר חובה לזנב ולשלילי. + X11 באינדקס. - Opus 4.8 @ xhigh כמודל חילוץ הלכות (config HALACHA_EXTRACT_MODEL/EFFORT, env-tunable; claude_session model/effort params; halacha_extractor מחווט). מבוסס A/B 2026-05-31: פחות חילוץ-יתר, 100% quote-verified, ביטחון מכויל. - scripts/ab_halacha_opus48.py — harness A/B לא-הרסני להשוואת מודל/effort בחילוץ הלכות. - .taskmaster #70 (FU-2c-b) — תיעוד dedup שפר + סריקת-קורפוס (0 stubs תקועים נותרו). תנאי-קדם (זהות נקייה) הושלם: שפר מוזג לרשומה קנונית + סריקת 128 רשומות. audit-findings גלויים ב-X11 §7: קישור הלכה↔ציטוט + סיווג-טיפול = greenfield, ל-implementation plan. Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>	2026-05-31 18:42:13 +00:00
Chaim	36f21c815e	fix(precedents): distinguish silent extraction failure from "no halachot" All checks were successful Build & Deploy / build-and-deploy (push) Successful in 3m5s Details Observed 2026-05-03: a `precedent_process_pending(halacha)` run that chained two precedents (1110/20 → 317/10) succeeded for the first (9 halachot, 129 chunks) and produced status=`no_halachot` for the second despite it being a 47KB Supreme Court ruling with rich legal analysis. A manual single-precedent re-run on 317/10 immediately extracted 53 halachot. Diagnosis: every chunk's claude_session call in the back-to-back run silently failed (likely Anthropic rate-limit storm after the 1110/20 token burn), and the empty list was reported as "Claude looked and found nothing" — same code path as a real 0-halacha ruling. The user couldn't tell the difference. Three changes: 1. Surface chunk-level failures (halacha_extractor.py) `_extract_chunk` now returns `(halachot, succeeded)` so the caller can count how many chunks crashed. `extract()` uses this to distinguish: - `no_halachot` — chunks ran cleanly, Claude found nothing - `extraction_failed` — ≥50% of chunks crashed AND zero halachot came back (rate limit, subprocess crash, etc.) When `extraction_failed`, DB status is left as 'processing' so the request stays in the queue for the caller to retry — instead of the old behaviour where it got marked 'completed' and silently dropped from the queue. 2. Inter-precedent cooldown (precedent_library.py) `process_pending_extractions` now sleeps 30s between precedents. Anthropic rate-limits per-org, and back-to-back large rulings (~4M tokens for 1110/20, immediately followed by another 2-3M) was the empirical trigger. 30s gives the per-minute counter time to drain. 3. Auto-retry on extraction_failed (precedent_library.py) When a precedent comes back as `extraction_failed`, retry once after a 60s cooldown before giving up. Rate-limit storms are transient — the manual re-run of 317/10 minutes later succeeded with 53 halachot and zero chunk failures, confirming a single retry is sufficient. Only retries `extraction_failed`; never `no_halachot` (Claude looked and there genuinely is no holding). The DB status now ends up as 'failed' only after retries are exhausted, matching the UI's terminal-failure chip. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>	2026-05-04 05:13:10 +00:00
Chaim	5d836ca414	fix(precedents): Anthropic SDK fallback, format() crash, UI refresh All checks were successful Build & Deploy / build-and-deploy (push) Successful in 1m31s Details Three fixes to the precedent library after the first end-to-end test on 403-17 surfaced runtime issues: 1. Anthropic SDK fallback in claude_session. The legal-ai Docker container does not ship the `claude` CLI, so every halacha and metadata extraction was failing with "Claude CLI not found." Module now tries the CLI first (zero-cost local path) and falls back to the Anthropic SDK with ANTHROPIC_API_KEY when the binary is absent. Default model is claude-sonnet-4-6, overridable via CLAUDE_SDK_MODEL env. The system message gets cache_control: ephemeral so multi-chunk runs reuse the cached instruction prefix at ~10% read cost. Adds `anthropic` to pyproject deps. 2. precedent_metadata_extractor crashed with KeyError because the JSON example inside the prompt template contained literal { } characters that str.format() interpreted as placeholders. Switched to f-string concatenation; the prompt template no longer needs format() at all. 3. Library list query stays stale after upload because the upload mutation's onSuccess fires when the POST returns task_id, not when SSE reports completion. Added a second invalidate inside the SSE watcher in PrecedentUploadSheet so the new row appears with up-to-date chunk and halachot counts the moment processing finishes. Halacha and metadata extractors now route the long static prompt through the new `system=` parameter so the SDK path actually caches it; the CLI path concatenates and behaves as before. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>	2026-05-03 10:52:31 +00:00
Chaim	73a79ea7e8	feat(precedents): metadata auto-fill, edit sheet, persuasive extraction All checks were successful Build & Deploy / build-and-deploy (push) Successful in 1m28s Details Three improvements to the precedent library based on usage feedback: 1. Auto-fill metadata at upload time. New service precedent_metadata_extractor reads the ruling's full_text and suggests case_name (short), summary, headnote, key_quote, subject_tags, appeal_subtype. The merge policy fills only empty fields, preserving everything the chair typed in the upload form. Wired into the ingest pipeline; also exposed as a re-run endpoint POST /api/precedent-library/{id}/extract-metadata for existing records. 2. Edit sheet in the UI. Pencil icon on each library row opens a pre-populated form covering every field. A Sparkles button on the sheet runs the metadata extractor on demand and refreshes the form. The case_number is read-only because halachot are FK'd to it; renaming requires delete + re-upload. 3. Halacha extractor branches on is_binding. Sources marked binding (Supreme/Administrative) keep the strict halacha prompt. Non-binding sources (other appeals committees, district courts on planning matters) get a different prompt that extracts applications, interpretive principles, and persuasive conclusions — labeled with new rule_types 'application' and 'persuasive'. The fallback also widens chunk selection: if the chunker labeled nothing as legal_analysis/ruling/conclusion, we now run on all chunks rather than returning zero halachot for a usable ruling. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>	2026-05-03 10:19:35 +00:00
Chaim	7ee90dce31	feat: external precedent library with auto halacha extraction All checks were successful Build & Deploy / build-and-deploy (push) Successful in 1m27s Details Adds a third corpus of legal authority distinct from style_corpus (Daphna's prior decisions for voice) and case_precedents (chair-attached quotes per case). The new corpus holds chair-uploaded court rulings and other appeals committee decisions, with binding rules (הלכות) extracted automatically and queued for chair approval. Pipeline (web/app.py + services/precedent_library.py): file → extract → chunk → Voyage embed → halacha_extractor → store + publish progress over the existing Redis SSE channel. Schema V7 (services/db.py): extends case_law with source_kind + extraction status fields under a CHECK constraint pinning practice_area to the three appeals committee domains (rishuy_uvniya, betterment_levy, compensation_197). New precedent_chunks (vector(1024)) and halachot tables (vector(1024) over rule_statement, IVFFlat indexes, gin on practice_areas/subject_tags). Halachot start as pending_review; only approved/published rows are visible to search_precedent_library. Agents: legal-writer, legal-researcher, legal-analyst, legal-ceo, legal-qa get search_precedent_library. legal-writer prompt explains the three-corpus distinction and CREAC use; legal-qa now verifies that every cited halacha resolves to an approved row in the corpus. UI: /precedents page with four tabs — library / semantic search / pending review (J/K nav, A/R/E shortcuts, badge count) / stats. Reuses the existing upload-sheet progress + SSE pattern. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>	2026-05-03 08:38:18 +00:00

12 Commits