Commit Graph

171 Commits

Author SHA1 Message Date
6fcfdc76db fix(#79): chunker never emits sub-50-char fragment chunks (#55 follow-up)
A section that opens with a short header line ('דיון', 'טענות המשיבים')
followed by a paragraph larger than chunk_size flushed the header alone as a
tiny chunk. #55 added a query-time >=50 filter to hide these; this removes
them at the source.

_split_section: (1) don't flush a buffer still below MIN_CHUNK_CHARS — let it
absorb the next paragraph even if that overflows chunk_size, so a short header
rides with its following content; (2) fold a trailing tiny chunk back into its
predecessor.

Verified: re-chunked the 4 corpus docs that still had a tiny chunk
(ע"א 5138/04, בר"מ 2340/02, בג"ץ 6525/15, 403-17) — corpus-wide chunks<50
went 4 -> 0; all 4 stay embedded/searchable and rank top in a relevant search
(נווה שלום #1 for the s.19(ג)(1) exemption query). No regression.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
2026-06-03 08:10:10 +00:00
f46bf47d5b feat(web-ui): expose citation-corroboration badge on halachot (X11)
- db.list_halachot: aggregate corroboration_count (distinct positive sources)
  + corroboration_negative from halacha_citation_corroboration (LEFT JOIN)
- web-ui: CorroborationBadge — 'מתוקף · N ציטוטים' at ≥2 (gold), soft single
  citation, danger badge on negative treatment; native title tooltips
- shown in ExtractedHalachotSection (per-precedent) + halacha review panel

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
2026-06-01 05:04:31 +00:00
ed547e20ad feat(corroboration): wire approval gate + backfill driver + rebuild tool (X11 Phase 2)
- db: approve_halacha_by_corroboration (pending_review→approved only),
  demote_halacha_overruled (approved→pending_review only), list_corroboration_grouped,
  precedents_with_halachot_and_incoming_citations
- corroboration: reconcile_approvals (INV-COR2/COR4/COR5), build_all backfill;
  build_for_precedent now returns approved/demoted counts
- mcp: corroboration_rebuild write tool (single precedent or full-corpus backfill)

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
2026-06-01 04:35:37 +00:00
df007784c9 feat(corroboration): approval_action decision fn + kill-switch (INV-COR2/COR4, X11 Phase 2)
- HALACHA_CORROBORATION_AUTO_APPROVE config (default ON, Dafna validated 2026-06-01)
- approval_action(agg, has_overruled): overruled→demote, corroborated→approve, else None
- 4 offline unit tests; Phase 2 plan + TaskMaster #75

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
2026-06-01 04:34:23 +00:00
885cba543e feat(halacha): lighter effort for BULK queue-drain extraction (speed at scale)
xhigh is the quality sweet-spot for a single precedent but very slow at scale
(64-chunk case ≈ 20 min). Bulk queue-drains (process_pending over many
precedents) now use a lighter effort to cut wall-clock; interactive single
re-extraction keeps xhigh quality.

- config.HALACHA_BULK_EXTRACT_EFFORT (env, default 'high'; set 'medium' for max
  speed, 'xhigh' to match single).
- extract()/_extract_impl()/_extract_chunk() take an `effort` override threaded
  to claude_session.query_json; None falls back to HALACHA_EXTRACT_EFFORT (xhigh).
- process_pending_extractions(kind='halacha') passes the bulk effort; single
  reextract_halachot keeps xhigh.

Verified end-to-end (mocked LLM): _extract_chunk(effort='medium') → query_json
effort='medium'; effort=None → 'xhigh' fallback. Closes the open item in #72.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
2026-05-31 21:34:13 +00:00
8e4ea23882 feat(halacha): crash-safe incremental extraction + resume (A + resume)
Halacha extraction held ALL chunk results in memory and stored once at the very
end — a crash/interrupt mid-run (e.g. the 2026-05-31 freeze) lost everything and
re-paid the full LLM cost on retry.

Now each chunk's halachot are stored AND the chunk is checkpointed
(precedent_chunks.halacha_extracted_at) the moment it finishes:

- V25 schema: precedent_chunks.halacha_extracted_at (per-chunk checkpoint).
- db.store_halachot_for_chunk: atomic per-chunk insert (halacha_index continues
  from MAX, caller serializes via an in-process store-lock) + checkpoint mark.
- db.reset_halacha_extraction (force) / mark_all_chunks_extracted (legacy backfill).
- _extract_impl rewritten: resume by default (skip checkpointed chunks; failed
  chunks stay pending and are retried; status stays 'processing' until all done);
  force=True wipes + redoes all. reextract_halachot passes force=True; the queue
  drain (process_pending) resumes by default.
- Legacy guard: a pre-V25 precedent (halachot exist, no checkpoints) is
  backfilled and treated as complete — never re-extracted (would duplicate).

Verified on 9002-24 (55 halachot, legacy): resume → legacy-backfill, NO
duplication (stays 55), all chunks checkpointed. Index continuation: store at
55,56 after max 54, no collision. Tracks #72.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
2026-05-31 21:27:46 +00:00
807053ec54 fix(halacha): global advisory lock — one extraction at a time (prevents box freeze)
2026-05-31: opus-4-8 @ xhigh extraction + overlapping driver processes (agent
fallback retries each spawn an independent `python -c` driver; process_pending is
serial WITHIN a process but the box ran 4-5 drivers in parallel) → 12-16 concurrent
xhigh `claude -p` procs → load 69 → hard reboot.

Fix: halacha_extractor.extract() now takes a Postgres advisory lock
(pg_try_advisory_lock, key 'HALA') before any work. If another extraction (any
process/agent/driver — all share the legal-ai DB) holds it, the call returns
status='busy' and the precedent stays pending for the next drain. Guarantees ONE
extraction at a time ACROSS PROCESSES — an in-process Semaphore cannot (drivers
are separate OS processes). Core logic moved to _extract_impl (unchanged) under
the lock. CHUNK_CONCURRENCY now env-tunable (HALACHA_CHUNK_CONCURRENCY, default 3).

Verified: while a lock is held, extract() returns 'busy' with no LLM call; lock
releases cleanly and the next extraction proceeds. Tracks #72.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
2026-05-31 20:42:15 +00:00
5abfbd2746 feat(mcp): halacha_corroboration read-only tool (INV-COR6, X11)
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
2026-05-31 19:07:37 +00:00
b57e590275 feat(corroboration): orchestrator + persistence over both citation graphs (X11)
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
2026-05-31 19:04:20 +00:00
33f955e372 feat(corroboration): aggregator — distinct positive + negative-flag (INV-COR4, X11)
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
2026-05-31 19:00:16 +00:00
dbc176ae66 feat(corroboration): halacha matcher + cosine threshold (INV-COR3, X11)
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
2026-05-31 18:57:47 +00:00
09eec6a906 feat(corroboration): treatment classifier + polarity (INV-COR2, X11)
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
2026-05-31 18:54:50 +00:00
ca31932a5f feat(db): V24 — citation treatment column + halacha corroboration link table (X11) 2026-05-31 18:52:16 +00:00
887079535c feat(spec): X11 citation-corroboration + INV-G10 amendment + Opus 4.8 halacha extraction
ספ חדש לשכבת citator פנימית — תיקוף הלכות לפי טיפול-שיפוטי מצטבר (ציטוטים נכנסים),
לצמצום היקף האישור-הידני של היו"ר:

- docs/spec/X11-citation-corroboration.md — 6 invariants (INV-COR1–COR6), כל אחד עם
  ≥3 מקורות מקצועיים (Shepard's/KeyCite, Hellyer LLJ 2018, UNC Law, NCSC/JTC, CEPEJ).
- docs/spec/00-constitution.md — תיקון מבוקר ל-INV-G10: השער מסופק ע"י טיפול-שיפוטי-מצטבר
  לתת-הקבוצה החיובית, שער-היו"ר נשאר חובה לזנב ולשלילי. + X11 באינדקס.
- Opus 4.8 @ xhigh כמודל חילוץ הלכות (config HALACHA_EXTRACT_MODEL/EFFORT, env-tunable;
  claude_session model/effort params; halacha_extractor מחווט). מבוסס A/B 2026-05-31:
  פחות חילוץ-יתר, 100% quote-verified, ביטחון מכויל.
- scripts/ab_halacha_opus48.py — harness A/B לא-הרסני להשוואת מודל/effort בחילוץ הלכות.
- .taskmaster #70 (FU-2c-b) — תיעוד dedup שפר + סריקת-קורפוס (0 stubs תקועים נותרו).

תנאי-קדם (זהות נקייה) הושלם: שפר מוזג לרשומה קנונית + סריקת 128 רשומות.
audit-findings גלויים ב-X11 §7: קישור הלכה↔ציטוט + סיווג-טיפול = greenfield, ל-implementation plan.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
2026-05-31 18:42:13 +00:00
6ff2e36bf9 feat(eval): FU-5 — retrieval eval harness + halacha backlog visibility (#63)
Covers GAP-11 (INV-RET4/G8) and GAP-14 (INV-QA1/G10). Retrieval quality was
never measured (only telemetry observation) and the halacha review backlog was
invisible (the 10/19 gap was found by accident).

Unit B — backlog visibility (pure code, container):
- metrics.halacha_backlog(conn) → {pending_review, approved, rejected, published,
  total, oldest_pending_at}; surfaced in metrics.get_dashboard() (get_metrics MCP
  tool) and /api/system/diagnostics. Live count revealed 178 pending / 1552 total,
  oldest from 2026-05-03 — previously invisible.

Unit A — retrieval eval harness (host-side scripts):
- scripts/eval_gold_bootstrap.py — seeds data/eval/gold-set.jsonl. Two sources:
  citations (cited==relevant via search_relevance_feedback — empty until decisions
  cite precedents) and known_item (query=case_name → relevant=self; a real
  citation-free signal, the methodology #52 checked by hand). Idempotent; preserves
  source='chair' rows.
- scripts/eval_retrieval.py — runs the production retrieval path (search_library /
  search_internal) over the gold-set; computes precision@k, recall@k, MRR, nDCG@k
  (k=5,10); aggregates overall + per-corpus + per-practice_area; writes a report and
  a delta vs committed baseline.json (which records the retrieval_config it reflects).
  --self-test unit-checks the metric math offline.

Gold-set strategy = hybrid (chair decision): bootstrap + chair review. The citation
source is empty today (0 cited precedents in decisions), so the seed is known-item
(77 queries: 54 internal_decisions + 23 precedent_library). The gold-set is
PROVISIONAL until Dafna reviews it (the domain chair-gate).

Baseline (production config: multimodal+rerank on): R@10=0.987, MRR=0.837,
nDCG@10=0.872. Finding: MULTIMODAL_ENABLED=true slightly lowers known-item recall
(image-page results displace exact name matches) — relevant to #15. precedent_library
weaker than internal (R@10 0.957 vs 1.0) — one external precedent unfindable by name.

"CI gate" realized as discipline (re-runnable harness + committed baseline + run
before/after any retrieval-layer change) — retrieval needs prod DB + Voyage, no CI
runner has that access.

Spec: docs/superpowers/specs/2026-05-31-fu5-eval-harness-design.md

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
2026-05-31 14:58:13 +00:00
4d8422198a feat(guard): fitness function blocking raw Paperclip access (GAP-22, FU-8a)
Wakeup-INSERT rule is universal (never allowlisted — hard invariant). Raw-HTTP
rule exempts the sanctioned helpers + standalone operator/admin scripts (a
distinct category per fitness-function scope differentiation + DRY: tooling
needn't reuse the FastAPI wrapper). Repo scanned clean under these rules.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
2026-05-31 11:35:07 +00:00
a66ab3b3cd feat(guard): fitness function blocking raw Paperclip access (GAP-22, FU-8a)
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
2026-05-31 11:16:36 +00:00
aac383acb7 feat(sync): --verify exits non-zero on drift; adapter mismatch = loud drift (GAP-21, FU-8a)
Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
2026-05-31 11:14:44 +00:00
e46868feda feat(fu2b): flag PROC_MISMATCH (case_number prefix vs proceeding_type) for chair
Dry-run surfaced 2 rows with בל"מ prefix but proceeding_type=ערר. Since the
migration strips the prefix, a wrong proceeding_type would silently lose the
בל"מ signal — must be chair-adjudicated, not auto-applied. Chair table now
flags 4 rows: 2 DUP_CHECK (8047-23) + 2 PROC_MISMATCH.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
2026-05-31 08:57:42 +00:00
a41fcedc28 test(fu2b): failing tests for bare-number extraction (FU-2b) 2026-05-31 08:52:48 +00:00
7e35a24d80 test(reindex): cover empty-text raise path (FU-3 review)
Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
2026-05-30 22:13:18 +00:00
8a0c206ecd feat(reindex): precedent_reindex MCP tool (GAP-09, FU-3)
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
2026-05-30 22:09:44 +00:00
f008820ec8 feat(reindex): health-check stale_embedding_case_law count (GAP-09, FU-3)
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
2026-05-30 22:08:27 +00:00
63abf83e76 test(reindex): fix mark_indexed stub arity in FU-1 fixture (FU-3)
Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
2026-05-30 22:07:39 +00:00
c8de42150e test(reindex): stub db.mark_indexed in FU-1/FU-2a ingest fixtures (FU-3 interaction)
Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
2026-05-30 22:07:18 +00:00
c7c7a1e119 feat(reindex): reindex_case_law from stored text + mark_indexed on ingest (GAP-09, FU-3)
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
2026-05-30 22:06:17 +00:00
96ae83081f feat(reindex): V23 content/indexed hashes + helpers + write content_hash (GAP-09, FU-3)
Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
2026-05-30 22:04:43 +00:00
e522555b1a test(reindex): failing tests for content-hash re-index (FU-3) 2026-05-30 22:02:16 +00:00
9bfb912bdf fix(audit): _collect_block_sources mirrors None-doc-types (provenance accuracy, FU-7 review)
Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
2026-05-30 21:40:42 +00:00
677f29ddec feat(audit): blocks_stale drift flag + health-check visibility (GAP-17, FU-7)
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
2026-05-30 21:36:56 +00:00
7e2f4b2872 feat(qa): citation→corpus resolution as non-blocking warning (GAP-20, FU-7)
Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
2026-05-30 21:35:24 +00:00
769f5020eb feat(audit): block→source provenance via write_block audit event (GAP-19, FU-7)
Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
2026-05-30 21:33:36 +00:00
1f483383b9 feat(audit): log document_upload/extract_claims/export_docx (GAP-18, FU-7)
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
2026-05-30 21:31:09 +00:00
a121f79d6a feat(audit): log_action_safe + V22 blocks_stale + citation resolver (FU-7)
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
2026-05-30 21:29:26 +00:00
bffd2ec701 test(audit): failing tests for audit-trail + provenance (FU-7) 2026-05-30 21:27:54 +00:00
5d3c340243 test(ingest): stub recompute_searchable in FU-1 fixture (FU-2a interaction)
Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
2026-05-30 20:59:11 +00:00
358d82e90e feat(retrieval): require practice_area only for internal/cases; enable searchable filter + health visibility (GAP-13, FU-2a)
Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
2026-05-30 20:57:27 +00:00
6dbcb7e798 feat(ingest): recompute searchable on ingest + metadata completion (GAP-13, FU-2a)
Wire db.recompute_searchable into the ingest pipeline (after statuses are set) and into
extract_and_apply (after fields are persisted to DB, success path only).

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
2026-05-30 20:47:51 +00:00
4b8bbc3794 feat(data-model): V21 searchable flag + recompute_searchable (GAP-13, FU-2a)
Add SCHEMA_V21_SQL (searchable boolean column + index on case_law), wire it
into _run_schema_migrations, and implement _compute_searchable (pure predicate)
+ recompute_searchable (idempotent async backfill/update). All 5 unit tests pass.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
2026-05-30 20:46:29 +00:00
cd0f6cda0a feat(ingest): atomic ON CONFLICT upsert in create_*_case_law (GAP-03, FU-2a)
Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
2026-05-30 20:44:31 +00:00
2b91173f25 feat(ingest): write-time canonical case_number normalization (GAP-06, FU-2a)
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
2026-05-30 20:42:47 +00:00
bcd226ac1a test(ingest): failing tests for idempotent ingest + searchable (FU-2a)
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
2026-05-30 20:41:34 +00:00
3c431403f6 refactor(ingest): drop obsolete queue_halachot flag + dead imports (FU-1 review)
pipeline always queues both extraction kinds (INV-ING3); remove the
now-meaningless queue_halachot param from ingest_internal_decision and
migrate_from_style_corpus. Also trim chunker/extractor/rerank from the
precedent_library module-top import (chunking/extraction moved to ingest.py).

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
2026-05-30 19:26:53 +00:00
5104db8f4e refactor(ingest): ingest_internal_decision delegates to canonical pipeline; queue metadata too (GAP-02, FU-1)
Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
2026-05-30 19:19:10 +00:00
d7eb1b2824 refactor(ingest): ingest_precedent delegates to canonical pipeline (FU-1)
Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
2026-05-30 19:16:29 +00:00
be4f7bbe99 feat(ingest): canonical ingest_document pipeline (FU-1) 2026-05-30 19:13:15 +00:00
d4663eba8f feat(ingest): IntakeSpec + shared helpers for canonical pipeline (FU-1)
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
2026-05-30 19:11:27 +00:00
9ae2d47d03 test(ingest): failing tests for unified pipeline (FU-1)
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
2026-05-30 19:09:37 +00:00
0c8d415044 fix(retrieval): scope search_decisions by domain — derive from case, block only on undeterminable case (GAP-12, INV-RET1)
Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
2026-05-30 18:23:41 +00:00
084b31cd9b fix(qa): enforce critical-QA gate on export + fix neutral_background critical-but-passed (GAP-15/16, INV-QA3/EX3)
Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
2026-05-30 17:58:50 +00:00