Commit Graph

46 Commits

Author SHA1 Message Date
f5d14fd6b8 chore(#80): backfill 8070-25 -> appraisal multimodal coverage 12/12; close #80
Full check found the premise wrong on every count (like #71/#70):
- Not 140 docs/17,700 pages/2hr/$$ needing Dafna+chaim. Of 140 image-less
  docs, only 65 are PDF (rest MD/DOCX — pipeline renders PDF only) = 704 pages.
- The value docs (appraisal, where multimodal's table/image worth is) were
  already 8/12 embedded. The only gap was ONE case, 8070-25 (4 appraisal docs).
- Backfilled 8070-25 locally (voyage-multimodal-3, ~30s, cents): all 14 docs
  embedded. Appraisal coverage now 12/12 (100%).
- Remaining 51 PDFs/649 pages are all text-dense (reference/response/appeal);
  #15 proved multimodal does NOT help text-dense docs, so they're intentionally
  left text-only. Not an inconsistency — the correct config.

No gold-set / Dafna labeling / chaim cost approval needed — cost was cents and
value was already proven in #15. #80 done (technical, not human-gated).

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
2026-06-03 09:46:23 +00:00
7d0d4a9b27 chore(#70): delete 15 orphaned cited_only stubs + close #70
The 4 'ambiguous' citation items flagged for chair turned out to be dead
orphan stubs: 0 inbound/outbound edges across all 5 citation mechanisms,
0 full_text, 0 halachot, 0 chunks/embeddings. A corpus-wide check found 15
such orphans total (incl. clean-looking ones). Per OpenCitations (keep an
id-less entity only if it is CITED — these are cited by nothing), these are
pure noise → deleted, not chair-judgment.

- 15 orphan cited_only stubs deleted (cited_only 46 -> 31); backup in
  data/audit/fu2b-orphan-stub-cleanup-*.json.
- 0 malformed / 0 orphans remain; all 31 remaining stubs are cited.
- Combines with the 3 earlier mechanical normalizations. #70 fully done.
- Known forward-edge (no current data, no task): '+' combined-citation
  handling in citation_extractor if it recurs in future extraction.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
2026-06-03 09:38:30 +00:00
2a9168a1b4 chore(tasks): research-backed decisions to close open tasks (#71/#42/#14/#76/#70)
Per chaim's directive — for decisions not requiring Dafna/chaim, decide after
>=3 authoritative open sources.

#71 DONE — resolved by #15's weight fix (measured: all multi-relevant docs now
  in top-10, the rank-15/16 weak queries fixed). Research (6 sources) said
  enable rerank; tested empirically → it HURT (nDCG@5 0.879 vs 0.960, MRR 0.867
  vs 0.954) because recall is saturated and the cross-encoder demotes exact
  known-item matches. Measurement overrides theory: no rerank, no limit change.
#42 CANCELLED — obviated by BM25 hybrid (already on; handles abbreviation
  tokens lexically); 0 abbrev queries in eval, recall ~0.99, no measured gap.
#14 DEFERRED (reviewed) — no current blocker; YAGNI; trigger documented.
#76 CANCELLED — upstream Paperclip bug (ee=companyId), not safely fixable our
  side; workaround + #78 documented.
#70 — research-backed normalization (ECLI/Akoma Ntoso/ELI/OpenCitations +
  Christen). Applied 3 deterministic mechanical fixes to cited_only (whitespace
  + missing prefix-space); 0 malformed remain. 4 ambiguous items (2 garbled,
  'ערר אדלר', 1 combined citation) flagged for chair — NOT auto-guessed, per
  the entity-resolution false-merge guardrail.

#80 stays pending — human-gated (Dafna value-labeling + chaim cost).

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
2026-06-03 09:09:30 +00:00
4debe9995b chore(#15): adopt MULTIMODAL_TEXT_WEIGHT=0.65 + close #15, open #80
A/B eval (eval_retrieval.py, 86-query gold-set) showed the 0.5 default was
mis-tuned: the image side was too heavy and dragged precedent_library recall
0.971 -> 0.885. Sweep 0.5..0.75 — at 0.65 multimodal beats text-only on every
overall metric AND every corpus (R@5 0.994 vs 0.989, nDCG@5 0.960 vs 0.944,
MRR 0.954 vs 0.936). Dafna approved.

- MULTIMODAL_TEXT_WEIGHT=0.65 set in Coolify (legal-ai, runtime) + redeploy.
- baseline.json updated to the 0.65 config (future regression reference).
- #15 done (premise was stale — multimodal already default on 110 docs; the
  win was tuning the weight, not the backfill).
- #80 opened: the costly 140-doc legacy backfill is deferred until a targeted
  image-answer gold-set proves the table/image value prop (untested here).

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
2026-06-03 08:45:06 +00:00
d4046c2fbd chore(tasks): #79#55 follow-up for isolated section-heading chunks
Discovered closing #57: the current chunker still emits 4 tiny chunks that
are standalone section headings ('דיון', 'טענות המשיבים', ...). Low priority
— filtered at query time, search unaffected. Proposed fix: anchor a short
isolated heading forward into the following section.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
2026-06-03 07:58:54 +00:00
76fae77393 chore(tasks): #77+#78 done; #76 deferred with root-cause diagnosis
#78 (committee-upload wakeup) + #77 (case_number identity) shipped.
#76 (Paperclip create-task button): root-caused to ee=companyId guard —
button enabled on title only but submit requires a company; not safely
patchable via injection. Deferred with workaround + upstream note.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
2026-06-02 12:27:45 +00:00
7e34c53224 chore(tasks): add #76-78 — Paperclip create-task button + 2 precedent-upload bugs
#76 צור-משימה button (enabled but submit no-ops), #77 committee-upload
field mapping (citation→case_number, case_number uneditable), #78 silent
extraction wakeup failure. Discovered while debugging precedent 8027-25.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
2026-06-02 11:57:24 +00:00
e79f74bc23 docs(X11): wire corroboration tools into CEO halacha flow + guide (X11 Phase 2)
- CEO: run corroboration_rebuild after precedent_process_pending(halacha);
  report {approved, demoted}; tools added to allowlist
- researcher: halacha_corroboration (read) in allowlist
- TaskMaster #75 → done

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
2026-06-01 04:52:02 +00:00
df007784c9 feat(corroboration): approval_action decision fn + kill-switch (INV-COR2/COR4, X11 Phase 2)
- HALACHA_CORROBORATION_AUTO_APPROVE config (default ON, Dafna validated 2026-06-01)
- approval_action(agg, has_overruled): overruled→demote, corroborated→approve, else None
- 4 offline unit tests; Phase 2 plan + TaskMaster #75

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
2026-06-01 04:34:23 +00:00
887079535c feat(spec): X11 citation-corroboration + INV-G10 amendment + Opus 4.8 halacha extraction
ספ חדש לשכבת citator פנימית — תיקוף הלכות לפי טיפול-שיפוטי מצטבר (ציטוטים נכנסים),
לצמצום היקף האישור-הידני של היו"ר:

- docs/spec/X11-citation-corroboration.md — 6 invariants (INV-COR1–COR6), כל אחד עם
  ≥3 מקורות מקצועיים (Shepard's/KeyCite, Hellyer LLJ 2018, UNC Law, NCSC/JTC, CEPEJ).
- docs/spec/00-constitution.md — תיקון מבוקר ל-INV-G10: השער מסופק ע"י טיפול-שיפוטי-מצטבר
  לתת-הקבוצה החיובית, שער-היו"ר נשאר חובה לזנב ולשלילי. + X11 באינדקס.
- Opus 4.8 @ xhigh כמודל חילוץ הלכות (config HALACHA_EXTRACT_MODEL/EFFORT, env-tunable;
  claude_session model/effort params; halacha_extractor מחווט). מבוסס A/B 2026-05-31:
  פחות חילוץ-יתר, 100% quote-verified, ביטחון מכויל.
- scripts/ab_halacha_opus48.py — harness A/B לא-הרסני להשוואת מודל/effort בחילוץ הלכות.
- .taskmaster #70 (FU-2c-b) — תיעוד dedup שפר + סריקת-קורפוס (0 stubs תקועים נותרו).

תנאי-קדם (זהות נקייה) הושלם: שפר מוזג לרשומה קנונית + סריקת 128 רשומות.
audit-findings גלויים ב-X11 §7: קישור הלכה↔ציטוט + סיווג-טיפול = greenfield, ל-implementation plan.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
2026-05-31 18:42:13 +00:00
1cc7c0e757 chore(tasks): #71 — FU-5 follow-up, multi-precedent recall depth tuning
Diagnosis from the FU-5 eval: co-relevant precedents for broad legal questions
rank 15-16 (retrieved, not absent — recall ~1.0 by rank 20). Tracked as a
deliberate, harness-measured tuning task rather than an unmeasured global limit
change (which affects UI + writer agents + token cost).

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
2026-05-31 16:05:53 +00:00
a02a606f34 feat(agents): wire spec into agents — INV-AG1 read-before-act gate (FU-8b/GAP-23)
חיווט ספ-המערכת לסוכני-Paperclip כך שכל סוכן חייב לקרוא את 00-constitution
תחילה, ואז את ספ-התחום הרלוונטי לתפקידו (לפי טבלת X4 §2) — לפני עבודה מהותית.

- HEARTBEAT.md: סעיף עליון "קריאת-ספ — קודם החוקה (00), אז ספ-התחום" לפני §0–§8,
  עם טבלת תפקיד→ספ ל-8 הסוכנים.
- 8 קבצי-סוכן (ceo/proofreader/researcher/analyst/writer/qa/exporter/hermes):
  סעיף "קרא לפני פעולה (INV-AG1)" בראש הגוף.
- X4-agents.md: שדה "אכיפה" של INV-AG1 → "מחוּוט (פרוצדורלי)"; §5 → "בוצע".

אכיפה פרוצדורלית בכוונה — invariant פרויקטלי-תפעולי, אין שער-קוד שמכריח קריאה.
prereq לסוכני-התהליך (תת-פרויקט 5). gap-audit נשמר כ-snapshot (כמו FU-8a).

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
2026-05-31 16:02:04 +00:00
6ff2e36bf9 feat(eval): FU-5 — retrieval eval harness + halacha backlog visibility (#63)
Covers GAP-11 (INV-RET4/G8) and GAP-14 (INV-QA1/G10). Retrieval quality was
never measured (only telemetry observation) and the halacha review backlog was
invisible (the 10/19 gap was found by accident).

Unit B — backlog visibility (pure code, container):
- metrics.halacha_backlog(conn) → {pending_review, approved, rejected, published,
  total, oldest_pending_at}; surfaced in metrics.get_dashboard() (get_metrics MCP
  tool) and /api/system/diagnostics. Live count revealed 178 pending / 1552 total,
  oldest from 2026-05-03 — previously invisible.

Unit A — retrieval eval harness (host-side scripts):
- scripts/eval_gold_bootstrap.py — seeds data/eval/gold-set.jsonl. Two sources:
  citations (cited==relevant via search_relevance_feedback — empty until decisions
  cite precedents) and known_item (query=case_name → relevant=self; a real
  citation-free signal, the methodology #52 checked by hand). Idempotent; preserves
  source='chair' rows.
- scripts/eval_retrieval.py — runs the production retrieval path (search_library /
  search_internal) over the gold-set; computes precision@k, recall@k, MRR, nDCG@k
  (k=5,10); aggregates overall + per-corpus + per-practice_area; writes a report and
  a delta vs committed baseline.json (which records the retrieval_config it reflects).
  --self-test unit-checks the metric math offline.

Gold-set strategy = hybrid (chair decision): bootstrap + chair review. The citation
source is empty today (0 cited precedents in decisions), so the seed is known-item
(77 queries: 54 internal_decisions + 23 precedent_library). The gold-set is
PROVISIONAL until Dafna reviews it (the domain chair-gate).

Baseline (production config: multimodal+rerank on): R@10=0.987, MRR=0.837,
nDCG@10=0.872. Finding: MULTIMODAL_ENABLED=true slightly lowers known-item recall
(image-page results displace exact name matches) — relevant to #15. precedent_library
weaker than internal (R@10 0.957 vs 1.0) — one external precedent unfindable by name.

"CI gate" realized as discipline (re-runnable harness + committed baseline + run
before/after any retrieval-layer change) — retrieval needs prod DB + Voyage, no CI
runner has that access.

Spec: docs/superpowers/specs/2026-05-31-fu5-eval-harness-design.md

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
2026-05-31 14:58:13 +00:00
4fce9d503f feat(migration): FU-2c — reconcile external case_law identifiers (GAP-08, #68)
External court precedents stored the full citation (designator + docket +
parties + Nevo date) inside case_number, violating INV-ID2/G1 (citation as
identifier). Chair decision 2026-05-31 (Option A): canonical external
case_number = proceeding-designator + docket, '/' preserved (court
convention, not X1's '/'→'-'); parties/court/date → citation_formatted.

scripts/fu2c_reconcile_external_case_numbers.py — deterministic dry-run →
chair-review → apply, mirroring FU-2b:
- extracts designator+docket; flags split into BLOCKING (MISMATCH /
  CIT_NO_DOCKET / DESIG_MISMATCH / DUP_CHECK / NO_DOCKET) vs ADVISORY
  (NO_CITATION — case_number fix still deterministic, missing citation is a
  separate gap), so advisory rows apply while uncertain identity does not.
- --overrides CSV (id,proposed_canonical,citation_formatted,reason) for
  audited chair adjudication of blocking rows.
- apply scoped to source_kind='external_upload' (task target) while keeping
  cited_only/nevo_seed in the reconciliation VIEW so DUP_CHECK spans the full
  external unique space; pre-flight collision guard before every UPDATE.

Applied to production 2026-05-31: 21 case_number normalized + 3
citation_formatted reconciled (D = consolidated Supreme Court judgment
לויתן/קלמנוביץ → lead docket 25226-04-25; 2×C empty citations composed from
metadata). אהוד שפר עע"מ 317/10 deferred — cross-source duplicate with an
existing cited_only reference (collision guard held; → #70). 49 cited_only
records out of scope → new task #70 (committee-form NNNN-NN dockets the
extractor misses, dedup, unresolvable "ערר אדלר"). Extraction + gating
verified offline on all 24 records.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
2026-05-31 14:12:45 +00:00
e8431a2adf docs(spec): FU-8a process→code guards design (GAP-21/22) + split GAP-23 to #69
GAP-21: sync_agents --verify exits non-zero on drift; adapter_type mismatch
counted as drift (loud), not silent skip — makes it an enforceable gate (INV-MC1).
GAP-22: fitness-function pytest guarding against raw Paperclip HTTP + direct
agent_wakeup_requests INSERT (INV-INT1/INT3). Repo pre-scanned: 0 existing
violations → clean forward-fence. Verified vs 3+ sources (architectural fitness
functions; drift-verify non-zero exit). GAP-23 (spec→agents) split to #69.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
2026-05-31 10:48:15 +00:00
105d9626ca docs(spec): FU-2b internal identifier reconciliation design (GAP-07/08) + split external to #68
Deterministic migration of ~52 internal_committee rows whose case_number holds
a full citation → normalized bare number (citation_formatted already correct).
DB analysis (2026-05-31): clean 1-token extraction, 0 key-collisions, 0
citation↔case_number mismatches, no month-padding dups. Chair-gated reversible
migration (backup→dry-run→approve→apply). One edge for chair: 8047/23 ערר vs בל"מ.
External (#68/FU-2c) split out — its citation_formatted is inconsistent.
Verified all 11 case_law FKs use id(UUID), not case_number → rename is FK-safe.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
2026-05-31 06:12:43 +00:00
7341ee8275 tasks(legal-ai): mark FU-3 (#61) done; 61.1 done, 61.2 cancelled (not-applicable)
Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
2026-05-30 22:10:27 +00:00
a62116a571 docs(spec): FU-3 re-index on content change design (GAP-09) + close #61.2 not-applicable
content_hash/indexed_hash change detection + reindex_case_law from stored
full_text (no re-OCR) + drift health-check. Verified vs 3+ sources (content-
hash change detection, RAG re-embed-on-edit). #61.2 multimodal backfill closed:
42 rows are text-ingested (document_id NULL, no source PDF) — page-images
impossible without a PDF to render.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
2026-05-30 21:52:40 +00:00
d28f7b8398 tasks(legal-ai): mark FU-7 (#65) done; FU-2a (#60) done
Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
2026-05-30 21:37:46 +00:00
a8b780765d docs(spec): FU-2a idempotent-ingest design + split FU-2b migration to #67
FU-2 split (chair decision 2026-05-30): FU-2a = pure-code (GAP-03 ON CONFLICT
upsert, GAP-06 write-time type-aware normalization, GAP-13 materialized
searchable flag); FU-2b (#67) = data-migration for GAP-07/08 (identifier
reconciliation + dedup) deferred as separate chair-involved task.

DB check 2026-05-30: ~52/56 internal_committee rows hold full citation in
case_number, >=1 duplicate (8047-23). Architecture verified vs 3+ sources
(PostgreSQL ON CONFLICT, DDD write-boundary normalization, materialized
validity flag). No identifier migration in FU-2a.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
2026-05-30 19:56:07 +00:00
90728ccb3e docs(spec): FU-1 documented drift notes + mark TaskMaster #59 done
Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
2026-05-30 19:28:04 +00:00
357a5238c4 docs(spec): FU-1 unified-ingest design + FU-3 backfill task (#61.2)
Design for unifying the two parallel ingest paths (ingest_precedent /
ingest_internal_decision) into one canonical pipeline parameterized by an
IntakeSpec config object — Template Method skeleton + Strategy injection.
Closes the GAP-02 root cause (missing metadata queue on internal path) by
making a skipped step structurally impossible.

Architecture choice verified against 3+ authoritative sources (refactoring.guru
Template-Method/Replace-Conditional, Fowler FlagArgument, Strategy pattern).
DB check (2026-05-30): no migration needed — 0/56 internal rows lack metadata,
0 invalid enums; multimodal backfill (42 rows) tracked as TaskMaster #61.2 / FU-3.

Covers GAP-01/02/04/05 · provides INV-ING1/ING3/G2/G4 · TaskMaster #59.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
2026-05-30 19:00:30 +00:00
df437c2462 tasks(legal-ai): mark FU-4 (62) + FU-6 (64) + subtasks done (merged+deployed)
All checks were successful
Build & Deploy / build-and-deploy (push) Successful in 1m34s
Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
2026-05-30 18:30:26 +00:00
80d1c5ff27 tasks(legal-ai): reconcile #56 (cancel→superseded by 62.1) + #57 (link to FU-3)
Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
2026-05-30 17:43:12 +00:00
d72d5429ed tasks(legal-ai): 8 fix-unit tasks (59-66) + 23 GAP subtasks from gap-audit
Granularity (epic-per-fix-unit + subtask-per-gap) and dependency-aware/WSJF
prioritization both backed by ≥3 authoritative sources (SAFe/Pichler/OWASP/CVSS;
Wake-INVEST/Cohn/Agile-Alliance/Atlassian/SAFe).

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
2026-05-30 17:38:31 +00:00
7826ff4910 fix(cases): tolerant case_number lookup so agents see case documents
All checks were successful
Build & Deploy / build-and-deploy (push) Successful in 1m39s
Reported: an agent claimed the case had no documents because document_list
returned empty — but the documents exist. Root cause: get_case_by_number did
an exact `WHERE case_number = $1`, so any formatting variant of the number
silently failed to resolve. Verified on 8137-24 (9 docs): "8137/24",
"ערר 8137-24", leading/trailing space, and "בל\"מ 8126/03/25" all returned
"תיק לא נמצא", which the agent read as "no documents" and went blind.

Add _normalize_case_number (strip leading proceeding-type prefix to the first
digit, trim, unify '/'→'-') and a normalized fallback in the lookup query
(exact match preferred via ORDER BY). One fix covers every case_number-scoped
tool (document_list, extract_references, search_case_documents, get_claims,
drafting, ...). Bogus numbers still correctly resolve to "not found". (#58)

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
2026-05-30 11:54:52 +00:00
58ab003206 fix(retrieval): make decisions findable by name + unhide committee uploads
All checks were successful
Build & Deploy / build-and-deploy (push) Successful in 3m57s
Root cause of "agent can't find the Agasi decision in the corpus" (CMPA-55):
the decision was fully ingested, but the retrieval layer failed on the
realistic agent query — searching by case name.

- RC-A (#52): lexical tsvector covered only chunk content + halacha text,
  so a bare-name query ("אגסי") matched decisions that *cite* the case, not
  the case itself. Add meta_tsv on case_law(case_name, case_number) (SCHEMA
  V20) and OR it into the lexical halacha/chunk SQL with a match boost, so a
  name/number hit surfaces the case's own rows. Agasi: rank 4 → rank 1.
- RC-B (#53): precedent_library_list hard-defaulted source_kind=external_upload
  and never exposed the param, hiding uploaded ערר/בל"מ (internal_committee)
  decisions. Thread source_kind through service → tool → MCP tool (supports
  'internal_committee' / 'all_committees').
- #54: agent instructions (researcher/analyst/writer) — search-by-name
  protocol: add content/case-number, search both corpora, use all_committees
  before declaring "not in corpus".
- #55: chunker produced tiny fragment chunks ("דיון", "החלטה") from header
  keywords matched mid-sentence. Anchor SECTION_PATTERNS to line start +
  merge sub-min sections; exclude <50-char fragments at query time (484
  existing fragments hidden; full re-chunk tracked as #57).

Tests: scripts/test_retrieval_by_name.py (name ranks case above citer +
substantive regressions); chunker unit checks (0 tiny chunks). New findings
filed as tasks #56 (halacha source_kind leak) and #57 (re-chunk migration).

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
2026-05-30 11:26:19 +00:00
1d4f214abe chore(taskmaster): mark #26 + #27 done (Paperclip SDK upgrade + host already on 525)
All checks were successful
Build & Deploy / build-and-deploy (push) Successful in 7s
2026-05-26 12:19:16 +00:00
1645653ba9 chore(taskmaster): mark Stage A+B + #30/31/35/36/37 as done
All checks were successful
Build & Deploy / build-and-deploy (push) Successful in 26s
37/51 tasks done after the parallel sub-agent sprint:
- #30 closed (9/9 subtasks)
- #31 closed (3/3)
- #35 closed (6/6) — missing_precedents feature
- #36 closed (5/5) — legal_arguments aggregation
- #37 closed (5/5) — בל"מ subtypes
- #38, #39, #40, #41, #43, #44, #45, #46 done

Deferred: #42 (Haiku query expansion).
Pending: Stage C #47-51 + 3 UI smaller items (#32-34).

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
2026-05-26 08:36:02 +00:00
702c01d678 chore(tasks): mark Task #29 done — Agents tab deployed to prod
All checks were successful
Build & Deploy / build-and-deploy (push) Successful in 36s
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-05-04 17:29:30 +00:00
cf5f6fe274 feat(paperclip): close 11 integration gaps (#16-#28)
Brings the legal-ai ↔ Paperclip integration in line with the official
Paperclip skill. Net effect: HEARTBEAT.md -47% (370→195 lines), all 14
agents on uniform runtime_config + budget + instructionsBundleMode, and
two cross-company helpers replacing manual SQL.

Highlights:
- HEARTBEAT.md refactor: project-specific only, delegates to the official
  paperclipai/paperclip skill (loaded per agent). Adds heartbeat-context
  fast-path (§1.7) and PAPERCLIP_WAKE_PAYLOAD_JSON shortcut (§1.5).
- Issue Thread Interactions API: legal-ceo.md now uses
  ask_user_questions / request_confirmation / suggest_tasks instead of
  free-text comments — gives chair structured UI with idempotency keys.
- pc.sh + paperclip_api.pc_request: every API call goes through helpers
  that inject Authorization + X-Paperclip-Run-Id (audit trail).
- sync_agents_across_companies.py: master(CMP)→mirror(CMPA) sync via
  Paperclip API, idempotent, with --verify and --apply modes.
- skills/new-company-setup: 11-step blueprint distilling all 11 gaps
  into a single onboarding runbook for the next company.
- .taskmaster: 12 tasks covering each gap (one already closed: #29).

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-05-04 17:25:45 +00:00
73a79ea7e8 feat(precedents): metadata auto-fill, edit sheet, persuasive extraction
All checks were successful
Build & Deploy / build-and-deploy (push) Successful in 1m28s
Three improvements to the precedent library based on usage feedback:

1. Auto-fill metadata at upload time. New service
   precedent_metadata_extractor reads the ruling's full_text and
   suggests case_name (short), summary, headnote, key_quote,
   subject_tags, appeal_subtype. The merge policy fills only empty
   fields, preserving everything the chair typed in the upload form.
   Wired into the ingest pipeline; also exposed as a re-run endpoint
   POST /api/precedent-library/{id}/extract-metadata for existing
   records.

2. Edit sheet in the UI. Pencil icon on each library row opens a
   pre-populated form covering every field. A Sparkles button on the
   sheet runs the metadata extractor on demand and refreshes the
   form. The case_number is read-only because halachot are FK'd to
   it; renaming requires delete + re-upload.

3. Halacha extractor branches on is_binding. Sources marked binding
   (Supreme/Administrative) keep the strict halacha prompt. Non-binding
   sources (other appeals committees, district courts on planning
   matters) get a different prompt that extracts applications,
   interpretive principles, and persuasive conclusions — labeled with
   new rule_types 'application' and 'persuasive'. The fallback also
   widens chunk selection: if the chunker labeled nothing as
   legal_analysis/ruling/conclusion, we now run on all chunks rather
   than returning zero halachot for a usable ruling.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-05-03 10:19:35 +00:00
7ee90dce31 feat: external precedent library with auto halacha extraction
All checks were successful
Build & Deploy / build-and-deploy (push) Successful in 1m27s
Adds a third corpus of legal authority distinct from style_corpus
(Daphna's prior decisions for voice) and case_precedents (chair-attached
quotes per case). The new corpus holds chair-uploaded court rulings and
other appeals committee decisions, with binding rules (הלכות) extracted
automatically and queued for chair approval.

Pipeline (web/app.py + services/precedent_library.py):
file → extract → chunk → Voyage embed → halacha_extractor → store +
publish progress over the existing Redis SSE channel.

Schema V7 (services/db.py): extends case_law with source_kind +
extraction status fields under a CHECK constraint pinning practice_area
to the three appeals committee domains (rishuy_uvniya, betterment_levy,
compensation_197). New precedent_chunks (vector(1024)) and halachot
tables (vector(1024) over rule_statement, IVFFlat indexes, gin on
practice_areas/subject_tags). Halachot start as pending_review; only
approved/published rows are visible to search_precedent_library.

Agents: legal-writer, legal-researcher, legal-analyst, legal-ceo,
legal-qa get search_precedent_library. legal-writer prompt explains
the three-corpus distinction and CREAC use; legal-qa now verifies that
every cited halacha resolves to an approved row in the corpus.

UI: /precedents page with four tabs — library / semantic search /
pending review (J/K nav, A/R/E shortcuts, badge count) / stats.
Reuses the existing upload-sheet progress + SSE pattern.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-05-03 08:38:18 +00:00
62a67e3f31 Add status icons, descriptions, status guide, manual status changer, and merge action buttons into overview tab
- StatusBadge: added icons (lucide-react) and Hebrew descriptions for all 13 statuses
- WorkflowTimeline: added phase icons and current-status description display
- StatusGuide: new collapsible component showing all statuses grouped by phase with explanations
- StatusChanger: new dropdown for manual status override on the case detail sidebar
- Case detail page: merged action buttons into overview tab, removed separate actions tab

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
2026-04-13 15:19:28 +00:00
e34d217345 Close first threshold claim by default on compose screen
Drop the defaultOpen={i===0} on the first threshold_claim — when a
case has a lot of material already on screen (research background
+ chair positions + now precedents), auto-opening the first card
creates a wall of text on page load. All cards now start collapsed,
same as the issues list already did.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
2026-04-11 19:27:28 +00:00
6b8f002596 Precedent attachment UI in the compose screen
Surface the new POST/GET/DELETE /api/cases/{n}/precedents endpoints
in the compose screen as two insertion points:

1. A new case-level card "פסיקה כללית לדיון" at the top of the
   main column, for precedents that support the discussion intro
   rather than a specific threshold_claim / issue.
2. An inline "פסיקה תומכת" section inside each SubsectionCard,
   below the ChairEditor.

Both insertion points render a `<PrecedentsSection>` which shows a
list of `<PrecedentCard>` (citation + blockquote + optional chair
note + 📄 chip if a PDF was archived) followed by a `<PrecedentAttacher>`
popover trigger.

The Attacher is a Popover with cross-case typeahead: typing 2+
characters into the citation field hits /api/precedents/search and
shows distinct library matches; picking one prefills quote + chair
note but leaves them editable so customizing the quote for this
case doesn't mutate the library. An optional PDF/DOCX/DOC file can
be attached — it uploads first via POST .../upload-pdf and the
returned document_id is passed into the precedent create call.

The parent compose page issues a single useCasePrecedents query
and partitions the result by section_id into a Map so each
SubsectionCard renders its own slice without re-fetching.

shadcn Popover installed as a new primitive. sonner toasts wired
for success/error in both attach and delete flows.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
2026-04-11 19:20:45 +00:00
916360e9b2 Fix case detail: document fields, expected-outcome label, drop debug note
Three user-reported bugs on /cases/[caseNumber]:

1. Documents tab showed "9 מסמכים" in the count but rendered nothing —
   DocumentsPanel was reading filename/category/status/size_bytes/uploaded_at,
   but the real FastAPI payload (case_get → db.list_documents) returns
   title/doc_type/extraction_status/page_count/created_at. Rewrote the
   panel against the actual document row shape, added a CaseDocument
   type alias in lib/api/cases.ts, mapped doc_type to Hebrew labels
   (כתב ערר / כתב תשובה / ...) and extraction_status likewise.

2. The "פעולות" tab showed a debug-flavoured paragraph "עריכת פרטי התיק
   נשמרת מיד דרך PUT /api/cases/1033-25" — that was internal wording,
   not user copy. Removed.

3. Overview tab showed the raw enum value "full_acceptance" in the
   expected-outcome line. Mapped through the existing expectedOutcomes
   label array so it now reads "קבלה מלאה".

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
2026-04-11 18:00:24 +00:00
cbe9d60901 Phase 6: polish — error boundaries, a11y, smoke test doc
Close out the read-only surface before cutover with three families of
small fixes that the previous phases left unfinished:

- Error boundaries: add src/app/error.tsx (route-segment), global-error.tsx
  (root crash fallback with its own minimal html/body — no Providers
  dependency since those may be the thing that crashed), and not-found.tsx
  for a Hebrew 404 instead of the default Next page.

- Accessibility: wire usePathname() into AppShell so the current nav item
  gets aria-current="page" and a gold underline. Add aria-label + aria-hidden
  on the icon-only buttons that Phase 5 left text-less (corpus trash,
  parties-field Plus). Nav gets an aria-label of its own.

- Metadata template: title on each route now reads "X · עוזר משפטי" via
  the layout.tsx title.template. Description localized to Jerusalem.

- README: full E2E smoke test checklist covering all 9 screens, plus a
  backend contract table so future phases know which hook wraps which
  endpoint. Documents the known Gitea→Coolify webhook issue.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
2026-04-11 17:43:59 +00:00
fb1f73fa25 Phase 5: secondary screens (diagnostics, skills, training)
Port the remaining read views from the vanilla UI to Next.js:

- /diagnostics — system health snapshot (DB connected, table counts,
  active tasks, failed and stuck documents). Uses the existing
  /api/system/diagnostics payload with a 10s refetchInterval so the
  page self-updates while the user watches.
- /skills — Paperclip skill inventory with sync status (DB-only,
  disk-only, synced, not-synced) as a card grid driven by
  /api/admin/skills.
- /training — Dafna's style portrait as three tabs on one page:
  * Report: corpus KPIs + CSS conic-gradient subject donut
    (SubjectDonut ported from index.html renderHero) + horizontal
    anatomy bars + top-12 signature phrases.
  * Corpus: TanStack Table of style_corpus rows with an inline
    delete mutation (useDeleteCorpusEntry invalidates both the
    corpus list and the style report so KPIs update).
  * Compare: two-decision selector backed by /api/training/compare,
    side-by-side panels plus shared / only-A / only-B pattern
    lists.

New API modules: lib/api/system.ts, lib/api/skills.ts,
lib/api/training.ts. All three use TanStack Query with staleTime
profiles tuned per endpoint (10s for diagnostics, 30s for skills,
60s for training reports).

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
2026-04-11 17:33:33 +00:00
ac0a5ee30b Phase 4.5: practice area integration in the Next.js UI
Backend commit 26d09d6 introduced a multi-tenant axis (practice_area +
appeal_subtype) that the vanilla UI picked up but the new Next.js
rewrite did not. Close the gap in the screens we already shipped so
future search/filter work in Phase 5 has the right data on screen.

- lib/practice-area.ts — new: enum + label maps + deriveSubtype(),
  mirrors mcp-server/src/legal_mcp/services/practice_area.py.
- lib/schemas/case.ts — two new z.enum fields on caseCreateSchema.
- lib/api/cases.ts — Case / CaseDetail gain practice_area and
  appeal_subtype as optional (cached pre-migration responses still
  parse cleanly).
- wizard/case-wizard.tsx — basics step gains a practice_area dropdown
  (future domains disabled with "(בקרוב)") and an appeal_subtype
  dropdown with auto-fill effect tracking a userTouchedSubtype ref,
  same behaviour as wireSubtypeAutofill() in the vanilla UI.
- cases/case-header.tsx — gold badge next to the status shows
  "ועדת ערר · רישוי ובנייה" when both fields are populated.
- cases/cases-table.tsx — new "תחום" column showing subtype label
  (dash for unknown). No filter yet — that's phase 5 when a second
  domain actually exists.

Plan: ~/.claude/plans/woolly-cooking-graham.md

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
2026-04-11 17:15:48 +00:00
75ea6825b2 Remove committee_type from case wizard and header
The committee_type field in FastAPI's CaseCreateRequest is a leftover
with no meaningful UI. Appeals against a ועדה מקומית / ועדה מחוזית are
legally distinguishable by case-number range, not by a picked committee
label; appeals against a ועדת ערר decision go to בית משפט לעניינים
מנהליים and never enter this system at all. The backend retains its
"ועדה מקומית" default, which is what we'd send anyway.

Drop the Select from the wizard's basics step and the "ועדה" row from
the case detail header. The Case TS type keeps the optional field so
existing API responses still parse cleanly.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
2026-04-11 16:36:50 +00:00
9fcf4f2dc7 Phase 4a: shadcn form primitives + case inline edit
Add dialog/select/textarea/label/progress/sonner components and wire
a Toaster into Providers. New zod schemas in lib/schemas/case.ts
mirror CaseCreateRequest/CaseUpdateRequest and feed react-hook-form
validation.

CaseEditDialog on the case detail Actions tab posts PUT /api/cases/{n}
with optimistic cache patching via useUpdateCase, showing toast
feedback on success/error.

shadcn's <Form> registry entry skipped at init (missing from the
nova preset); the dialog uses RHF directly against the same Input/
Textarea/Select primitives.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
2026-04-11 16:21:21 +00:00
51064f3a03 Phase 3a: shadcn init + Home/Case List view
Initialize shadcn/ui (radix-nova preset) and wire its semantic tokens
to the editorial navy/cream/gold palette so primitives inherit the
judicial voice without per-component overrides.

Replace the Phase 2 live-probe with a real dashboard: KPI tiles,
conic-gradient status donut (ported from the vanilla renderHero),
and a TanStack Table cases list with search + sort. Add useCase(n)
hook with 5s staleTime/refetchInterval to replace the old manual
polling loop when Case Detail ships next.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
2026-04-11 16:04:30 +00:00
Chaim
0ee8e723bd Phase 2: API client, typed hooks, live probe
- Add api:types script (openapi-typescript against live FastAPI)
- Generate src/lib/api/types.ts (2972 lines, 55 paths, 16 schemas)
- lib/api/client.ts: typed apiRequest + ApiError + makeQueryClient
  (staleTime 5s, no refetchOnWindowFocus to preserve editor state)
- lib/providers.tsx: QueryClientProvider client component, useState
  singleton so App Router re-renders don't dump the cache
- lib/api/cases.ts: Case type + casesKeys + useCases hook (pragmatic
  hand-typed Case pending backend response-model annotations)
- layout.tsx: wrap children with <Providers>
- Smoke test: CasesLiveProbe component on home page hitting live FastAPI
  via /api/cases rewrite proxy

Phase 2 deliverable check: useCases() returns typed Case[] from the
production FastAPI through the Next.js proxy. End-to-end wiring proven.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
2026-04-11 15:49:24 +00:00
Chaim
64724656af Phase 1: scaffold Next.js 16 web-ui + Coolify staging
- create-next-app with TypeScript, Tailwind v4, App Router
- Port design-system.css tokens into Tailwind @theme (navy/gold/parchment, Heebo)
- Install TanStack Query, react-hook-form, zod, lucide-react, react-dropzone
- layout.tsx: RTL Hebrew + Heebo via next/font/google
- AppShell component with navy header + gold rule + nav
- next.config.ts: output:standalone + rewrites to proxy /api/* to production FastAPI
- Dockerfile: multi-stage Node 20 Alpine build for Next.js standalone
  (branch-local override of the FastAPI Dockerfile; main is unaffected)
- Switch .taskmaster to claude-code provider (no API key required)
- Add 7 phase tasks (83-89) tracking the full rewrite plan

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
2026-04-11 13:47:05 +00:00
d5ccf03e4c Add docs, scripts, skills, commands, and taskmaster config to repo
Includes:
- docs/: architecture, block-schema, migration-plan, product-specification
- scripts/: bidi_table, decompose-decisions, extract-claims, seed-knowledge, etc.
- skill-legal-decision/: SKILL.md + references + block-schema
- skill-legal-assistant/: SKILL.md
- skill-legal-docx/: SKILL.md + references
- .claude/commands/: bidi-table skill
- .taskmaster/: task config + PRDs
- .gitignore: exclude legacy/, kiryat-yearim/, node_modules/, memory/

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
2026-04-04 14:19:17 +00:00