Replaces the CSV-edit workflow with an in-app tagging page so the chair/Dafna
can label the extraction-quality gold-set by clicking, and see validator
precision/recall live.
Schema (V29): halacha_goldset — a stratified, human-tagged evaluation batch
(is_holding / correct_type / quote_complete, NULL until tagged).
db.py:
- goldset_create_sample (stratified round-robin over case×rule_type, idempotent),
- goldset_list (items + halacha content + the machine's own labels),
- goldset_tag (partial — one field at a time for keyboard tagging),
- goldset_score (ports the script's P/R/F1: each validator scored as a
not-a-holding detector against the human tags — the #81.8 input).
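A minimal sketch of the scoring idea in goldset_score, assuming each validator yields a boolean "flagged" per item and the human tag is is_holding (None while untagged). The function name and input shapes are hypothetical; only the P/R/F1 framing (validator as a not-a-holding detector) comes from this commit.

```python
from typing import Optional


def score_validator(flagged: list[bool], is_holding: list[Optional[bool]]) -> dict:
    """Score one validator as a not-a-holding detector against human tags.

    A true positive is an item the validator flagged AND the human tagged
    as not a holding. Untagged items (None) are skipped.
    """
    tp = fp = fn = 0
    for flag, human in zip(flagged, is_holding):
        if human is None:
            continue  # not tagged yet, so no ground truth
        not_a_holding = not human
        if flag and not_a_holding:
            tp += 1
        elif flag:
            fp += 1
        elif not_a_holding:
            fn += 1
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    f1 = (2 * precision * recall / (precision + recall)
          if precision + recall else 0.0)
    return {"tp": tp, "fp": fp, "fn": fn,
            "precision": precision, "recall": recall, "f1": f1}
```

Skipping untagged rows keeps the live score meaningful while the batch is only partly labeled.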
API: GET /api/goldset, POST /api/goldset/sample, GET /api/goldset/score,
PATCH /api/goldset/{id}.
web-ui:
- lib/api/goldset.ts (hooks),
- components/goldset/goldset-panel.tsx — card-per-item, keyboard-first
(J/K nav, H/N holding, C/X quote), progress bar, hide-tagged toggle, and a
collapsible live score table,
- app/goldset/page.tsx + nav link "מדגם-זהב" under ידע ולמידה.
Methodology guard kept explicit in UI + docstrings: tags are HUMAN ground truth,
no AI pre-fill (circular bias). Populated a 150-item stratified batch.
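One way to picture the stratified round-robin sampling is the sketch below. It assumes items are dicts with id, case and rule_type keys, and that idempotency is achieved by skipping ids already in the batch; both are assumptions for illustration, not the actual goldset_create_sample code.

```python
from collections import defaultdict


def stratified_round_robin(items, n, already_sampled=frozenset()):
    """Pick up to n items, taking one per (case, rule_type) stratum per round."""
    strata = defaultdict(list)
    for item in items:
        if item["id"] not in already_sampled:  # re-runs never re-pick an item
            strata[(item["case"], item["rule_type"])].append(item)
    queues = [strata[key] for key in sorted(strata)]  # deterministic order
    picked = []
    while len(picked) < n and any(queues):
        for queue in queues:
            if queue and len(picked) < n:
                picked.append(queue.pop(0))
    return picked
```

Round-robin keeps a large case from dominating the batch: every stratum contributes one item before any stratum contributes a second.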
Verified: backend create/list/tag/score against the live DB; tsc --noEmit exits 0;
py_compile ok. (Local Turbopack build blocked by worktree symlink; CI builds clean.)
Invariants: G1 (eval set modeled at source in its own table); G2 (reuses the same
halacha_quality validators the extractor runs — no parallel scoring logic).
Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
Cross-precedent recurrence of a principle is real but is NOT citation
corroboration (X11) — the 5 candidate pairs have ZERO citations between their
precedents. Recording them in halacha_citation_corroboration would fabricate
citation data and inflate corroboration_count. This adds a proper, separate
halacha-level link for parallel authority.
Schema (V28): equivalent_halachot — symmetric (halacha_a < halacha_b, CHECK +
UNIQUE), non-citation, cross-precedent-only. ON DELETE CASCADE.
db.py:
- link_equivalent_halachot (idempotent; rejects same-id and SAME-precedent pairs
— parallel authority is cross-precedent by definition), unlink, and
list_equivalent_for_halacha.
- list_halachot gains include_equivalents → _annotate_equivalents attaches an
`equivalents` list (both directions) per row.
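The symmetric-pair constraint can be sketched with an in-memory SQLite database. The column names halacha_a / halacha_b and the CHECK + UNIQUE + ON DELETE CASCADE shape follow this commit; the halachot table layout (id, case_law_id) and the SQLite dialect are assumptions made so the example runs standalone.

```python
import sqlite3

SCHEMA = """
CREATE TABLE halachot (id INTEGER PRIMARY KEY, case_law_id INTEGER NOT NULL);
CREATE TABLE equivalent_halachot (
    halacha_a INTEGER NOT NULL REFERENCES halachot(id) ON DELETE CASCADE,
    halacha_b INTEGER NOT NULL REFERENCES halachot(id) ON DELETE CASCADE,
    CHECK (halacha_a < halacha_b),   -- one canonical order per pair
    UNIQUE (halacha_a, halacha_b)
);
"""


def demo_conn():
    """Build a throwaway DB: halachot 1 and 2 share a precedent, 3 does not."""
    conn = sqlite3.connect(":memory:")
    conn.execute("PRAGMA foreign_keys = ON")
    conn.executescript(SCHEMA)
    conn.executemany("INSERT INTO halachot VALUES (?, ?)",
                     [(1, 100), (2, 100), (3, 200)])
    return conn


def link_equivalent_halachot(conn, x, y):
    """Link two halachot as parallel authority. Returns True if a row was added."""
    if x == y:
        raise ValueError("cannot link a halacha to itself")
    a, b = sorted((x, y))  # store (low, high) so (x, y) and (y, x) collide
    rows = dict(conn.execute(
        "SELECT id, case_law_id FROM halachot WHERE id IN (?, ?)", (a, b)))
    if len(rows) != 2:
        raise ValueError("unknown halacha id")
    if rows[a] == rows[b]:
        raise ValueError("same precedent: parallel authority is cross-precedent")
    cur = conn.execute(
        "INSERT OR IGNORE INTO equivalent_halachot VALUES (?, ?)", (a, b))
    return cur.rowcount == 1  # 0 when the pair already existed (idempotent)
```

Sorting the ids before the insert is what lets a plain UNIQUE constraint enforce symmetry.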
API: include_equivalents on GET /api/halachot; GET/POST/DELETE
/api/halachot/{id}/equivalents for the chair to view/link/unlink manually.
scripts/halacha_batch_reconcile.py: --link records found cross-precedent pairs
as equivalent_halachot (non-destructive, idempotent).
web-ui: Halacha.equivalents type; the clean review queue fetches
include_equivalents; the review card shows a gold "עיקרון מקביל ב-N" badge + an
expandable list (case + rule + similarity) labeled "אסמכתה מקבילה — לא ציטוט".
Populated the 5 reviewed pairs (chair decision: keep all + link as parallel
authority). Verified: 5 rows; the 1023-20 hub annotates 3 of its halachot with
equivalents; tsc --noEmit exits 0.
Invariants: G1 (model recurrence at source in its own table, not by abusing the
citator); G2 (no parallel path — extends list_halachot); citator integrity
preserved (corroboration stays citation-only).
Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
Completes #84 — surfaces the backend gating/prioritization (#84.1/#84.3, PR
#93) in the chair's review UI and adds near-duplicate clustering (#84.2).
Backend
- db.list_halachot gains `cluster` (#84.2): annotates each row with cluster_id +
cluster_size by unioning same-precedent halachot whose cosine similarity is at
least HALACHA_CLUSTER_COSINE (0.90, new config). Display-only: it never
merges or deletes rows. The pairwise comparison is confined to the returned
set, so it stays cheap.
- GET /api/halachot exposes the `cluster` query param (default off).
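The cluster annotation described above can be sketched as a union-find over pairwise cosine similarity. The row keys (id, case_law_id, embedding) and the choice of cluster_id are assumptions for illustration; the real list_halachot may compute this differently.

```python
import math
from collections import Counter


def cosine(u, v):
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0


def annotate_clusters(rows, threshold=0.90):
    """Add cluster_id / cluster_size to each row. Never merges or deletes rows."""
    parent = list(range(len(rows)))

    def find(i):
        while parent[i] != i:
            parent[i] = parent[parent[i]]  # path halving
            i = parent[i]
        return i

    for i in range(len(rows)):
        for j in range(i + 1, len(rows)):
            same_precedent = rows[i]["case_law_id"] == rows[j]["case_law_id"]
            if same_precedent and cosine(rows[i]["embedding"],
                                         rows[j]["embedding"]) >= threshold:
                parent[find(j)] = find(i)  # union the two groups

    sizes = Counter(find(i) for i in range(len(rows)))
    for i, row in enumerate(rows):
        root = find(i)
        row["cluster_id"] = rows[root]["id"]  # id of the group's root row
        row["cluster_size"] = sizes[root]
    return rows
```

Because only the returned page is compared, the quadratic loop stays small in practice.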
Frontend (web-ui)
- Halacha type gains optional cluster_id / cluster_size (hand-written module; no
api:types regen needed — halachot aren't typed off the generated schema).
- useHalachotPending(opts): the default "clean" queue now fetches
exclude_low_quality + order_by_priority + cluster; needsFix:true returns the
flagged 'needs extraction fix' bucket (filtered client-side).
- HalachaReviewPanel: a "תור נקי / דורש תיקון-חילוץ" toggle (#84.1); near-dup
clusters collapse into ONE card showing "+N וריאנטים" with an expandable list,
and approve/reject/defer on a clustered card applies to all variants via the
batch endpoint (#84.2 + #84.4). Counts show true halacha totals (pendingTotal).
New flag labels added (application / near_duplicate / nevo_preamble_leak).
Verified:
- backend: list_halachot(cluster=True) on the live queue — algorithm correct
(groups related same-precedent rules at 0.78; none at the production 0.90
because dedup #82 already removed near-dups — the desired state).
- frontend: `tsc --noEmit` exits 0 (type-clean); no new lint errors (the one
lint error is pre-existing in training/learning-panel.tsx from #94). Local
Turbopack build can't run on the worktree node_modules symlink — CI builds in
a clean checkout.
Invariants: G1 (gate/cluster at source in SQL, not post-hoc); G2 (same
list_halachot path); §6 (flagged items routed to a visible bucket, not dropped).
Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
Backend for the halacha approval-queue triage (#84). The keyboard UI, batch
actions and defer/reject (#84.4–6) already shipped; this adds the gating,
prioritization and metrics the queue was missing.
db.list_halachot — two opt-in triage controls:
* exclude_low_quality (#84.1): drop items carrying ANY quality_flag
(application / quote_unverified / truncated / non_decision / thin /
nli_unsupported / near_duplicate) — they belong in a 'needs extraction fix'
bucket, not the chair's approve queue.
* order_by_priority (#84.3): active-learning order — negatively-treated
first, then most-uncertain (lowest confidence), then oldest — instead of
FIFO, so the highest-value decisions surface first.
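The two triage controls can be mirrored in plain Python as below. The commit applies them in SQL at the source; this sketch only illustrates the filter and the sort key, and the row keys (quality_flags, negatively_treated, confidence, created_at) are assumed names.

```python
def triage(rows, exclude_low_quality=False, order_by_priority=False):
    """Apply the opt-in gate and the active-learning order to pending rows."""
    if exclude_low_quality:
        # Any quality flag at all sends the item to the needs-fix bucket.
        rows = [r for r in rows if not r["quality_flags"]]
    if order_by_priority:
        rows = sorted(rows, key=lambda r: (
            not r["negatively_treated"],  # negatively-treated first
            r["confidence"],              # then most uncertain
            r["created_at"],              # then oldest
        ))
    return list(rows)
```

With both flags off the input order is returned unchanged, which matches the non-breaking default described for the REST endpoint.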
halachot_pending (MCP): now gated + prioritized BY DEFAULT;
include_low_quality=true reveals the needs-fix bucket. The agent review path
benefits immediately.
GET /api/halachot — same two params, default OFF (non-breaking; the UI opts in).
metrics.halacha_backlog (#84.7) — splits pending into clean vs flagged, adds
deferred, reviewed_total, approve_ratio, and a pending_by_flag breakdown, so the
backlog distinguishes real review work from extraction noise.
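A rough sketch of the backlog split follows. The output keys come from this commit; the formulas are assumptions, in particular reviewed_total = approved + rejected, approve_ratio = approved / reviewed_total, and counting an item once under every flag it carries.

```python
from collections import Counter


def halacha_backlog(rows):
    """Summarize review state from rows with review_status and quality_flags."""
    pending = [r for r in rows if r["review_status"] == "pending_review"]
    flagged = [r for r in pending if r["quality_flags"]]
    status_counts = Counter(r["review_status"] for r in rows)
    approved = status_counts["approved"]
    reviewed_total = approved + status_counts["rejected"]  # assumed definition
    return {
        "pending_clean": len(pending) - len(flagged),
        "pending_flagged": len(flagged),
        "deferred": status_counts["deferred"],
        "reviewed_total": reviewed_total,
        "approve_ratio": approved / reviewed_total if reviewed_total else None,
        "pending_by_flag": dict(Counter(
            flag for r in flagged for flag in r["quality_flags"])),
    }
```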
Deferred (documented): #84.2 near-duplicate cluster cards and wiring the UI
fetch to the new params require frontend work + an api:types regen AFTER this
deploys (the new query params aren't in prod's OpenAPI until then) — a clean
follow-up. The backend fully supports both now.
Verified against the live DB (read-only):
- pending 177 → gated-clean 110, 0 flagged items leak into the clean queue.
- priority order surfaces the lowest-confidence items first (0.55, 0.55, ...).
- backlog: pending_clean=110 / pending_flagged=67 / approve_ratio=0.916,
pending_by_flag={nli_unsupported:59, quote_unverified:3, thin:3, truncated:2}.
- pytest tests/test_halacha_quality.py — 52 passed (no regression).
Invariants: G1 (gate at source — SQL filter, not post-hoc); G2 (no parallel
path — same list_halachot); §6 (flagged items routed to a bucket, never dropped).
Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
Final slice of GAP-48 (INV-TOOL1). 18 drafting tools were converted to {status,data,message}
via tools/envelope.py, including the critical decision-production path.
Principle for tools with a meaningful failure mode (export_docx/revise_draft/apply_user_edit): err()
at the envelope level, so both the agent and the user see the failure; failed_gates rides in data.
All other tools: ok(data=payload) on success, err for a missing case / invalid input / exception.
6 app.py consumers were wired (get_decision_template, apply_user_edit ×2, revise_draft,
list_bookmarks, export_docx) with envelope_unwrap + a status=="error"→4xx check,
keeping the API contract (X6) unchanged. test_export_qa_gate updated to the new contract.
Tests: 182/182 pass (including the export QA gates).
GAP-48 closed: all ~12 tool families are uniform. Remaining in FU-14: GAP-49/50 (breaking), GAP-54.
Invariants: completes INV-TOOL1 + G2. Documented in X9 (closed) + gap-audit slice 7.
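For orientation, a minimal sketch of the {status,data,message} envelope and its unwrap step. The names ok, err and envelope_unwrap appear in this commit, but the signatures, the "ok" success value and the ToolError class are assumptions, not the contents of tools/envelope.py.

```python
from typing import Any, Optional


def ok(data: Any = None, message: str = "") -> dict:
    return {"status": "ok", "data": data, "message": message}


def err(message: str, data: Any = None) -> dict:
    # data can still carry details such as failed_gates on a failure
    return {"status": "error", "data": data, "message": message}


class ToolError(Exception):
    """Hypothetical exception an HTTP layer could map to a 4xx response."""

    def __init__(self, message: str, http_status: int = 400,
                 data: Optional[Any] = None):
        super().__init__(message)
        self.http_status = http_status
        self.data = data


def envelope_unwrap(envelope: dict, http_status: int = 400) -> Any:
    """Return the payload, or raise if the tool reported an error."""
    if envelope.get("status") == "error":
        raise ToolError(envelope.get("message", ""), http_status,
                        envelope.get("data"))
    return envelope.get("data")
```

Unwrapping at the consumer is what lets the tools change shape while the HTTP responses stay the same.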
Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
Closes the learning loop (INV-LRN4): every decision is closed against its final version, and
every final is analyzed against the draft. Feeds the tables that T15 already reads from.
T5 (reconciliation ledger):
- SCHEMA_V26: draft_final_pairs table (draft snapshot + final + diff + analysis + status).
- db: create/update/list_draft_final_pairs.
- mark-final (app.py): captures a snapshot of the draft (decision_blocks) at signing time,
before it can be overwritten, and opens a ledger row (status=final_received).
T4 (automatic distillation):
- learning_loop.process_final_version: uses the snapshot (not blocks that may have changed),
classifies style_method↔substance, and stores a proposal on the pair (status=analyzed).
**Removed the auto-upsert of style_patterns**: this eliminates the bug that bypassed the
chair gate and contaminated style with substance (INV-LRN1 + INV-LRN5).
- LESSONS_PROMPT: explicit style_method↔substance separation + abstract lessons only.
- curator wake + hermes-curator.md: runs ingest_final_version first; proposes only
style_method items not yet documented; substance→precedent track.
INV-LRN1 (chair gate, no auto-commit) · INV-LRN4 (ground-truth contrast) · INV-LRN5 (purity).
Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
GAP-57 (security, CWE-798 / INV-ENV4): the hard-coded default
postgresql://paperclip:paperclip@... was removed from 3 web/ files. Added a shared resolver,
require_paperclip_db_url() in paperclip_api.py, that fails loudly if PAPERCLIP_DB_URL is not
set, instead of silently falling back to known creds. Coolify sets the variable (verified), so
production is unaffected. (2 occurrences in local scripts are left for the full FU-15.)
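A minimal sketch of a fail-loud resolver in the spirit of require_paperclip_db_url(). The env parameter and the error type are assumptions added to make the example testable; the real function in paperclip_api.py may differ.

```python
import os
from typing import Mapping, Optional


def require_paperclip_db_url(env: Optional[Mapping[str, str]] = None) -> str:
    """Return PAPERCLIP_DB_URL or raise. There is deliberately no default."""
    env = os.environ if env is None else env
    url = env.get("PAPERCLIP_DB_URL", "").strip()
    if not url:
        raise RuntimeError(
            "PAPERCLIP_DB_URL is not set; refusing to fall back to a "
            "built-in connection string")
    return url
```

Raising at resolution time turns a silent credential fallback into a visible startup or request failure.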
FU-13 (INV-AG3, GAP-46): agent-permission alignment. The gap mapped on 31.5 turned out to be
too broad: it was attributed by role description, not by the actual instructions. Chair decision, "hybrid":
- legal-analyst: added aggregate_claims_to_arguments (frontmatter + step 7), the tool
that groups the claims it extracted into legal arguments.
- extract_references/extract_internal_citations are a researcher task (the researcher already
holds them), not an analyst task; removed from the "missing" list.
- legal-researcher: was already correct; the spec was outdated.
Updated X4-agents.md (§2א, INV-AG3) and gap-audit.md (FU-13 ✅, FU-15 partial).
Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
Closes the chair-feedback→agent-knowledge loop. Until now resolve only updated the DB; now
a click in /feedback wakes the CEO, which folds the lesson into a file according to its category.
- paperclip_client.py: wake_ceo_for_feedback_fold() creates a Paperclip issue
with the lesson + a routing rubric (style→SKILL.md, wrong_structure→block-schema,
other→lessons.md) and wakes the CEO. Mirrors the wake_for_precedent_extraction pattern
- db.py: get_chair_feedback(id) fetches a single note with case_number/appeal_type
- app.py: the resolve endpoint accepts fold (default true); fire-and-forget
BackgroundTask; guard: only when lesson_extracted is present. Returns fold_queued
- legal-ceo.md: dispatch for feedback_fold_ + a "קיפול הערת יו"ר" (folding a chair note) section with the rubric
- frontend: useResolveFeedback accepts fold; /feedback sends fold=true with a toast;
drafts-panel sends fold=false (per-case bookkeeping, no double fold)
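The routing rubric and the fold guard can be sketched as two small functions. The category→target mapping is taken from this commit; the function names and the treatment of blank lessons are assumptions for illustration.

```python
from typing import Optional

# Mapping from feedback category to fold target, as listed in the rubric.
FOLD_TARGETS = {
    "style": "SKILL.md",
    "wrong_structure": "block-schema",
}


def fold_target(category: str) -> str:
    """Every category not named in the rubric folds into lessons.md."""
    return FOLD_TARGETS.get(category, "lessons.md")


def should_queue_fold(fold: bool, lesson_extracted: Optional[str]) -> bool:
    """Queue the CEO wakeup only when asked to and a lesson actually exists."""
    return bool(fold and lesson_extracted and lesson_extracted.strip())
```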
Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
- New /feedback page: aggregates all chair_feedback notes across all cases, with an
unapplied/all filter + a category filter, and a "סמן כיושם" (mark as applied) button per note
- Approvals center: the "הערות יו"ר" (chair notes) card linked to / (useless) → now /feedback
- Approvals center: "תיקים שנכשלו ב-QA" (cases that failed QA) card: every case in the sample
is clickable through to the case page, and the card links directly to the case when there is only one
- ApprovalSample.href is optional; sample items become a Link when href is present
- Navigation: added "הערות יו"ר" to the work group in app-shell
Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
- Backend: GET before the async task; if the citation already exists as external_upload, return 409
- DB: get_external_case_law_by_citation, a lookup by citation + source_kind
- Frontend: red banner with the existing record's details and two buttons:
• "הפעל חילוץ מחדש" (re-run extraction): request-halachot for the existing ID, then close the form
• "מחק את הרשומה" (delete the record): DELETE with confirm, then clear the conflict
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
Adds a new "ההחלטה" tab to the case detail page showing all 12 decision
blocks with rendered markdown content and inline editing that saves back
to the DB via two new FastAPI endpoints.
Backend (web/app.py):
- GET /api/cases/{n}/decision-blocks — returns all 12 blocks (empty
ones included) merged from BLOCK_CONFIG + decision_blocks table.
Exposes source_of_truth ("docx"|"blocks") and active_draft_path.
- PUT /api/cases/{n}/decision-blocks/{block_id} — inline save via
block_writer.save_block_content; warns (does not block) when an
active DOCX draft exists.
Frontend:
- src/lib/api/decision-blocks.ts — typed hooks (useDecisionBlocks,
useSaveBlock) following the cases.ts hand-written-module pattern.
- src/components/cases/decision-blocks-panel.tsx — accordion of 12
blocks; view mode renders Markdown component; edit mode is a textarea
with on-blur save (derived from ChairEditor pattern, setState-during-
render for re-sync to avoid effect cascade).
- BLOCK_LABELS in feedback.ts extended from 7 → 12 blocks.
- cases/[caseNumber]/page.tsx — new "ההחלטה" tab wired to the panel.
No DB migration required — decision_blocks + active_draft_path exist.
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
Legacy Hebrew .doc precedents (e.g. nevo.co.il CP1255 OLE2) can now be
uploaded directly through the precedent-library, missing-precedent, and
training upload paths — the frontend already advertised .doc but the
backend gate rejected it before reaching the extractor.
- web/app.py: add .doc to ALLOWED_EXTENSIONS (covers all paths that share
the set: precedent library, missing-precedent, training).
- Dockerfile: install libreoffice-writer-nogui (no X11/Java) so the
extractor's existing _extract_doc LibreOffice conversion works in the
Coolify container (was missing → would fail at runtime).
- extractor.py: isolate the LibreOffice user profile per call to avoid a
profile-lock failure on concurrent .doc conversions.
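A sketch of per-call profile isolation, assuming the conversion shells out to a `soffice` binary and that a throwaway directory per call is acceptable. The function names are hypothetical and this is not the actual _extract_doc code; `-env:UserInstallation` is the LibreOffice bootstrap variable that selects the user profile directory.

```python
import subprocess
import tempfile
from pathlib import Path


def build_convert_command(src, outdir, profile_dir):
    """Build a headless .doc -> .docx command that uses its own user profile."""
    profile_uri = Path(profile_dir).resolve().as_uri()  # file:///... form
    return [
        "soffice",  # assumed binary name on the image
        f"-env:UserInstallation={profile_uri}",
        "--headless",
        "--convert-to", "docx",
        "--outdir", str(outdir),
        str(src),
    ]


def convert_doc(src, outdir, timeout=120):
    """Convert one file; the temporary profile is removed afterwards."""
    with tempfile.TemporaryDirectory(prefix="lo-profile-") as profile:
        subprocess.run(build_convert_command(src, outdir, profile),
                       check=True, timeout=timeout)
    return Path(outdir) / (Path(src).stem + ".docx")
```

Two concurrent conversions then never contend for the same profile lock, at the cost of a cold profile on every call.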
Verified in python:3.12-slim (prod base): .doc→.docx→text yields text
byte-identical to a native Word .docx save (103 paragraphs, 24,341 chars).
Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
Make the chair's pending-halacha review faster and less exhausting.
Backend:
- New 'deferred' review_status (snooze): stays out of the active library AND
out of the default pending queue, without the finality of 'rejected'.
update_halacha stamps reviewer+reviewed_at on defer; HALACHA_REVIEW_STATUSES
is the single source of valid statuses (PATCH validation now uses it).
- db.update_halachot_batch(ids, status, reviewer) — one atomic UPDATE for a
whole group; invalid status / empty ids are a no-op.
- POST /api/halachot/batch (HalachaBatchReviewRequest) wraps it.
- update_halacha now RETURNs quality_flags too (parity with list_halachot).
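The batch update can be pictured as a single parameterized UPDATE with a validated status. This sketch uses SQLite and an assumed halachot table (id, review_status, reviewer, reviewed_at); the status set shown is also an assumption and may not match HALACHA_REVIEW_STATUSES exactly.

```python
import sqlite3

# Assumed contents; the real constant may list more statuses.
HALACHA_REVIEW_STATUSES = {"pending_review", "approved", "rejected", "deferred"}


def update_halachot_batch(conn, ids, status, reviewer):
    """Set one status on many rows in a single statement. Returns rows changed."""
    ids = list(ids)
    if not ids or status not in HALACHA_REVIEW_STATUSES:
        return 0  # invalid status / empty ids: no-op, nothing touched
    placeholders = ",".join("?" for _ in ids)
    cur = conn.execute(
        "UPDATE halachot SET review_status = ?, reviewer = ?, "
        "reviewed_at = CURRENT_TIMESTAMP "
        f"WHERE id IN ({placeholders})",
        (status, reviewer, *ids),
    )
    conn.commit()
    return cur.rowcount
```

Only the placeholder list is interpolated into the SQL text; every value is still bound as a parameter.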
Frontend (halacha-review-panel):
- Quality-flag badges (#81: non_decision / truncated_quote / thin_restatement /
quote_unverified) so the chair sees WHY an item was held back.
- Defer action — button + keyboard 'D' — to snooze without rejecting (fixes the
'leave in pending forever' anti-pattern; reject stays the junk verb).
- Per-precedent batch bar: 'אשר הכל' / 'דחה הכל' via useBatchReviewHalachot
(one request, one refetch) with confirm guards.
- Halacha/HalachaPatch types gain quality_flags + 'deferred'.
Verified: mcp-server suite 156 passed; web build green; end-to-end integration
against dev DB (batch approve/reject, defer sets status+timestamp, pending
excludes approved+deferred, deferred queryable, invalid status no-op).
Note: api:types regen deferred until deploy (the batch hook is hand-typed, not
dependent on generated types).
Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
Two identity fixes for the precedent corpus:
1. PrecedentUpdateRequest += case_number — the canonical identifier was not
in the edit model, so a wrong id captured at upload (e.g. the full
citation pasted into the field) could not be corrected. update_case_law
already whitelists case_number.
2. /api/internal-decisions/upload += citation form field — case_number is
now the clean identifier (e.g. 8027-25) and citation is the full
מראה-מקום, stored as citation_formatted up-front (previously the UI sent
the citation AS case_number, leaving the id polluted and citation_formatted
empty until extraction). Stored via a post-ingest update_case_law, not the
core INSERT.
Frontend (separate case_number field in the upload + edit sheets) follows in
a second PR after api:types regen.
Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
The /api/internal-decisions/upload path (used by the UI for ועדת-ערר
decisions) never called pc_wake_for_precedent_extraction, so committee
decisions were stuck at halacha_extraction_status='pending' forever — the
CEO was never woken to drain the queue. Root cause behind 8027-25's stuck
extraction. The other two upload paths (precedent_library, missing-precedent)
already wake the CEO; this one was missing it.
- internal-decisions upload: add the wakeup, routing the company by case
number prefix (1xxx→רישוי, 8xxx→היטל, 9xxx→פיצויים) when practice_area is
empty (else an 8xxx case wrongly routes to the licensing CEO).
- all three call sites: the wake helper returns {ok:False} WITHOUT raising
on a skipped/failed wakeup; that was silently dropped. Now logged at
WARNING with the reason, and the upload progress carries extraction_queued.
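The prefix routing above can be sketched as a lookup on the first digit of the case number. The practice-area codes used here (rishuy_uvniya, betterment_levy, compensation_197) appear elsewhere in this log; the function name and the None fallback for unknown prefixes are assumptions.

```python
from typing import Optional

PREFIX_TO_PRACTICE_AREA = {
    "1": "rishuy_uvniya",     # 1xxx: licensing
    "8": "betterment_levy",   # 8xxx: betterment levy
    "9": "compensation_197",  # 9xxx: compensation
}


def practice_area_for(case_number: str, practice_area: str = "") -> Optional[str]:
    """Prefer an explicit practice_area; otherwise route by case-number prefix."""
    if practice_area:
        return practice_area
    return PREFIX_TO_PRACTICE_AREA.get(case_number.strip()[:1])
```

Returning None for an unrecognized prefix leaves the decision to the caller, for example to log at WARNING and skip the wakeup rather than route to the wrong company.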
Fallback drainer (scheduled precedent_process_pending) deferred — the
missing wakeup was the actual failure; manual precedent_process_pending
remains the recovery path.
Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
The gold-set card read data/eval/gold-set.jsonl, but .dockerignore excludes
data/ from the build context, so the file is never in the container and the
card silently never rendered. Baking eval data into the image is the wrong
layering (data/ is runtime volumes). The gold-set review is a one-time task,
not a recurring chair queue, so it doesn't belong on the live board — it's
tracked via task #63 and reviewed directly with the chair. The board now
returns the 4 robust DB-backed gates (halachot, missing precedents, feedback,
qa_failed). Removes the best-effort file read + its unused Path import.
Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
Dafna asked for a single page under the prod site listing everything she needs
to approve, so nothing is forgotten — the visible embodiment of INV-G10 (human
gates) and INV-QA1 (halacha backlog must be visible).
Backend — GET /api/chair/pending aggregates every pending chair gate, each as a
direct source query (count + sample + action link):
- halachot review backlog (review_status='pending_review') + oldest
- open missing precedents
- unresolved chair_feedback
- qa_failed cases
- gold-set review (FU-5, file-based, best-effort: total vs source='chair')
Frontend — /approvals page ("מרכז אישורים"):
- src/lib/api/chair.ts — usePendingApprovals() (hand-typed until next api:types)
- src/app/approvals/page.tsx — card per category, severity-coloured count, sample
rows, oldest-pending date, link to where each is handled; live (60s refetch)
- app-shell nav: "מרכז אישורים" in the work group + total-pending badge (quiet at 0)
Live counts at build time surfaced the value immediately: 226 open missing
precedents, 178 pending halachot, 20 unapplied feedback notes, 1 qa_failed.
Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
Covers GAP-11 (INV-RET4/G8) and GAP-14 (INV-QA1/G10). Retrieval quality was
never measured (only telemetry observation) and the halacha review backlog was
invisible (the 10/19 gap was found by accident).
Unit B — backlog visibility (pure code, container):
- metrics.halacha_backlog(conn) → {pending_review, approved, rejected, published,
total, oldest_pending_at}; surfaced in metrics.get_dashboard() (get_metrics MCP
tool) and /api/system/diagnostics. Live count revealed 178 pending / 1552 total,
oldest from 2026-05-03 — previously invisible.
Unit A — retrieval eval harness (host-side scripts):
- scripts/eval_gold_bootstrap.py — seeds data/eval/gold-set.jsonl. Two sources:
citations (cited==relevant via search_relevance_feedback — empty until decisions
cite precedents) and known_item (query=case_name → relevant=self; a real
citation-free signal, the methodology #52 checked by hand). Idempotent; preserves
source='chair' rows.
- scripts/eval_retrieval.py — runs the production retrieval path (search_library /
search_internal) over the gold-set; computes precision@k, recall@k, MRR, nDCG@k
(k=5,10); aggregates overall + per-corpus + per-practice_area; writes a report and
a delta vs committed baseline.json (which records the retrieval_config it reflects).
--self-test unit-checks the metric math offline.
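The four metrics can be written down compactly for binary relevance. This is a sketch of the standard definitions, not the code in scripts/eval_retrieval.py; the function names and the handling of empty relevance sets are assumptions.

```python
import math


def precision_at_k(ranked, relevant, k):
    return sum(1 for doc in ranked[:k] if doc in relevant) / k


def recall_at_k(ranked, relevant, k):
    if not relevant:
        return 0.0
    return sum(1 for doc in ranked[:k] if doc in relevant) / len(relevant)


def reciprocal_rank(ranked, relevant):
    """1/rank of the first relevant result; MRR is the mean over queries."""
    for rank, doc in enumerate(ranked, start=1):
        if doc in relevant:
            return 1.0 / rank
    return 0.0


def ndcg_at_k(ranked, relevant, k):
    """Binary-gain nDCG: DCG of the ranking divided by the ideal DCG."""
    dcg = sum(1.0 / math.log2(rank + 1)
              for rank, doc in enumerate(ranked[:k], start=1)
              if doc in relevant)
    ideal = sum(1.0 / math.log2(rank + 1)
                for rank in range(1, min(len(relevant), k) + 1))
    return dcg / ideal if ideal else 0.0
```

For a known-item query there is exactly one relevant document, so recall@k is either 0 or 1 and nDCG depends only on the rank at which that document appears.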
Gold-set strategy = hybrid (chair decision): bootstrap + chair review. The citation
source is empty today (0 cited precedents in decisions), so the seed is known-item
(77 queries: 54 internal_decisions + 23 precedent_library). The gold-set is
PROVISIONAL until Dafna reviews it (the domain chair-gate).
Baseline (production config: multimodal+rerank on): R@10=0.987, MRR=0.837,
nDCG@10=0.872. Finding: MULTIMODAL_ENABLED=true slightly lowers known-item recall
(image-page results displace exact name matches) — relevant to #15. precedent_library
weaker than internal (R@10 0.957 vs 1.0) — one external precedent unfindable by name.
"CI gate" realized as discipline (re-runnable harness + committed baseline + run
before/after any retrieval-layer change): retrieval needs the prod DB + Voyage, and
no CI runner has that access.
Spec: docs/superpowers/specs/2026-05-31-fu5-eval-harness-design.md
Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
Six-phase upgrade of /training from a read-only dashboard into a full
Style Studio for managing Daphna's style corpus.
- Upload Sheet on /training: file → proofread preview → commit (no more
CLI-only `upload-training` skill).
- Rich corpus metadata: GET /api/training/corpus returns summary, outcome,
key_principles, page_count, parties (regex), legal_citation, lessons_count.
PATCH endpoint for chair edits. CorpusDetailDrawer with 4 tabs
(details/content/lessons/patterns) replaces the bare table row.
- LLM metadata enrichment: style_metadata_extractor + MCP tools
(style_corpus_enrich, style_corpus_pending_enrichment) fill
summary/outcome/key_principles via claude_session (free, host-side).
- Per-decision lessons: new decision_lessons table + 4 REST endpoints +
LessonsTab in drawer; hermes-curator now auto-posts findings as
decision_lessons(source=curator).
- Curator Portrait tab: prompt rendered with link to Gitea, recent
curator findings, style_analyzer training prompts, propose-change
form that writes proposals to data/curator-proposals/ for manual
chair review (no auto-mutation of the agent file).
- Style chat tab: SSE-streamed conversations with the style agent.
New host-side pm2 service (legal-chat-service, port 8770) wraps
claude CLI with stream-json + --resume continuation; FastAPI proxies
via host.docker.internal. Zero API cost — uses chaim's claude.ai
subscription. chat_conversations + chat_messages persist history.
Architecture: keeps the existing rule that claude_session only runs
on the host (not the container). The new legal-chat-service is the
canonical bridge between the container and the local CLI for the chat
feature; everything else (upload, metadata, lessons) stays within the
container's existing capabilities.
Audit script (scripts/audit_training_corpus.py) included for verifying
which corpus rows still need enrichment.
Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
Until now, "case_number" was the only stored identifier for a precedent.
But a *citation per the Israeli unified citation rules* is a different
beast — it has bold parties, an unbold prefix (court abbrev + panel/
district parenthetical + case number), and an unbold trailing reporter
(נבו / פ"ד...). Without storing it as a first-class field we couldn't
hand the chair a one-click "copy as citation" experience for pasting
into decisions.
Changes:
- Schema V19: case_law.citation_formatted TEXT (Markdown — parties
wrapped in **…** so the copy helper can render <strong> for Word/Docs
paste and keep plain-text fallback meaningful).
- Metadata extractor: composes citation_formatted from the document
text per the unified citation rules, with worked examples for ע"א /
עת"מ / ערר / בל"מ in the prompt. Refuses to store half-formed strings.
- PATCH /api/precedent-library/{id} accepts citation_formatted so the
chair can correct LLM mistakes.
- /precedents/[id]: dedicated "מראה מקום" block with bold rendering,
a copy-to-clipboard button (text/html + text/plain so Word keeps
the bolds), and an inline edit textarea.
- /precedents list rows: link displays the formatted citation when
available, with a small inline copy button — falls back to the bare
case_number for older rows.
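To illustrate why the field is stored as Markdown, the sketch below derives both clipboard flavors from one string. The actual copy helper lives in the web UI; this Python version, its function names and the sample citation are illustrative assumptions only.

```python
import html
import re

_BOLD = re.compile(r"\*\*(.+?)\*\*")


def citation_to_html(citation_md: str) -> str:
    """text/html flavor: parties become <strong>, everything else is escaped."""
    escaped = html.escape(citation_md, quote=False)
    return _BOLD.sub(r"<strong>\1</strong>", escaped)


def citation_to_plain(citation_md: str) -> str:
    """text/plain flavor: drop the ** markers, keep the words."""
    return _BOLD.sub(r"\1", citation_md)


# Made-up citation, used only to show the shape (prefix, bold parties, reporter).
SAMPLE = "CA 123/45 **Ploni v. Almoni** (Nevo)"
```

Here citation_to_html(SAMPLE) yields the parties wrapped in <strong>, while citation_to_plain(SAMPLE) reads naturally in editors that ignore HTML.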
Backfill of existing rows happens by re-stamping the extraction queue
once V19 has rolled out and the new field is reachable.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
The missing-precedents drawer + general precedent upload both required
the user to type chair_name, district, practice_area, court, date etc.
upfront — even though those fields can be (and already are, post-upload)
extracted from the document text by the LLM. The metadata-extraction
wakeup also only fired for the /precedent-library/upload path, leaving
missing-precedents committee uploads stuck with whatever stub the user
typed.
Changes:
- Extractor learns chair_name + district, overwrites the new
PLACEHOLDER_PENDING_EXTRACTION sentinel for internal_committee rows
(the DB CHECK forces non-empty; we stamp the placeholder at insert).
- missing_precedent_upload no longer 400s on missing chair/district;
it infers district from the citation when possible, falls back to
the placeholder, and always fires pc_wake_for_precedent_extraction
so the LLM can fill in the rest.
- Both upload sheets default to file (+ citation) only; every other
field is tucked into a closed <details> labeled "אופציונלי — דריסה
ידנית של שדות שיחולצו אוטומטית". Required validators on chair/
district/practice_area dropped — the LLM fills them.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Adds two webhook emitters in paperclip_api.py that the plugin's
onWebhook handler now routes by ``eventType``:
* ``emit_missing_precedent_webhook(...)`` — fires from
POST /api/missing-precedents on first insert (non-duplicate).
The plugin surfaces an askUserQuestions interaction on the
linked issue so Daphna can choose upload / irrelevant / defer
without needing to open the legal-ai UI.
* ``emit_export_complete_webhook(...)`` — fires from
POST /api/cases/{n}/export-docx after a successful export. The
plugin attaches a "final-decision" markdown document with a
download link to the linked Paperclip issue.
Both are fire-and-forget BackgroundTasks — failures are logged
but never block the originating request. Company resolution
follows the same 1xxx→licensing / 8-9xxx→betterment rule used
by emit_case_status_webhook.
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
The "חלץ עובדות שמאיות" UI button hit POST /api/cases/{n}/extract-appraiser-facts
which called appraiser_facts_extractor inline — that shells out to the local
`claude` CLI, which is absent in the Coolify container, so every doc errored,
the per-doc try/except swallowed it, and the response was "completed, 0 facts".
Refactored the endpoint to wake the legal-analyst of the correct company via
Paperclip (same pattern as wake_curator_for_final), and surface
extraction_failed instead of "completed" when every doc errored.
The cases-table reads from the list endpoint, not /details, so without
proceeding_type in the row payload the בל"מ badge can't render for
cases that flipped the field manually (only the legacy
appeal_subtype LIKE 'extension_request_%' path was firing).
Added the field to both detail=false and detail=true branches.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Same case_number can exist as both a regular appeal (ערר) and an
extension-of-time request (בל"מ), and we were inferring the difference
from appeal_subtype prefixes — fragile, and case-number lookups
weren't disambiguated. Now stored as a first-class field on both
case_law (corpus) and cases (live cases), with partial unique indexes
on (case_number, proceeding_type).
- SCHEMA_V15: column + CHECK constraints + backfill from
appeal_subtype LIKE 'extension_request_%' + partial unique indexes
replace the old global UNIQUE(case_number).
- derive_proceeding_type() centralizes the inference rule
(extension_request_* → בל"מ; subject regex fallback; default ערר).
- Metadata extractor prompt asks Claude to populate the new field
explicitly; apply_to_record writes it for internal_committee rows.
- internal_decision_upload, case_create, case_update accept an
optional proceeding_type; FastAPI request models expose it.
- Wizard + edit dialog get a sided Select; case header renders the
resolved label (ערר / בל"מ).
- Uploaded the 2 staged בל"מ decisions on betterment levy:
8126/24 (סופר נוח, 13 chunks), 8047/23 (הרנון, 48 chunks).
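The inference rule centralized in derive_proceeding_type() can be sketched as follows. The order of checks follows this commit; the signature, the returned labels and the subject regex (matching "הארכת מועד", extension of time) are assumptions for illustration.

```python
import re
from typing import Optional

BLAM = 'בל"מ'  # extension-of-time request
ARAR = "ערר"   # regular appeal

# Assumed pattern; the real subject regex may be broader.
_EXTENSION_SUBJECT = re.compile(r"הארכת\s+מועד")


def derive_proceeding_type(appeal_subtype: Optional[str],
                           subject: Optional[str] = "") -> str:
    """extension_request_* subtype, then subject regex, else default appeal."""
    if (appeal_subtype or "").startswith("extension_request_"):
        return BLAM
    if _EXTENSION_SUBJECT.search(subject or ""):
        return BLAM
    return ARAR
```

Keeping the rule in one function means the backfill, the upload paths and the extractor cannot drift apart on how a proceeding is classified.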
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Four parallel sub-agents closed the remaining critical gaps from the
26/05 Stage A/B sprint. Each block independently tested; aggregated here.
## #30/#31 finalizers (sub-agent A)
* Auto-derive practice_area in case_create from case_number prefix
(1xxx→rishuy_uvniya, 8xxx→betterment_levy, 9xxx→compensation_197);
default for CaseCreateRequest is now "" (the DB constraint catches
any stray "appeals_committee").
* practice_area.py: derive_subtype now handles axis-B domain values
(rishuy_uvniya/betterment_levy/compensation_197) without parsing the
case number; new helper derive_domain_practice_area().
* Halacha re-extraction verified unnecessary — all 6 reclassified
records already had is_binding=false and approved halachot.
* Regression tests: 6 cases in tests/test_corpus_constraints.py
covering practice_area enum, internal-committee chair/district,
external-upload arar prefix, MCP guard.
* UI: district input → Select dropdown (7 districts) in
precedent-edit-sheet.tsx, preserving legacy free-text values.
## #37 בל"מ subtypes (sub-agent B)
* 3 new appeal_subtypes: extension_request_{building_permit,
betterment_levy,compensation}. APPEALS_COMMITTEE_SUBTYPES extended,
SUBTYPES_BY_AREA mappings added.
* New helpers: is_blam_subject(), is_blam_subtype(),
derive_subtype_with_blam(case_number, subject, practice_area).
case_create now uses it to auto-detect "בקשה להארכת מועד" subjects.
* 3 methodology templates under docs/methodology/extension-request-*.md.
* paperclip_client.py mapping updated for the 3 new subtypes
(extension_request_building_permit→CMP, the other two→CMPA).
* Frontend: bilingual "בל"מ" badge + filter dropdown on cases list +
detail header; appeal-type-bars collapseBlam() merges בל"מ into its
parent domain for aggregate bars.
* Wizard auto-detects בל"מ from subject during case creation.
* 3 Berlinger cases (1017/1018/1019-03-26) migrated to
appeal_subtype=extension_request_building_permit via psql.
## #35 missing_precedents feature (sub-agent C)
* Schema V13: missing_precedents table (citation, case_id, party,
legal_topic, status, linked_case_law_id, claim_quote, ...) +
FK constraints + 3 indexes. Applied via psql + idempotent migration.
* 6 db.py service functions, 3 MCP tools, 6 FastAPI endpoints
(POST/GET/PATCH/DELETE/upload — upload routes by citation prefix
to ingest_internal_decision or ingest_precedent).
* Next.js page /missing-precedents with 5 status tabs + filters +
sidebar badge counter + detail drawer with metadata edit + smart
upload form that switches fields per committee/court.
* Bootstrap: 7 rows imported from the JSON file
(3 citations × cases, all status=closed with linked_case_law_id).
* legal-researcher.md: new §2ב.5 with missing_precedent_create
usage + dedup semantics + tool grant.
## #36 legal_arguments aggregation (sub-agent D)
* Schema V14: legal_arguments + legal_argument_propositions M:M.
Applied via psql.
* New service argument_aggregator.py with two functions —
aggregate_claims_to_arguments() (Claude CLI / claude_session) and
get_legal_arguments(). Graceful llm_unavailable handling when CLI
is missing (containers).
* 2 MCP tools + 2 API endpoints (POST .../aggregate-arguments as
BackgroundTask, GET .../legal-arguments).
* Frontend: shadcn Accordion + new legal-arguments-panel.tsx with
hierarchical (party → priority badge → arguments) display, a "טיעונים"
("Arguments") tab on the case page, and "חשב/חשב מחדש" ("Compute/Recompute") buttons.
* scripts/backfill_legal_arguments.py + SCRIPTS.md entry — dry-run
found 8 candidate cases including 1017/1018/1019.
## Open follow-ups (intentionally deferred)
* npm run api:types in web-ui (CLAUDE.md flow) — recommended before
the next UI commit; not required for backend deployment.
* Run backfill_legal_arguments.py --apply once the container picks up
the new aggregator service.
* webhook on missing-precedents upload-close to Paperclip (optional).
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
- Wrap date.fromisoformat() in try/except in the case_update tool: an invalid
date no longer surfaces as an unhandled ValueError (500); FastAPI now returns 422
- Add DialogDescription (sr-only) to 5 dialogs missing aria-describedby:
documents-panel preview + delete, drafts-panel delete + feedback, link-related-dialog
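A minimal sketch of the date guard from the first item, assuming the tool receives the date as an optional ISO string. `parse_hearing_date` and `InvalidDateError` are hypothetical names used for illustration; the actual case_update code may be structured differently.

```python
from datetime import date
from typing import Optional


class InvalidDateError(ValueError):
    """Hypothetical error type that the API layer could map to HTTP 422."""


def parse_hearing_date(raw: Optional[str]) -> Optional[date]:
    """Parse an ISO date (YYYY-MM-DD); None or empty input means 'not set'."""
    if not raw:
        return None
    try:
        return date.fromisoformat(raw)
    except ValueError as exc:
        # Re-raise as a domain error so the caller can answer with a
        # client error instead of an unhandled exception.
        raise InvalidDateError(f"invalid ISO date: {raw!r}") from exc
```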
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
Fields existed in DB and Precedent type but were missing from:
- PrecedentUpdateRequest (backend model)
- update_case_law allowed set (db layer)
- PrecedentPatch (frontend type)
- precedent-edit-sheet form state, inputs, and patch payload
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
Archived cases have archived_at IS NOT NULL — they are not "stuck",
they are done. The stale query was missing this filter.
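The staleness rule with the archived filter can be sketched in plain Python. This assumes the status exclusions ('final', 'new') and the 3-day default described for GET /api/cases/stale; the `Case` shape and the `days_stale` helper are hypothetical, and the real check lives in the SQL query.

```python
from datetime import datetime
from typing import Optional, TypedDict


class Case(TypedDict):
    status: str
    updated_at: datetime
    archived_at: Optional[datetime]


def days_stale(case: Case, now: datetime, days: int = 3) -> Optional[int]:
    """Return the idle day count for a stale case, or None if not stale."""
    if case["archived_at"] is not None:
        return None  # archived cases are done, not stuck
    if case["status"] in ("final", "new"):
        return None  # excluded statuses
    idle = (now - case["updated_at"]).days
    # Assumption: "not updated in N days" means at least N whole days idle.
    return idle if idle >= days else None
```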
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
If pc_wake_ceo fails, the endpoint now raises HTTP 502 and skips the
case_update to processing — preventing cases from silently getting stuck
with no CEO running. Also adds `processing` to the CEO routing table and
updates the case_list docstring with the full status list.
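The wake-before-update ordering can be sketched as follows. The callables stand in for pc_wake_ceo and case_update, and `UpstreamError` is a hypothetical stand-in for the HTTP 502 response; the real signatures and return values are not shown here and may differ.

```python
from typing import Callable


class UpstreamError(Exception):
    """Hypothetical stand-in for an HTTP 502 response."""
    status_code = 502


def start_processing(
    case_id: str,
    wake_ceo: Callable[[str], bool],
    set_status: Callable[[str, str], None],
) -> str:
    """Wake the CEO first; only mark the case as processing if that worked."""
    try:
        woke = wake_ceo(case_id)
    except Exception as exc:
        raise UpstreamError("CEO wakeup failed") from exc
    if not woke:
        raise UpstreamError("CEO wakeup failed")
    # Reached only after a successful wakeup, so a failed wakeup can
    # never leave the case in 'processing' with no CEO running.
    set_status(case_id, "processing")
    return "processing"
```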
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
- Fix keyboard navigation bug: React was reusing the submit button DOM element
when transitioning "הבא" ("Next") → "צור תיק" ("Create case"), retaining focus
and causing Enter to auto-submit step 3. Added key props to force element replacement.
- CaseEditDialog now covers all wizard fields: appellants, respondents,
property_address, permit_number (in addition to existing title, subject,
hearing_date, expected_outcome, notes).
- When case title changes, Paperclip project name is updated in background
via new update_project_name() in paperclip_client.py.
- Extended CaseUpdateRequest, case_update MCP tool, and caseUpdateSchema
to carry the new fields end-to-end.
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
GET /api/cases/stale?days=N — returns cases not updated in N days (default 3)
that are not in 'final' or 'new' status, with days_stale count.
GET /api/chair-feedback/weekly-summary?days=N — returns chair feedback from
the last N days (default 7) as a Hebrew bullet-list summary for the CEO agent.
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
asyncpg returns JSONB columns as raw JSON strings when no type codec is
configured (only pgvector is registered in _init_connection). The stored
value is a correct JSONB array (jsonb_typeof=array confirmed), but
asyncpg decodes it as str. Parse it explicitly in the GET handler so
the response carries a real JSON array/object instead of a string.
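A minimal sketch of the explicit parse, assuming the handler gets either a raw JSON string (no codec registered) or an already-decoded value. `decode_jsonb` is a hypothetical helper name, not necessarily what the handler uses.

```python
import json
from typing import Any


def decode_jsonb(value: Any) -> Any:
    """Parse a raw JSON string from the driver; pass other values through."""
    if isinstance(value, str):
        return json.loads(value)
    return value
```

An alternative would be to register a JSONB codec on the connection (asyncpg provides `set_type_codec` for this), which removes the need for per-handler parsing; the commit keeps the change local to the handler instead.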
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
asyncpg cannot encode a Python list as JSONB directly (expects str).
Passing str with ::jsonb causes double-encoding (stored as JSONB string).
Solution: json.dumps() the value → pass as text → PostgreSQL parses
with ::text::jsonb cast, storing it as the correct JSONB array/object.
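The encode-once approach can be sketched as below. The table and column names in `UPDATE_SQL` and the `jsonb_param` helper are hypothetical; only the `::text::jsonb` cast and the single json.dumps() call come from the fix itself.

```python
import json
from typing import Any

# Hypothetical statement: the text parameter is parsed into JSONB by PostgreSQL.
UPDATE_SQL = "UPDATE settings SET value = $1::text::jsonb WHERE key = $2"


def jsonb_param(value: Any) -> str:
    """Serialize exactly once; the result is bound as a text parameter."""
    return json.dumps(value, ensure_ascii=False)


# Double-encoding symptom for comparison: serializing twice yields a JSON
# *string* whose content is the array's text, not an array.
twice = json.dumps(jsonb_param(["a", "b"]))
```

Decoding `twice` gives back a str rather than a list, which is the stored-as-JSONB-string failure described above.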
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
Pass req.value directly to asyncpg instead of json.dumps(req.value).
When a Python string was passed with ::jsonb, asyncpg encoded it as a
JSONB string (not an array), causing the frontend spread operator to
split it into individual characters — one textarea per character.
Also fix typo in DISCUSSION_RULES default: "אסה" → "מאסה".
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
Add the ability to mark case_law records as related (e.g. the same appeal
moving through ועדת ערר (appeals committee) → מנהלי (administrative court) →
עליון (Supreme Court)):
- DB: case_law_relations join table (bidirectional, V11 migration)
- DB CRUD: add/remove/get_case_law_relations
- Service: get_precedent() now returns related_cases[]
- MCP: precedent_link_cases + precedent_unlink_cases tools
- REST: POST/DELETE /api/precedent-library/{id}/relations
- UI: RelatedCasesSection on detail page with search dialog and unlink
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
The "חלץ מטא-דאטה" ("Extract metadata") / "חלץ הלכות" ("Extract halachot") buttons in the UI used to only stamp
the queue (set metadata_extraction_requested_at / halacha_extraction_requested_at)
and rely on a human running `mcp__legal-ai__precedent_process_pending` from
local Claude Code to drain it.
That left the user with an unintuitive two-step flow: click button → run
local MCP tool. Meanwhile, the upload endpoint already does the right
thing — after ingest succeeds it calls `pc_wake_for_precedent_extraction`,
which creates a Paperclip issue, assigns it to the CEO, and wakes them
to run `precedent_process_pending` automatically.
Add the same wakeup call to the manual request-metadata / request-halachot
endpoints. Now clicking the button is sufficient — the CEO picks it up
and drains the queue without manual intervention.
Best-effort: matches the upload flow's failure semantics. The queue stamp
still happens even if the wakeup fails, so the user can fall back to the
manual MCP tool when needed. The wakeup outcome is included in the
response under `wakeup` for observability.
Coolify deploy required for the FastAPI container to pick this up.