X2/X3/X4 invariants are facts about this system's own integration/ops (no external
authority); they use מקור-סמכות=project runbooks, tied to a global engineering invariant.
Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
Per chair clarification: the ≥3-authoritative-source verification protocol governs
ENGINEERING/architecture decisions only (G1–G10). Legal-domain content (G11) is the
authority of the chair + project docs (block-schema, decision-methodology, lessons,
skills/decision) — NOT externally triple-sourced.
- §2/§4/§5 scoped to engineering invariants; added the two-authority distinction
- G11 reframed: source-of-authority = chair + project docs; removed FJC/South Bucks/
1958-statute as "sources to verify" and the UNVERIFIED flag
- Removed the "open items — primary-source verification" section (the over-application)
- Pruned now-orphaned legal sources from the appendix (kept NCSC/CEPEJ/FJC for G9/G10)
Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
13 tasks across 3 phases (keystone constitution → lifecycle files → cross-cutting),
each verification-gated (≥3 sources or UNVERIFIED+escalate) with review checkpoints.
Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
Establishes the foundation to fix a recurring root-cause failure class
(non-canonical identifiers, asymmetric ingest paths, silent manual gates):
- Confirmed system mission (quasi-judicial decision assistant; human decides)
- Decomposition into 5 sub-projects (spec → audit → integrity layer → re-check → process agents)
- spec-set structure under docs/spec/ (lifecycle-organized + cross-cutting files)
- 11 global invariants + engineering rules, each backed by ≥3 authoritative sources
(NCSC/JTC, FJC, CEPEJ, South Bucks; RAG/Lewis, Manning IR, Elastic/Pinecone/Weaviate;
DAMA-DMBOK, ISO 8000, ISO 15489, Kleppmann, Codd, Fowler)
- 3-source verification protocol; UNVERIFIED items escalated, not decided solo
Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
Four parallel sub-agents closed the remaining critical gaps from the
26/05 Stage A/B sprint. Each block independently tested; aggregated here.
## #30/#31 finalizers (sub-agent A)
* Auto-derive practice_area in case_create from case_number prefix
(1xxx→rishuy_uvniya, 8xxx→betterment_levy, 9xxx→compensation_197);
default for CaseCreateRequest is now "" (the DB constraint catches
any stray "appeals_committee").
* practice_area.py: derive_subtype now handles axis-B domain values
(rishuy_uvniya/betterment_levy/compensation_197) without parsing the
case number; new helper derive_domain_practice_area().
* Halacha re-extraction verified unnecessary — all 6 reclassified
records already had is_binding=false and approved halachot.
* Regression tests: 6 cases in tests/test_corpus_constraints.py
covering practice_area enum, internal-committee chair/district,
external-upload arar prefix, MCP guard.
* UI: district input → Select dropdown (7 districts) in
precedent-edit-sheet.tsx, preserving legacy free-text values.
## #37 בל"מ subtypes (sub-agent B)
* 3 new appeal_subtypes: extension_request_{building_permit,
betterment_levy,compensation}. APPEALS_COMMITTEE_SUBTYPES extended,
SUBTYPES_BY_AREA mappings added.
* New helpers: is_blam_subject(), is_blam_subtype(),
derive_subtype_with_blam(case_number, subject, practice_area).
case_create now uses it to auto-detect "בקשה להארכת מועד" subjects.
* 3 methodology templates under docs/methodology/extension-request-*.md.
* paperclip_client.py mapping updated for the 3 new subtypes
(extension_request_building_permit→CMP, the other two→CMPA).
* Frontend: bilingual "בל"מ" badge + filter dropdown on cases list +
detail header; appeal-type-bars collapseBlam() merges בל"מ into its
parent domain for aggregate bars.
* Wizard auto-detects בל"מ from subject during case creation.
* 3 Berlinger cases (1017/1018/1019-03-26) migrated to
appeal_subtype=extension_request_building_permit via psql.
## #35 missing_precedents feature (sub-agent C)
* Schema V13: missing_precedents table (citation, case_id, party,
legal_topic, status, linked_case_law_id, claim_quote, ...) +
FK constraints + 3 indexes. Applied via psql + idempotent migration.
* 6 db.py service functions, 3 MCP tools, 6 FastAPI endpoints
(POST/GET/PATCH/DELETE/upload — upload routes by citation prefix
to ingest_internal_decision or ingest_precedent).
* Next.js page /missing-precedents with 5 status tabs + filters +
sidebar badge counter + detail drawer with metadata edit + smart
upload form that switches fields per committee/court.
* Bootstrap: 7 rows imported from the JSON file
(3 citations × cases, all status=closed with linked_case_law_id).
* legal-researcher.md: new §2ב.5 with missing_precedent_create
usage + dedup semantics + tool grant.
## #36 legal_arguments aggregation (sub-agent D)
* Schema V14: legal_arguments + legal_argument_propositions M:M.
Applied via psql.
* New service argument_aggregator.py with two functions —
aggregate_claims_to_arguments() (Claude CLI / claude_session) and
get_legal_arguments(). Graceful llm_unavailable handling when CLI
is missing (containers).
* 2 MCP tools + 2 API endpoints (POST .../aggregate-arguments as
BackgroundTask, GET .../legal-arguments).
* Frontend: shadcn Accordion + new legal-arguments-panel.tsx with
hierarchical (party → priority badge → arguments) display, "טיעונים"
tab on the case page, "חשב/חשב מחדש" buttons.
* scripts/backfill_legal_arguments.py + SCRIPTS.md entry — dry-run
found 8 candidate cases including 1017/1018/1019.
## Open follow-ups (intentionally deferred)
* npm run api:types in web-ui (CLAUDE.md flow) — recommended before
the next UI commit; not required for backend deployment.
* Run backfill_legal_arguments.py --apply once the container picks up
the new aggregator service.
* webhook on missing-precedents upload-close to Paperclip (optional).
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
Document new daphna-procedural-patterns.md cataloging the
"appraiser clarification request" interim-decision pattern observed in
8174-24 — structure only, not phrasing (case is an outlier example).
- daphna-decision-tree.md §0.5: gating question before main tree
- legal-ceo.md voice docs table: register procedural patterns doc
- legal-writer.md: mandatory consultation when pattern_tag is set,
with explicit warning against copying 8174-24 wording
Approved via interaction request_confirmation (CMPA-15) 2026-05-17.
Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
- עדכון טבלת מצב: כל המודלים מסונכרנים (instructions = DB)
- החלפת טבלת בעיות בטבלת סטטוס תיקונים עם commit references
- הוסף טבלת שינויים נוספים מהסשן
- הערה: Skills CMPA=6 עיצוב מכוון, verify מאשר "0 need sync"
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
7-agent parallel audit of all Paperclip agents (CEO, analyst,
researcher, writer, QA, exporter, proofreader, curator).
Found 12 issues including 3 critical:
- Exporter: V vs v naming mismatch in DOCX versioning
- Exporter: case.status not updated to exported after export
- Researcher: section ז missing from case 8174-24
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
A/B test (2026-05-05) showed DeepSeek V4-Pro is 2-3x faster and ~20x cheaper
than Sonnet for style/lexicon pattern analysis, with comparable quality.
Adds adapters/deepseek-paperclip-adapter/ package, documents adapter requirements
(env injection, run-id headers), updates CLAUDE.md with adapter integration notes,
and records lessons from ערר 1200-25 (block order for 1xxx, "להלן מתוך" pattern,
expanded factual background, bridge planning analysis, flat heading structure).
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
Settings page extension to view and edit MCP server config (env vars,
tools, client registrations) — hybrid edit model: non-secrets editable
through Infisical, secrets read-only with drift detection vs container.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Stage C of the voyage-upgrades-plan shipped to production on
2026-05-03. The doc now leads with the final state and the two
empirical corrections vs the original plan:
1. Reciprocal Rank Fusion replaces weighted-sum hybrid merge.
voyage-3 cosines (~0.4-0.5) systematically outscale
voyage-multimodal-3 cosines (~0.20-0.25); a weighted sum lets
text dominate even when image is the better signal. RRF is
rank-based and robust to scale differences.
2. Chunker now propagates page_number end-to-end (extractor returns
per-page offsets, chunker tags each chunk by its first character's
page). A retrofit script backfills page_number on existing
document_chunks without re-OCR — uses the stored
documents.extracted_text plus PyMuPDF direct text reads as page
anchors (linear interpolation for OCR-only pages).
Production state on cases 8174-24 + 8137-24: 419 page-image
embeddings, 819 chunks tagged with page_number, MULTIMODAL_ENABLED=true
in Coolify env, hybrid search verified A/B against text-only baseline.
The original stage C plan section is retained below for reference.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Phase A — voyage-3 migration (executed):
- VOYAGE_MODEL=voyage-3 set in Coolify (legal-ai app) and ~/.env
- scripts/reembed_voyage.py: re-embeds document_chunks (6157),
case_law_embeddings (9), precedent_chunks (385), and halachot (400)
using the new model. paragraph_embeddings was empty. 6951 rows
re-embedded in 93s, ~75 rows/sec.
- Same 1024 dim → no schema change needed.
Why voyage-3 over voyage-law-2: benchmark on 3 Hebrew legal queries
with real passages from the corpus gave voyage-3 perfect ordering on
3/3 tests AND the largest separation (+0.483 vs voyage-law-2's
+0.238). voyage-4 family had bigger separation but missed top-1 on
the hardest test.
Phase B (voyage-context-3) and Phase C (voyage-multimodal-3.5 for
scanned + appraiser docs) are designed in docs/voyage-upgrades-plan.md
but deferred — to be picked up in a fresh conversation. The plan
includes:
- Phase B: contextualized embeddings refactor (~49% recall lift on
legal docs per Anthropic's research). Same dim, but ingestion
pipeline must pass full doc context per chunk.
- Phase C: page-level image embeddings via voyage-multimodal-3.5,
stored in a parallel *_image_embeddings table. Hybrid text+image
search. Targets appraiser report tables and scanned PDFs where
current OCR loses layout.
After this commit: MCP server needs a /mcp reconnect to pick up the
new VOYAGE_MODEL env, and the legal-ai container will pick it up on
its next redeploy.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Two latent issues surfaced today while watching the case 8174-24
end-to-end run, both worth documenting and engineering around because
they will recur on every future case.
Bug 1 — issue.released flips done→todo
After an agent successfully PATCHes its issue to "done", Paperclip's
internal issue.released action reverts the status to "todo" within
~30 seconds. This triggers a fresh wakeup of the same agent on a
task that is already complete.
Reproduced on CMPA-18 (30/04/26):
18:14:57 agent PATCH → status: done
18:15:35 Paperclip → issue.released → status: todo
18:15:54 new researcher run started
The fix at the right altitude (Paperclip itself) is outside our repo.
Mitigation in HEARTBEAT.md §3 — when an agent boots and finds the
issue in `todo` while expected outputs (file, DB rows) already exist,
it must short-circuit: post a "no change" comment, PATCH back to done,
and exit. Costs ~$0.20 per false wakeup but breaks the loop.
Bug 2 — Bash backtick trap on long comment bodies
Researcher agent built a curl pipeline like:
curl ... -d "$(python3 -c "body = '''...
📁 קובץ מחקר: `/path/to/file.md`
'''")"
The backticks around the file path (markdown convention) get
evaluated by the OUTER bash $(...) as command substitution. Bash
then tries to exec /path/to/file.md, which is not executable, and
prints "Permission denied" — a misleading error since the actual
file ownership is fine. The curl itself succeeded; only the bash
prelude noised up the log.
Fix in HEARTBEAT.md §4א: long bodies must go via Write→tempfile
then `curl -d @file`. Avoids every shell quoting edge case.
Files:
• docs/paperclip-quirks.md — new. Full writeup of both bugs plus
two prior known-quirks (CEO auto-block in_progress, INSERT vs
API for wakeups). Each section: what happens, empirical evidence
from logs, impact, workaround, status.
• .claude/agents/HEARTBEAT.md — added the self-recovery section to
§3 and the temp-file pattern to §4א. The temp-file pattern is the
canonical answer for any agent posting markdown comments —
applies to all 7 agents in this skill set.
• CLAUDE.md — referenced the new doc from the docs index.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Captures the full deletion procedure we worked out empirically while
wiping case 8174-24 for a clean rerun. Covers all four systems where
case state lives, in dependency order:
1. legal-ai DB + on-disk dir — DELETE /api/cases?remove_files=true
(now actually works after 903fb4d added the missing db.delete_case)
2. Paperclip DB — no API; raw SQL with explicit FK-blocker ordering
(issue_comments, cost_events, finance_events, feedback_votes,
issue_inbox_archives, issue_read_states must go before issues;
heartbeat_runs.wakeup_request_id must be NULLed before
agent_wakeup_requests can be deleted)
3. Gitea — DELETE /api/v1/repos/cases/{N}
4. Verification queries for each system
Two gotchas worth highlighting in the doc:
• The case directory inside /data/cases is owned by root because the
container runs as root — host-side rm needs sudo, or use the API
(rmtree happens inside the container).
• Paperclip projects are referenced via name LIKE '%{N}%' since
there's no slug column. Stricter matching is recommended if N
appears in multiple project names.
Linked from legal-ai/CLAUDE.md docs index. A future scripts/delete-case.sh
that automates the runbook with a confirmation prompt is noted as TODO
inside the runbook itself.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Comparison of our draft (טיוטה-v6, 2,126 words) against Dafna's final
decision (עריכה-v2, 2,299 words). 14 lessons (#20-#33) covering what
the draft got right and where she rebuilt the discussion.
Key findings:
- Lesson #20: Match doctrinal depth to legal uncertainty. In clean
acceptance the committee's OWN conditions provide the anchor — no
CREAC framework needed. The draft's 101-word "נבאר" doctrinal
paragraph was deleted entirely.
- Lesson #21: Plant analytical seeds in the background ("ודוק"
foreshadowing) for technical planning distinctions.
- Lesson #23: Concrete documentary evidence (specific permits in
buildings 5, 7, 11) beats generic statements.
- Lesson #25: Counter-factual reasoning — "approved by mistake" gives
the committee benefit of the doubt while strengthening reversal.
- Lesson #26: Engineer counter-factual — "had he known the shadow plan
was not feasible, his opposition would have been even stronger".
- Lesson #27: "אכן...אולם" / "לא נעלם מעינינו" patterns are for
rejection, NOT acceptance. Don't use prophylactically.
- Lesson #28: "ונפרט;" (ו prefix + semicolon), never "נפרט." with
period.
- Lesson #33: Full acceptance against permit applicant → no expenses
to either side.
New transition phrases catalogued: "דיון עקר", "אושרה מתוך טעות כי הרי
לא נוכל להניח כי אושרה למראית עין", "ועדת הערר אפשרה מרחב של זמן
בתקווה כי ההחלטה תתייתר", "להלן כדוגמא מתוך", "ברי כי הכוונה ל...".
Several of these lessons fed directly into daphna-acceptance-architecture.md
(template A) and daphna-decision-tree.md from the recent voice corpus
work; this file remains the case-study record.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Three new voice docs based on deep reading of 1033-25 (full-acceptance) and
7 representative cases for block-zayin (claims summary):
- daphna-acceptance-architecture.md: 5 distinct templates for case acceptance
(A: internal flaw + voiding; B: remand to committee; C: corrections in
request; D: substantive 8xxx; E: appraiser remand). Fixes the wrong
reference in architecture-by-outcome that treated full-acceptance as a
variation of partial-acceptance.
- daphna-block-zayin-claims.md: rules for claims summary block — order by
procedural role, neutrality, sub-headings per party, anti-patterns
(numbered lists, evaluation words, premature conclusion).
- daphna-decision-tree.md: operational tool that unifies all 5 voice docs
into a short analytical process. Starts with the decisive question:
"what is the winning evidence?". Decision trees for architecture
selection, opening mode, citation choice, length by weight.
Updates legal-writer.md to read decision-tree first, then the 5 voice docs,
plus block-zayin.md before block ז.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>