Same case_number can exist as both a regular appeal (ערר) and an
extension-of-time request (בל"מ), and we were inferring the difference
from appeal_subtype prefixes — fragile, and case-number lookups
weren't disambiguated. Now stored as a first-class field on both
case_law (corpus) and cases (live cases), with partial unique indexes
on (case_number, proceeding_type).
- SCHEMA_V15: column + CHECK constraints + backfill from
appeal_subtype LIKE 'extension_request_%' + partial unique indexes
replace the old global UNIQUE(case_number).
- derive_proceeding_type() centralizes the inference rule
(extension_request_* → בל"מ; subject regex fallback; default ערר).
- Metadata extractor prompt asks Claude to populate the new field
explicitly; apply_to_record writes it for internal_committee rows.
- internal_decision_upload, case_create, case_update accept an
optional proceeding_type; FastAPI request models expose it.
- Wizard + edit dialog get a sided Select; case header renders the
resolved label (ערר / בל"מ).
- Uploaded the 2 staged בל"מ decisions on betterment levy:
8126/24 (סופר נוח, 13 chunks), 8047/23 (הרנון, 48 chunks).
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Four parallel sub-agents closed the remaining critical gaps from the
26/05 Stage A/B sprint. Each block independently tested; aggregated here.
## #30/#31 finalizers (sub-agent A)
* Auto-derive practice_area in case_create from case_number prefix
(1xxx→rishuy_uvniya, 8xxx→betterment_levy, 9xxx→compensation_197);
default for CaseCreateRequest is now "" (the DB constraint catches
any stray "appeals_committee").
* practice_area.py: derive_subtype now handles axis-B domain values
(rishuy_uvniya/betterment_levy/compensation_197) without parsing the
case number; new helper derive_domain_practice_area().
* Halacha re-extraction verified unnecessary — all 6 reclassified
records already had is_binding=false and approved halachot.
* Regression tests: 6 cases in tests/test_corpus_constraints.py
covering practice_area enum, internal-committee chair/district,
external-upload arar prefix, MCP guard.
* UI: district input → Select dropdown (7 districts) in
precedent-edit-sheet.tsx, preserving legacy free-text values.
## #37 בל"מ subtypes (sub-agent B)
* 3 new appeal_subtypes: extension_request_{building_permit,
betterment_levy,compensation}. APPEALS_COMMITTEE_SUBTYPES extended,
SUBTYPES_BY_AREA mappings added.
* New helpers: is_blam_subject(), is_blam_subtype(),
derive_subtype_with_blam(case_number, subject, practice_area).
case_create now uses it to auto-detect "בקשה להארכת מועד" subjects.
* 3 methodology templates under docs/methodology/extension-request-*.md.
* paperclip_client.py mapping updated for the 3 new subtypes
(extension_request_building_permit→CMP, the other two→CMPA).
* Frontend: bilingual "בל"מ" badge + filter dropdown on cases list +
detail header; appeal-type-bars collapseBlam() merges בל"מ into its
parent domain for aggregate bars.
* Wizard auto-detects בל"מ from subject during case creation.
* 3 Berlinger cases (1017/1018/1019-03-26) migrated to
appeal_subtype=extension_request_building_permit via psql.
## #35 missing_precedents feature (sub-agent C)
* Schema V13: missing_precedents table (citation, case_id, party,
legal_topic, status, linked_case_law_id, claim_quote, ...) +
FK constraints + 3 indexes. Applied via psql + idempotent migration.
* 6 db.py service functions, 3 MCP tools, 6 FastAPI endpoints
(POST/GET/PATCH/DELETE/upload — upload routes by citation prefix
to ingest_internal_decision or ingest_precedent).
* Next.js page /missing-precedents with 5 status tabs + filters +
sidebar badge counter + detail drawer with metadata edit + smart
upload form that switches fields per committee/court.
* Bootstrap: 7 rows imported from the JSON file
(3 citations × cases, all status=closed with linked_case_law_id).
* legal-researcher.md: new §2ב.5 with missing_precedent_create
usage + dedup semantics + tool grant.
## #36 legal_arguments aggregation (sub-agent D)
* Schema V14: legal_arguments + legal_argument_propositions M:M.
Applied via psql.
* New service argument_aggregator.py with two functions —
aggregate_claims_to_arguments() (Claude CLI / claude_session) and
get_legal_arguments(). Graceful llm_unavailable handling when CLI
is missing (containers).
* 2 MCP tools + 2 API endpoints (POST .../aggregate-arguments as
BackgroundTask, GET .../legal-arguments).
* Frontend: shadcn Accordion + new legal-arguments-panel.tsx with
hierarchical (party → priority badge → arguments) display, "טיעונים"
tab on the case page, "חשב/חשב מחדש" buttons.
* scripts/backfill_legal_arguments.py + SCRIPTS.md entry — dry-run
found 8 candidate cases including 1017/1018/1019.
## Open follow-ups (intentionally deferred)
* npm run api:types in web-ui (CLAUDE.md flow) — recommended before
the next UI commit; not required for backend deployment.
* Run backfill_legal_arguments.py --apply once the container picks up
the new aggregator service.
* webhook on missing-precedents upload-close to Paperclip (optional).
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
A/B test (2026-05-05) showed DeepSeek V4-Pro is 2-3x faster and ~20x cheaper
than Sonnet for style/lexicon pattern analysis, with comparable quality.
Adds adapters/deepseek-paperclip-adapter/ package, documents adapter requirements
(env injection, run-id headers), updates CLAUDE.md with adapter integration notes,
and records lessons from ערר 1200-25 (block order for 1xxx, "להלן מתוך" pattern,
expanded factual background, bridge planning analysis, flat heading structure).
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
paperclip-dev is for maintaining the Paperclip codebase itself — not
relevant to legal work. Removed from all 14 agents (was on CMPA mirror).
paperclip-converting-plans-to-tasks helps decompose a plan into assigned
issues. Useful for the planning-heavy agents (CEO, analyst). Now scoped
to those two — removed from the other 5 in CMPA where it had crept in.
Net effect: zero drift on paperclipai/* skills across all 7 master+mirror
pairs. Verified via the new Agents tab dashboard.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Brings the legal-ai ↔ Paperclip integration in line with the official
Paperclip skill. Net effect: HEARTBEAT.md -47% (370→195 lines), all 14
agents on uniform runtime_config + budget + instructionsBundleMode, and
two cross-company helpers replacing manual SQL.
Highlights:
- HEARTBEAT.md refactor: project-specific only, delegates to the official
paperclipai/paperclip skill (loaded per agent). Adds heartbeat-context
fast-path (§1.7) and PAPERCLIP_WAKE_PAYLOAD_JSON shortcut (§1.5).
- Issue Thread Interactions API: legal-ceo.md now uses
ask_user_questions / request_confirmation / suggest_tasks instead of
free-text comments — gives chair structured UI with idempotency keys.
- pc.sh + paperclip_api.pc_request: every API call goes through helpers
that inject Authorization + X-Paperclip-Run-Id (audit trail).
- sync_agents_across_companies.py: master(CMP)→mirror(CMPA) sync via
Paperclip API, idempotent, with --verify and --apply modes.
- skills/new-company-setup: 11-step blueprint distilling all 11 gaps
into a single onboarding runbook for the next company.
- .taskmaster: 12 tasks covering each gap (one already closed: #29).
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
The legacy chunker did not track which PDF page each chunk came from.
Stored chunks had page_number=NULL, which blocked the multimodal
hybrid retriever's text+image boost — it joins (chunk, image) on
(document_id, page_number) and the join could never fire.
This change:
- extractor.extract_text now returns (text, page_count, page_offsets);
page_offsets[i] is the start char offset of page (i+1) in the joined
text. None for non-PDFs.
- chunker.chunk_document accepts an optional page_offsets and tags
each chunk with the page that contains its first character (uses
the existing chunker logic; pages assigned post-hoc by content
search to keep the diff minimal).
- processor.process_document and precedent_library.ingest_precedent
forward page_offsets through the chunker. New uploads now carry
accurate page_number on every chunk.
- Other extract_text callers (tools/documents, tools/workflow,
web/app.py) updated to unpack the third element (ignored).
- scripts/backfill_chunk_pages.py: per-case retrofit. Re-extracts each
PDF (re-OCRs via Google Vision if needed, ~$0.0015/page), computes
page_offsets, and updates page_number on every chunk by content
search. Idempotent; --force re-runs on already-tagged docs.
Forward-only would leave the 419 image embeddings backfilled on
cases 8174-24 + 8137-24 unable to boost their corresponding text
chunks. The retrofit script closes that gap (cost ~$0.60).
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Stage C: per-page image embeddings via voyage-multimodal-3 + hybrid
text+image search. Off by default; enable with MULTIMODAL_ENABLED=true.
- Schema V9: document_image_embeddings + precedent_image_embeddings
(vector(1024), page_number, image_thumbnail_path)
- extractor.render_pages_for_multimodal renders PDF pages at
MULTIMODAL_DPI (144) for embedding + JPEG thumbnails at
MULTIMODAL_THUMB_DPI (96) for UI preview, in one pass
- embeddings.embed_images calls voyage-multimodal-3 in 50-page batches
- services/hybrid_search.py orchestrator: rerank applied to text side
first (rerank-2 is text-only); image side cosine; weighted merge
with text_weight 0.65 (env-tunable); image-only pages surface as
match_type='image' so dense scanned content still appears
- processor.process_document and precedent_library.ingest_precedent
gated by flag — non-fatal on multimodal failure
- scripts/multimodal_backfill.py — idempotent per-case CLI to embed
existing documents without re-extracting text
Validated locally on a 5-page response brief: render 0.31s, embed 8.32s,
hybrid merge surfaces image rows correctly. Production rollout starts
with flag=false (no behavior change), then per-case A/B.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
The claude_session bridge had two structural defects that made any
non-trivial document extraction unreliable:
1. subprocess.run() blocks the asyncio event loop in the MCP server
for the full duration of every LLM call (60-180s typical).
2. The 120-second timeout was below the cold-cache cost of any
document over ~12K Hebrew characters. Three back-to-back timeouts
on case 8174-24 dropped 43 appellant claims on the floor.
Phase 1 of the remediation plan — keeps claude_session as the engine
(no Anthropic API switch) and restructures around it:
claude_session.py
• query / query_json are now async — asyncio.create_subprocess_exec
instead of subprocess.run, so MCP server can serve other coroutines
while a call is in flight.
• DEFAULT_TIMEOUT 120 → 1800 (30 min). High enough that no realistic
document hits it; bounded so a runaway never zombifies forever.
• LONG_TIMEOUT 300 → 3600 for opus block writing on full case context.
• TimeoutError now actually kills the subprocess (asyncio.wait_for
cancellation alone leaves the child running).
claims_extractor.py
• _split_by_sections: chunks at numbered sections / Hebrew letter
headings / "פרק" markers / markdown ##, falls back to paragraph
breaks, then to hard splits. Targets 12K chars per chunk — small
enough that each chunk reliably finishes inside the timeout.
• _extract_chunk: per-chunk retry (1 attempt by default) with
structured logging on failure. Failed chunks no longer crash the
overall extraction; they're skipped with a partial-result warning.
• extract_claims_with_ai now runs chunks in parallel via
asyncio.gather bounded by a semaphore (CHUNK_CONCURRENCY=3).
For a 25K-char appeal: was sequential 150-300s, now ~70-90s.
Updated all 9 callers (claims, appraiser facts, block writer, qa
validator, brainstorm, learning loop, style analyzer × 3) to await
the now-async API.
The one-shot scripts/extract_claims_8174.py used to recover 43
appellant claims on case 8174-24 has been moved to .archive/ — phase 1
makes it obsolete. SCRIPTS.md updated.
Phase 2 (background-task wrapper around LLM-bound MCP tools, persistent
llm_tasks table, SSE progress) is the structural follow-up — separate PR.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Fixes critical bug in 1033-25: user-uploaded עריכה-*.docx files were
orphaned on disk while exports kept rebuilding from stale DB blocks.
New architecture:
- User-uploaded DOCX becomes the source of truth (cases.active_draft_path)
- System edits via XML surgery with real Word <w:ins>/<w:del> revisions
- User can Accept/Reject each change from within Word
Components:
- docx_reviser.py: XML surgery for Track Changes (15 tests)
- docx_retrofit.py: retroactive bookmark injection with Hebrew marker
detection + heading heuristic (9 tests)
- docx_exporter.py: emits bookmarks around each of the 12 blocks
- 3 new MCP tools: apply_user_edit, list_bookmarks, revise_draft
- 4 new/updated endpoints: upload (auto-registers active draft),
/exports/revise, /exports/bookmarks, /exports/{filename}/retrofit,
/active-draft
- DB migration: cases.active_draft_path column
- UI: correct banner using real v-numbers, "מקור האמת" badge,
detailed upload toast with bookmarks_added/missing_blocks
- agents: legal-exporter (3 export modes), legal-ceo (stage G for
revision handling), legal-writer (revision mode)
Multi-tenancy:
- Works for both CMP (1xxx cases) and CMPA (8xxx/9xxx cases)
- New revise-draft skill added to both companies
- deploy-track-changes.sh syncs skills CMP ↔ CMPA
- retrofit_case.py: one-off retrofit of existing files
Tests: 34 passing (15 reviser + 9 retrofit + 4 exporter bookmarks + 6 e2e)
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Active scripts (5): auto-sync-cases.sh, backup-db.sh, restore-db.sh,
notify.py, bidi_table.py
Archived (17): one-time migration/seeding scripts whose functionality
is now in MCP server or web API. Moved to scripts/.archive/
Deleted (5): zero-value scripts (duplicates, hardcoded single-case,
debug scripts)
Added scripts/SCRIPTS.md — registry of all scripts with purpose,
status, and what superseded them. CLAUDE.md updated with rule:
any script change requires SCRIPTS.md update.
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>