Two bugs caused all 5 interim blocks to fail with "Claude CLI failed
(exit 1): unknown error":
1. source_context was embedded BOTH inside the prompt template (via
{source_context}) AND prepended again in write_block — doubling every
block's context size (232K chars × 2 = 465K chars).
2. _build_source_context loaded all 9 case documents for every block
regardless of relevance.
Fixes:
- Remove the duplicate source_context prepend in write_block; the
template already contains it via {source_context}
- Add per-block document filtering (_BLOCK_DOC_TYPES): block-he/zayin →
empty, block-chet → protocol only, block-tet → appraisals only
- Add 400K char guard before calling claude -p with a descriptive error
(vs opaque "exit 1: unknown error")
- Add prompt-size warning and size info in claude_session error messages
Result: block-he 0 chars, block-zayin 0 chars, block-vav ~172K,
block-chet ~45K, block-tet ~300K (all under 400K limit)
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
The claude_session bridge had two structural defects that made any
non-trivial document extraction unreliable:
1. subprocess.run() blocks the asyncio event loop in the MCP server
for the full duration of every LLM call (60-180s typical).
2. The 120-second timeout was below the cold-cache cost of any
document over ~12K Hebrew characters. Three back-to-back timeouts
on case 8174-24 dropped 43 appellant claims on the floor.
Phase 1 of the remediation plan — keeps claude_session as the engine
(no Anthropic API switch) and restructures around it:
claude_session.py
• query / query_json are now async — asyncio.create_subprocess_exec
instead of subprocess.run, so MCP server can serve other coroutines
while a call is in flight.
• DEFAULT_TIMEOUT 120 → 1800 (30 min). High enough that no realistic
document hits it; bounded so a runaway never zombifies forever.
• LONG_TIMEOUT 300 → 3600 for opus block writing on full case context.
• TimeoutError now actually kills the subprocess (asyncio.wait_for
cancellation alone leaves the child running).
claims_extractor.py
• _split_by_sections: chunks at numbered sections / Hebrew letter
headings / "פרק" markers / markdown ##, falls back to paragraph
breaks, then to hard splits. Targets 12K chars per chunk — small
enough that each chunk reliably finishes inside the timeout.
• _extract_chunk: per-chunk retry (1 attempt by default) with
structured logging on failure. Failed chunks no longer crash the
overall extraction; they're skipped with a partial-result warning.
• extract_claims_with_ai now runs chunks in parallel via
asyncio.gather bounded by a semaphore (CHUNK_CONCURRENCY=3).
For a 25K-char appeal: was sequential 150-300s, now ~70-90s.
Updated all 9 callers (claims, appraiser facts, block writer, qa
validator, brainstorm, learning loop, style analyzer × 3) to await
the now-async API.
The one-shot scripts/extract_claims_8174.py used to recover 43
appellant claims on case 8174-24 has been moved to .archive/ — phase 1
makes it obsolete. SCRIPTS.md updated.
Phase 2 (background-task wrapper around LLM-bound MCP tools, persistent
llm_tasks table, SSE progress) is the structural follow-up — separate PR.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Until now changing a document's doc_type required a manual SQL update.
Adds an inline editor on the document badge so the chair can retag
without leaving the case page, and threads an appraiser_side tag
(committee / appellant / deciding) through the appraisal pipeline so
betterment-levy cases — which usually have 2-3 appraisers — render
conflicts with the deciding appraiser's view marked as governing.
Backend
- New appraiser_facts.appraiser_side column (V5.1) populated from
documents.metadata.appraiser_side at extraction time.
- extract_appraiser_facts now returns status='sides_missing' with the
list of untagged appraisals instead of running with empty side
labels — chair must tag every appraisal first via the UI.
- Conflict detection orders entries committee → appellant → deciding so
the deciding appraiser appears last; block-tet's prompt instructs the
writer to phrase the deciding appraiser's view as the governing
factual finding ("ואולם, השמאי המכריע קבע...").
- New PATCH /api/cases/{n}/documents/{doc_id} (Pydantic model with
whitelist validation) and matching document_update MCP tool. Both
merge appraiser_side into metadata JSONB instead of touching the
schema.
UI
- New shared doc-types module exports the canonical 11 doc_type
options plus the 3 appraiser-side options; both upload-sheet and
the document badge now read from it instead of duplicating Hebrew
labels.
- New DocumentTypeEditor renders a Popover off the doc-type Badge
with two Selects. The save button stays disabled while doc_type is
appraisal but no side has been picked, mirroring the backend
enforcement so the user finds out before triggering extraction.
- usePatchDocument React-Query mutation invalidates the case detail
on success so the badge updates without a manual refresh.
Lets the chair generate a partial decision DOCX before the discussion-and-
ruling block is decided. Same template, skill and DOCX styling as the final
decision (David, RTL, bookmarks) — only the block selection and order differ:
רקע (ו) → תכניות+היתרים (ט) → טענות (ז) → הליכים (ח). The opening (ה),
ruling (י), summary (יא), and signatures (יב) are omitted.
- New appraiser_facts table + CRUD + conflict detection in db.py (V5 schema).
Conflict = same plan/permit identifier reported differently by 2+ appraisers.
- New appraiser_facts_extractor service: per-appraisal Claude extraction of
plans + permits with raw quotes and page numbers.
- block-tet prompt extended with a permits sub-section sourced from the
extracted facts, plus an explicit instruction to flag inter-appraiser
conflicts in neutral wording without resolving them (deferred to block-yod).
- block-chet prompt extended with a post-hearing materials context sourced
from documents.metadata.is_post_hearing.
- docx_exporter.export_decision now accepts mode='interim' which reorders
the blocks per the chair's mental model and writes
טיוטת-ביניים-v{N}.docx (versioned independently of regular drafts).
- 3 new MCP tools: extract_appraiser_facts, write_interim_draft,
export_interim_draft. write_interim_draft auto-runs extraction if the
appraiser_facts table is empty for the case.
- legal-analyst: opus 4.6 → opus 4.7
- legal-proofreader: opus 4.6 → opus 4.7
- legal-writer: sonnet 4.6 → opus 4.7 (complex block writing benefits from stronger model)
- block_writer MODEL_MAP: updated opus ID to 4.7
Opus 4.7 brings: high-res images (2576px), better file-based memory,
improved DOCX generation, and task budgets for agentic loops.
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
"בית ספר להחלטות" Phase 2 — the system now has formal analytical
methodology for building quasi-judicial decisions, separate from
Dafna's writing style (SKILL.md) and content checklists.
What was done:
- Downloaded 5 authoritative sources (~341K words): FJC Judicial
Writing Manual (1991+2020), Garner Legal Writing in Plain English,
Posner How Judges Think, Scalia/Garner Making Your Case
- Extracted principles from all sources into intermediate docs
- Synthesized into docs/decision-methodology.md (3,400 words,
12 sections, 10 guiding principles)
- Integrated methodology into block-yod prompt via {methodology_guidance}
- Restructured legal-writer agent workflow to follow analytical stages
- Made "answer all claims" flexible (bundle/skip via chair_directions)
- Added methodology compliance check (#7) to legal-qa agent
- Updated all knowledge files (CLAUDE.md, SKILL.md, lessons, corpus)
Three-layer architecture:
1. Methodology (decision-methodology.md) — universal, how to think
2. Content checklists (lessons.py) — specific per appeal subtype
3. Style (SKILL.md) — Dafna's personal writing patterns
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Addresses Dafna's observation that licensing decisions lack comprehensive
planning discussion. Systematic corpus analysis of all 24 training decisions
revealed the system learned writing style but not substantive content.
Changes:
- Corpus analysis of all 24 decisions (docs/corpus-analysis.md)
- 5 content checklists by appeal subtype injected into block-yod prompt
- chair_feedback DB table + API endpoints + MCP tools
- Feedback management page in Next.js UI (/feedback)
- Navigation updated with "הערות יו״ר" link
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Two issues that caused QA agent to fail:
1. save_block_content saved to DB only — now also rebuilds drafts/decision.md
2. legal-writer.md now has explicit mandatory step: case_update(status="drafted")
Without these, workflow_status reports has_draft=false and QA can't run.
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
New module claude_session.py provides query() and query_json() that
run prompts via `claude -p` CLI — uses the claude.ai session, zero API cost.
Converted 6 services:
- claims_extractor.py: extract_claims_with_ai
- brainstorm.py: brainstorm_directions
- block_writer.py: write_block (was streaming+thinking, now simple)
- qa_validator.py: claims_coverage check
- style_analyzer.py: 3 API calls (single pass, multi pass, synthesis)
- learning_loop.py: extract_lessons
Only extractor.py still uses Anthropic API (for PDF OCR with Vision).
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
New architecture: MCP provides context, Claude Code writes.
New functions:
- get_block_context(case_id, block_id) → returns full context package
(prompt, source docs, claims, direction, precedents, style guide)
WITHOUT calling Anthropic API
- save_block_content(case_id, block_id, content) → saves block to DB
New MCP tools: get_block_context, save_block_content
The old write_block (API-based) still works as fallback.
The new flow uses Claude Code's own model (Opus 4.6, 1M context)
which has no separate API billing.
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
_build_style_context rewritten from 10-line summary to comprehensive
style guide including:
- Tone rules per appeal type (warm for licensing, cold for levy)
- 15 mandatory expressions ("כידוע", "ברי כי", "אין בידנו לקבל")
- Discussion structure rules (continuous prose, conclusion first)
- Per-party phrasing templates (appellants, committee, permit applicants)
- DB patterns grouped by type (phrases, transitions, openings, closings)
This addresses the main quality gap: style rated 2/5 because the output
was "dry and overly formal" vs Dafna's "direct and clear" voice.
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
block-yod max_tokens reduced from 32K to 16K — the API returned
"max_tokens: 32768 > 32000" error. With thinking enabled, the actual
limit for output is lower. 16K is sufficient for discussion blocks.
Also: extractor.py now supports .md files (was missing, blocked
Beit HaKerem upload).
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
block_writer: Rewrote block-zayin prompt to require synthesis by topic
instead of listing each claim separately. Now produces 3 organized
sections (appellants 8, committee 6, permit applicants 3+) instead
of 40 scattered paragraphs. Target: 800-1500 words.
claims_extractor: Fix markdown code block stripping (same bug as
qa_validator had). Enables parsing claims from Claude responses
wrapped in ```json blocks.
Tested on Hecht: block-zayin from 40 paragraphs/1049 words to
17 organized paragraphs/1039 words. Structure now matches Dafna's
original (3 parties, grouped by topic).
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
block_writer: new renumber_all_blocks() function that renumbers all
paragraphs across all blocks sequentially (1, 2, 3...). Handles both
plain "N." and bold "**N.**" formats. Added missing 'import re'.
qa_validator: sequential_numbering check now matches bold-formatted
numbers (**N.**) in addition to plain (N.).
Tested on Hecht: renumbered 115 paragraphs across 7 blocks, QA 6/6.
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
block_writer: _build_precedents_context now searches both
paragraph_embeddings (other decisions by Dafna) and case_law_embeddings
(precedent case law). Previously only searched document_chunks which
had no cross-case data. Now returns ~2400 chars from 3 other decisions.
processor: Step 1.6 auto-updates case appellants/respondents from
classifier results when they're empty. Prevents blank party fields.
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
block_writer: _build_claims_context now filters out block-zayin claims
(from final decision) and uses only claims from original pleadings.
Reduces noise from 78 to 48 real claims for Hecht case.
qa_validator: claims_coverage check rewritten:
- Filter block-zayin claims (same reason)
- Keyword-based matching instead of 3-word phrase matching
- 25% keyword overlap threshold (was: any 3-word match)
- Allow up to 20% uncovered claims before failing
- Check both block-yod and block-zayin for coverage
Result: Hecht case QA goes from 4/6 to 6/6, 47/48 claims covered (98%).
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
- Add minimum word count guidance (2000-4000 words)
- Number each claim in claims_context for explicit tracking
- Require 3-5 case law citations minimum
- Fix max_tokens > budget_tokens for extended thinking
- Use streaming for opus+thinking requests (>10min timeout)
Tested on Hecht case: block-yod improved from 1039 to 1927 words.
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>