Commit Graph

7 Commits

Author SHA1 Message Date
bacb330a2a Replace all Anthropic API calls with Claude Code session (claude -p)
New module claude_session.py provides query() and query_json(), which
run prompts via the `claude -p` CLI. Calls go through the claude.ai
session, so there is no per-call API cost.
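
A minimal sketch of what such a module could look like, assuming the
`claude` CLI is on PATH and prints the model response to stdout in `-p`
mode. The `runner` parameter, the timeout, and the fence stripping are
illustrative assumptions, not the actual contents of claude_session.py.

```python
import json
import subprocess
from typing import Any, Callable

FENCE = "`" * 3  # markdown code-fence marker


def _run_claude(prompt: str, timeout: int = 600) -> str:
    """Run one prompt through `claude -p` and return its stdout."""
    result = subprocess.run(
        ["claude", "-p", prompt],
        capture_output=True,
        text=True,
        timeout=timeout,  # assumed limit, not taken from the commit
        check=True,       # raise CalledProcessError on a non-zero exit
    )
    return result.stdout.strip()


def _strip_fence(text: str) -> str:
    """Remove a surrounding markdown code fence, if present."""
    text = text.strip()
    if text.startswith(FENCE):
        # Drop the opening fence line (it may carry a language tag).
        text = text.split("\n", 1)[1] if "\n" in text else ""
        # Drop the closing fence and anything after it.
        text = text.rsplit(FENCE, 1)[0]
    return text.strip()


def query(prompt: str, runner: Callable[[str], str] = _run_claude) -> str:
    """Return the raw text response for a prompt."""
    return runner(prompt)


def query_json(prompt: str, runner: Callable[[str], str] = _run_claude) -> Any:
    """Return the response parsed as JSON."""
    return json.loads(_strip_fence(query(prompt, runner)))
```

Passing a fake `runner` keeps the services testable without invoking
the CLI.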

Converted 6 services:
- claims_extractor.py: extract_claims_with_ai
- brainstorm.py: brainstorm_directions
- block_writer.py: write_block (was streaming+thinking, now simple)
- qa_validator.py: claims_coverage check
- style_analyzer.py: 3 API calls (single pass, multi pass, synthesis)
- learning_loop.py: extract_lessons

Only extractor.py still uses Anthropic API (for PDF OCR with Vision).

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
2026-04-04 14:14:08 +00:00
586f1db402 QA claims check: Haiku→Sonnet + filter appellant claims only
Three fixes for claims_coverage false negatives (55% → expected ~85%+):

1. Model upgrade: Haiku → Sonnet for semantic matching. Haiku missed
   obvious matches (e.g., a paragraph about "כריתת עצים", Hebrew for
   "tree cutting", not matching a claim about tree cutting). Sonnet
   understands context better.

2. Filter: only check appellant/respondent claims, not committee or
   permit_applicant claims. Committee claims are defensive positions
   ("the application complies with the plan") — they don't need to
   be "addressed" in the discussion section.

3. Send full discussion text (was truncated to 12K chars).
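
The party filter in item 2 can be sketched as follows. The role names
come from this message; the `party_role` key and the dict shape of a
claim are hypothetical.

```python
from typing import Iterable

# Roles whose claims must be addressed in the discussion section.
CHECKED_ROLES = frozenset({"appellant", "respondent"})


def claims_to_check(
    claims: Iterable[dict], roles: frozenset = CHECKED_ROLES
) -> list[dict]:
    """Keep only claims raised by the checked parties.

    `party_role` is an assumed field name for illustration.
    """
    return [claim for claim in claims if claim.get("party_role") in roles]
```

Committee and permit_applicant claims simply drop out before the
semantic check runs, so they cannot count as "missing".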

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
2026-04-04 07:37:23 +00:00
e24e24dac5 Maximize context and output per Anthropic best practices
Per official Anthropic documentation (April 2026):

Output tokens increased to match model capabilities:
- block-yod (discussion): 8K → 32K (Opus supports 128K)
- block-zayin (claims): 4K → 16K
- block-vav (background): 4K → 16K
- claims_extractor: 4K → 8K (fixes truncated JSON)
- qa_validator: 4K → 8K

Source documents sent in full (not truncated):
- Was: 3000 chars per doc, 15K total
- Now: full document text, no truncation
- Reduces hallucinations: "extract word-for-word quotes first"

Prompt structure follows long-context tips:
- Source documents placed FIRST (top of prompt)
- Instructions and query placed LAST
- "Queries at the end improve quality by up to 30%"
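
A rough sketch of the ordering described above: documents first,
instructions and query last. The function name, the XML-style document
tags, and the argument shapes are assumptions for illustration only.

```python
def build_prompt(
    documents: list[tuple[str, str]], instructions: str, query: str
) -> str:
    """Assemble a long-context prompt with source documents at the top."""
    parts = []
    for index, (name, text) in enumerate(documents, start=1):
        # Full document text, no truncation.
        parts.append(
            f'<document index="{index}" name="{name}">\n{text}\n</document>'
        )
    parts.append(instructions)  # instructions follow the sources
    parts.append(query)         # the query goes last
    return "\n\n".join(parts)
```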

Extended thinking uses adaptive mode for Opus 4.6.
Streaming enabled for all requests > 21K tokens.

Unified JSON parsing via parse_llm_json() helper in config.py.
Applied to: classifier, claims_extractor, brainstorm, qa_validator,
learning_loop (5 files).
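
One possible shape for such a helper, shown only as a sketch: it scans
for the first position where a JSON object or array decodes, which
tolerates markdown fences and leading prose. The real parse_llm_json()
in config.py may work differently.

```python
import json
from typing import Any


def parse_llm_json(text: str) -> Any:
    """Return the first JSON object or array embedded in a model response."""
    decoder = json.JSONDecoder()
    for pos, char in enumerate(text):
        if char in "{[":
            try:
                # raw_decode parses one value starting at pos and ignores
                # whatever follows it (closing fence, trailing prose).
                value, _end = decoder.raw_decode(text, pos)
                return value
            except json.JSONDecodeError:
                continue  # not valid JSON here, try the next candidate
    raise ValueError("no JSON object or array found in model response")
```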

Also: extractor.py now supports .md files.

Sources:
- https://docs.anthropic.com/en/docs/build-with-claude/extended-thinking
- https://docs.anthropic.com/en/docs/build-with-claude/prompt-engineering/long-context-tips
- https://docs.anthropic.com/en/docs/minimizing-hallucinations
- https://docs.anthropic.com/en/docs/about-claude/models/overview

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
2026-04-03 14:17:43 +00:00
e438740ab4 Add renumber_all_blocks + fix sequential_numbering check for bold format
block_writer: new renumber_all_blocks() function that renumbers all
paragraphs across all blocks sequentially (1, 2, 3...). Handles both
plain "N." and bold "**N.**" formats. Added missing 'import re'.

qa_validator: sequential_numbering check now matches bold-formatted
numbers (**N.**) in addition to plain (N.).
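
A sketch of how both changes could share one pattern. Only the two
number formats and the renumber_all_blocks name come from this
message; the regex, the list-of-strings block representation, and
sequential_numbering_ok are assumptions.

```python
import re

# Paragraph number at the start of a line: bold "**12.**" or plain "12.",
# followed by whitespace so decimals such as "3.5" are not matched.
_NUM_RE = re.compile(r"^(?:\*\*(\d+)\.\*\*|(\d+)\.)(?=\s)", re.MULTILINE)


def renumber_all_blocks(blocks: list[str]) -> list[str]:
    """Renumber paragraphs 1, 2, 3... across all blocks, keeping bold."""
    counter = 0

    def _next_number(match: re.Match) -> str:
        nonlocal counter
        counter += 1
        bold = "**" if match.group(1) is not None else ""
        return f"{bold}{counter}.{bold}"

    return [_NUM_RE.sub(_next_number, block) for block in blocks]


def sequential_numbering_ok(blocks: list[str]) -> bool:
    """True if paragraph numbers run 1..N without gaps or repeats."""
    numbers = [
        int(match.group(1) or match.group(2))
        for block in blocks
        for match in _NUM_RE.finditer(block)
    ]
    return numbers == list(range(1, len(numbers) + 1))
```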

Tested on Hecht: renumbered 115 paragraphs across 7 blocks, QA 6/6.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
2026-04-03 12:30:31 +00:00
52beb6ebdc Replace keyword claims check with Claude-based semantic check
claims_coverage now uses Claude Haiku to check if each claim is
semantically addressed in the discussion, not just keyword-matched.

- Sends all claims + discussion to Claude in one API call
- Returns addressed/partial/missing for each claim
- Handles markdown code block wrapping in response
- max_tokens 4096 (was 2048) for 48+ claims
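
The three statuses can be tallied as in the sketch below. The
list-of-dicts response shape with a `status` key and the function name
are assumed for illustration; the commit does not specify the actual
JSON schema.

```python
from collections import Counter

VALID_STATUSES = ("addressed", "partial", "missing")


def summarize_verdicts(verdicts: list[dict]) -> dict:
    """Count verdicts per status and compute the addressed percentage."""
    counts = Counter()
    for verdict in verdicts:
        status = verdict.get("status")
        if status not in VALID_STATUSES:
            # Fail loudly on anything the model was not asked to return.
            raise ValueError(f"unexpected status: {status!r}")
        counts[status] += 1
    total = sum(counts.values())
    return {
        "counts": {status: counts[status] for status in VALID_STATUSES},
        "addressed_pct": round(100 * counts["addressed"] / total) if total else 0,
    }
```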

Result on Hecht: 45/48 addressed (94%), 1 partial, 3 missing.
The 3 missing are genuinely unaddressed (personal/procedural claims).
Previously keyword check showed 47/48 but missed semantic gaps.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
2026-04-03 11:38:31 +00:00
018b5936a1 Fix claims handling: filter block-zayin duplicates, improve QA matching
block_writer: _build_claims_context now filters out block-zayin claims
(from final decision) and uses only claims from original pleadings.
Reduces noise from 78 to 48 real claims for Hecht case.

qa_validator: claims_coverage check rewritten:
- Filter block-zayin claims (same reason)
- Keyword-based matching instead of 3-word phrase matching
- 25% keyword overlap threshold (was: any 3-word match)
- Allow up to 20% uncovered claims before failing
- Check both block-yod and block-zayin for coverage
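
The two thresholds above can be sketched like this. The tokenizer, the
minimum word length, and the function names are assumptions; only the
25% overlap and 20% uncovered limits come from this message.

```python
import re


def _keywords(text: str, min_len: int = 3) -> set[str]:
    """Lowercased words of at least min_len characters (Unicode-aware)."""
    return {w for w in re.findall(r"\w+", text.lower()) if len(w) >= min_len}


def claim_is_covered(
    claim: str, text_keywords: set[str], threshold: float = 0.25
) -> bool:
    """A claim counts as covered if enough of its keywords appear."""
    claim_keywords = _keywords(claim)
    if not claim_keywords:
        return True  # nothing to match against
    overlap = len(claim_keywords & text_keywords) / len(claim_keywords)
    return overlap >= threshold


def claims_coverage(
    claims: list[str],
    block_texts: list[str],
    threshold: float = 0.25,
    max_uncovered: float = 0.20,
) -> tuple[bool, list[str]]:
    """Return (passed, uncovered) over the given blocks' combined text."""
    text_keywords = _keywords(" ".join(block_texts))
    uncovered = [
        c for c in claims if not claim_is_covered(c, text_keywords, threshold)
    ]
    ratio = len(uncovered) / len(claims) if claims else 0.0
    return ratio <= max_uncovered, uncovered
```

Passing the text of both block-yod and block-zayin as `block_texts`
matches the last bullet above.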

Result: Hecht case QA goes from 4/6 to 6/6, 47/48 claims covered (98%).

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
2026-04-03 11:32:29 +00:00
d9e5ef0f46 Add full decision writing pipeline: classify, extract, brainstorm, write, QA, export
New services (11 files):
- classifier.py: auto doc-type classification + party identification (Claude Haiku)
- claims_extractor.py: claim extraction from pleadings (Claude Sonnet + regex)
- references_extractor.py: plan/case-law/legislation detection (regex)
- brainstorm.py: direction generation with 2-3 options (Claude Sonnet)
- block_writer.py: 12-block decision writer (template + Claude Sonnet/Opus)
- docx_exporter.py: DOCX export with David font, RTL, headings
- qa_validator.py: 6 QA checks with export blocking on critical failure
- learning_loop.py: draft vs final comparison + lesson extraction
- metrics.py: KPI dashboard, per case and global
- audit.py: action audit log
- cli.py: standalone CLI with 11 commands
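
For the export blocking mentioned under qa_validator.py, a minimal
sketch follows. The QACheck dataclass, its fields, and the choice of
which checks are critical are hypothetical; the commit only states
that a critical failure blocks export.

```python
from dataclasses import dataclass


@dataclass
class QACheck:
    name: str
    passed: bool
    critical: bool = False  # assumed flag; not taken from the commit


def export_allowed(results: list[QACheck]) -> bool:
    """Block export if any critical check failed."""
    return not any(r.critical and not r.passed for r in results)
```

A failed non-critical check would still be reported but would not stop
the DOCX export.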

Updated pipeline: extract → classify → chunk → embed → store → extract_references
New MCP tools: 29 total (was 16)
New DB tables: audit_log; CRUD for decisions and claims
Config: Infisical support, external service allowlist

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
2026-04-03 10:21:47 +00:00