Maximize context and output per Anthropic best practices
Per official Anthropic documentation (April 2026):
Output tokens increased to match model capabilities:
- block-yod (discussion): 8K → 32K (Opus supports 128K)
- block-zayin (claims): 4K → 16K
- block-vav (background): 4K → 16K
- claims_extractor: 4K → 8K (fixes truncated JSON)
- qa_validator: 4K → 8K
Source documents sent in full (not truncated):
- Was: 3000 chars per doc, 15K total
- Now: full document text, no truncation
- Reduces hallucinations: "extract word-for-word quotes first"
Prompt structure follows long-context tips:
- Source documents placed FIRST (top of prompt)
- Instructions and query placed LAST
- "Queries at the end improve quality by up to 30%"
Extended thinking uses adaptive mode for Opus 4.6.
Streaming enabled for all requests > 21K tokens.
Unified JSON parsing via parse_llm_json() helper in config.py.
Applied to: classifier, claims_extractor, brainstorm, qa_validator,
learning_loop (5 files).
Also: extractor.py now supports .md files.
Sources:
- https://docs.anthropic.com/en/docs/build-with-claude/extended-thinking
- https://docs.anthropic.com/en/docs/build-with-claude/prompt-engineering/long-context-tips
- https://docs.anthropic.com/en/docs/minimizing-hallucinations
- https://docs.anthropic.com/en/docs/about-claude/models/overview
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>