dba2a131e01533be33fba4533f8ace1264459057
The chair cannot review every pending halacha. Three independent-lineage judges (Opus via claude_session · DeepSeek · Gemini-2.5-flash, #1 on LegalBench) vote on the COARSE axis we proved reliable across models (92%): "is this a genuine, keepable rule?". Only an agreed verdict acts; every split escalates to the chair (INV-G10).

Buckets:
- clean → KEEP?
- nli_unsupported → entailment re-adjudication
- extraction-defects → re-extraction

halacha_panel_calibrate.py calibrates the voting policy on the gold-set's is_holding (the coarse label) per Trust-or-Escalate (ICLR 2025):
- unanimous → 94.9% precision / 78% coverage
- majority → 92.9% precision / 99% coverage
ZERO false-drops in both (the panel never rejects a good rule).

Chosen policy (chair-approved):
- clean → majority-2/3
- nli → asymmetric (majority-reject, unanimous-approve)
- defects → re-extraction
Reversible (--apply backs up review_status+flags first).

Sources: Panel-of-LLM-Evaluators (PoLL) · Trust-or-Escalate (ICLR 2025, arXiv:2407.18370) · selective-prediction / learning-to-defer.

Invariants: upholds G10 (human gate: splits escalate, panel only collapses the queue) and G9 (provenance: reviewer records the panel + policy).

Read paths only in calibrate; --apply writes review_status/quality_flags reversibly with backup.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
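
The chosen per-bucket policy described in the commit message can be sketched roughly as below. This is a minimal illustration, not the code in halacha_panel_calibrate.py: the names `Action`, `decide`, and `PANEL_SIZE` are hypothetical, and it assumes each judge returns True (keepable rule), False (not keepable), or None (no usable verdict). It also assumes that a majority "no" in the clean bucket means reject, and that anything short of the required agreement escalates to the chair.

```python
from enum import Enum
from typing import Optional, Sequence


class Action(Enum):
    KEEP = "keep"
    REJECT = "reject"
    ESCALATE = "escalate"      # no sufficient agreement: the chair decides (INV-G10)
    RE_EXTRACT = "re_extract"  # extraction defects are re-extracted, not voted on


PANEL_SIZE = 3  # three independent-lineage judges (assumed fixed here)


def decide(bucket: str, votes: Sequence[Optional[bool]]) -> Action:
    """Map one item's panel votes to an action under the chosen policy.

    votes: True = "genuine, keepable rule", False = not keepable,
    None = judge gave no usable verdict (an assumption of this sketch).
    """
    if bucket == "extraction-defects":
        return Action.RE_EXTRACT

    if len(votes) != PANEL_SIZE:
        raise ValueError(f"expected {PANEL_SIZE} votes, got {len(votes)}")

    approve = sum(1 for v in votes if v is True)
    reject = sum(1 for v in votes if v is False)

    if bucket == "clean":
        # Majority 2/3 in either direction acts; otherwise escalate.
        if approve >= 2:
            return Action.KEEP
        if reject >= 2:
            return Action.REJECT
        return Action.ESCALATE

    if bucket == "nli_unsupported":
        # Asymmetric: a majority is enough to reject,
        # but approval requires a unanimous panel.
        if reject >= 2:
            return Action.REJECT
        if approve == PANEL_SIZE:
            return Action.KEEP
        return Action.ESCALATE

    raise ValueError(f"unknown bucket: {bucket!r}")
```

For example, `decide("nli_unsupported", [True, True, False])` escalates under this sketch, while the same votes in the clean bucket would keep the rule. The asymmetry makes approval of an NLI-flagged rule harder than rejection.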
Description
AI Legal Decision Drafting System — MCP server, web upload, RAG search
Languages
Python 63.2%
TypeScript 34.3%
JavaScript 1.3%
Shell 0.8%
CSS 0.3%
Other 0.1%