f8c3fd6c89388b2ed3668853d6e8db3680754a4f
strip_nevo_preamble's _DECISION_START only matched ועדת-ערר openings (בפנינו / הערר שבנדון / ...), so Nevo COURT judgments — exactly the ones carrying a מיני-רציו — slipped through unstripped. The editorial mini-ratio then leaked into the chunked body, risking that the halacha extractor reads Nevo's answer key (contamination) and polluting the corpus. Proven on בג"ץ 1764/05: its full_text still contained the מיני-רציו (unstripped). Fix: - Extend _DECISION_START with court-ruling openings: פסק-דין/פסק דין header and the authoring-judge line (השופט/ת, כב' השופט, הנשיא, המשנה לנשיא). re.search picks the earliest line-start match → the real opinion start, not the prose ratio above it. - Widen the Nevo-marker detection window 400→1500 chars so a long court/parties header doesn't push חקיקה שאוזכרה:/מיני-רציו: out of range. Verified on the real 1764/05 full_text: strips 2702 chars, body now starts at 'השופט ס' ג'ובראן:', מיני-רציו gone. Regression: ועדת-ערר openings still strip; non-Nevo text untouched; markers-past-400 now detected. Suite 182 passed (6 new). This is the anti-contamination prerequisite for the Nevo-ratio gold-set (#86.3/#81.7). Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
Description
AI Legal Decision Drafting System — MCP server, web upload, RAG search
Languages
Python
63.2%
TypeScript
34.3%
JavaScript
1.3%
Shell
0.8%
CSS
0.3%
Other
0.1%