Auto-strip Nevo preambles and separate style analysis per appeal subtype

- Add strip_nevo_preamble() to extractor.py: automatically removes Nevo database headers (bibliography, legislation, mini-ratio) during training upload
- Add an appeal_subtype column to the style_patterns table: patterns are now stored per subtype instead of being mixed globally
- Update clear_style_patterns() to support subtype-scoped deletion
- Pass appeal_subtype through the analyze_corpus → store → upsert pipeline

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
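The commit message describes the storage changes but this excerpt does not include the schema or storage code. A minimal sketch of how per-subtype storage could work, assuming SQLite (through Python's sqlite3 module) as a stand-in for whatever database the project actually uses. Only the names style_patterns, appeal_subtype, clear_style_patterns and analyze_corpus, and the analyze_corpus → store → upsert order, come from the commit; the columns pattern_type, pattern_value and frequency, the helpers store_patterns and upsert_pattern, and the toy analysis logic are hypothetical.

```python
import sqlite3
from typing import Iterable, Optional, Tuple

# HYPOTHETICAL schema: only the table name and the appeal_subtype column come
# from the commit; the remaining columns are illustrative.
SCHEMA = """
CREATE TABLE IF NOT EXISTS style_patterns (
    appeal_subtype TEXT NOT NULL DEFAULT '',
    pattern_type   TEXT NOT NULL,
    pattern_value  TEXT NOT NULL,
    frequency      INTEGER NOT NULL DEFAULT 1,
    PRIMARY KEY (appeal_subtype, pattern_type, pattern_value)
)
"""


def upsert_pattern(conn: sqlite3.Connection, appeal_subtype: str,
                   pattern_type: str, pattern_value: str) -> None:
    """Hypothetical 'upsert' step: insert a pattern under one subtype,
    or bump its frequency if that subtype already has it."""
    conn.execute(
        """
        INSERT INTO style_patterns (appeal_subtype, pattern_type, pattern_value)
        VALUES (?, ?, ?)
        ON CONFLICT (appeal_subtype, pattern_type, pattern_value)
        DO UPDATE SET frequency = frequency + 1
        """,
        (appeal_subtype, pattern_type, pattern_value),
    )


def store_patterns(conn: sqlite3.Connection, appeal_subtype: str,
                   patterns: Iterable[Tuple[str, str]]) -> None:
    """Hypothetical 'store' step: forwards the subtype to every upsert."""
    for pattern_type, pattern_value in patterns:
        upsert_pattern(conn, appeal_subtype, pattern_type, pattern_value)
    conn.commit()


def analyze_corpus(conn: sqlite3.Connection, texts: Iterable[str],
                   appeal_subtype: str) -> None:
    """Toy stand-in for the real analysis: records each text's first
    non-empty line as an 'opening' pattern for the given subtype."""
    patterns = []
    for text in texts:
        first = next((ln.strip() for ln in text.splitlines() if ln.strip()), None)
        if first is not None:
            patterns.append(("opening", first))
    store_patterns(conn, appeal_subtype, patterns)


def clear_style_patterns(conn: sqlite3.Connection,
                         appeal_subtype: Optional[str] = None) -> None:
    """Delete one subtype's patterns, or every pattern when none is given."""
    if appeal_subtype is None:
        conn.execute("DELETE FROM style_patterns")
    else:
        conn.execute("DELETE FROM style_patterns WHERE appeal_subtype = ?",
                     (appeal_subtype,))
    conn.commit()
```

In this sketch appeal_subtype is declared NOT NULL with an empty-string default because SQLite treats NULLs as distinct in primary-key and unique constraints, so rows with a NULL subtype would never collide and the upsert would insert duplicates instead of incrementing frequency.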
@@ -152,8 +152,9 @@ async def document_upload_training(
     if source.resolve() != dest.resolve():
         shutil.copy2(str(source), str(dest))
 
-    # Extract text
+    # Extract text and strip Nevo preamble
     text, page_count = await extractor.extract_text(str(dest))
+    text = extractor.strip_nevo_preamble(text)
 
     # Parse date
     d_date = None
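The hunk shows only the call site; the body of strip_nevo_preamble() in extractor.py is not part of this excerpt. A minimal sketch of one way such a function could work, assuming the Nevo preamble is a run of labelled header sections followed by a line that opens the judgment body. The header labels and the body-start pattern below are illustrative guesses, not values from the project, and are exposed as parameters so real markers can be passed in.

```python
import re
from typing import Iterable

# ASSUMPTION: these labels and the body-start pattern are illustrative guesses
# at Nevo's header layout, not values taken from the project's extractor.py.
DEFAULT_HEADER_LABELS = ("ספרות:", "חקיקה שאוזכרה:", "מיני-רציו:")
DEFAULT_BODY_START = re.compile(r"^\s*(פסק[- ]דין|החלטה)\s*$")


def strip_nevo_preamble(
    text: str,
    header_labels: Iterable[str] = DEFAULT_HEADER_LABELS,
    body_start: "re.Pattern[str]" = DEFAULT_BODY_START,
) -> str:
    """Drop everything before the judgment body, but only if a Nevo header
    label was seen first; otherwise return the text unchanged."""
    labels = tuple(header_labels)
    lines = text.splitlines()
    saw_header = False
    for i, line in enumerate(lines):
        if any(label in line for label in labels):
            saw_header = True
        elif saw_header and body_start.match(line):
            # Keep the body-start line itself; joining with "\n" also
            # normalises line endings and drops a trailing newline.
            return "\n".join(lines[i:])
    # No recognisable preamble: leave non-Nevo documents untouched.
    return text
```

Requiring a header label before cutting is a deliberate guard in this sketch: a training document that merely contains the body-start phrase, without any Nevo header, passes through unchanged instead of being truncated.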