Auto-strip Nevo preambles and separate style analysis per appeal subtype

- Add strip_nevo_preamble() to extractor.py: automatically removes Nevo database headers (bibliography, legislation, mini-ratio) during training upload
- Add an appeal_subtype column to the style_patterns table: patterns are now stored per subtype instead of being mixed globally
- Update clear_style_patterns() to support subtype-scoped deletion
- Pass appeal_subtype through the analyze_corpus → store → upsert pipeline

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
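The sketch below shows one way the preamble stripping described above could work. It assumes that a Nevo export labels its preamble sections with headers such as "ספרות:" (bibliography), "חקיקה שאוזכרה:" (cited legislation) and "מיני-רציו:" (mini-ratio), and that the ruling itself begins on a standalone "פסק-דין" or "החלטה" line. These marker strings and the name strip_nevo_preamble_sketch are hypothetical. The real strip_nevo_preamble() in extractor.py is not part of this diff and may work differently.

```python
from typing import Sequence

# ASSUMPTION: illustrative guesses at Nevo's preamble section headers
# (bibliography, cited legislation, mini-ratio), not taken from extractor.py.
PREAMBLE_HEADERS: tuple[str, ...] = ("ספרות:", "חקיקה שאוזכרה:", "מיני-רציו:")
# ASSUMPTION: a standalone line like one of these opens the ruling body.
BODY_MARKERS: tuple[str, ...] = ("פסק-דין", "פסק דין", "החלטה")


def strip_nevo_preamble_sketch(
    text: str,
    preamble_headers: Sequence[str] = PREAMBLE_HEADERS,
    body_markers: Sequence[str] = BODY_MARKERS,
) -> str:
    """Hypothetical helper: drop a Nevo-style preamble if one is detected.

    Returns the text starting at the first body-marker line that follows at
    least one preamble header. If no such line exists, the input is returned
    unchanged so that no part of the decision is lost.
    """
    lines = text.splitlines()
    seen_header = False
    for i, line in enumerate(lines):
        stripped = line.strip()
        if any(stripped.startswith(h) for h in preamble_headers):
            # Still inside the preamble (bibliography, legislation, mini-ratio...).
            seen_header = True
        elif seen_header and stripped.rstrip(":") in body_markers:
            # Keep the body marker line itself and everything after it.
            return "\n".join(lines[i:])
    # No preamble header, or no body marker after it: leave the text untouched.
    return text
```

Falling back to the unchanged text when no body marker follows the headers keeps the sketch conservative. A missed preamble only adds some noise to the style analysis, while a false match would silently delete part of the decision.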
2026-04-15 14:03:06 +00:00
parent ba39707c70
commit 5dd24729e2
4 changed files with 65 additions and 18 deletions
--- a/mcp-server/src/legal_mcp/tools/documents.py
+++ b/mcp-server/src/legal_mcp/tools/documents.py
@@ -152,8 +152,9 @@ async def document_upload_training(
    if source.resolve() != dest.resolve():
        shutil.copy2(str(source), str(dest))

-    # Extract text
+    # Extract text and strip Nevo preamble
    text, page_count = await extractor.extract_text(str(dest))
+    text = extractor.strip_nevo_preamble(text)

    # Parse date
    d_date = None