feat(style-acq T1-T3): Dafna's exemplar corpus for the writer (style_exemplars)

Fills the exemplar channel (B) of the style-acquisition system: at write time the writer retrieves real block-level paragraphs by Dafna, targeted by section+outcome+practice_area.

T1: infrastructure + backfill:
- SCHEMA_V27: style_exemplars table (purpose-built, so no fake cases in the decision_paragraphs chain). decision_number/source/section/outcome/practice_area+embedding.
- db: insert/delete/search_style_exemplars + count_style_exemplars.
- scripts/backfill_style_exemplars.py: splits Dafna's corpus (style_corpus + internal_committee) into sections→paragraphs, embeds, and saves. Idempotent, dry-run/apply.

T2: targeted retrieval:
- search_style_exemplars(section, outcome, practice_area): section=hard filter, outcome/practice_area=soft. block_writer._build_precedents_context maps block→section and retrieves from it (primary), alongside the old path (supplementary).

T3: contrastive/adapt:
- Exemplars are tagged "structure/voice only: adapt, do not copy content"; full paragraph (1100 chars).

INV-LRN5 (purity: style only). G11. Run backfill --apply separately.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
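
The "section=hard filter, outcome/practice_area=soft" rule in T2 can be pictured with a small in-memory sketch. This is an illustration only, not the committed db.search_style_exemplars (whose implementation is not shown in this diff). The `Exemplar` dataclass, the `rank_exemplars` helper, and the `SOFT_BONUS` weight are hypothetical names chosen for the example. It assumes "soft" means a small score bonus on top of embedding similarity, which may differ from what the real query does.

```python
from dataclasses import dataclass
from math import sqrt
from typing import Optional, Sequence


@dataclass
class Exemplar:
    """Hypothetical in-memory stand-in for one style_exemplars row."""
    decision_number: str
    section: str
    outcome: Optional[str]
    practice_area: Optional[str]
    embedding: Sequence[float]


def cosine(a: Sequence[float], b: Sequence[float]) -> float:
    """Cosine similarity; returns 0.0 if either vector has zero norm."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = sqrt(sum(x * x for x in a))
    norm_b = sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


# Assumed weight for each matching soft attribute (not taken from the commit).
SOFT_BONUS = 0.05


def rank_exemplars(
    rows: Sequence[Exemplar],
    query_embedding: Sequence[float],
    section: str,
    outcome: Optional[str] = None,
    practice_area: Optional[str] = None,
    limit: int = 6,
) -> list[Exemplar]:
    scored = []
    for row in rows:
        # Hard filter: a row from a different section is never returned.
        if row.section != section:
            continue
        score = cosine(query_embedding, row.embedding)
        # Soft preferences: a match raises the score, a mismatch does not exclude.
        if outcome and row.outcome == outcome:
            score += SOFT_BONUS
        if practice_area and row.practice_area == practice_area:
            score += SOFT_BONUS
        scored.append((score, row))
    scored.sort(key=lambda pair: pair[0], reverse=True)
    return [row for _, row in scored[:limit]]
```

With this shape, a caller that passes outcome=None (as block_writer does when no decision outcome is known) still gets same-section results, ranked by similarity alone.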
2026-06-06 18:10:01 +00:00
parent a3451775fa
commit 2e20e27e17
4 changed files with 261 additions and 3 deletions
--- a/mcp-server/src/legal_mcp/services/block_writer.py
+++ b/mcp-server/src/legal_mcp/services/block_writer.py
@@ -725,19 +725,43 @@ async def _build_precedents_context(
    style_parts: list[str] = []
    caselaw_parts: list[str] = []
    case_law_ids: list[str] = []
+    # block → golden-ratio section, for targeted exemplar retrieval (T2)
+    _BLOCK_SECTION = {
+        "block-vav": "background", "block-zayin": "claims",
+        "block-yod": "discussion", "block-yod-alef": "summary",
+    }
    try:
        case = await db.get_case(case_id)
        case_number = case.get("case_number", "") if case else ""
        subject = case.get("subject", "") if case else ""
+        practice_area = case.get("practice_area", "") if case else ""
+        decision = await db.get_decision_by_case(case_id)
+        outcome = (decision or {}).get("outcome", "")
        query = f"דיון משפטי בנושא {subject}" if subject else "דיון משפטי ועדת ערר"
        query_emb = await embeddings.embed_query(query)
+        section = _BLOCK_SECTION.get(block_id)

-        # Stream 1: paragraph_embeddings — Dafna's own prose (STYLE exemplars, not content)
+        # Stream 1a (PRIMARY): Dafna's own block-level prose from her corpus
+        # (style_exemplars) — matched by section + outcome + practice_area (T2/T3).
+        if section:
+            exemplars = await db.search_style_exemplars(
+                query_embedding=query_emb, section=section,
+                outcome=outcome or None, practice_area=practice_area or None, limit=6,
+            )
+            exemplars = [e for e in exemplars if e.get("decision_number", "") != case_number]
+            for e in exemplars[:4]:
+                style_parts.append(
+                    f"[דוגמת-סגנון (מבנה/קול בלבד — התאם, אל תעתיק תוכן) — "
+                    f"{e.get('decision_number', '?')}, {section}, "
+                    f"outcome={e.get('outcome') or '—'}]\n{e['paragraph_text'][:1100]}"
+                )
+
+        # Stream 1b: paragraphs from pipeline cases (legacy path; may be empty)
        para_results = await db.search_similar_paragraphs(
            query_embedding=query_emb, limit=10, block_type="block-yod",
        )
        para_results = [r for r in para_results if r.get("case_number", "") != case_number]
-        for r in para_results[:4]:
+        for r in para_results[:2]:
            style_parts.append(
                f"[דוגמת-סגנון — החלטת {r.get('case_number', '?')} "
                f"{r.get('case_title', '')}, בלוק {r.get('block_type', '')}]\n{r['content'][:500]}"