feat(style-acq T1-T3): Dafna's exemplar corpus for the writer (style_exemplars)

Fills the exemplar channel (B) of the style-acquisition system: at writing time
the writer retrieves real block paragraphs written by Dafna, targeted by
section+outcome+practice_area.

T1 - infrastructure + backfill:
- SCHEMA_V27: style_exemplars table (purpose-built, so no fake cases in the
  decision_paragraphs chain). decision_number/source/section/outcome/practice_area+embedding.
- db: insert/delete/search_style_exemplars + count_style_exemplars.
- scripts/backfill_style_exemplars.py: splits Dafna's corpus (style_corpus +
  internal_committee) into sections→paragraphs, embeds, saves. Idempotent, dry-run/apply.
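
The idempotent dry-run/apply behaviour described above could look roughly like the sketch below. It is not the code in scripts/backfill_style_exemplars.py: the in-memory `store`, the blank-line paragraph split, and the replace-per-decision strategy are assumptions made for illustration, and the embedding step is left out.

```python
def split_paragraphs(section_text: str) -> list[str]:
    """Split a section into paragraphs on blank lines (assumed rule)."""
    return [p.strip() for p in section_text.split("\n\n") if p.strip()]


def backfill(decisions: dict[str, dict[str, str]],
             store: dict[str, list[dict]],
             apply: bool = False) -> int:
    """Return the number of rows planned; write them only when apply=True.

    `decisions` maps decision_number -> {section: text}. `store` is a
    hypothetical stand-in for the style_exemplars table.
    """
    planned = 0
    for decision_number, sections in decisions.items():
        rows = [
            {"decision_number": decision_number, "section": section,
             "paragraph_text": paragraph}
            for section, text in sections.items()
            for paragraph in split_paragraphs(text)
        ]
        planned += len(rows)
        if apply:
            # Replacing all rows of a decision (delete, then insert) means
            # a second run leaves the store unchanged.
            store[decision_number] = rows
    return planned
```

A dry run (`apply=False`) only reports the planned row count; running with `apply=True` twice yields the same stored rows as running it once.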

T2 - targeted retrieval:
- search_style_exemplars(section, outcome, practice_area): section=hard filter,
  outcome/practice_area=soft. block_writer._build_precedents_context maps
  block→section and retrieves (primary), alongside the legacy path (supplementary).
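
One way to read "section=hard filter, outcome/practice_area=soft" is sketched below. This is an in-memory illustration, not the actual db.search_style_exemplars: the `Exemplar` type, the `rank_exemplars` name, and the additive `soft_bonus` are hypothetical, since the commit does not say how soft matches are weighted.

```python
from dataclasses import dataclass
from math import sqrt
from typing import Optional


@dataclass
class Exemplar:
    decision_number: str
    section: str
    outcome: Optional[str]
    practice_area: Optional[str]
    embedding: list


def cosine(a, b) -> float:
    """Cosine similarity; 0.0 if either vector has zero length."""
    dot = sum(x * y for x, y in zip(a, b))
    na = sqrt(sum(x * x for x in a))
    nb = sqrt(sum(y * y for y in b))
    return dot / (na * nb) if na and nb else 0.0


def rank_exemplars(rows, query_embedding, section,
                   outcome=None, practice_area=None,
                   limit=6, soft_bonus=0.1):
    # Hard filter: rows from other sections are never returned.
    candidates = [r for r in rows if r.section == section]

    def score(r):
        s = cosine(r.embedding, query_embedding)
        # Soft preferences: a match raises the score, a mismatch
        # does not exclude the row (assumed weighting).
        if outcome and r.outcome == outcome:
            s += soft_bonus
        if practice_area and r.practice_area == practice_area:
            s += soft_bonus
        return s

    return sorted(candidates, key=score, reverse=True)[:limit]
```

With this shape, a case that has no decision yet (empty outcome) still gets exemplars from the right section, only without the outcome preference.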

T3 - contrastive/adapt:
- Exemplars are tagged "structure/voice only: adapt, do not copy content"; full
  paragraph (up to 1100 characters).

INV-LRN5 (purity: style only). G11. backfill --apply is run separately.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
2026-06-06 18:10:01 +00:00
parent a3451775fa
commit 2e20e27e17
4 changed files with 261 additions and 3 deletions


@@ -725,19 +725,43 @@ async def _build_precedents_context(
     style_parts: list[str] = []
     caselaw_parts: list[str] = []
     case_law_ids: list[str] = []
+    # block → golden-ratio section, for targeted exemplar retrieval (T2)
+    _BLOCK_SECTION = {
+        "block-vav": "background", "block-zayin": "claims",
+        "block-yod": "discussion", "block-yod-alef": "summary",
+    }
     try:
         case = await db.get_case(case_id)
         case_number = case.get("case_number", "") if case else ""
         subject = case.get("subject", "") if case else ""
+        practice_area = case.get("practice_area", "") if case else ""
+        decision = await db.get_decision_by_case(case_id)
+        outcome = (decision or {}).get("outcome", "")
         query = f"דיון משפטי בנושא {subject}" if subject else "דיון משפטי ועדת ערר"
         query_emb = await embeddings.embed_query(query)
+        section = _BLOCK_SECTION.get(block_id)
-        # Stream 1: paragraph_embeddings — Dafna's own prose (STYLE exemplars, not content)
+        # Stream 1a (PRIMARY): Dafna's own block-level prose from her corpus
+        # (style_exemplars) — matched by section + outcome + practice_area (T2/T3).
+        if section:
+            exemplars = await db.search_style_exemplars(
+                query_embedding=query_emb, section=section,
+                outcome=outcome or None, practice_area=practice_area or None, limit=6,
+            )
+            exemplars = [e for e in exemplars if e.get("decision_number", "") != case_number]
+            for e in exemplars[:4]:
+                style_parts.append(
+                    f"[דוגמת-סגנון (מבנה/קול בלבד — התאם, אל תעתיק תוכן) — "
+                    f"{e.get('decision_number', '?')}, {section}, "
+                    f"outcome={e.get('outcome') or ''}]\n{e['paragraph_text'][:1100]}"
+                )
+        # Stream 1b: paragraphs from pipeline cases (legacy path; may be empty)
         para_results = await db.search_similar_paragraphs(
             query_embedding=query_emb, limit=10, block_type="block-yod",
         )
         para_results = [r for r in para_results if r.get("case_number", "") != case_number]
-        for r in para_results[:4]:
+        for r in para_results[:2]:
             style_parts.append(
                 f"[דוגמת-סגנון — החלטת {r.get('case_number', '?')} "
                 f"{r.get('case_title', '')}, בלוק {r.get('block_type', '')}]\n{r['content'][:500]}"