feat(style-acq T1-T3): Dafna exemplar corpus for the writer (style_exemplars)
Fills the exemplar channel (B) of the style-acquisition system: at writing time the writer retrieves real block-level paragraphs by Dafna, targeted by section + outcome + practice_area.

T1 (infrastructure + backfill):
- SCHEMA_V27: style_exemplars table (purpose-built; no fake cases in the decision_paragraphs chain).
  decision_number/source/section/outcome/practice_area + embedding.
- db: insert/delete/search_style_exemplars + count_style_exemplars.
- scripts/backfill_style_exemplars.py: splits Dafna's corpus (style_corpus + internal_committee) into sections → paragraphs, embeds them, and stores them. Idempotent, dry-run/apply.

T2 (targeted retrieval):
- search_style_exemplars(section, outcome, practice_area): section = hard filter, outcome/practice_area = soft.
  block_writer._build_precedents_context maps block → section and retrieves (primary), alongside the legacy path (supplementary).

T3 (contrastive/adapt):
- Exemplars are tagged "structure/voice only: adapt, do not copy content"; full paragraph (up to 1100 characters).

INV-LRN5 (purity: style only). G11. backfill --apply is run separately.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
@@ -725,19 +725,43 @@ async def _build_precedents_context(
     style_parts: list[str] = []
     caselaw_parts: list[str] = []
     case_law_ids: list[str] = []
+    # block → golden-ratio section, for targeted exemplar retrieval (T2)
+    _BLOCK_SECTION = {
+        "block-vav": "background", "block-zayin": "claims",
+        "block-yod": "discussion", "block-yod-alef": "summary",
+    }
     try:
         case = await db.get_case(case_id)
         case_number = case.get("case_number", "") if case else ""
         subject = case.get("subject", "") if case else ""
+        practice_area = case.get("practice_area", "") if case else ""
+        decision = await db.get_decision_by_case(case_id)
+        outcome = (decision or {}).get("outcome", "")
         query = f"דיון משפטי בנושא {subject}" if subject else "דיון משפטי ועדת ערר"
         query_emb = await embeddings.embed_query(query)
+        section = _BLOCK_SECTION.get(block_id)

-        # Stream 1: paragraph_embeddings — Dafna's own prose (STYLE exemplars, not content)
+        # Stream 1a (PRIMARY): Dafna's own block-level prose from her corpus
+        # (style_exemplars) — matched by section + outcome + practice_area (T2/T3).
+        if section:
+            exemplars = await db.search_style_exemplars(
+                query_embedding=query_emb, section=section,
+                outcome=outcome or None, practice_area=practice_area or None, limit=6,
+            )
+            exemplars = [e for e in exemplars if e.get("decision_number", "") != case_number]
+            for e in exemplars[:4]:
+                style_parts.append(
+                    f"[דוגמת-סגנון (מבנה/קול בלבד — התאם, אל תעתיק תוכן) — "
+                    f"{e.get('decision_number', '?')}, {section}, "
+                    f"outcome={e.get('outcome') or '—'}]\n{e['paragraph_text'][:1100]}"
+                )
+
+        # Stream 1b: paragraphs from pipeline cases (legacy path; may be empty)
         para_results = await db.search_similar_paragraphs(
             query_embedding=query_emb, limit=10, block_type="block-yod",
         )
         para_results = [r for r in para_results if r.get("case_number", "") != case_number]
-        for r in para_results[:4]:
+        for r in para_results[:2]:
             style_parts.append(
                 f"[דוגמת-סגנון — החלטת {r.get('case_number', '?')} "
                 f"{r.get('case_title', '')}, בלוק {r.get('block_type', '')}]\n{r['content'][:500]}"
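The hunk above calls db.search_style_exemplars, whose body is not part of this diff. The commit message describes section as a hard filter and outcome/practice_area as soft ones. A minimal in-memory sketch of that idea follows, assuming cosine similarity plus an additive bonus for soft matches. The names rank_exemplars, cosine, and soft_bonus and the sample rows are hypothetical; the real function presumably runs as a database query and may score differently.

```python
import math
from typing import Optional


def cosine(a: list[float], b: list[float]) -> float:
    """Cosine similarity of two equal-length vectors (0.0 if either is all zeros)."""
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(y * y for y in b))
    return dot / (na * nb) if na and nb else 0.0


def rank_exemplars(
    rows: list[dict],
    query_embedding: list[float],
    section: str,
    outcome: Optional[str] = None,
    practice_area: Optional[str] = None,
    limit: int = 6,
    soft_bonus: float = 0.1,  # hypothetical weight, not taken from the commit
) -> list[dict]:
    """Hard-filter on section, then rank by similarity plus soft-match bonuses."""
    scored = []
    for row in rows:
        if row["section"] != section:  # hard filter: other sections are never returned
            continue
        score = cosine(query_embedding, row["embedding"])
        # Soft filters: a match nudges the score up; a mismatch does not exclude the row.
        if outcome and row.get("outcome") == outcome:
            score += soft_bonus
        if practice_area and row.get("practice_area") == practice_area:
            score += soft_bonus
        scored.append((score, row))
    scored.sort(key=lambda pair: pair[0], reverse=True)
    return [row for _, row in scored[:limit]]


# Made-up rows for illustration only.
rows = [
    {"decision_number": "A", "section": "discussion", "outcome": "accepted",
     "practice_area": "planning", "embedding": [1.0, 0.0]},
    {"decision_number": "B", "section": "discussion", "outcome": "rejected",
     "practice_area": "planning", "embedding": [1.0, 0.0]},
    {"decision_number": "C", "section": "claims", "outcome": "accepted",
     "practice_area": "planning", "embedding": [1.0, 0.0]},
]
top = rank_exemplars(rows, [1.0, 0.0], section="discussion", outcome="accepted")
print([r["decision_number"] for r in top])  # → ['A', 'B']
```

Row C is dropped despite matching the outcome because its section differs, while row B stays in the results with a lower rank. This is the behavior the caller relies on when it passes outcome=None or practice_area=None for cases that lack those fields.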