From 770d23b19863e1f7655747e0fafb56beb4edd90f Mon Sep 17 00:00:00 2001
From: Chaim <chaim@marcus-law.co.il>
Date: Sat, 6 Jun 2026 16:02:18 +0000
Subject: [PATCH] feat(spec): define the style-acquisition system as the
 top-level goal + spec + tasks (PR1 foundations)
MIME-Version: 1.0
Content-Type: text/plain; charset=UTF-8
Content-Transfer-Encoding: 8bit

Explicitly defines the top-level goal that had never been stated: the agents
should write and analyze appeals exactly as Daphna Tamir does, through a
Style-Acquisition subsystem that is separate from the writing system.

- CLAUDE.md: section "Top-level goal: style acquisition": writing↔learning
  separation, Authorial Style Profiling (not fine-tuning), copy policy by
  content type
- docs/spec/07-learning.md §0: the subsystem, 3 feed channels, 7-stage
  pipeline, UI management, + INV-LRN4 (ground-truth contrast draft↔final)
  + INV-LRN5 (voice purity)
- TaskMaster: 15 tasks T0-T14 (89-103); MVP=T0+T4+T7

No runtime code change. 1130-25 has already been ingested into
internal_committee (parallel process).
INV: G9 (structured knowledge), G10 (chair gate), G11 (Daphna's style).

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
---
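Note (not part of the commit message): INV-LRN4 introduces a
`draft_final_pairs` ledger whose status moves through
`draft_done → final_received → analyzed → lessons_folded`. The sketch below
shows one way that lifecycle could be enforced in Python. It is illustrative
only: `PairStatus`, `DraftFinalPair`, `advance` and `has_been_compared` are
hypothetical names, not code in this repository. Only the column names and the
four status values come from this patch, and treating `analyzed` as the point
at which a decision counts as "compared against the final" is an assumption.

```python
from dataclasses import dataclass
from enum import Enum


class PairStatus(str, Enum):
    """The four lifecycle states named in INV-LRN4, in order."""
    DRAFT_DONE = "draft_done"
    FINAL_RECEIVED = "final_received"
    ANALYZED = "analyzed"
    LESSONS_FOLDED = "lessons_folded"


# Enum iteration follows definition order, so this is the lifecycle order.
_ORDER = list(PairStatus)


@dataclass
class DraftFinalPair:
    """Hypothetical in-memory view of one draft_final_pairs row."""
    case_id: str
    draft_text: str          # snapshot taken at mark-final time
    final_text: str = ""
    status: PairStatus = PairStatus.DRAFT_DONE

    def advance(self, new_status: PairStatus) -> None:
        """Move exactly one step forward; reject skips and regressions."""
        expected = _ORDER.index(self.status) + 1
        if _ORDER.index(new_status) != expected:
            raise ValueError(
                f"illegal transition {self.status.value} -> {new_status.value}"
            )
        self.status = new_status

    @property
    def has_been_compared(self) -> bool:
        """Assumption: a pair counts as compared once it reaches 'analyzed'."""
        return _ORDER.index(self.status) >= _ORDER.index(PairStatus.ANALYZED)
```

A caller would create the pair at mark-final with the draft snapshot, then call
`advance` as each pipeline stage completes; a skipped stage raises `ValueError`
instead of silently marking the decision as learned from.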
 .taskmaster/tasks/tasks.json | 167 ++++++++++++++++++++++++++++++++++-
 CLAUDE.md                    |  12 +++
 docs/spec/07-learning.md     |  35 ++++++++
 3 files changed, 213 insertions(+), 1 deletion(-)

diff --git a/.taskmaster/tasks/tasks.json b/.taskmaster/tasks/tasks.json
index 6214de0..39c41a2 100644
--- a/.taskmaster/tasks/tasks.json
+++ b/.taskmaster/tasks/tasks.json
@@ -2997,6 +2997,171 @@
         "dependencies": [],
         "priority": "high",
         "subtasks": []
+      },
+      {
+        "id": 89,
+        "title": "[Style-Acquisition T0] Inject the abstract profile into block_writer + copy policy",
+        "description": "The main lever. block_writer.py will load voice-fingerprint + author-features + Copy-Paste Templates into {style_context} for every block. Policy instruction by content type: formula/boilerplate → copying allowed; case-specific analysis → generalize and adapt; substance from another case → forbidden. Split {precedents_context} into {daphna_style_exemplars} (style) and {case_law_citations} (case law). Files: block_writer.py:205-260,710,795-815. MVP.",
+        "details": "",
+        "testStrategy": "",
+        "status": "pending",
+        "dependencies": [],
+        "priority": "high",
+        "subtasks": []
+      },
+      {
+        "id": 90,
+        "title": "[Style-Acquisition T1] Backfill decision_paragraphs+paragraph_embeddings from style_corpus",
+        "description": "Populate all 48 decisions with author='daphna' so that block retrieval (search_similar_paragraphs) returns real Daphna paragraphs. documents.py:186-215 + a one-off script. Depends on: none. MVP-enabler.",
+        "details": "",
+        "testStrategy": "",
+        "status": "pending",
+        "dependencies": [],
+        "priority": "high",
+        "subtasks": []
+      },
+      {
+        "id": 91,
+        "title": "[Style-Acquisition T2] Extend search_similar_paragraphs: filter by outcome+practice_area+block_type",
+        "description": "db.py:2243: add filtering, return the full paragraph, extend to blocks ז/ח (not only י). block_writer.py:710 passes outcome, 4→6 exemplars. Depends on: T1.",
+        "details": "",
+        "testStrategy": "",
+        "status": "pending",
+        "dependencies": [],
+        "priority": "high",
+        "subtasks": []
+      },
+      {
+        "id": 92,
+        "title": "[Style-Acquisition T3] Contrastive examples + 'voice template only' tagging",
+        "description": "Also return 'how Daphna differs', not only what is similar (author-features+contrastive, arxiv 2504.08745). Depends on: T2,T0.",
+        "details": "",
+        "testStrategy": "",
+        "status": "pending",
+        "dependencies": [],
+        "priority": "medium",
+        "subtasks": []
+      },
+      {
+        "id": 93,
+        "title": "[Style-Acquisition T4] Wire learning_loop to mark-final via the curator",
+        "description": "mark-final marks + wakes; the curator runs ingest_final_version (claude_session is not in the container). app.py:3217-3283, paperclip_client.py, hermes-curator.md. Depends on: none. MVP.",
+        "details": "",
+        "testStrategy": "",
+        "status": "pending",
+        "dependencies": [],
+        "priority": "high",
+        "subtasks": []
+      },
+      {
+        "id": 94,
+        "title": "[Style-Acquisition T5] Reconciliation ledger draft_final_pairs + snapshot at mark-final (INV-LRN4)",
+        "description": "Table draft_final_pairs(case_id,draft_text,final_text,diff_stats,status,created_at). Snapshot of the draft at the moment of mark-final (otherwise the diff is contaminated). This is the top-level rule's 'list of decisions' + the ground truth for T7. Depends on: T4. MVP.",
+        "details": "",
+        "testStrategy": "",
+        "status": "pending",
+        "dependencies": [],
+        "priority": "high",
+        "subtasks": []
+      },
+      {
+        "id": 95,
+        "title": "[Style-Acquisition T6] Reconciliation ledger in the UI + category in the approvals center",
+        "description": "List of all decisions + status (draft_done/final_received/analyzed/lessons_folded). Approvals center: category 'awaiting comparison against final'. Depends on: T5.",
+        "details": "",
+        "testStrategy": "",
+        "status": "pending",
+        "dependencies": [],
+        "priority": "medium",
+        "subtasks": []
+      },
+      {
+        "id": 96,
+        "title": "[Style-Acquisition T7] Style-distance metric (style_distance.py)",
+        "description": "golden_ratio_adherence + anti_pattern_hits + draft_to_final_diff (no LLM). lessons.py will receive ANTI_PATTERNS. Exposed via get_metrics/tool. Depends on: T5. MVP.",
+        "details": "",
+        "testStrategy": "",
+        "status": "pending",
+        "dependencies": [],
+        "priority": "high",
+        "subtasks": []
+      },
+      {
+        "id": 97,
+        "title": "[Style-Acquisition T8] Remove LIMIT 20 in style_analyzer (48/48 coverage)",
+        "description": "style_analyzer.py:124: LIMIT 20 → parameter/removal. Feeds T0's author-features. Depends on: none.",
+        "details": "",
+        "testStrategy": "",
+        "status": "pending",
+        "dependencies": [],
+        "priority": "medium",
+        "subtasks": []
+      },
+      {
+        "id": 98,
+        "title": "[Style-Acquisition T9] Numbering fix: drop the anti-pattern + automatic numbering on export",
+        "description": "Drop 'numbered lists are forbidden' (voice-fingerprint 3.1 is wrong; the decision is always numbered). DOCX export will apply Word's automatic numbering (skills/dafna-decision-template); the writer will stop injecting manual numbers. Check: whether the export already numbers automatically. Depends on: none.",
+        "details": "",
+        "testStrategy": "",
+        "status": "pending",
+        "dependencies": [],
+        "priority": "medium",
+        "subtasks": []
+      },
+      {
+        "id": 99,
+        "title": "[Style-Acquisition T10] Dynamic get_style_guide: golden ratios measured from the corpus",
+        "description": "drafting.py:68: golden ratios measured from the corpus alongside the constants, with the gap flagged. Depends on: T1,T8.",
+        "details": "",
+        "testStrategy": "",
+        "status": "pending",
+        "dependencies": [],
+        "priority": "low",
+        "subtasks": []
+      },
+      {
+        "id": 100,
+        "title": "[Style-Acquisition T11] regen API types + deploy",
+        "description": "npm run api:types in web-ui if a tool/endpoint was added; commit+push+Coolify deploy; local MCP restart. Depends on: any task that changes an endpoint/tool.",
+        "details": "",
+        "testStrategy": "",
+        "status": "pending",
+        "dependencies": [],
+        "priority": "medium",
+        "subtasks": []
+      },
+      {
+        "id": 101,
+        "title": "[Style-Acquisition T12] /methodology: new profile categories",
+        "description": "Add categories to the generic CRUD (/api/methodology/{category}): transition_phrases, anti_patterns, voice_invariants (voice-fingerprint constants). + tabs in web-ui/src/app/methodology. Depends on: none.",
+        "details": "",
+        "testStrategy": "",
+        "status": "pending",
+        "dependencies": [],
+        "priority": "medium",
+        "subtasks": []
+      },
+      {
+        "id": 102,
+        "title": "[Style-Acquisition T13] /training: new learning tabs",
+        "description": "Distance-metric tab (T7 trend), reconciliation-ledger tab (T6), wiring 'comparison' to draft_final_pairs, 'measured vs. target' portrait. Depends on: T5,T7.",
+        "details": "",
+        "testStrategy": "",
+        "status": "pending",
+        "dependencies": [],
+        "priority": "medium",
+        "subtasks": []
+      },
+      {
+        "id": 103,
+        "title": "[Style-Acquisition T14] Approve the curator's distillation proposals → write to the profile",
+        "description": "Approval surface for the curator's proposals (INV-G10 gate) that writes to methodology/voice-fingerprint. /training agent tab. Depends on: T4.",
+        "details": "",
+        "testStrategy": "",
+        "status": "pending",
+        "dependencies": [],
+        "priority": "medium",
+        "subtasks": []
       }
     ],
     "metadata": {
@@ -3009,7 +3174,7 @@
       ],
       "created": "2026-06-06T12:53:14.496Z",
       "description": "Tasks for legal-ai context",
-      "updated": "2026-06-06T12:53:30.413Z"
+      "updated": "2026-06-06T15:58:42.555Z"
     }
   }
 }
\ No newline at end of file
diff --git a/CLAUDE.md b/CLAUDE.md
index e8772ca..a3a9a4e 100644
--- a/CLAUDE.md
+++ b/CLAUDE.md
@@ -22,6 +22,18 @@
 4. **סיוע בכתיבה** — ייצור טיוטות לפי ארכיטקטורת 12 בלוקים בסגנון דפנה
 5. **ייצוא DOCX** — מסמך מעוצב מוכן להגשה
 
+### ⭐ Top-level goal: acquiring Daphna's style (Style Acquisition)
+**The system's primary goal is for the agents to write and analyze appeals exactly as Adv. Daphna Tamir does**: not merely to produce a standard draft, but to internalize her **voice and method**. This requires a **clear separation between two subsystems**:
+
+1. **Writing system (Writing)**: produces drafts (analyst/writer/qa/ceo). A **read-only consumer** of the voice artifacts.
+2. **Style-acquisition system (Style Acquisition)**: learns *how* Daphna writes from every "our draft → her final" pair and feeds the result back to the writing system. The **only writer to the voice artifacts**, always through the chair gate (INV-G10).
+
+**Approach (state of the art for small data):** Text Style Transfer based on **Authorial Style Profiling**: generalize Daphna's style and adapt it to the case. Copying paragraphs is allowed for fixed/formulaic content; case-specific analysis → generalize; **legal substance (holding/fact) must not be copied from case to case**. *Not* weight fine-tuning (Opus is closed; the corpus is too small).
+
+**Top-level rule, INV-LRN4:** no decision is "closed" until it has been compared against Daphna's final version; every final is analyzed against the draft. This is how the system learns from every decision. **INV-LRN5:** the voice-knowledge layer must not contain case-specific substance, only style and method.
+
+Full spec: [`docs/spec/07-learning.md`](docs/spec/07-learning.md) §0. Architecture and tasks: the `style-acquisition-subsystem` plan.
+
 ### מה היה קודם (Legacy)
 המערכת הקודמת היתה **Obsidian vault** עם Claude Code skills על שרת אחר. פותחו:
 - ניתוח סגנון של 3 החלטות (הכט — דחייה, בית הכרם — קבלה חלקית, אריאלי — השוואה)
diff --git a/docs/spec/07-learning.md b/docs/spec/07-learning.md
index d557c6c..abb8260 100644
--- a/docs/spec/07-learning.md
+++ b/docs/spec/07-learning.md
@@ -19,6 +19,41 @@
 
 ---
 
+## 0. Style-Acquisition subsystem: the top-level goal and its separation from writing
+
+**The top-level goal of legal-ai:** the agents should write and analyze appeals **exactly as Adv. Daphna Tamir does**, internalizing her voice and method rather than merely producing a standard draft. This requires a **clear separation between two subsystems**:
+
+| | **Writing Subsystem** | **Style-Acquisition Subsystem** |
+|---|---|---|
+| Question | "How do I write this case the way Daphna would?" | "What did we learn from the gap between what we wrote and what Daphna signed?" |
+| Trigger | writing issue | `mark-final` |
+| Output | 12 blocks | approved voice updates + distance metric |
+| Agents | writer/analyst/qa/ceo | hermes-curator (extended) |
+| Relation to the voice artifacts | **read-only consumer** | **sole writer** (through the INV-G10 gate) |
+
+### 0.1 Approach: Authorial Style Profiling, not fine-tuning
+The goal is **Text Style Transfer** based on an **abstract style profile**: generalize Daphna's style/method and adapt it to the specific case. Weight fine-tuning is **not applicable**: the model (Opus) is closed, and the corpus (~48 decisions, a new chair) is too small, a regime in which the literature shows that an abstract profile + examples wins (≈+15% over RAG alone). **Copy policy by content type:** fixed/formulaic (doctrinal openers, closing templates) → copying allowed; case-specific analysis/arguments → generalize and adapt; substance (a holding/fact from another case) → forbidden (INV-LRN5).
+
+### 0.2 The three feed channels to the writer
+1. **A: abstract profile (primary):** voice-fingerprint + quantitative author-features, injected into writing.
+2. **B: examples + templates (supporting):** real block paragraphs + Copy-Paste Templates + contrastive.
+3. **C: deep-read (targeted):** voice-XXXX.md, a worked example for an exemplary case.
+
+### 0.3 The recurring per-final pipeline (7 stages)
+`mark-final` → [1] INTAKE (snapshot of the draft) → [2] PAIRING (block↔block) → [3] ALIGNMENT (per-block diff) → [4] DISTILLATION (separates style↔substance) → [5] CURATION (Hermes + chair gate) → [6] FEEDBACK (routing to channel A/B/C) → [7] MEASUREMENT (style-distance metric).
+
+### 0.4 UI management
+`/methodology` = **the profile editor** (declarative: golden ratios, discussion rules, checklists, transition phrases, anti-patterns, voice-invariants). `/training` = **the learning desk** (corpus, style portrait, draft↔final comparison, curator, distance metric, reconciliation ledger).
+
+### 0.5 New invariants
+**INV-LRN4 (ground-truth contrast → G10/G9):** voice learning is based on **block-level draft↔final pairing**, not on reading the final alone. No decision is "closed" until it has been compared against the final; every final is analyzed against the draft. A reconciliation ledger (`draft_final_pairs`) is kept with the lifecycle `draft_done → final_received → analyzed → lessons_folded`.
+*Sources:* imitation-learning-from-expert-edits · contrastive personalization (arxiv 2504.08745) · author-profiling. *Status: verified.*
+
+**INV-LRN5 (voice purity → G4/G11):** the voice-knowledge layer (voice-fingerprint, style_patterns, exemplars) **must not contain case-specific holdings/facts**, only style and method. Substance is routed to precedent_library/halacha. The distillation separates the two at the source.
+*Sources:* quality-at-source (Data Mesh) · separation-of-concerns. *Status: verified.*
+
+---
+
 ## 1. שלוש לולאות-המשנה
 
 הלמידה אינה אירוע יחיד אלא **שלוש לולאות** המתנקזות לאותם מסמכי-ידע מוסמכים