Add dafna-decision-template skill — knowledge for template-based DOCX export

Documents the rules and decisions behind building DOCX files from דפנה's decision template (טיוטת החלטה.dotx). The implementation lives in mcp-server/src/legal_mcp/services/analysis_docx_exporter.py; this skill captures the "why" so future improvements don't need to rediscover it.

Contents:
- SKILL.md: 5 critical rules, style mapping table, export flow, line classification, dash policy, placeholder handling, troubleshooting, future TODOs
- references/dotx-to-docx.md: why python-docx can't open .dotx + the conversion recipe
- references/rtl-runs.md: why <w:rtl/> is required on every run (otherwise Hebrew falls back to Times New Roman)
- references/style-mapping.md: XML dump of every template style, with the Title-via-theme gotcha
- references/line-classification.md: the 7 regex categories in _classify_line() with real examples

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-04-16 18:57:57 +00:00
parent 726498126d
commit bfec8bdaa3
5 changed files with 701 additions and 0 deletions
--- a/skills/dafna-decision-template/references/rtl-runs.md
+++ b/skills/dafna-decision-template/references/rtl-runs.md
@@ -0,0 +1,81 @@
+# Why `<w:rtl/>` is required on every run
+
+## The problem
+
+When you create a `run` with python-docx on a well-defined Hebrew style (for
+example Normal with `cs="David"`), the Hebrew text still renders in Times New Roman.
+
+## The cause
+
+Three of the font slots in `<w:rFonts>` matter here (the fourth, `w:eastAsia`, covers East Asian text):
+- `w:ascii`: basic Latin (ASCII) characters
+- `w:hAnsi`: other Latin-script characters, such as accented European letters
+- `w:cs` (complex script): Hebrew, Arabic, Thai
+
+Which slot Word uses depends on **the kind of text in the run** and on **the
+run-level `<w:rtl/>` flag**. Without the flag, Word can treat Hebrew text as LTR
+(for example when it is mixed with digits or Latin) and pick the `ascii` font, which is Times New Roman.
+
+## The solution
+
+Mark every Hebrew run as right-to-left, so that Word treats it as complex script:
+
+```python
+from docx.oxml import OxmlElement
+from docx.oxml.ns import qn
+
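+def _mark_run_rtl(run):
+    """Add <w:rtl/> to the run's properties unless it is already present."""
+    rPr = run._r.get_or_add_rPr()  # creates <w:rPr> when the run has none
+    if rPr.find(qn("w:rtl")) is None:
+        rPr.append(OxmlElement("w:rtl"))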
+```
+
+And also at the paragraph level (in case the paragraph mark itself has an effect):
+
+```python
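+from docx.oxml import OxmlElement
+from docx.oxml.ns import qn
+
+def _mark_paragraph_rtl(paragraph):
+    """Add <w:rtl/> to the paragraph mark's run properties (pPr/rPr)."""
+    pPr = paragraph._p.get_or_add_pPr()
+    rPr = pPr.find(qn("w:rPr"))
+    if rPr is None:
+        rPr = OxmlElement("w:rPr")
+        pPr.append(rPr)
+    if rPr.find(qn("w:rtl")) is None:
+        rPr.append(OxmlElement("w:rtl"))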
+```
+
+## Side effects of missing run-level RTL
+
+1. **Font fallback to Times New Roman**: the most common symptom.
+2. **BiDi reordering of punctuation**: colons, commas, and parentheses move to
+   the wrong place. Symptom: `"(א)"` becomes `")א("`.
+3. **Numbers misplaced inside a Hebrew sequence**: `"בשנת 2024 פסקנו"` ("in 2024 we ruled")
+   can appear with the number in the wrong position.
+
+## How to check that RTL was applied
+
+```python
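+from docx import Document
+from docx.oxml.ns import qn
+
+doc = Document("output.docx")  # assumed path of the exported file
+
+for p in doc.paragraphs:
+    for r in p.runs:
+        rPr = r._r.find(qn("w:rPr"))
+        has_rtl = rPr is not None and rPr.find(qn("w:rtl")) is not None
+        # U+0590..U+05FF is the Unicode Hebrew block
+        if not has_rtl and any('\u0590' <= c <= '\u05FF' for c in r.text):
+            print(f"Missing RTL: {r.text[:40]!r}")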
+```
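+
+The same check can also be run on the raw `word/document.xml` without
+python-docx. This is a minimal sketch that uses only the standard library;
+`runs_missing_rtl` is a hypothetical helper name and is not part of the exporter.
+
+```python
+import xml.etree.ElementTree as ET
+
+# WordprocessingML main namespace (the "w:" prefix)
+W = "http://schemas.openxmlformats.org/wordprocessingml/2006/main"
+
+def runs_missing_rtl(document_xml):
+    """Return the text of every Hebrew run that has no <w:rtl/> in its rPr."""
+    root = ET.fromstring(document_xml)
+    missing = []
+    for run in root.iter(f"{{{W}}}r"):
+        text = "".join(t.text or "" for t in run.iter(f"{{{W}}}t"))
+        # U+0590..U+05FF is the Unicode Hebrew block
+        has_hebrew = any("\u0590" <= c <= "\u05FF" for c in text)
+        has_rtl = run.find(f"{{{W}}}rPr/{{{W}}}rtl") is not None
+        if has_hebrew and not has_rtl:
+            missing.append(text)
+    return missing
+```
+
+On a real export, the XML bytes can be read with
+`zipfile.ZipFile(path).read("word/document.xml")` and passed to the helper.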
+
+## Setting it at the style level alone is not enough
+
+A common misconception: "if the style includes `<w:rtl/>` in its `rPr`, every run inherits it".
+**Not true**. A style supplies a default for runs that Word's GUI has yet to create,
+but runs created through python-docx get an empty `rPr` that does not automatically
+inherit `rtl` from the style. So the flag has to be added by hand.
+
+## דפנה's template as an example
+
+Inspecting `word/document.xml` in the original template shows that every Hebrew run includes:
+
+```xml
+<w:r><w:rPr><w:rFonts w:hint="cs"/><w:rtl/></w:rPr><w:t>רקע</w:t></w:r>
+```
+
+`<w:rtl/>` is there **explicitly**. We mimic that.
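+
+To make the target shape concrete, the sketch below builds a run with the same
+structure as the template's Hebrew runs, using only the standard library. It
+illustrates the XML only: `template_style_run` is a hypothetical name, and the
+exporter itself works through python-docx rather than ElementTree.
+
+```python
+import xml.etree.ElementTree as ET
+
+# WordprocessingML main namespace (the "w:" prefix)
+W = "http://schemas.openxmlformats.org/wordprocessingml/2006/main"
+ET.register_namespace("w", W)
+
+def template_style_run(text):
+    """Build a <w:r> shaped like the template's Hebrew runs."""
+    run = ET.Element(f"{{{W}}}r")
+    rPr = ET.SubElement(run, f"{{{W}}}rPr")
+    # Font hint first, then the explicit RTL flag, as in the template
+    ET.SubElement(rPr, f"{{{W}}}rFonts", {f"{{{W}}}hint": "cs"})
+    ET.SubElement(rPr, f"{{{W}}}rtl")
+    ET.SubElement(run, f"{{{W}}}t").text = text
+    return run
+```
+
+Calling `ET.tostring(template_style_run("רקע"), encoding="unicode")` serializes
+the element for a side-by-side comparison with the template's XML.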