Add dafna-decision-template skill — knowledge for template-based DOCX export

Documents the rules and decisions behind building DOCX files from דפנה's
decision template (טיוטת החלטה.dotx). The implementation lives in
mcp-server/src/legal_mcp/services/analysis_docx_exporter.py; this skill
captures the "why" so future improvements don't need to rediscover it.

Contents:
  SKILL.md                       5 critical rules, style mapping table,
                                 export flow, line classification,
                                 dash policy, placeholder handling,
                                 troubleshooting, future TODOs
  references/dotx-to-docx.md     why python-docx can't open .dotx +
                                 the conversion recipe
  references/rtl-runs.md         why <w:rtl/> is required on every run
                                 (otherwise Hebrew falls back to
                                 Times New Roman)
  references/style-mapping.md    XML dump of every template style,
                                 with the Title-via-theme gotcha
  references/line-classification.md  the 7 regex categories in
                                 _classify_line() with real examples

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

2026-04-16 18:57:57 +00:00

Line classification: `_classify_line()`

Every content line from the MD passes through _classify_line(), which returns (kind, clean_text). The category determines which Word style is applied.

Category table

| kind | regex | clean_text | maps to style |
|---|---|---|---|
| `label_heading` | `^\s*\*\*([^\n]+?):\*\*\s*$` | `"$1"` | `Heading 2` |
| `label_heading` (plain) | `^\s*([^\n:]{2,40}):\s*$` | `"$1"` | `Heading 2` |
| `inline_label` | `^\s*\*\*([^\n]+?):\*\*\s+(.+)$` | `"$1\x00$2"` | `Normal` + bold label + value |
| `numbered` | `^\s*(\d+)[.)]\s+(.+)$` | `"$2"` | `List Paragraph` |
| `bullet` | `^\s*[\-\*\u2022\u25CF\u25E6]\s+(.+)$` | `"$1"` | `Normal` (marker stripped) |
| `heb_letter` | `^\s*\([א-ת]\)\s+` | full line (marker kept) | `List Paragraph` + `_strip_numpr()` |
| `plain` | fallback | line | `Normal` |

Check order (important!)

1. STANDALONE_LABEL_RE      (**X:**)
2. INLINE_LABEL_RE          (**X:** value)
3. NUMBERED_LINE_RE         (1. X)
4. BULLET_LINE_RE           (- X) + re-check inside:
     4a. STANDALONE inside → label_heading
     4b. INLINE inside → inline_label
5. HEB_LETTER_LINE_RE       ((א) X)
6. PLAIN_LABEL_RE           (X:)  — last because it's broad
7. plain
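
The ordering above can be sketched as a single function. This is a minimal illustration, not the code in analysis_docx_exporter.py: the function name `classify_line` is hypothetical, and the patterns are the ones from the category table written in unescaped form, which is an assumption about what the real constants look like.

```python
import re

# Assumed patterns, taken from the category table; the real constants may differ.
STANDALONE_LABEL_RE = re.compile(r"^\s*\*\*([^\n]+?):\*\*\s*$")
INLINE_LABEL_RE = re.compile(r"^\s*\*\*([^\n]+?):\*\*\s+(.+)$")
NUMBERED_LINE_RE = re.compile(r"^\s*(\d+)[.)]\s+(.+)$")
BULLET_LINE_RE = re.compile(r"^\s*[\-\*\u2022\u25CF\u25E6]\s+(.+)$")
HEB_LETTER_LINE_RE = re.compile(r"^\s*\([א-ת]\)\s+")
PLAIN_LABEL_RE = re.compile(r"^\s*([^\n:]{2,40}):\s*$")


def classify_line(line: str) -> tuple:
    """Return (kind, clean_text), testing specific patterns before broad ones."""
    m = STANDALONE_LABEL_RE.match(line)
    if m:
        return "label_heading", m.group(1)
    m = INLINE_LABEL_RE.match(line)
    if m:
        # Label and value travel together, separated by a NUL character.
        return "inline_label", f"{m.group(1)}\x00{m.group(2)}"
    m = NUMBERED_LINE_RE.match(line)
    if m:
        return "numbered", m.group(2)
    m = BULLET_LINE_RE.match(line)
    if m:
        inner = m.group(1)
        # Re-check the text after the bullet marker (steps 4a and 4b).
        inner_m = STANDALONE_LABEL_RE.match(inner)
        if inner_m:
            return "label_heading", inner_m.group(1)
        inner_m = INLINE_LABEL_RE.match(inner)
        if inner_m:
            return "inline_label", f"{inner_m.group(1)}\x00{inner_m.group(2)}"
        return "bullet", inner
    if HEB_LETTER_LINE_RE.match(line):
        return "heb_letter", line  # marker kept
    m = PLAIN_LABEL_RE.match(line)
    if m:
        return "label_heading", m.group(1)  # broad pattern, checked last
    return "plain", line
```

Checking PLAIN_LABEL_RE last matters because it accepts almost any short line ending in a colon; placed earlier, it would shadow none of the `**`-based patterns but could still claim lines that a later, more specific rule is meant to handle.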

Real-world examples

input: `- **נקודות פתוחות:**` ("Open points:")

BULLET_LINE_RE matches → inner = "**נקודות פתוחות:**"
STANDALONE_LABEL_RE on the inner text → label_heading, text = "נקודות פתוחות"
Emitted as Heading 2.

input: `- **נקודות פתוחות:** האם המקדם...` ("Open points: whether the coefficient...")

BULLET_LINE_RE matches → inner = "**נקודות פתוחות:** האם..."
INLINE_LABEL_RE on the inner text → inline_label
Emitted as Normal, with the label "נקודות פתוחות:" in bold and the value in regular weight.

input: `1. **שאלה עקרונית:** האם נספח...` ("A question of principle: whether the appendix...")

NUMBERED_LINE_RE matches → "**שאלה עקרונית:** האם נספח..."
Emitted as List Paragraph. The **...** inside it is handled by _add_runs_with_inline_bold() (an inline bold run).

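input: `(א) נספח הבינוי של תכנית...` ("(a) The building appendix of the plan...")

HEB_LETTER_LINE_RE matches
Emitted as List Paragraph with _strip_numpr(), because the author already wrote "(א)" and the paragraph should not get automatic numbering on top of it.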

input: `העורר טוען כי:` ("The appellant argues that:")

No more specific regex matches
PLAIN_LABEL_RE matches (13 characters before the :, and the line ends with :)
Emitted as Heading 2.

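input: `פסקה ארוכה: עם עוד תוכן ופסיק, וסוגיה מורכבת.` ("A long paragraph: with more content and a comma, and a complex issue.")

PLAIN_LABEL_RE does not match (the text continues after the :, and the pattern requires the line to end with :)
Stays plain → Normal.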

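Why `PLAIN_LABEL_RE` is limited to `{2,40}`

Without the limit, any line ending in : would become Heading 2, however long it is. An example of a sentence that must not become a heading:

טענה חשובה כאן: היא שהוועדה שגתה בכל אופן. ("An important claim here: that the committee erred in any case.")

No heading is intended here; the : is part of a sentence. As the pattern is written in the table, the colon in the middle of the line already rules this one out. The 40-character cap additionally filters full sentences that happen to end with :, because most real headings are short.

40 characters is a guess; tune it if false positives or false negatives show up.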
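
To see what the cap changes in practice, the sketch below compares the capped pattern with a hypothetical uncapped variant. `UNCAPPED` does not exist in the source and is defined only for comparison; the long sample line is made up for illustration.

```python
import re

# Pattern from the category table (assumed unescaped form).
CAPPED = re.compile(r"^\s*([^\n:]{2,40}):\s*$")
# Hypothetical variant without the upper bound, for comparison only.
UNCAPPED = re.compile(r"^\s*([^\n:]{2,}):\s*$")

short_label = "העורר טוען כי:"  # 13 characters before the colon
# Made-up sample: a full sentence that happens to end with a colon.
long_sentence = (
    "After weighing the parties' arguments and reviewing "
    "every document submitted, we conclude:"
)


def is_plain_label(line: str, pattern: re.Pattern = CAPPED) -> bool:
    """True when the line would be promoted to a heading by the given pattern."""
    return pattern.match(line) is not None


capped_results = [is_plain_label(s) for s in (short_label, long_sentence)]
uncapped_results = [is_plain_label(s, UNCAPPED) for s in (short_label, long_sentence)]
```

With the cap, only the short label qualifies; without it, the long sentence would also be promoted to Heading 2. Changing the `40` in `CAPPED` is the single knob to adjust if real documents show misclassified lines.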

inline bold: `_add_runs_with_inline_bold()`

After classification, the text can still contain **word** in the middle. The function splits the string into alternating runs:

"העורר טוען **שהתוצאה** שגויה"
→ [
    Run("העורר טוען ", bold=None),
    Run("שהתוצאה", bold=True),
    Run(" שגויה", bold=None),
  ]

Each run is marked RTL separately. In Word, only the word שהתוצאה comes out bold.
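
The splitting step can be sketched without python-docx as a pure function that returns (text, bold) pairs. The name `split_inline_bold` and the `BOLD_SPAN_RE` pattern are assumptions made for this illustration; the real function also creates the runs on the paragraph and marks each one RTL, which is omitted here.

```python
import re
from typing import List, Optional, Tuple

# Assumed pattern for a **bold** span; non-greedy so adjacent spans stay separate.
BOLD_SPAN_RE = re.compile(r"\*\*(.+?)\*\*")


def split_inline_bold(text: str) -> List[Tuple[str, Optional[bool]]]:
    """Split text into (segment, bold) pairs: True inside **...**, None elsewhere."""
    parts: List[Tuple[str, Optional[bool]]] = []
    pos = 0
    for m in BOLD_SPAN_RE.finditer(text):
        if m.start() > pos:
            # Plain text before this bold span.
            parts.append((text[pos:m.start()], None))
        parts.append((m.group(1), True))
        pos = m.end()
    if pos < len(text):
        # Trailing plain text after the last bold span.
        parts.append((text[pos:], None))
    return parts


runs = split_inline_bold("העורר טוען **שהתוצאה** שגויה")
```

Using `None` rather than `False` for the non-bold segments mirrors the example above, where those runs leave the bold property unset and inherit it from the paragraph style.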

The full picture

md content
    ↓ splitlines()
for line in lines:
    ↓ _classify_line(line)
    → (kind, clean_text)
    ↓ _emit_content_line(doc, line)
    → paragraph with chosen style
    ↓ _add_runs_with_inline_bold(paragraph, clean_text)
    → runs with inline **bold** rendered
    ↓ _mark_run_rtl / _mark_paragraph_rtl
    → Hebrew renders in David (cs slot)
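
The middle of this flow, from (kind, clean_text) to a chosen style, can be sketched as a planning step that produces plain records instead of Word paragraphs. Everything here is illustrative: `ParagraphPlan`, `STYLE_FOR_KIND`, and `plan_paragraphs` are hypothetical names, the style names come from the category table, and skipping blank lines is an assumption about the real exporter.

```python
from typing import Callable, List, NamedTuple, Optional, Tuple


class ParagraphPlan(NamedTuple):
    style: str                        # Word style name to apply
    text: str                         # text passed on to the inline-bold step
    bold_label: Optional[str] = None  # set only for inline_label lines
    strip_numpr: bool = False         # True when automatic numbering must go


# Style per kind, as listed in the category table.
STYLE_FOR_KIND = {
    "label_heading": "Heading 2",
    "inline_label": "Normal",
    "numbered": "List Paragraph",
    "bullet": "Normal",
    "heb_letter": "List Paragraph",
    "plain": "Normal",
}


def plan_paragraphs(
    md_content: str,
    classify: Callable[[str], Tuple[str, str]],
) -> List[ParagraphPlan]:
    """Turn markdown content into one plan per non-blank line."""
    plans: List[ParagraphPlan] = []
    for line in md_content.splitlines():
        if not line.strip():
            continue  # assumption: blank lines produce no paragraph
        kind, clean_text = classify(line)
        label = None
        if kind == "inline_label":
            # clean_text carries "label\x00value"; split it back apart.
            label, clean_text = clean_text.split("\x00", 1)
        plans.append(
            ParagraphPlan(
                style=STYLE_FOR_KIND[kind],
                text=clean_text,
                bold_label=label,
                strip_numpr=(kind == "heb_letter"),
            )
        )
    return plans
```

Passing the classifier in as a parameter keeps the sketch independent of any particular `_classify_line()` implementation and makes the style mapping easy to check on its own.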

Adding a new category

If a pattern goes unrecognized and should be mapped:

1. Add a regex constant (at the top of the file, after the existing ones).
2. Add a branch in _classify_line() at the right place in the order (specific before general).
3. Add a branch in _emit_content_line() with the appropriate style.
4. Add a test case to references/line-classification.md (this file).
5. Run on a representative case file (for example 8070-25) and check that the output is correct.
