# Line classification: `_classify_line()`

Every content line from the Markdown source passes through
`_classify_line()`, which returns `(kind, clean_text)`. The kind
determines which Word style is applied.

## Category table

| kind | regex | clean_text | mapped style |
|------|--------|-----------|--------------|
| `label_heading` | `^\s*\*\*([^\n*]+?):\*\*\s*$` | `"$1"` | `Heading 2` |
| `label_heading` (plain) | `^\s*([^\n:]{2,40}):\s*$` | `"$1"` | `Heading 2` |
| `inline_label` | `^\s*\*\*([^\n*]+?):\*\*\s+(.+)$` | `"$1\x00$2"` | `Normal` + bold label + value |
| `numbered` | `^\s*(\d+)[.)]\s+(.+)$` | `"$2"` | `List Paragraph` |
| `bullet` | `^\s*[\-\u2022\*\u25CF\u25E6]\s+(.+)$` | `"$1"` | `Normal` (marker stripped) |
| `heb_letter` | `^\s*\([א-ת]\)\s+` | **full line** (marker kept) | `List Paragraph` + `_strip_numpr()` |
| `plain` | fallback | line | `Normal` |

## Check order (important!)

```
1. STANDALONE_LABEL_RE      (**X:**)
2. INLINE_LABEL_RE          (**X:** value)
3. NUMBERED_LINE_RE         (1. X)
4. BULLET_LINE_RE           (- X) + re-check inside:
     4a. STANDALONE inside → label_heading
     4b. INLINE inside → inline_label
5. HEB_LETTER_LINE_RE       ((א) X)
6. PLAIN_LABEL_RE           (X:)  — last because it's broad
7. plain
```
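
The order above can be sketched in Python roughly as follows. This is a reconstruction from the table and the list, not the actual source: the patterns are copied from the category table, the constant names come from the check-order list, and the function name `classify_line_sketch` is made up for illustration.

```python
import re

# Patterns copied from the category table; names follow the check-order list.
STANDALONE_LABEL_RE = re.compile(r"^\s*\*\*([^\n*]+?):\*\*\s*$")
INLINE_LABEL_RE = re.compile(r"^\s*\*\*([^\n*]+?):\*\*\s+(.+)$")
NUMBERED_LINE_RE = re.compile(r"^\s*(\d+)[.)]\s+(.+)$")
BULLET_LINE_RE = re.compile(r"^\s*[\-\u2022\*\u25CF\u25E6]\s+(.+)$")
HEB_LETTER_LINE_RE = re.compile(r"^\s*\([א-ת]\)\s+")
PLAIN_LABEL_RE = re.compile(r"^\s*([^\n:]{2,40}):\s*$")


def classify_line_sketch(line):
    """Return (kind, clean_text), testing specific patterns before broad ones."""
    m = STANDALONE_LABEL_RE.match(line)
    if m:
        return "label_heading", m.group(1)
    m = INLINE_LABEL_RE.match(line)
    if m:
        # Label and value are joined with a NUL separator, as in the table.
        return "inline_label", m.group(1) + "\x00" + m.group(2)
    m = NUMBERED_LINE_RE.match(line)
    if m:
        return "numbered", m.group(2)
    m = BULLET_LINE_RE.match(line)
    if m:
        inner = m.group(1)
        # Steps 4a/4b: re-check the text after the bullet marker.
        standalone = STANDALONE_LABEL_RE.match(inner)
        if standalone:
            return "label_heading", standalone.group(1)
        inline = INLINE_LABEL_RE.match(inner)
        if inline:
            return "inline_label", inline.group(1) + "\x00" + inline.group(2)
        return "bullet", inner
    if HEB_LETTER_LINE_RE.match(line):
        return "heb_letter", line  # the "(א)" marker is kept
    m = PLAIN_LABEL_RE.match(line)
    if m:
        return "label_heading", m.group(1)
    return "plain", line
```

Because `PLAIN_LABEL_RE` is tested last, a bulleted or numbered line that happens to end in `:` is classified by its marker and not as a heading.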

## Real-world examples

### input: `- **נקודות פתוחות:**`
- `BULLET_LINE_RE` matches → `inner = "**נקודות פתוחות:**"`
- `STANDALONE_LABEL_RE` on the inner text → `label_heading`, text = `"נקודות פתוחות"`
- Emitted as Heading 2.

### input: `- **נקודות פתוחות:** האם המקדם...`
- `BULLET_LINE_RE` matches → `inner = "**נקודות פתוחות:** האם..."`
- `INLINE_LABEL_RE` on the inner text → `inline_label`
- Emitted as Normal, with the label "נקודות פתוחות:" in bold and the value in regular weight.

### input: `1. **שאלה עקרונית:** האם נספח...`
- `NUMBERED_LINE_RE` matches → `"**שאלה עקרונית:** האם נספח..."`
- Emitted as List Paragraph. The `**...**` inside it is handled by
  `_add_runs_with_inline_bold()` (inline bold run).

### input: `(א) נספח הבינוי של תכנית...`
- `HEB_LETTER_LINE_RE` matches
- Emitted as List Paragraph with `_strip_numpr()`, because the author already wrote the "(א)" marker.

### input: `העורר טוען כי:`
- No more specific regex matches
- `PLAIN_LABEL_RE` matches (13 characters before the `:`, and the line ends in `:`)
- Emitted as Heading 2.

### input: `פסקה ארוכה: עם עוד תוכן ופסיק, וסוגיה מורכבת.`
- `PLAIN_LABEL_RE` does not match: text follows the `:`, and the pattern requires the colon at the end of the line (`:\s*$`)
- Stays `plain` → Normal.

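## Why `PLAIN_LABEL_RE` is limited to `{2,40}`

Without the length limit, any line that ends in `:` would become
Heading 2, including a full sentence that introduces a list or a
quotation. A colon in the middle of a sentence is already ruled out by
the `:\s*$` anchor, so a line like this one is never a candidate,
whatever its length:
```
טענה חשובה כאן: היא שהוועדה שגתה בכל אופן.
```
No heading is intended here; the `:` is part of the sentence. The
40-character cap handles the remaining case, long lines that end in `:`,
and it filters most of them because most real headings are short.

40 characters is a guess; adjust it if false positives or false
negatives turn up.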
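
A quick way to probe the pattern's boundaries is to run it on a short label, a mid-sentence colon, and lines just under and over the cap. The pattern is copied from the category table; the helper name `is_plain_label` is hypothetical and exists only for this check.

```python
import re

# Pattern copied from the category table.
PLAIN_LABEL_RE = re.compile(r"^\s*([^\n:]{2,40}):\s*$")


def is_plain_label(line):
    """True when the whole line is a short label ending in ':'."""
    return PLAIN_LABEL_RE.match(line) is not None


# 13 characters before the colon: accepted.
print(is_plain_label("העורר טוען כי:"))
# Colon in the middle of the sentence: rejected by the end-of-line anchor.
print(is_plain_label("טענה חשובה כאן: היא שהוועדה שגתה בכל אופן."))
# Exactly 40 characters before the colon: accepted.
print(is_plain_label("א" * 40 + ":"))
# 41 characters before the colon: rejected by the {2,40} cap.
print(is_plain_label("א" * 41 + ":"))
```

Changing the upper bound in the pattern and re-running these four lines shows where the accept/reject boundary moves.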

## Inline bold: `_add_runs_with_inline_bold()`

After classification, the text can still contain `**word**` in the
middle. The function splits the string into alternating runs:

```
"העורר טוען **שהתוצאה** שגויה"
→ [
    Run("העורר טוען ", bold=None),
    Run("שהתוצאה", bold=True),
    Run(" שגויה", bold=None),
  ]
```

Each run is marked RTL separately. In Word, only the word `שהתוצאה`
comes out bold.
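
The splitting step can be sketched without python-docx, as a function that returns `(segment, bold)` pairs. The pattern `BOLD_SPAN_RE` and the name `split_inline_bold` are assumptions made for illustration; the real function may tokenize differently.

```python
import re

# Hypothetical pattern: a non-greedy **...** span on a single line.
BOLD_SPAN_RE = re.compile(r"\*\*(.+?)\*\*")


def split_inline_bold(text):
    """Split text into (segment, bold) pairs; bold is True or None (inherit)."""
    segments = []
    pos = 0
    for m in BOLD_SPAN_RE.finditer(text):
        if m.start() > pos:
            # Plain text between the previous span and this one.
            segments.append((text[pos:m.start()], None))
        segments.append((m.group(1), True))
        pos = m.end()
    if pos < len(text):
        # Trailing plain text after the last bold span.
        segments.append((text[pos:], None))
    return segments
```

With python-docx, each pair would typically become `run = paragraph.add_run(segment)` followed by `run.bold = bold`, and then the per-run RTL marking described above.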

## The full picture

```
md content
    ↓ splitlines()
for line in lines:
    ↓ _classify_line(line)
    → (kind, clean_text)
    ↓ _emit_content_line(doc, line)
    → paragraph with chosen style
    ↓ _add_runs_with_inline_bold(paragraph, clean_text)
    → runs with inline **bold** rendered
    ↓ _mark_run_rtl / _mark_paragraph_rtl
    → Hebrew renders in David (cs slot)
```
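
The style-selection step in the middle of this flow can be pictured as a lookup from `kind` to a paragraph plan. The sketch below is an assumption about how `_emit_content_line()` might decide, not its actual code: the style names come from the category table, while `STYLE_BY_KIND`, `plan_paragraph`, and the re-appended `:` on the label are illustrative guesses.

```python
# Style names taken from the category table.
STYLE_BY_KIND = {
    "label_heading": "Heading 2",
    "inline_label": "Normal",
    "numbered": "List Paragraph",
    "bullet": "Normal",
    "heb_letter": "List Paragraph",
    "plain": "Normal",
}


def plan_paragraph(kind, clean_text):
    """Describe the paragraph to emit for one classified line."""
    plan = {
        "style": STYLE_BY_KIND.get(kind, "Normal"),
        # Only heb_letter lines carry their own marker, so only they
        # need the numbering properties stripped.
        "strip_numpr": kind == "heb_letter",
        "label": None,
        "text": clean_text,
    }
    if kind == "inline_label":
        # clean_text is "label\x00value"; the colon is restored for display.
        label, _, value = clean_text.partition("\x00")
        plan["label"] = label + ":"
        plan["text"] = value
    return plan
```

Keeping the decision as plain data makes it easy to test without opening a Word document.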

## Adding a new category

If a pattern goes unrecognized and should be mapped:

1. Add a regex constant (at the top of the file, after the existing ones).
2. Add a branch in `_classify_line()` in the correct order (specific before general).
3. Add a branch in `_emit_content_line()` with the appropriate style.
4. Add a test case to references/line-classification.md (this file).
5. Run on a representative case file (for example 8070-25) and check that the output is correct.
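
As an illustration of steps 1 and 2, here is a sketch for a made-up category. Everything in it is hypothetical: the `quote` kind, `QUOTE_LINE_RE`, and `classify_with_quote` do not exist in the source, and the existing classifier is passed in as `fallback` so that the example stays self-contained.

```python
import re

# Step 1 (hypothetical): a constant for Markdown blockquote lines ("> text").
QUOTE_LINE_RE = re.compile(r"^\s*>\s+(.+)$")


def classify_with_quote(line, fallback):
    """Step 2: test the new, specific pattern first, then defer to the rest."""
    m = QUOTE_LINE_RE.match(line)
    if m:
        return "quote", m.group(1)
    return fallback(line)
```

The new pattern is tested before the broad `PLAIN_LABEL_RE`, because a line such as `> note:` would otherwise be classified as a heading.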