2 Commits

Author SHA1 Message Date
03e7d88aee DOCX exporter: 3-layer RTL + David font on all slots
All checks were successful
Build & Deploy / build-and-deploy (push) Successful in 1m30s
Hebrew was rendering LTR or in Times New Roman fallback in some Word
contexts. Root cause: incomplete RTL marking and missing font hints
on the run level.

Three layers of RTL are required (per skills/docx/SKILL.md):
1. Section: <w:bidi/> in sectPr (now inherited from template)
2. Paragraph: <w:bidi/> directly in pPr (paragraph direction)
3. Run: <w:rtl/> in rPr — tells Word to use cs (complex-script) font

Without an explicit font on the run, Hebrew renders in the ascii slot
(Times New Roman). Force David on all four slots (ascii / hAnsi / cs /
eastAsia) so every shaping path picks the correct font.

Changes:
- TEMPLATE_PATH now points to skills/docx/decision_template.docx
  (carries David, RTL, margins, styles); replaces hard-coded constants.
- _mark_run_rtl: writes rFonts on all four slots, then appends <w:rtl/>.
- _mark_paragraph_rtl: places <w:bidi/> directly in pPr (not nested in
  rPr — that was the bug), and adds <w:rtl/> to the paragraph-mark rPr.
- _set_paragraph_jc: forces explicit jc, overriding style-inherited.

Tests:
- test_mark_paragraph_rtl_adds_bidi_directly_in_pPr — guards against
  the regression where bidi was nested inside rPr.
- test_mark_run_rtl_forces_david_on_all_font_slots — ensures all four
  font slots are set, not just cs.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-04-28 17:37:52 +00:00
4a297f910c Lessons from 1033-25 (clean acceptance — first in training corpus)
Comparison of our draft (טיוטה-v6, 2,126 words) against Dafna's final
decision (עריכה-v2, 2,299 words). 14 lessons (#20-#33) covering what
the draft got right and where she rebuilt the discussion.

Key findings:
- Lesson #20: Match doctrinal depth to legal uncertainty. In clean
  acceptance the committee's OWN conditions provide the anchor — no
  CREAC framework needed. The draft's 101-word "נבאר" doctrinal
  paragraph was deleted entirely.
- Lesson #21: Plant analytical seeds in the background ("ודוק"
  foreshadowing) for technical planning distinctions.
- Lesson #23: Concrete documentary evidence (specific permits in
  buildings 5, 7, 11) beats generic statements.
- Lesson #25: Counter-factual reasoning — "approved by mistake" gives
  the committee benefit of the doubt while strengthening reversal.
- Lesson #26: Engineer counter-factual — "had he known the shadow plan
  was not feasible, his opposition would have been even stronger".
- Lesson #27: "אכן...אולם" / "לא נעלם מעינינו" patterns are for
  rejection, NOT acceptance. Don't use prophylactically.
- Lesson #28: "ונפרט;" (ו prefix + semicolon), never "נפרט." with
  period.
- Lesson #33: Full acceptance against permit applicant → no expenses
  to either side.

New transition phrases catalogued: "דיון עקר", "אושרה מתוך טעות כי הרי
לא נוכל להניח כי אושרה למראית עין", "ועדת הערר אפשרה מרחב של זמן
בתקווה כי ההחלטה תתייתר", "להלן כדוגמא מתוך", "ברי כי הכוונה ל...".

Several of these lessons fed directly into daphna-acceptance-architecture.md
(template A) and daphna-decision-tree.md from the recent voice corpus
work; this file remains the case-study record.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-04-28 17:37:38 +00:00
3 changed files with 524 additions and 117 deletions

View File

@@ -252,3 +252,136 @@ Total: ~340,000 words of source material.
Intermediate extraction documents also saved: Intermediate extraction documents also saved:
- `docs/fjc-principles-extraction.md` — 38 principles from FJC - `docs/fjc-principles-extraction.md` — 38 principles from FJC
- `docs/garner-methodology-extraction.md` — ~50 principles from Garner/Scalia - `docs/garner-methodology-extraction.md` — ~50 principles from Garner/Scalia
---
## Lessons from הר הבשן 1033-25 (April 2026)
### Source
- Final decision: `data/cases/1033-25/exports/עריכה-v2.docx`
- Our draft (v6): `data/cases/1033-25/exports/טיוטה-v6.docx`
- Intermediate edit (v1): `data/cases/1033-25/exports/עריכה-v1.docx`
- Date: April 2026
- Result: Full acceptance (קבלה מלאה)
- Word counts: Draft 2,126 → Final 2,299 (+8%)
- Discussion section: Draft 960 words (19 paras) → Final 1,099 words (23 paras) (+14%)
### What Our Draft Got Right
- **12-block structure preserved** — all blocks in correct order, headings identical
- **Opening formula** — bottom-line opening "מצאנו כי דין הערר להתקבל" (mode A adapted for acceptance) — used and kept
- **Threshold claims treatment** — all 3 threshold claims handled correctly with same reasoning
- **Central argument flow** — committee's own conditions → shadow plan → not feasible → appeal accepted — this was the exact structure Dafna kept
- **Background neutrality** — facts-only background passed final review (no party quotes, no value words)
- **Most paragraphs kept verbatim** — blocks ו (background), ז (claims), and most of ח (procedures) were kept nearly word-for-word
- **Transition phrases** — "ונוסיף", "הנה כי כן", "הדברים מתחדדים שעה שנזכיר כי" — all used correctly and retained
- **Direct quote from licensing rep** — "נכון, אני מסכימה, התבקשו הרחבות..." — kept verbatim
- **"מסקנת ביניים"** technique — used correctly and retained
- **"למען הסדר הטוב"** — correct usage for remaining claims section
### What the Final Version Changed — Critical Gaps
#### 20. Over-Doctrinal: Abstract Legal Framework Removed Entirely
- **Draft:** Had a 101-word "נבאר" paragraph explaining the general legal authority of committees to require uniform building plans, covering advisory vs. mandatory annexes and administrative review processes — pure CREAC doctrine.
- **Final:** Completely deleted. Went straight from conclusion ("מסקנתנו היא שהבקשה אינה עומדת") to factual evidence (shadow plan is theoretical).
- **Lesson:** In "clean acceptance" cases where the committee's OWN conditions provide the anchor for the decision, skip the doctrinal framework. The committee said "show us X", the applicant didn't show X — no need to explain WHY committees can require X. CREAC is for contested legal rules, not for applying a committee's own explicitly-stated conditions. This is the most important lesson from this case: **match doctrinal depth to legal uncertainty**.
#### 21. Background Enhanced with "ודוק" Foreshadowing
- **Draft:** Simple description of the permit application: "ופורסמה כנדרש לפי סעיף 149 לחוק"
- **Final:** Added 2 sentences after the permit description: "ודוק, בהתאם להוראות התכנית נספח הבינוי מחייב לגבי מספר הקומות המירבי ובכל הנוגע לדרישה להכנת תכנית אחידה הרי שזו מכח שלביות הביצוע של התכנית. על מנת לסטות מהוראות אלו התבקשו ההקלות."
- **Lesson:** Dafna plants analytical seeds in the background. This "ודוק" paragraph in the background isn't neutrality-violating — it's explaining how plan provisions work as a matter of technical fact. But it foreshadows the fulcrum of the entire analysis (the reliefs are from MANDATORY provisions, not from advisory guidance). The background reader already understands what's at stake before reaching the discussion. **Rule**: when the decision hinges on a technical planning distinction, explain that distinction in the background (as fact, not as argument).
#### 22. Procedures Section: Specific Dates → Summary Narrative
- **Draft:** Listed specific dates and documents: "ביום 05.02.2026 ניתנה החלטת ביניים... הודעת עמדה מטעם העוררת גלנסקי מיום 23.02.2026, תגובת גבי אינגרם מיום 08.02.2026, ותגובת מבקשת ההיתר מיום 25.02.2026"
- **Final:** Generalized: "לאחר מועד זה הוגשו בקשות, עדכונים ותגובות מטעם הצדדים לגבי ניסיון להגיע לידי הסכמות, וגם בניסיון לתכנן בקשה שונה ומכל מקום ועדת הערר אפשרה מרחב של זמן בתקווה כי ההחלטה תתייתר"
- **Lesson:** For post-hearing procedural history that didn't change the outcome, Dafna prefers summary narrative over chronological detail. The intermediate decisions, update letters, and their specific dates don't matter to the reader — what matters is the narrative arc: "we gave them time to agree, they didn't, now we decide." Also: "ועדת הערר אפשרה מרחב של זמן בתקווה כי ההחלטה תתייתר" — this signals judicial patience and good faith before ruling.
#### 23. Concrete Evidence Added: Specific Permits in Buildings 5, 7, 11
- **Draft:** General statement that expansions were done ("הרחבות אלו, שחלקן כבר בוצעו וחלקן אושרו...")
- **Final:** Added an entire new paragraph: "להלן כדוגמא מתוך היתרי הבניה בבתים מספר 5, 7, ו-11 (בניינים סמוכים ואף צמודים לזה מושא הערר), בהם התבקשו ואושרו תוספות בניה בהתאם להוראות התכנית בקומה ב' (מפלס 5.80+). משזכויות הבניה נוצלו כאמור, הרי שלא תהיה בידם האפשרות לנצל וליישם את הרחבת הבניה באופן דומה לזה המתבקש בענייננו, מה שיגרום לבית 13 להיות חריג לסביבתו" — with accompanying images of the permits.
- **Lesson:** In acceptance decisions where you're overturning a committee, provide specific factual evidence that makes the conclusion inevitable. Not "other buildings already expanded" but "HERE are permits 5, 7, 11 showing exactly what was approved at level +5.80, making it physically impossible for the shadow plan to be implemented." The word "חריג לסביבתו" appears here as factual consequence, not as value judgment.
#### 24. Plan-Provision Integration Paragraphs Added (נחדד + מקל וחומר)
- **Draft:** None of this content existed
- **Final:** Two new paragraphs:
- F13: "נחדד כי בהתאם להוראות התכנית נספח הבינוי מחייב לגבי מספר הקומות, ולכך מתווספת גם הוראת השלביות והדרישה להכנת תכנית אחידה לכל הבניין. ברי כי הכוונה לתכנית הממחישה ומבטיחה כי ההרחבות מושא התכנית יוכלו להתממש לגבי כלל בעלי הזכויות ובאופן המייצר מופע מקובל."
- F14: "הדברים מתחדדים ביתר שאת שעה שמבוקשת הקלה שמשמעותה חריגה מהוראות התכנית שאז בוודאי מקל וחומר נכון להכין תכנית אחידה."
- **Lesson:** Where the draft used abstract doctrine, Dafna uses specific plan provisions. The "מקל וחומר" argument is new and powerful: if a uniform plan is required even for plan-conforming construction, then all the more so for construction that deviates from the plan. This replaces the general legal framework with a specific, irrefutable logical argument anchored in THIS plan's provisions.
#### 25. Counter-Factual Reasoning: "Approved by Mistake" + "Barren Discussion"
- **Draft:** Simple statement: "לאחר שהתברר בדיון בפנינו כי תכנית הצל אינה ישימה" followed by intermediate conclusion
- **Final:** Added entirely new reasoning: "תכנית הצל אושרה מתוך טעות כי הרי לא נוכל להניח כי אושרה למראית עין וברי כי הועדה המקומית ביקשה להבטיח זכויות של אחרים והשתלבות בסביבה. במקום בו התכנית אינה ישימה דיון בה הינו דיון עקר."
- **Lesson:** The "benefit of the doubt" technique — assume the committee acted in good faith (they didn't knowingly approve a hollow document), then show that this good-faith assumption actually STRENGTHENS the reversal (if they thought it was real, and it's not, then they were misled). "דיון עקר" = "barren discussion" — a phrase that shuts down any further argument about the shadow plan's merits. This is a new rhetorical move not seen in previous decisions.
#### 26. Engineer Counter-Factual: "Had He Known..." (Two New Paragraphs)
- **Draft:** Nothing about the engineer after the discussion section
- **Final:** Two new paragraphs (F18-F19) adding meta-reasoning about the engineer's opinion:
- "חוות דעתו של מהנדס הוועדה כי התכנון המבוקש חורג לסביבתו נבחנה לאור תכנית הצל שהוגשה ומשזו הוגשה בחסר חוו"ד הגורם המקצועי נותרה גם היא בחסר."
- "ונציין כי חוו"ד מהנדס הוועדה ניתנה במקום בו היה סבור כי תכנית הצל ישימה ובהינתן כך קבע כי הינה עדיין חורגת לסביבה... היה והייתה מוצגת תכנית צל המאגדת את ההיתרים שאושרו וממחישה את חריגות הבניה במרחב, ניתן לשער כי חוו"ד המהנדס הייתה החלטית יותר"
- **Lesson:** In acceptance decisions where you're overturning a committee that had professional support, explain WHY the professional got it wrong (or rather, why his analysis was based on faulty premises). The counter-factual "had the engineer known the shadow plan was not feasible, his opposition would have been even stronger" turns the committee's own professional opinion into evidence FOR the reversal. This is Dafna's way of being respectful to professionals while still overturning their conclusions.
#### 27. "לא נעלם מעינינו" Acknowledge-Before-Reject Removed
- **Draft:** Had a 66-word paragraph: "לא נעלם מעינינו כי נספח הבינוי הוגדר כ'מנחה' ולא כ'מחייב'... אולם אף בנספח מנחה, סטייה מהותית... אינה עניין טכני אלא שינוי מהותי"
- **Final:** Completely removed
- **Lesson:** The "אכן...אולם" or "לא נעלם מעינינו" pattern is for REJECTING an appeal — you need to show you considered the losing side's best argument. In ACCEPTANCE, the losing side is the committee/permit applicant, and the analysis already shows their conditions weren't met. No need to acknowledge the other side's argument when the factual record speaks for itself. **Rule**: "acknowledge-before-reject" = only in rejection decisions or on specific issues where you rule against a party. Don't use it prophylactically.
#### 28. Committee Response: Personal Circumstances Added
- **Draft:** Missing entirely — no mention of "פסק הלכתי" or "נסיבות אישיות חריגות"
- **Final:** Added new paragraph in committee response section: "בין השיקולים ששקלו חברי הוועדה נלקחו בחשבון גם נסיבות אישיות חריגות של מבקשת ההיתר, ובכללן פסק הלכתי שהוצג בפני הוועדה, שלפיו בנות מתבגרות אינן יכולות להתגורר באותו מפלס עם שאר בני המשפחה"
- **Lesson:** If a committee considered unusual factors (religious rulings, personal hardship), document them in the claims section for completeness, even if they're not addressed in the discussion. Omitting them would create a gap for judicial review — a judge reading the protocol would wonder why the decision doesn't mention them. Including them in the claims section without addressing them in the discussion implicitly signals: "we noted this but it doesn't change the planning analysis."
#### 29. Opening Precision: Permit Number and Phrasing
- **Draft:** "בקשה להיתר שמספרה" (placeholder — number missing!), "בהקלה לתוספת קומה"
- **Final:** "בקשה להיתר מס' 20230614", "בקשה הכוללת הקלות 'הקלה לתוספת קומה ללא תכנית אחידה וללא אדריכלות חוץ'"
- **Lesson:** (a) Never leave placeholders — "שמספרה" without the actual number is a production error. (b) The permit number is a legal identifier that must appear in the opening. (c) The phrasing "בקשה הכוללת הקלות" (application that includes reliefs) is more precise than "בהקלה" (with a relief). Also: the relief description is quoted in quotation marks from the official publication.
#### 30. "ונפרט;" Not "נפרט."
- **Draft:** "נפרט." (period)
- **Final:** "ונפרט;" (ו prefix + semicolon)
- **Lesson:** The transition from conclusion to detail uses "ו" prefix (connecting) and semicolon (flowing into the detail), not a period (which creates a full stop). This was already documented in the voice fingerprint ("מעבר עם נקודה-פסיק") but the draft didn't apply it. This confirms: **semicolons before elaboration are not optional — they are Dafna's standard punctuation for transitions into detail**.
#### 31. Summary: No Forward-Looking Guidance to Losing Party
- **Draft:** Had a forward-looking paragraph: "ככל שמבקשת ההיתר תבקש להגיש בקשה מחודשת עליה לעמוד בדרישות התכנית, לרבות הצגת תכנית אחידה ישימה לכל הבניין כנדרש"
- **Final:** Replaced with simple restatement: "על כן, הבקשה להיתר לא עמדה בתנאים שהוועדה המקומית עצמה קבעה בהחלטתה מיום 8.7.2024."
- **Lesson:** Dafna does NOT give advice to the losing party in the summary. The decision says what was decided, not what the applicant should do next. Forward-looking guidance would be an advisory opinion outside the scope of the decision. Also note: the final added "ולמעשה היא אינה ממחישה את המצב הפיזי והתכנוני 'האמיתי'" — a new phrase capturing the essence of why the shadow plan fails (it doesn't reflect reality).
#### 32. Unit vs. Extension: Deference to Committee, Not Independent Analysis
- **Draft:** "ניתן לקבל בדוחק את עמדת מבקשת ההיתר כי מדובר בתוספת לדירה קיימת" — expressing the committee's own hesitant view
- **Final:** "עולה כי הועדה המקומית דנה בכך וקבעה כי מדובר ביחידת דיור אחת שבנייתה מיועדת לשימוש בן משפחה... אין אנו מוצאים להתערב בכך ראשית כי הדבר מקדים את זמנו... ושנית ככל שתאושר בניה זו יש לוודא כי לא תבנה יח"ד נוספת"
- **Lesson:** When a secondary issue was resolved by the committee and you're not overturning THAT specific finding, use deference ("אין אנו מוצאים להתערב") rather than expressing your own opinion ("ניתן לקבל בדוחק"). The final also adds a CONDITION ("יש לוודא כי לא תבנה יח"ד נוספת") — practical safeguard rather than theoretical analysis.
#### 33. No Expenses in Full Acceptance
- **Draft:** No mention of expenses
- **Final:** No mention of expenses
- **Lesson confirmed:** In full acceptance of an appeal by neighbor-appellants against a permit applicant, Dafna does not award expenses to either side. This contrasts with rejection (הכט: appellants pay expenses). The pattern emerges: expenses = only in rejection. Acceptance or partial acceptance = no expenses order.
### New Transition Phrases Discovered
- **"ונפרט;"** — correct form (ו + semicolon, not "נפרט.")
- **"דיון בה הינו דיון עקר"** — declaring a point moot
- **"אושרה מתוך טעות כי הרי לא נוכל להניח כי אושרה למראית עין"** — benefit-of-the-doubt construction
- **"ונציין כי חוו"ד... ניתנה במקום בו היה סבור כי..."** — counter-factual about professional opinion
- **"להלן כדוגמא מתוך"** — introducing specific documentary evidence
- **"ברי כי הכוונה ל..."** — explaining legislative intent of plan provisions
- **"מה שיגרום לבית 13 להיות חריג לסביבתו"** — factual consequence language
- **"ועדת הערר אפשרה מרחב של זמן בתקווה כי ההחלטה תתייתר"** — explaining judicial patience
### Meta-Lesson
This is the first "clean acceptance" in our training data (הכט = rejection, בית הכרם = partial acceptance). The key insight: **the draft was too careful**. It built a doctrinal framework (CREAC) as if it needed to justify overturning the committee from first principles, when in reality the committee's OWN conditions provided all the justification needed. Dafna's approach to acceptance:
1. **Anchor in the committee's own conditions** — no need for external legal authority
2. **Show concrete evidence** the conditions weren't met (specific permits, images)
3. **Explain WHY the committee was misled** (shadow plan approved by mistake)
4. **Counter-factual reasoning** about what professionals would have said with correct information
5. **No abstract doctrine needed** when the facts are clear
The draft's biggest structural error was adding the "נבאר" doctrinal paragraph and the "לא נעלם מעינינו" acknowledge-before-reject. Both are tools for CONTESTED or REJECTED cases. In a clean acceptance, the facts lead directly to the conclusion.
### Applied To
- [ ] Update SKILL.md: add "clean acceptance" track — skip doctrine, anchor in committee's conditions
- [ ] Update SKILL.md: "acknowledge-before-reject" only in rejection/contested issues
- [ ] Update SKILL.md: no forward-looking guidance in summary
- [ ] Update SKILL.md: "ודוק" foreshadowing in background for technical planning distinctions
- [ ] Update SKILL.md: counter-factual reasoning about professional opinions
- [ ] Update SKILL.md: procedures section — summary narrative for post-hearing history
- [ ] Update voice-fingerprint: add new transition phrases
- [ ] Update architecture-by-outcome: add "clean acceptance" archetype
- [ ] Fix agent opening punctuation: "ונפרט;" not "נפרט."

View File

@@ -15,47 +15,112 @@ from docx import Document
from docx.enum.text import WD_ALIGN_PARAGRAPH from docx.enum.text import WD_ALIGN_PARAGRAPH
from docx.oxml import OxmlElement from docx.oxml import OxmlElement
from docx.oxml.ns import qn from docx.oxml.ns import qn
from docx.shared import Cm, Pt, RGBColor
from legal_mcp import config from legal_mcp import config
from legal_mcp.services import db from legal_mcp.services import db
logger = logging.getLogger(__name__) logger = logging.getLogger(__name__)
# ── Constants ───────────────────────────────────────────────────── # Path to the converted decision template. Carries David font, RTL, margins,
# and styles (Title / Heading 1-2 / Normal / Quote / List Paragraph).
FONT_NAME = "David" # Populated once by `scripts/convert_decision_template.py` from `.dotx`.
FONT_SIZE_BODY = Pt(12) TEMPLATE_PATH = (
FONT_SIZE_TITLE = Pt(16) Path(__file__).resolve().parents[4]
FONT_SIZE_HEADING = Pt(14) / "skills" / "docx" / "decision_template.docx"
LINE_SPACING = 1.5 )
PAGE_MARGIN = Cm(2.5)
# ── RTL helpers ─────────────────────────────────────────────────── # ── RTL helpers ───────────────────────────────────────────────────
# Three layers of RTL are required (per skills/docx/SKILL.md):
# 1. Section: <w:bidi/> in sectPr (inherited from template)
# 2. Paragraph: <w:bidi/> directly in pPr — paragraph direction
# 3. Run: <w:rtl/> in rPr — tells Word to use cs (complex-script) font
# Without explicit font on run, Hebrew can render in the ascii slot
# (Times New Roman) — so we also force David on all four font slots.
def _set_rtl_paragraph(paragraph) -> None: HEBREW_FONT = "David"
"""Set paragraph-level RTL properties."""
pPr = paragraph._element.get_or_add_pPr()
bidi = OxmlElement("w:bidi")
bidi.set(qn("w:val"), "1")
pPr.append(bidi)
def _set_rtl_run(run) -> None: def _mark_run_rtl(run) -> None:
"""Set run-level RTL properties.""" """Force David font on all four slots, then add <w:rtl/>."""
rPr = run._element.get_or_add_rPr() rPr = run._r.get_or_add_rPr()
rtl = OxmlElement("w:rtl") if rPr.find(qn("w:rFonts")) is None:
rtl.set(qn("w:val"), "1") fonts = OxmlElement("w:rFonts")
rPr.append(rtl) fonts.set(qn("w:ascii"), HEBREW_FONT)
fonts.set(qn("w:hAnsi"), HEBREW_FONT)
fonts.set(qn("w:cs"), HEBREW_FONT)
fonts.set(qn("w:eastAsia"), HEBREW_FONT)
rPr.insert(0, fonts)
if rPr.find(qn("w:rtl")) is None:
rPr.append(OxmlElement("w:rtl"))
def _set_rtl_section(section) -> None: def _mark_paragraph_rtl(paragraph) -> None:
"""Set section-level RTL (bidi).""" """Add <w:bidi/> directly to pPr (paragraph direction) and <w:rtl/>
sectPr = section._sectPr to the paragraph-mark rPr (affects trailing ¶ glyph)."""
bidi = OxmlElement("w:bidi") pPr = paragraph._p.get_or_add_pPr()
bidi.set(qn("w:val"), "1") # (2) <w:bidi/> directly in pPr — paragraph direction
sectPr.append(bidi) if pPr.find(qn("w:bidi")) is None:
bidi = OxmlElement("w:bidi")
pstyle = pPr.find(qn("w:pStyle"))
if pstyle is not None:
pstyle.addnext(bidi)
else:
pPr.insert(0, bidi)
# paragraph-mark rPr gets <w:rtl/> so ¶ inherits RTL too
rPr = pPr.find(qn("w:rPr"))
if rPr is None:
rPr = OxmlElement("w:rPr")
pPr.append(rPr)
if rPr.find(qn("w:rtl")) is None:
rPr.append(OxmlElement("w:rtl"))
def _set_paragraph_jc(paragraph, value: str) -> None:
"""Force <w:jc w:val="..."/> on a paragraph, overriding style-inherited jc.
Needed because Heading 3 in the template ships with jc=center — we want
body headings justified right (jc=both) like Normal.
"""
pPr = paragraph._p.get_or_add_pPr()
existing = pPr.find(qn("w:jc"))
if existing is not None:
pPr.remove(existing)
jc = OxmlElement("w:jc")
jc.set(qn("w:val"), value)
pPr.append(jc)
def _suppress_paragraph_numbering(paragraph) -> None:
"""Kill any style-inherited auto-numbering on this paragraph.
Heading styles linked to outline lists can auto-inject א./ב./ג. markers
in some Word versions even when the style we read doesn't show numPr.
Setting numId=0 explicitly removes the paragraph from any list.
"""
pPr = paragraph._p.get_or_add_pPr()
existing = pPr.find(qn("w:numPr"))
if existing is not None:
pPr.remove(existing)
numPr = OxmlElement("w:numPr")
ilvl = OxmlElement("w:ilvl")
ilvl.set(qn("w:val"), "0")
numId = OxmlElement("w:numId")
numId.set(qn("w:val"), "0")
numPr.append(ilvl)
numPr.append(numId)
pPr.append(numPr)
def _clear_body(doc) -> None:
"""Remove all paragraphs in the document body while keeping sectPr.
The template ships with sample paragraphs we don't want. Section
properties (page size, margins, bidi) stay intact.
"""
body = doc.element.body
for p in list(body.findall(qn("w:p"))):
body.remove(p)
# ── Bookmark helpers ────────────────────────────────────────────── # ── Bookmark helpers ──────────────────────────────────────────────
@@ -109,61 +174,109 @@ def _wrap_block_with_bookmarks(doc, block_name: str,
_insert_bookmark_end(last_new, bm_id) _insert_bookmark_end(last_new, bm_id)
def _add_paragraph(doc, text: str, style: str = "Normal", # ── Content cleanup ──────────────────────────────────────────────
bold: bool = False, font_size=None,
alignment=None, space_after: Pt | None = None) -> None:
"""Add an RTL paragraph with David font."""
para = doc.add_paragraph()
_set_rtl_paragraph(para)
if alignment: # Em-dash (—, U+2014) and en-dash (, U+2013) — per chair's no-dash policy,
# strip from body text. Surrounding spaces collapse.
_DASH_RE = re.compile(r"\s*[—–]\s*")
_MULTI_SPACE_RE = re.compile(r" {2,}")
def _strip_dashes(text: str) -> str:
"""Remove em/en-dashes and collapse surrounding whitespace."""
text = _DASH_RE.sub(" ", text)
return _MULTI_SPACE_RE.sub(" ", text).strip()
# Numbered paragraph: "1. content", "23. content" — auto-numbered via
# List Paragraph style so order reflects emission, not literal prefix.
_NUM_PREFIX_RE = re.compile(r"^(\d+)\.\s+(.*)$", re.DOTALL)
# Markdown inline bold — `**...**`
_INLINE_BOLD_RE = re.compile(r"\*\*([^\n*]+?)\*\*")
def _add_runs_with_inline_bold(paragraph, text: str, *, bold_all: bool = False) -> None:
"""Split text on `**...**` markers, alternating plain and bold runs.
Keeps `**טענה חשובה**` rendering as bold instead of leaving literal
asterisks. When bold_all is True, every run is bold (used for headings
that still carry inline-bold markup).
"""
pos = 0
for m in _INLINE_BOLD_RE.finditer(text):
if m.start() > pos:
plain = paragraph.add_run(text[pos:m.start()])
if bold_all:
plain.bold = True
_mark_run_rtl(plain)
run_bold = paragraph.add_run(m.group(1))
run_bold.bold = True
_mark_run_rtl(run_bold)
pos = m.end()
if pos < len(text):
tail = paragraph.add_run(text[pos:])
if bold_all:
tail.bold = True
_mark_run_rtl(tail)
def _add_styled_paragraph(doc, text: str, style: str = "Normal",
bold: bool = False,
alignment=None):
"""Add a paragraph using a template style.
Font, size, RTL direction and spacing all come from the style
definition in the template — we only pick the style by name.
Renders `**...**` markdown as inline bold runs.
Returns the paragraph so callers can apply further overrides.
"""
para = doc.add_paragraph(style=style)
_mark_paragraph_rtl(para)
if alignment is not None:
para.alignment = alignment para.alignment = alignment
else:
para.alignment = WD_ALIGN_PARAGRAPH.RIGHT
run = para.add_run(text) if text:
run.font.name = FONT_NAME _add_runs_with_inline_bold(para, text, bold_all=bold)
run.font.size = font_size or FONT_SIZE_BODY
run.bold = bold
_set_rtl_run(run)
# Line spacing return para
pf = para.paragraph_format
pf.line_spacing = LINE_SPACING
if space_after is not None:
pf.space_after = space_after
def _add_centered_paragraph(doc, text: str, bold: bool = True, def _add_centered_paragraph(doc, text: str, *, bold: bool = True,
font_size=None) -> None: style: str = "Normal") -> None:
"""Add centered RTL paragraph.""" _add_styled_paragraph(doc, text, style=style, bold=bold,
_add_paragraph(doc, text, bold=bold, font_size=font_size, alignment=WD_ALIGN_PARAGRAPH.CENTER)
alignment=WD_ALIGN_PARAGRAPH.CENTER)
def _add_heading(doc, text: str, *, style: str) -> None:
"""Heading with overrides: jc=both (overrides style-center / style-left)
and suppressed auto-numbering (so style-linked outline lists don't inject
א./ב./ג. — chair manages markers manually in content)."""
para = doc.add_paragraph(style=style)
_mark_paragraph_rtl(para)
_set_paragraph_jc(para, "both")
_suppress_paragraph_numbering(para)
if text:
_add_runs_with_inline_bold(para, text)
def _add_blockquote(doc, text: str) -> None: def _add_blockquote(doc, text: str) -> None:
"""Add indented blockquote paragraph.""" """Indented quote using the template's Quote style."""
para = doc.add_paragraph() _add_styled_paragraph(doc, text, style="Quote")
_set_rtl_paragraph(para)
para.alignment = WD_ALIGN_PARAGRAPH.RIGHT
run = para.add_run(text)
run.font.name = FONT_NAME
run.font.size = Pt(11)
run.italic = True
_set_rtl_run(run)
pf = para.paragraph_format
pf.left_indent = Cm(1.5)
pf.right_indent = Cm(1.5)
pf.line_spacing = LINE_SPACING
def _add_image_placeholder(doc, description: str) -> None: def _add_image_placeholder(doc, description: str) -> None:
"""Add image placeholder box.""" _add_styled_paragraph(doc, f"[{description}]", style="Normal",
_add_paragraph(doc, f"[{description}]", alignment=WD_ALIGN_PARAGRAPH.CENTER)
alignment=WD_ALIGN_PARAGRAPH.CENTER,
font_size=Pt(10))
def _add_spacer(doc) -> None:
"""Add an empty paragraph as a visual spacer."""
para = doc.add_paragraph(style="Normal")
_mark_paragraph_rtl(para)
# ── Main export ─────────────────────────────────────────────────── # ── Main export ───────────────────────────────────────────────────
@@ -241,16 +354,14 @@ async def export_decision(
else: else:
ordered_blocks = list(rows) ordered_blocks = list(rows)
# Create document if not TEMPLATE_PATH.exists():
doc = Document() raise FileNotFoundError(
f"Template not found at {TEMPLATE_PATH}. "
"Run scripts/convert_decision_template.py first."
)
# Set page margins doc = Document(str(TEMPLATE_PATH))
for section in doc.sections: _clear_body(doc)
section.top_margin = PAGE_MARGIN
section.bottom_margin = PAGE_MARGIN
section.left_margin = PAGE_MARGIN
section.right_margin = PAGE_MARGIN
_set_rtl_section(section)
# Write blocks with bookmarks wrapping each block (anchors for revisions) # Write blocks with bookmarks wrapping each block (anchors for revisions)
bm_counter = [_BOOKMARK_ID_START] bm_counter = [_BOOKMARK_ID_START]
@@ -291,93 +402,132 @@ async def export_decision(
def _write_block_to_docx(doc, block_id: str, title: str, content: str) -> None: def _write_block_to_docx(doc, block_id: str, title: str, content: str) -> None:
"""Write a single block to the DOCX document.""" """Write a single block to the DOCX document using template styles."""
# Header blocks (א-ד) # Header blocks (א-ד)
if block_id == "block-alef": if block_id == "block-alef":
for line in content.split("\n"): for line in content.split("\n"):
if line.strip(): if line.strip():
_add_centered_paragraph(doc, line.strip(), bold=True, font_size=FONT_SIZE_HEADING) _add_styled_paragraph(doc, line.strip(), style="Heading 1",
alignment=WD_ALIGN_PARAGRAPH.CENTER)
return return
if block_id == "block-bet": if block_id == "block-bet":
_add_paragraph(doc, "", space_after=Pt(6)) # spacer _add_spacer(doc)
for line in content.split("\n"): for line in content.split("\n"):
if line.strip(): if line.strip():
_add_centered_paragraph(doc, line.strip(), bold=False, font_size=FONT_SIZE_BODY) _add_centered_paragraph(doc, line.strip(), bold=False)
return return
if block_id == "block-gimel": if block_id == "block-gimel":
_add_paragraph(doc, "", space_after=Pt(6)) _add_spacer(doc)
lines = content.split("\n") for line in content.split("\n"):
for line in lines:
stripped = line.strip() stripped = line.strip()
if not stripped: if not stripped:
continue continue
if stripped == "נגד": if stripped == "נגד":
_add_centered_paragraph(doc, "— נגד —", bold=True, font_size=FONT_SIZE_BODY) _add_centered_paragraph(doc, "— נגד —", bold=True)
else: else:
_add_centered_paragraph(doc, stripped, bold=False, font_size=FONT_SIZE_BODY) _add_centered_paragraph(doc, stripped, bold=False)
return return
if block_id == "block-dalet": if block_id == "block-dalet":
_add_paragraph(doc, "", space_after=Pt(12)) # spacer _add_spacer(doc)
_add_centered_paragraph(doc, "החלטה", bold=True, font_size=FONT_SIZE_TITLE) # Avoid style=Title: its rFonts use theme fonts (majorHAnsi / majorBidi)
_add_paragraph(doc, "", space_after=Pt(12)) # and 28pt size — renders Hebrew oversized and in the wrong face.
# Heading 1 carries David and proper RTL, bold + center gives the
# same visual weight.
para = _add_styled_paragraph(doc, "החלטה", style="Heading 1",
alignment=WD_ALIGN_PARAGRAPH.CENTER,
bold=True)
_suppress_paragraph_numbering(para)
_add_spacer(doc)
return return
if block_id == "block-yod-bet": if block_id == "block-yod-bet":
_add_paragraph(doc, "", space_after=Pt(24)) # spacer _add_spacer(doc)
for line in content.split("\n"): for line in content.split("\n"):
if line.strip(): if line.strip():
_add_centered_paragraph(doc, line.strip(), bold=False, font_size=FONT_SIZE_BODY) _add_centered_paragraph(doc, line.strip(), bold=False)
return return
# Content blocks (ה-יא) — parse paragraphs # Content blocks (ה-יא) — parse paragraphs
paragraphs = content.split("\n") for para_text in content.split("\n"):
for para_text in paragraphs: stripped = _strip_dashes(para_text.strip())
stripped = para_text.strip()
if not stripped: if not stripped:
continue continue
# Section headings (e.g., "תמצית טענות הצדדים", "טענות העוררים") # Markdown H1/H2/H3 → template heading styles
if _is_section_heading(stripped): md_heading = re.match(r"^(#{1,6})\s+(.*)$", stripped)
_add_paragraph(doc, stripped, bold=True, font_size=FONT_SIZE_HEADING, if md_heading:
space_after=Pt(6)) level = len(md_heading.group(1))
heading_text = md_heading.group(2).strip()
style = "Heading 1" if level == 1 else f"Heading {min(level, 3)}"
_add_heading(doc, heading_text, style=style)
continue
# Standalone `**...**` line — treat as a sub-heading (Heading 3)
stand_bold = re.match(r"^\*\*([^\n*]+?)\*\*$", stripped)
if stand_bold:
_add_heading(doc, stand_bold.group(1).strip(), style="Heading 3")
continue
if _is_section_heading(stripped):
_add_heading(doc, stripped, style="Heading 2")
continue continue
# Blockquotes (indented quotes from protocols/rulings)
if stripped.startswith('"') or stripped.startswith("״") or stripped.startswith(">"): if stripped.startswith('"') or stripped.startswith("״") or stripped.startswith(">"):
clean = stripped.lstrip(">").strip().strip('"').strip("״").strip('"') clean = stripped.lstrip(">").strip().strip('"').strip("״").strip('"')
_add_blockquote(doc, clean) _add_blockquote(doc, clean)
continue continue
# Image placeholders if "📷" in stripped or (stripped.startswith("[") and "תמונה" in stripped):
if "📷" in stripped or stripped.startswith("[") and "תמונה" in stripped:
_add_image_placeholder(doc, stripped.strip("[]📷 ")) _add_image_placeholder(doc, stripped.strip("[]📷 "))
continue continue
# Regular numbered paragraph or plain text # Numbered body paragraph ("1. text") → List Paragraph with auto-num.
_add_paragraph(doc, stripped) # The literal prefix is dropped; Word renders "1. 2. 3. ..." via numId.
num_match = _NUM_PREFIX_RE.match(stripped)
if num_match:
body_text = num_match.group(2).strip()
_add_styled_paragraph(doc, body_text, style="List Paragraph")
continue
_add_styled_paragraph(doc, stripped, style="Normal")
def _is_section_heading(text: str) -> bool: _SECTION_HEADING_PATTERNS = [
"""Detect section headings in decision text.""" re.compile(p) for p in (
heading_patterns = [ # Block-level titles
r"^פתח\s+דבר",
r"^רקע\s+עובדתי",
r"^תמצית\s+טענות", r"^תמצית\s+טענות",
r"^טענות\s+הצדדים",
r"^טענות\s+העוררי", r"^טענות\s+העוררי",
r"^טענות\s+המשיב",
r"^עמדת\s+הוועדה", r"^עמדת\s+הוועדה",
r"^עמדת\s+מבקשי", r"^עמדת\s+מבקשי",
r"^ההליכים\s+בפני", r"^ההליכים\s+בפני",
r"^הליכים\s+בפני",
r"^דיון\s+והכרעה", r"^דיון\s+והכרעה",
r"^סוף\s+דבר", r"^סוף\s+דבר",
r"^סיכום", r"^סיכום",
r"^פתח\s+דבר", # Subsection titles produced by legal-writer inside block-vav/block-tet
r"^המצב\s+התכנוני",
r"^הליכי\s+הרישוי",
r"^שומת\s+ההשבחה",
r"^הליך\s+השומה",
r"^הגשת\s+הערר",
r"^תכניות\s+מתאר",
r"^תכניות\s+מפורטות",
r"^תכניות\s+חלות", r"^תכניות\s+חלות",
] r"^תכניות\s+החלות",
for pattern in heading_patterns: r"^מדיניות\s+מהנדס",
if re.search(pattern, text): r"^היתרי\s+בני",
return True r"^היתר\s+בני",
# Short bold-like lines (under 60 chars, not numbered) )
if len(text) < 60 and not re.match(r"^\d+\.", text): ]
return False
return False
def _is_section_heading(text: str) -> bool:
"""Detect legal-decision section headings — mapped to Heading 2 style."""
return any(p.search(text) for p in _SECTION_HEADING_PATTERNS)

View File

@@ -13,12 +13,20 @@ from lxml import etree
from legal_mcp.services.docx_exporter import ( from legal_mcp.services.docx_exporter import (
_BOOKMARK_ID_START, _BOOKMARK_ID_START,
HEBREW_FONT,
_add_styled_paragraph,
_insert_bookmark_end, _insert_bookmark_end,
_insert_bookmark_start, _insert_bookmark_start,
_mark_paragraph_rtl,
_mark_run_rtl,
_strip_dashes,
_wrap_block_with_bookmarks, _wrap_block_with_bookmarks,
_write_block_to_docx,
) )
from legal_mcp.services.docx_reviser import NSMAP, _w, list_bookmarks from legal_mcp.services.docx_reviser import NSMAP, _w, list_bookmarks
from docx.oxml.ns import qn
def test_insert_bookmark_helpers_create_valid_xml(tmp_path: Path) -> None: def test_insert_bookmark_helpers_create_valid_xml(tmp_path: Path) -> None:
doc = Document() doc = Document()
@@ -101,3 +109,119 @@ def test_multiple_blocks_get_unique_bookmark_ids(tmp_path: Path) -> None:
names = list_bookmarks(out) names = list_bookmarks(out)
assert set(names) == {"block-alef", "block-bet", "block-gimel"} assert set(names) == {"block-alef", "block-bet", "block-gimel"}
# ── RTL / David-font invariants ───────────────────────────────────
# These guard against regressions where Hebrew renders LTR or in the wrong
# font slot (Times New Roman instead of David). See plan file for context.
def test_mark_paragraph_rtl_adds_bidi_directly_in_pPr() -> None:
doc = Document()
p = doc.add_paragraph("טקסט בעברית")
_mark_paragraph_rtl(p)
pPr = p._p.find(qn("w:pPr"))
assert pPr is not None
# <w:bidi/> must be a direct child of pPr (paragraph direction),
# NOT nested inside <w:rPr>.
assert pPr.find(qn("w:bidi")) is not None
# paragraph-mark rPr still gets <w:rtl/>
rPr = pPr.find(qn("w:rPr"))
assert rPr is not None and rPr.find(qn("w:rtl")) is not None
def test_mark_run_rtl_forces_david_on_all_font_slots() -> None:
doc = Document()
p = doc.add_paragraph()
run = p.add_run("טקסט")
_mark_run_rtl(run)
rPr = run._r.find(qn("w:rPr"))
assert rPr is not None
fonts = rPr.find(qn("w:rFonts"))
assert fonts is not None
for slot in ("w:ascii", "w:hAnsi", "w:cs", "w:eastAsia"):
assert fonts.get(qn(slot)) == HEBREW_FONT, f"{slot} not {HEBREW_FONT}"
assert rPr.find(qn("w:rtl")) is not None
def test_styled_paragraph_applies_bidi_and_david() -> None:
"""End-to-end: _add_styled_paragraph produces pPr/bidi + rFonts/cs=David."""
doc = Document()
_add_styled_paragraph(doc, "פסקה עברית", style="Normal")
p = doc.paragraphs[-1]
assert p._p.find(qn("w:pPr")).find(qn("w:bidi")) is not None
run = p.runs[0]
fonts = run._r.find(qn("w:rPr")).find(qn("w:rFonts"))
assert fonts.get(qn("w:cs")) == HEBREW_FONT
def test_block_dalet_does_not_use_title_style() -> None:
"""Title style uses theme fonts and 28pt — avoid for Hebrew."""
doc = Document()
_write_block_to_docx(doc, "block-dalet", title="", content="")
styles_used = {p.style.name for p in doc.paragraphs}
assert "Title" not in styles_used, (
f"block-dalet should not produce a Title-styled paragraph, got {styles_used}"
)
# The 'החלטה' text must still appear somewhere
texts = [p.text for p in doc.paragraphs]
assert any("החלטה" in t for t in texts)
# ── Heading overrides, numbered-list, dash strip ──────────────────
def test_strip_dashes_removes_em_and_en_dashes() -> None:
assert _strip_dashes("תכנית 1454198 — אושרה ביום") == "תכנית 1454198 אושרה ביום"
assert _strip_dashes("א ב") == "א ב"
assert _strip_dashes("no dash") == "no dash"
# Collapsed whitespace
assert _strip_dashes("רקע — עובדתי") == "רקע עובדתי"
def test_heading2_gets_justified_and_no_numbering() -> None:
"""Section heading → Heading 2 with jc=both and numId=0."""
doc = Document()
_write_block_to_docx(doc, "block-vav", title="", content="דיון והכרעה")
heading = next(p for p in doc.paragraphs if p.style.name == "Heading 2")
pPr = heading._p.find(qn("w:pPr"))
jc = pPr.find(qn("w:jc"))
assert jc is not None and jc.get(qn("w:val")) == "both"
numPr = pPr.find(qn("w:numPr"))
assert numPr is not None
numId = numPr.find(qn("w:numId"))
assert numId is not None and numId.get(qn("w:val")) == "0"
def test_heading3_gets_justified_not_centered() -> None:
"""Heading 3 in template has jc=center — override to jc=both."""
doc = Document()
_write_block_to_docx(doc, "block-vav", title="", content="**המצב התכנוני**")
heading = next(p for p in doc.paragraphs if p.style.name == "Heading 3")
jc = heading._p.find(qn("w:pPr")).find(qn("w:jc"))
assert jc is not None and jc.get(qn("w:val")) == "both"
def test_numbered_paragraph_uses_list_paragraph_and_strips_prefix() -> None:
"""'1. text' → List Paragraph style, literal '1. ' removed."""
doc = Document()
_write_block_to_docx(
doc, "block-vav", title="",
content="1. עניינו של ערר זה.\n2. שכונת נווה יעקב.",
)
lp = [p for p in doc.paragraphs if p.style.name == "List Paragraph"]
assert len(lp) == 2
assert lp[0].text.startswith("עניינו")
assert not lp[0].text.startswith("1.")
assert lp[1].text.startswith("שכונת")
def test_body_content_has_no_em_dashes() -> None:
"""Content with em-dashes is rendered without them."""
doc = Document()
_write_block_to_docx(
doc, "block-vav", title="",
content="3. תכנית 5924 — קובעת את שטחי הבנייה.",
)
texts = "\n".join(p.text for p in doc.paragraphs)
assert "" not in texts