Rewire the source-document staging writes onto the unified storage layer
(INV-STG1), replacing direct shutil.copy2 calls:
- tools/documents.py: case originals + training-corpus uploads
- services/ingest.py: _stage_file (now async) — covers precedent-library,
internal-decisions, and digests (the canonical intake helper)
- services/digest_library.py: awaits the now-async _stage_file
Each write goes through storage.put_file(..., bucket=DOCUMENTS) with the
DATA_DIR-relative key; the Hebrew original filename rides as object metadata
(INV-STG2), content-type is guessed from the extension. DB path columns are
unchanged (still the absolute dest) — object_key backfill is Phase 3.
Under the default STORAGE_BACKEND=filesystem the bytes land at the exact
legacy on-disk location (put_file → shutil.copy2 to DATA_DIR/key), so this
is zero behaviour change in prod. shutil import dropped where now unused.
tests: +2 staging regression tests (file lands under DATA_DIR at the legacy
path); 20 storage + 22 ingest tests green; 242 collected with no import
breakage.
Derived/export write sites (thumbnails, extracted text, DOCX exports) are
Phase 2b. Keeps G2; advances INV-STG1. Spec: docs/spec/X14-storage-minio.md.
Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
The single choke-point for all binary file I/O (originals, derived
artifacts, exports), replacing the scattered open()/shutil/Path.write_bytes
calls across ~8 services. Backend chosen by STORAGE_BACKEND:
- filesystem (default): disk under DATA_DIR — byte-for-byte legacy behaviour
- dual: write disk + S3, read S3→disk fallback (migration window)
- s3: MinIO via aioboto3 (lazy import; absent in the filesystem path)
Keys are DATA_DIR-relative POSIX paths; the FS backend ignores the logical
bucket and keeps the existing single tree, so the default backend is zero
behaviour change. S3 maps a governance bucket (documents/immutable/derived)
→ MinIO bucket; presigned URLs are minted against the public endpoint
(browser-reachable) and carry the Hebrew filename via RFC-5987
Content-Disposition.
- config: STORAGE_BACKEND + MINIO_* (endpoint, public-endpoint, creds,
region, 3 bucket names, presign TTL)
- mcp_env_catalog: new "storage" category + 10 specs (X10/INV-ENV1)
- pyproject: aioboto3>=13 (consumed here, deployed with first use)
- tests: 18 unit tests (FS round-trip, key normalization/traversal guard,
bucket resolution, backend selection, dual write-both + S3-down fallback)
No call-sites are rewired yet — that is Phase 2 (106.3). STORAGE_BACKEND
stays filesystem in prod, so behaviour is unchanged.
Invariants: keeps G2 (one storage path replaces scattered I/O); establishes
INV-STG1 (single layer), INV-STG2 (atomic keys, Hebrew name in metadata),
INV-STG3 (governance buckets), INV-STG6 (presigned serving).
Spec: docs/spec/X14-storage-minio.md.
Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
Live drain surfaced three issues:
1. Tier-0 needed `h2` (httpx http2) — added to the court-fetch extra.
2. Supreme cases that carry a נט-format number (e.g. בר"מ 72182-06-25) were
routed to the unvalidated Tier-0 and failed, even though נט המשפט serves
Supreme cases too. classify() now parses the file-month-year triple for
Supreme prefixes; the orchestrator routes by triple-availability:
נט-format present → Tier-1 (validated, all courts)
serial-only Supreme (עע"מ 5886/24) → Tier-0
neither → clear "no public route" failure
Validated live: בר"מ 72182-06-25 fetched via Tier-1 (5-page PDF).
3. A non-`RuntimeError` fetch exception (the h2 import error) left jobs stuck
in 'running'. The fetch block now catches any Exception → _record_failure
(INV-CF2/CF3), so a job always reaches a terminal state.
+ test_supreme_with_net_format_triple. Suite 11/11.
Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
The extractor classified rule_type by SOURCE bindingness (higher-court→binding,
committee→persuasive) instead of by rule KIND. The gold-set proved it: 'binding'
appeared on 19/19 external rulings & 0 committees; 'persuasive' on 13/13
committees & 0 external — only 58% agreement with the human role tags. The two
axes (authority vs rule role) were crammed into one enum.
This splits them per INV-DM7:
- authority (binding/persuasive) — DERIVED from case_law.precedent_level
(עליון/מנהלי→binding, ועדת_ערר_מחוזית→persuasive), never stored, never
LLM-guessed. New helper halacha_quality.derive_authority; surfaced read-only
in list_halachot / goldset_list / search results.
- rule_type — now the rule ROLE only: holding/interpretive/procedural/
application/obiter. Both extractor prompts unified to this vocabulary;
_coerce_halacha no longer defaults rule_type from the source; legacy
binding→holding / persuasive→interpretive fold for safety.
UI: authority shown as a separate read-only badge (gold=מחייב / muted=משכנע)
across the review queue, precedent detail, and gold-set; the gold-set role
selector drops binding/persuasive and adds מהותי (holding).
Migration: scripts/halacha_rule_role_backfill.py re-classifies the 276 pre-split
binding/persuasive rows into a genuine role via local claude_session (run after
deploy). Gold-set correct_type/ai_correct_type 'binding'→'holding' via SQL.
Sources (≥3, per research-decision policy): OASIS LegalRuleML v1.0
(appliesAuthority/Strength as metadata orthogonal to rule logic) · SemEval-2023
Task 6 LegalEval (rhetorical roles by function, authority kept separate) ·
Bluebook signals (weight-of-authority is a separate dimension).
Invariants: ESTABLISHES INV-DM7. Upholds G1 (normalize at source — extractor
classifies role, system derives authority) and G2 (single source of truth —
authority derived, not a parallel stored field). Tests: 211 pass + new
derive_authority/coerce coverage. web-ui build + tsc clean.
Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
Halacha-extraction quality (#81) and dedup-on-insert (#82) — engine changes
(pure + tested) plus measurement/ops tooling.
halacha_quality.py
- #81.4 application gate: is_fact_dependent() (high-precision "applied to THIS
case" deixis per the strict rubric §3/§27) + FLAG_APPLICATION. compute_quality_flags
now takes rule_type and flags rule_type=='application' OR fact-dependent —
blocking auto-approve (an illustration is not a generalizable holding).
- #82.3 lexical tail signal: jaccard_shingles / normalized_levenshtein /
lexical_near_duplicate + FLAG_NEAR_DUPLICATE, for the 0.83–0.93 cosine band.
halacha_extractor.py — pass rule_type to the flag computation; re-type a
binding-labeled fact-application to 'application' (mirrors non_decision→obiter).
db.py (store_halachot_for_chunk) — dedup now fetches the nearest same-precedent
neighbor once: cosine ≥ DEDUP → skip (unchanged); cosine in [BAND, DEDUP) with
high lexical overlap → FLAG_NEAR_DUPLICATE (review, not skip — never drop a
possibly-distinct principle unreviewed).
config.py — HALACHA_DEDUP_BAND_COSINE (0.83).
Scripts:
- scripts/halacha_goldset.py (#81.7) — export stratified sample for human
tagging; score validators (P/R/F1) against the tags. Backbone for #81.8.
- scripts/halacha_batch_reconcile.py (#82.7) — conservative cross-precedent
dedup (cosine ≥0.95), dry-run report only.
- scripts/calibrate_halacha_dedup.py (#82.1) — calibrate the lexical thresholds
against the 2026-06-03 cleanup gold-set.
Deferred (documented): #82.4 merge-provenance and #82.5 DB ON CONFLICT/UNIQUE
on normalized quote are NOT included — the current skip+flag behavior is safe,
whereas a UNIQUE on normalized_quote would fail on existing dups and a blind
merge risks losing provenance; they need their own chair-reviewed migration.
#82.6 over-merge guard is moot until merge lands. #81.6 full rhetorical-role
classifier deferred (section pre-filter + application flag cover the practical
case); #81.8 blocked on the human-tagged gold-set (harness now provided).
Verified:
- pytest tests/test_halacha_quality.py — 52 passed (14 new).
- calibrate: configured (0.55,0.70) → precision 1.0 (zero false-merge), recall
0.30 — correct profile for an auto-approve-blocking signal.
- goldset export: 15-row sample CSV. batch reconcile: 819 halachot → 5
cross-precedent candidate pairs.
Invariants: G1 (normalize at source — flag at insert, not at read); §6 (no
silent swallow — suspect items flagged to review, never dropped); G2 (no
parallel path — same store_halachot_for_chunk / compute_quality_flags).
Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
#86.2 backfill + #86.3 benchmark, plus a #86.1 over-strip fix found en route.
extractor.py
- extract_nevo_ratio(): capture Nevo's מיני-רציו block (editorial holdings
summary) before it is stripped — a free professional gold-set (#86.3).
- _DECISION_START hardening (#86.2): the merged #86.1 regex over-stripped.
(a) פסק-דין headers are markdown-wrapped (**פסק דין**); the old anchor
required the keyword as the first line char with one separator, so it
missed the header and matched a citation 32K deep (עמ"נ 50567-07-21,
losing 45% of the body). Now tolerates leading markdown + 0-3 seps,
and the final-nun form (דין ן vs דינו נ).
(b) bare השופט/הנשיא matched CITATIONS ("השופט מ' חשין, פסקה 23"). The
authoring-judge line ends with a colon; we now require it.
ingest.py
- capture the ratio before stripping and store it on the row (best-effort,
non-fatal); also strip the text-upload path (was file-only).
db.py
- add case_law.nevo_ratio column (additive); allow it in update_case_law.
scripts/backfill_nevo_preamble.py (#86.2) — dry-run-by-default data migration:
finds historically-leaked rulings, captures ratio→nevo_ratio, rewrites
full_text (+content_hash), reindexes, and FLAGS (never deletes) halachot whose
quote lives in the removed preamble (review_status=pending_review +
nevo_preamble_leak flag). Safety guard: rows with keep%<--min-keep (60) are
excluded from --apply as suspected over-strip. --apply writes backup+manifest
to data/audit/ first. Chair-gated — NOT applied here.
scripts/nevo_ratio_benchmark.py (#86.3) — LLM-as-judge (local claude_session,
zero cost) measures recall/precision/granularity of our halachot vs the Nevo
ratio. Works pre- and post-backfill (reads nevo_ratio, falls back to full_text).
Verified:
- pytest tests/test_nevo_preamble.py — 12 passed (incl. citation/markdown
over-strip regressions).
- backfill dry-run: 19 leaked rulings, 27 contaminated halachot, all ≥75%
keep (the 32K over-strip is gone).
- benchmark on בג"ץ 1764/05: recall=0.875 precision=1.0 granularity=1.75x.
Invariants: G1 (normalize at source — strip/capture at ingest, not at read);
no silent swallow (contaminated halachot flagged + reported, not dropped);
data-migration is dry-run-default with backup+manifest, chair-gated.
Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
write_interim_draft failed for all blocks from the CEO MCP instance with
"Claude CLI failed (exit 1): unknown error". Two fixes:
1. Error surfacing (the certain win): on non-zero exit, capture and log
both stderr AND stdout (the CLI sometimes writes its diagnostic to
stdout or nowhere), so the next occurrence is diagnosable instead of
collapsing to "unknown error". This is why #85 was unsolved — the real
error was swallowed (engineering rule §6: no silent swallow).
2. Defensive hardening: strip Claude Code session markers (CLAUDECODE,
CLAUDE_CODE_*, CLAUDE_AGENT_*, AI_AGENT, CLAUDE_EFFORT) from the env of
nested `claude -p` calls and run them from $HOME, decoupling them from
the parent agent's session/project state. Aligns query() with the
existing query_streaming() path (which already sets cwd=HOME). Auth/
config vars are preserved.
Note: the original adapter-context failure could not be reproduced in a
plain interactive session (nested claude -p succeeds there in both old and
new code), so the env markers are a suspect, not a proven cause. The real
value is the diagnostics. Verified: nested query() returns PONG from
inside a CLAUDECODE=1 session; unit tests cover env sanitization.
Invariants: G1 (normalize at source — fix the spawn, not readers),
G2 (no parallel path — same query()), §6 (no silent error swallow).
INV: feedback_claude_session_local_only preserved (all calls stay local).
Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
הפרוסה האחרונה של GAP-48 (INV-TOOL1). 18 כלי drafting הומרו ל-{status,data,message}
דרך tools/envelope.py — כולל מסלול הפקת-ההחלטה הקריטי.
עיקרון לכלים עם כשל משמעותי (export_docx/revise_draft/apply_user_edit): err()
ברמת-המעטפת — כך שהסוכן והמשתמש רואים את הכשל; failed_gates רוכב ב-data.
שאר הכלים: ok(data=payload) להצלחה, err להיעדר-תיק/קלט-שגוי/חריגה.
6 צרכני-app.py חוּוטו (get_decision_template, apply_user_edit ×2, revise_draft,
list_bookmarks, export_docx) עם envelope_unwrap + בדיקת status=="error"→4xx,
לשמירת חוזה-ה-API (X6) ללא-שינוי. test_export_qa_gate עודכן לחוזה החדש.
בדיקות: 182/182 עוברים (כולל שערי-QA של הייצוא).
GAP-48 סגור: כל ~12 משפחות-הכלים אחידות. נותר ב-FU-14: GAP-49/50 (שובר), GAP-54.
Invariants: משלים INV-TOOL1 + G2. מתועד ב-X9 (נסגר) + gap-audit פרוסה 7.
Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
strip_nevo_preamble's _DECISION_START only matched ועדת-ערר openings (בפנינו /
הערר שבנדון / ...), so Nevo COURT judgments — exactly the ones carrying a
מיני-רציו — slipped through unstripped. The editorial mini-ratio then leaked into
the chunked body, risking that the halacha extractor reads Nevo's answer key
(contamination) and polluting the corpus. Proven on בג"ץ 1764/05: its full_text
still contained the מיני-רציו (unstripped).
Fix:
- Extend _DECISION_START with court-ruling openings: פסק-דין/פסק דין header and
the authoring-judge line (השופט/ת, כב' השופט, הנשיא, המשנה לנשיא). re.search
picks the earliest line-start match → the real opinion start, not the prose
ratio above it.
- Widen the Nevo-marker detection window 400→1500 chars so a long court/parties
header doesn't push חקיקה שאוזכרה:/מיני-רציו: out of range.
Verified on the real 1764/05 full_text: strips 2702 chars, body now starts at
'השופט ס' ג'ובראן:', מיני-רציו gone. Regression: ועדת-ערר openings still strip;
non-Nevo text untouched; markers-past-400 now detected. Suite 182 passed (6 new).
This is the anti-contamination prerequisite for the Nevo-ratio gold-set (#86.3/#81.7).
Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
After a precedent finishes extracting, a claude_session pass folds facets of the
SAME legal question (below #82's dedup cosine — the שפר 14-vs-4 / 403-17→89
granularity gap) into one canonical; the rest are marked 'rejected' (reversible:
out of the active corpus AND the review queue, but recoverable). FOLD-ONLY —
never merges distinct legal questions, never invents.
- Engine: claude_session-as-judge (local CLI, zero cost), 'high' effort — folding
needs careful judgment. One pass per precedent, runs in _extract_impl once all
chunks are done (the prompt dedups within a chunk; this catches across chunks).
- Pure, unit-tested helpers in halacha_quality: CONSOLIDATE_SYSTEM,
build_consolidation_prompt, parse_fold_groups (fails SAFE → [] on any malformed
shape; drops <2-member groups; coerces/dedups indices).
- halacha_extractor._consolidate_precedent picks the canonical per group
(approved>pending, higher confidence, quote_verified, longer) and rejects the
rest via the existing update_halachot_batch (#84). Never rejects a canonical.
Fails OPEN on any error (no CLI / parse fail → 0 folds, data untouched).
- config: HALACHA_CONSOLIDATE_ENABLED/MODEL/EFFORT.
Verified: suite 176 passed (10 new); integration vs dev DB — a 2-facet group
folds to 1 canonical + 1 rejected (tagged), distinct rules untouched, claude
error → 0 folds (fail-open).
Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
#81.3 — a post-extraction validator that flags halachot whose rule_statement is
NOT entailed by its supporting_quote (the model over-reaching beyond its source).
- Engine: claude_session-as-judge (local CLI, zero API cost) per chaim's standing
preference — one batched judge call per chunk, NOT a hosted NLI model.
- Pure, unit-tested helpers in halacha_quality: NLI_SYSTEM, build_nli_prompt,
parse_nli_verdicts (fails OPEN — any shape/label ambiguity → 'entailed').
- halacha_extractor._nli_check wraps the call; fails OPEN on any error (e.g. no
CLI in the container) so a flaky judge never blocks a genuine halacha.
- Non-entailed (neutral/contradiction) → quality_flag 'nli_unsupported' which
blocks auto-approve (routes to pending_review) via the existing store gate.
- config: HALACHA_NLI_ENABLED/MODEL/EFFORT (effort 'low' — entailment is simple).
Verified: suite 166 passed (10 new); LIVE smoke test against the real claude CLI
returned ['entailed','neutral'] for a supported vs unsupported rule.
Also commits TaskMaster #86 (Nevo preamble/ratio: anti-contamination strip fix +
gold-set benchmark) capturing today's strip_nevo_preamble findings.
Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
Bake the 2026-06-03 strict-cleanup rubric into the extraction pipeline so the
corpus stays clean at the source instead of accumulating duplicates, obiter
dicta, truncated quotes and thin restatements that clog the review queue.
#81 — quality gate:
- New pure module halacha_quality.py with unit-tested validators:
non-decision/obiter (Wambaugh markers), truncated-quote (mid-word cut),
thin-restatement (rule≈quote), quote-unverified.
- Validators run in halacha_extractor._process; a non-decision is re-typed
obiter; flags persist in new halachot.quality_flags column.
- Auto-approve now requires confidence>=threshold AND no quality flags;
flagged items route to pending_review regardless of confidence.
- Both extraction prompts hardened: reject undecided dicta, exclude
case-specific applications, require abstraction, forbid over-splitting.
#82 — dedup-on-insert (store_halachot_for_chunk):
- Within the same precedent, skip a halacha whose normalized supporting_quote
already exists, or whose rule-embedding has cosine>=HALACHA_DEDUP_COSINE
(0.93) against an already-stored one. Makes re-runs idempotent.
Migration: halachot.quality_flags TEXT[] (additive, idempotent ALTER).
Tests: 19 new unit tests; full suite 156 passed. Validated end-to-end against
dev DB (dedup skips dups, flag blocks auto-approve, re-run inserts 0).
Calibration: flags fire on only ~10% of current survivors (low false-positive).
Spec: docs/halacha-strict-rubric.md
Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
Wakeup-INSERT rule is universal (never allowlisted — hard invariant). Raw-HTTP
rule exempts the sanctioned helpers + standalone operator/admin scripts (a
distinct category per fitness-function scope differentiation + DRY: tooling
needn't reuse the FastAPI wrapper). Repo scanned clean under these rules.
Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
Dry-run surfaced 2 rows with בל"מ prefix but proceeding_type=ערר. Since the
migration strips the prefix, a wrong proceeding_type would silently lose the
בל"מ signal — must be chair-adjudicated, not auto-applied. Chair table now
flags 4 rows: 2 DUP_CHECK (8047-23) + 2 PROC_MISMATCH.
Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
Four parallel sub-agents closed the remaining critical gaps from the
26/05 Stage A/B sprint. Each block independently tested; aggregated here.
## #30/#31 finalizers (sub-agent A)
* Auto-derive practice_area in case_create from case_number prefix
(1xxx→rishuy_uvniya, 8xxx→betterment_levy, 9xxx→compensation_197);
default for CaseCreateRequest is now "" (the DB constraint catches
any stray "appeals_committee").
* practice_area.py: derive_subtype now handles axis-B domain values
(rishuy_uvniya/betterment_levy/compensation_197) without parsing the
case number; new helper derive_domain_practice_area().
* Halacha re-extraction verified unnecessary — all 6 reclassified
records already had is_binding=false and approved halachot.
* Regression tests: 6 cases in tests/test_corpus_constraints.py
covering practice_area enum, internal-committee chair/district,
external-upload arar prefix, MCP guard.
* UI: district input → Select dropdown (7 districts) in
precedent-edit-sheet.tsx, preserving legacy free-text values.
## #37 בל"מ subtypes (sub-agent B)
* 3 new appeal_subtypes: extension_request_{building_permit,
betterment_levy,compensation}. APPEALS_COMMITTEE_SUBTYPES extended,
SUBTYPES_BY_AREA mappings added.
* New helpers: is_blam_subject(), is_blam_subtype(),
derive_subtype_with_blam(case_number, subject, practice_area).
case_create now uses it to auto-detect "בקשה להארכת מועד" subjects.
* 3 methodology templates under docs/methodology/extension-request-*.md.
* paperclip_client.py mapping updated for the 3 new subtypes
(extension_request_building_permit→CMP, the other two→CMPA).
* Frontend: bilingual "בל"מ" badge + filter dropdown on cases list +
detail header; appeal-type-bars collapseBlam() merges בל"מ into its
parent domain for aggregate bars.
* Wizard auto-detects בל"מ from subject during case creation.
* 3 Berlinger cases (1017/1018/1019-03-26) migrated to
appeal_subtype=extension_request_building_permit via psql.
## #35 missing_precedents feature (sub-agent C)
* Schema V13: missing_precedents table (citation, case_id, party,
legal_topic, status, linked_case_law_id, claim_quote, ...) +
FK constraints + 3 indexes. Applied via psql + idempotent migration.
* 6 db.py service functions, 3 MCP tools, 6 FastAPI endpoints
(POST/GET/PATCH/DELETE/upload — upload routes by citation prefix
to ingest_internal_decision or ingest_precedent).
* Next.js page /missing-precedents with 5 status tabs + filters +
sidebar badge counter + detail drawer with metadata edit + smart
upload form that switches fields per committee/court.
* Bootstrap: 7 rows imported from the JSON file
(3 citations × cases, all status=closed with linked_case_law_id).
* legal-researcher.md: new §2ב.5 with missing_precedent_create
usage + dedup semantics + tool grant.
## #36 legal_arguments aggregation (sub-agent D)
* Schema V14: legal_arguments + legal_argument_propositions M:M.
Applied via psql.
* New service argument_aggregator.py with two functions —
aggregate_claims_to_arguments() (Claude CLI / claude_session) and
get_legal_arguments(). Graceful llm_unavailable handling when CLI
is missing (containers).
* 2 MCP tools + 2 API endpoints (POST .../aggregate-arguments as
BackgroundTask, GET .../legal-arguments).
* Frontend: shadcn Accordion + new legal-arguments-panel.tsx with
hierarchical (party → priority badge → arguments) display, "טיעונים"
tab on the case page, "חשב/חשב מחדש" buttons.
* scripts/backfill_legal_arguments.py + SCRIPTS.md entry — dry-run
found 8 candidate cases including 1017/1018/1019.
## Open follow-ups (intentionally deferred)
* npm run api:types in web-ui (CLAUDE.md flow) — recommended before
the next UI commit; not required for backend deployment.
* Run backfill_legal_arguments.py --apply once the container picks up
the new aggregator service.
* webhook on missing-precedents upload-close to Paperclip (optional).
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
Hebrew was rendering LTR or in Times New Roman fallback in some Word
contexts. Root cause: incomplete RTL marking and missing font hints
on the run level.
Three layers of RTL are required (per skills/docx/SKILL.md):
1. Section: <w:bidi/> in sectPr (now inherited from template)
2. Paragraph: <w:bidi/> directly in pPr (paragraph direction)
3. Run: <w:rtl/> in rPr — tells Word to use cs (complex-script) font
Without an explicit font on the run, Hebrew renders in the ascii slot
(Times New Roman). Force David on all four slots (ascii / hAnsi / cs /
eastAsia) so every shaping path picks the correct font.
Changes:
- TEMPLATE_PATH now points to skills/docx/decision_template.docx
(carries David, RTL, margins, styles); replaces hard-coded constants.
- _mark_run_rtl: writes rFonts on all four slots, then appends <w:rtl/>.
- _mark_paragraph_rtl: places <w:bidi/> directly in pPr (not nested in
rPr — that was the bug), and adds <w:rtl/> to the paragraph-mark rPr.
- _set_paragraph_jc: forces explicit jc, overriding style-inherited.
Tests:
- test_mark_paragraph_rtl_adds_bidi_directly_in_pPr — guards against
the regression where bidi was nested inside rPr.
- test_mark_run_rtl_forces_david_on_all_font_slots — ensures all four
font slots are set, not just cs.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
The "על כן" pattern for block-yod-bet was too greedy and matched mid-discussion
transitional sentences (e.g. "על כן, במקום בו..."), which caused forward-scan
to skip block-yod-alef ("סוף דבר") via the pointer advance.
Tightened to require an operative subject (אנו / הערר / הוועדה / ועדת הערר)
so terminal "על כן, אנו מחליטים" still matches but mid-block transitions don't.
Added structural_fallback for cover blocks (alef/bet/gimel/dalet) — these are
template metadata not present in user-edited DOCX bodies. Inject zero-content
anchors so apply_user_edit can still target them later. The frontend toast
distinguishes real content gaps from fallback anchors.
Also expanded heading patterns based on training corpus inspection:
- block-vav: על המקרקעין חלות / במצב התכנוני / התכניות החלות
- block-zayin: טענות העוררת
- block-chet: עיקר תגובת המשיב
- block-tet: הדיון בוועדת הערר
For case 1130-25, this raises detection from 6/12 to 11/12 blocks — only
block-yod-bet remains missing (Daphna's edit ends at "סוף דבר" + numbered
ruling, no terminal "ההחלטה" or "על כן אנו מחליטים" paragraph).
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Fixes critical bug in 1033-25: user-uploaded עריכה-*.docx files were
orphaned on disk while exports kept rebuilding from stale DB blocks.
New architecture:
- User-uploaded DOCX becomes the source of truth (cases.active_draft_path)
- System edits via XML surgery with real Word <w:ins>/<w:del> revisions
- User can Accept/Reject each change from within Word
Components:
- docx_reviser.py: XML surgery for Track Changes (15 tests)
- docx_retrofit.py: retroactive bookmark injection with Hebrew marker
detection + heading heuristic (9 tests)
- docx_exporter.py: emits bookmarks around each of the 12 blocks
- 3 new MCP tools: apply_user_edit, list_bookmarks, revise_draft
- 4 new/updated endpoints: upload (auto-registers active draft),
/exports/revise, /exports/bookmarks, /exports/{filename}/retrofit,
/active-draft
- DB migration: cases.active_draft_path column
- UI: correct banner using real v-numbers, "מקור האמת" badge,
detailed upload toast with bookmarks_added/missing_blocks
- agents: legal-exporter (3 export modes), legal-ceo (stage G for
revision handling), legal-writer (revision mode)
Multi-tenancy:
- Works for both CMP (1xxx cases) and CMPA (8xxx/9xxx cases)
- New revise-draft skill added to both companies
- deploy-track-changes.sh syncs skills CMP ↔ CMPA
- retrofit_case.py: one-off retrofit of existing files
Tests: 34 passing (15 reviser + 9 retrofit + 4 exporter bookmarks + 6 e2e)
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>