From 105d9626cace5b98aa8b55fc7fe569ea53f5363a Mon Sep 17 00:00:00 2001
From: Chaim <chaim@marcus-law.co.il>
Date: Sun, 31 May 2026 06:12:43 +0000
Subject: [PATCH 1/6] docs(spec): FU-2b internal identifier reconciliation
 design (GAP-07/08) + split external to #68
MIME-Version: 1.0
Content-Type: text/plain; charset=UTF-8
Content-Transfer-Encoding: 8bit

Deterministic migration of ~52 internal_committee rows whose case_number holds
a full citation → normalized bare number (citation_formatted already correct).
DB analysis (2026-05-31): clean 1-token extraction, 0 key-collisions, 0
citation↔case_number mismatches, no month-padding dups. Chair-gated reversible
migration (backup→dry-run→approve→apply). One edge for chair: 8047/23 ערר vs בל"מ.
External (#68/FU-2c) split out — its citation_formatted is inconsistent.
Verified all 11 case_law FKs use id(UUID), not case_number → rename is FK-safe.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
---
 .taskmaster/tasks/tasks.json                  |  13 +++
 ...1-fu2b-identifier-reconciliation-design.md | 101 ++++++++++++++++++
 2 files changed, 114 insertions(+)
 create mode 100644 docs/superpowers/specs/2026-05-31-fu2b-identifier-reconciliation-design.md
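
Note for review: the post-apply check in §4 step 4 of the spec (every internal case_number matches the bare regex) could be sketched as below. This is a minimal sketch and not code from this series: `non_canonical` and `_BARE_RE` are hypothetical names, and the pattern is assumed to be the spec's extraction regex with `/` no longer accepted as a separator.

```python
import re

# Assumed canonical shape after the migration: NNNN-YY or NNNN-MM-YY, '-' only.
# This is the spec's token regex with '/' removed from the separator class.
_BARE_RE = re.compile(r"[0-9]{2,6}(?:-[0-9]{1,2}){1,2}")


def non_canonical(case_numbers: list[str]) -> list[str]:
    """Return the values that are not a canonical bare number (hypothetical helper)."""
    return [cn for cn in case_numbers if not _BARE_RE.fullmatch(cn or "")]
```

An empty result over all `internal_committee` rows would indicate that no citation is left in the identifier field.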

diff --git a/.taskmaster/tasks/tasks.json b/.taskmaster/tasks/tasks.json
index 2cfefd4..cf166bc 100644
--- a/.taskmaster/tasks/tasks.json
+++ b/.taskmaster/tasks/tasks.json
@@ -2372,6 +2372,19 @@
             "parentId": "67"
           }
         ]
+      },
+      {
+        "id": "68",
+        "title": "[FU-2c] Reconcile external_upload identifiers (case_number↔citation_formatted)",
+        "description": "External case law: case_number holds a full citation; citation_formatted does not always match (a contradiction was found: 25226-04-25 vs. 1975/24). Requires first fixing the citation_formatted↔case_number contradictions, then deciding whether the extracted docket becomes case_number or the citation remains the identifier.",
+        "details": "Source: DB check 2026-05-31 (FU-2b scoping). 22/24 external rows hold a citation in case_number; citation_formatted was generated separately (LLM) and is not reliable as ground truth. Differs from internal (0 contradictions there). Requires per-record chair review. severity: Medium. Type: data-migration + chair. Depends on a decision: is external identity the citation (FU-1) or a normalized docket (INV-ID2). Split out of FU-2b per chaim's decision of 2026-05-31.",
+        "testStrategy": "",
+        "status": "pending",
+        "dependencies": [
+          "67"
+        ],
+        "priority": "medium",
+        "subtasks": []
       }
     ],
     "metadata": {
diff --git a/docs/superpowers/specs/2026-05-31-fu2b-identifier-reconciliation-design.md b/docs/superpowers/specs/2026-05-31-fu2b-identifier-reconciliation-design.md
new file mode 100644
index 0000000..3cfac78
--- /dev/null
+++ b/docs/superpowers/specs/2026-05-31-fu2b-identifier-reconciliation-design.md
@@ -0,0 +1,101 @@
+# FU-2b: `case_number` Identifier Reconciliation (Design)
+
+**Status:** approved for design · **Date:** 2026-05-31 · **Branch:** TBD
+**Covers:** GAP-07, GAP-08 (scope: `internal_committee` only) · **Satisfies:** INV-ID1, INV-ID2, INV-DM2
+**Task:** TaskMaster #67 · **Depends on:** FU-2a (#60, the normalization function) · **Type:** **data-migration + chair-gate**
+**Out of scope:** external_upload → **#68 / FU-2c** (conflicting data, see §1).
+
+---
+
+## 1. Problem and Scope (verified against the DB, 2026-05-31)
+
+`internal_committee` is the corpus in which `case_number` must be a **normalized committee number** (X1 §1), yet
+~52/56 records hold a **full citation** in the identifier field (GAP-08, "החלטות סופר"), in violation of INV-ID2
+(a citation is a derived display field, never an identifier).
+
+**Data findings that shape the migration:**
+- **Clean, deterministic extraction:** each of the 56 records yields exactly one number token (regex `[0-9]{2,6}(?:[-/][0-9]{1,2}){1,2}`). 0 ambiguous, 0 unresolvable.
+- **Full consistency:** in 55/56 the extracted number **appears** in `citation_formatted`; **0 contradictions**. (1 record has no citation_formatted and is already bare.)
+- **0 key collisions** on (bare, proceeding_type) → **no dedup**.
+- **No with/without-month problem:** the "duplicate forms" (1024-24 vs. 1024-25, etc.) are **different years**, i.e. different cases, not padding.
+- **Single edge case for the chair:** `8047/23` exists twice, once with `proceeding_type=ערר` and once with `בל"מ` (48 chunks each). Per X1 these are **two distinct records** (ערר vs. בל"מ), but the identical chunk count warrants chair verification that they are not a mis-tagged duplicate.
+
+**External is split out (#68):** a **contradiction** was found in external (`case_number=25226-04-25` vs.
+`citation_formatted=1975/24`): citation_formatted was generated separately and is not a reliable ground truth, so it needs
+separate handling. In addition, the natural identity of external case law is the citation (there is no committee number). Out of scope for FU-2b.
+
+## 2. The Decision (based on X1 + the data findings)
+
+The canonical form of `case_number` for internal = **trim · prefix-strip · `/`→`-`** on the official number,
+**without inventing or removing a month** (X1 §1; sources: Codd 1NF · Kleppmann DDIA · SSOT, verified in X1).
+The migration is **deterministic** (not LLM): it extracts the single numeric token and normalizes it. The citation already lives
+in `citation_formatted`, so there is nothing to touch there.
+
+**Safety pattern (chair-gated reversible migration):** backup before change → dry-run that produces a reconciliation table
+→ **chair approval gate** → explicit apply → verification. This is a standard pattern for migrations on production data that would otherwise be irreversible.
+
+## 3. Components
+
+- **Script** `scripts/fu2b_reconcile_internal_case_numbers.py` (not an MCP tool; a controlled one-off migration):
+  - `--dry-run` (default): produces the reconciliation table `data/audit/fu2b-reconciliation-<ts>.csv` +
+    a chair-readable `.md`. Columns: `id, current_case_number, proposed_bare, proceeding_type,
+    citation_formatted, consistency_ok, flag`.
+  - `--apply`: requires an approval file (see §4); backs up, then executes.
+  - Processes **only** `source_kind='internal_committee'` and **only** records where `proposed_bare != case_number`
+    (idempotent: already-bare records are not touched).
+- **Extraction:** `_extract_bare(case_number) -> str|None`: single-token regex + `_canonical_case_number`
+  (from FU-2a, db.py) for the final normalization. If 0 or >1 tokens → `None` + flag `NEEDS_CHAIR`.
+- **Consistency guard:** if `proposed_bare` does **not** appear in `citation_formatted` → flag `MISMATCH` (not
+  applied automatically; goes to the chair). (Currently there are 0 such rows, but the script checks at run time.)
+- **Backup:** before apply, write `data/audit/fu2b-backup-<ts>.csv` = `(id, old_case_number)` for every record
+  that will change → a trivial revert script.
+- **Edge case 8047/23:** the script does **not** merge; it marks the pair with flag `DUP_CHECK` in the table. The decision (distinct vs.
+  duplicate) is the chair's; if it is a duplicate, deletion is a separate manual step (not part of the deterministic apply).
+
+## 4. Chair Approval Gate
+
+1. Run `--dry-run` → reconciliation table (`.md`) + summary (how many will change, which flags).
+2. **Present to Dafna**: the table (52 rows: current citation → proposed bare) + the 8047/23 edge case. She marks
+   incorrect rows (if any) and decides on 8047/23.
+3. Fix flags per her comments (if any), then `--apply --approved data/audit/fu2b-approved-<ts>.csv`
+   (the approval file = the table after her review; the script applies only approved rows).
+4. Post-apply verification: every internal `case_number` matches the bare regex; 0 citations in the identifier field;
+   `search`/`get_case_by_number` still resolve (FU-2a tolerant-read + the normalization).
+
+## 5. Interaction with FU-2a (forward-consistency)
+
+FU-2a's `_canonical_case_number` normalizes prefix + separator but **does not extract a number from a full citation**. So
+if a future ingestion passes a full citation as `case_number`, a dirty identifier will be created again. **Risk assessment:** low.
+The upload form and the MCP tool pass a separate `case_number` field (usually clean). **Decision:** FU-2b is data cleanup
+only; write hardening (token extraction on create as well) is **out of scope** and will be opened only if a caller that passes a citation is found.
+(Documented; do not change write behavior without evidence.)
+
+## 6. Behavior Changes and Risk
+
+| Change | Impact | Risk |
+|--------|--------|--------|
+| `case_number` of ~52 internal → bare | Exact-match search on the number works; (case_number,proceeding_type) is clean | Low: deterministic, backup, chair gate, 0 collisions |
+| 8047/23 edge case | Possibly deleting a duplicate record | Medium: **only** by chair decision, separate manual deletion, not in the automatic apply |
+| citation_formatted | **Unchanged** (already correct) | None |
+| FK/relations | `case_law_relations`/`precedent_internal_citations` reference `id` (UUID), not case_number | None: changing case_number does not break relations |
+| chunks/embeddings | Foreign key `case_law_id` (UUID); does not depend on case_number | None: no re-index required |
+
+## 7. Test Strategy
+
+- **Offline unit tests** (`tests/test_fu2b_reconcile.py`): `_extract_bare`: single token → normalized bare;
+  full citation → the correct number (real examples: `"ערר (...) 403/17 אהרון ברק..."`→`403-17`,
+  `"...8136-10-24 שחר..."`→`8136-10-24`, month preserved); 0/multi-token → None + flag; consistency guard.
+- **Dry-run against the local DB**: the table is produced, rows to change = ~52, 0 MISMATCH, 1 DUP_CHECK pair (8047).
+- **Apply in a test environment**: on a copy/test case, verify idempotency (second run = 0 changes) + revert from the backup.
+- The production apply runs **only after chair approval** (not part of CI/PR; manual and controlled).
+
+## 8. Execution Order
+
+1. Red tests for `_extract_bare` + consistency guard.
+2. `_extract_bare` + the script (`--dry-run` only at first) + reconciliation-table output + backup.
+3. Green tests + dry-run against the DB → produce the table.
+4. **Stop: present the table + 8047/23 to the chair (Dafna)**: the approval gate.
+5. (After approval) implement `--apply --approved` + verification + revert script.
+6. Run apply in production (controlled) + post-apply verification + TaskMaster #67.
+
+> Steps 1-3 do not require Dafna (I prepare everything). Step 4 is the approval gate. Steps 5-6 follow her approval.

From c2de69272dce88ccca1c7ca213fa20f5c69a6b0d Mon Sep 17 00:00:00 2001
From: Chaim <chaim@marcus-law.co.il>
Date: Sun, 31 May 2026 08:09:22 +0000
Subject: [PATCH 2/6] docs(plan): FU-2b identifier-reconciliation
 implementation plan (chair-gated, TDD)

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
---
 ...26-05-31-fu2b-identifier-reconciliation.md | 401 ++++++++++++++++++
 1 file changed, 401 insertions(+)
 create mode 100644 docs/superpowers/plans/2026-05-31-fu2b-identifier-reconciliation.md
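
Note for review: the plan relies on `fu2b-backup-<ts>.csv` as the revert path but does not show the revert itself. The following is a minimal sketch, assuming the backup carries exactly the `id,old_case_number` header that `_apply` writes. `load_backup`, `revert`, and `REVERT_SQL` are hypothetical names that are not part of this series, and `conn` is assumed to expose asyncpg's `Connection.execute(query, *args)`.

```python
import asyncio
import csv
from pathlib import Path

# Same scope guard as the forward UPDATE in the plan.
REVERT_SQL = (
    "UPDATE case_law SET case_number=$2 WHERE id=$1 "
    "AND source_kind='internal_committee'"
)


def load_backup(path: Path) -> list[tuple[str, str]]:
    """Read (id, old_case_number) pairs from a backup CSV (assumed header)."""
    with path.open(encoding="utf-8", newline="") as f:
        return [(r["id"], r["old_case_number"]) for r in csv.DictReader(f)]


async def revert(conn, pairs: list[tuple[str, str]]) -> int:
    """Restore each old case_number; returns the number of statements issued."""
    for row_id, old in pairs:
        await conn.execute(REVERT_SQL, row_id, old)
    return len(pairs)
```

In a real run the loop would probably sit inside a single transaction so that a partial revert cannot be left behind; that part is omitted here to keep the sketch offline-testable.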

diff --git a/docs/superpowers/plans/2026-05-31-fu2b-identifier-reconciliation.md b/docs/superpowers/plans/2026-05-31-fu2b-identifier-reconciliation.md
new file mode 100644
index 0000000..57b34a4
--- /dev/null
+++ b/docs/superpowers/plans/2026-05-31-fu2b-identifier-reconciliation.md
@@ -0,0 +1,401 @@
+# FU-2b: Internal Identifier Reconciliation — Implementation Plan
+
+> **For agentic workers:** REQUIRED SUB-SKILL: Use superpowers:subagent-driven-development (recommended) or superpowers:executing-plans to implement this plan task-by-task. Steps use checkbox (`- [ ]`) syntax for tracking.
+
+**Goal:** Build a reversible, chair-gated migration script that rewrites `internal_committee` `case_number` values currently holding a full citation into the canonical normalized bare number (X1: trim · prefix-strip · `/`→`-`, month preserved), leaving `citation_formatted` untouched.
+
+**Architecture:** A standalone `scripts/` migration (not editable-service code), `--dry-run` by default. Dry-run emits a reconciliation table (CSV + Hebrew Markdown) for chair review; `--apply --approved <csv>` writes a backup then updates only chair-approved rows. Extraction is deterministic (single number-token regex) — no LLM. The production apply runs only AFTER Dafna approves the table.
+
+**Tech Stack:** Python 3.12, asyncpg, PostgreSQL@localhost:5433, pytest offline, `.venv` at `mcp-server/.venv`.
+
+**Spec:** [docs/superpowers/specs/2026-05-31-fu2b-identifier-reconciliation-design.md](../specs/2026-05-31-fu2b-identifier-reconciliation-design.md)
+
+**Run script:** `PY=/home/chaim/legal-ai/mcp-server/.venv/bin/python; $PY scripts/fu2b_reconcile_internal_case_numbers.py` (dry-run)
+**Run tests:** `cd ~/legal-ai/mcp-server && .venv/bin/python -m pytest tests/test_fu2b_reconcile.py -v`
+
+---
+
+## File Structure
+
+- **Create** `scripts/fu2b_reconcile_internal_case_numbers.py` — the migration (pure `_extract_bare` + reconciliation builder + table/backup/revert writers + argparse `--dry-run`/`--apply`).
+- **Create** `mcp-server/tests/test_fu2b_reconcile.py` — offline tests for `_extract_bare` + consistency flagging (imports the script module via sys.path).
+- **Modify** `scripts/SCRIPTS.md` — register the new script (CLAUDE.md rule).
+- **Artifact (produced, committed for review)** `data/audit/fu2b-reconciliation-<ts>.md` — the chair table from the dry-run.
+
+No service code changes; no schema change. FK-safe (all `case_law` FKs use `id` UUID — verified).
+
+---
+
+## Task 1: Failing tests for `_extract_bare`
+
+**Files:** Create `mcp-server/tests/test_fu2b_reconcile.py`
+
+- [ ] **Step 1: Write the failing tests**
+
+```python
+"""FU-2b: deterministic bare-number extraction (offline)."""
+from __future__ import annotations
+
+import importlib.util
+from pathlib import Path
+
+import pytest
+
+# Load the migration script as a module (it lives in scripts/, not a package).
+_SCRIPT = Path(__file__).resolve().parents[2] / "scripts" / "fu2b_reconcile_internal_case_numbers.py"
+_spec = importlib.util.spec_from_file_location("fu2b_reconcile", _SCRIPT)
+fu2b = importlib.util.module_from_spec(_spec)
+_spec.loader.exec_module(fu2b)
+
+
+@pytest.mark.parametrize("raw,expected_bare", [
+    ("ערר ‏(‏ועדות ערר - תכנון ובנייה ירושלים‏)‏ 403/17 אהרון ברק נ'", "403-17"),
+    ("ערר (...) 8136-10-24 שחר שות'", "8136-10-24"),          # month preserved
+    ("בל\"מ (...) 1028/20 חלוואני ריאד", "1028-20"),
+    ("8047/23", "8047-23"),                                     # already-bare-ish
+    ("ערר 81002-01-21", "81002-01-21"),
+])
+def test_extract_bare_single_token(raw, expected_bare):
+    bare, flag = fu2b._extract_bare(raw)
+    assert bare == expected_bare
+    assert flag == "OK"
+
+
+def test_extract_bare_no_number():
+    bare, flag = fu2b._extract_bare("ערר אדלר נ' הוועדה")
+    assert bare is None and flag == "NO_NUMBER"
+
+
+def test_extract_bare_multiple_numbers_flagged():
+    # Two case-number-shaped tokens → ambiguous, must NOT auto-pick.
+    bare, flag = fu2b._extract_bare("ערר 403/17 ו-1024/24 מאוחדים")
+    assert bare is None and flag == "MULTI_NUMBER"
+
+
+def test_extract_bare_preserves_month_not_padding():
+    # Month kept exactly; 2-part stays 2-part (no invented month).
+    assert fu2b._extract_bare("ערר 8126/24 פלוני")[0] == "8126-24"
+    assert fu2b._extract_bare("ערר 8126-03-25 פלוני")[0] == "8126-03-25"
+
+
+def test_consistency_flag_when_bare_absent_from_citation():
+    # proposed bare must appear in citation_formatted, else MISMATCH.
+    assert fu2b._consistency_flag("403-17", "ערר (...) 403/17 אהרון ברק") == "OK"
+    assert fu2b._consistency_flag("403-17", "ערר (...) 1975/24 מישהו אחר") == "MISMATCH"
+    assert fu2b._consistency_flag("403-17", "") == "NO_CITATION"
+```
+
+- [ ] **Step 2: Run to verify failure**
+
+Run: `cd ~/legal-ai/mcp-server && .venv/bin/python -m pytest tests/test_fu2b_reconcile.py -v`
+Expected: FAIL — `FileNotFoundError`/`ModuleNotFoundError` (script doesn't exist) or `AttributeError: _extract_bare`.
+
+- [ ] **Step 3: Commit**
+
+```bash
+cd ~/legal-ai
+git add mcp-server/tests/test_fu2b_reconcile.py
+git commit -m "test(fu2b): failing tests for bare-number extraction (FU-2b)"
+```
+
+---
+
+## Task 2: The migration script (dry-run + apply + backup)
+
+**Files:** Create `scripts/fu2b_reconcile_internal_case_numbers.py`
+
+- [ ] **Step 1: Write the script**
+
+```python
+#!/usr/bin/env python3
+"""FU-2b — reconcile internal_committee case_number → canonical bare number.
+
+Rewrites case_number values that currently hold a full citation into the
+canonical normalized bare number (X1: trim · prefix-strip · '/'→'-', month
+preserved). citation_formatted is the display field and is left untouched.
+
+DETERMINISTIC — no LLM. Extraction takes the single case-number-shaped token
+from the value; 0 or >1 tokens are flagged for chair review, never guessed.
+
+Usage (must use the mcp-server venv — asyncpg/pgvector vendored there):
+    PY=/home/chaim/legal-ai/mcp-server/.venv/bin/python
+
+    # Dry-run (default): builds the reconciliation table for chair review.
+    $PY scripts/fu2b_reconcile_internal_case_numbers.py
+
+    # Apply ONLY the chair-approved rows (after Dafna's review), backup first:
+    $PY scripts/fu2b_reconcile_internal_case_numbers.py --apply \
+        --approved data/audit/fu2b-approved-<ts>.csv
+
+Scope: source_kind='internal_committee' only (external → #68/FU-2c). FK-safe:
+all case_law FKs reference case_law.id (UUID), not case_number.
+"""
+from __future__ import annotations
+
+import argparse
+import asyncio
+import csv
+import os
+import re
+import sys
+from datetime import datetime, timezone
+from pathlib import Path
+
+REPO_ROOT = Path(__file__).resolve().parent.parent
+sys.path.insert(0, str(REPO_ROOT / "mcp-server" / "src"))
+
+if "POSTGRES_URL" not in os.environ:
+    os.environ["POSTGRES_URL"] = (
+        f"postgres://{os.environ.get('POSTGRES_USER','legal_ai')}:"
+        f"{os.environ.get('POSTGRES_PASSWORD','')}@"
+        f"{os.environ.get('POSTGRES_HOST','127.0.0.1')}:"
+        f"{os.environ.get('POSTGRES_PORT','5433')}/"
+        f"{os.environ.get('POSTGRES_DB','legal_ai')}"
+    )
+
+AUDIT_DIR = REPO_ROOT / "data" / "audit"
+_TOKEN_RE = re.compile(r"[0-9]{2,6}(?:[-/][0-9]{1,2}){1,2}")
+
+
+def _extract_bare(case_number: str) -> tuple[str | None, str]:
+    """Return (canonical_bare, flag). flag ∈ {OK, NO_NUMBER, MULTI_NUMBER}.
+
+    Deterministic: finds case-number-shaped tokens (NNNN/YY or NNNN-MM-YY).
+    Exactly one → normalize '/'→'-' (month preserved, none invented). 0 or >1
+    → None + flag (chair decides; never guess).
+    """
+    tokens = _TOKEN_RE.findall(case_number or "")
+    if len(tokens) == 1:
+        return tokens[0].replace("/", "-"), "OK"
+    if not tokens:
+        return None, "NO_NUMBER"
+    return None, "MULTI_NUMBER"
+
+
+def _consistency_flag(bare: str | None, citation_formatted: str) -> str:
+    """OK if bare appears in citation_formatted; MISMATCH if not; NO_CITATION if empty."""
+    if not citation_formatted:
+        return "NO_CITATION"
+    if not bare:
+        return "NO_NUMBER"
+    # compare against the citation with separators unified, to match 403/17 vs 403-17
+    cf = citation_formatted.replace("/", "-")
+    return "OK" if bare in cf else "MISMATCH"
+
+
+async def _build_reconciliation() -> list[dict]:
+    from legal_mcp.services import db
+    pool = await db.get_pool()
+    async with pool.acquire() as conn:
+        rows = await conn.fetch(
+            "SELECT id, case_number, proceeding_type, coalesce(citation_formatted,'') AS cf "
+            "FROM case_law WHERE source_kind='internal_committee' ORDER BY case_number")
+    # Pass 1: one reconciliation row per record; DUP_CHECK is assigned in a second pass below.
+    out: list[dict] = []
+    for r in rows:
+        bare, flag = _extract_bare(r["case_number"])
+        cons = _consistency_flag(bare, r["cf"])
+        changes = bare is not None and bare != r["case_number"]
+        out.append({
+            "id": str(r["id"]),
+            "current_case_number": r["case_number"],
+            "proposed_bare": bare or "",
+            "proceeding_type": r["proceeding_type"] or "",
+            "citation_formatted": r["cf"],
+            "extract_flag": flag,
+            "consistency": cons,
+            "will_change": "yes" if changes else "no",
+        })
+    # DUP_CHECK: same proposed_bare appearing on >1 row (any proceeding_type)
+    from collections import Counter
+    bare_counts = Counter(d["proposed_bare"] for d in out if d["proposed_bare"])
+    for d in out:
+        if d["proposed_bare"] and bare_counts[d["proposed_bare"]] > 1:
+            d["dup_check"] = "DUP_CHECK"
+        else:
+            d["dup_check"] = ""
+    return out
+
+
+def _ts() -> str:
+    return datetime.now(timezone.utc).strftime("%Y%m%dT%H%M%SZ")
+
+
+def _write_table(rows: list[dict], ts: str) -> tuple[Path, Path]:
+    AUDIT_DIR.mkdir(parents=True, exist_ok=True)
+    csv_path = AUDIT_DIR / f"fu2b-reconciliation-{ts}.csv"
+    md_path = AUDIT_DIR / f"fu2b-reconciliation-{ts}.md"
+    cols = ["id", "current_case_number", "proposed_bare", "proceeding_type",
+            "citation_formatted", "extract_flag", "consistency", "dup_check", "will_change"]
+    with csv_path.open("w", newline="", encoding="utf-8") as f:
+        w = csv.DictWriter(f, fieldnames=cols)
+        w.writeheader()
+        w.writerows(rows)
+    changing = [r for r in rows if r["will_change"] == "yes"]
+    flagged = [r for r in rows if r["extract_flag"] != "OK" or r["consistency"] == "MISMATCH" or r["dup_check"]]
+    with md_path.open("w", encoding="utf-8") as f:
+        f.write(f"# FU-2b — טבלת-תיאום מזהים (internal_committee) — {ts}\n\n")
+        f.write(f"- סה\"כ רשומות: {len(rows)}\n- ישתנו: {len(changing)}\n- מסומנות לסקירה: {len(flagged)}\n\n")
+        f.write("## דורש הכרעת-יו\"ר (flags)\n\n")
+        f.write("| current_case_number | proposed_bare | proc | flags |\n|---|---|---|---|\n")
+        for r in flagged:
+            fl = " ".join(x for x in [r["extract_flag"] if r["extract_flag"] != "OK" else "",
+                                       r["consistency"] if r["consistency"] == "MISMATCH" else "",
+                                       r["dup_check"]] if x)
+            f.write(f"| {r['current_case_number'][:50]} | {r['proposed_bare']} | {r['proceeding_type']} | {fl} |\n")
+        f.write("\n## כל השינויים המוצעים\n\n")
+        f.write("| current_case_number | → proposed_bare | proc |\n|---|---|---|\n")
+        for r in changing:
+            f.write(f"| {r['current_case_number'][:55]} | {r['proposed_bare']} | {r['proceeding_type']} |\n")
+    return csv_path, md_path
+
+
+async def _apply(approved_csv: Path, ts: str) -> dict:
+    from legal_mcp.services import db
+    with approved_csv.open(encoding="utf-8") as f:
+        approved = [r for r in csv.DictReader(f)
+                    if r.get("will_change") == "yes" and r.get("proposed_bare")]
+    if not approved:
+        return {"applied": 0, "note": "no approved changing rows"}
+    AUDIT_DIR.mkdir(parents=True, exist_ok=True)
+    backup = AUDIT_DIR / f"fu2b-backup-{ts}.csv"
+    pool = await db.get_pool()
+    applied = 0
+    with backup.open("w", newline="", encoding="utf-8") as bf:
+        bw = csv.writer(bf)
+        bw.writerow(["id", "old_case_number"])
+        async with pool.acquire() as conn:
+            for r in approved:
+                old = await conn.fetchval("SELECT case_number FROM case_law WHERE id=$1", r["id"])
+                if old is None:
+                    continue
+                bw.writerow([r["id"], old])
+                await conn.execute(
+                    "UPDATE case_law SET case_number=$2 WHERE id=$1 "
+                    "AND source_kind='internal_committee'",
+                    r["id"], r["proposed_bare"])
+                applied += 1
+    return {"applied": applied, "backup": str(backup)}
+
+
+async def main() -> int:
+    parser = argparse.ArgumentParser(description="FU-2b internal case_number reconciliation")
+    parser.add_argument("--apply", action="store_true", help="apply approved changes (default: dry-run)")
+    parser.add_argument("--approved", type=str, help="path to chair-approved CSV (required with --apply)")
+    args = parser.parse_args()
+    ts = _ts()
+
+    if not args.apply:
+        rows = await _build_reconciliation()
+        csv_path, md_path = _write_table(rows, ts)
+        changing = sum(1 for r in rows if r["will_change"] == "yes")
+        flagged = sum(1 for r in rows if r["extract_flag"] != "OK" or r["consistency"] == "MISMATCH" or r["dup_check"])
+        print(f"DRY-RUN: {len(rows)} rows | will_change={changing} | flagged={flagged}")
+        print(f"  table:  {md_path}")
+        print(f"  csv:    {csv_path}")
+        print("Review the table with the chair, then run --apply --approved <reviewed.csv>.")
+        return 0
+
+    if not args.approved:
+        print("ERROR: --apply requires --approved <csv> (the chair-reviewed table).", file=sys.stderr)
+        return 2
+    result = await _apply(Path(args.approved), ts)
+    print(f"APPLIED: {result}")
+    return 0
+
+
+if __name__ == "__main__":
+    sys.exit(asyncio.run(main()))
+```
+
+- [ ] **Step 2: Run the unit tests**
+
+Run: `cd ~/legal-ai/mcp-server && .venv/bin/python -m pytest tests/test_fu2b_reconcile.py -v`
+Expected: ALL pass (extraction + flags + consistency).
+
+- [ ] **Step 3: Commit**
+
+```bash
+cd ~/legal-ai
+chmod +x scripts/fu2b_reconcile_internal_case_numbers.py
+git add scripts/fu2b_reconcile_internal_case_numbers.py
+git commit -m "feat(fu2b): chair-gated internal case_number reconciliation script (GAP-07/08)"
+```
+
+---
+
+## Task 3: Dry-run against the DB → produce the chair table
+
+**Files:** Produces `data/audit/fu2b-reconciliation-<ts>.{csv,md}`
+
+- [ ] **Step 1: Run the dry-run**
+
+```bash
+cd ~/legal-ai && set -a && source ~/.env 2>/dev/null && set +a
+PY=/home/chaim/legal-ai/mcp-server/.venv/bin/python
+$PY scripts/fu2b_reconcile_internal_case_numbers.py
+```
+Expected output: `DRY-RUN: 56 rows | will_change=~52 | flagged=~2` (the 2 flagged rows are the 8047/23 DUP_CHECK pair; `flagged` counts rows, not pairs). Note the exact numbers.
+
+- [ ] **Step 2: Sanity-check the produced table**
+
+Open `data/audit/fu2b-reconciliation-<ts>.md`. Verify:
+- `will_change` rows: each `current_case_number` (full citation) → a clean `proposed_bare` matching the number inside it.
+- `flagged` section: should contain the `8047-23` DUP_CHECK pair (ערר + בל"מ) and ideally nothing else (0 MULTI_NUMBER, 0 MISMATCH expected per the analysis).
+- If MULTI_NUMBER / MISMATCH rows appear unexpectedly, STOP and report them (the analysis predicted 0; an unexpected flag means the data changed and needs investigation before chair review).
+
+- [ ] **Step 3: Commit the produced table as a review artifact**
+
+```bash
+cd ~/legal-ai
+git add data/audit/fu2b-reconciliation-*.md data/audit/fu2b-reconciliation-*.csv
+git commit -m "chore(fu2b): dry-run reconciliation table for chair review (GAP-07/08)"
+```
+(If `data/audit/` is gitignored, skip the commit and report the path instead — the table still exists on disk for review.)
+
+---
+
+## Task 4: SCRIPTS.md + PR
+
+- [ ] **Step 1: Register the script in `scripts/SCRIPTS.md`**
+
+Add a row to the active-scripts table (match the file's existing table format) describing `fu2b_reconcile_internal_case_numbers.py`: purpose (FU-2b internal case_number reconciliation, GAP-07/08), status (active, chair-gated), usage (dry-run default / `--apply --approved`).
+
+- [ ] **Step 2: Full suite + commit + push + PR**
+
+```bash
+cd ~/legal-ai/mcp-server && .venv/bin/python -m pytest tests/ -q   # report summary (expect all pass)
+cd ~/legal-ai
+git add scripts/SCRIPTS.md
+git commit -m "docs(scripts): register fu2b reconciliation script (FU-2b)"
+git push -u origin fix/fu2b-identifier-reconciliation
+```
+Then create the PR via the Gitea REST API (token from `~/.git-credentials`) and merge per the standing PR+merge rule. The PR delivers the **tooling + dry-run table**; the production `--apply` is the separate gated step below.
+
+---
+
+## Task 5: [HUMAN GATE] Chair review + gated apply (NOT automated)
+
+> This task is the chair-approval gate. It is NOT executed by an implementer subagent.
+
+- [ ] **Step 1:** Present `data/audit/fu2b-reconciliation-<ts>.md` to the controller, who presents it to Dafna: the ~52 proposed changes + the `8047-23` ערר/בל"מ DUP_CHECK pair. Dafna confirms the mapping and adjudicates whether 8047/23 is two distinct proceedings (keep both) or a mis-tagged duplicate (manual delete, separate).
+- [ ] **Step 2:** Save the reviewed table as `data/audit/fu2b-approved-<ts>.csv` (rows Dafna approved; `will_change=yes` only for those).
+- [ ] **Step 3:** Run the gated apply against the DB:
+  ```bash
+  cd ~/legal-ai && set -a && source ~/.env && set +a
+  PY=/home/chaim/legal-ai/mcp-server/.venv/bin/python
+  $PY scripts/fu2b_reconcile_internal_case_numbers.py --apply --approved data/audit/fu2b-approved-<ts>.csv
+  ```
+- [ ] **Step 4:** Verify: re-run dry-run → `will_change=0` (idempotent); spot-check `get_case_by_number` still resolves a migrated case; confirm a backup CSV was written (revert path). Mark TaskMaster #67 done.
+
+---
+
+## Self-Review Notes
+
+- **GAP-07/08 (internal)** → Task 2 script + Task 3 dry-run + Task 5 gated apply. Canonical form per X1 (month preserved) — `_extract_bare` replaces only `/`→`-` on the single extracted token, never strips/pads a month.
+- **Reversible:** `_apply` writes `fu2b-backup-<ts>.csv` (id, old_case_number) before each UPDATE.
+- **Chair gate:** `--apply` requires `--approved <csv>`; production apply is Task 5 (human), not part of the PR merge.
+- **Determinism / safety:** 0/>1 token → flagged, never guessed; consistency + DUP_CHECK flags surface the 8047 edge.
+- **Scope:** `source_kind='internal_committee'` only (the UPDATE has the `AND source_kind='internal_committee'` guard); external → #68.
+- **FK-safe:** verified all 11 `case_law` FKs use `id` (UUID).
+- **Type consistency:** `_extract_bare(case_number)->(bare|None,flag)`, `_consistency_flag(bare,citation)->str` — names match tests (Task 1) and script (Task 2).

From a41fcedc286587ada73c1eacdc1d423b634d5d0e Mon Sep 17 00:00:00 2001
From: Chaim <chaim@marcus-law.co.il>
Date: Sun, 31 May 2026 08:52:48 +0000
Subject: [PATCH 3/6] test(fu2b): failing tests for bare-number extraction
 (FU-2b)

---
 mcp-server/tests/test_fu2b_reconcile.py | 50 +++++++++++++++++++++++++
 1 file changed, 50 insertions(+)
 create mode 100644 mcp-server/tests/test_fu2b_reconcile.py
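
Note for review: the first parametrized case appears to carry invisible right-to-left marks (U+200F) around the parentheses, as copied from the DB. The sketch below illustrates why that should not affect extraction: the token pattern only consumes ASCII digits and `-`/`/`, so marks elsewhere in the string are simply skipped. `tokens_in` is a hypothetical helper for illustration; the pattern is the one quoted in the spec.

```python
import re

# Same pattern as the token regex quoted in the spec and plan.
_TOKEN_RE = re.compile(r"[0-9]{2,6}(?:[-/][0-9]{1,2}){1,2}")

RLM = "\u200f"  # RIGHT-TO-LEFT MARK, invisible in most editors


def tokens_in(raw: str) -> list[str]:
    """Return every case-number-shaped token found in raw."""
    return _TOKEN_RE.findall(raw)


with_marks = f"ערר {RLM}({RLM}ועדות ערר{RLM}){RLM} 403/17 אהרון ברק"
without_marks = with_marks.replace(RLM, "")
```

Both strings yield the same single token, which is what the `OK` branch of `_extract_bare` depends on.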

diff --git a/mcp-server/tests/test_fu2b_reconcile.py b/mcp-server/tests/test_fu2b_reconcile.py
new file mode 100644
index 0000000..39c53f5
--- /dev/null
+++ b/mcp-server/tests/test_fu2b_reconcile.py
@@ -0,0 +1,50 @@
+"""FU-2b: deterministic bare-number extraction (offline)."""
+from __future__ import annotations
+
+import importlib.util
+from pathlib import Path
+
+import pytest
+
+# Load the migration script as a module (it lives in scripts/, not a package).
+_SCRIPT = Path(__file__).resolve().parents[2] / "scripts" / "fu2b_reconcile_internal_case_numbers.py"
+_spec = importlib.util.spec_from_file_location("fu2b_reconcile", _SCRIPT)
+fu2b = importlib.util.module_from_spec(_spec)
+_spec.loader.exec_module(fu2b)
+
+
+@pytest.mark.parametrize("raw,expected_bare", [
+    ("ערר ‏(‏ועדות ערר - תכנון ובנייה ירושלים‏)‏ 403/17 אהרון ברק נ'", "403-17"),
+    ("ערר (...) 8136-10-24 שחר שות'", "8136-10-24"),          # month preserved
+    ("בל\"מ (...) 1028/20 חלוואני ריאד", "1028-20"),
+    ("8047/23", "8047-23"),                                     # already-bare-ish
+    ("ערר 81002-01-21", "81002-01-21"),
+])
+def test_extract_bare_single_token(raw, expected_bare):
+    bare, flag = fu2b._extract_bare(raw)
+    assert bare == expected_bare
+    assert flag == "OK"
+
+
+def test_extract_bare_no_number():
+    bare, flag = fu2b._extract_bare("ערר אדלר נ' הוועדה")
+    assert bare is None and flag == "NO_NUMBER"
+
+
+def test_extract_bare_multiple_numbers_flagged():
+    # Two case-number-shaped tokens → ambiguous, must NOT auto-pick.
+    bare, flag = fu2b._extract_bare("ערר 403/17 ו-1024/24 מאוחדים")
+    assert bare is None and flag == "MULTI_NUMBER"
+
+
+def test_extract_bare_preserves_month_not_padding():
+    # Month kept exactly; 2-part stays 2-part (no invented month).
+    assert fu2b._extract_bare("ערר 8126/24 פלוני")[0] == "8126-24"
+    assert fu2b._extract_bare("ערר 8126-03-25 פלוני")[0] == "8126-03-25"
+
+
+def test_consistency_flag_when_bare_absent_from_citation():
+    # proposed bare must appear in citation_formatted, else MISMATCH.
+    assert fu2b._consistency_flag("403-17", "ערר (...) 403/17 אהרון ברק") == "OK"
+    assert fu2b._consistency_flag("403-17", "ערר (...) 1975/24 מישהו אחר") == "MISMATCH"
+    assert fu2b._consistency_flag("403-17", "") == "NO_CITATION"

From ab8d17fdd87f8ebded7c9ff0ea519afa113c6f14 Mon Sep 17 00:00:00 2001
From: Chaim <chaim@marcus-law.co.il>
Date: Sun, 31 May 2026 08:54:38 +0000
Subject: [PATCH 4/6] feat(fu2b): chair-gated internal case_number
 reconciliation script (GAP-07/08)

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
---
 .../fu2b_reconcile_internal_case_numbers.py   | 193 ++++++++++++++++++
 1 file changed, 193 insertions(+)
 create mode 100755 scripts/fu2b_reconcile_internal_case_numbers.py
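
Reviewer note: the extraction rule in this patch (exactly one
case-number-shaped token, otherwise flag for the chair) can be tried without a
database. The sketch below copies the `_TOKEN_RE` pattern and the logic of
`_extract_bare` from the script in this patch; the name `extract_bare` and the
sample strings are illustrative only and are not part of the change.

```python
import re

# Same pattern as _TOKEN_RE in the script: 2-6 digits followed by one or two
# groups of ('-' or '/') plus 1-2 digits, i.e. NNNN/YY or NNNN-MM-YY.
TOKEN_RE = re.compile(r"[0-9]{2,6}(?:[-/][0-9]{1,2}){1,2}")


def extract_bare(case_number):
    """Return (bare, flag); mirrors _extract_bare, for illustration only."""
    tokens = TOKEN_RE.findall(case_number or "")
    if len(tokens) == 1:
        # Exactly one token: normalize '/' to '-', keep the month if present.
        return tokens[0].replace("/", "-"), "OK"
    if not tokens:
        return None, "NO_NUMBER"
    # More than one token is ambiguous and is left for chair review.
    return None, "MULTI_NUMBER"


samples = [
    "ערר 403/17 פלוני",              # one token, slash form
    "ערר 8126-03-25 פלוני",          # one token, month kept as-is
    "ערר אדלר נ' הוועדה",            # no token
    "ערר 403/17 ו-1024/24 מאוחדים",  # two tokens, never auto-picked
]
results = [extract_bare(s) for s in samples]
for sample, result in zip(samples, results):
    print(result, "<-", sample)
```

Only rows flagged `OK` receive a proposed bare number; the other two flags
leave `proposed_bare` empty in the reconciliation table.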

diff --git a/scripts/fu2b_reconcile_internal_case_numbers.py b/scripts/fu2b_reconcile_internal_case_numbers.py
new file mode 100755
index 0000000..8ab4238
--- /dev/null
+++ b/scripts/fu2b_reconcile_internal_case_numbers.py
@@ -0,0 +1,193 @@
+#!/usr/bin/env python3
+"""FU-2b — reconcile internal_committee case_number → canonical bare number.
+
+Rewrites case_number values that currently hold a full citation into the
+canonical normalized bare number (X1: trim · prefix-strip · '/'→'-', month
+preserved). citation_formatted is the display field and is left untouched.
+
+DETERMINISTIC — no LLM. Extraction takes the single case-number-shaped token
+from the value; 0 or >1 tokens are flagged for chair review, never guessed.
+
+Usage (must use the mcp-server venv; asyncpg/pgvector are installed there):
+    PY=/home/chaim/legal-ai/mcp-server/.venv/bin/python
+
+    # Dry-run (default): builds the reconciliation table for chair review.
+    $PY scripts/fu2b_reconcile_internal_case_numbers.py
+
+    # Apply ONLY the chair-approved rows (after Dafna's review), backup first:
+    $PY scripts/fu2b_reconcile_internal_case_numbers.py --apply \
+        --approved data/audit/fu2b-approved-<ts>.csv
+
+Scope: source_kind='internal_committee' only (external → #68/FU-2c). FK-safe:
+all case_law FKs reference case_law.id (UUID), not case_number.
+"""
+from __future__ import annotations
+
+import argparse
+import asyncio
+import csv
+import os
+import re
+import sys
+from datetime import datetime, timezone
+from pathlib import Path
+
+REPO_ROOT = Path(__file__).resolve().parent.parent
+sys.path.insert(0, str(REPO_ROOT / "mcp-server" / "src"))
+
+if "POSTGRES_URL" not in os.environ:
+    os.environ["POSTGRES_URL"] = (
+        f"postgres://{os.environ.get('POSTGRES_USER','legal_ai')}:"
+        f"{os.environ.get('POSTGRES_PASSWORD','')}@"
+        f"{os.environ.get('POSTGRES_HOST','127.0.0.1')}:"
+        f"{os.environ.get('POSTGRES_PORT','5433')}/"
+        f"{os.environ.get('POSTGRES_DB','legal_ai')}"
+    )
+
+AUDIT_DIR = REPO_ROOT / "data" / "audit"
+_TOKEN_RE = re.compile(r"[0-9]{2,6}(?:[-/][0-9]{1,2}){1,2}")
+
+
+def _extract_bare(case_number: str) -> tuple[str | None, str]:
+    """Return (canonical_bare, flag). flag ∈ {OK, NO_NUMBER, MULTI_NUMBER}.
+
+    Deterministic: finds case-number-shaped tokens (NNNN/YY or NNNN-MM-YY).
+    Exactly one → normalize '/'→'-' (month preserved, none invented). 0 or >1
+    → None + flag (chair decides; never guess).
+    """
+    tokens = _TOKEN_RE.findall(case_number or "")
+    if len(tokens) == 1:
+        return tokens[0].replace("/", "-"), "OK"
+    if not tokens:
+        return None, "NO_NUMBER"
+    return None, "MULTI_NUMBER"
+
+
+def _consistency_flag(bare: str | None, citation_formatted: str) -> str:
+    """NO_CITATION if citation is empty; NO_NUMBER if bare is None; else OK if bare is in it, MISMATCH if not."""
+    if not citation_formatted:
+        return "NO_CITATION"
+    if not bare:
+        return "NO_NUMBER"
+    cf = citation_formatted.replace("/", "-")
+    return "OK" if bare in cf else "MISMATCH"
+
+
+async def _build_reconciliation() -> list[dict]:
+    from legal_mcp.services import db
+    pool = await db.get_pool()
+    async with pool.acquire() as conn:
+        rows = await conn.fetch(
+            "SELECT id, case_number, proceeding_type, coalesce(citation_formatted,'') AS cf "
+            "FROM case_law WHERE source_kind='internal_committee' ORDER BY case_number")
+    out: list[dict] = []
+    for r in rows:
+        bare, flag = _extract_bare(r["case_number"])
+        cons = _consistency_flag(bare, r["cf"])
+        changes = bare is not None and bare != r["case_number"]
+        out.append({
+            "id": str(r["id"]),
+            "current_case_number": r["case_number"],
+            "proposed_bare": bare or "",
+            "proceeding_type": r["proceeding_type"] or "",
+            "citation_formatted": r["cf"],
+            "extract_flag": flag,
+            "consistency": cons,
+            "will_change": "yes" if changes else "no",
+        })
+    from collections import Counter
+    bare_counts = Counter(d["proposed_bare"] for d in out if d["proposed_bare"])
+    for d in out:
+        d["dup_check"] = "DUP_CHECK" if (d["proposed_bare"] and bare_counts[d["proposed_bare"]] > 1) else ""
+    return out
+
+
+def _ts() -> str:
+    return datetime.now(timezone.utc).strftime("%Y%m%dT%H%M%SZ")
+
+
+def _write_table(rows: list[dict], ts: str) -> tuple[Path, Path]:
+    AUDIT_DIR.mkdir(parents=True, exist_ok=True)
+    csv_path = AUDIT_DIR / f"fu2b-reconciliation-{ts}.csv"
+    md_path = AUDIT_DIR / f"fu2b-reconciliation-{ts}.md"
+    cols = ["id", "current_case_number", "proposed_bare", "proceeding_type",
+            "citation_formatted", "extract_flag", "consistency", "dup_check", "will_change"]
+    with csv_path.open("w", newline="", encoding="utf-8") as f:
+        w = csv.DictWriter(f, fieldnames=cols)
+        w.writeheader()
+        w.writerows(rows)
+    changing = [r for r in rows if r["will_change"] == "yes"]
+    flagged = [r for r in rows if r["extract_flag"] != "OK" or r["consistency"] == "MISMATCH" or r["dup_check"]]
+    with md_path.open("w", encoding="utf-8") as f:
+        f.write(f"# FU-2b — טבלת-תיאום מזהים (internal_committee) — {ts}\n\n")
+        f.write(f"- סה\"כ רשומות: {len(rows)}\n- ישתנו: {len(changing)}\n- מסומנות לסקירה: {len(flagged)}\n\n")
+        f.write("## דורש הכרעת-יו\"ר (flags)\n\n")
+        f.write("| current_case_number | proposed_bare | proc | flags |\n|---|---|---|---|\n")
+        for r in flagged:
+            fl = " ".join(x for x in [r["extract_flag"] if r["extract_flag"] != "OK" else "",
+                                       r["consistency"] if r["consistency"] == "MISMATCH" else "",
+                                       r["dup_check"]] if x)
+            f.write(f"| {r['current_case_number'][:50]} | {r['proposed_bare']} | {r['proceeding_type']} | {fl} |\n")
+        f.write("\n## כל השינויים המוצעים\n\n")
+        f.write("| current_case_number | → proposed_bare | proc |\n|---|---|---|\n")
+        for r in changing:
+            f.write(f"| {r['current_case_number'][:55]} | {r['proposed_bare']} | {r['proceeding_type']} |\n")
+    return csv_path, md_path
+
+
+async def _apply(approved_csv: Path, ts: str) -> dict:
+    from legal_mcp.services import db
+    with approved_csv.open(encoding="utf-8") as f:
+        approved = [r for r in csv.DictReader(f)
+                    if r.get("will_change") == "yes" and r.get("proposed_bare")]
+    if not approved:
+        return {"applied": 0, "note": "no approved changing rows"}
+    AUDIT_DIR.mkdir(parents=True, exist_ok=True)
+    backup = AUDIT_DIR / f"fu2b-backup-{ts}.csv"
+    pool = await db.get_pool()
+    applied = 0
+    with backup.open("w", newline="", encoding="utf-8") as bf:
+        bw = csv.writer(bf)
+        bw.writerow(["id", "old_case_number"])
+        async with pool.acquire() as conn:
+            for r in approved:
+                old = await conn.fetchval("SELECT case_number FROM case_law WHERE id=$1", r["id"])
+                if old is None:
+                    continue
+                bw.writerow([r["id"], old])
+                status = await conn.execute(
+                    "UPDATE case_law SET case_number=$2 WHERE id=$1 "
+                    "AND source_kind='internal_committee'",
+                    r["id"], r["proposed_bare"])
+                applied += int(status.split()[-1])  # status is e.g. "UPDATE 1"; count only rows actually updated
+    return {"applied": applied, "backup": str(backup)}
+
+
+async def main() -> int:
+    parser = argparse.ArgumentParser(description="FU-2b internal case_number reconciliation")
+    parser.add_argument("--apply", action="store_true", help="apply approved changes (default: dry-run)")
+    parser.add_argument("--approved", type=str, help="path to chair-approved CSV (required with --apply)")
+    args = parser.parse_args()
+    ts = _ts()
+
+    if not args.apply:
+        rows = await _build_reconciliation()
+        csv_path, md_path = _write_table(rows, ts)
+        changing = sum(1 for r in rows if r["will_change"] == "yes")
+        flagged = sum(1 for r in rows if r["extract_flag"] != "OK" or r["consistency"] == "MISMATCH" or r["dup_check"])
+        print(f"DRY-RUN: {len(rows)} rows | will_change={changing} | flagged={flagged}")
+        print(f"  table:  {md_path}")
+        print(f"  csv:    {csv_path}")
+        print("Review the table with the chair, then run --apply --approved <reviewed.csv>.")
+        return 0
+
+    if not args.approved:
+        print("ERROR: --apply requires --approved <csv> (the chair-reviewed table).", file=sys.stderr)
+        return 2
+    result = await _apply(Path(args.approved), ts)
+    print(f"APPLIED: {result}")
+    return 0
+
+
+if __name__ == "__main__":
+    sys.exit(asyncio.run(main()))

From e46868fedaf9db7282c5fa7d0ac98146bdc99f42 Mon Sep 17 00:00:00 2001
From: Chaim <chaim@marcus-law.co.il>
Date: Sun, 31 May 2026 08:57:42 +0000
Subject: [PATCH 5/6] feat(fu2b): flag PROC_MISMATCH (case_number prefix vs
 proceeding_type) for chair
MIME-Version: 1.0
Content-Type: text/plain; charset=UTF-8
Content-Transfer-Encoding: 8bit

Dry-run surfaced 2 rows with a בל"מ prefix but proceeding_type=ערר. Because the
migration strips the prefix, a wrong proceeding_type would silently lose the
בל"מ signal, so these rows must be chair-adjudicated, never auto-applied. The
chair table now flags 4 rows: 2 DUP_CHECK (8047-23) + 2 PROC_MISMATCH.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
---
 mcp-server/tests/test_fu2b_reconcile.py       | 12 ++++++++
 .../fu2b_reconcile_internal_case_numbers.py   | 29 ++++++++++++++++---
 2 files changed, 37 insertions(+), 4 deletions(-)
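
Reviewer note: a minimal standalone sketch of the check added here, showing
why a prefix/type conflict has to be surfaced before the prefix is stripped.
`proc_mismatch` mirrors `_proc_mismatch` from this patch; the direction marks
are written as the escapes U+200F/U+200E, which is an assumption about the
invisible characters in the script, and the sample rows are illustrative.

```python
def proc_mismatch(case_number, proceeding_type):
    """True if the leading proceeding prefix disagrees with proceeding_type.

    Mirror of _proc_mismatch, for illustration only.
    """
    # Drop leading whitespace, then leading direction marks (assumed RLM/LRM).
    cn = (case_number or "").lstrip().lstrip("\u200f\u200e")
    pt = (proceeding_type or "").strip()
    # Accept both the ASCII quote and the typographic quote in the prefix.
    starts_balam = cn.startswith(('בל"מ', "בל”מ"))
    starts_arar = cn.startswith("ערר")
    if starts_balam and pt and pt != 'בל"מ':
        return True
    if starts_arar and pt and pt != "ערר":
        return True
    return False


rows = [
    ('בל"מ 1010-01-25', "ערר"),   # prefix and type disagree
    ("ערר 1024/24 נילי", "ערר"),  # prefix and type agree
    ("8047/23", 'בל"מ'),          # no prefix, nothing to contradict
]
flags = [proc_mismatch(cn, pt) for cn, pt in rows]
print(flags)
```

After the migration the first row's case_number would be just `1010-01-25`, so
proceeding_type would be the only remaining record of the proceeding kind.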

diff --git a/mcp-server/tests/test_fu2b_reconcile.py b/mcp-server/tests/test_fu2b_reconcile.py
index 39c53f5..8dbc968 100644
--- a/mcp-server/tests/test_fu2b_reconcile.py
+++ b/mcp-server/tests/test_fu2b_reconcile.py
@@ -48,3 +48,15 @@ def test_consistency_flag_when_bare_absent_from_citation():
     assert fu2b._consistency_flag("403-17", "ערר (...) 403/17 אהרון ברק") == "OK"
     assert fu2b._consistency_flag("403-17", "ערר (...) 1975/24 מישהו אחר") == "MISMATCH"
     assert fu2b._consistency_flag("403-17", "") == "NO_CITATION"
+
+
+def test_proc_mismatch_detects_prefix_vs_type_conflict():
+    # case_number prefix disagrees with proceeding_type → must flag (prefix is
+    # stripped by the migration, so a wrong proceeding_type loses the signal).
+    assert fu2b._proc_mismatch('בל"מ 1010-01-25', "ערר") is True
+    assert fu2b._proc_mismatch('בל"מ (...) 1028/20 חלוואני', "ערר") is True
+    # agreement → no flag
+    assert fu2b._proc_mismatch('ערר 1024/24 נילי', "ערר") is False
+    assert fu2b._proc_mismatch('בל"מ 1010-01-25', 'בל"מ') is False
+    # bare number with no prefix → nothing to contradict
+    assert fu2b._proc_mismatch("8047/23", 'בל"מ') is False
diff --git a/scripts/fu2b_reconcile_internal_case_numbers.py b/scripts/fu2b_reconcile_internal_case_numbers.py
index 8ab4238..8d5a73a 100755
--- a/scripts/fu2b_reconcile_internal_case_numbers.py
+++ b/scripts/fu2b_reconcile_internal_case_numbers.py
@@ -73,6 +73,24 @@ def _consistency_flag(bare: str | None, citation_formatted: str) -> str:
     return "OK" if bare in cf else "MISMATCH"
 
 
+def _proc_mismatch(case_number: str, proceeding_type: str) -> bool:
+    """True if the case_number's leading proceeding prefix disagrees with proceeding_type.
+
+    The migration strips the prefix from case_number, so a בל"מ prefix paired with
+    proceeding_type='ערר' (or vice-versa) would SILENTLY LOSE the proceeding signal.
+    Such rows must be flagged for chair adjudication, never auto-applied.
+    """
+    cn = (case_number or "").lstrip().lstrip("\u200f\u200e")  # drop RLM/LRM direction marks
+    pt = (proceeding_type or "").strip()
+    starts_balam = cn.startswith('בל"מ') or cn.startswith("בל”מ")
+    starts_arar = cn.startswith("ערר")
+    if starts_balam and pt and pt != 'בל"מ':
+        return True
+    if starts_arar and pt and pt != "ערר":
+        return True
+    return False
+
+
 async def _build_reconciliation() -> list[dict]:
     from legal_mcp.services import db
     pool = await db.get_pool()
@@ -93,6 +111,7 @@ async def _build_reconciliation() -> list[dict]:
             "citation_formatted": r["cf"],
             "extract_flag": flag,
             "consistency": cons,
+            "proc_flag": "PROC_MISMATCH" if _proc_mismatch(r["case_number"], r["proceeding_type"] or "") else "",
             "will_change": "yes" if changes else "no",
         })
     from collections import Counter
@@ -111,13 +130,14 @@ def _write_table(rows: list[dict], ts: str) -> tuple[Path, Path]:
     csv_path = AUDIT_DIR / f"fu2b-reconciliation-{ts}.csv"
     md_path = AUDIT_DIR / f"fu2b-reconciliation-{ts}.md"
     cols = ["id", "current_case_number", "proposed_bare", "proceeding_type",
-            "citation_formatted", "extract_flag", "consistency", "dup_check", "will_change"]
+            "citation_formatted", "extract_flag", "consistency", "proc_flag", "dup_check", "will_change"]
     with csv_path.open("w", newline="", encoding="utf-8") as f:
         w = csv.DictWriter(f, fieldnames=cols)
         w.writeheader()
         w.writerows(rows)
     changing = [r for r in rows if r["will_change"] == "yes"]
-    flagged = [r for r in rows if r["extract_flag"] != "OK" or r["consistency"] == "MISMATCH" or r["dup_check"]]
+    flagged = [r for r in rows if r["extract_flag"] != "OK" or r["consistency"] == "MISMATCH"
+               or r["dup_check"] or r["proc_flag"]]
     with md_path.open("w", encoding="utf-8") as f:
         f.write(f"# FU-2b — טבלת-תיאום מזהים (internal_committee) — {ts}\n\n")
         f.write(f"- סה\"כ רשומות: {len(rows)}\n- ישתנו: {len(changing)}\n- מסומנות לסקירה: {len(flagged)}\n\n")
@@ -126,7 +146,7 @@ def _write_table(rows: list[dict], ts: str) -> tuple[Path, Path]:
         for r in flagged:
             fl = " ".join(x for x in [r["extract_flag"] if r["extract_flag"] != "OK" else "",
                                        r["consistency"] if r["consistency"] == "MISMATCH" else "",
-                                       r["dup_check"]] if x)
+                                       r["proc_flag"], r["dup_check"]] if x)
             f.write(f"| {r['current_case_number'][:50]} | {r['proposed_bare']} | {r['proceeding_type']} | {fl} |\n")
         f.write("\n## כל השינויים המוצעים\n\n")
         f.write("| current_case_number | → proposed_bare | proc |\n|---|---|---|\n")
@@ -174,7 +194,8 @@ async def main() -> int:
         rows = await _build_reconciliation()
         csv_path, md_path = _write_table(rows, ts)
         changing = sum(1 for r in rows if r["will_change"] == "yes")
-        flagged = sum(1 for r in rows if r["extract_flag"] != "OK" or r["consistency"] == "MISMATCH" or r["dup_check"])
+        flagged = sum(1 for r in rows if r["extract_flag"] != "OK" or r["consistency"] == "MISMATCH"
+                      or r["dup_check"] or r["proc_flag"])
         print(f"DRY-RUN: {len(rows)} rows | will_change={changing} | flagged={flagged}")
         print(f"  table:  {md_path}")
         print(f"  csv:    {csv_path}")

From 8477fd87e7bfda800e3530429596d46fb4950ac3 Mon Sep 17 00:00:00 2001
From: Chaim <chaim@marcus-law.co.il>
Date: Sun, 31 May 2026 08:58:32 +0000
Subject: [PATCH 6/6] docs(scripts): register fu2b reconciliation script
 (FU-2b)

---
 scripts/SCRIPTS.md | 1 +
 1 file changed, 1 insertion(+)
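
Reviewer note: during `--apply` the registered script writes
`data/audit/fu2b-backup-<ts>.csv` with the columns `id` and `old_case_number`.
The series ships no rollback command, so the following is only a hedged sketch
of how such a backup could be turned into restore statements. The helper
`rollback_statements`, the `RESTORE_SQL` constant and the sample id are
hypothetical; nothing here connects to a database.

```python
import csv
import io

# Hypothetical restore statement; it keeps the same source_kind guard that
# the script uses in its forward UPDATE.
RESTORE_SQL = (
    "UPDATE case_law SET case_number=$2 WHERE id=$1 "
    "AND source_kind='internal_committee'"
)


def rollback_statements(backup_file):
    """Yield (sql, id, old_case_number) for each row of a fu2b backup CSV."""
    for row in csv.DictReader(backup_file):
        yield RESTORE_SQL, row["id"], row["old_case_number"]


# Illustrative backup content with a made-up id; real files come from --apply.
sample = io.StringIO(
    "id,old_case_number\n"
    "00000000-0000-0000-0000-000000000001,ערר 403/17 פלוני\n"
)
statements = list(rollback_statements(sample))
for sql, row_id, old in statements:
    print(row_id, "->", old)
```

Each tuple could then be passed to a database client by whoever performs the
rollback, ideally inside a single transaction.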

diff --git a/scripts/SCRIPTS.md b/scripts/SCRIPTS.md
index 2aa5044..13cb7ea 100644
--- a/scripts/SCRIPTS.md
+++ b/scripts/SCRIPTS.md
@@ -13,6 +13,7 @@
 | `sync_agents_across_companies.py` | python | **סנכרון סוכנים מ-CMP (1xxx, master) ל-CMPA (8xxx, mirror)** — Gap #25. משווה adapter_config (model/timeout/instructions/skills/etc), runtime_config (heartbeat), ושדות top-level (budget/metadata/icon/title/role). מסנן אוטומטית local skills שלא קיימים ב-mirror. לוגיקת subset (mirror יכול להחזיק יותר skills כי ה-API מוסיף required runtime skills). תומך `--verify`/`--dry-run`/`--apply [--only NAME]`. גיבוי אוטומטי. דורש `PAPERCLIP_BOARD_API_KEY`. **להריץ אחרי כל שינוי הגדרות ב-CMP.** **⚠ אם `adapter_type` שונה בין CMP ל-CMPA — הסקריפט מדלג על הסוכן עם warning. בעת מעבר adapter (למשל ל-`deepseek_local`) חובה לעדכן ידנית בשתי החברות לפני sync.** | ידני אחרי כל שינוי |
 | `fix_paperclipai_skills_drift.py` | python | סקריפט חד-פעמי (בוצע 2026-05-04) שניקה drift על `paperclipai/*` skills בין CMP ל-CMPA. הסיר `paperclip-dev` מכל 14 הסוכנים, ודאג ש-`paperclip-converting-plans-to-tasks` קיים רק על CEO ו-analyst. תומך `--apply` (ברירת מחדל: dry-run). דורש `PAPERCLIP_BOARD_API_KEY`. נשמר לרפרנס למקרה שhdrift חוזר. | חד-פעמי (בוצע) |
 | `test_retrieval_by_name.py` | python | בדיקת אחזור-לפי-שם (#52/RC-A) — מאמת ש`search_precedent_library`/`search_internal_decisions` מדרגים את ההחלטה עצמה (אגסי) מעל מי שמצטט אותה, + רגרסיות לשאילתות מהותיות. הרצה: `DOTENV_PATH=/home/chaim/.env DATA_DIR=.../data mcp-server/.venv/bin/python scripts/test_retrieval_by_name.py` (exit 0 = עבר). | ידני אחרי שינוי שכבת חיפוש |
+| `fu2b_reconcile_internal_case_numbers.py` | python | **FU-2b (GAP-07/08): reconciles `case_number` of `internal_committee` rows** from a full citation to the canonical bare number (X1: trim · prefix-strip · `/`→`-`, month preserved). Deterministic (single token; 0 or >1 tokens → flag). The default run (no flag) is a dry-run that writes the reconciliation table to `data/audit/fu2b-reconciliation-*.{csv,md}` with flags (NO_NUMBER / MULTI_NUMBER / MISMATCH / DUP_CHECK / PROC_MISMATCH). `--apply --approved <csv>` writes a backup and then updates only the rows approved by the chair. Scope: internal only (external → #68). FK-safe. | One-off, **chair-gated** (apply only after Dafna's approval) |
 | `auto-sync-cases.sh` | bash | סנכרון תיקי ערר ל-Gitea — רץ כל דקה | `* * * * *` (cron) |
 | `backup-db.sh` | bash | גיבוי PostgreSQL יומי ל-`data/backups/` (gzip) | לתזמן: `0 2 * * *` |
 | `restore-db.sh` | bash | שחזור DB מגיבוי (companion ל-backup-db.sh) | ידני |