Merge pull request 'feat(principles): legal-principles redesign: panel-3, cap-5, retroactive cull, synthesis, terminology (#152)' (#304) from worktree-canonical-synthesis into main

Merge PR #304: legal-principles redesign — panel-3 + cap-5 + cull + synthesis + terminology (#152)
2026-06-19 11:17:05 +00:00
parent ca1f0e8c66 4ca907b97f
commit ffa02ca83c
22 changed files with 2102 additions and 89 deletions
--- a/scripts/SCRIPTS.md
+++ b/scripts/SCRIPTS.md
@@ -65,6 +65,8 @@
 | `halacha_panel_calibrate.py` | python | **Panel calibration + measurement** (Trust-or-Escalate, ICLR 2025). `--source live` (default): runs the KEEP question on the gold sample and measures, against `is_holding`, precision + coverage + **split-rate** per policy, plus false-keep/false-drop (imports the judges from `halacha_panel_approve`; **must run locally**). **#133/FU-5**, `--source captured`: **zero cost** (no re-vote, no LLM); cross-references stored rounds (FU-1) against chair rulings (FU-2) via `db.panel_rounds_vs_chair` and reports split-rate + auto-precision **per round** (the loop trend: as the rubric improves, precision holds and split-rate falls); shares FU-4's `analyze_pairs` (single source). Both measurements report **anon-stability** (anonymization test #81.7) as a health metric against echo-chamber effects. `--batch`/`--limit`/`--concurrency`. | Manual: before wiring `--apply` (live) / periodic: loop tracking (captured) |
 | `halacha_rubric_distill.py` | python | **#133/FU-4: rubric distillation, PROPOSE-ONLY.** Cross-references `halacha_panel_rounds` (FU-1, votes + rationales) against the chair's rulings (FU-2, seeds in `halacha_goldset` batch `chair-live`) via `db.panel_rounds_vs_chair` (read-only), deterministically analyzes **systematic failures** (false-keep/false-drop, splits that were resolved, per-judge disagreement rate with the chair), and proposes `KEEP_SYSTEM` v2 + abstracted exemplars (local claude_session, zero cost) as a **diff report** in `data/learning/rubric-proposal-<ts>.md`. **Never auto-applied**: adopting v2 = a human edit of the constant through a PR (INV-LRN1); abstracted exemplars only (INV-LRN5); the only signal = the chair's ruling, not panel votes (anti-echo). Below 12 pairs → "אין מספיק נתונים" (not enough data). `--no-llm` (statistics only) / `--limit N`. **Must run locally**. | Periodic: after chair rulings on panel disputes have accumulated |
 | `backfill_canonical_halachot.py` | python | **V41: bootstrapping the canonical-halachot model (one-time + idempotent).** (1) Builds connected components from `equivalent_halachot` (transitive closure via union-find). (2) For each cluster: picks a canonical representative (most corroboration → confidence → earliest), creates a `canonical_halachot` row, and updates `canonical_id` + `instance_type` for every cluster member. (3) For singletons (no equivalence links): 1:1 canonical. (4) Populates `halacha_citation_corroboration.canonical_id` from `halachot.canonical_id`. `--dry-run` (default; computes and reports only) / `--apply` (writes) / `--verbose`. After a run: `canonical_statement` = the representative's wording (pending_synthesis); follow-up: `backfill_canonical_synthesis.py` (Phase 4) synthesizes a broader wording via an LLM. Run: `mcp-server/.venv/bin/python scripts/backfill_canonical_halachot.py --apply`. | **One-time** (after the V41 deploy) / idempotent as needed |
+| `cull_principles.py` | python | **#152 Phase C: retroactive cull of the principles corpus via the 3-judge panel (reversible).** Runs every existing 'original' principle through the same regime the extractor uses going forward (`services/panel_extraction.panel_keep_score`, G2): 3 judges (local Claude + DeepSeek + Gemini) vote keep + score → the approval rule (3 votes → survives · 2 votes and score ≥ 0.85 → survives · 2 votes and score < 0.85 → chair · ≤ 1 vote → rejected) → a cap of `HALACHA_PANEL_MAX_NEW`=5 per decision, by score (`apply_cap`). Rejected → `halachot.review_status='rejected'` and its canonical is also set to `rejected` (reversible; a CSV backup is written to `data/audit/` before any write). Throttled by `usage_limits` (soft stop at the usage ceiling, resumable). `--dry-run` (default) / `--apply` / `--sample N` (random decisions) / `--limit N` / `--no-throttle` / `--verbose`. **Must run locally** (3 judges). Run: `cd mcp-server && HOME=/home/chaim .venv/bin/python ../scripts/cull_principles.py --apply`. | **One-time** (initial cull) + repeatable |
+| `backfill_canonical_synthesis.py` | python | **V41 Phase 4: LLM synthesis of `canonical_statement` (idempotent + resumable).** Walks canonicals in `review_status='pending_synthesis'` (multi-instance first) and distills each into a single general wording grounded in the instances' quotes (INV-AH) via `services/canonical_synthesis.py` (single path, G2). Gates: grounding/abstention, **drift-floor** (cosine against the original, default 0.80; large drift → the original is kept), and a ban on new case citations. In every case the status advances to `pending_review` for the chair gate (G10/INV-LRN6). Opus model (`HALACHA_CANONICAL_SYNTH_MODEL`). Throttled by `usage_limits` (soft stop at the usage ceiling, resumable). `--dry-run` (default) / `--apply` / `--sample N` (random sample for inspection) / `--limit N` / `--no-throttle` / `--verbose`. CSV audit in `data/audit/canonical-synthesis-*.csv`. **Must run locally** (claude_session). Run: `cd mcp-server && HOME=/home/chaim .venv/bin/python ../scripts/backfill_canonical_synthesis.py --apply`. Ongoing: MCP tool `canonical_synthesize_pending`. | **One-time** (the initial bulk) + idempotent for new ones |
 | `halacha_batch_reconcile.py` | python | **#82.7**: offline cross-ruling dedup (conservative, **dry-run only**). Dedup-on-insert compares only within a single ruling; here the threshold is stricter (cosine ≥0.95, `--cosine`) and non-destructive: it finds near-duplicate halacha pairs across different rulings (pgvector `<=>` exact) with a lexical signal (Jaccard/Levenshtein) and reports them to a CSV in `data/audit/` for the chair's review. Does not skip, merge, or delete. `--include-pending`. **`--link`** records the pairs found as `equivalent_halachot` (parallel authority, #84.2; **deprecated post-V41**, use `backfill_canonical_halachot.py --apply` instead). Runs with the mcp-server venv. | **deprecated**: replaced by `backfill_canonical_halachot.py` (V41). Kept for audit purposes |
 | `calibrate_halacha_dedup.py` | python | **#82.1**: calibrates the lexical dedup thresholds (#82.3) against the cleanup gold set. Reads `halacha-cleanup-manifest-*.csv` (human-labeled duplicate↔survivor pairs), loads the survivor text from the DB, and sweeps (jaccard_min × levenshtein_min) with P/R/F1, marking the configured operating point. Verified that (0.55, 0.70) → **precision 1.0** (zero false merges), recall 0.30; suitable as a secondary signal that blocks auto-approve. `--manifest <path>`. Runs with the mcp-server venv | One-time: calibration (done 2026-06-06) |
 | `ab_halacha_opus48.py` | python | **Non-destructive A/B for halacha extraction (Claude)**: re-runs halacha extraction on a single judgment with a chosen model/effort (`AB_MODEL`/`AB_EFFORT`, default `claude-opus-4-8`/`xhigh`) and compares against the existing halacha statistics in the DB **without deleting or writing anything**. Replicates `halacha_extractor.extract()` (same prompts, chunk selection, quote verification) and swaps only the LLM call for `claude -p --model --effort`. Produces `data/ab_halacha_<case>_<effort>.json`. Run: `DOTENV_PATH=/home/chaim/.env DATA_DIR=.../data .venv/bin/python scripts/ab_halacha_opus48.py <case_law_id>`. **Finding 2026-05-31 (שטיין 1128-08-20):** Opus 4.8@xhigh extracted 51 vs. 124 in production (100% quote-verified vs. 96%) but with lower-calibrated confidence (median 0.75 vs. 0.82), and therefore does **not** shrink the manual-approval queue under an auto-approve sweep at conf≥0.78 (26 vs. 24). A quality improvement, not a queue reduction. | Manual (extraction-model decision) |
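
The clustering step described for `backfill_canonical_halachot.py` (connected components over equivalence links, then a representative per cluster) can be illustrated with a small union-find sketch. This is not the script's code: the function names and the `meta` fields (`corroboration`, `confidence`, `created_at`) are hypothetical stand-ins, and the tie-break order simply follows the table's "most corroboration → confidence → earliest".

```python
from collections import defaultdict


def connected_components(ids, links):
    """Group ids into clusters via union-find over undirected equivalence links.

    Assumption: every id that appears in `links` is also present in `ids`.
    """
    parent = {i: i for i in ids}

    def find(x):
        # Path halving keeps the trees shallow.
        while parent[x] != x:
            parent[x] = parent[parent[x]]
            x = parent[x]
        return x

    for a, b in links:
        ra, rb = find(a), find(b)
        if ra != rb:
            parent[rb] = ra

    clusters = defaultdict(list)
    for i in ids:
        clusters[find(i)].append(i)
    # Singletons fall out naturally as one-element clusters (the 1:1 case).
    return sorted(sorted(c) for c in clusters.values())


def pick_representative(cluster, meta):
    """Most corroboration first, then highest confidence, then earliest created_at."""
    return min(
        cluster,
        key=lambda i: (
            -meta[i]["corroboration"],
            -meta[i]["confidence"],
            meta[i]["created_at"],
        ),
    )
```

Because the links are treated as undirected and merged transitively, a chain a↔b, b↔c lands in one cluster even though a and c were never linked directly.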
--- a/scripts/backfill_canonical_synthesis.py
+++ b/scripts/backfill_canonical_synthesis.py
@@ -0,0 +1,174 @@
+#!/usr/bin/env python3
+"""Backfill — LLM synthesis of canonical_halachot.canonical_statement (V41 Phase 4).
+
+WHAT THIS DOES
+--------------
+Walks canonicals in ``review_status='pending_synthesis'`` and, for each, asks a
+local ``claude_session`` model (Opus by default) to rewrite the statement carried
+over from the representative halacha into ONE clean, case-independent legal
+principle — grounded in the instances' supporting quotes (INV-AH). Accepted
+rewrites are committed with a fresh embedding; abstained / drift-rejected /
+new-citation outcomes keep the original statement. Either way ``review_status``
+advances to ``pending_review`` for the chair gate (G10 / INV-LRN1).
+
+All logic lives in services/canonical_synthesis.py (G2) — this script is the
+batch driver: ordering, throttling, dry-run reporting and a CSV audit trail.
+
+IDEMPOTENCY / RESUME
+--------------------
+Operates on ``pending_synthesis`` only; a committed canonical leaves the queue, so
+re-running continues where it stopped. Safe to interrupt.
+
+THROTTLING
+----------
+Each item is one Opus call against chaim's claude.ai subscription. Before every
+item the shared usage_limits ceilings are checked; once a window is over its soft
+ceiling the run STOPS gracefully (resumable) instead of hammering into HTTP 429
+rate-limit errors. Disable with --no-throttle (e.g. for small samples).
+
+USAGE
+-----
+cd ~/legal-ai/mcp-server
+.venv/bin/python ../scripts/backfill_canonical_synthesis.py --sample 20            # dry-run, 20 random
+.venv/bin/python ../scripts/backfill_canonical_synthesis.py --dry-run --limit 50   # dry-run, first 50 (multi-instance first)
+.venv/bin/python ../scripts/backfill_canonical_synthesis.py --apply                # full throttled run
+.venv/bin/python ../scripts/backfill_canonical_synthesis.py --apply --limit 200
+"""
+from __future__ import annotations
+
+import argparse
+import asyncio
+import csv
+import os
+import random
+import sys
+from collections import Counter
+from datetime import datetime, timezone
+from uuid import UUID
+
+sys.path.insert(0, os.path.join(os.path.dirname(__file__), "..", "mcp-server", "src"))
+
+from legal_mcp.services import canonical_synthesis, db  # noqa: E402
+
+try:  # stdlib-only module, importable from system python too
+    from legal_mcp.services import usage_limits
+except Exception:  # pragma: no cover
+    usage_limits = None
+
+AUDIT_DIR = os.path.join(os.path.dirname(__file__), "..", "data", "audit")
+
+
+async def _pending(limit: int | None, sample: int | None) -> list[dict]:
+    """Pending-synthesis canonicals, multi-instance first (highest value)."""
+    pool = await db.get_pool()
+    rows = await pool.fetch(
+        "SELECT id::text AS id, instance_count, canonical_statement "
+        "FROM canonical_halachot WHERE review_status='pending_synthesis' "
+        "ORDER BY instance_count DESC, created_at",
+    )
+    items = [dict(r) for r in rows]
+    if sample and sample < len(items):
+        items = random.sample(items, sample)
+    if limit:
+        items = items[:limit]
+    return items
+
+
+def _throttled() -> tuple[bool, str]:
+    if usage_limits is None:
+        return False, "usage_limits unavailable"
+    usage = usage_limits.subscription_usage()
+    if usage is None:
+        return False, "usage read failed (proceeding)"
+    over, _reset, detail = usage_limits.ceiling_status(usage)
+    return over, detail
+
+
+def _short(s: str, n: int = 90) -> str:
+    s = (s or "").replace("\n", " ")
+    return s if len(s) <= n else s[: n - 1] + "…"
+
+
+async def _run(apply: bool, limit: int | None, sample: int | None,
+               throttle: bool, verbose: bool) -> int:
+    items = await _pending(limit, sample)
+    total = len(items)
+    mode = "APPLY" if apply else "DRY-RUN"
+    print(f"[{mode}] {total} canonicals pending_synthesis to process "
+          f"(throttle={'on' if throttle else 'off'})\n")
+    if not total:
+        print("nothing to do.")
+        return 0
+
+    stamp = datetime.now(timezone.utc).strftime("%Y%m%dT%H%M%SZ")
+    os.makedirs(AUDIT_DIR, exist_ok=True)
+    audit_path = os.path.join(
+        AUDIT_DIR, f"canonical-synthesis-{'apply' if apply else 'dryrun'}-{stamp}.csv")
+    counts: Counter[str] = Counter()
+    stopped = False
+
+    with open(audit_path, "w", newline="", encoding="utf-8") as fh:
+        w = csv.writer(fh)
+        w.writerow(["canonical_id", "instance_count", "status", "drift_cosine",
+                    "reason", "before", "after"])
+        for n, it in enumerate(items, 1):
+            if throttle:
+                over, detail = _throttled()
+                if over:
+                    print(f"\n⏸  usage ceiling reached ({detail}) — stopping at "
+                          f"{n - 1}/{total}. Re-run to resume.")
+                    stopped = True
+                    break
+
+            cid = UUID(it["id"])
+            if apply:
+                res = await canonical_synthesis.synthesize_and_apply(cid)
+            else:
+                res = await canonical_synthesis.synthesize_canonical(cid)
+            counts[res["status"]] += 1
+
+            w.writerow([it["id"], it["instance_count"], res["status"],
+                        res.get("drift_cosine"), res.get("reason", ""),
+                        res.get("original", ""), res.get("proposed", "")])
+
+            mark = {"accepted": "✓", "abstained": "·", "drift_rejected": "✗",
+                    "new_citation": "✗", "llm_error": "!", "no_instances": "·",
+                    "not_found": "!"}.get(res["status"], "?")
+            line = (f"[{n}/{total}] {mark} {res['status']:<14} "
+                    f"inst={it['instance_count']} {it['id'][:8]}")
+            print(line)
+            if verbose and (res["status"] == "accepted" or res.get("proposed") != res.get("original")):
+                print(f"      before: {_short(res.get('original', ''))}")
+                print(f"      after : {_short(res.get('proposed', ''))}  "
+                      f"(drift={res.get('drift_cosine')})")
+                if res.get("reason"):
+                    print(f"      reason: {_short(res['reason'], 110)}")
+
+    processed = sum(counts.values())
+    print(f"\n── summary ({mode}) — {processed}/{total} processed"
+          f"{' (stopped early)' if stopped else ''} ──")
+    for status, c in counts.most_common():
+        print(f"   {status:<16} {c}")
+    print(f"\naudit CSV: {audit_path}")
+    if not apply:
+        print("dry-run — nothing written to the DB. Re-run with --apply to commit.")
+    return 0
+
+
+def main() -> int:
+    p = argparse.ArgumentParser(description="LLM synthesis of canonical_statement (V41 Phase 4)")
+    p.add_argument("--apply", action="store_true", help="commit to the DB (default: dry-run)")
+    p.add_argument("--dry-run", action="store_true", help="explicit dry-run (default)")
+    p.add_argument("--limit", type=int, default=None, help="cap items processed")
+    p.add_argument("--sample", type=int, default=None, help="random sample of N (dry-run inspection)")
+    p.add_argument("--no-throttle", action="store_true", help="skip usage-ceiling checks")
+    p.add_argument("--verbose", action="store_true", help="print before/after for changed items")
+    args = p.parse_args()
+    return asyncio.run(_run(
+        apply=args.apply, limit=args.limit, sample=args.sample,
+        throttle=not args.no_throttle, verbose=args.verbose,
+    ))
+
+
+if __name__ == "__main__":
+    raise SystemExit(main())
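
The drift-floor gate that this driver reports as `drift_cosine` / `drift_rejected` lives in `services/canonical_synthesis.py`, which is not part of this excerpt. As a rough illustration only, a gate of that kind might look like the sketch below. The names `cosine` and `drift_gate` are hypothetical, the 0.80 default is the one quoted in SCRIPTS.md, and treating a cosine exactly at the floor as accepted is an assumption.

```python
import math

DRIFT_FLOOR = 0.80  # default quoted in SCRIPTS.md; the real setting lives in the service


def cosine(u, v):
    """Cosine similarity of two equal-length vectors (0.0 if either is all zeros)."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    if norm_u == 0 or norm_v == 0:
        return 0.0
    return dot / (norm_u * norm_v)


def drift_gate(original_vec, proposed_vec, floor=DRIFT_FLOOR):
    """Accept the rewrite only if it stays close to the original statement.

    Assumption: a cosine at or above the floor passes; below it the
    original statement is kept and the outcome is 'drift_rejected'.
    """
    c = cosine(original_vec, proposed_vec)
    status = "accepted" if c >= floor else "drift_rejected"
    return {"status": status, "drift_cosine": c}
```

In the real pipeline the vectors would be embeddings of the original and proposed statements; either way the driver advances the canonical to `pending_review`, so the gate only decides which wording the chair sees.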
--- a/scripts/cull_principles.py
+++ b/scripts/cull_principles.py
@@ -0,0 +1,204 @@
+#!/usr/bin/env python3
+"""Retroactive cull of the legal-principles corpus via the 3-model panel (#152, Phase C).
+
+The corpus grew to ~5,243 principles (18.8/decision) under the old single-model
+auto-approve. This re-adjudicates EVERY existing 'original' principle with the
+SAME regime the extractor now uses going forward (chaim 2026-06-19):
+
+  • 3 judges (Claude local + DeepSeek + Gemini) vote keep + score on each principle.
+  • Approval rule: 3 votes→survive · 2 & score≥0.85→survive · 2 & <0.85→chair
+    (pending_review) · ≤1→reject.
+  • Per DECISION, survivors are capped to HALACHA_PANEL_MAX_NEW (=5) by score; the
+    rest are rejected (over-cap).
+
+All logic is shared with the extractor via services/panel_extraction (G2). The
+cull is REVERSIBLE: a CSV backup of every (id, old_status) is written before any
+write, and a rejected principle's canonical is also set 'rejected' (recoverable).
+Throttled by usage_limits (stops gracefully at the soft ceiling, resumable).
+
+  cd ~/legal-ai/mcp-server
+  HOME=/home/chaim .venv/bin/python ../scripts/cull_principles.py --sample 5     # dry-run, 5 decisions
+  HOME=/home/chaim .venv/bin/python ../scripts/cull_principles.py --dry-run      # all, dry-run
+  HOME=/home/chaim .venv/bin/python ../scripts/cull_principles.py --apply        # full, throttled
+"""
+from __future__ import annotations
+
+import argparse
+import asyncio
+import csv
+import os
+import random
+import sys
+from collections import Counter
+from datetime import datetime, timezone
+from uuid import UUID
+
+sys.path.insert(0, os.path.join(os.path.dirname(__file__), "..", "mcp-server", "src"))
+
+from legal_mcp import config  # noqa: E402
+from legal_mcp.services import db, panel_extraction as pe  # noqa: E402
+
+try:
+    from legal_mcp.services import usage_limits
+except Exception:  # pragma: no cover
+    usage_limits = None
+
+AUDIT_DIR = os.path.join(os.path.dirname(__file__), "..", "data", "audit")
+_JUDGE_CONCURRENCY = 4
+
+
+async def _decisions(limit, sample):
+    """case_law ids that have 'original' principles, with source metadata."""
+    pool = await db.get_pool()
+    rows = await pool.fetch(
+        "SELECT cl.id, cl.case_number, cl.source_kind, cl.is_binding, "
+        "       count(*) AS n "
+        "FROM halachot h JOIN case_law cl ON cl.id = h.case_law_id "
+        "WHERE h.instance_type = 'original' AND h.review_status <> 'rejected' "
+        "GROUP BY cl.id, cl.case_number, cl.source_kind, cl.is_binding "
+        "ORDER BY n DESC",
+    )
+    items = [dict(r) for r in rows]
+    if sample and sample < len(items):
+        items = random.sample(items, sample)
+    if limit:
+        items = items[:limit]
+    return items
+
+
+async def _principles(case_law_id):
+    pool = await db.get_pool()
+    rows = await pool.fetch(
+        "SELECT id, rule_statement, supporting_quote, reasoning_summary, "
+        "       canonical_id, review_status "
+        "FROM halachot WHERE case_law_id = $1 AND instance_type = 'original' "
+        "AND review_status <> 'rejected' ORDER BY halacha_index",
+        case_law_id,
+    )
+    return [dict(r) for r in rows]
+
+
+def _throttled():
+    if usage_limits is None:
+        return False, "no usage_limits"
+    u = usage_limits.subscription_usage()
+    if u is None:
+        return False, "usage read failed"
+    over, _r, detail = usage_limits.ceiling_status(u)
+    return over, detail
+
+
+async def _judge_decision(dec, sem):
+    principles = await _principles(dec["id"])
+    if not principles:
+        return []
+
+    async def one(p):
+        async with sem:
+            v = await pe.panel_keep_score(
+                p["rule_statement"], p["supporting_quote"], p.get("reasoning_summary") or "",
+                source_kind=dec["source_kind"] or "external_upload",
+                is_binding=bool(dec["is_binding"]),
+            )
+        return {**p, **v}
+
+    judged = await asyncio.gather(*[one(p) for p in principles])
+    return pe.apply_cap(list(judged))
+
+
+async def _apply_decision(judged, reviewer):
+    pool = await db.get_pool()
+    async with pool.acquire() as conn:
+        async with conn.transaction():
+            for j in judged:
+                fv = j["final_verdict"]
+                if fv == "approved":
+                    await conn.execute(
+                        "UPDATE halachot SET review_status='approved', reviewed_at=now(), "
+                        "reviewer=$2, updated_at=now() WHERE id=$1", j["id"], reviewer)
+                elif fv == "pending_review":
+                    await conn.execute(
+                        "UPDATE halachot SET review_status='pending_review', reviewer=$2, "
+                        "updated_at=now() WHERE id=$1", j["id"], reviewer)
+                else:  # rejected — also reject its canonical (reversible)
+                    await conn.execute(
+                        "UPDATE halachot SET review_status='rejected', reviewed_at=now(), "
+                        "reviewer=$2, updated_at=now() WHERE id=$1", j["id"], reviewer)
+                    if j.get("canonical_id"):
+                        await conn.execute(
+                            "UPDATE canonical_halachot SET review_status='rejected', "
+                            "updated_at=now() WHERE id=$1", j["canonical_id"])
+
+
+async def _run(apply, limit, sample, throttle, verbose):
+    decisions = await _decisions(limit, sample)
+    mode = "APPLY" if apply else "DRY-RUN"
+    print(f"[{mode}] {len(decisions)} decisions with principles "
+          f"(throttle={'on' if throttle else 'off'})\n", flush=True)
+    if not decisions:
+        print("nothing to do.")
+        return 0
+
+    stamp = datetime.now(timezone.utc).strftime("%Y%m%dT%H%M%SZ")
+    os.makedirs(AUDIT_DIR, exist_ok=True)
+    audit = os.path.join(AUDIT_DIR, f"principle-cull-{'apply' if apply else 'dryrun'}-{stamp}.csv")
+    reviewer = f"cull:panel v{config.HALACHA_PANEL_SCORE_FLOOR} cap{config.HALACHA_PANEL_MAX_NEW}"
+    sem = asyncio.Semaphore(_JUDGE_CONCURRENCY)
+    tally = Counter()
+    n_in = n_out = 0
+    stopped = False
+
+    with open(audit, "w", newline="", encoding="utf-8") as fh:
+        w = csv.writer(fh)
+        w.writerow(["case_number", "halacha_id", "old_status", "final_verdict",
+                    "votes", "score", "canonical_id", "rule"])
+        for k, dec in enumerate(decisions, 1):
+            if throttle:
+                over, detail = _throttled()
+                if over:
+                    print(f"\n⏸  usage ceiling ({detail}) — stopping at {k-1}/{len(decisions)}. "
+                          f"Re-run to resume.", flush=True)
+                    stopped = True
+                    break
+            judged = await _judge_decision(dec, sem)
+            survivors = sum(1 for j in judged if j["final_verdict"] in ("approved", "pending_review"))
+            n_in += len(judged)
+            n_out += survivors
+            for j in judged:
+                tally[j["final_verdict"]] += 1
+                w.writerow([dec["case_number"], str(j["id"]), j["review_status"],
+                            j["final_verdict"], j["votes"], j["score"],
+                            str(j.get("canonical_id") or ""), (j["rule_statement"] or "")[:160]])
+            if apply and judged:
+                await _apply_decision(judged, reviewer)
+            print(f"[{k}/{len(decisions)}] {dec['case_number']:<16} "
+                  f"{len(judged)}→{survivors} survive", flush=True)
+            if verbose:
+                for j in judged:
+                    mark = {"approved": "✓", "pending_review": "→chair", "rejected": "✗"}[j["final_verdict"]]
+                    print(f"      {mark} v={j['votes']} s={j['score']} {(j['rule_statement'] or '')[:80]}")
+
+    print(f"\n── {mode} summary{' (stopped early)' if stopped else ''} ──")
+    print(f"   principles judged: {n_in} → survive: {n_out}  ({n_in - n_out} rejected)")
+    for v, c in tally.most_common():
+        print(f"   {v:<16} {c}")
+    print(f"\naudit CSV: {audit}")
+    if not apply:
+        print("dry-run — no DB writes. Re-run with --apply to commit (reversible).")
+    return 0
+
+
+def main():
+    p = argparse.ArgumentParser(description="Retroactive principle cull via 3-model panel (#152)")
+    p.add_argument("--apply", action="store_true", help="write verdicts (reversible, CSV-backed)")
+    p.add_argument("--dry-run", action="store_true", help="explicit dry-run (default)")
+    p.add_argument("--limit", type=int, default=None)
+    p.add_argument("--sample", type=int, default=None, help="random sample of N decisions")
+    p.add_argument("--no-throttle", action="store_true")
+    p.add_argument("--verbose", action="store_true")
+    a = p.parse_args()
+    return asyncio.run(_run(a.apply, a.limit, a.sample, not a.no_throttle, a.verbose))
+
+
+if __name__ == "__main__":
+    raise SystemExit(main())
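
The approval rule and per-decision cap from the module docstring are implemented in `services/panel_extraction`, which this excerpt does not show. The sketch below restates them under stated assumptions: `verdict` and `cap_by_score` are hypothetical names, the constants mirror the documented 0.85 floor and cap of 5, and counting both `approved` and `pending_review` items toward the cap is a guess based on how the script tallies survivors.

```python
SCORE_FLOOR = 0.85  # documented threshold for a 2-vote principle
MAX_NEW = 5         # documented HALACHA_PANEL_MAX_NEW


def verdict(votes, score):
    """3 votes survive; 2 votes survive at/above the floor, else go to the chair; <=1 rejected."""
    if votes >= 3:
        return "approved"
    if votes == 2:
        return "approved" if score >= SCORE_FLOOR else "pending_review"
    return "rejected"


def cap_by_score(judged, max_new=MAX_NEW):
    """Apply the verdict, then keep only the top `max_new` survivors of one decision.

    Assumption: both 'approved' and 'pending_review' count as survivors for the cap.
    Input order is preserved in the returned list.
    """
    out = [dict(j, final_verdict=verdict(j["votes"], j["score"])) for j in judged]
    survivors = sorted(
        (j for j in out if j["final_verdict"] != "rejected"),
        key=lambda j: j["score"],
        reverse=True,
    )
    for j in survivors[max_new:]:
        j["final_verdict"] = "rejected"  # over-cap
    return out
```

Ranking by score before cutting means a decision with many unanimous principles still keeps only its five strongest, which is the mechanism the docstring relies on to bring the corpus down from its current per-decision average.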
--- a/scripts/halacha_panel_approve.py
+++ b/scripts/halacha_panel_approve.py
@@ -50,24 +50,17 @@ from pathlib import Path

 import httpx

-from legal_mcp.services import claude_session, db
+from legal_mcp.services import db, panel_judges
+# Judges are the shared primitive (G2) — #152 lifted them to services/panel_judges.
+from legal_mcp.services.panel_judges import (
+    DEEPSEEK_KEY,
+    GEMINI_KEY,
+    judge_claude,
+    judge_deepseek,
+    judge_gemini,
+)

-# ── keys (local files, same pattern as the other local judges) ──
-
-def _env_key(name: str, *files: str) -> str:
-    for f in files:
-        p = Path(f).expanduser()
-        if p.exists():
-            for line in p.read_text().splitlines():
-                if line.startswith(name + "="):
-                    return line.split("=", 1)[1].strip()
-    return os.environ.get(name, "")
-
-
-DEEPSEEK_KEY = _env_key("DEEPSEEK_API_KEY", "~/.hermes/profiles/deepseek/.env", "~/.env")
-# canonical Infisical name is GOOGLE_GEMINI_API_KEY (/external-apis/gemini); accept
-# the bare GEMINI_API_KEY too for back-compat.
-GEMINI_KEY = _env_key("GOOGLE_GEMINI_API_KEY", "~/.env") or _env_key("GEMINI_API_KEY", "~/.env")
+_bool = panel_judges.to_bool

 # ── the two coarse questions (the reliable axis — NOT the fuzzy sub-type) ──

@@ -99,62 +92,6 @@ def _nli_user(h: dict) -> str:
    return f"כלל:\n{h.get('rule_statement') or ''}\n\nציטוט:\n{h.get('supporting_quote') or ''}"


-# ── three judges, one signature: (system, user) -> dict|None ──
-
-async def judge_claude(system: str, user: str) -> dict | None:
-    try:
-        return await claude_session.query_json(user, system=system)
-    except Exception:
-        return None
-
-
-async def judge_deepseek(client: httpx.AsyncClient, system: str, user: str) -> dict | None:
-    if not DEEPSEEK_KEY:
-        return None
-    try:
-        r = await client.post(
-            "https://api.deepseek.com/v1/chat/completions",
-            headers={"Authorization": f"Bearer {DEEPSEEK_KEY}", "Content-Type": "application/json"},
-            json={"model": "deepseek-chat", "temperature": 0, "max_tokens": 120,
-                  "response_format": {"type": "json_object"},
-                  "messages": [{"role": "system", "content": system},
-                               {"role": "user", "content": user}]},
-            timeout=90,
-        )
-        r.raise_for_status()
-        return json.loads(r.json()["choices"][0]["message"]["content"])
-    except Exception:
-        return None
-
-
-async def judge_gemini(client: httpx.AsyncClient, system: str, user: str) -> dict | None:
-    if not GEMINI_KEY:
-        return None
-    try:
-        r = await client.post(
-            f"https://generativelanguage.googleapis.com/v1beta/models/gemini-2.5-flash:generateContent?key={GEMINI_KEY}",
-            headers={"Content-Type": "application/json"},
-            json={"system_instruction": {"parts": [{"text": system}]},
-                  "contents": [{"parts": [{"text": user}]}],
-                  "generationConfig": {"temperature": 0, "maxOutputTokens": 4000,
-                                       "responseMimeType": "application/json"}},
-            timeout=90,
-        )
-        r.raise_for_status()
-        return json.loads(r.json()["candidates"][0]["content"]["parts"][0]["text"])
-    except Exception:
-        return None
-
-
-def _bool(d: dict | None, key: str) -> bool | None:
-    if not isinstance(d, dict) or key not in d:
-        return None
-    v = d[key]
-    if isinstance(v, bool):
-        return v
-    return str(v).strip().lower() in ("true", "1", "yes", "כן")
-
-
 async def panel_vote(client, system, user, key) -> dict:
    """Run all three judges; return per-judge bools + the verdict."""
    c, ds, gm = await asyncio.gather(