feat(principles): retroactive cull (Phase C) + source-derived terminology (Phase D, #152)
Phase C (scripts/cull_principles.py): re-adjudicates every existing 'original' principle with the SAME panel regime (panel_keep_score → classify → apply_cap). Reversible (CSV backup + rejected canonical recoverable) and usage-throttled. panel_extraction.panel_keep_score + apply_cap are shared with the extractor (G2). Dry-run on 3 decisions: 37→15 survive.

Phase D (services/principles.py): source-derived label הלכה ("halacha", binding court) / כלל פרשני ("interpretive rule", committee) / עיקרון ("principle", persuasive); umbrella term עקרונות משפטיים ("legal principles"). Wired into canonical_halacha_get/list (principle_class + principle_label). UI string changes deferred to the Claude Design gate.

spec INV-LRN7; SCRIPTS.md; 7 new tests; 428 green.

Phase E needs no new code: synthesis already targets pending_synthesis, which the cull leaves only on survivors (rejected canonicals → 'rejected').

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
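The panel regime named in the commit message (panel_keep_score → classify → apply_cap) can be illustrated with a minimal sketch. This is not the code in services/panel_extraction: the function names `classify_sketch` and `apply_cap_sketch` are hypothetical, the thresholds are the documented ones (score floor 0.85, cap 5), and the sketch assumes the cap ranks approved and pending_review principles together by score, which the commit does not spell out.

```python
from typing import Dict, List

SCORE_FLOOR = 0.85  # documented threshold for a 2-vote keep
MAX_NEW = 5         # documented per-decision cap (HALACHA_PANEL_MAX_NEW)


def classify_sketch(votes: int, score: float) -> str:
    """Map a panel result (keep votes out of 3, score) to a verdict."""
    if votes >= 3:
        return "approved"
    if votes == 2:
        # Two votes survive only with a high score; otherwise the chair decides.
        return "approved" if score >= SCORE_FLOOR else "pending_review"
    return "rejected"


def apply_cap_sketch(judged: List[Dict], max_new: int = MAX_NEW) -> List[Dict]:
    """Keep at most max_new survivors per decision, ranked by score (assumption)."""
    out = [dict(j) for j in judged]  # copy so the input is not mutated
    survivors = [j for j in out if j["final_verdict"] != "rejected"]
    survivors.sort(key=lambda j: j["score"], reverse=True)
    for j in survivors[max_new:]:
        j["final_verdict"] = "rejected"  # over-cap
    return out


# Eight principles from one hypothetical decision: (keep votes, score).
panel = [(3, 0.95), (3, 0.90), (2, 0.88), (2, 0.80),
         (1, 0.99), (3, 0.70), (3, 0.60), (3, 0.92)]
judged = [{"votes": v, "score": s, "final_verdict": classify_sketch(v, s)}
          for v, s in panel]
capped = apply_cap_sketch(judged)
```

In this example seven principles pass the vote rule, and the two lowest-scoring of them are rejected as over-cap, leaving five survivors.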
@@ -65,6 +65,7 @@
| `halacha_panel_calibrate.py` | python | **Panel calibration + measurement** (Trust-or-Escalate, ICLR 2025). `--source live` (default): runs the KEEP question on the gold sample and measures, against `is_holding`, precision + coverage + **split-rate** per policy, plus false-keep/false-drop (imports the judges from `halacha_panel_approve`; **must run locally**). **#133/FU-5**, `--source captured`: **zero cost** (no re-vote/LLM); cross-references saved rounds (FU-1) against chair rulings (FU-2) via `db.panel_rounds_vs_chair` and reports split-rate + auto-precision **per round** (the loop trend: as the rubric improves, precision holds and split-rate falls); shares FU-4's `analyze_pairs` (single source). Both measurements report **anon-stability** (anonymization test #81.7) as a health metric against echo-chamber effects. `--batch`/`--limit`/`--concurrency`. | Manual: before wiring `--apply` (live) / periodic: loop tracking (captured) |
| `halacha_rubric_distill.py` | python | **#133/FU-4: rubric distillation, PROPOSE-ONLY.** Cross-references `halacha_panel_rounds` (FU-1, votes + rationales) against the chair's rulings (FU-2, seeds in `halacha_goldset` batch `chair-live`) via `db.panel_rounds_vs_chair` (read-only), deterministically analyzes **systematic failures** (false-keep/false-drop, splits that were resolved, per-judge rate of disagreement with the chair), and proposes `KEEP_SYSTEM` v2 + abstracted exemplars (local claude_session, zero cost) as a **diff report** at `data/learning/rubric-proposal-<ts>.md`. **Never auto-applies**: adopting v2 = a human edit of the constant via PR (INV-LRN1); abstracted exemplars only (INV-LRN5); the only signal = the chair's ruling, not panel votes (anti-echo). Below 12 pairs → "not enough data". `--no-llm` (statistics only) / `--limit N`. **Must run locally**. | Periodic: after chair rulings on panel disputes have accumulated |
| `backfill_canonical_halachot.py` | python | **V41: setting up the canonical-halachot model (one-time + idempotent).** (1) Builds connected components from `equivalent_halachot` (transitive closure, union-find). (2) For each cluster: picks a canonical representative (most corroboration → confidence → earliest), creates a `canonical_halachot` row, and updates `canonical_id` + `instance_type` for all cluster members. (3) For singletons (no equivalence links): 1:1 canonical. (4) Populates `halacha_citation_corroboration.canonical_id` from `halachot.canonical_id`. `--dry-run` (default, computes and reports only) / `--apply` (writes) / `--verbose`. After the run: `canonical_statement` = the representative's wording (pending_synthesis); follow-up: `backfill_canonical_synthesis.py` (Phase 4) will synthesize a broader wording via LLM. Run: `mcp-server/.venv/bin/python scripts/backfill_canonical_halachot.py --apply`. | **One-time** (after deploying V41) / idempotent as needed |
| `cull_principles.py` | python | **#152 Phase C: retroactive cull of the principles corpus via a 3-judge panel (reversible).** Runs every existing 'original' principle through the same regime the extractor uses going forward (`services/panel_extraction.panel_keep_score`, G2): 3 judges (local Claude + DeepSeek + Gemini) vote keep + score → the approval rule (3 votes → survives · 2 and score ≥0.85 → survives · 2 and <0.85 → chair · ≤1 → rejected) → a cap of `HALACHA_PANEL_MAX_NEW`=5 per decision by score (`apply_cap`). Rejected → `halachot.review_status='rejected'` + its canonical set to `rejected` (reversible; CSV backup in `data/audit/` before any write). Throttled by `usage_limits` (soft stop at the usage ceiling, resumable). `--dry-run` (default) / `--apply` / `--sample N` (random decisions) / `--limit N` / `--no-throttle` / `--verbose`. **Must run locally** (3 judges). Run: `cd mcp-server && HOME=/home/chaim .venv/bin/python ../scripts/cull_principles.py --apply`. | **One-time** (initial cull) + repeatable |
| `backfill_canonical_synthesis.py` | python | **V41 Phase 4: LLM synthesis of `canonical_statement` (idempotent + resumable).** Iterates over canonicals with `review_status='pending_synthesis'` (multi-instance first) and distills for each one a single general wording anchored in the instances' quotes (INV-AH) via `services/canonical_synthesis.py` (single path, G2). Gates: grounding/abstention, **drift-floor** (cosine against the source, default 0.80; large drift → the source is kept), and a ban on new case citations. In every case the status advances to `pending_review` for the chair gate (G10/INV-LRN6). Opus model (`HALACHA_CANONICAL_SYNTH_MODEL`). Throttled by `usage_limits` (soft stop at the usage ceiling, resumable). `--dry-run` (default) / `--apply` / `--sample N` (random sample for checking) / `--limit N` / `--no-throttle` / `--verbose`. CSV audit to `data/audit/canonical-synthesis-*.csv`. **Must run locally** (claude_session). Run: `cd mcp-server && HOME=/home/chaim .venv/bin/python ../scripts/backfill_canonical_synthesis.py --apply`. Ongoing: MCP tool `canonical_synthesize_pending`. | **One-time** (the initial bulk) + idempotent for new ones |
| `halacha_batch_reconcile.py` | python | **#82.7**: offline cross-ruling dedup (conservative, **dry-run only**). dedup-on-insert compares only within a ruling; here the threshold is stricter (cosine ≥0.95, `--cosine`) and non-destructive: finds near-duplicate halacha pairs across different rulings (pgvector `<=>` exact) with a lexical signal (Jaccard/Levenshtein) and reports to a CSV in `data/audit/` for the chair's review. Does not skip/merge/delete. `--include-pending`. **`--link`** records the pairs found as `equivalent_halachot` (parallel authority, #84.2; **deprecated post-V41**, use `backfill_canonical_halachot.py --apply` instead). Runs with the mcp-server venv. | **deprecated**: replaced by `backfill_canonical_halachot.py` (V41). Kept for audit purposes |
| `calibrate_halacha_dedup.py` | python | **#82.1**: calibration of the lexical dedup thresholds (#82.3) against the cleanup gold set. Reads `halacha-cleanup-manifest-*.csv` (human-labeled duplicate↔survivor pairs), loads the survivor text from the DB, and sweeps (jaccard_min × levenshtein_min) with P/R/F1, marking the configured operating point. Verified that (0.55, 0.70) → **precision 1.0** (zero false merges), recall 0.30; suitable as a secondary signal that blocks auto-approve. `--manifest <path>`. Runs with the mcp-server venv. | One-time: calibration (done 2026-06-06) |
204
scripts/cull_principles.py
Normal file
@@ -0,0 +1,204 @@
#!/usr/bin/env python3
"""Retroactive cull of the legal-principles corpus via the 3-model panel (#152, Phase C).

The corpus grew to ~5,243 principles (18.8/decision) under the old single-model
auto-approve. This re-adjudicates EVERY existing 'original' principle with the
SAME regime the extractor now uses going forward (chaim 2026-06-19):

  • 3 judges (Claude local + DeepSeek + Gemini) vote keep + score on each principle.
  • Approval rule: 3 votes→survive · 2 & score≥0.85→survive · 2 & <0.85→chair
    (pending_review) · ≤1→reject.
  • Per DECISION, survivors are capped to HALACHA_PANEL_MAX_NEW (=5) by score; the
    rest are rejected (over-cap).

All logic is shared with the extractor via services/panel_extraction (G2). The
cull is REVERSIBLE: a CSV backup of every (id, old_status) is written before any
write, and a rejected principle's canonical is also set 'rejected' (recoverable).
Throttled by usage_limits (stops gracefully at the soft ceiling, resumable).

    cd ~/legal-ai/mcp-server
    HOME=/home/chaim .venv/bin/python ../scripts/cull_principles.py --sample 5 # dry-run, 5 decisions
    HOME=/home/chaim .venv/bin/python ../scripts/cull_principles.py --dry-run # all, dry-run
    HOME=/home/chaim .venv/bin/python ../scripts/cull_principles.py --apply # full, throttled
"""
from __future__ import annotations

import argparse
import asyncio
import csv
import os
import random
import sys
from collections import Counter
from datetime import datetime, timezone
from uuid import UUID

sys.path.insert(0, os.path.join(os.path.dirname(__file__), "..", "mcp-server", "src"))

from legal_mcp import config  # noqa: E402
from legal_mcp.services import db, panel_extraction as pe  # noqa: E402

try:
    from legal_mcp.services import usage_limits
except Exception:  # pragma: no cover
    usage_limits = None

AUDIT_DIR = os.path.join(os.path.dirname(__file__), "..", "data", "audit")
_JUDGE_CONCURRENCY = 4


async def _decisions(limit, sample):
    """case_law ids that have 'original' principles, with source metadata."""
    pool = await db.get_pool()
    rows = await pool.fetch(
        "SELECT cl.id, cl.case_number, cl.source_kind, cl.is_binding, "
        " count(*) AS n "
        "FROM halachot h JOIN case_law cl ON cl.id = h.case_law_id "
        "WHERE h.instance_type = 'original' AND h.review_status <> 'rejected' "
        "GROUP BY cl.id, cl.case_number, cl.source_kind, cl.is_binding "
        "ORDER BY n DESC",
    )
    items = [dict(r) for r in rows]
    if sample and sample < len(items):
        items = random.sample(items, sample)
    if limit:
        items = items[:limit]
    return items


async def _principles(case_law_id):
    pool = await db.get_pool()
    rows = await pool.fetch(
        "SELECT id, rule_statement, supporting_quote, reasoning_summary, "
        " canonical_id, review_status "
        "FROM halachot WHERE case_law_id = $1 AND instance_type = 'original' "
        "AND review_status <> 'rejected' ORDER BY halacha_index",
        case_law_id,
    )
    return [dict(r) for r in rows]


def _throttled():
    if usage_limits is None:
        return False, "no usage_limits"
    u = usage_limits.subscription_usage()
    if u is None:
        return False, "usage read failed"
    over, _r, detail = usage_limits.ceiling_status(u)
    return over, detail


async def _judge_decision(dec, sem):
    principles = await _principles(dec["id"])
    if not principles:
        return []

    async def one(p):
        async with sem:
            v = await pe.panel_keep_score(
                p["rule_statement"], p["supporting_quote"], p.get("reasoning_summary") or "",
                source_kind=dec["source_kind"] or "external_upload",
                is_binding=bool(dec["is_binding"]),
            )
        return {**p, **v}

    judged = await asyncio.gather(*[one(p) for p in principles])
    return pe.apply_cap(list(judged))


async def _apply_decision(judged, reviewer):
    pool = await db.get_pool()
    async with pool.acquire() as conn:
        async with conn.transaction():
            for j in judged:
                fv = j["final_verdict"]
                if fv == "approved":
                    await conn.execute(
                        "UPDATE halachot SET review_status='approved', reviewed_at=now(), "
                        "reviewer=$2, updated_at=now() WHERE id=$1", j["id"], reviewer)
                elif fv == "pending_review":
                    await conn.execute(
                        "UPDATE halachot SET review_status='pending_review', reviewer=$2, "
                        "updated_at=now() WHERE id=$1", j["id"], reviewer)
                else:  # rejected — also reject its canonical (reversible)
                    await conn.execute(
                        "UPDATE halachot SET review_status='rejected', reviewed_at=now(), "
                        "reviewer=$2, updated_at=now() WHERE id=$1", j["id"], reviewer)
                    if j.get("canonical_id"):
                        await conn.execute(
                            "UPDATE canonical_halachot SET review_status='rejected', "
                            "updated_at=now() WHERE id=$1", j["canonical_id"])


async def _run(apply, limit, sample, throttle, verbose):
    decisions = await _decisions(limit, sample)
    mode = "APPLY" if apply else "DRY-RUN"
    print(f"[{mode}] {len(decisions)} decisions with principles "
          f"(throttle={'on' if throttle else 'off'})\n", flush=True)
    if not decisions:
        print("nothing to do.")
        return 0

    stamp = datetime.now(timezone.utc).strftime("%Y%m%dT%H%M%SZ")
    os.makedirs(AUDIT_DIR, exist_ok=True)
    audit = os.path.join(AUDIT_DIR, f"principle-cull-{'apply' if apply else 'dryrun'}-{stamp}.csv")
    reviewer = f"cull:panel v{config.HALACHA_PANEL_SCORE_FLOOR} cap{config.HALACHA_PANEL_MAX_NEW}"
    sem = asyncio.Semaphore(_JUDGE_CONCURRENCY)
    tally = Counter()
    n_in = n_out = 0
    stopped = False

    with open(audit, "w", newline="", encoding="utf-8") as fh:
        w = csv.writer(fh)
        w.writerow(["case_number", "halacha_id", "old_status", "final_verdict",
                    "votes", "score", "canonical_id", "rule"])
        for k, dec in enumerate(decisions, 1):
            if throttle:
                over, detail = _throttled()
                if over:
                    print(f"\n⏸ usage ceiling ({detail}) — stopping at {k-1}/{len(decisions)}. "
                          f"Re-run to resume.", flush=True)
                    stopped = True
                    break
            judged = await _judge_decision(dec, sem)
            survivors = sum(1 for j in judged if j["final_verdict"] in ("approved", "pending_review"))
            n_in += len(judged)
            n_out += survivors
            for j in judged:
                tally[j["final_verdict"]] += 1
                w.writerow([dec["case_number"], str(j["id"]), j["review_status"],
                            j["final_verdict"], j["votes"], j["score"],
                            str(j.get("canonical_id") or ""), (j["rule_statement"] or "")[:160]])
            if apply and judged:
                await _apply_decision(judged, reviewer)
            print(f"[{k}/{len(decisions)}] {dec['case_number']:<16} "
                  f"{len(judged)}→{survivors} survive", flush=True)
            if verbose:
                for j in judged:
                    mark = {"approved": "✓", "pending_review": "→chair", "rejected": "✗"}[j["final_verdict"]]
                    print(f" {mark} v={j['votes']} s={j['score']} {(j['rule_statement'] or '')[:80]}")

    print(f"\n── {mode} summary{' (stopped early)' if stopped else ''} ──")
    print(f" principles judged: {n_in} → survive: {n_out} ({n_in - n_out} rejected)")
    for v, c in tally.most_common():
        print(f" {v:<16} {c}")
    print(f"\naudit CSV: {audit}")
    if not apply:
        print("dry-run — no DB writes. Re-run with --apply to commit (reversible).")
    return 0


def main():
    p = argparse.ArgumentParser(description="Retroactive principle cull via 3-model panel (#152)")
    p.add_argument("--apply", action="store_true", help="write verdicts (reversible, CSV-backed)")
    p.add_argument("--dry-run", action="store_true", help="explicit dry-run (default)")
    p.add_argument("--limit", type=int, default=None)
    p.add_argument("--sample", type=int, default=None, help="random sample of N decisions")
    p.add_argument("--no-throttle", action="store_true")
    p.add_argument("--verbose", action="store_true")
    a = p.parse_args()
    return asyncio.run(_run(a.apply, a.limit, a.sample, not a.no_throttle, a.verbose))


if __name__ == "__main__":
    raise SystemExit(main())
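The script describes the cull as reversible because the audit CSV records each principle's `old_status`. As a rough illustration of how that CSV could drive a rollback, the sketch below builds a list of (halacha_id, old_status) pairs from rows whose status changed. The helper `restore_plan` is hypothetical and not part of this commit; the column names are the ones the script writes. The CSV does not record a canonical's previous status, so restoring `canonical_halachot` is outside this sketch.

```python
import csv
import io
from typing import Iterable, List, Tuple


def restore_plan(rows: Iterable[dict]) -> List[Tuple[str, str]]:
    """Return (halacha_id, old_status) for every row whose status the cull changed."""
    plan = []
    for row in rows:
        if row["final_verdict"] != row["old_status"]:
            plan.append((row["halacha_id"], row["old_status"]))
    return plan


# Made-up audit rows using the script's CSV header.
sample = io.StringIO(
    "case_number,halacha_id,old_status,final_verdict,votes,score,canonical_id,rule\n"
    "1234/20,aaa,approved,rejected,1,0.4,c1,some rule\n"
    "1234/20,bbb,approved,approved,3,0.9,c2,another rule\n"
)
plan = restore_plan(csv.DictReader(sample))
# Each pair could then feed a statement such as:
#   UPDATE halachot SET review_status = $2 WHERE id = $1
```

Only the first sample row ends up in the plan, since the second row's status did not change.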