feat(learning): FU-1 — לכידת סבבי-פאנל להלכות (#133)
All checks were successful
G12 Leak-Guard / leak-guard (pull_request) Successful in 7s

לולאת ה-active-learning זקוקה לסיגנל ללמוד ממנו, אבל הפאנל
(halacha_panel_approve.py) זרק עד כה את הצבעות-3-השופטים ואת
ההנמקות — שרד רק review_status הסופי על halachot. בלי
ההצבעות+הנימוקים אין דרך לזקק rubric משופר.

FU-1:
- טבלה חדשה halacha_panel_rounds (SCHEMA_V35) — שורה לכל
  (הלכה, סבב): הצבעה+נימוק לכל לינאז' (claude/deepseek/gemini),
  ה-verdict, ומה הריצה עשתה (applied_action), apply_mode.
  במתכונת עמודות-הפאנל של halacha_goldset.
- db.insert_panel_round() — helper כתיבה (capture-only).
- halacha_panel_approve.py: שומר את התשובות הגולמיות (במקום
  לזרוק את הנימוק), מוסיף reason ל-NLI_SYSTEM, וכותב סבב לכל
  פריט בשני המצבים (dry-run ו---apply). --no-capture לדילוג.

capture-only: לעולם לא נוגע ב-halachot — שער-היו"ר ב-/precedents
נשאר מקור-האמת היחיד (INV-G10). ה-seed ללמידה נוצר בהצלבה מול
הכרעת-היו"ר המאוחרת על אותה הלכה (FU-2).

Invariants: מקיים INV-G10 (capture-only, שער-יו"ר יחיד),
INV-LRN1/3 (לכידה-מבנית; propose-only — אין auto-commit),
G1 (לכידה-במקור), G2 (יכולת חדשה, לא מסלול-מקביל),
G12 (לא נוגע ב-Paperclip port). חלק מ-#133.

smoke (dry-run --limit 8): 6 nli captured, errors=0, נימוקים
מלאים מ-3 השופטים.

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
This commit is contained in:
2026-06-12 04:22:48 +00:00
parent b4e79aa8fa
commit 0a7869175e
3 changed files with 168 additions and 61 deletions

View File

@@ -22,10 +22,17 @@ Three buckets of pending_review:
3. other quality flags (quote_unverified/truncated/thin) → genuine extraction
defects → flagged for re-extraction, never auto-approved.
DRY-RUN writes NOTHING. --apply acts on the agreed verdicts (clean: 2/3 majority;
DRY-RUN writes no DECISIONS. --apply acts on the agreed verdicts (clean: 2/3 majority;
nli: unanimous-entailed clears the flag) — reversible, backed up to data/audit/ first.
Splits/defects stay pending_review for the chair. Local-only (claude_session needs CLI).
FU-1 (#133, active-learning): EVERY adjudication — votes AND per-judge rationale — is
persisted to halacha_panel_rounds in BOTH modes (a dry-run analysis is still a learning
datapoint; apply_mode records which). This is capture-only and never touches `halachot`
(the chair gate stays the single source of truth, INV-G10). The learning seed is formed
later by joining a round against the chair's own later decision on the same halacha. Pass
--no-capture to skip.
cd ~/legal-ai/mcp-server
.venv/bin/python ../scripts/halacha_panel_approve.py --limit 12 # smoke
.venv/bin/python ../scripts/halacha_panel_approve.py # full dry-run
@@ -75,7 +82,8 @@ KEEP_SYSTEM = (
NLI_SYSTEM = (
"אתה בודק היסק משפטי. בהינתן כלל וציטוט-תומך, הכרע האם הציטוט באמת תומך בכלל "
"ואינו מרחיב מעבר למה שכתוב בו (entailed=true), או שהכלל מרחיב/חורג מהציטוט "
'(entailed=false). החזר JSON בלבד: {"entailed": true/false}. ללא markdown, ללא הסבר.'
'(entailed=false). החזר JSON בלבד: {"entailed": true/false, "reason": "<משפט קצר>"}. '
"ללא markdown."
)
@@ -161,6 +169,8 @@ async def panel_vote(client, system, user, key) -> dict:
votes["_verdict"] = ("unanimous_yes" if unanimous_yes else
"unanimous_no" if unanimous_no else
"split" if len(valid) >= 2 else "incomplete")
# keep the raw replies so the per-judge rationale can be persisted (FU-1)
votes["_raw"] = {"claude": c, "deepseek": ds, "gemini": gm}
return votes
@@ -189,6 +199,9 @@ async def main(args: argparse.Namespace) -> int:
buckets[bucket(h)].append(h)
print("queue:", {k: len(v) for k, v in buckets.items()}, "\n", flush=True)
# one stamp shared by the whole run, so a round is reconstructable later (FU-1)
round_ts = datetime.now(timezone.utc)
sem = asyncio.Semaphore(args.concurrency)
results = {"clean": [], "nli": []}
@@ -259,10 +272,6 @@ async def main(args: argparse.Namespace) -> int:
# NLI → asymmetric: unanimous-entailed → clear nli flag (+approve if clean),
# majority not-entailed → rejected, else → chair
# DEFECT → untouched (needs re-extraction)
if not args.apply:
print("\n(dry-run — pass --apply to write the approved policy)")
return 0
def majority(v: dict) -> bool | None:
vs = [v[k] for k in ("claude", "deepseek", "gemini") if v[k] is not None]
if len(vs) < 2:
@@ -270,63 +279,89 @@ async def main(args: argparse.Namespace) -> int:
y, n = sum(vs), len(vs) - sum(vs)
return True if y > n else (False if n > y else None)
ts = datetime.now(timezone.utc).strftime("%Y%m%dT%H%M%SZ")
audit = Path(__file__).resolve().parent.parent / "data" / "audit"
audit.mkdir(parents=True, exist_ok=True)
backup = audit / f"halacha-panel-apply-backup-{ts}.csv"
with backup.open("w", encoding="utf-8", newline="") as f:
w = csv.writer(f)
w.writerow(["id", "review_status", "quality_flags"])
for r in clean + nli:
h = r["_h"]
w.writerow([h["id"], h["review_status"], "|".join(h.get("quality_flags") or [])])
if args.apply:
ts = datetime.now(timezone.utc).strftime("%Y%m%dT%H%M%SZ")
audit = Path(__file__).resolve().parent.parent / "data" / "audit"
audit.mkdir(parents=True, exist_ok=True)
backup = audit / f"halacha-panel-apply-backup-{ts}.csv"
with backup.open("w", encoding="utf-8", newline="") as f:
w = csv.writer(f)
w.writerow(["id", "review_status", "quality_flags"])
for r in clean + nli:
h = r["_h"]
w.writerow([h["id"], h["review_status"], "|".join(h.get("quality_flags") or [])])
pool = await db.get_pool()
REV = "panel:opus+deepseek+gemini"
approved = rejected = cleared = chair = 0
pool = await db.get_pool()
REV = "panel:opus+deepseek+gemini"
approved = rejected = cleared = chair = 0
for r in clean:
d = majority(r)
if d is True:
await pool.execute("UPDATE halachot SET review_status='approved', "
"reviewed_at=now(), reviewer=$2, updated_at=now() WHERE id=$1",
r["_h"]["id"], REV + " 2/3-keep")
approved += 1
elif d is False:
await pool.execute("UPDATE halachot SET review_status='rejected', "
"reviewed_at=now(), reviewer=$2, updated_at=now() WHERE id=$1",
r["_h"]["id"], REV + " 2/3-drop")
rejected += 1
else:
chair += 1
for r in clean:
d = majority(r)
if d is True:
await pool.execute("UPDATE halachot SET review_status='approved', "
"reviewed_at=now(), reviewer=$2, updated_at=now() WHERE id=$1",
r["_h"]["id"], REV + " 2/3-keep")
approved += 1; r["_action"] = "approved"
elif d is False:
await pool.execute("UPDATE halachot SET review_status='rejected', "
"reviewed_at=now(), reviewer=$2, updated_at=now() WHERE id=$1",
r["_h"]["id"], REV + " 2/3-drop")
rejected += 1; r["_action"] = "rejected"
else:
chair += 1; r["_action"] = "chair"
for r in nli:
vs = [r[k] for k in ("claude", "deepseek", "gemini") if r[k] is not None]
unanimous_yes = len(vs) == 3 and all(vs)
maj_no = len(vs) >= 2 and sum(vs) < len(vs) - sum(vs)
if unanimous_yes:
rest = [x for x in (r["_h"].get("quality_flags") or []) if x != "nli_unsupported"]
if rest: # other flags remain → clear nli but keep in queue
await pool.execute("UPDATE halachot SET quality_flags=$2, updated_at=now() "
"WHERE id=$1", r["_h"]["id"], rest)
cleared += 1; chair += 1
else: # nli was the only blocker → clear + approve
await pool.execute("UPDATE halachot SET quality_flags='{}', "
"review_status='approved', reviewed_at=now(), reviewer=$2, "
"updated_at=now() WHERE id=$1", r["_h"]["id"], REV + " 3/3-entailed")
approved += 1; cleared += 1
elif maj_no:
await pool.execute("UPDATE halachot SET review_status='rejected', "
"reviewed_at=now(), reviewer=$2, updated_at=now() WHERE id=$1",
r["_h"]["id"], REV + " maj-not-entailed")
rejected += 1
else:
chair += 1
for r in nli:
vs = [r[k] for k in ("claude", "deepseek", "gemini") if r[k] is not None]
unanimous_yes = len(vs) == 3 and all(vs)
maj_no = len(vs) >= 2 and sum(vs) < len(vs) - sum(vs)
if unanimous_yes:
rest = [x for x in (r["_h"].get("quality_flags") or []) if x != "nli_unsupported"]
if rest: # other flags remain → clear nli but keep in queue
await pool.execute("UPDATE halachot SET quality_flags=$2, updated_at=now() "
"WHERE id=$1", r["_h"]["id"], rest)
cleared += 1; chair += 1; r["_action"] = "nli_cleared"
else: # nli was the only blocker → clear + approve
await pool.execute("UPDATE halachot SET quality_flags='{}', "
"review_status='approved', reviewed_at=now(), reviewer=$2, "
"updated_at=now() WHERE id=$1", r["_h"]["id"], REV + " 3/3-entailed")
approved += 1; cleared += 1; r["_action"] = "approved"
elif maj_no:
await pool.execute("UPDATE halachot SET review_status='rejected', "
"reviewed_at=now(), reviewer=$2, updated_at=now() WHERE id=$1",
r["_h"]["id"], REV + " maj-not-entailed")
rejected += 1; r["_action"] = "rejected"
else:
chair += 1; r["_action"] = "chair"
print(f"\nAPPLIED (reversible): approved {approved} · rejected {rejected} · "
f"nli-flag-cleared {cleared} · left to chair {chair + len(buckets['defect'])} "
f"(incl. {len(buckets['defect'])} defects for re-extraction)")
print(f"backup → {backup}")
print(f"\nAPPLIED (reversible): approved {approved} · rejected {rejected} · "
f"nli-flag-cleared {cleared} · left to chair {chair + len(buckets['defect'])} "
f"(incl. {len(buckets['defect'])} defects for re-extraction)")
print(f"backup → {backup}")
else:
print("\n(dry-run — pass --apply to write the approved policy)")
# ── FU-1 (#133): persist EVERY adjudication so active-learning has a signal.
# Capture-only — writes to halacha_panel_rounds, never touches `halachot`
# (chair gate stays the single source of truth, INV-G10). Runs in BOTH modes:
# a dry-run analysis is still a learning datapoint (apply_mode records which).
if not args.no_capture:
captured = errs = 0
for tag, q in (("clean", "keep"), ("nli", "entailed")):
for r in results[tag]:
raw = r.get("_raw") or {}
try:
await db.insert_panel_round(
r["_h"]["id"], round_ts=round_ts, question=q, bucket=tag,
claude=raw.get("claude"), deepseek=raw.get("deepseek"),
gemini=raw.get("gemini"), vote_key=q, verdict=r["_verdict"],
applied_action=r.get("_action", ""), apply_mode=args.apply,
)
captured += 1
except Exception as e:
errs += 1
print(f" capture-error {r['_h']['id']}: {e}", flush=True)
print(f"captured {captured} panel rounds → halacha_panel_rounds "
f"(apply_mode={args.apply}, errors={errs})")
return 0
@@ -337,4 +372,6 @@ if __name__ == "__main__":
ap.add_argument("--concurrency", type=int, default=6)
ap.add_argument("--apply", action="store_true",
help="write the agreed verdicts (reversible, CSV-backed); default dry-run")
ap.add_argument("--no-capture", action="store_true",
help="skip persisting per-judge votes+reasons to halacha_panel_rounds (FU-1, #133)")
raise SystemExit(asyncio.run(main(ap.parse_args())))