feat(learning): FU-4 - propose-only rubric distillation from the chair's rulings (#133)

Periodic job that closes the learning loop: it cross-references the panel
rounds (FU-1, votes + reasons) against the chair's rulings (FU-2 seeds),
identifies systematic failures, and proposes KEEP_SYSTEM v2 + abstracted
exemplars as a diff report for the chair's review. Never auto-applied.

- db.panel_rounds_vs_chair(): read-only LATERAL join. For every halacha with
  a chair-live seed (FU-2, human ground truth) + a latest panel round (FU-1),
  it returns the 3 judges' votes + reasons next to the chair's keep/drop.
  The only signal is the chair's ruling, not the panel's votes
  (anti-echo-chamber, INV-LRN1).
- scripts/halacha_rubric_distill.py:
  • analyze_pairs(): pure deterministic core (offline-testable): false-keep
    (panel kept, chair dropped), false-drop, splits that were decided, and
    the rate of disagreement with the chair per judge; selects masked
    disagreement evidence.
  • Local LLM proposal (claude_session, tools="", zero cost): identifies
    failure patterns and proposes rubric wording v2 + abstracted exemplars
    (INV-LRN5: no case substance).
  • Writes data/learning/rubric-proposal-<ts>.md with diff(KEEP_SYSTEM→v2);
    no line of code changes. Adoption = a manual edit via PR (INV-LRN1).
  • <12 pairs → "insufficient data" (current state: seeds are still
    accumulating).
  • --no-llm (statistics only) / --limit N.
- tests/test_rubric_distill.py: 8 offline tests on analyze_pairs.
- SCRIPTS.md updated. Read-only smoke test passed (0 pairs →
  insufficient-data).

Matches the existing pattern (style_lesson_panel/halacha_panel_audit): the
panel proposes, adoption remains a manual chair gate.

Invariants: INV-LRN1 (propose-only) · INV-LRN5 (rubric purity) · INV-G10 ·
anti-echo-chamber. No new gate/UI.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
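
The deterministic core described in the message above could look roughly like the sketch below. It is not the code in scripts/halacha_rubric_distill.py, which this page does not show. The function name `analyze_pairs_sketch`, the returned keys, the "keep"/"drop" vote strings, and the majority rule for the panel's outcome are all assumptions made for illustration; only the column names (`chair_keep`, `claude_vote`, `deepseek_vote`, `gemini_vote`) and the 12-pair threshold come from the commit.

```python
from collections import Counter

JUDGES = ("claude", "deepseek", "gemini")  # column prefixes used by the query
MIN_PAIRS = 12  # threshold stated in the commit message


def analyze_pairs_sketch(pairs: list[dict]) -> dict:
    """Hypothetical stand-in for analyze_pairs(): pure, no I/O, no LLM."""
    if len(pairs) < MIN_PAIRS:
        return {"status": "insufficient-data", "pairs": len(pairs)}

    counts = Counter()
    disagreements = Counter()
    for p in pairs:
        chair_keep = bool(p["chair_keep"])
        # Assumption: every vote is the string "keep" or "drop".
        votes = [p[f"{j}_vote"] == "keep" for j in JUDGES]
        # Assumption: the panel outcome is a simple majority of three.
        panel_keep = sum(votes) >= 2
        if panel_keep and not chair_keep:
            counts["false_keep"] += 1  # panel kept, chair dropped
        elif not panel_keep and chair_keep:
            counts["false_drop"] += 1  # panel dropped, chair kept
        if len(set(votes)) > 1:
            counts["split_decided"] += 1  # non-unanimous panel, chair decided
        for judge, vote in zip(JUDGES, votes):
            if vote != chair_keep:
                disagreements[judge] += 1

    n = len(pairs)
    return {
        "status": "ok",
        "pairs": n,
        "false_keep": counts["false_keep"],
        "false_drop": counts["false_drop"],
        "split_decided": counts["split_decided"],
        "judge_disagreement": {j: disagreements[j] / n for j in JUDGES},
    }
```

Only the chair's label is treated as truth here; the panel's votes are measured against it and never against each other, which is the echo-chamber guard the message refers to.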
2026-06-12 06:59:34 +00:00
parent 305c084d0c
commit 4cad17df3a
4 changed files with 359 additions and 0 deletions
--- a/mcp-server/src/legal_mcp/services/db.py
+++ b/mcp-server/src/legal_mcp/services/db.py
@@ -5237,6 +5237,50 @@ async def seed_goldset_from_chair(
        return False


+async def panel_rounds_vs_chair(limit: int = 2000) -> list[dict]:
+    """Read-only join for rubric distillation (#133 / FU-4).
+
+    For every halacha that has BOTH a chair-live gold-set seed (FU-2 — the
+    human keep/drop ground-truth) AND at least one captured panel round (FU-1),
+    return the LATEST round's per-judge votes+reasons+verdict next to the
+    chair's label and the halacha text. These (panel ⋈ chair) pairs are the
+    ONLY signal the distillation may learn from — human ground-truth, never the
+    panel's own votes (echo-chamber guard, INV-LRN1). Purely analytical: reads
+    capture tables, writes nothing."""
+    pool = await get_pool()
+    rows = await pool.fetch(
+        """
+        SELECT g.halacha_id::text AS halacha_id,
+               g.is_holding AS chair_keep, g.tagged_at AS chair_at,
+               h.rule_statement, h.reasoning_summary, h.supporting_quote,
+               h.rule_type, h.quality_flags,
+               pr.question, pr.verdict, pr.applied_action, pr.round_ts,
+               pr.claude_vote, pr.claude_reason,
+               pr.deepseek_vote, pr.deepseek_reason,
+               pr.gemini_vote, pr.gemini_reason
+        FROM halacha_goldset g
+        JOIN halachot h ON h.id = g.halacha_id
+        JOIN LATERAL (
+            SELECT * FROM halacha_panel_rounds r
+            WHERE r.halacha_id = g.halacha_id
+            ORDER BY r.round_ts DESC LIMIT 1
+        ) pr ON true
+        WHERE g.batch = 'chair-live' AND g.is_holding IS NOT NULL
+        ORDER BY g.tagged_at DESC NULLS LAST
+        LIMIT $1
+        """,
+        limit,
+    )
+    out = []
+    for r in rows:
+        d = dict(r)
+        for k in ("chair_at", "round_ts"):
+            if d.get(k) is not None:
+                d[k] = d[k].isoformat()
+        out.append(d)
+    return out
+
+
 async def goldset_tag(
    goldset_id: UUID, *, is_holding: bool | None = None,
    correct_type: str | None = None, quote_complete: bool | None = None,