feat(goldset): interactive gold-set tagging page (#81.7/#81.8)

Replaces the CSV-edit workflow with an in-app tagging page so the chair/Dafna can label the extraction-quality gold set by clicking and see validator precision/recall live.

Schema (V29): halacha_goldset, a stratified, human-tagged evaluation batch (is_holding / correct_type / quote_complete; NULL until tagged).

db.py:
- goldset_create_sample: stratified round-robin over case×rule_type; idempotent.
- goldset_list: items plus halacha content plus the machine's own labels.
- goldset_tag: partial update, one field at a time, for keyboard tagging.
- goldset_score: ports the script's P/R/F1; each validator is scored as a not-a-holding detector against the human tags (the #81.8 input).

API: GET /api/goldset, POST /api/goldset/sample, GET /api/goldset/score, PATCH /api/goldset/{id}.

web-ui:
- lib/api/goldset.ts: hooks.
- components/goldset/goldset-panel.tsx: card-per-item, keyboard-first (J/K navigation, H/N holding, C/X quote), progress bar, hide-tagged toggle, and a collapsible live score table.
- app/goldset/page.tsx, plus a nav link "מדגם-זהב" ("gold set") under "ידע ולמידה" ("Knowledge & Learning").

Methodology guard kept explicit in the UI and docstrings: tags are HUMAN ground truth, with no AI pre-fill (it would introduce circular bias).

Populated a 150-item stratified batch.

Verified: backend create/list/tag/score against the live DB; tsc --noEmit: 0 errors; py_compile ok. (The local Turbopack build is blocked by a worktree symlink; CI builds clean.)

Invariants:
- G1: the eval set is modeled at the source, in its own table.
- G2: reuses the same halacha_quality validators the extractor runs; no parallel scoring logic.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
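The stratified round-robin behind goldset_create_sample can be illustrated with a minimal TypeScript sketch. It is not the db.py implementation: the `Halacha` shape, the name `stratifiedSample`, and the reading of "idempotent" as "only top the batch up to the target size, skipping ids already sampled" are all assumptions for illustration.

```typescript
// Hypothetical shape of an extracted halacha; field names are illustrative,
// not the actual V29 schema.
interface Halacha {
  id: number;
  caseId: string;
  ruleType: string;
}

/**
 * Stratified round-robin sample over case×rule_type (illustrative sketch).
 * Deterministic (strata and items are sorted) and idempotent with respect to
 * `alreadySampled`: it returns only the ids needed to reach `target`.
 */
function stratifiedSample(
  pool: Halacha[],
  target: number,
  alreadySampled: ReadonlySet<number> = new Set<number>(),
): number[] {
  const needed = target - alreadySampled.size;
  if (needed <= 0) return [];

  // Bucket not-yet-sampled items by their (case, rule_type) stratum.
  const strata = new Map<string, Halacha[]>();
  for (const h of pool) {
    if (alreadySampled.has(h.id)) continue;
    const key = `${h.caseId}\u0000${h.ruleType}`;
    const bucket = strata.get(key);
    if (bucket) bucket.push(h);
    else strata.set(key, [h]);
  }

  // Sort stratum keys and each bucket so repeated runs pick the same items.
  const queues = Array.from(strata.keys())
    .sort()
    .map((k) => strata.get(k)!.slice().sort((a, b) => a.id - b.id));

  // Round-robin: take one item per stratum per pass until full or exhausted.
  const picked: number[] = [];
  for (let round = 0; picked.length < needed; round++) {
    let tookAny = false;
    for (const q of queues) {
      if (round < q.length) {
        picked.push(q[round].id);
        tookAny = true;
        if (picked.length === needed) break;
      }
    }
    if (!tookAny) break; // every stratum is exhausted
  }
  return picked;
}
```

Round-robin ensures every small stratum contributes an item before any large stratum contributes a second one, and the sorting makes a re-run against unchanged data select the same rows.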
2026-06-06 21:52:05 +00:00
parent 9bd247c421
commit ac279220c4
6 changed files with 632 additions and 1 deletion
--- a/web-ui/src/app/goldset/page.tsx
+++ b/web-ui/src/app/goldset/page.tsx
@@ -0,0 +1,41 @@
+"use client";
+
+import Link from "next/link";
+import { AppShell } from "@/components/app-shell";
+import { GoldsetPanel } from "@/components/goldset/goldset-panel";
+
+/**
+ * Gold-set tagging page (#81.7 / #81.8).
+ *
+ * Interactive review of a stratified halacha sample. The chair/Dafna labels each
+ * item (is_holding / correct_type / quote_complete); those human labels are the
+ * ground truth that measures the extraction validators and recalibrates the
+ * auto-approve threshold. Tags MUST be human — no AI pre-fill (circular bias).
+ */
+export default function GoldsetPage() {
+  return (
+    <AppShell>
+      <section className="space-y-6">
+        <header>
+          <nav className="text-[0.78rem] text-ink-muted mb-1">
+            <Link href="/" className="hover:text-gold-deep">בית</Link>
+            <span aria-hidden> · </span>
+            <span className="text-navy">מדגם-זהב לתיוג</span>
+          </nav>
+          <h1 className="text-navy mb-0">מדגם-זהב לתיוג איכות</h1>
+          <p className="text-ink-muted text-sm mt-1 max-w-3xl">
+            מדגם מרובד של הלכות שחולצו. לכל הלכה הכריעו שלוש שאלות —
+            <strong> האם זו הלכה אמיתית</strong>, <strong>מה הסוג הנכון</strong>,
+            ו<strong>האם הציטוט שלם</strong>. ההכרעות שלכם הן אמת-המידה שמודדת את
+            דיוק המחלץ ומכיילת את סף-האישור האוטומטי. שיפוט משפטי אנושי בלבד —
+            לא תיוג-AI (כדי למנוע הטיה מעגלית).
+          </p>
+        </header>
+
+        <div className="h-[2px] bg-gradient-to-l from-transparent via-gold to-transparent" />
+
+        <GoldsetPanel />
+      </section>
+    </AppShell>
+  );
+}
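The page docstring says the human labels measure the extraction validators, and the commit's goldset_score does so by scoring each validator as a not-a-holding detector. The P/R/F1 computation can be sketched in TypeScript as below. This is an assumption-laden illustration, not the db.py code: the `ScoredItem` shape (`isHolding`, `flaggedBy`), the name `scoreValidators`, the validator names in any usage, and the choice to return `null` for an undefined 0/0 ratio are all hypothetical.

```typescript
// Hypothetical item shape: the human tag plus the validators that flagged it.
interface ScoredItem {
  isHolding: boolean | null; // null = not yet tagged by a human
  flaggedBy: string[];       // names of validators that fired on this item
}

interface ValidatorScore {
  validator: string;
  tp: number;
  fp: number;
  fn: number;
  precision: number | null; // null when the ratio is undefined (0/0)
  recall: number | null;
  f1: number | null;
}

/**
 * Score each validator as a "not-a-holding" detector (illustrative sketch).
 * Positive class: the human tagged is_holding = false. A validator predicts
 * positive when it flags the item. Untagged items are skipped.
 */
function scoreValidators(items: ScoredItem[], validators: string[]): ValidatorScore[] {
  const tagged = items.filter((it) => it.isHolding !== null);
  return validators.map((v) => {
    let tp = 0, fp = 0, fn = 0;
    for (const it of tagged) {
      const flagged = it.flaggedBy.includes(v);
      const notHolding = it.isHolding === false;
      if (flagged && notHolding) tp++;  // correctly flagged a non-holding
      else if (flagged) fp++;           // flagged a real holding
      else if (notHolding) fn++;        // missed a non-holding
    }
    const precision = tp + fp === 0 ? null : tp / (tp + fp);
    const recall = tp + fn === 0 ? null : tp / (tp + fn);
    const f1 =
      precision === null || recall === null
        ? null
        : precision + recall === 0
          ? 0
          : (2 * precision * recall) / (precision + recall);
    return { validator: v, tp, fp, fn, precision, recall, f1 };
  });
}
```

Skipping untagged rows keeps the live score table honest while tagging is still in progress: the metrics only ever reflect items a human has actually judged.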