feat(goldset): interactive gold-set tagging page (#81.7/#81.8)

Replaces the CSV-edit workflow with an in-app tagging page so the chair/Dafna
can label the extraction-quality gold-set by clicking and see validator
precision/recall live.

Schema (V29): halacha_goldset, a stratified, human-tagged evaluation batch
(is_holding / correct_type / quote_complete, NULL until tagged).

db.py:
- goldset_create_sample (stratified round-robin over case×rule_type, idempotent),
- goldset_list (items + halacha content + the machine's own labels),
- goldset_tag (partial: one field at a time, for keyboard tagging),
- goldset_score (ports the script's P/R/F1: each validator is scored as a
  not-a-holding detector against the human tags; this is the #81.8 input).
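
The scoring idea in goldset_score can be sketched as follows. This is a minimal illustration, not the code in db.py (which this diff does not show): the names `DetectorScore` and `score_not_holding_detector` are hypothetical, and returning 0.0 for empty denominators is an assumption. The positive class is "not a holding", so a validator flag on an item the human tagged is_holding=False counts as a true positive.

```python
from dataclasses import dataclass
from typing import Optional, Sequence


@dataclass
class DetectorScore:
    precision: float
    recall: float
    f1: float


def score_not_holding_detector(
    flagged: Sequence[bool], is_holding: Sequence[Optional[bool]]
) -> DetectorScore:
    """Score one validator against human tags (hypothetical helper).

    flagged[i]    -- the validator flagged item i as not-a-holding
    is_holding[i] -- the human tag; None means the item is still untagged
    """
    tp = fp = fn = 0
    for flag, human in zip(flagged, is_holding):
        if human is None:  # untagged items carry no ground truth yet
            continue
        not_holding = not human  # positive class: "this is not a holding"
        if flag and not_holding:
            tp += 1
        elif flag:
            fp += 1
        elif not_holding:
            fn += 1
    # Assumption: report 0.0 instead of raising when a denominator is empty.
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    denom = precision + recall
    f1 = 2 * precision * recall / denom if denom else 0.0
    return DetectorScore(precision, recall, f1)
```

Skipping untagged rows means the score only moves as the human tags more items, which is what a live score table would need.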

API:
- GET /api/goldset,
- POST /api/goldset/sample,
- GET /api/goldset/score,
- PATCH /api/goldset/{id}.
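
Since the PATCH body carries three optional tag fields that all default to None, one plausible reading of "partial" tagging is that only the supplied fields are written. A minimal sketch under that assumption follows; `merge_tag` and `TAG_FIELDS` are hypothetical names, and the real db.goldset_tag may behave differently.

```python
from typing import Any, Dict

# The three human-tag columns named in the schema description.
TAG_FIELDS = ("is_holding", "correct_type", "quote_complete")


def merge_tag(row: Dict[str, Any], patch: Dict[str, Any]) -> Dict[str, Any]:
    """Return a copy of row with only the non-None patch fields applied.

    Assumption: None means "field not sent", so it never overwrites a tag.
    False is a real value and is applied.
    """
    updated = dict(row)
    for field in TAG_FIELDS:
        value = patch.get(field)
        if value is not None:
            updated[field] = value
    return updated


# One keypress sets one field; earlier tags are preserved.
item = {"is_holding": None, "correct_type": None, "quote_complete": None}
item = merge_tag(item, {"is_holding": False})
item = merge_tag(item, {"quote_complete": True})
```

A consequence of this design is that a tag cannot be reset to NULL through the same request shape, because None is indistinguishable from an omitted field.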

web-ui:
- lib/api/goldset.ts (hooks),
- components/goldset/goldset-panel.tsx: card-per-item, keyboard-first
  (J/K nav, H/N holding, C/X quote), progress bar, hide-tagged toggle, and a
  collapsible live score table,
- app/goldset/page.tsx + nav link "מדגם-זהב" ("Gold sample") under ידע ולמידה
  ("Knowledge and learning").

Methodology guard kept explicit in UI + docstrings: tags are HUMAN ground truth,
with no AI pre-fill (pre-filling would introduce circular bias). Populated a
150-item stratified batch.
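
For illustration, a stratified round-robin draw over case×rule_type strata could look like the sketch below. It is not the goldset_create_sample implementation (not shown in this diff): the function name, the tuple layout of `items`, and the sorted stratum order are all assumptions made for the example.

```python
from collections import defaultdict, deque
from typing import Deque, Dict, Iterable, List, Tuple


def stratified_round_robin(
    items: Iterable[Tuple[str, str, str]], n: int
) -> List[str]:
    """Pick up to n item ids, cycling over (case_id, rule_type) strata.

    items yields (item_id, case_id, rule_type) tuples (hypothetical layout).
    """
    strata: Dict[Tuple[str, str], Deque[str]] = defaultdict(deque)
    for item_id, case_id, rule_type in items:
        strata[(case_id, rule_type)].append(item_id)

    # Sorted stratum order keeps the draw deterministic for the same input.
    queues = deque(strata[key] for key in sorted(strata))
    picked: List[str] = []
    while queues and len(picked) < n:
        queue = queues.popleft()
        picked.append(queue.popleft())
        if queue:  # stratum still has items: send it to the back of the line
            queues.append(queue)
    return picked
```

Taking one item per stratum per pass keeps large cases from crowding out small ones until the small strata are exhausted.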

Verified: backend create/list/tag/score against the live DB; tsc --noEmit 0;
py_compile ok. (Local Turbopack build blocked by worktree symlink; CI builds clean.)

Invariants: G1 (eval set modeled at source in its own table); G2 (reuses the same
halacha_quality validators the extractor runs, so no parallel scoring logic).

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>

web/app.py (53 lines added)
@@ -6099,6 +6099,59 @@ async def halacha_equivalents_unlink(halacha_id: str, other_id: str):
     return {"ok": await db.unlink_equivalent_halachot(hid, oid)}


+# ── Gold-set tagging (#81.7 / #81.8) ─────────────────────────────────────────
+
+class GoldsetSampleRequest(BaseModel):
+    n: int = 150
+    batch: str = "default"
+    reset: bool = False
+
+
+class GoldsetTagRequest(BaseModel):
+    is_holding: bool | None = None
+    correct_type: str | None = None
+    quote_complete: bool | None = None
+    tagged_by: str = "chair"
+
+
+@app.get("/api/goldset")
+async def goldset_list_ep(batch: str = "default"):
+    """The gold-set tagging queue (halacha content + machine labels + human tags)."""
+    return {"items": await db.goldset_list(batch), "batch": batch}
+
+
+@app.post("/api/goldset/sample")
+async def goldset_sample_ep(req: GoldsetSampleRequest):
+    """Create/extend a stratified gold-set batch for tagging (#81.7)."""
+    return await db.goldset_create_sample(n=req.n, batch=req.batch, reset=req.reset)
+
+
+@app.get("/api/goldset/score")
+async def goldset_score_ep(batch: str = "default"):
+    """Measure the extraction validators against the human tags (#81.8)."""
+    return await db.goldset_score(batch)
+
+
+@app.patch("/api/goldset/{goldset_id}")
+async def goldset_tag_ep(goldset_id: str, req: GoldsetTagRequest):
+    """Save one human tag on a gold-set item."""
+    try:
+        gid = UUID(goldset_id)
+    except ValueError:
+        raise HTTPException(400, "מזהה לא תקין")
+    if req.correct_type and req.correct_type not in (
+        "binding", "interpretive", "obiter", "application", "procedural", "persuasive",
+    ):
+        raise HTTPException(400, "correct_type לא תקין")
+    row = await db.goldset_tag(
+        gid, is_holding=req.is_holding, correct_type=req.correct_type,
+        quote_complete=req.quote_complete, tagged_by=req.tagged_by,
+    )
+    if not row:
+        raise HTTPException(404, "פריט לא נמצא")
+    return {"ok": True}
+
+
 @app.patch("/api/halachot/{halacha_id}")
 async def halacha_update(halacha_id: str, req: HalachaUpdateRequest):
     """Approve / reject / edit a halacha. Used by the chair review queue."""