feat(goldset): AI second-opinion per item (QA aid) — compare vs human tag

The chair wanted an independent recommendation beside each tag, as a prompt to reconsider his own judgments.

Adds a NON-ground-truth AI second opinion:

- schema: halacha_goldset.ai_is_holding / ai_correct_type / ai_rationale / ai_generated_at (additive).
- db.goldset_set_ai_recommendation; goldset_list now also returns the ai_* fields.
- scripts/goldset_ai_recommend.py: a local claude_session judges is_holding + type and gives a one-line rationale per item, INDEPENDENTLY (its own legal rubric). It is independent of the rule-based validators that #81.8 measures, so there is no circularity. Never auto-applied; QA aid only.
- web-ui: each card shows "🤖 המלצת AI: הלכה/לא · type" ("AI recommendation: holding/not · type") plus the rationale and an agreement/disagreement chip vs the human tag (amber on disagree); a "⚠ אי-הסכמות AI (N)" ("AI disagreements (N)") filter shows only the conflicts.

Methodology note kept explicit: the human stays the ground truth; the AI is a prompt to reconsider, not to copy.

Verified: tsc --noEmit exits 0; the generator stores recs and flags disagreements with existing human tags.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
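
Illustrative sketch only, not the code in this commit: one way the agreement/disagreement chip and the conflicts filter described above could be derived from the ai_* fields. The human-tag field names (is_holding, correct_type), the "pending" state, and the rule that type is compared only when both sides say "holding" are assumptions made for this example.

```typescript
// Hypothetical subset of GoldsetItem. The ai_* fields are from this commit;
// the human-tag field names below are assumed for illustration.
type TaggedItem = {
  is_holding: boolean | null; // human tag (assumed name), null = not tagged yet
  correct_type: string;       // human tag (assumed name)
  ai_is_holding: boolean | null;
  ai_correct_type: string;
};

type Agreement = "agree" | "disagree" | "pending";

// Compare the AI second opinion with the human tag.
// The human tag is never modified here; the result only drives the chip.
function aiAgreement(item: TaggedItem): Agreement {
  // Nothing to compare until both a human tag and an AI recommendation exist.
  if (item.is_holding === null || item.ai_is_holding === null) return "pending";
  if (item.is_holding !== item.ai_is_holding) return "disagree";
  // Assumption: the type is only meaningful when both sides say "holding".
  if (item.is_holding && item.correct_type !== item.ai_correct_type) return "disagree";
  return "agree";
}

// Items the "AI disagreements (N)" filter would show; N is the array length.
function aiDisagreements<T extends TaggedItem>(items: T[]): T[] {
  return items.filter((it) => aiAgreement(it) === "disagree");
}
```

Keeping the comparison a pure function of the item means the chip and the filter count cannot drift apart, and the human tag stays the only stored ground truth.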
2026-06-07 14:24:35 +00:00
parent a0c1b74c55
commit 0e35060d3d
5 changed files with 184 additions and 3 deletions
--- a/web-ui/src/lib/api/goldset.ts
+++ b/web-ui/src/lib/api/goldset.ts
@@ -29,6 +29,11 @@ export type GoldsetItem = {
  case_number: string | null;
  case_name: string | null;
  source_type: string | null;  // 'court_ruling' | 'appeals_committee' | ''
+  // AI second-opinion (QA aid — independent, not ground truth, not auto-applied)
+  ai_is_holding: boolean | null;
+  ai_correct_type: string;
+  ai_rationale: string;
+  ai_generated_at: string | null;
 };

 export type GoldsetScore = {