feat(goldset): AI second-opinion per item (QA aid) — compare vs human tag
The chair wanted an independent recommendation beside each tag, to reconsider his own judgments.

Adds a NON-ground-truth AI second-opinion:
- schema: halacha_goldset.ai_is_holding / ai_correct_type / ai_rationale / ai_generated_at (additive).
- db.goldset_set_ai_recommendation + goldset_list now returns the ai_* fields.
- scripts/goldset_ai_recommend.py: local claude_session judges is_holding + type + a one-line rationale per item, INDEPENDENTLY (own legal rubric). Independent of the rule-based validators #81.8 measures → no circularity. Never auto-applied; QA aid only.
- web-ui: each card shows "🤖 המלצת AI: הלכה/לא · type" + rationale and an agreement/disagreement chip vs the human tag (amber on disagree); a "⚠ אי-הסכמות AI (N)" filter to review only the conflicts.

Methodology note kept explicit: the human stays the ground truth; the AI is a prompt to reconsider, not to copy.

Verified: tsc --noEmit 0; generator stores recs and flags disagreements with existing human tags.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
@@ -29,6 +29,11 @@ export type GoldsetItem = {
   case_number: string | null;
   case_name: string | null;
   source_type: string | null; // 'court_ruling' | 'appeals_committee' | ''
+  // AI second-opinion (QA aid — independent, not ground truth, not auto-applied)
+  ai_is_holding: boolean | null;
+  ai_correct_type: string;
+  ai_rationale: string;
+  ai_generated_at: string | null;
 };
 
 export type GoldsetScore = {
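The agreement/disagreement chip and the conflicts filter described in the commit message could be derived roughly as follows. This is a minimal sketch, not the code in this commit: the `ai_*` fields are taken from the diff, while the human-tag field names (`is_holding`, `correct_type`), the helper names, and the rule that types are compared only when both sides say "holding" are assumptions made for illustration.

```typescript
// Hypothetical item shape: the ai_* fields mirror the diff above; the
// human-tag fields (is_holding, correct_type) are assumed names.
type TaggedItem = {
  is_holding: boolean | null; // assumed human tag, null when untagged
  correct_type: string; // assumed human tag, '' when unset
  ai_is_holding: boolean | null;
  ai_correct_type: string;
  ai_rationale: string;
  ai_generated_at: string | null;
};

type Agreement = "none" | "agree" | "disagree";

// Compare the AI second opinion with the human tag.
// "none" means there is nothing to compare yet (either side is missing).
function aiAgreement(item: TaggedItem): Agreement {
  if (item.ai_is_holding === null || item.is_holding === null) return "none";
  if (item.ai_is_holding !== item.is_holding) return "disagree";
  // Assumption: the type only matters when both sides call it a holding.
  if (item.is_holding && item.ai_correct_type !== item.correct_type) {
    return "disagree";
  }
  return "agree";
}

// Keep only the items where the AI conflicts with the human tag.
function aiDisagreements<T extends TaggedItem>(items: T[]): T[] {
  return items.filter((item) => aiAgreement(item) === "disagree");
}
```

The human tag is never modified here; the result only drives what the reviewer is shown, which matches the "never auto-applied" constraint in the commit message.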