feat(digests): digest_kind classification — robust extraction for all issue types (X12)

~2% of "Kol Yom" (כל יום) issues are non-rulings (legislation updates/announcements/greetings) with no ruling →
the decision-centric extraction returned empty → both-empty → endless retry cycle in self-heal.

- SCHEMA_V32: `digest_kind` (decision/announcement/other) + cheap backfill of legacy rows
  (has citation→decision, otherwise announcement), run before self-heal relies on it.
- extractor: the prompt classifies and always extracts concept/headline/summary; underlying_* only
  for decision. extract normalizes digest_kind.
- enrich: persists digest_kind; a successful extraction always ends with a non-empty kind (default
  by citation if the model omitted it).
- drain self-heal: failure definition = completed with digest_kind='' (instead of both-empty) →
  announcements are no longer retried forever.
- db: digest_kind in _DIGEST_COLS + update whitelist (flows through to search/list/API).
- X12 spec: documents digest_kind + the corrected failure definition.
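
A minimal sketch of the backfill rule and the kind normalization described in the list above. The helper names `backfill_kind` and `normalize_kind` are hypothetical and not taken from the codebase, and the lowercase/strip behaviour of the normalization is an assumption; only the three kind values and the citation-based rule come from this commit.

```python
from __future__ import annotations

# The three values introduced by SCHEMA_V32.
VALID_KINDS = ("decision", "announcement", "other")


def backfill_kind(citation: str | None) -> str:
    """Cheap legacy rule: a row with a ruling citation is a decision,
    anything else is treated as an announcement."""
    return "decision" if (citation or "").strip() else "announcement"


def normalize_kind(raw: str | None) -> str:
    """Assumed normalization: trim and lowercase the model's answer and
    return '' for anything outside VALID_KINDS, so the caller can fall
    back to the citation-based default."""
    kind = (raw or "").strip().lower()
    return kind if kind in VALID_KINDS else ""
```

With these helpers, an omitted or unrecognized kind normalizes to the empty string, and a caller could fill it with `normalize_kind(raw) or backfill_kind(citation)`.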

Verified: V32 classified 533 (525 decision + 8 announcement, 0 unclassified, so self-heal
does not touch them). extract: 5163→decision+citation · 5060→announcement+concept,
citation empty (not both-empty).
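
A rough sketch of why the failure definition changed, not the actual drain code. It assumes "both-empty" means citation and concept both empty, and that rows expose `status`, `underlying_citation`, `concept` and `digest_kind`; the function names and sample rows are hypothetical.

```python
from typing import Any, Mapping


def _empty(row: Mapping[str, Any], key: str) -> bool:
    return not (row.get(key) or "").strip()


def failed_old(row: Mapping[str, Any]) -> bool:
    """Previous rule (assumed shape): completed, yet citation and concept are both empty."""
    return (
        row.get("status") == "completed"
        and _empty(row, "underlying_citation")
        and _empty(row, "concept")
    )


def failed_new(row: Mapping[str, Any]) -> bool:
    """New rule: completed with an empty digest_kind."""
    return row.get("status") == "completed" and _empty(row, "digest_kind")


# A non-ruling issue that legitimately has no citation and no concept.
non_ruling = {"status": "completed", "underlying_citation": "",
              "concept": "", "digest_kind": "other"}
# A row whose extraction genuinely failed: nothing was classified.
unclassified = {"status": "completed", "underlying_citation": "",
                "concept": "", "digest_kind": ""}
```

Under the old rule `non_ruling` would be retried on every drain pass; under the new rule only `unclassified` is picked up again.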

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
2026-06-08 06:02:08 +00:00
parent 5bf2ea0262
commit 83d1a8253c
5 changed files with 67 additions and 21 deletions


@@ -211,6 +211,16 @@ async def enrich_digest(digest_id: UUID | str, progress: ProgressCb | None = Non
        fields["underlying_date"] = extracted["underlying_date"]
    if not (row.get("subject_tags") or []) and extracted.get("subject_tags"):
        fields["subject_tags"] = extracted["subject_tags"]
    # digest_kind classifies the issue (decision vs announcement). A successful
    # extraction (any field returned) must end with a non-empty kind — that is the
    # signal the drain self-heal uses to tell "enriched" from "failed". If the
    # model omitted it, infer: a ruling citation → decision, else announcement.
    if extracted and not (row.get("digest_kind") or "").strip():
        kind = extracted.get("digest_kind")
        if kind not in ("decision", "announcement", "other"):
            cite = fields.get("underlying_citation") or row.get("underlying_citation") or ""
            kind = "decision" if cite.strip() else "announcement"
        fields["digest_kind"] = kind
    if fields:
        try: