feat(digests): self-heal in drain_digests — auto-resume after quota/interruption

The drain_digests cron is the resume mechanism (pending-based, idempotent,
host-side, independent of any session). Hardening: if enrich failed midway
(claude quota exhausted), the row stayed 'completed' with empty fields and was
never processed again. Now drain starts by resetting every 'completed' digest
with an empty concept_tag *and* an empty underlying_citation (= an extraction
that never landed; a valid row always contains at least a citation) to
'pending' for a re-run. Any interruption or quota exhaustion therefore
recovers automatically on the next cron run.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
2026-06-07 20:59:49 +00:00
parent 106ab53231
commit 3ae183009f

@@ -36,6 +36,20 @@ CONCURRENCY = int(os.environ.get("DIGEST_DRAIN_CONCURRENCY", "3"))
async def main() -> int:
    pool = await db.get_pool()
    # Self-heal: an enrich that failed mid-LLM (e.g. the local claude
    # subscription window was exhausted) can leave a row 'completed' with no
    # concept_tag AND no underlying_citation. A real digest always extracts at
    # least a citation, so "both empty" means the extraction never landed. Reset
    # those (only where analysis_text is non-empty) to 'pending' so the next run
    # retries (idempotent auto-resume). Safe: enriched rows keep a tag or citation.
    healed = await pool.execute(
        "UPDATE digests SET extraction_status = 'pending' "
        "WHERE extraction_status = 'completed' "
        "AND coalesce(concept_tag,'') = '' AND coalesce(underlying_citation,'') = '' "
        "AND coalesce(analysis_text,'') <> ''"
    )
    if healed and healed != "UPDATE 0":
        print(f"self-heal: reset failed-empty digests → pending ({healed})", flush=True)
    rows = await pool.fetch(
        "SELECT id FROM digests WHERE extraction_status = 'pending' ORDER BY created_at"
    )
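
To see which rows the self-heal predicate touches, and that a second pass is a no-op, here is a minimal sketch that runs the same UPDATE against an in-memory SQLite table. SQLite stands in for the project's Postgres pool purely so the example runs on its own; the `heal`, `make_demo_db`, and `statuses` helpers and the sample rows are hypothetical and not part of this commit.

```python
import sqlite3

# Same predicate as the self-heal UPDATE in the diff above.
HEAL_SQL = (
    "UPDATE digests SET extraction_status = 'pending' "
    "WHERE extraction_status = 'completed' "
    "AND coalesce(concept_tag,'') = '' AND coalesce(underlying_citation,'') = '' "
    "AND coalesce(analysis_text,'') <> ''"
)


def heal(conn: sqlite3.Connection) -> int:
    """Reset failed-empty digests to 'pending'; return how many rows changed."""
    cur = conn.execute(HEAL_SQL)
    conn.commit()
    return cur.rowcount


def make_demo_db() -> sqlite3.Connection:
    """Build an in-memory digests table with hypothetical sample rows."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE digests (id INTEGER PRIMARY KEY, extraction_status TEXT, "
        "concept_tag TEXT, underlying_citation TEXT, analysis_text TEXT)"
    )
    conn.executemany(
        "INSERT INTO digests VALUES (?, ?, ?, ?, ?)",
        [
            (1, "completed", "tag", "cite", "text"),  # healthy: left alone
            (2, "completed", None, "", "text"),       # failed-empty: reset
            (3, "completed", "", "cite", "text"),     # has a citation: left alone
            (4, "completed", None, None, None),       # no analysis_text: left alone
            (5, "pending", None, None, "text"),       # already pending: unchanged
        ],
    )
    conn.commit()
    return conn


def statuses(conn: sqlite3.Connection) -> dict:
    """Map each digest id to its current extraction_status."""
    return dict(conn.execute("SELECT id, extraction_status FROM digests ORDER BY id"))


if __name__ == "__main__":
    conn = make_demo_db()
    print("first pass reset:", heal(conn))
    print("second pass reset:", heal(conn))
    print(statuses(conn))
```

Because the UPDATE moves matching rows out of the 'completed' state, they no longer match the predicate on the next pass, which is what makes running it at the start of every cron invocation safe.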