feat(halacha-triage): quality-gated + prioritized review queue + metrics (#84)

Backend for the halacha approval-queue triage (#84). The keyboard UI, batch actions and defer/reject (#84.4–6) already shipped; this adds the gating, prioritization and metrics the queue was missing.

db.list_halachot gains two opt-in triage controls:
* exclude_low_quality (#84.1): drop items carrying ANY quality_flag (application / quote_unverified / truncated / non_decision / thin / nli_unsupported / near_duplicate). They belong in a 'needs extraction fix' bucket, not the chair's approve queue.
* order_by_priority (#84.3): replaces FIFO with an active-learning order (negatively-treated first, then most uncertain, i.e. lowest confidence, then oldest) so the highest-value decisions surface first.

halachot_pending (MCP) is now gated + prioritized BY DEFAULT; include_low_quality=true reveals the needs-fix bucket. The agent review path benefits immediately.

GET /api/halachot takes the same two params, default OFF (non-breaking; the UI opts in).

metrics.halacha_backlog (#84.7) splits pending into clean vs flagged and adds deferred, reviewed_total, approve_ratio and a pending_by_flag breakdown, so the backlog distinguishes real review work from extraction noise.

Deferred (documented): #84.2 near-duplicate cluster cards, and wiring the UI fetch to the new params, need frontend work plus an api:types regen AFTER this deploys (the new query params aren't in prod's OpenAPI until then). Both are a clean follow-up; the backend already supports them.

Verified against the live DB (read-only):
- pending 177 → gated-clean 110; 0 flagged items leak into the clean queue.
- priority order surfaces the lowest-confidence items first (0.55, 0.55, ...).
- backlog: pending_clean=110 / pending_flagged=67 / approve_ratio=0.916, pending_by_flag={nli_unsupported:59, quote_unverified:3, thin:3, truncated:2}.
- pytest tests/test_halacha_quality.py: 52 passed (no regression).
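The two list_halachot controls can be pictured as a single query whose filter sits in the SQL WHERE clause, as G1 requires, rather than in a post-hoc Python filter. The following is a minimal sketch, not the actual list_halachot code. The function name `build_pending_query` and the columns `negative_treatment` and `confidence` are hypothetical. Only `review_status`, `quality_flags` and `created_at` appear in this commit's diff. The placeholders follow asyncpg's `$n` style.

```python
from typing import Any


def build_pending_query(
    exclude_low_quality: bool = False,
    order_by_priority: bool = False,
) -> tuple[str, list[Any]]:
    """Hypothetical sketch of the pending-queue SQL with both triage controls.

    ASSUMED columns (not confirmed by this commit): negative_treatment (bool)
    and confidence (float). review_status, quality_flags and created_at are
    the columns the halacha_backlog metrics query already uses.
    """
    params: list[Any] = ["pending_review"]
    where = ["review_status = $1"]  # asyncpg-style positional placeholder

    if exclude_low_quality:
        # Gate at source (G1): any row carrying a quality flag is excluded in SQL.
        # array_length() returns NULL for an empty array, hence the COALESCE.
        where.append("COALESCE(array_length(quality_flags, 1), 0) = 0")

    if order_by_priority:
        # Active-learning order: negatively-treated first (true sorts before
        # false under DESC), then lowest confidence, then oldest.
        order = (
            "negative_treatment DESC NULLS LAST, "
            "confidence ASC NULLS LAST, "
            "created_at ASC"
        )
    else:
        order = "created_at ASC"  # plain FIFO

    sql = (
        "SELECT * FROM halachot WHERE "
        + " AND ".join(where)
        + " ORDER BY "
        + order
    )
    return sql, params
```

Under this sketch, the MCP tool would call `build_pending_query(exclude_low_quality=not include_low_quality, order_by_priority=True)` and then `await conn.fetch(sql, *params)`. GET /api/halachot would keep both flags False unless the caller opts in. Because the needs-fix bucket is the same query with the gate turned off, no flagged item is ever dropped.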

Invariants: G1 (gate at source: SQL filter, not post-hoc); G2 (no parallel path: same list_halachot); §6 (flagged items routed to a bucket, never dropped).

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
2026-06-06 20:00:52 +00:00
parent 32ef259843
commit 420cb819f5
4 changed files with 70 additions and 5 deletions
--- a/mcp-server/src/legal_mcp/services/metrics.py
+++ b/mcp-server/src/legal_mcp/services/metrics.py
@@ -117,12 +117,33 @@ async def halacha_backlog(conn) -> dict:
    oldest = await conn.fetchval(
        "SELECT MIN(created_at) FROM halachot WHERE review_status = 'pending_review'"
    )
+    # #84.7 — split the pending bucket: how many are genuine candidates (clean)
+    # vs flagged 'needs extraction fix', and the breakdown by flag, so the chair
+    # sees how much of the backlog is real review vs extraction noise.
+    pending_clean = await conn.fetchval(
+        "SELECT COUNT(*) FROM halachot WHERE review_status = 'pending_review' "
+        "AND COALESCE(array_length(quality_flags, 1), 0) = 0"
+    )
+    flag_rows = await conn.fetch(
+        "SELECT flag, COUNT(*) AS n FROM ("
+        "  SELECT unnest(quality_flags) AS flag FROM halachot "
+        "  WHERE review_status = 'pending_review'"
+        ") t GROUP BY flag ORDER BY n DESC"
+    )
+    pending_total = counts.get("pending_review", 0)
+    reviewed = counts.get("approved", 0) + counts.get("rejected", 0) + counts.get("published", 0)
    return {
-        "pending_review": counts.get("pending_review", 0),
+        "pending_review": pending_total,
+        "pending_clean": pending_clean,           # real review candidates (#84.1)
+        "pending_flagged": pending_total - pending_clean,  # needs-fix bucket
        "approved": counts.get("approved", 0),
        "rejected": counts.get("rejected", 0),
+        "deferred": counts.get("deferred", 0),
        "published": counts.get("published", 0),
        "total": sum(counts.values()),
+        "reviewed_total": reviewed,
+        "approve_ratio": round(counts.get("approved", 0) / reviewed, 3) if reviewed else None,
+        "pending_by_flag": {r["flag"]: r["n"] for r in flag_rows},
        "oldest_pending_at": oldest.isoformat() if oldest else None,
    }
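The derived backlog fields are plain arithmetic over the status counts, so they can be sanity-checked without a database. The sketch below assumes the per-status `counts` dict, the clean-pending count and the per-flag counts have already been fetched. The function name `summarize_backlog` is hypothetical. It mirrors only the derived fields that halacha_backlog returns.

```python
def summarize_backlog(
    counts: dict[str, int],
    pending_clean: int,
    flag_counts: dict[str, int],
) -> dict:
    """Hypothetical pure-Python mirror of the derived halacha_backlog fields."""
    pending_total = counts.get("pending_review", 0)
    # Mirrors the diff: published rows count as reviewed, deferred rows do not.
    reviewed = (
        counts.get("approved", 0)
        + counts.get("rejected", 0)
        + counts.get("published", 0)
    )
    return {
        "pending_review": pending_total,
        "pending_clean": pending_clean,
        "pending_flagged": pending_total - pending_clean,
        "reviewed_total": reviewed,
        "approve_ratio": (
            round(counts.get("approved", 0) / reviewed, 3) if reviewed else None
        ),
        # Highest count first, like the SQL's ORDER BY n DESC (stable for ties).
        "pending_by_flag": dict(
            sorted(flag_counts.items(), key=lambda kv: kv[1], reverse=True)
        ),
    }


# Pending figures come from the live-DB check in the commit message; the
# approved/rejected counts are illustrative placeholders, NOT real data.
summary = summarize_backlog(
    counts={"pending_review": 177, "approved": 10, "rejected": 2},
    pending_clean=110,
    flag_counts={"nli_unsupported": 59, "quote_unverified": 3, "thin": 3, "truncated": 2},
)
```

pending_by_flag counts flag occurrences (via unnest), so its values sum to at least pending_flagged, and to strictly more whenever an item carries several flags. In the live check both totals are 67 (59 + 3 + 3 + 2). That means each flagged pending item carried exactly one flag at that point.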