feat(halacha-triage UI): wire gating + near-duplicate cluster cards (#84.2)

Completes #84 — surfaces the backend gating/prioritization (#84.1/#84.3, PR
#93) in the chair's review UI and adds near-duplicate clustering (#84.2).

Backend
- db.list_halachot gains `cluster` (#84.2): annotates each row with cluster_id +
  cluster_size by unioning same-precedent halachot within HALACHA_CLUSTER_COSINE
  (0.90, new config). Display-only: never merges/deletes. The pairwise
  comparison is confined to the returned set, so it stays cheap.
- GET /api/halachot exposes the `cluster` query param (default off).
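
A minimal in-memory sketch of the grouping described above, assuming rows are
plain dicts with `id`, `case_law_id` and `embedding` keys. The function name,
the row shape and the pure-Python cosine are illustrative assumptions; the
actual change runs the pairwise comparison in SQL.

```python
import math
from collections import Counter

CLUSTER_COSINE = 0.90  # mirrors the HALACHA_CLUSTER_COSINE default


def cosine(u, v):
    """Cosine similarity of two equal-length, non-zero vectors."""
    dot = sum(x * y for x, y in zip(u, v))
    norm_u = math.sqrt(sum(x * x for x in u))
    norm_v = math.sqrt(sum(y * y for y in v))
    return dot / (norm_u * norm_v)


def annotate_clusters(rows, threshold=CLUSTER_COSINE):
    """Add cluster_id + cluster_size to each row in place (hypothetical helper)."""
    parent = {r["id"]: r["id"] for r in rows}

    def find(x):
        # Union-find lookup with path halving.
        while parent[x] != x:
            parent[x] = parent[parent[x]]
            x = parent[x]
        return x

    for i, a in enumerate(rows):
        for b in rows[i + 1:]:
            if a["case_law_id"] != b["case_law_id"]:
                continue  # only halachot of the same precedent may be grouped
            if a["embedding"] is None or b["embedding"] is None:
                continue  # rows without an embedding stay singletons
            if cosine(a["embedding"], b["embedding"]) >= threshold:
                ra, rb = find(a["id"]), find(b["id"])
                if ra != rb:
                    parent[ra] = rb

    sizes = Counter(find(r["id"]) for r in rows)
    for r in rows:
        root = find(r["id"])
        r["cluster_id"] = root
        r["cluster_size"] = sizes[root]
    return rows


rows = [
    {"id": "h1", "case_law_id": "c1", "embedding": [1.0, 0.0]},
    {"id": "h2", "case_law_id": "c1", "embedding": [0.99, 0.05]},
    {"id": "h3", "case_law_id": "c1", "embedding": [0.0, 1.0]},
    {"id": "h4", "case_law_id": "c2", "embedding": [1.0, 0.0]},
]
annotate_clusters(rows)
```

Here `h1` and `h2` end up in one cluster of size 2, while `h3` (dissimilar) and
`h4` (different precedent) remain singletons. No row is merged or removed; the
annotation only adds two keys.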

Frontend (web-ui)
- Halacha type gains optional cluster_id / cluster_size (hand-written module; no
  api:types regen needed — halachot aren't typed off the generated schema).
- useHalachotPending(opts): the default "clean" queue now fetches
  exclude_low_quality + order_by_priority + cluster; needsFix:true returns the
  flagged 'needs extraction fix' bucket (filtered client-side).
- HalachaReviewPanel: a "תור נקי / דורש תיקון-חילוץ" ("clean queue / needs
  extraction fix") toggle (#84.1); near-dup clusters collapse into ONE card
  showing "+N וריאנטים" ("+N variants") with an expandable list, and
  approve/reject/defer on a clustered card applies to all variants via the
  batch endpoint (#84.2 + #84.4). Counts show true halacha totals (pendingTotal).
  New flag labels added (application / near_duplicate / nevo_preamble_leak).
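
The card-collapsing logic can be sketched as follows. This is written in Python
for consistency with the backend rather than as the actual web-ui code, and it
assumes the first row of a cluster in queue order acts as the canonical one
(the commit does not say how the canonical row is chosen). `build_cards`,
`card_ids` and `badge` are hypothetical names.

```python
def build_cards(rows):
    """Collapse rows sharing a cluster_id into one card, keeping queue order."""
    cards = {}
    for row in rows:
        # Rows fetched without clustering have no cluster_id; treat as singletons.
        key = row.get("cluster_id") or row["id"]
        card = cards.get(key)
        if card is None:
            cards[key] = {"canonical": row, "variants": []}
        else:
            card["variants"].append(row)
    return list(cards.values())


def card_ids(card):
    """All halacha ids a decision on this card applies to (batch payload)."""
    return [card["canonical"]["id"]] + [v["id"] for v in card["variants"]]


def badge(card):
    """Text for the '+N variants' badge; empty for singleton cards."""
    n = len(card["variants"])
    return f"+{n} variants" if n else ""


queue = [
    {"id": "h1", "cluster_id": "h2", "cluster_size": 2},
    {"id": "h3", "cluster_id": "h3", "cluster_size": 1},
    {"id": "h2", "cluster_id": "h2", "cluster_size": 2},
]
cards = build_cards(queue)
# True halacha total, counting every variant rather than every card.
pending_total = sum(len(card_ids(c)) for c in cards)
```

With this input there are two cards but three pending halachot, which is why
the counts are taken from the rows and not from the number of cards.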

Verified:
- backend: list_halachot(cluster=True) on the live queue — algorithm correct
  (groups related same-precedent rules at 0.78; none at the production 0.90
  because dedup #82 already removed near-dups — the desired state).
- frontend: `tsc --noEmit` exits 0 (type-clean); no new lint errors (the one
  lint error is pre-existing in training/learning-panel.tsx from #94). Local
  Turbopack build can't run on the worktree node_modules symlink — CI builds in
  a clean checkout.

Invariants: G1 (gate/cluster at source in SQL, not post-hoc); G2 (same
list_halachot path); §6 (flagged items routed to a visible bucket, not dropped).

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
2026-06-06 21:01:30 +00:00
parent 161d0d6ed6
commit 12313774a1
5 changed files with 255 additions and 64 deletions

@@ -162,6 +162,13 @@ HALACHA_DEDUP_COSINE = float(os.environ.get("HALACHA_DEDUP_COSINE", "0.93"))
# dropping a possibly-distinct principle unreviewed. 0.83 from the same cleanup.
HALACHA_DEDUP_BAND_COSINE = float(os.environ.get("HALACHA_DEDUP_BAND_COSINE", "0.83"))
# Halacha review-queue clustering (#84.2) — when the review queue is requested
# with cluster=true, halachot of the SAME precedent whose rule-embeddings are
# within this cosine are grouped into ONE review card (canonical + variants), so
# the chair judges near-identical principles once instead of repeatedly. Display
# only — never merges/deletes. 0.90 = "same principle, reworded".
HALACHA_CLUSTER_COSINE = float(os.environ.get("HALACHA_CLUSTER_COSINE", "0.90"))
# Halacha NLI entailment validator (#81.3) — after extraction, a claude_session
# judge checks each halacha's rule_statement is entailed by its supporting_quote.
# Non-entailed (neutral/contradiction) → quality flag 'nli_unsupported' that

@@ -3794,6 +3794,7 @@ async def list_halachot(
    offset: int = 0,
    exclude_low_quality: bool = False,
    order_by_priority: bool = False,
    cluster: bool = False,
) -> list[dict]:
    """List halachot with optional triage controls (#84).
@@ -3804,6 +3805,9 @@ async def list_halachot(
    order_by_priority — replace FIFO with an active-learning order (#84.3):
        negatively-treated first, then most-uncertain (lowest confidence), then
        oldest — so the chair sees the highest-value decisions first.
    cluster — annotate each row with ``cluster_id`` + ``cluster_size`` (#84.2):
        same-precedent halachot within HALACHA_CLUSTER_COSINE form one group so
        the UI can collapse near-identical principles into a single review card.
    """
    pool = await get_pool()
    conditions = []
@@ -3868,9 +3872,47 @@ async def list_halachot(
        if d.get("decision_date") is not None:
            d["decision_date"] = d["decision_date"].isoformat()
        out.append(d)
    if cluster and out:
        await _annotate_clusters(pool, out)
    return out


async def _annotate_clusters(pool, out: list[dict]) -> None:
    """Add cluster_id + cluster_size to each row (#84.2), display-only.

    Same-precedent halachot within HALACHA_CLUSTER_COSINE are unioned into one
    group. Singletons get their own id as cluster_id and size 1. Pairwise is
    confined to the returned set (cheap; the queue is ~hundreds of rows)."""
    ids = [d["id"] for d in out]
    max_dist = 1.0 - config.HALACHA_CLUSTER_COSINE
    pairs = await pool.fetch(
        "SELECT a.id AS a, b.id AS b FROM halachot a JOIN halachot b "
        "ON a.case_law_id = b.case_law_id AND a.id < b.id "
        "AND a.embedding IS NOT NULL AND b.embedding IS NOT NULL "
        "AND (a.embedding <=> b.embedding) <= $2 "
        "WHERE a.id = ANY($1::uuid[]) AND b.id = ANY($1::uuid[])",
        ids, max_dist,
    )
    parent = {str(i): str(i) for i in ids}

    def find(x: str) -> str:
        while parent[x] != x:
            parent[x] = parent[parent[x]]
            x = parent[x]
        return x

    for p in pairs:
        ra, rb = find(str(p["a"])), find(str(p["b"]))
        if ra != rb:
            parent[ra] = rb
    from collections import Counter
    sizes = Counter(find(str(i)) for i in ids)
    for d in out:
        root = find(str(d["id"]))
        d["cluster_id"] = root
        d["cluster_size"] = sizes[root]
async def update_halacha(
halacha_id: UUID,
review_status: str | None = None,