feat(halacha-triage UI): wire gating + near-duplicate cluster cards (#84.2)

Completes #84: surfaces the backend gating/prioritization (#84.1/#84.3, PR #93) in the chair's review UI and adds near-duplicate clustering (#84.2).

Backend
- db.list_halachot gains a `cluster` option (#84.2): it annotates each row with cluster_id + cluster_size by unioning same-precedent halachot whose rule embeddings fall within HALACHA_CLUSTER_COSINE (0.90, new config). Display-only: it never merges or deletes. Pairwise comparison is confined to the returned set, so it stays cheap.
- GET /api/halachot exposes the `cluster` query param (default off).

Frontend (web-ui)
- The Halacha type gains optional cluster_id / cluster_size. It lives in a hand-written module, so no api:types regen is needed (halachot aren't typed off the generated schema).
- useHalachotPending(opts): the default "clean" queue now fetches with exclude_low_quality + order_by_priority + cluster; needsFix: true returns the flagged "needs extraction fix" bucket (filtered client-side).
- HalachaReviewPanel: adds a "תור נקי / דורש תיקון-חילוץ" ("clean queue / needs extraction fix") toggle (#84.1). Near-duplicate clusters collapse into ONE card showing "+N וריאנטים" ("+N variants") with an expandable list, and approve/reject/defer on a clustered card applies to all variants via the batch endpoint (#84.2 + #84.4). Counts show true halacha totals (pendingTotal). New flag labels added (application / near_duplicate / nevo_preamble_leak).

Verified:
- backend: ran list_halachot(cluster=True) on the live queue, and the algorithm is correct. At a 0.78 threshold it groups related same-precedent rules. At the production 0.90 it groups none, because dedup (#82) already removed the near-duplicates, which is the desired state.
- frontend: `tsc --noEmit` exits 0 (type-clean), and there are no new lint errors (the one lint error is pre-existing in training/learning-panel.tsx, from #94). The local Turbopack build can't run against the worktree's node_modules symlink; CI builds in a clean checkout.

Invariants: G1 (gate/cluster at source in SQL, not post-hoc); G2 (same list_halachot path); §6 (flagged items routed to a visible bucket, not dropped).

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
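The card collapse in HalachaReviewPanel is TypeScript in web-ui and is not part of this diff. As an illustration only, here is a minimal Python sketch of the grouping idea. It assumes rows arrive already annotated with cluster_id / cluster_size and in queue order. ReviewCard, collapse_into_cards, badge and member_ids are hypothetical names. Treating the first row of a cluster as the canonical entry is also an assumption and is not stated in this commit.

```python
from dataclasses import dataclass, field


@dataclass
class ReviewCard:
    """Hypothetical review card: a canonical halacha plus its near-duplicate variants."""

    canonical: dict
    variants: list[dict] = field(default_factory=list)

    @property
    def badge(self) -> str:
        # "+N variants" label; empty for a singleton card.
        return f"+{len(self.variants)} variants" if self.variants else ""

    def member_ids(self) -> list:
        # Every id a card-level approve/reject/defer would need to cover.
        return [self.canonical["id"]] + [v["id"] for v in self.variants]


def collapse_into_cards(rows: list[dict]) -> list[ReviewCard]:
    """Group annotated rows by cluster_id, keeping first-appearance order.

    Assumption: the first row seen for a cluster becomes the canonical entry,
    so with a priority-ordered queue the card sits where its top row would.
    """
    cards: dict[str, ReviewCard] = {}
    for row in rows:
        # Rows fetched without cluster=true carry no cluster_id: use their own id.
        key = row.get("cluster_id") or row["id"]
        if key in cards:
            cards[key].variants.append(row)
        else:
            cards[key] = ReviewCard(canonical=row)
    return list(cards.values())
```

Because a dict keeps insertion order, the cards stay in the queue's order. member_ids() is the set of halachot that a card-level decision would apply to.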
@@ -162,6 +162,13 @@ HALACHA_DEDUP_COSINE = float(os.environ.get("HALACHA_DEDUP_COSINE", "0.93"))
 # dropping a possibly-distinct principle unreviewed. 0.83 from the same cleanup.
 HALACHA_DEDUP_BAND_COSINE = float(os.environ.get("HALACHA_DEDUP_BAND_COSINE", "0.83"))
 
+# Halacha review-queue clustering (#84.2) — when the review queue is requested
+# with cluster=true, halachot of the SAME precedent whose rule-embeddings are
+# within this cosine are grouped into ONE review card (canonical + variants), so
+# the chair judges near-identical principles once instead of repeatedly. Display
+# only — never merges/deletes. 0.90 = "same principle, reworded".
+HALACHA_CLUSTER_COSINE = float(os.environ.get("HALACHA_CLUSTER_COSINE", "0.90"))
+
 # Halacha NLI entailment validator (#81.3) — after extraction, a claude_session
 # judge checks each halacha's rule_statement is entailed by its supporting_quote.
 # Non-entailed (neutral/contradiction) → quality flag 'nli_unsupported' that
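HALACHA_CLUSTER_COSINE is a similarity, but pgvector's `<=>` operator returns cosine distance (1 - cosine similarity). That is why the clustering query filters on `1.0 - config.HALACHA_CLUSTER_COSINE`. The following pure-Python sketch shows that equivalence. cosine_similarity, cosine_distance and same_cluster are illustrative helpers, not part of the codebase, and non-zero vectors are assumed.

```python
import math
import os


def cosine_similarity(u: list[float], v: list[float]) -> float:
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    if norm == 0.0:
        raise ValueError("cosine similarity is undefined for a zero vector")
    return dot / norm


def cosine_distance(u: list[float], v: list[float]) -> float:
    # The quantity pgvector's `<=>` returns: 1 - cosine similarity.
    return 1.0 - cosine_similarity(u, v)


# Mirrors the config line: env override with a 0.90 default.
HALACHA_CLUSTER_COSINE = float(os.environ.get("HALACHA_CLUSTER_COSINE", "0.90"))


def same_cluster(u: list[float], v: list[float],
                 threshold: float = HALACHA_CLUSTER_COSINE) -> bool:
    # "similarity >= threshold" is the same test as "distance <= 1 - threshold",
    # the form used by the SQL filter `(a.embedding <=> b.embedding) <= $2`.
    return cosine_distance(u, v) <= 1.0 - threshold
```

At the 0.90 default, two embeddings join a cluster only when their cosine distance is at most about 0.10, which corresponds to an angle of roughly 25 degrees.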
@@ -3794,6 +3794,7 @@ async def list_halachot(
     offset: int = 0,
     exclude_low_quality: bool = False,
     order_by_priority: bool = False,
+    cluster: bool = False,
 ) -> list[dict]:
     """List halachot with optional triage controls (#84).
 
@@ -3804,6 +3805,9 @@ async def list_halachot(
     order_by_priority — replace FIFO with an active-learning order (#84.3):
         negatively-treated first, then most-uncertain (lowest confidence), then
         oldest — so the chair sees the highest-value decisions first.
+    cluster — annotate each row with ``cluster_id`` + ``cluster_size`` (#84.2):
+        same-precedent halachot within HALACHA_CLUSTER_COSINE form one group so
+        the UI can collapse near-identical principles into a single review card.
     """
     pool = await get_pool()
     conditions = []
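The order_by_priority docstring describes a three-level sort, but its SQL is outside this diff. As a hedged illustration only, the same ordering can be written as a Python sort key. The row shape (QueueRow and its fields negatively_treated, confidence and created_at) is an assumption, as is the requirement that every row has a non-null confidence.

```python
from datetime import datetime
from typing import TypedDict


class QueueRow(TypedDict):
    # Hypothetical shape of a pending-halacha row; the real columns may differ.
    negatively_treated: bool
    confidence: float
    created_at: datetime


def priority_key(row: QueueRow) -> tuple:
    """Sort key: negatively-treated first, then lowest confidence, then oldest."""
    return (
        not row["negatively_treated"],  # False < True, so flagged rows come first
        row["confidence"],              # ascending: most uncertain first
        row["created_at"],              # ascending: oldest first
    )


def order_by_priority(rows: list[QueueRow]) -> list[QueueRow]:
    # sorted() is stable, so rows tied on all three keys keep their input order.
    return sorted(rows, key=priority_key)
```

A tuple key compares lexicographically, so each criterion only breaks ties left by the one before it.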
@@ -3868,9 +3872,47 @@ async def list_halachot(
         if d.get("decision_date") is not None:
             d["decision_date"] = d["decision_date"].isoformat()
         out.append(d)
+    if cluster and out:
+        await _annotate_clusters(pool, out)
     return out
 
 
+async def _annotate_clusters(pool, out: list[dict]) -> None:
+    """Add cluster_id + cluster_size to each row (#84.2), display-only.
+
+    Same-precedent halachot within HALACHA_CLUSTER_COSINE are unioned into one
+    group. Singletons get their own id as cluster_id and size 1. Pairwise is
+    confined to the returned set (cheap; the queue is ~hundreds of rows)."""
+    ids = [d["id"] for d in out]
+    max_dist = 1.0 - config.HALACHA_CLUSTER_COSINE
+    pairs = await pool.fetch(
+        "SELECT a.id AS a, b.id AS b FROM halachot a JOIN halachot b "
+        "ON a.case_law_id = b.case_law_id AND a.id < b.id "
+        "AND a.embedding IS NOT NULL AND b.embedding IS NOT NULL "
+        "AND (a.embedding <=> b.embedding) <= $2 "
+        "WHERE a.id = ANY($1::uuid[]) AND b.id = ANY($1::uuid[])",
+        ids, max_dist,
+    )
+    parent = {str(i): str(i) for i in ids}
+
+    def find(x: str) -> str:
+        while parent[x] != x:
+            parent[x] = parent[parent[x]]
+            x = parent[x]
+        return x
+
+    for p in pairs:
+        ra, rb = find(str(p["a"])), find(str(p["b"]))
+        if ra != rb:
+            parent[ra] = rb
+    from collections import Counter
+    sizes = Counter(find(str(i)) for i in ids)
+    for d in out:
+        root = find(str(d["id"]))
+        d["cluster_id"] = root
+        d["cluster_size"] = sizes[root]
+
+
 async def update_halacha(
     halacha_id: UUID,
     review_status: str | None = None,
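_annotate_clusters needs a live pool for its self-join. To unit-test the grouping without a database, the same union-find with path halving (as in find above) can be driven by an in-process pairwise loop. annotate_clusters_in_memory and _cosine_distance are hypothetical helpers. They assume each row carries id, case_law_id and an embedding list (or None), which stand in for the columns the SQL reads.

```python
import math
from collections import Counter
from itertools import combinations


def _cosine_distance(u: list[float], v: list[float]) -> float:
    # 1 - cosine similarity, the quantity pgvector's `<=>` returns.
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    if norm == 0.0:
        raise ValueError("cosine distance is undefined for a zero vector")
    return 1.0 - dot / norm


def annotate_clusters_in_memory(out: list[dict], cluster_cosine: float = 0.90) -> None:
    """Add cluster_id + cluster_size to each row in place (display-only).

    Two rows are unioned when they share case_law_id, both have an embedding,
    and their cosine distance is <= 1 - cluster_cosine (the SQL join's filter).
    """
    max_dist = 1.0 - cluster_cosine
    parent = {str(d["id"]): str(d["id"]) for d in out}

    def find(x: str) -> str:
        # Path halving: point each visited node at its grandparent.
        while parent[x] != x:
            parent[x] = parent[parent[x]]
            x = parent[x]
        return x

    # In-process stand-in for the SQL self-join over the returned ids;
    # combinations() visits each unordered pair once, like `a.id < b.id`.
    for a, b in combinations(out, 2):
        if a["case_law_id"] != b["case_law_id"]:
            continue
        if a.get("embedding") is None or b.get("embedding") is None:
            continue
        if _cosine_distance(a["embedding"], b["embedding"]) <= max_dist:
            ra, rb = find(str(a["id"])), find(str(b["id"]))
            if ra != rb:
                parent[ra] = rb

    sizes = Counter(find(str(d["id"])) for d in out)
    for d in out:
        root = find(str(d["id"]))
        d["cluster_id"] = root
        d["cluster_size"] = sizes[root]
```

Union-find clustering is single-linkage. A chain of pairwise-close halachot (A near B, B near C) therefore lands in one cluster even when A and C are further apart than the threshold.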