feat(halacha): application gate + lexical dedup tail + quality harnesses (#81,#82)

Halacha-extraction quality (#81) and dedup-on-insert (#82): engine changes (pure and tested) plus measurement/ops tooling.

halacha_quality.py
- #81.4 application gate: is_fact_dependent() (high-precision "applied to THIS case" deixis, per the strict rubric §3/§27) + FLAG_APPLICATION. compute_quality_flags now takes rule_type and flags an item when rule_type == 'application' OR it is fact-dependent, which blocks auto-approve (an illustration is not a generalizable holding).
- #82.3 lexical tail signal: jaccard_shingles / normalized_levenshtein / lexical_near_duplicate + FLAG_NEAR_DUPLICATE, for the 0.83–0.93 cosine band.

halacha_extractor.py: pass rule_type to the flag computation; re-type a binding-labeled fact-application to 'application' (mirrors non_decision → obiter).

db.py (store_halachot_for_chunk): dedup now fetches the nearest same-precedent neighbor once:
- cosine ≥ HALACHA_DEDUP_COSINE → skip (unchanged);
- cosine in [HALACHA_DEDUP_BAND_COSINE, HALACHA_DEDUP_COSINE) with high lexical overlap → FLAG_NEAR_DUPLICATE (review, not skip: never drop a possibly distinct principle unreviewed).

config.py: HALACHA_DEDUP_BAND_COSINE (0.83).

Scripts:
- scripts/halacha_goldset.py (#81.7): export a stratified sample for human tagging; score validators (P/R/F1) against the tags. Backbone for #81.8.
- scripts/halacha_batch_reconcile.py (#82.7): conservative cross-precedent dedup (cosine ≥ 0.95), dry-run report only.
- scripts/calibrate_halacha_dedup.py (#82.1): calibrate the lexical thresholds against the 2026-06-03 cleanup gold-set.

Deferred (documented):
- #82.4 merge-provenance and #82.5 DB ON CONFLICT/UNIQUE on the normalized quote are NOT included. The current skip+flag behavior is safe, whereas a UNIQUE on normalized_quote would fail on existing dups and a blind merge risks losing provenance; both need their own chair-reviewed migration.
- #82.6 over-merge guard is moot until merge lands.
- #81.6 full rhetorical-role classifier deferred (the section pre-filter plus the application flag cover the practical case).
- #81.8 blocked on the human-tagged gold-set (harness now provided).

Verified:
- pytest tests/test_halacha_quality.py: 52 passed (14 new).
- calibrate: configured (0.55, 0.70) → precision 1.0 (zero false merges), recall 0.30, the correct profile for a signal that blocks auto-approve.
- goldset export: 15-row sample CSV.
- batch reconcile: 819 halachot → 5 cross-precedent candidate pairs.

Invariants:
- G1 (normalize at source: flag at insert, not at read);
- §6 (no silent swallow: suspect items are flagged to review, never dropped);
- G2 (no parallel path: same store_halachot_for_chunk / compute_quality_flags).

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
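The #81.4 gate reduces to a disjunction: an item is flagged if the extractor typed it 'application' or if its text points at the facts of the case being decided. The following is a minimal sketch under loud assumptions: the English cue phrases are placeholders (the real is_fact_dependent() applies the §3/§27 rubric to the source-language text), the 'binding' rule_type label and the 'application' flag string are guesses, and compute_quality_flags_sketch, retype_sketch, and auto_approve are hypothetical stand-ins, not the module's actual signatures.

```python
import re

# Hypothetical flag value; the real FLAG_APPLICATION constant is not shown here.
FLAG_APPLICATION = "application"

# Placeholder English deixis cues that tie a statement to THIS case. The real
# rubric (§3/§27) targets the source-language text and is stricter.
_DEIXIS = re.compile(
    r"\b(?:in the (?:present|instant) case|in this case|the case before us)\b",
    re.IGNORECASE,
)


def is_fact_dependent(rule_statement: str) -> bool:
    """High precision: fire only on explicit deixis, not on topic words."""
    return _DEIXIS.search(rule_statement) is not None


def compute_quality_flags_sketch(rule_statement: str, rule_type: str) -> list[str]:
    """Flag applications so they cannot auto-approve (other flags omitted)."""
    flags: list[str] = []
    if rule_type == "application" or is_fact_dependent(rule_statement):
        flags.append(FLAG_APPLICATION)
    return flags


def retype_sketch(rule_type: str, rule_statement: str) -> str:
    """Re-label a 'binding' item that is really a fact-application."""
    if rule_type == "binding" and is_fact_dependent(rule_statement):
        return "application"
    return rule_type


def auto_approve(confidence: float, flags: list[str], threshold: float) -> bool:
    # Same rule as store_halachot_for_chunk: any flag forces pending_review.
    return confidence >= threshold and not flags
```

High precision matters here because a false positive does more than queue a review: through the re-type step it would relabel a genuine binding holding as an application.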
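The P/R/F1 scoring in scripts/halacha_goldset.py and the calibration result reported under "Verified" both reduce to confusion-matrix counts over tagged items. A minimal sketch, assuming each item carries one boolean prediction and one boolean human tag (the script's actual CSV schema is not shown); prf1 is a hypothetical helper, and the sample data is illustrative, not the commit's gold-set.

```python
def prf1(predicted: list[bool], gold: list[bool]) -> tuple[float, float, float]:
    """Precision, recall, and F1 of boolean predictions against human tags.

    Undefined ratios (no predicted or no gold positives) are reported as 0.0.
    """
    if len(predicted) != len(gold):
        raise ValueError("predicted and gold must have the same length")
    tp = sum(1 for p, g in zip(predicted, gold) if p and g)
    fp = sum(1 for p, g in zip(predicted, gold) if p and not g)
    fn = sum(1 for p, g in zip(predicted, gold) if g and not p)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
    return precision, recall, f1


# Illustrative data only: 10 true duplicate pairs, of which the flag catches 3,
# plus 5 distinct pairs that it correctly leaves unflagged.
gold = [True] * 10 + [False] * 5
predicted = [True] * 3 + [False] * 7 + [False] * 5
p, r, f1 = prf1(predicted, gold)
```

With these illustrative numbers the function returns precision 1.0 and recall 0.3, the same shape as the calibration profile: every flagged pair is a true duplicate, while most duplicates go unflagged by this signal.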
2026-06-06 19:55:45 +00:00
parent 366d89e6bb
commit 1286a1e60d
9 changed files with 574 additions and 10 deletions
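The db.py hunk that follows replaces the single thresholded lookup with one nearest-neighbor fetch that serves two tiers. Below is a self-contained approximation of that decision in plain Python. It assumes HALACHA_DEDUP_COSINE is 0.93 (inferred from the 0.83–0.93 band in the message), character 3-shingles, a lowercase/whitespace normalization, and that the configured (0.55, 0.70) pair means a Jaccard minimum and a Levenshtein-similarity minimum combined with AND. None of these bodies are the real halacha_quality functions, and dedup_action and _normalize are hypothetical names.

```python
# Hypothetical sketch of the two-tier dedup decision; not the real
# halacha_quality / db.py code. Thresholds and rules are assumptions.

DEDUP_COSINE = 0.93  # assumed HALACHA_DEDUP_COSINE (top of the 0.83–0.93 band)
BAND_COSINE = 0.83   # HALACHA_DEDUP_BAND_COSINE from config.py


def _normalize(text: str) -> str:
    # Lowercase and collapse whitespace; the real normalization may differ.
    return " ".join(text.lower().split())


def jaccard_shingles(a: str, b: str, k: int = 3) -> float:
    """Jaccard similarity of character k-shingle sets (k=3 is assumed)."""
    def shingles(s: str) -> set[str]:
        s = _normalize(s)
        if len(s) < k:
            return {s} if s else set()
        return {s[i:i + k] for i in range(len(s) - k + 1)}

    sa, sb = shingles(a), shingles(b)
    if not sa and not sb:
        return 1.0
    return len(sa & sb) / len(sa | sb)


def normalized_levenshtein(a: str, b: str) -> float:
    """Similarity 1 - edit_distance / max(len): 1.0 means identical."""
    a, b = _normalize(a), _normalize(b)
    if not a and not b:
        return 1.0
    prev = list(range(len(b) + 1))  # distances from "" to each prefix of b
    for i, ca in enumerate(a, start=1):
        cur = [i]
        for j, cb in enumerate(b, start=1):
            cur.append(min(prev[j] + 1,                # delete ca
                           cur[j - 1] + 1,             # insert cb
                           prev[j - 1] + (ca != cb)))  # substitute
        prev = cur
    return 1.0 - prev[-1] / max(len(a), len(b))


def lexical_near_duplicate(a: str, b: str,
                           jaccard_min: float = 0.55,
                           lev_min: float = 0.70) -> bool:
    # Assumed AND rule: both lexical scores must clear their thresholds.
    return (jaccard_shingles(a, b) >= jaccard_min
            and normalized_levenshtein(a, b) >= lev_min)


def dedup_action(dist: float, new_rule: str, neighbor_rule: str) -> str:
    """'skip', 'flag' or 'keep' for a candidate vs its nearest neighbor.

    dist is pgvector's cosine distance (<=>), i.e. 1 - cosine similarity.
    """
    if dist <= 1.0 - DEDUP_COSINE:
        return "skip"   # auto-skip, as before
    if dist <= 1.0 - BAND_COSINE and lexical_near_duplicate(new_rule, neighbor_rule):
        return "flag"   # insert with FLAG_NEAR_DUPLICATE -> pending_review
    return "keep"       # distinct enough; normal auto-approve rules apply
```

Fetching the neighbor with ORDER BY ... LIMIT 1 returns both the distance and the neighbor's rule_statement in one query, which is what makes the lexical check possible without a second round trip; the thresholded SELECT 1 it replaces could only answer yes or no.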
--- a/mcp-server/src/legal_mcp/services/db.py
+++ b/mcp-server/src/legal_mcp/services/db.py
@@ -3699,6 +3699,7 @@ async def store_halachot_for_chunk(
    """
    threshold = config.HALACHA_AUTO_APPROVE_THRESHOLD
    dedup_distance = 1.0 - config.HALACHA_DEDUP_COSINE  # cosine sim → distance
+    band_distance = 1.0 - config.HALACHA_DEDUP_BAND_COSINE  # tail-band ceiling (#82.3)
    pool = await get_pool()
    inserted = 0
    skipped = 0
@@ -3722,21 +3723,32 @@ async def store_halachot_for_chunk(
                if norm_quote and norm_quote in existing_quotes:
                    skipped += 1
                    continue
-                # 2) semantic near-duplicate (rule embedding cosine)
+                # 2) semantic near-duplicate (rule embedding cosine) — fetch the
+                #    nearest same-precedent neighbor once so we can both auto-skip
+                #    (cosine ≥ DEDUP) and flag the lexical tail (#82.3).
                emb = h.get("embedding")
+                flags = list(h.get("quality_flags") or [])
                if emb is not None and config.HALACHA_DEDUP_COSINE <= 1.0:
-                    dup = await conn.fetchval(
-                        "SELECT 1 FROM halachot WHERE case_law_id = $1 "
-                        "AND embedding IS NOT NULL AND (embedding <=> $2) <= $3 "
-                        "LIMIT 1",
-                        case_law_id, emb, dedup_distance,
+                    neighbor = await conn.fetchrow(
+                        "SELECT rule_statement, (embedding <=> $2) AS dist "
+                        "FROM halachot WHERE case_law_id = $1 "
+                        "AND embedding IS NOT NULL "
+                        "ORDER BY embedding <=> $2 LIMIT 1",
+                        case_law_id, emb,
                    )
-                    if dup:
-                        skipped += 1
-                        continue
+                    if neighbor is not None:
+                        dist = float(neighbor["dist"])
+                        if dist <= dedup_distance:
+                            skipped += 1
+                            continue
+                        # tail band: below auto-skip but lexically near → flag.
+                        if (dist <= band_distance
+                                and halacha_quality.FLAG_NEAR_DUPLICATE not in flags
+                                and halacha_quality.lexical_near_duplicate(
+                                    h["rule_statement"], neighbor["rule_statement"])):
+                            flags.append(halacha_quality.FLAG_NEAR_DUPLICATE)

                confidence = float(h.get("confidence", 0.0))
-                flags = h.get("quality_flags") or []
                auto_approve = confidence >= threshold and not flags
                review_status = "approved" if auto_approve else "pending_review"
                reviewer = (