feat(corpus): corpus redesign — eliminate halacha queue, verified-by-citation layer, rank-at-retrieval (#153)
Implements chaim's 2026-06-20 directive (5 steps; step 6 deferred):

1. No review queue: HALACHA_NO_REVIEW_QUEUE=true (auto-approve all → background); migration cleared 2,416 pending_review → approved.
2. Verified layer: halachot.verified/cite_count from chair citations (db.refresh_verified_layer + scripts/build_verified_layer.py runs citator on ALL committee decisions). 2,775 verified / 137 precedents.
3. Retrieval ranks verified ≫ background: HALACHA_VERIFIED_BOOST in both semantic + lexical halacha queries; filter now includes background (<> rejected).
4. Ingest contract: going-forward already queues metadata; backfill_practice_area.py + 206 re-queued to the metadata drain.
5. Disabled destructive panel cap/novelty: HALACHA_PANEL_REGIME_ENABLED=false (8508/1049/1200 proved it lost 22-30 genuine principles incl. Lustrenik).

Source of truth: docs/precedent-corpus-redesign/00-final-synthesis.md.
Quality flags are 97% false-positive (nli-audit) → no longer gate.
UI queue removal → Claude Design gate.
429 tests green (no regressions).

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
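Reviewer sketch for step 3, under stated assumptions: the commit says only that HALACHA_VERIFIED_BOOST is applied in the semantic and lexical halacha queries and that the filter excludes rejected rows. It does not show the value of the setting or whether the boost is multiplicative or additive. The `Candidate` type, the `rank` helper, and the value 1.5 below are hypothetical and only illustrate the idea of ranking verified principles above background ones.

```python
from dataclasses import dataclass

# Hypothetical value; the real HALACHA_VERIFIED_BOOST is not shown in this commit.
VERIFIED_BOOST = 1.5


@dataclass(frozen=True)
class Candidate:
    principle_id: int
    score: float    # base semantic or lexical relevance
    verified: bool  # True if the source precedent was cited by a chair
    status: str     # only "rejected" is excluded, mirroring the `<> rejected` filter


def rank(candidates: list[Candidate], boost: float = VERIFIED_BOOST) -> list[Candidate]:
    """Drop rejected rows, then sort by boosted score (verified wins ties)."""
    kept = [c for c in candidates if c.status != "rejected"]
    return sorted(
        kept,
        key=lambda c: (c.score * (boost if c.verified else 1.0), c.verified),
        reverse=True,
    )
```

With a multiplicative boost, a verified principle with a moderately lower base score can still outrank an unverified one, while a rejected row never appears regardless of its score.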
53
scripts/build_verified_layer.py
Normal file
@@ -0,0 +1,53 @@
#!/usr/bin/env python3
"""Build the verified principle layer from chair citations (#153, corpus redesign).

"Trusted = citation, not review" (chaim 2026-06-20). A principle is `verified` iff
its SOURCE precedent was actually cited by a chair (any committee decision); never
from human review. This:
  1. Runs the citator (`extract_internal_citations`) over ALL committee decisions,
     not just דפנה's, so other chairs' citations populate the graph too (tier-2).
  2. Recomputes halachot.verified / cite_count from precedent_internal_citations.

Idempotent. Run after ingesting new chair decisions (or wire into the ingest path)
so the verified layer grows automatically. EMBEDDING/REGEX-only for the citator,
no LLM.

    cd ~/legal-ai/mcp-server
    HOME=/home/chaim .venv/bin/python ../scripts/build_verified_layer.py               # full
    HOME=/home/chaim .venv/bin/python ../scripts/build_verified_layer.py --no-citator  # refresh only
"""
from __future__ import annotations

import argparse
import asyncio
import os
import sys

sys.path.insert(0, os.path.join(os.path.dirname(__file__), "..", "mcp-server", "src"))

from legal_mcp.services import citation_extractor, db  # noqa: E402


async def _run(run_citator: bool) -> int:
    if run_citator:
        print("→ extracting citations from ALL committee decisions (citator)…", flush=True)
        res = await citation_extractor.extract_all_internal_committee()
        print(f" citator: {res}", flush=True)
    print("→ refreshing verified/cite_count from chair citations…", flush=True)
    stats = await db.refresh_verified_layer()
    print(f"\n── verified layer ──")
    print(f" verified principles: {stats['verified_principles']}")
    print(f" verified precedents: {stats['verified_precedents']}")
    return 0


def main() -> int:
    p = argparse.ArgumentParser(description="Build verified principle layer (#153)")
    p.add_argument("--no-citator", action="store_true",
                   help="skip citation extraction; only recompute verified/cite_count")
    a = p.parse_args()
    return asyncio.run(_run(run_citator=not a.no_citator))


if __name__ == "__main__":
    raise SystemExit(main())
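
Reviewer sketch, under stated assumptions: the rule in the docstring above (a principle is verified iff its source precedent was cited by a chair) can be illustrated in memory as follows. This is not the body of `db.refresh_verified_layer`, which this diff does not show. The function name `recompute_verified`, the tuple shapes, and the assumption that `cite_count` counts distinct citing decisions are all hypothetical; only the two stats keys are taken from the script.

```python
from collections import defaultdict


def recompute_verified(principles, citations):
    """Recompute verified / cite_count purely from citations.

    principles: iterable of (principle_id, source_precedent_id)
    citations:  iterable of (citing_decision_id, cited_precedent_id)
    Returns (rows, stats); rows maps principle_id to its recomputed fields.
    """
    principles = list(principles)

    # Collect the distinct decisions citing each precedent (assumed meaning of cite_count).
    citing = defaultdict(set)
    for decision_id, precedent_id in citations:
        citing[precedent_id].add(decision_id)

    # Every principle is recomputed from scratch, so a rerun yields the same result.
    rows = {}
    for principle_id, precedent_id in principles:
        count = len(citing.get(precedent_id, ()))
        rows[principle_id] = {"verified": count > 0, "cite_count": count}

    verified_precedents = {prec for _, prec in principles if citing.get(prec)}
    stats = {
        "verified_principles": sum(r["verified"] for r in rows.values()),
        "verified_precedents": len(verified_precedents),
    }
    return rows, stats
```

Because nothing is carried over from a previous run, calling the function twice on the same inputs returns identical rows, which is the property the docstring calls idempotent.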