feat: Stage C — RAG advanced (#33, #47, #48, #49, #50, #51)

Six independent sub-tasks dispatched in parallel; aggregated here.

## #33: Hide case_name column

library-list-panel.tsx: `<TableHead>` + `<TableCell>` for "שם" ("Name") get `className="hidden"` in both Court and Committee row variants. DB column preserved for future use.

## #47: Periodic audit script

New scripts/audit_corpus_integrity.py: 3 SQL checks (external + "ערר" ("appeal") prefix, internal missing chair/district, cases.practice_area enum) + CEO wakeup on violations + cron `0 7 * * *`. First run: 0 issues.

## #48: Parent-doc retrieval (gated, default off)

Schema V17: precedent_chunks.parent_chunk_id + chunk_role ('child'|'parent').
New chunker.chunk_document_hierarchical(): section-aware parents (~1500 tokens) containing ~5 overlapping children (~300 tokens each).
New db.store_precedent_chunks_hierarchical two-pass writer.
Search SQL (semantic + lexical) LEFT-JOINs the parent, swaps in its content, and dedupes by parent_chunk_id when the flag is on.
Toggle: PARENT_DOC_RETRIEVAL_ENABLED + PARENT_DOC_{CHILD,PARENT}_SIZE_TOKENS.
Backfill (~3 min, ~$0.20) deferred to follow-up.

## #49: Multimodal backfill

New scripts/backfill_multimodal_precedents.py with token-matching case_number ↔ source files (PDF + DOCX via PyMuPDF).
Ran in container: 26 precedents embedded, 503 pages, $0.21, 0 errors. precedent_image_embeddings grew 3 → 29 rows.
44 remaining are style_corpus-migrated rows (no source file on disk); they will catch up when re-uploaded.

## #50: Closed-loop feedback + nDCG

Schema V18: search_logs + search_relevance_feedback.
New telemetry.py with fire-and-forget log_search_bg (p50 = 0.002ms, negligible overhead) + auto-infer_relevance_from_citations (reads case drafts → marks score=3 when a cited precedent appears in a past search top-K).
Hooks added to 5 search paths. scripts/compute_ndcg.py for aggregation.
Two admin API endpoints (GET /api/admin/rag-metrics + POST .../infer).
Dashboard UI deferred; API is enough for now.

## #51: Halacha quality monitoring

New scripts/monitor_halacha_quality.py: baseline avg confidence (trusted=0.849, all=0.833, pending=0.694) with rolling-window drift detection. Default 5% threshold. Exits non-zero on alert for cron integration. Recommended: `0 8 * * 1` (weekly, Mon 8am).

## Bonus: 230 unlinked citations → missing_precedents

Bulk-imported 230 distinct unlinked citations from precedent_internal_citations to missing_precedents with status='open', party='committee', and notes listing source citers. Top candidate: ע"א 3213/97 (cited 5x). Total open missing_precedents now 237.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
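The drift check described under #51 could be sketched roughly as follows. This is not the code in scripts/monitor_halacha_quality.py; it is a minimal illustration that assumes the 5% threshold is a relative, one-sided drop against the baseline and that the rolling window is simply the most recent N scores. The names `DriftReport`, `detect_drift`, `exit_code`, and the default `window=50` are hypothetical.

```python
from statistics import mean
from typing import NamedTuple, Sequence


class DriftReport(NamedTuple):
    rolling_avg: float
    relative_drop: float  # positive when confidence fell below the baseline
    alert: bool


def detect_drift(
    scores: Sequence[float],
    baseline: float,
    window: int = 50,        # hypothetical default, not taken from the commit
    threshold: float = 0.05,  # the 5% default named in the commit message
) -> DriftReport:
    """Compare the mean of the last `window` scores against `baseline`.

    Assumption: drift means a relative drop larger than `threshold`;
    the real script may define it differently (absolute or two-sided).
    """
    if not scores:
        raise ValueError("no confidence scores supplied")
    if baseline <= 0 or window < 1:
        raise ValueError("baseline must be > 0 and window >= 1")

    recent = scores[-window:]
    rolling_avg = mean(recent)
    relative_drop = (baseline - rolling_avg) / baseline
    return DriftReport(rolling_avg, relative_drop, relative_drop > threshold)


def exit_code(report: DriftReport) -> int:
    """Map a report to a process exit status so cron can detect alerts."""
    return 1 if report.alert else 0
```

With the trusted baseline of 0.849, a rolling average of 0.80 is a relative drop of about 5.8% and would trigger an alert under these assumptions, while 0.84 (about 1.1%) would not. A cron wrapper would pass `exit_code(report)` to `sys.exit`.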
2026-05-26 11:26:52 +00:00
parent 3a05e30c8d
commit 2aee398b4a
15 changed files with 2493 additions and 57 deletions
--- a/web/app.py
+++ b/web/app.py
@@ -5250,3 +5250,46 @@ async def missing_precedent_upload(
        "case_law_id": case_law_id,
        "route": "internal_committee" if is_committee else "external_upload",
    }
+
+
+# ── RAG telemetry / nDCG dashboard ────────────────────────────────────
+# Backs the /admin/rag-metrics page. The heavy aggregation lives in
+# ``scripts/compute_ndcg.py`` — we re-use its functions here so the API
+# response stays in lock-step with the CLI tool.
+
+
+@app.get("/api/admin/rag-metrics")
+async def api_rag_metrics(weeks: int = 12, k: int = 10):
+    """Return nDCG@k aggregates for the RAG retrieval feedback loop.
+
+    Args:
+        weeks: window for "recent" metrics (default 12).
+        k: nDCG cutoff (default 10).
+    """
+    # Late import — keeps the path-extension to scripts/ local to this route.
+    scripts_dir = Path(__file__).resolve().parent.parent / "scripts"
+    if str(scripts_dir) not in sys.path:
+        sys.path.insert(0, str(scripts_dir))
+    import compute_ndcg  # type: ignore
+
+    try:
+        metrics = await compute_ndcg.compute(weeks=weeks, k=k)
+    except Exception as e:
+        logger.exception("rag-metrics compute failed")
+        raise HTTPException(500, f"חישוב מטריקות נכשל: {e}") from e  # "Metrics computation failed"
+    return metrics
+
+
+@app.post("/api/admin/rag-metrics/infer")
+async def api_rag_metrics_infer(limit: int | None = None):
+    """Run auto-inference: for every finalized case, mark its cited
+    precedents as ``relevance_score=3`` against any search_log where
+    they appeared in the top-K. Idempotent.
+    """
+    from legal_mcp.services import telemetry as telem_svc
+    try:
+        result = await telem_svc.infer_relevance_for_all_finalized_cases(limit=limit)
+    except Exception as e:
+        logger.exception("rag-metrics auto-inference failed")
+        raise HTTPException(500, f"auto-inference נכשל: {e}") from e  # "auto-inference failed"
+    return result
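
For context on the metric that `GET /api/admin/rag-metrics` reports, here is a minimal sketch of nDCG@k over graded relevance labels. The internals of `compute_ndcg.compute` are not shown in this diff, so everything below is an assumption: the exponential gain `2**rel - 1`, the `log2(rank + 1)` discount, and computing the ideal ordering from the retrieved list only. The function names `dcg_at_k` and `ndcg_at_k` are hypothetical.

```python
import math
from typing import Sequence


def dcg_at_k(relevances: Sequence[int], k: int) -> float:
    """Discounted cumulative gain over the first k results.

    `relevances[i]` is the graded label of the result at 0-based position i
    (for example 3 for a precedent later cited in the finalized draft,
    0 for a result with no feedback).
    """
    return sum(
        (2 ** rel - 1) / math.log2(pos + 2)  # pos 0 -> rank 1 -> log2(2) = 1
        for pos, rel in enumerate(relevances[:k])
    )


def ndcg_at_k(relevances: Sequence[int], k: int = 10) -> float:
    """nDCG@k: DCG of the actual ranking divided by DCG of the ideal one.

    Assumption: the ideal ranking is the same labels sorted descending,
    so relevant documents that were never retrieved are ignored.
    """
    ideal = dcg_at_k(sorted(relevances, reverse=True), k)
    if ideal == 0:
        return 0.0  # no relevant results at all; define the score as 0
    return dcg_at_k(relevances, k) / ideal


# A cited precedent ranked first scores 1.0; ranked third it scores 0.5.
best = ndcg_at_k([3, 0, 0])
third = ndcg_at_k([0, 0, 3])
```

Under these assumptions a query whose only cited precedent sits at rank 3 scores exactly half of one where it sits at rank 1, which is the kind of ranking regression the weekly aggregates are meant to surface.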