Compare commits: 9d2536a667...worktree-s (19 commits)
| SHA1 |
|---|
| e4651a9d06 |
| a571ad535b |
| afc1548bca |
| e096c51037 |
| 85c5a4aacb |
| 420cb819f5 |
| 32ef259843 |
| 1286a1e60d |
| 366d89e6bb |
| fb51a0e869 |
| 12bdec10fa |
| 8ec24cf822 |
| 3b9f77daa8 |
| 32a6e2b57b |
| 37c00bac13 |
| 6313fcd316 |
| 7b1c0c1a32 |
| 3b3e1e3bbf |
| 37dcb30604 |
@@ -463,6 +463,7 @@ The draft's biggest structural error was adding the "נבאר" doctrinal paragra
- **Problem:** legal-writer updates `decision_blocks` in the DB, but legal-qa reads from `drafts/decision.md` on disk. In CMPA-62 the writer reported updating block headers in DB but the file did not re-sync, causing QA-2 to fail on exactly the same issue twice.
- **Lesson:** Single source of truth is mandatory: either the writer must write to BOTH the DB and the decision.md file in one atomic step, or there must be an automatic `regenerate-draft` hook that runs after every block update so the file always reflects the latest DB state. Two unsynchronized sources will keep producing the same false-fail loop.
- **Owner:** Infrastructure task, not a writer/QA prompt fix.
- **✅ RESOLVED (GAP-88, 2026-06-06):** `block_writer._update_draft_file` is now an automatic regenerate hook called from `store_block` (every persist) **and** `renumber_all_blocks`, so `drafts/decision.md` always reflects `decision_blocks`. legal-qa already validates against the DB; both sides are now identical.
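The regenerate-hook pattern from the RESOLVED note can be sketched roughly as follows. This is a minimal illustration, assuming an in-memory dict in place of the `decision_blocks` table and a temporary directory in place of the case folder; `BlockStore`, `regenerate_draft`, and `demo` are hypothetical names, not the project's `block_writer` API.

```python
import tempfile
from pathlib import Path


class BlockStore:
    """Hypothetical stand-in for the decision_blocks table plus its draft file."""

    def __init__(self, draft_path: Path) -> None:
        self.draft_path = draft_path
        self.blocks: dict[int, str] = {}  # block_index -> content

    def regenerate_draft(self) -> None:
        # Rebuild the whole file from the store; never patch it in place.
        ordered = (self.blocks[i] for i in sorted(self.blocks))
        text = "\n\n".join(content for content in ordered if content)
        self.draft_path.parent.mkdir(parents=True, exist_ok=True)
        self.draft_path.write_text(text, encoding="utf-8")

    def store_block(self, index: int, content: str) -> None:
        # Every mutation ends with the hook, so callers cannot forget to sync.
        self.blocks[index] = content
        self.regenerate_draft()


def demo() -> str:
    with tempfile.TemporaryDirectory() as tmp:
        store = BlockStore(Path(tmp) / "drafts" / "decision.md")
        store.store_block(2, "Second block")
        store.store_block(1, "First block")
        store.store_block(2, "Second block, revised")
        return store.draft_path.read_text(encoding="utf-8")
```

Because the file is always rebuilt from the store, a reader of the file and a reader of the store cannot disagree after any `store_block` call, which is the property the lesson asks for.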
|
||||
|
||||
---
|
||||
|
||||
|
||||
@@ -88,7 +88,7 @@
| GAP-51 | `set_outcome` enum mismatch (3≠4); conflicting vocabularies | INV-TOOL1/UI1 | Medium | `block_writer.py:442` vs. `lessons.py:11`, `workflow.py:145` | Single SSoT for outcome |
| GAP-52 | Most tools are not idempotent (case_create/document_upload/precedent_attach) | INV-TOOL3, G3 | Medium | `server.py`, tools/ | upsert/ON CONFLICT |
| GAP-53 | No limit caps (precedent_library_list/search_*/list_chair_feedback) | INV-TOOL5 | Low | tools/ | clamp to max |
| GAP-54 | 3 case-law ingestion paths with asymmetric validation; citation-guard undocumented | INV-ING1, G2 | Medium | `precedent_library.py`, `internal_decisions.py` | Unify (consistent with GAP-01/05) |
| GAP-54 | 3 case-law ingestion paths with asymmetric validation; citation-guard undocumented | INV-ING1, G2 | Medium | `precedent_library.py`, `internal_decisions.py` | ✅ **Resolved by FU-1**: both case-law paths (library+internal) go through the canonical `ingest.ingest_document` (symmetric enum validation + citation-guard, documented in 01-ingest §4); the third path (training→`style_corpus`) is an intentionally separate corpus (style, not case law). Verified in `test_unified_ingest.py` |
| GAP-55 | Infisical dead code; config source undocumented (Coolify-only) | INV-ENV2, G2 | Medium | `mcp-server/.../config.py` | Document Coolify as SSoT / isolate Infisical |
| GAP-56 | Hard-coded UUIDs (company/agent); consistent with GAP-26 | INV-ENV3/INT5 | High | `web/paperclip_client.py:36-62`, `web/app.py:3976` | config-driven |
| GAP-57 | Plaintext creds by default (`paperclip:paperclip`) | INV-ENV4, G9, §6 | High | `web/paperclip_client.py:21`, `web/app.py:3789,3964` | empty default + fail-loud |
@@ -207,6 +207,7 @@
- **Slice 7, 2026-06-06: ✅ GAP-48 completed.** The `drafting` family (18 tools) was converted to the envelope. export_docx/revise_draft/apply_user_edit use `err` for failure (so the agent and the user see the failure at the envelope level), with `failed_gates` riding in `data`; 6 app.py consumers (get_decision_template/apply_user_edit×2/revise_draft/list_bookmarks/export_docx) were wired with an envelope-status check; `test_export_qa_gate` was updated to the contract (182/182 passing). **GAP-48 closed: all ~12 families are uniform.**
- **Slice 8, 2026-06-06: ✅ GAP-49 (the critical part).** The misleading name `precedent_search_library` (citations attached to a case) was renamed to `search_case_precedents`, eliminating the dangerous inversion against `search_precedent_library` (the authoritative library, the CREAC source). The old name is kept as a deprecated alias (in server.py), so nothing breaks for live agents. Docstrings were clarified; app.py (typeahead), the legal-researcher/legal-writer docs, and the precedent_library docstring were updated. The 5 remaining search tools search distinct corpora under reasonable names, so no mass rename was done (high churn, low value). 182/182 passing. **⚠ After merge+deploy:** cross-company sync of the agent doc (frontmatter `search_case_precedents`). Remaining in FU-14: GAP-50 (merging the block tools; touches the writing process and requires a chair decision), GAP-54, GAP-47 part B.
- **Slice 9, 2026-06-06: ✅ GAP-50 (chair decision).** Mapping showed that the block tools are not "redundant duplication": `write_block`/`write_all_blocks`/`save_block_content`/`write_interim_draft` serve different flows (CLI/initial-draft vs. the writer process, "the fix goes in the file, not in the DB"). The only real duplication: `draft_section` (per-section context, nearly abandoned) overlaps `get_block_context` (per-block, canonical). Decided (chair): **draft_section deprecated** (the docstring in server.py+drafting.py points to get_block_context; draft-decision.md updated), with no removal and no merging of the writing tools (preserving the intentional writing process). 182/182 passing. **GAP-49+50 closed.** Remaining in FU-14: GAP-54 (unifying case-law ingestion), GAP-47 part B (chair guidelines→DB).
- **Slice 10, 2026-06-06: ✅ GAP-54 (closed as resolved-by-FU-1).** Verification (G2: do not re-solve): `ingest.ingest_document` is the canonical path; `precedent_library` and `internal_decisions` both go through it with symmetric enum validation + citation-guard (documented in 01-ingest §4); training→`style_corpus` is an intentionally separate corpus. 9/9 `test_unified_ingest` passing, so there is no code to write. **FU-14 nearly complete: only GAP-47 part B remains** (moving chair guidelines from `analysis-and-research.md` to the DB), a separate UI + analyst-flow feature, not urgent.
|
||||
|
||||
### FU-15 — deploy/env/secrets
|
||||
- **Covers:** GAP-55..62 · **invariants:** INV-ENV1–ENV5 · **effort:** M · **dependencies:** none
|
||||
|
||||
@@ -154,6 +154,14 @@ HALACHA_AUTO_APPROVE_THRESHOLD = float(
|
||||
# principle. Set > 1.0 to disable semantic dedup (exact-quote dedup still runs).
|
||||
HALACHA_DEDUP_COSINE = float(os.environ.get("HALACHA_DEDUP_COSINE", "0.93"))
|
||||
|
||||
# Halacha dedup TAIL band (#82.3) — the [BAND_COSINE, DEDUP_COSINE) range is too
|
||||
# low to auto-skip but suspicious. A halacha whose nearest same-precedent
|
||||
# neighbor sits in this band AND has high LEXICAL overlap (Jaccard/Levenshtein
|
||||
# on rule_statement) is flagged 'near_duplicate' (blocks auto-approve → review),
|
||||
# not skipped — catching paraphrases the cosine threshold misses without
|
||||
# dropping a possibly-distinct principle unreviewed. 0.83 from the same cleanup.
|
||||
HALACHA_DEDUP_BAND_COSINE = float(os.environ.get("HALACHA_DEDUP_BAND_COSINE", "0.83"))
|
||||
|
||||
# Halacha NLI entailment validator (#81.3) — after extraction, a claude_session
|
||||
# judge checks each halacha's rule_statement is entailed by its supporting_quote.
|
||||
# Non-entailed (neutral/contradiction) → quality flag 'nli_unsupported' that
|
||||
|
||||
@@ -1088,37 +1088,39 @@ async def save_block_content(case_id: UUID, block_id: str, content: str) -> dict
|
||||
result["generation_type"] = "claude-code"
|
||||
result["model_used"] = "claude-code"
|
||||
|
||||
await store_block(UUID(decision["id"]), result)
|
||||
await store_block(UUID(decision["id"]), result) # store_block syncs the file (#35)
|
||||
await db.mark_blocks_stale(case_id, False)
|
||||
|
||||
# Also write/update the draft file on disk
|
||||
await _update_draft_file(case_id, UUID(decision["id"]))
|
||||
|
||||
return result
|
||||
|
||||
|
||||
async def _update_draft_file(case_id: UUID, decision_id: UUID) -> None:
|
||||
"""Rebuild drafts/decision.md from all blocks in DB."""
|
||||
from pathlib import Path
|
||||
|
||||
case = await db.get_case(case_id)
|
||||
if not case:
|
||||
return
|
||||
|
||||
case_dir = config.find_case_dir(case["case_number"])
|
||||
draft_dir = case_dir / "drafts"
|
||||
draft_dir.mkdir(parents=True, exist_ok=True)
|
||||
|
||||
async def _update_draft_file(decision_id: UUID) -> None:
|
||||
"""Rebuild drafts/decision.md from all blocks in DB — the single
|
||||
regenerate-draft hook (lessons #35 / GAP-88). Called after EVERY
|
||||
decision_blocks mutation (store_block, renumber) so the on-disk file never
|
||||
drifts from the DB. legal-qa validates against the DB; export and the chair
|
||||
read the file — keeping them identical kills the "QA fails twice on the same
|
||||
already-fixed issue" loop (CMPA-62). Resolves case from decision_id so no
|
||||
caller has to thread case_id through."""
|
||||
pool = await db.get_pool()
|
||||
async with pool.acquire() as conn:
|
||||
case_row = await conn.fetchrow(
|
||||
"SELECT c.case_number FROM decisions d JOIN cases c ON c.id = d.case_id "
|
||||
"WHERE d.id = $1",
|
||||
decision_id,
|
||||
)
|
||||
if not case_row:
|
||||
return
|
||||
rows = await conn.fetch(
|
||||
"SELECT content FROM decision_blocks WHERE decision_id = $1 AND content != '' ORDER BY block_index",
|
||||
decision_id,
|
||||
)
|
||||
|
||||
draft_dir = config.find_case_dir(case_row["case_number"]) / "drafts"
|
||||
draft_dir.mkdir(parents=True, exist_ok=True)
|
||||
draft_path = draft_dir / "decision.md"
|
||||
draft_path.write_text("\n\n".join(row["content"] for row in rows if row["content"]), encoding="utf-8")
|
||||
logger.info("Draft file updated: %s (%d blocks)", draft_path, len(rows))
|
||||
logger.info("Draft file synced: %s (%d blocks)", draft_path, len(rows))
|
||||
|
||||
|
||||
# ── Renumbering ───────────────────────────────────────────────────
|
||||
@@ -1172,6 +1174,11 @@ async def renumber_all_blocks(decision_id: UUID) -> dict:
|
||||
)
|
||||
updated += 1
|
||||
|
||||
# #35 — renumber mutates content via raw UPDATE (bypasses store_block), so
|
||||
# sync the draft file here too, otherwise the file keeps stale numbering.
|
||||
if updated:
|
||||
await _update_draft_file(decision_id)
|
||||
|
||||
return {"total_paragraphs": current_num - 1, "blocks_updated": updated}
|
||||
|
||||
|
||||
@@ -1204,6 +1211,9 @@ async def store_block(decision_id: UUID, block_result: dict) -> None:
|
||||
block_result["model_used"],
|
||||
block_result["temperature"],
|
||||
)
|
||||
# #35 — regenerate the on-disk draft on every persist so DB and file stay
|
||||
# identical (legal-qa reads DB; export/chair read the file).
|
||||
await _update_draft_file(decision_id)
|
||||
|
||||
|
||||
async def write_and_store_block(
|
||||
|
||||
@@ -29,6 +29,7 @@ from __future__ import annotations
|
||||
import asyncio
|
||||
import json
|
||||
import logging
|
||||
import os
|
||||
|
||||
from legal_mcp.config import parse_llm_json
|
||||
|
||||
@@ -40,15 +41,39 @@ logger = logging.getLogger(__name__)
|
||||
DEFAULT_TIMEOUT = 1800
|
||||
LONG_TIMEOUT = 3600 # opus block writing on full case context
|
||||
|
||||
# #85 — `claude -p` fails intermittently with a fast non-zero exit and empty
|
||||
# stderr (observed on large/slow cold prompts: CEO write_interim_draft,
|
||||
# learning_loop distillation). The SAME prompt succeeds on retry, so the bail is
|
||||
# transient — retry with linear backoff. Timeouts and "CLI not found" are
|
||||
# deterministic and are NOT retried.
|
||||
# #85 — two complementary hardenings for the same symptom (`claude -p` failing
|
||||
# with a fast non-zero exit + empty stderr on large/slow cold prompts: CEO
|
||||
# write_interim_draft, learning_loop distillation):
|
||||
#
|
||||
# 1. CLEAN ENV (defensive): a running Claude Code session exports markers into
|
||||
# child processes; a *nested* ``claude -p`` inherits them. Stripping them lets
|
||||
# every nested invocation launch as a clean top-level session. Could not be
|
||||
# reproduced deterministically, so it's a suspect, not a proven cause. Auth/
|
||||
# config (CLAUDE_CONFIG_DIR, ANTHROPIC_*, PATH, HOME) are kept.
|
||||
# 2. RETRY (the real fix): the SAME large prompt that exits 1 once succeeds on a
|
||||
# plain retry — the bail is transient. Retry with linear backoff. Timeouts and
|
||||
# "CLI not found" stay deterministic and are NOT retried.
|
||||
# See TaskMaster legal-ai #85.
|
||||
_SESSION_MARKER_PREFIXES = ("CLAUDECODE", "CLAUDE_CODE_", "CLAUDE_AGENT_")
|
||||
_SESSION_MARKER_EXACT = frozenset({"AI_AGENT", "CLAUDE_EFFORT"})
|
||||
|
||||
MAX_RETRIES = 3
|
||||
RETRY_BACKOFF_BASE = 5 # seconds; sleep = base * attempt_number
|
||||
|
||||
|
||||
def _clean_subprocess_env() -> dict[str, str]:
|
||||
"""Copy the current env minus Claude Code session markers.
|
||||
|
||||
Lets a nested ``claude -p`` start fresh instead of detecting it is
|
||||
already inside a Claude Code session (#85).
|
||||
"""
|
||||
env = dict(os.environ)
|
||||
for key in list(env):
|
||||
if key in _SESSION_MARKER_EXACT or key.startswith(_SESSION_MARKER_PREFIXES):
|
||||
del env[key]
|
||||
return env
|
||||
|
||||
|
||||
async def query(
|
||||
prompt: str,
|
||||
timeout: int = DEFAULT_TIMEOUT,
|
||||
@@ -112,6 +137,8 @@ async def query(
|
||||
stdin=asyncio.subprocess.PIPE,
|
||||
stdout=asyncio.subprocess.PIPE,
|
||||
stderr=asyncio.subprocess.PIPE,
|
||||
env=_clean_subprocess_env(),
|
||||
cwd=os.path.expanduser("~"),
|
||||
)
|
||||
except FileNotFoundError:
|
||||
# Deterministic — never retry.
|
||||
@@ -139,8 +166,11 @@ async def query(
|
||||
raise RuntimeError(f"Claude CLI timed out after {timeout}s")
|
||||
|
||||
if proc.returncode != 0:
|
||||
stderr = stderr_b.decode("utf-8", errors="replace").strip()[:500] or "unknown error"
|
||||
last_err = f"exit {proc.returncode}: {stderr}"
|
||||
# The CLI sometimes writes its diagnostic to stdout (or nowhere)
|
||||
# rather than stderr (#85) — surface whichever is present.
|
||||
stderr = stderr_b.decode("utf-8", errors="replace").strip()
|
||||
stdout = stdout_b.decode("utf-8", errors="replace").strip()
|
||||
last_err = f"exit {proc.returncode}: {(stderr or stdout or 'no output')[:500]}"
|
||||
else:
|
||||
stdout = stdout_b.decode("utf-8", errors="replace").strip()
|
||||
if stdout:
|
||||
@@ -256,6 +286,7 @@ async def query_streaming(
|
||||
stdout=asyncio.subprocess.PIPE,
|
||||
stderr=asyncio.subprocess.PIPE,
|
||||
cwd=cwd,
|
||||
env=_clean_subprocess_env(),
|
||||
)
|
||||
except FileNotFoundError:
|
||||
yield {
|
||||
|
||||
@@ -619,6 +619,12 @@ ALTER TABLE case_law ADD COLUMN IF NOT EXISTS practice_area TEXT DEFAULT '';
|
||||
ALTER TABLE case_law ADD COLUMN IF NOT EXISTS appeal_subtype TEXT DEFAULT '';
|
||||
ALTER TABLE case_law ADD COLUMN IF NOT EXISTS headnote TEXT DEFAULT '';
|
||||
-- chair-editable abstract shown in search results.
|
||||
ALTER TABLE case_law ADD COLUMN IF NOT EXISTS nevo_ratio TEXT DEFAULT '';
|
||||
-- The Nevo editorial מיני-רציו block, captured at ingest *before* it is
|
||||
-- stripped from the body (#86.3). Kept separate from `headnote` (which is
|
||||
-- our own abstract) so it can serve as a free professional gold-set for
|
||||
-- benchmarking halacha-extraction recall/precision. Empty when the source
|
||||
-- is not a Nevo export or carries no mini-ratio.
|
||||
ALTER TABLE case_law ADD COLUMN IF NOT EXISTS source_type TEXT DEFAULT '';
|
||||
-- 'court_ruling' | 'appeals_committee'
|
||||
|
||||
@@ -3263,7 +3269,7 @@ async def update_case_law(case_law_id: UUID, **fields) -> dict | None:
|
||||
"""
|
||||
allowed = {
|
||||
"case_number", "case_name", "court", "date", "practice_area", "appeal_subtype",
|
||||
"subject_tags", "summary", "headnote", "key_quote", "source_url",
|
||||
"subject_tags", "summary", "headnote", "nevo_ratio", "key_quote", "source_url",
|
||||
"source_type", "precedent_level", "is_binding", "district", "chair_name",
|
||||
"proceeding_type", "citation_formatted",
|
||||
}
|
||||
@@ -3693,6 +3699,7 @@ async def store_halachot_for_chunk(
|
||||
"""
|
||||
threshold = config.HALACHA_AUTO_APPROVE_THRESHOLD
|
||||
dedup_distance = 1.0 - config.HALACHA_DEDUP_COSINE # cosine sim → distance
|
||||
band_distance = 1.0 - config.HALACHA_DEDUP_BAND_COSINE # tail-band ceiling (#82.3)
|
||||
pool = await get_pool()
|
||||
inserted = 0
|
||||
skipped = 0
|
||||
@@ -3716,21 +3723,32 @@ async def store_halachot_for_chunk(
|
||||
if norm_quote and norm_quote in existing_quotes:
|
||||
skipped += 1
|
||||
continue
|
||||
# 2) semantic near-duplicate (rule embedding cosine)
|
||||
# 2) semantic near-duplicate (rule embedding cosine) — fetch the
|
||||
# nearest same-precedent neighbor once so we can both auto-skip
|
||||
# (cosine ≥ DEDUP) and flag the lexical tail (#82.3).
|
||||
emb = h.get("embedding")
|
||||
flags = list(h.get("quality_flags") or [])
|
||||
if emb is not None and config.HALACHA_DEDUP_COSINE <= 1.0:
|
||||
dup = await conn.fetchval(
|
||||
"SELECT 1 FROM halachot WHERE case_law_id = $1 "
|
||||
"AND embedding IS NOT NULL AND (embedding <=> $2) <= $3 "
|
||||
"LIMIT 1",
|
||||
case_law_id, emb, dedup_distance,
|
||||
neighbor = await conn.fetchrow(
|
||||
"SELECT rule_statement, (embedding <=> $2) AS dist "
|
||||
"FROM halachot WHERE case_law_id = $1 "
|
||||
"AND embedding IS NOT NULL "
|
||||
"ORDER BY embedding <=> $2 LIMIT 1",
|
||||
case_law_id, emb,
|
||||
)
|
||||
if dup:
|
||||
if neighbor is not None:
|
||||
dist = float(neighbor["dist"])
|
||||
if dist <= dedup_distance:
|
||||
skipped += 1
|
||||
continue
|
||||
# tail band: below auto-skip but lexically near → flag.
|
||||
if (dist <= band_distance
|
||||
and halacha_quality.FLAG_NEAR_DUPLICATE not in flags
|
||||
and halacha_quality.lexical_near_duplicate(
|
||||
h["rule_statement"], neighbor["rule_statement"])):
|
||||
flags.append(halacha_quality.FLAG_NEAR_DUPLICATE)
|
||||
|
||||
confidence = float(h.get("confidence", 0.0))
|
||||
flags = h.get("quality_flags") or []
|
||||
auto_approve = confidence >= threshold and not flags
|
||||
review_status = "approved" if auto_approve else "pending_review"
|
||||
reviewer = (
|
||||
@@ -3774,7 +3792,19 @@ async def list_halachot(
|
||||
practice_area: str | None = None,
|
||||
limit: int = 200,
|
||||
offset: int = 0,
|
||||
exclude_low_quality: bool = False,
|
||||
order_by_priority: bool = False,
|
||||
) -> list[dict]:
|
||||
"""List halachot with optional triage controls (#84).
|
||||
|
||||
exclude_low_quality — drop items carrying ANY quality_flag (application /
|
||||
truncated_quote / quote_unverified / non_decision / thin_restatement /
|
||||
nli_unsupported / near_duplicate). These belong in a 'needs extraction
|
||||
fix' bucket, not the chair's approve queue (#84.1).
|
||||
order_by_priority — replace FIFO with an active-learning order (#84.3):
|
||||
negatively-treated first, then most-uncertain (lowest confidence), then
|
||||
oldest — so the chair sees the highest-value decisions first.
|
||||
"""
|
||||
pool = await get_pool()
|
||||
conditions = []
|
||||
params: list = []
|
||||
@@ -3791,7 +3821,16 @@ async def list_halachot(
|
||||
conditions.append(f"${idx} = ANY(h.practice_areas)")
|
||||
params.append(practice_area)
|
||||
idx += 1
|
||||
if exclude_low_quality:
|
||||
# a clean item has an empty/NULL quality_flags array
|
||||
conditions.append("COALESCE(array_length(h.quality_flags, 1), 0) = 0")
|
||||
where_sql = f"WHERE {' AND '.join(conditions)}" if conditions else ""
|
||||
order_sql = (
|
||||
"ORDER BY corroboration_negative DESC, h.confidence ASC NULLS LAST, "
|
||||
"h.created_at ASC"
|
||||
if order_by_priority
|
||||
else "ORDER BY h.case_law_id, h.halacha_index"
|
||||
)
|
||||
params.extend([limit, offset])
|
||||
sql = f"""
|
||||
SELECT h.id, h.case_law_id, h.halacha_index, h.rule_statement,
|
||||
@@ -3819,7 +3858,7 @@ async def list_halachot(
|
||||
GROUP BY halacha_id
|
||||
) cor ON cor.halacha_id = h.id
|
||||
{where_sql}
|
||||
ORDER BY h.case_law_id, h.halacha_index
|
||||
{order_sql}
|
||||
LIMIT ${idx} OFFSET ${idx + 1}
|
||||
"""
|
||||
rows = await pool.fetch(sql, *params)
|
||||
|
||||
@@ -362,12 +362,24 @@ _NEVO_MARKERS = ("ספרות:", "חקיקה שאוזכרה:", "מיני-רציו
|
||||
# preamble: bibliography + מיני-רציו). Two families:
|
||||
# - ועדת ערר / district openings (בפנינו / הערר שבנדון / ...)
|
||||
# - COURT-RULING openings (#86.1): a פסק-דין header or the authoring judge's
|
||||
# line ("השופט/ת X:", "כב' השופט", "הנשיא"). Without these, Nevo court
|
||||
# judgments — exactly the ones carrying a מיני-רציו — slipped through unstripped
|
||||
# (e.g. בג"ץ 1764/05), risking that the extractor reads Nevo's answer key.
|
||||
# line. Without these, Nevo court judgments — exactly the ones carrying a
|
||||
# מיני-רציו — slipped through unstripped (e.g. בג"ץ 1764/05).
|
||||
#
|
||||
# #86.2 hardening — two over-strip bugs found while backfilling:
|
||||
# 1. ``פסק-דין`` headers are often markdown-wrapped (``**פסק דין**``); the old
|
||||
# ``^פסק[- ]דין`` required the keyword to be the very first char of the line
|
||||
# and allowed only one separator, so it missed the header and fell through
|
||||
# to a citation 32K deep (עמ"נ 50567-07-21). We now tolerate leading
|
||||
# markdown/whitespace and 0-3 separators.
|
||||
# 2. Bare ``השופט``/``הנשיא`` matched *citations* ("השופט מ' חשין, פסקה 23"),
|
||||
# stripping real decision body. The authoring-judge line ends with a COLON
|
||||
# ("השופט י' עמית:"); citations use a comma. We now require the colon.
|
||||
_DECISION_START = re.compile(
|
||||
r"^(בפנינו|לפנינו|לפניי|הערר שבנדון|ועדת הערר לתכנון|רקע עובדתי|עסקינן|"
|
||||
r"פסק[- ]דין|פסק[- ]דינו|כב(?:וד)?['׳]?\s*השופט|המשנה לנשיא|הנשיא|השופט)",
|
||||
r"^[ \t>*_#]{0,6}(?:"
|
||||
r"בפנינו|לפנינו|לפניי|הערר שבנדון|ועדת הערר לתכנון|רקע עובדתי|עסקינן|"
|
||||
r"פסק[ \t\-]{0,3}די(?:ן|נו)|" # פסק-דין / פסק דין / **פסק דין** header (final-nun ן vs דינו)
|
||||
r"(?:כב(?:וד)?['׳\"]?\s*)?(?:ה?שופט[ת]?|ה?נשיא[ה]?|המשנה לנשיא)\s+[^\n,]{1,40}:" # author line → colon
|
||||
r")",
|
||||
re.MULTILINE,
|
||||
)
|
||||
|
||||
@@ -388,3 +400,41 @@ def strip_nevo_preamble(text: str) -> str:
|
||||
logger.debug("Stripped %d chars of Nevo preamble", m.start())
|
||||
return stripped
|
||||
return text
|
||||
|
||||
|
||||
_RATIO_MARKER = "מיני-רציו:"
|
||||
|
||||
|
||||
def extract_nevo_ratio(text: str) -> str:
|
||||
"""Return the Nevo מיני-רציו block (editorial holdings summary), or ''.
|
||||
|
||||
The mini-ratio is Nevo's own headnote — a concise, professionally-written
|
||||
list of the holdings. We capture it *before* :func:`strip_nevo_preamble`
|
||||
discards it, to serve as a free gold-set for benchmarking how well our
|
||||
halacha extractor covers the real holdings (#86.3).
|
||||
|
||||
The block runs from the ``מיני-רציו:`` marker to whichever comes first:
|
||||
the decision body (``_DECISION_START``) or the next preamble marker
|
||||
(bibliography / legislation). Returns '' when there is no mini-ratio.
|
||||
"""
|
||||
if not text:
|
||||
return ""
|
||||
start = text.find(_RATIO_MARKER)
|
||||
if start == -1:
|
||||
return ""
|
||||
body = text[start + len(_RATIO_MARKER):]
|
||||
|
||||
# End at the earliest of: decision body start, or a following preamble
|
||||
# marker (ספרות: / חקיקה שאוזכרה: / ...). Both are measured relative to
|
||||
# the ratio body so we never run past it into the judgment itself.
|
||||
end = len(body)
|
||||
dm = _DECISION_START.search(body)
|
||||
if dm:
|
||||
end = min(end, dm.start())
|
||||
for marker in _NEVO_MARKERS:
|
||||
if marker == _RATIO_MARKER:
|
||||
continue
|
||||
pos = body.find(marker)
|
||||
if pos != -1:
|
||||
end = min(end, pos)
|
||||
return body[:end].strip()
|
||||
|
||||
@@ -592,10 +592,16 @@ async def _extract_impl(case_law_id: UUID, force: bool = False,
|
||||
flags = halacha_quality.compute_quality_flags(
|
||||
coerced["rule_statement"], coerced["supporting_quote"],
|
||||
coerced["reasoning_summary"], coerced["quote_verified"],
|
||||
coerced["rule_type"],
|
||||
)
|
||||
coerced["quality_flags"] = flags
|
||||
if halacha_quality.FLAG_NON_DECISION in flags and coerced["rule_type"] != "obiter":
|
||||
coerced["rule_type"] = "obiter"
|
||||
# #81.4 — a binding-labeled rule that reads as a case-application is
|
||||
# re-typed application (it carries FLAG_APPLICATION either way).
|
||||
elif (halacha_quality.FLAG_APPLICATION in flags
|
||||
and coerced["rule_type"] == "binding"):
|
||||
coerced["rule_type"] = "application"
|
||||
cleaned.append(coerced)
|
||||
# #81.3 NLI entailment — one batched judge call per chunk (fail-open).
|
||||
if config.HALACHA_NLI_ENABLED and cleaned:
|
||||
|
||||
@@ -128,6 +128,91 @@ def is_thin_restatement(rule_statement: str, supporting_quote: str) -> bool:
|
||||
return overlap >= _THIN_OVERLAP and len_ratio <= _THIN_LEN_RATIO
|
||||
|
||||
|
||||
# ── Fact-dependent application: not a generalizable holding (#81.4) ──
|
||||
#
|
||||
# The strict rubric's cut_application (docs/halacha-strict-rubric.md §3, §27):
|
||||
# a determination that rests on the case's specific facts/parties/amounts is an
|
||||
# illustration, not a holding — it must not enter the corpus as a binding rule.
|
||||
# The extractor already classifies ``rule_type='application'``; this is a
|
||||
# HIGH-PRECISION secondary catch for rules the model mislabeled as binding,
|
||||
# using only the unambiguous "applied to THIS case" deixis (bare party words
|
||||
# like "המערער" appear in genuine rules too, so they are deliberately excluded).
|
||||
|
||||
_FACT_DEPENDENT_MARKERS = (
|
||||
"במקרה דנן",
|
||||
"במקרה שבפנינו",
|
||||
"במקרה שלפנינו",
|
||||
"במקרה שלפניי",
|
||||
"בענייננו",
|
||||
"בנדון דידן",
|
||||
"בנדון דנן",
|
||||
"במקרה שלנו",
|
||||
"בנסיבות המקרה שלפנינו",
|
||||
"בנסיבות תיק זה",
|
||||
"בתיק שלפנינו",
|
||||
"בערר שלפנינו",
|
||||
"בערר דנן",
|
||||
)
|
||||
|
||||
|
||||
def is_fact_dependent(rule_statement: str) -> bool:
|
||||
"""True when the rule is phrased as an application to THIS case (not a holding)."""
|
||||
norm = normalize_text(rule_statement)
|
||||
return any(marker in norm for marker in _FACT_DEPENDENT_MARKERS)
|
||||
|
||||
|
||||
# ── Lexical near-duplicate signal (the 0.83–0.93 cosine tail) — #82.3 ──
|
||||
#
|
||||
# Embedding cosine alone misses paraphrases that float just below the dedup
|
||||
# threshold (0.93). A secondary lexical signal — Jaccard over word-shingles +
|
||||
# normalized Levenshtein on the rule_statement — catches "same rule, reworded"
|
||||
# in that band without lowering the global cosine threshold. Hybrid
|
||||
# lexical+semantic beats either alone (arXiv:1805.11611). Pure functions.
|
||||
|
||||
def _shingles(text: str, k: int = 2) -> set[str]:
|
||||
words = [w for w in re.split(r"[^א-ת0-9]+", normalize_text(text)) if w]
|
||||
if len(words) < k:
|
||||
return {" ".join(words)} if words else set()
|
||||
return {" ".join(words[i : i + k]) for i in range(len(words) - k + 1)}
|
||||
|
||||
|
||||
def jaccard_shingles(a: str, b: str, k: int = 2) -> float:
|
||||
sa, sb = _shingles(a, k), _shingles(b, k)
|
||||
if not sa or not sb:
|
||||
return 0.0
|
||||
return len(sa & sb) / len(sa | sb)
|
||||
|
||||
|
||||
def normalized_levenshtein(a: str, b: str) -> float:
"""Similarity in [0, 1]: 1.0 == identical, 0.0 == fully different (1 - edit distance / max len)."""
|
||||
a, b = normalize_text(a), normalize_text(b)
|
||||
if not a and not b:
|
||||
return 1.0
|
||||
if not a or not b:
|
||||
return 0.0
|
||||
# classic DP edit distance (rule_statements are short — a few hundred chars)
|
||||
prev = list(range(len(b) + 1))
|
||||
for i, ca in enumerate(a, 1):
|
||||
cur = [i]
|
||||
for j, cb in enumerate(b, 1):
|
||||
cur.append(min(prev[j] + 1, cur[j - 1] + 1, prev[j - 1] + (ca != cb)))
|
||||
prev = cur
|
||||
return 1.0 - prev[-1] / max(len(a), len(b))
|
||||
|
||||
|
||||
_LEX_JACCARD_MIN = 0.55
|
||||
_LEX_LEVENSHTEIN_MIN = 0.70
|
||||
|
||||
|
||||
def lexical_near_duplicate(
|
||||
a: str, b: str, jaccard_min: float = _LEX_JACCARD_MIN,
|
||||
levenshtein_min: float = _LEX_LEVENSHTEIN_MIN,
|
||||
) -> bool:
|
||||
"""High lexical overlap → likely the same rule reworded (for the cosine tail)."""
|
||||
return (jaccard_shingles(a, b) >= jaccard_min
|
||||
or normalized_levenshtein(a, b) >= levenshtein_min)
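How the cosine band and the lexical signal combine into a three-way decision can be sketched as follows. This is a simplified illustration, assuming cosine similarity as input (the stored code compares distances, `1 - cosine`), whitespace tokenization on English text, and Jaccard only; `triage`, `bigrams`, and `jaccard` are hypothetical helpers, and the constants mirror the defaults quoted in the diff.

```python
DEDUP_COSINE = 0.93  # at or above: auto-skip as a duplicate
BAND_COSINE = 0.83   # [0.83, 0.93): suspicious tail, needs a lexical check
JACCARD_MIN = 0.55


def bigrams(text: str) -> set[tuple[str, str]]:
    words = text.lower().split()
    return set(zip(words, words[1:]))


def jaccard(a: str, b: str) -> float:
    sa, sb = bigrams(a), bigrams(b)
    if not sa or not sb:
        return 0.0
    return len(sa & sb) / len(sa | sb)


def triage(cosine: float, rule: str, neighbor_rule: str) -> str:
    """Return 'skip', 'flag' (send to review), or 'keep'."""
    if cosine >= DEDUP_COSINE:
        return "skip"
    if cosine >= BAND_COSINE and jaccard(rule, neighbor_rule) >= JACCARD_MIN:
        return "flag"
    return "keep"


RULE_A = "a permit may not deviate from the approved plan"
RULE_B = "a permit may not deviate from the binding plan"
RULE_C = "the committee has wide discretion"
```

With these inputs, `RULE_A` and `RULE_B` share 6 of 10 distinct word bigrams, so a pair in the band is flagged, while an unrelated neighbor at the same cosine is kept.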
|
||||
|
||||
|
||||
# ── Aggregate ──
|
||||
|
||||
FLAG_NON_DECISION = "non_decision"
|
||||
@@ -135,6 +220,8 @@ FLAG_TRUNCATED_QUOTE = "truncated_quote"
|
||||
FLAG_THIN_RESTATEMENT = "thin_restatement"
|
||||
FLAG_QUOTE_UNVERIFIED = "quote_unverified"
|
||||
FLAG_NLI_UNSUPPORTED = "nli_unsupported" # rule not entailed by its quote (#81.3)
|
||||
FLAG_APPLICATION = "application" # fact-dependent, not a holding (#81.4)
|
||||
FLAG_NEAR_DUPLICATE = "near_duplicate" # cosine-tail lexical dup (#82.3)
|
||||
|
||||
|
||||
# ── NLI entailment check (supporting_quote ⊨ rule_statement) — #81.3 ──
|
||||
@@ -250,6 +337,7 @@ def compute_quality_flags(
|
||||
supporting_quote: str,
|
||||
reasoning_summary: str = "",
|
||||
quote_verified: bool = True,
|
||||
rule_type: str = "binding",
|
||||
) -> list[str]:
|
||||
"""Return the list of quality flags for one halacha (empty == clean).
|
||||
|
||||
@@ -264,4 +352,9 @@ def compute_quality_flags(
|
||||
flags.append(FLAG_THIN_RESTATEMENT)
|
||||
if not quote_verified:
|
||||
flags.append(FLAG_QUOTE_UNVERIFIED)
|
||||
# #81.4 — an application (fact-dependent) item is an illustration, not a
|
||||
# generalizable holding: never auto-approve it. Trust the model's
|
||||
# rule_type='application' and add a high-precision deixis catch.
|
||||
if rule_type == "application" or is_fact_dependent(rule_statement):
|
||||
flags.append(FLAG_APPLICATION)
|
||||
return flags
|
||||
|
||||
@@ -158,9 +158,14 @@ async def ingest_document(
|
||||
except Exception as e:
|
||||
await progress("failed", 100, f"כשל בחילוץ טקסט: {e}")
|
||||
raise
|
||||
raw_text = extractor.strip_nevo_preamble((raw_text or "")).strip()
|
||||
raw_text = (raw_text or "")
|
||||
else:
|
||||
raw_text = (text or "").strip()
|
||||
raw_text = (text or "")
|
||||
# Capture the Nevo מיני-רציו (editorial holdings summary) BEFORE stripping
|
||||
# it out — it is a free professional gold-set for benchmarking halacha
|
||||
# extraction (#86.3). Stored on the case_law row below once we have its id.
|
||||
nevo_ratio = extractor.extract_nevo_ratio(raw_text)
|
||||
raw_text = extractor.strip_nevo_preamble(raw_text).strip()
|
||||
if not raw_text:
|
||||
await progress("failed", 100, "לא נמצא טקסט בקובץ")
|
||||
raise ValueError("no extractable text in file")
|
||||
@@ -180,6 +185,13 @@ async def ingest_document(
|
||||
)
|
||||
case_law_id = UUID(str(record["id"]))
|
||||
|
||||
# Persist the captured mini-ratio (best-effort; never block ingest on it).
|
||||
if nevo_ratio:
|
||||
try:
|
||||
await db.update_case_law(case_law_id, nevo_ratio=nevo_ratio)
|
||||
except Exception as e: # noqa: BLE001 — additive metadata, non-fatal
|
||||
logger.warning("could not store nevo_ratio for %s: %s", case_law_id, e)
|
||||
|
||||
try:
|
||||
stored_chunks = await _chunk_embed_store(case_law_id, raw_text, page_offsets, page_count, progress)
|
||||
await db.mark_indexed(case_law_id)
|
||||
|
||||
@@ -117,12 +117,33 @@ async def halacha_backlog(conn) -> dict:
|
||||
oldest = await conn.fetchval(
|
||||
"SELECT MIN(created_at) FROM halachot WHERE review_status = 'pending_review'"
|
||||
)
|
||||
# #84.7 — split the pending bucket: how many are genuine candidates (clean)
|
||||
# vs flagged 'needs extraction fix', and the breakdown by flag, so the chair
|
||||
# sees how much of the backlog is real review vs extraction noise.
|
||||
pending_clean = await conn.fetchval(
|
||||
"SELECT COUNT(*) FROM halachot WHERE review_status = 'pending_review' "
|
||||
"AND COALESCE(array_length(quality_flags, 1), 0) = 0"
|
||||
)
|
||||
flag_rows = await conn.fetch(
|
||||
"SELECT flag, COUNT(*) AS n FROM ("
|
||||
" SELECT unnest(quality_flags) AS flag FROM halachot "
|
||||
" WHERE review_status = 'pending_review'"
|
||||
") t GROUP BY flag ORDER BY n DESC"
|
||||
)
|
||||
pending_total = counts.get("pending_review", 0)
|
||||
reviewed = counts.get("approved", 0) + counts.get("rejected", 0) + counts.get("published", 0)
|
||||
return {
|
||||
"pending_review": counts.get("pending_review", 0),
|
||||
"pending_review": pending_total,
|
||||
"pending_clean": pending_clean, # real review candidates (#84.1)
|
||||
"pending_flagged": pending_total - pending_clean, # needs-fix bucket
|
||||
"approved": counts.get("approved", 0),
|
||||
"rejected": counts.get("rejected", 0),
|
||||
"deferred": counts.get("deferred", 0),
|
||||
"published": counts.get("published", 0),
|
||||
"total": sum(counts.values()),
|
||||
"reviewed_total": reviewed,
|
||||
"approve_ratio": round(counts.get("approved", 0) / reviewed, 3) if reviewed else None,
|
||||
"pending_by_flag": {r["flag"]: r["n"] for r in flag_rows},
|
||||
"oldest_pending_at": oldest.isoformat() if oldest else None,
|
||||
}
|
||||
|
||||
|
||||
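The backlog split in `halacha_backlog` can be reproduced in plain Python to check the arithmetic without a database. This is a minimal sketch, not the project's code: `summarize_pending` and the sample rows are hypothetical, and it assumes each row is a dict with a `review_status` string and a `quality_flags` list (or `None`), as the SQL in this hunk suggests.

```python
from collections import Counter


def summarize_pending(rows: list[dict]) -> dict:
    """Split pending halachot into clean vs flagged and count rows per flag."""
    pending = [r for r in rows if r.get("review_status") == "pending_review"]
    # "Clean" means quality_flags is NULL or empty, matching
    # COALESCE(array_length(quality_flags, 1), 0) = 0 in the SQL.
    clean = [r for r in pending if not r.get("quality_flags")]
    # Equivalent of unnest(quality_flags) ... GROUP BY flag ORDER BY n DESC.
    by_flag = Counter(
        flag for r in pending for flag in (r.get("quality_flags") or [])
    )
    return {
        "pending_review": len(pending),
        "pending_clean": len(clean),
        "pending_flagged": len(pending) - len(clean),
        "pending_by_flag": dict(by_flag.most_common()),
    }


# Hypothetical sample data, for illustration only.
sample = [
    {"review_status": "pending_review", "quality_flags": []},
    {"review_status": "pending_review", "quality_flags": None},
    {"review_status": "pending_review", "quality_flags": ["application", "truncated"]},
    {"review_status": "pending_review", "quality_flags": ["application"]},
    {"review_status": "approved", "quality_flags": ["application"]},
]
summary = summarize_pending(sample)
```

Because one halacha can carry several flags, the values in `pending_by_flag` may sum to more than `pending_flagged`; the two numbers answer different questions (rows per flag vs. rows with any flag).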
@@ -104,7 +104,7 @@ CLAIMS_CHECK_PROMPT = """אתה בודק איכות החלטות משפטיות.
|
||||
"""
|
||||
|
||||
|
||||
- async def check_claims_coverage(blocks: list[dict], claims: list[dict]) -> dict:
+ async def check_claims_coverage(blocks: list[dict], claims: list[dict], outcome: str = "") -> dict:
"""Semantic check (via Claude) that every claim was addressed in the discussion."""
|
||||
yod = next((b for b in blocks if b["block_id"] == "block-yod"), None)
|
||||
if not yod or not yod.get("content"):
|
||||
@@ -114,16 +114,26 @@ async def check_claims_coverage(blocks: list[dict], claims: list[dict]) -> dict:
|
||||
if not claims:
|
||||
return {"name": "claims_coverage", "passed": True, "errors": [], "severity": "critical"}
|
||||
|
||||
# Filter: only APPELLANT claims from original pleadings.
|
||||
# Committee/permit_applicant claims are defensive positions, not claims
|
||||
# that need to be "addressed" in the discussion.
|
||||
# #87/GAP-87 — only the appellant's claims from the APPEAL PLEADING itself
|
||||
# must be addressed. claim_type: 'claim'=כתב ערר (mandatory), 'response'=כתב
|
||||
# תשובה, 'reply'=תגובה/השלמת-טיעון/תכתובת (supplementary correspondence — NOT
|
||||
# a standalone duty to answer, especially on full acceptance). Counting reply/
|
||||
# correspondence claims as "unanswered" produced false QA fails (1033-25).
|
||||
source_claims = [
|
||||
c for c in claims
|
||||
if c.get("source_document", "") != "block-zayin"
|
||||
and c.get("claim_type") == "claim"
|
||||
and c.get("party_role") == "appellant"
|
||||
]
|
||||
if not source_claims:
|
||||
# Fallback: appellant/respondent pleadings, excluding supplementary replies.
|
||||
source_claims = [
|
||||
c for c in claims
|
||||
if c.get("source_document", "") != "block-zayin"
|
||||
and c.get("claim_type") != "reply"
|
||||
and c.get("party_role") in ("appellant", "respondent")
|
||||
]
|
||||
if not source_claims:
|
||||
# Fallback: all non-block-zayin claims
|
||||
source_claims = [c for c in claims if c.get("source_document", "") != "block-zayin"]
|
||||
if not source_claims:
|
||||
source_claims = claims
|
||||
@@ -165,9 +175,14 @@ async def check_claims_coverage(blocks: list[dict], claims: list[dict]) -> dict:
|
||||
total = len(source_claims)
|
||||
covered = len(addressed) + len(partial)
|
||||
|
||||
# On full acceptance the appellant prevailed in full — not every sub-claim
|
||||
# needs individual treatment (the chair noted this for correspondence claims,
|
||||
# 1033-25). Relax the missing-tolerance accordingly.
|
||||
allowed_missing_ratio = 0.4 if outcome == "full_acceptance" else 0.2
|
||||
|
||||
return {
|
||||
"name": "claims_coverage",
|
- "passed": len(missing) <= total * 0.2, # Allow up to 20% missing
+ "passed": len(missing) <= total * allowed_missing_ratio,
|
||||
"errors": errors,
|
||||
"severity": "critical",
|
||||
"details": f"{covered}/{total} טענות נענו ({covered/total*100:.0f}%), {len(partial)} חלקית, {len(missing)} חסרות",
|
||||
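The effect of the relaxed tolerance is easiest to see in isolation. The sketch below pulls the pass condition out into a standalone function; `coverage_passes` is a hypothetical name used only for illustration, while the two ratios (0.4 and 0.2) and the `"full_acceptance"` value are taken from this hunk.

```python
def coverage_passes(missing: int, total: int, outcome: str = "") -> bool:
    """Return True when the number of unanswered claims is within tolerance."""
    # 40% tolerance on full acceptance, 20% for every other outcome.
    allowed_missing_ratio = 0.4 if outcome == "full_acceptance" else 0.2
    return missing <= total * allowed_missing_ratio


# With 10 source claims and 3 left unanswered:
default_result = coverage_passes(3, 10)                      # 3 <= 2.0
accepted_result = coverage_passes(3, 10, "full_acceptance")  # 3 <= 4.0
```

The same draft therefore fails the check under the default tolerance and passes once the decision's canonical outcome is full acceptance, which is the false-fail case the change targets.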
@@ -361,8 +376,10 @@ async def validate_decision(case_id: UUID) -> dict:
|
||||
# Get claims
|
||||
claims = await db.get_claims(case_id)
|
||||
|
||||
# Determine appeal type
|
||||
# Determine appeal type + outcome (outcome relaxes claims coverage on full acceptance — #87)
|
||||
appeal_type = case.get("appeal_type", "licensing")
|
||||
from legal_mcp.services.lessons import canonical_outcome
|
||||
outcome = canonical_outcome(decision.get("outcome", "") or "")
|
||||
|
||||
# Run all checks
|
||||
# Run sync checks
|
||||
@@ -370,7 +387,7 @@ async def validate_decision(case_id: UUID) -> dict:
|
||||
check_neutral_background(blocks),
|
||||
]
|
||||
# Async check: claims coverage with Claude
|
||||
- results.append(await check_claims_coverage(blocks, claims))
+ results.append(await check_claims_coverage(blocks, claims, outcome))
|
||||
# More sync checks
|
||||
results.extend([
|
||||
check_weight_compliance(blocks, appeal_type),
|
||||
|
||||
@@ -27,6 +27,62 @@ _BLOCK_TO_SECTION = {
|
||||
"block-yod-alef": "summary",
|
||||
}
|
||||
|
||||
# chunker section_type → golden-ratio section (for corpus measurement, T10)
|
||||
_CHUNK_SECTION_TO_GOLDEN = {
|
||||
"facts": "background", "intro": "background",
|
||||
"appellant_claims": "claims", "respondent_claims": "claims",
|
||||
"legal_analysis": "discussion",
|
||||
"conclusion": "summary", "ruling": "summary",
|
||||
}
|
||||
|
||||
_CORPUS_RATIOS_CACHE: dict | None = None
|
||||
|
||||
|
||||
async def measure_corpus_ratios() -> dict:
|
||||
"""Measure ACTUAL section %-of-total from Dafna's style_corpus, averaged per
|
||||
outcome — the empirical counterpart to lessons.GOLDEN_RATIOS (T10). Splits each
|
||||
decision via chunker (accurate, not the filtered exemplars). Cached for the
|
||||
process. Returns {outcome: {"n": int, "sections": {sec: pct}}}."""
|
||||
global _CORPUS_RATIOS_CACHE
|
||||
if _CORPUS_RATIOS_CACHE is not None:
|
||||
return _CORPUS_RATIOS_CACHE
|
||||
|
||||
from legal_mcp.services.chunker import _split_into_sections
|
||||
pool = await db.get_pool()
|
||||
async with pool.acquire() as conn:
|
||||
rows = await conn.fetch("SELECT full_text, outcome FROM style_corpus WHERE full_text <> ''")
|
||||
|
||||
# Per-outcome AND an "_all" aggregate. style_corpus.outcome is currently
|
||||
# unpopulated for the imported corpus, so per-outcome may be empty — "_all"
|
||||
# is the meaningful signal today, and per-outcome becomes live once outcomes
|
||||
# are backfilled. No silent loss: callers see which buckets have data via n.
|
||||
by_outcome: dict[str, list[dict]] = {}
|
||||
for r in rows:
|
||||
sect_words: dict[str, int] = {}
|
||||
for stype, stext in _split_into_sections(r["full_text"]):
|
||||
g = _CHUNK_SECTION_TO_GOLDEN.get(stype)
|
||||
if g:
|
||||
sect_words[g] = sect_words.get(g, 0) + len(stext.split())
|
||||
total = sum(sect_words.values())
|
||||
if total < 100: # sections didn't parse — skip
|
||||
continue
|
||||
pct = {s: w / total * 100 for s, w in sect_words.items()}
|
||||
by_outcome.setdefault("_all", []).append(pct)
|
||||
outcome = canonical_outcome(r["outcome"] or "")
|
||||
if outcome:
|
||||
by_outcome.setdefault(outcome, []).append(pct)
|
||||
|
||||
result: dict = {}
|
||||
for outcome, decs in by_outcome.items():
|
||||
avg = {}
|
||||
for sec in ("background", "claims", "discussion", "summary"):
|
||||
vals = [d.get(sec, 0.0) for d in decs]
|
||||
if vals:
|
||||
avg[sec] = round(sum(vals) / len(vals), 1)
|
||||
result[outcome] = {"n": len(decs), "sections": avg}
|
||||
_CORPUS_RATIOS_CACHE = result
|
||||
return result
|
||||
|
||||
|
||||
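The averaging step in `measure_corpus_ratios` has one subtlety: a section that is absent from a decision counts as 0% in the average rather than being skipped. The following sketch shows that behaviour without the database or the chunker. The helper names and the word counts are hypothetical; only the four section names and the one-decimal rounding come from the function in this hunk.

```python
SECTIONS = ("background", "claims", "discussion", "summary")


def section_percentages(word_counts: dict[str, int]) -> dict[str, float]:
    """Convert per-section word counts into percentages of the total."""
    total = sum(word_counts.values())
    return {sec: words / total * 100 for sec, words in word_counts.items()}


def average_percentages(decisions: list[dict[str, float]]) -> dict[str, float]:
    """Average each section across decisions; a missing section counts as 0."""
    return {
        sec: round(sum(d.get(sec, 0.0) for d in decisions) / len(decisions), 1)
        for sec in SECTIONS
    }


# Two made-up decisions: the first has no "claims" section at all.
decisions = [
    section_percentages({"background": 200, "discussion": 800}),
    section_percentages({"background": 400, "claims": 200, "discussion": 400}),
]
avg = average_percentages(decisions)
```

Here `claims` averages to 10.0 even though the only decision that has the section gives it 20%, so a low measured value can reflect sections the chunker failed to detect as well as genuinely short sections.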
def count_anti_patterns(text: str) -> dict:
|
||||
"""Count each anti-pattern occurrence in text. Lower = closer to Dafna."""
|
||||
|
||||
@@ -170,6 +170,41 @@ async def get_style_guide() -> str:
|
||||
)
|
||||
result += "\n"
|
||||
|
||||
# T10 — measured-from-corpus ratios alongside the targets, ⚠️ flags a gap
|
||||
# (actual average outside the target range → revisit the target or the corpus).
|
||||
try:
|
||||
from legal_mcp.services.style_distance import measure_corpus_ratios
|
||||
measured = await measure_corpus_ratios()
|
||||
if measured:
|
||||
result += "### נמדד מהקורפוס בפועל (ממוצע) — ⚠️ = פער מהיעד\n\n"
|
||||
result += "| קבוצה | רקע | טענות | דיון | סיכום |\n|---|------|-------|------|-------|\n"
|
||||
# Per-outcome rows (flagged vs that outcome's target), when outcomes exist.
|
||||
for outcome in VALID_OUTCOMES:
|
||||
m = measured.get(outcome)
|
||||
if not m:
|
||||
continue
|
||||
tgt = GOLDEN_RATIOS[outcome]
|
||||
cells = []
|
||||
for sec in ("background", "claims", "discussion", "summary"):
|
||||
val = m["sections"].get(sec)
|
||||
if val is None:
|
||||
cells.append("—")
|
||||
continue
|
||||
lo, hi = tgt[sec]
|
||||
cells.append(f"{val}%" + ("" if lo <= val <= hi else " ⚠️"))
|
||||
result += f"| {outcome_labels[outcome]} (n={m['n']}) | " + " | ".join(cells) + " |\n"
|
||||
# "_all" aggregate — the meaningful row today (corpus outcome unpopulated);
|
||||
# shown informationally (no single target to flag against).
|
||||
allm = measured.get("_all")
|
||||
if allm:
|
||||
cells = [f"{allm['sections'].get(s, '—')}%" if allm['sections'].get(s) is not None else "—"
|
||||
for s in ("background", "claims", "discussion", "summary")]
|
||||
result += f"| כל ההחלטות (n={allm['n']}) | " + " | ".join(cells) + " |\n"
|
||||
result += ("\n_⚠️ = הממוצע בפועל חורג מטווח-היעד; שקול לעדכן יעד ב-/methodology או לבדוק את הקורפוס. "
|
||||
"פיצול לפי-תוצאה יופיע כש-`style_corpus.outcome` יאוכלס._\n\n")
|
||||
except Exception as e: # surfaced, not swallowed
|
||||
result += f"_מדידת יחסי-זהב מהקורפוס נכשלה: {e}_\n\n"
|
||||
|
||||
# Opening and summary strategies
|
||||
result += "## אסטרטגיות פתיחה וסיכום לפי תוצאה\n\n"
|
||||
for outcome in VALID_OUTCOMES:
|
||||
|
||||
@@ -356,7 +356,22 @@ async def halacha_review(
|
||||
return _ok(row)
|
||||
|
||||
|
||||
- async def halachot_pending(limit: int = 100) -> str:
-     """Queue of halachot awaiting approval (review_status='pending_review')."""
-     rows = await db.list_halachot(review_status="pending_review", limit=limit)
+ async def halachot_pending(limit: int = 100, include_low_quality: bool = False) -> str:
+     """Queue of halachot awaiting approval (review_status='pending_review').
+
+     By default (#84.1, #84.3) the queue is **filtered**: halachot carrying any
+     quality flag (application / unverified quote / truncated / obiter / thin
+     restatement / unsupported / near-duplicate) are hidden (they belong to the
+     'needs extraction fix' bucket, not the approval queue), and it is **sorted
+     by priority** (negatively treated first, then the least certain, then the
+     oldest).
+
+     Args:
+         limit: maximum number of rows.
+         include_low_quality: True to also expose quality-flagged items (the 'needs fix' bucket).
+     """
+     rows = await db.list_halachot(
+         review_status="pending_review",
+         limit=limit,
+         exclude_low_quality=not include_low_quality,
+         order_by_priority=True,
+     )
|
||||
return _ok(rows)
|
||||
|
||||
44
mcp-server/tests/test_claude_session.py
Normal file
@@ -0,0 +1,44 @@
|
||||
from __future__ import annotations
|
||||
|
||||
import os
|
||||
|
||||
from legal_mcp.services import claude_session as cs
|
||||
|
||||
|
||||
def test_clean_env_strips_session_markers(monkeypatch):
|
||||
"""Nested claude -p must not inherit the parent session markers (#85)."""
|
||||
for k in (
|
||||
"CLAUDECODE",
|
||||
"CLAUDE_CODE_ENTRYPOINT",
|
||||
"CLAUDE_CODE_SESSION_ID",
|
||||
"CLAUDE_CODE_EXECPATH",
|
||||
"CLAUDE_CODE_SSE_PORT",
|
||||
"CLAUDE_AGENT_SDK_VERSION",
|
||||
"AI_AGENT",
|
||||
"CLAUDE_EFFORT",
|
||||
):
|
||||
monkeypatch.setenv(k, "x")
|
||||
|
||||
env = cs._clean_subprocess_env()
|
||||
|
||||
assert "CLAUDECODE" not in env
|
||||
assert "AI_AGENT" not in env
|
||||
assert "CLAUDE_EFFORT" not in env
|
||||
assert not any(k.startswith("CLAUDE_CODE_") for k in env)
|
||||
assert not any(k.startswith("CLAUDE_AGENT_") for k in env)
|
||||
|
||||
|
||||
def test_clean_env_keeps_auth_and_path(monkeypatch):
|
||||
"""Auth/config + PATH/HOME must survive — they are needed by the CLI."""
|
||||
monkeypatch.setenv("CLAUDECODE", "1")
|
||||
monkeypatch.setenv("CLAUDE_CONFIG_DIR", "/home/chaim/.claude")
|
||||
monkeypatch.setenv("ANTHROPIC_BASE_URL", "https://example")
|
||||
monkeypatch.setenv("PATH", os.environ.get("PATH", "/usr/bin"))
|
||||
|
||||
env = cs._clean_subprocess_env()
|
||||
|
||||
# CLAUDE_CONFIG_DIR carries credentials — must NOT be stripped.
|
||||
assert env.get("CLAUDE_CONFIG_DIR") == "/home/chaim/.claude"
|
||||
assert env.get("ANTHROPIC_BASE_URL") == "https://example"
|
||||
assert "PATH" in env
|
||||
assert "CLAUDECODE" not in env
|
||||
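The two tests pin down the contract of `_clean_subprocess_env` without showing its body. One way to satisfy them is sketched below. This is an assumption about the shape of the implementation, not the code in `claude_session`: the function name `clean_subprocess_env`, the `base` parameter, and the exact strip lists are hypothetical and derived only from the variable names the tests use.

```python
from __future__ import annotations

import os
from collections.abc import Mapping

# Marker names taken from the tests; the real module may use other rules.
_STRIP_EXACT = {"CLAUDECODE", "AI_AGENT", "CLAUDE_EFFORT"}
_STRIP_PREFIXES = ("CLAUDE_CODE_", "CLAUDE_AGENT_")


def clean_subprocess_env(base: Mapping[str, str] | None = None) -> dict[str, str]:
    """Copy the environment without the parent-session markers."""
    source = os.environ if base is None else base
    return {
        key: value
        for key, value in source.items()
        # CLAUDE_CONFIG_DIR matches neither prefix, so credentials survive.
        if key not in _STRIP_EXACT and not key.startswith(_STRIP_PREFIXES)
    }


env = clean_subprocess_env({
    "CLAUDECODE": "1",
    "CLAUDE_CODE_SESSION_ID": "abc",
    "CLAUDE_AGENT_SDK_VERSION": "x",
    "CLAUDE_CONFIG_DIR": "/home/user/.claude",
    "PATH": "/usr/bin",
})
```

Matching on the full prefixes `CLAUDE_CODE_` and `CLAUDE_AGENT_`, rather than on `CLAUDE_` alone, is what keeps `CLAUDE_CONFIG_DIR` in place, which the second test requires.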
@@ -181,3 +181,75 @@ def test_consolidation_priority_prefers_approved_then_confidence():
|
||||
"quote_verified": True, "rule_statement": "x"}
|
||||
# approved sorts before higher-confidence pending → kept as canonical
|
||||
assert min([approved, pending_hi], key=he._consolidation_priority)["id"] == "a"
|
||||
|
||||
|
||||
# ── #81.4 fact-dependent / application ──
|
||||
|
||||
@pytest.mark.parametrize("rule", [
|
||||
"במקרה דנן ועדת הערר קבעה כי ההיתר בטל",
|
||||
"בענייננו אין הצדקה לפיצוי",
|
||||
"בערר שלפנינו הוכח כי השומה שגויה",
|
||||
])
|
||||
def test_is_fact_dependent_hits(rule):
|
||||
assert hq.is_fact_dependent(rule) is True
|
||||
|
||||
|
||||
@pytest.mark.parametrize("rule", [
|
||||
"ועדת הערר מוסמכת לדון בהיטל השבחה",
|
||||
"נטל ההוכחה מוטל על המבקש",
|
||||
"פגיעה תכנונית מזכה בפיצוי לפי סעיף 197",
|
||||
])
|
||||
def test_is_fact_dependent_misses(rule):
|
||||
assert hq.is_fact_dependent(rule) is False
|
||||
|
||||
|
||||
def test_application_flag_from_rule_type():
|
||||
flags = hq.compute_quality_flags(
|
||||
"נטל ההוכחה על המבקש", "נטל ההוכחה על המבקש כאמור",
|
||||
rule_type="application",
|
||||
)
|
||||
assert hq.FLAG_APPLICATION in flags
|
||||
|
||||
|
||||
def test_application_flag_from_deixis_even_if_binding():
|
||||
flags = hq.compute_quality_flags(
|
||||
"במקרה דנן נדחה הערר", "כפי שקבענו במקרה דנן נדחה הערר",
|
||||
rule_type="binding",
|
||||
)
|
||||
assert hq.FLAG_APPLICATION in flags
|
||||
|
||||
|
||||
def test_clean_binding_rule_has_no_flags():
|
||||
flags = hq.compute_quality_flags(
|
||||
"ועדת הערר מוסמכת לדון בטענות חוקתיות הנוגעות לתכנית",
|
||||
"הוועדה מוסמכת לדון אף בטענות מסוג זה, ככל שהן נוגעות לתכנית שבנדון.",
|
||||
rule_type="binding",
|
||||
)
|
||||
assert flags == []
|
||||
|
||||
|
||||
# ── #82.3 lexical near-duplicate signal ──
|
||||
|
||||
def test_jaccard_high_for_reworded_same_rule():
|
||||
a = "נטל ההוכחה בהיטל השבחה מוטל על הוועדה המקומית"
|
||||
b = "נטל ההוכחה בהיטל השבחה מוטל על הוועדה המקומית בלבד"
|
||||
assert hq.jaccard_shingles(a, b) >= 0.5
|
||||
|
||||
|
||||
def test_jaccard_low_for_distinct_rules():
|
||||
a = "ועדת הערר מוסמכת לדון בהיטל השבחה"
|
||||
b = "המועד להגשת ערר הוא שלושים יום"
|
||||
assert hq.jaccard_shingles(a, b) < 0.2
|
||||
|
||||
|
||||
def test_normalized_levenshtein_identical_and_disjoint():
|
||||
assert hq.normalized_levenshtein("אבג", "אבג") == 1.0
|
||||
assert hq.normalized_levenshtein("", "אבג") == 0.0
|
||||
|
||||
|
||||
def test_lexical_near_duplicate_band():
|
||||
a = "נטל ההוכחה בהיטל השבחה מוטל על הוועדה המקומית"
|
||||
b = "נטל ההוכחה בהיטל השבחה מוטל על הוועדה המקומית, כך נפסק"
|
||||
assert hq.lexical_near_duplicate(a, b) is True
|
||||
c = "המועד להגשת ערר על שומה הוא שלושים ימים"
|
||||
assert hq.lexical_near_duplicate(a, c) is False
|
||||
|
||||
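For readers unfamiliar with the two lexical signals these tests exercise, here is a minimal sketch of each. It is not the code in `halacha_quality`: the real `jaccard_shingles` and `normalized_levenshtein` are not shown in this diff, so the use of word bigrams as shingles, the function names, and the English sample sentences are all assumptions made for illustration.

```python
from __future__ import annotations


def word_shingles(text: str, k: int = 2) -> set[tuple[str, ...]]:
    """Return the set of k-word shingles; a short text yields one shingle."""
    words = text.split()
    if len(words) < k:
        return {tuple(words)} if words else set()
    return {tuple(words[i:i + k]) for i in range(len(words) - k + 1)}


def jaccard(a: str, b: str, k: int = 2) -> float:
    """Jaccard similarity |A & B| / |A | B| over word shingles."""
    sa, sb = word_shingles(a, k), word_shingles(b, k)
    if not sa and not sb:
        return 1.0
    return len(sa & sb) / len(sa | sb)


def levenshtein_similarity(a: str, b: str) -> float:
    """1 - edit_distance / max_len, so identical strings score 1.0."""
    if not a and not b:
        return 1.0
    prev = list(range(len(b) + 1))  # distances from the empty prefix of a
    for i, ca in enumerate(a, 1):
        cur = [i]
        for j, cb in enumerate(b, 1):
            cur.append(min(prev[j] + 1,                 # deletion
                           cur[j - 1] + 1,              # insertion
                           prev[j - 1] + (ca != cb)))   # substitution
        prev = cur
    return 1.0 - prev[-1] / max(len(a), len(b))


rule = "the burden of proof lies with the applicant"
reworded = rule + " alone"
other = "appeals must be filed within thirty days"
```

With word bigrams, `rule` and `reworded` share 7 of 8 shingles, while `rule` and `other` share none, which mirrors the high/low split the tests assert on the Hebrew fixtures.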
@@ -55,3 +55,64 @@ def test_markers_past_400_chars_still_detected():
|
||||
text = header + _PREAMBLE + "השופטת ע' ארבל:\n\nגוף ההחלטה..."
|
||||
out = ex.strip_nevo_preamble(text)
|
||||
assert out.startswith("השופטת ע' ארבל:")
|
||||
|
||||
|
||||
# ── extract_nevo_ratio (#86.3 gold-set capture) ──
|
||||
|
||||
def test_extract_ratio_returns_block_before_body():
|
||||
text = _PREAMBLE + "השופט ס' ג'ובראן:\n\nגוף ההחלטה..."
|
||||
ratio = ex.extract_nevo_ratio(text)
|
||||
assert "העותרים לא הוכיחו טעם מיוחד" in ratio
|
||||
assert "המחוקק הגביל את הזמן" in ratio
|
||||
# must not bleed into the judgment body
|
||||
assert "גוף ההחלטה" not in ratio
|
||||
assert "השופט ס' ג'ובראן" not in ratio
|
||||
|
||||
|
||||
def test_extract_ratio_stops_at_following_marker():
|
||||
# ratio first, then a bibliography marker AFTER it
|
||||
text = (
|
||||
"מיני-רציו:\n* עיקרון אחד בלבד.\n\n"
|
||||
"פסקי דין שאוזכרו:\nבג\"ץ 1/00\n\n"
|
||||
"פסק-דין\nגוף..."
|
||||
)
|
||||
ratio = ex.extract_nevo_ratio(text)
|
||||
assert "עיקרון אחד בלבד" in ratio
|
||||
assert "פסקי דין שאוזכרו" not in ratio
|
||||
assert "בג\"ץ 1/00" not in ratio
|
||||
|
||||
|
||||
def test_extract_ratio_empty_when_no_marker():
|
||||
assert ex.extract_nevo_ratio("פסק דין\nהשופט כהן: ...") == ""
|
||||
assert ex.extract_nevo_ratio("") == ""
|
||||
|
||||
|
||||
# ── #86.2 over-strip regressions ──
|
||||
|
||||
def test_citation_judge_line_is_not_a_decision_start():
|
||||
# "השופט מ' חשין, פסקה 23" is a CITATION (comma, no colon) — must NOT be
|
||||
# treated as the decision opening, or 32K of real body gets stripped.
|
||||
body = (
|
||||
"**פסק דין**\n\n"
|
||||
"שני ערעורים לפניי. כפי שנפסק מפי כבוד \n\n"
|
||||
"השופט מ' חשין, פסקה 23 (להלן עניין קהתי), יש לבחון...\n"
|
||||
)
|
||||
text = _PREAMBLE + body
|
||||
out = ex.strip_nevo_preamble(text)
|
||||
assert out.startswith("**פסק דין**")
|
||||
assert "השופט מ' חשין, פסקה" in out # citation kept inside body
|
||||
assert "מיני-רציו" not in out
|
||||
|
||||
|
||||
def test_markdown_wrapped_pdin_header_is_stripped():
|
||||
text = _PREAMBLE + "**פסק דין**\n\nשני ערעוריה הנדונים..."
|
||||
out = ex.strip_nevo_preamble(text)
|
||||
assert out.startswith("**פסק דין**")
|
||||
assert "מיני-רציו" not in out
|
||||
|
||||
|
||||
def test_author_line_with_colon_still_strips():
|
||||
text = _PREAMBLE + "כב' השופטת ד' ברק-ארז:\n\nגוף ההחלטה..."
|
||||
out = ex.strip_nevo_preamble(text)
|
||||
assert out.startswith("כב' השופטת ד' ברק-ארז:")
|
||||
assert "מיני-רציו" not in out
|
||||
|
||||
@@ -36,6 +36,11 @@
|
||||
| `multimodal_backfill.py` | python | Backfills voyage-multimodal-3 page embeddings for the documents of existing cases. Idempotent (skips by default), forces `MULTIMODAL_ENABLED=true` for the run, runs from the container. Stage C; see `docs/voyage-upgrades-plan.md` | Manual, per case (`python multimodal_backfill.py 8174-24 8137-24`) |
| `backfill_chunk_pages.py` | python | Backfills `page_number` on existing `document_chunks`. The legacy chunker did not track pages, so `page_number=NULL` blocks the multimodal hybrid boost (text+image join on the same page). Re-extracts every PDF (re-OCR if needed, ~$0.0015/page), computes page_offsets, and updates the chunks. Idempotent | Manual, per case (`python backfill_chunk_pages.py 8174-24 8137-24`) |
| `rechunk_legacy_precedents.py` | python | **#57**: re-chunks and re-embeds case law ingested before the chunker fix (#55). Selects every `case_law` row with a tiny chunk (`length(trim(content))<50`, the fingerprint of the old chunker) and runs `ingest.reindex_case_law` (re-chunk + re-embed from the stored `full_text` only, with no re-OCR/LLM, feedback_no_reocr_retrofit; idempotent DELETE-then-INSERT). Idempotent at the batch level (re-queries the affected set on every run). Flag: `--limit N`. Runs with the mcp-server venv (`cd mcp-server && .venv/bin/python ../scripts/rechunk_legacy_precedents.py`) | One-off: data migration of legacy case law (fixed 2026-06-03) |
| `backfill_nevo_preamble.py` | python | **#86.2**: data migration that strips the Nevo preamble/ratio that leaked into case law ingested before the #86.1 fix. Finds every `case_law` row that `strip_nevo_preamble(full_text)` would still shorten (a historical leak) and then: (1) captures the mini-ratio (מיני-רציו) into `case_law.nevo_ratio` (gold set for #86.3); (2) rewrites `full_text` to the stripped text and recomputes `content_hash`; (3) runs `reindex_case_law` (re-chunk + embed, no re-OCR/LLM); (4) **flags (never deletes)** halachot whose `supporting_quote` lies inside the removed preamble → `pending_review` + quality_flag `nevo_preamble_leak`. **Safety guard:** rows with keep% < `--min-keep` (default 60) are excluded from `--apply` as suspected over-strips (unless `--include-suspicious` is passed). **Dry-run by default**; `--apply` first writes a JSON backup + CSV manifest to `data/audit/`. Idempotent. Runs with the mcp-server venv. **Chair-gated** (verify the manifest before applying) | Data migration: dry-run done (19 rulings, 27 contaminated halachot); apply awaits approval |
| `nevo_ratio_benchmark.py` | python | **#86.3**: measures halacha-extraction quality against Nevo's mini-ratio (a free, professionally written gold set). For every ruling with a `nevo_ratio` (or one derived from `full_text` if the backfill has not run yet), a local LLM judge (`claude_session`, zero cost) semantically maps the system's halachot onto Nevo's and reports **recall** (coverage of Nevo's halachot), **precision** (share of our halachot that map), and **granularity** (decomposition ratio, an over-extraction signal for #81.5). `--case <num>` / `--all [--limit N]` / `--model` / `--out`. Writes a CSV to `data/audit/`. Runs with the mcp-server venv (requires a local Claude CLI). Verified on בג"ץ 1764/05: recall 0.875, precision 1.0, granularity 1.75x | Manual: quality measurement (CI/ad-hoc) |
| `halacha_goldset.py` | python | **#81.7**: gold-set harness for halacha-extraction quality. `export --n N` exports a stratified sample (by precedent×rule_type) to a CSV with empty labeling columns (`is_holding`/`correct_type`/`quote_complete`) for manual labeling (Chaim/Dafna). `score --in <csv>` reads the labeled CSV and scores each validator (`compute_quality_flags`/`is_fact_dependent`/`is_quote_truncated`/`is_thin_restatement`) against the human ground truth: P/R/F1 + confusion. Basis for #81.8 (calibrating the approval threshold). Imports the same validators the extractor runs. Runs with the mcp-server venv | Manual: export → label → score |
| `halacha_batch_reconcile.py` | python | **#82.7**: offline cross-ruling dedup (conservative, **dry-run only**). Dedup-on-insert compares only within a single ruling; this script uses a stricter threshold (cosine ≥0.95, `--cosine`) and is non-destructive: it finds near-duplicate halacha pairs across different rulings (pgvector `<=>`, exact) with a lexical signal (Jaccard/Levenshtein) and reports them to a CSV in `data/audit/` for the chair's review. It never skips, merges, or deletes. `--include-pending`. Runs with the mcp-server venv. Verified: 819 halachot → 5 candidate pairs | Manual: review report |
| `calibrate_halacha_dedup.py` | python | **#82.1**: calibrates the lexical dedup thresholds (#82.3) against the cleanup gold set. Reads `halacha-cleanup-manifest-*.csv` (human-labeled duplicate↔survivor pairs), loads the survivor text from the DB, and sweeps (jaccard_min × levenshtein_min) with P/R/F1, marking the configured operating point. Confirmed that (0.55, 0.70) gives **precision 1.0** (zero false merges) and recall 0.30, which suits a secondary signal that blocks auto-approve. `--manifest <path>`. Runs with the mcp-server venv | One-off: calibration (done 2026-06-06) |
| `audit_corpus_integrity.py` | python | Periodic corpus-consistency check: 3 read-only SQL checks on `case_law` and `cases`: (A) `external_upload` rows with an internal prefix `ערר`/`בל"מ`; (B) `internal_committee` rows missing `chair_name`/`district`; (C) `cases.practice_area` outside {`rishuy_uvniya`, `betterment_levy`, `compensation_197`, `''`}. Appends to a cumulative log at `data/logs/corpus_integrity_audit.log` and, when violations are found, sends a wakeup to the CEO in Paperclip (best-effort, only if `PAPERCLIP_API_URL`+`PAPERCLIP_API_KEY` are set). Flag: `--no-notify`. Idempotent, exits 0. **Daily cron at 07:00**: `0 7 * * * /home/chaim/legal-ai/mcp-server/.venv/bin/python /home/chaim/legal-ai/scripts/audit_corpus_integrity.py` | `0 7 * * *` (cron) |
| `backfill_legal_arguments.py` | python | Backfills `legal_arguments` for cases with existing `claims` (TaskMaster #36). Groups raw propositions into distinct legal arguments (~6-12 per side) via `argument_aggregator.aggregate_claims_to_arguments` (Claude CLI). Supports `--dry-run`/`--apply`/`--force`/`--case <num>...`. **Must run on the local machine** (not the container), because `claude_session` requires the Claude CLI | Manual, per case (`python scripts/backfill_legal_arguments.py --apply --case 1017-03-26`) |
| `upload_blam_decisions.py` | python | One-off (2026-05-26): uploaded 2 בל"מ decisions to `case_law` (8126/24 סופר נוח, 8047/23 הרנון) by calling `ingest_internal_decision` directly, bypassing the MCP server that had not yet been reloaded after `proceeding_type` was added. **Do not run again** | One-off: move to `.archive/` when convenient |
|
||||
|
||||
240
scripts/backfill_nevo_preamble.py
Normal file
@@ -0,0 +1,240 @@
|
||||
#!/usr/bin/env python3
|
||||
"""#86.2 — backfill: strip leaked Nevo preamble/ratio from already-ingested rulings.
|
||||
|
||||
Court rulings ingested BEFORE the #86.1 fix kept their Nevo preamble
|
||||
(bibliography + מיני-רציו) because the old ``_DECISION_START`` regex only
|
||||
matched ועדת-ערר openings, not ``פסק-דין``/judge openings. For those rows the
|
||||
preamble is baked into the stored ``full_text`` AND into the chunks — and the
|
||||
מיני-רציו (Nevo's editorial answer-key) may have leaked into extracted
|
||||
halachot, contaminating the corpus.
|
||||
|
||||
This script finds every case_law row whose stored ``full_text`` would still be
|
||||
shortened by the CURRENT ``strip_nevo_preamble`` (i.e. a pre-fix leak), and:
|
||||
|
||||
1. captures the מיני-רציו into ``case_law.nevo_ratio`` (gold-set for #86.3),
|
||||
unless that column is already populated;
|
||||
2. rewrites ``full_text`` to the stripped body + recomputes ``content_hash``;
|
||||
3. re-chunks + re-embeds via ``ingest.reindex_case_law`` (no re-OCR, no LLM);
|
||||
4. flags — never deletes — halachot whose supporting_quote lives entirely in
|
||||
the removed preamble region: review_status -> 'pending_review' plus a
|
||||
'nevo_preamble_leak' quality_flag, so the chair can re-judge them (#84).
|
||||
|
||||
DRY-RUN BY DEFAULT. ``--apply`` performs the migration and first writes a JSON
|
||||
backup + CSV manifest to ``data/audit/`` (per the code-protocol data-migration
|
||||
rule). Idempotent: a re-run finds nothing because stripped rows no longer match.
|
||||
|
||||
Run with the MCP server venv (config loads ~/.env / Infisical for POSTGRES +
|
||||
VOYAGE, same as the live MCP tools):
|
||||
|
||||
cd ~/legal-ai/mcp-server
|
||||
.venv/bin/python ../scripts/backfill_nevo_preamble.py # dry-run
|
||||
.venv/bin/python ../scripts/backfill_nevo_preamble.py --apply # migrate
|
||||
.venv/bin/python ../scripts/backfill_nevo_preamble.py --limit 3 # smoke
|
||||
"""
|
||||
from __future__ import annotations
|
||||
|
||||
import argparse
|
||||
import asyncio
|
||||
import csv
|
||||
import json
|
||||
import sys
|
||||
from datetime import datetime, timezone
|
||||
from pathlib import Path
|
||||
|
||||
from legal_mcp.services import db, ingest
|
||||
from legal_mcp.services.extractor import extract_nevo_ratio, strip_nevo_preamble
|
||||
from legal_mcp.services.halacha_quality import normalize_text
|
||||
|
||||
REPO_ROOT = Path(__file__).resolve().parent.parent
|
||||
AUDIT_DIR = REPO_ROOT / "data" / "audit"
|
||||
|
||||
# Safety: a clean strip removes only the Nevo preamble (a small head). If the
|
||||
# strip would discard more than this fraction of the document, treat it as a
|
||||
# suspected over-strip (a citation/heading false-match) and DO NOT auto-apply
|
||||
# — surface it for manual review instead. Destroying real decision body is
|
||||
# far worse than leaving a preamble in place.
|
||||
DEFAULT_MIN_KEEP_PCT = 60
|
||||
|
||||
|
||||
async def _scan(conn, limit: int | None) -> list[dict]:
|
||||
"""Return rows whose stored full_text still carries a Nevo preamble."""
|
||||
rows = await conn.fetch(
|
||||
"SELECT id, case_number, full_text, nevo_ratio "
|
||||
"FROM case_law WHERE full_text <> '' ORDER BY case_number"
|
||||
)
|
||||
hits: list[dict] = []
|
||||
for r in rows:
|
||||
full = r["full_text"] or ""
|
||||
stripped = strip_nevo_preamble(full)
|
||||
if stripped == full:
|
||||
continue # no leak (already clean, or never had a preamble)
|
||||
removed = full[: len(full) - len(stripped)]
|
||||
ratio = extract_nevo_ratio(full)
|
||||
keep_pct = round(100 * len(stripped) / len(full)) if full else 0
|
||||
hits.append({
|
||||
"id": r["id"],
|
||||
"case_number": r["case_number"],
|
||||
"full_text": full,
|
||||
"stripped": stripped,
|
||||
"removed": removed,
|
||||
"ratio": ratio,
|
||||
"keep_pct": keep_pct,
|
||||
"had_ratio_stored": bool((r["nevo_ratio"] or "").strip()),
|
||||
})
|
||||
if limit and len(hits) >= limit:
|
||||
break
|
||||
return hits
|
||||
|
||||
|
||||
async def _contaminated_halachot(conn, case_law_id, removed: str) -> list[dict]:
|
||||
"""Halachot whose supporting_quote sits entirely inside the removed preamble."""
|
||||
norm_removed = normalize_text(removed)
|
||||
if not norm_removed:
|
||||
return []
|
||||
rows = await conn.fetch(
|
||||
"SELECT id, halacha_index, supporting_quote, review_status, quality_flags "
|
||||
"FROM halachot WHERE case_law_id = $1",
|
||||
case_law_id,
|
||||
)
|
||||
bad = []
|
||||
for r in rows:
|
||||
q = normalize_text(r["supporting_quote"] or "")
|
||||
if len(q) >= 20 and q in norm_removed:
|
||||
bad.append(dict(r))
|
||||
return bad
|
||||
|
||||
|
||||
async def main(args: argparse.Namespace) -> int:
|
||||
ts = datetime.now(timezone.utc).strftime("%Y%m%dT%H%M%SZ")
|
||||
pool = await db.get_pool()
|
||||
async with pool.acquire() as conn:
|
||||
hits = await _scan(conn, args.limit)
|
||||
for h in hits:
|
||||
h["contaminated"] = await _contaminated_halachot(conn, h["id"], h["removed"])
|
||||
|
||||
# Partition into safe (auto-appliable) vs suspicious (manual review).
|
||||
for h in hits:
|
||||
h["suspicious"] = h["keep_pct"] < args.min_keep
|
||||
safe = [h for h in hits if not h["suspicious"]]
|
||||
suspicious = [h for h in hits if h["suspicious"]]
|
||||
|
||||
n = len(hits)
|
||||
total_contam = sum(len(h["contaminated"]) for h in hits)
|
||||
print(f"leaked rulings found: {n} (contaminated halachot: {total_contam}; "
|
||||
f"safe: {len(safe)}, suspicious<{args.min_keep}%: {len(suspicious)})", flush=True)
|
||||
for h in hits:
|
||||
print(
|
||||
f" {'⚠ ' if h['suspicious'] else ' '}{h['case_number']}: "
|
||||
f"keep {h['keep_pct']}%, -{len(h['removed']):,} preamble chars, "
|
||||
f"ratio={len(h['ratio'])} chars, "
|
||||
f"{len(h['contaminated'])} contaminated halachot"
|
||||
+ ("" if h["ratio"] else " [no mini-ratio]")
|
||||
+ (" [ratio already stored]" if h["had_ratio_stored"] else ""),
|
||||
flush=True,
|
||||
)
|
||||
if suspicious:
|
||||
print(f"\n⚠ {len(suspicious)} ruling(s) below {args.min_keep}% keep — "
|
||||
"EXCLUDED from --apply (suspected over-strip). Review manually or "
|
||||
"pass --include-suspicious to force.", flush=True)
|
||||
|
||||
if not hits:
|
||||
print("nothing to backfill — corpus clean ✓", flush=True)
|
||||
return 0
|
||||
|
||||
apply_set = hits if args.include_suspicious else safe
|
||||
|
||||
# Always write a manifest (dry-run included) for the audit trail.
|
||||
AUDIT_DIR.mkdir(parents=True, exist_ok=True)
|
||||
manifest = AUDIT_DIR / f"nevo-backfill-manifest-{ts}.csv"
|
||||
with manifest.open("w", encoding="utf-8", newline="") as f:
|
||||
w = csv.writer(f)
|
||||
w.writerow(["case_law_id", "case_number", "keep_pct", "preamble_chars",
|
||||
"ratio_chars", "contaminated_halachot", "suspicious", "applied"])
|
||||
for h in hits:
|
||||
will_apply = args.apply and (not h["suspicious"] or args.include_suspicious)
|
||||
w.writerow([h["id"], h["case_number"], h["keep_pct"], len(h["removed"]),
|
||||
len(h["ratio"]), len(h["contaminated"]), h["suspicious"], will_apply])
|
||||
print(f"manifest: {manifest}", flush=True)
|
||||
|
||||
if not args.apply:
|
||||
print("\nDRY-RUN — no changes written. Re-run with --apply to migrate.", flush=True)
|
||||
return 0
|
||||
|
||||
# Backup the BEFORE state before mutating anything.
|
||||
backup = AUDIT_DIR / f"nevo-backfill-backup-{ts}.json"
|
||||
with backup.open("w", encoding="utf-8") as f:
|
||||
json.dump([
|
||||
{
|
||||
"id": str(h["id"]),
|
||||
"case_number": h["case_number"],
|
||||
"full_text": h["full_text"],
|
||||
"ratio": h["ratio"],
|
||||
"contaminated": [
|
||||
{"id": str(c["id"]), "halacha_index": c["halacha_index"],
|
||||
"review_status": c["review_status"],
|
||||
"quality_flags": list(c["quality_flags"] or [])}
|
||||
for c in h["contaminated"]
|
||||
],
|
||||
}
|
||||
for h in apply_set
|
||||
], f, ensure_ascii=False, indent=2)
|
||||
print(f"backup: {backup}", flush=True)
|
||||
|
||||
n_apply = len(apply_set)
|
||||
ok, failed = 0, []
|
||||
for i, h in enumerate(apply_set, 1):
|
||||
cid, cn = h["id"], h["case_number"]
|
||||
try:
|
||||
async with pool.acquire() as conn:
|
||||
async with conn.transaction():
|
||||
# 1+2: rewrite full_text + content_hash; store ratio if absent.
|
||||
await conn.execute(
|
||||
"UPDATE case_law SET full_text = $2, content_hash = $3 WHERE id = $1",
|
||||
cid, h["stripped"], db._content_hash(h["stripped"]),
|
||||
)
|
||||
if h["ratio"] and not h["had_ratio_stored"]:
|
||||
await conn.execute(
|
||||
"UPDATE case_law SET nevo_ratio = $2 WHERE id = $1",
|
||||
cid, h["ratio"],
|
||||
)
|
||||
# 4: flag (never delete) contaminated halachot.
|
||||
for c in h["contaminated"]:
|
||||
flags = list(c["quality_flags"] or [])
|
||||
if "nevo_preamble_leak" not in flags:
|
||||
flags.append("nevo_preamble_leak")
|
||||
await conn.execute(
|
||||
"UPDATE halachot SET review_status = 'pending_review', "
|
||||
"quality_flags = $2 WHERE id = $1",
|
||||
c["id"], flags,
|
||||
)
|
||||
# 3: reindex outside the txn (its own DELETE-then-INSERT + embeddings).
|
||||
res = await ingest.reindex_case_law(cid)
|
||||
ok += 1
|
||||
print(f"[{i}/{n_apply}] OK {cn}: -> {res['chunks']} chunks, "
|
||||
f"{len(h['contaminated'])} halachot flagged", flush=True)
|
||||
except Exception as e: # noqa: BLE001 — per-row, keep going
|
||||
failed.append((cn, str(e)))
|
||||
print(f"[{i}/{n_apply}] FAIL {cn}: {e}", flush=True)
|
||||
|
||||
print(f"\nDONE — {ok}/{n_apply} migrated, {len(failed)} failed"
|
||||
+ (f", {len(suspicious)} suspicious skipped" if suspicious and not args.include_suspicious else ""),
|
||||
flush=True)
|
||||
for cn, e in failed:
|
||||
print(f" FAILED {cn}: {e}", flush=True)
|
||||
return 0 if not failed else 1
|
||||
|
||||
|
||||
if __name__ == "__main__":
|
||||
ap = argparse.ArgumentParser(description=__doc__,
|
||||
formatter_class=argparse.RawDescriptionHelpFormatter)
|
||||
ap.add_argument("--apply", action="store_true",
|
||||
help="perform the migration (default: dry-run)")
|
||||
ap.add_argument("--limit", type=int, default=None,
|
||||
help="process only the first N leaked rulings")
|
||||
ap.add_argument("--min-keep", type=int, default=DEFAULT_MIN_KEEP_PCT,
|
||||
help=f"min%% of doc that must remain after strip to auto-apply "
|
||||
f"(default {DEFAULT_MIN_KEEP_PCT}); lower = suspected over-strip")
|
||||
ap.add_argument("--include-suspicious", action="store_true",
|
||||
help="force --apply on rows below --min-keep (use with care)")
|
||||
args = ap.parse_args()
|
||||
sys.exit(asyncio.run(main(args)))
|
||||
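The apply gating used by the backfill script above can be sketched in isolation. This is a minimal illustration, assuming each hit is a dict with a boolean `suspicious` key as in the manifest loop; the helper names `will_apply` and `select_apply_set` are hypothetical and do not exist in the script.

```python
from __future__ import annotations


def will_apply(suspicious: bool, apply: bool, include_suspicious: bool) -> bool:
    """Value of the manifest's `applied` column for one row (hypothetical helper).

    A row is migrated only when --apply is set and the row is either safe
    or explicitly forced with --include-suspicious.
    """
    return apply and (not suspicious or include_suspicious)


def select_apply_set(hits: list[dict], include_suspicious: bool) -> list[dict]:
    """Rows an --apply run would touch: all hits when forced, else only safe ones."""
    return [h for h in hits if include_suspicious or not h["suspicious"]]


# Illustrative data only, not taken from a real run.
hits = [
    {"case_number": "A", "suspicious": False},
    {"case_number": "B", "suspicious": True},
]
for h in hits:
    print(h["case_number"], will_apply(h["suspicious"], apply=True, include_suspicious=False))
```

Keeping the gate as a pure function makes it easy to check that a dry-run manifest and a real run agree on which rows are affected.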
115
scripts/calibrate_halacha_dedup.py
Normal file
@@ -0,0 +1,115 @@
|
||||
#!/usr/bin/env python3
|
||||
"""#82.1: calibrate the lexical dedup thresholds against the cleanup gold-set.

The 2026-06-03 cleanup manifest (data/audit/halacha-cleanup-manifest-*.csv)
records, for each removed halacha, a ``reason`` and a ``survivor_id``, i.e. a
human-labeled set of TRUE duplicate pairs (deleted rule ↔ its survivor). This
script uses them to validate the lexical near-duplicate thresholds introduced
in #82.3 (``HALACHA`` Jaccard/Levenshtein), so the numbers in
``halacha_quality.lexical_near_duplicate`` are calibrated, not guessed.

It sweeps (jaccard_min × levenshtein_min) and reports precision/recall against:
  * positives: duplicate-labeled pairs (deleted rule ↔ survivor rule)
  * negatives: non-paired rules from the same manifest, picked by a
    deterministic spread rather than an RNG (≈all distinct)

and marks the currently-configured operating point.

    cd ~/legal-ai/mcp-server
    .venv/bin/python ../scripts/calibrate_halacha_dedup.py \
        --manifest ../data/audit/halacha-cleanup-manifest-20260603T101747Z.csv
"""
|
||||
from __future__ import annotations
|
||||
|
||||
import argparse
|
||||
import asyncio
|
||||
import csv
|
||||
import sys
|
||||
from pathlib import Path
|
||||
from uuid import UUID
|
||||
|
||||
from legal_mcp.services import db, halacha_quality as hq
|
||||
|
||||
|
||||
async def _survivor_text(survivor_id: str, manifest_map: dict) -> str:
|
||||
if survivor_id in manifest_map:
|
||||
return manifest_map[survivor_id]
|
||||
try:
|
||||
row = await db.get_halacha(UUID(survivor_id)) if hasattr(db, "get_halacha") else None
|
||||
except Exception:
|
||||
row = None
|
||||
if row:
|
||||
return row.get("rule_statement", "")
|
||||
# fallback: direct query
|
||||
try:
|
||||
pool = await db.get_pool()
|
||||
r = await pool.fetchrow("SELECT rule_statement FROM halachot WHERE id = $1", UUID(survivor_id))
|
||||
return r["rule_statement"] if r else ""
|
||||
except Exception:
|
||||
return ""
|
||||
|
||||
|
||||
async def main(args: argparse.Namespace) -> int:
|
||||
path = Path(args.manifest)
|
||||
if not path.is_absolute():
|
||||
path = (Path.cwd() / path).resolve()
|
||||
with path.open(encoding="utf-8") as f:
|
||||
rows = list(csv.DictReader(f))
|
||||
by_id = {r["id"]: r.get("rule_statement", "") for r in rows}
|
||||
|
||||
positives: list[tuple[str, str]] = []
|
||||
for r in rows:
|
||||
if "duplicate" in (r.get("reason") or "").lower() and r.get("survivor_id"):
|
||||
a = r.get("rule_statement", "")
|
||||
b = await _survivor_text(r["survivor_id"], by_id)
|
||||
if a and b:
|
||||
positives.append((a, b))
|
||||
|
||||
# negatives: pair each rule with another rule at a deterministic offset; the pair is assumed (not verified against survivor_id) to be a non-duplicate.
|
||||
rules = [r.get("rule_statement", "") for r in rows if r.get("rule_statement")]
|
||||
negatives: list[tuple[str, str]] = []
|
||||
for i in range(len(positives)):
|
||||
a = rules[i % len(rules)]
|
||||
b = rules[(i * 7 + 3) % len(rules)] # deterministic spread, no RNG
|
||||
if a and b and a != b:
|
||||
negatives.append((a, b))
|
||||
|
||||
print(f"positives (labeled dup pairs): {len(positives)} "
|
||||
f"negatives: {len(negatives)}", flush=True)
|
||||
if not positives:
|
||||
print("no labeled duplicate pairs found in manifest — cannot calibrate", flush=True)
|
||||
return 1
|
||||
|
||||
# precompute lexical scores per pair
|
||||
def scores(pairs):
|
||||
return [(hq.jaccard_shingles(a, b), hq.normalized_levenshtein(a, b)) for a, b in pairs]
|
||||
pos_s, neg_s = scores(positives), scores(negatives)
|
||||
|
||||
print(f"\n{'jac_min':>8}{'lev_min':>8}{'P':>8}{'R':>8}{'F1':>8}", flush=True)
|
||||
best = None
|
||||
for jm in (0.40, 0.45, 0.50, 0.55, 0.60, 0.65, 0.70):
|
||||
for lm in (0.60, 0.65, 0.70, 0.75, 0.80, 0.85):
|
||||
tp = sum(1 for j, lev in pos_s if j >= jm or lev >= lm)
fp = sum(1 for j, lev in neg_s if j >= jm or lev >= lm)
|
||||
fn = len(pos_s) - tp
|
||||
p = tp / (tp + fp) if (tp + fp) else 0.0
|
||||
r = tp / (tp + fn) if (tp + fn) else 0.0
|
||||
f1 = 2 * p * r / (p + r) if (p + r) else 0.0
|
||||
mark = " <- configured" if (abs(jm - hq._LEX_JACCARD_MIN) < 1e-9
|
||||
and abs(lm - hq._LEX_LEVENSHTEIN_MIN) < 1e-9) else ""
|
||||
if mark:
|
||||
print(f"{jm:>8.2f}{lm:>8.2f}{p:>8.3f}{r:>8.3f}{f1:>8.3f}{mark}", flush=True)
|
||||
if best is None or f1 > best[0]:
|
||||
best = (f1, jm, lm, p, r)
|
||||
print(f"\nbest F1={best[0]:.3f} at jaccard_min={best[1]}, levenshtein_min={best[2]} "
|
||||
f"(P={best[3]:.3f}, R={best[4]:.3f})", flush=True)
|
||||
print("note: positives may include obiter/application cuts (not pure dups); "
|
||||
"use precision as the guard against false-merges.", flush=True)
|
||||
return 0
|
||||
|
||||
|
||||
if __name__ == "__main__":
|
||||
ap = argparse.ArgumentParser(description=__doc__,
|
||||
formatter_class=argparse.RawDescriptionHelpFormatter)
|
||||
ap.add_argument("--manifest", required=True, help="path to halacha-cleanup-manifest-*.csv")
|
||||
args = ap.parse_args()
|
||||
sys.exit(asyncio.run(main(args)))
|
||||
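The threshold sweep in the calibration script above reduces to the following self-contained sketch. It assumes each pair has already been scored as a `(jaccard, levenshtein_similarity)` tuple; the scores below are made-up illustrative values, and `prf` / `evaluate` are hypothetical names rather than functions from `halacha_quality`.

```python
from __future__ import annotations

Score = tuple[float, float]  # (jaccard, normalized Levenshtein similarity)


def prf(tp: int, fp: int, fn: int) -> tuple[float, float, float]:
    """Precision, recall and F1, with 0.0 wherever a denominator is empty."""
    p = tp / (tp + fp) if (tp + fp) else 0.0
    r = tp / (tp + fn) if (tp + fn) else 0.0
    f1 = 2 * p * r / (p + r) if (p + r) else 0.0
    return p, r, f1


def evaluate(pos: list[Score], neg: list[Score],
             jac_min: float, lev_min: float) -> tuple[float, float, float]:
    """A pair is predicted 'duplicate' when EITHER score clears its threshold."""
    def is_dup(s: Score) -> bool:
        return s[0] >= jac_min or s[1] >= lev_min

    tp = sum(1 for s in pos if is_dup(s))
    fp = sum(1 for s in neg if is_dup(s))
    fn = len(pos) - tp
    return prf(tp, fp, fn)


# Illustrative scores only.
pos_scores = [(0.80, 0.90), (0.30, 0.72), (0.20, 0.40)]
neg_scores = [(0.10, 0.30), (0.45, 0.50)]

for jac_min in (0.40, 0.50):
    p, r, f1 = evaluate(pos_scores, neg_scores, jac_min, 0.70)
    print(f"jac_min={jac_min:.2f} lev_min=0.70 P={p:.3f} R={r:.3f} F1={f1:.3f}")
```

Because the rule is an OR over the two thresholds, lowering either threshold can only add predicted duplicates, so recall never decreases while precision may drop.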
106
scripts/halacha_batch_reconcile.py
Normal file
@@ -0,0 +1,106 @@
|
||||
#!/usr/bin/env python3
|
||||
"""#82.7: offline CROSS-precedent halacha dedup (conservative, dry-run reporter).

Dedup-on-insert (db.store_halachot_for_chunk) only compares within a single
precedent: the 2026-06-03 audit showed cosine ≥0.90 is reliable only
within-precedent. Across precedents the same principle legitimately recurs, so
this batch job is deliberately STRICTER (cosine ≥0.95) and NON-DESTRUCTIVE: it
only reports candidate cross-precedent near-duplicate pairs to a CSV for the
chair to review. Nothing is skipped, merged, or deleted.

Pairs are found with pgvector's cosine-distance operator (``<=>``): each
halacha is compared with its single nearest neighbor among halachot in OTHER
precedents. A secondary lexical check (Jaccard/Levenshtein) is reported
alongside so the reviewer can tell "same rule" from "same topic".

    cd ~/legal-ai/mcp-server
    .venv/bin/python ../scripts/halacha_batch_reconcile.py                 # cosine ≥0.95
    .venv/bin/python ../scripts/halacha_batch_reconcile.py --cosine 0.97
"""
|
||||
from __future__ import annotations
|
||||
|
||||
import argparse
|
||||
import asyncio
|
||||
import csv
|
||||
import sys
|
||||
from datetime import datetime, timezone
|
||||
from pathlib import Path
|
||||
|
||||
from legal_mcp.services import db, halacha_quality as hq
|
||||
|
||||
REPO_ROOT = Path(__file__).resolve().parent.parent
|
||||
AUDIT_DIR = REPO_ROOT / "data" / "audit"
|
||||
|
||||
|
||||
async def main(args: argparse.Namespace) -> int:
|
||||
cosine = args.cosine
|
||||
max_dist = 1.0 - cosine
|
||||
statuses = ("approved", "published") if not args.include_pending else (
|
||||
"approved", "published", "pending_review")
|
||||
|
||||
pool = await db.get_pool()
|
||||
async with pool.acquire() as conn:
|
||||
rows = await conn.fetch(
|
||||
"SELECT h.id, h.case_law_id, cl.case_number, h.rule_statement "
|
||||
"FROM halachot h JOIN case_law cl ON cl.id = h.case_law_id "
|
||||
"WHERE h.embedding IS NOT NULL AND h.review_status = ANY($1::text[]) "
|
||||
"ORDER BY h.case_law_id, h.halacha_index",
|
||||
list(statuses),
|
||||
)
|
||||
print(f"scanning {len(rows)} halachot for cross-precedent pairs "
|
||||
f"(cosine ≥ {cosine})...", flush=True)
|
||||
|
||||
seen: set[frozenset] = set()
|
||||
pairs: list[dict] = []
|
||||
for r in rows:
|
||||
# nearest neighbor in a DIFFERENT precedent
|
||||
nb = await conn.fetchrow(
|
||||
"SELECT h2.id, cl2.case_number, h2.rule_statement, "
|
||||
" (h2.embedding <=> (SELECT embedding FROM halachot WHERE id = $1)) AS dist "
|
||||
"FROM halachot h2 JOIN case_law cl2 ON cl2.id = h2.case_law_id "
|
||||
"WHERE h2.embedding IS NOT NULL AND h2.case_law_id <> $2 "
|
||||
" AND h2.review_status = ANY($3::text[]) "
|
||||
"ORDER BY h2.embedding <=> (SELECT embedding FROM halachot WHERE id = $1) "
|
||||
"LIMIT 1",
|
||||
r["id"], r["case_law_id"], list(statuses),
|
||||
)
|
||||
if nb is None or float(nb["dist"]) > max_dist:
|
||||
continue
|
||||
key = frozenset({str(r["id"]), str(nb["id"])})
|
||||
if key in seen:
|
||||
continue
|
||||
seen.add(key)
|
||||
pairs.append({
|
||||
"case_a": r["case_number"], "id_a": r["id"], "rule_a": r["rule_statement"],
|
||||
"case_b": nb["case_number"], "id_b": nb["id"], "rule_b": nb["rule_statement"],
|
||||
"cosine": round(1.0 - float(nb["dist"]), 4),
|
||||
"jaccard": round(hq.jaccard_shingles(r["rule_statement"], nb["rule_statement"]), 3),
|
||||
"levenshtein": round(hq.normalized_levenshtein(r["rule_statement"], nb["rule_statement"]), 3),
|
||||
})
|
||||
|
||||
pairs.sort(key=lambda p: -p["cosine"])
|
||||
print(f"found {len(pairs)} cross-precedent candidate pair(s)", flush=True)
|
||||
for p in pairs[:30]:
|
||||
print(f" cos={p['cosine']} jac={p['jaccard']} lev={p['levenshtein']} "
|
||||
f"{p['case_a']} ↔ {p['case_b']}: {p['rule_a'][:60]}...", flush=True)
|
||||
|
||||
if pairs:
|
||||
ts = datetime.now(timezone.utc).strftime("%Y%m%dT%H%M%SZ")
|
||||
AUDIT_DIR.mkdir(parents=True, exist_ok=True)
|
||||
out = AUDIT_DIR / f"halacha-cross-precedent-{ts}.csv"
|
||||
with out.open("w", encoding="utf-8", newline="") as f:
|
||||
w = csv.DictWriter(f, fieldnames=list(pairs[0].keys()))
|
||||
w.writeheader()
|
||||
w.writerows(pairs)
|
||||
print(f"\nreport: {out} (review-only — nothing changed)", flush=True)
|
||||
return 0
|
||||
|
||||
|
||||
if __name__ == "__main__":
|
||||
ap = argparse.ArgumentParser(description=__doc__,
|
||||
formatter_class=argparse.RawDescriptionHelpFormatter)
|
||||
ap.add_argument("--cosine", type=float, default=0.95,
|
||||
help="min cosine for a cross-precedent candidate (default 0.95)")
|
||||
ap.add_argument("--include-pending", action="store_true",
|
||||
help="also scan pending_review halachot (default: approved/published only)")
|
||||
args = ap.parse_args()
|
||||
sys.exit(asyncio.run(main(args)))
|
||||
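The pair bookkeeping in the reconcile script above (cosine threshold converted to a distance cutoff, plus unordered de-duplication of pairs) can be sketched without a database. The candidate tuples below are invented for illustration, and `unique_pairs` is a hypothetical helper, not part of the script.

```python
from __future__ import annotations


def unique_pairs(candidates: list[tuple[str, str, float]],
                 cosine_min: float) -> list[tuple[str, str, float]]:
    """Keep each unordered (id_a, id_b) pair once, if it clears the threshold.

    `candidates` holds (id, nearest_neighbor_id, cosine_distance) tuples;
    cosine distance is 1 - cosine similarity, so the cutoff is 1 - cosine_min.
    """
    max_dist = 1.0 - cosine_min
    seen: set[frozenset[str]] = set()
    out: list[tuple[str, str, float]] = []
    for id_a, id_b, dist in candidates:
        if dist > max_dist:
            continue  # not similar enough to report
        key = frozenset({id_a, id_b})  # order-insensitive: (a, b) == (b, a)
        if key in seen:
            continue
        seen.add(key)
        out.append((id_a, id_b, round(1.0 - dist, 4)))
    # Most similar pairs first, as in the report.
    return sorted(out, key=lambda p: -p[2])


# Illustrative data: h1 and h9 are each other's nearest neighbor.
candidates = [("h1", "h9", 0.03), ("h9", "h1", 0.03), ("h2", "h7", 0.08)]
print(unique_pairs(candidates, cosine_min=0.95))
```

Using a `frozenset` as the key means the pair is reported once even when both members pick each other as nearest neighbor.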
149
scripts/halacha_goldset.py
Normal file
@@ -0,0 +1,149 @@
|
||||
#!/usr/bin/env python3
|
||||
"""#81.7: gold-set harness for halacha-extraction quality.

Two modes; the human tagging in between is the only manual step:

  export  dump a stratified sample of halachot to a CSV with EMPTY label
          columns for חיים/דפנה to fill (is_holding, correct_type,
          quote_complete). Stratified across precedents and rule_types so
          the set isn't dominated by one ruling.

  score   read the tagged CSV back and measure each pure validator
          (compute_quality_flags / is_fact_dependent / is_quote_truncated /
          is_thin_restatement) against the human labels: precision, recall,
          F1 per validator + a confusion summary. This is the ground-truth
          #81.8 needs to recalibrate the auto-approve threshold.

The validators here are the SAME ones the live extractor runs, imported
directly, so the score reflects production behavior, not a reimplementation.

    cd ~/legal-ai/mcp-server
    .venv/bin/python ../scripts/halacha_goldset.py export --n 150
    # ... חיים/דפנה fill is_holding / correct_type / quote_complete ...
    .venv/bin/python ../scripts/halacha_goldset.py score --in ../data/audit/halacha-goldset-<ts>.csv
"""
|
||||
from __future__ import annotations
|
||||
|
||||
import argparse
|
||||
import asyncio
|
||||
import csv
|
||||
import sys
|
||||
from collections import defaultdict
|
||||
from datetime import datetime, timezone
|
||||
from pathlib import Path
|
||||
|
||||
from legal_mcp.services import db, halacha_quality as hq
|
||||
|
||||
REPO_ROOT = Path(__file__).resolve().parent.parent
|
||||
AUDIT_DIR = REPO_ROOT / "data" / "audit"
|
||||
|
||||
# Columns the human fills. is_holding: 1 if a real generalizable holding, 0 if
|
||||
# obiter/application/fact-recitation/non-rule. correct_type: binding/interpretive/
|
||||
# obiter/application. quote_complete: 1 if the quote is a whole, untruncated span.
|
||||
LABEL_COLS = ["is_holding", "correct_type", "quote_complete"]
|
||||
EXPORT_COLS = [
|
||||
"id", "case_number", "halacha_index", "rule_type", "review_status",
|
||||
"confidence", "rule_statement", "supporting_quote", *LABEL_COLS,
|
||||
]
|
||||
|
||||
|
||||
async def _export(n: int) -> int:
|
||||
rows = await db.list_halachot(limit=5000)
|
||||
# stratify: round-robin across (case_law_id, rule_type) buckets.
|
||||
buckets: dict = defaultdict(list)
|
||||
for r in rows:
|
||||
buckets[(r["case_law_id"], r.get("rule_type"))].append(r)
|
||||
sample: list[dict] = []
|
||||
keys = list(buckets.values())
|
||||
i = 0
|
||||
while len(sample) < n and any(keys):
|
||||
b = keys[i % len(keys)]
|
||||
if b:
|
||||
sample.append(b.pop())
|
||||
i += 1
|
||||
if i > n * 50:
|
||||
break
|
||||
ts = datetime.now(timezone.utc).strftime("%Y%m%dT%H%M%SZ")
|
||||
AUDIT_DIR.mkdir(parents=True, exist_ok=True)
|
||||
out = AUDIT_DIR / f"halacha-goldset-{ts}.csv"
|
||||
with out.open("w", encoding="utf-8", newline="") as f:
|
||||
w = csv.DictWriter(f, fieldnames=EXPORT_COLS, extrasaction="ignore")
|
||||
w.writeheader()
|
||||
for r in sample:
|
||||
w.writerow({**{k: r.get(k, "") for k in EXPORT_COLS},
|
||||
**{lc: "" for lc in LABEL_COLS}})
|
||||
print(f"exported {len(sample)} halachot for tagging → {out}", flush=True)
|
||||
print(f"fill columns: {', '.join(LABEL_COLS)} (is_holding/quote_complete = 1/0)", flush=True)
|
||||
return 0
|
||||
|
||||
|
||||
def _prf(tp: int, fp: int, fn: int) -> tuple[float, float, float]:
|
||||
p = tp / (tp + fp) if (tp + fp) else 0.0
|
||||
r = tp / (tp + fn) if (tp + fn) else 0.0
|
||||
f1 = 2 * p * r / (p + r) if (p + r) else 0.0
|
||||
return round(p, 3), round(r, 3), round(f1, 3)
|
||||
|
||||
|
||||
def _score(path: Path) -> int:
|
||||
with path.open(encoding="utf-8") as f:
|
||||
rows = [r for r in csv.DictReader(f) if (r.get("is_holding") or "").strip() != ""]
|
||||
if not rows:
|
||||
print("no labeled rows (is_holding empty everywhere) — nothing to score", flush=True)
|
||||
return 1
|
||||
|
||||
# A validator FLAG is a prediction of "NOT a clean holding" (should be
|
||||
# rejected/reviewed). Ground truth NOT-holding = is_holding == 0.
|
||||
# We score each validator as a detector of not-holding.
|
||||
counters: dict[str, dict[str, int]] = defaultdict(lambda: {"tp": 0, "fp": 0, "fn": 0, "tn": 0})
|
||||
|
||||
def tally(name: str, predicted_bad: bool, truly_bad: bool):
|
||||
c = counters[name]
|
||||
if predicted_bad and truly_bad:
|
||||
c["tp"] += 1
|
||||
elif predicted_bad and not truly_bad:
|
||||
c["fp"] += 1
|
||||
elif not predicted_bad and truly_bad:
|
||||
c["fn"] += 1
|
||||
else:
|
||||
c["tn"] += 1
|
||||
|
||||
for r in rows:
|
||||
rule = r.get("rule_statement", "")
|
||||
quote = r.get("supporting_quote", "")
|
||||
rtype = r.get("rule_type", "binding")
|
||||
quote_complete = (r.get("quote_complete") or "1").strip() not in ("0", "false", "")
|
||||
truly_not_holding = (r.get("is_holding") or "").strip() in ("0", "false")
|
||||
|
||||
flags = hq.compute_quality_flags(rule, quote, "", quote_complete, rtype)
|
||||
tally("any_flag", bool(flags), truly_not_holding)
|
||||
tally("application", hq.FLAG_APPLICATION in flags, truly_not_holding)
|
||||
tally("non_decision", hq.FLAG_NON_DECISION in flags, truly_not_holding)
|
||||
tally("thin_restatement", hq.FLAG_THIN_RESTATEMENT in flags, truly_not_holding)
|
||||
# quote-truncation scored against quote_complete label specifically
|
||||
tally("truncated_quote", hq.is_quote_truncated(quote), not quote_complete)
|
||||
|
||||
print(f"scored {len(rows)} labeled halachot\n", flush=True)
|
||||
print(f"{'validator':<18}{'P':>7}{'R':>7}{'F1':>7} tp/fp/fn/tn", flush=True)
|
||||
for name, c in counters.items():
|
||||
p, rec, f1 = _prf(c["tp"], c["fp"], c["fn"])
|
||||
print(f"{name:<18}{p:>7}{rec:>7}{f1:>7} "
|
||||
f"{c['tp']}/{c['fp']}/{c['fn']}/{c['tn']}", flush=True)
|
||||
return 0
|
||||
|
||||
|
||||
async def main(args: argparse.Namespace) -> int:
|
||||
if args.mode == "export":
|
||||
return await _export(args.n)
|
||||
return _score(Path(args.infile))
|
||||
|
||||
|
||||
if __name__ == "__main__":
|
||||
ap = argparse.ArgumentParser(description=__doc__,
|
||||
formatter_class=argparse.RawDescriptionHelpFormatter)
|
||||
sub = ap.add_subparsers(dest="mode", required=True)
|
||||
pe = sub.add_parser("export", help="dump a sample CSV for human tagging")
|
||||
pe.add_argument("--n", type=int, default=150, help="sample size (default 150)")
|
||||
ps = sub.add_parser("score", help="measure validators against a tagged CSV")
|
||||
ps.add_argument("--in", dest="infile", required=True, help="tagged CSV path")
|
||||
args = ap.parse_args()
|
||||
sys.exit(asyncio.run(main(args)))
|
||||
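The round-robin stratification used by the export mode above can be sketched as a pure function. This is a simplified variant under stated assumptions: rows are plain dicts with `case_law_id` and `rule_type` keys, and the loop terminates by draining buckets rather than by the iteration cap used in the script. `stratified_sample` is a hypothetical name.

```python
from __future__ import annotations

from collections import defaultdict


def stratified_sample(rows: list[dict], n: int) -> list[dict]:
    """Take up to n rows, cycling over (case_law_id, rule_type) buckets.

    Each pass takes at most one row per bucket, so no single ruling or
    rule type dominates the sample while other buckets still have rows.
    """
    buckets: dict[tuple, list[dict]] = defaultdict(list)
    for r in rows:
        buckets[(r["case_law_id"], r.get("rule_type"))].append(r)

    queues = list(buckets.values())
    sample: list[dict] = []
    while len(sample) < n and any(queues):
        for q in queues:
            if q and len(sample) < n:
                sample.append(q.pop())
    return sample


# Illustrative data: one large bucket and two small ones.
rows = (
    [{"case_law_id": 1, "rule_type": "binding", "halacha_index": i} for i in range(5)]
    + [{"case_law_id": 2, "rule_type": "obiter", "halacha_index": 0}]
    + [{"case_law_id": 3, "rule_type": "binding", "halacha_index": 0}]
)
print([r["case_law_id"] for r in stratified_sample(rows, 3)])
```

With a sample size of 3 and three buckets, the first pass draws exactly one row from each bucket; the large bucket only contributes more once the others are exhausted.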
173
scripts/nevo_ratio_benchmark.py
Normal file
@@ -0,0 +1,173 @@
|
||||
#!/usr/bin/env python3
|
||||
"""#86.3: benchmark halacha-extraction quality against Nevo's מיני-רציו (mini-ratio) gold-set.

Nevo's editorial מיני-רציו is a free, professionally-written list of a ruling's
holdings. By comparing the halachot WE extracted against it we get an honest,
zero-cost measurement of extraction quality per ruling:

  * recall: fraction of Nevo's holdings that our halachot cover
  * precision: fraction of our halachot that map to a Nevo holding
  * granularity: our_count / nevo_holding_count (over-decomposition signal,
    the #81.5 concern: e.g. 14 ours vs 4 Nevo = 3.5x)

The gold-truth ratio is read from ``case_law.nevo_ratio`` (populated by
``backfill_nevo_preamble.py`` / ingest). For rulings not yet backfilled it
falls back to extracting the ratio on-the-fly from the stored ``full_text``,
so the harness works before and after the migration.

An LLM-as-judge (local ``claude_session``, zero API cost) does the semantic
mapping, because string overlap can't tell "same holding, different words"
from a genuinely new holding. The judge is asked to count, not to rewrite.

Run with the MCP server venv (needs the local ``claude`` CLI):

    cd ~/legal-ai/mcp-server
    .venv/bin/python ../scripts/nevo_ratio_benchmark.py --case 'בג"ץ 1764/05'
    .venv/bin/python ../scripts/nevo_ratio_benchmark.py --all --limit 5
    .venv/bin/python ../scripts/nevo_ratio_benchmark.py --all            # full corpus
"""
|
||||
from __future__ import annotations
|
||||
|
||||
import argparse
|
||||
import asyncio
|
||||
import csv
|
||||
import json
|
||||
import sys
|
||||
from datetime import datetime, timezone
|
||||
from pathlib import Path
|
||||
|
||||
from legal_mcp.services import claude_session, db
|
||||
from legal_mcp.services.extractor import extract_nevo_ratio
|
||||
|
||||
REPO_ROOT = Path(__file__).resolve().parent.parent
|
||||
AUDIT_DIR = REPO_ROOT / "data" / "audit"
|
||||
|
||||
_JUDGE_SYSTEM = (
|
||||
"אתה בוחן-איכות משפטי. נתונים לך (א) רשימת ההלכות (מיני-רציו) שכתב עורך נבו "
|
||||
"עבור פסק-דין — אמת-המידה; (ב) רשימת ההלכות שמערכת אוטומטית חילצה מאותו "
|
||||
"פסק-דין. משימתך: למפות סמנטית בין השתיים (אותו עיקרון משפטי בניסוח שונה = "
|
||||
"התאמה), ולספור. החזר JSON בלבד, ללא טקסט נוסף."
|
||||
)
|
||||
|
||||
|
||||
def _judge_prompt(ratio: str, ours: list[str]) -> str:
|
||||
ours_block = "\n".join(f"{i}. {s}" for i, s in enumerate(ours, 1)) or "(אין)"
|
||||
return (
|
||||
f"מיני-רציו של נבו (אמת-מידה):\n{ratio}\n\n"
|
||||
f"ההלכות שחולצו על-ידי המערכת ({len(ours)}):\n{ours_block}\n\n"
|
||||
"החזר JSON עם המפתחות:\n"
|
||||
'{"nevo_holdings": <מספר העקרונות הנפרדים במיני-רציו>,\n'
|
||||
' "covered": <כמה מעקרונות נבו מכוסים ע"י לפחות הלכה אחת שלנו>,\n'
|
||||
' "ours_total": <מספר ההלכות שלנו>,\n'
|
||||
' "ours_mapped": <כמה מההלכות שלנו ממופות לעיקרון נבו כלשהו>,\n'
|
||||
' "notes": "<עד 2 משפטים: מה הוחמץ / מה עודף>"}'
|
||||
)
|
||||
|
||||
|
||||
async def _bench_one(row: dict, model: str | None) -> dict:
|
||||
cn = row["case_number"]
|
||||
ratio = (row.get("nevo_ratio") or "").strip() or extract_nevo_ratio(row.get("full_text") or "")
|
||||
result = {"case_number": cn, "nevo_holdings": 0, "covered": 0,
|
||||
"ours_total": 0, "ours_mapped": 0, "recall": None,
|
||||
"precision": None, "granularity": None, "notes": "", "error": ""}
|
||||
if not ratio:
|
||||
result["error"] = "no mini-ratio"
|
||||
return result
|
||||
|
||||
halachot = await db.list_halachot(case_law_id=row["id"], limit=500)
|
||||
ours = [h["rule_statement"] for h in halachot
|
||||
if h.get("review_status") in ("approved", "published", "pending_review")
|
||||
and (h.get("rule_statement") or "").strip()]
|
||||
result["ours_total"] = len(ours)
|
||||
if not ours:
|
||||
result["error"] = "no extracted halachot"
|
||||
return result
|
||||
|
||||
try:
|
||||
verdict = await claude_session.query_json(
|
||||
_judge_prompt(ratio, ours), system=_JUDGE_SYSTEM, model=model, effort="low",
|
||||
)
|
||||
except Exception as e: # noqa: BLE001
|
||||
result["error"] = f"judge failed: {e}"
|
||||
return result
|
||||
if not isinstance(verdict, dict):
|
||||
result["error"] = "judge returned non-dict"
|
||||
return result
|
||||
|
||||
nh = int(verdict.get("nevo_holdings") or 0)
|
||||
cov = int(verdict.get("covered") or 0)
|
||||
ot = int(verdict.get("ours_total") or len(ours))
|
||||
om = int(verdict.get("ours_mapped") or 0)
|
||||
result.update({
|
||||
"nevo_holdings": nh, "covered": cov, "ours_total": ot, "ours_mapped": om,
|
||||
"recall": round(cov / nh, 3) if nh else None,
|
||||
"precision": round(om / ot, 3) if ot else None,
|
||||
"granularity": round(ot / nh, 2) if nh else None,
|
||||
"notes": str(verdict.get("notes") or "")[:300],
|
||||
})
|
||||
return result
|
||||
|
||||
|
||||
async def main(args: argparse.Namespace) -> int:
|
||||
pool = await db.get_pool()
|
||||
async with pool.acquire() as conn:
|
||||
if args.case:
|
||||
rows = await conn.fetch(
|
||||
"SELECT id, case_number, nevo_ratio, full_text FROM case_law "
|
||||
"WHERE case_number = $1", args.case,
|
||||
)
|
||||
else:
|
||||
# rulings that have (or can derive) a ratio
|
||||
rows = await conn.fetch(
|
||||
"SELECT id, case_number, nevo_ratio, full_text FROM case_law "
|
||||
"WHERE nevo_ratio <> '' OR full_text LIKE '%מיני-רציו:%' "
|
||||
"ORDER BY case_number"
|
||||
)
|
||||
rows = [dict(r) for r in rows]
|
||||
if args.limit:
|
||||
rows = rows[: args.limit]
|
||||
if not rows:
|
||||
print("no rulings with a mini-ratio found", flush=True)
|
||||
return 0
|
||||
|
||||
print(f"benchmarking {len(rows)} ruling(s)...", flush=True)
|
||||
results = []
|
||||
for i, row in enumerate(rows, 1):
|
||||
res = await _bench_one(row, args.model)
|
||||
results.append(res)
|
||||
if res["error"]:
|
||||
print(f"[{i}/{len(rows)}] {res['case_number']}: SKIP ({res['error']})", flush=True)
|
||||
else:
|
||||
print(f"[{i}/{len(rows)}] {res['case_number']}: "
|
||||
f"recall={res['recall']} precision={res['precision']} "
|
||||
f"granularity={res['granularity']}x "
|
||||
f"(nevo={res['nevo_holdings']}, ours={res['ours_total']})", flush=True)
|
||||
|
||||
scored = [r for r in results if r["recall"] is not None]
|
||||
if scored:
|
||||
avg = lambda k: round(sum(r[k] for r in scored) / len(scored), 3) # noqa: E731
|
||||
print(f"\n=== {len(scored)} scored — mean recall={avg('recall')} "
|
||||
f"precision={avg('precision')} granularity={avg('granularity')}x ===", flush=True)
|
||||
|
||||
ts = datetime.now(timezone.utc).strftime("%Y%m%dT%H%M%SZ")
|
||||
AUDIT_DIR.mkdir(parents=True, exist_ok=True)
|
||||
out = Path(args.out) if args.out else AUDIT_DIR / f"nevo-ratio-benchmark-{ts}.csv"
|
||||
with out.open("w", encoding="utf-8", newline="") as f:
|
||||
w = csv.DictWriter(f, fieldnames=list(results[0].keys()))
|
||||
w.writeheader()
|
||||
w.writerows(results)
|
||||
print(f"report: {out}", flush=True)
|
||||
return 0
|
||||
|
||||
|
||||
if __name__ == "__main__":
|
||||
ap = argparse.ArgumentParser(description=__doc__,
|
||||
formatter_class=argparse.RawDescriptionHelpFormatter)
|
||||
g = ap.add_mutually_exclusive_group(required=True)
|
||||
g.add_argument("--case", help="benchmark a single case_number")
|
||||
g.add_argument("--all", action="store_true", help="benchmark all rulings with a mini-ratio")
|
||||
ap.add_argument("--limit", type=int, default=None, help="cap the number of rulings")
|
||||
ap.add_argument("--model", default=None, help="judge model (default: CLI session default)")
|
||||
ap.add_argument("--out", default=None, help="output CSV path (default: data/audit/)")
|
||||
args = ap.parse_args()
|
||||
sys.exit(asyncio.run(main(args)))
|
||||
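The per-ruling metrics in the benchmark script above follow directly from the judge's four counts. The sketch below reproduces that arithmetic on a made-up verdict (14 extracted halachot against 4 Nevo holdings, the example from the docstring); `score_verdict` is a hypothetical helper name.

```python
from __future__ import annotations


def score_verdict(verdict: dict, ours_count: int) -> dict:
    """Turn the judge's counts into recall / precision / granularity.

    Metrics are None when their denominator is zero, so a ruling with no
    recognised Nevo holdings is excluded from averages instead of skewing them.
    """
    nh = int(verdict.get("nevo_holdings") or 0)
    cov = int(verdict.get("covered") or 0)
    ot = int(verdict.get("ours_total") or ours_count)  # fall back to our own count
    om = int(verdict.get("ours_mapped") or 0)
    return {
        "recall": round(cov / nh, 3) if nh else None,
        "precision": round(om / ot, 3) if ot else None,
        "granularity": round(ot / nh, 2) if nh else None,
    }


# Illustrative verdict, not a real judge response.
verdict = {"nevo_holdings": 4, "covered": 3, "ours_total": 14, "ours_mapped": 9}
print(score_verdict(verdict, ours_count=14))
```

A granularity well above 1 with high recall suggests over-decomposition (one Nevo holding split into several extracted halachot) rather than missed holdings.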
@@ -1113,6 +1113,52 @@ export interface paths {
|
||||
patch?: never;
|
||||
trace?: never;
|
||||
};
|
||||
"/api/cases/{case_number}/decision-blocks": {
|
||||
parameters: {
|
||||
query?: never;
|
||||
header?: never;
|
||||
path?: never;
|
||||
cookie?: never;
|
||||
};
|
||||
/**
|
||||
* Api Get Decision Blocks
|
||||
* @description Return all 12 decision blocks as JSON (empty blocks included).
|
||||
*
|
||||
* Read path for the interactive block viewer — content lives in
|
||||
* decision_blocks but was previously only reachable via DOCX export.
|
||||
*/
|
||||
get: operations["api_get_decision_blocks_api_cases__case_number__decision_blocks_get"];
|
||||
put?: never;
|
||||
post?: never;
|
||||
delete?: never;
|
||||
options?: never;
|
||||
head?: never;
|
||||
patch?: never;
|
||||
trace?: never;
|
||||
};
|
||||
"/api/cases/{case_number}/decision-blocks/{block_id}": {
|
||||
parameters: {
|
||||
query?: never;
|
||||
header?: never;
|
||||
path?: never;
|
||||
cookie?: never;
|
||||
};
|
||||
get?: never;
|
||||
/**
|
||||
* Api Update Decision Block
|
||||
* @description Save inline-edited content for a single decision block.
|
||||
*
|
||||
* Writes to decision_blocks (upsert, status='draft') and rebuilds the
|
||||
* on-disk decision.md. Creates a decision row if none exists yet.
|
||||
*/
|
||||
put: operations["api_update_decision_block_api_cases__case_number__decision_blocks__block_id__put"];
|
||||
post?: never;
|
||||
delete?: never;
|
||||
options?: never;
|
||||
head?: never;
|
||||
patch?: never;
|
||||
trace?: never;
|
||||
};
|
||||
"/api/cases/{case_number}/learn": {
|
||||
parameters: {
|
||||
query?: never;
|
||||
@@ -1959,6 +2005,88 @@ export interface paths {
|
||||
patch?: never;
|
||||
trace?: never;
|
||||
};
|
||||
"/api/learning/pairs": {
|
||||
parameters: {
|
||||
query?: never;
|
||||
header?: never;
|
||||
path?: never;
|
||||
cookie?: never;
|
||||
};
|
||||
/**
|
||||
* Api Learning Pairs
|
||||
* @description The pairing ledger (INV-LRN4): all decisions and the status of their comparison against the final version.
* status is optional: final_received / analyzed / lessons_folded.
|
||||
*/
|
||||
get: operations["api_learning_pairs_api_learning_pairs_get"];
|
||||
put?: never;
|
||||
post?: never;
|
||||
delete?: never;
|
||||
options?: never;
|
||||
head?: never;
|
||||
patch?: never;
|
||||
trace?: never;
|
||||
};
|
||||
"/api/learning/style-distance/{case_number}": {
|
||||
parameters: {
|
||||
query?: never;
|
||||
header?: never;
|
||||
path?: never;
|
||||
cookie?: never;
|
||||
};
|
||||
/**
|
||||
* Api Learning Style Distance
|
||||
* @description Style-distance metric (T7) for a case: whether the draft is converging toward דפנה's style.
|
||||
*/
|
||||
get: operations["api_learning_style_distance_api_learning_style_distance__case_number__get"];
|
||||
put?: never;
|
||||
post?: never;
|
||||
delete?: never;
|
||||
options?: never;
|
||||
head?: never;
|
||||
patch?: never;
|
||||
trace?: never;
|
||||
};
|
||||
"/api/learning/pairs/{pair_id}": {
|
||||
parameters: {
|
||||
query?: never;
|
||||
header?: never;
|
||||
path?: never;
|
||||
cookie?: never;
|
||||
};
|
||||
/**
|
||||
* Api Learning Pair Detail
|
||||
* @description Details of a ledger row, including the distillation proposal (analysis) for chair approval (T14).
|
||||
*/
|
||||
get: operations["api_learning_pair_detail_api_learning_pairs__pair_id__get"];
|
||||
put?: never;
|
||||
post?: never;
|
||||
delete?: never;
|
||||
options?: never;
|
||||
head?: never;
|
||||
patch?: never;
|
||||
trace?: never;
|
||||
};
|
||||
"/api/learning/pairs/{pair_id}/promote": {
|
||||
parameters: {
|
||||
query?: never;
|
||||
header?: never;
|
||||
path?: never;
|
||||
cookie?: never;
|
||||
};
|
||||
get?: never;
|
||||
put?: never;
|
||||
/**
|
||||
* Api Learning Promote
|
||||
* @description Chair gate (INV-G10/LRN1): approves style lessons and transition phrases from the distillation proposal
* and embeds them in the channels the writer consumes (methodology overrides → T15). Advances status.
|
||||
*/
|
||||
post: operations["api_learning_promote_api_learning_pairs__pair_id__promote_post"];
|
||||
delete?: never;
|
||||
options?: never;
|
||||
head?: never;
|
||||
patch?: never;
|
||||
trace?: never;
|
||||
};
|
||||
"/api/admin/skills": {
|
||||
parameters: {
|
||||
query?: never;
|
||||
@@ -2254,7 +2382,14 @@ export interface paths {
         head?: never;
         /**
          * Api Resolve Feedback
-         * @description Mark feedback as resolved.
+         * @description Mark feedback as resolved. When ``fold`` is true (default) and the entry
+         * has an extracted lesson, also wake the CEO to fold that lesson into the
+         * right knowledge file (the feedback→agent-knowledge loop).
+         *
+         * The fold is fire-and-forget (BackgroundTask) and best-effort — resolving
+         * never fails because Paperclip is down. Pass ``fold=false`` for pure
+         * bookkeeping resolves (e.g. from the per-case drafts panel) to avoid
+         * spawning a CEO run per click.
          */
         patch: operations["api_resolve_feedback_api_feedback__feedback_id__resolve_patch"];
         trace?: never;
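The docstring in this hunk describes a best-effort, fire-and-forget fold. The contract can be sketched in Python as below. This is only an illustration under stated assumptions: the in-memory `entry` dict, the `wake_ceo` callable, and the plain list standing in for the background-task queue are all hypothetical and do not come from the codebase.

```python
from typing import Callable


def safe_fold(wake_ceo: Callable[[str], None], lesson: str) -> bool:
    """Run the fold and swallow any failure (best-effort)."""
    try:
        wake_ceo(lesson)
        return True
    except Exception:
        # A failing downstream service must never surface to the resolver.
        return False


def resolve_feedback(entry: dict, queue: list,
                     wake_ceo: Callable[[str], None],
                     fold: bool = True) -> dict:
    """Mark an entry resolved; optionally enqueue a fold for later.

    `queue` is a hypothetical stand-in for a background-task runner:
    the fold is only scheduled here, never awaited.
    """
    entry["resolved"] = True
    lesson = entry.get("lesson")
    if fold and lesson:
        queue.append(lambda: safe_fold(wake_ceo, lesson))
    return entry
```

With `fold=False`, or when the entry has no extracted lesson, nothing is enqueued, which matches the "pure bookkeeping" case the docstring mentions.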
@@ -2566,7 +2701,13 @@ export interface paths {
             path?: never;
             cookie?: never;
         };
-        /** Halachot List */
+        /**
+         * Halachot List
+         * @description List halachot. ``exclude_low_quality`` hides flagged items (#84.1) and
+         * ``order_by_priority`` switches to the active-learning order (#84.3). Both
+         * default off so existing callers are unaffected; the review-queue view opts
+         * in.
+         */
         get: operations["halachot_list_api_halachot_get"];
         put?: never;
         post?: never;
@@ -2746,6 +2887,11 @@ export interface components {
             /** Issue Id */
             issue_id?: string | null;
         };
+        /** BlockUpdateRequest */
+        BlockUpdateRequest: {
+            /** Content */
+            content: string;
+        };
         /** Body_api_create_feedback_api_feedback_post */
         Body_api_create_feedback_api_feedback_post: {
             /**
@@ -3475,6 +3621,19 @@ export interface components {
             /** Citation Formatted */
             citation_formatted?: string | null;
         };
+        /** PromoteLearningRequest */
+        PromoteLearningRequest: {
+            /**
+             * Lessons
+             * @default []
+             */
+            lessons: string[];
+            /**
+             * Phrases
+             * @default []
+             */
+            phrases: string[];
+        };
         /** ReviseRequest */
         ReviseRequest: {
             /** Revisions */
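The `PromoteLearningRequest` schema in this hunk has two string lists, both defaulting to `[]`. A client-side payload for the promote endpoint could be assembled roughly as follows. This is a sketch only: the dataclass and the two helper functions are hypothetical illustrations, not part of the generated client, and no HTTP call is made.

```python
import json
from dataclasses import asdict, dataclass, field


@dataclass
class PromoteLearningRequest:
    """Mirror of the schema above; both lists default to empty."""
    lessons: list[str] = field(default_factory=list)
    phrases: list[str] = field(default_factory=list)


def promote_path(pair_id: str) -> str:
    """Build the path from the template shown in the `paths` interface."""
    return f"/api/learning/pairs/{pair_id}/promote"


def promote_body(req: PromoteLearningRequest) -> str:
    """Serialize the request as the JSON body for the POST.

    ensure_ascii=False keeps Hebrew lessons/phrases readable in the payload.
    """
    return json.dumps(asdict(req), ensure_ascii=False)
```

Using `default_factory=list` rather than a shared `[]` default keeps each request instance independent.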
@@ -5263,6 +5422,73 @@ export interface operations {
             };
         };
     };
+    api_get_decision_blocks_api_cases__case_number__decision_blocks_get: {
+        parameters: {
+            query?: never;
+            header?: never;
+            path: {
+                case_number: string;
+            };
+            cookie?: never;
+        };
+        requestBody?: never;
+        responses: {
+            /** @description Successful Response */
+            200: {
+                headers: {
+                    [name: string]: unknown;
+                };
+                content: {
+                    "application/json": unknown;
+                };
+            };
+            /** @description Validation Error */
+            422: {
+                headers: {
+                    [name: string]: unknown;
+                };
+                content: {
+                    "application/json": components["schemas"]["HTTPValidationError"];
+                };
+            };
+        };
+    };
+    api_update_decision_block_api_cases__case_number__decision_blocks__block_id__put: {
+        parameters: {
+            query?: never;
+            header?: never;
+            path: {
+                case_number: string;
+                block_id: string;
+            };
+            cookie?: never;
+        };
+        requestBody: {
+            content: {
+                "application/json": components["schemas"]["BlockUpdateRequest"];
+            };
+        };
+        responses: {
+            /** @description Successful Response */
+            200: {
+                headers: {
+                    [name: string]: unknown;
+                };
+                content: {
+                    "application/json": unknown;
+                };
+            };
+            /** @description Validation Error */
+            422: {
+                headers: {
+                    [name: string]: unknown;
+                };
+                content: {
+                    "application/json": components["schemas"]["HTTPValidationError"];
+                };
+            };
+        };
+    };
     api_learn_api_cases__case_number__learn_post: {
         parameters: {
             query?: never;
@@ -6575,6 +6801,135 @@ export interface operations {
             };
         };
     };
+    api_learning_pairs_api_learning_pairs_get: {
+        parameters: {
+            query?: {
+                status?: string;
+                limit?: number;
+            };
+            header?: never;
+            path?: never;
+            cookie?: never;
+        };
+        requestBody?: never;
+        responses: {
+            /** @description Successful Response */
+            200: {
+                headers: {
+                    [name: string]: unknown;
+                };
+                content: {
+                    "application/json": unknown;
+                };
+            };
+            /** @description Validation Error */
+            422: {
+                headers: {
+                    [name: string]: unknown;
+                };
+                content: {
+                    "application/json": components["schemas"]["HTTPValidationError"];
+                };
+            };
+        };
+    };
+    api_learning_style_distance_api_learning_style_distance__case_number__get: {
+        parameters: {
+            query?: never;
+            header?: never;
+            path: {
+                case_number: string;
+            };
+            cookie?: never;
+        };
+        requestBody?: never;
+        responses: {
+            /** @description Successful Response */
+            200: {
+                headers: {
+                    [name: string]: unknown;
+                };
+                content: {
+                    "application/json": unknown;
+                };
+            };
+            /** @description Validation Error */
+            422: {
+                headers: {
+                    [name: string]: unknown;
+                };
+                content: {
+                    "application/json": components["schemas"]["HTTPValidationError"];
+                };
+            };
+        };
+    };
+    api_learning_pair_detail_api_learning_pairs__pair_id__get: {
+        parameters: {
+            query?: never;
+            header?: never;
+            path: {
+                pair_id: string;
+            };
+            cookie?: never;
+        };
+        requestBody?: never;
+        responses: {
+            /** @description Successful Response */
+            200: {
+                headers: {
+                    [name: string]: unknown;
+                };
+                content: {
+                    "application/json": unknown;
+                };
+            };
+            /** @description Validation Error */
+            422: {
+                headers: {
+                    [name: string]: unknown;
+                };
+                content: {
+                    "application/json": components["schemas"]["HTTPValidationError"];
+                };
+            };
+        };
+    };
+    api_learning_promote_api_learning_pairs__pair_id__promote_post: {
+        parameters: {
+            query?: never;
+            header?: never;
+            path: {
+                pair_id: string;
+            };
+            cookie?: never;
+        };
+        requestBody: {
+            content: {
+                "application/json": components["schemas"]["PromoteLearningRequest"];
+            };
+        };
+        responses: {
+            /** @description Successful Response */
+            200: {
+                headers: {
+                    [name: string]: unknown;
+                };
+                content: {
+                    "application/json": unknown;
+                };
+            };
+            /** @description Validation Error */
+            422: {
+                headers: {
+                    [name: string]: unknown;
+                };
+                content: {
+                    "application/json": components["schemas"]["HTTPValidationError"];
+                };
+            };
+        };
+    };
     api_list_skills_api_admin_skills_get: {
         parameters: {
             query?: never;
@@ -7580,6 +7935,8 @@ export interface operations {
                 practice_area?: string;
                 limit?: number;
                 offset?: number;
+                exclude_low_quality?: boolean;
+                order_by_priority?: boolean;
             };
             header?: never;
             path?: never;

@@ -6031,7 +6031,13 @@ async def halachot_list(
     practice_area: str = "",
     limit: int = 200,
     offset: int = 0,
+    exclude_low_quality: bool = False,
+    order_by_priority: bool = False,
 ):
+    """List halachot. ``exclude_low_quality`` hides flagged items (#84.1) and
+    ``order_by_priority`` switches to the active-learning order (#84.3). Both
+    default off so existing callers are unaffected; the review-queue view opts
+    in."""
     cid: UUID | None = None
     if case_law_id:
         try:
@@ -6043,6 +6049,8 @@ async def halachot_list(
         review_status=review_status or None,
         practice_area=practice_area or None,
         limit=limit, offset=offset,
+        exclude_low_quality=exclude_low_quality,
+        order_by_priority=order_by_priority,
     )
     return {"items": rows, "count": len(rows)}

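The two new flags are opt-in and default to off, so existing callers see the same result as before. Their effect on a listing can be sketched in memory as below. The row fields `low_quality` and `priority`, and the choice of "highest priority first" as the active-learning order, are assumptions made for illustration; the diff does not show how the database layer implements either flag.

```python
def list_halachot(rows, limit=200, offset=0,
                  exclude_low_quality=False, order_by_priority=False):
    """In-memory sketch of the listing semantics described in the docstring.

    `rows` is a list of dicts; `low_quality` and `priority` are
    hypothetical field names used only for this example.
    """
    items = list(rows)
    if exclude_low_quality:
        # Hide items that were flagged as low quality.
        items = [r for r in items if not r.get("low_quality", False)]
    if order_by_priority:
        # Assumed ordering: highest priority first; ties keep input order.
        items = sorted(items, key=lambda r: r.get("priority", 0.0), reverse=True)
    page = items[offset:offset + limit]
    return {"items": page, "count": len(page)}
```

As in the handler above, `count` is the size of the returned page, not the total number of matching rows.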