Practice area separation: multi-tenant axis across DB, RAG, and UI
Adds two orthogonal columns — practice_area (top-level legal domain: appeals_committee / national_insurance / labor_law) and appeal_subtype (building_permit / betterment_levy / compensation_197) — denormalized into cases, documents, document_chunks, decisions, and style_corpus so vector searches can filter without JOINs. Why: the system handles two unrelated sub-domains under the same appeals committee (1xxx building permits and 8xxx/9xxx betterment/197), with different rules and writing style. Without a separation axis, search_similar() and the block-writer's precedent lookup were free to surface betterment-levy paragraphs while drafting a building-permit decision — a real risk of cross-domain contamination. The same axis also lets future domains (national insurance, labor law) coexist without separate schemas. Schema (V4 migration in db.py): - ALTER ... ADD COLUMN IF NOT EXISTS on all five tables + composite indexes (practice_area first). - Idempotent backfill: case_number ~ '^1' → building_permit, '^8' → betterment_levy, '^9' → compensation_197; propagated to documents, chunks, and decisions via case_id; training-corpus rows (case_id NULL) default to appeals_committee. Code: - New services/practice_area.py with derive_subtype, validate, and is_override + enum constants. - db.create_case / create_document / store_chunks / create_decision inherit practice_area from the parent case (or take an explicit override for the case_id=None training corpus). - db.search_similar and search_similar_paragraphs accept practice_area + appeal_subtype filters using the denormalized columns. - tools/search.py auto-resolves the filter from case_number when given. - block_writer._build_precedents_context now passes the active case's practice_area to search_similar_paragraphs — closes the contamination hole for the discussion-block precedent fetch. - tools/cases.case_create auto-derives subtype from case_number; an explicit override that disagrees writes a case_subtype_override entry to audit_log so we can spot bad classifications later. - tools/documents.document_upload_training tags new training material with practice_area + subtype end-to-end (corpus, document, chunks). UI (web/static/index.html + web/app.py): - New-case wizard gets a practice_area dropdown (others disabled until national_insurance / labor_law arrive) and an appeal_subtype dropdown with JS auto-fill from the case-number prefix; manual edits stick. - Case header shows a blue badge with practice_area · subtype. - CaseCreateRequest plumbs both fields through to cases_tools.case_create. Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
This commit is contained in:
96
mcp-server/src/legal_mcp/services/practice_area.py
Normal file
96
mcp-server/src/legal_mcp/services/practice_area.py
Normal file
@@ -0,0 +1,96 @@
|
||||
"""Practice area + appeal subtype: derivation, validation, constants.
|
||||
|
||||
Two orthogonal axes used to separate legal domains across the system:
|
||||
|
||||
practice_area — top-level domain (multi-tenant axis). Examples:
|
||||
appeals_committee, national_insurance, labor_law.
|
||||
appeal_subtype — refines within a domain. For appeals_committee:
|
||||
building_permit (1xxx), betterment_levy (8xxx),
|
||||
compensation_197 (9xxx), unknown.
|
||||
|
||||
Both columns are denormalized into documents/chunks/decisions/style_corpus
|
||||
so vector searches can filter cheaply.
|
||||
"""
|
||||
|
||||
from __future__ import annotations
|
||||
|
||||
import re
|
||||
|
||||
# ── Enums ──────────────────────────────────────────────────────────
|
||||
|
||||
PRACTICE_AREAS: set[str] = {
|
||||
"appeals_committee",
|
||||
"national_insurance",
|
||||
"labor_law",
|
||||
}
|
||||
|
||||
APPEALS_COMMITTEE_SUBTYPES: set[str] = {
|
||||
"building_permit",
|
||||
"betterment_levy",
|
||||
"compensation_197",
|
||||
"unknown",
|
||||
}
|
||||
|
||||
DEFAULT_PRACTICE_AREA = "appeals_committee"
|
||||
|
||||
# Subtypes per practice_area (extend when adding domains)
|
||||
SUBTYPES_BY_AREA: dict[str, set[str]] = {
|
||||
"appeals_committee": APPEALS_COMMITTEE_SUBTYPES,
|
||||
"national_insurance": {"unknown"},
|
||||
"labor_law": {"unknown"},
|
||||
}
|
||||
|
||||
|
||||
# ── Derivation ─────────────────────────────────────────────────────
|
||||
|
||||
_FIRST_DIGIT = re.compile(r"^\s*(\d)")
|
||||
|
||||
_APPEALS_COMMITTEE_DIGIT_TO_SUBTYPE = {
|
||||
"1": "building_permit",
|
||||
"8": "betterment_levy",
|
||||
"9": "compensation_197",
|
||||
}
|
||||
|
||||
|
||||
def derive_subtype(case_number: str, practice_area: str = DEFAULT_PRACTICE_AREA) -> str:
|
||||
"""Infer the appeal_subtype from case_number.
|
||||
|
||||
For appeals_committee, the convention is:
|
||||
1xxx → building_permit, 8xxx → betterment_levy, 9xxx → compensation_197.
|
||||
|
||||
For other practice areas there is no public numbering convention yet,
|
||||
so we return 'unknown' until a real rule is defined.
|
||||
"""
|
||||
if practice_area != "appeals_committee":
|
||||
return "unknown"
|
||||
m = _FIRST_DIGIT.match(case_number or "")
|
||||
if not m:
|
||||
return "unknown"
|
||||
return _APPEALS_COMMITTEE_DIGIT_TO_SUBTYPE.get(m.group(1), "unknown")
|
||||
|
||||
|
||||
# ── Validation ─────────────────────────────────────────────────────
|
||||
|
||||
|
||||
def validate(practice_area: str, appeal_subtype: str | None) -> None:
|
||||
"""Raise ValueError on unknown values. appeal_subtype=None is allowed."""
|
||||
if practice_area not in PRACTICE_AREAS:
|
||||
raise ValueError(
|
||||
f"unknown practice_area: {practice_area!r}. "
|
||||
f"expected one of {sorted(PRACTICE_AREAS)}"
|
||||
)
|
||||
if appeal_subtype is None:
|
||||
return
|
||||
allowed = SUBTYPES_BY_AREA.get(practice_area, {"unknown"})
|
||||
if appeal_subtype not in allowed:
|
||||
raise ValueError(
|
||||
f"unknown appeal_subtype {appeal_subtype!r} for practice_area "
|
||||
f"{practice_area!r}. expected one of {sorted(allowed)}"
|
||||
)
|
||||
|
||||
|
||||
def is_override(case_number: str, practice_area: str, appeal_subtype: str) -> bool:
|
||||
"""True iff the user-supplied subtype disagrees with what derive_subtype
|
||||
would have produced (and the derived value is not 'unknown')."""
|
||||
derived = derive_subtype(case_number, practice_area)
|
||||
return derived != "unknown" and derived != appeal_subtype
|
||||
Reference in New Issue
Block a user