Practice area separation: multi-tenant axis across DB, RAG, and UI

Adds two orthogonal columns — practice_area (top-level legal domain:
appeals_committee / national_insurance / labor_law) and appeal_subtype
(building_permit / betterment_levy / compensation_197) — denormalized
into cases, documents, document_chunks, decisions, and style_corpus so
vector searches can filter without JOINs.

Why: the system handles two unrelated sub-domains under the same
appeals committee (1xxx building permits and 8xxx/9xxx betterment/197),
with different rules and writing style. Without a separation axis,
search_similar() and the block-writer's precedent lookup were free to
surface betterment-levy paragraphs while drafting a building-permit
decision — a real risk of cross-domain contamination. The same axis
also lets future domains (national insurance, labor law) coexist
without separate schemas.

Schema (V4 migration in db.py):
- ALTER ... ADD COLUMN IF NOT EXISTS on all five tables + composite
  indexes (practice_area first).
- Idempotent backfill: case_number ~ '^1' → building_permit, '^8' →
  betterment_levy, '^9' → compensation_197; propagated to documents,
  chunks, and decisions via case_id; training-corpus rows (case_id NULL)
  default to appeals_committee.

Code:
- New services/practice_area.py with derive_subtype, validate, and
  is_override + enum constants.
- db.create_case / create_document / store_chunks / create_decision
  inherit practice_area from the parent case (or take an explicit
  override for the case_id=None training corpus).
- db.search_similar and search_similar_paragraphs accept practice_area
  + appeal_subtype filters using the denormalized columns.
- tools/search.py auto-resolves the filter from case_number when given.
- block_writer._build_precedents_context now passes the active case's
  practice_area to search_similar_paragraphs — closes the contamination
  hole for the discussion-block precedent fetch.
- tools/cases.case_create auto-derives subtype from case_number; an
  explicit override that disagrees writes a case_subtype_override entry
  to audit_log so we can spot bad classifications later.
- tools/documents.document_upload_training tags new training material
  with practice_area + subtype end-to-end (corpus, document, chunks).

UI (web/static/index.html + web/app.py):
- New-case wizard gets a practice_area dropdown (others disabled until
  national_insurance / labor_law arrive) and an appeal_subtype dropdown
  with JS auto-fill from the case-number prefix; manual edits stick.
- Case header shows a blue badge with practice_area · subtype.
- CaseCreateRequest plumbs both fields through to cases_tools.case_create.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
This commit is contained in:
2026-04-11 16:36:48 +00:00
parent a8b79822bf
commit 26d09d648f
8 changed files with 468 additions and 34 deletions

View File

@@ -476,12 +476,17 @@ async def _build_precedents_context(case_id: UUID, block_id: str) -> str:
case = await db.get_case(case_id)
case_number = case.get("case_number", "") if case else ""
subject = case.get("subject", "") if case else ""
practice_area = case.get("practice_area") if case else None
appeal_subtype = case.get("appeal_subtype") if case else None
query = f"דיון משפטי בנושא {subject}" if subject else "דיון משפטי ועדת ערר"
query_emb = await embeddings.embed_query(query)
# Search 1: paragraph_embeddings (from other decisions by Dafna)
# Search 1: paragraph_embeddings (from other decisions by Dafna).
# Filter by practice_area + appeal_subtype so we don't pull a
# betterment-levy paragraph when writing a building-permit decision.
para_results = await db.search_similar_paragraphs(
query_embedding=query_emb, limit=10, block_type="block-yod",
practice_area=practice_area, appeal_subtype=appeal_subtype,
)
# Filter out same case
para_results = [r for r in para_results if r.get("case_number", "") != case_number]

View File

@@ -200,6 +200,76 @@ CREATE TABLE IF NOT EXISTS appeal_type_rules (
ALTER TABLE decision_blocks ADD COLUMN IF NOT EXISTS image_placeholders JSONB DEFAULT '[]';
"""
# ── Phase 4: Practice area separation (multi-tenant axis) ──────────
SCHEMA_V4_SQL = """
-- ═══════════════════════════════════════════════════════════════════
-- practice_area = top-level legal domain (multi-tenant axis):
-- appeals_committee | national_insurance | labor_law | ...
-- appeal_subtype = refines within practice_area:
-- building_permit | betterment_levy | compensation_197 | unknown
-- Both columns are denormalized to documents/chunks/decisions/style_corpus
-- so vector searches can filter without expensive JOINs.
-- ═══════════════════════════════════════════════════════════════════
ALTER TABLE cases ADD COLUMN IF NOT EXISTS practice_area TEXT;
ALTER TABLE cases ADD COLUMN IF NOT EXISTS appeal_subtype TEXT;
ALTER TABLE documents ADD COLUMN IF NOT EXISTS practice_area TEXT;
ALTER TABLE documents ADD COLUMN IF NOT EXISTS appeal_subtype TEXT;
ALTER TABLE document_chunks ADD COLUMN IF NOT EXISTS practice_area TEXT;
ALTER TABLE document_chunks ADD COLUMN IF NOT EXISTS appeal_subtype TEXT;
ALTER TABLE decisions ADD COLUMN IF NOT EXISTS practice_area TEXT;
ALTER TABLE decisions ADD COLUMN IF NOT EXISTS appeal_subtype TEXT;
ALTER TABLE style_corpus ADD COLUMN IF NOT EXISTS practice_area TEXT;
ALTER TABLE style_corpus ADD COLUMN IF NOT EXISTS appeal_subtype TEXT;
CREATE INDEX IF NOT EXISTS idx_cases_practice
ON cases(practice_area, appeal_subtype);
CREATE INDEX IF NOT EXISTS idx_chunks_practice
ON document_chunks(practice_area);
CREATE INDEX IF NOT EXISTS idx_corpus_practice
ON style_corpus(practice_area, appeal_subtype);
CREATE INDEX IF NOT EXISTS idx_decisions_practice
ON decisions(practice_area);
-- Backfill (idempotent — only fills NULLs)
UPDATE cases SET practice_area = 'appeals_committee' WHERE practice_area IS NULL;
UPDATE cases SET appeal_subtype = CASE
WHEN case_number ~ '^1[0-9]{3}' THEN 'building_permit'
WHEN case_number ~ '^8[0-9]{3}' THEN 'betterment_levy'
WHEN case_number ~ '^9[0-9]{3}' THEN 'compensation_197'
ELSE 'unknown'
END WHERE appeal_subtype IS NULL;
UPDATE documents d
SET practice_area = c.practice_area, appeal_subtype = c.appeal_subtype
FROM cases c
WHERE d.case_id = c.id AND d.practice_area IS NULL;
UPDATE document_chunks dc
SET practice_area = c.practice_area, appeal_subtype = c.appeal_subtype
FROM cases c
WHERE dc.case_id = c.id AND dc.practice_area IS NULL;
UPDATE decisions de
SET practice_area = c.practice_area, appeal_subtype = c.appeal_subtype
FROM cases c
WHERE de.case_id = c.id AND de.practice_area IS NULL;
-- All existing style_corpus entries are דפנה's appeals-committee decisions
UPDATE style_corpus SET practice_area = 'appeals_committee' WHERE practice_area IS NULL;
-- Training corpus documents/chunks have case_id = NULL. All historical
-- training material is from דפנה's appeals committee, so default them.
UPDATE documents SET practice_area = 'appeals_committee'
WHERE case_id IS NULL AND practice_area IS NULL;
UPDATE document_chunks dc
SET practice_area = d.practice_area, appeal_subtype = d.appeal_subtype
FROM documents d
WHERE dc.document_id = d.id AND dc.practice_area IS NULL;
"""
# ── Phase 2: Decision + Knowledge + RAG layers ────────────────────
SCHEMA_V2_SQL = """
@@ -388,7 +458,8 @@ async def init_schema() -> None:
await conn.execute(MIGRATIONS_SQL)
await conn.execute(SCHEMA_V2_SQL)
await conn.execute(SCHEMA_V3_SQL)
logger.info("Database schema initialized (v1 + v2 + v3)")
await conn.execute(SCHEMA_V4_SQL)
logger.info("Database schema initialized (v1 + v2 + v3 + v4)")
# ── Case CRUD ───────────────────────────────────────────────────────
@@ -405,6 +476,8 @@ async def create_case(
hearing_date: date | None = None,
notes: str = "",
expected_outcome: str = "",
practice_area: str = "appeals_committee",
appeal_subtype: str | None = None,
) -> dict:
pool = await get_pool()
case_id = uuid4()
@@ -412,17 +485,43 @@ async def create_case(
await conn.execute(
"""INSERT INTO cases (id, case_number, title, appellants, respondents,
subject, property_address, permit_number, committee_type,
hearing_date, notes, expected_outcome)
VALUES ($1, $2, $3, $4, $5, $6, $7, $8, $9, $10, $11, $12)""",
hearing_date, notes, expected_outcome,
practice_area, appeal_subtype)
VALUES ($1, $2, $3, $4, $5, $6, $7, $8, $9, $10, $11, $12, $13, $14)""",
case_id, case_number, title,
json.dumps(appellants or []),
json.dumps(respondents or []),
subject, property_address, permit_number, committee_type,
hearing_date, notes, expected_outcome,
practice_area, appeal_subtype,
)
return await get_case(case_id)
async def get_case_practice_area(case_id: UUID) -> tuple[str | None, str | None]:
"""Return (practice_area, appeal_subtype) for a case, or (None, None) if missing."""
pool = await get_pool()
async with pool.acquire() as conn:
row = await conn.fetchrow(
"SELECT practice_area, appeal_subtype FROM cases WHERE id = $1", case_id
)
if row is None:
return None, None
return row["practice_area"], row["appeal_subtype"]
async def get_case_practice_area_by_number(case_number: str) -> tuple[str | None, str | None]:
pool = await get_pool()
async with pool.acquire() as conn:
row = await conn.fetchrow(
"SELECT practice_area, appeal_subtype FROM cases WHERE case_number = $1",
case_number,
)
if row is None:
return None, None
return row["practice_area"], row["appeal_subtype"]
async def get_case(case_id: UUID) -> dict | None:
pool = await get_pool()
async with pool.acquire() as conn:
@@ -488,19 +587,34 @@ def _row_to_case(row: asyncpg.Record) -> dict:
# ── Document CRUD ───────────────────────────────────────────────────
async def create_document(
case_id: UUID,
case_id: UUID | None,
doc_type: str,
title: str,
file_path: str,
page_count: int | None = None,
practice_area: str | None = None,
appeal_subtype: str | None = None,
) -> dict:
pool = await get_pool()
doc_id = uuid4()
async with pool.acquire() as conn:
# If practice_area not explicitly given, inherit from the parent case
# (for case-bound documents). Training corpus passes case_id=None and
# provides the practice_area directly.
if practice_area is None and case_id is not None:
case_row = await conn.fetchrow(
"SELECT practice_area, appeal_subtype FROM cases WHERE id = $1",
case_id,
)
if case_row:
practice_area = case_row["practice_area"]
appeal_subtype = case_row["appeal_subtype"]
await conn.execute(
"""INSERT INTO documents (id, case_id, doc_type, title, file_path, page_count)
VALUES ($1, $2, $3, $4, $5, $6)""",
"""INSERT INTO documents (id, case_id, doc_type, title, file_path,
page_count, practice_area, appeal_subtype)
VALUES ($1, $2, $3, $4, $5, $6, $7, $8)""",
doc_id, case_id, doc_type, title, file_path, page_count,
practice_area, appeal_subtype,
)
row = await conn.fetchrow("SELECT * FROM documents WHERE id = $1", doc_id)
return _row_to_doc(row)
@@ -622,12 +736,20 @@ async def create_decision(
)
version = (existing["version"] + 1) if existing else 1
case_row = await conn.fetchrow(
"SELECT practice_area, appeal_subtype FROM cases WHERE id = $1", case_id
)
practice_area = case_row["practice_area"] if case_row else None
appeal_subtype = case_row["appeal_subtype"] if case_row else None
await conn.execute(
"""INSERT INTO decisions (id, case_id, version, outcome, outcome_summary,
outcome_reasoning, direction_doc)
VALUES ($1, $2, $3, $4, $5, $6, $7)""",
outcome_reasoning, direction_doc,
practice_area, appeal_subtype)
VALUES ($1, $2, $3, $4, $5, $6, $7, $8, $9)""",
decision_id, case_id, version, outcome, outcome_summary,
outcome_reasoning, json.dumps(direction_doc) if direction_doc else None,
practice_area, appeal_subtype,
)
return await get_decision(decision_id)
@@ -701,12 +823,37 @@ async def store_chunks(
document_id: UUID,
case_id: UUID | None,
chunks: list[dict],
practice_area: str | None = None,
appeal_subtype: str | None = None,
) -> int:
"""Store document chunks with embeddings. Each chunk dict has:
content, section_type, embedding (list[float]), page_number, chunk_index
content, section_type, embedding (list[float]), page_number, chunk_index.
practice_area defaults to the parent case's value, or — when case_id is
None (training corpus) — falls back to the parent document's value so
vector search can still filter cleanly.
"""
pool = await get_pool()
async with pool.acquire() as conn:
# Resolve practice_area in priority order: explicit > case > document.
if practice_area is None:
if case_id is not None:
case_row = await conn.fetchrow(
"SELECT practice_area, appeal_subtype FROM cases WHERE id = $1",
case_id,
)
if case_row:
practice_area = case_row["practice_area"]
appeal_subtype = case_row["appeal_subtype"]
if practice_area is None:
doc_row = await conn.fetchrow(
"SELECT practice_area, appeal_subtype FROM documents WHERE id = $1",
document_id,
)
if doc_row:
practice_area = doc_row["practice_area"]
appeal_subtype = doc_row["appeal_subtype"]
# Delete existing chunks for this document
await conn.execute(
"DELETE FROM document_chunks WHERE document_id = $1", document_id
@@ -714,14 +861,16 @@ async def store_chunks(
for chunk in chunks:
await conn.execute(
"""INSERT INTO document_chunks
(document_id, case_id, chunk_index, content, section_type, embedding, page_number)
VALUES ($1, $2, $3, $4, $5, $6, $7)""",
(document_id, case_id, chunk_index, content, section_type,
embedding, page_number, practice_area, appeal_subtype)
VALUES ($1, $2, $3, $4, $5, $6, $7, $8, $9)""",
document_id, case_id,
chunk["chunk_index"],
chunk["content"],
chunk.get("section_type", "other"),
chunk["embedding"],
chunk.get("page_number"),
practice_area, appeal_subtype,
)
return len(chunks)
@@ -731,8 +880,15 @@ async def search_similar(
limit: int = 10,
case_id: UUID | None = None,
section_type: str | None = None,
practice_area: str | None = None,
appeal_subtype: str | None = None,
) -> list[dict]:
"""Cosine similarity search on document chunks."""
"""Cosine similarity search on document chunks.
Filter by practice_area to keep precedents from the same legal domain
(e.g. don't surface betterment-levy chunks when working on building
permits). Uses the denormalized column on document_chunks — no JOIN.
"""
pool = await get_pool()
conditions = []
params: list = [query_embedding, limit]
@@ -746,6 +902,14 @@ async def search_similar(
conditions.append(f"dc.section_type = ${param_idx}")
params.append(section_type)
param_idx += 1
if practice_area:
conditions.append(f"dc.practice_area = ${param_idx}")
params.append(practice_area)
param_idx += 1
if appeal_subtype:
conditions.append(f"dc.appeal_subtype = ${param_idx}")
params.append(appeal_subtype)
param_idx += 1
where = f"WHERE {' AND '.join(conditions)}" if conditions else ""
@@ -778,6 +942,8 @@ async def add_to_style_corpus(
summary: str = "",
outcome: str = "",
key_principles: list[str] | None = None,
practice_area: str = "appeals_committee",
appeal_subtype: str | None = None,
) -> UUID:
pool = await get_pool()
corpus_id = uuid4()
@@ -785,11 +951,13 @@ async def add_to_style_corpus(
await conn.execute(
"""INSERT INTO style_corpus
(id, document_id, decision_number, decision_date,
subject_categories, full_text, summary, outcome, key_principles)
VALUES ($1, $2, $3, $4, $5, $6, $7, $8, $9)""",
subject_categories, full_text, summary, outcome, key_principles,
practice_area, appeal_subtype)
VALUES ($1, $2, $3, $4, $5, $6, $7, $8, $9, $10, $11)""",
corpus_id, document_id, decision_number, decision_date,
json.dumps(subject_categories), full_text, summary, outcome,
json.dumps(key_principles or []),
practice_area, appeal_subtype,
)
return corpus_id
@@ -893,8 +1061,15 @@ async def search_similar_paragraphs(
query_embedding: list[float],
limit: int = 10,
block_type: str | None = None,
practice_area: str | None = None,
appeal_subtype: str | None = None,
) -> list[dict]:
"""Search decision paragraphs by semantic similarity."""
"""Search decision paragraphs by semantic similarity.
Filtering by practice_area uses the denormalized column on `decisions`
so we don't pull, e.g., betterment-levy paragraphs when writing a
building-permit decision.
"""
pool = await get_pool()
conditions = []
params: list = [query_embedding, limit]
@@ -904,6 +1079,14 @@ async def search_similar_paragraphs(
conditions.append(f"db.block_id = ${param_idx}")
params.append(block_type)
param_idx += 1
if practice_area:
conditions.append(f"d.practice_area = ${param_idx}")
params.append(practice_area)
param_idx += 1
if appeal_subtype:
conditions.append(f"d.appeal_subtype = ${param_idx}")
params.append(appeal_subtype)
param_idx += 1
where = f"WHERE {' AND '.join(conditions)}" if conditions else ""

View File

@@ -0,0 +1,96 @@
"""Practice area + appeal subtype: derivation, validation, constants.
Two orthogonal axes used to separate legal domains across the system:
practice_area — top-level domain (multi-tenant axis). Examples:
appeals_committee, national_insurance, labor_law.
appeal_subtype — refines within a domain. For appeals_committee:
building_permit (1xxx), betterment_levy (8xxx),
compensation_197 (9xxx), unknown.
Both columns are denormalized into documents/chunks/decisions/style_corpus
so vector searches can filter cheaply.
"""
from __future__ import annotations
import re
# ── Enums ──────────────────────────────────────────────────────────
PRACTICE_AREAS: set[str] = {
"appeals_committee",
"national_insurance",
"labor_law",
}
APPEALS_COMMITTEE_SUBTYPES: set[str] = {
"building_permit",
"betterment_levy",
"compensation_197",
"unknown",
}
DEFAULT_PRACTICE_AREA = "appeals_committee"
# Subtypes per practice_area (extend when adding domains)
SUBTYPES_BY_AREA: dict[str, set[str]] = {
"appeals_committee": APPEALS_COMMITTEE_SUBTYPES,
"national_insurance": {"unknown"},
"labor_law": {"unknown"},
}
# ── Derivation ─────────────────────────────────────────────────────
_FIRST_DIGIT = re.compile(r"^\s*(\d)")
_APPEALS_COMMITTEE_DIGIT_TO_SUBTYPE = {
"1": "building_permit",
"8": "betterment_levy",
"9": "compensation_197",
}
def derive_subtype(case_number: str, practice_area: str = DEFAULT_PRACTICE_AREA) -> str:
"""Infer the appeal_subtype from case_number.
For appeals_committee, the convention is:
1xxx → building_permit, 8xxx → betterment_levy, 9xxx → compensation_197.
For other practice areas there is no public numbering convention yet,
so we return 'unknown' until a real rule is defined.
"""
if practice_area != "appeals_committee":
return "unknown"
m = _FIRST_DIGIT.match(case_number or "")
if not m:
return "unknown"
return _APPEALS_COMMITTEE_DIGIT_TO_SUBTYPE.get(m.group(1), "unknown")
# ── Validation ─────────────────────────────────────────────────────
def validate(practice_area: str, appeal_subtype: str | None) -> None:
"""Raise ValueError on unknown values. appeal_subtype=None is allowed."""
if practice_area not in PRACTICE_AREAS:
raise ValueError(
f"unknown practice_area: {practice_area!r}. "
f"expected one of {sorted(PRACTICE_AREAS)}"
)
if appeal_subtype is None:
return
allowed = SUBTYPES_BY_AREA.get(practice_area, {"unknown"})
if appeal_subtype not in allowed:
raise ValueError(
f"unknown appeal_subtype {appeal_subtype!r} for practice_area "
f"{practice_area!r}. expected one of {sorted(allowed)}"
)
def is_override(case_number: str, practice_area: str, appeal_subtype: str) -> bool:
"""True iff the user-supplied subtype disagrees with what derive_subtype
would have produced (and the derived value is not 'unknown')."""
derived = derive_subtype(case_number, practice_area)
return derived != "unknown" and derived != appeal_subtype

View File

@@ -8,7 +8,7 @@ from pathlib import Path
from uuid import UUID
from legal_mcp import config
from legal_mcp.services import db
from legal_mcp.services import audit, db, practice_area as pa
async def case_create(
@@ -23,6 +23,8 @@ async def case_create(
hearing_date: str = "",
notes: str = "",
expected_outcome: str = "",
practice_area: str = "appeals_committee",
appeal_subtype: str = "",
) -> str:
"""יצירת תיק ערר חדש.
@@ -38,6 +40,9 @@ async def case_create(
hearing_date: תאריך דיון (YYYY-MM-DD)
notes: הערות
expected_outcome: תוצאה צפויה (rejection/partial_acceptance/full_acceptance/betterment_levy)
practice_area: תחום משפטי (appeals_committee / national_insurance / labor_law)
appeal_subtype: סוג ערר (building_permit / betterment_levy / compensation_197).
ריק = יוסק אוטומטית ממספר התיק
"""
from datetime import date as date_type
@@ -45,6 +50,12 @@ async def case_create(
if hearing_date:
h_date = date_type.fromisoformat(hearing_date)
# Resolve appeal_subtype: explicit override > auto-derive > 'unknown'
derived_subtype = pa.derive_subtype(case_number, practice_area)
if not appeal_subtype:
appeal_subtype = derived_subtype
pa.validate(practice_area, appeal_subtype)
case = await db.create_case(
case_number=case_number,
title=title,
@@ -57,8 +68,24 @@ async def case_create(
hearing_date=h_date,
notes=notes,
expected_outcome=expected_outcome,
practice_area=practice_area,
appeal_subtype=appeal_subtype,
)
# If the user overrode the case-number convention (e.g. case 8500 marked
# as building_permit), record it so we can audit later.
if pa.is_override(case_number, practice_area, appeal_subtype):
await audit.log_action(
action="case_subtype_override",
case_id=UUID(case["id"]),
details={
"case_number": case_number,
"derived_subtype": derived_subtype,
"chosen_subtype": appeal_subtype,
"practice_area": practice_area,
},
)
# Initialize git repo for the case
case_dir = config.find_case_dir(case_number)
case_dir.mkdir(parents=True, exist_ok=True)

View File

@@ -105,6 +105,8 @@ async def document_upload_training(
decision_date: str = "",
subject_categories: list[str] | None = None,
title: str = "",
practice_area: str = "appeals_committee",
appeal_subtype: str = "",
) -> str:
"""העלאת החלטה קודמת של דפנה לקורפוס הסגנון (training).
@@ -114,10 +116,13 @@ async def document_upload_training(
decision_date: תאריך ההחלטה (YYYY-MM-DD)
subject_categories: קטגוריות - אפשר לבחור כמה (בנייה, שימוש חורג, תכנית, היתר, הקלה, חלוקה, תמ"א 38, היטל השבחה, פיצויים 197)
title: שם המסמך
practice_area: תחום משפטי (appeals_committee / national_insurance / labor_law)
appeal_subtype: סוג ערר (building_permit / betterment_levy / compensation_197).
ריק = יוסק אוטומטית ממספר ההחלטה
"""
from datetime import date as date_type
from legal_mcp.services import extractor, embeddings, chunker
from legal_mcp.services import chunker, embeddings, extractor, practice_area as pa
source = Path(file_path)
if not source.exists():
@@ -126,6 +131,11 @@ async def document_upload_training(
if not title:
title = source.stem
# Resolve subtype: explicit > derived from decision_number > 'unknown'
if not appeal_subtype:
appeal_subtype = pa.derive_subtype(decision_number, practice_area)
pa.validate(practice_area, appeal_subtype)
# Copy to training directory (skip if already there)
config.TRAINING_DIR.mkdir(parents=True, exist_ok=True)
dest = config.TRAINING_DIR / source.name
@@ -140,25 +150,29 @@ async def document_upload_training(
if decision_date:
d_date = date_type.fromisoformat(decision_date)
# Add to style corpus
# Add to style corpus (tagged by domain so block-writer can filter)
corpus_id = await db.add_to_style_corpus(
document_id=None,
decision_number=decision_number,
decision_date=d_date,
subject_categories=subject_categories or [],
full_text=text,
practice_area=practice_area,
appeal_subtype=appeal_subtype,
)
# Chunk and embed for RAG search over training corpus
chunks = chunker.chunk_document(text)
if chunks:
# Create a document record (no case association)
# Create a document record (no case association — tag explicitly)
doc = await db.create_document(
case_id=None,
doc_type="decision",
title=f"[קורפוס] {title}",
file_path=str(dest),
page_count=page_count,
practice_area=practice_area,
appeal_subtype=appeal_subtype,
)
doc_id = UUID(doc["id"])
await db.update_document(doc_id, extracted_text=text, extraction_status="completed")
@@ -176,7 +190,10 @@ async def document_upload_training(
}
for c, emb in zip(chunks, embs)
]
await db.store_chunks(doc_id, None, chunk_dicts)
await db.store_chunks(
doc_id, None, chunk_dicts,
practice_area=practice_area, appeal_subtype=appeal_subtype,
)
return json.dumps({
"corpus_id": str(corpus_id),

View File

@@ -3,28 +3,52 @@
from __future__ import annotations
import json
import logging
from uuid import UUID
from legal_mcp.services import db, embeddings
logger = logging.getLogger(__name__)
async def search_decisions(
query: str,
limit: int = 10,
section_type: str = "",
practice_area: str = "",
appeal_subtype: str = "",
case_number: str = "",
) -> str:
"""חיפוש סמנטי בהחלטות קודמות ובמסמכים.
"""חיפוש סמנטי בהחלטות קודמות ובמסמכים — מסונן לפי תחום משפטי.
Args:
query: שאילתת חיפוש בעברית (לדוגמה: "שימוש חורג למסחר באזור מגורים")
query: שאילתת חיפוש בעברית
limit: מספר תוצאות מקסימלי
section_type: סינון לפי סוג סעיף (facts, legal_analysis, conclusion, ruling, וכו'). ריק = הכל
section_type: סינון לפי סוג סעיף (facts, legal_analysis, ...)
practice_area: תחום משפטי לסינון (appeals_committee/national_insurance/...)
appeal_subtype: סוג ערר לסינון (building_permit/betterment_levy/compensation_197)
case_number: אם סופק, ה-practice_area/subtype יוסקו אוטומטית מהתיק
"""
# Auto-resolve practice_area from case_number if available
if case_number and not practice_area:
case = await db.get_case_by_number(case_number)
if case:
practice_area = case.get("practice_area") or ""
appeal_subtype = appeal_subtype or (case.get("appeal_subtype") or "")
if not practice_area:
logger.warning(
"search_decisions called without practice_area filter — "
"results may mix legal domains"
)
query_emb = await embeddings.embed_query(query)
results = await db.search_similar(
query_embedding=query_emb,
limit=limit,
section_type=section_type or None,
practice_area=practice_area or None,
appeal_subtype=appeal_subtype or None,
)
if not results:
@@ -61,6 +85,7 @@ async def search_case_documents(
return f"תיק {case_number} לא נמצא."
query_emb = await embeddings.embed_query(query)
# Restricted to case_id — practice_area filter would be redundant.
results = await db.search_similar(
query_embedding=query_emb,
limit=limit,
@@ -86,17 +111,37 @@ async def search_case_documents(
async def find_similar_cases(
description: str,
limit: int = 5,
practice_area: str = "",
appeal_subtype: str = "",
case_number: str = "",
) -> str:
"""מציאת תיקים דומים על בסיס תיאור.
"""מציאת תיקים דומים על בסיס תיאור — מסונן לפי תחום משפטי.
Args:
description: תיאור התיק או הנושא (לדוגמה: "ערר על סירוב להיתר בנייה לתוספת קומה")
description: תיאור התיק או הנושא
limit: מספר תוצאות מקסימלי
practice_area: תחום משפטי לסינון
appeal_subtype: סוג ערר לסינון
case_number: אם סופק, ה-practice_area/subtype יוסקו אוטומטית מהתיק
"""
if case_number and not practice_area:
case = await db.get_case_by_number(case_number)
if case:
practice_area = case.get("practice_area") or ""
appeal_subtype = appeal_subtype or (case.get("appeal_subtype") or "")
if not practice_area:
logger.warning(
"find_similar_cases called without practice_area filter — "
"results may mix legal domains"
)
query_emb = await embeddings.embed_query(description)
results = await db.search_similar(
query_embedding=query_emb,
limit=limit * 3, # Get more to deduplicate by case
practice_area=practice_area or None,
appeal_subtype=appeal_subtype or None,
)
if not results: