feat(training): Style Studio — upload, rich corpus, lessons, curator portrait, chat
All checks were successful
Build & Deploy / build-and-deploy (push) Successful in 2m7s
Six-phase upgrade of /training from a read-only dashboard into a full Style Studio for managing Daphna's style corpus.

- Upload Sheet on /training: file → proofread preview → commit (no more CLI-only `upload-training` skill).
- Rich corpus metadata: GET /api/training/corpus returns summary, outcome, key_principles, page_count, parties (regex), legal_citation, lessons_count. PATCH endpoint for chair edits. CorpusDetailDrawer with 4 tabs (details/content/lessons/patterns) replaces the bare table row.
- LLM metadata enrichment: style_metadata_extractor + MCP tools (style_corpus_enrich, style_corpus_pending_enrichment) fill summary/outcome/key_principles via claude_session (free, host-side).
- Per-decision lessons: new decision_lessons table + 4 REST endpoints + LessonsTab in drawer; hermes-curator now auto-posts findings as decision_lessons(source=curator).
- Curator Portrait tab: prompt rendered with link to Gitea, recent curator findings, style_analyzer training prompts, propose-change form that writes proposals to data/curator-proposals/ for manual chair review (no auto-mutation of the agent file).
- Style chat tab: SSE-streamed conversations with the style agent. New host-side pm2 service (legal-chat-service, port 8770) wraps claude CLI with stream-json + --resume continuation; FastAPI proxies via host.docker.internal. Zero API cost: uses chaim's claude.ai subscription. chat_conversations + chat_messages persist history.

Architecture: keeps the existing rule that claude_session only runs on the host (not the container). The new legal-chat-service is the canonical bridge between the container and the local CLI for the chat feature; everything else (upload, metadata, lessons) stays within the container's existing capabilities.

Audit script (scripts/audit_training_corpus.py) included for verifying which corpus rows still need enrichment.

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
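The enrichment bullet and the audit note describe a "find rows with empty metadata, then fill them" loop. A minimal sketch of that loop follows, assuming the two tool functions return JSON envelopes with an `ok` flag as in the file below. The in-memory stubs, the `_FAKE_ROWS` data, the `enrich_pending` helper, and the fields in the stubbed success result are all hypothetical stand-ins, not the real implementation.

```python
import asyncio
import json

# Hypothetical in-memory stand-ins for the two MCP tools in this commit.
# The real ones query Postgres and call the claude CLI on the host; these
# only mimic the JSON envelope shape so the loop can run anywhere.
_FAKE_ROWS = {
    "11111111-1111-1111-1111-111111111111": ["summary", "outcome"],
    "not-a-uuid": ["key_principles"],
}


async def list_corpus_pending_enrichment(limit: int = 50) -> str:
    items = [
        {"corpus_id": cid, "missing": missing}
        for cid, missing in list(_FAKE_ROWS.items())[:limit]
    ]
    return json.dumps({"ok": True, "count": len(items), "items": items})


async def extract_decision_metadata(corpus_id: str, overwrite: bool = False) -> str:
    # Stubbed result: the real tool's success payload is not shown in the commit.
    if corpus_id == "not-a-uuid":
        return json.dumps({"ok": False, "error": "invalid corpus_id"})
    return json.dumps({"ok": True, "corpus_id": corpus_id})


async def enrich_pending(limit: int = 50) -> dict:
    """List rows with missing metadata, enrich each, and tally the outcomes."""
    pending = json.loads(await list_corpus_pending_enrichment(limit))
    report = {"enriched": [], "failed": {}}
    for item in pending["items"]:
        cid = item["corpus_id"]
        result = json.loads(await extract_decision_metadata(cid))
        if result["ok"]:
            report["enriched"].append(cid)
        else:
            report["failed"][cid] = result["error"]
    return report


report = asyncio.run(enrich_pending())
print(report)
```

Collecting failures rather than raising on the first one means a single bad row does not stop the rest of the batch; whether the real host-side workflow behaves this way is not stated in the commit.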
85
mcp-server/src/legal_mcp/tools/training_enrichment.py
Normal file
@@ -0,0 +1,85 @@
"""MCP tool wrappers for the style_corpus metadata-enrichment flow.

The actual extractor lives in
``legal_mcp.services.style_metadata_extractor``; this module just exposes
it as MCP tools that the chair (or a future automation) can call from
Claude Code.

Why these tools matter: the upload pipeline (`/api/training/upload` →
`_process_proofread_training`) inserts a style_corpus row with
``summary=''``, ``outcome=''``, ``key_principles=[]`` because LLM
extraction can't run from the FastAPI container (no claude CLI there).
This module fills that gap: call it from the host, where the ``claude``
CLI is available, and the row gets enriched.
"""

from __future__ import annotations

import json
from uuid import UUID

from legal_mcp.services import db, style_metadata_extractor


def _ok(payload) -> str:
    return json.dumps({"ok": True, **payload}, ensure_ascii=False, default=str)


def _err(msg: str) -> str:
    return json.dumps({"ok": False, "error": msg}, ensure_ascii=False)


async def extract_decision_metadata(corpus_id: str, overwrite: bool = False) -> str:
    """Extract metadata (summary, outcome, key_principles, appeal_subtype) for a decision in the style corpus.

    By default, ``overwrite=False`` fills only empty fields. Pass ``overwrite=true``
    to refresh values that were already written.
    """
    try:
        cid = UUID(corpus_id)
    except ValueError:
        return _err("corpus_id לא תקין")  # Hebrew for "invalid corpus_id"
    try:
        result = await style_metadata_extractor.extract_and_apply(cid, overwrite=overwrite)
    except Exception as e:
        return _err(str(e))
    return _ok(result)


async def list_corpus_pending_enrichment(limit: int = 50) -> str:
    """List style_corpus rows that are missing summary/outcome/key_principles (candidates for enrichment)."""
    pool = await db.get_pool()
    async with pool.acquire() as conn:
        rows = await conn.fetch(
            """
            SELECT id, decision_number, decision_date,
                   length(full_text) AS chars,
                   coalesce(summary, '') = '' AS missing_summary,
                   coalesce(outcome, '') = '' AS missing_outcome,
                   coalesce(jsonb_array_length(key_principles), 0) = 0 AS missing_principles
            FROM style_corpus
            WHERE coalesce(summary, '') = ''
               OR coalesce(outcome, '') = ''
               OR coalesce(jsonb_array_length(key_principles), 0) = 0
            ORDER BY decision_date NULLS LAST
            LIMIT $1
            """,
            limit,
        )
    items = [
        {
            "corpus_id": str(r["id"]),
            "decision_number": r["decision_number"] or "",
            "decision_date": str(r["decision_date"]) if r["decision_date"] else "",
            "chars": r["chars"],
            "missing": [
                f for f, v in (
                    ("summary", r["missing_summary"]),
                    ("outcome", r["missing_outcome"]),
                    ("key_principles", r["missing_principles"]),
                ) if v
            ],
        }
        for r in rows
    ]
    return _ok({"count": len(items), "items": items})
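Both tools return their result through the `_ok` / `_err` helpers, so it can help to see what those envelopes look like on the wire. The sketch below copies the two helper bodies from the file above and feeds them made-up sample values (the UUID, date, and summary text are illustrative only). It shows that `default=str` stringifies values such as `UUID` and `date`, and that `ensure_ascii=False` leaves Hebrew text readable instead of escaping it.

```python
import json
from datetime import date
from uuid import UUID


def _ok(payload: dict) -> str:
    # Same body as the helper in the file above.
    return json.dumps({"ok": True, **payload}, ensure_ascii=False, default=str)


def _err(msg: str) -> str:
    return json.dumps({"ok": False, "error": msg}, ensure_ascii=False)


# Made-up sample values, only to exercise the serializer.
sample = {
    "corpus_id": UUID("12345678-1234-5678-1234-567812345678"),
    "decision_date": date(2024, 1, 31),
    "summary": "תקציר",
}

ok_envelope = _ok(sample)
err_envelope = _err("corpus_id לא תקין")

print(ok_envelope)
print(err_envelope)

# A caller parses the string and branches on the "ok" flag.
parsed = json.loads(ok_envelope)
if parsed["ok"]:
    print("corpus_id came back as", type(parsed["corpus_id"]).__name__)
```

Because the UUID and date are stringified on the way out, a caller gets plain strings back after `json.loads` and has to re-parse them if typed values are needed.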