Six-phase upgrade of /training from a read-only dashboard into a full Style Studio for managing Daphna's style corpus.

- Upload Sheet on /training: file → proofread preview → commit (no more CLI-only `upload-training` skill).
- Rich corpus metadata: GET /api/training/corpus returns summary, outcome, key_principles, page_count, parties (regex), legal_citation, lessons_count. PATCH endpoint for chair edits. CorpusDetailDrawer with 4 tabs (details/content/lessons/patterns) replaces the bare table row.
- LLM metadata enrichment: style_metadata_extractor + MCP tools (style_corpus_enrich, style_corpus_pending_enrichment) fill summary/outcome/key_principles via claude_session (free, host-side).
- Per-decision lessons: new decision_lessons table + 4 REST endpoints + LessonsTab in drawer; hermes-curator now auto-posts findings as decision_lessons(source=curator).
- Curator Portrait tab: prompt rendered with link to Gitea, recent curator findings, style_analyzer training prompts, propose-change form that writes proposals to data/curator-proposals/ for manual chair review (no auto-mutation of the agent file).
- Style chat tab: SSE-streamed conversations with the style agent. New host-side pm2 service (legal-chat-service, port 8770) wraps claude CLI with stream-json + --resume continuation; FastAPI proxies via host.docker.internal. Zero API cost: uses chaim's claude.ai subscription. chat_conversations + chat_messages persist history.

Architecture: keeps the existing rule that claude_session only runs on the host (not the container). The new legal-chat-service is the canonical bridge between the container and the local CLI for the chat feature; everything else (upload, metadata, lessons) stays within the container's existing capabilities.

Audit script (scripts/audit_training_corpus.py) included for verifying which corpus rows still need enrichment.

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
206 lines
8.7 KiB
Python
"""Compose the system prompt the style-chat agent receives.

The chat runs against the local ``claude`` CLI on the host (via
legal-chat-service). We assemble a once-per-conversation system block
that gives the agent everything it needs to discuss decisions in
Daphna's voice:

- The style guide (``skills/decision/SKILL.md``): how she writes
- The lessons file (``docs/legal-decision-lessons.md``): what we've
  learned across the corpus
- The corpus-analysis report (``docs/corpus-analysis.md``): the
  structural map of 24+ decisions
- A summary of every style_corpus row (number, date, subtype,
  subjects, chars, plus summary and outcome if extracted) so the
  agent can reason about the whole corpus without us shipping all
  of it inline
- Optional: when the conversation is scoped to a specific decision
  (``style_corpus_id``), append its full_text so the chat can dive
  into the text directly

Sent **once**, when the conversation is first created. On subsequent
messages the legal-chat-service uses ``claude --resume <session_id>``
and the on-disk CLI session keeps the system context intact, so there
is no need to re-ship the 100K+ chars of skills + lessons every turn.
"""

from __future__ import annotations

import logging
import os
from pathlib import Path
from uuid import UUID

from legal_mcp.services import db

logger = logging.getLogger(__name__)


# The reference files live in the repo at known paths. In the
# container they're mounted alongside the code, so resolve relative
# to web/app.py's parent. LEGAL_AI_REPO_ROOT overrides the default.
_REPO_ROOT = Path(os.environ.get(
    "LEGAL_AI_REPO_ROOT",
    str(Path(__file__).resolve().parent.parent),
))


_SKILLS_PATH = _REPO_ROOT / "skills" / "decision" / "SKILL.md"
_LESSONS_PATH = _REPO_ROOT / "docs" / "legal-decision-lessons.md"
_CORPUS_ANALYSIS_PATH = _REPO_ROOT / "docs" / "corpus-analysis.md"


def _safe_read(path: Path, cap_chars: int = 50_000) -> str:
    """Read a file (UTF-8) or return a marker that it's missing or unreadable.

    The cap protects against accidentally injecting an enormous file:
    even at 50K, a single source file is the lion's share of the
    system prompt budget.
    """
    try:
        text = path.read_text(encoding="utf-8")
    except FileNotFoundError:
        # Hebrew marker: "(file <name> not found at path <path>)"
        return f"(קובץ {path.name} לא נמצא בנתיב {path})"
    except (OSError, UnicodeDecodeError) as e:
        # UnicodeDecodeError is a ValueError, not an OSError, so it is
        # listed explicitly; otherwise a badly encoded file would abort
        # prompt assembly instead of yielding a marker.
        logger.warning("could not read %s: %s", path, e)
        # Hebrew marker: "(error reading <name>: <error>)"
        return f"(שגיאה בקריאת {path.name}: {e})"
    if len(text) > cap_chars:
        # Hebrew marker: "[... cut at <cap> characters out of <total>]"
        return text[:cap_chars] + f"\n\n[... חתך ב-{cap_chars:,} תווים מתוך {len(text):,}]"
    return text


async def _corpus_summary_block() -> str:
    """Compact one-row-per-decision summary the agent can scan."""
    # Local import: only needed to decode string-encoded categories.
    import json

    pool = await db.get_pool()
    async with pool.acquire() as conn:
        # coalesce(length(...), 0) keeps `chars` an int even when
        # full_text is NULL, so the `:,` format below cannot fail.
        records = await conn.fetch(
            """
            SELECT decision_number, decision_date, appeal_subtype,
                   subject_categories,
                   coalesce(length(full_text), 0) AS chars,
                   coalesce(summary, '') AS summary,
                   coalesce(outcome, '') AS outcome
            FROM style_corpus
            ORDER BY decision_date NULLS LAST
            """
        )
    if not records:
        return "(הקורפוס ריק)"  # "(the corpus is empty)"

    lines = []
    for r in records:
        cats = r["subject_categories"]
        if isinstance(cats, str):
            # The column may arrive as a JSON-encoded string rather than a list.
            try:
                cats = json.loads(cats)
            except json.JSONDecodeError:
                cats = []
        cats_str = ", ".join(str(c) for c in cats) if cats else "—"
        date_str = str(r["decision_date"]) if r["decision_date"] else "—"
        summary = (r["summary"] or "").strip()
        outcome = (r["outcome"] or "").strip()
        # Hebrew labels: תווים = characters, נושאים = subjects,
        # תקציר = summary, תוצאה = outcome.
        head = f"- **{r['decision_number'] or '—'}** ({date_str}) [{r['appeal_subtype'] or '—'}] · {r['chars']:,} תווים"
        meta = f"  נושאים: {cats_str}"
        body = ""
        if summary:
            body = f"\n  תקציר: {summary}"
            if outcome:
                body += f" — תוצאה: {outcome}"
        elif outcome:
            body = f"\n  תוצאה: {outcome}"
        lines.append(head + "\n" + meta + body)
    return "\n".join(lines)


async def _decision_full_text(corpus_id: UUID) -> str:
    """Return one decision's full text under a header, or "" if the id is unknown."""
    pool = await db.get_pool()
    async with pool.acquire() as conn:
        row = await conn.fetchrow(
            "SELECT decision_number, decision_date, full_text "
            "FROM style_corpus WHERE id = $1",
            corpus_id,
        )
    if not row:
        return ""
    # Hebrew header: "# Decision <number> (<date>)"
    header = f"# החלטה {row['decision_number']} ({row['decision_date']})\n\n"
    return header + (row["full_text"] or "")


# Hebrew system prompt. In short: the agent is the style agent for
# Adv. Daphna Tamir, chair of the Planning and Building Appeals
# Committee (Jerusalem District). It helps Chaim, her professional
# assistant, analyze and sharpen her style using SKILL.md, the lessons
# file, the corpus analysis and the corpus list. It answers in Hebrew,
# does not write new decisions, and must not invent evidence.
SYSTEM_PROMPT_HEADER = """\
אתה סוכן הסגנון של עו"ד דפנה תמיר, יו"ר ועדת הערר לתכנון ובניה — מחוז ירושלים.

תפקידך: לעזור לחיים (העוזר המקצועי של דפנה) להבין, לנתח ולחדד את הסגנון
של דפנה. אתה לא כותב החלטות חדשות; אתה דן בסגנון של החלטות קיימות,
מזהה דפוסים, מקפיד שהכותבים העתידיים (ה-writer agent) יישארו נאמנים
לקולה.

יש לך גישה ל:
1. **מדריך הסגנון** של דפנה (skills/decision/SKILL.md) — איך היא כותבת.
2. **הלקחים הגנריים** מהקורפוס (docs/legal-decision-lessons.md) — מה
   למדנו לאורך 24+ החלטות. **חובה** להישען על הקבצים האלה כשאתה דן
   בסגנון, ולא להמציא תובנות חדשות מהאוויר.
3. **ניתוח הקורפוס** המבני (docs/corpus-analysis.md) — מפת תוכן ופערים.
4. **רשימת ההחלטות בקורפוס** (למטה) — סקירה תמציתית של כל החלטה
   שעלתה ל-style_corpus.
5. **טקסט מלא של החלטה ספציפית** (אם השיחה הוצמדה ל-style_corpus_id).

כללי תקשורת:
- כל התשובות בעברית.
- חיים יושב מולך, לא דפנה — אבל המטרה היא לחדד את הסגנון *של דפנה*.
- אם חיים שואל "האם פסקה X מתאימה לסגנון של דפנה?" — תן ניתוח מנומק
  שמסתמך על SKILL.md ועל החלטות הקורפוס. אל תמציא ראיות.
- אם אתה צריך החלטה ספציפית שאין בקורפוס — הודע לחיים שיצרף אותה.
- אם חיים אומר לך משהו חדש על דפנה ("דפנה אומרת לעולם אל תפתח החלטה
  במילה X") — שמור את זה בזיכרון השיחה; אם זה מצדיק תיעוד קבוע, הצע
  לחיים להוסיף את זה כ-decision_lesson (POST /api/training/lessons)
  או כתוספת ל-SKILL.md.
- אל תיתן לעצמך אישיות מומצאת — אתה כלי-עזר מקצועי, לא חבר.
"""


async def build_system_prompt(
    *,
    corpus_id: UUID | None = None,
    include_corpus_summary: bool = True,
) -> str:
    """Assemble the full system prompt for a new chat conversation.

    Args:
        corpus_id: When set, the full_text of that decision is appended
            so the chat can dive into the text.
        include_corpus_summary: Set False for low-context chats (e.g.
            quick "what does Daphna do at the end of a betterment-levy
            decision?", where there is no need to ship 24 summaries).
    """
    parts: list[str] = [SYSTEM_PROMPT_HEADER]

    # Section: the style guide.
    parts.append("\n## מדריך הסגנון (skills/decision/SKILL.md)\n")
    parts.append(_safe_read(_SKILLS_PATH, cap_chars=40_000))

    # Section: lessons from the corpus.
    parts.append("\n\n## לקחים מהקורפוס (docs/legal-decision-lessons.md)\n")
    parts.append(_safe_read(_LESSONS_PATH, cap_chars=30_000))

    # Section: structural corpus analysis.
    parts.append("\n\n## ניתוח קורפוס מבני (docs/corpus-analysis.md)\n")
    parts.append(_safe_read(_CORPUS_ANALYSIS_PATH, cap_chars=15_000))

    if include_corpus_summary:
        # Section: list of decisions in the style corpus.
        parts.append("\n\n## רשימת ההחלטות בקורפוס הסגנון\n")
        try:
            parts.append(await _corpus_summary_block())
        except Exception as e:
            logger.warning("corpus summary failed: %s", e)
            parts.append("(שגיאה בטעינת רשימת הקורפוס)")  # "(error loading the corpus list)"

    if corpus_id is not None:
        # Section: the specific decision under discussion.
        parts.append("\n\n## ההחלטה הספציפית בדיון (full_text)\n")
        try:
            txt = await _decision_full_text(corpus_id)
            if txt:
                parts.append(txt[:200_000])  # hard cap
            else:
                # "(decision not found; check the corpus_id)"
                parts.append("(לא נמצאה החלטה — בדוק את ה-corpus_id)")
        except Exception as e:
            logger.warning("decision full_text failed: %s", e)
            parts.append("(שגיאה בטעינת ההחלטה)")  # "(error loading the decision)"

    return "\n".join(parts)
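
The capping behaviour of `_safe_read` can be tried outside the service with a minimal standalone sketch like the one below. The name `read_capped`, the English marker strings and the temporary sample file are illustrative assumptions and not part of the module; the real helper returns Hebrew markers and logs read errors through the module logger.

```python
import tempfile
from pathlib import Path


def read_capped(path: Path, cap_chars: int = 50_000) -> str:
    """Hypothetical stand-in for _safe_read: read UTF-8 text, capped at cap_chars."""
    try:
        text = path.read_text(encoding="utf-8")
    except FileNotFoundError:
        # Return a marker instead of raising, so the prompt can still be built.
        return f"(file {path.name} not found at {path})"
    if len(text) > cap_chars:
        # Keep the first cap_chars characters and note how much was dropped.
        return text[:cap_chars] + f"\n\n[... cut at {cap_chars:,} of {len(text):,} chars]"
    return text


with tempfile.TemporaryDirectory() as tmp:
    sample = Path(tmp) / "SKILL.md"
    sample.write_text("x" * 120, encoding="utf-8")

    capped = read_capped(sample, cap_chars=100)     # longer than the cap
    missing = read_capped(Path(tmp) / "absent.md")  # file does not exist

print(capped[100:].strip())
print(missing)
```

Returning a marker string rather than raising means a missing reference file degrades one section of the prompt instead of failing the whole conversation start, which appears to be the intent of the original helper.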