legal-ai/web/chat_system_prompt.py
Chaim bb0cd7c6a2
feat(training): Style Studio — upload, rich corpus, lessons, curator portrait, chat
Six-phase upgrade of /training from a read-only dashboard into a full
Style Studio for managing Daphna's style corpus.

- Upload Sheet on /training: file → proofread preview → commit (no more
  CLI-only `upload-training` skill).
- Rich corpus metadata: GET /api/training/corpus returns summary, outcome,
  key_principles, page_count, parties (regex), legal_citation, lessons_count.
  PATCH endpoint for chair edits. CorpusDetailDrawer with 4 tabs (details
  /content/lessons/patterns) replaces the bare table row.
- LLM metadata enrichment: style_metadata_extractor + MCP tools
  (style_corpus_enrich, style_corpus_pending_enrichment) fill summary
  /outcome/key_principles via claude_session (free, host-side).
- Per-decision lessons: new decision_lessons table + 4 REST endpoints +
  LessonsTab in drawer; hermes-curator now auto-posts findings as
  decision_lessons(source=curator).
- Curator Portrait tab: prompt rendered with link to Gitea, recent
  curator findings, style_analyzer training prompts, propose-change
  form that writes proposals to data/curator-proposals/ for manual
  chair review (no auto-mutation of the agent file).
- Style chat tab: SSE-streamed conversations with the style agent.
  New host-side pm2 service (legal-chat-service, port 8770) wraps
  claude CLI with stream-json + --resume continuation; FastAPI proxies
  via host.docker.internal. Zero API cost: uses Chaim's claude.ai
  subscription. chat_conversations + chat_messages persist history.
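
The style chat bullet above mentions SSE-streamed conversations. As a rough illustration of what one SSE frame looks like on the wire, here is a minimal sketch. `format_sse` and the `delta` event name are hypothetical and are not taken from the project's code.

```python
import json
from typing import Optional


def format_sse(data: dict, event: Optional[str] = None) -> str:
    """Encode one Server-Sent Events frame (hypothetical helper)."""
    lines = []
    if event is not None:
        lines.append(f"event: {event}")
    # json.dumps escapes newlines and non-ASCII by default, so the payload
    # always fits on a single "data:" line.
    lines.append(f"data: {json.dumps(data)}")
    # A blank line terminates the frame.
    return "\n".join(lines) + "\n\n"


if __name__ == "__main__":
    print(format_sse({"text": "hi"}, event="delta"), end="")
```

A streaming response would emit one such frame per chunk. How the real service maps CLI output to frames is not shown in this commit message.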

Architecture: keeps the existing rule that claude_session only runs
on the host (not the container). The new legal-chat-service is the
canonical bridge between the container and the local CLI for the chat
feature; everything else (upload, metadata, lessons) stays within the
container's existing capabilities.

Audit script (scripts/audit_training_corpus.py) included for verifying
which corpus rows still need enrichment.
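
The audit script itself is not shown here. As a hedged sketch of the kind of check it might perform, the snippet below flags rows that still lack any of the three LLM-filled fields named above (summary, outcome, key_principles). The function names and the sample rows are assumptions for illustration only.

```python
# Fields the enrichment step is described as filling.
ENRICHED_FIELDS = ("summary", "outcome", "key_principles")


def needs_enrichment(row: dict) -> bool:
    """True when any enriched field is missing or empty (hypothetical rule)."""
    return any(not row.get(field) for field in ENRICHED_FIELDS)


def pending_decisions(rows: list) -> list:
    """Return the decision numbers of rows that still need enrichment."""
    return [r["decision_number"] for r in rows if needs_enrichment(r)]


if __name__ == "__main__":
    # Made-up rows, not real corpus data.
    sample = [
        {"decision_number": "A", "summary": "s", "outcome": "o", "key_principles": ["p"]},
        {"decision_number": "B", "summary": "", "outcome": "o", "key_principles": ["p"]},
        {"decision_number": "C", "summary": "s", "outcome": "o"},
    ]
    print(pending_decisions(sample))
```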

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
2026-05-27 10:06:22 +00:00

206 lines
8.7 KiB
Python
"""Compose the system prompt the style-chat agent receives.

The chat runs against the local ``claude`` CLI on the host (via
legal-chat-service). We assemble a once-per-conversation system block
that gives the agent everything it needs to discuss decisions in
Daphna's voice:

- The style guide (``skills/decision/SKILL.md``): how she writes
- The lessons file (``docs/legal-decision-lessons.md``): what we've
  learned across the corpus
- The corpus-analysis report (``docs/corpus-analysis.md``): the
  structural map of 24+ decisions
- A summary of every style_corpus row (number, date, subjects,
  chars + summary if extracted) so the agent can reason about the
  whole corpus without us shipping all of it inline
- Optional: when the conversation is scoped to a specific decision
  (``style_corpus_id``), append its full_text so the chat can dive
  into the text directly

Sent **once**, when the conversation is first created. On subsequent
messages the legal-chat-service uses ``claude --resume <session_id>``
and the on-disk CLI session keeps the system context intact, so there
is no need to re-ship the 100K+ chars of skills + lessons every turn.
"""
from __future__ import annotations

import logging
import os
from pathlib import Path
from uuid import UUID

from legal_mcp.services import db

logger = logging.getLogger(__name__)

# The reference files live in the repo at known paths. In the
# container they're mounted alongside the code, so the repo root
# defaults to the parent of the web/ directory that holds this module,
# unless LEGAL_AI_REPO_ROOT overrides it.
_REPO_ROOT = Path(os.environ.get(
    "LEGAL_AI_REPO_ROOT",
    str(Path(__file__).resolve().parent.parent),
))
_SKILLS_PATH = _REPO_ROOT / "skills" / "decision" / "SKILL.md"
_LESSONS_PATH = _REPO_ROOT / "docs" / "legal-decision-lessons.md"
_CORPUS_ANALYSIS_PATH = _REPO_ROOT / "docs" / "corpus-analysis.md"


def _safe_read(path: Path, cap_chars: int = 50_000) -> str:
    """Read a file (UTF-8) or return a marker that it's missing.

    The cap protects against accidentally injecting an enormous file:
    even at 50K, a single source file is the lion's share of the
    system prompt budget.
    """
    try:
        text = path.read_text(encoding="utf-8")
    except FileNotFoundError:
        # Hebrew marker: "(file <name> not found at path <path>)"
        return f"(קובץ {path.name} לא נמצא בנתיב {path})"
    except OSError as e:
        logger.warning("could not read %s: %s", path, e)
        # Hebrew marker: "(error reading <name>: <error>)"
        return f"(שגיאה בקריאת {path.name}: {e})"
    if len(text) > cap_chars:
        # Hebrew note: "[... cut at <cap> characters out of <total>]"
        return text[:cap_chars] + f"\n\n[... חתך ב-{cap_chars:,} תווים מתוך {len(text):,}]"
    return text


async def _corpus_summary_block() -> str:
    """Compact one-row-per-decision summary the agent can scan."""
    import json  # only needed to decode JSON-encoded subject_categories

    pool = await db.get_pool()
    async with pool.acquire() as conn:
        records = await conn.fetch(
            """
            SELECT decision_number, decision_date, appeal_subtype,
                   subject_categories, length(full_text) AS chars,
                   coalesce(summary, '') AS summary,
                   coalesce(outcome, '') AS outcome
            FROM style_corpus
            ORDER BY decision_date NULLS LAST
            """
        )
    if not records:
        return "(הקורפוס ריק)"  # Hebrew: "(the corpus is empty)"
    lines = []
    for r in records:
        cats = r["subject_categories"]
        if isinstance(cats, str):
            # The column may arrive as a JSON-encoded string instead of a list.
            try:
                cats = json.loads(cats)
            except json.JSONDecodeError:
                cats = []
        cats_str = ", ".join(cats) if cats else ""
        date_str = str(r["decision_date"]) if r["decision_date"] else ""
        summary = (r["summary"] or "").strip()
        outcome = (r["outcome"] or "").strip()
        # length(full_text) is NULL when full_text is NULL; fall back to 0
        # so the thousands-separator format below cannot raise on None.
        chars = r["chars"] or 0
        # Hebrew labels: תווים = characters, נושאים = subjects,
        # תקציר = summary, תוצאה = outcome.
        head = f"- **{r['decision_number'] or ''}** ({date_str}) [{r['appeal_subtype'] or ''}] · {chars:,} תווים"
        meta = f" נושאים: {cats_str}"
        body = ""
        if summary:
            body = f"\n תקציר: {summary}"
            if outcome:
                body += f" — תוצאה: {outcome}"
        elif outcome:
            body = f"\n תוצאה: {outcome}"
        lines.append(head + "\n" + meta + body)
    return "\n".join(lines)


async def _decision_full_text(corpus_id: UUID) -> str:
    """Return one decision's full text under a heading, or "" if not found."""
    pool = await db.get_pool()
    async with pool.acquire() as conn:
        row = await conn.fetchrow(
            "SELECT decision_number, decision_date, full_text "
            "FROM style_corpus WHERE id = $1",
            corpus_id,
        )
    if not row:
        return ""
    # Hebrew heading: "# Decision <number> (<date>)"
    header = f"# החלטה {row['decision_number']} ({row['decision_date']})\n\n"
    return header + (row["full_text"] or "")

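
# English gloss of the Hebrew prompt below:
# - Role: style agent for Adv. Daphna Tamir, chair of the Planning and
#   Building Appeals Committee, Jerusalem District. It helps Chaim,
#   Daphna's professional assistant, understand, analyze, and sharpen her
#   style. It does not write new decisions; it discusses existing ones,
#   spots patterns, and keeps future writers (the writer agent) faithful
#   to her voice.
# - Sources: (1) the style guide, (2) the generic corpus lessons, which
#   it must rely on rather than invent insights, (3) the structural
#   corpus analysis, (4) the list of corpus decisions, (5) optionally the
#   full text of one decision when the chat is pinned to a style_corpus_id.
# - Rules: answer in Hebrew; ground analysis in SKILL.md and the corpus
#   without inventing evidence; ask Chaim to attach a missing decision;
#   keep new guidance in conversation memory and suggest recording it as
#   a decision_lesson or a SKILL.md addition; no invented persona.
SYSTEM_PROMPT_HEADER = """\
אתה סוכן הסגנון של עו"ד דפנה תמיר, יו"ר ועדת הערר לתכנון ובניה — מחוז ירושלים.

תפקידך: לעזור לחיים (העוזר המקצועי של דפנה) להבין, לנתח ולחדד את הסגנון
של דפנה. אתה לא כותב החלטות חדשות; אתה דן בסגנון של החלטות קיימות,
מזהה דפוסים, מקפיד שהכותבים העתידיים (ה-writer agent) יישארו נאמנים
לקולה.

יש לך גישה ל:
1. **מדריך הסגנון** של דפנה (skills/decision/SKILL.md) — איך היא כותבת.
2. **הלקחים הגנריים** מהקורפוס (docs/legal-decision-lessons.md) — מה
   למדנו לאורך 24+ החלטות. **חובה** להישען על הקבצים האלה כשאתה דן
   בסגנון, ולא להמציא תובנות חדשות מהאוויר.
3. **ניתוח הקורפוס** המבני (docs/corpus-analysis.md) — מפת תוכן ופערים.
4. **רשימת ההחלטות בקורפוס** (למטה) — סקירה תמציתית של כל החלטה
   שעלתה ל-style_corpus.
5. **טקסט מלא של החלטה ספציפית** (אם השיחה הוצמדה ל-style_corpus_id).

כללי תקשורת:
- כל התשובות בעברית.
- חיים יושב מולך, לא דפנה — אבל המטרה היא לחדד את הסגנון *של דפנה*.
- אם חיים שואל "האם פסקה X מתאימה לסגנון של דפנה?" — תן ניתוח מנומק
  שמסתמך על SKILL.md ועל החלטות הקורפוס. אל תמציא ראיות.
- אם אתה צריך החלטה ספציפית שאין בקורפוס — הודע לחיים שיצרף אותה.
- אם חיים אומר לך משהו חדש על דפנה ("דפנה אומרת לעולם אל תפתח החלטה
  במילה X") — שמור את זה בזיכרון השיחה; אם זה מצדיק תיעוד קבוע, הצע
  לחיים להוסיף את זה כ-decision_lesson (POST /api/training/lessons)
  או כתוספת ל-SKILL.md.
- אל תיתן לעצמך אישיות מומצאת — אתה כלי-עזר מקצועי, לא חבר.
"""
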

async def build_system_prompt(
    *,
    corpus_id: UUID | None = None,
    include_corpus_summary: bool = True,
) -> str:
    """Assemble the full system prompt for a new chat conversation.

    Args:
        corpus_id: When set, the full_text of that decision is appended
            so the chat can dive into the text.
        include_corpus_summary: Set False for low-context chats (e.g. a
            quick "what does Daphna do at the end of a betterment-levy
            decision?"), where there is no need to ship 24 summaries.
    """
    parts: list[str] = [SYSTEM_PROMPT_HEADER]

    # Section heading: "Style guide"
    parts.append("\n## מדריך הסגנון (skills/decision/SKILL.md)\n")
    parts.append(_safe_read(_SKILLS_PATH, cap_chars=40_000))

    # Section heading: "Lessons from the corpus"
    parts.append("\n\n## לקחים מהקורפוס (docs/legal-decision-lessons.md)\n")
    parts.append(_safe_read(_LESSONS_PATH, cap_chars=30_000))

    # Section heading: "Structural corpus analysis"
    parts.append("\n\n## ניתוח קורפוס מבני (docs/corpus-analysis.md)\n")
    parts.append(_safe_read(_CORPUS_ANALYSIS_PATH, cap_chars=15_000))

    if include_corpus_summary:
        # Section heading: "List of decisions in the style corpus"
        parts.append("\n\n## רשימת ההחלטות בקורפוס הסגנון\n")
        try:
            parts.append(await _corpus_summary_block())
        except Exception as e:
            logger.warning("corpus summary failed: %s", e)
            # Hebrew marker: "(error loading the corpus list)"
            parts.append("(שגיאה בטעינת רשימת הקורפוס)")

    if corpus_id is not None:
        # Section heading: "The specific decision under discussion"
        parts.append("\n\n## ההחלטה הספציפית בדיון (full_text)\n")
        try:
            txt = await _decision_full_text(corpus_id)
            if txt:
                parts.append(txt[:200_000])  # hard cap
            else:
                # Hebrew marker: "(decision not found, check the corpus_id)"
                parts.append("(לא נמצאה החלטה — בדוק את ה-corpus_id)")
        except Exception as e:
            logger.warning("decision full_text failed: %s", e)
            # Hebrew marker: "(error loading the decision)"
            parts.append("(שגיאה בטעינת ההחלטה)")

    return "\n".join(parts)
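
To make the size budget of the assembled prompt easier to see, here is a small standalone sketch. It copies the three `cap_chars` values and the 200,000-character hard cap from `build_system_prompt`. The names `STATIC_CAPS`, `cap_text`, and `static_ceiling` are illustrative assumptions, and the truncation note is written in English rather than the Hebrew note `_safe_read` emits.

```python
# Caps copied from the cap_chars arguments in build_system_prompt.
STATIC_CAPS = {
    "skills/decision/SKILL.md": 40_000,
    "docs/legal-decision-lessons.md": 30_000,
    "docs/corpus-analysis.md": 15_000,
}
DECISION_TEXT_CAP = 200_000  # hard cap applied to a scoped decision's text


def cap_text(text: str, cap_chars: int) -> str:
    """Truncate the way _safe_read does, with an English note (illustrative)."""
    if len(text) > cap_chars:
        return text[:cap_chars] + f"\n\n[... cut at {cap_chars:,} of {len(text):,} chars]"
    return text


def static_ceiling() -> int:
    """Upper bound on reference-file characters, excluding truncation notes."""
    return sum(STATIC_CAPS.values())


if __name__ == "__main__":
    print(static_ceiling())
    print(cap_text("x" * 12, 5))
```

The ceiling covers only the three reference files. The header, the corpus summary, and an optional decision text of up to `DECISION_TEXT_CAP` characters come on top of it.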