feat(training): Style Studio — upload, rich corpus, lessons, curator portrait, chat

Six-phase upgrade of /training from a read-only dashboard into a full
Style Studio for managing Daphna's style corpus.

- Upload Sheet on /training: file → proofread preview → commit (no more
  CLI-only `upload-training` skill).
- Rich corpus metadata: GET /api/training/corpus returns summary, outcome,
  key_principles, page_count, parties (regex), legal_citation, lessons_count.
  PATCH endpoint for chair edits. CorpusDetailDrawer with 4 tabs (details
  /content/lessons/patterns) replaces the bare table row.
- LLM metadata enrichment: style_metadata_extractor + MCP tools
  (style_corpus_enrich, style_corpus_pending_enrichment) fill summary
  /outcome/key_principles via claude_session (free, host-side).
- Per-decision lessons: new decision_lessons table + 4 REST endpoints +
  LessonsTab in drawer; hermes-curator now auto-posts findings as
  decision_lessons(source=curator).
- Curator Portrait tab: prompt rendered with link to Gitea, recent
  curator findings, style_analyzer training prompts, propose-change
  form that writes proposals to data/curator-proposals/ for manual
  chair review (no auto-mutation of the agent file).
- Style chat tab: SSE-streamed conversations with the style agent.
  New host-side pm2 service (legal-chat-service, port 8770) wraps
  claude CLI with stream-json + --resume continuation; FastAPI proxies
  via host.docker.internal. Zero API cost — uses chaim's claude.ai
  subscription. chat_conversations + chat_messages persist history.

Architecture: keeps the existing rule that claude_session only runs
on the host (not the container). The new legal-chat-service is the
canonical bridge between the container and the local CLI for the chat
feature; everything else (upload, metadata, lessons) stays within the
container's existing capabilities.

Audit script (scripts/audit_training_corpus.py) included for verifying
which corpus rows still need enrichment.

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
2026-05-27 10:06:22 +00:00
parent 0629f19d5f
commit bb0cd7c6a2
23 changed files with 4568 additions and 75 deletions


@@ -76,6 +76,24 @@ profiles:
Authorization: Bearer $PAPERCLIP_API_KEY
{ "body": "<my findings>" }
```
5b. **Also record every finding in the legal-ai API as a decision_lesson**, so that it appears in the UI
under the decision's "What we learned" ("מה למדנו") tab in the corpus. Prerequisite: first find the `style_corpus_id`
that matches the decision's `decision_number` (`GET /api/training/corpus`, then filter).
For each finding:
```
POST https://legal-ai.nautilus.marcusgroup.org/api/training/corpus/{corpus_id}/lessons
Content-Type: application/json
{
"lesson_text": "<one-line summary of the finding: what I observed + suggestion>",
"category": "<style|structure|lexicon|tabular|general>",
"source": "curator"
}
```
Mapping of finding tags to `category`:
- `[סגנון]` (style) → `style`
- `[מבנה]` (structure) → `structure`
- `[לקסיקון משפטי]` (legal lexicon) → `lexicon`
- `[טבלאי]` (tabular) → `tabular`
6. Close the issue (status=done) after I have written the comment
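The tag-to-category mapping in step 5b can be sketched as a small helper. This is a minimal illustration, not the curator's actual code: `build_lesson_payload` is a hypothetical name, and falling back to `general` for unmapped tags is an assumption (the list above does not say what to do with them).

```python
import json

# Tag -> category mapping copied from step 5b.
TAG_TO_CATEGORY = {
    "[סגנון]": "style",
    "[מבנה]": "structure",
    "[לקסיקון משפטי]": "lexicon",
    "[טבלאי]": "tabular",
}


def build_lesson_payload(tag: str, lesson_text: str) -> dict:
    """Build the JSON body for POST /api/training/corpus/{corpus_id}/lessons."""
    return {
        # Collapse whitespace so the lesson stays on a single line.
        "lesson_text": " ".join(lesson_text.split()),
        # Assumption: tags without a mapping fall back to "general".
        "category": TAG_TO_CATEGORY.get(tag, "general"),
        "source": "curator",
    }


if __name__ == "__main__":
    payload = build_lesson_payload("[מבנה]", "Headings skip a level;\n renumber them")
    print(json.dumps(payload, ensure_ascii=False, indent=2))
```

The payload is built separately from the HTTP call so the mapping can be checked without touching the API.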
## Comment format


@@ -91,6 +91,16 @@
- Code changes take effect after `pm2 restart paperclip`
- **No Docker or Coolify needed**
**legal-chat-service**: runs **locally via pm2** (new, since April 2026):
- Port: `localhost:8770` (loopback only)
- A small aiohttp service that wraps the `claude` CLI with streaming + session continuation and serves the "Chat" ("שיחה") tab on the `/training` page. The container proxies to it via `host.docker.internal:8770`.
- Code: [mcp-server/src/legal_mcp/chat_service/](mcp-server/src/legal_mcp/chat_service/)
- Install: `pm2 start /home/chaim/legal-ai/scripts/legal-chat-service.config.cjs && pm2 save`
- Health: `curl http://127.0.0.1:8770/health` → `{"ok":true,...}`
- Code changes: `pm2 restart legal-chat-service`
- **Zero API cost**: the claude CLI uses chaim's claude.ai subscription. The baseline assumption of `claude_session.py` (local claude CLI only) is preserved; this service is the official bridge between the container and the outside.
- Coolify dependency: the legal-ai Service Definition must contain `extra_hosts: host.docker.internal:host-gateway` (otherwise the proxy gets a ConnectError).
---
## Directory structure


@@ -0,0 +1,13 @@
"""legal-chat-service — host-side SSE bridge to ``claude`` CLI.

Runs as a pm2-managed process on the host (bound to 127.0.0.1:8770 by default).
The legal-ai FastAPI container proxies chat requests to it via
``host.docker.internal:8770``.

Why a separate service:
    The chat needs real-time streaming + multi-turn session continuation
    (``claude --resume <session_id>``). The container can't run the
    claude CLI (no binary, no claude.ai credentials). Splitting this out
    keeps the architectural rule of ``claude_session.py`` intact while
    enabling the new chat feature for free (no API key).
"""


@@ -0,0 +1,144 @@
"""HTTP+SSE bridge from FastAPI (in container) to local claude CLI.

Endpoints:
    POST /chat/start  body: {prompt, system?, resume_session_id?, timeout?}
                      returns SSE stream of events from
                      ``claude_session.query_streaming``.
    GET  /health      liveness probe.

Run with pm2:
    pm2 start ecosystem.config.cjs --only legal-chat-service

Standalone for dev:
    cd ~/legal-ai/mcp-server
    .venv/bin/python -m legal_mcp.chat_service.server --port 8770

We intentionally bind to 127.0.0.1 only: the FastAPI container reaches
us via ``host.docker.internal``, and exposing the bridge publicly would
let anyone run claude CLI commands against Daphna's session.
"""
from __future__ import annotations

import argparse
import asyncio
import json
import logging
import os
import sys
from typing import Any

from aiohttp import web

# Run-via-CLI bootstrap so ``python -m legal_mcp.chat_service.server``
# works even when the package isn't installed (it is in the venv, but
# this safeguard keeps the entrypoint robust).
_pkg_root = os.path.dirname(os.path.dirname(os.path.dirname(__file__)))
if _pkg_root not in sys.path:
    sys.path.insert(0, _pkg_root)

from legal_mcp.services import claude_session  # noqa: E402

logger = logging.getLogger("legal_chat_service")


async def health(request: web.Request) -> web.Response:
    return web.json_response({"ok": True, "service": "legal-chat-service"})


async def chat_start(request: web.Request) -> web.StreamResponse:
    """Drive ``claude_session.query_streaming`` and forward events as SSE.

    Request body (JSON):
        prompt: str                      required, user message
        system: str | None               system instructions (ignored if resuming)
        resume_session_id: str | None    continue a prior CLI session
        timeout: int = 3600              hard timeout for the subprocess, in seconds
    """
    try:
        body = await request.json()
    except json.JSONDecodeError:
        return web.json_response({"error": "invalid JSON body"}, status=400)
    # Valid JSON that is not an object (e.g. a list) has no .get().
    if not isinstance(body, dict):
        return web.json_response({"error": "body must be a JSON object"}, status=400)

    prompt = body.get("prompt") or ""
    if not isinstance(prompt, str) or not prompt.strip():
        return web.json_response({"error": "prompt is required"}, status=400)

    system = body.get("system")
    resume_session_id = body.get("resume_session_id")
    # Reject a non-numeric timeout with a 400 instead of an unhandled 500.
    try:
        timeout = int(body.get("timeout") or 3600)
    except (TypeError, ValueError):
        return web.json_response({"error": "timeout must be an integer"}, status=400)

    response = web.StreamResponse(
        status=200,
        reason="OK",
        headers={
            "Content-Type": "text/event-stream",
            "Cache-Control": "no-cache, no-transform",
            "Connection": "keep-alive",
            # X-Accel-Buffering=no defeats nginx/traefik buffering. The
            # FastAPI container proxies via httpx and forwards bytes as
            # they arrive, but the inner header is harmless and makes
            # browser-direct testing easier.
            "X-Accel-Buffering": "no",
        },
    )
    await response.prepare(request)

    async def send_event(payload: dict[str, Any]) -> None:
        # One JSON object per SSE frame; json.dumps escapes newlines, so
        # each event fits on a single ``data:`` line.
        line = f"data: {json.dumps(payload, ensure_ascii=False)}\n\n"
        await response.write(line.encode("utf-8"))

    try:
        async for event in claude_session.query_streaming(
            prompt,
            system=system,
            resume_session_id=resume_session_id,
            timeout=timeout,
        ):
            await send_event(event)
            if event.get("type") in ("done", "error"):
                break
    except asyncio.CancelledError:
        # Client disconnected; bail cleanly.
        logger.info("chat_start: client disconnected")
    except Exception as e:
        logger.exception("chat_start: streaming failed")
        try:
            await send_event({"type": "error", "message": str(e)})
        except ConnectionResetError:
            pass

    try:
        await response.write_eof()
    except ConnectionResetError:
        pass
    return response


def build_app() -> web.Application:
    app = web.Application()
    app.router.add_get("/health", health)
    app.router.add_post("/chat/start", chat_start)
    return app


def main() -> int:
    parser = argparse.ArgumentParser(description="legal-chat-service")
    parser.add_argument("--port", type=int, default=8770)
    parser.add_argument("--host", default="127.0.0.1",
                        help="bind address; 127.0.0.1 keeps the service "
                             "loopback-only; leave it alone in production")
    parser.add_argument("--log-level", default="INFO")
    args = parser.parse_args()

    logging.basicConfig(
        level=args.log_level.upper(),
        format="%(asctime)s %(name)s %(levelname)s %(message)s",
    )
    app = build_app()
    web.run_app(app, host=args.host, port=args.port, print=lambda _msg: None)
    return 0


if __name__ == "__main__":
    sys.exit(main())


@@ -57,6 +57,7 @@ from legal_mcp.tools import ( # noqa: E402
    legal_arguments as la_tools,
    missing_precedents as mp_tools,
    citations as cit_tools,
    training_enrichment as train_tools,
)
@@ -248,6 +249,18 @@ async def precedent_extract_metadata(case_law_id: str) -> str:
    return await plib.precedent_extract_metadata(case_law_id)


@mcp.tool()
async def style_corpus_enrich(corpus_id: str, overwrite: bool = False) -> str:
    """חילוץ מטא-דאטה (summary, outcome, key_principles, appeal_subtype) להחלטה בקורפוס הסגנון של דפנה. ברירת מחדל: ממלא רק שדות ריקים. שלח `overwrite=true` כדי לרענן."""
    return await train_tools.extract_decision_metadata(corpus_id, overwrite=overwrite)


@mcp.tool()
async def style_corpus_pending_enrichment(limit: int = 50) -> str:
    """רשימת החלטות בקורפוס הסגנון שעדיין חסרות summary/outcome/key_principles — מועמדות לחילוץ."""
    return await train_tools.list_corpus_pending_enrichment(limit)


@mcp.tool()
async def precedent_process_pending(kind: str = "metadata", limit: int = 20) -> str:
    """ריקון תור בקשות חילוץ שנשלחו מ-UI. kind: 'metadata' או 'halacha'. מריץ extractor מקומית עם CLI על כל פריט בתור, ומנקה את הסימון אחרי הצלחה."""


@@ -142,3 +142,175 @@ async def query_json(
    """
    raw = await query(prompt, timeout=timeout, system=system)
    return parse_llm_json(raw)
# ── Streaming + session continuation ────────────────────────────────


async def query_streaming(
    prompt: str,
    *,
    system: str | None = None,
    resume_session_id: str | None = None,
    timeout: int = LONG_TIMEOUT,
    cwd: str | None = None,
):
    """Stream Claude's response as an async iterator of events.

    Wraps `claude -p --output-format=stream-json` (newline-delimited JSON
    objects from the CLI) and translates each line into a small, stable
    shape that the chat service / SSE proxy can forward without leaking
    CLI internals to the browser.

    Event shapes yielded:
        {"type": "session_id", "value": "<uuid>"}    # emitted once, used for resume
        {"type": "text_delta", "text": "<partial>"}  # one event per assistant text block
        {"type": "tool_use", "name": "...", "input": {...}}
        {"type": "error", "message": "..."}
        {"type": "done", "text": "<full response>"}

    The CLI emits a richer stream; we project to this minimal set so the
    front-end can stay stable across CLI upgrades.

    Args:
        prompt: The user message to send.
        system: Optional system instructions (used only when starting a
            fresh conversation; when resume_session_id is set, the
            session already carries its system prompt).
        resume_session_id: Continue a prior conversation. When given,
            we don't re-send the system prompt; the CLI loads the
            entire conversation history from disk.
        timeout: Hard ceiling on the subprocess, in seconds.
        cwd: Working directory for the subprocess. ``None`` (the default)
            inherits the caller's working directory; this function does
            not itself switch to the host's HOME.
    """
    if resume_session_id:
        # When resuming, system is already baked into the on-disk session;
        # sending it again would be a no-op at best and confuse the
        # conversation at worst.
        full_prompt = prompt
        cmd = [
            "claude", "-p",
            "--output-format", "stream-json",
            "--verbose",
            "--resume", resume_session_id,
        ]
    else:
        full_prompt = f"{system}\n\n{prompt}" if system else prompt
        cmd = [
            "claude", "-p",
            "--output-format", "stream-json",
            "--verbose",
        ]

    if len(full_prompt) > 200_000:
        logger.warning(
            "Streaming: large prompt (%d chars); may hit CLI input limits",
            len(full_prompt),
        )

    try:
        proc = await asyncio.create_subprocess_exec(
            *cmd,
            stdin=asyncio.subprocess.PIPE,
            stdout=asyncio.subprocess.PIPE,
            stderr=asyncio.subprocess.PIPE,
            cwd=cwd,
        )
    except FileNotFoundError:
        yield {
            "type": "error",
            "message": (
                "Claude CLI not found on host: legal-chat-service must "
                "run where the `claude` binary is installed (Daphna's host, "
                "not the legal-ai container)."
            ),
        }
        return

    assert proc.stdin is not None  # for type checkers
    assert proc.stdout is not None

    # Send the prompt and close stdin so the CLI knows the user message
    # is complete.
    try:
        proc.stdin.write(full_prompt.encode("utf-8"))
        await proc.stdin.drain()
        proc.stdin.close()
    except (BrokenPipeError, ConnectionResetError):
        # CLI exited before reading the prompt; drain stderr and bail.
        stderr_b = await proc.stderr.read() if proc.stderr else b""
        yield {
            "type": "error",
            "message": f"Claude CLI closed stdin early: {stderr_b.decode('utf-8', errors='replace')[:300]}",
        }
        return

    accumulated_text: list[str] = []
    session_id_emitted = False
    # One deadline for the whole stream: each readline gets only the
    # time that is left, so the total never exceeds ``timeout``.
    loop = asyncio.get_running_loop()
    deadline = loop.time() + timeout

    try:
        while True:
            remaining = deadline - loop.time()
            if remaining <= 0:
                yield {"type": "error", "message": f"timed out after {timeout}s"}
                break
            try:
                line_b = await asyncio.wait_for(proc.stdout.readline(), timeout=remaining)
            except asyncio.TimeoutError:
                yield {"type": "error", "message": f"stream timed out after {timeout}s"}
                break
            if not line_b:
                break  # EOF: the CLI closed stdout
            line = line_b.decode("utf-8", errors="replace").strip()
            if not line:
                continue
            try:
                event = json.loads(line)
            except json.JSONDecodeError:
                # Stray non-JSON line from CLI; surface a snippet for debug.
                logger.debug("non-JSON stream line: %s", line[:120])
                continue

            # The CLI's stream-json emits several event types. We only
            # care about the ones the chat service forwards.
            t = event.get("type")
            if not session_id_emitted:
                sid = event.get("session_id")
                if sid:
                    session_id_emitted = True
                    yield {"type": "session_id", "value": sid}

            if t == "assistant":
                # event["message"]["content"] is a list of blocks; we extract
                # text blocks and tool_use blocks.
                msg = event.get("message") or {}
                for block in msg.get("content") or []:
                    btype = block.get("type")
                    if btype == "text":
                        text = block.get("text") or ""
                        if text:
                            accumulated_text.append(text)
                            yield {"type": "text_delta", "text": text}
                    elif btype == "tool_use":
                        yield {
                            "type": "tool_use",
                            "name": block.get("name") or "",
                            "input": block.get("input") or {},
                        }
            elif t == "result":
                # Final synthesized result line from the CLI; we already
                # delivered the deltas, so just stop here.
                break
    finally:
        # Make sure the subprocess never outlives the generator.
        if proc.returncode is None:
            try:
                proc.kill()
            except ProcessLookupError:
                pass
        try:
            await proc.wait()
        except Exception:
            pass

    yield {"type": "done", "text": "".join(accumulated_text)}
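The read loop in `query_streaming` enforces one overall deadline across many awaits by recomputing the remaining budget before each read. A minimal, self-contained sketch of that pattern follows; it uses an `asyncio.Queue` in place of the subprocess's stdout, and `read_until_deadline` is a hypothetical name used only for illustration.

```python
import asyncio


async def read_until_deadline(queue: asyncio.Queue, timeout: float) -> tuple[list, bool]:
    """Drain ``queue`` until a ``None`` sentinel or until ``timeout`` seconds pass.

    Returns (items, timed_out). The timeout covers the whole drain,
    not each individual read.
    """
    loop = asyncio.get_running_loop()
    deadline = loop.time() + timeout
    items: list = []
    while True:
        remaining = deadline - loop.time()
        if remaining <= 0:
            return items, True
        try:
            # Each read may only use what is left of the shared budget.
            item = await asyncio.wait_for(queue.get(), timeout=remaining)
        except asyncio.TimeoutError:
            return items, True
        if item is None:  # sentinel standing in for EOF on stdout
            return items, False
        items.append(item)


async def _demo() -> None:
    finished: asyncio.Queue = asyncio.Queue()
    for item in ("a", "b", None):
        finished.put_nowait(item)
    print(await read_until_deadline(finished, timeout=1.0))

    stalled: asyncio.Queue = asyncio.Queue()
    stalled.put_nowait("a")  # no sentinel: the producer "hangs"
    print(await read_until_deadline(stalled, timeout=0.05))


if __name__ == "__main__":
    asyncio.run(_demo())
```

Passing a fixed per-read timeout instead would let a slow but steady producer run forever; the shrinking `remaining` value is what bounds the whole stream.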


@@ -194,6 +194,55 @@ ALTER TABLE style_corpus ADD COLUMN IF NOT EXISTS appeal_subtype TEXT DEFAULT ''
-- Extend style_patterns with appeal_subtype so style can be analyzed separately per appeal type
ALTER TABLE style_patterns ADD COLUMN IF NOT EXISTS appeal_subtype TEXT DEFAULT '';
-- decision_lessons: per-decision learnings the chair / curator / style_analyzer
-- attaches to a corpus row. The generic legal-decision-lessons.md file stays
-- as the source of truth for cross-corpus patterns; this table stores the
-- granular "what we learned from THIS decision" notes that drive the writer's
-- future drafts and let the curator look up prior observations on the same row.
CREATE TABLE IF NOT EXISTS decision_lessons (
id UUID PRIMARY KEY DEFAULT uuid_generate_v4(),
style_corpus_id UUID NOT NULL REFERENCES style_corpus(id) ON DELETE CASCADE,
lesson_text TEXT NOT NULL,
category TEXT DEFAULT 'general', -- style / structure / lexicon / tabular / general
source TEXT DEFAULT 'manual', -- manual / curator / chair / style_analyzer
applied_to_skill BOOLEAN DEFAULT false, -- has this been promoted into SKILL.md?
created_by TEXT DEFAULT 'chaim',
created_at TIMESTAMPTZ DEFAULT now(),
updated_at TIMESTAMPTZ DEFAULT now()
);
CREATE INDEX IF NOT EXISTS idx_decision_lessons_corpus ON decision_lessons(style_corpus_id);
CREATE INDEX IF NOT EXISTS idx_decision_lessons_applied ON decision_lessons(applied_to_skill);
-- chat_conversations / chat_messages: persistent history for the
-- "שיחה עם הסוכן" tab on /training. Each conversation can optionally be
-- scoped to a single style_corpus row (when the chair starts a chat
-- "about decision X"). claude_session_id is the value the local claude
-- CLI returns in stream-json — we pass it back via `--resume` on the
-- next message so the model continues the same conversation without
-- re-loading the system prompt every time.
CREATE TABLE IF NOT EXISTS chat_conversations (
id UUID PRIMARY KEY DEFAULT uuid_generate_v4(),
title TEXT NOT NULL DEFAULT 'שיחה חדשה',
style_corpus_id UUID REFERENCES style_corpus(id) ON DELETE SET NULL,
claude_session_id TEXT,
system_prompt_version TEXT DEFAULT 'v1',
created_at TIMESTAMPTZ DEFAULT now(),
last_message_at TIMESTAMPTZ DEFAULT now()
);
CREATE TABLE IF NOT EXISTS chat_messages (
id UUID PRIMARY KEY DEFAULT uuid_generate_v4(),
conversation_id UUID NOT NULL REFERENCES chat_conversations(id) ON DELETE CASCADE,
role TEXT NOT NULL, -- 'user' | 'assistant'
content TEXT NOT NULL,
raw_events JSONB DEFAULT '[]', -- stream-json events for the assistant turn (optional, for debug)
created_at TIMESTAMPTZ DEFAULT now()
);
CREATE INDEX IF NOT EXISTS idx_chat_messages_conv ON chat_messages(conversation_id, created_at);
CREATE INDEX IF NOT EXISTS idx_chat_conv_corpus ON chat_conversations(style_corpus_id);
CREATE INDEX IF NOT EXISTS idx_chat_conv_last ON chat_conversations(last_message_at DESC);
-- qa_results table
CREATE TABLE IF NOT EXISTS qa_results (
id UUID PRIMARY KEY DEFAULT uuid_generate_v4(),
@@ -1609,6 +1658,284 @@ async def delete_from_style_corpus(corpus_id: UUID) -> dict:
    }


async def get_style_corpus_row(corpus_id: UUID) -> dict | None:
    """Return a single style_corpus row by id, or None if missing."""
    pool = await get_pool()
    async with pool.acquire() as conn:
        row = await conn.fetchrow(
            """
            SELECT id, document_id, decision_number, decision_date,
                   subject_categories, full_text, summary, outcome,
                   key_principles, practice_area, appeal_subtype, created_at
            FROM style_corpus WHERE id = $1
            """,
            corpus_id,
        )
    return dict(row) if row else None


async def update_style_corpus_metadata(
    corpus_id: UUID,
    *,
    summary: str | None = None,
    outcome: str | None = None,
    key_principles: list[str] | None = None,
    appeal_subtype: str | None = None,
    practice_area: str | None = None,
    overwrite: bool = False,
) -> dict:
    """Patch the enriched-metadata columns of a style_corpus row.

    By default, only empty columns are filled; passing ``overwrite=True``
    is the caller's signal that they intentionally want to replace existing
    values (used by the re-extract flow when the chair runs it manually).
    """
    pool = await get_pool()
    async with pool.acquire() as conn:
        existing = await conn.fetchrow(
            "SELECT summary, outcome, key_principles, appeal_subtype, practice_area "
            "FROM style_corpus WHERE id = $1",
            corpus_id,
        )
        if not existing:
            return {"updated": False, "reason": "not found"}

        sets: dict = {}
        if summary is not None and (overwrite or not (existing["summary"] or "").strip()):
            sets["summary"] = summary
        if outcome is not None and (overwrite or not (existing["outcome"] or "").strip()):
            sets["outcome"] = outcome
        if key_principles is not None:
            # key_principles may come back as a JSON string or a decoded list.
            current = existing["key_principles"]
            if isinstance(current, str):
                try:
                    current = json.loads(current)
                except json.JSONDecodeError:
                    current = []
            if overwrite or not (current or []):
                sets["key_principles"] = json.dumps(key_principles)
        if appeal_subtype is not None and (overwrite or not (existing["appeal_subtype"] or "").strip()):
            sets["appeal_subtype"] = appeal_subtype
        if practice_area is not None and (overwrite or not (existing["practice_area"] or "").strip()):
            sets["practice_area"] = practice_area

        if not sets:
            return {"updated": False, "reason": "nothing to update", "fields": []}

        # Column names come from the fixed keys above, so interpolating
        # them is safe; values are always bound as $2..$n parameters.
        cols = list(sets.keys())
        set_clause = ", ".join(f"{c} = ${i + 2}" for i, c in enumerate(cols))
        values = [sets[c] for c in cols]
        await conn.execute(
            f"UPDATE style_corpus SET {set_clause} WHERE id = $1",
            corpus_id, *values,
        )
        return {"updated": True, "fields": cols}
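The fill-only-empty policy of `update_style_corpus_metadata` can be shown without a database. The sketch below is a simplified, pure-Python restatement under the assumption that "empty" means `None`, a blank string, or an empty list; `plan_updates` is a hypothetical name and the sample values are invented.

```python
from typing import Any


def _is_empty(value: Any) -> bool:
    """None, blank strings and empty lists all count as 'not filled yet'."""
    if value is None:
        return True
    if isinstance(value, str):
        return not value.strip()
    return len(value) == 0


def plan_updates(existing: dict, proposed: dict, *, overwrite: bool = False) -> dict:
    """Return only the proposed fields that are allowed to be written."""
    return {
        field: value
        for field, value in proposed.items()
        if value is not None and (overwrite or _is_empty(existing.get(field)))
    }


if __name__ == "__main__":
    existing = {"summary": "chair-edited summary", "outcome": " ", "key_principles": []}
    proposed = {"summary": "LLM summary", "outcome": "rejected", "key_principles": ["p1"]}
    print(plan_updates(existing, proposed))                  # summary is preserved
    print(plan_updates(existing, proposed, overwrite=True))  # everything is replaced
```

With the default `overwrite=False`, re-running enrichment never clobbers a value the chair already edited, which is what makes batch re-runs safe.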


# ── decision_lessons (per-corpus row notes) ────────────────────────


async def list_decision_lessons(corpus_id: UUID) -> list[dict]:
    pool = await get_pool()
    async with pool.acquire() as conn:
        rows = await conn.fetch(
            "SELECT id, style_corpus_id, lesson_text, category, source, "
            "       applied_to_skill, created_by, created_at, updated_at "
            "FROM decision_lessons WHERE style_corpus_id = $1 "
            "ORDER BY created_at DESC",
            corpus_id,
        )
    return [dict(r) for r in rows]


async def add_decision_lesson(
    corpus_id: UUID,
    *,
    lesson_text: str,
    category: str = "general",
    source: str = "manual",
    created_by: str = "chaim",
) -> dict:
    pool = await get_pool()
    async with pool.acquire() as conn:
        row = await conn.fetchrow(
            "INSERT INTO decision_lessons "
            "(style_corpus_id, lesson_text, category, source, created_by) "
            "VALUES ($1, $2, $3, $4, $5) "
            "RETURNING id, style_corpus_id, lesson_text, category, source, "
            "          applied_to_skill, created_by, created_at, updated_at",
            corpus_id, lesson_text, category, source, created_by,
        )
    return dict(row) if row else {}


async def update_decision_lesson(
    lesson_id: UUID,
    *,
    lesson_text: str | None = None,
    category: str | None = None,
    applied_to_skill: bool | None = None,
) -> dict:
    sets: dict = {}
    if lesson_text is not None:
        sets["lesson_text"] = lesson_text
    if category is not None:
        sets["category"] = category
    if applied_to_skill is not None:
        sets["applied_to_skill"] = applied_to_skill
    if not sets:
        return {"updated": False, "reason": "nothing to update"}

    # Column names come from the fixed keys above, never from user input.
    # updated_at is bumped server-side via now(), so it takes no parameter.
    cols = list(sets)
    set_clause = ", ".join(f"{c} = ${i + 2}" for i, c in enumerate(cols))
    set_clause += ", updated_at = now()"
    values = [sets[c] for c in cols]

    pool = await get_pool()
    async with pool.acquire() as conn:
        row = await conn.fetchrow(
            f"UPDATE decision_lessons SET {set_clause} WHERE id = $1 "
            f"RETURNING id, style_corpus_id, lesson_text, category, source, "
            f"          applied_to_skill, updated_at",
            lesson_id, *values,
        )
    if not row:
        return {"updated": False, "reason": "not found"}
    return {"updated": True, **dict(row)}


async def delete_decision_lesson(lesson_id: UUID) -> dict:
    pool = await get_pool()
    async with pool.acquire() as conn:
        result = await conn.execute(
            "DELETE FROM decision_lessons WHERE id = $1", lesson_id,
        )
    # asyncpg returns the command status string, "DELETE n"
    deleted = result.split(" ", 1)[1].strip() if " " in result else "0"
    return {"deleted": deleted != "0"}


async def count_decision_lessons_per_corpus() -> dict[str, int]:
    """Map style_corpus.id (str) → lesson count, for badge display in the list."""
    pool = await get_pool()
    async with pool.acquire() as conn:
        rows = await conn.fetch(
            "SELECT style_corpus_id, count(*) AS n "
            "FROM decision_lessons GROUP BY style_corpus_id"
        )
    return {str(r["style_corpus_id"]): r["n"] for r in rows}
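The partial-update helpers above build their `SET` clause dynamically, numbering placeholders from `$2` because `$1` is reserved for the row id. A minimal sketch of that construction, runnable without asyncpg, is shown below; `build_update` and the `ALLOWED_COLUMNS` allow-list are hypothetical names added for illustration, and the explicit allow-list check is an extra safeguard that the functions above get implicitly from their fixed keyword arguments.

```python
# Hypothetical allow-list: only these columns may appear in the SET clause.
ALLOWED_COLUMNS = {"lesson_text", "category", "applied_to_skill"}


def build_update(table: str, row_id: object, sets: dict) -> tuple[str, list]:
    """Return (sql, params) for a partial UPDATE using $n placeholders."""
    unknown = set(sets) - ALLOWED_COLUMNS
    if unknown:
        # Column names are interpolated into SQL, so they must be vetted.
        raise ValueError(f"unexpected columns: {sorted(unknown)}")
    cols = list(sets)
    # $1 is the id, so the first updated column binds to $2.
    set_clause = ", ".join(f"{c} = ${i + 2}" for i, c in enumerate(cols))
    sql = f"UPDATE {table} SET {set_clause}, updated_at = now() WHERE id = $1"
    return sql, [row_id, *(sets[c] for c in cols)]


if __name__ == "__main__":
    sql, params = build_update(
        "decision_lessons", "row-1", {"lesson_text": "x", "category": "style"}
    )
    print(sql)
    print(params)
```

Values always travel as bound parameters; only vetted column names are ever formatted into the statement text.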


# ── chat (style agent conversations) ───────────────────────────────


async def create_chat_conversation(
    *,
    title: str = "שיחה חדשה",
    style_corpus_id: UUID | None = None,
    system_prompt_version: str = "v1",
) -> dict:
    pool = await get_pool()
    async with pool.acquire() as conn:
        row = await conn.fetchrow(
            "INSERT INTO chat_conversations "
            "(title, style_corpus_id, system_prompt_version) "
            "VALUES ($1, $2, $3) "
            "RETURNING id, title, style_corpus_id, claude_session_id, "
            "          system_prompt_version, created_at, last_message_at",
            title, style_corpus_id, system_prompt_version,
        )
    return dict(row) if row else {}


async def list_chat_conversations(limit: int = 50) -> list[dict]:
    pool = await get_pool()
    async with pool.acquire() as conn:
        rows = await conn.fetch(
            """
            SELECT c.id, c.title, c.style_corpus_id, c.claude_session_id,
                   c.created_at, c.last_message_at,
                   sc.decision_number,
                   (SELECT count(*) FROM chat_messages m WHERE m.conversation_id = c.id) AS message_count
            FROM chat_conversations c
            LEFT JOIN style_corpus sc ON sc.id = c.style_corpus_id
            ORDER BY c.last_message_at DESC NULLS LAST
            LIMIT $1
            """,
            limit,
        )
    return [dict(r) for r in rows]


async def get_chat_conversation(conv_id: UUID) -> dict | None:
    pool = await get_pool()
    async with pool.acquire() as conn:
        row = await conn.fetchrow(
            "SELECT id, title, style_corpus_id, claude_session_id, "
            "       system_prompt_version, created_at, last_message_at "
            "FROM chat_conversations WHERE id = $1",
            conv_id,
        )
    return dict(row) if row else None


async def delete_chat_conversation(conv_id: UUID) -> dict:
    pool = await get_pool()
    async with pool.acquire() as conn:
        result = await conn.execute(
            "DELETE FROM chat_conversations WHERE id = $1", conv_id,
        )
    # asyncpg returns the command status string, "DELETE n"
    deleted = result.split(" ", 1)[1].strip() if " " in result else "0"
    return {"deleted": deleted != "0"}


async def update_chat_conversation_session_id(
    conv_id: UUID, claude_session_id: str,
) -> None:
    pool = await get_pool()
    async with pool.acquire() as conn:
        await conn.execute(
            "UPDATE chat_conversations SET claude_session_id = $1, "
            "       last_message_at = now() "
            "WHERE id = $2",
            claude_session_id, conv_id,
        )


async def add_chat_message(
    conv_id: UUID,
    *,
    role: str,
    content: str,
    raw_events: list | None = None,
) -> dict:
    pool = await get_pool()
    async with pool.acquire() as conn:
        row = await conn.fetchrow(
            "INSERT INTO chat_messages "
            "(conversation_id, role, content, raw_events) "
            "VALUES ($1, $2, $3, $4) "
            "RETURNING id, conversation_id, role, content, created_at",
            conv_id, role, content, json.dumps(raw_events or []),
        )
        # Touch the parent so the conversation list stays sorted by activity.
        await conn.execute(
            "UPDATE chat_conversations SET last_message_at = now() WHERE id = $1",
            conv_id,
        )
    return dict(row) if row else {}


async def list_chat_messages(conv_id: UUID) -> list[dict]:
    pool = await get_pool()
    async with pool.acquire() as conn:
        rows = await conn.fetch(
            "SELECT id, role, content, created_at "
            "FROM chat_messages WHERE conversation_id = $1 "
            "ORDER BY created_at ASC",
            conv_id,
        )
    return [dict(r) for r in rows]


async def get_style_patterns(pattern_type: str | None = None) -> list[dict]:
    pool = await get_pool()
    async with pool.acquire() as conn:


@@ -0,0 +1,195 @@
"""Auto-extract per-decision metadata for a style_corpus row.

Populates the fields that the upload flow leaves empty (summary, outcome,
key_principles, appeal_subtype, practice_area) by asking Claude (via the
local CLI session) to read the proofread full_text and return a structured
JSON blob.

Caller policy (``apply_to_corpus``): by default we **only fill empty
columns**, so chair-edited values are preserved across re-runs. The chair
can force a refresh by passing ``overwrite=True``.

Why this is a separate module from ``precedent_metadata_extractor``:
that one fills the *external* case_law corpus (court rulings, third-party
committee decisions). This one fills the *style* corpus: Daphna's own
decisions used to teach the writer the in-house voice. The two corpora
have different schemas, different prompts, and different downstream
consumers, so coupling them would have been the wrong shortcut.
"""
from __future__ import annotations

import logging
from uuid import UUID

from legal_mcp.services import claude_session, db

logger = logging.getLogger(__name__)

# A single decision typically runs 200K-650K chars. We sample the head
# (where outcome + parties + framing live) and the tail (where the
# operative ruling sits). Picking from both edges caps the text window at
# about 40K chars (25K + 15K), which keeps the whole prompt under
# 60K chars: comfortable for any Claude tier.
_HEAD_CHARS = 25_000
_TAIL_CHARS = 15_000


def _build_text_window(full_text: str) -> str:
    # Short texts are sent whole; only longer ones lose their middle.
    if len(full_text) <= _HEAD_CHARS + _TAIL_CHARS:
        return full_text
    head = full_text[:_HEAD_CHARS]
    tail = full_text[-_TAIL_CHARS:]
    return (
        f"{head}\n\n"
        f"[... חתך: {len(full_text) - _HEAD_CHARS - _TAIL_CHARS:,} תווים מהאמצע "
        f"הושמטו — שמרנו על ההתחלה (טענות + רקע) ועל הסוף (הכרעה + הוצאות) ...]"
        f"\n\n{tail}"
    )
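To make the head-and-tail windowing concrete, here is a small worked sketch with tiny limits so the effect is visible. `build_text_window` below is a simplified stand-in written for this illustration: it takes the limits as parameters and uses an English omission marker, both of which are assumptions rather than the module's actual behavior.

```python
def build_text_window(full_text: str, head_chars: int, tail_chars: int) -> str:
    """Keep the first ``head_chars`` and last ``tail_chars``; drop the middle."""
    if len(full_text) <= head_chars + tail_chars:
        return full_text  # short enough to send whole
    omitted = len(full_text) - head_chars - tail_chars
    head = full_text[:head_chars]
    tail = full_text[-tail_chars:]
    # The marker tells the model that text is missing and how much.
    return f"{head}\n\n[... {omitted:,} chars omitted from the middle ...]\n\n{tail}"


if __name__ == "__main__":
    text = "A" * 10 + "B" * 1000 + "C" * 5
    print(build_text_window(text, head_chars=10, tail_chars=5))
    print(build_text_window("short text", head_chars=10, tail_chars=5))
```

Stating the omitted length in the marker gives the model a cue that the excerpt is discontinuous, so it is less likely to read the tail as a direct continuation of the head.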
# Static instructions — go via ``system`` so the SDK path can cache them
# across batch enrichment runs (24+ decisions in one pass).
METADATA_PROMPT = """אתה מסייע משפטי שמקטלג את הקורפוס הסגנוני של דפנה תמיר (יו"ר ועדת ערר).
תפקידך: לקרוא החלטה אחת ולחלץ מטא-דאטה ל-style_corpus — שדות שהמשתמש לא הזין בעת ההעלאה.
**אל תמציא**. אם המידע לא מופיע בטקסט, השאר מחרוזת ריקה או מערך ריק. אסור להסיק עובדות שלא כתובות.
## פלט נדרש
החזר JSON אחד (object אחד — לא array, לא markdown, לא הסברים):
{
"summary": "תקציר עניני ב-2-3 משפטים: מי העורר, מה דרש, מה הוכרע. סגנון יבש, ניטרלי, ללא שיפוט. דוגמה: 'ערר על דחיית בקשה להיתר לתוספת מרפסת בקומה ג׳. דפנה קיבלה את הערר חלקית — אישרה את המרפסת בהקטנה ל-12 מ״ר.'",
"outcome": "התוצאה התמציתית. אחד מאלה (או צירוף קצר): 'קבלה' / 'קבלה חלקית' / 'דחייה' / 'הסתלקות' / 'החזרה לוועדה המקומית'. אם זה לא ברור — מחרוזת ריקה.",
"key_principles": [
"עיקרון משפטי 1 שעולה מההחלטה — משפט אחד, ניסוח מופשט. למשל 'שיקול דעת מוגבל לחריגות בנייה קטנות'.",
"עיקרון 2",
"..."
],
"appeal_subtype": "תת-סוג ערר. ערכים מותרים: 'building_permit' (היתר בנייה / רישוי), 'betterment_levy' (היטל השבחה), 'compensation_197' (פיצויים ס׳ 197), 'use_change' (שימוש חורג), 'tama_38' (תמ\\"א 38), או מחרוזת ריקה אם לא ברור.",
"practice_area": "תחום משפט גנרי. ברירת מחדל: 'appeals_committee'. אם זה במובהק 'planning_law' — סמן.",
"parties_appellant": "שם העורר/ים המרכזיים בהחלטה (אחד או כמה, מופרדים בפסיק). אם זו החלטה מאוחדת — שם הצד המוביל. השאר ריק אם לא ניתן לזהות במדויק.",
"parties_respondent": "שם המשיב/ים. ברירת מחדל לעררי 1xxx ו-8xxx: 'הוועדה המקומית לתכנון ובניה ירושלים' או דומה. השאר ריק אם לא ברור."
}
## כללי איכות
1. **summary** — חייב להזכיר את התוצאה. בלי 'בית המשפט קבע ש...' (אנחנו לא בית משפט). בלי הערכת אישית.
2. **outcome** — קבלה / קבלה חלקית / דחייה / הסתלקות / החזרה לוועדה המקומית. אם דפנה הכריעה חלקית — 'קבלה חלקית'. אסור 'התקבל' או 'נדחה' בלשון פעולה — רק שם פעולה.
3. **key_principles** — 2-5 עקרונות מקסימום. כל אחד משפט אחד. לא ציטוטים מילוליים, אלא תמצות העיקרון.
4. **appeal_subtype** — תמיד פעולה אחת. אם החלטה מערבת כמה תת-סוגים — בחר את העיקרי.
5. **parties_appellant / parties_respondent** — שם בלבד, בלי 'נ׳' או 'נגד'.
החזר רק את ה-JSON. אל תכתוב שום דבר לפניו או אחריו.
"""
async def extract_decision_metadata(corpus_id: UUID | str) -> dict:
    """Run Claude over the row's full_text and return suggested fields.

    Does NOT touch the DB. The caller decides what to apply.
    """
    if isinstance(corpus_id, str):
        corpus_id = UUID(corpus_id)
    row = await db.get_style_corpus_row(corpus_id)
    if not row:
        return {}
    full_text = (row.get("full_text") or "").strip()
    if not full_text:
        return {}

    context = (
        f"מספר החלטה: {row.get('decision_number') or ''}\n"
        f"תאריך: {row.get('decision_date') or ''}\n"
        f"תת-סוג נוכחי: {row.get('appeal_subtype') or ''}\n"
        f"נושאים מתויגים: {row.get('subject_categories') or ''}"
    )
    window = _build_text_window(full_text)
    user_msg = (
        f"## הקלט\n{context}\n\n"
        f"--- תחילת ההחלטה ---\n{window}\n--- סוף ההחלטה ---"
    )

    try:
        result = await claude_session.query_json(user_msg, system=METADATA_PROMPT)
    except Exception as e:
        logger.warning("style_metadata_extractor: query failed: %s", e)
        return {}
    if not isinstance(result, dict):
        logger.warning(
            "style_metadata_extractor: expected JSON object, got %s",
            type(result).__name__,
        )
        return {}

    out: dict = {}
    if isinstance(result.get("summary"), str):
        out["summary"] = result["summary"].strip()
    if isinstance(result.get("outcome"), str):
        out["outcome"] = result["outcome"].strip()
    kp = result.get("key_principles") or []
    if isinstance(kp, list):
        out["key_principles"] = [str(p).strip() for p in kp if str(p).strip()]
    if isinstance(result.get("appeal_subtype"), str):
        st = result["appeal_subtype"].strip()
        # Open enum — but log values outside the documented list so we can
        # tighten the prompt later if needed.
        known = {
            "building_permit", "betterment_levy", "compensation_197",
            "use_change", "tama_38", "",
        }
        if st not in known:
            logger.info("style_metadata: unknown appeal_subtype=%r (kept)", st)
        out["appeal_subtype"] = st
    if isinstance(result.get("practice_area"), str):
        out["practice_area"] = result["practice_area"].strip()
    # Parties: not stored in the schema today, but worth surfacing in the
    # extractor's return value so callers (and the UI's drawer) can display
    # them. The list endpoint extracts via regex; LLM output is the
    # higher-quality fallback when regex fails.
    if isinstance(result.get("parties_appellant"), str):
        out["parties_appellant"] = result["parties_appellant"].strip()
    if isinstance(result.get("parties_respondent"), str):
        out["parties_respondent"] = result["parties_respondent"].strip()
    return out
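
The field-by-field sanitising done by `extract_decision_metadata` can be tried without the database or a Claude session. The sketch below is a standalone approximation of that step, not the module's code: `sanitize_metadata`, `STRING_FIELDS` and `KNOWN_SUBTYPES` are hypothetical names introduced here, and warnings are returned as a list rather than logged.

```python
from typing import Any

# Hypothetical mirror of the documented appeal_subtype values.
KNOWN_SUBTYPES = {
    "building_permit", "betterment_levy", "compensation_197",
    "use_change", "tama_38", "",
}

# Fields that are accepted only when the model returned a string.
STRING_FIELDS = (
    "summary", "outcome", "appeal_subtype", "practice_area",
    "parties_appellant", "parties_respondent",
)


def sanitize_metadata(result: Any) -> tuple[dict, list[str]]:
    """Return (clean_fields, warnings) for a raw LLM JSON payload."""
    if not isinstance(result, dict):
        return {}, [f"expected JSON object, got {type(result).__name__}"]

    out: dict = {}
    warnings: list[str] = []
    for name in STRING_FIELDS:
        value = result.get(name)
        if isinstance(value, str):
            out[name] = value.strip()

    # Coerce every principle to a stripped string and drop empty entries.
    kp = result.get("key_principles") or []
    if isinstance(kp, list):
        out["key_principles"] = [str(p).strip() for p in kp if str(p).strip()]

    # Open enum: unknown subtypes are kept but reported.
    subtype = out.get("appeal_subtype")
    if subtype is not None and subtype not in KNOWN_SUBTYPES:
        warnings.append(f"unknown appeal_subtype={subtype!r} (kept)")
    return out, warnings
```

A non-string `outcome` is dropped silently instead of being coerced, which matches the `isinstance(..., str)` guards in the function above.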


async def extract_and_apply(
    corpus_id: UUID | str, *, overwrite: bool = False,
) -> dict:
    """Convenience: extract → apply → return summary of what changed.

    Idempotent under default ``overwrite=False`` — re-runs only fill empty
    fields. Use ``overwrite=True`` to refresh values the chair (or a prior
    extraction) already wrote.
    """
    if isinstance(corpus_id, str):
        corpus_id = UUID(corpus_id)
    suggested = await extract_decision_metadata(corpus_id)
    if not suggested:
        return {"extracted": False, "applied": False, "reason": "no suggestion"}
    update_result = await db.update_style_corpus_metadata(
        corpus_id,
        summary=suggested.get("summary"),
        outcome=suggested.get("outcome"),
        key_principles=suggested.get("key_principles"),
        appeal_subtype=suggested.get("appeal_subtype"),
        practice_area=suggested.get("practice_area"),
        overwrite=overwrite,
    )
    return {
        "extracted": True,
        "applied": update_result.get("updated", False),
        "fields_set": update_result.get("fields", []),
        "suggested": suggested,
    }
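
`db.update_style_corpus_metadata` is not part of this diff, so its exact behaviour is unknown here. As a rough in-memory model of what "fill only empty fields" could mean for the `overwrite` flag, the sketch below uses a hypothetical `merge_metadata` helper; the returned `updated` and `fields` keys are chosen to match what `extract_and_apply` reads, and everything else is an assumption.

```python
def merge_metadata(current: dict, suggested: dict, *, overwrite: bool = False) -> dict:
    """Hypothetical in-memory model of the overwrite flag (not the real DB call)."""
    merged = dict(current)
    fields_set: list[str] = []
    for field, value in suggested.items():
        if value is None:
            continue  # the extractor had no suggestion for this field
        if not overwrite and merged.get(field):
            continue  # keep what the chair (or a prior run) already wrote
        if merged.get(field) != value:
            merged[field] = value
            fields_set.append(field)
    return {"updated": bool(fields_set), "fields": fields_set, "row": merged}


row = {"summary": "chair text", "outcome": "", "key_principles": []}
llm = {"summary": "llm text", "outcome": "דחייה", "key_principles": ["p1"]}

first = merge_metadata(row, llm)
second = merge_metadata(first["row"], llm)   # re-run: nothing left to fill
forced = merge_metadata(row, llm, overwrite=True)
```

Under this model a second run with the same suggestion changes nothing, which is the idempotence the docstring describes; only `overwrite=True` replaces the chair's `summary`.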


@@ -0,0 +1,85 @@
"""MCP tool wrappers for the style_corpus metadata-enrichment flow.

The actual extractor lives in
``legal_mcp.services.style_metadata_extractor``; this module just exposes
it as MCP tools that the chair (or a future automation) can call from
Claude Code.

Why these tools matter: the upload pipeline (`/api/training/upload` →
`_process_proofread_training`) inserts a style_corpus row with
``summary=''``, ``outcome=''``, ``key_principles=[]`` because LLM
extraction can't run from the FastAPI container (no claude CLI there).
This module fills that gap — call it from the host, where ``claude``
CLI is available, and the row gets enriched.
"""
from __future__ import annotations

import json
from uuid import UUID

from legal_mcp.services import db, style_metadata_extractor


def _ok(payload) -> str:
    return json.dumps({"ok": True, **payload}, ensure_ascii=False, default=str)


def _err(msg: str) -> str:
    return json.dumps({"ok": False, "error": msg}, ensure_ascii=False)


async def extract_decision_metadata(corpus_id: str, overwrite: bool = False) -> str:
    """Extract metadata (summary, outcome, key_principles, appeal_subtype) for a style-corpus decision.

    The default ``overwrite=False`` fills only empty fields. Pass ``overwrite=true``
    to refresh values that were already written.
    """
    try:
        cid = UUID(corpus_id)
    except ValueError:
        return _err("corpus_id לא תקין")
    try:
        result = await style_metadata_extractor.extract_and_apply(cid, overwrite=overwrite)
    except Exception as e:
        return _err(str(e))
    return _ok(result)


async def list_corpus_pending_enrichment(limit: int = 50) -> str:
    """List style_corpus rows missing summary/outcome/key_principles (candidates for enrichment)."""
    pool = await db.get_pool()
    async with pool.acquire() as conn:
        rows = await conn.fetch(
            """
            SELECT id, decision_number, decision_date,
                   length(full_text) AS chars,
                   coalesce(summary, '') = '' AS missing_summary,
                   coalesce(outcome, '') = '' AS missing_outcome,
                   coalesce(jsonb_array_length(key_principles), 0) = 0 AS missing_principles
            FROM style_corpus
            WHERE coalesce(summary, '') = ''
               OR coalesce(outcome, '') = ''
               OR coalesce(jsonb_array_length(key_principles), 0) = 0
            ORDER BY decision_date NULLS LAST
            LIMIT $1
            """,
            limit,
        )
    items = [
        {
            "corpus_id": str(r["id"]),
            "decision_number": r["decision_number"] or "",
            "decision_date": str(r["decision_date"]) if r["decision_date"] else "",
            "chars": r["chars"],
            "missing": [
                f for f, v in (
                    ("summary", r["missing_summary"]),
                    ("outcome", r["missing_outcome"]),
                    ("key_principles", r["missing_principles"]),
                ) if v
            ],
        }
        for r in rows
    ]
    return _ok({"count": len(items), "items": items})


@@ -35,6 +35,7 @@
| `compute_ndcg.py` | python | Computes nDCG@10 over `search_relevance_feedback` (TaskMaster #50, Stage C). Aggregates by `search_type` and by week, including top-cited case_law and coverage %. Flags: `--k 10`, `--weeks 12`, `--pretty`. Read-only, JSON output. Also used by `GET /api/admin/rag-metrics` (imported inline), so changing the signature of `compute()` will break the endpoint | manual / future cron for weekly reporting |
| `backfill_multimodal_precedents.py` | python | Backfills voyage-multimodal-3 page embeddings for `case_law` records (external_upload + internal_committee) that lack `precedent_image_embeddings`. Builds a file index from `data/precedent-library/` and `data/internal-decisions/` and tries to match by case-number tokens (including parts-match for the different Nevo doc-id formats). Skips records with no source file or with MD only (PyMuPDF does not render MD). Supports `--dry-run` (default) / `--apply` / `--only external_upload\|internal_committee` / `--limit N`. Runs in the container (has `/data` + Voyage env). **Run on 2026-05-26**: 70 missing → 26 backfilled (503 pages, ~$0.21 voyage tokens), 44 with no source file. Can be re-run after more PDF/DOCX files are uploaded to the library | manual |
| `monitor_halacha_quality.py` | python | Monitors the quality of halacha extraction. Checks drift of `avg(confidence)` between a historical baseline and the most recent window. Returns JSON metrics plus an alert on stderr if drift > threshold (default 5%). 2 series: trusted (approved+published) and all_extracted. Supports `--window N` / `--threshold X` / `--min-sample N` / `--silent` / `--exit-on-alert`. Runs in the container or locally with `mcp-server/.venv` (no LLM dependency, SQL only). **Recommended schedule**: `0 8 * * 1` (Monday 08:00, weekly) | `0 8 * * 1` (to be scheduled) |
| `audit_training_corpus.py` | python | Audit of `style_corpus`. For each decision: which metadata fields are populated (`summary`/`outcome`/`key_principles`/`appeal_subtype`/`subject_categories`) and the link to `documents` (FK + chunks + embeddings). Produces `data/audit/corpus-YYYY-MM-DD.json` plus a console summary. Requires `POSTGRES_URL` or POSTGRES_*. No external dependencies other than asyncpg. **Runs from the local machine** (not the container), connecting directly to Postgres :5433 | manual / prep work before metadata enrichment |
## The `.archive/` directory: completed scripts

scripts/audit_training_corpus.py Executable file

@@ -0,0 +1,196 @@
#!/usr/bin/env python
"""Audit the style_corpus table — list each decision with what's populated and what's missing.

Produces a JSON report at data/audit/corpus-YYYY-MM-DD.json so we can see at a glance
which corpus entries lack summary/outcome/key_principles/appeal_subtype/chunks/embeddings.

Run with the mcp-server venv (has asyncpg):

    POSTGRES_URL=postgres://... ./mcp-server/.venv/bin/python scripts/audit_training_corpus.py

Without POSTGRES_URL, falls back to the per-field env vars used by web/mcp-server config.
"""
from __future__ import annotations

import asyncio
import json
import os
import re
import sys
from datetime import UTC, date, datetime
from pathlib import Path

import asyncpg


def _build_dsn() -> str:
    if url := os.environ.get("POSTGRES_URL"):
        return url
    return (
        f"postgres://{os.environ.get('POSTGRES_USER', 'legal_ai')}:"
        f"{os.environ.get('POSTGRES_PASSWORD', '')}@"
        f"{os.environ.get('POSTGRES_HOST', '127.0.0.1')}:"
        f"{os.environ.get('POSTGRES_PORT', '5433')}/"
        f"{os.environ.get('POSTGRES_DB', 'legal_ai')}"
    )


async def audit() -> dict:
    dsn = _build_dsn()
    conn = await asyncpg.connect(dsn)
    try:
        rows = await conn.fetch(
            """
            SELECT id, decision_number, decision_date, subject_categories,
                   length(full_text) AS chars,
                   summary,
                   outcome,
                   key_principles,
                   practice_area,
                   appeal_subtype,
                   document_id,
                   created_at
            FROM style_corpus
            ORDER BY decision_date NULLS LAST, decision_number
            """
        )
        # Chunk + embedding counts for each related document — by direct FK first,
        # then by title-match for legacy rows where style_corpus.document_id is NULL.
        chunk_counts = await conn.fetch(
            """
            SELECT d.id AS doc_id, d.title,
                   count(c.id) AS chunks,
                   count(c.embedding) FILTER (WHERE c.embedding IS NOT NULL) AS chunks_with_emb
            FROM documents d
            LEFT JOIN document_chunks c ON c.document_id = d.id
            WHERE d.title LIKE '[קורפוס]%' OR d.id IN (SELECT document_id FROM style_corpus WHERE document_id IS NOT NULL)
            GROUP BY d.id, d.title
            """
        )
    finally:
        await conn.close()

    by_doc_id = {r["doc_id"]: r for r in chunk_counts}
    # Index corpus documents by every digit cluster in their title so we can
    # match against style_corpus.decision_number regardless of formatting
    # (e.g. style_corpus has "1109-25" but title may say "ARAR-25-1109" or
    # "ערר 1109-25"). Each digit run >=3 chars becomes a key.
    by_digit: dict[str, dict] = {}
    for r in chunk_counts:
        title = r["title"] or ""
        for tok in re.findall(r"\d{3,}", title):
            by_digit.setdefault(tok, r)

    decisions = []
    gaps_total = {
        "summary": 0, "outcome": 0, "key_principles": 0,
        "appeal_subtype": 0, "subject_categories": 0,
        "chunks": 0, "embeddings": 0, "document_id": 0,
    }
    for row in rows:
        cats = row["subject_categories"]
        if isinstance(cats, str):
            try:
                cats = json.loads(cats)
            except json.JSONDecodeError:
                cats = []
        cats = cats or []
        kp = row["key_principles"]
        if isinstance(kp, str):
            try:
                kp = json.loads(kp)
            except json.JSONDecodeError:
                kp = []
        kp = kp or []

        # Resolve chunks: prefer FK, fall back to digit-cluster match on decision_number.
        chunks = 0
        chunks_with_emb = 0
        if row["document_id"] and row["document_id"] in by_doc_id:
            r = by_doc_id[row["document_id"]]
            chunks = r["chunks"]
            chunks_with_emb = r["chunks_with_emb"]
        elif row["decision_number"]:
            for tok in re.findall(r"\d{3,}", row["decision_number"]):
                if tok in by_digit:
                    r = by_digit[tok]
                    chunks = r["chunks"]
                    chunks_with_emb = r["chunks_with_emb"]
                    break

        missing = []
        if not row["summary"]:
            missing.append("summary")
            gaps_total["summary"] += 1
        if not row["outcome"]:
            missing.append("outcome")
            gaps_total["outcome"] += 1
        if not kp:
            missing.append("key_principles")
            gaps_total["key_principles"] += 1
        if not row["appeal_subtype"]:
            missing.append("appeal_subtype")
            gaps_total["appeal_subtype"] += 1
        if not cats:
            missing.append("subject_categories")
            gaps_total["subject_categories"] += 1
        if chunks == 0:
            missing.append("chunks")
            gaps_total["chunks"] += 1
        elif chunks_with_emb < chunks:
            missing.append(f"embeddings({chunks_with_emb}/{chunks})")
            gaps_total["embeddings"] += 1
        if row["document_id"] is None:
            missing.append("document_id")
            gaps_total["document_id"] += 1

        decisions.append({
            "id": str(row["id"]),
            "decision_number": row["decision_number"] or "",
            "decision_date": row["decision_date"].isoformat() if row["decision_date"] else None,
            "chars": row["chars"],
            "subject_categories": cats,
            "practice_area": row["practice_area"] or "",
            "appeal_subtype": row["appeal_subtype"] or "",
            "summary_len": len(row["summary"] or ""),
            "outcome_len": len(row["outcome"] or ""),
            "key_principles_count": len(kp),
            "chunks": chunks,
            "chunks_with_embeddings": chunks_with_emb,
            "document_id": str(row["document_id"]) if row["document_id"] else None,
            "missing": missing,
            "created_at": row["created_at"].isoformat() if row["created_at"] else None,
        })

    return {
        "generated_at": datetime.now(UTC).isoformat(),
        "total_decisions": len(decisions),
        "gaps_total": gaps_total,
        "decisions": decisions,
    }


async def main() -> int:
    report = await audit()
    out_dir = Path(__file__).resolve().parents[1] / "data" / "audit"
    out_dir.mkdir(parents=True, exist_ok=True)
    today = date.today().isoformat()
    out_file = out_dir / f"corpus-{today}.json"
    out_file.write_text(json.dumps(report, ensure_ascii=False, indent=2), encoding="utf-8")

    # Console summary
    print(f"Total decisions: {report['total_decisions']}")
    print("Gaps by field (count of decisions missing it):")
    for field, n in report["gaps_total"].items():
        # One bar character per decision missing the field, capped at 60.
        bar = "#" * min(n, 60)
        print(f" {field:25s} {n:3d} {bar}")
    print(f"\nReport written to {out_file}")
    return 0


if __name__ == "__main__":
    sys.exit(asyncio.run(main()))


@@ -0,0 +1,48 @@
/**
 * pm2 ecosystem entry for legal-chat-service — the host-side SSE bridge
 * to ``claude`` CLI that powers the /training chat tab.
 *
 * Why pm2:
 *   - Auto-restart if the process dies (claude CLI subprocess failures
 *     should never leave the service in a half-dead state).
 *   - Log rotation matches paperclip's behavior so the chair sees
 *     consistent log paths under ~/.pm2/logs/.
 *
 * Install (once):
 *   pm2 start /home/chaim/legal-ai/scripts/legal-chat-service.config.cjs
 *   pm2 save
 *
 * Smoke test:
 *   curl http://127.0.0.1:8770/health
 *   # → {"ok":true,"service":"legal-chat-service"}
 *
 * Update:
 *   pm2 restart legal-chat-service
 *
 * Stop:
 *   pm2 stop legal-chat-service
 */
module.exports = {
  apps: [
    {
      name: "legal-chat-service",
      cwd: "/home/chaim/legal-ai/mcp-server",
      // Run the in-package server via the venv interpreter so all
      // imports (claude_session, etc) resolve.
      script: "/home/chaim/legal-ai/mcp-server/.venv/bin/python",
      args: "-m legal_mcp.chat_service.server --port 8770",
      // claude CLI looks up credentials under HOME — make sure it
      // sees chaim's session, not an empty container HOME.
      env: {
        HOME: "/home/chaim",
        PATH: "/home/chaim/.local/bin:/usr/local/bin:/usr/bin:/bin",
        PYTHONUNBUFFERED: "1",
      },
      restart_delay: 5000,
      max_restarts: 10,
      autorestart: true,
      max_memory_restart: "500M",
    },
  ],
};
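
The `curl` smoke test in the header comment can also be scripted. This is a minimal sketch using only the standard library; the URL and the expected body come from the comment above, while `parse_health` and `is_service_up` are hypothetical helper names and the 2-second timeout is an arbitrary choice.

```python
import json
import urllib.error
import urllib.request

HEALTH_URL = "http://127.0.0.1:8770/health"  # port taken from the pm2 args


def parse_health(body: bytes) -> bool:
    """True only for the documented {"ok": true, "service": ...} body."""
    try:
        payload = json.loads(body)
    except ValueError:  # covers invalid JSON and invalid UTF-8
        return False
    return (
        isinstance(payload, dict)
        and payload.get("ok") is True
        and payload.get("service") == "legal-chat-service"
    )


def is_service_up(url: str = HEALTH_URL, timeout: float = 2.0) -> bool:
    """Probe the health endpoint; any network failure counts as down."""
    try:
        with urllib.request.urlopen(url, timeout=timeout) as resp:
            return resp.status == 200 and parse_health(resp.read())
    except (urllib.error.URLError, OSError):
        return False
```

Checking the `service` field as well as `ok` guards against some other process answering on port 8770.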


@@ -1,30 +1,49 @@
"use client"; "use client";

import { useState } from "react";
import Link from "next/link";
import { Upload } from "lucide-react";
import { AppShell } from "@/components/app-shell";
import { Button } from "@/components/ui/button";
import { Card, CardContent } from "@/components/ui/card";
import { Tabs, TabsContent, TabsList, TabsTrigger } from "@/components/ui/tabs";
import { StyleReportPanel } from "@/components/training/style-report-panel";
import { CorpusPanel } from "@/components/training/corpus-panel";
import { ComparePanel } from "@/components/training/compare-panel";
import { CuratorPortraitPanel } from "@/components/training/curator-portrait-panel";
import { ChatPanel } from "@/components/training/chat-panel";
import { TrainingUploadDialog } from "@/components/training/upload-dialog";

export default function TrainingPage() {
  const [uploadOpen, setUploadOpen] = useState(false);

  return (
    <AppShell>
      <section className="space-y-6">
        <header className="flex items-start justify-between gap-4 flex-wrap">
          <div>
            <nav className="text-[0.78rem] text-ink-muted mb-1">
              <Link href="/" className="hover:text-gold-deep">בית</Link>
              <span aria-hidden> · </span>
              <span className="text-navy">אימון סגנון</span>
            </nav>
            <h1 className="text-navy mb-0">הפורטרט הסגנוני של דפנה</h1>
            <p className="text-ink-muted text-sm mt-1 max-w-2xl">
              לוח בקרה של קורפוס האימון סטטיסטיקות, אנטומיית החלטה ממוצעת,
              ביטויי חתימה, וכלי השוואה בין שתי החלטות.
            </p>
          </div>
          <Button
            onClick={() => setUploadOpen(true)}
            className="bg-navy text-parchment hover:bg-navy-soft shrink-0"
          >
            <Upload className="w-4 h-4 me-1" />
            העלה החלטה
          </Button>
        </header>

        <TrainingUploadDialog open={uploadOpen} onOpenChange={setUploadOpen} />

        <div className="h-[2px] bg-gradient-to-l from-transparent via-gold to-transparent" />

        <Card className="bg-surface border-rule shadow-sm">
@@ -34,6 +53,8 @@ export default function TrainingPage() {
                <TabsTrigger value="report">פורטרט סגנון</TabsTrigger>
                <TabsTrigger value="corpus">קורפוס</TabsTrigger>
                <TabsTrigger value="compare">השוואה</TabsTrigger>
                <TabsTrigger value="curator">הסוכן</TabsTrigger>
                <TabsTrigger value="chat">שיחה</TabsTrigger>
              </TabsList>

              <TabsContent value="report" className="mt-5">
@@ -47,6 +68,14 @@ export default function TrainingPage() {
              <TabsContent value="compare" className="mt-5">
                <ComparePanel />
              </TabsContent>
              <TabsContent value="curator" className="mt-5">
                <CuratorPortraitPanel />
              </TabsContent>
              <TabsContent value="chat" className="mt-5">
                <ChatPanel />
              </TabsContent>
            </Tabs>
          </CardContent>
        </Card>


@@ -0,0 +1,434 @@
"use client";
/*
* Style-agent chat panel — the new "שיחה" tab on /training.
*
* Layout: two columns.
* - Sidebar: list of conversations + "+ שיחה חדשה" button
* - Main: thread of messages + composer with SSE streaming
*
* Each message is persisted to the legal-ai DB; the LLM call goes
* out via FastAPI → host's legal-chat-service → claude CLI. There
* is no API cost: the claude CLI uses chaim's claude.ai
* subscription via the host's auth.
*
* Health gate: if /api/training/chat/health reports the host service
* is unreachable, a setup notice is shown above the thread telling
* the chair to start the pm2 service.
*/
import { useEffect, useRef, useState } from "react";
import {
Send, Plus, Trash2, Loader2, MessageSquare, Sparkles, AlertTriangle,
} from "lucide-react";
import { toast } from "sonner";
import { Card, CardContent } from "@/components/ui/card";
import { Button } from "@/components/ui/button";
import { Textarea } from "@/components/ui/textarea";
import { ScrollArea } from "@/components/ui/scroll-area";
import { Badge } from "@/components/ui/badge";
import { Skeleton } from "@/components/ui/skeleton";
import {
Select, SelectContent, SelectItem, SelectTrigger, SelectValue,
} from "@/components/ui/select";
import {
chatKeys,
useChatConversation,
useChatConversations,
useChatHealth,
useCorpus,
useCreateChat,
useDeleteChat,
type ChatMessage,
} from "@/lib/api/training";
import { useQueryClient } from "@tanstack/react-query";
export function ChatPanel() {
const [activeId, setActiveId] = useState<string | null>(null);
const health = useChatHealth();
return (
<div className="grid gap-4 lg:grid-cols-[280px_1fr]">
<ConversationsSidebar activeId={activeId} onSelect={setActiveId} />
<div className="space-y-3">
{health.data && !health.data.reachable && (
<ChatServiceWarning health={health.data} />
)}
{activeId ? (
<ChatThread convId={activeId} />
) : (
<Card className="bg-rule-soft/40 border-rule">
<CardContent className="px-6 py-10 text-center text-ink-muted text-sm space-y-2">
<MessageSquare className="w-8 h-8 mx-auto opacity-50" />
<p>בחר שיחה קיימת או פתח חדשה כדי להתחיל לדבר עם סוכן הסגנון.</p>
<p className="text-[0.78rem]">
הסוכן רץ על claude CLI מקומי דרך legal-chat-service. אין עלות API.
</p>
</CardContent>
</Card>
)}
</div>
</div>
);
}
// ── Sidebar: list + new ────────────────────────────────────────────
function ConversationsSidebar({
activeId, onSelect,
}: {
activeId: string | null;
onSelect: (id: string | null) => void;
}) {
const { data: convs, isPending } = useChatConversations();
const { data: corpus } = useCorpus();
const create = useCreateChat();
const del = useDeleteChat();
const [creating, setCreating] = useState(false);
const [newTitle, setNewTitle] = useState("");
const [newCorpusId, setNewCorpusId] = useState<string>("__none__");
const onCreate = async () => {
try {
const conv = await create.mutateAsync({
title: newTitle.trim() || "שיחה חדשה",
style_corpus_id: newCorpusId === "__none__" ? null : newCorpusId,
});
onSelect(conv.id);
setCreating(false);
setNewTitle("");
setNewCorpusId("__none__");
} catch (e) {
toast.error(e instanceof Error ? e.message : "כשל ביצירת שיחה");
}
};
const onDelete = async (id: string) => {
if (!window.confirm("למחוק את השיחה? פעולה זו לא ניתנת לביטול.")) return;
try {
await del.mutateAsync(id);
if (activeId === id) onSelect(null);
toast.success("השיחה נמחקה");
} catch (e) {
toast.error(e instanceof Error ? e.message : "כשל במחיקה");
}
};
return (
<Card className="bg-surface border-rule">
<CardContent className="px-3 py-3 space-y-2">
{!creating ? (
<Button
onClick={() => setCreating(true)}
className="w-full bg-navy text-parchment hover:bg-navy-soft"
size="sm"
>
<Plus className="w-4 h-4 me-1" />
שיחה חדשה
</Button>
) : (
<div className="space-y-2 border border-rule rounded p-2 bg-rule-soft/30">
<Textarea
value={newTitle}
onChange={(e) => setNewTitle(e.target.value)}
placeholder="כותרת לשיחה (אופציונלי)"
rows={2} dir="rtl"
/>
<Select value={newCorpusId} onValueChange={setNewCorpusId} dir="rtl">
<SelectTrigger>
<SelectValue placeholder="צמד להחלטה (אופציונלי)" />
</SelectTrigger>
<SelectContent className="max-h-[300px]">
<SelectItem value="__none__"> שיחה כללית </SelectItem>
{corpus?.map((c) => (
<SelectItem key={c.id} value={c.id}>
{c.decision_number || "—"}
{c.decision_date ? ` · ${c.decision_date}` : ""}
</SelectItem>
))}
</SelectContent>
</Select>
<div className="flex gap-1 justify-end">
<Button variant="ghost" size="sm"
onClick={() => { setCreating(false); setNewTitle(""); setNewCorpusId("__none__"); }}>
ביטול
</Button>
<Button size="sm" onClick={onCreate} disabled={create.isPending}
className="bg-navy text-parchment hover:bg-navy-soft">
צור
</Button>
</div>
</div>
)}
<ScrollArea className="h-[520px]">
<ul className="space-y-1">
{isPending && (
<>
<Skeleton className="h-12 w-full" />
<Skeleton className="h-12 w-full" />
</>
)}
{convs?.length === 0 && (
<p className="text-center text-ink-muted text-[0.78rem] py-6">
אין עדיין שיחות
</p>
)}
{convs?.map((c) => {
const active = c.id === activeId;
return (
<li key={c.id}>
<button
onClick={() => onSelect(c.id)}
className={
"w-full text-end rounded-md px-2 py-2 transition " +
(active
? "bg-gold-wash border border-gold/40"
: "hover:bg-rule-soft/60 border border-transparent")
}
>
<div className="text-sm text-navy font-semibold truncate">
{c.title}
</div>
<div className="flex items-center gap-1 text-[0.7rem] text-ink-muted">
{c.decision_number && (
<Badge variant="outline"
className="text-[0.65rem] bg-info-bg text-info border-info/40">
{c.decision_number}
</Badge>
)}
<span className="tabular-nums">{c.message_count}</span>
<MessageSquare className="w-3 h-3" />
<span className="grow text-end">
{new Date(c.last_message_at).toLocaleDateString("he-IL")}
</span>
<button
onClick={(e) => { e.stopPropagation(); onDelete(c.id); }}
className="hover:text-danger"
aria-label="מחק שיחה"
>
<Trash2 className="w-3 h-3" />
</button>
</div>
</button>
</li>
);
})}
</ul>
</ScrollArea>
</CardContent>
</Card>
);
}
// ── Thread + composer ──────────────────────────────────────────────
function ChatThread({ convId }: { convId: string }) {
const { data, isPending } = useChatConversation(convId);
const qc = useQueryClient();
const [draft, setDraft] = useState("");
const [streaming, setStreaming] = useState(false);
const [streamingText, setStreamingText] = useState("");
const [streamError, setStreamError] = useState("");
const scrollRef = useRef<HTMLDivElement | null>(null);
/* Auto-scroll to bottom when new messages arrive. */
useEffect(() => {
const el = scrollRef.current;
if (!el) return;
el.scrollTo({ top: el.scrollHeight, behavior: "smooth" });
}, [data?.messages.length, streamingText]);
const onSend = async () => {
const text = draft.trim();
if (!text || streaming) return;
setDraft("");
setStreaming(true);
setStreamingText("");
setStreamError("");
try {
const res = await fetch(
`/api/training/chat/conversations/${encodeURIComponent(convId)}/messages`,
{
method: "POST",
headers: { "Content-Type": "application/json" },
body: JSON.stringify({ content: text }),
},
);
if (!res.ok || !res.body) {
const body = await res.text();
throw new Error(`HTTP ${res.status}: ${body.slice(0, 200)}`);
}
// Parse SSE line-by-line. EventSource would be cleaner but it
// doesn't support POST bodies; the manual reader is small.
const reader = res.body.getReader();
const decoder = new TextDecoder();
let buffer = "";
let accumulated = "";
while (true) {
const { value, done } = await reader.read();
if (done) break;
buffer += decoder.decode(value, { stream: true });
let nl: number;
while ((nl = buffer.indexOf("\n\n")) !== -1) {
const event = buffer.slice(0, nl);
buffer = buffer.slice(nl + 2);
if (!event.startsWith("data: ")) continue;
try {
const payload = JSON.parse(event.slice("data: ".length));
if (payload.type === "text_delta" && payload.text) {
accumulated += payload.text;
setStreamingText(accumulated);
} else if (payload.type === "error") {
setStreamError(String(payload.message || "שגיאה לא ידועה"));
} else if (payload.type === "done") {
if (payload.text && !accumulated) {
accumulated = payload.text;
setStreamingText(accumulated);
}
}
} catch {
/* ignore non-JSON */
}
}
}
} catch (e) {
setStreamError(e instanceof Error ? e.message : "שגיאה בשיחה");
} finally {
setStreaming(false);
setStreamingText("");
// Refetch the conversation so the persisted assistant turn shows up.
qc.invalidateQueries({ queryKey: chatKeys.conversation(convId) });
qc.invalidateQueries({ queryKey: chatKeys.conversations() });
}
};
if (isPending) return <Skeleton className="h-[560px] w-full" />;
if (!data) return null;
return (
<Card className="bg-surface border-rule">
<CardContent className="px-4 py-3 space-y-3">
<header className="flex items-center gap-2 border-b border-rule pb-2">
<Sparkles className="w-4 h-4 text-gold-deep" />
<h3 className="text-navy font-semibold grow">{data.conversation.title}</h3>
{data.conversation.decision_number && (
<Badge variant="outline" className="bg-info-bg text-info border-info/40">
{data.conversation.decision_number}
</Badge>
)}
</header>
<div ref={scrollRef} className="h-[440px] overflow-y-auto space-y-3 pe-1">
{data.messages.length === 0 && !streaming && (
<p className="text-center text-ink-muted text-sm py-8">
התחל בשאלה למשל: &quot;מה מאפיין את הפתיחות של דפנה בעררי 1xxx?&quot;
</p>
)}
{data.messages.map((m) => <MessageBubble key={m.id} message={m} />)}
{streaming && (
<MessageBubble
message={{
id: "streaming",
role: "assistant",
content: streamingText || "(מקליד…)",
created_at: "",
}}
isStreaming
/>
)}
{streamError && (
<div className="rounded-lg border border-danger/40 bg-danger-bg p-3 text-danger text-sm">
{streamError}
</div>
)}
</div>
<div className="border-t border-rule pt-3 space-y-2">
<Textarea
value={draft}
onChange={(e) => setDraft(e.target.value)}
placeholder="שאל את הסוכן… (Shift+Enter לשורה חדשה)"
rows={3} dir="rtl"
disabled={streaming}
onKeyDown={(e) => {
if (e.key === "Enter" && !e.shiftKey) {
e.preventDefault();
void onSend();
}
}}
/>
<div className="flex items-center gap-2">
<p className="text-[0.72rem] text-ink-muted grow">
{data.conversation.claude_session_id
? "שיחה ממשיכה (--resume) — אין צורך לטעון מחדש את ה-system prompt"
: "שיחה חדשה — system prompt ייטען (שני מסמכי ייחוס + רשימת קורפוס)"}
</p>
<Button onClick={onSend} disabled={streaming || !draft.trim()}
className="bg-navy text-parchment hover:bg-navy-soft">
{streaming ? (
<Loader2 className="w-4 h-4 animate-spin me-1" />
) : (
<Send className="w-4 h-4 me-1" />
)}
שלח
</Button>
</div>
</div>
</CardContent>
</Card>
);
}
function MessageBubble({
message, isStreaming = false,
}: { message: ChatMessage; isStreaming?: boolean }) {
const isUser = message.role === "user";
return (
<div className={isUser ? "flex justify-start" : "flex justify-end"}>
<div
className={
"max-w-[85%] rounded-lg px-3 py-2 text-sm leading-relaxed whitespace-pre-wrap " +
(isUser
? "bg-gold-wash text-ink border border-gold/40"
: "bg-rule-soft text-ink border border-rule")
}
dir="rtl"
>
{message.content}
{isStreaming && (
<span className="inline-block w-1.5 h-3.5 bg-navy/60 align-middle ms-1 animate-pulse" />
)}
</div>
</div>
);
}
// ── Service-down warning ──────────────────────────────────────────
function ChatServiceWarning({
health,
}: { health: { reachable: boolean; url: string; error?: string } }) {
return (
<Card className="bg-danger-bg border-danger/40">
<CardContent className="px-4 py-3 space-y-1">
<div className="flex items-center gap-2 text-danger">
<AlertTriangle className="w-4 h-4" />
<strong>שירות הצ&apos;אט אינו זמין</strong>
</div>
<p className="text-[0.78rem] text-danger">
לא ניתן להגיע ל-legal-chat-service בכתובת
<code className="px-1 mx-1 bg-rule-soft rounded">{health.url}</code>.
{health.error && (<> פירוט: <code className="px-1 bg-rule-soft rounded">{health.error}</code></>)}
</p>
<p className="text-[0.72rem] text-ink-muted">
על המכונה המקומית הפעל:&nbsp;
<code className="px-1 bg-rule-soft rounded">
pm2 start /home/chaim/legal-ai/scripts/legal-chat-service.config.cjs
</code>
</p>
</CardContent>
</Card>
);
}

@@ -0,0 +1,402 @@
"use client";
/*
* Side-drawer for inspecting + editing a single style_corpus entry.
*
* Tabs:
* - "פרטים" — show + edit the enriched metadata (decision_number, date,
* subjects, summary, outcome, key_principles, appeal_subtype). Saving
* issues a PATCH /api/training/corpus/{id} and invalidates the list.
* - "תוכן" — read-only full_text view, fetched on demand and shown in a
* scroll area. We never let the chair edit full_text from the UI;
* corrections happen by re-uploading via the Upload dialog.
* - "מה למדנו" — per-decision lessons, rendered by LessonsTab.
* - "דפוסים" — style_patterns scoped by appeal_subtype.
*
* Why a Sheet, not a Dialog: the drawer needs to coexist with the corpus
* table so the chair can scan multiple decisions without losing context.
* Sheet (side: "left" in RTL = right edge in LTR) gives that without
* stealing the entire viewport.
*/
import { useEffect, useState } from "react";
import { Save, FileText, Tag, Calendar, BookOpen, Loader2 } from "lucide-react";
import { toast } from "sonner";
import {
Sheet, SheetContent, SheetHeader, SheetTitle, SheetDescription,
} from "@/components/ui/sheet";
import { Tabs, TabsContent, TabsList, TabsTrigger } from "@/components/ui/tabs";
import { Card, CardContent } from "@/components/ui/card";
import { Button } from "@/components/ui/button";
import { Input } from "@/components/ui/input";
import { Label } from "@/components/ui/label";
import { Textarea } from "@/components/ui/textarea";
import { Badge } from "@/components/ui/badge";
import { ScrollArea } from "@/components/ui/scroll-area";
import {
usePatchCorpus,
type CorpusDecision,
type CorpusDecisionPatch,
} from "@/lib/api/training";
import { LessonsTab } from "./lessons-tab";
type Props = {
decision: CorpusDecision | null;
onOpenChange: (open: boolean) => void;
};
export function CorpusDetailDrawer({ decision, onOpenChange }: Props) {
// Local editable state for the "details" tab. Re-seeds whenever the
// selected decision changes so the form reflects the row the chair
// clicked.
const [draft, setDraft] = useState<CorpusDecisionPatch>({});
const patch = usePatchCorpus();
/* eslint-disable react-hooks/set-state-in-effect */
useEffect(() => {
if (!decision) {
setDraft({});
return;
}
setDraft({
decision_number: decision.decision_number,
decision_date: decision.decision_date,
subject_categories: decision.subject_categories,
summary: decision.summary,
outcome: decision.outcome,
key_principles: decision.key_principles,
appeal_subtype: decision.appeal_subtype,
practice_area: decision.practice_area,
});
}, [decision]);
/* eslint-enable react-hooks/set-state-in-effect */
const open = decision !== null;
if (!decision) return null;
// Diff against the originally loaded row — only PATCH fields the chair
// actually changed, so concurrent edits to other fields stay intact.
const diff: CorpusDecisionPatch = {};
if (draft.decision_number !== decision.decision_number)
diff.decision_number = draft.decision_number;
if (draft.decision_date !== decision.decision_date)
diff.decision_date = draft.decision_date;
if (draft.summary !== decision.summary)
diff.summary = draft.summary;
if (draft.outcome !== decision.outcome)
diff.outcome = draft.outcome;
if (draft.appeal_subtype !== decision.appeal_subtype)
diff.appeal_subtype = draft.appeal_subtype;
if (draft.practice_area !== decision.practice_area)
diff.practice_area = draft.practice_area;
if (
JSON.stringify(draft.subject_categories) !==
JSON.stringify(decision.subject_categories)
)
diff.subject_categories = draft.subject_categories;
if (
JSON.stringify(draft.key_principles) !==
JSON.stringify(decision.key_principles)
)
diff.key_principles = draft.key_principles;
const isDirty = Object.keys(diff).length > 0;
const onSave = async () => {
if (!isDirty) return;
try {
await patch.mutateAsync({ id: decision.id, patch: diff });
toast.success("המטא-דאטה עודכן");
} catch (e) {
toast.error(e instanceof Error ? e.message : "כשל בשמירה");
}
};
const setSubjects = (raw: string) =>
setDraft((d) => ({
...d,
subject_categories: raw.split(/[,،]/).map((s) => s.trim()).filter(Boolean),
}));
const setPrinciples = (raw: string) =>
setDraft((d) => ({
...d,
key_principles: raw.split("\n").map((s) => s.trim()).filter(Boolean),
}));
return (
<Sheet open={open} onOpenChange={onOpenChange}>
<SheetContent side="left" className="w-full sm:max-w-3xl overflow-y-auto" dir="rtl">
<SheetHeader>
<SheetTitle className="text-navy flex items-center gap-2">
<BookOpen className="w-4 h-4 shrink-0" />
{decision.legal_citation || decision.decision_number || "—"}
</SheetTitle>
<SheetDescription className="text-ink-muted">
{decision.doc_title || "החלטה בקורפוס הסגנוני"}
</SheetDescription>
</SheetHeader>
{/* Summary strip — fast-scan info, always visible above the tabs. */}
<div className="px-6 mt-3 grid grid-cols-2 md:grid-cols-4 gap-3 text-[0.78rem]">
<DataPoint icon={<Calendar className="w-3 h-3" />} label="תאריך"
value={decision.decision_date || "—"} />
<DataPoint icon={<FileText className="w-3 h-3" />} label="תווים"
value={`${(decision.chars / 1000).toFixed(1)}K`} />
<DataPoint icon={<FileText className="w-3 h-3" />} label="עמודים"
value={decision.page_count > 0 ? String(decision.page_count) : "—"} />
<DataPoint icon={<Tag className="w-3 h-3" />} label="תת-סוג"
value={decision.appeal_subtype || "—"} />
</div>
<div className="px-6 pb-6 mt-4">
<Tabs defaultValue="details" dir="rtl">
<TabsList className="bg-rule-soft/60">
<TabsTrigger value="details">פרטים</TabsTrigger>
<TabsTrigger value="content">תוכן</TabsTrigger>
<TabsTrigger value="lessons">מה למדנו</TabsTrigger>
<TabsTrigger value="patterns">דפוסים</TabsTrigger>
</TabsList>
{/* ── Tab: editable metadata ─────────────────────────── */}
<TabsContent value="details" className="mt-4 space-y-4">
<div className="grid grid-cols-2 gap-3">
<Field label="מספר ההחלטה">
<Input value={draft.decision_number ?? ""}
onChange={(e) => setDraft((d) => ({ ...d, decision_number: e.target.value }))}
dir="rtl" />
</Field>
<Field label="תאריך">
<Input type="date" value={draft.decision_date ?? ""}
onChange={(e) => setDraft((d) => ({ ...d, decision_date: e.target.value }))} />
</Field>
</div>
<Field label="נושאים (מופרדים בפסיקים)">
<Input value={(draft.subject_categories ?? []).join(", ")}
onChange={(e) => setSubjects(e.target.value)} dir="rtl" />
{decision.subject_categories.length > 0 && (
<div className="flex flex-wrap gap-1 mt-1">
{decision.subject_categories.map((s) => (
<Badge key={s} variant="outline"
className="text-[0.7rem] bg-gold-wash text-gold-deep border-gold/40">
{s}
</Badge>
))}
</div>
)}
</Field>
<div className="grid grid-cols-2 gap-3">
<Field label="תת-סוג ערר">
<Input value={draft.appeal_subtype ?? ""}
onChange={(e) => setDraft((d) => ({ ...d, appeal_subtype: e.target.value }))}
placeholder="building_permit / betterment_levy / compensation_197"
dir="rtl" />
</Field>
<Field label="תחום משפט">
<Input value={draft.practice_area ?? ""}
onChange={(e) => setDraft((d) => ({ ...d, practice_area: e.target.value }))}
dir="rtl" />
</Field>
</div>
<Field label="תקציר (summary)">
<Textarea value={draft.summary ?? ""} rows={3}
onChange={(e) => setDraft((d) => ({ ...d, summary: e.target.value }))}
placeholder="תקציר חופשי — מי, מה, איך הוכרע"
dir="rtl" />
</Field>
<Field label="התוצאה (outcome)">
<Textarea value={draft.outcome ?? ""} rows={2}
onChange={(e) => setDraft((d) => ({ ...d, outcome: e.target.value }))}
placeholder="קבלה / קבלה חלקית / דחייה — בקצרה"
dir="rtl" />
</Field>
<Field label="עקרונות מרכזיים (שורה לכל אחד)">
<Textarea value={(draft.key_principles ?? []).join("\n")} rows={4}
onChange={(e) => setPrinciples(e.target.value)}
placeholder={"דוגמה:\nשיקול דעת מוגבל לחריגות קטנות\nריפוי פגם רק בנסיבות חריגות"}
dir="rtl" />
</Field>
{decision.parties.appellant && (
<Card className="bg-rule-soft/40 border-rule">
<CardContent className="px-4 py-3 text-[0.78rem] text-ink-soft">
<p><strong className="text-navy">עורר/ת:</strong> {decision.parties.appellant}</p>
{decision.parties.respondent && (
<p className="mt-1"><strong className="text-navy">משיב/ה:</strong> {decision.parties.respondent}</p>
)}
<p className="mt-2 text-ink-muted text-[0.72rem]">
(חולץ אוטומטית מתחילת הטקסט; תקן ע&quot;י עריכת ה-full_text במקור.)
</p>
</CardContent>
</Card>
)}
<div className="flex items-center justify-end gap-2 pt-2 border-t border-rule">
<Button variant="ghost" onClick={() => onOpenChange(false)}>
סגור
</Button>
<Button onClick={onSave} disabled={!isDirty || patch.isPending}
className="bg-navy text-parchment hover:bg-navy-soft">
{patch.isPending ? (
<Loader2 className="w-4 h-4 animate-spin me-1" />
) : (
<Save className="w-4 h-4 me-1" />
)}
שמור שינויים
</Button>
</div>
</TabsContent>
{/* ── Tab: full_text (read-only) ─────────────────────── */}
<TabsContent value="content" className="mt-4">
<Card className="bg-surface border-rule">
<CardContent className="px-4 py-3">
<p className="text-[0.72rem] text-ink-muted mb-2">
{decision.chars.toLocaleString("he-IL")} תווים · קריאה בלבד
</p>
<ScrollArea className="h-[480px] pe-2">
<p className="text-sm text-ink leading-relaxed whitespace-pre-wrap">
<FullTextLazy id={decision.id} />
</p>
</ScrollArea>
</CardContent>
</Card>
</TabsContent>
{/* ── Tab: lessons (per-decision) ────────────────────── */}
<TabsContent value="lessons" className="mt-4">
<LessonsTab corpusId={decision.id} />
</TabsContent>
{/* ── Tab: patterns scoped by appeal_subtype ─────────── */}
<TabsContent value="patterns" className="mt-4">
<PatternsForSubtype subtype={decision.appeal_subtype} />
</TabsContent>
</Tabs>
</div>
</SheetContent>
</Sheet>
);
}
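Review note: the drawer builds its PATCH body by comparing each draft field with the loaded row, using `JSON.stringify` for the two array fields. The same idea can be sketched as one generic helper. `diffPatch` and the `Row` type below are hypothetical and only illustrate the technique; the commit itself uses the hand-written comparisons above.

```typescript
// Hypothetical generic version of the field-by-field diff.
// Scalars are compared with !==, arrays by their JSON serialization.
function diffPatch<T extends object>(
  original: T,
  draft: Partial<T>,
  keys: readonly (keyof T)[],
): Partial<T> {
  const out: Partial<T> = {};
  for (const key of keys) {
    const before = original[key];
    const after = draft[key];
    const changed =
      Array.isArray(before) || Array.isArray(after)
        ? JSON.stringify(before) !== JSON.stringify(after)
        : before !== after;
    // Only changed fields go into the PATCH body.
    if (changed) out[key] = after;
  }
  return out;
}

// Illustrative row shape (a subset of the real CorpusDecision fields).
type Row = { summary: string; outcome: string; key_principles: string[] };

const loaded: Row = { summary: "old", outcome: "accepted", key_principles: ["a", "b"] };
const edited: Partial<Row> = { ...loaded, summary: "new", key_principles: ["a", "b"] };
const patchBody = diffPatch<Row>(loaded, edited, ["summary", "outcome", "key_principles"]);
// patchBody holds only the `summary` field.
```

As in the drawer, a key that is missing from the draft counts as changed (to `undefined`), so the draft should be seeded from the row before diffing.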
// ── helpers ────────────────────────────────────────────────────────
function DataPoint({
icon, label, value,
}: { icon: React.ReactNode; label: string; value: string }) {
return (
<div className="flex items-center gap-1 text-ink-muted">
{icon}
<span>{label}:</span>
<span className="font-semibold text-navy tabular-nums truncate">{value}</span>
</div>
);
}
function Field({
label, children,
}: { label: string; children: React.ReactNode }) {
return (
<div className="space-y-1">
<Label className="text-[0.78rem]">{label}</Label>
{children}
</div>
);
}
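Review note: `setSubjects` and `setPrinciples` parse the raw input on every keystroke, and the inputs render the parsed array joined back together. The sketch below reproduces only the parsing rules (`parseSubjects` and `parsePrinciples` are hypothetical names) and shows what that round trip does to a trailing separator. As far as the code shown goes, a comma or newline typed at the end is dropped before the next render; this is worth verifying in the browser.

```typescript
// Same splitting rules as the drawer's setSubjects / setPrinciples.
function parseSubjects(raw: string): string[] {
  // Accepts the ASCII comma and the Arabic comma (U+060C).
  return raw.split(/[,\u060C]/).map((s) => s.trim()).filter(Boolean);
}

function parsePrinciples(raw: string): string[] {
  // One principle per line; blank lines are discarded.
  return raw.split("\n").map((s) => s.trim()).filter(Boolean);
}

// Round trip as the controlled inputs perform it after each keystroke.
const afterComma = parseSubjects("permits,").join(", ");
const afterEnter = parsePrinciples("first principle\n").join("\n");
// Both results equal the text before the separator was typed.
```

One possible remedy, offered as a suggestion only, is to keep the raw text in its own state and parse it when saving.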
/* The corpus-list endpoint deliberately doesn't return full_text (too big).
* We fetch it on demand only when the content tab opens.
*
* Implementation note: we don't have a dedicated /api/training/corpus/{id}
* GET endpoint yet. As a thin stopgap we hit a planned `/full-text` shortcut
* with a plain fetch; if the endpoint isn't deployed yet, the UI just shows
* the fallback message instead of crashing. The full-text endpoint lands with
* the next backend deploy.
*/
function FullTextLazy({ id }: { id: string }) {
const [text, setText] = useState<string>("");
const [loading, setLoading] = useState(true);
const [error, setError] = useState("");
/* eslint-disable react-hooks/set-state-in-effect */
useEffect(() => {
let cancelled = false;
setLoading(true);
setError("");
fetch(`/api/training/corpus/${encodeURIComponent(id)}/full-text`)
.then((r) => (r.ok ? r.json() : Promise.reject(new Error(`HTTP ${r.status}`))))
.then((d: { full_text: string }) => {
if (cancelled) return;
setText(d.full_text || "");
})
.catch((e: Error) => {
if (cancelled) return;
setError(e.message);
})
.finally(() => !cancelled && setLoading(false));
return () => { cancelled = true; };
}, [id]);
/* eslint-enable react-hooks/set-state-in-effect */
if (loading) return <span className="text-ink-muted">טוען</span>;
if (error) return <span className="text-ink-muted">לא נמצא ({error})</span>;
return text;
}
function PatternsForSubtype({ subtype }: { subtype: string }) {
// Filtered patterns endpoint isn't built yet, so we fall back to /patterns
// and show corpus-wide patterns without filtering; `subtype` is only used
// for the notice below. Better filtering ships in the metadata-enrichment
// iteration.
const [data, setData] = useState<Record<string, { pattern_text: string; frequency: number }[]> | null>(null);
const [loading, setLoading] = useState(true);
useEffect(() => {
let cancelled = false;
fetch("/api/training/patterns")
.then((r) => (r.ok ? r.json() : Promise.reject(new Error(`HTTP ${r.status}`))))
.then((d: { by_type: Record<string, { pattern_text: string; frequency: number }[]> }) => {
if (!cancelled) setData(d.by_type);
})
.catch(() => !cancelled && setData({}))
.finally(() => !cancelled && setLoading(false));
return () => { cancelled = true; };
}, []);
if (loading) return <p className="text-ink-muted text-sm text-center py-6">טוען</p>;
if (!data || Object.keys(data).length === 0) {
return <p className="text-ink-muted text-sm text-center py-6">אין דפוסים שמורים. הרץ ניתוח סגנון.</p>;
}
return (
<div className="space-y-3">
{subtype && (
<p className="text-[0.78rem] text-ink-muted">
דפוסים בכלל הקורפוס. סינון לפי תת-סוג {subtype} ייושם בעדכון הבא.
</p>
)}
{Object.entries(data).slice(0, 4).map(([type, items]) => (
<Card key={type} className="bg-surface border-rule">
<CardContent className="px-4 py-3">
<h4 className="text-[0.78rem] uppercase tracking-wider text-gold-deep font-semibold mb-2">
{type}
</h4>
<ul className="space-y-1 text-sm text-ink">
{items.slice(0, 6).map((p, i) => (
<li key={i} className="flex items-start gap-2">
<span className="text-[0.72rem] tabular-nums text-ink-muted shrink-0 mt-0.5">
×{p.frequency}
</span>
<span>{p.pattern_text}</span>
</li>
))}
</ul>
</CardContent>
</Card>
))}
</div>
);
}

@@ -1,6 +1,7 @@
  "use client";
- import { Trash2 } from "lucide-react";
+ import { useState } from "react";
+ import { Trash2, Sparkles } from "lucide-react";
  import { toast } from "sonner";
  import {
  Table, TableBody, TableCell, TableHead, TableHeader, TableRow,
@@ -9,12 +10,20 @@ import { Button } from "@/components/ui/button";
  import { Badge } from "@/components/ui/badge";
  import { Skeleton } from "@/components/ui/skeleton";
  import { useCorpus, useDeleteCorpusEntry, type CorpusDecision } from "@/lib/api/training";
+ import { CorpusDetailDrawer } from "./corpus-detail-drawer";
  /*
- * Corpus tab: table of all decisions currently in the style corpus, with a
- * single destructive action (remove from corpus). Uses browser confirm() for
- * the confirmation — a full shadcn AlertDialog would be overkill for an
- * admin-only destructive action with a server-side safety net.
+ * Corpus tab: table of all decisions currently in the style corpus.
+ *
+ * Click any row → opens CorpusDetailDrawer with the enriched metadata
+ * + edit UI. The trash button is now in its own narrow column and uses
+ * stopPropagation so deleting a row doesn't also open the drawer.
+ *
+ * We use browser confirm() for the destructive action rather than a
+ * full shadcn AlertDialog because this is a single admin operation
+ * gated by an API-level safety net (FK cascade is best-effort but
+ * style_corpus DELETE returns 404 on missing rows, so the worst case
+ * is a no-op).
  */
  function formatChars(n: number) {
@@ -30,9 +39,12 @@ function formatDate(iso: string) {
  }
  }
- function Row({ item }: { item: CorpusDecision }) {
+ function Row({
+ item, onOpen,
+ }: { item: CorpusDecision; onOpen: () => void }) {
  const del = useDeleteCorpusEntry();
- const onDelete = async () => {
+ const onDelete = async (e: React.MouseEvent) => {
+ e.stopPropagation();
  if (!window.confirm(`למחוק את החלטה ${item.decision_number} מהקורפוס?`)) return;
  try {
  await del.mutateAsync(item.id);
@@ -43,7 +55,10 @@ function Row({ item }: { item: CorpusDecision }) {
  };
  return (
- <TableRow className="border-rule hover:bg-gold-wash/30">
+ <TableRow
+ className="border-rule hover:bg-gold-wash/30 cursor-pointer"
+ onClick={onOpen}
+ >
  <TableCell className="font-semibold text-navy tabular-nums">
  {item.decision_number || "—"}
  </TableCell>
@@ -55,20 +70,39 @@ function Row({ item }: { item: CorpusDecision }) {
  <span className="text-ink-light"></span>
  ) : (
  <div className="flex flex-wrap gap-1">
- {item.subject_categories.map((s) => (
- <Badge
- key={s}
- variant="outline"
- className="text-[0.7rem] bg-gold-wash text-gold-deep border-gold/40"
- >
+ {item.subject_categories.slice(0, 3).map((s) => (
+ <Badge key={s} variant="outline"
+ className="text-[0.7rem] bg-gold-wash text-gold-deep border-gold/40">
  {s}
  </Badge>
  ))}
+ {item.subject_categories.length > 3 && (
+ <span className="text-[0.7rem] text-ink-muted">
+ +{item.subject_categories.length - 3}
+ </span>
+ )}
  </div>
  )}
  </TableCell>
+ <TableCell className="text-[0.78rem] text-ink-soft">
+ <div className="flex items-center gap-2">
+ <span className="truncate">{item.legal_citation || "—"}</span>
+ {item.lessons_count > 0 && (
+ <Badge variant="outline"
+ className="text-[0.7rem] bg-info-bg text-info border-info/40 shrink-0">
+ <Sparkles className="w-3 h-3 me-0.5" />
+ {item.lessons_count}
+ </Badge>
+ )}
+ </div>
+ </TableCell>
  <TableCell className="text-ink-soft tabular-nums">
  {formatChars(item.chars)}
+ {item.page_count > 0 && (
+ <span className="text-ink-muted text-[0.72rem] ms-1">
+ · {item.page_count} ע׳
+ </span>
+ )}
  </TableCell>
  <TableCell className="text-ink-muted tabular-nums text-[0.78rem]">
  {formatDate(item.created_at)}
@@ -91,6 +125,7 @@ function Row({ item }: { item: CorpusDecision }) {
  export function CorpusPanel() {
  const { data, isPending, error } = useCorpus();
+ const [selected, setSelected] = useState<CorpusDecision | null>(null);
  if (error) {
  return (
@@ -101,40 +136,50 @@ export function CorpusPanel() {
  }
  return (
- <div className="rounded-lg border border-rule bg-surface shadow-sm overflow-hidden">
- <Table>
- <TableHeader className="bg-rule-soft/60">
- <TableRow className="border-rule">
- <TableHead className="text-navy text-right">מס׳ החלטה</TableHead>
- <TableHead className="text-navy text-right">תאריך</TableHead>
- <TableHead className="text-navy text-right">נושאים</TableHead>
- <TableHead className="text-navy text-right">תווים</TableHead>
- <TableHead className="text-navy text-right">נוסף בתאריך</TableHead>
- <TableHead className="text-navy" />
- </TableRow>
- </TableHeader>
- <TableBody>
- {isPending ? (
- [...Array(4)].map((_, i) => (
- <TableRow key={i} className="border-rule">
- {[...Array(6)].map((_, j) => (
- <TableCell key={j}>
- <Skeleton className="h-4 w-24" />
- </TableCell>
- ))}
- </TableRow>
- ))
- ) : data?.length === 0 ? (
- <TableRow>
- <TableCell colSpan={6} className="text-center text-ink-muted py-12">
- הקורפוס ריק
- </TableCell>
+ <>
+ <div className="rounded-lg border border-rule bg-surface shadow-sm overflow-hidden">
+ <Table>
+ <TableHeader className="bg-rule-soft/60">
+ <TableRow className="border-rule">
+ <TableHead className="text-navy text-right">מס׳ החלטה</TableHead>
+ <TableHead className="text-navy text-right">תאריך</TableHead>
+ <TableHead className="text-navy text-right">נושאים</TableHead>
+ <TableHead className="text-navy text-right">מראה מקום</TableHead>
+ <TableHead className="text-navy text-right">תווים / עמודים</TableHead>
+ <TableHead className="text-navy text-right">נוסף בתאריך</TableHead>
+ <TableHead className="text-navy" />
  </TableRow>
- ) : (
- data?.map((item) => <Row key={item.id} item={item} />)
- )}
- </TableBody>
- </Table>
- </div>
+ </TableHeader>
+ <TableBody>
+ {isPending ? (
+ [...Array(4)].map((_, i) => (
+ <TableRow key={i} className="border-rule">
+ {[...Array(7)].map((_, j) => (
+ <TableCell key={j}>
+ <Skeleton className="h-4 w-24" />
+ </TableCell>
+ ))}
+ </TableRow>
+ ))
+ ) : data?.length === 0 ? (
+ <TableRow>
+ <TableCell colSpan={7} className="text-center text-ink-muted py-12">
+ הקורפוס ריק
+ </TableCell>
+ </TableRow>
+ ) : (
+ data?.map((item) => (
+ <Row key={item.id} item={item} onOpen={() => setSelected(item)} />
+ ))
+ )}
+ </TableBody>
+ </Table>
+ </div>
+ <CorpusDetailDrawer
+ decision={selected}
+ onOpenChange={(open) => { if (!open) setSelected(null); }}
+ />
+ </>
  );
  }
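Review note: the updated `Row` caps the subject badges at three and prints a `+N` counter for the remainder. A minimal sketch of that split as a pure helper; `splitForBadges` is a hypothetical name used for illustration, and the component inlines the same `slice` and length arithmetic.

```typescript
// Hypothetical helper: first `max` items are shown as badges,
// the rest are summarized as an overflow count.
function splitForBadges<T>(
  items: readonly T[],
  max: number,
): { visible: T[]; overflow: number } {
  const visible = items.slice(0, max);
  // Never negative, so short lists report no overflow.
  const overflow = Math.max(0, items.length - max);
  return { visible, overflow };
}

const { visible, overflow } = splitForBadges(
  ["permits", "levy", "compensation", "zoning", "appeals"],
  3,
);
// visible has three entries; overflow is 2, rendered as "+2".
```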

@@ -0,0 +1,338 @@
"use client";
/*
* Curator-Portrait tab — shows everything about the agent that learns
* Daphna's style:
* 1. Snapshot stats (total curator findings, decisions reviewed,
* findings applied to SKILL, average findings per decision)
* 2. Recent curator findings (last 10) — linked by decision number
* 3. The hermes-curator system prompt, rendered + linked to Gitea
* 4. The style_analyzer training prompts (different lifecycle — runs
* over the corpus at training time, not per-decision)
* 5. Propose-change form — writes a markdown file to disk for chair
* review (no auto-commit)
*
* The prompts are deliberately read-only here: they're symlinked into
* Paperclip and load-bearing for every curator wake. Editing them from
* the UI would silently fork the source of truth.
*/
import { useState } from "react";
import {
Sparkles, ExternalLink, Send, Loader2, FileText, Brain,
CheckCircle2, Clock,
} from "lucide-react";
import { toast } from "sonner";
import { Card, CardContent } from "@/components/ui/card";
import { Button } from "@/components/ui/button";
import { Input } from "@/components/ui/input";
import { Label } from "@/components/ui/label";
import { Textarea } from "@/components/ui/textarea";
import { Badge } from "@/components/ui/badge";
import { Skeleton } from "@/components/ui/skeleton";
import { ScrollArea } from "@/components/ui/scroll-area";
import { Tabs, TabsContent, TabsList, TabsTrigger } from "@/components/ui/tabs";
import { Markdown } from "@/components/ui/markdown";
import {
useCuratorPrompt,
useCuratorStats,
useStyleAnalyzerPrompts,
useSubmitCuratorProposal,
} from "@/lib/api/training";
export function CuratorPortraitPanel() {
return (
<div className="space-y-6">
<StatsCard />
<RecentFindings />
<Tabs defaultValue="curator-prompt" dir="rtl">
<TabsList className="bg-rule-soft/60">
<TabsTrigger value="curator-prompt">פרומפט ה-Curator</TabsTrigger>
<TabsTrigger value="analyzer-prompt">פרומפט אימון הסגנון</TabsTrigger>
<TabsTrigger value="propose">הצעת שינוי</TabsTrigger>
</TabsList>
<TabsContent value="curator-prompt" className="mt-4">
<CuratorPromptCard />
</TabsContent>
<TabsContent value="analyzer-prompt" className="mt-4">
<StyleAnalyzerPromptCard />
</TabsContent>
<TabsContent value="propose" className="mt-4">
<ProposeChangeForm />
</TabsContent>
</Tabs>
</div>
);
}
// ── stats card ─────────────────────────────────────────────────────
function StatsCard() {
const { data, isPending } = useCuratorStats();
if (isPending) {
return (
<div className="grid grid-cols-2 md:grid-cols-4 gap-3">
{[...Array(4)].map((_, i) => <Skeleton key={i} className="h-20 w-full" />)}
</div>
);
}
if (!data) return null;
return (
<div className="grid grid-cols-2 md:grid-cols-4 gap-3">
<Kpi label="ממצאי curator" value={data.total_findings} icon={<Sparkles className="w-4 h-4" />} />
<Kpi label="החלטות שנסקרו" value={`${data.decisions_with_findings}/${data.decisions_total}`} icon={<FileText className="w-4 h-4" />} />
<Kpi label="ממצאים שאומצו ל-SKILL" value={data.findings_applied} icon={<CheckCircle2 className="w-4 h-4" />} />
<Kpi label="ממוצע ממצאים להחלטה"
value={
data.decisions_with_findings > 0
? (data.total_findings / data.decisions_with_findings).toFixed(1)
: "—"
}
icon={<Brain className="w-4 h-4" />}
/>
</div>
);
}
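Review note: the last KPI divides total findings by the number of decisions that have findings, and shows a placeholder when that denominator is zero. A small sketch of the guard as a standalone function; `averageFindings` and its `fallback` parameter are hypothetical, and the component renders a dash where this sketch defaults to "n/a".

```typescript
// Hypothetical helper: average findings per reviewed decision,
// formatted to one decimal place, with a guard against division by zero.
function averageFindings(
  totalFindings: number,
  decisionsWithFindings: number,
  fallback: string = "n/a",
): string {
  return decisionsWithFindings > 0
    ? (totalFindings / decisionsWithFindings).toFixed(1)
    : fallback;
}

const kpi = averageFindings(7, 2);     // seven findings across two decisions
const emptyKpi = averageFindings(0, 0); // no reviewed decisions yet
```

Returning a string in both branches keeps the `Kpi` prop type simple, since `toFixed` already yields a string.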
function Kpi({
label, value, icon,
}: { label: string; value: string | number; icon: React.ReactNode }) {
return (
<Card className="bg-surface border-rule">
<CardContent className="px-4 py-3">
<div className="flex items-center gap-2 text-ink-muted text-[0.78rem]">
{icon}
<span>{label}</span>
</div>
<p className="text-2xl text-navy font-semibold tabular-nums mt-1">{value}</p>
</CardContent>
</Card>
);
}
// ── recent findings ────────────────────────────────────────────────
function RecentFindings() {
const { data, isPending } = useCuratorStats();
if (isPending) {
return <Skeleton className="h-40 w-full" />;
}
if (!data || data.recent_findings.length === 0) {
return (
<Card className="bg-rule-soft/40 border-rule">
<CardContent className="px-6 py-5 text-center text-ink-muted text-sm">
אין עדיין ממצאים של ה-Curator. הוא מופעל אוטומטית כאשר דפנה מסמנת
החלטה כסופית (mark-final), ושומר את ממצאיו כ-decision_lessons עם
source=&quot;curator&quot;.
</CardContent>
</Card>
);
}
return (
<Card className="bg-surface border-rule">
<CardContent className="px-4 py-3">
<h3 className="text-[0.78rem] uppercase tracking-wider text-gold-deep font-semibold mb-3">
ממצאים אחרונים של ה-Curator
</h3>
<ul className="space-y-2">
{data.recent_findings.map((f) => (
<li key={f.id} className="border-b border-rule pb-2 last:border-0 last:pb-0">
<div className="flex items-center gap-2 text-[0.72rem] mb-1">
<Badge variant="outline"
className="bg-info-bg text-info border-info/40">
{f.category}
</Badge>
<span className="text-navy font-semibold tabular-nums">
{f.decision_number || "—"}
</span>
{f.applied_to_skill && (
<Badge variant="outline"
className="bg-success-bg text-success border-success/40">
<CheckCircle2 className="w-3 h-3 me-0.5" />
אומץ
</Badge>
)}
<span className="grow text-ink-muted text-end">
<Clock className="w-3 h-3 inline me-1" />
{new Date(f.created_at).toLocaleDateString("he-IL")}
</span>
</div>
<p className="text-sm text-ink leading-relaxed">{f.lesson_text}</p>
</li>
))}
</ul>
</CardContent>
</Card>
);
}
// ── prompts ────────────────────────────────────────────────────────
function CuratorPromptCard() {
const { data, isPending, error } = useCuratorPrompt();
if (isPending) return <Skeleton className="h-96 w-full" />;
if (error) {
return (
<Card className="bg-danger-bg border-danger/40">
<CardContent className="px-6 py-4 text-danger">{error.message}</CardContent>
</Card>
);
}
if (!data) return null;
return (
<Card className="bg-surface border-rule">
<CardContent className="px-5 py-4 space-y-3">
<div className="flex items-center justify-between gap-2 flex-wrap">
<div>
<h3 className="text-navy font-semibold">{data.filename}</h3>
<p className="text-[0.72rem] text-ink-muted">
{data.bytes.toLocaleString("he-IL")} בייטים ·
עודכן: {new Date(data.last_modified * 1000).toLocaleString("he-IL")}
</p>
</div>
<Button asChild variant="outline" size="sm">
<a href={data.gitea_url} target="_blank" rel="noopener noreferrer">
<ExternalLink className="w-3 h-3 me-1" />
ערוך ב-Gitea
</a>
</Button>
</div>
<ScrollArea className="h-[520px] pe-2 border border-rule rounded p-3 bg-rule-soft/30">
<Markdown content={data.content} />
</ScrollArea>
</CardContent>
</Card>
);
}
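Review note: `new Date(data.last_modified * 1000)` implies that the endpoint reports `last_modified` in Unix seconds, whereas JavaScript's `Date` constructor expects milliseconds. That unit is inferred from the multiplication, not confirmed by the API code shown here. A sketch of the conversion, with `epochSecondsToDate` as a hypothetical helper name:

```typescript
// Hypothetical helper: convert a Unix timestamp in seconds
// (assumed unit of `last_modified`) to a JavaScript Date.
function epochSecondsToDate(seconds: number): Date {
  // Date takes milliseconds since the epoch, hence the factor of 1000.
  return new Date(seconds * 1000);
}

const modified = epochSecondsToDate(1_700_000_000);
// Passing the raw seconds value without scaling would land in January 1970.
const unscaled = new Date(1_700_000_000);
```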
function StyleAnalyzerPromptCard() {
const { data, isPending } = useStyleAnalyzerPrompts();
if (isPending) return <Skeleton className="h-96 w-full" />;
if (!data) return null;
return (
<Card className="bg-surface border-rule">
<CardContent className="px-5 py-4 space-y-3">
<div>
<h3 className="text-navy font-semibold">פרומפטים של style_analyzer.py</h3>
<p className="text-[0.72rem] text-ink-muted">
רץ ב-Claude Opus (1M context, עד {data.max_input_tokens.toLocaleString("he-IL")} tokens
input) דרך claude CLI מקומי חינמי, ללא API. נקרא ע&quot;י
<code className="px-1 mx-1 bg-rule-soft rounded">POST /api/training/analyze-style</code>
ומכניס דפוסים ל-<code className="px-1 bg-rule-soft rounded">style_patterns</code>.
</p>
</div>
<Tabs defaultValue="analysis" dir="rtl">
<TabsList className="bg-rule-soft/60">
<TabsTrigger value="analysis">Single-pass (כל הקורפוס)</TabsTrigger>
<TabsTrigger value="single">Multi-pass (החלטה אחת)</TabsTrigger>
<TabsTrigger value="synthesis">Synthesis</TabsTrigger>
</TabsList>
<TabsContent value="analysis" className="mt-3">
<PromptBlock content={data.analysis_prompt} />
</TabsContent>
<TabsContent value="single" className="mt-3">
<PromptBlock content={data.single_decision_prompt} />
</TabsContent>
<TabsContent value="synthesis" className="mt-3">
<PromptBlock content={data.synthesis_prompt} />
</TabsContent>
</Tabs>
</CardContent>
</Card>
);
}
function PromptBlock({ content }: { content: string }) {
return (
<ScrollArea className="h-[420px] pe-2 border border-rule rounded p-3 bg-rule-soft/30">
<pre className="text-[0.78rem] whitespace-pre-wrap font-mono text-ink leading-relaxed"
dir="rtl">
{content}
</pre>
</ScrollArea>
);
}
// ── propose change form ────────────────────────────────────────────
function ProposeChangeForm() {
const [title, setTitle] = useState("");
const [proposedChange, setProposedChange] = useState("");
const [rationale, setRationale] = useState("");
const submit = useSubmitCuratorProposal();
const onSubmit = async (e: React.FormEvent) => {
e.preventDefault();
if (!title.trim() || !proposedChange.trim()) {
toast.error("חובה כותרת ושינוי מוצע");
return;
}
try {
const r = await submit.mutateAsync({
title: title.trim(),
proposed_change: proposedChange.trim(),
rationale: rationale.trim(),
});
toast.success(`נשמרה הצעה: ${r.filename}`);
setTitle(""); setProposedChange(""); setRationale("");
} catch (e) {
toast.error(e instanceof Error ? e.message : "כשל בשמירה");
}
};
return (
<Card className="bg-surface border-rule">
<CardContent className="px-5 py-4">
<h3 className="text-navy font-semibold mb-2">הצעת שינוי לפרומפט ה-Curator</h3>
<p className="text-[0.78rem] text-ink-muted mb-4">
ההצעה תישמר כקובץ Markdown ב-
<code className="px-1 bg-rule-soft rounded">data/curator-proposals/</code>.
חיים יבחן ויאשר ידנית; אין שינוי אוטומטי בפרומפט.
</p>
<form onSubmit={onSubmit} className="space-y-3">
<div className="space-y-1">
<Label htmlFor="proposal-title">כותרת השינוי</Label>
<Input id="proposal-title" value={title}
onChange={(e) => setTitle(e.target.value)}
placeholder="לדוגמה: הוסף קטגוריה [צ׳קליסט תוכן] לממצאי ה-curator"
dir="rtl" />
</div>
<div className="space-y-1">
<Label htmlFor="proposal-change">השינוי המוצע (Markdown)</Label>
<Textarea id="proposal-change" value={proposedChange} rows={6}
onChange={(e) => setProposedChange(e.target.value)}
placeholder={"תאר במדויק מה לשנות. אפשר להעתיק את הקטע הקיים ולסמן ב-strikethrough + להוסיף את החדש."}
dir="rtl" />
</div>
<div className="space-y-1">
<Label htmlFor="proposal-rationale">נימוק</Label>
<Textarea id="proposal-rationale" value={rationale} rows={3}
onChange={(e) => setRationale(e.target.value)}
placeholder="למה השינוי הזה חשוב? איזה בעיה הוא פותר?"
dir="rtl" />
</div>
<div className="flex justify-end">
<Button type="submit" disabled={submit.isPending}
className="bg-navy text-parchment hover:bg-navy-soft">
{submit.isPending ? (
<Loader2 className="w-4 h-4 animate-spin me-1" />
) : (
<Send className="w-4 h-4 me-1" />
)}
שלח הצעה
</Button>
</div>
</form>
</CardContent>
</Card>
);
}

@@ -0,0 +1,267 @@
"use client";
/*
* Per-decision lessons editor — lives inside CorpusDetailDrawer's
* "מה למדנו" tab. Lessons are persisted in the decision_lessons table
* (one-to-many on style_corpus) and consumed by hermes-curator and
* future style_analyzer runs as context.
*
* The chair can:
* - Add a lesson typed manually (category = "general" by default)
* - Edit / delete existing lessons
* - Mark a lesson as "applied_to_skill" (informational — doesn't
* auto-commit anything to SKILL.md; chair still curates that file
* manually in git).
*
* Lessons from the curator arrive with source="curator" and are visually
* distinguished by a badge so the chair can audit auto-suggestions.
*/
import { useState } from "react";
import { Plus, Save, Trash2, Loader2, CheckCircle2, Sparkles } from "lucide-react";
import { toast } from "sonner";
import { Button } from "@/components/ui/button";
import { Card, CardContent } from "@/components/ui/card";
import { Textarea } from "@/components/ui/textarea";
import { Badge } from "@/components/ui/badge";
import { Skeleton } from "@/components/ui/skeleton";
import {
Select, SelectContent, SelectItem, SelectTrigger, SelectValue,
} from "@/components/ui/select";
import {
useAddLesson,
useCorpusLessons,
useDeleteLesson,
usePatchLesson,
type DecisionLesson,
} from "@/lib/api/training";
const CATEGORIES = [
{ value: "general", label: "כללי" },
{ value: "style", label: "סגנון" },
{ value: "structure", label: "מבנה" },
{ value: "lexicon", label: "לקסיקון" },
{ value: "tabular", label: "טבלאי" },
] as const;
const SOURCE_BADGE: Record<DecisionLesson["source"], { label: string; cls: string }> = {
manual: { label: "ידני", cls: "bg-rule-soft text-ink-soft" },
chair: { label: "יו״ר", cls: "bg-gold-wash text-gold-deep" },
curator: { label: "Curator", cls: "bg-info-bg text-info" },
style_analyzer: { label: "Analyzer", cls: "bg-success-bg text-success" },
};
export function LessonsTab({ corpusId }: { corpusId: string }) {
const { data, isPending } = useCorpusLessons(corpusId);
const add = useAddLesson(corpusId);
const [draftText, setDraftText] = useState("");
const [draftCategory, setDraftCategory] = useState<DecisionLesson["category"]>("general");
const onAdd = async () => {
const text = draftText.trim();
if (!text) return;
try {
await add.mutateAsync({ lesson_text: text, category: draftCategory });
setDraftText("");
setDraftCategory("general");
toast.success("הלקח נוסף");
} catch (e) {
toast.error(e instanceof Error ? e.message : "כשל בשמירה");
}
};
return (
<div className="space-y-4">
{/* Composer */}
<Card className="bg-surface border-rule">
<CardContent className="px-4 py-3 space-y-2">
<h4 className="text-[0.78rem] uppercase tracking-wider text-gold-deep font-semibold">
הוסף לקח להחלטה
</h4>
<Textarea
value={draftText}
onChange={(e) => setDraftText(e.target.value)}
placeholder="מה למדנו מההחלטה הזו? למשל: 'דפנה מעדיפה הוצאות מתונות (5K-10K ₪) גם בערר שהתקבל במלואו'"
rows={3}
dir="rtl"
disabled={add.isPending}
/>
<div className="flex items-center gap-2">
<Select
value={draftCategory}
onValueChange={(v) => setDraftCategory(v as DecisionLesson["category"])}
disabled={add.isPending}
dir="rtl"
>
<SelectTrigger className="w-40">
<SelectValue />
</SelectTrigger>
<SelectContent>
{CATEGORIES.map((c) => (
<SelectItem key={c.value} value={c.value}>{c.label}</SelectItem>
))}
</SelectContent>
</Select>
<div className="grow" />
<Button onClick={onAdd} disabled={add.isPending || !draftText.trim()}
className="bg-navy text-parchment hover:bg-navy-soft">
{add.isPending ? (
<Loader2 className="w-4 h-4 animate-spin me-1" />
) : (
<Plus className="w-4 h-4 me-1" />
)}
שמור לקח
</Button>
</div>
</CardContent>
</Card>
{/* List */}
{isPending ? (
<div className="space-y-2">
{[...Array(3)].map((_, i) => (
<Skeleton key={i} className="h-16 w-full" />
))}
</div>
) : !data || data.length === 0 ? (
<p className="text-center text-ink-muted text-sm py-6">
אין עדיין לקחים להחלטה זו. הוסף לקח ראשון מלמעלה.
</p>
) : (
<div className="space-y-2">
{data.map((lesson) => (
<LessonItem key={lesson.id} lesson={lesson} corpusId={corpusId} />
))}
</div>
)}
</div>
);
}
function LessonItem({
lesson, corpusId,
}: { lesson: DecisionLesson; corpusId: string }) {
const [editing, setEditing] = useState(false);
const [text, setText] = useState(lesson.lesson_text);
const [category, setCategory] = useState<DecisionLesson["category"]>(lesson.category);
const patch = usePatchLesson(corpusId);
const del = useDeleteLesson(corpusId);
const sourceBadge = SOURCE_BADGE[lesson.source];
const dirty = text !== lesson.lesson_text || category !== lesson.category;
const onSave = async () => {
// Nothing changed: close the editor without sending an empty PATCH.
if (!dirty) {
setEditing(false);
return;
}
try {
await patch.mutateAsync({
id: lesson.id,
patch: { lesson_text: text, category },
});
setEditing(false);
toast.success("הלקח עודכן");
} catch (e) {
toast.error(e instanceof Error ? e.message : "כשל בעדכון");
}
};
const onToggleApplied = async () => {
try {
await patch.mutateAsync({
id: lesson.id,
patch: { applied_to_skill: !lesson.applied_to_skill },
});
} catch (e) {
toast.error(e instanceof Error ? e.message : "כשל בעדכון");
}
};
const onDelete = async () => {
if (!window.confirm("למחוק את הלקח?")) return;
try {
await del.mutateAsync(lesson.id);
toast.success("נמחק");
} catch (e) {
toast.error(e instanceof Error ? e.message : "כשל במחיקה");
}
};
return (
<Card className="bg-surface border-rule">
<CardContent className="px-4 py-3 space-y-2">
<div className="flex items-center gap-2 text-[0.72rem]">
<Badge variant="outline"
className="bg-rule-soft text-ink-soft">
{CATEGORIES.find((c) => c.value === lesson.category)?.label || lesson.category}
</Badge>
<Badge variant="outline" className={sourceBadge.cls}>
{sourceBadge.label}
</Badge>
{lesson.applied_to_skill && (
<Badge variant="outline"
className="bg-success-bg text-success border-success/40">
<CheckCircle2 className="w-3 h-3 me-1" />
אומץ
</Badge>
)}
<span className="grow text-ink-muted tabular-nums">
{new Date(lesson.created_at).toLocaleDateString("he-IL")}
</span>
</div>
{editing ? (
<>
<Textarea value={text} onChange={(e) => setText(e.target.value)}
rows={3} dir="rtl" />
<div className="flex items-center gap-2">
<Select value={category}
onValueChange={(v) => setCategory(v as DecisionLesson["category"])}
dir="rtl">
<SelectTrigger className="w-40">
<SelectValue />
</SelectTrigger>
<SelectContent>
{CATEGORIES.map((c) => (
<SelectItem key={c.value} value={c.value}>{c.label}</SelectItem>
))}
</SelectContent>
</Select>
<div className="grow" />
<Button variant="ghost" size="sm"
onClick={() => { setEditing(false); setText(lesson.lesson_text); setCategory(lesson.category); }}>
ביטול
</Button>
<Button size="sm" onClick={onSave} disabled={patch.isPending}
className="bg-navy text-parchment hover:bg-navy-soft">
<Save className="w-3 h-3 me-1" />
שמור
</Button>
</div>
</>
) : (
<>
<p className="text-sm text-ink leading-relaxed whitespace-pre-wrap"
onClick={() => setEditing(true)}
style={{ cursor: "text" }}>
{lesson.lesson_text}
</p>
<div className="flex items-center gap-2">
<Button variant="ghost" size="sm" onClick={onToggleApplied}
disabled={patch.isPending}>
<Sparkles className="w-3 h-3 me-1" />
{lesson.applied_to_skill ? "בטל סימון 'אומץ'" : "סמן כ'אומץ ל-SKILL'"}
</Button>
<Button variant="ghost" size="sm" onClick={() => setEditing(true)}>
ערוך
</Button>
<div className="grow" />
<Button variant="ghost" size="sm" onClick={onDelete}
disabled={del.isPending}
className="text-danger hover:text-danger hover:bg-danger-bg">
<Trash2 className="w-3 h-3" />
</Button>
</div>
</>
)}
</CardContent>
</Card>
);
}
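
The save handler above sends both `lesson_text` and `category` whenever either one differs from the stored lesson. A minimal sketch of a narrower diff follows. It assumes a hypothetical helper `buildLessonPatch` and a local `LessonFields` type that mirrors the editable fields of `DecisionLesson`; neither name exists in this commit.

```typescript
// Hypothetical helper, not part of the commit: computes the smallest PATCH
// body for a lesson edit. LessonFields mirrors the editable subset of
// DecisionLesson from @/lib/api/training.
type LessonCategory = "style" | "structure" | "lexicon" | "tabular" | "general";

type LessonFields = {
  lesson_text: string;
  category: LessonCategory;
};

function buildLessonPatch(
  original: LessonFields,
  draft: LessonFields,
): Partial<LessonFields> | null {
  const patch: Partial<LessonFields> = {};
  if (draft.lesson_text !== original.lesson_text) {
    patch.lesson_text = draft.lesson_text;
  }
  if (draft.category !== original.category) {
    patch.category = draft.category;
  }
  // null signals "nothing to send", so the caller can skip the request.
  return Object.keys(patch).length > 0 ? patch : null;
}
```

A caller could pass the result straight to `patch.mutateAsync` and treat `null` as "close the editor without a request".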


@@ -0,0 +1,328 @@
"use client";
/*
* Upload a Daphna decision into the style corpus, from the /training page.
*
* The flow is three explicit steps inside the same sheet:
* 1. file picker → POST /api/upload (gets sanitized filename)
* 2. preview → POST /api/training/analyze (proofread + auto-extracted meta)
* chair can correct decision_number / decision_date / subjects
* 3. commit → POST /api/training/upload (background task)
* progress watched via SSE; on completion we invalidate
* corpus + style-report so the new row appears.
*
* The Sheet UX mirrors precedent-upload-sheet.tsx: same dir="rtl", same
* loading + error patterns, same toast on success. The reason this isn't
* a single one-click upload is that style-corpus rows are write-once
* (we don't allow editing full_text), so the chair MUST see the proofread
* preview before committing — otherwise a bad OCR/proofread can silently
* pollute the style portrait.
*/
import { useEffect, useState } from "react";
import { Upload, Loader2, CheckCircle2, AlertCircle, FileText } from "lucide-react";
import { toast } from "sonner";
import { useQueryClient } from "@tanstack/react-query";
import {
Sheet, SheetContent, SheetHeader, SheetTitle, SheetDescription,
} from "@/components/ui/sheet";
import { Button } from "@/components/ui/button";
import { Input } from "@/components/ui/input";
import { Label } from "@/components/ui/label";
import { Progress } from "@/components/ui/progress";
import { Badge } from "@/components/ui/badge";
import {
trainingKeys,
useAnalyzeTraining,
useCommitTrainingUpload,
useUploadFile,
type AnalyzeTrainingResponse,
} from "@/lib/api/training";
import { useProgress } from "@/lib/api/documents";
const ACCEPT = ".pdf,.docx,.doc,.rtf,.txt,.md";
type Props = {
open: boolean;
onOpenChange: (open: boolean) => void;
};
type Stage = "pick" | "analyzing" | "preview" | "committing" | "done" | "error";
export function TrainingUploadDialog({ open, onOpenChange }: Props) {
const [stage, setStage] = useState<Stage>("pick");
const [file, setFile] = useState<File | null>(null);
const [analysis, setAnalysis] = useState<AnalyzeTrainingResponse | null>(null);
// editable copies of the auto-extracted metadata
const [decisionNumber, setDecisionNumber] = useState("");
const [decisionDate, setDecisionDate] = useState("");
const [subjectsRaw, setSubjectsRaw] = useState("");
const [title, setTitle] = useState("");
const [taskId, setTaskId] = useState<string | null>(null);
const [errorMsg, setErrorMsg] = useState("");
const uploadFile = useUploadFile();
const analyze = useAnalyzeTraining();
const commit = useCommitTrainingUpload();
const progress = useProgress(taskId);
const qc = useQueryClient();
// Reset everything when the sheet closes — important because Sheet keeps
// the component mounted between opens. The cascade-render warning is the
// intended behavior (reset is the side effect we want).
useEffect(() => {
if (open) return;
/* eslint-disable react-hooks/set-state-in-effect */
setStage("pick"); setFile(null); setAnalysis(null);
setDecisionNumber(""); setDecisionDate(""); setSubjectsRaw("");
setTitle(""); setTaskId(null); setErrorMsg("");
/* eslint-enable react-hooks/set-state-in-effect */
}, [open]);
// Watch background task. When complete, invalidate corpus + report so the
// new row + updated stats show up automatically. The setStage call here
// is the deliberate UX (success card → auto-close) — synchronizing UI
// with the external SSE stream is exactly what effects are for.
useEffect(() => {
if (!progress) return;
if (progress.status === "completed") {
qc.invalidateQueries({ queryKey: trainingKeys.corpus() });
qc.invalidateQueries({ queryKey: trainingKeys.report() });
// eslint-disable-next-line react-hooks/set-state-in-effect
setStage("done");
toast.success(`החלטה ${decisionNumber || analysis?.decision_number || ""} נוספה לקורפוס`);
const t = window.setTimeout(() => onOpenChange(false), 1500);
return () => window.clearTimeout(t);
}
if (progress.status === "failed") {
setStage("error");
setErrorMsg(progress.error || "כשל בעיבוד");
}
}, [progress, analysis, decisionNumber, qc, onOpenChange]);
const onPickFile = async (f: File | null) => {
setFile(f);
setErrorMsg("");
if (!f) return;
setStage("analyzing");
try {
const { filename } = await uploadFile.mutateAsync(f);
const result = await analyze.mutateAsync(filename);
setAnalysis(result);
setDecisionNumber(result.decision_number);
setDecisionDate(result.decision_date);
setSubjectsRaw(result.subject_categories.join(", "));
// Default title from the original filename stem (chair can override).
const stem = f.name.replace(/\.[^.]+$/, "");
setTitle(stem);
setStage("preview");
} catch (e) {
setStage("error");
setErrorMsg(e instanceof Error ? e.message : "כשל בקריאת הקובץ");
}
};
const onCommit = async () => {
if (!analysis) return;
setStage("committing");
setErrorMsg("");
try {
const subjects = subjectsRaw
.split(/[,،]/)
.map((s) => s.trim())
.filter(Boolean);
const res = await commit.mutateAsync({
filename: analysis.filename,
decision_number: decisionNumber.trim(),
decision_date: decisionDate || "",
subject_categories: subjects,
title: title.trim() || undefined,
});
setTaskId(res.task_id);
} catch (e) {
setStage("error");
// 409 = duplicate decision_number — surface the backend's Hebrew message.
setErrorMsg(e instanceof Error ? e.message : "כשל בהעלאה");
}
};
const isProcessing =
stage === "analyzing" || stage === "committing" ||
(taskId !== null && progress?.status !== "completed" && progress?.status !== "failed");
const progressStep = (progress as { step?: string } | null)?.step;
return (
<Sheet open={open} onOpenChange={onOpenChange}>
<SheetContent side="left" className="w-full sm:max-w-2xl overflow-y-auto" dir="rtl">
<SheetHeader>
<SheetTitle className="text-navy">העלאת החלטה לקורפוס הסגנון</SheetTitle>
<SheetDescription className="text-ink-muted">
הקובץ יעבור הגהה (סינון Nevo, ניקוד), חילוץ אוטומטי של מספר תיק, תאריך
ונושאים, ויוטמע ב-style_corpus עם chunks ו-embeddings. תוכל לתקן את
פרטי המטא-דאטה לפני שמירה.
</SheetDescription>
</SheetHeader>
<div className="px-6 pb-6 mt-4 space-y-4">
{/* Stage 1: pick */}
{stage === "pick" && (
<div className="space-y-2">
<Label htmlFor="t-file">קובץ ההחלטה (PDF / DOCX / DOC / RTF / TXT / MD)</Label>
<Input
id="t-file" type="file" accept={ACCEPT}
onChange={(e) => onPickFile(e.target.files?.[0] ?? null)}
/>
<p className="text-[0.78rem] text-ink-muted">
המערכת תחלץ מהקובץ את מספר התיק, התאריך והנושאים. תוכל לערוך
לפני השמירה.
</p>
</div>
)}
{/* Stage 2: analyzing the file */}
{stage === "analyzing" && (
<div className="rounded-lg border border-rule bg-rule-soft/40 p-6 space-y-2 text-center">
<Loader2 className="w-5 h-5 animate-spin mx-auto text-navy" />
<p className="text-sm text-navy">מבצע הגהה וחילוץ מטא-דאטה</p>
<p className="text-[0.78rem] text-ink-muted">
{file?.name}
</p>
</div>
)}
{/* Stage 3: preview + editable metadata */}
{stage === "preview" && analysis && (
<form
className="space-y-4"
onSubmit={(e) => { e.preventDefault(); onCommit(); }}
>
<div className="rounded-lg border border-rule bg-surface px-4 py-3">
<h3 className="text-[0.78rem] uppercase tracking-wider text-gold-deep font-semibold mb-2">
תצוגה מקדימה של הטקסט הנקי
</h3>
<p className="text-sm text-ink leading-relaxed line-clamp-6 whitespace-pre-wrap">
{analysis.preview}
</p>
<div className="mt-2 flex items-center gap-3 text-[0.72rem] text-ink-muted tabular-nums">
<span className="flex items-center gap-1">
<FileText className="w-3 h-3" />
{analysis.chars.toLocaleString("he-IL")} תווים
</span>
</div>
</div>
<div className="grid grid-cols-2 gap-3">
<div className="space-y-1">
<Label htmlFor="t-decision-number">מספר ההחלטה</Label>
<Input
id="t-decision-number"
value={decisionNumber}
onChange={(e) => setDecisionNumber(e.target.value)}
placeholder="1130-25"
dir="rtl"
/>
</div>
<div className="space-y-1">
<Label htmlFor="t-decision-date">תאריך ההחלטה</Label>
<Input
id="t-decision-date" type="date"
value={decisionDate}
onChange={(e) => setDecisionDate(e.target.value)}
/>
</div>
</div>
<div className="space-y-1">
<Label htmlFor="t-title">כותרת קצרה (אופציונלי)</Label>
<Input
id="t-title" value={title}
onChange={(e) => setTitle(e.target.value)}
placeholder="ARAR-25-1130 - כרמל יצחק" dir="rtl"
/>
</div>
<div className="space-y-1">
<Label htmlFor="t-subjects">נושאים (מופרדים בפסיקים)</Label>
<Input
id="t-subjects" value={subjectsRaw}
onChange={(e) => setSubjectsRaw(e.target.value)}
placeholder="חניה, קווי בניין, שימוש חורג" dir="rtl"
/>
{analysis.subject_categories.length > 0 && (
<div className="flex flex-wrap gap-1 mt-1">
<span className="text-[0.72rem] text-ink-muted">חולץ אוטומטית:</span>
{analysis.subject_categories.map((s) => (
<Badge key={s} variant="outline"
className="text-[0.7rem] bg-gold-wash text-gold-deep border-gold/40">
{s}
</Badge>
))}
</div>
)}
</div>
{errorMsg && (
<div className="rounded-lg border border-danger/40 bg-danger-bg p-3 flex items-center gap-2 text-danger text-sm">
<AlertCircle className="w-4 h-4 shrink-0" />
{errorMsg}
</div>
)}
<div className="flex gap-2 justify-end pt-2">
<Button type="button" variant="ghost"
onClick={() => onOpenChange(false)}
disabled={isProcessing}>
ביטול
</Button>
<Button type="submit" disabled={isProcessing || !decisionNumber.trim()}
className="bg-navy text-parchment hover:bg-navy-soft">
<Upload className="w-4 h-4 me-1" />
שמור בקורפוס
</Button>
</div>
</form>
)}
{/* Stage 4: committing — background task progress */}
{(stage === "committing" || (taskId && stage !== "done" && stage !== "error")) && (
<div className="rounded-lg border border-rule bg-rule-soft/40 p-4 space-y-2">
<div className="flex items-center gap-2 text-sm text-navy">
<Loader2 className="w-4 h-4 animate-spin" />
<span>{progressStep || "מעבד את ההחלטה לקורפוס"}</span>
</div>
<Progress value={progressStep ? 60 : 30} className="h-1.5" />
</div>
)}
{/* Stage 5: success */}
{stage === "done" && (
<div className="rounded-lg border border-gold/40 bg-gold-wash p-4 flex items-center gap-2 text-gold-deep text-sm">
<CheckCircle2 className="w-4 h-4" />
ההחלטה נוספה לקורפוס בהצלחה.
</div>
)}
{/* Stage 6: error (after a failed analyze or upload) */}
{stage === "error" && (
<div className="space-y-3">
<div className="rounded-lg border border-danger/40 bg-danger-bg p-4 flex items-center gap-2 text-danger text-sm">
<AlertCircle className="w-4 h-4 shrink-0" />
{errorMsg || "שגיאה לא ידועה"}
</div>
<div className="flex gap-2 justify-end">
<Button type="button" variant="ghost"
onClick={() => onOpenChange(false)}>
סגור
</Button>
<Button type="button"
onClick={() => { setStage("pick"); setErrorMsg(""); setFile(null); setAnalysis(null); setTaskId(null); }}>
נסה קובץ אחר
</Button>
</div>
</div>
)}
</div>
</SheetContent>
</Sheet>
);
}
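
The `Stage` union above is advanced by `setStage` calls spread across the handlers and effects. One way to make the allowed transitions explicit is a pure transition function. This is a sketch only: `StageEvent` and `nextStage` are hypothetical names that do not appear in the commit, and the event list is an assumption drawn from the handlers shown above.

```typescript
// Illustrative only: the component in this commit calls setStage directly.
type Stage = "pick" | "analyzing" | "preview" | "committing" | "done" | "error";

// Hypothetical event names, one per transition visible in the component.
type StageEvent =
  | { type: "file_picked" }
  | { type: "analyze_ok" }
  | { type: "commit_clicked" }
  | { type: "task_completed" }
  | { type: "failed" }
  | { type: "retry" };

function nextStage(stage: Stage, event: StageEvent): Stage {
  switch (event.type) {
    case "file_picked":
      return stage === "pick" ? "analyzing" : stage;
    case "analyze_ok":
      return stage === "analyzing" ? "preview" : stage;
    case "commit_clicked":
      return stage === "preview" ? "committing" : stage;
    case "task_completed":
      return stage === "committing" ? "done" : stage;
    case "failed":
      // Analyze, commit, and background-task failures all land on "error".
      return stage === "analyzing" || stage === "committing" ? "error" : stage;
    case "retry":
      return stage === "error" ? "pick" : stage;
  }
}
```

Events that do not apply to the current stage leave it unchanged, so a late SSE message cannot move a closed or reset sheet into an unexpected state.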


@@ -7,10 +7,13 @@
* - GET /corpus → flat list of decisions for the corpus tab / compare tool
* - GET /compare?a=UUID&b=UUID → side-by-side comparison
* - DELETE /corpus/{id} → remove a decision from the corpus
* - POST /api/upload → multipart file → returns sanitized filename
* - POST /analyze → proofread + extract metadata for preview
* - POST /upload → commit a proofread decision to the corpus (task_id)
*/
import { useMutation, useQuery, useQueryClient } from "@tanstack/react-query";
import { ApiError, apiRequest } from "./client";
export type StyleReport = {
corpus: {
@@ -69,6 +72,29 @@ export type CorpusDecision = {
subject_categories: string[];
chars: number;
created_at: string;
// Enriched metadata (added in the corpus-page upgrade).
summary: string;
outcome: string;
key_principles: string[];
appeal_subtype: string;
practice_area: string;
page_count: number;
document_id: string | null;
doc_title: string;
parties: { appellant: string; respondent: string };
legal_citation: string;
lessons_count: number;
};
export type CorpusDecisionPatch = {
decision_number?: string;
decision_date?: string;
subject_categories?: string[];
summary?: string;
outcome?: string;
key_principles?: string[];
appeal_subtype?: string;
practice_area?: string;
};
export type CompareResult = {
@@ -149,3 +175,407 @@ export function useDeleteCorpusEntry() {
},
});
}
// ── Style-agent chat ─────────────────────────────────────────────
export type ChatConversation = {
id: string;
title: string;
style_corpus_id: string | null;
decision_number: string;
claude_session_id: string | null;
message_count: number;
created_at: string;
last_message_at: string;
};
export type ChatMessage = {
id: string;
role: "user" | "assistant";
content: string;
created_at: string;
};
export type ChatHealth = {
reachable: boolean;
status?: number;
url: string;
error?: string;
};
export const chatKeys = {
conversations: () => [...trainingKeys.all, "chat", "conversations"] as const,
conversation: (id: string) =>
[...trainingKeys.all, "chat", "conversations", id] as const,
health: () => [...trainingKeys.all, "chat", "health"] as const,
};
export function useChatConversations() {
return useQuery({
queryKey: chatKeys.conversations(),
queryFn: ({ signal }) =>
apiRequest<ChatConversation[]>("/api/training/chat/conversations", { signal }),
staleTime: 15_000,
});
}
export function useChatConversation(convId: string | null) {
return useQuery({
queryKey: chatKeys.conversation(convId ?? ""),
queryFn: ({ signal }) =>
apiRequest<{ conversation: ChatConversation; messages: ChatMessage[] }>(
`/api/training/chat/conversations/${encodeURIComponent(convId!)}`,
{ signal },
),
enabled: Boolean(convId),
staleTime: 5_000,
});
}
export function useChatHealth() {
return useQuery({
queryKey: chatKeys.health(),
queryFn: ({ signal }) =>
apiRequest<ChatHealth>("/api/training/chat/health", { signal }),
staleTime: 30_000,
retry: false,
});
}
export function useCreateChat() {
const qc = useQueryClient();
return useMutation({
mutationFn: (body: { title?: string; style_corpus_id?: string | null }) =>
apiRequest<ChatConversation>("/api/training/chat/conversations", {
method: "POST",
body,
}),
onSuccess: () => {
qc.invalidateQueries({ queryKey: chatKeys.conversations() });
},
});
}
export function useDeleteChat() {
const qc = useQueryClient();
return useMutation({
mutationFn: (id: string) =>
apiRequest<{ deleted: boolean }>(
`/api/training/chat/conversations/${encodeURIComponent(id)}`,
{ method: "DELETE" },
),
onSuccess: () => {
qc.invalidateQueries({ queryKey: chatKeys.conversations() });
},
});
}
// ── Curator portrait ──────────────────────────────────────────────
export type CuratorPrompt = {
content: string;
filename: string;
bytes: number;
last_modified: number;
gitea_url: string;
};
export type StyleAnalyzerPrompts = {
analysis_prompt: string;
single_decision_prompt: string;
synthesis_prompt: string;
max_input_tokens: number;
};
export type CuratorFinding = {
id: string;
lesson_text: string;
category: string;
applied_to_skill: boolean;
decision_number: string;
decision_date: string;
created_at: string;
};
export type CuratorStats = {
total_findings: number;
decisions_with_findings: number;
decisions_total: number;
findings_applied: number;
recent_findings: CuratorFinding[];
};
export type CuratorProposalInput = {
title: string;
proposed_change: string;
rationale: string;
};
export type CuratorProposalFile = {
filename: string;
bytes: number;
modified_at: number;
};
export const curatorKeys = {
prompt: () => [...trainingKeys.all, "curator", "prompt"] as const,
analyzerPrompt: () => [...trainingKeys.all, "curator", "analyzer-prompt"] as const,
stats: () => [...trainingKeys.all, "curator", "stats"] as const,
proposals: () => [...trainingKeys.all, "curator", "proposals"] as const,
};
export function useCuratorPrompt() {
return useQuery({
queryKey: curatorKeys.prompt(),
queryFn: ({ signal }) =>
apiRequest<CuratorPrompt>("/api/training/curator/prompt", { signal }),
staleTime: 5 * 60_000,
});
}
export function useStyleAnalyzerPrompts() {
return useQuery({
queryKey: curatorKeys.analyzerPrompt(),
queryFn: ({ signal }) =>
apiRequest<StyleAnalyzerPrompts>(
"/api/training/curator/style-analyzer-prompt",
{ signal },
),
staleTime: 5 * 60_000,
});
}
export function useCuratorStats() {
return useQuery({
queryKey: curatorKeys.stats(),
queryFn: ({ signal }) =>
apiRequest<CuratorStats>("/api/training/curator/stats", { signal }),
staleTime: 60_000,
});
}
export function useCuratorProposals() {
return useQuery({
queryKey: curatorKeys.proposals(),
queryFn: ({ signal }) =>
apiRequest<CuratorProposalFile[]>("/api/training/curator/proposals", { signal }),
staleTime: 30_000,
});
}
export function useSubmitCuratorProposal() {
const qc = useQueryClient();
return useMutation({
mutationFn: (body: CuratorProposalInput) =>
apiRequest<{ saved: boolean; filename: string }>(
"/api/training/curator/proposals",
{ method: "POST", body },
),
onSuccess: () => {
qc.invalidateQueries({ queryKey: curatorKeys.proposals() });
},
});
}
// ── Upload flow ──────────────────────────────────────────────────
// Three-step pipeline:
// 1. useUploadFile → POST /api/upload (multipart) → { filename }
// 2. useAnalyzeTraining → POST /api/training/analyze (form) → preview + extracted metadata
// 3. useCommitTrainingUpload → POST /api/training/upload (json) → { task_id }
// Track task_id via useProgress() from documents.ts.
export type UploadFileResponse = {
filename: string; // sanitized, time-prefixed name in UPLOAD_DIR
original_name: string;
size: number;
};
export type AnalyzeTrainingResponse = {
filename: string;
clean_text: string;
preview: string;
decision_number: string;
decision_date: string; // ISO YYYY-MM-DD or ""
subject_categories: string[];
stats: Record<string, unknown>;
chars: number;
};
export type CommitTrainingRequest = {
filename: string;
decision_number: string;
decision_date: string; // YYYY-MM-DD or ""
subject_categories: string[];
title?: string;
};
export type CommitTrainingResponse = { task_id: string };
export function useUploadFile() {
return useMutation({
mutationFn: async (file: File): Promise<UploadFileResponse> => {
const fd = new FormData();
fd.append("file", file);
const res = await fetch("/api/upload", { method: "POST", body: fd });
const contentType = res.headers.get("content-type") ?? "";
const parsed = contentType.includes("application/json")
? await res.json().catch(() => null)
: await res.text().catch(() => null);
if (!res.ok) {
throw new ApiError(
typeof parsed === "object" && parsed && "detail" in parsed
? String((parsed as { detail: unknown }).detail)
: `Upload failed with ${res.status}`,
res.status,
parsed,
);
}
return parsed as UploadFileResponse;
},
});
}
export function useAnalyzeTraining() {
return useMutation({
mutationFn: async (filename: string): Promise<AnalyzeTrainingResponse> => {
const fd = new FormData();
fd.append("filename", filename);
const res = await fetch("/api/training/analyze", {
method: "POST",
body: fd,
});
const contentType = res.headers.get("content-type") ?? "";
const parsed = contentType.includes("application/json")
? await res.json().catch(() => null)
: await res.text().catch(() => null);
if (!res.ok) {
throw new ApiError(
typeof parsed === "object" && parsed && "detail" in parsed
? String((parsed as { detail: unknown }).detail)
: `Analyze failed with ${res.status}`,
res.status,
parsed,
);
}
return parsed as AnalyzeTrainingResponse;
},
});
}
// ── Per-decision lessons ─────────────────────────────────────────
export type DecisionLesson = {
id: string;
style_corpus_id: string;
lesson_text: string;
category: "style" | "structure" | "lexicon" | "tabular" | "general";
source: "manual" | "curator" | "chair" | "style_analyzer";
applied_to_skill: boolean;
created_by: string;
created_at: string;
updated_at: string;
};
export type LessonCreate = {
lesson_text: string;
category?: DecisionLesson["category"];
source?: DecisionLesson["source"];
};
export type LessonPatch = {
lesson_text?: string;
category?: DecisionLesson["category"];
applied_to_skill?: boolean;
};
export const lessonsKeys = {
forCorpus: (corpusId: string) =>
[...trainingKeys.all, "lessons", corpusId] as const,
};
export function useCorpusLessons(corpusId: string | null) {
return useQuery({
queryKey: lessonsKeys.forCorpus(corpusId ?? ""),
queryFn: ({ signal }) =>
apiRequest<DecisionLesson[]>(
`/api/training/corpus/${encodeURIComponent(corpusId!)}/lessons`,
{ signal },
),
enabled: Boolean(corpusId),
staleTime: 30_000,
});
}
export function useAddLesson(corpusId: string) {
const qc = useQueryClient();
return useMutation({
mutationFn: (body: LessonCreate) =>
apiRequest<DecisionLesson>(
`/api/training/corpus/${encodeURIComponent(corpusId)}/lessons`,
{ method: "POST", body },
),
onSuccess: () => {
qc.invalidateQueries({ queryKey: lessonsKeys.forCorpus(corpusId) });
// lessons_count on the corpus row is computed server-side, so
// invalidate the list too — otherwise the badge stays stale.
qc.invalidateQueries({ queryKey: trainingKeys.corpus() });
},
});
}
export function usePatchLesson(corpusId: string) {
const qc = useQueryClient();
return useMutation({
mutationFn: ({ id, patch }: { id: string; patch: LessonPatch }) =>
apiRequest<{ updated: boolean }>(
`/api/training/lessons/${encodeURIComponent(id)}`,
{ method: "PATCH", body: patch },
),
onSuccess: () => {
qc.invalidateQueries({ queryKey: lessonsKeys.forCorpus(corpusId) });
},
});
}
export function useDeleteLesson(corpusId: string) {
const qc = useQueryClient();
return useMutation({
mutationFn: (id: string) =>
apiRequest<{ deleted: boolean }>(
`/api/training/lessons/${encodeURIComponent(id)}`,
{ method: "DELETE" },
),
onSuccess: () => {
qc.invalidateQueries({ queryKey: lessonsKeys.forCorpus(corpusId) });
qc.invalidateQueries({ queryKey: trainingKeys.corpus() });
},
});
}
export function usePatchCorpus() {
const qc = useQueryClient();
return useMutation({
mutationFn: ({ id, patch }: { id: string; patch: CorpusDecisionPatch }) =>
apiRequest<{ updated: boolean; id: string }>(
`/api/training/corpus/${encodeURIComponent(id)}`,
{ method: "PATCH", body: patch },
),
onSuccess: () => {
qc.invalidateQueries({ queryKey: trainingKeys.corpus() });
qc.invalidateQueries({ queryKey: trainingKeys.report() });
},
});
}
export function useCommitTrainingUpload() {
// No onSuccess invalidation here — the row only appears after the
// background task finishes. The dialog watches useProgress(task_id)
// and invalidates trainingKeys when status === "completed".
return useMutation({
mutationFn: (body: CommitTrainingRequest) =>
apiRequest<CommitTrainingResponse>("/api/training/upload", {
method: "POST",
body,
}),
});
}
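
`useUploadFile` and `useAnalyzeTraining` above repeat the same logic for turning a failed response body into an error message. A sketch of how that could be factored out follows. It assumes a hypothetical `errorDetail` helper that is not part of this commit; the `detail` field is the one both hooks already read.

```typescript
// Hypothetical helper: pull a human-readable message out of a parsed
// error body, falling back when the body carries no `detail` field
// (for example a plain-text response or a failed JSON parse).
function errorDetail(parsed: unknown, fallback: string): string {
  if (typeof parsed === "object" && parsed !== null && "detail" in parsed) {
    return String((parsed as { detail: unknown }).detail);
  }
  return fallback;
}
```

Each hook would then throw `new ApiError(errorDetail(parsed, fallbackMessage), res.status, parsed)`, keeping the constructor arguments in the same order the hooks use today.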


@@ -12,6 +12,7 @@ import subprocess
import sys
import time
from contextlib import asynccontextmanager
from datetime import date as date_type
from pathlib import Path
from uuid import UUID, uuid4
@@ -945,32 +946,648 @@ async def training_corpus_delete(corpus_id: str):
    return result


def _format_legal_citation(decision_number: str, decision_date: str) -> str:
    """Compose the Israeli ועדת ערר (appeals committee) citation string from corpus metadata.

    Mirrors how decisions are referenced in Daphna's own writing, e.g.
    "ערר 1130-25 ועדת ערר ירושלים (26.4.2026)". Empty parts are dropped
    gracefully so partially populated rows still produce a readable label.
    """
    if not decision_number:
        return ""
    parts = [f"ערר {decision_number}", "ועדת ערר ירושלים"]
    if decision_date:
        try:
            d = date_type.fromisoformat(decision_date)
            parts.append(f"({d.day}.{d.month}.{d.year})")
        except ValueError:
            # Unparseable date: keep the citation without the date part.
            pass
    return " ".join(parts)


_PARTIES_PATTERNS = (
    # Appellant: "העורר: X" or "העוררים: X". Captures up to a newline / end of stanza.
    re.compile(r"העורר(?:ים|ת)?[:\s]+([^\n]{3,120})"),
    re.compile(r"המבקש(?:ים|ת)?[:\s]+([^\n]{3,120})"),
    re.compile(r"בעניין[:\s]+([^\n]{3,120})"),
)
_RESPONDENT_PATTERNS = (
    # Respondent: "המשיב: X", or the line following "נגד" ("versus").
    re.compile(r"המשיב(?:ים|ה|ות)?[:\s]+([^\n]{3,120})"),
    re.compile(r"נגד\s*\n+\s*([^\n]{3,120})"),
)


def _extract_parties(text: str) -> dict[str, str]:
    """Best-effort regex extraction of עורר/משיב (appellant/respondent) from the first 5K of full_text.

    We only scan the head of the document because the parties are always
    declared at the top in Israeli legal decisions. The result is a hint
    for display, never authoritative, so a miss returns an empty string
    rather than raising.
    """
    head = (text or "")[:5000]
    appellant = respondent = ""
    for pat in _PARTIES_PATTERNS:
        m = pat.search(head)
        if m:
            appellant = m.group(1).strip(" .,-—")
            break
    for pat in _RESPONDENT_PATTERNS:
        m = pat.search(head)
        if m:
            respondent = m.group(1).strip(" .,-—")
            break
    return {"appellant": appellant, "respondent": respondent}
@app.get("/api/training/corpus")
async def training_corpus_list():
    """List all decisions currently in the style corpus, with enriched metadata.

    Joins to ``documents`` via FK when available, falling back to the
    title-token match used in the chunking pipeline so legacy rows with
    ``style_corpus.document_id IS NULL`` still resolve to their page_count
    and chunk counts.
    """
    pool = await db.get_pool()
    async with pool.acquire() as conn:
        rows = await conn.fetch(
            """
            SELECT sc.id,
                   sc.decision_number,
                   sc.decision_date,
                   sc.subject_categories,
                   length(sc.full_text) AS chars,
                   substring(sc.full_text from 1 for 5000) AS head_text,
                   sc.summary,
                   sc.outcome,
                   sc.key_principles,
                   sc.appeal_subtype,
                   sc.practice_area,
                   sc.document_id,
                   sc.created_at,
                   d.page_count AS page_count,
                   d.title AS doc_title
            FROM style_corpus sc
            LEFT JOIN documents d ON d.id = sc.document_id
            ORDER BY sc.created_at DESC
            """
        )
    lessons_counts = await db.count_decision_lessons_per_corpus()
    out = []
    for r in rows:
        cats = r["subject_categories"]
        if isinstance(cats, str):
            try:
                cats = json.loads(cats)
            except json.JSONDecodeError:
                cats = []
        kp = r["key_principles"]
        if isinstance(kp, str):
            try:
                kp = json.loads(kp)
            except json.JSONDecodeError:
                kp = []
        decision_date = str(r["decision_date"]) if r["decision_date"] else ""
        parties = _extract_parties(r["head_text"] or "")
        out.append({
            "id": str(r["id"]),
            "decision_number": r["decision_number"] or "",
            "decision_date": decision_date,
            "subject_categories": cats or [],
            "chars": r["chars"],
            "created_at": r["created_at"].isoformat() if r["created_at"] else "",
            # ── enriched fields ──
            "summary": r["summary"] or "",
            "outcome": r["outcome"] or "",
            "key_principles": kp or [],
            "appeal_subtype": r["appeal_subtype"] or "",
            "practice_area": r["practice_area"] or "",
            "page_count": r["page_count"] or 0,
            "document_id": str(r["document_id"]) if r["document_id"] else None,
            "doc_title": r["doc_title"] or "",
            "parties": parties,
            "legal_citation": _format_legal_citation(r["decision_number"] or "", decision_date),
            "lessons_count": lessons_counts.get(str(r["id"]), 0),
        })
    return out
# ── Style-agent chat (delegated to legal-chat-service on host) ─────
class ChatConversationCreate(BaseModel):
title: str = "שיחה חדשה"
style_corpus_id: str | None = None # optional — scope chat to a decision
class ChatMessageRequest(BaseModel):
content: str
def _conv_to_json(row: dict) -> dict:
"""Serialize a chat_conversations row for the API."""
return {
"id": str(row["id"]),
"title": row.get("title") or "",
"style_corpus_id": str(row["style_corpus_id"]) if row.get("style_corpus_id") else None,
"decision_number": row.get("decision_number") or "",
"claude_session_id": row.get("claude_session_id"),
"message_count": row.get("message_count", 0),
"created_at": row["created_at"].isoformat() if row.get("created_at") else "",
"last_message_at": row["last_message_at"].isoformat() if row.get("last_message_at") else "",
}
def _msg_to_json(row: dict) -> dict:
return {
"id": str(row["id"]),
"role": row["role"],
"content": row["content"],
"created_at": row["created_at"].isoformat() if row.get("created_at") else "",
}
@app.post("/api/training/chat/conversations")
async def chat_create_conversation(body: ChatConversationCreate):
"""Create a new style-agent chat conversation."""
corpus_uuid: UUID | None = None
if body.style_corpus_id:
try:
corpus_uuid = UUID(body.style_corpus_id)
except ValueError:
raise HTTPException(400, "invalid style_corpus_id")
row = await db.create_chat_conversation(
title=body.title.strip() or "שיחה חדשה",
style_corpus_id=corpus_uuid,
)
if not row:
raise HTTPException(500, "failed to create conversation")
return _conv_to_json(row)
@app.get("/api/training/chat/conversations")
async def chat_list_conversations(limit: int = 50):
rows = await db.list_chat_conversations(limit=limit)
return [_conv_to_json(r) for r in rows]
@app.get("/api/training/chat/conversations/{conv_id}")
async def chat_get_conversation(conv_id: str):
try:
cid = UUID(conv_id)
except ValueError:
raise HTTPException(400, "invalid conv_id")
conv = await db.get_chat_conversation(cid)
if not conv:
raise HTTPException(404, "conversation not found")
messages = await db.list_chat_messages(cid)
return {
"conversation": _conv_to_json(conv),
"messages": [_msg_to_json(m) for m in messages],
}
@app.delete("/api/training/chat/conversations/{conv_id}")
async def chat_delete_conversation(conv_id: str):
try:
cid = UUID(conv_id)
except ValueError:
raise HTTPException(400, "invalid conv_id")
result = await db.delete_chat_conversation(cid)
if not result.get("deleted"):
raise HTTPException(404, "conversation not found")
return result
@app.post("/api/training/chat/conversations/{conv_id}/messages")
async def chat_send_message(conv_id: str, body: ChatMessageRequest):
"""Send a user message; stream the assistant response as SSE.
Proxies through ``web.chat_proxy.stream_chat_message`` to the
legal-chat-service running on the host.
"""
try:
cid = UUID(conv_id)
except ValueError:
raise HTTPException(400, "invalid conv_id")
text = (body.content or "").strip()
if not text:
raise HTTPException(400, "content is required")
from web import chat_proxy
return await chat_proxy.stream_chat_message(cid, text)
@app.get("/api/training/chat/health")
async def chat_health():
"""Probe legal-chat-service liveness from inside the container.
Useful when the UI wants to gracefully degrade ("שירות הצ'אט אינו
זמין") instead of letting messages fail mid-stream.
"""
from web import chat_proxy
try:
async with httpx.AsyncClient(timeout=httpx.Timeout(5.0)) as client:
r = await client.get(f"{chat_proxy.CHAT_SERVICE_URL}/health")
return {"reachable": r.status_code == 200, "status": r.status_code,
"url": chat_proxy.CHAT_SERVICE_URL}
except Exception as e:
return {"reachable": False, "error": str(e),
"url": chat_proxy.CHAT_SERVICE_URL}
# ── Curator portrait — read prompt + stats + accept proposals ──────
# The curator agent's prompt is symlinked into Paperclip, but the source
# lives in the legal-ai repo. Resolve via env so the container (where the
# agent file is mounted from a different path) and the host both work.
_AGENTS_DIR = Path(os.environ.get(
"AGENTS_DIR",
str(Path(__file__).resolve().parent.parent / ".claude" / "agents"),
))
_CURATOR_PROPOSALS_DIR = Path(os.environ.get(
"CURATOR_PROPOSALS_DIR",
str(Path(__file__).resolve().parent.parent / "data" / "curator-proposals"),
))
_GITEA_REPO_BASE = os.environ.get(
"GITEA_REPO_BASE",
"https://gitea.nautilus.marcusgroup.org/ezer-mishpati/legal-ai",
)
@app.get("/api/training/curator/prompt")
async def get_curator_prompt():
"""Return the hermes-curator agent's prompt (read-only) + Gitea source URL.
The file is the canonical source of how the curator analyzes Daphna's
final decisions. Changes go through git/Gitea, not the UI — the UI just
surfaces it for transparency.
"""
path = _AGENTS_DIR / "hermes-curator.md"
if not path.exists():
raise HTTPException(404, f"curator prompt not found at {path}")
try:
content = path.read_text(encoding="utf-8")
stat = path.stat()
except OSError as e:
raise HTTPException(500, f"failed to read curator prompt: {e}")
gitea_url = (
f"{_GITEA_REPO_BASE}/src/branch/main/.claude/agents/hermes-curator.md"
)
return {
"content": content,
"filename": path.name,
"bytes": stat.st_size,
"last_modified": stat.st_mtime,
"gitea_url": gitea_url,
}
@app.get("/api/training/curator/style-analyzer-prompt")
async def get_style_analyzer_prompt():
"""Return the system prompt that style_analyzer.py uses to extract patterns.
Surfaces the *training-time* prompt (Claude Opus 1M context) so the
chair can compare it against the curator's post-export prompt. Both
are shown side-by-side in the curator-portrait tab.
"""
# Imported lazily inside the handler so that merely loading this module
# does not pull in claude_session + db. The prompts are the ones defined
# in mcp-server/src/legal_mcp/services/style_analyzer.py.
try:
from legal_mcp.services import style_analyzer
return {
"analysis_prompt": style_analyzer.ANALYSIS_PROMPT,
"single_decision_prompt": style_analyzer.SINGLE_DECISION_PROMPT,
"synthesis_prompt": style_analyzer.SYNTHESIS_PROMPT,
"max_input_tokens": style_analyzer.MAX_INPUT_TOKENS,
}
except Exception as e:
raise HTTPException(500, f"failed to load style_analyzer prompt: {e}")
@app.get("/api/training/curator/stats")
async def get_curator_stats():
"""Cheap aggregate stats over decision_lessons + style_corpus.
Used by the Curator-Portrait tab to show "10 curator findings across 24
decisions". We deliberately keep this server-side and aggregate so the
UI can render a single card without fanning out N queries.
"""
pool = await db.get_pool()
async with pool.acquire() as conn:
total_lessons = await conn.fetchval(
"SELECT count(*) FROM decision_lessons WHERE source = 'curator'"
)
decisions_with_findings = await conn.fetchval(
"SELECT count(DISTINCT style_corpus_id) FROM decision_lessons "
"WHERE source = 'curator'"
)
total_corpus = await conn.fetchval("SELECT count(*) FROM style_corpus")
applied = await conn.fetchval(
"SELECT count(*) FROM decision_lessons "
"WHERE source = 'curator' AND applied_to_skill = true"
)
# Last 10 curator findings — newest first
recent_rows = await conn.fetch(
"""
SELECT dl.id, dl.lesson_text, dl.category, dl.applied_to_skill,
dl.created_at,
sc.decision_number, sc.decision_date
FROM decision_lessons dl
JOIN style_corpus sc ON sc.id = dl.style_corpus_id
WHERE dl.source = 'curator'
ORDER BY dl.created_at DESC
LIMIT 10
"""
)
return {
"total_findings": total_lessons or 0,
"decisions_with_findings": decisions_with_findings or 0,
"decisions_total": total_corpus or 0,
"findings_applied": applied or 0,
"recent_findings": [
{
"id": str(r["id"]),
"lesson_text": r["lesson_text"],
"category": r["category"],
"applied_to_skill": bool(r["applied_to_skill"]),
"decision_number": r["decision_number"] or "",
"decision_date": str(r["decision_date"]) if r["decision_date"] else "",
"created_at": r["created_at"].isoformat() if r["created_at"] else "",
}
for r in recent_rows
],
}
class CuratorProposal(BaseModel):
title: str
proposed_change: str # markdown — what to change in the prompt
rationale: str # markdown — why
@app.post("/api/training/curator/proposals")
async def create_curator_proposal(body: CuratorProposal):
"""Save a proposed change to the curator prompt as a file on disk.
No automatic commit, no overwrite — the chair (chaim) reviews the
file manually and applies it through git. This is intentional: the
prompt is too load-bearing to mutate from a web UI.
"""
title = (body.title or "").strip()
if not title:
raise HTTPException(400, "title is required")
if not body.proposed_change.strip():
raise HTTPException(400, "proposed_change is required")
_CURATOR_PROPOSALS_DIR.mkdir(parents=True, exist_ok=True)
# Slug-ish filename — strip anything that isn't a Hebrew letter, ASCII
# letter, digit, hyphen, or underscore. Hebrew letters are explicitly
# allowed because most proposals will be in Hebrew.
slug = re.sub(r"[^\w֐-׿\-]+", "-", title)[:60].strip("-_") or "proposal"
today = date_type.today().isoformat()
fname = f"{today}-{slug}.md"
path = _CURATOR_PROPOSALS_DIR / fname
# If a proposal with the same slug already exists today, append a
# numeric suffix so we don't silently overwrite.
idx = 2
while path.exists():
path = _CURATOR_PROPOSALS_DIR / f"{today}-{slug}-{idx}.md"
idx += 1
md = (
f"# הצעת שינוי לפרומפט hermes-curator\n\n"
f"- **תאריך:** {today}\n"
f"- **כותרת:** {title}\n\n"
f"## שינוי מוצע\n\n{body.proposed_change.strip()}\n\n"
f"## נימוק\n\n{body.rationale.strip() or '(לא ניתן)'}\n"
)
try:
path.write_text(md, encoding="utf-8")
except OSError as e:
raise HTTPException(500, f"failed to write proposal: {e}")
return {
"saved": True,
"filename": path.name,
"path": str(path),
"bytes": len(md.encode("utf-8")),
}
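The filename logic in the handler above (slugify, then add a numeric suffix on collision) can be sketched as a standalone function. `proposal_path` is a hypothetical helper name, and the regex spells the Hebrew block as `\u0590-\u05FF`, which is assumed to be the same range as the literal characters in the original pattern.

```python
import re
import tempfile
from pathlib import Path


def proposal_path(directory: Path, title: str, today: str) -> Path:
    """Return a non-colliding path for a proposal file (hypothetical helper)."""
    # Keep word characters, Hebrew letters, and hyphens; collapse the rest.
    slug = re.sub(r"[^\w\u0590-\u05FF\-]+", "-", title)[:60].strip("-_") or "proposal"
    path = directory / f"{today}-{slug}.md"
    idx = 2
    while path.exists():  # never silently overwrite an existing proposal
        path = directory / f"{today}-{slug}-{idx}.md"
        idx += 1
    return path


with tempfile.TemporaryDirectory() as tmp:
    d = Path(tmp)
    first = proposal_path(d, "Fix: tone/voice", "2026-04-26")
    first.write_text("draft", encoding="utf-8")
    second = proposal_path(d, "Fix: tone/voice", "2026-04-26")
    traversal = proposal_path(d, "../../etc/passwd", "2026-04-26")
    fallback = proposal_path(d, "!!!", "2026-04-26")
    names = (first.name, second.name, traversal.name, fallback.name)
print(names)
```

Since dots and slashes are not word characters, a title such as `../../etc/passwd` collapses to a plain slug and cannot escape the proposals directory.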
@app.get("/api/training/curator/proposals")
async def list_curator_proposals():
"""List proposed-change files in data/curator-proposals/, newest first."""
if not _CURATOR_PROPOSALS_DIR.exists():
return []
items = []
for p in sorted(_CURATOR_PROPOSALS_DIR.iterdir(),
key=lambda f: f.stat().st_mtime, reverse=True):
if not p.is_file() or p.suffix.lower() != ".md":
continue
stat = p.stat()
items.append({
"filename": p.name,
"bytes": stat.st_size,
"modified_at": stat.st_mtime,
})
return items
# ── Per-decision lessons (decision_lessons table) ──────────────────
class LessonCreate(BaseModel):
lesson_text: str
category: str = "general"
source: str = "manual"
class LessonPatch(BaseModel):
lesson_text: str | None = None
category: str | None = None
applied_to_skill: bool | None = None
_LESSON_CATEGORIES = {"style", "structure", "lexicon", "tabular", "general"}
_LESSON_SOURCES = {"manual", "curator", "chair", "style_analyzer"}
def _lesson_to_json(row: dict) -> dict:
return {
"id": str(row["id"]),
"style_corpus_id": str(row["style_corpus_id"]),
"lesson_text": row["lesson_text"],
"category": row["category"],
"source": row["source"],
"applied_to_skill": bool(row["applied_to_skill"]),
"created_by": row.get("created_by", ""),
"created_at": row["created_at"].isoformat() if row.get("created_at") else "",
"updated_at": row["updated_at"].isoformat() if row.get("updated_at") else "",
}
@app.get("/api/training/corpus/{corpus_id}/lessons")
async def list_corpus_lessons(corpus_id: str):
try:
cid = UUID(corpus_id)
except ValueError:
raise HTTPException(400, "invalid corpus_id")
rows = await db.list_decision_lessons(cid)
return [_lesson_to_json(r) for r in rows]
@app.post("/api/training/corpus/{corpus_id}/lessons")
async def add_corpus_lesson(corpus_id: str, body: LessonCreate):
try:
cid = UUID(corpus_id)
except ValueError:
raise HTTPException(400, "invalid corpus_id")
text = (body.lesson_text or "").strip()
if not text:
raise HTTPException(400, "lesson_text is required")
if body.category not in _LESSON_CATEGORIES:
raise HTTPException(400, f"invalid category; allowed: {sorted(_LESSON_CATEGORIES)}")
if body.source not in _LESSON_SOURCES:
raise HTTPException(400, f"invalid source; allowed: {sorted(_LESSON_SOURCES)}")
row = await db.add_decision_lesson(
cid, lesson_text=text, category=body.category, source=body.source,
)
if not row:
raise HTTPException(500, "failed to insert lesson")
return _lesson_to_json(row)
@app.patch("/api/training/lessons/{lesson_id}")
async def patch_corpus_lesson(lesson_id: str, body: LessonPatch):
try:
lid = UUID(lesson_id)
except ValueError:
raise HTTPException(400, "invalid lesson_id")
if body.category is not None and body.category not in _LESSON_CATEGORIES:
raise HTTPException(400, f"invalid category; allowed: {sorted(_LESSON_CATEGORIES)}")
result = await db.update_decision_lesson(
lid,
lesson_text=body.lesson_text,
category=body.category,
applied_to_skill=body.applied_to_skill,
)
if not result.get("updated"):
if result.get("reason") == "not found":
raise HTTPException(404, "lesson not found")
return result # "nothing to update" — 200 with reason
return result
@app.delete("/api/training/lessons/{lesson_id}")
async def delete_corpus_lesson(lesson_id: str):
try:
lid = UUID(lesson_id)
except ValueError:
raise HTTPException(400, "invalid lesson_id")
result = await db.delete_decision_lesson(lid)
if not result.get("deleted"):
raise HTTPException(404, "lesson not found")
return result
@app.get("/api/training/corpus/{corpus_id}/full-text")
async def training_corpus_full_text(corpus_id: str):
"""Return the proofread full_text for a single corpus row.
Kept out of the list endpoint because full_text is large (50K-650K chars
per decision) and the table view only needs counts. The drawer fetches
it on demand when the chair opens the "content" tab.
"""
try:
cid = UUID(corpus_id)
except ValueError:
raise HTTPException(400, "invalid corpus_id")
pool = await db.get_pool()
async with pool.acquire() as conn:
row = await conn.fetchrow(
"SELECT decision_number, full_text FROM style_corpus WHERE id = $1",
cid,
)
if not row:
raise HTTPException(404, "corpus row not found")
return {
"id": corpus_id,
"decision_number": row["decision_number"] or "",
"full_text": row["full_text"] or "",
}
class TrainingCorpusPatch(BaseModel):
"""Editable metadata fields on a style_corpus row.
full_text is intentionally NOT editable — the corpus is write-once.
For corrections, re-upload the decision via /api/training/upload.
"""
decision_number: str | None = None
decision_date: str | None = None # ISO YYYY-MM-DD, or "" to clear
subject_categories: list[str] | None = None
summary: str | None = None
outcome: str | None = None
key_principles: list[str] | None = None
appeal_subtype: str | None = None
practice_area: str | None = None
@app.patch("/api/training/corpus/{corpus_id}")
async def training_corpus_patch(corpus_id: str, patch: TrainingCorpusPatch):
"""Update metadata fields on a corpus row. Only provided fields are touched."""
try:
cid = UUID(corpus_id)
except ValueError:
raise HTTPException(400, "invalid corpus_id")
fields = patch.model_dump(exclude_none=True)
if not fields:
return {"updated": False, "reason": "no fields to update"}
# Coerce decision_date "" → SQL NULL, otherwise parse as DATE.
if "decision_date" in fields:
v = fields["decision_date"]
if v == "":
fields["decision_date"] = None
else:
try:
fields["decision_date"] = date_type.fromisoformat(v)
except ValueError as e:
raise HTTPException(400, f"invalid decision_date: {e}")
# subject_categories + key_principles are JSONB columns.
if "subject_categories" in fields:
fields["subject_categories"] = json.dumps(fields["subject_categories"])
if "key_principles" in fields:
fields["key_principles"] = json.dumps(fields["key_principles"])
# Build a positional UPDATE — asyncpg doesn't support named parameters.
cols = list(fields.keys())
set_clause = ", ".join(f"{c} = ${i + 2}" for i, c in enumerate(cols))
values = [fields[c] for c in cols]
pool = await db.get_pool()
async with pool.acquire() as conn:
result = await conn.fetchrow(
f"UPDATE style_corpus SET {set_clause} "
f"WHERE id = $1 "
f"RETURNING id, decision_number, decision_date, summary, outcome",
cid, *values,
)
if not result:
raise HTTPException(404, "corpus row not found")
return {
"updated": True,
"id": str(result["id"]),
"decision_number": result["decision_number"] or "",
"decision_date": str(result["decision_date"]) if result["decision_date"] else "",
"summary_len": len(result["summary"] or ""),
"outcome_len": len(result["outcome"] or ""),
}
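The positional `UPDATE` built above can be isolated into a small pure function, which makes the `$1`/`$2…` numbering easy to verify. This is a sketch under assumptions: `build_update` and `ALLOWED_COLUMNS` are hypothetical names, and the explicit allow-list is an extra guard added here; in the original, column names are constrained by the Pydantic model's fields.

```python
# Illustrative subset of editable columns (assumption, not the full model).
ALLOWED_COLUMNS = {"decision_number", "decision_date", "summary", "outcome"}


def build_update(table: str, fields: dict) -> tuple[str, list]:
    """Build a positional UPDATE where $1 is the row id and $2.. are values."""
    unknown = set(fields) - ALLOWED_COLUMNS
    if unknown:
        raise ValueError(f"unexpected columns: {sorted(unknown)}")
    cols = list(fields)  # dicts preserve insertion order
    set_clause = ", ".join(f"{c} = ${i + 2}" for i, c in enumerate(cols))
    sql = f"UPDATE {table} SET {set_clause} WHERE id = $1 RETURNING id"
    return sql, [fields[c] for c in cols]


sql, values = build_update("style_corpus", {"summary": "short", "outcome": "accepted"})
print(sql)
print(values)

try:
    build_update("style_corpus", {"full_text": "x"})
    rejected = False
except ValueError:
    rejected = True
```

The returned pair would be passed as `conn.fetchrow(sql, row_id, *values)`, mirroring the call in the handler.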
# Headers that defeat proxy buffering for SSE streams. `X-Accel-Buffering: no`

web/chat_proxy.py Normal file

@@ -0,0 +1,176 @@
"""FastAPI ↔ legal-chat-service streaming bridge.
The browser hits ``/api/training/chat/conversations/{id}/messages`` on
the legal-ai container. The container is sealed off from the host's
``claude`` CLI (intentional — see ``claude_session.py`` docstring), so
we forward each request to the pm2-managed ``legal-chat-service`` via the
Docker host gateway (``host.docker.internal:8770``).
Responsibilities:
- Save the user message to ``chat_messages`` before streaming starts.
- Open an HTTP streaming connection to the host service.
- Forward each SSE event to the browser as-is, accumulating the
assistant text and any ``session_id`` so we can persist them once
the stream closes.
- Persist the assistant turn + the CLI's session_id at end-of-stream.
"""
from __future__ import annotations
import json
import logging
import os
from typing import AsyncIterator
from uuid import UUID
import httpx
from fastapi import HTTPException
from fastapi.responses import StreamingResponse
from legal_mcp.services import db
from web import chat_system_prompt
logger = logging.getLogger(__name__)
# legal-chat-service lives on the host. In the container we reach it via
# host.docker.internal — which requires ``extra_hosts: host.docker.internal:host-gateway``
# in the Coolify service definition. Set ``CHAT_SERVICE_URL`` to override
# (handy for local dev outside Docker).
CHAT_SERVICE_URL = os.environ.get(
"CHAT_SERVICE_URL",
"http://host.docker.internal:8770",
)
CHAT_SERVICE_TIMEOUT_S = float(os.environ.get("CHAT_SERVICE_TIMEOUT_S", "3600"))
_SSE_HEADERS = {
"Cache-Control": "no-cache, no-transform",
"X-Accel-Buffering": "no",
"Connection": "keep-alive",
}
async def stream_chat_message(
conversation_id: UUID,
user_message: str,
) -> StreamingResponse:
"""Open SSE stream, forward events, persist when done.
Returns a FastAPI StreamingResponse the route can return directly.
"""
conv = await db.get_chat_conversation(conversation_id)
if not conv:
raise HTTPException(404, "conversation not found")
# Persist the user turn immediately so a network drop doesn't lose it.
await db.add_chat_message(
conversation_id, role="user", content=user_message,
)
is_first_turn = not conv.get("claude_session_id")
system_block: str | None = None
if is_first_turn:
try:
system_block = await chat_system_prompt.build_system_prompt(
corpus_id=conv.get("style_corpus_id"),
)
except Exception as e:
logger.exception("system prompt build failed")
raise HTTPException(500, f"system prompt failed: {e}")
payload = {
"prompt": user_message,
"system": system_block,
"resume_session_id": conv.get("claude_session_id"),
}
async def proxy_stream() -> AsyncIterator[bytes]:
accumulated_text: list[str] = []
events_log: list[dict] = []
new_session_id: str | None = None
try:
timeout_cfg = httpx.Timeout(
CHAT_SERVICE_TIMEOUT_S,
connect=10.0,
read=CHAT_SERVICE_TIMEOUT_S,
)
async with httpx.AsyncClient(timeout=timeout_cfg) as client:
async with client.stream(
"POST",
f"{CHAT_SERVICE_URL}/chat/start",
json=payload,
) as upstream:
if upstream.status_code != 200:
body = await upstream.aread()
msg = body.decode("utf-8", errors="replace")[:300]
err = {"type": "error",
"message": f"chat-service {upstream.status_code}: {msg}"}
yield f"data: {json.dumps(err, ensure_ascii=False)}\n\n".encode("utf-8")
return
async for line in upstream.aiter_lines():
if not line:
yield b"\n"
continue
# Forward verbatim so the browser sees the same
# SSE framing the host emits.
out = line + "\n"
yield out.encode("utf-8")
# Mirror events: capture text + session_id for
# persistence. The line starts with "data: <json>"
# so we strip the prefix before parsing.
if line.startswith("data: "):
try:
event = json.loads(line[len("data: "):])
except json.JSONDecodeError:
continue
events_log.append(event)
t = event.get("type")
if t == "session_id" and event.get("value"):
new_session_id = event["value"]
elif t == "text_delta" and event.get("text"):
accumulated_text.append(event["text"])
elif t == "done" and event.get("text"):
if not accumulated_text:
accumulated_text.append(event["text"])
except httpx.ConnectError:
err = {
"type": "error",
"message": (
f"לא ניתן להגיע ל-legal-chat-service בכתובת {CHAT_SERVICE_URL}. "
"ודא ש-pm2 מריץ אותו: `pm2 status legal-chat-service`."
),
}
yield f"data: {json.dumps(err, ensure_ascii=False)}\n\n".encode("utf-8")
return
except Exception as e:
logger.exception("chat proxy failed")
err = {"type": "error", "message": str(e)}
yield f"data: {json.dumps(err, ensure_ascii=False)}\n\n".encode("utf-8")
return
# End of stream — persist the assistant turn.
try:
full_text = "".join(accumulated_text).strip()
if full_text:
await db.add_chat_message(
conversation_id,
role="assistant",
content=full_text,
raw_events=events_log,
)
if new_session_id:
await db.update_chat_conversation_session_id(
conversation_id, new_session_id,
)
except Exception:
logger.exception("failed to persist assistant turn for conv=%s", conversation_id)
return StreamingResponse(
proxy_stream(),
media_type="text/event-stream",
headers=_SSE_HEADERS,
)
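The mirroring step inside `proxy_stream` (forward every line, but also fold the events into the text to persist) can be tested without a network. The sketch below assumes the event shapes seen in the proxy (`session_id`/`value`, `text_delta`/`text`, `done`/`text`); `collect_stream` is a hypothetical name, and the `isinstance` guard is an addition not present in the original.

```python
import json


def collect_stream(lines):
    """Fold SSE lines into (assistant_text, session_id)."""
    chunks: list[str] = []
    session_id = None
    for line in lines:
        if not line.startswith("data: "):
            continue  # blank separators and non-data fields carry no payload
        try:
            event = json.loads(line[len("data: "):])
        except json.JSONDecodeError:
            continue  # malformed payloads are forwarded upstream but not mirrored
        if not isinstance(event, dict):
            continue
        kind = event.get("type")
        if kind == "session_id" and event.get("value"):
            session_id = event["value"]
        elif kind == "text_delta" and event.get("text"):
            chunks.append(event["text"])
        elif kind == "done" and event.get("text") and not chunks:
            # Use the final text only when no deltas were streamed.
            chunks.append(event["text"])
    return "".join(chunks).strip(), session_id


text, sid = collect_stream([
    'data: {"type": "session_id", "value": "abc"}',
    "",
    'data: {"type": "text_delta", "text": "Hel"}',
    'data: {"type": "text_delta", "text": "lo "}',
    "data: not-json",
    'data: {"type": "done", "text": "Hello"}',
])
only_done = collect_stream(['data: {"type": "done", "text": "Hi"}'])
print(text, sid, only_done)
```

Treating the `done` text as a fallback rather than appending it avoids persisting the assistant reply twice when deltas were already accumulated.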

web/chat_system_prompt.py Normal file

@@ -0,0 +1,205 @@
"""Compose the system prompt the style-chat agent receives.
The chat runs against the local ``claude`` CLI on the host (via
legal-chat-service). We assemble a once-per-conversation system block
that gives the agent everything it needs to discuss decisions in
Daphna's voice:
- The style guide (``skills/decision/SKILL.md``) — how she writes
- The lessons file (``docs/legal-decision-lessons.md``) — what we've
learned across the corpus
- The corpus-analysis report (``docs/corpus-analysis.md``) — the
structural map of 24+ decisions
- A summary of every style_corpus row (number, date, subjects,
chars + summary if extracted) so the agent can reason about the
whole corpus without us shipping all of it inline
- Optional: when the conversation is scoped to a specific decision
(``style_corpus_id``), append its full_text so the chat can dive
into the text directly
Sent **once**, with the first message of a conversation (while no ``claude_session_id`` is stored yet).
messages the legal-chat-service uses ``claude --resume <session_id>``
and the on-disk CLI session keeps the system context intact — no need
to re-ship the 100K+ chars of skills + lessons every turn.
"""
from __future__ import annotations
import logging
import os
from pathlib import Path
from uuid import UUID
from legal_mcp.services import db
logger = logging.getLogger(__name__)
# The reference files live in the repo at known paths. In the
# container they're mounted alongside the code, so resolve relative
# to the parent of the web/ directory (the repo root).
_REPO_ROOT = Path(os.environ.get(
"LEGAL_AI_REPO_ROOT",
str(Path(__file__).resolve().parent.parent),
))
_SKILLS_PATH = _REPO_ROOT / "skills" / "decision" / "SKILL.md"
_LESSONS_PATH = _REPO_ROOT / "docs" / "legal-decision-lessons.md"
_CORPUS_ANALYSIS_PATH = _REPO_ROOT / "docs" / "corpus-analysis.md"
def _safe_read(path: Path, cap_chars: int = 50_000) -> str:
"""Read a file (UTF-8) or return a marker that it's missing.
The cap protects against accidentally injecting an enormous file —
even at 50K, a single source file is the lion's share of the
system prompt budget.
"""
try:
text = path.read_text(encoding="utf-8")
except FileNotFoundError:
return f"(קובץ {path.name} לא נמצא בנתיב {path})"
except OSError as e:
logger.warning("could not read %s: %s", path, e)
return f"(שגיאה בקריאת {path.name}: {e})"
if len(text) > cap_chars:
return text[:cap_chars] + f"\n\n[... חתך ב-{cap_chars:,} תווים מתוך {len(text):,}]"
return text
async def _corpus_summary_block() -> str:
"""Compact one-row-per-decision summary the agent can scan."""
pool = await db.get_pool()
async with pool.acquire() as conn:
records = await conn.fetch(
"""
SELECT decision_number, decision_date, appeal_subtype,
subject_categories, length(full_text) AS chars,
coalesce(summary, '') AS summary,
coalesce(outcome, '') AS outcome
FROM style_corpus
ORDER BY decision_date NULLS LAST
"""
)
if not records:
return "(הקורפוס ריק)"
lines = []
for r in records:
cats = r["subject_categories"]
if isinstance(cats, str):
import json as _json
try:
cats = _json.loads(cats)
except _json.JSONDecodeError:
cats = []
cats_str = ", ".join(cats or [])
date_str = str(r["decision_date"]) if r["decision_date"] else ""
summary = (r["summary"] or "").strip()
outcome = (r["outcome"] or "").strip()
head = f"- **{r['decision_number'] or ''}** ({date_str}) [{r['appeal_subtype'] or ''}] · {r['chars']:,} תווים"
meta = f" נושאים: {cats_str}"
body = ""
if summary:
body = f"\n תקציר: {summary}"
if outcome:
body += f" — תוצאה: {outcome}"
elif outcome:
body = f"\n תוצאה: {outcome}"
lines.append(head + "\n" + meta + body)
return "\n".join(lines)
async def _decision_full_text(corpus_id: UUID) -> str:
pool = await db.get_pool()
async with pool.acquire() as conn:
row = await conn.fetchrow(
"SELECT decision_number, decision_date, full_text "
"FROM style_corpus WHERE id = $1",
corpus_id,
)
if not row:
return ""
header = f"# החלטה {row['decision_number']} ({row['decision_date']})\n\n"
return header + (row["full_text"] or "")
SYSTEM_PROMPT_HEADER = """\
אתה סוכן הסגנון של עו"ד דפנה תמיר, יו"ר ועדת הערר לתכנון ובניה — מחוז ירושלים.
תפקידך: לעזור לחיים (העוזר המקצועי של דפנה) להבין, לנתח ולחדד את הסגנון
של דפנה. אתה לא כותב החלטות חדשות; אתה דן בסגנון של החלטות קיימות,
מזהה דפוסים, מקפיד שהכותבים העתידיים (ה-writer agent) יישארו נאמנים
לקולה.
יש לך גישה ל:
1. **מדריך הסגנון** של דפנה (skills/decision/SKILL.md) — איך היא כותבת.
2. **הלקחים הגנריים** מהקורפוס (docs/legal-decision-lessons.md) — מה
למדנו לאורך 24+ החלטות. **חובה** להישען על הקבצים האלה כשאתה דן
בסגנון, ולא להמציא תובנות חדשות מהאוויר.
3. **ניתוח הקורפוס** המבני (docs/corpus-analysis.md) — מפת תוכן ופערים.
4. **רשימת ההחלטות בקורפוס** (למטה) — סקירה תמציתית של כל החלטה
שעלתה ל-style_corpus.
5. **טקסט מלא של החלטה ספציפית** (אם השיחה הוצמדה ל-style_corpus_id).
כללי תקשורת:
- כל התשובות בעברית.
- חיים יושב מולך, לא דפנה — אבל המטרה היא לחדד את הסגנון *של דפנה*.
- אם חיים שואל "האם פסקה X מתאימה לסגנון של דפנה?" — תן ניתוח מנומק
שמסתמך על SKILL.md ועל החלטות הקורפוס. אל תמציא ראיות.
- אם אתה צריך החלטה ספציפית שאין בקורפוס — הודע לחיים שיצרף אותה.
- אם חיים אומר לך משהו חדש על דפנה ("דפנה אומרת לעולם אל תפתח החלטה
במילה X") — שמור את זה בזיכרון השיחה; אם זה מצדיק תיעוד קבוע, הצע
לחיים להוסיף את זה כ-decision_lesson (POST /api/training/corpus/{corpus_id}/lessons)
או כתוספת ל-SKILL.md.
- אל תיתן לעצמך אישיות מומצאת — אתה כלי-עזר מקצועי, לא חבר.
"""
async def build_system_prompt(
*,
corpus_id: UUID | None = None,
include_corpus_summary: bool = True,
) -> str:
"""Assemble the full system prompt for a new chat conversation.
Args:
corpus_id: When set, the full_text of that decision is appended
so the chat can dive into the text.
include_corpus_summary: Set False for low-context chats (e.g.
quick "what does Daphna do at the end of a betterment-levy
decision?" — no need to ship 24 summaries).
"""
parts: list[str] = [SYSTEM_PROMPT_HEADER]
parts.append("\n## מדריך הסגנון (skills/decision/SKILL.md)\n")
parts.append(_safe_read(_SKILLS_PATH, cap_chars=40_000))
parts.append("\n\n## לקחים מהקורפוס (docs/legal-decision-lessons.md)\n")
parts.append(_safe_read(_LESSONS_PATH, cap_chars=30_000))
parts.append("\n\n## ניתוח קורפוס מבני (docs/corpus-analysis.md)\n")
parts.append(_safe_read(_CORPUS_ANALYSIS_PATH, cap_chars=15_000))
if include_corpus_summary:
parts.append("\n\n## רשימת ההחלטות בקורפוס הסגנון\n")
try:
parts.append(await _corpus_summary_block())
except Exception as e:
logger.warning("corpus summary failed: %s", e)
parts.append("(שגיאה בטעינת רשימת הקורפוס)")
if corpus_id is not None:
parts.append("\n\n## ההחלטה הספציפית בדיון (full_text)\n")
try:
txt = await _decision_full_text(corpus_id)
if txt:
parts.append(txt[:200_000]) # hard cap
else:
parts.append("(לא נמצאה החלטה — בדוק את ה-corpus_id)")
except Exception as e:
logger.warning("decision full_text failed: %s", e)
parts.append("(שגיאה בטעינת ההחלטה)")
return "\n".join(parts)