feat(training): Style Studio — upload, rich corpus, lessons, curator portrait, chat

Six-phase upgrade of /training from a read-only dashboard into a full
Style Studio for managing Daphna's style corpus.

- Upload Sheet on /training: file → proofread preview → commit (no more
  CLI-only `upload-training` skill).
- Rich corpus metadata: GET /api/training/corpus returns summary, outcome,
  key_principles, page_count, parties (regex), legal_citation, lessons_count.
  PATCH endpoint for chair edits. CorpusDetailDrawer with 4 tabs (details
  /content/lessons/patterns) replaces the bare table row.
- LLM metadata enrichment: style_metadata_extractor + MCP tools
  (style_corpus_enrich, style_corpus_pending_enrichment) fill summary
  /outcome/key_principles via claude_session (free, host-side).
- Per-decision lessons: new decision_lessons table + 4 REST endpoints +
  LessonsTab in drawer; hermes-curator now auto-posts findings as
  decision_lessons(source=curator).
- Curator Portrait tab: prompt rendered with link to Gitea, recent
  curator findings, style_analyzer training prompts, propose-change
  form that writes proposals to data/curator-proposals/ for manual
  chair review (no auto-mutation of the agent file).
- Style chat tab: SSE-streamed conversations with the style agent.
  New host-side pm2 service (legal-chat-service, port 8770) wraps
  claude CLI with stream-json + --resume continuation; FastAPI proxies
  via host.docker.internal. Zero API cost — uses chaim's claude.ai
  subscription. chat_conversations + chat_messages persist history.

Architecture: keeps the existing rule that claude_session only runs
on the host (not the container). The new legal-chat-service is the
canonical bridge between the container and the local CLI for the chat
feature; everything else (upload, metadata, lessons) stays within the
container's existing capabilities.

Audit script (scripts/audit_training_corpus.py) included for verifying
which corpus rows still need enrichment.

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
2026-05-27 10:06:22 +00:00
parent 0629f19d5f
commit bb0cd7c6a2
23 changed files with 4568 additions and 75 deletions


@@ -76,6 +76,24 @@ profiles:
Authorization: Bearer $PAPERCLIP_API_KEY
{ "body": "<my findings>" }
```
5b. **I also record every finding in the legal-ai API as a decision_lesson**, so that it appears in the UI
under the decision's "מה למדנו" ("What we learned") tab in the corpus. Prerequisite: first find the `style_corpus_id`
that matches the decision's `decision_number` (call `GET /api/training/corpus` and filter).
For each finding:
```
POST https://legal-ai.nautilus.marcusgroup.org/api/training/corpus/{corpus_id}/lessons
Content-Type: application/json
{
  "lesson_text": "<one-line summary of the finding: what I saw + suggestion>",
  "category": "<style|structure|lexicon|tabular|general>",
  "source": "curator"
}
```
Mapping of finding tags to `category`:
- `[סגנון]` (style) → `style`
- `[מבנה]` (structure) → `structure`
- `[לקסיקון משפטי]` (legal lexicon) → `lexicon`
- `[טבלאי]` (tabular) → `tabular`
6. I close the issue (status=done) after writing the comment
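
The tag-to-category mapping in step 5b can be sketched in Python as follows. This is only an illustration: the helper name `build_lesson_payload`, the stripping of the leading tag, and the fallback to `general` for untagged findings are assumptions, not something the curator prompt specifies.

```python
# Mapping of finding tags to decision_lessons categories, as listed in step 5b.
TAG_TO_CATEGORY = {
    "[סגנון]": "style",
    "[מבנה]": "structure",
    "[לקסיקון משפטי]": "lexicon",
    "[טבלאי]": "tabular",
}


def build_lesson_payload(finding: str) -> dict:
    """Build the JSON body for POST .../corpus/{corpus_id}/lessons.

    Hypothetical helper: strips a leading tag, if present, and falls back
    to the "general" category when no known tag is found (an assumption).
    """
    text = finding.strip()
    category = "general"
    for tag, mapped in TAG_TO_CATEGORY.items():
        if text.startswith(tag):
            category = mapped
            text = text[len(tag):]
            break
    # lesson_text is meant to be a single line, so collapse all whitespace.
    text = " ".join(text.split())
    return {"lesson_text": text, "category": category, "source": "curator"}
```

The returned dict can be serialized with `json.dumps(..., ensure_ascii=False)` and sent as the request body shown in step 5b.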
## Comment format


@@ -91,6 +91,16 @@
- Code changes take effect after `pm2 restart paperclip`
- **No Docker or Coolify required**
**legal-chat-service** runs **locally via pm2** (new, as of April 2026):
- Port: `localhost:8770` (loopback only)
- A small aiohttp service that wraps the `claude` CLI with streaming + session continuation and serves the "שיחה" ("Chat") tab on the `/training` page. The container proxies to it via `host.docker.internal:8770`.
- Code: [mcp-server/src/legal_mcp/chat_service/](mcp-server/src/legal_mcp/chat_service/)
- Install: `pm2 start /home/chaim/legal-ai/scripts/legal-chat-service.config.cjs && pm2 save`
- Health: `curl http://127.0.0.1:8770/health` → `{"ok":true,...}`
- Code changes: `pm2 restart legal-chat-service`
- **Zero API cost**: the claude CLI uses chaim's claude.ai subscription. The basic assumption of `claude_session.py` (the claude CLI runs locally only) is preserved; this service is the official bridge between the container and the outside.
- Coolify dependency: the legal-ai Service Definition must contain `extra_hosts: host.docker.internal:host-gateway` (otherwise the proxy gets a ConnectError).
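
The health check above can also be scripted. The following is a minimal sketch using only the standard library; the function names `parse_health` and `check_health` are hypothetical, and the only thing assumed about the response is the `{"ok": true, ...}` shape shown in the bullet list.

```python
import json
import urllib.request


def parse_health(body: bytes) -> bool:
    """Return True when the /health JSON body reports {"ok": true, ...}."""
    try:
        data = json.loads(body.decode("utf-8"))
    except (UnicodeDecodeError, json.JSONDecodeError):
        return False
    return isinstance(data, dict) and data.get("ok") is True


def check_health(base_url: str = "http://127.0.0.1:8770", timeout: float = 3.0) -> bool:
    """Probe the service; any connection or HTTP error counts as unhealthy."""
    try:
        with urllib.request.urlopen(f"{base_url}/health", timeout=timeout) as resp:
            return resp.status == 200 and parse_health(resp.read())
    except OSError:  # urllib.error.URLError and socket timeouts derive from OSError
        return False
```

Run `check_health()` on the host for the loopback address, or pass `http://host.docker.internal:8770` from inside the container to confirm that the `extra_hosts` entry is in place.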
---
## Directory structure


@@ -0,0 +1,13 @@
"""legal-chat-service — host-side SSE bridge to ``claude`` CLI.

Runs as a pm2-managed process on the host (port 127.0.0.1:8770 by default).
The legal-ai FastAPI container proxies chat requests to it via
``host.docker.internal:8770``.

Why a separate service:
    The chat needs real-time streaming + multi-turn session continuation
    (``claude --resume <session_id>``). The container can't run the
    claude CLI (no binary, no claude.ai credentials). Splitting this out
    keeps the architectural rule of ``claude_session.py`` intact while
    enabling the new chat feature for free (no API key).
"""


@@ -0,0 +1,144 @@
"""HTTP+SSE bridge from FastAPI (in container) to local claude CLI.

Endpoints:
    POST /chat/start  body: {prompt, system?, resume_session_id?, timeout?};
                      returns an SSE stream of events from
                      ``claude_session.query_streaming``.
    GET  /health      liveness probe.

Run with pm2:
    pm2 start ecosystem.config.cjs --only legal-chat-service

Standalone for dev:
    cd ~/legal-ai/mcp-server
    .venv/bin/python -m legal_mcp.chat_service.server --port 8770

We intentionally bind to 127.0.0.1 only: the FastAPI container reaches
us via ``host.docker.internal``, and exposing the bridge publicly would
let anyone run claude CLI commands against Daphna's session.
"""
from __future__ import annotations

import argparse
import asyncio
import json
import logging
import os
import sys
from typing import Any

from aiohttp import web

# Run-via-CLI bootstrap so ``python -m legal_mcp.chat_service.server``
# works even when the package isn't installed (it is in the venv, but
# this safeguard keeps the entrypoint robust).
_pkg_root = os.path.dirname(os.path.dirname(os.path.dirname(__file__)))
if _pkg_root not in sys.path:
    sys.path.insert(0, _pkg_root)

from legal_mcp.services import claude_session  # noqa: E402

logger = logging.getLogger("legal_chat_service")


async def health(request: web.Request) -> web.Response:
    return web.json_response({"ok": True, "service": "legal-chat-service"})


async def chat_start(request: web.Request) -> web.StreamResponse:
    """Drive ``claude_session.query_streaming`` and forward events as SSE.

    Request body (JSON):
        prompt: str                    required, user message
        system: str | None             system instructions (ignored if resuming)
        resume_session_id: str | None  continue a prior CLI session
        timeout: int = 3600            hard timeout for the subprocess
    """
    try:
        body = await request.json()
    except json.JSONDecodeError:
        return web.json_response({"error": "invalid JSON body"}, status=400)
    # A JSON array or scalar would otherwise crash on body.get() below.
    if not isinstance(body, dict):
        return web.json_response({"error": "JSON body must be an object"}, status=400)

    prompt = body.get("prompt") or ""
    if not isinstance(prompt, str) or not prompt.strip():
        return web.json_response({"error": "prompt is required"}, status=400)

    system = body.get("system")
    resume_session_id = body.get("resume_session_id")
    try:
        timeout = int(body.get("timeout") or 3600)
    except (TypeError, ValueError):
        return web.json_response({"error": "timeout must be an integer"}, status=400)

    response = web.StreamResponse(
        status=200,
        reason="OK",
        headers={
            "Content-Type": "text/event-stream",
            "Cache-Control": "no-cache, no-transform",
            "Connection": "keep-alive",
            # X-Accel-Buffering=no defeats nginx/traefik buffering. The
            # FastAPI container proxies via httpx and forwards bytes as
            # they arrive, but the inner header is harmless and makes
            # browser-direct testing easier.
            "X-Accel-Buffering": "no",
        },
    )
    await response.prepare(request)

    async def send_event(payload: dict[str, Any]) -> None:
        # One SSE frame per event: "data: <json>" followed by a blank line.
        line = f"data: {json.dumps(payload, ensure_ascii=False)}\n\n"
        await response.write(line.encode("utf-8"))

    try:
        async for event in claude_session.query_streaming(
            prompt,
            system=system,
            resume_session_id=resume_session_id,
            timeout=timeout,
        ):
            await send_event(event)
            if event.get("type") in ("done", "error"):
                break
    except asyncio.CancelledError:
        # Client disconnected, so bail cleanly.
        logger.info("chat_start: client disconnected")
    except Exception as e:
        logger.exception("chat_start: streaming failed")
        try:
            await send_event({"type": "error", "message": str(e)})
        except ConnectionResetError:
            pass

    try:
        await response.write_eof()
    except ConnectionResetError:
        pass
    return response


def build_app() -> web.Application:
    app = web.Application()
    app.router.add_get("/health", health)
    app.router.add_post("/chat/start", chat_start)
    return app


def main() -> int:
    parser = argparse.ArgumentParser(description="legal-chat-service")
    parser.add_argument("--port", type=int, default=8770)
    parser.add_argument("--host", default="127.0.0.1",
                        help="bind address; 127.0.0.1 keeps the service "
                             "loopback-only, so leave it alone in production")
    parser.add_argument("--log-level", default="INFO")
    args = parser.parse_args()

    logging.basicConfig(
        level=args.log_level.upper(),
        format="%(asctime)s %(name)s %(levelname)s %(message)s",
    )
    app = build_app()
    web.run_app(app, host=args.host, port=args.port, print=lambda _msg: None)
    return 0


if __name__ == "__main__":
    sys.exit(main())
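
For readers wiring up a client, the SSE framing that `send_event` produces (`data: <json>` followed by a blank line) can be decoded with a small incremental parser. This is a sketch under stated assumptions: `iter_sse_events` is a hypothetical helper, not part of this commit, and it only handles the single-line `data:` frames emitted above rather than the full SSE grammar.

```python
import json
from typing import Iterable, Iterator


def iter_sse_events(chunks: Iterable[bytes]) -> Iterator[dict]:
    """Yield the JSON payload of each complete "data: ..." frame.

    Bytes are buffered until a blank line terminates a frame, so a
    multi-byte UTF-8 character split across two chunks is decoded intact.
    """
    buffer = b""
    for chunk in chunks:
        buffer += chunk
        while b"\n\n" in buffer:
            frame, buffer = buffer.split(b"\n\n", 1)
            for line in frame.decode("utf-8").splitlines():
                if line.startswith("data: "):
                    yield json.loads(line[len("data: "):])
```

Any byte iterator can feed it, for example the chunks of a streaming HTTP response from `POST /chat/start`; the caller would typically stop reading after a `done` or `error` event.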


@@ -57,6 +57,7 @@ from legal_mcp.tools import ( # noqa: E402
    legal_arguments as la_tools,
    missing_precedents as mp_tools,
    citations as cit_tools,
    training_enrichment as train_tools,
)
@@ -248,6 +249,18 @@ async def precedent_extract_metadata(case_law_id: str) -> str:
    return await plib.precedent_extract_metadata(case_law_id)


@mcp.tool()
async def style_corpus_enrich(corpus_id: str, overwrite: bool = False) -> str:
    """Extract metadata (summary, outcome, key_principles, appeal_subtype) for a decision in Daphna's style corpus. Default: fills only empty fields. Pass `overwrite=true` to refresh."""
    return await train_tools.extract_decision_metadata(corpus_id, overwrite=overwrite)


@mcp.tool()
async def style_corpus_pending_enrichment(limit: int = 50) -> str:
    """List decisions in the style corpus that still lack summary/outcome/key_principles, i.e. candidates for extraction."""
    return await train_tools.list_corpus_pending_enrichment(limit)


@mcp.tool()
async def precedent_process_pending(kind: str = "metadata", limit: int = 20) -> str:
    """Drain the queue of extraction requests sent from the UI. kind: 'metadata' or 'halacha'. Runs the extractor locally with the CLI on every queued item and clears the flag after success."""


@@ -142,3 +142,175 @@ async def query_json(
""" """
    raw = await query(prompt, timeout=timeout, system=system)
    return parse_llm_json(raw)


# ── Streaming + session continuation ────────────────────────────────


async def query_streaming(
    prompt: str,
    *,
    system: str | None = None,
    resume_session_id: str | None = None,
    timeout: int = LONG_TIMEOUT,
    cwd: str | None = None,
):
    """Stream Claude's response as an async iterator of events.

    Wraps `claude -p --output-format=stream-json` (newline-delimited JSON
    objects from the CLI) and translates each line into a small, stable
    shape that the chat service / SSE proxy can forward without leaking
    CLI internals to the browser.

    Event shapes yielded:
        {"type": "session_id", "value": "<uuid>"}    # first event, used for resume
        {"type": "text_delta", "text": "<partial>"}  # incremental assistant text
        {"type": "tool_use", "name": "...", "input": {...}}
        {"type": "error", "message": "..."}
        {"type": "done", "text": "<full response>"}

    The CLI emits a richer stream; we project to this minimal set so the
    front-end can stay stable across CLI upgrades.

    Args:
        prompt: The user message to send.
        system: Optional system instructions (used only when starting a
            fresh conversation; when resume_session_id is set, the
            session already carries its system prompt).
        resume_session_id: Continue a prior conversation. When given,
            we don't re-send the system prompt; the CLI loads the
            entire conversation history from disk.
        timeout: Hard ceiling on the subprocess.
        cwd: Working directory for the subprocess; defaults to the
            host's HOME so claude.ai credentials resolve correctly.
    """
    if resume_session_id:
        # When resuming, system is already baked into the on-disk session.
        # Sending it again would be a no-op at best and confuse the
        # conversation at worst.
        full_prompt = prompt
        cmd = [
            "claude", "-p",
            "--output-format", "stream-json",
            "--verbose",
            "--resume", resume_session_id,
        ]
    else:
        full_prompt = f"{system}\n\n{prompt}" if system else prompt
        cmd = [
            "claude", "-p",
            "--output-format", "stream-json",
            "--verbose",
        ]

    if len(full_prompt) > 200_000:
        logger.warning(
            "Streaming: large prompt (%d chars), may hit CLI input limits",
            len(full_prompt),
        )

    try:
        proc = await asyncio.create_subprocess_exec(
            *cmd,
            stdin=asyncio.subprocess.PIPE,
            stdout=asyncio.subprocess.PIPE,
            stderr=asyncio.subprocess.PIPE,
            cwd=cwd,
        )
    except FileNotFoundError:
        yield {
            "type": "error",
            "message": (
                "Claude CLI not found on host: legal-chat-service must "
                "run where the `claude` binary is installed (Daphna's host, "
                "not the legal-ai container)."
            ),
        }
        return

    assert proc.stdin is not None  # for type checkers
    assert proc.stdout is not None

    # Send the prompt and close stdin so the CLI knows the user message
    # is complete.
    try:
        proc.stdin.write(full_prompt.encode("utf-8"))
        await proc.stdin.drain()
        proc.stdin.close()
    except BrokenPipeError:
        # CLI exited before reading the prompt: drain stderr and bail.
        stderr_b = await proc.stderr.read() if proc.stderr else b""
        yield {
            "type": "error",
            "message": f"Claude CLI closed stdin early: {stderr_b.decode('utf-8', errors='replace')[:300]}",
        }
        return

    accumulated_text: list[str] = []
    session_id_emitted = False
    # We are inside a coroutine, so the running loop is the right clock source.
    loop = asyncio.get_running_loop()
    deadline = loop.time() + timeout

    try:
        while True:
            remaining = deadline - loop.time()
            if remaining <= 0:
                yield {"type": "error", "message": f"timed out after {timeout}s"}
                break
            try:
                line_b = await asyncio.wait_for(proc.stdout.readline(), timeout=remaining)
            except asyncio.TimeoutError:
                yield {"type": "error", "message": f"stream timed out after {timeout}s"}
                break
            if not line_b:
                break  # EOF: the CLI closed stdout
            line = line_b.decode("utf-8", errors="replace").strip()
            if not line:
                continue
            try:
                event = json.loads(line)
            except json.JSONDecodeError:
                # Stray non-JSON line from CLI: surface a snippet for debug.
                logger.debug("non-JSON stream line: %s", line[:120])
                continue

            # The CLI's stream-json emits several event types. We only
            # care about the ones the chat service forwards.
            t = event.get("type")
            if not session_id_emitted:
                sid = event.get("session_id")
                if sid:
                    session_id_emitted = True
                    yield {"type": "session_id", "value": sid}

            if t == "assistant":
                # event["message"]["content"] is a list of blocks; we extract
                # text blocks and tool_use blocks.
                msg = event.get("message") or {}
                for block in msg.get("content") or []:
                    btype = block.get("type")
                    if btype == "text":
                        text = block.get("text") or ""
                        if text:
                            accumulated_text.append(text)
                            yield {"type": "text_delta", "text": text}
                    elif btype == "tool_use":
                        yield {
                            "type": "tool_use",
                            "name": block.get("name") or "",
                            "input": block.get("input") or {},
                        }
            elif t == "result":
                # Final synthesized result line from the CLI. We already
                # delivered the deltas, so just stop here.
                break
    finally:
        # Make sure the subprocess never outlives the generator.
        if proc.returncode is None:
            try:
                proc.kill()
            except ProcessLookupError:
                pass
        try:
            await proc.wait()
        except Exception:
            pass

    yield {"type": "done", "text": "".join(accumulated_text)}
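
The projection from a CLI event to the minimal event set can be isolated as a pure function, which makes it easy to unit-test without spawning the CLI. This is a sketch, not code from the commit: `project_cli_event` is a hypothetical name, and the sample event shape (`type`, `message.content[]` with `text` and `tool_use` blocks) is taken from what `query_streaming` reads above rather than from CLI documentation.

```python
from typing import Any


def project_cli_event(event: dict[str, Any]) -> list[dict[str, Any]]:
    """Project one stream-json object onto text_delta / tool_use events.

    Non-assistant events (and empty text blocks) project to nothing,
    mirroring the filtering done inside query_streaming.
    """
    projected: list[dict[str, Any]] = []
    if event.get("type") != "assistant":
        return projected
    message = event.get("message") or {}
    for block in message.get("content") or []:
        block_type = block.get("type")
        if block_type == "text" and block.get("text"):
            projected.append({"type": "text_delta", "text": block["text"]})
        elif block_type == "tool_use":
            projected.append({
                "type": "tool_use",
                "name": block.get("name") or "",
                "input": block.get("input") or {},
            })
    return projected
```

Keeping this step free of I/O means a change in the CLI's output format only needs a new fixture and a one-function fix.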


@@ -194,6 +194,55 @@ ALTER TABLE style_corpus ADD COLUMN IF NOT EXISTS appeal_subtype TEXT DEFAULT ''
-- Extend style_patterns with appeal_subtype for separate style analysis per appeal type
ALTER TABLE style_patterns ADD COLUMN IF NOT EXISTS appeal_subtype TEXT DEFAULT '';
-- decision_lessons: per-decision learnings the chair / curator / style_analyzer
-- attaches to a corpus row. The generic legal-decision-lessons.md file stays
-- as the source of truth for cross-corpus patterns; this table stores the
-- granular "what we learned from THIS decision" notes that drive the writer's
-- future drafts and let the curator look up prior observations on the same row.
CREATE TABLE IF NOT EXISTS decision_lessons (
id UUID PRIMARY KEY DEFAULT uuid_generate_v4(),
style_corpus_id UUID NOT NULL REFERENCES style_corpus(id) ON DELETE CASCADE,
lesson_text TEXT NOT NULL,
category TEXT DEFAULT 'general', -- style / structure / lexicon / tabular / general
source TEXT DEFAULT 'manual', -- manual / curator / chair / style_analyzer
applied_to_skill BOOLEAN DEFAULT false, -- has this been promoted into SKILL.md?
created_by TEXT DEFAULT 'chaim',
created_at TIMESTAMPTZ DEFAULT now(),
updated_at TIMESTAMPTZ DEFAULT now()
);
CREATE INDEX IF NOT EXISTS idx_decision_lessons_corpus ON decision_lessons(style_corpus_id);
CREATE INDEX IF NOT EXISTS idx_decision_lessons_applied ON decision_lessons(applied_to_skill);
-- chat_conversations / chat_messages: persistent history for the
-- "שיחה עם הסוכן" ("Chat with the agent") tab on /training. Each conversation can optionally be
-- scoped to a single style_corpus row (when the chair starts a chat
-- "about decision X"). claude_session_id is the value the local claude
-- CLI returns in stream-json — we pass it back via `--resume` on the
-- next message so the model continues the same conversation without
-- re-loading the system prompt every time.
CREATE TABLE IF NOT EXISTS chat_conversations (
id UUID PRIMARY KEY DEFAULT uuid_generate_v4(),
title TEXT NOT NULL DEFAULT 'שיחה חדשה',
style_corpus_id UUID REFERENCES style_corpus(id) ON DELETE SET NULL,
claude_session_id TEXT,
system_prompt_version TEXT DEFAULT 'v1',
created_at TIMESTAMPTZ DEFAULT now(),
last_message_at TIMESTAMPTZ DEFAULT now()
);
CREATE TABLE IF NOT EXISTS chat_messages (
id UUID PRIMARY KEY DEFAULT uuid_generate_v4(),
conversation_id UUID NOT NULL REFERENCES chat_conversations(id) ON DELETE CASCADE,
role TEXT NOT NULL, -- 'user' | 'assistant'
content TEXT NOT NULL,
raw_events JSONB DEFAULT '[]', -- stream-json events for the assistant turn (optional, for debug)
created_at TIMESTAMPTZ DEFAULT now()
);
CREATE INDEX IF NOT EXISTS idx_chat_messages_conv ON chat_messages(conversation_id, created_at);
CREATE INDEX IF NOT EXISTS idx_chat_conv_corpus ON chat_conversations(style_corpus_id);
CREATE INDEX IF NOT EXISTS idx_chat_conv_last ON chat_conversations(last_message_at DESC);
-- qa_results table
CREATE TABLE IF NOT EXISTS qa_results (
    id UUID PRIMARY KEY DEFAULT uuid_generate_v4(),
@@ -1609,6 +1658,284 @@ async def delete_from_style_corpus(corpus_id: UUID) -> dict:
    }


async def get_style_corpus_row(corpus_id: UUID) -> dict | None:
    """Return a single style_corpus row by id, or None if missing."""
    pool = await get_pool()
    async with pool.acquire() as conn:
        row = await conn.fetchrow(
            """
            SELECT id, document_id, decision_number, decision_date,
                   subject_categories, full_text, summary, outcome,
                   key_principles, practice_area, appeal_subtype, created_at
            FROM style_corpus WHERE id = $1
            """,
            corpus_id,
        )
    return dict(row) if row else None


async def update_style_corpus_metadata(
    corpus_id: UUID,
    *,
    summary: str | None = None,
    outcome: str | None = None,
    key_principles: list[str] | None = None,
    appeal_subtype: str | None = None,
    practice_area: str | None = None,
    overwrite: bool = False,
) -> dict:
    """Patch the enriched-metadata columns of a style_corpus row.

    By default, only empty columns are filled. Passing ``overwrite=True``
    is the caller's signal that they intentionally want to replace existing
    values (used by the re-extract flow when the chair runs it manually).
    """
    pool = await get_pool()
    async with pool.acquire() as conn:
        existing = await conn.fetchrow(
            "SELECT summary, outcome, key_principles, appeal_subtype, practice_area "
            "FROM style_corpus WHERE id = $1",
            corpus_id,
        )
        if not existing:
            return {"updated": False, "reason": "not found"}

        # Collect only the columns that are allowed to change.
        sets: dict = {}
        if summary is not None and (overwrite or not (existing["summary"] or "").strip()):
            sets["summary"] = summary
        if outcome is not None and (overwrite or not (existing["outcome"] or "").strip()):
            sets["outcome"] = outcome
        if key_principles is not None:
            current = existing["key_principles"]
            # The column may come back as a JSON string; decode before testing emptiness.
            if isinstance(current, str):
                try:
                    current = json.loads(current)
                except json.JSONDecodeError:
                    current = []
            if overwrite or not (current or []):
                sets["key_principles"] = json.dumps(key_principles)
        if appeal_subtype is not None and (overwrite or not (existing["appeal_subtype"] or "").strip()):
            sets["appeal_subtype"] = appeal_subtype
        if practice_area is not None and (overwrite or not (existing["practice_area"] or "").strip()):
            sets["practice_area"] = practice_area

        if not sets:
            return {"updated": False, "reason": "nothing to update", "fields": []}

        # $1 is the row id; the updated columns bind to $2, $3, ...
        cols = list(sets.keys())
        set_clause = ", ".join(f"{c} = ${i + 2}" for i, c in enumerate(cols))
        values = [sets[c] for c in cols]
        await conn.execute(
            f"UPDATE style_corpus SET {set_clause} WHERE id = $1",
            corpus_id, *values,
        )
    return {"updated": True, "fields": cols}


# ── decision_lessons (per-corpus row notes) ────────────────────────


async def list_decision_lessons(corpus_id: UUID) -> list[dict]:
    pool = await get_pool()
    async with pool.acquire() as conn:
        rows = await conn.fetch(
            "SELECT id, style_corpus_id, lesson_text, category, source, "
            "       applied_to_skill, created_by, created_at, updated_at "
            "FROM decision_lessons WHERE style_corpus_id = $1 "
            "ORDER BY created_at DESC",
            corpus_id,
        )
    return [dict(r) for r in rows]


async def add_decision_lesson(
    corpus_id: UUID,
    *,
    lesson_text: str,
    category: str = "general",
    source: str = "manual",
    created_by: str = "chaim",
) -> dict:
    pool = await get_pool()
    async with pool.acquire() as conn:
        row = await conn.fetchrow(
            "INSERT INTO decision_lessons "
            "(style_corpus_id, lesson_text, category, source, created_by) "
            "VALUES ($1, $2, $3, $4, $5) "
            "RETURNING id, style_corpus_id, lesson_text, category, source, "
            "          applied_to_skill, created_by, created_at, updated_at",
            corpus_id, lesson_text, category, source, created_by,
        )
    return dict(row) if row else {}


async def update_decision_lesson(
    lesson_id: UUID,
    *,
    lesson_text: str | None = None,
    category: str | None = None,
    applied_to_skill: bool | None = None,
) -> dict:
    sets: dict = {}
    if lesson_text is not None:
        sets["lesson_text"] = lesson_text
    if category is not None:
        sets["category"] = category
    if applied_to_skill is not None:
        sets["applied_to_skill"] = applied_to_skill
    if not sets:
        return {"updated": False, "reason": "nothing to update"}

    # updated_at is always bumped server-side via now(), so it is appended
    # to the SET clause as a literal instead of being bound as a parameter.
    cols = list(sets)
    set_clause = ", ".join(f"{c} = ${i + 2}" for i, c in enumerate(cols))
    set_clause += ", updated_at = now()"
    values = [sets[c] for c in cols]

    pool = await get_pool()
    async with pool.acquire() as conn:
        row = await conn.fetchrow(
            f"UPDATE decision_lessons SET {set_clause} WHERE id = $1 "
            f"RETURNING id, style_corpus_id, lesson_text, category, source, "
            f"          applied_to_skill, updated_at",
            lesson_id, *values,
        )
    if not row:
        return {"updated": False, "reason": "not found"}
    return {"updated": True, **dict(row)}


async def delete_decision_lesson(lesson_id: UUID) -> dict:
    pool = await get_pool()
    async with pool.acquire() as conn:
        result = await conn.execute(
            "DELETE FROM decision_lessons WHERE id = $1", lesson_id,
        )
    # asyncpg returns the command tag, e.g. "DELETE 1".
    deleted = result.split(" ", 1)[1].strip() if " " in result else "0"
    return {"deleted": deleted != "0"}


async def count_decision_lessons_per_corpus() -> dict[str, int]:
    """Map style_corpus.id (str) → lesson count, for badge display in the list."""
    pool = await get_pool()
    async with pool.acquire() as conn:
        rows = await conn.fetch(
            "SELECT style_corpus_id, count(*) AS n "
            "FROM decision_lessons GROUP BY style_corpus_id"
        )
    return {str(r["style_corpus_id"]): r["n"] for r in rows}


# ── chat (style agent conversations) ───────────────────────────────


async def create_chat_conversation(
    *,
    title: str = "שיחה חדשה",
    style_corpus_id: UUID | None = None,
    system_prompt_version: str = "v1",
) -> dict:
    pool = await get_pool()
    async with pool.acquire() as conn:
        row = await conn.fetchrow(
            "INSERT INTO chat_conversations "
            "(title, style_corpus_id, system_prompt_version) "
            "VALUES ($1, $2, $3) "
            "RETURNING id, title, style_corpus_id, claude_session_id, "
            "          system_prompt_version, created_at, last_message_at",
            title, style_corpus_id, system_prompt_version,
        )
    return dict(row) if row else {}


async def list_chat_conversations(limit: int = 50) -> list[dict]:
    pool = await get_pool()
    async with pool.acquire() as conn:
        rows = await conn.fetch(
            """
            SELECT c.id, c.title, c.style_corpus_id, c.claude_session_id,
                   c.created_at, c.last_message_at,
                   sc.decision_number,
                   (SELECT count(*) FROM chat_messages m WHERE m.conversation_id = c.id) AS message_count
            FROM chat_conversations c
            LEFT JOIN style_corpus sc ON sc.id = c.style_corpus_id
            ORDER BY c.last_message_at DESC NULLS LAST
            LIMIT $1
            """,
            limit,
        )
    return [dict(r) for r in rows]


async def get_chat_conversation(conv_id: UUID) -> dict | None:
    pool = await get_pool()
    async with pool.acquire() as conn:
        row = await conn.fetchrow(
            "SELECT id, title, style_corpus_id, claude_session_id, "
            "       system_prompt_version, created_at, last_message_at "
            "FROM chat_conversations WHERE id = $1",
            conv_id,
        )
    return dict(row) if row else None


async def delete_chat_conversation(conv_id: UUID) -> dict:
    pool = await get_pool()
    async with pool.acquire() as conn:
        result = await conn.execute(
            "DELETE FROM chat_conversations WHERE id = $1", conv_id,
        )
    # asyncpg returns the command tag, e.g. "DELETE 1".
    deleted = result.split(" ", 1)[1].strip() if " " in result else "0"
    return {"deleted": deleted != "0"}


async def update_chat_conversation_session_id(
    conv_id: UUID, claude_session_id: str,
) -> None:
    pool = await get_pool()
    async with pool.acquire() as conn:
        await conn.execute(
            "UPDATE chat_conversations SET claude_session_id = $1, "
            "       last_message_at = now() "
            "WHERE id = $2",
            claude_session_id, conv_id,
        )


async def add_chat_message(
    conv_id: UUID,
    *,
    role: str,
    content: str,
    raw_events: list | None = None,
) -> dict:
    pool = await get_pool()
    async with pool.acquire() as conn:
        row = await conn.fetchrow(
            "INSERT INTO chat_messages "
            "(conversation_id, role, content, raw_events) "
            "VALUES ($1, $2, $3, $4) "
            "RETURNING id, conversation_id, role, content, created_at",
            conv_id, role, content, json.dumps(raw_events or []),
        )
        # Keep the conversation list sorted by most recent activity.
        await conn.execute(
            "UPDATE chat_conversations SET last_message_at = now() WHERE id = $1",
            conv_id,
        )
    return dict(row) if row else {}


async def list_chat_messages(conv_id: UUID) -> list[dict]:
    pool = await get_pool()
    async with pool.acquire() as conn:
        rows = await conn.fetch(
            "SELECT id, role, content, created_at "
            "FROM chat_messages WHERE conversation_id = $1 "
            "ORDER BY created_at ASC",
            conv_id,
        )
    return [dict(r) for r in rows]


async def get_style_patterns(pattern_type: str | None = None) -> list[dict]:
    pool = await get_pool()
    async with pool.acquire() as conn:


@@ -0,0 +1,195 @@
"""Auto-extract per-decision metadata for a style_corpus row.

Populates the fields that the upload flow leaves empty (summary, outcome,
key_principles, appeal_subtype, practice_area) by asking Claude (via the
local CLI session) to read the proofread full_text and return a structured
JSON blob.

Caller policy (``apply_to_corpus``): by default we **only fill empty
columns**, so chair-edited values are preserved across re-runs. The chair
can force a refresh by passing ``overwrite=True``.

Why this is a separate module from ``precedent_metadata_extractor``:
that one fills the *external* case_law corpus (court rulings, third-party
committee decisions). This one fills the *style* corpus: Daphna's own
decisions used to teach the writer the in-house voice. The two corpora
have different schemas, different prompts, and different downstream
consumers, so coupling them would have been the wrong shortcut.
"""
from __future__ import annotations

import logging
from uuid import UUID

from legal_mcp.services import claude_session, db

logger = logging.getLogger(__name__)

# A single decision typically runs 200K-650K chars. We sample the head
# (where outcome + parties + framing live) and the tail (where the
# operative ruling sits). Picking from both edges keeps the prompt under
# 60K chars, comfortable for any Claude tier.
_HEAD_CHARS = 25_000
_TAIL_CHARS = 15_000


def _build_text_window(full_text: str) -> str:
    # Short decisions are sent whole; longer ones keep head + tail with a
    # marker noting how many characters were dropped from the middle.
    if len(full_text) <= _HEAD_CHARS + _TAIL_CHARS:
        return full_text
    head = full_text[:_HEAD_CHARS]
    tail = full_text[-_TAIL_CHARS:]
    return (
        f"{head}\n\n"
        f"[... חתך: {len(full_text) - _HEAD_CHARS - _TAIL_CHARS:,} תווים מהאמצע "
        f"הושמטו — שמרנו על ההתחלה (טענות + רקע) ועל הסוף (הכרעה + הוצאות) ...]"
        f"\n\n{tail}"
    )


# Static instructions — go via ``system`` so the SDK path can cache them
# across batch enrichment runs (24+ decisions in one pass).
METADATA_PROMPT = """אתה מסייע משפטי שמקטלג את הקורפוס הסגנוני של דפנה תמיר (יו"ר ועדת ערר).
תפקידך: לקרוא החלטה אחת ולחלץ מטא-דאטה ל-style_corpus — שדות שהמשתמש לא הזין בעת ההעלאה.
**אל תמציא**. אם המידע לא מופיע בטקסט, השאר מחרוזת ריקה או מערך ריק. אסור להסיק עובדות שלא כתובות.
## פלט נדרש
החזר JSON אחד (object אחד — לא array, לא markdown, לא הסברים):
{
"summary": "תקציר עניני ב-2-3 משפטים: מי העורר, מה דרש, מה הוכרע. סגנון יבש, ניטרלי, ללא שיפוט. דוגמה: 'ערר על דחיית בקשה להיתר לתוספת מרפסת בקומה ג׳. דפנה קיבלה את הערר חלקית — אישרה את המרפסת בהקטנה ל-12 מ״ר.'",
"outcome": "התוצאה התמציתית. אחד מאלה (או צירוף קצר): 'קבלה' / 'קבלה חלקית' / 'דחייה' / 'הסתלקות' / 'החזרה לוועדה המקומית'. אם זה לא ברור — מחרוזת ריקה.",
"key_principles": [
"עיקרון משפטי 1 שעולה מההחלטה — משפט אחד, ניסוח מופשט. למשל 'שיקול דעת מוגבל לחריגות בנייה קטנות'.",
"עיקרון 2",
"..."
],
"appeal_subtype": "תת-סוג ערר. ערכים מותרים: 'building_permit' (היתר בנייה / רישוי), 'betterment_levy' (היטל השבחה), 'compensation_197' (פיצויים ס׳ 197), 'use_change' (שימוש חורג), 'tama_38' (תמ\\"א 38), או מחרוזת ריקה אם לא ברור.",
"practice_area": "תחום משפט גנרי. ברירת מחדל: 'appeals_committee'. אם זה במובהק 'planning_law' — סמן.",
"parties_appellant": "שם העורר/ים המרכזיים בהחלטה (אחד או כמה, מופרדים בפסיק). אם זו החלטה מאוחדת — שם הצד המוביל. השאר ריק אם לא ניתן לזהות במדויק.",
"parties_respondent": "שם המשיב/ים. ברירת מחדל לעררי 1xxx ו-8xxx: 'הוועדה המקומית לתכנון ובניה ירושלים' או דומה. השאר ריק אם לא ברור."
}
## כללי איכות
1. **summary** — חייב להזכיר את התוצאה. בלי 'בית המשפט קבע ש...' (אנחנו לא בית משפט). בלי הערכת אישית.
2. **outcome** — קבלה / קבלה חלקית / דחייה / הסתלקות / החזרה לוועדה המקומית. אם דפנה הכריעה חלקית — 'קבלה חלקית'. אסור 'התקבל' או 'נדחה' בלשון פעולה — רק שם פעולה.
3. **key_principles** — 2-5 עקרונות מקסימום. כל אחד משפט אחד. לא ציטוטים מילוליים, אלא תמצות העיקרון.
4. **appeal_subtype** — תמיד פעולה אחת. אם החלטה מערבת כמה תת-סוגים — בחר את העיקרי.
5. **parties_appellant / parties_respondent** — שם בלבד, בלי 'נ׳' או 'נגד'.
החזר רק את ה-JSON. אל תכתוב שום דבר לפניו או אחריו.
"""


async def extract_decision_metadata(corpus_id: UUID | str) -> dict:
    """Run Claude over the row's full_text and return suggested fields.

    Does NOT touch the DB. The caller decides what to apply.
    """
    if isinstance(corpus_id, str):
        corpus_id = UUID(corpus_id)

    row = await db.get_style_corpus_row(corpus_id)
    if not row:
        return {}
    full_text = (row.get("full_text") or "").strip()
    if not full_text:
        return {}

    context = (
        f"מספר החלטה: {row.get('decision_number') or ''}\n"
        f"תאריך: {row.get('decision_date') or ''}\n"
        f"תת-סוג נוכחי: {row.get('appeal_subtype') or ''}\n"
        f"נושאים מתויגים: {row.get('subject_categories') or ''}"
    )
    window = _build_text_window(full_text)
    user_msg = (
        f"## הקלט\n{context}\n\n"
        f"--- תחילת ההחלטה ---\n{window}\n--- סוף ההחלטה ---"
    )

    try:
        result = await claude_session.query_json(user_msg, system=METADATA_PROMPT)
    except Exception as e:
        logger.warning("style_metadata_extractor: query failed: %s", e)
        return {}

    if not isinstance(result, dict):
        logger.warning(
            "style_metadata_extractor: expected JSON object, got %s",
            type(result).__name__,
        )
        return {}

    # Copy over only the fields that came back with the expected type.
    out: dict = {}
    if isinstance(result.get("summary"), str):
        out["summary"] = result["summary"].strip()
    if isinstance(result.get("outcome"), str):
        out["outcome"] = result["outcome"].strip()
    kp = result.get("key_principles") or []
    if isinstance(kp, list):
        out["key_principles"] = [str(p).strip() for p in kp if str(p).strip()]
    if isinstance(result.get("appeal_subtype"), str):
        st = result["appeal_subtype"].strip()
        # Open enum, but log values outside the documented list so we can
        # tighten the prompt later if needed.
        known = {
            "building_permit", "betterment_levy", "compensation_197",
            "use_change", "tama_38", "",
        }
        if st not in known:
            logger.info("style_metadata: unknown appeal_subtype=%r (kept)", st)
        out["appeal_subtype"] = st
    if isinstance(result.get("practice_area"), str):
        out["practice_area"] = result["practice_area"].strip()

    # Parties: not stored in the schema today, but worth surfacing in the
    # extractor's return value so callers (and the UI's drawer) can display
    # them. The list endpoint extracts via regex; LLM output is the
    # higher-quality fallback when regex fails.
    if isinstance(result.get("parties_appellant"), str):
        out["parties_appellant"] = result["parties_appellant"].strip()
    if isinstance(result.get("parties_respondent"), str):
        out["parties_respondent"] = result["parties_respondent"].strip()

    return out


async def extract_and_apply(
    corpus_id: UUID | str, *, overwrite: bool = False,
) -> dict:
    """Convenience: extract → apply → return summary of what changed.

    Idempotent under default ``overwrite=False``: re-runs only fill empty
    fields. Use ``overwrite=True`` to refresh values the chair (or a prior
    extraction) already wrote.
    """
    if isinstance(corpus_id, str):
        corpus_id = UUID(corpus_id)
    suggested = await extract_decision_metadata(corpus_id)
    if not suggested:
        return {"extracted": False, "applied": False, "reason": "no suggestion"}
    update_result = await db.update_style_corpus_metadata(
        corpus_id,
        summary=suggested.get("summary"),
        outcome=suggested.get("outcome"),
        key_principles=suggested.get("key_principles"),
        appeal_subtype=suggested.get("appeal_subtype"),
        practice_area=suggested.get("practice_area"),
        overwrite=overwrite,
    )
    return {
        "extracted": True,
        "applied": update_result.get("updated", False),
        "fields_set": update_result.get("fields", []),
        "suggested": suggested,
    }
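
The fill-only-empty behaviour that `overwrite=False` promises can be sketched in isolation. This is a minimal illustration, not the project's implementation: `merge_metadata` and `_is_empty` are hypothetical helpers, the sketch assumes that "empty" means `None`, an empty string, or an empty list, and the real `db.update_style_corpus_metadata` is not shown in this diff and may behave differently.

```python
from typing import Any


def _is_empty(value: Any) -> bool:
    """Treat None, '' and [] as 'not yet filled' (an assumption of this sketch)."""
    return value is None or value == "" or value == []


def merge_metadata(
    existing: dict[str, Any],
    suggested: dict[str, Any],
    *,
    overwrite: bool = False,
) -> tuple[dict[str, Any], list[str]]:
    """Return (merged_row, fields_set) without mutating either input."""
    merged = dict(existing)
    fields_set: list[str] = []
    for field, value in suggested.items():
        if _is_empty(value):
            continue  # never replace a stored value with an empty suggestion
        if overwrite or _is_empty(existing.get(field)):
            if existing.get(field) != value:
                merged[field] = value
                fields_set.append(field)
    return merged, fields_set


row = {"summary": "chair-written summary", "outcome": "", "key_principles": []}
llm = {"summary": "LLM summary", "outcome": "דחייה", "key_principles": ["p1"]}

merged, changed = merge_metadata(row, llm)
print(changed)  # ['outcome', 'key_principles']
```

Running the merge a second time with the same inputs changes nothing, which is the sense in which the default mode is idempotent; passing `overwrite=True` would also replace the chair-written summary.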


@@ -0,0 +1,85 @@
"""MCP tool wrappers for the style_corpus metadata-enrichment flow.
The actual extractor lives in
``legal_mcp.services.style_metadata_extractor``; this module just exposes
it as MCP tools that the chair (or a future automation) can call from
Claude Code.
Why these tools matter: the upload pipeline (`/api/training/upload` →
`_process_proofread_training`) inserts a style_corpus row with
``summary=''``, ``outcome=''``, ``key_principles=[]`` because LLM
extraction can't run from the FastAPI container (no claude CLI there).
This module fills that gap — call it from the host, where ``claude``
CLI is available, and the row gets enriched.
"""
from __future__ import annotations
import json
from uuid import UUID
from legal_mcp.services import db, style_metadata_extractor
def _ok(payload) -> str:
    return json.dumps({"ok": True, **payload}, ensure_ascii=False, default=str)


def _err(msg: str) -> str:
    return json.dumps({"ok": False, "error": msg}, ensure_ascii=False)


async def extract_decision_metadata(corpus_id: str, overwrite: bool = False) -> str:
    """חילוץ מטא-דאטה (summary, outcome, key_principles, appeal_subtype) להחלטה בקורפוס הסגנון.

    ברירת מחדל ``overwrite=False`` ממלא רק שדות ריקים. הזן ``overwrite=true``
    כדי לרענן ערכים שכבר נכתבו.
    """
    try:
        cid = UUID(corpus_id)
    except ValueError:
        return _err("corpus_id לא תקין")
    try:
        result = await style_metadata_extractor.extract_and_apply(cid, overwrite=overwrite)
    except Exception as e:
        return _err(str(e))
    return _ok(result)
async def list_corpus_pending_enrichment(limit: int = 50) -> str:
    """רשימת רשומות style_corpus שחסר להן summary/outcome/key_principles - מועמדות להעשרה."""
    pool = await db.get_pool()
    async with pool.acquire() as conn:
        rows = await conn.fetch(
            """
            SELECT id, decision_number, decision_date,
                   length(full_text) AS chars,
                   coalesce(summary, '') = '' AS missing_summary,
                   coalesce(outcome, '') = '' AS missing_outcome,
                   coalesce(jsonb_array_length(key_principles), 0) = 0 AS missing_principles
            FROM style_corpus
            WHERE coalesce(summary, '') = ''
               OR coalesce(outcome, '') = ''
               OR coalesce(jsonb_array_length(key_principles), 0) = 0
            ORDER BY decision_date NULLS LAST
            LIMIT $1
            """,
            limit,
        )
    items = [
        {
            "corpus_id": str(r["id"]),
            "decision_number": r["decision_number"] or "",
            "decision_date": str(r["decision_date"]) if r["decision_date"] else "",
            "chars": r["chars"],
            "missing": [
                f for f, v in (
                    ("summary", r["missing_summary"]),
                    ("outcome", r["missing_outcome"]),
                    ("key_principles", r["missing_principles"]),
                ) if v
            ],
        }
        for r in rows
    ]
    return _ok({"count": len(items), "items": items})


@@ -35,6 +35,7 @@
| `compute_ndcg.py` | python | Computes nDCG@10 over `search_relevance_feedback` (TaskMaster #50, Stage C). Aggregates by `search_type` and by week, including top-cited case_law and coverage %. Flags: `--k 10`, `--weeks 12`, `--pretty`. Read-only, JSON output. Also used by `GET /api/admin/rag-metrics` (imported inline); changing the signature of `compute()` will break the endpoint | Manual / future cron for weekly reporting |
| `backfill_multimodal_precedents.py` | python | Backfills voyage-multimodal-3 page embeddings for `case_law` rows (external_upload + internal_committee) that lack `precedent_image_embeddings`. Builds a file index from `data/precedent-library/` and `data/internal-decisions/` and tries to match on case-number tokens (including a parts-match for the different Nevo doc-id formats). Skips rows with no source file or with MD only (PyMuPDF does not render MD). Supports `--dry-run` (default) / `--apply` / `--only external_upload\|internal_committee` / `--limit N`. Runs in the container (has `/data` + Voyage env). **Run on 2026-05-26**: 70 missing → 26 backfilled (503 pages, ~$0.21 voyage tokens), 44 with no source file. Can be re-run after more PDF/DOCX files are uploaded to the library | Manual |
| `monitor_halacha_quality.py` | python | Monitors halacha-extraction quality. Checks drift of `avg(confidence)` between a historical baseline and the most recent window. Returns metrics as JSON plus an alert on stderr if drift > threshold (default 5%). Two series: trusted (approved+published) and all_extracted. Supports `--window N` / `--threshold X` / `--min-sample N` / `--silent` / `--exit-on-alert`. Runs in the container or locally with `mcp-server/.venv` (no LLM dependency, SQL only). **Recommended schedule**: `0 8 * * 1` (weekly at 08:00; in standard cron, day-of-week 1 is Monday, so use `0 8 * * 0` if Sunday is intended) | `0 8 * * 1` (to be scheduled) |
| `audit_training_corpus.py` | python | Audit of `style_corpus`: for each decision, which metadata fields are populated (`summary`/`outcome`/`key_principles`/`appeal_subtype`/`subject_categories`) and its link to `documents` (FK + chunks + embeddings). Produces `data/audit/corpus-YYYY-MM-DD.json` plus a console summary. Requires `POSTGRES_URL` or the POSTGRES_* variables. No external dependencies other than asyncpg. **Runs from the local machine** (not the container), with a direct connection to Postgres on :5433 | Manual / prep work before metadata enrichment |
## `.archive/` directory: completed scripts

scripts/audit_training_corpus.py Executable file

@@ -0,0 +1,196 @@
#!/usr/bin/env python
"""Audit the style_corpus table — list each decision with what's populated and what's missing.
Produces a JSON report at data/audit/corpus-YYYY-MM-DD.json so we can see at a glance
which corpus entries lack summary/outcome/key_principles/appeal_subtype/chunks/embeddings.
Run with the mcp-server venv (has asyncpg):
POSTGRES_URL=postgres://... ./mcp-server/.venv/bin/python scripts/audit_training_corpus.py
Without POSTGRES_URL, falls back to the per-field env vars used by web/mcp-server config.
"""
from __future__ import annotations
import asyncio
import json
import os
import re
import sys
from datetime import UTC, date, datetime
from pathlib import Path
import asyncpg
def _build_dsn() -> str:
    if url := os.environ.get("POSTGRES_URL"):
        return url
    return (
        f"postgres://{os.environ.get('POSTGRES_USER', 'legal_ai')}:"
        f"{os.environ.get('POSTGRES_PASSWORD', '')}@"
        f"{os.environ.get('POSTGRES_HOST', '127.0.0.1')}:"
        f"{os.environ.get('POSTGRES_PORT', '5433')}/"
        f"{os.environ.get('POSTGRES_DB', 'legal_ai')}"
    )
async def audit() -> dict:
    dsn = _build_dsn()
    conn = await asyncpg.connect(dsn)
    try:
        rows = await conn.fetch(
            """
            SELECT id, decision_number, decision_date, subject_categories,
                   length(full_text) AS chars,
                   summary,
                   outcome,
                   key_principles,
                   practice_area,
                   appeal_subtype,
                   document_id,
                   created_at
            FROM style_corpus
            ORDER BY decision_date NULLS LAST, decision_number
            """
        )
        # Chunk + embedding counts for each related document: by direct FK first,
        # then by title-match for legacy rows where style_corpus.document_id is NULL.
        chunk_counts = await conn.fetch(
            """
            SELECT d.id AS doc_id, d.title,
                   count(c.id) AS chunks,
                   count(c.embedding) FILTER (WHERE c.embedding IS NOT NULL) AS chunks_with_emb
            FROM documents d
            LEFT JOIN document_chunks c ON c.document_id = d.id
            WHERE d.title LIKE '[קורפוס]%'
               OR d.id IN (SELECT document_id FROM style_corpus WHERE document_id IS NOT NULL)
            GROUP BY d.id, d.title
            """
        )
    finally:
        await conn.close()

    by_doc_id = {r["doc_id"]: r for r in chunk_counts}
    # Index corpus documents by every digit cluster in their title so we can
    # match against style_corpus.decision_number regardless of formatting
    # (e.g. style_corpus has "1109-25" but title may say "ARAR-25-1109" or
    # "ערר 1109-25"). Each digit run >=3 chars becomes a key.
    by_digit: dict[str, dict] = {}
    for r in chunk_counts:
        title = r["title"] or ""
        for tok in re.findall(r"\d{3,}", title):
            by_digit.setdefault(tok, r)

    decisions = []
    gaps_total = {
        "summary": 0, "outcome": 0, "key_principles": 0,
        "appeal_subtype": 0, "subject_categories": 0,
        "chunks": 0, "embeddings": 0, "document_id": 0,
    }
    for row in rows:
        cats = row["subject_categories"]
        if isinstance(cats, str):
            try:
                cats = json.loads(cats)
            except json.JSONDecodeError:
                cats = []
        cats = cats or []
        kp = row["key_principles"]
        if isinstance(kp, str):
            try:
                kp = json.loads(kp)
            except json.JSONDecodeError:
                kp = []
        kp = kp or []

        # Resolve chunks: prefer FK, fall back to digit-cluster match on decision_number.
        chunks = 0
        chunks_with_emb = 0
        if row["document_id"] and row["document_id"] in by_doc_id:
            r = by_doc_id[row["document_id"]]
            chunks = r["chunks"]
            chunks_with_emb = r["chunks_with_emb"]
        elif row["decision_number"]:
            for tok in re.findall(r"\d{3,}", row["decision_number"]):
                if tok in by_digit:
                    r = by_digit[tok]
                    chunks = r["chunks"]
                    chunks_with_emb = r["chunks_with_emb"]
                    break

        missing = []
        if not row["summary"]:
            missing.append("summary")
            gaps_total["summary"] += 1
        if not row["outcome"]:
            missing.append("outcome")
            gaps_total["outcome"] += 1
        if not kp:
            missing.append("key_principles")
            gaps_total["key_principles"] += 1
        if not row["appeal_subtype"]:
            missing.append("appeal_subtype")
            gaps_total["appeal_subtype"] += 1
        if not cats:
            missing.append("subject_categories")
            gaps_total["subject_categories"] += 1
        if chunks == 0:
            missing.append("chunks")
            gaps_total["chunks"] += 1
        elif chunks_with_emb < chunks:
            missing.append(f"embeddings({chunks_with_emb}/{chunks})")
            gaps_total["embeddings"] += 1
        if row["document_id"] is None:
            missing.append("document_id")
            gaps_total["document_id"] += 1

        decisions.append({
            "id": str(row["id"]),
            "decision_number": row["decision_number"] or "",
            "decision_date": row["decision_date"].isoformat() if row["decision_date"] else None,
            "chars": row["chars"],
            "subject_categories": cats,
            "practice_area": row["practice_area"] or "",
            "appeal_subtype": row["appeal_subtype"] or "",
            "summary_len": len(row["summary"] or ""),
            "outcome_len": len(row["outcome"] or ""),
            "key_principles_count": len(kp),
            "chunks": chunks,
            "chunks_with_embeddings": chunks_with_emb,
            "document_id": str(row["document_id"]) if row["document_id"] else None,
            "missing": missing,
            "created_at": row["created_at"].isoformat() if row["created_at"] else None,
        })

    return {
        "generated_at": datetime.now(UTC).isoformat(),
        "total_decisions": len(decisions),
        "gaps_total": gaps_total,
        "decisions": decisions,
    }
async def main() -> int:
    report = await audit()
    out_dir = Path(__file__).resolve().parents[1] / "data" / "audit"
    out_dir.mkdir(parents=True, exist_ok=True)
    today = date.today().isoformat()
    out_file = out_dir / f"corpus-{today}.json"
    out_file.write_text(json.dumps(report, ensure_ascii=False, indent=2), encoding="utf-8")

    # Console summary
    print(f"Total decisions: {report['total_decisions']}")
    print("Gaps by field (count of decisions missing it):")
    for field, n in report["gaps_total"].items():
        # One bar character per decision, capped at 60 columns.
        bar = "#" * min(n, 60)
        print(f" {field:25s} {n:3d} {bar}")
    print(f"\nReport written to {out_file}")
    return 0


if __name__ == "__main__":
    sys.exit(asyncio.run(main()))


@@ -0,0 +1,48 @@
/**
* pm2 ecosystem entry for legal-chat-service — the host-side SSE bridge
* to ``claude`` CLI that powers the /training chat tab.
*
* Why pm2:
* - Auto-restart if the process dies (claude CLI subprocess failures
* should never leave the service in a half-dead state).
* - Log rotation matches paperclip's behavior so the chair sees
* consistent log paths under ~/.pm2/logs/.
*
* Install (once):
* pm2 start /home/chaim/legal-ai/scripts/legal-chat-service.config.cjs
* pm2 save
*
* Smoke test:
* curl http://127.0.0.1:8770/health
* # → {"ok":true,"service":"legal-chat-service"}
*
* Update:
* pm2 restart legal-chat-service
*
* Stop:
* pm2 stop legal-chat-service
*/
module.exports = {
apps: [
{
name: "legal-chat-service",
cwd: "/home/chaim/legal-ai/mcp-server",
// Run the in-package server via the venv interpreter so all
// imports (claude_session, etc) resolve.
script: "/home/chaim/legal-ai/mcp-server/.venv/bin/python",
args: "-m legal_mcp.chat_service.server --port 8770",
// claude CLI looks up credentials under HOME, so point it at chaim's
// home directory (where the CLI session lives), not an empty container HOME.
env: {
HOME: "/home/chaim",
PATH: "/home/chaim/.local/bin:/usr/local/bin:/usr/bin:/bin",
PYTHONUNBUFFERED: "1",
},
restart_delay: 5000,
max_restarts: 10,
autorestart: true,
max_memory_restart: "500M",
},
],
};
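
The smoke test from the header comment can also be scripted. The sketch below is an assumption-laden illustration: `check_health` and `parse_health` are hypothetical helpers, and the only facts taken from the config are the port (8770), the `/health` path, and the `{"ok": true, ...}` response shape.

```python
import json
import urllib.request


def parse_health(body: bytes) -> bool:
    """True only for a JSON object whose 'ok' field is exactly true."""
    try:
        data = json.loads(body)
    except ValueError:
        return False
    return isinstance(data, dict) and data.get("ok") is True


def check_health(url: str = "http://127.0.0.1:8770/health", timeout: float = 3.0) -> bool:
    """Return True if the service answers the smoke-test URL with ok=true."""
    try:
        with urllib.request.urlopen(url, timeout=timeout) as resp:
            return resp.status == 200 and parse_health(resp.read())
    except OSError:  # URLError, connection refused, and timeouts are all OSErrors
        return False


if __name__ == "__main__":
    print("reachable" if check_health() else "unreachable")
```

Keeping the parsing separate from the network call makes the response check testable without a running service.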


@@ -1,18 +1,27 @@
"use client"; "use client";
import { useState } from "react";
import Link from "next/link";
import { Upload } from "lucide-react";
import { AppShell } from "@/components/app-shell";
import { Button } from "@/components/ui/button";
import { Card, CardContent } from "@/components/ui/card";
import { Tabs, TabsContent, TabsList, TabsTrigger } from "@/components/ui/tabs";
import { StyleReportPanel } from "@/components/training/style-report-panel";
import { CorpusPanel } from "@/components/training/corpus-panel";
import { ComparePanel } from "@/components/training/compare-panel";
import { CuratorPortraitPanel } from "@/components/training/curator-portrait-panel";
import { ChatPanel } from "@/components/training/chat-panel";
import { TrainingUploadDialog } from "@/components/training/upload-dialog";

export default function TrainingPage() {
  const [uploadOpen, setUploadOpen] = useState(false);

  return (
    <AppShell>
      <section className="space-y-6">
        <header className="flex items-start justify-between gap-4 flex-wrap">
          <div>
            <nav className="text-[0.78rem] text-ink-muted mb-1">
              <Link href="/" className="hover:text-gold-deep">בית</Link>
              <span aria-hidden> · </span>
@@ -23,8 +32,18 @@ export default function TrainingPage() {
              לוח בקרה של קורפוס האימון סטטיסטיקות, אנטומיית החלטה ממוצעת,
              ביטויי חתימה, וכלי השוואה בין שתי החלטות.
            </p>
          </div>
          <Button
            onClick={() => setUploadOpen(true)}
            className="bg-navy text-parchment hover:bg-navy-soft shrink-0"
          >
            <Upload className="w-4 h-4 me-1" />
            העלה החלטה
          </Button>
        </header>
        <TrainingUploadDialog open={uploadOpen} onOpenChange={setUploadOpen} />
        <div className="h-[2px] bg-gradient-to-l from-transparent via-gold to-transparent" />
        <Card className="bg-surface border-rule shadow-sm">
@@ -34,6 +53,8 @@ export default function TrainingPage() {
                <TabsTrigger value="report">פורטרט סגנון</TabsTrigger>
                <TabsTrigger value="corpus">קורפוס</TabsTrigger>
                <TabsTrigger value="compare">השוואה</TabsTrigger>
                <TabsTrigger value="curator">הסוכן</TabsTrigger>
                <TabsTrigger value="chat">שיחה</TabsTrigger>
              </TabsList>
              <TabsContent value="report" className="mt-5">
@@ -47,6 +68,14 @@ export default function TrainingPage() {
              <TabsContent value="compare" className="mt-5">
                <ComparePanel />
              </TabsContent>
              <TabsContent value="curator" className="mt-5">
                <CuratorPortraitPanel />
              </TabsContent>
              <TabsContent value="chat" className="mt-5">
                <ChatPanel />
              </TabsContent>
            </Tabs>
          </CardContent>
        </Card>


@@ -0,0 +1,434 @@
"use client";
/*
* Style-agent chat panel — the new "שיחה" tab on /training.
*
* Layout: two columns.
* - Sidebar: list of conversations + "+ שיחה חדשה" button
* - Main: thread of messages + composer with SSE streaming
*
* Each message is persisted to the legal-ai DB; the LLM call goes
* out via FastAPI → host's legal-chat-service → claude CLI. There
* is no API cost: the claude CLI uses chaim's claude.ai
* subscription via the host's auth.
*
* Health gate: if /api/training/chat/health reports the host service
* is unreachable, a warning card above the thread tells the chair to
* start the pm2 service. The composer itself stays enabled.
*/
import { useEffect, useRef, useState } from "react";
import {
Send, Plus, Trash2, Loader2, MessageSquare, Sparkles, AlertTriangle,
} from "lucide-react";
import { toast } from "sonner";
import { Card, CardContent } from "@/components/ui/card";
import { Button } from "@/components/ui/button";
import { Textarea } from "@/components/ui/textarea";
import { ScrollArea } from "@/components/ui/scroll-area";
import { Badge } from "@/components/ui/badge";
import { Skeleton } from "@/components/ui/skeleton";
import {
Select, SelectContent, SelectItem, SelectTrigger, SelectValue,
} from "@/components/ui/select";
import {
chatKeys,
useChatConversation,
useChatConversations,
useChatHealth,
useCorpus,
useCreateChat,
useDeleteChat,
type ChatMessage,
} from "@/lib/api/training";
import { useQueryClient } from "@tanstack/react-query";
export function ChatPanel() {
const [activeId, setActiveId] = useState<string | null>(null);
const health = useChatHealth();
return (
<div className="grid gap-4 lg:grid-cols-[280px_1fr]">
<ConversationsSidebar activeId={activeId} onSelect={setActiveId} />
<div className="space-y-3">
{health.data && !health.data.reachable && (
<ChatServiceWarning health={health.data} />
)}
{activeId ? (
<ChatThread convId={activeId} />
) : (
<Card className="bg-rule-soft/40 border-rule">
<CardContent className="px-6 py-10 text-center text-ink-muted text-sm space-y-2">
<MessageSquare className="w-8 h-8 mx-auto opacity-50" />
<p>בחר שיחה קיימת או פתח חדשה כדי להתחיל לדבר עם סוכן הסגנון.</p>
<p className="text-[0.78rem]">
הסוכן רץ על claude CLI מקומי דרך legal-chat-service. אין עלות API.
</p>
</CardContent>
</Card>
)}
</div>
</div>
);
}
// ── Sidebar: list + new ────────────────────────────────────────────
function ConversationsSidebar({
activeId, onSelect,
}: {
activeId: string | null;
onSelect: (id: string | null) => void;
}) {
const { data: convs, isPending } = useChatConversations();
const { data: corpus } = useCorpus();
const create = useCreateChat();
const del = useDeleteChat();
const [creating, setCreating] = useState(false);
const [newTitle, setNewTitle] = useState("");
const [newCorpusId, setNewCorpusId] = useState<string>("__none__");
const onCreate = async () => {
try {
const conv = await create.mutateAsync({
title: newTitle.trim() || "שיחה חדשה",
style_corpus_id: newCorpusId === "__none__" ? null : newCorpusId,
});
onSelect(conv.id);
setCreating(false);
setNewTitle("");
setNewCorpusId("__none__");
} catch (e) {
toast.error(e instanceof Error ? e.message : "כשל ביצירת שיחה");
}
};
const onDelete = async (id: string) => {
if (!window.confirm("למחוק את השיחה? פעולה זו לא ניתנת לביטול.")) return;
try {
await del.mutateAsync(id);
if (activeId === id) onSelect(null);
toast.success("השיחה נמחקה");
} catch (e) {
toast.error(e instanceof Error ? e.message : "כשל במחיקה");
}
};
return (
<Card className="bg-surface border-rule">
<CardContent className="px-3 py-3 space-y-2">
{!creating ? (
<Button
onClick={() => setCreating(true)}
className="w-full bg-navy text-parchment hover:bg-navy-soft"
size="sm"
>
<Plus className="w-4 h-4 me-1" />
שיחה חדשה
</Button>
) : (
<div className="space-y-2 border border-rule rounded p-2 bg-rule-soft/30">
<Textarea
value={newTitle}
onChange={(e) => setNewTitle(e.target.value)}
placeholder="כותרת לשיחה (אופציונלי)"
rows={2} dir="rtl"
/>
<Select value={newCorpusId} onValueChange={setNewCorpusId} dir="rtl">
<SelectTrigger>
<SelectValue placeholder="צמד להחלטה (אופציונלי)" />
</SelectTrigger>
<SelectContent className="max-h-[300px]">
<SelectItem value="__none__"> שיחה כללית </SelectItem>
{corpus?.map((c) => (
<SelectItem key={c.id} value={c.id}>
{c.decision_number || "—"}
{c.decision_date ? ` · ${c.decision_date}` : ""}
</SelectItem>
))}
</SelectContent>
</Select>
<div className="flex gap-1 justify-end">
<Button variant="ghost" size="sm"
onClick={() => { setCreating(false); setNewTitle(""); setNewCorpusId("__none__"); }}>
ביטול
</Button>
<Button size="sm" onClick={onCreate} disabled={create.isPending}
className="bg-navy text-parchment hover:bg-navy-soft">
צור
</Button>
</div>
</div>
)}
<ScrollArea className="h-[520px]">
<ul className="space-y-1">
{isPending && (
<>
<Skeleton className="h-12 w-full" />
<Skeleton className="h-12 w-full" />
</>
)}
{convs?.length === 0 && (
<p className="text-center text-ink-muted text-[0.78rem] py-6">
אין עדיין שיחות
</p>
)}
{convs?.map((c) => {
const active = c.id === activeId;
return (
<li key={c.id}>
{/* The row is a div with role="button" rather than a <button>:
it contains the delete <button>, and buttons must not be nested. */}
<div
role="button"
tabIndex={0}
onClick={() => onSelect(c.id)}
onKeyDown={(e) => {
if (e.key === "Enter" || e.key === " ") {
e.preventDefault();
onSelect(c.id);
}
}}
className={
"w-full text-end rounded-md px-2 py-2 transition cursor-pointer " +
(active
? "bg-gold-wash border border-gold/40"
: "hover:bg-rule-soft/60 border border-transparent")
}
>
<div className="text-sm text-navy font-semibold truncate">
{c.title}
</div>
<div className="flex items-center gap-1 text-[0.7rem] text-ink-muted">
{c.decision_number && (
<Badge variant="outline"
className="text-[0.65rem] bg-info-bg text-info border-info/40">
{c.decision_number}
</Badge>
)}
<span className="tabular-nums">{c.message_count}</span>
<MessageSquare className="w-3 h-3" />
<span className="grow text-end">
{new Date(c.last_message_at).toLocaleDateString("he-IL")}
</span>
<button
type="button"
onClick={(e) => { e.stopPropagation(); onDelete(c.id); }}
// Keep Enter/Space on the delete button from also selecting the row.
onKeyDown={(e) => e.stopPropagation()}
className="hover:text-danger"
aria-label="מחק שיחה"
>
<Trash2 className="w-3 h-3" />
</button>
</div>
</div>
</li>
);
})}
</ul>
</ScrollArea>
</CardContent>
</Card>
);
}
// ── Thread + composer ──────────────────────────────────────────────
function ChatThread({ convId }: { convId: string }) {
const { data, isPending } = useChatConversation(convId);
const qc = useQueryClient();
const [draft, setDraft] = useState("");
const [streaming, setStreaming] = useState(false);
const [streamingText, setStreamingText] = useState("");
const [streamError, setStreamError] = useState("");
const scrollRef = useRef<HTMLDivElement | null>(null);
/* Auto-scroll to bottom when new messages arrive. */
useEffect(() => {
const el = scrollRef.current;
if (!el) return;
el.scrollTo({ top: el.scrollHeight, behavior: "smooth" });
}, [data?.messages.length, streamingText]);
const onSend = async () => {
const text = draft.trim();
if (!text || streaming) return;
setDraft("");
setStreaming(true);
setStreamingText("");
setStreamError("");
try {
const res = await fetch(
`/api/training/chat/conversations/${encodeURIComponent(convId)}/messages`,
{
method: "POST",
headers: { "Content-Type": "application/json" },
body: JSON.stringify({ content: text }),
},
);
if (!res.ok || !res.body) {
const body = await res.text();
throw new Error(`HTTP ${res.status}: ${body.slice(0, 200)}`);
}
// Parse SSE line-by-line. EventSource would be cleaner but it
// doesn't support POST bodies; the manual reader is small.
const reader = res.body.getReader();
const decoder = new TextDecoder();
let buffer = "";
let accumulated = "";
while (true) {
const { value, done } = await reader.read();
if (done) break;
buffer += decoder.decode(value, { stream: true });
let nl: number;
while ((nl = buffer.indexOf("\n\n")) !== -1) {
const event = buffer.slice(0, nl);
buffer = buffer.slice(nl + 2);
if (!event.startsWith("data: ")) continue;
try {
const payload = JSON.parse(event.slice("data: ".length));
if (payload.type === "text_delta" && payload.text) {
accumulated += payload.text;
setStreamingText(accumulated);
} else if (payload.type === "error") {
setStreamError(String(payload.message || "שגיאה לא ידועה"));
} else if (payload.type === "done") {
if (payload.text && !accumulated) {
accumulated = payload.text;
setStreamingText(accumulated);
}
}
} catch {
/* ignore non-JSON */
}
}
}
} catch (e) {
setStreamError(e instanceof Error ? e.message : "שגיאה בשיחה");
} finally {
setStreaming(false);
setStreamingText("");
// Refetch the conversation so the persisted assistant turn shows up.
qc.invalidateQueries({ queryKey: chatKeys.conversation(convId) });
qc.invalidateQueries({ queryKey: chatKeys.conversations() });
}
};
if (isPending) return <Skeleton className="h-[560px] w-full" />;
if (!data) return null;
return (
<Card className="bg-surface border-rule">
<CardContent className="px-4 py-3 space-y-3">
<header className="flex items-center gap-2 border-b border-rule pb-2">
<Sparkles className="w-4 h-4 text-gold-deep" />
<h3 className="text-navy font-semibold grow">{data.conversation.title}</h3>
{data.conversation.decision_number && (
<Badge variant="outline" className="bg-info-bg text-info border-info/40">
{data.conversation.decision_number}
</Badge>
)}
</header>
<div ref={scrollRef} className="h-[440px] overflow-y-auto space-y-3 pe-1">
{data.messages.length === 0 && !streaming && (
<p className="text-center text-ink-muted text-sm py-8">
התחל בשאלה למשל: &quot;מה מאפיין את הפתיחות של דפנה בעררי 1xxx?&quot;
</p>
)}
{data.messages.map((m) => <MessageBubble key={m.id} message={m} />)}
{streaming && (
<MessageBubble
message={{
id: "streaming",
role: "assistant",
content: streamingText || "(מקליד…)",
created_at: "",
}}
isStreaming
/>
)}
{streamError && (
<div className="rounded-lg border border-danger/40 bg-danger-bg p-3 text-danger text-sm">
{streamError}
</div>
)}
</div>
<div className="border-t border-rule pt-3 space-y-2">
<Textarea
value={draft}
onChange={(e) => setDraft(e.target.value)}
placeholder="שאל את הסוכן… (Shift+Enter לשורה חדשה)"
rows={3} dir="rtl"
disabled={streaming}
onKeyDown={(e) => {
if (e.key === "Enter" && !e.shiftKey) {
e.preventDefault();
void onSend();
}
}}
/>
<div className="flex items-center gap-2">
<p className="text-[0.72rem] text-ink-muted grow">
{data.conversation.claude_session_id
? "שיחה ממשיכה (--resume) — אין צורך לטעון מחדש את ה-system prompt"
: "שיחה חדשה — system prompt ייטען (שני מסמכי ייחוס + רשימת קורפוס)"}
</p>
<Button onClick={onSend} disabled={streaming || !draft.trim()}
className="bg-navy text-parchment hover:bg-navy-soft">
{streaming ? (
<Loader2 className="w-4 h-4 animate-spin me-1" />
) : (
<Send className="w-4 h-4 me-1" />
)}
שלח
</Button>
</div>
</div>
</CardContent>
</Card>
);
}
function MessageBubble({
message, isStreaming = false,
}: { message: ChatMessage; isStreaming?: boolean }) {
const isUser = message.role === "user";
return (
<div className={isUser ? "flex justify-start" : "flex justify-end"}>
<div
className={
"max-w-[85%] rounded-lg px-3 py-2 text-sm leading-relaxed whitespace-pre-wrap " +
(isUser
? "bg-gold-wash text-ink border border-gold/40"
: "bg-rule-soft text-ink border border-rule")
}
dir="rtl"
>
{message.content}
{isStreaming && (
<span className="inline-block w-1.5 h-3.5 bg-navy/60 align-middle ms-1 animate-pulse" />
)}
</div>
</div>
);
}
// ── Service-down warning ──────────────────────────────────────────
function ChatServiceWarning({
health,
}: { health: { reachable: boolean; url: string; error?: string } }) {
return (
<Card className="bg-danger-bg border-danger/40">
<CardContent className="px-4 py-3 space-y-1">
<div className="flex items-center gap-2 text-danger">
<AlertTriangle className="w-4 h-4" />
<strong>שירות הצ&apos;אט אינו זמין</strong>
</div>
<p className="text-[0.78rem] text-danger">
לא ניתן להגיע ל-legal-chat-service בכתובת
<code className="px-1 mx-1 bg-rule-soft rounded">{health.url}</code>.
{health.error && (<> פירוט: <code className="px-1 bg-rule-soft rounded">{health.error}</code></>)}
</p>
<p className="text-[0.72rem] text-ink-muted">
על המכונה המקומית הפעל:&nbsp;
<code className="px-1 bg-rule-soft rounded">
pm2 start /home/chaim/legal-ai/scripts/legal-chat-service.config.cjs
</code>
</p>
</CardContent>
</Card>
);
}


@@ -0,0 +1,402 @@
"use client";
/*
* Side-drawer for inspecting + editing a single style_corpus entry.
*
* Tabs:
* - "פרטים" (details): show + edit the enriched metadata (decision_number,
* date, subjects, summary, outcome, key_principles, appeal_subtype,
* practice_area). Saving issues a PATCH /api/training/corpus/{id} with
* only the changed fields and invalidates the list.
* - "תוכן" (content): read-only full_text view, fetched on demand when the
* tab opens. We never let the chair edit full_text from the UI;
* corrections happen by re-uploading via the Upload dialog.
* - "מה למדנו" (lessons): per-decision lessons, rendered by LessonsTab.
* - "דפוסים" (patterns): style_patterns; currently corpus-wide, with
* filtering by appeal_subtype still to come.
*
* Why a Sheet, not a Dialog: the drawer needs to coexist with the corpus
* table so the chair can scan multiple decisions without losing context.
* Sheet (side: "left" in RTL = right edge in LTR) gives that without
* stealing the entire viewport.
*/
import { useEffect, useState } from "react";
import { Save, FileText, Tag, Calendar, BookOpen, Loader2 } from "lucide-react";
import { toast } from "sonner";
import {
Sheet, SheetContent, SheetHeader, SheetTitle, SheetDescription,
} from "@/components/ui/sheet";
import { Tabs, TabsContent, TabsList, TabsTrigger } from "@/components/ui/tabs";
import { Card, CardContent } from "@/components/ui/card";
import { Button } from "@/components/ui/button";
import { Input } from "@/components/ui/input";
import { Label } from "@/components/ui/label";
import { Textarea } from "@/components/ui/textarea";
import { Badge } from "@/components/ui/badge";
import { ScrollArea } from "@/components/ui/scroll-area";
import {
usePatchCorpus,
type CorpusDecision,
type CorpusDecisionPatch,
} from "@/lib/api/training";
import { LessonsTab } from "./lessons-tab";
type Props = {
decision: CorpusDecision | null;
onOpenChange: (open: boolean) => void;
};
export function CorpusDetailDrawer({ decision, onOpenChange }: Props) {
// Local editable state for the "details" tab. Re-seeds whenever the
// selected decision changes so the form reflects the row the chair
// clicked.
const [draft, setDraft] = useState<CorpusDecisionPatch>({});
const patch = usePatchCorpus();
/* eslint-disable react-hooks/set-state-in-effect */
useEffect(() => {
if (!decision) {
setDraft({});
return;
}
setDraft({
decision_number: decision.decision_number,
decision_date: decision.decision_date,
subject_categories: decision.subject_categories,
summary: decision.summary,
outcome: decision.outcome,
key_principles: decision.key_principles,
appeal_subtype: decision.appeal_subtype,
practice_area: decision.practice_area,
});
}, [decision]);
/* eslint-enable react-hooks/set-state-in-effect */
const open = decision !== null;
if (!decision) return null;
// Diff against the originally loaded row — only PATCH fields the chair
// actually changed, so concurrent edits to other fields stay intact.
const diff: CorpusDecisionPatch = {};
if (draft.decision_number !== decision.decision_number)
diff.decision_number = draft.decision_number;
if (draft.decision_date !== decision.decision_date)
diff.decision_date = draft.decision_date;
if (draft.summary !== decision.summary)
diff.summary = draft.summary;
if (draft.outcome !== decision.outcome)
diff.outcome = draft.outcome;
if (draft.appeal_subtype !== decision.appeal_subtype)
diff.appeal_subtype = draft.appeal_subtype;
if (draft.practice_area !== decision.practice_area)
diff.practice_area = draft.practice_area;
if (
JSON.stringify(draft.subject_categories) !==
JSON.stringify(decision.subject_categories)
)
diff.subject_categories = draft.subject_categories;
if (
JSON.stringify(draft.key_principles) !==
JSON.stringify(decision.key_principles)
)
diff.key_principles = draft.key_principles;
const isDirty = Object.keys(diff).length > 0;
const onSave = async () => {
if (!isDirty) return;
try {
await patch.mutateAsync({ id: decision.id, patch: diff });
toast.success("המטא-דאטה עודכן");
} catch (e) {
toast.error(e instanceof Error ? e.message : "כשל בשמירה");
}
};
const setSubjects = (raw: string) =>
setDraft((d) => ({
...d,
subject_categories: raw.split(/[,،]/).map((s) => s.trim()).filter(Boolean),
}));
const setPrinciples = (raw: string) =>
setDraft((d) => ({
...d,
key_principles: raw.split("\n").map((s) => s.trim()).filter(Boolean),
}));
return (
<Sheet open={open} onOpenChange={onOpenChange}>
<SheetContent side="left" className="w-full sm:max-w-3xl overflow-y-auto" dir="rtl">
<SheetHeader>
<SheetTitle className="text-navy flex items-center gap-2">
<BookOpen className="w-4 h-4 shrink-0" />
{decision.legal_citation || decision.decision_number || "—"}
</SheetTitle>
<SheetDescription className="text-ink-muted">
{decision.doc_title || "החלטה בקורפוס הסגנוני"}
</SheetDescription>
</SheetHeader>
{/* Summary strip — fast-scan info, always visible above the tabs. */}
<div className="px-6 mt-3 grid grid-cols-2 md:grid-cols-4 gap-3 text-[0.78rem]">
<DataPoint icon={<Calendar className="w-3 h-3" />} label="תאריך"
value={decision.decision_date || "—"} />
<DataPoint icon={<FileText className="w-3 h-3" />} label="תווים"
value={`${(decision.chars / 1000).toFixed(1)}K`} />
<DataPoint icon={<FileText className="w-3 h-3" />} label="עמודים"
value={decision.page_count > 0 ? String(decision.page_count) : "—"} />
<DataPoint icon={<Tag className="w-3 h-3" />} label="תת-סוג"
value={decision.appeal_subtype || "—"} />
</div>
<div className="px-6 pb-6 mt-4">
<Tabs defaultValue="details" dir="rtl">
<TabsList className="bg-rule-soft/60">
<TabsTrigger value="details">פרטים</TabsTrigger>
<TabsTrigger value="content">תוכן</TabsTrigger>
<TabsTrigger value="lessons">מה למדנו</TabsTrigger>
<TabsTrigger value="patterns">דפוסים</TabsTrigger>
</TabsList>
{/* ── Tab: editable metadata ─────────────────────────── */}
<TabsContent value="details" className="mt-4 space-y-4">
<div className="grid grid-cols-2 gap-3">
<Field label="מספר ההחלטה">
<Input value={draft.decision_number ?? ""}
onChange={(e) => setDraft((d) => ({ ...d, decision_number: e.target.value }))}
dir="rtl" />
</Field>
<Field label="תאריך">
<Input type="date" value={draft.decision_date ?? ""}
onChange={(e) => setDraft((d) => ({ ...d, decision_date: e.target.value }))} />
</Field>
</div>
<Field label="נושאים (מופרדים בפסיקים)">
<Input value={(draft.subject_categories ?? []).join(", ")}
onChange={(e) => setSubjects(e.target.value)} dir="rtl" />
{decision.subject_categories.length > 0 && (
<div className="flex flex-wrap gap-1 mt-1">
{decision.subject_categories.map((s) => (
<Badge key={s} variant="outline"
className="text-[0.7rem] bg-gold-wash text-gold-deep border-gold/40">
{s}
</Badge>
))}
</div>
)}
</Field>
<div className="grid grid-cols-2 gap-3">
<Field label="תת-סוג ערר">
<Input value={draft.appeal_subtype ?? ""}
onChange={(e) => setDraft((d) => ({ ...d, appeal_subtype: e.target.value }))}
placeholder="building_permit / betterment_levy / compensation_197"
dir="rtl" />
</Field>
<Field label="תחום משפט">
<Input value={draft.practice_area ?? ""}
onChange={(e) => setDraft((d) => ({ ...d, practice_area: e.target.value }))}
dir="rtl" />
</Field>
</div>
<Field label="תקציר (summary)">
<Textarea value={draft.summary ?? ""} rows={3}
onChange={(e) => setDraft((d) => ({ ...d, summary: e.target.value }))}
placeholder="תקציר חופשי — מי, מה, איך הוכרע"
dir="rtl" />
</Field>
<Field label="התוצאה (outcome)">
<Textarea value={draft.outcome ?? ""} rows={2}
onChange={(e) => setDraft((d) => ({ ...d, outcome: e.target.value }))}
placeholder="קבלה / קבלה חלקית / דחייה — בקצרה"
dir="rtl" />
</Field>
<Field label="עקרונות מרכזיים (שורה לכל אחד)">
<Textarea value={(draft.key_principles ?? []).join("\n")} rows={4}
onChange={(e) => setPrinciples(e.target.value)}
placeholder={"דוגמה:\nשיקול דעת מוגבל לחריגות קטנות\nריפוי פגם רק בנסיבות חריגות"}
dir="rtl" />
</Field>
{decision.parties.appellant && (
<Card className="bg-rule-soft/40 border-rule">
<CardContent className="px-4 py-3 text-[0.78rem] text-ink-soft">
<p><strong className="text-navy">עורר/ת:</strong> {decision.parties.appellant}</p>
{decision.parties.respondent && (
<p className="mt-1"><strong className="text-navy">משיב/ה:</strong> {decision.parties.respondent}</p>
)}
<p className="mt-2 text-ink-muted text-[0.72rem]">
(חולץ אוטומטית מתחילת הטקסט; תקן ע&quot;י עריכת ה-full_text במקור.)
</p>
</CardContent>
</Card>
)}
<div className="flex items-center justify-end gap-2 pt-2 border-t border-rule">
<Button variant="ghost" onClick={() => onOpenChange(false)}>
סגור
</Button>
<Button onClick={onSave} disabled={!isDirty || patch.isPending}
className="bg-navy text-parchment hover:bg-navy-soft">
{patch.isPending ? (
<Loader2 className="w-4 h-4 animate-spin me-1" />
) : (
<Save className="w-4 h-4 me-1" />
)}
שמור שינויים
</Button>
</div>
</TabsContent>
{/* ── Tab: full_text (read-only) ─────────────────────── */}
<TabsContent value="content" className="mt-4">
<Card className="bg-surface border-rule">
<CardContent className="px-4 py-3">
<p className="text-[0.72rem] text-ink-muted mb-2">
{decision.chars.toLocaleString("he-IL")} תווים · קריאה בלבד
</p>
<ScrollArea className="h-[480px] pe-2">
<p className="text-sm text-ink leading-relaxed whitespace-pre-wrap">
<FullTextLazy id={decision.id} />
</p>
</ScrollArea>
</CardContent>
</Card>
</TabsContent>
{/* ── Tab: lessons (per-decision) ────────────────────── */}
<TabsContent value="lessons" className="mt-4">
<LessonsTab corpusId={decision.id} />
</TabsContent>
{/* ── Tab: patterns scoped by appeal_subtype ─────────── */}
<TabsContent value="patterns" className="mt-4">
<PatternsForSubtype subtype={decision.appeal_subtype} />
</TabsContent>
</Tabs>
</div>
</SheetContent>
</Sheet>
);
}
// ── helpers ────────────────────────────────────────────────────────
function DataPoint({
icon, label, value,
}: { icon: React.ReactNode; label: string; value: string }) {
return (
<div className="flex items-center gap-1 text-ink-muted">
{icon}
<span>{label}:</span>
<span className="font-semibold text-navy tabular-nums truncate">{value}</span>
</div>
);
}
function Field({
label, children,
}: { label: string; children: React.ReactNode }) {
return (
<div className="space-y-1">
<Label className="text-[0.78rem]">{label}</Label>
{children}
</div>
);
}
/* The corpus-list endpoint deliberately doesn't return full_text (too big).
* We fetch it on demand only when the content tab opens.
*
* Implementation note: we don't have a dedicated /api/training/corpus/{id}
* GET endpoint yet. As a thin stopgap we hit a planned `/full-text` shortcut
* with a plain fetch; if the endpoint isn't deployed yet, the request fails
* and the UI shows the fallback message instead of crashing. The full-text
* endpoint lands with the next backend deploy.
*/
function FullTextLazy({ id }: { id: string }) {
const [text, setText] = useState<string>("");
const [loading, setLoading] = useState(true);
const [error, setError] = useState("");
/* eslint-disable react-hooks/set-state-in-effect */
useEffect(() => {
let cancelled = false;
setLoading(true);
setError("");
fetch(`/api/training/corpus/${encodeURIComponent(id)}/full-text`)
.then((r) => (r.ok ? r.json() : Promise.reject(new Error(`HTTP ${r.status}`))))
.then((d: { full_text: string }) => {
if (cancelled) return;
setText(d.full_text || "");
})
.catch((e: Error) => {
if (cancelled) return;
setError(e.message);
})
.finally(() => !cancelled && setLoading(false));
return () => { cancelled = true; };
}, [id]);
/* eslint-enable react-hooks/set-state-in-effect */
if (loading) return <span className="text-ink-muted">טוען</span>;
if (error) return <span className="text-ink-muted">לא נמצא ({error})</span>;
return text;
}
function PatternsForSubtype({ subtype }: { subtype: string }) {
// Filtered patterns endpoint isn't built yet, so we fall back to /patterns
// and show corpus-wide patterns without filtering by subtype (first 4
// pattern types, up to 6 patterns each). The result is mediocre when many
// subtypes share patterns; proper filtering ships in the
// metadata-enrichment iteration.
const [data, setData] = useState<Record<string, { pattern_text: string; frequency: number }[]> | null>(null);
const [loading, setLoading] = useState(true);
useEffect(() => {
let cancelled = false;
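fetch("/api/training/patterns")
// Treat non-2xx responses as failures so an error body isn't read as data.
.then((r) => (r.ok ? r.json() : Promise.reject(new Error(`HTTP ${r.status}`))))
.then((d: { by_type: Record<string, { pattern_text: string; frequency: number }[]> }) => {
if (!cancelled) setData(d.by_type ?? {});
})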
.catch(() => !cancelled && setData({}))
.finally(() => !cancelled && setLoading(false));
return () => { cancelled = true; };
}, []);
if (loading) return <p className="text-ink-muted text-sm text-center py-6">טוען</p>;
if (!data || Object.keys(data).length === 0) {
return <p className="text-ink-muted text-sm text-center py-6">אין דפוסים שמורים. הרץ ניתוח סגנון.</p>;
}
return (
<div className="space-y-3">
{subtype && (
<p className="text-[0.78rem] text-ink-muted">
דפוסים בכלל הקורפוס. סינון לפי תת-סוג {subtype} ייושם בעדכון הבא.
</p>
)}
{Object.entries(data).slice(0, 4).map(([type, items]) => (
<Card key={type} className="bg-surface border-rule">
<CardContent className="px-4 py-3">
<h4 className="text-[0.78rem] uppercase tracking-wider text-gold-deep font-semibold mb-2">
{type}
</h4>
<ul className="space-y-1 text-sm text-ink">
{items.slice(0, 6).map((p, i) => (
<li key={i} className="flex items-start gap-2">
<span className="text-[0.72rem] tabular-nums text-ink-muted shrink-0 mt-0.5">
×{p.frequency}
</span>
<span>{p.pattern_text}</span>
</li>
))}
</ul>
</CardContent>
</Card>
))}
</div>
);
}
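
The drawer above sends only the fields the chair actually changed. The comparison it performs field by field can be sketched as a small generic helper. This is an illustration under assumptions, not code from the commit: `diffPatch` and the sample objects are hypothetical, and arrays are compared with `JSON.stringify` simply because the component does the same.

```typescript
// Hypothetical helper (not part of the commit): return only the keys of
// `draft` whose values differ from `original`.
type Fields = Record<string, unknown>;

function diffPatch(original: Fields, draft: Fields): Fields {
  const out: Fields = {};
  for (const key of Object.keys(draft)) {
    const before = original[key];
    const after = draft[key];
    // Arrays are compared by serialized value, scalars by identity,
    // mirroring the per-field checks in the drawer.
    const same =
      Array.isArray(before) || Array.isArray(after)
        ? JSON.stringify(before) === JSON.stringify(after)
        : before === after;
    if (!same) out[key] = after;
  }
  return out;
}

// Sample values are made up for the illustration.
const original = { summary: "old", outcome: "rejected", key_principles: ["a", "b"] };
const draft = { summary: "new", outcome: "rejected", key_principles: ["a", "b"] };
const patch = diffPatch(original, draft);
console.log(Object.keys(patch)); // only "summary" differs
```

An empty result corresponds to the `isDirty === false` case, where the save button stays disabled and no request is sent.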


@@ -1,6 +1,7 @@
 "use client";
-import { Trash2 } from "lucide-react";
+import { useState } from "react";
+import { Trash2, Sparkles } from "lucide-react";
 import { toast } from "sonner";
 import {
 Table, TableBody, TableCell, TableHead, TableHeader, TableRow,
@@ -9,12 +10,20 @@ import { Button } from "@/components/ui/button";
 import { Badge } from "@/components/ui/badge";
 import { Skeleton } from "@/components/ui/skeleton";
 import { useCorpus, useDeleteCorpusEntry, type CorpusDecision } from "@/lib/api/training";
+import { CorpusDetailDrawer } from "./corpus-detail-drawer";
 /*
-* Corpus tab: table of all decisions currently in the style corpus, with a
-* single destructive action (remove from corpus). Uses browser confirm() for
-* the confirmation — a full shadcn AlertDialog would be overkill for an
-* admin-only destructive action with a server-side safety net.
+* Corpus tab: table of all decisions currently in the style corpus.
+*
+* Click any row → opens CorpusDetailDrawer with the enriched metadata
+* + edit UI. The trash button is now in its own narrow column and uses
+* stopPropagation so deleting a row doesn't also open the drawer.
+*
+* We use browser confirm() for the destructive action rather than a
+* full shadcn AlertDialog because this is a single admin operation
+* gated by an API-level safety net (FK cascade is best-effort but
+* style_corpus DELETE returns 404 on missing rows, so the worst case
+* is a no-op).
 */
 function formatChars(n: number) {
@@ -30,9 +39,12 @@ function formatDate(iso: string) {
 }
 }
-function Row({ item }: { item: CorpusDecision }) {
+function Row({
+item, onOpen,
+}: { item: CorpusDecision; onOpen: () => void }) {
 const del = useDeleteCorpusEntry();
-const onDelete = async () => {
+const onDelete = async (e: React.MouseEvent) => {
+e.stopPropagation();
 if (!window.confirm(`למחוק את החלטה ${item.decision_number} מהקורפוס?`)) return;
 try {
 await del.mutateAsync(item.id);
@@ -43,7 +55,10 @@ function Row({ item }: { item: CorpusDecision }) {
 };
 return (
-<TableRow className="border-rule hover:bg-gold-wash/30">
+<TableRow
+className="border-rule hover:bg-gold-wash/30 cursor-pointer"
+onClick={onOpen}
+>
 <TableCell className="font-semibold text-navy tabular-nums">
 {item.decision_number || "—"}
 </TableCell>
@@ -55,20 +70,39 @@ function Row({ item }: { item: CorpusDecision }) {
 <span className="text-ink-light"></span>
 ) : (
 <div className="flex flex-wrap gap-1">
-{item.subject_categories.map((s) => (
-<Badge
-key={s}
-variant="outline"
-className="text-[0.7rem] bg-gold-wash text-gold-deep border-gold/40"
->
+{item.subject_categories.slice(0, 3).map((s) => (
+<Badge key={s} variant="outline"
+className="text-[0.7rem] bg-gold-wash text-gold-deep border-gold/40">
 {s}
 </Badge>
 ))}
+{item.subject_categories.length > 3 && (
+<span className="text-[0.7rem] text-ink-muted">
++{item.subject_categories.length - 3}
+</span>
+)}
 </div>
 )}
 </TableCell>
+<TableCell className="text-[0.78rem] text-ink-soft">
+<div className="flex items-center gap-2">
+<span className="truncate">{item.legal_citation || "—"}</span>
+{item.lessons_count > 0 && (
+<Badge variant="outline"
+className="text-[0.7rem] bg-info-bg text-info border-info/40 shrink-0">
+<Sparkles className="w-3 h-3 me-0.5" />
+{item.lessons_count}
+</Badge>
+)}
+</div>
+</TableCell>
 <TableCell className="text-ink-soft tabular-nums">
 {formatChars(item.chars)}
+{item.page_count > 0 && (
+<span className="text-ink-muted text-[0.72rem] ms-1">
+· {item.page_count} ע׳
+</span>
+)}
 </TableCell>
 <TableCell className="text-ink-muted tabular-nums text-[0.78rem]">
 {formatDate(item.created_at)}
@@ -91,6 +125,7 @@ function Row({ item }: { item: CorpusDecision }) {
 export function CorpusPanel() {
 const { data, isPending, error } = useCorpus();
+const [selected, setSelected] = useState<CorpusDecision | null>(null);
 if (error) {
 return (
@@ -101,6 +136,7 @@ export function CorpusPanel() {
 }
 return (
+<>
 <div className="rounded-lg border border-rule bg-surface shadow-sm overflow-hidden">
 <Table>
 <TableHeader className="bg-rule-soft/60">
@@ -108,7 +144,8 @@ export function CorpusPanel() {
 <TableHead className="text-navy text-right">מס׳ החלטה</TableHead>
 <TableHead className="text-navy text-right">תאריך</TableHead>
 <TableHead className="text-navy text-right">נושאים</TableHead>
-<TableHead className="text-navy text-right">תווים</TableHead>
+<TableHead className="text-navy text-right">מראה מקום</TableHead>
+<TableHead className="text-navy text-right">תווים / עמודים</TableHead>
 <TableHead className="text-navy text-right">נוסף בתאריך</TableHead>
 <TableHead className="text-navy" />
 </TableRow>
@@ -117,7 +154,7 @@ export function CorpusPanel() {
 {isPending ? (
 [...Array(4)].map((_, i) => (
 <TableRow key={i} className="border-rule">
-{[...Array(6)].map((_, j) => (
+{[...Array(7)].map((_, j) => (
 <TableCell key={j}>
 <Skeleton className="h-4 w-24" />
 </TableCell>
@@ -126,15 +163,23 @@ export function CorpusPanel() {
 ))
 ) : data?.length === 0 ? (
 <TableRow>
-<TableCell colSpan={6} className="text-center text-ink-muted py-12">
+<TableCell colSpan={7} className="text-center text-ink-muted py-12">
 הקורפוס ריק
 </TableCell>
 </TableRow>
 ) : (
-data?.map((item) => <Row key={item.id} item={item} />)
+data?.map((item) => (
+<Row key={item.id} item={item} onOpen={() => setSelected(item)} />
+))
 )}
 </TableBody>
 </Table>
 </div>
+<CorpusDetailDrawer
+decision={selected}
+onOpenChange={(open) => { if (!open) setSelected(null); }}
+/>
+</>
 );
 }
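
The subjects cell in this diff now renders at most three badges and collapses the rest into a "+N" counter. A minimal sketch of that split as a standalone function follows. `splitOverflow` is a hypothetical name introduced here for illustration and the sample category names are invented; the component itself inlines `slice(0, 3)` and `length - 3`.

```typescript
// Hypothetical helper (not part of the diff above): split a list into the
// items to render and the number collapsed into a "+N" indicator.
function splitOverflow<T>(
  items: readonly T[],
  max: number,
): { visible: T[]; hidden: number } {
  // Guard against negative or fractional limits.
  const limit = Math.max(0, Math.floor(max));
  const visible = items.slice(0, limit);
  return { visible, hidden: items.length - visible.length };
}

// Sample values are made up for the illustration.
const { visible, hidden } = splitOverflow(
  ["zoning", "permits", "levy", "costs", "appeal"],
  3,
);
console.log(visible.join(", "), hidden > 0 ? `+${hidden}` : "");
```

Keeping the limit in one place would also let the table and the drawer agree on how many badges to show, should that ever be needed.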


@@ -0,0 +1,338 @@
"use client";
/*
* Curator-Portrait tab — shows everything about the agent that learns
* Daphna's style:
* 1. Snapshot stats (curator findings to date, number applied to SKILL)
* 2. Recent curator findings (last 10) — linked by decision number
* 3. The hermes-curator system prompt, rendered + linked to Gitea
* 4. The style_analyzer training prompts (different lifecycle — runs
* over the corpus at training time, not per-decision)
* 5. Propose-change form — writes a markdown file to disk for chair
* review (no auto-commit)
*
* The prompts are deliberately read-only here: they're symlinked into
* Paperclip and load-bearing for every curator wake. Editing them from
* the UI would silently fork the source of truth.
*/
import { useState } from "react";
import {
Sparkles, ExternalLink, Send, Loader2, FileText, Brain,
CheckCircle2, Clock,
} from "lucide-react";
import { toast } from "sonner";
import { Card, CardContent } from "@/components/ui/card";
import { Button } from "@/components/ui/button";
import { Input } from "@/components/ui/input";
import { Label } from "@/components/ui/label";
import { Textarea } from "@/components/ui/textarea";
import { Badge } from "@/components/ui/badge";
import { Skeleton } from "@/components/ui/skeleton";
import { ScrollArea } from "@/components/ui/scroll-area";
import { Tabs, TabsContent, TabsList, TabsTrigger } from "@/components/ui/tabs";
import { Markdown } from "@/components/ui/markdown";
import {
useCuratorPrompt,
useCuratorStats,
useStyleAnalyzerPrompts,
useSubmitCuratorProposal,
} from "@/lib/api/training";
export function CuratorPortraitPanel() {
return (
<div className="space-y-6">
<StatsCard />
<RecentFindings />
<Tabs defaultValue="curator-prompt" dir="rtl">
<TabsList className="bg-rule-soft/60">
<TabsTrigger value="curator-prompt">פרומפט ה-Curator</TabsTrigger>
<TabsTrigger value="analyzer-prompt">פרומפט אימון הסגנון</TabsTrigger>
<TabsTrigger value="propose">הצעת שינוי</TabsTrigger>
</TabsList>
<TabsContent value="curator-prompt" className="mt-4">
<CuratorPromptCard />
</TabsContent>
<TabsContent value="analyzer-prompt" className="mt-4">
<StyleAnalyzerPromptCard />
</TabsContent>
<TabsContent value="propose" className="mt-4">
<ProposeChangeForm />
</TabsContent>
</Tabs>
</div>
);
}
// ── stats card ─────────────────────────────────────────────────────
function StatsCard() {
const { data, isPending } = useCuratorStats();
if (isPending) {
return (
<div className="grid grid-cols-2 md:grid-cols-4 gap-3">
{[...Array(4)].map((_, i) => <Skeleton key={i} className="h-20 w-full" />)}
</div>
);
}
if (!data) return null;
return (
<div className="grid grid-cols-2 md:grid-cols-4 gap-3">
<Kpi label="ממצאי curator" value={data.total_findings} icon={<Sparkles className="w-4 h-4" />} />
<Kpi label="החלטות שנסקרו" value={`${data.decisions_with_findings}/${data.decisions_total}`} icon={<FileText className="w-4 h-4" />} />
<Kpi label="ממצאים שאומצו ל-SKILL" value={data.findings_applied} icon={<CheckCircle2 className="w-4 h-4" />} />
<Kpi label="ממוצע ממצאים להחלטה"
value={
data.decisions_with_findings > 0
? (data.total_findings / data.decisions_with_findings).toFixed(1)
: "—"
}
icon={<Brain className="w-4 h-4" />}
/>
</div>
);
}
function Kpi({
label, value, icon,
}: { label: string; value: string | number; icon: React.ReactNode }) {
return (
<Card className="bg-surface border-rule">
<CardContent className="px-4 py-3">
<div className="flex items-center gap-2 text-ink-muted text-[0.78rem]">
{icon}
<span>{label}</span>
</div>
<p className="text-2xl text-navy font-semibold tabular-nums mt-1">{value}</p>
</CardContent>
</Card>
);
}
// ── recent findings ────────────────────────────────────────────────
function RecentFindings() {
const { data, isPending } = useCuratorStats();
if (isPending) {
return <Skeleton className="h-40 w-full" />;
}
if (!data || data.recent_findings.length === 0) {
return (
<Card className="bg-rule-soft/40 border-rule">
<CardContent className="px-6 py-5 text-center text-ink-muted text-sm">
אין עדיין ממצאים של ה-Curator. הוא מופעל אוטומטית כאשר דפנה מסמנת
החלטה כסופית (mark-final), ושומר את ממצאיו כ-decision_lessons עם
source=&quot;curator&quot;.
</CardContent>
</Card>
);
}
return (
<Card className="bg-surface border-rule">
<CardContent className="px-4 py-3">
<h3 className="text-[0.78rem] uppercase tracking-wider text-gold-deep font-semibold mb-3">
ממצאים אחרונים של ה-Curator
</h3>
<ul className="space-y-2">
{data.recent_findings.map((f) => (
<li key={f.id} className="border-b border-rule pb-2 last:border-0 last:pb-0">
<div className="flex items-center gap-2 text-[0.72rem] mb-1">
<Badge variant="outline"
className="bg-info-bg text-info border-info/40">
{f.category}
</Badge>
<span className="text-navy font-semibold tabular-nums">
{f.decision_number || "—"}
</span>
{f.applied_to_skill && (
<Badge variant="outline"
className="bg-success-bg text-success border-success/40">
<CheckCircle2 className="w-3 h-3 me-0.5" />
אומץ
</Badge>
)}
<span className="grow text-ink-muted text-end">
<Clock className="w-3 h-3 inline me-1" />
{new Date(f.created_at).toLocaleDateString("he-IL")}
</span>
</div>
<p className="text-sm text-ink leading-relaxed">{f.lesson_text}</p>
</li>
))}
</ul>
</CardContent>
</Card>
);
}
// ── prompts ────────────────────────────────────────────────────────
function CuratorPromptCard() {
const { data, isPending, error } = useCuratorPrompt();
if (isPending) return <Skeleton className="h-96 w-full" />;
if (error) {
return (
<Card className="bg-danger-bg border-danger/40">
<CardContent className="px-6 py-4 text-danger">{error.message}</CardContent>
</Card>
);
}
if (!data) return null;
return (
<Card className="bg-surface border-rule">
<CardContent className="px-5 py-4 space-y-3">
<div className="flex items-center justify-between gap-2 flex-wrap">
<div>
<h3 className="text-navy font-semibold">{data.filename}</h3>
<p className="text-[0.72rem] text-ink-muted">
{data.bytes.toLocaleString("he-IL")} בייטים ·
עודכן: {new Date(data.last_modified * 1000).toLocaleString("he-IL")}
</p>
</div>
<Button asChild variant="outline" size="sm">
<a href={data.gitea_url} target="_blank" rel="noopener noreferrer">
<ExternalLink className="w-3 h-3 me-1" />
ערוך ב-Gitea
</a>
</Button>
</div>
<ScrollArea className="h-[520px] pe-2 border border-rule rounded p-3 bg-rule-soft/30">
<Markdown content={data.content} />
</ScrollArea>
</CardContent>
</Card>
);
}
function StyleAnalyzerPromptCard() {
const { data, isPending } = useStyleAnalyzerPrompts();
if (isPending) return <Skeleton className="h-96 w-full" />;
if (!data) return null;
return (
<Card className="bg-surface border-rule">
<CardContent className="px-5 py-4 space-y-3">
<div>
<h3 className="text-navy font-semibold">פרומפטים של style_analyzer.py</h3>
<p className="text-[0.72rem] text-ink-muted">
רץ ב-Claude Opus (1M context, עד {data.max_input_tokens.toLocaleString("he-IL")} tokens
input) דרך claude CLI מקומי חינמי, ללא API. נקרא ע&quot;י
<code className="px-1 mx-1 bg-rule-soft rounded">POST /api/training/analyze-style</code>
ומכניס דפוסים ל-<code className="px-1 bg-rule-soft rounded">style_patterns</code>.
</p>
</div>
<Tabs defaultValue="analysis" dir="rtl">
<TabsList className="bg-rule-soft/60">
<TabsTrigger value="analysis">Single-pass (כל הקורפוס)</TabsTrigger>
<TabsTrigger value="single">Multi-pass (החלטה אחת)</TabsTrigger>
<TabsTrigger value="synthesis">Synthesis</TabsTrigger>
</TabsList>
<TabsContent value="analysis" className="mt-3">
<PromptBlock content={data.analysis_prompt} />
</TabsContent>
<TabsContent value="single" className="mt-3">
<PromptBlock content={data.single_decision_prompt} />
</TabsContent>
<TabsContent value="synthesis" className="mt-3">
<PromptBlock content={data.synthesis_prompt} />
</TabsContent>
</Tabs>
</CardContent>
</Card>
);
}
function PromptBlock({ content }: { content: string }) {
return (
<ScrollArea className="h-[420px] pe-2 border border-rule rounded p-3 bg-rule-soft/30">
<pre className="text-[0.78rem] whitespace-pre-wrap font-mono text-ink leading-relaxed"
dir="rtl">
{content}
</pre>
</ScrollArea>
);
}
// ── propose change form ────────────────────────────────────────────
function ProposeChangeForm() {
const [title, setTitle] = useState("");
const [proposedChange, setProposedChange] = useState("");
const [rationale, setRationale] = useState("");
const submit = useSubmitCuratorProposal();
const onSubmit = async (e: React.FormEvent) => {
e.preventDefault();
if (!title.trim() || !proposedChange.trim()) {
toast.error("חובה כותרת ושינוי מוצע");
return;
}
try {
const r = await submit.mutateAsync({
title: title.trim(),
proposed_change: proposedChange.trim(),
rationale: rationale.trim(),
});
toast.success(`נשמרה הצעה: ${r.filename}`);
setTitle(""); setProposedChange(""); setRationale("");
} catch (err) {
// Named `err` so it doesn't shadow the form event parameter `e`.
toast.error(err instanceof Error ? err.message : "כשל בשמירה");
}
};
return (
<Card className="bg-surface border-rule">
<CardContent className="px-5 py-4">
<h3 className="text-navy font-semibold mb-2">הצעת שינוי לפרומפט ה-Curator</h3>
<p className="text-[0.78rem] text-ink-muted mb-4">
ההצעה תישמר כקובץ Markdown ב-
<code className="px-1 bg-rule-soft rounded">data/curator-proposals/</code>.
חיים יבחן ויאשר ידנית; אין שינוי אוטומטי בפרומפט.
</p>
<form onSubmit={onSubmit} className="space-y-3">
<div className="space-y-1">
<Label htmlFor="proposal-title">כותרת השינוי</Label>
<Input id="proposal-title" value={title}
onChange={(e) => setTitle(e.target.value)}
placeholder="לדוגמה: הוסף קטגוריה [צ׳קליסט תוכן] לממצאי ה-curator"
dir="rtl" />
</div>
<div className="space-y-1">
<Label htmlFor="proposal-change">השינוי המוצע (Markdown)</Label>
<Textarea id="proposal-change" value={proposedChange} rows={6}
onChange={(e) => setProposedChange(e.target.value)}
placeholder={"תאר במדויק מה לשנות. אפשר להעתיק את הקטע הקיים ולסמן ב-strikethrough + להוסיף את החדש."}
dir="rtl" />
</div>
<div className="space-y-1">
<Label htmlFor="proposal-rationale">נימוק</Label>
<Textarea id="proposal-rationale" value={rationale} rows={3}
onChange={(e) => setRationale(e.target.value)}
placeholder="למה השינוי הזה חשוב? איזה בעיה הוא פותר?"
dir="rtl" />
</div>
<div className="flex justify-end">
<Button type="submit" disabled={submit.isPending}
className="bg-navy text-parchment hover:bg-navy-soft">
{submit.isPending ? (
<Loader2 className="w-4 h-4 animate-spin me-1" />
) : (
<Send className="w-4 h-4 me-1" />
)}
שלח הצעה
</Button>
</div>
</form>
</CardContent>
</Card>
);
}
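
The stats card above computes the average number of findings per reviewed decision and falls back to a placeholder when no decision has findings yet, which avoids a division by zero. A minimal sketch of that calculation as a pure function is shown here. `averageFindings`, the `CuratorStatsLike` shape, and the `"-"` default placeholder are assumptions for illustration; only the two field names come from the component.

```typescript
// Hypothetical shape: just the two fields the average needs.
interface CuratorStatsLike {
  total_findings: number;
  decisions_with_findings: number;
}

// Hypothetical helper (not part of the commit): format the average with one
// decimal place, or return the placeholder when there is nothing to divide by.
function averageFindings(stats: CuratorStatsLike, placeholder = "-"): string {
  return stats.decisions_with_findings > 0
    ? (stats.total_findings / stats.decisions_with_findings).toFixed(1)
    : placeholder;
}

console.log(averageFindings({ total_findings: 10, decisions_with_findings: 4 }));
console.log(averageFindings({ total_findings: 0, decisions_with_findings: 0 }));
```

Returning a string in both branches keeps the value directly usable as the `value` prop of a KPI tile.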


@@ -0,0 +1,267 @@
"use client";
/*
* Per-decision lessons editor — lives inside CorpusDetailDrawer's
* "מה למדנו" tab. Lessons are persisted in the decision_lessons table
* (one-to-many on style_corpus) and consumed by hermes-curator and
* future style_analyzer runs as context.
*
* The chair can:
* - Add a lesson typed manually (category = "general" by default)
* - Edit / delete existing lessons
* - Mark a lesson as "applied_to_skill" (informational — doesn't
* auto-commit anything to SKILL.md; chair still curates that file
* manually in git).
*
* Lessons from the curator arrive with source="curator" and are visually
* distinguished by a badge so the chair can audit auto-suggestions.
*/
import { useState } from "react";
import { Plus, Save, Trash2, Loader2, CheckCircle2, Sparkles } from "lucide-react";
import { toast } from "sonner";
import { Button } from "@/components/ui/button";
import { Card, CardContent } from "@/components/ui/card";
import { Textarea } from "@/components/ui/textarea";
import { Badge } from "@/components/ui/badge";
import { Skeleton } from "@/components/ui/skeleton";
import {
Select, SelectContent, SelectItem, SelectTrigger, SelectValue,
} from "@/components/ui/select";
import {
useAddLesson,
useCorpusLessons,
useDeleteLesson,
usePatchLesson,
type DecisionLesson,
} from "@/lib/api/training";
const CATEGORIES = [
{ value: "general", label: "כללי" },
{ value: "style", label: "סגנון" },
{ value: "structure", label: "מבנה" },
{ value: "lexicon", label: "לקסיקון" },
{ value: "tabular", label: "טבלאי" },
] as const;
const SOURCE_BADGE: Record<DecisionLesson["source"], { label: string; cls: string }> = {
manual: { label: "ידני", cls: "bg-rule-soft text-ink-soft" },
chair: { label: "יו״ר", cls: "bg-gold-wash text-gold-deep" },
curator: { label: "Curator", cls: "bg-info-bg text-info" },
style_analyzer: { label: "Analyzer", cls: "bg-success-bg text-success" },
};
export function LessonsTab({ corpusId }: { corpusId: string }) {
const { data, isPending } = useCorpusLessons(corpusId);
const add = useAddLesson(corpusId);
const [draftText, setDraftText] = useState("");
const [draftCategory, setDraftCategory] = useState<DecisionLesson["category"]>("general");
const onAdd = async () => {
const text = draftText.trim();
if (!text) return;
try {
await add.mutateAsync({ lesson_text: text, category: draftCategory });
setDraftText("");
setDraftCategory("general");
toast.success("הלקח נוסף");
} catch (e) {
toast.error(e instanceof Error ? e.message : "כשל בשמירה");
}
};
return (
<div className="space-y-4">
{/* Composer */}
<Card className="bg-surface border-rule">
<CardContent className="px-4 py-3 space-y-2">
<h4 className="text-[0.78rem] uppercase tracking-wider text-gold-deep font-semibold">
הוסף לקח להחלטה
</h4>
<Textarea
value={draftText}
onChange={(e) => setDraftText(e.target.value)}
placeholder="מה למדנו מההחלטה הזו? למשל: 'דפנה מעדיפה הוצאות מתונות (5K-10K ₪) גם בערר שהתקבל במלואו'"
rows={3}
dir="rtl"
disabled={add.isPending}
/>
<div className="flex items-center gap-2">
<Select
value={draftCategory}
onValueChange={(v) => setDraftCategory(v as DecisionLesson["category"])}
disabled={add.isPending}
dir="rtl"
>
<SelectTrigger className="w-40">
<SelectValue />
</SelectTrigger>
<SelectContent>
{CATEGORIES.map((c) => (
<SelectItem key={c.value} value={c.value}>{c.label}</SelectItem>
))}
</SelectContent>
</Select>
<div className="grow" />
<Button onClick={onAdd} disabled={add.isPending || !draftText.trim()}
className="bg-navy text-parchment hover:bg-navy-soft">
{add.isPending ? (
<Loader2 className="w-4 h-4 animate-spin me-1" />
) : (
<Plus className="w-4 h-4 me-1" />
)}
שמור לקח
</Button>
</div>
</CardContent>
</Card>
{/* List */}
{isPending ? (
<div className="space-y-2">
{[...Array(3)].map((_, i) => (
<Skeleton key={i} className="h-16 w-full" />
))}
</div>
) : !data || data.length === 0 ? (
<p className="text-center text-ink-muted text-sm py-6">
אין עדיין לקחים להחלטה זו. הוסף לקח ראשון מלמעלה.
</p>
) : (
<div className="space-y-2">
{data.map((lesson) => (
<LessonItem key={lesson.id} lesson={lesson} corpusId={corpusId} />
))}
</div>
)}
</div>
);
}
function LessonItem({
lesson, corpusId,
}: { lesson: DecisionLesson; corpusId: string }) {
const [editing, setEditing] = useState(false);
const [text, setText] = useState(lesson.lesson_text);
const [category, setCategory] = useState<DecisionLesson["category"]>(lesson.category);
const patch = usePatchLesson(corpusId);
const del = useDeleteLesson(corpusId);
const sourceBadge = SOURCE_BADGE[lesson.source];
const dirty = text !== lesson.lesson_text || category !== lesson.category;
const onSave = async () => {
// Nothing changed: leave edit mode without sending an empty PATCH.
if (!dirty) { setEditing(false); return; }
try {
await patch.mutateAsync({
id: lesson.id,
patch: { lesson_text: text, category },
});
setEditing(false);
toast.success("הלקח עודכן");
} catch (e) {
toast.error(e instanceof Error ? e.message : "כשל בעדכון");
}
};
const onToggleApplied = async () => {
try {
await patch.mutateAsync({
id: lesson.id,
patch: { applied_to_skill: !lesson.applied_to_skill },
});
} catch (e) {
toast.error(e instanceof Error ? e.message : "כשל בעדכון");
}
};
const onDelete = async () => {
if (!window.confirm("למחוק את הלקח?")) return;
try {
await del.mutateAsync(lesson.id);
toast.success("נמחק");
} catch (e) {
toast.error(e instanceof Error ? e.message : "כשל במחיקה");
}
};
return (
<Card className="bg-surface border-rule">
<CardContent className="px-4 py-3 space-y-2">
<div className="flex items-center gap-2 text-[0.72rem]">
<Badge variant="outline"
className="bg-rule-soft text-ink-soft">
{CATEGORIES.find((c) => c.value === lesson.category)?.label || lesson.category}
</Badge>
<Badge variant="outline" className={sourceBadge.cls}>
{sourceBadge.label}
</Badge>
{lesson.applied_to_skill && (
<Badge variant="outline"
className="bg-success-bg text-success border-success/40">
<CheckCircle2 className="w-3 h-3 me-1" />
אומץ
</Badge>
)}
<span className="grow text-ink-muted tabular-nums">
{new Date(lesson.created_at).toLocaleDateString("he-IL")}
</span>
</div>
{editing ? (
<>
<Textarea value={text} onChange={(e) => setText(e.target.value)}
rows={3} dir="rtl" />
<div className="flex items-center gap-2">
<Select value={category}
onValueChange={(v) => setCategory(v as DecisionLesson["category"])}
dir="rtl">
<SelectTrigger className="w-40">
<SelectValue />
</SelectTrigger>
<SelectContent>
{CATEGORIES.map((c) => (
<SelectItem key={c.value} value={c.value}>{c.label}</SelectItem>
))}
</SelectContent>
</Select>
<div className="grow" />
<Button variant="ghost" size="sm"
onClick={() => { setEditing(false); setText(lesson.lesson_text); setCategory(lesson.category); }}>
ביטול
</Button>
<Button size="sm" onClick={onSave} disabled={patch.isPending}
className="bg-navy text-parchment hover:bg-navy-soft">
<Save className="w-3 h-3 me-1" />
שמור
</Button>
</div>
</>
) : (
<>
<p className="text-sm text-ink leading-relaxed whitespace-pre-wrap"
onClick={() => setEditing(true)}
style={{ cursor: "text" }}>
{lesson.lesson_text}
</p>
<div className="flex items-center gap-2">
<Button variant="ghost" size="sm" onClick={onToggleApplied}
disabled={patch.isPending}>
<Sparkles className="w-3 h-3 me-1" />
{lesson.applied_to_skill ? "בטל סימון 'אומץ'" : "סמן כ'אומץ ל-SKILL'"}
</Button>
<Button variant="ghost" size="sm" onClick={() => setEditing(true)}>
ערוך
</Button>
<div className="grow" />
<Button variant="ghost" size="sm" onClick={onDelete}
disabled={del.isPending}
className="text-danger hover:text-danger hover:bg-danger-bg">
<Trash2 className="w-3 h-3" />
</Button>
</div>
</>
)}
</CardContent>
</Card>
);
}
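
The edit form above compares the draft against the saved lesson through a single `dirty` flag. A minimal sketch of a finer-grained variant that sends only the fields that actually changed follows. `buildLessonPatch`, `LessonDraft` and `LessonPatchSketch` are hypothetical names, not part of this commit; the field names mirror `LessonPatch` from `@/lib/api/training`.

```typescript
// Hypothetical helper, not part of the commit. Field names mirror LessonPatch.
type LessonCategory = "style" | "structure" | "lexicon" | "tabular" | "general";

type LessonDraft = { lesson_text: string; category: LessonCategory };

type LessonPatchSketch = { lesson_text?: string; category?: LessonCategory };

// Returns only the fields whose values differ from the saved lesson.
// Returns null when nothing changed, so the caller can skip the request.
function buildLessonPatch(
  saved: LessonDraft,
  draft: LessonDraft,
): LessonPatchSketch | null {
  const patch: LessonPatchSketch = {};
  if (draft.lesson_text !== saved.lesson_text) patch.lesson_text = draft.lesson_text;
  if (draft.category !== saved.category) patch.category = draft.category;
  return Object.keys(patch).length === 0 ? null : patch;
}

// Usage: a category-only edit produces a patch without lesson_text.
const savedLesson: LessonDraft = { lesson_text: "Keep costs moderate", category: "general" };
const categoryOnly = buildLessonPatch(savedLesson, {
  lesson_text: "Keep costs moderate",
  category: "style",
});
```

With a helper of this shape, `onSave` could close the editor immediately when the result is `null` and otherwise pass the result straight to `patch.mutateAsync`.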


@@ -0,0 +1,328 @@
"use client";
/*
* Upload a Daphna decision into the style corpus, from the /training page.
*
* The flow is three explicit steps inside the same sheet:
* 1. file picker → POST /api/upload (gets sanitized filename)
* 2. preview → POST /api/training/analyze (proofread + auto-extracted meta)
* chair can correct decision_number / decision_date / subjects
* 3. commit → POST /api/training/upload (background task)
* progress watched via SSE; on completion we invalidate
* corpus + style-report so the new row appears.
*
* The Sheet UX mirrors precedent-upload-sheet.tsx: same dir="rtl", same
* loading + error patterns, same toast on success. The reason this isn't
* a single one-click upload is that style-corpus rows are write-once
* (we don't allow editing full_text), so the chair MUST see the proofread
* preview before committing — otherwise a bad OCR/proofread can silently
* pollute the style portrait.
*/
import { useEffect, useState } from "react";
import { Upload, Loader2, CheckCircle2, AlertCircle, FileText } from "lucide-react";
import { toast } from "sonner";
import { useQueryClient } from "@tanstack/react-query";
import {
Sheet, SheetContent, SheetHeader, SheetTitle, SheetDescription,
} from "@/components/ui/sheet";
import { Button } from "@/components/ui/button";
import { Input } from "@/components/ui/input";
import { Label } from "@/components/ui/label";
import { Progress } from "@/components/ui/progress";
import { Badge } from "@/components/ui/badge";
import {
trainingKeys,
useAnalyzeTraining,
useCommitTrainingUpload,
useUploadFile,
type AnalyzeTrainingResponse,
} from "@/lib/api/training";
import { useProgress } from "@/lib/api/documents";
const ACCEPT = ".pdf,.docx,.doc,.rtf,.txt,.md";
type Props = {
open: boolean;
onOpenChange: (open: boolean) => void;
};
type Stage = "pick" | "analyzing" | "preview" | "committing" | "done" | "error";
export function TrainingUploadDialog({ open, onOpenChange }: Props) {
const [stage, setStage] = useState<Stage>("pick");
const [file, setFile] = useState<File | null>(null);
const [analysis, setAnalysis] = useState<AnalyzeTrainingResponse | null>(null);
// editable copies of the auto-extracted metadata
const [decisionNumber, setDecisionNumber] = useState("");
const [decisionDate, setDecisionDate] = useState("");
const [subjectsRaw, setSubjectsRaw] = useState("");
const [title, setTitle] = useState("");
const [taskId, setTaskId] = useState<string | null>(null);
const [errorMsg, setErrorMsg] = useState("");
const uploadFile = useUploadFile();
const analyze = useAnalyzeTraining();
const commit = useCommitTrainingUpload();
const progress = useProgress(taskId);
const qc = useQueryClient();
// Reset everything when the sheet closes. This matters because Sheet keeps
// the component mounted between opens. The set-state-in-effect lint rule is
// disabled on purpose: the reset is exactly the side effect we want.
useEffect(() => {
if (open) return;
/* eslint-disable react-hooks/set-state-in-effect */
setStage("pick"); setFile(null); setAnalysis(null);
setDecisionNumber(""); setDecisionDate(""); setSubjectsRaw("");
setTitle(""); setTaskId(null); setErrorMsg("");
/* eslint-enable react-hooks/set-state-in-effect */
}, [open]);
// Watch background task. When complete, invalidate corpus + report so the
// new row + updated stats show up automatically. The setStage call here
// is the deliberate UX (success card → auto-close) — synchronizing UI
// with the external SSE stream is exactly what effects are for.
useEffect(() => {
if (!progress) return;
if (progress.status === "completed") {
qc.invalidateQueries({ queryKey: trainingKeys.corpus() });
qc.invalidateQueries({ queryKey: trainingKeys.report() });
// eslint-disable-next-line react-hooks/set-state-in-effect
setStage("done");
toast.success(`החלטה ${decisionNumber || analysis?.decision_number || ""} נוספה לקורפוס`);
const t = window.setTimeout(() => onOpenChange(false), 1500);
return () => window.clearTimeout(t);
}
if (progress.status === "failed") {
setStage("error");
setErrorMsg(progress.error || "כשל בעיבוד");
}
}, [progress, analysis, decisionNumber, qc, onOpenChange]);
const onPickFile = async (f: File | null) => {
setFile(f);
setErrorMsg("");
if (!f) return;
setStage("analyzing");
try {
const { filename } = await uploadFile.mutateAsync(f);
const result = await analyze.mutateAsync(filename);
setAnalysis(result);
setDecisionNumber(result.decision_number);
setDecisionDate(result.decision_date);
setSubjectsRaw(result.subject_categories.join(", "));
// Default title from the original filename stem (chair can override).
const stem = f.name.replace(/\.[^.]+$/, "");
setTitle(stem);
setStage("preview");
} catch (e) {
setStage("error");
setErrorMsg(e instanceof Error ? e.message : "כשל בקריאת הקובץ");
}
};
const onCommit = async () => {
if (!analysis) return;
setStage("committing");
setErrorMsg("");
try {
const subjects = subjectsRaw
.split(/[,،]/)
.map((s) => s.trim())
.filter(Boolean);
const res = await commit.mutateAsync({
filename: analysis.filename,
decision_number: decisionNumber.trim(),
decision_date: decisionDate || "",
subject_categories: subjects,
title: title.trim() || undefined,
});
setTaskId(res.task_id);
} catch (e) {
setStage("error");
// 409 = duplicate decision_number — surface the backend's Hebrew message.
setErrorMsg(e instanceof Error ? e.message : "כשל בהעלאה");
}
};
const isProcessing =
stage === "analyzing" || stage === "committing" ||
(taskId !== null && progress?.status !== "completed" && progress?.status !== "failed");
const progressStep = (progress as { step?: string } | null)?.step;
return (
<Sheet open={open} onOpenChange={onOpenChange}>
<SheetContent side="left" className="w-full sm:max-w-2xl overflow-y-auto" dir="rtl">
<SheetHeader>
<SheetTitle className="text-navy">העלאת החלטה לקורפוס הסגנון</SheetTitle>
<SheetDescription className="text-ink-muted">
הקובץ יעבור הגהה (סינון Nevo, ניקוד), חילוץ אוטומטי של מספר תיק, תאריך
ונושאים, ויוטמע ב-style_corpus עם chunks ו-embeddings. תוכל לתקן את
פרטי המטא-דאטה לפני שמירה.
</SheetDescription>
</SheetHeader>
<div className="px-6 pb-6 mt-4 space-y-4">
{/* Stage 1: pick */}
{stage === "pick" && (
<div className="space-y-2">
<Label htmlFor="t-file">קובץ ההחלטה (PDF / DOCX / DOC / RTF / TXT / MD)</Label>
<Input
id="t-file" type="file" accept={ACCEPT}
onChange={(e) => onPickFile(e.target.files?.[0] ?? null)}
/>
<p className="text-[0.78rem] text-ink-muted">
המערכת תחלץ מהקובץ את מספר התיק, התאריך והנושאים. תוכל לערוך
לפני השמירה.
</p>
</div>
)}
{/* Stage 2: analyzing the file */}
{stage === "analyzing" && (
<div className="rounded-lg border border-rule bg-rule-soft/40 p-6 space-y-2 text-center">
<Loader2 className="w-5 h-5 animate-spin mx-auto text-navy" />
<p className="text-sm text-navy">מבצע הגהה וחילוץ מטא-דאטה</p>
<p className="text-[0.78rem] text-ink-muted">
{file?.name}
</p>
</div>
)}
{/* Stage 3: preview + editable metadata */}
{stage === "preview" && analysis && (
<form
className="space-y-4"
onSubmit={(e) => { e.preventDefault(); onCommit(); }}
>
<div className="rounded-lg border border-rule bg-surface px-4 py-3">
<h3 className="text-[0.78rem] uppercase tracking-wider text-gold-deep font-semibold mb-2">
תצוגה מקדימה של הטקסט הנקי
</h3>
<p className="text-sm text-ink leading-relaxed line-clamp-6 whitespace-pre-wrap">
{analysis.preview}
</p>
<div className="mt-2 flex items-center gap-3 text-[0.72rem] text-ink-muted tabular-nums">
<span className="flex items-center gap-1">
<FileText className="w-3 h-3" />
{analysis.chars.toLocaleString("he-IL")} תווים
</span>
</div>
</div>
<div className="grid grid-cols-2 gap-3">
<div className="space-y-1">
<Label htmlFor="t-decision-number">מספר ההחלטה</Label>
<Input
id="t-decision-number"
value={decisionNumber}
onChange={(e) => setDecisionNumber(e.target.value)}
placeholder="1130-25"
dir="rtl"
/>
</div>
<div className="space-y-1">
<Label htmlFor="t-decision-date">תאריך ההחלטה</Label>
<Input
id="t-decision-date" type="date"
value={decisionDate}
onChange={(e) => setDecisionDate(e.target.value)}
/>
</div>
</div>
<div className="space-y-1">
<Label htmlFor="t-title">כותרת קצרה (אופציונלי)</Label>
<Input
id="t-title" value={title}
onChange={(e) => setTitle(e.target.value)}
placeholder="ARAR-25-1130 - כרמל יצחק" dir="rtl"
/>
</div>
<div className="space-y-1">
<Label htmlFor="t-subjects">נושאים (מופרדים בפסיקים)</Label>
<Input
id="t-subjects" value={subjectsRaw}
onChange={(e) => setSubjectsRaw(e.target.value)}
placeholder="חניה, קווי בניין, שימוש חורג" dir="rtl"
/>
{analysis.subject_categories.length > 0 && (
<div className="flex flex-wrap gap-1 mt-1">
<span className="text-[0.72rem] text-ink-muted">חולץ אוטומטית:</span>
{analysis.subject_categories.map((s) => (
<Badge key={s} variant="outline"
className="text-[0.7rem] bg-gold-wash text-gold-deep border-gold/40">
{s}
</Badge>
))}
</div>
)}
</div>
{errorMsg && (
<div className="rounded-lg border border-danger/40 bg-danger-bg p-3 flex items-center gap-2 text-danger text-sm">
<AlertCircle className="w-4 h-4 shrink-0" />
{errorMsg}
</div>
)}
<div className="flex gap-2 justify-end pt-2">
<Button type="button" variant="ghost"
onClick={() => onOpenChange(false)}
disabled={isProcessing}>
ביטול
</Button>
<Button type="submit" disabled={isProcessing || !decisionNumber.trim()}
className="bg-navy text-parchment hover:bg-navy-soft">
<Upload className="w-4 h-4 me-1" />
שמור בקורפוס
</Button>
</div>
</form>
)}
{/* Stage 4: committing — background task progress */}
{(stage === "committing" || (taskId && stage !== "done" && stage !== "error")) && (
<div className="rounded-lg border border-rule bg-rule-soft/40 p-4 space-y-2">
<div className="flex items-center gap-2 text-sm text-navy">
<Loader2 className="w-4 h-4 animate-spin" />
<span>{progressStep || "מעבד את ההחלטה לקורפוס"}</span>
</div>
<Progress value={progressStep ? 60 : 30} className="h-1.5" />
</div>
)}
{/* Stage 5: success */}
{stage === "done" && (
<div className="rounded-lg border border-gold/40 bg-gold-wash p-4 flex items-center gap-2 text-gold-deep text-sm">
<CheckCircle2 className="w-4 h-4" />
ההחלטה נוספה לקורפוס בהצלחה.
</div>
)}
{/* Stage 6: error (after a failed analyze or upload) */}
{stage === "error" && (
<div className="space-y-3">
<div className="rounded-lg border border-danger/40 bg-danger-bg p-4 flex items-center gap-2 text-danger text-sm">
<AlertCircle className="w-4 h-4 shrink-0" />
{errorMsg || "שגיאה לא ידועה"}
</div>
<div className="flex gap-2 justify-end">
<Button type="button" variant="ghost"
onClick={() => onOpenChange(false)}>
סגור
</Button>
<Button type="button"
onClick={() => { setStage("pick"); setErrorMsg(""); setFile(null); }}>
נסה קובץ אחר
</Button>
</div>
</div>
)}
</div>
</SheetContent>
</Sheet>
);
}
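
The subjects field in the preview form is free text that `onCommit` splits on commas before sending. A minimal sketch of that parsing as a standalone, testable function follows. `parseSubjects` is a hypothetical name, and the de-duplication step is an addition that the component above does not perform.

```typescript
// Hypothetical helper mirroring the split / trim / filter chain in onCommit.
// The separator class matches the component: ASCII comma and U+060C.
function parseSubjects(raw: string): string[] {
  const out: string[] = [];
  for (const part of raw.split(/[,\u060C]/)) {
    const s = part.trim();
    // Drop blanks and repeats (the repeat check is an addition, see above).
    if (s === "" || out.indexOf(s) !== -1) continue;
    out.push(s);
  }
  return out;
}

// Usage: stray separators and surrounding whitespace are tolerated.
const subjects = parseSubjects(" parking, building lines ,, parking ");
// subjects is ["parking", "building lines"]
```

Keeping the parsing outside the component would make it easy to unit-test the edge cases (empty input, trailing commas) without rendering the sheet.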


@@ -7,10 +7,13 @@
* - GET /corpus → flat list of decisions for the corpus tab / compare tool
* - GET /compare?a=UUID&b=UUID → side-by-side comparison
* - DELETE /corpus/{id} → remove a decision from the corpus
* - POST /api/upload → multipart file → returns sanitized filename
* - POST /analyze → proofread + extract metadata for preview
* - POST /upload → commit a proofread decision to the corpus (task_id)
*/
import { useMutation, useQuery, useQueryClient } from "@tanstack/react-query";
-import { apiRequest } from "./client";
import { ApiError, apiRequest } from "./client";
export type StyleReport = {
corpus: {
@@ -69,6 +72,29 @@ export type CorpusDecision = {
subject_categories: string[];
chars: number;
created_at: string;
// Enriched metadata (added in the corpus-page upgrade).
summary: string;
outcome: string;
key_principles: string[];
appeal_subtype: string;
practice_area: string;
page_count: number;
document_id: string | null;
doc_title: string;
parties: { appellant: string; respondent: string };
legal_citation: string;
lessons_count: number;
};
export type CorpusDecisionPatch = {
decision_number?: string;
decision_date?: string;
subject_categories?: string[];
summary?: string;
outcome?: string;
key_principles?: string[];
appeal_subtype?: string;
practice_area?: string;
};
export type CompareResult = {
@@ -149,3 +175,407 @@ export function useDeleteCorpusEntry() {
},
});
}
// ── Style-agent chat ─────────────────────────────────────────────
export type ChatConversation = {
id: string;
title: string;
style_corpus_id: string | null;
decision_number: string;
claude_session_id: string | null;
message_count: number;
created_at: string;
last_message_at: string;
};
export type ChatMessage = {
id: string;
role: "user" | "assistant";
content: string;
created_at: string;
};
export type ChatHealth = {
reachable: boolean;
status?: number;
url: string;
error?: string;
};
export const chatKeys = {
conversations: () => [...trainingKeys.all, "chat", "conversations"] as const,
conversation: (id: string) =>
[...trainingKeys.all, "chat", "conversations", id] as const,
health: () => [...trainingKeys.all, "chat", "health"] as const,
};
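
`chatKeys.conversation(id)` extends `chatKeys.conversations()` with the conversation id. TanStack Query matches keys by prefix when invalidating (unless `exact: true` is passed), so invalidating the list key also covers every per-conversation query. A simplified model of that matching for string-only keys follows. `isKeyPrefix` is a hypothetical helper, and the value `["training"]` for `trainingKeys.all` is an assumption, since that definition is not shown in this diff.

```typescript
// Simplified model of prefix matching on string-only query keys.
// TanStack Query compares key elements structurally; this sketch only
// handles strings and is meant to illustrate the idea, not replace it.
function isKeyPrefix(prefix: readonly string[], key: readonly string[]): boolean {
  if (prefix.length > key.length) return false;
  for (let i = 0; i < prefix.length; i++) {
    if (prefix[i] !== key[i]) return false;
  }
  return true;
}

const rootKey = ["training"]; // assumed value of trainingKeys.all
const listKey = [...rootKey, "chat", "conversations"];
const detailKey = [...listKey, "abc"]; // "abc" stands in for a conversation id
const healthKey = [...rootKey, "chat", "health"];

// Invalidating listKey would also match detailKey, but not healthKey.
const hitsDetail = isKeyPrefix(listKey, detailKey);
const hitsHealth = isKeyPrefix(listKey, healthKey);
```

This is why `useCreateChat` and `useDeleteChat` only need to invalidate `chatKeys.conversations()`.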
export function useChatConversations() {
return useQuery({
queryKey: chatKeys.conversations(),
queryFn: ({ signal }) =>
apiRequest<ChatConversation[]>("/api/training/chat/conversations", { signal }),
staleTime: 15_000,
});
}
export function useChatConversation(convId: string | null) {
return useQuery({
queryKey: chatKeys.conversation(convId ?? ""),
queryFn: ({ signal }) =>
apiRequest<{ conversation: ChatConversation; messages: ChatMessage[] }>(
`/api/training/chat/conversations/${encodeURIComponent(convId!)}`,
{ signal },
),
enabled: Boolean(convId),
staleTime: 5_000,
});
}
export function useChatHealth() {
return useQuery({
queryKey: chatKeys.health(),
queryFn: ({ signal }) =>
apiRequest<ChatHealth>("/api/training/chat/health", { signal }),
staleTime: 30_000,
retry: false,
});
}
export function useCreateChat() {
const qc = useQueryClient();
return useMutation({
mutationFn: (body: { title?: string; style_corpus_id?: string | null }) =>
apiRequest<ChatConversation>("/api/training/chat/conversations", {
method: "POST",
body,
}),
onSuccess: () => {
qc.invalidateQueries({ queryKey: chatKeys.conversations() });
},
});
}
export function useDeleteChat() {
const qc = useQueryClient();
return useMutation({
mutationFn: (id: string) =>
apiRequest<{ deleted: boolean }>(
`/api/training/chat/conversations/${encodeURIComponent(id)}`,
{ method: "DELETE" },
),
onSuccess: () => {
qc.invalidateQueries({ queryKey: chatKeys.conversations() });
},
});
}
// ── Curator portrait ──────────────────────────────────────────────
export type CuratorPrompt = {
content: string;
filename: string;
bytes: number;
last_modified: number;
gitea_url: string;
};
export type StyleAnalyzerPrompts = {
analysis_prompt: string;
single_decision_prompt: string;
synthesis_prompt: string;
max_input_tokens: number;
};
export type CuratorFinding = {
id: string;
lesson_text: string;
category: string;
applied_to_skill: boolean;
decision_number: string;
decision_date: string;
created_at: string;
};
export type CuratorStats = {
total_findings: number;
decisions_with_findings: number;
decisions_total: number;
findings_applied: number;
recent_findings: CuratorFinding[];
};
export type CuratorProposalInput = {
title: string;
proposed_change: string;
rationale: string;
};
export type CuratorProposalFile = {
filename: string;
bytes: number;
modified_at: number;
};
export const curatorKeys = {
prompt: () => [...trainingKeys.all, "curator", "prompt"] as const,
analyzerPrompt: () => [...trainingKeys.all, "curator", "analyzer-prompt"] as const,
stats: () => [...trainingKeys.all, "curator", "stats"] as const,
proposals: () => [...trainingKeys.all, "curator", "proposals"] as const,
};
export function useCuratorPrompt() {
return useQuery({
queryKey: curatorKeys.prompt(),
queryFn: ({ signal }) =>
apiRequest<CuratorPrompt>("/api/training/curator/prompt", { signal }),
staleTime: 5 * 60_000,
});
}
export function useStyleAnalyzerPrompts() {
return useQuery({
queryKey: curatorKeys.analyzerPrompt(),
queryFn: ({ signal }) =>
apiRequest<StyleAnalyzerPrompts>(
"/api/training/curator/style-analyzer-prompt",
{ signal },
),
staleTime: 5 * 60_000,
});
}
export function useCuratorStats() {
return useQuery({
queryKey: curatorKeys.stats(),
queryFn: ({ signal }) =>
apiRequest<CuratorStats>("/api/training/curator/stats", { signal }),
staleTime: 60_000,
});
}
export function useCuratorProposals() {
return useQuery({
queryKey: curatorKeys.proposals(),
queryFn: ({ signal }) =>
apiRequest<CuratorProposalFile[]>("/api/training/curator/proposals", { signal }),
staleTime: 30_000,
});
}
export function useSubmitCuratorProposal() {
const qc = useQueryClient();
return useMutation({
mutationFn: (body: CuratorProposalInput) =>
apiRequest<{ saved: boolean; filename: string }>(
"/api/training/curator/proposals",
{ method: "POST", body },
),
onSuccess: () => {
qc.invalidateQueries({ queryKey: curatorKeys.proposals() });
},
});
}
// ── Upload flow ──────────────────────────────────────────────────
// Three-step pipeline:
// 1. useUploadFile → POST /api/upload (multipart) → { filename }
// 2. useAnalyzeTraining → POST /api/training/analyze (form) → preview + extracted metadata
// 3. useCommitTrainingUpload → POST /api/training/upload (json) → { task_id }
// Track task_id via useProgress() from documents.ts.
export type UploadFileResponse = {
filename: string; // sanitized, time-prefixed name in UPLOAD_DIR
original_name: string;
size: number;
};
export type AnalyzeTrainingResponse = {
filename: string;
clean_text: string;
preview: string;
decision_number: string;
decision_date: string; // ISO YYYY-MM-DD or ""
subject_categories: string[];
stats: Record<string, unknown>;
chars: number;
};
export type CommitTrainingRequest = {
filename: string;
decision_number: string;
decision_date: string; // YYYY-MM-DD or ""
subject_categories: string[];
title?: string;
};
export type CommitTrainingResponse = { task_id: string };
export function useUploadFile() {
return useMutation({
mutationFn: async (file: File): Promise<UploadFileResponse> => {
const fd = new FormData();
fd.append("file", file);
const res = await fetch("/api/upload", { method: "POST", body: fd });
const contentType = res.headers.get("content-type") ?? "";
const parsed = contentType.includes("application/json")
? await res.json().catch(() => null)
: await res.text().catch(() => null);
if (!res.ok) {
throw new ApiError(
typeof parsed === "object" && parsed && "detail" in parsed
? String((parsed as { detail: unknown }).detail)
: `Upload failed with ${res.status}`,
res.status,
parsed,
);
}
return parsed as UploadFileResponse;
},
});
}
export function useAnalyzeTraining() {
return useMutation({
mutationFn: async (filename: string): Promise<AnalyzeTrainingResponse> => {
const fd = new FormData();
fd.append("filename", filename);
const res = await fetch("/api/training/analyze", {
method: "POST",
body: fd,
});
const contentType = res.headers.get("content-type") ?? "";
const parsed = contentType.includes("application/json")
? await res.json().catch(() => null)
: await res.text().catch(() => null);
if (!res.ok) {
throw new ApiError(
typeof parsed === "object" && parsed && "detail" in parsed
? String((parsed as { detail: unknown }).detail)
: `Analyze failed with ${res.status}`,
res.status,
parsed,
);
}
return parsed as AnalyzeTrainingResponse;
},
});
}
// ── Per-decision lessons ─────────────────────────────────────────
export type DecisionLesson = {
id: string;
style_corpus_id: string;
lesson_text: string;
category: "style" | "structure" | "lexicon" | "tabular" | "general";
source: "manual" | "curator" | "chair" | "style_analyzer";
applied_to_skill: boolean;
created_by: string;
created_at: string;
updated_at: string;
};
export type LessonCreate = {
lesson_text: string;
category?: DecisionLesson["category"];
source?: DecisionLesson["source"];
};
export type LessonPatch = {
lesson_text?: string;
category?: DecisionLesson["category"];
applied_to_skill?: boolean;
};
export const lessonsKeys = {
forCorpus: (corpusId: string) =>
[...trainingKeys.all, "lessons", corpusId] as const,
};
export function useCorpusLessons(corpusId: string | null) {
return useQuery({
queryKey: lessonsKeys.forCorpus(corpusId ?? ""),
queryFn: ({ signal }) =>
apiRequest<DecisionLesson[]>(
`/api/training/corpus/${encodeURIComponent(corpusId!)}/lessons`,
{ signal },
),
enabled: Boolean(corpusId),
staleTime: 30_000,
});
}
export function useAddLesson(corpusId: string) {
const qc = useQueryClient();
return useMutation({
mutationFn: (body: LessonCreate) =>
apiRequest<DecisionLesson>(
`/api/training/corpus/${encodeURIComponent(corpusId)}/lessons`,
{ method: "POST", body },
),
onSuccess: () => {
qc.invalidateQueries({ queryKey: lessonsKeys.forCorpus(corpusId) });
// lessons_count on the corpus row is computed server-side, so
// invalidate the list too — otherwise the badge stays stale.
qc.invalidateQueries({ queryKey: trainingKeys.corpus() });
},
});
}
export function usePatchLesson(corpusId: string) {
const qc = useQueryClient();
return useMutation({
mutationFn: ({ id, patch }: { id: string; patch: LessonPatch }) =>
apiRequest<{ updated: boolean }>(
`/api/training/lessons/${encodeURIComponent(id)}`,
{ method: "PATCH", body: patch },
),
onSuccess: () => {
qc.invalidateQueries({ queryKey: lessonsKeys.forCorpus(corpusId) });
},
});
}
export function useDeleteLesson(corpusId: string) {
const qc = useQueryClient();
return useMutation({
mutationFn: (id: string) =>
apiRequest<{ deleted: boolean }>(
`/api/training/lessons/${encodeURIComponent(id)}`,
{ method: "DELETE" },
),
onSuccess: () => {
qc.invalidateQueries({ queryKey: lessonsKeys.forCorpus(corpusId) });
qc.invalidateQueries({ queryKey: trainingKeys.corpus() });
},
});
}
export function usePatchCorpus() {
const qc = useQueryClient();
return useMutation({
mutationFn: ({ id, patch }: { id: string; patch: CorpusDecisionPatch }) =>
apiRequest<{ updated: boolean; id: string }>(
`/api/training/corpus/${encodeURIComponent(id)}`,
{ method: "PATCH", body: patch },
),
onSuccess: () => {
qc.invalidateQueries({ queryKey: trainingKeys.corpus() });
qc.invalidateQueries({ queryKey: trainingKeys.report() });
},
});
}
export function useCommitTrainingUpload() {
// No onSuccess invalidation here — the row only appears after the
// background task finishes. The dialog watches useProgress(task_id)
// and invalidates trainingKeys when status === "completed".
return useMutation({
mutationFn: (body: CommitTrainingRequest) =>
apiRequest<CommitTrainingResponse>("/api/training/upload", {
method: "POST",
body,
}),
});
}
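
`useUploadFile` and `useAnalyzeTraining` repeat the same logic for turning a failed response body into an error message. A minimal sketch of that logic as one pure function follows. `errorMessageFrom` is a hypothetical name and is not part of this commit; the `detail` field follows the shape the two hooks already check for.

```typescript
// Hypothetical helper. `parsed` is whatever the caller decoded from the
// response body: a JSON value, a text string, or null if decoding failed.
function errorMessageFrom(parsed: unknown, status: number, label: string): string {
  if (typeof parsed === "object" && parsed !== null && "detail" in parsed) {
    // Prefer the backend's own message when the body carries one.
    return String((parsed as { detail: unknown }).detail);
  }
  // Otherwise fall back to a generic message that names the failed step.
  return `${label} failed with ${status}`;
}

// Usage: the two hooks would differ only in the label they pass.
const fromJson = errorMessageFrom({ detail: "duplicate decision_number" }, 409, "Upload");
const fromText = errorMessageFrom("Bad Gateway", 502, "Analyze");
```

Both hooks could then throw `new ApiError(errorMessageFrom(parsed, res.status, "Upload"), res.status, parsed)` (or `"Analyze"`), keeping the message format in one place.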


@@ -12,6 +12,7 @@ import subprocess
import sys
import time
from contextlib import asynccontextmanager
from datetime import date as date_type
from pathlib import Path
from uuid import UUID, uuid4
@@ -945,32 +946,648 @@ async def training_corpus_delete(corpus_id: str):
    return result


def _format_legal_citation(decision_number: str, decision_date: str) -> str:
    """Compose the Israeli ועדת ערר (appeals committee) citation string from corpus metadata.

    Mirrors how decisions are referenced in Daphna's own writing, e.g.
    "ערר 1130-25 ועדת ערר ירושלים (26.4.2026)". A missing decision number
    yields an empty string; a missing or unparseable date is omitted, so
    partially populated rows still produce a readable label.
    """
    if not decision_number:
        return ""
    parts = [f"ערר {decision_number}", "ועדת ערר ירושלים"]
    if decision_date:
        try:
            d = date_type.fromisoformat(decision_date)
            parts.append(f"({d.day}.{d.month}.{d.year})")
        except ValueError:
            pass
    return " ".join(parts)


_PARTIES_PATTERNS = (
    # "העורר: X" or "העוררים: X" (appellant / appellants).
    # Captures up to a newline / end of stanza.
    re.compile(r"העורר(?:ים|ת)?[:\s]+([^\n]{3,120})"),
    re.compile(r"המבקש(?:ים|ת)?[:\s]+([^\n]{3,120})"),
    re.compile(r"בעניין[:\s]+([^\n]{3,120})"),
)

_RESPONDENT_PATTERNS = (
    # "המשיב: X" (respondent), or the line following "נגד" (versus).
    re.compile(r"המשיב(?:ים|ה|ות)?[:\s]+([^\n]{3,120})"),
    re.compile(r"נגד\s*\n+\s*([^\n]{3,120})"),
)


def _extract_parties(text: str) -> dict[str, str]:
    """Best-effort regex extraction of עורר/משיב (appellant/respondent).

    Only the first 5,000 characters of full_text are scanned, because the
    parties are always declared at the top in Israeli legal decisions. The
    result is a hint for display, never authoritative, so a miss returns
    an empty string rather than raising.
    """
    head = (text or "")[:5000]
    appellant = respondent = ""
    for pat in _PARTIES_PATTERNS:
        m = pat.search(head)
        if m:
            # \u2014 is the em dash; trailing punctuation is trimmed.
            appellant = m.group(1).strip(" .,-\u2014")
            break
    for pat in _RESPONDENT_PATTERNS:
        m = pat.search(head)
        if m:
            respondent = m.group(1).strip(" .,-\u2014")
            break
    return {"appellant": appellant, "respondent": respondent}
@app.get("/api/training/corpus")
async def training_corpus_list():
    """List all decisions currently in the style corpus, with enriched metadata.

    Joins to ``documents`` through the ``style_corpus.document_id`` FK.
    Legacy rows with ``document_id IS NULL`` get no match from the LEFT
    JOIN, so they come back with ``page_count`` 0 and an empty ``doc_title``.
    """
    pool = await db.get_pool()
    async with pool.acquire() as conn:
        rows = await conn.fetch(
            """
            SELECT sc.id,
                   sc.decision_number,
                   sc.decision_date,
                   sc.subject_categories,
                   length(sc.full_text) AS chars,
                   substring(sc.full_text from 1 for 5000) AS head_text,
                   sc.summary,
                   sc.outcome,
                   sc.key_principles,
                   sc.appeal_subtype,
                   sc.practice_area,
                   sc.document_id,
                   sc.created_at,
                   d.page_count AS page_count,
                   d.title AS doc_title
            FROM style_corpus sc
            LEFT JOIN documents d ON d.id = sc.document_id
            ORDER BY sc.created_at DESC
            """
        )
    lessons_counts = await db.count_decision_lessons_per_corpus()
    out = []
    for r in rows:
        # JSONB columns may arrive as raw strings; decode defensively.
        cats = r["subject_categories"]
        if isinstance(cats, str):
            try:
                cats = json.loads(cats)
            except json.JSONDecodeError:
                cats = []
        kp = r["key_principles"]
        if isinstance(kp, str):
            try:
                kp = json.loads(kp)
            except json.JSONDecodeError:
                kp = []
        decision_date = str(r["decision_date"]) if r["decision_date"] else ""
        parties = _extract_parties(r["head_text"] or "")
        out.append({
            "id": str(r["id"]),
            "decision_number": r["decision_number"] or "",
            "decision_date": decision_date,
            "subject_categories": cats or [],
            "chars": r["chars"],
            "created_at": r["created_at"].isoformat() if r["created_at"] else "",
            # ── enriched fields ──
            "summary": r["summary"] or "",
            "outcome": r["outcome"] or "",
            "key_principles": kp or [],
            "appeal_subtype": r["appeal_subtype"] or "",
            "practice_area": r["practice_area"] or "",
            "page_count": r["page_count"] or 0,
            "document_id": str(r["document_id"]) if r["document_id"] else None,
            "doc_title": r["doc_title"] or "",
            "parties": parties,
            "legal_citation": _format_legal_citation(r["decision_number"] or "", decision_date),
            "lessons_count": lessons_counts.get(str(r["id"]), 0),
        })
    return out
# ── Style-agent chat (delegated to legal-chat-service on host) ─────
class ChatConversationCreate(BaseModel):
title: str = "שיחה חדשה"
style_corpus_id: str | None = None # optional — scope chat to a decision
class ChatMessageRequest(BaseModel):
content: str
def _conv_to_json(row: dict) -> dict:
"""Serialize a chat_conversations row for the API."""
return {
"id": str(row["id"]),
"title": row.get("title") or "",
"style_corpus_id": str(row["style_corpus_id"]) if row.get("style_corpus_id") else None,
"decision_number": row.get("decision_number") or "",
"claude_session_id": row.get("claude_session_id"),
"message_count": row.get("message_count", 0),
"created_at": row["created_at"].isoformat() if row.get("created_at") else "",
"last_message_at": row["last_message_at"].isoformat() if row.get("last_message_at") else "",
}
def _msg_to_json(row: dict) -> dict:
return {
"id": str(row["id"]),
"role": row["role"],
"content": row["content"],
"created_at": row["created_at"].isoformat() if row.get("created_at") else "",
}
@app.post("/api/training/chat/conversations")
async def chat_create_conversation(body: ChatConversationCreate):
"""Create a new style-agent chat conversation."""
corpus_uuid: UUID | None = None
if body.style_corpus_id:
try:
corpus_uuid = UUID(body.style_corpus_id)
except ValueError:
raise HTTPException(400, "invalid style_corpus_id")
row = await db.create_chat_conversation(
title=body.title.strip() or "שיחה חדשה",
style_corpus_id=corpus_uuid,
)
if not row:
raise HTTPException(500, "failed to create conversation")
return _conv_to_json(row)
@app.get("/api/training/chat/conversations")
async def chat_list_conversations(limit: int = 50):
rows = await db.list_chat_conversations(limit=limit)
return [_conv_to_json(r) for r in rows]
@app.get("/api/training/chat/conversations/{conv_id}")
async def chat_get_conversation(conv_id: str):
try:
cid = UUID(conv_id)
except ValueError:
raise HTTPException(400, "invalid conv_id")
conv = await db.get_chat_conversation(cid)
if not conv:
raise HTTPException(404, "conversation not found")
messages = await db.list_chat_messages(cid)
return {
"conversation": _conv_to_json(conv),
"messages": [_msg_to_json(m) for m in messages],
}
@app.delete("/api/training/chat/conversations/{conv_id}")
async def chat_delete_conversation(conv_id: str):
try:
cid = UUID(conv_id)
except ValueError:
raise HTTPException(400, "invalid conv_id")
result = await db.delete_chat_conversation(cid)
if not result.get("deleted"):
raise HTTPException(404, "conversation not found")
return result
@app.post("/api/training/chat/conversations/{conv_id}/messages")
async def chat_send_message(conv_id: str, body: ChatMessageRequest):
"""Send a user message; stream the assistant response as SSE.
Proxies through ``web.chat_proxy.stream_chat_message`` to the
legal-chat-service running on the host.
"""
try:
cid = UUID(conv_id)
except ValueError:
raise HTTPException(400, "invalid conv_id")
text = (body.content or "").strip()
if not text:
raise HTTPException(400, "content is required")
from web import chat_proxy
return await chat_proxy.stream_chat_message(cid, text)
@app.get("/api/training/chat/health")
async def chat_health():
"""Probe legal-chat-service liveness from inside the container.
Useful when the UI wants to gracefully degrade ("שירות הצ'אט אינו
זמין") instead of letting messages fail mid-stream.
"""
from web import chat_proxy
try:
async with httpx.AsyncClient(timeout=httpx.Timeout(5.0)) as client:
r = await client.get(f"{chat_proxy.CHAT_SERVICE_URL}/health")
return {"reachable": r.status_code == 200, "status": r.status_code,
"url": chat_proxy.CHAT_SERVICE_URL}
except Exception as e:
return {"reachable": False, "error": str(e),
"url": chat_proxy.CHAT_SERVICE_URL}
# ── Curator portrait — read prompt + stats + accept proposals ──────
# The curator agent's prompt is symlinked into Paperclip, but the source
# lives in the legal-ai repo. Resolve via env so the container (where the
# agent file is mounted from a different path) and the host both work.
_AGENTS_DIR = Path(os.environ.get(
"AGENTS_DIR",
str(Path(__file__).resolve().parent.parent / ".claude" / "agents"),
))
_CURATOR_PROPOSALS_DIR = Path(os.environ.get(
"CURATOR_PROPOSALS_DIR",
str(Path(__file__).resolve().parent.parent / "data" / "curator-proposals"),
))
_GITEA_REPO_BASE = os.environ.get(
"GITEA_REPO_BASE",
"https://gitea.nautilus.marcusgroup.org/ezer-mishpati/legal-ai",
)
@app.get("/api/training/curator/prompt")
async def get_curator_prompt():
"""Return the hermes-curator agent's prompt (read-only) + Gitea source URL.
The file is the canonical source of how the curator analyzes Daphna's
final decisions. Changes go through git/Gitea, not the UI — the UI just
surfaces it for transparency.
"""
path = _AGENTS_DIR / "hermes-curator.md"
if not path.exists():
raise HTTPException(404, f"curator prompt not found at {path}")
try:
content = path.read_text(encoding="utf-8")
stat = path.stat()
except OSError as e:
raise HTTPException(500, f"failed to read curator prompt: {e}")
gitea_url = (
f"{_GITEA_REPO_BASE}/src/branch/main/.claude/agents/hermes-curator.md"
)
return {
"content": content,
"filename": path.name,
"bytes": stat.st_size,
"last_modified": stat.st_mtime,
"gitea_url": gitea_url,
}
@app.get("/api/training/curator/style-analyzer-prompt")
async def get_style_analyzer_prompt():
"""Return the system prompt that style_analyzer.py uses to extract patterns.
Surfaces the *training-time* prompt (Claude Opus 1M context) so the
chair can compare it against the curator's post-export prompt. Both
are shown side-by-side in the curator-portrait tab.
"""
# Imported lazily so the service module (which pulls in claude_session
# + db) is only loaded when this endpoint is called. The prompts are the
# ones defined in mcp-server/src/legal_mcp/services/style_analyzer.py.
try:
from legal_mcp.services import style_analyzer
return {
"analysis_prompt": style_analyzer.ANALYSIS_PROMPT,
"single_decision_prompt": style_analyzer.SINGLE_DECISION_PROMPT,
"synthesis_prompt": style_analyzer.SYNTHESIS_PROMPT,
"max_input_tokens": style_analyzer.MAX_INPUT_TOKENS,
}
except Exception as e:
raise HTTPException(500, f"failed to load style_analyzer prompt: {e}")
@app.get("/api/training/curator/stats")
async def get_curator_stats():
"""Cheap aggregate stats over decision_lessons + style_corpus.
Used by the Curator-Portrait tab to show "10 curator findings across 24
decisions". We deliberately keep this server-side and aggregate so the
UI can render a single card without fanning out N queries.
"""
pool = await db.get_pool()
async with pool.acquire() as conn:
total_lessons = await conn.fetchval(
"SELECT count(*) FROM decision_lessons WHERE source = 'curator'"
)
decisions_with_findings = await conn.fetchval(
"SELECT count(DISTINCT style_corpus_id) FROM decision_lessons "
"WHERE source = 'curator'"
)
total_corpus = await conn.fetchval("SELECT count(*) FROM style_corpus")
applied = await conn.fetchval(
"SELECT count(*) FROM decision_lessons "
"WHERE source = 'curator' AND applied_to_skill = true"
)
# Last 10 curator findings — newest first
recent_rows = await conn.fetch(
"""
SELECT dl.id, dl.lesson_text, dl.category, dl.applied_to_skill,
dl.created_at,
sc.decision_number, sc.decision_date
FROM decision_lessons dl
JOIN style_corpus sc ON sc.id = dl.style_corpus_id
WHERE dl.source = 'curator'
ORDER BY dl.created_at DESC
LIMIT 10
"""
)
return {
"total_findings": total_lessons or 0,
"decisions_with_findings": decisions_with_findings or 0,
"decisions_total": total_corpus or 0,
"findings_applied": applied or 0,
"recent_findings": [
{
"id": str(r["id"]),
"lesson_text": r["lesson_text"],
"category": r["category"],
"applied_to_skill": bool(r["applied_to_skill"]),
"decision_number": r["decision_number"] or "",
"decision_date": str(r["decision_date"]) if r["decision_date"] else "",
"created_at": r["created_at"].isoformat() if r["created_at"] else "",
}
for r in recent_rows
],
}
class CuratorProposal(BaseModel):
title: str
proposed_change: str # markdown — what to change in the prompt
rationale: str # markdown — why
@app.post("/api/training/curator/proposals")
async def create_curator_proposal(body: CuratorProposal):
"""Save a proposed change to the curator prompt as a file on disk.
No automatic commit, no overwrite — the chair (chaim) reviews the
file manually and applies it through git. This is intentional: the
prompt is too load-bearing to mutate from a web UI.
"""
title = (body.title or "").strip()
if not title:
raise HTTPException(400, "title is required")
if not body.proposed_change.strip():
raise HTTPException(400, "proposed_change is required")
_CURATOR_PROPOSALS_DIR.mkdir(parents=True, exist_ok=True)
# Slug-ish filename — strip anything that isn't a Hebrew letter, ASCII
# letter, digit, hyphen, or underscore. Hebrew letters are explicitly
# allowed because most proposals will be in Hebrew.
slug = re.sub(r"[^\w֐-׿\-]+", "-", title)[:60].strip("-_") or "proposal"
today = date_type.today().isoformat()
fname = f"{today}-{slug}.md"
path = _CURATOR_PROPOSALS_DIR / fname
# If a proposal with the same slug already exists today, append a
# numeric suffix so we don't silently overwrite.
idx = 2
while path.exists():
path = _CURATOR_PROPOSALS_DIR / f"{today}-{slug}-{idx}.md"
idx += 1
md = (
f"# הצעת שינוי לפרומפט hermes-curator\n\n"
f"- **תאריך:** {today}\n"
f"- **כותרת:** {title}\n\n"
f"## שינוי מוצע\n\n{body.proposed_change.strip()}\n\n"
f"## נימוק\n\n{body.rationale.strip() or '(לא ניתן)'}\n"
)
try:
path.write_text(md, encoding="utf-8")
except OSError as e:
raise HTTPException(500, f"failed to write proposal: {e}")
return {
"saved": True,
"filename": path.name,
"path": str(path),
"bytes": len(md.encode("utf-8")),
}
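The filename logic in `create_curator_proposal` can be exercised in isolation. The sketch below is an assumption-labelled reduction, not the endpoint itself: `proposal_path` is a hypothetical helper, and the regex drops the explicit Hebrew range because Python's `\w` already matches Unicode letters (Hebrew included) on `str` patterns.

```python
import re
import tempfile
from pathlib import Path


def proposal_path(directory: Path, title: str, today: str) -> Path:
    """Pick a non-colliding '<date>-<slug>.md' path inside `directory`."""
    # Collapse every run of non-word, non-hyphen characters into one hyphen.
    slug = re.sub(r"[^\w\-]+", "-", title)[:60].strip("-_") or "proposal"
    path = directory / f"{today}-{slug}.md"
    idx = 2
    # Same-day duplicates get -2, -3, ... so nothing is silently overwritten.
    while path.exists():
        path = directory / f"{today}-{slug}-{idx}.md"
        idx += 1
    return path


with tempfile.TemporaryDirectory() as tmp:
    d = Path(tmp)
    first = proposal_path(d, "Tighten the opening paragraph!", "2025-01-01")
    first.write_text("x", encoding="utf-8")
    second = proposal_path(d, "Tighten the opening paragraph!", "2025-01-01")
    names = (first.name, second.name)
    hebrew = proposal_path(d, "שינוי פתיח", "2025-01-01").name

print(names, hebrew)
```

Note that the existence check and the later write are two separate steps, so two simultaneous requests with the same title could still race; for a single-chair workflow that is likely acceptable.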
@app.get("/api/training/curator/proposals")
async def list_curator_proposals():
"""List proposed-change files in data/curator-proposals/, newest first."""
if not _CURATOR_PROPOSALS_DIR.exists():
return []
items = []
for p in sorted(_CURATOR_PROPOSALS_DIR.iterdir(),
key=lambda f: f.stat().st_mtime, reverse=True):
if not p.is_file() or p.suffix.lower() != ".md":
continue
stat = p.stat()
items.append({
"filename": p.name,
"bytes": stat.st_size,
"modified_at": stat.st_mtime,
})
return items
# ── Per-decision lessons (decision_lessons table) ──────────────────
class LessonCreate(BaseModel):
lesson_text: str
category: str = "general"
source: str = "manual"
class LessonPatch(BaseModel):
lesson_text: str | None = None
category: str | None = None
applied_to_skill: bool | None = None
_LESSON_CATEGORIES = {"style", "structure", "lexicon", "tabular", "general"}
_LESSON_SOURCES = {"manual", "curator", "chair", "style_analyzer"}
def _lesson_to_json(row: dict) -> dict:
return {
"id": str(row["id"]),
"style_corpus_id": str(row["style_corpus_id"]),
"lesson_text": row["lesson_text"],
"category": row["category"],
"source": row["source"],
"applied_to_skill": bool(row["applied_to_skill"]),
"created_by": row.get("created_by", ""),
"created_at": row["created_at"].isoformat() if row.get("created_at") else "",
"updated_at": row["updated_at"].isoformat() if row.get("updated_at") else "",
}
@app.get("/api/training/corpus/{corpus_id}/lessons")
async def list_corpus_lessons(corpus_id: str):
try:
cid = UUID(corpus_id)
except ValueError:
raise HTTPException(400, "invalid corpus_id")
rows = await db.list_decision_lessons(cid)
return [_lesson_to_json(r) for r in rows]
@app.post("/api/training/corpus/{corpus_id}/lessons")
async def add_corpus_lesson(corpus_id: str, body: LessonCreate):
try:
cid = UUID(corpus_id)
except ValueError:
raise HTTPException(400, "invalid corpus_id")
text = (body.lesson_text or "").strip()
if not text:
raise HTTPException(400, "lesson_text is required")
if body.category not in _LESSON_CATEGORIES:
raise HTTPException(400, f"invalid category; allowed: {sorted(_LESSON_CATEGORIES)}")
if body.source not in _LESSON_SOURCES:
raise HTTPException(400, f"invalid source; allowed: {sorted(_LESSON_SOURCES)}")
row = await db.add_decision_lesson(
cid, lesson_text=text, category=body.category, source=body.source,
)
if not row:
raise HTTPException(500, "failed to insert lesson")
return _lesson_to_json(row)
@app.patch("/api/training/lessons/{lesson_id}")
async def patch_corpus_lesson(lesson_id: str, body: LessonPatch):
try:
lid = UUID(lesson_id)
except ValueError:
raise HTTPException(400, "invalid lesson_id")
if body.category is not None and body.category not in _LESSON_CATEGORIES:
raise HTTPException(400, f"invalid category; allowed: {sorted(_LESSON_CATEGORIES)}")
result = await db.update_decision_lesson(
lid,
lesson_text=body.lesson_text,
category=body.category,
applied_to_skill=body.applied_to_skill,
)
if not result.get("updated"):
if result.get("reason") == "not found":
raise HTTPException(404, "lesson not found")
return result # "nothing to update" — 200 with reason
return result
@app.delete("/api/training/lessons/{lesson_id}")
async def delete_corpus_lesson(lesson_id: str):
try:
lid = UUID(lesson_id)
except ValueError:
raise HTTPException(400, "invalid lesson_id")
result = await db.delete_decision_lesson(lid)
if not result.get("deleted"):
raise HTTPException(404, "lesson not found")
return result
@app.get("/api/training/corpus/{corpus_id}/full-text")
async def training_corpus_full_text(corpus_id: str):
"""Return the proofread full_text for a single corpus row.
Kept out of the list endpoint because full_text is large (50K-650K chars
per decision) and the table view only needs counts. The drawer fetches
it on demand when the chair opens the "content" tab.
"""
try:
cid = UUID(corpus_id)
except ValueError:
raise HTTPException(400, "invalid corpus_id")
pool = await db.get_pool()
async with pool.acquire() as conn:
row = await conn.fetchrow(
"SELECT decision_number, full_text FROM style_corpus WHERE id = $1",
cid,
)
if not row:
raise HTTPException(404, "corpus row not found")
return {
"id": corpus_id,
"decision_number": row["decision_number"] or "",
"full_text": row["full_text"] or "",
}
class TrainingCorpusPatch(BaseModel):
"""Editable metadata fields on a style_corpus row.
full_text is intentionally NOT editable — the corpus is write-once.
For corrections, re-upload the decision via /api/training/upload.
"""
decision_number: str | None = None
decision_date: str | None = None # ISO YYYY-MM-DD, or "" to clear
subject_categories: list[str] | None = None
summary: str | None = None
outcome: str | None = None
key_principles: list[str] | None = None
appeal_subtype: str | None = None
practice_area: str | None = None
@app.patch("/api/training/corpus/{corpus_id}")
async def training_corpus_patch(corpus_id: str, patch: TrainingCorpusPatch):
"""Update metadata fields on a corpus row. Only provided fields are touched."""
try:
cid = UUID(corpus_id)
except ValueError:
raise HTTPException(400, "invalid corpus_id")
fields = patch.model_dump(exclude_none=True)
if not fields:
return {"updated": False, "reason": "no fields to update"}
# Coerce decision_date "" → SQL NULL, otherwise parse as DATE.
if "decision_date" in fields:
v = fields["decision_date"]
if v == "":
fields["decision_date"] = None
else:
try:
fields["decision_date"] = date_type.fromisoformat(v)
except ValueError as e:
raise HTTPException(400, f"invalid decision_date: {e}")
# subject_categories + key_principles are JSONB columns.
if "subject_categories" in fields:
fields["subject_categories"] = json.dumps(fields["subject_categories"])
if "key_principles" in fields:
fields["key_principles"] = json.dumps(fields["key_principles"])
# Build a positional UPDATE — asyncpg doesn't support named parameters.
cols = list(fields.keys())
set_clause = ", ".join(f"{c} = ${i + 2}" for i, c in enumerate(cols))
values = [fields[c] for c in cols]
pool = await db.get_pool()
async with pool.acquire() as conn:
result = await conn.fetchrow(
f"UPDATE style_corpus SET {set_clause} "
f"WHERE id = $1 "
f"RETURNING id, decision_number, decision_date, summary, outcome",
cid, *values,
)
if not result:
raise HTTPException(404, "corpus row not found")
return {
"updated": True,
"id": str(result["id"]),
"decision_number": result["decision_number"] or "",
"decision_date": str(result["decision_date"]) if result["decision_date"] else "",
"summary_len": len(result["summary"] or ""),
"outcome_len": len(result["outcome"] or ""),
}
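The positional-parameter UPDATE built in `training_corpus_patch` can be sketched as a pure function, which makes the `$1`/`$2…` numbering easy to check. `build_update` and `ALLOWED_COLUMNS` are hypothetical names introduced for illustration; the allowlist is an added safeguard assumption, since in the endpoint the column names already come from the Pydantic model's fields.

```python
# Hypothetical allowlist mirroring the TrainingCorpusPatch fields.
ALLOWED_COLUMNS = {
    "decision_number", "decision_date", "subject_categories", "summary",
    "outcome", "key_principles", "appeal_subtype", "practice_area",
}


def build_update(fields: dict) -> tuple[str, list]:
    """Return (sql, values) for an asyncpg-style positional UPDATE.

    $1 is reserved for the row id, so the first column binds to $2.
    """
    unknown = set(fields) - ALLOWED_COLUMNS
    if unknown:
        # Column names are interpolated into SQL, so reject anything unexpected.
        raise ValueError(f"unexpected columns: {sorted(unknown)}")
    cols = list(fields)
    set_clause = ", ".join(f"{c} = ${i + 2}" for i, c in enumerate(cols))
    sql = f"UPDATE style_corpus SET {set_clause} WHERE id = $1"
    return sql, [fields[c] for c in cols]


sql, values = build_update({"summary": "short summary", "outcome": "accepted"})
print(sql)
print(values)
```

The caller would then pass the id first and unpack the values after it, as the endpoint does with `cid, *values`.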
# Headers that defeat proxy buffering for SSE streams. `X-Accel-Buffering: no`

web/chat_proxy.py Normal file

@@ -0,0 +1,176 @@
"""FastAPI ↔ legal-chat-service streaming bridge.
The browser hits ``/api/training/chat/conversations/{id}/messages`` on
the legal-ai container. The container is sealed off from the host's
``claude`` CLI (intentional — see ``claude_session.py`` docstring), so
we forward each request to the pm2-managed ``legal-chat-service`` on the
host, reached through the Docker host gateway (``host.docker.internal:8770``).
Responsibilities:
- Save the user message to ``chat_messages`` before streaming starts.
- Open an HTTP streaming connection to the host service.
- Forward each SSE event to the browser as-is, accumulating the
assistant text and any ``session_id`` so we can persist them once
the stream closes.
- Persist the assistant turn + the CLI's session_id at end-of-stream.
"""
from __future__ import annotations
import json
import logging
import os
from typing import AsyncIterator
from uuid import UUID
import httpx
from fastapi import HTTPException
from fastapi.responses import StreamingResponse
from legal_mcp.services import db
from web import chat_system_prompt
logger = logging.getLogger(__name__)
# legal-chat-service lives on the host. In the container we reach it via
# host.docker.internal — which requires ``extra_hosts: host.docker.internal:host-gateway``
# in the Coolify service definition. Set ``CHAT_SERVICE_URL`` to override
# (handy for local dev outside Docker).
CHAT_SERVICE_URL = os.environ.get(
"CHAT_SERVICE_URL",
"http://host.docker.internal:8770",
)
CHAT_SERVICE_TIMEOUT_S = float(os.environ.get("CHAT_SERVICE_TIMEOUT_S", "3600"))
_SSE_HEADERS = {
"Cache-Control": "no-cache, no-transform",
"X-Accel-Buffering": "no",
"Connection": "keep-alive",
}
async def stream_chat_message(
conversation_id: UUID,
user_message: str,
) -> StreamingResponse:
"""Open SSE stream, forward events, persist when done.
Returns a FastAPI StreamingResponse the route can return directly.
"""
conv = await db.get_chat_conversation(conversation_id)
if not conv:
raise HTTPException(404, "conversation not found")
# Persist the user turn immediately so a network drop doesn't lose it.
await db.add_chat_message(
conversation_id, role="user", content=user_message,
)
is_first_turn = not conv.get("claude_session_id")
system_block: str | None = None
if is_first_turn:
try:
system_block = await chat_system_prompt.build_system_prompt(
corpus_id=conv.get("style_corpus_id"),
)
except Exception as e:
logger.exception("system prompt build failed")
raise HTTPException(500, f"system prompt failed: {e}")
payload = {
"prompt": user_message,
"system": system_block,
"resume_session_id": conv.get("claude_session_id"),
}
async def proxy_stream() -> AsyncIterator[bytes]:
accumulated_text: list[str] = []
events_log: list[dict] = []
new_session_id: str | None = None
try:
timeout_cfg = httpx.Timeout(
CHAT_SERVICE_TIMEOUT_S,
connect=10.0,
read=CHAT_SERVICE_TIMEOUT_S,
)
async with httpx.AsyncClient(timeout=timeout_cfg) as client:
async with client.stream(
"POST",
f"{CHAT_SERVICE_URL}/chat/start",
json=payload,
) as upstream:
if upstream.status_code != 200:
body = await upstream.aread()
msg = body.decode("utf-8", errors="replace")[:300]
err = {"type": "error",
"message": f"chat-service {upstream.status_code}: {msg}"}
yield f"data: {json.dumps(err, ensure_ascii=False)}\n\n".encode("utf-8")
return
async for line in upstream.aiter_lines():
if not line:
yield b"\n"
continue
# Forward verbatim so the browser sees the same
# SSE framing the host emits.
out = line + "\n"
yield out.encode("utf-8")
# Mirror events: capture text + session_id for
# persistence. The line starts with "data: <json>"
# so we strip the prefix before parsing.
if line.startswith("data: "):
try:
event = json.loads(line[len("data: "):])
except json.JSONDecodeError:
continue
events_log.append(event)
t = event.get("type")
if t == "session_id" and event.get("value"):
new_session_id = event["value"]
elif t == "text_delta" and event.get("text"):
accumulated_text.append(event["text"])
elif t == "done" and event.get("text"):
if not accumulated_text:
accumulated_text.append(event["text"])
except httpx.ConnectError:
err = {
"type": "error",
"message": (
f"לא ניתן להגיע ל-legal-chat-service בכתובת {CHAT_SERVICE_URL}. "
"ודא ש-pm2 מריץ אותו: `pm2 status legal-chat-service`."
),
}
yield f"data: {json.dumps(err, ensure_ascii=False)}\n\n".encode("utf-8")
return
except Exception as e:
logger.exception("chat proxy failed")
err = {"type": "error", "message": str(e)}
yield f"data: {json.dumps(err, ensure_ascii=False)}\n\n".encode("utf-8")
return
# End of stream — persist the assistant turn.
try:
full_text = "".join(accumulated_text).strip()
if full_text:
await db.add_chat_message(
conversation_id,
role="assistant",
content=full_text,
raw_events=events_log,
)
if new_session_id:
await db.update_chat_conversation_session_id(
conversation_id, new_session_id,
)
except Exception:
logger.exception("failed to persist assistant turn for conv=%s", conversation_id)
return StreamingResponse(
proxy_stream(),
media_type="text/event-stream",
headers=_SSE_HEADERS,
)
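The mirroring step inside `proxy_stream` (parse each `data:` line, accumulate text, remember the session id) can be tested without a network by feeding it a list of lines. This is a sketch under the event shapes visible in the diff (`session_id`/`value`, `text_delta`/`text`, `done`/`text`); `collect_stream` is a hypothetical name, and the `isinstance` guard is an addition not present in the original.

```python
import json
from typing import Iterable, Optional


def collect_stream(lines: Iterable[str]) -> tuple[str, Optional[str]]:
    """Return (assistant_text, session_id) gathered from SSE lines."""
    text_parts: list[str] = []
    session_id: Optional[str] = None
    for line in lines:
        if not line.startswith("data: "):
            continue  # blank separators, comments, other SSE fields
        try:
            event = json.loads(line[len("data: "):])
        except json.JSONDecodeError:
            continue  # forwardable to the browser, but not persistable
        if not isinstance(event, dict):
            continue
        kind = event.get("type")
        if kind == "session_id" and event.get("value"):
            session_id = event["value"]
        elif kind == "text_delta" and event.get("text"):
            text_parts.append(event["text"])
        elif kind == "done" and event.get("text") and not text_parts:
            # Fall back to the final text only if no deltas were seen.
            text_parts.append(event["text"])
    return "".join(text_parts).strip(), session_id


sample_lines = [
    'data: {"type": "session_id", "value": "abc"}',
    "",
    'data: {"type": "text_delta", "text": "Hello "}',
    "data: not-json",
    'data: {"type": "text_delta", "text": "world"}',
    'data: {"type": "done", "text": "Hello world"}',
]
result = collect_stream(sample_lines)
print(result)
```

Keeping the accumulation separate from the forwarding loop means the persistence rules (deltas win over the `done` payload) can be unit-tested independently of httpx.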

web/chat_system_prompt.py Normal file

@@ -0,0 +1,205 @@
"""Compose the system prompt the style-chat agent receives.
The chat runs against the local ``claude`` CLI on the host (via
legal-chat-service). We assemble a once-per-conversation system block
that gives the agent everything it needs to discuss decisions in
Daphna's voice:
- The style guide (``skills/decision/SKILL.md``) — how she writes
- The lessons file (``docs/legal-decision-lessons.md``) — what we've
learned across the corpus
- The corpus-analysis report (``docs/corpus-analysis.md``) — the
structural map of 24+ decisions
- A summary of every style_corpus row (number, date, subjects,
chars + summary if extracted) so the agent can reason about the
whole corpus without us shipping all of it inline
- Optional: when the conversation is scoped to a specific decision
(``style_corpus_id``), append its full_text so the chat can dive
into the text directly
Sent **once**, with the first message of a conversation, while no
``claude_session_id`` is stored yet. On subsequent
messages the legal-chat-service uses ``claude --resume <session_id>``
and the on-disk CLI session keeps the system context intact — no need
to re-ship the 100K+ chars of skills + lessons every turn.
"""
from __future__ import annotations
import logging
import os
from pathlib import Path
from uuid import UUID
from legal_mcp.services import db
logger = logging.getLogger(__name__)
# The reference files live in the repo at known paths. In the
# container they're mounted alongside the code, so resolve relative
# to the repo root, two levels up from this file (web/chat_system_prompt.py).
_REPO_ROOT = Path(os.environ.get(
"LEGAL_AI_REPO_ROOT",
str(Path(__file__).resolve().parent.parent),
))
_SKILLS_PATH = _REPO_ROOT / "skills" / "decision" / "SKILL.md"
_LESSONS_PATH = _REPO_ROOT / "docs" / "legal-decision-lessons.md"
_CORPUS_ANALYSIS_PATH = _REPO_ROOT / "docs" / "corpus-analysis.md"
def _safe_read(path: Path, cap_chars: int = 50_000) -> str:
"""Read a file (UTF-8) or return a marker that it's missing.
The cap protects against accidentally injecting an enormous file —
even at 50K, a single source file is the lion's share of the
system prompt budget.
"""
try:
text = path.read_text(encoding="utf-8")
except FileNotFoundError:
return f"(קובץ {path.name} לא נמצא בנתיב {path})"
except OSError as e:
logger.warning("could not read %s: %s", path, e)
return f"(שגיאה בקריאת {path.name}: {e})"
if len(text) > cap_chars:
return text[:cap_chars] + f"\n\n[... חתך ב-{cap_chars:,} תווים מתוך {len(text):,}]"
return text
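The capping behaviour of `_safe_read` is easy to verify against a temporary file. The sketch below is a reduced stand-in: `read_capped` is a hypothetical name and its marker strings are in English for readability, whereas the original emits Hebrew markers.

```python
import tempfile
from pathlib import Path


def read_capped(path: Path, cap_chars: int = 50_000) -> str:
    """Read UTF-8 text, truncating to cap_chars with a visible marker."""
    try:
        text = path.read_text(encoding="utf-8")
    except FileNotFoundError:
        # Return a placeholder so the prompt still assembles.
        return f"(file {path.name} not found)"
    if len(text) > cap_chars:
        return text[:cap_chars] + f"\n\n[... cut at {cap_chars:,} of {len(text):,} chars]"
    return text


with tempfile.TemporaryDirectory() as tmp:
    sample = Path(tmp) / "SKILL.md"
    sample.write_text("x" * 120, encoding="utf-8")
    capped = read_capped(sample, cap_chars=100)
    missing = read_capped(Path(tmp) / "nope.md")

print(capped[-30:], missing)
```

Returning a marker instead of raising lets the agent see that a source was missing or cut, which it could otherwise not infer from the prompt.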
async def _corpus_summary_block() -> str:
"""Compact one-row-per-decision summary the agent can scan."""
pool = await db.get_pool()
async with pool.acquire() as conn:
records = await conn.fetch(
"""
SELECT decision_number, decision_date, appeal_subtype,
subject_categories, length(full_text) AS chars,
coalesce(summary, '') AS summary,
coalesce(outcome, '') AS outcome
FROM style_corpus
ORDER BY decision_date NULLS LAST
"""
)
if not records:
return "(הקורפוס ריק)"
lines = []
for r in records:
cats = r["subject_categories"]
if isinstance(cats, str):
import json as _json
try:
cats = _json.loads(cats)
except _json.JSONDecodeError:
cats = []
cats_str = ", ".join(cats or [])
date_str = str(r["decision_date"]) if r["decision_date"] else ""
summary = (r["summary"] or "").strip()
outcome = (r["outcome"] or "").strip()
head = f"- **{r['decision_number'] or ''}** ({date_str}) [{r['appeal_subtype'] or ''}] · {r['chars']:,} תווים"
meta = f" נושאים: {cats_str}"
body = ""
if summary:
body = f"\n תקציר: {summary}"
if outcome:
body += f" — תוצאה: {outcome}"
elif outcome:
body = f"\n תוצאה: {outcome}"
lines.append(head + "\n" + meta + body)
return "\n".join(lines)
async def _decision_full_text(corpus_id: UUID) -> str:
pool = await db.get_pool()
async with pool.acquire() as conn:
row = await conn.fetchrow(
"SELECT decision_number, decision_date, full_text "
"FROM style_corpus WHERE id = $1",
corpus_id,
)
if not row:
return ""
header = f"# החלטה {row['decision_number'] or ''} ({row['decision_date'] or ''})\n\n"
return header + (row["full_text"] or "")
SYSTEM_PROMPT_HEADER = """\
אתה סוכן הסגנון של עו"ד דפנה תמיר, יו"ר ועדת הערר לתכנון ובניה — מחוז ירושלים.
תפקידך: לעזור לחיים (העוזר המקצועי של דפנה) להבין, לנתח ולחדד את הסגנון
של דפנה. אתה לא כותב החלטות חדשות; אתה דן בסגנון של החלטות קיימות,
מזהה דפוסים, מקפיד שהכותבים העתידיים (ה-writer agent) יישארו נאמנים
לקולה.
יש לך גישה ל:
1. **מדריך הסגנון** של דפנה (skills/decision/SKILL.md) — איך היא כותבת.
2. **הלקחים הגנריים** מהקורפוס (docs/legal-decision-lessons.md) — מה
למדנו לאורך 24+ החלטות. **חובה** להישען על הקבצים האלה כשאתה דן
בסגנון, ולא להמציא תובנות חדשות מהאוויר.
3. **ניתוח הקורפוס** המבני (docs/corpus-analysis.md) — מפת תוכן ופערים.
4. **רשימת ההחלטות בקורפוס** (למטה) — סקירה תמציתית של כל החלטה
שעלתה ל-style_corpus.
5. **טקסט מלא של החלטה ספציפית** (אם השיחה הוצמדה ל-style_corpus_id).
כללי תקשורת:
- כל התשובות בעברית.
- חיים יושב מולך, לא דפנה — אבל המטרה היא לחדד את הסגנון *של דפנה*.
- אם חיים שואל "האם פסקה X מתאימה לסגנון של דפנה?" — תן ניתוח מנומק
שמסתמך על SKILL.md ועל החלטות הקורפוס. אל תמציא ראיות.
- אם אתה צריך החלטה ספציפית שאין בקורפוס — הודע לחיים שיצרף אותה.
- אם חיים אומר לך משהו חדש על דפנה ("דפנה אומרת לעולם אל תפתח החלטה
במילה X") — שמור את זה בזיכרון השיחה; אם זה מצדיק תיעוד קבוע, הצע
לחיים להוסיף את זה כ-decision_lesson (POST /api/training/corpus/{corpus_id}/lessons)
או כתוספת ל-SKILL.md.
- אל תיתן לעצמך אישיות מומצאת — אתה כלי-עזר מקצועי, לא חבר.
"""
async def build_system_prompt(
*,
corpus_id: UUID | None = None,
include_corpus_summary: bool = True,
) -> str:
"""Assemble the full system prompt for a new chat conversation.
Args:
corpus_id: When set, the full_text of that decision is appended
so the chat can dive into the text.
include_corpus_summary: Set False for low-context chats (e.g.
quick "what does Daphna do at the end of a betterment-levy
decision?" — no need to ship 24 summaries).
"""
parts: list[str] = [SYSTEM_PROMPT_HEADER]
parts.append("\n## מדריך הסגנון (skills/decision/SKILL.md)\n")
parts.append(_safe_read(_SKILLS_PATH, cap_chars=40_000))
parts.append("\n\n## לקחים מהקורפוס (docs/legal-decision-lessons.md)\n")
parts.append(_safe_read(_LESSONS_PATH, cap_chars=30_000))
parts.append("\n\n## ניתוח קורפוס מבני (docs/corpus-analysis.md)\n")
parts.append(_safe_read(_CORPUS_ANALYSIS_PATH, cap_chars=15_000))
if include_corpus_summary:
parts.append("\n\n## רשימת ההחלטות בקורפוס הסגנון\n")
try:
parts.append(await _corpus_summary_block())
except Exception as e:
logger.warning("corpus summary failed: %s", e)
parts.append("(שגיאה בטעינת רשימת הקורפוס)")
if corpus_id is not None:
parts.append("\n\n## ההחלטה הספציפית בדיון (full_text)\n")
try:
txt = await _decision_full_text(corpus_id)
if txt:
parts.append(txt[:200_000]) # hard cap
else:
parts.append("(לא נמצאה החלטה — בדוק את ה-corpus_id)")
except Exception as e:
logger.warning("decision full_text failed: %s", e)
parts.append("(שגיאה בטעינת ההחלטה)")
return "\n".join(parts)