Derive practice_area from the case (case row first, then a number-prefix fallback); block
only when a case is present but its practice area cannot be determined; case-less/exploratory
search stays cross-domain.
Verified offline (test_search_domain_scope.py 5/5). Closes PR #10.
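A minimal sketch of the derivation order described above, not the shipped code. The function name, the exception, the `case_row` shape, and the prefix table are all hypothetical; only the order (case row, then number prefix, then block) comes from this message.

```python
from typing import Mapping, Optional


class UndeterminedPracticeArea(Exception):
    """A case was supplied but its practice area could not be derived."""


# Hypothetical prefix table, for illustration only; the real mapping is not shown here.
PREFIX_TO_AREA: Mapping[str, str] = {"1": "area_a", "8": "area_b"}


def derive_practice_area(
    case_number: Optional[str],
    case_row: Optional[Mapping[str, str]],
    prefix_map: Mapping[str, str] = PREFIX_TO_AREA,
) -> Optional[str]:
    # Case-less / exploratory search: no domain filter, stays cross-domain.
    if not case_number:
        return None
    # 1. Prefer the value stored on the case row.
    if case_row and case_row.get("practice_area"):
        return case_row["practice_area"]
    # 2. Fall back to the case-number prefix.
    for prefix, area in prefix_map.items():
        if case_number.startswith(prefix):
            return area
    # 3. Case present but undeterminable: block instead of searching every domain.
    raise UndeterminedPracticeArea(case_number)
```

Returning `None` for the case-less path and raising for the undeterminable path keeps the two situations distinct for the caller.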
Granularity (epic-per-fix-unit + subtask-per-gap) and dependency-aware/WSJF
prioritization both backed by ≥3 authoritative sources (SAFe/Pichler/OWASP/CVSS;
Wake-INVEST/Cohn/Agile-Alliance/Atlassian/SAFe).
Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
X2/X3/X4 invariants are facts about this system's own integration/ops (no external
authority); their source-of-authority (מקור-סמכות) is the project runbooks, tied to a
global engineering invariant.
Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
Per chair clarification: the ≥3-authoritative-source verification protocol governs
ENGINEERING/architecture decisions only (G1–G10). Legal-domain content (G11) is the
authority of the chair + project docs (block-schema, decision-methodology, lessons,
skills/decision) — NOT externally triple-sourced.
- §2/§4/§5 scoped to engineering invariants; added the two-authority distinction
- G11 reframed: source-of-authority = chair + project docs; removed FJC/South Bucks/
1958-statute as "sources to verify" and the UNVERIFIED flag
- Removed the "open items — primary-source verification" section (the over-application)
- Pruned now-orphaned legal sources from the appendix (kept NCSC/CEPEJ/FJC for G9/G10)
Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
13 tasks across 3 phases (keystone constitution → lifecycle files → cross-cutting),
each verification-gated (≥3 sources or UNVERIFIED+escalate) with review checkpoints.
Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
Establishes the foundation to fix a recurring root-cause failure class
(non-canonical identifiers, asymmetric ingest paths, silent manual gates):
- Confirmed system mission (quasi-judicial decision assistant; human decides)
- Decomposition into 5 sub-projects (spec → audit → integrity layer → re-check → process agents)
- spec-set structure under docs/spec/ (lifecycle-organized + cross-cutting files)
- 11 global invariants + engineering rules, each backed by ≥3 authoritative sources
(NCSC/JTC, FJC, CEPEJ, South Bucks; RAG/Lewis, Manning IR, Elastic/Pinecone/Weaviate;
DAMA-DMBOK, ISO 8000, ISO 15489, Kleppmann, Codd, Fowler)
- 3-source verification protocol; UNVERIFIED items escalated, not decided solo
Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
Reported: an agent claimed the case had no documents because document_list
returned empty — but the documents exist. Root cause: get_case_by_number did
an exact `WHERE case_number = $1`, so any formatting variant of the number
silently failed to resolve. Verified on 8137-24 (9 docs): "8137/24",
"ערר 8137-24", leading/trailing space, and "בל\"מ 8126/03/25" all returned
"תיק לא נמצא" ("case not found"), which the agent read as "no documents" and went blind.
Add _normalize_case_number (strip leading proceeding-type prefix to the first
digit, trim, unify '/'→'-') and a normalized fallback in the lookup query
(exact match preferred via ORDER BY). One fix covers every case_number-scoped
tool (document_list, extract_references, search_case_documents, get_claims,
drafting, ...). Bogus numbers still correctly resolve to "not found". (#58)
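A rough sketch of the normalization described above, assuming the three steps listed (trim, strip everything before the first digit, unify '/' to '-'). The function bodies are illustrative, and `resolve_case` is a hypothetical in-memory stand-in for the SQL fallback, not the actual query.

```python
import re
from typing import Iterable, Optional


def normalize_case_number(raw: str) -> str:
    """Trim, drop any proceeding-type prefix before the first digit, unify '/' to '-'."""
    trimmed = raw.strip()
    first_digit = re.search(r"\d", trimmed)
    if first_digit is None:
        # No digit at all: nothing sensible to strip, return the trimmed input.
        return trimmed
    return trimmed[first_digit.start():].replace("/", "-")


def resolve_case(raw: str, known_numbers: Iterable[str]) -> Optional[str]:
    """Hypothetical in-memory analogue of the lookup: exact match first, then normalized."""
    numbers = list(known_numbers)
    if raw in numbers:
        return raw
    wanted = normalize_case_number(raw)
    for number in numbers:
        if normalize_case_number(number) == wanted:
            return number
    return None  # bogus numbers still resolve to "not found"
```

Under these assumptions, `resolve_case("ערר 8137-24", ["8137-24"])` resolves to the stored row, while a number that matches nothing still returns `None`.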
Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
Root cause of "agent can't find the Agasi decision in the corpus" (CMPA-55):
the decision was fully ingested, but the retrieval layer failed on the
realistic agent query — searching by case name.
- RC-A (#52): lexical tsvector covered only chunk content + halacha text,
so a bare-name query ("אגסי") matched decisions that *cite* the case, not
the case itself. Add meta_tsv on case_law(case_name, case_number) (SCHEMA
V20) and OR it into the lexical halacha/chunk SQL with a match boost, so a
name/number hit surfaces the case's own rows. Agasi: rank 4 → rank 1.
- RC-B (#53): precedent_library_list hard-defaulted source_kind=external_upload
and never exposed the param, hiding uploaded ערר/בל"מ (internal_committee)
decisions. Thread source_kind through service → tool → MCP tool (supports
'internal_committee' / 'all_committees').
- #54: agent instructions (researcher/analyst/writer) — search-by-name
protocol: add content/case-number, search both corpora, use all_committees
before declaring "not in corpus".
- #55: chunker produced tiny fragment chunks ("דיון", "החלטה") from header
keywords matched mid-sentence. Anchor SECTION_PATTERNS to line start +
merge sub-min sections; exclude <50-char fragments at query time (484
existing fragments hidden; full re-chunk tracked as #57).
Tests: scripts/test_retrieval_by_name.py (name ranks case above citer +
substantive regressions); chunker unit checks (0 tiny chunks). New findings
filed as tasks #56 (halacha source_kind leak) and #57 (re-chunk migration).
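The chunker change in #55 can be illustrated roughly as follows. This is a sketch under assumptions: the real SECTION_PATTERNS are not shown in this message, so the single pattern below (two header keywords, anchored to line start with re.MULTILINE) and the reuse of the 50-character threshold as the merge minimum are both hypothetical.

```python
import re
from typing import List

MIN_SECTION_CHARS = 50  # assumption: same threshold as the query-time fragment filter

# Hypothetical header pattern: the keyword must start a line, so a keyword
# appearing mid-sentence no longer opens a new section.
SECTION_HEADER = re.compile(r"^[ \t]*(?:דיון|החלטה)\b", re.MULTILINE)


def merge_small(sections: List[str], min_chars: int = MIN_SECTION_CHARS) -> List[str]:
    """Fold sections shorter than min_chars into a neighbour."""
    merged: List[str] = []
    for section in sections:
        if merged and len(merged[-1].strip()) < min_chars:
            merged[-1] += section  # previous section was too small: absorb this one
        else:
            merged.append(section)
    # A short trailing section has no successor, so fold it backwards.
    if len(merged) > 1 and len(merged[-1].strip()) < min_chars:
        tail = merged.pop()
        merged[-1] += tail
    return merged


def split_sections(text: str) -> List[str]:
    """Split at line-anchored headers, then merge sub-minimum sections."""
    starts = [m.start() for m in SECTION_HEADER.finditer(text)]
    if not starts or starts[0] != 0:
        starts.insert(0, 0)
    bounds = starts + [len(text)]
    sections = [text[a:b] for a, b in zip(bounds, bounds[1:]) if b > a]
    return merge_small(sections)
```

Merging forward keeps a bare header attached to the body it introduces, which is one plausible reading of "merge sub-min sections".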
Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
TaskMaster's --tag selects the logical group inside a file, not which
tasks.json to write; the CLI resolves the file from cwd. Document the
canonical project-root-relative path and the cwd footgun.
Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
Address security-review finding: the host-side legal-chat-service was
binding 0.0.0.0:8770 with no authentication. The service spawns the
claude CLI, whose tool set includes Bash + Edit — so an unauthenticated
/chat/start is effectively RCE. Oracle Cloud's security list closes the
port externally, but defense-in-depth requires two independent layers:
1. Bind defaults to 10.0.1.1 (docker0 bridge gateway). Reachable from
containers on docker bridges (the legal-ai container has a route via
the coolify network), invisible to anything outside the host. The
--host flag is still configurable for local-dev (127.0.0.1) or
special-case deployments, but 0.0.0.0 is explicitly discouraged in
the docstring.
2. /chat/start requires Authorization: Bearer <LEGAL_CHAT_SHARED_SECRET>.
The secret is loaded from /home/chaim/.legal-chat-service.env (chmod
600, off-repo) by the pm2 ecosystem and mirrored as a Coolify env
var so the FastAPI chat_proxy sends a matching header. hmac.compare_digest
prevents timing oracles. /health stays unauthenticated (static OK,
no subprocess) so the FastAPI proxy can probe liveness without the
secret.
The service refuses to start if LEGAL_CHAT_SHARED_SECRET is empty or
shorter than 24 chars — no silent fallback to an open mode.
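A minimal sketch of the two checks above (startup refusal and constant-time bearer comparison). The function names and the `env` parameter are hypothetical; only the env var name, the 24-character minimum, and the use of hmac.compare_digest come from this message.

```python
import hmac
import os
from typing import Mapping, Optional

MIN_SECRET_LENGTH = 24


def load_shared_secret(env: Optional[Mapping[str, str]] = None) -> str:
    """Return the shared secret, or refuse to start if it is missing or too short."""
    source = os.environ if env is None else env
    secret = source.get("LEGAL_CHAT_SHARED_SECRET", "")
    if len(secret) < MIN_SECRET_LENGTH:
        # No silent fallback to an open mode.
        raise SystemExit(
            f"LEGAL_CHAT_SHARED_SECRET must be set and at least {MIN_SECRET_LENGTH} characters"
        )
    return secret


def is_authorized(authorization_header: Optional[str], secret: str) -> bool:
    """Check an 'Authorization: Bearer <token>' header in constant time."""
    prefix = "Bearer "
    if not authorization_header or not authorization_header.startswith(prefix):
        return False
    presented = authorization_header[len(prefix):]
    # compare_digest avoids leaking how many leading bytes matched.
    return hmac.compare_digest(presented.encode("utf-8"), secret.encode("utf-8"))
```

In this sketch the /chat/start handler would call `is_authorized` before spawning anything, while /health would skip it.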
Once the Infisical MCP is available again, migrate the secret into the vault
at /_GUIDELINES per the project secrets policy.
Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
Citations starting with ערר/בל"מ/ARAR are committee decisions and must
carry chair_name + district. The /precedents upload form previously
errored out for these (precedent_library service rejects them) with no
in-UI path forward — internal_decision_upload was only reachable via
the /missing-precedents flow.
The form now auto-detects committee citations, reveals chair_name +
district fields, hides the irrelevant source_type/precedent_level
(derived server-side), and posts to /api/internal-decisions/upload.
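The detection rule can be sketched as below. The form itself lives in the UI; this Python version only illustrates the rule, and the helper names, the case-sensitive prefix match, and the tolerance for leading whitespace are assumptions.

```python
from typing import List

# Prefixes named in this message; matching is assumed to be case-sensitive.
COMMITTEE_PREFIXES = ("ערר", 'בל"מ', "ARAR")
COMMITTEE_UPLOAD_PATH = "/api/internal-decisions/upload"


def is_committee_citation(citation: str) -> bool:
    """True when the citation starts with a committee-decision prefix."""
    return citation.lstrip().startswith(COMMITTEE_PREFIXES)


def required_extra_fields(citation: str) -> List[str]:
    """Fields the form must reveal for this citation (hypothetical helper)."""
    return ["chair_name", "district"] if is_committee_citation(citation) else []
```

A caller would post to `COMMITTEE_UPLOAD_PATH` only when `is_committee_citation` returns true.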
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
The Style Studio's curator-prompt + chat features read reference docs
from disk at runtime. Two issues from the initial production run:
1. Dockerfile + .dockerignore excluded .claude/, docs/, and most of
skills/. Now COPY the four specific files the new endpoints need:
- .claude/agents/hermes-curator.md
- skills/decision/SKILL.md
- docs/legal-decision-lessons.md
- docs/corpus-analysis.md
.dockerignore opens whitelists for just those files.
2. Coolify's custom_docker_run_options=--add-host=host.docker.internal:host-gateway
is not honored on dockerimage build_pack apps (ExtraHosts stayed []).
Switch chat_proxy.py default to http://10.0.1.1:8770 — the docker0
bridge gateway, same pattern Paperclip uses for 3100. Bind the host
pm2 service to 0.0.0.0:8770 so the container can reach it via the
bridge IP. Oracle Cloud's security list keeps the port unreachable
from the public internet.
Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
Six-phase upgrade of /training from a read-only dashboard into a full
Style Studio for managing Daphna's style corpus.
- Upload Sheet on /training: file → proofread preview → commit (no more
CLI-only `upload-training` skill).
- Rich corpus metadata: GET /api/training/corpus returns summary, outcome,
key_principles, page_count, parties (regex), legal_citation, lessons_count.
PATCH endpoint for chair edits. CorpusDetailDrawer with 4 tabs
(details/content/lessons/patterns) replaces the bare table row.
- LLM metadata enrichment: style_metadata_extractor + MCP tools
(style_corpus_enrich, style_corpus_pending_enrichment) fill
summary/outcome/key_principles via claude_session (free, host-side).
- Per-decision lessons: new decision_lessons table + 4 REST endpoints +
LessonsTab in drawer; hermes-curator now auto-posts findings as
decision_lessons(source=curator).
- Curator Portrait tab: prompt rendered with link to Gitea, recent
curator findings, style_analyzer training prompts, propose-change
form that writes proposals to data/curator-proposals/ for manual
chair review (no auto-mutation of the agent file).
- Style chat tab: SSE-streamed conversations with the style agent.
New host-side pm2 service (legal-chat-service, port 8770) wraps
claude CLI with stream-json + --resume continuation; FastAPI proxies
via host.docker.internal. Zero API cost — uses chaim's claude.ai
subscription. chat_conversations + chat_messages persist history.
Architecture: keeps the existing rule that claude_session only runs
on the host (not the container). The new legal-chat-service is the
canonical bridge between the container and the local CLI for the chat
feature; everything else (upload, metadata, lessons) stays within the
container's existing capabilities.
Audit script (scripts/audit_training_corpus.py) included for verifying
which corpus rows still need enrichment.
Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
The drawer was showing a full metadata form (legal topic, case name,
legal issue, cited-by-party + name, status) — most of it duplicated
fields that get auto-extracted from the file once it's uploaded, or
that are already known from when the row was detected. The visible
placeholder text ('לינדאב בע"מ', 'אנטרים', 'זכות עמידה') looked like
real data and confused readers.
Strip the form down to a single "הערות" (notes) textarea, the only
field the chair actually needs to edit. Notes on who cited the
decision and in what context belong there too. Everything else (shape
of the precedent on the case_law side) is the LLM extractor's job.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
The "ערוך פרטים" (edit details) sheet labeled the case_number field "מראה מקום"
(citation) and marked it read-only, which was confusing because the formal citation
IS supposed to be editable. Rename the read-only field to "מספר תיק (מזהה ייחודי)"
(case number, unique identifier) to clarify it's the system key, and add a separate
Textarea for the true formal citation (citation_formatted) with the same markdown-bold
convention used by the inline editor on the detail page.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Drop the visible "העתק" / "ערוך" (copy / edit) labels and keep just the icon, which
matches the editorial/judicial restraint of the surrounding card.
Tooltip + aria-label preserve the affordance for hover and assistive
tech.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>