The Knowledge Curator (Hermes) couldn't read סופי-{case}.docx because
document_get_text only works on rows in the documents table — the final
file is just a copy in the case's exports/ directory, not a tracked
document. CMP-71 hit this and produced an unproductive interaction
asking the user how to fix the access issue.
Add a new MCP tool that:
- Locates exports/סופי-{case_number}.docx via config.find_case_dir
- Extracts text using the existing extractor service (python-docx based)
- Returns JSON with status + text + page_count + truncation info
- Optional max_chars cap for large decisions
Smoke test on case 1130-25: 400-char preview returns proper Hebrew text
beginning with "לפנינו ערר על החלטת הוועדה המקומית...".
The local MCP server reloads on next Hermes spawn (stdio mode), so the
tool is immediately available — no Coolify deploy needed.
Curator's promptTemplate (DB-stored) updated to use the new tool as the
primary path for reading the final.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
The auto-creation in case_create had two failure modes that combined to
make repos silently missing: a stale GITEA_TOKEN returning 401, and the
outer try/except in case_create that swallowed every exception with a
bare pass. Result: cases like 8174-24 ended up with a local git repo and
Paperclip project but no Gitea repo, with no signal anywhere.
_setup_gitea_remote now returns {ok, url, error} and never raises; the
result is attached to the case JSON and the FastAPI endpoint logs a
warning when ok=false. The UI gets a "צור ריפו ב-Gitea" button on the
case header that appears only when the repo or remote is missing.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Cases failed to push silently after the Gitea token in Infisical was
rotated: the embedded credential in each case repo's origin URL was
the old token, the rotation never propagated, and capture_output=True
hid the auth failure as a logger.warning. Three cases (1033-25,
1130-25, 1194-25) accumulated unpushed commits over weeks before
this was noticed.
Fixes the root cause in two places: web/gitea_client.py for uploads
through the FastAPI endpoint, and mcp-server/services/git_sync.py
for case_update / document_upload through MCP tools (which previously
committed but never pushed at all).
The new commit_and_push helper:
- re-injects the current GITEA_ACCESS_TOKEN into the existing origin
URL on every call, so pushes survive token rotation
- logs push failures at WARNING with the actual stderr (the previous
code suppressed errors entirely)
- continues to push even when the commit was a no-op, in case earlier
commits are still unpushed
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
CEO agent was reverting case status from "processing" to "new" when
updating metadata fields. Added ordered status list — case_update now
silently ignores status changes that would move backwards.
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
- auto-sync-cases.sh: fix broken directory scan (was looking for
status subdirs that don't exist), fix env var word-splitting bug,
add safe.directory handling and error logging
- cases.py: auto-create Gitea repo on case_create, fix
documents/original → documents/originals naming mismatch
- app.py: add GET /api/cases/{case_number}/git-status endpoint
- web-ui: add SyncIndicator component in case header showing
sync status (synced/pending/no remote) with last commit time
- pyproject.toml: add httpx dependency
- CLAUDE.md: update Paperclip wakeup API docs
- settings page: switch tag input from Select to free-text with datalist
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Install git in Docker image and wrap all subprocess git calls in
try/except so a missing or failing git binary never kills an upload
that already succeeded.
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Wires a new case-deletion path across the three layers that needed it:
- db.delete_case(case_id) — single SQL DELETE; documents, chunks, and
qa_results cascade via existing schema FKs, audit_log nullifies.
- cases_tools.case_delete(case_number, remove_files=False) — MCP tool
wrapper. File tree on disk is kept by default (audit trail); pass
remove_files=True for a hard delete.
- DELETE /api/cases?case_number=... — FastAPI endpoint taking the case
number as a QUERY param rather than a path segment. Case numbers
like "1000/0426" can't be passed through a path parameter because
FastAPI routing decodes %2F before matching, so a query param is
the only shape that works for historical data.
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Adds two orthogonal columns — practice_area (top-level legal domain:
appeals_committee / national_insurance / labor_law) and appeal_subtype
(building_permit / betterment_levy / compensation_197) — denormalized
into cases, documents, document_chunks, decisions, and style_corpus so
vector searches can filter without JOINs.
Why: the system handles two unrelated sub-domains under the same
appeals committee (1xxx building permits and 8xxx/9xxx betterment/197),
with different rules and writing style. Without a separation axis,
search_similar() and the block-writer's precedent lookup were free to
surface betterment-levy paragraphs while drafting a building-permit
decision — a real risk of cross-domain contamination. The same axis
also lets future domains (national insurance, labor law) coexist
without separate schemas.
Schema (V4 migration in db.py):
- ALTER ... ADD COLUMN IF NOT EXISTS on all five tables + composite
indexes (practice_area first).
- Idempotent backfill: case_number ~ '^1' → building_permit, '^8' →
betterment_levy, '^9' → compensation_197; propagated to documents,
chunks, and decisions via case_id; training-corpus rows (case_id NULL)
default to appeals_committee.
Code:
- New services/practice_area.py with derive_subtype, validate, and
is_override + enum constants.
- db.create_case / create_document / store_chunks / create_decision
inherit practice_area from the parent case (or take an explicit
override for the case_id=None training corpus).
- db.search_similar and search_similar_paragraphs accept practice_area
+ appeal_subtype filters using the denormalized columns.
- tools/search.py auto-resolves the filter from case_number when given.
- block_writer._build_precedents_context now passes the active case's
practice_area to search_similar_paragraphs — closes the contamination
hole for the discussion-block precedent fetch.
- tools/cases.case_create auto-derives subtype from case_number; an
explicit override that disagrees writes a case_subtype_override entry
to audit_log so we can spot bad classifications later.
- tools/documents.document_upload_training tags new training material
with practice_area + subtype end-to-end (corpus, document, chunks).
UI (web/static/index.html + web/app.py):
- New-case wizard gets a practice_area dropdown (others disabled until
national_insurance / labor_law arrive) and an appeal_subtype dropdown
with JS auto-fill from the case-number prefix; manual edits stick.
- Case header shows a blue badge with practice_area · subtype.
- CaseCreateRequest plumbs both fields through to cases_tools.case_create.
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
- Add delete_document_chunks for reprocessing, save extracted text to disk
- Expand case directory structure (original/extracted/proofread/backup)
- Update classifier patterns (תגובה, הודעת עמדה)
- Fix proofreader agent paths for new directory layout
- Update HEARTBEAT to notify on every task completion
- Improve bidi_table with LRE/PDF directional embedding
- Add Paperclip project verification and auto-close setup issue
- Add auto-sync-cases.sh for Gitea synchronization
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Replace single CASES_DIR with find_case_dir() that searches across
all status directories. New cases created in cases/new/{number}/.
Config: CASES_BASE, CASES_NEW, CASES_IN_PROGRESS, CASES_COMPLETED
Docker: added -v /home/chaim/legal-ai/cases:/cases volume mount
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
- Add expected_outcome field to cases (rejection/partial/full/betterment_levy)
- New lessons.py module with golden ratios, templates, and drafting guidance per outcome type
- Style analyzer now uses Opus with full decision text (no truncation), with multi-pass fallback for large corpora
- Drafting tool provides outcome-specific templates, section guidance, and ratio comments
- Improved JSON extraction with bracket-matching fallback
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Ezer Mishpati - AI legal decision drafting system with:
- MCP server (FastMCP) with document processing pipeline
- Web upload interface (FastAPI) for file upload and classification
- pgvector-based semantic search
- Hebrew legal document chunking and embedding