Commit Graph

14 Commits

Author SHA1 Message Date
a3ef9e5e34 fix(ui): ברירת-מחדל של ספריית הפסיקה — החלטות ועדות ערר ראשונות
מתג-המקטעים נפתח כעת על "החלטות ועדות ערר" (הקורפוס המרכזי של היו"ר)
במקום "פסיקת בתי משפט".

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
2026-06-06 17:26:49 +00:00
6bf19bd0d7 feat(ui): אינדיקטור התקדמות לחילוץ מטא-דאטה + מתג-מקטעים בספריית הפסיקה
שתי בעיות UX בדף /precedents:

1. חילוץ מטא-דאטה לא נתן שום אינדיקציה שהוא רץ. בניגוד לחילוץ טקסט/הלכות
   (extraction_status / halacha_extraction_status) למטא-דאטה היתה רק חותמת-זמן
   metadata_extraction_requested_at — אין מצב "processing", לכן StatusPill לא
   הציג כלום. נוספה עמודת metadata_extraction_status ('pending'|'processing'|
   'completed'|'failed') במתכונת העמודות הקיימות, וה-worker
   (process_pending_extractions + reextract_metadata) מעדכן אותה: processing
   בתחילת פריט, completed בסיום (מנקה גם את החותמת), pending בכשל (לריטריי).
   ה-UI מציג תג "מחלץ מטא-דאטה" + באנר מונה-אצווה עם אחוז התקדמות (high-water-mark
   של עומק-התור) שמתעדכן אוטומטית דרך ה-polling הקיים (5ש').

2. שתי טבלאות מוערמות (בתי משפט / ועדות ערר) חייבו גלילה ארוכה. הוחלפו במתג-
   מקטעים — טבלה אחת בכל פעם, עם שמירה על העמודות הייעודיות לכל סוג.

Invariants: G2 (מרחיב מנגנון-סטטוס קיים, לא מסלול מקביל), INV-TOOL4/GAP-45
(המשך חשיפת תור-החילוץ הסמוי). אין נגיעה בתוכן משפטי (G11).

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
2026-06-06 16:21:41 +00:00
cbc7a1e336 feat(precedents): formal citation per Israeli citation rules + copy/edit UI
All checks were successful
Build & Deploy / build-and-deploy (push) Successful in 3m25s
Until now, "case_number" was the only stored identifier for a precedent.
But a *citation per the Israeli unified citation rules* is a different
beast — it has bold parties, an unbold prefix (court abbrev + panel/
district parenthetical + case number), and an unbold trailing reporter
(נבו / פ"ד...).  Without storing it as a first-class field we couldn't
hand the chair a one-click "copy as citation" experience for pasting
into decisions.

Changes:
- Schema V19: case_law.citation_formatted TEXT (Markdown — parties
  wrapped in **…** so the copy helper can render <strong> for Word/Docs
  paste and keep plain-text fallback meaningful).
- Metadata extractor: composes citation_formatted from the document
  text per the unified citation rules, with worked examples for ע"א /
  עת"מ / ערר / בל"מ in the prompt. Refuses to store half-formed strings.
- PATCH /api/precedent-library/{id} accepts citation_formatted so the
  chair can correct LLM mistakes.
- /precedents/[id]: dedicated "מראה מקום" block with bold rendering,
  a copy-to-clipboard button (text/html + text/plain so Word keeps
  the bolds), and an inline edit textarea.
- /precedents list rows: link displays the formatted citation when
  available, with a small inline copy button — falls back to the bare
  case_number for older rows.

Backfill of existing rows happens by re-stamping the extraction queue
once V19 has rolled out and the new field is reachable.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-05-27 07:14:34 +00:00
2aee398b4a feat: Stage C — RAG advanced (#33, #47, #48, #49, #50, #51)
All checks were successful
Build & Deploy / build-and-deploy (push) Successful in 1m35s
Six independent sub-tasks dispatched in parallel; aggregated here.

## #33 — Hide case_name column
library-list-panel.tsx: `<TableHead>` + `<TableCell>` for "שם"
get `className="hidden"` in both Court and Committee row variants.
DB column preserved for future use.

## #47 — Audit script periodic
New scripts/audit_corpus_integrity.py — 3 SQL checks (external+ערר
prefix, internal missing chair/district, cases.practice_area enum)
+ CEO wakeup on violations + cron `0 7 * * *`. First run: 0 issues.

## #48 — Parent-doc retrieval (gated, default off)
Schema V17: precedent_chunks.parent_chunk_id + chunk_role
('child'|'parent'). New chunker.chunk_document_hierarchical() —
section-aware parents (~1500 tokens) containing ~5 overlapping
children (~300 tokens each). New db.store_precedent_chunks_hierarchical
two-pass writer. Search SQL (semantic + lexical) LEFT-JOIN parent and
swap content + dedupe by parent_chunk_id when flag on. Toggle:
PARENT_DOC_RETRIEVAL_ENABLED + PARENT_DOC_{CHILD,PARENT}_SIZE_TOKENS.
Backfill ~3min and ~$0.20 — deferred to follow-up.

## #49 — Multimodal backfill
New scripts/backfill_multimodal_precedents.py with token-matching
case_number ↔ source files (PDF + DOCX via PyMuPDF). Ran in container:
26 precedents embedded, 503 pages, $0.21, 0 errors. precedent_image_embeddings
grew 3 → 29 rows. 44 remaining are style_corpus-migrated rows (no
source file on disk) — will catch up when re-uploaded.

## #50 — Closed-loop feedback + nDCG
Schema V18: search_logs + search_relevance_feedback. New telemetry.py
with fire-and-forget log_search_bg (p50 = 0.002ms — zero overhead) +
auto-infer_relevance_from_citations (reads case drafts → marks score=3
when cited precedent appears in past search top-K). Hooks added to 5
search paths. scripts/compute_ndcg.py for aggregation. Two admin API
endpoints (GET /api/admin/rag-metrics + POST .../infer). Dashboard UI
deferred — API is enough for now.

## #51 — Halacha quality monitoring
New scripts/monitor_halacha_quality.py — baseline avg confidence
(trusted=0.849, all=0.833, pending=0.694) with rolling window drift
detection. Default 5% threshold. Exits non-zero on alert for cron
integration. Recommended: `0 8 * * 1` weekly Mon 8am.

## Bonus: 230 unlinked citations → missing_precedents
Bulk-imported 230 distinct unlinked citations from
precedent_internal_citations to missing_precedents.status='open',
party='committee', with notes listing source citers. Top candidate:
ע"א 3213/97 (cited 5x). Total open missing_precedents now 237.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
2026-05-26 11:26:52 +00:00
10a63fb9e0 fix(precedents): separate court rulings from committee decisions correctly
All checks were successful
Build & Deploy / build-and-deploy (push) Successful in 1m37s
- DB: add 'all_committees' virtual source_kind covering internal_committee
  + external_upload appeals_committee rows in one query
- DB: stats now count all case_law rows (not just external_upload),
  fixing the precedents_total that excluded 44 internal-committee records
- UI: courts table filters to source_type=court_ruling only;
  committees table uses the new all_committees query

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
2026-05-10 09:59:30 +00:00
f94201c577 feat(precedents): make citation link to detail page
All checks were successful
Build & Deploy / build-and-deploy (push) Successful in 34s
Both CourtRow and CommitteeRow citation cells are now Next.js Links
→ /precedents/{id}, letting users navigate directly from the list.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
2026-05-10 09:01:26 +00:00
171da84680 feat(precedent-library): add halacha-extract button to library list rows
All checks were successful
Build & Deploy / build-and-deploy (push) Successful in 3m8s
When a precedent has not had successful halacha extraction yet, show a
small wand icon between the edit and delete buttons. Clicking it queues
the precedent for the local MCP worker (request-halachot endpoint).

Visibility rule (`needsHalachaExtraction`): show when text extraction is
complete AND halacha status is "pending without requested_at" (never
tried) or "failed" (allow retry). Hide while processing, after
completion, or when already queued — to avoid duplicate requests.

Pairs with the metadata-extract button on the edit sheet.
2026-05-07 06:30:03 +00:00
c0f67ab841 feat(precedents): split library into court rulings + appeals committee tables
All checks were successful
Build & Deploy / build-and-deploy (push) Successful in 1m34s
- /api/precedent-library now accepts source_kind param (default external_upload)
- list_external_case_law returns chair_name/district fields
- LibraryListPanel renders two separate tables with appropriate columns
- internal_decisions migration: added queue_halachot param to defer extraction
- Fixed practice_area mapping from style_corpus (appeals_committee → proper enum)

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
2026-05-04 18:49:32 +00:00
1f17419ee9 ui(precedents): live status pill with shimmer + auto-queue + auto-refresh
All checks were successful
Build & Deploy / build-and-deploy (push) Successful in 1m44s
The chair pointed out three UX gaps after uploading a new precedent:

1. The status said "מחלץ הלכות" but nothing was actually running — the
   field only meant "halacha_extraction_status != completed", which
   includes the post-upload "pending" state where the local MCP worker
   hasn't been told to drain anything yet. Misleading.

2. The page didn't refresh on its own. The chair had to F5 to see new
   counts after extraction completed.

3. Clicking the trash icon mid-extraction would cascade-delete the row
   while the extractor was still using it (FK errors, partial writes).

Fixes:

- ingest_precedent now auto-queues both metadata and halacha extraction
  on upload by stamping the request timestamps. The chair (or me) drains
  the queue with one `precedent_process_pending` call from chat —
  no need to click any button before that.

- StatusPill is now five-state with proper labels:
    "נכשל" (extraction_status=failed) — red
    "מעבד טקסט" — shimmer (extraction_status=processing)
    "בתור" — neutral (chunks queued, not yet running)
    "מחלץ הלכות" — shimmer (halacha_extraction_status=processing)
    "ממתין לחילוץ" — neutral (queued for local MCP worker)
    "לא חולץ" — neutral (pending without queue stamp — shouldn't happen)
    "X/Y מאושרות" — gold (done, with halachot count)
  The shimmer is a CSS-only sliding-stripe animation defined in globals.

- usePrecedents has a conditional refetchInterval — polls every 5s while
  any row is mid-extraction or queued, then stops once everything settles
  to completed/failed. New helper isPrecedentActive() centralises the
  "is this row mid-something" check so the UI and the destructive-action
  guard agree.

- Trash button is disabled (opacity 30%, tooltip explains) while the row
  is active. Pencil/edit stays enabled — editing metadata fields during
  extraction is safe (last write wins, low-stakes race).

Schema: list_external_case_law now exposes the two *_requested_at
timestamps so the UI can distinguish "queued" from "never asked".

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-05-03 12:47:31 +00:00
8e1384b897 fix(precedents): wrap citation column + extractor fills source_type
All checks were successful
Build & Deploy / build-and-deploy (push) Successful in 1m27s
Two follow-ups after running the metadata extractor on 403-17:

1. Library table: shadcn TableCell defaults to whitespace-nowrap and
   the table wrapper has overflow-x-auto, so the long citation forced
   a horizontal scrollbar inside the row. Override on the citation
   cell only — whitespace-normal + break-words + min/max-w to keep the
   column readable. Same for the case-name cell. Row aligns to top so
   wrapping doesn't push neighbours up.

2. Extractor now also fills source_type (court_ruling /
   appeals_committee). The previous round added decision_date_iso,
   precedent_level, and court but left source_type empty. Same
   closed-enum + merge-only-if-empty policy.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-05-03 12:28:35 +00:00
fc3b6b6cae ui(precedents): collapsible groups by precedent + Hebrew labels + RTL fixes
All checks were successful
Build & Deploy / build-and-deploy (push) Successful in 33s
After running the dual-mode halacha extractor on a real appeals committee
decision (403-17), the pending-review tab surfaced 351 halachot in a
single flat list — the chair correctly pointed out that this is unusable
without grouping. Three fixes:

1. Group pending halachot by precedent (case_law_id). Each group shows
   the citation, court, date, level and item count; default state is
   collapsed so the chair picks one ruling at a time. Within a group,
   items still sort by confidence ascending so the doubtful ones surface
   first. J/K/A/R/E now scope to currently-expanded groups; toggling
   open auto-focuses the first item.

2. Translate the badges that were leaking English: rule_type values
   (`persuasive`, `interpretive`, `binding`, `application`, `procedural`,
   `obiter`) now render as Hebrew labels, and `confidence X.XX` becomes
   `ביטחון X.XX`. The card header no longer repeats the citation since
   it's already in the group header.

3. Strip Unicode bidi marks (U+200E/F/202A-E/2066-9) from displayed
   citations. Nevo PDFs and the upload form embed these in the
   case_number; they render as zero-width but visually push the text
   away from the right edge of the table cell. Also: hide the empty
   court line under the case name in the list (was rendering as a
   stray em-dash), and use a muted em-dash for empty date/level rather
   than blank/dash inconsistency across columns.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-05-03 12:05:40 +00:00
2cfdf35191 refactor(precedents): keep all LLM calls on the local-MCP path
All checks were successful
Build & Deploy / build-and-deploy (push) Successful in 1m28s
Architectural correction: every claude_session caller in this project
runs through the local MCP server (~/.claude.json points at
/home/chaim/legal-ai/mcp-server/.venv/bin/python). The Coolify container
has no `claude` CLI and no claude.ai session, so any LLM call originating
from web/ FastAPI fails with "Claude CLI not found" — which is exactly
what we hit on 403-17.

The earlier Anthropic SDK fallback would have made it work, but at
direct API cost. The chair's preference is to stay on the claude.ai
session for everything. So:

- claude_session.py: removed the SDK fallback, restored CLI-only.
  The error message now points the next person at the architectural
  rule in the module docstring instead of papering over it.
- precedent_library.py:ingest_precedent (called from FastAPI on upload)
  now does only the non-LLM half: extract → chunk → embed → store.
  Sets halacha_extraction_status='pending' for the chair to act on.
- reextract_halachot / reextract_metadata kept, but lazy-import their
  extractors so the FastAPI path can't accidentally pull them in. They
  are reachable only via the MCP tools precedent_extract_halachot /
  precedent_extract_metadata, which run locally with CLI.
- Removed POST /api/precedent-library/{id}/extract-halachot and
  /extract-metadata — they were dead ends from the container.
- Dropped the `anthropic` Python dep that the SDK fallback required.
- UI: removed the "refresh halachot" and "sparkles metadata" buttons
  that called those endpoints. Edit sheet now points the chair at the
  MCP tool names instead.

Halacha and metadata extraction for an uploaded precedent now happen
when the chair (via Claude Code) runs:
  mcp__legal-ai__precedent_extract_metadata <case_law_id>
  mcp__legal-ai__precedent_extract_halachot <case_law_id>

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-05-03 11:06:08 +00:00
73a79ea7e8 feat(precedents): metadata auto-fill, edit sheet, persuasive extraction
All checks were successful
Build & Deploy / build-and-deploy (push) Successful in 1m28s
Three improvements to the precedent library based on usage feedback:

1. Auto-fill metadata at upload time. New service
   precedent_metadata_extractor reads the ruling's full_text and
   suggests case_name (short), summary, headnote, key_quote,
   subject_tags, appeal_subtype. The merge policy fills only empty
   fields, preserving everything the chair typed in the upload form.
   Wired into the ingest pipeline; also exposed as a re-run endpoint
   POST /api/precedent-library/{id}/extract-metadata for existing
   records.

2. Edit sheet in the UI. Pencil icon on each library row opens a
   pre-populated form covering every field. A Sparkles button on the
   sheet runs the metadata extractor on demand and refreshes the
   form. The case_number is read-only because halachot are FK'd to
   it; renaming requires delete + re-upload.

3. Halacha extractor branches on is_binding. Sources marked binding
   (Supreme/Administrative) keep the strict halacha prompt. Non-binding
   sources (other appeals committees, district courts on planning
   matters) get a different prompt that extracts applications,
   interpretive principles, and persuasive conclusions — labeled with
   new rule_types 'application' and 'persuasive'. The fallback also
   widens chunk selection: if the chunker labeled nothing as
   legal_analysis/ruling/conclusion, we now run on all chunks rather
   than returning zero halachot for a usable ruling.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-05-03 10:19:35 +00:00
7ee90dce31 feat: external precedent library with auto halacha extraction
All checks were successful
Build & Deploy / build-and-deploy (push) Successful in 1m27s
Adds a third corpus of legal authority distinct from style_corpus
(Daphna's prior decisions for voice) and case_precedents (chair-attached
quotes per case). The new corpus holds chair-uploaded court rulings and
other appeals committee decisions, with binding rules (הלכות) extracted
automatically and queued for chair approval.

Pipeline (web/app.py + services/precedent_library.py):
file → extract → chunk → Voyage embed → halacha_extractor → store +
publish progress over the existing Redis SSE channel.

Schema V7 (services/db.py): extends case_law with source_kind +
extraction status fields under a CHECK constraint pinning practice_area
to the three appeals committee domains (rishuy_uvniya, betterment_levy,
compensation_197). New precedent_chunks (vector(1024)) and halachot
tables (vector(1024) over rule_statement, IVFFlat indexes, gin on
practice_areas/subject_tags). Halachot start as pending_review; only
approved/published rows are visible to search_precedent_library.

Agents: legal-writer, legal-researcher, legal-analyst, legal-ceo,
legal-qa get search_precedent_library. legal-writer prompt explains
the three-corpus distinction and CREAC use; legal-qa now verifies that
every cited halacha resolves to an approved row in the corpus.

UI: /precedents page with four tabs — library / semantic search /
pending review (J/K nav, A/R/E shortcuts, badge count) / stats.
Reuses the existing upload-sheet progress + SSE pattern.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-05-03 08:38:18 +00:00