legal-ai

Author	SHA1	Message	Date
Chaim	1986fe3b14	feat(storage): X14 Phase 2a — route source-document writes through storage.py Rewire the source-document staging writes onto the unified storage layer (INV-STG1), replacing direct shutil.copy2 calls: - tools/documents.py: case originals + training-corpus uploads - services/ingest.py: _stage_file (now async) — covers precedent-library, internal-decisions, and digests (the canonical intake helper) - services/digest_library.py: awaits the now-async _stage_file Each write goes through storage.put_file(..., bucket=DOCUMENTS) with the DATA_DIR-relative key; the Hebrew original filename rides as object metadata (INV-STG2), content-type is guessed from the extension. DB path columns are unchanged (still the absolute dest) — object_key backfill is Phase 3. Under the default STORAGE_BACKEND=filesystem the bytes land at the exact legacy on-disk location (put_file → shutil.copy2 to DATA_DIR/key), so this is zero behaviour change in prod. shutil import dropped where now unused. tests: +2 staging regression tests (file lands under DATA_DIR at the legacy path); 20 storage + 22 ingest tests green; 242 collected with no import breakage. Derived/export write sites (thumbnails, extracted text, DOCX exports) are Phase 2b. Keeps G2; advances INV-STG1. Spec: docs/spec/X14-storage-minio.md. Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>	2026-06-08 08:00:27 +00:00
Chaim	f56309da5a	feat(X13): auto-trigger court fetch from digests + drain tool סוגר את הלולאה — יומון שמצביע על פס"ד בית-משפט שלא בקורפוס מזניק אחזור אוטומטי, וקושר את היומון חזרה אחרי הקליטה (INV-DIG3 + INV-CF2). - digest_library.try_autolink: בכשל-קישור, אם הציטוט מסווג כפס"ד-בימ"ש (supreme/admin) → _enqueue_court_fetch יוצר court_fetch_jobs(pending); ועדת-ערר (skip) לא מוזנק. never-raises (לא שובר קליטת-יומון). - orchestrator.drain_pending(limit): מנקז pending/failed סדרתי (cooldown, INV-CF4), fetch+ingest לכל אחד; בהצלחה מקשר את היומון ל-case_law שנקלט. - כלי-MCP court_fetch_drain + רישום ב-server.py. - X13 spec: עודכן (הפער ב-INV-CF2 סומן כמתוקן). נבדק מול ה-DB: עת"מ 46111-12-22 → job tier=admin pending digest-linked; ערר 1110/20 → לא מוזנק. כלי מקומי בלבד (ingest = claude CLI). Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>	2026-06-07 20:04:12 +00:00
chaim	acb8e2c206	Merge pull request 'feat(X13): אחזור-פסיקה אוטומטי מנט המשפט → קורפוס (Tier 0 + scaffold)' (#110 ) from worktree-court-fetch into main All checks were successful Build & Deploy / build-and-deploy (push) Successful in 1m21s Details	2026-06-07 18:13:15 +00:00
Chaim	0990db7a3c	feat(X13): auto-fetch court verdicts from נט המשפט → corpus (Tier 0 + scaffold) תת-מערכת אחזור-פסיקה אוטומטי: כשיומון מצביע על פס"ד בית-משפט, מסווגים את הערכאה, מורידים מהמקור הציבורי המתאים, וקולטים דרך צינור-הקליטה הקנוני. - spec-first: docs/spec/X13-court-fetch.md (INV-CF1..CF7) + אינדקס - מסווג court_citation.py (supreme/admin/skip) + 10 בדיקות (עת"מ 46111-12-22 → admin) - Tier 0: court_fetch_supreme.py — supremedecisions API (reverse-engineered), httpx + browser-headers (אומת 200) + politeness - תור court_fetch_jobs (SCHEMA_V30) + DB helpers + court_fetch_orchestrator.py - Tier 1 scaffold: legal-court-fetch-service (aiohttp+Bearer, מראת legal-chat-service) + camofox_client (Camoufox open-source) + recaptcha_audio (Whisper מקומי) + pm2 - Tier 2 fallback חינני: manual + missing_precedent (INV-CF2/CF3 — אין drop שקט) - כלי-MCP court_verdict_fetch / court_fetch_status; SCRIPTS.md Invariants: מקיים G2 (מסלול-קליטה יחיד, INV-CF1) · G3/G1 (idempotent+נרמול, INV-CF5) · G4/§6 (אין בליעה שקטה, INV-CF2) · G10 (שער-אנושי, INV-CF3) · G5 (source_type, INV-CF6) · G9 (provenance+audit, INV-CF7). מקורות INV-CF4: RFC 9309 · Google crawler · OWASP OAT. Follow-ups (טרם אומתו חי): live Tier-0 validation · התקנת camofox-browser+whisper · כיול selectors Tier-1 · COURT_FETCH_SHARED_SECRET (Infisical+Coolify) · טריגר מ-digest try_autolink (worktree-digests-radar). V30 עלול להתנגש עם digests-radar. Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>	2026-06-07 18:12:13 +00:00
Chaim	06281996ca	feat(digests): Phase 2 — API endpoints + /digests UI (X12) משטחי-משתמש לקורפוס היומונים: endpoints ב-FastAPI + דף UI נפרד /digests (לדפדוף, חיפוש, העלאה, וקישור לפסק המקורי). היומון נשאר מקור-משני המצביע על הפסק — אינו מצוטט בהחלטה (INV-DIG1) ואינו מחלץ הלכות (INV-DIG2). Backend (container-safe + local split): - digest_library: פוצל ל-create_pending_digest (CONTAINER-SAFE: stage+ extract_text+create row 'pending', בלי LLM) ↔ enrich_digest/ process_pending_digests (local: LLM+embed+autolink). ingest_digest מאחד. - db.list_pending_digests; MCP digest_process_pending (tool+server) — חלופה ל-batch script לריקון התור. - web/app.py: 10 endpoints /api/digests/* (upload/list/search/queue-pending/ get/patch/delete/link/relink/unlink). upload=INSERT-only pending (ה-LLM רץ מקומית — claude_session local-only). כולם מחזירים dict בדפוס precedent. Frontend (Next 16, ללא api:types — hooks עם טיפוסים hand-written כמו precedent-library.ts): - lib/api/digests.ts — hooks (useDigests/useDigestSearch/useDigestPending/ useUploadDigest/useLink/Relink/Unlink/Delete/Update). - דף /digests נפרד (לא כרטיסייה ב-/precedents — לשמור גבול סמכותי/משני, INV-DIG1): טאבים יומונים/חיפוש + DigestCard (badge קישור-לפסק) + DigestUploadDialog + pending badge. nav + header-context. אומת: backend round-trip מלא (create_pending→list_pending→process_pending→ search→restore); web-ui מתקמפל (webpack/tsc נקי, route /digests נוצר). הערה: build דיפולטי (turbopack) נכשל ב-worktree עקב symlink ל-node_modules — ב-CI/Docker (node_modules אמיתי) עובד; אומת עם --webpack. Invariants: מקיים INV-DIG1/2 (upload לא מחלץ הלכות, UI מציג "מצביע לא מצוטט"), INV-DIG3 (link/relink/queue). G4 (אין בליעה — שגיאות→toast/HTTP), G2 (מסלול נפרד, לא מקביל). X6 (חוזה UI↔API — endpoints בדפוס precedent; hooks hand-written כמו שאר ה-domain modules). Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>	2026-06-07 18:11:05 +00:00
Chaim	8171572cdd	feat(digests): קורפוס יומונים כשכבת-גילוי (radar) — X12 מאגר חדש ליומוני "כל יום" (עפר טויסטר) כשכבת-גילוי מעל קורפוסי-הפסיקה: מקור-משני המצביע על פסק הדין המקורי, נקלט לטבלה נפרדת `digests`, נחפש סמנטית, ומקושר לפסק המקורי בספריית הפסיקה — אך לעולם אינו מצוטט בהחלטה ואינו מחלץ הלכות. Phase 0 (spec): - docs/spec/X12-digests-radar.md — INV-DIG1 (מצביע לא מצוטט) / INV-DIG2 (מסלול-קליטה נפרד, לא מקביל — מקיים G2) / INV-DIG3 (קישור-לפסק הוא הגשר; חוסר-קישור = פער גלוי). עדכון אינדקס 00/03/README. Phase 1 (MVP): - SCHEMA_V30: טבלת `digests` (HNSW על embedding — לא ivfflat, להימנע מ-recall cliff בקורפוס קטן/צומח) + GIN/FTS + UNIQUE חלקי ל-idempotent. - services/digest_metadata_extractor.py — חילוץ-LLM (claude_session local-only, ייבוא lazy): תג-מושג, כותרת-הלכה, מראה-מקום, שני-תאריכים מובחנים, תגיות. - services/digest_library.py — מסלול קצר עצמאי (INV-DIG2): extract→hash→LLM→ embedding יחיד→autolink. לא משתמש ב-ingest.ingest_document. - tools/digests.py + רישום 7 כלים ב-server.py (digest_upload/list/get/link/ relink/delete + search_digests). - scripts/ingest_digests_batch.py — קליטה ידנית מ-data/digests/incoming. - legal-researcher.md: שלב 2ב.0 (סריקת-radar לפני אימות) + סעיף-דוח ט + 3 כלים ב-frontmatter. HEARTBEAT §8: ניתוב יומון→digest_upload. אומת end-to-end: 4 יומונים נקלטו (מטא-דאטה מדויק), חיפוש סמנטי מדרג נכון ("היטל השבחה"→5160, "תמא 38"→5158), link/relink/autolink/revert + מעטפת-MCP. Invariants: מוסיף INV-DIG1/2/3 (X12). מקיים G2 (bounded context נפרד, לא מסלול מקביל), G3 (idempotent upsert), G4 (אין בליעה שקטה — פער-קישור מוצף), G9 (עקיבוּת — היומון מצביע על מקור עקיב). נוגע G7 (RRF) — נדחה, חיפוש סמנטי-בלבד בשלב 1 (FTS index מוכן). Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>	2026-06-07 17:49:00 +00:00
Chaim	e4651a9d06	feat(#99 / T10): get_style_guide — יחסי-זהב נמדדים מהקורפוס לצד היעד style_distance.measure_corpus_ratios(): מפצל כל החלטה ב-style_corpus לסעיפים (chunker) ומחשב ממוצע %-סעיף — אגרגט "_all" + פר-תוצאה (כשיש). cached. get_style_guide מציג שורת "נמדד בפועל" עם ⚠️ על פער מטווח-היעד. מצב נוכחי: style_corpus.outcome לא מאוכלס → מוצג אגרגט כל-ההחלטות (n=48: רקע 26.4% / טענות 9.7% / דיון 43.8% / סיכום 20.1%); פיצול לפי-תוצאה future-ready. המדידה גם מאירה מגבלות זיהוי-סעיפים (כוונת T10 — לסמן פער לבדיקה). חופף-חלקית ל-T7 שמודד adherence per-draft; זה מודד את הקורפוס. כשל מדידה מוצג, לא נבלע. Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>	2026-06-06 21:01:42 +00:00
Chaim	420cb819f5	feat(halacha-triage): quality-gated + prioritized review queue + metrics (#84 ) Backend for the halacha approval-queue triage (#84). The keyboard UI, batch actions and defer/reject (#84.4–6) already shipped; this adds the gating, prioritization and metrics the queue was missing. db.list_halachot — two opt-in triage controls: * exclude_low_quality (#84.1): drop items carrying ANY quality_flag (application / quote_unverified / truncated / non_decision / thin / nli_unsupported / near_duplicate) — they belong in a 'needs extraction fix' bucket, not the chair's approve queue. * order_by_priority (#84.3): active-learning order — negatively-treated first, then most-uncertain (lowest confidence), then oldest — instead of FIFO, so the highest-value decisions surface first. halachot_pending (MCP) — now gated + prioritized BY DEFAULT; include_low_quality= true reveals the needs-fix bucket. The agent review path benefits immediately. GET /api/halachot — same two params, default OFF (non-breaking; the UI opts in). metrics.halacha_backlog (#84.7) — splits pending into clean vs flagged, adds deferred, reviewed_total, approve_ratio, and a pending_by_flag breakdown, so the backlog distinguishes real review work from extraction noise. Deferred (documented): #84.2 near-duplicate cluster cards and wiring the UI fetch to the new params require frontend work + an api:types regen AFTER this deploys (the new query params aren't in prod's OpenAPI until then) — a clean follow-up. The backend fully supports both now. Verified against the live DB (read-only): - pending 177 → gated-clean 110, 0 flagged items leak into the clean queue. - priority order surfaces the lowest-confidence items first (0.55, 0.55, ...). - backlog: pending_clean=110 / pending_flagged=67 / approve_ratio=0.916, pending_by_flag={nli_unsupported:59, quote_unverified:3, thin:3, truncated:2}. - pytest tests/test_halacha_quality.py — 52 passed (no regression). Invariants: G1 (gate at source — SQL filter, not post-hoc); G2 (no parallel path — same list_halachot); §6 (flagged items routed to a bucket, never dropped). Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>	2026-06-06 20:00:52 +00:00
Chaim	a2236363d4	feat(mcp): FU-14 GAP-50 — deprecate draft_section לטובת get_block_context INV-TOOL2. מיפוי הראה ש-6 כלי-הבלוק אינם כפילות מיותרת: write_block/ write_all_blocks/save_block_content/write_interim_draft משרתים זרימות שונות (CLI/initial-draft מול תהליך-ה-writer "התיקון בקובץ, לא ב-DB"). הכפילות האמיתית היחידה — draft_section (הקשר לפי-סעיף, granularity ישן) חופף ל- get_block_context (לפי-בלוק, תואם 12-הבלוקים הקנוני). הכרעת-יו"ר: draft_section סומן deprecated (docstring ב-server.py + drafting.py מפנה ל-get_block_context; draft-decision.md עודכן). ללא הסרה, ללא מיזוג כלי- הכתיבה — שמירת תהליך-הכתיבה המכוון. בדיקות: 182/182 עוברים. GAP-49+50 סגורים. Invariants: מקדם INV-TOOL2 + G2. מתועד ב-X9 (נסגר) + gap-audit פרוסה 9. Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>	2026-06-06 18:57:02 +00:00
Chaim	14568fdd15	feat(mcp): FU-14 GAP-49 — תיקון שם-הכלי המטעה (precedent_search_library) INV-TOOL2: `precedent_search_library` (שמחפש ציטוטים מצורפים-לתיק) היה הפוך וכמעט-זהה ל-`search_precedent_library` (ספריית-הפסיקה הסמכותית, מקור CREAC), מה שסיכן ציטוט מהמקור הלא-נכון בהחלטה. שונה ל-`search_case_precedents` (שם ברור: case-attached). השם הישן נשמר כ-@mcp.tool() alias deprecated המנתב לחדש → אפס שבירה לסוכנים חיים. docstrings של שני כלי-הפסיקה הובהרו (case-attached מול authoritative). עודכנו: web/app.py (typeahead), legal-researcher/legal-writer docs, precedent_library docstring. 5 כלי-החיפוש הנותרים (search_decisions/case_documents/find_similar/internal/ precedent_library) מחפשים קורפוסים מובחנים בשמות סבירים — לא בוצע rename המוני (churn גבוה, ערך נמוך מול הסיכון). בדיקות: 182/182 עוברים. אחרי deploy — סנכרון cross-company של doc-הסוכן. Invariants: מקדם INV-TOOL2 + G2. מתועד ב-X9 + gap-audit פרוסה 8. Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>	2026-06-06 18:51:17 +00:00
Chaim	29af008271	feat(mcp): FU-14 GAP-48 פרוסה 3 — envelope למשפחת drafting (סגירת GAP-48) הפרוסה האחרונה של GAP-48 (INV-TOOL1). 18 כלי drafting הומרו ל-{status,data,message} דרך tools/envelope.py — כולל מסלול הפקת-ההחלטה הקריטי. עיקרון לכלים עם כשל משמעותי (export_docx/revise_draft/apply_user_edit): err() ברמת-המעטפת — כך שהסוכן והמשתמש רואים את הכשל; failed_gates רוכב ב-data. שאר הכלים: ok(data=payload) להצלחה, err להיעדר-תיק/קלט-שגוי/חריגה. 6 צרכני-app.py חוּוטו (get_decision_template, apply_user_edit ×2, revise_draft, list_bookmarks, export_docx) עם envelope_unwrap + בדיקת status=="error"→4xx, לשמירת חוזה-ה-API (X6) ללא-שינוי. test_export_qa_gate עודכן לחוזה החדש. בדיקות: 182/182 עוברים (כולל שערי-QA של הייצוא). GAP-48 סגור: כל ~12 משפחות-הכלים אחידות. נותר ב-FU-14: GAP-49/50 (שובר), GAP-54. Invariants: משלים INV-TOOL1 + G2. מתועד ב-X9 (נסגר) + gap-audit פרוסה 7. Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>	2026-06-06 17:51:56 +00:00
Chaim	79b9c37301	feat(mcp): FU-14 GAP-48 פרוסה 2 — envelope אחיד ל-11 משפחות-כלים המשך מיגרציית INV-TOOL1 מעבר למשפחת-החיפוש (#71). הומרו ל-{status,data,message}: precedent_library, citations, internal_decisions, missing_precedents, training_enrichment, precedents, legal_arguments, cases, documents, workflow (~55 כלים). בוטלו 5 עותקי _ok/_err משוכפלים (alias ל-tools/envelope.py — SSoT, G2). עיקרון: envelope-status = הצלחת-הקריאה-לכלי; תוצאה-עסקית (idempotent_existing, noop, completed...) נשמרת בתוך data. err רק לכשל אמיתי (not-found/invalid/exception). תאימות-API: צרכני web/app.py של cases/workflow/precedents חוּוטו דרך envelope_unwrap + בדיקת status=="error"→4xx — תשובת ה-HTTP זהה, web-ui לא מושפע. (documents/legal_arguments/citations/... אינם נצרכים מ-app.py — agent-only.) בדיקות: 182/182 עוברים (test_corpus_constraints עודכן לחוזה החדש). נותר: משפחת drafting (מסלול הפקת-ההחלטה) בפרוסה נפרדת עם שער טסט-ייצוא. Invariants: מקדם INV-TOOL1 + G2 (SSoT, ביטול כפילות). מתועד ב-X9 + gap-audit. Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>	2026-06-06 17:41:39 +00:00
Chaim	aa0a736a7b	feat(mcp): FU-14 GAP-48 פרוסה 1 — envelope אחיד (SSoT) + משפחת-חיפוש INV-TOOL1: כלי-ה-MCP החזירו 3 מוסכמות סותרות (raw payload / {error} / {status,message} אד-הוק) + 5 עותקי _ok/_err משוכפלים. נוצר tools/envelope.py כמקור-אמת יחיד: ok/empty/err → {status,data,message}, כש-status מבחין מפורשות הצלחה/ריק/שגיאה. פרוסה 1 ממירה את משפחת-החיפוש (search_decisions, search_case_documents, find_similar_cases, search_internal_decisions). web/app.py מפרק את המעטפת דרך envelope_unwrap כדי לשמר את חוזה-ה-UI↔API (X6) ללא-שינוי — תשובת ה-HTTP זהה (list על hits, {"message"} על ריק/שגיאה). טסט test_search_domain_scope עודכן לחוזה החדש (5/5 עוברים). החלטה: הדרגתי לפי-משפחה ולא big-bang. מפת-צרכנים: server.py pass-through, web-ui מבודד (/api/*), רק 17 כלים נצרכים ישירות מ-app.py → סיכון מינימלי לסוכנים החיים. ~73 כלים נותרו לפרוסות הבאות. Invariants: מקדם INV-TOOL1 (envelope עקבי) + G2 (SSoT, ביטול כפילות _ok/_err). לא נוגע ב-G1. Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>	2026-06-06 16:32:07 +00:00
Chaim	40c1111e9b	feat(mcp): FU-14 GAP-47 (חלק provenance) — draft_section מחזיר document_id+page+score ה-provenance (document_id, page_number, score) כבר נשלף ב-search_similar אך נזרק בבניית פלט draft_section. כעת מוחזר לכל קטע ב-case_documents/precedents, כך שהכותב יכול לעקוב אחורה אל מסמך-המקור והעמוד ולצטטם, ולא לסמוך על תוכן חסר-מקור. תוספתי בלבד — אין צרכן שמפרסר את מפתחות-הפלט, תואם-לאחור. נותר ב-GAP-47: העברת הנחיות-יו"ר מ-analysis-and-research.md ל-DB (get_chair_directions) — שינוי-מסלול גדול יותר, לפרוסה נפרדת. Invariants: מקיים INV-TOOL4 (מקור-אמת נגיש) + G9 (provenance). לא נוגע ב-G2/G1. Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>	2026-06-06 15:58:39 +00:00
Chaim	701efab726	feat(mcp): FU-14 GAP-51 — איחוד אוצר-המילים של תוצאת-תיק (set_outcome SSoT) הכרעת-יו"ר: קנוני = 3 תוצאות אמיתיות (rejection/partial_acceptance/full_acceptance); betterment_levy יוצא מהיותו "תוצאה" ועובר ל-override לפי practice_area. + עקרון "אנגלית-ב-DB, עברית-ב-UI": מפת-תוויות SSoT אחת. lessons.py: - VALID_OUTCOMES = 3 (הוסר betterment_levy). - OUTCOME_LABELS_HE (SSoT לתצוגה) + LEGACY_OUTCOME_MAP + canonical_outcome(). - PRACTICE_AREA_OVERRIDES["betterment_levy"] מרכז את כל ה-guidance שהיה מפתוח כ-outcome (golden_ratios/opening/summary/discussion/template). - get_lessons_for_outcome(outcome, practice_area) + format_ratios_comment(..., practice_area) מחילים override + מנרמלים legacy. block_writer.py: STRUCTURE_GUIDANCE קנוני + תווית מ-OUTCOME_LABELS_HE + override betterment. workflow.set_outcome: קנוני 3 + מיפוי-legacy סלחני; תווית מ-SSoT. drafting.py: טבלת יחסי-זהב + get_decision_template מודעי-practice_area (override). web-ui case.ts: הסרת betterment_levy מ-expectedOutcomes (הוא practice_area). server.py: docstrings קנוניים. מיגרציה: migrate_gap51_outcomes.py — 9 שורות נורמלו (rejected→rejection וכו'), גיבוי ב-data/audit/. הקוד canonicalize בקריאה ⇒ backward-compatible גם בלי מיגרציה. אומת: py_compile (5 קבצים) + בדיקות-יחידה offline (override/legacy/labels) + אימות-DB. עודכנו X9 §3 + gap-audit (GAP-51 ✅). Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>	2026-06-06 15:34:49 +00:00
Chaim	ea8b48c6ac	feat(mcp): FU-14 GAP-45 — extraction_status (חשיפת תור-החילוץ הסמוי) INV-TOOL4 (visibility / persistence). תור בקשות-החילוץ (metadata/halacha) נשמר ב-case_law.{metadata,halacha}_extraction_requested_at ומרוקן ע"י precedent_process_pending — אבל לא היה כלי לראות את עומק-התור. נוסף: - db.extraction_queue_status() — count + גיל הבקשה הוותיקה לכל kind (read-only). - plib.extraction_status() — tool wrapper (envelope _ok/_err). - רישום extraction_status ב-server.py ליד precedent_process_pending. - precedent_process_pending קיבל _clamp_limit (עקביות עם GAP-53). תוספתי, read-only, אפס שבירה. עודכנו X9 (INV-TOOL4 ✅) ו-gap-audit (GAP-45 ✅). py_compile עבר על 3 קבצי הקוד. Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>	2026-06-06 15:00:25 +00:00
Chaim	034b609bd3	feat(mcp): FU-14 GAP-52 — idempotency על case_create/precedent_attach/document_upload INV-TOOL3 (idempotency על מפתח דטרמיניסטי). כל שלושת הכלים מחזירים את הרשומה הקיימת במקום ליצור כפילות: - case_create — מפתח case_number (כבר UNIQUE ב-schema): מחזיר את התיק הקיים במקום unique-violation. - precedent_attach — מפתח (case_id, section_id, citation, quote): צירוף חוזר של אותו ציטוט לאותו סעיף מחזיר את הקיים. - document_upload — מפתח (case_id, SHA-256 של בייטי הקובץ): העלאה חוזרת של אותו קובץ מחזירה את המסמך הקיים ומדלגת על copy+OCR+embed (החלק היקר). נוספה עמודת documents.content_hash (תוספתי, DEFAULT '') + get_document_by_hash. נבחרה בדיקת-מפתח ברמת-אפליקציה (SELECT-לפני-INSERT) ולא UNIQUE-constraint — כדי לא לשבור startup אם קיימים נתונים-כפולים legacy. אין מיגרציה הרסנית. עודכנו docs/spec/X9 (INV-TOOL3 ✅) ו-gap-audit (GAP-52 ✅, פרוסה 2). py_compile עבר על 4 קבצי הקוד. אימות runtime (restart MCP server) נדחה עד שהחילוץ הפעיל יסתיים. Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>	2026-06-06 14:52:33 +00:00
Chaim	ebfe7f6a1d	feat(mcp): FU-14 פרוסה 1 — get_appraiser_facts (GAP-44) + limit-caps (GAP-53) תוספתי בלבד, אפס שבירת-תאימות. שני invariants מחוזה-כלי-ה-MCP (X9): GAP-44 (INV-TOOL4, סימטריית extract/get): נוסף get_appraiser_facts — ה-get המקביל ל-extract_appraiser_facts. קורא list_appraiser_facts + detect_appraiser_conflicts מה-DB ללא חילוץ-LLM יקר ולא-דטרמיניסטי. מחזיר count=0 (לא שגיאה) אם טרם חולץ. GAP-53 (INV-TOOL5, limit-caps / OWASP API4:2023): נוסף _clamp_limit (תקרה 200, non-positive→max) על ~13 כלי list/search ב-server.py (case_list, search_, precedent_library_list, halachot_pending, missing_precedent_list, list__citations…). list_chair_feedback קיבל param limit חדש (server→workflow→db עם LIMIT) — היה ללא תקרה כלל. לא הוסף get_appraiser_facts ל-frontmatter של סוכנים (INV-AG3 "לא עודף" — ההוראות עוד לא מפנות אליו; חיווט = follow-up). נותר ב-FU-14: GAP-45/48/49/50/51/52. עודכנו docs/spec/X9 (INV-TOOL4/5) ו-gap-audit (סטטוס פרוסה 1). אומת: py_compile על 4 קבצי הקוד. אימות runtime (restart MCP server) נדחה עד שהחילוץ הפעיל של היו"ר יסתיים. Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>	2026-06-06 14:37:30 +00:00
Chaim	8a0c206ecd	feat(reindex): precedent_reindex MCP tool (GAP-09, FU-3) Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>	2026-05-30 22:09:44 +00:00
Chaim	677f29ddec	feat(audit): blocks_stale drift flag + health-check visibility (GAP-17, FU-7) Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>	2026-05-30 21:36:56 +00:00
Chaim	1f483383b9	feat(audit): log document_upload/extract_claims/export_docx (GAP-18, FU-7) Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>	2026-05-30 21:31:09 +00:00
Chaim	0c8d415044	fix(retrieval): scope search_decisions by domain — derive from case, block only on undeterminable case (GAP-12, INV-RET1) Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>	2026-05-30 18:23:41 +00:00
Chaim	084b31cd9b	fix(qa): enforce critical-QA gate on export + fix neutral_background critical-but-passed (GAP-15/16, INV-QA3/EX3) Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>	2026-05-30 17:58:50 +00:00
Chaim	58ab003206	fix(retrieval): make decisions findable by name + unhide committee uploads All checks were successful Build & Deploy / build-and-deploy (push) Successful in 3m57s Details Root cause of "agent can't find the Agasi decision in the corpus" (CMPA-55): the decision was fully ingested, but the retrieval layer failed on the realistic agent query — searching by case name. - RC-A (#52): lexical tsvector covered only chunk content + halacha text, so a bare-name query ("אגסי") matched decisions that cite the case, not the case itself. Add meta_tsv on case_law(case_name, case_number) (SCHEMA V20) and OR it into the lexical halacha/chunk SQL with a match boost, so a name/number hit surfaces the case's own rows. Agasi: rank 4 → rank 1. - RC-B (#53): precedent_library_list hard-defaulted source_kind=external_upload and never exposed the param, hiding uploaded ערר/בל"מ (internal_committee) decisions. Thread source_kind through service → tool → MCP tool (supports 'internal_committee' / 'all_committees'). - #54: agent instructions (researcher/analyst/writer) — search-by-name protocol: add content/case-number, search both corpora, use all_committees before declaring "not in corpus". - #55: chunker produced tiny fragment chunks ("דיון", "החלטה") from header keywords matched mid-sentence. Anchor SECTION_PATTERNS to line start + merge sub-min sections; exclude <50-char fragments at query time (484 existing fragments hidden; full re-chunk tracked as #57). Tests: scripts/test_retrieval_by_name.py (name ranks case above citer + substantive regressions); chunker unit checks (0 tiny chunks). New findings filed as tasks #56 (halacha source_kind leak) and #57 (re-chunk migration). Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>	2026-05-30 11:26:19 +00:00
Chaim	bb0cd7c6a2	feat(training): Style Studio — upload, rich corpus, lessons, curator portrait, chat All checks were successful Build & Deploy / build-and-deploy (push) Successful in 2m7s Details Six-phase upgrade of /training from a read-only dashboard into a full Style Studio for managing Daphna's style corpus. - Upload Sheet on /training: file → proofread preview → commit (no more CLI-only `upload-training` skill). - Rich corpus metadata: GET /api/training/corpus returns summary, outcome, key_principles, page_count, parties (regex), legal_citation, lessons_count. PATCH endpoint for chair edits. CorpusDetailDrawer with 4 tabs (details /content/lessons/patterns) replaces the bare table row. - LLM metadata enrichment: style_metadata_extractor + MCP tools (style_corpus_enrich, style_corpus_pending_enrichment) fill summary /outcome/key_principles via claude_session (free, host-side). - Per-decision lessons: new decision_lessons table + 4 REST endpoints + LessonsTab in drawer; hermes-curator now auto-posts findings as decision_lessons(source=curator). - Curator Portrait tab: prompt rendered with link to Gitea, recent curator findings, style_analyzer training prompts, propose-change form that writes proposals to data/curator-proposals/ for manual chair review (no auto-mutation of the agent file). - Style chat tab: SSE-streamed conversations with the style agent. New host-side pm2 service (legal-chat-service, port 8770) wraps claude CLI with stream-json + --resume continuation; FastAPI proxies via host.docker.internal. Zero API cost — uses chaim's claude.ai subscription. chat_conversations + chat_messages persist history. Architecture: keeps the existing rule that claude_session only runs on the host (not the container). The new legal-chat-service is the canonical bridge between the container and the local CLI for the chat feature; everything else (upload, metadata, lessons) stays within the container's existing capabilities. Audit script (scripts/audit_training_corpus.py) included for verifying which corpus rows still need enrichment. Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>	2026-05-27 10:06:22 +00:00
Chaim	2aee398b4a	feat: Stage C — RAG advanced (#33 , #47 , #48 , #49 , #50 , #51 ) All checks were successful Build & Deploy / build-and-deploy (push) Successful in 1m35s Details Six independent sub-tasks dispatched in parallel; aggregated here. ## #33 — Hide case_name column library-list-panel.tsx: `<TableHead>` + `<TableCell>` for "שם" get `className="hidden"` in both Court and Committee row variants. DB column preserved for future use. ## #47 — Audit script periodic New scripts/audit_corpus_integrity.py — 3 SQL checks (external+ערר prefix, internal missing chair/district, cases.practice_area enum) + CEO wakeup on violations + cron `0 7 * * `. First run: 0 issues. ## #48 — Parent-doc retrieval (gated, default off) Schema V17: precedent_chunks.parent_chunk_id + chunk_role ('child'\|'parent'). New chunker.chunk_document_hierarchical() — section-aware parents (~1500 tokens) containing ~5 overlapping children (~300 tokens each). New db.store_precedent_chunks_hierarchical two-pass writer. Search SQL (semantic + lexical) LEFT-JOIN parent and swap content + dedupe by parent_chunk_id when flag on. Toggle: PARENT_DOC_RETRIEVAL_ENABLED + PARENT_DOC_{CHILD,PARENT}_SIZE_TOKENS. Backfill ~3min and ~$0.20 — deferred to follow-up. ## #49 — Multimodal backfill New scripts/backfill_multimodal_precedents.py with token-matching case_number ↔ source files (PDF + DOCX via PyMuPDF). Ran in container: 26 precedents embedded, 503 pages, $0.21, 0 errors. precedent_image_embeddings grew 3 → 29 rows. 44 remaining are style_corpus-migrated rows (no source file on disk) — will catch up when re-uploaded. ## #50 — Closed-loop feedback + nDCG Schema V18: search_logs + search_relevance_feedback. New telemetry.py with fire-and-forget log_search_bg (p50 = 0.002ms — zero overhead) + auto-infer_relevance_from_citations (reads case drafts → marks score=3 when cited precedent appears in past search top-K). Hooks added to 5 search paths. scripts/compute_ndcg.py for aggregation. Two admin API endpoints (GET /api/admin/rag-metrics + POST .../infer). Dashboard UI deferred — API is enough for now. ## #51 — Halacha quality monitoring New scripts/monitor_halacha_quality.py — baseline avg confidence (trusted=0.849, all=0.833, pending=0.694) with rolling window drift detection. Default 5% threshold. Exits non-zero on alert for cron integration. Recommended: `0 8 * 1` weekly Mon 8am. ## Bonus: 230 unlinked citations → missing_precedents Bulk-imported 230 distinct unlinked citations from precedent_internal_citations to missing_precedents.status='open', party='committee', with notes listing source citers. Top candidate: ע"א 3213/97 (cited 5x). Total open missing_precedents now 237. Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>	2026-05-26 11:26:52 +00:00
Chaim	7ad995aade	feat: #34 citation graph + #32 wide-modal precedent edit + #13 verify All checks were successful Build & Deploy / build-and-deploy (push) Successful in 1m38s Details ## #34 — Daphna's internal citation graph New schema V16 (V15 was already used by proceeding_type): table ``precedent_internal_citations`` (source→cited, with cited_case_law_id nullable for citations whose target isn't in the corpus yet) + 3 indexes (source, target, unlinked). New service ``citation_extractor.py`` with regex patterns for ערר / בל"מ / עע"מ / בר"מ / עמ"נ / ע"א / בג"ץ / רע"א — accepts both ``\/`` and ``-`` separators, requires actual parenthesized district label to avoid greedy mid-paragraph captures. Resolves citations against ``case_law.case_number`` substring; default confidence 0.90 linked, 0.75 unlinked. ON CONFLICT DO NOTHING on (source, cited_case_number). 3 new MCP tools: ``extract_internal_citations``, ``list_internal_citations``, ``list_incoming_citations``. Optional flag ``include_cited_by=True`` on ``search_internal_decisions`` appends cited-by candidates as ``match_type='cited_by'`` stubs. Bulk-extracted from 40 internal_committee rows authored by דפנה תמיר: 353 distinct citations, 348 stored, 96 linked / 252 unlinked. Top citers: 1079/24 (30), 1024/24 (19), 1009/25 (18). Top unlinked target: ע"א 3213/97 (cited 5x) — natural #35 candidates. ## #32 — Wide-modal precedent edit `precedent-edit-sheet.tsx`: ``<Sheet side="left">`` → centered ``<Dialog>`` with ``sm:max-w-4xl`` ``max-h-[90vh]`` ``overflow-y-auto``. Component API unchanged so existing callers (`/precedents/[id]/page.tsx`, `library-list-panel.tsx`) work as-is. RTL preserved. Mobile falls back to near-full-width via shadcn default. ## #13 — 403/17 verification `case_law e151fc25-...` (אהרון ברק - תכנית רחביה) already in perfect shape after Stage A work: all metadata fields populated, 351 halachot with avg_conf=0.864 (well above 0.78 threshold). No re-extraction needed; closing task as verified. Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>	2026-05-26 10:37:53 +00:00
Chaim	d359ab9884	feat(proceeding-type): explicit ערר/בל"מ field for cases + corpus All checks were successful Build & Deploy / build-and-deploy (push) Successful in 1m40s Details Same case_number can exist as both a regular appeal (ערר) and an extension-of-time request (בל"מ), and we were inferring the difference from appeal_subtype prefixes — fragile, and case-number lookups weren't disambiguated. Now stored as a first-class field on both case_law (corpus) and cases (live cases), with partial unique indexes on (case_number, proceeding_type). - SCHEMA_V15: column + CHECK constraints + backfill from appeal_subtype LIKE 'extension_request_%' + partial unique indexes replace the old global UNIQUE(case_number). - derive_proceeding_type() centralizes the inference rule (extension_request_* → בל"מ; subject regex fallback; default ערר). - Metadata extractor prompt asks Claude to populate the new field explicitly; apply_to_record writes it for internal_committee rows. - internal_decision_upload, case_create, case_update accept an optional proceeding_type; FastAPI request models expose it. - Wizard + edit dialog get a sided Select; case header renders the resolved label (ערר / בל"מ). - Uploaded the 2 staged בל"מ decisions on betterment levy: 8126/24 (סופר נוח, 13 chunks), 8047/23 (הרנון, 48 chunks). Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>	2026-05-26 09:17:33 +00:00
Chaim	f3cc9ca9d4	feat: Stage A finalizers + #35/#36/#37 — critical-gap closure Some checks failed Build & Deploy / build-and-deploy (push) Has been cancelled Details Four parallel sub-agents closed the remaining critical gaps from the 26/05 Stage A/B sprint. Each block independently tested; aggregated here. ## #30/#31 finalizers (sub-agent A) * Auto-derive practice_area in case_create from case_number prefix (1xxx→rishuy_uvniya, 8xxx→betterment_levy, 9xxx→compensation_197); default for CaseCreateRequest is now "" (the DB constraint catches any stray "appeals_committee"). * practice_area.py: derive_subtype now handles axis-B domain values (rishuy_uvniya/betterment_levy/compensation_197) without parsing the case number; new helper derive_domain_practice_area(). * Halacha re-extraction verified unnecessary — all 6 reclassified records already had is_binding=false and approved halachot. * Regression tests: 6 cases in tests/test_corpus_constraints.py covering practice_area enum, internal-committee chair/district, external-upload arar prefix, MCP guard. * UI: district input → Select dropdown (7 districts) in precedent-edit-sheet.tsx, preserving legacy free-text values. ## #37 בל"מ subtypes (sub-agent B) * 3 new appeal_subtypes: extension_request_{building_permit, betterment_levy,compensation}. APPEALS_COMMITTEE_SUBTYPES extended, SUBTYPES_BY_AREA mappings added. * New helpers: is_blam_subject(), is_blam_subtype(), derive_subtype_with_blam(case_number, subject, practice_area). case_create now uses it to auto-detect "בקשה להארכת מועד" subjects. * 3 methodology templates under docs/methodology/extension-request-.md. paperclip_client.py mapping updated for the 3 new subtypes (extension_request_building_permit→CMP, the other two→CMPA). * Frontend: bilingual "בל"מ" badge + filter dropdown on cases list + detail header; appeal-type-bars collapseBlam() merges בל"מ into its parent domain for aggregate bars. * Wizard auto-detects בל"מ from subject during case creation. * 3 Berlinger cases (1017/1018/1019-03-26) migrated to appeal_subtype=extension_request_building_permit via psql. ## #35 missing_precedents feature (sub-agent C) * Schema V13: missing_precedents table (citation, case_id, party, legal_topic, status, linked_case_law_id, claim_quote, ...) + FK constraints + 3 indexes. Applied via psql + idempotent migration. * 6 db.py service functions, 3 MCP tools, 6 FastAPI endpoints (POST/GET/PATCH/DELETE/upload — upload routes by citation prefix to ingest_internal_decision or ingest_precedent). * Next.js page /missing-precedents with 5 status tabs + filters + sidebar badge counter + detail drawer with metadata edit + smart upload form that switches fields per committee/court. * Bootstrap: 7 rows imported from the JSON file (3 citations × cases, all status=closed with linked_case_law_id). * legal-researcher.md: new §2ב.5 with missing_precedent_create usage + dedup semantics + tool grant. ## #36 legal_arguments aggregation (sub-agent D) * Schema V14: legal_arguments + legal_argument_propositions M:M. Applied via psql. * New service argument_aggregator.py with two functions — aggregate_claims_to_arguments() (Claude CLI / claude_session) and get_legal_arguments(). Graceful llm_unavailable handling when CLI is missing (containers). * 2 MCP tools + 2 API endpoints (POST .../aggregate-arguments as BackgroundTask, GET .../legal-arguments). * Frontend: shadcn Accordion + new legal-arguments-panel.tsx with hierarchical (party → priority badge → arguments) display, "טיעונים" tab on the case page, "חשב/חשב מחדש" buttons. * scripts/backfill_legal_arguments.py + SCRIPTS.md entry — dry-run found 8 candidate cases including 1017/1018/1019. ## Open follow-ups (intentionally deferred) * npm run api:types in web-ui (CLAUDE.md flow) — recommended before the next UI commit; not required for backend deployment. * Run backfill_legal_arguments.py --apply once the container picks up the new aggregator service. * webhook on missing-precedents upload-close to Paperclip (optional). Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>	2026-05-26 08:34:40 +00:00
Chaim	c6e368e4f7	feat(corpus): Stage A — corpus tagging fixes + prevention layer All checks were successful Build & Deploy / build-and-deploy (push) Successful in 3m8s Details מתקן את הבאג של תיוג שגוי לועדות ערר ומונע חזרתו: Code changes: * New MCP tool `internal_decision_upload` (chair_name+district required) — sole supported path for ingesting committee decisions; tags source_kind='internal_committee' automatically. * Citation guard in `precedent_library_upload` rejects citations starting with "ערר" or "בל\"מ" with a directive to use internal_decision_upload. * `practice_area.py` taxonomy unification: PRACTICE_AREAS now accepts both multi-tenant (appeals_committee/national_insurance/labor_law) and domain (rishuy_uvniya/betterment_levy/compensation_197) values. New helper `to_db_practice_area(multi_tenant, subtype) -> domain`. Agent docs: * legal-researcher (+5K): upload-tool decision flowchart, code samples per source_kind, district enum (ירושלים/מרכז/תל אביב/צפון/דרום/חיפה/ארצי) * legal-ceo, legal-analyst, legal-writer, legal-qa, HEARTBEAT — taxonomy awareness + source_kind-aware citation patterns + research_complete as valid status. * Fixed two pre-existing wrong practice_area values in examples (histael_hashbacha→betterment_levy, pitsuim_197→compensation_197). Closes TaskMaster #30(parts), #38(parts), #39 (root cause). DB-side backfill + CHECK constraints applied directly via psql: * 11 cases.practice_area corrected (1xxx→rishuy, 8xxx→betterment) * 6 case_law records reclassified external_upload→internal_committee with inferred district * 6 chair_name backfilled from full_text (5 שרית אריאלי + 1 דפנה תמיר) * 88 new halachot extracted for newly-uploaded precedents (אנטרים + ירושלים שקופה 1112/22 + אגא וכט) * CHECK constraints: cases.practice_area enum, case_law internal⇒district Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>	2026-05-26 07:40:18 +00:00
Chaim	b368bce690	fix: handle invalid date formats gracefully and add missing dialog descriptions All checks were successful Build & Deploy / build-and-deploy (push) Successful in 4m14s Details - Wrap date.fromisoformat() in try/except in case_update tool — prevents unhandled ValueError from surfacing as 500; FastAPI now catches it as 422 - Add DialogDescription (sr-only) to 5 dialogs missing aria-describedby: documents-panel preview + delete, drafts-panel delete + feedback, link-related-dialog Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>	2026-05-25 15:53:01 +00:00
Chaim	cddc7c8d24	fix: start-workflow wakeup failure now returns 502 instead of silent success All checks were successful Build & Deploy / build-and-deploy (push) Successful in 1m33s Details If pc_wake_ceo fails, the endpoint now raises HTTP 502 and skips the case_update to processing — preventing cases from silently getting stuck with no CEO running. Also adds `processing` to CEO routing table and updates case_list docstring with full status list. Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>	2026-05-17 11:02:30 +00:00
Chaim	83b6ff51b7	feat: fix wizard step-skip bug + extend case edit with all fields + Paperclip title sync All checks were successful Build & Deploy / build-and-deploy (push) Successful in 1m38s Details - Fix keyboard navigation bug: React was reusing the submit button DOM element when transitioning "הבא" → "צור תיק", retaining focus and causing Enter to auto-submit step 3. Added key props to force element replacement. - CaseEditDialog now covers all wizard fields: appellants, respondents, property_address, permit_number (in addition to existing title, subject, hearing_date, expected_outcome, notes). - When case title changes, Paperclip project name is updated in background via new update_project_name() in paperclip_client.py. - Extended CaseUpdateRequest, case_update MCP tool, and caseUpdateSchema to carry the new fields end-to-end. Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>	2026-05-17 10:55:45 +00:00
Chaim	3e14cd6798	feat: link related precedents across court instances (SCHEMA_V11) Add ability to mark case_law records as related (e.g. same appeal through ועדת ערר → מנהלי → עליון): - DB: case_law_relations join table (bidirectional, V11 migration) - DB CRUD: add/remove/get_case_law_relations - Service: get_precedent() now returns related_cases[] - MCP: precedent_link_cases + precedent_unlink_cases tools - REST: POST/DELETE /api/precedent-library/{id}/relations - UI: RelatedCasesSection on detail page with search dialog and unlink Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>	2026-05-10 07:52:29 +00:00
Chaim	bd4b0ca766	feat(mcp): case_get_final_text — fall back to PDF/DOC/RTF/TXT/MD All checks were successful Build & Deploy / build-and-deploy (push) Successful in 3m58s Details The Hermes Knowledge Curator's hermes-curator.md says it must be able to read both DOCX and PDF final decisions. The original implementation hardcoded the .docx extension only. Extend to try .docx → .pdf → .doc → .rtf → .txt → .md, returning the first match. extractor.extract_text already supports all six formats, so no extractor changes needed. If none found, the not_found response now includes the tried_extensions list so the caller knows what was attempted. Verified on case 1130-25 (.docx still picked first) and tested via `curator-cmp mcp test legal-ai`.	2026-05-05 19:18:57 +00:00
Chaim	7c9582ed04	feat(mcp): case_get_final_text — let agents read the signed final DOCX All checks were successful Build & Deploy / build-and-deploy (push) Successful in 1m36s Details The Knowledge Curator (Hermes) couldn't read סופי-{case}.docx because document_get_text only works on rows in the documents table — the final file is just a copy in the case's exports/ directory, not a tracked document. CMP-71 hit this and produced an unproductive interaction asking the user how to fix the access issue. Add a new MCP tool that: - Locates exports/סופי-{case_number}.docx via config.find_case_dir - Extracts text using the existing extractor service (python-docx based) - Returns JSON with status + text + page_count + truncation info - Optional max_chars cap for large decisions Smoke test on case 1130-25: 400-char preview returns proper Hebrew text beginning with "לפנינו ערר על החלטת הוועדה המקומית...". The local MCP server reloads on next Hermes spawn (stdio mode), so the tool is immediately available — no Coolify deploy needed. Curator's promptTemplate (DB-stored) updated to use the new tool as the primary path for reading the final. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>	2026-05-05 15:57:10 +00:00
Chaim	92a2763b86	feat: add internal committee decisions corpus (source_kind='internal_committee') All checks were successful Build & Deploy / build-and-deploy (push) Successful in 1m31s Details Three-layer separation: style learning (style_corpus), appeals-committee decisions (internal_committee), and court rulings (external_upload). - SCHEMA_V10: chair_name + district columns on case_law and cases, partial indexes - create_internal_committee_decision() DB upsert function - search_precedent_library_semantic() now accepts source_kind/district/chair_name params - search_precedent_library_hybrid() passes through new params - services/internal_decisions.py: ingest_internal_decision, migrate_from_style_corpus, migrate_from_external_corpus (identifies rows via source_type='appeals_committee') - search_internal_decisions() MCP tool (server.py + tools/search.py) - internal_decision_migrate() MCP admin tool - Web endpoints: POST /api/internal-decisions/upload, POST /api/internal-decisions/migrate, GET /api/internal-decisions - ingest_final_version auto-ingests finalized decisions into internal corpus - SKILL.md updated: agents now search internal + external in parallel, present separately Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>	2026-05-04 18:33:39 +00:00
Chaim	eab0ca906c	feat(interim): include block-he opening in pre-ruling interim drafts block-he (פתיחה ניטרלית) was previously emitted only in final decisions. For interim drafts shown to the chair before ruling, including a neutral opening helps the chair confirm framing before approving downstream blocks. Skipped if empty, so legacy cases without block-he are unaffected. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>	2026-05-04 17:25:54 +00:00
Chaim	81ccf3a888	feat(retrieval): track page_number on text chunks for multimodal hybrid boost All checks were successful Build & Deploy / build-and-deploy (push) Successful in 6m33s Details The legacy chunker did not track which PDF page each chunk came from. Stored chunks had page_number=NULL, which blocked the multimodal hybrid retriever's text+image boost — it joins (chunk, image) on (document_id, page_number) and the join could never fire. This change: - extractor.extract_text now returns (text, page_count, page_offsets); page_offsets[i] is the start char offset of page (i+1) in the joined text. None for non-PDFs. - chunker.chunk_document accepts an optional page_offsets and tags each chunk with the page that contains its first character (uses the existing chunker logic; pages assigned post-hoc by content search to keep the diff minimal). - processor.process_document and precedent_library.ingest_precedent forward page_offsets through the chunker. New uploads now carry accurate page_number on every chunk. - Other extract_text callers (tools/documents, tools/workflow, web/app.py) updated to unpack the third element (ignored). - scripts/backfill_chunk_pages.py: per-case retrofit. Re-extracts each PDF (re-OCRs via Google Vision if needed, ~$0.0015/page), computes page_offsets, and updates page_number on every chunk by content search. Idempotent; --force re-runs on already-tagged docs. Forward-only would leave the 419 image embeddings backfilled on cases 8174-24 + 8137-24 unable to boost their corresponding text chunks. The retrofit script closes that gap (cost ~$0.60). Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>	2026-05-03 19:49:41 +00:00
Chaim	242f668319	feat(retrieval): add voyage-multimodal-3 page-image embeddings (feature flag) All checks were successful Build & Deploy / build-and-deploy (push) Successful in 1m50s Details Stage C: per-page image embeddings via voyage-multimodal-3 + hybrid text+image search. Off by default; enable with MULTIMODAL_ENABLED=true. - Schema V9: document_image_embeddings + precedent_image_embeddings (vector(1024), page_number, image_thumbnail_path) - extractor.render_pages_for_multimodal renders PDF pages at MULTIMODAL_DPI (144) for embedding + JPEG thumbnails at MULTIMODAL_THUMB_DPI (96) for UI preview, in one pass - embeddings.embed_images calls voyage-multimodal-3 in 50-page batches - services/hybrid_search.py orchestrator: rerank applied to text side first (rerank-2 is text-only); image side cosine; weighted merge with text_weight 0.65 (env-tunable); image-only pages surface as match_type='image' so dense scanned content still appears - processor.process_document and precedent_library.ingest_precedent gated by flag — non-fatal on multimodal failure - scripts/multimodal_backfill.py — idempotent per-case CLI to embed existing documents without re-extracting text Validated locally on a 5-page response brief: render 0.31s, embed 8.32s, hybrid merge surfaces image rows correctly. Production rollout starts with flag=false (no behavior change), then per-case A/B. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>	2026-05-03 19:24:52 +00:00
Chaim	26c3fddf41	feat(retrieval): add voyage rerank-2 cross-encoder stage (feature flag) All checks were successful Build & Deploy / build-and-deploy (push) Successful in 1m29s Details Stage B of voyage-upgrades-plan rewritten: instead of context-3 (which 4 POCs showed inconsistent improvement), add a cross-encoder rerank layer on top of voyage-3. Default off (VOYAGE_RERANK_ENABLED=false). POC validation (785-doc corpus, 12 queries, claude-haiku-4-5 judge): - mean@3 +4.5% (4.306 → 4.500) - practical-category queries +11.6% (3.78 → 4.22) - latency +702ms per query - no schema change, no re-embed, no double storage Plumbing: - config: VOYAGE_RERANK_ENABLED / _MODEL / _FETCH_K env vars - embeddings.voyage_rerank() wraps voyageai client.rerank - services/rerank.py: maybe_rerank() helper — fetches FETCH_K candidates via the bi-encoder then reranks to top-K. Fail-open if Voyage rerank is unavailable. - tools/search.py: search_decisions, search_case_documents, find_similar_cases all wrapped - services/precedent_library.search_library wrapped Smoke-tested locally with flag on/off — produces expected behaviour and latency profile. Ready for production rollout via Coolify env flip after deploy. POCs (kept under scripts/ for reference): - voyage_context3_poc{_long}.py — context-3 evaluation (rejected) - voyage_multimodal_poc.py — multimodal-3 (stage C, deferred) - voyage_rerank_judge_poc.py — single-case rerank benchmark - voyage_rerank_corpus_poc.py — full-corpus rerank validation Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>	2026-05-03 18:43:41 +00:00
Chaim	4a9a6b7970	feat(precedents): UI button queues extraction for local MCP worker All checks were successful Build & Deploy / build-and-deploy (push) Successful in 1m27s Details The chair wanted a one-click "extract metadata" button on the edit sheet. The constraint stays the same — claude_session needs the local CLI which the container doesn't have, so the button can't run the extractor itself. Compromise: button stamps a queue marker; the local MCP server drains the queue on demand. DB (V8): two nullable timestamps on case_law, metadata_extraction_requested_at and halacha_extraction_requested_at, with partial indexes for cheap "find pending" scans. API: POST /api/precedent-library/{id}/request-metadata → stamp the row POST /api/precedent-library/{id}/request-halachot → same for halacha GET /api/precedent-library/queue/pending?kind=... → read-only view UI: Sparkles button in the edit sheet header. Click → toast tells the chair what to run from Claude Code. The button never triggers the extractor directly from the container. MCP tool: precedent_process_pending(kind, limit) — runs from Claude Code with the local CLI, picks up everything stamped, calls the extractor for each, clears the timestamp on success. Failures keep the timestamp so the next invocation retries them. Architectural rule (claude_session local-only) is preserved end-to-end and called out in the new endpoint comment + tool docstring. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>	2026-05-03 12:32:25 +00:00
Chaim	73a79ea7e8	feat(precedents): metadata auto-fill, edit sheet, persuasive extraction All checks were successful Build & Deploy / build-and-deploy (push) Successful in 1m28s Details Three improvements to the precedent library based on usage feedback: 1. Auto-fill metadata at upload time. New service precedent_metadata_extractor reads the ruling's full_text and suggests case_name (short), summary, headnote, key_quote, subject_tags, appeal_subtype. The merge policy fills only empty fields, preserving everything the chair typed in the upload form. Wired into the ingest pipeline; also exposed as a re-run endpoint POST /api/precedent-library/{id}/extract-metadata for existing records. 2. Edit sheet in the UI. Pencil icon on each library row opens a pre-populated form covering every field. A Sparkles button on the sheet runs the metadata extractor on demand and refreshes the form. The case_number is read-only because halachot are FK'd to it; renaming requires delete + re-upload. 3. Halacha extractor branches on is_binding. Sources marked binding (Supreme/Administrative) keep the strict halacha prompt. Non-binding sources (other appeals committees, district courts on planning matters) get a different prompt that extracts applications, interpretive principles, and persuasive conclusions — labeled with new rule_types 'application' and 'persuasive'. The fallback also widens chunk selection: if the chunker labeled nothing as legal_analysis/ruling/conclusion, we now run on all chunks rather than returning zero halachot for a usable ruling. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>	2026-05-03 10:19:35 +00:00
Chaim	7ee90dce31	feat: external precedent library with auto halacha extraction All checks were successful Build & Deploy / build-and-deploy (push) Successful in 1m27s Details Adds a third corpus of legal authority distinct from style_corpus (Daphna's prior decisions for voice) and case_precedents (chair-attached quotes per case). The new corpus holds chair-uploaded court rulings and other appeals committee decisions, with binding rules (הלכות) extracted automatically and queued for chair approval. Pipeline (web/app.py + services/precedent_library.py): file → extract → chunk → Voyage embed → halacha_extractor → store + publish progress over the existing Redis SSE channel. Schema V7 (services/db.py): extends case_law with source_kind + extraction status fields under a CHECK constraint pinning practice_area to the three appeals committee domains (rishuy_uvniya, betterment_levy, compensation_197). New precedent_chunks (vector(1024)) and halachot tables (vector(1024) over rule_statement, IVFFlat indexes, gin on practice_areas/subject_tags). Halachot start as pending_review; only approved/published rows are visible to search_precedent_library. Agents: legal-writer, legal-researcher, legal-analyst, legal-ceo, legal-qa get search_precedent_library. legal-writer prompt explains the three-corpus distinction and CREAC use; legal-qa now verifies that every cited halacha resolves to an approved row in the corpus. UI: /precedents page with four tabs — library / semantic search / pending review (J/K nav, A/R/E shortcuts, badge count) / stats. Reuses the existing upload-sheet progress + SSE pattern. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>	2026-05-03 08:38:18 +00:00
Chaim	f256eddbb1	git_sync: full case-dir backup to Gitea (sweep + explicit commits) All checks were successful Build & Deploy / build-and-deploy (push) Successful in 1m25s Details The case repo is the user's backup, so anything in the dir must end up on Gitea. Two layers: 1. Periodic sweep (every 30s) — git_sync.sweep_loop runs as a FastAPI background task. It scans every case dir, runs git status --porcelain on each, and commit_and_push's any dirty changes with an auto-built Hebrew message ("אוטו: טיוטות (2) · מסמכים"). Catches files written outside the API path: agent research artefacts, manual edits, etc. 2. Explicit commits at known write paths — DOCX export, interim draft, apply_user_edit, revise_draft, mark-final, analysis DOCX export. These give immediate feedback with descriptive messages instead of waiting up to 30s for the sweep. safe.directory injection added to _git_env so sweep + explicit commits work even when the running uid differs from the case-dir owner (host runs vs. uniform-root container). Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>	2026-04-30 18:27:36 +00:00
Chaim	fa70944ed4	case-create: surface Gitea repo result + UI retry button All checks were successful Build & Deploy / build-and-deploy (push) Successful in 1m29s Details The auto-creation in case_create had two failure modes that combined to make repos silently missing: a stale GITEA_TOKEN returning 401, and the outer try/except in case_create that swallowed every exception with a bare pass. Result: cases like 8174-24 ended up with a local git repo and Paperclip project but no Gitea repo, with no signal anywhere. _setup_gitea_remote now returns {ok, url, error} and never raises; the result is attached to the case JSON and the FastAPI endpoint logs a warning when ok=false. The UI gets a "צור ריפו ב-Gitea" button on the case header that appears only when the repo or remote is missing. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>	2026-04-30 18:12:05 +00:00
Chaim	5e4c03d0cd	Case sync: refresh remote URL with current token before each push All checks were successful Build & Deploy / build-and-deploy (push) Successful in 1m28s Details Cases failed to push silently after the Gitea token in Infisical was rotated: the embedded credential in each case repo's origin URL was the old token, the rotation never propagated, and capture_output=True hid the auth failure as a logger.warning. Three cases (1033-25, 1130-25, 1194-25) accumulated unpushed commits over weeks before this was noticed. Fixes the root cause in two places: web/gitea_client.py for uploads through the FastAPI endpoint, and mcp-server/services/git_sync.py for case_update / document_upload through MCP tools (which previously committed but never pushed at all). The new commit_and_push helper: - re-injects the current GITEA_ACCESS_TOKEN into the existing origin URL on every call, so pushes survive token rotation - logs push failures at WARNING with the actual stderr (the previous code suppressed errors entirely) - continues to push even when the commit was a no-op, in case earlier commits are still unpushed Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>	2026-04-28 17:14:57 +00:00
Chaim	36ca713dfa	Retrofit: tighten yod-bet pattern, add cover-block fallback All checks were successful Build & Deploy / build-and-deploy (push) Successful in 6s Details The "על כן" pattern for block-yod-bet was too greedy and matched mid-discussion transitional sentences (e.g. "על כן, במקום בו..."), which caused forward-scan to skip block-yod-alef ("סוף דבר") via the pointer advance. Tightened to require an operative subject (אנו / הערר / הוועדה / ועדת הערר) so terminal "על כן, אנו מחליטים" still matches but mid-block transitions don't. Added structural_fallback for cover blocks (alef/bet/gimel/dalet) — these are template metadata not present in user-edited DOCX bodies. Inject zero-content anchors so apply_user_edit can still target them later. The frontend toast distinguishes real content gaps from fallback anchors. Also expanded heading patterns based on training corpus inspection: - block-vav: על המקרקעין חלות / במצב התכנוני / התכניות החלות - block-zayin: טענות העוררת - block-chet: עיקר תגובת המשיב - block-tet: הדיון בוועדת הערר For case 1130-25, this raises detection from 6/12 to 11/12 blocks — only block-yod-bet remains missing (Daphna's edit ends at "סוף דבר" + numbered ruling, no terminal "ההחלטה" or "על כן אנו מחליטים" paragraph). Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>	2026-04-26 06:57:41 +00:00
Chaim	c536ed0e63	Edit document doc_type and appraiser side from the case UI All checks were successful Build & Deploy / build-and-deploy (push) Successful in 1m26s Details Until now changing a document's doc_type required a manual SQL update. Adds an inline editor on the document badge so the chair can retag without leaving the case page, and threads an appraiser_side tag (committee / appellant / deciding) through the appraisal pipeline so betterment-levy cases — which usually have 2-3 appraisers — render conflicts with the deciding appraiser's view marked as governing. Backend - New appraiser_facts.appraiser_side column (V5.1) populated from documents.metadata.appraiser_side at extraction time. - extract_appraiser_facts now returns status='sides_missing' with the list of untagged appraisals instead of running with empty side labels — chair must tag every appraisal first via the UI. - Conflict detection orders entries committee → appellant → deciding so the deciding appraiser appears last; block-tet's prompt instructs the writer to phrase the deciding appraiser's view as the governing factual finding ("ואולם, השמאי המכריע קבע..."). - New PATCH /api/cases/{n}/documents/{doc_id} (Pydantic model with whitelist validation) and matching document_update MCP tool. Both merge appraiser_side into metadata JSONB instead of touching the schema. UI - New shared doc-types module exports the canonical 11 doc_type options plus the 3 appraiser-side options; both upload-sheet and the document badge now read from it instead of duplicating Hebrew labels. - New DocumentTypeEditor renders a Popover off the doc-type Badge with two Selects. The save button stays disabled while doc_type is appraisal but no side has been picked, mirroring the backend enforcement so the user finds out before triggering extraction. - usePatchDocument React-Query mutation invalidates the case detail on success so the badge updates without a manual refresh.	2026-04-19 06:26:51 +00:00
Chaim	c619c22a51	Add pre-ruling interim draft (טיוטת ביניים) for appeals committee All checks were successful Build & Deploy / build-and-deploy (push) Successful in 1m26s Details Lets the chair generate a partial decision DOCX before the discussion-and- ruling block is decided. Same template, skill and DOCX styling as the final decision (David, RTL, bookmarks) — only the block selection and order differ: רקע (ו) → תכניות+היתרים (ט) → טענות (ז) → הליכים (ח). The opening (ה), ruling (י), summary (יא), and signatures (יב) are omitted. - New appraiser_facts table + CRUD + conflict detection in db.py (V5 schema). Conflict = same plan/permit identifier reported differently by 2+ appraisers. - New appraiser_facts_extractor service: per-appraisal Claude extraction of plans + permits with raw quotes and page numbers. - block-tet prompt extended with a permits sub-section sourced from the extracted facts, plus an explicit instruction to flag inter-appraiser conflicts in neutral wording without resolving them (deferred to block-yod). - block-chet prompt extended with a post-hearing materials context sourced from documents.metadata.is_post_hearing. - docx_exporter.export_decision now accepts mode='interim' which reorders the blocks per the chair's mental model and writes טיוטת-ביניים-v{N}.docx (versioned independently of regular drafts). - 3 new MCP tools: extract_appraiser_facts, write_interim_draft, export_interim_draft. write_interim_draft auto-runs extraction if the appraiser_facts table is empty for the case.	2026-04-18 13:28:04 +00:00

1 2

69 Commits