legal-ai

Author	SHA1	Message	Date
Chaim	a02b929b5c	fix(precedents): collision-tolerant case_number normalization: skip and log, don't crash (#145) The citation_formatted backfill exposed a crash in apply_to_record: when an external ruling carries a docket that already belongs to another (duplicate) record, normalizing case_number → clean docket hits uq_case_law_external_number and aborts the whole merge (including the citation). Example: 'ע"א 3213/97' → '3213/97', which already exists (the נקר duplicate). - db.case_number_collides(case_number, exclude_id): checks whether the docket already belongs to another non-internal record (the partial index). - apply_to_record: skips case_number normalization on a collision (the duplicate is left for a later dedup, out of scope here) and still writes the citation. no-silent-swallow: logs a warning. - scripts/backfill_precedent_citations.py: per-row try/except + an error counter, so one row cannot take down the batch. Verified: full re-run with no crash (0 errors); the collision was logged and skipped as expected; court rulings: 224/228 filled, 4 abstained (missing parties/date; abstention, INV-AH). test_fu2b_reconcile ✓. Invariants: INV-AH (abstention) · G1 (normalize-on-write kept, it just no longer crashes) · Constitution §6 (no silent swallowing: the skip is logged). Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>	2026-06-15 04:17:07 +00:00
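A minimal sketch of the collision-tolerant normalization and the per-row backfill described in the commit above. It replaces the database with an in-memory list. Only case_number_collides mirrors a name from the message. Record, normalize_case_number and backfill are hypothetical, and so is the assumption that internal rows are marked source_kind "internal_committee" and excluded from the partial unique index.

```python
import logging
from dataclasses import dataclass

logger = logging.getLogger("precedents_sketch")


@dataclass
class Record:
    id: int
    case_number: str
    source_kind: str  # assumed values: "external_upload", "internal_committee"


def case_number_collides(records, case_number, exclude_id):
    """True if another non-internal record already holds this docket.

    Stands in for the partial unique index: internal rows are ignored.
    """
    return any(
        r.id != exclude_id
        and r.source_kind != "internal_committee"
        and r.case_number == case_number
        for r in records
    )


def normalize_case_number(records, record, clean_docket):
    """Write the clean docket unless it collides; on collision, log and skip."""
    if case_number_collides(records, clean_docket, record.id):
        logger.warning(
            "skipping case_number normalization for record %s: %r already in use",
            record.id,
            clean_docket,
        )
        return False
    record.case_number = clean_docket
    return True


def backfill(rows, process_row):
    """Per-row try/except so one bad row cannot abort the batch; returns error count."""
    errors = 0
    for row in rows:
        try:
            process_row(row)
        except Exception:
            errors += 1
            logger.exception("backfill failed for row %r", row)
    return errors
```

A skipped normalization returns False and leaves the row untouched. The caller can then write the citation anyway, which is what the commit relies on.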
chaim	7043de0ac2	Merge pull request 'feat(precedents): deterministic citation_formatted in code; Gemini extracts components, does not format (#145)' (#262) from worktree-precedent-deterministic-citation into main	2026-06-15 03:38:51 +00:00
Chaim	d6608ce849	feat(precedents): deterministic citation_formatted in code; Gemini extracts components, does not format (#145) The problem (#145): the metadata extractor asked Gemini Flash to format the full citation (citation_formatted). In free-form JSON mode (no responseSchema) the model returned valid, complete JSON but consistently omitted exactly this field. Verified on 8070-05-25, 1194-12-25 and 1200-12-25, even when the parties were identified. The hardest field (string formatting) plus prompt permission to leave it empty → Flash drops it. The fix: citation_formatted is a derived display field (X1 §3 / INV-ID2), assembled deterministically from structured components rather than formatted by an LLM. The LLM's role shrinks to extracting reliable components (the parties line, and the proceeding prefix for court rulings). - db.format_precedent_citation(record): assembles the citation per the unified citation rules. Appeals committee (district/national/בל"מ) is built from proceeding_type+district+source_kind; court rulings from court_prefix (LLM)+district-abbrev. Pulls a clean docket out of a polluted case_number ("עע\"מ 683/13"→"683/13"). Abstains ('') when a component (parties/docket/date/prefix) is missing: abstention over invention (INV-AH). - case_law.parties (V39): the "appellant v. respondent" line (עורר נ' משיב) as a re-derivable basis. - Metadata extractor: the prompt extracts parties+citation_prefix (not citation_formatted); apply_to_record assembles the citation deterministically from the effective record and fills only an empty field (chair edits are preserved). - scripts/backfill_precedent_citations.py: 2-pass backfill (deterministic→LLM), reports abstained rows, idempotent. Verified: the 3 manual records are reproduced character-for-character; a real Supreme Court ruling was filled end-to-end (עע"מ 683/13 ... נבו 3.9.2015). test_fu2b_reconcile ✓. Invariants: INV-ID2/X1§3 (citation = derived display, not a key) · INV-AH (abstention, zero invention) · G1 (clean docket) · G2 (single path: replaces the LLM path rather than running in parallel). Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>	2026-06-15 03:37:53 +00:00
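A rough sketch of the deterministic-assembly-with-abstention idea from the commit above, covering only the court-ruling branch. Several parts are assumptions: the docket regex, the exact layout ("{prefix} {docket} **{parties}** (נבו {d.m.yyyy})"), and Markdown ** as the bold delimiter. They are modelled loosely on the "עע"מ 683/13 ... נבו 3.9.2015" example and are not the real unified-citation rules. extract_docket and format_court_citation are hypothetical names; the committee branch (proceeding_type+district+source_kind) is not reproduced.

```python
import re
from datetime import date

# Assumed docket shapes: "683/13" (court) or "8070-05-25" (committee).
DOCKET_RE = re.compile(r"\d+/\d{2,4}|\d+-\d{2}-\d{2}")


def extract_docket(case_number):
    """Pull a clean docket out of a possibly polluted case_number ("" if none)."""
    match = DOCKET_RE.search(case_number or "")
    return match.group(0) if match else ""


def format_court_citation(parties, case_number, decision_date, court_prefix):
    """Assemble a court-ruling citation, or abstain with "" if any part is missing."""
    docket = extract_docket(case_number)
    if not (parties and docket and decision_date and court_prefix):
        return ""  # abstention: never guess a missing component
    when = f"{decision_date.day}.{decision_date.month}.{decision_date.year}"
    # Hypothetical layout; parties in Markdown bold (assumed ** delimiter).
    return f"{court_prefix} {docket} **{parties}** (נבו {when})"
```

Returning an empty string instead of a partial citation means an abstained row can simply be retried once the missing component has been extracted.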
Chaim	77817a46ad	fix(metadata): don't settle to 'completed' on a transient Gemini extraction failure (#138) The metadata side (precedent_metadata_extractor) read only GEMINI_API_KEY while the environment provides GOOGLE_GEMINI_API_KEY; fixed in PR #255 (fallback). The remaining secondary bug: when extract_and_apply returned 'no_metadata' (Gemini failure), the drainer path process_pending_extractions settled to metadata_status='completed' unconditionally. The record was silently abandoned with empty metadata and the drainer never came back to it (observed: da2d9ccb '4491-02-21', 5fabdac5 '14306-09-23': completed but with empty court/date/summary). Fix (G1: root-cause diagnosis): - extract_and_apply distinguishes an empty result: full_text present → 'extraction_failed' (transient, retryable); no full_text → 'no_metadata' (nothing to extract). - process_pending_extractions (metadata): 'extraction_failed' → back to 'pending' (keeping the queue stamp) instead of settling to 'completed'. The existing retry loop tries again, and once retries are exhausted the record stays in the queue. The reextract_metadata path was already consistent (returns to pending on anything other than completed/no_changes). Data fix (done manually via the MCP tool precedent_extract_metadata): da2d9ccb + 5fabdac5 were re-extracted successfully (court/date/summary/headnote/tags filled). 0 external 'completed-but-empty' records remain. Note: the Gemini key is not needed in Coolify. The extractor runs only locally (precedent_library → extract_and_apply, host ~/.env with GOOGLE_GEMINI_API_KEY), and app.py imports only the PLACEHOLDER_PENDING_EXTRACTION constant, not the extraction function. Tests: test_metadata_extract_failure_status (transient/permanent/missing). All 335 mcp tests pass. Guards clean. Invariants: G1 (root-cause diagnosis, not settle-on-read), INV-G3/X16 (resilience: retryable), G12 (leak-guard clean). Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>	2026-06-15 03:34:43 +00:00
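The status split in the commit above reduces to two small decisions, sketched here under stated assumptions. classify_empty_result and next_metadata_status are hypothetical names, not the repo's functions. The assumption that every result other than 'extraction_failed' still settles to 'completed' is inferred from the message.

```python
def classify_empty_result(full_text):
    """Status for an extraction that returned no fields."""
    # Text was there but nothing came back: a transient, retryable failure.
    # No text at all: there is genuinely nothing to extract.
    return "extraction_failed" if full_text else "no_metadata"


def next_metadata_status(result):
    """Queue status the drainer stores after a metadata attempt."""
    if result == "extraction_failed":
        return "pending"  # keep the row queued for the retry loop
    return "completed"
```

Keeping the transient case in 'pending' is what lets the existing retry loop pick the row up again rather than abandoning it with empty metadata.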
Chaim	a05df3eb1a	fix(precedents): normalize citation→docket case_number + enforce source_type↔precedent_level Two bugs in external case-law ingestion, found on case 1132-09-24, which was uploaded via "missing precedent": 1. case_number received the full citation string instead of a clean docket. Cause: overwrite_case_number=True was passed only on the internal path (internal_decisions); the drainer path for external rows left the citation in case_number (precedent_library: case_number=citation). Scope: 122 external_upload records. 2. source_type was not enforced against precedent_level; only the prompt asked the LLM to keep them consistent. When the LLM output level=ועדת_ערר_מחוזית but source_type=court_ruling, the decision was classified in the library as a "court ruling". Fix (in apply_to_record, so every path benefits): • case_number is normalized to the clean docket when (a) the caller forces it or (b) the current value looks like a citation (contains a space, or length>20). The _is_clean_docket guard ensures a non-docket value is never written to the identity field (LLM garbage is rejected). • _source_type_for_level derives source_type from precedent_level and overrides inconsistencies (ועדת_ערר_*→appeals_committee; עליון/מנהלי→court_ruling): one source of truth, no reliance on LLM consistency. Tested: 18 unit tests (docket validation, level→type mapping) + 3 integration tests against apply_to_record with a mocked DB (normalization, no overwrite of a valid docket, garbage rejection, consistency enforcement). py_compile clean. A point fix was already applied manually to 1132-09-24. Backfill of the 122 records is separate (TaskMaster #141). Invariants: G1 (fix at the source), G2 (same extractor, no parallel path), INV-AH (deterministic source of truth for classification, not an LLM guess), G11 (clean case identity). Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>	2026-06-14 20:57:08 +00:00
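The two rules in the commit above can be sketched as pure functions. The space/length>20 heuristic and the level→type mapping are taken from the message. The docket regex is an assumption, and looks_like_citation, is_clean_docket, should_normalize and source_type_for_level are hypothetical stand-ins for the private helpers it names.

```python
import re

# Assumed docket grammar, matching the shapes in the commit's examples.
_DOCKET_RE = re.compile(r"\d+/\d{2,4}|\d+-\d{2}-\d{2}")


def looks_like_citation(value):
    """Commit heuristic: a space or length > 20 marks a citation, not a docket."""
    return " " in value or len(value) > 20


def is_clean_docket(value):
    return _DOCKET_RE.fullmatch(value or "") is not None


def should_normalize(current, candidate, forced=False):
    """Overwrite only when forced or polluted, and only with a clean docket."""
    return (forced or looks_like_citation(current)) and is_clean_docket(candidate)


def source_type_for_level(level):
    """Derive source_type from precedent_level (one source of truth)."""
    if level.startswith("ועדת_ערר_"):
        return "appeals_committee"
    if level in ("עליון", "מנהלי"):
        return "court_ruling"
    return None  # unknown level: leave source_type as is
```

Gating on both conditions means a valid docket is never overwritten unless the caller forces it, and garbage from the LLM never reaches the identity field.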
Chaim	d95a36f310	feat(extraction): precedent metadata via Gemini Flash + scheduled drainer The /precedents metadata queue was stuck — 24 rows requested, nothing draining them — and the agentic claude CLI hit error_max_turns on what is a single structured text→JSON task (slow + flaky). Metadata extraction is bounded extraction, the wrong fit for an agentic loop. - gemini_session.py: query_json drop-in (gemini-2.5-flash, JSON mode, httpx — no new SDK dep). Reads GEMINI_API_KEY (~/.env; SoT Infisical nautilus:/external-apis/gemini). Host-side only — no LLM from the container. - precedent_metadata_extractor: claude_session.query_json → gemini_session. Validated live: rich, accurate fields (case_name/summary/appeal_subtype/tags). - process_pending_extractions: kind-aware cooldown — metadata 2s (Gemini, fast), halacha keeps 30s (Claude rate limits). - drain_metadata_queue.py + legal-metadata-drain.config.cjs (pm2 cron */15) so the queue never clogs again. SCRIPTS.md. - X8 INV-FP5 updated: per-task engine choice (Gemini=bounded metadata, claude_session=agentic halacha), both host-side, single canonical queue (G2). Agentic/voice-sensitive work (writing, analysis, halacha) stays on claude_session (Daphna's subscription). Gemini cost ≈ $0.10/1M tokens — negligible. Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>	2026-06-08 05:13:49 +00:00
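The kind-aware cooldown in the commit above (2s for metadata, 30s for halacha) amounts to a per-kind pause between calls. This is a sketch only. COOLDOWN_SECONDS and drain are hypothetical names, and the sleep function is injectable so the loop can be exercised without waiting.

```python
import time

# Values from the commit message; the names here are hypothetical.
COOLDOWN_SECONDS = {"metadata": 2.0, "halacha": 30.0}


def drain(queue, kind, handle, sleep=time.sleep):
    """Process every queued item of one kind, pausing between calls."""
    done = 0
    for item in queue:
        handle(item)
        done += 1
        if done < len(queue):  # no pause after the last item
            sleep(COOLDOWN_SECONDS[kind])
    return done
```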
Chaim	6dbcb7e798	feat(ingest): recompute searchable on ingest + metadata completion (GAP-13, FU-2a) Wire db.recompute_searchable into the ingest pipeline (after statuses are set) and into extract_and_apply (after fields are persisted to DB, success path only). Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>	2026-05-30 20:47:51 +00:00
Chaim	cbc7a1e336	feat(precedents): formal citation per Israeli citation rules + copy/edit UI Until now, "case_number" was the only stored identifier for a precedent. But a citation per the Israeli unified citation rules is a different beast: it has bold parties, a non-bold prefix (court abbrev + panel/district parenthetical + case number), and a non-bold trailing reporter (נבו / פ"ד...). Without storing it as a first-class field we couldn't give the chair a one-click "copy as citation" experience for pasting into decisions. Changes: - Schema V19: case_law.citation_formatted TEXT (Markdown; parties wrapped in … so the copy helper can render <strong> for Word/Docs paste and keep the plain-text fallback meaningful). - Metadata extractor: composes citation_formatted from the document text per the unified citation rules, with worked examples for ע"א / עת"מ / ערר / בל"מ in the prompt. Refuses to store half-formed strings. - PATCH /api/precedent-library/{id} accepts citation_formatted so the chair can correct LLM mistakes. - /precedents/[id]: dedicated "מראה מקום" (citation) block with bold rendering, a copy-to-clipboard button (text/html + text/plain so Word keeps the bolds), and an inline edit textarea. - /precedents list rows: the link displays the formatted citation when available, with a small inline copy button, falling back to the bare case_number for older rows. Backfill of existing rows happens by re-stamping the extraction queue once V19 has rolled out and the new field is reachable. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>	2026-05-27 07:14:34 +00:00
Chaim	a02a4e3a64	feat(precedents): minimum-effort upload: file+citation, rest auto-extracted The missing-precedents drawer + general precedent upload both required the user to type chair_name, district, practice_area, court, date etc. upfront. Those fields can be (and, post-upload, already are) extracted from the document text by the LLM. The metadata-extraction wakeup also only fired for the /precedent-library/upload path, leaving missing-precedents committee uploads stuck with whatever stub the user typed. Changes: - Extractor learns chair_name + district and overwrites the new PLACEHOLDER_PENDING_EXTRACTION sentinel for internal_committee rows (the DB CHECK forces non-empty, so we stamp the placeholder at insert). - missing_precedent_upload no longer 400s on missing chair/district. It infers district from the citation when possible, falls back to the placeholder, and always fires pc_wake_for_precedent_extraction so the LLM can fill in the rest. - Both upload sheets default to file (+ citation) only; every other field is tucked into a closed <details> labeled, in Hebrew, "Optional: manual override of fields that will be auto-extracted". Required validators on chair/district/practice_area dropped; the LLM fills them. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>	2026-05-26 14:43:25 +00:00
Chaim	d359ab9884	feat(proceeding-type): explicit ערר/בל"מ field for cases + corpus Same case_number can exist as both a regular appeal (ערר) and an extension-of-time request (בל"מ), and we were inferring the difference from appeal_subtype prefixes, which was fragile, and case-number lookups weren't disambiguated. Now stored as a first-class field on both case_law (corpus) and cases (live cases), with partial unique indexes on (case_number, proceeding_type). - SCHEMA_V15: column + CHECK constraints + backfill from appeal_subtype LIKE 'extension_request_%' + partial unique indexes replace the old global UNIQUE(case_number). - derive_proceeding_type() centralizes the inference rule (extension_request_* → בל"מ; subject regex fallback; default ערר). - Metadata extractor prompt asks Claude to populate the new field explicitly; apply_to_record writes it for internal_committee rows. - internal_decision_upload, case_create, case_update accept an optional proceeding_type; FastAPI request models expose it. - Wizard + edit dialog get a sided Select; case header renders the resolved label (ערר / בל"מ). - Uploaded the 2 staged בל"מ decisions on betterment levy: 8126/24 (סופר נוח, 13 chunks), 8047/23 (הרנון, 48 chunks). Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>	2026-05-26 09:17:33 +00:00
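The inference order in the commit above (extension_request_* prefix, then a subject regex, then default ערר) can be sketched as follows. The commit does not give the subject regex, so the pattern below, which matches בל"מ or הארכת מועד ("extension of time"), is a hypothetical stand-in, and this derive_proceeding_type is an illustration rather than the repo's function.

```python
import re

# Hypothetical subject pattern; the commit only says "subject regex fallback".
_EXTENSION_SUBJECT_RE = re.compile(r'בל"מ|הארכת מועד')


def derive_proceeding_type(appeal_subtype, subject=""):
    """Return 'בל"מ' for extension-of-time requests, otherwise 'ערר'."""
    if (appeal_subtype or "").startswith("extension_request_"):
        return 'בל"מ'
    if _EXTENSION_SUBJECT_RE.search(subject or ""):
        return 'בל"מ'
    return "ערר"  # default: a regular appeal
```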
Chaim	afcc4818a4	fix(precedent-library): allow re-extraction for internal_committee rows The "חלץ מטא-דאטה" ("Extract metadata") / "חלץ הלכות" ("Extract halachot") buttons in the UI were returning 404 for any precedent with `source_kind != 'external_upload'`. The original restriction was meant to keep LLM extraction off internal-committee imports, whose metadata supposedly came from the case file system. But the same precedent rows can still need re-extraction when ingest produces broken data, e.g. the corrupted `subject_tags` value `['[','"','ה','י',...]` that motivated this change (an early ingest stored a JSON literal into a TEXT[] column, which Postgres split into single chars). Two changes here: 1. db.request_metadata_extraction / request_halacha_extraction: drop the `AND source_kind='external_upload'` filter. The extractor already preserves user values (only fills empty fields), so this is safe. 2. precedent_metadata_extractor.extract_and_apply: detect the character-by-character corruption above and treat it as empty so the freshly-extracted tags actually replace the broken ones. Heuristic: 3+ elements where every element is at most 2 chars (legitimate tags are multi-character Hebrew words). Coolify deploy required for the FastAPI container to pick this up.	2026-05-06 19:44:13 +00:00
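The corruption heuristic stated in the commit above (3+ elements, each at most 2 characters) is small enough to show directly. The function names are hypothetical. list('["היטל"]') reproduces the char-by-char split the message describes.

```python
def is_char_split_corruption(tags):
    """3+ elements, each at most 2 chars: a JSON literal split into characters."""
    tags = tags or []
    return len(tags) >= 3 and all(len(t) <= 2 for t in tags)


def effective_tags(existing):
    """Treat corrupted tags as empty so freshly extracted tags can replace them."""
    return [] if is_char_split_corruption(existing) else list(existing or [])
```

Treating the corrupted value as empty lets the existing fill-only-empty merge policy overwrite it, with no special case in the merge itself.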
Chaim	69d4827f33	feat(migration): enrich internal committee entries (fix case_number + metadata + halachot) - precedent_metadata_extractor: add case_number_clean extraction field - apply_to_record: overwrite_case_number param for one-time migration - internal_decisions: enrich_migrated_entries(): runs metadata, then queues halachot - server: expose as internal_decision_enrich MCP tool Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>	2026-05-04 18:59:20 +00:00
Chaim	8e1384b897	fix(precedents): wrap citation column + extractor fills source_type Two follow-ups after running the metadata extractor on 403-17: 1. Library table: shadcn TableCell defaults to whitespace-nowrap and the table wrapper has overflow-x-auto, so the long citation forced a horizontal scrollbar inside the row. Override on the citation cell only: whitespace-normal + break-words + min/max-w to keep the column readable. Same for the case-name cell. Row aligns to top so wrapping doesn't push neighbours up. 2. Extractor now also fills source_type (court_ruling / appeals_committee). The previous round added decision_date_iso, precedent_level, and court but left source_type empty. Same closed-enum + merge-only-if-empty policy. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>	2026-05-03 12:28:35 +00:00
Chaim	6420fe4b0b	feat(precedents): metadata extractor also fills date, level, court The first end-to-end run on 403-17 surfaced three fields the auto-fill left blank because the chair didn't set them in the upload form: date, precedent_level, and court. All three are right there in the ruling's header text, so there's no reason to require manual entry. Prompt now asks for: - decision_date_iso (YYYY-MM-DD parsed from signatures in the style of "ניתנה היום, … 5 בספטמבר 2022", i.e. "given today, … 5 September 2022") - precedent_level (closed enum: עליון/מנהלי/ועדת_ערר_ארצית/ועדת_ערר_מחוזית) - court (the full court name from the title block) Validation is unchanged: precedent_level only accepts the four enum values; decision_date_iso is parsed into a Python date object before being handed to update_case_law (asyncpg doesn't coerce strings to DATE columns); court is stored verbatim. Merge policy is unchanged: only empty fields are filled, so anything the chair typed in the upload form survives. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>	2026-05-03 12:16:03 +00:00
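A sketch of the validation and merge rules described in the commit above: a closed enum for precedent_level, an ISO date parsed into datetime.date (asyncpg will not coerce a string into a DATE column), court kept verbatim, and fill-only-if-empty merging. The output key "date" and the function names validate and merge_if_empty are assumptions.

```python
from datetime import date

PRECEDENT_LEVELS = {"עליון", "מנהלי", "ועדת_ערר_ארצית", "ועדת_ערר_מחוזית"}


def validate(extracted):
    """Keep only values that pass the rules; silently drop the rest."""
    out = {}
    level = extracted.get("precedent_level")
    if level in PRECEDENT_LEVELS:
        out["precedent_level"] = level
    iso = extracted.get("decision_date_iso")
    if iso:
        try:
            out["date"] = date.fromisoformat(iso)  # a date object, not a str
        except ValueError:
            pass  # not ISO: leave the field empty rather than guess
    if extracted.get("court"):
        out["court"] = extracted["court"]  # stored verbatim
    return out


def merge_if_empty(current, suggested):
    """Return only the suggestions for fields the chair left empty."""
    return {k: v for k, v in suggested.items() if not current.get(k)}
```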
Chaim	5d836ca414	fix(precedents): Anthropic SDK fallback, format() crash, UI refresh Three fixes to the precedent library after the first end-to-end test on 403-17 surfaced runtime issues: 1. Anthropic SDK fallback in claude_session. The legal-ai Docker container does not ship the `claude` CLI, so every halacha and metadata extraction was failing with "Claude CLI not found." Module now tries the CLI first (zero-cost local path) and falls back to the Anthropic SDK with ANTHROPIC_API_KEY when the binary is absent. Default model is claude-sonnet-4-6, overridable via CLAUDE_SDK_MODEL env. The system message gets cache_control: ephemeral so multi-chunk runs reuse the cached instruction prefix at ~10% read cost. Adds `anthropic` to pyproject deps. 2. precedent_metadata_extractor crashed with KeyError because the JSON example inside the prompt template contained literal { } characters that str.format() interpreted as placeholders. Switched to f-string concatenation; the prompt template no longer needs format() at all. 3. Library list query stays stale after upload because the upload mutation's onSuccess fires when the POST returns task_id, not when SSE reports completion. Added a second invalidate inside the SSE watcher in PrecedentUploadSheet so the new row appears with up-to-date chunk and halachot counts the moment processing finishes. Halacha and metadata extractors now route the long static prompt through the new `system=` parameter so the SDK path actually caches it; the CLI path concatenates and behaves as before. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>	2026-05-03 10:52:31 +00:00
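The format() crash in fix 2 of the commit above is easy to reproduce. str.format() treats the JSON example's braces as a replacement field named "case_name" and raises KeyError. Building the prompt without format() avoids this. The template text and function names below are illustrative, not the extractor's real prompt.

```python
# Illustrative template with a literal JSON example, as in the bug.
TEMPLATE = 'Return JSON like {"case_name": "..."} for:\n{text}'


def build_prompt_with_format(text):
    return TEMPLATE.format(text=text)  # raises KeyError: braces read as a field


def build_prompt_concat(text):
    # No format() on the template, so the example's braces stay literal.
    example = '{"case_name": "..."}'
    return f"Return JSON like {example} for:\n" + text
```

Doubling the braces ({{ and }}) inside the template would also work with format(). Concatenation has the advantage that the prompt can later contain any JSON without escaping.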
Chaim	73a79ea7e8	feat(precedents): metadata auto-fill, edit sheet, persuasive extraction Three improvements to the precedent library based on usage feedback: 1. Auto-fill metadata at upload time. New service precedent_metadata_extractor reads the ruling's full_text and suggests case_name (short), summary, headnote, key_quote, subject_tags, appeal_subtype. The merge policy fills only empty fields, preserving everything the chair typed in the upload form. Wired into the ingest pipeline; also exposed as a re-run endpoint POST /api/precedent-library/{id}/extract-metadata for existing records. 2. Edit sheet in the UI. Pencil icon on each library row opens a pre-populated form covering every field. A Sparkles button on the sheet runs the metadata extractor on demand and refreshes the form. The case_number is read-only because halachot are FK'd to it; renaming requires delete + re-upload. 3. Halacha extractor branches on is_binding. Sources marked binding (Supreme/Administrative) keep the strict halacha prompt. Non-binding sources (other appeals committees, district courts on planning matters) get a different prompt that extracts applications, interpretive principles, and persuasive conclusions, labeled with new rule_types 'application' and 'persuasive'. The fallback also widens chunk selection: if the chunker labeled nothing as legal_analysis/ruling/conclusion, we now run on all chunks rather than returning zero halachot for a usable ruling. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>	2026-05-03 10:19:35 +00:00
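The binding/non-binding branch and the widened chunk selection in the commit above can be sketched as two small helpers. The chunk key "section_type", the prompt labels and both function names are hypothetical; the three analysis labels come from the message.

```python
ANALYSIS_SECTIONS = {"legal_analysis", "ruling", "conclusion"}


def select_chunks(chunks):
    """Prefer analysis chunks; fall back to all chunks if none were labeled."""
    picked = [c for c in chunks if c.get("section_type") in ANALYSIS_SECTIONS]
    return picked or list(chunks)


def prompt_kind(is_binding):
    """Binding sources get the strict halacha prompt; others the persuasive one."""
    return "strict_halacha" if is_binding else "persuasive"
```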

16 Commits