legal-ai

Author	SHA1	Message	Date
Chaim	2e20e27e17	feat(style-acq T1-T3): Dafna exemplar corpus for the writer (style_exemplars) Fills the exemplar channel (B) of the style-acquisition system: at writing time the writer retrieves real block paragraphs by Dafna, targeted by section+outcome+practice_area. T1 (infrastructure + backfill): - SCHEMA_V27: style_exemplars table (purpose-built, with no fake cases in the decision_paragraphs chain). decision_number/source/section/outcome/practice_area+embedding. - db: insert/delete/search_style_exemplars + count_style_exemplars. - scripts/backfill_style_exemplars.py: splits Dafna's corpus (style_corpus + internal_committee) into sections→paragraphs, embeds, stores. Idempotent, dry-run/apply. T2 (targeted retrieval): - search_style_exemplars(section, outcome, practice_area): section=hard filter, outcome/practice_area=soft. block_writer._build_precedents_context maps block→section and retrieves (primary), alongside the legacy path (supplementary). T3 (contrastive/adapt): - exemplars are tagged "structure/voice only: adapt, do not copy content"; full paragraph (1100 characters). INV-LRN5 (purity: style only). G11. backfill --apply is run separately. Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>	2026-06-06 18:10:01 +00:00
Chaim	701efab726	feat(mcp): FU-14 GAP-51: unify the case-outcome vocabulary (set_outcome SSoT) Chair decision: canonical = 3 real outcomes (rejection/partial_acceptance/full_acceptance); betterment_levy stops being an "outcome" and becomes an override keyed by practice_area. + Principle "English in the DB, Hebrew in the UI": a single SSoT label map. lessons.py: - VALID_OUTCOMES = 3 (betterment_levy removed). - OUTCOME_LABELS_HE (SSoT for display) + LEGACY_OUTCOME_MAP + canonical_outcome(). - PRACTICE_AREA_OVERRIDES["betterment_levy"] centralizes all the guidance that used to be keyed as an outcome (golden_ratios/opening/summary/discussion/template). - get_lessons_for_outcome(outcome, practice_area) + format_ratios_comment(..., practice_area) apply the override and normalize legacy values. block_writer.py: canonical STRUCTURE_GUIDANCE + label from OUTCOME_LABELS_HE + betterment override. workflow.set_outcome: canonical 3 + lenient legacy mapping; label from the SSoT. drafting.py: golden-ratios table + get_decision_template are practice_area-aware (override). web-ui case.ts: removed betterment_levy from expectedOutcomes (it is a practice_area). server.py: canonical docstrings. Migration: migrate_gap51_outcomes.py normalized 9 rows (rejected→rejection etc.), backup in data/audit/. The code canonicalizes on read ⇒ backward-compatible even without the migration. Verified: py_compile (5 files) + offline unit tests (override/legacy/labels) + DB verification. Updated X9 §3 + gap-audit (GAP-51 ✅). Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>	2026-06-06 15:34:49 +00:00
Chaim	7f4e036211	feat(spec): connect the system spec to the interactive writing track (3-layer enforcement) The spec (docs/spec/, G1–G11) was wired to the Paperclip agents through INV-AG1, but not to the track where most code is actually written: the interactive Claude Code session. Closes the gap before cycle 2 (FU-9..15), which is entirely code writing. Three enforcement layers: 1. Documentation: CLAUDE.md §"פרוטוקול כתיבת-קוד" ("code-writing protocol") + docs/spec in the reference table 2. hook: scripts/spec-guard.sh (PreToolUse on Edit/Write/MultiEdit, registered in .claude/settings.json) reminds once per session whenever a code file is touched; non-blocking 3. PR: .gitea/PULL_REQUEST_TEMPLATE.md with a mandatory "Invariants" section This is the interactive counterpart of INV-AG1, which is already enforced on the agents (HEARTBEAT §"קריאת-ספ", "reading the spec"). Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>	2026-06-06 13:28:15 +00:00
Chaim	434341cc29	chore(#57): re-chunk+re-embed legacy precedents (pre-#55 chunker remediation) Adds scripts/rechunk_legacy_precedents.py: selects every case_law with a tiny chunk (content<50 — the pre-fix chunker fingerprint) and runs ingest.reindex_case_law (re-chunk+re-embed from stored full_text only, no re-OCR/LLM, idempotent). Batch-idempotent (re-queries the affected set). Run result (2026-06-03): 73 precedents reindexed, 0 failed. Tiny chunks 483 -> 4 (99.2%); total precedent_chunks 5019 -> 3115 (fragments merged). Search verified healthy (substantial coherent passages, no errors). The 4 residual tiny chunks are isolated section headings ('דיון', 'טענות המשיבים', ...) emitted by the CURRENT (fixed) chunker — not legacy fragments — and are already filtered at query time (>=50, #55). Minor chunker edge case, candidate #55 follow-up. The DB chunk migration is already applied to prod; this commit is the script + SCRIPTS.md entry only (no app code change, no deploy needed). Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>	2026-06-03 07:55:42 +00:00
Chaim	887079535c	feat(spec): X11 citation-corroboration + INV-G10 amendment + Opus 4.8 halacha extraction New spec for an internal citator layer: validating halachot by cumulative judicial treatment (incoming citations), to reduce the scope of the chair's manual approval: - docs/spec/X11-citation-corroboration.md: 6 invariants (INV-COR1–COR6), each with ≥3 professional sources (Shepard's/KeyCite, Hellyer LLJ 2018, UNC Law, NCSC/JTC, CEPEJ). - docs/spec/00-constitution.md: controlled amendment to INV-G10: the gate is satisfied by cumulative judicial treatment for the positive subset; the chair gate remains mandatory for the tail and for the negative subset. + X11 added to the index. - Opus 4.8 @ xhigh as the halacha-extraction model (config HALACHA_EXTRACT_MODEL/EFFORT, env-tunable; claude_session model/effort params; halacha_extractor wired in). Based on the 2026-05-31 A/B: less over-extraction, 100% quote-verified, calibrated confidence. - scripts/ab_halacha_opus48.py: non-destructive A/B harness for comparing model/effort in halacha extraction. - .taskmaster #70 (FU-2c-b): documents the Shefer (שפר) dedup + corpus scan (0 stuck stubs remain). The prerequisite (clean identity) is complete: Shefer was merged into a canonical record + 128 records scanned. Audit findings are visible in X11 §7: halacha↔citation linking + treatment classification = greenfield, left to the implementation plan. Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>	2026-05-31 18:42:13 +00:00
Chaim	6ff2e36bf9	feat(eval): FU-5 — retrieval eval harness + halacha backlog visibility (#63 ) Covers GAP-11 (INV-RET4/G8) and GAP-14 (INV-QA1/G10). Retrieval quality was never measured (only telemetry observation) and the halacha review backlog was invisible (the 10/19 gap was found by accident). Unit B — backlog visibility (pure code, container): - metrics.halacha_backlog(conn) → {pending_review, approved, rejected, published, total, oldest_pending_at}; surfaced in metrics.get_dashboard() (get_metrics MCP tool) and /api/system/diagnostics. Live count revealed 178 pending / 1552 total, oldest from 2026-05-03 — previously invisible. Unit A — retrieval eval harness (host-side scripts): - scripts/eval_gold_bootstrap.py — seeds data/eval/gold-set.jsonl. Two sources: citations (cited==relevant via search_relevance_feedback — empty until decisions cite precedents) and known_item (query=case_name → relevant=self; a real citation-free signal, the methodology #52 checked by hand). Idempotent; preserves source='chair' rows. - scripts/eval_retrieval.py — runs the production retrieval path (search_library / search_internal) over the gold-set; computes precision@k, recall@k, MRR, nDCG@k (k=5,10); aggregates overall + per-corpus + per-practice_area; writes a report and a delta vs committed baseline.json (which records the retrieval_config it reflects). --self-test unit-checks the metric math offline. Gold-set strategy = hybrid (chair decision): bootstrap + chair review. The citation source is empty today (0 cited precedents in decisions), so the seed is known-item (77 queries: 54 internal_decisions + 23 precedent_library). The gold-set is PROVISIONAL until Dafna reviews it (the domain chair-gate). Baseline (production config: multimodal+rerank on): R@10=0.987, MRR=0.837, nDCG@10=0.872. Finding: MULTIMODAL_ENABLED=true slightly lowers known-item recall (image-page results displace exact name matches) — relevant to #15. 
precedent_library weaker than internal (R@10 0.957 vs 1.0) — one external precedent unfindable by name. "CI gate" realized as discipline (re-runnable harness + committed baseline + run before/after any retrieval-layer change) — retrieval needs prod DB + Voyage, no CI runner has that access. Spec: docs/superpowers/specs/2026-05-31-fu5-eval-harness-design.md Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>	2026-05-31 14:58:13 +00:00
Chaim	4fce9d503f	feat(migration): FU-2c — reconcile external case_law identifiers (GAP-08, #68 ) External court precedents stored the full citation (designator + docket + parties + Nevo date) inside case_number, violating INV-ID2/G1 (citation as identifier). Chair decision 2026-05-31 (Option A): canonical external case_number = proceeding-designator + docket, '/' preserved (court convention, not X1's '/'→'-'); parties/court/date → citation_formatted. scripts/fu2c_reconcile_external_case_numbers.py — deterministic dry-run → chair-review → apply, mirroring FU-2b: - extracts designator+docket; flags split into BLOCKING (MISMATCH / CIT_NO_DOCKET / DESIG_MISMATCH / DUP_CHECK / NO_DOCKET) vs ADVISORY (NO_CITATION — case_number fix still deterministic, missing citation is a separate gap), so advisory rows apply while uncertain identity does not. - --overrides CSV (id,proposed_canonical,citation_formatted,reason) for audited chair adjudication of blocking rows. - apply scoped to source_kind='external_upload' (task target) while keeping cited_only/nevo_seed in the reconciliation VIEW so DUP_CHECK spans the full external unique space; pre-flight collision guard before every UPDATE. Applied to production 2026-05-31: 21 case_number normalized + 3 citation_formatted reconciled (D = consolidated Supreme Court judgment לויתן/קלמנוביץ → lead docket 25226-04-25; 2×C empty citations composed from metadata). אהוד שפר עע"מ 317/10 deferred — cross-source duplicate with an existing cited_only reference (collision guard held; → #70). 49 cited_only records out of scope → new task #70 (committee-form NNNN-NN dockets the extractor misses, dedup, unresolvable "ערר אדלר"). Extraction + gating verified offline on all 24 records. Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>	2026-05-31 14:12:45 +00:00
Chaim	e5b34e01dc	docs(scripts): note sync --verify drift-gate semantics (FU-8a)	2026-05-31 11:36:06 +00:00
Chaim	8477fd87e7	docs(scripts): register fu2b reconciliation script (FU-2b)	2026-05-31 08:58:32 +00:00
Chaim	58ab003206	fix(retrieval): make decisions findable by name + unhide committee uploads Root cause of "agent can't find the Agasi decision in the corpus" (CMPA-55): the decision was fully ingested, but the retrieval layer failed on the realistic agent query — searching by case name. - RC-A (#52): lexical tsvector covered only chunk content + halacha text, so a bare-name query ("אגסי") matched decisions that cite the case, not the case itself. Add meta_tsv on case_law(case_name, case_number) (SCHEMA V20) and OR it into the lexical halacha/chunk SQL with a match boost, so a name/number hit surfaces the case's own rows. Agasi: rank 4 → rank 1. - RC-B (#53): precedent_library_list hard-defaulted source_kind=external_upload and never exposed the param, hiding uploaded ערר/בל"מ (internal_committee) decisions. Thread source_kind through service → tool → MCP tool (supports 'internal_committee' / 'all_committees'). - #54: agent instructions (researcher/analyst/writer) — search-by-name protocol: add content/case-number, search both corpora, use all_committees before declaring "not in corpus". - #55: chunker produced tiny fragment chunks ("דיון", "החלטה") from header keywords matched mid-sentence. Anchor SECTION_PATTERNS to line start + merge sub-min sections; exclude <50-char fragments at query time (484 existing fragments hidden; full re-chunk tracked as #57). Tests: scripts/test_retrieval_by_name.py (name ranks case above citer + substantive regressions); chunker unit checks (0 tiny chunks). New findings filed as tasks #56 (halacha source_kind leak) and #57 (re-chunk migration). Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>	2026-05-30 11:26:19 +00:00
Chaim	bb0cd7c6a2	feat(training): Style Studio — upload, rich corpus, lessons, curator portrait, chat Six-phase upgrade of /training from a read-only dashboard into a full Style Studio for managing Daphna's style corpus. - Upload Sheet on /training: file → proofread preview → commit (no more CLI-only `upload-training` skill). - Rich corpus metadata: GET /api/training/corpus returns summary, outcome, key_principles, page_count, parties (regex), legal_citation, lessons_count. PATCH endpoint for chair edits. CorpusDetailDrawer with 4 tabs (details/content/lessons/patterns) replaces the bare table row. - LLM metadata enrichment: style_metadata_extractor + MCP tools (style_corpus_enrich, style_corpus_pending_enrichment) fill summary/outcome/key_principles via claude_session (free, host-side). - Per-decision lessons: new decision_lessons table + 4 REST endpoints + LessonsTab in drawer; hermes-curator now auto-posts findings as decision_lessons(source=curator). - Curator Portrait tab: prompt rendered with link to Gitea, recent curator findings, style_analyzer training prompts, propose-change form that writes proposals to data/curator-proposals/ for manual chair review (no auto-mutation of the agent file). - Style chat tab: SSE-streamed conversations with the style agent. New host-side pm2 service (legal-chat-service, port 8770) wraps claude CLI with stream-json + --resume continuation; FastAPI proxies via host.docker.internal. Zero API cost — uses chaim's claude.ai subscription. chat_conversations + chat_messages persist history. Architecture: keeps the existing rule that claude_session only runs on the host (not the container). The new legal-chat-service is the canonical bridge between the container and the local CLI for the chat feature; everything else (upload, metadata, lessons) stays within the container's existing capabilities. 
Audit script (scripts/audit_training_corpus.py) included for verifying which corpus rows still need enrichment. Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>	2026-05-27 10:06:22 +00:00
Chaim	2aee398b4a	feat: Stage C — RAG advanced (#33, #47, #48, #49, #50, #51) Six independent sub-tasks dispatched in parallel; aggregated here. ## #33 — Hide case_name column library-list-panel.tsx: `<TableHead>` + `<TableCell>` for "שם" get `className="hidden"` in both Court and Committee row variants. DB column preserved for future use. ## #47 — Audit script periodic New scripts/audit_corpus_integrity.py — 3 SQL checks (external+ערר prefix, internal missing chair/district, cases.practice_area enum) + CEO wakeup on violations + cron `0 7 * * `. First run: 0 issues. ## #48 — Parent-doc retrieval (gated, default off) Schema V17: precedent_chunks.parent_chunk_id + chunk_role ('child'|'parent'). New chunker.chunk_document_hierarchical() — section-aware parents (~1500 tokens) containing ~5 overlapping children (~300 tokens each). New db.store_precedent_chunks_hierarchical two-pass writer. Search SQL (semantic + lexical) LEFT-JOIN parent and swap content + dedupe by parent_chunk_id when flag on. Toggle: PARENT_DOC_RETRIEVAL_ENABLED + PARENT_DOC_{CHILD,PARENT}_SIZE_TOKENS. Backfill ~3min and ~$0.20 — deferred to follow-up. ## #49 — Multimodal backfill New scripts/backfill_multimodal_precedents.py with token-matching case_number ↔ source files (PDF + DOCX via PyMuPDF). Ran in container: 26 precedents embedded, 503 pages, $0.21, 0 errors. precedent_image_embeddings grew 3 → 29 rows. 44 remaining are style_corpus-migrated rows (no source file on disk) — will catch up when re-uploaded. ## #50 — Closed-loop feedback + nDCG Schema V18: search_logs + search_relevance_feedback. New telemetry.py with fire-and-forget log_search_bg (p50 = 0.002ms — zero overhead) + auto-infer_relevance_from_citations (reads case drafts → marks score=3 when cited precedent appears in past search top-K). Hooks added to 5 search paths. scripts/compute_ndcg.py for aggregation. 
Two admin API endpoints (GET /api/admin/rag-metrics + POST .../infer). Dashboard UI deferred — API is enough for now. ## #51 — Halacha quality monitoring New scripts/monitor_halacha_quality.py — baseline avg confidence (trusted=0.849, all=0.833, pending=0.694) with rolling window drift detection. Default 5% threshold. Exits non-zero on alert for cron integration. Recommended: `0 8 * 1` weekly Mon 8am. ## Bonus: 230 unlinked citations → missing_precedents Bulk-imported 230 distinct unlinked citations from precedent_internal_citations to missing_precedents.status='open', party='committee', with notes listing source citers. Top candidate: ע"א 3213/97 (cited 5x). Total open missing_precedents now 237. Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>	2026-05-26 11:26:52 +00:00
Chaim	ac3ed455cf	fix(cases): בל"מ badge reads proceeding_type, not just appeal_subtype After the proceeding_type field landed, users started flipping cases to בל"מ via the edit dialog. But the case-header badge + cases-table filter were still gated on isBlamSubtype(appeal_subtype), so the badge didn't appear when only the proceeding_type changed. Now the badge shows when either proceeding_type === 'בל"מ' OR appeal_subtype is an extension_request_* variant — the legacy path stays so existing rows that never got a proceeding_type still render correctly. Also regen types.ts from prod (proceeding_type now in OpenAPI schema) and register the one-shot process_pending_blam.py script. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>	2026-05-26 09:34:23 +00:00
Chaim	d359ab9884	feat(proceeding-type): explicit ערר/בל"מ field for cases + corpus Same case_number can exist as both a regular appeal (ערר) and an extension-of-time request (בל"מ), and we were inferring the difference from appeal_subtype prefixes — fragile, and case-number lookups weren't disambiguated. Now stored as a first-class field on both case_law (corpus) and cases (live cases), with partial unique indexes on (case_number, proceeding_type). - SCHEMA_V15: column + CHECK constraints + backfill from appeal_subtype LIKE 'extension_request_%' + partial unique indexes replace the old global UNIQUE(case_number). - derive_proceeding_type() centralizes the inference rule (extension_request_* → בל"מ; subject regex fallback; default ערר). - Metadata extractor prompt asks Claude to populate the new field explicitly; apply_to_record writes it for internal_committee rows. - internal_decision_upload, case_create, case_update accept an optional proceeding_type; FastAPI request models expose it. - Wizard + edit dialog get a sided Select; case header renders the resolved label (ערר / בל"מ). - Uploaded the 2 staged בל"מ decisions on betterment levy: 8126/24 (סופר נוח, 13 chunks), 8047/23 (הרנון, 48 chunks). Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>	2026-05-26 09:17:33 +00:00
Chaim	f3cc9ca9d4	feat: Stage A finalizers + #35/#36/#37 — critical-gap closure Four parallel sub-agents closed the remaining critical gaps from the 26/05 Stage A/B sprint. Each block independently tested; aggregated here. ## #30/#31 finalizers (sub-agent A) * Auto-derive practice_area in case_create from case_number prefix (1xxx→rishuy_uvniya, 8xxx→betterment_levy, 9xxx→compensation_197); default for CaseCreateRequest is now "" (the DB constraint catches any stray "appeals_committee"). * practice_area.py: derive_subtype now handles axis-B domain values (rishuy_uvniya/betterment_levy/compensation_197) without parsing the case number; new helper derive_domain_practice_area(). * Halacha re-extraction verified unnecessary — all 6 reclassified records already had is_binding=false and approved halachot. * Regression tests: 6 cases in tests/test_corpus_constraints.py covering practice_area enum, internal-committee chair/district, external-upload arar prefix, MCP guard. * UI: district input → Select dropdown (7 districts) in precedent-edit-sheet.tsx, preserving legacy free-text values. ## #37 בל"מ subtypes (sub-agent B) * 3 new appeal_subtypes: extension_request_{building_permit, betterment_levy,compensation}. APPEALS_COMMITTEE_SUBTYPES extended, SUBTYPES_BY_AREA mappings added. * New helpers: is_blam_subject(), is_blam_subtype(), derive_subtype_with_blam(case_number, subject, practice_area). case_create now uses it to auto-detect "בקשה להארכת מועד" subjects. * 3 methodology templates under docs/methodology/extension-request-.md. paperclip_client.py mapping updated for the 3 new subtypes (extension_request_building_permit→CMP, the other two→CMPA). * Frontend: bilingual "בל"מ" badge + filter dropdown on cases list + detail header; appeal-type-bars collapseBlam() merges בל"מ into its parent domain for aggregate bars. * Wizard auto-detects בל"מ from subject during case creation. 
* 3 Berlinger cases (1017/1018/1019-03-26) migrated to appeal_subtype=extension_request_building_permit via psql. ## #35 missing_precedents feature (sub-agent C) * Schema V13: missing_precedents table (citation, case_id, party, legal_topic, status, linked_case_law_id, claim_quote, ...) + FK constraints + 3 indexes. Applied via psql + idempotent migration. * 6 db.py service functions, 3 MCP tools, 6 FastAPI endpoints (POST/GET/PATCH/DELETE/upload — upload routes by citation prefix to ingest_internal_decision or ingest_precedent). * Next.js page /missing-precedents with 5 status tabs + filters + sidebar badge counter + detail drawer with metadata edit + smart upload form that switches fields per committee/court. * Bootstrap: 7 rows imported from the JSON file (3 citations × cases, all status=closed with linked_case_law_id). * legal-researcher.md: new §2ב.5 with missing_precedent_create usage + dedup semantics + tool grant. ## #36 legal_arguments aggregation (sub-agent D) * Schema V14: legal_arguments + legal_argument_propositions M:M. Applied via psql. * New service argument_aggregator.py with two functions — aggregate_claims_to_arguments() (Claude CLI / claude_session) and get_legal_arguments(). Graceful llm_unavailable handling when CLI is missing (containers). * 2 MCP tools + 2 API endpoints (POST .../aggregate-arguments as BackgroundTask, GET .../legal-arguments). * Frontend: shadcn Accordion + new legal-arguments-panel.tsx with hierarchical (party → priority badge → arguments) display, "טיעונים" tab on the case page, "חשב/חשב מחדש" buttons. * scripts/backfill_legal_arguments.py + SCRIPTS.md entry — dry-run found 8 candidate cases including 1017/1018/1019. ## Open follow-ups (intentionally deferred) * npm run api:types in web-ui (CLAUDE.md flow) — recommended before the next UI commit; not required for backend deployment. * Run backfill_legal_arguments.py --apply once the container picks up the new aggregator service. 
* webhook on missing-precedents upload-close to Paperclip (optional). Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>	2026-05-26 08:34:40 +00:00
Chaim	45341a0bc8	feat(curator): switch Hermes Curator to DeepSeek V4-Pro via deepseek_local adapter A/B test (2026-05-05) showed DeepSeek V4-Pro is 2-3x faster and ~20x cheaper than Sonnet for style/lexicon pattern analysis, with comparable quality. Adds adapters/deepseek-paperclip-adapter/ package, documents adapter requirements (env injection, run-id headers), updates CLAUDE.md with adapter integration notes, and records lessons from ערר 1200-25 (block order for 1xxx, "להלן מתוך" pattern, expanded factual background, bridge planning analysis, flat heading structure). Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>	2026-05-10 05:58:52 +00:00
Chaim	1b14e04373	chore(skills): remove paperclip-dev, scope converting-plans-to-tasks paperclip-dev is for maintaining the Paperclip codebase itself — not relevant to legal work. Removed from all 14 agents (was on CMPA mirror). paperclip-converting-plans-to-tasks helps decompose a plan into assigned issues. Useful for the planning-heavy agents (CEO, analyst). Now scoped to those two — removed from the other 5 in CMPA where it had crept in. Net effect: zero drift on paperclipai/* skills across all 7 master+mirror pairs. Verified via the new Agents tab dashboard. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>	2026-05-04 17:47:05 +00:00
Chaim	cf5f6fe274	feat(paperclip): close 11 integration gaps (#16-#28) Brings the legal-ai ↔ Paperclip integration in line with the official Paperclip skill. Net effect: HEARTBEAT.md -47% (370→195 lines), all 14 agents on uniform runtime_config + budget + instructionsBundleMode, and two cross-company helpers replacing manual SQL. Highlights: - HEARTBEAT.md refactor: project-specific only, delegates to the official paperclipai/paperclip skill (loaded per agent). Adds heartbeat-context fast-path (§1.7) and PAPERCLIP_WAKE_PAYLOAD_JSON shortcut (§1.5). - Issue Thread Interactions API: legal-ceo.md now uses ask_user_questions / request_confirmation / suggest_tasks instead of free-text comments — gives chair structured UI with idempotency keys. - pc.sh + paperclip_api.pc_request: every API call goes through helpers that inject Authorization + X-Paperclip-Run-Id (audit trail). - sync_agents_across_companies.py: master(CMP)→mirror(CMPA) sync via Paperclip API, idempotent, with --verify and --apply modes. - skills/new-company-setup: 11-step blueprint distilling all 11 gaps into a single onboarding runbook for the next company. - .taskmaster: 12 tasks covering each gap (one already closed: #29). Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>	2026-05-04 17:25:45 +00:00
Chaim	81ccf3a888	feat(retrieval): track page_number on text chunks for multimodal hybrid boost The legacy chunker did not track which PDF page each chunk came from. Stored chunks had page_number=NULL, which blocked the multimodal hybrid retriever's text+image boost — it joins (chunk, image) on (document_id, page_number) and the join could never fire. This change: - extractor.extract_text now returns (text, page_count, page_offsets); page_offsets[i] is the start char offset of page (i+1) in the joined text. None for non-PDFs. - chunker.chunk_document accepts an optional page_offsets and tags each chunk with the page that contains its first character (uses the existing chunker logic; pages assigned post-hoc by content search to keep the diff minimal). - processor.process_document and precedent_library.ingest_precedent forward page_offsets through the chunker. New uploads now carry accurate page_number on every chunk. - Other extract_text callers (tools/documents, tools/workflow, web/app.py) updated to unpack the third element (ignored). - scripts/backfill_chunk_pages.py: per-case retrofit. Re-extracts each PDF (re-OCRs via Google Vision if needed, ~$0.0015/page), computes page_offsets, and updates page_number on every chunk by content search. Idempotent; --force re-runs on already-tagged docs. Forward-only would leave the 419 image embeddings backfilled on cases 8174-24 + 8137-24 unable to boost their corresponding text chunks. The retrofit script closes that gap (cost ~$0.60). Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>	2026-05-03 19:49:41 +00:00
Chaim	242f668319	feat(retrieval): add voyage-multimodal-3 page-image embeddings (feature flag) Stage C: per-page image embeddings via voyage-multimodal-3 + hybrid text+image search. Off by default; enable with MULTIMODAL_ENABLED=true. - Schema V9: document_image_embeddings + precedent_image_embeddings (vector(1024), page_number, image_thumbnail_path) - extractor.render_pages_for_multimodal renders PDF pages at MULTIMODAL_DPI (144) for embedding + JPEG thumbnails at MULTIMODAL_THUMB_DPI (96) for UI preview, in one pass - embeddings.embed_images calls voyage-multimodal-3 in 50-page batches - services/hybrid_search.py orchestrator: rerank applied to text side first (rerank-2 is text-only); image side cosine; weighted merge with text_weight 0.65 (env-tunable); image-only pages surface as match_type='image' so dense scanned content still appears - processor.process_document and precedent_library.ingest_precedent gated by flag — non-fatal on multimodal failure - scripts/multimodal_backfill.py — idempotent per-case CLI to embed existing documents without re-extracting text Validated locally on a 5-page response brief: render 0.31s, embed 8.32s, hybrid merge surfaces image rows correctly. Production rollout starts with flag=false (no behavior change), then per-case A/B. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>	2026-05-03 19:24:52 +00:00
Chaim	26c3fddf41	feat(retrieval): add voyage rerank-2 cross-encoder stage (feature flag) Stage B of voyage-upgrades-plan rewritten: instead of context-3 (which 4 POCs showed inconsistent improvement), add a cross-encoder rerank layer on top of voyage-3. Default off (VOYAGE_RERANK_ENABLED=false). POC validation (785-doc corpus, 12 queries, claude-haiku-4-5 judge): - mean@3 +4.5% (4.306 → 4.500) - practical-category queries +11.6% (3.78 → 4.22) - latency +702ms per query - no schema change, no re-embed, no double storage Plumbing: - config: VOYAGE_RERANK_ENABLED / _MODEL / _FETCH_K env vars - embeddings.voyage_rerank() wraps voyageai client.rerank - services/rerank.py: maybe_rerank() helper — fetches FETCH_K candidates via the bi-encoder then reranks to top-K. Fail-open if Voyage rerank is unavailable. - tools/search.py: search_decisions, search_case_documents, find_similar_cases all wrapped - services/precedent_library.search_library wrapped Smoke-tested locally with flag on/off — produces expected behaviour and latency profile. Ready for production rollout via Coolify env flip after deploy. POCs (kept under scripts/ for reference): - voyage_context3_poc{_long}.py — context-3 evaluation (rejected) - voyage_multimodal_poc.py — multimodal-3 (stage C, deferred) - voyage_rerank_judge_poc.py — single-case rerank benchmark - voyage_rerank_corpus_poc.py — full-corpus rerank validation Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>	2026-05-03 18:43:41 +00:00
Chaim	da0a385d9c	docs: register reembed_voyage.py in SCRIPTS.md	2026-05-03 16:44:07 +00:00
Chaim	28f49defff	LLM session: async, 30min timeout, semantic chunking + parallel The claude_session bridge had two structural defects that made any non-trivial document extraction unreliable: 1. subprocess.run() blocks the asyncio event loop in the MCP server for the full duration of every LLM call (60-180s typical). 2. The 120-second timeout was below the cold-cache cost of any document over ~12K Hebrew characters. Three back-to-back timeouts on case 8174-24 dropped 43 appellant claims on the floor. Phase 1 of the remediation plan — keeps claude_session as the engine (no Anthropic API switch) and restructures around it: claude_session.py • query / query_json are now async — asyncio.create_subprocess_exec instead of subprocess.run, so MCP server can serve other coroutines while a call is in flight. • DEFAULT_TIMEOUT 120 → 1800 (30 min). High enough that no realistic document hits it; bounded so a runaway never zombifies forever. • LONG_TIMEOUT 300 → 3600 for opus block writing on full case context. • TimeoutError now actually kills the subprocess (asyncio.wait_for cancellation alone leaves the child running). claims_extractor.py • _split_by_sections: chunks at numbered sections / Hebrew letter headings / "פרק" markers / markdown ##, falls back to paragraph breaks, then to hard splits. Targets 12K chars per chunk — small enough that each chunk reliably finishes inside the timeout. • _extract_chunk: per-chunk retry (1 attempt by default) with structured logging on failure. Failed chunks no longer crash the overall extraction; they're skipped with a partial-result warning. • extract_claims_with_ai now runs chunks in parallel via asyncio.gather bounded by a semaphore (CHUNK_CONCURRENCY=3). For a 25K-char appeal: was sequential 150-300s, now ~70-90s. 
Updated all 9 callers (claims, appraiser facts, block writer, qa validator, brainstorm, learning loop, style analyzer × 3) to await the now-async API. The one-shot scripts/extract_claims_8174.py used to recover 43 appellant claims on case 8174-24 has been moved to .archive/ — phase 1 makes it obsolete. SCRIPTS.md updated. Phase 2 (background-task wrapper around LLM-bound MCP tools, persistent llm_tasks table, SSE progress) is the structural follow-up — separate PR. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>	2026-04-30 14:21:35 +00:00
Chaim	726498126d	Add Track Changes architecture for draft revisions (CMP + CMPA) Fixes critical bug in 1033-25: user-uploaded עריכה-*.docx files were orphaned on disk while exports kept rebuilding from stale DB blocks. New architecture: - User-uploaded DOCX becomes the source of truth (cases.active_draft_path) - System edits via XML surgery with real Word <w:ins>/<w:del> revisions - User can Accept/Reject each change from within Word Components: - docx_reviser.py: XML surgery for Track Changes (15 tests) - docx_retrofit.py: retroactive bookmark injection with Hebrew marker detection + heading heuristic (9 tests) - docx_exporter.py: emits bookmarks around each of the 12 blocks - 3 new MCP tools: apply_user_edit, list_bookmarks, revise_draft - 4 new/updated endpoints: upload (auto-registers active draft), /exports/revise, /exports/bookmarks, /exports/{filename}/retrofit, /active-draft - DB migration: cases.active_draft_path column - UI: correct banner using real v-numbers, "מקור האמת" badge, detailed upload toast with bookmarks_added/missing_blocks - agents: legal-exporter (3 export modes), legal-ceo (stage G for revision handling), legal-writer (revision mode) Multi-tenancy: - Works for both CMP (1xxx cases) and CMPA (8xxx/9xxx cases) - New revise-draft skill added to both companies - deploy-track-changes.sh syncs skills CMP ↔ CMPA - retrofit_case.py: one-off retrofit of existing files Tests: 34 passing (15 reviser + 9 retrofit + 4 exporter bookmarks + 6 e2e) Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>	2026-04-16 18:49:30 +00:00
Chaim	5c9a5d702a	Clean up scripts/: archive 17, delete 5, add SCRIPTS.md registry Active scripts (5): auto-sync-cases.sh, backup-db.sh, restore-db.sh, notify.py, bidi_table.py Archived (17): one-time migration/seeding scripts whose functionality is now in MCP server or web API. Moved to scripts/.archive/ Deleted (5): zero-value scripts (duplicates, hardcoded single-case, debug scripts) Added scripts/SCRIPTS.md — registry of all scripts with purpose, status, and what superseded them. CLAUDE.md updated with rule: any script change requires SCRIPTS.md update. Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>	2026-04-14 16:30:19 +00:00
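
Commit 6ff2e36bf9 above names precision@k, recall@k, MRR and nDCG@k as the metrics computed over the gold-set. A minimal sketch of that metric math follows, assuming binary relevance (a result is either in the relevant set or not). The function names are hypothetical and this is not the code in scripts/eval_retrieval.py.

```python
import math
from typing import Sequence, Set


def precision_at_k(ranked: Sequence[str], relevant: Set[str], k: int) -> float:
    """Fraction of the top-k results that are relevant."""
    if k <= 0:
        return 0.0
    return sum(1 for doc in ranked[:k] if doc in relevant) / k


def recall_at_k(ranked: Sequence[str], relevant: Set[str], k: int) -> float:
    """Fraction of all relevant items that appear in the top-k results."""
    if not relevant:
        return 0.0
    return sum(1 for doc in ranked[:k] if doc in relevant) / len(relevant)


def reciprocal_rank(ranked: Sequence[str], relevant: Set[str]) -> float:
    """1/rank of the first relevant result; MRR is the mean of this over queries."""
    for rank, doc in enumerate(ranked, start=1):
        if doc in relevant:
            return 1.0 / rank
    return 0.0


def ndcg_at_k(ranked: Sequence[str], relevant: Set[str], k: int) -> float:
    """Binary-gain nDCG: DCG of the ranking divided by the DCG of an ideal ranking."""
    dcg = sum(
        1.0 / math.log2(rank + 1)
        for rank, doc in enumerate(ranked[:k], start=1)
        if doc in relevant
    )
    ideal_hits = min(len(relevant), k)
    idcg = sum(1.0 / math.log2(rank + 1) for rank in range(1, ideal_hits + 1))
    return dcg / idcg if idcg else 0.0


# Known-item query: the only relevant result is the case itself, found at rank 2.
ranked = ["citing-decision", "the-case", "other-1", "other-2"]
relevant = {"the-case"}
print(recall_at_k(ranked, relevant, 10), reciprocal_rank(ranked, relevant))
```

For a known-item gold-set such as the one the commit describes, recall@k reduces to "was the case found in the top k", which is why it can sit near 1.0 while MRR and nDCG still show ranking problems.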

25 Commits
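
Commit 81ccf3a888 above defines page_offsets[i] as the start character offset of page (i+1) and tags each chunk with the page containing its first character. A small sketch of that lookup follows, assuming the chunk's start offset in the joined text is already known. The helper name page_for_offset is hypothetical, and the actual chunker is described as locating chunks by content search first.

```python
from bisect import bisect_right
from typing import Optional, Sequence


def page_for_offset(
    char_offset: int, page_offsets: Optional[Sequence[int]]
) -> Optional[int]:
    """Return the 1-based page number containing char_offset.

    page_offsets must be sorted ascending; page_offsets[i] is the start
    offset of page i+1. Returns None when no offsets exist (non-PDF input).
    """
    if not page_offsets:
        return None
    # bisect_right counts how many pages start at or before char_offset,
    # which is exactly the 1-based page index. Clamp to page 1 for safety.
    return max(1, bisect_right(page_offsets, char_offset))


# Three pages starting at offsets 0, 100 and 250 in the joined text.
offsets = [0, 100, 250]
print(page_for_offset(99, offsets), page_for_offset(100, offsets))
```

A binary search keeps the lookup at O(log pages) per chunk, and a chunk that spans a page boundary is assigned to the page where it starts, matching the rule stated in the commit message.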