legal-ai

Author	SHA1	Message	Date
Chaim	77e5996497	feat(agents): wire Hermes Knowledge Curator to CEO post-export (CMP + CMPA) All checks were successful Build & Deploy / build-and-deploy (push) Successful in 3m37s Details Adds new sub-agent "מנהל ידע" ("Knowledge Manager", hermes_local adapter) that runs after each successful export to analyze the final decision and suggest updates to skills/decision/SKILL.md and lessons. Read-only on case data, write only on a single comment per run. - legal-ceo.md: new stage F2 after F (export). Looks up curator by name in current company, creates async sub-issue, no waiting. Falls back to silent skip if no curator configured. - legal-ceo.md: agents table updated with both curator UUIDs (CMP + CMPA). - hermes-curator.md: role instructions documenting CMP/CMPA split and what the curator does/does not do. Stage 1 POC. End-to-end validated on CMP-68 (case 1130-25) with two substantive findings on style patterns. CMPA agent created with separate ~/.hermes/profiles/curator-cmpa profile (own MEMORY.md focused on היטל השבחה / פיצויים, i.e. betterment levy / compensation). Known gaps to follow up: curator does not auto-close its issue, does not auto-persist findings to MEMORY.md, comment attribution falls back to chaim's user (install-key). These are tracked separately and do not block validation. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>	2026-05-05 13:33:23 +00:00
Chaim	69d4827f33	feat(migration): enrich internal committee entries — fix case_number + metadata + halachot All checks were successful Build & Deploy / build-and-deploy (push) Successful in 1m32s Details - precedent_metadata_extractor: add case_number_clean extraction field - apply_to_record: overwrite_case_number param for one-time migration - internal_decisions: enrich_migrated_entries() — runs metadata then queues halachot - server: expose as internal_decision_enrich MCP tool Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>	2026-05-04 18:59:20 +00:00
Chaim	c0f67ab841	feat(precedents): split library into court rulings + appeals committee tables All checks were successful Build & Deploy / build-and-deploy (push) Successful in 1m34s Details - /api/precedent-library now accepts source_kind param (default external_upload) - list_external_case_law returns chair_name/district fields - LibraryListPanel renders two separate tables with appropriate columns - internal_decisions migration: added queue_halachot param to defer extraction - Fixed practice_area mapping from style_corpus (appeals_committee → proper enum) Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>	2026-05-04 18:49:32 +00:00
Chaim	92a2763b86	feat: add internal committee decisions corpus (source_kind='internal_committee') All checks were successful Build & Deploy / build-and-deploy (push) Successful in 1m31s Details Three-layer separation: style learning (style_corpus), appeals-committee decisions (internal_committee), and court rulings (external_upload). - SCHEMA_V10: chair_name + district columns on case_law and cases, partial indexes - create_internal_committee_decision() DB upsert function - search_precedent_library_semantic() now accepts source_kind/district/chair_name params - search_precedent_library_hybrid() passes through new params - services/internal_decisions.py: ingest_internal_decision, migrate_from_style_corpus, migrate_from_external_corpus (identifies rows via source_type='appeals_committee') - search_internal_decisions() MCP tool (server.py + tools/search.py) - internal_decision_migrate() MCP admin tool - Web endpoints: POST /api/internal-decisions/upload, POST /api/internal-decisions/migrate, GET /api/internal-decisions - ingest_final_version auto-ingests finalized decisions into internal corpus - SKILL.md updated: agents now search internal + external in parallel, present separately Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>	2026-05-04 18:33:39 +00:00
Chaim	1b14e04373	chore(skills): remove paperclip-dev, scope converting-plans-to-tasks All checks were successful Build & Deploy / build-and-deploy (push) Successful in 7s Details paperclip-dev is for maintaining the Paperclip codebase itself — not relevant to legal work. Removed from all 14 agents (was on CMPA mirror). paperclip-converting-plans-to-tasks helps decompose a plan into assigned issues. Useful for the planning-heavy agents (CEO, analyst). Now scoped to those two — removed from the other 5 in CMPA where it had crept in. Net effect: zero drift on paperclipai/* skills across all 7 master+mirror pairs. Verified via the new Agents tab dashboard. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>	2026-05-04 17:47:05 +00:00
Chaim	69e153b3db	fix(settings/agents): exclude noise from drift detection All checks were successful Build & Deploy / build-and-deploy (push) Successful in 32s Details Two false positives surfaced after the Agents tab went live: 1. status (running/idle/paused) is runtime state, not config — drops in and out as agents pick up issues. Removed from _DRIFT_FIELDS. 2. desiredSkills compared raw, but local/* and company/* skills carry per-company hashes/scopes by design (sync_agents_across_companies.py filters local skills with a warning). Comparing them flags every master+mirror pair that has any local skill on master. Now compares only paperclipai/* skills (vendor-shipped, must match). UI shows an inline note explaining the filter. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>	2026-05-04 17:39:17 +00:00
Chaim	702c01d678	chore(tasks): mark Task #29 done — Agents tab deployed to prod All checks were successful Build & Deploy / build-and-deploy (push) Successful in 36s Details Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>	2026-05-04 17:29:30 +00:00
Chaim	bd6a66e80d	chore(types): regenerate OpenAPI types from prod Some checks failed Build & Deploy / build-and-deploy (push) Has been cancelled Details Picks up the new GET /api/admin/paperclip-agents endpoint (Task #29) plus any other endpoint changes accumulated since the last regeneration. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>	2026-05-04 17:29:17 +00:00
Chaim	af2dc0df2a	chore(gitignore): ignore precedent-library data, .db files, .bak backups All checks were successful Build & Deploy / build-and-deploy (push) Successful in 1m36s Details After committing the Paperclip gaps refactor, the .bak-pre-* sentinels served their purpose. Add a wildcard so future similar backups won't be tracked. Also ignore data/precedent-library/ (binary PDFs, 11MB) and data/*.db (sqlite caches). Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>	2026-05-04 17:26:20 +00:00
Chaim	eab0ca906c	feat(interim): include block-he opening in pre-ruling interim drafts block-he (פתיחה ניטרלית, "neutral opening") was previously emitted only in final decisions. For interim drafts shown to the chair before ruling, including a neutral opening helps the chair confirm framing before approving downstream blocks. Skipped if empty, so legacy cases without block-he are unaffected. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>	2026-05-04 17:25:54 +00:00
Chaim	cf5f6fe274	feat(paperclip): close 11 integration gaps (#16-#28) Brings the legal-ai ↔ Paperclip integration in line with the official Paperclip skill. Net effect: HEARTBEAT.md -47% (370→195 lines), all 14 agents on uniform runtime_config + budget + instructionsBundleMode, and two cross-company helpers replacing manual SQL. Highlights: - HEARTBEAT.md refactor: project-specific only, delegates to the official paperclipai/paperclip skill (loaded per agent). Adds heartbeat-context fast-path (§1.7) and PAPERCLIP_WAKE_PAYLOAD_JSON shortcut (§1.5). - Issue Thread Interactions API: legal-ceo.md now uses ask_user_questions / request_confirmation / suggest_tasks instead of free-text comments — gives chair structured UI with idempotency keys. - pc.sh + paperclip_api.pc_request: every API call goes through helpers that inject Authorization + X-Paperclip-Run-Id (audit trail). - sync_agents_across_companies.py: master(CMP)→mirror(CMPA) sync via Paperclip API, idempotent, with --verify and --apply modes. - skills/new-company-setup: 11-step blueprint distilling all 11 gaps into a single onboarding runbook for the next company. - .taskmaster: 12 tasks covering each gap (one already closed: #29). Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>	2026-05-04 17:25:45 +00:00
Chaim	6f713042b5	feat(settings): add Agents tab — read-only Paperclip agent config view Task #29: surfaces all 14 agents (7 roles × 2 companies) in /settings as master+mirror pairs with drift detection. Replaces ad-hoc psql + script inspection with a single dashboard. Backend: GET /api/admin/paperclip-agents — fetches via Paperclip API (not direct DB), groups by name, computes drift across model/effort/ timeoutSec/maxTurnsPerRun/skills/runtime_config.heartbeat/budget/status. Frontend: new AgentsTab card-per-pair with side-by-side compare, drift highlighting, expandable details (skills list + instructions path). Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>	2026-05-04 17:23:48 +00:00
Chaim	d0994704cf	feat(agents): mirror Paperclip interactions in case page All checks were successful Build & Deploy / build-and-deploy (push) Successful in 47s Details Surface issue_thread_interactions (ask_user_questions / request_confirmation / suggest_tasks) directly inside legal-ai's case detail feed so the user can answer agent prompts without switching to Paperclip's UI. Backend (FastAPI): - paperclip_client.py: 4 new helpers — get_issue_interactions (DB), respond_to_interaction / accept_interaction / reject_interaction (REST). - app.py: extends GET /api/cases/{case_number}/agents to include `interactions`, and adds POST /api/cases/{case_number}/agents/interaction-response routing to /respond, /accept, /reject in Paperclip. - paperclip_client.py: also pulls existing httpx calls onto the centralized pc_request helper (paperclip_api.py) for consistent auth + run-id headers. Frontend (web-ui, Next.js 16 + TanStack Query): - agents.ts: Interaction / InteractionPayload / InteractionStatus types, useSubmitInteraction mutation hook (invalidates the activity query). - agent-activity-feed.tsx: InteractionCard renders radio (single) / checkbox (multi) for ask_user_questions, accept/reject + reason for request_confirmation, task selection for suggest_tasks. Resolved interactions show a read-only summary. Cards are interleaved with comments by created_at, so the feed reads chronologically. Paperclip auto-wakes the issue assignee on a successful response (queueResolvedInteractionContinuationWakeup) — no explicit wakeup needed. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>	2026-05-04 16:40:45 +00:00
Chaim	82b29510f2	fix(settings): RTL Tabs + Hebrew labels (סביבה/כלים/בלוקים/רישומים: Environment/Tools/Blocks/Registrations) All checks were successful Build & Deploy / build-and-deploy (push) Successful in 34s Details Radix Tabs defaults dir to 'ltr' if not set explicitly, which broke RTL inside Tab content (cards flowing left-to-right). Set dir='rtl' on the Tabs root and translate trigger labels to Hebrew (kept Paperclip in English as a brand name).	2026-05-04 08:42:56 +00:00
Chaim	e90faa9ba4	feat(settings): add Blocks tab — 12-block decision schema reference All checks were successful Build & Deploy / build-and-deploy (push) Successful in 1m35s Details Read-only display of BLOCK_CONFIG from block_writer.py with CREAC role and JWM functional-purpose annotations per block (sourced from docs/block-schema.md). Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>	2026-05-04 07:58:04 +00:00
Chaim	ae35934383	feat(settings): wire frontend to Coolify SoT response shape - McpEnvVar: infisical_value → coolify_value + has_duplicates - McpEnvResponse: drop Infisical metadata fields - EnvVarRow: 'Coolify:' label, 'ערוך ב-Coolify' ("Edit in Coolify") external link - DriftBadge: infisicalAvailable → coolifyAvailable - EnvironmentTab: Coolify app badge, duplicates count	2026-05-04 07:53:27 +00:00
Chaim	d1e12619d4	refactor(settings): pivot to Coolify env API as source of truth Investigation showed legal-ai container has no INFISICAL_TOKEN and there is no /legal-ai folder in Infisical — all env vars are stored in Coolify and injected into os.environ at container start. - Replace _read_infisical_values with _read_coolify_envs - New: _coolify_authoritative_value picks among Coolify duplicates - PATCH writes via Coolify API (upsert by key) - Drift = Coolify-stored vs container-runtime (common: Coolify edited without redeploy) - Response field renamed: infisical_value → coolify_value - New 'has_duplicates' flag per row when Coolify has multiple entries Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>	2026-05-04 07:50:02 +00:00
Chaim	1cb832473c	fix(settings): unknown drift state when Infisical unavailable + RTL drawer - DriftBadge shows 'Unknown' (not 'Synced') when infisical_available=false - Plumb infisicalAvailable from EnvironmentTab through EnvVarRow → DriftBadge - Add dir='rtl' to ToolDetailDrawer SheetContent for Hebrew descriptions	2026-05-04 07:01:42 +00:00
Chaim	89ce6c79d7	feat(settings): implement Registrations tab Replaces stub RegistrationsTab with a full read-only view grouped by client. Handles all five states: loading skeleton, fetch error, host_path_unavailable, empty list, and populated data with per-registration detail rows. Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>	2026-05-04 06:50:12 +00:00
Chaim	7e3c912899	feat(settings): implement Tools tab with detail drawer Replaces stub ToolsTab with a grouped-by-module grid of clickable tool cards. Adds ToolDetailDrawer (Sheet) showing name, description, module, source_location, and params_schema for the selected tool. Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>	2026-05-04 06:50:08 +00:00
Chaim	f418686724	feat(settings): implement Environment tab with edit + drift detection Add drift-badge, env-var-editor, env-var-row components and replace the environment-tab stub; install shadcn Switch which was missing. Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>	2026-05-04 06:47:40 +00:00
Chaim	8289b4d643	refactor(settings): split into tabs (paperclip + 3 stubs) Extracts Paperclip companies + tag-mappings UI into PaperclipTab component, adds stub tabs for Environment / Tools / Registrations, and replaces the flat page.tsx with a shadcn Tabs layout to make room for Tasks 8-10. Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>	2026-05-04 06:44:27 +00:00
Chaim	6c129a1350	feat(settings): add MCP API hooks Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>	2026-05-04 06:41:30 +00:00
Chaim	320b9d3529	fix(settings): guard paperclip mcp.json type + sort registrations	2026-05-04 06:40:16 +00:00
Chaim	394b971856	feat(settings): add MCP registrations endpoint + Coolify volume runbook Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>	2026-05-04 06:38:47 +00:00
Chaim	1da3587334	fix(settings): log tool source resolution failures (no silent swallow)	2026-05-04 06:37:09 +00:00
Chaim	272e49b6b0	feat(settings): add MCP tools introspection endpoint	2026-05-04 06:34:19 +00:00
Chaim	69bdf7b30a	fix(settings): harden PATCH/redeploy per code review - Add infisicalsdk dependency - Narrow update→create fallback to NotFound errors only (no silent swallow) - Truncate Coolify error response text to 200 chars - Add 60s cooldown to redeploy endpoint - Move httpx to top-level import	2026-05-04 06:33:01 +00:00
Chaim	2fe73fcce1	feat(settings): add PATCH env + Coolify redeploy endpoints Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>	2026-05-04 06:26:00 +00:00
Chaim	c30c987ec2	fix(settings): suppress false drift when Infisical unreachable - Add infisical_available flag to _build_env_var_row - Stabilize error code (no exception text in API response) - Document raw-comparison safety inline	2026-05-04 06:24:26 +00:00
Chaim	562eae010a	feat(settings): add GET /api/settings/mcp/env endpoint Adds four helper functions (_infisical_client, _infisical_ctx, _read_infisical_values, _build_env_var_row) and the /api/settings/mcp/env endpoint that compares Infisical vs container env vars and reports drift. Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>	2026-05-04 06:19:04 +00:00
Chaim	a3ca32355a	fix(settings): tighten coerce/normalize per code review - reject non-integer floats in int coerce path - document masking responsibility on to_public_dict - use tuple for enum_values (immutable) - treat empty string as None in normalize_for_compare	2026-05-04 06:17:22 +00:00
Chaim	55a0eca070	feat(settings): add MCP env catalog with type validation Static whitelist of 18 env vars (multimodal, rerank, halacha, general, credentials, connection) with per-key type coercion, secret masking, and drift-comparison helpers for the upcoming settings UI. Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>	2026-05-04 06:11:32 +00:00
Chaim	796f9d5f9c	docs(plans): add implementation plan for MCP settings page 11 tasks across backend (catalog, env GET/PATCH, redeploy, tools introspection, registrations) and frontend (tabs refactor, environment with drift detection, tools drawer, registrations). Includes Coolify volume runbook. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>	2026-05-04 05:58:53 +00:00
Chaim	70052b0133	docs(specs): add design for MCP settings page Settings page extension to view and edit MCP server config (env vars, tools, client registrations) — hybrid edit model: non-secrets editable through Infisical, secrets read-only with drift detection vs container. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>	2026-05-04 05:44:31 +00:00
Chaim	2f05cdea2e	feat(precedents): add /precedents/[id] read-only detail page All checks were successful Build & Deploy / build-and-deploy (push) Successful in 34s Details Global search rows linked to /precedents/<case_law_id> but no route existed, so clicking a result hit a Next 404 and React threw hydration error #418. New page reads /api/precedent-library/{id} and shows metadata, summary/headnote/key_quote, subject tags, and the full halachot roll-up. "ערוך פרטים" ("Edit details") opens the existing PrecedentEditSheet (no duplicate edit UX). Extracted ExtractedHalachotSection + ReviewStatusPill from the edit sheet into a shared component so both surfaces render the same block. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>	2026-05-04 05:36:43 +00:00
Chaim	bd1fb61655	feat(precedents): show extracted halachot in library edit sheet All checks were successful Build & Deploy / build-and-deploy (push) Successful in 35s Details The "ספרייה" ("Library") tab only exposed approved/total counts in a status pill; to inspect the actual extracted halachot per case the chair had to use the global "ממתין לאישור" ("Pending approval") tab, which only surfaces pending items, or the MCP tool. Now the per-precedent edit sheet renders a read-only roll-up of every halacha (approved + pending + rejected) with status filter tabs and counts. Review actions intentionally stay in the review tab to avoid duplicate approve/reject UX. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>	2026-05-04 05:24:25 +00:00
Chaim	f6bb46dc4a	fix(retrieval): restore _base(limit=) contract in hybrid precedent search All checks were successful Build & Deploy / build-and-deploy (push) Successful in 1m23s Details `rerank.maybe_rerank` calls `base_search(limit=…, base_kwargs)` on both the rerank-on and rerank-off paths. Commit `242f668` moved the closure into hybrid_search.py and renamed its parameter to `limit_inner`, so every call to `/api/precedent-library/search` raised TypeError 500 regardless of the VOYAGE_RERANK_ENABLED flag. Sibling `search_documents_hybrid` was unaffected because it uses `lambda kw:` which absorbs the kwarg. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>	2026-05-04 05:19:53 +00:00
Chaim	36f21c815e	fix(precedents): distinguish silent extraction failure from "no halachot" All checks were successful Build & Deploy / build-and-deploy (push) Successful in 3m5s Details Observed 2026-05-03: a `precedent_process_pending(halacha)` run that chained two precedents (1110/20 → 317/10) succeeded for the first (9 halachot, 129 chunks) and produced status=`no_halachot` for the second despite it being a 47KB Supreme Court ruling with rich legal analysis. A manual single-precedent re-run on 317/10 immediately extracted 53 halachot. Diagnosis: every chunk's claude_session call in the back-to-back run silently failed (likely Anthropic rate-limit storm after the 1110/20 token burn), and the empty list was reported as "Claude looked and found nothing", the same code path as a real 0-halacha ruling. The user couldn't tell the difference. Three changes: 1. Surface chunk-level failures (halacha_extractor.py) `_extract_chunk` now returns `(halachot, succeeded)` so the caller can count how many chunks crashed. `extract()` uses this to distinguish: - `no_halachot`: chunks ran cleanly, Claude found nothing - `extraction_failed`: ≥50% of chunks crashed AND zero halachot came back (rate limit, subprocess crash, etc.) When `extraction_failed`, DB status is left as 'processing' so the request stays in the queue for the caller to retry, instead of the old behaviour where it was marked 'completed' and silently dropped from the queue. 2. Inter-precedent cooldown (precedent_library.py) `process_pending_extractions` now sleeps 30s between precedents. Anthropic rate-limits per-org, and back-to-back large rulings (~4M tokens for 1110/20, immediately followed by another 2-3M) were the empirical trigger. 30s gives the per-minute counter time to drain. 3. Auto-retry on extraction_failed (precedent_library.py) When a precedent comes back as `extraction_failed`, retry once after a 60s cooldown before giving up. Rate-limit storms are transient: the manual re-run of 317/10 minutes later succeeded with 53 halachot and zero chunk failures, confirming a single retry is sufficient. Only retries `extraction_failed`; never `no_halachot` (Claude looked and there genuinely is no holding). The DB status now ends up as 'failed' only after retries are exhausted, matching the UI's terminal-failure chip. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>	2026-05-04 05:13:10 +00:00
Chaim	d4496b96f1	fix(mcp): eliminate "No such tool available" race at agent wakeup All checks were successful Build & Deploy / build-and-deploy (push) Successful in 1m26s Details When Paperclip wakes the CEO and the model issues an mcp__legal-ai__* call within ~10s of session init, Claude Code sometimes returns "No such tool available" because the legal-ai MCP server hasn't finished bringing up its tool catalog yet. Observed twice today on CMPA precedent-extraction wakeups (sessions 9989fbaf and a9c61801); the agent fell back to bash + .venv/bin/python and finished the work, but the race needed fixing on the server side. Three changes that close the window: 1. Lazy schema init (services/db.py + server.py) `init_schema()` was awaited inside the FastMCP lifespan, blocking the `initialize`/`tools/list` handshake until ~10 CREATE TABLE IF NOT EXISTS statements ran. Under contention (two CEOs waking at once for different companies) this stretched. Now the lifespan returns immediately and `get_pool()` runs the schema migrations exactly once on first DB access, guarded by an asyncio.Lock. tools/list is answered in milliseconds regardless of DB state. 2. Lazy heavy imports - services/embeddings.py: voyageai (~450ms) loaded only inside _get_client() - services/extractor.py: google.cloud.vision (~550ms) loaded only inside _get_vision_client() and _ocr_with_google_vision() These two were being imported at module top from legal_mcp.tools.documents -> services.processor -> services.{extractor,embeddings}, so the FastMCP server couldn't even start responding until both finished. Cold start dropped from 2.7s to 1.17s end-to-end (init + tools/list response). 3. Agent-side warmup + retry guidance (.claude/agents/legal-ceo.md) Even with a fast server, the model can still race on the very first call. The precedent-extraction section now tells the CEO to call workflow_status as a warmup probe and to retry after a short sleep if it sees "No such tool available", before falling back to the python bypass. Also expanded the precedent-tool whitelists on the sub-agents that delegate halacha/library work (commits `4a9a6b7` + `7ee90dc` added the tools to the MCP server but only the CEO got them in its allowed list). Added to: legal-researcher (full extraction set), legal-analyst (library_get/list + halacha review), legal-writer (library lookups + halacha_review), legal-qa (library_get + halacha_review), and the two that the CEO was already missing (halacha_review, halachot_pending). Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>	2026-05-03 20:23:14 +00:00
Chaim	d12cdb1fad	docs(voyage): mark stage C complete + record empirical fixes All checks were successful Build & Deploy / build-and-deploy (push) Successful in 10s Details Stage C of the voyage-upgrades-plan shipped to production on 2026-05-03. The doc now leads with the final state and the two empirical corrections vs the original plan: 1. Reciprocal Rank Fusion replaces weighted-sum hybrid merge. voyage-3 cosines (~0.4-0.5) systematically outscale voyage-multimodal-3 cosines (~0.20-0.25); a weighted sum lets text dominate even when image is the better signal. RRF is rank-based and robust to scale differences. 2. Chunker now propagates page_number end-to-end (extractor returns per-page offsets, chunker tags each chunk by its first character's page). A retrofit script backfills page_number on existing document_chunks without re-OCR — uses the stored documents.extracted_text plus PyMuPDF direct text reads as page anchors (linear interpolation for OCR-only pages). Production state on cases 8174-24 + 8137-24: 419 page-image embeddings, 819 chunks tagged with page_number, MULTIMODAL_ENABLED=true in Coolify env, hybrid search verified A/B against text-only baseline. The original stage C plan section is retained below for reference. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>	2026-05-03 20:16:13 +00:00
Chaim	8a815ecff5	fix(retrieval): rewrite chunk-page retrofit to skip OCR All checks were successful Build & Deploy / build-and-deploy (push) Successful in 16s Details The first-pass retrofit re-extracted via extractor.extract_text, which re-runs Google Vision OCR on scanned pages. OCR is non-deterministic, so the new text didn't match the chunk content stored in the DB (produced by the original OCR run) — only ~7% of chunks were located. New approach (no OCR cost): 1. Use the stored documents.extracted_text from the DB — the exact text the chunks were produced from, so chunk lookups match. 2. Anchor page boundaries via PyMuPDF direct text reads (free, no OCR). Pages with usable direct text are anchored by snippet match; OCR-only pages are linearly interpolated between anchors. 3. Search each chunk in extracted_text using a whitespace-tolerant helper — needed because the chunker joins paragraphs with single '\\n' while extracted_text uses '\\n\\n' as page separators. Verified on 8174-24 (5 docs, 307 chunks) + 8137-24 (9 docs, 512 chunks): 100% chunks tagged, 13s total, $0 cost. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>	2026-05-03 20:04:33 +00:00
Chaim	81ccf3a888	feat(retrieval): track page_number on text chunks for multimodal hybrid boost All checks were successful Build & Deploy / build-and-deploy (push) Successful in 6m33s Details The legacy chunker did not track which PDF page each chunk came from. Stored chunks had page_number=NULL, which blocked the multimodal hybrid retriever's text+image boost — it joins (chunk, image) on (document_id, page_number) and the join could never fire. This change: - extractor.extract_text now returns (text, page_count, page_offsets); page_offsets[i] is the start char offset of page (i+1) in the joined text. None for non-PDFs. - chunker.chunk_document accepts an optional page_offsets and tags each chunk with the page that contains its first character (uses the existing chunker logic; pages assigned post-hoc by content search to keep the diff minimal). - processor.process_document and precedent_library.ingest_precedent forward page_offsets through the chunker. New uploads now carry accurate page_number on every chunk. - Other extract_text callers (tools/documents, tools/workflow, web/app.py) updated to unpack the third element (ignored). - scripts/backfill_chunk_pages.py: per-case retrofit. Re-extracts each PDF (re-OCRs via Google Vision if needed, ~$0.0015/page), computes page_offsets, and updates page_number on every chunk by content search. Idempotent; --force re-runs on already-tagged docs. Forward-only would leave the 419 image embeddings backfilled on cases 8174-24 + 8137-24 unable to boost their corresponding text chunks. The retrofit script closes that gap (cost ~$0.60). Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>	2026-05-03 19:49:41 +00:00
Chaim	5724ed8e5b	chore: nudge Actions to build `c31fe08` (RRF) All checks were successful Build & Deploy / build-and-deploy (push) Successful in 6m24s Details Previous push to main did not trigger a workflow run; act-runner went silent after task 112. Empty commit to re-fire the webhook. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>	2026-05-03 19:42:37 +00:00
Chaim	c31fe0866b	fix(retrieval): switch hybrid merge to Reciprocal Rank Fusion (RRF) Some checks are pending Build & Deploy / build-and-deploy (push) Waiting to run Details Cosine scores in voyage-3 (~0.4-0.5) and voyage-multimodal-3 (~0.2-0.25) live on different scales. The previous weighted-sum merge let text always dominate (verified empirically: 0 image-only hits across 7 queries on case 8174-24; the image side contributed nothing). RRF combines by rank in each list rather than raw score, robust to scale differences. Per-item score: rrf_score = text_weight / (k + text_rank) + image_weight / (k + image_rank) A row that appears in both lists (joined on (id_field, page_number)) gets both terms, surfaced as match_type='text+image'. After fix on 8174-24 (146 image rows): 2 image-only hits land in top-5 across all 7 test queries, surfacing actual table/diagram/ signature pages (p12, p13 of שומת המשיבה [the respondent's appraisal] for 'טבלת השוואת ערכי שומה' [appraisal value comparison table], p25 of שומת השגה [objection appraisal] for 'תרשים גוש וחלקה' [block and parcel diagram], etc). On 8137-24 (273 image rows): 'חישוב היוון של דמי החכירה' [capitalization calculation of the lease fees] goes from 0 baseline results → 5 hybrid results (3 text + 2 image), opening recall on scanned content the OCR layer misses. Default MULTIMODAL_TEXT_WEIGHT 0.65 → 0.5 (vanilla RRF) since the prior 0.65 was tuned for raw cosine scales that no longer apply. New env knob MULTIMODAL_RRF_K (default 60, standard literature). Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>	2026-05-03 19:39:31 +00:00
Chaim	242f668319	feat(retrieval): add voyage-multimodal-3 page-image embeddings (feature flag) All checks were successful Build & Deploy / build-and-deploy (push) Successful in 1m50s Details Stage C: per-page image embeddings via voyage-multimodal-3 + hybrid text+image search. Off by default; enable with MULTIMODAL_ENABLED=true. - Schema V9: document_image_embeddings + precedent_image_embeddings (vector(1024), page_number, image_thumbnail_path) - extractor.render_pages_for_multimodal renders PDF pages at MULTIMODAL_DPI (144) for embedding + JPEG thumbnails at MULTIMODAL_THUMB_DPI (96) for UI preview, in one pass - embeddings.embed_images calls voyage-multimodal-3 in 50-page batches - services/hybrid_search.py orchestrator: rerank applied to text side first (rerank-2 is text-only); image side cosine; weighted merge with text_weight 0.65 (env-tunable); image-only pages surface as match_type='image' so dense scanned content still appears - processor.process_document and precedent_library.ingest_precedent gated by flag — non-fatal on multimodal failure - scripts/multimodal_backfill.py — idempotent per-case CLI to embed existing documents without re-extracting text Validated locally on a 5-page response brief: render 0.31s, embed 8.32s, hybrid merge surfaces image rows correctly. Production rollout starts with flag=false (no behavior change), then per-case A/B. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>	2026-05-03 19:24:52 +00:00
Chaim	b9cdcf980d	fix(precedents): translate practice_area slugs to Hebrew in halacha review The halacha-review panel was rendering raw slugs (`betterment_levy`, `rishuy_uvniya`, `compensation_197`) as English badges. Pipe them through the existing `practiceAreaLabel()` helper so the chair sees "היטל השבחה", "רישוי ובניה", "פיצויים לפי ס' 197". All other UI sites (library-list-panel, library-stats-panel, precedent-edit-sheet) were already using the helper — this was the sole place left rendering the raw slug. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>	2026-05-03 19:13:48 +00:00
Chaim	36e464f668	fix(halachot): exclude embedding from update_halacha RETURNING PATCH /api/halachot/{id} was returning 500 because the row included ``embedding`` as a numpy.ndarray of np.float32, which FastAPI's jsonable_encoder cannot serialize (vars() and dict() both fail on it). The bug had been latent — it triggered for the first time today after the auto-approve batch left only low-confidence halachot for the chair to review manually, and her first PATCH hit the unserializable response. Replace ``RETURNING *`` with an explicit column list (everything except ``embedding``). Callers that need the embedding can re-fetch via ``get_halacha`` — but no current caller does. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>	2026-05-03 19:04:46 +00:00
Chaim	4d1924c7e6	feat(halachot): auto-approve high-confidence halachot at insert Halachot extracted by halacha_extractor with confidence >= 0.80 are now inserted with review_status='approved' instead of 'pending_review' — they appear in search_precedent_library immediately. Halachot below the threshold still require manual chair approval. Threshold tunable via env (HALACHA_AUTO_APPROVE_THRESHOLD), defaults to 0.80. Rationale: 89% of historical extractions (356/400) score 0.80+, spot-checks confirmed quality, and the manual review backlog was the single biggest reason rerank-2 was returning passages-only on ההבחנה-style queries. After this change + the one-time backfill UPDATE, search now returns 9/10 halachot for "ההבחנה בין השבחה לפיצויים" instead of 0 — and the top-3 are exact-match rules, not adjacent passages. Reviewer field records "auto-approved (confidence ≥ X.XX)" with the threshold value at insert time, for traceability. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>	2026-05-03 19:01:03 +00:00
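The insert-time decision in the commit above might look something like the sketch below. The function name `initial_review_state` and its return shape are hypothetical; the env variable `HALACHA_AUTO_APPROVE_THRESHOLD`, the 0.80 default, the two `review_status` values, and the reviewer string format are taken from the commit message.

```python
import os
from typing import Optional, Tuple


def initial_review_state(
    confidence: float, threshold: Optional[float] = None
) -> Tuple[str, Optional[str]]:
    """Return (review_status, reviewer) for a newly extracted halacha.

    Hypothetical helper: reads the threshold from the environment unless
    one is passed in, and records that value in the reviewer note.
    """
    if threshold is None:
        threshold = float(os.environ.get("HALACHA_AUTO_APPROVE_THRESHOLD", "0.80"))

    if confidence >= threshold:
        # The note captures the threshold in force at insert time.
        return "approved", f"auto-approved (confidence ≥ {threshold:.2f})"
    return "pending_review", None


if __name__ == "__main__":
    print(initial_review_state(0.91, 0.80))
    print(initial_review_state(0.62, 0.80))
```

Storing the threshold in the reviewer note means a later change to the env value does not obscure why an older row was approved.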
Chaim	26c3fddf41	feat(retrieval): add voyage rerank-2 cross-encoder stage (feature flag) Stage B of voyage-upgrades-plan rewritten: instead of context-3 (which 4 POCs showed inconsistent improvement), add a cross-encoder rerank layer on top of voyage-3. Default off (VOYAGE_RERANK_ENABLED=false). POC validation (785-doc corpus, 12 queries, claude-haiku-4-5 judge): - mean@3 +4.5% (4.306 → 4.500) - practical-category queries +11.6% (3.78 → 4.22) - latency +702ms per query - no schema change, no re-embed, no double storage Plumbing: - config: VOYAGE_RERANK_ENABLED / _MODEL / _FETCH_K env vars - embeddings.voyage_rerank() wraps voyageai client.rerank - services/rerank.py: maybe_rerank() helper — fetches FETCH_K candidates via the bi-encoder then reranks to top-K. Fail-open if Voyage rerank is unavailable. - tools/search.py: search_decisions, search_case_documents, find_similar_cases all wrapped - services/precedent_library.search_library wrapped Smoke-tested locally with flag on/off — produces expected behaviour and latency profile. Ready for production rollout via Coolify env flip after deploy. POCs (kept under scripts/ for reference): - voyage_context3_poc{_long}.py — context-3 evaluation (rejected) - voyage_multimodal_poc.py — multimodal-3 (stage C, deferred) - voyage_rerank_judge_poc.py — single-case rerank benchmark - voyage_rerank_corpus_poc.py — full-corpus rerank validation Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>	2026-05-03 18:43:41 +00:00
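The fetch-then-rerank pattern with fail-open behaviour, as described for `maybe_rerank()` in the commit above, could be sketched as below. This is not the repository's implementation: `rerank_fail_open`, its parameters, and the shape of `rerank_fn` (a callable returning candidate indices in best-first order) are assumptions for illustration, and no real Voyage client call is made.

```python
from typing import Callable, List, Sequence

# Assumed reranker shape: takes the query and candidate texts and returns
# candidate indices ordered from most to least relevant.
RerankFn = Callable[[str, Sequence[str]], List[int]]


def rerank_fail_open(
    query: str,
    candidates: Sequence[str],
    rerank_fn: RerankFn,
    top_k: int,
    enabled: bool = True,
) -> List[str]:
    """Rerank bi-encoder candidates, falling back to their original order.

    `candidates` is the wider FETCH_K list already sorted by the
    bi-encoder. If the flag is off or the reranker raises, the first
    `top_k` candidates are returned unchanged (fail-open).
    """
    baseline = list(candidates[:top_k])
    if not enabled or not candidates:
        return baseline
    try:
        order = rerank_fn(query, candidates)
    except Exception:
        # Reranker unavailable: keep serving bi-encoder results.
        return baseline
    return [candidates[i] for i in order[:top_k]]


if __name__ == "__main__":
    def reversing_reranker(query: str, docs: Sequence[str]) -> List[int]:
        return list(range(len(docs) - 1, -1, -1))

    print(rerank_fail_open("q", ["a", "b", "c", "d"], reversing_reranker, top_k=2))
```

Fetching more candidates than are finally returned gives the cross-encoder room to promote items the bi-encoder ranked low, while the fallback keeps search available when the rerank service is down.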


289 Commits