Commit Graph

289 Commits

Author SHA1 Message Date
77e5996497 feat(agents): wire Hermes Knowledge Curator to CEO post-export (CMP + CMPA)
All checks were successful
Build & Deploy / build-and-deploy (push) Successful in 3m37s
Adds new sub-agent "מנהל ידע" (hermes_local adapter) that runs after
each successful export to analyze the final decision and suggest updates
to skills/decision/SKILL.md and lessons. Read-only on case data, write
only on a single comment per run.

- legal-ceo.md: new stage F2 after F (export). Looks up curator by name
  in current company, creates async sub-issue, no waiting. Falls back to
  silent skip if no curator configured.
- legal-ceo.md: agents table updated with both curator UUIDs (CMP + CMPA).
- hermes-curator.md: role instructions documenting CMP/CMPA split and
  what the curator does/does not do.

Stage 1 POC. End-to-end validated on CMP-68 (case 1130-25) with two
substantive findings on style patterns. CMPA agent created with separate
~/.hermes/profiles/curator-cmpa profile (own MEMORY.md focused on
היטל השבחה / פיצויים).

Known gaps to follow up: curator does not auto-close its issue, does
not auto-persist findings to MEMORY.md, comment attribution falls back
to chaim's user (install-key) — these are tracked separately and do
not block validation.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-05-05 13:33:23 +00:00
69d4827f33 feat(migration): enrich internal committee entries — fix case_number + metadata + halachot
All checks were successful
Build & Deploy / build-and-deploy (push) Successful in 1m32s
- precedent_metadata_extractor: add case_number_clean extraction field
- apply_to_record: overwrite_case_number param for one-time migration
- internal_decisions: enrich_migrated_entries() — runs metadata then queues halachot
- server: expose as internal_decision_enrich MCP tool

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
2026-05-04 18:59:20 +00:00
c0f67ab841 feat(precedents): split library into court rulings + appeals committee tables
All checks were successful
Build & Deploy / build-and-deploy (push) Successful in 1m34s
- /api/precedent-library now accepts source_kind param (default external_upload)
- list_external_case_law returns chair_name/district fields
- LibraryListPanel renders two separate tables with appropriate columns
- internal_decisions migration: added queue_halachot param to defer extraction
- Fixed practice_area mapping from style_corpus (appeals_committee → proper enum)

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
2026-05-04 18:49:32 +00:00
92a2763b86 feat: add internal committee decisions corpus (source_kind='internal_committee')
All checks were successful
Build & Deploy / build-and-deploy (push) Successful in 1m31s
Three-layer separation: style learning (style_corpus), appeals-committee decisions
(internal_committee), and court rulings (external_upload).

- SCHEMA_V10: chair_name + district columns on case_law and cases, partial indexes
- create_internal_committee_decision() DB upsert function
- search_precedent_library_semantic() now accepts source_kind/district/chair_name params
- search_precedent_library_hybrid() passes through new params
- services/internal_decisions.py: ingest_internal_decision, migrate_from_style_corpus,
  migrate_from_external_corpus (identifies rows via source_type='appeals_committee')
- search_internal_decisions() MCP tool (server.py + tools/search.py)
- internal_decision_migrate() MCP admin tool
- Web endpoints: POST /api/internal-decisions/upload, POST /api/internal-decisions/migrate,
  GET /api/internal-decisions
- ingest_final_version auto-ingests finalized decisions into internal corpus
- SKILL.md updated: agents now search internal + external in parallel, present separately

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-05-04 18:33:39 +00:00
1b14e04373 chore(skills): remove paperclip-dev, scope converting-plans-to-tasks
All checks were successful
Build & Deploy / build-and-deploy (push) Successful in 7s
paperclip-dev is for maintaining the Paperclip codebase itself — not
relevant to legal work. Removed from all 14 agents (was on CMPA mirror).

paperclip-converting-plans-to-tasks helps decompose a plan into assigned
issues. Useful for the planning-heavy agents (CEO, analyst). Now scoped
to those two — removed from the other 5 in CMPA where it had crept in.

Net effect: zero drift on paperclipai/* skills across all 7 master+mirror
pairs. Verified via the new Agents tab dashboard.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-05-04 17:47:05 +00:00
69e153b3db fix(settings/agents): exclude noise from drift detection
All checks were successful
Build & Deploy / build-and-deploy (push) Successful in 32s
Two false positives surfaced after the Agents tab went live:

1. status (running/idle/paused) is runtime state, not config — drops in
   and out as agents pick up issues. Removed from _DRIFT_FIELDS.

2. desiredSkills compared raw, but local/* and company/* skills carry
   per-company hashes/scopes by design (sync_agents_across_companies.py
   filters local skills with a warning). Comparing them flags every
   master+mirror pair that has any local skill on master.

Now compares only paperclipai/* skills (vendor-shipped, must match).
UI shows an inline note explaining the filter.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-05-04 17:39:17 +00:00
702c01d678 chore(tasks): mark Task #29 done — Agents tab deployed to prod
All checks were successful
Build & Deploy / build-and-deploy (push) Successful in 36s
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-05-04 17:29:30 +00:00
bd6a66e80d chore(types): regenerate OpenAPI types from prod
Some checks failed
Build & Deploy / build-and-deploy (push) Has been cancelled
Picks up the new GET /api/admin/paperclip-agents endpoint (Task #29) plus
any other endpoint changes accumulated since the last regeneration.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-05-04 17:29:17 +00:00
af2dc0df2a chore(gitignore): ignore precedent-library data, .db files, .bak backups
All checks were successful
Build & Deploy / build-and-deploy (push) Successful in 1m36s
After committing the Paperclip gaps refactor, the .bak-pre-* sentinels
served their purpose. Add a wildcard so future similar backups won't be
tracked. Also ignore data/precedent-library/ (binary PDFs, 11MB) and
data/*.db (sqlite caches).

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-05-04 17:26:20 +00:00
eab0ca906c feat(interim): include block-he opening in pre-ruling interim drafts
block-he (פתיחה ניטרלית) was previously emitted only in final decisions.
For interim drafts shown to the chair before ruling, including a neutral
opening helps the chair confirm framing before approving downstream blocks.
Skipped if empty, so legacy cases without block-he are unaffected.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-05-04 17:25:54 +00:00
cf5f6fe274 feat(paperclip): close 11 integration gaps (#16-#28)
Brings the legal-ai ↔ Paperclip integration in line with the official
Paperclip skill. Net effect: HEARTBEAT.md -47% (370→195 lines), all 14
agents on uniform runtime_config + budget + instructionsBundleMode, and
two cross-company helpers replacing manual SQL.

Highlights:
- HEARTBEAT.md refactor: project-specific only, delegates to the official
  paperclipai/paperclip skill (loaded per agent). Adds heartbeat-context
  fast-path (§1.7) and PAPERCLIP_WAKE_PAYLOAD_JSON shortcut (§1.5).
- Issue Thread Interactions API: legal-ceo.md now uses
  ask_user_questions / request_confirmation / suggest_tasks instead of
  free-text comments — gives chair structured UI with idempotency keys.
- pc.sh + paperclip_api.pc_request: every API call goes through helpers
  that inject Authorization + X-Paperclip-Run-Id (audit trail).
- sync_agents_across_companies.py: master(CMP)→mirror(CMPA) sync via
  Paperclip API, idempotent, with --verify and --apply modes.
- skills/new-company-setup: 11-step blueprint distilling all 11 gaps
  into a single onboarding runbook for the next company.
- .taskmaster: 12 tasks covering each gap (one already closed: #29).

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-05-04 17:25:45 +00:00
6f713042b5 feat(settings): add Agents tab — read-only Paperclip agent config view
Task #29: surfaces all 14 agents (7 roles × 2 companies) in /settings as
master+mirror pairs with drift detection. Replaces ad-hoc psql + script
inspection with a single dashboard.

Backend: GET /api/admin/paperclip-agents — fetches via Paperclip API
(not direct DB), groups by name, computes drift across model/effort/
timeoutSec/maxTurnsPerRun/skills/runtime_config.heartbeat/budget/status.

Frontend: new AgentsTab card-per-pair with side-by-side compare,
drift highlighting, expandable details (skills list + instructions path).

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-05-04 17:23:48 +00:00
d0994704cf feat(agents): mirror Paperclip interactions in case page
All checks were successful
Build & Deploy / build-and-deploy (push) Successful in 47s
Surface issue_thread_interactions (ask_user_questions / request_confirmation /
suggest_tasks) directly inside legal-ai's case detail feed so the user can
answer agent prompts without switching to Paperclip's UI.

Backend (FastAPI):
- paperclip_client.py: 4 new helpers — get_issue_interactions (DB),
  respond_to_interaction / accept_interaction / reject_interaction (REST).
- app.py: extends GET /api/cases/{case_number}/agents to include
  `interactions`, and adds POST /api/cases/{case_number}/agents/interaction-response
  routing to /respond, /accept, /reject in Paperclip.
- paperclip_client.py: also pulls existing httpx calls onto the centralized
  pc_request helper (paperclip_api.py) for consistent auth + run-id headers.

Frontend (web-ui, Next.js 16 + TanStack Query):
- agents.ts: Interaction / InteractionPayload / InteractionStatus types,
  useSubmitInteraction mutation hook (invalidates the activity query).
- agent-activity-feed.tsx: InteractionCard renders radio (single) /
  checkbox (multi) for ask_user_questions, accept/reject + reason for
  request_confirmation, task selection for suggest_tasks. Resolved
  interactions show a read-only summary. Cards are interleaved with
  comments by created_at, so the feed reads chronologically.

Paperclip auto-wakes the issue assignee on a successful response
(queueResolvedInteractionContinuationWakeup) — no explicit wakeup needed.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-05-04 16:40:45 +00:00
82b29510f2 fix(settings): RTL Tabs + Hebrew labels (סביבה/כלים/בלוקים/רישומים)
All checks were successful
Build & Deploy / build-and-deploy (push) Successful in 34s
Radix Tabs defaults dir to 'ltr' if not set explicitly, which broke
RTL inside Tab content (cards flowing left-to-right). Set dir='rtl'
on the Tabs root and translate trigger labels to Hebrew (kept
Paperclip in English as a brand name).
2026-05-04 08:42:56 +00:00
e90faa9ba4 feat(settings): add Blocks tab — 12-block decision schema reference
All checks were successful
Build & Deploy / build-and-deploy (push) Successful in 1m35s
Read-only display of BLOCK_CONFIG from block_writer.py with CREAC role
and JWM functional-purpose annotations per block (sourced from
docs/block-schema.md).

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
2026-05-04 07:58:04 +00:00
ae35934383 feat(settings): wire frontend to Coolify SoT response shape
- McpEnvVar: infisical_value → coolify_value + has_duplicates
- McpEnvResponse: drop Infisical metadata fields
- EnvVarRow: 'Coolify:' label, 'ערוך ב-Coolify' external link
- DriftBadge: infisicalAvailable → coolifyAvailable
- EnvironmentTab: Coolify app badge, duplicates count
2026-05-04 07:53:27 +00:00
d1e12619d4 refactor(settings): pivot to Coolify env API as source of truth
Investigation showed legal-ai container has no INFISICAL_TOKEN and there
is no /legal-ai folder in Infisical — all env vars are stored in Coolify
and injected into os.environ at container start.

- Replace _read_infisical_values with _read_coolify_envs
- New: _coolify_authoritative_value picks among Coolify duplicates
- PATCH writes via Coolify API (upsert by key)
- Drift = Coolify-stored vs container-runtime (common: Coolify edited
  without redeploy)
- Response field renamed: infisical_value → coolify_value
- New 'has_duplicates' flag per row when Coolify has multiple entries

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
2026-05-04 07:50:02 +00:00
1cb832473c fix(settings): unknown drift state when Infisical unavailable + RTL drawer
- DriftBadge shows 'Unknown' (not 'Synced') when infisical_available=false
- Plumb infisicalAvailable from EnvironmentTab through EnvVarRow → DriftBadge
- Add dir='rtl' to ToolDetailDrawer SheetContent for Hebrew descriptions
2026-05-04 07:01:42 +00:00
89ce6c79d7 feat(settings): implement Registrations tab
Replaces stub RegistrationsTab with a full read-only view grouped by client.
Handles all 4 states: loading skeleton, fetch error, host_path_unavailable,
empty list, and populated data with per-registration detail rows.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
2026-05-04 06:50:12 +00:00
7e3c912899 feat(settings): implement Tools tab with detail drawer
Replaces stub ToolsTab with a grouped-by-module grid of clickable tool cards.
Adds ToolDetailDrawer (Sheet) showing name, description, module, source_location,
and params_schema for the selected tool.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
2026-05-04 06:50:08 +00:00
f418686724 feat(settings): implement Environment tab with edit + drift detection
Add drift-badge, env-var-editor, env-var-row components and replace the
environment-tab stub; install shadcn Switch which was missing.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
2026-05-04 06:47:40 +00:00
8289b4d643 refactor(settings): split into tabs (paperclip + 3 stubs)
Extracts Paperclip companies + tag-mappings UI into PaperclipTab component,
adds stub tabs for Environment / Tools / Registrations, and replaces the flat
page.tsx with a shadcn Tabs layout to make room for Tasks 8-10.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
2026-05-04 06:44:27 +00:00
6c129a1350 feat(settings): add MCP API hooks
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
2026-05-04 06:41:30 +00:00
320b9d3529 fix(settings): guard paperclip mcp.json type + sort registrations 2026-05-04 06:40:16 +00:00
394b971856 feat(settings): add MCP registrations endpoint + Coolify volume runbook
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
2026-05-04 06:38:47 +00:00
1da3587334 fix(settings): log tool source resolution failures (no silent swallow) 2026-05-04 06:37:09 +00:00
272e49b6b0 feat(settings): add MCP tools introspection endpoint 2026-05-04 06:34:19 +00:00
69bdf7b30a fix(settings): harden PATCH/redeploy per code review
- Add infisicalsdk dependency
- Narrow update→create fallback to NotFound errors only (no silent swallow)
- Truncate Coolify error response text to 200 chars
- Add 60s cooldown to redeploy endpoint
- Move httpx to top-level import
2026-05-04 06:33:01 +00:00
2fe73fcce1 feat(settings): add PATCH env + Coolify redeploy endpoints
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
2026-05-04 06:26:00 +00:00
c30c987ec2 fix(settings): suppress false drift when Infisical unreachable
- Add infisical_available flag to _build_env_var_row
- Stabilize error code (no exception text in API response)
- Document raw-comparison safety inline
2026-05-04 06:24:26 +00:00
562eae010a feat(settings): add GET /api/settings/mcp/env endpoint
Adds four helper functions (_infisical_client, _infisical_ctx,
_read_infisical_values, _build_env_var_row) and the /api/settings/mcp/env
endpoint that compares Infisical vs container env vars and reports drift.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
2026-05-04 06:19:04 +00:00
a3ca32355a fix(settings): tighten coerce/normalize per code review
- reject non-integer floats in int coerce path
- document masking responsibility on to_public_dict
- use tuple for enum_values (immutable)
- treat empty string as None in normalize_for_compare
2026-05-04 06:17:22 +00:00
55a0eca070 feat(settings): add MCP env catalog with type validation
Static whitelist of 18 env vars (multimodal, rerank, halacha, general,
credentials, connection) with per-key type coercion, secret masking, and
drift-comparison helpers for the upcoming settings UI.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
2026-05-04 06:11:32 +00:00
796f9d5f9c docs(plans): add implementation plan for MCP settings page
11 tasks across backend (catalog, env GET/PATCH, redeploy, tools introspection,
registrations) and frontend (tabs refactor, environment with drift detection,
tools drawer, registrations). Includes Coolify volume runbook.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-05-04 05:58:53 +00:00
70052b0133 docs(specs): add design for MCP settings page
Settings page extension to view and edit MCP server config (env vars,
tools, client registrations) — hybrid edit model: non-secrets editable
through Infisical, secrets read-only with drift detection vs container.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-05-04 05:44:31 +00:00
2f05cdea2e feat(precedents): add /precedents/[id] read-only detail page
All checks were successful
Build & Deploy / build-and-deploy (push) Successful in 34s
Global search rows linked to /precedents/<case_law_id> but no route
existed, so clicking a result hit a Next 404 and React threw hydration
error #418. New page reads /api/precedent-library/{id} and shows
metadata, summary/headnote/key_quote, subject tags, and the full
halachot roll-up. "ערוך פרטים" opens the existing PrecedentEditSheet
(no duplicate edit UX).

Extracted ExtractedHalachotSection + ReviewStatusPill from the edit
sheet into a shared component so both surfaces render the same block.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-05-04 05:36:43 +00:00
bd1fb61655 feat(precedents): show extracted halachot in library edit sheet
All checks were successful
Build & Deploy / build-and-deploy (push) Successful in 35s
The "ספרייה" tab only exposed approved/total counts in a status pill;
to inspect the actual extracted halachot per case the chair had to use
the global "ממתין לאישור" tab, which only surfaces pending items, or
the MCP tool. Now the per-precedent edit sheet renders a read-only
roll-up of every halacha (approved + pending + rejected) with status
filter tabs and counts. Review actions intentionally stay in the
review tab to avoid duplicate approve/reject UX.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-05-04 05:24:25 +00:00
f6bb46dc4a fix(retrieval): restore _base(limit=) contract in hybrid precedent search
All checks were successful
Build & Deploy / build-and-deploy (push) Successful in 1m23s
`rerank.maybe_rerank` calls `base_search(limit=…, **base_kwargs)` on both
the rerank-on and rerank-off paths. Commit 242f668 moved the closure into
hybrid_search.py and renamed its parameter to `limit_inner`, so every call
to `/api/precedent-library/search` raised TypeError 500 regardless of the
VOYAGE_RERANK_ENABLED flag. Sibling `search_documents_hybrid` was unaffected
because it uses `lambda **kw:` which absorbs the kwarg.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-05-04 05:19:53 +00:00
36f21c815e fix(precedents): distinguish silent extraction failure from "no halachot"
All checks were successful
Build & Deploy / build-and-deploy (push) Successful in 3m5s
Observed 2026-05-03: a `precedent_process_pending(halacha)` run that
chained two precedents (1110/20 → 317/10) succeeded for the first
(9 halachot, 129 chunks) and produced status=`no_halachot` for the
second despite it being a 47KB Supreme Court ruling with rich legal
analysis. A manual single-precedent re-run on 317/10 immediately
extracted 53 halachot. Diagnosis: every chunk's claude_session call
in the back-to-back run silently failed (likely Anthropic rate-limit
storm after the 1110/20 token burn), and the empty list was reported
as "Claude looked and found nothing" — same code path as a real
0-halacha ruling. The user couldn't tell the difference.

Three changes:

1. Surface chunk-level failures (halacha_extractor.py)
   `_extract_chunk` now returns `(halachot, succeeded)` so the caller
   can count how many chunks crashed. `extract()` uses this to
   distinguish:
   - `no_halachot` — chunks ran cleanly, Claude found nothing
   - `extraction_failed` — ≥50% of chunks crashed AND zero halachot
     came back (rate limit, subprocess crash, etc.)
   When `extraction_failed`, DB status is left as 'processing' so the
   request stays in the queue for the caller to retry — instead of
   the old behaviour where it got marked 'completed' and silently
   dropped from the queue.

2. Inter-precedent cooldown (precedent_library.py)
   `process_pending_extractions` now sleeps 30s between precedents.
   Anthropic rate-limits per-org, and back-to-back large rulings
   (~4M tokens for 1110/20, immediately followed by another 2-3M)
   was the empirical trigger. 30s gives the per-minute counter time
   to drain.

3. Auto-retry on extraction_failed (precedent_library.py)
   When a precedent comes back as `extraction_failed`, retry once
   after a 60s cooldown before giving up. Rate-limit storms are
   transient — the manual re-run of 317/10 minutes later succeeded
   with 53 halachot and zero chunk failures, confirming a single
   retry is sufficient. Only retries `extraction_failed`; never
   `no_halachot` (Claude looked and there genuinely is no holding).

The DB status now ends up as 'failed' only after retries are
exhausted, matching the UI's terminal-failure chip.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-05-04 05:13:10 +00:00
d4496b96f1 fix(mcp): eliminate "No such tool available" race at agent wakeup
All checks were successful
Build & Deploy / build-and-deploy (push) Successful in 1m26s
When Paperclip wakes the CEO and the model issues an mcp__legal-ai__*
call within ~10s of session init, Claude Code sometimes returns
"No such tool available" because the legal-ai MCP server hasn't
finished bringing up its tool catalog yet. Observed twice today on
CMPA precedent-extraction wakeups (sessions 9989fbaf and a9c61801);
the agent fell back to bash + .venv/bin/python and finished the work,
but the race needed fixing on the server side.

Three changes that close the window:

1. Lazy schema init (services/db.py + server.py)
   `init_schema()` was awaited inside the FastMCP lifespan, blocking
   the `initialize`/`tools/list` handshake until ~10 CREATE TABLE IF
   NOT EXISTS statements ran. Under contention (two CEOs waking at
   once for different companies) this stretched. Now the lifespan
   returns immediately and `get_pool()` runs the schema migrations
   exactly once on first DB access, guarded by an asyncio.Lock.
   tools/list is answered in milliseconds regardless of DB state.

2. Lazy heavy imports
   - services/embeddings.py: voyageai (~450ms) loaded only inside
     _get_client()
   - services/extractor.py: google.cloud.vision (~550ms) loaded only
     inside _get_vision_client() and _ocr_with_google_vision()
   These two were being imported at module top from
   legal_mcp.tools.documents -> services.processor -> services.{
   extractor,embeddings}, so the FastMCP server couldn't even start
   responding until both finished. Cold start dropped from 2.7s to
   1.17s end-to-end (init + tools/list response).

3. Agent-side warmup + retry guidance (.claude/agents/legal-ceo.md)
   Even with a fast server, the model can still race on the very
   first call. The precedent-extraction section now tells the CEO
   to call workflow_status as a warmup probe and to retry after a
   short sleep if it sees "No such tool available", before falling
   back to the python bypass.

Also expanded the precedent-tool whitelists on the sub-agents that
delegate halacha/library work (commits 4a9a6b7 + 7ee90dc added the
tools to the MCP server but only the CEO got them in its allowed
list). Added to: legal-researcher (full extraction set), legal-analyst
(library_get/list + halacha review), legal-writer (library lookups +
halacha_review), legal-qa (library_get + halacha_review), and the two
that the CEO was already missing (halacha_review, halachot_pending).

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-05-03 20:23:14 +00:00
d12cdb1fad docs(voyage): mark stage C complete + record empirical fixes
All checks were successful
Build & Deploy / build-and-deploy (push) Successful in 10s
Stage C of the voyage-upgrades-plan shipped to production on
2026-05-03. The doc now leads with the final state and the two
empirical corrections vs the original plan:

1. Reciprocal Rank Fusion replaces weighted-sum hybrid merge.
   voyage-3 cosines (~0.4-0.5) systematically outscale
   voyage-multimodal-3 cosines (~0.20-0.25); a weighted sum lets
   text dominate even when image is the better signal. RRF is
   rank-based and robust to scale differences.

2. Chunker now propagates page_number end-to-end (extractor returns
   per-page offsets, chunker tags each chunk by its first character's
   page). A retrofit script backfills page_number on existing
   document_chunks without re-OCR — uses the stored
   documents.extracted_text plus PyMuPDF direct text reads as page
   anchors (linear interpolation for OCR-only pages).

Production state on cases 8174-24 + 8137-24: 419 page-image
embeddings, 819 chunks tagged with page_number, MULTIMODAL_ENABLED=true
in Coolify env, hybrid search verified A/B against text-only baseline.

The original stage C plan section is retained below for reference.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-05-03 20:16:13 +00:00
8a815ecff5 fix(retrieval): rewrite chunk-page retrofit to skip OCR
All checks were successful
Build & Deploy / build-and-deploy (push) Successful in 16s
The first-pass retrofit re-extracted via extractor.extract_text, which
re-runs Google Vision OCR on scanned pages. OCR is non-deterministic,
so the new text didn't match the chunk content stored in the DB
(produced by the original OCR run) — only ~7% of chunks were located.

New approach (no OCR cost):

1. Use the stored documents.extracted_text from the DB — the exact
   text the chunks were produced from, so chunk lookups match.
2. Anchor page boundaries via PyMuPDF direct text reads (free, no
   OCR). Pages with usable direct text are anchored by snippet match;
   OCR-only pages are linearly interpolated between anchors.
3. Search each chunk in extracted_text using a whitespace-tolerant
   helper — needed because the chunker joins paragraphs with single
   '\\n' while extracted_text uses '\\n\\n' as page separators.

Verified on 8174-24 (5 docs, 307 chunks) + 8137-24 (9 docs, 512
chunks): 100% chunks tagged, 13s total, $0 cost.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-05-03 20:04:33 +00:00
81ccf3a888 feat(retrieval): track page_number on text chunks for multimodal hybrid boost
All checks were successful
Build & Deploy / build-and-deploy (push) Successful in 6m33s
The legacy chunker did not track which PDF page each chunk came from.
Stored chunks had page_number=NULL, which blocked the multimodal
hybrid retriever's text+image boost — it joins (chunk, image) on
(document_id, page_number) and the join could never fire.

This change:

- extractor.extract_text now returns (text, page_count, page_offsets);
  page_offsets[i] is the start char offset of page (i+1) in the joined
  text. None for non-PDFs.
- chunker.chunk_document accepts an optional page_offsets and tags
  each chunk with the page that contains its first character (uses
  the existing chunker logic; pages assigned post-hoc by content
  search to keep the diff minimal).
- processor.process_document and precedent_library.ingest_precedent
  forward page_offsets through the chunker. New uploads now carry
  accurate page_number on every chunk.
- Other extract_text callers (tools/documents, tools/workflow,
  web/app.py) updated to unpack the third element (ignored).
- scripts/backfill_chunk_pages.py: per-case retrofit. Re-extracts each
  PDF (re-OCRs via Google Vision if needed, ~$0.0015/page), computes
  page_offsets, and updates page_number on every chunk by content
  search. Idempotent; --force re-runs on already-tagged docs.

Forward-only would leave the 419 image embeddings backfilled on
cases 8174-24 + 8137-24 unable to boost their corresponding text
chunks. The retrofit script closes that gap (cost ~$0.60).

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-05-03 19:49:41 +00:00
5724ed8e5b chore: nudge Actions to build c31fe08 (RRF)
All checks were successful
Build & Deploy / build-and-deploy (push) Successful in 6m24s
Previous push to main did not trigger a workflow run; act-runner
went silent after task 112. Empty commit to re-fire the webhook.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-05-03 19:42:37 +00:00
c31fe0866b fix(retrieval): switch hybrid merge to Reciprocal Rank Fusion (RRF)
Some checks are pending
Build & Deploy / build-and-deploy (push) Waiting to run
Cosine scores in voyage-3 (~0.4-0.5) and voyage-multimodal-3
(~0.2-0.25) live on different scales. The previous weighted-sum
merge let text always dominate — verified empirically: 0 image-only
hits across 7 queries on case 8174-24, image side contributed nothing.

RRF combines by *rank* in each list rather than raw score, robust
to scale differences. Per-item score:

    rrf_score = text_weight / (k + text_rank)
              + image_weight / (k + image_rank)

A row that appears in both lists (joined on (id_field, page_number))
gets both terms — surfaced as match_type='text+image'.

After fix on 8174-24 (146 image rows): 2 image-only hits land in
top-5 across all 7 test queries, surfacing actual table/diagram/
signature pages (p12, p13 of שומת המשיבה for 'טבלת השוואת ערכי שומה',
p25 of שומת השגה for 'תרשים גוש וחלקה', etc).

On 8137-24 (273 image rows): 'חישוב היוון של דמי החכירה' goes from
0 baseline results → 5 hybrid results (3 text + 2 image), opening
recall on scanned content the OCR layer misses.

Default MULTIMODAL_TEXT_WEIGHT 0.65 → 0.5 (vanilla RRF) since the
prior 0.65 was tuned for raw cosine scales that no longer apply.
New env knob MULTIMODAL_RRF_K (default 60, standard literature).

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-05-03 19:39:31 +00:00
242f668319 feat(retrieval): add voyage-multimodal-3 page-image embeddings (feature flag)
All checks were successful
Build & Deploy / build-and-deploy (push) Successful in 1m50s
Stage C: per-page image embeddings via voyage-multimodal-3 + hybrid
text+image search. Off by default; enable with MULTIMODAL_ENABLED=true.

- Schema V9: document_image_embeddings + precedent_image_embeddings
  (vector(1024), page_number, image_thumbnail_path)
- extractor.render_pages_for_multimodal renders PDF pages at
  MULTIMODAL_DPI (144) for embedding + JPEG thumbnails at
  MULTIMODAL_THUMB_DPI (96) for UI preview, in one pass
- embeddings.embed_images calls voyage-multimodal-3 in 50-page batches
- services/hybrid_search.py orchestrator: rerank applied to text side
  first (rerank-2 is text-only); image side cosine; weighted merge
  with text_weight 0.65 (env-tunable); image-only pages surface as
  match_type='image' so dense scanned content still appears
- processor.process_document and precedent_library.ingest_precedent
  gated by flag — non-fatal on multimodal failure
- scripts/multimodal_backfill.py — idempotent per-case CLI to embed
  existing documents without re-extracting text

Validated locally on a 5-page response brief: render 0.31s, embed 8.32s,
hybrid merge surfaces image rows correctly. Production rollout starts
with flag=false (no behavior change), then per-case A/B.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-05-03 19:24:52 +00:00
b9cdcf980d fix(precedents): translate practice_area slugs to Hebrew in halacha review
All checks were successful
Build & Deploy / build-and-deploy (push) Successful in 35s
The halacha-review panel was rendering raw slugs (`betterment_levy`,
`rishuy_uvniya`, `compensation_197`) as English badges. Pipe them through
the existing `practiceAreaLabel()` helper so the chair sees
"היטל השבחה", "רישוי ובניה", "פיצויים לפי ס' 197".

All other UI sites (library-list-panel, library-stats-panel,
precedent-edit-sheet) were already using the helper — this was the
sole place left rendering the raw slug.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-05-03 19:13:48 +00:00
36e464f668 fix(halachot): exclude embedding from update_halacha RETURNING
All checks were successful
Build & Deploy / build-and-deploy (push) Successful in 1m26s
PATCH /api/halachot/{id} was returning 500 because the row included
``embedding`` as a numpy.ndarray of np.float32, which FastAPI's
jsonable_encoder cannot serialize (vars() and dict() both fail on it).

The bug had been latent — it triggered for the first time today after
the auto-approve batch left only low-confidence halachot for the chair
to review manually, and her first PATCH hit the unserializable response.

Replace ``RETURNING *`` with an explicit column list (everything except
``embedding``). Callers that need the embedding can re-fetch via
``get_halacha`` — but no current caller does.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-05-03 19:04:46 +00:00
4d1924c7e6 feat(halachot): auto-approve high-confidence halachot at insert
All checks were successful
Build & Deploy / build-and-deploy (push) Successful in 1m29s
Halachot extracted by halacha_extractor with confidence >= 0.80 are now
inserted with review_status='approved' instead of 'pending_review' —
they appear in search_precedent_library immediately. Halachot below the
threshold still require manual chair approval.

Threshold tunable via env (HALACHA_AUTO_APPROVE_THRESHOLD), defaults to
0.80. Rationale: 89% of historical extractions (356/400) score 0.80+,
spot-checks confirmed quality, and the manual review backlog was the
single biggest reason rerank-2 was returning passages-only on
ההבחנה-style queries.

After this change + the one-time backfill UPDATE, search now returns
9/10 halachot for "ההבחנה בין השבחה לפיצויים" instead of 0 — and the
top-3 are exact-match rules, not adjacent passages.

Reviewer field records "auto-approved (confidence ≥ X.XX)" with the
threshold value at insert time, for traceability.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-05-03 19:01:03 +00:00
26c3fddf41 feat(retrieval): add voyage rerank-2 cross-encoder stage (feature flag)
All checks were successful
Build & Deploy / build-and-deploy (push) Successful in 1m29s
Stage B of voyage-upgrades-plan rewritten: instead of context-3 (which
4 POCs showed inconsistent improvement), add a cross-encoder rerank
layer on top of voyage-3. Default off (VOYAGE_RERANK_ENABLED=false).

POC validation (785-doc corpus, 12 queries, claude-haiku-4-5 judge):
- mean@3 +4.5% (4.306 → 4.500)
- practical-category queries +11.6% (3.78 → 4.22)
- latency +702ms per query
- no schema change, no re-embed, no double storage

Plumbing:
- config: VOYAGE_RERANK_ENABLED / _MODEL / _FETCH_K env vars
- embeddings.voyage_rerank() wraps voyageai client.rerank
- services/rerank.py: maybe_rerank() helper — fetches FETCH_K candidates
  via the bi-encoder then reranks to top-K. Fail-open if Voyage rerank is
  unavailable.
- tools/search.py: search_decisions, search_case_documents,
  find_similar_cases all wrapped
- services/precedent_library.search_library wrapped

Smoke-tested locally with flag on/off — produces expected behaviour and
latency profile. Ready for production rollout via Coolify env flip after
deploy.

POCs (kept under scripts/ for reference):
- voyage_context3_poc{_long}.py — context-3 evaluation (rejected)
- voyage_multimodal_poc.py — multimodal-3 (stage C, deferred)
- voyage_rerank_judge_poc.py — single-case rerank benchmark
- voyage_rerank_corpus_poc.py — full-corpus rerank validation

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-05-03 18:43:41 +00:00