Completes the write-side rewiring (INV-STG1) for the call-sites that run in
synchronous contexts, via a new blocking facade in storage.py
(put_bytes_sync / put_file_sync — asyncio.run, or a worker thread when a loop
is already running):
- services/extractor.py: multimodal thumbnail JPEGs → DERIVED (rendered in a
to_thread worker)
- services/docx_reviser.py: track-changes save (_save_docx_xml) + empty-diff
copy (copy_with_revisions) → DOCUMENTS
- services/docx_retrofit.py: in-place retrofit backup → DOCUMENTS
Each site keeps a fallback to a direct disk write when the target path is
outside DATA_DIR (caller-provided). Under the default STORAGE_BACKEND=
filesystem the bytes land exactly where they did before — zero behaviour
change.
Also: mcp_env_catalog MINIO_ENDPOINT default updated to the durable
container-name endpoint (http://minio-bx2ykvw94xbutsex41hz4vv8:9000), matching
the Coolify "Connect to Predefined Network" change made for network durability.
All binary write-sites now flow through storage.py. git-tracked text
(case.json/notes/research-md/draft-md) stays on disk by design (INV-STG7);
court-fetch temp files are ephemeral.
tests: +2 (thumbnail renderer routes through storage; put_bytes_sync
round-trip); 55 storage/docx/track-changes green; 244 collected, no import
breakage.
Keeps G2; completes INV-STG1 write coverage. Spec: docs/spec/X14-storage-minio.md.
Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
Continue the write-site rewiring onto the unified storage layer (INV-STG1):
- services/processor.py: extracted-text .txt → DERIVED bucket (a derived
artifact; the DB column is the source of truth per INV-STG5, so the write
stays non-fatal)
- services/docx_exporter.py (export_decision): DOCX → DOCUMENTS bucket via
BytesIO → put_bytes, with a fallback to a direct disk write when the caller
passes an output_path outside DATA_DIR
- services/analysis_docx_exporter.py (build_analysis_docx): same pattern;
out_path is always under DATA_DIR
Under the default STORAGE_BACKEND=filesystem the bytes land at the exact
legacy path (put_bytes → DATA_DIR/key), so behaviour is unchanged. The
disk-reading bits that must stay for now (export_dir glob in _next_version)
are kept; storage-native versioning is a cutover concern.
Still on disk (sync call-sites, follow-up Phase 2c): docx_reviser
(track-changes), docx_retrofit backup, and multimodal thumbnails (rendered in
a to_thread). git-tracked text (case.json/notes/research-md/draft-md) stays on
disk by design (INV-STG7).
tests: 38 storage + docx tests green (incl. test_export_qa_gate /
test_docx_exporter_bookmarks which exercise the real export path); 242
collected, no import breakage.
Keeps G2; advances INV-STG1. Spec: docs/spec/X14-storage-minio.md.
Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
Rewire the source-document staging writes onto the unified storage layer
(INV-STG1), replacing direct shutil.copy2 calls:
- tools/documents.py: case originals + training-corpus uploads
- services/ingest.py: _stage_file (now async) — covers precedent-library,
internal-decisions, and digests (the canonical intake helper)
- services/digest_library.py: awaits the now-async _stage_file
Each write goes through storage.put_file(..., bucket=DOCUMENTS) with the
DATA_DIR-relative key; the Hebrew original filename rides as object metadata
(INV-STG2), content-type is guessed from the extension. DB path columns are
unchanged (still the absolute dest) — object_key backfill is Phase 3.
Under the default STORAGE_BACKEND=filesystem the bytes land at the exact
legacy on-disk location (put_file → shutil.copy2 to DATA_DIR/key), so this
is zero behaviour change in prod. shutil import dropped where now unused.
tests: +2 staging regression tests (file lands under DATA_DIR at the legacy
path); 20 storage + 22 ingest tests green; 242 collected with no import
breakage.
Derived/export write sites (thumbnails, extracted text, DOCX exports) are
Phase 2b. Keeps G2; advances INV-STG1. Spec: docs/spec/X14-storage-minio.md.
Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
The single choke-point for all binary file I/O (originals, derived
artifacts, exports), replacing the scattered open()/shutil/Path.write_bytes
calls across ~8 services. Backend chosen by STORAGE_BACKEND:
- filesystem (default): disk under DATA_DIR — byte-for-byte legacy behaviour
- dual: write disk + S3, read S3→disk fallback (migration window)
- s3: MinIO via aioboto3 (lazy import; absent in the filesystem path)
Keys are DATA_DIR-relative POSIX paths; the FS backend ignores the logical
bucket and keeps the existing single tree, so the default backend is zero
behaviour change. S3 maps a governance bucket (documents/immutable/derived)
→ MinIO bucket; presigned URLs are minted against the public endpoint
(browser-reachable) and carry the Hebrew filename via RFC-5987
Content-Disposition.
- config: STORAGE_BACKEND + MINIO_* (endpoint, public-endpoint, creds,
region, 3 bucket names, presign TTL)
- mcp_env_catalog: new "storage" category + 10 specs (X10/INV-ENV1)
- pyproject: aioboto3>=13 (consumed here, deployed with first use)
- tests: 18 unit tests (FS round-trip, key normalization/traversal guard,
bucket resolution, backend selection, dual write-both + S3-down fallback)
No call-sites are rewired yet — that is Phase 2 (106.3). STORAGE_BACKEND
stays filesystem in prod, so behaviour is unchanged.
Invariants: keeps G2 (one storage path replaces scattered I/O); establishes
INV-STG1 (single layer), INV-STG2 (atomic keys, Hebrew name in metadata),
INV-STG3 (governance buckets), INV-STG6 (presigned serving).
Spec: docs/spec/X14-storage-minio.md.
Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
A single live page for all the background work that downloads/analyses, so the
chair can see what's running instead of guessing.
- court_fetch_service: GET /pm2 (unauthenticated, host-only) → trimmed pm2 jlist
for the legal-* services (status, restarts, mem, cron schedule).
- FastAPI GET /api/operations: aggregates the DB-backed pipelines (court_fetch
jobs, metadata + halacha extraction queues, halacha review gate,
missing_precedents, digests, recent court ingests) and proxies the host /pm2
over the docker bridge (graceful if the host service is down).
- web-ui /operations page (+ src/lib/api/operations.ts hook, nav entry under
admin): services grid (with Hebrew labels + schedules) + pipeline cards +
recent-fetch / recent-ingest lists. Auto-refreshes every 5s.
tsc --noEmit clean; pm2 status carries nothing sensitive and the bind
(10.0.1.1) is host/container-only, so /pm2 needs no secret.
Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
העלון החודשי "עו"ד על נדל"ן" הוא פרסום נפרד מהיומון היומי (חודשי, רב-נושאי).
לפני תכנון הקטלוג — נוריד את כל הארכיון (~29) לתיקייה. endpoint זה רק מ-stage
את ה-PDF ל-data/bulletins/incoming (ללא DB), dedup לפי content_hash. n8n ימשוך
מ-chaim.marcus@gmail (subject "עו"ד על נדל"ן") וישלח לכאן.
Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
שורת הניווט הצטמצמה מ-11 קישורים ישירים ל-4 קישורי-עבודה
(בית · מרכז אישורים · הערות יו״ר · ארכיון) + 2 תפריטים נפתחים:
- "פסיקה ▾": ספריית פסיקה · יומונים · פסיקה חסרה · —ניתוח וכיול— ·
מפת הקורפוס · מדגם-זהב
- "סגנון ▾": אימון סגנון · מתודולוגיה
מפת-הקורפוס, מדגם-זהב ומתודולוגיה הורדו-בדרגה מהשורה הראשית לתוך
התפריטים (לפי בקשת היו"ר) — אך כל ה-routes נשמרים, אין שינוי URL.
trigger התפריט מקבל הדגשה + קו-זהב תחתון כשאחד מילדיו פעיל;
badge "פסיקה חסרה" מוצג גם על trigger "פסיקה" וגם בתוך הפריט.
Invariants: מקיים G2 (איחוד מסלולי-ניווט, ללא יצירת מסלול מקביל —
כל הדפים נותרים נגישים, deep-links נשמרים).
Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
The 211 open missing_precedents include 99 Supreme serial-format rulings
(בג"ץ/בר"מ/עע"מ NNNN/YY) with no נט-format triple — fetchable only from
supremedecisions.court.gov.il. Decoded its public JSON API (no browser, no
CAPTCHA, no smart-card); validated live on בג"ץ 3483/05 + בר"מ 10212/16.
- court_fetch_supreme.py: rewrite. POST Home/SearchVerdicts with a structured
`document` ({Year:"YYYY", CaseNum, OldMainNumFormat:true, SearchText:[…]}) +
X-Requested-With header → records; GET Home/Download?path=&fileName=&type=4 →
PDF. The earlier attempt failed only on the request shape (string vs object).
2-digit→4-digit year; try candidate docs best-first (פסק-דין→pages), skipping
the published-report 's'-prefix files the free endpoint WAF-blocks.
- orchestrator: on successful ingest, close matching open missing_precedents
(link to the new case_law). End-to-end validated (בר"מ 10212/16 → corpus).
- backfill_missing_precedents.py: enqueue fetchable open gaps (supreme + net)
into court_fetch_jobs; the drainer fetches+ingests+closes. dry-run default.
- X13 spec + SCRIPTS.md updated (Tier-0 decoded, no longer a limitation).
Very old un-digitized Supreme cases (e.g. בג"ץ 389/87 → 0 records) → manual.
Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
Adds docs/spec/X14-storage-minio.md — the domain spec + phased plan for
migrating binary document storage from the local data/ tree to the
already-deployed MinIO service (Coolify svc `minio`).
Captures: disk inventory, scattered file-I/O map (~8 services, no central
layer), DB path columns, MinIO deploy state, Paperclip = API-consumer only.
Defines 7 domain invariants (INV-STG1..7) and a 7-phase execution plan.
Chair decisions (2026-06-08): git-per-case keeps text/metadata + MinIO holds
binaries (INV-STG7); WORM Object-Lock on FINAL decisions only (INV-STG4);
internal Docker network for legal-ai↔MinIO.
Invariants: keeps G2 (single storage path replaces scattered I/O);
INV-STG1..7 new. Spec-only PR — no code/behavior change.
Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
The Gemini key is stored in Infisical as GOOGLE_GEMINI_API_KEY
(nautilus /external-apis/gemini). Align the panel to read that canonical name
first, falling back to bare GEMINI_API_KEY for back-compat — so an
Infisical→.env sync keeps working.
Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
משלים את #141 בצד-לקוח: שדה digest_kind ב-Digest type (hand-written), ותג
"עדכון" ב-DigestCard לגיליונות announcement (לא-הכרעות). decision = ברירת-מחדל
ללא תג. זורם דרך /api/digests (digest_kind כבר ב-_DIGEST_COLS).
build (webpack) עובר, lint נקי בקבצי digests.
Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
The halacha extraction queue was stuck (same class as the metadata issue): 26
precedents requested extraction with no drainer, plus 1 orphaned in 'processing'
(status=processing, requested_at cleared → never re-picked by the queue).
- db.requeue_stale_processing_extractions(kind): re-stamp orphaned 'processing'
rows (requested_at IS NULL) so they re-drain; halacha extractor force=False
resumes from chunk checkpoints (no duplicates).
- process_pending_extractions calls it at the top — fully unattended, safe under
the global advisory lock. Mirrors the digests-drain self-heal.
- legal-halacha-drain.config.cjs: pm2 cron (every 2h, conservative — Claude is
slow/rate-limited and each run adds to the chair's pending_review queue).
drain_halacha_queue.py stays on claude_session (high reasoning quality for
holding/ratio; NOT moved to Gemini). SCRIPTS.md.
The chair-approval gate (INV-G10) is untouched — this only produces halachot;
Daphna still approves each in /approvals.
Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
Records the now-complete corpus citation graph: why native not Obsidian (G2),
the 6 opt-in node layers (precedent/topic/practice-area · halacha · gaps ·
digests), node size/color semantics, the Graph Analysis metrics
(PageRank/betweenness/community via web/graph_metrics.py), navigation, the
/api/graph/* endpoints, the key files, a how-to-extend recipe, the invariants
(G2/G5/UI2/UI4), and the PR history.
Adds docs/corpus-graph.md + a reference-table row in legal-ai/CLAUDE.md.
Docs only — no code change.
Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
`npm run api:types` — brings the generated src/lib/api/types.ts up to date
with the live FastAPI schema (UI1: types derive from the OpenAPI SSoT). The
file had drifted; this regen captures the corpus-graph endpoints/models
(/api/graph/corpus, /api/graph/facets, /api/graph/node/{id}/neighborhood;
CorpusGraph / GraphNode / GraphFacets) plus accumulated changes from other
merged work. web-ui build passes against the regenerated types.
Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
The /precedents metadata queue was stuck — 24 rows requested, nothing draining
them — and the agentic claude CLI hit error_max_turns on what is a single
structured text→JSON task (slow + flaky). Metadata extraction is bounded
extraction, the wrong fit for an agentic loop.
- gemini_session.py: query_json drop-in (gemini-2.5-flash, JSON mode, httpx —
no new SDK dep). Reads GEMINI_API_KEY (~/.env; SoT Infisical
nautilus:/external-apis/gemini). Host-side only — no LLM from the container.
- precedent_metadata_extractor: claude_session.query_json → gemini_session.
Validated live: rich, accurate fields (case_name/summary/appeal_subtype/tags).
- process_pending_extractions: kind-aware cooldown — metadata 2s (Gemini, fast),
halacha keeps 30s (Claude rate limits).
- drain_metadata_queue.py + legal-metadata-drain.config.cjs (pm2 cron */15) so
the queue never clogs again. SCRIPTS.md.
- X8 INV-FP5 updated: per-task engine choice (Gemini=bounded metadata,
claude_session=agentic halacha), both host-side, single canonical queue (G2).
Agentic/voice-sensitive work (writing, analysis, halacha) stays on claude_session
(Daphna's subscription). Gemini cost ≈ $0.10/1M tokens — negligible.
Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
Enables the previously-disabled "הלכות" toggle. Each approved/published halacha
of a displayed precedent becomes a hal:<id> node linked to its parent
precedent (extracted_from); two cross-rule edges when both endpoints are in
view: corroborates (a later ruling cites the rule —
halacha_citation_corroboration) and equivalent (same principle from another
committee — equivalent_halachot). Node size = corroboration in-degree.
Backend (web/graph_api.py — read-only, G2):
- _halacha_nodes_and_edges(): halachot WHERE case_law_id in view AND
review_status IN (approved, published), LIMIT 600; rule_type carried in the
source_kind slot, rule_statement in note. Wired into both build functions
(gated via node_types). Metrics still exclude halacha edges (only cites/
precedent-typed feed PageRank). Validated: 185 halachot on the top-30
precedents; 20 corroboration + 5 equivalent edges in the corpus.
Frontend:
- graph.ts: GraphEdgeType += extracted_from.
- graph-filter-panel: "הלכות" toggle enabled (was disabled "שלב ב׳").
- graph-canvas: amber halacha nodes; edge colours — extracted_from (faint
amber), corroborates (amber), equivalent (violet).
- graph-node-panel: halacha branch — אזכורים + סוג כלל + rule text; "open in
library" deep-links to the parent precedent.
- graph-view: halacha added to node + edge legends.
web-ui build + lint pass. Invariants: G2 (SELECT-only), UI2 (no model change —
reuses note/source_kind/case_law_id slots).
Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
Periodic safety net for the multi-judge approval panel: samples panel-approved
halachot, re-runs the same 3-judge KEEP vote, and surfaces any that now lean
DROP — candidate false-keeps a human should glance at. Report-only by default;
--flag reopens flips to pending_review. Baseline 0/15 on the 2026-06-07 batch.
Closes the loop the literature prescribes (Trust-or-Escalate / selective
prediction): monitor the auto-decision error rate rather than trusting it
blindly. Reuses halacha_panel_approve's judges (single source of truth).
Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
אותו יומון יכול להגיע כשני PDF שונים (re-send/forward → בייטים שונים →
content_hash dedup מפספס), אבל yomon_number ייחודי → ה-update ב-enrich מתנגש
על uq_digests_yomon_number. עכשיו enrich תופס את ההתנגשות, מוחק את השורה
הכפולה (היומון כבר קיים), ומחזיר status='duplicate' — כך ה-cron לא מנסה אותה
שוב ושוב. סוגר לולאת-retry אינסופית פוטנציאלית במערכת הלא-מאוישת.
Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
Final corpus-graph PR. Connects the graph to the chair's workflow and rounds
out the Obsidian-grade interactions.
Backend (web/graph_api.py): neighborhood depth cap 2 → 3 (still bounded by
NODE_CAP_MAX).
Frontend:
- URL deep-link: /graph?focus=cl:<id> is read on mount and written on focus
change (router.replace, scroll:false). GraphView wrapped in <Suspense> per
Next 16's useSearchParams requirement.
- "הצג בגרף" button on the precedent detail page → /graph?focus=cl:<id>.
- Depth slider (1–3) in the focused overlay → useNodeNeighborhood(id, depth).
- Export PNG: grabs the rendered <canvas> from the area ref → toDataURL →
download; failures surface a toast (UI4).
- Rich node panel: precedent nodes fetch headnote/summary via the existing
usePrecedent hook (Skeleton while pending, error surfaced — UI4).
- Edge-type legend (ציטוט / נושא-תחום / יומון) added under the node legend.
Deferred (noted for a later pass): expand-in-place merge, search→camera-center.
web-ui build + lint pass. Invariants: G2 (depth change is read-only), UI4
(PNG + detail errors surfaced, not swallowed). api:types post-deploy.
Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
drain_digests רץ תחת flock (drainer יחיד), אז כל שורה 'processing' בתחילת ריצה
היא שריד מריצה קודמת שנקטעה באמצע-שורה (סשן/מכסה). מאפסים אותה ל-'pending'
לריצה חוזרת — סוגר את הפער האחרון ל-resume אוטומטי מלא ללא התערבות.
Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
Chaim's idea: surface the downloaded "כל יום" digests in the graph. Each digest
COVERS the ruling it analyses — a corpus precedent when we have it (16), or a
synthesized gap node from its underlying_citation when we don't (269). So the
digest layer doubles as a discovery signal: it makes visible that the daily
feed overwhelmingly covers rulings NOT yet in the corpus.
Backend (web/graph_api.py — read-only, G2):
- "digest" added to VALID_NODE_TYPES (off by default).
- _digest_nodes_and_edges(): dig:<id> nodes from completed digests, `covers`
edge → cl:precedent (linked_case_law_id in view) or → gap:<underlying_citation>
(synthesized, deduped against the gap layer — real in-degree wins). Carries
concept_tag (label), headline_holding (note), underlying_court/date.
- _add_digests() appends the layer with gap dedup. Wired into both build
functions. GraphNode += note, digest_id. Gated via node_types (no app.py
change). Validated: 16 covers→precedent, 269 covers→gap.
Frontend:
- graph.ts: GraphNodeType += "digest"; GraphEdgeType += "covers"; node fields.
- graph-filter-panel: toggle "יומונים (כל יום)" (off by default).
- graph-canvas: digest = teal node (r=4); `covers` edges teal.
- graph-node-panel: digest branch — concept + holding + court/date + link to
/digests.
web-ui build + lint pass. Invariants: G2 (SELECT-only), UI2. api:types post-deploy.
Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>