תיקון data-loss: reset_halacha_extraction ביצע DELETE ללא-תנאי לפני חילוץ-מחדש;
קריסה בין המחיקה לאחסון הראשון מחקה את כל אישורי-היו"ר והשאירה את הרשומה תקועה
status='processing' עם 0 שורות (תקרית עמיאל 8126-03-25, 2026-06-08).
עכשיו המחיקה מחריגה review_status IN ('approved','published') — אישור אנושי לא
נמחק בשקט (INV-G10). ה-dedup-on-insert של store_halachot_for_chunk מדלג על חילוץ
טרי שמשכפל מאושרת שנשמרה, כך שאין כפילות. reset מחזיר {deleted, preserved},
וה-extractor מתעד כמה מאושרות נשמרו (provenance, G9).
עמידות מלאה מול מוות-תהליך (OOM) נשארת ל-X16/#114 (durable resume) — זה תנאי-מקדים.
בדיקה: test_halacha_reextract_preserves_approved.py (offline SQL-capture) מאמת
שה-DELETE מחריג approved/published; 64 בדיקות-הלכה קיימות עוברות.
Invariants: G10 (שער-יו"ר — אישור לא נמחק), G1 (תיקון במקור), G9 (provenance).
Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
הקישור טיעון↔פרופוזיציות כבר נשמר ב-DB (legal_argument_propositions),
אך ה-UI הציג רק את המספר. מעשיר את get_legal_arguments באותו round-trip
(JOIN ל-claims) להחזיר supporting_propositions = {id, text, source_document},
ועוטף את שורת "מסתמך על N פרופוזיציות" ב-Popover שמציג את הטענות הגולמיות
verbatim עם מקור. שקיפות ועקיבוּת מהטיעון המאוגד חזרה לטענות-המקור.
- supporting_claims נשאר id-only (תאימות לאחור: מונה, צרכני MCP)
- supporting_propositions שדה חדש אופציונלי; fallback לטקסט סטטי כשחסר
- אין מסלול מקביל (G2) — העשרה של אותו endpoint; נרמול-במקור (G1)
Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
Restarting/stopping legal-court-fetch-service from its own /pm2/control kills
the process before it can reply — the client got a misleading 502 even though
pm2 performed the restart. Detach the self-action (sleep 1; pm2 ...) so the HTTP
response flushes first, and report success optimistically. Other targets are
unchanged. Own name via COURT_FETCH_SERVICE_PM2_NAME (default legal-court-fetch-service).
Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
הדף הציג את התורים באופן לא-אחיד (by_status גולמי), בלי הבחנה בין "ממתין"
(בקלוג: status=pending) ל"בתור" (התור הפעיל: requested_at IS NOT NULL), בלי
הצגת הפריט שרץ כרגע, ובלי שום שליטה בתהליכים.
מה נוסף:
1. כרטיסי-תור אחידים — בתור / ממתין(בקלוג) / בעיבוד / הושלם / נכשל + "רץ עכשיו"
(citation/case_number של הפריט בעיבוד) לכל drain (אחזור-פסיקה, מטא-דאטה,
הלכות, יומונים). שערי-אנוש (אישור-הלכות, פסיקה-חסרה) נשארים מוני-סטטוס.
2. פאנל ניהול-תהליכים בסגנון "שירותי Windows":
- דמון (court-fetch-service/xvfb/chat/reaper): הפעל-מחדש / עצור / הפעל.
- cron drain: "הרץ עכשיו" (pm2 restart) + מתג הפעל/כבה תזמון.
3. כל תגי-הסטטוס מתורגמים לעברית.
מנגנון:
- הפעל/כבה תזמון = דגל ב-DB (טבלה drain_controls). pm2 cron_restart מחיה תהליך
שעוצר ב-stop, לכן ה"כיבוי" האמין הוא דגל שכל drain בודק ב-startup (no-op מיידי
כשכבוי). הקונטיינר כותב/קורא ישירות מ-DB.
- הרץ-עכשיו + restart/stop/start = proxy ל-pm2 דרך endpoint חדש בגשר-המארח
(court_fetch_service /pm2/control), מאובטח Bearer + whitelist ל-legal-* בלבד.
- יומונים: drain_digests הועבר מ-crontab ל-pm2 (legal-digest-drain.config.cjs)
כדי שיופיע ויהיה שליט כמו כל drain. drain_halacha_queue.py הובא לבקרת-גרסאות.
Invariants: מקיים G2 (הרחבת /operations + הגשר הקיים, לא מסלול מקביל) ו-G1
(drain_controls = מקור-אמת יחיד לכיבוי, נורמליזציה במקור ולא תיקון-בקריאה).
אין בליעת שגיאות שקטה (הגשר מחזיר {ok,error}; המוטציות מציגות toast).
Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
Completes the write-side rewiring (INV-STG1) for the call-sites that run in
synchronous contexts, via a new blocking facade in storage.py
(put_bytes_sync / put_file_sync — asyncio.run, or a worker thread when a loop
is already running):
- services/extractor.py: multimodal thumbnail JPEGs → DERIVED (rendered in a
to_thread worker)
- services/docx_reviser.py: track-changes save (_save_docx_xml) + empty-diff
copy (copy_with_revisions) → DOCUMENTS
- services/docx_retrofit.py: in-place retrofit backup → DOCUMENTS
Each site keeps a fallback to a direct disk write when the target path is
outside DATA_DIR (caller-provided). Under the default STORAGE_BACKEND=
filesystem the bytes land exactly where they did before — zero behaviour
change.
Also: mcp_env_catalog MINIO_ENDPOINT default updated to the durable
container-name endpoint (http://minio-bx2ykvw94xbutsex41hz4vv8:9000), matching
the Coolify "Connect to Predefined Network" change made for network durability.
All binary write-sites now flow through storage.py. git-tracked text
(case.json/notes/research-md/draft-md) stays on disk by design (INV-STG7);
court-fetch temp files are ephemeral.
tests: +2 (thumbnail renderer routes through storage; put_bytes_sync
round-trip); 55 storage/docx/track-changes green; 244 collected, no import
breakage.
Keeps G2; completes INV-STG1 write coverage. Spec: docs/spec/X14-storage-minio.md.
Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
Continue the write-site rewiring onto the unified storage layer (INV-STG1):
- services/processor.py: extracted-text .txt → DERIVED bucket (a derived
artifact; the DB column is the source of truth per INV-STG5, so the write
stays non-fatal)
- services/docx_exporter.py (export_decision): DOCX → DOCUMENTS bucket via
BytesIO → put_bytes, with a fallback to a direct disk write when the caller
passes an output_path outside DATA_DIR
- services/analysis_docx_exporter.py (build_analysis_docx): same pattern;
out_path is always under DATA_DIR
Under the default STORAGE_BACKEND=filesystem the bytes land at the exact
legacy path (put_bytes → DATA_DIR/key), so behaviour is unchanged. The
disk-reading bits that must stay for now (export_dir glob in _next_version)
are kept; storage-native versioning is a cutover concern.
Still on disk (sync call-sites, follow-up Phase 2c): docx_reviser
(track-changes), docx_retrofit backup, and multimodal thumbnails (rendered in
a to_thread). git-tracked text (case.json/notes/research-md/draft-md) stays on
disk by design (INV-STG7).
tests: 38 storage + docx tests green (incl. test_export_qa_gate /
test_docx_exporter_bookmarks which exercise the real export path); 242
collected, no import breakage.
Keeps G2; advances INV-STG1. Spec: docs/spec/X14-storage-minio.md.
Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
Rewire the source-document staging writes onto the unified storage layer
(INV-STG1), replacing direct shutil.copy2 calls:
- tools/documents.py: case originals + training-corpus uploads
- services/ingest.py: _stage_file (now async) — covers precedent-library,
internal-decisions, and digests (the canonical intake helper)
- services/digest_library.py: awaits the now-async _stage_file
Each write goes through storage.put_file(..., bucket=DOCUMENTS) with the
DATA_DIR-relative key; the Hebrew original filename rides as object metadata
(INV-STG2), content-type is guessed from the extension. DB path columns are
unchanged (still the absolute dest) — object_key backfill is Phase 3.
Under the default STORAGE_BACKEND=filesystem the bytes land at the exact
legacy on-disk location (put_file → shutil.copy2 to DATA_DIR/key), so this
is zero behaviour change in prod. shutil import dropped where now unused.
tests: +2 staging regression tests (file lands under DATA_DIR at the legacy
path); 20 storage + 22 ingest tests green; 242 collected with no import
breakage.
Derived/export write sites (thumbnails, extracted text, DOCX exports) are
Phase 2b. Keeps G2; advances INV-STG1. Spec: docs/spec/X14-storage-minio.md.
Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
The single choke-point for all binary file I/O (originals, derived
artifacts, exports), replacing the scattered open()/shutil/Path.write_bytes
calls across ~8 services. Backend chosen by STORAGE_BACKEND:
- filesystem (default): disk under DATA_DIR — byte-for-byte legacy behaviour
- dual: write disk + S3, read S3→disk fallback (migration window)
- s3: MinIO via aioboto3 (lazy import; absent in the filesystem path)
Keys are DATA_DIR-relative POSIX paths; the FS backend ignores the logical
bucket and keeps the existing single tree, so the default backend is zero
behaviour change. S3 maps a governance bucket (documents/immutable/derived)
→ MinIO bucket; presigned URLs are minted against the public endpoint
(browser-reachable) and carry the Hebrew filename via RFC-5987
Content-Disposition.
- config: STORAGE_BACKEND + MINIO_* (endpoint, public-endpoint, creds,
region, 3 bucket names, presign TTL)
- mcp_env_catalog: new "storage" category + 10 specs (X10/INV-ENV1)
- pyproject: aioboto3>=13 (consumed here, deployed with first use)
- tests: 18 unit tests (FS round-trip, key normalization/traversal guard,
bucket resolution, backend selection, dual write-both + S3-down fallback)
No call-sites are rewired yet — that is Phase 2 (106.3). STORAGE_BACKEND
stays filesystem in prod, so behaviour is unchanged.
Invariants: keeps G2 (one storage path replaces scattered I/O); establishes
INV-STG1 (single layer), INV-STG2 (atomic keys, Hebrew name in metadata),
INV-STG3 (governance buckets), INV-STG6 (presigned serving).
Spec: docs/spec/X14-storage-minio.md.
Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
A single live page for all the background work that downloads/analyses, so the
chair can see what's running instead of guessing.
- court_fetch_service: GET /pm2 (unauthenticated, host-only) → trimmed pm2 jlist
for the legal-* services (status, restarts, mem, cron schedule).
- FastAPI GET /api/operations: aggregates the DB-backed pipelines (court_fetch
jobs, metadata + halacha extraction queues, halacha review gate,
missing_precedents, digests, recent court ingests) and proxies the host /pm2
over the docker bridge (graceful if the host service is down).
- web-ui /operations page (+ src/lib/api/operations.ts hook, nav entry under
admin): services grid (with Hebrew labels + schedules) + pipeline cards +
recent-fetch / recent-ingest lists. Auto-refreshes every 5s.
tsc --noEmit clean; pm2 status carries nothing sensitive and the bind
(10.0.1.1) is host/container-only, so /pm2 needs no secret.
Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
The 211 open missing_precedents include 99 Supreme serial-format rulings
(בג"ץ/בר"מ/עע"מ NNNN/YY) with no נט-format triple — fetchable only from
supremedecisions.court.gov.il. Decoded its public JSON API (no browser, no
CAPTCHA, no smart-card); validated live on בג"ץ 3483/05 + בר"מ 10212/16.
- court_fetch_supreme.py: rewrite. POST Home/SearchVerdicts with a structured
`document` ({Year:"YYYY", CaseNum, OldMainNumFormat:true, SearchText:[…]}) +
X-Requested-With header → records; GET Home/Download?path=&fileName=&type=4 →
PDF. The earlier attempt failed only on the request shape (string vs object).
2-digit→4-digit year; try candidate docs best-first (פסק-דין→pages), skipping
the published-report 's'-prefix files the free endpoint WAF-blocks.
- orchestrator: on successful ingest, close matching open missing_precedents
(link to the new case_law). End-to-end validated (בר"מ 10212/16 → corpus).
- backfill_missing_precedents.py: enqueue fetchable open gaps (supreme + net)
into court_fetch_jobs; the drainer fetches+ingests+closes. dry-run default.
- X13 spec + SCRIPTS.md updated (Tier-0 decoded, no longer a limitation).
Very old un-digitized Supreme cases (e.g. בג"ץ 389/87 → 0 records) → manual.
Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
The halacha extraction queue was stuck (same class as the metadata issue): 26
precedents requested extraction with no drainer, plus 1 orphaned in 'processing'
(status=processing, requested_at cleared → never re-picked by the queue).
- db.requeue_stale_processing_extractions(kind): re-stamp orphaned 'processing'
rows (requested_at IS NULL) so they re-drain; halacha extractor force=False
resumes from chunk checkpoints (no duplicates).
- process_pending_extractions calls it at the top — fully unattended, safe under
the global advisory lock. Mirrors the digests-drain self-heal.
- legal-halacha-drain.config.cjs: pm2 cron (every 2h, conservative — Claude is
slow/rate-limited and each run adds to the chair's pending_review queue).
drain_halacha_queue.py stays on claude_session (high reasoning quality for
holding/ratio; NOT moved to Gemini). SCRIPTS.md.
The chair-approval gate (INV-G10) is untouched — this only produces halachot;
Daphna still approves each in /approvals.
Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
The /precedents metadata queue was stuck — 24 rows requested, nothing draining
them — and the agentic claude CLI hit error_max_turns on what is a single
structured text→JSON task (slow + flaky). Metadata extraction is bounded
extraction, the wrong fit for an agentic loop.
- gemini_session.py: query_json drop-in (gemini-2.5-flash, JSON mode, httpx —
no new SDK dep). Reads GEMINI_API_KEY (~/.env; SoT Infisical
nautilus:/external-apis/gemini). Host-side only — no LLM from the container.
- precedent_metadata_extractor: claude_session.query_json → gemini_session.
Validated live: rich, accurate fields (case_name/summary/appeal_subtype/tags).
- process_pending_extractions: kind-aware cooldown — metadata 2s (Gemini, fast),
halacha keeps 30s (Claude rate limits).
- drain_metadata_queue.py + legal-metadata-drain.config.cjs (pm2 cron */15) so
the queue never clogs again. SCRIPTS.md.
- X8 INV-FP5 updated: per-task engine choice (Gemini=bounded metadata,
claude_session=agentic halacha), both host-side, single canonical queue (G2).
Agentic/voice-sensitive work (writing, analysis, halacha) stays on claude_session
(Daphna's subscription). Gemini cost ≈ $0.10/1M tokens — negligible.
Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
אותו יומון יכול להגיע כשני PDF שונים (re-send/forward → בייטים שונים →
content_hash dedup מפספס), אבל yomon_number ייחודי → ה-update ב-enrich מתנגש
על uq_digests_yomon_number. עכשיו enrich תופס את ההתנגשות, מוחק את השורה
הכפולה (היומון כבר קיים), ומחזיר status='duplicate' — כך ה-cron לא מנסה אותה
שוב ושוב. סוגר לולאת-retry אינסופית פוטנציאלית במערכת הלא-מאוישת.
Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
Live drain surfaced three issues:
1. Tier-0 needed `h2` (httpx http2) — added to the court-fetch extra.
2. Supreme cases that carry a נט-format number (e.g. בר"מ 72182-06-25) were
routed to the unvalidated Tier-0 and failed, even though נט המשפט serves
Supreme cases too. classify() now parses the file-month-year triple for
Supreme prefixes; the orchestrator routes by triple-availability:
נט-format present → Tier-1 (validated, all courts)
serial-only Supreme (עע"מ 5886/24) → Tier-0
neither → clear "no public route" failure
Validated live: בר"מ 72182-06-25 fetched via Tier-1 (5-page PDF).
3. A non-`RuntimeError` fetch exception (the h2 import error) left jobs stuck
in 'running'. The fetch block now catches any Exception → _record_failure
(INV-CF2/CF3), so a job always reaches a terminal state.
+ test_supreme_with_net_format_triple. Suite 11/11.
Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
- scripts/drain_court_fetch.py: drives orchestrator.drain_pending (host-only;
no-op when queue empty). Mirrors drain_halacha_queue.py.
- scripts/legal-court-fetch-drain.config.cjs: pm2 cron (hourly :17, one-shot),
COURT_FETCH_DRAIN_CRON override.
- fix: orchestrator default service URL 127.0.0.1 → 10.0.1.1 (the service binds
the docker0 gateway; the host can't reach it on loopback). Found live — the
first drain failed "connection refused" until corrected.
- SCRIPTS.md entries.
Validated end-to-end in PRODUCTION on a real digest: עת"מ 43830-12-24
(החברה להגנת הטבע) fetched from נט המשפט → case_law (79 chunks, source_url),
digest relinked (INV-DIG3 closed), halacha queued pending_review. job=done.
Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
חילוץ-המטא-דאטה של יומון הוא טקסט→JSON טהור, אבל ה-claude CLI רץ עם tools
זמינים, ו-Sonnet לפעמים פולט stop_reason=tool_use → פוגע ב---max-turns 1 →
error_max_turns → retry (איטי). מבזבז זמן רב בגיבוי-המוני.
- claude_session.query/query_json: פרמטר חדש `tools` → מועבר כ---tools.
"" = ביטוי כל ה-tools (אין tool_use → אין max-turns trip). None = ברירת-CLI.
- digest_metadata_extractor.extract: מעביר tools="".
אומת: extract על יומון 5160 ב-Sonnet+tools="" → num_turns=1, JSON תקין, ללא
error_max_turns. claude_session נשאר local-only.
Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
סוגר את הלולאה — יומון שמצביע על פס"ד בית-משפט שלא בקורפוס מזניק אחזור
אוטומטי, וקושר את היומון חזרה אחרי הקליטה (INV-DIG3 + INV-CF2).
- digest_library.try_autolink: בכשל-קישור, אם הציטוט מסווג כפס"ד-בימ"ש
(supreme/admin) → _enqueue_court_fetch יוצר court_fetch_jobs(pending);
ועדת-ערר (skip) לא מוזנק. never-raises (לא שובר קליטת-יומון).
- orchestrator.drain_pending(limit): מנקז pending/failed סדרתי (cooldown,
INV-CF4), fetch+ingest לכל אחד; בהצלחה מקשר את היומון ל-case_law שנקלט.
- כלי-MCP court_fetch_drain + רישום ב-server.py.
- X13 spec: עודכן (הפער ב-INV-CF2 סומן כמתוקן).
נבדק מול ה-DB: עת"מ 46111-12-22 → job tier=admin pending digest-linked;
ערר 1110/20 → לא מוזנק. כלי מקומי בלבד (ingest = claude CLI).
Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
חילוץ-המטא-דאטה של יומון (תג-מושג, כותרת-הלכה, מראה-מקום, תגיות מסיכום
עמוד-אחד) הוא משימה פשוטה בנפח גבוה — Sonnet הוא נקודת-האיזון מהירות/עלות,
בניגוד לחילוץ-הלכות שמצמיד Opus.
- config.DIGEST_EXTRACT_MODEL (env-tunable, ברירת-מחדל claude-sonnet-4-6).
- digest_metadata_extractor.extract(model=None) → ברירת-מחדל מה-config; קודם
לא צוין model → רץ על ברירת-המחדל של ה-CLI (Opus 4.8).
אומת: extract על יומון 5163 עם Sonnet החזיר תג-מושג/כותרת/מראה-מקום/תחום/
תגיות תקינים (~36s). claude_session נשאר local-only.
Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
שלוש שכבות-הגנה נגד דליפת-זיכרון מדפדפנים יתומים, + טיפול בדליפה הגדולה
בפועל בשרת (task-master-mcp).
- camofox_client.py:
- asyncio.wait_for קשיח סביב כל ה-fetch (COURT_FETCH_HARD_TIMEOUT_S=180ש')
— hang → ביטול → async-with tear-down → reap.
- _reap_orphan_browsers(): הורג camoufox-bin יתומים (ppid=1) לפני ואחרי כל
fetch. סדרתיות (INV-CF4) → כל ppid=1 הוא שארית בטוחה.
- scripts/reap_orphan_procs.py: reaper כללי ל-task-master-mcp (~3GB יתומים)
+ camoufox-bin. רק ppid=1; /proc טהור. --dry-run / --loop N.
- scripts/legal-reaper.config.cjs: דמון pm2 (loop 180s, max_memory_restart 100M).
- X13 spec + SCRIPTS.md: תיעוד שכבות-ההגנה.
max_memory_restart בשירות (1.5G) כבר נותן רשת-ביטחון ברמת-התהליך.
Invariants: מקיים INV-CF4 (politeness/serial) — ללא שינוי חוזה.
Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
The extractor classified rule_type by SOURCE bindingness (higher-court→binding,
committee→persuasive) instead of by rule KIND. The gold-set proved it: 'binding'
appeared on 19/19 external rulings & 0 committees; 'persuasive' on 13/13
committees & 0 external — only 58% agreement with the human role tags. The two
axes (authority vs rule role) were crammed into one enum.
This splits them per INV-DM7:
- authority (binding/persuasive) — DERIVED from case_law.precedent_level
(עליון/מנהלי→binding, ועדת_ערר_מחוזית→persuasive), never stored, never
LLM-guessed. New helper halacha_quality.derive_authority; surfaced read-only
in list_halachot / goldset_list / search results.
- rule_type — now the rule ROLE only: holding/interpretive/procedural/
application/obiter. Both extractor prompts unified to this vocabulary;
_coerce_halacha no longer defaults rule_type from the source; legacy
binding→holding / persuasive→interpretive fold for safety.
UI: authority shown as a separate read-only badge (gold=מחייב / muted=משכנע)
across the review queue, precedent detail, and gold-set; the gold-set role
selector drops binding/persuasive and adds מהותי (holding).
Migration: scripts/halacha_rule_role_backfill.py re-classifies the 276 pre-split
binding/persuasive rows into a genuine role via local claude_session (run after
deploy). Gold-set correct_type/ai_correct_type 'binding'→'holding' via SQL.
Sources (≥3, per research-decision policy): OASIS LegalRuleML v1.0
(appliesAuthority/Strength as metadata orthogonal to rule logic) · SemEval-2023
Task 6 LegalEval (rhetorical roles by function, authority kept separate) ·
Bluebook signals (weight-of-authority is a separate dimension).
Invariants: ESTABLISHES INV-DM7. Upholds G1 (normalize at source — extractor
classifies role, system derives authority) and G2 (single source of truth —
authority derived, not a parallel stored field). Tests: 211 pass + new
derive_authority/coerce coverage. web-ui build + tsc clean.
Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
The chair wanted an independent recommendation beside each tag, to reconsider
his own judgments. Adds a NON-ground-truth AI second-opinion:
- schema: halacha_goldset.ai_is_holding / ai_correct_type / ai_rationale /
ai_generated_at (additive).
- db.goldset_set_ai_recommendation + goldset_list now returns the ai_* fields.
- scripts/goldset_ai_recommend.py — local claude_session judges is_holding +
type + a one-line rationale per item, INDEPENDENTLY (own legal rubric).
Independent of the rule-based validators #81.8 measures → no circularity.
Never auto-applied; QA aid only.
- web-ui: each card shows "🤖 המלצת AI: הלכה/לא · type" + rationale and an
agreement/disagreement chip vs the human tag (amber on disagree); a
"⚠ אי-הסכמות AI (N)" filter to review only the conflicts.
Methodology note kept explicit: the human stays the ground truth; the AI is a
prompt to reconsider, not to copy.
Verified: tsc --noEmit 0; generator stores recs and flags disagreements with
existing human tags.
Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
Tagging is easier one source-type at a time. goldset_list now returns
case_law.source_type; the page adds:
- a filter (הכל / פסקי דין / ועדת ערר) with live counts,
- a group-sort so even in "הכל" all court rulings come first, then all
committee decisions,
- a per-card source badge (פסק-דין / ועדת ערר).
Verified: tsc --noEmit 0; source_type splits the live batch 58 court / 92 committee.
Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
Replaces the CSV-edit workflow with an in-app tagging page so the chair/Dafna
can label the extraction-quality gold-set by clicking, and see validator
precision/recall live.
Schema (V29): halacha_goldset — a stratified, human-tagged evaluation batch
(is_holding / correct_type / quote_complete, NULL until tagged).
db.py:
- goldset_create_sample (stratified round-robin over case×rule_type, idempotent),
- goldset_list (items + halacha content + the machine's own labels),
- goldset_tag (partial — one field at a time for keyboard tagging),
- goldset_score (ports the script's P/R/F1: each validator scored as a
not-a-holding detector against the human tags — the #81.8 input).
API: GET /api/goldset, POST /api/goldset/sample, GET /api/goldset/score,
PATCH /api/goldset/{id}.
web-ui:
- lib/api/goldset.ts (hooks),
- components/goldset/goldset-panel.tsx — card-per-item, keyboard-first
(J/K nav, H/N holding, C/X quote), progress bar, hide-tagged toggle, and a
collapsible live score table,
- app/goldset/page.tsx + nav link "מדגם-זהב" under ידע ולמידה.
Methodology guard kept explicit in UI + docstrings: tags are HUMAN ground truth,
no AI pre-fill (circular bias). Populated a 150-item stratified batch.
Verified: backend create/list/tag/score against the live DB; tsc --noEmit 0;
py_compile ok. (Local Turbopack build blocked by worktree symlink — CI builds clean.)
Invariants: G1 (eval set modeled at source in its own table); G2 (reuses the same
halacha_quality validators the extractor runs — no parallel scoring logic).
Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
Cross-precedent recurrence of a principle is real but is NOT citation
corroboration (X11) — the 5 candidate pairs have ZERO citations between their
precedents. Recording them in halacha_citation_corroboration would fabricate
citation data and inflate corroboration_count. This adds a proper, separate
halacha-level link for parallel authority.
Schema (V28): equivalent_halachot — symmetric (halacha_a < halacha_b, CHECK +
UNIQUE), non-citation, cross-precedent-only. ON DELETE CASCADE.
db.py:
- link_equivalent_halachot (idempotent; rejects same-id and SAME-precedent pairs
— parallel authority is cross-precedent by definition), unlink, and
list_equivalent_for_halacha.
- list_halachot gains include_equivalents → _annotate_equivalents attaches an
`equivalents` list (both directions) per row.
API: include_equivalents on GET /api/halachot; GET/POST/DELETE
/api/halachot/{id}/equivalents for the chair to view/link/unlink manually.
scripts/halacha_batch_reconcile.py: --link records found cross-precedent pairs
as equivalent_halachot (non-destructive, idempotent).
web-ui: Halacha.equivalents type; the clean review queue fetches
include_equivalents; the review card shows a gold "עיקרון מקביל ב-N" badge + an
expandable list (case + rule + similarity) labeled "אסמכתה מקבילה — לא ציטוט".
Populated the 5 reviewed pairs (chair decision: keep all + link as parallel
authority). Verified: 5 rows; the 1023-20 hub annotates 3 of its halachot with
equivalents; tsc --noEmit exits 0.
Invariants: G1 (model recurrence at source in its own table, not by abusing the
citator); G2 (no parallel path — extends list_halachot); citator integrity
preserved (corroboration stays citation-only).
Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
style_distance.measure_corpus_ratios(): מפצל כל החלטה ב-style_corpus לסעיפים
(chunker) ומחשב ממוצע %-סעיף — אגרגט "_all" + פר-תוצאה (כשיש). cached.
get_style_guide מציג שורת "נמדד בפועל" עם ⚠️ על פער מטווח-היעד.
מצב נוכחי: style_corpus.outcome לא מאוכלס → מוצג אגרגט כל-ההחלטות (n=48:
רקע 26.4% / טענות 9.7% / דיון 43.8% / סיכום 20.1%); פיצול לפי-תוצאה future-ready.
המדידה גם מאירה מגבלות זיהוי-סעיפים (כוונת T10 — לסמן פער לבדיקה). חופף-חלקית
ל-T7 שמודד adherence per-draft; זה מודד את הקורפוס. כשל מדידה מוצג, לא נבלע.
Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
Completes #84 — surfaces the backend gating/prioritization (#84.1/#84.3, PR
#93) in the chair's review UI and adds near-duplicate clustering (#84.2).
Backend
- db.list_halachot gains `cluster` (#84.2): annotates each row with cluster_id +
cluster_size by unioning same-precedent halachot within HALACHA_CLUSTER_COSINE
(0.90, new config). Display-only — never merges/deletes. Pairwise is confined
to the returned set (cheap).
- GET /api/halachot exposes the `cluster` query param (default off).
Frontend (web-ui)
- Halacha type gains optional cluster_id / cluster_size (hand-written module; no
api:types regen needed — halachot aren't typed off the generated schema).
- useHalachotPending(opts): the default "clean" queue now fetches
exclude_low_quality + order_by_priority + cluster; needsFix:true returns the
flagged 'needs extraction fix' bucket (filtered client-side).
- HalachaReviewPanel: a "תור נקי / דורש תיקון-חילוץ" toggle (#84.1); near-dup
clusters collapse into ONE card showing "+N וריאנטים" with an expandable list,
and approve/reject/defer on a clustered card applies to all variants via the
batch endpoint (#84.2 + #84.4). Counts show true halacha totals (pendingTotal).
New flag labels added (application / near_duplicate / nevo_preamble_leak).
Verified:
- backend: list_halachot(cluster=True) on the live queue — algorithm correct
(groups related same-precedent rules at 0.78; none at the production 0.90
because dedup #82 already removed near-dups — the desired state).
- frontend: `tsc --noEmit` exits 0 (type-clean); no new lint errors (the one
lint error is pre-existing in training/learning-panel.tsx from #94). Local
Turbopack build can't run on the worktree node_modules symlink — CI builds in
a clean checkout.
Invariants: G1 (gate/cluster at source in SQL, not post-hoc); G2 (same
list_halachot path); §6 (flagged items routed to a visible bucket, not dropped).
Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
#88 (DB↔file, lessons #35): drafts/decision.md דרסה את עצמה רק ב-save_block_content;
renumber_all_blocks + נתיבי store_block אחרים השאירו את הקובץ stale → QA נכשל
פעמיים על אותה בעיה (CMPA-62). תיקון: _update_draft_file הפך ל-hook אוטומטי
(מקבל decision_id, מאתר case פנימית) שנקרא מ-store_block (כל persist) ומ-
renumber_all_blocks. legal-qa ממילא קורא מ-DB → שני הצדדים זהים תמיד.
#87 (claims_coverage, 1033-25): טענות מתכתובת (claim_type='reply' — תגובה/
השלמת-טיעון) סומנו "לא נענו" כ-false-positive. תיקון: check_claims_coverage
דורש מענה רק לטענות כתב-הערר (claim_type='claim', appellant); reply/תכתובת
מוחרגות. בקבלה מלאה הסף מוקל (0.2→0.4) כי העורר זכה במלואו.
Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
שורש #85 התברר: `claude -p` נכשל מדי פעם ב-exit מהיר + stderr ריק על
פרומפטים גדולים/איטיים (CEO write_interim_draft, learning_loop distillation),
**אותו פרומפט מצליח בריצה חוזרת** — כשל חולף, לא nesting (אומת: nested claude
מ-bash וגם פרומפט 70K הצליחו; הכשל אינו דטרמיניסטי).
query() עוטף spawn+communicate ב-לולאת retry (MAX_RETRIES=3, backoff לינארי
5s*attempt). FileNotFoundError + timeout נשארים דטרמיניסטיים (ללא retry).
empty-response גם מטופל כ-transient.
אומת e2e: distillation על 1130-25 רץ בהצלחה → pair=analyzed (9 שינויים,
6 style_method, 33.8% diff). פותר גם את write_interim_draft של ה-CEO.
Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
Backend for the halacha approval-queue triage (#84). The keyboard UI, batch
actions and defer/reject (#84.4–6) already shipped; this adds the gating,
prioritization and metrics the queue was missing.
db.list_halachot — two opt-in triage controls:
* exclude_low_quality (#84.1): drop items carrying ANY quality_flag
(application / quote_unverified / truncated / non_decision / thin /
nli_unsupported / near_duplicate) — they belong in a 'needs extraction fix'
bucket, not the chair's approve queue.
* order_by_priority (#84.3): active-learning order — negatively-treated
first, then most-uncertain (lowest confidence), then oldest — instead of
FIFO, so the highest-value decisions surface first.
halachot_pending (MCP) — now gated + prioritized BY DEFAULT; include_low_quality=
true reveals the needs-fix bucket. The agent review path benefits immediately.
GET /api/halachot — same two params, default OFF (non-breaking; the UI opts in).
metrics.halacha_backlog (#84.7) — splits pending into clean vs flagged, adds
deferred, reviewed_total, approve_ratio, and a pending_by_flag breakdown, so the
backlog distinguishes real review work from extraction noise.
Deferred (documented): #84.2 near-duplicate cluster cards and wiring the UI
fetch to the new params require frontend work + an api:types regen AFTER this
deploys (the new query params aren't in prod's OpenAPI until then) — a clean
follow-up. The backend fully supports both now.
Verified against the live DB (read-only):
- pending 177 → gated-clean 110, 0 flagged items leak into the clean queue.
- priority order surfaces the lowest-confidence items first (0.55, 0.55, ...).
- backlog: pending_clean=110 / pending_flagged=67 / approve_ratio=0.916,
pending_by_flag={nli_unsupported:59, quote_unverified:3, thin:3, truncated:2}.
- pytest tests/test_halacha_quality.py — 52 passed (no regression).
Invariants: G1 (gate at source — SQL filter, not post-hoc); G2 (no parallel
path — same list_halachot); §6 (flagged items routed to a bucket, never dropped).
Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
Halacha-extraction quality (#81) and dedup-on-insert (#82) — engine changes
(pure + tested) plus measurement/ops tooling.
halacha_quality.py
- #81.4 application gate: is_fact_dependent() (high-precision "applied to THIS
case" deixis per the strict rubric §3/§27) + FLAG_APPLICATION. compute_quality_flags
now takes rule_type and flags rule_type=='application' OR fact-dependent —
blocking auto-approve (an illustration is not a generalizable holding).
- #82.3 lexical tail signal: jaccard_shingles / normalized_levenshtein /
lexical_near_duplicate + FLAG_NEAR_DUPLICATE, for the 0.83–0.93 cosine band.
halacha_extractor.py — pass rule_type to the flag computation; re-type a
binding-labeled fact-application to 'application' (mirrors non_decision→obiter).
db.py (store_halachot_for_chunk) — dedup now fetches the nearest same-precedent
neighbor once: cosine ≥ DEDUP → skip (unchanged); cosine in [BAND, DEDUP) with
high lexical overlap → FLAG_NEAR_DUPLICATE (review, not skip — never drop a
possibly-distinct principle unreviewed).
config.py — HALACHA_DEDUP_BAND_COSINE (0.83).
Scripts:
- scripts/halacha_goldset.py (#81.7) — export stratified sample for human
tagging; score validators (P/R/F1) against the tags. Backbone for #81.8.
- scripts/halacha_batch_reconcile.py (#82.7) — conservative cross-precedent
dedup (cosine ≥0.95), dry-run report only.
- scripts/calibrate_halacha_dedup.py (#82.1) — calibrate the lexical thresholds
against the 2026-06-03 cleanup gold-set.
Deferred (documented): #82.4 merge-provenance and #82.5 DB ON CONFLICT/UNIQUE
on normalized quote are NOT included — the current skip+flag behavior is safe,
whereas a UNIQUE on normalized_quote would fail on existing dups and a blind
merge risks losing provenance; they need their own chair-reviewed migration.
#82.6 over-merge guard is moot until merge lands. #81.6 full rhetorical-role
classifier deferred (section pre-filter + application flag cover the practical
case); #81.8 blocked on the human-tagged gold-set (harness now provided).
Verified:
- pytest tests/test_halacha_quality.py — 52 passed (14 new).
- calibrate: configured (0.55,0.70) → precision 1.0 (zero false-merge), recall
0.30 — correct profile for an auto-approve-blocking signal.
- goldset export: 15-row sample CSV. batch reconcile: 819 halachot → 5
cross-precedent candidate pairs.
Invariants: G1 (normalize at source — flag at insert, not at read); §6 (no
silent swallow — suspect items flagged to review, never dropped); G2 (no
parallel path — same store_halachot_for_chunk / compute_quality_flags).
Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
#86.2 backfill + #86.3 benchmark, plus a #86.1 over-strip fix found en route.
extractor.py
- extract_nevo_ratio(): capture Nevo's מיני-רציו block (editorial holdings
summary) before it is stripped — a free professional gold-set (#86.3).
- _DECISION_START hardening (#86.2): the merged #86.1 regex over-stripped.
(a) פסק-דין headers are markdown-wrapped (**פסק דין**); the old anchor
required the keyword as the first line char with one separator, so it
missed the header and matched a citation 32K deep (עמ"נ 50567-07-21,
losing 45% of the body). Now tolerates leading markdown + 0-3 seps,
and the final-nun form (דין ן vs דינו נ).
(b) bare השופט/הנשיא matched CITATIONS ("השופט מ' חשין, פסקה 23"). The
authoring-judge line ends with a colon; we now require it.
ingest.py
- capture the ratio before stripping and store it on the row (best-effort,
non-fatal); also strip the text-upload path (was file-only).
db.py
- add case_law.nevo_ratio column (additive); allow it in update_case_law.
scripts/backfill_nevo_preamble.py (#86.2) — dry-run-by-default data migration:
finds historically-leaked rulings, captures ratio→nevo_ratio, rewrites
full_text (+content_hash), reindexes, and FLAGS (never deletes) halachot whose
quote lives in the removed preamble (review_status=pending_review +
nevo_preamble_leak flag). Safety guard: rows with keep%<--min-keep (60) are
excluded from --apply as suspected over-strip. --apply writes backup+manifest
to data/audit/ first. Chair-gated — NOT applied here.
scripts/nevo_ratio_benchmark.py (#86.3) — LLM-as-judge (local claude_session,
zero cost) measures recall/precision/granularity of our halachot vs the Nevo
ratio. Works pre- and post-backfill (reads nevo_ratio, falls back to full_text).
Verified:
- pytest tests/test_nevo_preamble.py — 12 passed (incl. citation/markdown
over-strip regressions).
- backfill dry-run: 19 leaked rulings, 27 contaminated halachot, all ≥75%
keep (the 32K over-strip is gone).
- benchmark on בג"ץ 1764/05: recall=0.875 precision=1.0 granularity=1.75x.
Invariants: G1 (normalize at source — strip/capture at ingest, not at read);
no silent swallow (contaminated halachot flagged + reported, not dropped);
data-migration is dry-run-default with backup+manifest, chair-gated.
Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
write_interim_draft failed for all blocks from the CEO MCP instance with
"Claude CLI failed (exit 1): unknown error". Two fixes:
1. Error surfacing (the certain win): on non-zero exit, capture and log
both stderr AND stdout (the CLI sometimes writes its diagnostic to
stdout or nowhere), so the next occurrence is diagnosable instead of
collapsing to "unknown error". This is why #85 was unsolved — the real
error was swallowed (engineering rule §6: no silent swallow).
2. Defensive hardening: strip Claude Code session markers (CLAUDECODE,
CLAUDE_CODE_*, CLAUDE_AGENT_*, AI_AGENT, CLAUDE_EFFORT) from the env of
nested `claude -p` calls and run them from $HOME, decoupling them from
the parent agent's session/project state. Aligns query() with the
existing query_streaming() path (which already sets cwd=HOME). Auth/
config vars are preserved.
Note: the original adapter-context failure could not be reproduced in a
plain interactive session (nested claude -p succeeds there in both old and
new code), so the env markers are a suspect, not a proven cause. The real
value is the diagnostics. Verified: nested query() returns PONG from
inside a CLAUDECODE=1 session; unit tests cover env sanitization.
Invariants: G1 (normalize at source — fix the spawn, not readers),
G2 (no parallel path — same query()), §6 (no silent error swallow).
INV: feedback_claude_session_local_only preserved (all calls stay local).
Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>