Commit Graph

71 Commits

Author SHA1 Message Date
a1db283ce1 Merge pull request 'fix(extraction): self-heal לתור חילוץ-ההלכות + drainer מתוזמן' (#142) from worktree-halacha-selfheal into main
All checks were successful
Build & Deploy / build-and-deploy (push) Successful in 1m41s
2026-06-08 06:05:27 +00:00
97ede1a49d fix(extraction): self-heal stale halacha 'processing' rows + scheduled drainer
The halacha extraction queue was stuck (same class as the metadata issue): 26
precedents requested extraction with no drainer, plus 1 orphaned in 'processing'
(status=processing, requested_at cleared → never re-picked by the queue).

- db.requeue_stale_processing_extractions(kind): re-stamp orphaned 'processing'
  rows (requested_at IS NULL) so they re-drain; halacha extractor force=False
  resumes from chunk checkpoints (no duplicates).
- process_pending_extractions calls it at the top — fully unattended, safe under
  the global advisory lock. Mirrors the digests-drain self-heal.
- legal-halacha-drain.config.cjs: pm2 cron (every 2h, conservative — Claude is
  slow/rate-limited and each run adds to the chair's pending_review queue).
  drain_halacha_queue.py stays on claude_session (high reasoning quality for
  holding/ratio; NOT moved to Gemini). SCRIPTS.md.

The chair-approval gate (INV-G10) is untouched — this only produces halachot;
Daphna still approves each in /approvals.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
2026-06-08 06:04:53 +00:00
83d1a8253c feat(digests): digest_kind classification — robust extraction for all issue types (X12)
~2% מגיליונות "כל יום" הם לא-הכרעות (עדכוני-חקיקה/הודעות/ברכות) ללא ruling →
החילוץ ה-decision-centric החזיר ריק → both-empty → מחזורי ב-self-heal.

- SCHEMA_V32: `digest_kind` (decision/announcement/other) + backfill legacy בזול
  (יש citation→decision, אחרת announcement) — לפני שה-self-heal מסתמך עליו.
- extractor: prompt מסווג + מחלץ תמיד concept/headline/summary; underlying_* רק
  ל-decision. extract מנרמל digest_kind.
- enrich: שומר digest_kind; חילוץ מוצלח תמיד מסתיים ב-kind לא-ריק (ברירת-מחדל
  לפי citation אם המודל השמיט).
- drain self-heal: הגדרת-כשל = completed עם digest_kind='' (במקום both-empty) →
  הודעות לא מנוסות-מחדש לנצח.
- db: digest_kind ב-_DIGEST_COLS + update-whitelist (זורם ל-search/list/API).
- X12 spec: תיעוד digest_kind + הגדרת-הכשל המתוקנת.

אומת: V32 סיווג 533 (525 decision + 8 announcement, 0 unclassified — self-heal
לא נוגע בהם). extract: 5163→decision+citation · 5060→announcement+concept,
citation ריק (לא both-empty).

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
2026-06-08 06:02:08 +00:00
d95a36f310 feat(extraction): precedent metadata via Gemini Flash + scheduled drainer
The /precedents metadata queue was stuck — 24 rows requested, nothing draining
them — and the agentic claude CLI hit error_max_turns on what is a single
structured text→JSON task (slow + flaky). Metadata extraction is bounded
extraction, the wrong fit for an agentic loop.

- gemini_session.py: query_json drop-in (gemini-2.5-flash, JSON mode, httpx —
  no new SDK dep). Reads GEMINI_API_KEY (~/.env; SoT Infisical
  nautilus:/external-apis/gemini). Host-side only — no LLM from the container.
- precedent_metadata_extractor: claude_session.query_json → gemini_session.
  Validated live: rich, accurate fields (case_name/summary/appeal_subtype/tags).
- process_pending_extractions: kind-aware cooldown — metadata 2s (Gemini, fast),
  halacha keeps 30s (Claude rate limits).
- drain_metadata_queue.py + legal-metadata-drain.config.cjs (pm2 cron */15) so
  the queue never clogs again. SCRIPTS.md.
- X8 INV-FP5 updated: per-task engine choice (Gemini=bounded metadata,
  claude_session=agentic halacha), both host-side, single canonical queue (G2).

Agentic/voice-sensitive work (writing, analysis, halacha) stays on claude_session
(Daphna's subscription). Gemini cost ≈ $0.10/1M tokens — negligible.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
2026-06-08 05:13:49 +00:00
da4ebeb724 feat(halacha): panel safety-net audit (selective-prediction monitoring)
Periodic safety net for the multi-judge approval panel: samples panel-approved
halachot, re-runs the same 3-judge KEEP vote, and surfaces any that now lean
DROP — candidate false-keeps a human should glance at. Report-only by default;
--flag reopens flips to pending_review. Baseline 0/15 on the 2026-06-07 batch.

Closes the loop the literature prescribes (Trust-or-Escalate / selective
prediction): monitor the auto-decision error rate rather than trusting it
blindly. Reuses halacha_panel_approve's judges (single source of truth).

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
2026-06-08 05:01:03 +00:00
b5f7b60fb5 fix(digests): self-heal stale 'processing' rows in drain (fully unattended)
drain_digests רץ תחת flock (drainer יחיד), אז כל שורה 'processing' בתחילת ריצה
היא שריד מריצה קודמת שנקטעה באמצע-שורה (סשן/מכסה). מאפסים אותה ל-'pending'
לריצה חוזרת — סוגר את הפער האחרון ל-resume אוטומטי מלא ללא התערבות.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
2026-06-08 04:52:55 +00:00
dba2a131e0 feat(halacha): multi-judge approval panel + policy calibration (Trust-or-Escalate)
The chair cannot review every pending halacha. Three independent-lineage judges
(Opus via claude_session · DeepSeek · Gemini-2.5-flash — #1 on LegalBench) vote
on the COARSE axis we proved reliable across models (92%): "is this a genuine,
keepable rule?". Only an agreed verdict acts; every split escalates to the chair
(INV-G10). Buckets: clean→KEEP?; nli_unsupported→entailment re-adjudication;
extraction-defects→re-extraction.

halacha_panel_calibrate.py calibrates the voting policy on the gold-set's
is_holding (the coarse label) per Trust-or-Escalate (ICLR 2025): unanimous →
94.9% precision / 78% coverage; majority → 92.9% / 99%; ZERO false-drops in
both (the panel never rejects a good rule). Chosen policy (chair-approved):
clean→majority-2/3, nli→asymmetric (majority-reject, unanimous-approve),
defects→re-extraction. Reversible (--apply backs up review_status+flags first).

Sources: Panel-of-LLM-Evaluators (PoLL) · Trust-or-Escalate (ICLR 2025,
arXiv:2407.18370) · selective-prediction / learning-to-defer.

Invariants: upholds G10 (human gate — splits escalate, panel only collapses the
queue) and G9 (provenance — reviewer records the panel + policy). Read paths only
in calibrate; --apply writes review_status/quality_flags reversibly with backup.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
2026-06-07 21:11:30 +00:00
360f49d8b4 docs: record Infisical SoT for host-service shared secrets
COURT_FETCH_SHARED_SECRET + LEGAL_CHAT_SHARED_SECRET migrated to Infisical
nautilus:/legal-ai (2026-06-07). Updated the pm2 config comments: the stale
"migrate to Infisical once the MCP server is back" TODO is now done; local
env files remain the runtime source, Infisical is the SoT/record.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
2026-06-07 21:04:44 +00:00
3ae183009f feat(digests): self-heal in drain_digests — auto-resume after quota/interruption
ה-cron של drain_digests הוא מנגנון ה-resume (pending-based, idempotent, host-side,
לא תלוי בסשן). חיזוק: אם enrich נכשל באמצע (מכסת claude נגמרה) השורה נשארה
'completed' עם שדות ריקים → לא היתה מטופלת שוב. עכשיו drain מאפס בתחילתו כל
digest 'completed' עם concept_tag ריק *וגם* underlying_citation ריק (= חילוץ
שמעולם לא נחת; שורה תקינה תמיד מכילה לפחות מראה-מקום) → pending לריצה חוזרת.
כך כל קטיעה/מכסה מתאוששת אוטומטית בריצת ה-cron הבאה.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
2026-06-07 20:59:49 +00:00
c1abf2ec0e feat(digests): scripts/drain_digests.py — local enrichment drainer for cron (X12)
ריקון תור ההעשרה של יומונים מקומית (claude_session local-only): כל digest
'pending' → enrich_digest (Sonnet + embedding + autolink). מקבילי (3),
idempotent, מוסיף ~/.local/bin ל-PATH (claude CLI תחת cron). מיועד ל-cron
יומי אחרי ה-poll של n8n (flock למניעת חפיפה) + שימוש ידני אחרי backfill.
SCRIPTS.md עודכן.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
2026-06-07 20:40:45 +00:00
6468e151d9 Merge pull request 'refactor(digests): single source of truth — drop processed/ folder state (X12)' (#122) from worktree-digests-single-truth into main
All checks were successful
Build & Deploy / build-and-deploy (push) Successful in 28s
2026-06-07 20:33:43 +00:00
fb40ec8565 refactor(digests): single source of truth — drop processed/ folder state (X12)
ה-DB (`digests`) הוא מקור-האמת היחיד למצב-קליטה. ingest_digests_batch.py העביר
קבצים incoming→processed/ — state מבוסס-תיקיות מקביל ל-DB (הפרת-G2 קטנה).

- הוסר ה-move ל-processed/ + import shutil + PROCESSED. הסקריפט מסתמך על
  dedup ב-content_hash (ingest_digest מחזיר 'exists' לקיימים) → הרצה חוזרת בטוחה.
- תיקיות (incoming/) = staging בלבד, לא state.
- X12 INV-DIG2: תועד מקור-אמת-יחיד + ההפרה-שתוקנה (processed/).
- SCRIPTS.md עודכן.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
2026-06-07 20:33:18 +00:00
f4f110f0d1 feat(X13): scheduled drain — fully-autonomous digest→fetch→ingest loop
- scripts/drain_court_fetch.py: drives orchestrator.drain_pending (host-only;
  no-op when queue empty). Mirrors drain_halacha_queue.py.
- scripts/legal-court-fetch-drain.config.cjs: pm2 cron (hourly :17, one-shot),
  COURT_FETCH_DRAIN_CRON override.
- fix: orchestrator default service URL 127.0.0.1 → 10.0.1.1 (the service binds
  the docker0 gateway; the host can't reach it on loopback). Found live — the
  first drain failed "connection refused" until corrected.
- SCRIPTS.md entries.

Validated end-to-end in PRODUCTION on a real digest: עת"מ 43830-12-24
(החברה להגנת הטבע) fetched from נט המשפט → case_law (79 chunks, source_url),
digest relinked (INV-DIG3 closed), halacha queued pending_review. job=done.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
2026-06-07 20:31:53 +00:00
808c2e4c46 feat(goldset): independent second-judge for rule_role (break AI-anchoring)
The gold-set's human role tags were made while seeing a claude AI recommendation,
so human↔AI agreement (~100%) is anchoring, not an independent accuracy signal.
This adds a third, genuinely independent judge — a DIFFERENT model (DeepSeek,
direct OpenAI-compatible API) classifies rule_role BLIND (never sees the human
tag nor the first AI's answer) — and reports an inter-rater agreement matrix.

Finding (100 tagged items): ai↔human 100% (anchored) vs deepseek↔human 50%
fine-grained — BUT 92% on the coarse axis (generalizable-rule vs application/
obiter). Conclusion: the fine sub-type (holding/interpretive/procedural) is an
inherently fuzzy boundary two capable models split differently; the coarse
"is this a real rule" axis is robust across models. Use the coarse axis as
ground truth; treat the sub-type as advisory, never as a gate.

Zero chair tagging, read-only on the gold-set. Key from ~/.hermes deepseek env.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
2026-06-07 20:12:58 +00:00
e186183527 fix(X13): harden court-fetch against browser leaks + reaper for task-master-mcp leak
שלוש שכבות-הגנה נגד דליפת-זיכרון מדפדפנים יתומים, + טיפול בדליפה הגדולה
בפועל בשרת (task-master-mcp).

- camofox_client.py:
  - asyncio.wait_for קשיח סביב כל ה-fetch (COURT_FETCH_HARD_TIMEOUT_S=180ש')
    — hang → ביטול → async-with tear-down → reap.
  - _reap_orphan_browsers(): הורג camoufox-bin יתומים (ppid=1) לפני ואחרי כל
    fetch. סדרתיות (INV-CF4) → כל ppid=1 הוא שארית בטוחה.
- scripts/reap_orphan_procs.py: reaper כללי ל-task-master-mcp (~3GB יתומים)
  + camoufox-bin. רק ppid=1; /proc טהור. --dry-run / --loop N.
- scripts/legal-reaper.config.cjs: דמון pm2 (loop 180s, max_memory_restart 100M).
- X13 spec + SCRIPTS.md: תיעוד שכבות-ההגנה.

max_memory_restart בשירות (1.5G) כבר נותן רשת-ביטחון ברמת-התהליך.
Invariants: מקיים INV-CF4 (politeness/serial) — ללא שינוי חוזה.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
2026-06-07 19:43:53 +00:00
781f24c643 feat(X13 Tier-1): calibrate נט המשפט fetch — Camoufox python, proven on 46111-12-22
אומת end-to-end: פס"ד 34 עמ' של עת"מ 46111-12-22 הורד אוטונומית מלא, נטו
קוד-פתוח, ללא כרטיס-חכם וללא פתרון-CAPTCHA.

ממצאי-כיול עיקריים:
- החיפוש+הניווט-לתיק ללא reCAPTCHA כלל. reCAPTCHA קיים רק בצופה ורק על
  שמירה/הדפסה מפורשת — לא על הצגת המסמך.
- הצופה מגיש עמודים כ-PNG דרך PageMethod GetImages (4/batch); משיכה ב-fetch
  עם הכותרת X-Requested-With: XMLHttpRequest (חובה — F5 WAF חוסם בלעדיה) →
  הרכבת PDF (Pillow).

שינויים:
- camofox_client.py: שכתוב מלא — Camoufox דרך חבילת-הפייתון (in-process,
  לא שרת-Node REST). מסלול מכויל: home→btnExternalSearchCases→Bama fields→
  CaseDetails→פסקי דין→DecisionList→NGCSViewerPage→GetImages→PDF.
- pm2 config: app Xvfb :99 + DISPLAY=:99 (Camoufox קורס headless בלי צג וירטואלי).
- pyproject: extra [court-fetch] = camoufox + faster-whisper (host-only; הקונטיינר
  לא מריץ דפדפן). Pillow כבר בבסיס.
- X13 spec + SCRIPTS.md: עודכנו לממצאים (image-API, Xvfb, אימות).

reCAPTCHA audio (Whisper) נשמר כ-fallback למסלול-השמירה-המפורש בלבד; המסלול
הראשי אינו זקוק לו. Invariants: מקיים INV-CF1/CF4/CF6 (ללא שינוי).

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
2026-06-07 19:32:13 +00:00
f3740fef68 Merge pull request 'fix(halacha): split authority (derived) from rule_role — stop source-conflation (INV-DM7)' (#112) from worktree-halacha-authority-split into main
All checks were successful
Build & Deploy / build-and-deploy (push) Successful in 1m32s
2026-06-07 18:19:43 +00:00
2e33cac043 fix(halacha): split authority (derived) from rule_role — stop source-conflation (INV-DM7)
The extractor classified rule_type by SOURCE bindingness (higher-court→binding,
committee→persuasive) instead of by rule KIND. The gold-set proved it: 'binding'
appeared on 19/19 external rulings & 0 committees; 'persuasive' on 13/13
committees & 0 external — only 58% agreement with the human role tags. The two
axes (authority vs rule role) were crammed into one enum.

This splits them per INV-DM7:
- authority (binding/persuasive) — DERIVED from case_law.precedent_level
  (עליון/מנהלי→binding, ועדת_ערר_מחוזית→persuasive), never stored, never
  LLM-guessed. New helper halacha_quality.derive_authority; surfaced read-only
  in list_halachot / goldset_list / search results.
- rule_type — now the rule ROLE only: holding/interpretive/procedural/
  application/obiter. Both extractor prompts unified to this vocabulary;
  _coerce_halacha no longer defaults rule_type from the source; legacy
  binding→holding / persuasive→interpretive fold for safety.

UI: authority shown as a separate read-only badge (gold=מחייב / muted=משכנע)
across the review queue, precedent detail, and gold-set; the gold-set role
selector drops binding/persuasive and adds מהותי (holding).

Migration: scripts/halacha_rule_role_backfill.py re-classifies the 276 pre-split
binding/persuasive rows into a genuine role via local claude_session (run after
deploy). Gold-set correct_type/ai_correct_type 'binding'→'holding' via SQL.

Sources (≥3, per research-decision policy): OASIS LegalRuleML v1.0
(appliesAuthority/Strength as metadata orthogonal to rule logic) · SemEval-2023
Task 6 LegalEval (rhetorical roles by function, authority kept separate) ·
Bluebook signals (weight-of-authority is a separate dimension).

Invariants: ESTABLISHES INV-DM7. Upholds G1 (normalize at source — extractor
classifies role, system derives authority) and G2 (single source of truth —
authority derived, not a parallel stored field). Tests: 211 pass + new
derive_authority/coerce coverage. web-ui build + tsc clean.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
2026-06-07 18:18:41 +00:00
0990db7a3c feat(X13): auto-fetch court verdicts from נט המשפט → corpus (Tier 0 + scaffold)
תת-מערכת אחזור-פסיקה אוטומטי: כשיומון מצביע על פס"ד בית-משפט, מסווגים את
הערכאה, מורידים מהמקור הציבורי המתאים, וקולטים דרך צינור-הקליטה הקנוני.

- spec-first: docs/spec/X13-court-fetch.md (INV-CF1..CF7) + אינדקס
- מסווג court_citation.py (supreme/admin/skip) + 10 בדיקות (עת"מ 46111-12-22 → admin)
- Tier 0: court_fetch_supreme.py — supremedecisions API (reverse-engineered), httpx
  + browser-headers (אומת 200) + politeness
- תור court_fetch_jobs (SCHEMA_V30) + DB helpers + court_fetch_orchestrator.py
- Tier 1 scaffold: legal-court-fetch-service (aiohttp+Bearer, מראת legal-chat-service)
  + camofox_client (Camoufox open-source) + recaptcha_audio (Whisper מקומי) + pm2
- Tier 2 fallback חינני: manual + missing_precedent (INV-CF2/CF3 — אין drop שקט)
- כלי-MCP court_verdict_fetch / court_fetch_status; SCRIPTS.md

Invariants: מקיים G2 (מסלול-קליטה יחיד, INV-CF1) · G3/G1 (idempotent+נרמול, INV-CF5)
· G4/§6 (אין בליעה שקטה, INV-CF2) · G10 (שער-אנושי, INV-CF3) · G5 (source_type,
INV-CF6) · G9 (provenance+audit, INV-CF7). מקורות INV-CF4: RFC 9309 · Google
crawler · OWASP OAT.

Follow-ups (טרם אומתו חי): live Tier-0 validation · התקנת camofox-browser+whisper
· כיול selectors Tier-1 · COURT_FETCH_SHARED_SECRET (Infisical+Coolify) · טריגר
מ-digest try_autolink (worktree-digests-radar). V30 עלול להתנגש עם digests-radar.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
2026-06-07 18:12:13 +00:00
8171572cdd feat(digests): קורפוס יומונים כשכבת-גילוי (radar) — X12
מאגר חדש ליומוני "כל יום" (עפר טויסטר) כשכבת-גילוי מעל קורפוסי-הפסיקה:
מקור-משני המצביע על פסק הדין המקורי, נקלט לטבלה נפרדת `digests`, נחפש
סמנטית, ומקושר לפסק המקורי בספריית הפסיקה — אך לעולם אינו מצוטט בהחלטה
ואינו מחלץ הלכות.

Phase 0 (spec):
- docs/spec/X12-digests-radar.md — INV-DIG1 (מצביע לא מצוטט) /
  INV-DIG2 (מסלול-קליטה נפרד, לא מקביל — מקיים G2) / INV-DIG3 (קישור-לפסק
  הוא הגשר; חוסר-קישור = פער גלוי). עדכון אינדקס 00/03/README.

Phase 1 (MVP):
- SCHEMA_V30: טבלת `digests` (HNSW על embedding — לא ivfflat, להימנע מ-recall
  cliff בקורפוס קטן/צומח) + GIN/FTS + UNIQUE חלקי ל-idempotent.
- services/digest_metadata_extractor.py — חילוץ-LLM (claude_session local-only,
  ייבוא lazy): תג-מושג, כותרת-הלכה, מראה-מקום, שני-תאריכים מובחנים, תגיות.
- services/digest_library.py — מסלול קצר עצמאי (INV-DIG2): extract→hash→LLM→
  embedding יחיד→autolink. לא משתמש ב-ingest.ingest_document.
- tools/digests.py + רישום 7 כלים ב-server.py (digest_upload/list/get/link/
  relink/delete + search_digests).
- scripts/ingest_digests_batch.py — קליטה ידנית מ-data/digests/incoming.
- legal-researcher.md: שלב 2ב.0 (סריקת-radar לפני אימות) + סעיף-דוח ט +
  3 כלים ב-frontmatter. HEARTBEAT §8: ניתוב יומון→digest_upload.

אומת end-to-end: 4 יומונים נקלטו (מטא-דאטה מדויק), חיפוש סמנטי מדרג נכון
("היטל השבחה"→5160, "תמא 38"→5158), link/relink/autolink/revert + מעטפת-MCP.

Invariants: מוסיף INV-DIG1/2/3 (X12). מקיים G2 (bounded context נפרד, לא
מסלול מקביל), G3 (idempotent upsert), G4 (אין בליעה שקטה — פער-קישור מוצף),
G9 (עקיבוּת — היומון מצביע על מקור עקיב). נוגע G7 (RRF) — נדחה, חיפוש
סמנטי-בלבד בשלב 1 (FTS index מוכן).

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
2026-06-07 17:49:00 +00:00
0e35060d3d feat(goldset): AI second-opinion per item (QA aid) — compare vs human tag
The chair wanted an independent recommendation beside each tag, to reconsider
his own judgments. Adds a NON-ground-truth AI second-opinion:

- schema: halacha_goldset.ai_is_holding / ai_correct_type / ai_rationale /
  ai_generated_at (additive).
- db.goldset_set_ai_recommendation + goldset_list now returns the ai_* fields.
- scripts/goldset_ai_recommend.py — local claude_session judges is_holding +
  type + a one-line rationale per item, INDEPENDENTLY (own legal rubric).
  Independent of the rule-based validators #81.8 measures → no circularity.
  Never auto-applied; QA aid only.
- web-ui: each card shows "🤖 המלצת AI: הלכה/לא · type" + rationale and an
  agreement/disagreement chip vs the human tag (amber on disagree); a
  "⚠ אי-הסכמות AI (N)" filter to review only the conflicts.

Methodology note kept explicit: the human stays the ground truth; the AI is a
prompt to reconsider, not to copy.

Verified: tsc --noEmit 0; generator stores recs and flags disagreements with
existing human tags.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
2026-06-07 14:24:35 +00:00
b7b44f4453 feat(halacha): equivalent-halacha (parallel-authority) links across precedents
Cross-precedent recurrence of a principle is real but is NOT citation
corroboration (X11) — the 5 candidate pairs have ZERO citations between their
precedents. Recording them in halacha_citation_corroboration would fabricate
citation data and inflate corroboration_count. This adds a proper, separate
halacha-level link for parallel authority.

Schema (V28): equivalent_halachot — symmetric (halacha_a < halacha_b, CHECK +
UNIQUE), non-citation, cross-precedent-only. ON DELETE CASCADE.

db.py:
- link_equivalent_halachot (idempotent; rejects same-id and SAME-precedent pairs
  — parallel authority is cross-precedent by definition), unlink, and
  list_equivalent_for_halacha.
- list_halachot gains include_equivalents → _annotate_equivalents attaches an
  `equivalents` list (both directions) per row.

API: include_equivalents on GET /api/halachot; GET/POST/DELETE
/api/halachot/{id}/equivalents for the chair to view/link/unlink manually.

scripts/halacha_batch_reconcile.py: --link records found cross-precedent pairs
as equivalent_halachot (non-destructive, idempotent).

web-ui: Halacha.equivalents type; the clean review queue fetches
include_equivalents; the review card shows a gold "עיקרון מקביל ב-N" badge + an
expandable list (case + rule + similarity) labeled "אסמכתה מקבילה — לא ציטוט".

Populated the 5 reviewed pairs (chair decision: keep all + link as parallel
authority). Verified: 5 rows; the 1023-20 hub annotates 3 of its halachot with
equivalents; tsc --noEmit exits 0.

Invariants: G1 (model recurrence at source in its own table, not by abusing the
citator); G2 (no parallel path — extends list_halachot); citator integrity
preserved (corroboration stays citation-only).

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
2026-06-06 21:29:46 +00:00
1286a1e60d feat(halacha): application gate + lexical dedup tail + quality harnesses (#81,#82)
Halacha-extraction quality (#81) and dedup-on-insert (#82) — engine changes
(pure + tested) plus measurement/ops tooling.

halacha_quality.py
- #81.4 application gate: is_fact_dependent() (high-precision "applied to THIS
  case" deixis per the strict rubric §3/§27) + FLAG_APPLICATION. compute_quality_flags
  now takes rule_type and flags rule_type=='application' OR fact-dependent —
  blocking auto-approve (an illustration is not a generalizable holding).
- #82.3 lexical tail signal: jaccard_shingles / normalized_levenshtein /
  lexical_near_duplicate + FLAG_NEAR_DUPLICATE, for the 0.83–0.93 cosine band.

halacha_extractor.py — pass rule_type to the flag computation; re-type a
binding-labeled fact-application to 'application' (mirrors non_decision→obiter).

db.py (store_halachot_for_chunk) — dedup now fetches the nearest same-precedent
neighbor once: cosine ≥ DEDUP → skip (unchanged); cosine in [BAND, DEDUP) with
high lexical overlap → FLAG_NEAR_DUPLICATE (review, not skip — never drop a
possibly-distinct principle unreviewed).

config.py — HALACHA_DEDUP_BAND_COSINE (0.83).

Scripts:
- scripts/halacha_goldset.py (#81.7) — export stratified sample for human
  tagging; score validators (P/R/F1) against the tags. Backbone for #81.8.
- scripts/halacha_batch_reconcile.py (#82.7) — conservative cross-precedent
  dedup (cosine ≥0.95), dry-run report only.
- scripts/calibrate_halacha_dedup.py (#82.1) — calibrate the lexical thresholds
  against the 2026-06-03 cleanup gold-set.

Deferred (documented): #82.4 merge-provenance and #82.5 DB ON CONFLICT/UNIQUE
on normalized quote are NOT included — the current skip+flag behavior is safe,
whereas a UNIQUE on normalized_quote would fail on existing dups and a blind
merge risks losing provenance; they need their own chair-reviewed migration.
#82.6 over-merge guard is moot until merge lands. #81.6 full rhetorical-role
classifier deferred (section pre-filter + application flag cover the practical
case); #81.8 blocked on the human-tagged gold-set (harness now provided).

Verified:
- pytest tests/test_halacha_quality.py — 52 passed (14 new).
- calibrate: configured (0.55,0.70) → precision 1.0 (zero false-merge), recall
  0.30 — correct profile for an auto-approve-blocking signal.
- goldset export: 15-row sample CSV. batch reconcile: 819 halachot → 5
  cross-precedent candidate pairs.

Invariants: G1 (normalize at source — flag at insert, not at read); §6 (no
silent swallow — suspect items flagged to review, never dropped); G2 (no
parallel path — same store_halachot_for_chunk / compute_quality_flags).

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
2026-06-06 19:55:45 +00:00
fb51a0e869 feat(nevo): backfill leaked preamble + ratio gold-set benchmark (#86)
#86.2 backfill + #86.3 benchmark, plus a #86.1 over-strip fix found en route.

extractor.py
- extract_nevo_ratio(): capture Nevo's מיני-רציו block (editorial holdings
  summary) before it is stripped — a free professional gold-set (#86.3).
- _DECISION_START hardening (#86.2): the merged #86.1 regex over-stripped.
  (a) פסק-דין headers are markdown-wrapped (**פסק  דין**); the old anchor
      required the keyword as the first line char with one separator, so it
      missed the header and matched a citation 32K deep (עמ"נ 50567-07-21,
      losing 45% of the body). Now tolerates leading markdown + 0-3 seps,
      and the final-nun form (דין ן vs דינו נ).
  (b) bare השופט/הנשיא matched CITATIONS ("השופט מ' חשין, פסקה 23"). The
      authoring-judge line ends with a colon; we now require it.

ingest.py
- capture the ratio before stripping and store it on the row (best-effort,
  non-fatal); also strip the text-upload path (was file-only).

db.py
- add case_law.nevo_ratio column (additive); allow it in update_case_law.

scripts/backfill_nevo_preamble.py (#86.2) — dry-run-by-default data migration:
finds historically-leaked rulings, captures ratio→nevo_ratio, rewrites
full_text (+content_hash), reindexes, and FLAGS (never deletes) halachot whose
quote lives in the removed preamble (review_status=pending_review +
nevo_preamble_leak flag). Safety guard: rows with keep%<--min-keep (60) are
excluded from --apply as suspected over-strip. --apply writes backup+manifest
to data/audit/ first. Chair-gated — NOT applied here.

scripts/nevo_ratio_benchmark.py (#86.3) — LLM-as-judge (local claude_session,
zero cost) measures recall/precision/granularity of our halachot vs the Nevo
ratio. Works pre- and post-backfill (reads nevo_ratio, falls back to full_text).

Verified:
- pytest tests/test_nevo_preamble.py — 12 passed (incl. citation/markdown
  over-strip regressions).
- backfill dry-run: 19 leaked rulings, 27 contaminated halachot, all ≥75%
  keep (the 32K over-strip is gone).
- benchmark on בג"ץ 1764/05: recall=0.875 precision=1.0 granularity=1.75x.

Invariants: G1 (normalize at source — strip/capture at ingest, not at read);
no silent swallow (contaminated halachot flagged + reported, not dropped);
data-migration is dry-run-default with backup+manifest, chair-gated.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
2026-06-06 19:45:43 +00:00
2e20e27e17 feat(style-acq T1-T3): קורפוס-דוגמאות של דפנה לכותב (style_exemplars)
ממלא את ערוץ-הדוגמאות (B) של מערכת רכישת-הסגנון: הכותב מאחזר פסקאות-בלוק
אמיתיות של דפנה בזמן כתיבה, ממוקדות section+outcome+practice_area.

T1 — תשתית + backfill:
- SCHEMA_V27: טבלת style_exemplars (purpose-built — בלי תיקים מזויפים בשרשרת
  decision_paragraphs). decision_number/source/section/outcome/practice_area+embedding.
- db: insert/delete/search_style_exemplars + count_style_exemplars.
- scripts/backfill_style_exemplars.py: מפצל קורפוס דפנה (style_corpus +
  internal_committee) לסעיפים→פסקאות, embed, שמירה. אידמפוטנטי, dry-run/apply.

T2 — אחזור ממוקד:
- search_style_exemplars(section, outcome, practice_area) — section=hard filter,
  outcome/practice_area=soft. block_writer._build_precedents_context ממפה
  block→section ומאחזר (ראשי), לצד הנתיב הישן (משלים).

T3 — contrastive/adapt:
- הדוגמאות מתויגות "מבנה/קול בלבד — התאם, אל תעתיק תוכן"; פסקה מלאה (1100 תווים).

INV-LRN5 (טוהר — סגנון בלבד). G11. הרצת backfill --apply בנפרד.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
2026-06-06 18:10:01 +00:00
701efab726 feat(mcp): FU-14 GAP-51 — איחוד אוצר-המילים של תוצאת-תיק (set_outcome SSoT)
הכרעת-יו"ר: קנוני = 3 תוצאות אמיתיות (rejection/partial_acceptance/full_acceptance);
betterment_levy יוצא מהיותו "תוצאה" ועובר ל-override לפי practice_area.
+ עקרון "אנגלית-ב-DB, עברית-ב-UI": מפת-תוויות SSoT אחת.

lessons.py:
- VALID_OUTCOMES = 3 (הוסר betterment_levy).
- OUTCOME_LABELS_HE (SSoT לתצוגה) + LEGACY_OUTCOME_MAP + canonical_outcome().
- PRACTICE_AREA_OVERRIDES["betterment_levy"] מרכז את כל ה-guidance שהיה מפתוח כ-outcome
  (golden_ratios/opening/summary/discussion/template).
- get_lessons_for_outcome(outcome, practice_area) + format_ratios_comment(..., practice_area)
  מחילים override + מנרמלים legacy.

block_writer.py: STRUCTURE_GUIDANCE קנוני + תווית מ-OUTCOME_LABELS_HE + override betterment.
workflow.set_outcome: קנוני 3 + מיפוי-legacy סלחני; תווית מ-SSoT.
drafting.py: טבלת יחסי-זהב + get_decision_template מודעי-practice_area (override).
web-ui case.ts: הסרת betterment_levy מ-expectedOutcomes (הוא practice_area).
server.py: docstrings קנוניים.

מיגרציה: migrate_gap51_outcomes.py — 9 שורות נורמלו (rejected→rejection וכו'),
גיבוי ב-data/audit/. הקוד canonicalize בקריאה ⇒ backward-compatible גם בלי מיגרציה.

אומת: py_compile (5 קבצים) + בדיקות-יחידה offline (override/legacy/labels) + אימות-DB.
עודכנו X9 §3 + gap-audit (GAP-51 ).

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
2026-06-06 15:34:49 +00:00
7f4e036211 feat(spec): חיבור ספ-המערכת למסלול-הכתיבה האינטראקטיבי (אכיפה 3-שכבתית)
הספ (docs/spec/, G1–G11) חובר לסוכני Paperclip דרך INV-AG1 אבל לא למסלול
שבו רוב הקוד נכתב בפועל — הסשן האינטראקטיבי של Claude Code. סוגר את הפער
לפני מחזור-2 (FU-9..15), שהוא כולו כתיבת-קוד.

שלוש שכבות אכיפה:
1. תיעוד — CLAUDE.md §"פרוטוקול כתיבת-קוד" + docs/spec בטבלת-הייחוס
2. hook — scripts/spec-guard.sh (PreToolUse על Edit/Write/MultiEdit, רשום
   ב-.claude/settings.json) מזכיר פעם-בסשן בכל נגיעה בקובץ-קוד; non-blocking
3. PR — .gitea/PULL_REQUEST_TEMPLATE.md עם סעיף-חובה "Invariants"

המקבילה האינטראקטיבית ל-INV-AG1 שכבר אוכף על הסוכנים (HEARTBEAT §"קריאת-ספ").

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
2026-06-06 13:28:15 +00:00
434341cc29 chore(#57): re-chunk+re-embed legacy precedents (pre-#55 chunker remediation)
Adds scripts/rechunk_legacy_precedents.py: selects every case_law with a tiny
chunk (content<50 — the pre-fix chunker fingerprint) and runs
ingest.reindex_case_law (re-chunk+re-embed from stored full_text only, no
re-OCR/LLM, idempotent). Batch-idempotent (re-queries the affected set).

Run result (2026-06-03): 73 precedents reindexed, 0 failed. Tiny chunks
483 -> 4 (99.2%); total precedent_chunks 5019 -> 3115 (fragments merged).
Search verified healthy (substantial coherent passages, no errors).

The 4 residual tiny chunks are isolated section headings ('דיון',
'טענות המשיבים', ...) emitted by the CURRENT (fixed) chunker — not legacy
fragments — and are already filtered at query time (>=50, #55). Minor
chunker edge case, candidate #55 follow-up.

The DB chunk migration is already applied to prod; this commit is the script
+ SCRIPTS.md entry only (no app code change, no deploy needed).

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
2026-06-03 07:55:42 +00:00
887079535c feat(spec): X11 citation-corroboration + INV-G10 amendment + Opus 4.8 halacha extraction
ספ חדש לשכבת citator פנימית — תיקוף הלכות לפי טיפול-שיפוטי מצטבר (ציטוטים נכנסים),
לצמצום היקף האישור-הידני של היו"ר:

- docs/spec/X11-citation-corroboration.md — 6 invariants (INV-COR1–COR6), כל אחד עם
  ≥3 מקורות מקצועיים (Shepard's/KeyCite, Hellyer LLJ 2018, UNC Law, NCSC/JTC, CEPEJ).
- docs/spec/00-constitution.md — תיקון מבוקר ל-INV-G10: השער מסופק ע"י טיפול-שיפוטי-מצטבר
  לתת-הקבוצה החיובית, שער-היו"ר נשאר חובה לזנב ולשלילי. + X11 באינדקס.
- Opus 4.8 @ xhigh כמודל חילוץ הלכות (config HALACHA_EXTRACT_MODEL/EFFORT, env-tunable;
  claude_session model/effort params; halacha_extractor מחווט). מבוסס A/B 2026-05-31:
  פחות חילוץ-יתר, 100% quote-verified, ביטחון מכויל.
- scripts/ab_halacha_opus48.py — harness A/B לא-הרסני להשוואת מודל/effort בחילוץ הלכות.
- .taskmaster #70 (FU-2c-b) — תיעוד dedup שפר + סריקת-קורפוס (0 stubs תקועים נותרו).

תנאי-קדם (זהות נקייה) הושלם: שפר מוזג לרשומה קנונית + סריקת 128 רשומות.
audit-findings גלויים ב-X11 §7: קישור הלכה↔ציטוט + סיווג-טיפול = greenfield, ל-implementation plan.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
2026-05-31 18:42:13 +00:00
6ff2e36bf9 feat(eval): FU-5 — retrieval eval harness + halacha backlog visibility (#63)
Covers GAP-11 (INV-RET4/G8) and GAP-14 (INV-QA1/G10). Retrieval quality was
never measured (only telemetry observation) and the halacha review backlog was
invisible (the 10/19 gap was found by accident).

Unit B — backlog visibility (pure code, container):
- metrics.halacha_backlog(conn) → {pending_review, approved, rejected, published,
  total, oldest_pending_at}; surfaced in metrics.get_dashboard() (get_metrics MCP
  tool) and /api/system/diagnostics. Live count revealed 178 pending / 1552 total,
  oldest from 2026-05-03 — previously invisible.

Unit A — retrieval eval harness (host-side scripts):
- scripts/eval_gold_bootstrap.py — seeds data/eval/gold-set.jsonl. Two sources:
  citations (cited==relevant via search_relevance_feedback — empty until decisions
  cite precedents) and known_item (query=case_name → relevant=self; a real
  citation-free signal, the methodology #52 checked by hand). Idempotent; preserves
  source='chair' rows.
- scripts/eval_retrieval.py — runs the production retrieval path (search_library /
  search_internal) over the gold-set; computes precision@k, recall@k, MRR, nDCG@k
  (k=5,10); aggregates overall + per-corpus + per-practice_area; writes a report and
  a delta vs committed baseline.json (which records the retrieval_config it reflects).
  --self-test unit-checks the metric math offline.

Gold-set strategy = hybrid (chair decision): bootstrap + chair review. The citation
source is empty today (0 cited precedents in decisions), so the seed is known-item
(77 queries: 54 internal_decisions + 23 precedent_library). The gold-set is
PROVISIONAL until Dafna reviews it (the domain chair-gate).

Baseline (production config: multimodal+rerank on): R@10=0.987, MRR=0.837,
nDCG@10=0.872. Finding: MULTIMODAL_ENABLED=true slightly lowers known-item recall
(image-page results displace exact name matches) — relevant to #15. precedent_library
weaker than internal (R@10 0.957 vs 1.0) — one external precedent unfindable by name.

"CI gate" realized as discipline (re-runnable harness + committed baseline + run
before/after any retrieval-layer change) — retrieval needs prod DB + Voyage, no CI
runner has that access.

Spec: docs/superpowers/specs/2026-05-31-fu5-eval-harness-design.md

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
2026-05-31 14:58:13 +00:00
4fce9d503f feat(migration): FU-2c — reconcile external case_law identifiers (GAP-08, #68)
External court precedents stored the full citation (designator + docket +
parties + Nevo date) inside case_number, violating INV-ID2/G1 (citation as
identifier). Chair decision 2026-05-31 (Option A): canonical external
case_number = proceeding-designator + docket, '/' preserved (court
convention, not X1's '/'→'-'); parties/court/date → citation_formatted.

scripts/fu2c_reconcile_external_case_numbers.py — deterministic dry-run →
chair-review → apply, mirroring FU-2b:
- extracts designator+docket; flags split into BLOCKING (MISMATCH /
  CIT_NO_DOCKET / DESIG_MISMATCH / DUP_CHECK / NO_DOCKET) vs ADVISORY
  (NO_CITATION — case_number fix still deterministic, missing citation is a
  separate gap), so advisory rows apply while uncertain identity does not.
- --overrides CSV (id,proposed_canonical,citation_formatted,reason) for
  audited chair adjudication of blocking rows.
- apply scoped to source_kind='external_upload' (task target) while keeping
  cited_only/nevo_seed in the reconciliation VIEW so DUP_CHECK spans the full
  external unique space; pre-flight collision guard before every UPDATE.

Applied to production 2026-05-31: 21 case_number normalized + 3
citation_formatted reconciled (D = consolidated Supreme Court judgment
לויתן/קלמנוביץ → lead docket 25226-04-25; 2×C empty citations composed from
metadata). אהוד שפר עע"מ 317/10 deferred — cross-source duplicate with an
existing cited_only reference (collision guard held; → #70). 49 cited_only
records out of scope → new task #70 (committee-form NNNN-NN dockets the
extractor misses, dedup, unresolvable "ערר אדלר"). Extraction + gating
verified offline on all 24 records.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
2026-05-31 14:12:45 +00:00
e5b34e01dc docs(scripts): note sync --verify drift-gate semantics (FU-8a) 2026-05-31 11:36:06 +00:00
aac383acb7 feat(sync): --verify exits non-zero on drift; adapter mismatch = loud drift (GAP-21, FU-8a)
Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
2026-05-31 11:14:44 +00:00
8477fd87e7 docs(scripts): register fu2b reconciliation script (FU-2b) 2026-05-31 08:58:32 +00:00
e46868feda feat(fu2b): flag PROC_MISMATCH (case_number prefix vs proceeding_type) for chair
Dry-run surfaced 2 rows with בל"מ prefix but proceeding_type=ערר. Since the
migration strips the prefix, a wrong proceeding_type would silently lose the
בל"מ signal — must be chair-adjudicated, not auto-applied. Chair table now
flags 4 rows: 2 DUP_CHECK (8047-23) + 2 PROC_MISMATCH.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
2026-05-31 08:57:42 +00:00
ab8d17fdd8 feat(fu2b): chair-gated internal case_number reconciliation script (GAP-07/08)
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
2026-05-31 08:54:38 +00:00
58ab003206 fix(retrieval): make decisions findable by name + unhide committee uploads
All checks were successful
Build & Deploy / build-and-deploy (push) Successful in 3m57s
Root cause of "agent can't find the Agasi decision in the corpus" (CMPA-55):
the decision was fully ingested, but the retrieval layer failed on the
realistic agent query — searching by case name.

- RC-A (#52): lexical tsvector covered only chunk content + halacha text,
  so a bare-name query ("אגסי") matched decisions that *cite* the case, not
  the case itself. Add meta_tsv on case_law(case_name, case_number) (SCHEMA
  V20) and OR it into the lexical halacha/chunk SQL with a match boost, so a
  name/number hit surfaces the case's own rows. Agasi: rank 4 → rank 1.
- RC-B (#53): precedent_library_list hard-defaulted source_kind=external_upload
  and never exposed the param, hiding uploaded ערר/בל"מ (internal_committee)
  decisions. Thread source_kind through service → tool → MCP tool (supports
  'internal_committee' / 'all_committees').
- #54: agent instructions (researcher/analyst/writer) — search-by-name
  protocol: add content/case-number, search both corpora, use all_committees
  before declaring "not in corpus".
- #55: chunker produced tiny fragment chunks ("דיון", "החלטה") from header
  keywords matched mid-sentence. Anchor SECTION_PATTERNS to line start +
  merge sub-min sections; exclude <50-char fragments at query time (484
  existing fragments hidden; full re-chunk tracked as #57).

Tests: scripts/test_retrieval_by_name.py (name ranks case above citer +
substantive regressions); chunker unit checks (0 tiny chunks). New findings
filed as tasks #56 (halacha source_kind leak) and #57 (re-chunk migration).

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
2026-05-30 11:26:19 +00:00
d3c6baf9e2 security(chat): bind chat service to docker bridge + require Bearer auth
All checks were successful
Build & Deploy / build-and-deploy (push) Successful in 1m38s
Address security-review finding: the host-side legal-chat-service was
binding 0.0.0.0:8770 with no authentication. The service spawns the
claude CLI, whose tool set includes Bash + Edit — so an unauthenticated
/chat/start is effectively RCE. Oracle Cloud's security list closes the
port externally, but defense-in-depth requires two independent layers:

1. Bind defaults to 10.0.1.1 (docker0 bridge gateway). Reachable from
   containers on docker bridges (the legal-ai container has a route via
   the coolify network), invisible to anything outside the host. The
   --host flag is still configurable for local-dev (127.0.0.1) or
   special-case deployments, but 0.0.0.0 is explicitly discouraged in
   the docstring.
2. /chat/start requires Authorization: Bearer <LEGAL_CHAT_SHARED_SECRET>.
   The secret is loaded from /home/chaim/.legal-chat-service.env (chmod
   600, off-repo) by the pm2 ecosystem and mirrored as a Coolify env
   var so the FastAPI chat_proxy sends a matching header. hmac.compare_digest
   prevents timing oracles. /health stays unauthenticated (static OK,
   no subprocess) so the FastAPI proxy can probe liveness without the
   secret.

The service refuses to start if LEGAL_CHAT_SHARED_SECRET is empty or
shorter than 24 chars — no silent fallback to an open mode.

When the Infisical MCP comes back, migrate the secret into the vault
at /_GUIDELINES per the project secrets policy.

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
2026-05-27 10:22:14 +00:00
a3454bcb57 fix(training): bundle reference content + use docker bridge gateway
All checks were successful
Build & Deploy / build-and-deploy (push) Successful in 9s
The Style Studio's curator-prompt + chat features read reference docs
from disk at runtime. Two issues from the initial production run:

1. Dockerfile + .dockerignore excluded .claude/, docs/, and most of
   skills/. Now COPY the four specific files the new endpoints need:
     - .claude/agents/hermes-curator.md
     - skills/decision/SKILL.md
     - docs/legal-decision-lessons.md
     - docs/corpus-analysis.md
   .dockerignore opens whitelists for just those files.

2. Coolify's custom_docker_run_options=--add-host=host.docker.internal:host-gateway
   is not honored on dockerimage build_pack apps (ExtraHosts stayed []).
   Switch chat_proxy.py default to http://10.0.1.1:8770 — the docker0
   bridge gateway, same pattern Paperclip uses for 3100. Bind the host
   pm2 service to 0.0.0.0:8770 so the container can reach it via the
   bridge IP. Oracle Cloud's security list keeps the port unreachable
   from the public internet.

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
2026-05-27 10:15:27 +00:00
bb0cd7c6a2 feat(training): Style Studio — upload, rich corpus, lessons, curator portrait, chat
All checks were successful
Build & Deploy / build-and-deploy (push) Successful in 2m7s
Six-phase upgrade of /training from a read-only dashboard into a full
Style Studio for managing Daphna's style corpus.

- Upload Sheet on /training: file → proofread preview → commit (no more
  CLI-only `upload-training` skill).
- Rich corpus metadata: GET /api/training/corpus returns summary, outcome,
  key_principles, page_count, parties (regex), legal_citation, lessons_count.
  PATCH endpoint for chair edits. CorpusDetailDrawer with 4 tabs (details
  /content/lessons/patterns) replaces the bare table row.
- LLM metadata enrichment: style_metadata_extractor + MCP tools
  (style_corpus_enrich, style_corpus_pending_enrichment) fill summary
  /outcome/key_principles via claude_session (free, host-side).
- Per-decision lessons: new decision_lessons table + 4 REST endpoints +
  LessonsTab in drawer; hermes-curator now auto-posts findings as
  decision_lessons(source=curator).
- Curator Portrait tab: prompt rendered with link to Gitea, recent
  curator findings, style_analyzer training prompts, propose-change
  form that writes proposals to data/curator-proposals/ for manual
  chair review (no auto-mutation of the agent file).
- Style chat tab: SSE-streamed conversations with the style agent.
  New host-side pm2 service (legal-chat-service, port 8770) wraps
  claude CLI with stream-json + --resume continuation; FastAPI proxies
  via host.docker.internal. Zero API cost — uses chaim's claude.ai
  subscription. chat_conversations + chat_messages persist history.

Architecture: keeps the existing rule that claude_session only runs
on the host (not the container). The new legal-chat-service is the
canonical bridge between the container and the local CLI for the chat
feature; everything else (upload, metadata, lessons) stays within the
container's existing capabilities.

Audit script (scripts/audit_training_corpus.py) included for verifying
which corpus rows still need enrichment.

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
2026-05-27 10:06:22 +00:00
2aee398b4a feat: Stage C — RAG advanced (#33, #47, #48, #49, #50, #51)
All checks were successful
Build & Deploy / build-and-deploy (push) Successful in 1m35s
Six independent sub-tasks dispatched in parallel; aggregated here.

## #33 — Hide case_name column
library-list-panel.tsx: `<TableHead>` + `<TableCell>` for "שם"
get `className="hidden"` in both Court and Committee row variants.
DB column preserved for future use.

## #47 — Audit script periodic
New scripts/audit_corpus_integrity.py — 3 SQL checks (external+ערר
prefix, internal missing chair/district, cases.practice_area enum)
+ CEO wakeup on violations + cron `0 7 * * *`. First run: 0 issues.

## #48 — Parent-doc retrieval (gated, default off)
Schema V17: precedent_chunks.parent_chunk_id + chunk_role
('child'|'parent'). New chunker.chunk_document_hierarchical() —
section-aware parents (~1500 tokens) containing ~5 overlapping
children (~300 tokens each). New db.store_precedent_chunks_hierarchical
two-pass writer. Search SQL (semantic + lexical) LEFT-JOIN parent and
swap content + dedupe by parent_chunk_id when flag on. Toggle:
PARENT_DOC_RETRIEVAL_ENABLED + PARENT_DOC_{CHILD,PARENT}_SIZE_TOKENS.
Backfill ~3min and ~$0.20 — deferred to follow-up.

## #49 — Multimodal backfill
New scripts/backfill_multimodal_precedents.py with token-matching
case_number ↔ source files (PDF + DOCX via PyMuPDF). Ran in container:
26 precedents embedded, 503 pages, $0.21, 0 errors. precedent_image_embeddings
grew 3 → 29 rows. 44 remaining are style_corpus-migrated rows (no
source file on disk) — will catch up when re-uploaded.

## #50 — Closed-loop feedback + nDCG
Schema V18: search_logs + search_relevance_feedback. New telemetry.py
with fire-and-forget log_search_bg (p50 = 0.002ms — zero overhead) +
auto-infer_relevance_from_citations (reads case drafts → marks score=3
when cited precedent appears in past search top-K). Hooks added to 5
search paths. scripts/compute_ndcg.py for aggregation. Two admin API
endpoints (GET /api/admin/rag-metrics + POST .../infer). Dashboard UI
deferred — API is enough for now.

## #51 — Halacha quality monitoring
New scripts/monitor_halacha_quality.py — baseline avg confidence
(trusted=0.849, all=0.833, pending=0.694) with rolling window drift
detection. Default 5% threshold. Exits non-zero on alert for cron
integration. Recommended: `0 8 * * 1` weekly Mon 8am.

## Bonus: 230 unlinked citations → missing_precedents
Bulk-imported 230 distinct unlinked citations from
precedent_internal_citations to missing_precedents.status='open',
party='committee', with notes listing source citers. Top candidate:
ע"א 3213/97 (cited 5x). Total open missing_precedents now 237.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
2026-05-26 11:26:52 +00:00
ac3ed455cf fix(cases): בל"מ badge reads proceeding_type, not just appeal_subtype
All checks were successful
Build & Deploy / build-and-deploy (push) Successful in 43s
After the proceeding_type field landed, users started flipping cases
to בל"מ via the edit dialog. But the case-header badge + cases-table
filter were still gated on isBlamSubtype(appeal_subtype), so the badge
didn't appear when only the proceeding_type changed. Now the badge
shows when either proceeding_type === 'בל"מ' OR appeal_subtype is an
extension_request_* variant — the legacy path stays so existing rows
that never got a proceeding_type still render correctly.

Also regen types.ts from prod (proceeding_type now in OpenAPI schema)
and register the one-shot process_pending_blam.py script.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-05-26 09:34:23 +00:00
d359ab9884 feat(proceeding-type): explicit ערר/בל"מ field for cases + corpus
All checks were successful
Build & Deploy / build-and-deploy (push) Successful in 1m40s
Same case_number can exist as both a regular appeal (ערר) and an
extension-of-time request (בל"מ), and we were inferring the difference
from appeal_subtype prefixes — fragile, and case-number lookups
weren't disambiguated. Now stored as a first-class field on both
case_law (corpus) and cases (live cases), with partial unique indexes
on (case_number, proceeding_type).

- SCHEMA_V15: column + CHECK constraints + backfill from
  appeal_subtype LIKE 'extension_request_%' + partial unique indexes
  replace the old global UNIQUE(case_number).
- derive_proceeding_type() centralizes the inference rule
  (extension_request_* → בל"מ; subject regex fallback; default ערר).
- Metadata extractor prompt asks Claude to populate the new field
  explicitly; apply_to_record writes it for internal_committee rows.
- internal_decision_upload, case_create, case_update accept an
  optional proceeding_type; FastAPI request models expose it.
- Wizard + edit dialog get a sided Select; case header renders the
  resolved label (ערר / בל"מ).
- Uploaded the 2 staged בל"מ decisions on betterment levy:
  8126/24 (סופר נוח, 13 chunks), 8047/23 (הרנון, 48 chunks).

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-05-26 09:17:33 +00:00
f3cc9ca9d4 feat: Stage A finalizers + #35/#36/#37 — critical-gap closure
Some checks failed
Build & Deploy / build-and-deploy (push) Has been cancelled
Four parallel sub-agents closed the remaining critical gaps from the
26/05 Stage A/B sprint. Each block independently tested; aggregated here.

## #30/#31 finalizers (sub-agent A)
* Auto-derive practice_area in case_create from case_number prefix
  (1xxx→rishuy_uvniya, 8xxx→betterment_levy, 9xxx→compensation_197);
  default for CaseCreateRequest is now "" (the DB constraint catches
  any stray "appeals_committee").
* practice_area.py: derive_subtype now handles axis-B domain values
  (rishuy_uvniya/betterment_levy/compensation_197) without parsing the
  case number; new helper derive_domain_practice_area().
* Halacha re-extraction verified unnecessary — all 6 reclassified
  records already had is_binding=false and approved halachot.
* Regression tests: 6 cases in tests/test_corpus_constraints.py
  covering practice_area enum, internal-committee chair/district,
  external-upload arar prefix, MCP guard.
* UI: district input → Select dropdown (7 districts) in
  precedent-edit-sheet.tsx, preserving legacy free-text values.

## #37 בל"מ subtypes (sub-agent B)
* 3 new appeal_subtypes: extension_request_{building_permit,
  betterment_levy,compensation}. APPEALS_COMMITTEE_SUBTYPES extended,
  SUBTYPES_BY_AREA mappings added.
* New helpers: is_blam_subject(), is_blam_subtype(),
  derive_subtype_with_blam(case_number, subject, practice_area).
  case_create now uses it to auto-detect "בקשה להארכת מועד" subjects.
* 3 methodology templates under docs/methodology/extension-request-*.md.
* paperclip_client.py mapping updated for the 3 new subtypes
  (extension_request_building_permit→CMP, the other two→CMPA).
* Frontend: bilingual "בל"מ" badge + filter dropdown on cases list +
  detail header; appeal-type-bars collapseBlam() merges בל"מ into its
  parent domain for aggregate bars.
* Wizard auto-detects בל"מ from subject during case creation.
* 3 Berlinger cases (1017/1018/1019-03-26) migrated to
  appeal_subtype=extension_request_building_permit via psql.

## #35 missing_precedents feature (sub-agent C)
* Schema V13: missing_precedents table (citation, case_id, party,
  legal_topic, status, linked_case_law_id, claim_quote, ...) +
  FK constraints + 3 indexes. Applied via psql + idempotent migration.
* 6 db.py service functions, 3 MCP tools, 6 FastAPI endpoints
  (POST/GET/PATCH/DELETE/upload — upload routes by citation prefix
  to ingest_internal_decision or ingest_precedent).
* Next.js page /missing-precedents with 5 status tabs + filters +
  sidebar badge counter + detail drawer with metadata edit + smart
  upload form that switches fields per committee/court.
* Bootstrap: 7 rows imported from the JSON file
  (3 citations × cases, all status=closed with linked_case_law_id).
* legal-researcher.md: new §2ב.5 with missing_precedent_create
  usage + dedup semantics + tool grant.

## #36 legal_arguments aggregation (sub-agent D)
* Schema V14: legal_arguments + legal_argument_propositions M:M.
  Applied via psql.
* New service argument_aggregator.py with two functions —
  aggregate_claims_to_arguments() (Claude CLI / claude_session) and
  get_legal_arguments(). Graceful llm_unavailable handling when CLI
  is missing (containers).
* 2 MCP tools + 2 API endpoints (POST .../aggregate-arguments as
  BackgroundTask, GET .../legal-arguments).
* Frontend: shadcn Accordion + new legal-arguments-panel.tsx with
  hierarchical (party → priority badge → arguments) display, "טיעונים"
  tab on the case page, "חשב/חשב מחדש" buttons.
* scripts/backfill_legal_arguments.py + SCRIPTS.md entry — dry-run
  found 8 candidate cases including 1017/1018/1019.

## Open follow-ups (intentionally deferred)
* npm run api:types in web-ui (CLAUDE.md flow) — recommended before
  the next UI commit; not required for backend deployment.
* Run backfill_legal_arguments.py --apply once the container picks up
  the new aggregator service.
* webhook on missing-precedents upload-close to Paperclip (optional).

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
2026-05-26 08:34:40 +00:00
5f43659b5a fix: add defensive JSON parsing in check_instructions 2026-05-16 17:53:42 +00:00
86734da210 feat: add --check-instructions, pre-flight validation, and mtime tracking to sync script
- P3-T1: --check-instructions flag + check_instructions() prints a table of all
  agents' instructionsFilePath with status ( OK /  MISSING / ⚠ NOT SET),
  size, mtime, and ⚠ DRIFT when file has changed since last sync
- P3-T2: --apply now runs a pre-flight check on master agents and aborts if any
  instruction file is missing, before touching the DB or calling any API
- P3-T3: get_claude_md_mtime() helper; --apply stamps claude_md_mtime and
  claude_md_last_synced into each mirror agent's metadata via the PATCH call
- P3-T4: alias check-agents added to ~/.bashrc

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
2026-05-16 17:51:34 +00:00
45341a0bc8 feat(curator): switch Hermes Curator to DeepSeek V4-Pro via deepseek_local adapter
A/B test (2026-05-05) showed DeepSeek V4-Pro is 2-3x faster and ~20x cheaper
than Sonnet for style/lexicon pattern analysis, with comparable quality.
Adds adapters/deepseek-paperclip-adapter/ package, documents adapter requirements
(env injection, run-id headers), updates CLAUDE.md with adapter integration notes,
and records lessons from ערר 1200-25 (block order for 1xxx, "להלן מתוך" pattern,
expanded factual background, bridge planning analysis, flat heading structure).

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
2026-05-10 05:58:52 +00:00
1b14e04373 chore(skills): remove paperclip-dev, scope converting-plans-to-tasks
All checks were successful
Build & Deploy / build-and-deploy (push) Successful in 7s
paperclip-dev is for maintaining the Paperclip codebase itself — not
relevant to legal work. Removed from all 14 agents (was on CMPA mirror).

paperclip-converting-plans-to-tasks helps decompose a plan into assigned
issues. Useful for the planning-heavy agents (CEO, analyst). Now scoped
to those two — removed from the other 5 in CMPA where it had crept in.

Net effect: zero drift on paperclipai/* skills across all 7 master+mirror
pairs. Verified via the new Agents tab dashboard.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-05-04 17:47:05 +00:00
cf5f6fe274 feat(paperclip): close 11 integration gaps (#16-#28)
Brings the legal-ai ↔ Paperclip integration in line with the official
Paperclip skill. Net effect: HEARTBEAT.md -47% (370→195 lines), all 14
agents on uniform runtime_config + budget + instructionsBundleMode, and
two cross-company helpers replacing manual SQL.

Highlights:
- HEARTBEAT.md refactor: project-specific only, delegates to the official
  paperclipai/paperclip skill (loaded per agent). Adds heartbeat-context
  fast-path (§1.7) and PAPERCLIP_WAKE_PAYLOAD_JSON shortcut (§1.5).
- Issue Thread Interactions API: legal-ceo.md now uses
  ask_user_questions / request_confirmation / suggest_tasks instead of
  free-text comments — gives chair structured UI with idempotency keys.
- pc.sh + paperclip_api.pc_request: every API call goes through helpers
  that inject Authorization + X-Paperclip-Run-Id (audit trail).
- sync_agents_across_companies.py: master(CMP)→mirror(CMPA) sync via
  Paperclip API, idempotent, with --verify and --apply modes.
- skills/new-company-setup: 11-step blueprint distilling all 11 gaps
  into a single onboarding runbook for the next company.
- .taskmaster: 12 tasks covering each gap (one already closed: #29).

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-05-04 17:25:45 +00:00
8a815ecff5 fix(retrieval): rewrite chunk-page retrofit to skip OCR
All checks were successful
Build & Deploy / build-and-deploy (push) Successful in 16s
The first-pass retrofit re-extracted via extractor.extract_text, which
re-runs Google Vision OCR on scanned pages. OCR is non-deterministic,
so the new text didn't match the chunk content stored in the DB
(produced by the original OCR run) — only ~7% of chunks were located.

New approach (no OCR cost):

1. Use the stored documents.extracted_text from the DB — the exact
   text the chunks were produced from, so chunk lookups match.
2. Anchor page boundaries via PyMuPDF direct text reads (free, no
   OCR). Pages with usable direct text are anchored by snippet match;
   OCR-only pages are linearly interpolated between anchors.
3. Search each chunk in extracted_text using a whitespace-tolerant
   helper — needed because the chunker joins paragraphs with single
   '\\n' while extracted_text uses '\\n\\n' as page separators.

Verified on 8174-24 (5 docs, 307 chunks) + 8137-24 (9 docs, 512
chunks): 100% chunks tagged, 13s total, $0 cost.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-05-03 20:04:33 +00:00