legal-ai

Author	SHA1	Message	Date
chaim	a1db283ce1	Merge pull request 'fix(extraction): self-heal לתור חילוץ-ההלכות + drainer מתוזמן' (#142 ) from worktree-halacha-selfheal into main All checks were successful Build & Deploy / build-and-deploy (push) Successful in 1m41s Details	2026-06-08 06:05:27 +00:00
Chaim	97ede1a49d	fix(extraction): self-heal stale halacha 'processing' rows + scheduled drainer The halacha extraction queue was stuck (same class as the metadata issue): 26 precedents requested extraction with no drainer, plus 1 orphaned in 'processing' (status=processing, requested_at cleared → never re-picked by the queue). - db.requeue_stale_processing_extractions(kind): re-stamp orphaned 'processing' rows (requested_at IS NULL) so they re-drain; halacha extractor force=False resumes from chunk checkpoints (no duplicates). - process_pending_extractions calls it at the top — fully unattended, safe under the global advisory lock. Mirrors the digests-drain self-heal. - legal-halacha-drain.config.cjs: pm2 cron (every 2h, conservative — Claude is slow/rate-limited and each run adds to the chair's pending_review queue). drain_halacha_queue.py stays on claude_session (high reasoning quality for holding/ratio; NOT moved to Gemini). SCRIPTS.md. The chair-approval gate (INV-G10) is untouched — this only produces halachot; Daphna still approves each in /approvals. Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>	2026-06-08 06:04:53 +00:00
Chaim	83d1a8253c	feat(digests): digest_kind classification — robust extraction for all issue types (X12) ~2% מגיליונות "כל יום" הם לא-הכרעות (עדכוני-חקיקה/הודעות/ברכות) ללא ruling → החילוץ ה-decision-centric החזיר ריק → both-empty → מחזורי ב-self-heal. - SCHEMA_V32: `digest_kind` (decision/announcement/other) + backfill legacy בזול (יש citation→decision, אחרת announcement) — לפני שה-self-heal מסתמך עליו. - extractor: prompt מסווג + מחלץ תמיד concept/headline/summary; underlying_* רק ל-decision. extract מנרמל digest_kind. - enrich: שומר digest_kind; חילוץ מוצלח תמיד מסתיים ב-kind לא-ריק (ברירת-מחדל לפי citation אם המודל השמיט). - drain self-heal: הגדרת-כשל = completed עם digest_kind='' (במקום both-empty) → הודעות לא מנוסות-מחדש לנצח. - db: digest_kind ב-_DIGEST_COLS + update-whitelist (זורם ל-search/list/API). - X12 spec: תיעוד digest_kind + הגדרת-הכשל המתוקנת. אומת: V32 סיווג 533 (525 decision + 8 announcement, 0 unclassified — self-heal לא נוגע בהם). extract: 5163→decision+citation · 5060→announcement+concept, citation ריק (לא both-empty). Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>	2026-06-08 06:02:08 +00:00
Chaim	d95a36f310	feat(extraction): precedent metadata via Gemini Flash + scheduled drainer The /precedents metadata queue was stuck — 24 rows requested, nothing draining them — and the agentic claude CLI hit error_max_turns on what is a single structured text→JSON task (slow + flaky). Metadata extraction is bounded extraction, the wrong fit for an agentic loop. - gemini_session.py: query_json drop-in (gemini-2.5-flash, JSON mode, httpx — no new SDK dep). Reads GEMINI_API_KEY (~/.env; SoT Infisical nautilus:/external-apis/gemini). Host-side only — no LLM from the container. - precedent_metadata_extractor: claude_session.query_json → gemini_session. Validated live: rich, accurate fields (case_name/summary/appeal_subtype/tags). - process_pending_extractions: kind-aware cooldown — metadata 2s (Gemini, fast), halacha keeps 30s (Claude rate limits). - drain_metadata_queue.py + legal-metadata-drain.config.cjs (pm2 cron */15) so the queue never clogs again. SCRIPTS.md. - X8 INV-FP5 updated: per-task engine choice (Gemini=bounded metadata, claude_session=agentic halacha), both host-side, single canonical queue (G2). Agentic/voice-sensitive work (writing, analysis, halacha) stays on claude_session (Daphna's subscription). Gemini cost ≈ $0.10/1M tokens — negligible. Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>	2026-06-08 05:13:49 +00:00
Chaim	da4ebeb724	feat(halacha): panel safety-net audit (selective-prediction monitoring) Periodic safety net for the multi-judge approval panel: samples panel-approved halachot, re-runs the same 3-judge KEEP vote, and surfaces any that now lean DROP — candidate false-keeps a human should glance at. Report-only by default; --flag reopens flips to pending_review. Baseline 0/15 on the 2026-06-07 batch. Closes the loop the literature prescribes (Trust-or-Escalate / selective prediction): monitor the auto-decision error rate rather than trusting it blindly. Reuses halacha_panel_approve's judges (single source of truth). Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>	2026-06-08 05:01:03 +00:00
Chaim	b5f7b60fb5	fix(digests): self-heal stale 'processing' rows in drain (fully unattended) drain_digests רץ תחת flock (drainer יחיד), אז כל שורה 'processing' בתחילת ריצה היא שריד מריצה קודמת שנקטעה באמצע-שורה (סשן/מכסה). מאפסים אותה ל-'pending' לריצה חוזרת — סוגר את הפער האחרון ל-resume אוטומטי מלא ללא התערבות. Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>	2026-06-08 04:52:55 +00:00
Chaim	dba2a131e0	feat(halacha): multi-judge approval panel + policy calibration (Trust-or-Escalate) The chair cannot review every pending halacha. Three independent-lineage judges (Opus via claude_session · DeepSeek · Gemini-2.5-flash — #1 on LegalBench) vote on the COARSE axis we proved reliable across models (92%): "is this a genuine, keepable rule?". Only an agreed verdict acts; every split escalates to the chair (INV-G10). Buckets: clean→KEEP?; nli_unsupported→entailment re-adjudication; extraction-defects→re-extraction. halacha_panel_calibrate.py calibrates the voting policy on the gold-set's is_holding (the coarse label) per Trust-or-Escalate (ICLR 2025): unanimous → 94.9% precision / 78% coverage; majority → 92.9% / 99%; ZERO false-drops in both (the panel never rejects a good rule). Chosen policy (chair-approved): clean→majority-2/3, nli→asymmetric (majority-reject, unanimous-approve), defects→re-extraction. Reversible (--apply backs up review_status+flags first). Sources: Panel-of-LLM-Evaluators (PoLL) · Trust-or-Escalate (ICLR 2025, arXiv:2407.18370) · selective-prediction / learning-to-defer. Invariants: upholds G10 (human gate — splits escalate, panel only collapses the queue) and G9 (provenance — reviewer records the panel + policy). Read paths only in calibrate; --apply writes review_status/quality_flags reversibly with backup. Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>	2026-06-07 21:11:30 +00:00
Chaim	360f49d8b4	docs: record Infisical SoT for host-service shared secrets COURT_FETCH_SHARED_SECRET + LEGAL_CHAT_SHARED_SECRET migrated to Infisical nautilus:/legal-ai (2026-06-07). Updated the pm2 config comments: the stale "migrate to Infisical once the MCP server is back" TODO is now done; local env files remain the runtime source, Infisical is the SoT/record. Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>	2026-06-07 21:04:44 +00:00
Chaim	3ae183009f	feat(digests): self-heal in drain_digests — auto-resume after quota/interruption ה-cron של drain_digests הוא מנגנון ה-resume (pending-based, idempotent, host-side, לא תלוי בסשן). חיזוק: אם enrich נכשל באמצע (מכסת claude נגמרה) השורה נשארה 'completed' עם שדות ריקים → לא היתה מטופלת שוב. עכשיו drain מאפס בתחילתו כל digest 'completed' עם concept_tag ריק וגם underlying_citation ריק (= חילוץ שמעולם לא נחת; שורה תקינה תמיד מכילה לפחות מראה-מקום) → pending לריצה חוזרת. כך כל קטיעה/מכסה מתאוששת אוטומטית בריצת ה-cron הבאה. Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>	2026-06-07 20:59:49 +00:00
Chaim	c1abf2ec0e	feat(digests): scripts/drain_digests.py — local enrichment drainer for cron (X12) ריקון תור ההעשרה של יומונים מקומית (claude_session local-only): כל digest 'pending' → enrich_digest (Sonnet + embedding + autolink). מקבילי (3), idempotent, מוסיף ~/.local/bin ל-PATH (claude CLI תחת cron). מיועד ל-cron יומי אחרי ה-poll של n8n (flock למניעת חפיפה) + שימוש ידני אחרי backfill. SCRIPTS.md עודכן. Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>	2026-06-07 20:40:45 +00:00
chaim	6468e151d9	Merge pull request 'refactor(digests): single source of truth — drop processed/ folder state (X12)' (#122 ) from worktree-digests-single-truth into main All checks were successful Build & Deploy / build-and-deploy (push) Successful in 28s Details	2026-06-07 20:33:43 +00:00
Chaim	fb40ec8565	refactor(digests): single source of truth — drop processed/ folder state (X12) ה-DB (`digests`) הוא מקור-האמת היחיד למצב-קליטה. ingest_digests_batch.py העביר קבצים incoming→processed/ — state מבוסס-תיקיות מקביל ל-DB (הפרת-G2 קטנה). - הוסר ה-move ל-processed/ + import shutil + PROCESSED. הסקריפט מסתמך על dedup ב-content_hash (ingest_digest מחזיר 'exists' לקיימים) → הרצה חוזרת בטוחה. - תיקיות (incoming/) = staging בלבד, לא state. - X12 INV-DIG2: תועד מקור-אמת-יחיד + ההפרה-שתוקנה (processed/). - SCRIPTS.md עודכן. Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>	2026-06-07 20:33:18 +00:00
Chaim	f4f110f0d1	feat(X13): scheduled drain — fully-autonomous digest→fetch→ingest loop - scripts/drain_court_fetch.py: drives orchestrator.drain_pending (host-only; no-op when queue empty). Mirrors drain_halacha_queue.py. - scripts/legal-court-fetch-drain.config.cjs: pm2 cron (hourly :17, one-shot), COURT_FETCH_DRAIN_CRON override. - fix: orchestrator default service URL 127.0.0.1 → 10.0.1.1 (the service binds the docker0 gateway; the host can't reach it on loopback). Found live — the first drain failed "connection refused" until corrected. - SCRIPTS.md entries. Validated end-to-end in PRODUCTION on a real digest: עת"מ 43830-12-24 (החברה להגנת הטבע) fetched from נט המשפט → case_law (79 chunks, source_url), digest relinked (INV-DIG3 closed), halacha queued pending_review. job=done. Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>	2026-06-07 20:31:53 +00:00
Chaim	808c2e4c46	feat(goldset): independent second-judge for rule_role (break AI-anchoring) The gold-set's human role tags were made while seeing a claude AI recommendation, so human↔AI agreement (~100%) is anchoring, not an independent accuracy signal. This adds a third, genuinely independent judge — a DIFFERENT model (DeepSeek, direct OpenAI-compatible API) classifies rule_role BLIND (never sees the human tag nor the first AI's answer) — and reports an inter-rater agreement matrix. Finding (100 tagged items): ai↔human 100% (anchored) vs deepseek↔human 50% fine-grained — BUT 92% on the coarse axis (generalizable-rule vs application/ obiter). Conclusion: the fine sub-type (holding/interpretive/procedural) is an inherently fuzzy boundary two capable models split differently; the coarse "is this a real rule" axis is robust across models. Use the coarse axis as ground truth; treat the sub-type as advisory, never as a gate. Zero chair tagging, read-only on the gold-set. Key from ~/.hermes deepseek env. Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>	2026-06-07 20:12:58 +00:00
Chaim	e186183527	fix(X13): harden court-fetch against browser leaks + reaper for task-master-mcp leak שלוש שכבות-הגנה נגד דליפת-זיכרון מדפדפנים יתומים, + טיפול בדליפה הגדולה בפועל בשרת (task-master-mcp). - camofox_client.py: - asyncio.wait_for קשיח סביב כל ה-fetch (COURT_FETCH_HARD_TIMEOUT_S=180ש') — hang → ביטול → async-with tear-down → reap. - _reap_orphan_browsers(): הורג camoufox-bin יתומים (ppid=1) לפני ואחרי כל fetch. סדרתיות (INV-CF4) → כל ppid=1 הוא שארית בטוחה. - scripts/reap_orphan_procs.py: reaper כללי ל-task-master-mcp (~3GB יתומים) + camoufox-bin. רק ppid=1; /proc טהור. --dry-run / --loop N. - scripts/legal-reaper.config.cjs: דמון pm2 (loop 180s, max_memory_restart 100M). - X13 spec + SCRIPTS.md: תיעוד שכבות-ההגנה. max_memory_restart בשירות (1.5G) כבר נותן רשת-ביטחון ברמת-התהליך. Invariants: מקיים INV-CF4 (politeness/serial) — ללא שינוי חוזה. Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>	2026-06-07 19:43:53 +00:00
Chaim	781f24c643	feat(X13 Tier-1): calibrate נט המשפט fetch — Camoufox python, proven on 46111-12-22 אומת end-to-end: פס"ד 34 עמ' של עת"מ 46111-12-22 הורד אוטונומית מלא, נטו קוד-פתוח, ללא כרטיס-חכם וללא פתרון-CAPTCHA. ממצאי-כיול עיקריים: - החיפוש+הניווט-לתיק ללא reCAPTCHA כלל. reCAPTCHA קיים רק בצופה ורק על שמירה/הדפסה מפורשת — לא על הצגת המסמך. - הצופה מגיש עמודים כ-PNG דרך PageMethod GetImages (4/batch); משיכה ב-fetch עם הכותרת X-Requested-With: XMLHttpRequest (חובה — F5 WAF חוסם בלעדיה) → הרכבת PDF (Pillow). שינויים: - camofox_client.py: שכתוב מלא — Camoufox דרך חבילת-הפייתון (in-process, לא שרת-Node REST). מסלול מכויל: home→btnExternalSearchCases→Bama fields→ CaseDetails→פסקי דין→DecisionList→NGCSViewerPage→GetImages→PDF. - pm2 config: app Xvfb :99 + DISPLAY=:99 (Camoufox קורס headless בלי צג וירטואלי). - pyproject: extra [court-fetch] = camoufox + faster-whisper (host-only; הקונטיינר לא מריץ דפדפן). Pillow כבר בבסיס. - X13 spec + SCRIPTS.md: עודכנו לממצאים (image-API, Xvfb, אימות). reCAPTCHA audio (Whisper) נשמר כ-fallback למסלול-השמירה-המפורש בלבד; המסלול הראשי אינו זקוק לו. Invariants: מקיים INV-CF1/CF4/CF6 (ללא שינוי). Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>	2026-06-07 19:32:13 +00:00
chaim	f3740fef68	Merge pull request 'fix(halacha): split authority (derived) from rule_role — stop source-conflation (INV-DM7)' (#112 ) from worktree-halacha-authority-split into main All checks were successful Build & Deploy / build-and-deploy (push) Successful in 1m32s Details	2026-06-07 18:19:43 +00:00
Chaim	2e33cac043	fix(halacha): split authority (derived) from rule_role — stop source-conflation (INV-DM7) The extractor classified rule_type by SOURCE bindingness (higher-court→binding, committee→persuasive) instead of by rule KIND. The gold-set proved it: 'binding' appeared on 19/19 external rulings & 0 committees; 'persuasive' on 13/13 committees & 0 external — only 58% agreement with the human role tags. The two axes (authority vs rule role) were crammed into one enum. This splits them per INV-DM7: - authority (binding/persuasive) — DERIVED from case_law.precedent_level (עליון/מנהלי→binding, ועדת_ערר_מחוזית→persuasive), never stored, never LLM-guessed. New helper halacha_quality.derive_authority; surfaced read-only in list_halachot / goldset_list / search results. - rule_type — now the rule ROLE only: holding/interpretive/procedural/ application/obiter. Both extractor prompts unified to this vocabulary; _coerce_halacha no longer defaults rule_type from the source; legacy binding→holding / persuasive→interpretive fold for safety. UI: authority shown as a separate read-only badge (gold=מחייב / muted=משכנע) across the review queue, precedent detail, and gold-set; the gold-set role selector drops binding/persuasive and adds מהותי (holding). Migration: scripts/halacha_rule_role_backfill.py re-classifies the 276 pre-split binding/persuasive rows into a genuine role via local claude_session (run after deploy). Gold-set correct_type/ai_correct_type 'binding'→'holding' via SQL. Sources (≥3, per research-decision policy): OASIS LegalRuleML v1.0 (appliesAuthority/Strength as metadata orthogonal to rule logic) · SemEval-2023 Task 6 LegalEval (rhetorical roles by function, authority kept separate) · Bluebook signals (weight-of-authority is a separate dimension). Invariants: ESTABLISHES INV-DM7. Upholds G1 (normalize at source — extractor classifies role, system derives authority) and G2 (single source of truth — authority derived, not a parallel stored field). Tests: 211 pass + new derive_authority/coerce coverage. web-ui build + tsc clean. Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>	2026-06-07 18:18:41 +00:00
Chaim	0990db7a3c	feat(X13): auto-fetch court verdicts from נט המשפט → corpus (Tier 0 + scaffold) תת-מערכת אחזור-פסיקה אוטומטי: כשיומון מצביע על פס"ד בית-משפט, מסווגים את הערכאה, מורידים מהמקור הציבורי המתאים, וקולטים דרך צינור-הקליטה הקנוני. - spec-first: docs/spec/X13-court-fetch.md (INV-CF1..CF7) + אינדקס - מסווג court_citation.py (supreme/admin/skip) + 10 בדיקות (עת"מ 46111-12-22 → admin) - Tier 0: court_fetch_supreme.py — supremedecisions API (reverse-engineered), httpx + browser-headers (אומת 200) + politeness - תור court_fetch_jobs (SCHEMA_V30) + DB helpers + court_fetch_orchestrator.py - Tier 1 scaffold: legal-court-fetch-service (aiohttp+Bearer, מראת legal-chat-service) + camofox_client (Camoufox open-source) + recaptcha_audio (Whisper מקומי) + pm2 - Tier 2 fallback חינני: manual + missing_precedent (INV-CF2/CF3 — אין drop שקט) - כלי-MCP court_verdict_fetch / court_fetch_status; SCRIPTS.md Invariants: מקיים G2 (מסלול-קליטה יחיד, INV-CF1) · G3/G1 (idempotent+נרמול, INV-CF5) · G4/§6 (אין בליעה שקטה, INV-CF2) · G10 (שער-אנושי, INV-CF3) · G5 (source_type, INV-CF6) · G9 (provenance+audit, INV-CF7). מקורות INV-CF4: RFC 9309 · Google crawler · OWASP OAT. Follow-ups (טרם אומתו חי): live Tier-0 validation · התקנת camofox-browser+whisper · כיול selectors Tier-1 · COURT_FETCH_SHARED_SECRET (Infisical+Coolify) · טריגר מ-digest try_autolink (worktree-digests-radar). V30 עלול להתנגש עם digests-radar. Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>	2026-06-07 18:12:13 +00:00
Chaim	8171572cdd	feat(digests): קורפוס יומונים כשכבת-גילוי (radar) — X12 מאגר חדש ליומוני "כל יום" (עפר טויסטר) כשכבת-גילוי מעל קורפוסי-הפסיקה: מקור-משני המצביע על פסק הדין המקורי, נקלט לטבלה נפרדת `digests`, נחפש סמנטית, ומקושר לפסק המקורי בספריית הפסיקה — אך לעולם אינו מצוטט בהחלטה ואינו מחלץ הלכות. Phase 0 (spec): - docs/spec/X12-digests-radar.md — INV-DIG1 (מצביע לא מצוטט) / INV-DIG2 (מסלול-קליטה נפרד, לא מקביל — מקיים G2) / INV-DIG3 (קישור-לפסק הוא הגשר; חוסר-קישור = פער גלוי). עדכון אינדקס 00/03/README. Phase 1 (MVP): - SCHEMA_V30: טבלת `digests` (HNSW על embedding — לא ivfflat, להימנע מ-recall cliff בקורפוס קטן/צומח) + GIN/FTS + UNIQUE חלקי ל-idempotent. - services/digest_metadata_extractor.py — חילוץ-LLM (claude_session local-only, ייבוא lazy): תג-מושג, כותרת-הלכה, מראה-מקום, שני-תאריכים מובחנים, תגיות. - services/digest_library.py — מסלול קצר עצמאי (INV-DIG2): extract→hash→LLM→ embedding יחיד→autolink. לא משתמש ב-ingest.ingest_document. - tools/digests.py + רישום 7 כלים ב-server.py (digest_upload/list/get/link/ relink/delete + search_digests). - scripts/ingest_digests_batch.py — קליטה ידנית מ-data/digests/incoming. - legal-researcher.md: שלב 2ב.0 (סריקת-radar לפני אימות) + סעיף-דוח ט + 3 כלים ב-frontmatter. HEARTBEAT §8: ניתוב יומון→digest_upload. אומת end-to-end: 4 יומונים נקלטו (מטא-דאטה מדויק), חיפוש סמנטי מדרג נכון ("היטל השבחה"→5160, "תמא 38"→5158), link/relink/autolink/revert + מעטפת-MCP. Invariants: מוסיף INV-DIG1/2/3 (X12). מקיים G2 (bounded context נפרד, לא מסלול מקביל), G3 (idempotent upsert), G4 (אין בליעה שקטה — פער-קישור מוצף), G9 (עקיבוּת — היומון מצביע על מקור עקיב). נוגע G7 (RRF) — נדחה, חיפוש סמנטי-בלבד בשלב 1 (FTS index מוכן). Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>	2026-06-07 17:49:00 +00:00
Chaim	0e35060d3d	feat(goldset): AI second-opinion per item (QA aid) — compare vs human tag The chair wanted an independent recommendation beside each tag, to reconsider his own judgments. Adds a NON-ground-truth AI second-opinion: - schema: halacha_goldset.ai_is_holding / ai_correct_type / ai_rationale / ai_generated_at (additive). - db.goldset_set_ai_recommendation + goldset_list now returns the ai_* fields. - scripts/goldset_ai_recommend.py — local claude_session judges is_holding + type + a one-line rationale per item, INDEPENDENTLY (own legal rubric). Independent of the rule-based validators #81.8 measures → no circularity. Never auto-applied; QA aid only. - web-ui: each card shows "🤖 המלצת AI: הלכה/לא · type" + rationale and an agreement/disagreement chip vs the human tag (amber on disagree); a "⚠ אי-הסכמות AI (N)" filter to review only the conflicts. Methodology note kept explicit: the human stays the ground truth; the AI is a prompt to reconsider, not to copy. Verified: tsc --noEmit 0; generator stores recs and flags disagreements with existing human tags. Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>	2026-06-07 14:24:35 +00:00
Chaim	b7b44f4453	feat(halacha): equivalent-halacha (parallel-authority) links across precedents Cross-precedent recurrence of a principle is real but is NOT citation corroboration (X11) — the 5 candidate pairs have ZERO citations between their precedents. Recording them in halacha_citation_corroboration would fabricate citation data and inflate corroboration_count. This adds a proper, separate halacha-level link for parallel authority. Schema (V28): equivalent_halachot — symmetric (halacha_a < halacha_b, CHECK + UNIQUE), non-citation, cross-precedent-only. ON DELETE CASCADE. db.py: - link_equivalent_halachot (idempotent; rejects same-id and SAME-precedent pairs — parallel authority is cross-precedent by definition), unlink, and list_equivalent_for_halacha. - list_halachot gains include_equivalents → _annotate_equivalents attaches an `equivalents` list (both directions) per row. API: include_equivalents on GET /api/halachot; GET/POST/DELETE /api/halachot/{id}/equivalents for the chair to view/link/unlink manually. scripts/halacha_batch_reconcile.py: --link records found cross-precedent pairs as equivalent_halachot (non-destructive, idempotent). web-ui: Halacha.equivalents type; the clean review queue fetches include_equivalents; the review card shows a gold "עיקרון מקביל ב-N" badge + an expandable list (case + rule + similarity) labeled "אסמכתה מקבילה — לא ציטוט". Populated the 5 reviewed pairs (chair decision: keep all + link as parallel authority). Verified: 5 rows; the 1023-20 hub annotates 3 of its halachot with equivalents; tsc --noEmit exits 0. Invariants: G1 (model recurrence at source in its own table, not by abusing the citator); G2 (no parallel path — extends list_halachot); citator integrity preserved (corroboration stays citation-only). Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>	2026-06-06 21:29:46 +00:00
Chaim	1286a1e60d	feat(halacha): application gate + lexical dedup tail + quality harnesses (#81,#82) Halacha-extraction quality (#81) and dedup-on-insert (#82) — engine changes (pure + tested) plus measurement/ops tooling. halacha_quality.py - #81.4 application gate: is_fact_dependent() (high-precision "applied to THIS case" deixis per the strict rubric §3/§27) + FLAG_APPLICATION. compute_quality_flags now takes rule_type and flags rule_type=='application' OR fact-dependent — blocking auto-approve (an illustration is not a generalizable holding). - #82.3 lexical tail signal: jaccard_shingles / normalized_levenshtein / lexical_near_duplicate + FLAG_NEAR_DUPLICATE, for the 0.83–0.93 cosine band. halacha_extractor.py — pass rule_type to the flag computation; re-type a binding-labeled fact-application to 'application' (mirrors non_decision→obiter). db.py (store_halachot_for_chunk) — dedup now fetches the nearest same-precedent neighbor once: cosine ≥ DEDUP → skip (unchanged); cosine in [BAND, DEDUP) with high lexical overlap → FLAG_NEAR_DUPLICATE (review, not skip — never drop a possibly-distinct principle unreviewed). config.py — HALACHA_DEDUP_BAND_COSINE (0.83). Scripts: - scripts/halacha_goldset.py (#81.7) — export stratified sample for human tagging; score validators (P/R/F1) against the tags. Backbone for #81.8. - scripts/halacha_batch_reconcile.py (#82.7) — conservative cross-precedent dedup (cosine ≥0.95), dry-run report only. - scripts/calibrate_halacha_dedup.py (#82.1) — calibrate the lexical thresholds against the 2026-06-03 cleanup gold-set. Deferred (documented): #82.4 merge-provenance and #82.5 DB ON CONFLICT/UNIQUE on normalized quote are NOT included — the current skip+flag behavior is safe, whereas a UNIQUE on normalized_quote would fail on existing dups and a blind merge risks losing provenance; they need their own chair-reviewed migration. #82.6 over-merge guard is moot until merge lands. #81.6 full rhetorical-role classifier deferred (section pre-filter + application flag cover the practical case); #81.8 blocked on the human-tagged gold-set (harness now provided). Verified: - pytest tests/test_halacha_quality.py — 52 passed (14 new). - calibrate: configured (0.55,0.70) → precision 1.0 (zero false-merge), recall 0.30 — correct profile for an auto-approve-blocking signal. - goldset export: 15-row sample CSV. batch reconcile: 819 halachot → 5 cross-precedent candidate pairs. Invariants: G1 (normalize at source — flag at insert, not at read); §6 (no silent swallow — suspect items flagged to review, never dropped); G2 (no parallel path — same store_halachot_for_chunk / compute_quality_flags). Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>	2026-06-06 19:55:45 +00:00
Chaim	fb51a0e869	feat(nevo): backfill leaked preamble + ratio gold-set benchmark (#86 ) #86.2 backfill + #86.3 benchmark, plus a #86.1 over-strip fix found en route. extractor.py - extract_nevo_ratio(): capture Nevo's מיני-רציו block (editorial holdings summary) before it is stripped — a free professional gold-set (#86.3). - _DECISION_START hardening (#86.2): the merged #86.1 regex over-stripped. (a) פסק-דין headers are markdown-wrapped (פסק דין); the old anchor required the keyword as the first line char with one separator, so it missed the header and matched a citation 32K deep (עמ"נ 50567-07-21, losing 45% of the body). Now tolerates leading markdown + 0-3 seps, and the final-nun form (דין ן vs דינו נ). (b) bare השופט/הנשיא matched CITATIONS ("השופט מ' חשין, פסקה 23"). The authoring-judge line ends with a colon; we now require it. ingest.py - capture the ratio before stripping and store it on the row (best-effort, non-fatal); also strip the text-upload path (was file-only). db.py - add case_law.nevo_ratio column (additive); allow it in update_case_law. scripts/backfill_nevo_preamble.py (#86.2) — dry-run-by-default data migration: finds historically-leaked rulings, captures ratio→nevo_ratio, rewrites full_text (+content_hash), reindexes, and FLAGS (never deletes) halachot whose quote lives in the removed preamble (review_status=pending_review + nevo_preamble_leak flag). Safety guard: rows with keep%<--min-keep (60) are excluded from --apply as suspected over-strip. --apply writes backup+manifest to data/audit/ first. Chair-gated — NOT applied here. scripts/nevo_ratio_benchmark.py (#86.3) — LLM-as-judge (local claude_session, zero cost) measures recall/precision/granularity of our halachot vs the Nevo ratio. Works pre- and post-backfill (reads nevo_ratio, falls back to full_text). Verified: - pytest tests/test_nevo_preamble.py — 12 passed (incl. citation/markdown over-strip regressions). - backfill dry-run: 19 leaked rulings, 27 contaminated halachot, all ≥75% keep (the 32K over-strip is gone). - benchmark on בג"ץ 1764/05: recall=0.875 precision=1.0 granularity=1.75x. Invariants: G1 (normalize at source — strip/capture at ingest, not at read); no silent swallow (contaminated halachot flagged + reported, not dropped); data-migration is dry-run-default with backup+manifest, chair-gated. Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>	2026-06-06 19:45:43 +00:00
Chaim	2e20e27e17	feat(style-acq T1-T3): קורפוס-דוגמאות של דפנה לכותב (style_exemplars) ממלא את ערוץ-הדוגמאות (B) של מערכת רכישת-הסגנון: הכותב מאחזר פסקאות-בלוק אמיתיות של דפנה בזמן כתיבה, ממוקדות section+outcome+practice_area. T1 — תשתית + backfill: - SCHEMA_V27: טבלת style_exemplars (purpose-built — בלי תיקים מזויפים בשרשרת decision_paragraphs). decision_number/source/section/outcome/practice_area+embedding. - db: insert/delete/search_style_exemplars + count_style_exemplars. - scripts/backfill_style_exemplars.py: מפצל קורפוס דפנה (style_corpus + internal_committee) לסעיפים→פסקאות, embed, שמירה. אידמפוטנטי, dry-run/apply. T2 — אחזור ממוקד: - search_style_exemplars(section, outcome, practice_area) — section=hard filter, outcome/practice_area=soft. block_writer._build_precedents_context ממפה block→section ומאחזר (ראשי), לצד הנתיב הישן (משלים). T3 — contrastive/adapt: - הדוגמאות מתויגות "מבנה/קול בלבד — התאם, אל תעתיק תוכן"; פסקה מלאה (1100 תווים). INV-LRN5 (טוהר — סגנון בלבד). G11. הרצת backfill --apply בנפרד. Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>	2026-06-06 18:10:01 +00:00
Chaim	701efab726	feat(mcp): FU-14 GAP-51 — איחוד אוצר-המילים של תוצאת-תיק (set_outcome SSoT) הכרעת-יו"ר: קנוני = 3 תוצאות אמיתיות (rejection/partial_acceptance/full_acceptance); betterment_levy יוצא מהיותו "תוצאה" ועובר ל-override לפי practice_area. + עקרון "אנגלית-ב-DB, עברית-ב-UI": מפת-תוויות SSoT אחת. lessons.py: - VALID_OUTCOMES = 3 (הוסר betterment_levy). - OUTCOME_LABELS_HE (SSoT לתצוגה) + LEGACY_OUTCOME_MAP + canonical_outcome(). - PRACTICE_AREA_OVERRIDES["betterment_levy"] מרכז את כל ה-guidance שהיה מפתוח כ-outcome (golden_ratios/opening/summary/discussion/template). - get_lessons_for_outcome(outcome, practice_area) + format_ratios_comment(..., practice_area) מחילים override + מנרמלים legacy. block_writer.py: STRUCTURE_GUIDANCE קנוני + תווית מ-OUTCOME_LABELS_HE + override betterment. workflow.set_outcome: קנוני 3 + מיפוי-legacy סלחני; תווית מ-SSoT. drafting.py: טבלת יחסי-זהב + get_decision_template מודעי-practice_area (override). web-ui case.ts: הסרת betterment_levy מ-expectedOutcomes (הוא practice_area). server.py: docstrings קנוניים. מיגרציה: migrate_gap51_outcomes.py — 9 שורות נורמלו (rejected→rejection וכו'), גיבוי ב-data/audit/. הקוד canonicalize בקריאה ⇒ backward-compatible גם בלי מיגרציה. אומת: py_compile (5 קבצים) + בדיקות-יחידה offline (override/legacy/labels) + אימות-DB. עודכנו X9 §3 + gap-audit (GAP-51 ✅). Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>	2026-06-06 15:34:49 +00:00
Chaim	7f4e036211	feat(spec): חיבור ספ-המערכת למסלול-הכתיבה האינטראקטיבי (אכיפה 3-שכבתית) הספ (docs/spec/, G1–G11) חובר לסוכני Paperclip דרך INV-AG1 אבל לא למסלול שבו רוב הקוד נכתב בפועל — הסשן האינטראקטיבי של Claude Code. סוגר את הפער לפני מחזור-2 (FU-9..15), שהוא כולו כתיבת-קוד. שלוש שכבות אכיפה: 1. תיעוד — CLAUDE.md §"פרוטוקול כתיבת-קוד" + docs/spec בטבלת-הייחוס 2. hook — scripts/spec-guard.sh (PreToolUse על Edit/Write/MultiEdit, רשום ב-.claude/settings.json) מזכיר פעם-בסשן בכל נגיעה בקובץ-קוד; non-blocking 3. PR — .gitea/PULL_REQUEST_TEMPLATE.md עם סעיף-חובה "Invariants" המקבילה האינטראקטיבית ל-INV-AG1 שכבר אוכף על הסוכנים (HEARTBEAT §"קריאת-ספ"). Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>	2026-06-06 13:28:15 +00:00
Chaim	434341cc29	chore(#57 ): re-chunk+re-embed legacy precedents (pre-#55 chunker remediation) Adds scripts/rechunk_legacy_precedents.py: selects every case_law with a tiny chunk (content<50 — the pre-fix chunker fingerprint) and runs ingest.reindex_case_law (re-chunk+re-embed from stored full_text only, no re-OCR/LLM, idempotent). Batch-idempotent (re-queries the affected set). Run result (2026-06-03): 73 precedents reindexed, 0 failed. Tiny chunks 483 -> 4 (99.2%); total precedent_chunks 5019 -> 3115 (fragments merged). Search verified healthy (substantial coherent passages, no errors). The 4 residual tiny chunks are isolated section headings ('דיון', 'טענות המשיבים', ...) emitted by the CURRENT (fixed) chunker — not legacy fragments — and are already filtered at query time (>=50, #55). Minor chunker edge case, candidate #55 follow-up. The DB chunk migration is already applied to prod; this commit is the script + SCRIPTS.md entry only (no app code change, no deploy needed). Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>	2026-06-03 07:55:42 +00:00
Chaim	887079535c	feat(spec): X11 citation-corroboration + INV-G10 amendment + Opus 4.8 halacha extraction ספ חדש לשכבת citator פנימית — תיקוף הלכות לפי טיפול-שיפוטי מצטבר (ציטוטים נכנסים), לצמצום היקף האישור-הידני של היו"ר: - docs/spec/X11-citation-corroboration.md — 6 invariants (INV-COR1–COR6), כל אחד עם ≥3 מקורות מקצועיים (Shepard's/KeyCite, Hellyer LLJ 2018, UNC Law, NCSC/JTC, CEPEJ). - docs/spec/00-constitution.md — תיקון מבוקר ל-INV-G10: השער מסופק ע"י טיפול-שיפוטי-מצטבר לתת-הקבוצה החיובית, שער-היו"ר נשאר חובה לזנב ולשלילי. + X11 באינדקס. - Opus 4.8 @ xhigh כמודל חילוץ הלכות (config HALACHA_EXTRACT_MODEL/EFFORT, env-tunable; claude_session model/effort params; halacha_extractor מחווט). מבוסס A/B 2026-05-31: פחות חילוץ-יתר, 100% quote-verified, ביטחון מכויל. - scripts/ab_halacha_opus48.py — harness A/B לא-הרסני להשוואת מודל/effort בחילוץ הלכות. - .taskmaster #70 (FU-2c-b) — תיעוד dedup שפר + סריקת-קורפוס (0 stubs תקועים נותרו). תנאי-קדם (זהות נקייה) הושלם: שפר מוזג לרשומה קנונית + סריקת 128 רשומות. audit-findings גלויים ב-X11 §7: קישור הלכה↔ציטוט + סיווג-טיפול = greenfield, ל-implementation plan. Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>	2026-05-31 18:42:13 +00:00
Chaim	6ff2e36bf9	feat(eval): FU-5 — retrieval eval harness + halacha backlog visibility (#63 ) Covers GAP-11 (INV-RET4/G8) and GAP-14 (INV-QA1/G10). Retrieval quality was never measured (only telemetry observation) and the halacha review backlog was invisible (the 10/19 gap was found by accident). Unit B — backlog visibility (pure code, container): - metrics.halacha_backlog(conn) → {pending_review, approved, rejected, published, total, oldest_pending_at}; surfaced in metrics.get_dashboard() (get_metrics MCP tool) and /api/system/diagnostics. Live count revealed 178 pending / 1552 total, oldest from 2026-05-03 — previously invisible. Unit A — retrieval eval harness (host-side scripts): - scripts/eval_gold_bootstrap.py — seeds data/eval/gold-set.jsonl. Two sources: citations (cited==relevant via search_relevance_feedback — empty until decisions cite precedents) and known_item (query=case_name → relevant=self; a real citation-free signal, the methodology #52 checked by hand). Idempotent; preserves source='chair' rows. - scripts/eval_retrieval.py — runs the production retrieval path (search_library / search_internal) over the gold-set; computes precision@k, recall@k, MRR, nDCG@k (k=5,10); aggregates overall + per-corpus + per-practice_area; writes a report and a delta vs committed baseline.json (which records the retrieval_config it reflects). --self-test unit-checks the metric math offline. Gold-set strategy = hybrid (chair decision): bootstrap + chair review. The citation source is empty today (0 cited precedents in decisions), so the seed is known-item (77 queries: 54 internal_decisions + 23 precedent_library). The gold-set is PROVISIONAL until Dafna reviews it (the domain chair-gate). Baseline (production config: multimodal+rerank on): R@10=0.987, MRR=0.837, nDCG@10=0.872. Finding: MULTIMODAL_ENABLED=true slightly lowers known-item recall (image-page results displace exact name matches) — relevant to #15. precedent_library weaker than internal (R@10 0.957 vs 1.0) — one external precedent unfindable by name. "CI gate" realized as discipline (re-runnable harness + committed baseline + run before/after any retrieval-layer change) — retrieval needs prod DB + Voyage, no CI runner has that access. Spec: docs/superpowers/specs/2026-05-31-fu5-eval-harness-design.md Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>	2026-05-31 14:58:13 +00:00
Chaim	4fce9d503f	feat(migration): FU-2c — reconcile external case_law identifiers (GAP-08, #68 ) External court precedents stored the full citation (designator + docket + parties + Nevo date) inside case_number, violating INV-ID2/G1 (citation as identifier). Chair decision 2026-05-31 (Option A): canonical external case_number = proceeding-designator + docket, '/' preserved (court convention, not X1's '/'→'-'); parties/court/date → citation_formatted. scripts/fu2c_reconcile_external_case_numbers.py — deterministic dry-run → chair-review → apply, mirroring FU-2b: - extracts designator+docket; flags split into BLOCKING (MISMATCH / CIT_NO_DOCKET / DESIG_MISMATCH / DUP_CHECK / NO_DOCKET) vs ADVISORY (NO_CITATION — case_number fix still deterministic, missing citation is a separate gap), so advisory rows apply while uncertain identity does not. - --overrides CSV (id,proposed_canonical,citation_formatted,reason) for audited chair adjudication of blocking rows. - apply scoped to source_kind='external_upload' (task target) while keeping cited_only/nevo_seed in the reconciliation VIEW so DUP_CHECK spans the full external unique space; pre-flight collision guard before every UPDATE. Applied to production 2026-05-31: 21 case_number normalized + 3 citation_formatted reconciled (D = consolidated Supreme Court judgment לויתן/קלמנוביץ → lead docket 25226-04-25; 2×C empty citations composed from metadata). אהוד שפר עע"מ 317/10 deferred — cross-source duplicate with an existing cited_only reference (collision guard held; → #70). 49 cited_only records out of scope → new task #70 (committee-form NNNN-NN dockets the extractor misses, dedup, unresolvable "ערר אדלר"). Extraction + gating verified offline on all 24 records. Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>	2026-05-31 14:12:45 +00:00
Chaim	e5b34e01dc	docs(scripts): note sync --verify drift-gate semantics (FU-8a)	2026-05-31 11:36:06 +00:00
Chaim	aac383acb7	feat(sync): --verify exits non-zero on drift; adapter mismatch = loud drift (GAP-21, FU-8a) Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>	2026-05-31 11:14:44 +00:00
Chaim	8477fd87e7	docs(scripts): register fu2b reconciliation script (FU-2b)	2026-05-31 08:58:32 +00:00
Chaim	e46868feda	feat(fu2b): flag PROC_MISMATCH (case_number prefix vs proceeding_type) for chair Dry-run surfaced 2 rows with בל"מ prefix but proceeding_type=ערר. Since the migration strips the prefix, a wrong proceeding_type would silently lose the בל"מ signal — must be chair-adjudicated, not auto-applied. Chair table now flags 4 rows: 2 DUP_CHECK (8047-23) + 2 PROC_MISMATCH. Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>	2026-05-31 08:57:42 +00:00
Chaim	ab8d17fdd8	feat(fu2b): chair-gated internal case_number reconciliation script (GAP-07/08) Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>	2026-05-31 08:54:38 +00:00
Chaim	58ab003206	fix(retrieval): make decisions findable by name + unhide committee uploads All checks were successful Build & Deploy / build-and-deploy (push) Successful in 3m57s Details Root cause of "agent can't find the Agasi decision in the corpus" (CMPA-55): the decision was fully ingested, but the retrieval layer failed on the realistic agent query — searching by case name. - RC-A (#52): lexical tsvector covered only chunk content + halacha text, so a bare-name query ("אגסי") matched decisions that cite the case, not the case itself. Add meta_tsv on case_law(case_name, case_number) (SCHEMA V20) and OR it into the lexical halacha/chunk SQL with a match boost, so a name/number hit surfaces the case's own rows. Agasi: rank 4 → rank 1. - RC-B (#53): precedent_library_list hard-defaulted source_kind=external_upload and never exposed the param, hiding uploaded ערר/בל"מ (internal_committee) decisions. Thread source_kind through service → tool → MCP tool (supports 'internal_committee' / 'all_committees'). - #54: agent instructions (researcher/analyst/writer) — search-by-name protocol: add content/case-number, search both corpora, use all_committees before declaring "not in corpus". - #55: chunker produced tiny fragment chunks ("דיון", "החלטה") from header keywords matched mid-sentence. Anchor SECTION_PATTERNS to line start + merge sub-min sections; exclude <50-char fragments at query time (484 existing fragments hidden; full re-chunk tracked as #57). Tests: scripts/test_retrieval_by_name.py (name ranks case above citer + substantive regressions); chunker unit checks (0 tiny chunks). New findings filed as tasks #56 (halacha source_kind leak) and #57 (re-chunk migration). Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>	2026-05-30 11:26:19 +00:00
Chaim	d3c6baf9e2	security(chat): bind chat service to docker bridge + require Bearer auth All checks were successful Build & Deploy / build-and-deploy (push) Successful in 1m38s Details Address security-review finding: the host-side legal-chat-service was binding 0.0.0.0:8770 with no authentication. The service spawns the claude CLI, whose tool set includes Bash + Edit — so an unauthenticated /chat/start is effectively RCE. Oracle Cloud's security list closes the port externally, but defense-in-depth requires two independent layers: 1. Bind defaults to 10.0.1.1 (docker0 bridge gateway). Reachable from containers on docker bridges (the legal-ai container has a route via the coolify network), invisible to anything outside the host. The --host flag is still configurable for local-dev (127.0.0.1) or special-case deployments, but 0.0.0.0 is explicitly discouraged in the docstring. 2. /chat/start requires Authorization: Bearer <LEGAL_CHAT_SHARED_SECRET>. The secret is loaded from /home/chaim/.legal-chat-service.env (chmod 600, off-repo) by the pm2 ecosystem and mirrored as a Coolify env var so the FastAPI chat_proxy sends a matching header. hmac.compare_digest prevents timing oracles. /health stays unauthenticated (static OK, no subprocess) so the FastAPI proxy can probe liveness without the secret. The service refuses to start if LEGAL_CHAT_SHARED_SECRET is empty or shorter than 24 chars — no silent fallback to an open mode. When the Infisical MCP comes back, migrate the secret into the vault at /_GUIDELINES per the project secrets policy. Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>	2026-05-27 10:22:14 +00:00
Chaim	a3454bcb57	fix(training): bundle reference content + use docker bridge gateway All checks were successful Build & Deploy / build-and-deploy (push) Successful in 9s Details The Style Studio's curator-prompt + chat features read reference docs from disk at runtime. Two issues from the initial production run: 1. Dockerfile + .dockerignore excluded .claude/, docs/, and most of skills/. Now COPY the four specific files the new endpoints need: - .claude/agents/hermes-curator.md - skills/decision/SKILL.md - docs/legal-decision-lessons.md - docs/corpus-analysis.md .dockerignore opens whitelists for just those files. 2. Coolify's custom_docker_run_options=--add-host=host.docker.internal:host-gateway is not honored on dockerimage build_pack apps (ExtraHosts stayed []). Switch chat_proxy.py default to http://10.0.1.1:8770 — the docker0 bridge gateway, same pattern Paperclip uses for 3100. Bind the host pm2 service to 0.0.0.0:8770 so the container can reach it via the bridge IP. Oracle Cloud's security list keeps the port unreachable from the public internet. Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>	2026-05-27 10:15:27 +00:00
Chaim	bb0cd7c6a2	feat(training): Style Studio — upload, rich corpus, lessons, curator portrait, chat All checks were successful Build & Deploy / build-and-deploy (push) Successful in 2m7s Details Six-phase upgrade of /training from a read-only dashboard into a full Style Studio for managing Daphna's style corpus. - Upload Sheet on /training: file → proofread preview → commit (no more CLI-only `upload-training` skill). - Rich corpus metadata: GET /api/training/corpus returns summary, outcome, key_principles, page_count, parties (regex), legal_citation, lessons_count. PATCH endpoint for chair edits. CorpusDetailDrawer with 4 tabs (details /content/lessons/patterns) replaces the bare table row. - LLM metadata enrichment: style_metadata_extractor + MCP tools (style_corpus_enrich, style_corpus_pending_enrichment) fill summary /outcome/key_principles via claude_session (free, host-side). - Per-decision lessons: new decision_lessons table + 4 REST endpoints + LessonsTab in drawer; hermes-curator now auto-posts findings as decision_lessons(source=curator). - Curator Portrait tab: prompt rendered with link to Gitea, recent curator findings, style_analyzer training prompts, propose-change form that writes proposals to data/curator-proposals/ for manual chair review (no auto-mutation of the agent file). - Style chat tab: SSE-streamed conversations with the style agent. New host-side pm2 service (legal-chat-service, port 8770) wraps claude CLI with stream-json + --resume continuation; FastAPI proxies via host.docker.internal. Zero API cost — uses chaim's claude.ai subscription. chat_conversations + chat_messages persist history. Architecture: keeps the existing rule that claude_session only runs on the host (not the container). The new legal-chat-service is the canonical bridge between the container and the local CLI for the chat feature; everything else (upload, metadata, lessons) stays within the container's existing capabilities. Audit script (scripts/audit_training_corpus.py) included for verifying which corpus rows still need enrichment. Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>	2026-05-27 10:06:22 +00:00
Chaim	2aee398b4a	feat: Stage C — RAG advanced (#33 , #47 , #48 , #49 , #50 , #51 ) All checks were successful Build & Deploy / build-and-deploy (push) Successful in 1m35s Details Six independent sub-tasks dispatched in parallel; aggregated here. ## #33 — Hide case_name column library-list-panel.tsx: `<TableHead>` + `<TableCell>` for "שם" get `className="hidden"` in both Court and Committee row variants. DB column preserved for future use. ## #47 — Audit script periodic New scripts/audit_corpus_integrity.py — 3 SQL checks (external+ערר prefix, internal missing chair/district, cases.practice_area enum) + CEO wakeup on violations + cron `0 7 * * `. First run: 0 issues. ## #48 — Parent-doc retrieval (gated, default off) Schema V17: precedent_chunks.parent_chunk_id + chunk_role ('child'\|'parent'). New chunker.chunk_document_hierarchical() — section-aware parents (~1500 tokens) containing ~5 overlapping children (~300 tokens each). New db.store_precedent_chunks_hierarchical two-pass writer. Search SQL (semantic + lexical) LEFT-JOIN parent and swap content + dedupe by parent_chunk_id when flag on. Toggle: PARENT_DOC_RETRIEVAL_ENABLED + PARENT_DOC_{CHILD,PARENT}_SIZE_TOKENS. Backfill ~3min and ~$0.20 — deferred to follow-up. ## #49 — Multimodal backfill New scripts/backfill_multimodal_precedents.py with token-matching case_number ↔ source files (PDF + DOCX via PyMuPDF). Ran in container: 26 precedents embedded, 503 pages, $0.21, 0 errors. precedent_image_embeddings grew 3 → 29 rows. 44 remaining are style_corpus-migrated rows (no source file on disk) — will catch up when re-uploaded. ## #50 — Closed-loop feedback + nDCG Schema V18: search_logs + search_relevance_feedback. New telemetry.py with fire-and-forget log_search_bg (p50 = 0.002ms — zero overhead) + auto-infer_relevance_from_citations (reads case drafts → marks score=3 when cited precedent appears in past search top-K). Hooks added to 5 search paths. scripts/compute_ndcg.py for aggregation. Two admin API endpoints (GET /api/admin/rag-metrics + POST .../infer). Dashboard UI deferred — API is enough for now. ## #51 — Halacha quality monitoring New scripts/monitor_halacha_quality.py — baseline avg confidence (trusted=0.849, all=0.833, pending=0.694) with rolling window drift detection. Default 5% threshold. Exits non-zero on alert for cron integration. Recommended: `0 8 * 1` weekly Mon 8am. ## Bonus: 230 unlinked citations → missing_precedents Bulk-imported 230 distinct unlinked citations from precedent_internal_citations to missing_precedents.status='open', party='committee', with notes listing source citers. Top candidate: ע"א 3213/97 (cited 5x). Total open missing_precedents now 237. Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>	2026-05-26 11:26:52 +00:00
Chaim	ac3ed455cf	fix(cases): בל"מ badge reads proceeding_type, not just appeal_subtype All checks were successful Build & Deploy / build-and-deploy (push) Successful in 43s Details After the proceeding_type field landed, users started flipping cases to בל"מ via the edit dialog. But the case-header badge + cases-table filter were still gated on isBlamSubtype(appeal_subtype), so the badge didn't appear when only the proceeding_type changed. Now the badge shows when either proceeding_type === 'בל"מ' OR appeal_subtype is an extension_request_* variant — the legacy path stays so existing rows that never got a proceeding_type still render correctly. Also regen types.ts from prod (proceeding_type now in OpenAPI schema) and register the one-shot process_pending_blam.py script. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>	2026-05-26 09:34:23 +00:00
Chaim	d359ab9884	feat(proceeding-type): explicit ערר/בל"מ field for cases + corpus All checks were successful Build & Deploy / build-and-deploy (push) Successful in 1m40s Details Same case_number can exist as both a regular appeal (ערר) and an extension-of-time request (בל"מ), and we were inferring the difference from appeal_subtype prefixes — fragile, and case-number lookups weren't disambiguated. Now stored as a first-class field on both case_law (corpus) and cases (live cases), with partial unique indexes on (case_number, proceeding_type). - SCHEMA_V15: column + CHECK constraints + backfill from appeal_subtype LIKE 'extension_request_%' + partial unique indexes replace the old global UNIQUE(case_number). - derive_proceeding_type() centralizes the inference rule (extension_request_* → בל"מ; subject regex fallback; default ערר). - Metadata extractor prompt asks Claude to populate the new field explicitly; apply_to_record writes it for internal_committee rows. - internal_decision_upload, case_create, case_update accept an optional proceeding_type; FastAPI request models expose it. - Wizard + edit dialog get a sided Select; case header renders the resolved label (ערר / בל"מ). - Uploaded the 2 staged בל"מ decisions on betterment levy: 8126/24 (סופר נוח, 13 chunks), 8047/23 (הרנון, 48 chunks). Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>	2026-05-26 09:17:33 +00:00
Chaim	f3cc9ca9d4	feat: Stage A finalizers + #35/#36/#37 — critical-gap closure Some checks failed Build & Deploy / build-and-deploy (push) Has been cancelled Details Four parallel sub-agents closed the remaining critical gaps from the 26/05 Stage A/B sprint. Each block independently tested; aggregated here. ## #30/#31 finalizers (sub-agent A) * Auto-derive practice_area in case_create from case_number prefix (1xxx→rishuy_uvniya, 8xxx→betterment_levy, 9xxx→compensation_197); default for CaseCreateRequest is now "" (the DB constraint catches any stray "appeals_committee"). * practice_area.py: derive_subtype now handles axis-B domain values (rishuy_uvniya/betterment_levy/compensation_197) without parsing the case number; new helper derive_domain_practice_area(). * Halacha re-extraction verified unnecessary — all 6 reclassified records already had is_binding=false and approved halachot. * Regression tests: 6 cases in tests/test_corpus_constraints.py covering practice_area enum, internal-committee chair/district, external-upload arar prefix, MCP guard. * UI: district input → Select dropdown (7 districts) in precedent-edit-sheet.tsx, preserving legacy free-text values. ## #37 בל"מ subtypes (sub-agent B) * 3 new appeal_subtypes: extension_request_{building_permit, betterment_levy,compensation}. APPEALS_COMMITTEE_SUBTYPES extended, SUBTYPES_BY_AREA mappings added. * New helpers: is_blam_subject(), is_blam_subtype(), derive_subtype_with_blam(case_number, subject, practice_area). case_create now uses it to auto-detect "בקשה להארכת מועד" subjects. * 3 methodology templates under docs/methodology/extension-request-.md. paperclip_client.py mapping updated for the 3 new subtypes (extension_request_building_permit→CMP, the other two→CMPA). * Frontend: bilingual "בל"מ" badge + filter dropdown on cases list + detail header; appeal-type-bars collapseBlam() merges בל"מ into its parent domain for aggregate bars. * Wizard auto-detects בל"מ from subject during case creation. * 3 Berlinger cases (1017/1018/1019-03-26) migrated to appeal_subtype=extension_request_building_permit via psql. ## #35 missing_precedents feature (sub-agent C) * Schema V13: missing_precedents table (citation, case_id, party, legal_topic, status, linked_case_law_id, claim_quote, ...) + FK constraints + 3 indexes. Applied via psql + idempotent migration. * 6 db.py service functions, 3 MCP tools, 6 FastAPI endpoints (POST/GET/PATCH/DELETE/upload — upload routes by citation prefix to ingest_internal_decision or ingest_precedent). * Next.js page /missing-precedents with 5 status tabs + filters + sidebar badge counter + detail drawer with metadata edit + smart upload form that switches fields per committee/court. * Bootstrap: 7 rows imported from the JSON file (3 citations × cases, all status=closed with linked_case_law_id). * legal-researcher.md: new §2ב.5 with missing_precedent_create usage + dedup semantics + tool grant. ## #36 legal_arguments aggregation (sub-agent D) * Schema V14: legal_arguments + legal_argument_propositions M:M. Applied via psql. * New service argument_aggregator.py with two functions — aggregate_claims_to_arguments() (Claude CLI / claude_session) and get_legal_arguments(). Graceful llm_unavailable handling when CLI is missing (containers). * 2 MCP tools + 2 API endpoints (POST .../aggregate-arguments as BackgroundTask, GET .../legal-arguments). * Frontend: shadcn Accordion + new legal-arguments-panel.tsx with hierarchical (party → priority badge → arguments) display, "טיעונים" tab on the case page, "חשב/חשב מחדש" buttons. * scripts/backfill_legal_arguments.py + SCRIPTS.md entry — dry-run found 8 candidate cases including 1017/1018/1019. ## Open follow-ups (intentionally deferred) * npm run api:types in web-ui (CLAUDE.md flow) — recommended before the next UI commit; not required for backend deployment. * Run backfill_legal_arguments.py --apply once the container picks up the new aggregator service. * webhook on missing-precedents upload-close to Paperclip (optional). Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>	2026-05-26 08:34:40 +00:00
Chaim	5f43659b5a	fix: add defensive JSON parsing in check_instructions	2026-05-16 17:53:42 +00:00
Chaim	86734da210	feat: add --check-instructions, pre-flight validation, and mtime tracking to sync script - P3-T1: --check-instructions flag + check_instructions() prints a table of all agents' instructionsFilePath with status (✅ OK / ❌ MISSING / ⚠ NOT SET), size, mtime, and ⚠ DRIFT when file has changed since last sync - P3-T2: --apply now runs a pre-flight check on master agents and aborts if any instruction file is missing, before touching the DB or calling any API - P3-T3: get_claude_md_mtime() helper; --apply stamps claude_md_mtime and claude_md_last_synced into each mirror agent's metadata via the PATCH call - P3-T4: alias check-agents added to ~/.bashrc Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>	2026-05-16 17:51:34 +00:00
Chaim	45341a0bc8	feat(curator): switch Hermes Curator to DeepSeek V4-Pro via deepseek_local adapter A/B test (2026-05-05) showed DeepSeek V4-Pro is 2-3x faster and ~20x cheaper than Sonnet for style/lexicon pattern analysis, with comparable quality. Adds adapters/deepseek-paperclip-adapter/ package, documents adapter requirements (env injection, run-id headers), updates CLAUDE.md with adapter integration notes, and records lessons from ערר 1200-25 (block order for 1xxx, "להלן מתוך" pattern, expanded factual background, bridge planning analysis, flat heading structure). Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>	2026-05-10 05:58:52 +00:00
Chaim	1b14e04373	chore(skills): remove paperclip-dev, scope converting-plans-to-tasks All checks were successful Build & Deploy / build-and-deploy (push) Successful in 7s Details paperclip-dev is for maintaining the Paperclip codebase itself — not relevant to legal work. Removed from all 14 agents (was on CMPA mirror). paperclip-converting-plans-to-tasks helps decompose a plan into assigned issues. Useful for the planning-heavy agents (CEO, analyst). Now scoped to those two — removed from the other 5 in CMPA where it had crept in. Net effect: zero drift on paperclipai/* skills across all 7 master+mirror pairs. Verified via the new Agents tab dashboard. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>	2026-05-04 17:47:05 +00:00
Chaim	cf5f6fe274	feat(paperclip): close 11 integration gaps (#16-#28) Brings the legal-ai ↔ Paperclip integration in line with the official Paperclip skill. Net effect: HEARTBEAT.md -47% (370→195 lines), all 14 agents on uniform runtime_config + budget + instructionsBundleMode, and two cross-company helpers replacing manual SQL. Highlights: - HEARTBEAT.md refactor: project-specific only, delegates to the official paperclipai/paperclip skill (loaded per agent). Adds heartbeat-context fast-path (§1.7) and PAPERCLIP_WAKE_PAYLOAD_JSON shortcut (§1.5). - Issue Thread Interactions API: legal-ceo.md now uses ask_user_questions / request_confirmation / suggest_tasks instead of free-text comments — gives chair structured UI with idempotency keys. - pc.sh + paperclip_api.pc_request: every API call goes through helpers that inject Authorization + X-Paperclip-Run-Id (audit trail). - sync_agents_across_companies.py: master(CMP)→mirror(CMPA) sync via Paperclip API, idempotent, with --verify and --apply modes. - skills/new-company-setup: 11-step blueprint distilling all 11 gaps into a single onboarding runbook for the next company. - .taskmaster: 12 tasks covering each gap (one already closed: #29). Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>	2026-05-04 17:25:45 +00:00
Chaim	8a815ecff5	fix(retrieval): rewrite chunk-page retrofit to skip OCR All checks were successful Build & Deploy / build-and-deploy (push) Successful in 16s Details The first-pass retrofit re-extracted via extractor.extract_text, which re-runs Google Vision OCR on scanned pages. OCR is non-deterministic, so the new text didn't match the chunk content stored in the DB (produced by the original OCR run) — only ~7% of chunks were located. New approach (no OCR cost): 1. Use the stored documents.extracted_text from the DB — the exact text the chunks were produced from, so chunk lookups match. 2. Anchor page boundaries via PyMuPDF direct text reads (free, no OCR). Pages with usable direct text are anchored by snippet match; OCR-only pages are linearly interpolated between anchors. 3. Search each chunk in extracted_text using a whitespace-tolerant helper — needed because the chunker joins paragraphs with single '\\n' while extracted_text uses '\\n\\n' as page separators. Verified on 8174-24 (5 docs, 307 chunks) + 8137-24 (9 docs, 512 chunks): 100% chunks tagged, 13s total, $0 cost. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>	2026-05-03 20:04:33 +00:00

1 2

71 Commits