legal-ai

Author	SHA1	Message	Date
Chaim	c7c402e7ef	feat(operations): manual burst control for the halacha drain + permanent supervisor The halacha-extraction backlog needs to be worked off the chair's leftover weekly Claude quota on demand. This adds a MANUAL, time-boxed "burst": run the drain continuously now until a chosen deadline (default the upcoming Saturday 18:00 IL), managed interactively from /operations, plus the permanent health-supervisor that enforces it. Backend (this PR; deploys via Coolify + host pm2): - db: drain_controls.burst_until (SCHEMA_V37) + set_drain_burst/get_drain_burst/get_drain_bursts. Single source of truth shared by the container-side /operations API and the host-side supervisor. - web: POST /api/operations/drains/{name}/burst (on→until|next-Sat-18:00, off→NULL), and burst_until surfaced per-service in the /operations snapshot. - scripts/halacha_drain_supervisor.py + legal-halacha-supervisor.config.cjs: pm2 cron (*/15, zero Claude quota) that re-triggers an idle drain, restarts a HUNG run (liveness = per-chunk checkpoints, NOT log mtime), backs off on 429 until the parsed reset (fresh-gated), and verifies crash-safe staging. Reads burst_until from the DB; the burst auto-expires at the deadline (never bleeds into a fresh week). UI (separate follow-up PR, after Claude Design approval): the /operations toggle + date-picker that calls the burst endpoint. Invariants: G1 (normalize at source: burst lives once in the DB, read by both surfaces), G2 (no parallel control path: CAPTURE field on the existing drain_controls + orchestrates the existing drain, not a new one), G12 (no Paperclip touch), §6 (no silent error-swallow: burst-clear failure is surfaced as a note).	2026-06-12 11:11:13 +00:00
chaim	ca1a0ddaac	Merge pull request 'fix(learning): chair_name at source - committee finals always enter the case-law corpus (#134)' (#226) from worktree-chair-name-rootfix into main	2026-06-12 07:26:32 +00:00
Chaim	242e6cfd11	fix(learning): chair_name at source - committee finals always enter the case-law corpus (TaskMaster #134) The bug: the learning stage (ingest_final_version → ingest_internal_decision) adds every final as a citable precedent in case_law (source_kind=internal_committee), but it failed silently (non-fatal warning) when cases.chair_name was empty, because of the constraint case_law_internal_chair_check. As a result the finals of 1194/1200/8070 never entered the case-law corpus. Root cause: (1) chair_name was not set when a case was opened; (2) the MCP path passed a raw chair value while the UI path (web/) already resolved it deterministically, so the parallel paths diverged (a violation of INV-G2); (3) the failure was swallowed (against §6). Root fix (3 layers): 1. Single SoT (INV-G2): `config.committee_chair_for_case` is the only place from which web/app.py, tools/workflow.py and db.create_case derive the chair (by case-number prefix; env override). web/ is unified onto it (the duplicate was removed). 2. Normalize at source (INV-G1): `db.create_case` always sets a non-empty chair_name; `cases.case_create` exposes a param. `ingest_final_version` derives the chair from the SoT instead of the raw value → the constraint no longer fails. 3. Visibility (§6/feedback_silent_swallow): a copy failure is returned in the result (`internal_corpus_error`) and `final_learning_pipeline` prints a warning; it is not swallowed. Backfill for 11 cases with an empty chair. `audit_corpus_integrity`: added CHECK_D (decided cases without a chair) + CHECK_E (a final missing from the case-law corpus); both are now 0. Invariants: satisfies INV-G1 (normalize on write), INV-G2 (single path, web↔MCP unified), §6 (no silent swallow). Tests: py_compile + 14 pytest (chair_seed_gate, audit_provenance) + integration of create_case (default+override) + a run of the live audit (A–E=0). Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>	2026-06-12 07:25:54 +00:00
Chaim	d246fb85fc	feat(learning): FU-5 - measuring the learning loop against the chair's rulings (#133) Extends halacha_panel_calibrate.py to measure the loop over time and guard its health; closes the 5 FUs of #133. - --source captured (new, zero cost): cross-references stored panel rounds (FU-1) against the chair's rulings (FU-2) via db.panel_rounds_vs_chair, and reports split-rate + auto-precision + false-keep/false-drop for each round (per round-day) against the growing gold set. This shows the loop working: as the rubric improves (FU-4 → chair adoption), precision holds and split drops. No re-vote, no LLM. - summarize_calibration() + bucket_by_round(): pure helpers (offline-testable). They share FU-4's analyze_pairs → "what failed" is computed in one place (no drift, G2). - anon-stability: both measurements report the stability rate of the anonymization test (#81.7) as a health metric against an echo chamber; a drop indicates memorization rather than reasoning. - --source live (existing): added an explicit split-rate column + anon-stability. - tests/test_panel_calibrate_captured.py: 5 offline tests. SCRIPTS.md updated. Read-only smoke passed (0 pairs → nothing-to-measure). Invariants: read-only measurement · INV-G10 (truth = the chair's ruling) · anti-echo-chamber (anon-stability) · G2 (analyze_pairs as single source). Regression: 30 passed. Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>	2026-06-12 07:19:48 +00:00
Chaim	4cad17df3a	feat(learning): FU-4 - propose-only rubric distillation from the chair's rulings (#133) A periodic job that closes the learning loop: it cross-references the panel rounds (FU-1, votes + rationales) against the chair's rulings (FU-2 seeds), identifies systematic failures, and proposes KEEP_SYSTEM v2 + abstracted exemplars as a diff report for the chair's review. Never auto-applied. - db.panel_rounds_vs_chair(): read-only LATERAL join: for every halacha with a chair-live seed (FU-2, human truth) + a latest panel round (FU-1) → the votes and rationales of the 3 judges against the chair's keep/drop. The only signal is the chair's ruling, not the panel votes (anti-echo-chamber, INV-LRN1). - scripts/halacha_rubric_distill.py: • analyze_pairs(): a pure deterministic core (offline-testable): false-keep (panel kept, chair rejected), false-drop, splits that were resolved, per-judge disagreement rate with the chair; selects masked disagreement evidence. • Local LLM proposal (claude_session, tools="", zero cost): identifies failure patterns and proposes v2 rubric wording + abstracted exemplars (INV-LRN5: no case substance). • Writes data/learning/rubric-proposal-<ts>.md with diff(KEEP_SYSTEM→v2); not a single line of code changes. Adoption = a manual edit via PR (INV-LRN1). • <12 pairs → "not enough data" (current state: seeds are still accumulating). • --no-llm (statistics only) / --limit N. - tests/test_rubric_distill.py: 8 offline tests on analyze_pairs. - SCRIPTS.md updated. Read-only smoke passed (0 pairs → insufficient-data). Matches the existing pattern (style_lesson_panel/halacha_panel_audit): the panel proposes, adoption remains a manual chair gate. Invariants: INV-LRN1 (propose-only) · INV-LRN5 (rubric purity) · INV-G10 · anti-echo-chamber. No new gate/UI. Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>	2026-06-12 06:59:34 +00:00
Chaim	183156646c	chore(migration): renumber 11 cases to canonical NNNN-MM-YY One-time host migration (executed 2026-06-12): adds the missing 2-digit month to 11 case numbers (and corrects 1046-26 → 1024-02-26, a wrong serial). All legal-ai FKs are on cases.id (UUID) → untouched. The script atomically migrates, per case, everything that embeds the number as TEXT: · cases.case_number + every column containing 'cases/{old}/' (file_path AND image_thumbnail_path; the latter is a DATA_DIR-relative storage key with no '/data' prefix, hence the slash-less needle) · disk dir + case.json · MinIO keys across 3 buckets (legal-immutable = WORM, copy-only) · Gitea repo rename + local .git remote + description · Paperclip project name For the 4 archived cases whose final was ingested, the canonical number is propagated to the precedent + style corpora identifier fields (case_law, style_corpus, style_exemplars, citations) per chair decision; document content / full_text / OCR text is left as the historical record. Verified: 0 stale identifier/path refs across all 11; documents, thumbnails, drafts, Gitea, Paperclip all resolve under the new numbers. Per-case backups in data/audit/renumber-*.json. Invariants: G1 (normalise at source: single rename op, not read-time patch), G2 (no parallel path: reuses the app's DB pool + storage semantics), G12 (Paperclip touched only via its declared surface). Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>	2026-06-12 06:24:10 +00:00
Chaim	0a7869175e	feat(learning): FU-1 - capturing panel rounds for halachot (#133) The active-learning loop needs a signal to learn from, but until now the panel (halacha_panel_approve.py) discarded the votes of the 3 judges and their rationales; only the final review_status on halachot survived. Without the votes and rationales there is no way to distill an improved rubric. FU-1: - New table halacha_panel_rounds (SCHEMA_V35): one row per (halacha, round): vote + rationale per lineage (claude/deepseek/gemini), the verdict, what the run did (applied_action), and apply_mode. Modeled on the panel columns of halacha_goldset. - db.insert_panel_round(): write helper (capture-only). - halacha_panel_approve.py: keeps the raw responses (instead of discarding the rationale), adds reason to NLI_SYSTEM, and writes a round for every item in both modes (dry-run and --apply). --no-capture to skip. Capture-only: never touches halachot; the chair gate in /precedents remains the single source of truth (INV-G10). The learning seed is created by cross-referencing against the chair's later ruling on the same halacha (FU-2). Invariants: satisfies INV-G10 (capture-only, single chair gate), INV-LRN1/3 (structured capture; propose-only, no auto-commit), G1 (capture at source), G2 (a new capability, not a parallel path), G12 (does not touch the Paperclip port). Part of #133. Smoke (dry-run --limit 8): 6 nli captured, errors=0, full rationales from the 3 judges. Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>	2026-06-12 04:22:48 +00:00
Chaim	b447ffb184	fix(ops): draining the ghost backlog of metadata extraction - normalize-at-source for metadata_extraction_status (G1) The "ממתין (בקלוג)" ("pending (backlog)") counter in /operations showed 140 stuck items for which the drainer (Gemini, every 15 min) reported total_pending=0, a mismatch between two queue definitions: the UI counts status='pending' (the column default), while the drainer scans only metadata_extraction_requested_at IS NOT NULL. Rows that received metadata via another path (deterministic internal, text-less cited_only) stayed on the default 'pending' forever. Breakdown of the 140: 82 internal_committee (deterministic metadata, outside the Gemini pipeline) · 31 cited_only (no text to extract) · 27 external_upload (already complete). Fix at source (G1: normalize at source, not fix on read): - db.create_internal_committee_decision: INSERT + ON CONFLICT set metadata_extraction_status='completed' directly → internal rows no longer re-enter the ghost state. - scripts/reconcile_metadata_status.py: a one-time/re-runnable normalization of existing rows (internal/external complete→completed · external missing→requeue · cited_only→skipped). Run: 82+27→completed, 31→skipped, pending=0. - web-ui /operations: the label "ממתין (בקלוג)" → "ממתין" ("pending"; the loanword was removed) + an accurate tooltip; the operations.ts comment was updated. Invariants: satisfies G1 (normalize-at-source) and INV-IA (truthful counter/single source of truth). Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>	2026-06-11 22:09:38 +00:00
chaim	383118bc5f	Merge pull request 'feat(storage): sealing the INV-STG1 write path - 15 seals + CI leak-guard + tripwire' (#205) from worktree-seal-storage-write-path into main	2026-06-11 19:57:54 +00:00
Chaim	0d8cc31a2b	feat(storage): seal INV-STG1 write path - 15 dual-write seals + CI leak-guard + tripwire After the cutover to s3-only, an audit found 15 blob-write sites that bypass storage.py (uploads/finalize/exports/training/research-backup/precedents/bulletins/draft): a file lands in the legacy directories but not in MinIO → it would be lost on cleanup, not served, not backed up. The pipeline (ingest/extract) still reads by file_path from disk, so fully eliminating disk writes requires complete read-wiring (Phase 2, a separate task). The safe fix for now = dual-write seal. - storage.py: `mirror`/`mirror_file` (+ sync): best-effort persist to S3 when the backend is s3/dual (no-op on filesystem; an S3 failure is logged and does not break the request, per the DualBackend philosophy). - web/app.py: helpers `_seal_blob`/`_seal_blob_file` + 14 sealed sites (storage.mirror after the disk write; the disk copy stays for the pipeline). block_writer.py: draft sealed (async). - CI leak-guard (test_storage_write_leak_guard): fails on any blob write to disk (write_bytes/write_text/shutil.copy/open(wb)) in web/+services that lacks a `# noqa: STG1` marker. All benign cases (fallbacks/tmp/staging/git-metadata/flag/state) are marked with a justification. storage.py is excluded (it is the implementation). - tripwire (scripts/storage_leak_tripwire.py): runtime monitoring for blobs on disk that are not in MinIO (json-key match, bucket per-file). Verified live: 0 leaks. Invariants: INV-STG1 (all I/O goes through storage or is mirrored to it) · INV-STG6 · feedback_silent_swallow (mirror logs a warning, not bare-except). Phase 2 (read-wire the pipeline → drop the disk copy) = follow-up. Tests: 4 mirror + 1 leak-guard + 6 serve_blob + 18 existing storage tests pass. Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>	2026-06-11 19:57:12 +00:00
Chaim	8651529327	feat(ui): /scripts page - a read-only script catalog from SCRIPTS.md Serves scripts/SCRIPTS.md as a page at /scripts: name · type · role · schedule for every script in the scripts/ directory. The single source of truth remains SCRIPTS.md (G2: no parallel content path); editing is done via git, not from the UI. - web/app.py: GET /api/scripts/catalog reads the file at runtime (mirrors the get_curator_prompt pattern; HTTPException on failure, no silent swallow per §6) - Dockerfile: COPY scripts/SCRIPTS.md (it was not copied into the container until now) - web-ui: /scripts page (AppShell + the existing Markdown component) + api module + nav link - SCRIPTS.md: documented ingest_bulletins.py, the only file of 73 that was undocumented Invariants: G2 (single source of truth), G12 (no Paperclip contact), X6 (UI↔API). Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>	2026-06-11 19:42:44 +00:00
Chaim	970e8dc748	feat(storage): #106.4 - DB-driven blob→MinIO migration script (dry-run default) Migrates blobs from disk to MinIO, driven by the DB rather than a blanket `mc mirror`, because the bucket is determined per-file-SEMANTIC (document/draft→documents, thumbnail→derived). Scans the 6 path columns that actually exist (documents.file_path · cases.active_draft_path · digests.source_document_path · draft_final_pairs.final_path · document_image_embeddings/precedent_image_embeddings.image_thumbnail_path), not what the spec assumed (case_law.source_document_path/_image_pages do not exist). Handles 3 inconsistent legacy path formats (verified 2026-06-11): container-abs `/data/…`, host-abs `/home/chaim/legal-ai/data/…`, and relative; normalizes to a DATA_DIR-relative key (matching storage.normalize_key + the #106.3 write sites + the future #106.5 read-wiring). Files that are missing or outside DATA_DIR are reported, not swallowed. dry-run (default): plan + CSV manifest to data/audit, zero changes. --apply uploads via mcli and verifies size after every PUT; the disk is not touched → a re-run is idempotent and the migration is reversible (empty the buckets + flip back to filesystem). Normalizing the DB columns to clean keys = a separate later step (#106.5). Verified live (dry-run): derived 2593 (260MB) · documents 811 (638MB) · 0 outside · 28 missing (pre-existing dangling DB references). Total 3404 files / 899MB. Invariants: G2 (key=normalize_key, a single storage path) · INV-STG1/3 (storage layer, bucket per-governance) · INV-G10 (dry-run/reversible, does not touch the disk). The irreversible steps (cutover/WORM) are separate and stop for approval. Tests: live dry-run = verification (count+size+normalization). py_compile OK. Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>	2026-06-11 17:19:05 +00:00
Chaim	ec14e8310b	feat(halacha): #86.2 nevo-leak audit + safe ratio backfill · #86.3 ratio-coverage benchmark #86.2: scripts/nevo_corpus_audit.py leak scans chunks + halachot for Nevo preamble markers (imported from extractor._NEVO_MARKERS, the single source of truth) and distinguishes the harmful vector (a marker inside a halacha = an editorial ratio mistaken for a halacha) from the benign one (a chunk with a citation list). Live finding: 0/~1650 halachot contaminated; the knowledge layer is clean (the quality gates of #81 prevented this). Hence no purge/re-ingest (also because a re-OCR retrofit is against principle, feedback_no_reocr_retrofit; and citation chunks are benign). `leak --apply` performs an additive backfill of case_law.nevo_ratio from the stored full_text (extract_nevo_ratio, deterministic, no re-OCR, does not touch chunks/halachot): "keep rather than delete". Run: 16→32 rulings with a stored ratio. #86.3: benchmark: for rulings with nevo_ratio, the tri-model panel judges which ratio principles are covered by our halachot → recall. Smoke: 1110-20 (13 halachot) recall=1.0 (full coverage); rulings with 0 halachot → recall=0 (a real extraction-gap signal, not a coverage failure). Feeds the #81.7 quality signal. Invariants: G2 (markers + strip imported from extractor; panel from halacha_panel_approve) · INV-G10 (read-only/additive; no deletion) · no-reocr (backfill from stored text, not re-extraction). Tests: 6 offline (_has_marker/_has_editorial) + existing nevo_preamble. Verified live. Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>	2026-06-11 16:50:50 +00:00
Chaim	5b001bbd9d	feat(halacha): #81.7 - gold-set labeled by tri-model consensus (Opus+DeepSeek+Gemini) Removes the man-in-the-loop from gold-set labeling (chair directive 2026-06-11): instead of manual labeling by Chaim/Daphna, the benchmark is set by the consensus of three independent model lineages, the same panel that the live approval system already uses (halacha_panel_approve), with 92% cross-model agreement on the coarse axis. Why this is not circular: the validators measured in #81.8 (compute_quality_flags / is_fact_dependent / is_quote_truncated / is_thin_restatement) are rule-based heuristics, a different method family from the LLM judges. Two integrity guards: (1) a split vote (no 2/3 majority) writes no label; the item stays NULL and is escalated to the chair (INV-G10); (2) an anonymization test: re-judging with the case identifier masked, where a flip in the consensus = memorization rather than reasoning (arXiv:2505.02172). - db.py: per-lineage columns (ds_/gm_; ai_*=claude, existing) + consensus/agreement/anon + goldset_set_panel_label(), which writes the 2/3 majority to is_holding/correct_type (tagged_by='panel:…', does not overwrite tagged_by='chair'). goldset_score is unchanged and reads is_holding (G2, no parallel scoring path). The schema comment was updated (the "MUST be human" requirement was dropped). - scripts/goldset_panel_label.py: 3 judges (imported from halacha_panel_approve, single source of truth) + a rich prompt (imported from goldset_ai_recommend) + Fleiss κ + the anonymization test. Report→data/audit/. - SCRIPTS.md: new script; goldset_ai_recommend/independent_judge are marked as single-model and subsumed. Invariants: G2 (judges + prompt imported, no duplication; single scoring) · INV-G10 (split→chair) · INV-LRN2/LRN3 (quality at source, structured capture). Sources: PoLL · Trust-or-Escalate (ICLR 2025) · arXiv:2505.02172. Tests: 18 offline (consensus/type/Fleiss-κ/anonymize). Live labeling = an operational step after deploy. Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>	2026-06-11 16:03:32 +00:00
Chaim	4fa62db192	feat(halacha): nightly drain (23:00–05:00) + per-upload single-case extraction via the CEO (#120) Separates bulk backlog draining from per-upload extraction, and removes the "bottleneck" that caused timeout/process_lost in the CEO's heartbeat. The problem (diagnosed 2026-06-11): clicking "חלץ הלכות" ("extract halachot") on a single case created an issue (CMP-165) instructing the CEO to run precedent_process_pending(halacha), a synchronous sinkhole that drains the entire historical queue (147 pending, hours of work) inside a heartbeat capped at one hour. Result: a timeout every hour → process_lost when the process group was torn down → retry → storm, and the single case (last in the FIFO) was never handled. Not OOM, not broken code: an architectural mismatch. The fix: 1. per-upload (web/paperclip_client.py, wake_for_precedent_extraction): the issue body + project description now instruct running precedent_extract_metadata + precedent_extract_halachot for the issue's case_law_id only, with an explicit warning not to run process_pending. reextract_halachot already clears requested_at and marks completed → the case will not return to the nightly queue. 2. CEO instructions (.claude/agents/legal-ceo.md): the same change: single-case extraction, not queue draining. (Requires sync_agents_across_companies.py --apply after merge.) 3. Backlog draining (scripts/drain_halacha_queue.py): a night-window gate, 23:00–05:00 Israel time (zoneinfo, DST-safe; the machine runs on UTC). Outside the window: ===SKIP===; the run halts with ===STOP=== when the window closes, and the rest continues the next night (FIFO + per-chunk checkpoint). env: HALACHA_DRAIN_WINDOW_START/_END/_TZ. 4. cron (scripts/legal-halacha-drain.config.cjs): UTC band 20:00–03:00, which covers the Israel window in both DST states; the script trims to the exact window. An hourly fire revives a one-shot that died (advisory-lock → overlap is safe). Safety net: request_halacha_extraction still stamps requested_at, so if the wakeup to the CEO fails, the nightly drainer picks up the case (at night, bounded), but no daytime path drains the whole queue. Invariants: satisfies G12/INV-PORT1 (paperclip_client = shell; leak_guard passes). Touches X16 (durability: from a heartbeat time budget to a dedicated job). Tests: py_compile ✓ · window-logic + zoneinfo ✓ (17:00 IDT→False) · leak_guard ✓. Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>	2026-06-11 14:02:38 +00:00
Chaim	b2912e1b83	feat(pipeline): durable execution for final_learning via shared runtime (P1, X16/INV-DUR1, #115) Applies scripts/_pipeline_runtime.py (from P0) to final_learning_pipeline: the 3 steps ([1]ingest/Opus-distillation [2]enroll-style-corpus [3]style-panel) run through the same durability runtime, one implementation for both pipelines (G2) rather than a parallel one. A crash/OOM in the style panel [3] resumes from [3] instead of paying again for the (expensive) Opus distillation [1]. A stable thread per case (learning:{case}); dry-run = a separate preview. Same CLI + --fresh. A critical ingest error → raise → halt + clean non-zero exit (resume retries). Graceful degradation as in P0 (no langgraph → linear). Verification: py_compile OK; imports cleanly in the shared venv (langgraph absent, lazy import). The runtime mechanism itself is covered by test_pipeline_runtime.py (P0), which exercises the same runtime. Invariants: INV-DUR1 (durability), G2 (single runtime), G3 (idempotency). Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>	2026-06-10 09:57:01 +00:00
chaim	f5650196b7	Merge pull request 'feat(pipeline): durability (LangGraph) for final_halacha (P0, X16/INV-DUR1, #114)' (#178) from worktree-langgraph-durable-pipeline into main	2026-06-10 09:53:07 +00:00
Chaim	e7d8b24d7c	feat(pipeline): durable execution for final_halacha via LangGraph (P0, X16/INV-DUR1, #114) scripts/_pipeline_runtime.py: a shared durability runtime that wraps a list of async steps in a linear LangGraph StateGraph with AsyncSqliteSaver (a checkpoint per step). A crash/OOM resumes from the failed step instead of re-running everything. Graceful degradation: without langgraph → a linear run as before (the button does not break). One implementation for both pipelines (G2). final_halacha_pipeline.py: the 4 steps ([0]extract [1]citations [2]corroboration [3]panel) run through the runtime. Same CLI + --fresh (default auto-resume). A stable thread per case; dry-run = a separate preview (always fresh). A crash in the panel [3] → resume from [3] (steps 0-2 are saved). pyproject: extra "durable" (langgraph + langgraph-checkpoint-sqlite), host-only, optional. data/checkpoints/ in .gitignore. Boundary (X16 §1): LangGraph serves only as the script's internal engine, not as an orchestrator (not a parallel path to Paperclip; G2/G12). #108 (atomic extract) preceded this as a prerequisite. Verification: test_pipeline_runtime.py: with langgraph (temporary venv): 3 passed (resume skips completed steps · fresh re-runs · linear). Without langgraph (shared venv): 1 passed + 2 skipped (degradation). final_halacha compiles and imports cleanly in both modes. An end-to-end run on the live pipeline (DB+LLM) follows after `pip install -e ".[durable]"` in the main tree. Invariants: INV-DUR1 (durability), G2 (single runtime), G3 (strengthened idempotency). Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>	2026-06-10 09:52:35 +00:00
Chaim	d2b622f28e	feat(ci): G12 leak-guard - enforce the Agent Platform Port seam (R4, #113) The automatic enforcer of INV-G12 (docs/spec/X15 §4). Two hard rules: 1. mcp-server/src (the intelligence layer) contains no Paperclip symbols; a justified substring allowlist covers the 6 legitimate references (pm2-bridge + company_id source comments). 2. import seam: only web/agent_platform_port.py (+ the shell files) import paperclip_*. One canonical implementation (scripts/leak_guard.py, stdlib-only), shared by three enforcers (G2): • CI hard gate: .gitea/workflows/leak-guard.yaml (pull_request + push→main) • pytest: mcp-server/tests/test_platform_port_leak_guard.py (includes a self-test that verifies the guard catches an injection, so it does not rot) • real-time hook: spec-guard.sh checks the content being written (new_string/content) on writes to mcp-server/src and warns on a Paperclip injection (non-deduped); the spec reminder was updated to G1–G12. Excludes generated files (web-ui types.ts) and the declared shell; the frontend is outside the intelligence scope (finding R3). scripts/SCRIPTS.md updated. Verification: clean scan exit 0; injecting pc.sh into mcp-server → exit 1; seam violation in web → exit 1; the hook warns on mcp-server and gives the spec reminder on web; pytest 3 passed; bash -n + YAML valid. Invariants: G12 (enforcement), G2 (a single enforcer for three consumers). Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>	2026-06-10 09:40:42 +00:00
Chaim	98c5feff25	feat(cases): "case law cited in the decision" view on the case page + restoring the Hermes wiring UI requested by Chaim: on entering a decision you see the case law cited in it, either linked to the library (click → /precedents/[id]) or missing (automatically flagged for upload). - web/app.py: GET /api/cases/{case}/citations: from the decision's internal_committee row in case_law → precedent_internal_citations: linked (join case_law) + missing (unresolved + whether flagged in missing_precedents). - web-ui: lib/api/citations.ts (hook) + CitationsSection in drafts-panel (shown when the decision is in the library). Linked = green/clickable, missing = amber "סומנה להעלאה" ("flagged for upload"). - scripts/curator_apply_pipeline_branch.py: the source of truth for the Hermes button wiring (the prompt lives only in the Paperclip DB). Prepends a branch that runs the final pipeline for wake reason final_learning_/final_halacha_ (absolute HOME/DOTENV/DATA_DIR → DeepSeek+Gemini keys + DATA_DIR resolve correctly). Idempotent, both agents. Already applied in the DB; the script is for restoring after a reset. Verified: py_compile ✓ · tsc ✓ · the wiring was verified live on 8126 (deepseek+gemini, dedup, ✓ pipeline completed). G2 (a missing capability) · INV-LRN1/G10 preserved. Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>	2026-06-08 11:59:21 +00:00
Chaim	0f0656ecca	feat(learning): autonomous wiring for the final-track buttons - one orchestration script per stage The buttons "הרץ למידת-קול" ("run voice learning") / "הרץ אימות-הלכות" ("run halacha verification") wake Hermes, and instead of the agent (DeepSeek) composing several tool calls (fragile), it now runs one deterministic command. New: - scripts/final_learning_pipeline.py: (1) ingest_final_version with the final's path (skips if the pair is already analyzed; --force to refresh), (2) enrollment in the style corpus (idempotent; closes the gap where the style panel required corpus_id), (3) style_lesson_panel --apply. --dry-run for a safe run. - scripts/final_halacha_pipeline.py: extract_internal_citations → corroboration.build_all → halacha_panel_approve --apply. --dry-run / --limit. The Hermes briefs (web/paperclip_client._curator_task_brief) were simplified to one command per task, robust against agent execution. Fixed the two gaps that were identified: ingest required file_path, and the style panel required style_corpus. Also: fixed the outdated help text of halacha_panel_approve (--apply is wired). SCRIPTS.md. Verified: both pipelines ran dry-run on בל"מ 8126-03-25 (skip-ingest, corpus, panels) successfully. Invariants: INV-LRN1/LRN5/G10 (reversible, manual chair gate preserved), INV-DM7. G2: orchestration of existing capabilities, not a parallel path. Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>	2026-06-08 10:21:39 +00:00
chaim	9ae49f0f70	Merge pull request 'feat(learning): a clean path for uploading a final decision + a two-agent style panel (DeepSeek+Gemini)' (#158) from worktree-final-upload-pipeline into main	2026-06-08 09:04:16 +00:00
Chaim	f79c46a352	feat(learning): a clean path for uploading a final decision + a two-agent style panel (DeepSeek+Gemini) Adds a dedicated path for ingesting the chair's signed decision, and drives it through two staged automatic steps with agent panels (auto-approval + escalation to the chair). Backend (web/): - POST /api/cases/{case}/final/upload: ingests an external final: canonical save (סופי-{case}.docx + a style-corpus copy under the full case_number so that a בל"מ case does not collide with an appeal (ערר) of the same number), opens draft_final_pairs (final_received). Does not touch active_draft and does not run retrofit (distinct from exports/upload and mark-final → no G2 conflict). - POST .../final/run-learning + .../final/run-halacha: staged steps that wake a local worker (local claude/DeepSeek/Gemini only) by extending wake_curator_for_final with a param task=learning|halacha. New style panel (scripts/style_lesson_panel.py): two judges (DeepSeek+Gemini) on top of the Opus distillation; 2/2-keep agreement → decision_lesson (source=panel:deepseek+gemini); substance is skipped (INV-LRN5); reversible + CSV backup. Halacha panel: docstring/SCRIPTS.md updated (--apply is wired). Frontend (web-ui/): a button "העלאת החלטה סופית של היו"ר" ("upload the chair's final decision") + two staged buttons "הרץ למידת-קול"/"הרץ אימות-הלכות" ("run voice learning"/"run halacha verification") in drafts-panel; all labels are in Hebrew (lesson-source badge: "פאנל: דיפסיק+גמיני", "הרמס (סקירה)"...). Spec: docs/spec/07-learning.md §0.6. Invariants: INV-LRN1/LRN4/LRN5, G10 (a manual chair gate for adoption into SKILL.md/lessons.md; the panels create proposals only); G2 (the final track is a missing capability, not a parallel path). Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>	2026-06-08 09:03:26 +00:00
Chaim	638eef6803	feat(ops): /operations - uniform queue counters, "what is running now", and process management The page displayed the queues inconsistently (raw by_status), with no distinction between "ממתין" ("pending"; the backlog: status=pending) and "בתור" ("queued"; the active queue: requested_at IS NOT NULL), without showing the item currently running, and with no control over processes. What was added: 1. Uniform queue cards: queued / pending (backlog) / processing / completed / failed + "running now" (citation/case_number of the item being processed) for every drain (case-law fetch, metadata, halachot, bulletins). Human gates (halacha approval, missing case law) remain status counters. 2. A process-management panel in the style of "Windows Services": - daemon (court-fetch-service/xvfb/chat/reaper): restart / stop / start. - cron drain: "run now" (pm2 restart) + a toggle to enable/disable the schedule. 3. All status tags are translated to Hebrew. Mechanism: - Enable/disable schedule = a flag in the DB (table drain_controls). pm2 cron_restart revives a process that was halted with stop, so the reliable "off" is a flag that every drain checks at startup (immediate no-op when disabled). The container writes/reads directly from the DB. - Run-now + restart/stop/start = a proxy to pm2 through a new endpoint on the host bridge (court_fetch_service /pm2/control), secured with Bearer + a whitelist for legal-* only. - Bulletins: drain_digests was moved from crontab to pm2 (legal-digest-drain.config.cjs) so that it appears and is controllable like every other drain. drain_halacha_queue.py was brought under version control. Invariants: satisfies G2 (extending /operations + the existing bridge, not a parallel path) and G1 (drain_controls = the single source of truth for disabling, normalization at source rather than a fix on read). No silent error swallowing (the bridge returns {ok,error}; the mutations show a toast). Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>	2026-06-08 08:57:23 +00:00
Chaim	8d2f1ea0a2	feat(X13 Tier-0): decode supremedecisions API — fetch serial-format Supreme verdicts The 211 open missing_precedents include 99 Supreme serial-format rulings (בג"ץ/בר"מ/עע"מ NNNN/YY) with no נט-format triple — fetchable only from supremedecisions.court.gov.il. Decoded its public JSON API (no browser, no CAPTCHA, no smart-card); validated live on בג"ץ 3483/05 + בר"מ 10212/16. - court_fetch_supreme.py: rewrite. POST Home/SearchVerdicts with a structured `document` ({Year:"YYYY", CaseNum, OldMainNumFormat:true, SearchText:[…]}) + X-Requested-With header → records; GET Home/Download?path=&fileName=&type=4 → PDF. The earlier attempt failed only on the request shape (string vs object). 2-digit→4-digit year; try candidate docs best-first (פסק-דין→pages), skipping the published-report 's'-prefix files the free endpoint WAF-blocks. - orchestrator: on successful ingest, close matching open missing_precedents (link to the new case_law). End-to-end validated (בר"מ 10212/16 → corpus). - backfill_missing_precedents.py: enqueue fetchable open gaps (supreme + net) into court_fetch_jobs; the drainer fetches+ingests+closes. dry-run default. - X13 spec + SCRIPTS.md updated (Tier-0 decoded, no longer a limitation). Very old un-digitized Supreme cases (e.g. בג"ץ 389/87 → 0 records) → manual. Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>	2026-06-08 06:53:31 +00:00
Chaim	97ede1a49d	fix(extraction): self-heal stale halacha 'processing' rows + scheduled drainer The halacha extraction queue was stuck (same class as the metadata issue): 26 precedents requested extraction with no drainer, plus 1 orphaned in 'processing' (status=processing, requested_at cleared → never re-picked by the queue). - db.requeue_stale_processing_extractions(kind): re-stamp orphaned 'processing' rows (requested_at IS NULL) so they re-drain; halacha extractor force=False resumes from chunk checkpoints (no duplicates). - process_pending_extractions calls it at the top — fully unattended, safe under the global advisory lock. Mirrors the digests-drain self-heal. - legal-halacha-drain.config.cjs: pm2 cron (every 2h, conservative — Claude is slow/rate-limited and each run adds to the chair's pending_review queue). drain_halacha_queue.py stays on claude_session (high reasoning quality for holding/ratio; NOT moved to Gemini). SCRIPTS.md. The chair-approval gate (INV-G10) is untouched — this only produces halachot; Daphna still approves each in /approvals. Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>	2026-06-08 06:04:53 +00:00
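The self-heal described in 97ede1a49d can be sketched as follows. This is a minimal illustration under stated assumptions, not the repository's `db.requeue_stale_processing_extractions`: it runs against an in-memory SQLite table, and the table name `extraction_queue` and its columns are hypothetical stand-ins for the real schema.

```python
import sqlite3
from datetime import datetime, timezone


def requeue_stale_processing(conn: sqlite3.Connection, kind: str) -> int:
    """Re-stamp orphaned 'processing' rows so the queue picks them up again.

    A row counts as orphaned when status='processing' but requested_at was
    cleared, so the queue (which selects on requested_at) never re-picks it.
    """
    now = datetime.now(timezone.utc).isoformat()
    cur = conn.execute(
        "UPDATE extraction_queue SET requested_at = ? "
        "WHERE kind = ? AND status = 'processing' AND requested_at IS NULL",
        (now, kind),
    )
    conn.commit()
    return cur.rowcount


# Hypothetical schema and rows, for illustration only.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE extraction_queue ("
    "id INTEGER PRIMARY KEY, kind TEXT, status TEXT, requested_at TEXT)"
)
conn.executemany(
    "INSERT INTO extraction_queue (kind, status, requested_at) VALUES (?, ?, ?)",
    [
        ("halacha", "processing", None),                         # orphaned
        ("halacha", "processing", "2026-06-08T05:00:00+00:00"),  # still queued
        ("metadata", "processing", None),                        # other kind
    ],
)
requeued = requeue_stale_processing(conn, "halacha")
```

Only the orphaned row of the requested kind is re-stamped; rows that still carry a `requested_at` value, and rows of other kinds, are left alone.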
Chaim	d95a36f310	feat(extraction): precedent metadata via Gemini Flash + scheduled drainer The /precedents metadata queue was stuck — 24 rows requested, nothing draining them — and the agentic claude CLI hit error_max_turns on what is a single structured text→JSON task (slow + flaky). Metadata extraction is bounded extraction, the wrong fit for an agentic loop. - gemini_session.py: query_json drop-in (gemini-2.5-flash, JSON mode, httpx — no new SDK dep). Reads GEMINI_API_KEY (~/.env; SoT Infisical nautilus:/external-apis/gemini). Host-side only — no LLM from the container. - precedent_metadata_extractor: claude_session.query_json → gemini_session. Validated live: rich, accurate fields (case_name/summary/appeal_subtype/tags). - process_pending_extractions: kind-aware cooldown — metadata 2s (Gemini, fast), halacha keeps 30s (Claude rate limits). - drain_metadata_queue.py + legal-metadata-drain.config.cjs (pm2 cron */15) so the queue never clogs again. SCRIPTS.md. - X8 INV-FP5 updated: per-task engine choice (Gemini=bounded metadata, claude_session=agentic halacha), both host-side, single canonical queue (G2). Agentic/voice-sensitive work (writing, analysis, halacha) stays on claude_session (Daphna's subscription). Gemini cost ≈ $0.10/1M tokens — negligible. Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>	2026-06-08 05:13:49 +00:00
Chaim	da4ebeb724	feat(halacha): panel safety-net audit (selective-prediction monitoring) Periodic safety net for the multi-judge approval panel: samples panel-approved halachot, re-runs the same 3-judge KEEP vote, and surfaces any that now lean DROP — candidate false-keeps a human should glance at. Report-only by default; --flag reopens flips to pending_review. Baseline 0/15 on the 2026-06-07 batch. Closes the loop the literature prescribes (Trust-or-Escalate / selective prediction): monitor the auto-decision error rate rather than trusting it blindly. Reuses halacha_panel_approve's judges (single source of truth). Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>	2026-06-08 05:01:03 +00:00
Chaim	dba2a131e0	feat(halacha): multi-judge approval panel + policy calibration (Trust-or-Escalate) The chair cannot review every pending halacha. Three independent-lineage judges (Opus via claude_session · DeepSeek · Gemini-2.5-flash — #1 on LegalBench) vote on the COARSE axis we proved reliable across models (92%): "is this a genuine, keepable rule?". Only an agreed verdict acts; every split escalates to the chair (INV-G10). Buckets: clean→KEEP?; nli_unsupported→entailment re-adjudication; extraction-defects→re-extraction. halacha_panel_calibrate.py calibrates the voting policy on the gold-set's is_holding (the coarse label) per Trust-or-Escalate (ICLR 2025): unanimous → 94.9% precision / 78% coverage; majority → 92.9% / 99%; ZERO false-drops in both (the panel never rejects a good rule). Chosen policy (chair-approved): clean→majority-2/3, nli→asymmetric (majority-reject, unanimous-approve), defects→re-extraction. Reversible (--apply backs up review_status+flags first). Sources: Panel-of-LLM-Evaluators (PoLL) · Trust-or-Escalate (ICLR 2025, arXiv:2407.18370) · selective-prediction / learning-to-defer. Invariants: upholds G10 (human gate — splits escalate, panel only collapses the queue) and G9 (provenance — reviewer records the panel + policy). Read paths only in calibrate; --apply writes review_status/quality_flags reversibly with backup. Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>	2026-06-07 21:11:30 +00:00
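The three voting policies named in dba2a131e0 (unanimous, majority, asymmetric) might be expressed roughly like this. The function name `panel_decision`, the vote labels, and the handling of abstentions (`None`) are assumptions made for illustration; the actual `halacha_panel_approve` logic is not shown in the commit message.

```python
from typing import Optional, Sequence


def panel_decision(votes: Sequence[Optional[str]], policy: str) -> str:
    """Return 'approve', 'reject', or 'escalate' for one halacha.

    votes: one entry per judge, "KEEP", "DROP", or None (abstained/failed).
    Anything the policy cannot decide escalates to the chair.
    """
    n = len(votes)
    if n == 0:
        return "escalate"
    keep = sum(1 for v in votes if v == "KEEP")
    drop = sum(1 for v in votes if v == "DROP")

    if policy == "unanimous":
        if keep == n:
            return "approve"
        if drop == n:
            return "reject"
    elif policy == "majority":
        if 2 * keep > n:
            return "approve"
        if 2 * drop > n:
            return "reject"
    elif policy == "asymmetric":
        # Majority is enough to reject; approval requires unanimity.
        if 2 * drop > n:
            return "reject"
        if keep == n:
            return "approve"
    else:
        raise ValueError(f"unknown policy: {policy}")
    return "escalate"
```

With three judges, `["KEEP", "KEEP", "DROP"]` is approved under `majority` but escalated under `asymmetric`, which matches the stricter treatment the commit describes for the nli_unsupported bucket.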
Chaim	c1abf2ec0e	feat(digests): scripts/drain_digests.py: local enrichment drainer for cron (X12) Drains the digest enrichment queue locally (claude_session local-only): every 'pending' digest → enrich_digest (Sonnet + embedding + autolink). Parallel (3), idempotent, adds ~/.local/bin to PATH (claude CLI under cron). Intended for a daily cron after the n8n poll (flock to prevent overlap) + manual use after a backfill. SCRIPTS.md updated. Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>	2026-06-07 20:40:45 +00:00
chaim	6468e151d9	Merge pull request 'refactor(digests): single source of truth - drop processed/ folder state (X12)' (#122) from worktree-digests-single-truth into main. All checks were successful: Build & Deploy / build-and-deploy (push) successful in 28s.	2026-06-07 20:33:43 +00:00
Chaim	fb40ec8565	refactor(digests): single source of truth - drop processed/ folder state (X12) The DB (`digests`) is the single source of truth for ingestion state. ingest_digests_batch.py moved files incoming→processed/, a folder-based state parallel to the DB (a minor G2 violation). - Removed the move to processed/ + import shutil + PROCESSED. The script relies on content_hash dedup (ingest_digest returns 'exists' for existing items) → re-running is safe. - Folders (incoming/) = staging only, not state. - X12 INV-DIG2: documented the single source of truth + the violation that was fixed (processed/). - SCRIPTS.md updated. Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>	2026-06-07 20:33:18 +00:00
Chaim	f4f110f0d1	feat(X13): scheduled drain — fully-autonomous digest→fetch→ingest loop - scripts/drain_court_fetch.py: drives orchestrator.drain_pending (host-only; no-op when queue empty). Mirrors drain_halacha_queue.py. - scripts/legal-court-fetch-drain.config.cjs: pm2 cron (hourly :17, one-shot), COURT_FETCH_DRAIN_CRON override. - fix: orchestrator default service URL 127.0.0.1 → 10.0.1.1 (the service binds the docker0 gateway; the host can't reach it on loopback). Found live — the first drain failed "connection refused" until corrected. - SCRIPTS.md entries. Validated end-to-end in PRODUCTION on a real digest: עת"מ 43830-12-24 (החברה להגנת הטבע) fetched from נט המשפט → case_law (79 chunks, source_url), digest relinked (INV-DIG3 closed), halacha queued pending_review. job=done. Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>	2026-06-07 20:31:53 +00:00
Chaim	808c2e4c46	feat(goldset): independent second-judge for rule_role (break AI-anchoring) The gold-set's human role tags were made while seeing a claude AI recommendation, so human↔AI agreement (~100%) is anchoring, not an independent accuracy signal. This adds a third, genuinely independent judge — a DIFFERENT model (DeepSeek, direct OpenAI-compatible API) classifies rule_role BLIND (never sees the human tag nor the first AI's answer) — and reports an inter-rater agreement matrix. Finding (100 tagged items): ai↔human 100% (anchored) vs deepseek↔human 50% fine-grained — BUT 92% on the coarse axis (generalizable-rule vs application/ obiter). Conclusion: the fine sub-type (holding/interpretive/procedural) is an inherently fuzzy boundary two capable models split differently; the coarse "is this a real rule" axis is robust across models. Use the coarse axis as ground truth; treat the sub-type as advisory, never as a gate. Zero chair tagging, read-only on the gold-set. Key from ~/.hermes deepseek env. Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>	2026-06-07 20:12:58 +00:00
Chaim	e186183527	fix(X13): harden court-fetch against browser leaks + reaper for task-master-mcp leak Three layers of defense against memory leaks from orphaned browsers, + handling of the largest actual leak on the server (task-master-mcp). - camofox_client.py: - hard asyncio.wait_for around the entire fetch (COURT_FETCH_HARD_TIMEOUT_S=180 s): hang → cancel → async-with tear-down → reap. - _reap_orphan_browsers(): kills orphaned camoufox-bin (ppid=1) before and after every fetch. Fetches are serial (INV-CF4) → every ppid=1 process is a safe leftover. - scripts/reap_orphan_procs.py: general reaper for task-master-mcp (~3GB of orphans) + camoufox-bin. ppid=1 only; pure /proc. --dry-run / --loop N. - scripts/legal-reaper.config.cjs: pm2 daemon (loop 180s, max_memory_restart 100M). - X13 spec + SCRIPTS.md: documentation of the defense layers. max_memory_restart on the service (1.5G) already provides a process-level safety net. Invariants: upholds INV-CF4 (politeness/serial), no contract change. Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>	2026-06-07 19:43:53 +00:00
Chaim	781f24c643	feat(X13 Tier-1): calibrate נט המשפט fetch: Camoufox python, proven on 46111-12-22 Verified end-to-end: the 34-page verdict of עת"מ 46111-12-22 was downloaded fully autonomously, purely open-source, with no smart card and no CAPTCHA solving. Main calibration findings: - Search + navigation to the case involve no reCAPTCHA at all. reCAPTCHA exists only in the viewer and only on explicit save/print, not on displaying the document. - The viewer serves pages as PNG via PageMethod GetImages (4/batch); they are pulled with fetch using the header X-Requested-With: XMLHttpRequest (mandatory: the F5 WAF blocks requests without it) → PDF assembly (Pillow). Changes: - camofox_client.py: full rewrite, Camoufox via the Python package (in-process, not a Node REST server). Calibrated path: home→btnExternalSearchCases→Bama fields→ CaseDetails→פסקי דין→DecisionList→NGCSViewerPage→GetImages→PDF. - pm2 config: app Xvfb :99 + DISPLAY=:99 (Camoufox crashes headless without a virtual display). - pyproject: extra [court-fetch] = camoufox + faster-whisper (host-only; the container does not run a browser). Pillow is already in the base. - X13 spec + SCRIPTS.md: updated with the findings (image-API, Xvfb, verification). reCAPTCHA audio (Whisper) is kept as a fallback for the explicit-save path only; the main path does not need it. Invariants: upholds INV-CF1/CF4/CF6 (unchanged). Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>	2026-06-07 19:32:13 +00:00
chaim	f3740fef68	Merge pull request 'fix(halacha): split authority (derived) from rule_role - stop source-conflation (INV-DM7)' (#112) from worktree-halacha-authority-split into main. All checks were successful: Build & Deploy / build-and-deploy (push) successful in 1m32s.	2026-06-07 18:19:43 +00:00
Chaim	2e33cac043	fix(halacha): split authority (derived) from rule_role — stop source-conflation (INV-DM7) The extractor classified rule_type by SOURCE bindingness (higher-court→binding, committee→persuasive) instead of by rule KIND. The gold-set proved it: 'binding' appeared on 19/19 external rulings & 0 committees; 'persuasive' on 13/13 committees & 0 external — only 58% agreement with the human role tags. The two axes (authority vs rule role) were crammed into one enum. This splits them per INV-DM7: - authority (binding/persuasive) — DERIVED from case_law.precedent_level (עליון/מנהלי→binding, ועדת_ערר_מחוזית→persuasive), never stored, never LLM-guessed. New helper halacha_quality.derive_authority; surfaced read-only in list_halachot / goldset_list / search results. - rule_type — now the rule ROLE only: holding/interpretive/procedural/ application/obiter. Both extractor prompts unified to this vocabulary; _coerce_halacha no longer defaults rule_type from the source; legacy binding→holding / persuasive→interpretive fold for safety. UI: authority shown as a separate read-only badge (gold=מחייב / muted=משכנע) across the review queue, precedent detail, and gold-set; the gold-set role selector drops binding/persuasive and adds מהותי (holding). Migration: scripts/halacha_rule_role_backfill.py re-classifies the 276 pre-split binding/persuasive rows into a genuine role via local claude_session (run after deploy). Gold-set correct_type/ai_correct_type 'binding'→'holding' via SQL. Sources (≥3, per research-decision policy): OASIS LegalRuleML v1.0 (appliesAuthority/Strength as metadata orthogonal to rule logic) · SemEval-2023 Task 6 LegalEval (rhetorical roles by function, authority kept separate) · Bluebook signals (weight-of-authority is a separate dimension). Invariants: ESTABLISHES INV-DM7. Upholds G1 (normalize at source — extractor classifies role, system derives authority) and G2 (single source of truth — authority derived, not a parallel stored field). 
Tests: 211 pass + new derive_authority/coerce coverage. web-ui build + tsc clean. Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>	2026-06-07 18:18:41 +00:00
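The authority/role split in 2e33cac043 can be sketched as two small pure functions. The level-to-authority mapping and the legacy fold are taken from the commit message; the function bodies, the `fold_legacy_rule_type` name, and the choice to return `None` for an unknown level or unknown role are assumptions, not the actual `halacha_quality.derive_authority` or `_coerce_halacha` code.

```python
from typing import Optional

# Authority is derived from the precedent's court level, never stored.
_AUTHORITY_BY_LEVEL = {
    "עליון": "binding",              # Supreme Court
    "מנהלי": "binding",              # Administrative court
    "ועדת_ערר_מחוזית": "persuasive",  # District appeals committee
}

# rule_type now carries the rule ROLE only.
_RULE_ROLES = {"holding", "interpretive", "procedural", "application", "obiter"}

# Pre-split rows used authority words as rule_type; fold them into roles.
_LEGACY_FOLD = {"binding": "holding", "persuasive": "interpretive"}


def derive_authority(precedent_level: Optional[str]) -> Optional[str]:
    """Map case_law.precedent_level to binding/persuasive (None if unknown)."""
    if precedent_level is None:
        return None
    return _AUTHORITY_BY_LEVEL.get(precedent_level)


def fold_legacy_rule_type(rule_type: Optional[str]) -> Optional[str]:
    """Return a valid rule role, folding legacy authority labels.

    Unknown values return None rather than being defaulted from the source.
    """
    if rule_type is None:
        return None
    value = _LEGACY_FOLD.get(rule_type, rule_type)
    return value if value in _RULE_ROLES else None
```

Because authority is computed from `precedent_level` on every read, there is no second stored field that could drift from the source, which is the G2 point the commit makes.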
Chaim	0990db7a3c	feat(X13): auto-fetch court verdicts from נט המשפט → corpus (Tier 0 + scaffold) Automatic verdict-fetch subsystem: when a digest points to a court verdict, the court instance is classified, the verdict is downloaded from the appropriate public source, and it is ingested through the canonical ingestion pipeline. - spec-first: docs/spec/X13-court-fetch.md (INV-CF1..CF7) + index - classifier court_citation.py (supreme/admin/skip) + 10 tests (עת"מ 46111-12-22 → admin) - Tier 0: court_fetch_supreme.py: supremedecisions API (reverse-engineered), httpx + browser-headers (verified 200) + politeness - queue court_fetch_jobs (SCHEMA_V30) + DB helpers + court_fetch_orchestrator.py - Tier 1 scaffold: legal-court-fetch-service (aiohttp+Bearer, mirrors legal-chat-service) + camofox_client (Camoufox open-source) + recaptcha_audio (local Whisper) + pm2 - Tier 2 graceful fallback: manual + missing_precedent (INV-CF2/CF3: no silent drop) - MCP tools court_verdict_fetch / court_fetch_status; SCRIPTS.md Invariants: upholds G2 (single ingestion path, INV-CF1) · G3/G1 (idempotent + normalization, INV-CF5) · G4/§6 (no silent swallowing, INV-CF2) · G10 (human gate, INV-CF3) · G5 (source_type, INV-CF6) · G9 (provenance+audit, INV-CF7). Sources for INV-CF4: RFC 9309 · Google crawler · OWASP OAT. Follow-ups (not yet verified live): live Tier-0 validation · installing camofox-browser+whisper · calibrating Tier-1 selectors · COURT_FETCH_SHARED_SECRET (Infisical+Coolify) · trigger from digest try_autolink (worktree-digests-radar). V30 may conflict with digests-radar. Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>	2026-06-07 18:12:13 +00:00
Chaim	8171572cdd	feat(digests): digest corpus as a discovery layer (radar), X12 A new repository for the "כל יום" daily digests (עפר טויסטר) as a discovery layer above the case-law corpora: a secondary source that points to the original verdict, is ingested into a separate `digests` table, is searched semantically, and is linked to the original verdict in the precedent library, but is never cited in a decision and never used for halacha extraction. Phase 0 (spec): - docs/spec/X12-digests-radar.md: INV-DIG1 (points, is not cited) / INV-DIG2 (a separate ingestion path, not a parallel one; upholds G2) / INV-DIG3 (the link to the verdict is the bridge; a missing link = a visible gap). Index update 00/03/README. Phase 1 (MVP): - SCHEMA_V30: `digests` table (HNSW on embedding, not ivfflat, to avoid a recall cliff in a small/growing corpus) + GIN/FTS + partial UNIQUE for idempotency. - services/digest_metadata_extractor.py: LLM extraction (claude_session local-only, lazy import): concept tag, halacha headline, citation, two distinct dates, tags. - services/digest_library.py: a short standalone path (INV-DIG2): extract→hash→LLM→ single embedding→autolink. Does not use ingest.ingest_document. - tools/digests.py + registration of 7 tools in server.py (digest_upload/list/get/link/ relink/delete + search_digests). - scripts/ingest_digests_batch.py: manual ingestion from data/digests/incoming. - legal-researcher.md: step 2ב.0 (radar scan before verification) + report section ט + 3 tools in the frontmatter. HEARTBEAT §8: routing digest→digest_upload. Verified end-to-end: 4 digests ingested (accurate metadata), semantic search ranks correctly ("היטל השבחה"→5160, "תמא 38"→5158), link/relink/autolink/revert + MCP wrapper. Invariants: adds INV-DIG1/2/3 (X12). Upholds G2 (a separate bounded context, not a parallel path), G3 (idempotent upsert), G4 (no silent swallowing: a link gap is surfaced), G9 (traceability: the digest points to a traceable source). Touches G7 (RRF): deferred, semantic-only search in phase 1 (FTS index ready). Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>	2026-06-07 17:49:00 +00:00
Chaim	0e35060d3d	feat(goldset): AI second-opinion per item (QA aid) — compare vs human tag The chair wanted an independent recommendation beside each tag, to reconsider his own judgments. Adds a NON-ground-truth AI second-opinion: - schema: halacha_goldset.ai_is_holding / ai_correct_type / ai_rationale / ai_generated_at (additive). - db.goldset_set_ai_recommendation + goldset_list now returns the ai_* fields. - scripts/goldset_ai_recommend.py — local claude_session judges is_holding + type + a one-line rationale per item, INDEPENDENTLY (own legal rubric). Independent of the rule-based validators #81.8 measures → no circularity. Never auto-applied; QA aid only. - web-ui: each card shows "🤖 המלצת AI: הלכה/לא · type" + rationale and an agreement/disagreement chip vs the human tag (amber on disagree); a "⚠ אי-הסכמות AI (N)" filter to review only the conflicts. Methodology note kept explicit: the human stays the ground truth; the AI is a prompt to reconsider, not to copy. Verified: tsc --noEmit 0; generator stores recs and flags disagreements with existing human tags. Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>	2026-06-07 14:24:35 +00:00
Chaim	b7b44f4453	feat(halacha): equivalent-halacha (parallel-authority) links across precedents Cross-precedent recurrence of a principle is real but is NOT citation corroboration (X11) — the 5 candidate pairs have ZERO citations between their precedents. Recording them in halacha_citation_corroboration would fabricate citation data and inflate corroboration_count. This adds a proper, separate halacha-level link for parallel authority. Schema (V28): equivalent_halachot — symmetric (halacha_a < halacha_b, CHECK + UNIQUE), non-citation, cross-precedent-only. ON DELETE CASCADE. db.py: - link_equivalent_halachot (idempotent; rejects same-id and SAME-precedent pairs — parallel authority is cross-precedent by definition), unlink, and list_equivalent_for_halacha. - list_halachot gains include_equivalents → _annotate_equivalents attaches an `equivalents` list (both directions) per row. API: include_equivalents on GET /api/halachot; GET/POST/DELETE /api/halachot/{id}/equivalents for the chair to view/link/unlink manually. scripts/halacha_batch_reconcile.py: --link records found cross-precedent pairs as equivalent_halachot (non-destructive, idempotent). web-ui: Halacha.equivalents type; the clean review queue fetches include_equivalents; the review card shows a gold "עיקרון מקביל ב-N" badge + an expandable list (case + rule + similarity) labeled "אסמכתה מקבילה — לא ציטוט". Populated the 5 reviewed pairs (chair decision: keep all + link as parallel authority). Verified: 5 rows; the 1023-20 hub annotates 3 of its halachot with equivalents; tsc --noEmit exits 0. Invariants: G1 (model recurrence at source in its own table, not by abusing the citator); G2 (no parallel path — extends list_halachot); citator integrity preserved (corroboration stays citation-only). Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>	2026-06-06 21:29:46 +00:00
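The symmetric-pair rules that b7b44f4453 attaches to `equivalent_halachot` (ordered ids, no self-link, cross-precedent only) can be sketched as a small validation helper. The function name and signature below are hypothetical; in the commit these rules are enforced by the table's CHECK/UNIQUE constraints and by `link_equivalent_halachot`.

```python
from typing import Tuple


def normalize_equivalent_pair(
    halacha_id_1: int,
    precedent_id_1: int,
    halacha_id_2: int,
    precedent_id_2: int,
) -> Tuple[int, int]:
    """Return (halacha_a, halacha_b) with halacha_a < halacha_b.

    Storing the pair in one canonical order lets a UNIQUE constraint treat
    (1, 2) and (2, 1) as the same link, which keeps re-linking idempotent.
    """
    if halacha_id_1 == halacha_id_2:
        raise ValueError("a halacha cannot be equivalent to itself")
    if precedent_id_1 == precedent_id_2:
        # Parallel authority is cross-precedent by definition.
        raise ValueError("equivalents must come from different precedents")
    return (min(halacha_id_1, halacha_id_2), max(halacha_id_1, halacha_id_2))
```

Calling it with the ids in either order yields the same tuple, so a repeated `--link` run would hit the same row instead of creating a mirrored duplicate.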
Chaim	1286a1e60d	feat(halacha): application gate + lexical dedup tail + quality harnesses (#81,#82) Halacha-extraction quality (#81) and dedup-on-insert (#82) — engine changes (pure + tested) plus measurement/ops tooling. halacha_quality.py - #81.4 application gate: is_fact_dependent() (high-precision "applied to THIS case" deixis per the strict rubric §3/§27) + FLAG_APPLICATION. compute_quality_flags now takes rule_type and flags rule_type=='application' OR fact-dependent — blocking auto-approve (an illustration is not a generalizable holding). - #82.3 lexical tail signal: jaccard_shingles / normalized_levenshtein / lexical_near_duplicate + FLAG_NEAR_DUPLICATE, for the 0.83–0.93 cosine band. halacha_extractor.py — pass rule_type to the flag computation; re-type a binding-labeled fact-application to 'application' (mirrors non_decision→obiter). db.py (store_halachot_for_chunk) — dedup now fetches the nearest same-precedent neighbor once: cosine ≥ DEDUP → skip (unchanged); cosine in [BAND, DEDUP) with high lexical overlap → FLAG_NEAR_DUPLICATE (review, not skip — never drop a possibly-distinct principle unreviewed). config.py — HALACHA_DEDUP_BAND_COSINE (0.83). Scripts: - scripts/halacha_goldset.py (#81.7) — export stratified sample for human tagging; score validators (P/R/F1) against the tags. Backbone for #81.8. - scripts/halacha_batch_reconcile.py (#82.7) — conservative cross-precedent dedup (cosine ≥0.95), dry-run report only. - scripts/calibrate_halacha_dedup.py (#82.1) — calibrate the lexical thresholds against the 2026-06-03 cleanup gold-set. Deferred (documented): #82.4 merge-provenance and #82.5 DB ON CONFLICT/UNIQUE on normalized quote are NOT included — the current skip+flag behavior is safe, whereas a UNIQUE on normalized_quote would fail on existing dups and a blind merge risks losing provenance; they need their own chair-reviewed migration. #82.6 over-merge guard is moot until merge lands. 
#81.6 full rhetorical-role classifier deferred (section pre-filter + application flag cover the practical case); #81.8 blocked on the human-tagged gold-set (harness now provided). Verified: - pytest tests/test_halacha_quality.py — 52 passed (14 new). - calibrate: configured (0.55,0.70) → precision 1.0 (zero false-merge), recall 0.30 — correct profile for an auto-approve-blocking signal. - goldset export: 15-row sample CSV. batch reconcile: 819 halachot → 5 cross-precedent candidate pairs. Invariants: G1 (normalize at source — flag at insert, not at read); §6 (no silent swallow — suspect items flagged to review, never dropped); G2 (no parallel path — same store_halachot_for_chunk / compute_quality_flags). Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>	2026-06-06 19:55:45 +00:00
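The dedup-on-insert decision in 1286a1e60d (skip at high cosine, flag in the 0.83–0.93 band when the wording also overlaps) might look roughly like the sketch below. The function names echo the commit, but the bodies are illustrative: character 3-gram shingles, combining the two lexical signals with AND, and assigning 0.55 to Jaccard and 0.70 to Levenshtein similarity are all assumptions, since the commit states only the pair "(0.55,0.70)".

```python
def _shingles(text: str, k: int) -> set:
    # Character k-grams; a text shorter than k yields itself as one shingle.
    return {text[i:i + k] for i in range(max(len(text) - k + 1, 1))}


def jaccard_shingles(a: str, b: str, k: int = 3) -> float:
    sa, sb = _shingles(a, k), _shingles(b, k)
    return len(sa & sb) / len(sa | sb)


def normalized_levenshtein(a: str, b: str) -> float:
    """Similarity in [0, 1]: 1 - edit_distance / max(len(a), len(b))."""
    if not a and not b:
        return 1.0
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, 1):
        cur = [i]
        for j, cb in enumerate(b, 1):
            cur.append(min(prev[j] + 1, cur[j - 1] + 1, prev[j - 1] + (ca != cb)))
        prev = cur
    return 1.0 - prev[-1] / max(len(a), len(b))


def dedup_action(
    cosine: float,
    new_text: str,
    neighbor_text: str,
    band: float = 0.83,         # HALACHA_DEDUP_BAND_COSINE per the commit
    dedup: float = 0.93,        # upper edge of the band (assumed skip threshold)
    min_jaccard: float = 0.55,  # assumed assignment of the calibrated pair
    min_lev: float = 0.70,      # assumed assignment of the calibrated pair
) -> str:
    """Return 'skip', 'flag_near_duplicate', or 'insert'."""
    if cosine >= dedup:
        return "skip"
    if cosine >= band:
        lexical = (
            jaccard_shingles(new_text, neighbor_text) >= min_jaccard
            and normalized_levenshtein(new_text, neighbor_text) >= min_lev
        )
        if lexical:
            return "flag_near_duplicate"  # goes to review, never dropped
    return "insert"
```

The asymmetry is deliberate in the commit's framing: only the high-cosine case is skipped outright, while band hits are flagged for review so that a possibly distinct principle is never dropped unreviewed.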
Chaim	fb51a0e869	feat(nevo): backfill leaked preamble + ratio gold-set benchmark (#86 ) #86.2 backfill + #86.3 benchmark, plus a #86.1 over-strip fix found en route. extractor.py - extract_nevo_ratio(): capture Nevo's מיני-רציו block (editorial holdings summary) before it is stripped — a free professional gold-set (#86.3). - _DECISION_START hardening (#86.2): the merged #86.1 regex over-stripped. (a) פסק-דין headers are markdown-wrapped (פסק דין); the old anchor required the keyword as the first line char with one separator, so it missed the header and matched a citation 32K deep (עמ"נ 50567-07-21, losing 45% of the body). Now tolerates leading markdown + 0-3 seps, and the final-nun form (דין ן vs דינו נ). (b) bare השופט/הנשיא matched CITATIONS ("השופט מ' חשין, פסקה 23"). The authoring-judge line ends with a colon; we now require it. ingest.py - capture the ratio before stripping and store it on the row (best-effort, non-fatal); also strip the text-upload path (was file-only). db.py - add case_law.nevo_ratio column (additive); allow it in update_case_law. scripts/backfill_nevo_preamble.py (#86.2) — dry-run-by-default data migration: finds historically-leaked rulings, captures ratio→nevo_ratio, rewrites full_text (+content_hash), reindexes, and FLAGS (never deletes) halachot whose quote lives in the removed preamble (review_status=pending_review + nevo_preamble_leak flag). Safety guard: rows with keep%<--min-keep (60) are excluded from --apply as suspected over-strip. --apply writes backup+manifest to data/audit/ first. Chair-gated — NOT applied here. scripts/nevo_ratio_benchmark.py (#86.3) — LLM-as-judge (local claude_session, zero cost) measures recall/precision/granularity of our halachot vs the Nevo ratio. Works pre- and post-backfill (reads nevo_ratio, falls back to full_text). Verified: - pytest tests/test_nevo_preamble.py — 12 passed (incl. citation/markdown over-strip regressions). 
- backfill dry-run: 19 leaked rulings, 27 contaminated halachot, all ≥75% keep (the 32K over-strip is gone). - benchmark on בג"ץ 1764/05: recall=0.875 precision=1.0 granularity=1.75x. Invariants: G1 (normalize at source — strip/capture at ingest, not at read); no silent swallow (contaminated halachot flagged + reported, not dropped); data-migration is dry-run-default with backup+manifest, chair-gated. Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>	2026-06-06 19:45:43 +00:00
Chaim	2e20e27e17	feat(style-acq T1-T3): Daphna's exemplar corpus for the writer (style_exemplars) Fills the exemplar channel (B) of the style-acquisition system: the writer retrieves Daphna's real block paragraphs at writing time, targeted by section+outcome+practice_area. T1, infrastructure + backfill: - SCHEMA_V27: style_exemplars table (purpose-built, with no fake cases in the decision_paragraphs chain). decision_number/source/section/outcome/practice_area+embedding. - db: insert/delete/search_style_exemplars + count_style_exemplars. - scripts/backfill_style_exemplars.py: splits Daphna's corpus (style_corpus + internal_committee) into sections→paragraphs, embeds, stores. Idempotent, dry-run/apply. T2, targeted retrieval: - search_style_exemplars(section, outcome, practice_area): section=hard filter, outcome/practice_area=soft. block_writer._build_precedents_context maps block→section and retrieves (primary), alongside the old path (supplementary). T3, contrastive/adapt: - The exemplars are tagged "structure/voice only: adapt, do not copy content"; full paragraph (1100 characters). INV-LRN5 (purity: style only). G11. The backfill --apply run is performed separately. Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>	2026-06-06 18:10:01 +00:00
Chaim	701efab726	feat(mcp): FU-14 GAP-51: unify the case-outcome vocabulary (set_outcome SSoT) Chair's decision: canonical = 3 real outcomes (rejection/partial_acceptance/full_acceptance); betterment_levy stops being an "outcome" and becomes an override keyed by practice_area. + the principle "English in the DB, Hebrew in the UI": one SSoT label map. lessons.py: - VALID_OUTCOMES = 3 (betterment_levy removed). - OUTCOME_LABELS_HE (SSoT for display) + LEGACY_OUTCOME_MAP + canonical_outcome(). - PRACTICE_AREA_OVERRIDES["betterment_levy"] centralizes all the guidance that was previously keyed as an outcome (golden_ratios/opening/summary/discussion/template). - get_lessons_for_outcome(outcome, practice_area) + format_ratios_comment(..., practice_area) apply the override + normalize legacy values. block_writer.py: canonical STRUCTURE_GUIDANCE + label from OUTCOME_LABELS_HE + betterment override. workflow.set_outcome: canonical 3 + lenient legacy mapping; label from the SSoT. drafting.py: golden-ratios table + get_decision_template are practice_area-aware (override). web-ui case.ts: removed betterment_levy from expectedOutcomes (it is a practice_area). server.py: canonical docstrings. Migration: migrate_gap51_outcomes.py: 9 rows normalized (rejected→rejection etc.), backup in data/audit/. The code canonicalizes at read ⇒ backward-compatible even without the migration. Verified: py_compile (5 files) + offline unit tests (override/legacy/labels) + DB verification. Updated X9 §3 + gap-audit (GAP-51 ✅). Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>	2026-06-06 15:34:49 +00:00
Chaim	7f4e036211	feat(spec): wire the system spec into the interactive writing path (3-layer enforcement) The spec (docs/spec/, G1–G11) was connected to the Paperclip agents through INV-AG1 but not to the path where most of the code is actually written: the interactive Claude Code session. This closes the gap before cycle 2 (FU-9..15), which is entirely code writing. Three enforcement layers: 1. Documentation: CLAUDE.md §"פרוטוקול כתיבת-קוד" (code-writing protocol) + docs/spec in the reference table 2. hook: scripts/spec-guard.sh (PreToolUse on Edit/Write/MultiEdit, registered in .claude/settings.json) reminds once per session on any touch of a code file; non-blocking 3. PR: .gitea/PULL_REQUEST_TEMPLATE.md with a mandatory "Invariants" section. This is the interactive counterpart to INV-AG1, which already enforces the spec on the agents (HEARTBEAT §"קריאת-ספ", spec reading). Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>	2026-06-06 13:28:15 +00:00
Chaim	434341cc29	chore(#57 ): re-chunk+re-embed legacy precedents (pre-#55 chunker remediation) Adds scripts/rechunk_legacy_precedents.py: selects every case_law with a tiny chunk (content<50 — the pre-fix chunker fingerprint) and runs ingest.reindex_case_law (re-chunk+re-embed from stored full_text only, no re-OCR/LLM, idempotent). Batch-idempotent (re-queries the affected set). Run result (2026-06-03): 73 precedents reindexed, 0 failed. Tiny chunks 483 -> 4 (99.2%); total precedent_chunks 5019 -> 3115 (fragments merged). Search verified healthy (substantial coherent passages, no errors). The 4 residual tiny chunks are isolated section headings ('דיון', 'טענות המשיבים', ...) emitted by the CURRENT (fixed) chunker — not legacy fragments — and are already filtered at query time (>=50, #55). Minor chunker edge case, candidate #55 follow-up. The DB chunk migration is already applied to prod; this commit is the script + SCRIPTS.md entry only (no app code change, no deploy needed). Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>	2026-06-03 07:55:42 +00:00
Chaim	887079535c	feat(spec): X11 citation-corroboration + INV-G10 amendment + Opus 4.8 halacha extraction A new spec for an internal citator layer: validating halachot by cumulative judicial treatment (incoming citations), to reduce the scope of the chair's manual approval: - docs/spec/X11-citation-corroboration.md: 6 invariants (INV-COR1–COR6), each with ≥3 professional sources (Shepard's/KeyCite, Hellyer LLJ 2018, UNC Law, NCSC/JTC, CEPEJ). - docs/spec/00-constitution.md: a controlled amendment to INV-G10: the gate is satisfied by cumulative judicial treatment for the positive subset; the chair gate remains mandatory for the tail and for negative treatment. + X11 in the index. - Opus 4.8 @ xhigh as the halacha-extraction model (config HALACHA_EXTRACT_MODEL/EFFORT, env-tunable; claude_session model/effort params; halacha_extractor wired). Based on the 2026-05-31 A/B: less over-extraction, 100% quote-verified, calibrated confidence. - scripts/ab_halacha_opus48.py: a non-destructive A/B harness for comparing model/effort in halacha extraction. - .taskmaster #70 (FU-2c-b): documentation of the שפר dedup + corpus scan (0 stuck stubs remain). Prerequisite (clean identity) completed: שפר merged into a canonical record + a scan of 128 records. Audit findings visible in X11 §7: halacha↔citation linking + treatment classification = greenfield, left for the implementation plan. Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>	2026-05-31 18:42:13 +00:00
Chaim	6ff2e36bf9	feat(eval): FU-5 — retrieval eval harness + halacha backlog visibility (#63 ) Covers GAP-11 (INV-RET4/G8) and GAP-14 (INV-QA1/G10). Retrieval quality was never measured (only telemetry observation) and the halacha review backlog was invisible (the 10/19 gap was found by accident). Unit B — backlog visibility (pure code, container): - metrics.halacha_backlog(conn) → {pending_review, approved, rejected, published, total, oldest_pending_at}; surfaced in metrics.get_dashboard() (get_metrics MCP tool) and /api/system/diagnostics. Live count revealed 178 pending / 1552 total, oldest from 2026-05-03 — previously invisible. Unit A — retrieval eval harness (host-side scripts): - scripts/eval_gold_bootstrap.py — seeds data/eval/gold-set.jsonl. Two sources: citations (cited==relevant via search_relevance_feedback — empty until decisions cite precedents) and known_item (query=case_name → relevant=self; a real citation-free signal, the methodology #52 checked by hand). Idempotent; preserves source='chair' rows. - scripts/eval_retrieval.py — runs the production retrieval path (search_library / search_internal) over the gold-set; computes precision@k, recall@k, MRR, nDCG@k (k=5,10); aggregates overall + per-corpus + per-practice_area; writes a report and a delta vs committed baseline.json (which records the retrieval_config it reflects). --self-test unit-checks the metric math offline. Gold-set strategy = hybrid (chair decision): bootstrap + chair review. The citation source is empty today (0 cited precedents in decisions), so the seed is known-item (77 queries: 54 internal_decisions + 23 precedent_library). The gold-set is PROVISIONAL until Dafna reviews it (the domain chair-gate). Baseline (production config: multimodal+rerank on): R@10=0.987, MRR=0.837, nDCG@10=0.872. Finding: MULTIMODAL_ENABLED=true slightly lowers known-item recall (image-page results displace exact name matches) — relevant to #15. 
precedent_library weaker than internal (R@10 0.957 vs 1.0) — one external precedent unfindable by name. "CI gate" realized as discipline (re-runnable harness + committed baseline + run before/after any retrieval-layer change) — retrieval needs prod DB + Voyage, no CI runner has that access. Spec: docs/superpowers/specs/2026-05-31-fu5-eval-harness-design.md Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>	2026-05-31 14:58:13 +00:00
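The metrics that 6ff2e36bf9 reports (precision@k, recall@k, MRR, nDCG@k) have standard definitions; a minimal binary-relevance sketch follows. It is not the code of `scripts/eval_retrieval.py`, and the function names are illustrative.

```python
import math
from typing import Sequence, Set


def precision_at_k(ranked: Sequence[str], relevant: Set[str], k: int) -> float:
    return sum(1 for doc in ranked[:k] if doc in relevant) / k


def recall_at_k(ranked: Sequence[str], relevant: Set[str], k: int) -> float:
    if not relevant:
        return 0.0
    return sum(1 for doc in ranked[:k] if doc in relevant) / len(relevant)


def reciprocal_rank(ranked: Sequence[str], relevant: Set[str]) -> float:
    """1 / rank of the first relevant result (0.0 if none); MRR is its mean."""
    for rank, doc in enumerate(ranked, 1):
        if doc in relevant:
            return 1.0 / rank
    return 0.0


def ndcg_at_k(ranked: Sequence[str], relevant: Set[str], k: int) -> float:
    """Binary-gain nDCG: DCG of the ranking over DCG of an ideal ranking."""
    dcg = sum(
        1.0 / math.log2(rank + 1)
        for rank, doc in enumerate(ranked[:k], 1)
        if doc in relevant
    )
    ideal_hits = min(len(relevant), k)
    idcg = sum(1.0 / math.log2(rank + 1) for rank in range(1, ideal_hits + 1))
    return dcg / idcg if idcg else 0.0
```

For a known-item query (one relevant document, as in the 77-query seed), recall@k is 1 or 0 per query, and both the reciprocal rank and nDCG equal 1.0 exactly when the item is ranked first.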

69 Commits