legal-ai/mcp-server/tests at 6933d1d0163cc4d59331f7f5cd5fc717aab69084 - legal-ai - Dafna Tamir Vault

ezer-mishpati/legal-ai

Chaim 5f93c7492f

G12 Leak-Guard / leak-guard (pull_request) Successful in 5s

fix(halacha): #81.7 — report Gwet AC1 + consensus-vs-human (κ paradox under skew)

The live panel run exposed Fleiss κ=-0.07 despite 97.5% raw agreement (28/40 unanimous, 11/40 majority).
This is not unreliability but the **kappa paradox**: the marginal of is_holding is extremely skewed
(≈all True, like the 93/100 keep in the human labels), and as Pe→1, κ→0 as well (Feinstein & Cicchetti
1990, "high agreement, low kappa").
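
To make the paradox concrete, here is a minimal sketch of Fleiss κ on a skewed toy panel. The function signature and the toy data are assumptions for illustration; they are not the repository's implementation and not the real panel run.

```python
from typing import Sequence


def fleiss_kappa(counts: Sequence[Sequence[int]]) -> float:
    """Fleiss' kappa for an items x categories matrix of rating counts.

    Every row must sum to the same number of raters (at least 2).
    """
    n_items = len(counts)
    n_raters = sum(counts[0])
    n_cats = len(counts[0])
    # Observed agreement Pa: mean share of agreeing rater pairs per item.
    pa = sum(
        (sum(c * c for c in row) - n_raters) / (n_raters * (n_raters - 1))
        for row in counts
    ) / n_items
    # Chance agreement Pe: sum of squared category marginals.
    marginals = [
        sum(row[j] for row in counts) / (n_items * n_raters)
        for j in range(n_cats)
    ]
    pe = sum(p * p for p in marginals)
    return (pa - pe) / (1 - pe)


# Hypothetical toy panel (NOT the real run): 10 items, 3 raters,
# columns are [votes for True, votes for False].
toy = [[3, 0]] * 7 + [[2, 1]] * 3
kappa = fleiss_kappa(toy)
print(round(kappa, 4))  # -0.1111
```

In this toy case Pa = 0.80 while the marginal is 0.9 True, so Pe = 0.82 and κ comes out negative even though every item has at least a 2/3 majority. The sketch does not guard against Pe = 1 (all votes in one category), where the denominator is zero.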

- gwet_ac1(): a prevalence-robust agreement measure (Gwet 2008); same Pa as Fleiss, different chance estimate
  (2·p·(1-p)). It becomes the headline; Fleiss κ is still reported for transparency, along with raw 3/3 agreement.
- consensus-vs-HUMAN: when a chair label exists, the report measures how well the consensus matches it (external validity).
  Actual verification against the 100 chair labels: 29/29 = 100% match.
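
The two metrics above can be sketched as follows. This is an illustrative sketch only: the signatures, the strict-majority definition of "consensus", and the toy data are assumptions, not the repository's actual gwet_ac1() or report code.

```python
from typing import Optional, Sequence, Tuple


def gwet_ac1(counts: Sequence[Sequence[int]]) -> float:
    """Gwet's AC1 for an items x categories matrix of rating counts."""
    n_items = len(counts)
    n_raters = sum(counts[0])
    n_cats = len(counts[0])
    # Same observed agreement Pa as Fleiss' kappa.
    pa = sum(
        (sum(c * c for c in row) - n_raters) / (n_raters * (n_raters - 1))
        for row in counts
    ) / n_items
    marginals = [
        sum(row[j] for row in counts) / (n_items * n_raters)
        for j in range(n_cats)
    ]
    # Different chance estimate; for two categories this is 2*p*(1-p).
    pe = sum(p * (1 - p) for p in marginals) / (n_cats - 1)
    return (pa - pe) / (1 - pe)


def consensus_vs_human(
    panel_votes: Sequence[Sequence[bool]],
    human_labels: Sequence[Optional[bool]],
) -> Tuple[int, int]:
    """Return (matches, compared) over items that have a human label.

    Assumes an odd number of raters, so a strict majority always exists.
    """
    matches = compared = 0
    for votes, human in zip(panel_votes, human_labels):
        if human is None:  # no human label for this item: skip it
            continue
        majority = sum(votes) * 2 > len(votes)
        compared += 1
        matches += int(majority == human)
    return matches, compared


# Hypothetical toy data (NOT the real run).
toy = [[3, 0]] * 7 + [[2, 1]] * 3
ac1 = gwet_ac1(toy)
print(round(ac1, 4))  # 0.7561

votes = [[True, True, True], [True, True, False],
         [False, False, True], [True, True, True]]
human = [True, None, False, False]
result = consensus_vs_human(votes, human)
print(result)  # (2, 3)
```

On the same skewed toy panel where Fleiss κ is negative, AC1 stays well above zero because its chance term shrinks, rather than grows, as the marginal approaches 0 or 1.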

invariants: no change to write behavior; metrics only. tests: 21 (3 new, including an explicit paradox case).
Source: Gwet 2008 (AC1) · Feinstein & Cicchetti 1990.

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>

2026-06-11 16:13:24 +00:00

..

__init__.py

Add Track Changes architecture for draft revisions (CMP + CMPA)

2026-04-16 18:49:30 +00:00

test_audit_provenance.py

test(audit): failing tests for audit-trail + provenance (FU-7)

2026-05-30 21:27:54 +00:00

test_claude_session.py

fix(claude_session): surface real CLI error + sanitize nested env (#85)

2026-06-06 19:29:36 +00:00

test_corpus_constraints.py

feat(mcp): FU-14 GAP-48 slice 2: unified envelope for 11 tool families

2026-06-06 17:41:39 +00:00

test_corroboration.py

feat(corroboration): approval_action decision fn + kill-switch (INV-COR2/COR4, X11 Phase 2)

2026-06-01 04:34:23 +00:00

test_court_citation.py

fix(X13): route by Net-format (נט) availability; robust fetch error handling

2026-06-07 20:45:20 +00:00

test_docx_exporter_bookmarks.py

DOCX exporter: 3-layer RTL + David font on all slots

2026-04-28 17:37:52 +00:00

test_docx_retrofit.py

Retrofit: tighten yod-bet pattern, add cover-block fallback

2026-04-26 06:57:41 +00:00

test_docx_reviser.py

Add Track Changes architecture for draft revisions (CMP + CMPA)

2026-04-16 18:49:30 +00:00

test_export_qa_gate.py

feat(mcp): FU-14 GAP-48 slice 3: envelope for the drafting family (closes GAP-48)

2026-06-06 17:51:56 +00:00

test_fu2b_reconcile.py

feat(fu2b): flag PROC_MISMATCH (case_number prefix vs proceeding_type) for chair

2026-05-31 08:57:42 +00:00

test_goldset_panel_consensus.py

fix(halacha): #81.7 — report Gwet AC1 + consensus-vs-human (κ paradox under skew)

2026-06-11 16:13:24 +00:00

test_halacha_coerce.py

fix(halacha): split authority (derived) from rule_role — stop source-conflation (INV-DM7)

2026-06-07 18:18:41 +00:00

test_halacha_quality.py

fix(halacha): split authority (derived) from rule_role — stop source-conflation (INV-DM7)

2026-06-07 18:18:41 +00:00

test_halacha_reextract_preserves_approved.py

fix(halacha): re-extraction preserves chair-approved halachot (INV-G10, #108)

2026-06-10 09:08:16 +00:00

test_halacha_rhetorical_prefilter.py

feat(halacha): rhetorical-role pre-filter — fallback excludes facts/arguments (#81.6)

2026-06-11 15:52:13 +00:00

test_idempotent_ingest.py

test(reindex): stub db.mark_indexed in FU-1/FU-2a ingest fixtures (FU-3 interaction)

2026-05-30 22:07:18 +00:00

test_nevo_preamble.py

feat(nevo): backfill leaked preamble + ratio gold-set benchmark (#86)

2026-06-06 19:45:43 +00:00

test_paperclip_access_guard.py

feat(guard): fitness function blocking raw Paperclip access (GAP-22, FU-8a)

2026-05-31 11:35:07 +00:00

test_pipeline_runtime.py

feat(pipeline): durable execution for final_halacha via LangGraph (P0, X16/INV-DUR1, #114)

2026-06-10 09:52:35 +00:00

test_platform_port_leak_guard.py

feat(ci): G12 leak-guard — enforce the Agent Platform Port seam (R4, #113)

2026-06-10 09:40:42 +00:00

test_precedent_corpus_isolation.py

fix(retrieval): enforce source_kind on halacha_filters — close cross-corpus leak (GAP-10, INV-RET1)

2026-05-30 17:46:59 +00:00

test_reindex_on_change.py

test(reindex): cover empty-text raise path (FU-3 review)

2026-05-30 22:13:18 +00:00

test_search_domain_scope.py

feat(mcp): FU-14 GAP-48 slice 1: unified envelope (SSoT) + search family

2026-06-06 16:32:07 +00:00

test_storage_staging.py

feat(storage): X14 Phase 2c — route remaining sync write-sites through storage.py

2026-06-08 08:26:09 +00:00

test_storage.py

feat(storage): X14 Phase 1 — unified storage layer (services/storage.py)

2026-06-08 07:47:49 +00:00

test_sync_verify_gate.py

feat(sync): --verify exits non-zero on drift; adapter mismatch = loud drift (GAP-21, FU-8a)

2026-05-31 11:14:44 +00:00

test_track_changes_e2e.py

Add Track Changes architecture for draft revisions (CMP + CMPA)

2026-04-16 18:49:30 +00:00

test_unified_ingest.py

test(reindex): fix mark_indexed stub arity in FU-1 fixture (FU-3)

2026-05-30 22:07:39 +00:00