Commit Graph

516 Commits

Author SHA1 Message Date
2d7ab26c71 Merge pull request 'fix(#78): trigger extraction wakeup on committee-decision upload + surface silent failures' (#39) from fix/78-precedent-extraction-wakeup into main
All checks were successful
Build & Deploy / build-and-deploy (push) Successful in 10s
2026-06-02 12:06:56 +00:00
1d3e235556 fix(#78): trigger extraction wakeup on committee-decision upload + surface failures
The /api/internal-decisions/upload path (used by the UI for ועדת-ערר
decisions) never called pc_wake_for_precedent_extraction, so committee
decisions were stuck at halacha_extraction_status='pending' forever — the
CEO was never woken to drain the queue. Root cause behind 8027-25's stuck
extraction. The other two upload paths (precedent_library, missing-precedent)
already wake the CEO; this one was missing it.

- internal-decisions upload: add the wakeup, routing the company by case
  number prefix (1xxx→רישוי, 8xxx→היטל, 9xxx→פיצויים) when practice_area is
  empty (else an 8xxx case wrongly routes to the licensing CEO).
- all three call sites: the wake helper returns {ok:False} WITHOUT raising
  on a skipped/failed wakeup; that was silently dropped. Now logged at
  WARNING with the reason, and the upload progress carries extraction_queued.

Fallback drainer (scheduled precedent_process_pending) deferred — the
missing wakeup was the actual failure; manual precedent_process_pending
remains the recovery path.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
2026-06-02 12:06:31 +00:00
7471dcf3cc Merge pull request 'chore: tasks #76-78 + weekly chair-feedback lessons #34-35' (#38) from chore/tasks-and-weekly-lessons into main
All checks were successful
Build & Deploy / build-and-deploy (push) Successful in 8s
2026-06-02 11:57:44 +00:00
d790fb26e0 docs(lessons): weekly chair-feedback lessons #34-35 (week ending 2026-05-31)
#34 don't manufacture doubt about unambiguous statutes (s.19(ג)(2));
#35 writer/QA two-sources-of-truth sync gap (DB vs drafts/decision.md).
Output of the weekly-feedback-analysis job, pending commit.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
2026-06-02 11:57:24 +00:00
7e34c53224 chore(tasks): add #76-78 — Paperclip create-task button + 2 precedent-upload bugs
#76 צור-משימה button (enabled but submit no-ops), #77 committee-upload
field mapping (citation→case_number, case_number uneditable), #78 silent
extraction wakeup failure. Discovered while debugging precedent 8027-25.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
2026-06-02 11:57:24 +00:00
77ed0361b7 Merge pull request 'fix(appraiser-facts): valid Paperclip priority enum (normal→medium)' (#37) from fix/appraiser-facts-priority-enum into main
All checks were successful
Build & Deploy / build-and-deploy (push) Successful in 3m18s
2026-06-02 11:49:23 +00:00
5d63a903ce fix(appraiser-facts): valid Paperclip priority enum (normal→medium)
The 'חלץ עובדות שמאיות עכשיו' button returned HTTP 500. Root cause:
wake_analyst_for_appraiser_facts POSTs a child issue to Paperclip with
priority='normal', but Paperclip's ISSUE_PRIORITIES enum is only
critical|high|medium|low. createChildIssueSchema (Zod) rejects 'normal'
with 400 Bad Request; pc_request raise_for_status() turns it into a 500
surfaced to the chair. Fixed to 'medium' (the sole non-normal occurrence
in the repo).

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
2026-06-02 11:48:58 +00:00
aeddcb41eb Merge pull request 'feat(web-ui): sort corroborated halachot first' (#36) from feat/x11-corroborated-first into main
All checks were successful
Build & Deploy / build-and-deploy (push) Successful in 36s
2026-06-01 05:50:29 +00:00
1aadd3b455 feat(web-ui): sort corroborated halachot first in extracted list (X11)
Halachot carrying a corroboration badge (positive citation count or a
negative treatment) float to the top of 'הלכות שחולצו', ordered by
corroboration strength; the rest keep document order by halacha_index.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
2026-06-01 05:50:12 +00:00
f66a2a27e7 Merge pull request 'feat(web-ui): X11 corroboration badge on halachot' (#35) from feat/x11-corroboration-web-ui into main
All checks were successful
Build & Deploy / build-and-deploy (push) Successful in 1m35s
2026-06-01 05:04:58 +00:00
f46bf47d5b feat(web-ui): expose citation-corroboration badge on halachot (X11)
- db.list_halachot: aggregate corroboration_count (distinct positive sources)
  + corroboration_negative from halacha_citation_corroboration (LEFT JOIN)
- web-ui: CorroborationBadge — 'מתוקף · N ציטוטים' at ≥2 (gold), soft single
  citation, danger badge on negative treatment; native title tooltips
- shown in ExtractedHalachotSection (per-precedent) + halacha review panel

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
2026-06-01 05:04:31 +00:00
9f2adc4dd0 Merge pull request 'docs(X11): wire corroboration tools into CEO flow + user guide' (#34) from docs/x11-phase2-tool-integration into main
All checks were successful
Build & Deploy / build-and-deploy (push) Successful in 8s
2026-06-01 04:52:25 +00:00
e79f74bc23 docs(X11): wire corroboration tools into CEO halacha flow + guide (X11 Phase 2)
- CEO: run corroboration_rebuild after precedent_process_pending(halacha);
  report {approved, demoted}; tools added to allowlist
- researcher: halacha_corroboration (read) in allowlist
- TaskMaster #75 → done

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
2026-06-01 04:52:02 +00:00
3bd2d16652 Merge pull request 'feat(X11): citation-corroboration Phase 2 — wire the approval gate + backfill' (#33) from feat/x11-corroboration-phase2 into main
All checks were successful
Build & Deploy / build-and-deploy (push) Successful in 1m35s
2026-06-01 04:43:24 +00:00
b4d1fc5539 docs(audit): X11 Phase 2 corroboration backfill result (X11 Phase 2)
12 precedents, 20 links, 0 negatives. 4 halachot corroborated — all already
confidence-approved (signal fully overlaps confidence set), so 0 transitions.
Approve path proven in rolled-back tx; no chair-final state touched.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
2026-06-01 04:41:58 +00:00
ed547e20ad feat(corroboration): wire approval gate + backfill driver + rebuild tool (X11 Phase 2)
- db: approve_halacha_by_corroboration (pending_review→approved only),
  demote_halacha_overruled (approved→pending_review only), list_corroboration_grouped,
  precedents_with_halachot_and_incoming_citations
- corroboration: reconcile_approvals (INV-COR2/COR4/COR5), build_all backfill;
  build_for_precedent now returns approved/demoted counts
- mcp: corroboration_rebuild write tool (single precedent or full-corpus backfill)

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
2026-06-01 04:35:37 +00:00
df007784c9 feat(corroboration): approval_action decision fn + kill-switch (INV-COR2/COR4, X11 Phase 2)
- HALACHA_CORROBORATION_AUTO_APPROVE config (default ON, Dafna validated 2026-06-01)
- approval_action(agg, has_overruled): overruled→demote, corroborated→approve, else None
- 4 offline unit tests; Phase 2 plan + TaskMaster #75

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
2026-06-01 04:34:23 +00:00
391b025e8a Merge pull request 'feat(halacha): effort קל-יותר לחילוץ-bulk (מהירות בקנה-מידה)' (#32) from feat/halacha-bulk-effort into main
All checks were successful
Build & Deploy / build-and-deploy (push) Successful in 1m36s
2026-05-31 21:34:44 +00:00
885cba543e feat(halacha): lighter effort for BULK queue-drain extraction (speed at scale)
xhigh is the quality sweet-spot for a single precedent but very slow at scale
(64-chunk case ≈ 20 min). Bulk queue-drains (process_pending over many
precedents) now use a lighter effort to cut wall-clock; interactive single
re-extraction keeps xhigh quality.

- config.HALACHA_BULK_EXTRACT_EFFORT (env, default 'high'; set 'medium' for max
  speed, 'xhigh' to match single).
- extract()/_extract_impl()/_extract_chunk() take an `effort` override threaded
  to claude_session.query_json; None falls back to HALACHA_EXTRACT_EFFORT (xhigh).
- process_pending_extractions(kind='halacha') passes the bulk effort; single
  reextract_halachot keeps xhigh.

Verified end-to-end (mocked LLM): _extract_chunk(effort='medium') → query_json
effort='medium'; effort=None → 'xhigh' fallback. Closes the open item in #72.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
2026-05-31 21:34:13 +00:00
acfd5bae3e Merge pull request 'feat(halacha): חילוץ מצטבר crash-safe + resume (A + resume)' (#31) from feat/halacha-incremental-resume into main
All checks were successful
Build & Deploy / build-and-deploy (push) Successful in 3m5s
2026-05-31 21:28:19 +00:00
8e4ea23882 feat(halacha): crash-safe incremental extraction + resume (A + resume)
Halacha extraction held ALL chunk results in memory and stored once at the very
end — a crash/interrupt mid-run (e.g. the 2026-05-31 freeze) lost everything and
re-paid the full LLM cost on retry.

Now each chunk's halachot are stored AND the chunk is checkpointed
(precedent_chunks.halacha_extracted_at) the moment it finishes:

- V25 schema: precedent_chunks.halacha_extracted_at (per-chunk checkpoint).
- db.store_halachot_for_chunk: atomic per-chunk insert (halacha_index continues
  from MAX, caller serializes via an in-process store-lock) + checkpoint mark.
- db.reset_halacha_extraction (force) / mark_all_chunks_extracted (legacy backfill).
- _extract_impl rewritten: resume by default (skip checkpointed chunks; failed
  chunks stay pending and are retried; status stays 'processing' until all done);
  force=True wipes + redoes all. reextract_halachot passes force=True; the queue
  drain (process_pending) resumes by default.
- Legacy guard: a pre-V25 precedent (halachot exist, no checkpoints) is
  backfilled and treated as complete — never re-extracted (would duplicate).

Verified on 9002-24 (55 halachot, legacy): resume → legacy-backfill, NO
duplication (stays 55), all chunks checkpointed. Index continuation: store at
55,56 after max 54, no collision. Tracks #72.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
2026-05-31 21:27:46 +00:00
6183e24316 Merge pull request 'fix(halacha): נעילה גלובלית — חילוץ אחד בכל רגע (מונע הקפאת מכונה)' (#30) from fix/halacha-extract-global-lock into main
All checks were successful
Build & Deploy / build-and-deploy (push) Successful in 1m35s
2026-05-31 20:43:10 +00:00
807053ec54 fix(halacha): global advisory lock — one extraction at a time (prevents box freeze)
2026-05-31: opus-4-8 @ xhigh extraction + overlapping driver processes (agent
fallback retries each spawn an independent `python -c` driver; process_pending is
serial WITHIN a process but the box ran 4-5 drivers in parallel) → 12-16 concurrent
xhigh `claude -p` procs → load 69 → hard reboot.

Fix: halacha_extractor.extract() now takes a Postgres advisory lock
(pg_try_advisory_lock, key 'HALA') before any work. If another extraction (any
process/agent/driver — all share the legal-ai DB) holds it, the call returns
status='busy' and the precedent stays pending for the next drain. Guarantees ONE
extraction at a time ACROSS PROCESSES — an in-process Semaphore cannot (drivers
are separate OS processes). Core logic moved to _extract_impl (unchanged) under
the lock. CHUNK_CONCURRENCY now env-tunable (HALACHA_CHUNK_CONCURRENCY, default 3).

Verified: while a lock is held, extract() returns 'busy' with no LLM call; lock
releases cleanly and the next extraction proceeds. Tracks #72.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
2026-05-31 20:42:15 +00:00
62e5e5183d Merge pull request 'fix(precedents): החלטות ועדת ערר אינן מחייבות (is_binding=false)' (#29) from fix/committee-decisions-not-binding into main
All checks were successful
Build & Deploy / build-and-deploy (push) Successful in 43s
2026-05-31 20:40:54 +00:00
1b62fa4af8 fix(precedents): ועדת ערר decisions are never binding (is_binding=false)
מסך העלאת הפסיקה הציג צ'קבוקס "הלכה מחייבת" עם ברירת מחדל true גם
להחלטות ועדת ערר (isCommittee), כך שהלכות שחולצו מהחלטה לא-מחייבת
תויגו rule_type='binding' — בסתירה להגדרה הדוקטרינרית (ועדת ערר =
persuasive בלבד, לא binding כמו עליון/מנהלי).

- מסלול ההגשה של החלטות ועדת ערר שולח כעת is_binding=false תמיד
- הצ'קבוקס ננעל (disabled+unchecked) כשזוהתה החלטת ועדת ערר, עם
  הסבר שההלכות יסומנו persuasive

יישור דוקטרינרי בלבד — אין השפעה downstream על ranking/injection;
rule_type הוא תווית תצוגה, והשער הפונקציונלי הוא review_status.

TaskMaster #73

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
2026-05-31 20:39:59 +00:00
e712573766 Merge pull request 'docs(X11): מקורות פתוחים + אימות ההחלטה מול הספרות הפתוחה' (#28) from docs/x11-open-sources into main
All checks were successful
Build & Deploy / build-and-deploy (push) Successful in 9s
2026-05-31 19:30:27 +00:00
6ed5c9e99f docs(X11): foreground open-access sources; verify decision against open literature
החלפת מיקוד שורות-המקורות של INV-COR1–COR5 + תיקון-G10 ממוצרים סגורים (Shepard's/KeyCite)
למקורות פתוחים שאומתו בפועל — בהתאם ל-feedback_legal_db_authoritative_sources ולפרוטוקול
≥3-המקורות של החוקה:

- Fowler et al., Network Analysis and the Law (Political Analysis 2007) — ציטוטים-נכנסים =
  מדד-סמכות, מאומת בניבוי ציטוט עתידי (INV-COR1/COR4).
- Demir & Canbaz, Validate Your Authority (NLLP/ACL 2025) — LLM מסווג טיפול-תקדים ב-67.7–79.1%;
  הדיוק הלא-מושלם מצדיק את הסייגים השמרניים (≥N, שער-אנוש, שלילי→דגל) (INV-COR2/COR4/COR5).
- CaseHOLD (arXiv 2021) — סיווג ברמת holding (INV-COR3). LePaRD (arXiv 2023) — citation dataset.
- Hellyer (LLJ 2018, open-access), NCSC/JTC, CEPEJ, ISO 15489 — ללא שינוי, פתוחים.

מסקנה: הספרות הפתוחה תומכת בהחלטה (citator + סיווג-טיפול + סמכות-מבוססת-ציטוט), ודווקא
מחזקת את הגרסה השמרנית. אין גישה ל-Shepard's/KeyCite הסגורים — המידע עליהם הגיע ממקורות
משניים פתוחים בלבד.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
2026-05-31 19:30:02 +00:00
a9472187ff Merge pull request 'feat(X11): citation-corroboration Phase 1 — the signal (no approval change)' (#27) from feat/x11-corroboration-phase1 into main
All checks were successful
Build & Deploy / build-and-deploy (push) Successful in 1m46s
2026-05-31 19:18:49 +00:00
5abfbd2746 feat(mcp): halacha_corroboration read-only tool (INV-COR6, X11)
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
2026-05-31 19:07:37 +00:00
b57e590275 feat(corroboration): orchestrator + persistence over both citation graphs (X11)
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
2026-05-31 19:04:20 +00:00
33f955e372 feat(corroboration): aggregator — distinct positive + negative-flag (INV-COR4, X11)
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
2026-05-31 19:00:16 +00:00
dbc176ae66 feat(corroboration): halacha matcher + cosine threshold (INV-COR3, X11)
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
2026-05-31 18:57:47 +00:00
09eec6a906 feat(corroboration): treatment classifier + polarity (INV-COR2, X11)
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
2026-05-31 18:54:50 +00:00
ca31932a5f feat(db): V24 — citation treatment column + halacha corroboration link table (X11) 2026-05-31 18:52:16 +00:00
beba24dfc5 docs(plan): X11 corroboration Phase 1 implementation plan
Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
2026-05-31 18:50:50 +00:00
ae8efc0b63 Merge pull request 'feat(spec): X11 ציטוט-corroboration + תיקון INV-G10 + Opus 4.8 לחילוץ הלכות' (#26) from feat/x11-citation-corroboration into main
All checks were successful
Build & Deploy / build-and-deploy (push) Successful in 1m42s
2026-05-31 18:42:59 +00:00
887079535c feat(spec): X11 citation-corroboration + INV-G10 amendment + Opus 4.8 halacha extraction
ספ חדש לשכבת citator פנימית — תיקוף הלכות לפי טיפול-שיפוטי מצטבר (ציטוטים נכנסים),
לצמצום היקף האישור-הידני של היו"ר:

- docs/spec/X11-citation-corroboration.md — 6 invariants (INV-COR1–COR6), כל אחד עם
  ≥3 מקורות מקצועיים (Shepard's/KeyCite, Hellyer LLJ 2018, UNC Law, NCSC/JTC, CEPEJ).
- docs/spec/00-constitution.md — תיקון מבוקר ל-INV-G10: השער מסופק ע"י טיפול-שיפוטי-מצטבר
  לתת-הקבוצה החיובית, שער-היו"ר נשאר חובה לזנב ולשלילי. + X11 באינדקס.
- Opus 4.8 @ xhigh כמודל חילוץ הלכות (config HALACHA_EXTRACT_MODEL/EFFORT, env-tunable;
  claude_session model/effort params; halacha_extractor מחווט). מבוסס A/B 2026-05-31:
  פחות חילוץ-יתר, 100% quote-verified, ביטחון מכויל.
- scripts/ab_halacha_opus48.py — harness A/B לא-הרסני להשוואת מודל/effort בחילוץ הלכות.
- .taskmaster #70 (FU-2c-b) — תיעוד dedup שפר + סריקת-קורפוס (0 stubs תקועים נותרו).

תנאי-קדם (זהות נקייה) הושלם: שפר מוזג לרשומה קנונית + סריקת 128 רשומות.
audit-findings גלויים ב-X11 §7: קישור הלכה↔ציטוט + סיווג-טיפול = greenfield, ל-implementation plan.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
2026-05-31 18:42:13 +00:00
d83a2a2fb2 Merge pull request 'docs(spec): מחזור-2 — 8 משטחי-האפליקציה (X6–X10) + ui-audit + GAP-24..62/FU-9..15' (#25) from docs/fu9-15-cycle2-spec into main
All checks were successful
Build & Deploy / build-and-deploy (push) Successful in 10s
2026-05-31 16:22:42 +00:00
37c56ff22a docs(spec): cycle-2 — 8 application-surface domains (X6–X10) + ui-audit + GAP-24..62/FU-9..15
Extends the system spec beyond the core pipeline to the 8 surfaces outside it:
- X6 UI↔API contract + design rules (INV-UI1..6)
- X7 Paperclip client & connection params (INV-INT4..8)
- X8 field-population & extraction provenance (INV-FP1..5)
- X9 MCP tool contract — 71 tools (INV-TOOL1..6)
- X10 deploy/env/secrets (INV-ENV1..5)
- ui-audit.md — page-by-page UI audit (13 pages)
- 02-data-model: derived-entity invariants (INV-DM4..6)
- X4-agents: tool-grant map + INV-AG3
- gap-audit: GAP-24..62 → FU-9..15; cycle-1 (FU-1..8b) marked done
- constitution §7 + README index (X1..X10)

Planning/spec artifacts only — no application code. All engineering invariants
backed by ≥3 sources; every finding carries verified file:line.

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
2026-05-31 16:21:27 +00:00
c70a03f91e Merge pull request 'chore(tasks): #71 — FU-5 follow-up (multi-precedent recall depth)' (#24) from chore/task-71-retrieval-depth into main
All checks were successful
Build & Deploy / build-and-deploy (push) Successful in 8s
2026-05-31 16:06:12 +00:00
1cc7c0e757 chore(tasks): #71 — FU-5 follow-up, multi-precedent recall depth tuning
Diagnosis from the FU-5 eval: co-relevant precedents for broad legal questions
rank 15-16 (retrieved, not absent — recall ~1.0 by rank 20). Tracked as a
deliberate, harness-measured tuning task rather than an unmeasured global limit
change (which affects UI + writer agents + token cost).

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
2026-05-31 16:05:53 +00:00
ae7d475103 Merge pull request 'FU-8b: חיווט הספ לסוכנים — INV-AG1 read-before-act (GAP-23)' (#23) from feat/fu-8b-spec-wiring-agents into main
All checks were successful
Build & Deploy / build-and-deploy (push) Successful in 8s
2026-05-31 16:03:45 +00:00
a02a606f34 feat(agents): wire spec into agents — INV-AG1 read-before-act gate (FU-8b/GAP-23)
חיווט ספ-המערכת לסוכני-Paperclip כך שכל סוכן חייב לקרוא את 00-constitution
תחילה, ואז את ספ-התחום הרלוונטי לתפקידו (לפי טבלת X4 §2) — לפני עבודה מהותית.

- HEARTBEAT.md: סעיף עליון "קריאת-ספ — קודם החוקה (00), אז ספ-התחום" לפני §0–§8,
  עם טבלת תפקיד→ספ ל-8 הסוכנים.
- 8 קבצי-סוכן (ceo/proofreader/researcher/analyst/writer/qa/exporter/hermes):
  סעיף "קרא לפני פעולה (INV-AG1)" בראש הגוף.
- X4-agents.md: שדה "אכיפה" של INV-AG1 → "מחוּוט (פרוצדורלי)"; §5 → "בוצע".

אכיפה פרוצדורלית בכוונה — invariant פרויקטלי-תפעולי, אין שער-קוד שמכריח קריאה.
prereq לסוכני-התהליך (תת-פרויקט 5). gap-audit נשמר כ-snapshot (כמו FU-8a).

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
2026-05-31 16:02:04 +00:00
ff5187c9c1 Merge pull request 'chore(eval): add 9 chair-approved semantic queries to FU-5 gold-set' (#22) from chore/goldset-semantic-queries into main
All checks were successful
Build & Deploy / build-and-deploy (push) Successful in 8s
2026-05-31 15:58:10 +00:00
7161c3d010 chore(eval): add 9 chair-approved semantic queries to gold-set (FU-5)
The gold-set was 77 known-item probes (query=case_name). Added 9 chair-approved
SEMANTIC queries (S1–S9) — a real legal question per row, relevant = the
precedents that should surface (drawn from subject_tags, chair-confirmed). These
test what matters: does retrieval answer a legal issue, not just find a case by
name. source='chair' (preserved across re-bootstrap). practice_area left empty
so the filter never excludes a cross-tagged precedent (s.197 rulings sit under
betterment_levy).

Baseline now 86 queries. Finding from the 9 semantic queries: MRR ≈ 1.0 — the
system surfaces a lead relevant precedent at rank 1 for nearly every question —
but R@10 ranges 0.5–1.0: for broad questions with many co-relevant precedents
(e.g. נטרול תמ"א 38 = 5 relevant → R@10 0.60; שמאי מכריע = 2 → 0.50) some
co-relevant rulings miss the top-10. Lead-precedent retrieval is strong;
exhaustive multi-precedent recall is the gap.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
2026-05-31 15:57:45 +00:00
eef04b0f09 Merge pull request 'chore(eval): chair fix — rename ARAR-24-9002 → קרקעות ירושלים 2 + refresh gold-set' (#21) from chore/goldset-chair-fix-arar into main
All checks were successful
Build & Deploy / build-and-deploy (push) Successful in 8s
2026-05-31 15:48:17 +00:00
411ee18786 chore(eval): chair review — rename code-named record + refresh gold-set
Chair review of the FU-5 gold-set surfaced one internal_committee record whose
case_name was a code ("ARAR-24-9002") rather than a real name. Per the chair's
citation (ערר 9002/24 קרקעות ירושלים 2 בע"מ נ' הוועדה המקומית ירושלים, נבו
13.8.2025, a s.197 compensation appeal), case_name corrected in the DB to
"קרקעות ירושלים 2" (case_number 9002-24 and citation_formatted were already
correct; only 1 such code-named record exists corpus-wide). Re-bootstrapped the
gold-set (the known-item query is now the real name) and refreshed baseline
(aggregate unchanged — the case retrieves identically under the corrected name).

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
2026-05-31 15:47:57 +00:00
83d6b5ecf0 Merge pull request 'fix: drop gold-set card from chair approval center (data/ not in image)' (#20) from fix/chair-pending-drop-goldset-card into main
All checks were successful
Build & Deploy / build-and-deploy (push) Successful in 8s
2026-05-31 15:41:40 +00:00
c231782ee8 fix(ui): drop gold-set card from /api/chair/pending — data/ excluded from image
The gold-set card read data/eval/gold-set.jsonl, but .dockerignore excludes
data/ from the build context, so the file is never in the container and the
card silently never rendered. Baking eval data into the image is the wrong
layering (data/ is runtime volumes). The gold-set review is a one-time task,
not a recurring chair queue, so it doesn't belong on the live board — it's
tracked via task #63 and reviewed directly with the chair. The board now
returns the 4 robust DB-backed gates (halachot, missing precedents, feedback,
qa_failed). Removes the best-effort file read + its unused Path import.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
2026-05-31 15:41:00 +00:00
dfa2f5bd7f Merge pull request 'מרכז אישורים — chair approval center (everything Dafna must approve, in one page)' (#19) from feat/chair-approval-center into main
All checks were successful
Build & Deploy / build-and-deploy (push) Successful in 37s
2026-05-31 15:37:00 +00:00