Commit Graph

712 Commits

Author SHA1 Message Date
90f3c472b5 fix(goldset): single view-mode filter — can't get stuck hiding untagged
The old independent toggles had a trap: clicking "אי-הסכמות AI" set a filter,
and once all disagreements were resolved the toggle button disappeared
(rendered only when count>0) while the filter stayed ON — so the list showed
zero items and the untagged ones were unreachable.

Replaced hideTagged + disagreeOnly with one mutually-exclusive segmented
control: הכל / לא תויגו / תויגו / ⚠ אי-הסכמות, each with a live count and always
visible. No stuck state; "לא תויגו" makes the remaining work obvious.

Verified: tsc --noEmit 0.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
2026-06-07 14:47:53 +00:00
638a542cf4 Merge pull request 'feat(goldset): AI second-opinion per item (QA aid)' (#107) from worktree-goldset-ai-recommendation into main
All checks were successful
Build & Deploy / build-and-deploy (push) Successful in 1m35s
2026-06-07 14:25:06 +00:00
0e35060d3d feat(goldset): AI second-opinion per item (QA aid) — compare vs human tag
The chair wanted an independent recommendation beside each tag, to reconsider
his own judgments. Adds a NON-ground-truth AI second-opinion:

- schema: halacha_goldset.ai_is_holding / ai_correct_type / ai_rationale /
  ai_generated_at (additive).
- db.goldset_set_ai_recommendation + goldset_list now returns the ai_* fields.
- scripts/goldset_ai_recommend.py — local claude_session judges is_holding +
  type + a one-line rationale per item, INDEPENDENTLY (own legal rubric).
  Independent of the rule-based validators #81.8 measures → no circularity.
  Never auto-applied; QA aid only.
- web-ui: each card shows "🤖 המלצת AI: הלכה/לא · type" + rationale and an
  agreement/disagreement chip vs the human tag (amber on disagree); a
  "⚠ אי-הסכמות AI (N)" filter to review only the conflicts.

Methodology note kept explicit: the human stays the ground truth; the AI is a
prompt to reconsider, not to copy.

Verified: tsc --noEmit 0; generator stores recs and flags disagreements with
existing human tags.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
2026-06-07 14:24:35 +00:00
a0c1b74c55 Merge pull request 'fix(goldset): score panel open by default + sparse-negatives hint' (#106) from worktree-goldset-score-open into main
All checks were successful
Build & Deploy / build-and-deploy (push) Successful in 38s
2026-06-07 14:12:08 +00:00
7e7de485a4 fix(goldset): score panel open by default + sparse-negatives hint
The validator score panel was collapsed by default, so taggers thought nothing
was happening. Now open by default, with a caption explaining the metrics
measure "not-a-holding" detection and become meaningful as more "לא הלכה" items
are tagged (showing the current negative count while it's small).

Verified: tsc --noEmit 0.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
2026-06-07 14:11:49 +00:00
e62f39aabf Merge pull request 'feat(goldset): separate court rulings from committee decisions' (#105) from worktree-goldset-source-split into main
All checks were successful
Build & Deploy / build-and-deploy (push) Successful in 1m26s
2026-06-07 13:55:27 +00:00
632fe73857 feat(goldset): separate court rulings from committee decisions in tagging
Tagging is easier one source-type at a time. goldset_list now returns
case_law.source_type; the page adds:
- a filter (הכל / פסקי דין / ועדת ערר) with live counts,
- a group-sort so even in "הכל" all court rulings come first, then all
  committee decisions,
- a per-card source badge (פסק-דין / ועדת ערר).

Verified: tsc --noEmit 0; source_type splits the live batch 58 court / 92 committee.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
2026-06-07 13:55:06 +00:00
f60fdc2c6d Merge pull request 'fix(goldset): order help table to match the type buttons' (#104) from worktree-goldset-help-order into main
All checks were successful
Build & Deploy / build-and-deploy (push) Successful in 37s
2026-06-07 13:45:45 +00:00
a07622659c fix(goldset): order rule-type help table to match the buttons
TYPE_HELP popover now follows the same order as the type buttons:
מחייבת · פרשני · יישום · אמרת-אגב · פרוצדורלי · משכנע.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
2026-06-07 13:45:30 +00:00
a1f491e9cc Merge pull request 'feat(goldset): soft consistency warning between is_holding and type' (#103) from worktree-goldset-consistency-warn into main
All checks were successful
Build & Deploy / build-and-deploy (push) Successful in 37s
2026-06-07 13:40:28 +00:00
5aa3d4ed99 feat(goldset): soft consistency warning between is_holding and type
"לא הלכה" + "מחייבת" (or any holding-type) is a logical contradiction — binding
means it IS the holding. Likewise "הלכה" + application/obiter. The three controls
are independent, so the combo was clickable with no signal.

Adds a non-blocking amber warning under the type buttons when is_holding and
correct_type contradict (holding ↔ binding/interpretive/procedural/persuasive;
not-holding ↔ application/obiter). Soft by design — flags the inconsistency for
the tagger to fix without forcing, leaving room for genuine edge cases.

Verified: tsc --noEmit exits 0.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
2026-06-07 13:40:05 +00:00
b107654ee4 Merge pull request 'fix(goldset): "tagged" = all 3 answers + rule-type help popover' (#102) from worktree-goldset-tagged-fix into main
All checks were successful
Build & Deploy / build-and-deploy (push) Successful in 38s
2026-06-07 13:27:19 +00:00
27911c5beb fix(goldset): "tagged" = all 3 answers set + add rule-type help popover
Two UX fixes on the gold-set tagging page:

1. isTagged now requires is_holding AND correct_type AND quote_complete — not
   just is_holding. Previously, in "hide tagged" mode the card vanished the
   instant is_holding was clicked, so the type and quote-complete answers could
   never be set. The progress counter / "תויג" badge now reflect full tagging.

2. An info (ℹ) icon next to "הסוג הנכון" opens a popover explaining the six
   rule types (definition + the deciding test + an example each), so the tagger
   has the criteria in front of them while tagging.

Verified: tsc --noEmit exits 0.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
2026-06-07 13:26:52 +00:00
1a1757f29d Merge pull request 'feat(goldset): interactive gold-set tagging page (#81.7/#81.8)' (#101) from worktree-goldset-tagging-ui into main
All checks were successful
Build & Deploy / build-and-deploy (push) Successful in 1m25s
2026-06-06 21:52:41 +00:00
ac279220c4 feat(goldset): interactive gold-set tagging page (#81.7/#81.8)
Replaces the CSV-edit workflow with an in-app tagging page so the chair/Dafna
can label the extraction-quality gold-set by clicking, and see validator
precision/recall live.

Schema (V29): halacha_goldset — a stratified, human-tagged evaluation batch
(is_holding / correct_type / quote_complete, NULL until tagged).

db.py:
- goldset_create_sample (stratified round-robin over case×rule_type, idempotent),
- goldset_list (items + halacha content + the machine's own labels),
- goldset_tag (partial — one field at a time for keyboard tagging),
- goldset_score (ports the script's P/R/F1: each validator scored as a
  not-a-holding detector against the human tags — the #81.8 input).

API: GET /api/goldset, POST /api/goldset/sample, GET /api/goldset/score,
PATCH /api/goldset/{id}.

web-ui:
- lib/api/goldset.ts (hooks),
- components/goldset/goldset-panel.tsx — card-per-item, keyboard-first
  (J/K nav, H/N holding, C/X quote), progress bar, hide-tagged toggle, and a
  collapsible live score table,
- app/goldset/page.tsx + nav link "מדגם-זהב" under ידע ולמידה.

Methodology guard kept explicit in UI + docstrings: tags are HUMAN ground truth,
no AI pre-fill (circular bias). Populated a 150-item stratified batch.

Verified: backend create/list/tag/score against the live DB; tsc --noEmit 0;
py_compile ok. (Local Turbopack build blocked by worktree symlink — CI builds clean.)

Invariants: G1 (eval set modeled at source in its own table); G2 (reuses the same
halacha_quality validators the extractor runs — no parallel scoring logic).

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
2026-06-06 21:52:05 +00:00
9bd247c421 Merge pull request 'feat(halacha): equivalent-halacha (parallel-authority) links across precedents' (#100) from worktree-equivalent-halachot into main
All checks were successful
Build & Deploy / build-and-deploy (push) Successful in 1m24s
2026-06-06 21:30:21 +00:00
b7b44f4453 feat(halacha): equivalent-halacha (parallel-authority) links across precedents
Cross-precedent recurrence of a principle is real but is NOT citation
corroboration (X11) — the 5 candidate pairs have ZERO citations between their
precedents. Recording them in halacha_citation_corroboration would fabricate
citation data and inflate corroboration_count. This adds a proper, separate
halacha-level link for parallel authority.

Schema (V28): equivalent_halachot — symmetric (halacha_a < halacha_b, CHECK +
UNIQUE), non-citation, cross-precedent-only. ON DELETE CASCADE.

db.py:
- link_equivalent_halachot (idempotent; rejects same-id and SAME-precedent pairs
  — parallel authority is cross-precedent by definition), unlink, and
  list_equivalent_for_halacha.
- list_halachot gains include_equivalents → _annotate_equivalents attaches an
  `equivalents` list (both directions) per row.

API: include_equivalents on GET /api/halachot; GET/POST/DELETE
/api/halachot/{id}/equivalents for the chair to view/link/unlink manually.

scripts/halacha_batch_reconcile.py: --link records found cross-precedent pairs
as equivalent_halachot (non-destructive, idempotent).

web-ui: Halacha.equivalents type; the clean review queue fetches
include_equivalents; the review card shows a gold "עיקרון מקביל ב-N" badge + an
expandable list (case + rule + similarity) labeled "אסמכתה מקבילה — לא ציטוט".

Populated the 5 reviewed pairs (chair decision: keep all + link as parallel
authority). Verified: 5 rows; the 1023-20 hub annotates 3 of its halachot with
equivalents; tsc --noEmit exits 0.

Invariants: G1 (model recurrence at source in its own table, not by abusing the
citator); G2 (no parallel path — extends list_halachot); citator integrity
preserved (corroboration stays citation-only).

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
2026-06-06 21:29:46 +00:00
ab99cfa1d3 Merge pull request 'docs(paperclip-quirks): §5 — pruned npx cache → 500/crash-loop + fix' (#99) from worktree-pc-quirks-doc into main
All checks were successful
Build & Deploy / build-and-deploy (push) Successful in 9s
2026-06-06 21:24:42 +00:00
e239915fd3 docs(paperclip-quirks): §5 — pruned npx cache → 500/crash-loop + fix
Document the failure mode hit on 06/06/26: a pruned npx cache makes the
running paperclip serve GET / → 500 (deleted ui-dist) and, on restart,
crash-loop because the server's startup assertCloudDatabaseContract()
out-races the post-exec patch loop.

Records the synchronous pre-extract+patch gate now in start-paperclip.sh
(paperclip-config c824e0f), the `--help` clean-extract trick, the three
bugs found while building the fix (ui-dist vs dist marker, set -e on patch
failure, pkill -f self-match), the manual recovery runbook, and the e2e
verification.

Invariants: docs-only; touches no G*/INV-* code paths.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
2026-06-06 21:24:12 +00:00
86f5797dbd chore(tasks): mark style-acquisition T0-T15 + #85/#87/#88 done (initiative complete)
All checks were successful
Build & Deploy / build-and-deploy (push) Successful in 25s
2026-06-06 21:03:27 +00:00
25e0662ead Merge pull request 'feat(halacha-triage UI): wire gating + near-duplicate cluster cards (#84.2)' (#98) from worktree-task84.2-ui-clustering into main
Some checks failed
Build & Deploy / build-and-deploy (push) Has been cancelled
2026-06-06 21:02:09 +00:00
6dbc9130b0 Merge pull request 'feat(#99 / T10): get_style_guide — יחסי-זהב נמדדים מהקורפוס' (#97) from worktree-style-acquisition-mvp into main
Some checks failed
Build & Deploy / build-and-deploy (push) Has been cancelled
2026-06-06 21:02:03 +00:00
e4651a9d06 feat(#99 / T10): get_style_guide — יחסי-זהב נמדדים מהקורפוס לצד היעד
style_distance.measure_corpus_ratios(): מפצל כל החלטה ב-style_corpus לסעיפים
(chunker) ומחשב ממוצע %-סעיף — אגרגט "_all" + פר-תוצאה (כשיש). cached.
get_style_guide מציג שורת "נמדד בפועל" עם ⚠️ על פער מטווח-היעד.

מצב נוכחי: style_corpus.outcome לא מאוכלס → מוצג אגרגט כל-ההחלטות (n=48:
רקע 26.4% / טענות 9.7% / דיון 43.8% / סיכום 20.1%); פיצול לפי-תוצאה future-ready.
המדידה גם מאירה מגבלות זיהוי-סעיפים (כוונת T10 — לסמן פער לבדיקה). חופף-חלקית
ל-T7 שמודד adherence per-draft; זה מודד את הקורפוס. כשל מדידה מוצג, לא נבלע.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
2026-06-06 21:01:42 +00:00
12313774a1 feat(halacha-triage UI): wire gating + near-duplicate cluster cards (#84.2)
Completes #84 — surfaces the backend gating/prioritization (#84.1/#84.3, PR
#93) in the chair's review UI and adds near-duplicate clustering (#84.2).

Backend
- db.list_halachot gains `cluster` (#84.2): annotates each row with cluster_id +
  cluster_size by unioning same-precedent halachot within HALACHA_CLUSTER_COSINE
  (0.90, new config). Display-only — never merges/deletes. Pairwise is confined
  to the returned set (cheap).
- GET /api/halachot exposes the `cluster` query param (default off).

Frontend (web-ui)
- Halacha type gains optional cluster_id / cluster_size (hand-written module; no
  api:types regen needed — halachot aren't typed off the generated schema).
- useHalachotPending(opts): the default "clean" queue now fetches
  exclude_low_quality + order_by_priority + cluster; needsFix:true returns the
  flagged 'needs extraction fix' bucket (filtered client-side).
- HalachaReviewPanel: a "תור נקי / דורש תיקון-חילוץ" toggle (#84.1); near-dup
  clusters collapse into ONE card showing "+N וריאנטים" with an expandable list,
  and approve/reject/defer on a clustered card applies to all variants via the
  batch endpoint (#84.2 + #84.4). Counts show true halacha totals (pendingTotal).
  New flag labels added (application / near_duplicate / nevo_preamble_leak).

Verified:
- backend: list_halachot(cluster=True) on the live queue — algorithm correct
  (groups related same-precedent rules at 0.78; none at the production 0.90
  because dedup #82 already removed near-dups — the desired state).
- frontend: `tsc --noEmit` exits 0 (type-clean); no new lint errors (the one
  lint error is pre-existing in training/learning-panel.tsx from #94). Local
  Turbopack build can't run on the worktree node_modules symlink — CI builds in
  a clean checkout.

Invariants: G1 (gate/cluster at source in SQL, not post-hoc); G2 (same
list_halachot path); §6 (flagged items routed to a visible bucket, not dropped).

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
2026-06-06 21:01:30 +00:00
7d97ca25a2 Merge pull request 'fix(#88+#87): סנכרון DB↔file אוטומטי + claims_coverage מבחין כתב-ערר מתכתובת' (#96) from worktree-style-acquisition-mvp into main
All checks were successful
Build & Deploy / build-and-deploy (push) Successful in 1m25s
2026-06-06 20:54:52 +00:00
a571ad535b fix(#88+#87): סנכרון DB↔file אוטומטי + claims_coverage מבחין כתב-ערר מתכתובת
#88 (DB↔file, lessons #35): drafts/decision.md דרסה את עצמה רק ב-save_block_content;
renumber_all_blocks + נתיבי store_block אחרים השאירו את הקובץ stale → QA נכשל
פעמיים על אותה בעיה (CMPA-62). תיקון: _update_draft_file הפך ל-hook אוטומטי
(מקבל decision_id, מאתר case פנימית) שנקרא מ-store_block (כל persist) ומ-
renumber_all_blocks. legal-qa ממילא קורא מ-DB → שני הצדדים זהים תמיד.

#87 (claims_coverage, 1033-25): טענות מתכתובת (claim_type='reply' — תגובה/
השלמת-טיעון) סומנו "לא נענו" כ-false-positive. תיקון: check_claims_coverage
דורש מענה רק לטענות כתב-הערר (claim_type='claim', appellant); reply/תכתובת
מוחרגות. בקבלה מלאה הסף מוקל (0.2→0.4) כי העורר זכה במלואו.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
2026-06-06 20:54:31 +00:00
c7933b9de3 Merge pull request 'chore(style-acq T11): regen API types (learning + methodology)' (#95) from worktree-style-acquisition-mvp into main
All checks were successful
Build & Deploy / build-and-deploy (push) Successful in 49s
2026-06-06 20:45:00 +00:00
afc1548bca chore(style-acq T11): regen API types (learning + methodology endpoints)
npm run api:types — מסנכרן types.ts המחולל עם ה-endpoints החדשים
(/api/learning/pairs, style-distance, promote). הקוד משתמש בטיפוסים ידניים
(learning.ts) אז זה היגיינה לעתיד, לא תלות. סוגר את T11.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
2026-06-06 20:44:41 +00:00
161d0d6ed6 Merge pull request 'fix(#85): claude_session retry על כשלים חולפים של claude -p' (#94) from worktree-style-acquisition-mvp into main
All checks were successful
Build & Deploy / build-and-deploy (push) Successful in 1m24s
2026-06-06 20:09:09 +00:00
e096c51037 fix(#85): claude_session — retry על כשלים חולפים של claude -p
שורש #85 התברר: `claude -p` נכשל מדי פעם ב-exit מהיר + stderr ריק על
פרומפטים גדולים/איטיים (CEO write_interim_draft, learning_loop distillation),
**אותו פרומפט מצליח בריצה חוזרת** — כשל חולף, לא nesting (אומת: nested claude
מ-bash וגם פרומפט 70K הצליחו; הכשל אינו דטרמיניסטי).

query() עוטף spawn+communicate ב-לולאת retry (MAX_RETRIES=3, backoff לינארי
5s*attempt). FileNotFoundError + timeout נשארים דטרמיניסטיים (ללא retry).
empty-response גם מטופל כ-transient.

אומת e2e: distillation על 1130-25 רץ בהצלחה → pair=analyzed (9 שינויים,
6 style_method, 33.8% diff). פותר גם את write_interim_draft של ה-CEO.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
2026-06-06 20:08:54 +00:00
85c5a4aacb Merge pull request 'feat(halacha-triage): quality-gated + prioritized review queue + metrics (#84)' (#93) from worktree-task84-halacha-triage into main
All checks were successful
Build & Deploy / build-and-deploy (push) Successful in 1m25s
2026-06-06 20:01:27 +00:00
420cb819f5 feat(halacha-triage): quality-gated + prioritized review queue + metrics (#84)
Backend for the halacha approval-queue triage (#84). The keyboard UI, batch
actions and defer/reject (#84.4–6) already shipped; this adds the gating,
prioritization and metrics the queue was missing.

db.list_halachot — two opt-in triage controls:
  * exclude_low_quality (#84.1): drop items carrying ANY quality_flag
    (application / quote_unverified / truncated / non_decision / thin /
    nli_unsupported / near_duplicate) — they belong in a 'needs extraction fix'
    bucket, not the chair's approve queue.
  * order_by_priority (#84.3): active-learning order — negatively-treated
    first, then most-uncertain (lowest confidence), then oldest — instead of
    FIFO, so the highest-value decisions surface first.

halachot_pending (MCP) — now gated + prioritized BY DEFAULT; include_low_quality=
true reveals the needs-fix bucket. The agent review path benefits immediately.

GET /api/halachot — same two params, default OFF (non-breaking; the UI opts in).

metrics.halacha_backlog (#84.7) — splits pending into clean vs flagged, adds
deferred, reviewed_total, approve_ratio, and a pending_by_flag breakdown, so the
backlog distinguishes real review work from extraction noise.

Deferred (documented): #84.2 near-duplicate cluster cards and wiring the UI
fetch to the new params require frontend work + an api:types regen AFTER this
deploys (the new query params aren't in prod's OpenAPI until then) — a clean
follow-up. The backend fully supports both now.

Verified against the live DB (read-only):
- pending 177 → gated-clean 110, 0 flagged items leak into the clean queue.
- priority order surfaces the lowest-confidence items first (0.55, 0.55, ...).
- backlog: pending_clean=110 / pending_flagged=67 / approve_ratio=0.916,
  pending_by_flag={nli_unsupported:59, quote_unverified:3, thin:3, truncated:2}.
- pytest tests/test_halacha_quality.py — 52 passed (no regression).

Invariants: G1 (gate at source — SQL filter, not post-hoc); G2 (no parallel
path — same list_halachot); §6 (flagged items routed to a bucket, never dropped).

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
2026-06-06 20:00:52 +00:00
32ef259843 Merge pull request 'feat(halacha): application gate + lexical dedup tail + quality harnesses (#81,#82)' (#92) from worktree-task81-82-halacha-engine into main
All checks were successful
Build & Deploy / build-and-deploy (push) Successful in 1m25s
2026-06-06 19:56:22 +00:00
1286a1e60d feat(halacha): application gate + lexical dedup tail + quality harnesses (#81,#82)
Halacha-extraction quality (#81) and dedup-on-insert (#82) — engine changes
(pure + tested) plus measurement/ops tooling.

halacha_quality.py
- #81.4 application gate: is_fact_dependent() (high-precision "applied to THIS
  case" deixis per the strict rubric §3/§27) + FLAG_APPLICATION. compute_quality_flags
  now takes rule_type and flags rule_type=='application' OR fact-dependent —
  blocking auto-approve (an illustration is not a generalizable holding).
- #82.3 lexical tail signal: jaccard_shingles / normalized_levenshtein /
  lexical_near_duplicate + FLAG_NEAR_DUPLICATE, for the 0.83–0.93 cosine band.

halacha_extractor.py — pass rule_type to the flag computation; re-type a
binding-labeled fact-application to 'application' (mirrors non_decision→obiter).

db.py (store_halachot_for_chunk) — dedup now fetches the nearest same-precedent
neighbor once: cosine ≥ DEDUP → skip (unchanged); cosine in [BAND, DEDUP) with
high lexical overlap → FLAG_NEAR_DUPLICATE (review, not skip — never drop a
possibly-distinct principle unreviewed).

config.py — HALACHA_DEDUP_BAND_COSINE (0.83).

Scripts:
- scripts/halacha_goldset.py (#81.7) — export stratified sample for human
  tagging; score validators (P/R/F1) against the tags. Backbone for #81.8.
- scripts/halacha_batch_reconcile.py (#82.7) — conservative cross-precedent
  dedup (cosine ≥0.95), dry-run report only.
- scripts/calibrate_halacha_dedup.py (#82.1) — calibrate the lexical thresholds
  against the 2026-06-03 cleanup gold-set.

Deferred (documented): #82.4 merge-provenance and #82.5 DB ON CONFLICT/UNIQUE
on normalized quote are NOT included — the current skip+flag behavior is safe,
whereas a UNIQUE on normalized_quote would fail on existing dups and a blind
merge risks losing provenance; they need their own chair-reviewed migration.
#82.6 over-merge guard is moot until merge lands. #81.6 full rhetorical-role
classifier deferred (section pre-filter + application flag cover the practical
case); #81.8 blocked on the human-tagged gold-set (harness now provided).

Verified:
- pytest tests/test_halacha_quality.py — 52 passed (14 new).
- calibrate: configured (0.55,0.70) → precision 1.0 (zero false-merge), recall
  0.30 — correct profile for an auto-approve-blocking signal.
- goldset export: 15-row sample CSV. batch reconcile: 819 halachot → 5
  cross-precedent candidate pairs.

Invariants: G1 (normalize at source — flag at insert, not at read); §6 (no
silent swallow — suspect items flagged to review, never dropped); G2 (no
parallel path — same store_halachot_for_chunk / compute_quality_flags).

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
2026-06-06 19:55:45 +00:00
366d89e6bb Merge pull request 'feat(nevo): backfill leaked preamble + ratio gold-set benchmark (#86)' (#91) from worktree-task86-nevo-backfill-benchmark into main
All checks were successful
Build & Deploy / build-and-deploy (push) Successful in 1m25s
2026-06-06 19:46:25 +00:00
fb51a0e869 feat(nevo): backfill leaked preamble + ratio gold-set benchmark (#86)
#86.2 backfill + #86.3 benchmark, plus a #86.1 over-strip fix found en route.

extractor.py
- extract_nevo_ratio(): capture Nevo's מיני-רציו block (editorial holdings
  summary) before it is stripped — a free professional gold-set (#86.3).
- _DECISION_START hardening (#86.2): the merged #86.1 regex over-stripped.
  (a) פסק-דין headers are markdown-wrapped (**פסק  דין**); the old anchor
      required the keyword as the first line char with one separator, so it
      missed the header and matched a citation 32K deep (עמ"נ 50567-07-21,
      losing 45% of the body). Now tolerates leading markdown + 0-3 seps,
      and the final-nun form (דין ן vs דינו נ).
  (b) bare השופט/הנשיא matched CITATIONS ("השופט מ' חשין, פסקה 23"). The
      authoring-judge line ends with a colon; we now require it.

ingest.py
- capture the ratio before stripping and store it on the row (best-effort,
  non-fatal); also strip the text-upload path (was file-only).

db.py
- add case_law.nevo_ratio column (additive); allow it in update_case_law.

scripts/backfill_nevo_preamble.py (#86.2) — dry-run-by-default data migration:
finds historically-leaked rulings, captures ratio→nevo_ratio, rewrites
full_text (+content_hash), reindexes, and FLAGS (never deletes) halachot whose
quote lives in the removed preamble (review_status=pending_review +
nevo_preamble_leak flag). Safety guard: rows with keep%<--min-keep (60) are
excluded from --apply as suspected over-strip. --apply writes backup+manifest
to data/audit/ first. Chair-gated — NOT applied here.

scripts/nevo_ratio_benchmark.py (#86.3) — LLM-as-judge (local claude_session,
zero cost) measures recall/precision/granularity of our halachot vs the Nevo
ratio. Works pre- and post-backfill (reads nevo_ratio, falls back to full_text).

Verified:
- pytest tests/test_nevo_preamble.py — 12 passed (incl. citation/markdown
  over-strip regressions).
- backfill dry-run: 19 leaked rulings, 27 contaminated halachot, all ≥75%
  keep (the 32K over-strip is gone).
- benchmark on בג"ץ 1764/05: recall=0.875 precision=1.0 granularity=1.75x.

Invariants: G1 (normalize at source — strip/capture at ingest, not at read);
no silent swallow (contaminated halachot flagged + reported, not dropped);
data-migration is dry-run-default with backup+manifest, chair-gated.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
2026-06-06 19:45:43 +00:00
12bdec10fa Merge pull request 'fix(claude_session): surface real CLI error + sanitize nested env (#85)' (#90) from worktree-task85-claude-session-nested into main
All checks were successful
Build & Deploy / build-and-deploy (push) Successful in 1m27s
2026-06-06 19:30:22 +00:00
8ec24cf822 fix(claude_session): surface real CLI error + sanitize nested env (#85)
write_interim_draft failed for all blocks from the CEO MCP instance with
"Claude CLI failed (exit 1): unknown error". Two fixes:

1. Error surfacing (the certain win): on non-zero exit, capture and log
   both stderr AND stdout (the CLI sometimes writes its diagnostic to
   stdout or nowhere), so the next occurrence is diagnosable instead of
   collapsing to "unknown error". This is why #85 was unsolved — the real
   error was swallowed (engineering rule §6: no silent swallow).

2. Defensive hardening: strip Claude Code session markers (CLAUDECODE,
   CLAUDE_CODE_*, CLAUDE_AGENT_*, AI_AGENT, CLAUDE_EFFORT) from the env of
   nested `claude -p` calls and run them from $HOME, decoupling them from
   the parent agent's session/project state. Aligns query() with the
   existing query_streaming() path (which already sets cwd=HOME). Auth/
   config vars are preserved.

Note: the original adapter-context failure could not be reproduced in a
plain interactive session (nested claude -p succeeds there in both old and
new code), so the env markers are a suspect, not a proven cause. The real
value is the diagnostics. Verified: nested query() returns PONG from
inside a CLAUDECODE=1 session; unit tests cover env sanitization.

Invariants: G1 (normalize at source — fix the spawn, not readers),
G2 (no parallel path — same query()), §6 (no silent error swallow).
INV: feedback_claude_session_local_only preserved (all calls stay local).

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
2026-06-06 19:29:36 +00:00
3b9f77daa8 Merge pull request 'feat(style-acq T8): analyze_corpus — הסרת LIMIT 20 (כיסוי מלא)' (#89) from worktree-style-acquisition-mvp into main
All checks were successful
Build & Deploy / build-and-deploy (push) Successful in 1m24s
2026-06-06 19:25:40 +00:00
5fa76a09b4 feat(style-acq T8): analyze_corpus — הסרת LIMIT 20 (כיסוי 48/48)
LIMIT 20 קבוע השמיט בשקט שליש מקורפוס דפנה מחילוץ author-features שהפרופיל
של הכותב (T0) נסמך עליו. עכשיו limit=0 (ברירת-מחדל) = כל הקורפוס; פרמטר
lim>0 אופציונלי לתקרה.

G11.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
2026-06-06 19:25:13 +00:00
32a6e2b57b Merge pull request 'fix(style-acq T9): מספור-אוטומטי אמיתי בייצוא DOCX' (#88) from worktree-style-acquisition-mvp into main
All checks were successful
Build & Deploy / build-and-deploy (push) Successful in 1m28s
2026-06-06 19:24:02 +00:00
3c68383e86 fix(style-acq T9): מספור-אוטומטי אמיתי בייצוא DOCX (היה ללא מספור)
באג: ה-exporter הסיר את הקידומת "N." והחיל סגנון "List Paragraph" — שאין לו
numPr בתבנית (אין numbering.xml) → ההחלטות יצאו **ללא מספור** כלל.

- docx_exporter._ensure_decision_numbering: מזריק abstractNum עשרוני (RTL,
  lvlJc=right) + num לחלק-המספור פעם אחת; _apply_list_numbering מחבר כל
  פסקת-גוף לרשימה הרציפה. מספור Word אמיתי — מתעדכן בעריכה, copy/paste נקי.
  אומת מבנית: numId יחיד, decimal, שתי פסקאות→אותו numId, docx נשמר.
- התאמת ANTI_PATTERNS (T7): הוסר manual_paragraph_numbers — "N." בתחילת-שורה
  הוא ה-signal הנדרש לייצוא, לא אנטי-דפוס. נשאר inline (1)..(2)/markdown/bullets.
- voice-fingerprint §3.1: תוקן — הכותב כן מקדים "N. " בתחילת-שורה (signal),
  הייצוא ממיר ל-auto-numbering. סתירה קודמת ("אל תקליד מספרים") יושבה.

⚠️ אימות-מבנה עבר; אימות ויזואלי ב-Word מומלץ על ייצוא ראשון. G11.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
2026-06-06 19:23:29 +00:00
37c00bac13 Merge pull request 'feat(style-acq T14): שער-יו"ר לאישור הצעות-curator → הטמעה לפרופיל' (#87) from worktree-style-acquisition-mvp into main
All checks were successful
Build & Deploy / build-and-deploy (push) Successful in 1m42s
2026-06-06 19:18:13 +00:00
f20a3a09fd feat(style-acq T14): שער-יו"ר לאישור הצעות-curator → הטמעה לפרופיל
סוגר את הלולאה מקצה-לקצה (INV-G10/LRN1): ה-curator מציע (status=analyzed),
היו"ר מאשרת, והלקחים נכתבים לערוצים שהכותב צורך (T15) — אין auto-commit.

- db.get_draft_final_pair(id) — שורת-פנקס מלאה כולל analysis.
- app.py: GET /api/learning/pairs/{id} (חושף רק changes מסוג style_method —
  INV-LRN5) + POST .../promote (לקחים→discussion_rules['universal'],
  ביטויים→transition_phrases['universal'] דרך merge ל-appeal_type_rules;
  status→lessons_folded). _append_methodology_override משותף.
- web-ui: usePairDetail/usePromoteLearning + ProposalReview (בחירת לקחים/
  ביטויים לאימוץ) בטאב "למידה" עבור pairs במצב analyzed.

INV-G10 (שער-יו"ר) · INV-LRN1 (אין auto-commit) · INV-LRN5 (טוהר).

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
2026-06-06 19:17:56 +00:00
6313fcd316 Merge pull request 'feat(style-acq T6+T13): פנקס-התאמה + מדד מרחק-סגנון ב-UI' (#86) from worktree-style-acquisition-mvp into main
All checks were successful
Build & Deploy / build-and-deploy (push) Successful in 38s
2026-06-06 19:13:32 +00:00
ee76455a9a feat(style-acq T6+T13): פנקס-התאמה + מדד מרחק-סגנון ב-UI
ה"איך מנהלים/רואים את הלמידה": טאב "למידה" ב-/training.

- app.py: GET /api/learning/pairs (פנקס-ההתאמה — כל ההחלטות + סטטוס draft↔final,
  INV-LRN4) + GET /api/learning/style-distance/{case} (מדד T7).
- web-ui: learning.ts hooks + LearningPanel (טבלת פנקס; לחיצה על תיק →
  מדד מרחק-הסגנון: שינוי draft→final, סטיית יחסי-זהב, אנטי-דפוסים) + טאב ב-/training.

מכסה גם את T6 (רשימת כל ההחלטות הנסגרות מול הסופי). ללא endpoint-schema חדש
לטיפוסים מחוללים (טיפוסים ידניים). G9, INV-LRN4.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
2026-06-06 19:13:10 +00:00
7b1c0c1a32 Merge pull request 'feat(style-acq T12): /methodology — ביטויי-מעבר + אנטי-דפוסים editable' (#85) from worktree-style-acquisition-mvp into main
All checks were successful
Build & Deploy / build-and-deploy (push) Successful in 1m25s
2026-06-06 19:09:15 +00:00
e4fbda6c1f feat(style-acq T12): /methodology — קטגוריות ביטויי-מעבר + אנטי-דפוסים
מרחיב את עורך-הפרופיל ב-/methodology עם 2 קטגוריות נוספות שהכותב (T15)
והמדד (T7) צורכים — כך שהיו"ר עורכת אותן והעריכה זורמת לכתיבה:

- app.py: _METHODOLOGY_DEFAULTS += transition_phrases (מקובץ לפי תוצאה) +
  anti_patterns (מ-lessons.ANTI_PATTERNS). דרך ה-CRUD הגנרי הקיים (appeal_type_rules).
- block_writer (T15 loop): קורא overrides גם ל-transition_phrases + anti_patterns.
- web-ui: GenericMethodologyPanel (עורך key→JSON) + 2 טאבים ב-/methodology.

voice_invariants (doc) — נדחה (לא key-value). G11, INV-LRN4.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
2026-06-06 19:08:44 +00:00
3b3e1e3bbf Merge pull request 'docs: FU-14 GAP-54 — סגירה כ-resolved-by-FU-1 (קליטת-פסיקה כבר מאוחדת)' (#84) from docs/gap54-closure into main
All checks were successful
Build & Deploy / build-and-deploy (push) Successful in 9s
2026-06-06 19:03:14 +00:00
37dcb30604 docs: FU-14 GAP-54 — סגירה כ-resolved-by-FU-1 (איחוד קליטת-פסיקה)
אימות (G2 — לא לפתור מחדש): קליטת-הפסיקה כבר מאוחדת ע"י FU-1. שני מסלולי-
הפסיקה (precedent_library + internal_decisions) עוברים דרך
ingest.ingest_document הקנוני עם ולידציית-enums + citation-guard סימטריים
(מתועד ב-01-ingest §4). המסלול ה-3 (training→style_corpus) הוא קורפוס נפרד
במכוון. מאומת ב-test_unified_ingest (9/9). אין קוד — רק תיעוד סגירה.

Invariants: מאשר INV-ING1 + G2 מקוימים. doc-only.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
2026-06-06 19:02:55 +00:00