legal-ai

Author	SHA1	Message	Date
Chaim	90f3c472b5	fix(goldset): single view-mode filter — can't get stuck hiding untagged The old independent toggles had a trap: clicking "אי-הסכמות AI" set a filter, and once all disagreements were resolved the toggle button disappeared (rendered only when count>0) while the filter stayed ON — so the list showed zero items and the untagged ones were unreachable. Replaced hideTagged + disagreeOnly with one mutually-exclusive segmented control: הכל / לא תויגו / תויגו / ⚠ אי-הסכמות, each with a live count and always visible. No stuck state; "לא תויגו" makes the remaining work obvious. Verified: tsc --noEmit 0. Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>	2026-06-07 14:47:53 +00:00
chaim	638a542cf4	Merge pull request 'feat(goldset): AI second-opinion per item (QA aid)' (#107) from worktree-goldset-ai-recommendation into main	2026-06-07 14:25:06 +00:00
Chaim	0e35060d3d	feat(goldset): AI second-opinion per item (QA aid) — compare vs human tag The chair wanted an independent recommendation beside each tag, to reconsider his own judgments. Adds a NON-ground-truth AI second-opinion: - schema: halacha_goldset.ai_is_holding / ai_correct_type / ai_rationale / ai_generated_at (additive). - db.goldset_set_ai_recommendation + goldset_list now returns the ai_* fields. - scripts/goldset_ai_recommend.py — local claude_session judges is_holding + type + a one-line rationale per item, INDEPENDENTLY (own legal rubric). Independent of the rule-based validators #81.8 measures → no circularity. Never auto-applied; QA aid only. - web-ui: each card shows "🤖 המלצת AI: הלכה/לא · type" + rationale and an agreement/disagreement chip vs the human tag (amber on disagree); a "⚠ אי-הסכמות AI (N)" filter to review only the conflicts. Methodology note kept explicit: the human stays the ground truth; the AI is a prompt to reconsider, not to copy. Verified: tsc --noEmit 0; generator stores recs and flags disagreements with existing human tags. Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>	2026-06-07 14:24:35 +00:00
chaim	a0c1b74c55	Merge pull request 'fix(goldset): score panel open by default + sparse-negatives hint' (#106) from worktree-goldset-score-open into main	2026-06-07 14:12:08 +00:00
Chaim	7e7de485a4	fix(goldset): score panel open by default + sparse-negatives hint The validator score panel was collapsed by default, so taggers thought nothing was happening. Now open by default, with a caption explaining the metrics measure "not-a-holding" detection and become meaningful as more "לא הלכה" items are tagged (showing the current negative count while it's small). Verified: tsc --noEmit 0. Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>	2026-06-07 14:11:49 +00:00
chaim	e62f39aabf	Merge pull request 'feat(goldset): separate court rulings from committee decisions' (#105) from worktree-goldset-source-split into main	2026-06-07 13:55:27 +00:00
Chaim	632fe73857	feat(goldset): separate court rulings from committee decisions in tagging Tagging is easier one source-type at a time. goldset_list now returns case_law.source_type; the page adds: - a filter (הכל / פסקי דין / ועדת ערר) with live counts, - a group-sort so even in "הכל" all court rulings come first, then all committee decisions, - a per-card source badge (פסק-דין / ועדת ערר). Verified: tsc --noEmit 0; source_type splits the live batch 58 court / 92 committee. Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>	2026-06-07 13:55:06 +00:00
chaim	f60fdc2c6d	Merge pull request 'fix(goldset): order help table to match the type buttons' (#104) from worktree-goldset-help-order into main	2026-06-07 13:45:45 +00:00
Chaim	a07622659c	fix(goldset): order rule-type help table to match the buttons TYPE_HELP popover now follows the same order as the type buttons: מחייבת · פרשני · יישום · אמרת-אגב · פרוצדורלי · משכנע. Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>	2026-06-07 13:45:30 +00:00
chaim	a1f491e9cc	Merge pull request 'feat(goldset): soft consistency warning between is_holding and type' (#103) from worktree-goldset-consistency-warn into main	2026-06-07 13:40:28 +00:00
Chaim	5aa3d4ed99	feat(goldset): soft consistency warning between is_holding and type "לא הלכה" + "מחייבת" (or any holding-type) is a logical contradiction — binding means it IS the holding. Likewise "הלכה" + application/obiter. The three controls are independent, so the combo was clickable with no signal. Adds a non-blocking amber warning under the type buttons when is_holding and correct_type contradict (holding ↔ binding/interpretive/procedural/persuasive; not-holding ↔ application/obiter). Soft by design — flags the inconsistency for the tagger to fix without forcing, leaving room for genuine edge cases. Verified: tsc --noEmit exits 0. Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>	2026-06-07 13:40:05 +00:00
chaim	b107654ee4	Merge pull request 'fix(goldset): "tagged" = all 3 answers + rule-type help popover' (#102) from worktree-goldset-tagged-fix into main	2026-06-07 13:27:19 +00:00
Chaim	27911c5beb	fix(goldset): "tagged" = all 3 answers set + add rule-type help popover Two UX fixes on the gold-set tagging page: 1. isTagged now requires is_holding AND correct_type AND quote_complete — not just is_holding. Previously, in "hide tagged" mode the card vanished the instant is_holding was clicked, so the type and quote-complete answers could never be set. The progress counter / "תויג" badge now reflect full tagging. 2. An info (ℹ) icon next to "הסוג הנכון" opens a popover explaining the six rule types (definition + the deciding test + an example each), so the tagger has the criteria in front of them while tagging. Verified: tsc --noEmit exits 0. Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>	2026-06-07 13:26:52 +00:00
chaim	1a1757f29d	Merge pull request 'feat(goldset): interactive gold-set tagging page (#81.7/#81.8)' (#101) from worktree-goldset-tagging-ui into main	2026-06-06 21:52:41 +00:00
Chaim	ac279220c4	feat(goldset): interactive gold-set tagging page (#81.7/#81.8) Replaces the CSV-edit workflow with an in-app tagging page so the chair/Dafna can label the extraction-quality gold-set by clicking, and see validator precision/recall live. Schema (V29): halacha_goldset — a stratified, human-tagged evaluation batch (is_holding / correct_type / quote_complete, NULL until tagged). db.py: - goldset_create_sample (stratified round-robin over case×rule_type, idempotent), - goldset_list (items + halacha content + the machine's own labels), - goldset_tag (partial — one field at a time for keyboard tagging), - goldset_score (ports the script's P/R/F1: each validator scored as a not-a-holding detector against the human tags — the #81.8 input). API: GET /api/goldset, POST /api/goldset/sample, GET /api/goldset/score, PATCH /api/goldset/{id}. web-ui: - lib/api/goldset.ts (hooks), - components/goldset/goldset-panel.tsx — card-per-item, keyboard-first (J/K nav, H/N holding, C/X quote), progress bar, hide-tagged toggle, and a collapsible live score table, - app/goldset/page.tsx + nav link "מדגם-זהב" under ידע ולמידה. Methodology guard kept explicit in UI + docstrings: tags are HUMAN ground truth, no AI pre-fill (circular bias). Populated a 150-item stratified batch. Verified: backend create/list/tag/score against the live DB; tsc --noEmit 0; py_compile ok. (Local Turbopack build blocked by worktree symlink — CI builds clean.) Invariants: G1 (eval set modeled at source in its own table); G2 (reuses the same halacha_quality validators the extractor runs — no parallel scoring logic). Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>	2026-06-06 21:52:05 +00:00
chaim	9bd247c421	Merge pull request 'feat(halacha): equivalent-halacha (parallel-authority) links across precedents' (#100) from worktree-equivalent-halachot into main	2026-06-06 21:30:21 +00:00
Chaim	b7b44f4453	feat(halacha): equivalent-halacha (parallel-authority) links across precedents Cross-precedent recurrence of a principle is real but is NOT citation corroboration (X11) — the 5 candidate pairs have ZERO citations between their precedents. Recording them in halacha_citation_corroboration would fabricate citation data and inflate corroboration_count. This adds a proper, separate halacha-level link for parallel authority. Schema (V28): equivalent_halachot — symmetric (halacha_a < halacha_b, CHECK + UNIQUE), non-citation, cross-precedent-only. ON DELETE CASCADE. db.py: - link_equivalent_halachot (idempotent; rejects same-id and SAME-precedent pairs — parallel authority is cross-precedent by definition), unlink, and list_equivalent_for_halacha. - list_halachot gains include_equivalents → _annotate_equivalents attaches an `equivalents` list (both directions) per row. API: include_equivalents on GET /api/halachot; GET/POST/DELETE /api/halachot/{id}/equivalents for the chair to view/link/unlink manually. scripts/halacha_batch_reconcile.py: --link records found cross-precedent pairs as equivalent_halachot (non-destructive, idempotent). web-ui: Halacha.equivalents type; the clean review queue fetches include_equivalents; the review card shows a gold "עיקרון מקביל ב-N" badge + an expandable list (case + rule + similarity) labeled "אסמכתה מקבילה — לא ציטוט". Populated the 5 reviewed pairs (chair decision: keep all + link as parallel authority). Verified: 5 rows; the 1023-20 hub annotates 3 of its halachot with equivalents; tsc --noEmit exits 0. Invariants: G1 (model recurrence at source in its own table, not by abusing the citator); G2 (no parallel path — extends list_halachot); citator integrity preserved (corroboration stays citation-only). Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>	2026-06-06 21:29:46 +00:00
chaim	ab99cfa1d3	Merge pull request 'docs(paperclip-quirks): §5: pruned npx cache → 500/crash-loop + fix' (#99) from worktree-pc-quirks-doc into main	2026-06-06 21:24:42 +00:00
Chaim	e239915fd3	docs(paperclip-quirks): §5 — pruned npx cache → 500/crash-loop + fix Document the failure mode hit on 06/06/26: a pruned npx cache makes the running paperclip serve GET / → 500 (deleted ui-dist) and, on restart, crash-loop because the server's startup assertCloudDatabaseContract() out-races the post-exec patch loop. Records the synchronous pre-extract+patch gate now in start-paperclip.sh (paperclip-config c824e0f), the `--help` clean-extract trick, the three bugs found while building the fix (ui-dist vs dist marker, set -e on patch failure, pkill -f self-match), the manual recovery runbook, and the e2e verification. Invariants: docs-only; touches no G/INV- code paths. Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>	2026-06-06 21:24:12 +00:00
Chaim	86f5797dbd	chore(tasks): mark style-acquisition T0-T15 + #85/#87/#88 done (initiative complete)	2026-06-06 21:03:27 +00:00
chaim	25e0662ead	Merge pull request 'feat(halacha-triage UI): wire gating + near-duplicate cluster cards (#84.2)' (#98) from worktree-task84.2-ui-clustering into main	2026-06-06 21:02:09 +00:00
chaim	6dbc9130b0	Merge pull request 'feat(#99 / T10): get_style_guide: golden ratios measured from the corpus' (#97) from worktree-style-acquisition-mvp into main	2026-06-06 21:02:03 +00:00
Chaim	e4651a9d06	feat(#99 / T10): get_style_guide: golden ratios measured from the corpus alongside the target style_distance.measure_corpus_ratios(): splits every decision in style_corpus into sections (chunker) and computes the mean per-section percentage: an "_all" aggregate + per-outcome (when available). cached. get_style_guide shows a "נמדד בפועל" ("actually measured") row with ⚠️ on any gap from the target range. Current state: style_corpus.outcome is not populated → the all-decisions aggregate is shown (n=48: background 26.4% / claims 9.7% / discussion 43.8% / summary 20.1%); the per-outcome split is future-ready. The measurement also exposes the limits of section detection (T10's intent: flag a gap for review). Partially overlaps T7, which measures adherence per-draft; this one measures the corpus. A measurement failure is displayed, not swallowed. Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>	2026-06-06 21:01:42 +00:00
Chaim	12313774a1	feat(halacha-triage UI): wire gating + near-duplicate cluster cards (#84.2) Completes #84 — surfaces the backend gating/prioritization (#84.1/#84.3, PR #93) in the chair's review UI and adds near-duplicate clustering (#84.2). Backend - db.list_halachot gains `cluster` (#84.2): annotates each row with cluster_id + cluster_size by unioning same-precedent halachot within HALACHA_CLUSTER_COSINE (0.90, new config). Display-only — never merges/deletes. Pairwise is confined to the returned set (cheap). - GET /api/halachot exposes the `cluster` query param (default off). Frontend (web-ui) - Halacha type gains optional cluster_id / cluster_size (hand-written module; no api:types regen needed — halachot aren't typed off the generated schema). - useHalachotPending(opts): the default "clean" queue now fetches exclude_low_quality + order_by_priority + cluster; needsFix:true returns the flagged 'needs extraction fix' bucket (filtered client-side). - HalachaReviewPanel: a "תור נקי / דורש תיקון-חילוץ" toggle (#84.1); near-dup clusters collapse into ONE card showing "+N וריאנטים" with an expandable list, and approve/reject/defer on a clustered card applies to all variants via the batch endpoint (#84.2 + #84.4). Counts show true halacha totals (pendingTotal). New flag labels added (application / near_duplicate / nevo_preamble_leak). Verified: - backend: list_halachot(cluster=True) on the live queue — algorithm correct (groups related same-precedent rules at 0.78; none at the production 0.90 because dedup #82 already removed near-dups — the desired state). - frontend: `tsc --noEmit` exits 0 (type-clean); no new lint errors (the one lint error is pre-existing in training/learning-panel.tsx from #94). Local Turbopack build can't run on the worktree node_modules symlink — CI builds in a clean checkout. Invariants: G1 (gate/cluster at source in SQL, not post-hoc); G2 (same list_halachot path); §6 (flagged items routed to a visible bucket, not dropped). 
Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>	2026-06-06 21:01:30 +00:00
chaim	7d97ca25a2	Merge pull request 'fix(#88+#87): automatic DB↔file sync + claims_coverage distinguishes the appeal brief from correspondence' (#96) from worktree-style-acquisition-mvp into main	2026-06-06 20:54:52 +00:00
Chaim	a571ad535b	fix(#88+#87): automatic DB↔file sync + claims_coverage distinguishes the appeal brief from correspondence #88 (DB↔file, lessons #35): drafts/decision.md was overwritten only in save_block_content; renumber_all_blocks and the other store_block paths left the file stale → QA failed twice on the same problem (CMPA-62). Fix: _update_draft_file became an automatic hook (takes decision_id, resolves the case internally) called from store_block (every persist) and from renumber_all_blocks. legal-qa reads from the DB anyway → the two sides are always identical. #87 (claims_coverage, 1033-25): claims from correspondence (claim_type='reply': response / supplementary argument) were marked "unanswered" as a false positive. Fix: check_claims_coverage requires a response only to the appeal brief's claims (claim_type='claim', appellant); reply/correspondence claims are excluded. On full acceptance the threshold is relaxed (0.2→0.4) because the appellant won in full. Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>	2026-06-06 20:54:31 +00:00
chaim	c7933b9de3	Merge pull request 'chore(style-acq T11): regen API types (learning + methodology)' (#95) from worktree-style-acquisition-mvp into main	2026-06-06 20:45:00 +00:00
Chaim	afc1548bca	chore(style-acq T11): regen API types (learning + methodology endpoints) npm run api:types: syncs the generated types.ts with the new endpoints (/api/learning/pairs, style-distance, promote). The code uses hand-written types (learning.ts), so this is future hygiene, not a dependency. Closes T11. Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>	2026-06-06 20:44:41 +00:00
chaim	161d0d6ed6	Merge pull request 'fix(#85): claude_session retry on transient failures of claude -p' (#94) from worktree-style-acquisition-mvp into main	2026-06-06 20:09:09 +00:00
Chaim	e096c51037	fix(#85): claude_session: retry on transient failures of claude -p The root cause of #85 is now clear: `claude -p` occasionally fails with a fast exit and empty stderr on large/slow prompts (CEO write_interim_draft, learning_loop distillation), and the same prompt succeeds on a rerun: a transient failure, not nesting (verified: nested claude from bash and a 70K prompt both succeeded; the failure is non-deterministic). query() wraps spawn+communicate in a retry loop (MAX_RETRIES=3, linear backoff 5s*attempt). FileNotFoundError and timeout remain deterministic (no retry). An empty response is also treated as transient. Verified e2e: distillation on 1130-25 ran successfully → pair=analyzed (9 changes, 6 style_method, 33.8% diff). Also fixes the CEO's write_interim_draft. Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>	2026-06-06 20:08:54 +00:00
chaim	85c5a4aacb	Merge pull request 'feat(halacha-triage): quality-gated + prioritized review queue + metrics (#84)' (#93) from worktree-task84-halacha-triage into main	2026-06-06 20:01:27 +00:00
Chaim	420cb819f5	feat(halacha-triage): quality-gated + prioritized review queue + metrics (#84 ) Backend for the halacha approval-queue triage (#84). The keyboard UI, batch actions and defer/reject (#84.4–6) already shipped; this adds the gating, prioritization and metrics the queue was missing. db.list_halachot — two opt-in triage controls: * exclude_low_quality (#84.1): drop items carrying ANY quality_flag (application / quote_unverified / truncated / non_decision / thin / nli_unsupported / near_duplicate) — they belong in a 'needs extraction fix' bucket, not the chair's approve queue. * order_by_priority (#84.3): active-learning order — negatively-treated first, then most-uncertain (lowest confidence), then oldest — instead of FIFO, so the highest-value decisions surface first. halachot_pending (MCP) — now gated + prioritized BY DEFAULT; include_low_quality= true reveals the needs-fix bucket. The agent review path benefits immediately. GET /api/halachot — same two params, default OFF (non-breaking; the UI opts in). metrics.halacha_backlog (#84.7) — splits pending into clean vs flagged, adds deferred, reviewed_total, approve_ratio, and a pending_by_flag breakdown, so the backlog distinguishes real review work from extraction noise. Deferred (documented): #84.2 near-duplicate cluster cards and wiring the UI fetch to the new params require frontend work + an api:types regen AFTER this deploys (the new query params aren't in prod's OpenAPI until then) — a clean follow-up. The backend fully supports both now. Verified against the live DB (read-only): - pending 177 → gated-clean 110, 0 flagged items leak into the clean queue. - priority order surfaces the lowest-confidence items first (0.55, 0.55, ...). - backlog: pending_clean=110 / pending_flagged=67 / approve_ratio=0.916, pending_by_flag={nli_unsupported:59, quote_unverified:3, thin:3, truncated:2}. - pytest tests/test_halacha_quality.py — 52 passed (no regression). 
Invariants: G1 (gate at source — SQL filter, not post-hoc); G2 (no parallel path — same list_halachot); §6 (flagged items routed to a bucket, never dropped). Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>	2026-06-06 20:00:52 +00:00
chaim	32ef259843	Merge pull request 'feat(halacha): application gate + lexical dedup tail + quality harnesses (#81,#82)' (#92) from worktree-task81-82-halacha-engine into main	2026-06-06 19:56:22 +00:00
Chaim	1286a1e60d	feat(halacha): application gate + lexical dedup tail + quality harnesses (#81,#82) Halacha-extraction quality (#81) and dedup-on-insert (#82) — engine changes (pure + tested) plus measurement/ops tooling. halacha_quality.py - #81.4 application gate: is_fact_dependent() (high-precision "applied to THIS case" deixis per the strict rubric §3/§27) + FLAG_APPLICATION. compute_quality_flags now takes rule_type and flags rule_type=='application' OR fact-dependent — blocking auto-approve (an illustration is not a generalizable holding). - #82.3 lexical tail signal: jaccard_shingles / normalized_levenshtein / lexical_near_duplicate + FLAG_NEAR_DUPLICATE, for the 0.83–0.93 cosine band. halacha_extractor.py — pass rule_type to the flag computation; re-type a binding-labeled fact-application to 'application' (mirrors non_decision→obiter). db.py (store_halachot_for_chunk) — dedup now fetches the nearest same-precedent neighbor once: cosine ≥ DEDUP → skip (unchanged); cosine in [BAND, DEDUP) with high lexical overlap → FLAG_NEAR_DUPLICATE (review, not skip — never drop a possibly-distinct principle unreviewed). config.py — HALACHA_DEDUP_BAND_COSINE (0.83). Scripts: - scripts/halacha_goldset.py (#81.7) — export stratified sample for human tagging; score validators (P/R/F1) against the tags. Backbone for #81.8. - scripts/halacha_batch_reconcile.py (#82.7) — conservative cross-precedent dedup (cosine ≥0.95), dry-run report only. - scripts/calibrate_halacha_dedup.py (#82.1) — calibrate the lexical thresholds against the 2026-06-03 cleanup gold-set. Deferred (documented): #82.4 merge-provenance and #82.5 DB ON CONFLICT/UNIQUE on normalized quote are NOT included — the current skip+flag behavior is safe, whereas a UNIQUE on normalized_quote would fail on existing dups and a blind merge risks losing provenance; they need their own chair-reviewed migration. #82.6 over-merge guard is moot until merge lands. 
#81.6 full rhetorical-role classifier deferred (section pre-filter + application flag cover the practical case); #81.8 blocked on the human-tagged gold-set (harness now provided). Verified: - pytest tests/test_halacha_quality.py — 52 passed (14 new). - calibrate: configured (0.55,0.70) → precision 1.0 (zero false-merge), recall 0.30 — correct profile for an auto-approve-blocking signal. - goldset export: 15-row sample CSV. batch reconcile: 819 halachot → 5 cross-precedent candidate pairs. Invariants: G1 (normalize at source — flag at insert, not at read); §6 (no silent swallow — suspect items flagged to review, never dropped); G2 (no parallel path — same store_halachot_for_chunk / compute_quality_flags). Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>	2026-06-06 19:55:45 +00:00
chaim	366d89e6bb	Merge pull request 'feat(nevo): backfill leaked preamble + ratio gold-set benchmark (#86)' (#91) from worktree-task86-nevo-backfill-benchmark into main	2026-06-06 19:46:25 +00:00
Chaim	fb51a0e869	feat(nevo): backfill leaked preamble + ratio gold-set benchmark (#86 ) #86.2 backfill + #86.3 benchmark, plus a #86.1 over-strip fix found en route. extractor.py - extract_nevo_ratio(): capture Nevo's מיני-רציו block (editorial holdings summary) before it is stripped — a free professional gold-set (#86.3). - _DECISION_START hardening (#86.2): the merged #86.1 regex over-stripped. (a) פסק-דין headers are markdown-wrapped (פסק דין); the old anchor required the keyword as the first line char with one separator, so it missed the header and matched a citation 32K deep (עמ"נ 50567-07-21, losing 45% of the body). Now tolerates leading markdown + 0-3 seps, and the final-nun form (דין ן vs דינו נ). (b) bare השופט/הנשיא matched CITATIONS ("השופט מ' חשין, פסקה 23"). The authoring-judge line ends with a colon; we now require it. ingest.py - capture the ratio before stripping and store it on the row (best-effort, non-fatal); also strip the text-upload path (was file-only). db.py - add case_law.nevo_ratio column (additive); allow it in update_case_law. scripts/backfill_nevo_preamble.py (#86.2) — dry-run-by-default data migration: finds historically-leaked rulings, captures ratio→nevo_ratio, rewrites full_text (+content_hash), reindexes, and FLAGS (never deletes) halachot whose quote lives in the removed preamble (review_status=pending_review + nevo_preamble_leak flag). Safety guard: rows with keep%<--min-keep (60) are excluded from --apply as suspected over-strip. --apply writes backup+manifest to data/audit/ first. Chair-gated — NOT applied here. scripts/nevo_ratio_benchmark.py (#86.3) — LLM-as-judge (local claude_session, zero cost) measures recall/precision/granularity of our halachot vs the Nevo ratio. Works pre- and post-backfill (reads nevo_ratio, falls back to full_text). Verified: - pytest tests/test_nevo_preamble.py — 12 passed (incl. citation/markdown over-strip regressions). 
- backfill dry-run: 19 leaked rulings, 27 contaminated halachot, all ≥75% keep (the 32K over-strip is gone). - benchmark on בג"ץ 1764/05: recall=0.875 precision=1.0 granularity=1.75x. Invariants: G1 (normalize at source — strip/capture at ingest, not at read); no silent swallow (contaminated halachot flagged + reported, not dropped); data-migration is dry-run-default with backup+manifest, chair-gated. Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>	2026-06-06 19:45:43 +00:00
chaim	12bdec10fa	Merge pull request 'fix(claude_session): surface real CLI error + sanitize nested env (#85)' (#90) from worktree-task85-claude-session-nested into main	2026-06-06 19:30:22 +00:00
Chaim	8ec24cf822	fix(claude_session): surface real CLI error + sanitize nested env (#85 ) write_interim_draft failed for all blocks from the CEO MCP instance with "Claude CLI failed (exit 1): unknown error". Two fixes: 1. Error surfacing (the certain win): on non-zero exit, capture and log both stderr AND stdout (the CLI sometimes writes its diagnostic to stdout or nowhere), so the next occurrence is diagnosable instead of collapsing to "unknown error". This is why #85 was unsolved — the real error was swallowed (engineering rule §6: no silent swallow). 2. Defensive hardening: strip Claude Code session markers (CLAUDECODE, CLAUDE_CODE_, CLAUDE_AGENT_, AI_AGENT, CLAUDE_EFFORT) from the env of nested `claude -p` calls and run them from $HOME, decoupling them from the parent agent's session/project state. Aligns query() with the existing query_streaming() path (which already sets cwd=HOME). Auth/ config vars are preserved. Note: the original adapter-context failure could not be reproduced in a plain interactive session (nested claude -p succeeds there in both old and new code), so the env markers are a suspect, not a proven cause. The real value is the diagnostics. Verified: nested query() returns PONG from inside a CLAUDECODE=1 session; unit tests cover env sanitization. Invariants: G1 (normalize at source — fix the spawn, not readers), G2 (no parallel path — same query()), §6 (no silent error swallow). INV: feedback_claude_session_local_only preserved (all calls stay local). Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>	2026-06-06 19:29:36 +00:00
chaim	3b9f77daa8	Merge pull request 'feat(style-acq T8): analyze_corpus: remove LIMIT 20 (full coverage)' (#89) from worktree-style-acquisition-mvp into main	2026-06-06 19:25:40 +00:00
Chaim	5fa76a09b4	feat(style-acq T8): analyze_corpus: remove LIMIT 20 (48/48 coverage) A fixed LIMIT 20 silently dropped a third of Dafna's corpus from the author-features extraction that the writer's profile (T0) relies on. Now limit=0 (the default) = the whole corpus; an optional lim>0 parameter sets a cap. G11. Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>	2026-06-06 19:25:13 +00:00
chaim	32a6e2b57b	Merge pull request 'fix(style-acq T9): true auto-numbering in DOCX export' (#88) from worktree-style-acquisition-mvp into main	2026-06-06 19:24:02 +00:00
Chaim	3c68383e86	fix(style-acq T9): true auto-numbering in DOCX export (previously unnumbered) Bug: the exporter stripped the "N." prefix and applied the "List Paragraph" style, which has no numPr in the template (there is no numbering.xml) → decisions came out with no numbering at all. - docx_exporter._ensure_decision_numbering: injects a decimal abstractNum (RTL, lvlJc=right) + num into the numbering part once; _apply_list_numbering attaches every body paragraph to the continuous list. Real Word numbering: updates on edit, clean copy/paste. Verified structurally: a single numId, decimal, two paragraphs → the same numId, docx saved. - ANTI_PATTERNS adjustment (T7): removed manual_paragraph_numbers; a line-initial "N." is the signal the export requires, not an anti-pattern. Inline (1)..(2)/markdown/bullets remain. - voice-fingerprint §3.1: corrected; the writer does prefix "N. " at line start (signal), and the export converts it to auto-numbering. The earlier contradiction ("do not type numbers") is resolved. ⚠️ Structural verification passed; visual verification in Word is recommended on the first export. G11. Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>	2026-06-06 19:23:29 +00:00
chaim	37c00bac13	Merge pull request 'feat(style-acq T14): chair gate for approving curator proposals → fold into the profile' (#87) from worktree-style-acquisition-mvp into main	2026-06-06 19:18:13 +00:00
Chaim	f20a3a09fd	feat(style-acq T14): chair gate for approving curator proposals → fold into the profile Closes the loop end-to-end (INV-G10/LRN1): the curator proposes (status=analyzed), the chair approves, and the lessons are written to the channels the writer consumes (T15); no auto-commit. - db.get_draft_final_pair(id): a full ledger row including analysis. - app.py: GET /api/learning/pairs/{id} (exposes only changes of type style_method, INV-LRN5) + POST .../promote (lessons→discussion_rules['universal'], phrases→transition_phrases['universal'] via a merge into appeal_type_rules; status→lessons_folded). Shared _append_methodology_override. - web-ui: usePairDetail/usePromoteLearning + ProposalReview (choosing lessons/phrases to adopt) in the "למידה" (Learning) tab for pairs in the analyzed state. INV-G10 (chair gate) · INV-LRN1 (no auto-commit) · INV-LRN5 (purity). Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>	2026-06-06 19:17:56 +00:00
chaim	6313fcd316	Merge pull request 'feat(style-acq T6+T13): reconciliation ledger + style-distance metric in the UI' (#86) from worktree-style-acquisition-mvp into main	2026-06-06 19:13:32 +00:00
Chaim	ee76455a9a	feat(style-acq T6+T13): reconciliation ledger + style-distance metric in the UI The "how do we manage/see the learning" piece: a "למידה" (Learning) tab in /training. - app.py: GET /api/learning/pairs (the reconciliation ledger: all decisions + draft↔final status, INV-LRN4) + GET /api/learning/style-distance/{case} (the T7 metric). - web-ui: learning.ts hooks + LearningPanel (ledger table; clicking a case → the style-distance metric: draft→final change, golden-ratio deviation, anti-patterns) + a tab in /training. Also covers T6 (a list of all decisions reconciled against the final). No new endpoint schema for generated types (hand-written types). G9, INV-LRN4. Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>	2026-06-06 19:13:10 +00:00
chaim	7b1c0c1a32	Merge pull request 'feat(style-acq T12): /methodology: editable transition phrases + anti-patterns' (#85) from worktree-style-acquisition-mvp into main	2026-06-06 19:09:15 +00:00
Chaim	e4fbda6c1f	feat(style-acq T12): /methodology: transition-phrase + anti-pattern categories Extends the profile editor at /methodology with 2 more categories consumed by the writer (T15) and the metric (T7), so that the chair edits them and the edits flow into the writing: - app.py: _METHODOLOGY_DEFAULTS += transition_phrases (grouped by outcome) + anti_patterns (from lessons.ANTI_PATTERNS). Via the existing generic CRUD (appeal_type_rules). - block_writer (T15 loop): also reads overrides for transition_phrases + anti_patterns. - web-ui: GenericMethodologyPanel (key→JSON editor) + 2 tabs in /methodology. voice_invariants (doc): deferred (not key-value). G11, INV-LRN4. Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>	2026-06-06 19:08:44 +00:00
chaim	3b3e1e3bbf	Merge pull request 'docs: FU-14 GAP-54: closed as resolved-by-FU-1 (case-law ingestion already unified)' (#84) from docs/gap54-closure into main	2026-06-06 19:03:14 +00:00
Chaim	37dcb30604	docs: FU-14 GAP-54: closed as resolved-by-FU-1 (unified case-law ingestion) Verification (G2: do not re-solve): case-law ingestion is already unified by FU-1. Both case-law paths (precedent_library + internal_decisions) go through the canonical ingest.ingest_document with symmetric enum validation + citation-guard (documented in 01-ingest §4). The third path (training→style_corpus) is a deliberately separate corpus. Verified by test_unified_ingest (9/9). No code; closure documentation only. Invariants: confirms INV-ING1 + G2 hold. doc-only. Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>	2026-06-06 19:02:55 +00:00

712 Commits