The spec (docs/spec/, G1–G11) was wired to the Paperclip agents via INV-AG1 but
not to the path where most code is actually written: the interactive Claude Code
session. This closes the gap before cycle 2 (FU-9..15), which is entirely
code-writing.
Three enforcement layers:
1. Documentation: CLAUDE.md §"פרוטוקול כתיבת-קוד" (code-writing protocol) +
docs/spec in the reference table
2. hook: scripts/spec-guard.sh (PreToolUse on Edit/Write/MultiEdit, registered
in .claude/settings.json) reminds once per session whenever a code file is
touched; non-blocking
3. PR: .gitea/PULL_REQUEST_TEMPLATE.md with a mandatory "Invariants" section
This is the interactive counterpart to INV-AG1, which already enforces the spec
on the agents (HEARTBEAT §"קריאת-ספ", spec reading).
Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
Closes the chair-feedback → agent-knowledge loop. Until now resolve only updated
the DB; now a click in /feedback wakes the CEO, who folds the lesson into a file
chosen by category.
- paperclip_client.py: wake_ceo_for_feedback_fold() creates an issue in Paperclip
with the lesson + routing rubric (style→SKILL.md, wrong_structure→block-schema,
other→lessons.md) and wakes the CEO. Replicates the wake_for_precedent_extraction
pattern
- db.py: get_chair_feedback(id) fetches a single note with case_number/appeal_type
- app.py: resolve endpoint accepts fold (default true); fire-and-forget
BackgroundTask; guard: only with lesson_extracted. Returns fold_queued
- legal-ceo.md: dispatch for feedback_fold_ + a "קיפול הערת יו"ר" (folding a
chair note) section with the rubric
- frontend: useResolveFeedback accepts fold; /feedback sends fold=true with a
toast; drafts-panel sends fold=false (per-case bookkeeping, no double fold)
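The routing rubric and the fold guard described above can be sketched roughly as
follows. Only the category → target mapping and the "fold only with
lesson_extracted" rule come from this message; the function names and the
assumption that lesson_extracted is an optional string are illustrative.

```python
from typing import Optional

# Category -> target file, as listed in the commit message.
# "block-schema" is kept exactly as written there (no file extension is given).
ROUTES = {
    "style": "SKILL.md",
    "wrong_structure": "block-schema",
}
DEFAULT_TARGET = "lessons.md"  # every other category


def fold_target(category: str) -> str:
    """Return the knowledge file a resolved chair note would be folded into."""
    return ROUTES.get(category, DEFAULT_TARGET)


def should_queue_fold(fold: bool, lesson_extracted: Optional[str]) -> bool:
    """Hypothetical guard: queue a fold only if requested AND a lesson exists."""
    return fold and bool(lesson_extracted)
```

Under these assumptions, /feedback (fold=true) queues a fold only for notes that
already carry a lesson, while drafts-panel (fold=false) never does.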
Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
- New /feedback page: aggregates all chair_feedback notes from all cases, with
filters for not-yet-applied/all + by category, and a "סמן כיושם" (mark as
applied) button per note
- Approvals center: the "הערות יו"ר" (chair notes) card linked to / (useless) →
now links to /feedback
- Approvals center: "תיקים שנכשלו ב-QA" (cases that failed QA) card: every case
in the sample is clickable through to the case page, and the card links directly
to the case when there is only one
- ApprovalSample.href is optional; sample items become a Link when href is set
- Navigation: added "הערות יו"ר" to the work group in app-shell
Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
ExtractedHalachotSection was read-only. Added action buttons to each halacha by
review_status: rejected → approve/restore to queue · approved → revoke
approval/reject · pending → approve/reject. Uses useUpdateHalacha, which
refreshes the detail query.
Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
- Backend: a GET lookup before the async task: if the citation already exists as
external_upload, returns 409
- DB: get_external_case_law_by_citation: lookup by citation + source_kind
- Frontend: red banner with the existing record's details and two buttons:
• "הפעל חילוץ מחדש" (re-run extraction): request-halachot for the existing ID,
then close the form
• "מחק את הרשומה" (delete the record): DELETE with confirm, then clear the
conflict
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
Adds a new "ההחלטה" (The Decision) tab to the case detail page showing all 12 decision
blocks with rendered markdown content and inline editing that saves back
to the DB via two new FastAPI endpoints.
Backend (web/app.py):
- GET /api/cases/{n}/decision-blocks — returns all 12 blocks (empty
ones included) merged from BLOCK_CONFIG + decision_blocks table.
Exposes source_of_truth ("docx"|"blocks") and active_draft_path.
- PUT /api/cases/{n}/decision-blocks/{block_id} — inline save via
block_writer.save_block_content; warns (does not block) when an
active DOCX draft exists.
Frontend:
- src/lib/api/decision-blocks.ts — typed hooks (useDecisionBlocks,
useSaveBlock) following the cases.ts hand-written-module pattern.
- src/components/cases/decision-blocks-panel.tsx — accordion of 12
blocks; view mode renders Markdown component; edit mode is a textarea
with on-blur save (derived from ChairEditor pattern, setState-during-
render for re-sync to avoid effect cascade).
- BLOCK_LABELS in feedback.ts extended from 7 → 12 blocks.
- cases/[caseNumber]/page.tsx — new "ההחלטה" tab wired to the panel.
No DB migration required — decision_blocks + active_draft_path exist.
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
strip_nevo_preamble's _DECISION_START only matched ועדת-ערר (appeals committee)
openings (בפנינו / הערר שבנדון / ...), so Nevo COURT judgments, exactly the ones
carrying a מיני-רציו (Nevo's editorial mini-ratio), slipped through unstripped.
The mini-ratio then leaked into the chunked body, polluting the corpus and
risking contamination: the halacha extractor could read Nevo's answer key.
Proven on בג"ץ 1764/05: its full_text still contained the מיני-רציו.
Fix:
- Extend _DECISION_START with court-ruling openings: פסק-דין/פסק דין header and
the authoring-judge line (השופט/ת, כב' השופט, הנשיא, המשנה לנשיא). re.search
picks the earliest line-start match → the real opinion start, not the prose
ratio above it.
- Widen the Nevo-marker detection window 400→1500 chars so a long court/parties
header doesn't push חקיקה שאוזכרה:/מיני-רציו: out of range.
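A minimal sketch of the two changes above, assuming a simplified opening pattern.
The real _DECISION_START is not shown in this message, so DECISION_START below is
an illustrative subset, and strip_preamble_sketch is a hypothetical stand-in for
strip_nevo_preamble.

```python
import re

# Illustrative subset of line-start openings (committee + court rulings).
DECISION_START = re.compile(
    r"^(?:פסק[- ]דין|השופט|כב' השופט|הנשיא|המשנה לנשיא|בפנינו|הערר שבנדון)",
    re.MULTILINE,
)
NEVO_MARKERS = ("מיני-רציו:", "חקיקה שאוזכרה:")
MARKER_WINDOW = 1500  # widened from 400 so long headers do not hide the markers


def strip_preamble_sketch(text: str) -> str:
    """Drop the Nevo editorial preamble; leave non-Nevo text untouched."""
    if not any(marker in text[:MARKER_WINDOW] for marker in NEVO_MARKERS):
        return text
    # re.search returns the leftmost match, i.e. the earliest line-start opening.
    match = DECISION_START.search(text)
    return text[match.start():] if match else text
```

Because the pattern is anchored with re.MULTILINE, an opening word that appears
mid-line inside the mini-ratio prose is not treated as the start of the opinion.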
Verified on the real 1764/05 full_text: strips 2702 chars, body now starts at
'השופט ס' ג'ובראן:', מיני-רציו gone. Regression: ועדת-ערר openings still strip;
non-Nevo text untouched; markers-past-400 now detected. Suite 182 passed (6 new).
This is the anti-contamination prerequisite for the Nevo-ratio gold-set (#86.3/#81.7).
Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
After a precedent finishes extracting, a claude_session pass folds facets of the
SAME legal question (below #82's dedup cosine — the שפר 14-vs-4 / 403-17→89
granularity gap) into one canonical; the rest are marked 'rejected' (reversible:
out of the active corpus AND the review queue, but recoverable). FOLD-ONLY —
never merges distinct legal questions, never invents.
- Engine: claude_session-as-judge (local CLI, zero cost), 'high' effort — folding
needs careful judgment. One pass per precedent, runs in _extract_impl once all
chunks are done (the prompt dedups within a chunk; this catches across chunks).
- Pure, unit-tested helpers in halacha_quality: CONSOLIDATE_SYSTEM,
build_consolidation_prompt, parse_fold_groups (fails SAFE → [] on any malformed
shape; drops <2-member groups; coerces/dedups indices).
- halacha_extractor._consolidate_precedent picks the canonical per group
(approved>pending, higher confidence, quote_verified, longer) and rejects the
rest via the existing update_halachot_batch (#84). Never rejects a canonical.
Fails OPEN on any error (no CLI / parse fail → 0 folds, data untouched).
- config: HALACHA_CONSOLIDATE_ENABLED/MODEL/EFFORT.
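The fail-safe parsing behaviour listed above could look roughly like this. The
judge's output format is not given in this message, so the JSON shape
{"groups": [[...], ...]} and the name parse_fold_groups_sketch are assumptions;
only the three rules (malformed → [], drop groups with fewer than 2 members,
coerce and dedup indices) come from the text.

```python
import json


def parse_fold_groups_sketch(raw: str, n_items: int) -> list[list[int]]:
    """Parse fold groups; return [] (no folds) on any malformed shape."""
    try:
        groups = json.loads(raw)["groups"]
        if not isinstance(groups, list):
            return []
        result = []
        for group in groups:
            if not isinstance(group, list):
                return []
            members: list[int] = []
            for value in group:
                idx = int(value)  # coerce e.g. "2" -> 2; bad values raise
                if 0 <= idx < n_items and idx not in members:
                    members.append(idx)  # dedup while keeping order
            if len(members) >= 2:  # a group of one folds nothing
                result.append(members)
        return result
    except (ValueError, TypeError, KeyError):
        return []
```

Returning an empty list on every error path means a confused judge can only
produce zero folds, never a wrong one.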
Verified: suite 176 passed (10 new); integration vs dev DB — a 2-facet group
folds to 1 canonical + 1 rejected (tagged), distinct rules untouched, claude
error → 0 folds (fail-open).
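The canonical-selection order (approved>pending, higher confidence,
quote_verified, longer) can be expressed as a single sort key. This is a sketch
only: the Halacha dataclass is hypothetical, and "longer" is assumed to mean the
length of rule_statement, which the message does not specify.

```python
from dataclasses import dataclass


@dataclass
class Halacha:  # hypothetical minimal record for illustration
    id: str
    review_status: str
    confidence: float
    quote_verified: bool
    rule_statement: str


def pick_canonical(group: list[Halacha]) -> Halacha:
    """Pick the member to keep, comparing criteria in priority order."""
    return max(
        group,
        key=lambda h: (
            h.review_status == "approved",  # approved beats anything else
            h.confidence,
            h.quote_verified,
            len(h.rule_statement),  # assumption: "longer" = longer rule text
        ),
    )


def fold_group(group: list[Halacha]) -> tuple[str, list[str]]:
    """Return (canonical id, ids to mark rejected); never rejects the canonical."""
    canonical = pick_canonical(group)
    return canonical.id, [h.id for h in group if h.id != canonical.id]
```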
Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
#81.3 — a post-extraction validator that flags halachot whose rule_statement is
NOT entailed by its supporting_quote (the model over-reaching beyond its source).
- Engine: claude_session-as-judge (local CLI, zero API cost) per chaim's standing
preference — one batched judge call per chunk, NOT a hosted NLI model.
- Pure, unit-tested helpers in halacha_quality: NLI_SYSTEM, build_nli_prompt,
parse_nli_verdicts (fails OPEN — any shape/label ambiguity → 'entailed').
- halacha_extractor._nli_check wraps the call; fails OPEN on any error (e.g. no
CLI in the container) so a flaky judge never blocks a genuine halacha.
- Non-entailed (neutral/contradiction) → quality_flag 'nli_unsupported' which
blocks auto-approve (routes to pending_review) via the existing store gate.
- config: HALACHA_NLI_ENABLED/MODEL/EFFORT (effort 'low' — entailment is simple).
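A sketch of the fail-open parsing and the resulting flag, assuming the judge
returns a JSON array of labels, one per halacha in the chunk. That output shape
and the function names are assumptions; the labels and the 'nli_unsupported' flag
come from this message.

```python
import json

VALID_LABELS = {"entailed", "neutral", "contradiction"}


def parse_nli_verdicts_sketch(raw: str, expected: int) -> list[str]:
    """Fail OPEN: anything unparseable or ambiguous counts as 'entailed'."""
    fallback = ["entailed"] * expected
    try:
        data = json.loads(raw)
    except (ValueError, TypeError):
        return fallback
    if not isinstance(data, list) or len(data) != expected:
        return fallback
    return [
        v if isinstance(v, str) and v in VALID_LABELS else "entailed"
        for v in data
    ]


def flags_for(verdict: str) -> list[str]:
    """Non-entailed verdicts get the flag that blocks auto-approve."""
    return [] if verdict == "entailed" else ["nli_unsupported"]
```

Failing open here trades recall of the validator for safety of the pipeline: a
broken judge lets items through unflagged rather than holding back genuine ones.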
Verified: suite 166 passed (10 new); LIVE smoke test against the real claude CLI
returned ['entailed','neutral'] for a supported vs unsupported rule.
Also commits TaskMaster #86 (Nevo preamble/ratio: anti-contamination strip fix +
gold-set benchmark) capturing today's strip_nevo_preamble findings.
Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
Legacy Hebrew .doc precedents (e.g. nevo.co.il CP1255 OLE2) can now be
uploaded directly through the precedent-library, missing-precedent, and
training upload paths — the frontend already advertised .doc but the
backend gate rejected it before reaching the extractor.
- web/app.py: add .doc to ALLOWED_EXTENSIONS (covers all paths that share
the set: precedent library, missing-precedent, training).
- Dockerfile: install libreoffice-writer-nogui (no X11/Java) so the
extractor's existing _extract_doc LibreOffice conversion works in the
Coolify container (was missing → would fail at runtime).
- extractor.py: isolate the LibreOffice user profile per call to avoid a
profile-lock failure on concurrent .doc conversions.
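Per-call profile isolation can be sketched as below, using LibreOffice's
-env:UserInstallation bootstrap parameter with a throwaway directory. The
function names, the "soffice" binary name on PATH, and the timeout are
assumptions; the actual _extract_doc code is not shown in this message.

```python
import subprocess
import tempfile
from pathlib import Path


def build_convert_command(src: Path, out_dir: Path, profile_dir: Path) -> list[str]:
    """Build a headless .doc -> .docx command bound to its own user profile."""
    return [
        "soffice",  # assumption: LibreOffice binary is on PATH under this name
        f"-env:UserInstallation={profile_dir.as_uri()}",
        "--headless",
        "--convert-to", "docx",
        "--outdir", str(out_dir),
        str(src),
    ]


def convert_doc_to_docx(src: Path, out_dir: Path, timeout: int = 120) -> Path:
    """Convert one file; the temp profile is deleted when the call finishes."""
    with tempfile.TemporaryDirectory(prefix="lo-profile-") as profile:
        cmd = build_convert_command(src, out_dir, Path(profile))
        subprocess.run(cmd, check=True, timeout=timeout, capture_output=True)
    return out_dir / f"{src.stem}.docx"
```

Since each call gets a fresh profile directory, two concurrent conversions never
contend for the same profile lock.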
Verified in python:3.12-slim (prod base): .doc→.docx→text yields text
byte-identical to a native Word .docx save (103 paragraphs, 24,341 chars).
Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
Make the chair's pending-halacha review faster and less exhausting.
Backend:
- New 'deferred' review_status (snooze): stays out of the active library AND
out of the default pending queue, without the finality of 'rejected'.
update_halacha stamps reviewer+reviewed_at on defer; HALACHA_REVIEW_STATUSES
is the single source of valid statuses (PATCH validation now uses it).
- db.update_halachot_batch(ids, status, reviewer) — one atomic UPDATE for a
whole group; invalid status / empty ids are a no-op.
- POST /api/halachot/batch (HalachaBatchReviewRequest) wraps it.
- update_halacha now RETURNs quality_flags too (parity with list_halachot).
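The batch update's no-op rules can be illustrated with a small stand-in. The real
function runs against Postgres via asyncpg; sqlite3 is swapped in here only to
keep the sketch self-contained, and the column names and the exact contents of
HALACHA_REVIEW_STATUSES are assumptions.

```python
import sqlite3

# Assumed contents; the message only names 'deferred' and 'rejected' explicitly.
HALACHA_REVIEW_STATUSES = {"pending_review", "approved", "rejected", "deferred"}


def update_halachot_batch_sketch(conn, ids, status, reviewer) -> int:
    """One UPDATE for the whole group; invalid status / empty ids change nothing."""
    if not ids or status not in HALACHA_REVIEW_STATUSES:
        return 0
    placeholders = ",".join("?" for _ in ids)
    with conn:  # single transaction: all rows change or none do
        cur = conn.execute(
            "UPDATE halachot SET review_status = ?, reviewer = ?, "
            "reviewed_at = CURRENT_TIMESTAMP "
            f"WHERE id IN ({placeholders})",
            [status, reviewer, *ids],
        )
    return cur.rowcount
```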
Frontend (halacha-review-panel):
- Quality-flag badges (#81: non_decision / truncated_quote / thin_restatement /
quote_unverified) so the chair sees WHY an item was held back.
- Defer action — button + keyboard 'D' — to snooze without rejecting (fixes the
'leave in pending forever' anti-pattern; reject stays the junk verb).
- Per-precedent batch bar: 'אשר הכל' / 'דחה הכל' via useBatchReviewHalachot
(one request, one refetch) with confirm guards.
- Halacha/HalachaPatch types gain quality_flags + 'deferred'.
Verified: mcp-server suite 156 passed; web build green; end-to-end integration
against dev DB (batch approve/reject, defer sets status+timestamp, pending
excludes approved+deferred, deferred queryable, invalid status no-op).
Note: api:types regen deferred until deploy (the batch hook is hand-typed, not
dependent on generated types).
Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
#83 pipeline robustness — the index-numbering correctness guarantee:
- Add CREATE UNIQUE INDEX idx_halachot_unique_index ON halachot(case_law_id,
halacha_index). The extractor assigns the index as MAX+1 under an in-process
store-lock + a cross-process pg advisory lock, so collisions shouldn't occur
in normal operation — but per the research (FireHydrant/OneUptime) the
constraint is the actual correctness guarantee while the lock is the
optimization. A racing/double run now fails LOUDLY (UniqueViolation, chunk
left un-checkpointed → clean resume) instead of silently appending the
duplicates that were the 2026-05/06 over-extraction root cause.
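The "constraint is the guarantee, lock is the optimization" point can be shown in
miniature. sqlite3 stands in for Postgres here (so the error is IntegrityError
rather than UniqueViolation), the rule column and helper names are hypothetical,
and no locking is modelled, which is exactly the failure case being illustrated.

```python
import sqlite3


def make_db() -> sqlite3.Connection:
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE halachot (case_law_id TEXT, halacha_index INTEGER, rule TEXT)"
    )
    conn.execute(
        "CREATE UNIQUE INDEX idx_halachot_unique_index "
        "ON halachot(case_law_id, halacha_index)"
    )
    return conn


def next_index(conn: sqlite3.Connection, case_law_id: str) -> int:
    """MAX+1 numbering; racy by itself if two writers read before either inserts."""
    row = conn.execute(
        "SELECT COALESCE(MAX(halacha_index), 0) + 1 FROM halachot "
        "WHERE case_law_id = ?",
        (case_law_id,),
    ).fetchone()
    return row[0]


def insert_halacha(conn, case_law_id: str, index: int, rule: str) -> None:
    conn.execute(
        "INSERT INTO halachot VALUES (?, ?, ?)", (case_law_id, index, rule)
    )
```

If two runs both compute the same MAX+1 before inserting, the second insert
raises instead of silently storing a duplicate index.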
Data prep (run against the live DB before the constraint, backed up to
data/audit/halacha-reindex-backup-*.sql): the 6 precedents that still carried
colliding halacha_index values (9 groups, distinct principles that shared a
number — NOT content dups) were renumbered to unique sequential indices.
Verified: advisory lock holds cross-process and the DB path is direct asyncpg
(no transaction-pooler), so the session lock is safe (83.1); force=True does
delete+checkpoint-clear in one transaction (83.5); constraint rejects a
duplicate-index insert (integration-checked). Full suite 156 passed.
Also commits the TaskMaster tracking for the whole halacha-quality initiative
(#81-#84 + research-backed subtasks, statuses).
Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
Bake the 2026-06-03 strict-cleanup rubric into the extraction pipeline so the
corpus stays clean at the source instead of accumulating duplicates, obiter
dicta, truncated quotes and thin restatements that clog the review queue.
#81 — quality gate:
- New pure module halacha_quality.py with unit-tested validators:
non-decision/obiter (Wambaugh markers), truncated-quote (mid-word cut),
thin-restatement (rule≈quote), quote-unverified.
- Validators run in halacha_extractor._process; a non-decision is re-typed
obiter; flags persist in new halachot.quality_flags column.
- Auto-approve now requires confidence>=threshold AND no quality flags;
flagged items route to pending_review regardless of confidence.
- Both extraction prompts hardened: reject undecided dicta, exclude
case-specific applications, require abstraction, forbid over-splitting.
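The auto-approve rule above reduces to one condition. The function name and the
0.8 default threshold are hypothetical placeholders; the message does not state
the configured threshold value.

```python
def initial_review_status(
    confidence: float, quality_flags: list[str], threshold: float = 0.8
) -> str:
    """Auto-approve only when confident AND unflagged; otherwise queue for review."""
    if confidence >= threshold and not quality_flags:
        return "approved"
    return "pending_review"
```

A high-confidence item with any flag, for example ["truncated_quote"], still
lands in pending_review.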
#82 — dedup-on-insert (store_halachot_for_chunk):
- Within the same precedent, skip a halacha whose normalized supporting_quote
already exists, or whose rule-embedding has cosine>=HALACHA_DEDUP_COSINE
(0.93) against an already-stored one. Makes re-runs idempotent.
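A rough sketch of the dedup-on-insert check. The 0.93 cutoff is from this
message; the whitespace-only normalization and the in-memory list of stored
(quote, embedding) pairs are assumptions made for illustration.

```python
import math
import re

HALACHA_DEDUP_COSINE = 0.93


def normalize_quote(quote: str) -> str:
    """Assumed normalization: collapse whitespace runs and trim."""
    return re.sub(r"\s+", " ", quote).strip()


def cosine(a: list[float], b: list[float]) -> float:
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


def is_duplicate(quote: str, embedding: list[float], stored) -> bool:
    """stored: (quote, rule_embedding) pairs already saved for the SAME precedent."""
    key = normalize_quote(quote)
    for other_quote, other_embedding in stored:
        if normalize_quote(other_quote) == key:
            return True  # same supporting quote
        if cosine(embedding, other_embedding) >= HALACHA_DEDUP_COSINE:
            return True  # near-identical rule
    return False
```

Running the same extraction twice then inserts nothing the second time, since
every candidate matches its own earlier copy.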
Migration: halachot.quality_flags TEXT[] (additive, idempotent ALTER).
Tests: 19 new unit tests; full suite 156 passed. Validated end-to-end against
dev DB (dedup skips dups, flag blocks auto-approve, re-run inserts 0).
Calibration: flags fire on only ~10% of current survivors (low false-positive).
Spec: docs/halacha-strict-rubric.md
Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
Full check found the premise wrong on every count (like #71/#70):
- Not 140 docs/17,700 pages/2hr/$$ needing Dafna+chaim. Of 140 image-less
docs, only 65 are PDF (rest MD/DOCX — pipeline renders PDF only) = 704 pages.
- The value docs (appraisal, where multimodal's table/image worth is) were
already 8/12 embedded. The only gap was ONE case, 8070-25 (4 appraisal docs).
- Backfilled 8070-25 locally (voyage-multimodal-3, ~30s, cents): all 14 docs
embedded. Appraisal coverage now 12/12 (100%).
- Remaining 51 PDFs/649 pages are all text-dense (reference/response/appeal);
#15 proved multimodal does NOT help text-dense docs, so they're intentionally
left text-only. Not an inconsistency — the correct config.
No gold-set / Dafna labeling / chaim cost approval needed — cost was cents and
value was already proven in #15. #80 done (technical, not human-gated).
Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
The 4 'ambiguous' citation items flagged for chair turned out to be dead
orphan stubs: 0 inbound/outbound edges across all 5 citation mechanisms,
0 full_text, 0 halachot, 0 chunks/embeddings. A corpus-wide check found 15
such orphans total (incl. clean-looking ones). Per OpenCitations (keep an
id-less entity only if it is CITED — these are cited by nothing), these are
pure noise → deleted, not chair-judgment.
- 15 orphan cited_only stubs deleted (cited_only 46 -> 31); backup in
data/audit/fu2b-orphan-stub-cleanup-*.json.
- 0 malformed / 0 orphans remain; all 31 remaining stubs are cited.
- Combines with the 3 earlier mechanical normalizations. #70 fully done.
- Known forward-edge (no current data, no task): '+' combined-citation
handling in citation_extractor if it recurs in future extraction.
Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
A/B eval (eval_retrieval.py, 86-query gold-set) showed the 0.5 default was
mis-tuned: the image side was too heavy and dragged precedent_library recall
0.971 -> 0.885. Sweep 0.5..0.75 — at 0.65 multimodal beats text-only on every
overall metric AND every corpus (R@5 0.994 vs 0.989, nDCG@5 0.960 vs 0.944,
MRR 0.954 vs 0.936). Dafna approved.
- MULTIMODAL_TEXT_WEIGHT=0.65 set in Coolify (legal-ai, runtime) + redeploy.
- baseline.json updated to the 0.65 config (future regression reference).
- #15 done (premise was stale — multimodal already default on 110 docs; the
win was tuning the weight, not the backfill).
- #80 opened: the costly 140-doc legacy backfill is deferred until a targeted
image-answer gold-set proves the table/image value prop (untested here).
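How a text weight can shift rankings, as a sketch only: this message does not say
how MULTIMODAL_TEXT_WEIGHT is applied, so the linear blend below
(weight * text + (1 - weight) * image) and both function names are assumptions.

```python
MULTIMODAL_TEXT_WEIGHT = 0.65


def fused_score(
    text_sim: float, image_sim: float, text_weight: float = MULTIMODAL_TEXT_WEIGHT
) -> float:
    """Assumed linear blend of text-side and image-side similarity."""
    if not 0.0 <= text_weight <= 1.0:
        raise ValueError("text_weight must be in [0, 1]")
    return text_weight * text_sim + (1.0 - text_weight) * image_sim


def rank(candidates: dict, text_weight: float = MULTIMODAL_TEXT_WEIGHT) -> list:
    """candidates maps doc id -> (text_sim, image_sim); best first."""
    return sorted(
        candidates,
        key=lambda doc: fused_score(*candidates[doc], text_weight),
        reverse=True,
    )
```

With candidates {"a": (0.9, 0.1), "b": (0.6, 0.6)}, a weight of 0.5 ranks "b"
first, while 0.65 ranks the text-strong "a" first, which is the kind of shift a
too-heavy image side would suppress.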
Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
A section that opens with a short header line ('דיון', 'טענות המשיבים')
followed by a paragraph larger than chunk_size flushed the header alone as a
tiny chunk. #55 added a query-time >=50 filter to hide these; this removes
them at the source.
_split_section: (1) don't flush a buffer still below MIN_CHUNK_CHARS — let it
absorb the next paragraph even if that overflows chunk_size, so a short header
rides with its following content; (2) fold a trailing tiny chunk back into its
predecessor.
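The two rules can be sketched on a simplified paragraph splitter. The real
_split_section is not shown here; MIN_CHUNK_CHARS = 50 is assumed from the
query-time >=50 filter, and joining paragraphs with a newline is an assumption.

```python
MIN_CHUNK_CHARS = 50  # assumed to match the query-time filter


def split_section_sketch(paragraphs: list[str], chunk_size: int) -> list[str]:
    chunks: list[str] = []
    buffer = ""
    for para in paragraphs:
        candidate = f"{buffer}\n{para}" if buffer else para
        # Rule 1: flush only a buffer that is already big enough on its own;
        # a short header absorbs the next paragraph even past chunk_size.
        if buffer and len(candidate) > chunk_size and len(buffer) >= MIN_CHUNK_CHARS:
            chunks.append(buffer)
            buffer = para
        else:
            buffer = candidate
    if buffer:
        # Rule 2: fold a trailing tiny chunk back into its predecessor.
        if chunks and len(buffer) < MIN_CHUNK_CHARS:
            chunks[-1] = f"{chunks[-1]}\n{buffer}"
        else:
            chunks.append(buffer)
    return chunks
```

The cost is that some chunks exceed chunk_size; the sketch accepts that in
exchange for never emitting a header-only chunk.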
Verified: re-chunked the 4 corpus docs that still had a tiny chunk
(ע"א 5138/04, בר"מ 2340/02, בג"ץ 6525/15, 403-17) — corpus-wide chunks<50
went 4 -> 0; all 4 stay embedded/searchable and rank top in a relevant search
(נווה שלום #1 for the s.19(ג)(1) exemption query). No regression.
Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
Discovered closing #57: the current chunker still emits 4 tiny chunks that
are standalone section headings ('דיון', 'טענות המשיבים', ...). Low priority
— filtered at query time, search unaffected. Proposed fix: anchor a short
isolated heading forward into the following section.
Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
Adds scripts/rechunk_legacy_precedents.py: selects every case_law with a tiny
chunk (content<50 — the pre-fix chunker fingerprint) and runs
ingest.reindex_case_law (re-chunk+re-embed from stored full_text only, no
re-OCR/LLM, idempotent). Batch-idempotent (re-queries the affected set).
Run result (2026-06-03): 73 precedents reindexed, 0 failed. Tiny chunks
483 -> 4 (99.2%); total precedent_chunks 5019 -> 3115 (fragments merged).
Search verified healthy (substantial coherent passages, no errors).
The 4 residual tiny chunks are isolated section headings ('דיון',
'טענות המשיבים', ...) emitted by the CURRENT (fixed) chunker — not legacy
fragments — and are already filtered at query time (>=50, #55). Minor
chunker edge case, candidate #55 follow-up.
The DB chunk migration is already applied to prod; this commit is the script
+ SCRIPTS.md entry only (no app code change, no deploy needed).
Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
#78 (committee-upload wakeup) + #77 (case_number identity) shipped.
#76 (Paperclip create-task button): root-caused to the ee=companyId guard. The
button is enabled once a title is entered, but submit also requires a company;
not safely patchable via injection. Deferred with workaround + upstream note.
Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
Pairs with the backend PR. Stops the citation (מראה-מקום) from being stored
as the identifier, and lets a wrong identifier be corrected after the fact.
- upload sheet: new required 'מספר תיק (מזהה ייחודי)' field for committee
decisions → sent as case_number; the citation field is now sent as the
separate citation (→ citation_formatted) instead of as case_number.
- edit sheet: the case_number block is now an editable input (was read-only).
Halachot/chunks key off case_law_id (UUID), so renaming case_number is safe.
- precedent-library.ts: InternalDecisionUploadInput += citation; PrecedentPatch
+= case_number.
- types.ts: regenerated (api:types) — PrecedentUpdateRequest now carries
case_number.
Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
Two identity fixes for the precedent corpus:
1. PrecedentUpdateRequest += case_number — the canonical identifier was not
in the edit model, so a wrong id captured at upload (e.g. the full
citation pasted into the field) could not be corrected. update_case_law
already whitelists case_number.
2. /api/internal-decisions/upload += citation form field — case_number is
now the clean identifier (e.g. 8027-25) and citation is the full
מראה-מקום, stored as citation_formatted up-front (previously the UI sent
the citation AS case_number, leaving the id polluted and citation_formatted
empty until extraction). Stored via a post-ingest update_case_law, not the
core INSERT.
Frontend (separate case_number field in the upload + edit sheets) follows in
a second PR after api:types regen.
Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
The /api/internal-decisions/upload path (used by the UI for ועדת-ערר
decisions) never called pc_wake_for_precedent_extraction, so committee
decisions were stuck at halacha_extraction_status='pending' forever — the
CEO was never woken to drain the queue. Root cause behind 8027-25's stuck
extraction. The other two upload paths (precedent_library, missing-precedent)
already wake the CEO; this one was missing it.
- internal-decisions upload: add the wakeup, routing the company by case
number prefix (1xxx→רישוי, 8xxx→היטל, 9xxx→פיצויים) when practice_area is
empty (else an 8xxx case wrongly routes to the licensing CEO).
- all three call sites: the wake helper returns {ok:False} WITHOUT raising
on a skipped/failed wakeup; that was silently dropped. Now logged at
WARNING with the reason, and the upload progress carries extraction_queued.
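The prefix routing and the "log instead of drop" handling can be sketched as
follows. The prefix table is from this message; the function names, the logger
name, and the "reason" key in the helper's result are hypothetical.

```python
import logging
from typing import Optional

logger = logging.getLogger("upload")  # hypothetical logger name

# First digit of the case number -> practice area, per the commit message.
PREFIX_TO_AREA = {"1": "רישוי", "8": "היטל", "9": "פיצויים"}


def resolve_practice_area(
    case_number: str, practice_area: Optional[str] = None
) -> Optional[str]:
    """An explicit practice_area wins; otherwise route by case-number prefix."""
    if practice_area:
        return practice_area
    return PREFIX_TO_AREA.get(case_number.strip()[:1])


def record_wake_result(result: dict, case_number: str) -> bool:
    """Return extraction_queued; a skipped/failed wakeup is logged, not dropped."""
    if result.get("ok"):
        return True
    logger.warning(
        "extraction wakeup not queued for %s: %s",
        case_number,
        result.get("reason", "unknown"),  # 'reason' key is an assumption
    )
    return False
```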
Fallback drainer (scheduled precedent_process_pending) deferred — the
missing wakeup was the actual failure; manual precedent_process_pending
remains the recovery path.
Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
#34 don't manufacture doubt about unambiguous statutes (s.19(ג)(2));
#35 writer/QA two-sources-of-truth sync gap (DB vs drafts/decision.md).
Output of the weekly-feedback-analysis job, pending commit.
Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>