Commit Graph

29 Commits

Author SHA1 Message Date
f8c3fd6c89 fix(nevo): strip preamble/mini-ratio from court rulings too (#86.1)
strip_nevo_preamble's _DECISION_START only matched ועדת-ערר openings (בפנינו /
הערר שבנדון / ...), so Nevo COURT judgments — exactly the ones carrying a
מיני-רציו — slipped through unstripped. The editorial mini-ratio then leaked into
the chunked body, risking that the halacha extractor reads Nevo's answer key
(contamination) and polluting the corpus. Proven on בג"ץ 1764/05: its full_text
still contained the מיני-רציו (unstripped).

Fix:
- Extend _DECISION_START with court-ruling openings: פסק-דין/פסק דין header and
  the authoring-judge line (השופט/ת, כב' השופט, הנשיא, המשנה לנשיא). re.search
  picks the earliest line-start match → the real opinion start, not the prose
  ratio above it.
- Widen the Nevo-marker detection window 400→1500 chars so a long court/parties
  header doesn't push חקיקה שאוזכרה:/מיני-רציו: out of range.

Verified on the real 1764/05 full_text: strips 2702 chars, body now starts at
'השופט ס' ג'ובראן:', מיני-רציו gone. Regression: ועדת-ערר openings still strip;
non-Nevo text untouched; markers-past-400 now detected. Suite 182 passed (6 new).

This is the anti-contamination prerequisite for the Nevo-ratio gold-set (#86.3/#81.7).

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
2026-06-03 16:55:31 +00:00
fb60dca796 feat(halacha): over-extraction consolidation — fold facets via claude_session (#81.5)
After a precedent finishes extracting, a claude_session pass folds facets of the
SAME legal question (below #82's dedup cosine — the שפר 14-vs-4 / 403-17→89
granularity gap) into one canonical; the rest are marked 'rejected' (reversible:
out of the active corpus AND the review queue, but recoverable). FOLD-ONLY —
never merges distinct legal questions, never invents.

- Engine: claude_session-as-judge (local CLI, zero cost), 'high' effort — folding
  needs careful judgment. One pass per precedent, runs in _extract_impl once all
  chunks are done (the prompt dedups within a chunk; this catches across chunks).
- Pure, unit-tested helpers in halacha_quality: CONSOLIDATE_SYSTEM,
  build_consolidation_prompt, parse_fold_groups (fails SAFE → [] on any malformed
  shape; drops <2-member groups; coerces/dedups indices).
- halacha_extractor._consolidate_precedent picks the canonical per group
  (approved>pending, higher confidence, quote_verified, longer) and rejects the
  rest via the existing update_halachot_batch (#84). Never rejects a canonical.
  Fails OPEN on any error (no CLI / parse fail → 0 folds, data untouched).
- config: HALACHA_CONSOLIDATE_ENABLED/MODEL/EFFORT.

Verified: suite 176 passed (10 new); integration vs dev DB — a 2-facet group
folds to 1 canonical + 1 rejected (tagged), distinct rules untouched, claude
error → 0 folds (fail-open).

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
2026-06-03 16:26:44 +00:00
f196bed564 feat(halacha): NLI entailment validator via claude_session (#81.3) + task #86
#81.3 — a post-extraction validator that flags halachot whose rule_statement is
NOT entailed by its supporting_quote (the model over-reaching beyond its source).

- Engine: claude_session-as-judge (local CLI, zero API cost) per chaim's standing
  preference — one batched judge call per chunk, NOT a hosted NLI model.
- Pure, unit-tested helpers in halacha_quality: NLI_SYSTEM, build_nli_prompt,
  parse_nli_verdicts (fails OPEN — any shape/label ambiguity → 'entailed').
- halacha_extractor._nli_check wraps the call; fails OPEN on any error (e.g. no
  CLI in the container) so a flaky judge never blocks a genuine halacha.
- Non-entailed (neutral/contradiction) → quality_flag 'nli_unsupported' which
  blocks auto-approve (routes to pending_review) via the existing store gate.
- config: HALACHA_NLI_ENABLED/MODEL/EFFORT (effort 'low' — entailment is simple).

Verified: suite 166 passed (10 new); LIVE smoke test against the real claude CLI
returned ['entailed','neutral'] for a supported vs unsupported rule.

Also commits TaskMaster #86 (Nevo preamble/ratio: anti-contamination strip fix +
gold-set benchmark) capturing today's strip_nevo_preamble findings.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
2026-06-03 14:46:12 +00:00
ca959d4a9c feat(halacha): strict-rubric quality gate + dedup-on-insert (#81,#82)
Bake the 2026-06-03 strict-cleanup rubric into the extraction pipeline so the
corpus stays clean at the source instead of accumulating duplicates, obiter
dicta, truncated quotes and thin restatements that clog the review queue.

#81 — quality gate:
- New pure module halacha_quality.py with unit-tested validators:
  non-decision/obiter (Wambaugh markers), truncated-quote (mid-word cut),
  thin-restatement (rule≈quote), quote-unverified.
- Validators run in halacha_extractor._process; a non-decision is re-typed
  obiter; flags persist in new halachot.quality_flags column.
- Auto-approve now requires confidence>=threshold AND no quality flags;
  flagged items route to pending_review regardless of confidence.
- Both extraction prompts hardened: reject undecided dicta, exclude
  case-specific applications, require abstraction, forbid over-splitting.

#82 — dedup-on-insert (store_halachot_for_chunk):
- Within the same precedent, skip a halacha whose normalized supporting_quote
  already exists, or whose rule-embedding has cosine>=HALACHA_DEDUP_COSINE
  (0.93) against an already-stored one. Makes re-runs idempotent.

Migration: halachot.quality_flags TEXT[] (additive, idempotent ALTER).
Tests: 19 new unit tests; full suite 156 passed. Validated end-to-end against
dev DB (dedup skips dups, flag blocks auto-approve, re-run inserts 0).
Calibration: flags fire on only ~10% of current survivors (low false-positive).

Spec: docs/halacha-strict-rubric.md

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
2026-06-03 12:30:38 +00:00
df007784c9 feat(corroboration): approval_action decision fn + kill-switch (INV-COR2/COR4, X11 Phase 2)
- HALACHA_CORROBORATION_AUTO_APPROVE config (default ON, Dafna validated 2026-06-01)
- approval_action(agg, has_overruled): overruled→demote, corroborated→approve, else None
- 4 offline unit tests; Phase 2 plan + TaskMaster #75

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
2026-06-01 04:34:23 +00:00
33f955e372 feat(corroboration): aggregator — distinct positive + negative-flag (INV-COR4, X11)
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
2026-05-31 19:00:16 +00:00
dbc176ae66 feat(corroboration): halacha matcher + cosine threshold (INV-COR3, X11)
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
2026-05-31 18:57:47 +00:00
09eec6a906 feat(corroboration): treatment classifier + polarity (INV-COR2, X11)
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
2026-05-31 18:54:50 +00:00
4d8422198a feat(guard): fitness function blocking raw Paperclip access (GAP-22, FU-8a)
Wakeup-INSERT rule is universal (never allowlisted — hard invariant). Raw-HTTP
rule exempts the sanctioned helpers + standalone operator/admin scripts (a
distinct category per fitness-function scope differentiation + DRY: tooling
needn't reuse the FastAPI wrapper). Repo scanned clean under these rules.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
2026-05-31 11:35:07 +00:00
a66ab3b3cd feat(guard): fitness function blocking raw Paperclip access (GAP-22, FU-8a)
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
2026-05-31 11:16:36 +00:00
aac383acb7 feat(sync): --verify exits non-zero on drift; adapter mismatch = loud drift (GAP-21, FU-8a)
Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
2026-05-31 11:14:44 +00:00
e46868feda feat(fu2b): flag PROC_MISMATCH (case_number prefix vs proceeding_type) for chair
Dry-run surfaced 2 rows with בל"מ prefix but proceeding_type=ערר. Since the
migration strips the prefix, a wrong proceeding_type would silently lose the
בל"מ signal — must be chair-adjudicated, not auto-applied. Chair table now
flags 4 rows: 2 DUP_CHECK (8047-23) + 2 PROC_MISMATCH.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
2026-05-31 08:57:42 +00:00
a41fcedc28 test(fu2b): failing tests for bare-number extraction (FU-2b) 2026-05-31 08:52:48 +00:00
7e35a24d80 test(reindex): cover empty-text raise path (FU-3 review)
Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
2026-05-30 22:13:18 +00:00
63abf83e76 test(reindex): fix mark_indexed stub arity in FU-1 fixture (FU-3)
Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
2026-05-30 22:07:39 +00:00
c8de42150e test(reindex): stub db.mark_indexed in FU-1/FU-2a ingest fixtures (FU-3 interaction)
Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
2026-05-30 22:07:18 +00:00
e522555b1a test(reindex): failing tests for content-hash re-index (FU-3) 2026-05-30 22:02:16 +00:00
bffd2ec701 test(audit): failing tests for audit-trail + provenance (FU-7) 2026-05-30 21:27:54 +00:00
5d3c340243 test(ingest): stub recompute_searchable in FU-1 fixture (FU-2a interaction)
Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
2026-05-30 20:59:11 +00:00
358d82e90e feat(retrieval): require practice_area only for internal/cases; enable searchable filter + health visibility (GAP-13, FU-2a)
Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
2026-05-30 20:57:27 +00:00
bcd226ac1a test(ingest): failing tests for idempotent ingest + searchable (FU-2a)
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
2026-05-30 20:41:34 +00:00
9ae2d47d03 test(ingest): failing tests for unified pipeline (FU-1)
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
2026-05-30 19:09:37 +00:00
0c8d415044 fix(retrieval): scope search_decisions by domain — derive from case, block only on undeterminable case (GAP-12, INV-RET1)
Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
2026-05-30 18:23:41 +00:00
084b31cd9b fix(qa): enforce critical-QA gate on export + fix neutral_background critical-but-passed (GAP-15/16, INV-QA3/EX3)
Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
2026-05-30 17:58:50 +00:00
1af689a969 fix(retrieval): enforce source_kind on halacha_filters — close cross-corpus leak (GAP-10, INV-RET1)
Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
2026-05-30 17:46:59 +00:00
f3cc9ca9d4 feat: Stage A finalizers + #35/#36/#37 — critical-gap closure
Some checks failed
Build & Deploy / build-and-deploy (push) Has been cancelled
Four parallel sub-agents closed the remaining critical gaps from the
26/05 Stage A/B sprint. Each block independently tested; aggregated here.

## #30/#31 finalizers (sub-agent A)
* Auto-derive practice_area in case_create from case_number prefix
  (1xxx→rishuy_uvniya, 8xxx→betterment_levy, 9xxx→compensation_197);
  default for CaseCreateRequest is now "" (the DB constraint catches
  any stray "appeals_committee").
* practice_area.py: derive_subtype now handles axis-B domain values
  (rishuy_uvniya/betterment_levy/compensation_197) without parsing the
  case number; new helper derive_domain_practice_area().
* Halacha re-extraction verified unnecessary — all 6 reclassified
  records already had is_binding=false and approved halachot.
* Regression tests: 6 cases in tests/test_corpus_constraints.py
  covering practice_area enum, internal-committee chair/district,
  external-upload arar prefix, MCP guard.
* UI: district input → Select dropdown (7 districts) in
  precedent-edit-sheet.tsx, preserving legacy free-text values.

## #37 בל"מ subtypes (sub-agent B)
* 3 new appeal_subtypes: extension_request_{building_permit,
  betterment_levy,compensation}. APPEALS_COMMITTEE_SUBTYPES extended,
  SUBTYPES_BY_AREA mappings added.
* New helpers: is_blam_subject(), is_blam_subtype(),
  derive_subtype_with_blam(case_number, subject, practice_area).
  case_create now uses it to auto-detect "בקשה להארכת מועד" subjects.
* 3 methodology templates under docs/methodology/extension-request-*.md.
* paperclip_client.py mapping updated for the 3 new subtypes
  (extension_request_building_permit→CMP, the other two→CMPA).
* Frontend: bilingual "בל"מ" badge + filter dropdown on cases list +
  detail header; appeal-type-bars collapseBlam() merges בל"מ into its
  parent domain for aggregate bars.
* Wizard auto-detects בל"מ from subject during case creation.
* 3 Berlinger cases (1017/1018/1019-03-26) migrated to
  appeal_subtype=extension_request_building_permit via psql.

## #35 missing_precedents feature (sub-agent C)
* Schema V13: missing_precedents table (citation, case_id, party,
  legal_topic, status, linked_case_law_id, claim_quote, ...) +
  FK constraints + 3 indexes. Applied via psql + idempotent migration.
* 6 db.py service functions, 3 MCP tools, 6 FastAPI endpoints
  (POST/GET/PATCH/DELETE/upload — upload routes by citation prefix
  to ingest_internal_decision or ingest_precedent).
* Next.js page /missing-precedents with 5 status tabs + filters +
  sidebar badge counter + detail drawer with metadata edit + smart
  upload form that switches fields per committee/court.
* Bootstrap: 7 rows imported from the JSON file
  (3 citations × cases, all status=closed with linked_case_law_id).
* legal-researcher.md: new §2ב.5 with missing_precedent_create
  usage + dedup semantics + tool grant.

## #36 legal_arguments aggregation (sub-agent D)
* Schema V14: legal_arguments + legal_argument_propositions M:M.
  Applied via psql.
* New service argument_aggregator.py with two functions —
  aggregate_claims_to_arguments() (Claude CLI / claude_session) and
  get_legal_arguments(). Graceful llm_unavailable handling when CLI
  is missing (containers).
* 2 MCP tools + 2 API endpoints (POST .../aggregate-arguments as
  BackgroundTask, GET .../legal-arguments).
* Frontend: shadcn Accordion + new legal-arguments-panel.tsx with
  hierarchical (party → priority badge → arguments) display, "טיעונים"
  tab on the case page, "חשב/חשב מחדש" buttons.
* scripts/backfill_legal_arguments.py + SCRIPTS.md entry — dry-run
  found 8 candidate cases including 1017/1018/1019.

## Open follow-ups (intentionally deferred)
* npm run api:types in web-ui (CLAUDE.md flow) — recommended before
  the next UI commit; not required for backend deployment.
* Run backfill_legal_arguments.py --apply once the container picks up
  the new aggregator service.
* webhook on missing-precedents upload-close to Paperclip (optional).

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
2026-05-26 08:34:40 +00:00
03e7d88aee DOCX exporter: 3-layer RTL + David font on all slots
All checks were successful
Build & Deploy / build-and-deploy (push) Successful in 1m30s
Hebrew was rendering LTR or in Times New Roman fallback in some Word
contexts. Root cause: incomplete RTL marking and missing font hints
on the run level.

Three layers of RTL are required (per skills/docx/SKILL.md):
1. Section: <w:bidi/> in sectPr (now inherited from template)
2. Paragraph: <w:bidi/> directly in pPr (paragraph direction)
3. Run: <w:rtl/> in rPr — tells Word to use cs (complex-script) font

Without an explicit font on the run, Hebrew renders in the ascii slot
(Times New Roman). Force David on all four slots (ascii / hAnsi / cs /
eastAsia) so every shaping path picks the correct font.

Changes:
- TEMPLATE_PATH now points to skills/docx/decision_template.docx
  (carries David, RTL, margins, styles); replaces hard-coded constants.
- _mark_run_rtl: writes rFonts on all four slots, then appends <w:rtl/>.
- _mark_paragraph_rtl: places <w:bidi/> directly in pPr (not nested in
  rPr — that was the bug), and adds <w:rtl/> to the paragraph-mark rPr.
- _set_paragraph_jc: forces explicit jc, overriding style-inherited.

Tests:
- test_mark_paragraph_rtl_adds_bidi_directly_in_pPr — guards against
  the regression where bidi was nested inside rPr.
- test_mark_run_rtl_forces_david_on_all_font_slots — ensures all four
  font slots are set, not just cs.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-04-28 17:37:52 +00:00
36ca713dfa Retrofit: tighten yod-bet pattern, add cover-block fallback
All checks were successful
Build & Deploy / build-and-deploy (push) Successful in 6s
The "על כן" pattern for block-yod-bet was too greedy and matched mid-discussion
transitional sentences (e.g. "על כן, במקום בו..."), which caused forward-scan
to skip block-yod-alef ("סוף דבר") via the pointer advance.

Tightened to require an operative subject (אנו / הערר / הוועדה / ועדת הערר)
so terminal "על כן, אנו מחליטים" still matches but mid-block transitions don't.

Added structural_fallback for cover blocks (alef/bet/gimel/dalet) — these are
template metadata not present in user-edited DOCX bodies. Inject zero-content
anchors so apply_user_edit can still target them later. The frontend toast
distinguishes real content gaps from fallback anchors.

Also expanded heading patterns based on training corpus inspection:
- block-vav: על המקרקעין חלות / במצב התכנוני / התכניות החלות
- block-zayin: טענות העוררת
- block-chet: עיקר תגובת המשיב
- block-tet: הדיון בוועדת הערר

For case 1130-25, this raises detection from 6/12 to 11/12 blocks — only
block-yod-bet remains missing (Daphna's edit ends at "סוף דבר" + numbered
ruling, no terminal "ההחלטה" or "על כן אנו מחליטים" paragraph).

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-04-26 06:57:41 +00:00
726498126d Add Track Changes architecture for draft revisions (CMP + CMPA)
All checks were successful
Build & Deploy / build-and-deploy (push) Successful in 1m29s
Fixes critical bug in 1033-25: user-uploaded עריכה-*.docx files were
orphaned on disk while exports kept rebuilding from stale DB blocks.

New architecture:
- User-uploaded DOCX becomes the source of truth (cases.active_draft_path)
- System edits via XML surgery with real Word <w:ins>/<w:del> revisions
- User can Accept/Reject each change from within Word

Components:
- docx_reviser.py: XML surgery for Track Changes (15 tests)
- docx_retrofit.py: retroactive bookmark injection with Hebrew marker
  detection + heading heuristic (9 tests)
- docx_exporter.py: emits bookmarks around each of the 12 blocks
- 3 new MCP tools: apply_user_edit, list_bookmarks, revise_draft
- 4 new/updated endpoints: upload (auto-registers active draft),
  /exports/revise, /exports/bookmarks, /exports/{filename}/retrofit,
  /active-draft
- DB migration: cases.active_draft_path column
- UI: correct banner using real v-numbers, "מקור האמת" badge,
  detailed upload toast with bookmarks_added/missing_blocks
- agents: legal-exporter (3 export modes), legal-ceo (stage G for
  revision handling), legal-writer (revision mode)

Multi-tenancy:
- Works for both CMP (1xxx cases) and CMPA (8xxx/9xxx cases)
- New revise-draft skill added to both companies
- deploy-track-changes.sh syncs skills CMP ↔ CMPA
- retrofit_case.py: one-off retrofit of existing files

Tests: 34 passing (15 reviser + 9 retrofit + 4 exporter bookmarks + 6 e2e)

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-04-16 18:49:30 +00:00