# FU-2a: Idempotent Ingest + Write-Time Normalization + `searchable` Flag (Implementation Plan)

> **For agentic workers:** REQUIRED SUB-SKILL: Use superpowers:subagent-driven-development (recommended) or superpowers:executing-plans to implement this plan task-by-task. Steps use checkbox (`- [ ]`) syntax for tracking.

**Goal:** Make ingest idempotent (`ON CONFLICT` upsert), normalize identifiers at the write boundary (type-aware), and add a materialized `searchable` flag. All changes are forward-only, with no identifier migration.

**Architecture:** Pure-code + one schema-additive migration (V21) in `db.py`. The two create functions (`create_external_case_law`, `create_internal_committee_decision`) move from app-level SELECT-then-INSERT/UPDATE to atomic `INSERT … ON CONFLICT … DO UPDATE` against the existing V15 partial unique indexes (predicate repeated). A new `_canonical_case_number` normalizes at write time for identifier-keyed corpora (internal decisions, cases), but not for external precedents, whose citation is their identifier. A new `searchable` boolean is recomputed from the completeness contract on ingest/metadata completion; the search-layer filter is gated behind a dry-run.

**Tech Stack:** Python 3.12, asyncpg, PostgreSQL (pgvector) at localhost:5433, pytest offline, local `.venv` at `mcp-server/.venv`.

**Spec:** [docs/superpowers/specs/2026-05-30-fu2a-idempotent-ingest-design.md](../specs/2026-05-30-fu2a-idempotent-ingest-design.md)

**Run tests:** `cd ~/legal-ai/mcp-server && .venv/bin/python -m pytest tests/test_idempotent_ingest.py -v`

**DB smoke (real Postgres):** source `~/.env`, connect to `localhost:5433` db `legal_ai` (see Task 6).

---

## File Structure

- **Modify** `mcp-server/src/legal_mcp/services/db.py`:
  - add `_canonical_case_number(s)` (pure) near `_normalize_case_number` (~line 1196).
  - add pure `_compute_searchable(row, has_embedded_chunk)` + async `recompute_searchable(...)`.
  - add `SCHEMA_V21_SQL` (after V20, ~line 1094) + wire into `_run_schema_migrations` (~line 1119).
  - normalize at write in `create_case`, `create_internal_committee_decision` (NOT `create_external_case_law`).
  - convert `create_external_case_law` + `create_internal_committee_decision` to `ON CONFLICT … DO UPDATE`.
- **Modify** `mcp-server/src/legal_mcp/services/ingest.py`: call `db.recompute_searchable(case_law_id)` after statuses are set (uniform, both types).
- **Modify** the search layer (`services/hybrid_search.py` and/or `db.py` search functions): gated `searchable = true` filter (Task 6, only if dry-run is clean).
- **Create** `mcp-server/tests/test_idempotent_ingest.py`: offline tests for the pure pieces + ingest wiring.

**Unchanged:** public signatures of `ingest_precedent`/`ingest_internal_decision` (FU-1) and the DB-create parameter lists. Normalization/upsert live inside the write boundary.

---

## Task 1: Failing tests (pure logic + ingest wiring)

**Files:** Create `mcp-server/tests/test_idempotent_ingest.py`

- [ ] **Step 1: Write the failing tests**

```python
"""FU-2a: idempotent ingest + write-time normalization + searchable flag.

Offline tests for the *pure* pieces (canonical normalization, completeness
predicate) and ingest wiring. The real ON CONFLICT upsert is verified by a DB
smoke test against localhost:5433 (see plan Task 6), since it requires a live
Postgres partial unique index.
"""
from __future__ import annotations

import asyncio
from uuid import uuid4

import pytest

from legal_mcp.services import db, ingest


def _run(coro):
    return asyncio.run(coro)


# ── GAP-06: canonical normalization (pure, deterministic) ──────────────

@pytest.mark.parametrize("raw,expected", [
    ("ערר 8137/24", "8137-24"),
    (" עע\"מ 1/20 ", "1-20"),
    ("8126-03-25", "8126-03-25"),  # month segment preserved
    ("בל\"מ 1010-01-25", "1010-01-25"),
    ("8047/23", "8047-23"),
])
def test_canonical_case_number(raw, expected):
    assert db._canonical_case_number(raw) == expected


def test_canonical_does_not_invent_month():
    # No month in input → none added (X1 §1).
    assert db._canonical_case_number("8126/24") == "8126-24"


# ── GAP-13: completeness predicate (pure) ──────────────────────────────

def _complete_row():
    return {
        "case_number": "8047-23", "case_name": "פלוני נ' הוועדה",
        "practice_area": "rishuy_uvniya", "source_kind": "internal_committee",
        "extraction_status": "completed",
        "headnote": "תקציר", "summary": "", "subject_tags": [],
    }


def test_compute_searchable_true_when_complete():
    assert db._compute_searchable(_complete_row(), has_embedded_chunk=True) is True


def test_compute_searchable_false_without_embedded_chunk():
    assert db._compute_searchable(_complete_row(), has_embedded_chunk=False) is False


def test_compute_searchable_false_without_metadata():
    row = _complete_row()
    row["headnote"] = ""
    row["summary"] = ""
    row["subject_tags"] = []
    assert db._compute_searchable(row, has_embedded_chunk=True) is False


def test_compute_searchable_false_when_extraction_incomplete():
    row = _complete_row()
    row["extraction_status"] = "pending"
    assert db._compute_searchable(row, has_embedded_chunk=True) is False


def test_compute_searchable_false_without_core_fields():
    row = _complete_row()
    row["practice_area"] = ""
    assert db._compute_searchable(row, has_embedded_chunk=True) is False


# ── ingest wires in recompute_searchable (both types) ──────────────────

def test_ingest_calls_recompute_searchable(monkeypatch, tmp_path):
    calls = {"recompute": [], "meta": [], "hal": []}

    async def _extract_text(path):
        return ("text", 1, [0])
    monkeypatch.setattr(ingest.extractor, "extract_text", _extract_text)
    monkeypatch.setattr(ingest.extractor, "strip_nevo_preamble", lambda t: t)
    monkeypatch.setattr(ingest.chunker, "chunk_document",
        lambda t, page_offsets=None: [type("C", (), {
            "chunk_index": 0, "content": "c", "section_type": "b",
            "page_number": 1})()])

    async def _embed(texts, input_type="document"):
        return [[0.0] * 8 for _ in texts]
    monkeypatch.setattr(ingest.embeddings, "embed_texts", _embed)

    async def _store(cid, dicts):
        return len(dicts)
    monkeypatch.setattr(ingest.db, "store_precedent_chunks", _store)

    async def _create_internal(**kw):
        return {"id": uuid4()}
    monkeypatch.setattr(ingest.db, "create_internal_committee_decision", _create_internal)

    async def _noop(*a, **k):
        return None
    monkeypatch.setattr(ingest.db, "set_case_law_extraction_status", _noop)
    monkeypatch.setattr(ingest.db, "set_case_law_halacha_status", _noop)
    # append() returns None, so each lambda records the id and then returns
    # the awaitable produced by _noop().
    monkeypatch.setattr(ingest.db, "request_metadata_extraction",
        lambda cid: calls["meta"].append(cid) or _noop())
    monkeypatch.setattr(ingest.db, "request_halacha_extraction",
        lambda cid: calls["hal"].append(cid) or _noop())

    async def _recompute(cid):
        calls["recompute"].append(cid)
    monkeypatch.setattr(ingest.db, "recompute_searchable", _recompute)
    monkeypatch.setattr(ingest.config, "PARENT_DOC_RETRIEVAL_ENABLED", False)
    monkeypatch.setattr(ingest.config, "MULTIMODAL_ENABLED", False)

    from legal_mcp.services import internal_decisions
    _run(internal_decisions.ingest_internal_decision(
        case_number="8047/23", text="t", chair_name="x",
        practice_area="rishuy_uvniya"))
    assert len(calls["recompute"]) == 1, "ingest must recompute searchable after success"
```

- [ ] **Step 2: Run to verify failure**

Run: `cd ~/legal-ai/mcp-server && .venv/bin/python -m pytest tests/test_idempotent_ingest.py -v`

Expected: FAIL with `AttributeError: module 'legal_mcp.services.db' has no attribute '_canonical_case_number'` (and likewise for `_compute_searchable`, `recompute_searchable`).

- [ ] **Step 3: Commit**

```bash
cd ~/legal-ai
git add mcp-server/tests/test_idempotent_ingest.py
git commit -m "test(ingest): failing tests for idempotent ingest + searchable (FU-2a)"
```

---

## Task 2: `_canonical_case_number` + write-time normalization

**Files:** Modify `mcp-server/src/legal_mcp/services/db.py`

- [ ] **Step 1: Add `_canonical_case_number` next to `_normalize_case_number` (~line 1212)**

```python
def _canonical_case_number(s: str) -> str:
    """Canonical write-time form per X1 §1: trim · prefix-strip · '/'→'-'.
    Deterministic and format-only: does NOT add or remove a month segment.
    Used at the write boundary for identifier-keyed corpora (internal committee
    decisions, active cases). NOT for external precedents, whose canonical
    identifier is the full citation.
    """
    s = (s or "").strip()
    # Prefix-strip: drop everything before the first digit.
    m = re.search(r"\d", s)
    if m:
        s = s[m.start():]
    return s.strip().replace("/", "-")
```

- [ ] **Step 2: Normalize at write in `create_case` (~line 1158)**

Change the INSERT's `case_number` binding to the normalized form. Replace `case_id, case_number, title,` with:

```python
case_id, _canonical_case_number(case_number), title,
```

- [ ] **Step 3: Normalize at write in `create_internal_committee_decision` (top of function body, ~line 2649)**

Immediately after `pool = await get_pool()`, add:

```python
case_number = _canonical_case_number(case_number)
```

(Do NOT add this to `create_external_case_law`. External keeps its citation verbatim; that function only `.strip()`s, which the caller adapter already does.)

- [ ] **Step 4: Run normalization tests**

Run: `cd ~/legal-ai/mcp-server && .venv/bin/python -m pytest tests/test_idempotent_ingest.py -k "canonical" -v`

Expected: `test_canonical_case_number` (5 cases) + `test_canonical_does_not_invent_month` PASS.

- [ ] **Step 5: Commit**

```bash
cd ~/legal-ai
git add mcp-server/src/legal_mcp/services/db.py
git commit -m "feat(ingest): write-time canonical case_number normalization (GAP-06, FU-2a)"
```

---

## Task 3: Convert both create functions to `ON CONFLICT DO UPDATE`

**Files:** Modify `mcp-server/src/legal_mcp/services/db.py`

- [ ] **Step 1: Replace `create_external_case_law` body (lines 2566-2624, from `pool = await get_pool()` to `return _row_to_case_law(row)`)**

```python
    pool = await get_pool()
    tags_json = json.dumps(subject_tags or [], ensure_ascii=False)
    async with pool.acquire() as conn:
        # Atomic upsert on the V15 partial unique index
        # uq_case_law_external_number (case_number) WHERE source_kind <> 'internal_committee'.
        # The predicate is repeated in ON CONFLICT (required for partial indexes).
        # This also subsumes the old cited_only→external_upload promotion: a
        # cited_only row with the same case_number conflicts and is promoted by
        # DO UPDATE. Scoped to the external partial index, so an internal row with
        # the same number is NOT touched (the old SELECT-without-source_kind could
        # wrongly promote it).
        row = await conn.fetchrow(
            """
            INSERT INTO case_law (
                case_number, case_name, court, date,
                subject_tags, summary, key_quote, full_text, source_url,
                source_kind, document_id,
                extraction_status, halacha_extraction_status,
                practice_area, appeal_subtype, headnote,
                source_type, precedent_level, is_binding
            )
            VALUES (
                $1, $2, $3, $4,
                $5, $6, $7, $8, $9,
                'external_upload', $10,
                'processing', 'pending',
                $11, $12, $13,
                $14, $15, $16
            )
            ON CONFLICT (case_number) WHERE source_kind <> 'internal_committee'
            DO UPDATE SET
                case_name = EXCLUDED.case_name,
                court = COALESCE(NULLIF(EXCLUDED.court, ''), case_law.court),
                date = COALESCE(EXCLUDED.date, case_law.date),
                practice_area = EXCLUDED.practice_area,
                appeal_subtype = EXCLUDED.appeal_subtype,
                subject_tags = EXCLUDED.subject_tags,
                summary = COALESCE(NULLIF(EXCLUDED.summary, ''), case_law.summary),
                headnote = EXCLUDED.headnote,
                key_quote = COALESCE(NULLIF(EXCLUDED.key_quote, ''), case_law.key_quote),
                full_text = EXCLUDED.full_text,
                source_url = COALESCE(NULLIF(EXCLUDED.source_url, ''), case_law.source_url),
                source_type = EXCLUDED.source_type,
                precedent_level = EXCLUDED.precedent_level,
                is_binding = EXCLUDED.is_binding,
                document_id = COALESCE(EXCLUDED.document_id, case_law.document_id),
                source_kind = 'external_upload',
                extraction_status = 'processing',
                halacha_extraction_status = 'pending'
            RETURNING *
            """,
            case_number, case_name, court, decision_date,
            tags_json, summary, key_quote, full_text, source_url,
            document_id, practice_area, appeal_subtype, headnote,
            source_type, precedent_level, is_binding,
        )
    return _row_to_case_law(row)
```

- [ ] **Step 2: Replace `create_internal_committee_decision` body (lines 2649-2708)**

```python
    pool = await get_pool()
    case_number = _canonical_case_number(case_number)
    tags_json = json.dumps(subject_tags or [], ensure_ascii=False)
    async with pool.acquire() as conn:
        # Atomic upsert on V15 partial unique index
        # uq_case_law_internal_number_proc (case_number, proceeding_type)
        # WHERE source_kind = 'internal_committee'. Predicate repeated for the
        # partial index. Replaces the old SELECT-then-INSERT/UPDATE (race-prone).
        row = await conn.fetchrow(
            """
            INSERT INTO case_law (
                case_number, case_name, court, date,
                chair_name, district, subject_tags, summary, full_text,
                source_kind, source_type, document_id,
                extraction_status, halacha_extraction_status,
                practice_area, appeal_subtype, is_binding, proceeding_type
            )
            VALUES (
                $1, $2, $3, $4,
                $5, $6, $7, $8, $9,
                'internal_committee', 'appeals_committee', $10,
                'processing', 'pending',
                $11, $12, $13, $14
            )
            ON CONFLICT (case_number, proceeding_type) WHERE source_kind = 'internal_committee'
            DO UPDATE SET
                case_name = EXCLUDED.case_name,
                court = COALESCE(NULLIF(EXCLUDED.court, ''), case_law.court),
                date = COALESCE(EXCLUDED.date, case_law.date),
                chair_name = COALESCE(NULLIF(EXCLUDED.chair_name, ''), case_law.chair_name),
                district = COALESCE(NULLIF(EXCLUDED.district, ''), case_law.district),
                practice_area = EXCLUDED.practice_area,
                appeal_subtype = EXCLUDED.appeal_subtype,
                subject_tags = EXCLUDED.subject_tags,
                summary = COALESCE(NULLIF(EXCLUDED.summary, ''), case_law.summary),
                full_text = EXCLUDED.full_text,
                source_type = 'appeals_committee',
                source_kind = 'internal_committee',
                is_binding = EXCLUDED.is_binding,
                document_id = COALESCE(EXCLUDED.document_id, case_law.document_id),
                extraction_status = 'processing',
                halacha_extraction_status = 'pending'
            RETURNING *
            """,
            case_number, case_name, court, decision_date,
            chair_name, district, tags_json, summary, full_text,
            document_id, practice_area, appeal_subtype, is_binding,
            proceeding_type,
        )
    return _row_to_case_law(row)
```

- [ ] **Step 3: Verify import + no syntax error**

Run: `cd ~/legal-ai/mcp-server && .venv/bin/python -c "from legal_mcp.services import db; print('db imports')"`

Expected: prints `db imports`.

- [ ] **Step 4: Commit**

```bash
cd ~/legal-ai
git add mcp-server/src/legal_mcp/services/db.py
git commit -m "feat(ingest): atomic ON CONFLICT upsert in create_*_case_law (GAP-03, FU-2a)"
```

---

## Task 4: V21 migration: `searchable` column + recompute

**Files:** Modify `mcp-server/src/legal_mcp/services/db.py`

- [ ] **Step 1: Add `SCHEMA_V21_SQL` after `SCHEMA_V20_SQL` (~line 1094)**

```python
# ── V21: explicit `searchable` flag (GAP-13 / INV-DM1) ─────────────
# Materialized completeness flag: a case_law row is exposed to search only
# when it satisfies the completeness contract (02-data-model §2a). Recomputed
# on ingest/metadata completion via recompute_searchable(); not inferred at
# query time. Default false so a freshly-inserted row is excluded until proven
# complete. Health-check surfaces count(*) FILTER (WHERE NOT searchable).
SCHEMA_V21_SQL = """
ALTER TABLE case_law
    ADD COLUMN IF NOT EXISTS searchable boolean NOT NULL DEFAULT false;
CREATE INDEX IF NOT EXISTS idx_case_law_searchable ON case_law (searchable);
"""
```

- [ ] **Step 2: Wire V21 into `_run_schema_migrations` (~line 1119) and bump the log line**

After `await conn.execute(SCHEMA_V20_SQL)` add:

```python
await conn.execute(SCHEMA_V21_SQL)
```

Change the log line `"Database schema initialized (v1-v20)"` → `"Database schema initialized (v1-v21)"`.

- [ ] **Step 3: Add `_compute_searchable` (pure) + `recompute_searchable` (async) near the case_law helpers (after `create_internal_committee_decision`, ~line 2709)**

```python
def _compute_searchable(row: dict, has_embedded_chunk: bool) -> bool:
    """Completeness contract (INV-DM1 / 02-data-model §2a).
    A row is searchable IFF: canonical id present · case_name/practice_area/
    source_kind present · ≥1 chunk with a non-null embedding · extraction
    completed · metadata non-empty (≥1 of headnote/summary/subject_tags).
    Pure: `has_embedded_chunk` is supplied by the caller (cross-table check).
    """
    if not has_embedded_chunk:
        return False
    if (row.get("extraction_status") or "") != "completed":
        return False
    if not (row.get("case_number") or "").strip():
        return False
    if not (row.get("case_name") or "").strip():
        return False
    if not (row.get("practice_area") or "").strip():
        return False
    if not (row.get("source_kind") or "").strip():
        return False
    tags = row.get("subject_tags") or []
    has_meta = bool((row.get("headnote") or "").strip()) \
        or bool((row.get("summary") or "").strip()) \
        or (len(tags) > 0)
    return has_meta


async def recompute_searchable(case_law_id: "UUID | str | None" = None) -> int:
    """Recompute and persist the `searchable` flag. Idempotent / reversible.

    If case_law_id is None, recompute ALL rows (used by the V21 backfill and
    the dry-run). Returns the number of rows now marked searchable=true.
    """
    pool = await get_pool()
    async with pool.acquire() as conn:
        if case_law_id is not None:
            cid = case_law_id if isinstance(case_law_id, UUID) else UUID(str(case_law_id))
            rows = await conn.fetch(
                "SELECT * FROM case_law WHERE id = $1", cid)
        else:
            rows = await conn.fetch("SELECT * FROM case_law")
        n_true = 0
        for r in rows:
            row = dict(r)
            # subject_tags is stored jsonb; _row_to_case_law parses it, but here
            # we read raw, so normalize it to a list for the length check.
            tags = row.get("subject_tags")
            if isinstance(tags, str):
                try:
                    tags = json.loads(tags)
                except (ValueError, TypeError):
                    tags = []
            row["subject_tags"] = tags or []
            has_chunk = await conn.fetchval(
                "SELECT EXISTS(SELECT 1 FROM precedent_chunks "
                "WHERE case_law_id = $1 AND embedding IS NOT NULL)",
                row["id"])
            val = _compute_searchable(row, bool(has_chunk))
            await conn.execute(
                "UPDATE case_law SET searchable = $2 WHERE id = $1",
                row["id"], val)
            if val:
                n_true += 1
    return n_true
```

- [ ] **Step 4: Run the completeness-predicate tests**

Run: `cd ~/legal-ai/mcp-server && .venv/bin/python -m pytest tests/test_idempotent_ingest.py -k "searchable and not ingest" -v`

Expected: all `test_compute_searchable_*` PASS.

- [ ] **Step 5: Commit**

```bash
cd ~/legal-ai
git add mcp-server/src/legal_mcp/services/db.py
git commit -m "feat(data-model): V21 searchable flag + recompute_searchable (GAP-13, FU-2a)"
```

---

## Task 5: Wire `recompute_searchable` into ingest

**Files:** Modify `mcp-server/src/legal_mcp/services/ingest.py`

- [ ] **Step 1: Call recompute after statuses are set in `ingest_document`**

In `ingest.py`, find the block (added by FU-1) that sets statuses + queues extraction:

```python
    await db.set_case_law_extraction_status(case_law_id, "completed")
    await db.set_case_law_halacha_status(case_law_id, "pending")
    await db.request_metadata_extraction(case_law_id)
    await db.request_halacha_extraction(case_law_id)
```

Immediately AFTER `request_halacha_extraction`, add:

```python
    await db.recompute_searchable(case_law_id)
```

> Rationale: at this point chunks+embeddings are stored and extraction_status is
> completed, so the completeness predicate is meaningful. Metadata may still be
> pending (queued), so the row may compute searchable=false until metadata is
> filled in. The metadata extractor also calls recompute (Task 5 Step 2).
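The rationale above can be illustrated with a small standalone sketch. It uses a condensed copy of the completeness predicate from Task 4; `compute_searchable` and the sample row are hypothetical stand-ins for illustration, not the actual `db.py` code, and the sketch assumes the predicate behaves as specified in Task 4.

```python
# Condensed, standalone stand-in for the plan's _compute_searchable (Task 4).
# Hypothetical helper for illustration only; it does not touch the database.
CORE_FIELDS = ("case_number", "case_name", "practice_area", "source_kind")


def compute_searchable(row: dict, has_embedded_chunk: bool) -> bool:
    if not has_embedded_chunk or row.get("extraction_status") != "completed":
        return False
    if any(not (row.get(field) or "").strip() for field in CORE_FIELDS):
        return False
    # At least one metadata field must be non-empty.
    return bool(
        (row.get("headnote") or "").strip()
        or (row.get("summary") or "").strip()
        or row.get("subject_tags")
    )


# State right after ingest: chunks embedded, extraction completed,
# but metadata extraction is only queued, so metadata is still empty.
row = {
    "case_number": "8047-23",
    "case_name": "sample case",
    "practice_area": "rishuy_uvniya",
    "source_kind": "internal_committee",
    "extraction_status": "completed",
    "headnote": "",
    "summary": "",
    "subject_tags": [],
}
after_ingest = compute_searchable(row, has_embedded_chunk=True)

# Later, the metadata extractor fills a field and recompute runs again.
row["headnote"] = "short headnote"
after_metadata = compute_searchable(row, has_embedded_chunk=True)

print(after_ingest, after_metadata)
```

The first recompute is therefore expected to leave a freshly ingested row non-searchable in the common case, and the second call (Step 2) is what flips it once metadata exists.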
- [ ] **Step 2: Call recompute after metadata extraction fills fields**

In `mcp-server/src/legal_mcp/services/precedent_metadata_extractor.py`, find `extract_and_apply`'s success path (where it persists the filled metadata fields). After the DB update that writes the extracted metadata, add a call:

```python
    await db.recompute_searchable(case_law_id)
```

(`db` should already be imported in that module; if not, add `from legal_mcp.services import db`. Confirm by reading the file's imports first.)

- [ ] **Step 3: Run the ingest-wiring test**

Run: `cd ~/legal-ai/mcp-server && .venv/bin/python -m pytest tests/test_idempotent_ingest.py -k "ingest_calls_recompute" -v`

Expected: `test_ingest_calls_recompute_searchable` PASS.

- [ ] **Step 4: Commit**

```bash
cd ~/legal-ai
git add mcp-server/src/legal_mcp/services/ingest.py mcp-server/src/legal_mcp/services/precedent_metadata_extractor.py
git commit -m "feat(ingest): recompute searchable on ingest + metadata completion (GAP-13, FU-2a)"
```

---

## Task 6: DB smoke + dry-run + GATED search filter

**Files:** Modify search layer ONLY if dry-run is clean (see Step 4).
- [ ] **Step 1: Apply the V21 migration to the local DB and smoke-test upsert idempotency**

Run (sources env, exercises real Postgres):

```bash
cd ~/legal-ai && set -a && source ~/.env && set +a
cd mcp-server && .venv/bin/python -c "
import asyncio
from legal_mcp.services import db
async def main():
    await db.get_pool()  # runs migrations incl V21
    # idempotent internal upsert: same (case_number, proceeding_type) twice.
    # Digits-only sentinel: _canonical_case_number drops everything before the
    # first digit, so a letter prefix such as 'ZZ' would not survive the write.
    cn = '999999/99'
    r1 = await db.create_internal_committee_decision(case_number=cn, case_name='t', full_text='x', practice_area='rishuy_uvniya')
    r2 = await db.create_internal_committee_decision(case_number=cn, case_name='t2', full_text='x2', practice_area='rishuy_uvniya')
    assert r1['id'] == r2['id'], 'upsert must update, not duplicate'
    # cleanup
    pool = await db.get_pool()
    async with pool.acquire() as c:
        await c.execute(\"DELETE FROM case_law WHERE case_number = '999999-99'\")
    print('UPSERT IDEMPOTENT OK; normalized stored as 999999-99')
asyncio.run(main())
"
```

Expected: `UPSERT IDEMPOTENT OK` and no duplicate. (Note: `999999/99` is stored as `999999-99` by write-time normalization. The sentinel is digits-only because the prefix-strip in `_canonical_case_number` would turn `ZZ9999/24` into `9999-24`, which the cleanup would miss and which could collide with a real case number.)

- [ ] **Step 2: Backfill the `searchable` flag (recompute, reversible)**

```bash
cd ~/legal-ai && set -a && source ~/.env && set +a
cd mcp-server && .venv/bin/python -c "
import asyncio
from legal_mcp.services import db
async def main():
    n = await db.recompute_searchable()
    print('recompute_searchable: rows now searchable =', n)
asyncio.run(main())
"
```

- [ ] **Step 3: Dry-run report: which rows would drop from search if the filter is enabled**

```bash
cd ~/legal-ai && set -a && source ~/.env && set +a
PGPASSWORD="$POSTGRES_PASSWORD" psql "host=$POSTGRES_HOST port=$POSTGRES_PORT dbname=$POSTGRES_DB user=$POSTGRES_USER" -c "
SELECT source_kind,
       count(*) AS total,
       count(*) FILTER (WHERE NOT searchable) AS would_drop
FROM case_law
GROUP BY source_kind
ORDER BY source_kind;"
```

Report the table to the controller.
**Decision gate:** if `would_drop` includes legitimate, currently-findable precedents (e.g. external_upload / internal_committee rows that users rely on), DO NOT enable the search filter in Step 4; stop and report, and the filter waits for FU-2b. If `would_drop` contains only genuinely incomplete rows, proceed.

- [ ] **Step 4: (GATED) Enable `searchable = true` filter in the search layer**

ONLY if Step 3 is clean. Read `mcp-server/src/legal_mcp/services/hybrid_search.py` to find the `case_law` WHERE clauses in `search_precedent_library_hybrid` / `search_documents_hybrid`. Add `AND cl.searchable = true` (using whatever alias that query uses) to the case_law-joined precedent search paths. Add a focused test asserting that a non-searchable row is excluded (monkeypatch or DB smoke). If deferred, write a one-line note in the spec §7 that the filter is pending FU-2b, and skip this step.

- [ ] **Step 5: Add health-check visibility**

Find the health-check endpoint/function (search `def health` / `processing_status` in `web/app.py` or `tools/`). Add a field `non_searchable_case_law = SELECT count(*) FROM case_law WHERE NOT searchable`. Keep it a single cheap COUNT.

- [ ] **Step 6: Commit**

```bash
cd ~/legal-ai
git add -A mcp-server/ web/
git commit -m "feat(retrieval): gated searchable filter + health-check visibility (GAP-13, FU-2a)"
```

---

## Task 7: Full suite + smoke + lint + TaskMaster

- [ ] **Step 1: Full test suite**

Run: `cd ~/legal-ai/mcp-server && .venv/bin/python -m pytest tests/ -q`

Expected: all pass (the 77 FU-1 tests + the new FU-2a tests). Report the summary line.

- [ ] **Step 2: Smoke-import**

Run: `cd ~/legal-ai/mcp-server && .venv/bin/python -c "from legal_mcp.services import db, ingest, precedent_library, internal_decisions; print('clean')"`

Expected: `clean`.
- [ ] **Step 3: Lint changed files (if ruff available)**

Run: `cd ~/legal-ai/mcp-server && .venv/bin/python -m ruff check src/legal_mcp/services/db.py src/legal_mcp/services/ingest.py 2>/dev/null; echo "exit=$?"`

Expected: clean or "ruff not available".

- [ ] **Step 4: Mark TaskMaster #60 + subtasks done**

Controller handles this (edit `.taskmaster/tasks/tasks.json`, verify via MCP get_task). Subtasks 60.1 (GAP-03), 60.2 (GAP-06), 60.5 (GAP-13).

---

## Self-Review Notes

- **GAP-03** → Task 3 (ON CONFLICT in both functions). **GAP-06** → Task 2 (`_canonical_case_number` + write-time, type-aware). **GAP-13** → Tasks 4-5 (column + recompute + wiring) and gated Task 6 (filter).
- **No identifier migration:** FU-2b (#67) owns GAP-07/08. The V21 backfill only sets a derived, reversible flag.
- **Gated search filter** (Task 6 Steps 3-4): the behavior-visible change is contingent on a clean dry-run; otherwise it is deferred. Surface the dry-run table to the user.
- **Offline-test limitation:** ON CONFLICT needs real Postgres, so it is verified by the Task 6 Step 1 smoke test; offline tests cover the pure logic (normalize, completeness) and ingest wiring.
- **Type-consistency:** `_canonical_case_number`, `_compute_searchable(row, has_embedded_chunk)`, `recompute_searchable(case_law_id=None)`: names are used identically in tests (Task 1) and impl (Tasks 2, 4).
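
For readers who want to see the partial-index upsert semantics from Task 3 without a live Postgres, the sketch below reproduces the idea in an in-memory SQLite database. This is an illustration under stated assumptions, not part of the plan: SQLite is a stand-in whose upsert support resembles but is not identical to PostgreSQL's, the table is reduced to a few columns, and the index names `uq_external` / `uq_internal` are hypothetical shorthand for the V15 indexes. It does not replace the Task 6 Step 1 smoke test.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE case_law (
    id INTEGER PRIMARY KEY,
    case_number TEXT NOT NULL,
    proceeding_type TEXT NOT NULL DEFAULT 'appeal',
    source_kind TEXT NOT NULL,
    case_name TEXT NOT NULL
);
-- Two partial unique indexes, one per corpus (simplified V15 analogue).
CREATE UNIQUE INDEX uq_external ON case_law (case_number)
    WHERE source_kind <> 'internal_committee';
CREATE UNIQUE INDEX uq_internal ON case_law (case_number, proceeding_type)
    WHERE source_kind = 'internal_committee';
""")

# The index predicate is repeated in the conflict target so the upsert
# arbitrates on the external partial index only.
UPSERT_EXTERNAL = """
INSERT INTO case_law (case_number, source_kind, case_name)
VALUES (?, 'external_upload', ?)
ON CONFLICT (case_number) WHERE source_kind <> 'internal_committee'
DO UPDATE SET case_name = excluded.case_name,
              source_kind = 'external_upload'
"""

# An internal decision and a cited-only external stub share one number.
conn.execute(
    "INSERT INTO case_law (case_number, source_kind, case_name) "
    "VALUES ('8047-23', 'internal_committee', 'internal decision')")
conn.execute(
    "INSERT INTO case_law (case_number, source_kind, case_name) "
    "VALUES ('8047-23', 'cited_only', 'stub')")

# Ingesting the external precedent twice updates the stub in place.
conn.execute(UPSERT_EXTERNAL, ("8047-23", "first upload"))
conn.execute(UPSERT_EXTERNAL, ("8047-23", "second upload"))

rows = conn.execute(
    "SELECT source_kind, case_name FROM case_law ORDER BY id").fetchall()
print(rows)
```

In this sketch the cited-only stub is promoted to `external_upload` and overwritten by the second upload, while the internal row with the same number is left alone, which mirrors the scoping argument made in the Task 3 code comments.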