
Chaim adc196ac20 docs(plan): FU-8a process→code guards implementation plan (3 tasks, TDD)

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>

2026-05-31 10:51:31 +00:00


FU-8a: Process→Code Guards — Implementation Plan

For agentic workers: REQUIRED SUB-SKILL: Use superpowers:subagent-driven-development (recommended) or superpowers:executing-plans to implement this plan task-by-task. Steps use checkbox (- [ ]) syntax for tracking.

Goal: Make two process barriers enforceable in code: sync_agents_across_companies.py --verify exits non-zero on any drift (including adapter_type mismatches, which are reported loudly rather than skipped silently), and a fitness-function test fails the suite if the repo gains raw Paperclip HTTP calls or direct agent_wakeup_requests inserts.

Architecture: For GAP-21, extract the drift loop into a pure build_drift_report(...) and a pure _verify_exit_code(...), then make --verify exit 1 on drift. For GAP-22, add a self-contained pytest fitness function that scans web/, mcp-server/src/, and scripts/ for forbidden Paperclip-access patterns, with an explicit allowlist. Both changes are pure code, and a pre-scan of the repo found 0 existing violations.

Tech Stack: Python 3.12, asyncpg (sync script), pytest offline, .venv at mcp-server/.venv.

Spec: docs/superpowers/specs/2026-05-31-fu8a-process-to-code-guards-design.md

Run tests: cd ~/legal-ai/mcp-server && .venv/bin/python -m pytest tests/test_sync_verify_gate.py tests/test_paperclip_access_guard.py -v

File Structure

Modify scripts/sync_agents_across_companies.py — extract build_drift_report(...) + _verify_exit_code(...) (pure); --verify exits non-zero on drift; adapter_type mismatch + missing-in-mirror counted as drift.
Create mcp-server/tests/test_sync_verify_gate.py — offline tests for the two pure functions (imports the script via importlib, like the FU-2b test).
Create mcp-server/tests/test_paperclip_access_guard.py — the fitness-function guard (scan + fixtures + real-repo assertion).
Modify scripts/SCRIPTS.md — note the new --verify gate semantics.

Task 1: GAP-21 — `--verify` becomes an enforceable drift gate

Files: Modify scripts/sync_agents_across_companies.py; Create mcp-server/tests/test_sync_verify_gate.py

Step 1: Write the failing tests

"""FU-8a / GAP-21: sync --verify drift-gate logic (offline)."""
from __future__ import annotations

import importlib.util
from pathlib import Path

_SCRIPT = Path(__file__).resolve().parents[2] / "scripts" / "sync_agents_across_companies.py"
_spec = importlib.util.spec_from_file_location("sync_agents", _SCRIPT)
sync = importlib.util.module_from_spec(_spec)
_spec.loader.exec_module(sync)


def _agent(name, adapter="claude_code", cfg=None):
    return {"id": f"id-{name}", "name": name, "adapter_type": adapter,
            "adapter_config": cfg or {"model": "x"}, "runtime_config": {}, "metadata": {},
            "budget_monthly_cents": 0, "icon": "", "title": "", "role": "", "agent_api_keys": []}


def test_verify_exit_code_clean_is_zero():
    assert sync._verify_exit_code(plan=[], mismatches=[], missing=[]) == 0


def test_verify_exit_code_drift_is_nonzero():
    assert sync._verify_exit_code(plan=[("m", "mi", {"x": 1})], mismatches=[], missing=[]) == 1


def test_verify_exit_code_adapter_mismatch_is_nonzero():
    # adapter_type mismatch must count as drift (not silent skip)
    assert sync._verify_exit_code(plan=[], mismatches=["עוזר משפטי"], missing=[]) == 1


def test_verify_exit_code_missing_is_nonzero():
    assert sync._verify_exit_code(plan=[], mismatches=[], missing=["סוכן"]) == 1


def test_build_drift_report_flags_adapter_mismatch():
    master = [_agent("A", adapter="claude_code")]
    mirror_by_name = {"A": _agent("A", adapter="deepseek_local")}
    rep = sync.build_drift_report(master, mirror_by_name, mirror_skills=set(), only=None)
    assert "A" in rep["mismatches"]
    assert rep["plan"] == []  # mismatch short-circuits the diff


def test_build_drift_report_flags_missing_and_plan():
    master = [_agent("A"), _agent("B")]
    # A missing in mirror; B present but differing config
    mirror_by_name = {"B": _agent("B", cfg={"model": "different"})}
    rep = sync.build_drift_report(master, mirror_by_name, mirror_skills=set(), only=None)
    assert "A" in rep["missing"]
    assert any(p[0]["name"] == "B" for p in rep["plan"])

Step 2: Run to verify it fails

Run: cd ~/legal-ai/mcp-server && .venv/bin/python -m pytest tests/test_sync_verify_gate.py -v

Expected: FAIL with AttributeError: module 'sync_agents' has no attribute '_verify_exit_code' (and likewise for build_drift_report).

Note: the script imports asyncpg at module top, so confirm that it imports cleanly under importlib; it does not connect at import time.
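
The import-time concern above can be checked in isolation. The following is a minimal sketch, not the project's test: load_script_as_module and the demo script are hypothetical names introduced here, and the demo source only stands in for a script whose real work sits behind an `if __name__ == "__main__":` guard.

```python
import importlib.util
import tempfile
from pathlib import Path


def load_script_as_module(path: Path, name: str):
    """Load a standalone .py script as a module without touching sys.path."""
    spec = importlib.util.spec_from_file_location(name, path)
    if spec is None or spec.loader is None:
        raise ImportError(f"cannot build an import spec for {path}")
    module = importlib.util.module_from_spec(spec)
    spec.loader.exec_module(module)  # executes top-level code only
    return module


# Hypothetical stand-in for the real script: the side effect lives in main(),
# which is guarded, so loading the file must not trigger it.
_DEMO_SOURCE = '''
CONNECTED = False

def main():
    global CONNECTED
    CONNECTED = True

if __name__ == "__main__":
    main()
'''

with tempfile.TemporaryDirectory() as tmp:
    demo_path = Path(tmp) / "demo_script.py"
    demo_path.write_text(_DEMO_SOURCE, encoding="utf-8")
    demo = load_script_as_module(demo_path, "demo_script")

print(demo.CONNECTED)  # False: main() did not run at import time
```

Under exec_module the module's `__name__` is the name passed to the spec, not "__main__", which is why the guarded main() stays dormant. If the real script did any work at module top level, this style of loading would execute it.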

Step 3: Add the two pure functions

In scripts/sync_agents_across_companies.py, add ABOVE async def main():

def build_drift_report(master_agents, mirror_by_name, mirror_skills, only=None) -> dict:
    """Pure drift computation (no DB, no printing). Returns:
    {"plan": [(master, mirror, diff), ...], "mismatches": [name, ...], "missing": [name, ...]}.
    adapter_type mismatch and missing-in-mirror are recorded as drift, not skipped silently.
    """
    plan, mismatches, missing = [], [], []
    for m in master_agents:
        if only and m["name"] != only:
            continue
        mirror = mirror_by_name.get(m["name"])
        if not mirror:
            missing.append(m["name"])
            continue
        if m["adapter_type"] != mirror["adapter_type"]:
            mismatches.append(m["name"])
            continue
        diff = compute_diff(m, mirror, mirror_skills)
        if diff:
            plan.append((m, mirror, diff))
    return {"plan": plan, "mismatches": mismatches, "missing": missing}


def _verify_exit_code(plan, mismatches, missing) -> int:
    """0 iff fully in sync; 1 if any drift (needs-sync / adapter mismatch / missing-in-mirror)."""
    return 1 if (plan or mismatches or missing) else 0

Step 4: Rewire main()'s drift loop + --verify to use them

In main(), REPLACE the inline drift loop (the plan = [] block through the for m in master_agents: loop that builds plan) with:

    print("=== Drift report ===")
    report = build_drift_report(master_agents, mirror_by_name, mirror_skills, only=args.only)
    plan = report["plan"]
    for name in report["missing"]:
        print(f"  ⚠ {name:14s} — NOT FOUND in mirror (we never auto-create) — DRIFT")
    for name in report["mismatches"]:
        m = next(a for a in master_agents if a["name"] == name)
        mi = mirror_by_name[name]
        print(f"  ❌ {name:14s} — adapter_type mismatch ({m['adapter_type']} vs {mi['adapter_type']}) "
              f"— DRIFT (apply skips it; fix manually in both companies)")
    for master, mirror, diff in plan:
        print_diff(master["name"], diff, master["id"], mirror["id"])

And REPLACE the if args.verify: block with:

    if args.verify:
        # Exit status is the gate: 0 only when nothing drifted.
        code = _verify_exit_code(plan, report["mismatches"], report["missing"])
        print(f"\nSummary: {len(plan)} need sync, {len(report['mismatches'])} adapter-mismatch, "
              f"{len(report['missing'])} missing-in-mirror → {'DRIFT' if code else 'IN SYNC'}")
        sys.exit(code)

(The --apply path still uses plan and still does NOT touch adapter_type-mismatch agents — only --verify's exit code changes + the loud reporting.)
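
Because the --verify branch ends in sys.exit, one way to exercise it offline is to catch SystemExit and read its code. The sketch below assumes a hypothetical finish_verify helper that mirrors the branch above; the real code lives inline in main().

```python
import sys


def _verify_exit_code(plan, mismatches, missing) -> int:
    return 1 if (plan or mismatches or missing) else 0


def finish_verify(report: dict) -> None:
    """Hypothetical helper mirroring the --verify branch: summarize, then exit."""
    code = _verify_exit_code(report["plan"], report["mismatches"], report["missing"])
    print(f"Summary: {len(report['plan'])} need sync, "
          f"{len(report['mismatches'])} adapter-mismatch, "
          f"{len(report['missing'])} missing-in-mirror")
    sys.exit(code)


def exit_code_of(report: dict) -> int:
    """Run finish_verify and return the status it tried to exit with."""
    try:
        finish_verify(report)
    except SystemExit as exc:  # sys.exit raises SystemExit; exc.code is the status
        return exc.code
    raise AssertionError("finish_verify returned without exiting")


in_sync = {"plan": [], "mismatches": [], "missing": []}
drifted = {"plan": [], "mismatches": ["A"], "missing": []}
print(exit_code_of(in_sync), exit_code_of(drifted))
```

In a pytest suite the same check is usually written with `pytest.raises(SystemExit)` and an assertion on the captured exception's code.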

Step 5: Run tests + import check

Run: cd ~/legal-ai/mcp-server && .venv/bin/python -m pytest tests/test_sync_verify_gate.py -v → all PASS.

Run (from the repo root, ~/legal-ai, since the script path is relative to it and the venv lives under mcp-server/): mcp-server/.venv/bin/python -c "import importlib.util,pathlib; p=pathlib.Path('scripts/sync_agents_across_companies.py'); s=importlib.util.spec_from_file_location('s',p); m=importlib.util.module_from_spec(s); s.loader.exec_module(m); print('imports')" → prints imports.

Step 6: Commit

cd ~/legal-ai
git add scripts/sync_agents_across_companies.py mcp-server/tests/test_sync_verify_gate.py
git commit -m "feat(sync): --verify exits non-zero on drift; adapter mismatch = loud drift (GAP-21, FU-8a)"

Task 2: GAP-22 — Paperclip-access fitness function

Files: Create mcp-server/tests/test_paperclip_access_guard.py

Step 1: Write the guard + its tests

"""FU-8a / GAP-22: fitness function — forbid un-sanctioned Paperclip access.

Fails if any scanned source (outside the allowlist) reaches the Paperclip API
with a raw HTTP client or inserts directly into agent_wakeup_requests. The
sanctioned paths are web/paperclip_api.py::pc_request (Python) and scripts/pc.sh
(bash); wakeup must go through POST /api/agents/{id}/wakeup.
"""
from __future__ import annotations

import re
from pathlib import Path

import pytest

REPO = Path(__file__).resolve().parents[2]
SCAN_ROOTS = [REPO / "web", REPO / "mcp-server" / "src", REPO / "scripts"]

# Files exempt from the HTTP-to-Paperclip rule (the sanctioned helpers + legacy DB-read client).
ALLOWLIST = {
    REPO / "web" / "paperclip_api.py",       # the sanctioned pc_request helper
    REPO / "scripts" / "pc.sh",              # the sanctioned bash wrapper
    REPO / "web" / "paperclip_client.py",    # legacy: DB reads only (no raw http, no wakeup insert)
}

_PC_URL = re.compile(r"PAPERCLIP_API_URL|127\.0\.0\.1:3100|localhost:3100|pc\.nautilus\.marcusgroup\.org")
_HTTP_CLIENT = re.compile(r"\bhttpx\b|\brequests\.(get|post|put|patch|delete)\b|\baiohttp\b|\bcurl\b")
_WAKEUP_INSERT = re.compile(r"insert\s+into\s+agent_wakeup_requests", re.IGNORECASE)


def _scan_text(text: str) -> list[str]:
    """Return violation reasons for a single file's text."""
    reasons = []
    if _WAKEUP_INSERT.search(text):
        reasons.append("direct INSERT INTO agent_wakeup_requests — use the wakeup API")
    # raw HTTP to Paperclip: both a paperclip-URL token and an http-client token present
    if _PC_URL.search(text) and _HTTP_CLIENT.search(text):
        reasons.append("raw HTTP client + Paperclip URL — use web/paperclip_api.pc_request or scripts/pc.sh")
    return reasons


def _iter_source_files():
    for root in SCAN_ROOTS:
        if not root.exists():
            continue
        for ext in ("*.py", "*.sh"):
            for f in root.rglob(ext):
                if f in ALLOWLIST or "/.venv/" in str(f) or "/tests/" in str(f):
                    continue
                yield f


def find_violations() -> list[tuple[str, str]]:
    out = []
    for f in _iter_source_files():
        try:
            text = f.read_text(encoding="utf-8")
        except (UnicodeDecodeError, OSError):
            continue
        for reason in _scan_text(text):
            out.append((str(f.relative_to(REPO)), reason))
    return out


# ── the guard catches positives, ignores sanctioned negatives ──────────
def test_scan_flags_raw_http_to_paperclip():
    bad = 'import httpx\nasync def f():\n    await httpx.post(f"{PAPERCLIP_API_URL}/x")\n'
    assert _scan_text(bad)


def test_scan_flags_wakeup_insert():
    bad = "await conn.execute('INSERT INTO agent_wakeup_requests (id) VALUES ($1)', x)"
    assert _scan_text(bad)


def test_scan_flags_sanctioned_helper_shape_by_content():
    # Content alone cannot tell the sanctioned helper from a violation: this
    # shape IS flagged by _scan_text. web/paperclip_api.py stays clean only
    # because _iter_source_files skips ALLOWLIST paths before any scan.
    ok = 'url = f"{PAPERCLIP_API_URL}{path}"\n# the only place httpx is allowed for paperclip\n'
    assert _scan_text(ok)


def test_scan_ignores_plain_code():
    assert _scan_text("def add(a, b):\n    return a + b\n") == []


# ── the real repo must be clean (pre-scanned 2026-05-31: 0 violations) ──
def test_repo_has_no_paperclip_access_violations():
    violations = find_violations()
    assert violations == [], "Un-sanctioned Paperclip access found:\n" + "\n".join(
        f"  {f}: {r}" for f, r in violations)

Step 2: Run the guard tests

Run: cd ~/legal-ai/mcp-server && .venv/bin/python -m pytest tests/test_paperclip_access_guard.py -v

Expected: ALL PASS, including test_repo_has_no_paperclip_access_violations (the repo is clean).

If test_repo_has_no_paperclip_access_violations FAILS, it found a real violation: either fix the offending code to use the sanctioned helper, or (if it is a genuinely sanctioned location) add it to ALLOWLIST with a comment justifying it. Report any such case.
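
The allowlist decision above is made by path, not by content, which a throwaway directory can illustrate. This is a reduced sketch, not the guard itself: flagged() and violations() are hypothetical names, the patterns are a subset of the guard's, and rogue.py is an invented file.

```python
import re
import tempfile
from pathlib import Path

# Reduced versions of the guard's patterns (subset, for illustration only).
_PC_URL = re.compile(r"PAPERCLIP_API_URL|127\.0\.0\.1:3100|localhost:3100")
_HTTP_CLIENT = re.compile(r"\bhttpx\b|\brequests\.(get|post|put|patch|delete)\b")


def flagged(text: str) -> bool:
    """True when a Paperclip URL token and an HTTP client token co-occur."""
    return bool(_PC_URL.search(text) and _HTTP_CLIENT.search(text))


def violations(root: Path, allowlist: set[Path]) -> list[str]:
    """Scan *.py under root, skipping allowlisted paths before reading them."""
    found = []
    for f in sorted(root.rglob("*.py")):
        if f in allowlist:
            continue
        if flagged(f.read_text(encoding="utf-8")):
            found.append(f.relative_to(root).as_posix())
    return found


# Identical content in both files: only the path decides which one is exempt.
SNIPPET = 'import httpx\nURL = f"{PAPERCLIP_API_URL}/api/agents"\n'

with tempfile.TemporaryDirectory() as tmp:
    root = Path(tmp)
    (root / "web").mkdir()
    helper = root / "web" / "paperclip_api.py"
    rogue = root / "web" / "rogue.py"  # hypothetical offending file
    helper.write_text(SNIPPET, encoding="utf-8")
    rogue.write_text(SNIPPET, encoding="utf-8")

    without_allowlist = violations(root, allowlist=set())
    with_allowlist = violations(root, allowlist={helper})

print(without_allowlist, with_allowlist)
```

Since the check is file-level co-occurrence, a file that merely mentions an HTTP client in a comment next to a Paperclip URL would also be flagged. That is the case where an allowlist entry with a justifying comment is the likely remedy.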

Step 3: Commit

cd ~/legal-ai
git add mcp-server/tests/test_paperclip_access_guard.py
git commit -m "feat(guard): fitness function blocking raw Paperclip access (GAP-22, FU-8a)"

Task 3: SCRIPTS.md + full suite + smoke + PR

Step 1: Note the --verify gate semantics in SCRIPTS.md

In scripts/SCRIPTS.md, in the sync_agents_across_companies.py row, append to its Purpose cell: "--verify exits non-zero on drift (including adapter_type mismatch, which is reported loudly and counted as drift); usable as a gate for cron/CI (GAP-21/FU-8a)."

Step 2: Full offline suite

Run: cd ~/legal-ai/mcp-server && .venv/bin/python -m pytest tests/ -q

Expected: all pass (prior suite + the new GAP-21/GAP-22 tests). Report the summary line.

Step 3: Smoke — run --verify against the live Paperclip DB (read-only)

cd ~/legal-ai && set -a && source ~/.env 2>/dev/null && set +a
PAPERCLIP_BOARD_API_KEY="${PAPERCLIP_BOARD_API_KEY:-}" \
  /home/chaim/legal-ai/mcp-server/.venv/bin/python scripts/sync_agents_across_companies.py --verify; echo "exit=$?"

Report the output + exit code. Expected: prints a drift report; exit=0 if agents are in sync, exit=1 if drift exists (either is a valid result — it proves the gate works). The script only READS in --verify (no mutation). (If the script needs PAPERCLIP_DB_URL/board key and they're absent, report that the smoke needs the Paperclip env; the offline unit tests already validate the gate logic.)
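
When reading the smoke result, note that an uncaught Python exception also exits with status 1, so the exit code alone may not separate a crash (for example, missing env) from real drift. A hedged sketch of one way to tell them apart is to also look for the "Summary:" line that the --verify branch prints. classify_verify_run is a hypothetical helper, and the two subprocess calls are stand-ins for the real script.

```python
import subprocess
import sys


def classify_verify_run(returncode: int, stdout: str) -> str:
    """Classify a --verify run from its exit status and captured stdout."""
    has_summary = any(line.startswith("Summary:") for line in stdout.splitlines())
    if not has_summary:
        return "error"  # never reached the gate (missing env, crash, ...)
    return "in-sync" if returncode == 0 else "drift"


def run(code: str) -> subprocess.CompletedProcess:
    """Run a Python one-liner as a stand-in for the real script."""
    return subprocess.run([sys.executable, "-c", code], capture_output=True, text=True)


# Stand-in for a run that reached the gate and found drift.
drift = run("print('Summary: 1 need sync'); raise SystemExit(1)")
# Stand-in for a run that crashed before the gate; Python exits with status 1.
crash = run("raise RuntimeError('Paperclip env missing')")

print(classify_verify_run(drift.returncode, drift.stdout))
print(classify_verify_run(crash.returncode, crash.stdout))
```

Both stand-ins exit with status 1, yet only the first one is drift. A cron or CI wrapper that alerts on drift may want this distinction so that an environment problem is not reported as agent drift.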

Step 4: Commit + PR

cd ~/legal-ai
git add scripts/SCRIPTS.md
git commit -m "docs(scripts): note sync --verify drift-gate semantics (FU-8a)"
git push -u origin fix/fu8a-process-to-code-guards

Create the PR via the Gitea REST API (token from ~/.git-credentials) and merge per the standing PR+merge rule.

Step 5: TaskMaster #66 → done (controller; verify via MCP). GAP-23 remains in #69.

Self-Review Notes

GAP-21 → Task 1: build_drift_report + _verify_exit_code (pure, tested); --verify exits 1 on drift; adapter mismatch loud + counted. --apply behavior unchanged.
GAP-22 → Task 2: fitness function; tested on positive fixtures + sanctioned negatives + the real repo (clean). Allowlist explicit (paperclip_api.py, pc.sh, legacy paperclip_client.py).
Repo pre-scanned clean — Task 2 Step 2's repo assertion passes today; if it ever fails, that's the guard doing its job.
No production-data risk — pure-code; smoke --verify is read-only.
Type consistency: build_drift_report(...)->{plan,mismatches,missing}, _verify_exit_code(plan,mismatches,missing)->int, find_violations()->[(file,reason)], _scan_text(text)->[reason] — names match across tasks + tests.
GAP-23 out of scope (#69 / FU-8b).
