After the cutover to s3-only, an audit found 15 blob-write sites that bypass storage.py (uploads/finalize/exports/training/research-backup/precedents/bulletins/draft): a file lands in the old folders but **not** in MinIO, so it is lost on cleanup, not served, and not backed up. The pipeline (ingest/extract) still reads by file_path from disk, so eliminating disk writes entirely requires full read-wiring (Phase 2, a separate task). The safe fix for now is a **dual-write seal**.

- storage.py: `mirror`/`mirror_file` (+ sync): best-effort persist to S3 when the backend is s3/dual (no-op on filesystem; an S3 failure is logged and does not break the request, per the DualBackend philosophy).
- web/app.py: helpers `_seal_blob`/`_seal_blob_file` + 14 sealed sites (storage.mirror after the disk write; the disk copy stays for the pipeline). block_writer.py: draft sealed (async).
- **CI leak-guard** (test_storage_write_leak_guard): fails on any blob write to disk (write_bytes/write_text/shutil.copy*/open(wb)) in web/ + services that lacks a `# noqa: STG1` marker. All benign writes (fallbacks/tmp/staging/git-metadata/flag/state) are marked with a justification. storage.py is excluded (it is the implementation).
- **tripwire** (scripts/storage_leak_tripwire.py): runtime monitoring for blobs on disk that are not in MinIO (json-key match, bucket per-file). Verified live: 0 leaks.

invariants: INV-STG1 (all I/O goes through storage or is mirrored to it) · INV-STG6 · feedback_silent_swallow (mirror logs a warning, no bare except).

Phase 2 (read-wire the pipeline, then drop the disk copy) = follow-up.

tests: 4 mirror + 1 leak-guard + 6 serve_blob + 18 existing storage tests pass.

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
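The dual-write seal described in the commit message can be sketched roughly as follows. This is a minimal illustration, not the code in storage.py or web/app.py: the function signatures, the `backend` string argument, and the injected `put` callable are all assumptions made so the example is self-contained. It shows the two properties the message names: the disk copy is always written (the pipeline still reads it), and the object-store copy is best-effort, with failures logged rather than raised.

```python
import logging
from pathlib import Path
from typing import Callable

log = logging.getLogger("storage_sketch")

# Hypothetical stand-in for the object-store client: put(key, data).
PutFn = Callable[[str, bytes], None]


def mirror(key: str, data: bytes, backend: str, put: PutFn) -> bool:
    """Best-effort copy to the object store; returns True if it was persisted."""
    if backend not in ("s3", "dual"):
        return False  # no-op on the filesystem backend
    try:
        put(key, data)
        return True
    except Exception as exc:
        # Logged as a warning instead of being swallowed silently or raised.
        log.warning("mirror failed for %s: %s", key, exc)
        return False


def seal_blob(path: Path, data: bytes, key: str, backend: str, put: PutFn) -> bool:
    """Write the disk copy first, then mirror it to the object store."""
    path.parent.mkdir(parents=True, exist_ok=True)
    path.write_bytes(data)  # noqa: STG1 (sealed by the mirror call below)
    return mirror(key, data, backend, put)
```

Passing the uploader in as a parameter keeps the sketch independent of any particular S3 client; in a real codebase the backend choice would more likely come from configuration.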
63 lines
2.4 KiB
Python
"""INV-STG1 leak-guard: no blob may be written to disk without going through, or
being mirrored to, the storage layer (services/storage.py).

After the cutover to ``STORAGE_BACKEND=s3`` a direct disk write under DATA_DIR
that bypasses storage creates an orphan: a file in the old folders that never
reaches MinIO (lost on cleanup, not served, not backed up). This static guard
fails CI on any NEW direct blob-write (``write_bytes``/``write_text``/
``shutil.copy*``/``shutil.move``/``open(...,'wb')``) in the web API or services
that is not explicitly acknowledged with a ``# noqa: STG1`` marker.

Marking a line means the author has CONSCIOUSLY handled it: either sealed it
(``_seal_blob`` / ``storage.mirror`` right after, for paths the disk-based
pipeline still reads) or justified it as benign (temp file, staging-then-unlink,
git-per-case metadata, log/flag, BytesIO buffer, storage fallback). New
unmarked writes block the build until the author does the same.
"""

from __future__ import annotations

import re
from pathlib import Path

import pytest

_ROOT = Path(__file__).resolve().parents[2]
# storage.py is the storage layer itself; its disk writes ARE the implementation.
_EXCLUDE = {"storage.py"}
_SCAN = [
    _ROOT / "web" / "app.py",
    *(p for p in sorted((_ROOT / "mcp-server" / "src" / "legal_mcp" / "services").glob("*.py"))
      if p.name not in _EXCLUDE),
]

# Direct-disk-write patterns that could land a blob in the old folders.
# The shutil.copy branch covers copy, copy2, copyfile and copytree, matching the
# ``shutil.copy*`` promise in the docstring (the bare ``copy2?`` form missed the
# last two). The open() branch only sees a positional write/append/create mode.
_PATTERNS = re.compile(
    r"\.write_bytes\(|\.write_text\(|shutil\.copy(?:2|file|tree)?\(|shutil\.move\("
    r"|open\([^)]*,\s*['\"][wax]b?['\"]"
)
_MARKER = "noqa: STG1"


def _violations() -> list[str]:
    """Return ``path:line: snippet`` for every unmarked direct disk write."""
    out: list[str] = []
    for f in _SCAN:
        if not f.exists():
            continue
        for i, line in enumerate(f.read_text(encoding="utf-8").splitlines(), 1):
            s = line.strip()
            # Skip whole-line comments so commented-out writes do not trip the guard.
            if s.startswith("#"):
                continue
            if _PATTERNS.search(line) and _MARKER not in line:
                out.append(f"{f.relative_to(_ROOT)}:{i}: {s[:100]}")
    return out


def test_no_unmarked_blob_disk_writes():
    violations = _violations()
    assert not violations, (
        "INV-STG1: direct blob-disk-write(s) without a `# noqa: STG1` marker: "
        "seal each via `_seal_blob`/`storage.mirror` (if the pipeline reads the "
        "disk path) or justify it as benign on the line:\n  "
        + "\n  ".join(violations)
    )
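To make the guard's line-level behavior concrete, here is a standalone sketch that applies the same kind of check to a handful of in-memory sample lines instead of real source files. The pattern is a simplified copy modeled on the guard's (the `shutil.copy` branch is left out), and `find_violations` and `SAMPLE` are names invented for this illustration. It also shows two limits of a line-based regex: a keyword-style `mode="wb"` is not caught, and neither would be a call split across several lines.

```python
import re

# Simplified pattern modeled on the guard's; not the exact production regex.
PATTERN = re.compile(
    r"\.write_bytes\(|\.write_text\(|shutil\.move\(|open\([^)]*,\s*['\"][wax]b?['\"]"
)
MARKER = "noqa: STG1"


def find_violations(lines: list[str]) -> list[int]:
    """Return 1-based numbers of lines with an unmarked direct disk write."""
    hits: list[int] = []
    for i, line in enumerate(lines, 1):
        if line.strip().startswith("#"):
            continue  # whole-line comments are ignored
        if PATTERN.search(line) and MARKER not in line:
            hits.append(i)
    return hits


SAMPLE = [
    "dest.write_bytes(data)",                             # 1: flagged
    "dest.write_bytes(data)  # noqa: STG1 sealed below",  # 2: acknowledged
    '# dest.write_text("x")',                             # 3: comment, skipped
    'with open(p, "wb") as fh:',                          # 4: flagged
    'with open(p, "rb") as fh:',                          # 5: read mode, ignored
    'with open(p, mode="wb") as fh:',                     # 6: keyword mode, missed
    "text = src.read_text()",                             # 7: read, ignored
]

print(find_violations(SAMPLE))  # [1, 4]
```

Because the check is purely textual, the marker records that an author looked at the line; it does not verify that a `storage.mirror` call actually follows.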