legal-ai/web/tests/test_serve_blob.py
Chaim 63784f1f91
All checks were successful
G12 Leak-Guard / leak-guard (pull_request) Successful in 6s
feat(storage): #106.5 — read-wiring via serve_blob (presigned + dual disk-fallback)
Wires the reads of 4 file-serving endpoints (api_read_local_file · research/analysis
download · analysis export-docx · exports download) through a single serve_blob helper
(INV-STG6). Implements the cutover strategy that the tri-model panel (Opus+DeepSeek+Gemini)
approved unanimously on 2026-06-11:
- filesystem → FileResponse from disk (behaviour-preserving; this is the backend active in production, so zero change).
- s3/dual → 302 to a presigned URL when the object is in MinIO (bytes flow browser↔MinIO, not through FastAPI).
- dual + miss → **disk fallback**: transparently covers files outside the DB-tagged migration set
  (analysis-and-research.md, dynamic DOCX, proofread). This is the safety net the panel required.
- s3 + miss + no disk → 404.
A normalize_key/presign failure → disk fallback, never a 500 (and it does not fail silently: logger.exception).
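
The routing rules above can be sketched as a small pure function. This is only an illustration and not the actual serve_blob in web/app.py: the name route_blob, its parameters, and the string return values are hypothetical. The outcomes for "filesystem with no file on disk" and "lookup failure with no file on disk" are assumed to be 404, since the commit message does not state them.

```python
from __future__ import annotations

import logging
from typing import Callable

logger = logging.getLogger("serve_blob_sketch")


def route_blob(backend: str, in_object_store: Callable[[], bool], on_disk: bool) -> str:
    """Return 'disk', 'redirect' or '404' for one file request (hypothetical helper)."""
    if backend == "filesystem":
        # Behaviour-preserving path: the object store is never consulted.
        # Assumption: a file missing from disk yields 404.
        return "disk" if on_disk else "404"

    # s3 / dual: ask the object store first, but never let a failure surface as a 500.
    try:
        found = in_object_store()
    except Exception:
        logger.exception("object-store lookup failed; falling back to disk")
        found = False

    if found:
        return "redirect"  # 302 to a presigned URL
    if on_disk:
        return "disk"      # safety net for files outside the migration set
    return "404"
```

Passing the lookup as a callable keeps the sketch free of any real storage client, so the four cases in the test file (filesystem, dual hit, dual miss, s3 miss) map directly onto four calls.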

The cutover (#106.6 flip to s3) and WORM (#106.7) **remain locked behind human approval**,
per the panel's unanimous decision (proceed_autonomously=false). This PR is reversible: under
filesystem there is no behaviour change, and the helper is ready to be enabled once a
supervised flip is decided, with a green curl check per endpoint.

invariants: INV-STG6 (presigned) · INV-STG1 (single storage layer) · G2 (a single serve_blob,
not 4 copies of the logic) · INV-G10 (zero behaviour change in production filesystem).
tests: 4 new (web/tests/test_serve_blob.py: filesystem/dual-S3/dual-fallback/s3-404), passing. py_compile OK.
source: tri-model panel (documented in TaskMaster #106.6).

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
2026-06-11 17:43:31 +00:00

81 lines
2.9 KiB
Python

"""Tests for #106.5 — web.app.serve_blob backend routing (INV-STG6).
Verifies the tri-model-panel cutover-safety design (2026-06-11):
- filesystem → FileResponse from disk (behaviour-preserving, current prod).
- s3/dual → 302 redirect to a presigned URL when the object is in MinIO.
- dual + miss → disk fallback (covers files outside the DB-tracked migration set,
  e.g. dynamically-built analysis DOCX / research markdown).
- s3 + miss + no disk → 404.
Importing web/app.py needs a few env vars (it wires the Paperclip pool at import);
they're set before import. Skips cleanly if the heavy import can't be satisfied.
"""
from __future__ import annotations

import asyncio
import os
import sys
from pathlib import Path

import pytest

# web/app.py reads these at import time, so they must be set before importing it.
os.environ.setdefault("PAPERCLIP_DB_URL", "postgres://x:x@127.0.0.1:54329/paperclip")
os.environ.setdefault("DATA_DIR", "/home/chaim/legal-ai/data")
sys.path.insert(0, str(Path(__file__).resolve().parents[1]))  # web/
sys.path.insert(0, str(Path(__file__).resolve().parents[2] / "mcp-server" / "src"))  # legal_mcp

app = pytest.importorskip("app", reason="web/app.py import prerequisites unavailable")

from fastapi.responses import FileResponse, RedirectResponse  # noqa: E402
from legal_mcp.services import storage  # noqa: E402

DATA_DIR = Path(os.environ["DATA_DIR"])


class _FakeBackend:
    """Stand-in storage backend: reports a fixed name and a fixed exists() result."""

    def __init__(self, name: str, has: bool) -> None:
        self.name, self._has = name, has

    async def exists(self, key, *, bucket) -> bool:  # noqa: ANN001
        return self._has

    async def presign_get(self, key, *, bucket, download_name=None) -> str:  # noqa: ANN001
        return f"https://s3.example/{key}"


@pytest.fixture()
def blob(tmp_path_factory, monkeypatch):
    # a real file UNDER DATA_DIR so storage.normalize_key accepts it
    d = DATA_DIR / "audit"
    d.mkdir(parents=True, exist_ok=True)
    f = d / "_serveblob_pytest.txt"
    f.write_text("hi")
    yield f
    f.unlink(missing_ok=True)


def _serve(monkeypatch, name, has, path):
    # Swap in the fake backend, then run the async helper to completion.
    monkeypatch.setattr(storage, "get_storage", lambda: _FakeBackend(name, has))
    return asyncio.new_event_loop().run_until_complete(
        app.serve_blob(str(path), media_type="text/plain", filename="x.txt"))


def test_filesystem_serves_from_disk(blob, monkeypatch):
    assert isinstance(_serve(monkeypatch, "filesystem", False, blob), FileResponse)


def test_dual_in_s3_redirects_presigned(blob, monkeypatch):
    assert isinstance(_serve(monkeypatch, "dual", True, blob), RedirectResponse)


def test_dual_missing_falls_back_to_disk(blob, monkeypatch):
    # the panel's safety net: a file not yet in MinIO is still served from disk
    assert isinstance(_serve(monkeypatch, "dual", False, blob), FileResponse)


def test_s3_missing_no_disk_404(monkeypatch):
    from fastapi import HTTPException

    with pytest.raises(HTTPException) as ei:
        _serve(monkeypatch, "s3", False, DATA_DIR / "audit" / "_nope.txt")
    # checked after the with-block, once the exception has been captured
    assert ei.value.status_code == 404