feat(storage): X14 Phase 1 — unified storage layer (services/storage.py)

The single choke-point for all binary file I/O (originals, derived
artifacts, exports), replacing the scattered open()/shutil/Path.write_bytes
calls across ~8 services. Backend chosen by STORAGE_BACKEND:
- filesystem (default): disk under DATA_DIR — byte-for-byte legacy behaviour
- dual: write disk + S3, read S3→disk fallback (migration window)
- s3: MinIO via aioboto3 (lazy import; absent in the filesystem path)
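
A minimal sketch of the dual mode described above, not the code in
services/storage.py. MemoryBackend and DualBackend are hypothetical names,
and the in-memory store stands in for both disk and S3. It assumes a write
goes to both backends and a read tries S3 first, falling back to disk when
S3 is unreachable or the key is missing.

```python
import asyncio


class MemoryBackend:
    """Hypothetical in-memory stand-in for a disk or S3 backend."""

    def __init__(self):
        self.blobs = {}
        self.fail = False  # flip to True to simulate an outage

    async def write(self, key, data):
        if self.fail:
            raise ConnectionError("backend unavailable")
        self.blobs[key] = data

    async def read(self, key):
        if self.fail:
            raise ConnectionError("backend unavailable")
        return self.blobs[key]


class DualBackend:
    """Write to disk and S3; read from S3, then fall back to disk."""

    def __init__(self, disk, s3):
        self.disk = disk
        self.s3 = s3

    async def write(self, key, data):
        # Disk first, so the legacy tree stays authoritative (assumption).
        await self.disk.write(key, data)
        await self.s3.write(key, data)

    async def read(self, key):
        try:
            return await self.s3.read(key)
        except (ConnectionError, KeyError):
            # S3 down or object not migrated yet: serve from disk.
            return await self.disk.read(key)
```

Whether a failed S3 write should fail the whole write or only be logged is a
policy choice this sketch does not settle; here the error propagates.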

Keys are DATA_DIR-relative POSIX paths; the FS backend ignores the logical
bucket and keeps the existing single tree, so the default backend changes
no behaviour. The S3 backend maps each governance bucket
(documents/immutable/derived) to a MinIO bucket; presigned URLs are minted
against the public endpoint
(browser-reachable) and carry the Hebrew filename via RFC-5987
Content-Disposition.
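
For illustration, an RFC-5987 Content-Disposition value for a Hebrew filename
could be built roughly as follows. content_disposition is a hypothetical
helper, not necessarily what services/storage.py does; the ASCII fallback
parameter is an added assumption for older clients.

```python
from urllib.parse import quote


def content_disposition(filename: str) -> str:
    """Build an attachment header value with an RFC-5987 filename* parameter."""
    # Plain-ASCII fallback: non-ASCII characters become "?", and characters
    # that would break the quoted string are replaced.
    fallback = filename.encode("ascii", "replace").decode("ascii")
    fallback = fallback.replace('"', "_").replace("\\", "_")
    # Percent-encode the UTF-8 bytes; safe="" also encodes "/" in the name.
    encoded = quote(filename, safe="")
    return (
        f'attachment; filename="{fallback}"; '
        f"filename*=UTF-8''{encoded}"
    )
```

With boto3-style clients, a value like this is typically passed as
ResponseContentDisposition in the get_object parameters when presigning.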

- config: STORAGE_BACKEND + MINIO_* (endpoint, public-endpoint, creds,
  region, 3 bucket names, presign TTL)
- mcp_env_catalog: new "storage" category + 10 specs (X10/INV-ENV1)
- pyproject: aioboto3>=13 (consumed here, deployed with first use)
- tests: 18 unit tests (FS round-trip, key normalization/traversal guard,
  bucket resolution, backend selection, dual write-both + S3-down fallback)
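
The key normalization and traversal guard covered by the tests could look
roughly like this. normalize_key is a hypothetical name and the exact rules
(rejecting absolute paths, ".." segments and empty keys) are assumptions, not
taken from the spec.

```python
def normalize_key(raw: str) -> str:
    """Return a DATA_DIR-relative POSIX key, or raise ValueError."""
    candidate = raw.replace("\\", "/")  # tolerate Windows-style separators
    if candidate.startswith("/"):
        raise ValueError(f"absolute key not allowed: {raw!r}")
    # Drop empty and "." segments so "a//./b" and "a/b" give the same key.
    parts = [p for p in candidate.split("/") if p not in ("", ".")]
    if not parts:
        raise ValueError("empty key")
    if ".." in parts:
        raise ValueError(f"path traversal in key: {raw!r}")
    return "/".join(parts)
```

Rejecting ".." outright, rather than resolving it, keeps the guard simple:
no key can ever point outside the storage root.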

No call-sites are rewired yet — that is Phase 2 (106.3). STORAGE_BACKEND
stays filesystem in prod, so behaviour is unchanged.

Invariants: keeps G2 (one storage path replaces scattered I/O); establishes
INV-STG1 (single layer), INV-STG2 (atomic keys, Hebrew name in metadata),
INV-STG3 (governance buckets), INV-STG6 (presigned serving).
Spec: docs/spec/X14-storage-minio.md.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
2026-06-08 07:47:49 +00:00
parent ade22ca871
commit b4a28f072d
5 changed files with 751 additions and 1 deletions

@@ -202,6 +202,32 @@ EXPORTS_DIR = DATA_DIR / "exports" # legacy exports only
# Cases directory — flat structure: data/cases/{case_number}/
CASES_DIR = DATA_DIR / "cases"
# ── Object storage (X14 / MinIO) ───────────────────────────────────
# Single storage layer (services/storage.py) replaces the scattered file
# I/O across ~8 services (INV-STG1 / G2). Backend selector:
# "filesystem" (default) — disk under DATA_DIR; current behaviour, no change.
# "dual" — write disk + S3, read S3→disk fallback (migration).
# "s3" — MinIO only.
# See docs/spec/X14-storage-minio.md.
STORAGE_BACKEND = os.environ.get("STORAGE_BACKEND", "filesystem").strip().lower()
# Endpoint reached server-side (internal Docker network: http://minio:9000).
MINIO_ENDPOINT = os.environ.get("MINIO_ENDPOINT", "http://minio:9000")
# Public endpoint used when MINTING presigned URLs for the browser (INV-STG6) —
# the browser cannot resolve the internal hostname. Falls back to the internal
# endpoint when unset (e.g. local dev).
MINIO_PUBLIC_ENDPOINT = os.environ.get("MINIO_PUBLIC_ENDPOINT", MINIO_ENDPOINT)
MINIO_ACCESS_KEY = os.environ.get("MINIO_ACCESS_KEY", "")
MINIO_SECRET_KEY = os.environ.get("MINIO_SECRET_KEY", "")
MINIO_REGION = os.environ.get("MINIO_REGION", "us-east-1")
# Logical bucket → name. Governance boundaries (INV-STG3): documents
# (versioned), immutable (versioned + Object-Lock COMPLIANCE for final
# decisions, INV-STG4), derived (thumbnails/extracted text — regenerable).
MINIO_BUCKET_DOCUMENTS = os.environ.get("MINIO_BUCKET_DOCUMENTS", "legal-documents")
MINIO_BUCKET_IMMUTABLE = os.environ.get("MINIO_BUCKET_IMMUTABLE", "legal-immutable")
MINIO_BUCKET_DERIVED = os.environ.get("MINIO_BUCKET_DERIVED", "legal-derived")
# Default presigned-URL TTL (seconds). SigV4 hard max is 7 days; keep short.
MINIO_PRESIGN_TTL = int(os.environ.get("MINIO_PRESIGN_TTL", "900"))
def find_case_dir(case_number: str) -> Path:
    """Return the case directory for a given case number."""