תיקון: כל העלאת קובץ עם שם עברי נכשלה ב-500 תחת backend s3-only. השורש:
`ingest._stage_file` מצרף את שם-הקובץ המקורי כ-S3 object metadata
(`metadata={"filename": src.name}`), ו-`S3Backend.put_bytes` העביר אותו כמו-שהוא
ל-`put_object`. botocore אוכף ASCII-only על S3 metadata → ParamValidationError →
500. שם עברי כמו "יומון 5167 - 11.6.26.pdf" שבר כל upload. נחשף ב-cutover ל-s3-only
(2026-06-11): קליטת היומונים (וגם כל מסמך/פסיקה עם שם עברי) הפסיקה לעבוד; היומון
האחרון שנקלט (5165, 9.6) היה לפני ה-cutover.
התיקון (נרמול-במקור, G1; בשכבת-האחסון היחידה, INV-STG2):
- `_ascii_metadata` מקודד ערכי-metadata לא-ASCII ב-percent-encoding (lossless,
שחזור עם urllib.parse.unquote); ASCII רגיל עובר ללא שינוי (קריאוּת).
- `S3Backend.put_bytes` מחיל אותו על כל ערכי ה-Metadata.
בדיקות: test_ascii_metadata_encodes_hebrew (helper) +
test_s3_put_bytes_sends_ascii_metadata (משחזר את מסלול-הכשל מול fake put_object).
16 עוברות בקובץ.
Invariants: מקיים G1 (נרמול-במקור, לא תיקון-בקריאה), INV-STG2 (שם-קובץ עברי
כ-metadata ולא ככ-key), G2 (אין מסלול-אחסון מקביל — תיקון ה-choke-point היחיד).
Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
Completes the write-side rewiring (INV-STG1) for the call-sites that run in
synchronous contexts, via a new blocking facade in storage.py
(put_bytes_sync / put_file_sync — asyncio.run, or a worker thread when a loop
is already running):
- services/extractor.py: multimodal thumbnail JPEGs → DERIVED (rendered in a
to_thread worker)
- services/docx_reviser.py: track-changes save (_save_docx_xml) + empty-diff
copy (copy_with_revisions) → DOCUMENTS
- services/docx_retrofit.py: in-place retrofit backup → DOCUMENTS
Each site keeps a fallback to a direct disk write when the target path is
outside DATA_DIR (caller-provided). Under the default STORAGE_BACKEND=
filesystem the bytes land exactly where they did before — zero behaviour
change.
Also: mcp_env_catalog MINIO_ENDPOINT default updated to the durable
container-name endpoint (http://minio-bx2ykvw94xbutsex41hz4vv8:9000), matching
the Coolify "Connect to Predefined Network" change made for network durability.
All binary write-sites now flow through storage.py. git-tracked text
(case.json/notes/research-md/draft-md) stays on disk by design (INV-STG7);
court-fetch temp files are ephemeral.
tests: +2 (thumbnail renderer routes through storage; put_bytes_sync
round-trip); 55 storage/docx/track-changes green; 244 collected, no import
breakage.
Keeps G2; completes INV-STG1 write coverage. Spec: docs/spec/X14-storage-minio.md.
Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
The single choke-point for all binary file I/O (originals, derived
artifacts, exports), replacing the scattered open()/shutil/Path.write_bytes
calls across ~8 services. Backend chosen by STORAGE_BACKEND:
- filesystem (default): disk under DATA_DIR — byte-for-byte legacy behaviour
- dual: write disk + S3, read S3→disk fallback (migration window)
- s3: MinIO via aioboto3 (lazy import; absent in the filesystem path)
Keys are DATA_DIR-relative POSIX paths; the FS backend ignores the logical
bucket and keeps the existing single tree, so the default backend is zero
behaviour change. S3 maps a governance bucket (documents/immutable/derived)
→ MinIO bucket; presigned URLs are minted against the public endpoint
(browser-reachable) and carry the Hebrew filename via RFC-5987
Content-Disposition.
- config: STORAGE_BACKEND + MINIO_* (endpoint, public-endpoint, creds,
region, 3 bucket names, presign TTL)
- mcp_env_catalog: new "storage" category + 10 specs (X10/INV-ENV1)
- pyproject: aioboto3>=13 (consumed here, deployed with first use)
- tests: 18 unit tests (FS round-trip, key normalization/traversal guard,
bucket resolution, backend selection, dual write-both + S3-down fallback)
No call-sites are rewired yet — that is Phase 2 (106.3). STORAGE_BACKEND
stays filesystem in prod, so behaviour is unchanged.
Invariants: keeps G2 (one storage path replaces scattered I/O); establishes
INV-STG1 (single layer), INV-STG2 (atomic keys, Hebrew name in metadata),
INV-STG3 (governance buckets), INV-STG6 (presigned serving).
Spec: docs/spec/X14-storage-minio.md.
Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>