feat(mcp): case_get_final_text — fall back to PDF/DOC/RTF/TXT/MD

The Hermes Knowledge Curator's hermes-curator.md says the curator must be
able to read both DOCX and PDF final decisions. The original implementation
hardcoded the .docx extension. Now try .docx → .pdf → .doc → .rtf → .txt →
.md and return the first match. extractor.extract_text already supports all
six formats, so no extractor changes are needed.

If no candidate exists, the not_found response now includes the
tried_extensions list so the caller knows what was attempted.
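
A rough sketch of what such a payload might look like, built from the fields visible in the diff. The helper name `not_found_response` is hypothetical, the `hint` field is omitted, and `ensure_ascii=False` is an assumption made here so that non-ASCII paths stay readable; the commit does not show how the real call is configured.

```python
import json
from pathlib import Path

# Same order as in the commit message; the constant name is hypothetical.
FINAL_EXTENSIONS = (".docx", ".pdf", ".doc", ".rtf", ".txt", ".md")


def not_found_response(case_number: str, exports_dir: Path, stem: str) -> str:
    """Build a JSON not_found payload that lists every extension that was tried."""
    return json.dumps(
        {
            "status": "not_found",
            "case_number": case_number,
            # The preferred (.docx) path is reported as the expected location.
            "expected_path": str(exports_dir / f"{stem}{FINAL_EXTENSIONS[0]}"),
            "tried_extensions": list(FINAL_EXTENSIONS),
        },
        ensure_ascii=False,  # assumption: keep non-ASCII file names human-readable
    )
```

A caller could then branch on `json.loads(response)["status"] == "not_found"` and surface `tried_extensions` in its own error message.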

Verified on case 1130-25 (.docx still picked first) and tested via
`curator-cmp mcp test legal-ai`.
2026-05-05 19:18:57 +00:00
parent 7c9582ed04
commit bd4b0ca766

@@ -377,19 +377,30 @@ async def case_get_final_text(case_number: str, max_chars: int = 0) -> str:
     Unlike `document_get_text`, which works on rows in the `documents` table,
     the final file is just a file in a directory (created by `api_mark_final`).
+    Supports every format that extractor.extract_text handles; tries
+    `.docx` first, then `.pdf`, `.doc`, `.rtf`, `.txt`, `.md`.
     Args:
         case_number: the appeal case number
         max_chars: if >0, truncate the returned text to this length. 0 = everything.
     """
     case_dir = config.find_case_dir(case_number)
-    final_path = case_dir / "exports" / f"סופי-{case_number}.docx"
+    exports_dir = case_dir / "exports"
+    final_stem = f"סופי-{case_number}"
-    if not final_path.exists():
+    final_path = None
+    for ext in (".docx", ".pdf", ".doc", ".rtf", ".txt", ".md"):
+        candidate = exports_dir / f"{final_stem}{ext}"
+        if candidate.exists():
+            final_path = candidate
+            break
+    if final_path is None:
         return json.dumps({
             "status": "not_found",
             "case_number": case_number,
-            "expected_path": str(final_path),
+            "expected_path": str(exports_dir / f"{final_stem}.docx"),
+            "tried_extensions": [".docx", ".pdf", ".doc", ".rtf", ".txt", ".md"],
             "hint": (
                 "The final decision has not yet been marked as 'final' in the UI. "
                 "Dafna needs to click 'Mark as final' on the correct draft file."