Files
legal-ai/mcp-server
Chaim 4892fb6e8f
All checks were successful
Build & Deploy / build-and-deploy (push) Successful in 1m40s
fix(extractor): apply Hebrew quote fixer to direct PDF extraction path
Born-digital Hebrew PDFs from legal software often encode gershayim (״)
as double-yod (יי), producing the same corruption patterns as OCR.
The fixer was only called after Google Cloud Vision OCR — digitally
created PDFs that passed quality checks received no correction.

Changes:
- Apply _fix_hebrew_quotes() in the direct extraction path
- Add 'בליימ' → 'בל"מ' (בקשה להארכת מועד — systematic corruption in 1017-03-26)
- Add 'תמייא' → 'תמ"א' (תכנית מתאר ארצית)
- Update docstring to reflect the broader scope

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
2026-05-25 15:59:39 +00:00
..