fix(precedent-library): allow re-extraction for internal_committee rows
All checks were successful
Build & Deploy / build-and-deploy (push) Successful in 3m13s
The "Extract metadata" ("חלץ מטא-דאטה") and "Extract halachot" ("חלץ הלכות") buttons in the UI were returning 404 for any precedent with `source_kind != 'external_upload'`.

The original restriction was meant to keep LLM extraction off internal-committee imports (their metadata supposedly came from the case file system). The same precedent rows can still need re-extraction when ingest produces broken data, for example the corrupted `subject_tags` value `['[','"','ה','י',...]` that motivated this change: an early ingest stored a JSON literal into a TEXT[] column, which Postgres split into single chars.

Two changes here:

1. db.request_metadata_extraction / request_halacha_extraction: drop the `AND source_kind='external_upload'` filter. The extractor already preserves user values (it only fills empty fields), so this is safe.
2. precedent_metadata_extractor.extract_and_apply: detect the character-by-character corruption above and treat it as empty, so the freshly extracted tags actually replace the broken ones. Heuristic: 3+ elements where every element is at most 2 chars (legitimate tags are multi-character Hebrew words).

A Coolify deploy is required for the FastAPI container to pick this up.
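The corruption and the detection heuristic described above can be reproduced in a few lines. This is a minimal sketch, not the code from the commit: `looks_char_split` is a hypothetical helper name, and `list(...)` over the JSON literal is used only to build data with the same shape as the broken rows, without claiming that this is how the original ingest failed.

```python
import json


def looks_char_split(tags):
    """Hypothetical helper: the heuristic from the commit message.

    True when there are 3+ elements and every element is a string of
    at most 2 characters.
    """
    return len(tags) >= 3 and all(
        isinstance(t, str) and len(t) <= 2 for t in tags
    )


# A JSON literal that should have been parsed into a list before storage.
json_literal = json.dumps(["היטל השבחה"], ensure_ascii=False)

# Same shape as the broken rows: one array element per character.
corrupted = list(json_literal)

# Illustrative well-formed tags (assumed values, multi-character words).
healthy = ["היטל_השבחה", "תכנון_ובניה"]

print(looks_char_split(corrupted))  # the char-split value is flagged
print(looks_char_split(healthy))    # real tags are left alone
```

Note that the heuristic would also flag a legitimate list of three or more tags that are each one or two characters long; the commit assumes such tags do not occur in this data.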
@@ -223,7 +223,17 @@ async def apply_to_record(
         fields_to_update["key_quote"] = s
 
     cur_tags = record.get("subject_tags") or []
-    if not cur_tags:
+    # Treat character-by-character corruption as empty. Early ingest
+    # pipelines stored a JSON string (`'["היטל השבחה"]'`) into a TEXT[]
+    # column, which Postgres split into individual chars:
+    # `['[', '"', 'ה', 'י', 'ט', 'ל', ' ', 'ה', 'ש', ...]`. Detection:
+    # 3+ elements where every element is at most 2 chars (legitimate
+    # tags are multi-character Hebrew words like `היטל_השבחה`).
+    is_corrupt = (
+        len(cur_tags) >= 3
+        and all(isinstance(t, str) and len(t) <= 2 for t in cur_tags)
+    )
+    if not cur_tags or is_corrupt:
         sug_tags = suggested.get("subject_tags") or []
         if sug_tags:
             fields_to_update["subject_tags"] = sug_tags
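The effect of this hunk on the "only fill empty fields" rule can be checked in isolation. The sketch below lifts the tag logic into a standalone function; `merge_subject_tags` is a hypothetical name, and the plain dicts stand in for the real `record` and `suggested` objects, whose full shape is not shown in this diff.

```python
def merge_subject_tags(record, suggested):
    """Hypothetical standalone version of the subject_tags branch.

    Returns the fields that would be updated: suggested tags are applied
    only when the current tags are empty or look char-split.
    """
    fields_to_update = {}
    cur_tags = record.get("subject_tags") or []
    # Same heuristic as in the diff: 3+ elements, each at most 2 chars.
    is_corrupt = len(cur_tags) >= 3 and all(
        isinstance(t, str) and len(t) <= 2 for t in cur_tags
    )
    if not cur_tags or is_corrupt:
        sug_tags = suggested.get("subject_tags") or []
        if sug_tags:
            fields_to_update["subject_tags"] = sug_tags
    return fields_to_update


suggested = {"subject_tags": ["היטל_השבחה"]}

# Corrupted row: the broken value is replaced by the fresh extraction.
broken = {"subject_tags": ["[", '"', "ה", "י", "ט", "ל"]}
print(merge_subject_tags(broken, suggested))

# Row with real tags: nothing is overwritten.
curated = {"subject_tags": ["תכנון_ובניה"]}
print(merge_subject_tags(curated, suggested))
```

A corrupted row is left untouched when the extractor suggests no tags, so the broken value stays until a later extraction produces some.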