feat(bulletins): catalog monthly "עו"ד על נדל"ן" bulletins into the radar (X12)

Monthly multi-topic bulletin (a publication separate from the daily yomon) → split into N digest rows in the same
table (publication='עו"ד על נדל"ן', not a parallel corpus; G2):
- bulletin_splitter (LLM local-only, tools=""): splits a bulletin into cases[]+articles[];
  legislative updates are skipped (chair's decision).
- bulletin_library.ingest_bulletin: each case-law pointer → digest_kind='decision'
  + embedding + autolink (including X13 court-fetch); each article → digest_kind='article'
  (full text + embedding, background only; INV-DIG1 applies).
- per-item content_hash is the dedup key (yomon_number is empty) → idempotent.
- db.create_digest: new digest_kind parameter (flows into the INSERT + upsert).
- scripts/ingest_bulletins.py (host, venv) for processing the archive.
- spec X12 §2.1.
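
A minimal sketch of the idempotency described in the list above, not the project's code: the splitter output becomes digest rows keyed by a per-item content hash, so ingesting the same bulletin twice adds nothing. The names content_hash, bulletin_to_rows and ingest, the "text" field, the in-memory dict store, and the SHA-256 over whitespace-normalized text are all assumptions for illustration; the real _content_hash and bulletin_library.ingest_bulletin may work differently.

```python
import hashlib


def content_hash(text: str) -> str:
    """Hypothetical stand-in for the project's _content_hash.

    Assumption: SHA-256 over whitespace-normalized text.
    """
    normalized = " ".join(text.split())
    return hashlib.sha256(normalized.encode("utf-8")).hexdigest()


def bulletin_to_rows(split: dict, publication: str) -> list[dict]:
    """Turn a splitter result (cases[] + articles[]) into digest rows."""
    rows = []
    for key, kind in (("cases", "decision"), ("articles", "article")):
        for item in split.get(key, []):
            rows.append(
                {
                    "publication": publication,
                    "yomon_number": "",  # bulletin items carry no yomon number
                    "digest_kind": kind,
                    "analysis_text": item["text"],
                    "content_hash": content_hash(item["text"]),
                }
            )
    return rows


def ingest(store: dict, rows: list[dict]) -> int:
    """Upsert rows keyed by content_hash; return how many were new."""
    inserted = 0
    for row in rows:
        if row["content_hash"] not in store:
            inserted += 1
        store[row["content_hash"]] = row
    return inserted


# Placeholder input, not taken from a real bulletin.
split = {
    "cases": [
        {"text": "placeholder case pointer A"},
        {"text": "placeholder case pointer B"},
    ],
    "articles": [{"text": "placeholder article body"}],
}

store: dict = {}
rows = bulletin_to_rows(split, publication="example-bulletin")
first = ingest(store, rows)   # first pass inserts every item
second = ingest(store, rows)  # second pass finds every hash already present
print(first, second, len(store))
```

Keying on the hash rather than on yomon_number is what lets a re-run of the archive script update rows in place instead of duplicating them.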

Verified (dry-run, no DB): bulletin 180 → 4 cases + 1 article · bulletin 201 → 4 cases
(including appeal 197, ערר-197) + 1 article. Legislative updates were skipped. claude_session stayed local-only.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
2026-06-08 08:07:45 +00:00
parent 81b3de6f4f
commit 85f94a4f3f
5 changed files with 344 additions and 4 deletions

@@ -3667,10 +3667,12 @@ async def create_digest(
 subject_tags: list[str] | None = None,
 source_document_path: str = "",
 extraction_status: str = "processing",
+digest_kind: str = "",
 ) -> dict:
 """Upsert a digest (X12). Idempotent on yomon_number (INV-G3): a repeat
 upload of the same yomon updates in place. content_hash is the secondary
-dedup key for digests whose number couldn't be parsed."""
+dedup key for digests whose number couldn't be parsed (and the primary key
+for bulletin items, which carry no yomon_number — see uq_digests_content_hash)."""
 pool = await get_pool()
 content_hash = _content_hash(analysis_text)
 async with pool.acquire() as conn:
@@ -3684,10 +3686,10 @@ async def create_digest(
 headline_holding, analysis_text, summary, underlying_citation,
 underlying_court, underlying_date, underlying_judge, practice_area,
 appeal_subtype, subject_tags, source_document_path,
-content_hash, extraction_status
+content_hash, extraction_status, digest_kind
 ) VALUES (
 $1, $2, $3, $4, $5, $6, $7, $8, $9, $10, $11, $12, $13,
-$14, $15, $16, $17, $18
+$14, $15, $16, $17, $18, $19
 )
 ON CONFLICT (yomon_number) WHERE yomon_number <> ''
 DO UPDATE SET
@@ -3708,6 +3710,7 @@ async def create_digest(
 source_document_path = COALESCE(NULLIF(EXCLUDED.source_document_path, ''), digests.source_document_path),
 content_hash = EXCLUDED.content_hash,
 extraction_status = EXCLUDED.extraction_status,
+digest_kind = COALESCE(NULLIF(EXCLUDED.digest_kind, ''), digests.digest_kind),
 updated_at = now()
 RETURNING {_DIGEST_COLS}
 """,
@@ -3715,7 +3718,7 @@ async def create_digest(
 headline_holding, analysis_text, summary, underlying_citation,
 underlying_court, underlying_date, underlying_judge, practice_area,
 appeal_subtype, list(subject_tags or []), source_document_path,
-content_hash, extraction_status,
+content_hash, extraction_status, digest_kind,
 )
 return _row_to_digest(row)
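
The COALESCE(NULLIF(EXCLUDED.digest_kind, ''), digests.digest_kind) clause in the diff means a repeat upload with an empty digest_kind keeps the stored value, while a non-empty one overwrites it. The sketch below reproduces that behaviour with Python's built-in sqlite3 as a stand-in; the project's database appears to be a different engine (the $1 placeholders suggest PostgreSQL, which is an assumption). The two-column table, the index name uq_digests_yomon, the helpers upsert and kind_of, and the yomon number "412" are hypothetical.

```python
import sqlite3

conn = sqlite3.connect(":memory:")

# Reduced, hypothetical version of the digests table: only the two
# columns needed to show the upsert rule.
conn.execute(
    "CREATE TABLE digests ("
    " yomon_number TEXT NOT NULL DEFAULT '',"
    " digest_kind TEXT NOT NULL DEFAULT '')"
)
# Partial unique index: only non-empty yomon numbers must be unique.
conn.execute(
    "CREATE UNIQUE INDEX uq_digests_yomon ON digests (yomon_number)"
    " WHERE yomon_number <> ''"
)

UPSERT = """
INSERT INTO digests (yomon_number, digest_kind) VALUES (?, ?)
ON CONFLICT (yomon_number) WHERE yomon_number <> ''
DO UPDATE SET
    digest_kind = COALESCE(NULLIF(excluded.digest_kind, ''), digests.digest_kind)
"""


def upsert(yomon_number: str, digest_kind: str) -> None:
    """Insert a digest row, or update the existing row for that yomon number."""
    conn.execute(UPSERT, (yomon_number, digest_kind))


def kind_of(yomon_number: str) -> str:
    """Return the stored digest_kind for a yomon number."""
    row = conn.execute(
        "SELECT digest_kind FROM digests WHERE yomon_number = ?",
        (yomon_number,),
    ).fetchone()
    return row[0]


upsert("412", "decision")
after_insert = kind_of("412")

upsert("412", "")  # empty value: NULLIF turns it into NULL, COALESCE keeps the old kind
after_empty = kind_of("412")

upsert("412", "article")  # non-empty value replaces the stored kind
after_override = kind_of("412")

row_count = conn.execute("SELECT COUNT(*) FROM digests").fetchone()[0]
print(after_insert, after_empty, after_override, row_count)
```

Because the default for the new parameter is "", existing callers of create_digest that never pass digest_kind cannot blank out a kind that an earlier call stored.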