From 7d0d4a9b275fe93f64031973e812b97290394d15 Mon Sep 17 00:00:00 2001
From: Chaim
Date: Wed, 3 Jun 2026 09:38:30 +0000
Subject: [PATCH] chore(#70): delete 15 orphaned cited_only stubs + close #70
MIME-Version: 1.0
Content-Type: text/plain; charset=UTF-8
Content-Transfer-Encoding: 8bit

The 4 'ambiguous' citation items flagged for the chair turned out to be dead
orphan stubs: 0 inbound/outbound edges across all 5 citation mechanisms,
0 full_text, 0 halachot, 0 chunks/embeddings. A corpus-wide check found 15
such orphans in total (including clean-looking ones). Per OpenCitations (keep
an id-less entity only if it is CITED; these are cited by nothing), they are
pure noise, so they were deleted as a technical cleanup rather than referred
for chair judgment.

- 15 orphan cited_only stubs deleted (cited_only 46 -> 31); backup in
  data/audit/fu2b-orphan-stub-cleanup-*.json.
- 0 malformed / 0 orphans remain; all 31 remaining stubs are cited.
- Combines with the 3 earlier mechanical normalizations. #70 fully done.
- Known forward-edge (no current data, no task): '+' combined-citation
  handling in citation_extractor if it recurs in future extraction.

Co-Authored-By: Claude Opus 4.8 (1M context)
---
 .taskmaster/tasks/tasks.json                              | 4 ++--
 data/audit/fu2b-orphan-stub-cleanup-20260603T093741Z.json | 1 +
 2 files changed, 3 insertions(+), 2 deletions(-)
 create mode 100644 data/audit/fu2b-orphan-stub-cleanup-20260603T093741Z.json

diff --git a/.taskmaster/tasks/tasks.json b/.taskmaster/tasks/tasks.json
index dad7938..31ef2ae 100644
--- a/.taskmaster/tasks/tasks.json
+++ b/.taskmaster/tasks/tasks.json
@@ -2410,9 +2410,9 @@
  "id": "70",
  "title": "[FU-2c-b] תיאום + dedup של cited_only (49 רשומות) + אהוד שפר cross-source",
  "description": "המשך ל-FU-2c (#68). ה-dry-run של תיאום-המזהים החיצוני חשף 49 רשומות source_kind='cited_only' (הפניות-ציטוט שחולצו מהחלטות) שלא היו בהיקף #68. דורשות נרמול נפרד: צורות-ועדה כמו 'ערר 1093-19' (NNNN-NN) שה-extractor הנוכחי לא תופס (NO_DOCKET), 'בש\"א 2487-14', dups, ו-'ערר אדלר' בלתי-פתיר (ללא מספר). 
בנוסף: dedup חוצה-source של אהוד שפר — external_upload 'עע\"מ 317/10 אהוד שפר' מול cited_only קיים 'עע\"מ 317/10' (אותו תיק; ה-collision-guard מנע התנגשות ב-uq_case_law_external_number, ה-external_upload נשאר עם case_number מנופח עד הכרעה).",
- "details": "[2026-06-03] נרמול מבוסס-מחקר (4 מקורות: ECLI work-level id, Akoma Ntoso FRBR Work/Manifestation, ELI canonical+alias, OpenCitations OMID + Christen data-matching). מדיניות: צורה קנונית אחת + alias; cited_only stub = אותו Work כמו ה-doc → merge על התאמה-מדויקת בלבד; un-resolvable = display+flag, לא למחוק; merge = re-point edges + dedup, שמרני (false-merge בגרף-ציטוט יקר). בוצע: 46 רשומות cited_only סווגו; 3 תיקונים מכניים-דטרמיניסטיים הוחלו (ערר \\n316/10→ערר 316/10; עע\"מ65/13→עע\"מ 65/13; עע\"מ9057/09→עע\"מ 9057/09). 0 malformed (whitespace/no-space) נותרו. **נותר לשיקול יו\"ר (לא ננחש, לפי המשמר)**: (1) 2 garbled — 'ערר 1078/0724' (4a38c202), 'ערר 1083/0724' (6682f9cb); (2) 'ערר אדלר' (863a7bf8) ללא docket → keep+flag; (3) combined 'ערר (ירושלים) 1078+1083/24' (e7f6fd06) → פיצול ל-1078/24+1083/24 מתנגש עם stub קיים 'ערר 1083/24' → entity-resolution ידני. תוספת קוד עתידית: טיפול '+' ב-citation_extractor. הדדאפ הקודם (shafer + stub cleanup) כבר הושלם. אלה chair-domain — לא הכרעת-מהנדס.",
+ "details": "[2026-06-03] נרמול מבוסס-מחקר (4 מקורות: ECLI work-level id, Akoma Ntoso FRBR Work/Manifestation, ELI canonical+alias, OpenCitations OMID + Christen data-matching). מדיניות: צורה קנונית אחת + alias; cited_only stub = אותו Work כמו ה-doc → merge על התאמה-מדויקת בלבד; un-resolvable = display+flag, לא למחוק; merge = re-point edges + dedup, שמרני (false-merge בגרף-ציטוט יקר). בוצע: 46 רשומות cited_only סווגו; 3 תיקונים מכניים-דטרמיניסטיים הוחלו (ערר \\n316/10→ערר 316/10; עע\"מ65/13→עע\"מ 65/13; עע\"מ9057/09→עע\"מ 9057/09). 0 malformed (whitespace/no-space) נותרו. 
**נותר לשיקול יו\"ר (לא ננחש, לפי המשמר)**: (1) 2 garbled — 'ערר 1078/0724' (4a38c202), 'ערר 1083/0724' (6682f9cb); (2) 'ערר אדלר' (863a7bf8) ללא docket → keep+flag; (3) combined 'ערר (ירושלים) 1078+1083/24' (e7f6fd06) → פיצול ל-1078/24+1083/24 מתנגש עם stub קיים 'ערר 1083/24' → entity-resolution ידני. תוספת קוד עתידית: טיפול '+' ב-citation_extractor. הדדאפ הקודם (shafer + stub cleanup) כבר הושלם. אלה chair-domain — לא הכרעת-מהנדס. [2026-06-03 סגירה]: בדיקת-קשתות חשפה ש-4 ה'דו-משמעיים' (+11 נוספים) הם stubs **יתומים מתים** — 0 קשתות בכל 5 מנגנוני-הציטוט, 0 full_text, 0 הלכות, 0 chunks/embeddings. כלומר ניקוי טכני, לא שיפוט-יו\"ר (OpenCitations שומר ישות חסרת-מזהה רק אם מצוטטת — אלה לא). נמחקו 15 יתומים (cited_only 46→31), גיבוי data/audit/fu2b-orphan-stub-cleanup-20260603T093741Z.json. 0 malformed/יתומים נותרו; כל 31 הנותרים מצוטטים. forward-edge ידוע (לא חוסם, ללא משימה): טיפול '+' בציטוט-משולב ב-citation_extractor אם יחזור בחילוץ עתידי. #70 done.",
  "testStrategy": "אחרי תיקון: 0 NO_DOCKET ב-cited_only (פרט ל-ערר אדלר המתועד); אין case_number כפול בין external_upload ל-cited_only; אהוד שפר עע\"מ 317/10 = רשומה אחת.",
- "status": "pending",
+ "status": "done",
  "dependencies": [
  "68"
  ],
diff --git a/data/audit/fu2b-orphan-stub-cleanup-20260603T093741Z.json b/data/audit/fu2b-orphan-stub-cleanup-20260603T093741Z.json
new file mode 100644
index 0000000..b21b18c
--- /dev/null
+++ b/data/audit/fu2b-orphan-stub-cleanup-20260603T093741Z.json
@@ -0,0 +1 @@
+[{"id":"979422a2-be6f-437e-865b-9102b8ac1b5c","case_number":"ערר (מרכז) 1062-09-24","case_name":"","court":"ועדת ערר 
מרכז","date":null,"subject_tags":[],"summary":"","key_quote":"","full_text":"","source_url":"","created_at":"2026-04-02T18:14:34.656953+00:00","precedent_level":"","is_binding":true,"creac_role":"","source_kind":"cited_only","document_id":null,"extraction_status":"pending","halacha_extraction_status":"pending","practice_area":"","appeal_subtype":"","headnote":"","source_type":"","metadata_extraction_requested_at":null,"halacha_extraction_requested_at":null,"chair_name":"","district":"","proceeding_type":"","citation_formatted":"","meta_tsv":"'-09':4 '-24':5 '1062':3 'מרכז':2 'ערר':1","searchable":false,"content_hash":"","indexed_hash":null}, {"id":"1ba80d39-e4b8-4112-a43d-dd2323563f02","case_number":"עע\"מ 65/13","case_name":"","court":"בית המשפט העליון","date":null,"subject_tags":[],"summary":"","key_quote":"","full_text":"","source_url":"","created_at":"2026-04-02T18:14:34.619417+00:00","precedent_level":"","is_binding":true,"creac_role":"","source_kind":"cited_only","document_id":null,"extraction_status":"pending","halacha_extraction_status":"pending","practice_area":"","appeal_subtype":"","headnote":"","source_type":"","metadata_extraction_requested_at":null,"halacha_extraction_requested_at":null,"chair_name":"","district":"","proceeding_type":"","citation_formatted":"","meta_tsv":"'65/13':3 'מ':2 'עע':1","searchable":false,"content_hash":"","indexed_hash":null}, {"id":"dd4e570b-facb-42a6-8ac0-af42394c47d0","case_number":"ערר 1524-05","case_name":"","court":"ועדת 
ערר","date":null,"subject_tags":[],"summary":"","key_quote":"","full_text":"","source_url":"","created_at":"2026-04-02T18:14:34.659669+00:00","precedent_level":"","is_binding":true,"creac_role":"","source_kind":"cited_only","document_id":null,"extraction_status":"pending","halacha_extraction_status":"pending","practice_area":"","appeal_subtype":"","headnote":"","source_type":"","metadata_extraction_requested_at":null,"halacha_extraction_requested_at":null,"chair_name":"","district":"","proceeding_type":"","citation_formatted":"","meta_tsv":"'-05':3 '1524':2 'ערר':1","searchable":false,"content_hash":"","indexed_hash":null}, {"id":"a3d24326-bb55-42a7-b0e4-9b515a23e647","case_number":"עת\"מ 35744-02-25","case_name":"","court":"בית משפט מנהלי","date":null,"subject_tags":[],"summary":"","key_quote":"","full_text":"","source_url":"","created_at":"2026-04-02T18:14:34.630079+00:00","precedent_level":"","is_binding":true,"creac_role":"","source_kind":"cited_only","document_id":null,"extraction_status":"pending","halacha_extraction_status":"pending","practice_area":"","appeal_subtype":"","headnote":"","source_type":"","metadata_extraction_requested_at":null,"halacha_extraction_requested_at":null,"chair_name":"","district":"","proceeding_type":"","citation_formatted":"","meta_tsv":"'-02':4 '-25':5 '35744':3 'מ':2 'עת':1","searchable":false,"content_hash":"","indexed_hash":null}, {"id":"e6f3ebe1-a438-4d67-be42-ca8e0869ac83","case_number":"ערר 1203/19","case_name":"","court":"ועדת 
ערר","date":null,"subject_tags":[],"summary":"","key_quote":"","full_text":"","source_url":"","created_at":"2026-04-02T18:14:34.68649+00:00","precedent_level":"","is_binding":true,"creac_role":"","source_kind":"cited_only","document_id":null,"extraction_status":"pending","halacha_extraction_status":"pending","practice_area":"","appeal_subtype":"","headnote":"","source_type":"","metadata_extraction_requested_at":null,"halacha_extraction_requested_at":null,"chair_name":"","district":"","proceeding_type":"","citation_formatted":"","meta_tsv":"'1203/19':2 'ערר':1","searchable":false,"content_hash":"","indexed_hash":null}, {"id":"c3144f80-c2fd-4ad1-a038-60f11bf12d36","case_number":"ערר 1120/20","case_name":"","court":"ועדת ערר","date":null,"subject_tags":[],"summary":"","key_quote":"","full_text":"","source_url":"","created_at":"2026-04-02T18:14:34.692949+00:00","precedent_level":"","is_binding":true,"creac_role":"","source_kind":"cited_only","document_id":null,"extraction_status":"pending","halacha_extraction_status":"pending","practice_area":"","appeal_subtype":"","headnote":"","source_type":"","metadata_extraction_requested_at":null,"halacha_extraction_requested_at":null,"chair_name":"","district":"","proceeding_type":"","citation_formatted":"","meta_tsv":"'1120/20':2 'ערר':1","searchable":false,"content_hash":"","indexed_hash":null}, {"id":"6682f9cb-ceca-43be-b0a3-87b2be3218e1","case_number":"ערר 1083/0724","case_name":"","court":"ועדת 
ערר","date":null,"subject_tags":[],"summary":"","key_quote":"","full_text":"","source_url":"","created_at":"2026-04-02T18:14:34.695723+00:00","precedent_level":"","is_binding":true,"creac_role":"","source_kind":"cited_only","document_id":null,"extraction_status":"pending","halacha_extraction_status":"pending","practice_area":"","appeal_subtype":"","headnote":"","source_type":"","metadata_extraction_requested_at":null,"halacha_extraction_requested_at":null,"chair_name":"","district":"","proceeding_type":"","citation_formatted":"","meta_tsv":"'1083/0724':2 'ערר':1","searchable":false,"content_hash":"","indexed_hash":null}, {"id":"e939ecc5-f07c-42e6-869a-130e8f9bd354","case_number":"ערר 8011-05","case_name":"","court":"ועדת ערר","date":null,"subject_tags":[],"summary":"","key_quote":"","full_text":"","source_url":"","created_at":"2026-04-02T18:14:34.613857+00:00","precedent_level":"","is_binding":true,"creac_role":"","source_kind":"cited_only","document_id":null,"extraction_status":"pending","halacha_extraction_status":"pending","practice_area":"","appeal_subtype":"","headnote":"","source_type":"","metadata_extraction_requested_at":null,"halacha_extraction_requested_at":null,"chair_name":"","district":"","proceeding_type":"","citation_formatted":"","meta_tsv":"'-05':3 '8011':2 'ערר':1","searchable":false,"content_hash":"","indexed_hash":null}, {"id":"82911503-b70c-4722-ad29-4fcc65491bae","case_number":"ערר 1075/19","case_name":"","court":"ועדת 
ערר","date":null,"subject_tags":[],"summary":"","key_quote":"","full_text":"","source_url":"","created_at":"2026-04-02T18:14:34.683901+00:00","precedent_level":"","is_binding":true,"creac_role":"","source_kind":"cited_only","document_id":null,"extraction_status":"pending","halacha_extraction_status":"pending","practice_area":"","appeal_subtype":"","headnote":"","source_type":"","metadata_extraction_requested_at":null,"halacha_extraction_requested_at":null,"chair_name":"","district":"","proceeding_type":"","citation_formatted":"","meta_tsv":"'1075/19':2 'ערר':1","searchable":false,"content_hash":"","indexed_hash":null}, {"id":"7435c95f-58b1-4218-9a37-a979e976db10","case_number":"עת\"מ 57594-05-12","case_name":"","court":"בית משפט מנהלי","date":null,"subject_tags":[],"summary":"","key_quote":"","full_text":"","source_url":"","created_at":"2026-04-02T18:14:34.593115+00:00","precedent_level":"","is_binding":true,"creac_role":"","source_kind":"cited_only","document_id":null,"extraction_status":"pending","halacha_extraction_status":"pending","practice_area":"","appeal_subtype":"","headnote":"","source_type":"","metadata_extraction_requested_at":null,"halacha_extraction_requested_at":null,"chair_name":"","district":"","proceeding_type":"","citation_formatted":"","meta_tsv":"'-05':4 '-12':5 '57594':3 'מ':2 'עת':1","searchable":false,"content_hash":"","indexed_hash":null}, {"id":"e7f6fd06-7a0d-4519-b609-bcc3727f9376","case_number":"ערר (ירושלים) 1078+1083/24","case_name":"אריאלי","court":"ועדת ערר ירושלים","date":null,"subject_tags":["structure_example", "proceedings_block"],"summary":"שימשה כמודל מבני — פרק הליכים נפרד (31 סעיפים), מבנה 
מפורט.","key_quote":"","full_text":"","source_url":"","created_at":"2026-03-31T23:43:38.058089+00:00","precedent_level":"","is_binding":true,"creac_role":"","source_kind":"cited_only","document_id":null,"extraction_status":"pending","halacha_extraction_status":"pending","practice_area":"","appeal_subtype":"","headnote":"","source_type":"","metadata_extraction_requested_at":null,"halacha_extraction_requested_at":null,"chair_name":"","district":"","proceeding_type":"","citation_formatted":"","meta_tsv":"'+1083':5 '/24':6 '1078':4 'אריאלי':1 'ירושלים':3 'ערר':2","searchable":false,"content_hash":"","indexed_hash":null}, {"id":"8a8f03d7-05e8-4804-a174-b2050fd9235b","case_number":"ערר 1083/24","case_name":"","court":"ועדת ערר","date":null,"subject_tags":[],"summary":"","key_quote":"","full_text":"","source_url":"","created_at":"2026-04-02T18:14:34.698705+00:00","precedent_level":"","is_binding":true,"creac_role":"","source_kind":"cited_only","document_id":null,"extraction_status":"pending","halacha_extraction_status":"pending","practice_area":"","appeal_subtype":"","headnote":"","source_type":"","metadata_extraction_requested_at":null,"halacha_extraction_requested_at":null,"chair_name":"","district":"","proceeding_type":"","citation_formatted":"","meta_tsv":"'1083/24':2 'ערר':1","searchable":false,"content_hash":"","indexed_hash":null}, {"id":"e08f1856-72c3-4131-b827-267f74bcbb33","case_number":"עת\"מ 6486-05-24","case_name":"","court":"בית משפט 
מנהלי","date":null,"subject_tags":[],"summary":"","key_quote":"","full_text":"","source_url":"","created_at":"2026-04-02T18:14:34.62756+00:00","precedent_level":"","is_binding":true,"creac_role":"","source_kind":"cited_only","document_id":null,"extraction_status":"pending","halacha_extraction_status":"pending","practice_area":"","appeal_subtype":"","headnote":"","source_type":"","metadata_extraction_requested_at":null,"halacha_extraction_requested_at":null,"chair_name":"","district":"","proceeding_type":"","citation_formatted":"","meta_tsv":"'-05':4 '-24':5 '6486':3 'מ':2 'עת':1","searchable":false,"content_hash":"","indexed_hash":null}, {"id":"863a7bf8-b9af-4d07-8e5e-888660b93854","case_number":"ערר אדלר","case_name":"אדלר","court":"ועדת ערר ירושלים","date":null,"subject_tags":["consolidating_decision"],"summary":"החלטה מאחדת שצוטטה בבית הכרם — טכניקת ציטוט דרך החלטה מרכזת.","key_quote":"","full_text":"","source_url":"","created_at":"2026-03-31T23:43:38.059497+00:00","precedent_level":"","is_binding":true,"creac_role":"","source_kind":"cited_only","document_id":null,"extraction_status":"pending","halacha_extraction_status":"pending","practice_area":"","appeal_subtype":"","headnote":"","source_type":"","metadata_extraction_requested_at":null,"halacha_extraction_requested_at":null,"chair_name":"","district":"","proceeding_type":"","citation_formatted":"","meta_tsv":"'אדלר':1,3 'ערר':2","searchable":false,"content_hash":"","indexed_hash":null}, {"id":"4a38c202-0566-40d8-bb49-39dd6467ee82","case_number":"ערר 1078/0724","case_name":"","court":"ועדת 
ערר","date":null,"subject_tags":[],"summary":"","key_quote":"","full_text":"","source_url":"","created_at":"2026-04-02T18:14:34.69032+00:00","precedent_level":"","is_binding":true,"creac_role":"","source_kind":"cited_only","document_id":null,"extraction_status":"pending","halacha_extraction_status":"pending","practice_area":"","appeal_subtype":"","headnote":"","source_type":"","metadata_extraction_requested_at":null,"halacha_extraction_requested_at":null,"chair_name":"","district":"","proceeding_type":"","citation_formatted":"","meta_tsv":"'1078/0724':2 'ערר':1","searchable":false,"content_hash":"","indexed_hash":null}] -- 2.49.1
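
Reviewer note: the orphan criterion stated in the commit message (a cited_only stub with 0 edges, 0 full_text, 0 halachot and 0 chunks/embeddings) can be sketched roughly as below. This is only an illustration under assumptions, not the script that produced the backup file: the `Stub` class, the `find_orphan_stubs` function, the pooled `(source_id, target_id)` edge pairs and the two count dictionaries are all hypothetical. Only the field names `id`, `case_number`, `source_kind` and `full_text` are taken from the audit JSON above, and the ids are shortened for readability.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Stub:
    """Minimal view of a case_law row; field names follow the audit JSON."""
    id: str
    case_number: str
    source_kind: str
    full_text: str = ""


def find_orphan_stubs(stubs, edges, halacha_counts=None, chunk_counts=None):
    """Return cited_only stubs that nothing cites and that carry no content.

    edges: iterable of (source_id, target_id) pairs, assumed to be pooled
           from every citation mechanism (hypothetical shape).
    halacha_counts / chunk_counts: hypothetical {stub_id: count} mappings.
    """
    halacha_counts = halacha_counts or {}
    chunk_counts = chunk_counts or {}

    # A stub is "connected" if it appears on either end of any edge.
    connected = set()
    for source_id, target_id in edges:
        connected.add(source_id)
        connected.add(target_id)

    orphans = []
    for stub in stubs:
        if stub.source_kind != "cited_only":
            continue  # only stubs are candidates for this cleanup
        if stub.id in connected:
            continue  # cited by (or citing) something: keep
        if stub.full_text.strip():
            continue  # has text: not an empty stub
        if halacha_counts.get(stub.id, 0) or chunk_counts.get(stub.id, 0):
            continue  # has derived data: keep
        orphans.append(stub)
    return orphans


# Example with two stubs from the backup and one hypothetical cited stub.
stubs = [
    Stub("4a38c202", "ערר 1078/0724", "cited_only"),
    Stub("863a7bf8", "ערר אדלר", "cited_only"),
    Stub("stub-cited", "ערר 0000/00", "cited_only"),  # hypothetical row
]
edges = [("decision-1", "stub-cited")]  # hypothetical citation edge
orphan_ids = [s.id for s in find_orphan_stubs(stubs, edges)]
```

Pooling all mechanisms into one edge set keeps the check conservative: a single edge from any mechanism is enough to keep a stub, which matches the commit's statement that all 31 remaining stubs are cited.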