Introduces canonical_halachot table: one row per unique legal principle, replacing the equivalent_halachot bidirectional-link model (V28/G2 improvement). Per-precedent halachot rows become instances that point to their canonical.

Schema (V41):
- canonical_halachot: canonical_statement, rule_type, practice_areas, subject_tags, embedding (ivfflat), review_status (pending_synthesis→published), first_established_in FK → case_law, instance_count.
- halachot: +canonical_id FK, +instance_type (original|citation|application), +treatment; rule_statement + embedding become nullable for citation instances.
- halacha_citation_corroboration: +canonical_id FK so X11 aggregates at principle level, not instance level. store_corroboration auto-populates it via INSERT...SELECT.

New DB functions:
- create_canonical_halacha
- nearest_canonical_halacha (threshold search for Phase 3 lookup-before-insert)
- refresh_canonical_instance_count
- get_canonical_halacha (principle + instance list)

Backfill: scripts/backfill_canonical_halachot.py (dry-run by default, --apply to execute). Uses union-find over equivalent_halachot pairs, picks canonical representative (corroboration→confidence→earliest), creates canonicals, sets canonical_id + instance_type on all instances.

Invariants:
- G2 (equivalent_halachot deprecated post-backfill)
- INV-G10 (canonical review_status gate)
- INV-DM7 (authority derived, not stored)
- INV-AH (canonical_statement grounded in source statements, pending_synthesis until chair approves)

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
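
Illustrative sketch of the backfill grouping step (not the code in scripts/backfill_canonical_halachot.py). It shows union-find over equivalence pairs followed by picking one representative per group. The Halacha dataclass, its field names, and the reading of "corroboration→confidence→earliest" as "highest corroboration count, then highest confidence, then earliest date" are assumptions made for the example.

```python
from dataclasses import dataclass
from datetime import date


@dataclass(frozen=True)
class Halacha:
    # Hypothetical in-memory view of a halachot row; field names are assumed.
    id: int
    corroboration_count: int
    confidence: float
    decided_on: date


def find(parent, x):
    # Walk to the root of x's set, halving the path along the way.
    while parent[x] != x:
        parent[x] = parent[parent[x]]
        x = parent[x]
    return x


def group_equivalents(ids, pairs):
    # Union-find: every id starts in its own set; each pair merges two sets.
    ids = list(ids)
    parent = {i: i for i in ids}
    for a, b in pairs:
        root_a, root_b = find(parent, a), find(parent, b)
        if root_a != root_b:
            parent[root_b] = root_a
    groups = {}
    for i in ids:
        groups.setdefault(find(parent, i), []).append(i)
    return list(groups.values())


def pick_representative(members):
    # Assumed ordering: most corroborated, then most confident, then earliest.
    # The id is a final tie-breaker so the choice is deterministic.
    return min(
        members,
        key=lambda h: (-h.corroboration_count, -h.confidence, h.decided_on, h.id),
    )


def plan_backfill(rows, pairs):
    # Map every instance id to the id of its group's representative.
    # Nothing is written; this mirrors a dry run.
    plan = {}
    for group in group_equivalents(rows.keys(), pairs):
        rep = pick_representative([rows[i] for i in group])
        for i in group:
            plan[i] = rep.id
    return plan


rows = {
    1: Halacha(1, 2, 0.9, date(2001, 1, 1)),
    2: Halacha(2, 5, 0.7, date(2005, 1, 1)),
    3: Halacha(3, 5, 0.7, date(1999, 1, 1)),
    4: Halacha(4, 0, 0.5, date(2010, 1, 1)),
}
pairs = [(1, 2), (2, 3)]  # hypothetical equivalence links
plan = plan_backfill(rows, pairs)
```

In this sample, rows 1 to 3 are linked transitively and collapse into one group, while row 4 has no links and becomes its own canonical. Rows 2 and 3 tie on corroboration and confidence, so the earlier one (row 3) is chosen as the representative.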