Introduces the canonical_halachot table: one row per unique legal principle,
replacing the equivalent_halachot bidirectional-link model (V28/G2 improvement).
Per-precedent halachot rows become instances that point to their canonical row.
Schema (V41):
- canonical_halachot: canonical_statement, rule_type, practice_areas,
  subject_tags, embedding (ivfflat), review_status (pending_synthesis→published),
  first_established_in FK → case_law, instance_count.
- halachot: +canonical_id FK, +instance_type (original|citation|application),
  +treatment; rule_statement + embedding become nullable for citation instances.
- halacha_citation_corroboration: +canonical_id FK so X11 aggregates at
  principle level, not instance level. store_corroboration auto-populates it
  via INSERT...SELECT.
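Sketch of the INSERT...SELECT auto-population, for illustration only. It uses
an in-memory SQLite database rather than the real one, and the columns
halacha_id and citing_case_id plus the function signature are assumptions;
only the table names and canonical_id come from this change.

```python
import sqlite3

# Hypothetical minimal schema: only canonical_id is taken from the commit,
# the other columns are placeholders for illustration.
conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE halachot (id INTEGER PRIMARY KEY, canonical_id INTEGER);
CREATE TABLE halacha_citation_corroboration (
    halacha_id INTEGER NOT NULL,
    citing_case_id INTEGER NOT NULL,
    canonical_id INTEGER
);
INSERT INTO halachot (id, canonical_id) VALUES (10, 1), (11, 1), (12, NULL);
""")


def store_corroboration(conn, halacha_id, citing_case_id):
    """Insert a corroboration row, copying canonical_id from the instance."""
    conn.execute(
        """
        INSERT INTO halacha_citation_corroboration
            (halacha_id, citing_case_id, canonical_id)
        SELECT h.id, ?, h.canonical_id
        FROM halachot h
        WHERE h.id = ?
        """,
        (citing_case_id, halacha_id),
    )


for hid, case_id in [(10, 100), (11, 101), (12, 102)]:
    store_corroboration(conn, hid, case_id)

# Aggregate at principle level: instances 10 and 11 share canonical 1.
counts = conn.execute(
    """
    SELECT canonical_id, COUNT(*)
    FROM halacha_citation_corroboration
    WHERE canonical_id IS NOT NULL
    GROUP BY canonical_id
    """
).fetchall()
```

Because the caller never passes canonical_id, an instance that has not been
linked yet (12 above) simply stores NULL and is left out of the aggregate.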
New DB functions: create_canonical_halacha, nearest_canonical_halacha
(threshold search for Phase 3 lookup-before-insert), refresh_canonical_instance_count,
get_canonical_halacha (principle + instance list).
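Sketch of the lookup-before-insert flow that nearest_canonical_halacha is
meant to support. This is an in-memory stand-in, not the DB function: cosine
similarity, the 0.85 threshold, and every name below are assumptions.

```python
import math
from typing import Dict, List, Optional, Sequence, Tuple


def cosine_similarity(a: Sequence[float], b: Sequence[float]) -> float:
    """Cosine similarity of two vectors; 0.0 if either has zero length."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0
    return dot / (norm_a * norm_b)


def nearest_canonical(
    embedding: Sequence[float],
    canonicals: Dict[int, List[float]],
    threshold: float = 0.85,  # assumed value, not taken from the commit
) -> Optional[Tuple[int, float]]:
    """Return (canonical_id, similarity) of the best match at or above
    the threshold, or None when nothing is close enough."""
    best: Optional[Tuple[int, float]] = None
    for cid, emb in canonicals.items():
        sim = cosine_similarity(embedding, emb)
        if sim >= threshold and (best is None or sim > best[1]):
            best = (cid, sim)
    return best


def lookup_before_insert(
    embedding: Sequence[float],
    canonicals: Dict[int, List[float]],
    threshold: float = 0.85,
) -> Tuple[int, bool]:
    """Reuse an existing canonical if one is near enough, else create one.
    Returns (canonical_id, created)."""
    match = nearest_canonical(embedding, canonicals, threshold)
    if match is not None:
        return match[0], False
    new_id = max(canonicals, default=0) + 1
    canonicals[new_id] = list(embedding)
    return new_id, True
```

The threshold is the knob that trades duplicate canonicals (too high) against
wrongly merged principles (too low).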
Backfill: scripts/backfill_canonical_halachot.py runs as a dry run by default;
pass --apply to execute. It clusters halachot with union-find over
equivalent_halachot pairs, picks a canonical representative per cluster
(corroboration→confidence→earliest), creates the canonicals, and sets
canonical_id + instance_type on all instances.
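Sketch of the clustering and representative selection described above. It is
not the script itself: the field names corroboration, confidence, and
decided_on are hypothetical, and reading "earliest" as the earliest decision
date is an assumption.

```python
from collections import defaultdict


class UnionFind:
    """Disjoint-set structure with path compression."""

    def __init__(self):
        self.parent = {}

    def find(self, x):
        self.parent.setdefault(x, x)
        root = x
        while self.parent[root] != root:
            root = self.parent[root]
        # Path compression: point every node on the path at the root.
        while self.parent[x] != root:
            self.parent[x], x = root, self.parent[x]
        return root

    def union(self, a, b):
        ra, rb = self.find(a), self.find(b)
        if ra != rb:
            self.parent[rb] = ra


def plan_backfill(halachot, equivalent_pairs):
    """Map each halacha id to the id of its cluster's representative.

    halachot: {id: {"corroboration": int, "confidence": float,
                    "decided_on": "YYYY-MM-DD"}}  (hypothetical fields)
    equivalent_pairs: iterable of (id_a, id_b) equivalence links.
    """
    uf = UnionFind()
    for hid in halachot:
        uf.find(hid)  # register singletons so unlinked rows get a cluster
    for a, b in equivalent_pairs:
        uf.union(a, b)

    clusters = defaultdict(list)
    for hid in halachot:
        clusters[uf.find(hid)].append(hid)

    plan = {}
    for members in clusters.values():
        # Highest corroboration, then highest confidence, then earliest date.
        rep = min(
            members,
            key=lambda h: (
                -halachot[h]["corroboration"],
                -halachot[h]["confidence"],
                halachot[h]["decided_on"],
            ),
        )
        for hid in members:
            plan[hid] = rep
    return plan
```

Returning a plan rather than writing rows mirrors the dry-run default: the
mapping can be printed for review before anything is applied.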
Invariants: G2 (equivalent_halachot deprecated post-backfill), INV-G10
(canonical review_status gate), INV-DM7 (authority derived, not stored),
INV-AH (canonical_statement grounded in source statements, pending_synthesis
until chair approves).
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>