feat(graph): centrality + cluster analytics (corpus graph PR B)

The Obsidian "Graph Analysis" equivalent — surfaces influence and structure
beyond raw citation count.

Backend (new web/graph_metrics.py — pure, dependency-free, no DB → G2):
- PageRank (power-iteration), betweenness (Brandes), community (deterministic
  label-propagation + connected-components fallback), computed in-memory over
  the precedent citation subgraph that build_corpus_graph already fetched.
  Normalized 0–1; community ints dense-ranked by size (stable colours).
- GraphNode += pagerank/betweenness/community (None unless metrics=true).
- build_corpus_graph + /api/graph/corpus gain metrics=false (default path
  unchanged). Validated on the live corpus: 147 nodes in 13ms.
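
For orientation, a minimal sketch of power-iteration PageRank followed by 0–1 scaling. It is not the contents of web/graph_metrics.py, which this diff view does not show. The names `pagerank` and `normalize`, the damping factor, the fixed iteration count, and divide-by-max scaling are all assumptions made for illustration.

```python
def pagerank(nodes, edges, damping=0.85, iterations=50):
    """Power-iteration PageRank over directed (source, target) edges.

    Hypothetical helper: the signature and defaults are assumptions,
    not taken from graph_metrics.py.
    """
    n = len(nodes)
    if n == 0:
        return {}
    # Outgoing adjacency, ignoring edges that touch unknown nodes.
    out = {v: [] for v in nodes}
    for s, t in edges:
        if s in out and t in out:
            out[s].append(t)
    rank = {v: 1.0 / n for v in nodes}
    for _ in range(iterations):
        # Rank held by dangling nodes (no outgoing citations) is spread evenly.
        dangling = sum(rank[v] for v in nodes if not out[v])
        nxt = {v: (1.0 - damping) / n + damping * dangling / n for v in nodes}
        for v in nodes:
            if out[v]:
                share = damping * rank[v] / len(out[v])
                for t in out[v]:
                    nxt[t] += share
        rank = nxt
    return rank


def normalize(scores):
    """Scale scores so the largest becomes 1.0 (assumed scaling scheme)."""
    top = max(scores.values(), default=0.0)
    if top <= 0:
        return {k: 0.0 for k in scores}
    return {k: v / top for k, v in scores.items()}


# Two precedents citing a third: the cited one ends up with the top score.
scores = normalize(pagerank(["a", "b", "c"], [("a", "c"), ("b", "c")]))
```

Because everything runs on plain dicts and lists already in memory, a function of this shape needs no database access, which is the property G2 asks for.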

Frontend:
- graph.ts: GraphNode metrics fields + metrics param.
- graph-canvas: color-by (type | practice_area | precedent_level | community |
  recency) and size-by (in-degree | pagerank | betweenness) via colorForNode /
  radiusForNode; exported palettes.
- graph-view: colorBy/sizeBy controls; metrics requested only when needed;
  global metrics overlaid onto neighborhood nodes by id (a node's PageRank
  shouldn't change when focused); a ranking panel (Tabs: המשפיעות / גשרים,
  i.e. "Most influential" / "Bridges"; click → focus); dynamic legend per
  color-by.
- graph-filter-panel: "צביעה לפי" ("Color by") + "גודל נקודה לפי" ("Point
  size by") Selects.

web-ui build + lint pass. Invariants: G2 (metrics pure, no DB writes),
UI2 (model grows on explicit Pydantic). api:types post-deploy.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
2026-06-07 21:04:47 +00:00
parent 106ab53231
commit 2fbc0cd3c2
7 changed files with 497 additions and 19 deletions

@@ -35,6 +35,8 @@ from uuid import UUID
import asyncpg
from pydantic import BaseModel
from web import graph_metrics
# ── Node-type vocabulary ─────────────────────────────────────────────
VALID_NODE_TYPES = {"precedent", "halacha", "topic", "practice_area"}
DEFAULT_NODE_TYPES = ("precedent", "topic", "practice_area")
@@ -63,6 +65,10 @@ class GraphNode(BaseModel):
court: str | None = None # precedents only — for color-by / filter
date: str | None = None # precedents only — ISO date, for recency color/filter
case_law_id: str | None = None # canonical id for deep-link (precedents)
# Graph metrics — populated only when ``metrics=true`` (precedents only).
pagerank: float | None = None # normalized 0–1 (global influence)
betweenness: float | None = None # normalized 0–1 (bridge-ness)
community: int | None = None # dense cluster id, 0 = largest
class GraphFacets(BaseModel):
@@ -243,6 +249,7 @@ async def build_corpus_graph(
district: str = "",
year_from: int = 0,
year_to: int = 0,
metrics: bool = False,
) -> CorpusGraph:
"""Assemble the full corpus graph under the given filters.
@@ -250,6 +257,10 @@ async def build_corpus_graph(
so clipping never hides the structurally important nodes. ``truncated`` +
``total_available`` let the UI prompt the user to narrow filters. All
filters are applied server-side in the WHERE clause (G5).
When ``metrics`` is true, PageRank / betweenness / community are computed
in-memory over the precedent citation subgraph (``graph_metrics``) and
stamped onto precedent nodes — no extra DB work (G2).
"""
types = normalize_node_types(node_types)
cap = max(1, min(int(limit), NODE_CAP_MAX))
@@ -298,6 +309,9 @@ async def build_corpus_graph(
hub_nodes, edges = await _edges_and_hubs(conn, prec_rows, types)
nodes.extend(hub_nodes)
if metrics:
_stamp_metrics(nodes, edges)
return CorpusGraph(
nodes=nodes,
edges=edges,
@@ -306,6 +320,23 @@ async def build_corpus_graph(
)
def _stamp_metrics(nodes: list[GraphNode], edges: list[GraphEdge]) -> None:
    """Compute PageRank/betweenness/community over the precedent subgraph and
    stamp them onto precedent nodes in place (hubs stay ``None``)."""
    prec_ids = [n.id for n in nodes if n.type == "precedent"]
    if not prec_ids:
        return
    directed = [(e.source, e.target) for e in edges if e.type == "cites"]
    undirected = [(e.source, e.target) for e in edges if e.type == "same_chain"]
    m = graph_metrics.compute(prec_ids, directed, undirected)
    for n in nodes:
        mv = m.get(n.id)
        if mv:
            n.pagerank = mv["pagerank"]
            n.betweenness = mv["betweenness"]
            n.community = mv["community"]
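
The `community` value stamped above is described as a dense cluster id with 0 as the largest cluster. A minimal sketch of how such ids could be produced from the connected-components fallback follows. It assumes string node ids; `connected_components` and `dense_rank_by_size` are hypothetical names for illustration and are not taken from graph_metrics.py.

```python
from collections import Counter


def connected_components(nodes, edges):
    """Label every node with the smallest node id in its component.

    Edges are treated as undirected. Hypothetical helper, not the real API.
    """
    adj = {v: set() for v in nodes}
    for s, t in edges:
        if s in adj and t in adj:
            adj[s].add(t)
            adj[t].add(s)
    label = {}
    # Visiting nodes in sorted order makes the labelling deterministic.
    for start in sorted(nodes):
        if start in label:
            continue
        label[start] = start
        stack = [start]
        while stack:
            v = stack.pop()
            for w in adj[v]:
                if w not in label:
                    label[w] = start
                    stack.append(w)
    return label


def dense_rank_by_size(labels):
    """Remap raw labels to 0..k-1, largest community first.

    Ties are broken by the raw label so the same graph always gets the
    same ids (and therefore the same colours).
    """
    sizes = Counter(labels.values())
    order = sorted(sizes, key=lambda lab: (-sizes[lab], lab))
    remap = {lab: i for i, lab in enumerate(order)}
    return {node: remap[lab] for node, lab in labels.items()}


labels = connected_components(
    ["p1", "p2", "p3", "p4", "p5"],
    [("p1", "p2"), ("p3", "p4"), ("p4", "p5")],
)
communities = dense_rank_by_size(labels)
# communities == {"p1": 1, "p2": 1, "p3": 0, "p4": 0, "p5": 0}
```

Ranking by size keeps the palette index of the biggest cluster fixed at 0, so colours do not shuffle when small clusters appear or disappear.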
async def build_node_neighborhood(
pool: asyncpg.Pool,
node_id: str,