feat(graph): centrality + cluster analytics (corpus graph PR B)
The Obsidian "Graph Analysis" equivalent: surfaces influence and structure beyond raw citation count.

Backend (new web/graph_metrics.py; pure, dependency-free, no DB → G2):
- PageRank (power-iteration), betweenness (Brandes), community (deterministic label-propagation + connected-components fallback), computed in-memory over the precedent citation subgraph that build_corpus_graph already fetched. Normalized 0–1; community ints dense-ranked by size (stable colours).
- GraphNode += pagerank/betweenness/community (None unless metrics=true).
- build_corpus_graph + /api/graph/corpus gain metrics=false (default path unchanged).

Validated on the live corpus: 147 nodes in 13ms.

Frontend:
- graph.ts: GraphNode metrics fields + metrics param.
- graph-canvas: color-by (type | practice_area | precedent_level | community | recency) and size-by (in-degree | pagerank | betweenness) via colorForNode / radiusForNode; exported palettes.
- graph-view: colorBy/sizeBy controls; metrics requested only when needed; global metrics overlaid onto neighborhood nodes by id (a node's PageRank shouldn't change when focused); a ranking panel (Tabs: "Influential" (המשפיעות) / "Bridges" (גשרים), click → focus); dynamic legend per color-by.
- graph-filter-panel: "Color by" (צביעה לפי) + "Point size by" (גודל נקודה לפי) Selects.

web-ui build + lint pass.

Invariants: G2 (metrics pure, no DB writes), UI2 (model grows on explicit Pydantic). api:types post-deploy.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
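The commit message names power-iteration PageRank normalized to 0–1, but web/graph_metrics.py itself is not part of the diff shown here. The following is only a minimal sketch of that technique in dependency-free Python. The function name `pagerank_sketch`, the damping factor, the iteration count, and the choice to normalize by dividing by the maximum score are all assumptions, not the module's actual code.

```python
def pagerank_sketch(nodes, edges, damping=0.85, iterations=50):
    """Power-iteration PageRank over directed (source, target) edges.

    Hypothetical helper: parameters and max-normalization are assumptions.
    Returns {node_id: score} with the top node scaled to 1.0.
    """
    if not nodes:
        return {}
    node_set = set(nodes)
    out_links = {n: [] for n in nodes}
    for src, dst in edges:
        # Ignore edges that leave the node set, and self-loops.
        if src in node_set and dst in node_set and src != dst:
            out_links[src].append(dst)

    n = len(nodes)
    rank = {node: 1.0 / n for node in nodes}
    for _ in range(iterations):
        # Rank held by nodes without out-links is spread evenly over all nodes.
        dangling = sum(rank[v] for v in nodes if not out_links[v])
        base = (1.0 - damping) / n + damping * dangling / n
        nxt = {node: base for node in nodes}
        for src, targets in out_links.items():
            if targets:
                share = damping * rank[src] / len(targets)
                for dst in targets:
                    nxt[dst] += share
        rank = nxt

    # Assumed normalization: scale so the most influential node is 1.0.
    top = max(rank.values())
    return {node: value / top for node, value in rank.items()}


# Three precedents all cite "c", so "c" should come out on top.
scores = pagerank_sketch(["a", "b", "c", "d"], [("a", "c"), ("b", "c"), ("d", "c")])
```

A fixed iteration count keeps the sketch deterministic; a convergence threshold would be a reasonable alternative.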
@@ -35,6 +35,8 @@ from uuid import UUID
import asyncpg
from pydantic import BaseModel

from web import graph_metrics

# ── Node-type vocabulary ─────────────────────────────────────────────
VALID_NODE_TYPES = {"precedent", "halacha", "topic", "practice_area"}
DEFAULT_NODE_TYPES = ("precedent", "topic", "practice_area")
@@ -63,6 +65,10 @@ class GraphNode(BaseModel):
    court: str | None = None  # precedents only — for color-by / filter
    date: str | None = None  # precedents only — ISO date, for recency color/filter
    case_law_id: str | None = None  # canonical id for deep-link (precedents)
    # Graph metrics — populated only when ``metrics=true`` (precedents only).
    pagerank: float | None = None  # normalized 0–1 (global influence)
    betweenness: float | None = None  # normalized 0–1 (bridge-ness)
    community: int | None = None  # dense cluster id, 0 = largest


class GraphFacets(BaseModel):
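The `community` field above is documented as a dense cluster id with 0 as the largest cluster, and the commit message says the ids are dense-ranked by size for stable colours. As a rough illustration only, one way to turn raw labels into such ids is sketched below. The helper name `dense_rank_communities` and the tie-break on the label's string form are assumptions; the real logic lives in graph_metrics, which is not shown in this diff.

```python
from collections import Counter


def dense_rank_communities(labels):
    """Map raw community labels to dense ints, 0 = largest community.

    Hypothetical helper: `labels` is {node_id: raw_label}.
    """
    sizes = Counter(labels.values())
    # Largest community first; ties broken on the label text so the
    # resulting ids do not depend on dict iteration order.
    ordered = sorted(sizes, key=lambda lab: (-sizes[lab], str(lab)))
    rank = {lab: i for i, lab in enumerate(ordered)}
    return {node: rank[lab] for node, lab in labels.items()}


raw = {"a": "x", "b": "x", "c": "y", "d": "x", "e": "z", "f": "z"}
dense = dense_rank_communities(raw)
```

Because the ids depend only on community sizes and a fixed tie-break, the same clustering yields the same ids whatever order the nodes arrive in.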
@@ -243,6 +249,7 @@ async def build_corpus_graph(
    district: str = "",
    year_from: int = 0,
    year_to: int = 0,
    metrics: bool = False,
) -> CorpusGraph:
    """Assemble the full corpus graph under the given filters.

@@ -250,6 +257,10 @@ async def build_corpus_graph(
    so clipping never hides the structurally important nodes. ``truncated`` +
    ``total_available`` let the UI prompt the user to narrow filters. All
    filters are applied server-side in the WHERE clause (G5).

    When ``metrics`` is true, PageRank / betweenness / community are computed
    in-memory over the precedent citation subgraph (``graph_metrics``) and
    stamped onto precedent nodes — no extra DB work (G2).
    """
    types = normalize_node_types(node_types)
    cap = max(1, min(int(limit), NODE_CAP_MAX))
@@ -298,6 +309,9 @@ async def build_corpus_graph(
    hub_nodes, edges = await _edges_and_hubs(conn, prec_rows, types)
    nodes.extend(hub_nodes)

    if metrics:
        _stamp_metrics(nodes, edges)

    return CorpusGraph(
        nodes=nodes,
        edges=edges,
@@ -306,6 +320,23 @@ async def build_corpus_graph(
    )


def _stamp_metrics(nodes: list[GraphNode], edges: list[GraphEdge]) -> None:
    """Compute PageRank/betweenness/community over the precedent subgraph and
    stamp them onto precedent nodes in place (hubs stay ``None``)."""
    prec_ids = [n.id for n in nodes if n.type == "precedent"]
    if not prec_ids:
        return
    directed = [(e.source, e.target) for e in edges if e.type == "cites"]
    undirected = [(e.source, e.target) for e in edges if e.type == "same_chain"]
    m = graph_metrics.compute(prec_ids, directed, undirected)
    for n in nodes:
        mv = m.get(n.id)
        if mv:
            n.pagerank = mv["pagerank"]
            n.betweenness = mv["betweenness"]
            n.community = mv["community"]


async def build_node_neighborhood(
    pool: asyncpg.Pool,
    node_id: str,
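The `_stamp_metrics` hunk above relies on `graph_metrics.compute` returning a mapping from node id to a dict with `pagerank`, `betweenness` and `community` keys. To make that contract concrete, here is a small self-contained sketch. `Node`, `Edge`, `fake_compute` and the `"in_topic"` edge type are hypothetical stand-ins, and the stub scores by in-degree rather than running the real algorithms.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Node:  # hypothetical stand-in for GraphNode
    id: str
    type: str
    pagerank: Optional[float] = None
    betweenness: Optional[float] = None
    community: Optional[int] = None


@dataclass
class Edge:  # hypothetical stand-in for GraphEdge
    source: str
    target: str
    type: str


def fake_compute(ids, directed, undirected):
    """Stub with the return shape the diff relies on: id -> metric dict.

    Uses normalized in-degree as a placeholder score, not real PageRank.
    """
    in_degree = {i: 0 for i in ids}
    for _, dst in directed:
        if dst in in_degree:
            in_degree[dst] += 1
    top = max(in_degree.values()) or 1
    return {
        i: {"pagerank": in_degree[i] / top, "betweenness": 0.0, "community": 0}
        for i in ids
    }


def stamp(nodes, edges):
    """Same flow as _stamp_metrics, wired to the stub above."""
    prec_ids = [n.id for n in nodes if n.type == "precedent"]
    if not prec_ids:
        return
    directed = [(e.source, e.target) for e in edges if e.type == "cites"]
    undirected = [(e.source, e.target) for e in edges if e.type == "same_chain"]
    metrics = fake_compute(prec_ids, directed, undirected)
    for n in nodes:
        mv = metrics.get(n.id)
        if mv:
            n.pagerank = mv["pagerank"]
            n.betweenness = mv["betweenness"]
            n.community = mv["community"]


p1, p2, t1 = Node("p1", "precedent"), Node("p2", "precedent"), Node("t1", "topic")
edges = [Edge("p1", "p2", "cites"), Edge("p1", "t1", "in_topic")]
stamp([p1, p2, t1], edges)
```

Only precedent ids are passed to the compute step, so the topic hub `t1` never appears in the result and its metric fields stay `None`, matching the docstring in the diff.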