Graphs and skeletons¶
ZVF provides two graph-structured geometry types. Use skeleton for
neuronal morphologies, vascular trees, and any other branching structure
that must align to the SWC convention. Use graph for arbitrary
connectivity — vascular networks with anastomoses, synaptic connectivity
graphs embedded in 3-D space, or any structure where cycles are valid.
Both types use the same on-disk array schema; the distinction is the
is_tree flag and the additional SWC-compatible attributes that skeleton
stores.
All examples on this page use only the core zarr-vectors API.
SWC/GraphML converters live in the companion package
zarr-vectors-tools.
Skeletons (SWC-aligned)¶
Write a skeleton programmatically¶
import numpy as np
from zarr_vectors.types.graphs import write_graph
rng = np.random.default_rng(0)
n_nodes = 800
# Simulate a toy skeleton: soma at the origin, followed by a single random walk
positions = np.zeros((n_nodes, 3), dtype=np.float32)
positions[1:] = rng.normal(0, 5, (n_nodes - 1, 3)).cumsum(0)
# Parent array: node 0 is the root (parent = -1); every other node's parent is
# the previous node, so this example is one unbranched chain.
# In a real skeleton, the parent relationships encode the tree topology.
parents = np.arange(-1, n_nodes - 1, dtype=np.int32)
edges = np.column_stack([
    np.arange(1, n_nodes, dtype=np.int32),  # child indices
    parents[1:],                            # parent indices
])
radii = rng.uniform(0.2, 2.0, n_nodes).astype(np.float32)
swc_type = np.full(n_nodes, 3, dtype=np.int32)  # 3 = basal dendrite (standard SWC type code)
swc_type[0] = 1  # soma
write_graph(
    "neuron.zarrvectors",
    positions=positions,
    edges=edges,  # [child, parent] pairs; the root (node 0) appears only as a parent
    chunk_shape=(200.0, 200.0, 200.0),
    geometry_type="skeleton",
    is_tree=True,  # enables tree topology validation at write time
    vertex_attributes={
        "radius": radii,
        "swc_type": swc_type,
    },
)
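The example above writes a single unbranched chain. For test data that actually branches, one option is to attach each node to a randomly chosen earlier node, which always yields a single rooted tree. The following is a minimal NumPy-only sketch; the construction is an illustrative assumption and not part of zarr-vectors. The resulting `positions` and `edges` have the same shapes and `[child, parent]` layout as the arrays passed to `write_graph` above.

```python
import numpy as np

rng = np.random.default_rng(0)
n_nodes = 800

# Node i >= 1 picks its parent uniformly from the earlier nodes 0..i-1,
# so the structure is connected, acyclic, and rooted at node 0.
parents = np.full(n_nodes, -1, dtype=np.int32)
parents[1:] = rng.integers(0, np.arange(1, n_nodes)).astype(np.int32)

# [child, parent] rows; the root has no row because it has no parent.
children = np.arange(1, n_nodes, dtype=np.int32)
edges = np.column_stack([children, parents[1:]])

# Each child sits one random step away from its parent.
steps = rng.normal(0, 5, (n_nodes, 3)).astype(np.float32)
positions = np.zeros((n_nodes, 3), dtype=np.float32)
for i in range(1, n_nodes):
    positions[i] = positions[parents[i]] + steps[i]

# Nodes with more than one child are branch points.
n_branch_points = int((np.bincount(parents[1:], minlength=n_nodes) > 1).sum())
```

Because every parent index is smaller than its child index, the nodes are already in a root-first order.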
Spatial query on a skeleton¶
import numpy as np
from zarr_vectors.types.graphs import read_graph
# All nodes within a 100 µm × 100 µm × 100 µm region
result = read_graph(
    "neuron.zarrvectors",
    bbox=(np.array([0., 0., 0.]), np.array([100., 100., 100.])),
)
print(result["node_count"])
print(result["edge_count"])  # edges where both endpoints are in bbox
To include edges that cross the bbox boundary:
result = read_graph(
    "neuron.zarrvectors",
    bbox=(np.array([0., 0., 0.]), np.array([100., 100., 100.])),
    include_boundary_edges=True,
)
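To make the distinction between interior and boundary edges concrete, here is a small NumPy sketch that classifies edges against a bounding box. It only illustrates the semantics described above; the helper `split_edges_by_bbox` is hypothetical, and treating the box bounds as inclusive is an assumption rather than documented library behaviour.

```python
import numpy as np

def split_edges_by_bbox(positions, edges, bbox_min, bbox_max):
    """Split edges into interior (both endpoints inside) and boundary (exactly one inside)."""
    inside = np.all((positions >= bbox_min) & (positions <= bbox_max), axis=1)
    a_in = inside[edges[:, 0]]
    b_in = inside[edges[:, 1]]
    interior = edges[a_in & b_in]
    boundary = edges[a_in ^ b_in]  # XOR: one endpoint in, one out
    return interior, boundary

positions = np.array(
    [[10, 10, 10], [50, 50, 50], [150, 50, 50], [300, 300, 300]],
    dtype=np.float32,
)
edges = np.array([[1, 0], [2, 1], [3, 2]], dtype=np.int32)

interior, boundary = split_edges_by_bbox(
    positions, edges, np.array([0., 0., 0.]), np.array([100., 100., 100.])
)
```

Here the edge between nodes 1 and 0 is interior, the edge between nodes 2 and 1 crosses the boundary, and the edge between nodes 3 and 2 lies entirely outside the box.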
Multi-skeleton stores¶
For connectome-scale datasets with thousands of neurons, a single ZVF
store is far more efficient than per-neuron files. SWC-directory ingest
and SWC ID-mapping helpers live in zarr-vectors-tools.
Read a specific neuron¶
from zarr_vectors.types.graphs import read_graph
result = read_graph("connectome.zarrvectors", object_ids=[42])
print(result["node_count"]) # nodes in neuron 42
print(result["edge_count"])
print(result["attributes"]["radius"])
# Arrays needed to reconstruct SWC records
nodes = result["positions"]  # (N, 3)
radii = result["attributes"]["radius"]
types = result["attributes"]["swc_type"]
edges = result["edges"]  # (E, 2) [child, parent]
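As a rough illustration of how these arrays map onto SWC records, the sketch below formats them as standard seven-column SWC lines (`id type x y z radius parent`). The helper `to_swc_rows` is hypothetical and not part of zarr-vectors, and it assumes the edge indices are 0-based positions into the returned arrays.

```python
import numpy as np

def to_swc_rows(positions, radii, swc_type, edges):
    """Format arrays as SWC text lines; `edges` holds [child, parent] index pairs."""
    n = len(positions)
    parent = np.full(n, -1, dtype=np.int64)
    parent[edges[:, 0]] = edges[:, 1]
    rows = []
    for i in range(n):
        x, y, z = (float(v) for v in positions[i])
        # SWC sample IDs are 1-based; the root keeps parent -1.
        p = int(parent[i]) + 1 if parent[i] >= 0 else -1
        rows.append(
            f"{i + 1} {int(swc_type[i])} {x:.3f} {y:.3f} {z:.3f} {float(radii[i]):.3f} {p}"
        )
    return rows

positions = np.array([[0, 0, 0], [1, 0, 0], [1, 2, 0]], dtype=np.float32)
radii = np.array([2.0, 0.5, 0.5], dtype=np.float32)
swc_type = np.array([1, 3, 3], dtype=np.int32)
edges = np.array([[1, 0], [2, 1]], dtype=np.int32)

rows = to_swc_rows(positions, radii, swc_type, edges)
```

Joining `rows` with newlines gives text that SWC readers expecting 1-based IDs and a root parent of -1 can load.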
Spatial query in a connectome store¶
import numpy as np
from zarr_vectors.types.graphs import read_graph
# Which neurons have processes in a given region?
result = read_graph(
    "connectome.zarrvectors",
    bbox=(np.array([2000., 3000., 1500.]),
          np.array([2500., 3500., 2000.])),
    return_object_ids=True,
)
print(result["object_ids"]) # IDs of neurons with nodes in region
print(result["node_count"]) # total nodes in the bbox across all neurons
General graphs¶
Writing a graph¶
import numpy as np
from zarr_vectors.types.graphs import write_graph
rng = np.random.default_rng(0)
n_nodes = 2000
positions = rng.uniform(0, 500, (n_nodes, 3)).astype(np.float32)
# Simulate a vascular network: ~3000 random edges (mean degree about 3)
src = rng.integers(0, n_nodes, 3000)
dst = rng.integers(0, n_nodes, 3000)
# Remove self-loops
mask = src != dst
edges = np.column_stack([src[mask], dst[mask]]).astype(np.int32)
write_graph(
    "vessels.zarrvectors",
    positions=positions,
    edges=edges,
    chunk_shape=(100.0, 100.0, 100.0),
    bin_shape=(25.0, 25.0, 25.0),
    is_directed=False,
    is_tree=False,
    vertex_attributes={
        "diameter": rng.uniform(1, 20, n_nodes).astype(np.float32),
        "flow": rng.uniform(0, 1, n_nodes).astype(np.float32),
    },
)
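Randomly drawn edge lists like the one above can contain repeated pairs, and for an undirected graph `(a, b)` and `(b, a)` describe the same edge. Whether `write_graph` deduplicates such rows is not stated here, so the following NumPy sketch shows one way to canonicalise an undirected edge list before writing; it uses no zarr-vectors calls.

```python
import numpy as np

rng = np.random.default_rng(0)
n_nodes = 2000
src = rng.integers(0, n_nodes, 3000)
dst = rng.integers(0, n_nodes, 3000)

raw = np.column_stack([src, dst])
raw = raw[raw[:, 0] != raw[:, 1]]  # drop self-loops

# Sort each row so (a, b) and (b, a) become the same pair, then drop duplicate rows.
canonical = np.unique(np.sort(raw, axis=1), axis=0).astype(np.int32)
```

This step is only appropriate for undirected graphs; for a directed graph the row order carries meaning and must be preserved.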
Writing a directed graph¶
write_graph(
    "directed.zarrvectors",
    positions=positions,
    edges=edges,  # [source, destination]; direction is significant
    chunk_shape=(100., 100., 100.),
    is_directed=True,
)
Reading a graph¶
from zarr_vectors.types.graphs import read_graph
result = read_graph("vessels.zarrvectors")
print(result["node_count"]) # 2000
print(result["edge_count"])
print(result["positions"].shape) # (2000, 3)
print(result["edges"].shape) # (E, 2)
print(result["attributes"]["diameter"].shape) # (2000,)
GraphML ingest¶
GraphML conversion lives in the companion package
zarr-vectors-tools.
Validation¶
from zarr_vectors.validate import validate
result = validate("neuron.zarrvectors", level=4)
print(result.summary())
# Level 4 validation: PASS
# 38 passed, 0 warnings, 0 errors
Level 4 validation includes tree topology checks (for stores written with
is_tree=True): the graph must be connected and acyclic, with exactly one
root vertex per component.
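The same three properties can be checked on raw arrays before writing. The sketch below is a dependency-free illustration using union-find; `tree_report` is a hypothetical helper, not the validator's implementation. If the edges are acyclic and no vertex has more than one parent, each component necessarily has exactly one root.

```python
import numpy as np

def tree_report(n_nodes, edges):
    """Check [child, parent] edges for cycles, component count, and unique parents."""
    root_of = list(range(n_nodes))

    def find(v):
        # Walk up to the set representative, compressing the path as we go.
        while root_of[v] != v:
            root_of[v] = root_of[root_of[v]]
            v = root_of[v]
        return v

    acyclic = True
    for child, parent in edges.tolist():
        a, b = find(child), find(parent)
        if a == b:
            acyclic = False  # this edge closes a cycle
        else:
            root_of[a] = b
    components = len({find(v) for v in range(n_nodes)})
    # Every child index may appear at most once, i.e. each vertex has one parent.
    single_parent = len(np.unique(edges[:, 0])) == len(edges)
    return {"acyclic": acyclic, "components": components, "single_parent": single_parent}

chain = np.array([[1, 0], [2, 1], [3, 2]], dtype=np.int32)
with_cycle = np.vstack([chain, [[0, 3]]])

ok = tree_report(4, chain)
bad = tree_report(4, with_cycle)
```

For a single skeleton, a valid result has `acyclic` and `single_parent` both true and `components` equal to 1.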
Multi-resolution pyramids¶
Graph pyramids coarsen vertex positions and deduplicate edges:
from zarr_vectors.multiresolution.coarsen import build_pyramid
build_pyramid(
    "vessels.zarrvectors",
    factors=[(2.0, 1.00)],
    agg_mode="mean",  # 0.4+: one global aggregation mode; per-attribute modes need a manual coarsen_level call
)
For skeleton stores with object_sparsity < 1.0, individual neurons are
thinned at coarser levels using the declared sparsity strategy.
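To show what "coarsen vertex positions and deduplicate edges" means in practice, here is a NumPy sketch of one simple scheme: vertices are grouped by spatial bin, each group is replaced by its mean position, and edges are remapped to the groups. This is an illustrative assumption about the general technique, not the algorithm `build_pyramid` uses; `coarsen_graph` and `bin_size` are hypothetical names.

```python
import numpy as np

def coarsen_graph(positions, edges, bin_size):
    """Merge vertices that share a spatial bin; return mean positions and unique edges."""
    bins = np.floor(positions / bin_size).astype(np.int64)
    _, group = np.unique(bins, axis=0, return_inverse=True)
    group = group.reshape(-1)  # inverse shape differs across NumPy versions
    n_groups = int(group.max()) + 1

    # Mean position per group.
    sums = np.zeros((n_groups, 3), dtype=np.float64)
    np.add.at(sums, group, positions)
    counts = np.bincount(group, minlength=n_groups)
    coarse_positions = (sums / counts[:, None]).astype(np.float32)

    # Remap endpoints, drop edges collapsed into one vertex, deduplicate (undirected).
    remapped = group[edges]
    remapped = remapped[remapped[:, 0] != remapped[:, 1]]
    coarse_edges = np.unique(np.sort(remapped, axis=1), axis=0)
    return coarse_positions, coarse_edges

positions = np.array(
    [[1, 1, 1], [2, 2, 2], [11, 1, 1], [12, 2, 2]], dtype=np.float32
)
edges = np.array([[0, 1], [1, 2], [2, 3], [3, 1]], dtype=np.int64)

coarse_positions, coarse_edges = coarsen_graph(positions, edges, bin_size=10.0)
```

In this toy input, four vertices collapse into two and the four original edges reduce to a single edge between the two coarse vertices. Sorting each edge row treats the graph as undirected; a directed graph would skip that step.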
Common pitfalls¶
is_tree=True raises an error at write time.
The writer validates tree topology when is_tree=True. If your data contains
cycles, including cycles that arise when duplicate floating-point positions
produce coincident vertices, the write fails. Check for cycles with:
import networkx as nx
G = nx.from_edgelist(edges.tolist())
print(nx.is_forest(G))  # True when every connected component is a tree (no cycles)
Cross-chunk edges in a graph are not in links/<delta>/.
Edges between vertices in different chunks are stored in
cross_chunk_links/, not in the per-chunk links/<delta>/ array. When
reading a specific region, pass include_boundary_edges=True to include
these inter-chunk connections.
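Which edges count as cross-chunk follows from the chunk each endpoint falls into. The sketch below illustrates the idea with NumPy by assigning every vertex an integer chunk index and comparing the two endpoints of each edge. It assumes chunks are axis-aligned and anchored at the origin; `split_by_chunk` is a hypothetical helper and not the library's storage code.

```python
import numpy as np

def split_by_chunk(positions, edges, chunk_shape):
    """Return (intra_chunk_edges, cross_chunk_edges) for axis-aligned chunks."""
    chunk_idx = np.floor(positions / np.asarray(chunk_shape)).astype(np.int64)
    same = np.all(chunk_idx[edges[:, 0]] == chunk_idx[edges[:, 1]], axis=1)
    return edges[same], edges[~same]

positions = np.array(
    [[10, 10, 10], [90, 90, 90], [110, 90, 90]], dtype=np.float32
)
edges = np.array([[0, 1], [1, 2]], dtype=np.int32)

intra, cross = split_by_chunk(positions, edges, (100.0, 100.0, 100.0))
```

Nodes 0 and 1 share a chunk, so their edge is intra-chunk; node 2 lies in the next chunk along x, so the edge from node 1 to node 2 is a cross-chunk edge.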
SWC parent ID −1 vs 0.
Some SWC tools use parent ID 0 (1-indexed) for the root; others use
-1 (ZVF convention). ingest_swc detects the convention automatically
from the first data row. If your SWC uses a non-standard root convention,
pass root_parent_id=<value> to override.
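If you parse SWC text yourself, the root convention can be normalised by hand. The sketch below reads standard seven-column SWC rows with NumPy and converts parent sample IDs to 0-based indices with -1 for the root. `parse_swc` is a hypothetical helper written for illustration; it is not `ingest_swc`, and its `root_parent_id` argument merely mirrors the option named above.

```python
import io
import numpy as np

swc_text = """# id type x y z radius parent
1 1 0.0 0.0 0.0 5.0 -1
2 3 4.0 0.0 0.0 1.0 1
3 3 8.0 1.0 0.0 0.8 2
"""

def parse_swc(text, root_parent_id=-1):
    """Return (positions, parents) with parents as 0-based row indices, root = -1."""
    table = np.loadtxt(io.StringIO(text), comments="#", ndmin=2)
    ids = table[:, 0].astype(np.int64)
    parent_ids = table[:, 6].astype(np.int64)

    # Map SWC sample IDs to row indices; IDs need not be contiguous or sorted.
    order = np.argsort(ids)
    is_root = parent_ids == root_parent_id
    parents = np.full(len(ids), -1, dtype=np.int64)
    lookup = np.searchsorted(ids, parent_ids[~is_root], sorter=order)
    parents[~is_root] = order[lookup]
    return table[:, 2:5].astype(np.float32), parents

positions, parents = parse_swc(swc_text)
```

For a file that marks the root with parent ID 0, calling `parse_swc(text, root_parent_id=0)` produces the same `parents` array.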
Object IDs change after rechunking.
Object IDs are assigned at write time and are stable across reads on the
same store. However, rechunking rebuilds the object_index/ and may
reassign IDs. If you need stable long-term IDs (e.g. for a connectome
database), store the canonical ID as a per-object attribute:
write_graph(..., object_attributes={"neuron_id": canonical_ids})