zarr_vectors

Top-level package. Importing zarr_vectors exposes the most commonly used constants and a convenience validate entry-point.

zarr-vectors: Python utilities for the Zarr Vectors (ZV) format.

Cloud-native storage for points, lines, streamlines, graphs, and meshes built on Zarr v3.

class zarr_vectors.FsGroup(path, *, create=False)[source]¶

Bases: Group

Backwards-compatible subclass of Group rooted at a local path.

Deprecated: Direct use of FsGroup is deprecated. New code should call create_store() / open_store().

Parameters:

path (str | Path) – Filesystem path or pathlib.Path. A file:// URL is also accepted.
create (bool) – If True, create the directory if it does not already exist. If False, raise StoreError if the directory is missing (matching the legacy behaviour).

__init__(path, *, create=False)[source]¶

Parameters:

path (str | Path)
create (bool)

Return type:

None

class zarr_vectors.Group(zarr_group)[source]¶

Bases: object

A ZV group wrapping an underlying zarr.Group.

Parameters:: zarr_group (zarr.Group)

__init__(zarr_group)[source]¶

Parameters:: zarr_group (Group)
Return type:: None

array_exists(array_name)[source]¶

Parameters:: array_name (str)
Return type:: bool

property attrs: _Attrs¶

property backend: _BackendShim¶

batched_reads(plan)[source]¶

Prefetch every chunk in plan via one asyncio.gather() and serve subsequent read_bytes() calls from the resulting in-memory cache.

plan is a list of (array_name, [chunk_keys, ...]) pairs — typically (VERTICES, list_chunk_keys(group, VERTICES)) plus the parallel vertex_fragments and per-attribute arrays. On entry every (array_name, chunk_key) pair is fetched in a single async gather; on exit the cache is dropped.

Reads for a key NOT in the plan fall through to the sync read_bytes() path, so under-specifying the plan degrades performance gracefully (still correct).

Use for chunk-heavy read loops against high-latency object stores (GCS / S3 / Azure). Each per-chunk GET becomes one async task instead of one serial sync call, so the total wall time approaches one round-trip rather than N round-trips.

Nesting is not supported and raises StoreError. Writes inside the block are unaffected.

Example:

chunk_keys = list_chunk_keys(level_group, VERTICES)
with level_group.batched_reads([
    (VERTICES, chunk_keys),
    (VERTEX_FRAGMENTS, chunk_keys),
    *((f"{VERTEX_ATTRIBUTES}/{a}", chunk_keys) for a in attrs),
]):
    for cc in chunk_keys:
        fragments = read_chunk_vertices(level_group, cc, ...)

Parameters:: plan (list[tuple[str, list[str]]])
Return type:: Iterator[None]
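
The prefetch-then-serve behaviour described above can be sketched with a toy class. This is only an illustration under stated assumptions: ToyGroup, its in-memory blob table, and the sync_reads counter are hypothetical and are not part of zarr_vectors; asyncio.sleep(0) stands in for one network round-trip.

```python
import asyncio
from contextlib import contextmanager


class ToyGroup:
    """Hypothetical stand-in for a group; not the zarr_vectors implementation."""

    def __init__(self, blobs):
        self._blobs = dict(blobs)  # (array_name, chunk_key) -> bytes
        self._cache = None         # populated only inside batched_reads()
        self.sync_reads = 0        # counts reads that bypassed the cache

    async def _fetch(self, key):
        await asyncio.sleep(0)     # stands in for one GET round-trip
        return key, self._blobs[key]

    async def _prefetch(self, keys):
        # One gather over every planned (array_name, chunk_key) pair.
        return dict(await asyncio.gather(*(self._fetch(k) for k in keys)))

    @contextmanager
    def batched_reads(self, plan):
        if self._cache is not None:
            raise RuntimeError("nesting is not supported")
        keys = [(name, ck) for name, chunk_keys in plan for ck in chunk_keys]
        self._cache = asyncio.run(self._prefetch(keys))
        try:
            yield
        finally:
            self._cache = None     # the cache is dropped on exit

    def read_bytes(self, array_name, chunk_key):
        key = (array_name, chunk_key)
        if self._cache is not None and key in self._cache:
            return self._cache[key]
        self.sync_reads += 1       # fall-through: still correct, just slower
        return self._blobs[key]


group = ToyGroup({
    ("vertices", "0.0.0"): b"a",
    ("vertices", "0.0.1"): b"b",
    ("links", "0.0.0"): b"c",
})
with group.batched_reads([("vertices", ["0.0.0", "0.0.1"])]):
    hits = [group.read_bytes("vertices", k) for k in ("0.0.0", "0.0.1")]
    miss = group.read_bytes("links", "0.0.0")  # not in the plan
```

The read of links/0.0.0 is served by the ordinary synchronous path because it was left out of the plan, which mirrors the graceful degradation noted above.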

batched_writes(compressor=None)[source]¶

Defer every write_bytes() and write_array_meta() call inside the block and flush them in a single asyncio.gather() on exit.

Use for chunk-heavy write loops against high-latency object stores (GCS / S3 / Azure). Each per-chunk PUT and each per-array zarr.json PUT becomes one async task instead of one serial sync call, so the total wall time approaches one round-trip rather than N round-trips.

Parameters:: compressor (Any) – Codec selection applied to every chunk array written inside the block. See zarr_vectors.encoding.compression.resolve_compressor() for accepted values; the default None resolves to zarr v3’s default (bytes + zstd). Every chunk currently gets the same codec; a per-array-type codec mapping (vertices vs fragments vs links) is future work.
Return type:: Iterator[None]

Nesting is not supported and raises StoreError. Reads inside the block are unaffected and execute synchronously.

Example:

with level_group.batched_writes():
    create_vertices_array(level_group, dtype="float32")
    create_attribute_array(level_group, "intensity")
    for cc in chunk_coords:
        write_chunk_vertices(level_group, cc, ...)
        write_chunk_attributes(level_group, "intensity", cc, ...)
# exit point: every PUT scheduled above flushes in parallel
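
The defer-and-flush pattern can be sketched independently of the library. ToyWriter below is hypothetical; it queues writes inside the block and flushes them with one asyncio.gather() on a clean exit. Discarding the queue when the block raises is an assumption of this sketch, not documented behaviour.

```python
import asyncio
from contextlib import contextmanager


class ToyWriter:
    """Hypothetical stand-in; not the zarr_vectors implementation."""

    def __init__(self):
        self.store = {}      # key -> bytes, stands in for an object store
        self._queue = None   # list of (key, data) while a batch is open

    async def _put(self, key, data):
        await asyncio.sleep(0)  # stands in for one PUT round-trip
        self.store[key] = data

    async def _flush(self, queue):
        await asyncio.gather(*(self._put(k, d) for k, d in queue))

    @contextmanager
    def batched_writes(self):
        if self._queue is not None:
            raise RuntimeError("nesting is not supported")
        self._queue = []
        try:
            yield
            asyncio.run(self._flush(self._queue))  # flush only on clean exit
        finally:
            self._queue = None

    def write_bytes(self, array_name, chunk_key, data):
        key = f"{array_name}/{chunk_key}"
        if self._queue is not None:
            self._queue.append((key, data))  # deferred until block exit
        else:
            self.store[key] = data           # immediate outside a batch


writer = ToyWriter()
with writer.batched_writes():
    writer.write_bytes("vertices", "0.0.0", b"abc")
    pending_inside = dict(writer.store)  # nothing has been written yet
```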

children()[source]¶

Return every immediate child name — both arrays and groups.

Use this when iterating a parent path that may contain a mix of Option-G chunk-group children (vertex / link attributes) and flat-array children (object / group attributes after the 0.8.1 migration). __iter__ deliberately yields groups only to preserve back-compat for callers that predate the migration.

Return type:: list[str]

chunk_exists(array_name, chunk_key)[source]¶

Parameters:

array_name (str)
chunk_key (str)

Return type:

bool

create_group(name, **_kwargs)[source]¶

Parameters:

name (str)
_kwargs (Any)

Return type:

Group

create_sharded_chunk_array(array_name, grid_shape, *, shard_shape=None, attributes=None)[source]¶

Allocate a sharded vlen-bytes Zarr array for per-chunk blobs.

Replaces the legacy “Option G” group-of-tiny-arrays layout with a single Zarr v3 array whose shape equals the level’s chunk grid. Each cell holds one ZVF spatial chunk’s payload. When shard_shape is provided the sharding_indexed codec packs many cells into a single storage object — the standard cloud layout described in Sharding.

Parameters:

array_name (str) – Logical path of the array, e.g. "vertices" or "links/0". Intermediate path segments become Zarr sub-groups if absent.
grid_shape (tuple[int, ...]) – Number of chunks along each spatial axis at this level. Computed via zarr_vectors.spatial.chunking.compute_grid_shape().
shard_shape (tuple[int, ...] | None) – Outer-chunk shape in inner-chunk units. None (default) creates an unsharded array — one storage object per ZVF chunk, same I/O cost as the legacy layout but already in the new structural form. A typical cloud workload uses (8, 8, 8): 512 ZVF chunks per storage object.
attributes (dict[str, Any] | None) – Per-array metadata merged into the array’s zarr.json attributes block (the standard Zarr v3 location for user metadata).

Return type:

None
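
The shard arithmetic behind the "(8, 8, 8): 512 ZVF chunks per storage object" figure can be checked with a short sketch. The helper names are hypothetical, and counting partially filled edge shards as whole storage objects is an assumption; the result is best read as an upper bound, since empty shards may never be written.

```python
import math


def chunks_per_shard(shard_shape):
    # Number of inner chunks packed into one shard.
    return math.prod(shard_shape)


def storage_object_count(grid_shape, shard_shape=None):
    # Unsharded: one storage object per chunk in the grid.
    if shard_shape is None:
        return math.prod(grid_shape)
    if len(shard_shape) != len(grid_shape):
        raise ValueError("shard_shape and grid_shape must have equal length")
    # Sharded: round up along each axis so edge shards are counted.
    return math.prod(math.ceil(g / s) for g, s in zip(grid_shape, shard_shape))


per_shard = chunks_per_shard((8, 8, 8))
sharded = storage_object_count((20, 16, 9), (8, 8, 8))
unsharded = storage_object_count((20, 16, 9))
```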

delete_subtree(name)[source]¶

Parameters:: name (str)
Return type:: None

list_chunks(array_name)[source]¶

Parameters:: array_name (str)
Return type:: list[str]

native_sharded_arrays(shard_shape, grid_shape)[source]¶

Make subsequent per-chunk-array creations native-sharded.

Inside this block, every zarr_vectors.core.arrays._ensure_array_dir() call for a per-spatial-chunk array (vertices, vertex_fragments, links/<delta>, link_fragments, vertex_attributes/<name>, fragment_attributes/<name>) allocates a single multidim vlen-bytes Zarr array at that path using Zarr v3’s sharding_indexed codec — instead of the legacy Option-G group-of-tiny-arrays. Subsequent write_bytes() calls go through the grid-coord write dispatch on this class, so callers don’t need to change.

Parameters:

shard_shape (tuple[int, ...]) – Outer-chunk shape in inner-chunk units, e.g. (8, 8, 8). Length must match grid_shape.
grid_shape (tuple[int, ...]) – Number of spatial chunks along each axis at this level. Compute via zarr_vectors.spatial.chunking.compute_grid_shape().

Return type:

Iterator[None]

Nesting is not supported; an inner call raises StoreError. The legacy batched_writes context can be active concurrently — writes still route through the sharded path (the batched queue’s flush is a no-op when nothing is queued).

property path: Path¶

property prefix: str¶

read_array(path)[source]¶

Read a chunked Zarr array at path as a numpy array.

Parameters:: path (str)
Return type:: ndarray

read_array_attrs(path)[source]¶

Read the attributes block of a Zarr array at path.

Returns {} for missing paths or paths that point at a group rather than an array.

Parameters:: path (str)
Return type:: dict[str, Any]

read_array_fill_value(path)[source]¶

Return the fill_value of the standard Zarr array at path.

fill_value lives at the top level of the array’s zarr.json (not in the user attributes block), so this helper exists alongside read_array_attrs() rather than being folded into it. Raises StoreError if path is missing or points at a group.

Parameters:: path (str)
Return type:: Any

read_array_meta(array_name)[source]¶

Parameters:: array_name (str)
Return type:: dict[str, Any]

read_bytes(array_name, chunk_key)[source]¶

Parameters:

array_name (str)
chunk_key (str)

Return type:

bytes

read_vlen_array(path)[source]¶

Read a vlen-bytes Zarr array at path as a list of bytes.

Parameters:: path (str)
Return type:: list[bytes]

require_group(name)[source]¶

Parameters:: name (str)
Return type:: Group

standalone_array_exists(path)[source]¶

True when path points at a Zarr array (not a group).

Counterpart to array_exists(), which reports True for the legacy Option-G layout (a group containing chunk arrays).

Parameters:: path (str)
Return type:: bool

property url: str¶

write_array(path, data, *, chunks=None, fill_value=None, attributes=None, compressors=None)[source]¶

Write a chunked Zarr v3 array at path.

Parameters:

path (str) – Logical path of the array within this group, e.g. "object_attributes/intensity". Intermediate path segments become Zarr groups if absent.
data (Any) – Numpy-coercible array. dtype and shape set the array’s dtype and shape.
chunks (tuple[int, ...] | None) – Chunk shape. Defaults to data.shape (single chunk).
fill_value (Any) – Value Zarr returns for unwritten chunk positions. Special floats are JSON-string-encoded per spec ("NaN", "Infinity", "-Infinity").
attributes (dict[str, Any] | None) – Dict applied to the array’s attributes block in its own zarr.json — the spec-blessed location for per-array user metadata.
compressors (Any) – Override the codec pipeline. None uses the active session codec (from batched_writes()) or zarr v3’s default (bytes + zstd).

Return type:

None
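
The JSON string encoding of special floats mentioned for fill_value can be sketched as follows. encode_fill_value is a hypothetical helper written for illustration; it is not claimed to be the function the library uses.

```python
import json
import math


def encode_fill_value(value):
    # Special floats have no JSON number form, so they become strings.
    if isinstance(value, float):
        if math.isnan(value):
            return "NaN"
        if math.isinf(value):
            return "Infinity" if value > 0 else "-Infinity"
    return value  # ordinary values pass through unchanged


nan_encoded = encode_fill_value(float("nan"))
neg_inf_json = json.dumps({"fill_value": encode_fill_value(float("-inf"))})
plain = encode_fill_value(0.0)
```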

write_array_meta(array_name, meta)[source]¶

Parameters:

array_name (str)
meta (dict[str, Any])

Return type:

None

write_bytes(array_name, chunk_key, data)[source]¶

Parameters:

array_name (str)
chunk_key (str)
data (bytes)

Return type:

None

write_vlen_array(path, blobs, *, chunks=None, attributes=None)[source]¶

Write a 1D variable-length-bytes Zarr v3 array at path.

Each element of blobs becomes one row of the resulting array. Mirrors the layout used by object_index/manifests.

Parameters:

path (str) – Logical path of the array.
blobs (Sequence[bytes]) – Iterable of bytes blobs. Empty input writes no array (caller’s responsibility to check).
chunks (int | None) – Elements per chunk. Defaults to len(blobs) (single chunk).
attributes (dict[str, Any] | None) – Dict applied to the array’s attributes block.

Return type:

None

property zarr_group: Group¶: The underlying zarr.Group.

class zarr_vectors.RechunkSpec(by, bins=None, spatial_chunk_shape=None, prefix_dim_name=None, categorical=False)[source]¶

Bases: object

Specification for rechunking a store along a non-spatial dimension.

Parameters:

by (str) – Dimension to rechunk by. One of:
- "group": chunk by group membership
- "object_id": chunk by object ID ranges
- "attribute:<name>": chunk by attribute value bins
- "spatial": re-spatial-chunk only (change chunk_shape)
bins (list[float] | None) – Explicit bin edges for continuous values. For by="object_id", these are ID boundaries. For by="attribute:length", these are value boundaries. If None and categorical is False, the legacy quartile-based auto-binning is used for attributes with >10 unique values.
spatial_chunk_shape (tuple[float, ...] | None) – Override spatial chunk shape for the output store. If None, keeps the source chunk shape.
prefix_dim_name (str | None) – Custom name for the prefix dimension in metadata. Defaults to the by value.
categorical (bool) – If True, treat the binning dimension as categorical — every unique value gets its own bin regardless of how many unique values there are. Set this for gene labels, bundle labels, etc. Ignored when bins is also set.
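
To make the role of explicit bin edges concrete, here is a minimal sketch of mapping a value to a bin index. The half-open intervals [edge_i, edge_i+1) and the clamping of out-of-range values are assumptions of this sketch; the source does not state how the library treats boundary or out-of-range values. assign_bin is a hypothetical name.

```python
from bisect import bisect_right


def assign_bin(value, edges):
    # edges must be sorted; k+1 edges define k bins.
    if len(edges) < 2:
        raise ValueError("need at least two bin edges")
    index = bisect_right(edges, value) - 1
    # Clamp so values outside the edges land in the first or last bin.
    return max(0, min(index, len(edges) - 2))


edges = [0.0, 10.0, 50.0, 100.0]
assigned = [assign_bin(v, edges) for v in (5.0, 10.0, 75.0, 100.0, -3.0)]
```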

__init__(by, bins=None, spatial_chunk_shape=None, prefix_dim_name=None, categorical=False)¶

Parameters:

by (str)
bins (list[float] | None)
spatial_chunk_shape (tuple[float, ...] | None)
prefix_dim_name (str | None)
categorical (bool)

Return type:

None

bins: list[float] | None = None¶

by: str¶

categorical: bool = False¶

property dimension_name: str¶

prefix_dim_name: str | None = None¶

spatial_chunk_shape: tuple[float, ...] | None = None¶

class zarr_vectors.StorageBackend(*args, **kwargs)[source]¶

Bases: Protocol

Byte-level key/value backend rooted at a single URL.

Implementations must provide the methods below. All key arguments are forward-slash-separated relative paths under the store root. Empty string refers to the root itself.

__init__(*args, **kwargs)¶

close()[source]¶

Release any connections / file handles.

Return type:: None

delete(key)[source]¶

Delete a single key. Silent if absent.

Parameters:: key (str)
Return type:: None

delete_prefix(prefix)[source]¶

Delete every key whose path starts with prefix.

Used to implement rmtree-style subtree removal.

Parameters:: prefix (str)
Return type:: None

ensure_prefix(prefix)[source]¶

Best-effort create a container at prefix.

On filesystem backends this calls mkdir(parents=True, exist_ok=True). On flat object stores this is a no-op — the prefix exists implicitly once any key under it is written.

Parameters:: prefix (str)
Return type:: None

exists(key)[source]¶

Return True if key exists.

Parameters:: key (str)
Return type:: bool

get_bytes(key)[source]¶

Read raw bytes from key.

Raises:: KeyError – If key does not exist.
Parameters:: key (str)
Return type:: bytes

list_prefix(prefix, *, recursive=False)[source]¶

Yield keys under prefix.

recursive=True: yields every descendant file key. Keys are full paths relative to the store root. No trailing /.
recursive=False: yields immediate children — files and containers. Container entries are flagged with a trailing / so callers can tell them apart from files.

Yields nothing if prefix does not exist.

Parameters:

prefix (str)
recursive (bool)

Return type:

Iterator[str]

put_bytes(key, data)[source]¶

Write raw bytes to key, overwriting any existing value.

Parameters:

key (str)
data (bytes)

Return type:

None

property url: str¶: Canonical URL of the store root (e.g. "file:///C:/x/y").
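
A dict-backed class is enough to illustrate the contract described by this protocol. MemoryBackend and the memory:// URL are hypothetical and exist only for this sketch. Two choices are assumptions rather than documented behaviour: prefixes are matched at path-segment boundaries, and non-recursive listings return full paths relative to the root.

```python
from typing import Iterator


class MemoryBackend:
    """Hypothetical in-memory backend following the method contract above."""

    def __init__(self, url: str = "memory://demo") -> None:
        self._url = url
        self._data: dict[str, bytes] = {}

    @property
    def url(self) -> str:
        return self._url

    def close(self) -> None:
        pass  # nothing to release for a dict

    def exists(self, key: str) -> bool:
        return key in self._data

    def get_bytes(self, key: str) -> bytes:
        return self._data[key]  # raises KeyError if absent

    def put_bytes(self, key: str, data: bytes) -> None:
        self._data[key] = bytes(data)  # overwrites any existing value

    def delete(self, key: str) -> None:
        self._data.pop(key, None)  # silent if absent

    def delete_prefix(self, prefix: str) -> None:
        root = self._as_dir(prefix)
        for key in [k for k in self._data if k.startswith(root)]:
            del self._data[key]

    def ensure_prefix(self, prefix: str) -> None:
        pass  # flat store: prefixes exist implicitly

    def list_prefix(self, prefix: str, *, recursive: bool = False) -> Iterator[str]:
        root = self._as_dir(prefix)
        seen = set()
        for key in sorted(self._data):
            if not key.startswith(root):
                continue
            if recursive:
                yield key
                continue
            head, sep, _ = key[len(root):].partition("/")
            entry = root + head + sep  # trailing "/" marks a container
            if entry not in seen:
                seen.add(entry)
                yield entry

    @staticmethod
    def _as_dir(prefix: str) -> str:
        prefix = prefix.strip("/")
        return prefix + "/" if prefix else ""


backend = MemoryBackend()
backend.put_bytes("0/vertices/0.0.0", b"v")
backend.put_bytes("0/zarr.json", b"{}")
backend.put_bytes("zarr.json", b"{}")
shallow = list(backend.list_prefix("0"))
deep = list(backend.list_prefix("", recursive=True))
```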

class zarr_vectors.ZVWriter(level)[source]¶

Bases: object

Mutation handle for one ZVLevel.

Acquire one via zv[0].writer(). Holds a reference to the level’s Group so all mutations go through the same backend the reader uses.

Usage:

# Async — recommended for cloud stores
async with zv[0].writer() as w:
    await w.add_attribute("normal", normals)

# Sync — convenient for scripts
with zv[0].writer() as w:
    w.add_attribute_sync("normal", normals)

Parameters:: level (ZVLevel)

__init__(level)[source]¶

Parameters:: level (ZVLevel)
Return type:: None

async add_attribute(name, values, *, dtype=None)[source]¶

Write a per-vertex attribute aligned with this level’s vertices.

Splits values by chunk using the existing vertex-count sidecars (Tier E) and writes one attributes/<name>/<chunk_key> per chunk. No vertex data is re-encoded.

Parameters:

name (str) – Attribute name. Stored under attributes/<name>/.
values (ndarray[tuple[Any, ...], dtype[_ScalarT]]) – (N,) or (N, C) array of length equal to the level’s total vertex count.
dtype (str | dtype | None) – Override the on-disk dtype (default: values.dtype).

Return type:

None
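
Splitting a flat values array by per-chunk vertex counts can be sketched as below. The assumption here is that values are ordered chunk by chunk in the same order as the counts; the source does not spell out the ordering, and split_by_chunk is a hypothetical helper.

```python
from itertools import accumulate


def split_by_chunk(values, vertex_counts):
    # vertex_counts: mapping chunk_key -> vertex count, in storage order.
    if len(values) != sum(vertex_counts.values()):
        raise ValueError("values length must equal the total vertex count")
    ends = list(accumulate(vertex_counts.values()))
    starts = [0] + ends[:-1]
    # Each chunk receives a contiguous slice; no vertex data is touched.
    return {
        chunk_key: values[start:end]
        for chunk_key, start, end in zip(vertex_counts, starts, ends)
    }


per_chunk = split_by_chunk([10, 11, 12, 13, 14], {"0.0.0": 2, "0.0.1": 3})
```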

add_attribute_sync(name, values, *, dtype=None)[source]¶

Parameters:

name (str)
values (ndarray[tuple[Any, ...], dtype[_ScalarT]])
dtype (str | dtype | None)

Return type:

None

async add_face_attribute(name, values, *, dtype=None)[source]¶

Per-face attribute on a mesh level.

Stored under face_attributes/<name>/<chunk_key>. Faces are aligned 1:1 with the intra-chunk links array — values for a chunk’s F_local faces appear in the same order as the decoded links/<chunk_key>.

Note: cross-chunk faces are stored in cross_chunk_links/<delta>/ with link_width=3 (0.6.0+); per-face attributes for those records use the parallel cross_chunk_link_attributes/<name>/<delta>/ array.

Parameters:

name (str)
values (ndarray[tuple[Any, ...], dtype[_ScalarT]])
dtype (str | dtype | None)

Return type:

None

add_face_attribute_sync(name, values, *, dtype=None)[source]¶

Parameters:

name (str)
values (ndarray[tuple[Any, ...], dtype[_ScalarT]])
dtype (str | dtype | None)

Return type:

None

async add_node_attribute(name, values, *, dtype=None)[source]¶

Per-node attribute on a graph / skeleton level.

Identical semantics to add_attribute() — nodes are the graph’s vertices. Provided as an ergonomic alias for code that reads more naturally with the graph terminology.

Parameters:

name (str)
values (ndarray[tuple[Any, ...], dtype[_ScalarT]])
dtype (str | dtype | None)

Return type:

None

add_node_attribute_sync(name, values, *, dtype=None)[source]¶

Parameters:

name (str)
values (ndarray[tuple[Any, ...], dtype[_ScalarT]])
dtype (str | dtype | None)

Return type:

None

async add_object_attribute(name, values, *, dtype=None)[source]¶

Per-object attribute, length equal to num_objects.

Writes the dense (O,) | (O, C) array to object_attributes/<name>/data.

Parameters:

name (str)
values (ndarray[tuple[Any, ...], dtype[_ScalarT]])
dtype (str | dtype | None)

Return type:

None

add_object_attribute_sync(name, values, *, dtype=None)[source]¶

Parameters:

name (str)
values (ndarray[tuple[Any, ...], dtype[_ScalarT]])
dtype (str | dtype | None)

Return type:

None

async append_vertices(positions, *, object_ids=None, dtype=None)[source]¶

Append new vertices (and new objects) to this level.

Routes each vertex to its spatial chunk, reads the existing chunk data, appends one fragment per new object, and rewrites the chunk. Per-chunk RMW is parallelised over chunks via asyncio.gather().

Per-object manifest entries are staged in memory and written to the main object_index/ array by commit().

Parameters:

positions (ndarray[tuple[Any, ...], dtype[_ScalarT]]) – (N, D) array of new vertex positions.
object_ids (ndarray[tuple[Any, ...], dtype[_ScalarT]] | None) – (N,) integer object IDs for each new vertex. IDs must be >= the current num_objects (no conflict with existing objects). Defaults to a contiguous range starting at the current count.
dtype (str | dtype | None) – Vertex dtype. Defaults to the level’s recorded dtype.

Returns:

Summary dict with vertices_added, new_objects, chunks_touched.

Return type:

dict
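
The routing step and the default object IDs can be illustrated with a small sketch. Both helpers are hypothetical. The chunk coordinate formula floor((p - bounds_min) / chunk_shape) is an assumption about how positions map to chunks, and the default IDs follow one reading of "a contiguous range starting at the current count".

```python
import math
from collections import defaultdict


def route_to_chunks(positions, bounds_min, chunk_shape):
    # Group vertex indices by the spatial chunk that contains them.
    buckets = defaultdict(list)
    for index, point in enumerate(positions):
        coord = tuple(
            math.floor((p - lo) / size)
            for p, lo, size in zip(point, bounds_min, chunk_shape)
        )
        buckets[coord].append(index)
    return dict(buckets)


def default_object_ids(num_new, num_objects):
    # Contiguous IDs that cannot collide with existing objects.
    return list(range(num_objects, num_objects + num_new))


routed = route_to_chunks(
    [(1.0, 2.0, 3.0), (70.0, 2.0, 3.0), (5.0, 5.0, 5.0)],
    bounds_min=(0.0, 0.0, 0.0),
    chunk_shape=(64.0, 64.0, 64.0),
)
new_ids = default_object_ids(3, num_objects=10)
```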

append_vertices_sync(positions, *, object_ids=None, dtype=None)[source]¶

Parameters:

positions (ndarray[tuple[Any, ...], dtype[_ScalarT]])
object_ids (ndarray[tuple[Any, ...], dtype[_ScalarT]] | None)
dtype (str | dtype | None)

Return type:

dict

async commit()[source]¶

Flush pending appends into the main object_index/ array.

Reads the existing main index (if any), merges the staged manifests with last-write-wins on duplicate OIDs, and rewrites object_index/. Transactional backends (icechunk) make this cheap via copy-on-write; plain LocalStore rewrites the whole index on every commit.

Return type:: dict
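
Last-write-wins merging on duplicate object IDs can be sketched with plain dicts. merge_manifests and the bytes payloads are hypothetical; the real index layout is not shown in the source.

```python
def merge_manifests(existing, staged):
    # existing: mapping oid -> manifest from the main index.
    # staged: iterable of (oid, manifest) pairs in write order.
    merged = dict(existing)
    for oid, manifest in staged:
        merged[oid] = manifest  # later entries replace earlier ones
    return merged


merged_index = merge_manifests(
    {1: b"a", 2: b"b"},
    [(2, b"b2"), (3, b"c"), (3, b"c2")],
)
```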

commit_sync()[source]¶

Return type:: dict

async compact()[source]¶

Compatibility shim: pending-sidecar staging was removed in 0.6.0. Calls commit() (which now directly rewrites the main index) and reports the count for callers that used to rely on the explicit compaction step.

Return type:: dict

compact_sync()[source]¶

Return type:: dict

zarr_vectors.create_store(path, *, bounds=None, chunk_shape=None, axes=None, geometry_types=None, crs=None, ndim=None, vertex_dtype='float32', vertex_encoding='raw', links_convention=None, object_index_convention=None, cross_chunk_strategy=None, cross_level_depth=None, cross_level_storage=None, reduction_factor=None, base_bin_shape=None, format_capabilities=None, backend=None, **backend_kwargs)[source]¶

Create a new ZV store.

create_store(path) produces a structurally valid “warm” shell: root group with format markers + default bounds, a 0/ sub-group, and the empty ragged-vertex pair 0/vertices/ + 0/vertex_fragments/.

bounds is mandatory for every ZV store; when the caller does not pass one, the default ([0,...,0], [128,...,128]) is stamped (using ndim if given, otherwise inferred from axes / chunk_shape / bounds, defaulting to 3D). Subsequent writes must fit within these bounds unless the caller passes out_of_bounds="expand" (which grows the bounds in-place) or calls set_bounds() first.

The /parametric sub-group is not created here; it is created lazily on first use via get_parametric_group().

Parameters:

path (StoreLike) – URL or filesystem path for the new store.
bounds (tuple[list[float], list[float]] | None) – (min_corner, max_corner). When omitted, defaults to ([0]*ndim, [128]*ndim).
chunk_shape (tuple[float, ...] | None) – Spatial chunk size per dimension. When omitted, defaults to a single chunk covering bounds.
axes (list[NgffAxis] | None) – OME-Zarr-style axes list. When omitted, generated from DEFAULT_AXES_NAMES.
geometry_types (list[str] | None) – List of geometry types the store will contain. Defaults to [].
crs (dict[str, Any] | None) – Coordinate reference system dict.
ndim (int | None) – Number of spatial index dimensions. Useful when no other ndim-bearing kwarg is supplied (axes / bounds / chunk_shape). Defaults to 3.
vertex_dtype (str) – dtype for the level-0 vertices array.
vertex_encoding (str) – "raw" or "draco".
links_convention (str | None) – How edges are encoded ("explicit" / "implicit_sequential" / "implicit_sequential_with_branches"). When omitted the store has no convention stamped at create time; the first type writer (write_graph, write_polyline, …) fills it in via _ensure_root_metadata_for_write.
object_index_convention (str | None) – How object_index/ is encoded ("standard" / "identity"). Same lazy-fill rule as links_convention.
cross_chunk_strategy (str | None) – Cross-chunk connectivity strategy ("boundary_deduplication" / "explicit_links" / "both"). Lazy-filled by type writers.
cross_level_depth (int | None) – Maximum |delta| materialised by build_pyramid. 0 disables cross-level link arrays.
cross_level_storage (str | None) – "none" / "implicit" / "explicit" — see zarr_vectors.constants.VALID_XLEVEL_STORAGE.
reduction_factor (int | None) – Default vertex-count fold per pyramid step.
base_bin_shape (tuple[float, ...] | None) – Level-0 supervoxel bin edge lengths. When omitted, defaults to chunk_shape (one bin per chunk).
format_capabilities (list[str] | None) – Optional capability tokens to stamp on the root. See zarr_vectors.constants CAP_*.
backend (str | None) – Force a particular backend ("local" / "icechunk").
**backend_kwargs (Any) – Forwarded to the backend constructor.

Returns:

The root Group.

Raises:

StoreError – If a store already exists at path.
MetadataError – If kwargs are inconsistent (mismatched ndim).

Return type:

Group
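
The default-bounds rule can be sketched as two small helpers. These are hypothetical and only mirror the documented behaviour: ndim comes from any kwarg that carries it, defaults to 3, and inconsistent values are rejected. ValueError stands in for the library's MetadataError.

```python
def resolve_ndim(ndim=None, axes=None, chunk_shape=None, bounds=None):
    # Collect every dimensionality hint the caller supplied.
    candidates = {}
    if ndim is not None:
        candidates["ndim"] = ndim
    if axes is not None:
        candidates["axes"] = len(axes)
    if chunk_shape is not None:
        candidates["chunk_shape"] = len(chunk_shape)
    if bounds is not None:
        candidates["bounds"] = len(bounds[0])
    if len(set(candidates.values())) > 1:
        raise ValueError(f"mismatched ndim: {candidates}")
    return next(iter(candidates.values()), 3)  # default to 3D


def default_bounds(ndim):
    # ([0]*ndim, [128]*ndim) as (min_corner, max_corner).
    return ([0.0] * ndim, [128.0] * ndim)


inferred = resolve_ndim(chunk_shape=(64.0, 64.0))
bounds_2d = default_bounds(inferred)
```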

zarr_vectors.detect_scheme(url)[source]¶

Return the URL scheme of url, lowercased; empty string if none.

A bare Windows drive letter (C:\foo) is treated as local (returns ""), not as the scheme c.

Parameters:: url (str | Path)
Return type:: str
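
A minimal sketch of the documented behaviour follows. detect_scheme_sketch is a hypothetical re-implementation, not the library's code; treating any single-letter scheme as a Windows drive letter is an assumption of this sketch.

```python
import re

_SCHEME = re.compile(r"^([A-Za-z][A-Za-z0-9+.-]*):")


def detect_scheme_sketch(url):
    text = str(url)  # accepts str or pathlib.Path
    match = _SCHEME.match(text)
    if match is None:
        return ""
    scheme = match.group(1)
    if len(scheme) == 1:
        return ""  # single letter: a drive such as C:, so local
    return scheme.lower()


schemes = [
    detect_scheme_sketch(u)
    for u in ("S3://bucket/x", "file:///data/x", "/tmp/x", "C:\\foo")
]
```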

zarr_vectors.open_store(path, mode='r', *, backend=None, **backend_kwargs)[source]¶

Open an existing ZV store.

Parameters:

path (StoreLike) – URL or filesystem path to the store.
mode (str) – "r" (read-only; writes will raise), "r+" (read-write), "a" (append). For mode="r" the underlying Zarr store is wrapped via store.with_read_only(True) when the store implementation supports it (LocalStore and FsspecStore do; icechunk read-only sessions enforce read-only access at the transaction layer). Callers that need to mutate must open with mode="r+".
backend (str | None) – Force a particular backend (auto-detect by default).
**backend_kwargs (Any) – Forwarded to the backend constructor.

Returns:

The root Group.

Raises:

StoreError – If the store does not exist or is structurally invalid.
MetadataError – If root metadata cannot be parsed.

Return type:

Group

zarr_vectors.rebind(group, backend, **backend_kwargs)[source]¶

Re-open the underlying store with a different driver (no data move).

Under the Zarr-native layer, rebind opens a new Zarr store at the same URL and swaps it in. For phases 1-3 only the local Zarr store is supported, so this is effectively a no-op unless the caller explicitly passes a different store.

Parameters:

group (Group)
backend (str | Any)
backend_kwargs (Any)

Return type:

Group

zarr_vectors.rechunk(store_path, spec, output=None)[source]¶

Rechunk a store along a non-spatial dimension.

Reads object data from the source store, assigns each object to a rechunk bin via DimensionMapper, and writes the result to an output store where chunk keys have a prefix dimension (bin, z, y, x).

Parameters:

store_path (str | Path) – Source store path.
spec (RechunkSpec) – Rechunk specification.
output (str | Path | None) – Output store path. If None, rechunks in-place by writing to a temporary store then replacing the source.

Returns:

Summary dict with objects_rechunked, bins_created, output_path.

Return type:

dict[str, Any]

zarr_vectors.rechunk_by_attribute(store_path, attribute_name, *, output=None, spatial_chunk_shape=None)[source]¶

Rechunk a store so that one chunk == one attribute value.

Categorical only — every unique value of the named per-object attribute becomes its own bin, regardless of how many unique values there are. Resulting chunk keys gain a leading dim: (attr_bin, z, y, x).

Parameters:

store_path (str) – Source store path or URL.
attribute_name (str) – Name of a per-object attribute already present on the source store (under object_attributes/<name>).
output (str | None) – Output path; if None, rechunks in place.
spatial_chunk_shape (tuple[float, ...] | None) – Optional new spatial chunk shape for the output store.

Returns:

The summary dict produced by rechunk().

Return type:

dict[str, Any]
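
The "one bin per unique value" rule and the leading key dimension can be sketched as follows. Both helpers are hypothetical, and assigning bin indices in sorted order is an assumption; the source does not state how bins are ordered.

```python
def categorical_bins(values):
    # Every unique value gets its own bin index.
    return {value: index for index, value in enumerate(sorted(set(values)))}


def prefixed_key(bin_index, spatial_key):
    # (z, y, x) becomes (attr_bin, z, y, x).
    return (bin_index, *spatial_key)


bins = categorical_bins(["CST", "AF", "CST", "UF"])
key = prefixed_key(bins["CST"], (0, 2, 3))
```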

Core store access

ZV store creation, opening, and management.

Naming: the on-disk format is referred to as ZV (Zarr Vectors). The older ZVF initialism may still appear in archived doc text but is not used in the wire format.

All storage I/O routes through a zarr.abc.store.Store wrapped by the Group abstraction in zarr_vectors.core.group.

class zarr_vectors.core.store.FsGroup(path, *, create=False)[source]¶

Bases: Group

Backwards-compatible subclass of Group rooted at a local path.

Deprecated: Direct use of FsGroup is deprecated. New code should call create_store() / open_store().

Parameters:

path (str | Path) – Filesystem path or pathlib.Path. A file:// URL is also accepted.
create (bool) – If True, create the directory if it does not already exist. If False, raise StoreError if the directory is missing (matching the legacy behaviour).

__init__(path, *, create=False)[source]¶

Parameters:

path (str | Path)
create (bool)

Return type:

None

zarr_vectors.core.store.add_resolution_level(root, level_index, bin_ratio, *, object_sparsity=1.0, coarsening_method='manual', parent_level=None)[source]¶

Create a new resolution level with a specified bin ratio.

Parameters:

root (Group)
level_index (int)
bin_ratio (tuple[int, ...])
object_sparsity (float)
coarsening_method (str)
parent_level (int | None)

Return type:

Group

zarr_vectors.core.store.branch(group, name, *, from_snapshot_id=None)[source]¶

Create a new icechunk branch off the current session’s tip.

Returns the snapshot id the branch was anchored at. Raises StoreError on non-icechunk backends.

Parameters:

group (Group)
name (str)
from_snapshot_id (str | None)

Return type:

str

zarr_vectors.core.store.commit(group, message='zarr-vectors write')[source]¶

Commit pending changes when the store is backed by a transactional backend (currently icechunk).

For non-transactional backends this is a no-op and returns None; writes are durable as soon as they hit the store.

For icechunk-backed stores this calls session.commit(message) and returns the new snapshot id (a hex string). The same session continues to be writable after the commit — subsequent ZV writes accumulate uncommitted state until the next commit(group, ...) call.

Parameters:

group (Group) – A Group returned by create_store() or open_store() (or any sub-group of one).
message (str) – Commit message; defaults to a placeholder. Empty strings are rejected by icechunk.

Returns:

Snapshot id (str) for icechunk-backed groups, else None.

Return type:

str | None

zarr_vectors.core.store.create_resolution_level(root, level, level_metadata)[source]¶

Create a new resolution level group within the store.

The level’s spatial transform (bin_ratio → NGFF scale, bin_shape → NGFF translation = bin_shape / 2) is written to the NGFF multiscales[0].datasets block via zarr_vectors.core.multiscale.upsert_level_transform() so the NGFF block stays the single source of truth for spatial geometry. bin_shape and bin_ratio are intentionally omitted from the level’s own zarr_vectors_level attrs; readers derive them from the NGFF block (see read_level_metadata()).

Parameters:

root (Group)
level (int)
level_metadata (LevelMetadata)

Return type:

Group
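
The mapping from level geometry to an NGFF dataset entry can be sketched as below. level_dataset_entry is hypothetical; using bin_ratio directly as the scale vector and the exact dictionary layout are assumptions based on the description above and the OME-NGFF coordinateTransformations convention, not a copy of upsert_level_transform().

```python
def level_dataset_entry(level, bin_ratio, bin_shape):
    return {
        "path": str(level),  # level groups are named with bare integers
        "coordinateTransformations": [
            # bin_ratio supplies the NGFF scale
            {"type": "scale", "scale": [float(r) for r in bin_ratio]},
            # translation is half a bin along each axis
            {"type": "translation", "translation": [s / 2 for s in bin_shape]},
        ],
    }


entry = level_dataset_entry(1, bin_ratio=(2, 2, 2), bin_shape=(8.0, 8.0, 4.0))
```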

zarr_vectors.core.store.create_store(path, *, bounds=None, chunk_shape=None, axes=None, geometry_types=None, crs=None, ndim=None, vertex_dtype='float32', vertex_encoding='raw', links_convention=None, object_index_convention=None, cross_chunk_strategy=None, cross_level_depth=None, cross_level_storage=None, reduction_factor=None, base_bin_shape=None, format_capabilities=None, backend=None, **backend_kwargs)[source]¶

Create a new ZV store.

The /parametric sub-group is not created here; it is created lazily on first use via get_parametric_group().

Parameters:

path (StoreLike) – URL or filesystem path for the new store.
bounds (tuple[list[float], list[float]] | None) – (min_corner, max_corner). When omitted, defaults to ([0]*ndim, [128]*ndim).
chunk_shape (tuple[float, ...] | None) – Spatial chunk size per dimension. When omitted, defaults to a single chunk covering bounds.
axes (list[NgffAxis] | None) – OME-Zarr-style axes list. When omitted, generated from DEFAULT_AXES_NAMES.
geometry_types (list[str] | None) – List of geometry types the store will contain. Defaults to [].
crs (dict[str, Any] | None) – Coordinate reference system dict.
ndim (int | None) – Number of spatial index dimensions. Useful when no other ndim-bearing kwarg is supplied (axes / bounds / chunk_shape). Defaults to 3.
vertex_dtype (str) – dtype for the level-0 vertices array.
vertex_encoding (str) – "raw" or "draco".
links_convention (str | None) – How edges are encoded ("explicit" / "implicit_sequential" / "implicit_sequential_with_branches"). When omitted the store has no convention stamped at create time; the first type writer (write_graph, write_polyline, …) fills it in via _ensure_root_metadata_for_write.
object_index_convention (str | None) – How object_index/ is encoded ("standard" / "identity"). Same lazy-fill rule as links_convention.
cross_chunk_strategy (str | None) – Cross-chunk connectivity strategy ("boundary_deduplication" / "explicit_links" / "both"). Lazy-filled by type writers.
cross_level_depth (int | None) – Maximum |delta| materialised by build_pyramid. 0 disables cross-level link arrays.
cross_level_storage (str | None) – "none" / "implicit" / "explicit" — see zarr_vectors.constants.VALID_XLEVEL_STORAGE.
reduction_factor (int | None) – Default vertex-count fold per pyramid step.
base_bin_shape (tuple[float, ...] | None) – Level-0 supervoxel bin edge lengths. When omitted, defaults to chunk_shape (one bin per chunk).
format_capabilities (list[str] | None) – Optional capability tokens to stamp on the root. See zarr_vectors.constants CAP_*.
backend (str | None) – Force a particular backend ("local" / "icechunk").
**backend_kwargs (Any) – Forwarded to the backend constructor.

Returns:

The root Group.

Raises:

StoreError – If a store already exists at path.
MetadataError – If kwargs are inconsistent (mismatched ndim).

Return type:

Group

zarr_vectors.core.store.discard_changes(group)[source]¶

Drop uncommitted changes on a transactional backend.

No-op for non-transactional backends.

Parameters:: group (Group)
Return type:: None

zarr_vectors.core.store.get_parametric_group(root)[source]¶

Get the /parametric/ group, creating it if needed.

Parameters:: root (Group)
Return type:: Group

zarr_vectors.core.store.get_resolution_level(root, level)[source]¶

Get an existing resolution level group.

Raises:

StoreError – If the level does not exist.

Parameters:

root (Group)
level (int)

Return type:

Group

zarr_vectors.core.store.list_available_ratios(root)[source]¶

Return bin ratios for all existing resolution levels.

Parameters:: root (Group)
Return type:: list[tuple[int, …]]

zarr_vectors.core.store.list_resolution_levels(root)[source]¶

Return sorted level indices present in the store.

Resolution-level group names are bare integers (0, 1, …) under the 0.4.1+ layout (formerly resolution_0 / resolution_1). Top-level groups whose names do not parse as integers belong to some other entity (e.g. parametric/) and are silently skipped.

Parameters:: root (Group)
Return type:: list[int]
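
The name-filtering rule described above can be sketched without touching a store. This is a minimal illustration, not the library's implementation: parse_level_indices is a hypothetical helper that takes a plain list of group names.

```python
def parse_level_indices(group_names):
    """Hypothetical helper: return sorted integer level indices.

    Names that do not parse as int (e.g. "parametric") are skipped,
    mirroring the rule described for list_resolution_levels.
    """
    levels = []
    for name in group_names:
        try:
            levels.append(int(name))
        except ValueError:
            continue  # some other top-level entity; ignore it
    return sorted(levels)


print(parse_level_indices(["2", "parametric", "0", "1"]))  # [0, 1, 2]
```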

zarr_vectors.core.store.merge_branch(group, name, *, message='merge branch')[source]¶

Merge branch name into the group’s current branch.

Parameters:

group (Group)
name (str)
message (str)

Return type:

str | None

zarr_vectors.core.store.open_store(path, mode='r', *, backend=None, **backend_kwargs)[source]¶

Open an existing ZV store.

Parameters:

path (StoreLike) – URL or filesystem path to the store.
mode (str) – "r" (read-only — writes will raise), "r+" (read-write), "a" (append). For mode="r" the underlying Zarr store is wrapped via store.with_read_only(True) when the store implementation supports it (the LocalStore and FsspecStore do; icechunk readonly sessions enforce read-only at the transaction layer). Callers that need to mutate must open with mode="r+".
backend (str | None) – Force a particular backend (auto-detect by default).
**backend_kwargs (Any) – Forwarded to the backend constructor.

Returns:

The root Group.

Raises:

StoreError – If the store does not exist or is structurally invalid.
MetadataError – If root metadata cannot be parsed.

Return type:

Group

zarr_vectors.core.store.read_level_metadata(root, level)[source]¶

Read and parse level metadata.

bin_shape and bin_ratio are read from the NGFF multiscales[0].datasets block (the authoritative source under 0.5+). Legacy stores that still carry them under zarr_vectors_level are honoured as a fallback.

Raises:

StoreError – If the level does not exist.
MetadataError – If metadata is malformed.

Parameters:

root (Group)
level (int)

Return type:

LevelMetadata

zarr_vectors.core.store.read_parametric_types(root)[source]¶

Read parametric type registry from /parametric/.zattrs.

Parameters:: root (Group)
Return type:: list[ParametricTypeDef]

zarr_vectors.core.store.read_root_metadata(root)[source]¶

Read and parse root metadata from the store.

Parameters:: root (Group)
Return type:: RootMetadata

zarr_vectors.core.store.rebase(group, base='main')[source]¶

Rebase the branch of the group’s session onto base.

Parameters:

group (Group)
base (str)

Return type:

None

zarr_vectors.core.store.rebind(group, backend, **backend_kwargs)[source]¶

Re-open the underlying store with a different driver (no data move).

Parameters:

group (Group)
backend (str | Any)
backend_kwargs (Any)

Return type:

Group

zarr_vectors.core.store.remove_resolution_level(root, level_index)[source]¶

Remove a resolution level from the store.

Level 0 cannot be removed.

Raises:

StoreError – If the level does not exist or is level 0.

Parameters:

root (Group)
level_index (int)

Return type:

None

zarr_vectors.core.store.session_for(group)[source]¶

Return the underlying icechunk.Session if any, else None.

Looks up the session stashed on the group’s Zarr store at construction time (see create_store() / open_store()). Works for root groups and any sub-group reached via root.create_group(...) / root[name] — they all share the same underlying Zarr store.

Useful when callers need branch / snapshot operations that aren’t surfaced by the zarr-vectors API (creating branches, tagging, listing snapshots, etc.).

Parameters:: group (Group)
Return type:: Any | None

zarr_vectors.core.store.set_bounds(root, new_bounds, *, force=False)[source]¶

Update the store’s bounds after creation.

Expanding in any dimension (new_min <= cur_min and new_max >= cur_max) is always allowed.
Contracting in any dimension requires force=True. When forced, any vertices that fall outside the new bounds are pruned per-vertex from every level’s vertices/ array. Auxiliary arrays (object_index, attributes) are NOT rewritten; re-run the type writer or rechunk if you need a fully consistent store after a contraction.

Parameters:

root (Group) – Store root group returned by create_store() / open_store().
new_bounds (tuple[list[float], list[float]]) – (min_corner, max_corner) for the new bounds.
force (bool) – Required when any dimension is contracting.

Return type:

None
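
The expand-versus-contract test described above (new_min <= cur_min and new_max >= cur_max in every dimension) can be sketched as a standalone check. The function name is_pure_expansion is hypothetical and is not part of zarr_vectors; bounds are assumed to be (min_corner, max_corner) pairs as in the new_bounds parameter.

```python
def is_pure_expansion(cur_bounds, new_bounds):
    """Hypothetical check: True if new_bounds contains cur_bounds on every axis."""
    cur_min, cur_max = cur_bounds
    new_min, new_max = new_bounds
    if not (len(cur_min) == len(cur_max) == len(new_min) == len(new_max)):
        raise ValueError("all corners must have the same number of dimensions")
    return all(
        n_lo <= c_lo and n_hi >= c_hi
        for c_lo, c_hi, n_lo, n_hi in zip(cur_min, cur_max, new_min, new_max)
    )


current = ([0.0, 0.0, 0.0], [10.0, 10.0, 10.0])
# Grows along x (min) and y (max): allowed without force.
print(is_pure_expansion(current, ([-1.0, 0.0, 0.0], [10.0, 12.0, 10.0])))  # True
# Shrinks along z: would require force=True.
print(is_pure_expansion(current, ([0.0, 0.0, 0.0], [10.0, 10.0, 8.0])))    # False
```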

zarr_vectors.core.store.store_info(root)[source]¶

Return summary information about a ZV store.

Parameters:: root (Group)
Return type:: dict[str, Any]

zarr_vectors.core.store.switch_branch(group, name)[source]¶

Swap the group’s underlying session to a writable session on branch name.

Existing references to group continue to work; the swap is transparent. Pending uncommitted edits on the previous session are discarded, so callers should commit() first if they want to keep them.

Parameters:

group (Group)
name (str)

Return type:

None

zarr_vectors.core.store.write_parametric_types(root, types)[source]¶

Write parametric type registry to /parametric/.zattrs.

Parameters:

root (Group)
types (list[ParametricTypeDef])

Return type:

None

OME-Zarr compatible multiscale metadata for zarr vectors stores.

Generates and reads multiscales metadata blocks in the root .zattrs, following the OME-NGFF spec (v0.4). This allows OME-Zarr-aware viewers to discover the resolution pyramid structure.

NGFF v0.5 nests OME metadata under attributes.ome inside Zarr v3 zarr.json; ZV writes multiscales at bare root, which matches the v0.4 layout — hence the version: "0.4" declaration.

The ZV format discriminator lives in multiscales[].metadata.format = "zarr_vectors", NOT in multiscales[].type — NGFF reserves type for the downsampling method ("gaussian", "nearest", …).

The metadata is informational and coexists with zarr vectors-specific metadata — it does not replace the zarr_vectors_level entries.
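
As a rough illustration of the layout described above, a multiscales block might look like the following Python literal. The axis names, scale values, and translation values are assumptions chosen for the example; only the version string, the metadata.format discriminator, and the NGFF v0.4 key names come from the text and the spec. The helper is_zv_multiscale is hypothetical.

```python
# Illustrative multiscales block (values are made up for the example).
multiscales = [
    {
        "version": "0.4",
        "axes": [
            {"name": "z", "type": "space"},
            {"name": "y", "type": "space"},
            {"name": "x", "type": "space"},
        ],
        "datasets": [
            {
                "path": "0",
                "coordinateTransformations": [
                    {"type": "scale", "scale": [1.0, 1.0, 1.0]},
                ],
            },
            {
                "path": "1",
                "coordinateTransformations": [
                    {"type": "scale", "scale": [2.0, 2.0, 2.0]},
                    {"type": "translation", "translation": [1.0, 1.0, 1.0]},
                ],
            },
        ],
        # The ZV discriminator lives here; the top-level "type" key is left
        # unset because NGFF reserves it for the downsampling method.
        "metadata": {"format": "zarr_vectors"},
    }
]


def is_zv_multiscale(entry):
    """Hypothetical check for the ZV format discriminator."""
    return entry.get("metadata", {}).get("format") == "zarr_vectors"


print(is_zv_multiscale(multiscales[0]))  # True
```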

zarr_vectors.core.multiscale.get_level_scale(root, level)[source]¶

Get the scale factors for a specific level from multiscale metadata.

Parameters:

root (FsGroup) – Root store group.
level (int) – Resolution level index.

Returns:

List of scale factors per dimension, or None if not available.

Return type:

list[float] | None

zarr_vectors.core.multiscale.get_level_translation(root, level)[source]¶

Get the translation offset for a specific level.

Parameters:

root (FsGroup) – Root store group.
level (int) – Resolution level index.

Returns:

List of translation offsets per dimension, or None.

Return type:

list[float] | None

zarr_vectors.core.multiscale.read_level_transform(root, level)[source]¶

Read (scale, translation) for a level from the NGFF block.

Returns (None, None) when the level has no entry in the NGFF multiscales[0].datasets list — callers should fall back to the legacy zarr_vectors_level.bin_ratio / bin_shape fields on the level’s own attrs.

Parameters:

root (FsGroup)
level (int)

Return type:

tuple[list[float] | None, list[float] | None]
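
The lookup described above can be sketched against a plain multiscales list. This is an illustration under assumptions, not the library's code: level_transform is a hypothetical helper, and each dataset's path is assumed to equal str(level).

```python
def level_transform(multiscales, level):
    """Hypothetical helper: return (scale, translation) for a level.

    Returns (None, None) when the level has no entry in
    multiscales[0]["datasets"].
    """
    if not multiscales:
        return None, None
    for dataset in multiscales[0].get("datasets", []):
        if dataset.get("path") != str(level):  # assumed path convention
            continue
        scale = translation = None
        for transform in dataset.get("coordinateTransformations", []):
            if transform.get("type") == "scale":
                scale = transform.get("scale")
            elif transform.get("type") == "translation":
                translation = transform.get("translation")
        return scale, translation
    return None, None


example = [{
    "datasets": [
        {"path": "0", "coordinateTransformations": [
            {"type": "scale", "scale": [1.0, 1.0, 1.0]},
        ]},
        {"path": "1", "coordinateTransformations": [
            {"type": "scale", "scale": [2.0, 2.0, 2.0]},
            {"type": "translation", "translation": [1.0, 1.0, 1.0]},
        ]},
    ],
}]

print(level_transform(example, 1))  # ([2.0, 2.0, 2.0], [1.0, 1.0, 1.0])
print(level_transform(example, 5))  # (None, None)
```

A caller that receives (None, None) would then consult the legacy zarr_vectors_level fields, as noted above.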

zarr_vectors.core.multiscale.read_multiscale_metadata(root)[source]¶

Read OME-Zarr multiscale metadata from root .zattrs.

Parameters:: root (FsGroup) – Root store group.
Returns:: The multiscales list, or None if not present.
Return type:: list[dict[str, Any]] | None

zarr_vectors.core.multiscale.upsert_level_transform(root, level, *, scale, translation=None)[source]¶

Upsert one level’s entry in the NGFF multiscales[0].datasets list.

This is the authoritative writer for per-level spatial transforms under the 0.5+ format: bin_ratio lives as the scale factor and bin_shape / 2 lives as the translation offset. Callers in zarr_vectors.core.store invoke this inside create_resolution_level() and add_resolution_level() after writing the level’s other attrs.

Parameters:

root (FsGroup) – Root store group.
level (int) – Resolution level index (0 for full resolution).
scale (list[float]) – Per-axis scale factor (= bin_ratio for that level).
translation (list[float] | None) – Optional per-axis translation offset (= bin_shape / 2). When all entries are zero, the translation transform is omitted.

Return type:

None
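
The upsert behaviour described above (replace the level's entry if present, append otherwise, and omit an all-zero translation) can be sketched on a plain list of dataset dicts. upsert_dataset is a hypothetical helper, and the path == str(level) convention is an assumption; the real function operates on the store's root attributes.

```python
def upsert_dataset(datasets, level, scale, translation=None):
    """Hypothetical helper: insert or replace one level's dataset entry in place."""
    transforms = [{"type": "scale", "scale": [float(s) for s in scale]}]
    # An all-zero (or missing) translation is omitted.
    if translation is not None and any(t != 0 for t in translation):
        transforms.append(
            {"type": "translation", "translation": [float(t) for t in translation]}
        )
    entry = {"path": str(level), "coordinateTransformations": transforms}

    for i, existing in enumerate(datasets):
        if existing.get("path") == entry["path"]:
            datasets[i] = entry  # update the existing level
            break
    else:
        datasets.append(entry)   # first time this level is seen
    return datasets


datasets = []
upsert_dataset(datasets, 0, [1, 1, 1], [0, 0, 0])        # translation dropped
upsert_dataset(datasets, 1, [2, 2, 2], [1.0, 1.0, 1.0])
upsert_dataset(datasets, 1, [4, 4, 4], [2.0, 2.0, 2.0])  # replaces level 1
print(len(datasets))  # 2
```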

zarr_vectors.core.multiscale.write_multiscale_metadata(root)[source]¶

Generate and write OME-Zarr multiscale metadata to root .zattrs.

Reads all existing resolution levels and their bin shapes to compute scale and translation transforms.

Parameters:: root (FsGroup) – Root store group.
Returns:: The multiscales list written to .zattrs.
Return type:: dict[str, Any]
