Zarr v3 primer¶
Terms¶
- Store
The root container of a Zarr hierarchy. A store maps string keys to bytes and can be backed by a local file system, a cloud object store, a ZIP archive, or an in-memory dictionary. In ZVF, a store corresponds to the
.zarrvectorsdirectory (or prefix).- Group
A Zarr node that contains other groups and/or arrays. Analogous to a directory. Groups may carry metadata in a
.zattrsfile (Zarr v3:zarr.jsonunder the group key). ZVF uses groups to represent the store root, each resolution level, and each array sub-group (e.g.vertices/,attributes/).- Array
A Zarr node that stores an n-dimensional typed numeric array, divided into chunks. Each chunk is stored as one key in the underlying store (or, with the sharding codec, multiple logical chunks per storage key).
- Chunk
The unit of storage for a Zarr array. Chunks are identified by a multi-dimensional integer index (the chunk coordinate). Each chunk is independently compressible and retrievable.
- Codec pipeline
An ordered sequence of transformations applied to chunk data before writing and reversed on reading. In Zarr v3, the pipeline is declared in the array metadata and may include array-to-bytes codecs (e.g.
bytes), byte-to-byte codecs (e.g.blosc,zstd), and thesharding_indexedcodec for sub-chunk addressing..zattrsA JSON file at a group path that stores arbitrary user-defined metadata. ZVF makes extensive use of
.zattrsat the store root and at each resolution level group.zarr.jsonIn Zarr v3, the per-node metadata file (replaces
.zarrayand.zgroupfrom Zarr v2). Stores array shape, dtype, chunk shape, and codec pipeline.
Introduction¶
ZVF does not reinvent storage primitives. It is a convention layer built on top of Zarr v3: every array inside a ZVF store is a standard Zarr v3 array, and every group is a standard Zarr v3 group. A reader that understands only Zarr v3 can open any ZVF store, traverse its groups, and read raw array data — it simply would not know how to interpret the ZVF-specific conventions (the VG index, the object model, the multiscale metadata).
This inheritance is intentional. It means ZVF benefits automatically from Zarr’s broad ecosystem: cloud backends (S3, GCS, Azure), codec libraries (Blosc, Zstd, LZ4, Brotli), language bindings (Python, JavaScript, Java, C++), and tools. ZVF adds only what Zarr itself does not provide: a spatial index structure, an object model for discrete geometry, and a multi- resolution metadata convention.
This page summarises the subset of Zarr v3 that ZVF relies on directly. It is not a substitute for the Zarr v3 specification; consult that document for full detail.
Technical reference¶
Store model¶
A Zarr v3 store is an abstract key–value mapping. Keys are slash-separated
strings (e.g. 0/vertices/c/0/1/2). Values are bytes. The
mapping is implemented by a store backend; ZVF supports the following
backends (see Store types for detail):
Local file system (
zarr.storage.LocalStore)ZIP file (
zarr.storage.ZipStore)In-memory (
zarr.storage.MemoryStore)S3 via
s3fs(requireszarr-vectors[cloud])GCS via
gcsfs(requireszarr-vectors[cloud])
The store root of a ZVF store is identified by the path to the
.zarrvectors directory (or equivalent prefix in object storage).
Groups and the hierarchy¶
A Zarr v3 group is represented in the store by:
<path>/zarr.json— group metadata:{"zarr_format": 3, "node_type": "group"}<path>/.zattrs— optional user metadata (JSON object)
ZVF uses the following group hierarchy (abbreviated):
zarr.json ← store root group
.zattrs ← ZVF root metadata
0/
zarr.json ← resolution level group
.zattrs ← per-level metadata (bin_ratio, object_sparsity)
vertices/
zarr.json ← array metadata (shape, dtype, chunks, codecs)
c/0/0/0 ← chunk (0,0,0) data bytes
c/0/0/1 ← chunk (0,0,1) data bytes
…
vertex_fragments/
zarr.json
c/0/0/0 ← uint8 fragment-index blob for chunk (0,0,0)
c/0/0/1
…
All paths are relative to the store root.
Array metadata (zarr.json)¶
Each Zarr v3 array carries a zarr.json file with at minimum:
{
"zarr_format": 3,
"node_type": "array",
"shape": [N, 3],
"data_type": "float32",
"chunk_grid": {
"name": "regular",
"configuration": { "chunk_shape": [65536, 3] }
},
"chunk_key_encoding": {
"name": "default",
"configuration": { "separator": "/" }
},
"codecs": [
{ "name": "bytes", "configuration": { "endian": "little" } },
{ "name": "blosc", "configuration": { "cname": "zstd", "clevel": 5 } }
],
"fill_value": 0.0,
"dimension_names": ["vertex", "spatial_dim"]
}
ZVF does not mandate a specific codec; any Zarr v3-compatible codec
pipeline is acceptable. The default pipeline used by zarr-vectors-py is
bytes → blosc(zstd, level 5). Draco compression for mesh arrays uses a
custom codec registered with Zarr; see Codec pipeline.
Chunk key encoding¶
ZVF uses the Zarr v3 default chunk key encoding with / as the
separator. A chunk at grid coordinate (i, j, k) in array
0/vertices is stored at:
0/vertices/c/i/j/k
The c/ prefix is the Zarr v3 default; do not omit it.
Metadata inheritance¶
All Zarr v3 metadata rules apply within ZVF stores:
Array shape is the logical shape of the full array (all chunks combined). Chunks at the boundary of the array may be smaller than
chunk_shapein one or more dimensions; Zarr v3 handles this transparently.Missing chunks (chunks that contain only the fill value) need not be stored. ZVF exploits this for sparse levels: chunks with no objects at a given resolution simply have no chunk keys in the store.
The fill value for position arrays is
0.0; for index arrays it is-1(indicating “no entry”). Readers must treat missing chunks as filled with the declared fill value.
Zarr v2 compatibility¶
ZVF targets Zarr v3 exclusively. Zarr v2 stores use a different metadata
format (.zarray, .zgroup, Fortran vs C order ambiguity) and a different
chunk key scheme. zarr-vectors-py will raise ValueError if asked to open
a Zarr v2 store.
If you have data in a Zarr v2 store, migrate it with the zarr library’s
conversion utilities before ingesting into ZVF.