Bin shape¶

Terms¶

bin_shape: A D-tuple of positive floats declaring the spatial extent of each supervoxel bin within a chunk. bin_shape must evenly divide chunk_shape in every dimension. The number of bins per chunk along axis d is chunk_shape[d] / bin_shape[d].
Supervoxel bin: The finest spatial subdivision in ZVF. Each bin is a rectangular region of physical space whose size is bin_shape. Bins tile the interior of each chunk exactly, with no gaps or overlaps. All vertices whose position falls within a bin are stored together as a vertex group (VG).
Bins per chunk (B_per_chunk): The total number of bins within a single chunk: product(chunk_shape[d] / bin_shape[d] for d in range(D)). For chunk_shape = (200, 200, 200) and bin_shape = (50, 50, 50): B_per_chunk = 4 × 4 × 4 = 64.
Effective bin shape: The bin shape at a given resolution level, after applying the level’s bin_ratio. effective_bin_shape[d] = base_bin_shape[d] × bin_ratio[d]. At level 0, effective bin shape equals base_bin_shape.
base_bin_shape: The bin shape at the full-resolution level (level 0). Stored in root .zattrs. Serves as the reference from which all coarser-level bin shapes are derived.
Divisibility constraint: The requirement that chunk_shape[d] % bin_shape[d] == 0 for all d. Ensures bins tile chunks exactly, which is necessary for the VG index to be well-defined. Validated at write time and by the L2 validator.

Introduction¶

Bins are the unit of spatial querying in ZVF. When you issue a bounding-box query, the query engine does not load entire chunks — it identifies which bins overlap the query region and reads only those bins’ vertices. This sub-chunk spatial resolution is what makes ZVF efficient for small targeted queries on large datasets.

bin_shape is the parameter that controls this query granularity. A smaller bin_shape means finer spatial queries (less read amplification for small bboxes) but more bins per chunk (larger VG index, more index bookkeeping). A larger bin_shape means coarser queries but lower index overhead.

Unlike chunk_shape, bin_shape does not affect the physical file layout. Changing bin_shape requires only recomputing the VG index and re-sorting vertices within each chunk — the chunk files themselves stay the same size and location. Re-binning is exposed for point-cloud stores via zarr_vectors.rechunk.rebin_level(store_path, target_bin_shape, level=0); for other geometry types (polyline, streamline, graph, skeleton, mesh) the VG structure carries per-object semantics so a full rechunk is required instead.

Technical reference¶

Declaration and storage¶

base_bin_shape is declared in root .zattrs:

{
  "chunk_shape":    [200.0, 200.0, 200.0],
  "base_bin_shape": [50.0,  50.0,  50.0]
}

The effective bin shape at each resolution level is stored in the per-level .zattrs for convenience:

{
  "level":     1,
  "bin_ratio": [2, 2, 2],
  "bin_shape": [100.0, 100.0, 100.0]
}

Readers should compute bin_shape at level N as:

bin_shape[N][d] = base_bin_shape[d] × bin_ratio[N][d]

and cross-check against the stored bin_shape value. A mismatch is a validation error (L2).

Divisibility constraint¶

For bins to tile chunks exactly, bin_shape must divide chunk_shape evenly in every dimension:

chunk_shape[d] / bin_shape[d]  must be a positive integer,  for all d

This is enforced at write time. The write functions raise ValueError if the constraint is violated.

Example of a valid configuration:

chunk_shape = (200, 200, 200)
bin_shape   = (50, 50, 50)    → 4.0 bins per axis ✓

Example of an invalid configuration:

chunk_shape = (200, 200, 200)
bin_shape   = (60, 60, 60)    → 3.333… bins per axis ✗

The constraint applies at every resolution level. Because the effective bin shape at level N is base_bin_shape × bin_ratio[N], and base_bin_shape already satisfies the constraint at level 0, coarser levels are automatically valid provided bin_ratio components are positive integers and the products base_bin_shape[d] × bin_ratio[N][d] still divide chunk_shape[d].

For example, with chunk_shape = (200,), base_bin_shape = (50,), and bin_ratio = [4] at level 2: effective_bin_shape = 200.0. This is valid (1 bin per chunk at level 2). bin_ratio = [5] would give effective_bin_shape = 250.0 > chunk_shape[0], which is invalid.

Rule: bin_ratio[N][d] must satisfy base_bin_shape[d] × bin_ratio[N][d] ≤ chunk_shape[d], and the ratio must be a divisor.

Bin coordinate arithmetic¶

For a vertex at position p within a chunk at coordinate chunk_coord:

# Local position within the chunk (physical units)
local = p - np.array(chunk_coord) * np.array(chunk_shape)  # shape (D,)

# Bin coordinate (integer, 0-indexed within chunk)
bin_coord = np.floor(local / effective_bin_shape).astype(int)  # shape (D,)

# Flat bin index (C-order ravel)
bpc_shape  = tuple(int(c / b) for c, b in zip(chunk_shape, effective_bin_shape))
bin_flat   = int(np.ravel_multi_index(bin_coord, bpc_shape))

Default behaviour when `bin_shape` is omitted¶

If bin_shape is not specified at write time, it defaults to chunk_shape. This results in exactly one bin per chunk (B_per_chunk = 1), which is equivalent to a purely chunk-indexed store without sub-chunk spatial indexing.

This default preserves backward compatibility with readers and tools that do not understand the bin concept. A bbox query on a one-bin-per-chunk store degrades gracefully to chunk-granularity queries.

Choosing `bin_shape`¶

The optimal bin_shape depends on the typical bounding-box query size relative to chunk_shape.

Heuristic: set bin_shape so that a typical query bbox spans 2–8 bins per axis. This ensures the query reads only the relevant data while keeping B_per_chunk manageable (8–512 bins per chunk for a 3-D store).

`chunk_shape`	Typical query bbox	Recommended `bin_shape`	`B_per_chunk`
(200, 200, 200)	~50³ region	(50, 50, 50)	64
(500, 500, 500)	~100³ region	(100, 100, 100)	125
(500, 500, 500)	~250³ region	(250, 250, 250)	8
(1000, 1000, 1000)	~200³ region	(200, 200, 200)	125

For point cloud stores that will be queried at many scales (e.g. by a visualiser that pans and zooms), a smaller bin_shape (more bins) is preferable. For batch analysis that reads entire volumes sequentially, a larger bin_shape (fewer bins) reduces VG index overhead.

Effect on multiscale pyramids¶

At coarser resolution levels, the effective bin shape grows proportionally with bin_ratio. This is the mechanism by which ZVF achieves spatial downsampling: the bin grid at level 1 is half as fine as at level 0 (for bin_ratio = (2, 2, 2)), so each bin at level 1 covers 8× the volume and contains up to 8× as many merged vertices.

The chunk_shape never changes across levels. Only the bin grid changes. This means the file layout (one file per chunk) is identical at all levels; only the VG index and vertex positions differ.

Validation¶

L1: base_bin_shape is present in root .zattrs and is a list of D positive numbers.

L2:

len(base_bin_shape) == spatial_dims.
All elements are strictly positive.
chunk_shape[d] % base_bin_shape[d] == 0 for all d.
For each resolution level N: effective_bin_shape[N][d] ≤ chunk_shape[d] and chunk_shape[d] % effective_bin_shape[N][d] == 0 for all d.
bin_ratio values at every level are positive integers.

L3:

At level 0 with the default writer (i.e. when LevelMetadata.shared_fragments=False), the range fragment count R in each chunk’s vertex_fragments/<chunk_coords> blob equals that chunk’s non-empty-bin count. B_per_chunk = product(chunk_shape[d] / effective_bin_shape[N][d]) remains the upper bound on the total fragment count F.
For every range fragment, start + count ≤ vertex_count_in_chunk. For every explicit fragment, all entries of the index list lie in [0, vertex_count_in_chunk).
At coarsened levels with shared_fragments=True, F is bounded by the number of distinct metavertices in the chunk and has no fixed relation to B_per_chunk.