Bin shape¶
Terms¶
- bin_shape
  A D-tuple of positive floats declaring the spatial extent of each supervoxel bin within a chunk. bin_shape must evenly divide chunk_shape in every dimension. The number of bins per chunk along axis d is chunk_shape[d] / bin_shape[d].
- Supervoxel bin
  The finest spatial subdivision in ZVF. Each bin is a rectangular region of physical space whose size is bin_shape. Bins tile the interior of each chunk exactly, with no gaps or overlaps. All vertices whose position falls within a bin are stored together as a vertex group (VG).
- Bins per chunk (B_per_chunk)
  The total number of bins within a single chunk: product(chunk_shape[d] / bin_shape[d] for d in range(D)). For chunk_shape = (200, 200, 200) and bin_shape = (50, 50, 50): B_per_chunk = 4 × 4 × 4 = 64.
- Effective bin shape
  The bin shape at a given resolution level, after applying the level’s bin_ratio. effective_bin_shape[d] = base_bin_shape[d] × bin_ratio[d]. At level 0, effective bin shape equals base_bin_shape.
- base_bin_shape
  The bin shape at the full-resolution level (level 0). Stored in root .zattrs. Serves as the reference from which all coarser-level bin shapes are derived.
- Divisibility constraint
  The requirement that chunk_shape[d] % bin_shape[d] == 0 for all d. Ensures bins tile chunks exactly, which is necessary for the VG index to be well-defined. Validated at write time and by the L2 validator.
Introduction¶
Bins are the unit of spatial querying in ZVF. When you issue a bounding-box query, the query engine does not load entire chunks — it identifies which bins overlap the query region and reads only those bins’ vertices. This sub-chunk spatial resolution is what makes ZVF efficient for small targeted queries on large datasets.
bin_shape is the parameter that controls this query granularity. A smaller
bin_shape means finer spatial queries (less read amplification for small
bboxes) but more bins per chunk (larger VG index, more index bookkeeping). A
larger bin_shape means coarser queries but lower index overhead.
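As a rough illustration of this trade-off, the sketch below maps a query box to the bins it overlaps inside a single chunk. The helper name bins_overlapping_bbox, the chunk_origin argument, and the half-open box convention are assumptions made for this example; the actual query engine may work differently.

```python
import itertools
import math


def bins_overlapping_bbox(bbox_min, bbox_max, chunk_origin, chunk_shape, bin_shape):
    """Return the bin coordinates (within one chunk) that a half-open bbox overlaps.

    Hypothetical helper for illustration only; not part of the zarr_vectors API.
    """
    ranges = []
    for lo, hi, origin, c, b in zip(bbox_min, bbox_max, chunk_origin, chunk_shape, bin_shape):
        n_bins = round(c / b)  # bins along this axis
        # Clip the query to the chunk, in chunk-local physical units.
        local_lo = max(lo - origin, 0.0)
        local_hi = min(hi - origin, c)
        if local_lo >= local_hi:
            return []  # the bbox misses this chunk along this axis
        first = int(math.floor(local_lo / b))
        last = min(int(math.ceil(local_hi / b)), n_bins)  # exclusive upper bound
        ranges.append(range(first, last))
    return list(itertools.product(*ranges))


hits = bins_overlapping_bbox(
    bbox_min=(40.0, 0.0, 0.0),
    bbox_max=(110.0, 50.0, 10.0),
    chunk_origin=(0.0, 0.0, 0.0),
    chunk_shape=(200.0, 200.0, 200.0),
    bin_shape=(50.0, 50.0, 50.0),
)
print(len(hits), "of 64 bins overlap the query")
```

With a bin_shape equal to chunk_shape, the same query would touch the chunk's single bin and therefore read every vertex in the chunk.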
Unlike chunk_shape, bin_shape does not affect the physical file layout.
Changing bin_shape requires only recomputing the VG index and re-sorting
vertices within each chunk — the chunk files themselves stay the same size
and location. Re-binning is exposed for point-cloud stores via
zarr_vectors.rechunk.rebin_level(store_path, target_bin_shape, level=0);
for other geometry types (polyline, streamline, graph, skeleton, mesh)
the VG structure carries per-object semantics so a full
rechunk is required instead.
Technical reference¶
Declaration and storage¶
base_bin_shape is declared in root .zattrs:
{
"chunk_shape": [200.0, 200.0, 200.0],
"base_bin_shape": [50.0, 50.0, 50.0]
}
The effective bin shape at each resolution level is stored in the per-level
.zattrs for convenience:
{
"level": 1,
"bin_ratio": [2, 2, 2],
"bin_shape": [100.0, 100.0, 100.0]
}
Readers should compute bin_shape at level N as:
bin_shape[N][d] = base_bin_shape[d] × bin_ratio[N][d]
and cross-check against the stored bin_shape value. A mismatch is a
validation error (L2).
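The cross-check could look roughly like the following. The function names and the use of a relative floating-point tolerance are assumptions for this sketch; the attribute dictionaries simply mirror the JSON shown above.

```python
import math


def effective_bin_shape(base_bin_shape, bin_ratio):
    """Per-axis product base_bin_shape[d] * bin_ratio[d]."""
    return [b * r for b, r in zip(base_bin_shape, bin_ratio)]


def check_level_bin_shape(root_attrs, level_attrs, rel_tol=1e-9):
    """Recompute a level's bin_shape and compare it with the stored value.

    Hypothetical helper; rel_tol is an assumed tolerance, not a ZVF setting.
    """
    expected = effective_bin_shape(root_attrs["base_bin_shape"], level_attrs["bin_ratio"])
    stored = level_attrs["bin_shape"]
    matches = len(stored) == len(expected) and all(
        math.isclose(s, e, rel_tol=rel_tol) for s, e in zip(stored, expected)
    )
    if not matches:
        raise ValueError(
            f"level {level_attrs['level']}: stored bin_shape {stored} != expected {expected}"
        )
    return expected


root_attrs = {"chunk_shape": [200.0, 200.0, 200.0], "base_bin_shape": [50.0, 50.0, 50.0]}
level_attrs = {"level": 1, "bin_ratio": [2, 2, 2], "bin_shape": [100.0, 100.0, 100.0]}
print(check_level_bin_shape(root_attrs, level_attrs))  # [100.0, 100.0, 100.0]
```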
Divisibility constraint¶
For bins to tile chunks exactly, bin_shape must divide chunk_shape
evenly in every dimension:
chunk_shape[d] / bin_shape[d] must be a positive integer, for all d
This is enforced at write time. The write functions raise ValueError if
the constraint is violated.
Example of a valid configuration:
chunk_shape = (200, 200, 200)
bin_shape = (50, 50, 50) → 4.0 bins per axis ✓
Example of an invalid configuration:
chunk_shape = (200, 200, 200)
bin_shape = (60, 60, 60) → 3.333… bins per axis ✗
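A minimal sketch of such a check is shown below. The function name bins_per_axis and the tolerance-based comparison are assumptions for this example (physical extents are floats, so an exact `%` test can be fragile); the library's own write functions may implement the check differently.

```python
import math


def bins_per_axis(chunk_shape, bin_shape, tol=1e-9):
    """Return the integer bin count per axis, or raise ValueError.

    Hypothetical helper; tol is an assumed tolerance for float extents.
    """
    if len(chunk_shape) != len(bin_shape):
        raise ValueError("chunk_shape and bin_shape must have the same length")
    counts = []
    for d, (c, b) in enumerate(zip(chunk_shape, bin_shape)):
        if b <= 0:
            raise ValueError(f"bin_shape[{d}] must be positive, got {b}")
        quotient = c / b
        n = round(quotient)
        if n < 1 or abs(quotient - n) > tol:
            raise ValueError(
                f"bin_shape[{d}]={b} does not evenly divide chunk_shape[{d}]={c}"
            )
        counts.append(n)
    return tuple(counts)


counts = bins_per_axis((200, 200, 200), (50, 50, 50))
print(counts, math.prod(counts))  # (4, 4, 4) 64

try:
    bins_per_axis((200, 200, 200), (60, 60, 60))
except ValueError as exc:
    print("rejected:", exc)
```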
The constraint applies at every resolution level. The effective bin
shape at level N is base_bin_shape × bin_ratio[N], so satisfying the
constraint at level 0 is not sufficient on its own. A coarser level is
valid only if every bin_ratio component is a positive integer and each
product base_bin_shape[d] × bin_ratio[N][d] still divides
chunk_shape[d].
For example, with chunk_shape = (200,), base_bin_shape = (50,),
and bin_ratio = [4] at level 2: effective_bin_shape = 200.0. This is
valid (1 bin per chunk at level 2). bin_ratio = [5] would give
effective_bin_shape = 250.0 > chunk_shape[0], which is invalid.
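Rule: bin_ratio[N][d] must be a positive integer such that
base_bin_shape[d] × bin_ratio[N][d] ≤ chunk_shape[d] and
chunk_shape[d] / (base_bin_shape[d] × bin_ratio[N][d]) is a positive
integer. Equivalently, bin_ratio[N][d] must be a divisor of the level-0
bin count chunk_shape[d] / base_bin_shape[d].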
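Since the effective bin extent must divide the chunk extent, the admissible ratios along one axis are the divisors of the level-0 bin count chunk_shape[d] / base_bin_shape[d]. The sketch below enumerates them; valid_bin_ratios is a hypothetical helper and assumes base_bin_shape already satisfies the constraint at level 0.

```python
def valid_bin_ratios(chunk_extent, base_bin_extent):
    """List the integer bin_ratio values allowed along one axis.

    Hypothetical helper; assumes chunk_extent / base_bin_extent is a whole number.
    """
    n0 = round(chunk_extent / base_bin_extent)  # bins per chunk along this axis at level 0
    return [r for r in range(1, n0 + 1) if n0 % r == 0]


print(valid_bin_ratios(200.0, 50.0))  # [1, 2, 4]
```

Note that a ratio of 3 is rejected here even though 50 × 3 = 150 ≤ 200, because 150 does not divide 200.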
Bin coordinate arithmetic¶
For a vertex at position p within a chunk at coordinate chunk_coord:
import numpy as np

# Local position within the chunk (physical units)
local = np.asarray(p) - np.asarray(chunk_coord) * np.asarray(chunk_shape)  # shape (D,)
# Bin coordinate (integer, 0-indexed within chunk)
bin_coord = np.floor(local / np.asarray(effective_bin_shape)).astype(int)  # shape (D,)
# Bins per chunk along each axis; round() guards against float error in the division
bpc_shape = tuple(int(round(c / b)) for c, b in zip(chunk_shape, effective_bin_shape))
# Flat bin index (C-order ravel)
bin_flat = int(np.ravel_multi_index(tuple(bin_coord), bpc_shape))
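To make the arithmetic concrete, here is the same calculation wrapped in a function and applied to one vertex. The function name bin_index and the sample values are assumptions chosen for illustration.

```python
import numpy as np


def bin_index(p, chunk_coord, chunk_shape, effective_bin_shape):
    """Return (bin_coord, bin_flat) for a vertex at physical position p.

    Hypothetical wrapper around the arithmetic described above.
    """
    chunk_shape = np.asarray(chunk_shape, dtype=float)
    bin_shape = np.asarray(effective_bin_shape, dtype=float)
    # Position relative to the chunk origin, in physical units.
    local = np.asarray(p, dtype=float) - np.asarray(chunk_coord) * chunk_shape
    bin_coord = np.floor(local / bin_shape).astype(int)
    # Bins per chunk along each axis.
    bpc_shape = tuple(int(n) for n in np.rint(chunk_shape / bin_shape))
    bin_flat = int(np.ravel_multi_index(tuple(bin_coord), bpc_shape))
    return tuple(int(i) for i in bin_coord), bin_flat


coord, flat = bin_index(
    p=(260.0, 30.0, 199.0),
    chunk_coord=(1, 0, 0),
    chunk_shape=(200, 200, 200),
    effective_bin_shape=(50, 50, 50),
)
print(coord, flat)  # (1, 0, 3) 19
```

The local position is (60, 30, 199), which falls in bin (1, 0, 3); in C order over a 4 × 4 × 4 grid that is 1 × 16 + 0 × 4 + 3 = 19.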
Default behaviour when bin_shape is omitted¶
If bin_shape is not specified at write time, it defaults to
chunk_shape. This results in exactly one bin per chunk
(B_per_chunk = 1), which is equivalent to a purely chunk-indexed store
without sub-chunk spatial indexing.
This default preserves backward compatibility with readers and tools that do not understand the bin concept. A bbox query on a one-bin-per-chunk store degrades gracefully to chunk-granularity queries.
Choosing bin_shape¶
The optimal bin_shape depends on the typical bounding-box query size
relative to chunk_shape.
Heuristic: set bin_shape so that a typical query bbox spans 2–8 bins
per axis. This ensures the query reads only the relevant data while keeping
B_per_chunk manageable (8–512 bins per chunk for a 3-D store).
| chunk_shape | Typical query bbox | Recommended bin_shape | B_per_chunk |
|---|---|---|---|
| (200, 200, 200) | ~50³ region | (50, 50, 50) | 64 |
| (500, 500, 500) | ~100³ region | (100, 100, 100) | 125 |
| (500, 500, 500) | ~250³ region | (250, 250, 250) | 8 |
| (1000, 1000, 1000) | ~200³ region | (200, 200, 200) | 125 |
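One way to explore such choices is to enumerate the bin extents that divide a chunk evenly and pick the one closest to the typical query extent. This selection rule, the helper names, and the cap of 8 bins per axis are assumptions for this sketch, not a documented ZVF procedure; it happens to reproduce the rows of the table above.

```python
import math


def candidate_bin_extents(chunk_extent, max_bins_per_axis=8):
    """Bin extents that split chunk_extent into 1..max_bins_per_axis equal bins."""
    return [chunk_extent / n for n in range(1, max_bins_per_axis + 1)]


def suggest_bin_shape(chunk_shape, query_shape, max_bins_per_axis=8):
    """Per axis, pick the candidate extent closest to the typical query extent.

    Hypothetical heuristic for illustration only.
    """
    suggestion = []
    for c, q in zip(chunk_shape, query_shape):
        candidates = candidate_bin_extents(c, max_bins_per_axis)
        suggestion.append(min(candidates, key=lambda b: abs(b - q)))
    return tuple(suggestion)


chunk_shape = (500.0, 500.0, 500.0)
bin_shape = suggest_bin_shape(chunk_shape, query_shape=(100.0, 100.0, 100.0))
b_per_chunk = math.prod(round(c / b) for c, b in zip(chunk_shape, bin_shape))
print(bin_shape, b_per_chunk)  # (100.0, 100.0, 100.0) 125
```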
For point cloud stores that will be queried at many scales (e.g. by a
visualiser that pans and zooms), a smaller bin_shape (more bins) is
preferable. For batch analysis that reads entire volumes sequentially, a
larger bin_shape (fewer bins) reduces VG index overhead.
Effect on multiscale pyramids¶
At coarser resolution levels, the effective bin shape grows proportionally
with bin_ratio. This is the mechanism by which ZVF achieves spatial
downsampling: for bin_ratio = (2, 2, 2), each level-1 bin is twice as long
along every axis as a level-0 bin, so it covers 8× the volume and
contains up to 8× as many merged vertices.
The chunk_shape never changes across levels. Only the bin grid changes.
This means the file layout (one file per chunk) is identical at all levels;
only the VG index and vertex positions differ.
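The following sketch tabulates how the bin grid changes across levels while chunk_shape stays fixed. The bin_ratio ladder (1, 2, 4 per axis) and the function name level_summary are assumptions chosen for illustration.

```python
import math


def level_summary(chunk_shape, base_bin_shape, bin_ratios):
    """Return (level, effective_bin_shape, bins_per_chunk, volume_factor) per level.

    Hypothetical helper; volume_factor is the bin volume relative to level 0.
    """
    rows = []
    for level, ratio in enumerate(bin_ratios):
        eff = tuple(b * r for b, r in zip(base_bin_shape, ratio))
        bins_per_chunk = math.prod(round(c / e) for c, e in zip(chunk_shape, eff))
        rows.append((level, eff, bins_per_chunk, math.prod(ratio)))
    return rows


rows = level_summary(
    chunk_shape=(200.0, 200.0, 200.0),
    base_bin_shape=(50.0, 50.0, 50.0),
    bin_ratios=[(1, 1, 1), (2, 2, 2), (4, 4, 4)],
)
for level, eff, bins_per_chunk, volume_factor in rows:
    print(level, eff, bins_per_chunk, volume_factor)
```

Under these assumed ratios the bins per chunk fall from 64 to 8 to 1 as the per-bin volume grows by factors of 1, 8, and 64.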
Validation¶
L1: base_bin_shape is present in root .zattrs and is a list of D
positive numbers.
L2:
- len(base_bin_shape) == spatial_dims.
- All elements are strictly positive.
- chunk_shape[d] % base_bin_shape[d] == 0 for all d.
- For each resolution level N: effective_bin_shape[N][d] ≤ chunk_shape[d] and chunk_shape[d] % effective_bin_shape[N][d] == 0 for all d.
- bin_ratio values at every level are positive integers.
L3:
- At level 0 with the default writer (i.e. when LevelMetadata.shared_fragments=False), the range fragment count R in each chunk’s vertex_fragments/<chunk_coords> blob equals that chunk’s non-empty-bin count. B_per_chunk = product(chunk_shape[d] / effective_bin_shape[N][d]) remains the upper bound on the total fragment count F.
- For every range fragment, start + count ≤ vertex_count_in_chunk. For every explicit fragment, all entries of the index list lie in [0, vertex_count_in_chunk).
- At coarsened levels with shared_fragments=True, F is bounded by the number of distinct metavertices in the chunk and has no fixed relation to B_per_chunk.