Fragments¶
Terms¶
- Fragment
The on-disk unit that packages rows of
vertices/<chunk>for indexing. Each fragment is one entry in the chunk’s fragment-index blob and is either a contiguous range of rows or an explicit list of row indices. See Fragment-index arrays for the byte layout.- Fragment address
The pair
(chunk_coords, fragment_index)that uniquely identifies a fragment in one level of a store. Chunk coordinates are a D-tuple (not a flat ravel), andfragment_indexis chunk-local — it indexes the chunk’svertex_fragments/<chunk_coords>blob only.- Primary fragment
The first fragment of an object in traversal order (for sequential types) or the fragment containing the root vertex (for tree types). Useful for preview rendering; no longer the only addressable entry in
object_index/.- Object fragment list
The ordered list of fragment addresses belonging to one object. Encoded as the per-object manifest in
object_index/data(see Object manifest).
Introduction¶
Fragments are the bridge between the chunk-level I/O of Zarr and the bin-level spatial queries of ZVF. A small binary fragment index sits alongside each chunk’s vertex payload, naming the row ranges that belong to each fragment.
At level 0 with the default writer, fragments line up with bins
one-to-one: each non-empty bin emits exactly one range fragment.
The coarsening pipeline relaxes that correspondence — at coarsened
pyramid levels with LevelMetadata.shared_fragments=True a single
fragment may represent a metavertex referenced by several objects’
manifests, and the bin model is supplanted by the fragment as the
addressing primitive.
This page describes the level-0 write path (where fragments and bins align one-to-one), the fragment addressing scheme, the read path, and what changes at coarsened levels.
Technical reference¶
Fragment creation during write (level 0)¶
When a chunk is written, its vertices are first sorted into bin order; then one range fragment is emitted per non-empty bin:
def build_level0_fragments(positions, chunk_coord, chunk_shape, bin_shape):
"""Compute fragment list for one chunk's vertices."""
D = len(chunk_shape)
bpc_shape = tuple(int(c / b) for c, b in zip(chunk_shape, bin_shape))
# 1. Compute bin flat index for each vertex
local = positions - np.array(chunk_coord) * np.array(chunk_shape)
bin_coords = np.floor(local / np.array(bin_shape)).astype(int)
bin_flats = np.ravel_multi_index(bin_coords.T, bpc_shape)
# 2. Sort vertices into bin order (stable sort within each bin)
order = np.argsort(bin_flats, kind="stable")
sorted_pos = positions[order]
sorted_bin = bin_flats[order]
# 3. Emit one range fragment per non-empty bin
unique_bins, first_occ, counts = np.unique(
sorted_bin, return_index=True, return_counts=True,
)
fragments = [
(int(first_occ[i]), int(counts[i]))
for i in range(len(unique_bins))
]
# 4. Encode and write
blob = encode_fragments(fragments)
return sorted_pos, order, blob
The order permutation is applied to attribute arrays in the same
way: sorted_attributes = attributes[order]. Attribute row k always
corresponds to vertex row k in the chunk.
Empty bins do not contribute a fragment — the pre-0.6 (0, 0) and
(-1, -1) filler rows are gone. The total fragment count F in a
level-0 chunk equals the number of non-empty bins, bounded above by
B_per_chunk.
Fragment addressing¶
A fragment is uniquely identified within one level of a store by:
chunk_coords: the D-tuple(i, j, k, …)naming the chunk. No flat ravel; the tuple is stored asint64words in the manifest block (see Object manifest).fragment_index: anintin[0, F), chunk-local. F may differ from chunk to chunk; readers learn F from the chunk’svertex_fragments/blob.
There is no global flat fragment index, and there is no need for one:
manifests reference fragments by their (chunk_coords, fragment_index) pair, and the pair is decoded directly from the
manifest blob without a ravel-multi-index step.
Fragment access pattern¶
Reading a specific fragment given its address:
def read_fragment_vertices(level_group, chunk_coords, fragment_index):
"""Read the vertices of a single fragment."""
fidx = read_fragment_index(
level_group, VERTEX_FRAGMENTS, chunk_coords,
)
rows = fidx.indices(fragment_index)
vertices = level_group.read_vertices(chunk_coords)
return vertices[rows, :]
For range fragments, fidx.indices(f) materialises arange(start, start + count). For explicit fragments, it returns a copy of the
CSR slice; hot loops can use fidx.indices_view(f) for a zero-copy
view over the underlying bytes. See
Fragment-index arrays for the
decoder algorithm.
Because each chunk of vertices/ is independently stored, reading
any fragment in a chunk reads the chunk’s vertex payload in full;
the fragment index then selects rows from the in-memory array.
Subsequent fragment reads within the same chunk are served from
the in-process chunk cache.
Fragments and the object model¶
For discrete-object geometry types (polyline, streamline, graph,
skeleton, mesh), object_index/data stores each object’s full
manifest — the ordered list of (chunk_coords, fragment_ref) blocks
that span the object. The first block names the object’s primary
fragment:
For polylines and streamlines, blocks are in traversal order. A single chunk’s contribution may be one fragment (mode 0), a contiguous range of adjacent-bin fragments (mode 1), or an arbitrary list (mode 2 — most common after coarsening).
For graphs and skeletons, block order is implementation-defined; connectivity is recovered from
links/<delta>/andcross_chunk_links/.
Fragments across multiple chunks¶
Reading a multi-chunk object is a single decode of the object’s manifest blob followed by parallel chunk reads — no link-walk loop:
def read_object(level_group, object_id):
blocks = decode_object_manifest(level_group, object_id)
chunks = [
read_chunk_fragments(level_group, coords, fragment_ref)
for coords, fragment_ref in blocks
]
return np.concatenate(chunks, axis=0)
Pre-0.6 stores required walking cross_chunk_links/ forward from
each chunk to discover the next chunk; the walk was a sequence of
dependent reads. Post-0.6 the manifest enumerates every chunk
directly. cross_chunk_links/0/ still encodes geometric edges
across chunks but is not used for chunk discovery during object
reads.
Fragment count at coarser levels¶
The intuition “B_per_chunk shrinks with bin_ratio until each
chunk is one fragment at the all-coarse level” is true only when the
default writer is in use and the level does not share fragments.
For levels with shared_fragments=False:
Level 0: bpc_shape = (4, 4, 4) → max F per chunk = 64
Level 1: bpc_shape = (2, 2, 2) → max F per chunk = 8 (bin_ratio = (2,2,2))
Level 2: bpc_shape = (1, 1, 1) → max F per chunk = 1 (bin_ratio = (4,4,4))
For levels with shared_fragments=True, F equals the number of
distinct metavertices in the chunk after coarsening — there is no
fixed relationship to B_per_chunk. F is typically smaller than
B_per_chunk (each metavertex represents many original vertices)
but can in principle exceed it when many objects converge on
non-aligned positions inside one bin.
Invariants maintained by the writer¶
A writer SHALL preserve:
Bin order at level 0. Vertices within a chunk are stored in ascending bin-flat-index order. Attribute row
kcorresponds to vertex rowk.Fragment-blob well-formedness. Every chunk’s
vertex_fragments/blob conforms to the v1 byte layout — magic'ZVFG', version 1,R == popcount(bitmap[0:F]), monotone CSR offsets. See Fragment-index arrays §”Write-time invariants”.Index bounds. Every range fragment’s
[start, start + count)and every explicit fragment’s indices lie in[0, len(vertices[chunk_coords])).Fragment order at level 0 (default writer). Fragments are emitted in ascending bin-flat-index order, so readers that need a bin ↔ fragment map at level 0 can derive it from the writer’s emission order without a separate sidecar.
Attribute row alignment. Attribute arrays for chunk
care row-aligned withvertices/<c>. Attribute access for fragmentfgoes viaattributes[chunk][fidx.indices(f)]; no separate attribute index is maintained.Manifest consistency. The manifest in
object_index/datafor objectoidnames only fragments that exist in the named chunks’ fragment-index blobs.Sharing discipline. When
LevelMetadata.shared_fragments=False, no fragment is named by more than one object’s manifest. WhenTrue, sharing is permitted and readers MUST NOT assume uniqueness.
Pre-0.6 ↔ post-0.6 cross-reference¶
The pre-0.6 layout used the term fragment (fragment) for a
(chunk_flat, bin_flat) pair backed by a (B, 2) int64 table
named vertex_group_offsets/. The v0.6 fragment is the post-rename
equivalent on disk; users migrating off pre-0.6 stores may find this
mapping useful:
Pre-0.6 term |
Post-0.6 equivalent |
|---|---|
fragment address |
Fragment address |
|
|
|
|
Bin with |
No fragment emitted for that bin |
Bin with |
No fragment emitted for that bin |
One row per bin per chunk (fixed shape) |
One fragment per non-empty bin (variable shape) |
|
Zero overhead for empty bins |
n/a |
Explicit fragment (mode-2; arbitrary index list) |
n/a |
Shared fragment (named by ≥ 2 manifests) |
Object manifest reconstructed via |
Object manifest enumerated directly in |