ZVF and TRX format
Terms
- TRX format
Tractography Exchange format: a ZIP-based tractography file format designed to replace TRK and TCK for cross-software data exchange. Defined at https://tractography-file-format.github.io. Implemented in trx-python and supported by DIPY, MRtrix, MRtrix3Tissue, and others.
- dps (data per streamline)
Per-streamline scalar or matrix arrays in TRX, stored under dps/ in the ZIP archive. Each array has one row per streamline. Equivalent to ZVF object_attributes/.
- dpp (data per point)
Per-vertex scalar or matrix arrays in TRX, stored under dpp/ in the ZIP archive. Each array has one value per vertex across all streamlines, in concatenated order. Equivalent to ZVF attributes/.
- TRX header
A JSON file, header.json, at the root of the ZIP archive, storing the affine transform, voxel dimensions, voxel-to-rasmm transform, and data space identifier.
- Memory mapping
TRX arrays are stored as raw binary .npy files within the ZIP, allowing direct memory mapping on POSIX systems without decompression. This is the primary efficiency advantage of TRX for local I/O.
- Offset array
TRX stores the lengths (or cumulative offsets) of individual streamlines in streamlines/offsets.int64.npy. Given offsets [0, 40, 85, …], streamline k spans points [offsets[k], offsets[k+1]) of streamlines/data.float32.npy.
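The memory-mapping and offset-array entries above can be illustrated with plain NumPy. The following is a minimal sketch, not trx-python code: it writes a toy vertex array to a standalone .npy file (rather than reading from inside a ZIP archive), maps it with np.load, and slices one streamline using cumulative offsets. The toy data, the helper name streamline, and the choice to append the total point count as a final offset are assumptions made for the illustration.

```python
import os
import tempfile

import numpy as np


def streamline(data, offsets, k):
    """Return a copy of the (n_points, 3) vertices of streamline k."""
    return np.array(data[offsets[k]:offsets[k + 1]])


# Toy data: three streamlines with 2, 3 and 1 points (6 points in total).
lengths = np.array([2, 3, 1], dtype=np.int64)
points = np.arange(18, dtype=np.float32).reshape(6, 3)

# Cumulative offsets with a leading 0 and (as an assumption of this sketch)
# a trailing total, so streamline k spans points[offsets[k]:offsets[k + 1]].
offsets = np.concatenate(([0], np.cumsum(lengths))).astype(np.int64)

with tempfile.TemporaryDirectory() as tmp:
    data_path = os.path.join(tmp, "data.float32.npy")
    np.save(data_path, points)

    # mmap_mode="r" maps the file instead of reading it all into memory;
    # only the pages touched by the slice below are actually read.
    mapped = np.load(data_path, mmap_mode="r")
    second = streamline(mapped, offsets, 1)
    del mapped  # release the mapping before the directory is removed
```

Here second holds the three vertices of streamline 1, copied out of the mapped file so that it remains valid after the temporary file is deleted.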
Introduction
TRX and ZVF both store tractography streamline data, but they were designed for fundamentally different access patterns. TRX prioritises fast sequential access on local storage via memory mapping — it is ideal for software pipelines that process all streamlines in sequence. ZVF prioritises spatial random access and cloud-native serving — it is ideal for spatial queries on large datasets, multi-resolution visualisation, and scalable cloud pipelines.
Most tractography workflows begin with data in TRK, TCK, or TRX format.
The ZVF ingest pipeline (ingest_trx, ingest_trk, ingest_tck) converts
these formats to ZVF, preserving all dps and dpp attributes as
object_attributes/ and attributes/ respectively.
Technical reference
Format structure comparison
| Property | TRX | ZVF ( |
|---|---|---|
| Container | ZIP archive ( | Directory tree ( |
| Vertex storage | Flat concatenated array + offset table | Spatially chunked, fragment-indexed |
| Spatial index | None | Chunk grid + fragment index |
| Per-vertex attributes ( | Separate | |
| Per-streamline attributes ( | Separate | |
| Affine transform | Stored in | Not stored; apply before writing |
| Multi-resolution | No | Yes |
| Cloud access | Not designed for (ZIP download required) | Native (chunk-level range requests) |
| Write API | Yes ( | Yes ( |
| Memory mapping | Yes (fast local sequential access) | No (but chunk caching is equivalent for sequential reads) |
| Groupings | | |
Data model mapping
Vertex positions
TRX stores all streamline vertices as a flat (total_points, 3) float32
array in streamlines/data.float32.npy. Streamline boundaries are given
by streamlines/offsets.int64.npy (cumulative sum of streamline lengths).
ZVF stores vertices chunked spatially. The equivalent of the TRX offset
table is the combination of object_index/ (primary fragment per streamline)
and cross_chunk_links/ (inter-chunk continuations).
dpp → attributes/
Each dpp/<name>.<dtype>.npy file in TRX contains one value per vertex in
the same concatenated order as streamlines/data. In ZVF, the equivalent
is attributes/<name>/, which stores one value per vertex in fragment order
(per-chunk, spatially sorted).
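The change from concatenated order to per-chunk order can be sketched as a single permutation applied to both the vertex positions and every dpp array. This is a simplified illustration only: the function chunk_order, the uniform cubic chunk size, and the row-major chunk numbering are assumptions, and the actual ZVF fragment layout may sort vertices differently.

```python
import numpy as np


def chunk_order(positions, chunk_size):
    """Return (permutation, chunk_ids) grouping vertices by spatial chunk.

    The sort is stable, so vertices keep their original relative order
    inside each chunk.
    """
    # Integer chunk coordinates of every vertex.
    grid = np.floor(positions / chunk_size).astype(np.int64)
    # Shift to non-negative coordinates, then flatten to one id per vertex.
    grid -= grid.min(axis=0)
    dims = tuple(int(d) for d in grid.max(axis=0) + 1)
    chunk_ids = np.ravel_multi_index(tuple(grid.T), dims)
    return np.argsort(chunk_ids, kind="stable"), chunk_ids


# Four vertices alternating between two chunks along x.
positions = np.array(
    [[1.0, 1.0, 1.0], [12.0, 1.0, 1.0], [2.0, 2.0, 2.0], [13.0, 2.0, 2.0]],
    dtype=np.float32,
)
fa = np.array([0.1, 0.2, 0.3, 0.4], dtype=np.float32)  # one value per vertex

order, chunk_ids = chunk_order(positions, chunk_size=10.0)
positions_chunked = positions[order]
fa_chunked = fa[order]  # the same permutation keeps values aligned with vertices
```

The point of the sketch is that a per-vertex attribute stays valid after conversion only if it is reordered with exactly the same permutation as the positions.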
dps → object_attributes/
Each dps/<name>.<dtype>.npy file in TRX contains one value per streamline
(row k corresponds to streamline k). In ZVF, the equivalent is
object_attributes/<name>/.
Matrix dps attributes (shape (n_streamlines, K, M)) are supported in
both formats. ZVF stores them as object_attributes/<name>/ with shape
(n_objects, K, M).
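The naming rules for dpp and dps described in this section can be summarised as a small path-mapping helper. This is a hypothetical sketch using only the standard library: zvf_target and TRX_TO_ZVF are illustrative names, not part of zarr-vectors-py or zarr-vectors-tools, and the member names follow the <name>.<dtype>.npy pattern given above.

```python
from pathlib import PurePosixPath

# TRX attribute folder -> ZVF attribute folder, as described in the text.
TRX_TO_ZVF = {"dpp": "attributes", "dps": "object_attributes"}


def zvf_target(trx_member):
    """Map 'dps/<name>.<dtype>.npy' to ('object_attributes/<name>/', '<dtype>')."""
    path = PurePosixPath(trx_member)
    folder = path.parent.name
    if folder not in TRX_TO_ZVF:
        raise ValueError(f"not a dps/dpp member: {trx_member!r}")
    parts = path.name.rsplit(".", 2)
    if len(parts) != 3 or parts[2] != "npy":
        raise ValueError(f"expected <name>.<dtype>.npy, got {path.name!r}")
    name, dtype, _ = parts
    return f"{TRX_TO_ZVF[folder]}/{name}/", dtype


# Hypothetical member names used only for illustration.
per_vertex = zvf_target("dpp/fa.float32.npy")
per_streamline = zvf_target("dps/mean_fa.float64.npy")
```

Splitting from the right means an attribute name that itself contains a dot is still parsed correctly, since only the last two components are treated as dtype and extension.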
Groups
TRX groups are directories inside the ZIP archive: groups/<name>/offsets.int64.npy
lists the indices of the streamlines that belong to the group. ZVF stores groups in
groupings/ with group names in groupings_attributes/name/.
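Because a group is just a list of streamline indices, resolving it to vertices on the TRX side only needs the offset table. The sketch below assumes the flat array and offset layout described under Vertex positions; group_vertices and the group name cst_left are hypothetical, and the code does not show how ZVF lays out groupings/ internally.

```python
import numpy as np


def group_vertices(points, offsets, group_indices):
    """Concatenate the vertices of the streamlines listed in group_indices."""
    parts = [points[offsets[k]:offsets[k + 1]] for k in group_indices]
    if not parts:
        return points[:0]  # empty (0, 3) result for an empty group
    return np.concatenate(parts)


# Three streamlines with 2, 3 and 1 points.
points = np.arange(18, dtype=np.float32).reshape(6, 3)
offsets = np.array([0, 2, 5, 6], dtype=np.int64)

# Hypothetical group containing streamlines 0 and 2.
cst_left = np.array([0, 2], dtype=np.int64)

selected = group_vertices(points, offsets, cst_left)
```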
TRX metadata not preserved in ZVF
TRX stores an affine transform from voxel space to RAS mm in header.json.
ZVF does not store or apply affine transforms; the caller must apply the
affine to vertex positions before writing to ZVF. The affine can be stored
in root .zattrs under custom_metadata.affine for provenance, but it
will not be automatically applied by zarr-vectors-py.
TRX ingest (via the companion package zarr-vectors-tools) exposes
an apply_affine keyword that controls whether the TRX header affine
is applied to vertex positions before they are written to ZVF.
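Applying the affine by hand before writing amounts to one matrix product per vertex array. The following is a minimal NumPy sketch with a made-up 2 mm voxel-to-rasmm matrix; apply_affine here is a local helper written for this example, not the apply_affine keyword of the ingest functions. The last lines show one way the matrix could be serialised for the custom_metadata.affine entry mentioned above; only that key path comes from the text, the rest of the attribute layout is an assumption.

```python
import json

import numpy as np


def apply_affine(affine, positions):
    """Apply a 4x4 affine to an (N, 3) array of vertex positions."""
    affine = np.asarray(affine, dtype=np.float64)
    positions = np.asarray(positions, dtype=np.float64)
    # Rotation/scale part first, then the translation column.
    return positions @ affine[:3, :3].T + affine[:3, 3]


# Illustrative affine: 2 mm isotropic voxels with an origin shift.
voxel_to_rasmm = np.array(
    [
        [2.0, 0.0, 0.0, -90.0],
        [0.0, 2.0, 0.0, -126.0],
        [0.0, 0.0, 2.0, -72.0],
        [0.0, 0.0, 0.0, 1.0],
    ]
)
voxels = np.array([[0.0, 0.0, 0.0], [10.0, 20.0, 30.0]])

rasmm = apply_affine(voxel_to_rasmm, voxels)

# Provenance only: the stored matrix is not applied automatically on read.
root_attrs = {"custom_metadata": {"affine": voxel_to_rasmm.tolist()}}
zattrs_text = json.dumps(root_attrs, indent=2)
```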
Performance comparison
| Operation | TRX | ZVF |
|---|---|---|
| Sequential read of all streamlines | Very fast (memory mapped) | Fast (sequential chunk reads, ~same throughput) |
| Random access to one streamline by ID | O(1) offset lookup + linear read | O(1) object_index lookup + fragment read |
| Spatial bbox query | O(n) — scan all streamlines | O(chunks × bins) — spatial index |
| Cloud access (S3/GCS) | Requires full download | Native range requests |
| Spatial query on 1M-streamline dataset | Seconds to minutes | < 1 second for a small bbox |
ZVF’s spatial query advantage is most pronounced for datasets with millions of streamlines where only a small spatial region is needed. For pipelines that process all streamlines sequentially, TRX’s memory-mapped access pattern has comparable or better throughput than ZVF’s chunked reads.
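The difference between the two bounding-box rows in the table can be made concrete with a toy comparison. The sketch below contrasts a full scan over a flat vertex array with a lookup in a dictionary keyed by chunk coordinate. It is a conceptual illustration under simplifying assumptions (uniform cubic chunks, an in-memory dict, hypothetical function names) and does not reproduce the ZVF chunk grid or fragment index.

```python
import numpy as np


def vertex_owner(offsets):
    """Streamline id of every vertex, derived from the offset table."""
    return np.repeat(np.arange(len(offsets) - 1), np.diff(offsets))


def bbox_scan(points, offsets, lo, hi):
    """O(n) query: test every vertex, then report streamlines with any hit."""
    inside = np.all((points >= lo) & (points <= hi), axis=1)
    return np.unique(vertex_owner(offsets)[inside])


def build_chunk_index(points, offsets, chunk_size):
    """Map each chunk coordinate to the set of streamlines passing through it."""
    cells = np.floor(points / chunk_size).astype(np.int64).tolist()
    index = {}
    for cell, sid in zip(cells, vertex_owner(offsets).tolist()):
        index.setdefault(tuple(cell), set()).add(sid)
    return index


def bbox_candidates(index, lo, hi, chunk_size):
    """Streamlines in chunks overlapping the box (a superset of the exact hits)."""
    lo_c = np.floor(np.asarray(lo) / chunk_size).astype(np.int64).tolist()
    hi_c = np.floor(np.asarray(hi) / chunk_size).astype(np.int64).tolist()
    found = set()
    for x in range(lo_c[0], hi_c[0] + 1):
        for y in range(lo_c[1], hi_c[1] + 1):
            for z in range(lo_c[2], hi_c[2] + 1):
                found |= index.get((x, y, z), set())
    return sorted(found)


# Streamlines 0 and 2 lie near the origin, streamline 1 is far away.
points = np.array(
    [[1, 1, 1], [2, 2, 2], [50, 50, 50], [51, 51, 51], [52, 52, 52], [3, 3, 3]],
    dtype=np.float32,
)
offsets = np.array([0, 2, 5, 6], dtype=np.int64)
lo, hi = np.array([0.0, 0.0, 0.0]), np.array([5.0, 5.0, 5.0])

exact = bbox_scan(points, offsets, lo, hi)
index = build_chunk_index(points, offsets, chunk_size=10.0)
candidates = bbox_candidates(index, lo, hi, chunk_size=10.0)
```

The scan touches every vertex regardless of box size, while the indexed lookup visits only the chunks that overlap the box. The candidates are a superset of the exact answer, so a real implementation would still filter the vertices of those chunks against the box.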
Ingest and export
TRX converters live in the companion package zarr-vectors-tools.
TRX ingest preserves all dps and dpp attributes; TRX export is lossless
for the base level (level 0). Coarser resolution levels are not exported
to TRX (TRX has no multi-resolution concept), and the exported TRX does
not include the ZVF chunk layout metadata.