Benchmarks
Stub
Authoritative benchmark numbers for zarr-vectors-py are not yet
published. This page describes how to run the benchmark notebooks
shipped with the repository and what each one measures. Results
tables and plots will be added in a future release once a fixed
benchmark harness, reference hardware, and reproducibility protocol
have been agreed.
Notebooks
Three small Jupyter notebooks live under benchmarks/ in the source tree.
Each follows the same shape (setup → sweep → table → plot) and is
intentionally small (about 10 cells, one plot per notebook).
| Notebook | Axis swept | Fixed |
|---|---|---|
| 01_size_scaling | vertex count | point cloud, local backend |
| 02_data_types | geometry type (all seven) | |
| 03_backends | storage backend (…) | point cloud, … |
The notebooks are designed as “what to expect on my machine” sanity references, not as cross-format comparisons.
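The setup → sweep → table → plot shape described above can be sketched in plain Python. This is a minimal illustration under stated assumptions, not the notebooks' actual code: `write_points` is a hypothetical stand-in that dumps random coordinates to a flat binary file (a real notebook would call the zarr-vectors writer at that point), and the vertex counts are arbitrary.

```python
import os
import random
import struct
import tempfile
import time


def write_points(path, n_vertices, seed=0):
    """Hypothetical stand-in writer: n_vertices random xyz float32 triples."""
    rng = random.Random(seed)
    with open(path, "wb") as f:
        for _ in range(n_vertices):
            f.write(struct.pack("<3f", rng.random(), rng.random(), rng.random()))


def sweep(vertex_counts):
    """Run the writer once per vertex count and collect one row per run."""
    rows = []
    with tempfile.TemporaryDirectory() as tmp:
        for n in vertex_counts:
            path = os.path.join(tmp, f"points_{n}.bin")
            start = time.perf_counter()
            write_points(path, n)
            elapsed = time.perf_counter() - start
            rows.append({
                "n_vertices": n,
                "disk_bytes": os.path.getsize(path),
                "wall_s": elapsed,
            })
    return rows


# Sweep two small sizes and print the resulting table.
rows = sweep([1_000, 10_000])
for row in rows:
    print(f"{row['n_vertices']:>10} {row['disk_bytes']:>12} {row['wall_s']:.4f}")
```

A list of dicts like `rows` can be passed straight to `pandas.DataFrame(rows)` for the table and plot steps.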
Running locally
pip install zarr-vectors pandas matplotlib
jupyter lab benchmarks/
Then open one notebook and run all cells. Expected runtime on a laptop:
- 01_size_scaling: a few minutes (the 1 M-vertex case dominates).
- 02_data_types: ~30 seconds.
- 03_backends: ~10 seconds without cloud, longer with.
Cloud-backend mode
Notebook 03_backends.ipynb benchmarks the obstore and fsspec
cloud backends only when the ZV_BENCH_S3_URL env var is set:
export ZV_BENCH_S3_URL="s3://my-bucket/zv-bench/"
pip install "zarr-vectors[obstore]" # preferred
# or
pip install "zarr-vectors[cloud]" # fsspec fallback
Without the env var the cloud rows are skipped and a one-row local-only result is reported.
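The gating behaviour can be pictured with a short sketch. The function name `select_backends` and the labels `"local"`, `"obstore"`, and `"fsspec"` are illustrative assumptions; the notebook's own logic may differ, for example by also checking whether the optional packages are importable.

```python
import os


def select_backends(env=None):
    """Return (backend labels to benchmark, S3 URL or None) for an environment mapping."""
    env = os.environ if env is None else env
    backends = ["local"]  # the local row is always benchmarked
    s3_url = env.get("ZV_BENCH_S3_URL", "").strip()
    if s3_url:
        # Cloud rows are added only when a target URL is configured.
        backends += ["obstore", "fsspec"]
    return backends, s3_url or None


print(select_backends({}))
print(select_backends({"ZV_BENCH_S3_URL": "s3://my-bucket/zv-bench/"}))
```

Passing the environment as a mapping keeps the decision testable without touching the real process environment.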
Results
To be added. Numbers will be published here once a reproducibility protocol (hardware spec, OS, dependency versions, dataset seeds) has been frozen and the harness has been re-run against the locked target. Until then, the notebooks themselves are the only available reference and they are machine-dependent — do not treat them as published metrics.
Caveats
- These notebooks measure disk bytes and wall time only, with no memory profiling.
- No CI gating and no pytest-benchmark integration: regressions are not caught automatically.
- Different geometry types do genuinely different work; do not cross-compare rows of 02_data_types.
- Numbers are not directly comparable across hardware, file systems, or cloud regions.
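For readers who want to see what "disk bytes and wall time" amounts to in practice, here is one way the two quantities could be measured for a directory-based store. This is a sketch under assumptions, not the notebooks' implementation: `dir_size_bytes` and `wall_timer` are hypothetical helpers, and the "store" is just a temporary directory filled with dummy chunk files.

```python
import os
import tempfile
import time
from contextlib import contextmanager


def dir_size_bytes(root):
    """Sum the logical sizes of all regular files under root."""
    total = 0
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in filenames:
            total += os.path.getsize(os.path.join(dirpath, name))
    return total


@contextmanager
def wall_timer(result):
    """Record elapsed wall time in result['wall_s'] when the block exits."""
    start = time.perf_counter()
    try:
        yield result
    finally:
        result["wall_s"] = time.perf_counter() - start


with tempfile.TemporaryDirectory() as store:
    timing = {}
    with wall_timer(timing):
        # Dummy workload: four 1 KiB "chunk" files in a subdirectory.
        os.makedirs(os.path.join(store, "chunks"))
        for i in range(4):
            with open(os.path.join(store, "chunks", f"c{i}"), "wb") as f:
                f.write(b"\0" * 1024)
    size = dir_size_bytes(store)

print(size, timing["wall_s"])
```

Note that `os.path.getsize` reports logical file size rather than allocated blocks, which is one reason byte counts can differ from what a given file system actually consumes.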
To regenerate the notebooks from the source recipe:
python benchmarks/_build.py