Benchmarks

Stub

Authoritative benchmark numbers for zarr-vectors-py are not yet published. This page describes how to run the benchmark notebooks shipped with the repository and what each one measures. Results tables and plots will be added in a future release once a fixed benchmark harness, reference hardware, and reproducibility protocol have been agreed.

Notebooks

Three small Jupyter notebooks live under benchmarks/ in the source tree. Each follows the same shape (setup → sweep → table → plot) and is intentionally small: about 10 cells and one plot per notebook.

Notebook                Axis swept                                 Fixed
01_size_scaling.ipynb   vertex count N ∈ {10³, 10⁴, 10⁵, 10⁶}      point cloud, local backend
02_data_types.ipynb     geometry type (all seven)                  N = 50 000, local backend
03_backends.ipynb       storage backend (local, obstore, fsspec)   point cloud, N = 100 000

The notebooks are designed as “what to expect on my machine” sanity references, not as cross-format comparisons.
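The setup → sweep → table shape can be sketched roughly as follows. This is a minimal illustration using only the standard library, not the notebooks' actual code: write_points is a hypothetical stand-in for a zarr-vectors write call (it just dumps float32 coordinates to a flat file), and the row keys n, wall_s, and disk_bytes are assumed names.

```python
import tempfile
import time
from array import array
from pathlib import Path


def write_points(path: Path, n: int) -> None:
    """Hypothetical stand-in for a zarr-vectors write: dump n xyz points as floats."""
    coords = array("f", (float(i % 1000) for i in range(3 * n)))
    with open(path, "wb") as fh:
        coords.tofile(fh)


def sweep(sizes):
    """Run the stand-in writer once per size; record wall time and disk bytes."""
    rows = []
    with tempfile.TemporaryDirectory() as tmp:
        for n in sizes:
            target = Path(tmp) / f"points_{n}.bin"
            start = time.perf_counter()
            write_points(target, n)
            elapsed = time.perf_counter() - start
            rows.append(
                {"n": n, "wall_s": elapsed, "disk_bytes": target.stat().st_size}
            )
    return rows


# Sweep a reduced set of sizes and print a plain-text table.
rows = sweep([10**3, 10**4, 10**5])
print(f"{'n':>8} {'wall_s':>10} {'disk_bytes':>12}")
for row in rows:
    print(f"{row['n']:>8} {row['wall_s']:>10.4f} {row['disk_bytes']:>12}")
```

In a notebook, the same list of rows could be handed to pandas for the table and to matplotlib for the single plot.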

Running locally

pip install zarr-vectors pandas matplotlib jupyterlab
jupyter lab benchmarks/

Then open one notebook and run all cells. Expected runtime on a laptop:

  • 01_size_scaling: a few minutes (the 1 M-vertex case dominates).

  • 02_data_types: ~30 seconds.

  • 03_backends: ~10 seconds with the local backend only, longer when the cloud backends are enabled.

Cloud-backend mode

Notebook 03_backends.ipynb benchmarks the obstore and fsspec cloud backends only when the ZV_BENCH_S3_URL environment variable is set:

export ZV_BENCH_S3_URL="s3://my-bucket/zv-bench/"
pip install "zarr-vectors[obstore]"   # preferred
# or
pip install "zarr-vectors[cloud]"     # fsspec fallback

Without the environment variable, the cloud rows are skipped and only a single local-backend row is reported.
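The gating on ZV_BENCH_S3_URL might look something like the sketch below. The function name select_backends and the exact selection logic are assumptions for illustration; the notebook may, for example, also check which optional extras are installed.

```python
import os


def select_backends(env=None):
    """Return the backends to benchmark (hypothetical helper).

    The local backend always runs; the cloud backends are added only
    when ZV_BENCH_S3_URL is set to a non-empty value.
    """
    env = os.environ if env is None else env
    backends = ["local"]
    if env.get("ZV_BENCH_S3_URL"):
        backends += ["obstore", "fsspec"]
    return backends


# With no URL configured, only the local row would be produced.
local_only = select_backends({})
with_cloud = select_backends({"ZV_BENCH_S3_URL": "s3://my-bucket/zv-bench/"})
print(local_only)
print(with_cloud)
```

Passing the environment as a plain mapping keeps the helper easy to exercise without touching the real process environment.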

Results

To be added. Numbers will be published here once a reproducibility protocol (hardware spec, OS, dependency versions, dataset seeds) has been frozen and the harness has been re-run against the locked target. Until then, the notebooks are the only available reference, and their output is machine-dependent; do not treat it as published metrics.
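Until a protocol is frozen, it may help to record the environment alongside any local numbers. The sketch below is one possible way to capture OS, interpreter, and dependency versions with the standard library; the function name describe_environment and the choice of fields are assumptions, not part of any agreed protocol.

```python
import json
import platform
import sys
from importlib import metadata


def describe_environment(packages=("zarr-vectors", "pandas", "matplotlib")):
    """Collect basic platform details and installed package versions."""
    versions = {}
    for name in packages:
        try:
            versions[name] = metadata.version(name)
        except metadata.PackageNotFoundError:
            # Record missing packages explicitly rather than failing.
            versions[name] = None
    return {
        "python": sys.version.split()[0],
        "os": platform.platform(),
        "machine": platform.machine(),
        "processor": platform.processor(),
        "packages": versions,
    }


env_info = describe_environment()
print(json.dumps(env_info, indent=2))
```

Saving this dictionary next to a results table makes it easier to tell later which machine and versions produced it.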

Caveats

  • These notebooks measure disk bytes and wall time only; there is no memory profiling.

  • No CI gating and no pytest-benchmark integration: regressions are not caught automatically.

  • Different geometry types do genuinely different work; do not cross-compare rows of 02_data_types.

  • Numbers are not directly comparable across hardware, file systems, or cloud regions.
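For reference, the two measured quantities can be collected along these lines. This is a sketch under stated assumptions: disk_bytes and wall_clock are hypothetical helper names, the store is simulated with two small files, and the size reported is the apparent file size rather than allocated blocks, which is one reason numbers differ across file systems.

```python
import tempfile
import time
from contextlib import contextmanager
from pathlib import Path


def disk_bytes(root: Path) -> int:
    """Sum the apparent sizes of all regular files under root."""
    return sum(p.stat().st_size for p in root.rglob("*") if p.is_file())


@contextmanager
def wall_clock(result: dict):
    """Store the elapsed wall time of the enclosed block in result['wall_s']."""
    start = time.perf_counter()
    try:
        yield result
    finally:
        result["wall_s"] = time.perf_counter() - start


with tempfile.TemporaryDirectory() as tmp:
    # Simulated directory store: one 4096-byte chunk plus a 2-byte metadata file.
    store = Path(tmp) / "example_store"
    (store / "chunks").mkdir(parents=True)
    timing = {}
    with wall_clock(timing):
        (store / "chunks" / "0").write_bytes(b"\x00" * 4096)
        (store / "meta.json").write_text("{}")
    total = disk_bytes(store)

print(f"disk bytes: {total}, wall time: {timing['wall_s']:.6f} s")
```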

To regenerate the notebooks from the source recipe:

python benchmarks/_build.py