Validation overview¶
Terms¶
- Conformance level
An integer from 1 to 5 declaring the thoroughness of a validation run. Each level is a strict superset of the previous: a store that passes level N also passes all levels below N. Running level 5 runs all checks.
- ValidationResult
The object returned by validate(). Carries a list of passed checks, warnings, and errors, plus a summary() method and an is_valid property.
- Check
A single assertion evaluated during validation. A check either passes, emits a warning (non-fatal), or raises an error (fatal). Errors accumulate; validation does not stop at the first failure.
- Warning
A non-fatal issue that indicates the store may behave unexpectedly in some tools but is not technically invalid. Example: a store without consolidated metadata will be slow to open on object stores.
- Error
A fatal issue that indicates the store does not conform to the ZVF spec at the declared conformance level. Example: a bin_shape value that does not evenly divide chunk_shape.
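The relationship between checks, warnings, and errors can be pictured with a small stand-in model. This is a minimal sketch, not the validator's implementation: `Check`, `run_checks`, `check_bin_divisibility`, and `check_consolidated_metadata` are hypothetical names defined here for illustration only. It shows that every check is evaluated even after one fails, and that warnings alone do not make a store invalid.

```python
from dataclasses import dataclass
from typing import Callable, List


@dataclass
class Check:
    """Hypothetical stand-in for one check outcome (not the library's class)."""
    name: str
    status: str   # "pass", "warning", or "error"
    message: str


def check_bin_divisibility(chunk_shape, bin_shape) -> Check:
    """Error unless every bin_shape entry evenly divides its chunk_shape entry."""
    ok = len(chunk_shape) == len(bin_shape) and all(
        b > 0 and c % b == 0 for c, b in zip(chunk_shape, bin_shape)
    )
    status = "pass" if ok else "error"
    return Check("bin_divisibility", status,
                 f"chunk_shape={chunk_shape}, bin_shape={bin_shape}")


def check_consolidated_metadata(has_consolidated: bool) -> Check:
    """Warning (non-fatal) when consolidated metadata is absent."""
    status = "pass" if has_consolidated else "warning"
    return Check("consolidated_metadata", status,
                 f"consolidated={has_consolidated}")


def run_checks(checks: List[Callable[[], Check]]) -> List[Check]:
    """Evaluate every check; a failure never stops the remaining checks."""
    return [check() for check in checks]


results = run_checks([
    lambda: check_bin_divisibility((64, 64, 64), (16, 16, 24)),  # 64 % 24 != 0
    lambda: check_consolidated_metadata(False),
    lambda: check_bin_divisibility((64, 64, 64), (16, 16, 16)),
])
errors = [c for c in results if c.status == "error"]
warnings = [c for c in results if c.status == "warning"]
is_valid = not errors  # warnings alone do not invalidate the store
```

Here the first check fails, yet the remaining two still run, which mirrors the accumulate-and-continue behaviour described in the Check entry.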
Introduction¶
zarr-vectors-py ships a multi-level validator that checks ZVF stores for
correctness and conformance. Validation is organised into five progressively
deeper levels. Shallow levels (1–2) are fast and check structural and
metadata properties. Deeper levels (3–5) are more expensive because they
require reading array data, but they catch logical inconsistencies that
metadata checks alone cannot detect.
The validator is designed to be useful in several contexts:
- After writing: confirm that a newly written store is valid before sharing it.
- After ingest: confirm that a third-party format was correctly translated.
- After rechunking: confirm that the rechunked store is consistent.
- CI pipelines: run level 1–2 checks quickly; reserve level 5 for nightly runs.
- Debugging: identify the specific check that fails to pinpoint bugs in writer or converter code.
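For the CI use case, a validation result can be turned into a process exit code. The helper below is a sketch under stated assumptions: `exit_code_for` is a hypothetical name, and the result is a stub built with `SimpleNamespace` that exposes only the documented `is_valid` and `errors` attributes. In a real pipeline the result would presumably come from `validate(path, level=2)`.

```python
import sys
from types import SimpleNamespace


def exit_code_for(result) -> int:
    """Return 0 for a valid result, 1 otherwise, printing each error to stderr."""
    if result.is_valid:
        return 0
    for err in result.errors:
        print(f"ERROR [{err.level}] {err.name}: {err.message}", file=sys.stderr)
    return 1


# Stub results standing in for what validate() would return.
passing = SimpleNamespace(is_valid=True, errors=[])
failing = SimpleNamespace(
    is_valid=False,
    errors=[SimpleNamespace(level=2, name="bin_divisibility",
                            message="bin_shape does not divide chunk_shape")],
)

passing_code = exit_code_for(passing)
failing_code = exit_code_for(failing)
```

A CI job could then end with `sys.exit(exit_code_for(result))` so that any fatal error fails the build while warnings do not.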
Technical reference¶
Conformance levels¶
| Level | Name | What it checks | Typical runtime |
|---|---|---|---|
| 1 | Structural | Required files/groups/arrays exist; correct Zarr node types | < 1 s |
| 2 | Metadata | | 1–5 s |
| 3 | Consistency | Fragment offset arithmetic; manifest integrity; cross-chunk link validity; attribute–vertex alignment | 10 s – 10 min (reads all chunks) |
| 4 | Geometry | Type-specific constraints: tree topology for | varies |
| 5 | Pyramid | Multi-resolution correctness: monotonically non-increasing vertex and object counts; | adds per-level cost |
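The level 5 monotonicity requirement can be illustrated with a short helper. This is a minimal sketch, not the validator's implementation: `is_non_increasing` is a hypothetical name, and the counts are assumed to be listed from the finest resolution level to the coarsest.

```python
from typing import Sequence


def is_non_increasing(counts: Sequence[int]) -> bool:
    """True if no count exceeds the count at the preceding (finer) level."""
    return all(later <= earlier for earlier, later in zip(counts, counts[1:]))


# Illustrative vertex counts per resolution level (made-up numbers).
good_pyramid = [1000, 400, 400, 90]   # equal neighbours are allowed
bad_pyramid = [1000, 1200, 300]       # level 1 has more vertices than level 0

good_ok = is_non_increasing(good_pyramid)
bad_ok = is_non_increasing(bad_pyramid)
```

Equal counts at adjacent levels pass because the requirement is non-increasing rather than strictly decreasing.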
Python API¶
from zarr_vectors.validate import validate
# Run full validation (level 5)
result = validate("scan.zarrvectors", level=5)
# Print summary
print(result.summary())
# Level 5 validation: PASS
# 54 passed, 2 warnings, 0 errors
# Check programmatically
if not result.is_valid:
    for err in result.errors:
        print(f"ERROR [{err.level}] {err.name}: {err.message}")
# Run only fast checks (CI use)
result = validate("scan.zarrvectors", level=2)
# Validate a specific resolution level group only (here: resolution level 0)
result = validate("scan.zarrvectors", level=3, resolution_levels=[0])
ValidationResult API¶
result.is_valid # bool: True if no errors at any level
result.passed # list[Check]: checks that passed
result.warnings # list[Check]: non-fatal warnings
result.errors # list[Check]: fatal errors
result.summary() # str: short human-readable summary
result.report() # str: full multi-line report with all checks
result.as_dict() # dict: machine-readable representation
# Iterate over all checks
for check in result.all_checks:
print(check.level, check.name, check.status, check.message)
Check object¶
check.level # int: conformance level this check belongs to
check.name # str: human-readable check name (e.g. "bin_divisibility")
check.status # str: "pass", "warning", or "error"
check.message # str: description of what was checked / what failed
check.path # str: store path relevant to this check (e.g. "0/.zattrs")
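The attributes listed above are enough to build simple reports. The sketch below groups non-passing checks by conformance level. `StubCheck` and `failures_by_level` are hypothetical names, the stub only mirrors the documented attribute names, and the example check names, levels, and paths (other than `bin_divisibility` and `0/.zattrs`) are invented for illustration.

```python
from collections import defaultdict
from dataclasses import dataclass


@dataclass
class StubCheck:
    """Stand-in carrying the documented Check attributes (not the library class)."""
    level: int
    name: str
    status: str    # "pass", "warning", or "error"
    message: str
    path: str


def failures_by_level(checks):
    """Map conformance level -> descriptions of checks that did not pass."""
    grouped = defaultdict(list)
    for check in checks:
        if check.status != "pass":
            grouped[check.level].append(
                f"{check.status}: {check.name} ({check.path})"
            )
    return dict(grouped)


checks = [
    StubCheck(1, "root_group_exists", "pass", "root group found", ".zgroup"),
    StubCheck(2, "bin_divisibility", "error",
              "bin_shape does not divide chunk_shape", "0/.zattrs"),
    StubCheck(2, "consolidated_metadata", "warning",
              "no consolidated metadata", ".zmetadata"),
]
report = failures_by_level(checks)
```

With a real result, the same function could presumably be applied to `result.all_checks`, since it relies only on the `level`, `name`, `status`, and `path` attributes.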
CLI usage¶
The zarr-vectors validate CLI lives in the companion package
zarr-vectors-tools.
Validation and writers¶
The write functions in zarr-vectors-py perform inline validation of
arguments before writing. However, this does not substitute for post-write
validation: inline checks guard against obviously invalid parameters but do
not verify the correctness of the written data (e.g. fragment offset arithmetic,
cross-chunk link completeness). Always run at least level 3 after writing
a new store.
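The write-then-validate habit can be wrapped in a small helper. This is a sketch, not part of zarr-vectors-py: `write_and_validate` is a hypothetical name, and both the writer and the validator are passed in as callables so the example stays self-contained. In practice the validator argument would presumably be `zarr_vectors.validate.validate` and the writer a closure around one of the package's write functions.

```python
from types import SimpleNamespace


def write_and_validate(write, validate, path, min_level=3):
    """Call write(path), then validate(path, level=min_level); raise if invalid."""
    write(path)
    result = validate(path, level=min_level)
    if not result.is_valid:
        raise ValueError(
            f"{path}: {len(result.errors)} validation error(s) at level {min_level}"
        )
    return result


# Stubs standing in for a real writer and validator.
written = []
ok_result = SimpleNamespace(is_valid=True, errors=[])


def fake_write(path):
    written.append(path)


def fake_validate(path, level):
    return ok_result


result = write_and_validate(fake_write, fake_validate, "scan.zarrvectors")
```

Raising on an invalid result keeps a store with fatal errors from being passed along silently; callers that prefer to inspect the result can catch the exception or call the validator directly.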
Validation and performance¶
Levels 1 and 2 touch only metadata files. Level 3 and above require reading all array chunks. For a large store (> 100 GB), level 3 validation may take several minutes. To validate a sample of chunks rather than all:
result = validate("scan.zarrvectors", level=3, sample_fraction=0.1, seed=42)
Sampled validation checks only a randomly selected subset of chunks, so results can vary between runs unless a seed is fixed, and errors confined to chunks outside the sample will be missed. It is suitable for quick sanity checks, not for publication-quality conformance certification.
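To make the trade-off concrete, the sketch below shows one way a fraction of chunks could be drawn with a seed. It is an assumption about how such sampling might work, not a description of the library's selection strategy: `sample_chunks` and the chunk key format are hypothetical.

```python
import random


def sample_chunks(chunk_keys, sample_fraction, seed=None):
    """Pick a random subset of chunk keys; a fixed seed makes the pick repeatable."""
    if not 0.0 < sample_fraction <= 1.0:
        raise ValueError("sample_fraction must be in (0, 1]")
    keys = sorted(chunk_keys)  # stable order so the seed fully determines the pick
    if not keys:
        return []
    k = max(1, round(len(keys) * sample_fraction))
    return random.Random(seed).sample(keys, k)


all_chunks = [f"chunk-{i:03d}" for i in range(100)]  # made-up chunk keys
first = sample_chunks(all_chunks, sample_fraction=0.1, seed=42)
second = sample_chunks(all_chunks, sample_fraction=0.1, seed=42)
```

With 100 chunks and a fraction of 0.1, only 10 chunks are inspected, so a defect in any of the other 90 goes unnoticed. Reusing the same seed reproduces the same subset, which helps when re-running a failed check.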