
Delivery format

HDF5 format for robot training data

HDF5 robot data is useful for large structured arrays, trajectories, pose, sensor streams, and compact robot episodes. Define schema version, groups for observations/actions, timestamps, task labels, and metadata attributes before reviewing samples so you can verify that delivery matches the training pipeline.

Updated 2026-05-04
By truelabel
Reviewed by truelabel

Quick facts

Origin: Hierarchical Data Format v5, from The HDF Group (BSD license, hdfgroup.org). Industry standard since 1998.
Robotics adoption: ALOHA, RoboMIND (12.3 TB / 107k trajectories), and many manipulation benchmarks ship HDF5.
Strengths: Compact binary, partial reads, schema-rich groups, native chunking and compression.
Required schema fields: Schema version, groups for observations/actions, timestamps, task labels, metadata attributes.
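
The chunking, compression, and partial-read strengths listed above can be sketched with h5py. This is a minimal illustration, not a prescribed layout: the dataset path `observations/images/cam_high`, the frame size, and the function name `write_and_slice` are assumptions chosen for the example.

```python
import os
import tempfile

import h5py
import numpy as np


def write_and_slice(path, n_frames=200):
    """Write a chunked, gzip-compressed image stream, then read back one window."""
    frames = np.zeros((n_frames, 48, 64, 3), dtype=np.uint8)
    frames[100:110] = 255  # mark ten frames so the slice is recognizable

    with h5py.File(path, "w") as f:
        # Intermediate groups in the path are created automatically.
        f.create_dataset(
            "observations/images/cam_high",  # hypothetical dataset path
            data=frames,
            chunks=(1, 48, 64, 3),           # one frame per chunk
            compression="gzip",
            compression_opts=4,
        )

    with h5py.File(path, "r") as f:
        dset = f["observations/images/cam_high"]
        # Slicing the dataset reads only the chunks that cover frames 100..109.
        window = dset[100:110]
        return dset.shape, dset.chunks, dset.compression, window


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as tmp:
        shape, chunks, compression, window = write_and_slice(
            os.path.join(tmp, "stream.hdf5")
        )
        print(shape, chunks, compression, window.shape)
```

One frame per chunk favors random access to individual frames; larger chunks usually compress better but make small reads more expensive, so the chunk shape is worth agreeing with the supplier alongside the schema.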

Comparison

Format choice | Strength | Risk
HDF5 robot data | Large structured arrays, trajectories, pose, sensor streams, and compact robot episodes | Needs exact schema agreement before capture
Raw files | Fast supplier export | High buyer cleanup burden
Custom schema | Matches internal pipeline | Harder supplier onboarding

What is HDF5 robot data?

HDF5 robot data should be requested when the buyer's training or evaluation pipeline already expects large structured arrays, trajectories, pose, sensor streams, and compact robot episodes. Anchor the bounty to the canonical specification before suppliers submit samples [1], then use implementation documentation to make the expected file layout reviewable [2]. Robotics teams should also name the dataset or paper lineage they expect suppliers to support [3].

"HDF5 is a data model, library, and file format for storing and managing data." [1]

For truelabel buyers, that quote matters because it turns HDF5 robot data from a generic delivery preference into a source-backed requirement the supplier can test against a sample file.
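
To make "a sample file the supplier can test against" concrete, here is a minimal sketch of one episode written with h5py. The layout is an assumption for illustration only: the attribute names (`schema_version`, `task_label`, `robot`), the dataset names (`qpos`, `joint_targets`), and the version string are hypothetical and should be replaced by whatever the buyer's pipeline actually expects.

```python
import h5py
import numpy as np

SCHEMA_VERSION = "1.0.0"  # hypothetical version string agreed with the supplier


def write_episode(path, qpos, actions, timestamps, task_label):
    """Write one episode using an illustrative (not canonical) layout."""
    qpos = np.asarray(qpos, dtype=np.float32)
    actions = np.asarray(actions, dtype=np.float32)
    timestamps = np.asarray(timestamps, dtype=np.float64)
    if not (len(qpos) == len(actions) == len(timestamps)):
        raise ValueError("qpos, actions, and timestamps must have the same length")

    with h5py.File(path, "w") as f:
        # Metadata attributes on the root group.
        f.attrs["schema_version"] = SCHEMA_VERSION
        f.attrs["task_label"] = task_label
        f.attrs["robot"] = "example_arm"  # hypothetical metadata field

        # Separate groups for observations and actions.
        obs = f.create_group("observations")
        obs.create_dataset("qpos", data=qpos)
        act = f.create_group("actions")
        act.create_dataset("joint_targets", data=actions)

        # One shared time base, with its unit recorded next to the data.
        ts = f.create_dataset("timestamps", data=timestamps)
        ts.attrs["units"] = "seconds"
```

A call such as `write_episode("episode_000.hdf5", qpos, actions, timestamps, "pick_cube")` would then produce a file whose structure a reviewer can inspect with any HDF5 viewer.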

Using HDF5 for robot data delivery

A useful HDF5 robot data sample should prove schema version, groups for observations/actions, timestamps, task labels, and metadata attributes, plus file naming, manifest completeness, timestamp behavior, and rejected-example traceability. Include at least one workflow or converter reference so the supplier can show how the files load in practice [4], one interoperability reference for adjacent formats [5], and one comparison source for why this format is preferable to a raw folder dump [6].
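
The sample checks described above can be partly automated. The following is a rough sketch of a validator, assuming a hypothetical layout with root attributes `schema_version` and `task_label`, groups `observations` and `actions`, and a one-dimensional `timestamps` dataset; none of these names come from a standard, so adjust them to the agreed schema.

```python
import h5py
import numpy as np

# Hypothetical required names; replace with the agreed schema.
REQUIRED_ATTRS = ("schema_version", "task_label")
REQUIRED_GROUPS = ("observations", "actions")


def validate_episode(path):
    """Return a list of human-readable problems; an empty list means the checks passed."""
    problems = []
    with h5py.File(path, "r") as f:
        for name in REQUIRED_ATTRS:
            if name not in f.attrs:
                problems.append(f"missing root attribute: {name}")

        groups = {}
        for name in REQUIRED_GROUPS:
            node = f.get(name)
            if isinstance(node, h5py.Group):
                groups[name] = node
            else:
                problems.append(f"missing group: {name}")

        ts = f.get("timestamps")
        if not isinstance(ts, h5py.Dataset) or ts.ndim != 1:
            problems.append("missing 1-D dataset: timestamps")
            return problems

        t = ts[()]
        # Timestamp behavior: every step must move forward in time.
        if np.any(np.diff(t) <= 0):
            problems.append("timestamps are not strictly increasing")

        # Every observation/action dataset should have one row per timestamp.
        for gname, group in groups.items():
            for key, item in group.items():
                if isinstance(item, h5py.Dataset) and item.shape[:1] != (len(t),):
                    problems.append(f"{gname}/{key} length does not match timestamps")
    return problems
```

Returning a list of problems rather than raising on the first failure lets the buyer send the supplier one complete rejection note per sample file.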


External references and source context

  1. HDF5 1.14 documentation. The HDF Group. HDF5 is a file format and library for storing structured data.
  2. Introduction to HDF5. The HDF Group. HDF5 organizes data using groups and datasets.
  3. h5py groups. h5py. h5py documents HDF5 groups for Python dataset hierarchies.
  4. LeRobot dataset documentation. Hugging Face. LeRobot dataset docs are relevant to HDF5 episode packaging.
  5. Apache Parquet file format. Apache Parquet. Parquet is a columnar alternative to HDF5-style structured delivery.
  6. Dremel: Interactive Analysis of Web-Scale Datasets. VLDB Endowment. Dremel is a nested columnar-data reference useful in HDF5 versus Parquet comparisons.

FAQ

What is HDF5 robot data used for?

HDF5 robot data is used for large structured arrays, trajectories, pose, sensor streams, and compact robot episodes.

What fields should HDF5 robot data delivery require?

At minimum, require schema version, groups for observations/actions, timestamps, task labels, and metadata attributes, plus a delivery manifest and validation notes.
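
A delivery manifest can be as simple as a list of files with sizes and checksums. The sketch below uses only the Python standard library; the manifest structure (`files`, `name`, `bytes`, `sha256`) and the `*.hdf5` file pattern are assumptions for illustration, not a truelabel or HDF5 convention.

```python
import hashlib
from pathlib import Path


def sha256_of(path, block_size=1 << 20):
    """Hash a file in blocks so large episode files do not need to fit in memory."""
    digest = hashlib.sha256()
    with open(path, "rb") as fh:
        for block in iter(lambda: fh.read(block_size), b""):
            digest.update(block)
    return digest.hexdigest()


def build_manifest(root, pattern="*.hdf5"):
    """List every matching file in `root` with its size and checksum."""
    files = []
    for path in sorted(Path(root).glob(pattern)):
        files.append(
            {"name": path.name, "bytes": path.stat().st_size, "sha256": sha256_of(path)}
        )
    return {"files": files}


def check_manifest(root, manifest, pattern="*.hdf5"):
    """Compare a delivery folder against its manifest and return any problems found."""
    root = Path(root)
    problems = []
    listed = set()
    for entry in manifest["files"]:
        listed.add(entry["name"])
        path = root / entry["name"]
        if not path.is_file():
            problems.append(f"missing file: {entry['name']}")
        elif sha256_of(path) != entry["sha256"]:
            problems.append(f"checksum mismatch: {entry['name']}")
    for path in sorted(root.glob(pattern)):
        if path.name not in listed:
            problems.append(f"unlisted file: {path.name}")
    return problems
```

The manifest dictionary can be serialized with `json.dump` and shipped next to the episode files, so the buyer can rerun `check_manifest` after transfer.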

Can suppliers convert into this format?

Some suppliers can deliver directly in the requested format; others may need conversion. Buyers should require a small sample before full delivery.

Should the format be decided before capture?

Yes. Deciding the format before capture prevents missing fields, timestamp drift, and expensive post-delivery cleanup.
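
One way to put a number on timestamp drift is to measure how far each timestamp in one stream is from its nearest neighbor in another. This is a minimal sketch under stated assumptions: both streams are sorted lists in the same unit, and the function name `max_sync_error` is hypothetical. An acceptable threshold depends on the task and is not implied here.

```python
from bisect import bisect_left


def max_sync_error(reference, other):
    """Largest gap between a reference timestamp and its nearest neighbor in `other`.

    Both inputs must be sorted in ascending order and use the same unit.
    """
    if not reference or not other:
        raise ValueError("both streams need at least one timestamp")
    worst = 0
    for t in reference:
        i = bisect_left(other, t)
        # The nearest neighbor is either just before or just at/after position i.
        candidates = other[max(i - 1, 0): i + 1]
        worst = max(worst, min(abs(t - c) for c in candidates))
    return worst


if __name__ == "__main__":
    action_ms = [0, 100, 200]   # illustrative action timestamps, in milliseconds
    camera_ms = [0, 100, 250]   # illustrative camera timestamps, in milliseconds
    print(max_sync_error(action_ms, camera_ms))
```

Running the same check on the first and last minute of a long capture gives a quick indication of whether the offset between streams grows over time.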

Working with HDF5 robot data

Truelabel normalizes HDF5 robot data across capture partners so you can ingest one consistent schema instead of writing per-vendor adapters.
