
Format reference

Robot data format guides

Eight formats cover almost every robotics dataset shipped today. The matrix below compares them on streaming, schema preservation, language SDK coverage, compression, and license. Each row links to a deeper format-specific page with verified scale facts and when-to-use guidance.

Feature matrix

Robot data formats: feature matrix (truelabel review, 2026-05)

| Format | Primary use | Streaming | Schema preservation | Reader SDKs | Compression | License | Since |
|---|---|---|---|---|---|---|---|
| LeRobot | Robot-learning datasets + policy training | Partial (HF Datasets streaming) | Strong (LeRobotDataset v2.x pinned) | Python (lerobot, datasets) | MP4 (AV1) + Parquet | Apache-2.0 | 2024 |
| MCAP | Time-synchronized robotics logs | Yes (chunk-indexed) | Self-describing (protobuf, ROS, JSON, FlatBuffers) | Rust, Python, C++, Go, TypeScript | lz4, zstd | MIT | 2022 |
| RLDS | RL / robotics episodes (obs, action, reward, metadata) | Yes (TFDS-based) | Strong (episode + step shape pinned) | Python (TFDS, NumPy) | Inherits TFDS | Apache-2.0 | 2021 |
| Parquet | Tabular state/action streams + frame tables | Columnar partial reads | Strong (Arrow schema) | C/C++, Java, Python, R, Rust, Go, JS | snappy, gzip, zstd, brotli | Apache-2.0 | 2013 |
| ROS bag | ROS-native robot fleet logs (legacy) | ROS 1 yes; ROS 2 / SQLite yes | ROS message types only | C++, Python (via ROS) | lz4, bz2 | BSD-style (ROS) | 2007 |
| HDF5 | Trajectories, pose, sensor streams | Partial reads supported | Schema-rich (groups + attributes) | C/C++, Python, R, Java, MATLAB | Native (gzip, szip, zstd) | BSD-style | 1998 |
| Pickle | Python-first benchmark releases | No | Schema-free (any Python object) | Python only | External (gzip wrapper) | PSF (Python) | 1994 |
| Point cloud | 3D scene geometry, LiDAR, depth | Format-dependent | PCD / PLY / LAS / USDZ headers | C++, Python, USD/Pixar tools | LAZ, draco, ZSTD | BSD / open | 1994 |

Format guides

HDF5 format for robot training data

Delivery format

HDF5 is useful for large structured arrays, trajectories, pose, sensor streams, and compact robot episodes. Define the schema version, groups for observations and actions, timestamps, task labels, and metadata attributes before reviewing samples so you can verify that the delivery matches the training pipeline.

  • HDF5 robot data robotics dataset
  • HDF5 robot data training data
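
The layout described above can be sketched with h5py. This is a minimal sketch, not a prescribed schema: the group and dataset names (observations/images, actions/joint_targets, timestamps), the task attribute, and the schema_version value are all assumptions chosen for illustration. The file is held in memory so nothing is written to disk.

```python
import h5py
import numpy as np

SCHEMA_VERSION = "1.0"  # hypothetical version string
REQUIRED = ("observations/images", "actions/joint_targets", "timestamps")


def write_episode(f, name, images, actions, timestamps, task):
    """Write one episode as a group with observation and action subgroups."""
    ep = f.create_group(name)
    ep.attrs["task"] = task  # task label stored as a metadata attribute
    ep.create_group("observations").create_dataset(
        "images", data=images, compression="gzip"
    )
    ep.create_group("actions").create_dataset("joint_targets", data=actions)
    ep.create_dataset("timestamps", data=timestamps)
    return ep


def check_hdf5_episode(ep):
    """Return a list of problems; an empty list means the layout matches."""
    problems = []
    if "task" not in ep.attrs:
        problems.append("missing attribute: task")
    missing = [path for path in REQUIRED if path not in ep]
    if missing:
        return problems + [f"missing dataset: {path}" for path in missing]
    n = ep["timestamps"].shape[0]
    for path in REQUIRED[:2]:
        rows = ep[path].shape[0]
        if rows != n:
            problems.append(f"{path} has {rows} rows, expected {n}")
    if np.any(np.diff(ep["timestamps"][...]) <= 0):
        problems.append("timestamps are not strictly increasing")
    return problems


# In-memory file: driver="core" with backing_store=False never touches disk.
with h5py.File("demo.h5", "w", driver="core", backing_store=False) as f:
    f.attrs["schema_version"] = SCHEMA_VERSION
    images = np.zeros((4, 8, 8, 3), dtype=np.uint8)
    stamps = np.array([0.0, 0.1, 0.2, 0.3])
    good = write_episode(
        f, "episode_0000", images, np.zeros((4, 7), dtype=np.float32),
        stamps, "pick cube",
    )
    # Deliberately short action array to show a failing check.
    bad = write_episode(
        f, "episode_0001", images, np.zeros((3, 7), dtype=np.float32),
        stamps, "pick cube",
    )
    good_problems = check_hdf5_episode(good)
    bad_problems = check_hdf5_episode(bad)
```

Running the check on a sample episode before accepting a full delivery catches length mismatches between streams early.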

LeRobot format for robot training data

Delivery format

The LeRobot format is useful for developer-friendly robot learning datasets and policy training pipelines. Define episode metadata, observation tensors, action tensors, timestamps, and a repo-compatible manifest before reviewing samples so you can verify that the delivery matches the training pipeline.

  • LeRobot format robotics dataset
  • LeRobot format training data

MCAP format for robot training data

Delivery format

MCAP is useful for time-synchronized robotics logs, compressed video topics, IMU, state, and action messages. Define topic schema, timestamp domain, compression settings, camera topic, state topic, and manifest before reviewing samples so you can verify that delivery matches the training pipeline.

  • MCAP robotics dataset
  • MCAP training data
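
One way to pin down the items listed above is a small delivery manifest that is checked before any MCAP file is opened. The sketch below uses only the standard library and does not read MCAP itself. The manifest keys, the topic names, the schema names, and the allowed timestamp domains are all hypothetical; the compression values follow the lz4 and zstd options in the feature matrix.

```python
REQUIRED_KEYS = (
    "topics", "timestamp_domain", "compression", "camera_topic", "state_topic",
)
ALLOWED_COMPRESSION = ("none", "lz4", "zstd")
ALLOWED_TIME_DOMAINS = ("unix_ns", "monotonic_ns")  # hypothetical labels


def check_mcap_manifest(manifest):
    """Return a list of problems; an empty list means the manifest is usable."""
    problems = [f"missing key: {k}" for k in REQUIRED_KEYS if k not in manifest]
    topics = manifest.get("topics", {})
    # Every declared topic needs a schema name so readers can decode it.
    for name, spec in topics.items():
        if not spec.get("schema"):
            problems.append(f"topic {name} has no schema")
    # The camera and state topics must be among the declared topics.
    for role in ("camera_topic", "state_topic"):
        name = manifest.get(role)
        if name is not None and name not in topics:
            problems.append(f"{role} {name} is not a declared topic")
    compression = manifest.get("compression")
    if compression is not None and compression not in ALLOWED_COMPRESSION:
        problems.append(f"unsupported compression: {compression}")
    domain = manifest.get("timestamp_domain")
    if domain is not None and domain not in ALLOWED_TIME_DOMAINS:
        problems.append(f"unknown timestamp domain: {domain}")
    return problems


manifest = {
    "topics": {
        "/camera/front": {"schema": "example.CameraFrame"},
        "/robot/state": {"schema": "example.JointState"},
    },
    "timestamp_domain": "unix_ns",
    "compression": "zstd",
    "camera_topic": "/camera/front",
    "state_topic": "/robot/state",
}
manifest_problems = check_mcap_manifest(manifest)

broken = dict(manifest, state_topic="/robot/joints", compression="bz2")
broken_problems = check_mcap_manifest(broken)
```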

Parquet format for robot training data

Delivery format

Parquet is useful for large robotics dataset hubs, frame tables, episode metadata, sharded time-series records, and LeRobot-compatible distribution. Define the episode index, frame offsets, task descriptions, feature schema, video references, statistics, and split metadata before reviewing samples so you can verify that the delivery matches the training pipeline.

  • Parquet robot data robotics dataset
  • Parquet robot data training data
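
The relationship between the episode index and the frame offsets can be shown without any Parquet library: offsets are a running sum of episode lengths, and each slice of the frame table must belong to exactly one episode. The sketch below works on plain Python rows, and the column names (episode_index, frame_index, from_frame, to_frame) are assumptions for illustration, not a fixed schema.

```python
from itertools import accumulate


def build_episode_index(episode_lengths):
    """Turn per-episode frame counts into half-open [from_frame, to_frame) ranges."""
    ends = list(accumulate(episode_lengths))
    starts = [0] + ends[:-1]
    return [
        {"episode_index": i, "from_frame": start, "to_frame": end, "length": n}
        for i, (start, end, n) in enumerate(zip(starts, ends, episode_lengths))
    ]


def check_frame_table(frame_rows, episode_index):
    """Check that frame rows line up with the offsets in the episode index."""
    problems = []
    for ep in episode_index:
        rows = frame_rows[ep["from_frame"]:ep["to_frame"]]
        if len(rows) != ep["length"]:
            problems.append(f"episode {ep['episode_index']} is truncated")
            continue
        for offset, row in enumerate(rows):
            if (row["episode_index"] != ep["episode_index"]
                    or row["frame_index"] != offset):
                problems.append(
                    f"episode {ep['episode_index']} is misaligned at offset {offset}"
                )
                break
    total = episode_index[-1]["to_frame"] if episode_index else 0
    if len(frame_rows) > total:
        problems.append(f"{len(frame_rows) - total} frame rows are not indexed")
    return problems


index = build_episode_index([3, 2])
frames = [
    {"episode_index": 0, "frame_index": 0},
    {"episode_index": 0, "frame_index": 1},
    {"episode_index": 0, "frame_index": 2},
    {"episode_index": 1, "frame_index": 0},
    {"episode_index": 1, "frame_index": 1},
]
frame_problems = check_frame_table(frames, index)
```

The same check applies unchanged to rows loaded from a Parquet frame table, once they are converted to dictionaries.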

Pickle format for robot training data

Delivery format

Pickle is useful for Python-first benchmark releases that package demonstrations, robot state dictionaries, observations, and task metadata. Define schema documentation, Python version notes, object keys, observation/action fields, a conversion script, and a checksum manifest before reviewing samples so you can verify that the delivery matches the training pipeline.

  • Pickle robot data robotics dataset
  • Pickle robot data training data
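
A conversion script for a pickle release could look like the sketch below, which uses only the standard library. It verifies the checksum before unpickling, because loading a pickle can execute arbitrary code, then checks the documented keys and re-encodes the record as JSON. The key names and the sample record are hypothetical, and JSON is just one possible target; it only works when the record holds plain lists, numbers, and strings.

```python
import hashlib
import json
import pickle


def sha256_hex(data):
    """Hex digest used for the checksum manifest."""
    return hashlib.sha256(data).hexdigest()


def convert_release(blob, expected_sha256, required_keys):
    """Verify, unpickle, and re-encode one release file as JSON text."""
    if sha256_hex(blob) != expected_sha256:
        raise ValueError("checksum mismatch: refusing to unpickle")
    record = pickle.loads(blob)  # only reached for files that match the manifest
    missing = [key for key in required_keys if key not in record]
    if missing:
        raise KeyError(f"missing documented keys: {missing}")
    return json.dumps(record, sort_keys=True)


# Hypothetical release: one demonstration with two steps.
record = {
    "observations": [[0.0, 0.1], [0.2, 0.3]],
    "actions": [[1.0], [0.5]],
    "task": "pick cube",
}
blob = pickle.dumps(record, protocol=4)
manifest_sha256 = sha256_hex(blob)

converted = convert_release(
    blob, manifest_sha256, ("observations", "actions", "task")
)
```

A checksum only proves the file matches the manifest; the manifest itself still has to come from a source the buyer trusts.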

Point cloud format for robot training data

Delivery format

Point cloud is useful for 3D scene geometry, object reconstruction, LiDAR/depth capture, navigation perception, and manipulation planning. Define coordinate frame, units, sensor intrinsics and extrinsics, timestamps, segmentation or object labels where available, and source RGB/depth references before reviewing samples so you can verify that delivery matches the training pipeline.

  • Point cloud robotics dataset
  • Point cloud training data
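
As a rough illustration of carrying the coordinate frame and units with the geometry, the sketch below writes and reads a minimal ASCII PLY file with the standard library. Storing frame_id and units in PLY comment lines is an assumed convention for this example, not part of the PLY format itself, and the reader handles only the x, y, z vertex layout that the writer produces.

```python
def write_ply(points, frame_id, units):
    """Serialize (x, y, z) tuples as ASCII PLY with metadata in comment lines."""
    lines = [
        "ply",
        "format ascii 1.0",
        f"comment frame_id {frame_id}",  # assumed convention
        f"comment units {units}",        # assumed convention
        f"element vertex {len(points)}",
        "property float x",
        "property float y",
        "property float z",
        "end_header",
    ]
    lines += [f"{x} {y} {z}" for x, y, z in points]
    return "\n".join(lines) + "\n"


def read_ply(text):
    """Parse the output of write_ply into (metadata, points)."""
    lines = text.splitlines()
    if not lines or lines[0] != "ply":
        raise ValueError("not a PLY file")
    header_end = lines.index("end_header")
    meta, count = {}, None
    for line in lines[1:header_end]:
        parts = line.split()
        if not parts:
            continue
        if parts[0] == "comment" and len(parts) >= 3:
            meta[parts[1]] = " ".join(parts[2:])
        elif parts[:2] == ["element", "vertex"]:
            count = int(parts[2])
    if count is None:
        raise ValueError("header has no vertex element")
    body = lines[header_end + 1:header_end + 1 + count]
    points = [tuple(float(v) for v in row.split()) for row in body]
    if len(points) != count:
        raise ValueError("vertex count does not match header")
    return meta, points


cloud = [(0.0, 0.0, 1.5), (0.25, -0.5, 2.0)]
ply_text = write_ply(cloud, frame_id="base_link", units="m")
meta, restored = read_ply(ply_text)
```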

RLDS format for robot training data

Delivery format

RLDS is useful for reinforcement learning and robotics episodes with observations, actions, rewards, and metadata. Define episode ID, observation stream, action stream, timestamps, task label, and success flag before reviewing samples so you can verify that delivery matches the training pipeline.

  • RLDS robotics dataset
  • RLDS training data
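
The episode shape can be checked on plain Python dictionaries before any TFDS tooling is involved. The step fields below (observation, action, reward, discount, is_first, is_last, is_terminal) follow the RLDS step convention; the episode-level keys (episode_id, task, success) are assumptions drawn from the checklist above, not fixed RLDS names.

```python
STEP_KEYS = (
    "observation", "action", "reward", "discount",
    "is_first", "is_last", "is_terminal",
)
EPISODE_KEYS = ("episode_id", "task", "success")  # hypothetical metadata keys


def check_rlds_episode(episode):
    """Return a list of problems; an empty list means the episode is well formed."""
    problems = [f"missing metadata: {k}" for k in EPISODE_KEYS if k not in episode]
    steps = episode.get("steps", [])
    if not steps:
        return problems + ["episode has no steps"]
    for i, step in enumerate(steps):
        missing = [k for k in STEP_KEYS if k not in step]
        if missing:
            problems.append(f"step {i} is missing {missing}")
    if len(problems) > 0:
        return problems
    # Flag consistency: exactly the first step is first, exactly the last is last.
    if not steps[0]["is_first"] or any(s["is_first"] for s in steps[1:]):
        problems.append("is_first must be set on the first step only")
    if not steps[-1]["is_last"] or any(s["is_last"] for s in steps[:-1]):
        problems.append("is_last must be set on the last step only")
    if any(s["is_terminal"] and not s["is_last"] for s in steps):
        problems.append("is_terminal is set on a step that is not last")
    return problems


def make_step(i, n):
    """Build a toy step; the observation and action values are placeholders."""
    return {
        "observation": {"joint_pos": [0.0, 0.1 * i]},
        "action": [0.5],
        "reward": 1.0 if i == n - 1 else 0.0,
        "discount": 1.0,
        "is_first": i == 0,
        "is_last": i == n - 1,
        "is_terminal": i == n - 1,
    }


episode = {
    "episode_id": "ep-0001",
    "task": "pick cube",
    "success": True,
    "steps": [make_step(i, 3) for i in range(3)],
}
episode_problems = check_rlds_episode(episode)
```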

ROS bag format for robot training data

Delivery format

ROS bag is useful for robot-native data collection where buyers need ROS topics preserved for replay or conversion. Define topic list, message types, timestamps, sensor calibration, and conversion notes before reviewing samples so you can verify that delivery matches the training pipeline.

  • ROS bag robotics dataset
  • ROS bag training data

Picking a format — decision rule

  • Training a learning policy on the buyer’s pipeline → LeRobot or RLDS.
  • Multi-robot fleet logs with synchronized topics → MCAP (replaces ROS bag).
  • Compact structured arrays for trajectories, pose, sensor streams → HDF5.
  • Tabular state/action + frame tables, columnar reads → Parquet.
  • 3D scene geometry, LiDAR, depth → Point cloud.
  • Inheriting a Python research release → convert pickle to one of the above before production ingest.
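
The decision rule above can be written down as a small lookup, which is handy when a data request form needs to suggest a format. The need labels and function names here are hypothetical; the mapping itself only restates the bullets.

```python
# Keys are hypothetical labels for the needs listed in the decision rule.
DECISION_RULE = {
    "policy_training": ("LeRobot", "RLDS"),
    "fleet_logs": ("MCAP",),
    "structured_arrays": ("HDF5",),
    "tabular_frames": ("Parquet",),
    "scene_geometry": ("Point cloud",),
}


def pick_format(need):
    """Return the candidate formats for a need, per the decision rule."""
    try:
        return DECISION_RULE[need]
    except KeyError:
        raise ValueError(f"unknown need: {need!r}") from None


def needs_conversion(source_format):
    """Pickle releases should be converted before production ingest."""
    return source_format.lower() == "pickle"


candidates = pick_format("policy_training")
convert_first = needs_conversion("Pickle")
```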