
Physical AI Data Engineering

How to Work with RLDS and LeRobot Formats

RLDS and LeRobot are the two dominant serialization standards for robot learning datasets. RLDS wraps TensorFlow Datasets with trajectory semantics for RL agents; LeRobot uses HDF5 + Parquet for Hugging Face ecosystem integration. Converting between them requires mapping observation/action schemas, resampling timestamps to match episode boundaries, and validating tensor shapes against target model APIs. Teams typically maintain dual-format exports: RLDS for Google Research pipelines (RT-1, RT-2), LeRobot for OpenVLA and diffusion policies.

Updated 2026-01-15
By truelabel
Reviewed by truelabel · RLDS and LeRobot formats

Quick facts

Difficulty
Intermediate
Audience
Physical AI data engineers
Last reviewed
2026-01-15

Why Format Choice Matters for Robot Learning Pipelines

Robot learning datasets encode multi-modal time-series observations (RGB-D video, proprioception, force-torque) paired with action sequences. Unlike static image datasets, trajectory data carries temporal dependencies, variable episode lengths, and hardware-specific schemas that break naive serialization. RLDS (Reinforcement Learning Datasets) emerged from Google Research in 2021[1] to standardize RL trajectory storage atop TensorFlow Datasets, enabling zero-copy slicing and distributed loading. LeRobot, released by Hugging Face in 2024[2], targets PyTorch users with HDF5 observation storage and Parquet metadata tables, prioritizing ease of contribution over TensorFlow lock-in.

Format interoperability determines which models you can train. RT-1 and RT-2 expect RLDS; OpenVLA and diffusion policies consume LeRobot. The Open X-Embodiment dataset ships RLDS data spanning 22 robot embodiments and 1M+ trajectories[3], but retraining on proprietary data requires custom conversion. Teams waste 2-4 engineer-weeks building one-off scripts that silently drop frames or misalign timestamps. This guide provides reusable patterns for bidirectional RLDS ↔ LeRobot conversion, quality gates, and publishing workflows.

Both formats coexist because they optimize different constraints. RLDS leverages TFRecord sharding for petabyte-scale datasets and integrates with the TF-Agents Trajectory API. LeRobot minimizes contributor friction: its dataset API auto-generates video thumbnails, validates schemas, and pushes to Hugging Face Hub in one command. For procurement teams evaluating third-party robot datasets, dual-format availability signals production-readiness — vendors who ship only raw HDF5 or ROS bags force buyers into undocumented preprocessing.

Understanding RLDS: TensorFlow Datasets with Trajectory Semantics

RLDS extends TensorFlow Datasets with a steps/episodes hierarchy. Each episode is a variable-length sequence of steps; each step contains observations (dict of tensors), actions (tensor or dict), rewards (scalar), and metadata. The RLDS library provides `rlds.transformations` to filter, subsample, and batch trajectories without loading full episodes into memory. Under the hood, RLDS datasets are TFRecord shards with a `features` schema defined in a dataset builder class.

A minimal RLDS dataset builder subclasses `tfds.core.GeneratorBasedBuilder` and implements `_generate_examples()` to yield `(episode_id, episode_dict)` tuples. The episode dict must contain a `steps` key with a list of step dicts. Each step dict includes `observation`, `action`, `discount`, `reward`, and `is_terminal`. Observation and action schemas are arbitrary nested dicts of `tf.Tensor` specs — e.g., `observation={'image': tf.uint8[224,224,3], 'proprio': tf.float32[7]}`. RLDS validates shapes at write time and stores them in the dataset's `info.json`.
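A minimal builder sketch in this style is shown below. The class name, feature shapes, and the `_load_episodes()` helper are hypothetical stand-ins for your own extraction code; the `tfds.core.GeneratorBasedBuilder`, `tfds.features.FeaturesDict`, and `tfds.features.Dataset` calls are the standard RLDS building blocks.

```python
# Minimal RLDS builder sketch. MyRobotDataset and _load_episodes() are
# hypothetical; the tfds APIs are the standard RLDS building blocks.
import numpy as np
import tensorflow_datasets as tfds


class MyRobotDataset(tfds.core.GeneratorBasedBuilder):
    VERSION = tfds.core.Version("1.0.0")

    def _info(self) -> tfds.core.DatasetInfo:
        return tfds.core.DatasetInfo(
            builder=self,
            features=tfds.features.FeaturesDict({
                "steps": tfds.features.Dataset({
                    "observation": {
                        "image": tfds.features.Image(shape=(224, 224, 3)),
                        "proprio": tfds.features.Tensor(shape=(7,), dtype=np.float32),
                    },
                    "action": tfds.features.Tensor(shape=(8,), dtype=np.float32),
                    "reward": np.float32,
                    "discount": np.float32,
                    "is_first": np.bool_,
                    "is_last": np.bool_,
                    "is_terminal": np.bool_,
                }),
            }),
        )

    def _split_generators(self, dl_manager):
        return {"train": self._generate_examples()}

    def _generate_examples(self):
        # _load_episodes() is a placeholder for your own extraction code;
        # it should yield (episode_id, list-of-step-dicts) pairs.
        for episode_id, steps in _load_episodes():
            yield episode_id, {"steps": steps}
```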

Example datasets like TACO Play demonstrate RLDS conventions: camera images as uint8 HWC tensors, joint positions as float32 vectors, and gripper state as a binary scalar. The RLDS specification mandates `is_first` and `is_last` boolean flags per step to mark episode boundaries, enabling correct advantage estimation in RL algorithms. When converting from ROS bags or raw HDF5, you must reconstruct these boundaries by detecting topic gaps or explicit reset signals. Missing `is_terminal` flags cause policy rollouts to ignore episode resets, degrading sample efficiency by 15-40%[3].

Understanding LeRobot: HDF5 Observations with Parquet Metadata

LeRobot datasets store observations in HDF5 groups (one file per episode or a single file with episode groups) and metadata in a Parquet table. The Parquet table has columns `episode_index`, `frame_index`, `timestamp`, plus action columns (e.g., `action_joint_0` through `action_joint_6`). HDF5 groups mirror the observation schema: `observation.images.cam_high` contains a uint8 array of shape `[num_frames, H, W, 3]`. This split-storage design keeps metadata queryable (Parquet supports predicate pushdown) while allowing efficient video I/O (HDF5 chunking).
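A short sketch of what reading one frame looks like under this split-storage layout, assuming a single HDF5 file with one group per episode; the file names and column names are illustrative, not canonical LeRobot paths:

```python
# Sketch: fetch frame 42 of episode 0 under the split-storage layout.
# File names and the single-file episode grouping are assumptions.
import h5py
import pandas as pd

meta = pd.read_parquet("dataset/meta.parquet")  # episode_index, frame_index, timestamp, action_*
row = meta[(meta.episode_index == 0) & (meta.frame_index == 42)].iloc[0]

with h5py.File("dataset/episodes.hdf5", "r") as f:
    frame = f["episode_000000/observation.images.cam_high"][int(row.frame_index)]

action = row.filter(like="action_").to_numpy()  # flat action vector for this frame
```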

LeRobot's Python API wraps datasets in a `LeRobotDataset` class that lazy-loads frames on `__getitem__`. The class constructor accepts a `repo_id` (Hugging Face Hub path) or local directory. Datasets pushed to the Hub include auto-generated `meta_data/info.json` with camera calibration, robot URDF links, and episode statistics (mean episode length, action bounds). The LeRobot paper reports 50+ community-contributed datasets as of mid-2024[2], spanning mobile manipulators, quadrupeds, and humanoid hands.

LeRobot enforces a flat action space: all action dimensions are top-level Parquet columns, not nested dicts. This simplifies diffusion policy training (which concatenates actions into a single vector) but requires flattening hierarchical action specs. For example, a bimanual robot with per-arm joint commands must flatten `{left: [7], right: [7]}` into 14 columns `action_left_0` through `action_right_6`. The HDF5 format supports arbitrary nesting, so observation schemas remain flexible. When converting RLDS → LeRobot, you must decide whether to store each camera as a separate HDF5 dataset or concatenate multi-view images along a new axis.

Mapping Observation and Action Schemas Between Formats

Schema mapping is the highest-risk step in format conversion. RLDS and LeRobot use different conventions for image channel order (HWC vs CHW), action normalization (raw joint angles vs normalized [-1,1]), and timestamp representation (float64 seconds vs int64 nanoseconds). A robust conversion pipeline starts with a schema registry: a JSON file listing every observation key, its dtype, shape, and semantic meaning (e.g., `cam_wrist: uint8 [480,640,3] HWC RGB`). The registry serves as the source of truth for both RLDS `tfds.features.FeaturesDict` and LeRobot HDF5 group structure.
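A registry can be as simple as a checked-in dict that both exporters import. The keys and field names below are one possible convention, not a standard; the point is a single source of truth:

```python
# Hypothetical schema registry shared by both exporters.
import json

SCHEMA_REGISTRY = {
    "observations": {
        "cam_wrist": {"dtype": "uint8", "shape": [480, 640, 3],
                      "layout": "HWC", "colorspace": "RGB"},
        "proprio": {"dtype": "float32", "shape": [7],
                    "meaning": "absolute joint positions, radians"},
    },
    "actions": {
        "joints": {"dtype": "float32", "shape": [7],
                   "meaning": "joint position deltas, radians"},
        "gripper": {"dtype": "float32", "shape": [1],
                    "meaning": "binary open/close command"},
    },
    "timestamps": {"dtype": "float64", "unit": "seconds"},
}

with open("schema_registry.json", "w") as f:
    json.dump(SCHEMA_REGISTRY, f, indent=2)
```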

For actions, document whether your data uses absolute joint positions, velocity commands, or end-effector deltas. RT-1 expects 7-DOF position targets plus a binary gripper action[4]; OpenVLA uses the same convention but normalizes each dimension to [-1,1] using dataset statistics. If your source data is in velocity space, you must integrate to positions and clip to joint limits. The DROID dataset provides a reference implementation: actions are 7-DOF delta positions in the robot's base frame, with a separate `gripper_command` field.

Timestamp alignment is critical for multi-camera setups. RLDS steps are atomic: all observations in a step share the same timestamp. LeRobot allows per-modality timestamps (e.g., `timestamp_cam_high`, `timestamp_proprio`) but most training code assumes synchronous frames. When converting from ROS bags, use the earliest message timestamp in each step as the canonical time, then interpolate lagging modalities. The MCAP format preserves nanosecond-precision timestamps and is increasingly used for physical AI data collection; converting MCAP → RLDS/LeRobot requires resampling to a fixed control frequency (typically 10-30 Hz).
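A minimal resampling sketch under these assumptions: per-modality `(timestamp, value)` arrays already extracted from the bag, timestamps in seconds and sorted ascending, linear interpolation for proprioception, nearest-neighbor for images.

```python
# Resampling sketch: align asynchronous streams onto a fixed-fps grid.
import numpy as np

def resample_episode(t_start, t_end, fps, proprio_t, proprio, image_t, images):
    grid = np.arange(t_start, t_end, 1.0 / fps)
    # Proprioception: linear interpolation, one dimension at a time.
    proprio_grid = np.stack(
        [np.interp(grid, proprio_t, proprio[:, d]) for d in range(proprio.shape[1])],
        axis=1)
    # Images: nearest-neighbor. Compare the insertion point with its
    # predecessor and keep whichever frame is closer in time.
    idx = np.clip(np.searchsorted(image_t, grid), 0, len(image_t) - 1)
    prev = np.clip(idx - 1, 0, len(image_t) - 1)
    idx = np.where(np.abs(image_t[idx] - grid) <= np.abs(image_t[prev] - grid),
                   idx, prev)
    return grid, proprio_grid, images[idx]
```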

Building a Bidirectional RLDS ↔ LeRobot Converter

A production converter has four stages: schema validation, episode iteration, frame transformation, and output serialization. Start by loading the source dataset's schema (RLDS `info.json` or LeRobot `meta_data/info.json`) and checking for required keys. RLDS datasets must have `steps` with `observation`, `action`, `reward`, `discount`, `is_first`, `is_last`, `is_terminal`. LeRobot datasets must have Parquet columns for each action dimension and HDF5 groups for each observation modality.

For RLDS → LeRobot, iterate episodes using `tfds.load(..., split='train').as_numpy_iterator()`. For each episode, extract the steps list and build a Parquet DataFrame with columns `episode_index`, `frame_index`, `timestamp`, and flattened action columns. Write observations to HDF5: create a group per episode (e.g., `episode_000000`), then a dataset per modality (e.g., `observation/images/cam_high`). Use HDF5 chunking `(1, H, W, 3)` for frame-level access. The h5py library supports this via `create_dataset(..., chunks=(1, 480, 640, 3))`.
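Putting those pieces together, a hedged sketch of the export loop; the dataset name, output paths, assumed control frequency, and flat `action_{i}` column naming are all illustrative:

```python
# RLDS -> LeRobot export sketch. Paths, dataset name, and FPS are assumptions.
import h5py
import numpy as np
import pandas as pd
import tensorflow as tf
import tensorflow_datasets as tfds

FPS = 10  # assumed control frequency

ds = tfds.load("my_robot_dataset", split="train")  # hypothetical dataset name
rows = []
with h5py.File("lerobot_out/episodes.hdf5", "w") as f:
    for ep_idx, episode in enumerate(ds):
        # episode["steps"] is a nested tf.data.Dataset of step dicts.
        steps = [tf.nest.map_structure(lambda t: t.numpy(), s)
                 for s in episode["steps"]]
        images = np.stack([s["observation"]["image"] for s in steps])
        grp = f.create_group(f"episode_{ep_idx:06d}")
        grp.create_dataset("observation/images/cam_high", data=images,
                           chunks=(1, *images.shape[1:]))  # frame-level chunks
        for frame_idx, s in enumerate(steps):
            row = {"episode_index": ep_idx, "frame_index": frame_idx,
                   "timestamp": frame_idx / FPS}
            row.update({f"action_{d}": float(a)
                        for d, a in enumerate(s["action"])})
            rows.append(row)
pd.DataFrame(rows).to_parquet("lerobot_out/meta.parquet")
```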

For LeRobot → RLDS, load the Parquet table with `pandas.read_parquet()` and group by `episode_index`. For each episode, load HDF5 observations and reconstruct step dicts. Map Parquet action columns back to the RLDS action spec (nested dict or flat tensor). Set `is_first=True` for `frame_index==0`, `is_last=True` for the final frame, and `is_terminal=True` if the episode ended in a terminal state (check for a `success` column in Parquet or a `done` flag in metadata). Write episodes using an RLDS dataset builder's `_generate_examples()` method, then run `tfds build` to create TFRecord shards.
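The inverse direction can be sketched as a generator that yields `(episode_id, episode_dict)` pairs suitable for a builder's `_generate_examples()`; the file paths, group naming, and the optional `success` column are assumptions:

```python
# LeRobot -> RLDS reconstruction sketch.
import h5py
import pandas as pd

def generate_rlds_episodes(meta_path, hdf5_path):
    meta = pd.read_parquet(meta_path)
    action_cols = [c for c in meta.columns if c.startswith("action_")]
    with h5py.File(hdf5_path, "r") as f:
        for ep_idx, group in meta.groupby("episode_index"):
            group = group.sort_values("frame_index")
            images = f[f"episode_{ep_idx:06d}/observation/images/cam_high"][:]
            n, steps = len(group), []
            for i, (_, row) in enumerate(group.iterrows()):
                steps.append({
                    "observation": {"image": images[i]},
                    "action": row[action_cols].to_numpy("float32"),
                    "reward": 0.0,
                    "discount": 1.0,
                    "is_first": i == 0,
                    "is_last": i == n - 1,
                    # Terminal only if the episode actually succeeded.
                    "is_terminal": i == n - 1 and bool(row.get("success", False)),
                })
            yield f"episode_{ep_idx:06d}", {"steps": steps}
```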

Validation is non-negotiable. After conversion, load a sample episode from the output dataset and assert: (1) episode length matches source, (2) first and last frame observations are pixel-identical (use `np.array_equal`), (3) action sequences are numerically equal within floating-point tolerance, (4) timestamps are monotonically increasing. The LeRobot repository includes a `tests/test_datasets.py` suite that checks schema compliance; adapt these tests for your converter. Expect 5-10% of episodes to fail validation due to corrupted frames or NaN actions — log failures to a separate file and exclude them from the output dataset.
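A compact version of those four assertions, assuming step dicts as produced by the converter sketches above and a separate timestamp array (e.g., the Parquet `timestamp` column):

```python
# Post-conversion spot check for one episode: length, boundary frames,
# actions, and timestamp monotonicity.
import numpy as np

def validate_episode(src_steps, out_steps, timestamps, atol=1e-6):
    assert len(src_steps) == len(out_steps), "episode length mismatch"
    for a, b in ((src_steps[0], out_steps[0]), (src_steps[-1], out_steps[-1])):
        assert np.array_equal(a["observation"]["image"],
                              b["observation"]["image"]), "boundary frame differs"
    src = np.stack([s["action"] for s in src_steps])
    out = np.stack([s["action"] for s in out_steps])
    assert np.allclose(src, out, atol=atol), "action sequences differ"
    assert np.all(np.diff(np.asarray(timestamps)) > 0), "timestamps not increasing"
```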

Integrating with Training Frameworks: RT-X, OpenVLA, Diffusion Policy

Training code expects specific data loader APIs. RT-1 and RT-2 use `tf.data.Dataset` pipelines that call `rlds.transformations.batch()` to create fixed-length windows. The Open X-Embodiment codebase provides a reference data loader: it samples episodes uniformly, extracts a random 6-frame window per episode, and applies image augmentation (random crop, color jitter). To plug in a custom RLDS dataset, subclass `tfds.core.GeneratorBasedBuilder` and register it in `tensorflow_datasets/` — then reference it by name in the training config.

OpenVLA consumes LeRobot datasets via a PyTorch `Dataset` wrapper[5]. The wrapper loads Parquet metadata into a Pandas DataFrame, then uses `frame_index` to index into HDF5 observation arrays. OpenVLA's training script expects a `get_action()` method that returns a `[horizon, action_dim]` tensor — for single-step actions, repeat the action `horizon` times. The model's vision encoder expects 224×224 RGB images; if your dataset uses higher resolution, add a `torchvision.transforms.Resize` in the data loader.

Diffusion policies (e.g., LeRobot's Diffusion Policy implementation) require action chunks: sequences of `T` future actions starting from the current timestep. The data loader must pad the final `T-1` frames of each episode with the last action (or zeros). The LeRobot dataset class has a `delta_timestamps` parameter to specify observation history and action horizon — set it to `{"observation.image": [-1, 0], "action": [0, 1, 2, 3]}` for 2-frame history and 4-step action chunks. Mismatched horizons cause shape errors in the diffusion model's forward pass; validate by running one training step before launching a full run.
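The padding rule is easy to get subtly wrong; a small sketch of one correct implementation, independent of any particular loader:

```python
# Action-chunk sketch: a fixed-length window of future actions, padded past
# the episode end by repeating the final action.
import numpy as np

def action_chunk(actions: np.ndarray, t: int, horizon: int) -> np.ndarray:
    """actions: [episode_len, action_dim] -> chunk: [horizon, action_dim]."""
    chunk = actions[t:t + horizon]
    if len(chunk) < horizon:
        pad = np.repeat(chunk[-1:], horizon - len(chunk), axis=0)
        chunk = np.concatenate([chunk, pad], axis=0)
    return chunk
```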

Quality Checks: Structural, Statistical, and Physical Validation

Automated quality checks catch silent data corruption. Structural checks verify schema compliance: every episode has the required keys, tensor shapes match the declared spec, and dtypes are correct (uint8 for images, float32 for actions). Use JSON Schema or Pydantic models to validate metadata files. For RLDS, assert that `info.json` contains `features`, `splits`, and `supervised_keys`. For LeRobot, check that `meta_data/info.json` has `fps`, `robot_type`, and `camera_names`.

Statistical checks detect outliers and distribution shifts. Compute per-dimension action statistics (min, max, mean, std) and flag episodes where any action exceeds 3 standard deviations from the dataset mean. For images, check that pixel values span [0, 255] (not [0, 1] floats miscast as uint8) and that mean brightness is in [40, 200] (too dark or too bright suggests exposure issues). The DROID dataset paper reports that 8% of collected episodes had NaN actions due to encoder glitches[6]; a statistical check would have caught these before publication.
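A sketch of the per-dimension outlier flag, assuming one `[length, action_dim]` array per episode:

```python
# Statistical gate sketch: flag episodes with NaNs or any action more than
# n_sigma standard deviations from the dataset mean, per dimension.
import numpy as np

def flag_outlier_episodes(episode_actions, n_sigma=3.0):
    """episode_actions: list of [length, action_dim] arrays, one per episode."""
    all_actions = np.concatenate(episode_actions, axis=0)
    # nanmean/nanstd keep the gate usable even when some episodes contain NaNs.
    mean = np.nanmean(all_actions, axis=0)
    std = np.nanstd(all_actions, axis=0)
    flagged = []
    for i, acts in enumerate(episode_actions):
        if np.isnan(acts).any() or np.any(np.abs(acts - mean) > n_sigma * std):
            flagged.append(i)
    return flagged
```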

Physical validation checks robot-specific constraints. Load your robot's URDF and verify that all joint positions are within limits. For mobile manipulators, assert that base velocities are below the platform's max speed. For grippers, check that open/close commands are binary (0 or 1) and that the gripper state in observations matches the commanded action with at most 1-frame lag. The Open X-Embodiment dataset includes a `validate_episode()` function that checks action bounds, image resolution, and episode length[3] — adapt it for your robot's specs. Physical validation prevents training on kinematically infeasible actions, which causes policies to learn unreachable targets.
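Joint-limit checking needs only the URDF's `<limit>` tags, which the standard library can parse; a sketch, assuming revolute/prismatic joints with `lower`/`upper` attributes present:

```python
# URDF joint-limit check using only the standard library.
import xml.etree.ElementTree as ET

def load_joint_limits(urdf_path):
    limits = {}
    for joint in ET.parse(urdf_path).getroot().iter("joint"):
        limit = joint.find("limit")
        if limit is not None and joint.get("type") in ("revolute", "prismatic"):
            limits[joint.get("name")] = (float(limit.get("lower", "-inf")),
                                         float(limit.get("upper", "inf")))
    return limits

def joints_within_limits(joint_pos, joint_names, limits):
    return all(limits[name][0] <= q <= limits[name][1]
               for name, q in zip(joint_names, joint_pos))
```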

Publishing Datasets: Hugging Face Hub vs TensorFlow Datasets Catalog

Hugging Face Hub is the default distribution channel for LeRobot datasets. Push a dataset with `lerobot.common.datasets.push_to_hub(dataset, repo_id="your-org/dataset-name")`. The Hub auto-generates a dataset card, video previews, and a file browser. Set the license field in `meta_data/info.json` (e.g., `"license": "cc-by-4.0"`) and add a `README.md` with collection details, robot specs, and task descriptions. The LeRobot paper emphasizes dataset cards as a trust signal[2] — buyers skip datasets without clear provenance.

For RLDS datasets, the TensorFlow Datasets catalog requires a pull request to the `tensorflow/datasets` repository. Your PR must include a dataset builder class, a `dataset_info.json`, and a `checksums.tsv` with file hashes. Google's review process takes 2-4 weeks and enforces strict schema conventions (e.g., images must be `tfds.features.Image(shape=(H,W,3), dtype=tf.uint8)`). Most teams host RLDS datasets on Google Cloud Storage and document the `tfds.load()` command in a GitHub README instead of upstreaming to the catalog.

For commercial datasets on truelabel's marketplace, dual-format publishing maximizes buyer reach. Provide both RLDS and LeRobot exports, plus a format conversion script in the dataset repository. Include a `PROVENANCE.md` file documenting collection date, robot hardware, operator training, and any post-processing (filtering, relabeling, augmentation). The data provenance glossary defines the minimum metadata fields for physical AI procurement. Buyers increasingly require C2PA content credentials to verify that training data was not synthetically generated — embed C2PA manifests in video files during collection.

Handling Multi-Modal Observations: RGB-D, Point Clouds, Force-Torque

Robot datasets often include depth images, point clouds, and force-torque readings alongside RGB video. RLDS represents depth as `tfds.features.Tensor(shape=(H,W,1), dtype=tf.float32)` in millimeters. LeRobot stores depth in HDF5 as float32 arrays with the same layout as RGB (one dataset per camera). When converting, ensure depth and RGB are spatially aligned — if they come from separate cameras, you need extrinsic calibration to reproject depth into the RGB frame.

Point clouds are stored as `[N, 3]` float32 arrays (XYZ coordinates) or `[N, 6]` (XYZ + RGB). The PointNet paper established `N=1024` or `N=2048` as standard sizes[7]; downsample larger clouds using farthest-point sampling. RLDS datasets typically store point clouds as `tfds.features.Tensor(shape=(N,3), dtype=tf.float32)`. LeRobot has no official point cloud convention — store them in HDF5 as `observation/point_cloud` with shape `[num_frames, N, 3]`. The Point Cloud Library provides I/O utilities for PCD and LAS formats[8]; convert these to NumPy arrays before writing to HDF5.
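Farthest-point sampling is short enough to implement directly; a NumPy sketch (O(M·n), adequate for per-frame clouds):

```python
# Farthest-point downsampling to a fixed N.
import numpy as np

def farthest_point_sample(points: np.ndarray, n: int) -> np.ndarray:
    """points: [M, 3] with M >= n -> [n, 3] subsample."""
    selected = [0]  # arbitrary seed point
    dist = np.linalg.norm(points - points[0], axis=1)
    for _ in range(n - 1):
        nxt = int(dist.argmax())  # farthest from everything selected so far
        selected.append(nxt)
        dist = np.minimum(dist, np.linalg.norm(points - points[nxt], axis=1))
    return points[selected]
```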

Force-torque sensors output 6-DOF wrenches (3 forces + 3 torques). Store these as `observation/wrench` with shape `[num_frames, 6]` and dtype float32. Document the sensor's coordinate frame in `meta_data/info.json` — wrenches are meaningless without knowing whether they're in the sensor frame, end-effector frame, or base frame. The DROID dataset includes wrist-mounted F/T data and provides a transformation matrix to the end-effector frame. For tactile sensors (e.g., GelSight), store raw tactile images as uint8 and processed contact forces as float32 — keep both modalities so buyers can retrain feature extractors.

Versioning and Reproducibility: Dataset Snapshots and Metadata

Dataset versioning prevents training irreproducibility. Hugging Face Hub supports Git-based versioning: each `push_to_hub()` creates a commit, and you can pin a specific commit hash in training configs. For RLDS datasets hosted on GCS, use object versioning and record the bucket URI + timestamp in your experiment logs. The Open X-Embodiment dataset is versioned as `1.0.0`, `1.1.0`, etc., with a changelog documenting added episodes and schema changes[3].

Metadata files must include collection date, robot serial numbers, software versions (ROS distro, firmware), and operator IDs (anonymized). The Datasheets for Datasets paper provides a template[9]: motivation (why was the dataset created?), composition (how many episodes, tasks, environments?), collection process (teleoperation, scripted, autonomous?), preprocessing (filtering, relabeling, augmentation?), and recommended splits (train/val/test). Store this as `DATASHEET.md` in the dataset repository.

For datasets collected under commercial contracts, document the license and usage restrictions in `LICENSE.txt`. The Creative Commons BY 4.0 license is common for open datasets; commercial datasets often use custom licenses restricting redistribution or requiring attribution. The RoboNet dataset license prohibits commercial use without written permission[10] — this blocks integration into commercial training pipelines unless buyers negotiate a separate agreement. On truelabel's marketplace, sellers specify license terms per dataset, and buyers filter by commercial-use compatibility.

Performance Optimization: Sharding, Chunking, and Lazy Loading

Large datasets (>10K episodes, >1TB) require sharding for parallel loading. RLDS automatically shards TFRecord files; control shard size with `tfds.download.DownloadConfig(max_shard_size_mb=512)`. Each shard is a self-contained TFRecord file, enabling distributed training where each worker reads a subset of shards. The Open X-Embodiment dataset uses 256MB shards, yielding ~400 shards for the full 1M-episode corpus[3].

LeRobot datasets use HDF5 chunking to enable random access without loading full episodes. Set chunk size to `(1, H, W, 3)` for images so each frame is a contiguous disk block. The HDF5 documentation recommends chunk sizes of 10KB-1MB; larger chunks reduce metadata overhead but increase read latency for single-frame access. For video-heavy datasets, store each episode in a separate HDF5 file (e.g., `episode_000000.hdf5`) and use a Parquet index to map global frame indices to file paths.

Lazy loading defers I/O until data is accessed. RLDS datasets loaded with `tfds.load(..., shuffle_files=False, read_config=tfds.ReadConfig(skip_prefetch=True))` do not prefetch; add `.prefetch(tf.data.AUTOTUNE)` to the pipeline for async loading. LeRobot's `LeRobotDataset` lazy-loads by default: `__getitem__` reads one frame from HDF5 on demand. For multi-GPU training, use `torch.utils.data.DataLoader` with `num_workers=4` to parallelize HDF5 reads across CPU cores. The Parquet format supports column pruning — if your training code only needs actions and one camera, load only those columns to reduce I/O by 60-80%.
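Column pruning is a one-liner with pandas/pyarrow; the column names below follow the hypothetical convention used earlier in this guide:

```python
# Read only the columns training needs from the Parquet metadata table.
import pandas as pd

action_cols = [f"action_joint_{d}" for d in range(7)]  # hypothetical names
meta = pd.read_parquet(
    "dataset/meta.parquet",
    columns=["episode_index", "frame_index", "timestamp", *action_cols],
)
```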

Common Pitfalls: Timestamp Drift, Action Lag, and Episode Boundary Errors

Timestamp drift occurs when observation and action streams desynchronize. ROS bags record messages with wall-clock timestamps; if the system clock jumps (NTP correction, suspend/resume), timestamps become non-monotonic. Detect drift by checking that `timestamp[i+1] - timestamp[i]` is within `[0.8/fps, 1.2/fps]` for a dataset collected at `fps` Hz. The MCAP format includes a `log_time` field separate from `publish_time` to distinguish recording time from message time — use `log_time` for episode reconstruction.
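The gap check is a few lines of NumPy; it returns the indices of suspect transitions so you can inspect or split episodes there:

```python
# Drift check sketch: indices where the inter-frame gap leaves the expected
# [0.8/fps, 1.2/fps] window.
import numpy as np

def find_timestamp_gaps(ts: np.ndarray, fps: float) -> np.ndarray:
    dt = np.diff(ts)
    return np.where((dt < 0.8 / fps) | (dt > 1.2 / fps))[0]
```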

Action lag is the delay between commanding an action and observing its effect. For joint position control, lag is typically 1-2 frames (33-66ms at 30 Hz). For velocity control, lag can reach 5-10 frames due to acceleration limits. When labeling actions, decide whether to use the commanded action at time `t` or the observed joint state at `t+1`. The RT-1 paper uses commanded actions[4]; the DROID dataset uses observed states. Document your choice in `meta_data/info.json` — mixing conventions across datasets causes policies to learn incorrect action-effect mappings.

Episode boundary errors happen when `is_first`, `is_last`, or `is_terminal` flags are incorrect. A common bug: setting `is_terminal=True` for all final frames, even if the episode ended due to a timeout rather than task success. This teaches the policy that timeouts are goal states. The Open X-Embodiment dataset uses a separate `truncated` flag to distinguish timeouts from true terminals[3]. For LeRobot, add a `truncated` column to the Parquet table and set it to `True` for timeout episodes. Training code should mask out truncated episodes when computing returns.

Case Study: Converting a 5K-Episode ROS Bag Dataset to RLDS and LeRobot

A mobile manipulation team collected 5,000 episodes (12 hours) of teleoperation data in ROS bags. Each bag contains `/camera/color/image_raw` (640×480 RGB at 30 Hz), `/joint_states` (7-DOF arm at 100 Hz), and `/gripper/command` (binary at 10 Hz). The goal: convert to RLDS and LeRobot for training RT-1 and OpenVLA policies. Step one: extract messages using the rosbag Python API and resample all streams to 10 Hz (the gripper command rate). Use linear interpolation for joint positions and nearest-neighbor for images.

Step two: build episode boundaries by detecting `/gripper/command` transitions from 0→1 (grasp) followed by 1→0 (release). Each grasp-release cycle is one episode. This yielded 4,800 valid episodes; 200 bags had incomplete cycles and were discarded. Step three: write an RLDS dataset builder. Define `observation={'image': tfds.features.Image(shape=(480,640,3)), 'joint_pos': tfds.features.Tensor(shape=(7,), dtype=tf.float32)}` and `action=tfds.features.Tensor(shape=(8,), dtype=tf.float32)` (7 joint deltas + 1 gripper). Set `is_first=True` for the first step, `is_last=True` for the last, and `is_terminal=True` if the gripper opened (task success).
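A sketch of that segmentation, assuming the resampled gripper command is a 0/1 array aligned to the 10 Hz grid (see the resampling sketch earlier); incomplete cycles are dropped, matching the 200 discarded bags:

```python
# Episode segmentation sketch for step two.
import numpy as np

def segment_episodes(gripper_cmd: np.ndarray) -> list:
    """Return (start, end) index pairs for each 0->1 ... 1->0 cycle."""
    edges = np.diff(gripper_cmd.astype(int))
    grasps = np.where(edges == 1)[0] + 1     # 0 -> 1 transitions (grasp)
    releases = np.where(edges == -1)[0] + 1  # 1 -> 0 transitions (release)
    episodes = []
    for g in grasps:
        later = releases[releases > g]
        if len(later):  # discard grasps with no later release
            episodes.append((int(g), int(later[0])))
    return episodes
```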

Step four: convert RLDS → LeRobot. Load the RLDS dataset with `tfds.load()`, iterate episodes, and write HDF5 + Parquet. Store images in `observation/images/cam_wrist` with shape `[num_frames, 480, 640, 3]` and chunking `(1, 480, 640, 3)`. Flatten actions into Parquet columns `action_joint_0` through `action_joint_6` and `action_gripper`. Add `meta_data/info.json` with `{"fps": 10, "robot_type": "franka_panda", "camera_names": ["cam_wrist"]}`. Push to Hugging Face Hub with `lerobot.common.datasets.push_to_hub()`. Total conversion time: 6 hours (4 hours for RLDS builder, 2 hours for LeRobot export). Validation: loaded 10 random episodes from each format and verified pixel-level image equality and action sequence match.

Advanced Topics: Sim-to-Real Transfer and Domain Randomization Metadata

Sim-to-real transfer requires documenting simulation parameters. If your dataset includes synthetic episodes, record the simulator name (Isaac Sim, MuJoCo, PyBullet), physics timestep, domain randomization ranges (lighting, texture, dynamics), and rendering settings (resolution, anti-aliasing). The domain randomization paper shows that documenting randomization ranges is critical for reproducing sim-to-real results[11]. Store this in `meta_data/simulation.json` with fields `simulator`, `physics_dt`, `randomization`, and `rendering`.

For mixed sim-real datasets, add a `source` column to the Parquet table with values `sim` or `real`. Training code can then sample episodes with a configurable sim/real ratio. The Open X-Embodiment dataset includes both real-robot and simulated episodes; the dataset card specifies the sim/real split per task[3]. When converting simulation data to RLDS/LeRobot, ensure that observation and action spaces match the real robot exactly — mismatched action bounds or image resolutions break sim-to-real transfer.

Domain adaptation metadata helps buyers assess transfer feasibility. Document the real-world environment (lighting conditions, background clutter, object textures) and any calibration procedures (camera intrinsics, robot kinematics). The DROID dataset includes per-episode environment tags (`kitchen`, `office`, `warehouse`) and lighting conditions (`natural`, `artificial`, `mixed`). Buyers training on your dataset can then filter episodes by environment to match their deployment setting. On truelabel's marketplace, datasets with rich environment metadata command 20-40% price premiums because they reduce buyer integration risk.

Future-Proofing: Preparing for Multi-Robot and Foundation Model Formats

Multi-robot datasets will require cross-embodiment schemas. The Open X-Embodiment dataset defines a common action space (7-DOF end-effector pose + gripper) that abstracts over different robot kinematics[3]. Future formats may adopt a similar approach: store actions in task space (end-effector goals) rather than joint space, plus a robot-specific inverse kinematics solver. When building datasets today, include both joint-space and task-space actions so buyers can choose their control mode.

Foundation models like NVIDIA Cosmos and GR00T will likely define proprietary formats optimized for their training pipelines. The GR00T technical report mentions a "universal robot data format" but does not specify details[12]. To future-proof your datasets, maintain a canonical representation (raw sensor data + metadata) and build format-specific exporters as new standards emerge. The truelabel marketplace will host converters for emerging formats as they gain adoption.

Provenance tracking will become mandatory. The EU AI Act requires training data documentation for high-risk AI systems; the AI Act Article 10 mandates "data governance and management practices"[13]. Implement W3C PROV-DM metadata: record the collection agent (human operator, autonomous policy), processing steps (filtering, augmentation), and any manual corrections. Store provenance as a JSON-LD graph in `meta_data/provenance.jsonld`. Buyers will increasingly require provenance attestations to comply with AI regulations and to audit dataset quality.
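A minimal PROV-style record serialized as JSON-LD might look like the following; the identifiers and the `ex:` namespace are placeholders, and the full PROV-DM vocabulary offers much richer relations:

```python
# Minimal PROV-style provenance record as JSON-LD. Identifiers and the ex:
# namespace are placeholders, not a complete PROV-DM implementation.
import json

provenance = {
    "@context": {"prov": "http://www.w3.org/ns/prov#",
                 "ex": "https://example.org/dataset/"},
    "@graph": [
        {"@id": "ex:episode_000000", "@type": "prov:Entity",
         "prov:wasGeneratedBy": {"@id": "ex:teleop_session_42"}},
        {"@id": "ex:teleop_session_42", "@type": "prov:Activity",
         "prov:wasAssociatedWith": {"@id": "ex:operator_07"}},
        {"@id": "ex:operator_07", "@type": "prov:Agent"},
    ],
}

with open("meta_data/provenance.jsonld", "w") as f:
    json.dump(provenance, f, indent=2)
```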


External references and source context

  1. RLDS: an Ecosystem to Generate, Share and Use Datasets in Reinforcement Learning. RLDS paper introducing the trajectory dataset format and ecosystem. (arXiv)
  2. LeRobot: State-of-the-art Machine Learning for Real-World Robotics in Pytorch. LeRobot paper describing the HDF5 + Parquet format and community contributions. (arXiv)
  3. Open X-Embodiment: Robotic Learning Datasets and RT-X Models. Open X-Embodiment dataset paper reporting 1M+ trajectories spanning 22 robot embodiments and a cross-embodiment action space. (arXiv)
  4. RT-1: Robotics Transformer for Real-World Control at Scale. RT-1 paper specifying the 8-DOF action space and RLDS data requirements. (arXiv)
  5. OpenVLA: An Open-Source Vision-Language-Action Model. OpenVLA paper describing the open-source vision-language-action model and LeRobot integration. (arXiv)
  6. DROID: A Large-Scale In-The-Wild Robot Manipulation Dataset. DROID dataset paper reporting the 8% NaN action rate and providing a reference action schema. (arXiv)
  7. PointNet: Deep Learning on Point Sets for 3D Classification and Segmentation. PointNet paper establishing standard point cloud sizes. (arXiv)
  8. 3D is here: Point Cloud Library (PCL). Point Cloud Library paper covering PCD and LAS formats. (IEEE)
  9. Datasheets for Datasets. Paper providing the dataset documentation template. (arXiv)
  10. RoboNet dataset license. License prohibiting commercial use without written permission. (GitHub)
  11. Domain Randomization for Transferring Deep Neural Networks from Simulation to the Real World. Paper emphasizing documentation of randomization ranges. (arXiv)
  12. NVIDIA GR00T N1 technical report. Technical report mentioning a universal robot data format. (arXiv)
  13. Regulation (EU) 2024/1689 laying down harmonised rules on artificial intelligence. EU AI Act Article 10 mandating data governance for high-risk AI. (EUR-Lex)

FAQ

What is the difference between RLDS and LeRobot formats?

RLDS is a TensorFlow Datasets extension for reinforcement learning trajectories, storing episodes as TFRecord shards with nested observation/action dicts. LeRobot uses HDF5 for observations and Parquet for actions/metadata, targeting PyTorch users and Hugging Face Hub integration. RLDS optimizes for petabyte-scale distributed loading; LeRobot prioritizes contributor ease and ecosystem compatibility. Both encode the same information (multi-modal observations, actions, episode boundaries) but differ in serialization and tooling.

How do I convert ROS bags to RLDS or LeRobot?

Extract messages using the rosbag Python API, resample all topics to a common frequency (typically 10-30 Hz), and reconstruct episode boundaries by detecting task completion signals (e.g., gripper state changes, goal reached flags). For RLDS, write a tfds.core.GeneratorBasedBuilder that yields episode dicts with steps, observations, actions, and terminal flags. For LeRobot, write observations to HDF5 groups and actions to a Parquet table with episode_index and frame_index columns. Validate by loading sample episodes and checking timestamp monotonicity and action bounds.

Can I train RT-1 on a LeRobot dataset?

Not directly — RT-1 expects RLDS format. Convert LeRobot → RLDS by loading the Parquet metadata and HDF5 observations, reconstructing episode dicts with the required RLDS schema (observation, action, reward, discount, is_first, is_last, is_terminal), and writing TFRecord shards using a custom dataset builder. Ensure action dimensions match RT-1's expected 8-DOF spec (7 joint positions + 1 gripper). After conversion, register the dataset in TensorFlow Datasets and reference it in RT-1's training config.

What metadata should I include when publishing a robot dataset?

Minimum metadata: collection date, robot model and serial number, task description, environment conditions (lighting, clutter), operator training level, control frequency, camera intrinsics, and license terms. Use the Datasheets for Datasets template to document motivation, composition, collection process, preprocessing, and recommended splits. For commercial datasets, add usage restrictions, attribution requirements, and contact information. Store metadata in README.md, meta_data/info.json, and DATASHEET.md files in the dataset repository.

How do I handle variable episode lengths in RLDS and LeRobot?

Both formats support variable-length episodes natively. RLDS stores each episode as a list of steps with arbitrary length; TensorFlow's tf.data.Dataset API handles batching via padded_batch() or bucket_by_sequence_length(). LeRobot stores each episode in a separate HDF5 file or as a group within a single file, with the Parquet table indexing frames by episode_index and frame_index. Training code typically samples fixed-length windows from episodes or pads/truncates to a maximum length — document your dataset's episode length distribution so buyers can configure their data loaders appropriately.

What are the most common errors when converting between RLDS and LeRobot?

Top errors: (1) incorrect episode boundary flags (is_first, is_last, is_terminal) causing policies to ignore resets, (2) action dimension mismatches (nested dicts vs flat arrays), (3) image channel order confusion (HWC vs CHW), (4) timestamp non-monotonicity from clock jumps or resampling bugs, (5) missing or orphaned observation keys (e.g., depth images present in some episodes but not others). Always validate converted datasets by loading sample episodes, checking tensor shapes, and verifying that first/last frame observations match the source data pixel-for-pixel.

Looking for RLDS and LeRobot formats?

Specify modality, task, environment, rights, and delivery format. Truelabel matches you with vetted capture partners — every delivery includes consent artifacts and commercial licensing by default.

List Your Robot Dataset on Truelabel