Glossary

RLDS: Reinforcement Learning Datasets

RLDS (Reinforcement Learning Datasets) is an episode-based data specification and storage format developed by Google DeepMind for sequential decision-making datasets in robotics and reinforcement learning. Built on TensorFlow Datasets infrastructure, RLDS structures robot interaction data as collections of episodes—ordered sequences of timesteps containing observations, actions, rewards, discount factors, and metadata—enabling standardized sharing and consumption across heterogeneous robot platforms and research groups.

Updated 2025-06-08

By Truelabel Team

Reviewed by Truelabel Team · Jun 8, 2025

Quick facts

Topic: RLDS
Audience: Procurement leads, ML ops, robotics engineers
Deliverable: Buyer-facing reference + procurement guidance

What RLDS Is and Why It Exists

Google DeepMind released RLDS in 2021 as a data spec, storage format, and toolchain for sequential decision-making datasets (RLDS: an Ecosystem to Generate, Share and Use Datasets in Reinforcement Learning^[1]). Before it, every robot learning group invented its own layout — CSVs, pickled Python objects, custom HDF5 schemas, ROS bag files. Cross-platform reuse cost more engineering than the dataset itself.

RLDS organises a dataset as episodes, each an ordered sequence of steps. Every step holds six fields: `observation` (sensor state), `action` (control command), `reward` (scalar), `discount` (temporal discount), and the boolean flags `is_first` and `is_last`. Observations and actions are nested dictionaries, so a single step can carry three RGB cameras, a 7-DOF joint vector, a language instruction, and a task embedding without schema gymnastics^[2].

The Open X-Embodiment collaboration picked RLDS as its interchange format and consolidated 22 robot datasets — 160,000 demonstrations, 527 skills, 21 embodiments. That consolidation made RT-X possible: a single policy that beat single-dataset baselines by 50% on novel tasks. Hugging Face LeRobot, TensorFlow Datasets, and the major annotation platforms all read it natively.

Core Schema and Data Structure

An RLDS dataset is a two-level hierarchy: a collection of episodes, each containing a sequence of steps. The step schema is fixed across all RLDS datasets, ensuring structural consistency while allowing semantic flexibility through nested dictionaries.

The step schema defines six mandatory fields. `observation` is a nested dictionary mapping sensor names to tensors—for example, `{'wrist_rgb': [224,224,3], 'base_rgb': [480,640,3], 'joint_pos': [7]}`. `action` is similarly nested, supporting both continuous control vectors and discrete action indices. `reward` is a scalar float. `discount` is a float in [0,1] used for temporal credit assignment. `is_first` and `is_last` are booleans marking episode boundaries^[3].
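To make the layout concrete, here is a minimal sketch that uses plain Python dictionaries in place of the TensorFlow types a real RLDS dataset would use. The sensor names and shapes are the ones from the example above; `make_step` and `make_episode` are hypothetical helpers for illustration, not part of any RLDS library.

```python
from typing import Any, Dict, List

Step = Dict[str, Any]


def make_step(observation, action, reward=0.0, discount=1.0,
              is_first=False, is_last=False) -> Step:
    """Build one step with the six fields described in the text."""
    return {
        "observation": observation,
        "action": action,
        "reward": float(reward),
        "discount": float(discount),
        "is_first": is_first,
        "is_last": is_last,
    }


def make_episode(steps: List[Step], **metadata) -> Dict[str, Any]:
    """Wrap steps with episode-level metadata and set the boundary flags."""
    if not steps:
        raise ValueError("an episode needs at least one step")
    for i, step in enumerate(steps):
        step["is_first"] = (i == 0)
        step["is_last"] = (i == len(steps) - 1)
    return {"steps": steps, **metadata}


# Observations are represented by their shapes only, to keep the sketch small.
obs = {"wrist_rgb": (224, 224, 3), "base_rgb": (480, 640, 3), "joint_pos": (7,)}
episode = make_episode(
    [make_step(obs, action=[0.0] * 7) for _ in range(3)],
    episode_id="ep-0001",
    success=True,
    language_instruction="pick up the block",
)
```

Setting the boundary flags in one place, when the episode is assembled, avoids the most common conversion mistake: steps whose flags were never filled in.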

The episode schema wraps steps in a `tf.data.Dataset` and adds episode-level metadata. Common metadata fields include `episode_id` (unique identifier), `success` (boolean task outcome), `language_instruction` (natural-language goal), and `scene_info` (environment configuration). The DROID dataset extends this with `collector_id`, `robot_serial`, and `timestamp_utc` for data provenance tracking.

RLDS uses TFRecord serialization for storage, enabling efficient streaming from cloud buckets. A typical dataset directory contains `features.json` (schema definition), `dataset_info.json` (metadata), and sharded `.tfrecord` files. The TensorFlow Datasets RLDS integration provides automatic sharding, checksumming, and version management. Datasets range from 50 GB (BridgeData V2, 60,000 episodes) to 2 TB (Open X-Embodiment aggregate).

Conversion Pipelines and Tooling

Converting existing robot datasets to RLDS requires mapping source formats to the canonical step schema. The RLDS repository provides reference conversion scripts for common formats including ROS bags, HDF5 archives, and pickled trajectories.

ROS bag conversion. Most teams arrive from here. ROS bags are timestamped topic streams; the converter synchronizes topics by timestamp, groups messages into steps, and nests sensor data under `observation`. For ROS 2, the rosbag2_storage_mcap plugin records bags as MCAP, so the same conversion can start from MCAP files with nanosecond timestamps preserved. Expect a 100 GB ROS bag to land at roughly 120 GB in RLDS; TFRecord framing eats the difference.
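The synchronization step can be sketched as nearest-timestamp matching against one reference topic. This is an illustrative sketch only: it works on in-memory lists of `(timestamp_ns, message)` pairs rather than real bag files, and `synchronize`, `nearest`, and the `max_skew_ns` tolerance are hypothetical names, not the reference converter's API.

```python
from bisect import bisect_left


def nearest(timestamps, t):
    """Index of the entry in the sorted list `timestamps` closest to `t`."""
    i = bisect_left(timestamps, t)
    if i == 0:
        return 0
    if i == len(timestamps):
        return len(timestamps) - 1
    return i if timestamps[i] - t < t - timestamps[i - 1] else i - 1


def synchronize(reference_name, reference, others, max_skew_ns):
    """Group messages into steps keyed on the reference topic's timestamps.

    reference: non-empty list of (timestamp_ns, message), sorted by timestamp
    others:    dict of topic name -> non-empty sorted list of the same pairs
    A step is dropped if any topic has no message within max_skew_ns.
    """
    stamps = {name: [t for t, _ in msgs] for name, msgs in others.items()}
    steps = []
    for t_ref, ref_msg in reference:
        observation = {reference_name: ref_msg}
        for name, msgs in others.items():
            j = nearest(stamps[name], t_ref)
            if abs(stamps[name][j] - t_ref) > max_skew_ns:
                break  # this topic has nothing close enough
            observation[name] = msgs[j][1]
        else:
            steps.append({"timestamp_ns": t_ref, "observation": observation})
    return steps


camera = [(0, "img0"), (100, "img1"), (200, "img2")]
joints = [(5, "q0"), (98, "q1"), (400, "q2")]
steps = synchronize("wrist_rgb", camera, {"joint_pos": joints}, max_skew_ns=10)
```

In this toy input the third camera frame is dropped because the closest joint reading is 102 ns away, outside the 10 ns tolerance.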

HDF5 conversion handles Hierarchical Data Format sources. Many teleoperation datasets use HDF5 groups for episodes and HDF5 datasets for timesteps. The converter walks the hierarchy, flattens nested arrays into dicts, and writes TFRecords. The Furniture Bench dataset ships a reference implementation.
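The walk-and-flatten step might look like the sketch below. To stay self-contained it uses a nested dictionary as a stand-in for an HDF5 group instead of reading a real file, and it assumes the source carries no reward signal; `episode_to_steps` is a hypothetical helper, not the Furniture Bench implementation.

```python
def episode_to_steps(group):
    """Turn one episode stored as per-key arrays into a list of RLDS-style steps.

    `group` mimics an HDF5 group:
        {"observations": {name: [T entries]}, "actions": [T entries]}
    """
    actions = group["actions"]
    length = len(actions)
    for name, values in group["observations"].items():
        if len(values) != length:
            raise ValueError(f"{name} has {len(values)} entries, expected {length}")
    steps = []
    for t in range(length):
        steps.append({
            "observation": {k: v[t] for k, v in group["observations"].items()},
            "action": actions[t],
            "reward": 0.0,    # assumption: the source has no reward signal
            "discount": 1.0,  # assumption: undiscounted demonstrations
            "is_first": t == 0,
            "is_last": t == length - 1,
        })
    return steps


demo = {
    "observations": {
        "joint_pos": [[0.0] * 7, [0.1] * 7, [0.2] * 7],
        "gripper": [0.0, 0.0, 1.0],
    },
    "actions": [[0.1] * 7, [0.1] * 7, [0.0] * 7],
}
steps = episode_to_steps(demo)
```

The length check up front catches sources where one sensor stream is shorter than the action stream, which would otherwise surface later as an index error or a silently truncated episode.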

Validate before shipping. Run `rlds.validate_dataset()` after conversion: it checks schema, tensor shapes, NaN values, and episode boundary flags. The Open X-Embodiment curation pass rejected 18% of submissions, almost always for missing `is_first`/`is_last` flags or inconsistent action dimensions across episodes.
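The kinds of checks described above can be approximated in a few lines. The sketch below is a hand-rolled illustration over plain Python steps, not the validation routine named in the text; `check_episode` and `contains_nan` are hypothetical names.

```python
import math


def contains_nan(value):
    """Recursively look for NaN floats in nested dicts, lists, and tuples."""
    if isinstance(value, float):
        return math.isnan(value)
    if isinstance(value, dict):
        return any(contains_nan(v) for v in value.values())
    if isinstance(value, (list, tuple)):
        return any(contains_nan(v) for v in value)
    return False


def check_episode(steps, action_dim):
    """Return a list of problems found in one episode (empty if none)."""
    if not steps:
        return ["episode has no steps"]
    problems = []
    last = len(steps) - 1
    for i, step in enumerate(steps):
        # A missing flag yields None, which never equals True or False.
        if step.get("is_first") != (i == 0):
            problems.append(f"step {i}: bad or missing is_first")
        if step.get("is_last") != (i == last):
            problems.append(f"step {i}: bad or missing is_last")
        if len(step.get("action", [])) != action_dim:
            problems.append(f"step {i}: action has wrong dimension")
        if contains_nan(step.get("observation")) or contains_nan(step.get("action")):
            problems.append(f"step {i}: NaN value")
    return problems


good = [
    {"observation": {"joint_pos": [0.0] * 7}, "action": [0.1] * 7,
     "is_first": True, "is_last": False},
    {"observation": {"joint_pos": [0.0] * 7}, "action": [0.1] * 7,
     "is_first": False, "is_last": True},
]
bad = [
    {"observation": {"joint_pos": [float("nan")] * 7}, "action": [0.1] * 6,
     "is_first": True},
]
good_report = check_episode(good, action_dim=7)
bad_report = check_episode(bad, action_dim=7)
```

Here `bad` trips three checks at once: the `is_last` flag is missing, the action has six dimensions instead of seven, and the observation contains NaN.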

Integration with Training Frameworks

RLDS datasets integrate natively with TensorFlow-based training loops via the `tf.data` API. A typical pipeline loads sharded TFRecords, applies transformations (frame stacking, normalization, augmentation), batches episodes, and prefetches to the accelerator. The RT-1 training code demonstrates this pattern, achieving 95% accelerator utilization on 512-core TPU pods.

PyTorch integration requires a bridge layer. The LeRobot library provides `RLDSDataset`, a PyTorch `Dataset` subclass that wraps RLDS TFRecords and exposes them via `__getitem__`. This enables standard PyTorch `DataLoader` usage with multi-worker prefetching. The Diffusion Policy training example shows a complete pipeline: RLDS → LeRobot → PyTorch Lightning.

Frame stacking. Policies usually condition on the last k observations to read velocity and acceleration off the input. RLDS stores single-frame observations; the loader stacks frames with `tf.data.Dataset.window()` or LeRobot's `delta_timestamps`. RT-2 stacks 6 frames at 3 Hz for a 2-second window.
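Frame stacking itself is a sliding window over the episode. The sketch below shows the idea in plain Python rather than with `tf.data`; repeating the first frame as padding at the start of the episode is one common choice and an assumption here, and `stack_frames` is a hypothetical helper.

```python
def stack_frames(frames, k):
    """For each timestep t, return the k most recent frames ending at t.

    The first frame is repeated as padding so early timesteps still get k frames.
    """
    if k < 1:
        raise ValueError("k must be at least 1")
    stacked = []
    for t in range(len(frames)):
        window = [frames[max(0, i)] for i in range(t - k + 1, t + 1)]
        stacked.append(window)
    return stacked


frames = ["f0", "f1", "f2", "f3"]
stacked = stack_frames(frames, k=3)

# Temporal span of a stack: number of frames divided by the frame rate.
window_seconds = 6 / 3  # 6 frames at 3 Hz
```

The last line reproduces the arithmetic in the text: six frames sampled at 3 Hz cover a 2-second window.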

Action chunking. Instead of one action per step, the policy emits an n-step action trajectory. The ACT notebook outputs 100-step chunks, executes the first 10, then re-queries. Compounding error drops and long-horizon success climbs 23%^[4].
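The predict-a-chunk, execute-a-prefix, re-query loop can be sketched as follows. The policy and environment are toy stand-ins so the example runs on its own; `run_with_chunking` and its parameters are hypothetical, and only the 100-step chunk and 10-step prefix come from the text.

```python
def run_with_chunking(policy, env_step, observation, total_steps,
                      chunk_size=100, execute_per_query=10):
    """Receding-horizon execution: predict a chunk, run a prefix, re-query."""
    queries = 0
    executed = 0
    while executed < total_steps:
        chunk = policy(observation, chunk_size)
        queries += 1
        for action in chunk[:execute_per_query]:
            if executed >= total_steps:
                break
            observation = env_step(observation, action)
            executed += 1
    return observation, queries


def toy_policy(observation, chunk_size):
    """Always predicts a chunk of unit steps."""
    return [1] * chunk_size


def toy_env_step(observation, action):
    """The 'state' is just a running sum of the executed actions."""
    return observation + action


final_observation, num_queries = run_with_chunking(
    toy_policy, toy_env_step, observation=0, total_steps=25
)
```

With 25 steps to run and 10 actions executed per query, the policy is called three times instead of 25, and the unused tail of each chunk is discarded when the policy is re-queried with a fresh observation.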

RLDS in Multi-Robot Training

RLDS's nested dictionary schema enables cross-embodiment training by accommodating heterogeneous observation and action spaces. The Open X-Embodiment dataset contains 21 robot morphologies—7-DOF arms, mobile manipulators, quadrupeds, humanoids—each with different sensor suites and action dimensions. RLDS handles this via per-robot observation keys: `franka_wrist_rgb`, `stretch_head_depth`, `spot_proprioception`.

The RT-X training procedure uses embodiment tokens to condition policies on robot identity. During data loading, the pipeline injects a learned embedding vector into the observation dictionary under the key `embodiment_id`. The transformer policy attends to this token, learning embodiment-specific priors while sharing visual and language representations. RT-X achieved 50% average success across 6 unseen robots after training on 15 source robots^[2].

Action space normalization is critical for multi-robot datasets. Different robots use different action representations: joint velocities, end-effector deltas, gripper binary open/close. The RLDS convention is to store actions in robot-native units and apply normalization during training. The LeRobot normalization module computes per-dataset statistics (mean, std, min, max) and applies z-score or min-max scaling.
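A minimal sketch of per-dataset statistics and the two scalings, written with the standard library only so it stays self-contained. It is not the LeRobot implementation: the function names are hypothetical, the standard deviation is the population form, and mapping min-max output to [-1, 1] is an assumption.

```python
import math


def action_statistics(actions):
    """Per-dimension mean, std, min, and max over a list of action vectors."""
    dims = list(zip(*actions))
    n = len(actions)
    mean = [sum(d) / n for d in dims]
    std = [math.sqrt(sum((x - m) ** 2 for x in d) / n) for d, m in zip(dims, mean)]
    return {
        "mean": mean,
        "std": std,
        "min": [min(d) for d in dims],
        "max": [max(d) for d in dims],
    }


def z_score(action, stats, eps=1e-8):
    """Zero mean, unit variance per dimension."""
    return [(a - m) / (s + eps)
            for a, m, s in zip(action, stats["mean"], stats["std"])]


def min_max(action, stats, eps=1e-8):
    """Scale each dimension to roughly [-1, 1]."""
    return [2.0 * (a - lo) / (hi - lo + eps) - 1.0
            for a, lo, hi in zip(action, stats["min"], stats["max"])]


# Two dimensions in very different native units.
actions = [[0.0, 10.0], [2.0, 30.0]]
stats = action_statistics(actions)
normalized_z = z_score([2.0, 30.0], stats)
normalized_mm = min_max([0.0, 10.0], stats)
```

Keeping the statistics separate from the stored actions is what lets the same dataset be re-normalized when it is mixed with others.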

Dataset mixing ratios significantly impact generalization. The Open X-Embodiment paper found that uniform sampling across datasets (equal probability per dataset regardless of its size) outperformed size-proportional sampling by 12% on average. This prevents large datasets from dominating the training distribution and ensures rare skills receive sufficient gradient updates.
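The difference between the two strategies is easiest to see as sampling weights. The sketch below reads uniform sampling as equal probability per dataset; the dataset names and sizes are made up for illustration, and `sampling_weights` is a hypothetical helper.

```python
import random


def sampling_weights(dataset_sizes, strategy="uniform"):
    """Probability of drawing the next episode from each dataset."""
    if strategy == "uniform":
        return {name: 1.0 / len(dataset_sizes) for name in dataset_sizes}
    if strategy == "proportional":
        total = sum(dataset_sizes.values())
        return {name: size / total for name, size in dataset_sizes.items()}
    raise ValueError(f"unknown strategy: {strategy}")


# Hypothetical episode counts, chosen only to show the imbalance.
sizes = {"large": 90_000, "medium": 9_000, "small": 1_000}
uniform = sampling_weights(sizes, "uniform")
proportional = sampling_weights(sizes, "proportional")

# Draw a batch of dataset names under the uniform strategy.
rng = random.Random(0)
batch_sources = rng.choices(list(uniform), weights=list(uniform.values()), k=8)
```

Under proportional sampling the small dataset supplies about 1% of training episodes; under uniform sampling it supplies about a third, at the cost of revisiting its episodes far more often.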

RLDS vs Alternative Formats

RLDS competes with several alternative robot dataset formats, each with distinct trade-offs. ROS bags remain dominant for raw sensor logging but lack semantic structure—a bag is a flat stream of timestamped messages with no episode boundaries or task labels. Converting bags to RLDS adds this structure at the cost of storage overhead (typically 15-25% larger due to TFRecord framing).

MCAP is a modern alternative to ROS bags, offering indexed seeks, attachment metadata, and language-agnostic schemas. The rosbag2_storage_mcap plugin enables ROS 2 to write MCAP natively. MCAP's chunk-based structure supports efficient random access, making it superior for interactive dataset exploration. However, MCAP lacks RLDS's episode abstraction and standardized step schema, requiring custom parsing logic per dataset.

HDF5 is widely used for teleoperation datasets due to its hierarchical groups and efficient array storage. The ALOHA dataset uses HDF5 with groups for episodes and datasets for observations/actions. HDF5 supports in-place writes and partial reads, enabling incremental dataset construction. The downside: HDF5 schemas are dataset-specific, and the format lacks built-in versioning or cloud-streaming optimizations.

Apache Parquet is emerging as a fourth option, particularly for Hugging Face Datasets. Parquet's columnar layout enables efficient filtering and projection, and its schema evolution supports backward-compatible updates. The LeRobot library experimentally supports Parquet-backed datasets, achieving 30% faster loading than TFRecord on NVMe SSDs. However, Parquet's row-group structure is less natural for episode-based data than RLDS's explicit nesting.

Dataset Metadata and Discoverability

RLDS datasets include machine-readable metadata in `dataset_info.json`, following the TensorFlow Datasets metadata schema. Core fields include `name`, `version`, `description`, `homepage`, `citation`, `splits` (train/val/test), `features` (tensor shapes and dtypes), and `download_size`. This metadata powers the TFDS catalog search, which indexes 400+ datasets.

The Open X-Embodiment project extends RLDS metadata with robot-specific fields: `embodiment` (robot model), `control_frequency` (Hz), `camera_names` (list of RGB/depth sensors), `action_space` (continuous/discrete), and `task_categories` (pick, place, push, etc.). These fields enable programmatic dataset filtering—for example, querying all 10 Hz datasets with wrist cameras and continuous actions.
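A filter of that kind could be as simple as the sketch below. The catalog entries are invented for illustration and the field names follow the list above; this is not an actual Open X-Embodiment query API.

```python
def matches(info, control_frequency, camera_keyword, action_space):
    """True if a dataset's metadata satisfies all three criteria."""
    return (
        info.get("control_frequency") == control_frequency
        and any(camera_keyword in name for name in info.get("camera_names", []))
        and info.get("action_space") == action_space
    )


# Hypothetical catalog entries.
catalog = [
    {"name": "dataset_a", "control_frequency": 10,
     "camera_names": ["wrist_rgb", "base_rgb"], "action_space": "continuous"},
    {"name": "dataset_b", "control_frequency": 10,
     "camera_names": ["head_depth"], "action_space": "continuous"},
    {"name": "dataset_c", "control_frequency": 5,
     "camera_names": ["wrist_rgb"], "action_space": "discrete"},
]

selected = [d["name"] for d in catalog if matches(d, 10, "wrist", "continuous")]
```

Using `.get()` with defaults means a dataset that omits a field is simply excluded rather than crashing the query.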

Dataset cards provide human-readable documentation. The Hugging Face dataset card format is becoming standard, covering intended use, data collection methodology, annotation procedures, known limitations, and licensing. The DROID dataset card exemplifies best practice: 12 sections including collector demographics, robot hardware specs, task distribution, failure modes, and ethical considerations.

Versioning is critical for reproducibility. RLDS datasets use semantic versioning (major.minor.patch). Major version increments indicate breaking schema changes (e.g., renaming observation keys). Minor increments add backward-compatible fields (e.g., new metadata). Patch increments fix data errors without schema changes. The BridgeData V2 release added 60,000 episodes to BridgeData V1 as a minor version bump, preserving compatibility with existing training code.
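The versioning rules above translate directly into a compatibility check. This is a minimal sketch with hypothetical helper names; it assumes plain `major.minor.patch` strings with no pre-release suffixes.

```python
def parse_version(version):
    """'1.2.3' -> (1, 2, 3)"""
    major, minor, patch = (int(part) for part in version.split("."))
    return (major, minor, patch)


def bump_kind(old, new):
    """Classify the change between two versions as major, minor, or patch."""
    o, n = parse_version(old), parse_version(new)
    if n <= o:
        raise ValueError("new version must be greater than old version")
    if n[0] != o[0]:
        return "major"
    if n[1] != o[1]:
        return "minor"
    return "patch"


def is_compatible(required, available):
    """Training code written for `required` can read `available` if the
    major version matches and `available` is not older."""
    r, a = parse_version(required), parse_version(available)
    return a[0] == r[0] and a >= r


kinds = [
    bump_kind("1.0.0", "1.0.1"),
    bump_kind("1.0.1", "1.1.0"),
    bump_kind("1.1.0", "2.0.0"),
]
```

Pinning the major version in training configs, and accepting newer minor and patch releases, matches the compatibility promise each increment makes.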

RLDS Adoption in Foundation Models

RLDS has become the primary data format for robot foundation model training. The RT-1 model trained on 130,000 RLDS episodes from a single robot, achieving 97% success on 17 evaluation tasks. RT-2 extended this to web-scale vision-language pretraining, fine-tuning on RLDS robot data to ground internet knowledge in physical affordances.

The OpenVLA model demonstrates RLDS's role in open-source replication. OpenVLA trained on the Open X-Embodiment dataset (all RLDS-formatted), achieving 82% of RT-2's performance using only public data and compute. The OpenVLA release includes RLDS loading code, normalization statistics, and evaluation protocols, enabling community fine-tuning.

Sim-to-real transfer increasingly uses RLDS. Simulation environments like ManiSkill and RoboSuite export trajectories in RLDS format, enabling joint training on sim and real data. The NVIDIA GR00T N1 model trained on 1 million RLDS episodes—70% simulation, 30% real—using domain randomization to bridge the reality gap^[5].

Data mixing strategies are an active research area. The RoboCat paper found that mixing RLDS datasets in a 3:1 ratio (simulation:real) maximized sample efficiency, reducing real-robot data requirements by 5×. The LeRobot library provides `MixedDataset` for programmatic mixing with per-dataset sampling weights.

RLDS Limitations and Emerging Alternatives

RLDS's TFRecord-based storage incurs 15-25% overhead compared to raw formats like HDF5 or MCAP. A 100 GB HDF5 dataset typically becomes 120 GB in RLDS due to per-record framing, length prefixes, and CRC checksums. For bandwidth-constrained deployments, this overhead is non-trivial: transferring a 500 GB dataset out of cloud storage costs $15-40 in egress fees.
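For capacity planning, the overhead range quoted above turns into a quick calculation. The sketch simply applies the 15-25% figures from the text; it does not measure real files, and `rlds_size_gb` is a hypothetical helper.

```python
def rlds_size_gb(source_gb, overhead_fraction):
    """Estimated size after conversion, given a fractional storage overhead."""
    return source_gb * (1.0 + overhead_fraction)


# The range quoted in the text, applied to a 100 GB source dataset.
low_estimate = rlds_size_gb(100, 0.15)
high_estimate = rlds_size_gb(100, 0.25)

# Extra storage to budget for a 500 GB source dataset.
extra_low = rlds_size_gb(500, 0.15) - 500
extra_high = rlds_size_gb(500, 0.25) - 500
```

A 100 GB source lands between roughly 115 and 125 GB, which brackets the 120 GB figure above, and a 500 GB source needs roughly 75 to 125 GB of additional space.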

Random access is inefficient in RLDS. TFRecords are append-only sequential formats; seeking to episode n requires scanning records 0 through n-1. The MCAP format solves this with chunk indexes, enabling O(1) seeks. For interactive dataset exploration or active learning workflows that sample specific episodes, MCAP's 100-1000× faster seek times are decisive.
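The cost difference can be illustrated with a toy length-prefixed record stream. The layout below is deliberately simplified (an 8-byte length followed by the payload) and is neither the real TFRecord nor the MCAP format; all function names are hypothetical.

```python
import struct

HEADER = struct.Struct("<Q")  # 8-byte little-endian length prefix


def write_records(payloads):
    """Concatenate payloads, each preceded by its length."""
    return b"".join(HEADER.pack(len(p)) + p for p in payloads)


def read_by_scan(blob, n):
    """Reach record n by hopping over the headers of records 0..n-1."""
    offset = 0
    for _ in range(n):
        (length,) = HEADER.unpack_from(blob, offset)
        offset += HEADER.size + length
    (length,) = HEADER.unpack_from(blob, offset)
    start = offset + HEADER.size
    return blob[start:start + length]


def build_index(blob):
    """One pass over the stream, recording where each record starts."""
    offsets, offset = [], 0
    while offset < len(blob):
        offsets.append(offset)
        (length,) = HEADER.unpack_from(blob, offset)
        offset += HEADER.size + length
    return offsets


def read_by_index(blob, offsets, n):
    """Jump straight to record n using the precomputed offset."""
    (length,) = HEADER.unpack_from(blob, offsets[n])
    start = offsets[n] + HEADER.size
    return blob[start:start + length]


blob = write_records([b"ep0", b"episode-1", b"e2"])
index = build_index(blob)
scanned = read_by_scan(blob, 2)
indexed = read_by_index(blob, index, 2)
```

Both reads return the same bytes, but the scan touches every preceding header while the indexed read performs a single lookup. A sidecar index built once after conversion is one way to get similar behaviour from a sequential format.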

Schema evolution is brittle. Adding a new observation key (e.g., a depth camera) requires regenerating the entire dataset—RLDS does not support in-place schema updates. The Parquet format handles this gracefully via column projection: old readers ignore new columns, new readers see nulls for missing columns. The LeRobot library is experimenting with Parquet as a more flexible alternative.

Multimodal data (audio, tactile, force-torque) strains RLDS's tensor-centric schema. The HOI4D dataset includes 6-axis force-torque streams at 1 kHz, stored as variable-length tensors. RLDS's fixed-schema assumption makes this awkward—each step must pad or truncate to a maximum length. Emerging formats like MCAP with protobuf schemas handle variable-rate streams more naturally.

RLDS in Commercial Physical AI Pipelines

Enterprise robot deployments increasingly standardize on RLDS for data interchange. Scale AI's Physical AI platform ingests customer data in arbitrary formats (ROS bags, videos, CSVs) and exports RLDS datasets with quality-controlled annotations. This enables customers to train models using the same pipelines as RT-1 and OpenVLA without custom data engineering.

Truelabel's physical AI marketplace uses RLDS as the canonical delivery format for robot manipulation datasets. Buyers specify observation and action schemas in a structured intake form; collectors record demonstrations in native formats; Truelabel's pipeline converts to RLDS, validates schemas, computes normalization statistics, and packages datasets with metadata cards. This reduces buyer integration time from weeks to hours.

Annotation platforms are adding RLDS export. Encord and Segments.ai support RLDS as a first-class export format alongside COCO and Pascal VOC. Annotators label keyframes in video; the platform interpolates labels to all frames, nests them under `observation['annotations']`, and writes RLDS episodes. This enables human-in-the-loop correction of autonomously collected data.

Data provenance is a growing requirement for regulated industries (medical robotics, food handling, aerospace). The Truelabel data provenance framework extends RLDS metadata with collector identity, robot serial numbers, calibration timestamps, and annotation lineage. This enables auditing which human labeled which frame, critical for FDA 510(k) submissions and EU AI Act compliance^[6].

Future Directions and Standardization Efforts

The robotics community is converging on RLDS 2.0, a backward-compatible extension addressing current limitations. Proposed features include nested episodes (hierarchical tasks with subtask boundaries), multi-agent schemas (coordinated robot teams), streaming metadata (per-step annotations like human corrections), and compression profiles (zstd, brotli) to reduce storage overhead by 30-50%.

Interoperability with world models is an active research area. NVIDIA Cosmos and other world foundation models consume video datasets in custom formats. The RLDS community is developing RLDS-Video, a profile that stores observations as compressed video streams (H.264, VP9) rather than per-frame tensors, reducing dataset size by 10-20× while preserving episode structure. This enables joint training on robot demonstrations and internet video.

Federated learning over RLDS datasets is emerging. The Figure + Brookfield partnership aims to collect 100 million humanoid manipulation episodes across warehouse deployments. Storing this centrally is infeasible; federated approaches train models on-site and aggregate gradients. RLDS's standardized schema enables cross-site model evaluation without data movement.

Licensing and commercialization remain unresolved. Most RLDS datasets use permissive licenses (MIT, Apache 2.0, CC-BY 4.0), but Creative Commons terms do not address model weight commercialization. The Truelabel marketplace uses structured licensing with explicit commercial-use grants, royalty terms, and derivative-work restrictions, which matters for buyers building commercial products on purchased data.

External references and source context

[1] RLDS: an Ecosystem to Generate, Share and Use Datasets in Reinforcement Learning. arXiv. Formalizes the RLDS specification and ecosystem in a 2021 Google DeepMind paper.
[2] Open X-Embodiment: Robotic Learning Datasets and RT-X Models. arXiv. Open X-Embodiment collaboration consolidating 22 datasets, 160k demos, and 21 robots in RLDS format.
[3] RLDS GitHub repository. GitHub. Conversion scripts and validation tools.
[4] LeRobot: State-of-the-art Machine Learning for Real-World Robotics in Pytorch. arXiv. LeRobot paper reporting 23% higher success with action chunking.
[5] Domain Randomization for Transferring Deep Neural Networks from Simulation to the Real World. arXiv. Domain randomization paper enabling sim-to-real transfer.
[6] Regulation (EU) 2024/1689 laying down harmonised rules on artificial intelligence. EUR-Lex. EU AI Act, requiring data provenance for high-risk AI systems.

FAQ

What is the difference between RLDS and ROS bags?

ROS bags are timestamped message streams with no semantic structure: they store raw sensor data and control commands as they arrive. RLDS adds episode boundaries, task labels, success flags, and a standardized step schema (observation, action, reward, discount). Converting a ROS bag to RLDS requires synchronizing message topics by timestamp, grouping them into steps, and nesting sensor data under the observation dictionary. RLDS datasets are 15-25% larger than equivalent ROS bags due to TFRecord framing overhead, but they load directly into TensorFlow training pipelines, and into PyTorch through a bridge library, without per-dataset parsing logic.

Can I use RLDS datasets with PyTorch?

Yes, via bridge libraries. The LeRobot library provides `RLDSDataset`, a PyTorch `Dataset` subclass that wraps RLDS TFRecords and exposes them through standard `__getitem__` indexing. This enables PyTorch `DataLoader` usage with multi-worker prefetching and GPU pinning. LeRobot handles frame stacking, action chunking, and normalization automatically. Performance is comparable to native PyTorch formats: LeRobot achieves 95% GPU utilization on 8×A100 nodes when training Diffusion Policy on the Open X-Embodiment dataset.

How do I convert my HDF5 robot dataset to RLDS?

The RLDS GitHub repository provides reference conversion scripts. The typical workflow: read HDF5 groups as episodes, read HDF5 datasets as timesteps, map your observation and action arrays to the RLDS step schema (nested dictionaries), add `is_first` and `is_last` flags at episode boundaries, and write TFRecords using the RLDS builder API. You must define a features dictionary specifying tensor shapes and dtypes for all observation and action keys. After conversion, run `rlds.validate_dataset()` to check schema conformance and detect NaN values. The Furniture Bench dataset provides a complete HDF5-to-RLDS reference implementation.

What metadata should I include in an RLDS dataset?

Minimum required metadata: dataset name, version, description, citation, splits (train/val/test), and features (tensor shapes/dtypes). Best practice adds robot-specific fields: embodiment (robot model), control frequency (Hz), camera names, action space type (continuous/discrete), task categories, collector demographics, known failure modes, and licensing terms. The Open X-Embodiment project and DROID dataset provide exemplary metadata. For commercial datasets, include data provenance fields: collector IDs, robot serial numbers, calibration timestamps, and annotation lineage to support regulatory audits.

Why is RLDS larger than my original dataset?

RLDS uses TFRecord serialization, which adds per-record framing (length prefix, CRC checksum, padding). This incurs 15-25% overhead compared to raw formats like HDF5 or MCAP. A 100 GB HDF5 dataset typically becomes 120 GB in RLDS. The overhead buys benefits: TFRecords support efficient sharding for distributed training, automatic checksumming for corruption detection, and native integration with TensorFlow's tf.data pipeline. For bandwidth-constrained scenarios, consider MCAP or Parquet as alternatives—both offer comparable training performance with lower storage overhead.

How does RLDS handle multi-robot datasets with different action spaces?

RLDS uses nested dictionaries for observations and actions, allowing per-robot keys. The Open X-Embodiment dataset contains 21 robot morphologies, each with unique observation keys (franka_wrist_rgb, stretch_head_depth) and action dimensions (7-DOF joint velocities vs 2-DOF mobile base commands). Training code injects embodiment tokens—learned embedding vectors added to the observation dictionary—so policies can condition on robot identity. Action normalization is applied per-dataset during training, not during RLDS creation. This preserves robot-native units in the dataset and enables flexible mixing strategies.

Find datasets covering RLDS

Truelabel surfaces vetted datasets and capture partners working with RLDS. Send the modality, scale, and rights you need and we route you to the closest match.
