Delivery format
LeRobot format for robot training data
LeRobot format suits developer-friendly robot learning datasets and policy training pipelines. Define the episode metadata, observation tensors, action tensors, timestamps, and repo-compatible manifest before reviewing samples so you can verify that each delivery matches the training pipeline.
Quick facts
- Origin: Hugging Face. LeRobot framework released 2024, Apache-2.0 license, github.com/huggingface/lerobot.
- Datasets: 181 datasets in the public LeRobot collection on Hugging Face Hub.
- Supported policies: 10 architectures (ACT, Diffusion, VQ-BeT, HIL-SERL, TDMPC, π0, π0.5, GR00T N1.5, SmolVLA, XVLA).
- Format: synchronized MP4 videos plus Parquet tables for state/action streams; LeRobotDataset v2.0 / v2.1.
- Simulators: LIBERO and MetaWorld benchmarks supported.
Comparison
| Format choice | Strength | Risk |
|---|---|---|
| LeRobot format | Developer-friendly robot learning datasets and policy training pipelines | Needs exact schema agreement before capture |
| Raw files | Fast supplier export | High buyer cleanup burden |
| Custom schema | Matches internal pipeline | Harder supplier onboarding |
What is LeRobot format?
Request LeRobot format when the buyer's training or evaluation pipeline already loads datasets in the LeRobot layout. Anchor the bounty to the canonical specification before suppliers submit samples [1], then use implementation documentation to make the expected file layout reviewable [2]. Robotics teams should also name the dataset or paper lineage they expect suppliers to support [3].
[1] "LeRobot aims to provide models, datasets, and tools for real-world robotics in PyTorch."
For truelabel buyers, that quote matters because it turns LeRobot format from a generic delivery preference into a source-backed requirement the supplier can test against a sample file.
Using LeRobot format with robot data
A useful LeRobot format sample should prove that episode metadata, observation tensors, action tensors, timestamps, and a repo-compatible manifest are all present, and should also demonstrate file naming, manifest completeness, timestamp behavior, and rejected-example traceability. Include at least one workflow or converter reference so the supplier can show how the files load in practice [4], one interoperability reference for adjacent formats [5], and one comparison source for why this format is preferable to a raw folder dump [6].
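The sample checks named above can be sketched as a small validator. This is a minimal sketch, not LeRobot's own validation code: the column names (`observation.state`, `action`, `timestamp`, `episode_index`, `frame_index`) are assumptions modeled on conventions commonly seen in LeRobotDataset v2.x Parquet tables, and `validate_episode` is a hypothetical helper. Confirm the exact schema against the dataset documentation before relying on it.

```python
from typing import Any

# Assumed per-frame columns; adjust to the schema agreed with the supplier.
REQUIRED_COLUMNS = (
    "observation.state",
    "action",
    "timestamp",
    "episode_index",
    "frame_index",
)


def validate_episode(frames: list[dict[str, Any]]) -> list[str]:
    """Return human-readable problems found in one episode's frames."""
    if not frames:
        return ["episode has no frames"]

    problems: list[str] = []
    for i, frame in enumerate(frames):
        missing = [c for c in REQUIRED_COLUMNS if c not in frame]
        if missing:
            problems.append(f"frame {i}: missing columns {missing}")

    # Cross-frame checks only make sense once every column is present.
    if problems:
        return problems

    episode_ids = {f["episode_index"] for f in frames}
    if len(episode_ids) != 1:
        problems.append(f"mixed episode_index values: {sorted(episode_ids)}")

    for column in ("observation.state", "action"):
        dims = {len(f[column]) for f in frames}
        if len(dims) != 1:
            problems.append(f"inconsistent {column} dimensions: {sorted(dims)}")

    timestamps = [f["timestamp"] for f in frames]
    if any(b <= a for a, b in zip(timestamps, timestamps[1:])):
        problems.append("timestamps are not strictly increasing")

    return problems
```

An empty list means the sample passed these checks; any returned strings can go straight into the validation notes sent back to the supplier.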
External references and source context
[1] LeRobot repository (GitHub). LeRobot provides robotics models, datasets, and tools for PyTorch workflows.
[2] LeRobot documentation (Hugging Face). Hugging Face publishes LeRobot documentation for robotics dataset workflows.
[3] LeRobot dataset documentation (Hugging Face). LeRobot dataset documentation defines dataset packaging expectations.
[4] LeRobot: State-of-the-art Machine Learning for Real-World Robotics in Pytorch (arXiv). The LeRobot paper frames the library as real-world robotics tooling.
[5] RLDS: Reinforcement Learning Datasets (GitHub). RLDS is a related robotics episode format for conversion planning.
[6] HDF5 1.14 documentation (The HDF Group). HDF5 is relevant to LeRobot-compatible robot episode storage.
FAQ
What is LeRobot format used for?
LeRobot format is used to package robot learning datasets in a developer-friendly layout and to feed policy training pipelines.
What fields should LeRobot format delivery require?
At minimum, require episode metadata, observation tensors, action tensors, timestamps, and a repo-compatible manifest, plus a delivery manifest and validation notes.
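One way to make the delivery manifest reviewable is to compare it against the files that actually arrived. The sketch below assumes a hypothetical manifest shape, a mapping from relative file path to SHA-256 hex digest; neither that shape nor the `check_manifest` helper comes from LeRobot itself, so treat both as illustrations to adapt.

```python
import hashlib
from pathlib import Path


def sha256_of(path: Path) -> str:
    """Hash a file in 1 MiB chunks so large videos do not load into memory."""
    digest = hashlib.sha256()
    with path.open("rb") as fh:
        for chunk in iter(lambda: fh.read(1 << 20), b""):
            digest.update(chunk)
    return digest.hexdigest()


def check_manifest(root: Path, manifest: dict[str, str]) -> dict[str, list[str]]:
    """Compare an assumed {relative_path: sha256} manifest with files under root."""
    on_disk = {p.relative_to(root).as_posix() for p in root.rglob("*") if p.is_file()}
    listed = set(manifest)
    mismatched = sorted(
        rel for rel in listed & on_disk if sha256_of(root / rel) != manifest[rel]
    )
    return {
        "missing": sorted(listed - on_disk),    # listed but not delivered
        "unlisted": sorted(on_disk - listed),   # delivered but not listed
        "mismatched": mismatched,               # delivered with a different hash
    }
```

A delivery with three empty lists is complete relative to its manifest. If the manifest file itself is stored under `root`, it will appear as unlisted unless it lists itself or is excluded before the comparison.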
Can suppliers convert into this format?
Some suppliers can deliver directly in the requested format; others may need conversion. Buyers should require a small sample before full delivery.
Should the format be decided before capture?
Yes. Deciding the format before capture prevents missing fields, timestamp drift, and expensive post-delivery cleanup.
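Timestamp drift in particular is easy to test on a small sample. The following is a minimal sketch that assumes the capture is meant to run at a fixed frame rate; the function names and the default `tolerance_s` value are hypothetical choices for illustration and should be replaced by whatever tolerance the buyer and supplier agree on.

```python
def max_timestamp_drift(timestamps: list[float], fps: float) -> float:
    """Largest absolute deviation, in seconds, from an ideal fixed-rate clock.

    The ideal clock starts at the first timestamp and ticks every 1/fps seconds.
    """
    if fps <= 0:
        raise ValueError("fps must be positive")
    if not timestamps:
        return 0.0
    start = timestamps[0]
    return max(abs(t - (start + i / fps)) for i, t in enumerate(timestamps))


def within_tolerance(
    timestamps: list[float], fps: float, tolerance_s: float = 1e-4
) -> bool:
    """True when no frame deviates from the ideal clock by more than tolerance_s."""
    return max_timestamp_drift(timestamps, fps) <= tolerance_s
```

Running this per episode on the sample delivery shows whether frames were dropped or the recorder clock ran fast or slow before the full capture begins.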
Working with LeRobot format
Truelabel normalizes LeRobot format across capture partners so you can ingest one consistent schema instead of writing per-vendor adapters.
Request LeRobot format data