
Delivery format

LeRobot format for robot training data

LeRobot format suits developer-friendly robot learning datasets and policy training pipelines. Define the episode metadata, observation tensors, action tensors, timestamps, and repo-compatible manifest before reviewing samples so you can verify that the delivery matches your training pipeline.

Updated 2026-05-04
By truelabel
Reviewed by truelabel

Quick facts

Origin: Hugging Face. LeRobot framework released 2024 under the Apache-2.0 license (github.com/huggingface/lerobot).
Datasets: 181 datasets in the public LeRobot collection on Hugging Face Hub.
Supported policies: 10 architectures (ACT, Diffusion, VQ-BeT, HIL-SERL, TDMPC, π0, π0.5, GR00T N1.5, SmolVLA, XVLA).
Format: Synchronized MP4 videos plus Parquet tables for state/action streams; LeRobotDataset v2.0 / v2.1.
Simulators: LIBERO and MetaWorld benchmarks supported.

Comparison

| Format choice | Strength | Risk |
| --- | --- | --- |
| LeRobot format | Developer-friendly robot learning datasets and policy training pipelines | Needs exact schema agreement before capture |
| Raw files | Fast supplier export | High buyer cleanup burden |
| Custom schema | Matches internal pipeline | Harder supplier onboarding |

What is LeRobot format?

Request LeRobot format when the buyer's training or evaluation pipeline already expects LeRobot-style datasets. Anchor the bounty to the canonical specification before suppliers submit samples [1], then use the implementation documentation to make the expected file layout reviewable [2]. Robotics teams should also name the dataset or paper lineage they expect suppliers to support [3].

"LeRobot aims to provide models, datasets, and tools for real-world robotics in PyTorch."

[1]

For truelabel buyers, that quote matters because it turns LeRobot format from a generic delivery preference into a source-backed requirement the supplier can test against a sample file.

Using LeRobot format with robot data

A useful LeRobot format sample should demonstrate the episode metadata, observation tensors, action tensors, timestamps, and repo-compatible manifest. It should also show the file naming, manifest completeness, timestamp behavior, and rejected-example traceability that the full delivery will follow. Include at least one workflow or converter reference so the supplier can show how the files load in practice [4], one interoperability reference for adjacent formats [5], and one comparison source for why this format is preferable to a raw folder dump [6].
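As a rough illustration of such a sample review, the sketch below checks a delivery manifest for declared features, file naming, and rejected-example traceability. The manifest layout, the key names (`episodes`, `files`, `status`, `reject_reason`), and the `episode_NNNNNN` naming pattern are assumptions made for this example, not the LeRobot specification; substitute the schema agreed in the bounty.

```python
import re

# Hypothetical requirements; replace with the schema agreed in the bounty.
REQUIRED_MANIFEST_KEYS = {"episodes", "fps", "features"}
REQUIRED_FEATURES = {"observation.state", "action", "timestamp"}
# Assumed naming convention, e.g. "episode_000012.parquet".
EPISODE_FILE_PATTERN = re.compile(r"^episode_\d{6}\.(parquet|mp4)$")


def review_sample_manifest(manifest: dict) -> list[str]:
    """Return human-readable problems; an empty list means the sample passes."""
    problems = []
    for key in sorted(REQUIRED_MANIFEST_KEYS - set(manifest)):
        problems.append(f"manifest is missing key: {key}")
    declared = set(manifest.get("features", []))
    for name in sorted(REQUIRED_FEATURES - declared):
        problems.append(f"manifest does not declare feature: {name}")
    for episode in manifest.get("episodes", []):
        for filename in episode.get("files", []):
            if not EPISODE_FILE_PATTERN.match(filename):
                problems.append(f"unexpected file name: {filename}")
        # Rejected episodes need a reason so reviewers can trace them.
        if episode.get("status") == "rejected" and not episode.get("reject_reason"):
            problems.append(
                f"rejected episode {episode.get('index')} has no reject_reason"
            )
    return problems
```

A supplier could run a check like this on the sample before submitting it, and the buyer could attach the returned list to the review notes.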


External references and source context

  1. LeRobot repository

    LeRobot provides robotics models, datasets, and tools for PyTorch workflows.

    GitHub
  2. LeRobot documentation

    Hugging Face publishes LeRobot documentation for robotics dataset workflows.

    Hugging Face
  3. LeRobot dataset documentation

    LeRobot dataset documentation defines dataset packaging expectations.

    Hugging Face
  4. LeRobot: State-of-the-art Machine Learning for Real-World Robotics in Pytorch

    The LeRobot paper frames the library as real-world robotics tooling.

    arXiv
  5. RLDS: Reinforcement Learning Datasets

    RLDS is a related robotics episode format for conversion planning.

    GitHub
  6. HDF5 1.14 documentation

    HDF5 is relevant to LeRobot-compatible robot episode storage.

    The HDF Group

FAQ

What is LeRobot format used for?

LeRobot format is used to package robot learning datasets in a developer-friendly layout that policy training pipelines can load directly.

What fields should LeRobot format delivery require?

At minimum, require episode metadata, observation tensors, action tensors, timestamps, and a repo-compatible manifest, plus a delivery manifest and validation notes.
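One way to make the required fields testable is to write them down as a small spec and check each frame against it. The sketch below is only an illustration: the `FieldSpec` type, the field names, and the six-value state and action vectors are assumptions for this example, not values taken from the LeRobot documentation.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class FieldSpec:
    name: str
    length: int  # expected number of values per frame (1 for scalars)


# Hypothetical requirement for a 6-DoF arm; adjust to the agreed schema.
REQUIRED_FIELDS = [
    FieldSpec("observation.state", 6),
    FieldSpec("action", 6),
    FieldSpec("timestamp", 1),
    FieldSpec("episode_index", 1),
]


def check_frame(frame: dict, spec: list[FieldSpec]) -> list[str]:
    """Return a list of errors for one frame; empty means the frame conforms."""
    errors = []
    for field in spec:
        if field.name not in frame:
            errors.append(f"missing field: {field.name}")
            continue
        value = frame[field.name]
        # Treat lists and tuples as vectors, everything else as a scalar.
        actual = len(value) if isinstance(value, (list, tuple)) else 1
        if actual != field.length:
            errors.append(
                f"{field.name}: expected {field.length} values, got {actual}"
            )
    return errors
```

Keeping the spec as data rather than prose lets the buyer and supplier review the same list before capture starts.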

Can suppliers convert into this format?

Some suppliers can deliver directly in the requested format; others may need conversion. Buyers should require a small sample before full delivery.
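For a supplier that needs conversion, the work is often a field-by-field mapping from the capture rig's export into the agreed names. The sketch below assumes a hypothetical vendor export with `t_ms`, `joints`, and `cmd` keys; those names and the output field names are illustrative assumptions, and a real converter would also handle video and metadata.

```python
def convert_vendor_rows(rows: list[dict], episode_index: int) -> list[dict]:
    """Map hypothetical vendor rows onto the field names agreed for delivery."""
    if not rows:
        return []
    # Sort by capture time so frame_index follows the recording order.
    ordered = sorted(rows, key=lambda row: row["t_ms"])
    start_ms = ordered[0]["t_ms"]
    frames = []
    for frame_index, row in enumerate(ordered):
        frames.append(
            {
                "episode_index": episode_index,
                "frame_index": frame_index,
                # Seconds since the first frame of the episode.
                "timestamp": (row["t_ms"] - start_ms) / 1000.0,
                "observation.state": list(row["joints"]),
                "action": list(row["cmd"]),
            }
        )
    return frames
```

Running a converter like this on the small sample first shows whether any field is missing from the raw capture before the full delivery is recorded.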

Should the format be decided before capture?

Yes. Deciding the format before capture prevents missing fields, timestamp drift, and expensive post-delivery cleanup.
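Timestamp drift is one of the easier problems to catch on a sample. The sketch below flags frames whose spacing from the previous frame deviates from the nominal frame interval; the function name and the default tolerance are assumptions chosen for illustration and should be set to whatever the training pipeline actually tolerates.

```python
def find_timestamp_gaps(
    timestamps: list[float], fps: float, tolerance_s: float = 1e-4
) -> list[int]:
    """Return indices whose gap to the previous frame is off by more than tolerance_s."""
    expected = 1.0 / fps  # nominal spacing between consecutive frames, in seconds
    bad = []
    for i in range(1, len(timestamps)):
        delta = timestamps[i] - timestamps[i - 1]
        if abs(delta - expected) > tolerance_s:
            bad.append(i)
    return bad


# Usage: a 10 fps episode where one frame arrives 50 ms late.
late_frames = find_timestamp_gaps([0.0, 0.1, 0.2, 0.35, 0.45], fps=10)
```

This checks only frame-to-frame spacing within one episode; slow cumulative drift between a camera clock and a robot clock would need a separate comparison of the two streams.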

Working with LeRobot format

Truelabel normalizes LeRobot format across capture partners so you can ingest one consistent schema instead of writing per-vendor adapters.
