
Dataset profile

IPEC-COMMUNITY/droid_lerobot: Large-Scale Franka Teleoperation Dataset

IPEC-COMMUNITY/droid_lerobot is a large-scale robotics dataset comprising 92,233 episodes and over 27 million frames of Franka Panda manipulation teleoperation, released under the Apache-2.0 license. It follows the LeRobot v2.0 codebase structure, with 15 fps video and Parquet episode storage across 93 chunks, and covers 31,308 distinct tasks and 276,699 video files. Robotics teams building vision-language-action models or imitation-learning systems can use the corpus to train manipulation policies on real-world teleoperation traces, benefiting from both the commercially permissive license and a standardized format that loaders built for LeRobot datasets can read.

Updated 2026-06-11
By Truelabel Team
Reviewed by Truelabel Team

Quick facts

Scale: 92,233 episodes / 27M frames
License: Apache-2.0
Format: LeRobot v2.0 Parquet + video
Robot embodiment: Franka Panda
Frame rate: 15 fps
Commercial use: Permitted

Dataset composition and structure

IPEC-COMMUNITY/droid_lerobot contains 92,233 teleoperation episodes collected on a Franka Panda robotic arm, totaling more than 27 million frames distributed across 93 data chunks of approximately 1,000 episodes each. The dataset covers 31,308 unique manipulation tasks with 276,699 associated video files, all recorded at 15 frames per second and stored in the LeRobot v2.0 codebase format. Each episode is serialized as a Parquet file at a path of the form data/chunk-XXX/episode_XXXXXX.parquet, while the corresponding videos support visual verification and multi-modal training workflows. A single train split spans the entire collection, from episode 0 through 92,232, so teams that need validation or test data must implement their own cross-validation or hold-out strategy downstream. Keeping structured state-action records in Parquet and images in separate video assets serves the same goal as Open X-Embodiment and related large-scale robot-learning initiatives: a uniform layout that existing data loaders and transformation pipelines can consume. Because the dataset is Franka-only, joint configurations, end-effector poses, and gripper states follow kinematics and control conventions that are widely used in academic and industry manipulation research, which reduces integration overhead for teams already working with Franka hardware or simulators.
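The headline figures above can be cross-checked with a few lines of arithmetic. The sketch below uses only the numbers quoted in this profile; it treats "more than 27 million frames" as a lower bound and reads "approximately 1,000 episodes" per chunk as exactly 1,000, which is an assumption.

```python
import math

# Figures quoted in the profile above.
EPISODES = 92_233
FRAMES_LOWER_BOUND = 27_000_000  # "more than 27 million", so treat as a floor
VIDEO_FILES = 276_699
FPS = 15
EPISODES_PER_CHUNK = 1_000       # assumption: "approximately 1,000" read as exactly 1,000

# Derived quantities.
videos_per_episode = VIDEO_FILES / EPISODES
chunk_count = math.ceil(EPISODES / EPISODES_PER_CHUNK)
mean_frames_per_episode = FRAMES_LOWER_BOUND / EPISODES
mean_episode_seconds = mean_frames_per_episode / FPS
total_hours = FRAMES_LOWER_BOUND / FPS / 3600

print(f"video files per episode : {videos_per_episode:g}")
print(f"chunks needed           : {chunk_count}")
print(f"mean frames per episode : at least {mean_frames_per_episode:.1f}")
print(f"mean episode duration   : at least {mean_episode_seconds:.1f} s")
print(f"total recorded time     : at least {total_hours:.0f} h")
```

The video count divides evenly into three files per episode, which would be consistent with three camera streams per episode, although the profile does not state that. The chunk count reproduces the 93 chunks quoted above.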

Licensing and commercial deployment

Released under the Apache-2.0 license, IPEC-COMMUNITY/droid_lerobot grants both academic researchers and commercial entities broad rights to use, modify, and redistribute the data. Apache-2.0 is among the most permissive open-source licenses and explicitly allows derivative works in proprietary products, which makes this dataset a viable foundation for startups and enterprises building manipulation models for production environments. Unlike datasets governed by non-commercial or share-alike clauses, it lets teams fine-tune vision-language-action policies on droid_lerobot, ship those policies in commercial robot fleets, and keep control over model weights and downstream applications. The license's conditions apply on redistribution: include a copy of the license, retain existing copyright and attribution notices (including attribution to IPEC-COMMUNITY), and mark any files you have modified. For procurement and legal-review workflows, the Hugging Face repository includes the license identifier in machine-readable metadata, which streamlines the compliance audits and due-diligence processes that often gate enterprise adoption of open datasets.

Integration with LeRobot and modern robot-learning stacks

The dataset adheres to the LeRobot v2.0 specification, a standardized schema designed to unify heterogeneous robot datasets under a common Parquet-and-video layout that PyTorch and JAX dataloaders can consume without custom parsing logic. Each episode file exposes time-indexed state observations, action commands, and metadata fields in columnar Parquet format, while the accompanying videos are synchronized frame by frame to support end-to-end visuomotor policy training. Because LeRobot is seeing growing adoption alongside the OpenX and RT-X ecosystems, practitioners can add droid_lerobot to existing training harnesses with little boilerplate, reusing data augmentation, normalization, and batching utilities already tested on sibling datasets. The 15 fps capture rate trades temporal resolution for lower storage overhead and is in the range of control frequencies used by many manipulation policies, although very fast motions may be undersampled at this rate. Teams targeting cross-embodiment generalization can combine droid_lerobot with other Franka or non-Franka corpora in the LeRobot family, relying on the shared schema to train unified policies across hardware platforms. The chunked storage design, 93 chunks of roughly 1,000 episodes, also lends itself to distributed data loading and parallel preprocessing on multi-node clusters, a practical requirement when scaling imitation learning to tens of millions of frames.
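One simple way to exploit the chunked layout for parallel preprocessing is to hand out whole chunks to workers in round-robin order. The following is a minimal sketch of that idea, not part of LeRobot; the function name assign_chunks and the worker count of 8 are illustrative assumptions.

```python
def assign_chunks(num_chunks: int, num_workers: int) -> dict[int, list[int]]:
    """Distribute chunk indices over workers in round-robin order.

    Worker w receives chunks w, w + num_workers, w + 2 * num_workers, ...
    so per-worker chunk counts differ by at most one.
    """
    if num_workers < 1:
        raise ValueError("num_workers must be at least 1")
    return {
        worker: list(range(worker, num_chunks, num_workers))
        for worker in range(num_workers)
    }


# 93 chunks (from the profile) spread over a hypothetical 8 workers.
shards = assign_chunks(num_chunks=93, num_workers=8)
for worker, chunk_ids in shards.items():
    print(f"worker {worker}: {len(chunk_ids)} chunks, first={chunk_ids[0]}, last={chunk_ids[-1]}")
```

Assigning whole chunks keeps each worker reading from its own directories. Since the final chunk holds fewer episodes than the others, balancing by episode count rather than chunk count may be preferable for small worker counts.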

Limitations and sourcing considerations

While the dataset's scale and permissive license are significant strengths, procurement teams should note that modality metadata, such as the presence of depth maps, force-torque readings, or tactile signals, is not explicitly enumerated in the available documentation, so sensor coverage must be confirmed by inspecting sample episodes. The 31,308 task labels suggest broad task diversity, yet without a published task taxonomy or difficulty ranking, teams must allocate engineering effort to cluster or filter episodes by relevance to their target applications. The Franka embodiment limits direct policy transfer to platforms with similar kinematics; organizations deploying non-Franka arms will need to invest in action-space remapping or cross-embodiment training techniques to get value from this corpus. Additionally, the dataset was created using the DROID teleoperation framework and contributed by IPEC-COMMUNITY, but public documentation does not detail data-quality filters, operator expertise, or failure-mode prevalence, so pilot evaluations on held-out test scenarios are essential before committing to full-scale training runs. Finally, the reported 512,529 downloads indicate strong community adoption, but versioning and update schedules are controlled by the dataset authors; teams should monitor the Hugging Face repository for schema changes or supplementary releases that might require pipeline adjustments.
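The manual inspection step can be partly automated by scanning the column names of a sample episode file for modality keywords. The sketch below is a rough heuristic under stated assumptions: the keyword map and the example column names are hypothetical, and the real column names must be read from an actual episode file.

```python
# Hypothetical keyword map; adjust after looking at real column names.
MODALITY_KEYWORDS = {
    "rgb": ("image", "rgb"),
    "depth": ("depth",),
    "force_torque": ("force", "torque", "wrench"),
    "tactile": ("tactile",),
    "proprioception": ("state", "joint", "gripper"),
    "action": ("action",),
}


def summarize_modalities(column_names):
    """Group column names by the modality keywords they contain."""
    found = {modality: [] for modality in MODALITY_KEYWORDS}
    for name in column_names:
        lowered = name.lower()
        for modality, keywords in MODALITY_KEYWORDS.items():
            if any(keyword in lowered for keyword in keywords):
                found[modality].append(name)
    return found


# Hypothetical column names for illustration only, NOT read from the dataset.
example_columns = [
    "observation.state",
    "action",
    "observation.images.wrist",
    "timestamp",
    "episode_index",
]

report = summarize_modalities(example_columns)
missing = [modality for modality, columns in report.items() if not columns]
print("present:", {m: c for m, c in report.items() if c})
print("no matching columns:", missing)
```

In practice the column list could come from a downloaded episode, for example via pyarrow.parquet.read_schema(path).names. Substring matching can misclassify columns (a joint-torque field would count as force-torque here), so the report is a starting point for review rather than a verdict.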


FAQ

What is IPEC-COMMUNITY/droid_lerobot and what does it contain?

IPEC-COMMUNITY/droid_lerobot is a large-scale robotics dataset comprising 92,233 teleoperation episodes and over 27 million frames collected from a Franka Panda robotic arm. The dataset captures 31,308 distinct manipulation tasks and includes 276,699 video files recorded at 15 frames per second, all packaged in the LeRobot v2.0 format with Parquet episode files and synchronized video assets. It was created using the DROID teleoperation infrastructure and is distributed across 93 data chunks for efficient loading and parallel processing in modern robot-learning pipelines.

What license governs this dataset and can I use it commercially?

The dataset is released under the Apache-2.0 license, one of the most permissive open-source licenses available. Apache-2.0 explicitly permits commercial use, modification, and redistribution with few restrictions: you are free to train proprietary manipulation policies on this data, deploy those models in commercial robot fleets, and keep control of the derived model weights. When redistributing the dataset or works that directly incorporate its contents, you must include a copy of the license, retain existing copyright and attribution notices (including attribution to IPEC-COMMUNITY), and mark any modified files, which keeps compliance straightforward for enterprise legal and procurement workflows.

Who should use IPEC-COMMUNITY/droid_lerobot for their robotics projects?

This dataset is ideal for robotics teams building vision-language-action models, imitation-learning systems, or world models that require large-scale real-world manipulation data from a widely adopted embodiment. Organizations deploying Franka Panda arms will benefit most directly, as the kinematic and control conventions align with their hardware, reducing integration overhead. Researchers exploring cross-embodiment generalization can combine droid_lerobot with other LeRobot-format corpora to train unified policies. Startups and enterprises that need commercially permissive training data for production deployments will find the Apache-2.0 license particularly valuable, avoiding the restrictions that accompany non-commercial or copyleft datasets.

When is this dataset NOT the right choice for my robot-learning pipeline?

Teams deploying non-Franka embodiments, such as UR-series arms, mobile manipulators, or custom kinematic chains, will face action-space mismatches that require remapping or cross-embodiment training techniques, which reduces the dataset's immediate value. If your application demands specific sensor modalities like high-resolution depth, force-torque feedback, or tactile arrays, you should first inspect sample episodes to confirm their presence, as modality metadata is not exhaustively documented. Organizations requiring fine-grained task taxonomies, difficulty labels, or annotated failure modes will need to invest engineering effort in clustering and filtering the 31,308 task labels, since the dataset does not ship with a structured task hierarchy. Finally, teams with strict data-provenance or quality-assurance requirements may find the public details on operator expertise and failure-rate filtering insufficient for mission-critical deployments unless they conduct their own validation studies.

How do I integrate droid_lerobot into my existing training infrastructure?

The dataset uses the LeRobot v2.0 specification, which means episode data is stored as Parquet files with a standardized schema and synchronized video assets that PyTorch and JAX dataloaders supporting the LeRobot format can consume directly. You can clone or download the repository from Hugging Face, then point your dataloader at the chunk directories following the pattern data/chunk-XXX/episode_XXXXXX.parquet. Because the schema matches other datasets published in the LeRobot format, existing normalization, augmentation, and batching utilities can be reused with minimal modification. The 93-chunk structure supports distributed loading across multi-node clusters, and the 15 fps frame rate is in the range of typical manipulation control frequencies, which can reduce the need for temporal resampling or interpolation in your preprocessing pipeline.
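As a minimal sketch of how the path pattern above might be resolved for a given episode, the helper below assumes that each chunk holds exactly 1,000 consecutive episodes and that the X placeholders denote zero-padded widths of 3 and 6 digits. Both are inferences from this profile; the function name episode_parquet_path is illustrative and not a LeRobot API.

```python
def episode_parquet_path(episode_index: int, chunk_size: int = 1000) -> str:
    """Build a relative Parquet path of the form data/chunk-XXX/episode_XXXXXX.parquet.

    Assumptions (not confirmed by the profile): episodes are grouped into
    chunks of `chunk_size` consecutive indices, and both numbers are
    zero-padded to the widths suggested by the X placeholders.
    """
    if episode_index < 0:
        raise ValueError("episode_index must be non-negative")
    chunk_index = episode_index // chunk_size
    return f"data/chunk-{chunk_index:03d}/episode_{episode_index:06d}.parquet"


# First and last episodes of the train split (0 through 92,232).
first_path = episode_parquet_path(0)
last_path = episode_parquet_path(92_232)
print(first_path)
print(last_path)
```

If the repository ships its own path template or chunk size in its metadata, that should take precedence over these hard-coded defaults.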

What are the main limitations or gaps I should plan for when sourcing this dataset?

The primary gap is incomplete modality documentation: while the dataset includes video and episode Parquet files, the exact sensor suite (RGB, depth, proprioception, force-torque) must be confirmed by inspecting sample data rather than relying on metadata alone. The 31,308 task labels suggest diversity, but without a published taxonomy or difficulty ranking, teams must allocate engineering resources to categorize and filter episodes by relevance. The Franka-only embodiment limits out-of-the-box applicability for organizations using different robot hardware, which then need cross-embodiment techniques or action-space transformations. Additionally, public information does not detail data-quality filters, operator protocols, or failure-mode prevalence, so pilot studies on held-out test scenarios are essential to validate dataset suitability before scaling training. Finally, versioning and updates are controlled by the dataset authors on Hugging Face, so teams should monitor the repository for schema changes that might require pipeline adjustments.
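Because the dataset ships as a single train split, a pilot study needs its own held-out set. One possible approach, sketched below, hashes each episode index so that membership is deterministic and independent of processing order. The 10% fraction and the salt string are arbitrary illustrative choices, and splitting by episode index alone does not guarantee that held-out tasks or scenes are unseen in training.

```python
import hashlib

TOTAL_EPISODES = 92_233  # episodes 0 through 92,232, per the profile


def is_holdout(episode_index: int, holdout_fraction: float = 0.1,
               salt: str = "pilot-v1") -> bool:
    """Deterministically decide whether an episode belongs to the hold-out set.

    The salted episode index is hashed with SHA-256 and the first 8 bytes
    are mapped to a number in [0, 1); episodes below the threshold are held out.
    """
    digest = hashlib.sha256(f"{salt}:{episode_index}".encode()).digest()
    bucket = int.from_bytes(digest[:8], "big") / 2**64
    return bucket < holdout_fraction


holdout = [i for i in range(TOTAL_EPISODES) if is_holdout(i)]
train = [i for i in range(TOTAL_EPISODES) if not is_holdout(i)]
print(f"train episodes  : {len(train)}")
print(f"holdout episodes: {len(holdout)} ({len(holdout) / TOTAL_EPISODES:.1%})")
```

Changing the salt yields a different but equally reproducible split, which is convenient for repeated pilot evaluations.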
