RoboMimic
A benchmark and dataset framework for robot imitation learning with standardized tasks and evaluation utilities.
COMMERCIAL-USE FACETS
The license tag is a first-pass signal. truelabel's commercial-use field captures the conservative buyer-grade interpretation — combining license, consent posture, contributor terms, and downstream model-use restrictions — so teams know which datasets are commercial-ready before they invest in training.
DIRECT ANSWER
Commercial-use status is a buyer-grade verdict, not a restatement of the license. A dataset can ship under an Apache-2.0 code license yet still carry contributor or environment footage with non-commercial restrictions inherited from the capture process. truelabel’s status is one of: allowed, restricted, unclear, or research-only.
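The verdict described above behaves like an ordered status: the combined signal across license, consent, contributor terms, and model-use restrictions resolves to the most conservative facet. A minimal sketch, assuming a hypothetical severity ordering and field names (truelabel's actual schema is not shown on this page):

```python
from enum import Enum

class CommercialUse(Enum):
    ALLOWED = "allowed"
    UNCLEAR = "unclear"
    RESTRICTED = "restricted"
    RESEARCH_ONLY = "research-only"

# Hypothetical ordering: higher number = more restrictive for a
# commercial buyer. research-only is treated as the hardest stop.
SEVERITY = {
    CommercialUse.ALLOWED: 0,
    CommercialUse.UNCLEAR: 1,
    CommercialUse.RESTRICTED: 2,
    CommercialUse.RESEARCH_ONLY: 3,
}

def combine(signals):
    """Resolve per-facet signals (license, consent, contributor
    terms, model-use restrictions) to the most conservative verdict."""
    return max(signals, key=SEVERITY.get)

# A permissive code license plus unclear consent resolves to "unclear".
verdict = combine([CommercialUse.ALLOWED, CommercialUse.UNCLEAR])
print(verdict.value)  # unclear
```

The design choice mirrors the page's framing: a single permissive signal (the license tag) never upgrades the verdict, because the buyer-grade interpretation is bounded by the weakest rights link.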
SOURCE APPEARS PERMISSIVE; VERIFY DATA TERMS · 11 DATASETS
Source appears permissive — verify data terms
A benchmark and dataset framework for robot imitation learning with standardized tasks and evaluation utilities.
A simulated manipulation benchmark for multi-task and meta-reinforcement learning.
A simulation benchmark and toolkit for manipulation skills and embodied AI policy evaluation.
An interactive simulated environment for embodied AI agents in household-like scenes.
A simulation framework and benchmark suite for robot manipulation tasks.
A large cross-institution collection of robot demonstrations spanning many embodiments and manipulation tasks.
A real-world robot manipulation dataset focused on diverse teleoperated demonstrations outside narrow lab-only settings.
A robot manipulation dataset from Berkeley focused on real-world behavior cloning and task generalization.
A robotics transformer data release associated with language-conditioned robot manipulation research.
A low-cost bimanual teleoperation platform and dataset family used for imitation learning in dexterous manipulation.
A multi-robot dataset for visual foresight and manipulation policy research.
COMMERCIAL USE RESTRICTED · 6 DATASETS
Commercial use restricted by license or consent
A large-scale egocentric video dataset focused on first-person human activity understanding.
An egocentric video dataset of kitchen activities used for action recognition and human-object interaction research.
An indoor RGB-D reconstruction dataset used for 3D scene understanding.
A human action video dataset focused on object interactions and temporal reasoning.
A large autonomous driving dataset with camera, LiDAR, and labeled traffic scenes.
A large video action recognition dataset used widely for video model pretraining.
RESEARCH PATHS
A dataset record is only useful when it connects to the rest of the buyer workflow. The next review step is usually not another summary; it is a fit check, rights triage, source comparison, or custom bounty spec that names the missing proof.
For physical AI teams, the hard question is whether the public source can support a specific model objective under real deployment constraints. That requires adjacent dataset records, tools, comparisons, and sourcing paths, plus external references that a reviewer can open and challenge.
Use the links below to keep the review grounded. Start broad when discovery is incomplete, move into profile and comparison pages when the candidate source is known, and switch to custom collection when the blocker is rights, consent, geography, robot embodiment, or target environment coverage.
INTERNAL LINKS
Use the catalog to compare source-backed dataset profiles by modality, task, rights signal, consent risk, and deployment fit.
Scan the broader robotics dataset surface before narrowing into promoted profiles, comparisons, and custom collection specs.
Track source updates, licensing notes, and buyer-readiness changes that should trigger a renewed review.
Score whether a public source is enough for the model, rights path, modalities, and target environment.
Separate source license language from contributor consent, redistribution, private-space risk, and model-use assumptions.
Turn a public-source gap into a scoped capture request with sample QA, metadata, and delivery requirements.
Compare data providers when the answer is not another public dataset but a better sourcing or capture route.
Use the company index to separate annotation vendors, data engines, marketplaces, and specialist capture teams.
EXTERNAL REFERENCES
Market context for why physical AI systems need custom, enriched, real-world data beyond generic labeling workflows.
Robotics dataset and tooling context for Hugging Face based collection, sharing, conversion, and training workflows.
A cross-embodiment robotics dataset reference for comparing trajectory scale, robot diversity, and VLA training assumptions.
A large in-the-wild robot manipulation dataset reference for real-world trajectory capture and deployment transfer risk.
TRUELABEL ROUTING
If the catalog can't surface a commercial-use-allowed dataset for your task, commission custom data with explicit commercial-training terms, signed contributor consent, and per-batch QA gates.