RoboMimic
A benchmark and dataset framework for robot imitation learning with standardized tasks and evaluation utilities.
COMMERCIAL-USE FACETS
The license tag is a first-pass signal. truelabel's commercial-use field captures the conservative buyer-grade interpretation — combining license, consent posture, contributor terms, and downstream model-use restrictions — so teams know which datasets are commercial-ready before they invest in training.
DIRECT ANSWER
Commercial-use status is a buyer-grade verdict, not a restatement of the license. A dataset can ship under an Apache-2.0 code license yet still carry contributor or environment footage with non-commercial restrictions inherited from the capture process. truelabel’s status is one of: allowed, restricted, unclear, or research-only.
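The verdict described above behaves like an ordered status: the combined signal across license, consent, contributor terms, and model-use restrictions resolves to the most conservative facet. A minimal sketch, assuming a hypothetical severity ordering and field names (truelabel's actual schema is not shown on this page):

```python
from enum import Enum

class CommercialUse(Enum):
    ALLOWED = "allowed"
    UNCLEAR = "unclear"
    RESTRICTED = "restricted"
    RESEARCH_ONLY = "research-only"

# Hypothetical ordering: higher number = more restrictive for a
# commercial buyer. research-only is treated as the hardest stop.
SEVERITY = {
    CommercialUse.ALLOWED: 0,
    CommercialUse.UNCLEAR: 1,
    CommercialUse.RESTRICTED: 2,
    CommercialUse.RESEARCH_ONLY: 3,
}

def combine(signals):
    """Resolve per-facet signals (license, consent, contributor
    terms, model-use restrictions) to the most conservative verdict."""
    return max(signals, key=SEVERITY.get)

# A permissive code license plus unclear consent resolves to "unclear".
verdict = combine([CommercialUse.ALLOWED, CommercialUse.UNCLEAR])
print(verdict.value)  # unclear
```

The design choice mirrors the page's framing: a single permissive signal (the license tag) never upgrades the verdict, because the buyer-grade interpretation is bounded by the weakest rights link.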
SOURCE APPEARS PERMISSIVE; VERIFY DATA TERMS · 11 DATASETS
Source appears permissive — verify data terms
A benchmark and dataset framework for robot imitation learning with standardized tasks and evaluation utilities.
A simulated manipulation benchmark for multi-task and meta-reinforcement learning.
A simulation benchmark and toolkit for manipulation skills and embodied AI policy evaluation.
An interactive simulated environment for embodied AI agents in household-like scenes.
A simulation framework and benchmark suite for robot manipulation tasks.
A large cross-institution collection of robot demonstrations spanning many embodiments and manipulation tasks.
A real-world robot manipulation dataset focused on diverse teleoperated demonstrations outside narrow lab-only settings.
A robot manipulation dataset from Berkeley focused on real-world behavior cloning and task generalization.
A robotics transformer data release associated with language-conditioned robot manipulation research.
A low-cost bimanual teleoperation platform and dataset family used for imitation learning in dexterous manipulation.
A multi-robot dataset for visual foresight and manipulation policy research.
COMMERCIAL USE RESTRICTED · 6 DATASETS
Commercial use restricted by license or consent
A large-scale egocentric video dataset focused on first-person human activity understanding.
An egocentric video dataset of kitchen activities used for action recognition and human-object interaction research.
An indoor RGB-D reconstruction dataset used for 3D scene understanding.
A human action video dataset focused on object interactions and temporal reasoning.
A large autonomous driving dataset with camera, LiDAR, and labeled traffic scenes.
A large video action recognition dataset used widely for video model pretraining.
RESEARCH PATHS
A dataset record is only useful when it connects to the rest of the buyer workflow. The next review step is usually not another summary; it is a fit check, rights triage, source comparison, or custom bounty spec that names the missing proof.
For physical AI teams, the hard question is whether the public source can support a specific model objective under real deployment constraints. That requires adjacent dataset records, tools, comparisons, and sourcing paths, plus external references that a reviewer can open and challenge.
Use the links below to keep the review grounded. Start broad when discovery is incomplete, move into profile and comparison pages when the candidate source is known, and switch to custom collection when the blocker is rights, consent, geography, robot embodiment, or target environment coverage.
INTERNAL LINKS
Use the catalog to compare source-backed dataset profiles by modality, task, rights signal, consent risk, and deployment fit.
Scan the broader robotics dataset surface before narrowing into promoted profiles, comparisons, and custom collection specs.
Track source updates, licensing notes, and buyer-readiness changes that should trigger a renewed review.
Score whether a public source is enough for the model, rights path, modalities, and target environment.
Separate source license language from contributor consent, redistribution, private-space risk, and model-use assumptions.
Turn a public-source gap into a scoped capture request with sample QA, metadata, and delivery requirements.
Compare data providers when the answer is not another public dataset but a better sourcing or capture route.
Use the company index to separate annotation vendors, data engines, marketplaces, and specialist capture teams.
EXTERNAL REFERENCES
Market context for why physical AI systems need custom, enriched, real-world data beyond generic labeling workflows.
Robotics dataset and tooling context for Hugging Face based collection, sharing, conversion, and training workflows.
A cross-embodiment robotics dataset reference for comparing trajectory scale, robot diversity, and VLA training assumptions.
A large in-the-wild robot manipulation dataset reference for real-world trajectory capture and deployment transfer risk.
TRUELABEL ROUTING
If the catalog can't surface a commercial-use-allowed dataset for your task, commission custom data with explicit commercial-training terms, signed contributor consent, and per-batch QA gates.