MODALITY FACETS

Datasets by modality

Pick the modality your model architecture expects. Egocentric video, teleoperation traces, RGB-D, point clouds, and tactile signals each unlock different downstream tasks, and each carries a different consent and licensing posture by default.

DIRECT ANSWER

Modality is the most common starting point for buyer queries. The wrong modality forces a re-collection; the right modality with the wrong rights forces a re-licensing pass. These facets group datasets by what they actually ship at the sensor level.

30 datasets

RGB-D

Color video paired with depth maps or point clouds for scene geometry, object localization, and manipulation training.

Open X Embodiment
DROID
BridgeData V2

24 datasets

Proprioception

Robot state streams such as joint positions, velocities, gripper state, and end-effector poses.

Open X Embodiment
DROID
BridgeData V2

10 datasets

Teleoperation trajectories

Robot demonstrations recorded while a human controls the platform, often including action states, end-effector poses, and synchronized video.

Open X Embodiment
DROID
ALOHA

6 datasets

Point cloud

3D points from depth cameras, LiDAR, reconstruction, or simulation, used for scene geometry, mapping, object shape, and manipulation planning.

ManiSkill
ScanNet
Habitat datasets

4 datasets

Egocentric video

First-person video captured from an operator, wearer, or robot perspective, usually used to model action, attention, and hand-object interaction.

Ego4D
EPIC-KITCHENS
HOI4D

4 datasets

Motion capture

Optical or IMU-derived body and hand pose data used to reconstruct physical movement for training or evaluation.

Ego4D
HOI4D
DexYCB

2 datasets

Third-person video

External-camera video of people, objects, scenes, or tasks, usually used for action understanding and visual pretraining rather than robot state learning.

Something-Something V2
Kinetics

2 datasets

Tactile sensing

Force, pressure, taxel, or glove-derived signals used for contact-rich manipulation and dexterous control.

ObjectFolder
RH20T

CROSS-CATALOG

Pair with another facet

Combine this facet with a second filter (modality, task, robot, format, license, or commercial use) on the main dataset catalog to narrow the buyer decision faster.
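
As a rough illustration of what pairing two facets means, the sketch below filters a small in-memory list by modality and then by a second facet. The record shape (`DatasetRecord`), the `filter_catalog` helper, and the three placeholder entries are all hypothetical; the catalog's actual schema and query interface are not shown on this page.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class DatasetRecord:
    # Hypothetical record shape; the real catalog schema is not shown here.
    name: str
    modalities: frozenset
    license: str
    commercial_use: bool


def filter_catalog(records, modality, commercial_use=None, license=None):
    """Keep records that match the modality facet and any optional second facet."""
    matches = []
    for record in records:
        if modality not in record.modalities:
            continue
        if commercial_use is not None and record.commercial_use != commercial_use:
            continue
        if license is not None and record.license != license:
            continue
        matches.append(record)
    return matches


# Placeholder entries for illustration only; not real catalog data.
CATALOG = [
    DatasetRecord("example-a", frozenset({"rgb-d", "proprioception"}), "CC-BY-4.0", True),
    DatasetRecord("example-b", frozenset({"rgb-d"}), "research-only", False),
    DatasetRecord("example-c", frozenset({"tactile"}), "CC-BY-4.0", True),
]

rgbd_commercial = filter_catalog(CATALOG, "rgb-d", commercial_use=True)
print([r.name for r in rgbd_commercial])  # ['example-a']
```

Applying the modality filter first and the rights filter second mirrors the order described above: a modality mismatch rules a source out entirely, while a rights mismatch only narrows the shortlist.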

Other facet hubs

Modality
Task
Robot
Format
License
Commercial use

A dataset record is only useful when it connects into the rest of the buyer workflow. The next review step is usually not another summary; it is a fit check, rights triage, source comparison, or custom bounty spec that names the missing proof.

For physical AI teams, the hard question is whether a public source can support a specific model objective under real deployment constraints. Answering it requires adjacent dataset records, tools, comparisons, and sourcing paths, plus external references that a reviewer can open and challenge.

Use the links below to keep the review grounded. Start broad when discovery is incomplete, move into profile and comparison pages when the candidate source is known, and switch to custom collection when the blocker is rights, consent, geography, robot embodiment, or target environment coverage.

TRUELABEL ROUTING

Need a modality combination not in any one dataset?

If your model needs paired modalities (egocentric + tactile, RGB-D + proprioception) that no single public corpus delivers, commission a custom collection with synchronized capture.
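
To make the synchronization requirement concrete, here is a minimal sketch of one common offline approach: pairing each sample from one stream with the nearest-in-time sample from another and dropping pairs that fall outside a tolerance. This is an assumption for illustration, not a description of how any listed dataset or collection service aligns its streams; hardware-triggered capture would avoid the need for it. The function name, the sample data, and the 30 ms tolerance are all hypothetical.

```python
from bisect import bisect_left


def pair_by_timestamp(primary, secondary, tolerance_s):
    """Pair each primary sample with the nearest secondary sample in time.

    Both inputs are lists of (timestamp_seconds, payload) sorted by timestamp.
    Primary samples with no neighbor within tolerance_s are dropped.
    """
    secondary_times = [t for t, _ in secondary]
    pairs = []
    for t, payload in primary:
        i = bisect_left(secondary_times, t)
        # Candidates: the secondary samples just before and just after t.
        candidates = [j for j in (i - 1, i) if 0 <= j < len(secondary)]
        if not candidates:
            continue
        best = min(candidates, key=lambda j: abs(secondary_times[j] - t))
        if abs(secondary_times[best] - t) <= tolerance_s:
            pairs.append((payload, secondary[best][1]))
    return pairs


# Illustrative samples only: three video frames and three tactile readings.
video = [(0.00, "frame0"), (0.10, "frame1"), (0.20, "frame2")]
tactile = [(0.01, "touch0"), (0.12, "touch1"), (0.50, "touch2")]

print(pair_by_timestamp(video, tactile, tolerance_s=0.03))
# [('frame0', 'touch0'), ('frame1', 'touch1')]
```

The third frame is dropped because its nearest tactile reading is 80 ms away. Gaps like this are what surface when two separately recorded corpora are stitched together after the fact.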

Request paired-modality data