Sourcing reference
Robotics data sourcing specs
When the public catalog doesn't have what you need, these pages show what to specify in a sourcing request: modality, environment, metadata, scale, license, and acceptance criteria. Each page maps the gap between what public datasets ship and what a deployment-ready buyer needs.
How to use this hub
Start here when you know the broad category but haven't nailed the exact bounty spec yet. Each linked page narrows the request into a concrete data shape: modality, task, environment, metadata, rights, consent, delivery format, and sample QA. That structure is what turns a vague physical AI data need into something a supplier can prove or reject with evidence.
The hub isn't meant to be the last page you read. It should hand off to a detail page where the specific intent is answered with sample specs, comparison tables, proof requirements, and external source context.
The 10 sourcing pages
Sourcing egocentric kitchen video
An egocentric kitchen video dataset is useful for household robotics, VLA, and world-model teams. When sourcing it, specify first-person video of cooking, cleaning, object handling, and cabinet interaction, captured in residential kitchens with varied layouts and appliances, with metadata for task label, appliance type, object set, consent artifact, and clip boundary so supplier samples can be reviewed before adoption.
Sourcing egocentric warehouse video
An egocentric warehouse video dataset is useful for robotics teams modeling logistics and manipulation tasks. When sourcing it, specify head-mounted video with optional hand pose, captured during warehouse picking, packing, sorting, and staging, with metadata for SKU family, task phase, region, contributor consent, and camera intrinsics so supplier samples can be reviewed before adoption.
Sourcing egocentric workshop video
An egocentric workshop video dataset is useful for teams building tool-use and dexterous manipulation models. When sourcing it, specify wearable-camera video of tool use, fastening, repair, and assembly, captured in workshops, garages, labs, and maker spaces, with metadata for tool class, material, action phase, safety notes, and contributor consent so supplier samples can be reviewed before adoption.
Sourcing industrial egocentric video
An industrial egocentric video dataset is useful for industrial robotics and vision-language-action teams. When sourcing it, specify first-person video of factory, maintenance, inspection, and logistics work, captured in industrial facilities, warehouses, field-service routes, and workcells, with metadata for site permission, task type, safety constraints, equipment class, and consent so supplier samples can be reviewed before adoption.
Sourcing mocap human demonstrations
A mocap human demonstration dataset is useful for humanoid and dexterous manipulation teams needing motion priors. When sourcing it, specify body, hand, and object motion capture with video reference, captured in studio or controlled-workspace task sessions, with metadata for skeleton schema, marker set, task segment, object contact, and performer consent so supplier samples can be reviewed before adoption.
Sourcing multi-view manipulation
A multi-view manipulation dataset is useful for teams needing scene context and first-person interaction in one dataset. When sourcing it, specify egocentric, wrist, and third-person video captured together during tabletop, bin, shelf, and workbench manipulation, with metadata for camera calibration, sync offset, object set, task phase, and consent artifact so supplier samples can be reviewed before adoption.
Sourcing rgbd manipulation
An RGBD robot manipulation dataset is useful for teams that need geometry-aware manipulation data. When sourcing it, specify RGB video plus depth, pose, and object-state metadata, captured in tabletop, bin-picking, shelf, and small-parts workflows, with depth calibration, camera intrinsics, object IDs, and task outcomes so supplier samples can be reviewed before adoption.
Sourcing tactile glove data
A tactile glove dataset is useful for teams working on contact-rich manipulation. When sourcing it, specify per-finger force or tactile signals with synchronized video, captured during grasping, tool-use, insertion, fabric, and deformable-object tasks, with metadata for sensor layout, calibration, contact phase, object class, and task outcome so supplier samples can be reviewed before adoption.
Sourcing teleop kitchen data
A teleoperation kitchen dataset is useful for home robotics teams validating household manipulation behavior. When sourcing it, specify robot teleoperation traces with wrist or external video, captured around kitchen counters, cabinets, drawers, appliances, and utensils, with metadata for task instruction, object set, robot state, success flag, and episode boundary so supplier samples can be reviewed before adoption.
Sourcing teleop warehouse data
A teleoperation warehouse dataset is useful for teams training warehouse manipulation and logistics policies. When sourcing it, specify robot camera streams plus state, action, and task-outcome traces, captured in warehouse pick, place, tote-transfer, and shelf-interaction workcells, with metadata for robot embodiment, object class, success flag, timestamp sync, and format manifest so supplier samples can be reviewed before adoption.
Procurement questions before posting a bounty
- What exact model behavior or evaluation question should this data improve?
- Which modality, camera viewpoint, robot state, or metadata stream is required?
- What evidence proves the supplier has rights, consent, and provenance?
- Which delivery format must the sample open in before scale-up?
- What specific failure reasons should cause sample rejection?
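The answers to these questions can be pinned down as a structured spec before the bounty is posted. A minimal sketch, assuming a hypothetical field layout (these names are illustrative, not a published truelabel schema):

```python
# Hypothetical sketch: the procurement questions answered as a structured
# bounty spec. Field names and values are illustrative assumptions.
from dataclasses import dataclass, field

@dataclass
class BountySpec:
    objective: str                  # model behavior or evaluation question to improve
    modalities: list[str]           # required streams: viewpoint, robot state, metadata
    required_metadata: list[str]    # per-clip fields every sample must carry
    rights_evidence: list[str]      # proof of rights, consent, and provenance
    delivery_format: str            # format the sample must open in before scale-up
    rejection_reasons: list[str] = field(default_factory=list)  # named failure causes

spec = BountySpec(
    objective="improve grasp success on cluttered kitchen counters",
    modalities=["egocentric_rgb", "wrist_rgb"],
    required_metadata=["task_label", "object_set", "clip_boundary"],
    rights_evidence=["contributor_consent_form", "capture_location_permission"],
    delivery_format="mp4 + json manifest",
    rejection_reasons=["missing consent artifact", "unsynced streams"],
)
```

Writing the spec this way makes the rejection criteria explicit up front, so a supplier can self-check a sample against the same fields the reviewer will use.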
Quality gate before a page becomes a deal spec
A page in this hub should not be treated as a finished procurement document by itself. It is a starting point for a bounty. Before a buyer funds capture or licenses off-the-shelf data, the page needs to become a short operating spec: accepted examples, rejected examples, file format, metadata fields, consent requirements, delivery location, and a named reviewer who can approve the sample.
The practical test is simple: if two suppliers read the same detail page, would they submit comparable samples? If not, the buyer needs to narrow the request into a more specific bounty. The strongest truelabel references help with that narrowing by linking from broad hubs into task pages, dataset profiles, format guides, glossary definitions, and public dataset alternatives.
| Gate | Question | Pass signal |
|---|---|---|
| Intent | What model behavior does the data improve? | The objective is tied to a task, benchmark, or evaluation gap. |
| Evidence | What proves a supplier can deliver? | A sample package includes files, manifest, rights, and QA notes. |
| Ingestion | Can the buyer load the sample? | The sample opens in the expected format or converter. |
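The evidence and ingestion gates can be partly automated. A minimal sketch, assuming a simple JSON manifest layout (the key names are assumptions, not a published truelabel format):

```python
# Hypothetical sketch of the evidence and ingestion gates as a
# sample-package check. The manifest layout is an illustrative assumption.
import json

REQUIRED_MANIFEST_KEYS = {"files", "rights", "metadata_fields", "qa_notes"}

def ingestion_gate(manifest_text: str, expected_fields: set[str]) -> list[str]:
    """Return a list of failure reasons; an empty list means the sample passes."""
    failures = []
    try:
        manifest = json.loads(manifest_text)
    except json.JSONDecodeError:
        return ["manifest does not parse as JSON"]
    missing_keys = REQUIRED_MANIFEST_KEYS - manifest.keys()
    if missing_keys:
        failures.append(f"manifest missing keys: {sorted(missing_keys)}")
    declared = set(manifest.get("metadata_fields", []))
    missing_fields = expected_fields - declared
    if missing_fields:
        failures.append(f"metadata fields not declared: {sorted(missing_fields)}")
    return failures

sample = json.dumps({
    "files": ["clip_0001.mp4"],
    "rights": {"consent": "signed_release_v2"},
    "metadata_fields": ["task_label", "object_set"],
    "qa_notes": "reviewed 20 clips, 1 rejected",
})
print(ingestion_gate(sample, {"task_label", "object_set", "clip_boundary"}))
# → ["metadata fields not declared: ['clip_boundary']"]
```

A check like this catches structural failures before a human reviewer spends time on content quality, but it does not replace the named reviewer who approves the sample.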
Hub FAQ
How should buyers use the Robotics data sourcing specs hub?
Use the Robotics data sourcing specs hub to move from a broad physical AI data need into a concrete page with modality, sample, QA, format, rights, and supplier-evidence requirements.
Are these pages public datasets?
No. These pages are sourcing and specification guides for posting bounties. They help buyers define what a supplier must prove before data is accepted.
Why does this hub link to so many detail pages?
Each detail page handles one specific task, dataset, comparison, definition, or format. The hub is the index that helps a buyer pick the right one for the bounty they want to post.
What makes a page ready for a bounty?
A page is ready when it names a model objective, concrete files, metadata requirements, rights and consent expectations, sample QA checks, and a delivery format.
External source context
- Scale AI physical AI data engine
Shows enterprise demand for custom physical AI collection and enrichment programs.
- NVIDIA Physical AI Data Factory Blueprint
Frames physical AI data as an end-to-end factory problem spanning curation, generation, evaluation, and delivery.
- Open X-Embodiment
Baseline open robotics dataset for cross-embodiment learning and VLA pretraining discussions.
- Ego4D dataset
Canonical egocentric video benchmark for first-person physical-world capture and limitations.