
Reference

Physical AI data glossary

Plain-English definitions of the terms buyers and suppliers use when scoping physical AI data bounties — modalities, capture rigs, formats, and metadata.

How to use this hub

Start here when you know the broad category but haven't pinned down the exact bounty spec yet. Each linked page narrows the request into a concrete data shape: modality, task, environment, metadata, rights, consent, delivery format, and sample QA. That structure turns a vague physical AI data need into a request that a supplier can prove it meets, and that a buyer can reject with evidence when it doesn't.

The hub isn't meant to be the last page you read. It should hand off to a detail page where the specific intent is answered with sample specs, comparison tables, proof requirements, and external source context.
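The eight dimensions named above work as a checklist. The sketch below is a hypothetical Python representation, not a truelabel schema: `BountySpec` and `missing_dimensions` are illustrative names, each field is free text, and the example values are invented. It reports which dimensions a draft request still leaves open.

```python
from dataclasses import dataclass, fields
from typing import Optional

# Hypothetical spec shape: one optional free-text field per dimension named in the hub text.
@dataclass
class BountySpec:
    modality: Optional[str] = None
    task: Optional[str] = None
    environment: Optional[str] = None
    metadata: Optional[str] = None
    rights: Optional[str] = None
    consent: Optional[str] = None
    delivery_format: Optional[str] = None
    sample_qa: Optional[str] = None

def missing_dimensions(spec: BountySpec) -> list[str]:
    """Return the names of dimensions that are still unspecified (None or blank)."""
    return [
        f.name
        for f in fields(spec)
        if not (getattr(spec, f.name) or "").strip()
    ]

# Illustrative draft: only modality and task have been decided so far.
draft = BountySpec(modality="egocentric RGB video", task="kitchen tabletop pick-and-place")
print(missing_dimensions(draft))
```

Keeping every field optional lets a buyer start from a rough idea and fill dimensions in as the request narrows.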

10 pages — search and filter

10 of 10 datasets

Consent artifact

Glossary

Consent artifact means a record showing that a contributor or site granted permission for data capture and downstream use. In a bounty, it is the evidence a supplier attaches to show that the people and sites in a sample agreed both to capture and to the buyer's intended use.

  • What is a consent artifact
  • Consent artifact definition
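A consent artifact can be checked against what a bounty needs. In this minimal sketch, `ConsentArtifact` and `consent_gaps` are hypothetical, and the fields (grantor, date, capture permission, permitted downstream uses) are assumptions about what such a record would capture; real artifacts are often signed documents rather than structured data.

```python
from dataclasses import dataclass, field
from datetime import date

@dataclass
class ConsentArtifact:
    # Hypothetical fields; a real artifact might be a signed release or site agreement.
    grantor: str          # contributor or site that granted permission
    granted_on: date
    covers_capture: bool  # permission to record in the first place
    permitted_uses: set[str] = field(default_factory=set)  # downstream uses, e.g. {"training"}

def consent_gaps(artifact: ConsentArtifact, required_uses: set[str]) -> list[str]:
    """List the reasons this artifact does not cover what the bounty needs."""
    gaps = []
    if not artifact.covers_capture:
        gaps.append("no permission for capture")
    for use in sorted(required_uses - artifact.permitted_uses):
        gaps.append(f"no permission for downstream use: {use}")
    return gaps

# Illustrative values only.
release = ConsentArtifact("site-042", date(2024, 5, 1), True, {"training"})
print(consent_gaps(release, {"training", "evaluation"}))
```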

Data provenance

Glossary

Data provenance means the record of where data came from, how it was collected, what rights apply, and how it changed before delivery. In a bounty, it is how a buyer verifies rights and traces every transformation applied to a sample before accepting it.

  • What is data provenance
  • Data provenance definition
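Provenance is easiest to review when it is recorded as an ordered chain. The sketch below assumes a hypothetical `ProvenanceRecord` with one field for each part of the definition (source, collection method, rights, and the changes made before delivery); it is illustrative, not a standard format, and the example values are invented.

```python
from dataclasses import dataclass, field

@dataclass
class ProvenanceRecord:
    # Hypothetical fields mirroring the definition above.
    source: str             # where the data came from
    collection_method: str  # how it was collected
    rights: str             # what rights apply
    transformations: list[str] = field(default_factory=list)  # changes before delivery, in order

    def is_complete(self) -> bool:
        """True when source, collection method, and rights are all filled in."""
        return all(v.strip() for v in (self.source, self.collection_method, self.rights))

    def history(self) -> list[str]:
        """Render the record as an ordered, human-readable chain."""
        steps = [f"collected from {self.source} via {self.collection_method}"]
        steps += [f"then {t}" for t in self.transformations]
        steps.append(f"delivered under: {self.rights}")
        return steps

# Illustrative values only.
rec = ProvenanceRecord(
    source="supplier test kitchen",
    collection_method="head-mounted camera capture",
    rights="non-exclusive training license",
    transformations=["faces blurred", "clips trimmed to task segments"],
)
for line in rec.history():
    print(line)
```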

Egocentric data

Glossary

Egocentric data means first-person video or sensor data captured from the perspective of a person or embodied actor. In a bounty, it fixes the camera viewpoint: the spec should state that capture is first-person rather than from a fixed or third-person camera.

  • What is egocentric data
  • Egocentric data definition

Off-the-shelf dataset

Glossary

Off-the-shelf dataset means an existing dataset a supplier can license without running a new capture program. In a bounty, the buyer evaluates an existing sample, its rights, and its format instead of funding new capture.

  • What is an off-the-shelf dataset
  • Off-the-shelf dataset definition

Physical AI training data

Glossary

Physical AI training data means data that teaches models to perceive, reason about, and act in real or simulated physical environments. In a bounty, this umbrella term has to be narrowed to a specific modality, task, and environment before a supplier can respond.

  • What is physical AI training data
  • Physical AI training data definition

Robot demonstrations

Glossary

Robot demonstrations are task examples showing a robot or human demonstrator completing a behavior that a model should learn or evaluate. In a bounty, the spec should name the task, who or what performs it, and what counts as a completed demonstration.

  • What are robot demonstrations
  • Robot demonstrations definition

Sim-to-real gap

Glossary

Sim-to-real gap means the performance gap between behavior learned in simulation and behavior deployed in real physical environments. In a bounty, it explains why a buyer may want real-world capture for a task that already performs well in simulation.

  • What is the sim-to-real gap
  • Sim-to-real gap definition
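One common way to put a number on the gap, assuming task success rate is the metric of interest, is the difference between success rates measured in simulation and on real hardware. The function below is a hypothetical illustration with invented trial counts; other metrics could be compared the same way.

```python
def sim_to_real_gap(sim_successes: int, sim_trials: int,
                    real_successes: int, real_trials: int) -> float:
    """Simulated success rate minus real-world success rate.

    Positive values mean the behavior performs worse on real hardware than in simulation.
    """
    if sim_trials <= 0 or real_trials <= 0:
        raise ValueError("trial counts must be positive")
    return sim_successes / sim_trials - real_successes / real_trials

# Hypothetical evaluation counts for one task.
gap = sim_to_real_gap(sim_successes=45, sim_trials=50, real_successes=30, real_trials=50)
print(f"{gap:.2f}")
```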

Teleoperation data

Glossary

Teleoperation data means robot observations, state, and action traces recorded while a human remotely controls the robot. In a bounty, the spec should name which observation streams, robot state fields, and action signals must be recorded together.

  • What is teleoperation data
  • Teleoperation data definition
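Because teleoperation data pairs observations, state, and actions over time, a basic sample check is whether those streams line up. This sketch assumes a hypothetical per-timestep record, `TeleopStep`, and joint-position control, where each action has the same dimension as the joint state; real recorders and action spaces vary, so treat the checks and the 0.2 s gap threshold as examples.

```python
from dataclasses import dataclass

@dataclass
class TeleopStep:
    # Hypothetical per-timestep record; real schemas vary by robot and recorder.
    t: float                      # seconds since episode start
    observation: str              # e.g. path to the camera frame for this step
    joint_positions: list[float]  # robot state
    action: list[float]           # operator command (assumed joint-position target)

def trace_problems(steps: list[TeleopStep], max_gap_s: float = 0.2) -> list[str]:
    """Flag state/action dimension mismatches, non-increasing timestamps, and gaps."""
    problems = []
    for i, step in enumerate(steps):
        if len(step.action) != len(step.joint_positions):
            problems.append(
                f"step {i}: action dim {len(step.action)} != state dim {len(step.joint_positions)}"
            )
        if i > 0:
            dt = step.t - steps[i - 1].t
            if dt <= 0:
                problems.append(f"step {i}: timestamp not increasing")
            elif dt > max_gap_s:
                problems.append(f"step {i}: gap of {dt:.2f}s")
    return problems

# Illustrative episode with one bad step.
episode = [
    TeleopStep(0.0, "frame_000.png", [0.0, 0.1], [0.0, 0.1]),
    TeleopStep(0.1, "frame_001.png", [0.0, 0.2], [0.0, 0.2]),
    TeleopStep(0.5, "frame_002.png", [0.1, 0.2], [0.1]),
]
print(trace_problems(episode))
```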

VLA model

Glossary

VLA model means a vision-language-action model that connects visual observations and language context to physical actions. In a bounty, it implies data that pairs visual observations and language context with the actions taken.

  • What is a VLA model
  • VLA model definition

World model AI

Glossary

World model AI means a model that learns predictive structure about environments, objects, motion, and consequences. In a bounty, it implies data that shows how environments and objects change over time, so the model has consequences to predict.

  • What is world model AI
  • World model AI definition

Procurement questions before posting a bounty

  • What exact model behavior or evaluation question should this data improve?
  • Which modality, camera viewpoint, robot state, or metadata stream is required?
  • What evidence proves the supplier has rights, consent, and provenance?
  • Which delivery format must the sample open in before scale-up?
  • What specific failure reasons should cause sample rejection?
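The last question is easiest to enforce when each failure reason is written down as an explicit rule. In this sketch the sample is described as a plain dictionary, and the four rules are hypothetical examples for a bounty that wants egocentric video plus robot state; the accepted formats `mcap` and `hdf5` are illustrative choices, not a recommendation.

```python
from typing import Callable

# A rule pairs a human-readable rejection reason with a predicate
# that returns True when the sample fails that rule.
RejectionRule = tuple[str, Callable[[dict], bool]]

# Hypothetical rules; a real bounty would write its own.
RULES: list[RejectionRule] = [
    ("missing consent artifact", lambda s: not s.get("consent_artifact")),
    ("wrong camera viewpoint", lambda s: s.get("viewpoint") != "egocentric"),
    ("robot state stream absent", lambda s: "robot_state" not in s.get("streams", [])),
    ("unsupported delivery format", lambda s: s.get("format") not in {"mcap", "hdf5"}),
]

def rejection_reasons(sample: dict, rules: list[RejectionRule] = RULES) -> list[str]:
    """Return every rejection reason that applies; an empty list means the sample passes."""
    return [reason for reason, fails in rules if fails(sample)]

# Illustrative sample description.
sample = {
    "consent_artifact": "release_0042.pdf",
    "viewpoint": "third-person",
    "streams": ["rgb"],
    "format": "mp4",
}
print(rejection_reasons(sample))
```

Returning every applicable reason, rather than stopping at the first, gives the supplier a complete list to fix before resubmitting.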

Quality gate before a page becomes a deal spec

A page in this hub should not be treated as a finished procurement document by itself. It is a starting point for a bounty. Before a buyer funds capture or licenses off-the-shelf data, the page needs to become a short operating spec: accepted examples, rejected examples, file format, metadata fields, consent requirements, delivery location, and a named reviewer who can approve the sample.

The practical test is simple: if two suppliers read the same detail page, would they submit comparable samples? If not, the buyer needs to narrow the request into a more specific bounty. The strongest truelabel references help with that narrowing by linking from broad hubs into task pages, dataset profiles, format guides, glossary definitions, and public dataset alternatives.
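The two-supplier test can be approximated by comparing sample manifests on the fields a spec is supposed to fix. `spec_disagreements` and the manifest keys below are hypothetical; a non-empty result suggests the spec still leaves room for incomparable submissions.

```python
def spec_disagreements(manifest_a: dict, manifest_b: dict,
                       keys: tuple[str, ...] = ("file_format", "metadata_fields")) -> list[str]:
    """Return the spec keys on which two supplier sample manifests disagree."""
    diffs = []
    for key in keys:
        a, b = manifest_a.get(key), manifest_b.get(key)
        # Treat field lists as sets so ordering differences do not count.
        if isinstance(a, list) and isinstance(b, list):
            a, b = set(a), set(b)
        if a != b:
            diffs.append(key)
    return diffs

# Illustrative manifests from two suppliers reading the same spec.
supplier_a = {"file_format": "hdf5", "metadata_fields": ["timestamp", "camera_id"]}
supplier_b = {"file_format": "hdf5", "metadata_fields": ["camera_id", "timestamp", "operator_id"]}
print(spec_disagreements(supplier_a, supplier_b))
```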

Gate | Question | Pass signal
Intent | What model behavior does the data improve? | The objective is tied to a task, benchmark, or evaluation gap.
Evidence | What proves a supplier can deliver? | A sample package includes files, manifest, rights, and QA notes.
Ingestion | Can the buyer load the sample? | The sample opens in the expected format or converter.
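The three gates can be expressed as simple checks over a description of a sample package. This is a hypothetical sketch: the package keys are assumptions, and the ingestion check only compares file extensions as a stand-in for actually opening each file in the expected format or converter.

```python
from pathlib import Path

# Evidence parts named in the Evidence gate.
REQUIRED_PARTS = ("files", "manifest", "rights", "qa_notes")

def run_gates(package: dict, expected_suffix: str) -> dict[str, bool]:
    """Return pass/fail for the Intent, Evidence, and Ingestion gates."""
    # Intent: an objective tied to a task, benchmark, or evaluation gap.
    intent = bool(package.get("objective")) and bool(package.get("linked_eval"))
    # Evidence: every required part of the sample package is present and non-empty.
    evidence = all(package.get(part) for part in REQUIRED_PARTS)
    # Ingestion (proxy only): every delivered file has the expected extension.
    files = package.get("files", [])
    ingestion = bool(files) and all(Path(f).suffix == expected_suffix for f in files)
    return {"intent": intent, "evidence": evidence, "ingestion": ingestion}

# Illustrative package description.
pkg = {
    "objective": "improve grasp success on deformable objects",
    "linked_eval": "internal grasping benchmark",
    "files": ["episode_000.hdf5", "episode_001.hdf5"],
    "manifest": "manifest.json",
    "rights": "license.pdf",
    "qa_notes": "qa_notes.md",
}
print(run_gates(pkg, ".hdf5"))
```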

Hub FAQ

How should buyers use the Physical AI data glossary hub?

Use the Physical AI data glossary hub to move from a broad physical AI data need into a concrete page with modality, sample, QA, format, rights, and supplier-evidence requirements.

Are these pages public datasets?

No. These pages are sourcing and specification guides for posting bounties. They help buyers define what a supplier must prove before data is accepted.

Why does this hub link to so many detail pages?

Each detail page handles one specific task, dataset, comparison, definition, or format. The hub is the index that helps a buyer pick the right one for the bounty they want to post.

What makes a page ready for a bounty?

A page is ready when it names a model objective, concrete files, metadata requirements, rights and consent expectations, sample QA checks, and a delivery format.

External source context

  1. Scale AI physical AI data engine

    Shows enterprise demand for custom physical AI collection and enrichment programs.

  2. NVIDIA Physical AI Data Factory Blueprint

    Frames physical AI data as an end-to-end factory problem spanning curation, generation, evaluation, and delivery.

  3. Open X-Embodiment

    Baseline open robotics dataset for cross-embodiment tasks and VLA pretraining discussions.

  4. Ego4D dataset

    Canonical egocentric video benchmark; useful context for first-person physical-world capture and its limitations.