Glossary
VLA model
A VLA model is a vision-language-action model: it connects visual observations and language context to physical actions. The term matters in sourcing because it turns a vague model or procurement concept into concrete data requirements you can evaluate samples against.
Quick facts
- RT-1
- Google's Robotics Transformer for real-world control at scale (Brohan et al., 2022, arXiv:2212.06817)
- RT-2
- Vision-language-action model that transfers web-scale knowledge to robot control; evaluated over 6,000 robot trials (Google DeepMind, July 2023, arXiv:2307.15818)
- OpenVLA
- 7B-parameter open VLA — Prismatic-7B VLM backbone (SigLIP + DINOv2 + Llama 2 7B) trained on 970,000 robot episodes from Open X-Embodiment (2024)
- π0 (Pi-Zero)
- Physical Intelligence VLA trained across 8 robot embodiments (UR5e, Bimanual UR5e, Franka, Bimanual Trossen/ARX, Mobile Trossen/Fibocom); the π0-small variant has 470M parameters (Oct 2024)
- What VLA training data needs
- Synchronized observations + language instructions + action traces — missing any stream means the dataset is not VLA-ready.
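The three-stream requirement above can be expressed as a minimal acceptance check. This is an illustrative sketch, not a standard schema: the field names `observations`, `instruction`, and `actions` are assumptions a buyer would pin down in their own spec.

```python
# Minimal sketch: verify a candidate episode record carries all three VLA
# streams. Field names are illustrative, not an industry standard.

def is_vla_ready(episode: dict) -> bool:
    """An episode is VLA-ready only if every stream is present and non-empty."""
    has_obs = bool(episode.get("observations"))  # visual observations (e.g. frames)
    has_lang = bool(episode.get("instruction"))  # language instruction
    has_act = bool(episode.get("actions"))       # action traces
    return has_obs and has_lang and has_act

sample = {
    "observations": ["frame_0.png", "frame_1.png"],
    "instruction": "pick up the red block",
    "actions": [[0.1, 0.0, -0.2], [0.0, 0.1, 0.0]],
}
print(is_vla_ready(sample))                             # True
print(is_vla_ready({"observations": ["frame_0.png"]}))  # False: two streams missing
```

A check like this can run automatically over a delivery manifest so that incomplete episodes are rejected before labeling or training begins.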
Comparison
| Question | Answer |
|---|---|
| Where it appears | Sourcing specs, QA requirements, dataset manifests, and buyer review notes |
| Why it matters | It turns abstract AI language into a supplier-verifiable requirement |
| Common failure | Using the term without defining modality, format, rights, or acceptance criteria |
How to use this term in a spec
A VLA model connects visual observations and language instructions to robot actions, so training data must align all three signals. OpenVLA explicitly defines its model as a vision-language-action system that maps image observations and language instructions to continuous robot actions. [1]
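To make the alignment requirement concrete in a spec, a buyer might ask suppliers to deliver manifest entries like the sketch below. Every field name here is a hypothetical example for illustration, not a standard schema.

```python
# Hypothetical training-ready manifest entry for one robot episode.
# All field names and values are illustrative assumptions.
import json

manifest_entry = {
    "episode_id": "ep_000001",
    "embodiment": "UR5e",
    "observations": {"rgb": "obs/ep_000001/", "fps": 10},
    "instruction": "place the cup on the shelf",
    "actions": {"file": "actions/ep_000001.npy", "space": "end_effector_delta"},
    "rights": {"license": "commercial", "pii_cleared": True},
}
print(json.dumps(manifest_entry, indent=2))
```

Spelling the schema out this way lets a spec name exactly which keys are mandatory and what the QA process will reject.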
What to avoid
Do not use "VLA model" as a vague keyword. Define the data files, metadata, rights, QA checks, and delivery format that make it measurable.
VLA model in buyer review
The VLA pattern is not just a labeling task: data must preserve robot episodes, instructions, action tokens or traces, and embodiment details. OpenVLA's project page, RT-2, and Open X-Embodiment all emphasize paired robot data as the substrate for action-producing models. [2] [3] [4]
VLA model supplier evidence
A buyer asking for VLA data should request a small sample that can be loaded into the intended schema and checked for observation-action-instruction alignment. If any of the three streams is missing, the dataset is not VLA-ready.
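The sample check described above can be sketched as a simple alignment audit. This assumes per-timestep observation and action lists and hypothetical field names; a real check would also compare timestamps against the agreed schema.

```python
# Sketch of an observation-action-instruction alignment audit on a sample
# episode. Field names are illustrative assumptions, not a standard.

def check_alignment(episode: dict) -> list:
    """Return a list of issues; an empty list means the sample passes."""
    issues = []
    obs = episode.get("observations", [])
    acts = episode.get("actions", [])
    if not episode.get("instruction"):
        issues.append("missing language instruction")
    if len(obs) == 0 or len(acts) == 0:
        issues.append("empty observation or action stream")
    elif len(obs) != len(acts):
        issues.append(
            f"stream length mismatch: {len(obs)} observations vs {len(acts)} actions"
        )
    return issues

episode = {
    "observations": ["frame_0.png", "frame_1.png"],
    "instruction": "open the drawer",
    "actions": [[0.0, 0.1, 0.0], [0.1, 0.0, 0.0]],
}
print(check_alignment(episode))  # [] — sample passes
```

If the audit returns any issue, at least one of the three streams is missing or misaligned and the sample fails the VLA-ready bar.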
External references and source context
- [1] OpenVLA: An Open-Source Vision-Language-Action Model (arXiv) — defines a vision-language-action model that maps image observations and language instructions to robot actions.
- [2] OpenVLA project page (openvla.github.io) — describes an open-source vision-language-action model trained on robot episodes.
- [3] RT-2: Vision-Language-Action Models Transfer Web Knowledge to Robotic Control (robotics-transformer2.github.io) — presents a vision-language-action model that transfers web knowledge to robotic control.
- [4] Open X-Embodiment: Robotic Learning Datasets and RT-X Models (arXiv) — provides the robot datasets and RT-X models used to study generalist robot policies.
FAQ
What is VLA model?
A VLA model is a vision-language-action model that connects visual observations and language context to physical actions.
Why does it matter for physical AI?
It matters because physical AI data must be connected to actions, environments, metadata, rights, and model use, not just raw files.
How should buyers spec it in a sourcing request?
Request observations, instructions, action traces, and metadata in a training-ready schema.
Can suppliers validate this from samples?
Yes, if the buyer defines visible evidence, metadata requirements, and acceptance criteria before suppliers submit files.
Find datasets covering VLA model
Truelabel surfaces vetted datasets and capture partners working on VLA model data. Send us the modality, scale, and rights you need and we will route you to the closest match.
Request VLA model data