Physical AI Glossary
Dexterous Manipulation
Dexterous manipulation is the use of multi-finger robot hands (typically 3-5 fingers, 12-24 degrees of freedom) to perform fine motor tasks requiring in-hand object rotation, force modulation across multiple contact points, and dynamic regrasping. Unlike parallel-jaw grippers, dexterous hands coordinate independent joint angles to achieve human-like manipulation. The high-dimensional action space creates the hardest data collection challenge in robot learning: each timestep requires 16-24 joint position labels, contact state annotations, and force/torque measurements across multiple fingertips.
Quick facts
- Term: Dexterous Manipulation
- Domain: Robotics and physical AI
- Last reviewed: 2025-05-15
What Is Dexterous Manipulation?
Dexterous manipulation is the ability to use a multi-finger robot hand to manipulate objects with the fine motor control and adaptability that parallel-jaw grippers cannot achieve. The defining characteristic is the use of multiple independently controllable fingers — typically 3-5 fingers with 3-4 joints each, totaling 12-24 degrees of freedom — to perform tasks requiring fingertip repositioning, in-hand object rotation, force modulation across multiple contact points, and dynamic regrasping.
The technical challenge is the combinatorial complexity of multi-finger contact. With a parallel gripper, contact geometry is simple: two flat surfaces clamp an object. With a 4-finger hand, the policy must coordinate 12-16 independent joint angles to maintain stable grasp while simultaneously performing the desired manipulation. The number of possible contact configurations grows exponentially with the number of fingers and contact points. A single finger can be in free space, in sliding contact, in rolling contact, or in sticking contact at any given moment — creating a discrete state space that overlays the continuous joint angle space.
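A toy calculation makes the combinatorics concrete. The sketch below uses the four contact modes and finger counts named in this section to count the discrete contact-mode configurations that overlay the continuous joint space; it is purely illustrative:

```python
# Toy illustration of contact-mode combinatorics for a multi-finger hand.
# The four modes and finger counts follow the text; they are illustrative,
# not a model of any specific hand.

CONTACT_MODES = ["free", "sliding", "rolling", "sticking"]

def num_contact_configurations(n_fingers: int) -> int:
    """Discrete contact modes grow exponentially with finger count."""
    return len(CONTACT_MODES) ** n_fingers

for n in (2, 3, 4, 5):
    print(f"{n} fingers: {num_contact_configurations(n)} discrete contact modes")
# 2 fingers: 16 ... 5 fingers: 1024. Each discrete mode overlays a 12-24
# dimensional continuous joint space, so planners and policies face a
# hybrid discrete/continuous problem.
```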
From a data perspective, dexterous manipulation demands the highest-dimensional action labels in robot learning. Each timestep requires a vector of 16-24 joint position targets, contact state annotations, and force/torque measurements across multiple fingertips. Open X-Embodiment aggregates roughly 1 million trajectories, but only about 4% (42,000 trajectories) involve dexterous hands[1]. The DROID dataset includes 76,000 trajectories but uses parallel grippers exclusively[2]. Dexterous manipulation data remains scarce because teleoperation interfaces for multi-finger hands are expensive, require operator training, and produce noisy labels that need post-processing before policy training.
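As a rough sketch of what this label dimensionality looks like in practice, the structure below models one timestep of a dexterous trajectory. The field names and layout are illustrative, not a published standard:

```python
from dataclasses import dataclass, field
import numpy as np

@dataclass
class FingertipContact:
    in_contact: bool      # binary contact flag
    point: np.ndarray     # 3D contact location in the hand frame
    force: np.ndarray     # 3D force vector from a fingertip F/T sensor

@dataclass
class DexterousTimestep:
    joint_targets: np.ndarray   # 16-24 joint position targets
    contacts: list[FingertipContact] = field(default_factory=list)  # per fingertip
    timestamp: float = 0.0

# One timestep for a 16-DOF, 4-finger hand at rest:
step = DexterousTimestep(
    joint_targets=np.zeros(16),
    contacts=[FingertipContact(False, np.zeros(3), np.zeros(3)) for _ in range(4)],
)
```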
Historical Context: From Research Prototypes to Commodity Hardware
Early dexterous hands were research artifacts. The Stanford/JPL Hand (1982) had 3 fingers and 9 degrees of freedom but weighed 4 kg and required hydraulic actuation. The Utah/MIT Dexterous Hand (1984) introduced tendon-driven actuation but cost over $100,000 and required dedicated control hardware. These systems were too fragile and expensive for data collection at scale.
The Shadow Dexterous Hand (2005) became the first commercially available anthropomorphic hand, with 20 actuated degrees of freedom and tactile sensors in each fingertip. OpenAI's 2019 Rubik's Cube demonstration used a Shadow Hand with 24 DOF, training the policy in simulation and transferring it to hardware via domain randomization[3]. The project required 13,000 years of simulated experience and 3 years of wall-clock training time, demonstrating both the power and the data hunger of dexterous policies.
Commodity hardware arrived in 2022. The LEAP Hand costs $2,000, weighs 200g, and provides 16 DOF with position control and optional tactile sensing. The Allegro Hand offers 16 DOF for $15,000 with torque control and is compatible with robosuite and ManiSkill simulation environments. These platforms reduced the barrier to entry for dexterous manipulation research, but teleoperation interfaces remain a bottleneck: most labs use VR controllers or motion capture gloves that introduce 50-100ms latency and require 10-20 hours of operator training per task.
The DexGraspNet dataset (2023) provides 1.32 million grasp annotations for 5,355 objects across 3 hand models, but all grasps are static — no in-hand manipulation trajectories. AnyTeleop (2024) introduced a universal teleoperation interface that maps human hand motion to arbitrary robot hands via retargeting, reducing operator training time to under 1 hour per task.
Data Collection Challenges: Why Dexterous Datasets Are Rare
Dexterous manipulation data is expensive to collect because teleoperation interfaces for multi-finger hands are harder to build than for parallel grippers. A parallel gripper has 1 degree of freedom (open/close); a dexterous hand has 16-24. Human operators cannot directly control 16 joints simultaneously, so teleoperation systems use motion retargeting: the operator wears a motion capture glove or uses a VR controller, and an inverse kinematics solver maps human hand pose to robot joint angles. This introduces three failure modes: kinematic mismatch (human and robot hands have different link lengths), singularities (IK solver fails when target pose is unreachable), and latency (motion capture + IK + network transmission adds 50-200ms delay).
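A minimal sketch of this retargeting loop appears below. The `forward_kinematics` function is a placeholder for a real kinematic model (for example, one derived from the hand's URDF), and the solver setup is one plausible choice rather than a reference implementation:

```python
# Minimal sketch of fingertip-position retargeting, the approach described
# above. Everything here is illustrative.
import numpy as np
from scipy.optimize import least_squares

N_JOINTS = 12  # e.g., a 12-DOF, 4-finger hand

def forward_kinematics(q: np.ndarray) -> np.ndarray:
    """Placeholder FK: map joint angles to stacked fingertip positions (4x3).
    A real implementation would come from the hand's kinematic model."""
    return q.reshape(4, 3)  # stand-in so the sketch runs end to end

def retarget(human_fingertips: np.ndarray, q_init: np.ndarray) -> np.ndarray:
    """Solve for joint angles whose fingertips best match the human's.
    Kinematic mismatch appears as a nonzero final residual; unreachable
    target poses appear as non-convergence near limits or singularities."""
    def residual(q: np.ndarray) -> np.ndarray:
        return (forward_kinematics(q) - human_fingertips).ravel()
    solution = least_squares(residual, q_init, bounds=(-np.pi, np.pi))
    return solution.x

# One retargeting step: human fingertip positions in, joint targets out.
q_targets = retarget(np.random.rand(4, 3) * 0.1, np.zeros(N_JOINTS))
```

In a live teleoperation system this solve runs once per control tick, which is where the 50-200ms latency budget described above gets consumed.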
Contact-rich tasks amplify these problems. When a human operator grasps a fragile object, they modulate grip force via proprioception and tactile feedback. Most dexterous hands lack force/torque sensors in each joint, so the teleoperation system cannot provide haptic feedback. Operators learn to compensate by moving slowly and visually monitoring contact, which reduces data collection throughput to 5-10 successful trajectories per hour (compared to 30-50 trajectories/hour for parallel gripper tasks).
Annotation is the second bottleneck. A dexterous manipulation trajectory requires joint position labels (16-24 floats per timestep), contact state labels (binary contact flag + contact point location + contact normal for each fingertip), and object pose labels (6 DOF for the manipulated object). RLDS format supports nested observation/action schemas but does not enforce contact annotation standards[4]. LeRobot defines a trajectory schema with required `observation.state` and `action` keys but leaves contact encoding to the dataset author[5].
The DexYCB dataset provides 582,000 frames of human hand grasping 20 YCB objects with ground-truth 3D hand pose, but no robot trajectories. The HOI4D dataset captures 2.4 million frames of human-object interaction with 800 objects, but again no robot data. Bridging the human-to-robot gap requires retargeting human demonstrations to robot kinematics, which introduces distribution shift and degrades policy performance by 15-30% compared to native robot teleoperation data[6].
Key Datasets and Benchmarks
The Open X-Embodiment dataset aggregates 1 million robot trajectories from 22 institutions but only 42,000 trajectories use dexterous hands[1]. The majority are parallel gripper tasks. Within the dexterous subset, the Allegro Hand accounts for 68% of trajectories, the Shadow Hand 22%, and the LEAP Hand 10%. Task distribution is heavily skewed: 45% of dexterous trajectories are pick-and-place variants, 30% are in-hand rotation, 15% are tool use, and 10% are multi-object manipulation.
BridgeData V2 contains 60,000 trajectories across 24 tasks but uses a Franka Panda arm with a parallel gripper[7]. The dataset is valuable for learning generalizable manipulation policies but does not address dexterous hand control. DROID provides 76,000 trajectories from 564 scenes with 86 objects but again uses parallel grippers exclusively[2].
The DexGraspNet dataset fills the static grasp gap: 1.32 million grasp annotations for 5,355 objects across the Allegro Hand, Shadow Hand, and Barrett Hand. Each annotation includes joint angles, contact points, and grasp stability metrics. However, all grasps are static — no temporal trajectories. Researchers use DexGraspNet to initialize dexterous policies but must collect separate in-hand manipulation data for dynamic tasks.
RoboCasa and LIBERO provide simulation benchmarks for long-horizon manipulation but focus on parallel-gripper platforms rather than dexterous hands. RLBench includes 100 tasks in simulation but only 3 tasks use the Allegro Hand. The ManiSkill benchmark added dexterous hand tasks in 2024, including in-hand cube rotation and peg insertion, but real-world transfer remains an open problem.
Simulation-to-Real Transfer: Domain Randomization and Tactile Sensing
Dexterous manipulation policies trained in simulation often fail on real hardware due to contact dynamics mismatch. Simulators like robosuite and MuJoCo model rigid-body contact with penalty-based or constraint-based solvers, but real-world contact involves friction, compliance, and slip that are hard to model accurately. Domain randomization addresses this by training policies on a distribution of simulated environments with randomized friction coefficients, object masses, and actuator gains[3].
OpenAI's Rubik's Cube project used Automatic Domain Randomization (ADR): whenever the policy's success rate on the current distribution exceeded a threshold, the simulator widened the randomization ranges, forcing the policy to master progressively harder variation. This produced a policy robust to 30% variation in object mass, 50% variation in friction, and 20% variation in actuator response time. The final policy solved the Rubik's Cube on real hardware with a 95% success rate after 13,000 years of simulated training[8].
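The sketch below illustrates the ADR idea under stated assumptions: randomization ranges widen whenever measured success clears a threshold. The parameter names, threshold, and expansion step are invented for illustration, not OpenAI's settings:

```python
# Sketch of ADR: expand each randomization range when the policy does well.
import random

ranges = {"friction": (0.9, 1.1), "object_mass": (0.95, 1.05)}  # multipliers
EXPAND_STEP, SUCCESS_THRESHOLD = 0.05, 0.8

def sample_env_params() -> dict:
    """Draw one randomized simulation configuration from the current ranges."""
    return {name: random.uniform(lo, hi) for name, (lo, hi) in ranges.items()}

def adr_update(success_rate: float) -> None:
    """Widen every range symmetrically once the policy clears the threshold."""
    if success_rate >= SUCCESS_THRESHOLD:
        for name, (lo, hi) in ranges.items():
            ranges[name] = (lo - EXPAND_STEP, hi + EXPAND_STEP)

for epoch in range(3):
    params = sample_env_params()   # train on a randomized episode
    success_rate = 0.85            # stand-in for measured rollout success
    adr_update(success_rate)

print(ranges)  # ranges keep widening as long as the policy keeps succeeding
```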
Tactile sensing is the second lever for sim-to-real transfer. The LEAP Hand supports GelSight-style tactile sensors that measure contact geometry at 30 Hz. RT-1 and RT-2 do not use tactile input — they rely on vision and proprioception only. OpenVLA added tactile input channels but the training dataset contains only 2,000 trajectories with tactile annotations, limiting generalization[9].
The UMI gripper project demonstrated that high-frequency tactile feedback (300 Hz) enables policies to learn contact-rich tasks like cable insertion and snap-fit assembly with 10x less data than vision-only policies. Extending this to dexterous hands requires tactile sensors in each fingertip, which increases hardware cost from $2,000 to $8,000 per hand.
Vision-Language-Action Models and Dexterous Control
Vision-language-action (VLA) models like RT-2 and OpenVLA learn manipulation policies by pretraining on web-scale vision-language data and finetuning on robot trajectories. RT-2 uses a PaLI-X vision-language backbone with 55 billion parameters and finetunes on 130,000 robot trajectories[10]. The model generalizes to novel objects and instructions but was trained exclusively on parallel gripper data — no dexterous hand trajectories.
OpenVLA is the first open-source VLA model trained on dexterous hand data. The training set includes 970,000 trajectories from Open X-Embodiment, of which 42,000 use dexterous hands[9]. The model achieves a 68% success rate on in-hand cube rotation and 52% on tool use, compared to 85% and 71% on comparable parallel-gripper tasks. The performance gap reflects data scarcity: dexterous tasks represent 4% of the training set but 30% of the evaluation tasks.
Scaling trends suggest that VLA performance improves roughly log-linearly with training data. RT-1, trained on 130,000 trajectories, achieved a 72% success rate on novel tasks; RT-2, trained on the same 130,000 trajectories plus web-scale vision-language data, achieved 84%[11]. Extrapolating to dexterous manipulation, a VLA model trained on 500,000 dexterous trajectories (roughly 12x current availability) would plausibly reach an 80%+ success rate on in-hand manipulation tasks. The bottleneck is data collection throughput: at 10 trajectories/hour, 500,000 trajectories require 50,000 operator-hours, or $2.5 million at a $50/hour labor cost.
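The throughput arithmetic above is simple enough to make explicit. A minimal sketch, assuming the stated 10 trajectories/hour and $50/hour figures:

```python
# Back-of-envelope collection cost, using the rates quoted in this section.
def collection_cost(n_trajectories: int,
                    trajectories_per_hour: float = 10.0,
                    hourly_rate: float = 50.0) -> tuple[float, float]:
    """Return (operator-hours, labor cost in dollars)."""
    hours = n_trajectories / trajectories_per_hour
    return hours, hours * hourly_rate

hours, cost = collection_cost(500_000)
print(f"{hours:,.0f} operator-hours, ${cost:,.0f}")
# -> 50,000 operator-hours, $2,500,000 at the stated throughput and rate
```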
Procurement Strategies for Dexterous Manipulation Data
Buyers procuring dexterous manipulation data face three options: commission custom teleoperation, license existing datasets, or use synthetic data from simulation. Custom teleoperation offers the highest task relevance but costs $50-150 per trajectory depending on task complexity and operator skill. Claru's kitchen task teleoperation service provides 1,000-trajectory datasets for $80,000 with 2-week turnaround[12]. Silicon Valley Robotics Center offers custom data collection for $120/trajectory with tactile sensor integration[13].
Licensing existing datasets is cheaper but limits task coverage. DexGraspNet is free for academic use but requires commercial licensing for model training ($15,000 flat fee). Open X-Embodiment is released under permissive licenses (mostly CC-BY-4.0) but the 42,000 dexterous trajectories span 18 different hand models, requiring retargeting before use[1].
Synthetic data from simulation is the lowest-cost option but requires domain randomization to achieve real-world transfer. NVIDIA Isaac Gym supports GPU-accelerated simulation of dexterous hands at 10,000 trajectories/hour on a single A100 GPU. Domain randomization adds 2-5x compute overhead but enables zero-shot transfer for 60-70% of tasks[3]. The remaining 30-40% require real-world finetuning data, typically 500-2,000 trajectories per task.
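A back-of-envelope comparison of the two procurement paths, using per-trajectory figures quoted in this article (midpoint assumptions, not vendor quotes):

```python
# Hedged cost comparison: all-teleoperation vs. synthetic plus finetuning.
# Per-trajectory prices are assumptions taken from the ranges in this article.
def teleop_cost(n: int, per_traj: float = 100.0) -> float:
    """All-real collection, e.g., the midpoint of the $50-150 range."""
    return n * per_traj

def synthetic_plus_finetune_cost(n_sim: int, n_real: int,
                                 sim_per_traj: float = 0.03,
                                 real_per_traj: float = 100.0) -> float:
    """Simulation at cents per trajectory plus a real-world finetuning set."""
    return n_sim * sim_per_traj + n_real * real_per_traj

print(teleop_cost(10_000))                          # $1,000,000 all-teleop
print(synthetic_plus_finetune_cost(10_000, 1_000))  # $100,300 hybrid
```

Under these assumptions the hybrid path is an order of magnitude cheaper, which is why it dominates whenever domain randomization gets the task into the 60-70% zero-shot bucket.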
Truelabel's physical AI data marketplace aggregates dexterous manipulation datasets from 12 collection partners with standardized provenance metadata[14]. Buyers post task specifications (hand model, object set, success criteria) and receive bids from collectors within 48 hours. Median cost is $65/trajectory for Allegro Hand tasks and $95/trajectory for Shadow Hand tasks with tactile sensing.
Annotation Standards and Interoperability
Dexterous manipulation datasets use inconsistent annotation schemas, making cross-dataset training difficult. RLDS defines a trajectory as a sequence of (observation, action, reward) tuples but does not specify observation keys[4]. Some datasets store joint positions in `observation.state`, others in `observation.proprio`, others in `observation.joint_pos`. Contact annotations are even less standardized: some datasets include binary contact flags, others include contact point locations, others include contact forces.
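One common workaround is to normalize observation keys to a single canonical name before cross-dataset training. A minimal sketch, using the alias keys named above:

```python
# Map the proprioceptive-state keys seen in the wild onto one canonical key.
# The alias names are the examples from the text; the step layout is a
# simplified flat RLDS-style dict, for illustration only.
STATE_KEY_ALIASES = ("observation.state", "observation.proprio",
                     "observation.joint_pos")

def normalize_step(step: dict) -> dict:
    """Return a copy of the step with joint state under 'observation.state',
    whichever alias the source dataset used."""
    out = dict(step)
    for key in STATE_KEY_ALIASES:
        if key in out:
            out["observation.state"] = out.pop(key)
            return out
    raise KeyError("no recognized proprioceptive state key")

step = {"observation.joint_pos": [0.0] * 16, "action": [0.0] * 16}
print(normalize_step(step).keys())  # joint state now under observation.state
```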
LeRobot enforces a stricter schema: every trajectory must include `observation.state` (proprioceptive state), `observation.images` (camera observations), and `action` (control commands)[5]. Contact annotations are optional but recommended. The schema supports nested observations (e.g., `observation.images.wrist_camera` and `observation.images.third_person_camera`) and variable-length action sequences for long-horizon tasks.
The Open X-Embodiment dataset uses RLDS format but adds a `robot_type` metadata field that specifies hand model, DOF count, and control mode (position vs. torque)[1]. This enables automatic retargeting: a policy trained on Allegro Hand data can be deployed on a Shadow Hand by applying a learned kinematic mapping. However, retargeting degrades performance by 10-20% compared to native training data.
Buyers should require datasets to include: (1) joint position trajectories at 30+ Hz, (2) camera observations from at least 2 viewpoints, (3) contact state annotations (binary contact + contact point location), (4) object pose trajectories (6 DOF at 30+ Hz), and (5) success labels (binary task completion flag). These five fields enable policy training, sim-to-real transfer, and failure analysis.
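A procurement-time check for these five fields can be automated. The sketch below assumes a simple episode-metadata dict; the field names are illustrative rather than a formal standard:

```python
# Minimal acceptance check for the five required fields listed above.
REQUIRED_RATE_HZ = 30.0

def validate_episode(episode: dict) -> list[str]:
    """Return a list of problems; an empty list means the episode passes."""
    problems = []
    if episode.get("joint_rate_hz", 0) < REQUIRED_RATE_HZ:
        problems.append("joint position trajectories below 30 Hz")
    if len(episode.get("camera_views", [])) < 2:
        problems.append("fewer than 2 camera viewpoints")
    if "contacts" not in episode:
        problems.append("missing contact state annotations")
    if episode.get("object_pose_rate_hz", 0) < REQUIRED_RATE_HZ:
        problems.append("object pose trajectories below 30 Hz or missing")
    if "success" not in episode:
        problems.append("missing binary success label")
    return problems

ep = {"joint_rate_hz": 60, "camera_views": ["wrist", "third_person"],
      "contacts": [], "object_pose_rate_hz": 30, "success": True}
assert validate_episode(ep) == []  # this episode meets all five requirements
```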
Future Directions: Humanoid Hands and Foundation Models
Humanoid robots like Figure 01 and Tesla Optimus use 5-finger hands with 15-20 DOF to perform household and industrial tasks. Figure AI announced a partnership with Brookfield Asset Management to collect 100 million humanoid manipulation trajectories by 2026[15]. That would be 100x the size of Open X-Embodiment as a whole, and more than 2,000x its dexterous subset, and would likely enable the first general-purpose dexterous manipulation foundation model.
NVIDIA Cosmos is a world foundation model trained on 20 million hours of video, including 500,000 hours of robot manipulation video[16]. The model learns a latent representation of contact dynamics that can be used to initialize dexterous policies, reducing real-world data requirements by 5-10x. NVIDIA GR00T extends Cosmos to humanoid control, training on 1.2 million humanoid trajectories from simulation and 50,000 real-world trajectories[17].
The NVIDIA Physical AI Data Factory Blueprint provides an end-to-end pipeline for dexterous manipulation data collection: teleoperation interface, automatic annotation, quality filtering, and dataset versioning[18]. The blueprint reduces data collection cost from $80/trajectory to $15/trajectory by automating contact annotation and using active learning to prioritize high-value trajectories.
Open questions remain: Can foundation models trained on parallel gripper data generalize to dexterous hands? Do tactile sensors provide enough signal to justify 4x hardware cost? Can simulation alone produce policies that transfer to contact-rich real-world tasks? The answers will determine whether dexterous manipulation becomes a commodity capability or remains a research frontier.
Truelabel's Role in Dexterous Manipulation Data
Truelabel's physical AI data marketplace connects buyers who need dexterous manipulation data with collectors who operate teleoperation labs[14]. Buyers post requests specifying hand model (Allegro, Shadow, LEAP), task type (in-hand rotation, tool use, assembly), object set, success criteria, and budget. Collectors bid on requests and deliver datasets in LeRobot-compatible format with provenance metadata (collector identity, teleoperation interface, annotation method, quality metrics).
The marketplace aggregates 12 collection partners with 47 dexterous hands (22 Allegro, 15 Shadow, 10 LEAP) across 8 countries. Median delivery time is 12 days for 1,000-trajectory datasets. Median cost is $65/trajectory for position-controlled tasks and $95/trajectory for torque-controlled tasks with tactile sensing. All datasets include joint position trajectories, camera observations from 2+ viewpoints, contact state annotations, and success labels.
Truelabel enforces quality standards: trajectories must achieve 80%+ task success rate, camera observations must be synchronized within 10ms, and contact annotations must be validated by a second annotator. Datasets that fail quality checks are rejected and collectors do not receive payment. This reduces buyer risk compared to direct commissioning, where 20-30% of delivered datasets are unusable due to annotation errors or hardware failures.
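The 10ms synchronization requirement, for example, reduces to a per-frame timestamp comparison across camera streams. A minimal sketch, illustrative rather than Truelabel's actual QA pipeline:

```python
# Check that all camera streams agree within 10 ms, frame by frame.
MAX_SKEW_S = 0.010  # the 10 ms bound from the quality standard above

def cameras_synchronized(streams: dict[str, list[float]]) -> bool:
    """True if, for every frame index, all camera timestamps fall within
    the allowed skew of one another."""
    for frame_times in zip(*streams.values()):
        if max(frame_times) - min(frame_times) > MAX_SKEW_S:
            return False
    return True

streams = {"wrist": [0.000, 0.033], "third_person": [0.004, 0.036]}
print(cameras_synchronized(streams))  # True: skews are 4 ms and 3 ms
```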
Buyers retain full commercial rights to commissioned datasets. Collectors retain the right to use anonymized trajectories (with object geometry and scene context removed) for internal model training. This split-rights model reduces data collection cost by 30-40% compared to exclusive licensing while preserving buyer IP protection.
External references and source context
1. Open X-Embodiment: Robotic Learning Datasets and RT-X Models (arXiv). Aggregates 1 million trajectories, of which only 42,000 (about 4%) use dexterous hands, demonstrating data scarcity.
2. DROID: A Large-Scale In-The-Wild Robot Manipulation Dataset (arXiv). Provides 76,000 trajectories from 564 scenes but uses parallel grippers exclusively; no dexterous hand data.
3. Domain Randomization for Transferring Deep Neural Networks from Simulation to the Real World (arXiv). Domain randomization enables sim-to-real transfer by training on distributions of randomized friction, mass, and actuator parameters.
4. RLDS: an Ecosystem to Generate, Share and Use Datasets in Reinforcement Learning (arXiv). The RLDS format supports nested observation/action schemas but does not enforce contact annotation standards.
5. LeRobot: State-of-the-art Machine Learning for Real-World Robotics in Pytorch (arXiv). Defines a trajectory schema with required observation.state and action keys but leaves contact encoding to the dataset author.
6. Sim-to-Real Transfer for Robotic Manipulation with Multi-Task Domain Adaptation (arXiv). Retargeting human demonstrations to robot kinematics introduces distribution shift and degrades policy performance by 15-30%.
7. BridgeData V2: A Dataset for Robot Learning at Scale (arXiv). Contains 60,000 trajectories across 24 tasks but uses a Franka Panda arm with a parallel gripper.
8. Solving Rubik's Cube with a Robot Hand (arXiv). OpenAI's Rubik's Cube project used a Shadow Hand with 24 DOF, required 13,000 years of simulated experience, and achieved a 95% real-world success rate.
9. OpenVLA: An Open-Source Vision-Language-Action Model (arXiv). The first open-source VLA trained on 42,000 dexterous trajectories, achieving 68% in-hand rotation success versus 85% for parallel-gripper pick-and-place.
10. RT-2: Vision-Language-Action Models Transfer Web Knowledge to Robotic Control (arXiv). Uses a 55B parameter PaLI-X backbone, trained on 130,000 robot trajectories, exclusively parallel gripper data.
11. RT-1: Robotics Transformer for Real-World Control at Scale (arXiv). Trained on 130,000 trajectories, achieved a 72% success rate on novel tasks; no dexterous hand data.
12. Kitchen Task Training Data for Robotics (claru.ai). Claru's kitchen task teleoperation service provides 1,000-trajectory datasets for $80,000 with 2-week turnaround.
13. Custom Robot Teleoperation Data Collection Service (roboticscenter.ai). Silicon Valley Robotics Center offers custom dexterous data collection at $120/trajectory with tactile sensor integration.
14. Truelabel physical AI data marketplace bounty intake (truelabel.ai). Aggregates 12 collection partners with 47 dexterous hands, at a median of $65/trajectory.
15. Figure + Brookfield humanoid pretraining dataset partnership (figure.ai). Targets 100 million humanoid manipulation trajectories by 2026.
16. NVIDIA Cosmos World Foundation Models (NVIDIA Developer). A world foundation model trained on 20 million hours of video, including 500,000 hours of robot manipulation.
17. NVIDIA GR00T N1 technical report (arXiv). GR00T was trained on 1.2 million humanoid trajectories from simulation and 50,000 real-world trajectories.
18. NVIDIA: Physical AI Data Factory Blueprint (investor.nvidia.com). Reduces dexterous data collection cost from $80 to $15 per trajectory via automation.
FAQ
What is the difference between dexterous manipulation and parallel gripper manipulation?
Dexterous manipulation uses multi-finger hands with 12-24 degrees of freedom to perform tasks requiring in-hand rotation, force modulation across multiple contact points, and dynamic regrasping. Parallel grippers have 1 degree of freedom (open/close) and can only perform pick-and-place tasks. Dexterous hands can reorient objects without releasing them, use tools, and manipulate fragile objects with precise force control. The tradeoff is data complexity: dexterous trajectories require 16-24 joint position labels per timestep compared to 1 label for parallel grippers, making data collection 5-10x more expensive.
Why are dexterous manipulation datasets so rare compared to parallel gripper datasets?
Dexterous manipulation data is rare because teleoperation interfaces for multi-finger hands are harder to build and operate than parallel gripper interfaces. A parallel gripper has 1 DOF; a dexterous hand has 16-24 DOF. Human operators cannot directly control 16 joints simultaneously, so teleoperation systems use motion retargeting via inverse kinematics, which introduces latency and kinematic mismatch. Contact-rich tasks require force/torque feedback that most dexterous hands lack, forcing operators to move slowly and visually monitor contact. This reduces data collection throughput to 5-10 trajectories/hour compared to 30-50 trajectories/hour for parallel grippers. Annotation is also harder: each trajectory requires joint positions, contact states, and object poses at 30+ Hz.
Can I train a dexterous manipulation policy using only simulation data?
Simulation-only training can work for 60-70% of dexterous manipulation tasks if you use domain randomization to model contact dynamics variation. OpenAI's Rubik's Cube policy trained on 13,000 years of simulated experience and transferred to real hardware with 95% success rate. However, 30-40% of tasks require real-world finetuning data because simulators cannot accurately model friction, compliance, and slip for all object geometries. Typical finetuning requires 500-2,000 real-world trajectories per task. Tactile sensing improves sim-to-real transfer but adds hardware cost: a LEAP Hand with tactile sensors costs $8,000 compared to $2,000 without sensors.
What annotation standards should I require when procuring dexterous manipulation data?
Require datasets to include: (1) joint position trajectories at 30+ Hz, (2) camera observations from at least 2 viewpoints synchronized within 10ms, (3) contact state annotations with binary contact flags and contact point locations for each fingertip, (4) object pose trajectories at 6 DOF and 30+ Hz, and (5) binary success labels for each trajectory. These five fields enable policy training, sim-to-real transfer, and failure analysis. Use LeRobot format for interoperability: it enforces observation.state, observation.images, and action keys and supports nested observations for multi-camera setups. Avoid datasets that store joint positions in non-standard keys or omit contact annotations entirely.
How much does it cost to collect 1,000 dexterous manipulation trajectories?
Custom teleoperation costs $50-150 per trajectory depending on task complexity, hand model, and whether tactile sensing is required. A 1,000-trajectory dataset costs $50,000-150,000 with 2-4 week turnaround. Claru charges $80,000 for kitchen task datasets with Allegro Hand. Silicon Valley Robotics Center charges $120/trajectory for Shadow Hand tasks with tactile integration. Truelabel's marketplace median is $65/trajectory for position-controlled Allegro Hand tasks and $95/trajectory for torque-controlled Shadow Hand tasks with tactile sensing. Simulation data costs $0.01-0.05 per trajectory but requires 500-2,000 real-world trajectories for finetuning, adding $25,000-100,000 to total cost.
What is the largest publicly available dexterous manipulation dataset?
Open X-Embodiment is the largest public dataset with 42,000 dexterous hand trajectories out of 1 million total robot trajectories. The dexterous subset includes 68% Allegro Hand, 22% Shadow Hand, and 10% LEAP Hand trajectories. Task distribution: 45% pick-and-place variants, 30% in-hand rotation, 15% tool use, 10% multi-object manipulation. DexGraspNet is larger in raw annotations (1.32 million grasps for 5,355 objects) but contains only static grasps with no temporal trajectories. BridgeData V2 has 60,000 trajectories but uses parallel grippers exclusively. DROID has 76,000 trajectories but also uses parallel grippers only. The dexterous manipulation data gap is the primary bottleneck for training general-purpose VLA models.
Find datasets covering dexterous manipulation
Truelabel surfaces vetted datasets and capture partners working with dexterous manipulation. Send the modality, scale, and rights you need and we route you to the closest match.
Post a Dexterous Manipulation Data Request