Dataset profile
NVIDIA PhysicalAI GR00T X-Embodiment Sim Dataset
The NVIDIA PhysicalAI-Robotics-GR00T-X-Embodiment-Sim dataset provides 9,000 simulated bimanual manipulation trajectories across multiple robot embodiments, released under the permissive CC-BY-4.0 license. The collection includes tasks such as threading, tray lifting, and three-piece assembly performed with Panda gripper and Panda hand configurations. Robotics teams use this dataset for post-training vision-language-action models and cross-embodiment policy learning, particularly when developing bimanual manipulation capabilities that must generalize across different arm-gripper combinations in simulation before real-world deployment.
Quick facts
- Scale: 9,000 trajectories
- License: CC-BY-4.0
- Embodiments: Panda gripper, Panda hand
- Task type: Bimanual manipulation
- Commercial use: Permitted with attribution
- Environment: Isaac Sim
Dataset composition and task coverage
This profile itemizes three bimanual manipulation tasks across two embodiment configurations. Each contributes 1,000 trajectories, which together account for 3,000 of the 9,000 total; the remaining trajectories are not broken down here. Threading with the Panda gripper configuration demonstrates fine motor control and precise coordination between the two arms. LiftTray with the Panda hand embodiment focuses on coordinated grasping and lifting with anthropomorphic end-effectors. ThreePieceAssembly, again with the Panda gripper, captures multi-step assembly sequences that require spatial reasoning and sequential manipulation.
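The per-task figures above can be tallied with a few lines of Python. This is only a bookkeeping sketch: the dictionary keys are illustrative labels chosen for this example, not identifiers read from the dataset, and the counts are the ones stated in this profile.

```python
# Per-task counts as stated in this profile. The (task, embodiment) keys are
# illustrative labels, not identifiers taken from the dataset files.
composition = {
    ("Threading", "Panda gripper"): 1000,
    ("LiftTray", "Panda hand"): 1000,
    ("ThreePieceAssembly", "Panda gripper"): 1000,
}
STATED_TOTAL = 9000  # headline figure from the quick facts


def summarize(counts, stated_total):
    """Return per-embodiment totals, the itemized sum, and the unitemized remainder."""
    per_embodiment = {}
    for (_task, embodiment), n in counts.items():
        per_embodiment[embodiment] = per_embodiment.get(embodiment, 0) + n
    itemized = sum(counts.values())
    return per_embodiment, itemized, stated_total - itemized


per_embodiment, itemized, unitemized = summarize(composition, STATED_TOTAL)
print(per_embodiment)        # {'Panda gripper': 2000, 'Panda hand': 1000}
print(itemized, unitemized)  # 3000 6000
```

The tally makes two things explicit: the itemized tasks are weighted two-to-one toward the gripper embodiment, and they cover only part of the headline total, so any sampling strategy should be checked against the actual file listing.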
This cross-embodiment structure explicitly supports transfer learning research, where policies trained on one morphology must adapt to different kinematic chains and end-effector geometries. The simulation environment leverages NVIDIA Isaac Sim to generate physically plausible interactions, providing ground-truth state information and deterministic replay capabilities that facilitate debugging and ablation studies during model development.
Licensing and commercial deployment rights
Released under Creative Commons Attribution 4.0 International, this dataset grants broad usage rights including commercial application, modification, and redistribution provided appropriate credit is given to NVIDIA. The permissive terms eliminate procurement friction for product teams building commercial robotic systems, as the attribution requirement can be satisfied through standard acknowledgment in technical documentation or model cards.
The CC-BY-4.0 designation specifically permits derivative works, meaning teams can augment trajectories with additional annotations, resample for different control frequencies, or blend this data with proprietary collections without license contamination concerns. For organizations operating under strict IP policies, the absence of share-alike or non-commercial clauses makes this dataset compatible with proprietary model training pipelines where the resulting weights and system behaviors will remain confidential.
- Attribution required but no share-alike restrictions
- Compatible with proprietary model development
- Derivative works and commercial use explicitly permitted
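As a rough illustration of how the attribution requirement might be handled in a model card, the sketch below renders a one-line credit. The `Attribution` class and the wording of the rendered line are assumptions made for this example, not a format prescribed by NVIDIA or Creative Commons; the elements it carries (creator, license name and link, and a note on changes) reflect what CC BY 4.0 asks licensees to provide.

```python
from dataclasses import dataclass

CC_BY_4_URL = "https://creativecommons.org/licenses/by/4.0/"


@dataclass
class Attribution:
    """Hypothetical container for the credit elements of a CC BY 4.0 work."""
    title: str
    creator: str
    license_name: str
    license_url: str
    changes: str = ""  # short description of modifications; empty if unmodified

    def render(self) -> str:
        # Credit line: title, creator, license name, and a link to the license.
        line = (f'"{self.title}" by {self.creator}, '
                f"licensed under {self.license_name} ({self.license_url}).")
        if self.changes:
            # CC BY 4.0 also asks licensees to indicate if changes were made.
            line += f" Changes: {self.changes}"
        return line


note = Attribution(
    title="PhysicalAI-Robotics-GR00T-X-Embodiment-Sim",
    creator="NVIDIA",
    license_name="CC BY 4.0",
    license_url=CC_BY_4_URL,
    changes="resampled to a lower control frequency; added task annotations.",
)
print(note.render())
```

Keeping the credit in a structured record rather than free text makes it easier to reuse the same attribution across documentation, model cards, and release notes.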
Procurement considerations for foundation model teams
Teams sourcing data for vision-language-action transformer architectures will find the cross-embodiment structure particularly valuable for few-shot adaptation experiments. The consistent task definitions across different morphologies enable controlled studies of embodiment gap challenges, where a policy must map high-level action semantics to embodiment-specific motor commands. The 1,000 trajectory scale per task provides sufficient diversity for fine-tuning while remaining computationally tractable for teams without massive compute budgets.
The simulation provenance means state observations include perfect ground truth for proprioceptive signals, object poses, and contact forces, information that is often noisy or unavailable in real-robot datasets. This makes the collection especially suitable for pre-training world models or dynamics predictors before sim-to-real transfer phases. However, procurement teams should budget for domain randomization and real-world validation datasets, as policies trained purely on this simulated data will require bridging techniques to handle real sensor noise, calibration errors, and physical discrepancies not modeled in Isaac Sim.
Known limitations and deployment gaps
The dataset's simulation origin introduces characteristic distribution gaps that affect real-world deployment. Isaac Sim physics, while sophisticated, uses simplified contact models and friction parameters that diverge from real material interactions, particularly for deformable objects or high-frequency vibrations. Teams deploying to production environments with varied lighting, worn hardware, or non-rigid objects should plan for substantial sim-to-real adaptation effort, potentially requiring hundreds of real-world demonstrations for domain alignment.
The 9,000 trajectory scale, while sufficient for post-training and adaptation experiments, falls short of the data volumes typically required to train foundation models from scratch. Organizations building general-purpose manipulation policies should view this as a high-quality supplement to larger real-world collections rather than a standalone training corpus. The focus on bimanual tasks also means single-arm manipulation scenarios and mobile manipulation contexts receive no coverage, limiting applicability for teams working outside stationary dual-arm workstation paradigms.
FAQ
What is the NVIDIA GR00T X-Embodiment Sim dataset and what does it contain?
This dataset provides 9,000 simulated bimanual manipulation trajectories generated in NVIDIA Isaac Sim, specifically designed for post-training the GR00T N1 robotics foundation model. The tasks itemized in this profile are threading, tray lifting, and three-piece assembly, performed across two robot embodiments: Panda arms equipped with parallel-jaw grippers and Panda arms with anthropomorphic hands. Each of these tasks contributes 1,000 trajectories in its own embodiment configuration (threading and three-piece assembly with grippers, tray lifting with hands), creating a structured benchmark for cross-embodiment policy learning. The trajectories capture full state information including joint positions, end-effector poses, object states, and action sequences, enabling both behavior cloning and inverse dynamics learning. The data was generated as part of NVIDIA's PhysicalAI initiative to develop general-purpose humanoid and manipulation capabilities, with simulation allowing rapid iteration and perfect ground-truth labels unavailable in real-world capture scenarios.
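To make the distinction between behavior cloning and inverse dynamics learning concrete, the sketch below builds both kinds of training pairs from one trajectory. The layout assumed here (a list of state vectors and a list of action vectors, where `actions[t]` moves the system from `states[t]` to `states[t + 1]`) is a simplification for illustration and is not taken from the dataset's actual schema.

```python
def behavior_cloning_pairs(states, actions):
    """(s_t, a_t) pairs: the policy predicts the action from the current state."""
    return list(zip(states, actions))


def inverse_dynamics_pairs(states, actions):
    """((s_t, s_{t+1}), a_t) pairs: the model infers the action behind a transition."""
    n = min(len(states) - 1, len(actions))
    return [((states[t], states[t + 1]), actions[t]) for t in range(n)]


# Toy one-dimensional trajectory: four states, three actions (hypothetical values).
states = [[0.0], [0.1], [0.3], [0.6]]
actions = [[0.1], [0.2], [0.3]]

bc = behavior_cloning_pairs(states, actions)
inv = inverse_dynamics_pairs(states, actions)
print(len(bc), len(inv))  # 3 3
```

Both views come from the same logged data; the difference lies only in which quantities are treated as inputs and which as the prediction target.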
What are the exact licensing terms and can I use this commercially?
The dataset is released under CC-BY-4.0, one of the most permissive open licenses available. It explicitly grants rights for commercial use, modification, distribution, and creation of derivative works, with attribution as the main condition. You can train proprietary models on this data, deploy those models in commercial products, and retain full ownership of the resulting system without share-alike obligations. For compliance, attribution typically involves crediting NVIDIA as the dataset creator, naming and linking the license, and noting any changes you made, in technical documentation, model cards, or research papers. Unlike GPL-style copyleft licenses, CC-BY-4.0 does not require you to release your model weights or disclose training procedures, making it compatible with commercial product development where intellectual property protection is essential.
Which robotics teams should prioritize acquiring this dataset?
This dataset is particularly valuable for teams developing vision-language-action models or bimanual manipulation policies that must generalize across different robot morphologies. If you are building foundation models that need to transfer skills between robots with different end-effectors (for example, adapting a gripper-trained policy to dexterous hands), the explicit cross-embodiment structure provides controlled experimental scaffolding. Teams working on humanoid robots with dual-arm capabilities will find the task diversity and coordination patterns directly applicable. Organizations in the post-training or fine-tuning phase of foundation model development gain the most immediate value, as the 9,000-trajectory scale suits adaptation workflows better than pre-training from scratch. The simulation environment also benefits teams performing extensive ablation studies or hyperparameter sweeps, where deterministic replay and perfect state observability accelerate debugging compared to noisy real-world data.
When is this dataset NOT the right choice for my robotics project?
If your deployment environment involves single-arm manipulation, mobile manipulation, or outdoor unstructured settings, this dataset provides limited relevance as it focuses exclusively on stationary bimanual tasks. Teams requiring massive-scale pre-training data for foundation models should look elsewhere, as 9,000 trajectories undershoots the millions of demonstrations typically needed for training large transformer policies from random initialization. This serves better as a supplementary fine-tuning resource than a primary training corpus. The simulation provenance also makes it unsuitable as a sole data source for production deployment without significant real-world bridging. If you lack the infrastructure for sim-to-real transfer techniques like domain randomization, real-world fine-tuning, or adversarial adaptation, the distribution gap will likely cause policy failures on physical hardware. Teams working with deformable objects, precise force control, or tasks sensitive to contact dynamics should be especially cautious, as Isaac Sim's rigid-body assumptions diverge substantially from real material behavior.
How does this dataset integrate with existing VLA training pipelines?
The dataset follows standard trajectory formats compatible with behavior cloning and offline reinforcement learning workflows common in VLA development. Each trajectory includes observation sequences, action labels, and state information that can be directly ingested by frameworks like OpenVLA, RoboMimic, or custom transformer architectures. The cross-embodiment structure allows for multi-task training where a single policy network learns task semantics while embodiment-specific adapter layers handle morphology differences, a pattern increasingly common in foundation model architectures. For teams already training on real-world data, this simulation collection serves as an effective augmentation source to improve sample efficiency and provide coverage of rare edge cases difficult to capture physically. The consistent task definitions enable curriculum learning strategies where models first master simulated versions before fine-tuning on smaller real-world datasets. However, practitioners should implement observation normalization and domain randomization during training to prevent overfitting to Isaac Sim's specific rendering and physics characteristics.
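The observation normalization mentioned above is often implemented as per-dimension standardization. The following is a minimal sketch under that assumption, using plain Python lists; the function names and toy values are hypothetical, and a real pipeline would more likely use NumPy or the normalization utilities of its training framework.

```python
import math


def fit_normalizer(observations, eps=1e-8):
    """Compute per-dimension mean and standard deviation over equal-length vectors."""
    n = len(observations)
    dim = len(observations[0])
    mean = [sum(o[d] for o in observations) / n for d in range(dim)]
    var = [sum((o[d] - mean[d]) ** 2 for o in observations) / n for d in range(dim)]
    # eps keeps constant dimensions from causing a division by zero.
    std = [math.sqrt(v) + eps for v in var]
    return mean, std


def normalize(obs, mean, std):
    """Standardize one observation with previously fitted statistics."""
    return [(x - m) / s for x, m, s in zip(obs, mean, std)]


# Toy training observations: dimension 0 varies, dimension 1 is constant.
train_obs = [[0.0, 10.0], [2.0, 10.0], [4.0, 10.0]]
mean, std = fit_normalizer(train_obs)
normalized = [normalize(o, mean, std) for o in train_obs]
print(normalized[1])  # [0.0, 0.0]
```

Statistics are fitted once on the training split and reused at evaluation and deployment time. When simulated and real data are mixed, it is worth checking whether one shared set of statistics or per-source statistics better matches the target robot's sensor ranges.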
What data preparation and preprocessing should I expect before model training?
While the dataset provides structured trajectories, production training pipelines typically require several preprocessing steps. You will likely need to resample control frequencies to match your target robot's actuation rates, as simulation often uses higher frequencies than physical hardware supports. Observation spaces may require normalization or projection to match your robot's sensor suite, particularly if your cameras have different intrinsics or your proprioceptive sensors report in different coordinate frames than the Panda embodiments used here. Teams pursuing sim-to-real transfer should implement domain randomization during data loading, adding synthetic noise to proprioceptive readings, varying lighting and texture in visual observations, and introducing small perturbations to action execution. The perfect ground-truth state information should be degraded to match real-world sensor noise profiles, preventing the policy from relying on unrealistically precise signals. Budget engineering time for these augmentation pipelines, as naive training on unmodified simulation data typically yields policies that fail immediately when deployed to physical robots due to the distribution shift.
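Two of the steps above, frequency resampling and degrading ground-truth proprioception, can be sketched as follows. The function names, the example rates, and the noise level are assumptions for illustration; the source does not state the dataset's control frequency, and an appropriate noise scale depends on the target robot's sensors.

```python
import random


def resample_linear(samples, src_hz, dst_hz):
    """Linearly interpolate equal-length vectors from src_hz to dst_hz."""
    if len(samples) < 2:
        return [list(s) for s in samples]
    # Number of output samples that fit inside the original duration.
    n_out = int((len(samples) - 1) * dst_hz / src_hz + 1e-9) + 1
    out = []
    for k in range(n_out):
        pos = k * src_hz / dst_hz            # fractional index into the source
        i = min(int(pos), len(samples) - 2)  # left neighbour, clamped at the end
        frac = pos - i
        out.append([a + frac * (b - a) for a, b in zip(samples[i], samples[i + 1])])
    return out


def add_gaussian_noise(vector, sigma, rng):
    """Add zero-mean Gaussian noise to each element (e.g. joint positions)."""
    return [x + rng.gauss(0.0, sigma) for x in vector]


# Toy joint-position trace sampled at an assumed 20 Hz, resampled to 10 Hz.
trace_20hz = [[0.0], [1.0], [2.0], [3.0], [4.0]]
trace_10hz = resample_linear(trace_20hz, src_hz=20, dst_hz=10)
print(trace_10hz)  # [[0.0], [2.0], [4.0]]

rng = random.Random(0)  # seeded so augmentation is reproducible
noisy = [add_gaussian_noise(v, sigma=0.01, rng=rng) for v in trace_10hz]
```

Linear interpolation is reasonable for continuous signals such as joint positions, but it is not appropriate for discrete commands like gripper open/close, or for orientations stored as quaternions, which need their own handling. Downsampling by a large factor may also call for low-pass filtering first to avoid aliasing.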