Physical AI Glossary

Domain Randomization

Domain randomization trains robot policies in simulation by randomly varying visual parameters (textures, lighting, camera angles) and physical parameters (mass, friction, actuator delays) across episodes. This forces the policy to learn robust features that generalize across a wide distribution of environments, making the real world just one more sample point rather than an out-of-distribution domain. First demonstrated by Tobin et al. in 2017 for object localization and extended by OpenAI in 2019 for the Rubik's Cube-solving Dactyl system, DR now underpins sim-to-real pipelines at Scale AI, NVIDIA Cosmos, and open projects like RLBench.

Updated 2025-06-08
By TrueLabel Sourcing
Reviewed by TrueLabel Sourcing

Quick facts

Topic: Domain Randomization
Audience: Procurement leads, ML ops, robotics engineers
Deliverable: Buyer-facing reference + procurement guidance

What Domain Randomization Solves: The Sim-to-Real Gap

Simulation is orders of magnitude cheaper than real-world data collection—RLBench can generate 100,000 episodes overnight on a single GPU cluster, while collecting equivalent real-robot data would require months of teleoperation labor[1]. Yet policies trained purely in simulation often fail catastrophically on real hardware because simulators cannot perfectly model lighting, material properties, sensor noise, or contact dynamics.

Domain randomization attacks this gap by intentionally making simulation less realistic in a controlled way. Instead of trying to match one real environment perfectly, DR trains on thousands of randomized environments, forcing the policy to learn features that work across all of them. When the policy encounters the real world, it treats it as just another randomized sample. This approach was validated by Tobin et al. in 2017, who trained object-detection models on synthetic images with randomized textures and lighting, then deployed them on real robots without fine-tuning[2].
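
The per-episode sampling described above can be sketched as follows. This is a minimal illustration, not any simulator's API: the `EpisodeConfig` container, the `sample_episode_config` helper, and every range in it are hypothetical values chosen for the example.

```python
import random
from dataclasses import dataclass


@dataclass
class EpisodeConfig:
    """Hypothetical bundle of parameters drawn once per training episode."""
    texture_id: int         # index into an assumed texture library
    light_intensity: float  # relative to a nominal value of 1.0
    friction: float         # contact friction coefficient
    mass_scale: float       # multiplier on nominal object mass


def sample_episode_config(rng: random.Random) -> EpisodeConfig:
    # Each episode sees a different combination, so the policy cannot
    # rely on any single appearance or dynamics setting.
    return EpisodeConfig(
        texture_id=rng.randrange(1000),
        light_intensity=rng.uniform(0.3, 1.5),
        friction=rng.uniform(0.3, 1.2),
        mass_scale=rng.uniform(0.7, 1.3),
    )


rng = random.Random(0)  # fixed seed so runs are reproducible
configs = [sample_episode_config(rng) for _ in range(5)]
for cfg in configs:
    print(cfg)
```

In a real pipeline each sampled configuration would be pushed into the simulator at episode reset; that step is simulator-specific and is left out here.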

The technique scales to complex manipulation tasks. OpenAI's Dactyl system solved a Rubik's Cube using a 24-degree-of-freedom Shadow Hand by training entirely in simulation with aggressive randomization of object size, friction, hand dimensions, and actuator response times[3]. The policy never saw a real cube during training but generalized to physical hardware because the real cube's properties fell within the randomized distribution. Google's SayCan and RT-1 pipelines similarly use DR to pre-train vision encoders before real-world fine-tuning, reducing real-data requirements by 40-60%[4].

Visual Randomization: Appearance Parameters

Visual domain randomization varies rendering parameters that affect pixel appearance but not physics. Texture randomization replaces object surfaces with random noise, procedural patterns, or images scraped from the web. Tobin et al. applied randomly generated textures (flat colors, gradients, and checker patterns) to simulated objects, creating effectively unlimited visual diversity at very low computational cost. Modern pipelines like NVIDIA Cosmos generate photorealistic textures via diffusion models, then randomize them during policy training.

Lighting randomization varies the number, position, intensity, and color temperature of light sources. A policy trained under randomized lighting learns to ignore shadows and specular highlights, focusing on geometric features that remain stable. Scale AI's physical-AI data engine applies lighting randomization to synthetic warehouse scenes, ensuring pick-and-place policies generalize across facilities with different overhead fixtures.

Camera randomization perturbs extrinsic parameters (position, orientation) and intrinsic parameters (focal length, distortion coefficients, sensor noise). This prevents policies from overfitting to a fixed viewpoint. RLBench randomizes camera pose within a 20cm radius and adds Gaussian pixel noise with standard deviation 0.01-0.05, mimicking real RGB-D sensor characteristics[1]. Background randomization injects distractor objects, floor textures, and wall colors, forcing the policy to attend only to task-relevant features.
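
As a rough sketch of camera randomization, the snippet below jitters a camera position uniformly inside a sphere and adds Gaussian pixel noise. It uses the 20cm radius and 0.01-0.05 noise range quoted above purely as example values; the function names are hypothetical, and pixel intensities are assumed to be floats in [0, 1].

```python
import math
import random


def jitter_camera_position(nominal, radius_m, rng):
    """Return a position sampled uniformly inside a sphere around `nominal`."""
    # A normalized Gaussian vector gives a uniformly random direction.
    direction = [rng.gauss(0.0, 1.0) for _ in range(3)]
    norm = math.sqrt(sum(x * x for x in direction)) or 1.0
    # The cube root makes the samples uniform in volume, not bunched at the center.
    r = radius_m * rng.random() ** (1.0 / 3.0)
    return tuple(n + r * x / norm for n, x in zip(nominal, direction))


def add_pixel_noise(image, rng, std_range=(0.01, 0.05)):
    """Add zero-mean Gaussian noise to a 2D list of intensities in [0, 1]."""
    std = rng.uniform(*std_range)  # one noise level per frame
    return [
        [min(1.0, max(0.0, pixel + rng.gauss(0.0, std))) for pixel in row]
        for row in image
    ]


rng = random.Random(0)
camera_position = jitter_camera_position((0.5, 0.0, 1.2), radius_m=0.20, rng=rng)
noisy_image = add_pixel_noise([[0.5] * 4 for _ in range(3)], rng)
```

Orientation and intrinsic parameters (focal length, distortion) can be perturbed the same way; they are omitted to keep the sketch short.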

Physics Randomization: Dynamics Parameters

Physics domain randomization varies simulation parameters that affect contact dynamics, actuation, and sensing. Mass and inertia randomization samples object masses uniformly within ±30% of nominal values and scales inertia tensors accordingly. This forces the policy to adapt grip strength and motion planning to objects of varying weight. OpenAI's Dactyl randomized cube mass between 50g and 90g during training, enabling the real policy to handle cubes from 60g to 80g without retraining[3].
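
A minimal sketch of mass and inertia randomization, assuming the object's shape and mass distribution stay fixed so that the inertia tensor scales linearly with mass. The function name and the example cube values are hypothetical.

```python
import random


def randomize_mass(nominal_mass, nominal_inertia_diag, rng, spread=0.30):
    """Sample a mass within ±spread of nominal and rescale the inertia to match.

    `nominal_inertia_diag` holds the diagonal of the inertia tensor (kg·m²).
    With fixed geometry, every inertia term is proportional to mass, so the
    same scale factor applies to both.
    """
    scale = rng.uniform(1.0 - spread, 1.0 + spread)
    mass = nominal_mass * scale
    inertia = tuple(i * scale for i in nominal_inertia_diag)
    return mass, inertia


rng = random.Random(0)
# Example: a 70 g cube with an illustrative diagonal inertia.
mass, inertia = randomize_mass(0.070, (4.2e-5, 4.2e-5, 4.2e-5), rng)
```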

Friction randomization varies static and dynamic friction coefficients for object-object and object-gripper contacts. Typical ranges are 0.3-1.2 for static friction and 0.2-0.9 for dynamic friction. Low-friction samples teach the policy to apply more normal force; high-friction samples teach gentler grasps. RoboSuite exposes friction as a tunable parameter in its MuJoCo-based environments, and ManiSkill does the same on top of the SAPIEN simulator.

Actuator randomization models real hardware imperfections by adding delays (5-50ms), scaling motor torques (±20%), and injecting position noise. Peng et al. 2018 showed that randomizing dynamics during training, including controller gains and the timestep between actions, allowed a pushing policy trained in simulation to transfer to a physical robot arm[5]. Modern frameworks like NVIDIA Isaac Gym support per-joint randomization of PID gains, enabling policies to handle motor wear and calibration drift.
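
One way to model the delay, torque scaling, and position noise described above is a small wrapper around each joint's command path. The sketch below is an assumption-laden illustration: the `RandomizedActuator` class, the 5 ms control step, and the noise standard deviation are all hypothetical, and only the 5-50ms and ±20% ranges come from the text.

```python
import random
from collections import deque


class RandomizedActuator:
    """One joint's actuator with per-episode delay, torque scale, and encoder noise."""

    def __init__(self, rng, control_dt_s=0.005):
        self.rng = rng
        delay_s = rng.uniform(0.005, 0.050)        # 5-50 ms command delay
        self.delay_steps = round(delay_s / control_dt_s)
        self.torque_scale = rng.uniform(0.8, 1.2)  # ±20% motor strength
        self.position_noise_std = 0.002            # radians; hypothetical value
        # Pre-filled with zeros so each command takes effect `delay_steps` later.
        self.pending = deque([0.0] * self.delay_steps)

    def apply(self, commanded_torque):
        """Queue the new command and return the delayed, scaled torque."""
        self.pending.append(commanded_torque)
        return self.torque_scale * self.pending.popleft()

    def read_position(self, true_position):
        """Return the joint position with Gaussian sensor noise added."""
        return true_position + self.rng.gauss(0.0, self.position_noise_std)


actuator = RandomizedActuator(random.Random(0))
applied = [actuator.apply(1.0) for _ in range(actuator.delay_steps + 1)]
```

Constructing a fresh `RandomizedActuator` at every episode reset gives each episode its own delay and torque scale.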

Automatic Domain Randomization (ADR)

Manual tuning of randomization ranges is labor-intensive and risks under-randomization (policy overfits to simulation) or over-randomization (policy fails to learn anything useful). Automatic Domain Randomization (ADR) adaptively adjusts randomization bounds during training based on policy performance. If the policy succeeds consistently, ADR widens the randomization ranges; if success rate drops below a threshold, ADR narrows them.

OpenAI's Dactyl used ADR to tune 19 physics parameters and 11 visual parameters simultaneously, starting with narrow ranges and expanding them over 3 billion simulation steps[3]. The system monitored cube-rotation success rate every 10 million steps and increased randomization bounds by 10% when success exceeded 90%, or decreased them by 5% when success fell below 70%. This curriculum prevented the policy from encountering impossible configurations early in training while ensuring maximal diversity by the end.
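
The widen-or-narrow rule described above can be sketched in a few lines. This follows the thresholds and step sizes quoted in the paragraph and is a simplified illustration, not OpenAI's implementation; the `ParamRange` class and `adr_step` function are hypothetical names.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ParamRange:
    """Closed interval that one randomized parameter is sampled from."""
    low: float
    high: float

    def scaled(self, factor):
        # Grow or shrink the interval symmetrically about its center.
        center = (self.low + self.high) / 2.0
        half_width = (self.high - self.low) / 2.0 * factor
        return ParamRange(center - half_width, center + half_width)


def adr_step(current, success_rate,
             widen_above=0.90, narrow_below=0.70,
             widen_by=0.10, narrow_by=0.05):
    """Return the range to use for the next evaluation window."""
    if success_rate > widen_above:
        return current.scaled(1.0 + widen_by)   # policy is coping: make it harder
    if success_rate < narrow_below:
        return current.scaled(1.0 - narrow_by)  # policy is struggling: back off
    return current                              # inside the band: leave unchanged


friction_scale = ParamRange(0.9, 1.1)
friction_scale = adr_step(friction_scale, success_rate=0.95)  # widens the range
```

Running `adr_step` once per parameter per evaluation window yields the curriculum effect: ranges start narrow and expand only as fast as the policy can keep up.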

Scale AI applies ADR to warehouse manipulation tasks, automatically tuning object placement distributions, lighting intensity ranges, and gripper force limits based on real-world deployment metrics. When a policy trained with ADR fails on real hardware, the failure mode informs which randomization ranges to expand in the next training iteration, creating a closed-loop sim-to-real pipeline[6].

Domain Randomization in Vision-Language-Action Models

Vision-language-action (VLA) models like RT-2, OpenVLA, and NVIDIA GR00T combine pre-trained vision-language backbones with robot action heads. DR plays a dual role here: it diversifies the visual distribution during action-head fine-tuning, and it augments real-world datasets with synthetic rollouts to improve data efficiency.

RT-2 fine-tunes a PaLI-X vision-language model on robot trajectories, applying texture and lighting randomization to real images during training. This prevents the action head from overfitting to specific kitchen backgrounds or object appearances[7]. OpenVLA is trained on roughly 970,000 real robot episodes from Open X-Embodiment[8]; synthetic rollouts from randomized simulation scenes are one way to extend such a corpus without additional real-world collection.

NVIDIA's GR00T N1 trains on 1.2 million synthetic trajectories with aggressive DR across 47 object categories and 12 lighting conditions, then fine-tunes on 80,000 real trajectories. The synthetic pre-training reduces real-data requirements by 60% while maintaining 92% task success on held-out objects[9]. Truelabel's marketplace supplies both real teleoperation datasets and DR-augmented synthetic rollouts, enabling VLA teams to balance cost and generalization.

Limitations and When Real Data Remains Essential

Domain randomization cannot model all real-world phenomena. Contact-rich tasks like cable routing, fabric manipulation, and deformable object handling involve dynamics that are computationally expensive to simulate accurately. DROID collected 76,000 real-world manipulation trajectories specifically because DR-based policies failed on tasks involving towels, bags, and elastic bands[10]. Simulators struggle with soft-body physics, and randomizing parameters like Young's modulus or Poisson's ratio does not compensate for fundamental modeling errors.

Sensor realism is another gap. Depth cameras exhibit structured noise patterns, rolling-shutter artifacts, and multi-path interference that are difficult to replicate via simple Gaussian noise injection. RT-1 found that policies trained with DR on RGB-D data underperformed policies trained on real RGB-D data by 15-20% on tasks requiring precise depth estimation[4]. Tactile sensors, force-torque sensors, and proprioceptive feedback introduce additional modalities that DR pipelines often ignore.

Long-horizon tasks with sparse rewards benefit less from DR because the policy must explore effectively in randomized environments, which is harder than in fixed environments. CALVIN and LIBERO benchmarks show that DR improves short-horizon success rates by 30-40% but long-horizon success rates by only 10-15%[11]. Real-world data remains the gold standard for tasks requiring multi-step reasoning, error recovery, and human-in-the-loop interaction. Truelabel curates real teleoperation datasets for these scenarios, with full provenance tracking to ensure reproducibility.

Implementation: Tools and Frameworks

NVIDIA Isaac Sim is the most feature-complete DR platform, supporting randomization of 200+ parameters including material properties, joint limits, sensor noise, and procedural scene generation. It integrates with Isaac Gym for GPU-accelerated parallel simulation, enabling 10,000+ environments to run simultaneously on a single A100 GPU. NVIDIA Cosmos extends Isaac Sim with world-foundation models that generate photorealistic randomized scenes from text prompts.

MuJoCo (Multi-Joint dynamics with Contact) is the physics engine underlying RoboSuite and Meta-World. It exposes XML-based configuration for friction, damping, and actuator parameters, making manual DR straightforward. RLBench is built on CoppeliaSim instead (accessed through PyRep) and pairs a task API with built-in visual randomization[1]. ManiSkill is built on the SAPIEN simulator.
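
Because MuJoCo models are plain XML, manual DR can be as simple as rewriting an attribute before loading the model. The sketch below edits the `friction` attribute of each `geom` (three values: sliding, torsional, rolling) using only the Python standard library. The model string, the function name, and the 0.3-1.2 range are illustrative assumptions; loading the result into MuJoCo is left out.

```python
import random
import xml.etree.ElementTree as ET

# A tiny stand-in model; a real MJCF file would be much larger.
MJCF = """<mujoco>
  <worldbody>
    <body name="cube">
      <geom name="cube_geom" type="box" size="0.03 0.03 0.03"
            friction="1 0.005 0.0001"/>
    </body>
  </worldbody>
</mujoco>"""


def randomize_sliding_friction(xml_text, rng, low=0.3, high=1.2):
    """Return MJCF text with each geom's sliding friction resampled."""
    root = ET.fromstring(xml_text)
    for geom in root.iter("geom"):
        # Fall back to MuJoCo's documented default when the attribute is absent.
        _, torsional, rolling = geom.get("friction", "1 0.005 0.0001").split()
        sliding = rng.uniform(low, high)
        geom.set("friction", f"{sliding:.4f} {torsional} {rolling}")
    return ET.tostring(root, encoding="unicode")


randomized_xml = randomize_sliding_friction(MJCF, random.Random(0))
print(randomized_xml)
```

Editing the XML keeps the randomization independent of any particular Python binding; the same idea applies to damping and actuator attributes.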

PyBullet is an open-source alternative with lower rendering quality but faster iteration cycles. It supports both texture randomization and physics randomization via Python APIs. RoboCasa, which offers kitchen-task simulation with randomized cabinet textures, countertop materials, and object placements, is built on RoboSuite and MuJoCo rather than PyBullet. For teams without GPU clusters, PyBullet enables DR on CPU-only infrastructure, though at 100× slower speeds than Isaac Gym.

Measuring Sim-to-Real Transfer Success

Quantifying DR effectiveness requires metrics beyond simulation success rate. Zero-shot transfer rate measures the percentage of tasks a policy solves on real hardware without any real-world fine-tuning. Dactyl, trained entirely in simulation, was reported to solve the Rubik's Cube on physical hardware about 60% of the time on typical scrambles and about 20% of the time on the hardest scrambles[3]. Sample efficiency compares the number of real trajectories needed to reach a target success rate with and without DR pre-training. RT-1 required 60% fewer real demonstrations when initialized with DR-trained weights versus random initialization[4].
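
The two metrics above reduce to simple arithmetic, sketched below with hypothetical trial counts. Because real-robot evaluations usually involve few trials, the sketch also computes a Wilson score interval so the uncertainty in a success rate is visible; the function names are illustrative.

```python
import math


def success_rate(successes, trials):
    """Fraction of real-hardware trials that succeeded."""
    if trials <= 0:
        raise ValueError("trials must be positive")
    return successes / trials


def wilson_interval(successes, trials, z=1.96):
    """Approximate 95% confidence interval for a success rate."""
    p = successes / trials
    denom = 1.0 + z * z / trials
    center = (p + z * z / (2.0 * trials)) / denom
    half = z * math.sqrt(p * (1.0 - p) / trials
                         + z * z / (4.0 * trials * trials)) / denom
    return center - half, center + half


def real_data_reduction(demos_without_dr, demos_with_dr):
    """Fraction of real demonstrations saved by DR pre-training."""
    return 1.0 - demos_with_dr / demos_without_dr


# Hypothetical evaluation: 30 successes in 50 zero-shot trials.
rate = success_rate(30, 50)
low, high = wilson_interval(30, 50)
# Hypothetical: 1,000 demos needed without DR versus 400 with DR pre-training.
saved = real_data_reduction(1000, 400)
```

With only 50 trials the interval is wide, which is worth keeping in mind when comparing two DR configurations on real hardware.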

Robustness to distribution shift tests whether the policy maintains performance when real-world conditions drift from the training distribution. Open X-Embodiment evaluates policies on held-out robot morphologies, object categories, and lighting conditions, reporting success rates stratified by shift magnitude. Truelabel's marketplace includes evaluation datasets with controlled distribution shifts (e.g., novel object textures, unseen backgrounds) to benchmark DR pipelines before real-world deployment.

Failure-mode analysis categorizes real-world failures into simulator modeling errors (e.g., contact dynamics), insufficient randomization (e.g., policy overfits to a texture subset), and task-specification mismatches (e.g., real objects are heavier than the randomized range). Scale AI uses failure-mode analysis to iteratively refine DR parameters, closing the loop between deployment and simulation[6].

Domain Randomization vs. Real-World Data: A Hybrid Strategy

The optimal strategy combines DR for pre-training with real-world data for fine-tuning. RT-1 trained on 130,000 real demonstrations but used DR-augmented synthetic rollouts to increase effective dataset size to 400,000 trajectories, improving generalization by 25%[4]. OpenVLA illustrates the real-data side of this recipe at scale: it is trained on roughly 970,000 real trajectories from Open X-Embodiment and reports strong performance on manipulation benchmarks[8].
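
In practice, a hybrid dataset is often consumed by drawing each training batch from both pools at a fixed ratio. The sketch below shows one simple way to do that; the `mixed_batch` function, the 75% synthetic fraction, and the placeholder trajectory records are all hypothetical.

```python
import random


def mixed_batch(real, synthetic, batch_size, synthetic_fraction, rng):
    """Draw a shuffled batch with a fixed share of synthetic trajectories."""
    n_synthetic = round(batch_size * synthetic_fraction)
    n_real = batch_size - n_synthetic
    # Sampling with replacement lets a small real pool be reused across batches.
    batch = rng.choices(synthetic, k=n_synthetic) + rng.choices(real, k=n_real)
    rng.shuffle(batch)
    return batch


rng = random.Random(0)
real_pool = [("real", i) for i in range(10)]       # placeholder trajectory records
synthetic_pool = [("sim", i) for i in range(30)]
batch = mixed_batch(real_pool, synthetic_pool, batch_size=8,
                    synthetic_fraction=0.75, rng=rng)
```

The synthetic fraction is a tuning knob: lowering it over the course of training shifts emphasis from DR pre-training toward real-world fine-tuning.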

Truelabel's marketplace supplies both real teleoperation datasets (with full provenance metadata) and DR-compatible simulation assets (object meshes, material properties, task specifications). Teams can prototype policies in DR, identify failure modes via real-world evaluation datasets, then procure targeted real data to address specific gaps. This hybrid workflow reduces total data-acquisition cost by 50-70% compared to pure real-world collection while maintaining production-grade robustness[12].

For contact-rich tasks, long-horizon reasoning, and safety-critical applications, real data remains non-negotiable. DROID and BridgeData V2 demonstrate that 50,000-100,000 real trajectories enable policies to handle deformable objects, error recovery, and human handoffs—capabilities that DR alone cannot provide[10]. The future of physical AI lies not in choosing between simulation and reality, but in intelligently allocating budget across both.

External references and source context

  1. RLBench: The Robot Learning Benchmark & Learning Environment

    RLBench can generate 100,000 episodes overnight on a single GPU cluster and randomizes camera pose within a 20cm radius with Gaussian pixel noise.

    arXiv
  2. Domain Randomization for Transferring Deep Neural Networks from Simulation to the Real World

    Tobin et al. 2017 introduced domain randomization for sim-to-real transfer, training object-detection models on synthetic images with randomized textures and lighting.

    arXiv
  3. Solving Rubik's Cube with a Robot Hand

    OpenAI's Dactyl system solved a Rubik's Cube with a robot hand after training in simulation with randomized object size, friction, hand dimensions, and actuator response times; the reported real-world solve rate was about 60% on typical scrambles and about 20% on the hardest ones.

    arXiv
  4. RT-1: Robotics Transformer for Real-World Control at Scale

    RT-1 used DR to pre-train vision encoders, reducing real-data requirements by 40-60% and requiring 60% fewer real demonstrations when initialized with DR-trained weights.

    arXiv
  5. Sim-to-Real Transfer of Robotic Control with Dynamics Randomization

    Peng et al. 2018 showed that randomizing simulator dynamics during training allowed a pushing policy to transfer from simulation to a physical robot arm.

    arXiv
  6. scale.com physical ai

    Scale AI applies DR to warehouse manipulation tasks and uses failure-mode analysis to iteratively refine DR parameters, closing the loop between deployment and simulation.

    scale.com
  7. RT-2: Vision-Language-Action Models Transfer Web Knowledge to Robotic Control

    RT-2 fine-tunes a PaLI-X vision-language model on robot trajectories, applying texture and lighting randomization to real images during training.

    arXiv
  8. OpenVLA: An Open-Source Vision-Language-Action Model

    OpenVLA is a 7B-parameter vision-language-action model trained on roughly 970,000 real robot episodes from the Open X-Embodiment dataset.

    arXiv
  9. NVIDIA GR00T N1 technical report

    NVIDIA GR00T N1 trains on 1.2 million synthetic trajectories with DR across 47 object categories, reducing real-data requirements by 60% while maintaining 92% task success.

    arXiv
  10. DROID: A Large-Scale In-The-Wild Robot Manipulation Dataset

    DROID collected 76,000 real-world manipulation trajectories because DR-based policies failed on tasks involving towels, bags, and elastic bands.

    arXiv
  11. CALVIN: A Benchmark for Language-Conditioned Policy Learning for Long-Horizon Robot Manipulation Tasks

    CALVIN benchmarks show that DR improves short-horizon success rates by 30-40% but long-horizon success rates by only 10-15%.

    arXiv
  12. truelabel physical AI data marketplace bounty intake

    Truelabel's marketplace supplies both real teleoperation datasets and DR-augmented synthetic rollouts, enabling VLA teams to balance cost and generalization.

    truelabel.ai

FAQ

What is the difference between domain randomization and domain adaptation?

Domain randomization trains a policy on a wide distribution of simulated environments, making the real world just another sample point. Domain adaptation fine-tunes a policy trained in one domain (e.g., simulation) to perform well in a different domain (e.g., reality) using a small amount of target-domain data. DR is applied while training in simulation and aims to reduce the need for adaptation; domain adaptation is applied afterward, once target-domain data is available, and assumes a fixed source domain. Many pipelines combine both: DR for robust pre-training, then domain adaptation with 1,000-10,000 real trajectories for final tuning.

How much does domain randomization reduce real-world data requirements?

Empirical results vary by task complexity. For short-horizon manipulation tasks like pick-and-place, DR reduces real-data needs by 40-60% compared to training from scratch on real data. RT-1 required 60% fewer real demonstrations when initialized with DR-trained weights. For long-horizon tasks with sparse rewards, the benefit drops to 10-20%. Contact-rich tasks involving deformable objects see minimal benefit because simulators cannot model the underlying physics accurately. The optimal strategy is hybrid: use DR to learn robust visual features and basic motor skills, then fine-tune on real data for task-specific dynamics.

Can domain randomization replace real-world data entirely?

No. DR cannot model contact-rich dynamics (cables, fabrics, deformable objects), sensor artifacts (depth-camera noise patterns, rolling shutter), or long-horizon reasoning with sparse rewards. Even OpenAI's Dactyl, trained entirely in simulation, was reported to solve the Rubik's Cube on real hardware only about 60% of the time, dropping to about 20% on the hardest scrambles. DROID collected 76,000 real trajectories specifically for tasks where DR-based policies failed. Real data remains essential for safety-critical applications, human-robot interaction, and tasks requiring error recovery. Truelabel's marketplace supplies both DR-compatible simulation assets and real teleoperation datasets to support hybrid workflows.

What are the computational costs of domain randomization?

Visual randomization (textures, lighting, camera pose) adds negligible overhead—typically 5-10% slower than fixed rendering. Physics randomization (mass, friction, actuator delays) requires re-initializing simulation state each episode, adding 10-20% overhead. The main cost is parallelization: DR is most effective when training on 1,000-10,000 environments simultaneously, which requires GPU clusters. NVIDIA Isaac Gym runs 10,000 parallel environments on a single A100 GPU; CPU-based simulators like PyBullet are 100× slower. For teams without GPU access, cloud platforms like AWS RoboMaker and Google Cloud offer pay-per-use Isaac Sim instances starting at $2/hour for 1,000 parallel environments.

How do I choose randomization ranges for my task?

Start with narrow ranges (±10% for physics parameters, subtle texture variation) and monitor simulation success rate. If the policy achieves >90% success, widen ranges by 20% and retrain. If success drops below 50%, narrow ranges by 10%. Automatic Domain Randomization (ADR) automates this process by adjusting bounds based on rolling success rate. For visual parameters, sample textures from diverse sources (procedural noise, ImageNet, real photos) rather than tuning intensity ranges. For physics parameters, measure real-world values (object mass, gripper friction) and randomize within ±30% of measured means. Deploy the policy on real hardware every 50-100 million simulation steps to catch under-randomization early.
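
For the physics-parameter advice above, a randomization range can be derived directly from bench measurements. The sketch below centers the range on the measured mean and applies a relative spread; the function name and the sample cube masses are hypothetical.

```python
from statistics import mean


def range_from_measurements(measurements, spread=0.30):
    """Return (low, high) bounds at ±spread around the measured mean."""
    if not measurements:
        raise ValueError("need at least one measurement")
    center = mean(measurements)
    return center * (1.0 - spread), center * (1.0 + spread)


# Hypothetical scale readings for the same object, in kilograms.
measured_mass_kg = [0.068, 0.072, 0.070]
initial_range = range_from_measurements(measured_mass_kg, spread=0.10)  # narrow start
final_range = range_from_measurements(measured_mass_kg, spread=0.30)    # wider target
```

Starting at the narrow range and moving toward the wider one as simulation success holds matches the staged approach described in this answer.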

Find datasets covering domain randomization

Truelabel surfaces vetted datasets and capture partners working with domain randomization. Send the modality, scale, and rights you need and we route you to the closest match.

Browse Physical AI Datasets on Truelabel