Physical AI Data Glossary
RGB-D Data
RGB-D data combines a standard RGB color image with a spatially aligned depth map, where each pixel stores metric distance from the camera to the surface. This multimodal format enables robots to perceive both visual appearance and 3D geometry simultaneously, making it the dominant modality for indoor manipulation, navigation, and scene understanding in physical AI systems.
Quick facts
- Term: RGB-D Data
- Domain: Robotics and physical AI
- Last reviewed: 2025-05-15
What RGB-D Data Represents in Physical AI Systems
RGB-D data encodes both photometric and geometric information in a single synchronized capture. The RGB channels store standard 8-bit color values (red, green, blue) per pixel, while the depth channel stores a floating-point or 16-bit integer distance measurement in millimeters or meters[1]. Given camera intrinsic parameters (focal length, principal point, distortion coefficients), each depth pixel can be backprojected into 3D space, transforming the 2D image into a metric point cloud.
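A minimal backprojection sketch makes this concrete. It assumes an undistorted pinhole model with intrinsics fx, fy, cx, cy and a depth map already in meters, which simplifies a full calibration pipeline (real sensors require undistortion first).

```python
import numpy as np

def backproject(depth_m, fx, fy, cx, cy):
    """Lift a depth map (meters) to a 3D point cloud in the camera frame."""
    h, w = depth_m.shape
    u, v = np.meshgrid(np.arange(w), np.arange(h))
    z = depth_m
    x = (u - cx) * z / fx
    y = (v - cy) * z / fy
    points = np.stack([x, y, z], axis=-1)        # (H, W, 3) in camera coordinates
    valid = np.isfinite(z) & (z > 0)             # drop NaN / zero (invalid) pixels
    return points[valid]                         # (N, 3) metric point cloud

# Example: a flat wall 1.5 m away seen by a camera with a 615 px focal length.
depth = np.full((480, 640), 1.5, dtype=np.float32)
cloud = backproject(depth, fx=615.0, fy=615.0, cx=320.0, cy=240.0)
```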
This dual representation solves a fundamental limitation of monocular vision: scale ambiguity. A small object close to the camera produces the same 2D projection as a large object far away. Depth measurements break this degeneracy, enabling robots to compute object dimensions, plan collision-free paths, and position end-effectors at precise 3D coordinates. Scale AI's physical AI platform processes RGB-D streams for manipulation policy training, while NVIDIA Cosmos world foundation models use depth as a geometric prior for scene generation[2].
The spatial alignment requirement is critical: RGB and depth frames must be captured simultaneously and registered to the same coordinate frame. Misalignment of even 2-3 pixels degrades downstream tasks like grasp detection and semantic segmentation. Modern sensors achieve sub-pixel registration through factory calibration, but multi-sensor annotation platforms still require manual verification of alignment quality during dataset curation.
Depth Acquisition Technologies and Trade-offs
Three primary technologies generate depth maps: structured light, time-of-flight (ToF), and stereo triangulation. Structured light projectors (used in the early Microsoft Kinect and the Structure Sensor) emit infrared patterns and compute depth from pattern deformation. ToF sensors (Azure Kinect, Helios2) measure the round-trip time of modulated light pulses. Stereo systems (ZED cameras, OAK-D, and the Intel RealSense D400 series, which adds an infrared projector for active stereo) compute disparity between two synchronized cameras.
Each technology exhibits distinct failure modes. Structured light fails on transparent, reflective, or black surfaces that absorb infrared. ToF suffers from multipath interference in corners and exhibits range-dependent noise (±10mm at 1m, ±40mm at 4m for Azure Kinect). Stereo requires textured surfaces and fails in low-light or repetitive-pattern environments. DROID's 350-hour manipulation dataset documents per-scene depth quality metrics, showing 18% invalid pixel rates in kitchen environments with glossy countertops[3].
Outdoor robotics typically uses LiDAR instead of RGB-D due to range limitations (most RGB-D sensors cap at 3-6 meters). Waymo Open Dataset combines 64-beam LiDAR with five RGB cameras but does not provide dense RGB-D alignment. Indoor datasets like ScanNet use Structure Sensor or RealSense for room-scale reconstruction, achieving 5cm mesh accuracy across 1,513 scenes.
Standard File Formats and Storage Architectures
RGB-D datasets use three dominant storage patterns. ROS bag files store synchronized RGB and depth topics as compressed image messages; they are widely used in robotics research but require ROS tooling for access. HDF5 hierarchical containers group RGB arrays, depth arrays, and metadata in a single file with chunked compression, enabling random access without loading full trajectories. MCAP is a newer container format designed for multi-sensor logs, offering 40% smaller files than ROS bags and native Parquet export[4].
Depth encoding varies by precision requirements. Millimeter-precision datasets store depth as uint16 (0-65,535mm range), while research datasets often use float32 for sub-millimeter accuracy or to encode invalid pixels as NaN. EPIC-KITCHENS-100 stores depth as 16-bit PNG with a 0.1mm scale factor, achieving 10:1 compression over raw float arrays. LeRobot's dataset schema uses Parquet for tabular metadata and separate HDF5 files for image tensors, balancing query performance with storage density.
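The uint16 pattern is straightforward to implement. The sketch below assumes OpenCV for PNG I/O and a 0.1 mm scale factor; the scale factor is a dataset convention that must be recorded in metadata, because nothing in the file itself encodes it.

```python
import numpy as np
import cv2

DEPTH_SCALE_M = 0.0001   # assumed convention: one uint16 step = 0.1 mm

def save_depth_png(depth_m, path):
    # Map invalid pixels (NaN or <= 0) to 0, the conventional "no measurement" value.
    counts = np.nan_to_num(depth_m, nan=0.0) / DEPTH_SCALE_M
    cv2.imwrite(path, np.clip(np.round(counts), 0, 65535).astype(np.uint16))  # lossless 16-bit PNG

def load_depth_png(path):
    counts = cv2.imread(path, cv2.IMREAD_UNCHANGED).astype(np.float32)
    depth_m = counts * DEPTH_SCALE_M
    depth_m[counts == 0] = np.nan        # restore invalid pixels
    return depth_m
```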
Point cloud representations (PCD, LAS, PLY) are common for scene reconstruction but rare in manipulation datasets, where per-pixel correspondence to RGB is essential. PointNet architectures can consume raw point clouds, but most manipulation policies require dense depth maps to preserve spatial locality for convolutional encoders.
Annotation Workflows for Manipulation Datasets
RGB-D annotation combines 2D image labeling with 3D geometric constraints. Semantic segmentation masks must align precisely with depth discontinuities — a 2-pixel mask error at an object boundary translates to 5-10cm position error in 3D. Segments.ai's multi-sensor platform overlays 2D polygons on RGB while displaying the corresponding 3D point cloud, enabling annotators to verify mask boundaries against geometric edges.
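As a rough sanity check on those numbers: the in-plane 3D error from a boundary error of a few pixels scales with depth over focal length, and sampling depth on the wrong side of a discontinuity adds the full foreground-to-background gap. The sketch below assumes a pinhole model and illustrative values.

```python
import math

def mask_error_to_3d(boundary_error_px, depth_m, fx_px, depth_jump_m=0.0):
    """Approximate 3D error caused by a 2D mask boundary error (pinhole model)."""
    lateral_m = boundary_error_px * depth_m / fx_px     # in-plane error
    return math.hypot(lateral_m, depth_jump_m)          # add the depth jump if the error crosses an edge

# 2 px at 1 m with fx = 600 px is only ~3 mm in-plane ...
print(mask_error_to_3d(2, 1.0, 600.0))          # ~0.003 m
# ... but crossing an 8 cm foreground/background gap dominates the total error.
print(mask_error_to_3d(2, 1.0, 600.0, 0.08))    # ~0.08 m
```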
Grasp pose annotation requires 6-DOF labels (3D position + 3-axis rotation) that respect depth geometry. Annotators place virtual gripper models in the 3D scene, with the platform automatically snapping to valid surface points using depth data. DexYCB dataset provides 582,000 grasp annotations with sub-centimeter accuracy by using depth-based collision checking during labeling. Invalid grasps (gripper intersecting object mesh) are flagged automatically, reducing annotation error rates from 12% to 3%[5].
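A simplified version of the surface-snapping step might look like the following sketch. It assumes a pinhole camera and a depth map in meters, and uses a local median as a stand-in for whatever surface fitting a production annotation tool applies.

```python
import numpy as np

def snap_to_surface(grasp_xyz, depth_m, fx, fy, cx, cy, window=5):
    """Replace a candidate grasp position with the nearest measured surface point."""
    x, y, z = grasp_xyz
    u = int(round(fx * x / z + cx))                      # project into the depth image
    v = int(round(fy * y / z + cy))
    patch = depth_m[max(v - window, 0): v + window + 1,
                    max(u - window, 0): u + window + 1]
    valid = patch[np.isfinite(patch) & (patch > 0)]
    if valid.size == 0:
        return None                                      # no surface measurement nearby
    z_surf = float(np.median(valid))                     # robust local surface depth
    return np.array([(u - cx) * z_surf / fx,             # backproject at the snapped depth
                     (v - cy) * z_surf / fy,
                     z_surf])
```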
Kognic's annotation platform supports multi-frame RGB-D sequences, tracking object identities across frames while maintaining 3D consistency. Annotators label keyframes, and the system propagates masks using optical flow and depth-based motion estimation. This semi-automated workflow achieves 4x throughput compared to per-frame manual annotation, critical for the 100+ hour datasets required by modern manipulation policies.
RGB-D in Reinforcement Learning and Imitation Learning
Manipulation policies consume RGB-D data in three primary forms: raw image pairs, voxel grids, or learned embeddings. RT-1 (Robotics Transformer) processes RGB and depth as separate 6-channel inputs to a Vision Transformer, achieving 97% success on 700+ tasks across 13 robots[6]. OpenVLA concatenates RGB and depth into a 4-channel tensor (RGB + D) before passing to a pretrained vision encoder, reducing parameter count by 30% compared to dual-encoder architectures.
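The 4-channel fusion pattern is easy to sketch generically. The PyTorch module below is illustrative only: the layer sizes and the 4 m depth normalization are assumptions, not any published model's architecture.

```python
import torch
import torch.nn as nn

class RGBDEncoder(nn.Module):
    """Toy encoder over a fused 4-channel (RGB + depth) observation."""
    def __init__(self, embed_dim=512, max_depth_m=4.0):
        super().__init__()
        self.max_depth_m = max_depth_m
        self.backbone = nn.Sequential(
            nn.Conv2d(4, 32, kernel_size=7, stride=2, padding=3),   # 4 input channels: R, G, B, D
            nn.ReLU(),
            nn.Conv2d(32, 64, kernel_size=3, stride=2, padding=1),
            nn.ReLU(),
            nn.AdaptiveAvgPool2d(1),
            nn.Flatten(),
            nn.Linear(64, embed_dim),
        )

    def forward(self, rgb, depth):
        # rgb: (B, 3, H, W) in [0, 1]; depth: (B, 1, H, W) in meters.
        depth = torch.clamp(depth, 0.0, self.max_depth_m) / self.max_depth_m
        return self.backbone(torch.cat([rgb, depth], dim=1))

emb = RGBDEncoder()(torch.rand(2, 3, 224, 224), torch.rand(2, 1, 224, 224) * 3.0)
```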
Depth provides critical inductive bias for sim-to-real transfer. Domain randomization varies RGB appearance (lighting, textures) but preserves geometric structure in depth, enabling policies to generalize across visual domains while maintaining spatial reasoning. CALVIN benchmark demonstrates 40% higher sim-to-real success rates when policies train on RGB-D versus RGB-only, particularly for tasks requiring precise 3D positioning like drawer opening.
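In code, the asymmetry is simply that photometric augmentations touch the RGB channels while depth passes through unchanged. The snippet below assumes torchvision-style transforms; the jitter strengths are illustrative.

```python
import torch
from torchvision.transforms import ColorJitter

color_jitter = ColorJitter(brightness=0.4, contrast=0.4, saturation=0.4, hue=0.1)

def randomize_observation(rgb, depth):
    # rgb: (3, H, W) float tensor in [0, 1]; depth: (1, H, W) in meters.
    return color_jitter(rgb), depth          # photometric noise on RGB only; geometry untouched

rgb_aug, depth_same = randomize_observation(torch.rand(3, 224, 224), torch.rand(1, 224, 224) * 3.0)
```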
RLDS (Reinforcement Learning Datasets) standardizes RGB-D storage for policy training, defining schemas for synchronized observation spaces and action trajectories. Open X-Embodiment aggregates datasets spanning 22 robot embodiments (1M+ episodes) in RLDS format, with 18 of the constituent datasets including depth channels. This standardization enables cross-dataset pretraining, where policies learn geometric priors from diverse RGB-D sources before fine-tuning on target tasks.
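An episode in this style can be pictured as nested observation/action steps. The field names below are illustrative assumptions rather than the exact RLDS or Open X-Embodiment schema.

```python
import numpy as np

rgb_frame = np.zeros((480, 640, 3), dtype=np.uint8)
depth_frame = np.zeros((480, 640), dtype=np.uint16)       # depth in millimeters

episode = {
    "steps": [
        {
            "observation": {
                "image": rgb_frame,
                "depth": depth_frame,
                "camera_intrinsics": [615.0, 615.0, 320.0, 240.0],   # fx, fy, cx, cy
            },
            "action": np.zeros(7, dtype=np.float32),      # e.g. end-effector delta pose + gripper
            "is_terminal": False,
        },
        # ... one entry per timestep
    ],
    "episode_metadata": {"robot": "franka", "task": "open_drawer"},
}
```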
Procurement Considerations for RGB-D Training Data
RGB-D dataset procurement requires validating four technical dimensions: sensor specifications, calibration quality, annotation density, and format compatibility. Sensor specs must match deployment hardware — training on RealSense D435 data (87° H-FOV, 1280×720 depth) then deploying on Azure Kinect (75° H-FOV, 640×576 depth) introduces distribution shift that degrades policy performance by 15-25%[7].
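Resolution mismatch can be partially mitigated by resampling depth and rescaling intrinsics together, though a genuine field-of-view difference (87° versus 75°) cannot be resized away. The sketch below assumes a simple resize with nearest-neighbor interpolation, which avoids blending foreground and background depths at object edges.

```python
import numpy as np
import cv2

def resize_depth_with_intrinsics(depth, fx, fy, cx, cy, new_w, new_h):
    """Resample a depth map and rescale intrinsics consistently."""
    h, w = depth.shape
    sx, sy = new_w / w, new_h / h
    resized = cv2.resize(depth, (new_w, new_h), interpolation=cv2.INTER_NEAREST)
    return resized, (fx * sx, fy * sy, cx * sx, cy * sy)

depth_720p = (np.random.rand(720, 1280).astype(np.float32) * 3.0)
depth_small, intrinsics = resize_depth_with_intrinsics(depth_720p, 640.0, 640.0, 640.0, 360.0, 640, 360)
```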
Calibration artifacts are common in commercial RGB-D datasets. Depth-RGB misalignment exceeding 3 pixels occurs in 8-12% of frames in datasets collected without per-session calibration verification. Truelabel's data provenance system tracks calibration timestamps, sensor serial numbers, and alignment validation metrics, enabling buyers to filter low-quality captures before training. Datasets lacking this metadata require manual quality audits, adding 20-40 hours per 1,000 episodes.
Annotation density varies widely: BridgeData V2 provides sparse keyframe labels (1 per 10 seconds), while DROID offers dense per-frame segmentation masks. Sparse labels suffice for imitation learning with temporal smoothing, but dense labels are mandatory for semantic SLAM and scene understanding tasks. Buyers should specify required annotation frequency in procurement contracts to avoid scope gaps.
Truelabel's physical AI marketplace indexes RGB-D datasets by sensor type, annotation schema, and task domain, with per-episode pricing starting at $0.80 for teleoperation data and $4.50 for densely annotated manipulation sequences. License terms must address derivative model rights — some academic datasets prohibit commercial use of policies trained on their RGB-D data.
External references and source context
- [1] Point Cloud Library documentation. Defines standard 3D data structures and backprojection math. Source: Point Cloud Library.
- [2] NVIDIA GR00T N1 technical report. Details depth integration in world foundation models. Source: arXiv.
- [3] DROID: A Large-Scale In-The-Wild Robot Manipulation Dataset. Reports 18% invalid depth pixel rates in kitchen scenes. Source: arXiv.
- [4] MCAP specification. Details 40% file size reduction versus ROS bags. Source: MCAP.
- [5] DexYCB project site. Reports 3% annotation error rate with depth collision checking. Source: dex-ycb.github.io.
- [6] RT-1: Robotics Transformer for Real-World Control at Scale. Reports 97% success on 700+ tasks across 13 robots. Source: arXiv.
- [7] Crossing the Reality Gap: A Survey on Sim-to-Real Transferability of Robot Controllers in Reinforcement Learning. Documents 15-25% performance degradation from sensor distribution shift. Source: arXiv.
FAQ
What is the difference between RGB-D data and point clouds?
RGB-D data is a dense 2D grid where each pixel has color (RGB) and depth (D) values, maintaining the image structure and pixel correspondence. Point clouds are unordered 3D coordinate sets (X, Y, Z) with optional color attributes, generated by backprojecting RGB-D pixels into 3D space. RGB-D preserves spatial locality for convolutional networks, while point clouds enable rotation-invariant processing with architectures like PointNet. Most manipulation datasets store RGB-D because policies need dense per-pixel features; point clouds are used primarily for scene reconstruction and registration tasks.
Can RGB-D datasets be used for outdoor robotics applications?
RGB-D sensors have limited outdoor utility due to 3-6 meter range constraints and infrared interference from sunlight. Structured light and ToF depth sensors fail in bright outdoor conditions because ambient infrared overwhelms the projected signal. Outdoor robotics relies on LiDAR (50-200m range) combined with separate RGB cameras, not fused RGB-D. However, RGB-D remains viable for outdoor manipulation tasks within arm's reach (bin picking, agricultural harvesting) where the robot operates in the sensor's 0.5-3m sweet spot. Datasets like Waymo Open provide LiDAR + RGB but not aligned RGB-D.
How do I convert RGB-D data between ROS bags and HDF5 formats?
Use the rosbag2 Python API to extract synchronized RGB and depth topics, then write the arrays to HDF5 with h5py. Key steps: read the /camera/color/image_raw and /camera/depth/image_rect topics from the bag, verify that timestamps match within 10ms, decode the image messages to NumPy arrays, and store them in HDF5 datasets with chunking (e.g., chunks=(1, 480, 640) for per-frame access). Preserve camera intrinsics (fx, fy, cx, cy) as HDF5 attributes. For MCAP conversion, use the mcap Python library to write Image and CameraInfo messages. LeRobot provides conversion scripts for common formats in its GitHub repository.
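A hedged end-to-end sketch of those steps, using rosbag2_py and h5py, might look like the following. The topic names, encodings, 10 ms tolerance, and intrinsics are assumptions to adapt to your sensor and driver.

```python
import numpy as np
import h5py
import rosbag2_py
from rclpy.serialization import deserialize_message
from sensor_msgs.msg import Image

def read_topic(bag_path, topic):
    """Return a list of (timestamp_ns, image_array) for one image topic in a ROS 2 bag."""
    reader = rosbag2_py.SequentialReader()
    reader.open(
        rosbag2_py.StorageOptions(uri=bag_path, storage_id="sqlite3"),
        rosbag2_py.ConverterOptions(input_serialization_format="cdr",
                                    output_serialization_format="cdr"),
    )
    frames = []
    while reader.has_next():
        name, raw, stamp_ns = reader.read_next()
        if name != topic:
            continue
        msg = deserialize_message(raw, Image)
        depth_like = msg.encoding == "16UC1"              # assumes rgb8 or 16UC1, no row padding
        dtype, channels = (np.uint16, 1) if depth_like else (np.uint8, 3)
        img = np.frombuffer(bytes(msg.data), dtype=dtype).reshape(msg.height, msg.width, channels)
        frames.append((stamp_ns, img.squeeze()))
    return frames

def convert(bag_path, out_path, tol_ns=10_000_000):        # 10 ms pairing tolerance
    color = read_topic(bag_path, "/camera/color/image_raw")
    depth = read_topic(bag_path, "/camera/depth/image_rect")
    pairs = []
    for tc, c in color:                                    # nearest-timestamp pairing
        td, d = min(depth, key=lambda item: abs(item[0] - tc))
        if abs(td - tc) <= tol_ns:
            pairs.append((c, d))
    rgb = np.stack([c for c, _ in pairs])
    dep = np.stack([d for _, d in pairs])
    with h5py.File(out_path, "w") as f:
        f.create_dataset("rgb", data=rgb, chunks=(1, *rgb.shape[1:]), compression="gzip")
        f.create_dataset("depth", data=dep, chunks=(1, *dep.shape[1:]), compression="gzip")
        f.attrs["intrinsics"] = [615.0, 615.0, 320.0, 240.0]   # fx, fy, cx, cy (example values)
```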
What annotation tools support RGB-D data labeling?
Segments.ai, CVAT, Labelbox, and V7 Darwin support RGB-D annotation with 3D visualization. Segments.ai overlays 2D masks on RGB while displaying the corresponding point cloud, enabling annotators to verify boundaries against depth discontinuities. CVAT supports depth as a separate layer but lacks native 3D viewers. Labelbox and V7 offer 3D cuboid tools that snap to depth surfaces. For manipulation-specific workflows (grasp poses, 6-DOF annotations), specialized tools like Kognic or custom Unity-based annotators are common. Open-source options include Label Studio with custom 3D plugins, though setup requires engineering effort.
How much RGB-D training data is required for manipulation policies?
Modern manipulation policies require 50-200 hours of RGB-D teleoperation data for single-task learning, and 500-2,000 hours for multi-task generalist models. RT-1 trained on 130,000 episodes (≈700 hours) across 700 tasks. OpenVLA used 970,000 trajectories from Open X-Embodiment. Data efficiency improves with pretraining: policies initialized on large RGB-D datasets can fine-tune on 10-50 task-specific demonstrations. Quality matters more than quantity — 20 hours of clean, diverse RGB-D data outperforms 100 hours of repetitive or low-variation captures. Budget $3,000-$8,000 per task for custom RGB-D collection and annotation.
What are common failure modes in RGB-D datasets that affect policy training?
Five critical failure modes: (1) depth-RGB misalignment from poor calibration, causing 3D position errors; (2) invalid depth pixels (NaN or zero values) on reflective/transparent surfaces, creating holes in point clouds; (3) temporal desynchronization where RGB and depth frames are captured 30-50ms apart, introducing motion artifacts; (4) depth noise at range limits (>3m) where sensor precision degrades; (5) compression artifacts in depth maps stored as lossy JPEG instead of lossless PNG. Policies trained on corrupted RGB-D data exhibit 20-40% lower success rates. Truelabel's marketplace filters datasets by calibration quality and invalid-pixel percentage to surface clean training data.
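A lightweight per-frame check for several of these failure modes can be scripted directly; the thresholds below are illustrative assumptions rather than standard values.

```python
import numpy as np

def depth_frame_report(depth_m, rgb_stamp_ns, depth_stamp_ns, max_range_m=3.0):
    """Flag invalid pixels, out-of-range depth, and RGB/depth desynchronization for one frame."""
    invalid = ~np.isfinite(depth_m) | (depth_m <= 0)
    report = {
        "invalid_pixel_fraction": float(invalid.mean()),
        "beyond_range_fraction": float((depth_m > max_range_m).mean()),
        "rgb_depth_desync_ms": abs(rgb_stamp_ns - depth_stamp_ns) / 1e6,
    }
    report["pass"] = (report["invalid_pixel_fraction"] < 0.15
                      and report["rgb_depth_desync_ms"] < 10.0)
    return report
```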
Find datasets covering RGB-D data
Truelabel surfaces vetted datasets and capture partners working with RGB-D data. Send us the modality, scale, and rights you need, and we'll route you to the closest match.
Browse RGB-D Datasets on Truelabel