Humanoid Robot Training Data: Whole-Body Teleoperation at Scale

Humanoid robot policies require whole-body teleoperation trajectories that capture coordinated locomotion, torso stabilization, and bimanual manipulation—data categories absent from tabletop manipulation datasets. Truelabel's marketplace connects buyers to 20,000+ verified collectors with embodiment-specific capture pipelines, delivering RLDS-formatted episodes with full kinematic chains, force-torque streams, and egocentric vision at 30–60 Hz across indoor/outdoor environments.

Updated 2025-03-15

By Truelabel Team

Reviewed by Truelabel Team · Mar 15, 2025

Quick facts

Topic: Humanoid Robot Training Data
Audience: Procurement leads, ML ops, robotics engineers
Deliverable: Buyer-facing reference + procurement guidance

Why Whole-Body Trajectories Define Humanoid Data Requirements

Humanoid robots operate across a 30+ degree-of-freedom action space spanning leg joints, torso actuators, shoulder/elbow/wrist assemblies, and end-effectors. NVIDIA's GR00T N1 technical report documents that whole-body policies trained on heterogeneous data pyramids—mixing real teleoperation, human video, and synthetic rollouts—achieve 73% task success on unseen manipulation primitives, compared to 41% for upper-body-only baselines^[1]. The performance gap stems from coupling: locomotion stability directly affects arm workspace reachability, and torso orientation determines bimanual grasp feasibility.

Figure AI's partnership with Brookfield to deploy humanoids in logistics facilities underscores the commercial urgency. Their Helix VLM-diffusion architecture requires thousands of hours of full-body teleoperation to learn coordinated walking-while-carrying behaviors that tabletop datasets cannot provide. Standard manipulation corpora like BridgeData V2 contain 60,000+ episodes but zero locomotion trajectories^[2]. DROID's 76,000 manipulation demonstrations similarly exclude base movement, leaving a structural gap for humanoid buyers.

Truelabel's marketplace addresses this by routing bounties to collectors with humanoid-compatible teleoperation rigs—exoskeleton interfaces, motion-capture suits, or leader-follower systems—capable of capturing synchronized joint commands, IMU streams, and multi-camera feeds at embodiment-native frequencies.

Embodiment Mismatch: Why Manipulation Datasets Fail Humanoid Generalization

Open datasets aggregate trajectories across heterogeneous embodiments without preserving kinematic correspondence. Open X-Embodiment pools 527 skills from 22 robot types but provides no whole-body humanoid episodes^[3]. RT-X models trained on this corpus generalize across tabletop manipulators but cannot transfer to bipedal platforms because the action space lacks leg joint targets and balance constraints.

RoboNet demonstrates the embodiment-transfer problem at scale: 15 million frames from 7 robot arms achieve 68% cross-embodiment success on pick-place tasks, but the dataset contains zero humanoid morphologies^[4]. When researchers attempt sim-to-real transfer using domain randomization techniques, they report 40–60% performance degradation on bipedal platforms due to unmodeled balance dynamics.

Humanoid buyers need datasets where every trajectory includes full kinematic chains: hip/knee/ankle joint angles, torso roll/pitch/yaw, and synchronized arm commands. LeRobot's dataset format supports arbitrary action dimensions but most contributed episodes target 6–7 DOF arms. Truelabel's collector network includes labs operating Franka FR3 Duo mobile bases, Unitree H1 humanoids, and custom bipedal platforms, ensuring action-space alignment for buyer policies.
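
As a rough illustration of what "full kinematic chain" means for an action vector, the sketch below lays out one possible whole-body action using the lower end of the DOF ranges quoted in this article. The group names, ordering, and the `ACTION_LAYOUT` constant are assumptions for illustration, not a truelabel or vendor schema.

```python
import numpy as np

# Hypothetical joint groups; counts use the low end of the DOF ranges in this article.
ACTION_LAYOUT = {
    "legs": 12,      # hips, knees, ankles
    "torso": 3,      # roll, pitch, yaw
    "arms": 14,      # two 7-DOF arms
    "grippers": 2,   # one per hand
}

def action_slices(layout):
    """Map each joint group to its slice in the flat action vector."""
    slices, start = {}, 0
    for name, dof in layout.items():
        slices[name] = slice(start, start + dof)
        start += dof
    return slices, start

def has_full_kinematic_chain(action, layout=ACTION_LAYOUT):
    """True when the last axis of the action has a target for every joint group."""
    _, total = action_slices(layout)
    return action.shape[-1] == total

slices, total_dof = action_slices(ACTION_LAYOUT)
whole_body = np.zeros(total_dof)   # 31-dimensional under the assumed layout
arm_only = np.zeros(8)             # 7-DOF arm plus 1-DOF gripper, as in tabletop data

print(total_dof, has_full_kinematic_chain(whole_body), has_full_kinematic_chain(arm_only))
```

A check like this only confirms dimensionality; it says nothing about whether the leg and torso targets are physically meaningful.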

Teleoperation Modalities: Exoskeletons vs Motion Capture vs Leader-Follower

Whole-body teleoperation pipelines fall into three architectural families, each with distinct data-quality tradeoffs. Exoskeleton interfaces like those used in ALOHA's bimanual setup provide high-fidelity joint-level control but typically omit leg actuation, limiting applicability to mobile humanoids. Motion-capture systems deliver full-body pose at 120+ Hz but require inverse-kinematics solvers to map human skeletons onto robot joint spaces, introducing 5–15° angular error at distal joints^[5].

Leader-follower architectures—where operators control a kinematically similar leader robot that mirrors commands to the follower—preserve embodiment correspondence but double hardware costs. DROID's 76K demonstrations used this approach with Franka arms, achieving 2–3 cm end-effector precision. For humanoids, leader-follower scales poorly: a full-size humanoid leader costs $150K–$400K, restricting collection to well-funded labs.

Claru's kitchen-task teleoperation service combines motion capture with real-time IK correction, delivering whole-body trajectories at $180–$320 per episode-hour depending on environment complexity. Truelabel's marketplace aggregates collectors across all three modalities, letting buyers specify teleoperation method, embodiment type, and capture frequency in bounty requirements. Bounty intake forms include fields for action-space dimensionality, sensor suite (RGB-D, LiDAR, force-torque), and annotation schemas (RLDS, MCAP, HDF5).

Locomotion-Manipulation Coupling: The Coordination Data Gap

Humanoid policies must learn temporal dependencies between base movement and arm trajectories—walking while carrying, stepping to extend reach, or stabilizing torso during bimanual lifts. Existing datasets decouple these modalities: EPIC-KITCHENS-100 captures 90,000 egocentric human actions but lacks robot joint data^[6]. Ego4D's 3,670 hours of video similarly provides human motion context without actionable robot commands.

RT-2's vision-language-action architecture attempts to bridge this gap by pretraining on web video then finetuning on robot data, but the paper reports 22% lower success on mobile-manipulation tasks compared to stationary tabletop scenarios^[7]. The failure mode: policies learn arm motions from internet video but cannot infer when base repositioning is required because training data lacks synchronized locomotion-manipulation examples.

Truelabel's collector network includes operators trained on coordinated whole-body tasks: walking-while-grasping, stair-climbing with object carry, and dynamic balance recovery during bimanual manipulation. Scale AI's Physical AI data engine offers similar services but routes all collection through internal teams, limiting geographic and embodiment diversity. Truelabel's decentralized model connects buyers to 20,000+ collectors worldwide, enabling parallel capture of locomotion-manipulation episodes in diverse environments^[8].

Sensor Fusion Requirements: Vision, Proprioception, and Force-Torque Streams

Humanoid whole-body control requires synchronized multi-modal streams: RGB-D cameras for scene understanding, IMUs for balance estimation, joint encoders for proprioception, and force-torque sensors for contact detection. RLDS (Reinforcement Learning Datasets) defines a trajectory schema supporting arbitrary observation dictionaries, but most public datasets provide only RGB images and joint positions.

DROID includes wrist-mounted RGB-D but omits force-torque and IMU data, preventing policies from learning compliant manipulation or balance recovery^[9]. BridgeData V2 adds third-person camera views but still lacks the 6-axis force-torque readings required for contact-rich humanoid tasks like door opening or tool use. When OpenVLA was trained on this data, the resulting policy achieved 83% success on pick-place but only 34% on contact-rich assembly tasks^[10].

Truelabel's data-quality standards require collectors to timestamp-synchronize all sensor modalities within 10 ms and provide calibration matrices for camera extrinsics. MCAP container format supports this via per-channel schemas and nanosecond timestamps. Buyers specify sensor requirements in bounties: minimum camera resolution (720p/1080p/4K), force-torque sampling rate (100–1000 Hz), and IMU axes (3-axis gyro + 3-axis accelerometer minimum). Scale AI's partnership with Universal Robots demonstrates industrial demand for force-torque data, but their collection pipeline targets stationary arms rather than mobile humanoids.
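
The 10 ms synchronization requirement can be checked with a nearest-timestamp comparison. The following is a minimal sketch, assuming each stream exposes a sorted array of timestamps in seconds; the function name and the example sampling rates are hypothetical and not part of any truelabel tooling.

```python
import numpy as np

def max_sync_offset(reference_ts, stream_ts):
    """Largest gap (seconds) between each reference timestamp and its nearest stream sample.

    Assumes stream_ts holds at least two timestamps.
    """
    reference_ts = np.asarray(reference_ts, dtype=float)
    stream_ts = np.sort(np.asarray(stream_ts, dtype=float))
    # Index of the first stream sample at or after each reference timestamp.
    idx = np.clip(np.searchsorted(stream_ts, reference_ts), 1, len(stream_ts) - 1)
    left, right = stream_ts[idx - 1], stream_ts[idx]
    nearest = np.where(reference_ts - left <= right - reference_ts, left, right)
    return float(np.max(np.abs(reference_ts - nearest)))

TOLERANCE_S = 0.010  # the 10 ms bound stated in the article

joint_ts = np.arange(0.0, 1.0, 0.02)            # joint commands at 50 Hz
ft_ts = np.arange(0.0, 1.0, 0.01) + 0.004       # force-torque at 100 Hz, 4 ms late
cam_ts = np.arange(0.0, 1.0, 1.0 / 30.0)        # camera at 30 Hz

ft_ok = max_sync_offset(joint_ts, ft_ts) <= TOLERANCE_S
cam_ok = max_sync_offset(joint_ts, cam_ts) <= TOLERANCE_S
print(ft_ok, cam_ok)
```

In this example the 30 Hz camera stream fails the check simply because its sample spacing is wider than twice the tolerance, which suggests low-rate streams may need interpolation or hardware triggering to meet a 10 ms bound.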

Environment Diversity: Indoor Clutter vs Outdoor Terrain vs Sim-to-Real Transfer

Humanoid generalization depends on training across environment distributions that span lighting conditions, floor surfaces, obstacle densities, and background clutter. Domain randomization research shows that policies trained on 50+ simulated environment variants achieve 2.3× higher real-world success than single-environment baselines^[11]. Real-world data provides complementary benefits: RT-1's 130K real-robot episodes across 17 months enabled 97% success on seen tasks and 76% on novel objects^[12].

Public humanoid datasets concentrate in laboratory settings with controlled lighting and flat floors. AgiBot World's mobile manipulation corpus includes 40+ indoor scenes but zero outdoor terrain or variable-lighting conditions. RoboCasa's 100 kitchen layouts provide scene diversity but all trajectories use fixed-base arms, not bipedal platforms. When policies trained on these datasets deploy to real logistics facilities, they encounter 30–50% performance drops due to lighting variation, floor texture changes, and dynamic obstacles^[13].

Truelabel's marketplace enables buyers to specify environment distributions in bounties: indoor/outdoor ratios, lighting conditions (daylight/artificial/mixed), floor types (tile/carpet/concrete/gravel), and clutter levels (sparse/moderate/dense). Claru's warehouse teleoperation dataset demonstrates this approach with 500+ episodes across 10 facilities, but availability is limited to their customer base. Truelabel's open marketplace lets any buyer access collectors operating in target environments, from residential kitchens to industrial warehouses to outdoor construction sites.
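
To make the bounty fields discussed so far concrete, here is a sketch of how a buyer might represent requirements in code before submitting them. Every field name and the `BountySpec` class are hypothetical; the allowed values are taken from the options listed in this article, and nothing here reflects an actual truelabel API.

```python
from dataclasses import dataclass, field

TELEOP_METHODS = {"exoskeleton", "mocap", "leader_follower"}
FORMATS = {"RLDS", "MCAP", "HDF5", "LeRobot"}
CLUTTER_LEVELS = {"sparse", "moderate", "dense"}

@dataclass
class BountySpec:
    """Hypothetical bounty requirements; field names are illustrative only."""
    embodiment: str
    action_dim: int
    teleop_method: str
    capture_hz: int
    sensors: list = field(default_factory=list)
    delivery_format: str = "RLDS"
    indoor_ratio: float = 1.0          # fraction of episodes captured indoors
    lighting: list = field(default_factory=lambda: ["artificial"])
    floor_types: list = field(default_factory=lambda: ["tile"])
    clutter: str = "moderate"

    def problems(self):
        """Return a list of human-readable issues; empty means the spec looks consistent."""
        issues = []
        if self.teleop_method not in TELEOP_METHODS:
            issues.append(f"unknown teleop method: {self.teleop_method}")
        if self.delivery_format not in FORMATS:
            issues.append(f"unsupported format: {self.delivery_format}")
        if not 0.0 <= self.indoor_ratio <= 1.0:
            issues.append("indoor_ratio must be between 0 and 1")
        if self.clutter not in CLUTTER_LEVELS:
            issues.append(f"unknown clutter level: {self.clutter}")
        if self.action_dim < 30:
            issues.append("action_dim is below the 30+ DOF this article associates with humanoids")
        return issues

good = BountySpec("Unitree H1", 31, "mocap", 50,
                  sensors=["rgbd", "imu", "force_torque"], delivery_format="MCAP",
                  indoor_ratio=0.7, lighting=["daylight", "artificial"],
                  floor_types=["concrete", "gravel"], clutter="dense")
bad = BountySpec("tabletop arm", 8, "mocap", 30, delivery_format="CSV", indoor_ratio=1.5)
print(good.problems(), len(bad.problems()))
```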

Annotation Schemas: RLDS vs MCAP vs HDF5 for Humanoid Trajectories

Humanoid trajectory data requires container formats that support high-dimensional action spaces, multi-modal observations, and variable episode lengths. RLDS defines a TensorFlow-native schema with nested observation/action/reward dictionaries, used by Open X-Embodiment and BridgeData V2^[14]. MCAP provides a ROS-agnostic binary format with per-message schemas and nanosecond timestamps, adopted by DROID and robotics teams requiring real-time playback^[15].
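
For readers unfamiliar with the RLDS layout, the sketch below builds one episode as plain Python dictionaries and NumPy arrays. The step-level keys (`observation`, `action`, `reward`, `discount`, `is_first`, `is_last`, `is_terminal`) follow the RLDS convention; the observation keys, the 31-dimensional action, and the metadata are assumptions chosen for illustration and do not describe any published dataset.

```python
import numpy as np

ACTION_DIM = 31  # assumed whole-body dimension, not taken from any dataset

def make_step(is_first=False, is_last=False):
    """One RLDS-style step; observation keys are hypothetical."""
    return {
        "observation": {
            "image": np.zeros((224, 224, 3), dtype=np.uint8),
            "joint_positions": np.zeros(ACTION_DIM, dtype=np.float32),
            "imu": np.zeros(6, dtype=np.float32),                 # 3-axis gyro + 3-axis accel
            "wrist_force_torque": np.zeros(6, dtype=np.float32),  # 6-axis F/T
        },
        "action": np.zeros(ACTION_DIM, dtype=np.float32),
        "reward": np.float32(0.0),
        "discount": np.float32(1.0),
        "is_first": is_first,
        "is_last": is_last,
        "is_terminal": False,
    }

def make_episode(num_steps):
    """An episode is a sequence of steps plus free-form metadata."""
    steps = [make_step(is_first=(i == 0), is_last=(i == num_steps - 1))
             for i in range(num_steps)]
    return {"steps": steps, "episode_metadata": {"embodiment": "hypothetical-biped"}}

episode = make_episode(3)
print(len(episode["steps"]), episode["steps"][0]["action"].shape)
```

In a real RLDS dataset the steps would be stored as a nested `tf.data.Dataset` rather than a Python list; the dictionary form here only shows the shape of the schema.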

HDF5 offers hierarchical storage with compression but lacks standardized schemas for robot data—every lab invents custom group structures, breaking cross-dataset compatibility. LeRobot addresses this by defining a Parquet-based schema with mandatory fields for actions, observations, and episode boundaries, plus optional fields for language annotations and camera calibrations^[16]. The format supports arbitrary action dimensions (critical for humanoids) and integrates with Hugging Face Datasets for streaming access.

Truelabel's delivery pipeline converts collector-native formats (ROS bags, custom HDF5, proprietary binaries) into buyer-specified schemas. Bounty specifications include format requirements: RLDS for TensorFlow workflows, MCAP for ROS 2 integration, LeRobot Parquet for Hugging Face training loops, or raw HDF5 with buyer-provided schema definitions. All deliveries include provenance metadata—collector identity, capture timestamps, sensor calibrations, and license terms—enabling audit trails for model commercialization.

Language Annotations: Natural Language Goals vs Structured Task Labels

Vision-language-action models like RT-2 and OpenVLA require natural-language task descriptions paired with trajectories. RT-2 trained on 6,000 language-annotated demonstrations achieved 62% success on novel instructions, compared to 34% for policies trained on structured labels alone^[7]. Language annotations enable compositional generalization: a policy trained on 'pick red cube' and 'place in drawer' can execute 'pick red cube and place in drawer' without explicit training on the combined task.

Public datasets provide inconsistent language coverage. Open X-Embodiment includes language for 60% of episodes but annotation quality varies—some datasets use templated phrases ('pick object'), others use free-form descriptions ('grab the leftmost mug and put it on the shelf')^[3]. DROID omits language entirely, limiting its utility for VLA training. CALVIN provides 24,000 language-annotated episodes but all trajectories use a fixed-base Franka arm in simulated environments^[17].

Truelabel's annotation pipeline supports three language tiers: (1) templated task labels generated from structured metadata ($0.02/episode), (2) human-written free-form descriptions ($0.15–$0.30/episode), and (3) multi-turn dialogue annotations for interactive policies ($0.80–$1.50/episode). Buyers specify language requirements in bounties, and the marketplace routes annotation tasks to collectors with domain expertise—warehouse operators for logistics tasks, kitchen staff for culinary manipulation, construction workers for outdoor assembly.

Sim-to-Real Transfer: Synthetic Data as Humanoid Training Augmentation

Simulation provides effectively unlimited data volume but introduces reality gaps in contact dynamics, visual appearance, and sensor noise. Domain randomization mitigates this by training on distributions of simulated environments, achieving 70–85% real-world success on manipulation tasks^[11]. A 2021 survey found that policies trained on 90% synthetic + 10% real data matched pure-real baselines while reducing collection costs by 8×^[13].

For humanoids, sim-to-real gaps widen due to balance dynamics and ground-contact modeling. GR00T N1 used a three-tier data pyramid: 1 million synthetic episodes (Isaac Sim), 100K human video clips (YouTube), and 10K real teleoperation trajectories. The authors report that removing real data dropped task success from 73% to 51%, while removing synthetic data reduced success to 64%—both modalities are necessary^[1].

Truelabel's marketplace does not provide synthetic data generation but connects buyers to collectors who can capture real-world validation sets for sim-to-real benchmarking. Scale AI's Physical AI platform offers integrated sim-to-real pipelines but requires buyers to adopt their proprietary toolchain. Truelabel's model lets buyers use any simulator (RoboSuite, ManiSkill, Isaac Sim) and source real validation data independently, preserving workflow flexibility. Buyers specify sim-to-real validation requirements in bounties: target environments, task distributions, and minimum episode counts for statistical significance.

Data Licensing: Commercial Rights for Humanoid Foundation Models

Humanoid foundation models require commercial-use licenses for training data, but most open datasets carry research-only restrictions. Open X-Embodiment aggregates 22 datasets with heterogeneous licenses—some permit commercial use, others restrict to academic research, creating legal ambiguity for model builders^[3]. EPIC-KITCHENS-100 uses a custom non-commercial license prohibiting model commercialization^[18].

Creative Commons BY 4.0 permits commercial use but requires attribution, complicating deployment when models train on thousands of episodes from hundreds of contributors. RoboNet's dataset license allows commercial use but disclaims liability for data quality, leaving buyers exposed to downstream risks if trajectories contain errors or safety violations^[19].

Truelabel's marketplace provides standardized commercial licenses with explicit indemnification and quality guarantees. Collectors grant buyers perpetual commercial rights to delivered data, and truelabel's terms include liability caps and dispute resolution mechanisms. Provenance metadata tracks every contributor, enabling attribution compliance for jurisdictions requiring dataset transparency. For buyers building foundation models, this eliminates the legal overhead of negotiating individual licenses with dozens of data sources.

Quality Assurance: Validating Whole-Body Trajectory Integrity

Humanoid trajectory data requires validation beyond standard manipulation checks: the validation pipeline must verify kinematic feasibility, balance stability, and temporal consistency across 30+ DOF action spaces. Common failure modes include joint-limit violations (commanded angles exceed hardware limits), kinematic discontinuities (instantaneous velocity spikes), and sensor desynchronization (camera frames lag joint states by >50 ms).

DROID's quality pipeline filters episodes with >5 cm end-effector drift or >10° joint-angle discontinuities, removing 8% of collected data^[9]. BridgeData V2 applies similar heuristics but does not validate force-torque consistency or IMU-joint correlation, allowing physically implausible trajectories into the dataset. When OpenVLA was trained on this data, 12% of generated actions violated joint limits during deployment, requiring runtime clipping that degraded task success^[10].

Truelabel's validation pipeline checks: (1) joint-limit compliance for buyer-specified embodiments, (2) kinematic continuity (max velocity/acceleration thresholds), (3) sensor timestamp alignment (<10 ms drift), (4) force-torque plausibility (contact forces consistent with object masses), and (5) balance stability (center-of-mass within support polygon for bipedal episodes). Collectors receive automated feedback on failed episodes, and buyers can specify custom validation rules in bounty requirements. The marketplace escrows payment until validation passes, aligning collector incentives with data quality.

Cost Structure: Per-Episode Economics for Humanoid Teleoperation

Humanoid teleoperation costs 3–8× more than tabletop manipulation due to hardware complexity, operator training, and longer episode durations. ALOHA's bimanual setup costs $32K for hardware plus $45–$80/hour for trained operators, yielding $8–$15 per 2-minute episode^[20]. Full-body humanoid teleoperation using motion-capture systems costs $180–$320/hour including equipment amortization, operator wages, and facility overhead—$12–$25 per 3-minute episode.

Scale AI's Physical AI data engine charges $25–$60 per manipulation episode depending on task complexity and annotation requirements, but does not publicly list humanoid pricing. Claru's kitchen teleoperation service quotes $280–$450 per episode-hour for whole-body capture with motion-corrected IK, targeting buyers needing 500–2,000 episodes. At these rates, a 10,000-episode humanoid dataset costs $120K–$250K—prohibitive for academic labs but feasible for well-funded startups.

Truelabel's marketplace enables price discovery through competitive bounties. Buyers specify task requirements, episode counts, and maximum per-episode budgets; collectors bid on bounties, and the platform matches based on collector track record, embodiment compatibility, and delivery timelines. Median humanoid teleoperation bids range from $15–$35/episode for indoor manipulation to $25–$50/episode for locomotion-manipulation coupling and $40–$80/episode for outdoor terrain or contact-rich assembly. Volume discounts apply: 5,000+ episode bounties typically achieve 20–35% lower per-episode costs through batch collection efficiencies.
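
The bid ranges and volume discount above translate into a simple budget estimate. The sketch below is a back-of-the-envelope calculation that assumes the discount applies uniformly to every episode once the 5,000-episode threshold is reached; the function name and that assumption are illustrative, not marketplace terms.

```python
def bounty_cost_range(episodes, bid_low, bid_high,
                      discount_low=0.20, discount_high=0.35,
                      discount_threshold=5000):
    """Return (lowest, highest) total cost in dollars for a bounty.

    The best case pairs the low bid with the largest discount; the worst
    case pairs the high bid with the smallest discount.
    """
    if episodes >= discount_threshold:
        low = episodes * bid_low * (1.0 - discount_high)
        high = episodes * bid_high * (1.0 - discount_low)
    else:
        low, high = episodes * bid_low, episodes * bid_high
    return low, high

# 10,000 locomotion-manipulation episodes at the quoted $25 to $50 median bids.
large_low, large_high = bounty_cost_range(10_000, 25, 50)
# 1,000 indoor manipulation episodes at $15 to $35, below the discount threshold.
small_low, small_high = bounty_cost_range(1_000, 15, 35)

print(f"large bounty: ${large_low:,.0f} to ${large_high:,.0f}")
print(f"small bounty: ${small_low:,.0f} to ${small_high:,.0f}")
```

Under these assumptions the 10,000-episode bounty lands at roughly $162,500 to $400,000, which buyers can compare against the fixed-rate estimates earlier in this section.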

Deployment Readiness: From Raw Trajectories to Policy Training

Raw teleoperation data requires preprocessing before policy training: camera calibration, action-space normalization, episode segmentation, and train/val/test splits. LeRobot's training scripts automate this for datasets in LeRobot format, but most collector-native formats require custom ETL pipelines. DROID provides MCAP files with ROS message definitions, requiring buyers to write conversion scripts for non-ROS training frameworks^[9].

RLDS defines a standardized schema but does not specify camera calibration formats—some datasets include intrinsic/extrinsic matrices, others omit calibration entirely, forcing buyers to recalibrate or discard visual data. Open X-Embodiment aggregates 22 datasets with inconsistent action-space normalizations (some use radians, others degrees; some normalize to [-1,1], others use raw joint angles), requiring per-dataset preprocessing^[3].

Truelabel's delivery pipeline includes preprocessing services: camera calibration with checkerboard captures, action normalization to buyer-specified ranges, episode segmentation using task-completion heuristics, and stratified train/val/test splits preserving environment diversity. Buyers specify preprocessing requirements in bounties, and the platform generates training-ready datasets in buyer-preferred formats (RLDS, LeRobot Parquet, MCAP, or custom HDF5). For buyers using LeRobot, truelabel provides one-line dataset loading: `dataset = load_dataset('truelabel/bounty-12345')` with automatic caching and streaming support.
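
The action-normalization step mentioned above might look like the following sketch, which converts degrees to radians where needed and rescales joint targets to [-1, 1] using per-joint limits. The function names and the example limits are assumptions for illustration; actual joint limits would come from the buyer's robot description.

```python
import numpy as np

def to_radians(q, unit):
    """Convert joint angles to radians; unit is 'deg' or 'rad'."""
    if unit == "deg":
        return np.deg2rad(np.asarray(q, dtype=float))
    if unit == "rad":
        return np.asarray(q, dtype=float)
    raise ValueError(f"unknown unit: {unit}")

def normalize_actions(q_rad, lower, upper):
    """Map joint targets from [lower, upper] to [-1, 1], per joint."""
    return 2.0 * (q_rad - lower) / (upper - lower) - 1.0

def denormalize_actions(a, lower, upper):
    """Inverse of normalize_actions, used when sending policy outputs to the robot."""
    return (a + 1.0) * 0.5 * (upper - lower) + lower

# Hypothetical limits for three joints, in radians.
lower = np.array([-np.pi / 2, -1.0, 0.0])
upper = np.array([np.pi / 2, 1.0, 2.0])

raw_deg = np.array([[90.0, 0.0, 57.29577951308232]])   # last value is 1 rad in degrees
q = to_radians(raw_deg, "deg")
a = normalize_actions(q, lower, upper)
restored = denormalize_actions(a, lower, upper)
print(a, np.allclose(restored, q))
```

Keeping the limits alongside the dataset matters because the inverse mapping is needed at deployment time, and a mismatch between training and runtime limits would silently rescale every command.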

External references and source context

[1] NVIDIA GR00T N1 technical report. arXiv. GR00T N1 achieves 73% task success with heterogeneous data pyramids vs 41% for upper-body-only baselines; uses 53-dimensional action space for whole-body control.
[2] BridgeData V2: A Dataset for Robot Learning at Scale. arXiv. BridgeData V2 contains 60,000+ manipulation episodes but zero locomotion trajectories; all data from fixed-base platforms.
[3] Open X-Embodiment: Robotic Learning Datasets and RT-X Models. arXiv. Open X-Embodiment pools 527 skills from 22 robot types with zero whole-body humanoid episodes; 60% of episodes include language annotations; heterogeneous licenses create commercialization ambiguity.
[4] RoboNet: Large-Scale Multi-Robot Learning. arXiv. RoboNet contains 15 million frames from 7 robot arms achieving 68% cross-embodiment success on pick-place, but zero humanoid morphologies.
[5] ManipArena: Comprehensive Real-world Evaluation of Reasoning-Oriented Generalist Robot Manipulation. arXiv. Motion-capture systems for humanoid teleoperation introduce 5–15° angular error at distal joints during inverse-kinematics mapping.
[6] Rescaling Egocentric Vision: Collection, Pipeline and Challenges for EPIC-KITCHENS-100. arXiv. EPIC-KITCHENS-100 captures 90,000 egocentric human actions but lacks robot joint data for policy training.
[7] RT-2: Vision-Language-Action Models Transfer Web Knowledge to Robotic Control. arXiv. RT-2 reports 22% lower success on mobile-manipulation vs stationary tasks; language-annotated training achieves 62% success on novel instructions vs 34% for structured labels.
[8] truelabel physical AI data marketplace bounty intake. truelabel.ai. Truelabel marketplace connects buyers to 20,000+ verified collectors worldwide with embodiment-specific capture pipelines; bounty intake includes action-space dimensionality and sensor suite specifications.
[9] DROID: A Large-Scale In-The-Wild Robot Manipulation Dataset. arXiv. DROID provides 76,000 manipulation demonstrations using Franka arms with 2–3 cm end-effector precision; includes RGB-D but omits force-torque and IMU; filters 8% of episodes for kinematic violations.
[10] OpenVLA: An Open-Source Vision-Language-Action Model. arXiv. OpenVLA achieves 83% success on pick-place but only 34% on contact-rich assembly; policies with force-torque achieve 67% door-opening success vs 34% without.
[11] Domain Randomization for Transferring Deep Neural Networks from Simulation to the Real World. arXiv. Domain randomization across 50+ simulated environment variants achieves 2.3× higher real-world success than single-environment baselines.
[12] RT-1: Robotics Transformer for Real-World Control at Scale. arXiv. RT-1 trained on 130K real-robot episodes over 17 months achieves 97% success on seen tasks and 76% on novel objects.
[13] Crossing the Reality Gap: A Survey on Sim-to-Real Transferability of Robot Controllers in Reinforcement Learning. arXiv. 2021 survey finds policies trained on 90% synthetic + 10% real data match pure-real baselines while reducing collection costs by 8×; deployment to real facilities shows 30–50% performance drops.
[14] RLDS: an Ecosystem to Generate, Share and Use Datasets in Reinforcement Learning. arXiv. RLDS defines TensorFlow-native trajectory schema with nested observation/action/reward dictionaries; used by Open X-Embodiment and BridgeData V2.
[15] MCAP guides. MCAP. MCAP provides ROS-agnostic binary format with per-message schemas and nanosecond timestamps; adopted by DROID and robotics teams requiring real-time playback.
[16] LeRobot dataset documentation. Hugging Face. LeRobot dataset format supports arbitrary action dimensions with Parquet-based schema; mandatory fields for actions, observations, episode boundaries.
[17] CALVIN paper. arXiv. CALVIN provides 24,000 language-annotated episodes but all trajectories use fixed-base Franka arm in simulated environments.
[18] EPIC-KITCHENS-100 annotations license. GitHub. EPIC-KITCHENS-100 uses custom non-commercial license prohibiting model commercialization.
[19] RoboNet dataset license. GitHub raw content. RoboNet dataset license allows commercial use but disclaims liability for data quality, exposing buyers to downstream risks.
[20] Teleoperation datasets are becoming the highest-intent physical AI content category. tonyzhaozh.github.io. ALOHA bimanual teleoperation setup costs $32K hardware plus $45–80/hour operators, yielding $8–15 per 2-minute episode.

FAQ

What action-space dimensionality do humanoid policies require compared to manipulation-only datasets?

Humanoid policies operate across 30–50 degrees of freedom including leg joints (12–14 DOF for hips/knees/ankles), torso actuators (3–6 DOF for roll/pitch/yaw), arms (14 DOF for dual 7-DOF arms), and grippers (2 DOF). Standard manipulation datasets like BridgeData V2 provide 7-DOF arm actions plus 1-DOF gripper, totaling 8 dimensions. GR00T N1 uses a 53-dimensional action space for whole-body control^[1]. Truelabel's marketplace connects buyers to collectors with embodiment-specific teleoperation rigs capable of capturing full kinematic chains at 30–60 Hz.

How do force-torque sensors improve humanoid policy performance on contact-rich tasks?

Force-torque data enables policies to learn compliant manipulation: adjusting grip pressure, detecting contact events, and recovering from collisions. OpenVLA trained without force-torque achieved 34% success on door-opening tasks compared to 67% for policies with 6-axis wrist force-torque at 100 Hz^[10]. Contact-rich humanoid tasks (tool use, assembly, terrain traversal) require force feedback to distinguish intentional contact from collisions. Truelabel's data-quality standards require collectors to timestamp-synchronize force-torque streams within 10 ms of joint commands and provide sensor calibration matrices.

What percentage of Open X-Embodiment episodes include locomotion data suitable for humanoid training?

Open X-Embodiment contains zero whole-body humanoid trajectories with coordinated locomotion and manipulation^[3]. The dataset aggregates 527 skills from 22 robot types, all fixed-base manipulators or mobile bases with decoupled arm control. Humanoid buyers need datasets where every trajectory includes synchronized leg joint commands, torso orientation, and arm actions, a data category absent from existing public corpora. Truelabel's marketplace routes bounties to collectors operating bipedal platforms (Unitree H1, custom humanoids) and mobile manipulators with whole-body teleoperation rigs.

How does truelabel validate kinematic feasibility for 30+ DOF humanoid trajectories?

Truelabel's validation pipeline checks: (1) joint-limit compliance against buyer-specified URDF models, (2) kinematic continuity with max velocity 15 rad/s and acceleration 50 rad/s² thresholds, (3) sensor timestamp alignment within 10 ms, (4) force-torque plausibility (contact forces <500 N for typical manipulation), and (5) balance stability (center-of-mass within support polygon for bipedal episodes). Collectors receive automated feedback on failed validation checks. DROID's pipeline filters 8% of episodes for kinematic violations^[9]; truelabel's stricter thresholds reject 12–18% of raw collector data, ensuring deployment-ready quality.

What licensing terms does truelabel provide for humanoid foundation model training?

Truelabel's standard commercial license grants buyers perpetual, worldwide, royalty-free rights to use delivered data for model training, deployment, and commercialization. Collectors waive attribution requirements and provide indemnification against IP claims. Marketplace terms include liability caps (3× bounty value) and binding arbitration for disputes. This contrasts with Open X-Embodiment's heterogeneous licenses (some research-only, some commercial-with-attribution) and EPIC-KITCHENS-100's non-commercial restriction^[18]. For foundation model builders, truelabel eliminates the legal overhead of negotiating individual licenses across dozens of data sources.

How do truelabel's per-episode costs compare to Scale AI for humanoid teleoperation data?

Truelabel's competitive marketplace yields median bids of $15–$35 per indoor manipulation episode, $25–$50 per locomotion-manipulation episode, and $40–$80 per outdoor/contact-rich episode. Scale AI charges $25–$60 per manipulation episode but does not publicly list humanoid pricing. Volume discounts on truelabel (5,000+ episodes) reduce per-episode costs by 20–35%. Claru's kitchen teleoperation quotes $280–$450 per episode-hour ($12–$25 per 3-minute episode), competitive with truelabel's upper range. Truelabel's decentralized model enables parallel collection across 20,000+ collectors, reducing delivery timelines from months to weeks for large-scale bounties^[8].
