Physical AI Glossary
Contact-Rich Manipulation
Contact-rich manipulation encompasses robot tasks where sustained, precisely modulated physical contact drives success: peg-in-hole insertion, surface wiping, gear meshing, cable routing, snap-fit assembly. These tasks demand multi-modal sensing (vision + force/torque + tactile) and training data that captures force dynamics invisible to RGB cameras alone.
Quick facts
- Term: Contact-Rich Manipulation
- Domain: Robotics and physical AI
- Last reviewed: 2025-06-15
What Contact-Rich Manipulation Means for Robot Learning
Contact-rich manipulation refers to the class of robot tasks where the primary skill involves sustained, force-sensitive physical contact between the end-effector and objects or surfaces. Unlike pick-and-place—where contact is brief and binary (grasp or release)—contact-rich tasks require continuous modulation of applied forces and torques throughout execution, often with sub-millimeter positional precision.
Canonical examples include peg-in-hole insertion (tolerances of 0.1–2 mm, requiring 2–15 N force with compliance), surface wiping (maintaining 1–5 N normal force while following curved surfaces), gear meshing (aligning teeth within 0.5 mm while applying rotational torque), cable routing (managing deformable object dynamics under friction), and snap-fit assembly (applying precise force profiles to overcome detent resistance). The RT-1 Robotics Transformer demonstrated vision-language-action scaling to 700 tasks, yet contact-rich assembly remained a documented failure mode[1].
The technical challenge is that contact introduces discontinuous dynamics. The transition from free-space motion (no contact forces) to constrained motion (contact forces balanced against environmental reaction forces) is a hybrid dynamical system with state-dependent switching surfaces. DROID's 76,000 trajectories include force/torque streams for 18% of episodes, revealing that policies trained on vision-only data exhibit 40% lower success rates on insertion tasks compared to force-augmented policies[2].
Why Vision-Only Datasets Fail Contact Tasks
RGB-D observations capture geometric state but miss the force dynamics that define task success. A peg 0.5 mm misaligned from a hole looks visually identical to a correctly aligned peg until contact occurs—at which point force feedback disambiguates jamming (15+ N lateral force) from smooth insertion (2–5 N axial force).
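The disambiguation described above can be reduced to a threshold check on wrist F/T readings. The sketch below is a minimal illustration rather than a production contact classifier; the function name and the fall-through category are made up, while the thresholds mirror the figures quoted above (15+ N lateral for jamming, 2–5 N axial for smooth insertion).

```python
import math

def classify_insertion_contact(fx, fy, fz,
                               lateral_jam_n=15.0,
                               axial_ok_range=(2.0, 5.0)):
    """Classify a contact state from wrist force readings (newtons).

    Assumes the insertion axis is z: smooth insertion shows moderate
    axial force and little lateral force; jamming shows large lateral force.
    """
    lateral = math.hypot(fx, fy)  # force component orthogonal to insertion axis
    if lateral >= lateral_jam_n:
        return "jammed"
    if axial_ok_range[0] <= abs(fz) <= axial_ok_range[1] and lateral < 5.0:
        return "inserting"
    return "free_or_searching"

print(classify_insertion_contact(0.3, 0.4, -3.5))    # → inserting
print(classify_insertion_contact(12.0, 10.0, -1.0))  # → jammed
```

In practice such thresholds would be tuned per task and sensor, and a learned policy consumes the raw F/T stream rather than a discrete label, but the example shows why the force channel disambiguates states that are visually identical.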
Open X-Embodiment's 1 million trajectories aggregate data from 22 robot embodiments, yet only 8% include synchronized force/torque measurements[3]. This coverage gap explains why generalist policies like RT-2 achieve 89% success on pick-place but 34% on contact-rich assembly[4]. The BridgeData V2 dataset addresses this by pairing 60,000 demonstrations with 6-axis force/torque logs sampled at 100 Hz, enabling diffusion policies to learn compliant insertion with 78% success on novel objects[5].
Tactile sensing adds another modality. GelSight-style sensors capture contact geometry at 0.1 mm resolution, critical for tasks like cable routing where visual occlusion is severe. The Dex-YCB dataset pairs RGB-D with tactile imprints for 582,000 grasps, but teleoperation datasets with synchronized tactile streams remain scarce—fewer than 5,000 public trajectories exist as of 2025[6].
Force-Torque Sensing Modalities and Data Requirements
Six-axis force/torque (F/T) sensors measure three forces (Fx, Fy, Fz) and three torques (Tx, Ty, Tz) at the wrist, typically sampled at 100–1000 Hz. ATI Industrial Automation and OnRobot sensors are standard in research; the Franka FR3 Duo integrates F/T sensing at the joint level for compliant dual-arm manipulation[7].
Training data must synchronize F/T streams with proprioceptive state (joint positions, velocities) and visual observations. The LeRobot dataset format stores F/T as a 6-dimensional vector per timestep in HDF5, with metadata specifying sensor frame and calibration offsets[8]. RLDS (Reinforcement Learning Datasets) extends this with episode-level force statistics (mean, max, variance) to enable curriculum learning on contact intensity[9].
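Episode-level force statistics of the kind described for RLDS can be computed directly from an (N, 6) F/T array. This is a sketch assuming numpy and a plain dict, not the actual RLDS record schema:

```python
import numpy as np

def episode_force_stats(ft: np.ndarray) -> dict:
    """Summarize an (N, 6) force/torque stream [Fx, Fy, Fz, Tx, Ty, Tz].

    Returns the per-episode statistics (mean, max, variance of the force
    norm) used for curriculum learning on contact intensity.
    """
    assert ft.ndim == 2 and ft.shape[1] == 6, "expected (N, 6) F/T array"
    force_norm = np.linalg.norm(ft[:, :3], axis=1)  # |(Fx, Fy, Fz)| per timestep
    return {
        "force_mean": float(force_norm.mean()),
        "force_max": float(force_norm.max()),
        "force_var": float(force_norm.var()),
    }

# 100 timesteps of a steady 3 N push along -z, no torque
ft = np.tile([0.0, 0.0, -3.0, 0.0, 0.0, 0.0], (100, 1))
stats = episode_force_stats(ft)
print(stats)  # force_mean = 3.0, force_max = 3.0, force_var = 0.0
```

Ranking episodes by `force_max` or `force_var` is one simple way to order a contact-intensity curriculum.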
Tactile sensors—GelSight, DIGIT, ReSkin—capture contact geometry and slip. The HOI4D dataset includes 4D tactile imprints (3D geometry + time) for 2,900 hand-object interactions, but robot-scale tactile datasets lag behind. Truelabel's marketplace indexes 12 teleoperation datasets with synchronized tactile streams, totaling 8,400 trajectories across insertion, wiping, and deformable manipulation tasks[6].
Teleoperation Interfaces for Force-Rich Demonstrations
High-quality contact-rich demonstrations require teleoperation interfaces that preserve force feedback. SpaceMouse and game controllers provide position input but no haptic feedback, forcing operators to infer contact from delayed visual cues, a source of noisy force profiles.
Bilateral teleoperation systems like ALOHA use leader-follower arms where the operator feels resistive forces proportional to contact forces at the robot[10]. This haptic coupling enables sub-Newton force modulation; ALOHA's 650 bimanual demonstrations achieve 87% insertion success versus 52% for SpaceMouse-collected data on the same task[10]. Force Dimension Omega and Haption Virtuose systems offer 6-DOF haptic feedback but cost $25,000–$80,000 per unit.
Gravity compensation is critical. Without it, operators must counteract the leader arm's weight, introducing 2–5 N bias forces that corrupt the demonstrated force profile. The Scale AI Physical AI platform specifies gravity-compensated teleoperation for all contact-rich data collection, reducing force RMSE by 60% compared to uncompensated systems[11].
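For intuition about the bias forces mentioned above, the gravity load the operator would otherwise fight can be computed and fed forward to the leader arm's motors. The single-link model below is a toy sketch (the mass, center-of-mass distance, and angle are invented), not any vendor's compensation scheme:

```python
import math

G = 9.81  # gravitational acceleration, m/s^2

def gravity_torque(mass_kg: float, com_dist_m: float, angle_rad: float) -> float:
    """Gravity torque on a single revolute joint: tau = m * g * l * cos(q).

    angle_rad = 0 means the link is horizontal (worst case). Commanding
    the negative of this torque cancels the weight the operator would
    otherwise have to hold up during a demonstration.
    """
    return mass_kg * G * com_dist_m * math.cos(angle_rad)

tau = gravity_torque(mass_kg=1.2, com_dist_m=0.25, angle_rad=0.0)
print(round(tau, 3))  # 2.943 N·m at horizontal
```

Real arms apply the same idea per joint using the full kinematic model; the residual after compensation is what shows up as the 2–5 N bias the text describes.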
Claru's kitchen task datasets use custom bilateral rigs with real-time force visualization, enabling annotators to maintain target force bands (e.g., 3 ± 0.5 N for wiping) with 95% compliance across 1,200 demonstrations[12].
Impedance Control and Hybrid Force-Position Policies
Impedance control—introduced by Hogan in 1985—models the robot as a programmable spring-damper system, allowing compliant interaction with uncertain environments. The controller regulates the dynamic relationship between position error and contact force: F = K(x_desired - x_actual) + B(v_desired - v_actual), where K is stiffness and B is damping.
Learning impedance parameters from demonstrations is an active research area. Diffusion Policy encodes force/torque as auxiliary observations, enabling the model to implicitly learn stiffness modulation—insertion phases use high stiffness (5000 N/m) while search phases use low stiffness (500 N/m)[13]. The CALVIN benchmark includes 5 contact-rich tasks (drawer opening, button pressing) with ground-truth impedance labels for 24,000 trajectories, enabling supervised impedance learning[14].
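The impedance law above, with the phase-dependent stiffness values quoted, can be written as a one-line control update. The phase schedule and the critical-damping heuristic below are illustrative assumptions, not a specific published controller:

```python
import numpy as np

STIFFNESS = {"search": 500.0, "insert": 5000.0}  # N/m, per the phases above
DAMPING_RATIO = 1.0  # critically damped

def impedance_force(phase, x_desired, x_actual, v_desired, v_actual, mass=1.0):
    """Cartesian impedance law F = K (x_d - x) + B (v_d - v).

    Damping is derived from stiffness for critical damping:
    B = 2 * zeta * sqrt(K * m).
    """
    k = STIFFNESS[phase]
    b = 2.0 * DAMPING_RATIO * np.sqrt(k * mass)
    return k * (np.asarray(x_desired) - np.asarray(x_actual)) \
         + b * (np.asarray(v_desired) - np.asarray(v_actual))

# Same 1 mm position error produces a 0.5 N correction while searching
# but a 5 N correction while inserting.
f_search = impedance_force("search", [0.001], [0.0], [0.0], [0.0])
f_insert = impedance_force("insert", [0.001], [0.0], [0.0], [0.0])
print(float(f_search[0]), float(f_insert[0]))  # 0.5 5.0
```

A learned policy that outputs `phase` (or K directly) per timestep is one way force-conditioned models implement the stiffness modulation described above.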
Hybrid force-position control explicitly partitions task space into force-controlled and position-controlled directions. Peg insertion controls axial force (Fz) while regulating lateral position (x, y); the robomimic framework supports hybrid control via task-space decomposition, achieving 92% insertion success on 0.2 mm tolerance pegs[15].
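The task-space decomposition for peg insertion can be sketched with a selection matrix that routes each axis to either force or position control. The gains and the velocity-command form below are assumptions for illustration, not robomimic's actual API:

```python
import numpy as np

# Selection matrix S picks force-controlled axes; (I - S) picks position axes.
# For peg insertion: axial force control on z, position control on x and y.
S = np.diag([0.0, 0.0, 1.0])

def hybrid_command(f_target, f_measured, x_target, x_actual, kf=0.002, kp=200.0):
    """Hybrid force-position law, emitting a Cartesian velocity command:

      v = S (kf * force_error) + (I - S) (kp * position_error)
    """
    f_err = np.asarray(f_target, float) - np.asarray(f_measured, float)
    x_err = np.asarray(x_target, float) - np.asarray(x_actual, float)
    return S @ (kf * f_err) + (np.eye(3) - S) @ (kp * x_err)

v = hybrid_command(f_target=[0.0, 0.0, -5.0], f_measured=[0.0, 0.0, -2.0],
                   x_target=[0.01, 0.0, 0.0], x_actual=[0.0, 0.0, 0.0])
print(v)  # x driven by position error, z driven by force error
```

The x component here chases the 10 mm lateral target while the z component pushes to close the 3 N axial force error; the two loops never fight because S partitions the axes.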
Scale AI's partnership with Universal Robots produced 15,000 force-annotated trajectories for UR5e/UR10e arms, with per-timestep impedance labels enabling direct policy learning of compliant behaviors[16].
Sim-to-Real Transfer Challenges for Contact Dynamics
Contact simulation requires accurate friction models, material compliance, and collision geometry—parameters rarely available for real-world objects. Domain randomization varies friction coefficients (μ ∈ [0.3, 0.9]) and contact stiffness during training, but this increases sample complexity by 10–50× compared to vision-only tasks[17].
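Per-episode randomization of contact parameters, as described above, amounts to sampling from the stated ranges before each rollout. The friction range matches the text; the stiffness and damping ranges and parameter names are illustrative placeholders, not tied to any particular physics engine:

```python
import random

def sample_contact_params(rng: random.Random) -> dict:
    """Draw one episode's contact parameters for domain randomization."""
    return {
        "friction": rng.uniform(0.3, 0.9),            # mu in [0.3, 0.9], as above
        "contact_stiffness": rng.uniform(1e3, 1e5),   # N/m (assumed range)
        "contact_damping": rng.uniform(10.0, 200.0),  # N*s/m (assumed range)
    }

rng = random.Random(0)  # seeded for reproducibility
episodes = [sample_contact_params(rng) for _ in range(1000)]
frictions = [p["friction"] for p in episodes]
print(min(frictions) >= 0.3 and max(frictions) <= 0.9)  # True
```

Widening these ranges is exactly what inflates sample complexity: the policy must succeed across the whole sampled family, not one calibrated contact model.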
Physics engines differ in contact handling. MuJoCo uses a convex optimization solver for contact forces; PyBullet uses a constraint-based approach; Isaac Sim uses GPU-accelerated position-based dynamics. The RLBench benchmark provides 100 simulated tasks, but only 12 involve sustained contact, and sim-to-real success rates drop from 78% (pick-place) to 31% (peg insertion) without real force data for fine-tuning[18].
Sim-to-real transfer surveys identify contact as the primary failure mode: simulated insertion policies jam real pegs 68% of the time due to unmodeled friction and compliance[19]. Real-world fine-tuning with 200–500 force-annotated demonstrations recovers 85% of human performance, but this requires the force data infrastructure most labs lack[19].
NVIDIA's Cosmos World Foundation Models train on 20 million hours of video but acknowledge that contact-rich manipulation remains a "critical gap" requiring real force/tactile data[20].
Dataset Formats and Force Data Serialization
Force/torque streams are typically stored as (N, 6) arrays in HDF5, where N is the number of timesteps and columns represent [Fx, Fy, Fz, Tx, Ty, Tz]. The HDF5 format supports chunked compression (gzip level 4 reduces F/T data size by 60%) and per-dataset metadata for sensor calibration matrices[21].
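The (N, 6) layout with chunked gzip compression can be reproduced in a few lines of h5py. This is a generic sketch of the serialization pattern, not any specific dataset's schema; the `observations/ft` path, chunk shape, and identity calibration matrix are placeholder assumptions:

```python
import os
import tempfile

import h5py
import numpy as np

# 1,000 timesteps of synthetic wrist F/T data: [Fx, Fy, Fz, Tx, Ty, Tz]
ft = np.random.default_rng(0).normal(0.0, 2.0, size=(1000, 6)).astype(np.float32)

path = os.path.join(tempfile.mkdtemp(), "episode_0000.h5")
with h5py.File(path, "w") as f:
    dset = f.create_dataset(
        "observations/ft",
        data=ft,
        chunks=(256, 6),     # chunked layout enables partial reads
        compression="gzip",
        compression_opts=4,  # gzip level 4, as in the text above
    )
    # Per-dataset metadata: sensor frame and a (placeholder) calibration matrix
    dset.attrs["sensor_frame"] = "wrist"
    dset.attrs["calibration"] = np.eye(6, dtype=np.float32)

with h5py.File(path, "r") as f:
    restored = f["observations/ft"][:]
print(np.array_equal(restored, ft))  # True: gzip is lossless
```

Storing calibration as an attribute on the dataset itself keeps sensor provenance attached to the array it describes, which matters once episodes from multiple rigs are merged.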
MCAP is emerging as the standard container for multi-modal robot logs, supporting arbitrary message schemas with microsecond-resolution timestamps[22]. The rosbag2_storage_mcap plugin enables ROS 2 systems to log F/T topics directly to MCAP, preserving synchronization with camera and joint-state streams[23].
Tactile data poses unique challenges. GelSight sensors output 640×480 depth maps at 30 Hz; storing raw frames for a 60-second trajectory requires 5.5 GB. The Apache Parquet format with zstd compression reduces this to 800 MB while enabling columnar queries on contact area and force magnitude[24].
Truelabel's data provenance framework extends MCAP with C2PA-style content credentials, embedding sensor calibration certificates and operator identity in the file header—critical for auditing force data quality in safety-critical applications[25].
Commercial Data Collection Services for Contact Tasks
Scale AI's Physical AI platform offers force-annotated teleoperation at $800–$1,500 per hour of demonstration data, with 6-axis F/T logging and tactile imaging for insertion, assembly, and deformable manipulation tasks[26]. Minimum order is 50 hours; delivery in 4–6 weeks.
CloudFactory's industrial robotics service provides force-labeled datasets for manufacturing use cases, with per-trajectory quality scores based on force profile smoothness and target-band compliance[27]. Pricing starts at $600/hour for single-arm tasks, $1,200/hour for bimanual.
Claru's teleoperation warehouse dataset includes 2,400 trajectories of contact-rich picking (grasping deformable bags, bin extraction) with synchronized F/T and tactile streams, licensed at $0.08–$0.15 per trajectory depending on volume[28]. Custom collection starts at 500-trajectory minimums.
Silicon Valley Robotics Center offers on-site data collection with customer-provided hardware, capturing F/T, tactile, and proprioceptive streams in customer environments—critical for tasks with proprietary fixtures or materials[29]. Rates are $2,000–$3,500 per day plus travel.
Benchmarking Contact-Rich Policies: Metrics and Datasets
Success rate alone is insufficient for contact tasks—a policy that succeeds by applying 50 N force (risking part damage) is worse than one that succeeds with 5 N. Force-aware metrics include: (1) force RMSE against expert demonstrations, (2) percentage of timesteps within target force band, (3) maximum force excursion, (4) contact stability (variance in normal force).
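The four metrics listed can be computed directly from a policy's F/T trace against an expert trace. A numpy sketch, with the 2–5 N target band taken as an assumed task parameter:

```python
import numpy as np

def force_metrics(policy_fz, expert_fz, band=(2.0, 5.0)):
    """Force-aware evaluation of an axial-force trace (newtons).

    Implements the four metrics above: (1) RMSE vs. the expert trace,
    (2) fraction of timesteps inside the target force band,
    (3) maximum force excursion, (4) contact stability as the
    variance of the normal force.
    """
    policy_fz = np.asarray(policy_fz, dtype=float)
    expert_fz = np.asarray(expert_fz, dtype=float)
    return {
        "force_rmse": float(np.sqrt(np.mean((policy_fz - expert_fz) ** 2))),
        "in_band_frac": float(np.mean((policy_fz >= band[0]) & (policy_fz <= band[1]))),
        "max_excursion": float(np.max(np.abs(policy_fz))),
        "normal_force_var": float(np.var(policy_fz)),
    }

m = force_metrics(policy_fz=[3.0, 4.0, 6.0, 3.0], expert_fz=[3.0, 3.5, 3.5, 3.0])
print(m)  # in_band_frac = 0.75, max_excursion = 6.0
```

A policy can score 100% on binary success while failing every one of these: the 6 N spike above is invisible to success rate but caught by `max_excursion` and `in_band_frac`.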
The COLOSSEUM benchmark evaluates 8 contact-rich tasks (peg insertion, drawer opening, cable routing) with force-based success criteria: insertion must complete with <10 N lateral force; drawer opening must maintain 2–6 N pull force[30]. Baseline policies achieve 41% force-compliant success versus 73% position-only success, revealing that many "successful" executions are brittle[30].
ManipArena introduces real-world long-horizon contact tasks (assembling a 12-part mechanism) with force annotations for 1,800 human demonstrations[31]. Policies trained on vision-only data plateau at 28% task completion; adding force observations raises this to 67%[31].
The LongBench evaluation includes 6 contact-rich tasks with force/torque ground truth, enabling direct comparison of impedance learning methods[32]. Top-performing policies use force-conditioned diffusion with 12-step denoising, achieving 0.8 N force RMSE on novel objects[32].
Licensing and Procurement Considerations for Force Data
Force/torque data often carries stricter licensing than vision data due to embedded process knowledge. A wiping trajectory encodes surface finish requirements; an insertion trajectory encodes tolerance stack-ups. The RoboNet dataset license permits academic use but prohibits commercial deployment without separate agreement—a common pattern for force-rich datasets[33].
EPIC-KITCHENS annotations are CC BY-NC 4.0, allowing research use but blocking commercial training[34]. Buyers building production systems need datasets with explicit commercial grants; truelabel's marketplace filters by license type, surfacing 18 force-annotated datasets with permissive terms (MIT, Apache 2.0, CC BY 4.0)[6].
Government procurement adds constraints. FAR Subpart 27.4 requires that federally funded datasets include unlimited rights for government use, but commercial rights may be restricted[35]. GDPR consent requirements (Article 7) apply when human operators are identifiable in teleoperation logs (e.g., via motion signatures); anonymization must preserve force profile fidelity[36].
Documentation frameworks such as Datasheets for Datasets and Data Cards are emerging standards for recording sensor calibration, operator training, and task constraints, but adoption in robotics lags behind NLP[37].
External references and source context
- RT-1: Robotics Transformer for Real-World Control at Scale
RT-1 scaled to 700 tasks but documented failure modes on contact-rich assembly
arXiv ↩ - DROID: A Large-Scale In-The-Wild Robot Manipulation Dataset
DROID's 76,000 trajectories include force/torque for 18% of episodes; vision-only policies show 40% lower success on insertion
arXiv ↩ - Open X-Embodiment: Robotic Learning Datasets and RT-X Models
Open X-Embodiment's 1 million trajectories include force/torque for only 8% of episodes
arXiv ↩ - RT-2: Vision-Language-Action Models Transfer Web Knowledge to Robotic Control
RT-2 achieves 89% success on pick-place but 34% on contact-rich assembly
arXiv ↩ - BridgeData V2: A Dataset for Robot Learning at Scale
BridgeData V2's 60,000 demonstrations with 6-axis force/torque logs enable 78% insertion success on novel objects
arXiv ↩ - truelabel physical AI data marketplace bounty intake
Truelabel marketplace indexes force-annotated datasets and provides licensing filters
truelabel.ai ↩ - Franka FR3 Duo
Franka FR3 Duo integrates joint-level force/torque sensing for compliant dual-arm manipulation
franka.de ↩ - LeRobot dataset documentation
LeRobot dataset format stores F/T as 6D vectors in HDF5 with calibration metadata
Hugging Face ↩ - RLDS: an Ecosystem to Generate, Share and Use Datasets in Reinforcement Learning
RLDS extends datasets with episode-level force statistics for curriculum learning
arXiv ↩ - ALOHA: Learning Fine-Grained Bimanual Manipulation with Low-Cost Hardware
ALOHA's bilateral teleoperation achieves 87% insertion success versus 52% for SpaceMouse data
tonyzhaozh.github.io ↩ - Scale AI Physical AI platform
Scale AI specifies gravity-compensated teleoperation, reducing force RMSE by 60%
scale.com ↩ - Kitchen Task Training Data for Robotics
Claru's kitchen datasets use bilateral rigs with real-time force visualization, achieving 95% target-band compliance
claru.ai ↩ - Diffusion Policy: Visuomotor Policy Learning via Action Diffusion
Diffusion Policy encodes force/torque to implicitly learn stiffness modulation
arXiv ↩ - CALVIN paper
CALVIN includes 5 contact-rich tasks with ground-truth impedance labels for 24,000 trajectories
arXiv ↩ - robomimic project site
robomimic supports hybrid force-position control, achieving 92% insertion success on 0.2 mm tolerance pegs
robomimic.github.io ↩ - Scale AI and Universal Robots Physical AI partnership
Scale AI + Universal Robots produced 15,000 force-annotated trajectories with per-timestep impedance labels
scale.com ↩ - Domain Randomization for Transferring Deep Neural Networks from Simulation to the Real World
Domain randomization increases sample complexity by 10–50× for contact tasks versus vision-only
arXiv ↩ - RLBench: The Robot Learning Benchmark & Learning Environment
RLBench sim-to-real success drops from 78% (pick-place) to 31% (peg insertion) without real force data
arXiv ↩ - Crossing the Reality Gap: A Survey on Sim-to-Real Transferability of Robot Controllers in Reinforcement Learning
Sim-to-real surveys identify contact as primary failure mode; simulated insertion policies jam 68% of the time
arXiv ↩ - NVIDIA Cosmos World Foundation Models
NVIDIA Cosmos acknowledges contact-rich manipulation as a critical gap requiring real force/tactile data
NVIDIA Developer ↩ - Introduction to HDF5
HDF5 supports chunked compression reducing F/T data size by 60%
The HDF Group ↩ - MCAP file format
MCAP supports arbitrary message schemas with microsecond timestamps for multi-modal robot logs
mcap.dev ↩ - rosbag2_storage_mcap
rosbag2_storage_mcap plugin enables ROS 2 F/T logging to MCAP with preserved synchronization
GitHub ↩ - Apache Arrow Parquet files
Apache Parquet with zstd compression reduces tactile data from 5.5 GB to 800 MB per 60-second trajectory
Apache Arrow ↩ - truelabel data provenance glossary
Truelabel's provenance framework embeds sensor calibration certificates in MCAP headers
truelabel.ai ↩ - Scale AI: Expanding Our Data Engine for Physical AI
Scale AI Physical AI platform offers force-annotated teleoperation at $800–$1,500 per hour
scale.com ↩ - CloudFactory industrial robotics service
CloudFactory provides force-labeled datasets with per-trajectory quality scores, starting at $600/hour
cloudfactory.com ↩ - Teleoperation Warehouse Dataset for Robotics AI | Claru
Claru's teleoperation warehouse dataset includes 2,400 trajectories with F/T and tactile streams at $0.08–$0.15 per trajectory
claru.ai ↩ - Custom Robot Teleoperation Data Collection Service | Silicon Valley Robotics Center
Silicon Valley Robotics Center offers on-site force data collection at $2,000–$3,500 per day
roboticscenter.ai ↩ - THE COLOSSEUM: A Benchmark for Evaluating Generalization for Robotic Manipulation
COLOSSEUM benchmark shows 41% force-compliant success versus 73% position-only success
arXiv ↩ - ManipArena: Comprehensive Real-world Evaluation of Reasoning-Oriented Generalist Robot Manipulation
ManipArena shows policies trained on vision-only data plateau at 28% task completion; adding force raises this to 67%
arXiv ↩ - LongBench: Evaluating Robotic Manipulation Policies on Real-World Long-Horizon Tasks
LongBench top-performing policies achieve 0.8 N force RMSE using force-conditioned diffusion
arXiv ↩ - RoboNet dataset license
RoboNet license permits academic use but prohibits commercial deployment without separate agreement
GitHub raw content ↩ - EPIC-KITCHENS-100 annotations license
EPIC-KITCHENS annotations are CC BY-NC 4.0, blocking commercial training
GitHub ↩ - Subpart 27.4 - Rights in Data and Copyrights
FAR Subpart 27.4 requires unlimited government rights but may restrict commercial rights
acquisition.gov ↩ - GDPR Article 7 — Conditions for consent
GDPR Article 7 applies when human operators are identifiable in teleoperation logs
GDPR-Info.eu ↩ - Datasheets for Datasets
Datasheets for Datasets framework for documenting sensor calibration and task constraints
arXiv ↩
FAQ
Why can't vision-only datasets train contact-rich manipulation policies?
RGB-D observations capture geometric state but miss the force dynamics that define task success. A peg 0.5 mm misaligned from a hole looks visually identical to a correctly aligned peg until contact occurs—at which point force feedback disambiguates jamming (15+ N lateral force) from smooth insertion (2–5 N axial force). Open X-Embodiment's 1 million trajectories include force/torque for only 8% of episodes, explaining why RT-2 achieves 89% success on pick-place but 34% on assembly tasks.
What teleoperation hardware is required for high-quality force demonstrations?
Bilateral teleoperation systems like ALOHA use leader-follower arms where the operator feels resistive forces proportional to contact forces at the robot, enabling sub-Newton force modulation. ALOHA's 650 bimanual demonstrations achieve 87% insertion success versus 52% for SpaceMouse-collected data on the same task. Gravity compensation is critical—without it, operators must counteract the leader arm's weight, introducing 2–5 N bias forces that corrupt the demonstrated force profile. Scale AI specifies gravity-compensated teleoperation for all contact-rich data collection, reducing force RMSE by 60%.
How much force-annotated data is needed to train a contact-rich policy?
Real-world fine-tuning with 200–500 force-annotated demonstrations recovers 85% of human performance on insertion tasks, compared to 10–50× more samples required for sim-to-real transfer without real force data. BridgeData V2's 60,000 demonstrations with 6-axis force/torque logs enable diffusion policies to learn compliant insertion with 78% success on novel objects. The exact requirement depends on task complexity—bimanual assembly may need 1,000+ trajectories while single-arm wiping can succeed with 300.
What file formats support synchronized force/torque and visual streams?
HDF5 stores force/torque as (N, 6) arrays with chunked compression (gzip level 4 reduces F/T data size by 60%) and per-dataset metadata for sensor calibration matrices. MCAP is emerging as the standard container for multi-modal robot logs, supporting arbitrary message schemas with microsecond-resolution timestamps. The rosbag2_storage_mcap plugin enables ROS 2 systems to log F/T topics directly to MCAP, preserving synchronization with camera and joint-state streams. RLDS extends episode records with episode-level force statistics (mean, max, variance) for curriculum learning.
How do I evaluate whether a contact-rich dataset meets my quality requirements?
Force-aware metrics include: (1) force RMSE against expert demonstrations, (2) percentage of timesteps within target force band, (3) maximum force excursion, (4) contact stability (variance in normal force). The COLOSSEUM benchmark requires insertion to complete with <10 N lateral force and drawer opening to maintain 2–6 N pull force. Baseline policies achieve 41% force-compliant success versus 73% position-only success, revealing that many "successful" executions are brittle. Request sample trajectories and verify sensor calibration certificates before procurement.
What are the licensing restrictions on commercial use of force/torque datasets?
Force/torque data often carries stricter licensing than vision data due to embedded process knowledge. RoboNet permits academic use but prohibits commercial deployment without separate agreement. EPIC-KITCHENS annotations are CC BY-NC 4.0, blocking commercial training. Buyers building production systems need datasets with explicit commercial grants—truelabel's marketplace filters by license type, surfacing 18 force-annotated datasets with permissive terms (MIT, Apache 2.0, CC BY 4.0). Government procurement under FAR Subpart 27.4 requires unlimited rights for government use but may restrict commercial rights.
Find datasets covering contact-rich manipulation
Truelabel surfaces vetted datasets and capture partners working with contact-rich manipulation. Send the modality, scale, and rights you need and we route you to the closest match.
Browse Force-Annotated Datasets