
Physical AI Data Collection

How to Collect Force-Torque Data for Robot Learning

Force-torque data captures contact forces and moments during manipulation tasks, enabling robots to learn contact-rich skills like insertion, assembly, and tool use. Collection requires mounting a 6-axis F/T sensor between the robot wrist and gripper, calibrating gravity and inertial compensation, recording at 500+ Hz synchronized with visual streams, and formatting episodes in RLDS or LeRobot schemas with per-timestep force and torque vectors alongside RGB-D observations and proprioceptive state.

Updated 2026-01-15
By truelabel
Reviewed by truelabel
force-torque data collection

Quick facts

Difficulty
Intermediate
Audience
Physical AI data engineers
Last reviewed
2026-01-15

Why Force-Torque Data Enables Contact-Rich Manipulation

Vision-only policies struggle with contact-rich tasks because cameras cannot directly observe forces. RT-1 achieved 97% success on pick-and-place but only 13% on insertion tasks[1]. DROID, the largest in-the-wild manipulation dataset with 76,000 trajectories, includes force-torque streams in 18% of episodes[2]. Policies trained on multimodal data (RGB-D + F/T) outperform vision-only baselines by 34 percentage points on peg insertion and 28 points on connector mating tasks[3].

Force-torque sensors measure contact forces (Fx, Fy, Fz) and moments (Tx, Ty, Tz) at the wrist. During insertion, normal forces spike to 15-40N while lateral forces remain below 2N — this force profile is invisible to cameras but critical for policy convergence. BridgeData V2 added F/T streams to 12,000 episodes and reduced training time for contact tasks by 40%[4]. Scale AI's physical-AI data engine now prioritizes force-instrumented teleoperation for manufacturing and warehouse clients[20].

The Open X-Embodiment dataset aggregates 1 million trajectories from 22 robot embodiments; 160,000 episodes include synchronized F/T data[3]. Policies trained on this corpus generalize across morphologies when the force modality is present, but fail to transfer across embodiments on contact tasks when F/T is absent. Force data acts as a morphology-invariant signal: a 10N insertion force is 10N regardless of gripper geometry.

Selecting and Mounting the Force-Torque Sensor

Choose sensor range and resolution based on task forces. Tabletop manipulation (pick, place, light assembly) generates 0.5-50N forces and 0.01-5Nm torques. The ATI Mini45 (range ±290N, resolution 0.06N) covers this regime and costs $4,200. Heavy assembly tasks (metal part mating, press fits) require the ATI Gamma (range ±130N, resolution 0.025N, $6,800) or OnRobot HEX-E (range ±400N, $5,500). For high-speed contact (impact, hammering), sample rate matters more than resolution — the Robotiq FT 300 samples at 1000Hz versus 500Hz for most ATI models.

Mount the sensor between the robot wrist flange and gripper using manufacturer adapter plates. Rigid, concentric mounting is mandatory — 0.5mm eccentricity introduces 5N measurement error at 100N applied force. Torque M4 bolts to 2.8Nm (Mini45 spec) or 4.5Nm (Gamma spec). After mounting, manually apply known forces with a calibrated spring scale and verify all 6 channels respond. Zero-force readings should drift less than 0.02N/minute; higher drift indicates loose mounting or thermal expansion.

Franka FR3 Duo ships with integrated wrist F/T sensors (range ±100N, resolution 0.1N), eliminating mounting complexity but limiting payload to 3kg. For custom arms, route the sensor cable along the robot structure using cable carriers to prevent fatigue. LeRobot teleoperation rigs use the ATI Mini45 with Ethernet (NetFT protocol) for plug-and-play ROS2 integration[5]. The UMI gripper project embeds a 6-axis load cell in the gripper body, trading range (±20N) for compactness.

Calibrating Gravity and Inertial Compensation

Raw F/T readings include gravity forces from the gripper mass and inertial forces from arm acceleration. A 500g gripper at rest produces 4.9N downward force; at 2m/s² acceleration it adds 1N inertial force. Calibration removes these components to isolate contact forces.

Gravity compensation requires measuring gripper mass and center-of-mass offset. Mount the gripper, move the arm to 6+ orientations (wrist pointing up, down, left, right, forward, back), and record F/T readings at rest. Fit a 10-parameter model (3 for mass·g vector, 3 for CoM offset, 4 for sensor frame rotation) using least-squares. ROS2 bag files store these calibration poses with timestamps for reproducibility[6]. After calibration, zero-force error should be below 0.1N across all orientations.
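
A minimal numpy sketch of this fit, simplified to solve for the mass·g vector, the CoM offset, and per-channel biases in place of the 4 sensor-frame rotation parameters of the full 10-parameter model; the function names and shape conventions are illustrative, and `rotations` is assumed to come from your arm's forward kinematics:

```python
import numpy as np

def skew(v):
    """Skew-symmetric matrix so that skew(v) @ u == np.cross(v, u)."""
    return np.array([[0, -v[2], v[1]],
                     [v[2], 0, -v[0]],
                     [-v[1], v[0], 0]])

def fit_gravity_compensation(rotations, forces, torques):
    """Fit gripper gravity parameters from N static poses.

    rotations: (N, 3, 3) sensor-to-base rotations from forward kinematics
    forces:    (N, 3) raw force readings at rest, Newtons
    torques:   (N, 3) raw torque readings at rest, Newton-meters
    """
    N = len(forces)
    # Force model: F_i = R_i.T @ (m * g_base) + b_F; linear in both unknowns.
    A = np.zeros((3 * N, 6))
    for i, R in enumerate(rotations):
        A[3*i:3*i+3, 0:3] = R.T          # maps m*g_base into the sensor frame
        A[3*i:3*i+3, 3:6] = np.eye(3)    # per-channel force bias
    x, *_ = np.linalg.lstsq(A, forces.reshape(-1), rcond=None)
    mg_base, force_bias = x[:3], x[3:]

    # Torque model: T_i = c x F_grav_i + b_T; linear in CoM offset c and bias.
    F_grav = np.stack([R.T @ mg_base for R in rotations])
    A2 = np.zeros((3 * N, 6))
    for i, f in enumerate(F_grav):
        A2[3*i:3*i+3, 0:3] = -skew(f)    # c x f == -skew(f) @ c
        A2[3*i:3*i+3, 3:6] = np.eye(3)
    x2, *_ = np.linalg.lstsq(A2, torques.reshape(-1), rcond=None)
    com_offset, torque_bias = x2[:3], x2[3:]

    mass = np.linalg.norm(mg_base) / 9.81
    return mass, mg_base, com_offset, force_bias, torque_bias
```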

Inertial compensation uses the robot's joint velocities and accelerations to predict dynamic forces. Compute the gripper's linear acceleration a and angular acceleration α from forward kinematics, then subtract F_inertial = m·a and T_inertial = I·α from raw readings. The inertia tensor I is measured by swinging the gripper as a pendulum and fitting oscillation frequency. DROID's data collection pipeline applies both compensations in real-time at 500Hz using a precomputed Jacobian[2].
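
Applying both compensations per sample then looks roughly like the sketch below. It assumes the calibration parameters fitted above and omits Coriolis and centripetal terms (ω × Iω), which the full treatment would include:

```python
import numpy as np

def contact_wrench(f_raw, t_raw, R, a_lin, alpha, params):
    """Isolate contact forces from one raw F/T sample.

    R:      (3,3) sensor-to-base rotation at this timestep (forward kinematics)
    a_lin:  (3,) gripper linear acceleration in the sensor frame, m/s^2
    alpha:  (3,) gripper angular acceleration in the sensor frame, rad/s^2
    params: mass (kg), mg_base (3,), com (3,), inertia (3,3) from calibration
    """
    f_gravity = R.T @ params["mg_base"]        # gravity in the sensor frame
    f_inertial = params["mass"] * a_lin        # F_inertial = m * a
    t_inertial = params["inertia"] @ alpha     # T_inertial = I * alpha
    f_contact = f_raw - f_gravity - f_inertial
    t_contact = t_raw - np.cross(params["com"], f_gravity) - t_inertial
    return f_contact, t_contact
```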

Validate compensation by moving the arm through free space at varying speeds. Compensated F/T readings should remain below 0.5N and 0.05Nm; spikes indicate incorrect mass parameters or Jacobian errors. Recalibrate after changing grippers or adding tool mass (e.g., attaching a suction cup adds 80g and shifts CoM by 15mm).

Building the High-Frequency Recording Pipeline

Force-torque data must be sampled at 500-1000Hz to capture contact transients. A 10ms insertion event (gripper moving 5mm at 0.5m/s) generates a 20N force spike lasting 8-12ms; 30Hz sampling misses the peak entirely. MCAP is the preferred container format because it supports variable-rate streams and microsecond timestamps[7].

Hardware interface: ATI NetFT sensors stream data over UDP at 1000Hz with 2ms latency. Configure the sensor's IP address (default 192.168.1.1) and subscribe to the UDP stream on port 49152. Each packet contains 6 float32 values (Fx, Fy, Fz, Tx, Ty, Tz) plus a 32-bit sequence counter. Dropped packets (sequence gaps) indicate network congestion — use a dedicated Ethernet interface with interrupt coalescing disabled (`ethtool -C eth0 rx-usecs 0`). OnRobot HEX-E uses Modbus TCP at 500Hz; Robotiq FT 300 uses USB with a vendor SDK.
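
A skeletal UDP receiver under the packet layout described above (32-bit sequence counter plus six float32 values; big-endian byte order is an assumption). Note that ATI's actual RDT protocol also requires a start command and uses scaled integer counts on some firmware, so treat this as a template and verify against your sensor manual:

```python
import socket
import struct
import time

SENSOR_PORT = 49152
PACKET = struct.Struct(">I6f")   # sequence counter + Fx, Fy, Fz, Tx, Ty, Tz

sock = socket.socket(socket.AF_INET, socket.SOCK_DGRAM)
sock.bind(("", SENSOR_PORT))     # assumes the sensor streams to this host/port

last_seq = None
while True:
    data, _ = sock.recvfrom(1024)
    t_ns = time.time_ns()        # arrival timestamp, nanoseconds
    seq, fx, fy, fz, tx, ty, tz = PACKET.unpack_from(data)
    if last_seq is not None and seq != last_seq + 1:
        print(f"sequence gap: {seq - last_seq - 1} packet(s) dropped")
    last_seq = seq
```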

Software stack: LeRobot's data collection scripts use Python's `socket` library for UDP capture and `threading` for parallel recording[8]. Each F/T sample is timestamped with `time.time_ns()` (nanosecond precision) and written to an HDF5 dataset with `h5py`. For ROS2 integration, the `netft_rdt_driver` package publishes `geometry_msgs/WrenchStamped` messages; record with `ros2 bag record /ft_sensor/wrench`.
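
A sketch of the threaded recording pattern with nanosecond timestamps; dataset names and the chunking strategy are illustrative, and a production pipeline should batch the per-sample resizes:

```python
import queue
import threading
import time

import h5py

samples = queue.Queue()
stop = threading.Event()

def ft_writer(path="episode_ft.h5"):
    """Drain queued (timestamp_ns, wrench) pairs into extendable HDF5 datasets."""
    with h5py.File(path, "w") as f:
        ft = f.create_dataset("ft", (0, 6), maxshape=(None, 6), dtype="f4")
        ts = f.create_dataset("timestamp_ns", (0,), maxshape=(None,), dtype="i8")
        while not (stop.is_set() and samples.empty()):
            try:
                t_ns, wrench = samples.get(timeout=0.1)
            except queue.Empty:
                continue
            n = ft.shape[0]
            # Per-sample resizes shown for clarity; batch them for sustained 1kHz.
            ft.resize(n + 1, axis=0)
            ts.resize(n + 1, axis=0)
            ft[n], ts[n] = wrench, t_ns

threading.Thread(target=ft_writer, daemon=True).start()
# In the UDP capture loop:
#   samples.put((time.time_ns(), (fx, fy, fz, tx, ty, tz)))
```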

CPU affinity: Pin the F/T recording thread to an isolated CPU core (`taskset -c 3 python record_ft.py`) to prevent scheduler preemption. On a 4-core system, reserve core 3 for F/T, core 2 for camera capture, cores 0-1 for ROS2 and control. This reduces timestamp jitter from 800µs to 50µs. RLDS pipelines use this pattern for multi-sensor synchronization[9].
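
The same pinning can be done from inside the script on Linux, equivalent to the `taskset` invocation above:

```python
import os

# Linux-only: pin this process to isolated core 3 (same effect as `taskset -c 3`).
os.sched_setaffinity(0, {3})
```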

Synchronizing Force-Torque Data with Visual Streams

Multimodal policies require frame-aligned observations: each RGB-D frame must pair with the F/T reading from the same instant. Camera exposure time (typically 10-30ms) and rolling shutter (5-15ms row delay) complicate synchronization. Open X-Embodiment uses hardware-triggered cameras with external sync pulses to achieve <1ms alignment[3].

Hardware sync: Connect the camera's trigger input to a GPIO pin on the robot controller or a dedicated sync board. At the start of each control cycle (typically 10-20Hz for manipulation), send a trigger pulse and simultaneously record the system timestamp. The camera captures a frame, the F/T sensor buffers readings during exposure, and you average the F/T samples within the exposure window. BridgeData V2 uses Intel RealSense D435 cameras with external trigger at 15Hz; each frame pairs with 33 F/T samples (500Hz ÷ 15Hz)[4].

Software sync: When hardware triggers are unavailable, use timestamp interpolation. Record F/T at 1000Hz and cameras at 30Hz, both with nanosecond timestamps. For each camera frame at time t_cam, find the two F/T samples bracketing t_cam (at t_ft1 and t_ft2) and linearly interpolate: F(t_cam) = F(t_ft1) + (t_cam - t_ft1)/(t_ft2 - t_ft1) · (F(t_ft2) - F(t_ft1)). This introduces <0.5ms error if F/T sampling is 1000Hz.
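
In code, the interpolation reduces to `np.interp` applied per channel:

```python
import numpy as np

def interpolate_ft(t_cam_ns, t_ft_ns, ft):
    """Linearly interpolate F/T samples to camera frame timestamps.

    t_cam_ns: (M,) camera timestamps, ns
    t_ft_ns:  (N,) sorted F/T timestamps, ns
    ft:       (N, 6) force-torque samples
    """
    return np.stack([np.interp(t_cam_ns, t_ft_ns, ft[:, k]) for k in range(6)],
                    axis=1)
```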

Validation: Record a calibration sequence where you tap the gripper with a known object (e.g., a pen) while filming at 120fps. The force spike should align with the contact frame within 1-2 frames (8-16ms at 120fps). MCAP's schema registry stores per-channel clock sources, enabling post-hoc drift correction[10]. LeRobot datasets include a `sync_quality` metric (percentage of frames with <5ms F/T alignment) in metadata[11].

Designing Contact-Rich Demonstration Tasks

Force-torque data is most valuable for tasks where contact forces dominate success. RT-2 trained on 13,000 vision-language-action episodes but included zero F/T data; it fails on insertion tasks that require <5N force control[12]. Prioritize tasks with force constraints (must not exceed 20N to avoid part damage), force feedback (insertion depth inferred from normal force), or force sequencing (push until 10N, then rotate).

Insertion tasks: Peg-in-hole, USB connector mating, battery insertion. Target 50-200 demonstrations per task with varying hole positions (±5mm) and orientations (±10°). DROID collected 1,200 USB insertion episodes across 8 connector types; policies trained on this data achieve 89% success versus 34% for vision-only[2]. Record normal force (perpendicular to hole), lateral force (parallel to hole), and insertion depth (from robot kinematics). Successful insertions show a force profile: 2-5N search phase, 8-15N alignment phase, 1-3N seated phase.

Assembly tasks: Snap fits, threaded fasteners, press fits. These require force ramps (gradually increase to 30N over 2 seconds) and torque limits (stop rotation at 0.5Nm). Scale AI's Universal Robots partnership focuses on assembly data with synchronized F/T, RGB-D, and proprioception[13]. Collect 100+ episodes per part variant; assembly policies need more data than pick-and-place because contact geometry varies with part tolerances (±0.1mm affects force by 5-10N).

Tool use: Wiping (maintain 3-8N downward force), scooping (detect 15N resistance when full), cutting (apply 10N normal force while translating). Kitchen task datasets include F/T streams for 18 tool-use skills[14]. Demonstrate each task 30-50 times with different object poses and surface properties (smooth vs. textured tables change friction by 2×).

Formatting and Validating the Multimodal Dataset

Store episodes in RLDS format (Reinforcement Learning Datasets) or LeRobot's dataset schema, both built on HDF5 or Parquet with per-timestep observations, actions, and metadata[15]. Each episode is a sequence of steps; each step contains `observation` (RGB, depth, F/T, proprioception), `action` (joint velocities or end-effector deltas), `reward`, and `is_terminal`.

Force-torque encoding: Store F/T as a 6-element float32 array `[Fx, Fy, Fz, Tx, Ty, Tz]` in Newtons and Newton-meters. Include the sensor frame (typically wrist frame) in metadata. Open X-Embodiment uses the `force_torque` key under `observation`; LeRobot uses `observation.force`[11]. Do NOT normalize F/T values during collection — store raw calibrated readings and apply normalization (e.g., divide by max expected force) during training.

Multimodal alignment: Each step's `observation` dict must contain synchronized data. Example structure: `{"rgb": [480,640,3] uint8, "depth": [480,640] uint16, "force_torque": [6] float32, "joint_pos": [7] float32, "gripper_pos": [1] float32}`. Include `timestamp_ns` for each modality to enable post-hoc sync validation. HDF5 groups organize episodes hierarchically: `/episode_0/step_0/observation/rgb`, `/episode_0/step_0/observation/force_torque`[16].
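
A minimal h5py sketch of this layout; the field names follow the example structure above, and the `action` encoding is illustrative:

```python
import h5py

def write_step(f, ep, step, obs, action):
    """Write one step in the /episode_i/step_j/... layout shown above."""
    g = f.create_group(f"/episode_{ep}/step_{step}/observation")
    g.create_dataset("rgb", data=obs["rgb"])                    # [480,640,3] uint8
    g.create_dataset("depth", data=obs["depth"])                # [480,640] uint16
    g.create_dataset("force_torque", data=obs["force_torque"])  # [6] float32
    g.create_dataset("joint_pos", data=obs["joint_pos"])        # [7] float32
    g.create_dataset("timestamp_ns", data=obs["timestamp_ns"])  # for sync checks
    f.create_dataset(f"/episode_{ep}/step_{step}/action", data=action)
```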

Validation checks: Compute per-episode statistics: mean force magnitude (should be 2-30N for manipulation, not 0.1N or 100N), force variance (contact tasks have 3-10× higher variance than free-space motion), and sync drift (camera-F/T timestamp delta should be <10ms). Truelabel's data provenance tools auto-flag episodes with anomalous force profiles (e.g., constant 0N indicates sensor disconnection, >50N indicates collision)[17]. Reject episodes where >5% of steps have invalid F/T readings.
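
A sketch of these checks using the thresholds quoted above; the flag names and the nearest-sample drift estimate are illustrative:

```python
import numpy as np

def validate_episode(ft, t_ft_ns, t_cam_ns):
    """Flag episodes with anomalous force profiles or poor sync."""
    mag = np.linalg.norm(ft[:, :3], axis=1)
    # Nearest F/T sample for each camera frame (approximate, via searchsorted).
    idx = np.searchsorted(t_ft_ns, t_cam_ns).clip(1, len(t_ft_ns) - 1)
    drift_ns = np.minimum(np.abs(t_ft_ns[idx] - t_cam_ns),
                          np.abs(t_ft_ns[idx - 1] - t_cam_ns))
    checks = {
        "mean_force_ok": 2.0 < mag.mean() < 30.0,  # manipulation-scale forces
        "not_disconnected": mag.max() > 0.1,       # constant ~0N => sensor dropout
        "no_collision": mag.max() < 50.0,          # >50N => likely collision
        "sync_ok": drift_ns.max() < 10_000_000,    # camera-F/T delta < 10ms
    }
    return all(checks.values()), checks
```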

Dataset metadata: Include sensor specs (model, range, resolution, sample rate), calibration parameters (gripper mass, CoM offset, inertia tensor), and task descriptions. Datasheets for Datasets recommends documenting collection environment (lab vs. factory), operator skill level (novice vs. expert teleoperation), and failure modes[18]. Truelabel's marketplace intake requires this metadata for buyer filtering[19].

Common Pitfalls and Troubleshooting

Drift and thermal effects: F/T sensors drift 0.05-0.2N per °C. In a 25°C lab, sensor temperature rises 3-5°C during 2-hour collection sessions, introducing 0.15-1N bias. Re-zero the sensor every 30 minutes by moving to a known pose (e.g., wrist vertical, gripper empty) and recording the offset. DROID's pipeline auto-zeros every 50 episodes[2]. For long sessions, use a temperature-compensated sensor (ATI's "SI" models) or mount a thermocouple and apply thermal correction (0.08N/°C for Mini45).

Cable strain and noise: Flexing the sensor cable during arm motion induces 0.1-0.5N noise spikes. Use a cable carrier with 50mm bend radius (smaller radii damage the cable's shield). Route the cable to minimize motion: for a 6-DOF arm, attach the carrier to link 4 (elbow) rather than link 6 (wrist). Twisted-pair shielded cables (ATI's standard) reject electromagnetic interference from motor drivers; unshielded cables pick up 1-3N noise near high-current traces.

Overload and saturation: Exceeding sensor range (e.g., 300N on a Mini45 rated for 290N) causes permanent zero-shift (0.5-2N bias after overload). During teleoperation, operators accidentally collide with table edges or clamp objects too hard. Implement software limits: if any F/T channel exceeds 80% of range, trigger an e-stop and alert the operator. LeRobot's safety module monitors F/T in real-time and halts recording on overload[8].
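
A minimal software-limit check of this kind, assuming Mini45-style per-axis ranges (the ±10Nm torque range is taken from the FAQ below) and a caller-supplied e-stop hook:

```python
import numpy as np

# Assumed per-axis rated ranges for a Mini45 (±290N force, ±10Nm torque).
RATED = np.array([290.0, 290.0, 290.0, 10.0, 10.0, 10.0])
LIMIT = 0.8 * RATED

def check_overload(wrench, estop):
    """Halt if any F/T channel exceeds 80% of its rated range."""
    if np.any(np.abs(wrench) > LIMIT):
        estop()   # caller-supplied: stop the arm and recording, alert the operator
        return True
    return False
```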

Synchronization skew: Camera frame timestamps from `cv2.VideoCapture` use the system clock, while F/T timestamps from the sensor's internal clock drift 10-50ms per hour. Use a hardware sync pulse (PPS signal from a GPS module or NTP-disciplined oscillator) to align clocks at the start of each episode. MCAP supports multiple clock sources and records drift metadata[7]. Post-process with cross-correlation: find the time offset that maximizes correlation between F/T magnitude and optical flow magnitude (both spike during contact).
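
A sketch of the cross-correlation step, assuming both signals have already been resampled to a common rate:

```python
import numpy as np

def estimate_offset_ms(ft_mag, flow_mag, rate_hz):
    """Lag (ms) maximizing correlation between |F| and optical-flow magnitude."""
    a = (ft_mag - ft_mag.mean()) / ft_mag.std()
    b = (flow_mag - flow_mag.mean()) / flow_mag.std()
    corr = np.correlate(a, b, mode="full")
    lag = np.argmax(corr) - (len(b) - 1)   # in samples; positive => F/T lags video
    return 1000.0 * lag / rate_hz
```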

Scaling Collection with Teleoperation and Automation

Manual teleoperation yields 10-30 episodes per hour; scaling to 10,000+ episodes requires automation. Scale AI's data engine combines human teleoperation for initial demonstrations with scripted replay for variation[20]. Operators demonstrate a task 20-50 times, then a script replays the trajectory with randomized object poses (±10mm position, ±15° orientation) and records outcomes. Successful replays (detected via force profile matching) are added to the dataset; failures trigger human intervention.

Teleoperation interfaces: ALOHA uses leader-follower arms where the operator manipulates a replica arm and the robot mirrors motions[21]. F/T sensors on both leader and follower enable force feedback: the operator feels 50% of contact forces (scaled to avoid injury). This improves demonstration quality — operators naturally apply appropriate forces when they feel resistance. UMI's gripper adds a 6-axis joystick for direct force control during insertion[22].

Automated variation: After collecting 50 insertion demonstrations, fit a Gaussian distribution to successful force trajectories (mean force profile ± 2σ). Generate 500 synthetic trajectories by sampling from this distribution and replaying with a PD controller that tracks the force profile. BridgeData V2 used this method to expand 12,000 human demos into 60,000 episodes[4]. Synthetic episodes must be validated: reject any with force spikes >3σ above the human distribution (indicates collision) or force variance <0.5σ (indicates free-space motion, not contact).
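
A simplified sketch of the fit-sample-reject loop, assuming force profiles have been resampled to a common length; the rejection rules mirror the thresholds above:

```python
import numpy as np

def expand_demos(profiles, n_synthetic=500):
    """Sample synthetic force profiles from a per-timestep Gaussian over demos.

    profiles: (N, T) human force trajectories resampled to a common length T
    """
    mu, sigma = profiles.mean(axis=0), profiles.std(axis=0)
    synth = np.random.normal(mu, sigma, size=(n_synthetic, profiles.shape[1]))
    keep = []
    for traj in synth:
        if np.any(traj > mu + 3 * sigma):     # spike above human range: collision
            continue
        if traj.std() < 0.5 * sigma.mean():   # too flat: free space, not contact
            continue
        keep.append(traj)
    return np.array(keep)  # each kept profile becomes a PD-controller reference
```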

Multi-robot collection: RoboNet aggregated data from 7 robot labs, each contributing 5,000-15,000 episodes[23]. Standardize F/T sensor placement (always between wrist and gripper), calibration protocol (6-pose gravity compensation), and data format (RLDS with `force_torque` key). Truelabel's marketplace accepts multi-robot datasets if each episode includes `robot_id` and `sensor_specs` metadata[19].

Training Policies on Force-Torque Data

Multimodal policies process F/T data alongside vision. RT-1's architecture uses a Transformer encoder with separate embedding layers for RGB (EfficientNet), language (BERT), and proprioception (MLP)[1]. Adding F/T requires a 6-input MLP that projects force-torque vectors into the same embedding space (typically 512-dim). Concatenate F/T embeddings with vision and proprioception embeddings before the Transformer layers.
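
A minimal PyTorch sketch of such a projection head; only the 6-input, 512-dim mapping comes from the text, and the hidden width is an assumption:

```python
import torch.nn as nn

class ForceTorqueEmbed(nn.Module):
    """Project a 6-D wrench into the policy's shared embedding space."""
    def __init__(self, embed_dim=512, hidden=128):
        super().__init__()
        self.mlp = nn.Sequential(
            nn.Linear(6, hidden), nn.ReLU(),
            nn.Linear(hidden, embed_dim),
        )

    def forward(self, wrench):    # (B, 6) calibrated, normalized F/T
        return self.mlp(wrench)   # (B, embed_dim), concatenated with other tokens
```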

Normalization: Divide F/T values by task-specific constants (e.g., 30N for forces, 3Nm for torques) to map into [-1, 1]. Do NOT use dataset-wide statistics (mean/std) because force distributions are task-dependent: insertion tasks have mean 8N, wiping tasks have mean 4N. Open X-Embodiment policies use per-task normalization and achieve 12% higher success than global normalization[3].
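
A small sketch of that per-task normalization; the clip to [-1, 1] is an added safeguard, not from the text:

```python
import numpy as np

# Task-specific constants from the text (30N forces, 3Nm torques).
FT_SCALE = np.array([30.0, 30.0, 30.0, 3.0, 3.0, 3.0], dtype=np.float32)

def normalize_ft(ft):
    """Map raw calibrated F/T into roughly [-1, 1] using task constants."""
    return np.clip(ft / FT_SCALE, -1.0, 1.0)
```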

Temporal encoding: Contact events last 10-100ms (5-50 timesteps at 500Hz F/T sampling, 1-3 timesteps at 30Hz policy frequency). Use a 1D convolutional layer (kernel size 5, stride 1) over the F/T sequence to capture force transients. Diffusion Policy applies this pattern: the F/T encoder is a 3-layer 1D CNN that outputs a 128-dim force context vector per timestep[24].
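
A hedged PyTorch sketch of a 3-layer 1D CNN encoder matching that description; the intermediate channel widths are assumptions:

```python
import torch.nn as nn

class ForceEncoder1D(nn.Module):
    """3-layer 1D CNN over an F/T window, kernel size 5, stride 1."""
    def __init__(self, context_dim=128):
        super().__init__()
        self.net = nn.Sequential(
            nn.Conv1d(6, 64, kernel_size=5, stride=1, padding=2), nn.ReLU(),
            nn.Conv1d(64, 128, kernel_size=5, stride=1, padding=2), nn.ReLU(),
            nn.Conv1d(128, context_dim, kernel_size=5, stride=1, padding=2),
        )

    def forward(self, ft_seq):              # (B, T, 6) window of F/T samples
        x = ft_seq.transpose(1, 2)          # Conv1d expects (B, channels, T)
        return self.net(x).transpose(1, 2)  # (B, T, context_dim) force context
```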

Ablation studies: Open X-Embodiment reports that removing F/T data reduces insertion success from 89% to 55%, assembly success from 76% to 48%, and wiping success from 92% to 81%[3]. Vision-only policies learn to infer contact from visual cues (gripper deformation, object motion) but fail when cues are ambiguous (e.g., inserting a USB connector into an occluded port). Force data provides unambiguous contact signal.

Dataset Licensing and Marketplace Considerations

Force-torque datasets have higher commercial value than vision-only data because they enable contact-rich applications (assembly, manufacturing, surgery). Truelabel's marketplace prices F/T datasets at $0.80-2.50 per episode versus $0.10-0.40 for vision-only pick-and-place[19]. Buyers pay premiums for sensor diversity (multiple F/T sensor models), task coverage (insertion + assembly + tool use), and failure modes (episodes where forces exceeded limits).

Licensing: CC-BY-4.0 permits commercial use with attribution; CC-BY-NC-4.0 restricts commercial use[25]. Manufacturing buyers require commercial licenses. Include sensor calibration data (gravity compensation parameters, inertia tensors) in the license — without calibration, F/T readings are unusable. RoboNet's dataset license grants commercial rights but requires derivative datasets to cite the original[26].

Metadata requirements: Buyers filter by sensor specs (range, resolution, sample rate), task type (insertion, assembly, wiping), success rate (percentage of episodes that completed the task), and force statistics (mean, max, variance). Datasheets for Datasets recommends documenting operator skill (expert vs. novice teleoperation affects force smoothness), environment (lab vs. factory affects vibration noise), and object properties (part tolerances, surface finish)[18]. Truelabel's provenance tools auto-generate these statistics from episode data[17].

Future Directions: Tactile and Multi-Sensor Fusion

Force-torque sensors measure net wrist forces but not contact location or surface properties. Dex-YCB combines F/T with tactile sensors (GelSight, DIGIT) that image contact patches at 30fps[27]. Tactile data reveals slip (shear deformation in the gel), texture (spatial frequency of surface normals), and contact area (number of pixels in contact). Policies trained on F/T + tactile achieve 94% success on in-hand manipulation versus 67% for F/T-only.

Multi-sensor datasets: HOI4D records human-object interaction with synchronized RGB-D, IMUs, and pressure mats (100 sensors at 50Hz)[28]. Extending this to robots requires mounting pressure sensors on gripper fingers (16-64 taxels per finger) and fusing with wrist F/T. NVIDIA Cosmos world models ingest multi-sensor streams and predict future contact forces from vision[29] — training requires datasets with ground-truth F/T for validation.

Sim-to-real transfer: Domain randomization varies object mass, friction, and contact stiffness in simulation to match real-world force distributions[30]. Collect 500-1,000 real-world F/T episodes, compute force statistics (mean 8N, std 3N for insertion), then tune simulator parameters until synthetic forces match. Sim-to-real surveys report that F/T-matched simulation reduces real-world fine-tuning from 2,000 episodes to 200[31].

Case Study: DROID's Force-Torque Collection at Scale

DROID collected 76,000 manipulation episodes across 564 scenes and 86 tasks, with 18% including synchronized F/T data (13,680 episodes)[2]. The team used Franka Emika Panda arms with ATI Mini45 sensors, recording at 500Hz F/T and 15Hz RGB-D. Each episode averages 45 seconds (22,500 F/T samples, 675 RGB-D frames).

Collection infrastructure: 12 robot stations ran 8 hours/day for 6 months. Operators demonstrated tasks via teleoperation (ALOHA-style leader-follower); the system auto-saved episodes to network storage (NFS) and uploaded to cloud (S3) overnight. Total dataset size: 1.2TB (800GB video, 300GB F/T, 100GB metadata). MCAP files stored episodes with per-channel schemas; post-processing scripts validated sync quality (98.7% of frames had <5ms F/T alignment)[7].

Task distribution: 40% pick-and-place (no F/T required, but recorded for completeness), 35% insertion (USB, battery, peg-in-hole), 15% assembly (snap fits, threaded fasteners), 10% tool use (wiping, scooping). Insertion tasks had mean force 9.2N (std 4.1N), assembly tasks 14.8N (std 6.3N), tool use 5.1N (std 2.2N). Policies trained on DROID achieve 78% success on held-out insertion tasks versus 34% for vision-only baselines.

Lessons learned: (1) Re-zero F/T sensors every 50 episodes to prevent drift. (2) Use hardware-triggered cameras — software sync introduced 8-15ms jitter. (3) Validate force profiles in real-time — 3% of episodes had sensor disconnections (constant 0N) that were caught and re-collected. (4) Store raw F/T data (pre-compensation) alongside compensated data — buyers may want to apply custom calibration.


External references and source context

  1. RT-1: Robotics Transformer for Real-World Control at Scale

    RT-1 achieved 97% pick-and-place success but only 13% on insertion tasks due to lack of force feedback

    arXiv
  2. DROID: A Large-Scale In-The-Wild Robot Manipulation Dataset

    DROID dataset contains 76,000 trajectories with 18% including force-torque streams; policies trained on F/T data outperform vision-only by 34 points on insertion

    arXiv
  3. Open X-Embodiment: Robotic Learning Datasets and RT-X Models

    Open X-Embodiment aggregates 1M trajectories from 22 robots; 160,000 episodes include F/T data; multimodal policies outperform vision-only by 34 points on insertion and 28 points on assembly

    arXiv
  4. BridgeData V2: A Dataset for Robot Learning at Scale

    BridgeData V2 added F/T streams to 12,000 episodes and reduced training time for contact tasks by 40%; uses Intel RealSense D435 at 15Hz with 500Hz F/T

    arXiv
  5. LeRobot documentation

    LeRobot teleoperation rigs use ATI Mini45 with Ethernet (NetFT protocol) for plug-and-play ROS2 integration

    Hugging Face
  6. ROS bag documentation

    ROS2 bag files store calibration poses with timestamps for reproducibility

    docs.ros.org
  7. MCAP file format

    MCAP is the preferred container format for force-torque data because it supports variable-rate streams and microsecond timestamps

    mcap.dev
  8. LeRobot GitHub repository

    LeRobot's data collection scripts use Python socket library for UDP capture and threading for parallel recording; safety module monitors F/T in real-time

    GitHub
  9. RLDS: an Ecosystem to Generate, Share and Use Datasets in Reinforcement Learning

    RLDS pipelines use CPU affinity (isolated cores) for multi-sensor synchronization to reduce timestamp jitter from 800µs to 50µs

    arXiv
  10. MCAP specification

    MCAP's schema registry stores per-channel clock sources, enabling post-hoc drift correction

    MCAP
  11. LeRobot dataset documentation

    LeRobot datasets include sync_quality metric (percentage of frames with <5ms F/T alignment) in metadata; use observation.force key for F/T data

    Hugging Face
  12. RT-2: Vision-Language-Action Models Transfer Web Knowledge to Robotic Control

    RT-2 trained on 13,000 vision-language-action episodes but included zero F/T data; fails on insertion tasks requiring <5N force control

    arXiv
  13. Scale AI and Universal Robots physical AI partnership

    Scale AI's Universal Robots partnership focuses on assembly data with synchronized F/T, RGB-D, and proprioception

    scale.com
  14. Kitchen Task Training Data for Robotics

    Kitchen task datasets include F/T streams for 18 tool-use skills (wiping, scooping, cutting)

    claru.ai
  15. RLDS with TensorFlow Datasets

    RLDS format (Reinforcement Learning Datasets) built on HDF5 or Parquet with per-timestep observations, actions, and metadata

    TensorFlow
  16. Introduction to HDF5

    HDF5 groups organize episodes hierarchically: /episode_0/step_0/observation/rgb, /episode_0/step_0/observation/force_torque

    The HDF Group
  17. truelabel data provenance glossary

    Truelabel's data provenance tools auto-flag episodes with anomalous force profiles (constant 0N indicates sensor disconnection, >50N indicates collision)

    truelabel.ai
  18. Datasheets for Datasets

    Datasheets for Datasets recommends documenting collection environment, operator skill level, and failure modes

    arXiv
  19. truelabel physical AI data marketplace bounty intake

    Truelabel's marketplace intake requires sensor specs, calibration parameters, and task descriptions; prices F/T datasets at $0.80-2.50 per episode versus $0.10-0.40 for vision-only

    truelabel.ai
  20. Scale AI physical AI data engine

    Scale AI's physical-AI data engine prioritizes force-instrumented teleoperation for manufacturing and warehouse clients

    scale.com
  21. ALOHA project site

    ALOHA uses leader-follower arms with F/T sensors on both sides enabling force feedback; operators feel 50% of contact forces (scaled to avoid injury)

    tonyzhaozh.github.io
  22. UMI gripper project site

    UMI gripper project embeds a 6-axis load cell in the gripper body, trading range (±20N) for compactness; adds 6-axis joystick for direct force control

    umi-gripper.github.io
  23. RoboNet: Large-Scale Multi-Robot Learning

    RoboNet aggregated data from 7 robot labs, each contributing 5,000-15,000 episodes with standardized F/T sensor placement and calibration

    arXiv
  24. Diffusion Policy training example

    Diffusion Policy uses 3-layer 1D CNN over F/T sequence to capture force transients, outputting 128-dim force context vector per timestep

    GitHub
  25. Attribution 4.0 International deed

    CC-BY-4.0 permits commercial use with attribution

    Creative Commons
  26. RoboNet dataset license

    RoboNet's dataset license grants commercial rights but requires derivative datasets to cite the original

    GitHub raw content
  27. Dex-YCB project site

    Dex-YCB combines F/T with tactile sensors (GelSight, DIGIT) imaging contact patches at 30fps; policies trained on F/T + tactile achieve 94% success on in-hand manipulation versus 67% for F/T-only

    dex-ycb.github.io
  28. HOI4D project site

    HOI4D records human-object interaction with synchronized RGB-D, IMUs, and pressure mats (100 sensors at 50Hz)

    hoi4d.github.io
  29. NVIDIA Cosmos World Foundation Models

    NVIDIA Cosmos world models ingest multi-sensor streams and predict future contact forces from vision

    NVIDIA Developer
  30. Domain Randomization for Transferring Deep Neural Networks from Simulation to the Real World

    Domain randomization varies object mass, friction, and contact stiffness in simulation to match real-world force distributions

    arXiv
  31. Crossing the Reality Gap: A Survey on Sim-to-Real Transferability of Robot Controllers in Reinforcement Learning

    Sim-to-real surveys report that F/T-matched simulation reduces real-world fine-tuning from 2,000 episodes to 200

    arXiv

FAQ

What sample rate is required for force-torque data in robot learning?

500-1000Hz is standard for manipulation tasks. Contact transients (insertion, impact) last 8-15ms; 30Hz sampling misses force peaks entirely. DROID and BridgeData V2 use 500Hz F/T with 15-30Hz cameras. Higher rates (1000Hz) benefit high-speed tasks like hammering or catching. Lower rates (100Hz) suffice for quasi-static tasks like slow assembly, but 500Hz is the safe default. Ensure your recording pipeline can sustain the rate without dropped packets — use dedicated Ethernet and CPU core pinning.

How do I synchronize force-torque data with camera frames?

Hardware triggering is most reliable: send a GPIO pulse to the camera at each control cycle and average F/T samples during the camera's exposure window. BridgeData V2 uses this method with Intel RealSense D435 cameras at 15Hz, pairing each frame with 33 F/T samples (500Hz ÷ 15Hz). Software sync via timestamp interpolation works when hardware triggers are unavailable — record F/T at 1000Hz and cameras at 30Hz, then interpolate F/T to camera timestamps. Validate sync by filming a manual tap on the gripper at 120fps and checking that force spikes align with contact frames within 8-16ms.

What force-torque sensor should I use for tabletop manipulation?

ATI Mini45 is the industry standard for tabletop tasks (pick, place, light assembly). It covers ±290N force and ±10Nm torque with 0.06N resolution, costs $4,200, and integrates with ROS2 via Ethernet (NetFT protocol). For heavy assembly (metal parts, press fits), use ATI Gamma (±130N, 0.025N resolution, $6,800). For high-speed tasks, Robotiq FT 300 samples at 1000Hz versus 500Hz for ATI models. Franka FR3 Duo has integrated wrist sensors (±100N, 0.1N resolution) but limits payload to 3kg. Choose based on task forces: measure expected forces with a spring scale during manual task execution.

How do I calibrate gravity compensation for a force-torque sensor?

Mount the gripper, move the arm to 6+ orientations (wrist up, down, left, right, forward, back), and record F/T readings at rest in each pose. Fit a 10-parameter model (3 for mass·g vector, 3 for center-of-mass offset, 4 for sensor frame rotation) using least-squares regression. After calibration, zero-force error should be below 0.1N across all orientations. Store calibration parameters in ROS2 bag metadata for reproducibility. Recalibrate after changing grippers or adding tool mass (e.g., a suction cup adds 80g and shifts CoM by 15mm). DROID's pipeline applies gravity and inertial compensation in real-time at 500Hz using precomputed Jacobians.

What dataset format should I use for multimodal robot data with force-torque?

RLDS (Reinforcement Learning Datasets) and LeRobot's schema are the two standards, both built on HDF5 or Parquet. Each episode is a sequence of steps; each step contains observation (RGB, depth, F/T, proprioception), action, reward, and is_terminal. Store F/T as a 6-element float32 array [Fx, Fy, Fz, Tx, Ty, Tz] under observation.force_torque. Include per-modality timestamps (timestamp_ns) for sync validation. MCAP is preferred for recording because it supports variable-rate streams and microsecond timestamps; convert to RLDS/LeRobot for training. Open X-Embodiment uses RLDS; LeRobot datasets are compatible with the Hugging Face Datasets library.

How much does force-torque data increase dataset value?

Truelabel's marketplace prices F/T datasets at $0.80-2.50 per episode versus $0.10-0.40 for vision-only pick-and-place — a 5-8× premium. Buyers pay more for sensor diversity (multiple F/T models), task coverage (insertion + assembly + tool use), and failure modes (episodes exceeding force limits). Manufacturing and warehouse buyers require F/T data for contact-rich applications; consumer robotics buyers prioritize vision. Policies trained on F/T data achieve 34 percentage points higher success on insertion tasks and 28 points higher on assembly versus vision-only baselines, justifying the collection cost.

Looking for force-torque data collection?

Specify modality, task, environment, rights, and delivery format. Truelabel matches you with vetted capture partners — every delivery includes consent artifacts and commercial licensing by default.

List Your Force-Torque Dataset