Solution
Safety-Critical Robot Data for Human-Robot Collaboration
Safety-critical robot data captures human proximity scenarios, force-torque interactions, and edge-case failures required to train perception systems that meet ISO 10218 and ISO/TS 15066 standards. Unlike task-completion datasets, safety data prioritizes worst-case coverage: humans entering workspaces from occluded angles, partial body visibility, unusual postures, and sensor degradation conditions where detection failure causes physical harm.
Quick facts
- Use case: safety-critical robot data
- Audience: robotics and physical AI teams
- Last reviewed: 2025-06-15
Why Safety-Critical Data Differs from Task-Completion Datasets
Standard robot training data optimizes for task metrics: pick success rate above 92 percent, navigation path efficiency, manipulation cycle time under 8 seconds. Safety-critical data optimizes for a fundamentally different objective — ensuring the robot never causes harm to nearby humans, even when task performance degrades. ISO 10218 defines safety requirements for industrial robot systems including protective stop response times under 500 milliseconds, speed limitation below 250 mm/s in collaborative zones, and safety-rated monitored stop functions with dual-channel verification[1]. ISO/TS 15066 extends these requirements to collaborative applications where humans and robots share workspace without physical barriers, specifying permissible quasi-static contact forces of 65-150 N and transient impact forces of 140-220 N depending on body region[2].
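To make the numbers concrete, here is a minimal sketch of how these limits might be encoded as runtime checks. The constants mirror the figures quoted above; the function name and inputs are illustrative, not drawn from any particular robot SDK.

```python
# Minimal sketch: the ISO 10218 / ISO/TS 15066 numeric limits quoted above,
# encoded as runtime safety checks. Constants mirror the figures in the text;
# function and parameter names are illustrative assumptions.

MAX_COLLAB_SPEED_MM_S = 250      # speed limitation in collaborative zones
MAX_STOP_LATENCY_S = 0.500       # protective stop response time

def check_collaborative_limits(tcp_speed_mm_s: float,
                               stop_latency_s: float) -> list[str]:
    """Return a list of violated limits (empty list means compliant)."""
    violations = []
    if tcp_speed_mm_s > MAX_COLLAB_SPEED_MM_S:
        violations.append(
            f"speed {tcp_speed_mm_s:.0f} mm/s exceeds {MAX_COLLAB_SPEED_MM_S} mm/s")
    if stop_latency_s > MAX_STOP_LATENCY_S:
        violations.append(
            f"stop latency {stop_latency_s * 1000:.0f} ms exceeds "
            f"{MAX_STOP_LATENCY_S * 1000:.0f} ms")
    return violations
```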
Training a learned perception system to enforce these standards requires data that comprehensively covers scenarios where safety interventions are necessary: humans entering the robot workspace from unexpected directions at velocities up to 1.5 m/s, partial occlusion behind equipment or pallets, unusual postures including crouching or reaching overhead, and edge cases where human detection is most difficult and most critical. DROID contains 76,000 manipulation trajectories across 564 scenes but prioritizes task diversity over safety-critical human proximity scenarios[3]. Open X-Embodiment aggregates 1 million trajectories from 22 robot embodiments yet includes minimal force-torque interaction logs or collision-recovery sequences[4]. The gap between task-completion data and safety-certification data is not a matter of scale but of intentional capture design.
Safety data must include sensor degradation conditions: camera lens occlusion from dust or condensation, LiDAR range reduction in fog or steam, depth sensor failure modes under direct sunlight or reflective surfaces. Scale AI's Physical AI data engine supports multi-sensor annotation but does not systematically capture failure-mode scenarios required for safety validation. Truelabel's marketplace enables buyers to specify safety-critical capture requirements including occlusion percentages, lighting lux ranges, and sensor fault injection rates that match deployment environments.
Human Proximity Detection: The Chicken-and-Egg Problem
Collecting safety-relevant human proximity data presents a fundamental paradox: you need data of humans near operating robots to train safety systems, but operating robots near humans without adequate safety systems is dangerous. Traditional solutions rely on simulation environments like RLBench or manual teleoperation with emergency stop buttons, but neither approach captures the full distribution of real-world human behavior near robots. Simulated humans follow scripted paths with predictable velocities; real humans exhibit unpredictable motion patterns, sudden direction changes, and attention lapses while carrying objects or using mobile devices.
EPIC-KITCHENS-100 provides 100 hours of egocentric video across 45 kitchens with 20 million frames and 90,000 action segments, capturing natural human motion in manipulation-dense environments[5]. However, egocentric video lacks the robot-centric viewpoint, depth information, and force-torque measurements required for safety system training. Ego4D extends egocentric capture to 3,670 hours across 74 worldwide locations but similarly omits robot-workspace geometry and collision-risk annotations. The challenge is not video volume but capture design: cameras must be positioned at robot end-effector and base locations, synchronized with joint encoders and force-torque sensors, and annotated for human body keypoints, proximity zones, and predicted collision risk.
Teleoperation datasets offer a partial solution by capturing human-in-the-loop control during safety-critical scenarios. ALOHA demonstrates bilateral teleoperation for bimanual manipulation tasks, recording operator reactions to unexpected events including dropped objects and workspace intrusions. Claru's teleoperation warehouse dataset captures 12,000 episodes of human operators navigating mobile manipulators through environments with pedestrian traffic, providing ground-truth examples of safe deceleration and path replanning. Truelabel's request system allows buyers to specify teleoperation scenarios including human approach angles, velocities, and occlusion conditions that match their deployment risk profiles.
Force-Torque Interaction Logs: The Missing Modality
ISO/TS 15066 compliance requires robots to limit contact forces below body-region-specific thresholds: 65 N for skull and forehead, 110 N for chest, 140 N for upper arms and thighs[2]. Training a learned controller to respect these limits requires force-torque interaction logs paired with contact geometry, approach velocity, and human body region labels. Existing manipulation datasets rarely include force-torque data; when present, it captures intentional grasping forces rather than unintended human contact.
BridgeData V2 contains 60,000 trajectories with RGB-D observations and proprioceptive joint states but omits wrist force-torque measurements[6]. RT-1, trained on 130,000 episodes from 13 robots, similarly lacks force feedback, limiting its applicability to collaborative scenarios where contact is expected and must be regulated[7]. RoboNet aggregated 15 million frames from 7 robot platforms across 4 institutions but prioritized visual diversity over force-sensing coverage[8]. The result is a training data ecosystem optimized for non-contact manipulation rather than safe physical interaction.
Collecting force-torque interaction data requires specialized hardware: 6-axis force-torque sensors at the wrist with 0.1 N resolution, joint torque sensors with 0.01 Nm resolution, and synchronized capture at 1 kHz to detect transient impacts. Franka FR3 collaborative robots include integrated joint torque sensing and external force estimation, enabling capture of contact dynamics during teleoperation or scripted interaction scenarios. Universal Robots UR series provides force mode control with configurable compliance, allowing operators to demonstrate safe contact behaviors that can be logged and replayed. Truelabel's marketplace connects buyers with collectors who own force-sensing hardware and can execute custom interaction protocols including controlled human-robot contact at specified velocities and angles.
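As a sketch of what processing such logs can look like, the following scans a 1 kHz wrist force-torque trace for samples exceeding a body-region threshold. The threshold table reuses the figures quoted above; the array layout is an assumption for the example.

```python
# Illustrative sketch: scanning a 1 kHz wrist force-torque log for contact
# samples that exceed the ISO/TS 15066 quasi-static thresholds quoted above.
# The episode layout (one (T, 6) array per episode) is an assumption.
import numpy as np

QUASI_STATIC_LIMITS_N = {"skull": 65.0, "chest": 110.0, "upper_arm": 140.0}

def find_limit_violations(ft_log: np.ndarray, body_region: str,
                          sample_rate_hz: int = 1000):
    """ft_log: (T, 6) array of [Fx, Fy, Fz, Tx, Ty, Tz] wrist measurements.

    Returns (timestamp_s, force_N) pairs where the net contact force
    magnitude exceeds the limit for the labeled body region.
    """
    limit = QUASI_STATIC_LIMITS_N[body_region]
    force_mag = np.linalg.norm(ft_log[:, :3], axis=1)  # net contact force
    over = np.flatnonzero(force_mag > limit)
    return [(i / sample_rate_hz, float(force_mag[i])) for i in over]
```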
Multi-Sensor Fusion for Occlusion-Robust Detection
Human proximity detection in industrial environments must function reliably despite occlusion from equipment, pallets, and other humans. Single-modality perception systems fail when the primary sensor is blocked: RGB cameras cannot see through opaque objects, LiDAR misses low-profile obstacles below the scan plane, depth sensors produce invalid readings on reflective or absorptive surfaces. Safety-critical training data must include multi-sensor captures with ground-truth occlusion labels to train fusion models that maintain detection performance when individual sensors fail.
Waymo Open Dataset provides synchronized LiDAR, camera, and radar data for autonomous driving with 1,150 scenes and 12 million 3D bounding boxes, demonstrating the value of multi-sensor fusion for occlusion handling. However, driving scenarios differ from manipulation workspaces: sensor mounting heights, object scales, and occlusion patterns are fundamentally different. Dex-YCB captures hand-object interaction with RGB-D and motion capture but focuses on tabletop manipulation rather than full-body human detection in industrial settings. HOI4D provides 4D human-object interaction sequences with category-level annotations but lacks the robot-workspace context and safety-zone geometry required for proximity detection training.
Effective multi-sensor safety data pairs RGB cameras at 30 fps, 3D LiDAR at 10 Hz, and depth cameras at 30 fps with synchronized timestamps and extrinsic calibration. MCAP format supports heterogeneous sensor streams with nanosecond-precision timestamps, enabling fusion model training on temporally aligned data. Segments.ai multi-sensor labeling provides tools for annotating 3D bounding boxes, keypoints, and occlusion masks across synchronized camera and LiDAR frames. Truelabel's data provenance system tracks sensor calibration parameters, mounting positions, and occlusion statistics to ensure buyers receive fusion-ready datasets with documented sensor coverage.
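A fusion data loader ultimately reduces to pairing measurements across streams by timestamp. The sketch below aligns each camera frame with its nearest LiDAR sweep using nanosecond timestamps, as MCAP stores them; the tolerance value and stream names are assumptions for the example.

```python
# Sketch of nearest-timestamp alignment across heterogeneous sensor streams,
# the kind of pairing a fusion model's data loader needs. Timestamps are
# nanoseconds (as in MCAP) and assumed sorted ascending.
import numpy as np

def align_streams(camera_t_ns: np.ndarray, lidar_t_ns: np.ndarray,
                  max_skew_ns: int = 50_000_000):  # 50 ms tolerance (assumed)
    """For each camera frame, find the nearest LiDAR sweep within tolerance.

    Returns (camera_index, lidar_index) pairs; frames with no sweep inside
    the tolerance window are dropped rather than mismatched.
    """
    pairs = []
    idx = np.searchsorted(lidar_t_ns, camera_t_ns)
    for ci, t in enumerate(camera_t_ns):
        candidates = [i for i in (idx[ci] - 1, idx[ci])
                      if 0 <= i < len(lidar_t_ns)]
        if not candidates:
            continue
        best = min(candidates, key=lambda i: abs(int(lidar_t_ns[i]) - int(t)))
        if abs(int(lidar_t_ns[best]) - int(t)) <= max_skew_ns:
            pairs.append((ci, best))
    return pairs
```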
Edge-Case Scenarios: Crawling, Reaching, and Unusual Postures
Safety certification requires demonstrating reliable human detection across the full range of postures humans might adopt in robot workspaces: crouching to retrieve dropped items, reaching overhead to access shelves, crawling under conveyors, or lying prone during maintenance. Standard datasets capture standing and walking humans; edge-case postures are underrepresented or absent. COCO person detection contains 250,000 person instances but predominantly shows upright postures in everyday contexts rather than industrial edge cases[9].
CALVIN provides 24,000 manipulation trajectories in simulated kitchen and tabletop environments but does not include human proximity scenarios or unusual postures. RoboCasa extends simulation to 100 kitchen layouts with 2,500 objects but similarly omits human models in safety-critical configurations. The gap between simulation and deployment is not just visual realism but scenario coverage: real industrial environments include maintenance workers in confined spaces, operators bending to load materials, and pedestrians passing through workspaces while distracted by mobile devices or conversations.
Capturing edge-case posture data requires deliberate scenario design: scripted sequences where actors adopt specific postures at defined distances and angles relative to the robot, annotated with body keypoints, proximity zone labels, and occlusion percentages. Claru's kitchen task training data includes human actors performing reaching, bending, and crouching motions during meal preparation, providing posture diversity in manipulation contexts. Truelabel's physical AI data marketplace enables buyers to specify posture requirements including joint angle ranges, body center-of-mass heights, and occlusion conditions that match their risk assessments. Collectors submit proposals with sample captures demonstrating their ability to execute the specified scenarios with consistent annotation quality.
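A capture request for such a scenario can be expressed as a small structured spec. The dictionary below is a hypothetical example that mirrors the parameters listed above; it is not a Truelabel schema or marketplace format.

```python
# Hypothetical capture-scenario spec for an edge-case posture, mirroring the
# parameters described in the text (posture, distance, angle, occlusion).
posture_scenario = {
    "posture": "crouching",         # crouch / reach_overhead / crawl / prone
    "distance_to_robot_m": 1.2,
    "approach_angle_deg": 135,      # relative to the robot base frame
    "occlusion_fraction": 0.4,      # portion of body hidden by equipment
    "annotations": ["body_keypoints", "proximity_zone", "occlusion_mask"],
    "repetitions": 20,
}
```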
Sensor Degradation and Environmental Stress Testing
Safety systems must maintain performance when sensors degrade due to environmental conditions: camera lenses obscured by dust, oil mist, or condensation; LiDAR range reduced by fog, steam, or airborne particulates; depth sensors producing invalid readings under direct sunlight or near reflective surfaces. Training data that includes only clean-sensor conditions produces models that fail silently when deployed in real industrial environments. Safety-critical datasets must systematically capture sensor degradation scenarios with ground-truth labels indicating degradation type and severity.
Domain randomization addresses sim-to-real transfer by varying lighting, textures, and object properties during training, but does not explicitly model sensor failure modes[10]. Sim-to-real transfer research focuses on bridging the visual appearance gap rather than sensor reliability gaps. Real-world sensor degradation exhibits specific failure patterns: depth sensors produce systematic errors near edges and on specular surfaces, LiDAR returns are attenuated by fog with wavelength-dependent scattering, RGB cameras lose contrast in low-light conditions with increased noise variance.
Capturing sensor degradation data requires controlled environmental chambers or opportunistic capture during adverse conditions. Fog machines, dust generators, and variable lighting rigs enable systematic degradation testing in laboratory settings. Field capture during rain, snow, or high-humidity conditions provides authentic degradation patterns but requires weather-dependent scheduling. Kognic's annotation platform supports degraded-sensor labeling workflows where annotators mark invalid sensor regions and estimate degradation severity. Dataloop's data management enables filtering and stratification by environmental metadata including temperature, humidity, and particulate concentration. Truelabel's request system allows buyers to specify environmental stress conditions including temperature ranges, humidity percentages, and particulate concentrations that match their deployment sites.
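Synthetic degradation can complement, though not replace, real adverse-condition capture. The sketch below injects a simple Beer-Lambert-style fog attenuation into LiDAR ranges for stress-testing a trained detector; the extinction coefficient and the probabilistic dropout model are deliberately simplified assumptions.

```python
# Hedged example: synthetic fog injection for LiDAR stress testing. Returns
# are dropped with probability growing with range, mimicking out-and-back
# attenuation; dropped returns become NaN (no return). The model is a
# simplification, not a calibrated fog simulator.
import numpy as np

def inject_fog(lidar_ranges_m: np.ndarray, extinction_per_m: float = 0.05,
               rng=None) -> np.ndarray:
    rng = rng or np.random.default_rng()
    p_return = np.exp(-2.0 * extinction_per_m * lidar_ranges_m)
    degraded = lidar_ranges_m.astype(float).copy()
    degraded[rng.random(lidar_ranges_m.shape) > p_return] = np.nan
    return degraded
```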
Collision Recovery and Post-Contact Behavior
Safety-critical training data must include not only collision avoidance but also collision recovery: how the robot should behave after unintended contact occurs. ISO 10218 requires protective stop within 500 milliseconds of contact detection; ISO/TS 15066 specifies force-limited operation where contact is expected and regulated. Training a learned policy to execute safe post-contact behavior requires datasets that capture contact events, force-torque trajectories during contact, and successful recovery sequences including retraction, replanning, and resumption.
Existing manipulation datasets rarely include collision events; when collisions occur during data collection, those episodes are typically discarded as failures rather than labeled as valuable safety examples. RT-2, trained on 6,000 web-scraped robot trajectories and 580,000 vision-language examples, does not include collision-recovery demonstrations[11]. RoboCat self-improves through 253,000 trajectories across 141 tasks but prioritizes task success over safety-critical failure handling[12]. The result is models that perform well on nominal trajectories but lack robust recovery behaviors when unexpected contact occurs.
Collecting collision-recovery data requires deliberate fault injection: scripted scenarios where the robot contacts objects or humans at controlled velocities and forces, followed by operator-demonstrated or scripted recovery sequences. LeRobot supports trajectory recording with force-torque sensors, enabling capture of contact dynamics and recovery behaviors. RLDS format represents episodes as sequences of observations, actions, and rewards, allowing collision events to be marked with negative rewards and recovery sequences to be annotated as corrective actions[13]. Truelabel's marketplace enables buyers to specify collision-recovery scenarios including contact velocities, impact forces, and recovery success criteria that align with their safety validation requirements.
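To illustrate how collision events might be marked in an RLDS-style episode, the sketch below tags contact and recovery steps in a list of step dictionaries. The force threshold, reward values, and field names are assumptions for the example, not part of the RLDS specification.

```python
# Sketch of marking contact events in an RLDS-style episode (a list of step
# dicts with observation/action/reward keys). Threshold, rewards, and field
# names are illustrative assumptions.

CONTACT_FORCE_N = 20.0   # hypothetical contact-detection threshold

def label_collision_recovery(steps: list) -> list:
    """Tag steps as 'contact' when wrist force exceeds the threshold, then
    tag following steps as 'recovery' until the policy replans."""
    in_recovery = False
    for step in steps:
        force = step["observation"]["wrist_force_n"]
        if force > CONTACT_FORCE_N:
            step["reward"] = -1.0        # penalize the contact itself
            step["tag"] = "contact"
            in_recovery = True
        elif in_recovery:
            step["tag"] = "recovery"     # corrective action after contact
            if step["observation"].get("replanned", False):
                in_recovery = False      # recovery complete on replan
        else:
            step.setdefault("tag", "nominal")
    return steps
```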
Annotation Requirements for Safety Certification
Safety-critical datasets require annotation schemas that go beyond standard object detection or segmentation. Annotations must include human body keypoints with joint angle estimates, proximity zone labels indicating distance to robot workspace boundaries, predicted collision risk scores based on human velocity and trajectory, and body region labels for force-limit compliance. Standard annotation tools designed for autonomous driving or general computer vision lack the specialized schemas and workflows required for safety-critical robot data.
CVAT supports 2D bounding box, polygon, and keypoint annotation but does not provide built-in schemas for proximity zones or collision risk. Labelbox offers custom ontology creation but requires buyers to design and validate their own safety-specific schemas. Encord Annotate supports video annotation with temporal interpolation but lacks force-torque visualization and contact-event marking. This tooling gap forces buyers to either adapt general-purpose tools with custom plugins or build proprietary annotation pipelines, increasing data preparation costs and time-to-deployment.
Safety-critical annotation workflows must include multi-stage review: initial annotators mark human keypoints and proximity zones, domain experts review and correct annotations based on safety standards, and automated validation checks ensure consistency across frames and episodes. Appen's data annotation services provide managed annotation teams but require buyers to supply detailed annotation guidelines and quality rubrics. CloudFactory's industrial robotics solutions offer specialized annotation for manufacturing use cases but focus on defect detection rather than human safety. Truelabel's marketplace connects buyers with annotation teams experienced in safety-critical labeling, providing pre-built schemas for ISO 10218 and TS 15066 compliance and multi-stage review workflows that ensure annotation quality meets certification requirements.
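A safety-specific annotation record might look like the following dataclass, which collects the fields discussed above. The names are illustrative, not a published ISO or Truelabel schema.

```python
# Minimal sketch of a per-frame safety annotation record, assuming the
# fields described in the text. Field names are illustrative assumptions.
from dataclasses import dataclass

@dataclass
class SafetyAnnotation:
    frame_index: int
    keypoints_xyz: list            # human body keypoints as (x, y, z) tuples
    proximity_zone: str            # e.g. "outside", "warning", "protective_stop"
    distance_to_workspace_m: float
    collision_risk: float          # 0.0-1.0, from velocity and trajectory
    body_region: str = "unknown"   # nearest region for force-limit lookup
    occlusion_fraction: float = 0.0
    review_stage: str = "initial"  # "initial", "expert", "validated"
```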
Dataset Scale and Diversity Requirements
How much safety-critical data is enough? The answer depends on the deployment environment's complexity, the robot's degrees of freedom, and the certification standard's rigor. A fixed-base collaborative robot in a structured warehouse requires fewer scenarios than a mobile manipulator on a dynamic factory floor. ISO 10218 does not specify dataset size requirements; it requires demonstrating that the safety system performs reliably across the full range of foreseeable hazards. Translating this requirement into dataset specifications requires risk assessment: identifying hazard scenarios, estimating their frequency, and ensuring sufficient data coverage for each scenario class.
Open X-Embodiment demonstrates that cross-embodiment training on 1 million trajectories improves generalization, but does not address safety-specific scenario coverage[14]. DROID provides 76,000 trajectories across 564 scenes with 86 object categories, prioritizing object and scene diversity over human proximity scenarios[15]. Safety-critical datasets require a different diversity metric: coverage of human approach angles, velocities, postures, occlusion conditions, and sensor degradation states. A dataset with 10,000 episodes covering 100 distinct safety scenarios provides better certification support than a dataset with 100,000 episodes covering 10 scenarios.
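A simple way to operationalize coverage-over-volume is to count episodes per scenario class and flag under-covered classes, as in this sketch. The minimum count is an arbitrary placeholder, not a requirement from any standard.

```python
# Illustrative coverage check: count episodes per scenario class and flag
# classes below a minimum. The minimum is an assumed placeholder value.
from collections import Counter

def coverage_report(episode_scenarios: list, min_per_class: int = 50) -> dict:
    counts = Counter(episode_scenarios)
    under = {s: n for s, n in counts.items() if n < min_per_class}
    return {"classes": len(counts),
            "episodes": len(episode_scenarios),
            "under_covered": under}
```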
Dataset diversity must also span robot embodiments and workspace geometries. A safety system trained on data from a single robot model may not generalize to robots with different kinematic structures, reach envelopes, or sensor mounting positions. RoboNet aggregated data from 7 robot platforms to improve cross-platform generalization, demonstrating that embodiment diversity improves transfer[16]. Truelabel's marketplace enables buyers to specify embodiment requirements including robot models, sensor configurations, and workspace geometries, then matches them with collectors who own the specified hardware and can execute the required capture protocols.
Provenance and Auditability for Certification
Safety certification requires auditable data provenance: documented chains of custody from capture through annotation to model training, with metadata tracking sensor calibration, environmental conditions, and annotation quality metrics. Regulatory bodies and certification authorities need to verify that training data covers the required scenarios, annotations meet quality standards, and no data corruption or mislabeling occurred during preparation. Standard dataset distribution formats like ZIP archives or cloud storage buckets lack the provenance metadata and audit trails required for certification.
PROV-DM defines a data model for provenance information including entities, activities, and agents, enabling machine-readable provenance graphs. OpenLineage provides an open standard for data lineage tracking in data pipelines, capturing dataset versions, transformations, and dependencies. However, neither standard is widely adopted in robotics dataset distribution; most datasets provide minimal provenance metadata beyond a README file and a license. Datasheets for Datasets proposes a structured documentation framework covering motivation, composition, collection process, and recommended uses, but remains a research proposal rather than an industry standard[17].
Truelabel's data provenance system tracks capture metadata including sensor serial numbers, calibration timestamps, environmental conditions, and operator identities. Annotation provenance includes annotator identities, review stages, quality scores, and inter-annotator agreement metrics. Model training provenance links datasets to training runs, hyperparameters, and evaluation results. This end-to-end provenance enables buyers to demonstrate to certification authorities that their training data meets safety requirements and that their models were trained on verified, high-quality data. Provenance metadata is cryptographically signed and stored in immutable logs, providing tamper-evident audit trails for regulatory review.
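As an illustration of tamper-evident logging, the sketch below chains provenance entries by hashing each record together with its predecessor's digest. HMAC-SHA256 stands in here for whatever signing scheme a production system would actually use; the record structure is an assumption.

```python
# Sketch of a tamper-evident provenance chain: each entry's digest covers
# its metadata plus the previous entry's digest, so any later edit breaks
# the chain. HMAC-SHA256 is a stand-in for a real signing scheme.
import hashlib
import hmac
import json

def append_provenance(log: list, metadata: dict, key: bytes) -> dict:
    prev_digest = log[-1]["digest"] if log else ""
    payload = json.dumps({"meta": metadata, "prev": prev_digest},
                         sort_keys=True).encode()
    entry = {"meta": metadata, "prev": prev_digest,
             "digest": hmac.new(key, payload, hashlib.sha256).hexdigest()}
    log.append(entry)
    return entry
```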
Licensing and Liability Considerations
Safety-critical robot deployments carry liability risks: if a robot injures a human, the manufacturer, operator, and potentially the training data provider may face legal claims. Dataset licenses must address liability allocation, indemnification, and warranty disclaimers. Standard open licenses such as Creative Commons Attribution 4.0 impose attribution requirements but do not address liability for safety-critical applications. Commercial dataset licenses often include broad liability disclaimers but may not be enforceable in jurisdictions with strict product liability laws.
ISO 10218 and TS 15066 compliance does not eliminate liability; it establishes a baseline for due diligence. If a robot causes injury despite meeting safety standards, liability may depend on whether the training data adequately covered the hazard scenario that led to the injury. Dataset providers who claim their data supports safety certification may face liability if the data is later found to be inadequate. Buyers need licenses that clearly define the dataset's intended use, document its limitations, and allocate liability between the data provider and the buyer.
Federal Acquisition Regulation Subpart 27.4 addresses rights in data and copyrights for U.S. government contracts, providing frameworks for data rights allocation. GDPR Article 7 establishes conditions for consent when personal data is collected, relevant for datasets containing human images or biometric data. Truelabel's marketplace provides standardized license templates for safety-critical data including liability disclaimers, indemnification clauses, and warranty limitations. Buyers can negotiate custom license terms for high-value datasets, with legal review supported by Truelabel's contract management tools.
Integration with Existing Training Pipelines
Safety-critical datasets must integrate with buyers' existing training pipelines including data loaders, augmentation libraries, and model architectures. Format compatibility is essential: datasets stored in proprietary formats require custom parsers that increase integration effort and introduce potential bugs. Standard formats like HDF5, MCAP, and Parquet enable seamless integration with popular machine learning frameworks including PyTorch, TensorFlow, and JAX.
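As a sketch of format-level integration, the following loads HDF5-packaged episodes into a PyTorch Dataset. The file layout (one group per episode containing rgb, ft, and action arrays) is hypothetical; real packages document their own layout in metadata files.

```python
# Minimal sketch of loading HDF5-packaged episodes into PyTorch. The
# assumed layout is one HDF5 group per episode with "rgb", "ft", and
# "action" datasets; adapt to the layout documented with your package.
import h5py
import torch
from torch.utils.data import Dataset

class SafetyEpisodeDataset(Dataset):
    def __init__(self, path: str):
        self.f = h5py.File(path, "r")
        self.episodes = list(self.f.keys())

    def __len__(self) -> int:
        return len(self.episodes)

    def __getitem__(self, i: int) -> dict:
        ep = self.f[self.episodes[i]]
        return {"rgb": torch.from_numpy(ep["rgb"][:]),   # (T, H, W, 3)
                "ft": torch.from_numpy(ep["ft"][:]),     # (T, 6) force-torque
                "action": torch.from_numpy(ep["action"][:])}
```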
LeRobot provides a unified interface for loading robotics datasets in RLDS format, supporting 25 datasets including ALOHA, BridgeData V2, and DROID[18]. TensorFlow Datasets offers a catalog of 200+ datasets with standardized loading APIs, but robotics datasets remain underrepresented with only 12 robot-specific datasets as of 2024. Hugging Face Datasets supports 50,000+ datasets with streaming and caching, but physical AI datasets require specialized loaders for multi-sensor synchronization and trajectory segmentation.
Truelabel's marketplace delivers datasets in buyer-specified formats including RLDS, MCAP, HDF5, and Parquet, with conversion scripts and validation tools. Datasets include metadata files documenting sensor calibration, coordinate frames, and annotation schemas in machine-readable formats. Sample training scripts demonstrate integration with popular frameworks including LeRobot, RobotLearning, and Diffusion Policy. Buyers receive not just raw data but integration-ready packages that reduce time-to-training from weeks to days.
External references and source context
- [1] ISO 10218-1, Safety requirements for industrial robots: protective stop and speed limitation requirements. ISO.
- [2] ISO/TS 15066, Collaborative robots: permissible force and pressure limits for collaborative applications. ISO.
- [3] DROID project site: 76,000 manipulation trajectories across 564 scenes. droid-dataset.github.io.
- [4] Open X-Embodiment: Robotic Learning Datasets and RT-X Models. Aggregates 1 million trajectories from 22 robot embodiments. arXiv.
- [5] Rescaling Egocentric Vision: Collection, Pipeline and Challenges for EPIC-KITCHENS-100. 100 hours of egocentric video with 20 million frames. arXiv.
- [6] BridgeData V2: A Dataset for Robot Learning at Scale. 60,000 trajectories with RGB-D observations. arXiv.
- [7] RT-1: Robotics Transformer for Real-World Control at Scale. 130,000 episodes from 13 robots. arXiv.
- [8] RoboNet: Large-Scale Multi-Robot Learning. 15 million frames from 7 robot platforms. arXiv.
- [9] COCO dataset site: 250,000 person instances. cocodataset.org.
- [10] Domain Randomization for Transferring Deep Neural Networks from Simulation to the Real World. Varies lighting and textures for sim-to-real transfer. arXiv.
- [11] RT-2: Vision-Language-Action Models Transfer Web Knowledge to Robotic Control. 6,000 web-scraped robot trajectories and 580,000 vision-language examples. arXiv.
- [12] RoboCat: A Self-Improving Generalist Agent for Robotic Manipulation. 253,000 trajectories across 141 tasks. arXiv.
- [13] RLDS: an Ecosystem to Generate, Share and Use Datasets in Reinforcement Learning. Episodes as sequences of observations, actions, and rewards. arXiv.
- [14] Open X-Embodiment: Robotic Learning Datasets and RT-X Models. Cross-embodiment training on 1 million trajectories. arXiv.
- [15] DROID: A Large-Scale In-The-Wild Robot Manipulation Dataset. 76,000 trajectories across 564 scenes with 86 object categories. arXiv.
- [16] RoboNet: Large-Scale Multi-Robot Learning. Data from 7 robot platforms for cross-platform generalization. arXiv.
- [17] Datasheets for Datasets. Structured documentation framework covering motivation and composition. arXiv.
- [18] LeRobot documentation. Unified interface for loading 25 robotics datasets in RLDS format. Hugging Face.
FAQ
What is the minimum dataset size required for ISO 10218 safety certification?
ISO 10218 does not specify minimum dataset sizes; it requires demonstrating that safety systems perform reliably across foreseeable hazards. Dataset size depends on deployment environment complexity, robot degrees of freedom, and hazard scenario diversity. A structured warehouse with fixed-base robots may require 5,000-10,000 episodes covering 50-100 distinct scenarios. A dynamic factory floor with mobile manipulators may require 50,000-100,000 episodes covering 500+ scenarios. Focus on scenario coverage rather than raw episode count: 10,000 episodes covering 100 scenarios provides better certification support than 100,000 episodes covering 10 scenarios.
Can simulated safety data replace real-world capture for certification?
Simulation provides valuable coverage of rare edge cases and systematic scenario variation, but cannot fully replace real-world data for safety certification. Simulated humans follow scripted paths with predictable dynamics; real humans exhibit unpredictable motion, attention lapses, and unusual postures. Simulated sensors produce idealized measurements; real sensors exhibit degradation, noise, and failure modes. Certification authorities typically require real-world validation data demonstrating that safety systems trained on simulated data perform reliably in deployment conditions. Hybrid approaches combining simulated scenario generation with real-world validation provide cost-effective certification paths.
How do I verify that a dataset covers the hazard scenarios relevant to my deployment?
Conduct a hazard analysis following ISO 12100 risk assessment methodology: identify foreseeable hazards, estimate their severity and frequency, and determine required risk reduction measures. Map each hazard scenario to dataset coverage requirements including human approach angles, velocities, postures, and occlusion conditions. Review dataset documentation including scenario descriptions, capture protocols, and annotation schemas. Request sample episodes demonstrating coverage of your highest-risk scenarios. Truelabel's provenance system provides scenario coverage reports showing episode counts per scenario class, enabling quantitative verification of hazard coverage.
How does Truelabel deliver safety-critical robot data?
Truelabel matches your safety-critical robot data requirement to verified capture partners and curated open datasets. Each engagement includes contributor consent records, provenance traces from raw capture to packaged dataset, and the structural QA metrics (frame coverage, action labels, episode length) that production-grade training requires.
Looking for safety-critical robot data?
Specify modality, task, environment, rights, and delivery format. Truelabel matches you with vetted capture partners — every delivery includes consent artifacts and commercial licensing by default.
Submit Safety Data Request