JSON
Structured metadata format used for manifests, annotations, and calibration files.
FORMAT FACETS
Pick the format your training pipeline expects. RLDS, LeRobot, HDF5, MCAP, ROS bag, and Parquet each carry trade-offs for streaming, schema preservation, and tool compatibility. These facets group datasets by the format they actually ship.
DIRECT ANSWER
A dataset's delivery format decides whether a small sample can move from supplier to your training pipeline without a hidden conversion project. Buyers should require samples in the requested format before scaling, since not every format preserves episode boundaries, action streams, or timestamps the same way.
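One way to make the sample check concrete is a small structural test run on the sample before any volume order. The sketch below assumes the supplier ships a JSON manifest with an `episodes` list whose entries carry `episode_id`, `timestamps`, and `actions`. That schema and the `check_sample` helper are hypothetical and only illustrate the idea; a real delivery will need its own field names.

```python
import json


def check_sample(manifest: dict) -> list[str]:
    """Return human-readable problems found in a sample manifest.

    The manifest layout is an assumed, illustrative schema.
    """
    problems = []
    episodes = manifest.get("episodes", [])
    if not episodes:
        problems.append("no episodes listed")
    for ep in episodes:
        ep_id = ep.get("episode_id", "<missing id>")
        timestamps = ep.get("timestamps", [])
        actions = ep.get("actions", [])
        # Episode boundaries: every episode must contain at least one step.
        if not timestamps:
            problems.append(f"{ep_id}: no timestamps")
        # Action streams: expect one action per timestamped step.
        if len(actions) != len(timestamps):
            problems.append(
                f"{ep_id}: {len(actions)} actions for {len(timestamps)} timestamps"
            )
        # Timestamps: must be strictly increasing within an episode.
        if any(b <= a for a, b in zip(timestamps, timestamps[1:])):
            problems.append(f"{ep_id}: timestamps not strictly increasing")
    return problems


sample = json.loads("""
{
  "episodes": [
    {"episode_id": "ep_000", "timestamps": [0.0, 0.1, 0.2], "actions": [[0.1], [0.2], [0.3]]},
    {"episode_id": "ep_001", "timestamps": [0.0, 0.2, 0.1], "actions": [[0.1], [0.2]]}
  ]
}
""")

for problem in check_sample(sample):
    print(problem)
```

An empty result does not prove the sample is fit for training. It only confirms that the structure a conversion most often damages survived delivery.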
6 FACETS
Structured metadata format used for manifests, annotations, and calibration files.
Compressed video container used for RGB or egocentric footage.
Hierarchical data format commonly used for trajectories, demonstrations, and arrays.
3D scene or object geometry represented as spatial points, often from depth sensors or reconstruction.
Columnar data format used by modern robotics dataset hubs for episode metadata, frame tables, and sharded time-series records.
Python object serialization format sometimes used by benchmark releases for demonstrations, observations, and robot state dictionaries.
CROSS-CATALOG
Combine this facet with a second filter (modality, task, robot, format, license, or commercial-use) on the main dataset catalog to narrow the shortlist faster.
KEEP DIGGING
A dataset record is only useful when it connects into the rest of the buyer workflow. The next review step is usually not another summary; it is a fit check, rights triage, source comparison, or custom bounty spec that names the missing proof.
For physical AI teams, the hard question is whether the public source can support a specific model objective under real deployment constraints. That requires adjacent dataset records, tools, comparisons, and sourcing paths, plus external references that a reviewer can open and challenge.
Use the links below to keep the review grounded. Start broad when discovery is incomplete, move into profile and comparison pages when the candidate source is known, and switch to custom collection when the blocker is rights, consent, geography, robot embodiment, or target environment coverage.
TRUELABEL ROUTING
Tell us the source dataset and your target schema. Our partners can deliver pre-validated conversions with a manifest, checksums, and conversion-loss notes.
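On the receiving side, a delivered manifest with checksums can be verified in a few lines. The sketch below assumes a manifest with a `files` list of `path` and `sha256` entries. That layout, the file names, and the `verify_manifest` helper are hypothetical, not a documented delivery format.

```python
import hashlib
import tempfile
from pathlib import Path


def sha256_of(path: Path, chunk_size: int = 1 << 20) -> str:
    """Hash a file in chunks so large recordings do not load into memory."""
    digest = hashlib.sha256()
    with path.open("rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def verify_manifest(manifest: dict, root: Path) -> list[str]:
    """Return one message per file that is missing or fails its checksum."""
    failures = []
    for entry in manifest.get("files", []):
        target = root / entry["path"]
        if not target.is_file():
            failures.append(f"{entry['path']}: missing")
        elif sha256_of(target) != entry["sha256"]:
            failures.append(f"{entry['path']}: checksum mismatch")
    return failures


# Demo with a temporary directory standing in for a delivered conversion.
with tempfile.TemporaryDirectory() as tmp:
    root = Path(tmp)
    (root / "episode_000.json").write_bytes(b'{"actions": []}')
    manifest = {
        "files": [
            {"path": "episode_000.json", "sha256": sha256_of(root / "episode_000.json")},
            {"path": "episode_001.json", "sha256": "0" * 64},  # listed but never delivered
        ]
    }
    failures = verify_manifest(manifest, root)
    print(failures)
```

Checksums confirm that the files arrived intact. They say nothing about what the conversion dropped, which is why the conversion-loss notes still need a separate read.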