Delivery format
Parquet format for robot training data
Parquet suits large robotics dataset hubs, frame tables, episode metadata, sharded time-series records, and LeRobot-compatible distribution. Define the episode index, frame offsets, task descriptions, feature schema, video references, statistics, and split metadata before reviewing samples, so you can verify that the delivery matches the training pipeline.
Quick facts
- Origin: Apache Parquet, a columnar binary table format; the Hugging Face Hub auto-converts dataset uploads to Parquet.
- Robotics adoption: LeRobot v2.x ships state/action streams in Parquet alongside MP4 video; DROID (cadene/droid: 92,233 episodes / 27M frames), BridgeData, and fractal20220817 all use this layout.
- Strengths: columnar reads (load only the fields the trainer uses), efficient predicate pushdown, and native pandas/polars/pyarrow support.
- Pairing convention: Parquet for tabular state/action; MP4 (H.264 / H.265 / AV1) for video; JSONL for task metadata. The HF dataset card describes the pairing.
Comparison
| Format choice | Strength | Risk |
|---|---|---|
| Parquet robot data | Fits large dataset hubs, frame tables, episode metadata, sharded time-series records, and LeRobot-compatible distribution | Needs exact schema agreement before capture |
| Raw files | Fast supplier export | High buyer cleanup burden |
| Custom schema | Matches internal pipeline | Harder supplier onboarding |
What is Parquet robot data?
Request Parquet robot data when the buyer's training or evaluation pipeline already expects Parquet-backed frame tables, episode metadata, sharded time-series records, or LeRobot-compatible distribution. Anchor the bounty to the canonical specification before suppliers submit samples [1], then use implementation documentation to make the expected file layout reviewable [2]. Robotics teams should also name the dataset or paper lineage they expect suppliers to support [3].
[1] "Apache Parquet is an open source, column-oriented data file format designed for efficient data storage and retrieval."
For truelabel buyers, that quote matters because it turns a Parquet robotics dataset from a generic delivery preference into a source-backed requirement that the supplier can test against a sample file.
Using Parquet with robot data
A useful Parquet robot data sample should demonstrate the episode index, frame offsets, task descriptions, feature schema, video references, statistics, and split metadata, plus file naming, manifest completeness, timestamp behavior, and rejected-example traceability. Include at least one workflow or converter reference so the supplier can show how the files load in practice [4], one interoperability reference for adjacent formats [5], and one comparison source for why this format is preferable to a raw folder dump [6].
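The episode-index, frame-offset, and timestamp checks on a sample can be sketched in plain Python once the frame table is loaded into rows. The field names (`frame_offset`, `length`, `frame_index`, `timestamp`) and the helper `check_sample` are hypothetical and stand in for whatever the agreed schema actually specifies.

```python
from collections import defaultdict


def check_sample(frames, episodes):
    """Return a list of problems; an empty list means the sample passed."""
    problems = []
    rows_by_episode = defaultdict(list)
    for position, row in enumerate(frames):
        rows_by_episode[row["episode_index"]].append((position, row))

    for episode in episodes:
        idx = episode["episode_index"]
        rows = rows_by_episode.get(idx, [])
        # Episode metadata must agree with the number of frame rows.
        if len(rows) != episode["length"]:
            problems.append(f"episode {idx}: expected {episode['length']} frames, found {len(rows)}")
        # The first row of the episode must sit at the declared offset.
        if rows and rows[0][0] != episode["frame_offset"]:
            problems.append(f"episode {idx}: starts at row {rows[0][0]}, expected {episode['frame_offset']}")
        # Frame indices must count up from 0 without gaps.
        if [row["frame_index"] for _, row in rows] != list(range(len(rows))):
            problems.append(f"episode {idx}: frame_index is not contiguous from 0")
        # Timestamps must strictly increase within an episode.
        stamps = [row["timestamp"] for _, row in rows]
        if any(later <= earlier for earlier, later in zip(stamps, stamps[1:])):
            problems.append(f"episode {idx}: timestamps are not strictly increasing")
    return problems


# Hypothetical two-episode sample.
episodes = [
    {"episode_index": 0, "frame_offset": 0, "length": 3},
    {"episode_index": 1, "frame_offset": 3, "length": 2},
]
frames = [
    {"episode_index": 0, "frame_index": 0, "timestamp": 0.0},
    {"episode_index": 0, "frame_index": 1, "timestamp": 0.1},
    {"episode_index": 0, "frame_index": 2, "timestamp": 0.2},
    {"episode_index": 1, "frame_index": 0, "timestamp": 0.0},
    {"episode_index": 1, "frame_index": 1, "timestamp": 0.1},
]
print(check_sample(frames, episodes))
```

Returning a list of messages instead of raising on the first failure keeps every rejected example traceable in a single review pass.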
External references and source context
- [1] Apache Parquet file format (Apache Parquet). Apache Parquet is a column-oriented file format for efficient data storage and retrieval.
- [2] Apache Arrow Parquet files (Apache Arrow). Apache Arrow documents reading and writing Parquet files in Python.
- [3] Dremel: Interactive Analysis of Web-Scale Datasets (VLDB Endowment). Dremel is a columnar nested-data reference related to Parquet's data model lineage.
- [4] Hugging Face Datasets features and storage (Hugging Face). Hugging Face Datasets documentation is relevant to Parquet-backed feature schemas.
- [5] HDF5 1.14 documentation (The HDF Group). HDF5 is a structured data format often compared with Parquet for robotics records.
- [6] RLDS with TensorFlow Datasets (TensorFlow). RLDS is relevant when robotics episodes are represented outside columnar tables.
FAQ
What is Parquet robot data used for?
Parquet robot data is used for large robotics dataset hubs, frame tables, episode metadata, sharded time-series records, and LeRobot-compatible distribution.
What fields should Parquet robot data delivery require?
At minimum, require episode index, frame offsets, task descriptions, feature schema, video references, statistics, and split metadata, plus a delivery manifest and validation notes.
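As a rough sketch of how a buyer might check that a delivery manifest covers these minimum fields, the snippet below scans a JSON manifest for missing or empty entries. The key names in `REQUIRED_FIELDS` and the helper `missing_fields` are assumptions for illustration; use the names from your own agreed schema.

```python
import json

# Hypothetical manifest keys mirroring the minimum fields listed above.
REQUIRED_FIELDS = (
    "episode_index",
    "frame_offsets",
    "task_descriptions",
    "feature_schema",
    "video_references",
    "statistics",
    "splits",
)


def missing_fields(manifest):
    """Return the required keys that are absent or empty in the manifest."""
    return [key for key in REQUIRED_FIELDS if manifest.get(key) in (None, "", [], {})]


# Illustrative manifest with empty statistics and no split metadata.
manifest = json.loads("""
{
  "episode_index": "meta/episodes.jsonl",
  "frame_offsets": "meta/episodes.jsonl",
  "task_descriptions": "meta/tasks.jsonl",
  "feature_schema": {"state": "float32[2]", "action": "float32[1]"},
  "video_references": ["videos/episode_000000.mp4"],
  "statistics": {}
}
""")
print(missing_fields(manifest))
```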
Can suppliers convert into this format?
Some suppliers can deliver directly in the requested format; others may need conversion. Buyers should require a small sample before full delivery.
Should the format be decided before capture?
Yes. Deciding the format before capture prevents missing fields, timestamp drift, and expensive post-delivery cleanup.
Working with Parquet robotics datasets
Truelabel normalizes Parquet robotics datasets across capture partners so you can ingest one consistent schema instead of writing per-vendor adapters.