
Delivery format

Parquet robot data format for robot training data

Parquet is a good fit for robot data that is published to large robotics dataset hubs or stored as frame tables, episode metadata, and sharded time-series records, including LeRobot-compatible distribution. Before reviewing samples, define the episode index, frame offsets, task descriptions, feature schema, video references, statistics, and split metadata, so you can verify that the delivery matches the training pipeline.

Updated 2026-05-04
By truelabel
Reviewed by truelabel
Parquet robotics dataset

Quick facts

Origin: Apache Parquet, a columnar binary table format; the Hugging Face Hub auto-converts dataset uploads to Parquet.
Robotics adoption: LeRobot v2.x ships state/action streams in Parquet alongside MP4 video; DROID (cadene/droid: 92,233 episodes / 27M frames), BridgeData, and fractal20220817 all use this layout.
Strengths: Columnar reads (load only the fields the trainer uses), efficient predicate pushdown, native pandas/polars/pyarrow support.
Pairing convention: Parquet for tabular state/action; MP4 (H.264 / H.265 / AV1) for video; JSONL for task metadata. The HF dataset card describes the pairing.

Comparison

Format choice | Strength | Risk
Parquet robot data | Large robotics dataset hubs, frame tables, episode metadata, sharded time-series records, and LeRobot-compatible distribution | Needs exact schema agreement before capture
Raw files | Fast supplier export | High buyer cleanup burden
Custom schema | Matches internal pipeline | Harder supplier onboarding

What is Parquet robot data?

Request Parquet robot data when the buyer's training or evaluation pipeline already expects frame tables, episode metadata, and sharded time-series records, for example because it pulls from large robotics dataset hubs or a LeRobot-compatible distribution. Anchor the bounty to the canonical specification before suppliers submit samples [1], then use implementation documentation to make the expected file layout reviewable [2]. Robotics teams should also name the dataset or paper lineage they expect suppliers to support [3].

"Apache Parquet is an open source, column-oriented data file format designed for efficient data storage and retrieval."

[1]

For truelabel buyers, that quote matters because it turns a Parquet robotics dataset from a generic delivery preference into a source-backed requirement that the supplier can test against a sample file.

Using Parquet for robot data

A useful Parquet robot data sample should demonstrate the episode index, frame offsets, task descriptions, feature schema, video references, statistics, and split metadata, as well as file naming, manifest completeness, timestamp behavior, and rejected-example traceability. Include at least one workflow or converter reference so the supplier can show how the files load in practice [4], one interoperability reference for adjacent formats [5], and one comparison source for why this format is preferable to a raw folder dump [6].
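
One way a buyer might turn that checklist into something a supplier can run is a small sample checker. The sketch below is a minimal example under stated assumptions: the required column names and the rule that frame indices run from 0 without gaps in each episode are hypothetical choices for illustration, and the rows are plain dictionaries such as those returned by pyarrow's Table.to_pylist().

```python
from collections import defaultdict

# Hypothetical required columns; replace with the schema agreed in the bounty.
REQUIRED_COLUMNS = {"episode_index", "frame_index", "timestamp", "task", "video_path"}


def check_sample(rows):
    """Return a list of human-readable problems found in a frame-table sample."""
    problems = []
    frames_by_episode = defaultdict(list)

    for i, row in enumerate(rows):
        missing = REQUIRED_COLUMNS - set(row)
        if missing:
            problems.append(f"row {i}: missing {sorted(missing)}")
            continue
        frames_by_episode[row["episode_index"]].append(row["frame_index"])

    for episode, frames in sorted(frames_by_episode.items()):
        # Assumed rule: frame indices run 0..n-1 with no gaps or duplicates.
        if sorted(frames) != list(range(len(frames))):
            problems.append(f"episode {episode}: frame indices are not contiguous from 0")

    return problems


sample = [
    {"episode_index": 0, "frame_index": i, "timestamp": i * 0.1,
     "task": "pick up the cup", "video_path": "episode_000000.mp4"}
    for i in range(3)
]
print(check_sample(sample))
```

An empty list means the sample passed these particular checks; it says nothing about statistics, split metadata, or video integrity, which would need their own checks.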


External references and source context

  1. Apache Parquet file format (Apache Parquet). Apache Parquet is a column-oriented file format for efficient data storage and retrieval.
  2. Apache Arrow Parquet files (Apache Arrow). Apache Arrow documents reading and writing Parquet files in Python.
  3. Dremel: Interactive Analysis of Web-Scale Datasets (VLDB Endowment). Dremel is a columnar nested-data reference related to Parquet's data model lineage.
  4. Hugging Face Datasets features and storage (Hugging Face). Hugging Face Datasets documentation is relevant to Parquet-backed feature schemas.
  5. HDF5 1.14 documentation (The HDF Group). HDF5 is a structured data format often compared with Parquet for robotics records.
  6. RLDS with TensorFlow Datasets (TensorFlow). RLDS is relevant when robotics episodes are represented outside columnar tables.

FAQ

What is Parquet robot data used for?

Parquet robot data is used for large robotics dataset hubs, frame tables, episode metadata, sharded time-series records, and LeRobot-compatible distribution.

What fields should Parquet robot data delivery require?

At minimum, require episode index, frame offsets, task descriptions, feature schema, video references, statistics, and split metadata, plus a delivery manifest and validation notes.

Can suppliers convert into this format?

Some suppliers can deliver directly in the requested format; others may need conversion. Buyers should require a small sample before full delivery.

Should the format be decided before capture?

Yes. Deciding the format before capture prevents missing fields, timestamp drift, and expensive post-delivery cleanup.
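
Timestamp drift is one of the easier problems to check on a sample. The sketch below assumes a fixed nominal frame rate and measures how far an episode's recorded timestamps stray from an ideal evenly spaced grid; the function name, the fps value, and any acceptance tolerance are hypothetical and would need to be agreed with the supplier.

```python
def max_timestamp_drift(timestamps, fps):
    """Largest absolute gap between recorded timestamps and the ideal i / fps grid.

    Timestamps are taken relative to the first frame of the episode.
    """
    if not timestamps:
        return 0.0
    start = timestamps[0]
    return max(abs((t - start) - i / fps) for i, t in enumerate(timestamps))


# Example: a 10 fps episode whose third frame arrives 50 ms late.
episode_timestamps = [0.0, 0.1, 0.25]
drift = max_timestamp_drift(episode_timestamps, fps=10)
print(f"max drift: {drift:.3f} s")
```

A buyer could run this per episode and reject samples whose drift exceeds the agreed tolerance before committing to full capture.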

Working with Parquet robotics datasets

Truelabel normalizes Parquet robotics datasets across capture partners so you can ingest one consistent schema instead of writing per-vendor adapters.
