HUGGING FACE ROBOTICS RECORD
Rank #565 in the current truelabel Hugging Face robotics crawl by downloads.
DIRECT ANSWER
unitreerobotics/Z1_Dual_Dex1_StackBox_Dataset_V2 is a robotics dataset candidate on Hugging Face. Treat it as a source record first: useful for discovery, benchmarking, or ingestion planning, but not procurement-ready until rights, consent, format, and sample QA are checked.
Downloads: 638 · License tag: Apache 2.0 · Last modified: Sep 16, 2025
SOURCE CONTEXT
This dataset was created using LeRobot. The captured dataset-card excerpt reproduces the start of meta/info.json:

```json
{
  "codebase_version": "v2.1",
  "robot_type": "Unitree_Z1_Dual",
  "total_episodes": 254,
  "total_frames": 178104,
  "total_tasks": 1,
  "total_videos": 762,
  "total_chunks": 1,
  "chunks_size": 1000,
  "fps": 30,
  "splits": { "train": "0:254" },
  "data_path": "data/chunk-{episode_chunk:03d}/episode_{episode_index:06d}.parquet",
  "video_path": …
}
```
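Where a reviewer wants the full structure rather than the truncated excerpt, a minimal sketch like the following can fetch just the metadata file. It assumes the meta/info.json path shown above exists at the repo root and that the huggingface_hub client is installed.

```python
import json

from huggingface_hub import hf_hub_download  # pip install huggingface_hub

# Pull only the metadata file, not the full dataset.
meta_path = hf_hub_download(
    repo_id="unitreerobotics/Z1_Dual_Dex1_StackBox_Dataset_V2",
    filename="meta/info.json",
    repo_type="dataset",
)

with open(meta_path) as f:
    info = json.load(f)

# Surface the fields a reviewer cares about first.
for key in ("codebase_version", "robot_type", "total_episodes",
            "total_frames", "fps", "splits", "data_path"):
    print(key, "->", info.get(key))
```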
EDITORIAL REVIEW
unitreerobotics/Z1_Dual_Dex1_StackBox_Dataset_V2 is a sourced discovery record, not a certified procurement package. The crawl captured the dataset id, author, downloads, license tag, format tags, modality tags, gating status, and available dataset-card description from the Hugging Face API snapshot listed in the sources. That is enough to decide whether the record deserves review. It is not enough to approve model training, redistribution, or deployment. The first question is whether the record can survive a rights, ingestion, and evaluation review.
In the current robotics crawl this record sits at rank #565 by downloads, which places it in the specialized-catalog band. Download rank is a discovery signal, not a quality score: popular records can be stale, under-documented, or unsuitable for a particular robot, while lower-download records can fit a narrow deployment gap. Use rank for prioritization, then inspect files, card text, and source project details on the live Hub record.
The crawl captured 638 downloads, Parquet as the format signal, and Tabular, Timeseries, Video as the modality signal. Those fields separate records that may be runnable in a training pipeline from records that only help as research context. They are incomplete by design. A Hub tag can say "parquet" or "video" without proving that action arrays, timestamps, camera calibration, robot state, success labels, rejection notes, or scene metadata exist in the shape a buyer needs.
The license tag looks permissive, but the dataset files, upstream sources, and contributor permissions still have to match the intended use. Read the license tag together with dataset-card language, repository files, linked papers, upstream datasets, and the collection consent workflow. Physical AI rights review goes beyond software licensing because data can include identifiable people, private sites, proprietary objects, robot logs, or human demonstrations. Treat the Hub metadata as a source citation and run sample-level review before use.
LeRobot-style records often help teams reason about modern robot-learning ingestion patterns, especially where video observations, Parquet-style metadata, or Hugging Face distribution are part of the workflow. This matters because buyers often search by famous dataset names but need operational fit: the target robot, camera layout, task boundary, environment, and acceptable failure modes must resemble the intended deployment. A record can be an excellent benchmark neighbor and still be a poor substitute for a production collection if its provenance, embodiment, or task distribution diverges from the buyer's model objective.
RECORD-SPECIFIC SIGNALS
The immediate source packet for unitreerobotics/Z1_Dual_Dex1_StackBox_Dataset_V2 is author unitreerobotics, 638 downloads, 2 likes, Apache 2.0 as the visible license tag, 100K<n<1M as the size category, and September 16, 2025 as the captured freshness note. Those fields do not prove quality, but they tell the reviewer where to start: source ownership, community attention, likely scale, and the places where metadata is missing.
The visible tag set is Robotics, Apache 2.0, Size Categories: 100K<n<1M, Parquet, Tabular, Timeseries, Video, Datasets, Pandas, mlcroissant. Tags are helpful because they expose the author's own distribution language, but they can also hide ambiguity. A format tag like Parquet, or a modality trio like Tabular, Timeseries, Video, needs a file-level check before it becomes a procurement fact. The review memo should quote the exact tag, then record whether a sample confirmed the same signal in the files.
The library signals are Datasets, mlcroissant, Pandas, and Polars; the region signal is US. Library tags point toward likely loading paths, while region tags can matter for privacy, contributor expectations, data residency review, and supplier sourcing assumptions. When those tags are absent, the buyer needs to treat the gap as a question rather than inventing a location, loader, or governance story.
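As an illustration of the loading path the Datasets tag implies, the sketch below uses the datasets library's auto-loader. The split name comes from the captured metadata, but LeRobot-style repos do not always resolve through the auto-loader, so treat this as a first attempt rather than a guaranteed path; the direct parquet route under INGESTION PLAN is the fallback.

```python
# Sketch of the loading path the "Datasets" library tag implies.
# If the auto-loader cannot resolve the repo layout, fall back to
# downloading the parquet files directly (see INGESTION PLAN).
from datasets import load_dataset  # pip install datasets

ds = load_dataset(
    "unitreerobotics/Z1_Dual_Dex1_StackBox_Dataset_V2",
    split="train",  # the captured meta/info.json shows a single train split, "0:254"
)
print(ds)            # column names and row count
print(ds[0].keys())  # fields present in one frame-level record
```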
The author neighborhood in this crawl includes unitreerobotics/G1_WBT_Brainco_Collect_Plates_Into_Dishwasher, unitreerobotics/G1_Dex3_ToastedBread_Dataset, unitreerobotics/G1_WBT_Inspire_Put_Clothes_into_Washing_Machine, unitreerobotics/G1_WBT_Brainco_Pickup_Pillow. Same-author records can reveal whether this is a maintained family, a one-off upload, a benchmark conversion, or a collection series with repeated schema choices. Reviewers should inspect nearby records before writing a bounty, because a related upload may contain cleaner files, newer terms, stronger metadata, or a more relevant task slice.
The research-reference signal is no arXiv tag captured. An arXiv tag is useful for understanding method context, benchmark claims, or collection intent, but it is not a rights grant and not an ingestion test. If the paper and Hub card disagree, the procurement memo needs both links and a written resolution before the record influences a commercial training plan.
USE CASE FIT
The safest initial use case for unitreerobotics/Z1_Dual_Dex1_StackBox_Dataset_V2 is discovery. The record can help a team learn what similar authors publish, what formats are common, which modalities appear in the open ecosystem, and whether public repositories contain a close enough example before the team funds new collection. Discovery value is real even when procurement value is uncertain, because it shortens the path from a vague "robotics dataset" query to a concrete sample, schema, and rights checklist.
The second use case is benchmark adjacency. If the target task resembles the tags, description, or naming pattern on this record, a reviewer can use it to frame evaluation questions: what does a successful episode look like, what failure modes are missing, and what metadata would be required to compare a new collection against a public baseline? This is especially useful when the buyer is deciding whether to adapt a model, run a small evaluation, or commission task-specific data rather than ingesting public data directly.
The third use case is ingestion planning. Format tags such as Parquet tell the engineering team what loader, conversion, checksum, and schema validation work may be needed. That planning belongs before any buyer assumes the data is cheap. A dataset that looks free can still be expensive if it needs manual cleanup, missing calibration reconstruction, frame extraction, action/state alignment, or metadata normalization before it becomes training-ready.
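Before committing engineering time, a cheap planning pass can tally what the repo actually contains. The sketch below assumes only the huggingface_hub client; the extension counts translate directly into pipeline stages (parquet parsing, mp4 frame extraction, json metadata normalization).

```python
from collections import Counter
from pathlib import PurePosixPath

from huggingface_hub import HfApi  # pip install huggingface_hub

files = HfApi().list_repo_files(
    "unitreerobotics/Z1_Dual_Dex1_StackBox_Dataset_V2", repo_type="dataset"
)

# Tally extensions to size the conversion work: each file type implies
# a different ingestion stage and cost.
by_ext = Counter(PurePosixPath(f).suffix or "<none>" for f in files)
for ext, count in by_ext.most_common():
    print(f"{ext:10s} {count}")
```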
The fourth use case is supplier brief writing. Even when this record is not the right source to train on, it can define vocabulary for a better bounty: requested modalities, target file layout, sample manifest, rights artifacts, acceptance rules, and proof of conversion. The internal links connect this source record to curated dataset profiles and dataset-fit tooling, so the review can move from discovery into a procurement decision.
RIGHTS AND CONSENT
Start commercial review with the visible license tag, then check the underlying terms. For unitreerobotics/Z1_Dual_Dex1_StackBox_Dataset_V2, the captured tag is Apache 2.0. Compare that signal against the dataset card, repository license files, linked project pages, linked papers, and upstream sources. If those sources disagree, use the strictest interpretation until counsel or the source owner confirms the intended permissions.
Consent review is separate from license review. Physical AI records can include human operators, bystanders, homes, labs, workplaces, or proprietary objects. Even when a license appears permissive, review how data was collected, whether people knew their demonstrations could train downstream models, whether private spaces appear, and whether access restrictions transfer to derived checkpoints. This matters most when the dataset trains a model that will be sold, embedded, or deployed outside research.
The record is not marked gated in the crawl, but an ungated Hub page is not the same as a cleared commercial data package. Preserve a snapshot of the record at the moment of decision: card text, file list, license tag, last modified date, and source URLs. The crawl captured freshness as September 16, 2025, but that date only helps when the buyer records what was approved. If the Hub page changes later, a procurement memo needs to prove which version informed the decision.
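That snapshot can be captured mechanically. The sketch below records the fields named above into a JSON file stored alongside the memo; it assumes only the huggingface_hub client, and the output filename is illustrative.

```python
import datetime
import json

from huggingface_hub import HfApi

info = HfApi().dataset_info("unitreerobotics/Z1_Dual_Dex1_StackBox_Dataset_V2")

# Freeze the repo state the decision relied on; the sha pins the exact
# revision even if the Hub page changes later.
snapshot = {
    "captured_at": datetime.datetime.now(datetime.timezone.utc).isoformat(),
    "repo_id": info.id,
    "revision_sha": info.sha,
    "last_modified": str(info.lastModified),
    "tags": info.tags,
    "files": [s.rfilename for s in (info.siblings or [])],
}

with open("z1_dual_dex1_stackbox_snapshot.json", "w") as f:
    json.dump(snapshot, f, indent=2)
```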
The rights question also includes redistribution and derivative-use rules. Some records are fine for local research but unclear for checkpoint release, model-as-a-service products, fine-tuned commercial models, or downstream sublicense. Ask about model-use terms beyond file download permission, and keep a documented rights memo before scale.
INGESTION PLAN
The minimum ingestion proof is a small sample that can be downloaded, hashed, parsed, and converted without manual fixes. For unitreerobotics/Z1_Dual_Dex1_StackBox_Dataset_V2, the first engineering pass should identify the file layout, row or episode boundaries, media paths, compression choices, and whether every observation has a corresponding timestamp. A screenshot of a Hub page is not proof of ingestion. Proof means a repeatable script can move from source files to the buyer's internal schema.
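A minimal version of that repeatable script, assuming the data_path pattern from the captured meta/info.json and a pandas/pyarrow stack:

```python
import hashlib

import pandas as pd  # parquet support requires pyarrow or fastparquet
from huggingface_hub import hf_hub_download

REPO = "unitreerobotics/Z1_Dual_Dex1_StackBox_Dataset_V2"
# First episode, per the data_path pattern in the captured meta/info.json.
FILE = "data/chunk-000/episode_000000.parquet"

path = hf_hub_download(repo_id=REPO, filename=FILE, repo_type="dataset")

# Hash the exact bytes the review ran against, so the memo can prove
# which file version was inspected.
with open(path, "rb") as f:
    print("sha256", hashlib.sha256(f.read()).hexdigest())

df = pd.read_parquet(path)
print(df.shape)
print(df.dtypes)  # confirm the schema before writing any converter
```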
For robot-learning data, ingestion proof should include action/state alignment. A record with video observations but no synchronized robot actions may still help perception pretraining, but it cannot serve the same role as teleoperation demonstrations. A record with action arrays but weak metadata can be hard to filter by task, object, success, or failure. The reviewer should inspect whether the public files include task names, split definitions, camera ids, calibration, robot state, gripper state, and episode-level outcomes.
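A sketch of that inspection follows. The column names are an assumption drawn from typical LeRobot v2.1 exports (action, observation.state, timestamp, frame_index); confirm the actual names against meta/info.json before trusting the checks.

```python
import pandas as pd


def check_alignment(df: pd.DataFrame, fps: float = 30.0) -> list[str]:
    """Flag alignment problems in one LeRobot-style episode frame table."""
    problems = []
    # Column names below are assumptions from typical LeRobot v2.1 exports.
    for col in ("action", "observation.state", "timestamp", "frame_index"):
        if col not in df.columns:
            problems.append(f"missing column: {col}")
    if "timestamp" in df.columns:
        ts = df["timestamp"].to_numpy()
        if not (ts[1:] >= ts[:-1]).all():
            problems.append("timestamps not monotonically increasing")
        expected_span = (len(df) - 1) / fps
        if abs((ts[-1] - ts[0]) - expected_span) > 2.0 / fps:
            problems.append("timestamp span disagrees with frame count and fps")
    return problems
```

An empty list is a pass for one episode; any entry becomes a question in the procurement memo.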
The captured format signal for this record is Parquet. That points the reviewer toward the likely loader, but format tags do not guarantee schema quality. Parquet, JSON, MP4, HDF5, ROS bag, RLDS, and custom archives can all be excellent or unusable depending on the fields present. Test syntax and semantics: files open, values align, labels are meaningful, and conversion preserves the fields that model training or evaluation depends on.
A strong sample gate includes rejected examples. Teams often test one clean episode and miss the real cost: corrupted clips, missing frames, mislabeled tasks, empty action arrays, duplicate rows, inconsistent units, or incompatible camera timing. Pull edge cases, confirm checksums, log parse failures, and make a go/no-go call before buying more data or investing engineering time into a full pipeline.
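A sketch of that gate, spreading the sample across the 254 captured episodes rather than testing only episode zero. The single-chunk path follows from total_chunks: 1 in the captured metadata.

```python
import pandas as pd
from huggingface_hub import hf_hub_download

REPO = "unitreerobotics/Z1_Dual_Dex1_StackBox_Dataset_V2"
SAMPLE = range(0, 254, 25)  # spread across episodes instead of testing one

failures = []
for ep in SAMPLE:
    # total_chunks is 1 in the captured metadata, so every episode
    # lives under chunk-000.
    fname = f"data/chunk-000/episode_{ep:06d}.parquet"
    try:
        df = pd.read_parquet(
            hf_hub_download(repo_id=REPO, filename=fname, repo_type="dataset")
        )
        if df.empty:
            failures.append((fname, "empty frame table"))
    except Exception as exc:  # log and continue: the failure list is the output
        failures.append((fname, repr(exc)))

print(f"parse-failure rate {len(failures) / len(SAMPLE):.1%}")
for fname, reason in failures:
    print("FAIL", fname, reason)
```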
EVALUATION DESIGN
The evaluation plan should start from the buyer's target behavior, not the dataset name. If unitreerobotics/Z1_Dual_Dex1_StackBox_Dataset_V2 is being considered for manipulation, navigation, humanoid control, or general embodied training, the reviewer should define the exact behavior change expected from using the record. That might be better object handling, better visual robustness, faster policy adaptation, stronger language grounding, or a more reliable offline benchmark.
Once the target behavior is explicit, the buyer should list the distribution gaps: robot embodiment, camera placement, object taxonomy, geography, site layout, lighting, operator skill, safety constraints, and task boundary definitions. A public record earns its place when it closes one or more of those gaps. It becomes risky when it only matches the keyword. The review is intentionally structured around modalities, formats, rights, and fit because those are the dimensions that decide transfer, not the popularity of a dataset name.
The holdout set should be closer to deployment than the public source. If this record is used for pretraining, the acceptance metric should be measured on target-environment examples that the model did not see. If the record is used for benchmarking, the benchmark should expose failure modes that matter to the product. If the record is used only for schema planning, the test should measure whether a custom supplier can match or exceed the public schema with cleaner rights and metadata.
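One concrete piece of that hygiene is splitting by episode rather than by frame, so no holdout frame shares an episode with training data. A minimal sketch, with the caveat that a deployment-grade holdout should come from target-environment recordings, not from this record:

```python
import numpy as np


def split_by_episode(num_episodes: int, holdout_frac: float = 0.1, seed: int = 0):
    """Partition episode indices so holdout episodes are fully unseen."""
    rng = np.random.default_rng(seed)
    order = rng.permutation(num_episodes)
    cut = int(num_episodes * (1.0 - holdout_frac))
    return sorted(order[:cut].tolist()), sorted(order[cut:].tolist())


# 254 episodes per the captured metadata; roughly 25 held out here.
train_eps, holdout_eps = split_by_episode(254)
```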
The final evaluation artifact is a decision memo. It states whether unitreerobotics/Z1_Dual_Dex1_StackBox_Dataset_V2 is approved for research only, approved for limited ingestion, rejected for rights reasons, rejected for poor fit, or promoted into a custom bounty brief. That memo links the Hub record, cites the dataset card documentation, includes sample QA results, and records any unresolved questions before the team spends money on scale.
PROCUREMENT PATH
A procurement workflow should treat this record as the intake layer. The first pass answers whether the record deserves attention. The second pass pulls samples and source documents. The third pass compares the record against curated alternatives and the buyer's target distribution. The fourth pass either approves limited use, rejects the record, or converts the findings into a custom data request.
The buyer should assign owners for three reviews: legal and rights, data engineering, and model evaluation. Legal verifies license, consent, redistribution, and derivative model use. Data engineering proves parsing, conversion, schema completeness, and quality checks. Model evaluation proves that the data improves or measures the target behavior. A record should not be called procurement-ready unless all three owners have signed off with evidence.
The buying brief needs concrete acceptance criteria. Name the number of accepted samples, required modalities, accepted formats, mandatory metadata fields, consent documentation, failure labels, and tolerable rejection rate. That turns a public source record into a request suppliers can satisfy.
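A brief like that can be expressed as a machine-checkable config. Every value below is a placeholder the buyer must set; none of the thresholds come from the source record.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class AcceptanceCriteria:
    """Illustrative procurement gate; all values are buyer-set placeholders."""
    min_accepted_episodes: int = 200
    required_modalities: tuple = ("video", "robot_state", "action")
    accepted_formats: tuple = ("parquet", "mp4")
    mandatory_metadata: tuple = ("task_name", "camera_id", "fps", "success_label")
    consent_docs_required: bool = True
    max_rejection_rate: float = 0.05  # share of delivered episodes allowed to fail QA


brief = AcceptanceCriteria()
```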
Stage the scale decision: source review, sample gate, small paid pilot, then full collection or ingestion. That sequence keeps the buyer from paying for a large dataset that later fails rights, conversion, or evaluation. Internal links support that staged path: curated dataset comparisons, alternatives, license review, fit checking, and bounty-spec generation.
SOURCE LIMITS
This review does not claim that unitreerobotics/Z1_Dual_Dex1_StackBox_Dataset_V2 is safe for commercial training. It reports the captured license tag and points the reviewer back to the source record. It does not claim that every file is valid, because the only valid proof is a sample-level ingestion test. It does not claim that download count equals quality, because downloads are a prioritization signal, not a substitute for review.
The review also avoids private claims about the collection process. If the dataset card does not explain contributor consent, site permission, camera setup, hardware, or task labeling, mark those items as questions. That beats filling gaps with confident language.
The strongest next step for this record is structured source review: open the source record, inspect the card and files, run a sample, map it against target behavior, and document the rights posture. If the record survives that process, it can become an input to training or evaluation. If it fails, it can still teach the buyer what to ask for in custom data.
A broad catalog needs this conservative stance. Each record must separate source evidence, buyer inference, and open questions. That distinction turns generated research into a tool physical AI teams can use without inheriting hidden assumptions.
Refresh the crawl on a schedule. Hub records, tags, licenses, file lists, and dataset cards can change after a record is generated. Reopen the source record and compare it against the generated crawl details before acting. That final refresh step keeps the record useful as a research index without pretending it is a permanent legal or engineering audit.
BUYER QA
| Area | Current signal | Verification step |
|---|---|---|
| Rights | Apache 2.0 | Review license text, dataset card, source project, and downstream model training permissions. |
| Ingestion | Parquet | Download a small sample and prove conversion into the buyer's schema before any scale decision. |
| Provenance | unitreerobotics | Confirm who collected the data, what hardware was used, and whether contributor or site consent applies. |
| Freshness | Sep 16, 2025 | Check whether the record, files, and dataset-card terms changed after any model or procurement decision. |
| Fit | Tabular, Timeseries, Video | Compare the modality, task, robot, environment, and failure modes against the target deployment distribution. |
SOURCE REVIEW
| Review area | Current signal | Action before use |
|---|---|---|
| Source snapshot | Rank #565, 638 downloads, last modified Sep 16, 2025 | Save the dataset card, file list, license tag, and crawl date into the review memo before making a training or procurement decision. |
| Rights posture | Apache 2.0 | Compare the license tag with card text, repository license files, upstream sources, and downstream model-use permissions. |
| Format proof | Parquet | Parse a representative sample, validate checksums, and prove conversion into the buyer's target schema with no manual repair. |
| Modality proof | Tabular, Timeseries, Video | Confirm whether observations, actions, timestamps, labels, calibration, robot state, and success/failure markers exist together. |
| Access and freshness | Not marked gated in crawl | Record access approvals, available files, version date, and any terms that changed after review. |
| Author neighborhood | unitreerobotics/G1_WBT_Brainco_Collect_Plates_Into_Dishwasher, unitreerobotics/G1_Dex3_ToastedBread_Dataset, unitreerobotics/G1_WBT_Inspire_Put_Clothes_into_Washing_Machine | Review adjacent author records to see whether this record is a one-off upload, a dataset family, or part of a maintained release series. |
EVALUATION ROUTES
| Use route | Current signal | Pass/fail decision |
|---|---|---|
| Pretraining | Tabular, Timeseries, Video | Use only if the observations improve target-domain robustness on a held-out deployment-like validation set. |
| Imitation learning | Parquet; action/state fields still need sample proof | Require synchronized observations, actions, timestamps, robot state, task boundaries, and success labels before treating the record as demonstrations. |
| Benchmarking | Hub rank #565 with 638 downloads | Use rank as a community-attention signal, then design an evaluation that measures the buyer's target behavior rather than public popularity. |
| Custom collection brief | unitreerobotics/Z1_Dual_Dex1_StackBox_Dataset_V2 plus curated truelabel alternatives | Translate the useful parts of the source record into a bounty with explicit modalities, rights artifacts, QA gates, and target-environment coverage. |
INTERNAL LINKS
Compare this Hub record against curated physical AI dataset profiles with buyer-readiness notes.
Use this when Parquet, MP4 video observations, and Hugging Face distribution are part of the ingestion plan.
Convert the license and usage terms into an explicit commercial-use risk review.
Check whether the record matches the buyer's modality, task, environment, and acceptance criteria.
RELATED HUB RECORDS
Shared signals across the related records in this crawl: Robotics, Apache 2.0, Size Categories: 100K<n<1M, Parquet.
TAGS
Robotics, Apache 2.0, Size Categories: 100K<n<1M, Parquet, Tabular, Timeseries, Video, Datasets, Pandas, mlcroissant
TRUELABEL ROUTING
Use this source record as a starting point, then request a validated sample package with rights notes, consent artifacts, conversion proof, and deployment-fit checks.