
DATASET PROFILE

UMI

Universal Manipulation Interface is a framework for collecting in-the-wild human demonstrations with a portable handheld gripper and transferring that data to robot policies.

DIRECT ANSWER

UMI is useful when teams want portable, task-centric human demonstrations that can transfer to robot policies. It is a collection method and dataset reference, not a generic rights-cleared corpus for any commercial model.

Commercial use: unclear

Public dataset rights are treated conservatively until source terms, contributor consent, and downstream model use rights are reviewed.

Consent risk: medium

Risk reflects identifiable people, private spaces, and whether consent artifacts are obvious from public documentation.

Scale: source-backed note

Project materials emphasize portable in-the-wild data collection and fast demonstrations for tasks such as cup manipulation, dish washing, cloth folding, and dynamic tossing.

Last checked: May 1, 2026 (truelabel review)

MODALITIES

  • Egocentric video
  • Teleoperation
  • Proprioception

TASKS

  • Bimanual Manipulation
  • Deformable Manipulation
  • Long Horizon Manipulation
  • Household Manipulation

FORMATS

  • MP4
  • JSON
  • HDF5

ENVIRONMENTS

  • In the wild
  • Home
  • Restaurant
  • Tabletop

BUYER READINESS

Best fit, limitations, and custom data gaps

Best for

  • portable in-the-wild demonstrations
  • bimanual and dynamic manipulation
  • human-to-robot transfer research

Limitations

  • collection method may require custom hardware
  • rights and consent depend on capture setup
  • policy transfer still needs target robot validation

Gap recommendations

  • define contributor consent before capture
  • collect target robot calibration runs
  • separate human demonstration rights from robot rollout data

DEEP PROFILE

What UMI is actually useful for

UMI belongs in a physical AI source review, not a generic dataset listing. The useful question is whether its evidence, modalities, rights posture, and task distribution can improve a specific model workflow. The source material below forms the citation trail for procurement decisions. The profile summarizes what the dataset appears to cover, then separates that evidence from the open questions a buyer still has to resolve.

The strongest starting point is the stated use case: portable in-the-wild demonstrations, bimanual and dynamic manipulation, human-to-robot transfer research. That tells a reviewer what the dataset can plausibly support before a custom collection is necessary. A buyer should map those use cases to a concrete model objective such as pretraining, imitation learning, simulation-to-real evaluation, perception robustness, benchmark comparison, or supplier brief design. If the model objective is not connected to the dataset's strongest coverage, the dataset should stay in research context rather than procurement planning.

The current summary says: Universal Manipulation Interface is a framework for collecting in-the-wild human demonstrations with a portable handheld gripper and transferring that data to robot policies. That summary is intentionally conservative because dataset names often travel farther than the documentation behind them. The profile does not assume that public availability means commercial training approval, target-domain fit, or metadata completeness. Instead, it gives the buyer a repeatable review path: inspect the source, pull a sample, validate the schema, trace rights, and compare the record against alternatives before approving scale.

UMI usually comes up after the buyer has moved past a broad "robotics dataset" search and needs to know whether this exact source can help with a real model decision. The review therefore focuses on details that change procurement outcomes: Egocentric Video, Teleoperation, Proprioception, Bimanual Manipulation, Deformable Manipulation, Long Horizon Manipulation, Household Manipulation, MP4, JSON, HDF5, commercial-use clarity, consent exposure, and the collection gaps that would need a truelabel bounty or supplier request.

SOURCE INTERPRETATION

How to read the public evidence for UMI

The source trail begins with the project site. A reviewer should open the source, record the date, save the relevant license or access language, and note which claims come from the source versus which are buyer-side interpretation. That distinction matters because public dataset pages can change, papers may describe a research release rather than procurement terms, and mirrored downloads may not preserve the same rights or metadata details.

The scale note for this profile is: "Project materials emphasize portable in-the-wild data collection and fast demonstrations for tasks such as cup manipulation, dish washing, cloth folding, and dynamic tossing." Scale is useful, but it is not the same as usable volume. Physical AI teams need accepted episodes, usable frames, aligned state/action records, and metadata that survives conversion. A large corpus can yield little production value if most examples do not match the target robot, task, camera geometry, scene distribution, or consent requirements. A smaller corpus can win when the sample quality and review trail are stronger.
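
To make that distinction concrete, here is a minimal sketch of a usable-volume count, assuming a hypothetical line-delimited JSON manifest with status and num_frames fields; UMI's public materials do not specify this layout.

    import json

    def usable_volume(manifest_path: str) -> dict:
        """Count accepted episodes and usable frames from a manifest.

        Assumes one JSON object per line with hypothetical 'status' and
        'num_frames' fields; a real release may use a different layout.
        """
        totals = {"episodes": 0, "accepted_episodes": 0, "usable_frames": 0}
        with open(manifest_path) as f:
            for line in f:
                episode = json.loads(line)
                totals["episodes"] += 1
                if episode.get("status") == "accepted":
                    totals["accepted_episodes"] += 1
                    totals["usable_frames"] += episode.get("num_frames", 0)
        return totals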

The public evidence should be reviewed at three levels. First, the dataset-level claim: what the source says the release is. Second, the file-level claim: what is actually present in the downloadable or requestable files. Third, the sample-level claim: whether a representative sample can be parsed, validated, and mapped into the buyer's training or evaluation schema. A strong review keeps those levels separate so the buyer does not mistake a paper abstract for ingestion proof.

This is also where citations matter. The profile's source list is not decoration; it is the beginning of a review memo. If the source says one thing about usage terms and a repository says another, the buyer should resolve that conflict before model use. If the source does not describe consent, site permission, or downstream derivative-use rights, the page should preserve that uncertainty rather than replacing it with optimistic language.

RIGHTS REVIEW

Commercial use, consent, and derivative model risk

The commercial-use signal for UMI is Commercial use unclear. That signal is a starting point, not a final approval. A buyer should confirm whether source terms allow model training, commercial deployment, internal evaluation, redistribution, checkpoint release, and derivative model use. Those are different questions. A release can be acceptable for academic research while still being unclear or unsuitable for a commercial product.

The consent signal is Medium consent risk. Consent risk is especially important for embodied data because the recording context may include human demonstrators, operators, homes, workplaces, private facilities, bystanders, faces, voices, object labels, or site layouts. Even if the license appears permissive, a procurement review should ask whether people and sites understood the downstream use and whether those permissions transfer to model training or evaluation.

Derivative-use risk deserves its own line item. Many teams ask only whether they can download data, but the more important business question is whether they can use the data to train a model that will later be sold, deployed, or provided as a service. The review should document whether the source terms mention model weights, outputs, sublicensing, attribution, non-commercial clauses, privacy limitations, or restrictions tied to source contributors.

If any of those answers are unclear, UMI can still work as research context, benchmark context, or a source of data-shape requirements. It just should not be treated as a cleared commercial training asset. The right procurement move is to turn the useful parts into a custom collection spec with explicit rights artifacts, contributor consent, and sample acceptance rules rather than hoping a public source will cover the deployment use case.

DATA SHAPE

Modalities, formats, and ingestion risk

UMI is indexed with these modalities: Egocentric Video, Teleoperation, Proprioception. The modality list tells the buyer what kind of signal is present, but it does not prove the signal is aligned or complete. RGB-D, video, proprioception, point clouds, motion capture, action logs, and task labels each become useful only when timestamps, calibration, coordinate frames, episode boundaries, and metadata are consistent enough for the training pipeline.

The format signal is MP4, JSON, HDF5. Formats are procurement-relevant because they decide loader work, conversion risk, and sample QA cost. A familiar extension does not guarantee a good schema. HDF5, RLDS, Parquet, MCAP, ROS bag, MP4, JSON, and custom archives can all hide missing fields or unit mismatches. The buyer should verify syntax first, then verify semantics: the files open, the rows or episodes mean what the source says, and the conversion preserves the fields the model will use.
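
A minimal sketch of that two-level check, assuming HDF5 episode files and hypothetical dataset paths (camera/rgb, robot/eef_pose, timestamps); none of these names come from UMI documentation.

    import h5py
    import numpy as np

    # Hypothetical dataset paths; a real UMI file may use a different schema.
    REQUIRED_DATASETS = ["camera/rgb", "robot/eef_pose", "timestamps"]

    def check_episode_file(path: str) -> list[str]:
        """Syntax first: the file opens. Semantics second: required
        fields exist and timestamps are strictly increasing."""
        problems = []
        try:
            with h5py.File(path, "r") as f:
                for name in REQUIRED_DATASETS:
                    if name not in f:
                        problems.append(f"missing dataset: {name}")
                if "timestamps" in f:
                    ts = f["timestamps"][...]
                    if not np.all(np.diff(ts) > 0):
                        problems.append("timestamps not strictly increasing")
        except OSError as exc:
            problems.append(f"file failed to open: {exc}")
        return problems

Run over a handful of episodes, the per-file problem lists become the kind of evidence the sample gate below asks for.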

The task signal is Bimanual Manipulation, Deformable Manipulation, Long Horizon Manipulation, Household Manipulation. Tasks should be mapped to a buyer's target behavior at the level of objects, environment, action sequence, and success criteria. A dataset that says it covers manipulation may still miss the exact grasp type, tool use, lighting, object category, safety constraint, or recovery behavior that a deployment model needs. The sample review should therefore include positive examples, edge cases, and failures rather than only the cleanest demonstration.

The embodiment signal is Franka. If the target system uses a different robot, the dataset may still help perception, representation learning, language grounding, or benchmark comparison, but the buyer should be careful about direct policy transfer. Differences in kinematics, gripper geometry, action space, camera placement, control frequency, and operator workflow can turn a seemingly close dataset into an expensive adaptation project.

DEPLOYMENT FIT

Where public coverage usually breaks

The environment signal for UMI is In the wild, home, restaurant, tabletop. Environment fit is one of the main reasons public datasets underperform in deployment. Lighting, object mix, geography, room layout, weather, floor surfaces, clutter, camera mounting, safety constraints, and operator behavior can all shift the model distribution. A buyer should write those deployment assumptions down before comparing datasets.

The main limitation to test is: collection method may require custom hardware. Limitations are not defects when they are visible; they are the boundary conditions that keep a model team from overusing the data. The review should decide whether the limitation blocks the target use case, can be handled through filtering, can be solved through a small custom supplement, or makes the dataset useful only as background research.

The gap recommendations for this dataset are: define contributor consent before capture; collect target robot calibration runs; separate human demonstration rights from robot rollout data. Those recommendations should become collection requirements when the buyer needs production confidence. Instead of asking for more generic robot data, the buyer can ask for missing object classes, target geography, harder lighting, failure examples, consent artifacts, or a specific file schema. That is how a public dataset profile becomes a practical supplier brief.

Deployment fit also affects evaluation. The holdout set should look more like the buyer's deployment than the public dataset does. If UMI is used for pretraining, the model should be judged on target examples outside the public distribution. If it is used for benchmark context, the benchmark should measure failure modes that matter to the product. If it is used for schema planning, the custom supplier should prove parity or improvement on the fields that matter.

SAMPLE GATE

Minimum proof before scaling beyond research

A practical review starts with 10 to 25 representative samples, not the full dataset. The buyer should include easy cases, hard cases, malformed cases, and rejected cases if the source makes them available. Each sample should be checked for file integrity, metadata completeness, timestamp alignment, label meaning, consent artifacts where relevant, and conversion into the target schema. Without that gate, a team can spend engineering time cleaning data it should never have approved.

The acceptance rules should be concrete. An episode is accepted only if required observations exist, action/state fields align, timestamps are monotonic, task labels are machine-readable, source provenance is recorded, and the example matches the target distribution. If the dataset is visual-only, the rules should say that explicitly. If the dataset lacks actions or calibration, the model team should decide whether perception pretraining is enough to justify ingestion.
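
As an illustration, those acceptance rules can be written as a single pass/fail gate. Every field name below is hypothetical; a real gate should mirror whatever schema the source verification step actually documents.

    from dataclasses import dataclass

    @dataclass
    class GateResult:
        accepted: bool
        reason: str  # "ok" or the first rule that failed

    def gate_episode(episode: dict) -> GateResult:
        """Apply the acceptance rules above to one episode record.
        Field names are illustrative, not UMI's documented schema."""
        ts = episode.get("timestamps", [])
        rules = [
            ("missing required observations", bool(episode.get("observations"))),
            ("action/state fields misaligned",
             len(episode.get("actions", [])) == len(episode.get("states", []))),
            ("timestamps missing", bool(ts)),
            ("timestamps not monotonic", all(a < b for a, b in zip(ts, ts[1:]))),
            ("task label not machine-readable",
             isinstance(episode.get("task_label"), str)),
            ("source provenance missing", "provenance" in episode),
        ]
        for reason, passed in rules:
            if not passed:
                return GateResult(False, reason)
        return GateResult(True, "ok")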

A rejection log is as important as an acceptance log. It tells the buyer whether problems are isolated or systemic: missing metadata, inconsistent units, broken paths, corrupted media, unclear rights, non-target environments, low-quality demonstrations, or unrepresentative objects. That log becomes evidence for either rejecting the public source or commissioning a cleaner custom collection.
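
A short follow-on sketch, consuming the GateResult records from the hypothetical gate above, shows how a rejection log separates isolated failures from systemic ones.

    from collections import Counter

    def rejection_summary(results: list) -> Counter:
        """Tally rejection reasons from GateResult records.
        One dominant reason suggests a systemic dataset defect;
        a flat spread suggests isolated bad episodes."""
        return Counter(r.reason for r in results if not r.accepted)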

The sample gate should end in a written decision: approve for research only, approve for limited ingestion, reject for rights, reject for data shape, or convert into a custom bounty. That decision should cite the source list below, include sample QA outputs, and state which unresolved questions remain. Without a written decision, a dataset profile can create attention without actually improving procurement quality.

PROCUREMENT MEMO

Recommended buyer path for UMI

The recommended first step is source verification. Open the cited sources, capture the current terms, and confirm whether the page, paper, repository, and downloadable files agree. The second step is schema verification: pull a sample, parse it, and prove the data can enter the buyer's pipeline. The third step is rights verification: confirm commercial use, consent, redistribution, and model-use rules. The fourth step is evaluation verification against target-domain data.

If UMI passes those checks, it can be used as part of a staged model workflow. That might mean pretraining, benchmark comparison, evaluation design, or limited fine-tuning. If it fails one check but remains conceptually useful, the buyer should preserve the useful specification details and fund a custom supplement. If it fails rights or consent checks, it should stay out of commercial training even if the model team likes the content.

The related research paths keep that decision tree moving. Alternatives help when UMI is close but blocked by task, modality, environment, or rights. Comparisons help when a buyer is choosing between two public options. Tools help convert a vague data need into a bounty spec with sample QA and procurement artifacts. That path matters because buyers usually arrive with a specific question and need a next action, not a dead-end catalog entry.

The final recommendation is to use UMI as a cited reference and a review input, not a default answer. Public datasets help when they reduce uncertainty. They create risk when they create false certainty. A high-quality buyer page should make the uncertainty visible, document what is known from sources, and show exactly how to close the remaining gaps before model use.

OPERATING SCORECARD

How a team should score this dataset internally

A useful internal scorecard should give UMI separate grades for source confidence, rights clarity, consent confidence, schema completeness, model-task fit, environment fit, and sample quality. Those grades should not be averaged into one vague score. A dataset can be excellent for research and still fail commercial use. It can be easy to parse and still fail deployment transfer. Keeping the dimensions separate protects the buyer from approving a dataset because one impressive signal hides a blocker elsewhere.

The scorecard should assign evidence owners. Legal or operations should own rights, consent, contributor terms, and redistribution risk. Data engineering should own loader proof, schema conversion, checksums, timestamp alignment, metadata completeness, and parse failures. The model team should own behavior fit, holdout evaluation, target-domain transfer, and whether the dataset's limitations affect the intended product. If nobody owns a dimension, that dimension should stay unresolved rather than being quietly assumed.

The buyer should also record a confidence reason for each score. For example, "source terms explicitly allow commercial model training" is different from "license tag appears permissive." "Twenty accepted samples converted with no manual edits" is different from "the source says HDF5." "Held-out warehouse examples improved" is different from "dataset contains manipulation." This language forces the memo to distinguish verified evidence from plausible inference.
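
One way to enforce that discipline is a scorecard record that keeps each dimension's grade, owner, and evidence reason separate. The structure and example values below are illustrative, not a truelabel artifact.

    from dataclasses import dataclass, field

    @dataclass
    class Dimension:
        grade: str     # "pass", "unclear", or "fail"; never averaged
        owner: str     # who must produce the evidence
        evidence: str  # a verified fact, not a plausible inference

    @dataclass
    class DatasetScorecard:
        dataset: str
        dimensions: dict = field(default_factory=dict)
        action: str = "unresolved"  # the required last line

    card = DatasetScorecard("UMI")
    card.dimensions["rights clarity"] = Dimension(
        grade="unclear",
        owner="legal",
        evidence="public terms do not explicitly grant commercial model training",
    )
    card.action = "approve for research only"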

The last line of the scorecard should be an action, not a score. Approve UMI for research only, approve it for a narrow ingestion experiment, request counsel review, commission a supplement, reject for rights, reject for schema, or use it only as a specification reference. That action orientation is what turns a long dataset page into a practical tool for the people who actually have to buy, clean, or train on data.

CITED REVIEW

Source review markers for UMI

UMI starts with its canonical public source [1], then cross-checks buyer discovery paths before any data is treated as procurement-ready [2].


Procurement fit markers for UMI

A buyer should compare the source trail with the stated modalities (Egocentric Video, Teleoperation, Proprioception), formats (MP4, JSON, HDF5), and known limitations before accepting UMI as training or evaluation evidence [3].

PROCUREMENT REVIEW

What a buyer should verify before using UMI

UMI procurement review checklist
Review area | Current signal | Buyer question
Commercial training use | Commercial use unclear | Do the public terms explicitly allow downstream model training, commercial deployment, sublicensing, and derivative checkpoints?
Contributor and site consent | Medium consent risk | Can the source prove consent for identifiable people, private spaces, operators, and any site-specific footage that appears in the samples?
Observation and action stack | Egocentric Video, Teleoperation, Proprioception | Are observations, actions, timestamps, camera calibration, robot state, gripper state, and task labels present at the cadence your training pipeline expects?
Embodiment fit | Franka | Does the embodiment match the target robot closely enough, or is this only useful for perception pretraining or benchmark context?
Environment fit | In the wild, home, restaurant, tabletop | Do scenes, lighting, geography, object mix, failure modes, and operator workflow match the buyer's deployment distribution?
Ingestion proof | MP4, JSON, HDF5 | Can a small sample be parsed, validated, and converted into the buyer's training format before any larger transfer or capture is funded?

SAMPLE QA CHECKLIST

How to test whether the data is actually usable

No truelabel sample parse has been performed for this public-source profile. Treat the checklist as the acceptance plan a buyer would run before model use.

UMI sample acceptance checks
Check | Why it matters here | Acceptance question
Sample acceptance | Portable in-the-wild demonstrations | What concrete pass/fail rules decide whether one episode, clip, scan, or trajectory counts as accepted?
Negative examples | Collection method may require custom hardware | Does the dataset include failed attempts, recoveries, occlusions, dropped objects, out-of-distribution scenes, and rejected samples?
Metadata completeness | MP4, JSON, HDF5 | Are task IDs, timestamps, camera IDs, calibration files, split definitions, and provenance fields present and machine-readable?
Evaluation transfer | Define contributor consent before capture | Which held-out environments, objects, or tasks should be collected before trusting model performance outside the public benchmark?

CUSTOM COLLECTION STARTER

Turn the public reference into a buyer-ready data spec

Starter collection brief: UMI-style data for portable in-the-wild demonstrations, bimanual and dynamic manipulation, and human-to-robot transfer research

Ask suppliers for 10-25 accepted samples before funding scale, with source files, metadata, rights notes, and a rejection log.

Minimum delivery package: MP4, JSON, HDF5 plus manifest

Require raw files, normalized metadata, checksums, collection notes, consent artifacts where relevant, and conversion instructions.

Primary enrichment layer: Egocentric Video, Teleoperation, Proprioception

Add missing camera calibration, action/state streams, object IDs, task boundaries, and evaluator notes before model ingestion.

RELATED RESEARCH PATHS

Where to go next after reviewing UMI


A dataset record is only useful when it connects into the rest of the buyer workflow. The next review step is usually not another summary; it is a fit check, rights triage, source comparison, or custom bounty spec that names the missing proof.

For physical AI teams, the hard question is whether the public source can support a specific model objective under real deployment constraints. That requires adjacent dataset records, tools, comparisons, and sourcing paths, plus external references that a reviewer can open and challenge.

Use the adjacent dataset records and the cited sources to keep the review grounded. Start broad when discovery is incomplete, move into profile and comparison pages when the candidate source is known, and switch to custom collection when the blocker is rights, consent, geography, robot embodiment, or target environment coverage.


TRUELABEL ROUTING

Need a cleaner version of this data?

Turn the missing modality, geography, task, or rights profile into a truelabel bounty with sample QA before full delivery.
