Briefing topic

Bimanual manipulation briefings

Bimanual-manipulation briefings track two-arm teleop and demonstration data for humanoid and dual-arm policies. Each briefing names the rig (ALOHA-class, humanoid teleop chair, custom dual-arm), the operator tier, the action-space schema, and the buyer-readiness gap.

Updated 2026-05-21

By Truelabel Team

Reviewed by Truelabel Team · May 21, 2026

Quick facts

Topic: Bimanual manipulation
Reference rigs: ALOHA, ALOHA-Pro, Mobile ALOHA, humanoid teleop chair
Operator tiers: Tier-1 pick-place, Tier-2 folding/assembly, Tier-3 dexterous
Action-space schemas: RLDS, LeRobot — joint pos or EE delta
Adjacent topics: Teleoperation, consent, provenance

Why is bimanual manipulation data its own topic?

Bimanual manipulation data captures coordinated two-arm tasks: folding, packing, kitchen prep, assembly, and the long tail of household and warehouse work that single-arm policies cannot reach. Briefings under this topic focus on the corpora that anchor the category — ALOHA (2x 6-DOF + grippers at 50 Hz), ALOHA-Pro, Mobile ALOHA, humanoid teleop chairs — and the buyer-readiness gaps that determine whether each is procurement-grade. The unit of comparison is the same as in teleoperation: trajectories, not images.

The category is moving fast in 2026 because frontier deployments (humanoid manipulation, dual-arm warehouse work, surgical assist) all require bimanual policies ^[1]. Briefings here describe how operator skill, embodiment match, and sync tolerance change a corpus's commercial value. The recurring lesson: a bimanual dataset that does not match the buyer's embodiment is closer to a benchmark than a training set. The same lesson holds across Open X-Embodiment cross-embodiment training runs: even after pretraining on 1.4M trajectories, deployment success on a custom dual-arm rig still requires 1,000-5,000 embodiment-matched fine-tuning episodes.

Operator economics are the under-discussed half of bimanual procurement. Two-handed teleop demands more operator training and coordination skill than single-arm capture. Yield per operator-hour is roughly 40-60% of single-arm yield, and the operator pool is 5-10x smaller, especially for Tier-3 tasks such as assembly or assisted-surgical procedures.
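As a rough planning sketch of the yield arithmetic above, the snippet below estimates the operator-hours needed for a bimanual capture target. The single-arm baseline of 30 accepted trajectories per operator-hour and the 5,000-trajectory target are hypothetical placeholders, not supplier figures; only the 40-60% yield ratio comes from the text.

```python
def operator_hours(target_trajectories, single_arm_yield_per_hour, bimanual_ratio):
    """Operator-hours needed when bimanual yield is a fraction of single-arm yield."""
    if not 0 < bimanual_ratio <= 1:
        raise ValueError("bimanual_ratio must be in (0, 1]")
    bimanual_yield = single_arm_yield_per_hour * bimanual_ratio
    return target_trajectories / bimanual_yield


# Hypothetical inputs: 5,000 trajectories, 30 accepted single-arm trajectories per hour.
TARGET = 5000
BASELINE = 30.0

low = operator_hours(TARGET, BASELINE, 0.60)   # optimistic end of the 40-60% range
high = operator_hours(TARGET, BASELINE, 0.40)  # pessimistic end of the range
print(f"{low:.0f} to {high:.0f} operator-hours")
```

The spread between the two ends of the range is the number to carry into a supplier conversation, because it shows how much of the budget depends on operator yield alone.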

What should procurement ask about bimanual data?

A bimanual procurement conversation should resolve embodiment, operator tier, action-space schema, sync tolerance, and accept rule before signing. Embodiment is the first cut: ALOHA, ALOHA-Pro, Mobile ALOHA, humanoid teleop chair, custom dual-arm ^[2]. Each rig produces a different kinematic distribution, and a buyer training on one and deploying on another carries the embodiment gap as silent training error.

Operator tier is the second cut. Bimanual tasks split into tiers — Tier-1 (basic pick-and-place coordinated across two hands), Tier-2 (folding, packing, simple assembly), Tier-3 (complex assembly, kitchen prep, assistive procedures). Operator skill at the tier determines accept rate; operator skill below the tier produces sessions that look complete and fail the accept rule ^[3]. Ask the supplier for tier mix in the proposed corpus and tier-specific accept rates from prior deliveries.

Action-space schema is the third cut. RLDS and LeRobot both serialise bimanual action vectors, but the field convention varies: some pipelines log left-arm and right-arm vectors separately, others log a concatenated action vector, others log joint targets versus end-effector deltas. A mismatch between the supplier's schema and the buyer's training loop is a conversion problem that loses precision; the procurement-grade pattern is to standardise the schema at spec time.
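To make the field-convention question concrete, here is a minimal sketch of converting between the "separate left and right vectors" layout and the "concatenated vector" layout. It assumes an ALOHA-style layout of 7 values per arm (6 joints plus one gripper value). The class, field, and label names are hypothetical and are not part of the RLDS or LeRobot APIs.

```python
from dataclasses import dataclass
from typing import List

ARM_DIM = 7  # assumption: 6 joint values + 1 gripper value per arm
KINDS = ("joint_pos", "ee_delta")  # illustrative labels for the two conventions


@dataclass(frozen=True)
class BimanualAction:
    left: List[float]
    right: List[float]
    kind: str

    def __post_init__(self):
        if len(self.left) != ARM_DIM or len(self.right) != ARM_DIM:
            raise ValueError(f"each arm vector must have {ARM_DIM} values")
        if self.kind not in KINDS:
            raise ValueError(f"kind must be one of {KINDS}")


def concatenate(action):
    """Separate layout -> single vector ordered left arm first, then right arm."""
    return list(action.left) + list(action.right)


def split(vector, kind):
    """Concatenated vector -> separate layout, assuming left-then-right ordering."""
    if len(vector) != 2 * ARM_DIM:
        raise ValueError(f"expected {2 * ARM_DIM} values, got {len(vector)}")
    return BimanualAction(left=list(vector[:ARM_DIM]),
                          right=list(vector[ARM_DIM:]),
                          kind=kind)


action = BimanualAction(left=[0.0] * ARM_DIM, right=[1.0] * ARM_DIM, kind="joint_pos")
round_trip = split(concatenate(action), "joint_pos")
```

The layout conversion is lossless. What a converter like this cannot recover is the arm ordering or the joint-versus-delta convention if neither was recorded, which is why both belong in the spec.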

Sync tolerance and accept rule close the conversation. Sub-frame drift across left arm, right arm, and two cameras is the procurement bar ^[4]; an accept rule that names contact-event completion, grasp continuity, and operator-side procedural correctness is the procurement-grade specification.
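A sub-frame drift check can be sketched in a few lines. The example assumes 50 Hz capture (the ALOHA figure quoted earlier), so one frame is 20 ms, and it assumes timestamps have already been matched per frame index. The stream names are hypothetical.

```python
FRAME_PERIOD_S = 1.0 / 50.0  # assumption: 50 Hz capture, so one frame lasts 20 ms


def drift_per_frame(streams):
    """For each frame index, the gap between the earliest and latest stream timestamp."""
    lengths = {len(stamps) for stamps in streams.values()}
    if len(lengths) != 1:
        raise ValueError("all streams must have the same number of frames")
    return [max(stamps) - min(stamps) for stamps in zip(*streams.values())]


def passes_sub_frame(streams, frame_period_s=FRAME_PERIOD_S):
    """True when every frame's cross-stream drift stays below one frame period."""
    return all(drift < frame_period_s for drift in drift_per_frame(streams))


# Timestamps in seconds for three frames across four streams (illustrative values).
sample = {
    "left_arm":  [0.000, 0.020, 0.040],
    "right_arm": [0.001, 0.021, 0.042],
    "cam_top":   [0.003, 0.024, 0.045],
    "cam_wrist": [0.002, 0.022, 0.043],
}
drifted = dict(sample, cam_top=[0.003, 0.045, 0.070])

print(passes_sub_frame(sample), passes_sub_frame(drifted))
```

A check of this shape is the kind of verification artifact a buyer can ask for at sample review, rather than accepting a stated tolerance.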

What rigs anchor the bimanual surface?

ALOHA is the published bimanual reference. The leader-follower bimanual setup produces high-fidelity dual-arm traces at modest hardware cost, which is why the academic and early-commercial bimanual literature cites it heavily ^[2]. For procurement, ALOHA-class data is the natural pretraining substrate for any bimanual policy; the trade-off is that ALOHA-published corpora are often research-scoped and require consent and derived-model rights review before they touch a commercial training pipeline.

ALOHA-Pro and Mobile ALOHA extend the platform — Pro toward higher-fidelity manipulation, Mobile ALOHA toward whole-body coordination with locomotion. Both inherit ALOHA's action-space conventions and broaden the task distribution. Buyers training a humanoid policy that involves locomotion alongside manipulation will often cite Mobile ALOHA as the closest published embodiment.

Humanoid teleop chairs and exoskeleton-style interfaces are the high-skill-ceiling end of the category. They produce action distributions that more closely match deployment for upper-body dexterous tasks, at the cost of a smaller operator pool and a higher per-trajectory price ^[1]. Custom dual-arm rigs, common in industrial and surgical deployments, are the procurement-grade endpoint where the buyer specifies the rig and the supplier matches it.

Across all of these rigs, the technical signature of usable bimanual data is consistent: hardware-triggered sync, MCAP or equivalent frame indexing, RLDS or LeRobot serialisation, and per-session metadata that names embodiment, operator tier, and accept rule.
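One way to check the per-session metadata described above is a small completeness validator. This is a sketch under assumptions: the field names and allowed values are hypothetical and do not come from any RLDS or LeRobot convention.

```python
# Hypothetical field names; adapt them to whatever the spec actually standardises.
REQUIRED_FIELDS = ("embodiment", "operator_tier", "accept_rule",
                   "sync_method", "serialisation")
ALLOWED_SERIALISATIONS = ("RLDS", "LeRobot")
ALLOWED_TIERS = (1, 2, 3)


def metadata_problems(session):
    """Return a list of problems; an empty list means the record is complete."""
    problems = []
    for name in REQUIRED_FIELDS:
        if session.get(name) in (None, ""):
            problems.append(f"missing field: {name}")
    tier = session.get("operator_tier")
    if tier is not None and tier not in ALLOWED_TIERS:
        problems.append(f"unknown operator tier: {tier!r}")
    serialisation = session.get("serialisation")
    if serialisation and serialisation not in ALLOWED_SERIALISATIONS:
        problems.append(f"unknown serialisation: {serialisation!r}")
    return problems


complete = {
    "embodiment": "ALOHA",
    "operator_tier": 2,
    "accept_rule": "contact-event completion; grasp continuity; procedural correctness",
    "sync_method": "hardware-triggered",
    "serialisation": "RLDS",
}
incomplete = {"embodiment": "custom dual-arm", "serialisation": "RLDS"}

print(metadata_problems(complete))
print(metadata_problems(incomplete))
```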

Rig	Action distribution	Operator pool	Procurement role
ALOHA	Standard bimanual	Trained Tier-2	Pretraining baseline
ALOHA-Pro	Higher-fidelity bimanual	Tier-2 to Tier-3	Pretraining + selective fine-tuning
Mobile ALOHA	Bimanual + locomotion	Tier-2 to Tier-3	Humanoid whole-body pretraining
Humanoid teleop chair	Upper-body dexterous	Narrow Tier-3	Deployment substrate (humanoid)
Custom dual-arm rig	Buyer-specified	Trained Tier-2 to Tier-3	Deployment substrate (industrial)

Bimanual rigs by deployment match and operator pool. Embodiment match is the strongest predictor of deployment-time policy success.

Where does bimanual procurement break down?

The dominant failure mode is embodiment mismatch. A team pretrains on ALOHA data, fine-tunes on a small custom corpus, then deploys on a humanoid; the policy has learned ALOHA's kinematics and the deployment-side action distribution is different ^[5]. The fix is not more pretraining data — it is more embodiment-matched fine-tuning data. Briefings flag the embodiment match explicitly when a corpus is being considered as a deployment substrate rather than a pretraining baseline.

The second failure mode is operator-tier mismatch. A supplier ships Tier-1 operators on a Tier-3 task and the corpus passes raw QA (footage looks complete) and fails policy training (the policy learns coordination errors that the operator made systematically). The fix is to specify operator tier per task in the spec and verify the supplier's prior Tier-3 yield before signing.

The third failure mode is sync drift that compounds across the trajectory. Bimanual policies are more sensitive to inter-stream drift than single-arm ones because two action streams need to remain aligned with vision ^[6]. A corpus with frame-level drift between left arm and right arm trains a policy that misjudges coordination timing; the failure mode appears at evaluation as a class of tasks the policy never closes. Briefings name the sync tolerance explicitly because the failure is not visible in spot-checks of the footage.
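A worked example shows why drift that compounds is easy to miss in spot-checks. The sketch assumes the simplest drift model, a constant clock-rate offset between two unsynchronised streams, so drift grows linearly with elapsed time. The 100 ppm offset is a hypothetical value chosen for illustration, and 50 Hz is the ALOHA rate quoted earlier.

```python
def seconds_until_drift(frame_rate_hz, skew_ppm, frames=1.0):
    """Elapsed time before a constant clock-rate offset accumulates `frames` of drift.

    Model (an assumption): drift(t) = skew_ppm * 1e-6 * t.
    """
    if frame_rate_hz <= 0 or skew_ppm <= 0:
        raise ValueError("frame_rate_hz and skew_ppm must be positive")
    frame_period_s = 1.0 / frame_rate_hz
    return (frames * frame_period_s) / (skew_ppm * 1e-6)


# Hypothetical offset of 100 parts per million between two stream clocks at 50 Hz.
t_one_frame = seconds_until_drift(50.0, 100.0)
print(f"one full frame of drift after about {t_one_frame:.0f} s")
```

Under these assumptions the first seconds of a session look aligned and the misalignment only reaches a full frame minutes later, which is why a drift check needs to cover the end of long sessions and not just the opening frames.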

Bimanual sourcing workflow

Every truelabel bimanual sourcing request runs the same four-step workflow as teleop in general, with operator-tier and sync-tolerance added as explicit checks ^[3]. The steps below are the operational template; skipping a step is what produces the three failure modes named earlier.

The fourth step — sample-acceptance gate — is the structural check that catches operator-tier and sync-drift failures before they scale into a six-figure capture programme. A sample that passes the gate is a sample that produces deployment-grade policy training; one that does not is a research baseline.

01. Specify embodiment and operator tier
Name the deployment rig (ALOHA-class, humanoid chair, custom). Require Tier-2 minimum, Tier-3 for assembly / surgical-assist / dexterous tasks.

02. Lock action-space schema and field conventions
RLDS or LeRobot, joint-position or end-effector delta, left-arm and right-arm separated or concatenated. Standardise at spec time, not at delivery.

03. Verify sync tolerance at sample review
Sub-frame drift across left arm, right arm, and two camera angles. Hardware-triggered timestamping or MCAP frame indexing. Verification artifact, not a claim.

04. Run a written accept rule
Contact-event completion, grasp continuity, operator-side procedural correctness. Sample-acceptance gate before volume capture begins.
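The written accept rule and the sample-acceptance gate in the steps above can be sketched as a small check over a reviewed sample. The three criteria come from the text; the record layout and the 0.9 minimum accept rate are hypothetical and would be set in the actual spec.

```python
from dataclasses import dataclass


@dataclass
class TrajectoryCheck:
    trajectory_id: str
    contact_events_complete: bool
    grasp_continuous: bool
    procedure_correct: bool

    def accepted(self):
        """A trajectory is accepted only if it meets all three criteria."""
        return (self.contact_events_complete
                and self.grasp_continuous
                and self.procedure_correct)


def sample_gate(checks, min_accept_rate=0.9):
    """Return (passed, accept_rate) for a reviewed sample of trajectories."""
    if not checks:
        raise ValueError("sample is empty")
    accept_rate = sum(c.accepted() for c in checks) / len(checks)
    return accept_rate >= min_accept_rate, accept_rate


# Illustrative sample: 9 clean trajectories and 1 with a broken grasp.
sample = [TrajectoryCheck(f"traj-{i:03d}", True, True, True) for i in range(9)]
sample.append(TrajectoryCheck("traj-009", True, False, True))

passed, rate = sample_gate(sample)
print(passed, rate)
```

Running the gate on a small sample before volume capture is the point of the fourth step: a failing rate here is cheap to act on, while the same rate discovered after delivery is not.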

How does bimanual compose with teleop, consent, and provenance?

Bimanual is a specialised slice of teleoperation. A briefing tagged bimanual almost always carries teleoperation as a secondary tag because the capture mechanics are shared. The cross-link to consent is the rights stack; the cross-link to provenance is the metadata chain that makes embodiment and operator-tier claims auditable. The serialisation surface — RLDS or LeRobot per the datasets format guide — is shared as well.

The cross-link to VLA training (see also the VLA glossary) is also load-bearing. A bimanual corpus that fine-tunes a VLA from a cross-embodiment pretrained baseline is the dominant deployment pattern; the briefings under this topic flag corpora that compose cleanly into that pattern and those that do not. The pair pattern typically combines 200-400 hours of egocentric video for the perceptual prior with 3,000-8,000 bimanual teleop trajectories routed via the sourcing brief into the marketplace.

Briefing index and recurring patterns

Briefings tagged bimanual-manipulation share a recurring shape: the rig, the operator tier, the action-space schema, the sync tolerance, and the buyer implication. The pattern lets a procurement reader scan an archive and exit with a prioritised candidate list for fine-tuning capture.

Use this topic when scoping data for any two-arm policy. Pair it with teleoperation for the capture mechanics and with consent for the rights stack.

Practical patterns: how a buyer uses bimanual-manipulation briefings in a sourcing memo

Procurement memos cite briefings for a reason: the briefings carry the source evidence the memo cannot reconstruct from a vendor pitch deck. A memo that names bimanual-manipulation as the load-bearing variable should quote the briefings that profile the candidate sources, copy the buyer-implication sentence verbatim, and date-stamp the citation so a re-audit cadence can be set against the freshness of the brief ^[2].

The first practical pattern is sequencing: scan the topic archive before any supplier outreach, narrow to two or three candidate sources, then enter supplier conversations with the briefing's buyer-implication sentence as the opening question. Suppliers who have read the same briefings tend to respond faster and more substantively because they can see the gap the buyer is trying to close. Suppliers who have not read them tend to pitch their default offering, which is usually a poor match for a topic-specific sourcing request.

The second pattern is composition. A briefing under bimanual-manipulation rarely lives alone — it almost always carries a secondary tag covering one of the procurement layers (consent, licensing, commercial-use, provenance). A memo that quotes any bimanual-manipulation briefing should also quote the corresponding briefing under the secondary tag, so the procurement question is answered across both layers rather than only the primary one ^[7].

The third pattern is the buyer-implication chain. Each briefing's buyer-implication sentence becomes a memo line; each memo line becomes a supplier question; each supplier question becomes a contract clause; each contract clause becomes a delivery-acceptance check. A briefings archive used this way is not a reading list; it is the procurement workflow with citations attached.

What good looks like across bimanual-manipulation briefings

Across the bimanual-manipulation archive, the briefings that survive a deployment review six months later share a pattern. They name the source with version, they cite the rights and consent posture inside the source (not the dataset card), they identify the embodiment or capture rig explicitly, they date-stamp the review, and they end with one sentence a procurement memo can quote without modification. The pattern is shorter than the typical research write-up because the audience is different — a procurement reader does not need the lit review, they need the buyer implication.

A good briefing also names what is missing. The hardest part of writing a buyer-grade brief is admitting that a candidate source does not clear the bar for the deployment context. Briefings under bimanual-manipulation that name the gap explicitly are more useful than briefings that paper over it, because the procurement memo has to cite the gap to defend the decision to commission custom capture via the marketplace instead.

The third quality marker is freshness. Robotics datasets, vendor positions, and capture rigs move quickly. A briefing that is six months old needs a freshness header that says so; a briefing that has been re-audited and confirms the original position needs a date-stamp on the re-audit. Briefings under bimanual-manipulation that maintain this freshness cadence are the ones procurement teams cite repeatedly across multiple sourcing engagements.
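The freshness rule can be expressed as a simple date check. This is a sketch only: "six months" is approximated as 183 days, and the function and parameter names are hypothetical.

```python
from datetime import date

STALE_AFTER_DAYS = 183  # assumption: "six months" approximated as 183 days


def needs_freshness_header(reviewed_on, today, reaudited_on=None):
    """True when the latest review or re-audit is at least six months old."""
    last_checked = reviewed_on
    if reaudited_on is not None and reaudited_on > reviewed_on:
        last_checked = reaudited_on
    return (today - last_checked).days >= STALE_AFTER_DAYS


reviewed = date(2026, 5, 21)
today = date(2026, 12, 1)

stale = needs_freshness_header(reviewed, today)
fresh_after_reaudit = needs_freshness_header(reviewed, today,
                                             reaudited_on=date(2026, 11, 1))
print(stale, fresh_after_reaudit)
```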

The fourth quality marker is cross-link discipline. A briefing that closes by naming the adjacent topics it depends on (consent, licensing, provenance, embodiment, capture rig) gives the reader the entry point into the rest of the archive. Briefings under bimanual-manipulation that do this consistently let a procurement reader navigate the archive as a working surface rather than a flat list of articles.

Reading bimanual-manipulation briefings as a working file, not a static archive

The briefings under this topic are designed to be a working file. The archive is not a textbook; it is a procurement reference whose entries are written once, re-audited on cadence, and discarded when the underlying source changes in a way that invalidates the original brief. A buyer who treats the archive as a working file gets value from it every quarter; a buyer who treats it as a static archive reads it once and never returns.

Use the archive in three modes. In sourcing-decision mode, scan the topic, narrow to two or three candidates, and enter supplier conversations with the buyer-implication sentence as the opening question. In re-audit mode, revisit the briefings whose sources have changed (publisher term updates, contributor withdrawals, new releases) and update the procurement memos that cite them. In planning mode, read the topic archive end to end to build a mental model of where the buyer-readiness gaps cluster and what the dominant recommendation patterns look like.

The fourth use case is briefing-to-briefing comparison. A buyer reading two briefings under bimanual-manipulation side by side can compare the buyer-implication sentences directly because the briefings follow the same structural shape. The comparison is the lightest-weight diligence step in the workflow and the most common reason to enter the archive in the first place. Briefings under bimanual-manipulation are written to support this comparison: same shape, same fields, different sources ^[2].

A working archive also needs an entry point and an exit point. The entry point is this topic page, with its TL;DR, sample-spec quick-facts, comparison table, and steps block. The exit point is the briefing card whose buyer implication a procurement memo cites. Everything between is the reading workflow the briefings are designed to support.

Common mistakes when buyers ignore bimanual-manipulation

The dominant mistake when bimanual-manipulation is treated as a secondary concern is sequencing: the buyer commits to a source on the basis of the catalog presence, the licence label, or the supplier pitch, and discovers the bimanual-manipulation-related gap weeks or months later when the policy is already partway through training. The cost of that mistake is retraining cost plus schedule cost; the structural fix is to treat bimanual-manipulation as a gating field before training compute, not after ^[2].

The second mistake is partial coverage. A corpus that scores well on bimanual-manipulation for 80% of trajectories and poorly for 20% is not 80% usable — it is unusable for any pipeline that cannot filter at the trajectory level. The briefings under this topic flag partial-coverage candidates explicitly because the gap is structural and the fix is rarely available downstream. The procurement-grade pattern is to require complete coverage at the spec level or to plan for the surgical removal of the non-compliant fraction before training starts.
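Trajectory-level filtering only works when each trajectory carries its own compliance flag. The sketch below assumes a hypothetical per-trajectory `compliant` field and refuses to filter when the flag is absent, which is the situation where a partially compliant corpus becomes unusable.

```python
def split_by_compliance(trajectories):
    """Separate compliant from non-compliant trajectories using a per-trajectory flag."""
    unflagged = [t["id"] for t in trajectories if "compliant" not in t]
    if unflagged:
        # Without a per-trajectory flag there is nothing to filter on.
        raise ValueError(f"cannot filter: no compliance flag on {unflagged}")
    kept = [t for t in trajectories if t["compliant"]]
    removed = [t for t in trajectories if not t["compliant"]]
    return kept, removed


# Illustrative corpus of 10 trajectories where every fifth one is non-compliant.
corpus = [{"id": f"traj-{i:03d}", "compliant": i % 5 != 0} for i in range(10)]

kept, removed = split_by_compliance(corpus)
print(f"kept {len(kept)} of {len(corpus)}; removed {len(removed)}")
```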

The third mistake is reliance on aggregator labels. Aggregators pool sources under a single banner and a single posture, but the upstream chain frequently breaks at the second or third hop ^[7]. A buyer using an aggregator-licensed corpus needs to verify that every upstream source supports the aggregator's release terms; aggregators rarely surface this verification, so the buyer carries the diligence cost. Briefings under bimanual-manipulation flag aggregator-inherited risk for the cases where the inheritance chain is most likely to break.

The fourth mistake is treating the topic as resolved when only the label has been checked. Bimanual-manipulation is an engineering and contractual problem; resolving it requires evidence (sample artifacts, audit trails, per-trajectory metadata) rather than assertion. Suppliers who can produce evidence are procurement-grade; suppliers who can only assert are research baselines. The briefings under this topic name the evidence explicitly so the buyer can distinguish between the two.

Use these to move from category-level context into specific task, dataset, format, and comparison detail.

Physical AI data guides (Guide hub)
Multi-Task Learning Robotics (Definition and terminology)
Bimanual manipulation training data (Task-specific requirements)
Best teleoperation data providers 2026 (Supporting guide)
Physical AI data marketplace (Buyer conversion page)
Hand-Object Interaction Data for Robotics (Definition and terminology)
Household task data for domestic robots (Supporting guide)
Dexterous manipulation training data (Task-specific requirements)

External references and source context

[1] Figure + Brookfield humanoid pretraining dataset partnership
Figure AI and Brookfield announced a partnership to capture humanoid teleoperation data in deployment environments, including bimanual upper-body tasks.
figure.ai

[2] Teleoperation datasets are becoming the highest-intent physical AI content category
ALOHA's leader-follower bimanual rig produces high-fidelity dual-arm traces at modest hardware cost.
tonyzhaozh.github.io

[3] truelabel physical AI data marketplace bounty intake
Truelabel routes bimanual sourcing requests to vetted capture partners with operator-tier verification and sync-tolerance audits.
truelabel.ai

[4] MCAP file format
MCAP enables hardware-triggered timestamping and frame-level indexing across multiple robot and camera streams.
mcap.dev

[5] OpenVLA: An Open-Source Vision-Language-Action Model
OpenVLA shows how cross-embodiment pretraining followed by deployment-rig fine-tuning composes into a usable policy.
arXiv

[6] Project site
DROID demonstrates large-scale real-world manipulation capture with synchronized observations and actions across diverse scenes.
droid-dataset.github.io

[7] RLDS: Reinforcement Learning Datasets
RLDS defines episode-based robot-learning datasets with per-step observations, actions, and metadata, used by ALOHA-class corpora.
GitHub

[8] truelabel Open X-Embodiment glossary
Truelabel glossary entry on Open X-Embodiment.
truelabel.ai

FAQ

Why is bimanual teleop data harder to source than single-arm?

Capture requires two synchronised action streams, two camera angles minimum, and operators trained on coordinated tasks. Yield is lower, accept rate is more variable, and embodiment-specific rigs limit the operator pool.

Can a single-arm dataset bootstrap a bimanual policy?

Partially. Single-arm corpora help component skills (grasping, picking) but rarely transfer to tasks that require coordination — folding, packing, two-hand assembly. Bimanual deployments almost always commission bimanual data.

What's the most common bimanual data acceptance rule?

Synchronised streams with sub-frame drift, complete proprioception on both arms, gripper-state continuity, and operator-side action commands logged with millisecond timestamps.

How does operator-tier mismatch surface in training?

The corpus passes raw QA because footage looks complete, but policy training stalls because the operator's coordination errors are baked into the action distribution. The fix is to specify operator tier per task and verify the supplier's tier-specific yield before signing.

Looking for bimanual manipulation data?

Specify modality, task, environment, rights, and delivery format. Truelabel matches you with vetted capture partners and helps scope consent artifacts and commercial licensing requirements before delivery.
