Sub-vertical sourcing
Household task data for domestic robots
Household task data is real-world capture of humans performing domestic activities — cleaning, laundry, kitchen prep, kid-care, light home repair — used to train household and domestic robots. Compared to warehouse or industrial data, household capture is constrained by extended consent (multi-resident households, minors), environmental variability (no fixed lighting, clutter, pets, soft surfaces), and longer task horizons. Buyers source through truelabel when their deployment context requires fresh capture with commercial-use rights, child-safety review, and the kind of metadata public datasets like Ego4D or EPIC-KITCHENS don't carry.
Quick facts
- Request type: OTS or NET_NEW exclusive collection
- Modality: Egocentric video + hand pose + IMU
- Environment: US/CA residential, multi-resident households
- Volume: 50-200 hours first batch, 1,000+ hours at scale-up
- Rights: Commercial training, multi-resident consent with revocation
Comparison
| Source | Strength | Limitation |
|---|---|---|
| Ego4D / EPIC-KITCHENS | Research-grade activity coverage at scale | Restrictive commercial licensing |
| Generic stock footage | Fast to license | No task metadata, no robot-relevant signals |
| Internal household capture | Full control over rig and protocol | Slow recruitment, fixed cost, consent overhead |
| truelabel household sourcing | Vetted collectors, child-safety-reviewed protocols, commercial licensing | Requires defined spec + first-batch QA |
Why household data is hard to procure off-the-shelf
Public egocentric corpora hit two walls when buyers try to use them for commercial domestic-robot training. Ego4D and EPIC-KITCHENS cover the activity surface, but EPIC explicitly blocks commercial use [1], and most Ego4D contributions carry researcher-only licensing terms that don't transfer to model deployment [2]. The deeper issue is that household environments don't replicate: lighting, clutter, soft surfaces, and pet presence vary per-house in ways that warehouse capture rigs ignore. Buyers training for commercial deployment need capture inside houses that match their deployment context, not benchmark houses.
"You may not use the material for commercial purposes." [1]
That single clause is the practical reason commercial domestic-robot programs can't lean on the strongest available household corpus, and it drives everything that follows. On the demand side, humanoid commercialization is raising the need for household data: GR00T-style architectures require heterogeneous data pyramids [3], and household deployments (laundry folding, kitchen prep, surface cleaning) are the consumer-facing use cases driving the next wave of capture demand. Truelabel routes those requests through factory-style pipelines [4] with vetted collectors who run extended-consent protocols (multi-resident households, minors, pet-present homes) and deliver in LeRobotDataset v3 [5].
- Cleaning — surface wipe, floor mop, vacuum, tidy-up
- Laundry — sort, load, fold, put-away
- Kitchen — meal prep, dishwashing, fridge organization
- Kid-care — supervision, simple food prep, play interaction (extended consent)
- Light home repair — assembly, fastener install, simple fixes
Capture-platform fit for household sub-tasks
Household task families don't share a capture platform. Fine bimanual work — folding laundry, threading buckles, kitchen-prep handoffs — maps to ALOHA-style low-cost bimanual rigs because two-handed coordination is the dominant failure mode for single-arm baselines [6]. Surface-level cleaning maps to mobile-base manipulator capture with vacuum or wipe tooling. Object-rich tasks (fridge organization, cabinet sorting, tool-handling) lean on hand-object interaction capture in the HOI4D tradition, where per-object grasp specifications matter more than coarse activity labels [7]. The shared procurement principle across these: capture environmental diversity at the DROID scale [8], not benchmark-house uniformity.
- Bimanual fine manipulation (fold, thread, kitchen prep) → ALOHA-style rig
- Mobile cleaning (vacuum, wipe, tidy-up) → mobile-base manipulator with tooling
- Object-rich grasping (organize, sort, handle) → hand-object capture with per-object grasp specs
- Long-horizon kitchen flows (multi-step prep) → continuous egocentric + appliance state
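The task-to-platform routing above can be expressed as a simple lookup, which is also how a multi-task request splits into separate capture workstreams. A minimal Python sketch; the family keys, task names, and the `platform_for` / `group_by_platform` helpers are hypothetical illustrations, not truelabel tooling.

```python
# Hypothetical routing table mirroring the mapping above:
# household sub-task family -> capture platform. Keys are illustrative names.
CAPTURE_PLATFORM = {
    "bimanual_fine_manipulation": "ALOHA-style bimanual rig",
    "mobile_cleaning": "mobile-base manipulator with tooling",
    "object_rich_grasping": "hand-object capture with per-object grasp specs",
    "long_horizon_kitchen": "continuous egocentric + appliance state",
}

# Hypothetical assignment of example tasks to sub-task families.
TASK_FAMILY = {
    "fold": "bimanual_fine_manipulation",
    "thread": "bimanual_fine_manipulation",
    "kitchen_prep": "bimanual_fine_manipulation",
    "vacuum": "mobile_cleaning",
    "wipe": "mobile_cleaning",
    "tidy_up": "mobile_cleaning",
    "organize": "object_rich_grasping",
    "sort": "object_rich_grasping",
    "handle": "object_rich_grasping",
    "multi_step_prep": "long_horizon_kitchen",
}


def platform_for(task: str) -> str:
    """Return the capture platform for a task; raise if the task is unmapped."""
    try:
        family = TASK_FAMILY[task]
    except KeyError:
        raise ValueError(f"no capture platform mapped for task {task!r}") from None
    return CAPTURE_PLATFORM[family]


def group_by_platform(tasks: list[str]) -> dict[str, list[str]]:
    """Split a multi-task request into per-platform capture workstreams."""
    grouped: dict[str, list[str]] = {}
    for task in tasks:
        grouped.setdefault(platform_for(task), []).append(task)
    return grouped
```

For example, `group_by_platform(["fold", "vacuum", "wipe"])` puts `fold` under the ALOHA-style rig and `vacuum` and `wipe` under the mobile-base manipulator, i.e. two capture workstreams rather than one.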
Consent architecture for household capture
Household capture's binding constraint is consent, not capture volume. Multi-resident households require informed consent from every adult resident, separate guardian consent for any minor in frame, capture-pause protocols for any non-pre-consented party entering the scene, and per-session revocation terms attached to the contribution. Truelabel's standard household protocol embeds those gates in the collector's session checklist before capture starts, so consent doesn't become a post-hoc audit problem. Sessions involving minors run a child-safety review layer (capture window, task selection, supervision presence) on top of the base protocol.
Each contribution ships with a per-session consent artifact, location release where applicable, and downstream license verification metadata. That artifact is the procurement-side answer to a buyer's compliance team when they ask whether the dataset's commercial-use terms are defensible at the contribution level, not just the corpus level.
- Multi-resident informed consent — every adult resident, pre-briefed
- Guardian consent for minors — separate from adult consent
- Capture-pause protocol for non-pre-consented parties
- Revocation terms attached to every contribution
- Child-safety review layer for minor-involved sessions
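Embedding those gates in a pre-capture checklist amounts to a check that returns every unmet gate before recording starts, plus a pause rule applied during the session. A minimal Python sketch; the `Resident` / `Session` fields and the `consent_blockers` / `should_pause` functions are hypothetical and do not describe truelabel's actual session tooling.

```python
from dataclasses import dataclass


@dataclass
class Resident:
    name: str
    is_minor: bool
    consented: bool = False           # adult informed consent, pre-briefed
    guardian_consented: bool = False  # separate guardian consent, minors only


@dataclass
class Session:
    residents: list[Resident]
    revocation_terms_attached: bool
    child_safety_reviewed: bool = False


def consent_blockers(session: Session) -> list[str]:
    """Return reasons capture cannot start; an empty list means all gates pass."""
    blockers = []
    for r in session.residents:
        if r.is_minor and not r.guardian_consented:
            blockers.append(f"missing guardian consent for minor {r.name}")
        elif not r.is_minor and not r.consented:
            blockers.append(f"missing informed consent from adult {r.name}")
    # The child-safety review layer applies only when a minor is involved.
    if any(r.is_minor for r in session.residents) and not session.child_safety_reviewed:
        blockers.append("child-safety review not completed")
    if not session.revocation_terms_attached:
        blockers.append("revocation terms not attached")
    return blockers


def should_pause(person_in_frame: str, session: Session) -> bool:
    """Capture-pause rule: pause when anyone not pre-consented enters the scene."""
    cleared = {
        r.name
        for r in session.residents
        if (r.guardian_consented if r.is_minor else r.consented)
    }
    return person_in_frame not in cleared
```

Returning the full list of blockers, rather than failing on the first one, lets a collector clear every gap in a single pass before the session begins.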
How truelabel structures a household task request
Buyers specify embodiment (humanoid, mobile manipulator, fixed-base manipulator), target task family, and per-house environmental profile (single-family vs apartment, pet presence, child presence, soft-surface density). Truelabel pre-shares the evaluation rubric with the collector, runs a first-batch eval pack against the buyer's spec, and only scales after the buyer accepts the first batch. Delivery defaults to LeRobotDataset v3 or RLDS with per-session metadata including capture rig, consent artifacts, location release, and downstream license verification context.
- Embodiment + environment profile defined up front
- Multi-resident consent, child-safety review where applicable
- First-batch eval pack against buyer rubric before scale-up
- LeRobotDataset v3 / RLDS delivery with provenance metadata
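The request structure above can be sketched as a typed spec that is validated up front, together with a stage machine that only moves to scale-up once the buyer accepts the first batch. A minimal Python sketch; the class names, field names, allowed values, and `next_stage` are hypothetical, not a published truelabel schema.

```python
from dataclasses import dataclass

# Hypothetical allowed values, taken from the options named in the text.
EMBODIMENTS = {"humanoid", "mobile_manipulator", "fixed_base_manipulator"}
DWELLINGS = {"single_family", "apartment"}
DELIVERY_FORMATS = {"lerobot_v3", "rlds"}


@dataclass(frozen=True)
class EnvironmentProfile:
    dwelling: str              # "single_family" or "apartment"
    pets: bool
    children: bool
    soft_surface_density: str  # e.g. "low", "medium", "high"


@dataclass(frozen=True)
class HouseholdRequest:
    embodiment: str
    task_family: str
    environment: EnvironmentProfile
    delivery_format: str = "lerobot_v3"  # LeRobotDataset v3 is the default

    def validate(self) -> None:
        """Reject specs that cannot be scoped before any capture starts."""
        if self.embodiment not in EMBODIMENTS:
            raise ValueError(f"unknown embodiment: {self.embodiment}")
        if self.environment.dwelling not in DWELLINGS:
            raise ValueError(f"unknown dwelling type: {self.environment.dwelling}")
        if self.delivery_format not in DELIVERY_FORMATS:
            raise ValueError(f"unsupported delivery format: {self.delivery_format}")

    @property
    def needs_child_safety_review(self) -> bool:
        return self.environment.children


def next_stage(stage: str, batch_accepted: bool = False) -> str:
    """spec -> first_batch -> scale_up; scale-up is gated on buyer acceptance."""
    if stage == "spec":
        return "first_batch"
    if stage == "first_batch":
        return "scale_up" if batch_accepted else "first_batch"
    return stage
```

A rejected first batch keeps the request in `first_batch` rather than advancing it, which reflects the "only scales after the buyer accepts the first batch" rule.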
Delivery format and quality bars
Household captures default to LeRobotDataset v3 because the schema accommodates the modality mix household robots actually need: synchronized egocentric video, hand pose at 30-60 Hz, IMU streams, optional force-torque where the embodiment carries the sensor, and per-episode language annotations [5]. RLDS is the alternative when buyers have downstream pipelines built against the canonical episode-step representation. Quality bars apply at three layers: per-frame (sensor sync within tolerance, no dropouts), per-episode (task labeling consistency, intent annotation), and per-batch (environmental diversity coverage matches the buyer's deployment profile). First-batch eval surfaces failures at any of these layers before scale-up burns budget.
- Default delivery: LeRobotDataset v3 (Parquet-based, multi-modal)
- Alternative: RLDS for buyers with existing episode-step pipelines
- Per-frame, per-episode, per-batch quality gates
- Environmental-diversity coverage verified at first-batch acceptance
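The per-frame gate (sensor sync within tolerance, no dropouts) reduces to two timestamp checks: every camera frame needs a nearby IMU sample, and no stream may skip samples. A minimal Python sketch; the 5 ms sync tolerance and the 1.5x-nominal-period dropout threshold are assumptions, since the text does not specify numeric tolerances.

```python
import bisect


def nearest_offset(t: float, ts: list[float]) -> float:
    """Distance in seconds from t to the closest timestamp in sorted list ts."""
    if not ts:
        raise ValueError("reference stream is empty")
    i = bisect.bisect_left(ts, t)
    neighbors = [ts[j] for j in (i - 1, i) if 0 <= j < len(ts)]
    return min(abs(t - n) for n in neighbors)


def find_dropouts(ts: list[float], rate_hz: float, slack: float = 1.5) -> list[int]:
    """Indices i where ts[i] - ts[i-1] exceeds slack x the nominal sample period."""
    period = 1.0 / rate_hz
    return [i for i in range(1, len(ts)) if ts[i] - ts[i - 1] > slack * period]


def per_frame_gate(cam_ts, cam_hz, imu_ts, imu_hz, sync_tol_s=0.005):
    """Report camera frames lacking an IMU sample within tolerance, plus dropouts.

    sync_tol_s (5 ms) is an assumed tolerance, not a documented truelabel value.
    """
    return {
        "unsynced_frames": [
            i for i, t in enumerate(cam_ts) if nearest_offset(t, imu_ts) > sync_tol_s
        ],
        "camera_dropouts": find_dropouts(cam_ts, cam_hz),
        "imu_dropouts": find_dropouts(imu_ts, imu_hz),
    }


def passes(report: dict) -> bool:
    """A frame-level gate passes only when every failure list is empty."""
    return not any(report.values())


# Usage: 3 s of 30 Hz video against a 200 Hz IMU stream.
cam = [i / 30 for i in range(90)]
imu = [i / 200 for i in range(600)]
clean = per_frame_gate(cam, 30, imu, 200)
dropped = per_frame_gate(cam[:45] + cam[46:], 30, imu, 200)  # one frame lost
```

Here `clean` passes, while `dropped` flags a camera dropout at index 45, where the gap doubles to 2/30 s.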
External references and source context
- [1] EPIC-KITCHENS-100 project site (epic-kitchens.github.io). EPIC-KITCHENS-100 documents non-commercial licensing constraints that exclude its use for commercial model training, the principal reason domestic-robot buyers procure fresh capture.
- [2] Ego4D: Around the World in 3,000 Hours of Egocentric Video (arXiv). Ego4D captures 3,670 hours of first-person daily-life activity across 74 worldwide locations, establishing baseline domestic-task coverage that buyers benchmark deployment-specific capture against.
- [3] NVIDIA GR00T N1 technical report (arXiv). GR00T N1 demonstrates that humanoid foundation models trained for household deployment require heterogeneous data pyramids spanning teleoperation, egocentric video, and synthetic generation.
- [4] NVIDIA: Physical AI Data Factory Blueprint (investor.nvidia.com). Physical AI data programs require factory-style pipelines for curation, generation, evaluation, and training rather than passive scraping of consumer-uploaded content.
- [5] LeRobot dataset documentation (Hugging Face). LeRobotDataset v3 is the de facto delivery format for household task captures, providing multi-modal time-series, multi-camera video, and rich metadata in a single schema.
- [6] ALOHA project site (tonyzhaozh.github.io). ALOHA's bimanual low-cost teleoperation rig is the dominant capture platform for household fine-manipulation tasks (folding, threading, kitchen prep) because two-handed coordination is the dominant failure mode for single-arm baselines.
- [7] HOI4D project site (hoi4d.github.io). HOI4D's category-level hand-object interaction sequences expose the per-object grasp and manipulation specifications that domestic robots need beyond coarse activity labels.
- [8] DROID: A Large-Scale In-The-Wild Robot Manipulation Dataset (arXiv). DROID's 76,000 in-the-wild manipulation demonstrations across 564 scenes establish the procurement-quality bar for environmental diversity that household capture programs benchmark against.
FAQ
Why not just use Ego4D or EPIC-KITCHENS for household robot training?
Both corpora are useful baselines for activity recognition and task structure, but neither carries commercial training rights for production model deployment. EPIC-KITCHENS explicitly excludes commercial use. Ego4D's terms vary by contribution. Buyers shipping commercial domestic robots need fresh capture with revocable commercial licensing, the kind public datasets weren't built for.
What environmental conditions does household capture need to cover?
At minimum: variable lighting (natural + artificial, including evening conditions), clutter (counters, floors, kid spaces), soft surfaces (carpet, bedding, upholstery, laundry), pet presence where applicable, and multi-resident scenes. Deployment-realistic capture also covers transitions between rooms, doorways, stairs, and the kind of environmental edge cases (spills, dropped items, mid-task interruptions) that domestic robots actually encounter.
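At batch level, that minimum list becomes a coverage check: count how many sessions exhibit each required condition and flag anything below threshold. A minimal Python sketch; the tag vocabulary, `coverage_gaps`, and the `min_sessions` threshold are hypothetical, and pet presence is only required when the buyer's profile includes pets.

```python
from collections import Counter

# Hypothetical tag vocabulary for the minimum conditions listed above.
BASE_CONDITIONS = frozenset({
    "natural_light", "artificial_light", "evening_light",
    "clutter", "soft_surfaces", "multi_resident",
})


def coverage_gaps(session_tags: list[set[str]], pets_in_profile: bool,
                  min_sessions: int = 1) -> set[str]:
    """Return required conditions observed in fewer than min_sessions sessions."""
    required = set(BASE_CONDITIONS)
    if pets_in_profile:  # pet presence is required only "where applicable"
        required.add("pets")
    # Each session's tags are a set, so a session counts at most once per tag.
    counts = Counter(tag for tags in session_tags for tag in tags)
    return {c for c in required if counts[c] < min_sessions}


batch = [
    {"natural_light", "clutter"},
    {"artificial_light", "evening_light", "soft_surfaces"},
    {"multi_resident", "clutter"},
]
```

With this three-session batch, `coverage_gaps(batch, pets_in_profile=False)` is empty, while a pet-owning deployment profile leaves `{"pets"}` uncovered.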
How does consent work when minors or multiple residents are in frame?
Truelabel's household protocols require informed consent from every adult resident in the household, with separate guardian consent for any minor in frame. Sessions are pre-briefed and capture pauses if anyone not pre-consented enters the scene. Each contribution carries a session-level consent artifact with revocation terms, and downstream license verification metadata travels with the delivered data.
What scale of capture is typical for a household program?
Volumes vary by task complexity, but a typical first-batch order targets 50–200 hours of accepted footage across 10–30 households for one task family (e.g., kitchen prep). Buyers scale up to 1,000+ hours after first-batch acceptance, with task-family expansion (laundry, cleaning) running in parallel collection waves.
What sensor stack does household capture typically need?
For most household task families: a head-mounted egocentric camera at 30-60 Hz, hand-pose tracking (Vive trackers or Quest controller-based tracking for budget setups, an optical hand-tracking glove for higher fidelity), IMU streams on the head rig, and optional force-torque sensing on the manipulator when the demonstrator is a robot rig rather than a human teleoperator. Object-rich tasks add a wrist-mounted camera for grasp context. Long-horizon kitchen flows benefit from appliance state telemetry (oven on/off, fridge open/closed) when accessible.
How is household capture different from warehouse capture in delivery terms?
Warehouse captures pack tightly: 5-10 minute episodes, narrow task vocabulary, high per-hour throughput. Household captures are the opposite — 60-90 minute sessions, broad task vocabulary, lower per-hour density but richer per-episode signal. That changes the buyer's procurement math: warehouse asks 'how many episodes' while household asks 'how many distinct households + tasks' to hit the environmental-diversity target.
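The procurement math can be made concrete by converting a target of accepted hours into a session count. A minimal Python sketch using the figures quoted on this page (50-200 hour first batches, 60-90 minute household sessions); `sessions_needed` and its `acceptance_rate` parameter are hypothetical, the latter standing in for the fraction of captured footage that passes QA.

```python
import math


def sessions_needed(target_hours: float, session_minutes: float,
                    acceptance_rate: float = 1.0) -> int:
    """Sessions required to reach target_hours of accepted footage.

    acceptance_rate is a hypothetical QA pass fraction, not a quoted figure.
    """
    if session_minutes <= 0:
        raise ValueError("session_minutes must be positive")
    if not 0 < acceptance_rate <= 1:
        raise ValueError("acceptance_rate must be in (0, 1]")
    return math.ceil(target_hours * 60 / (session_minutes * acceptance_rate))


# First-batch envelope: fewest sessions (small batch, long sessions)
# versus most sessions (large batch, short sessions).
fewest = sessions_needed(50, 90)
most = sessions_needed(200, 60)
```

That envelope spans 34 to 200 household sessions. For comparison, the same 200 hours packed into 5-10 minute warehouse episodes is 1,200-2,400 episodes, which is why household buyers count distinct households and tasks rather than episodes.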
How long does a first-batch household program take from kickoff?
Standard timeline: 1 week of scoping (embodiment, task family, environmental profile), 2 weeks of collector matching and first-batch capture (8-12 households for one task family), 1 week of eval against the buyer rubric, and 1 week for the buyer decision, roughly 5 weeks from kickoff to an accepted first batch. Scale-up to 1,000+ hours runs 8-12 weeks depending on parallel-collection capacity.
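The timeline arithmetic can be checked directly from the phase durations. A minimal Python sketch; `FIRST_BATCH_PHASES` and `program_weeks` are hypothetical names, and the end-to-end range assumes scale-up starts only after first-batch acceptance, with no overlap between the two.

```python
# Phase durations (weeks) from the standard first-batch timeline above.
FIRST_BATCH_PHASES = {
    "scoping": 1,
    "collector_matching_and_capture": 2,
    "eval_against_rubric": 1,
    "buyer_decision": 1,
}


def program_weeks(phases: dict[str, int],
                  scale_up: tuple[int, int]) -> tuple[int, tuple[int, int]]:
    """Weeks to first-batch acceptance, plus the kickoff-to-scale-up-complete range.

    Assumes scale-up begins only after the first batch is accepted (sequential).
    """
    first_batch = sum(phases.values())
    return first_batch, (first_batch + scale_up[0], first_batch + scale_up[1])


first, total = program_weeks(FIRST_BATCH_PHASES, (8, 12))
```

Under that sequential assumption, the first batch lands at 5 weeks and a 1,000+ hour program completes 13-17 weeks after kickoff.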
Looking for household task robot data?
Specify modality, task, environment, rights, and delivery format. Truelabel matches you with vetted capture partners — every delivery includes consent artifacts and commercial licensing by default.