North American egocentric data for physical AI

North American egocentric data is first-person video captured from collectors based in the United States and Canada, used to train physical AI systems for deployment in North American environments. Geography matters because product packaging, store layouts, sidewalk infrastructure, household appliances, and even ambient lighting differ enough across regions that models trained on global capture under-perform on NA-specific deployment. truelabel routes North American egocentric requests to vetted collectors in US and Canadian metros with commercial licensing, multi-resident consent, and per-session location metadata.

Updated 2026-05-21
By TrueLabel Sourcing
Reviewed by TrueLabel Sourcing
US + CA: Geographic coverage with commercial licensing
4: Sub-verticals from one sourcing surface
Per-metro: Location metadata granularity

Quick facts

Request type: OTS or NET_NEW exclusive collection
Geographic scope: US (top-20 metros) + Canada (top-10 metros)
Sub-verticals: Household, commercial, retail, last-mile
Volume: 50-300 hours first-batch by sub-vertical
Rights: Commercial training, jurisdiction-specific consent artifacts

Comparison

Source | Strength | NA-fit limitation
Ego4D (global) | Massive scale, broad activity coverage | Not segmented by region; not commercially licensed
EPIC-KITCHENS (global) | Rich kitchen-task structure | Non-commercial license; UK-skewed environments
Buyer-internal NA capture | Maximum control over rig + protocol | Slow, expensive, limited metro spread
truelabel North American sourcing | Vetted US/CA collectors, per-metro metadata, commercial licensing | Requires defined metro mix + first-batch QA

Why geographic origin matters for egocentric capture

Models trained on globally captured egocentric data often under-perform on region-specific tasks because environmental statistics aren't uniform across the world. North American product packaging (UPC layouts, package sizes, brand visibility), store layouts (grocery aisle widths, big-box shelving systems), household appliances (US/CA appliance designs and form-factors), sidewalk and curb infrastructure, and even ambient lighting (color temperature variations across regions) differ in ways that show up at deployment when a model trained on global data is sent into a North American context.

Public egocentric corpora do not close this gap. Ego4D spans 74 locations globally [1] but isn't segmented by region for buyer-side filtering. EPIC-KITCHENS' non-commercial licensing blocks production use regardless:

"You may not use the material for commercial purposes." [2]

That single clause rules the corpus out of production use, and its UK-skewed environments would not substitute for US/CA household capture in any case. Open X-Embodiment aggregates 527 skills across 22 robot embodiments [3] but provides no per-region segmentation either. The buy-side response is geographically segmented capture, ideally with factory-style pipelines that can scale [4]. Humanoid programs deploying in North American logistics [5] specifically need NA-environment training distributions.

Truelabel's North American collector network covers all four sub-verticals (household, commercial, retail, last-mile) from a single sourcing surface at /physical-ai-data-marketplace. The licensing and consent framework is documented at /egocentric-data-licensing.

  • Household — US and Canada residential environments
  • Commercial — warehouse, light-industrial, fulfillment
  • Retail — grocery, big-box, convenience (subject to retailer access)
  • Last-mile — sidewalk and curb capture in US and Canadian metros

What 'NA-specific' actually means at the data level

Region-specific capture isn't a marketing label — it changes what's in the frames. Object distributions: US/CA packaging dimensions (Walmart-format household goods, US-standard pantry containers), appliance form-factors (top-mount refrigerators vs European cabinet-integrated), tool conventions (Imperial-unit fasteners alongside metric), seasonal product cycles (US holiday merchandising vs European). Environmental statistics: ceiling heights, doorway widths, ambient lighting color temperature, exterior signage density. Behavioral conventions: pedestrian sidewalk-side patterns, retail checkout layouts, residential floor-plan archetypes. HOI4D-style hand-object interaction work [6] documents how much grasp behavior depends on object distribution — and NA object distributions are distinct enough that capture has to be NA-sourced to match.

The procurement implication: NA-segmented capture needs collectors operating inside the actual deployment geography, not collectors with global mandates. Truelabel's network reflects that — collector capacity is metro-tagged and filterable.

  • Object distributions: US/CA packaging, appliances, tools
  • Environmental statistics: ceiling heights, doorway widths, ambient lighting
  • Behavioral conventions: sidewalks, retail checkout, residential layouts
  • Per-object grasp specs differ when object distributions differ

Metro coverage and stratification

Active US collector coverage spans the top-20 metros (New York, Los Angeles, Chicago, Houston, Phoenix, Philadelphia, San Antonio, San Diego, Dallas, San Jose, Austin, Jacksonville, Fort Worth, Columbus, Charlotte, Indianapolis, San Francisco, Seattle, Denver, Washington DC). Canadian coverage spans the top-10 metros (Toronto, Montreal, Vancouver, Calgary, Edmonton, Ottawa, Winnipeg, Quebec City, Hamilton, Kitchener). Tier-2 metros and rural geographies are available with longer lead times. Buyers can specify metro mix (e.g., top-10 only, balanced geographic distribution, sun-belt-only, cold-climate-only) or accept truelabel's default deployment-realistic distribution.

Stratification dimensions buyers can specify beyond metro: urban vs suburban vs rural, climate zone, residential density, retailer brand availability. DROID's diversity bar [7] is the benchmark for how broadly buyers should stratify: narrowly distributed NA capture can deploy worse than broadly distributed global capture.

  • Top-20 US metros + top-10 Canadian metros active
  • Tier-2 metros + rural available with longer lead times
  • Buyer-specified or default deployment-realistic mix
  • Stratifiable: urban/suburban/rural, climate, density, brand
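
A buyer-specified metro mix can be turned into per-metro hour targets with a proportional split. The sketch below is illustrative only: `allocate_hours` and the example weights are hypothetical, not part of any truelabel interface, and the largest-remainder rounding is one reasonable choice among several.

```python
def allocate_hours(total_hours, metro_weights):
    """Split whole hours across metros in proportion to their weights.

    Uses largest-remainder rounding so the per-metro targets always
    add up to total_hours. Names and weights are hypothetical.
    """
    if total_hours < 0 or not metro_weights:
        raise ValueError("need non-negative hours and at least one metro")
    weight_sum = sum(metro_weights.values())
    if weight_sum <= 0:
        raise ValueError("weights must sum to a positive number")

    # Exact (fractional) share for each metro.
    exact = {m: total_hours * w / weight_sum for m, w in metro_weights.items()}
    # Start from the floor of each share.
    hours = {m: int(x) for m, x in exact.items()}
    leftover = total_hours - sum(hours.values())
    # Give the remaining whole hours to the largest fractional parts.
    by_fraction = sorted(exact, key=lambda m: exact[m] - hours[m], reverse=True)
    for metro in by_fraction[:leftover]:
        hours[metro] += 1
    return hours


# Example: a 120-hour first batch weighted toward two cold-climate metros.
plan = allocate_hours(120, {"Toronto": 2, "Chicago": 2, "Phoenix": 1})
print(plan)
```

The same function works for any stratification dimension (climate zone, density tier) by swapping the dictionary keys.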

How truelabel structures a North American request

Buyers specify the target metros or regional distribution, the sub-vertical(s), and per-deployment environmental requirements. Truelabel matches against active US and Canadian collectors, runs first-batch evals across representative geography, and scales after buyer acceptance. Delivery defaults to LeRobotDataset v3 with per-session metadata including capture metro, environmental conditions, and downstream license verification context [8]. Requests can also stratify by metro density (top-10 metros vs national distribution) and by season for outdoor sub-verticals. The warehouse-egocentric sub-vertical also has a dedicated sourcing spec at /sourcing/egocentric-warehouse-video.

  • Metro or regional-distribution scoping at brief stage
  • Sub-vertical filtering (household / commercial / retail / last-mile)
  • First-batch eval across representative metros
  • Per-metro, per-season metadata in delivery
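
The scoping steps above can be pictured as a small structured brief that is checked before it goes to collector matching. This is a minimal sketch under stated assumptions: `CaptureBrief` and its field names are hypothetical, and the allowed values simply mirror the request types, sub-verticals, and first-batch volume range listed on this page.

```python
from dataclasses import dataclass, field
from typing import List

# Allowed values taken from the quick facts on this page.
SUB_VERTICALS = {"household", "commercial", "retail", "last-mile"}
REQUEST_TYPES = {"OTS", "NET_NEW"}


@dataclass
class CaptureBrief:
    """Hypothetical request brief; not an actual truelabel schema."""
    request_type: str
    sub_verticals: List[str]
    metros: List[str]
    first_batch_hours: int
    seasons: List[str] = field(default_factory=list)  # optional, for outdoor capture

    def problems(self) -> List[str]:
        """Return a list of human-readable issues; empty means the brief is scoped."""
        issues = []
        if self.request_type not in REQUEST_TYPES:
            issues.append(f"unknown request type: {self.request_type}")
        if not self.sub_verticals:
            issues.append("no sub-vertical specified")
        unknown = sorted(set(self.sub_verticals) - SUB_VERTICALS)
        if unknown:
            issues.append(f"unknown sub-verticals: {unknown}")
        if not self.metros:
            issues.append("no metros or regional distribution specified")
        if not 50 <= self.first_batch_hours <= 300:
            issues.append("first batch outside the 50-300 hour range")
        return issues


brief = CaptureBrief("NET_NEW", ["household", "retail"], ["Toronto", "Chicago"], 120)
print(brief.problems())
```

Returning a list of issues rather than raising on the first one lets a buyer fix the whole brief in a single pass.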

Jurisdiction-specific consent and licensing

Consent and licensing requirements track jurisdiction, not just country. In the US, contributor consent follows informed-consent standards with commercial-training-use language and revocation terms; multi-resident households and minors require the additional consent layers documented in the household playbook. State-specific privacy law adds layers in jurisdictions like California (CCPA), Illinois (BIPA for biometric data), and Texas (CUBI). Canadian capture follows PIPEDA federally, with Quebec (Bill 25) and BC (FOIPPA / PIPA) adding province-specific layers. Truelabel's per-session consent artifact captures the jurisdiction context so buyer compliance teams can verify defensibility at the contribution level, not just the corpus level. That's the practical answer to a procurement-side counsel asking 'is this dataset defensible if a contributor in [California / Quebec / Illinois] objects post-capture?'

  • US federal: informed-consent baseline + commercial-training language
  • CA federal: PIPEDA + provincial layers (QC Bill 25, BC PIPA/FOIPPA)
  • Biometric / facial-image: CCPA + BIPA + CUBI state layers in US
  • Per-session consent artifact carries the jurisdiction context
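
One way to picture the jurisdiction context carried by a per-session consent artifact is a lookup from capture location to the consent layers named above. The sketch is illustrative: `consent_layers` and both tables are hypothetical, and the mapping covers only the statutes mentioned in this section, not a complete legal inventory.

```python
# Federal baselines named in the text above.
FEDERAL_BASELINE = {
    "US": ["informed consent + commercial-training language"],
    "CA": ["PIPEDA"],
}

# State/province layers named in the text above, keyed by (country, region).
# Keying on the pair avoids confusing "CA" (Canada) with "CA" (California).
REGIONAL_LAYERS = {
    ("US", "CA"): ["CCPA"],
    ("US", "IL"): ["BIPA"],
    ("US", "TX"): ["CUBI"],
    ("CA", "QC"): ["Bill 25"],
    ("CA", "BC"): ["PIPA", "FOIPPA"],
}


def consent_layers(country, region):
    """Return the consent layers to record for a capture session (hypothetical helper)."""
    if country not in FEDERAL_BASELINE:
        raise ValueError(f"unsupported country code: {country}")
    return FEDERAL_BASELINE[country] + REGIONAL_LAYERS.get((country, region), [])


print(consent_layers("US", "IL"))
print(consent_layers("CA", "QC"))
```

Regions without an entry fall back to the federal baseline, which matches the layered structure described in the paragraph above.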

External references and source context

  1. Ego4D: Around the World in 3,000 Hours of Egocentric Video

    Ego4D's 3,670 hours across 74 worldwide locations establishes the global egocentric baseline but is not segmented by region in a way that supports NA-specific deployment training.

    arXiv
  2. EPIC-KITCHENS project site

    EPIC-KITCHENS documents non-commercial licensing that excludes the dataset from commercial physical-AI deployment, regardless of geography.

    epic-kitchens.github.io
  3. Open X-Embodiment: Robotic Learning Datasets and RT-X Models

    Open X-Embodiment aggregates 527 skills across 22 robot embodiments globally, but provides no per-region segmentation that lets buyers filter for NA deployment context.

    arXiv
  4. NVIDIA: Physical AI Data Factory Blueprint

    Physical AI data programs benefit from factory-style pipelines that can deliver geographically segmented capture at scale.

    investor.nvidia.com
  5. NVIDIA GR00T N1 technical report

    GR00T N1 establishes that humanoid foundation models for commercial deployment require data pyramids matching the target deployment context.

    arXiv
  6. HOI4D project site

    HOI4D's category-level hand-object interaction sequences expose the per-object grasp specifications that NA-specific procurement uses to surface region-specific object distributions (US/CA packaging, appliances).

    hoi4d.github.io
  7. DROID: A Large-Scale In-The-Wild Robot Manipulation Dataset

    DROID's 76,000 in-the-wild demonstrations across 564 scenes establish the environmental-diversity bar buyers benchmark NA-segmented capture against.

    arXiv
  8. LeRobot dataset documentation

    LeRobotDataset v3 schema carries per-session metadata sufficient to encode capture metro, jurisdiction, and consent framework — the metadata layer that makes regionally segmented delivery practical.

    Hugging Face

FAQ

Why pay for NA-specific capture instead of using global egocentric data?

Global capture is a useful baseline but doesn't carry the regional environmental statistics that affect deployment. North American product packaging, store layouts, household appliances, and infrastructure differ enough that models trained on global capture show measurable performance drops at NA deployment. Buyers deploying production systems in the US and Canada typically need a layer of NA-segmented capture for the last mile of training accuracy.

Which US and Canadian metros are covered?

Active collector coverage spans top-20 US metros and top-10 Canadian metros, with capacity weighted toward New York, Los Angeles, Chicago, Houston, Toronto, Vancouver, and Montreal. Tier-2 metro coverage is available with longer lead times. Buyers can specify metro mix or accept truelabel's default geographic distribution.

Can NA capture be combined with global baseline data in one delivery?

Yes. Many buyers run a mixed-portfolio strategy: a global baseline (often from public corpora like Open X-Embodiment for tasks that allow it) plus an NA-segmented top-up layer for deployment-context training. Truelabel's delivery metadata makes the regional split transparent, so buyers can experiment with capture-mix ratios during model training.
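
As a rough illustration of working with capture-mix ratios, the sketch below computes the regional split of a delivery by hours and the NA top-up needed to reach a target share. The session records, the `region` and `hours` field names, and both helper functions are assumptions for the example, not the actual delivery schema.

```python
def regional_split(sessions):
    """Return each region's share of total hours, from per-session metadata."""
    totals = {}
    for session in sessions:
        region = session["region"]
        totals[region] = totals.get(region, 0.0) + session["hours"]
    grand_total = sum(totals.values())
    if grand_total == 0:
        return {}
    return {region: hours / grand_total for region, hours in totals.items()}


def na_hours_for_target(global_hours, target_na_share):
    """Hours of NA capture needed so NA makes up target_na_share of the mix.

    Solves na / (na + global_hours) = target_na_share for na.
    """
    if not 0 <= target_na_share < 1:
        raise ValueError("target share must be in [0, 1)")
    return target_na_share * global_hours / (1 - target_na_share)


sessions = [
    {"region": "global", "hours": 300},
    {"region": "NA", "hours": 60},
    {"region": "NA", "hours": 40},
]
print(regional_split(sessions))
print(na_hours_for_target(300, 0.25))
```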

Does NA-specific consent and licensing differ from global capture?

Consent and licensing requirements track jurisdiction. In the US and Canada, contributor consent follows informed-consent standards with commercial-training-use language and revocation terms; multi-resident households and minors require the additional consent layers documented in the household and egocentric-data-licensing pages. Province- or state-specific privacy law (e.g., California's CCPA, Quebec's Bill 25, Illinois's BIPA, Texas's CUBI) is addressed in the per-session consent artifact when applicable.

How does NA-specific capture interact with US trade compliance (export controls)?

Egocentric video and teleoperation data are not currently subject to ITAR or EAR export controls in the way dual-use technology is, but downstream model training that uses NA-sourced data may itself be export-controlled depending on the buyer's use case (defense, intelligence, certain commercial applications). Truelabel's data delivery doesn't change export-control posture, but the per-session metadata makes the data's NA origin auditable, which is often what buyer compliance teams need to demonstrate.

Can buyers specify single-metro or city-block-level capture?

Yes for single-metro (e.g., 'San Francisco only', 'Toronto only'). City-block level is more constrained — public-space sidewalk capture is freely available, but block-specific commercial-space capture depends on the property-owner location releases truelabel can secure for that block. For deployment-critical block-level capture (e.g., buyer's own retail brand at specific locations), buyer-introduced collectors with the right access can be onboarded through the standard vetting pipeline.

What's the price difference between NA-segmented and global capture?

NA-segmented capture typically runs 15-30% more per hour than global / unrestricted-geography capture because the collector pool is narrower and the consent layer is more jurisdiction-specific. The premium is usually justified by the deployment-realism improvement — buyers who try global-only and then NA-only frequently find the marginal cost worth it for the production performance lift. Buyers running mixed-portfolio strategies pay the premium only on the NA-specific layer, which limits the budget impact.
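
The budget effect of paying the premium only on the NA layer is simple arithmetic. In the sketch below the base hourly rate is a made-up placeholder and `blended_cost` is a hypothetical helper; only the 15-30% premium range comes from the answer above.

```python
def blended_cost(global_hours, na_hours, base_rate, na_premium):
    """Total cost when only the NA layer carries the premium.

    base_rate is a placeholder hourly rate; na_premium is a fraction (0.15 = 15%).
    """
    if min(global_hours, na_hours, base_rate, na_premium) < 0:
        raise ValueError("inputs must be non-negative")
    global_cost = global_hours * base_rate
    na_cost = na_hours * base_rate * (1 + na_premium)
    return global_cost + na_cost


# Compare the low and high ends of the premium range for a 300 + 100 hour mix.
for premium in (0.15, 0.30):
    total = blended_cost(300, 100, base_rate=100.0, na_premium=premium)
    all_global = blended_cost(400, 0, base_rate=100.0, na_premium=premium)
    uplift = (total - all_global) / all_global
    print(f"premium {premium:.0%}: total {total:,.0f}, uplift {uplift:.1%}")
```

With a quarter of the hours NA-sourced, the overall uplift is a quarter of the per-hour premium, which is why a mixed portfolio limits the budget impact.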

Looking for North American egocentric data?

Specify modality, task, environment, rights, and delivery format. Truelabel matches you with vetted capture partners — every delivery includes consent artifacts and commercial licensing by default.