Egocentric Video Data: Capture, License & Deliver for Physical AI
Egocentric video data is first-person video recorded from a head-mounted or wearable camera while a person performs a real-world task. It teaches physical-AI and robotics models hand-object interaction and viewpoint grounding. Truelabel sources it from 20,000+ consented collectors worldwide and delivers it robotics-ready, with full provenance.
Comparison
| | Ego4D | EPIC-KITCHENS | Egocentric-10K | Truelabel custom |
|---|---|---|---|---|
| Commercial use | Research license | CC BY-NC (non-commercial) | Apache-2.0 (permissive) | Rights-cleared per spec |
| Scale | ~3,670 hours | ~100 hours | 10,000 hours | 100,000+ hours delivered |
| Capture network | Fixed academic cohort | 45 participants | Factory-worker cohort | 20,000+ collectors, global |
| Consent + provenance | Dataset-level | Dataset-level | Dataset-level | Per-clip consent + provenance |
| Fit to your embodiment | As-is | Kitchen-only | Factory-only | Scoped to your spec |
| Acceptance after pilot | n/a | n/a | n/a | ~97% |
What is egocentric video data?
Egocentric video data is first-person footage recorded from a head-, chest-, or wrist-mounted camera while a person performs a real task. The viewpoint is what makes it valuable: a robot perceives the world through a camera near its own head or end-effector, so a first-person human demonstration transfers with far less domain shift than third-person (exocentric) footage shot from across the room. That viewpoint match is why egocentric data is the default for manipulation and vision-language-action (VLA) training. The cluster links below cover the definitions, the public datasets, custom capture, and the licensing that makes it usable.
Why does physical AI need egocentric data?
Physical-AI models learn hand-object interaction and viewpoint grounding from the same first-person perspective the robot will act from. The data stack maps cleanly: perception and contact learning draw on hand-object interaction data; world-modeling draws on embodied AI datasets; policy learning draws on egocentric video datasets; and clearance for commercial use depends on consent and provenance. Each of those is a page in the cluster index below.
Public datasets vs custom collection: which do you need?
Use public corpora when you are pretraining or running ablations and the public scenes are close enough to your task. Commission custom capture when your embodiment, task distribution, environment, or commercial-use rights diverge from what the public sets offer. The comparison table on this page sets the major public egocentric datasets against Truelabel custom capture on the dimensions that decide a production program: licensing, scale, capture network, consent, and fit to your embodiment.
How Truelabel captures and delivers egocentric data
Truelabel is a physical-AI data marketplace: capture, enrichment, provenance, and robotics-ready delivery in one pipeline. A global network of more than 20,000 consented collectors captures to your spec; a calibration pilot returns its first batch within about a week, then accepted batches ship on a recurring cadence with first-review acceptance running around 97%. Pricing runs about $5 to $100 per hour of delivered footage depending on the capture environment, geography, and enrichment layers, and every clip carries per-session consent and provenance. Across 100,000+ delivered hours for VLA and world-model teams, the rights and consent questions that actually block deployment get handled before delivery, not after.
Is egocentric data legally usable for commercial robotics?
Most public egocentric corpora are research- or non-commercial-licensed, so commercial use generally requires custom capture with explicit consent and location releases. Ego4D ships under a research license, EPIC-KITCHENS under CC BY-NC, and Egocentric-10K under permissive Apache-2.0. The license, not the content, is usually what decides whether a clip can train a deployed model. The licensing and consent pages in the cluster index below cover how to clear footage for commercial training.
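The licensing split above can be encoded as a small lookup when triaging candidate datasets. The sketch below is illustrative only: the license labels come from this page, while the `commercial_ok` flags and the `commercially_usable` helper are hypothetical names that mirror the page's characterization. The actual license text of each dataset is what governs.

```python
# Illustrative lookup; license labels are as described on this page.
# The commercial_ok flags are an assumption that mirrors that description.
DATASET_LICENSES = {
    "Ego4D": {"license": "Research license", "commercial_ok": False},
    "EPIC-KITCHENS": {"license": "CC BY-NC", "commercial_ok": False},
    "Egocentric-10K": {"license": "Apache-2.0", "commercial_ok": True},
}


def commercially_usable(datasets):
    """Return the names whose license is flagged as permitting commercial use."""
    return [name for name in datasets if DATASET_LICENSES[name]["commercial_ok"]]


print(commercially_usable(DATASET_LICENSES))
```

Keeping the flag separate from the license label makes it easy to update one dataset's status if its terms change or separate permission is obtained.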
Explore the egocentric video data cluster
The links below group the full egocentric library by where you are in the decision: Define (what the data is), Consider (public datasets and comparisons), Decide (custom capture, providers, and how-to guides), and Trust (licensing, consent, and governance).
FAQ
What is egocentric video data?
Egocentric video data is first-person footage recorded from a head-mounted or wearable camera while a person performs a real-world task. Because the camera sits where the actor's eyes are, it captures hand-object interaction and viewpoint the way a robot's own camera would see it, which is why it is used to train physical-AI and robotics models.
Why does physical AI need first-person video data?
A robot perceives the world through a camera near its own head or gripper, so a first-person human demonstration aligns with the robot's deployment viewpoint and transfers with far less domain shift than third-person video. That viewpoint match makes egocentric data the default for manipulation and vision-language-action training.
How is egocentric data different from exocentric (third-person) data?
Egocentric data is shot from the actor's own perspective; exocentric data is shot from an external or overhead camera. Egocentric wins for fine manipulation and contact-rich tasks because it matches the robot's camera frame; exocentric is better for whole-workspace context, multi-actor scenes, and navigation.
Can I use Ego4D or EPIC-KITCHENS egocentric data commercially?
Usually not directly. Ego4D is released under a research license and EPIC-KITCHENS under CC BY-NC (non-commercial), so training a deployed commercial model on them typically requires separate permission. Commercial programs generally use custom capture with consent and commercial licensing, or permissively licensed sets such as Egocentric-10K (Apache-2.0).
How is egocentric video data collected and made consent-cleared?
Collectors wear head-, chest-, or wrist-mounted cameras and perform prompted real-world tasks, with framing and lighting checked live and faces and PII blurred before delivery. Consent-clearance means each clip carries a signed consent artifact and per-session provenance so the footage is licensable for commercial training and auditable for governance.
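One way to picture consent-clearance is as a per-clip record that must carry all three pieces before delivery. This is a minimal sketch under stated assumptions: the `ClipRecord` type, its field names, and the `is_consent_cleared` check are hypothetical and do not describe Truelabel's actual schema.

```python
from dataclasses import dataclass


@dataclass
class ClipRecord:
    # All field names are hypothetical, chosen for illustration.
    clip_id: str
    session_id: str           # per-session provenance key
    consent_artifact_id: str  # reference to the signed consent artifact ("" if missing)
    pii_blurred: bool         # faces and PII blurred before delivery


def is_consent_cleared(clip: ClipRecord) -> bool:
    """A clip counts as cleared only if consent, provenance, and blurring are all present."""
    return bool(clip.consent_artifact_id) and bool(clip.session_id) and clip.pii_blurred


clips = [
    ClipRecord("clip-001", "session-A", "consent-123", True),
    ClipRecord("clip-002", "session-A", "", True),              # missing consent artifact
    ClipRecord("clip-003", "session-B", "consent-456", False),  # not yet blurred
]
cleared = [c.clip_id for c in clips if is_consent_cleared(c)]
print(cleared)
```

Only `clip-001` passes here, because the other two each lack one of the required pieces.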
What does egocentric video data cost?
Truelabel pricing typically ranges from about $5 to $100 per hour of delivered footage, depending on the capture environment, geography, and the enrichment layers your spec requires (hand pose, gaze, depth, action segmentation). Custom collection is priced to the spec rather than sold as a flat-rate commodity.
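As a rough worked example of that range, the sketch below multiplies a target volume by the two quoted bounds. The constants are the approximate per-hour figures stated above; the `budget_range` helper is a hypothetical name, and a real quote depends on the spec.

```python
# Rate bounds are the approximate figures quoted above (USD per delivered hour).
RATE_LOW_USD = 5
RATE_HIGH_USD = 100


def budget_range(hours: int) -> tuple[int, int]:
    """Return (low, high) cost bounds in USD for a given number of delivered hours."""
    if hours < 0:
        raise ValueError("hours must be non-negative")
    return hours * RATE_LOW_USD, hours * RATE_HIGH_USD


low, high = budget_range(1_000)
print(f"1,000 hours: ${low:,} to ${high:,}")  # 1,000 hours: $5,000 to $100,000
```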
How fast can Truelabel deliver a custom egocentric dataset?
A calibration pilot typically returns its first batch of data within about a week of kickoff. Once the pilot is complete, accepted batches ship on a recurring cadence scoped to the spec's complexity and volume.
How do you ensure egocentric data quality?
Every batch is gated against your acceptance rubric before scale-up, and first-review acceptance runs around 97% once a partnership's calibration pilot is complete. Each clip is consented and provenance-tracked, and footage is delivered in robotics-ready formats such as RLDS and LeRobot.
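First-review acceptance can be read as the share of clips in a batch that pass the rubric on their first review. The sketch below assumes each clip has already been marked pass or fail against your rubric; the function name and the sample batch are hypothetical, and the rubric itself is not specified here.

```python
def first_review_acceptance(results: list[bool]) -> float:
    """Fraction of clips accepted on first review; results holds one bool per clip."""
    if not results:
        raise ValueError("batch is empty")
    return sum(results) / len(results)


# Hypothetical batch: 97 clips accepted and 3 rejected on first review.
batch = [True] * 97 + [False] * 3
rate = first_review_acceptance(batch)
print(f"{rate:.0%}")  # 97%
```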
Looking for egocentric video data?
Specify modality, task, environment, rights, and delivery format. Truelabel matches you with vetted capture partners and helps scope consent artifacts and commercial licensing requirements before delivery.