
Egocentric Video Datasets

An egocentric video dataset contains video captured from the viewpoint of a person, wearable device, or robot. Robotics and physical AI teams use these datasets to study hands, objects, tools, spaces, task progress, and interaction context, while separately evaluating consent, provenance, licensing, and task fit.

Updated 2026-05-25
By TrueLabel Sourcing
Reviewed by TrueLabel Sourcing

Comparison

Public example | Useful for | Caveat to verify
Ego4D | Broad egocentric daily-life research | Access terms and task fit
Ego-Exo4D | Paired first- and third-person skilled activities | Annotation and viewpoint fit
EPIC-KITCHENS | Egocentric kitchen activity recognition | Non-commercial licensing caveats

What egocentric datasets typically capture

Egocentric video datasets usually capture a first-person view of task execution. Public references such as Ego4D, Ego-Exo4D, and EPIC-KITCHENS help teams benchmark tasks and terminology, but each source must be evaluated for access terms, scope, modality, and limitations before use [1] [2] [3].

Modalities and tasks to specify

A buyer brief should separate modality from task. Modality describes what is captured; task describes what the model should learn or evaluate.

Dimension | Examples | Why it matters
Modality | RGB, audio, gaze, IMU, depth, pose, narration | Defines capture hardware and annotation needs
Task | Action recognition, anticipation, hand-object interaction, SLAM | Defines labels, clips, and acceptance checks
Domain | Kitchen, workshop, warehouse, home, retail | Defines object set and environment context

Table: Common egocentric dataset planning dimensions
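
One way to keep these dimensions apart in practice is to record them as separate fields in a structured brief. The sketch below is illustrative only: the DatasetBrief class, its field names, and the allowed vocabularies are assumptions drawn from the examples in the planning dimensions above, not a TrueLabel schema or any dataset's official format.

```python
from dataclasses import dataclass

# Illustrative vocabularies based on the planning dimensions; extend as needed.
MODALITIES = {"rgb", "audio", "gaze", "imu", "depth", "pose", "narration"}
TASKS = {"action_recognition", "anticipation", "hand_object_interaction", "slam"}
DOMAINS = {"kitchen", "workshop", "warehouse", "home", "retail"}


@dataclass
class DatasetBrief:
    """Hypothetical buyer brief that keeps modality, task, and domain apart."""

    modalities: set[str]  # what is captured
    tasks: set[str]       # what the model should learn or evaluate
    domain: str           # where capture happens

    def problems(self) -> list[str]:
        """Return readable issues; an empty list means the brief is well formed."""
        issues = []
        if not self.modalities:
            issues.append("no modality specified")
        if not self.tasks:
            issues.append("no task specified")
        # A task name in the modality field is the mix-up the brief should avoid.
        misplaced = self.modalities & TASKS
        for name in sorted(misplaced):
            issues.append(f"task listed as modality: {name}")
        for name in sorted(self.modalities - MODALITIES - misplaced):
            issues.append(f"unknown modality: {name}")
        for name in sorted(self.tasks - TASKS):
            issues.append(f"unknown task: {name}")
        if self.domain not in DOMAINS:
            issues.append(f"unknown domain: {self.domain}")
        return issues


if __name__ == "__main__":
    brief = DatasetBrief({"rgb", "imu"}, {"anticipation"}, "kitchen")
    print(brief.problems())
```

Keeping the three fields separate means a request such as "SLAM video" is flagged as a task that still needs a capture modality, rather than being passed to a capture partner as if it described hardware.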

Public datasets are references, not commercial supply by default

Public datasets are useful for benchmarking and task language. They should not be treated as commercial training supply unless the source terms, consent basis, and downstream license allow that use. EPIC-KITCHENS materials, for example, carry visible non-commercial licensing language in the public annotation license [4].
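
The three conditions above (source terms, consent basis, downstream license) can be tracked as an explicit checklist so that an unreviewed item is never mistaken for an approval. This is a minimal sketch under stated assumptions: RightsRecord, its fields, and commercial_blockers are hypothetical names, and the example values are placeholders, not findings about any real dataset. It is a record-keeping aid, not legal review.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class RightsRecord:
    """Hypothetical record of what a reviewer found for one dataset."""

    name: str
    # None means "not yet reviewed"; it is never treated as permission.
    source_terms_allow_commercial: Optional[bool] = None
    consent_covers_use: Optional[bool] = None
    downstream_license_allows_use: Optional[bool] = None


def commercial_blockers(record: RightsRecord) -> list[str]:
    """List every reason the dataset cannot yet be used as commercial supply."""
    checks = {
        "source terms": record.source_terms_allow_commercial,
        "consent basis": record.consent_covers_use,
        "downstream license": record.downstream_license_allows_use,
    }
    blockers = []
    for label, value in checks.items():
        if value is None:
            blockers.append(f"{label}: not reviewed")
        elif value is False:
            blockers.append(f"{label}: does not allow this use")
    return blockers


if __name__ == "__main__":
    # Placeholder record: terms reviewed and found non-commercial, rest pending.
    record = RightsRecord("example-public-dataset",
                          source_terms_allow_commercial=False)
    for line in commercial_blockers(record):
        print(line)
```

The design choice is that only an explicit True on all three checks clears a dataset; both a missing review and a negative finding remain blockers.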


External references and source context

  1. Egocentric video remains useful but incomplete for robot data buyers

    Ego4D is an official public reference for egocentric video dataset scope, access, and dataset documentation.

    ego4d-data.org
  2. Ego-Exo4D project site

    Ego-Exo4D is the official project source for paired first-person and third-person skilled-activity capture.

    ego-exo4d-data.org
  3. EPIC-KITCHENS project site

    EPIC-KITCHENS is an official project reference for egocentric kitchen-activity data.

    epic-kitchens.github.io
  4. EPIC-KITCHENS-100 annotations license

    The EPIC-KITCHENS-100 annotation license is a visible source for non-commercial licensing caveats.

    GitHub
  5. Ego4D: Around the World in 3,000 Hours of Egocentric Video

    The Ego4D paper is the source-backed reference for first-person daily-life activity video and benchmark design.

    arXiv
  6. Ego-Exo4D annotations documentation

    Ego-Exo4D annotation documentation supports dataset-structure and skilled-activity-label discussion.

    docs.ego-exo4d-data.org
  7. Ego-Exo4D: Understanding Skilled Human Activity from First- and Third-Person Perspectives

    The Ego-Exo4D paper describes skilled human activity from first- and third-person perspectives.

    arXiv
  8. Rescaling Egocentric Vision: Collection, Pipeline and Challenges for EPIC-KITCHENS-100

    The EPIC-KITCHENS-100 paper supports public kitchen-activity benchmark facts and caveats.

    arXiv

FAQ

What is an egocentric video dataset?

It is a dataset of videos recorded from the viewpoint of the actor or agent, often using wearable, head-mounted, handheld, or robot-mounted cameras.

What are egocentric video datasets used for?

They support research and planning for action recognition, anticipation, hand-object interaction, task understanding, and embodied AI evaluation.

What modalities are common in egocentric video data?

Common modalities include RGB video, audio, gaze, IMU, depth, pose, point cloud data, narration, and task annotations.

Can public egocentric datasets be used commercially?

Do not assume so. Review the dataset source, license, consent basis, and downstream model-use terms before using any public dataset in a commercial program.

Looking for egocentric video datasets?

Specify modality, task, environment, rights, and delivery format. Truelabel matches you with vetted capture partners and helps scope consent artifacts and commercial licensing requirements before delivery.
