Last-mile delivery robot data sourcing

Last-mile delivery robot data captures the navigation, manipulation, and handoff behaviors required for autonomous sidewalk delivery and curb-to-customer fulfillment. The task split is roughly 70% navigation (sidewalk traversal, curb negotiation, obstacle avoidance, weather conditions) and 30% manipulation (package retrieval from compartment, customer handoff, door-step placement). Procurement constraints differ from indoor capture: outdoor location releases, weather variation across capture windows, sidewalk-jurisdiction permitting, and pedestrian-presence consent. Truelabel routes last-mile requests to outdoor-cleared collectors operating in US and Canadian metros.

Updated 2026-05-21

By Truelabel Team

Reviewed by Truelabel Team · May 21, 2026

70/30: Navigation vs manipulation task split

All-weather: Capture coverage including rain, snow, low-light

US/CA: Metro coverage with location releases

Quick facts

Request type: OTS or NET_NEW exclusive collection
Task split: 70% navigation, 30% manipulation by default
Environment: US/CA metros, all-weather, day + low-light
Volume: 100-300 hours first-batch, 1,000-3,000 hours scale-up
Rights: Commercial training + sidewalk-permitting on file

Comparison

Source	Strength	Limitation
Indoor robotics datasets	Manipulation task coverage	No outdoor environment, no weather, no curb
Autonomous vehicle (AV) datasets	Outdoor urban coverage	Wrong embodiment (vehicle), no sidewalk-scale capture
Internal pilot fleet	Best deployment-realism if pilot fleet exists	Slow to scale, expensive, limited geographic coverage
truelabel last-mile sourcing	Pre-cleared sidewalk permitting, weather-stratified capture	Outdoor capture has longer per-route ramp than indoor

Why last-mile data is mostly outdoors and mostly missing

Most public robot-learning corpora are indoor and tabletop. Open X-Embodiment aggregates 527 skills across 22 embodiments ^[1], but outdoor sidewalk navigation, curb negotiation, and curb-to-door handoff are essentially absent. Ego4D touches outdoor first-person segments ^[2] but isn't structured for robot-navigation training. The gap matters because last-mile robots actually deploy outside, in weather, on sidewalks subject to municipal permitting, with pedestrian-presence patterns that indoor capture doesn't carry. Models trained primarily on indoor data show the gap at deployment: weather sensitivity, curb-negotiation failures, pedestrian-handling ambiguity.

The buy-side response leans on factory-style data pipelines ^[3] that can deliver outdoor capture under varied conditions. GR00T-style architectures need heterogeneous task data spanning navigation, manipulation, and human-interaction primitives ^[4]; commercial humanoid deployments in adjacent logistics ^[5] establish the precedent.

"LeRobotDataset v3.0 is a standardized format for robot learning data. It provides unified access to multi-modal time-series data, sensorimotor signals and multi‑camera video, as well as rich metadata for indexing, search, and visualization on the Hugging Face Hub."
From the LeRobot dataset documentation, Hugging Face ^[6]

The standardized format is what makes outdoor capture practical to deliver: GPS, IMU, depth, multi-view RGB, weather, and time-of-day all travel inside a single schema buyers can ingest without bespoke ETL. Truelabel collectors operate in US and Canadian metros with pre-cleared sidewalk-jurisdiction permitting and outdoor location releases.

Sidewalk traversal — straight, turns, narrow passages, surface variation
Curb negotiation — drop-off, drop-on, transition timing
Obstacle avoidance — pedestrians, dogs, scooters, parked vehicles, construction
Weather conditions — rain, snow, low-light, glare, puddles
Package manipulation — retrieve from compartment, hand off, place at door

Stratified capture is how diversity targets actually get met

Last-mile diversity is multi-dimensional: weather × time-of-day × sidewalk type × metro × pedestrian density × curb-condition. A naive flat capture hits one diagonal of that space. Stratified capture — explicit episode targets per cell — is what gets coverage everywhere the deployment will operate. Domain-randomization research establishes the principle: training across explicitly varied environment dimensions transfers better to deployment than narrow-distribution training ^[7]. DROID's in-the-wild diversity bar ^[8] is the manipulation-side equivalent of the navigation-side stratification last-mile programs need.

Truelabel's stratification protocol covers weather (clear / overcast / rain / snow / low-light), time-of-day (morning / midday / dusk / night), sidewalk type (residential / commercial / mixed-use / urban dense), metro (top-20 US + top-10 CA), pedestrian density (empty / light / moderate / dense), and curb condition (standard / cut / no-curb / damaged). Buyers specify required coverage per dimension or accept truelabel's deployment-realistic default mix.

Weather: clear / overcast / rain / snow / low-light
Time-of-day: morning / midday / dusk / night
Sidewalk type: residential / commercial / mixed-use / urban dense
Pedestrian density: empty / light / moderate / dense
Curb condition: standard / cut / no-curb / damaged
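
The dimensions listed above can be expanded into an explicit grid of capture cells, which makes coverage gaps easy to spot. The following is a minimal Python sketch under stated assumptions, not Truelabel's tooling: the dimension values are copied from the list, while `build_cells`, `coverage_gaps`, and the `min_per_cell` target are hypothetical names introduced for illustration. Metro is left out because individual metros are not enumerated here.

```python
from itertools import product

# Dimension values copied from the stratification list; metro is omitted.
DIMENSIONS = {
    "weather": ["clear", "overcast", "rain", "snow", "low-light"],
    "time_of_day": ["morning", "midday", "dusk", "night"],
    "sidewalk_type": ["residential", "commercial", "mixed-use", "urban dense"],
    "pedestrian_density": ["empty", "light", "moderate", "dense"],
    "curb_condition": ["standard", "cut", "no-curb", "damaged"],
}


def build_cells(dimensions):
    """Enumerate every combination of dimension values as a dict (one cell each)."""
    names = list(dimensions)
    return [dict(zip(names, combo)) for combo in product(*dimensions.values())]


def coverage_gaps(cells, episodes, min_per_cell):
    """Return the cells with fewer than min_per_cell episodes (hypothetical target)."""
    names = list(DIMENSIONS)
    counts = {}
    for episode in episodes:
        key = tuple(episode[name] for name in names)
        counts[key] = counts.get(key, 0) + 1
    return [
        cell for cell in cells
        if counts.get(tuple(cell[name] for name in names), 0) < min_per_cell
    ]


cells = build_cells(DIMENSIONS)
print(len(cells))  # 5 * 4 * 4 * 4 * 4 = 1280
```

Even without metro, the grid has 1,280 cells, so a flat capture that never checks per-cell counts can leave most of them empty.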

Permitting and consent for outdoor capture

Outdoor capture in US and Canadian metros lives in a different regulatory environment from indoor robot data. Public sidewalks: capture is permitted under most municipal codes, but sustained capture sessions benefit from sidewalk-jurisdiction permitting to avoid friction with local enforcement. Public-space pedestrian capture doesn't require individual consent under existing case law in most US states and Canadian provinces, but truelabel's protocol includes face-blur post-processing for any clearly identifiable individual in frame as a defensible default. Semi-public spaces (commercial property exteriors, residential complexes, transit stations) require property-owner location releases, which truelabel pre-clears before capture starts.

The compliance picture changes per jurisdiction. California (CCPA), Quebec (Law 25, formerly Bill 64), and other privacy-forward jurisdictions have stricter biometric / facial-image rules; truelabel's per-session consent artifact captures the jurisdiction context so buyer compliance teams can verify defensibility at the contribution level.

Public sidewalks — sidewalk-jurisdiction permitting on file
Public-space pedestrians — face-blur post-processing default
Semi-public spaces — property-owner location release required
Privacy-forward jurisdictions (CA, QC) — additional per-session artifacts
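
The rules in this section amount to a small lookup from space type and jurisdiction to the artifacts a session needs. The sketch below is illustrative only and is not legal guidance: the artifact labels, the `required_artifacts` function, and the `PRIVACY_FORWARD` set are hypothetical names, and the mapping encodes only what the list above states (for example, it does not assume face-blur is also applied in semi-public spaces).

```python
# Jurisdictions the text names as privacy-forward; the real set may be larger.
PRIVACY_FORWARD = {"California", "Quebec"}

# Artifact labels are illustrative strings, not an official schema.
BASE_ARTIFACTS = {
    "public_sidewalk": ["sidewalk_jurisdiction_permit", "face_blur_postprocessing"],
    "semi_public": ["property_owner_location_release"],
}


def required_artifacts(space_type, jurisdiction):
    """Return the artifacts a capture session needs under the rules listed above."""
    if space_type not in BASE_ARTIFACTS:
        raise ValueError(f"unknown space type: {space_type!r}")
    artifacts = list(BASE_ARTIFACTS[space_type])  # copy so the table is not mutated
    if jurisdiction in PRIVACY_FORWARD:
        artifacts.append("per_session_jurisdiction_artifact")
    return artifacts


print(required_artifacts("semi_public", "Quebec"))
```

Keeping the rules in a table rather than in branching logic makes it straightforward for a compliance team to review or extend them per jurisdiction.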

How truelabel structures a last-mile request

Last-mile requests scope embodiment (wheeled drone, quadruped, humanoid), task split (navigation-heavy vs manipulation-heavy), and target metros. Truelabel pre-clears sidewalk-jurisdiction permitting and outdoor location releases before collection begins. First-batch evals run across 2-3 representative routes with weather variation, and scale-up follows buyer acceptance. Delivery defaults to LeRobotDataset v3 with outdoor-specific metadata (GPS, weather, time-of-day, pedestrian-presence flag).

Embodiment + task-split scoping at brief stage
Sidewalk jurisdiction + outdoor location releases pre-cleared
First-batch eval across 2-3 routes with weather variation
Outdoor sensor stack metadata (GPS, weather, time-of-day)
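
The outdoor-specific metadata named above (GPS, weather, time-of-day, pedestrian-presence flag) can be pictured as one small record per episode. The following is a hedged sketch: `OutdoorEpisodeMetadata` and its field names are hypothetical and are not the LeRobotDataset v3 schema; the allowed weather and time-of-day values come from the stratification lists earlier on this page, and the coordinates in the example are arbitrary.

```python
from dataclasses import dataclass, asdict

# Allowed values taken from the stratification lists on this page.
WEATHER = {"clear", "overcast", "rain", "snow", "low-light"}
TIME_OF_DAY = {"morning", "midday", "dusk", "night"}


@dataclass(frozen=True)
class OutdoorEpisodeMetadata:
    """Hypothetical per-episode record; not an official dataset schema."""
    latitude: float
    longitude: float
    weather: str
    time_of_day: str
    pedestrian_present: bool

    def __post_init__(self):
        # Reject records that could not be placed in a stratification cell.
        if not -90.0 <= self.latitude <= 90.0:
            raise ValueError("latitude out of range")
        if not -180.0 <= self.longitude <= 180.0:
            raise ValueError("longitude out of range")
        if self.weather not in WEATHER:
            raise ValueError(f"unknown weather: {self.weather!r}")
        if self.time_of_day not in TIME_OF_DAY:
            raise ValueError(f"unknown time of day: {self.time_of_day!r}")


# Arbitrary example values for illustration.
record = OutdoorEpisodeMetadata(43.65, -79.38, "snow", "dusk", True)
print(asdict(record))
```

Validating at construction time means a malformed episode is caught at capture or ingestion rather than during first-batch eval.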

Adjacencies: where last-mile shares procurement with industrial

The last-mile sub-vertical shares two procurement profiles with industrial / commercial first-person capture: outdoor or semi-public consent (both need location releases, neither relies on individual consent), and stratified environmental diversity (both depend on coverage across lighting and weather rather than narrow-distribution scale). Buyers running humanoid programs that span last-mile delivery + warehouse logistics + light-industrial often source from the same collector pool, with the task family and sensor stack specified per-request. Truelabel passes the resulting cross-vertical efficiency through to the per-collector cost basis.

Shared outdoor / semi-public consent profile with industrial capture
Shared stratified-diversity capture protocol
Cross-vertical collector reuse drops per-request cost basis

Use these to move from category-level context into specific task, dataset, format, and comparison detail.

Retail robotics data sourcing
Egocentric Video Data Collection for Robotics and Embodied AI
Household task data for domestic robots
Navigation training data (task-specific requirements)
Best Egocentric Video Data Providers for Robotics and VLA Models (2026)
Best teleoperation data providers 2026
Data provenance for physical AI
What is physical AI training data?

External references and source context

[1] Open X-Embodiment: Robotic Learning Datasets and RT-X Models. arXiv.
Open X-Embodiment aggregates indoor robot demonstrations but contains negligible outdoor sidewalk or curb-handoff data; buyers fill that gap through exclusive capture.
[2] Ego4D: Around the World in 3,000 Hours of Egocentric Video. arXiv.
Ego4D's first-person outdoor segments establish baseline pedestrian + sidewalk capture coverage that last-mile programs benchmark deployment-specific exclusive capture against.
[3] NVIDIA: Physical AI Data Factory Blueprint. investor.nvidia.com.
Last-mile delivery robot programs need factory-style data pipelines spanning navigation, manipulation, and handoff behaviors.
[4] NVIDIA GR00T N1 technical report. arXiv.
GR00T N1 demonstrates that humanoid and mobile-manipulator deployments require heterogeneous data pyramids including navigation, manipulation, and human-interaction primitives.
[5] Figure + Brookfield humanoid pretraining dataset partnership. figure.ai.
Commercial humanoid deployments in logistics establish the adjacent precedent for autonomous package handoff at curb and door.
[6] LeRobot dataset documentation. Hugging Face.
LeRobotDataset v3 schema accommodates outdoor sensor stacks (GPS, IMU, depth, multi-view RGB) with per-episode weather and time-of-day metadata.
[7] Domain Randomization for Transferring Deep Neural Networks from Simulation to the Real World. arXiv.
Domain randomization across simulated environment variants achieves higher real-world transfer than narrow-distribution training, the principle that justifies stratified weather and time-of-day capture in last-mile programs.
[8] DROID: A Large-Scale In-The-Wild Robot Manipulation Dataset. arXiv.
DROID's 76,000 in-the-wild demonstrations across 564 scenes set the environmental-diversity bar; last-mile captures need similar diversity across weather, sidewalk types, and curb conditions.

FAQ

Why is last-mile data so distinct from indoor robot data?

Three forcing functions. Environment: sidewalks, curbs, weather, and outdoor lighting differ from indoor capture in ways policies trained on indoor data don't generalize over. Behavioral context: pedestrian-handling, parked-vehicle proximity, and curb-handoff timing have no indoor equivalent. Procurement: outdoor capture requires sidewalk-jurisdiction permitting and location releases that indoor capture sidesteps entirely.

What weather conditions should a last-mile dataset cover?

At minimum: clear daylight, overcast, rain (light + heavy), and low-light (dusk + night). Deployment-critical datasets also cover snow, glare conditions, puddle navigation, and post-rain surface conditions. Truelabel's capture protocols stratify episodes by weather and time-of-day, so buyers can request balanced distributions or weighted coverage for their target deployment geography.

How do you handle pedestrian and bystander consent in outdoor capture?

Outdoor public-space capture in US and Canadian metros typically does not require individual pedestrian consent under existing case law, but truelabel's protocols include face-blur post-processing for any clearly identifiable individual in frame, and capture sessions avoid private property without explicit permission. Sessions inside semi-public spaces (commercial property, residential complexes) require property-owner location releases, which are pre-cleared before capture begins.

What's the typical scale for a last-mile capture program?

First-batch orders target 100-300 hours of accepted capture across 5-10 metro routes, balanced by weather and time-of-day. Buyers scale up to 1,000-3,000 hours after first-batch acceptance, with route expansion (additional metros, additional embodiment configurations) running in parallel waves. Pure navigation capture accumulates volume faster than manipulation capture; handoff demonstrations require longer per-episode capture and yield lower per-route throughput.
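
As a rough planning aid, the default 70/30 task split can be applied to a first-batch hour budget. This is a minimal sketch that assumes the split is measured in capture hours, which this page does not state explicitly; `split_hours` and `navigation_pct` are hypothetical names.

```python
def split_hours(total_hours, navigation_pct=70):
    """Split an hour budget into navigation and manipulation shares.

    Assumes the task split applies to capture hours (an assumption, not a
    stated rule); navigation_pct is a whole-number percentage.
    """
    if not 0 <= navigation_pct <= 100:
        raise ValueError("navigation_pct must be between 0 and 100")
    navigation = total_hours * navigation_pct / 100
    return {"navigation": navigation, "manipulation": total_hours - navigation}


# First-batch range quoted above: 100 to 300 hours.
for total in (100, 300):
    print(total, split_hours(total))
```

Under that assumption, a first batch of 100 to 300 hours works out to 70 to 210 navigation hours and 30 to 90 manipulation hours.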

What sensor stack does last-mile capture typically require?

Baseline: multi-view RGB (front + side cameras), depth or stereo for sidewalk-distance, IMU for embodiment pose, GPS for route geocoding. For manipulation phases (compartment retrieval, customer handoff): hand-pose tracking + force-torque on the manipulator. Some buyers add LiDAR for harder edge-case obstacles (transparent surfaces, glass doors, dark low-contrast pavement). Outdoor sensor stacks need calibration against weather conditions — rain on a depth-sensor housing degrades signal in ways indoor capture sidesteps.

Can last-mile data integrate with AV (autonomous vehicle) datasets?

Partial fit. AV datasets like Waymo Open Dataset and nuScenes carry the right outdoor environment and weather coverage but the wrong embodiment: AVs move at vehicle speed, occupy a vehicle frame, and have vehicle-scale sensor stacks. Last-mile robots move at pedestrian speed, occupy sidewalk frames, and have sidewalk-scale sensor stacks. AV data is useful for outdoor environment pretraining, not policy training. Most last-mile buyers run a mixed portfolio: AV pretraining baseline + exclusive sidewalk-scale capture for policy fine-tuning.

What's the typical lead time from kickoff to first-batch acceptance?

Outdoor capture has a longer kickoff than indoor: ~2 weeks for sidewalk-jurisdiction permitting + outdoor location releases, then 3-4 weeks for first-batch capture across stratified weather and routes, then 1-2 weeks for eval against the buyer rubric. ~6-8 weeks from kickoff to first-batch acceptance is the standard. Weather seasonality can extend the timeline if the capture spec requires snow or other constrained-window conditions.
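
The 6-8 week figure is the sum of the three phases run back to back. The short sketch below reproduces that arithmetic; the phase durations come from the answer above, while `PHASES_WEEKS` and `total_lead_time` are hypothetical names, and the sketch assumes the phases run sequentially with no overlap or seasonal delay.

```python
# (phase name, minimum weeks, maximum weeks), as quoted in the answer above.
PHASES_WEEKS = [
    ("permitting and location releases", 2, 2),
    ("first-batch capture", 3, 4),
    ("eval against buyer rubric", 1, 2),
]


def total_lead_time(phases):
    """Sum sequential phase durations into a (min_weeks, max_weeks) range."""
    low = sum(min_weeks for _, min_weeks, _ in phases)
    high = sum(max_weeks for _, _, max_weeks in phases)
    return low, high


print(total_lead_time(PHASES_WEEKS))  # (6, 8)
```

A spec that needs snow or another constrained-window condition would add waiting time on top of this range rather than lengthening any single phase.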

Looking for last-mile delivery robot data?

Specify modality, task, environment, rights, and delivery format. Truelabel matches you with vetted capture partners and helps scope consent artifacts and commercial licensing requirements before delivery.
