AI-Powered Ground Truth Generation

Watch our automated pipeline transform first-person videos into structured training data

What We Generate

Structured labels for humanoid training — extracted automatically from video.

HAND DETECTION

Human motion & intent

• 21-keypoint tracking per hand

• Bounding boxes

• MediaPipe integration

• Per-frame output
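As a rough illustration of the per-frame hand output, the sketch below turns one hand's 21 normalized keypoints into a pixel bounding box. It assumes MediaPipe-style landmarks with x and y normalized to [0, 1]. The function name `hand_bbox` and the optional `pad` parameter are hypothetical and not part of the pipeline described here.

```python
from typing import List, Tuple

# Assumption: MediaPipe-style hand landmarks, 21 per hand, with x and y
# normalized to [0, 1] relative to frame width and height.
NUM_HAND_KEYPOINTS = 21


def hand_bbox(keypoints: List[Tuple[float, float]], width: int, height: int,
              pad: float = 0.0) -> Tuple[int, int, int, int]:
    """Hypothetical helper: pixel box (x_min, y_min, x_max, y_max) for one hand."""
    if len(keypoints) != NUM_HAND_KEYPOINTS:
        raise ValueError(f"expected {NUM_HAND_KEYPOINTS} keypoints, got {len(keypoints)}")
    xs = [x * width for x, _ in keypoints]
    ys = [y * height for _, y in keypoints]
    # Optional padding as a fraction of the box size, so the box covers the skin
    # around the outermost joints rather than only the joints themselves.
    pad_x = (max(xs) - min(xs)) * pad
    pad_y = (max(ys) - min(ys)) * pad
    # Clamp to the frame so boxes near the image edge stay valid.
    x_min = max(0, int(min(xs) - pad_x))
    y_min = max(0, int(min(ys) - pad_y))
    x_max = min(width - 1, int(max(xs) + pad_x))
    y_max = min(height - 1, int(max(ys) + pad_y))
    return x_min, y_min, x_max, y_max
```

Applied to every frame, this yields a box per visible hand alongside the raw keypoints.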

OBJECT DETECTION

Open-vocabulary detection

• Custom object lists

• Confidence scores

• Bounding boxes

• Trajectory tracking
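Open-vocabulary detectors are prompted with free-text class names, so a custom object list can serve as the query. A minimal post-processing sketch, assuming the detector has already returned labeled boxes with confidence scores: keep only labels from the list that clear a threshold. The `Detection` type, `filter_detections`, and the 0.3 default are illustrative assumptions.

```python
from dataclasses import dataclass
from typing import Iterable, List, Tuple


@dataclass
class Detection:
    label: str                              # free-text class name returned by the detector
    score: float                            # confidence in [0, 1]
    box: Tuple[float, float, float, float]  # (x_min, y_min, x_max, y_max) in pixels


def filter_detections(dets: Iterable[Detection], object_list: Iterable[str],
                      min_score: float = 0.3) -> List[Detection]:
    """Keep detections whose label is on the custom list and whose score clears
    the threshold, highest confidence first. Matching is case-insensitive."""
    wanted = {name.strip().lower() for name in object_list}
    kept = [d for d in dets
            if d.label.strip().lower() in wanted and d.score >= min_score]
    return sorted(kept, key=lambda d: d.score, reverse=True)
```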

DEPTH ESTIMATION

Spatial understanding

• Relative depth mapping

• Near/far reasoning

• 3D spatial context

• Distance estimation
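Relative depth maps have no absolute scale, so near/far reasoning is usually done by comparing values between regions. The sketch below ranks objects by the median depth value inside each bounding box; the median resists background pixels that leak into the box. It assumes a relative inverse-depth map (larger values mean nearer), as produced by MiDaS-style monocular models; pass `larger_is_nearer=False` if the map stores depth instead. The function names are hypothetical.

```python
from typing import Dict, List, Tuple

import numpy as np

Box = Tuple[int, int, int, int]  # (x_min, y_min, x_max, y_max), inclusive pixel coords


def object_depth(depth_map: np.ndarray, box: Box) -> float:
    """Median relative depth inside a box (robust to stray background pixels)."""
    x0, y0, x1, y1 = box
    region = depth_map[y0:y1 + 1, x0:x1 + 1]
    return float(np.median(region))


def rank_near_to_far(depth_map: np.ndarray, boxes: Dict[str, Box],
                     larger_is_nearer: bool = True) -> List[str]:
    """Order object names from nearest to farthest.

    Assumption: the map holds relative inverse depth, so larger means nearer.
    Pass larger_is_nearer=False for a map that stores depth instead.
    """
    scores = {name: object_depth(depth_map, box) for name, box in boxes.items()}
    return sorted(scores, key=scores.get, reverse=larger_is_nearer)
```

Turning these relative values into distances in meters needs extra information, such as a known object size or a metric depth model.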

OBJECT TRACKING

Persistent object identity

• IoU-based matching

• Unique object IDs

• Handles occlusion

• Track persistence
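IoU-based tracking can be sketched as a greedy matcher: each new box is assigned to the existing track it overlaps most, provided the overlap clears a threshold, and unmatched boxes start new IDs. Keeping unmatched tracks alive for a few frames (`max_missed`) is one simple way to bridge short occlusions. The class name and default thresholds are assumptions, not the pipeline's actual parameters.

```python
from typing import Dict, List, Tuple

Box = Tuple[float, float, float, float]  # (x_min, y_min, x_max, y_max)


def iou(a: Box, b: Box) -> float:
    """Intersection over union of two axis-aligned boxes."""
    ix0, iy0 = max(a[0], b[0]), max(a[1], b[1])
    ix1, iy1 = min(a[2], b[2]), min(a[3], b[3])
    inter = max(0.0, ix1 - ix0) * max(0.0, iy1 - iy0)
    union = (a[2] - a[0]) * (a[3] - a[1]) + (b[2] - b[0]) * (b[3] - b[1]) - inter
    return inter / union if union > 0 else 0.0


class IoUTracker:
    """Greedy IoU matcher; a track survives up to max_missed unseen frames."""

    def __init__(self, iou_threshold: float = 0.3, max_missed: int = 15):
        self.iou_threshold = iou_threshold
        self.max_missed = max_missed
        self.tracks: Dict[int, dict] = {}  # track id -> {"box": Box, "missed": int}
        self._next_id = 0

    def update(self, boxes: List[Box]) -> List[int]:
        """Return a track ID for each box in this frame, in input order."""
        # All (IoU, track, box) candidates, best overlap first.
        pairs = sorted(((iou(t["box"], b), tid, j)
                        for tid, t in self.tracks.items()
                        for j, b in enumerate(boxes)), reverse=True)
        ids = [-1] * len(boxes)
        used = set()
        for score, tid, j in pairs:
            if score < self.iou_threshold:
                break
            if tid in used or ids[j] != -1:
                continue
            ids[j] = tid
            used.add(tid)
        # Unmatched boxes get fresh IDs; every box refreshes its track.
        for j, box in enumerate(boxes):
            if ids[j] == -1:
                ids[j] = self._next_id
                self._next_id += 1
            self.tracks[ids[j]] = {"box": box, "missed": 0}
        # Tracks not seen this frame age and are dropped after max_missed frames.
        for tid in list(self.tracks):
            if tid not in ids:
                self.tracks[tid]["missed"] += 1
                if self.tracks[tid]["missed"] > self.max_missed:
                    del self.tracks[tid]
        return ids
```

Because a missing track keeps its last box, this only bridges occlusions during which the object barely moves; adding a motion model such as a Kalman filter is the usual next step.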

INTERACTION STATUS

Object interaction classification

● ACTIVE

Being held/manipulated

● REACHABLE

Within arm's reach

● DETECTED

Visible, not in use
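One plausible way to assign these three labels from 2D boxes alone: ACTIVE when a hand box overlaps the object box, REACHABLE when a hand is within some distance of it, DETECTED otherwise. The pixel threshold `reach_px` is a hypothetical stand-in for arm's reach. In practice it would depend on camera placement and would likely use depth as well, since 2D overlap alone does not prove contact.

```python
import math
from enum import Enum
from typing import List, Tuple

Box = Tuple[float, float, float, float]  # (x_min, y_min, x_max, y_max) in pixels


class Status(Enum):
    ACTIVE = "active"        # being held/manipulated
    REACHABLE = "reachable"  # within arm's reach
    DETECTED = "detected"    # visible, not in use


def _overlaps(a: Box, b: Box) -> bool:
    return a[0] < b[2] and b[0] < a[2] and a[1] < b[3] and b[1] < a[3]


def _center(b: Box) -> Tuple[float, float]:
    return ((b[0] + b[2]) / 2, (b[1] + b[3]) / 2)


def interaction_status(obj: Box, hands: List[Box], reach_px: float = 150.0) -> Status:
    """Hypothetical 2D rule: overlap -> ACTIVE, nearby -> REACHABLE, else DETECTED."""
    if any(_overlaps(obj, h) for h in hands):
        return Status.ACTIVE
    if any(math.dist(_center(obj), _center(h)) <= reach_px for h in hands):
        return Status.REACHABLE
    return Status.DETECTED
```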

ACTION ANALYSIS

Temporal action segmentation

• reaching

• grasping

• manipulating

• releasing

Motion phases:

• motion vs pause

• start/end times
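Motion-vs-pause segmentation can be sketched by thresholding per-frame wrist speed and merging consecutive frames into phases with start and end times. The wrist track, `speed_thresh` (pixels per frame), and the `min_frames` jitter filter are illustrative assumptions; the sketch also assumes the hand is visible in every frame.

```python
import math
from typing import Dict, List, Tuple


def motion_phases(wrist: List[Tuple[float, float]], fps: float,
                  speed_thresh: float = 2.0, min_frames: int = 3) -> List[Dict]:
    """Split a per-frame wrist track into 'motion' and 'pause' phases (times in seconds)."""
    if not wrist:
        return []
    # Per-frame speed in pixels/frame; frame 0 has no predecessor, so its speed is 0.
    speeds = [0.0] + [math.dist(wrist[i - 1], wrist[i]) for i in range(1, len(wrist))]
    labels = ["motion" if s >= speed_thresh else "pause" for s in speeds]
    # Collapse consecutive identical labels into runs: [label, start frame, end frame (exclusive)].
    runs, start = [], 0
    for i in range(1, len(labels) + 1):
        if i == len(labels) or labels[i] != labels[start]:
            runs.append([labels[start], start, i])
            start = i
    # Absorb runs shorter than min_frames into the previous run to suppress jitter,
    # and fuse neighbours that end up with the same label.
    merged: List[list] = []
    for run in runs:
        if merged and (run[2] - run[1] < min_frames or merged[-1][0] == run[0]):
            merged[-1][2] = run[2]
        else:
            merged.append(run)
    return [{"phase": lab, "start_s": s / fps, "end_s": e / fps} for lab, s, e in merged]
```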

STATE CHANGES

Track object states over time

• open ↔ closed (doors, containers)

• locked ↔ unlocked

Transition timestamps recorded
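Recording transition timestamps amounts to scanning per-frame state labels for changes. A minimal sketch, assuming a classifier already emits one state per frame, or `None` when the object is not visible (in which case the last known state is carried forward). The function name and event fields are hypothetical.

```python
from typing import Dict, List, Optional


def state_transitions(states: List[Optional[str]], fps: float) -> List[Dict]:
    """Emit one event per state change, e.g. 'closed' -> 'open', with its timestamp."""
    events = []
    prev = None
    for frame, state in enumerate(states):
        if state is None:  # object not visible: keep the last known state
            continue
        if prev is not None and state != prev:
            events.append({"from": prev, "to": state, "frame": frame, "time_s": frame / fps})
        prev = state
    return events
```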

AFFORDANCES

What can be done with each object

• graspable

• traversable

• openable

• turnable

• liftable

• pushable

• pourable

• insertable

• activatable

• ...
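Because one object usually has several affordances (a bottle can be grasped, lifted, opened, and poured), a set-valued label fits better than a single class. The sketch below uses Python's `enum.Flag` for that. The category-to-affordance table and helper names are hypothetical examples, not the pipeline's vocabulary.

```python
from enum import Flag, auto
from typing import Dict, List


class Affordance(Flag):
    GRASPABLE = auto()
    TRAVERSABLE = auto()
    OPENABLE = auto()
    TURNABLE = auto()
    LIFTABLE = auto()
    PUSHABLE = auto()
    POURABLE = auto()
    INSERTABLE = auto()
    ACTIVATABLE = auto()


A = Affordance
# Hypothetical category -> affordance table, for illustration only.
AFFORDANCES: Dict[str, Affordance] = {
    "mug": A.GRASPABLE | A.LIFTABLE | A.POURABLE | A.PUSHABLE,
    "bottle": A.GRASPABLE | A.LIFTABLE | A.OPENABLE | A.POURABLE,
    "door": A.OPENABLE | A.PUSHABLE | A.TRAVERSABLE,
    "key": A.GRASPABLE | A.INSERTABLE | A.TURNABLE,
    "stove knob": A.TURNABLE | A.ACTIVATABLE,
}


def affordances_of(label: str) -> List[str]:
    """Sorted affordance names for a category; empty if the category is unknown."""
    flags = AFFORDANCES.get(label, Affordance(0))
    return sorted(a.name.lower() for a in Affordance if a in flags)


def has(label: str, affordance: Affordance) -> bool:
    """True if the category supports the given affordance."""
    return affordance in AFFORDANCES.get(label, Affordance(0))
```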

Video Samples

Sample video of a chef cooking, recorded on a chest-mounted smartphone

Technical Details

Video Properties

Filename: sardines_short.mp4

Duration: 11.03 seconds

Resolution: 636x640

Frame Rate: 30.0 fps

Device: Smartphone (chest-mounted)

Annotation Metrics

Objects Detected: 13 unique objects

Objects Interacted With: 7 objects

Total Interactions: 104 events

Action Phases: 11 distinct phases

Hand Visibility: 86.8% of frames
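Two of these figures rest on simple arithmetic. The nominal frame count is duration × frame rate (11.03 s × 30 fps ≈ 331 frames), and hand visibility can be read as the share of frames with at least one detected hand. A sketch, assuming a per-frame boolean flag for hand presence; the function names are hypothetical.

```python
from typing import Sequence


def nominal_frame_count(duration_s: float, fps: float) -> int:
    """Frames implied by duration and frame rate; a file's actual count may differ slightly."""
    return round(duration_s * fps)


def hand_visibility_pct(hand_present: Sequence[bool]) -> float:
    """Percentage of frames flagged as containing at least one hand."""
    if not hand_present:
        return 0.0
    return 100.0 * sum(hand_present) / len(hand_present)
```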

Ground Truth & Downloads

Ground Truth JSON

Download Ground Truth JSON

More Examples

Coffee Mug Manipulation - RayBan Meta Recording

Technical Details

Video Properties

Action: Coffee mug interaction

Device: RayBan Meta Smart Glasses

Perspective: Natural eye-level POV

Environment: Kitchen/dining area

Duration: ~15 seconds

Annotation Focus

Primary Object: Coffee mug

Interaction Type: Grasping, lifting

Hand Tracking: Both hands visible

Complexity: Simple manipulation

Ground Truth & Downloads

Ground Truth JSON

Download Ground Truth JSON

Pouring Water - RayBan Meta Recording

Technical Details

Video Properties

Action: Liquid pouring task

Device: RayBan Meta Smart Glasses

Perspective: Natural eye-level POV

Environment: Kitchen workspace

Duration: ~20 seconds

Annotation Focus

Primary Objects: Bottle, container

Interaction Type: Pouring, liquid flow

Hand Tracking: Precise grip tracking

Complexity: Coordinated bimanual manipulation

Ground Truth & Downloads

Ground Truth JSON

Download Ground Truth JSON

Cross-Device Data Collection

📱

Smartphone

Chest-mounted POV recording

Example: Sardines manipulation

🕶️

RayBan Meta

Natural eye-level perspective

Examples: Coffee mug, Water pouring

📹

Action Cameras

Head/body-mounted capture

GoPro, Insta360 support

Get Custom Dataset

Get tailored datasets for your specific robotics training needs across any device or environment

Request Dataset →