AI-Powered Ground Truth Generation

Watch our automated pipeline transform first-person videos into structured training data.

What We Generate

Structured labels for humanoid training — extracted automatically from video.

HAND DETECTION

Human motion & intent

• 21 keypoint tracking
• Bounding boxes
• MediaPipe integration
• Per-frame output
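As a minimal sketch of how a hand bounding box could be derived from the 21 keypoints, the snippet below takes normalized (x, y) coordinates in [0, 1], the form in which MediaPipe reports hand landmarks, and converts their extent to pixels. The function name `hand_bbox` and the `pad` parameter are illustrative assumptions, not part of the pipeline described here.

```python
from typing import List, Tuple

Point = Tuple[float, float]  # normalized (x, y), each in [0, 1]


def hand_bbox(keypoints: List[Point], width: int, height: int,
              pad: float = 0.0) -> Tuple[int, int, int, int]:
    """Return (x_min, y_min, x_max, y_max) in pixels for one hand.

    `pad` is a hypothetical margin, in normalized units, added on every side.
    """
    if len(keypoints) != 21:
        raise ValueError("expected 21 keypoints per hand")
    xs = [x for x, _ in keypoints]
    ys = [y for _, y in keypoints]
    # Clamp to the image so padding never pushes the box outside the frame.
    x0 = max(0.0, min(xs) - pad)
    y0 = max(0.0, min(ys) - pad)
    x1 = min(1.0, max(xs) + pad)
    y1 = min(1.0, max(ys) + pad)
    return (round(x0 * width), round(y0 * height),
            round(x1 * width), round(y1 * height))
```

Calling this once per detected hand on every frame would give the per-frame output listed above.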
OBJECT DETECTION

Open-vocabulary detection

• Custom object lists
• Confidence scores
• Bounding boxes
• Trajectory tracking
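One way a custom object list and confidence scores could be combined is a simple post-filter over raw detections. This is a sketch under stated assumptions: the `Detection` record, the `filter_detections` helper, and the 0.3 default threshold are hypothetical and do not describe the detector's actual output format.

```python
from dataclasses import dataclass
from typing import Iterable, List, Tuple


@dataclass
class Detection:
    label: str                               # class name from the detector
    confidence: float                        # score in [0, 1]
    box: Tuple[float, float, float, float]   # (x_min, y_min, x_max, y_max)


def filter_detections(detections: Iterable[Detection],
                      object_list: Iterable[str],
                      min_confidence: float = 0.3) -> List[Detection]:
    """Keep detections whose label is in the custom list and whose
    confidence reaches the (assumed) threshold. Matching ignores case."""
    wanted = {name.lower() for name in object_list}
    return [d for d in detections
            if d.label.lower() in wanted and d.confidence >= min_confidence]
```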
DEPTH ESTIMATION

Spatial understanding

• Relative depth mapping
• Near/far reasoning
• 3D spatial context
• Distance estimation
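Near/far reasoning over a relative depth map can be illustrated by ranking objects by the median depth value inside each bounding box. The sketch below assumes the depth map is a 2D list indexed as `depth[y][x]` and that larger values mean nearer; some depth models use the opposite convention, so the `larger_is_nearer` flag is exposed. All names are illustrative.

```python
from statistics import median
from typing import Dict, List, Sequence, Tuple

PixelBox = Tuple[int, int, int, int]  # (x_min, y_min, x_max, y_max), max exclusive


def median_depth(depth: Sequence[Sequence[float]], box: PixelBox) -> float:
    """Median relative depth inside a non-empty pixel box."""
    x0, y0, x1, y1 = box
    return median(depth[y][x] for y in range(y0, y1) for x in range(x0, x1))


def order_near_to_far(depth: Sequence[Sequence[float]],
                      boxes: Dict[str, PixelBox],
                      larger_is_nearer: bool = True) -> List[str]:
    """Return object names sorted from nearest to farthest."""
    scores = {name: median_depth(depth, box) for name, box in boxes.items()}
    return sorted(scores, key=scores.get, reverse=larger_is_nearer)
```

The median is used instead of the mean so that background pixels inside the box have less influence on the object's depth score.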
OBJECT TRACKING

Persistent object identity

• IoU-based matching
• Unique object IDs
• Handles occlusion
• Track persistence
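The items above can be sketched as a small greedy tracker: each new box is matched to the existing track with the highest IoU, unmatched boxes receive a new ID, and a track survives a limited number of missed frames so that a briefly occluded object keeps its identity. The class name, the 0.3 IoU threshold, and the `max_missed` value are assumptions for illustration, not the pipeline's actual settings.

```python
from typing import Dict, List, Tuple

Box = Tuple[float, float, float, float]  # (x_min, y_min, x_max, y_max)


def iou(a: Box, b: Box) -> float:
    """Intersection over union of two axis-aligned boxes."""
    inter_w = max(0.0, min(a[2], b[2]) - max(a[0], b[0]))
    inter_h = max(0.0, min(a[3], b[3]) - max(a[1], b[1]))
    inter = inter_w * inter_h
    union = (a[2] - a[0]) * (a[3] - a[1]) + (b[2] - b[0]) * (b[3] - b[1]) - inter
    return inter / union if union > 0 else 0.0


class IouTracker:
    def __init__(self, iou_threshold: float = 0.3, max_missed: int = 15):
        self.iou_threshold = iou_threshold
        self.max_missed = max_missed        # frames a track may go unseen
        self.tracks: Dict[int, Box] = {}    # track ID -> last known box
        self.missed: Dict[int, int] = {}    # track ID -> consecutive misses
        self.next_id = 0

    def update(self, boxes: List[Box]) -> List[int]:
        """Return one track ID per input box, in input order."""
        # Score every (track, box) pair and take the best pairs first.
        pairs = sorted(((iou(tb, b), tid, i)
                        for tid, tb in self.tracks.items()
                        for i, b in enumerate(boxes)), reverse=True)
        assigned: Dict[int, int] = {}       # box index -> track ID
        used = set()
        for score, tid, i in pairs:
            if score < self.iou_threshold:
                break
            if tid not in used and i not in assigned:
                assigned[i] = tid
                used.add(tid)
        # Age unmatched tracks; drop them once they exceed max_missed.
        for tid in list(self.tracks):
            if tid in used:
                self.missed[tid] = 0
            else:
                self.missed[tid] += 1
                if self.missed[tid] > self.max_missed:
                    del self.tracks[tid], self.missed[tid]
        # Start new tracks for unmatched boxes and store the latest boxes.
        for i, box in enumerate(boxes):
            if i not in assigned:
                assigned[i] = self.next_id
                self.missed[self.next_id] = 0
                self.next_id += 1
            self.tracks[assigned[i]] = box
        return [assigned[i] for i in range(len(boxes))]
```

Greedy matching is the simplest choice; an optimal assignment (for example the Hungarian algorithm) is a common alternative when many objects overlap.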
INTERACTION STATUS

Object interaction classification

● ACTIVE
Being held/manipulated
● REACHABLE
Within arm's reach
● DETECTED
Visible, not in use
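A rule-based sketch of the three statuses follows. It treats overlap between an object's box and any hand box as a proxy for "held/manipulated" and compares an estimated distance against an arm's-reach threshold. Both rules, the 0.7 m default, and all names are assumptions made for illustration; the actual classification criteria are not specified here.

```python
from enum import Enum
from typing import List, Optional, Tuple

Box = Tuple[float, float, float, float]  # (x_min, y_min, x_max, y_max)


class Status(Enum):
    ACTIVE = "ACTIVE"        # being held/manipulated
    REACHABLE = "REACHABLE"  # within arm's reach
    DETECTED = "DETECTED"    # visible, not in use


def boxes_overlap(a: Box, b: Box) -> bool:
    """True if the two boxes share any area."""
    return a[0] < b[2] and b[0] < a[2] and a[1] < b[3] and b[1] < a[3]


def classify(object_box: Box, hand_boxes: List[Box],
             distance_m: Optional[float], reach_m: float = 0.7) -> Status:
    """Assign a status; `reach_m` is a hypothetical arm's-reach threshold."""
    if any(boxes_overlap(object_box, hand) for hand in hand_boxes):
        return Status.ACTIVE
    if distance_m is not None and distance_m <= reach_m:
        return Status.REACHABLE
    return Status.DETECTED
```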
ACTION ANALYSIS

Temporal action segmentation

• reaching
• grasping
• manipulating
• releasing
Motion phases:
• motion vs pause
• start/end times
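The motion-versus-pause split with start and end times can be sketched as thresholding a per-frame speed signal (for example, hand displacement between consecutive frames) and merging consecutive frames with the same label. The function name, the speed signal, and the threshold are illustrative assumptions.

```python
from typing import List, Sequence, Tuple

Segment = Tuple[str, float, float]  # (label, start_seconds, end_seconds)


def segment_motion(speeds: Sequence[float], fps: float,
                   threshold: float) -> List[Segment]:
    """Group frames into 'motion' and 'pause' segments.

    A frame counts as motion when its speed exceeds `threshold`.
    Each segment ends at the start time of the next one.
    """
    segments: List[Segment] = []
    start = 0
    for i in range(1, len(speeds) + 1):
        at_end = i == len(speeds)
        if at_end or (speeds[i] > threshold) != (speeds[start] > threshold):
            label = "motion" if speeds[start] > threshold else "pause"
            segments.append((label, start / fps, i / fps))
            start = i
    return segments
```

Assigning labels such as reaching or grasping to the resulting motion segments would need additional cues (hand-object distance, contact), which this sketch does not attempt.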
STATE CHANGES

Track object states over time

• open ↔ closed (doors, containers)
• locked ↔ unlocked
Transition timestamps recorded
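Recording transition timestamps can be sketched as scanning a per-frame sequence of state labels for one object and emitting an entry whenever the label changes. The input format (a list with `None` for frames where the state could not be observed) and the function name are assumptions for illustration.

```python
from typing import List, Optional, Sequence, Tuple

Transition = Tuple[float, str, str]  # (time_seconds, from_state, to_state)


def find_transitions(states: Sequence[Optional[str]],
                     fps: float) -> List[Transition]:
    """Return one entry per state change, timestamped at the first
    frame that shows the new state."""
    transitions: List[Transition] = []
    previous: Optional[str] = None
    for frame, state in enumerate(states):
        if state is None:
            continue  # state not observed in this frame; keep the last one
        if previous is not None and state != previous:
            transitions.append((frame / fps, previous, state))
        previous = state
    return transitions
```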
AFFORDANCES

What can be done with each object

• graspable
• traversable
• openable
• turnable
• liftable
• pushable
• pourable
• insertable
• activatable
• ...

Video Samples

Sample video of a chef cooking using RayBan Meta glasses
Technical Details

Video Properties

Filename: sardines_short.mp4
Duration: 11.03 seconds
Resolution: 636x640
Frame Rate: 30.0 fps
Device: Smartphone (chest-mounted)

Annotation Metrics

Objects Detected: 13 unique objects
Objects Interacted: 7 objects
Total Interactions: 104 events
Action Phases: 11 distinct phases
Hand Visibility: 86.8% of frames
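
A hand-visibility figure of this kind could be computed as the share of frames in which at least one hand was detected. This is a minimal sketch; the per-frame hand counts and the function name are assumed inputs, not the pipeline's actual data structures.

```python
from typing import Sequence


def hand_visibility_percent(hands_per_frame: Sequence[int]) -> float:
    """Percentage of frames with at least one detected hand."""
    if not hands_per_frame:
        return 0.0
    visible = sum(1 for count in hands_per_frame if count > 0)
    return 100.0 * visible / len(hands_per_frame)
```
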
Ground Truth & Downloads
Ground Truth JSON

More Examples

Coffee Mug Manipulation - RayBan Meta Recording
Technical Details

Video Properties

Action: Coffee mug interaction
Device: RayBan Meta Smart Glasses
Perspective: Natural eye-level POV
Environment: Kitchen/Dining area
Duration: ~15 seconds

Annotation Focus

Primary Object: Coffee mug
Interaction Type: Grasping, lifting
Hand Tracking: Both hands visible
Complexity: Simple manipulation
Ground Truth & Downloads
Ground Truth JSON
Pouring Water - RayBan Meta Recording
Technical Details

Video Properties

Action: Liquid pouring task
Device: RayBan Meta Smart Glasses
Perspective: Natural eye-level POV
Environment: Kitchen workspace
Duration: ~20 seconds

Annotation Focus

Primary Objects: Bottle, container
Interaction Type: Pouring, fluid dynamics
Hand Tracking: Precise grip tracking
Complexity: Coordinated bi-manual
Ground Truth & Downloads
Ground Truth JSON
Cross-Device Data Collection
📱

Smartphone

Chest-mounted POV recording

Example: Sardines manipulation
🕶️

RayBan Meta

Natural eye-level perspective

Examples: Coffee mug, Water pouring
📹

Action Cameras

Head/body-mounted capture

GoPro, Insta360 support

Get Custom Dataset

Get tailored datasets for your specific robotics training needs, across any device or environment.

Request Dataset →