Beruflich Dokumente
Kultur Dokumente
Toby Sharp Mark Finocchi Richard Moore Alex Kipman Andrew Blake Microsoft Research Cambridge & Xbox Incubation CVPR 2011 Best Paper
2/26/13
2/26/13
OUTLINE
Introduction Data Body Part Inference and Joint Proposals Experiments Discussion
2/26/13
Introduction
gaming, human-computer interaction, security, telepresence, health-care tracking from frame to frame but struggle to re-initialize quickly and so are not robust
Introduction
Tracking people with twists and exponential maps (CVPR 1998) Tracking loose limbed people (CVPR 2004) Nonlinear body pose estimation from depth images (DAGM
2005)
Real-time hand-tracking with a color glove (ACM 2009) Real time motion capture using a single time-of-flight camera (CVPR 2010)
2/26/13
Introduction
inspired by recent object recognition work that divides objects into parts
Object class recognition by unsupervised scale-invariant learning [CVPR 2003] The layout consistent random field for recognizing and segmenting partially occluded objects [CVPR 2006]
2/26/13
Introduction
Depth Image
segme nt
dense probabilistic body part labeling + spatially localized near skeletal joints
genera te
3D proposal
2/26/13
Introduction
Evaluating each pixel separately generate realistic synthetic depth images train a deep randomized decision forest classifier
Training data
2/26/13
avoid overfitting
Introduction
Overfitting
2/26/13
Introduction
For further speed, the classifier can be run in parallel on each pixel on a GPU mean shift resulting in the 3D joint proposals
2/26/13
Dat a
2/26/13
PDF Analysis
Intuitive Description
Region of interest Center of mass
2/26/13
Intuitive Description
Region of interest Center of mass
2/26/13
Intuitive Description
Region of interest Center of mass
2/26/13
Intuitive Description
Region of interest Center of mass
2/26/13
Intuitive Description
Region of interest Center of mass
2/26/13
Intuitive Description
Region of interest Center of mass
2/26/13
Intuitive Description
Region of interest Center of mass
2/26/13
Main contribution
using a novel intermediate body parts representation spatially localize joints low computational cost and high accuracy
2/26/13
Experiments
(i) synthetic depth training data is an excellent proxy for real data
(ii) scaling up the learning problem with varied synthetic data is important for high accuracy (iii) our parts-based approach generalizes better than even an oracular exact nearest neighbor
2/26/13
Data
2/26/13
Depth image
Retargetted to a variety of base character models to synthesize a large, varied dataset 640x480 image at 30 frames per second
2/26/13 resolving
using furthest neighbor clustering algorithm where the distance between poses body joints , Pi mean i pose
2/26/13mean j
sampling from our model training the classifier testing joint prediction accuracy
2/26/13
Goals
2/26/13
render depth and body part images from texture mapped 3D meshes slight random variation in height weight give extra coverage of body
and 2/26/13
2/26/13
Body part labeling Depth image features Randomized decision forests Joint position proposals
2/26/13
as color-coded Some directly localize particular skeletal joints others fill the gaps
transforms the problem into one that can readily be solved by efficient classification algorithms
2/26/13
2/26/13
31 body parts:
LU/RU/LW/RW head, neck, L/R shoulder, LU/RU/LW/RW arm, L/R elbow, L/R wrist, L/R hand, LU/RU/LW/RW torso, LU/RU/LW/RW leg, L/R knee, L/R ankle, L/R foot (Left, Right, Upper, loWer)
2/26/13
di (x) is the depth at pixel x in image I = (u, v) describe offsets u and v 1/di (x) ensures the features are depth invariant
2/26/13
2/26/13
The design of these features was strongly motivated by their computational efficiency
no preprocessing is needed read at most 3 image pixels at most 5 arithmetic operations straightforwardly implemented on the GPU
2/26/13
2/26/13
2/26/13
2/26/13
the final output of our algorithm used by a tracking algorithm to self initialize and recover from failure
2/26/13
A local mode-finding approach based on mean shift with a weighted Gaussian kernel
2/26/13
PDF(x) =
c e
Wic considers both the inferred body part probability at the pixel and the world surface area of the pixel
2/26/13
lie on the surface of the body pushed back into the scene by a learned z offset produce a final joint position proposal
Bandwidth Bc = 0.065m Threshold c = 0.14 Z offset = 0.039m Set = 5000 images by grid search
2/26/13
2/26/13
Experiments
3 trees, 20 deep, 300k training images per tree 2000 training example pixels per image 2000 candidate features 50 candidate thresholds per feature
2/26/13
Experiments
Test data
challenging synthetic and real depth images to evaluate our approach synthesize 5000 depth images 8808 frames of real depth images 15 different subjects 7 upper body joint positions
2/26/13
Experiments
Error metric:
average of the diagonal of the confusion matrix between the ground truth part label
generate recall-precision curvesas a function of confidence threshold quantify accuracy as average precision per
2/26/13
Experiments
Error metric:
This penalizes multiple spurious detections Near the correct position which might slow a downstream tracking algorithm
2/26/13
Experiments
2/26/13
Experiments
2/26/13
Experiments
2/26/13
Experiments
2/26/13
Experiments
2/26/13
Experiments
Real time motion capture using a single time-of-flight camera. [CVPR 2010]
2/26/13
Discussion
accurate proposals
super real-time from single depth images as an intermediate representation train very deep decision forests invariant features without
2/26/13 Depth
Future work
2/26/13
Thank you
2/26/13