
Visual Interpretation of Hand Gestures for Human-Computer Interaction

Reference:
Pavlovic, V. I.; Sharma, R.; Huang, T. S.,
University of Illinois at Urbana-Champaign, 1995

Lee, Yung-Hui
Department of Industrial Management,
National Taiwan University of Science and Technology,
Taipei, Taiwan
2005-10-20
1. Introduction
• Human-computer interaction (HCI) has become an increasingly important part of our daily lives.
• Keyboards and mice are the most popular modes of HCI.
• Virtual reality and wearable computing require novel interaction modalities that let humans interact with computers the way they communicate with each other.
• Hand gesture is a natural and intuitive communication mode.
• Other applications: sign language recognition, video transmission, and so on.

NTUST, Ergonomic/design Lab


2. Hand Gestures in HCI (1/3)
• Hand gestures are non-obtrusive.
• A vision-based gesture interpretation system involves:
  – image extraction from video input
  – feature parameters of the gesture
  – classification and an interpretation grammar
• Open issues: accuracy, robustness, speed, variability.



2. Hand Gestures in HCI (2/3)
• Vision-based recognition of dynamic hand gestures is a challenging interdisciplinary project:
  – hand gestures are rich in diversity, ambiguity, and space-time variation.
  – the human hand is a complex non-rigid object.
  – computer vision itself is an ill-posed problem.



2. Hand Gestures in HCI (3/3)
• To recognize continuous dynamic hand gestures:
  – design of the gesture command set and interaction model.
  – real-time segmentation of gesture streams.
  – modeling, analysis, and recognition of gestures.
• Real-time processing is mandatory for the practical use of hand gestures in HCI.



3. Gesture Modeling
3.1 Definition of gesture
3.2 General Taxonomy
3.3 Temporal Modeling of gesture
3.4 Spatial modeling of gesture
  – 3D hand model-based
  – Appearance-based



3.1 Definition of Gestures
• Construction of the gestural model over the parameter set
• Definition of the gesture interval
• Single or two hands and/or arms
• Def: let h(t) ∈ S denote the hand's spatial position within the environment at time t, where S is the parameter space; a gesture is then a trajectory of h(t) over the gesture interval.



3.2 Gestural Taxonomy
• Gestures vs. unintentional movements
• Gestures: communicative and manipulative
• Manipulative: act on objects in an environment (e.g., rotation)
• Communicative: symbols (a linguistic role) vs. acts (interpretation of the movement itself)
• Symbols: referential (e.g., circular motion of the index finger) vs. modalizing (e.g., a hand motion accompanying "the wings were vibrating")
• Acts: mimetic (imitate actions) vs. deictic (pointing acts)



3.2 Taxonomy of Gesture for HCI

Hand/Arm Movements (Dynamic / Static)
├─ Gestures
│  ├─ Manipulative
│  └─ Communicative
│     ├─ Acts
│     │  ├─ Mimetic
│     │  └─ Deictic
│     └─ Symbols
│        ├─ Referential
│        └─ Modalizing
└─ Unintentional Movements

Fig. 1: A taxonomy of hand gestures for human-computer interaction. Meaningful gestures are differentiated from unintentional movements. Gestures used for manipulation of objects are separated from gestures that possess an inherent communicational character. Symbols are gestures with a linguistic role: they symbolize some referential action or are used as modalizers, often of speech.
3.3 Temporal Modeling of Gestures
• A dynamic process
• The gesture interval consists of: preparation, stroke, and retraction
• Hand pose during the stroke follows a classifiable path in the parameter space
• Gestures are confined to a specified spatial volume
• Repetitive hand movements are gestures
• Manipulative gestures have a longer gesture interval than communicative gestures



3.4 Spatial Modeling of Gestures
• Dimensionality of the parameter space
• Complexity of the required computer-vision techniques
• 3D hand model-based: joint angles, palm position
• Appearance-based: images, image geometric parameters, image motion parameters, finger position & motion



3.4 Spatial Modeling of Hand Gestures

Hand Gesture Modeling
├─ 3-D hand/arm model based modeling
│  ├─ 3-D textured volumetric model
│  ├─ 3-D wireframe volumetric model
│  ├─ 3-D geometrical model
│  └─ 3-D skeleton model
└─ Appearance based modeling
   ├─ Gray image based model
   ├─ 2-D deformable template based model
   ├─ Image properties based model
   └─ Image motion based model

Fig. 2: Classification of spatial modeling of hand gestures


3.4.1 3D Hand/Arm Models
• Simple skeleton of the hand/arm
• Hand: 8 carpals (wrist) + 5 metacarpals (palm) + 14 phalanges (fingers)
• 2 degrees of freedom (DoF): MCP and TM joints
• 1 DoF: PIP and DIP joints
• Dependencies between the movements of neighboring joints
• Joint ranges + dependencies → a 26-DoF hand model (Kuch, 1994)
3.4.2 Appearance-Based Models
• Deformable 2D template sets
• Principal components of training sets
• A sequence of image n-tuples
• One (monoscopic) or two (stereoscopic) views
• Image property parameters: contours, edges, image moments, image eigenvectors
• The palm must be assumed rigid, and the fingers can have only a limited number of DoFs.
3.4 Hand Gesture Modeling

(a) (b) (c) (d) (e)

Fig. 3: Representing the same hand posture by different hand models. (a) 3-D textured volumetric model; (b) 3-D wireframe volumetric model; (c) 3-D skeletal model; (d) binary silhouette; (e) contour model.
4. Gesture Analysis
• Estimate the parameters (a trajectory in parameter space) of the gesture model from low-level features extracted from images of the human operator acting in the HCI environment
• Hand/arm localization
• Hand/arm feature extraction
• Computation of hand/arm model parameters from the features



4.1 Hand/Arm Localization
• The hand/arm is extracted from the rest of the image
• Restrictions on the background: uniform, distinctively dark
• Restrictions on the user: long dark sleeves
• Restrictions on imaging: cameras focused on the hand
• Techniques: thresholding, color-space-based analysis
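To make the color-space-based analysis concrete, here is a minimal numpy sketch of skin segmentation by thresholding in normalized-rg space. The threshold ranges are illustrative assumptions, not values from any of the surveyed systems.

```python
import numpy as np

def segment_skin(rgb, r_range=(0.35, 0.55), g_range=(0.25, 0.37)):
    """Classify pixels as skin by thresholding in normalized-rg space.

    rgb: H x W x 3 float array in [0, 1]. The threshold ranges are
    illustrative values only.
    """
    total = rgb.sum(axis=2) + 1e-8           # avoid division by zero
    r = rgb[..., 0] / total                  # normalized red
    g = rgb[..., 1] / total                  # normalized green
    return ((r >= r_range[0]) & (r <= r_range[1]) &
            (g >= g_range[0]) & (g <= g_range[1]))
```

Normalized rg-chromaticity is a common choice here because it discounts overall brightness, which helps against the illumination restrictions mentioned above.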



4.2 Features
• Images of hands/arms
• Hand/arm silhouettes (color histogram)
• Contours: from color or gray-level images
• Fingertip locations



4.3 Parameter Computation
• Vary the model parameters until the features extracted from the model match the ones obtained from the data images
• Fitting begins with the palm and ends with the fingers
• Features used: hand silhouettes, finger locations, characteristics of the palm, contours and edges
• Alternatively, a direct mapping between the feature and parameter spaces



4.4 Examples
• Estimation of 3-D hand/arm model parameters
  – two sets of parameters: angular (joint angles) and linear (palm dimensions)
  – initial parameter estimation
  – parameter update as the hand gesture evolves in time
• Estimation of appearance-based model parameters
  – image motion estimation (e.g., optical flow)
  – shape analysis (e.g., computing moments)
  – histogram-based feature parameters
  – active contour models
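As one concrete instance of the shape-analysis step ("computing moments"), here is a small numpy sketch that computes image moments of a binary hand silhouette and derives the centroid and principal-axis orientation from them:

```python
import numpy as np

def raw_moment(mask, p, q):
    """Raw image moment m_pq of a binary silhouette."""
    ys, xs = np.nonzero(mask)
    return np.sum((xs ** p) * (ys ** q))

def centroid_and_orientation(mask):
    """Centroid and principal-axis angle from first/second central moments."""
    m00 = raw_moment(mask, 0, 0)
    cx = raw_moment(mask, 1, 0) / m00
    cy = raw_moment(mask, 0, 1) / m00
    ys, xs = np.nonzero(mask)
    mu20 = np.sum((xs - cx) ** 2)            # second central moments
    mu02 = np.sum((ys - cy) ** 2)
    mu11 = np.sum((xs - cx) * (ys - cy))
    theta = 0.5 * np.arctan2(2 * mu11, mu20 - mu02)
    return (cx, cy), theta
```

The same second-order moments also yield the ellipse parameters used later for shape appearance.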



5. Gesture Recognition (1/3)
• The trajectory in the parameter space is classified as a member of some meaningful subset of the parameter space
• Optimal partitioning of the time-model parameter space: (1) how to design a meaningful partition of the parameter space such that it reflects the human perception of different gestures? (2) how to determine the class-membership mapping in the parameter space?
• Implementation of the recognition procedure
5. Gesture Recognition (2/3)
• Partitioning of the time-model parameter space should produce a single class per gesture in the parameter space
• Only the stroke is meaningful
• The initial and final phases carry no gestural information
• Methods: averaging, K-means, hidden Markov models (HMM), and neural networks
• Classification is based on a minimum-distance measure from a class representative
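The minimum-distance idea can be sketched as a nearest-class-mean classifier; using the mean as the class representative corresponds to the "averaging" method listed above, and is only one of the options (HMMs and neural networks are the others).

```python
import numpy as np

def train_representatives(samples, labels):
    """Class representative = mean feature vector per gesture class."""
    reps = {}
    for lab in set(labels):
        vecs = [s for s, l in zip(samples, labels) if l == lab]
        reps[lab] = np.mean(vecs, axis=0)
    return reps

def classify(x, reps):
    """Assign x to the class whose representative is nearest (Euclidean)."""
    return min(reps, key=lambda lab: np.linalg.norm(x - reps[lab]))
```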
5. Gesture Recognition (3/3)
• Only a certain subclass of gestural actions is plausible with respect to the current and previous states of the HCI environment
• A grammar combines the linguistic character of communicative gestures with the spatial character of manipulative gestures
• Computational effectiveness: complexity vs. time



6. Applications and Systems
• Gestural interfaces in virtual environments
• Manipulation of virtual objects: simulated 2D/3D worlds, windows, device control panels, robotic arms
• Selecting and rotating objects
• Recognition of American Sign Language



Examples of Applications

Application | Modeling technique | Gestural commands
CD player control panel (Ahmad, ) | hand silhouette moments | tracking only
Virtual squash (Brockl, 1995) | hand silhouette moments | tracking + 3 metaphors
Finger paint (Crowley, 1995) | fingertip template | tracking
TV display (Freeman, 1995) | template correlation | tracking
Finger Pointer (Fukumoto, 1994) | heuristic detection | tracking + metaphor + speech
Window manager (Kjeldsen, 1995) | neural network | tracking + 4 metaphors
Gesture computer (Maggioni, 1995) | image moments + finger position |
Finger mouse (Quek, 1995) | heuristic detection | tracking only
DigitEyes (Rehg, 1993) | 27-DoF 3D hand | tracking only
ROBOGEST (Hunter, 1995) | silhouette Zernike moments | 6 metaphors
Automatic robot instruction | fingertip position in 2D | grasp tracking
Robot control (Torige, 1992) | fingertip position in 3D | 6 metaphors
Hand sign recognition (Cui, 1995) | most discriminating features of image | 28 signs
ASL recognition (Starner, 1995) | silhouette moments + grammar | 40 words



7. Future Directions (Naturalness)
• Naturalness of the interface?
• Complexity of analysis and recognition restricts each system to a narrow group of applications
• 3D hand models cover a wider class of hand gestures than appearance-based models, but are hindered by lack of speed and by restrictions on the background
• Fingertip positions: skin and nail texture
7. Future Directions (Two Hands)
• Distinguishing between, or indexing of, the left/right hand
• Speed?
• Multiple users?
• Interaction of workspaces
• Differentiation between users: cameras adaptively focus on some area of interest



7. Future Directions (Multiple Communication Modes)
• Hand gestures
• Hand gestures + speech
• Hand gestures + speech + body movement
• Hand gestures + speech + body movement + gaze
• The natural means of communication = integration of communication modes at all levels
• Benefits: a more robust interface, reduced complexity, increased naturalness
5. Gesture Recognition Techniques

Gesture recognition techniques
├─ Static gesture recognition
│  ├─ Classical clustering methods (e.g., K-means)
│  └─ Non-linear clustering methods (e.g., neural networks)
└─ Dynamic gesture recognition
   ├─ Hidden Markov Model based methods
   ├─ Dynamic Time Warping based methods
   └─ Time-reduced methods

Fig. 4: Classification of hand gesture recognition techniques


3.1 Interaction Model
• Strengths and weaknesses of gesture-based interaction
• Structure of the interaction model:
  – users performing gestures follow three steps
  – suitable feedback
  – apply gesture-based input to appropriate tasks
• A set of rules for designing the gesture command set:
  – gestures should be performed intentionally and intensively, be easy to learn, be symmetrical, ...


3.2 A Prototype System:
Gesture-controlled Panoramic Map Browser

(a) (b)
Fig. 5: Gesture-controlled panoramic map browser. (a)
System setting; (b) User interface.
3.3 Gesture Command Set
• Four translation gesture commands
– move up (1); move down (2); move left (3);
move right (4)
• Six rotation gesture commands
– yaw right (7); yaw left (8); roll clockwise (9);
roll counterclockwise (10); pitch down (11);
pitch up (12)
• Two other gesture commands
– zoom in (5); zoom out (6).
4. Real-Time Segmentation of Continuous Dynamic Hand Gestures
• Goals
  – segment the moving hand from the background.
  – partition gesture streams into meaningful sections.
• Methodology
  – integrating multiple clues: skin color, motion.
  – post-processing (morphological filtering techniques).
Fig. 6: Processing flow chart of real-time segmentation. Starting with t = 0, a frame I_t is read from the video buffer and pushed onto a stack. Each subsequent frame I_{t+1} is then read; while a moving hand appears in it, I_{t+1} is pushed onto the stack and t is increased by 1. When the hand disappears and the stack holds more than L1 frames, the gesture sample stored in the stack is analyzed and recognized; once t exceeds L2, the stack is emptied.


5. Recovering Image Motion Model
Parameters by Robust Regression
5.1 Parameterized Image Motion Model
5.2 Constructing Objective Function
5.3 Robust Error Norms
5.4 Simultaneous Over Relaxation with Continuation
Method.
5.5 Multi-resolution Analysis.
5.6 Examples of Experiment Results.
4.1 Parameterized Image Motion Models

Define the 2 × 8 basis matrix:

         ⎡ 1  x  y  0  0  0  x²  xy ⎤
X(x) =   ⎣ 0  0  0  1  x  y  xy  y² ⎦

Translation model: u(x, T) = X(x)T,  T = (a0, 0, 0, a3, 0, 0, 0, 0)ᵀ
Affine model:      u(x, A) = X(x)A,  A = (a0, a1, a2, a3, a4, a5, 0, 0)ᵀ
Planar model:      u(x, P) = X(x)P,  P = (a0, a1, a2, a3, a4, a5, a6, a7)ᵀ

For example:

          ⎡ u(x, y) ⎤   ⎡ a0 + a1·x + a2·y + a6·x² + a7·xy ⎤
u(x, P) = ⎣ v(x, y) ⎦ = ⎣ a3 + a4·x + a5·y + a6·xy + a7·y² ⎦
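These motion models can be evaluated directly; a small numpy sketch that builds X(x) and computes the flow u(x, Θ) for any of the three parameter vectors:

```python
import numpy as np

def X(x, y):
    """The 2 x 8 basis matrix X(x) of the planar motion model."""
    return np.array([[1, x, y, 0, 0, 0, x * x, x * y],
                     [0, 0, 0, 1, x, y, x * y, y * y]], dtype=float)

def flow(x, y, theta):
    """Image motion u(x, theta) = X(x) theta at pixel (x, y).

    theta is an 8-vector; zeroing the trailing entries recovers the
    affine (last two zero) or translation (all but a0, a3 zero) models.
    """
    return X(x, y) @ np.asarray(theta, dtype=float)
```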
4.2 Constructing the Objective Function

Brightness constancy assumption:

  I(x, t) = I(x − X(x)Θ, t + 1),  ∀x ∈ ℜ

Taking the Taylor series expansion, simplifying, and dropping terms above first order gives

  ∇Iᵀ(X(x)Θ) + I_t = 0,  ∀x ∈ ℜ

Recover the model parameters by minimizing the following objective function:

  E(Θ) = Σ_{x∈ℜ} ρ(∇Iᵀ(X(x)Θ) + I_t, σ)
5.3 Robust Error Norms

Quadratic:            ρ(x) = x²

Truncated quadratic:  ρ(x, α, λ) = λx²  if |x| < √(α/λ),  α otherwise

Geman-McClure:        ρ(x, σ) = x² / (σ² + x²)

Lorentzian:           ρ(x, σ) = log(1 + (1/2)(x/σ)²)
5.3 Robust Error Norms

Fig. 7: Geman-McClure function. (a) The function itself; (b) its derivative. (σ = {5.0, 10.0, 15.0, 20.0})
5.4 Simultaneous Over-Relaxation with Continuation Method

The iterative updating equation at iteration n+1:

  a_i^{n+1} = a_i^n − ω · (1 / T(a_i)) · ∂E(Θ)/∂a_i ,   i = 0, 1, 2, ..., 7

where

  T(a_i) ≥ ∂²E(Θ)/∂a_i²

and the continuation parameter is annealed:

  σ^{n+1} = 0.95 σ^n
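To illustrate the update rule and the continuation schedule, here is a deliberately simplified one-parameter analogue in numpy: a robust location estimate under the Geman-McClure norm, updated with step 1/T and σ annealed by 0.95 per iteration. The bound T = 2N/σ² uses the maximum curvature of the Geman-McClure norm (2/σ² at x = 0); the data, ω = 1, and σ0 are illustrative assumptions, and the real system updates all eight parameters a0..a7 of the motion model.

```python
import numpy as np

def gm_deriv(r, sigma):
    """Derivative of the Geman-McClure norm rho(r) = r^2 / (sigma^2 + r^2)."""
    return 2 * r * sigma ** 2 / (sigma ** 2 + r ** 2) ** 2

def robust_location(data, sigma0=20.0, iters=200, omega=1.0):
    """1-D analogue of the SOR update: robustly estimate a 'location' a
    of the data while annealing sigma (continuation)."""
    a = float(np.mean(data))            # least-squares start, biased by outliers
    sigma = sigma0
    for _ in range(iters):
        r = a - np.asarray(data)        # residuals
        grad = np.sum(gm_deriv(r, sigma))
        T = 2 * len(data) / sigma ** 2  # bound on the 2nd derivative of E
        a -= omega * grad / T
        sigma *= 0.95                   # continuation: gradually reduce sigma
    return a
```

Starting with a large σ makes the objective nearly convex; shrinking it gradually lets the estimate settle on the inliers instead of a local minimum.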
5.5 Multi-resolution Analysis

Coarse-to-fine estimation over an image pyramid built from I_t and I_{t+1}: the coarsest level yields an initial estimate Θ0; at each finer level the image is warped by the propagated parameters, an increment ΔΘ is estimated, and the two are added to give the next level's estimate (Θ1, Θ2, ...).

Fig. 8: Illustration of multi-resolution analysis.


5.6 Examples of Image Motion Estimation

(a) (b) (c)

(d) (e) (f)

Fig. 9: An example of robust image motion regression. (a) and (b) are the 2nd and 3rd frames in an image sequence. (c) Inliers and outliers identified according to the result of the first regression. (d) Segmentation of the moving hand. (e) Outliers identified according to the result of the second regression. (f) The difference image between (a) and (b).
4.6 Examples of Image Motion Estimation

(a) (b) (c)

(d) (e) (f)


Fig. 10: Another example of robust image motion
regression
6. Spatio-Temporal Appearance Modeling

6.1 Inter-frame Motion Appearance
6.2 Inner-frame Shape Appearance
6.3 Spatio-temporal Appearance
6.1 Inter-frame Motion Appearance

  m[t] = [m1, m2, m3, m4, m5, m6, m7]

Horizontal translation:  m1 = a0
Vertical translation:    m2 = a3
Isotropic expansion:     m3 = a1 + a5
Pure shear/deformation:  m4 = √((a1 − a5)² + (a2 + a4)²)
2-D rigid rotation:      m5 = −a2 + a4
Yaw about the view direction:   m6 = a6
Pitch about the view direction: m7 = a7

6.2 Inner-frame Shape Appearance

  s[t] = [s1, s2, s3]

  s1 = a,   s2 = a/b,   s3 = θ       if θ ∈ [0, π/2]
                        s3 = π − θ   if θ ∈ (π/2, π]

where a and b are the lengths of the major and minor axes of the ellipse fitted to the hand region, and θ is the angle between the major axis and the x-axis of the image plane; thus s1 is the major-axis length, s2 the axis-length ratio, and s3 the normalized angle.
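And the shape appearance itself, given the fitted ellipse parameters (the ellipse-fitting step is omitted in this sketch):

```python
import numpy as np

def shape_appearance(a, b, theta):
    """Shape appearance s[t] = [s1, s2, s3] from the fitted ellipse.

    s1: major-axis length; s2: axis-length ratio; s3: angle of the major
    axis to the image x-axis, folded into [0, pi/2] since an ellipse's
    orientation is only defined up to a half-turn.
    """
    s3 = theta if theta <= np.pi / 2 else np.pi - theta
    return np.array([a, a / b, s3])
```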
6.3 Spatio-temporal Appearance

  g_L = [f0, f1, ..., f_{L−1}],   f_t = [m[t], s[t]]ᵀ

where L represents the temporal length of the sequence; I_t (t = 0, 1, 2, ..., L−1) is the t-th frame; m[t] is the extracted motion appearance between frame I_t and frame I_{t+1}; s[t] is the extracted shape appearance at frame I_t.
7.1 Dynamic Time Warping

Fig. 11: DTW assumes that the endpoints of the two patterns have been
accurately located and formulates the pattern matching problem as finding
the optimal path from the start to the end on a finite grid. The optimal path
can be found efficiently by dynamic programming.
7.2 Modified DTW
• Our experiments find that traditional DTW is not adequate for matching two spatio-temporal appearance patterns.
  – Unlike the high sampling rate used in speech recognition, the sampling rate in hand gesture recognition is usually 10 Hz, so the fluctuation along the time axis of hand gesture patterns is much sharper than that of speech patterns.
  – A modified DTW algorithm, a kind of non-linear re-sampling technique, is developed to dynamically warp each spatio-temporal pattern to a fixed temporal length, preserving the necessary temporal information and spatial distribution of the original patterns.
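The modified DTW algorithm itself is not spelled out here, so as a simplified stand-in, the following numpy sketch re-samples a pattern to a fixed temporal length K by linear interpolation; the actual method warps non-linearly rather than uniformly.

```python
import numpy as np

def resample_to_length(pattern, K):
    """Re-sample a (T x d) spatio-temporal pattern to a fixed length K
    by linear interpolation along the time axis.

    Simplified stand-in for the modified DTW: uniform rather than
    non-linear warping.
    """
    pattern = np.asarray(pattern, dtype=float)
    T = pattern.shape[0]
    src = np.linspace(0, T - 1, K)           # fractional source indices
    lo = np.floor(src).astype(int)
    hi = np.minimum(lo + 1, T - 1)
    w = (src - lo)[:, None]                  # interpolation weights
    return (1 - w) * pattern[lo] + w * pattern[hi]
```

Once every pattern has the same length K, the frame-wise distance computation of the next section applies directly.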
7.3 Template-based Recognition
• The distance between two spatio-temporal appearance patterns is calculated from the correlation between their warped patterns.

  A = (a_ij)_{10×L1},  B = (b_ij)_{10×L2},  Â = (â_ij)_{10×K},  B̂ = (b̂_ij)_{10×K}

                 Σ_{j=0}^{K−1} Σ_{i=0}^{9} (w_i â_ij)(w_i b̂_ij)
  D(A, B) = 1 − ───────────────────────────────────────────────────────────
                 √( Σ_{j=0}^{K−1} Σ_{i=0}^{9} (w_i â_ij)² · Σ_{j=0}^{K−1} Σ_{i=0}^{9} (w_i b̂_ij)² )

• Given a training set, a reference template is created for each type of gesture by a minimax type of optimization; a template-based classification technique is then employed to recognize hand gestures.
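The distance D(A, B) as a numpy function, assuming the two warped 10 × K patterns and the per-feature weight vector w are already given:

```python
import numpy as np

def template_distance(A_hat, B_hat, w):
    """Correlation-based distance between two warped 10 x K patterns.

    A_hat, B_hat: 10 x K warped spatio-temporal appearance patterns.
    w: length-10 per-feature weight vector.
    Returns 1 - normalized correlation, i.e. 0 for identical (up to
    scale) patterns and 2 for perfectly anti-correlated ones.
    """
    wa = w[:, None] * A_hat
    wb = w[:, None] * B_hat
    corr = np.sum(wa * wb) / np.sqrt(np.sum(wa ** 2) * np.sum(wb ** 2))
    return 1.0 - corr
```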
8. Experiment Results
8.1 Examples of Hand Gesture Segmentation.
8.2 Choosing Image Motion Models.
8.3 Examples of Spatio-temporal Appearance.
8.4 Examples of Warped Spatio-temporal
Appearance.
8.5 Motion Appearance versus Shape
Appearance.
8.6 Testing.
8.1 Examples of hand gesture segmentation

Fig.12: Segmentation result of a “move up” hand gesture.


8.1 Examples of hand gesture segmentation

Fig. 13: Segmentation result of a “move left” hand gesture.


8.1 Examples of hand gesture segmentation

Fig. 14: Segmentation result of a “zoom in” hand gesture.


8.1 Examples of hand gesture segmentation

Fig. 15: Segmentation result of a “yaw right” hand gesture.


8.2 Choosing the Image Motion Model
Recognition rates when choosing different image motion models:

Chosen image motion model | Average recognition rate on training set
Translation model | 80.6%
Affine model | 94.4%
Planar model | 95.8%

Conclusion: for our gesture command set, choosing the affine model is necessary and sufficient.
8.3 Examples of Spatio-temporal Appearances

Table 1: Spatio-temporal appearance model parameters of the "move up" gesture sample.
t  m1  m2  m3  m4  m5  s1  s2  s3
0 0.4183 6.1250 0.1376 0.0484 0.0941 25.9734 2.4492 0.0076
1 -0.1071 2.3900 0.0066 -0.0169 0.0441 26.7226 2.8045 0.0358
2 1.0242 0.0802 -0.0742 -0.1518 0.1154 26.6768 3.0311 0.0377
3 2.6851 2.9310 -0.0188 -0.2070 0.1646 24.2770 3.0005 0.0188
4 2.1216 5.5608 0.0813 -0.2101 0.1511 24.8164 2.5718 0.1233
5 0.4889 8.3181 0.2556 -0.2183 0.5119 25.3746 2.8234 0.0986
6 0.7085 6.1152 0.5113 -0.3368 0.4714 22.8396 2.2939 0.1172
7 -0.5898 2.8397 0.1999 -0.0517 0.2489 22.8431 1.9318 0.1460
8 0.0107 1.9039 0.1116 0.0159 0.1010 22.8911 1.8986 0.1523
8.3 Examples of Spatio-temporal Appearances
Table 2: Spatio-temporal appearance parameters of the “move
left” gesture sample.
t  m1  m2  m3  m4  m5  s1  s2  s3
0 0.6314 -0.2927 -0.0193 -0.0053 0.0273 22.8277 1.9476 1.4100
1 1.1937 -0.6409 -0.0562 -0.0368 0.0666 24.0954 1.9793 1.3756
2 1.2840 -1.1944 -0.0874 -0.0735 0.0659 24.9579 1.9811 1.3291
3 2.0238 -1.2641 -0.0903 -0.0523 0.0351 28.4130 2.1979 1.3089
4 2.1726 -0.3364 -0.1059 -0.0273 0.1069 29.8538 2.3551 1.2219
5 2.3646 -0.8494 -0.1017 -0.0377 0.1275 26.8498 2.1877 1.3451
6 1.9015 -0.9238 -0.0399 -0.0216 0.1055 30.6782 2.1824 1.3303
7 1.7636 -0.1526 0.0261 -0.0148 0.0350 26.4047 1.4144 1.5056
8.3 Examples of Spatio-temporal Appearances
Table 3: Spatio-temporal appearance parameters of the “zoom in”
gesture sample.
t  m1  m2  m3  m4  m5  s1  s2  s3
0 -0.5627 -0.4283 0.0323 0.0219 0.1331 19.2683 1.5319 0.8548
1 0.6483 0.3485 0.0629 -0.0620 0.1210 22.9368 1.5536 0.8178
2 -0.6371 0.3143 0.1029 -0.0415 0.0787 31.4489 1.8005 0.9353
3 -0.8954 -0.2239 0.1123 -0.0257 0.0435 32.1809 1.3438 0.8470
4 -1.5216 -0.0952 0.1237 -0.0192 0.0574 29.1261 1.1967 1.1138
5 -1.9579 -0.7117 0.1408 0.0213 0.0513 31.0639 1.1290 1.4396
6 -1.9493 -0.3413 0.1201 0.0082 0.0689 33.4124 1.1363 1.1634
7 -0.0951 0.8231 0.1096 -0.0059 0.0245 36.0675 1.1972 0.8686
8.3 Examples of Spatio-temporal Appearances

Table 4: Spatio-temporal appearance parameters of the "yaw right" gesture sample.
t  m1  m2  m3  m4  m5  s1  s2  s3
0 0.5164 0.2989 -0.0831 0.0396 0.0747 40.2918 2.0552 1.5191
1 0.5565 1.0714 -0.1105 0.0235 0.0788 41.7891 2.1914 1.5401
2 0.3188 0.8934 -0.1605 0.0151 0.1321 41.0017 1.9943 1.5652
3 -0.3202 0.3807 -0.0135 0.0079 0.0222 42.1295 1.6445 1.4429
4 0.0904 0.2775 0.0324 -0.0112 0.0409 45.7522 2.2741 1.4093
5 -0.1091 0.6437 0.0682 0.0207 0.1010 39.0260 2.4736 1.3890
6 -0.7747 0.1814 -0.0067 0.1045 0.0905 43.7648 2.4416 1.4120
8.4 Determining the Warping Length
Table 5: Statistics of temporal length (number of frames)
of samples in the training set.
Gesture No.  Max.  Min.  Mean  Std. deviation  Variance
1 11 7 9 1.6733 2.8000
2 10 6 8.5000 1.5166 2.3000
3 12 7 9.3333 1.9664 3.8667
4 10 7 8.3333 1.0328 1.0667
5 10 6 8.1667 1.3292 1.7667
6 10 6 7.8333 1.6021 2.5667
7 10 7 8.1667 0.9832 0.9667
8 8 6 6.8333 0.7528 0.5667
9 13 10 11.8333 1.1690 1.3667
10 13 9 10.6667 1.3663 1.8667
11 13 7 10 2 4
12 15 8 10.3333 2.5033 6.2667
8.5 Examples of Warped Spatio-temporal Appearance
Table 6: Parameters of the warped spatio-temporal appearance of the "move up" gesture sample.
t  m1  m2  m3  m4  m5  s1  s2  s3
0 0.5673 8.5350 0.1256 -0.0065 0.1670 26.7111 2.8611 0.0363
1 4.5141 5.7716 -0.0338 -0.4259 0.3266 24.5467 2.7861 0.0711
2 2.0811 15.6849 0.6797 -0.5759 0.9409 23.4734 2.4263 0.1125
3 -0.4020 6.2724 0.4394 -0.1201 0.4678 22.8911 1.8986 0.1523
8.5 Examples of Warped Spatio-temporal Appearances
Table 7: Parameters of the warped spatio-temporal appearance of the "move left" gesture sample.
t  m1  m2  m3  m4  m5  s1  s2  s3
0 1.8251 -0.9336 -0.0755 -0.0421 0.0939 24.0954 1.9793 1.3756
1 3.3078 -2.4585 -0.1777 -0.1258 0.1010 28.4130 2.1979 1.3089
2 4.5372 -1.1858 -0.2076 -0.0650 0.2344 26.8498 2.1877 1.3451
3 3.6651 -1.0765 -0.0138 -0.0364 0.1406 26.4047 1.4144 1.5056
8.5 Examples of Warped Spatio-temporal Appearances
Table 8: Parameters of the warped spatio-temporal appearance of the "zoom in" gesture sample.
t  m1  m2  m3  m4  m5  s1  s2  s3
0 0.0856 -0.0798 0.0951 -0.0401 0.2540 22.9368 1.5536 0.8178
1 -1.5325 0.0904 0.2152 -0.0672 0.1221 32.1809 1.3438 0.8470
2 -3.4795 -0.8069 0.2645 0.0020 0.1087 31.0639 1.1290 1.4396
3 -2.0444 0.4818 0.2297 0.0023 0.0934 36.0675 1.1972 0.8686
8.5 Examples of Warped Spatio-temporal Appearances
Table 9: Parameters of the warped spatio-temporal appearance of the "yaw right" gesture sample.
t  m1  m2  m3  m4  m5  s1  s2  s3
0 0.9338 1.1024 -0.1660 0.0572 0.1337 41.4148 2.1573 1.5349
1 0.2979 1.3516 -0.1949 0.0249 0.1628 41.5656 1.8194 1.5041
2 -0.0969 0.6288 0.0427 -0.0021 0.0772 44.0707 2.3240 1.4043
3 -0.8565 0.6642 0.0445 0.1200 0.1662 43.7648 2.4416 1.4120
8.6 Motion Appearance vs. Shape Appearance

• To explore the discrimination power of motion appearance and shape appearance separately, two experiments are carried out: one with only motion appearances as feature vectors and the other with only shape appearances as feature vectors.

Adopted appearance model | Recognition rate on the training set
Motion appearance | 88.9%
Shape appearance | 73.6%
Spatio-temporal appearance | 94.4%
8.7 Testing Experiment
• The average recognition rate achieved on the test set is 89.6%.
• Gesture-controlled panoramic map controller.
• The prototype system can recognize hand gestures performed by a trained user with accuracy ranging from 83% to 92%.
9. Summary

• Aiming at real-time gesture-controlled human-computer interaction, we propose novel approaches for visual modeling, analysis, and recognition of continuous dynamic hand gestures.
9. Summary
1 A spatio-temporal appearance model is proposed
to represent dynamic hand gestures.
- The model integrates temporal information, motion and
shape appearances.
- The motion appearance represents the image
appearance changes caused by motion itself, not a
temporal sequence of static configurations.
- The shape appearance is based on the geometrical features of an ellipse fitted to the hand image region rather than on simple moment-based features.
9. Summary
2 Novel approaches are developed to extract model
parameters by hierarchically integrating multiple
clues.
- At low level, fusion of flesh chrominance analysis and
coarse image motion detection is employed to detect
and segment hand gestures
- At high level, the model parameters are recovered by
integrating fine image motion estimation and shape
analysis.
- The approaches achieve both real-time processing and
high recognition rates.
9. Summary
3 A modified Dynamic Time Warping algorithm is suggested for eliminating the time variation of spatio-temporal appearance patterns caused by varying gesturing rates.
- It is a kind of non-linear re-sampling technique.
- It preserves the necessary temporal information and spatial distribution of the original patterns.
9. Summary
4 A prototype system, gesture-controlled panoramic
map browser, is designed and implemented to
demonstrate the usability of gesture-controlled real-
time interaction.
– Dynamic hand gestures are recognized without resorting to
any special marks, limited or uniform background, or
particular illumination.
– Only one uncalibrated video camera is utilized.
– Higher recognition rates are achieved.
– User is allowed to perform continuous hand gestures,
starting at any point within the view field of the camera.
10. Future Work
• We currently assume that the moving skin-color region in the scene is the gesturing hand, which could be invalid when a moving human face appears. Exploiting a simple geometrical model of the human body can alleviate this problem, though multiple cameras may then be necessary.
10. Future Work
• To practically use hand gestures in HCI, more gestural commands will be needed.
• Some kinds of commands would be more reasonably input as static hand gestures (hand postures).
• On the other hand, speech commands will be an alternative to some gestural commands.
• Incorporating hand gesture recognition into a multi-modal interface (MMI) is our next work.



THANK YOU