
Wiizards: 3D Gesture Recognition for Game Play Input

Louis Kratz, Dept. of Computer Science, Drexel University, 3141 Chestnut Street, Philadelphia, PA 19104, louis.kratz@drexel.edu
Matthew Smith, Digital Media Labs, Drexel University, 3141 Chestnut Street, Philadelphia, PA 19104, mas97@drexel.edu
Frank J. Lee, Dept. of Computer Science, Drexel University, 3141 Chestnut Street, Philadelphia, PA 19104, fjl@drexel.edu

ABSTRACT
Gesture based input is an emerging technology gaining widespread popularity in interactive entertainment. Gestures provide intuitive and natural input mechanics for games, presenting an easy to learn yet richly immersive experience. In Wiizards, we explore the use of 3D accelerometer gestures in a multiplayer, zero sum game. Hidden Markov models are constructed for gesture recognition, providing increased flexibility and tolerance of fluid motion. Users can strategically affect the outcome via combinations of gestures with limitless scalability.

Categories and Subject Descriptors
H.5.2 [Information Interfaces and Presentation]: User Interfaces—Input devices and strategies; K.8.0 [Personal Computing]: General—Games; I.5.5 [Pattern Recognition]: Implementation—Interactive systems

Figure 1: Wiizards presents a two player, spell based environment.

General Terms
Games, Interactive systems, Pattern Recognition

Keywords
Gestures, HMM, Games

1. INTRODUCTION
Gesture recognition as an input mechanic has been explored academically in a variety of approaches. Two definitions of gestures have been popular in the literature. The first defines a gesture as the spatial orientation of a person or hand. This approach was utilized by Freeman et al. [3], Segen et al. [13][12] and GeFighters [14]. The second definition classifies specific motion paths of the user. This work explores the use of accelerometers to classify motion paths via a Bayesian approach. By using accelerometer data, the path that the user creates can be recorded directly, rather than relying on tracking to estimate the motion differentials. Other approaches include Heap [4], who uses active shape models to track the user and identifies gestures by the shape's parameters. Heap's approach, while accurate, requires a vision based system for the shape model, which introduces latency and speed issues not found in accelerometers. Keskin et al. [8] also use a vision based approach.

Accelerometer based gesture recognition has been explored, though the constructions of such systems have varied. Keir et al. [6] created a gesture recognition system for accelerometers using curve fitting, integrating the accelerometer data to obtain absolute position. Payne [10] uses the gesture recognition created by Keir as a game input mechanic.

Payne's work, though similar in spirit to ours, does not have the advantages of a Bayesian method. Wiizards uses Hidden Markov Models [11] (HMMs) to classify gestures from the accelerometer data. This Bayesian approach provides more flexibility on a per user basis, and handles noisy sensor data within the model. An HMM is a statistical model whose hidden states follow the Markov property. HMMs are parametrized by the number of states in the model N, a probability distribution function for each state B_i, an initial state distribution π, and a transition probability matrix A.

HMMs have been used in other applications for gesture recognition, however. Keskin et al. [8], for example, use HMMs

Permission to make digital/hard copy of part of this work for personal or classroom use is granted without fee provided that the copies are not made or distributed for profit or commercial advantage, the copyright notice, the title of the publication, and its date appear, and notice is given that copying is by permission of the ACM, Inc. To copy otherwise, to republish, to post on servers, or to redistribute to lists, requires prior specific permission and/or a fee.
FuturePlay 2007, November 15-17, 2007, Toronto, Canada.
Copyright 2007 ACM 978-1-59593-943-2/07/0011...$5.00

in their vision based approach. In addition, Kela et al. [7] use accelerometer gestures for design applications. Kela's construction, however, transforms the 3D signal into a sequence of discrete symbols, whose tolerance of noise and ambiguity in the accelerometer state is not investigated. Mantyjarvi et al. [9] also use discrete HMMs for controlling a DVD player.

Gesture recognition in gaming is just now being explored in commercial products. AiLive [5] has produced a product for gesture recognition and training for the Nintendo Wii, but has not released the details of their machine learning techniques.

2. IMPLEMENTATION

2.1 Game Overview
Wiizards is a two player, zero-sum game. The goal for each player is to damage the opposition to a critical point while limiting damage to themselves. The player casts spells by performing gestures, the order of which determines the effect that they have. The gestures are unique arm motions divided into three categories: Actions, Modifiers and Blockers. Each of these serves a different purpose in the combination of gestures that completes a total spell.

Figure 2: The unique ordering of gestures for spell creation.

2.2 Strategic Composition
The order in which the gestures are composed is vital to determining the behavior of a spell. Each spell consists of blockers and modifiers, and must conclude with an action. Modifiers affect only the gestures following them in the spell (Figure 2). For example, if a spell consists of gestures XYZ, the modifier X will affect Y and Z, while Y will only affect Z. To successfully block a spell, players must directly mimic their opponent's gestures. For example, to block spell XYZ, a player must perform gestures BXYZ, where B is a blocker. Blockers can also be modified by gestures performed prior to them.

A queue is populated as players perform gestures. When the spell is cast, the elements are removed from the queue in order, modifying the parameters of subsequent gestures. The ability to combine multiple gestures in spell creation provides a highly customizable and scalable game play experience. The level of customization also gives a wider range of possibilities to each of the players, making the game scalable in strategy and individual skill level. Players more fluent in the gestural language of the game can explore different strategies as they find more effective usages for each gesture. Gesture management is also a major strategic component of the game. Each gesture has a cool down time limiting how often it may be used, forcing the player to make use of a wide variety of gestures.

2.3 Visual Feedback
The user interface is divided into three sections: a bar revealing the current status of all the gestures available to each of the players, a playing field, and a queue for each player indicating the current spell (Figure 1). The gesture bar serves two purposes: a visual reminder to the player of how to perform the gesture, and an indicator of the cool down time remaining. The amount of alpha transparency indicates how long until a particular gesture will be available again; when the representation of the gesture is fully opaque, it is available for the player to use. At present the game is in mid development, with the queuing system, gesture recognition system, and GUI completed.

2.4 Communication Design
Our software utilizes three main components: the Nintendo Wii controller, the gesture recognition system, and the graphical game implementation. Communication with the Nintendo Wii controller is done via publicly available open source libraries [1]. The accelerometer data is then directed into our HMM gesture recognition package. The results are communicated to Adobe Flash via XML.

2.5 Gesture Recognition

2.5.1 Model Construction
The observations for the models are the accelerometer data from the Nintendo Wii controller. The device provides a gravitational reading for three axes, making our observations a three dimensional vector o as indicated in (1). This data is normalized using the wiimote calibration information [1].

    o_i = [x, y, z]^T    (1)

Each gesture, or observation sequence, is a collection of observations:

    G_i = o_1, o_2, ..., o_m    (2)

Since the observations are vectors, multivariate Gaussian distributions are used for the emission probabilities. Therefore each emission probability B_i is characterized by a 3 dimensional mean vector and a 3×3 covariance matrix. The model parameters B_i, A, and π are trained using the Baum-Welch algorithm [2].

2.5.2 Gesture Classification
We create a separate model M_i for each gesture to be recognized. To classify an observation sequence as a specific gesture, we maximize over the probability of the sequence for each model, as shown in equation 3:

    Gesture(G) = arg max_i p(G | M_i)    (3)

The probability of a gesture G given a model M is the distribution over the observations and the hidden states, as calculated by the Viterbi Algorithm [2].
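The scoring-and-argmax step of equation (3) can be sketched as follows. This is a minimal illustration, not the paper's implementation: each model is assumed to be a tuple (π, A, per-state means, per-state covariances) with one multivariate Gaussian emission B_i per state, and the gesture names are hypothetical.

```python
import numpy as np

def viterbi_logprob(obs, pi, A, means, covs):
    """Log probability of the most likely hidden-state path for obs,
    with one multivariate Gaussian emission per state (the paper's B_i)."""
    def log_emit(o, i):
        diff = o - means[i]
        _, logdet = np.linalg.slogdet(covs[i])
        maha = diff @ np.linalg.solve(covs[i], diff)
        return -0.5 * (len(diff) * np.log(2.0 * np.pi) + logdet + maha)

    n = len(pi)
    # Initialization: best log score of ending in each state after obs[0].
    delta = np.log(pi) + np.array([log_emit(obs[0], i) for i in range(n)])
    # Recursion: maximize over the predecessor state for each new state.
    for o in obs[1:]:
        delta = np.array([log_emit(o, j) + np.max(delta + np.log(A[:, j]))
                          for j in range(n)])
    return float(delta.max())

def classify_gesture(obs, models):
    """Equation (3): return the gesture whose model scores obs highest."""
    return max(models, key=lambda name: viterbi_logprob(obs, *models[name]))
```

Working in log space is a deliberate choice here: multiplying many per-observation densities underflows quickly, while summing their logarithms stays numerically stable for long gesture sequences.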

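The calibration normalization of Section 2.5.1 depends on the wiimote library's calibration format [1], which the paper does not detail. As a sketch under assumed calibration data, each axis can be mapped to units of g using a zero-g reading and a +1g reading:

```python
import numpy as np

def normalize(raw, zero_g, one_g):
    """Map raw accelerometer counts to units of g per axis (eq. 1).

    raw, zero_g, and one_g are 3-vectors. zero_g is the assumed reading
    with no gravity on the axis, one_g the assumed reading at +1g; both
    would come from the controller's calibration block in practice.
    """
    raw = np.asarray(raw, dtype=float)
    return (raw - zero_g) / (one_g - zero_g)
```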
Figure 3: Percentage of classifications for varied model size. [Plot "HMM State Recognition Rates": correct classification percentage vs. number of states (5-25).]

Figure 4: Average correct classification for varied training set sizes. [Plot "Training Convergence Rate": correct classification percentage vs. number of gestures used in training (5-40).]
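The correct classification percentages reported in Figures 3 through 5 can be computed with a routine like the one below. The classifier interface is an assumption; `classify_fn` stands in for the per-gesture HMM classifier of Section 2.5.2.

```python
def recognition_rate(classify_fn, labeled_gestures):
    """Percent of labeled observation sequences that classify_fn labels
    correctly, i.e. the 'correct classification percentage' in the figures.

    labeled_gestures is an iterable of (true_label, observation_sequence).
    """
    correct = sum(1 for label, obs in labeled_gestures
                  if classify_fn(obs) == label)
    return 100.0 * correct / len(labeled_gestures)
```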

3. IMPLEMENTATION RESULTS

3.1 Model Size Exploration
To train our gesture recognizer models, we gathered training data from 7 different users. Each user was presented with images of the gestures from the game, and performed each gesture over 40 times.

The number of states was varied for each gesture, and an HMM was created with the data from all of the users. We then measured the percentage of correct classifications based on those models, the results of which are shown in Figure 3. A recognition rate of over 90% was achieved with only ten states. For the game implementation we use 15 states, which achieves an over 93% recognition rate with our test data.

3.2 Training Convergence Rate
The gesture recognizer models were trained with the sample data to measure how quickly a model would adapt to each user. For each user, we trained models with an increasing number of observation sequences, and then evaluated the percentage of correct classifications with the sample data. The models used have 15 states. The results are shown in Figure 4. Significant accuracy, over 80%, is achieved with a training set of only 10 gestures. At 20 gestures the recognition rate is over 95%. This data also indicates that user-dependent training is more reliable than the global training measured in Figure 3.

The classification correctness was also measured against models where the user had not contributed to the training set. We created HMM models using the sample data from the other users, and measured the recognition correctness of the user against them. This scenario approximates how an "out of the box" gesture set would perform. The results are shown in Figure 5. The average recognition rate remains around 50% regardless of how much training data is used. The sample standard deviation is indicated by the vertical bars; this large sample standard deviation indicates that some gestures are frequently being misclassified.

Figure 5: Average correct classification without user training. [Plot "Classification Without Local Training": correct classification percentage vs. number of gestures used in training (5-40).]

Figure 6: Average evaluation time for increasing model sizes. [Plot "HMM Recognizer Performance": gestures evaluated per second vs. number of states (5-25).]

3.3 Implementation Performance
The time to evaluate the probability of a gesture is directly related to the number of states in the HMM. We measured

Figure 7: Average training time for increasing model sizes. [Plot "HMM Trainer Performance": average time to train HMM vs. number of states (5-25).]

the average time to evaluate a gesture for HMMs with varying numbers of states. These experiments were run on an Intel Core 2 processor at 2.66GHz with 4GB of RAM. As shown in Figure 6, an HMM with 15 states can evaluate over 250 gestures per second. Note that this number is for a single model; thus equation 3 will introduce a scaling factor.

Training the HMMs, however, cannot be achieved in a real time environment. We measured the average training time for a set of 10 gestures for increasing model sizes. These results are presented in Figure 7. The time for training increases significantly with the number of states. Our implementation, which uses 15 states, takes about 10 seconds to train. The training for each user must therefore be done offline, with a trade off between the training time and the recognition rate.

4. CONCLUSION
Natural, innovative input is increasingly becoming the selling point for interactive applications. With this work we explore how simple, easy to learn controls can lend themselves to a highly strategic and player driven experience. Wiizards' stack based spell approach grants the players the freedom to play at their skill level and with the strategy of their choice. Our hidden Markov model construction allows the players a level of input flexibility while providing easy extensions for more detailed game play. The accuracy of the recognition depends on the time spent on user training and the number of states in the model. For high accuracy, user specific training is required.

Our gesture recognition system can perform in real time with high accuracy after an initial training period. The implementation achieves significant recognition rates with 10-20 user samples; however, we consider the machine training time of 10 seconds to be limiting. Our game implementation will handle this by providing training sessions for each player. The goal will be to train both the user and the system together in an entertaining fashion. After this initial training period, in game data can be used to update the model. Future work will explore the use of adaptive HMMs to avoid this training overhead, and alternative input devices such as multi touch displays.

5. REFERENCES
[1] J. Andersson and C. Phillips. Simple wiimote library for linux, 2007.
[2] C. M. Bishop. Pattern Recognition and Machine Learning (Information Science and Statistics). Springer, August 2006.
[3] W. T. Freeman, D. B. Anderson, P. A. Beardsley, C. N. Dodge, M. Roth, C. D. Weissman, W. S. Yerazunis, H. Kage, K. Kyuma, Y. Miyake, and K.-i. Tanaka. Computer vision for interactive computer graphics. IEEE Computer Graphics and Applications, 18(3):42-53, 1998.
[4] A. Heap. Real-time hand tracking and gesture recognition using smart snakes, 1995.
[5] AiLive Inc. LiveMove white paper. Technical report, AiLive Inc., http://www.ailive.net/, 2006.
[6] P. Keir, J. Payne, J. Elgoyhen, M. Horner, M. Naef, and P. Anderson. Gesture-recognition with non-referenced tracking. In 3DUI '06: Proceedings of the 3D User Interfaces (3DUI '06), pages 151-158, Washington, DC, USA, 2006. IEEE Computer Society.
[7] J. Kela, P. Korpipaa, J. Mantyjarvi, S. Kallio, G. Savino, L. Jozzo, and D. Marca. Accelerometer-based gesture control for a design environment. Personal Ubiquitous Comput., 10(5):285-299, 2006.
[8] C. Keskin, A. Erkan, and L. Akarun. Real time hand tracking and 3d gesture recognition for interactive interfaces using hmm. In Proceedings of the Joint International Conference ICANN/ICONIP 2003. Springer.
[9] J. Mantyjarvi, J. Kela, P. Korpipaa, and S. Kallio. Enabling fast and effortless customisation in accelerometer based gesture interaction. In MUM '04: Proceedings of the 3rd international conference on Mobile and ubiquitous multimedia, pages 25-31, New York, NY, USA, 2004. ACM Press.
[10] J. Payne, P. Keir, J. Elgoyhen, M. McLundie, M. Naef, M. Horner, and P. Anderson. Gameplay issues in the design of spatial 3d gestures for video games. In CHI '06: CHI '06 extended abstracts on Human factors in computing systems, pages 1217-1222, New York, NY, USA, 2006. ACM Press.
[11] L. R. Rabiner. A tutorial on hidden markov models and selected applications in speech recognition. Pages 267-296, 1990.
[12] J. Segen and S. Kumar. Fast and accurate 3d gesture recognition interface. In ICPR '98: Proceedings of the 14th International Conference on Pattern Recognition - Volume 1, page 86, Washington, DC, USA, 1998. IEEE Computer Society.
[13] J. Segen and S. Kumar. Human-computer interaction using gesture recognition and 3d hand tracking. In ICIP (3), pages 188-192, 1998.
[14] J. M. Teixeira, T. Farias, G. Moura, J. Lima, S. Pessoa, and V. Teichrieb. GeFighters: an experiment for gesture-based interaction analysis in a fighting game. In SBGames, Brazil, 2006.

