ABSTRACT
Gesture based input is an emerging technology gaining wide-
spread popularity in interactive entertainment. The use of
gestures provides intuitive and natural input mechanics for
games, presenting an easy to learn yet richly immersive expe-
rience. In Wiizards, we explore the use of 3D accelerometer
gestures in a multiplayer, zero sum game. Hidden Markov
models are constructed for gesture recognition, providing
increased flexibility and fluid tolerance. Users can strategically
affect the outcome via combinations of gestures with
limitless scalability.
in their vision based approach. In addition, Kela et al. [7] use accelerometer gestures for design applications. Kela's construction, however, transforms the 3D signal into a sequence of discrete symbols, and its tolerance to noise and ambiguity in the accelerometer state is not investigated. Mantyjarvi et al. [9] also use discrete HMMs, for controlling a DVD player. Gesture recognition in gaming is just now being explored in commercial products. AiLive [5] has produced a product for gesture recognition and training for the Nintendo Wii, but has not released the details of their machine learning techniques.

2. IMPLEMENTATION

2.1 Game Overview
Wiizards is a two player zero-sum game. The goal for each player is to damage the opposition to a critical point while limiting damage to themselves. The player casts spells by performing gestures, the order of which determines the effect that they have. The gestures are unique arm motions divided into three categories: Actions, Modifiers, and Blockers. Each of these serves a different purpose in the combination of gestures into a complete spell.

2.2 Strategic Composition
The order in which the gestures are composed is vital to determining the behavior of a spell. Each spell consists of blockers and modifiers, and must conclude with an action. Modifiers affect only the gestures following them in the spell (Figure 2). For example, if a spell consists of gestures X Y Z, the modifier X will affect Y and Z, while Y will only affect Z. To successfully block a spell, players must directly mimic their opponent's gestures. For example, to block spell X Y Z, a player must perform gestures B X Y Z, where B is a blocker. Blockers can also be modified by gestures performed prior to them.

Figure 2: The unique ordering of gestures for spell creation.

A queue is populated as players perform gestures. When the spell is cast, the elements are removed from the queue in order, modifying the parameters of the gestures that follow them. The ability to combine multiple gestures in spell creation provides a highly customizable and scalable game play experience. The level of customization also gives a wider range of possibilities to each of the players, making the game scalable in strategy and individual skill level. Players more fluent in the gestural language of the game can explore different strategies as they find more effective usages for each gesture. Gesture management is also a major strategic component of the game. Each gesture has a cool down time limiting how frequently it can be used.

2.3 User Interface
The user interface is divided into three sections: a bar revealing the current status of all the gestures available to each of the players, a playing field, and a queue for each player indicating the current spell (Figure 1). The gesture bar serves two purposes: a visual reminder to the player of how to perform the gesture, and the cool down time remaining. Alpha transparency indicates how long until a particular gesture will be available again; when the representation of the gesture is fully opaque, it is available for the player to use. Currently the game is in mid development, with the queuing system, gesture recognition system, and GUI completed.

2.4 Communication Design
Our software utilizes three main components: the Nintendo Wii controller, the gesture recognition system, and the graphical game implementation. Communication with the Wii controller is done via publicly available open source libraries [1]. The accelerometer data is then directed into our HMM gesture recognition package. The results are communicated to Adobe Flash via XML.

2.5 Gesture Recognition

2.5.1 Model Construction
The observations for the models are the accelerometer data from the Nintendo Wii controller. The device provides a gravitational reading for three axes, making our observations a three dimensional vector o as indicated in (1). This data is normalized using the wiimote calibration information [1].

    o_i = (x, y, z)^T    (1)

Each gesture, or observation sequence, is a collection of observations.

    G_i = o_1, o_2, ..., o_m    (2)

Since the observations are vectors, multivariate Gaussian distributions are used for the emission probabilities. Therefore each emission probability B_i is characterized by a 3-dimensional mean vector and a 3×3 covariance matrix. The model parameters B_i, A, and π are trained using the Baum-Welch algorithm [2].

2.5.2 Gesture Classification
We create a separate model M_i for each gesture to be recognized. To classify an observation sequence as a specific gesture, we maximize the probability of the sequence over the models, as shown in equation 3.

    Gesture(G) = argmax_i p(G | M_i)    (3)
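The classification rule in equation 3 can be sketched in code. The following is a minimal illustration, not the authors' implementation: it scores an observation sequence against each gesture's HMM with the forward algorithm in log space and returns the best-scoring gesture. For readability it assumes diagonal covariances (the paper uses full 3×3 covariance matrices), and all parameter values and gesture names below are hypothetical stand-ins for Baum-Welch-trained models.

```python
import math

def log_gauss_diag(o, mean, var):
    # Log-density of a diagonal-covariance Gaussian emission.
    # (The paper uses full 3x3 covariances; diagonal is a simplification.)
    return sum(-0.5 * (math.log(2 * math.pi * v) + (x - m) ** 2 / v)
               for x, m, v in zip(o, mean, var))

def logsumexp(xs):
    # Numerically stable log(sum(exp(x))) for log-space recursion.
    m = max(xs)
    return m + math.log(sum(math.exp(x - m) for x in xs))

def forward_log_likelihood(obs, pi, A, means, variances):
    """log p(obs | model) via the forward algorithm.

    obs is a gesture G_i = o_1, ..., o_m of 3D accelerometer vectors;
    pi is the initial state distribution and A the transition matrix.
    """
    n = len(pi)
    alpha = [math.log(pi[s]) + log_gauss_diag(obs[0], means[s], variances[s])
             for s in range(n)]
    for o in obs[1:]:
        alpha = [logsumexp([alpha[t] + math.log(A[t][s]) for t in range(n)])
                 + log_gauss_diag(o, means[s], variances[s])
                 for s in range(n)]
    return logsumexp(alpha)

def classify(obs, models):
    # Equation 3: Gesture(G) = argmax_i p(G | M_i).
    # models maps a gesture name to its (pi, A, means, variances).
    return max(models, key=lambda g: forward_log_likelihood(obs, *models[g]))
```

In practice pi, A, and the per-state emission parameters would come from Baum-Welch training on the recorded user samples [2], rather than being written by hand.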
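The spell rules of Section 2.2 also reduce to a small amount of queue logic. The sketch below is our own illustration of those rules, not code from Wiizards; the `Gesture` type and the `kind` labels are assumptions made for the example.

```python
from dataclasses import dataclass
from typing import Dict, List

@dataclass(frozen=True)
class Gesture:
    name: str
    kind: str  # "action", "modifier", or "blocker"

def is_valid_spell(queue: List[Gesture]) -> bool:
    # A spell consists of blockers and modifiers and must
    # conclude with an action.
    return (bool(queue) and queue[-1].kind == "action"
            and all(g.kind != "action" for g in queue[:-1]))

def modifier_targets(queue: List[Gesture]) -> Dict[str, List[str]]:
    # Each modifier affects only the gestures that follow it:
    # in spell X Y Z, X modifies Y and Z, while Y modifies only Z.
    return {g.name: [h.name for h in queue[i + 1:]]
            for i, g in enumerate(queue) if g.kind == "modifier"}

def blocks(defender: List[Gesture], attacker: List[Gesture]) -> bool:
    # To block spell X Y Z, the defender performs B X Y Z,
    # where B is a blocker.
    return (bool(defender) and defender[0].kind == "blocker"
            and defender[1:] == attacker)
```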
[Figure: HMM State Recognition Rates]
[Figure: Training Convergence Rate]
3. IMPLEMENTATION RESULTS

3.1 Model Size Exploration
To train our gesture recognizer models, we gathered training data from 7 different users. Each user was presented with images of the gestures from the game, and performed each gesture.

The classification correctness was also measured against models where the user had not contributed to the training set. We created HMM models using the sample data from other users, and measured the recognition correctness of the user against it. This scenario approximates how an "out of the box" gesture set would perform. The results of this are in Figure 5. The average recognition rate remains around 50% regardless of how much training data is used. The sample standard deviation is indicated by the vertical bars. This large sample standard deviation indicates that some gestures are frequently being misclassified.
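The "out of the box" experiment is a leave-one-user-out protocol: for each user, models are trained on every other user's samples and accuracy is measured on the held-out user. A minimal sketch of that protocol follows; `train` and `classify` are hypothetical stand-ins for the HMM training and recognition steps, and the data layout is an assumption.

```python
def leave_one_user_out(samples, train, classify):
    """Per-user recognition rate without local training.

    samples: {user: [(observation_sequence, gesture_label), ...]}
    train:   builds gesture models from a list of labeled sequences
    classify: maps (observation_sequence, models) to a gesture label
    """
    rates = {}
    for user in samples:
        # Train only on the other users' data ("out of the box" models).
        training = [pair for u, pairs in samples.items() if u != user
                    for pair in pairs]
        models = train(training)
        tests = samples[user]
        correct = sum(classify(obs, models) == label for obs, label in tests)
        rates[user] = correct / len(tests)
    return rates
```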
Figure 6: Average evaluation time for increasing model sizes.

3.3 Implementation Performance
The time to evaluate the probability of a gesture is directly related to the number of states in the HMM. We measured the average time to evaluate a gesture for HMMs with a varying number of states. These experiments were run on an Intel Core 2 processor at 2.66GHz with 4GB of RAM. As shown in Figure 6, an HMM with 15 states can evaluate over 250 gestures per second. Note that this number is for a single model; thus equation 3 will introduce a scaling factor.

Training the HMMs, however, can not be achieved in a real time environment. We measured the average training time for a set of 10 gestures for increasing model sizes. These results are presented in Figure 7. The time for training increases significantly with the number of states. Our implementation, which uses 15 states, takes about 10 seconds to train. The training for each user must therefore be done offline, with a trade off between the training time and recognition rate.

Figure 7: Average training time for increasing model sizes.

4. CONCLUSION
Natural, innovative input is increasingly becoming the selling point for interactive applications. With this work we explore how simple, easy to learn controls can lend themselves to a highly strategic and player driven experience. Wiizards' stack based spell approach grants the players the freedom to play at their skill level and with the strategy of their choice. Our hidden Markov model construction allows the players a level of input flexibility while providing easy extensions for more detailed game play. The accuracy of the recognition depends on the time spent on user training, and on the number of states in the model. For high accuracy, user specific training is required.

Our gesture recognition system can perform in real time with high accuracy after an initial training period. The implementation achieves significant recognition rates with 10-20 user samples; however, we consider the machine training time of 10 seconds to be limiting. Our game implementation will handle this by providing training sessions for each player. The goal will be to train both the user and the system together in an entertaining fashion. After this initial training period, in game data can be used to update the model. Future work will explore the use of adaptive HMMs to avoid this training overhead, and explore alternative input devices such as multi touch displays.

5. REFERENCES
[1] J. Andersson and C. Phillips. Simple wiimote library for linux, 2007.
[2] C. M. Bishop. Pattern Recognition and Machine Learning (Information Science and Statistics). Springer, August 2006.
[3] W. T. Freeman, D. B. Anderson, P. A. Beardsley, C. N. Dodge, M. Roth, C. D. Weissman, W. S. Yerazunis, H. Kage, K. Kyuma, Y. Miyake, and K.-i. Tanaka. Computer vision for interactive computer graphics. IEEE Computer Graphics and Applications, 18(3):42-53, 1998.
[4] A. Heap. Real-time hand tracking and gesture recognition using smart snakes, 1995.
[5] AiLive Inc. LiveMove white paper. Technical report, AiLive Inc., http://www.ailive.net/, 2006.
[6] P. Keir, J. Payne, J. Elgoyhen, M. Horner, M. Naef, and P. Anderson. Gesture-recognition with non-referenced tracking. In 3DUI '06: Proceedings of the 3D User Interfaces (3DUI '06), pages 151-158, Washington, DC, USA, 2006. IEEE Computer Society.
[7] J. Kela, P. Korpipaa, J. Mantyjarvi, S. Kallio, G. Savino, L. Jozzo, and D. Marca. Accelerometer-based gesture control for a design environment. Personal Ubiquitous Comput., 10(5):285-299, 2006.
[8] C. Keskin, A. Erkan, and L. Akarun. Real time hand tracking and 3d gesture recognition for interactive interfaces using hmm. In Proceedings of the Joint International Conference ICANN/ICONIP 2003. Springer.
[9] J. Mantyjarvi, J. Kela, P. Korpipaa, and S. Kallio. Enabling fast and effortless customisation in accelerometer based gesture interaction. In MUM '04: Proceedings of the 3rd international conference on Mobile and ubiquitous multimedia, pages 25-31, New York, NY, USA, 2004. ACM Press.
[10] J. Payne, P. Keir, J. Elgoyhen, M. McLundie, M. Naef, M. Horner, and P. Anderson. Gameplay issues in the design of spatial 3d gestures for video games. In CHI '06: CHI '06 extended abstracts on Human factors in computing systems, pages 1217-1222, New York, NY, USA, 2006. ACM Press.
[11] L. R. Rabiner. A tutorial on hidden markov models and selected applications in speech recognition. Pages 267-296, 1990.
[12] J. Segen and S. Kumar. Fast and accurate 3d gesture recognition interface. In ICPR '98: Proceedings of the 14th International Conference on Pattern Recognition - Volume 1, page 86, Washington, DC, USA, 1998. IEEE Computer Society.
[13] J. Segen and S. Kumar. Human-computer interaction using gesture recognition and 3d hand tracking. In ICIP (3), pages 188-192, 1998.
[14] J. M. Teixeira, T. Farias, G. Moura, J. Lima, S. Pessoa, and V. Teichrieb. GeFighters: an experiment for gesture-based interaction analysis in a fighting game. In SBGames, Brazil, 2006.