
Gesture Recognition Using Laser-Based Tracking System

Stéphane Perrin, Alvaro Cassinelli and Masatoshi Ishikawa
Ishikawa Namiki Laboratory, University of Tokyo
http://www.k2.t.u-tokyo.ac.jp/
Introduction

Input of information is becoming a challenging task as portable electronic devices become smaller. Alternatives to the keyboard have been proposed for portable devices such as personal digital assistants: for example, data is often input through a touch-sensitive screen using a prescribed input method, such as Graffiti®. The next step is to remove the need for an input device such as a stylus, thus allowing input using only the fingers.

Based on these considerations, a very simple active tracking system is proposed whose main purpose is to provide the user with a natural way of interacting with a portable device or a computer through the recognition of finger gestures. The complexity of the hardware setup is equivalent to that of a portable laser-based barcode reader, and it is interesting to note that the tracking system does not require the user to hold any special device.

This active tracking system is used as a method for inputting data, that is, characters, similarly to the use of a stylus on some PDAs. Gesture recognition itself is performed using hidden Markov models (HMMs). Other applications of the described system are also proposed, some of them using the ability to track 3D gestures.

2D Tracking

Tracking is based on the analysis of a temporal signal corresponding to the amount of backscattered light measured during a laser saccade generated in the neighborhood of the tracked object. A continuously generated saccade whose trajectory falls fully inside the object surface is used to obtain sensitive tracking: as the object moves, a relatively small portion of the saccade falls outside the object surface and the backscattered signal momentarily drops. Both the angular width and the relative position of that portion can be determined by the computer. Using this information, an accurate translation vector is derived and used to re-center the saccade inside the object.

It is important to note that the maximum speed obtained experimentally was about 2.75 m/s for a tracked object approximately the size of a fingertip. This is higher than the typical speed of a finger performing gestures.

3D Tracking

The maximum working distance and the achievable precision of the estimated depth (the depth resolution) both depend on the noise characteristics of the backscattered signal as well as on the illumination background. A theoretical study was conducted to determine these two characteristics of the system, and an experimental verification was then performed.
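The details of that theoretical study are not reproduced here, but as an illustration, a simple model in which each depth estimate carries zero-mean Gaussian noise of standard deviation σ is already consistent with the confidence figures reported under Depth Resolution below: two depths differing by Δ are separable with confidence c when Δ ≥ √2·σ·Φ⁻¹(c). The model and its parameters are an assumption for this sketch, not the study actually performed:

```python
from statistics import NormalDist

# Assumed noise model (not the paper's actual study): each depth estimate
# carries zero-mean Gaussian noise of std dev sigma, so two depths that
# differ by delta are distinguishable with confidence c when
#   delta >= sqrt(2) * sigma * Phi^{-1}(c).

def min_discriminable_depth(sigma_mm, confidence):
    """Smallest depth difference (mm) separable at the given confidence."""
    return (2 ** 0.5) * sigma_mm * NormalDist().inv_cdf(confidence)

# Fit sigma to the reported "5.5 mm at 95% confidence" figure...
sigma = 5.5 / min_discriminable_depth(1.0, 0.95)

# ...and check the other reported figures against the model.
for c in (0.95, 0.70, 0.60):
    print(f"{c:.0%}: {min_discriminable_depth(sigma, c):.2f} mm")
```

With this fit, the model predicts about 1.75 mm at 70% confidence and 0.85 mm at 60%, in line with the 1.7 mm and sub-millimetric figures reported below.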
Depth Resolution

It was verified that the system has optimal depth resolution when the finger is placed at a distance of around 5.5 cm from the mirrors. There, two points whose depths differ by 5.5 mm can be discriminated with 95% confidence, and a finer resolution of 1.7 mm is achieved with 70% confidence. The system can reach sub-millimetric precision at a lower confidence of 60%. At 30 cm, however, the resolution drops to the rather unusable level of 20 cm (for 95% confidence). Thus, if one seeks 95% confidence for depth discrimination, only a few "levels" in the depth direction are available in an operating area limited to 10 or 20 centimeters wide.

Maximum Working Distance

As expected, the tracking robustness decreases with distance. If a minimum confidence of 95% is sought for a proper discrimination between the object and the background, then in our present configuration (and using a Class-I equivalent laser source) the maximum working distance is about 166 cm. This is more than enough for the application considered.

Gesture Recognition

The first application of the system is the input of 2D gestural characters, in a similar way to what is done using a stylus and the Graffiti® method. Six characters from the modern Latin alphabet (A, B, C, D, E and S) were selected in order to evaluate the performance of the proposed system. The chosen method for recognition was based on HMMs.

The results obtained in these conditions revealed several problems:
- First, only gestural characters whose first and last points were not at the same position (that is, all considered characters except D and B) could be successfully recognized. There was also some confusion in the recognition of the character C, which, when performed several times in a row, looked like several A's.
- Secondly, the HMM-based method was unable to distinguish a short pause inside a character gesture from the pause between two gestures.

A method to obtain better results could be to divide the actual morphemes, that is, whole characters, into smaller morphemes such as segments or curves.
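The poster does not specify the HMM topology or the features used; as a hedged sketch only, assume finger strokes quantised into eight direction codes (0 = east, counter-clockwise) and one small left-to-right discrete HMM per character, scored with the forward algorithm. All model values below are invented for illustration:

```python
import numpy as np

def forward_log_likelihood(obs, pi, A, B):
    """Scaled forward algorithm: log P(obs | discrete HMM (pi, A, B))."""
    alpha = pi * B[:, obs[0]]
    log_p = np.log(alpha.sum())
    alpha /= alpha.sum()
    for o in obs[1:]:
        alpha = (alpha @ A) * B[:, o]       # predict, then weight by emission
        log_p += np.log(alpha.sum())
        alpha /= alpha.sum()                # rescale to avoid underflow
    return log_p

def classify(obs, models):
    """Return the character whose HMM scores the observation highest."""
    return max(models, key=lambda ch: forward_log_likelihood(obs, *models[ch]))

def toy_model(favoured_codes):
    """Two-state left-to-right HMM over 8 direction codes (toy numbers)."""
    A = np.array([[0.8, 0.2],               # stay in stroke 1 / move on
                  [0.0, 1.0]])              # stroke 2 is absorbing
    B = np.full((2, 8), 0.02)               # each row sums to 1 (0.86 + 7*0.02)
    B[0, favoured_codes[0]] = B[1, favoured_codes[1]] = 0.86
    return np.array([1.0, 0.0]), A, B

models = {"L": toy_model((6, 0)),           # down stroke, then right stroke
          "V": toy_model((7, 1))}           # down-right, then up-right
print(classify([6, 6, 6, 0, 0, 0], models))  # a gesture drawn like an "L"
```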

In the case of finger gestures, and especially handwritten character recognition, other methods appear to be better suited. Instead of recognizing sequences of characters, one can consider recognizing sequences of words. Doing this spares the user from performing a special gesture between each character, as suggested before. Each word would be "written" in the air by the user with one of his fingers. Once a word is completed, the following word would be written roughly at the same position. The movement made by the user from the end of a word to the beginning of the next one can be recognized by the recognition system as a "space" between words.

Future Works

Further research will be conducted on an efficient interface based on 3D gesture recognition. For example, the use of the third dimension allows switching between different modes, such as from character input mode to a mouse-like mode (where a pointer is moved according to the finger movement). Alternatively, in a drawing application, switching from a pencil tool to an eraser tool, for example, is possible. This switching can be done in a similar way to that used for indicating the explicit transition, as described above. Of course, the third dimension can be used for more complex applications that fully exploit this feature of the system.

One application could be to define a complete 3D alphabet in order to allow a more natural and richer interaction language with a machine. Another could be to allow the user to input 3D shapes or drawings. For more specialized applications, the 3D capacity offers users a natural and powerful way to perform some tasks, such as controlling the zooming of a map.

A further step is to use gestures along with other modalities, such as speech. Such a multimodal system could provide a user interface that would combine the complexity and the naturalness of human interaction.
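One hedged sketch of how the depth-based mode switching mentioned above might work (all thresholds and names here are invented, not from the system described): treat a finger "lift" past a depth threshold as the toggle gesture, with hysteresis so that noise near the threshold does not cause the mode to flicker.

```python
# Hypothetical depth-triggered tool switch: lifting the finger beyond FAR
# toggles pen/eraser; the finger must come back within NEAR before another
# toggle is accepted (hysteresis). Thresholds are assumed values.

NEAR, FAR = 8.0, 12.0                      # assumed depths in cm

def update_mode(mode, lifted, depth_cm):
    """Advance the pen/eraser state machine by one depth sample."""
    if not lifted and depth_cm > FAR:      # distinct lift detected: toggle
        return ("eraser" if mode == "pen" else "pen"), True
    if lifted and depth_cm < NEAR:         # finger returned: re-arm trigger
        return mode, False
    return mode, lifted

mode, lifted = "pen", False
for depth in [7, 7, 13, 11, 9, 7]:         # one clean lift-and-return
    mode, lifted = update_mode(mode, lifted, depth)
print(mode)                                 # → "eraser"
```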