
Exploring Gesture Based Interfaces using Wii Remotes and IR Lights

Jessica Areias Forbes Yevgeniy Goldenberg Jeffrey Maqsoudi Ashson Mirza Ritvik Mudur
ECSE 475 Design Project 2 Presented to: Kenneth Fraser Coordinator Frank P. Ferrie Supervisor April 14, 2009 McGill University - Electrical & Computer Engineering Department

ABSTRACT
The purpose of this project is to explore and develop an API that supports simple hand gestures. The hardware used includes two Wii remotes, a Bluetooth adapter, and infrared lights. The Wii remote acts as an infrared camera. With a glove having infrared light emitters on the fingertips, the setup will allow the API to distinguish gestures such as pinching fingers, turning hands, and so forth. The goal is to use positional information and temporal trajectories to track gestures.


ACKNOWLEDGEMENTS
We would like to thank our supervisor, Professor Frank Ferrie, for his continued guidance, support and encouragement. We would also like to thank Professor Joelle Pineau for her advice and guidance in implementing our neural network. Finally, we would like to take this opportunity to thank Professor Kenneth Fraser, our course coordinator, for his moral support and attendance at our presentation.


TABLE OF CONTENTS
ABSTRACT
ACKNOWLEDGEMENTS
TABLE OF CONTENTS
1 INTRODUCTION
  1.1 Why Wii?
  1.2 Design Goals
2 Building the System
  2.1 Setup
  2.2 The Glove
  2.3 The API
    2.3.1 Code Structure and Dataflow
    2.3.2 Supported Functionality
    2.3.3 Gesture Detection
    2.3.4 Neural Networks
3 Sample Applications
  3.1 Pong
  3.2 Space Invaders
  3.3 Paint
  3.4 3D Modeling
  3.5 Image Viewer
4 Quantifying the System
  4.1 IR light tracking
  4.2 Gesture recognition performance
5 Limitations
6 Future Improvements
7 Conclusion
REFERENCES


1 INTRODUCTION
The main goal of this design project is to replace mouse functions with hand gestures, which can be more natural and intuitive to use. To accomplish this, an API was created to recognize basic gestures.

1.1 Why Wii?


The design uses two Wii remotes to track the location of two to four points of light. This technology was initially chosen because of the resources available online: an open-source C# library and several demonstrations built on it gave the project a quick start. The required equipment is also easily accessible and inexpensive. It consists mainly of two Wii remotes, a Bluetooth adapter, and IR lights.

1.2 Design Goals


The project consists of building an API that:
- Has a set of predetermined functions capable of recognizing basic gestures
- Is built in C#
- Can be used to implement interfaces for several applications such as:
  o Games such as Pong and Space Invaders
  o Image Viewer
  o 3D Modeling Tool

2 Building the System


Our project contains both software and hardware aspects. This section describes in detail the design of each component.

2.1 Setup
The setup includes two Wii remotes placed parallel and adjacent to each other. The user wears a glove with infrared LEDs on the index finger and thumb. The glove must be approximately one foot away from the Wii remotes.

Figure 1. Setup

2.2 The Glove


The first step towards building the glove was to build and test a circuit that would meet the specifications of the LEDs. Namely, the LEDs required a voltage of approximately 1.5 volts across them and a current in the range of 60-120 mA. Resistors were placed in series with the LEDs (as shown in Figure 2) in order to set the voltage across the diodes at the appropriate level and to limit the current so that the LEDs do not burn out. After building this circuit on a breadboard and applying 3.0 V as the source, we found that 22 ohm resistors achieved the specifications outlined above. Therefore, in order to build the device, we needed a glove, two LEDs, some wire, two AA batteries, a battery holder, some electrical tape and two 22 ohm resistors. The device can be seen in Figure 3. Using a single 1.5 V battery instead would require eliminating the series resistance entirely. That setup works, but there is a risk of the LEDs burning out, and hence the design below was chosen.
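The resistor choice can be checked with Ohm's law. The following is a minimal sketch in Python, assuming the LED forward voltage stays fixed at the quoted 1.5 V and that each LED has its own series resistor; the function name is illustrative and not part of the project.

```python
# Series-resistor check for one IR LED branch (Ohm's law).
SUPPLY_V = 3.0                   # two 1.5 V cells in series, as in the report
LED_FORWARD_V = 1.5              # approximate LED voltage quoted in the report
I_MIN_A, I_MAX_A = 0.060, 0.120  # allowed LED current range from the report


def led_current(resistance_ohm: float) -> float:
    """Current through the LED, treating its forward voltage as constant (assumption)."""
    return (SUPPLY_V - LED_FORWARD_V) / resistance_ohm


for r in (10.0, 22.0):
    current = led_current(r)
    in_range = I_MIN_A <= current <= I_MAX_A
    print(f"R = {r:.0f} ohm -> I = {current * 1000:.0f} mA, within spec: {in_range}")
```

Under these assumptions a 22 ohm resistor gives roughly 68 mA, inside the 60-120 mA window, while a smaller value such as 10 ohm would give 150 mA and exceed it.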

Figure 2. Glove circuit (labels in the figure: two 1.5 V sources, two resistors R = 22)

Figure 3. IR Glove


2.3 The API


The main tools used for the software component were Visual Studio 2008 and the C# programming language, which allowed prototypes to be developed rapidly and tested easily. Using an object-oriented design, the system was divided into separate modules. In addition, the design was made as efficient as possible since the system must be capable of detecting gestures in real time.

2.3.1 Code Structure and Dataflow


The central component of the system is WiimoteLib, an open-source library written in C#. The library handles the task of communicating with each Wii remote and provides the system with the current IR coordinates. The system has a dedicated data structure called PosData, a circular buffer capable of holding the last 100 IR points for each Wii remote and for each sensor. PosData contains an array and an index pointing to the oldest point. Every time a new point is added, it overwrites the oldest value and the index is updated. The MainLib object then updates the Windows cursor, attempts to detect gestures using various techniques, and notifies the application if a gesture match is found. Alternatively, the application can poll MainLib for changes.
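The circular buffer described above can be sketched as follows. This is an illustrative Python version, not the project's C# code; only the name PosData, the capacity of 100, and the overwrite-the-oldest behaviour come from the report, while the method names are assumptions.

```python
from typing import List, Optional, Tuple

Point = Tuple[float, float]


class PosData:
    """Fixed-size circular buffer holding the most recent IR points."""

    def __init__(self, capacity: int = 100) -> None:
        self._points: List[Optional[Point]] = [None] * capacity
        self._oldest = 0  # index of the slot that will be overwritten next
        self._count = 0   # number of valid points stored so far

    def add(self, point: Point) -> None:
        """Overwrite the oldest slot, then advance the index."""
        self._points[self._oldest] = point
        self._oldest = (self._oldest + 1) % len(self._points)
        self._count = min(self._count + 1, len(self._points))

    def ordered(self) -> List[Optional[Point]]:
        """Return the stored points from oldest to newest."""
        capacity = len(self._points)
        # Until the buffer is full, the oldest point is still in slot 0.
        start = self._oldest if self._count == capacity else 0
        return [self._points[(start + i) % capacity] for i in range(self._count)]


buf = PosData(capacity=3)
for i in range(5):
    buf.add((float(i), 0.0))
print(buf.ordered())  # [(2.0, 0.0), (3.0, 0.0), (4.0, 0.0)]
```

Adding a point is a constant-time operation with no allocation, which suits a system that must keep up with the IR reports in real time.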


Figure 4. Class Diagram

Figure 5. Data Flow Diagram

2.3.2 Supported Functionality


The first step of the design consisted of finding the most efficient way of tracking points of light. Both IR lights and reflective tape were tested. IR lights proved to be the better option, since reflective tape only works well under certain conditions. In fact, reflective tape usually works best in a dimmed environment, because other sources of light reflect off it and interfere with the Wii remote's ability to track a specific point of light.

Note that the most basic gestures were implemented before attempting the more complex ones. Basic gestures include pinching and scrolling, while the more advanced gestures include zooming in and out, as well as other gestures that differ in their execution from one person to another.

2.3.2.1 Pinch
The pinch is the equivalent of a mouse click and can be used to select an object or to initiate a more complex gesture. To detect a pinch, a threshold distance (delta) between the two IR lights was chosen. Once the distance between the lights falls to this delta, a pinch is detected. Through experimentation, it was found that, depending on the orientation of the user's hand and of the IR lights, one of the IR lights could be lost by the Wii remote during a pinch attempt. As a result, the pinch went undetected. To resolve this problem, an Almost Pinch state was created. The Almost Pinch state is entered at a distance slightly above the pinch delta. If one of the IR lights is lost right after entering the Almost Pinch state, a pinch is detected.
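The pinch logic, including the Almost Pinch fallback, can be sketched as a small state machine. The Python below is a hedged illustration: the two distance values, the state names as strings, and the class itself are hypothetical, and the report does not specify what happens when both lights are lost (this sketch resets to open).

```python
import math
from typing import Optional, Tuple

Point = Tuple[float, float]

PINCH_DELTA = 0.05         # hypothetical pinch distance (normalized units)
ALMOST_PINCH_DELTA = 0.08  # hypothetical, slightly above PINCH_DELTA


class PinchDetector:
    """Tracks OPEN / ALMOST_PINCH / PINCH from the two fingertip lights."""

    def __init__(self) -> None:
        self.state = "OPEN"

    def update(self, thumb: Optional[Point], index: Optional[Point]) -> bool:
        """Feed one frame (None = light not seen); return True on a new pinch."""
        previous = self.state
        if thumb is not None and index is not None:
            distance = math.dist(thumb, index)
            if distance <= PINCH_DELTA:
                self.state = "PINCH"
            elif distance <= ALMOST_PINCH_DELTA:
                self.state = "ALMOST_PINCH"
            else:
                self.state = "OPEN"
        elif (thumb is None) != (index is None):
            # Exactly one light lost: treat it as a pinch only if the
            # fingers were already almost touching.
            if previous == "ALMOST_PINCH":
                self.state = "PINCH"
        else:
            self.state = "OPEN"  # both lights lost (assumed behaviour)
        return self.state == "PINCH" and previous != "PINCH"
```

Returning True only on the transition into the pinch state means a held pinch produces a single click event rather than one per frame.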

2.3.2.2 Scroll
Scrolling is the equivalent of using a mouse wheel and can be used to scroll an image up, down, left, or right. To detect a scroll, the cursor, which maps the location of the IR light, must be in a specific region of the screen or image. To scroll left or right, for example, the cursor must be detected on the far left or far right of the image respectively. Similarly, to scroll up or down, the cursor must be located in the topmost or bottommost region of the image. Once one of these conditions is detected, the image is scrolled in the corresponding direction.
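The region test can be sketched in a few lines of Python. The width of the edge band (10% here) and the priority of horizontal over vertical scrolling in the corners are assumptions, since the report does not state them.

```python
from typing import Optional

EDGE_FRACTION = 0.1  # hypothetical: the outer 10% of the image triggers scrolling


def scroll_direction(x: float, y: float, width: float, height: float) -> Optional[str]:
    """Return 'left', 'right', 'up', 'down' or None for a cursor at (x, y).

    Assumes the origin is the top-left corner with y growing downwards.
    Horizontal edges are checked first, so corners scroll horizontally.
    """
    if x < width * EDGE_FRACTION:
        return "left"
    if x > width * (1 - EDGE_FRACTION):
        return "right"
    if y < height * EDGE_FRACTION:
        return "up"
    if y > height * (1 - EDGE_FRACTION):
        return "down"
    return None
```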

2.3.2.3 Zoom
The zoom gesture can be used to zoom in and out of an image. To detect a zoom gesture, two Wii remotes are needed to track changes in the depth (z-coordinate) of the point of light. The IR light must be moved towards the Wii remotes to zoom in and away from them to zoom out. To measure the depth, the two Wii remotes must be placed in parallel as shown in Figure 1, and the distance between them must be preset. This allows the API to triangulate and determine the depth of each IR light detected by both Wii remotes.
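The report does not give the triangulation formula. A common approach for two parallel cameras is the pinhole stereo relation Z = f * B / d, where B is the distance between the cameras, f the focal length in pixels, and d the horizontal disparity of the same light in the two images. The Python sketch below assumes that model; the function names and the example numbers in the comments are illustrative only.

```python
import math


def focal_length_px(image_width_px: float, horizontal_fov_deg: float) -> float:
    """Focal length in pixels for an ideal pinhole camera (assumption)."""
    return (image_width_px / 2) / math.tan(math.radians(horizontal_fov_deg) / 2)


def depth_from_disparity(x_left_px: float, x_right_px: float,
                         baseline_cm: float, focal_px: float) -> float:
    """Depth of a point seen by two parallel cameras: Z = f * B / disparity."""
    disparity = x_left_px - x_right_px
    if disparity <= 0:
        raise ValueError("disparity must be positive for a point in front of both cameras")
    return focal_px * baseline_cm / disparity


# Hypothetical numbers: a 1024-pixel-wide sensor and a 10 cm baseline.
f_px = focal_length_px(1024, 22.0)
near = depth_from_disparity(700.0, 300.0, 10.0, f_px)
far = depth_from_disparity(600.0, 400.0, 10.0, f_px)
print(near < far)  # a larger disparity corresponds to a closer light
```

Because depth is inversely proportional to disparity, moving the light towards the remotes increases the disparity, which the zoom gesture can map to zooming in.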

2.3.2.4 Gestures that differ


There exist more complex gestures such as drawing a circle, drawing an X to close a window, or a combination of simple movements to execute other functions. To detect such gestures, both Finite State Machines (FSM) and Neural Networks were explored.

2.3.3 Gesture Detection


One way to detect gestures is to look directly at the coordinates of three equally spaced points in the PosData array. For example, to detect the Up+Left gesture shown in Figure 6, the application checks whether the difference between the y coordinates of the Middle and Oldest points is greater than a threshold, and whether the difference between the x coordinates of the Newest and Middle points is greater than the threshold. If both are true, the gesture is marked as detected and the application is notified. With this approach, a single gesture can be detected many times, because many subsequent points can match the same criteria. To remedy this, once a gesture is detected, a timer is started and gesture detection is halted until the timer expires.
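The three-point test and the hold-off timer can be sketched as follows. This Python illustration assumes y grows upwards and x grows to the right, and the threshold and cooldown values are hypothetical; the current time is passed in explicitly so the logic is easy to test.

```python
from typing import List, Tuple

Point = Tuple[float, float]

THRESHOLD = 0.2   # hypothetical minimum displacement (normalized units)
COOLDOWN_S = 1.0  # hypothetical hold-off time after a detection


def is_up_left(points: List[Point]) -> bool:
    """Check Oldest -> Middle moved up and Middle -> Newest moved left.

    `points` is ordered oldest to newest.
    """
    if len(points) < 3:
        return False
    oldest, middle, newest = points[0], points[len(points) // 2], points[-1]
    moved_up = middle[1] - oldest[1] > THRESHOLD
    moved_left = middle[0] - newest[0] > THRESHOLD
    return moved_up and moved_left


class GestureGate:
    """Suppresses repeated detections until the cooldown has elapsed."""

    def __init__(self, cooldown_s: float = COOLDOWN_S) -> None:
        self.cooldown_s = cooldown_s
        self._blocked_until = float("-inf")

    def detect(self, points: List[Point], now_s: float) -> bool:
        if now_s < self._blocked_until:
            return False  # still inside the hold-off window
        if is_up_left(points):
            self._blocked_until = now_s + self.cooldown_s
            return True
        return False
```

In a live system `now_s` would come from a monotonic clock such as `time.monotonic()`.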

Figure 6. Up+Left Gesture Detection

Another method of detecting gestures is using a Finite State Machine. This allows the detection of more complicated gestures such as the X and the pinch. The last method used was Neural Networks. This approach is explained in detail in the following section.

2.3.4 Neural Networks


Neural networks present an interesting approach to gesture recognition. The paradigm is heavily inspired by the way the human brain processes information. As with a human attempting to learn a new gesture, learning is done through examples and by correcting errors. Each time a person tries to execute a gesture during the learning process, they attempt to minimize the error when comparing it to the ideal gesture. In theory, no human can draw a perfect circle, but it is still possible to draw a shape that most people would consider a circle.

In addition to being an intuitive way of recognizing new gestures, neural networks offer other advantages. First, since a neural network is trained on many examples, somewhere in the range of 10000 data sets, it is very good at recognizing patterns. As mentioned above, no human can draw a perfect circle, but most people draw circles with similar characteristics. Second, neural networks are easy to implement because there is abundant literature on the subject, and many have used this tool to recognize shapes and patterns. Finally, once trained, neural networks are very fast and have a constant run-time. Training only needs to be done once at the beginning, and the resulting weights are used for all subsequent gesture recognitions.

Figure 7. Neural Network


As shown in Figure 7, a neural network has an input layer with several input nodes. It may also have some middle (hidden) layers before the output layer. In this project's design, middle layers were not used, since they were not considered necessary for this gesture recognition task. The neural network used has many inputs and one target output, which represents a perfect normalized circle. The inputs are sent across the network with random initial weights, and the result is compared to the target output. The error is then calculated and its gradient is propagated back into the network to update the weights of the inputs (gradient descent). The most informative inputs ultimately receive the highest weights.

To recognize a circular motion, ten thousand circles consisting of 100 points each were generated. Each circle has a random origin and radius. After a circle is generated, Gaussian noise is added to it to mimic the imperfections of a human gesture, as seen in Figure 8. These examples are given a target value of 1 since they are the desired gestures. With this input set, the program was able to recognize circles, but there were several false positives. For example, a long oval was detected as a circle. To resolve this issue, unwanted inputs such as lines (as seen in Figure 9) were introduced. These examples are given a target value of 0 since they must not match the expected output.
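The training procedure can be illustrated with a small, self-contained Python sketch. It is not the project's implementation: the sigmoid output unit, the log-loss, the normalization scheme, the fixed starting angle of each circle, and the reduced sizes (20 points per gesture, a few hundred examples) are all assumptions chosen to keep the example short and fast.

```python
import math
import random

N_POINTS = 20  # points per gesture in this sketch (the report uses 100)


def normalize(points):
    """Center on the centroid, scale into [-1, 1], flatten to x0, y0, x1, y1, ..."""
    cx = sum(x for x, _ in points) / len(points)
    cy = sum(y for _, y in points) / len(points)
    centered = [(x - cx, y - cy) for x, y in points]
    scale = max(max(abs(x), abs(y)) for x, y in centered) or 1.0
    return [v / scale for point in centered for v in point]


def noisy_circle(rng, noise=0.05):
    """Circle with random origin and radius plus Gaussian noise (target 1)."""
    cx, cy, r = rng.uniform(-1, 1), rng.uniform(-1, 1), rng.uniform(0.2, 1.0)
    angles = [2 * math.pi * i / N_POINTS for i in range(N_POINTS)]
    return normalize([(cx + r * math.cos(a) + rng.gauss(0, noise * r),
                       cy + r * math.sin(a) + rng.gauss(0, noise * r))
                      for a in angles])


def noisy_line(rng, noise=0.05):
    """Straight stroke with random start, direction and length (target 0)."""
    x0, y0 = rng.uniform(-1, 1), rng.uniform(-1, 1)
    angle, length = rng.uniform(0, 2 * math.pi), rng.uniform(0.4, 2.0)
    steps = [length * i / (N_POINTS - 1) for i in range(N_POINTS)]
    return normalize([(x0 + s * math.cos(angle) + rng.gauss(0, noise * length),
                       y0 + s * math.sin(angle) + rng.gauss(0, noise * length))
                      for s in steps])


def make_dataset(n_circles, n_lines, seed=0):
    """Build (features, target) pairs: circles labeled 1.0, lines labeled 0.0."""
    rng = random.Random(seed)
    data = [(noisy_circle(rng), 1.0) for _ in range(n_circles)]
    data += [(noisy_line(rng), 0.0) for _ in range(n_lines)]
    return data


def predict(weights, bias, features):
    """Single sigmoid output unit fed directly by the inputs (no hidden layer)."""
    z = bias + sum(w * f for w, f in zip(weights, features))
    return 1.0 / (1.0 + math.exp(-z))


def train(data, epochs=50, learning_rate=0.05, seed=0):
    """Batch gradient descent on the log-loss; returns weights, bias, loss history."""
    rng = random.Random(seed)
    n = len(data[0][0])
    weights = [rng.uniform(-0.01, 0.01) for _ in range(n)]  # small random start
    bias, history = 0.0, []
    for _ in range(epochs):
        grad_w, grad_b, loss = [0.0] * n, 0.0, 0.0
        for features, target in data:
            p = predict(weights, bias, features)
            loss -= target * math.log(p + 1e-12) + (1 - target) * math.log(1 - p + 1e-12)
            grad_b += p - target
            for j, f in enumerate(features):
                grad_w[j] += (p - target) * f
        history.append(loss / len(data))  # loss before this epoch's update
        weights = [w - learning_rate * g / len(data) for w, g in zip(weights, grad_w)]
        bias -= learning_rate * grad_b / len(data)
    return weights, bias, history
```

A typical use would be `data = make_dataset(100, 40)` followed by `weights, bias, history = train(data)`, after which `predict(weights, bias, features)` returns a value between 0 and 1 for a new normalized gesture. The `history` list makes it easy to confirm that the error is shrinking as training proceeds.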


Figure 8. Gaussian Circle

Figure 9. Gaussian Line

The neural network was first trained with 10000 desired inputs and 4000 unwanted inputs. After training, to ensure that good results are obtained, the trained neural network is run on a separate validation set that was not used during training. When a user uses the glove to perform a motion, the software tracks the last 100 points at any given moment. These 100 points are sent to the network and a value is output. If this value is greater than a certain threshold, the motion is considered to be a circle. As previously mentioned, it is impossible to draw a perfect circle, but if the output is greater than 0.985 the motion is classified as a circle. The test must also reject non-circles, which it does when their output does not exceed the threshold.

The final neural network is able to successfully recognize a circle about 3.5 times out of 5 and is also capable of ignoring lines. It is unable to consistently recognize circles and reject false positives because the training set is quite limited, covering only a few gesture types. This explains why motions such as a half-circle are identified as a full circle.

3 Sample Applications
This section provides a brief sample of the applications of our product: playing games such as Pong and Space Invaders, drawing using Paint, manipulating 3D objects using a 3D Modeling tool, and viewing and scrolling images in an Image Viewer application.

3.1 Pong
This application binds the location of the paddle to the vertical position of the user's hand. The user moves the IR light up and down and the paddle follows. The purpose of this application was to verify the sensitivity of the sensor as well as the responsiveness of the control.


Figure 10. Pong Screenshot

3.2 Space Invaders


This game consists of shooting the invaders at the top of the screen by pinching. To avoid the enemy's attacks, the ship can be moved horizontally by moving the index finger.

Figure 11. Space Invaders Screenshot


3.3 Paint
Paint is a common application found in the Windows operating system. With a simple pinch, it is possible to select a tool and then draw a picture.

Figure 12. Paint Screenshot

3.4 3D Modeling
By using the pinch movement, it is possible to select a vertex or edge and stretch the model. It is also possible to change the angle of view by pinching on an open area and moving one's fingers.


Figure 13. 3D Modeling Tool Screenshot

3.5 Image Viewer


The Image Viewer application was coded from scratch as a prototype for new ideas and new types of gestures. The application allows switching images by performing the Up+Right or Up+Left gestures. The user can also zoom in or out by pinching the fingers and moving them closer to or further away from the Wii remotes. The application then zooms in or out proportionally to the depth. Finally, the user can scroll the image by positioning the cursor on the appropriate edge of the screen.


Figure 14. Image Viewer Screenshot showing the region where the cursor must be to scroll up the image

4 Quantifying the System


4.1 IR light tracking
An important metric of the system's performance is its repeatability, which is the main concern of this section. The repeatability of tracking infrared lights was tested along the horizontal and vertical axes of the Wii remote's coordinate system, and for the depth measure provided by our API. To test the horizontal axis, the Wii remote was placed in a fixed position and the glove was used to draw a horizontal line (along a fixed trajectory) at various distances from the Wii remote. The glove was held in place for approximately one second at preset positions. This approach allowed us to estimate the variation in our readings. For the vertical axis, the Wii remote was simply placed on its side and the above test was repeated.

The repeatability of the depth measure provided by our API was measured in a similar fashion. Two Wii remotes were placed in parallel at a fixed distance (in a stereo setup). Without moving the Wii remotes, the glove's IR light was held in place at three distinct distances from the remotes over five trials. The accuracy of the depth measure and the field of view provided by the Wii remotes were also measured. All trials were performed by the same user. The results of all tests are summarized below.
Table 1a. Variation (maximum) measured when testing the repeatability of our system.

Attribute Tested                 Order of variation
                                 15 cm          20 cm          30 cm
IR Tracking - Horizontal Axis    10^-2 units    10^-2 units    10^-3 units
IR Tracking - Vertical Axis      10^-2 units    10^-2 units    10^-3 units
Depth measure                    10^-1 cm       10^-1 cm       10^-1 cm

Table 1b. Quantified measure of some other features

Feature                       Property
Field of View (Horizontal)    22 degrees
Field of View (Vertical)      23 degrees
Depth measure accuracy        0.5 cm

The variation on the horizontal and vertical axes was measured in units of raw data provided by the Wii remote. The order of variation seen in Table 1a (10^-3 to 10^-2 units) maps to 1-10 pixels on a 1024 x 768 resolution screen. Bearing in mind that a certain amount of error is caused by the user's hand when moving the glove, the results of repeatability on both depth and infrared light tracking are encouraging.
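The pixel figures follow from simple scaling. The short Python check below assumes the raw units are normalized so that 1.0 spans the full screen dimension; the function name is illustrative.

```python
def variation_to_pixels(variation_units: float, screen_size_px: int = 1024) -> float:
    """Scale a variation in normalized raw units to screen pixels (assumed mapping)."""
    return variation_units * screen_size_px


for variation in (1e-3, 1e-2):
    print(f"{variation:g} units -> {variation_to_pixels(variation):.1f} px horizontally")
```

Under that assumption, 10^-3 units is about 1 pixel and 10^-2 units about 10 pixels across the 1024-pixel width, matching the 1-10 pixel range quoted above.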

4.2 Gesture recognition performance


In order to test the performance of gesture recognition, the success rates of two experienced and two inexperienced users were measured. Each user attempted each gesture 10 times and the number of successes was counted. The inexperienced users were given a brief tutorial on using the system before their trials. A view of the IR lights being tracked was provided to all users. The results of these tests are summarized in the table below.
Table 2. Gesture recognition results

            Pinch    UpRight    UpLeft    X        Circle
Novice 1    9/10     9/10       8/10      8/10     7/10
Novice 2    10/10    9/10       10/10     8/10     5/10
Expert 1    10/10    10/10      10/10     9/10     8/10
Expert 2    10/10    10/10      10/10     10/10    7/10

From Table 2, it is apparent that the success rate drops for inexperienced users. One of the main causes of missed gestures (for both classes of users) was the Wii remote losing track of the IR lights on the fingertips. Inexperienced users are not aware of how to orient their hands to ensure that the IR lights are seen by the Wii remote's camera, which is something learned from experience.
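The totals behind this observation can be tallied directly from Table 2. The Python sketch below reads the table with gestures as columns and users as rows, which is an interpretation of the flattened layout; the helper names are illustrative.

```python
ATTEMPTS = 10  # attempts per user per gesture, from the report

# Successes out of 10, read from Table 2.
RESULTS = {
    "Novice 1": {"Pinch": 9, "UpRight": 9, "UpLeft": 8, "X": 8, "Circle": 7},
    "Novice 2": {"Pinch": 10, "UpRight": 9, "UpLeft": 10, "X": 8, "Circle": 5},
    "Expert 1": {"Pinch": 10, "UpRight": 10, "UpLeft": 10, "X": 9, "Circle": 8},
    "Expert 2": {"Pinch": 10, "UpRight": 10, "UpLeft": 10, "X": 10, "Circle": 7},
}


def rate_by_gesture(results):
    """Overall success rate per gesture across all users."""
    gestures = next(iter(results.values())).keys()
    return {g: sum(user[g] for user in results.values()) / (len(results) * ATTEMPTS)
            for g in gestures}


def rate_by_user(results):
    """Overall success rate per user across all gestures."""
    return {name: sum(scores.values()) / (len(scores) * ATTEMPTS)
            for name, scores in results.items()}


print(rate_by_gesture(RESULTS))
print(rate_by_user(RESULTS))
```

Summed this way, the novices succeed on 83 of 100 attempts and the experts on 94 of 100, and the circle is the weakest gesture at 27 of 40 (67.5%).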

5 Limitations
This section describes the limitations of our project. Firstly, only a limited number of gestures are properly detected. Users can only pinch and move up/down or left/right, or perform some combination of these gestures. This limitation is mainly due to the fact that we did not have enough time to implement more complex gestures. As previously mentioned, neural networks have been used to recognize new gestures, but this requires proper training and defining a wide range of gestures.

Secondly, it is difficult to continuously track the LEDs. This is mainly due to the brightness of the LEDs and the surrounding environment. If the LEDs are not bright enough, it is difficult for the Wii remote to detect them. A similar issue is distance. The Wii remote can only detect the LEDs up to a certain distance, beyond which they are too far away to be detected. Another problem is the orientation of the LEDs with respect to the Wii remote. The LEDs have to be pointed directly towards the Wii remote; otherwise the Wii remote will not be able to detect them. Similarly, if the IR lights are outside the field of view of the Wii remote, it can no longer track them.

6 Future Improvements
One of the main hardware limitations is the unidirectional aspect of the IR light LEDs. A possible improvement can be to use LEDs that are more omnidirectional and spread the light evenly in all directions. Another solution can be to use multiple LEDs pointed in different directions. We can also try surrounding each LED with a reflective material that will reflect the IR light when the finger is pointed away from the Wii remote. In order to improve the gesture recognition reliability, we can try training the neural network with other types of inputs. For example, we can try using the cosine of each point instead of the x and y coordinates to establish more unique features. We can also try sampling the inputs. For example, instead of using every single point as an input, we can try using every second point.

Another improvement would be to use a Kalman filter to smooth the measurements of the IR position. The filter would help remove the measurement noise as well as the trembling of the user's hand. Finally, we could use other algorithms for gesture detection such as Support Vector Machines (SVM) or the Hidden Markov Model (HMM). SVM is a supervised learning algorithm that is often used for classification. In the Hidden Markov Model, the system is assumed to be a Markov process in which the hidden state is defined by the gesture that the user is performing.
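As an indication of what such smoothing might look like, here is a minimal one-dimensional Kalman filter in Python, applied to one coordinate at a time. It assumes a constant-position model, and the two variance values are hypothetical tuning parameters; this was not implemented in the project.

```python
from typing import List, Sequence


def kalman_smooth(measurements: Sequence[float],
                  process_var: float = 1e-4,
                  measurement_var: float = 1e-2) -> List[float]:
    """Smooth a sequence of noisy readings with a 1-D constant-position Kalman filter."""
    if not measurements:
        return []
    estimate = measurements[0]   # start from the first reading
    error_var = measurement_var  # initial uncertainty of that estimate
    smoothed = [estimate]
    for z in measurements[1:]:
        # Predict: the position is assumed unchanged, so only uncertainty grows.
        error_var += process_var
        # Update: blend prediction and measurement according to the Kalman gain.
        gain = error_var / (error_var + measurement_var)
        estimate += gain * (z - estimate)
        error_var *= 1 - gain
        smoothed.append(estimate)
    return smoothed
```

A larger `process_var` makes the cursor follow the hand more quickly at the cost of passing through more jitter, so the two variances would need to be tuned against real glove data.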

7 Conclusion
Gesture recognition is gaining popularity in today's world. Products such as the iPhone and the Nintendo Wii are testaments to this growing trend. The purpose of this project was to explore such an interface by designing a low-cost system that provides a range of functionality. This was achieved by developing an API that uses the Wii remote with an IR-light glove. The goal of providing a variety of gestures for different types of applications was accomplished. Moreover, the API also incorporates neural networks that can be trained to recognize various gestures. Although the performance of the network trained for this project was not optimal, the framework to support such functionality was implemented. The current version of the product does have its share of limitations; however, these can be addressed with additional improvements. In the end, the system developed can be extended for use in various applications, and with some further adjustments it could become a solid product that might be worth marketing.


REFERENCES
Research papers:
[1] K. Boehm, W. Broll, M. Sokolewicz, Dynamic gesture recognition using neural networks: a fundament for advanced interaction construction, in Stereoscopic Displays and Virtual Reality Systems, Proc. SPIE, Vol. 2177, 336 (1994); DOI:10.1117/12.173889, San Jose, CA, USA, November, 2004.
[2] M. Black, A. Jepson, Recognizing temporal trajectories using the Condensation algorithm, in Proceedings of the International Conference on Automatic Face and Gesture Recognition (Nara, Japan, 1998), pp. 16-21.
[3] Y. Yuan, K. Barner, Hybrid Feature Selection For Gesture Recognition Using Support Vector Machines, IEEEXplore, Accessed: March 30, 2009.
[4] J. C. Lee, Hacking the Nintendo Wii Remote, Pervasive Computing, IEEE, Volume: 7, Issue: 3, pp. 39-45, July 15 2008.
[5] T. Schlomer et al., Gesture Recognition with a Wii Controller, in Proceedings of the 2nd International Conference on Tangible and Embedded Interaction, Bonn, Germany, 2008.

