Jessica Areias Forbes Yevgeniy Goldenberg Jeffrey Maqsoudi Ashson Mirza Ritvik Mudur
ECSE 475 Design Project 2 Presented to: Kenneth Fraser Coordinator Frank P. Ferrie Supervisor April 14, 2009 McGill University - Electrical & Computer Engineering Department
ABSTRACT
The purpose of this project is to explore and develop an API that supports simple hand gestures. The hardware used includes two Wii remotes, a Bluetooth adapter, and infrared lights. The Wii remote acts as an infrared camera. With a glove having infrared light emitters on the fingertips, the setup will allow the API to distinguish gestures such as pinching fingers, turning hands, and so forth. The goal is to use positional information and temporal trajectories to track gestures.
ACKNOWLEDGEMENTS
We would like to thank our supervisor, Professor Frank Ferrie, for his continued guidance, support and encouragement. We would also like to thank Professor Joelle Pineau for her advice and guidance in implementing our neural network. We would also like to take this opportunity to thank Professor Kenneth Fraser, our course coordinator, for his moral support and attendance at our presentation.
TABLE OF CONTENTS
ABSTRACT
ACKNOWLEDGEMENTS
1 INTRODUCTION
  1.1 Why Wii?
  1.2 Design Goals
2 Building the System
  2.1 Setup
  2.2 The Glove
  2.3 The API
    Code Structure and Dataflow
    Supported Functionality
    Gesture Detection
    Neural Networks
3 Sample Applications
  3.1 Pong
  3.2 Space Invaders
  3.3 Paint
  3.4 3D Modeling
  3.5 Image Viewer
4
  4.1
  4.2
5 Limitations
6 Future Improvements
7 Conclusion
1 INTRODUCTION
The main goal of this design project is to replace mouse functions with hand gestures, which can be more natural and intuitive to use. To accomplish this, an API was created to recognize basic gestures.
2.1 Setup
The setup includes two Wii remotes placed in parallel, adjacent to each other. The user wears a glove with infrared LEDs on the index finger and thumb. The glove must be approximately one foot away from the Wii remotes.
Figure 1. Setup
level, as well as prevent the LEDs from burning out. After building this circuit on a breadboard and applying 3.0 V as the source, we found that using 22 ohm resistors achieved the specifications outlined above. Therefore, in order to build the device, we needed a glove, two LEDs, some wires, two AA batteries, a battery holder, some electrical tape and two 22 ohm resistors. The device can be seen in Figure 3. Using a single 1.5 V battery instead would require removing the resistors from the circuit entirely. That setup works, but it risks burning out the LEDs, so the design below was chosen.
[Circuit diagram labels: two 1.5 V cells; two resistors, R = 22 ohm each]
Figure 3. IR Glove
Note that the most basic gestures were implemented before attempting the more complex ones. Basic gestures include pinching and scrolling, while the more advanced gestures include zooming in and out, as well as other gestures that differ in their execution from one person to another.
2.3.2.1 Pinch
The pinch is the equivalent of a mouse click and can be used to select an object or to initiate a more complex gesture. To detect a pinch, a threshold distance (delta) between the two IR lights was chosen. Once the lights come within this distance of each other, a pinch is detected. Through experimentation, it was found that, depending on the orientation of the user's hand and of the IR lights, the Wii remote could lose track of one of the IR lights during a pinch attempt. As a result, the pinch went undetected. To resolve this problem, an Almost Pinch state was created. The Almost Pinch state requires a distance slightly above the delta of a Pinch. If one of the IR lights is lost right after having entered the Almost Pinch state, a pinch is detected.
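The Pinch and Almost Pinch logic described above can be sketched as a small state machine. This is a minimal illustration, not the project's actual code: the names `PinchState` and `next_state` and both threshold values are hypothetical, and coordinates are assumed to be normalized camera units.

```python
import math
from enum import Enum

# Hypothetical thresholds in normalized camera units; the report does not
# state the values it used.
PINCH_DELTA = 0.05
ALMOST_PINCH_DELTA = 0.08


class PinchState(Enum):
    OPEN = 0
    ALMOST_PINCH = 1
    PINCH = 2


def next_state(state, points):
    """Advance the pinch state from the IR points visible in one frame.

    `points` is a list of (x, y) tuples: two entries when both LEDs are
    visible, fewer when the camera has lost one.
    """
    if len(points) >= 2:
        (x1, y1), (x2, y2) = points[0], points[1]
        distance = math.hypot(x2 - x1, y2 - y1)
        if distance <= PINCH_DELTA:
            return PinchState.PINCH
        if distance <= ALMOST_PINCH_DELTA:
            return PinchState.ALMOST_PINCH
        return PinchState.OPEN
    # One LED lost: count it as a pinch only if the fingers were already close.
    if len(points) == 1 and state in (PinchState.ALMOST_PINCH, PinchState.PINCH):
        return PinchState.PINCH
    return PinchState.OPEN
```

Keeping the state at PINCH while one light stays hidden is a design choice made for this sketch; it avoids releasing the "click" while the fingers still occlude each other.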
2.3.2.2 Scroll
Scrolling is the equivalent of using a mouse wheel and can be used to scroll an image up, down, left, or right. To detect a scroll, the cursor, which maps the location of the IR light, must be in a specific region of the screen or image. To scroll left or right, for example, the cursor must be detected on the far left or far right of the image respectively. Similarly, to scroll up or down an image, the cursor must be located in the topmost or bottommost region of the image. Once the specific condition is detected, the image is scrolled in the corresponding direction.
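The region test for scrolling can be illustrated with a short function. This is a sketch under assumptions: the cursor position is taken as normalized to [0, 1] with the origin at the top-left corner, and the function name `scroll_direction` and the `margin` width are hypothetical.

```python
def scroll_direction(x, y, margin=0.1):
    """Return (dx, dy) scroll steps for a cursor at normalized (x, y).

    The origin is assumed to be the top-left corner, so y grows downward.
    `margin` is the hypothetical width of each edge region.
    """
    dx = -1 if x < margin else (1 if x > 1.0 - margin else 0)
    dy = -1 if y < margin else (1 if y > 1.0 - margin else 0)
    return dx, dy
```

A result of `(0, 0)` means the cursor is outside every scroll region, while a corner position yields a diagonal scroll.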
2.3.2.3 Zoom
The zoom gesture can be used to zoom in and out of an image. To detect a zoom gesture, two Wii remotes are needed to track the changes in depth (z-coordinate) of the point of light. The IR light must be moved towards the Wii remotes to zoom in and away from them to zoom out. To measure the depth, two Wii remotes must be placed in parallel as shown in Figure 1. The distance between the Wii remotes must be preset. This allows the API to triangulate and determine the depth of each IR light detected by both Wii remotes.
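For two parallel cameras, depth can be estimated from the horizontal disparity of the same light in both images. The sketch below uses the standard parallel-stereo relation (depth = focal length x baseline / disparity). It is not taken from the project's code: the function names, the 1024-pixel image width and the 45 degree field of view are assumptions chosen for illustration.

```python
import math


def depth_from_disparity(x_left, x_right, baseline_cm,
                         image_width_px=1024, fov_deg=45.0):
    """Estimate depth (cm) of one IR light seen by two parallel cameras.

    x_left / x_right are the light's horizontal pixel positions in the left
    and right camera. The image width and field of view are assumed values.
    """
    focal_px = (image_width_px / 2.0) / math.tan(math.radians(fov_deg) / 2.0)
    disparity = x_left - x_right
    if disparity <= 0:
        return None  # No valid depth: light at infinity or points mismatched.
    return focal_px * baseline_cm / disparity


def zoom_step(previous_depth_cm, depth_cm, min_change_cm=1.0):
    """Return 'in' when the light moved closer, 'out' when it moved away."""
    change = depth_cm - previous_depth_cm
    if change <= -min_change_cm:
        return "in"
    if change >= min_change_cm:
        return "out"
    return None
```

The `min_change_cm` dead band is a hypothetical addition that keeps small depth jitter from triggering a zoom.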
Middle points is greater than the threshold. If both are true, the gesture is marked as detected and the application is notified. Using this approach, a single gesture can be detected many times because many subsequent points can match the above criteria. To remedy this, once a gesture is detected, a timer is started and gesture detection is halted until the timer expires.
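The timer that suppresses repeated detections can be sketched as follows. The class name `GestureCooldown` and the one-second default are hypothetical; the clock is injectable only so that the behaviour can be checked without waiting.

```python
import time


class GestureCooldown:
    """Suppress repeated detections of the same gesture for a fixed interval."""

    def __init__(self, cooldown_s=1.0, clock=time.monotonic):
        self.cooldown_s = cooldown_s
        self.clock = clock
        self._blocked_until = float("-inf")

    def report(self, detected):
        """Return True only for detections that fall outside the cooldown."""
        now = self.clock()
        if not detected or now < self._blocked_until:
            return False
        self._blocked_until = now + self.cooldown_s
        return True
```

An application would call `report(...)` once per frame with the raw detection result and notify its listeners only when the call returns `True`.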
Another method of detecting gestures is using a Finite State Machine. This allows the detection of more complicated gestures such as the X and the pinch. The last method used was Neural Networks. This approach is explained in detail in the following section.
learning process, they attempt to minimize the error when comparing it to the ideal gesture. In theory, no human can draw a perfect circle, but it is still possible to draw a shape that most people would consider to be a circle. In addition to being an intuitive way of recognizing new gestures, neural networks also offer other advantages. First, since a neural network receives many trials as its input, on the order of 10,000 data sets, it becomes very good at recognizing patterns. As mentioned above, no human can draw a perfect circle, but most people draw circles with similar characteristics. Also, neural networks are easy to implement because there is abundant literature on the subject. Many have used this tool to recognize shapes and patterns. Finally, once trained, neural networks are very fast and have a constant run-time. Training only needs to be done once at the beginning, and the same trained network is used for all subsequent gesture recognitions.
As shown in Figure 7, a neural network has several inputs in its input layer. It may also have some middle layers and an output layer. For this project's design, middle layers were not used, since they were not considered useful for gesture recognition. The neural network used has many inputs and one target output, which is a perfect normalized circle. The inputs are sent across the network with random initial weights and the result is compared to the target output. At that point, the error is calculated and its gradient is propagated back into the network to update the weights of the inputs (gradient descent). The best inputs will ultimately have the highest weights.

To recognize a circular motion, ten thousand circles consisting of 100 points each were generated. Each circle has a random origin and radius. After a circle is generated, Gaussian noise is added to it to mimic the imperfections of a human gesture, as seen in Figure 8. These inputs are given a target value of 1 since they should produce the expected output. With this input set, the program was able to recognize circles, but there were several false positives: when a long oval was drawn, it was detected as a circle. To resolve this issue, unwanted inputs such as lines (as seen in Figure 9) were introduced. These inputs are given the value 0 since they must not match the expected output.
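The training set described above (noisy circles labelled 1, lines labelled 0) could be generated along these lines. This is a sketch, not the project's generator: the function names, the noise level, the coordinate ranges and the normalization scheme are all assumptions.

```python
import math
import random

POINTS_PER_GESTURE = 100


def normalize(points):
    """Center on the centroid and scale so the largest |coordinate| is 1."""
    cx = sum(x for x, _ in points) / len(points)
    cy = sum(y for _, y in points) / len(points)
    centered = [(x - cx, y - cy) for x, y in points]
    scale = max(max(abs(x), abs(y)) for x, y in centered) or 1.0
    return [(x / scale, y / scale) for x, y in centered]


def noisy_circle(rng, noise_sigma=0.03):
    """Circle with random origin and radius plus Gaussian noise."""
    cx, cy = rng.uniform(-1, 1), rng.uniform(-1, 1)
    radius = rng.uniform(0.2, 1.0)
    pts = []
    for i in range(POINTS_PER_GESTURE):
        angle = 2 * math.pi * i / POINTS_PER_GESTURE
        pts.append((cx + radius * math.cos(angle) + rng.gauss(0, noise_sigma * radius),
                    cy + radius * math.sin(angle) + rng.gauss(0, noise_sigma * radius)))
    return normalize(pts)


def noisy_line(rng, noise_sigma=0.03):
    """Straight stroke between two random endpoints plus Gaussian noise."""
    x0, y0 = rng.uniform(-1, 1), rng.uniform(-1, 1)
    x1, y1 = rng.uniform(-1, 1), rng.uniform(-1, 1)
    pts = []
    for i in range(POINTS_PER_GESTURE):
        t = i / (POINTS_PER_GESTURE - 1)
        pts.append((x0 + t * (x1 - x0) + rng.gauss(0, noise_sigma),
                    y0 + t * (y1 - y0) + rng.gauss(0, noise_sigma)))
    return normalize(pts)


def make_dataset(n_circles, n_lines, seed=0):
    """Return a shuffled list of (points, label) pairs: 1 = circle, 0 = line."""
    rng = random.Random(seed)
    data = [(noisy_circle(rng), 1) for _ in range(n_circles)]
    data += [(noisy_line(rng), 0) for _ in range(n_lines)]
    rng.shuffle(data)
    return data
```

Because long ovals were reported as false positives, a generator of elongated ellipses labelled 0 would be a natural extension of the same pattern.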
The neural network was first trained with 10,000 desired inputs and 4,000 unwanted inputs. After training, to ensure that good results are obtained, the trained neural network is run on a separate validation set that was not used during the training. When a user uses the glove to depict a motion, the software tracks the last 100 points at any given moment. These 100 points are sent to the network and a value is output. If this value is greater than a certain threshold, the motion is considered to be a circle. As previously mentioned, it is impossible to draw a perfect circle, but if a value greater than 0.985 is output, then the motion is classified as a circle. The testing must also reject non-circles, which it does whenever their output does not exceed the threshold. The final neural network is able to successfully recognize a circle about 3.5 times out of 5 and is also capable of ignoring lines. It is unable to consistently recognize circles and reject false positives because our input set is quite limited, covering only a few gestures. This explains why motions such as the half-circle are being identified as a full circle.
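The run-time side of this scheme (keep the last 100 points, score them, compare against 0.985) can be sketched with a fixed-length buffer. The class name `CircleDetector` is hypothetical, and `score_fn` stands in for the trained network, whose interface the report does not describe.

```python
from collections import deque

WINDOW = 100        # Number of most recent points sent to the network.
THRESHOLD = 0.985   # Output above this value counts as a circle.


class CircleDetector:
    """Feed tracked points one at a time and test the latest full window."""

    def __init__(self, score_fn, window=WINDOW, threshold=THRESHOLD):
        # score_fn is a placeholder for the trained network: it takes a list
        # of (x, y) points and returns a single output value.
        self.score_fn = score_fn
        self.threshold = threshold
        self.points = deque(maxlen=window)

    def add_point(self, x, y):
        """Record a point; return True if the current window scores as a circle."""
        self.points.append((x, y))
        if len(self.points) < self.points.maxlen:
            return False  # Not enough history yet.
        return self.score_fn(list(self.points)) > self.threshold
```

Since consecutive windows overlap almost entirely, a detector like this would typically be paired with a cooldown so that one drawn circle is reported once.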
3 Sample Applications
This section provides a brief sample of the applications of our product: playing games such as Pong and Space Invaders, drawing using Paint, manipulating 3D objects using a 3D Modeling tool, and viewing and scrolling images in an Image Viewer application.
3.1 Pong
This application binds the location of the paddle to the vertical position of the user's hand. The user can then move the IR light up and down and the paddle moves along with it. The point of this application was to verify the sensitivity of the sensor as well as the responsiveness of the control.
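The binding between hand height and paddle position amounts to a clamped linear mapping. The following sketch assumes a vertical IR reading normalized to [0, 1] with 0 at the top of the screen; the function name `paddle_top` and that convention are assumptions.

```python
def paddle_top(ir_y, screen_height, paddle_height):
    """Map a normalized vertical IR reading to the paddle's top edge in pixels."""
    ir_y = min(max(ir_y, 0.0), 1.0)  # Clamp readings that fall off-screen.
    return round(ir_y * (screen_height - paddle_height))
```

Scaling by `screen_height - paddle_height` keeps the whole paddle on screen at both extremes.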
3.3 Paint
Paint is a common application found in the Windows operating system. With a simple pinch, it is possible to select a tool and then draw a picture.
3.4 3D Modeling
By using the pinch movement, it is possible to select a vertex or edge and stretch the model. It is also possible to change the angle of view by pinching on an open area and moving one's fingers.
Figure 14. Image Viewer Screenshot showing the region where the cursor must be to scroll up the image
moving the Wii remotes, the glove's IR light was held in place at three distinct distances from the remotes over five trials. The accuracy of the depth measure and the field of view provided by the Wii remotes were also measured. All trials were performed by the same user. The results of all tests are summarized below.
Table 1a. Variation (maximum) measured when testing the repeatability of our system.
Attribute Tested                 Order of variation
                                 15 cm          20 cm          30 cm
IR Tracking - Horizontal Axis    10^-2 units    10^-2 units    10^-3 units
IR Tracking - Vertical Axis      10^-2 units    10^-2 units    10^-3 units
Depth measure                    10^-1 cm       10^-1 cm       10^-1 cm

Feature
Field of View (Horizontal)
Field of View (Vertical)
Depth measure accuracy
The variation on the horizontal and vertical axes was measured in units of raw data provided by the Wii remote. The order of variation seen in Table 1a corresponds to roughly 1 to 10 pixels (10^-3 to 10^-2 units) on a 1024 x 768 resolution screen. Bearing in mind that a certain amount of error is caused by the user's hand when moving the glove, the repeatability results for both depth and infrared light tracking are encouraging.
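As a quick check of that mapping, and assuming the raw units are normalized so that one unit spans the full screen width (an assumption inferred from the figures quoted here, not stated explicitly), the horizontal conversion works out as:

```latex
\begin{align*}
10^{-3}\ \text{units} \times 1024\ \text{px/unit} &\approx 1\ \text{px} \\
10^{-2}\ \text{units} \times 1024\ \text{px/unit} &\approx 10\ \text{px}
\end{align*}
```

Under the same assumption, the 768-pixel vertical axis gives roughly 0.8 and 7.7 pixels for the same variations.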
number of successes was counted. The inexperienced users were provided a brief tutorial in using the system before their trials. A view of the IR lights being tracked was provided to all users. The results of these tests are summarized in the table below.
Table 2. Gesture recognition results
From Table 2, it is apparent that the success rate drops for inexperienced users. One of the main causes of missed gestures (for both classes of users) was the Wii remote losing track of the IR lights on the fingertips. Inexperienced users are not aware of how to orient their hands to ensure that the IR lights are seen by the Wii remote's camera, which is something learned from experience.
5 Limitations
There are some limitations to our project, which are described in this section. Firstly, only a limited number of gestures are properly detected. Users can only pinch and move up/down or left/right, or perform some combination of these gestures. This limitation is mainly due to the fact that we did not have enough time to implement more complex gestures. As previously mentioned, neural networks have been used to recognize new gestures, but this requires proper training and a wide range of gesture examples.

Secondly, it is difficult to continuously track the LEDs. This is mainly due to the brightness of the LEDs and the surrounding environment. If the LEDs are not bright enough, it is difficult for the Wii remote to detect them. A similar issue is distance. The Wii remote can only detect the LEDs up to a certain distance, beyond which they are too far away to be detected. Another problem is the orientation of the LEDs with respect to the Wii remote. The LEDs have to be pointed directly towards the Wii remote; otherwise the Wii remote will not be able to detect them. Similarly, if the IR lights are outside the field of view of the Wii remote, it can no longer track the LEDs.
6 Future Improvements
One of the main hardware limitations is the unidirectional aspect of the IR light LEDs. A possible improvement can be to use LEDs that are more omnidirectional and spread the light evenly in all directions. Another solution can be to use multiple LEDs pointed in different directions. We can also try surrounding each LED with a reflective material that will reflect the IR light when the finger is pointed away from the Wii remote. In order to improve the gesture recognition reliability, we can try training the neural network with other types of inputs. For example, we can try using the cosine of each point instead of the x and y coordinates to establish more unique features. We can also try sampling the inputs. For example, instead of using every single point as an input, we can try using every second point.
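The two input variations suggested above can be illustrated briefly. The report does not define "the cosine of each point"; the sketch below interprets it as the cosine of each stroke segment's heading relative to the x axis, which is an assumption, and both function names are hypothetical.

```python
import math


def subsample(points, step=2):
    """Keep every `step`-th point, e.g. every second point for step=2."""
    return points[::step]


def direction_cosines(points):
    """Cosine of each segment's heading relative to the x axis.

    This is one possible reading of the proposed cosine feature. It does not
    depend on where the gesture is drawn or how large it is.
    """
    features = []
    for (x0, y0), (x1, y1) in zip(points, points[1:]):
        length = math.hypot(x1 - x0, y1 - y0)
        features.append((x1 - x0) / length if length else 0.0)
    return features
```

For a unit square traced from the origin, `direction_cosines` returns `[1.0, 0.0, -1.0]` for the first three sides.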
Another improvement would be to use a Kalman filter to smooth the measurements of the IR position. The filter would help remove the measurement noise as well as the trembling of the user's hand. Finally, we can use other algorithms for gesture detection such as Support Vector Machines (SVM) or the Hidden Markov Model (HMM). SVM is a supervised learning algorithm that is often used for classification. In the Hidden Markov Model, the system is assumed to be a Markov process where the hidden state is defined by the gesture that the user is performing.
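To give an idea of what the proposed smoothing could look like, here is a minimal scalar Kalman filter with a random-walk state model, applied to one coordinate at a time. It is a sketch only: the class name `ScalarKalman` and both variance values are hypothetical and would need tuning against real Wii remote data.

```python
class ScalarKalman:
    """One-dimensional Kalman filter with a random-walk (constant-position) model."""

    def __init__(self, process_var=1e-4, measurement_var=1e-2):
        self.q = process_var      # How quickly the true position may drift.
        self.r = measurement_var  # Variance of the sensor noise and hand tremor.
        self.x = None             # Current estimate.
        self.p = 1.0              # Variance of the current estimate.

    def update(self, z):
        """Fold in measurement z and return the smoothed estimate."""
        if self.x is None:
            self.x = z            # The first measurement initializes the state.
            return self.x
        self.p += self.q                  # Predict: uncertainty grows.
        k = self.p / (self.p + self.r)    # Kalman gain.
        self.x += k * (z - self.x)        # Correct toward the measurement.
        self.p *= (1.0 - k)
        return self.x
```

One filter instance would be used per coordinate (x, y, and depth). A larger `process_var` follows fast hand motion more closely at the cost of less smoothing.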
7 Conclusion
Gesture recognition is gaining popularity in today's world. Products such as the iPhone and the Nintendo Wii are testaments to this growing trend. The purpose of this project was to explore such an interface by designing a low-cost system that provides a range of functionality. This was achieved by developing an API that uses the Wii remote with an IR-light glove. The goal of providing a variety of gestures for different types of applications was accomplished. Moreover, the API also incorporates neural networks that can be trained to recognize various gestures. Although the performance of the network trained for this project was not optimal, the framework to support such functionality was implemented. The current version of the product does have its share of limitations; however, these can be addressed with additional improvements. In the end, the system developed can be extended for use in various applications, and with some further adjustments, it could become a solid product that might be worth marketing.