
Wii-Remote

Wiimote Infrared Detection


Gregory Peaker (Teammate: Max Chiswick)

Computer vision technology is used worldwide to reproduce human visual perception in a computer. The field is cross-disciplinary because it harnesses techniques from mathematics, biology, and other disciplines to emulate our eyes and brain. The most essential component is obtaining images and then processing them into information a computer can understand. A computer can process images and video and, to a small extent, understand their content. Much of computer vision is analogous to signal processing, where sampling, quantization, transformations, and so on are applied to images and video. This makes computer vision an interesting exercise in developing artificial intelligence, and it has many applications to society.

Most of the applications I've seen today are meant to aid human behavior. For instance, the DARPA Grand Challenge aims to develop fully autonomous vehicles to further technology that can provide driver assistance. DARPA vehicles use a range of sensors, including 2D imaging, 3D terrain mapping, and laser range finding, to develop information that is later understood by the computer. I find the integration of disparate technologies particularly interesting in developing autonomous driving. As humans, we most often rely on one sensory input for our actions (for instance, our walking is guided by vision, not by hearing, touching, smelling, or tasting). DARPA competitors make the task of integrating different sensor inputs into a serialized flow of information for computer processing seem like child's play. I am curious to see the novel technologies people will build tomorrow from techniques developed in the DARPA Grand Challenge and similar competitions.

Another useful application of computer vision is aiding sports broadcasting and analysis. I am an avid tennis player and enthusiast, and I am always amazed to see computer vision applied to the game. Multiple high-speed cameras and computer vision are used to track the tennis ball, and 3D visualization lets me view games in more detail and receive better analysis from sports broadcasters.

I am most interested in these technologies' applications to human-computer interaction. I am constantly surrounded by technology: I wake up in the morning to a digital alarm clock, then check my email over breakfast, dividing my time between a smartphone and cutlery. This pattern of pressing and typing on devices repeats all day until I go to sleep. I think tomorrow's world will see computer vision change how we interact with computers and data. Human gesture tracking can be used to play video games and to interact with objects on the screen. My project applies technology found in the Nintendo Wii game console, which lets people play video games and manipulate on-screen information, to demonstrate what computer vision can do for the human-computer interaction of tomorrow.

This class has taught me general histogram analysis algorithms and several morphological operators. I leave this class with very useful knowledge of color space models, including RGB and HSV. We have examined image matrices in spatial and polar form and used dilation and erosion in various combinations. Opening and closing morphological operations allow for edge/border detection, and this can commonly be seen in our digital cameras. We have analyzed and enhanced images using information about a specific pixel and its surrounding area as well as statistical analysis across a whole image. Image histograms show the distribution of pixel intensities in an image, and we have applied statistical measures, for instance mean and standard deviation, in our machine problems. Edge detection is used to find boundaries between objects; this is done by finding places in the image where pixel intensities change quickly, which is where derivatives come in. The Hough transform is used to find lines by modeling the parametric representation of a line (in normal form, every line can be written as ρ = x·cos θ + y·sin θ, so each edge pixel votes for all (ρ, θ) pairs consistent with it). Peak values in the Hough accumulator reveal possible lines in an image. This class has provided a foundation that enables me to apply my computer science acumen to a diverse range of problems facing today's computer vision.

Through the weekly machine problems I learned how to successfully simplify, plan, and complete projects in a timely manner. I am now more interested in the synergy of computer vision and human-computer interaction, and most importantly I have honed my engineering and analytical skills.

I would have enjoyed learning about machine learning in computer vision. Machine learning applies to my final project, and I think the success and results of this project would have greatly improved had I used machine learning in my gesture recognition algorithm. I would also want to study the importance of correlation in statistics. I would like to have learned more about image enhancement techniques, understanding the techniques and theories used in automatic color adjustment, exposure adjustment, and image de-blurring. We touched on this idea with histogram equalization, but I would like to have delved further in another class. Understanding linear, median, and adaptive filtering would come in handy for the occasional Photoshop project. My project deals with a new (and, in my opinion, very interesting) application of computer vision in the IR spectrum. I would like to have seen more real-world applications of computer vision and then learned the algorithms behind them and how they work. It would be interesting to learn the algorithms used in infrared multi-touch products like the Microsoft Surface. Learning more about face detection, red-eye removal, and other algorithms seen in digital cameras would be fascinating. Another area I would like to study further is the integration of 3D vision systems using lasers, range sensors, tomographic sensors, RADAR/LADAR sensors, and other imaging technologies. This type of integration can be seen in projects like DARPA's autonomous vehicle challenge.

Project Description

The goal of this project is to create a very inexpensive computer vision system that allows interaction between a computer and a human (human-computer interaction), and to do so in approximately one month's time. The interaction is performed by moving one hand or finger in a two-dimensional space and determining its best fit to one of twelve possible gestures/moves. Each gesture controls a specific function within a computer application, thus allowing a human to interact with a computer through the air. Information is obtained using an infrared camera. Moreover, I want to learn about infrared and gain working experience in this spectrum.

Problems with similar systems today are their large computing requirements and their ineffectiveness when lighting, background, or skin tone changes, weaknesses tied to working in the visible light spectrum. Additionally, interacting with computers is difficult because we are limited to movements of the mouse and keyboard. Humans do not interact with real-world objects the same way we interact with computers. For instance, to move a physical object to a desired location we open our hand, move the hand around the object, close our hand, move our hand again, and then open the hand. Using computers, we move our hand to the mouse, open then close our hand around the mouse, find the cursor position on the screen, move the mouse and corresponding cursor to our desired object, click a button, move the mouse, then release the mouse button when the object is in its desired location.

My Design

Thinking about the previous problem statement, I realized that the large computing requirements and the ineffectiveness across working environments are caused by problems with cameras in the visible light spectrum. I was determined not to use the visible light spectrum, leaving the radio frequency, microwave, infrared, ultraviolet, and other spectrums as viable alternatives. Further deduction turned up several existing computer vision systems using the infrared spectrum (most notably the Nintendo Wii game console and Microsoft's multi-touch Surface). It was then determined that an infrared detector and emitter were necessary. I placed a budgetary constraint of less than $50 on the equipment; this level matched my goal for an inexpensive system.

A detector (for instance a camera) and an emitter are required for this style of human-computer interaction. The budgetary requirement and the need to work in the infrared spectrum generated two viable options. First is the $40 Wii remote, which has a built-in 1024x768 infrared camera. Second is a $20-$50 webcam that is modifiable into an infrared camera (1). The second option was quickly eliminated because of the need for custom hardware modification (this would take too much time, and I do not have the necessary skills). The Wii remote was the remaining option for infrared detection. I decided to construct my own infrared emitter, since it only required an infrared LED, a resistor, and a battery pack. (As a rough sizing example, assuming a typical IR LED forward drop of about 1.5 V at 100 mA and a 3 V battery pack, Ohm's law gives a series resistor of about (3 V - 1.5 V) / 0.1 A = 15 Ω.)

Gesture recognition was chosen as the best method to demonstrate the viability of human-computer interaction using computer vision in the infrared spectrum. I used the Interlink VP6600 presentation remote to model the best computer functions for this demonstration (2) (Figure 1). This remote controls media and PowerPoint functions on its host computer, which comes out to eleven unique functions (Play/Pause, Previous music track, Next music track, Stop, Volume Up, Volume Down, Volume mute/unmute, Up, Down, Left, and Right, the last four controlling PowerPoint).

Figure 1: Interlink VP6600

Further research into gesture recognition showed that most systems use machine learning techniques such as neural networks. I decided those systems were outside my skill set and that I needed to create a simpler gesture recognition system. This restricted me to straight-line gestures, and I was able to come up with twelve uniquely recognizable gestures (Left, Right, Up, Down, Up-Left, Up-Right, Down-Left, Down-Right, Right-Up, Right-Down, Left-Up, and Left-Down; Figure 2). Using this framework of a Wii remote with a built-in infrared camera, an infrared LED, and a desired set of gestures, I set out to build the following system. Below I discuss the Wii Remote, how it connects to and is understood by the computer, and then how my gesture recognition algorithm works.

Figure 2: Twelve uniquely recognizable gestures

The Wii Remote (Wiimote)

The Wiimote is the primary input device for the Nintendo Wii gaming console. It is a one-handed remote control that supports very intuitive motion sensing. The remote is well designed for manipulating objects and characters on the screen, and this design has appealed to non-gamers. The manipulation combines the built-in accelerometer and the front-facing optical sensor, and the remote also has 11 input buttons (as seen in Figure 3). The device measures 5.8 inches long, 1.4 inches wide, and 1.2 inches tall. The built-in Bluetooth works up to thirty feet and the optical sensor works up to fifteen feet. A set of four LEDs indicates the player number assigned by the Wii console and shows the remaining battery charge in quartiles.

Figure 3: The Wii Remote

These features have made the Wii remote a popular hacking project; for instance, the accelerometer has been used as virtual drumsticks in the Virtual Drum Kit program, and the infrared capability is often used as a replacement for mouse input (3).

The Wiimote uses a standard Bluetooth wireless link (4). A Broadcom Bluetooth System-on-a-chip processes the eleven available button inputs and sends the optical sensor data to the host Bluetooth device. The standard Bluetooth Human Interface Device (HID) profile is implemented; this is the same standard any Bluetooth keyboard or mouse uses. A Bluetooth host uses the Bluetooth Service Discovery Protocol (SDP) to receive the vendor and product IDs, and all Wiimotes have the same IDs. This allows any application to query the operating system's Bluetooth stack for all available devices, and the IDs uniquely identify the Wiimote among all other Bluetooth devices. Communication is full duplex at up to 100Hz between the computer and the Wiimote, with all discrete packets equaling 22 bytes. Any button press or release triggers a new packet; moreover, no encryption or authentication is used. Most features of the Wiimote have been fully reverse-engineered; areas that have not been completed include advanced functionality of the IR camera and the built-in speaker.

A 3-axis linear accelerometer is housed near the center of the remote. The accelerometer uses tiny masses attached to silicon springs, and the movement of these springs causes voltage differences that are measured and used to determine the force applied to each mass. Acceleration then follows from the simple physics formula F = m*a, and the device is able to measure ±3g with 10% sensitivity. The microscopic design of this accelerometer intrinsically makes precise mass production difficult; however, the Wiimote performs a software calibration when it is first started and stores the result in memory. These calibration values are used to derive the Wii's acceleration and tilt/rotation values.

Figure 4: The coordinate system used for the accelerometers
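To make the calibration step concrete, here is a minimal C# sketch of how raw accelerometer bytes could be converted to g values from a stored zero point and a stored +1g reference. The structure and field names are my own illustration (the report only states that calibration data is stored in memory), so treat the layout as an assumption:

struct AccelCalibration
{
    public byte X0, Y0, Z0; // raw sensor output at 0g (hypothetical field names)
    public byte XG, YG, ZG; // raw sensor output at +1g along each axis
}

static class AccelConversion
{
    // Linear interpolation between the 0g and +1g calibration points:
    // the raw value scales linearly with acceleration.
    public static double RawToG(byte raw, byte zero, byte oneG)
    {
        return (raw - zero) / (double)(oneG - zero);
    }
}

For example, with zero = 128 and oneG = 154, a raw reading of 180 maps to (180 - 128) / (154 - 128) = 2.0g.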

During the Nintendo Wii's development, researchers found the accelerometer to be too inaccurate for cursor positioning. The engineers came up with the idea of adding an infrared image sensor and two stationary IR beacons. These IR beacons are housed in the Sensor Bar, which can be located above or below the TV. Each IR beacon consists of 5 IR LEDs: the LED farthest from the center points slightly away from the center, the LED closest to the center points slightly toward the middle, and the other three LEDs point straight ahead, most likely to maximize the Wiimote's field of view. This gives the Wiimote a range of about fifteen feet. Because the distance between the IR beacons is fixed, triangulation determines the remote's rotation and its distance from the TV.

The infrared sensor is a 1024x768 monochrome camera with an integrated IR-pass filter. Like the Bluetooth hardware, the camera is a System-on-a-chip design with a built-in processor capable of tracking up to four moving objects emitting IR light. Due to Bluetooth bandwidth constraints, the Wiimote is unable to send raw images back and relies on the built-in object tracking to send coordinate pairs (x,y) and intensity values for up to four moving objects.

Figure 5: The Nintendo Wii Sensor Bar and highlighted IR LEDs

Figure 6: IR sensor data is used to determine where a cursor should be on the screen
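The triangulation idea fits in a few lines of C#. With a known physical spacing between the two beacons and a known camera field of view, the pixel separation of the two tracked dots gives the distance to the Sensor Bar. The beacon spacing (~20 cm) and the field of view (~40 degrees) below are assumptions of mine, not values from this report:

using System;

class SensorBarDistance
{
    const double BeaconSeparationMeters = 0.20;                 // assumed spacing between beacon centers
    const double HorizontalFovRadians = 40.0 * Math.PI / 180.0; // assumed camera field of view
    const int SensorWidthPixels = 1024;

    // Estimate the camera-to-Sensor-Bar distance from the two tracked dots.
    static double EstimateDistance(double x1, double y1, double x2, double y2)
    {
        double pixelSeparation = Math.Sqrt((x2 - x1) * (x2 - x1) + (y2 - y1) * (y2 - y1));
        // Angle subtended by the two beacons as seen from the camera.
        double angle = HorizontalFovRadians * pixelSeparation / SensorWidthPixels;
        // Simple pinhole geometry: half the separation over the tangent of half the angle.
        return (BeaconSeparationMeters / 2.0) / Math.Tan(angle / 2.0);
    }

    static void Main()
    {
        // Example: dots 100 pixels apart imply roughly 2.9 meters under these assumptions.
        Console.WriteLine(EstimateDistance(462, 384, 562, 384));
    }
}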

Wiimote C# API

Using .NET and C#, a reverse-engineered API for the Wiimote has been developed on Windows. There are additional APIs for Linux and Mac, but I will focus my attention on a single .NET API. It can be included in any program as a Dynamic Link Library (DLL). The steps for using this API are very simple: first, pair the Wiimote with the computer's Bluetooth stack, which installs generic keyboard/mouse drivers; second, initialize the DLL, and it will automatically search for, find, connect to, and start retrieving data from the Wiimote.

A button in the battery compartment places the Wiimote into pairing mode. Once found and paired with a computer, it is identified as a Human Interface Device (HID)-compliant device and generic Win32/Win64 drivers are installed. The P/Invoke feature of .NET allows C# to send information to and receive information from Windows API functions. In other words, the Wiimote is paired with the Bluetooth stack, Windows API functions are able to access the Bluetooth stack, and P/Invoke in C# accesses the Windows API functions.

The API searches for and then connects to the Wiimote in this order:

1. Attain references to the GUID and HID classes in Windows.
2. Receive a list of all HID devices on the computer.
3. Loop through the list and receive detailed information on each device.
4. Select all devices that match the Wiimote's Vendor ID and Product ID (this allows multiple Wiimotes to connect simultaneously).
5. Create a FileStream object for each device.
6. Disconnect from the classes used in steps 1 and 2.

Figure 7: The communication path. The Wiimote is paired with Windows using Bluetooth and appears in the Bluetooth stack; Windows has built-in API calls for bi-directional communication with devices in the Bluetooth stack; C# reaches those Windows API calls through P/Invoke.
The FileStream object allows bi-directional communication between the computer and the Wiimote. All data is sent and received as discrete packets and saved in a 22 byte buffer; the API allows both events and polling when receiving data. The API parses each packet and wraps the information in a class with members defining the state of each button and doubles containing the coordinates (x,y) and intensity of all four tracked objects. An event forces the update function to be called every time the 22 byte buffer is full, while polling lets the application query the API for the Wiimote's last packet. Packets can be sent to the Wiimote at any time; polling or event querying is not necessary.
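A minimal sketch of how an application might consume such an API is shown below. The class and member names (Wiimote, WiimoteChanged, IRSensors, and so on) are illustrative stand-ins for whatever the actual managed API exposes, so treat them as assumptions rather than the library's real surface:

using System;

class WiimoteDemo
{
    static void Main()
    {
        var wiimote = new Wiimote();                // hypothetical wrapper class
        wiimote.WiimoteChanged += OnWiimoteChanged; // fires each time a 22 byte packet arrives
        wiimote.Connect();                          // performs steps 1-6 above and opens the FileStream
        Console.ReadLine();                         // keep handling events until Enter is pressed
        wiimote.Disconnect();
    }

    static void OnWiimoteChanged(object sender, WiimoteChangedEventArgs e)
    {
        // The parsed packet exposes button states plus (x,y) and intensity
        // for up to four tracked IR objects; here we read only the first.
        var ir = e.State.IRSensors[0];
        if (ir.Found)
            Console.WriteLine("IR object 0 at ({0}, {1}), intensity {2}", ir.X, ir.Y, ir.Intensity);
    }
}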

Gesture Recognition

Gesture recognition is becoming a common tool for navigating the host of applications on computers. It is possible to have an unlimited number of gestures; for instance, a gesture could be a drawing of a computer or of a human. One would not want too many gestures, though, because it becomes much harder to remember them all and easy to confuse how to draw one gesture versus another. I use a simple IR LED, resistor, and battery pack circuit to create an IR emitter, and I tape this emitter to my pointer finger, demonstrating pointing/gesture functionality with a single finger. My IR LED gesture recognition algorithm has a simple, effective, and elegant design:


1. A ticker running at 1Hz keeps a record of the coordinate (x,y) of the first tracked IR object for the last two ticks.
   a. If all three positions (the current one and the last two ticks) are within a certain threshold of one another, gesture recognition starts (this dwell acts like a button press for gestures).
2. Using the event notification of the C# Wiimote API, a list of coordinates for the first tracked IR object is stored in an ArrayList object.
3. Since the ticker still fires every second (from step 1), the gesture ends when the last two ticks are within a certain threshold of one another (as in step 1a). At gesture end, a function determines which of the twelve possible gestures has been performed.

The gesture recognition function works as follows (a code sketch appears after this list):

- Compare the final (x,y) to the initial (x,y) and obtain the variables x_diff and y_diff.
- If x_diff > threshold, the gesture has an East or West component; if y_diff > threshold, a North or South component.
- If neither exceeds the threshold, there was no motion.
- If exactly one exceeds the threshold, the gesture is a basic North, South, East, or West.
- If both exceed the threshold, check whether one direction is significantly greater than the other; if so, interpret the gesture as only the larger one (for example, 12 units east and 2 units north was probably intended to be only East).
- If both exceed the threshold and neither is significantly greater than the other:
  - Create a line between the start and end points of the gesture array.
  - Count every (x,y) element between the start and end as above or below that line.
  - Use this information to determine, for example, whether the gesture is South-East or East-South.
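Here is a self-contained C# sketch of that classification logic. The threshold and dominance values are illustrative choices of mine, not figures from the report, and the coordinate convention assumes y grows downward as in camera images:

using System;
using System.Collections.Generic;

enum Gesture
{
    None, Up, Down, Left, Right,
    UpLeft, UpRight, DownLeft, DownRight,
    LeftUp, LeftDown, RightUp, RightDown
}

struct Point2
{
    public double X, Y;
    public Point2(double x, double y) { X = x; Y = y; }
}

static class GestureClassifier
{
    const double Threshold = 50.0;     // illustrative: minimum travel, in camera pixels
    const double DominanceRatio = 3.0; // illustrative: when one axis clearly dominates

    public static Gesture Classify(List<Point2> path)
    {
        Point2 start = path[0], end = path[path.Count - 1];
        double dx = end.X - start.X; // positive = East
        double dy = end.Y - start.Y; // positive = South (y grows downward)

        bool horizontal = Math.Abs(dx) > Threshold;
        bool vertical = Math.Abs(dy) > Threshold;

        if (!horizontal && !vertical) return Gesture.None;
        if (horizontal && !vertical) return dx > 0 ? Gesture.Right : Gesture.Left;
        if (vertical && !horizontal) return dy > 0 ? Gesture.Down : Gesture.Up;

        // Both exceed the threshold: if one is much larger, keep only that one.
        if (Math.Abs(dx) > DominanceRatio * Math.Abs(dy)) return dx > 0 ? Gesture.Right : Gesture.Left;
        if (Math.Abs(dy) > DominanceRatio * Math.Abs(dx)) return dy > 0 ? Gesture.Down : Gesture.Up;

        // Otherwise decide which leg came first by counting points on each side
        // of the straight line from start to end; the bend reveals the order.
        int horizontalFirst = 0, verticalFirst = 0;
        double orientation = dx * dy;
        foreach (Point2 p in path)
        {
            // Cross product: which side of the start->end line this point lies on.
            double side = dx * (p.Y - start.Y) - dy * (p.X - start.X);
            if (side == 0) continue;
            if (Math.Sign(side) == Math.Sign(orientation)) verticalFirst++;
            else horizontalFirst++;
        }

        if (horizontalFirst >= verticalFirst)
            return dx > 0 ? (dy > 0 ? Gesture.RightDown : Gesture.RightUp)
                          : (dy > 0 ? Gesture.LeftDown : Gesture.LeftUp);
        return dy > 0 ? (dx > 0 ? Gesture.DownRight : Gesture.DownLeft)
                      : (dx > 0 ? Gesture.UpRight : Gesture.UpLeft);
    }
}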


My Approach & Design

Using off-the-shelf, inexpensive components (the Wiimote, an infrared emitter made from an LED, resistor, and battery pack, and free, open source software), I am able to have the Wiimote detect and track IR spots and transmit this information to a computer application. The Wiimote is connected through Bluetooth, and data is sent full duplex at 100Hz. The Wiimote API parses incoming data packets and creates an object storing information from the most recently received packet. I do not use the button press, acceleration, or IR intensity information; I focus only on the coordinates (x,y) of the first tracked object. The gesture recognition algorithm detects when a gesture is started, records movement at the same rate data is transmitted from the remote to the computer (typically 100Hz), and recognizes when the gesture ends. This information is fed into a function that returns one of twelve direction combinations. The gestures are mapped to keyboard presses according to Table 1.

Table 1: Gesture-to-keyboard mapping

Gesture       Keyboard Action
Down          Down
Up            Up
Left          Left
Right         Right
Down-Left     Previous Track
Up-Left       Volume Down
Up-Right      Volume Up
Down-Right    Next Track
Left-Down     Mute/Unmute
Left-Up       Mute/Unmute
Right-Up      Pause
Right-Down    Stop
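To show how Table 1 could be wired up, here is a sketch that sends the mapped key presses through the Win32 keybd_event function. The keybd_event signature and VK_ virtual-key codes are real Win32 definitions; the Gesture enum is the one sketched earlier, and the overall wiring is my own illustration rather than the project's confirmed implementation:

using System;
using System.Runtime.InteropServices;

static class GestureKeys
{
    [DllImport("user32.dll")]
    static extern void keybd_event(byte bVk, byte bScan, uint dwFlags, UIntPtr dwExtraInfo);

    const uint KEYEVENTF_KEYUP = 0x0002;

    // Standard Windows virtual-key codes.
    const byte VK_LEFT = 0x25, VK_UP = 0x26, VK_RIGHT = 0x27, VK_DOWN = 0x28;
    const byte VK_VOLUME_MUTE = 0xAD, VK_VOLUME_DOWN = 0xAE, VK_VOLUME_UP = 0xAF;
    const byte VK_MEDIA_NEXT_TRACK = 0xB0, VK_MEDIA_PREV_TRACK = 0xB1;
    const byte VK_MEDIA_STOP = 0xB2, VK_MEDIA_PLAY_PAUSE = 0xB3;

    // Map a recognized gesture to a key press according to Table 1.
    public static void Send(Gesture g)
    {
        byte vk;
        switch (g)
        {
            case Gesture.Down:      vk = VK_DOWN; break;
            case Gesture.Up:        vk = VK_UP; break;
            case Gesture.Left:      vk = VK_LEFT; break;
            case Gesture.Right:     vk = VK_RIGHT; break;
            case Gesture.DownLeft:  vk = VK_MEDIA_PREV_TRACK; break;
            case Gesture.UpLeft:    vk = VK_VOLUME_DOWN; break;
            case Gesture.UpRight:   vk = VK_VOLUME_UP; break;
            case Gesture.DownRight: vk = VK_MEDIA_NEXT_TRACK; break;
            case Gesture.LeftDown:  vk = VK_VOLUME_MUTE; break;
            case Gesture.LeftUp:    vk = VK_VOLUME_MUTE; break;
            case Gesture.RightUp:   vk = VK_MEDIA_PLAY_PAUSE; break; // Pause
            case Gesture.RightDown: vk = VK_MEDIA_STOP; break;       // Stop
            default: return; // Gesture.None: nothing to send
        }
        keybd_event(vk, 0, 0, UIntPtr.Zero);               // key down
        keybd_event(vk, 0, KEYEVENTF_KEYUP, UIntPtr.Zero); // key up
    }
}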

