



(080107122003) (080107122037) (080107122053)

MATHAN PRASAD.S (080107122054)

In partial fulfillment of the requirements for the award of the degree of






Certified that this project report, HAND GESTURE RECOGNITION TECHNIQUES FOR ROBOT CONTROL, is the bonafide work of AMUTHAN.A, JIBIN GEORGE, MASANI DINESH.K, and MATHAN PRASAD.S, who carried out the project work under my supervision.

SIGNATURE Mr.V.J. ARUL KARTHICK SUPERVISOR Assistant Professor, Department of Electronics and Communication Engineering, SNS College of Technology, Coimbatore 35.

SIGNATURE Prof.S.ARUMUGAM HEAD OF THE DEPARTMENT Professor and Head, Department of Electronics and Communication Engineering, SNS College of Technology, Coimbatore 35.

Submitted for the viva-voce examination held on.

------------------------Internal Examiner

------------------------External Examiner


First of all, we extend our heartfelt gratitude to the management of SNS College of Technology, Dr.V.S.Velusamy, Founder Trustee, Dr.S.Rajalakshmi, Correspondent, and Dr.S.N.Subbramanian, Director cum Secretary, for providing us with all kinds of support in the completion of this project phase. We record our indebtedness to our Principal, Dr.V.P.Arunachalam, for his guidance and sustained encouragement towards the successful completion of this project phase. We are highly grateful to Prof.S.Arumugam, Professor and Head, Department of Electronics and Communication Engineering, for his valuable suggestions and guidance throughout the course of this project phase; his positive approach offered incessant help in all possible ways from the beginning. We also extend our sincere thanks to Mr.P.Rajeswaran, Professor, Department of Electronics and Communication Engineering, for the support and timely assistance that helped this project phase come out with flying colours. We take immense pleasure in expressing our humble note of gratitude to our project guide, Mr.V.J.ARUL KARTHICK, Associate Professor, Department of Electronics and Communication Engineering, for his remarkable guidance during the completion of this project phase. We also extend our thanks to the other faculty members and our friends for their moral support in helping us successfully complete this project phase.

Table of contents

1. ABSTRACT ... 5
2. INTRODUCTION ... 6
   A. Brief description ... 7
   B. Literature survey ... 9
   C. Software engineering approach ... 11
3. PROBLEM DEFINITION ... 12
4. DESIGN ... 13
   A. Hand gesture recognition ... 14
   B. Image database ... 15
   C. Image processing ... 16
5. PROPOSED OUTPUT ... 18
6. CONCLUSION ... 23
7. FUTURE SCOPE ... 24
8. BIBLIOGRAPHY ... 25

Hand gesture recognition techniques have been studied for more than two decades. Several solutions have been developed; however, little attention has been paid to human factors, e.g. the intuitiveness of the applied hand gestures. This study was inspired by the movie Minority Report, in which a gesture-based interface was presented to a large audience: a video-browsing application was controlled by hand gestures. Nowadays the tracking of hand movements and the computer recognition of gestures are realizable; however, for a usable system it is essential to have an intuitive set of gestures. The system functions used in Minority Report were reverse engineered and a user study was conducted in which participants were asked to express these functions by means of hand gestures. We were interested in how people formulate gestures and whether we could find any pattern in them. In particular, we focused on the types of gestures in order to study intuitiveness, and on the kinetic features to discover how they influence computer recognition. We found that there are typical gestures for each function, and these are not necessarily related to the technology people are used to. This result suggests that an intuitive set of gestures can be designed which is not only usable in this specific application but can be generalized for other purposes as well. Furthermore, directions are given for the computer recognition of gestures regarding the number of hands used and the dimensions of the space in which the gestures are formulated.



Several successful approaches to spatio-temporal signal processing, such as speech recognition and hand gesture recognition, have been proposed. Most of them involve time alignment, which requires substantial computation and considerable memory storage. In this paper, we present a hand-gesture-based approach to controlling a robot. This approach employs a method in which the robot responds directly to selected gesture templates; considerable memory is therefore saved.

Due to congenital malfunctions, diseases, head injuries, or virus infections, deaf or non-vocal individuals are unable to communicate with hearing persons through speech. They use sign language or hand gestures to express themselves; however, most hearing persons do not have the special sign language expertise. Hand gestures can be classified into two classes: (1) static hand gestures, which rely only on information about the angles of the fingers, and (2) dynamic hand gestures, which rely not only on the fingers' flex angles but also on the hand trajectories and orientations. Dynamic hand gestures can be further divided into two subclasses: the first consists of gestures involving hand movements, and the second of gestures involving finger movements without changing the position of the hands. That is, a particular hand gesture requires at least two different hand shapes connected sequentially, so samples of these hand gestures are spatio-temporal patterns. The accumulated similarity associated with all samples of the input is computed for each hand gesture in the vocabulary, and the unknown gesture is classified as the gesture yielding the highest accumulated similarity.

Developing sign language applications for deaf people can be very important, as many of them, not being able to speak a language, are also not able to read or write a spoken language.
Ideally, a translation system would make it possible to communicate with deaf people. Compared to speech commands, hand gestures are advantageous in noisy environments, in situations where speech commands would be disturbing, as well as for communicating quantitative information and spatial relationships. A gesture is a form of non-verbal communication made with a part of the body and used instead of verbal communication (or in combination with it).
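The accumulated-similarity classification described above can be sketched as follows. This is a minimal illustration in Python; the feature vectors, the sample templates, and the similarity measure are placeholders chosen for the sketch, not the actual data or metric of this project:

```python
# Sketch of template-based gesture classification by accumulated similarity.
# Each gesture in the vocabulary has several stored sample templates; the
# unknown input is assigned to the gesture whose samples are, in total,
# most similar to it. Features and similarity here are illustrative only.

def similarity(a, b):
    # Simple negative squared-distance similarity between feature vectors.
    return -sum((x - y) ** 2 for x, y in zip(a, b))

def classify(unknown, vocabulary):
    # vocabulary: dict mapping gesture name -> list of sample feature vectors.
    best_gesture, best_score = None, float("-inf")
    for gesture, samples in vocabulary.items():
        score = sum(similarity(unknown, s) for s in samples)  # accumulated
        if score > best_score:
            best_gesture, best_score = gesture, score
    return best_gesture

vocab = {
    "one":  [(0.1, 0.9), (0.2, 0.8)],
    "five": [(0.9, 0.1), (0.8, 0.2)],
}
print(classify((0.15, 0.85), vocab))  # nearest to the "one" samples
```

Any similarity function could be substituted; the key point is that the decision is made on the sum over all stored samples of a gesture, not on a single best match.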

Most people use gestures and body language in addition to words when they speak. A sign language is a language which uses gestures instead of sound to convey meaning, combining handshapes, orientation and movement of the hands, arms or body, facial expressions, and lip patterns. Similar to automatic speech recognition (ASR), we focus on gesture recognition, which can later be translated into a certain machine movement. The goal of this project is to develop a program implementing real-time gesture recognition. At any time, a user can show his hand making a specific gesture in front of a video camera linked to a computer; however, the user is not required to be in exactly the same place each time. The program has to collect pictures of this gesture through the video camera, analyze them, and identify the sign, and it has to do so as fast as possible, given that real-time processing is required. In order to simplify the project, it was decided that the identification would consist of counting the number of fingers shown by the user in the input picture. We propose a fast algorithm for automatically recognizing a limited set of gestures from hand images for a robot control application. Hand gesture recognition is a challenging problem in its general form. We consider a fixed set of manual commands and a reasonably structured environment, and develop a simple yet effective procedure for gesture recognition. Our approach contains steps for segmenting the hand region, locating the fingers, and finally classifying the gesture. The algorithm is invariant to translation, rotation, and scale of the hand. We demonstrate the effectiveness of the technique on real imagery.

Our objective is to identify requirements (i.e., quality attributes and functional requirements) for gesture-based recognition. We especially focus on requirements for research tools that target the domains of visualization for software maintenance, reengineering, and reverse engineering, mainly in the field of medical engineering.

The requirements are identified through a comprehensive literature survey based on relevant publications in journals, conference proceedings, and theses. We referred to documents and journals available on the internet for this purpose. Most of the material was obtained from the IEEE website; as our library has an online subscription to the IEEE journals, it provided immense help in locating the resources. The various papers referred to are: 1) Implementation of an Adaptive Feed-Forward Algorithm by Jaroslaw Szewinski and Wojciech Jalmuzna, University of Technology, Institute of Electronic Systems, Warsaw, Poland. This deals with the description of the various algorithms used in neural networks, viz. feed-forward (FF), feedback (FB), and adaptive feed-forward (AFF).


2) A Fast Algorithm for Vision-Based Hand Gesture Recognition for Robot Control by Asanterabi Malima, Erol Özgür, and Müjdat Çetin, Faculty of Engineering and Natural Sciences, Sabancı University, Tuzla, İstanbul, Turkey.

The above approach contains steps for segmenting the hand region, locating the fingers, and finally classifying the gesture. The algorithm is invariant to translation, rotation, and scale of the hand.

3) A Gesture-Controlled Robot for Object Perception and Manipulation by Mark Becker, Institute of Neuroinformatics, Germany.
GripSee is the name of the robot whose design is discussed in the paper; it is used to identify an object, grasp it, and move it to a new position. It serves as a multipurpose service robot that can perform a number of tasks.


4) Programming-By-Example Gesture Recognition by Kevin Gabayan and Steven Lansel.

Machine learning and hardware improvements to a programming-by-example rapid prototyping system are proposed. This paper deals with a dynamic time warping approach to gesture recognition involving single signal channels.



For developing the code and the whole algorithm, it was preferable to use Matlab. In this environment, image display, graphical analysis, and image processing become simple enough issues as far as coding is concerned, because Matlab has a huge library of built-in functions, and the fact that Matlab is optimized for matrix-based calculus makes any image treatment easier, given that any image can be considered a matrix. That is why the whole code was first developed in the Matlab environment. Only the code of the Image Scanning Method and of the Weighted Averaging Analysis Method is provided; given that the latter is a kind of combination of the Pixel Counting Method and the Edge Counting Method, their respective codes may be extracted from the code of the Weighted Averaging Method.
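The three analysis methods named above are only identified by name in this report, so the following Python sketch uses plausible stand-in definitions: pixel counting as the number of white pixels, edge counting as the number of transitions along each row, and the weighted method as a linear combination of the two. These definitions are assumptions for illustration, not the project's actual Matlab code:

```python
# Hedged sketch of the analysis methods named in the text, operating on a
# binary image represented as a list of rows of 0/1 values.

def pixel_count(bw):
    # Pixel Counting: total number of white (hand) pixels.
    return sum(sum(row) for row in bw)

def edge_count(bw):
    # Edge Counting: number of 0->1 or 1->0 transitions along each row.
    return sum(
        sum(1 for a, b in zip(row, row[1:]) if a != b) for row in bw
    )

def weighted_average(bw, w_pix=0.5, w_edge=0.5):
    # Weighted Averaging Analysis: combination of the two scores.
    return w_pix * pixel_count(bw) + w_edge * edge_count(bw)

bw = [
    [0, 1, 1, 0],
    [0, 1, 1, 0],
    [0, 0, 0, 0],
]
print(pixel_count(bw), edge_count(bw), weighted_average(bw))
```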

For the movement of the DC motors of the robot, the program was written in assembly language, since it is the most suitable choice and we are well versed in the subject. The IC used is the 8051 microcontroller; hence the code was written and tested in the Keil C software.



The experimental setup consists of a digital camera used to capture the images. The camera is interfaced to a computer, which is used to create the database and to analyze the images. The computer runs a program written in Matlab for the various operations on the images; using the Vision and Motion toolbox, the analysis of the images is carried out.

The initial step is to create the database of images used for training and testing. The image database can have different formats: images can be hand-drawn, digitized photographs, or renderings of a 3D hand model. Photographs were used, as they are the most realistic approach. Two operations were carried out on all of the images: they were converted to grayscale and the background was made uniform.

The images from internet databases already had uniform backgrounds, but the ones taken with the digital camera had to be processed in Photoshop. The pattern recognition system that is used consists of a transformation T, which converts an image into a feature vector; this vector is then compared with the feature vectors of a training set of gestures.
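The transformation-and-comparison scheme just described can be sketched as follows. The choice of feature (mean intensity over a coarse grid) and the nearest-neighbor comparison are assumptions made for this illustration; the report does not specify the actual transformation T:

```python
# Sketch: a transformation T maps a grayscale image to a feature vector,
# which is compared against the feature vectors of a labeled training set.

def T(image, blocks=2):
    # image: list of rows of grayscale values in [0, 255].
    h, w = len(image), len(image[0])
    bh, bw_ = h // blocks, w // blocks
    feats = []
    for by in range(blocks):
        for bx in range(blocks):
            cells = [image[y][x]
                     for y in range(by * bh, (by + 1) * bh)
                     for x in range(bx * bw_, (bx + 1) * bw_)]
            feats.append(sum(cells) / len(cells))  # block mean intensity
    return feats

def nearest(feature, training):
    # training: list of (label, feature_vector) pairs; pick the closest.
    def dist(f, g):
        return sum((a - b) ** 2 for a, b in zip(f, g))
    return min(training, key=lambda lf: dist(feature, lf[1]))[0]

train = [("one", [200.0, 50.0, 50.0, 50.0]),
         ("two", [200.0, 200.0, 50.0, 50.0])]
img = [[210, 190],    # tiny 2x2 "image" with a bright top row
       [60, 40]]
print(nearest(T(img, blocks=2), train))
```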





Consider a robot navigation problem in which a wheel-controlling DC motor responds to hand pose signs given by a human and visually observed by the camera. We are interested in an algorithm that identifies a hand pose sign in the camera's input image as one of five possible commands (or counts). The identified command is then used as a control input for the robot to perform a certain action or execute a certain task. For examples of the signs used in our algorithm, see the figure. The signs could be associated with various meanings depending on the function of the robot: for example, a one count could mean move forward and a five count could mean stop, while two, three, and four counts could be interpreted as reverse, turn right, and turn left.

Set of hand gestures, or counts, considered in our work.
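The count-to-command association described above is a simple lookup; a minimal sketch, using the command names given in the text:

```python
# Mapping of recognized finger counts to robot commands, as described above.
COMMANDS = {
    1: "move forward",
    2: "reverse",
    3: "turn right",
    4: "turn left",
    5: "stop",
}

def command_for(count):
    # Counts outside 1..5 are rejected rather than guessed.
    return COMMANDS.get(count, "no action")

print(command_for(1))  # move forward
print(command_for(5))  # stop
```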



The starting point of the project was the creation of a database with all the images that would be used for training and testing. The image database can have different formats: images can be hand-drawn, digitized photographs, or renderings of a 3D hand model. Photographs were used, as they are the most realistic approach. Images came from two main sources: various ASL databases on the internet, and photographs we took with a digital camera. This meant that they had different sizes, different resolutions, and sometimes almost completely different angles of shooting. Images belonging to the last case were very few, and they were discarded, as there was no chance of classifying them correctly.

Two operations were carried out on all of the images: they were converted to grayscale and the background was made uniform. The internet databases already had uniform backgrounds, but the images we took with the digital camera had to be processed in Adobe Photoshop. Drawn images can also simulate translational variances with the help of an editing program (e.g. Adobe Photoshop).

The database itself was constantly changing throughout the completion of the project, as it is what would decide the robustness of the algorithm. Therefore, it had to be built in such a way that different situations could be tested, and the thresholds above which the algorithm no longer classified correctly could be determined. The construction of such a database is clearly dependent on the application. If the application is, for example, a crane controller operated by the same person for long periods, the algorithm does not have to be robust to different persons' images; in that case, only noise and motion blur need to be tolerated.


Image processing is any form of signal processing for which the input is an image, such as a photograph or frames of video; the output can be either an image or a set of characteristics or parameters related to the image. Most image-processing techniques involve treating the image as a two-dimensional signal and applying standard signal-processing techniques to it.

Typical operations, among many other image processing operations, are:
- Geometric transformations such as enlargement, reduction, and rotation
- Color corrections such as brightness and contrast adjustments, quantization, or conversion to a different color space
- Digital compositing or optical compositing (combination of two or more images), used in film making to make a "matte"
- Image editing (e.g., to increase the quality of a digital image)
- Image registration (alignment of two or more images), differencing, and morphing
- Image segmentation
- Extending dynamic range by combining differently exposed images
- 2-D object recognition with affine invariance

Applications include:
- Face detection
- Feature detection
- Lane departure warning systems
- Non-photorealistic rendering
- Medical image processing
- Microscope image processing
- Morphological image processing
- Remote sensing

Robot control:
The robot has two driven wheels, with a castor wheel provided for support. A ULN2004A IC is used for driving the motors, and DC motors drive the wheels. The stepper motor used is a unipolar one; as the name suggests, it spins in discrete steps under the command of a controller, which makes it easy to control, since the controller knows exactly how far it has rotated without having to use a sensor. Such motors are therefore used on many rolling wheels. Being unipolar, the motor has six wires coming out of it: four of them receive data from the microcontroller for its movement, while the other two are short-circuited together and connected to a 12 V DC supply.
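The four microcontroller-driven wires of the unipolar stepper are typically energized in a repeating sequence. The report does not reproduce the actual 8051 assembly or port assignments, so the following Python sketch is only a hypothetical illustration of a common full-step drive pattern:

```python
# Illustrative full-step drive sequence for a four-wire unipolar stepper,
# energizing one winding at a time. The actual 8051 assembly and port
# assignments used in the project are not shown here; this is a sketch
# of the idea only.

FULL_STEP = [0b1000, 0b0100, 0b0010, 0b0001]  # one winding on per step

def step_pattern(step_index, direction=1):
    # direction = 1 for forward rotation, -1 for reverse.
    return FULL_STEP[(step_index * direction) % len(FULL_STEP)]

# Eight forward steps cycle through the four patterns twice.
seq = [step_pattern(i) for i in range(8)]
print([bin(p) for p in seq])
```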




Matlab program for count one:

clc; close all; clear all;

x1 = imread('D:\h.jpg');     % hand with background
figure, imshow(x1);
x2 = imread('D:\b.jpg');     % background only
figure, imshow(x2);

x = x2 - x1;                 % background subtraction
figure, imshow(x);
display(x);

bw = im2bw(x, 0.9);          % threshold to a binary image
figure, imshow(bw);
display(bw);

z = size(bw);
c1 = 0;                      % black (zero) pixel count
c2 = 0;                      % white (one) pixel count
for i = 1:z(1)
    for j = 1:z(2)
        if bw(i,j) == 1
            c2 = c2 + 1;
        else
            c1 = c1 + 1;     % count zeros so c1 matches "No. of zeros"
        end
    end
end
display(c1);
display(c2);


Figures: image plus background; background only; after background subtraction; binary image.


OUTPUT: number of ones and zeros:

No. of zeros: c1 = 1920000; No. of ones: c2 = 0

Similar input images were captured for counts 2, 3, 4, and 5.


The respective outputs for counts 2, 3, 4, and 5 are:

For count 2: c1 = 85801, c2 = 63387
For count 3: c1 = 81920, c2 = 13319
For count 4: c1 = 68480, c2 = 48
For count 5: c1 = 32292, c2 = 19815

c1 denotes the number of black pixels in the image; c2 denotes the number of white pixels.

It should be noted that the black and white pixel counts vary with the count shown in the input image. The robot's action is therefore performed according to these different output values.
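One way to turn the varying (c1, c2) values into a decision is to pick the count whose reference pair, taken from the outputs reported above, is nearest to a newly measured pair. The report does not specify the actual decision rule, so this nearest-reference comparison is an assumption made for illustration:

```python
# Sketch: choose the count whose reference (c1, c2) pair, taken from the
# outputs reported above, lies nearest to a newly measured pair.

REFERENCE = {
    2: (85801, 63387),
    3: (81920, 13319),
    4: (68480, 48),
    5: (32292, 19815),
}

def nearest_count(c1, c2):
    # Squared Euclidean distance in (black, white) pixel-count space.
    def d2(p, q):
        return (p[0] - q[0]) ** 2 + (p[1] - q[1]) ** 2
    return min(REFERENCE, key=lambda k: d2((c1, c2), REFERENCE[k]))

print(nearest_count(80000, 14000))  # closest to the count-3 reference
```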



We proposed a fast and simple algorithm for the problem of hand gesture recognition for robot control. Given observed images of the hand, the algorithm segments the hand region and then makes an inference about the activity of the fingers involved in the gesture. We have demonstrated the effectiveness of this computationally efficient algorithm on real images we acquired. Based on our motivating robot control application, we have only considered a limited number of gestures.

Our algorithm can be extended in a number of ways to recognize a broader set of gestures. The segmentation portion of our algorithm is simple, and would need to be improved if this technique were to be used in challenging operating conditions.

However, we should note that the segmentation problem in a general setting is an open research problem in itself. Reliable performance of hand gesture recognition techniques in a general setting requires dealing with occlusions, temporal tracking for recognizing dynamic gestures, and 3D modeling of the hand, all of which are still mostly beyond the current state of the art.



Even with limited processing power, it is possible to design very efficient algorithms. Possible extensions include:
- An advanced DSP processor, which could reduce the size of the module
- Understanding further (static) gestures
- Control for other biometric uses

Our software has been designed to be reusable, and more complex behaviors may be added to our work. Because we limited ourselves to low processing power, our work could easily be made more performant by adding a state-of-the-art processor. The use of a real embedded OS could improve our system in terms of speed and stability. In addition, implementing more sensor modalities would improve robustness even in very complex scenes. Our system has shown that interaction with machines through gestures is feasible, and the set of detected gestures could be extended to more commands by implementing a more complex model of an advanced vehicle, usable not only in limited spaces but also in broader areas such as on roads. In the future, service robots could execute many different tasks, from personal mobility to a fully-fledged advanced automotive system that can assist disabled users in every sense.


Books and references:
1. Fundamentals of Digital Image Processing, Anil K. Jain
2. Digital Image Processing, S. Jayaraman