Final Project Report

Hand Gesture Recognition Using Neural Networks
A
Project Report On
Hand Gesture Recognition using Neural Networks
Submitted by
Kiran P V Exam No. B3223021
Vidit Mediratta Exam No. B3223027
Gaurav Sharma Exam No. B3223042
Under the Guidance of
Prof. Vijay Karra
For the partial fulfillment of
B.E. (Electronics & Telecommunication) 2008-2009
To
DEPARTMENT OF ELECTRONICS & TELECOMMUNICATION
ARMY INSTITUTE OF TECHNOLOGY
DIGHI HILLS, PUNE-411015
Under
University of Pune
CERTIFICATE
This is to certify that Kiran P V, Gaurav Sharma and Vidit Mediratta have
successfully submitted the seminar report on
HAND GESTURE RECOGNITION USING NEURAL
NETWORK
during the academic year 2008-2009 in partial fulfillment towards completion
of the Bachelor's Degree Program in Engineering (Electronics and Telecommunication)
under University of Pune.

Mrs. Surekha K.S                      Prof. Vijay Karra
Head of Department                    Project Guide
Electronics and Telecommunication     Electronics and Telecommunication

Mrs. Surekha K.S
Principal
Army Institute of Technology
Dighi Hills,Pune-411015
ACKNOWLEDGEMENT
We wish to express our sincere gratitude to our guide Prof. Vijay Karra for his valuable
guidance at all stages of our project. We acknowledge the whole hearted, unreserved and positive
encouragement on his part, which helped us to tackle all our problems to ensure successful
completion of the project.
We are thankful to the Staff of Department of Electronics & Telecommunication, A.I.T., for
all the direct and indirect help and for making available the resources of the department for the
timely completion of our project.
We are also thankful to Prof. Surekha K S for her valuable suggestions.
We sincerely believe that our guides were the motivating forces behind the project. It was their
constant encouragement and constructive criticism that has made our project achieve its present
form.
We would be ungrateful if we did not acknowledge our family and friends who were always by
our side.
Last but not least, to those we have named, we express deep gratitude; and to those we
have not, please note that even though you are unnamed, you are appreciated more than you know.
Kiran P V
Vidit Mediratta
Gaurav Sharma
Table of contents
1. Abstract
2. Introduction
A. Brief description
B. Literature survey
C. Software Engineering Approach
3. Problem Definition
4. Design
I. A. Hand Gesture Recognition
B. Image Database
C. Image Processing
D. Matlab
E. Neural Network
F. Block Diagram
II. Defining the different issues
A. Database Creation
B. Counting the fingers
C. Matlab Operations
D. Neuron Model
E. Microcontroller & Robot
5. Stepwise procedure flow
6. Time Activity Chart
7. Conclusion
8. Future scope
9. Bibliography
ABSTRACT
Hand gesture recognition techniques have been studied for more than two decades. Several
solutions have been developed; however, little attention has been paid to human factors, e.g.
the intuitiveness of the applied hand gestures. This study was inspired by the movie Minority
Report, in which a gesture-based interface was presented to a large audience. In the movie, a
video-browsing application was controlled by hand gestures. Nowadays the tracking of hand
movements and the computer recognition of gestures are realizable; however, for a usable
system it is essential to have an intuitive set of gestures. The system functions used in
Minority Report were reverse engineered and a user study was conducted, in which participants
were asked to express these functions by means of hand gestures. We were interested in how
people formulate gestures and whether we could find any pattern in these gestures. In
particular, we focused on the types of gestures in order to study intuitiveness, and on the
kinetic features to discover how they influence computer recognition.
We found that there are typical gestures for each function, and these are not necessarily related to
the technology people are used to. This result suggests that an intuitive set of gestures can be
designed, which is not only usable in this specific application, but can be generalized for other
purposes as well. Furthermore, directions are given for computer recognition of gestures
regarding the number of hands used and the dimensions of the space where the gestures are
formulated.
INTRODUCTION
BRIEF DESCRIPTION
Several successful approaches to spatio-temporal signal processing, such as speech recognition
and hand gesture recognition, have been proposed. Most of them involve time alignment, which
requires substantial computation and considerable memory storage. In this paper, we present a
neural-network-based approach to spatio-temporal pattern recognition. This approach employs
a powerful method based on Hyper Rectangular Composite Neural Networks (HRCNNs) for
selecting templates; therefore, the memory requirement is considerably reduced.
Due to congenital malfunctions, diseases, head injuries, or virus infections, deaf or
non-vocal individuals are unable to communicate with hearing persons through speech. They
use sign language or hand gestures to express themselves; however, most hearing persons do
not have the special sign language expertise. Hand gestures can be classified into two classes:
(1) static hand gestures, which rely only on information about the angles of the fingers, and (2)
dynamic hand gestures, which rely not only on the fingers' flex angles but also on the hand
trajectories and orientations. The dynamic hand gestures can be further divided into two
subclasses. The first subclass consists of hand gestures involving hand movements and the
second subclass consists of hand gestures involving finger movements but without changing
the position of the hands. That is, such a gesture requires at least two different hand shapes
connected sequentially to form a particular hand gesture. Therefore samples of these hand
gestures are spatio-temporal patterns. The basic idea of our method for recognizing these
spatio-temporal hand gestures is as follows. We generate templates for each basic hand shape
by training a Hyper Rectangular Composite Neural Network (HRCNN). Templates for each hand
shape are then represented in the form of crisp IF-THEN rules, which are extracted from the
values of the synaptic weights of the corresponding trained HRCNN. The accumulated similarity
associated with all samples of the input is computed for each hand gesture in the vocabulary,
and the unknown gesture is classified as the gesture yielding the highest accumulated similarity.
Developing sign language applications for deaf people can be very important, as many of them,
not being able to speak a language, are also not able to read or write a spoken language.
Ideally, a translation system would make it possible to communicate with deaf people.
Compared to speech commands, hand gestures are advantageous in noisy environments, in
situations where speech commands would be disturbing, as well as for communicating
quantitative information and spatial relationships.
A gesture is a form of non-verbal communication made with a part of the body and used instead
of verbal communication (or in combination with it). Most people use gestures and body
language in addition to words when they speak. A sign language is a language which uses
gestures instead of sound to convey meaning combining hand-shapes, orientation and movement
of the hands, arms or body, facial expressions and lip-patterns. Similar to automatic speech
recognition (ASR), we focus on gesture recognition, the result of which can later be translated
into a certain machine movement.
The goal of this project is to develop a program implementing real time gesture recognition. At
any time, a user can show his hand making a specific gesture in front of a video camera linked to
a computer. However, the user is not expected to be at exactly the same place each time he shows
his hand. The program has to collect pictures of this gesture with the video camera, analyze them
and identify the sign. It has to do so as fast as possible, given that real time processing is
required. To keep the project manageable, it has been decided that the identification would
consist of counting the number of fingers shown by the user in the input picture.
We propose a fast algorithm for automatically recognizing a limited set of gestures from hand
images for a robot control application. Hand gesture recognition is a challenging problem in its
general form. We consider a fixed set of manual commands and a reasonably structured
environment, and develop a simple, yet effective, procedure for gesture recognition. Our
approach contains steps for segmenting the hand region, locating the fingers and finally
classifying the gesture. The algorithm is invariant to translation, rotation, and scale of the
hand. We also demonstrate the effectiveness of the technique on real imagery.
LITERATURE SURVEY
Objective:
Our objective is to identify requirements (i.e., quality attributes and functional
requirements) for Gesture Based Recognition. We especially focus on requirements
for research tools that target the domains of visualization for software maintenance,
reengineering, and reverse engineering.
Method:
The requirements are identified with a comprehensive literature survey based on relevant
publications in journals, conference proceedings, and theses. We have referred to
documents and journals available on the Internet. Most of the material was taken
from the IEEE website. As our library has an online subscription to the IEEE journals, it
provided immense help in locating the resources.
The papers referred to are:
1) Implementation of Adaptive Feed-Forward Algorithm, by Jaroslaw Szewinski and Wojciech
Jalmuzna, University of Technology, Institute of Electronic Systems, Warsaw, Poland.

This paper describes the various algorithms used in neural networks, viz.
feed-forward (FF), feedback (FB) and adaptive feed-forward (AFF).

2) Gesture Based Robot Control, by V. S. Rao and C. Mahanta, Department of
Electronics and Communication Engineering, Indian Institute of Technology, Guwahati.

This paper deals with past and recent developments in gesture recognition systems. It
presents the work of different scientists in different parts of the globe pursuing the
same aim: a visual gesture recognition system for controlling robots.

3) A Fast Algorithm for Vision-Based Hand Gesture Recognition for Robot
Control, by Asanterabi Malima, Erol Özgür, and Müjdat Çetin, Faculty of Engineering and
Natural Sciences, Sabancı University, Tuzla, İstanbul, Turkey.
The approach contains steps for segmenting the hand region, locating the fingers, and
finally classifying the gesture. The algorithm is invariant to translation, rotation, and scale of
the hand.
4) A Gesture Controlled Robot for Object Perception and Manipulation, by Mark
Batcher, Institute of Neuroinformatics, Germany.
GripSee is the name of the robot whose design is discussed in the paper. It is used
for identifying an object, grasping it, and moving it to a new position. It serves as a
multipurpose service robot which can perform a number of tasks.
5) Programming-By-Example Gesture Recognition, by Kevin Gabayan and Steven Lansel.
Machine learning and hardware improvements to a programming-by-example rapid
prototyping system are proposed. This paper deals with the dynamic time warping gesture
recognition approach involving single signal channels.
SOFTWARE ENGINEERING APPROACH
For developing the code, and the whole algorithm, it was preferable to use Matlab. In this
environment, image display, graphical analysis and image processing are simple to code,
because Matlab has a large and comprehensive Image Processing Toolbox, and because Matlab
is optimized for matrix-based calculation, which makes any image processing easier given that
any image can be considered as a matrix.
That is why the whole code was first developed in the Matlab environment. Only the
code of the Neural Network Method and of the Weighted Averaging Analysis method is
provided. Since the latter is a combination of the Pixel Counting Method
and the Edge Counting Method, their respective codes may be extracted from the code of the
Weighted Averaging Method.
For the movement of the robot, the program has been written in assembly language, since it is
the most suitable and we are familiar with it. The IC used is an 8051 microcontroller; hence
the code was written and tested in the RIDE software.
PROBLEM DEFINITION
The experimental setup consists of a digital camera used to take the images. The camera
is interfaced to a computer. The computer is used to create the database and to analyze the
images. It runs a program written in MATLAB for the various
operations on the images. The analysis of the images is done using the Neural Network Toolbox.
The initial step is to create the database of the images which are used for training and
testing. The image database can have different formats. Images can be hand drawn,
digitized photographs or a three-dimensional hand. Photographs were used, as they are the
most realistic approach. Two operations were carried out on all of the images: they were
converted to grayscale and the background was made uniform. The images from Internet
databases already had uniform backgrounds, but the ones taken with the digital camera
had to be processed in Photoshop. The pattern recognition system that will be used
consists of some transformation T, which converts an image into a feature vector, which
is then compared with the feature vectors of a training set of gestures.
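The comparison of feature vectors can be illustrated with a minimal Python sketch. It is not the MATLAB code of the project: the transformation used here (one feature per image row, the number of set pixels) and the names transform, squared_distance and classify are hypothetical choices made only for illustration, and the comparison rule (nearest training vector) is an assumption.

```python
def transform(image):
    """Hypothetical T: one feature per row, the number of set pixels in that row."""
    return [sum(row) for row in image]


def squared_distance(u, v):
    """Squared Euclidean distance between two feature vectors of equal length."""
    return sum((a - b) ** 2 for a, b in zip(u, v))


def classify(image, training_set):
    """Return the label of the training vector closest to T(image).

    training_set is a list of (feature_vector, label) pairs.
    """
    features = transform(image)
    best = min(training_set, key=lambda item: squared_distance(features, item[0]))
    return best[1]
```

Any other transformation T can be substituted for transform without changing classify, as long as all images produce vectors of the same length.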

DESIGN
HAND GESTURE RECOGNITION
Consider a robot navigation problem, in which a robot responds to the hand pose signs given by
a human, visually observed by the robot through a camera. We are interested in an algorithm that
enables the robot to identify a hand pose sign in the input image as one of five possible
commands (or counts). The identified command will then be used as a
control input for the robot to perform a certain action or execute a certain task. For examples of
the signs to be used in our algorithm, see the figure below. The signs could be associated with
various meanings depending on the function of the robot. For example, a one count could mean
move forward and a five count could mean stop. Furthermore, two, three, and four counts
could be interpreted as reverse, turn right, and turn left, respectively.
Figure: Set of hand gestures, or counts, considered in our work.
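The example assignment of counts to commands can be written as a simple lookup table. The following Python sketch only restates the example meanings given above; the names COMMANDS and command_for_count are hypothetical, and a real robot may of course use a different assignment.

```python
# Example assignment from the text: one = move forward, five = stop,
# two / three / four = reverse / turn right / turn left.
COMMANDS = {
    1: "move forward",
    2: "reverse",
    3: "turn right",
    4: "turn left",
    5: "stop",
}


def command_for_count(count):
    """Return the command for a recognized count, or None if the count is not a command."""
    return COMMANDS.get(count)


if __name__ == "__main__":
    for n in range(1, 6):
        print(n, "->", command_for_count(n))
```

Returning None for an unknown count lets the caller decide what the robot should do when no valid sign is recognized.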
IMAGE DATABASE
The starting point of the project was the creation of a database with all the images that
would be used for training and testing.
The image database can have different formats. Images can be hand drawn,
digitized photographs or a three-dimensional hand. Photographs were used, as they are the
most realistic approach.
Images came from two main sources: various ASL databases on the Internet and
photographs we took with a digital camera. This meant that they had different sizes,
different resolutions and sometimes almost completely different angles of shooting.
Images belonging to the last case were very few, and they were discarded, as there was no
chance of classifying them correctly. Two operations were carried out on all of the
images: they were converted to grayscale and the background was made uniform. The
Internet databases already had uniform backgrounds, but the ones we took with the digital
camera had to be processed in Adobe Photoshop.
Drawn images can still simulate translational variances with the help of an editing
program (e.g. Adobe Photoshop).
The database itself was constantly changing throughout the project, as it
was the database that would decide the robustness of the algorithm. Therefore, it had to be
built in such a way that different situations could be tested and the thresholds above which the
algorithm did not classify correctly could be determined.
The construction of such a database is clearly dependent on the application. If the
application is, for example, a crane controller operated by the same person for long periods,
the algorithm does not have to be robust to different persons' images. In this case noise
and motion blur should be tolerable.
IMAGE PROCESSING
Image processing is any form of signal processing for which the input is an image, such as
photographs or frames of video; the output of image processing can be either an image or a set of
characteristics or parameters related to the image. Most image-processing techniques involve
treating the image as a two-dimensional signal and applying standard signal-
processing techniques to it.
Typical operations
Typical image processing operations include, among many others:
- Geometric transformations such as enlargement, reduction, and rotation
- Color corrections such as brightness and contrast adjustments, quantization, or
  conversion to a different color space
- Digital compositing or optical compositing (combination of two or more images), used
  in filmmaking to make a "matte"
- Image editing (e.g., to increase the quality of a digital image)
- Image registration (alignment of two or more images), differencing and morphing
- Image segmentation
- Extending dynamic range by combining differently exposed images
- 2-D object recognition with affine invariance
Applications
- Computer vision
- Face detection
- Feature detection
- Lane departure warning system
- Non-photorealistic rendering
- Medical image processing
- Microscope image processing
- Morphological image processing
- Remote sensing
MATLAB
The name MATLAB stands for matrix laboratory.
MATLAB is a high-performance language for technical computing. It integrates
computation, visualization, and programming in an easy-to-use environment where
problems and solutions are expressed in familiar mathematical notation. Typical uses
include:
- Math and computation
- Algorithm development
- Modeling, simulation, and prototyping
- Data analysis, exploration, and visualization
- Scientific and engineering graphics
- Application development, including Graphical User Interface building
MATLAB is an interactive system whose basic data element is an array that does not
require dimensioning. This allows you to solve many technical computing problems,
especially those with matrix and vector formulations, in a fraction of the time it would
take to write a program in a scalar non-interactive language such as C or Fortran.
MATLAB has evolved over a period of years with input from many users. In university
environments, it is the standard instructional tool for introductory and advanced courses
in mathematics, engineering, and science. In industry, MATLAB is the tool of choice for
high-productivity research, development, and analysis.
The reason that we decided to use MATLAB for the development of this project is its
toolboxes. Toolboxes allow you to learn and apply specialized technology. Toolboxes
are comprehensive collections of MATLAB functions (M-files) that extend the
MATLAB environment to solve particular classes of problems. They include, among others,
the Image Processing and Neural Network toolboxes.
NEURAL NETWORK
An artificial neural network (ANN), also called a simulated neural network (SNN) or commonly
just neural network (NN) is an interconnected group of artificial neurons that uses
a mathematical or computational model for information processing based on a
connectionistic approach to computation. In most cases an ANN is an adaptive system that
changes its structure based on external or internal information that flows through the network.
In more practical terms, neural networks are non-linear statistical data modeling or decision
making tools. They can be used to model complex relationships between inputs and outputs or
to find patterns in data.
An artificial neural network involves a network of simple processing elements (artificial
neurons) which can exhibit complex global behavior, determined by the connections between the
processing elements and element parameters. One classical type of artificial neural network is the
Hopfield net.
In a neural network model, simple nodes, which can be called variously "neurons", "neurodes",
"Processing Elements" (PE) or "units", are connected together to form a network of nodes,
hence the term "neural network". While a neural network does not have to be adaptive per se, its
practical use comes with algorithms designed to alter the strength (weights) of the connections in
the network to produce a desired signal flow.
In modern software implementations of artificial neural networks the approach inspired by
biology has more or less been abandoned for a more practical approach based on statistics and
signal processing. In some of these systems, neural networks, or parts of neural networks (such as
artificial neurons), are used as components in larger systems that combine both adaptive and
non-adaptive elements.
Neural networks are composed of simple elements operating in parallel. These elements
are inspired by biological nervous systems. As in nature, the network function is
determined largely by the connections between elements. We can train a neural network
to perform a particular function by adjusting the values of the connections (weights)
between elements.
Commonly neural networks are adjusted, or trained, so that a particular input leads to a
specific target output. The network is adjusted, based on a comparison of the output and the
target, until the network output matches the target.
Figure: Neural Net block diagram
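The adjust-and-compare loop described above can be sketched with a single artificial neuron in Python. This is only an illustration of the principle, not the network used in the project: the perceptron learning rule, the step activation and the names step, train_perceptron and predict are assumptions chosen for brevity.

```python
def step(x):
    """Hard-limit activation: 1 for non-negative input, 0 otherwise."""
    return 1 if x >= 0 else 0


def predict(weights, bias, inputs):
    """Output of a single neuron for the given inputs."""
    return step(sum(w * x for w, x in zip(weights, inputs)) + bias)


def train_perceptron(samples, learning_rate=1, max_epochs=100):
    """Adjust weights until every output matches its target (or max_epochs is reached).

    samples is a list of (inputs, target) pairs with targets 0 or 1.
    """
    weights = [0] * len(samples[0][0])
    bias = 0
    for _ in range(max_epochs):
        mistakes = 0
        for inputs, target in samples:
            error = target - predict(weights, bias, inputs)  # compare output and target
            if error != 0:
                mistakes += 1
                weights = [w + learning_rate * error * x
                           for w, x in zip(weights, inputs)]
                bias += learning_rate * error
        if mistakes == 0:  # all outputs match their targets
            break
    return weights, bias


if __name__ == "__main__":
    and_samples = [([0, 0], 0), ([0, 1], 0), ([1, 0], 0), ([1, 1], 1)]
    w, b = train_perceptron(and_samples)
    print([predict(w, b, x) for x, _ in and_samples])
```

A single neuron of this kind can only learn linearly separable functions such as the logical AND used in the usage example; networks with more layers are needed for more complex mappings.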
Neural networks have been trained to perform complex functions in various fields of
application including pattern recognition, identification, classification, speech, vision and
control systems.
Today neural networks can be trained to solve problems that are difficult for conventional
computers or human beings. The supervised training methods are commonly used, but
other networks can be obtained from unsupervised training techniques or from direct
design methods. Unsupervised networks can be used, for instance, to identify groups of
data. Certain kinds of linear networks and Hopfield networks are designed directly. In
summary, there are a variety of kinds of design and learning techniques that enrich the
choices that a user can make.
Applications
The utility of artificial neural network models lies in the fact that they can be used to infer a
function from observations and also to use it. This is particularly useful in applications where the
complexity of the data or task makes the design of such a function by hand impractical.
Real life applications
The tasks to which artificial neural networks are applied tend to fall within the following broad
categories:
- Function approximation, or regression analysis, including time series prediction and
  modelling.
- Classification, including pattern and sequence recognition, novelty detection and
  sequential decision making.
- Data processing, including filtering, clustering, blind signal separation and
  compression.
Application areas include system identification and control (vehicle control, process control),
game-playing and decision making (backgammon, chess, racing), pattern recognition (radar
systems, face identification, object recognition, etc.), sequence recognition (gesture, speech,
handwritten text recognition), medical diagnosis, financial applications, data mining (or
knowledge discovery in databases, "KDD"), visualization and e-mail spam filtering.
BLOCK DIAGRAM
[Block diagram of the system. Blocks: PC with MATLAB, 8051 microcontroller, motor driver, motor]
[Block diagram of the recognition process. Blocks: pattern to recognize, sampling, generation of templates, pattern recognition, decision logic, recognized pattern]
DEFINING THE DIFFERENT ISSUES
Collecting the pictures
First of all, it is necessary to collect pictures. A choice has to be made
concerning the way we collect these pictures, given that it depends on how we
implement the main program. Running in the MATLAB environment requires the pictures to be
saved beforehand and loaded when running the program, because the Image Acquisition
Toolbox is not available in the MATLAB version used for the design of the program.
That is why, for real time processing, it will be necessary to implement the program in a
C or C++ environment. In that case, the easiest way to collect pictures is to use, for example,
VideoOCX, assuming the program is written in C++.
However, to develop the body of the program, there are no real time constraints: it is
possible to work on typical and representative pictures previously chosen and saved. The whole
MATLAB program has been developed using such saved pictures. Then, it has been modified so
that it can be used in real time C++ stand-alone functions.
Finding the hand
Now, let us suppose that a set of representative pictures is provided. We then need to
analyze each picture and find its relevant part, because the user will never put his
hand in exactly the same area of the picture. Here are a few examples of the same sign made in
different areas, which have to lead to the same identification result, which should be 2:
Analysis and identification
Then, the real work can start. Let us suppose we have the relevant part of the image, which
contains only the hand. How can we guess the type of sign? To make the problem easier, we
consider only the number of fingers shown by the user. So, we
can sum up the problem: how can we count the number of fingers in a picture of a hand?
There are plenty of ways to do it. In the following pages, the advantages and drawbacks
of a few of them will be described. There are some geometrical methods that solve the problem
by counting the number of blocks within a picture, and some more sophisticated methods,
such as neural networks or Laplacian filtering, which can lead to interesting results.
Examples of Allowed pictures

It has already been explained that the position of the hand in the picture is not important.
Given that the background is known, it is possible to build a new picture that corresponds to the
difference between the current picture of the hand and the background. So it is possible to
obtain a picture that contains only the hand, and some noise.
After noise removal, the resulting picture will be black almost everywhere except
where the hand is. So, zooming can then easily be done by cropping away the areas whose pixel
values are close to 0.
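A minimal Python sketch of this zoom step is given here, assuming the picture has already been reduced to values 0 (background) and 1 (hand) and is stored as a list of rows. The function name crop_to_content is hypothetical, and this is not the MATLAB code of the project.

```python
def crop_to_content(binary):
    """Crop a binary picture (list of rows of 0/1) to the smallest
    rectangle that contains every pixel set to 1."""
    rows = [r for r, row in enumerate(binary) if any(row)]
    if not rows:
        return []  # the picture contains only background
    cols = [c for c in range(len(binary[0])) if any(row[c] for row in binary)]
    top, bottom = rows[0], rows[-1]
    left, right = cols[0], cols[-1]
    return [row[left:right + 1] for row in binary[top:bottom + 1]]


if __name__ == "__main__":
    picture = [
        [0, 0, 0, 0, 0],
        [0, 0, 1, 1, 0],
        [0, 0, 1, 0, 0],
        [0, 0, 0, 0, 0],
    ]
    for row in crop_to_content(picture):
        print(row)
```

Because the crop keeps every set pixel, a single noisy pixel far from the hand enlarges the rectangle considerably, which is why the noise has to be removed before zooming.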
Picture of the difference with the background
The difference with the background can be computed using the Matlab function imabsdiff.
After that, to make all the preprocessing easier, it is better to create a binary picture. To do so, it
is necessary to choose a threshold: pixels with a value lower than this threshold are set to 0
(black) and the others are set to 1. The choice of this threshold depends on the video camera
properties: if the camera provides pixels coded on one byte, pixel values range
from 0 to 255. Measurements have shown that in this case the presence of the hand
implies a variation of pixel values larger than 20 units. Of course, the optimal threshold depends
on the background; nevertheless, this threshold is adequate in most cases.
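The two operations, absolute difference and thresholding, can be sketched in plain Python as follows. Pictures are assumed to be lists of rows of 8-bit values; the function names absolute_difference and binarize are hypothetical stand-ins for the MATLAB calls, and the threshold of 20 is the value quoted above.

```python
THRESHOLD = 20  # variation (on a 0..255 scale) attributed to the hand in the text


def absolute_difference(picture, background):
    """Pixel-wise |picture - background| for two pictures of the same size."""
    return [[abs(p - b) for p, b in zip(picture_row, background_row)]
            for picture_row, background_row in zip(picture, background)]


def binarize(difference, threshold=THRESHOLD):
    """Set pixels below the threshold to 0 (black) and all others to 1."""
    return [[0 if value < threshold else 1 for value in row]
            for row in difference]


if __name__ == "__main__":
    background = [[10, 10], [10, 10]]
    picture = [[12, 200], [10, 30]]
    print(binarize(absolute_difference(picture, background)))
```

The absolute value matters: the hand may be darker or brighter than the background, and both cases must produce a large difference.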
Then it is necessary to execute noise-removal functions; otherwise every noisy pixel whose
value is too high may be considered as part of the hand and will be included in the zoomed-in
picture. For example, if the hand is in one corner of the picture and there is a
noisy pixel in the opposite corner of the difference picture, the zooming function will
keep it and the resulting picture, after zooming, will not be very different from the initial picture.
That is why it is necessary to use noise removal functions.
The noise removal is performed using the function bwmorph with the 'open' operation, which
erodes then dilates the noisy picture. In this way, isolated pixels disappear during the erosion;
other elements are restored to their initial shape by the subsequent dilation.
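The erosion followed by dilation can be sketched in plain Python. This is an illustration under stated assumptions, not a reimplementation of bwmorph: a 3x3 square neighbourhood is assumed, pixels outside the picture are treated as 0, and the names erode, dilate and open_picture are hypothetical.

```python
def _window(binary, r, c):
    """Values of the 3x3 window centred on (r, c); outside the picture counts as 0."""
    height, width = len(binary), len(binary[0])
    return [binary[rr][cc] if 0 <= rr < height and 0 <= cc < width else 0
            for rr in range(r - 1, r + 2)
            for cc in range(c - 1, c + 2)]


def erode(binary):
    """Keep a pixel only if its whole 3x3 window is set."""
    return [[1 if all(_window(binary, r, c)) else 0
             for c in range(len(binary[0]))]
            for r in range(len(binary))]


def dilate(binary):
    """Set a pixel if any pixel of its 3x3 window is set."""
    return [[1 if any(_window(binary, r, c)) else 0
             for c in range(len(binary[0]))]
            for r in range(len(binary))]


def open_picture(binary):
    """Morphological opening: erosion followed by dilation."""
    return dilate(erode(binary))
```

With this neighbourhood, any structure narrower than 3 pixels is removed by the erosion and is not restored by the dilation, so the opening should be applied before the picture is reduced to a small size.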
Here are a few examples of resulting pictures.
[Figure panels: Background, Input Picture, Binary Picture]
Standard Re-sizing
According to the requirements, the user is not expected to always be at the same
distance from the video camera. The consequences are obvious: if he is close to it, the hand will
occupy a large part of the input picture. On the contrary, when he is far from it, the hand will
appear small in the picture. So, the pictures of the hands after cropping may have
very different sizes. It is therefore necessary to resize all the pictures to a standard size so
that we can process them all in the same way.
It is not useful to resize a picture to a size larger than the original one, given
that this will not add information. Worse, it would be a serious drawback because it would
increase the amount of computation, which is contrary to the constraint of real time processing.
For these reasons, it is preferable to reduce the size, but not too much. Indeed, in an
excessively reduced picture, some fingers can disappear, and some spaces between two fingers
may also disappear so that it seems there is only one finger.
After a few tests and measurements, it has been decided that a size of 30x30 is small
enough to make computation fast, and large enough to avoid any major damage to the initial
picture.
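A minimal Python sketch of such a re-sizing is given here. Nearest-neighbour sampling is an assumption made for simplicity (the report does not state which interpolation was used), and the function name resize_nearest is hypothetical.

```python
def resize_nearest(picture, size=30):
    """Resize a picture (list of rows) to size x size by nearest-neighbour sampling.

    Each output pixel copies the source pixel at the proportional position,
    so a binary picture stays binary.
    """
    height, width = len(picture), len(picture[0])
    return [[picture[r * height // size][c * width // size]
             for c in range(size)]
            for r in range(size)]


if __name__ == "__main__":
    # A 60 x 90 picture whose left half is set, reduced to 30 x 30.
    source = [[1 if c < 45 else 0 for c in range(90)] for _ in range(60)]
    small = resize_nearest(source)
    print(len(small), len(small[0]), sum(small[0]))
```

Since the output is always square, the aspect ratio of the cropped hand is not preserved; this is consistent with using one fixed 30x30 size for every picture.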
In these conditions, the average dimensions of a finger are:
- width: 3~5 pixels
- length: 15~20 pixels
Of course, different users will all have different hands, hence different absolute
measurements. Nevertheless, such standard re-sizing will provide relative measurements: although
the actual sizes of the thumb and ring fingers depend on the user, their ratio will generally be
constant. For almost all users:
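- $\mathrm{Width}(\text{thumb}) \approx \mathrm{Width}(\text{ring}) \approx \dots \approx \mathrm{Width}(\text{little})$
- $\dfrac{\mathrm{Length}_{\text{User 1}}(\text{ring})}{\mathrm{Length}_{\text{User 1}}(\text{little})} \approx \dfrac{\mathrm{Length}_{\text{User 2}}(\text{ring})}{\mathrm{Length}_{\text{User 2}}(\text{little})} \approx \dots \approx \mathrm{Cst}$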

That is why this re-sizing operation can be considered as a standardization process: for any user,
the final re-sized image will have almost identical properties concerning the dimensions of its
elements.
Finally, the fact that the width of a finger is 4 to 5 pixels implies that in the resulting
picture
[Figure: a schematic example]
[Figure: a real example. Panels: Input picture, Binary picture, Zoom-in, Resizing]
In these conditions, for any input picture and for any hand gesture that involves the thumb,
the preprocessing algorithm provides a standard-sized binary picture that corresponds to a
zoom on the hand. Once this preprocessing is finished, the real processing can start, that is to
say, the identification process can be launched.
[Schematic example labels: initial picture, size 240 x 320; hand, size ? x ?; re-sized hand, 30 x 30]
Counting the fingers
Simple Pixel Counting Analysis
The first immediate idea is the following: a picture that contains only the hand of the user
is provided to the program. In this picture, if only one or two fingers are shown,
the number of pixels with value 1 will be small. If all five fingers are shown, there will be
more pixels at 1. So, there is a strong link between the number of fingers and the number of
pixels set to 1. The easiest way to classify an image is then to compute the sum of the pixels of
the re-sized hand picture, and to compare the resulting value to different ranges:
If sum < range_1
Then No fingers
If range_1 < sum < range_2
Then 1 finger
If range_2 < sum < range_3
Then 2 fingers
If range_3 < sum < range_4
Then 3 fingers
If range_4 < sum < range_5
Then 4 fingers
If range_5 < sum
Then 5 fingers
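The range comparison above can be sketched in Python as follows. The boundary values in RANGES are hypothetical placeholders (the report does not give the values of range_1 to range_5), and the function name count_fingers_by_pixels is likewise an assumption. A sum exactly equal to a boundary is assigned to the higher class here, which the rules above leave unspecified.

```python
from bisect import bisect_right

# Hypothetical values for range_1 .. range_5 on a 30x30 binary picture;
# real values would have to be measured on sample hands.
RANGES = [150, 210, 270, 330, 390]


def count_fingers_by_pixels(binary, ranges=RANGES):
    """Estimate the number of fingers from the number of pixels set to 1.

    The answer is the number of range boundaries that the pixel sum has
    reached: 0 below range_1, 5 from range_5 upwards.
    """
    total = sum(sum(row) for row in binary)
    return bisect_right(ranges, total)


if __name__ == "__main__":
    # A fake picture with 300 set pixels (10 full rows of 30).
    picture = [[1] * 30 for _ in range(10)] + [[0] * 30 for _ in range(20)]
    print(count_fingers_by_pixels(picture))
```

The whole classification is a single sum and a binary search, which explains the speed of the method; its accuracy depends entirely on how well the boundaries fit the current user.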
The advantage of this method is huge: Such programming is quite easy and very fast.
However, it is not a very efficient way:
According to the previous sections, the width of a finger will generally be 4 to 5 pixels,
and lets suppose its length is 15 to 20 pixels, according to the user. So lets consider that for
User 1, each finger has a dimension:
4 ( ) *15 ( ) 60 / pixels width pixels length pixels finger
For User 1, four fingers will lead to about 240 pixels (4 * 60). Let us suppose that for User 2,
the width of a finger is 5 pixels and its length is 20 pixels. The finger dimensions are:
$5\ \text{pixels (width)} \times 20\ \text{pixels (length)} = 100\ \text{pixels per finger}$
For User 2, two fingers will already lead to 200 pixels, which is close to the value obtained for
four fingers of User 1. The consequence is that the program will get confused and may tell User 2
that he is exhibiting five fingers (four fingers and the thumb) when he shows only three of them
(two fingers and the thumb)!
Another issue is that even if it is always the same user who makes the signs, and the different
ranges have been optimized for his average finger size, errors will probably occur if he does not
open the hand widely. Indeed, if the hand is fully open, let us assume no error will occur; but if
the fingers are a little bit curled (partly closed), then for each finger the sum of its pixels
will be smaller, and if this is the case for several fingers, the global sum may lead to a mistake.
An example of this phenomenon is given here:
[Figure: two example pictures; for the first the program answers 5, for the second the program
answers 4]
In this example, when the last two fingers are curled, the sum of their pixels makes the program
consider that there are only four fingers, because the global sum is almost the same as the one the
program would obtain if four fingers were exhibited with the hand fully open.
This very simple method is efficient only for a single user, and only if he accepts more
constraints concerning the allowed signs. Such a solution is not acceptable for the project, at
least because it has to work with several users. It is therefore necessary to consider some more
sophisticated solutions.
Simple Block Counting Analysis
The program has to count the number of fingers, so let us create a picture in which only the
fingers remain. This is easy to do, given that the orientation of the hand is known: cropping the
left part of the picture (including the thumb) leaves only the fingers in the picture.
In such cases, the number of fingers is the number of blocks in the cropped picture, plus
1, because the thumb has to be counted even though it has been cropped.
This method offers a huge advantage: its simplicity. Indeed, no calculation or special
treatment is required; the only operation we have to do is to compute the number of blocks in the
shortened image. Using the Matlab function bwlabel, from the Image Processing Toolbox,
makes the coding very easy.
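For readers without Matlab, the block counting step can be sketched in plain Python. This is not the report's code: count_blocks is a hypothetical stand-in for the labelling that bwlabel performs, written here as a breadth-first flood fill with 8-connectivity, and the crop column is an assumed parameter.

```python
from collections import deque


def count_blocks(image):
    """Count 8-connected blocks of 1-pixels in a binary image (list of rows)."""
    rows, cols = len(image), len(image[0])
    seen = [[False] * cols for _ in range(rows)]
    blocks = 0
    for r in range(rows):
        for c in range(cols):
            if image[r][c] != 1 or seen[r][c]:
                continue
            blocks += 1  # a new, not yet visited block starts here
            seen[r][c] = True
            queue = deque([(r, c)])
            while queue:
                cr, cc = queue.popleft()
                for dr in (-1, 0, 1):
                    for dc in (-1, 0, 1):
                        nr, nc = cr + dr, cc + dc
                        if (0 <= nr < rows and 0 <= nc < cols
                                and image[nr][nc] == 1 and not seen[nr][nc]):
                            seen[nr][nc] = True
                            queue.append((nr, nc))
    return blocks


def count_fingers_by_blocks(image, first_finger_column):
    """Crop the left part (thumb side) and return blocks + 1 for the thumb."""
    cropped = [row[first_finger_column:] for row in image]
    return count_blocks(cropped) + 1


# Tiny synthetic hand: a palm on the left and three fingers pointing right.
hand = [[int(ch) for ch in line] for line in (
    "00000000",
    "11111111",
    "11100000",
    "11111111",
    "11100000",
    "11111110",
    "00000000",
)]

print(count_blocks(hand), count_fingers_by_blocks(hand, 4))
```

In the uncropped picture the palm joins everything into one block; after cropping at the assumed column 4, three separate blocks remain, and adding the thumb gives four.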
However, this method also has some major drawbacks. Indeed, the re-sizing operation
can make some well-separated fingers turn into two joined fingers that look like one
single big finger, which causes an error in the evaluation of the number of fingers.
If the user wants to avoid such problems, he has to open the hand widely. In this way,
any confusion becomes impossible. The problem is that if the user opens the hand widely, the index
finger or the atrial finger (the fifth one) may not be present in the last columns of the picture.
So the user has to open the hand widely, but not too much, and he may need time to find the best
opening for each one of the different signs he wants to make. And even if we suppose that he
succeeds in doing so, another phenomenon occurs:
If the user opens the hand just enough for the sign he makes, some noisy pixels
that remain despite the noise removal may join two fingers. The function bwlabel will then
consider them to be just one block, which implies an error in the estimation of the number of
fingers.
This method is very interesting and efficient considering its low level of complexity and its
simple coding. However, there is room to improve it, because the error rate
can be reduced. With this method, around 70-75 percent of the allowed signs (that is, those that
include the thumb) are successfully classified.
Weighted Averaging Analysis
In order to understand the basic idea that is discussed here, let us consider the differences
and the common points between the methods that have already been introduced:
- The Pixel Counting method and the Edges Counting method were very simple
solutions, but their problem was that they were not efficient enough. Their advantage was
their low level of complexity for the implementation, given that they were geometrical
solutions.
- The Neural Networks solution has proven considerably more efficient, but it requires
training, and special management and processing of the binary picture. Moreover, when
looking at the weights of the input layer, it appears that the neural network just realizes a
kind of weighted averaging.
Hence, the motivation in this section is to try to realize weighted averaging in a simpler
way.
Choosing the weights
In this section, the explanations will refer to the following picture, which has already
been introduced in the section Edge Counting Analysis. This picture was an example that leads
to a classification error:
First of all, let us suppose no weights are used, that is, the weights are all set to the same
value, one for example. When averaging the pixel values, all the pixels will have the same
importance. Given that the left part of the picture is not relevant for computing the number of
fingers (except for the thumb, all the fingers are in the right part of the picture), the only
columns that will be considered are columns 15 to 25, for example.
It has been shown previously that edge counting alone in this area is not efficient in this case,
and that only counting the number of pixels set to 1 may lead to incoherent results, given that
the relative dimensions of a finger depend on the user and that the following picture will lead
to 4.
One solution is to mix these two methods, that is, to realize a weighted averaging in which the
weight of each pixel set to 1 is half the number of edges in the column of the considered pixel.
For example, according to the picture provided at the beginning of this section, the pixel at line
19, column 16 is set to 1 and its weight is 6, given that there are 12 edges in column 16.
A quick approximate calculation leads to the following results:
- If there is only the thumb in the picture, no pixel will be set to 1 in columns 15
to 25, and the weighted averaging will lead to 0.
- If there are the thumb and one finger in the picture, about 60 to 100 pixels will be set to
1 in columns 15 to 25, and the weight (half the number of edges) in this area should be 1. So the
weighted averaging should lead to values from 60 to 100.
- If there are the thumb and two fingers in the picture, about 2*60 to 2*100 pixels will be
set to 1 in columns 15 to 25, and the weight in this area should be 2. So the
weighted averaging should lead to values from 2*60*2 to 2*100*2, that is, 240 to 400.
- If there are the thumb and three fingers in the picture, about 3*60 to 3*100 pixels will be
set to 1 in columns 15 to 25, and the weight in this area should be 3. So the
weighted averaging should lead to values from 3*60*3 to 3*100*3, that is, 540 to 900.
- If there are the thumb and four fingers in the picture, about 4*60 to 4*100 pixels will be
set to 1 in columns 15 to 25, and the weight in this area should be 4. So the
weighted averaging should lead to values from 4*60*4 to 4*100*4, that is, 960 to 1600.
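According to these values, let us create the following bounds:
Bound between 1 and 2 fingers: $\frac{0 + 60}{2} = 30$
Bound between 2 and 3 fingers: $\frac{100 + 240}{2} = 170$
Bound between 3 and 4 fingers: $\frac{400 + 540}{2} = 470$
Bound between 4 and 5 fingers: $\frac{900 + 960}{2} = 930$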
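The arithmetic behind these bounds can be checked with a short Python sketch. It assumes, as in the rough estimates of this section, that one finger contributes 60 to 100 pixels in the analysed columns and that the weight equals the number of fingers beside the thumb; the function names are illustrative only.

```python
def typical_ranges(max_fingers=4, min_pixels=60, max_pixels=100):
    """Return (low, high) WA estimates for 0..max_fingers fingers beside the thumb.

    With k fingers, about k*min_pixels to k*max_pixels pixels are set to 1 and
    each of them carries the weight k.
    """
    return [(k * min_pixels * k, k * max_pixels * k) for k in range(max_fingers + 1)]


def midpoint_bounds(ranges):
    """Place each bound halfway between the top of one range and the bottom of the next."""
    return [(ranges[i][1] + ranges[i + 1][0]) / 2 for i in range(len(ranges) - 1)]


ranges = typical_ranges()
bounds = midpoint_bounds(ranges)
print(ranges)
print(bounds)
```

Each bound sits halfway between the largest value expected for one class and the smallest value expected for the next, which is the construction used for the four bounds of this section.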
That is to say, the algorithm has to carry out the following operations:
1) Calculate
$$WA = \sum_{column=15}^{25} \left[ \sum_{line=1}^{30} number\_edges(column) \cdot pixel(line, column) \right]$$
where number_edges(column) stands for the weight of the column used in the estimates above
(half the number of edges counted in that column).
2) Estimate the number of fingers in the picture using:
if WA < 30 then Number_of_fingers = 1
if 30 < WA < 170 then Number_of_fingers = 2
if 170 < WA < 470 then Number_of_fingers = 3
if 470 < WA < 930 then Number_of_fingers = 4
if 930 < WA then Number_of_fingers = 5
$$WA \approx 100 \cdot \left( Number\_of\_edges \right)^{2}$$
The consequence is that the distance between typical WA values (values of the weighted
averaging) grows quadratically with the number of fingers, and that makes the classification less
sensitive to errors. Indeed, in this case, the distance between a typical value and the nearest
bound is always large: for example, the typical WA for 5 fingers is (960+1600)/2=1280. An error
can occur only if the calculated WA, which should be about 1280, falls under 930, that is, if the
calculation error is bigger than 350. This can happen only if there are a lot of errors on the
number of edges in each column and if the relative dimensions of the fingers are unusual: one
finger very thick, three fingers very thin, and the thumb.
In order to understand the efficiency of this method, let us compare it to the bound that
would have been considered in a simple pixel counting algorithm: for four fingers, the sum of the
pixels would be about 3*60=180, and for five fingers, it would be equal to 4*60=240. The bound
between 4 and 5 fingers would be (180+240)/2=210. An error on five fingers happens when fewer
than 210 pixels are counted in columns 15 to 25. The margin is:
240-210=30.
When comparing the error margins, it appears that without any weights the margin is equal to 30,
and that with weights chosen as the number of edges in the column of the analyzed pixel, the margin
reaches about 350, more than 10 times the previous one! That is why this method is considerably
better than the simple pixel counting one: different numbers of fingers lead to different ranges
that are separated by very large gaps that only huge errors can cross, and such errors are not very
frequent.
Without weights, confusion may occur when several fingers are exhibited (three, four or
five fingers). The use of weights makes this confusion much rarer, because three-, four- and
five-finger pictures turn into WA values that are very distant from one another.
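The weighted averaging of this section can be sketched in Python as follows. This is a minimal illustration, not the report's Matlab program: the function names are invented, the picture is assumed to be a 30 x 30 list of rows of 0/1 values, background is assumed outside the picture borders, and columns 15 to 25 of the report become indices 14 to 24 because Python counts from 0.

```python
def column_weight(image, col):
    """Return half the number of 0/1 transitions found in one column."""
    edges = 0
    previous = 0  # assumption: background (0) above the first line
    for row in image:
        if row[col] != previous:
            edges += 1
        previous = row[col]
    if previous == 1:
        edges += 1  # a finger touching the bottom border still has two edges
    return edges // 2


def weighted_average(image, first_col=14, last_col=24):
    """Sum weight(column) * pixel(line, column) over the analysed columns."""
    total = 0
    for col in range(first_col, last_col + 1):
        total += column_weight(image, col) * sum(row[col] for row in image)
    return total


def classify(wa, bounds=(30, 170, 470, 930)):
    """Return the number of fingers (thumb included) for a WA value."""
    return 1 + sum(1 for bound in bounds if wa > bound)


# Synthetic 30 x 30 picture: two horizontal fingers, each 5 lines thick.
picture = [[0] * 30 for _ in range(30)]
for first_line in (5, 15):
    for line in range(first_line, first_line + 5):
        for col in range(14, 25):
            picture[line][col] = 1

wa = weighted_average(picture)
print(wa, classify(wa))
```

In the synthetic picture every analysed column crosses two fingers, so each of its pixels is counted twice; the resulting value lies above the bound of 170 and the sketch reports three fingers (two fingers and the thumb).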
Matlab Operations
Building GUI interfaces in Matlab
This example shows how to build a GUI in Matlab.
Start the GUI builder by typing
>>guide

Select "Blank GUI", click OK
The GUI window will open

Resize the design window.
Using the palette on the left, drag and drop, resize and position the canvas, buttons, and static
text windows.

Double-click on an object to open the properties dialog. Change the captions on the buttons
and remove the "Static Text" string from the text window. Set the font size to 30 for the text
windows and change the horizontal alignment to "right."


The GUI is finished. Save the work.
The rest of the design process takes care of the functionality provided by each GUI component.
Neural Network Toolbox
Neural Network Toolbox extends MATLAB with tools for designing, implementing, visualizing, and
simulating neural networks. Neural networks are invaluable for applications where formal analysis
would be difficult or impossible, such as pattern recognition and nonlinear system identification
and control. Neural Network Toolbox software provides comprehensive support for many proven
network paradigms, as well as graphical user interfaces (GUIs) that enable you to design and
manage your networks. The modular, open, and extensible design of the toolbox simplifies the
creation of customized functions and networks.
Neural Network Toolbox GUIs make it easy to work with neural networks. The Neural
Network Fitting Tool is a wizard that leads you through the process of fitting data using
neural networks. You can use the tool to import large and complex data sets, quickly
create and train networks, and evaluate network performance.
Key features
- GUI for creating, training, and simulating neural networks
- Support for the most commonly used supervised and unsupervised network architectures
- Comprehensive set of training and learning functions
- Dynamic learning networks, including time delay, nonlinear autoregressive (NARX),
layer-recurrent, and custom dynamic
- Simulink blocks for building neural networks and advanced blocks for control systems
applications
- Support for automatically generating Simulink blocks from neural network objects
- Preprocessing and postprocessing functions and Simulink blocks for improving network training
and assessing network performance
Network Architectures
Neural Network Toolbox supports both supervised and unsupervised networks.
Supervised Networks
Supervised neural networks are trained to produce desired outputs in response to
sample inputs, making them particularly well suited to modeling and controlling dynamic
systems, classifying noisy data, and predicting future events.
Neural Network Toolbox supports four supervised networks: feedforward, radial basis, dynamic,
and learning vector quantization (LVQ).
Feedforward networks have one-way connections from input to output layers. They are most
commonly used for prediction, pattern recognition, and nonlinear function fitting. Supported
feedforward networks include feedforward backpropagation, cascade-forward backpropagation,
feedforward input-delay backpropagation, linear, and perceptron networks.
Radial basis networks provide an alternative, fast method for designing nonlinear feedforward
networks. Supported variations include generalized regression and probabilistic
neural networks.
Dynamic networks use memory and recurrent feedback connections to recognize spatial and
temporal patterns in data. They are commonly used for time-series prediction, nonlinear dynamic
system modeling, and control system applications. Prebuilt dynamic networks in the toolbox
include focused and distributed time-delay, nonlinear autoregressive (NARX), layer-recurrent,
Elman, and Hopfield networks. The toolbox also supports dynamic training of custom networks
with arbitrary connections.
LVQ is a powerful method for classifying patterns that are not linearly separable. LVQ lets you
specify class boundaries and the granularity of classification.
Unsupervised Networks
Unsupervised neural networks are trained by letting the network continually adjust itself
to new inputs. They find relationships within data and can automatically define classification
schemes.
Neural Network Toolbox supports two types of self-organizing, unsupervised networks:
competitive layers and self-organizing maps.
Competitive layers recognize and group similar input vectors. By using these groups, the
network automatically sorts the inputs into categories.
Training and Learning Functions
Training and learning functions are mathematical procedures used to automatically adjust the
network's weights and biases. The training function dictates a global algorithm that affects all
the weights and biases of a given network. The learning function can be applied to individual
weights and biases within a network.
Neuron Model

Simple Neuron
A neuron with a single scalar input and no bias is shown on the left below.

Figure : Neuron
The scalar input p is transmitted through a connection that multiplies its strength by the
scalar weight w, to form the product wp, again a scalar. Here the weighted input wp is the
only argument of the transfer function f, which produces the scalar output a. The neuron
on the right has a scalar bias, b. You may view the bias as simply being added to the
product wp as shown by the summing junction or as shifting the function f to the left by
an amount b. The bias is much like a weight, except that it has a constant input of 1. The
transfer function net input n, again a scalar, is the sum of the weighted input wp and the
bias b. This sum is the argument of the transfer function f. Here f is a transfer function,
typically a step function or a sigmoid function, that takes the argument n and produces
the output a. Examples of various transfer functions are given in the next section. Note
that w and b are both adjustable scalar parameters of the neuron. The central idea of
neural networks is that such parameters can be adjusted so that the network exhibits some
desired or interesting behavior.
Thus, we can train the network to do a particular job by adjusting the weight or bias
parameters, or perhaps the network itself will adjust these parameters to achieve some
desired end. All of the neurons in the program written in MATLAB have a bias.
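The single neuron described above computes a = f(wp + b). A minimal Python sketch is given below for illustration; the report's own program is written in MATLAB, and the function names hard_limit, log_sigmoid and neuron are assumptions made for this example.

```python
import math


def hard_limit(n):
    """Step transfer function: 1 when the net input is non-negative, else 0."""
    return 1.0 if n >= 0 else 0.0


def log_sigmoid(n):
    """Sigmoid transfer function with outputs between 0 and 1."""
    return 1.0 / (1.0 + math.exp(-n))


def neuron(p, w, b=0.0, f=log_sigmoid):
    """Return a = f(n) with net input n = w * p + b."""
    return f(w * p + b)


# The same input gives different outputs once w, b or f are changed.
print(neuron(2.0, 1.5, -3.0))
print(neuron(1.0, 2.0, -3.0, hard_limit))
print(neuron(2.0, 2.0, -3.0, hard_limit))
```

Adjusting w and b moves the point at which the step function switches, or the centre of the sigmoid, which is the sense in which the parameters are "trained".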
Feed forward Neural Networks
Feed forward neural networks (FF networks) are the most popular and most widely used models
in many practical applications. They are known by many different names, such as "multi-layer
perceptrons."
The figure illustrates a one-hidden-layer FF network with inputs $x_1, \ldots, x_n$ and output
$\hat{y}$. Each arrow in the figure symbolizes a parameter in the network. The network is divided
into layers. The input layer consists of just the inputs to the network. Then follows a hidden
layer, which consists of any number of neurons, or hidden units, placed in parallel. Each neuron
performs a weighted summation of the inputs, which then passes through a nonlinear activation
function $\sigma$, also called the neuron function.
Figure: A feedforward network with one hidden layer and one output.
Mathematically, the functionality of a hidden neuron is described by
$$\sigma\left( \sum_{j=1}^{n} w_j x_j + b \right)$$
where the weights $\{w_j, b\}$ are symbolized with the arrows feeding into the neuron.
The network output is formed by another weighted summation of the outputs of the neurons in
the hidden layer. This summation on the output is called the output layer. In Figure there is only
one output in the output layer since it is a single-output problem. Generally, the number of output
neurons equals the number of outputs of the approximation problem.
The output of this network is given by
$$\hat{y}(\theta) = \sum_{i=1}^{nh} w^{2}_{i} \, \sigma\left( \sum_{j=1}^{n} w^{1}_{i,j} x_j + b^{1}_{i} \right) + b^{2}$$
where n is the number of inputs, nh is the number of neurons in the hidden layer and $\sigma$ is
the activation function of the hidden neurons. The variables
$\{w^{1}_{i,j}, b^{1}_{i}, w^{2}_{i}, b^{2}\}$ are the parameters of the network model that are
represented collectively by the parameter vector $\theta$.
Note that the sizes of the input and output layers are defined by the number of inputs and outputs
of the network and, therefore, only the number of hidden neurons has to be specified when the
network is defined.
In training the network, its parameters are adjusted incrementally until the training data satisfy
the desired mapping as well as possible; that is, until $\hat{y}(\theta)$ matches the desired
output y as closely as possible, up to a maximum number of iterations.
The FF network in the figure is just one possible architecture of an FF network. You can modify the
architecture in various ways by changing the options. For example, you can change the activation
function to any differentiable function you want.
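The forward computation of such a one-hidden-layer network can be sketched in Python as below. This is an illustration only: the argument names w1, b1, w2 and b2 are assumptions standing for the hidden-layer weights and biases and the output-layer weights and bias, and tanh is used as one possible differentiable activation function.

```python
import math


def ff_output(x, w1, b1, w2, b2, sigma=math.tanh):
    """Output of a one-hidden-layer feedforward network with a single output.

    x  : list of n inputs
    w1 : nh rows of n hidden-layer weights
    b1 : nh hidden-layer biases
    w2 : nh output-layer weights
    b2 : output bias
    """
    hidden = [
        sigma(sum(w_ij * x_j for w_ij, x_j in zip(w_i, x)) + b_i)
        for w_i, b_i in zip(w1, b1)
    ]
    return sum(w_i * h_i for w_i, h_i in zip(w2, hidden)) + b2


# Two inputs, two hidden neurons, arbitrary illustrative parameters.
x = [1.0, 2.0]
w1 = [[0.5, -0.25], [0.0, 0.0]]
b1 = [0.0, 0.0]
w2 = [1.0, -1.0]
b2 = 0.3
print(ff_output(x, w1, b1, w2, b2))
```

Only the number of hidden neurons (the number of rows of w1) is a free design choice here; the lengths of x and of each row of w1 are fixed by the number of inputs.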
Advantages of Neural Computing
There are a variety of benefits that analysts realize from using neural networks in their
work:
- Pattern recognition is a powerful technique for harnessing the information in
the data and generalizing about it. Neural nets learn to recognize the patterns
which exist in the data set.
- The system is developed through learning rather than programming.
Programming is much more time consuming for the analyst and requires the
analyst to specify the exact behavior of the model. Neural nets teach
themselves the patterns in the data, freeing the analyst for more interesting
work.
- Neural networks are flexible in a changing environment. Rule-based systems
or programmed systems are limited to the situation for which they were
designed; when conditions change, they are no longer valid. Although neural
networks may take some time to learn a sudden drastic change, they are
excellent at adapting to constantly changing information.
- Neural networks can build informative models where more conventional
approaches fail. Because neural networks can handle very complex
interactions, they can easily model data which is too difficult to model with
traditional approaches such as inferential statistics or programming logic.
- Performance of neural networks is at least as good as classical statistical
modeling, and better on most problems. The neural networks build models
that are more reflective of the structure of the data in significantly less time.
Limitations of Neural Computing
There are some limitations to neural computing. The key limitation is the neural
network's inability to explain the model it has built in a useful way. Analysts often want
to know why the model is behaving as it is. Neural networks get better answers, but they
have a hard time explaining how they got there.
There are a few other limitations that should be understood. First, it is difficult to extract
rules from neural networks. This is sometimes important to people who have to explain
their answer to others and to people who have been involved with artificial intelligence,
particularly expert systems, which are rule-based.
As with most analytical methods, you cannot just throw data at a neural net and get a
good answer. You have to spend time understanding the problem or the outcome you are
trying to predict. You must also be sure that the data used to train the system are
appropriate and are measured in a way that reflects the behavior of the factors. If the data
are not representative of the problem, neural computing will not produce good results.
This is a classic situation where "garbage in" will certainly produce "garbage out."
Finally, it can take time to train a model from a very complex data set. Neural techniques
are computer intensive and will be slow on low-end PCs or machines without math
coprocessors. It is important to remember, though, that the overall time to results can still
be faster than with other data analysis approaches, even when the system takes longer to train.
Processing speed alone is not the only factor in performance, and neural networks do not
require the time spent programming, debugging or testing assumptions that other analytical
approaches do.
MICROCONTROLLER AND ROBOT
Power Supply
We directly provide a 12 V DC supply, which is also converted into a 5 V DC supply. 12 V is
required for driving the motors and 5 V for the microcontroller assembly.
The 12 V is converted into 5 V with the help of a 7805 regulator and capacitor combination.
Microcontroller(8051)
A microcontroller has a CPU together with a fixed amount of RAM, ROM, I/O ports, and timers,
all embedded on one chip. Such devices are used in embedded systems. We have used the
AT89C5124PI, from the 80C51 8-bit Flash microcontroller family, with 64 kB of Flash memory and
1 kB of RAM. The 89C5124PI device contains a non-volatile 64 kB Flash program memory that is both
parallel programmable and serial In-System and In-Application Programmable. In-System
Programming (ISP) allows the user to download new code while the microcontroller sits in the
application.
In-Application Programming (IAP) means that the microcontroller fetches new program code
and reprograms itself while in the system. This allows for remote programming over a modem
link. A default serial loader (boot loader) program in ROM allows serial In-System programming
of the Flash memory via the UART without the need for a loader in the Flash code. For In-
Application Programming, the user program erases and reprograms the Flash memory by use of
standard routines contained in ROM.
This device is a single-chip 8-bit microcontroller manufactured in an advanced CMOS process
and is a derivative of the 80C51 microcontroller family. The instruction set is 100% compatible
with the 80C51 instruction set. The device also has four 8-bit I/O ports, three 16-bit timer/event
counters, a multi-source, four-priority-level, nested interrupt structure, an enhanced UART and
on-chip oscillator and timing circuits.

The added features of the AT89C5124PI make it a powerful microcontroller for applications
that require pulse width modulation, high-speed I/O and up/down counting capabilities, such as
motor control.
Features :-
a) 80C51 Central Processing Unit
b) On-chip Flash Program Memory with In-System Programming(ISP) and In-Application
Programming (IAP) capability
c) Boot ROM contains low level Flash programming routines for downloading via the UART
d) Can be programmed by the end-user application (IAP)
e) 6 clocks per machine cycle operation (standard)
f) 12 clocks per machine cycle operation (optional)
g) Speed up to 20 MHz with 6 clock cycles per machine cycle (40 MHz equivalent
performance); up to 33 MHz with 12 clocks per machine cycle
h) Fully static operation
i) RAM expandable externally to 64 kB
j) 4 level priority interrupt
k) 8 interrupt sources
l) Four 8-bit I/O ports
m) Full-duplex enhanced UART
n) Framing error detection
o) Automatic address recognition
p) Power control modes
   - Clock can be stopped and resumed
   - Idle mode
   - Power down mode
q) Programmable clock out
r) Second DPTR register
s) Asynchronous port reset
t) Low EMI (inhibit ALE)
u) Programmable Counter Array (PCA)
   - PWM
   - Capture/Compare
PIN DESCRIPTION :
a) Ground: 0 V reference.
b) Power Supply(Vcc): This is the power supply voltage for normal, idle, and power- down
operation.
c) Port 0(8 I/O pins from 39-32):
Port 0 is an open-drain, bidirectional I/O port. Port 0 pins that have 1s written to them float
and can be used as high-impedance inputs. Port 0 is also the multiplexed low-order address and
data bus during accesses to external program and data memory. In this application, it uses
strong internal pull-ups when emitting 1s.
d) Port 1(8 I/O numbered 1-8):
Port 1 is an 8-bit bidirectional I/O port with internal pull-ups on all pins except P1.6 and
P1.7, which are open drain. Port 1 pins that have 1s written to them are pulled high by the
internal pull-ups and can be used as inputs. As inputs, port 1 pins that are externally pulled
low will source current because of the internal pull-ups.
Alternate functions for 89C51RB2/RC2/RD2 Port 1 include:
1) T2 (P1.0): Timer/Counter 2 external count input/Clockout
2) T2EX (P1.1): Timer/Counter 2 Reload/Capture/Direction Control
3) ECI (P1.2): External Clock Input to the PCA
4) CEX0 (P1.3): Capture/Compare External I/O for PCA module 0
e) Port 2(21-28):
Port 2 is an 8-bit bidirectional I/O port with internal pull-ups. Port 2 pins that have 1s
written to them are pulled high by the internal pull-ups and can be used as inputs. As inputs,
port 2 pins that are externally being pulled low will source current because of the internal
pull-ups. Port 2 emits the high-order address byte during fetches from external program memory
and during accesses to external data memory that use 16-bit addresses (MOVX @DPTR).
f) Port 3(10-17):
Port 3 is an 8-bit bidirectional I/O port with internal pull-
ups. Port 3 pins that have 1s written to them are pulled high by the
internal pull-ups and can be used as inputs. As inputs, port 3 pins that are
externally being pulled low will source current because of the pull-ups.
Port 3 also serves the special features of the 89C51RB2/RC2/RD2, as listed
below:
I. RxD (P3.0): Serial input port
II. TxD (P3.1): Serial output port
III. INT0 (P3.2): External interrupt
IV. INT1 (P3.3): External interrupt
V. T0 (P3.4): Timer 0 external input
VI. T1 (P3.5): Timer 1 external input
VII. WR (P3.6): External data memory write strobe
VIII. RD (P3.7): External data memory read strobe
g) RST Reset(pin 9): A high on this pin for two machine cycles while the
oscillator is running, resets the device. An internal diffused resistor to
VSS permits a power-on reset using only an external capacitor to VCC.
h) ALE (Address Latch Enable, pin 30): Output pulse for latching the low
byte of the address during an access to external memory. In normal
operation, ALE is emitted twice every machine cycle, and can be used
for external timing or clocking. Note that one ALE pulse is skipped
during each access to external data memory. ALE can be disabled by
setting SFR auxiliary.0. With this bit set, ALE will be active only during
a MOVX instruction.
i) PSEN (Program Store Enable, pin 29): The read strobe to external
program memory. When executing code from the external program
memory, PSEN is activated twice each machine cycle, except that two
PSEN activations are skipped during each access to external data
memory. PSEN is not activated during fetches from internal program
memory.
j) EA/VPP(External Access Enable/Programming Supply Voltage, pin 31):
EA must be externally held low to enable the device to fetch code
from external program memory locations. If EA is held high, the device
executes from internal program memory. The value on the EA pin is
latched when RST is released and any subsequent changes have no
effect. This pin also receives the programming supply voltage (VPP)
during Flash programming.
k) XTAL1 and XTAL2(pin 19 & 18): Input & output respectively to the
inverting oscillator amplifier and input to the internal clock generator
circuits.
To avoid a latch-up effect at power-on, the voltage on any pin (other than
VPP) must not be higher than VCC + 0.5 V or lower than VSS - 0.5 V.
Motor Driver(ULN2004A)
The ULN2004A is a high-voltage, high-current Darlington array containing seven open-collector
Darlington pairs with common emitters. Each channel is rated at 500 mA and can withstand
peak currents of 600 mA. Suppression diodes are included for inductive load driving, and the
inputs are pinned opposite the outputs to simplify board layout.
These versatile devices are useful for driving a wide range of loads, including solenoids, relays,
DC motors, LED displays, filament lamps, thermal print-heads and high-power buffers.
The maximum output voltage is 50 V.
The 2004A is supplied in a 16-pin plastic DIP package with a copper lead frame to reduce thermal
resistance.

Robot
The robot is a two-wheel robot with a castor wheel provided for support. A ULN2004A IC is
used for driving the motors. Stepper motors have been used. As the name suggests, stepper
motors do not spin freely like DC motors; they rotate in discrete steps, under the command of a
controller. This makes them easier to control, as the controller knows exactly how far they
have rotated, without having to use a sensor. Therefore they are used on many robots. The stepper
motor used is a unipolar motor, and hence has six wires coming out of it. Four of them are used
for receiving the drive signals from the microcontroller for its movement, while the other two
are shorted together and connected to the 12 V DC supply.
To move the motor, its windings are excited alternately and continuously with the help of
assembly code.
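The report's assembly code is not reproduced in this document. As a rough illustration of how successive windings can be excited, the Python sketch below generates a one-winding-at-a-time (wave drive) sequence for the four drive wires. The bit patterns, the choice of wave drive and the function name are assumptions made for this example, not the sequence used on the robot.

```python
# One bit per drive wire; exactly one winding is energised at each step.
WAVE_DRIVE = (0b0001, 0b0010, 0b0100, 0b1000)


def step_patterns(steps, clockwise=True, start=0):
    """Yield the winding pattern to output for each of the requested steps."""
    direction = 1 if clockwise else -1
    index = start % len(WAVE_DRIVE)
    for _ in range(steps):
        yield WAVE_DRIVE[index]
        index = (index + direction) % len(WAVE_DRIVE)


# On the real hardware each pattern would be written to a port, followed by a delay.
print([format(p, "04b") for p in step_patterns(5)])
print([format(p, "04b") for p in step_patterns(5, clockwise=False)])
```

Reversing the order in which the patterns are emitted reverses the direction of rotation, and the number of emitted patterns fixes how far the motor turns.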

Stepwise procedure/ flow:
1) Input pattern to be recognized
2) Sampling
3) Generation of templates
4) Template matching with input pattern
5) Best match
6) Recognized pattern
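The matching step of this flow can be sketched as a nearest-template search in Python. The report does not state which similarity measure is used, so the Hamming distance (number of differing pixels) is an assumption here, as are the function names and the tiny 3 x 3 templates.

```python
def hamming_distance(a, b):
    """Count the pixels that differ between two binary pictures of equal size."""
    return sum(pa != pb
               for row_a, row_b in zip(a, b)
               for pa, pb in zip(row_a, row_b))


def best_match(pattern, templates):
    """Return the label of the template closest to the input pattern."""
    return min(templates, key=lambda label: hamming_distance(pattern, templates[label]))


# Hypothetical 3 x 3 templates, for illustration only.
templates = {
    "one": [[0, 1, 0],
            [0, 1, 0],
            [0, 1, 0]],
    "two": [[1, 0, 1],
            [1, 0, 1],
            [1, 0, 1]],
}

# Sampled input pattern: a noisy version of template "one".
pattern = [[0, 1, 0],
           [0, 1, 1],
           [0, 1, 0]]

print(best_match(pattern, templates))
```

The input differs from template "one" in a single pixel and from template "two" in eight, so "one" is returned as the recognized pattern.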

TIME ACTIVITY CHART:
[Figure: chart of activities 1 to 5 (vertical axis, Activities) against time (horizontal axis,
Months, marked at 3, 4, 7, 10 and 12)]
Activities:
1 - Literature review
2 - Selection of the application and deciding the specifications of the equipment required for it
3 - Making an experimental set-up
4 - Conducting trials, plotting results and drawing conclusions
5 - Preparation of the report
CONCLUSION
We proposed a fast and simple algorithm for a hand gesture recognition problem. Given
observed images of the hand, the algorithm segments the hand region, and then makes an
inference on the activity of the fingers involved in the gesture. We have demonstrated the
effectiveness of this computationally efficient algorithm on real images we have acquired. Based
on our motivating robot control application, we have only considered a limited number of
gestures. Our algorithm can be extended in a number of ways to recognize a broader set of
gestures. The segmentation portion of our algorithm is too simple, and would need to be
improved if this technique were to be used in challenging operating conditions. However,
we should note that the segmentation problem in a general setting is an open research problem
itself. Reliable performance of hand gesture recognition techniques in a general setting requires
dealing with occlusions, temporal tracking for recognizing dynamic gestures, as well as 3D
modeling of the hand, which are still mostly beyond the current state of the art.

FUTURE SCOPE
Even with limited processing power, it will be possible to design very efficient algorithms in
order to:
- Track people
- (Re-)identify them
- Understand their (static) gestures
- Control a robot
Our software has been designed to be reusable, and many more complex behaviors may
be added to our work. Because we limited ourselves to low processing power, our work could
easily be made faster by adding a state-of-the-art processor. The use of a real embedded
OS could improve our system in terms of speed and stability. In addition, implementing more
sensor modalities would improve robustness even in very complex scenes. Our system has
shown that interaction with machines through gestures is a feasible task, and the
set of detected gestures could be extended to more commands by implementing a more complex
model of a human being. In the future, service robots executing many different tasks, from
housemaid work to nuclear power plant services, might arise and become a common part of everyday
life, as normal as computers are nowadays.