
Hand Gesture Recognition System

Submitted by:
Ahsan Ayyaz

2010-MC-09

Hassan Javaid

2010-MC-20

M. Usman Khan

2010-MC-41

Supervised by: Sir Ahsan Naeem

Department of Mechatronics and Control Engineering


University of Engineering and Technology Lahore

Hand Gesture Recognition System

Submitted to the faculty of the Mechatronics and Control Engineering Department of the University
of Engineering and Technology Lahore
in partial fulfillment of the requirements for the Degree of

Bachelor of Science
in

Mechatronics and Control Engineering.

Internal Examiner

External Examiner

Department of Mechatronics and Control Engineering


University of Engineering and Technology Lahore


Declaration
We declare that the work contained in this thesis is our own, except where explicitly stated otherwise. In
addition, this work has not been submitted to obtain another degree or professional qualification.
Signed:
Date:


Acknowledgments
Thanks to Allah Almighty, who gave us the strength and courage to work on this project and achieve
this goal with His grace and blessings. The completion of a project brings satisfaction, but it is incomplete
without thanking the people who made it possible and whose constant support crowned our efforts with
success.
We would like to thank our project advisor, Mr. Ahsan Naeem, whose valuable suggestions and
guidance brought us to this point of achievement. His support was with us in every situation throughout
the completion of the project. Finally, we offer our regards and prayers to all who helped and supported
us in this project.


Dedication
This project is dedicated to our beloved parents whose prayers, care and efforts made us
capable of reaching this greatest point of achievement.


Abstract
Hand gesture recognition is a way to create a useful, highly adaptive interface between
machines and their users. Recognizing gestures is difficult because gestures exhibit natural human
variability. Sign languages are widely used for communication, and various systems and methods exist
for sign language recognition. Our approach is robust and efficient for static hand gesture recognition.
The main objective of this thesis is to propose a system that can successfully recognize the 26 static
hand gestures of American Sign Language (ASL) for the letters A to Z, and that can also classify static
images correctly in real time. We propose a novel pattern recognition method that recognizes ASL
symbols using features extracted by the SIFT algorithm and classification by the MK-ROD algorithm.
A modified classification technique is also presented which has drastically improved our results.


LIST OF TABLES

Table 1: Existing Systems
Table 2: MK-ROD SIFT Results, DistRatio = 0.8
Table 3: MK-ROD SIFT Results, DistRatio = 0.65
Table 4: MK-ROD SIFT Results, Letter = L
Table 5: MK-ROD SIFT Results, Letter = B
Table 6: MK-ROD SIFT Results, DistRatio = 0.65, Changes Applied
Table 7: Offline Testing Results
Table 8: Online Testing Results


LIST OF FIGURES

Figure 1: Hardware Setup
Figure 2: Lab Environment
Figure 3: Defects of Self-Shadowing and Cast-Shadowing
Figure 4: Hardware Setup for Image Acquisition
Figure 5: Original Image (Detection of Hand)
Figure 6: Marked Skin Pixels on the Image
Figure 7: Result after Morphological Operations
Figure 8: Detected Hand Shown
Figure 9: Final Result
Figure 10: Hand Gesture Recognition Using Point Pattern Matching Flow Chart
Figure 11: Flow Chart of Point Pattern Matching
Figure 12: MK-ROD Distance Calculation
Figure 13: Original Images and Detected Hands
Figure 14: SIFT Key Points on Detected Hands
Figure 15: Matched Key Points for Both Images
Figure 16: Appended Resultant Database and Input Images
Figure 17: GUI for Demonstration
Figure 18: Gesture Set


CONTENTS

Acknowledgements .....................................................................................................iii
Dedication ..................................................................................................................iv
Abstract .......................................................................................................................v
List of Tables ..............................................................................................................vi
List of Figures ...........................................................................................................vii

Chapter 1: Introduction ................................................................................................1
1.1 Applications ...........................................................................................................1
1.2 Motivation .............................................................................................................2
1.3 Problem Statement .................................................................................................2

Chapter 2: Background ................................................................................................3
2.1 Gesture Types ........................................................................................................3
2.2 Uses ......................................................................................................................3
2.3 Input Devices .........................................................................................................4
2.4 Existing Systems ....................................................................................................5

Chapter 3: Hardware Setup & Implementation Strategy ...............................................7
3.1 Report Overview ....................................................................................................7
3.2 Project Summary ....................................................................................................7
3.3 MATLAB Toolboxes ..............................................................................................8
3.3.1 Image Processing Toolbox ...................................................................................8
3.3.2 Image Acquisition Toolbox ..................................................................................9
3.3.3 Computer Vision Toolbox ....................................................................................9
3.4 Lab Environment ..................................................................................................10
3.4.1 Embedded Systems Lab .....................................................................................10
3.4.2 Lighting Environment ........................................................................................10

Chapter 4: Methodology .............................................................................................12
4.1 Acquisition of Image ............................................................................................12
4.2 Image Pre-Processing and Detection of Hand .......................................................13
4.3 Recognition Strategies ..........................................................................................16
4.3.1 With Marker ......................................................................................................16
4.3.2 Without Marker (Point Pattern Matching) ..........................................................16
4.4 Choice of Strategy ................................................................................................16

Chapter 5: Experimentation ........................................................................................22
5.1 Feature Extraction through SIFT Algorithm ..........................................................22
5.2 Classification through MK-ROD Algorithm ..........................................................24
5.2.1 Finding Match Key Points from Image: Pseudo Code ........................................24
5.2.2 Steps for Finding Validity Ratio of Images .........................................................26
5.2.3 Pseudo Code for Final Classification .................................................................27
5.3 Classification Results 1: MK-ROD Algorithm .......................................................29
5.3.1 Offline and Online Testing ................................................................................29
5.4 Classification Results through Modified MK-ROD Algorithm ...............................31
5.4.1 Need for Modification .......................................................................................31
5.5 Classification Results 2: Modified MK-ROD Algorithm ........................................34
5.6 GUI Development ................................................................................................35
5.6.1 Features of the GUI ...........................................................................................35

Chapter 6: Results and Conclusions ............................................................................36
6.1 Final Results Obtained .........................................................................................36
6.2 Conclusion ...........................................................................................................38
6.3 Future Improvements ...........................................................................................38

Appendix A: Gesture Set Pictures ..............................................................................39
Appendix B: Pseudo Code .........................................................................................40
Appendix C: References ............................................................................................44
Appendix D: Bibliography .........................................................................................45


CHAPTER# 1
INTRODUCTION:

In this project we will design and build a man-machine interface using a video camera to interpret
the American one-handed sign language alphabet and number gestures (plus others for additional keyboard
and mouse control).
The keyboard and mouse are currently the main interfaces between man and computer.
In other areas where 3D information is required, such as computer games, robotics and design, other
mechanical devices such as roller-balls, joysticks and data-gloves are used.
Humans communicate mainly by vision and sound; therefore, a man-machine interface would be
more intuitive if it made greater use of vision and audio recognition. Another advantage is that the user
can not only communicate from a distance but also needs no physical contact with the computer.
Furthermore, unlike audio commands, a visual system would be preferable in noisy environments or in
situations where sound would cause a disturbance.
The visual system chosen was the recognition of hand gestures. The amount of computation
required to process hand gestures is much greater than that for mechanical devices; however, standard
desktop computers are now quick enough to make hand gesture recognition using computer vision a
viable proposition.

1.1 APPLICATIONS:
A gesture recognition system could be used in any of the following areas:

Man-machine interface: using hand gestures to control the computer mouse and/or keyboard
functions. An example of this, which has been implemented in this project, controls various
keyboard and mouse functions using gestures alone.

3D animation: Rapid and simple conversion of hand movements into 3D computer space for the
purposes of computer animation.

Visualization: Just as objects can be visually examined by rotating them with the hand, so it would
be advantageous if virtual 3D objects (displays on the computer screen) could be manipulated by
rotating the hand in space [Bretzner & Lindeberg, 1998].

Computer games: Using the hand to interact with computer games would be more natural for many
applications.

Control of mechanical systems (such as robotics): Using the hand to remotely control a
manipulator.


1.2 MOTIVATION:
In this modern world of technology, automation has not only provided human beings with comfort
and facilitation but it also has opened many doors for the improvement of business and earning. Technology
like gesture recognition and machine vision has revolutionized industrial estates along with the creation of
luxury and comfort in the lives of common people.
Gesture recognition is an area of keen interest in computer science. Recent developments are driving
the industry, both in gaming and in biometrics. Our interest in alternate human-computer interfaces,
together with support from the faculty, motivated us to take up this project.

1.3 PROBLEM STATEMENT:


To convert American Sign Language into alphabets and display the corresponding letter on the
computer screen using machine vision and image processing techniques.


CHAPTER# 2

BACKGROUND:
The history of hand gesture recognition for computer control started with the invention of glove-based
control interfaces. Researchers realized that gestures inspired by sign language could be used to offer simple
commands to a computer interface. This gradually evolved with the development of much more accurate
accelerometers, infrared cameras and even fiber-optic bend sensors (optical goniometers).
Some of those developments in glove-based systems eventually made it possible to realize computer vision
based recognition without any sensors attached to the glove: colored gloves, or gloves that offer unique
colors for finger tracking, which are discussed here under computer vision based gesture recognition. Over
the past 25 years, this evolution has resulted in many successful products that offer a fully wireless
connection with the least resistance to the wearer. This chapter discusses, in chronological order, some
fundamental approaches that significantly contributed to the expansion of the knowledge of hand
gesture recognition.
Gesture recognition is a topic in computer science and language technology with the goal of
interpreting human gestures via mathematical algorithms. Gestures can originate from any bodily motion
or state but commonly originate from the face or hand. Current focuses in the field include emotion
recognition from the face and hand gesture recognition. Many approaches have been made using cameras
and computer vision algorithms to interpret sign language. However, the identification and recognition of
posture, gait, proxemics, and human behaviors is also the subject of gesture recognition techniques. Gesture
recognition can be seen as a way for computers to begin to understand human body language, thus building
a richer bridge between machines and humans than primitive text user interfaces or even GUIs (graphical
user interfaces), which still limit the majority of input to keyboard and mouse.
Gesture recognition enables humans to communicate with the machine (HMI) and interact naturally
without any mechanical devices. Using the concept of gesture recognition, it is possible to point a finger at
the computer screen so that the cursor will move accordingly. This could potentially make conventional
input devices such as mouse, keyboards and even touch-screens redundant.

2.1 GESTURE TYPES:


In computer interfaces, two types of gestures are distinguished. Online gestures can be regarded as
direct manipulations, such as scaling and rotating. In contrast, offline gestures are processed after the
interaction is finished; for example, a circle is drawn to activate a context menu.
Offline gestures: gestures that are processed after the user's interaction with the object. An
example is the gesture to activate a menu.
Online gestures: direct manipulation gestures, used to scale or rotate a tangible object.

2.2 USES:
Gesture recognition is useful for processing information from humans that is not conveyed through speech
or type. There are also various types of gestures that can be identified by computers.


Sign language recognition. Just as speech recognition can transcribe speech to text, certain types
of gesture recognition software can transcribe the symbols represented through sign language into
text.
For socially assistive robotics. By using proper sensors (accelerometers and gyros) worn on the
body of a patient and by reading the values from those sensors, robots can assist in patient
rehabilitation. The best example can be stroke rehabilitation.
Directional indication through pointing. Pointing has a very specific purpose in our society, to
reference an object or location based on its position relative to ourselves. The use of gesture
recognition to determine where a person is pointing is useful for identifying the context of
statements or instructions. This application is of particular interest in the field of robotics.
Control through facial gestures. Controlling a computer through facial gestures is a useful
application of gesture recognition for users who may not physically be able to use a mouse or
keyboard. Eye tracking in particular may be of use for controlling cursor motion or focusing on
elements of a display.
Alternative computer interfaces. Foregoing the traditional keyboard and mouse setup to interact
with a computer, strong gesture recognition could allow users to accomplish frequent or common
tasks using hand or face gestures to a camera.
Immersive game technology. Gestures can be used to control interactions within video games to
try and make the game player's experience more interactive or immersive.
Virtual controllers. For systems where the act of finding or acquiring a physical controller could
require too much time, gestures can be used as an alternative control mechanism. Controlling
secondary devices in a car, or controlling a television set are examples of such usage.
Affective computing. In affective computing, gesture recognition is used in the process of
identifying emotional expression through computer systems.
Remote control. Through the use of gesture recognition, "remote control with the wave of a hand"
of various devices is possible. The signal must not only indicate the desired response, but also
which device is to be controlled.

2.3 INPUT DEVICES:


The ability to track a person's movements and determine what gestures they may be performing can be
achieved through various tools. Although there is a large amount of research done in image/video based
gesture recognition, there is some variation within the tools and environments used between
implementations.

Wired gloves. These can provide input to the computer about the position and rotation of the hands
using magnetic or inertial tracking devices. Furthermore, some gloves can detect finger bending
with a high degree of accuracy (5-10 degrees), or even provide haptic feedback to the user, which
is a simulation of the sense of touch. The first commercially available hand-tracking glove-type
device was the Data Glove, a glove-type device which could detect hand position, movement and
finger bending. This uses fiber optic cables running down the back of the hand. Light pulses are
created and when the fingers are bent, light leaks through small cracks and the loss is registered,
giving an approximation of the hand pose.
Depth-aware cameras. Using specialized cameras such as structured light or time-of-flight
cameras, one can generate a depth map of what is being seen through the camera at a short range,
and use this data to approximate a 3d representation of what is being seen. These can be effective
for detection of hand gestures due to their short range capabilities.
Stereo cameras. Using two cameras whose relations to one another are known, a 3d representation
can be approximated by the output of the cameras. To get the cameras' relations, one can use a

positioning reference such as a lexian-stripe or infrared emitters. In combination with direct motion
measurement (6D-Vision) gestures can directly be detected.
Controller-based gestures. These controllers act as an extension of the body so that when gestures
are performed, some of their motion can be conveniently captured by software. Mouse gestures are
one such example, where the motion of the mouse is correlated to a symbol being drawn by a
person's hand, as is the Wii Remote or the Myo, which can study changes in acceleration over time
to represent gestures. Devices such as the LG Electronics Magic Wand, the Loop and the Scoop
use Hillcrest Labs' Free space technology, which uses MEMS accelerometers, gyroscopes and other
sensors to translate gestures into cursor movement. The software also compensates for human
tremor and inadvertent movement. Audio Cubes are another example. The sensors of these smart
light emitting cubes can be used to sense hands and fingers as well as other objects nearby, and can
be used to process data. Most applications are in music and sound synthesis, but can be applied to
other fields.
Single camera. A standard 2D camera can be used for gesture recognition where the
resources/environment would not be convenient for other forms of image-based recognition. Earlier
it was thought that a single camera may not be as effective as stereo or depth-aware cameras, but
some companies are challenging this theory. Software-based gesture recognition technology using
a standard 2D camera that can detect robust hand gestures and hand signs, as well as track hands or
fingertips at high accuracy, has already been embedded in Lenovo's Yoga ultrabooks, Pantech's
Vega LTE smartphones, and Hisense's Smart TV models, among other devices.

2.4 EXISTING SYSTEMS:


A simplification used in this project, which was not found in any of the recognition methods researched,
is the use of a wrist band to remove several degrees of freedom. This enabled three new recognition methods
to be devised. The recognition frame rate achieved is comparable to most of the systems in existence (after
allowance for processor speed), but the number of different gestures recognized and the recognition
accuracy are amongst the best found. Table 1 shows several of the existing gesture recognition systems along
with their recognition statistics and methods.
Prior gesture-recognition systems can be classified primarily into vision-based, infrared-based,
electric-field sensing, ultrasonic, and wearable approaches. Xbox Kinect, Leap Motion, PointGrab, and
CrunchFish use advances in cameras and computer vision to enable gesture recognition. Xbox Kinect uses
the 3D sensor by PrimeSense, which consumes 2.25 W, while PointGrab and CrunchFish run on mobile
devices and consume as much power as the embedded camera.
Samsung Galaxy S4 introduced an air-gesture feature that uses infrared cameras to enable gesture
recognition. It is, however, not recommended to keep the gesture recognition system on, as it can drain the
battery. Further, it is known to be sensitive to lighting conditions and does not work in through-the-pocket
scenarios. GestIC, which we believe is the state-of-the-art system, uses electric-field sensing to enable
gesture recognition using about 66 mW in the processing mode. However, it requires the user's hand to be
within 15 centimeters of the screen and also does not work in through-the-pocket scenarios. Further, it
requires a relatively large surface area for sensing the electric fields. AllSee, on the other hand, achieves
gesture recognition with three to four orders of magnitude lower power, works on devices with smaller
form factors, and works in through-the-pocket scenarios.


Table 1: Existing Systems

(Columns of the original table: primary method of recognition; number of gestures recognized; background to gesture images; additional markers required, such as a wrist band; number of training images; accuracy; frame rate.)

[Bauer & Hienz, 2000]: Hidden Markov Models; 97 gestures recognized; general background; markers: multicolored gloves; training data: 7 hours of signing; accuracy: 91.7%.

[Starner, Weaver & Pentland, 1998]: Hidden Markov Models; 40 gestures recognized; general background; markers: none; training data: 400 training sentences; accuracy: 97.6%; frame rate: 10.

[Bowden & Sarhadi, 2000]: Linear approximation to nonlinear point distribution models; 26 gestures recognized; blue-screen background; markers: none; training data: 7441 images.

[David Lowe]: Scale Invariant Feature Transform (SIFT); static background; markers: none; training data: 24 images.

This project: Fast template matching; 46 gestures recognized; static background; markers: wrist band; training data: 100 examples per gesture; accuracy: 99.1%; frame rate: 15.

CHAPTER # 3

HARDWARE SETUP AND IMPLEMENTATION STRATEGY:

3.1 REPORT OVERVIEW:


The system will use a single, color camera mounted above a black colored surface next to the computer
(see Figure 2). The output of the camera will be displayed on the monitor. The user will interact with the
system by gesturing in the view of the camera. Shape and position information about the hand will be
gathered using detection of skin.
The refined shape information will then be compared with a set of predefined training data (in the form of
templates) to recognize which gesture is being signed. In particular, the contribution of this project is a
novel way of speeding up the comparison process. A label corresponding to the recognized gesture will be
displayed on the monitor screen.

Figure 1 Hardware Setup

3.2 PROJECT SUMMARY:


In order to detect hand gestures, data about the hand will have to be collected. A decision has to
be made as to the nature and source of the data. Two possible technologies to provide this information
are:
A glove with sensors attached that measure the position of the finger joints.
An optical method.
An optical method has been chosen, since it is more practical (many modern computers come with a
camera attached), cost effective and has no moving parts, so it is less likely to be damaged through use.
The first step in any recognition system is collection of relevant data. In this case the raw image information
will have to be processed to differentiate the skin of the hand (and various markers) from the background.
Once the data has been collected it is then possible to use prior information about the hand (for example,
the fingers are always separated from the wrist by the palm) to refine the data and remove as much noise
as possible. This step is important because as the number of gestures to be distinguished increases the data
collected has to be more and more accurate and noise free in order to permit recognition.
The next step will be to take the refined data and determine what gesture it represents. Any recognition
system will have to simplify the data to allow calculation in a reasonable amount of time (the target
recognition rate for a set of 36 gestures is 25 frames per second). Obvious ways to simplify the data include
translating, rotating and scaling the hand so that it is always presented to the recognition system with the
same position, orientation and effective hand-camera distance.
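The translation-and-scaling normalization just described can be sketched in a few lines. The project itself was implemented in MATLAB; the following is an illustrative Python/NumPy equivalent, and the function name `normalise_points` is our own:

```python
import numpy as np

def normalise_points(points):
    """Translate a 2D point set so its centroid is at the origin, then
    scale it to unit RMS radius. This removes variation in hand position
    and effective hand-camera distance before recognition."""
    pts = np.asarray(points, dtype=np.float64)
    centred = pts - pts.mean(axis=0)                     # remove translation
    scale = np.sqrt((centred ** 2).sum(axis=1).mean())   # RMS radius
    if scale > 0:
        centred /= scale                                 # remove scale
    return centred
```

Rotation can be removed in the same spirit, for example by aligning the principal axis of the point set with a fixed direction before comparison.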

3.3 MATLAB TOOLBOXES:


We have used three MATLAB toolboxes in our project for hand gesture recognition named:

Image processing toolbox


Image acquisition toolbox
Computer vision toolbox

3.3.1 Image Processing Toolbox:


Image Processing Toolbox provides a comprehensive set of reference-standard algorithms,
functions, and apps for image processing, analysis, visualization, and algorithm development. You can
perform image analysis, image segmentation, image enhancement, noise reduction, geometric
transformations, and image registration. Many toolbox functions support multicore processors, GPUs, and
C-code generation.
Image Processing Toolbox supports a diverse set of image types, including high dynamic range, gigapixel
resolution, embedded ICC profile, and tomographic. Visualization functions and apps let you explore
images and videos, examine a region of pixels, adjust color and contrast, create contours or histograms, and
manipulate regions of interest (ROIs). The toolbox supports workflows for processing, displaying, and
navigating large images.
Key Features:
Image analysis, including segmentation, morphology, statistics, and measurement
Image enhancement, filtering, and deblurring
Geometric transformations and intensity-based image registration methods
Image transforms, including FFT, DCT, Radon, and fan-beam projection
Large image workflows, including block processing, tiling, and multiresolution display
Visualization apps, including Image Viewer and Video Viewer


3.3.2 Image Acquisition Toolbox:


Image Acquisition Toolbox enables you to acquire images and video from cameras and frame
grabbers directly into MATLAB and Simulink. You can detect hardware automatically and configure
hardware properties. Advanced workflows let you trigger acquisition while processing in-the-loop, perform
background acquisition, and synchronize sampling across several multimodal devices. With support for
multiple hardware vendors and industry standards, you can use imaging devices ranging from inexpensive
Web cameras to high-end scientific and industrial devices that meet low-light, high-speed, and other
challenging requirements.
Key Features:
Support for industry standards, including DCAM, Camera Link, and GigE Vision
Support for common OS interfaces for webcams, including DirectShow, QuickTime, and
video4linux2
Support for a range of industrial and scientific hardware vendors
Multiple acquisition modes and buffer management options
Synchronization of multimodal acquisition devices with hardware triggering
Image Acquisition app for rapid hardware configuration, image acquisition, and live video
previewing
Support for C code generation in Simulink

Image acquiring feature was used from this toolbox.

3.3.3 Computer Vision Toolbox:


Computer Vision System Toolbox provides algorithms, functions, and apps for the design and
simulation of computer vision and video processing systems. You can perform object detection and
tracking, feature detection and extraction, feature matching, stereo vision, camera calibration, and motion
detection tasks. The system toolbox also provides tools for video processing, including video file I/O, video
display, object annotation, drawing graphics, and compositing. Algorithms are available as MATLAB
functions, System objects, and Simulink blocks.
For rapid prototyping and embedded system design, the system toolbox supports fixed-point arithmetic and
automatic C-code generation.
Key Features:
Object detection, including Viola-Jones and other pre-trained detectors
Object tracking, including Kanade-Lucas-Tomasi (KLT) and Kalman filters
Feature detection, extraction, and matching, including FAST, BRISK, MSER, and HOG
Camera calibration for single and stereo cameras, including automatic checkerboard detection and
an app for workflow automation
Stereo vision, including rectification, disparity calculation, and 3D reconstruction


3.4 LAB ENVIRONMENT:


Testing and demonstration were done in a constant lighting environment for stable results. The
output of the camera system chosen in Section 3.1 comprises a 2D array of RGB pixels provided at
regular time intervals. In order to detect silhouette information, it will be necessary to differentiate skin
pixels from background pixels.

Figure 2 Lab Environment

3.4.1 Embedded System Lab:


The embedded systems lab was chosen for testing and demonstration. The hardware setup was arranged
in the lab to keep the lighting conditions constant.

3.4.2 Lighting Environment:


The task of differentiating the skin pixels from those of the background and markers is made
considerably easier by a careful choice of lighting. If the lighting is constant across the view of the camera
then the effects of self-shadowing can be reduced to a minimum. The intensity should also be set to provide
sufficient light for the CCD in the camera.


Figure 3: The effect of self-shadowing (A) and cast shadowing (B). The top three images were lit by
a single light source situated off to the left. A self-shadowing effect can be seen on all three, especially
marked on the right image where the hand is angled away from the source. The bottom three images
are more uniformly lit, with little self-shadowing. Cast shadows do not affect the skin for any of the
images and therefore should not degrade detection. Note how an increase of illumination in the
bottom three images results in a greater contrast between skin and background.

However, since this system is intended to be used by the consumer it would be a disadvantage if
special lighting equipment was required. It was decided to attempt to extract the hand and marker
information using standard room lighting (in this case a 100 watt bulb and shade mounted on the ceiling).
This would permit the system to be used in a non-specialist environment.
Camera orientation: It is important to carefully choose the direction in which the camera points to
permit an easy choice of background. The two realistic options are to point the camera towards a wall or
towards the floor (or desktop). However, since the lighting was a single overhead bulb, the light intensity
would be highest and the shadowing effects least if the camera were pointed downwards.
Background: In order to maximize differentiation it is important that the color of the background
differs as much as possible from that of the skin. The floor color in the project room was a dull brown. It
was decided that this color would suffice initially.


CHAPTER # 4
METHODOLOGY:
Methodology of the project constitutes of the following points.

4.1 IMAGE ACQUISITION:


Image acquisition was done through an A4-Tech USB webcam. The lighting environment of the lab
was kept constant for our experiments, and additional lamp light was provided to enhance details in the
image to the maximum extent.
The following hardware setup was used in the acquisition:

Figure 4: Hardware setup for image acquisition


4.2 IMAGE PRE-PROCESSING AND DETECTION OF HAND:


Once the image is acquired through the Image Acquisition Toolbox, it has to undergo certain
pre-processing steps before further work. Image pre-processing and detection of the hand consist of the
following steps.

Figure 5 Original Image

1. Binarization: The input image is converted into a binary image, i.e. an image consisting of
only one- or zero-intensity pixels.
2. Marking Skin Pixels: The input RGB image is used for marking the skin pixels. The RGB image is
first converted to the YCbCr plane, and thresholding is then applied; the threshold ranges are:
67 ≤ Cb ≤ 137
133 ≤ Cr ≤ 173
Pixels which lie in this range are marked as BLUE pixels. The YCbCr image is then converted back to
the RGB plane.


Figure 6 Marked Skin Pixels on the Image

3. Morphological Operations: These binary images were further processed through the following
morphological operations:

Cleaning
Closing

Figure 7 Result after Morphological Operations

4. Hand Detection: vision.BlobAnalysis, a Computer Vision Toolbox function, is used for detecting
blobs in a binary image; it outputs regional properties of the blob, such as:

Area
Centroid
Bounding Box

The centroid is displayed as a BLUE cross (*) and the bounding box in yellow.

Figure 8 Detected Hand shown

Figure 9 Final Result
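The detection pipeline above (YCbCr skin thresholding followed by blob analysis) can be sketched compactly. The project used MATLAB toolbox functions; the sketch below is an illustrative NumPy equivalent, using the standard BT.601 RGB-to-YCbCr conversion and the Cb/Cr thresholds from step 2, with function names of our own:

```python
import numpy as np

def rgb_to_ycbcr(img):
    """Convert an RGB image (H, W, 3) to YCbCr (ITU-R BT.601, 8-bit offsets)."""
    img = img.astype(np.float64)
    r, g, b = img[..., 0], img[..., 1], img[..., 2]
    y  =       0.299    * r + 0.587    * g + 0.114    * b
    cb = 128 - 0.168736 * r - 0.331264 * g + 0.5      * b
    cr = 128 + 0.5      * r - 0.418688 * g - 0.081312 * b
    return np.stack([y, cb, cr], axis=-1)

def skin_mask(img):
    """Mark pixels whose Cb and Cr values lie in the skin thresholds."""
    ycbcr = rgb_to_ycbcr(img)
    cb, cr = ycbcr[..., 1], ycbcr[..., 2]
    return (cb >= 67) & (cb <= 137) & (cr >= 133) & (cr <= 173)

def blob_properties(mask):
    """Area, centroid and bounding box of the marked pixels, analogous to
    the properties returned by vision.BlobAnalysis."""
    ys, xs = np.nonzero(mask)
    if ys.size == 0:
        return None
    area = ys.size
    centroid = (xs.mean(), ys.mean())
    bbox = (xs.min(), ys.min(), xs.max(), ys.max())
    return area, centroid, bbox
```

The morphological cleaning and closing of step 3 would sit between skin_mask and blob_properties (e.g. scipy.ndimage.binary_closing) but is omitted here for brevity.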


4.3 RECOGNITION STRATEGIES:


We tried the following two strategies, but after a series of experiments, and for better results, we
opted for the second one:
With marker
Without marker

4.3.1. With Marker:


Following are the strategies which include the use of a band marker.

Area metric
Radial length signature
Template matching in the canonical frame

4.3.2. Without Marker:


Following is the strategy which does not include a band marker.

Point Pattern Matching.

4.4 CHOICE OF STRATEGY:


The strategy without a marker was chosen because it is:

Convenient to use
More practical

Point Pattern Matching (SIFT):


The hand gesture recognition system recognizes 36 static ASL hand gestures in real time. It takes hand
gesture images and performs an action based on the category of the pattern. Key points of the image (i.e.
points lying in high-contrast regions) are calculated and matched by the point pattern matching algorithm.
After analyzing the raw data for a given purpose and method, we can perform actions such as classification.
Here, a hand gesture recognition system consisting of point pattern matching and SIFT is used to recognize
hand gestures for ASL signs, because it is a fast and simple approach. Point pattern matching is the assignment
of some sort of output value to a given input value, according to some specific algorithm. An example of
recognition is classification, which attempts to assign each input value to one of a given set of classes. It
looks for exact matches of the input image features with pre-existing class features (patterns). Fig. 10 shows
how the hand gesture recognition system works.


Figure 10: Hand gesture recognition system with point pattern matching flowchart

Hand Gesture Recognition System with Point Pattern Matching.


As shown in Fig. 10, first we take an image from the webcam or from the database. The image goes through a SIFT transformation, which collects the key-points (features) of that particular image into a feature vector; this vector is then compared with the feature vectors of the database images of ASL hand gestures. For the comparison we use the point pattern matching algorithm. The hand gesture recognition system is developed using the SIFT algorithm together with the point pattern matching algorithm to recognize the 36 ASL signs.

Point Pattern Matching Algorithm:


Hand gesture recognition systems often require matching two sets of points in space, because the analyzed images are raster graphics and the extracted features are a subset of the pixels of the original image. The point pattern matching algorithm is well suited to this task: it provides a novel approach that achieves a match of adequate quality in an efficient and robust manner. For hand gesture recognition of ASL signs, we used point pattern matching with the SIFT match algorithm. The flowchart of the proposed algorithm is shown in Fig. 11.


Figure 11: Flow chart of point pattern matching algorithm

The working of the point pattern matching algorithm is as follows:
1. Take a test image.
2. Pre-process the test image.
3. Initialize distRatio = 0.65 and threshold = 0.035.
4. Run the SIFT match algorithm.
5. Key-point matching starts by applying the threshold; it obtains the key points matched between the test image and all 36 trained images, giving a validity ratio for each.
6. Check whether more than one result was obtained.
7. If more than one result remains, increment distRatio by 0.05 and threshold by 0.005 and repeat steps 4 to 7.
8. If only one result remains, display it.
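The iteration described above can be sketched as follows (an illustrative Python outline, not the report's MATLAB code; match_fn stands in for the SIFT key-point matching step and must be supplied by the caller):

```python
def recognize(test, trained, match_fn, dist_ratio=0.65, threshold=0.035):
    """Match the test gesture against every trained gesture; while more
    than one gesture ties for the best validity ratio, relax the
    parameters (distRatio += 0.05, threshold += 0.005) and retry."""
    while True:
        ratios = {name: match_fn(test, img, dist_ratio, threshold)
                  for name, img in trained.items()}
        best = max(ratios.values())
        winners = [name for name, r in ratios.items() if r == best]
        if len(winners) == 1:
            return winners[0]          # a single result: display it
        dist_ratio += 0.05             # relax and repeat steps 4-7
        threshold += 0.005
```

A trivial stand-in for match_fn is enough to see the loop break a tie on the second pass.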


IMPLEMENTATION:
Implementation of the Point Pattern Matching Algorithm:
During the test implementation of the hand gesture recognition system, the point pattern matching algorithm is executed. First, a test image is taken from the database or from the webcam. The point pattern matching algorithm then starts its execution to find the matching key-points between the test image and the trained database images. After execution, it recognizes the ASL input (query) image by comparing the test image with all the database images and outputs the equivalent ASCII representation. The algorithm is implemented in two parts.
i) SIFT Algorithm
For any object in an image, points of interest of that object can be extracted to provide a "feature
description" of the object. This description, extracted from a training image, can then be used to locate and
identify the object in a test image containing many other objects. For accurate and reliable recognition, the
features extracted from the training image must be detectable even under changes in image scale, noise and
illumination. Also, the relative positions between them in the original image shouldn't change from one
image to another. SIFT detects and uses a much larger number of features from the images, which reduces
the contribution of the errors caused by these local variations in the average error of all feature matching
errors.
The SIFT algorithm consists of the following steps:
Constructing a scale space.
Laplacian of Gaussian (LoG) approximation.
Finding key-points.
Getting rid of bad key-points (edges and low-contrast regions).
Assigning an orientation to the key-points.
Generating SIFT features.
During the test implementation, the point pattern matching algorithm uses the SIFT algorithm to find the key-points of the images. These key-points are the scale-invariant features located near the high-contrast regions of the image that can be used to distinguish them. In the SIFT algorithm, image1 and image2 are the two images to match; in our case image1 is one of the database images and image2 is the input (query) image. distRatio is a parameter of the SIFT algorithm; in the original implementation it is a constant, but for our algorithm's recursion we made it a variable parameter. threshold is the threshold value for the MK-ROD algorithm.
To find the SIFT key-points of an image, the function sift is called, which returns the key-points as a combination of image descriptors and locations. These terms are:

Image (im): the image in double array format.
Descriptors (des): a K-by-128 matrix, where each row gives an invariant descriptor for one of the K key-points. The descriptor is a vector of 128 values normalized to unit length.
Locations (loc): a K-by-4 matrix, in which each row has the 4 values for a key-point location (row, column, scale, orientation). The orientation is in the range [-PI, PI] radians.
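Because the descriptors are normalized to unit length, the angle between two of them is the arccosine of their dot product, and the nearest match can be accepted with Lowe's ratio test on those angles. A small Python sketch under that assumption (match_keypoints is our own name, not from the report):

```python
import math

def match_keypoints(des1, des2, dist_ratio=0.65):
    """For each unit-length descriptor in des1, accept the nearest
    descriptor in des2 only if its angle is less than dist_ratio times
    the angle of the second-nearest one (Lowe's ratio test)."""
    matches = []
    for i, d1 in enumerate(des1):
        # Angle to every candidate: acos of the dot product, clamped
        # against floating-point overshoot outside [-1, 1].
        angles = sorted(
            (math.acos(max(-1.0, min(1.0, sum(a * b for a, b in zip(d1, d2))))), j)
            for j, d2 in enumerate(des2))
        if len(angles) >= 2 and angles[0][0] < dist_ratio * angles[1][0]:
            matches.append((i, angles[0][1]))
    return matches
```

An unambiguous nearest neighbour is kept; a descriptor equally close to two candidates is rejected, which is how the outliers mentioned later are discarded.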


ii) MK-ROD Algorithm:

To find the validity ratio, the MK-ROD algorithm is used. Fig. 12 shows the two images used for finding the validity ratio.

Figure 12: MK-ROD Distance Calculation

Representation of (a) trained database image (b) test input image with key points:
C denotes the center points
D denotes the distance mask
T denotes the number of the test image to match
M denotes the number of matched points
1, 2, 3 are the key points.
The procedure to find the validity ratio of one database image versus the test input image is as follows: once the matched key points are obtained, mask the distances by taking the absolute differences that are below the algorithm's threshold. This is done in order to determine whether the matched key points form a similar pattern around the center of the matched key points. A pair whose absolute difference is below the given threshold is treated as a valid matched key point.


Implementation of Other Methods


i) Template Matching:
Template matching determines whether an obtained gesture image can be classified as a member of a set of stored gestures. Hand gesture recognition using template matching is done in two steps. The first is to create the templates by collecting data values for each gesture in the image set. Generally, each gesture is performed a number of times with slight natural variations, and the average of the data for each image is taken and stored as the template. The second part is to compare the obtained image with the given set of templates to find the gesture template most closely matching the current image.
An example of template matching comparison is the use of distance measurements between the current gesture image and each image in the gesture set, recognizing the posture with the lowest distance measurement. The distance measurement must be below some threshold value to avoid false positive recognition. Distance measurements used for hand posture template matching can be the sum of absolute differences or the sum of squares.
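The distance comparison described above can be illustrated with a short sketch (Python, sum of absolute differences; the names are hypothetical, not from the report):

```python
def classify_by_template(image, templates, max_dist):
    """Return the name of the template with the smallest sum of absolute
    differences to `image`, or None if even the best distance exceeds
    max_dist (the threshold that guards against false positives)."""
    best_name, best_dist = None, None
    for name, tmpl in templates.items():
        dist = sum(abs(a - b) for a, b in zip(image, tmpl))
        if best_dist is None or dist < best_dist:
            best_name, best_dist = name, dist
    return best_name if best_dist is not None and best_dist <= max_dist else None
```

Here images and templates are flattened pixel lists; replacing the sum of absolute differences with the sum of squares changes only the distance expression.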
ii) Principal Component Analysis (PCA):

Principal Component Analysis (PCA) computes and studies the eigenvectors of the different pictures and then expresses each image in terms of its principal components (eigenvectors). In this method a data set is created by choosing a considerable number of images of good resolution, for better recognition with the smallest database. The mean is then subtracted from each data dimension. The next step is to calculate the covariance matrix C of the database; the eigenvectors and eigenvalues of C are the principal components of the data set. Next, good components are chosen for the feature vector: the principal (most important) eigenvectors, with which the data can be expressed with the lowest information loss. Finally, a new data set (eigen set) is created. To compare different pictures, each image of the data set is expressed with these principal eigenvectors, and the comparison is made by calculating the Euclidean distance between the coefficients in front of each eigenvector.
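The PCA steps above can be sketched with NumPy as follows (an illustrative outline, not the report's implementation; the function names are our own):

```python
import numpy as np

def pca_fit(data, k):
    """data: one flattened image per row.  Returns the mean and the
    top-k eigenvectors of the covariance matrix (largest first)."""
    mean = data.mean(axis=0)
    centered = data - mean                      # subtract the mean
    cov = np.cov(centered, rowvar=False)        # covariance matrix C
    vals, vecs = np.linalg.eigh(cov)            # eigh: C is symmetric
    order = np.argsort(vals)[::-1]              # sort eigenvalues descending
    return mean, vecs[:, order[:k]]

def pca_project(x, mean, components):
    """Coefficients of x in front of the principal eigenvectors."""
    return (x - mean) @ components

def pca_classify(x, mean, components, train_coeffs, labels):
    """Nearest neighbour in eigenspace by Euclidean distance."""
    c = pca_project(x, mean, components)
    d = np.linalg.norm(train_coeffs - c, axis=1)
    return labels[int(np.argmin(d))]
```

Each database image is projected once at training time; at run time only the query image's projection and a few distances are computed.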


CHAPTER # 5

EXPERIMENTATION:
Our experimentation was performed in a controlled environment in the embedded systems lab. The hand gesture was acquired through the camera, and further image processing and classification were done by a software program on the computer.
We used the Scale Invariant Feature Transform (SIFT) algorithm along with MK-ROD to properly detect and recognize a particular gesture: SIFT was used for extracting features and MK-ROD as the classifier.

5.1 FEATURE EXTRACTION THROUGH SIFT ALGORITHM


As mentioned in Chapter 4, the detected and preprocessed image was passed to the SIFT algorithm (details of its working are given in Chapter 4). SIFT key points were detected on the input image, and the results were shown as red circles on the image.
The following figures show the results for the input image.

Figure 13 shows the input image and the hand detected from it by masking the skin samples and then drawing the bounding box around them. The figures on the left were used for further processing.


[Plots: "SIFT Keypoints for Image-1" and "SIFT Keypoints for Image-2"; axis tick labels omitted]

Figure 14 shows the detected SIFT key points. Numerous key points were detected; some of them result from noise in texture and cannot be used, so they have to be removed.

Image 1 corresponds to the input image and image 2 to the image in the database. The SIFT key points are numerous, but most of them are outliers and are excluded from the match. The matching is done through the dot-product approximation: the ratio of the angles (the arccosines of the dot products) for small orientations.


5.2 CLASSIFICATION THROUGH MK-ROD ALGORITHM:


As described in the scheme of the project, after feature extraction comes the classification of the gesture, for which we used the MK-ROD algorithm. The algorithm is explained in the previous chapter and the code implementing it is available in Appendix B; here only some of the results and the pseudo code generating them are shown.
5.2.1. Finding Matched Key Points from the Image:
The key points shown are evaluated with a simple matching rule: for each key point of the input image, the angles to all descriptors of the database image are computed as the arccosine of the dot products, and the nearest neighbour is accepted only when its angle is less than distRatio times the second-nearest angle.

An important point about this pseudo code is that we have not used the Euclidean distance for the distance calculation. Euclidean distance would have been the best option for calculating distances in offline mode, but as the main objective of our project was to detect the gesture in real time and then display the result, for reasons of efficiency we opted to calculate the dot product of the descriptors and compare angles instead.
Mathematically, the ratio of Euclidean distances closely approximates the ratio of the angles obtained from the dot products when the angles are small, so this option was also beneficial as a close approximation to the Euclidean distance.
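This approximation is easy to check numerically: for unit vectors separated by angle theta, the Euclidean distance is the chord 2*sin(theta/2), and theta itself is recovered as the arccosine of the dot product, so for small angles the ratio of Euclidean distances approaches the ratio of angles. A small check (illustrative, not from the report):

```python
import math

def euclid_unit(theta):
    """Euclidean distance between two unit vectors at angle theta apart
    (the chord length on the unit circle)."""
    return 2.0 * math.sin(theta / 2.0)

def angle_from_dot(theta):
    """Recover the angle from the dot product of the two unit vectors."""
    return math.acos(math.cos(theta))

# For two small angles, the ratio of Euclidean distances is close to the
# ratio of acos(dot) angles, which is what the matching code relies on.
t1, t2 = 0.05, 0.10
ratio_euclid = euclid_unit(t1) / euclid_unit(t2)
ratio_angle = angle_from_dot(t1) / angle_from_dot(t2)
```

The two ratios agree to a few parts in ten thousand at these angles, which justifies trading the Euclidean distance for the cheaper dot product in real time.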


[Plots: "Matched Points for Image-1" and "Matched Points for Image-2"; axis tick labels omitted]

Figure 15 shows the matched key points found after calculating the distances of each of the key points as proposed in the algorithm; only those points whose distance is less than the distance ratio (defined at the beginning of the process) are kept and shown.

After feature extraction and matching, we calculate the validity ratio of the results obtained from matching. The validity ratio is given as the number of valid matched key points divided by the total number of matched key points, as derived in the following steps.

5.2.2. Steps for finding the validity ratio of images

The number of valid key points in this definition is obtained by the following steps:

1. Sum the distances of all of the matched key points in the database image:
   d1 = D1[1] + D1[2] + ... + D1[M]
2. Sum the distances of all of the matched key points in the input image:
   d2 = D2[1] + D2[2] + ... + D2[M]
   where M = number of matched key points in both cases.
3. Calculate the distance ratios of the database image by normalizing each distance by d1:
   ratio1[i] = D1[i] / d1
4. Calculate the distance ratios of the input image by normalizing each distance by d2:
   ratio2[i] = D2[i] / d2
5. Calculate the distance mask:
   mask[i] = ( |ratio1[i] - ratio2[i]| < threshold )
6. Calculate the number of valid points by summing the distance mask:
   validPoints = sum(mask)
7. Calculate the validity ratio as:
   validityRatio = validPoints / M

The validity ratio corresponding to each image is calculated and results are removed after each iteration; when only a single result remains, that result is fed to the output.
The classification step comes next; its code is given in the next section.
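The steps above can be sketched as follows (an illustrative Python outline; D1 and D2 are the distance lists of the matched key-point pairs, and each distance is assumed to be normalized by its sum, consistent with the small threshold value):

```python
def validity_ratio(D1, D2, threshold):
    """Normalise each distance list by its sum, count the pairs whose
    normalised distances differ by less than the threshold, and divide
    by the number of matched key points M."""
    M = len(D1)
    d1, d2 = sum(D1), sum(D2)
    ratio1 = [d / d1 for d in D1]
    ratio2 = [d / d2 for d in D2]
    mask = [abs(r1 - r2) < threshold for r1, r2 in zip(ratio1, ratio2)]
    return sum(mask) / M
```

Scaling one image's distances uniformly leaves the normalised pattern unchanged, so the ratio is invariant to the overall size of the hand in the frame.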

5.2.3. Pseudo Code for Final Classification

The pseudo code for the final classification on the basis of the validity ratio is:

while sum(mask) > 1
    calculate the validity ratio (and other parameters) of each remaining gesture
    store the results in the array results
    keep the best candidates (normally 3) by validity ratio and remove the rest
    update mask
end
display the remaining result

The code implements some basic steps:

First of all it checks whether the mask sum is greater than 1 and, if so, performs an iteration. mask is a logical array in which an index holding a 1 represents a letter still being considered for the final output; e.g. a 1 at index 13 means the letter m is still a possible output.
It calculates the validity ratio along with other parameters for each of the gestures and stores the results in an array named results.
The best candidates, normally 3, are selected based on their validity ratio and the rest are removed.
The mask is updated.
The process repeats until the final result is obtained.
The SIFT key points detected in the first half of the program were evaluated using the SIFT application, not the source code, so the false detections can only be trimmed after applying MK-ROD and the classifier.
The use of the SIFT application for finding key points and its corresponding MATLAB implementation are mentioned in the references.
In order to make the results more presentable, a separate function was created to append the output and input images. The results are shown in the figure.


[Plots: "Matched Points for Image-1 and Image-2", appended side by side; axis tick labels omitted]

Figure 16 shows the final result after classification through the MK-ROD algorithm. The image on the left is the input and the right image is its corresponding match in the database.

The final match shows that MK-ROD successfully classified the gesture, but its results on real-time input were not as accurate, as explained in the next section.


5.3 CLASSIFICATION RESULTS 1: MK-ROD ALGORITHM

The standard for a correct output was an accuracy of 50 percent or more; different input gesture sets were given under different orientations and lighting. The classification results of the MK-ROD algorithm are as follows:
5.3.1. Offline and Online Testing
The results include both online and offline accuracy, carried out on a number of image datasets of different persons. As mentioned before, the lighting conditions were kept the same in all of the tests.
Table 2: MK-ROD SIFT Results DistRatio = 0.8

Input | Most Recurring False-Match | Percent Accuracy (>= 50% is correct) | Correct/Incorrect
A | E | 42% | Incorrect
B | E | 33% | Incorrect
C | C | 64% | Correct
D | G | 23% | Incorrect
E | C | 14% | Incorrect
F | G | 43% | Incorrect
G | W | 22% | Incorrect
H | T | 11% | Incorrect
I | N | 13% | Incorrect
J | J | 52% | Correct
K | X | 10% | Incorrect
L | P | 10% | Incorrect
M | J | 9% | Incorrect
N | A | 7% | Incorrect
O | G | 23% | Incorrect
P | Q | 21% | Incorrect
Q | E | 22% | Incorrect
R | F | 15% | Incorrect
S | A | 5% | Incorrect
T | E | 19% | Incorrect
U | Z | 20% | Incorrect
V | C | 21% | Incorrect
W | M | 18% | Incorrect
X | E | 15% | Incorrect
Y | W | 8% | Incorrect
Z | Q | 38% | Incorrect

NumCorrect: 2, NumIncorrect: 24, Total: 26, Accuracy: 8%
Average Accuracy of Each Gesture: 22%

After changing the initial value of the distance ratio, the results of the MK-ROD classifier were as shown in the table below.
Table 3: MK-ROD SIFT Results DistRatio = 0.65

Input | Most Recurring False-Match | Percent Accuracy (>= 50% is correct) | Correct/Incorrect | Can changes be applied?
A | A | 56% | Correct | -
B | P | 46% | Incorrect | Y
C | E | 36% | Incorrect | Y
D | O | 22% | Incorrect | Y
E | I | 17% | Incorrect | N
F | F | 55% | Correct | -
G | W | 32% | Incorrect | Y
H | W | 22% | Incorrect | Y
I | S | 44% | Incorrect | Y
J | J | 60% | Correct | -
K | X | 10% | Incorrect | N
L | U | 16% | Incorrect | Y
M | M | 50% | Correct | -
N | X | 42% | Incorrect | N
O | E | 22% | Incorrect | Y
P | P | 66% | Correct | -
Q | Q | 51% | Correct | -
R | R | 50% | Correct | -
S | T | 20% | Incorrect | N
T | T | 52% | Correct | -
U | V | 28% | Incorrect | Y
V | V | 63% | Correct | -
W | X | 30% | Incorrect | Y
X | T | 30% | Incorrect | Y
Y | X | 40% | Incorrect | N
Z | Z | 68% | Correct | -

(A dash in the last column marks a gesture that was already recognized correctly.)

NumCorrect: 10, NumIncorrect: 16, Total: 26, Accuracy: 38%
Average Accuracy of Each Gesture: 40%

The increase in accuracy with the decrease in distRatio comes about because the algorithm now matches only those key points which correspond very closely to their neighbors (details can be seen in the pseudo code). As a result, the accuracy increased to 40%. This was still not a good result: the MK-ROD classifier was detecting some false positives, so the classification was modified to improve the results.

5.4 CLASSIFICATION RESULTS THROUGH THE MODIFIED MK-ROD ALGORITHM

5.4.1 Need for Modification
The MK-ROD classifier was modified because of some false positives in its results: the validity ratio alone was not enough to fully classify the input gesture, and some other variable was needed for classification. From the tables it can be seen that many of the gestures were misclassified because their validity ratios were maximal even though their numbers of valid points were lower than those of other characters.
A look through the tables below shows that the maximum numbers of valid points belong to the letters l and b respectively, but because classification is based only on the validity ratio these letters are misclassified, which resulted in the poor accuracy of the MK-ROD classifier.
One suggestion was to include both the validity ratio and the number of valid points for classification purposes.
Table 4: MK-ROD SIFT Results letter = L

Letter | # MK | # Valid MK | Validity Ratio | Letter num | ind
A | 30 | 24 | 0.80 | 1 | 1
B | 40 | 35 | 0.88 | 2 | 1
C | 44 | 30 | 0.68 | 3 | 1
D | 43 | 31 | 0.72 | 4 | 1
E | 36 | 30 | 0.83 | 5 | 1
F | 31 | 29 | 0.94 | 6 | 1
G | 27 | 23 | 0.85 | 7 | 1
H | 35 | 30 | 0.86 | 8 | 1
I | 32 | 24 | 0.75 | 9 | 1
J | 35 | 33 | 0.94 | 10 | 1
K | 45 | 40 | 0.89 | 11 | 1
L | 69 | 67 | 0.97 | 12 | 1
M | 30 | 29 | 0.97 | 13 | 1
N | 35 | 33 | 0.94 | 14 | 1
O | 35 | 34 | 0.97 | 15 | 1
P | 51 | 51 | 1.00 | 16 | 1
Q | 27 | 26 | 0.96 | 17 | 1
R | 37 | 31 | 0.84 | 18 | 1
S | 30 | 30 | 1.00 | 19 | 1
T | 37 | 37 | 1.00 | 20 | 1
U | 45 | 44 | 0.98 | 21 | 1
V | 38 | 36 | 0.95 | 22 | 1
W | 43 | 43 | 1.00 | 23 | 1
X | 43 | 42 | 0.98 | 24 | 1
Y | 40 | 38 | 0.95 | 25 | 1
Z | 25 | 23 | 0.92 | 26 | 1

Table 5: MK-ROD SIFT Results letter = B

Letter | # MK | # Valid MK | Validity Ratio | Letter num | ind
A | 33 | 24 | 0.73 | 1 | 1
B | 66 | 65 | 0.98 | 2 | 1
C | 29 | 28 | 0.97 | 3 | 1
D | 55 | 50 | 0.91 | 4 | 1
E | 37 | 37 | 1.00 | 5 | 1
F | 47 | 45 | 0.96 | 6 | 1
G | 20 | 20 | 1.00 | 7 | 1
H | 41 | 41 | 1.00 | 8 | 1
I | 37 | 37 | 1.00 | 9 | 1
J | 30 | 28 | 0.93 | 10 | 1
K | 42 | 42 | 1.00 | 11 | 1
L | 42 | 39 | 0.93 | 12 | 1
M | 28 | 27 | 0.96 | 13 | 1
N | 31 | 29 | 0.94 | 14 | 1
O | 39 | 38 | 0.97 | 15 | 1
P | 46 | 46 | 1.00 | 16 | 1
Q | 22 | 21 | 0.95 | 17 | 1
R | 38 | 36 | 0.95 | 18 | 1
S | 36 | 36 | 1.00 | 19 | 1
T | 38 | 36 | 0.95 | 20 | 1
U | 41 | 40 | 0.98 | 21 | 1
V | 46 | 45 | 0.98 | 22 | 1
W | 50 | 48 | 0.96 | 23 | 1
X | 39 | 38 | 0.97 | 24 | 1
Y | 45 | 44 | 0.98 | 25 | 1
Z | 31 | 31 | 1.00 | 26 | 1

Both of the tables shown above strengthen the case that the validity ratio alone yields a poor classifier. In order to have a better classifier, the validity ratio as well as the number of valid key points must be included in determining the correct letter for the input gesture. The pseudo code for the modified MK-ROD algorithm is as follows:

while sum(mask) > 1
    calculate the validity ratio and number of valid key points of each remaining gesture
    keep the best 5 candidates by number of valid key points
    if the candidate with the highest validity ratio also has the maximum number of valid key points
        select it as the output, break the loop and display the result
    update mask
end

The modified portion is the check on the number of valid points, which is now also accounted for during classification: the best 5 results are selected according to the number of valid points, and from these 5 results, if the one having the highest validity ratio also has the maximum number of valid key points, it is selected as the output, the loop is broken and the result is displayed.
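The modified selection rule can be sketched as follows (an illustrative Python outline; each result is assumed to carry the letter, its validity ratio, and its number of valid key points):

```python
def pick_gesture(results):
    """results: list of (letter, validity_ratio, valid_points).
    Keep the best 5 by number of valid points; accept the candidate
    with the highest validity ratio only if it also has the most valid
    points, otherwise signal that another iteration is needed."""
    top5 = sorted(results, key=lambda r: r[2], reverse=True)[:5]
    best_ratio = max(top5, key=lambda r: r[1])
    best_points = max(top5, key=lambda r: r[2])
    if best_ratio[0] == best_points[0]:
        return best_ratio[0]
    return None  # no unique winner yet; relax parameters and repeat
```

Requiring agreement between the two criteria is exactly what suppresses the false positives seen with the validity ratio alone.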
5.5 CLASSIFICATION RESULTS 2: MODIFIED MK-ROD ALGORITHM
Table 6: MK-ROD SIFT Results DistRatio = 0.65 Changes Applied

Input | Most Recurring False-Match | Percent Accuracy (>= 50% is correct) | Correct/Incorrect | Correct match after changes?
A | A | 98% | Correct | -
B | B | 88% | Correct | Y
C | C | 91% | Correct | Y
D | D | 75% | Correct | Y
E | I | 45% | Incorrect | N
F | F | 85% | Correct | -
G | G | 98% | Correct | Y
H | P | 48% | Incorrect | N
I | I | 87% | Correct | Y
J | J | 90% | Correct | -
K | K | 70% | Correct | Y
L | L | 80% | Correct | Y
M | T | 45% | Incorrect | N (result changed)
N | A | 45% | Incorrect | N
O | O | 90% | Correct | Y
P | P | 75% | Correct | -
Q | Q | 75% | Correct | -
R | R | 90% | Correct | -
S | T | 49% | Incorrect | N
T | T | 90% | Correct | -
U | U | 85% | Correct | Y
V | V | 87% | Correct | -
W | V | 45% | Incorrect | N
X | X | 70% | Correct | Y
Y | Y | 90% | Correct | N
Z | Z | 90% | Correct | -

(A dash in the last column marks a gesture that was already correct before the changes.)

NumCorrect: 20, NumIncorrect: 6, Total: 26, Accuracy: 77%
Average Accuracy of Each Gesture: 76%

As we can see, the accuracy dramatically increased to 76% because of the modifications to the code. This accuracy value includes both online and offline testing.

5.6. GUI DEVELOPMENT:

A GUI was also developed as part of this project. Some of its features are given below:
5.6.1. Features of the GUI:

Loads the database.
Provides a real-time interface for detecting the hand and classifying the gesture.
Displays the input image alongside the database image after the match.
Displays the letter and builds the word being formed.

Figure 17 shows the GUI developed for the demonstration of this project. The GUI shows all of the features mentioned.


CHAPTER # 6
RESULTS AND CONCLUSIONS
6.1. FINAL RESULTS OBTAINED

The following results were obtained from the experimentation:

Offline testing results were more accurate than those obtained through online testing.
In offline mode 20 gestures were correctly recognized, giving 77% accuracy.
In online mode 14 gestures were correctly recognized, giving 54% accuracy.
The overall system accuracy then comes out to 66%.
The rotation invariance in the roll direction was only 20 degrees.
The rotation invariance in yaw was removed to a certain extent, i.e. +-10 degrees.
The tables with the offline and online accuracies are shown below:
Table 7: Offline testing results
Input | Most Recurring False-Match | Correct/Incorrect
A | A | Correct
B | B | Correct
C | C | Correct
D | D | Correct
E | I | Incorrect
F | F | Correct
G | G | Correct
H | P | Incorrect
I | I | Correct
J | J | Correct
K | K | Correct
L | L | Correct
M | T | Incorrect
N | A | Incorrect
O | O | Correct
P | P | Correct
Q | Q | Correct
R | R | Correct
S | T | Incorrect
T | T | Correct
U | U | Correct
V | V | Correct
W | V | Incorrect
X | X | Correct
Y | Y | Correct
Z | Z | Correct

NumCorrect: 20, NumIncorrect: 6, Total: 26, Accuracy: 77%

Table 8: Online testing results


Input | Most Recurring False-Match | Correct/Incorrect
A | A | Correct
B | B | Correct
C | C | Correct
D | D | Correct
E | I | Incorrect
F | B | Incorrect
G | G | Correct
H | H | Correct
I | I | Correct
J | J | Correct
K | B | Incorrect
L | G | Incorrect
M | T | Incorrect
N | A | Incorrect
O | J | Incorrect
P | P | Correct
Q | Q | Correct
R | E | Incorrect
S | T | Incorrect
T | T | Correct
U | I | Incorrect
V | V | Correct
W | V | Incorrect
X | C | Incorrect
Y | Y | Correct
Z | Z | Correct

NumCorrect: 14, NumIncorrect: 12, Total: 26, Accuracy: 54%

6.2. CONCLUSIONS:
The following conclusions were drawn from the experimentation and the project itself:

As this was a machine vision and image processing project, it was necessary that the environment remain constant.
Lighting variations did have some impact on the experiments, but the overall accuracy was not much affected thanks to the SIFT algorithm, which is illumination invariant.
Yellow light is much better than white light for illuminating the hand, as more detail is seen through it.
The background in our case was kept constant and non-reflective; had it been shiny, light variations would have been greater.
Light variations must be considered in the choice of background.
The limited invariance to rotation in roll and yaw did have some impact on the output result.
For a more robust system these rotations must be accounted for.
The dataset images must be clear and the gesture must face the front in order to aid detection.
In order to process more dataset images in real time, a computer with a suitable processor must be selected.
Hand gesture recognition is an up-and-coming technology and has a future in automation.

6.3. FUTURE IMPROVEMENTS:

This project was made to recognize single-hand gestures of American Sign Language (ASL); it can be extended to both hands, producing entire words or parts of words in one gesture instead of one letter.
When using two hands, problems of occlusion and person detection can arise; these can be sorted out in future improvements.
The problem of rotation invariance about the yaw axis can be removed by using the Harris corner detector along with SIFT descriptors.
More work can be done on making the project more robust by having the program learn the gestures, but the processor speed should also be increased accordingly for real-time processing.
The main objective of our project was to demonstrate the use of gesture recognition; some other application could serve the same purpose, but ASL recognition is more challenging.


APPENDIX A: GESTURE SET PICTURES

Figure 18 Gesture set pictures


APPENDIX B: PROGRAM CODE


B.1. GUI CODE
function varargout = gui(varargin)
% Last Modified by GUIDE v2.5 11-Jun-2014 16:05:21
% Begin initialization code - DO NOT EDIT
gui_Singleton = 1;
gui_State = struct('gui_Name',       mfilename, ...
                   'gui_Singleton',  gui_Singleton, ...
                   'gui_OpeningFcn', @gui_OpeningFcn, ...
                   'gui_OutputFcn',  @gui_OutputFcn, ...
                   'gui_LayoutFcn',  [], ...
                   'gui_Callback',   []);
if nargin && ischar(varargin{1})
gui_State.gui_Callback = str2func(varargin{1});
end
if nargout
[varargout{1:nargout}] = gui_mainfcn(gui_State, varargin{:});
else
gui_mainfcn(gui_State, varargin{:});
end
% End initialization code - DO NOT EDIT

% --- Executes just before gui is made visible.


function gui_OpeningFcn(hObject, eventdata, handles, varargin)
% This function has no output args, see OutputFcn.
% hObject    handle to figure
% eventdata  reserved - to be defined in a future version of MATLAB
% handles    structure with handles and user data (see GUIDATA)
% varargin   command line arguments to gui (see VARARGIN)
handles.word = '';
handles.count = 0;
handles.s = '';
handles.s2 = '';
handles.letter = '-';
ah = axes('unit', 'normalized', 'position', [0 0 1 1]);
bg = imread('back.jpg'); imagesc(bg);
set(ah,'handlevisibility','off','visible','off');
uistack(ah, 'bottom');
% Choose default command line output for gui
handles.output = hObject;
% Update handles structure
guidata(hObject, handles);
% UIWAIT makes gui wait for user response (see UIRESUME)
% uiwait(handles.figure1);

% --- Outputs from this function are returned to the command line.
function varargout = gui_OutputFcn(hObject, eventdata, handles)
% varargout  cell array for returning output args (see VARARGOUT);
% hObject    handle to figure
% eventdata  reserved - to be defined in a future version of MATLAB
% handles    structure with handles and user data (see GUIDATA)
% Get default command line output from handles structure
varargout{1} = handles.output;

function outStr = str_concat(str1,str2)


outStr = strcat(str1,str2);

% --- Executes on button press in readDB.


function readDB_Callback(hObject, eventdata, handles)
% hObject    handle to readDB (see GCBO)
% eventdata  reserved - to be defined in a future version of MATLAB
% handles    structure with handles and user data (see GUIDATA)
if handles.count == 0
    readAndInitDatabase();
    handles.s = str_concat(handles.s,'\nDatabase initialized. Press Start to start detection');
    set(handles.text12,'String',sprintf(handles.s));
handles.count = 1;
else
errordlg('Database initialized. Press Start to start detection','Error');
end
guidata(hObject,handles);

% --- Executes when figure1 is resized.


function figure1_ResizeFcn(hObject, eventdata, handles)
% hObject    handle to figure1 (see GCBO)
% eventdata  reserved - to be defined in a future version of MATLAB
% handles    structure with handles and user data (see GUIDATA)

% --- Executes on button press in detect.


function detect_Callback(hObject, eventdata, handles)
% hObject    handle to detect (see GCBO)
% eventdata  reserved - to be defined in a future version of MATLAB
% handles    structure with handles and user data (see GUIDATA)
if handles.count == 2
handles.count = 1;
str1 = handles.s;
handles.s = str_concat(str1,'\nRecognizing Gesture ...');
set(handles.text12,'String',sprintf(handles.s));
H = vision.BlobAnalysis('MinimumBlobArea',5000,'MaximumCount',3);
[handles.I,handles.bboxPoly] = detectHand_image(handles.frame,H);
if ~isempty(handles.bboxPoly)
[xMin,yMin,width,height] = poly2rect(handles.bboxPoly);
bbox = [xMin,yMin,width,height];
handles.img = imcrop(handles.I,bbox);
[~,handles.letter,handles.image] = hgr_2(handles.img);

end
disp('letter');
disp(handles.letter);
str1 = handles.s;
handles.s = str_concat(str1,'\nRecognition Complete.');
set(handles.text12,'String',sprintf(handles.s));

if handles.letter ~= '-'
if size(handles.image,3) == 3
axes(handles.axes4);
imshow(handles.image);
end
axes(handles.axes3);
str1 = strcat(handles.word,' ');
handles.word = str_concat(str1,handles.letter);
set(handles.text11,'String',sprintf('%s',handles.word));
str1 = handles.s2;
handles.s2 = str_concat(str1,sprintf('Match Found: %c char.\n',handles.letter));
set(handles.text14,'String',handles.s2);
else
set(handles.text14,'String',sprintf('No Match Found'));
set(handles.text11,'String','-');
end
guidata(hObject,handles);
else
errordlg('Database not initialized or Video not playing.','Error');
end
guidata(hObject,handles);

% --- Executes on button press in start.


function start_Callback(hObject, eventdata, handles)
% hObject    handle to start (see GCBO)
% eventdata  reserved - to be defined in a future version of MATLAB
% handles    structure with handles and user data (see GUIDATA)
if handles.count == 1
handles.count = 2;
handles.vid = videoinput('winvideo', 1, 'YUY2_320x240'); %select input device
src = getselectedsource(handles.vid);
handles.vid.FramesPerTrigger =1;
handles.vid.TriggerRepeat = Inf;
handles.vid.ReturnedColorspace = 'rgb';
start(handles.vid);
axes(handles.axes3);
%
H = vision.BlobAnalysis('MinimumBlobArea',5000,'MaximumCount',3);
while(true)
if get(handles.stop,'Value') == get(handles.stop,'Max')
guidata(hObject,handles);
break;
else
handles.frame = getdata(handles.vid,1,'uint8');%get image from camera
imshow(handles.frame);
if get(handles.detect,'Value') == get(handles.detect,'Max')


guidata(hObject,handles);
break;
end
guidata(hObject,handles);
end
end
else
errordlg('Database not initialized. Initialize database first','Initialize Database.');
end
guidata(hObject,handles);

% --- Executes on button press in stop.


function stop_Callback(hObject, eventdata, handles)
% hObject    handle to stop (see GCBO)
% eventdata  reserved - to be defined in a future version of MATLAB
% handles    structure with handles and user data (see GUIDATA)
delete(handles.vid);
handles.count = 2;
guidata(hObject,handles);


APPENDIX C: REFERENCES
[1] D. G. Lowe, "Distinctive Image Features from Scale-Invariant Keypoints," University of British Columbia, Canada, 2003.
[2] D. G. Patel, "Point Pattern Matching Algorithm for Recognition of 36 ASL Gestures," 2013.
[3] K. P. Sable, H. R. Deshmukh, N. S. Band, and R. N. Gadbail, "Pattern Matching Algorithm for ASL Sign Gesture Recognition Using Neural Network," 2014.
[4] I. Parberry and W. Gasarch, Problems on Algorithms, 2nd Edition, 2002.
[5] R. Lockton, "Hand Gesture Recognition Using Computer Vision," Balliol College, Oxford University.
[6] MATLAB Help and Documentation.
[7] Image Processing Toolbox, Image Acquisition Toolbox, and Computer Vision Toolbox Documentation and Online Help.
[8] The Viola/Jones Face Detector, 2001.
[9] SIFT Key Point Detection Code and Application by D. Alvaro and J. J. Guerrero, University of Zaragoza; modified by D. Lowe.

APPENDIX D: BIBLIOGRAPHY
http://www.cs.ubc.ca/~lowe/papers/ijcv04.pdf
http://www.ijisme.org/attachments/File/v1i7/G0336061713.pdf
http://www.ijpret.com/publishedarticle/2014/4/IJPRET%20-%20CSIT%20248.pdf
http://research.microsoft.com/en-us/um/people/awf/bmvc02/project.pdf
http://www.mathworks.com/
http://www.cs.ubc.ca/~lowe/425/slides/13-ViolaJones.pdf
