
International Journal of Advanced Technology & Engineering Research (IJATER)

2nd International e-Conference on Emerging Trends in Technology

HAND MOTION TRANSLATOR FOR SPEECH AND HEARING IMPAIRED
Pankaj Patil, Student, SIT, Lonavala; G.V.Lohar, Professor, SIT, Lonavala

Abstract

The difficulties faced by hearing and speech impaired people when interacting with others can be overcome by building a communication system that allows impaired people to communicate without a human interpreter in between. The proposed system is cost-effective and can reduce the communication gap between hearing and speech impaired people and everyone else. It captures hand signs, compares them against an existing database, and converts them into text followed by speech in a commonly spoken language such as English. The system uses an image processing algorithm that detects and extracts the input hand gesture from the image stream. It applies skin-color based thresholding, contour detection, and convexity defects (convex hull) to detect the hand and to identify the important points on it. The distances of these contour points from the centroid of the hand form the feature vector against which we train our neural network.

Keywords: Image Processing, Hand Gesture Recognition, Convex Hull, Neural Networks.

Introduction

The review of previous work shows that some techniques achieve a high recognition accuracy, and some are even capable of handling dynamic gestures, i.e. signs involving movement of the hand, but most methods achieve this at the cost of dependence on data gloves, colored gloves, or other additional intrusive hardware.
The contributions of this paper can be summarized as follows:
- The approach aims at making the sign language translation system non-intrusive, so that it does not involve any hardware or sensors other than a cheap webcam and is free of any additional material dependency such as colored gloves.
- It proposes a generalized sign language interpretation process which is signer independent and dialect free, i.e. anyone around the world can use the translation system without being bound by dialect differences within the same language. The system can also potentially be extended from one-hand gesture recognition to two-hand recognition.
This paper proposes an efficient system that uses simple techniques, namely skin-based thresholding, contouring, and convexity defects, to extract features of the hand from a real-time input video. These basic, simple features are then used to train our neural network, which considerably reduces the training time.

Related work

The use of hand gestures is an important area in the development of intelligent human interaction systems, and the field of gesture recognition has seen a large number of innovations. A gesture can be defined as a physical action which conveys information. Sign language is mainly expressed through hand gestures, used as a communication medium by people with vocal and hearing impairments so that they can communicate with others. A person who can talk and hear properly cannot communicate with a mute person unless he is familiar with sign language. A lot of work has been carried out on the automation of sign language interpretation, building systems that effectively translate signs, i.e. hand gestures, into speech or text. Hand gestures are an ideal option for expressing feelings or conveying something such as a number or a word. We can use the hand as an input and, by making its gestures understandable to a computer against a database, interpret the corresponding text. In this paper we present a method for recognizing various hand gestures and converting them into text and then into voice.

Much research has been done on computerizing the sign language interpreter, producing schemes that successfully interpret hand gestures into speech and text. The two main methods for identifying hand gestures are glove-based techniques and vision-based techniques [1], [2]. A novel approach was presented by Raghavendra et al. [3] to detect hand gestures which are part of a sign language by utilizing special color-coded gloves. After capturing an image from the camera, the very first step is segmentation, that is, isolating the hand region from the captured image [12]. The methods for object segmentation mainly depend on the color model, which can be derived from the original RGB color model and could be the HSV model or the YCbCr color space [13]; the thresholding is done on the basis of Otsu's method [14]. A vision-based scheme able to identify 14 gestures in real time for handling windows was developed by C. W. Ng in [4]. F. Ullah designed a system

that recognizes the 26 alphabets of ASL (American Sign Language) from static images using Cartesian Genetic Programming, with an accuracy of 90% [5]. A real-time hand gesture recognition system using skin-color based segmentation and multiple-feature template-matching techniques was presented by Ampornaramveth et al. [6]. Paulraj, Palaniappan et al. use a bulky camera and lighting arrangement, skin-based thresholding for feature extraction, and neural networks, reaching a maximum accuracy of 92% over 9 English words [7]. Likewise, Akmeliawati [8] and Raimond [9] have presented complete automatic systems for sign language conversion using image processing techniques and neural networks, but again with customized gloves. There has also been relevant research directed towards making sign language translation signer independent: [10] and [11] both aim at signer-independent sign language recognition, but they again make use of cyber gloves for gathering information about the hand shape. Fang et al. use three additional trackers in their hybrid system with self-organizing feature maps and Hidden Markov Models to increase the recognition accuracy, which lies between 90 and 96% [10].


System Architecture

Figure 1 shows the key components of our system, which captures the sign language symbols, compares them against the database, and converts them into the corresponding text followed by speech.

Fig.1. System Architecture

The first step in the development of the sign language interpreter, on which the correctness of the whole system depends, is the extraction of the gesture from the input video stream. The hand is detected in the image using skin-color thresholding, and features built on the distances between the centroid and the main curve points of the hand are extracted from this image. Finally, hand motion identification is carried out by training and testing the neural network.
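As an overview, the sketch below strings these stages together in Python. It is a minimal skeleton, not the authors' code: skin_mask() and hand_features() are placeholders for the helpers sketched in Sections A and B below, classify() stands in for the trained neural network, and the choice of OpenCV for capture and pyttsx3 for the speech step is an assumption.

```python
# Minimal end-to-end sketch of the pipeline in Figure 1 (assumed structure).
import cv2
import pyttsx3

def skin_mask(frame):
    """Section A: skin-color based thresholding (sketched below)."""
    ...

def hand_features(mask):
    """Section B: contours, convex hull, convexity defects (sketched below)."""
    ...

def classify(features):
    """Trained neural network mapping five distances to a symbol label."""
    ...

def main():
    engine = pyttsx3.init()          # text-to-speech backend (assumed choice)
    cap = cv2.VideoCapture(0)        # a cheap webcam is the only hardware
    while cap.isOpened():
        ok, frame = cap.read()
        if not ok:
            break
        mask = skin_mask(frame)          # isolate the hand region
        features = hand_features(mask)   # five centroid-to-defect distances
        if features is not None:
            text = classify(features)    # e.g. 'A' or '3'
            if text:
                engine.say(text)         # speech follows the text
                engine.runAndWait()
        if cv2.waitKey(1) & 0xFF == 27:  # Esc quits
            break
    cap.release()

if __name__ == "__main__":
    main()
```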

A. Skin based thresholding

Image acquisition is the first step in the system. The image captured from the input video stream is processed further to identify the hand and determine the gesture; for this it needs to be converted to a specific color model such as RGB, HSV, HSI, or gray scale. In our work the L*a*b* color space is used, so the captured sRGB image is converted into the Lab color space: the RGB image is first converted into the XYZ color space and then into L*a*b*. Each RGB system has a white point (w). The transformation to CIE Lab requires a reference white point (n), which is not necessarily (w); issues of adaptation are taken into account by the linearized Bradford transform, whose matrix maps the XYZ values of a color between white points.

Generic gamma correction, with exponent $G = 2.2$ and $C \in \{R, G, B\}$:

$$C' = C^{G}$$

sRGB gamma correction, $C \in \{R, G, B\}$:

$$C' = \begin{cases} C / 12.92 & C \le 0.03928 \\ \left(\dfrac{C + 0.055}{1.055}\right)^{2.4} & C > 0.03928 \end{cases}$$

RGB to XYZ (same white point):

$$[X \; Y \; Z]^T = C_{xr}\,[R \; G \; B]^T$$

RGB to XYZ (with Bradford adaptation):

$$[X \; Y \; Z]^T = B\,C_{xr}\,[R \; G \; B]^T$$

XYZ to L*a*b* conversion:

$$X_1 = \frac{X}{X_n}, \quad Y_1 = \frac{Y}{Y_n}, \quad Z_1 = \frac{Z}{Z_n}$$

where each normalized component is passed through

$$f(t) = \begin{cases} t^{1/3} & t > 0.008856 \\ 7.787\,t + 16/116 & \text{otherwise} \end{cases}$$

Then,

$$L^* = 116\,f(Y_1) - 16, \quad a^* = 500\,(f(X_1) - f(Y_1)), \quad b^* = 200\,(f(Y_1) - f(Z_1))$$

The conversion matrix is

$$C_{xr} = \begin{bmatrix} u\,x_r/y_w & v\,x_g/y_w & w\,x_b/y_w \\ u\,y_r/y_w & v\,y_g/y_w & w\,y_b/y_w \\ u\,z_r/y_w & v\,z_g/y_w & w\,z_b/y_w \end{bmatrix}$$

and the Bradford matrix is

$$B = M_{cx}^{-1}\,D\,M_{cx}$$

L*a*b* is a CIE specification that attempts to make the luminance scale more perceptually uniform; L* is a nonlinear scaling of the luminance normalized to the reference white point. Otsu's method is used to automatically perform clustering-based image thresholding [14], i.e. the reduction of a gray-level image to a binary image. The binary image g(x, y) is defined as

$$g(x, y) = \begin{cases} 1 & f(x, y) \ge T \\ 0 & f(x, y) < T \end{cases}$$

The normalized histogram of the image is defined as

$$p_r(r_q) = \frac{n_q}{n}, \quad q = 0, 1, 2, \ldots, L-1$$

Otsu's method chooses the threshold value t that maximizes the between-class variance $\sigma_b^2$:

$$\sigma_b^2(t) = \sigma^2 - \sigma_w^2(t) = \omega_1(t)\,\omega_2(t)\,[\mu_1(t) - \mu_2(t)]^2$$
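For concreteness, the sketch below implements the sRGB to L*a*b* steps above in NumPy. It is a minimal sketch assuming the D65 white point and the standard sRGB-to-XYZ matrix, since the paper does not list its C_xr entries, and it uses the same-white-point variant (no Bradford adaptation).

```python
import numpy as np

# Illustrative sRGB -> XYZ matrix for D65; the paper's C_xr is not given.
RGB_TO_XYZ = np.array([[0.4124, 0.3576, 0.1805],
                       [0.2126, 0.7152, 0.0722],
                       [0.0193, 0.1192, 0.9505]])
WHITE_D65 = np.array([0.9505, 1.0, 1.089])  # reference white (Xn, Yn, Zn)

def srgb_to_lab(rgb):
    """Convert an sRGB image (H x W x 3, values in [0, 1]) to L*a*b*."""
    # sRGB gamma linearization: C/12.92 below the threshold,
    # ((C + 0.055)/1.055)^2.4 above it.
    linear = np.where(rgb <= 0.03928,
                      rgb / 12.92,
                      ((rgb + 0.055) / 1.055) ** 2.4)
    # RGB -> XYZ (same white point): [X Y Z]^T = C_xr [R G B]^T
    xyz = linear @ RGB_TO_XYZ.T
    # Normalize by the reference white, then apply the cube-root function
    # with the linear branch below 0.008856.
    t = xyz / WHITE_D65
    f = np.where(t > 0.008856, np.cbrt(t), 7.787 * t + 16.0 / 116.0)
    L = 116.0 * f[..., 1] - 16.0
    a = 500.0 * (f[..., 0] - f[..., 1])
    b = 200.0 * (f[..., 1] - f[..., 2])
    return np.stack([L, a, b], axis=-1)
```

For an 8-bit camera frame, divide by 255.0 and reorder BGR to RGB before calling srgb_to_lab.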

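The Otsu binarization step can be sketched with OpenCV as follows. The choice of the a* channel as the skin-discriminating input is an assumption for illustration; the paper only states that thresholding follows Otsu's method [14].

```python
import cv2

def skin_mask(frame_bgr):
    """Binarize a frame into a hand/background mask with Otsu's threshold.

    Assumes skin pixels stand out in the a* channel of Lab; this channel
    choice is illustrative, not specified by the paper.
    """
    lab = cv2.cvtColor(frame_bgr, cv2.COLOR_BGR2LAB)  # OpenCV's 8-bit Lab
    a_channel = lab[:, :, 1]
    a_blur = cv2.GaussianBlur(a_channel, (5, 5), 0)   # suppress noise first
    # THRESH_OTSU picks T automatically by maximizing the between-class
    # variance, exactly the criterion given above.
    _, mask = cv2.threshold(a_blur, 0, 255,
                            cv2.THRESH_BINARY + cv2.THRESH_OTSU)
    return mask
```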

Fig.2. Result of background subtraction, skin color mapping, and thresholding on an image containing a hand

Fig.3. Result of contour detection on the image

B. Feature extraction

Feature extraction plays an important role in the whole process; it is the features that decide the accuracy of the algorithm. Initially, several other techniques were tried for feature extraction, including color and texture, but these features may vary from person to person, as each person has a different skin tone, and they are also affected by varying lighting conditions. Once the hand is identified and separated from the rest of the image, it is processed further to determine the centroid and convex hull of the resulting shape.

The proposed scheme uses vision-based hand gesture recognition techniques, which mainly focus on the shape of the hand. The moments are structures of the hand which allow reconstruction of the object; the central and spatial moments are determined, and the centroid of the hand is calculated as follows:
$$M_{i,j} = \sum_x \sum_y x^i\,y^j\,I(x, y)$$

where I(x, y) is the intensity at coordinate (x, y). The centroid $(\bar{x}, \bar{y})$ is found using

$$\bar{x} = \frac{M_{10}}{M_{00}}, \quad \bar{y} = \frac{M_{01}}{M_{00}}$$

The centroid coordinates are calculated with the help of the above equations, where $M_{10}$ and $M_{01}$ are the first-order moments along the x and y axes and $M_{00}$ is the zeroth-order moment.

Contour detection and identification is an important step for noise reduction. Contour detection and cropping are carried out as follows:
1. Search the image I for all contours $C_N$, where N is the number of contours.
2. Threshold the contours, setting the pixels of retained contour regions to 255 and all others to 0.
3. Sort the remaining contours and keep the contour with the largest area, $C_L$.
4. Draw $C_L$ onto a new image I'.
5. Fill contour $C_L$ to obtain the silhouette $S_L$.
6. Crop I' to fit $S_L$.
7. Add a uniform four-pixel-wide border.
8. The image I' with $S_L$ is used for convex hull creation.

Delaunay Triangulations and Convex Hulls:

Consider the paraboloid $z = x^2 + y^2$. A point $p = (x, y)$ in the plane is lifted to the point $L(p) = (X, Y, Z)$ in $E^3$, where $X = x$, $Y = y$, and $Z = x^2 + y^2$. A circle C is defined by the equation

$$x^2 + y^2 + ax + by + c = 0$$

Since $X = x$, $Y = y$, and $Z = x^2 + y^2$, by eliminating $x^2 + y^2$ we get $Z = -aX - bY - c$, and thus X, Y, Z satisfy the linear equation

$$aX + bY + Z + c = 0$$

This is the equation of a plane. Thus, the intersection of the cylinder of revolution consisting of the lines parallel to the z-axis and passing through a point of the circle C with the paraboloid $z = x^2 + y^2$ is a planar curve (an ellipse).

We can compute the convex hull of the set of lifted points and focus on the downward-facing faces of this hull. Let $(L(p_1), L(p_2), L(p_3))$ be such a face, where the points $p_1, p_2, p_3$ belong to the set P. We claim that no other point from P is inside the circle C through $p_1, p_2, p_3$. Indeed, a point p inside the circle C would lift to a point L(p) on the paraboloid such that the face $(L(p_1), L(p_2), L(p))$ would lie below the face $(L(p_1), L(p_2), L(p_3))$, contradicting the fact that $(L(p_1), L(p_2), L(p_3))$ is one of the downward-facing faces of the convex hull of P. The convex hull of the contour points is a necessary step for finding the defects of convexity.

Defect normalization for the input image: let

$$D = \{d_1, d_2, d_3, \ldots, d_n\}, \quad d_i \in \mathbb{N} \times \mathbb{N}$$

be the set of defect point locations, and let

$$D_{max} = \left(\max_i d_{i,1},\; \max_i d_{i,2}\right), \quad i \in \{1, 2, 3, \ldots, n\}$$

Then, for each defect, define

$$f: \mathbb{N} \times \mathbb{N} \to [0, 1] \times [0, 1], \quad f(d_i) = \left(\frac{d_{i,1}}{D_{max,1}},\; \frac{d_{i,2}}{D_{max,2}}\right)$$

and let

$$D_n = \{f(d_1), f(d_2), f(d_3), \ldots, f(d_n)\}$$

be the set of normalized defect locations.

Fig.4. Feature Extraction
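As a concrete sketch of this feature pipeline, the helper below uses OpenCV to find the largest contour, compute the centroid from the image moments, locate the convexity defects, and return the five largest centroid-to-defect distances normalized to [0, 1]. The function name and the choice of the five largest distances are illustrative assumptions consistent with the five-feature vector described in the neural network section; this is not the authors' code.

```python
import cv2
import numpy as np

def hand_features(mask, n_features=5):
    """Extract normalized centroid-to-defect distances from a binary mask.

    Expects an 8-bit binary hand mask (e.g. from skin_mask above).
    Returns None when no usable hand contour is found.
    """
    contours, _ = cv2.findContours(mask, cv2.RETR_EXTERNAL,
                                   cv2.CHAIN_APPROX_SIMPLE)
    if not contours:
        return None
    contour = max(contours, key=cv2.contourArea)   # keep largest contour
    m = cv2.moments(contour)                       # spatial moments
    if m["m00"] == 0:
        return None
    cx, cy = m["m10"] / m["m00"], m["m01"] / m["m00"]  # centroid
    hull = cv2.convexHull(contour, returnPoints=False)  # hull as indices
    if hull is None or len(hull) < 4:
        return None
    defects = cv2.convexityDefects(contour, hull)
    if defects is None:
        return None
    # Farthest point of each defect from the hull edge, then its distance
    # from the centroid; keep the n largest distances as the feature vector.
    pts = [contour[d[0][2]][0] for d in defects]
    dists = sorted((np.hypot(p[0] - cx, p[1] - cy) for p in pts),
                   reverse=True)[:n_features]
    if len(dists) < n_features:
        return None
    dists = np.array(dists)
    if dists.max() == 0:
        return None
    return dists / dists.max()                     # normalize to [0, 1]
```

Normalizing by the maximum distance makes the vector roughly scale invariant, so the same gesture shown closer to or farther from the webcam produces a similar feature vector.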



ARTIFICIAL NEURAL NETWORKS

Classification and generalization are the most basic and important properties of artificial neural networks. The architecture consists of three layers: an input layer, one hidden layer, and an output layer. To complete the gesture classification stage, two neural networks are developed for the recognition of gestures using the five distance-based features computed from the captured video frames, one for numerals and the other for alphabets. The life of a neural network consists of two phases, training and testing, which typically requires that the data collected for validation of the system be separated into two sets: a training set used to train the network, and a testing set used to measure the performance of the trained network on unseen data.

Assessment of the network on the testing set helps judge how well the model will generalize to new data. Overfitting during the training phase can result in improved performance on the training data at the expense of poor generalization, and thus a decrease in classification accuracy on unknown data. The problem of overfitting is handled by adjusting the number of neurons in the hidden layer, so a moderate number of ten hidden neurons is used in our network. The input layer receives the feature vector of five distances. The output layer of the numeral network corresponds to 9 numerals, while that of the alphabet network corresponds to 26 alphabets, in each case excluding those numerals and alphabets that require a dynamic movement lasting more than one frame to complete. Once the correspondence between the user's hand gestures and each sign language symbol is learnt by the respective neural network in the training phase, the user is free to use our system for translation and communication with other people.
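A minimal training sketch for such a network is shown below, using scikit-learn's MLPClassifier as a stand-in; the paper does not name a training library or algorithm, and the feature/label files are hypothetical placeholders for data produced by the feature extractor above. The 5-input, 10-hidden-neuron shape follows the text.

```python
import numpy as np
from sklearn.neural_network import MLPClassifier

# X: one row of five normalized centroid-to-defect distances per frame,
# y: the symbol label for that frame ('0'..'9' or 'A'..'Z').
# File names are hypothetical placeholders for a labelled recording session.
X_train = np.load("features_train.npy")   # shape (n_samples, 5)
y_train = np.load("labels_train.npy")
X_test = np.load("features_test.npy")
y_test = np.load("labels_test.npy")

# Three-layer network: 5 inputs, one hidden layer of 10 neurons, and one
# output per symbol, as described in the text.
net = MLPClassifier(hidden_layer_sizes=(10,), activation="logistic",
                    solver="adam", max_iter=2000, random_state=0)
net.fit(X_train, y_train)

# The held-out testing set measures generalization to unseen data.
print("test accuracy:", net.score(X_test, y_test))
```

Keeping the hidden layer small and evaluating on a held-out testing set are the two overfitting controls described above.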

Results

This hand motion translator is able to translate Indian sign letters (A-Z) and numbers (0-9), and all the motions can be translated in real time. The current system has been trained on images to which skin color thresholding and convex hull extraction were applied.

The proposed algorithm was applied to a small database of images with different hand movements. With the defined feature extraction, skin color thresholding, and convex hull, we can successfully recognize the different hand movement patterns; sample results are shown in the figures below.

Fig.5. 2, 3, A Indian Sign Symbols

Fig.6. P, B Indian Sign Symbols

Conclusion

A simple sign language interpretation system is developed which uses user-specific training for signer-independent, dialect-free sign language translation, without the system relying on expensive additional hardware such as data gloves or sensors. We have proposed a simple and novel feature set that can be extracted in real time. With this non-intrusive solution we aim to achieve reasonable average accuracies, with the maximum recognition accuracy on numerals and alphabets.

Future work

Future work will explore using a simple neural network with a back-propagation learning algorithm for training and testing, generalization across different sign languages, and improving the accuracy rates.

Acknowledgments



The authors are thankful to the IJATER Journal for the support to develop this document.


References


[1] N. A. Ibraheem and R. Z. Khan, "Vision Based Gesture Recognition Using Neural Networks Approaches: A Review," International Journal of Human Computer Interaction (IJHCI), Malaysia, Vol. 3(1), 2012.
[2] T. S. Huang and V. I. Pavlovic, "Hand Gesture Modeling, Analysis, and Synthesis," Proc. of International Workshop on Automatic Face and Gesture Recognition, Zurich, pp. 73-79, 1995.
[3] Sachin S. K., Sthuthi B., Pavithra R. and Raghavendra, "Novel Segmentation Algorithm for Hand Gesture Recognition," IEEE, 2013.
[4] C. W. Ng and S. Ranganath, "Real-time gesture recognition system and application," Image Vis. Comput., vol. 20, no. 13-14, pp. 993-1007, 2002.
[5] F. Ullah, "American Sign Language recognition system for hearing impaired people using Cartesian Genetic Programming," Automation, Robotics and Applications (ICARA), 5th International Conference, pp. 96-99, 2011.
[6] Md. Hasanuzzaman, V. Ampornaramveth, Tao Zhang, M. A. Bhuiyan, Y. Shirai and H. Ueno, "Real-time Vision-based Gesture Recognition for Human Robot Interaction," Proceedings of the IEEE International Conference on Robotics and Biomimetics, Shenyang, China, 2004.
[7] M. P. Paulraj, S. Yaacob, M. S. bin Zanar Azalan and R. Palaniappan, "A phoneme based sign language recognition system using skin color segmentation," Signal Processing and its Applications (CSPA), 6th International Colloquium, pp. 1-5, 2010.
[8] R. Akmeliawati, M. P-L. Ooi and Y. C. Kuang, "Real-Time Malaysian Sign Language Translation using Colour Segmentation and Neural Network," IEEE Instrumentation and Measurement Technology Conference Proceedings, IMTC, pp. 1-6, 2007.
[9] Y. F. Admasu and K. Raimond, "Ethiopian sign language recognition using Artificial Neural Network," Intelligent Systems Design and Applications (ISDA), 10th International Conference, pp. 995-1000, 2010.
[10] G. Fang, W. Gao and J. Ma, "Signer-independent sign language recognition based on SOFM/HMM," Recognition, Analysis, and Tracking of Faces and Gestures in Real-Time Systems, Proceedings, IEEE ICCV Workshop, pp. 90-95, 2001.
[11] P. Vamplew, "Recognition of sign language gestures using neural networks," presented at the Eur. Conf. Disabilities, Virtual Reality Associated Technol., Maidenhead, U.K., 1996.
[12] M. M. Hasan and P. K. Mishra, "HSV brightness factor matching for gesture recognition system," International Journal of Image Processing (IJIP), vol. 4(5), 2010.
[13] E. Stergiopoulou and N. Papamarkos, "Hand gesture recognition using a neural network shape fitting technique," Elsevier Engineering Applications of Artificial Intelligence 22, pp. 1141-1158, 2009.
[14] N. Otsu, "A Threshold Selection Method from Gray-Level Histograms," IEEE Transactions on Systems, Man, and Cybernetics, vol. SMC-9, no. 1, January 1979.

Biographies
PANKAJ PATIL received the B.E. degree in Electronics Engineering from Shivaji University, Kolhapur, Maharashtra, in 2012. He is currently pursuing the M.E. degree in Electronics and Telecommunication Engineering (VLSI and Embedded Systems). The author may be reached at pankspatil5310@gmail.com.

G. V. LOHAR is currently working as a Professor in the Electronics and Telecommunication department at S.I.T., Lonavala. The author may be reached at Ganeshlohar73@gmail.com.

