
A Project Report

on
Speech Recognition for Gender Discrimination

SUBMITTED IN PARTIAL FULFILLMENT FOR THE AWARD

OF DEGREE OF BACHELOR OF TECHNOLOGY

IN
ELECTRONICS AND COMMUNICATION ENGINEERING

Student Name (Enrolment Number):

HARSH SRIVASTAVA (9911102222)

TARANDEEP SINGH (9911102350)

YOGESH SINDHWANI (9911102363)

Under the Guidance of


Mr. B. Suresh
Department of Electronics and Communication Engineering
Jaypee Institute of Information Technology, Noida,
June 2015
CERTIFICATE
This is to certify that the work titled “Speech Recognition – Gender Discrimination”, submitted by Yogesh Sindhwani, Tarandeep Singh and Harsh Srivastava in partial fulfillment for the award of the degree of Bachelor of Technology (B.Tech) of Jaypee Institute of Information Technology-128, Noida, has been carried out under my supervision. The project has not been submitted to any other University or Institute for the award of this or any other degree or diploma.

Signature of Supervisor:
Name of Supervisor : Mr. B Suresh
Date:
ACKNOWLEDGEMENT

The success of our project depended largely on the encouragement and help of many others. We take this opportunity to express our thankfulness to the people who contributed to the successful completion of this project and were the real guiding force behind it.

We would like to express our gratitude to our project mentor, Mr. B. Suresh, for continuously guiding and tremendously helping us throughout this project. We felt motivated and encouraged every time we attended his meetings. Without his encouragement and guidance, the completion of this project on speech recognition for gender discrimination would not have been achieved.

DECLARATION

Project work is a part of our curriculum that gives us knowledge about the topic and subject we have studied. It also helps us understand and relate the theoretical concepts better, some of which were not covered in the classroom. We have prepared this report as a part of our ‘MAJOR PROJECT FOR SEMESTER VIII’. The topic we have selected for the project is ‘SPEECH RECOGNITION - GENDER DISCRIMINATION’.

Student Name:

HARSH SRIVASTAVA

TARANDEEP SINGH

YOGESH SINDHWANI

Contents

CERTIFICATE . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . i
ACKNOWLEDGEMENT . . . . . . . . . . . . . . . . . . . . . . . . . . ii
DECLARATION . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . iii

1 INTRODUCTION 1
1.1 VISION OF SPEECH RECOGNITION . . . . . . . . . . . . . . . 1
1.2 HISTORY AND BACKGROUND . . . . . . . . . . . . . . . . . . . 1

2 CHARACTERISTICS 4
2.1 CHARACTERISTICS OF SPEECH RECOGNITION . . . . . . . . 4
2.2 CLASSIFICATION OF SPEECH . . . . . . . . . . . . . . . . . . . 6
2.3 APPLICATIONS OF SPEECH . . . . . . . . . . . . . . . . . . . . 8

3 SOURCE CODE 9
3.1 ALGORITHM . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 9
3.2 OUR TECHNIQUE . . . . . . . . . . . . . . . . . . . . . . . . . . . 10

4 Simulation and Results 17
4.1 Results Obtained . . . . . . . . . . . . . . . . . . . . . . . . . . . . 18

5 Appendices 21

6 Conclusion 24

Abstract

This report presents a comparative investigation of speech signals to devise a gender classifier. Gender classification by speech analysis aims to predict the gender of the speaker by analyzing different parameters of a voice sample. This investigation concentrates mainly on pitch analysis of the speech signals: the pitch values of male and female voice samples are compared. The quantitative comparison is implemented in MATLAB by determining pitch through the auto-correlation method. A database of voice samples collected from many students of our college, both male and female, was created. Pitch analysis was performed on all the collected voice samples, and the values were compared to establish a working principle for a gender classifier based on speech.
Chapter 1

INTRODUCTION

1.1 VISION OF SPEECH RECOGNITION

Development of speech recognition systems started in the early 1950s with work on recognizing voiced sounds and on distinguishing one speaker's voice from another's. Early limitations motivated research into more reliable methods of finding the relation between two sets of speech sounds. Research on identifying speech has continued to this day under the field of speech processing, where many achievements have taken place.
The task of speech recognition is to convert speech sounds into a sequence of words using a computer program such as MATLAB. Since speech is the most natural form of communication for humans, the dream of speech recognition is to enable humans to communicate more effectively.

1.2 HISTORY AND BACKGROUND

The first attempts to build systems for Automatic Speech Recognition (ASR) by machines were made in the 1940s. Early research on speech and its recognition was funded by the National Science Foundation (NSF) and the Defense Advanced Research Projects Agency (DARPA). Initial
Figure 1.1: Speech Recognition Process

research, performed by the NSA and with National Science Foundation funding, was conducted in the 1970s. Speech recognition technology was initially designed for people in the differently abled community: voice recognition helped people with disabilities caused by conditions such as multiple sclerosis and arthritis achieve maximum productivity on computers. In the early 1970s, market opportunities emerged for speech recognition, but the early versions of these products were hard to use. The early speech recognition systems had many problems: they were speaker dependent, had small vocabularies, or required a very stylized code. However, in the information technology industry nothing stays the same for very long, and by the end of the 1980s there was a whole new generation of commercial speech recognition software systems that were easier to use. Today, speech recognition technology is used by millions of individuals to automatically create documents from dictation; reports are formatted, corrected for mistakes in punctuation and grammar, and verified for consistency and possible errors. Speech recognition will become more widespread as the technology improves. Some speech systems

Figure 1.2: Speech Production model

produce text in real time, and speech recognition is used in applications such as closed captioning and Internet streaming-text services. The goal of building a system that can recognize fluent spoken speech has driven speech research for more than 60 years. Although ASR technology is not yet at the point where machines understand all speech, in any acoustic environment, or from any person, it is already used in a number of applications and services. The goal of ASR research is to allow a machine to recognize, in real time and with near-100% accuracy, the words spoken by any person, independent of vocabulary size or speaker characteristics. Nowadays, systems can be trained to achieve an accuracy of more than 90%, under the assumptions that the user's speech characteristics match the training data, that proper speaker adaptation has been performed, and that the system operates in a low-noise environment such as a quiet room.

Chapter 2

CHARACTERISTICS

2.1 CHARACTERISTICS OF SPEECH RECOGNITION

Speech signals comprise many types of sounds. They can be classified into categories according to their frequency and amplitude characteristics.

Figure 2.1: Characteristics of Speech Signal

1. Voiced sounds are produced by the oral cavity with tension of the glottis. They have lower frequency and a lower mean zero-crossing rate, approximately 14 per 10 ms. The vocal folds vibrate in a relaxed mode.

Figure 2.2: Unvoiced Speech Signal

2. Unvoiced sounds are generated by a constriction of the vocal tract, which produces turbulence. They usually have higher frequency and a higher mean zero-crossing rate, approximately 49 per 10 ms. They also have a high energy density.

3. Plosive sounds are of two types: voiced (b, d, g) and unvoiced (p, k, t). An unvoiced plosive results in the release of a burst of air.

Figure 2.3: Plosive Speech Signal
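To make the zero-crossing figures above concrete, here is a small illustrative sketch (written in Python purely for illustration; the project code itself is MATLAB, and the 150 Hz and 2500 Hz test tones are our own arbitrary choices). A low-frequency "voiced-like" tone crosses zero far less often per 10 ms than a high-frequency "unvoiced-like" one:

```python
import math

def zero_crossings_per_10ms(x, fs):
    """Mean zero-crossing count per 10 ms of signal x sampled at fs Hz."""
    crossings = sum(1 for a, b in zip(x, x[1:]) if a * b < 0)
    n_frames = (len(x) / fs) * 100          # number of 10 ms frames in x
    return crossings / n_frames

fs = 8000
t = [i / fs for i in range(fs)]                                # 1 second of samples
voiced_like = [math.sin(2 * math.pi * 150 * s) for s in t]     # low-frequency tone
unvoiced_like = [math.sin(2 * math.pi * 2500 * s) for s in t]  # high-frequency tone

print(zero_crossings_per_10ms(voiced_like, fs))    # low rate (about 3 per 10 ms)
print(zero_crossings_per_10ms(unvoiced_like, fs))  # much higher rate
```

The exact counts depend on the chosen tone frequencies; what matters is the order-of-magnitude gap between the two classes.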

2.2 CLASSIFICATION OF SPEECH

Isolated word versus continuous speech: Isolated-word systems identify only single words at one instant, while continuous-speech systems recognize sequences of words at a time. Isolated words are easier to process. A continuous-speech system combines patterns of smaller speech units, such as words, into a larger speech pattern, i.e., a sentence.

Figure 2.4: Continuous word

Figure 2.5: Isolated word

Speaker dependent versus speaker independent systems: A speaker-dependent system depends on the voice of a particular speaker. The software must be trained for that particular speaker so that it learns the characteristics of the speaker's voice. In the training phase the software is trained for a particular individual on sounds such as ae, ei, ee, ie, etc. Examples include security management in banks and access control for company devices such as laptops. A speaker-independent system does not depend on a particular speaker, and the software needs no training on individual voice characteristics. Examples of speaker-independent systems are attendance systems, car navigation, and music systems.

Figure 2.6: Continuous word

2.3 APPLICATIONS OF SPEECH

Usages of speech recognition: although speech recognition has many limitations, the technology can be very useful in a lot of applications, keeping in mind the weaknesses and strengths of the systems.

1. Speech recognition is used for speech-to-text conversion.

2. Speech recognition is used for text-to-speech conversion.

3. Speech recognition is used for live subtitling on television.

4. It is used for gender discrimination.

Chapter 3

SOURCE CODE

3.1 ALGORITHM

The technique we have used in this project is based on pitch analysis of the signal. In this method we differentiate between the genders, male and female, on the basis of pitch, where pitch is defined as the fundamental frequency of the source.
We therefore use a pitch estimate that is quite accurate and a pitch extractor that is quite efficient; these calculations help in differentiating between male and female voices, and the results are quite precise as well. A Pitch Detection Algorithm (PDA) is a process that determines the pitch, i.e., the fundamental frequency, of a speech sound or signal.
A pitch (fundamental frequency) range has been defined for female voices on the basis of results from earlier experiments, and similarly the range for male voices has been predefined. The results show that the frequency range for female voices is higher than the male frequency range.

3.2 OUR TECHNIQUE

This technique is based on the auto-correlation method. Correlation between two waveforms is a measure of their similarity; the correlation result detects similarity as a function of the time difference between the starting points of the two waveforms. The autocorrelation function is the correlation of a waveform with itself: one expects exact similarity at a time lag of zero, with increasing dissimilarity as the time lag increases. The pitch extraction process for a speech signal is based on examining the short-time autocorrelation function of the signal.
The short-time autocorrelation of a speech signal is given by:

Rn(k) = Σm [x(m) w(n − m)] [x(m + k) w(n − m − k)]   (3.1)

where
Rn(k) = short-time autocorrelation at frame position n
x = speech signal
w = Hamming window
k = lag (in samples) at which the autocorrelation is calculated
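As a sketch of equation (3.1) outside MATLAB, the following illustrative Python code computes the windowed short-time autocorrelation directly and picks the lag with the largest value in the 50–500 Hz pitch range, mirroring the search performed in the GUI code below. The 200 Hz test tone and all function names here are our own illustrative choices, not part of the project code:

```python
import math

def hamming(i, n):
    """Hamming window of length n, taken as zero outside [0, n)."""
    if 0 <= i < n:
        return 0.54 - 0.46 * math.cos(2 * math.pi * i / (n - 1))
    return 0.0

def short_time_autocorr(x, n, k):
    """Rn(k) = sum_m [x(m) w(n-m)] [x(m+k) w(n-m-k)], with w a Hamming window."""
    win = len(x)
    return sum((x[m] * hamming(n - m, win)) * (x[m + k] * hamming(n - m - k, win))
               for m in range(len(x) - k))

def estimate_pitch(x, fs, fmin=50, fmax=500):
    """Pick the lag with the largest autocorrelation in the plausible pitch range."""
    n = len(x) - 1                          # align the window with the frame
    kmin, kmax = fs // fmax, fs // fmin
    best_k = max(range(kmin, kmax + 1),
                 key=lambda k: short_time_autocorr(x, n, k))
    return fs / best_k

# A synthetic 200 Hz tone: the estimate should come out near 200 Hz.
fs = 8000
x = [math.sin(2 * math.pi * 200 * i / fs) for i in range(400)]
print(estimate_pitch(x, fs))
```

The project's own implementation uses MATLAB's `xcorr` instead of this explicit double-windowed sum, but the lag-search idea is the same.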

function varargout = gui_speech_analysis(varargin)

gui_Singleton = 1;
gui_State = struct('gui_Name', mfilename, ...
    'gui_Singleton', gui_Singleton, ...
    'gui_OpeningFcn', @gui_speech_analysis_OpeningFcn, ...
    'gui_OutputFcn', @gui_speech_analysis_OutputFcn, ...
    'gui_LayoutFcn', [], ...
    'gui_Callback', []);
if nargin && ischar(varargin{1})
    gui_State.gui_Callback = str2func(varargin{1});
end

if nargout
    [varargout{1:nargout}] = gui_mainfcn(gui_State, varargin{:});
else
    gui_mainfcn(gui_State, varargin{:});
end
% End initialization code - DO NOT EDIT

% --- Executes just before gui_speech_analysis is made visible.
function gui_speech_analysis_OpeningFcn(hObject, eventdata, handles, varargin)
% This function has no output args, see OutputFcn.
% hObject    handle to figure
% eventdata  reserved - to be defined in a future version of MATLAB
% handles    structure with handles and user data (see GUIDATA)
% varargin   command line arguments to gui_speech_analysis (see VARARGIN)

% Choose default command line output for gui_speech_analysis
handles.output = hObject;

% Update handles structure
global x fs ms20 ms2 r;
x = ones(9600,1);
set(hObject, 'toolbar', 'figure');
guidata(hObject, handles);

% UIWAIT makes gui_speech_analysis wait for user response (see UIRESUME)
% uiwait(handles.figure1);

% --- Outputs from this function are returned to the command line.
function varargout = gui_speech_analysis_OutputFcn(hObject, eventdata, handles)
% varargout  cell array for returning output args (see VARARGOUT);
% hObject    handle to figure
% eventdata  reserved - to be defined in a future version of MATLAB
% handles    structure with handles and user data (see GUIDATA)

% Get default command line output from handles structure
varargout{1} = handles.output;

% --- Executes on button press in pushbutton_plotSignal.
function pushbutton_plotSignal_Callback(hObject, eventdata, handles)
% hObject    handle to pushbutton_plotSignal (see GCBO)
% eventdata  reserved - to be defined in a future version of MATLAB
% handles    structure with handles and user data (see GUIDATA)
global x fs ms20;
axes(handles.axes1);
cla;
time = get(handles.time_slider,'Value');
if time == 0
    % the lines of the message box
    msgboxText{1} = 'You have tried to plot a signal without recording one';
    msgboxText{2} = 'Try recording a signal using the record button';
    % this command actually creates the message box
    msgbox(msgboxText,'Signal recording not done', 'warn');
else
    fs = 8000;
    ms20 = fs/50;              % samples in 20 ms (one period of a 50 Hz pitch)
    t = (0:length(x)-1)/fs;
    plot(t,x);
    title('Waveform');
    xlabel('Time (s)');
    ylabel('Amplitude');
end
guidata(hObject, handles);

% --- Executes on button press in pushbutton_record.
function pushbutton_record_Callback(hObject, eventdata, handles)
% hObject    handle to pushbutton_record (see GCBO)
% eventdata  reserved - to be defined in a future version of MATLAB
% handles    structure with handles and user data (see GUIDATA)
global x;
time = get(handles.time_slider,'Value');
x = wavrecord(time*8000,8000);
set(handles.gender,'String','See the estimated gender here');
set(handles.Fx1,'String', 'Fundamental Frequency');
guidata(hObject, handles);

% --- Executes on button press in estimateGender.
function estimateGender_Callback(hObject, eventdata, handles)
% hObject    handle to estimateGender (see GCBO)
% eventdata  reserved - to be defined in a future version of MATLAB
% handles    structure with handles and user data (see GUIDATA)
global x fs ms20 ms2 r;
axes(handles.axes2);
cla;
time = get(handles.time_slider,'Value');
if time == 0
    % the lines of the message box
    msgboxText{1} = 'You have tried to estimate gender without recording a signal';
    msgboxText{2} = 'Try recording a signal using the record button';
    % this command actually creates the message box
    msgbox(msgboxText,'Signal recording not done', 'warn');
else
    % calculate autocorrelation
    r = xcorr(x, ms20, 'coeff');
    % plot autocorrelation
    d = (-ms20:ms20)/fs;
    plot(d, r);
    title('Autocorrelation');
    xlabel('Delay (s)');
    ylabel('Correlation coeff.');
    ms2 = fs/500;     % minimum lag: maximum speech Fx at 500 Hz
    ms20 = fs/50;     % maximum lag: minimum speech Fx at 50 Hz
    % just look at the region corresponding to positive delays
    r = r(ms20+1 : 2*ms20+1);
    [rmax, tx] = max(r(ms2:ms20));
    Fx = fs/(ms2+tx-1);

    if Fx <= 175 && Fx >= 80
        set(handles.gender,'String', 'Male');
        set(handles.Fx1,'String', num2str(round(Fx)));
        guidata(hObject, handles);
    elseif Fx > 175 && Fx <= 255
        set(handles.gender,'String', 'Female');
        set(handles.Fx1,'String', num2str(round(Fx)));
        guidata(hObject, handles);
    else
        set(handles.gender,'String', 'Could not recognize. Try speaking slowly.');
        set(handles.Fx1,'String', num2str(Fx));
        guidata(hObject, handles);
    end
end

14
% --- Executes on button press in Play.
function Play_Callback(hObject, eventdata, handles)
% hObject    handle to Play (see GCBO)
% eventdata  reserved - to be defined in a future version of MATLAB
% handles    structure with handles and user data (see GUIDATA)
global x;
time = get(handles.time_slider,'Value');
if time == 0
    % the lines of the message box
    msgboxText{1} = 'You have tried to play without recording a signal';
    msgboxText{2} = 'Try recording a signal using the record button';
    % this command actually creates the message box
    msgbox(msgboxText,'Signal recording not done', 'warn');
else
    wavplay(x,8000);
    guidata(hObject, handles);
end

% --- Executes on slider movement.
function time_slider_Callback(hObject, eventdata, handles)
% hObject    handle to time_slider (see GCBO)
% eventdata  reserved - to be defined in a future version of MATLAB
% handles    structure with handles and user data (see GUIDATA)

% Hints: get(hObject,'Value') returns position of slider
%        get(hObject,'Min') and get(hObject,'Max') to determine range of slider
time = get(handles.time_slider,'Value');
set(handles.time_text, 'String', num2str(time));
guidata(hObject, handles);

% --- Executes during object creation, after setting all properties.
function time_slider_CreateFcn(hObject, eventdata, handles)
% hObject    handle to time_slider (see GCBO)
% eventdata  reserved - to be defined in a future version of MATLAB
% handles    empty - handles not created until after all CreateFcns called

% Hint: slider controls usually have a light gray background.
if isequal(get(hObject,'BackgroundColor'), get(0,'defaultUicontrolBackgroundColor'))
    set(hObject,'BackgroundColor',[.9 .9 .9]);
end

Chapter 4

Simulation and Results

In this project we have taken into account many practical examples of both female and male voices. While recording, it has to be made sure that background noise is minimal, so that it does not affect the original signal from which we are trying to differentiate between male and female. The fundamental frequency is determined with the help of MATLAB. Plots are produced in both the time domain and the frequency domain.

Figure 4.1: Time domain plots for male and female

Figure 4.2: Frequency domain plots for male and female voice

4.1 Results Obtained

Figure 4.3: Result of female voice

Figure 4.4: Female Voice

Figure 4.5: Result of male voice

Figure 4.6: Male voice 2

Figure 4.7: Could not recognise

Chapter 5

Appendices

Some important graphical user interface code

Figure 5.1: Graphical user interface

Figure 5.2: Graphical user interface

Figure 5.3: Graphical user interface

Figure 5.4: Graphical user interface

Figure 5.5: Graphical user interface

Chapter 6

Conclusion

From the above results it can easily be deduced that there is a significant difference between male and female voices in terms of frequency, or pitch. Hence the fundamental frequency can act as a parameter for gender discrimination, by setting threshold frequency values for male and female voices.
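The thresholds actually used in the MATLAB GUI code of Chapter 3 are 80–175 Hz for male and 175–255 Hz for female. As an illustrative restatement of that decision rule (in Python, not part of the project code), it can be sketched as:

```python
def classify_gender(fx):
    """Classify a fundamental-frequency estimate fx (in Hz) using the
    thresholds from the estimateGender callback in Chapter 3."""
    if 80 <= fx <= 175:
        return "Male"
    elif 175 < fx <= 255:
        return "Female"
    return "Could not recognize"

print(classify_gender(120))   # a pitch in the male range
print(classify_gender(210))   # a pitch in the female range
print(classify_gender(400))   # outside both ranges
```

The single threshold at 175 Hz is where the two ranges meet; in practice voices near that boundary are the hardest cases for this method.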

