
A Project Report

on
Speech Recognition for Gender Discrimination

SUBMITTED IN PARTIAL FULFILLMENT FOR THE AWARD

OF DEGREE OF BACHELOR OF TECHNOLOGY

IN
ELECTRONICS AND COMMUNICATION ENGINEERING

Student Name (Enrolment Number):

HARSH SRIVASTAVA (9911102222)

TARANDEEP SINGH (9911102350)

YOGESH SINDHWANI (9911102363)

Under the Guidance of


Mr. B. Suresh
Department of Electronics and Communication Engineering
Jaypee Institute of Information Technology, Noida,
June 2015
CERTIFICATE
This is to certify that the work titled “Speech Recognition – Gender Discrimination”, submitted by Yogesh Sindhwani, Tarandeep Singh and Harsh Srivastava in partial fulfillment for the award of the degree of Bachelor of Technology (B.Tech) of Jaypee Institute of Information Technology-128, Noida, has been carried out under my supervision. The project has not been submitted to any other University or Institute for the award of this or any other degree or diploma.

Signature of Supervisor:
Name of Supervisor : Mr. B Suresh
Date:
ACKNOWLEDGEMENT

The success of our project depended largely on the encouragement and help of many others. We take this opportunity to express our thankfulness to the people who contributed to the successful completion of this project and were the real guiding force behind it.

We would like to express our gratitude to our project mentor, Mr. B. Suresh, for continuously guiding and tremendously helping us throughout this project. We felt motivated and encouraged every time we attended his meetings. Without his encouragement and guidance, the completion of this project on speech recognition for gender discrimination would not have been achieved.

DECLARATION

Project work is a part of our curriculum that gives us knowledge about the topic and subject we have studied. It also helps us understand and relate the theoretical concepts better, some of which were not covered in the classroom. We have prepared this report as a part of our ‘MAJOR PROJECT FOR SEMESTER VIII’. The topic we have selected for the project is ‘SPEECH RECOGNITION - GENDER DISCRIMINATION’.

Student Name:

HARSH SRIVASTAVA

TARANDEEP SINGH

YOGESH SINDHWANI

Contents

CERTIFICATE . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . i
ACKNOWLEDGEMENT . . . . . . . . . . . . . . . . . . . . . . . . . . ii
DECLARATION . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . iii

1 INTRODUCTION 1
1.1 VISION OF SPEECH RECOGNITION . . . . . . . . . . . . . . . 1
1.2 HISTORY AND BACKGROUND . . . . . . . . . . . . . . . . . . . 1

2 CHARACTERISTICS 4
2.1 CHARACTERISTICS OF SPEECH RECOGNITION . . . . . . . . 4
2.2 CLASSIFICATION OF SPEECH . . . . . . . . . . . . . . . . . . . 6
2.3 APPLICATIONS OF SPEECH . . . . . . . . . . . . . . . . . . . . 8

3 SOURCE CODE 9
3.1 ALGORITHM . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 9
3.2 OUR TECHNIQUE . . . . . . . . . . . . . . . . . . . . . . . . . . . 10

4 Simulation and Results 17
4.1 Results Obtained . . . . . . . . . . . . . . . . . . . . . . . . . . . . 18

5 Appendices 21

6 Conclusion 24

Abstract

This report presents a comparative investigation of speech signals to devise a gender classifier. Gender classification by speech analysis aims to predict the gender of the speaker by analyzing different parameters of a voice sample. This investigation concentrates mainly on pitch analysis of the speech signals: the pitch values of male and female voice samples are compared. The quantitative comparison is implemented in MATLAB by determining pitch through the auto-correlation method. A database of voice samples collected from many students of our college, both male and female, was created. Pitch analysis was performed on all the collected voice samples, and the values were compared to establish a working principle for a gender classifier based on speech.
Chapter 1

INTRODUCTION

1.1 VISION OF SPEECH RECOGNITION

Development of speech recognition systems started in the early 1950s with work on recognizing voiced sounds and on distinguishing one speaker's voice from another's. Early limitations motivated research into more reliable methods of finding the relation between two sets of speech sounds. Research on identifying speech has continued to this day under the field of speech processing, where many achievements have taken place.
The task of speech recognition is to convert speech sounds into a sequence of words using a computer program such as MATLAB. Since speech is the most natural form of communication for humans, the dream of speech recognition is to enable humans to communicate more effectively.

1.2 HISTORY AND BACKGROUND

The first attempts to build systems for Automatic Speech Recognition (ASR) by machines were made in the 1940s. Early research on speech and its recognition was funded by the National Science Foundation (NSF) and the Defense Advanced Research Projects Agency (DARPA). Initial
Figure 1.1: Speech Recognition Process

research, performed by the NSA and with National Science Foundation funding, was conducted in the 1970s. Speech recognition technology was initially designed for people in the differently abled community: voice recognition helped people with disabilities caused by conditions such as multiple sclerosis and arthritis achieve maximum productivity on computers. In the early 1970s, market opportunities emerged for speech recognition, but the early versions of these products were hard to use. The early speech recognition systems had many problems: they were speaker dependent, had small vocabularies, or required a very stylized code. However, in the information technology industry nothing stays the same for very long, and by the end of the 1980s there was a whole new generation of commercial speech recognition software systems that were easier to use. Today, speech recognition technology is used by millions of individuals to automatically create documents from dictation; reports are formatted, corrected for mistakes in punctuation and grammar, and verified for consistency and possible errors. Speech recognition will become more widespread as the technology improves. Some speech systems

Figure 1.2: Speech Production model

produce text in real time, and speech recognition is used in applications such as closed captioning and Internet streaming-text services. The goal of building a system that can recognize fluent spoken speech has driven speech research for more than 60 years. Although ASR technology is not yet at the point where machines understand all speech, in any acoustic environment, or from any person, it is already used in a number of applications and services. The goal of ASR research is to allow a machine to recognize, in real time and with near-100% accuracy, the words spoken by any person, independent of vocabulary size or speaker characteristics. Nowadays, systems can be trained to achieve an accuracy of more than 90%, under the assumptions that the user's speech characteristics match the training data, that proper speaker adaptation has been performed, and that the system operates in a low-noise environment such as a quiet room.

Chapter 2

CHARACTERISTICS

2.1 CHARACTERISTICS OF SPEECH RECOGNITION

Speech signals comprise many types of sounds. They can be classified into categories according to their frequency and amplitude characteristics.

Figure 2.1: Characteristics of Speech Signal

1. Voiced sounds are produced by the oral cavity with tension of the glottis. They have lower frequency and a lower mean zero-crossing rate, approximately 14 per 10 ms. The vocal folds vibrate in a relaxed mode.

Figure 2.2: Unvoiced Speech Signal

2. Unvoiced sounds are generated by a constriction of the vocal tract, which produces turbulence. They usually have higher frequency and a higher mean zero-crossing rate, approximately 49 per 10 ms. They also have a high energy density.

3. Plosive sounds are of two types: voiced (b, d, g) and unvoiced (p, k, t). An unvoiced plosive results in the release of a burst of air.

Figure 2.3: Plosive Speech Signal
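To make the zero-crossing figures above concrete, here is a small illustrative sketch (written in Python purely for illustration; the project code itself is MATLAB, and the 150 Hz and 2500 Hz test tones are our own arbitrary choices). A low-frequency "voiced-like" tone crosses zero far less often per 10 ms than a high-frequency "unvoiced-like" one:

```python
import math

def zero_crossings_per_10ms(x, fs):
    """Mean zero-crossing count per 10 ms of signal x sampled at fs Hz."""
    crossings = sum(1 for a, b in zip(x, x[1:]) if a * b < 0)
    n_frames = (len(x) / fs) * 100          # number of 10 ms frames in x
    return crossings / n_frames

fs = 8000
t = [i / fs for i in range(fs)]                                # 1 second of samples
voiced_like = [math.sin(2 * math.pi * 150 * s) for s in t]     # low-frequency tone
unvoiced_like = [math.sin(2 * math.pi * 2500 * s) for s in t]  # high-frequency tone

print(zero_crossings_per_10ms(voiced_like, fs))    # low rate (about 3 per 10 ms)
print(zero_crossings_per_10ms(unvoiced_like, fs))  # much higher rate
```

The exact counts depend on the chosen tone frequencies; what matters is the order-of-magnitude gap between the two classes.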

2.2 CLASSIFICATION OF SPEECH

Isolated word versus continuous speech: Isolated-word systems identify only single words at one instant, while continuous-speech systems recognize sequences of words at a time. Isolated words are easier to process. A continuous-speech system combines patterns of smaller speech units, such as words, into a larger speech pattern, i.e., a sentence.

Figure 2.4: Continuous word

Figure 2.5: Isolated word

Speaker dependent versus speaker independent systems: A speaker-dependent system depends on the voice of a particular speaker. The software must be trained for that particular speaker so that it learns the characteristics of the speaker's voice. In the training phase the software is trained for a particular individual on sounds such as ae, ei, ee, ie, etc. Examples include security management in banks and access control for company devices such as laptops. A speaker-independent system does not depend on a particular speaker, and the software needs no training on individual voice characteristics. Examples of speaker-independent systems are attendance systems, car navigation, and music systems.

Figure 2.6: Continuous word

2.3 APPLICATIONS OF SPEECH

Usages of speech recognition: although speech recognition has many limitations, the technology can be very useful in a lot of applications, keeping in mind the weaknesses and strengths of the systems.

1. Speech recognition is used for speech-to-text conversion.

2. Speech recognition is used for text-to-speech conversion.

3. Speech recognition is used for live subtitling on television.

4. It is used for gender discrimination.

Chapter 3

SOURCE CODE

3.1 ALGORITHM

The technique we have used in this project is based on pitch analysis of the signal. In this method we differentiate between the genders, male and female, on the basis of pitch, where pitch is defined as the fundamental frequency of the source.
We therefore use a pitch estimate that is quite accurate and a pitch extractor that is quite efficient; these calculations help in differentiating between male and female voices, and the results are quite precise as well. A Pitch Detection Algorithm (PDA) is a process that determines the pitch, i.e., the fundamental frequency, of a speech sound or signal.
A pitch (fundamental frequency) range has been defined for female voices on the basis of results from earlier experiments, and similarly the range for male voices has been predefined. The results show that the frequency range for female voices is higher than the male frequency range.

3.2 OUR TECHNIQUE

This technique is based on the auto-correlation method. Correlation between two waveforms is a measure of their similarity; the correlation result detects similarity as a function of the time difference between the starting points of the two waveforms. The autocorrelation function is the correlation of a waveform with itself: one expects exact similarity at a time lag of zero, with increasing dissimilarity as the time lag increases. The pitch extraction process for a speech signal is based on examining the short-time autocorrelation function of the signal.
The short-time autocorrelation of a speech signal is given by:

Rn(k) = Σm [x(m) w(n − m)] [x(m + k) w(n − m − k)]   (3.1)

where
Rn(k) = short-time autocorrelation at frame position n
x = speech signal
w = Hamming window
k = lag (in samples) at which the autocorrelation is calculated
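As a sketch of equation (3.1) outside MATLAB, the following illustrative Python code computes the windowed short-time autocorrelation directly and picks the lag with the largest value in the 50–500 Hz pitch range, mirroring the search performed in the GUI code below. The 200 Hz test tone and all function names here are our own illustrative choices, not part of the project code:

```python
import math

def hamming(i, n):
    """Hamming window of length n, taken as zero outside [0, n)."""
    if 0 <= i < n:
        return 0.54 - 0.46 * math.cos(2 * math.pi * i / (n - 1))
    return 0.0

def short_time_autocorr(x, n, k):
    """Rn(k) = sum_m [x(m) w(n-m)] [x(m+k) w(n-m-k)], with w a Hamming window."""
    win = len(x)
    return sum((x[m] * hamming(n - m, win)) * (x[m + k] * hamming(n - m - k, win))
               for m in range(len(x) - k))

def estimate_pitch(x, fs, fmin=50, fmax=500):
    """Pick the lag with the largest autocorrelation in the plausible pitch range."""
    n = len(x) - 1                          # align the window with the frame
    kmin, kmax = fs // fmax, fs // fmin
    best_k = max(range(kmin, kmax + 1),
                 key=lambda k: short_time_autocorr(x, n, k))
    return fs / best_k

# A synthetic 200 Hz tone: the estimate should come out near 200 Hz.
fs = 8000
x = [math.sin(2 * math.pi * 200 * i / fs) for i in range(400)]
print(estimate_pitch(x, fs))
```

The project's own implementation uses MATLAB's `xcorr` instead of this explicit double-windowed sum, but the lag-search idea is the same.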

function varargout = gui_speech_analysis(varargin)

gui_Singleton = 1;
gui_State = struct('gui_Name', mfilename, ...
    'gui_Singleton', gui_Singleton, ...
    'gui_OpeningFcn', @gui_speech_analysis_OpeningFcn, ...
    'gui_OutputFcn', @gui_speech_analysis_OutputFcn, ...
    'gui_LayoutFcn', [], ...
    'gui_Callback', []);
if nargin && ischar(varargin{1})
    gui_State.gui_Callback = str2func(varargin{1});
end

if nargout
    [varargout{1:nargout}] = gui_mainfcn(gui_State, varargin{:});
else
    gui_mainfcn(gui_State, varargin{:});
end
% End initialization code - DO NOT EDIT

% --- Executes just before gui_speech_analysis is made visible.
function gui_speech_analysis_OpeningFcn(hObject, eventdata, handles, varargin)
% This function has no output args, see OutputFcn.
% hObject    handle to figure
% eventdata  reserved - to be defined in a future version of MATLAB
% handles    structure with handles and user data (see GUIDATA)
% varargin   command line arguments to gui_speech_analysis (see VARARGIN)

% Choose default command line output for gui_speech_analysis
handles.output = hObject;

% Update handles structure
global x fs ms20 ms2 r;
x = ones(9600,1);
set(hObject, 'toolbar', 'figure');
guidata(hObject, handles);

% UIWAIT makes gui_speech_analysis wait for user response (see UIRESUME)
% uiwait(handles.figure1);

% --- Outputs from this function are returned to the command line.
function varargout = gui_speech_analysis_OutputFcn(hObject, eventdata, handles)
% varargout  cell array for returning output args (see VARARGOUT);
% hObject    handle to figure
% eventdata  reserved - to be defined in a future version of MATLAB
% handles    structure with handles and user data (see GUIDATA)

% Get default command line output from handles structure
varargout{1} = handles.output;

% --- Executes on button press in pushbutton_plotSignal.
function pushbutton_plotSignal_Callback(hObject, eventdata, handles)
% hObject    handle to pushbutton_plotSignal (see GCBO)
% eventdata  reserved - to be defined in a future version of MATLAB
% handles    structure with handles and user data (see GUIDATA)
global x fs ms20;
axes(handles.axes1);
cla;
time = get(handles.time_slider,'Value');
if time == 0
    % the lines of the message box
    msgboxText{1} = 'You have tried to plot a signal without recording one';
    msgboxText{2} = 'Try recording a signal using the record button';
    % this command actually creates the message box
    msgbox(msgboxText,'Signal recording not done', 'warn');
else
    fs = 8000;
    ms20 = fs/50;              % samples in 20 ms (one period of a 50 Hz pitch)
    t = (0:length(x)-1)/fs;
    plot(t,x);
    title('Waveform');
    xlabel('Time (s)');
    ylabel('Amplitude');
end
guidata(hObject, handles);

% --- Executes on button press in pushbutton_record.
function pushbutton_record_Callback(hObject, eventdata, handles)
% hObject    handle to pushbutton_record (see GCBO)
% eventdata  reserved - to be defined in a future version of MATLAB
% handles    structure with handles and user data (see GUIDATA)
global x;
time = get(handles.time_slider,'Value');
x = wavrecord(time*8000,8000);
set(handles.gender,'String','See the estimated gender here');
set(handles.Fx1,'String', 'Fundamental Frequency');
guidata(hObject, handles);

% --- Executes on button press in estimateGender.
function estimateGender_Callback(hObject, eventdata, handles)
% hObject    handle to estimateGender (see GCBO)
% eventdata  reserved - to be defined in a future version of MATLAB
% handles    structure with handles and user data (see GUIDATA)
global x fs ms20 ms2 r;
axes(handles.axes2);
cla;
time = get(handles.time_slider,'Value');
if time == 0
    % the lines of the message box
    msgboxText{1} = 'You have tried to estimate gender without recording a signal';
    msgboxText{2} = 'Try recording a signal using the record button';
    % this command actually creates the message box
    msgbox(msgboxText,'Signal recording not done', 'warn');
else
    % calculate autocorrelation
    r = xcorr(x, ms20, 'coeff');
    % plot autocorrelation
    d = (-ms20:ms20)/fs;
    plot(d, r);
    title('Autocorrelation');
    xlabel('Delay (s)');
    ylabel('Correlation coeff.');
    ms2 = fs/500;     % minimum lag: maximum speech Fx at 500 Hz
    ms20 = fs/50;     % maximum lag: minimum speech Fx at 50 Hz
    % just look at the region corresponding to positive delays
    r = r(ms20+1 : 2*ms20+1);
    [rmax, tx] = max(r(ms2:ms20));
    Fx = fs/(ms2+tx-1);

    if Fx <= 175 && Fx >= 80
        set(handles.gender,'String', 'Male');
        set(handles.Fx1,'String', num2str(round(Fx)));
        guidata(hObject, handles);
    elseif Fx > 175 && Fx <= 255
        set(handles.gender,'String', 'Female');
        set(handles.Fx1,'String', num2str(round(Fx)));
        guidata(hObject, handles);
    else
        set(handles.gender,'String', 'Could not recognize. Try speaking slowly.');
        set(handles.Fx1,'String', num2str(Fx));
        guidata(hObject, handles);
    end
end

14
% --- Executes on button press in Play.
function Play_Callback(hObject, eventdata, handles)
% hObject    handle to Play (see GCBO)
% eventdata  reserved - to be defined in a future version of MATLAB
% handles    structure with handles and user data (see GUIDATA)
global x;
time = get(handles.time_slider,'Value');
if time == 0
    % the lines of the message box
    msgboxText{1} = 'You have tried to play without recording a signal';
    msgboxText{2} = 'Try recording a signal using the record button';
    % this command actually creates the message box
    msgbox(msgboxText,'Signal recording not done', 'warn');
else
    wavplay(x,8000);
    guidata(hObject, handles);
end

% --- Executes on slider movement.
function time_slider_Callback(hObject, eventdata, handles)
% hObject    handle to time_slider (see GCBO)
% eventdata  reserved - to be defined in a future version of MATLAB
% handles    structure with handles and user data (see GUIDATA)

% Hints: get(hObject,'Value') returns position of slider
%        get(hObject,'Min') and get(hObject,'Max') to determine range of slider
time = get(handles.time_slider,'Value');
set(handles.time_text, 'String', num2str(time));
guidata(hObject, handles);

% --- Executes during object creation, after setting all properties.
function time_slider_CreateFcn(hObject, eventdata, handles)
% hObject    handle to time_slider (see GCBO)
% eventdata  reserved - to be defined in a future version of MATLAB
% handles    empty - handles not created until after all CreateFcns called

% Hint: slider controls usually have a light gray background.
if isequal(get(hObject,'BackgroundColor'), get(0,'defaultUicontrolBackgroundColor'))
    set(hObject,'BackgroundColor',[.9 .9 .9]);
end

Chapter 4

Simulation and Results

In this project we have taken into account many practical examples of both female and male voices. While recording, it has to be made sure that background noise is minimal, so that it does not affect the original signal from which we are trying to differentiate between male and female. The fundamental frequency is determined with the help of MATLAB. Plots are produced in both the time domain and the frequency domain.

Figure 4.1: Time domain plots for male and female

Figure 4.2: Frequency domain plots for male and female voice

4.1 Results Obtained

Figure 4.3: Result of female voice

Figure 4.4: Female Voice

Figure 4.5: Result of male voice

Figure 4.6: Male voice 2

Figure 4.7: Could not recognise

Chapter 5

Appendices

Some important graphical user interface code

Figure 5.1: Graphical user interface

Figure 5.2: Graphical user interface

Figure 5.3: Graphical user interface

Figure 5.4: Graphical user interface

Figure 5.5: Graphical user interface

Chapter 6

Conclusion

From the above results it can easily be deduced that there is a significant difference between male and female voices in terms of frequency, or pitch. Hence the fundamental frequency can act as a parameter for gender discrimination, by setting threshold frequency values for male and female voices.
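The thresholds actually used in the MATLAB GUI code of Chapter 3 are 80–175 Hz for male and 175–255 Hz for female. As an illustrative restatement of that decision rule (in Python, not part of the project code), it can be sketched as:

```python
def classify_gender(fx):
    """Classify a fundamental-frequency estimate fx (in Hz) using the
    thresholds from the estimateGender callback in Chapter 3."""
    if 80 <= fx <= 175:
        return "Male"
    elif 175 < fx <= 255:
        return "Female"
    return "Could not recognize"

print(classify_gender(120))   # a pitch in the male range
print(classify_gender(210))   # a pitch in the female range
print(classify_gender(400))   # outside both ranges
```

The single threshold at 175 Hz is where the two ranges meet; in practice voices near that boundary are the hardest cases for this method.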

