ISSN 1818-4952
IDOSI Publications, 2016
DOI: 10.5829/idosi.wasj.2016.34.1.15637
Abstract: Achieving natural human-machine interaction is a formidable mission: the machine should be able to identify and respond to human non-verbal communication, such as emotions, through signal processing. This work compares the emotions of children with Autism Spectrum Disorder against speech in other languages such as Tamil, Telugu and German. It extends earlier work on recognising and classifying emotion from the Tamil speech of children with Autism Spectrum Disorder by adding a Telugu database. Discrete Wavelet Transform and Mel Frequency Cepstral Coefficients are used as the feature extraction methods. A Support Vector Machine, a standard classifier in the speech emotion recognition discipline, is used to classify several databases: the Berlin database, a speaker-dependent Tamil emotion database of children with Autism Spectrum Disorder, and Tamil and Telugu databases whose data are drawn from movies. The experiments show that the results for Berlin-DB, Telugu-DB, Tamil-DB and ASD-DB are close to one another.
Key words: Autism Spectrum Disorder · Emotion recognition and classification · Mel Frequency Cepstral Coefficient · Support Vector Machine · Discrete Wavelet Transform
INTRODUCTION
Corresponding Author: C. Sunitha Ram, Department of CSE, SCSVMV University, Kanchipuram, India.
Proposed Work
Overview of Proposed Work
Speech Emotion Databases: Primary emotion speech databases were created for this experiment. Speech samples collected from real-life situations are more realistic and natural, but they are difficult to collect because of natural parameters such as noise at recording time. The database plays a vital role in automatic speech emotion recognition: as the naturalness of the database increases, so does its complexity, depending on these parameters. Selecting the correct parameters is therefore an important part of reducing the complexity of the system.
Feature Extraction
Feature Extraction from Wavelet Transform: The speech signal is sent through sequential high-pass and low-pass filters in order to extract features from the wavelet coefficients. The Daubechies wavelet family provides good results for signal analysis [21]. The feature vectors obtained from a six-level wavelet decomposition provide a compact representation of the signal. The coefficients are calculated band by band, from low to high frequency, and the sub-band coefficients cD1-cD6 together summarise the original signal. Feature vectors are obtained by applying common statistics, the standard deviation (and mean) and the pitch frequency f0, to these coefficients [22].
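The six-level decomposition and per-band statistics described above can be sketched as follows. This is a minimal numpy-only illustration; the Daubechies-2 (4-tap) filter pair and the choice of (mean, standard deviation) per detail band are assumptions, not necessarily the paper's exact configuration:

```python
import numpy as np

# Daubechies-2 (4-tap) analysis low-pass filter; the high-pass filter is
# its quadrature mirror: g[n] = (-1)^n * h[L-1-n].
_S3 = np.sqrt(3.0)
LOW = np.array([1 + _S3, 3 + _S3, 3 - _S3, 1 - _S3]) / (4 * np.sqrt(2))
HIGH = LOW[::-1] * np.array([1.0, -1.0, 1.0, -1.0])

def dwt_level(x):
    """One analysis level: convolve with each filter, then downsample by 2."""
    return np.convolve(x, LOW)[::2], np.convolve(x, HIGH)[::2]

def dwt_features(signal, levels=6):
    """Mean and standard deviation of each detail band cD1..cD6."""
    feats = []
    approx = np.asarray(signal, dtype=float)
    for _ in range(levels):
        approx, detail = dwt_level(approx)
        feats.extend([detail.mean(), detail.std()])
    return np.array(feats)
```

With six levels and two statistics per band, each utterance yields a 12-dimensional feature vector.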
The wavelet transform describes how the energy of the signal is distributed across the time and frequency domains. In this technique the signal is decomposed into multiple sequential frequency bands (sub-bands) by low-pass and high-pass filters.
The DWT can use different wavelet functions, and the information extracted by different wavelet families need not be the same. To extract the information relevant to a particular application, the appropriate wavelet function must therefore be chosen.
The outputs of the high-pass and low-pass filters in the discrete wavelet decomposition of the signal are given by Equations 1 and 2:

Y_high[k] = Σ_n X[n] · g[2k − n]    (1)

Y_low[k] = Σ_n X[n] · h[2k − n]    (2)
where Yhigh and Ylow are the outputs of the high band
pass and low band pass filters respectively [22].
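As a concrete check of Equations 1 and 2, filtering plus dyadic downsampling can be written directly as a full convolution whose output is kept only at even indices. A minimal sketch with the (assumed) orthonormal Haar filter pair, for which the sub-band energies sum to the input energy:

```python
import numpy as np

def analysis_step(x, h, g):
    """One DWT analysis level, Eqs. (1)-(2): Y[k] = sum_n X[n] * filt[2k - n].
    np.convolve computes sum_n X[n] * filt[m - n]; keeping m = 2k performs
    the downsampling by two."""
    return np.convolve(x, h)[::2], np.convolve(x, g)[::2]

# Haar analysis pair (orthonormal): averaging and differencing filters.
h = np.array([1.0, 1.0]) / np.sqrt(2)   # low-pass
g = np.array([1.0, -1.0]) / np.sqrt(2)  # high-pass

x = np.array([1.0, 1.0, 2.0, 2.0])
y_low, y_high = analysis_step(x, h, g)
```

Because the Haar pair is orthonormal, the energy of `y_low` and `y_high` together equals the energy of `x`, which is a quick sanity check on the indexing.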
[Table: comparison of the ASD, Tamil and Telugu databases by (1) Speaker, (2) Tools, (3) Recording Environment, (4) Sampling Frequency fs and (5) Feature extraction]
[Figure: block diagram of the proposed system — emotional speech input, preprocessing (framing, Hamming windowing, low-pass filtering), wavelet decomposition and MFCC extraction (FFT, triangular band-pass filters, logarithm), followed by SVM classification]
Step 1: Preemphasis
In this step the signal is passed through a filter that emphasises the higher frequencies, increasing the energy of the signal at high frequency [24]. With X(n) the input signal and a the preemphasis coefficient (typically about 0.95), the filtered signal is

Y(n) = X(n) − a · X(n − 1)    (3.1)
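A minimal sketch of the preemphasis filter, assuming the common first-order form with coefficient a = 0.95 (the exact value used in the paper is not stated):

```python
import numpy as np

def preemphasis(x, a=0.95):
    """Y(n) = X(n) - a * X(n-1); boosts high-frequency energy.
    The first sample is passed through unchanged."""
    x = np.asarray(x, dtype=float)
    return np.append(x[0], x[1:] - a * x[:-1])
```

On a constant (DC) input the filter output drops to 1 − a after the first sample, showing that low frequencies are attenuated while rapid changes pass through.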
Step 2: Framing
The speech samples obtained from analog-to-digital conversion (ADC) are segmented into short frames with lengths in the range of 20 to 40 ms. The voice signal is split into frames of N samples, with adjacent frames separated by M samples (M < N).
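The framing step can be sketched as follows; the 25 ms frame length and 10 ms hop are assumed values within the 20-40 ms range given above:

```python
import numpy as np

def frame_signal(x, fs, frame_ms=25, hop_ms=10):
    """Segment x into overlapping frames of N samples; adjacent frames
    are separated by M samples (M < N), as described in Step 2."""
    N = int(fs * frame_ms / 1000)   # samples per frame
    M = int(fs * hop_ms / 1000)     # hop between adjacent frame starts
    n_frames = 1 + max(0, (len(x) - N) // M)
    return np.stack([x[i * M : i * M + N] for i in range(n_frames)])
```

At a 16 kHz sampling rate this gives 400-sample frames with a 160-sample hop, so a one-second signal yields 98 frames.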
Step 3: Hamming Windowing
If the signal in a frame is denoted by X(n), n = 0, ..., N − 1, then the signal after Hamming windowing is Y(n) = X(n) · W(n), where W(n) is the generalized Hamming window

W(n, α) = (1 − α) − α · cos(2πn / (N − 1)), 0 ≤ n ≤ N − 1,    (3.4)

with α typically set to 0.46.
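Equation (3.4) can be checked directly against numpy's built-in window: with α = 0.46 the generalized form reduces to the standard Hamming window 0.54 − 0.46 · cos(2πn/(N − 1)):

```python
import numpy as np

def hamming_window(N, alpha=0.46):
    """Generalized Hamming window of Eq. (3.4):
    W(n, alpha) = (1 - alpha) - alpha * cos(2*pi*n / (N - 1))."""
    n = np.arange(N)
    return (1 - alpha) - alpha * np.cos(2 * np.pi * n / (N - 1))
```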
Step 4: Mel Filtering
The FFT spectrum is passed through a bank of triangular band-pass filters spaced on the mel scale, which maps a frequency f in Hz to

mel(f) = 2595 · log10(1 + f / 700)    (3.5)
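Equation (3.5) and its inverse, which is used when converting filter centres on the mel scale back to Hz, can be sketched as:

```python
import numpy as np

def hz_to_mel(f):
    """Eq. (3.5): mel(f) = 2595 * log10(1 + f / 700)."""
    return 2595.0 * np.log10(1.0 + np.asarray(f, dtype=float) / 700.0)

def mel_to_hz(m):
    """Inverse of Eq. (3.5), for placing triangular filter centres in Hz."""
    return 700.0 * (10.0 ** (np.asarray(m, dtype=float) / 2595.0) - 1.0)
```

The mapping is approximately linear below 1 kHz and logarithmic above it, which is why filters bunch together at low frequencies.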
Number of utterances per emotion:

Emotion      Utterances
Anger        112
Happiness    69
Neutral      72
Sadness      59
Fear         55
Total        367
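The classification stage pairs the extracted features with a Support Vector Machine. A minimal scikit-learn sketch on synthetic feature vectors; the 12-dimensional features, 50-utterances-per-class layout and RBF kernel here are illustrative assumptions, not the paper's exact setup:

```python
import numpy as np
from sklearn.model_selection import train_test_split
from sklearn.svm import SVC

rng = np.random.default_rng(0)
emotions = ["anger", "happiness", "neutral", "sadness", "fear"]

# Synthetic stand-ins for per-utterance MFCC/DWT feature vectors:
# 50 utterances per emotion, 12 features each, with a class-dependent shift
# so the classes are separable.
y = np.repeat(np.arange(len(emotions)), 50)
X = rng.normal(size=(len(y), 12)) + 0.8 * y[:, None]

X_tr, X_te, y_tr, y_te = train_test_split(X, y, stratify=y, random_state=0)
clf = SVC(kernel="rbf", C=1.0, gamma="scale").fit(X_tr, y_tr)
accuracy = clf.score(X_te, y_te)
```

Real experiments would replace the synthetic matrix with the DWT statistics and MFCC vectors computed in the steps above, one row per utterance.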
Recognition results by emotion (columns: Anger, Happiness, Neutral, Sadness, Fear):

EMO-DB
  Pitch (F0):  75     76     75.4   73.66  78.91
  SD (Mean):   70     68.55  70.34  70.55  74
  Max:         73.8   74.8   74.44  72.34  77.5
  Min:         66.2   62.3   66.24  68.76  70.5
  Fusion:      71.25  70.41  71.61  71.33  75.23

ASD-DB
  Pitch (F0):  74.71  68.95  69.96  68.22  75.5
  SD (Mean):   73.19  72.13  71.05  70.16  74.86
  Max:         75.31  78.03  76.53  74.78  79.26
  Min:         71.06  66.22  65.56  65.53  70.46
  Fusion:      73.57  71.33  70.78  69.67  75.02

TAMIL-DB
  Pitch (F0):  69.91  73.5   78.69  72.22  74.5
  SD (Mean):   74.69  76.13  72.55  70.16  74.21
  Max:         78.31  78.03  73.53  70.78  77.16
  Min:         71.06  74.22  71.56  69.53  71.26
  Fusion:      73.49  75.47  74.08  70.67  74.28

TELUGU-DB
  Pitch (F0):  75.6   74.5   71.41  75.66  78.69
  SD (Mean):   71.96  71.05  67.77  62.05  75
  Max:         74.8   76.8   69.34  62.34  78.5
  Min:         69.12  65.3   66.2   61.76  71.5
  Fusion:      72.87  71.91  68.68  65.45  75.92
Fig. 4.2: MFCC and DWT for ASD Tamil Emotional database
CONCLUSIONS
REFERENCES