ISSN 1818-4952
IDOSI Publications, 2016
DOI: 10.5829/idosi.wasj.2016.34.1.15637
Abstract: Achieving natural human-machine interaction is a formidable mission: the machine should be able to identify and respond to human non-verbal communication, such as emotions, through signal processing. This work compares the emotions of children with Autism Spectrum Disorder against speech in other languages such as Tamil, Telugu and German. It extends earlier work on recognising and classifying emotion from the Tamil speech of children with Autism Spectrum Disorder by adding a Telugu database. Discrete Wavelet Transform and Mel Frequency Cepstral Coefficients are used as the feature extraction methods. A Support Vector Machine, a standard classifier in the speech emotion recognition discipline, is used to classify several databases: the Berlin database, a speaker-dependent Tamil emotion database of children with Autism Spectrum Disorder, and Tamil and Telugu databases whose data are drawn from movies. The experiments show that the results for Berlin-DB, Telugu-DB, Tamil-DB and ASD-DB are close to one another.
Key words: Autism Spectrum Disorder · Emotion recognition and classification · Mel Frequency Cepstral Coefficient · Support Vector Machine · Discrete Wavelet Transform
INTRODUCTION
Corresponding Author: C. Sunitha Ram, Department of CSE, SCSVMV University, Kanchipuram, India.
Proposed Work
Overview of Proposed Work
Speech Emotion Databases: Primary emotion speech databases were created for this experiment. Speech samples collected from real-life situations are more realistic and natural, but they are difficult to collect because of natural parameters such as noise at recording time. The database plays a vital role in automatic speech emotion recognition: as the naturalness of the database increases, so does its complexity, depending on these parameters. Selecting the correct parameters is therefore an important part of reducing the complexity of the system.
Feature Extraction
Feature Extraction from Wavelet Transform: The speech signal is sent through sequential high-pass and low-pass filters in order to extract features from the wavelet coefficients. The Daubechies wavelet family provides good results for signal analysis [21]. The feature vectors obtained from a six-level wavelet decomposition provide a compact representation of the signal. The coefficients are calculated band by band, from low to high frequency, and the sub-band coefficients cD1-cD6 together summarise the original signal. Feature vectors are obtained by applying common statistics, the standard deviation (and mean) and the pitch frequency f0, to these coefficients [22].
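The six-level decomposition and per-band statistics described above can be sketched as follows. This is a minimal numpy-only illustration; the Daubechies-2 (4-tap) filter pair and the choice of (mean, standard deviation) per detail band are assumptions, not necessarily the paper's exact configuration:

```python
import numpy as np

# Daubechies-2 (4-tap) analysis low-pass filter; the high-pass filter is
# its quadrature mirror: g[n] = (-1)^n * h[L-1-n].
_S3 = np.sqrt(3.0)
LOW = np.array([1 + _S3, 3 + _S3, 3 - _S3, 1 - _S3]) / (4 * np.sqrt(2))
HIGH = LOW[::-1] * np.array([1.0, -1.0, 1.0, -1.0])

def dwt_level(x):
    """One analysis level: convolve with each filter, then downsample by 2."""
    return np.convolve(x, LOW)[::2], np.convolve(x, HIGH)[::2]

def dwt_features(signal, levels=6):
    """Mean and standard deviation of each detail band cD1..cD6."""
    feats = []
    approx = np.asarray(signal, dtype=float)
    for _ in range(levels):
        approx, detail = dwt_level(approx)
        feats.extend([detail.mean(), detail.std()])
    return np.array(feats)
```

With six levels and two statistics per band, each utterance yields a 12-dimensional feature vector.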
The wavelet transform describes how the energy of the signal is distributed across the time and frequency domains. In this technique the signal is decomposed into multiple sequential frequency bands (sub-bands) by low-pass and high-pass filters.
The DWT can use different wavelet functions, and the information extracted by different wavelet families need not be the same. To extract the information relevant to a particular application, the appropriate wavelet function must therefore be chosen.
The outputs of the high-pass and low-pass filters in the discrete wavelet decomposition of the signal are given by Equations 1 and 2:

Y_high[k] = Σ_n X[n] · g[2k − n]    (1)

Y_low[k] = Σ_n X[n] · h[2k − n]    (2)
where Yhigh and Ylow are the outputs of the high band
pass and low band pass filters respectively [22].
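As a concrete check of Equations 1 and 2, filtering plus dyadic downsampling can be written directly as a full convolution whose output is kept only at even indices. A minimal sketch with the (assumed) orthonormal Haar filter pair, for which the sub-band energies sum to the input energy:

```python
import numpy as np

def analysis_step(x, h, g):
    """One DWT analysis level, Eqs. (1)-(2): Y[k] = sum_n X[n] * filt[2k - n].
    np.convolve computes sum_n X[n] * filt[m - n]; keeping m = 2k performs
    the downsampling by two."""
    return np.convolve(x, h)[::2], np.convolve(x, g)[::2]

# Haar analysis pair (orthonormal): averaging and differencing filters.
h = np.array([1.0, 1.0]) / np.sqrt(2)   # low-pass
g = np.array([1.0, -1.0]) / np.sqrt(2)  # high-pass

x = np.array([1.0, 1.0, 2.0, 2.0])
y_low, y_high = analysis_step(x, h, g)
```

Because the Haar pair is orthonormal, the energy of `y_low` and `y_high` together equals the energy of `x`, which is a quick sanity check on the indexing.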
[Table: comparison of the ASD, Tamil and Telugu databases by (1) Speaker, (2) Tools, (3) Recording Environment, (4) Sampling Frequency fs and (5) Feature extraction]
[Figure: block diagram of the proposed system — emotional speech input, preprocessing (framing, Hamming windowing, low-pass filtering), wavelet decomposition and MFCC extraction (FFT, triangular band-pass filters, logarithm), followed by SVM classification]
Step 1: Preemphasis
In this step the signal is passed through a filter that emphasises the higher frequencies, increasing the energy of the signal at high frequency [24]. With X(n) the input signal and a the preemphasis coefficient (typically about 0.95), the filtered signal is

Y(n) = X(n) − a · X(n − 1)    (3.1)
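A minimal sketch of the preemphasis filter, assuming the common first-order form with coefficient a = 0.95 (the exact value used in the paper is not stated):

```python
import numpy as np

def preemphasis(x, a=0.95):
    """Y(n) = X(n) - a * X(n-1); boosts high-frequency energy.
    The first sample is passed through unchanged."""
    x = np.asarray(x, dtype=float)
    return np.append(x[0], x[1:] - a * x[:-1])
```

On a constant (DC) input the filter output drops to 1 − a after the first sample, showing that low frequencies are attenuated while rapid changes pass through.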
Step 2: Framing
The speech samples obtained from analog-to-digital conversion (ADC) are segmented into short frames with lengths in the range of 20 to 40 ms. The voice signal is split into frames of N samples, with adjacent frames separated by M samples (M < N).
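The framing step can be sketched as follows; the 25 ms frame length and 10 ms hop are assumed values within the 20-40 ms range given above:

```python
import numpy as np

def frame_signal(x, fs, frame_ms=25, hop_ms=10):
    """Segment x into overlapping frames of N samples; adjacent frames
    are separated by M samples (M < N), as described in Step 2."""
    N = int(fs * frame_ms / 1000)   # samples per frame
    M = int(fs * hop_ms / 1000)     # hop between adjacent frame starts
    n_frames = 1 + max(0, (len(x) - N) // M)
    return np.stack([x[i * M : i * M + N] for i in range(n_frames)])
```

At a 16 kHz sampling rate this gives 400-sample frames with a 160-sample hop, so a one-second signal yields 98 frames.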
Step 3: Hamming Windowing
If the signal in a frame is denoted by X(n), n = 0, ..., N − 1, then the signal after Hamming windowing is Y(n) = X(n) · W(n), where W(n) is the generalized Hamming window

W(n, α) = (1 − α) − α · cos(2πn / (N − 1)), 0 ≤ n ≤ N − 1,    (3.4)

with α typically set to 0.46.
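Equation (3.4) can be checked directly against numpy's built-in window: with α = 0.46 the generalized form reduces to the standard Hamming window 0.54 − 0.46 · cos(2πn/(N − 1)):

```python
import numpy as np

def hamming_window(N, alpha=0.46):
    """Generalized Hamming window of Eq. (3.4):
    W(n, alpha) = (1 - alpha) - alpha * cos(2*pi*n / (N - 1))."""
    n = np.arange(N)
    return (1 - alpha) - alpha * np.cos(2 * np.pi * n / (N - 1))
```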
Step 4: Mel Filtering
The FFT spectrum is passed through a bank of triangular band-pass filters spaced on the mel scale, which maps a frequency f in Hz to

mel(f) = 2595 · log10(1 + f / 700)    (3.5)
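Equation (3.5) and its inverse, which is used when converting filter centres on the mel scale back to Hz, can be sketched as:

```python
import numpy as np

def hz_to_mel(f):
    """Eq. (3.5): mel(f) = 2595 * log10(1 + f / 700)."""
    return 2595.0 * np.log10(1.0 + np.asarray(f, dtype=float) / 700.0)

def mel_to_hz(m):
    """Inverse of Eq. (3.5), for placing triangular filter centres in Hz."""
    return 700.0 * (10.0 ** (np.asarray(m, dtype=float) / 2595.0) - 1.0)
```

The mapping is approximately linear below 1 kHz and logarithmic above it, which is why filters bunch together at low frequencies.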
Number of utterances per emotion:

Emotion      Utterances
Anger        112
Happiness    69
Neutral      72
Sadness      59
Fear         55
Total        367
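The classification stage pairs the extracted features with a Support Vector Machine. A minimal scikit-learn sketch on synthetic feature vectors; the 12-dimensional features, 50-utterances-per-class layout and RBF kernel here are illustrative assumptions, not the paper's exact setup:

```python
import numpy as np
from sklearn.model_selection import train_test_split
from sklearn.svm import SVC

rng = np.random.default_rng(0)
emotions = ["anger", "happiness", "neutral", "sadness", "fear"]

# Synthetic stand-ins for per-utterance MFCC/DWT feature vectors:
# 50 utterances per emotion, 12 features each, with a class-dependent shift
# so the classes are separable.
y = np.repeat(np.arange(len(emotions)), 50)
X = rng.normal(size=(len(y), 12)) + 0.8 * y[:, None]

X_tr, X_te, y_tr, y_te = train_test_split(X, y, stratify=y, random_state=0)
clf = SVC(kernel="rbf", C=1.0, gamma="scale").fit(X_tr, y_tr)
accuracy = clf.score(X_te, y_te)
```

Real experiments would replace the synthetic matrix with the DWT statistics and MFCC vectors computed in the steps above, one row per utterance.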
Recognition results by emotion (columns: Anger, Happiness, Neutral, Sadness, Fear):

EMO-DB
  Pitch (F0):  75     76     75.4   73.66  78.91
  SD (Mean):   70     68.55  70.34  70.55  74
  Max:         73.8   74.8   74.44  72.34  77.5
  Min:         66.2   62.3   66.24  68.76  70.5
  Fusion:      71.25  70.41  71.61  71.33  75.23

ASD-DB
  Pitch (F0):  74.71  68.95  69.96  68.22  75.5
  SD (Mean):   73.19  72.13  71.05  70.16  74.86
  Max:         75.31  78.03  76.53  74.78  79.26
  Min:         71.06  66.22  65.56  65.53  70.46
  Fusion:      73.57  71.33  70.78  69.67  75.02

TAMIL-DB
  Pitch (F0):  69.91  73.5   78.69  72.22  74.5
  SD (Mean):   74.69  76.13  72.55  70.16  74.21
  Max:         78.31  78.03  73.53  70.78  77.16
  Min:         71.06  74.22  71.56  69.53  71.26
  Fusion:      73.49  75.47  74.08  70.67  74.28

TELUGU-DB
  Pitch (F0):  75.6   74.5   71.41  75.66  78.69
  SD (Mean):   71.96  71.05  67.77  62.05  75
  Max:         74.8   76.8   69.34  62.34  78.5
  Min:         69.12  65.3   66.2   61.76  71.5
  Fusion:      72.87  71.91  68.68  65.45  75.92
Fig. 4.2: MFCC and DWT for ASD Tamil Emotional database
CONCLUSIONS
REFERENCES