Beruflich Dokumente
Kultur Dokumente
Top of Form
MSS page 750 750 500 500
250 0 r 1 10
Search
Search
Bottom of Form
% Start acquisition
start(ai)
end
% Stop acquisition
stop(ai);
% Disconnect/Cleanup
delete(ai);
clear ai;
% Initialize Variables
samp1 = 1; samp2 = seglength; %Initialize frame start and end
for i = 1:nframes
% Get current frame for analysis
frame = speech(samp1:samp2);
% Do some analysis
...
<DO _ SOMETHING>
...
From the linear predictive filter coefficients, we can obtain several feature vectors using
Processing Toolbox functions, including reflection coefficients, log area ratio parameter
line spectral frequencies.
One set of spectral features commonly used in speech applications because of its robust
Mel Frequency Cepstral Coefficients (MFCCs). MFCCs give a measure of the energy w
overlapping frequency bins of a spectrum with a warped (Mel) frequency scale1.
Since speech can be considered to be short-term stationary, MFCC feature vectors are
calculated for each frame of detected speech. Using many utterances of a digit and comb
all the feature vectors, we can estimate a multidimensional probability density function
of the vectors for a specific digit. Repeating this process for each digit, we obtain the ac
model for each digit.
During the testing stage, we extract the MFCC vectors from the test speech and use a
probabilistic measure to determine the source digit with maximum likelihood. The chall
then becomes to select an appropriate PDF to represent the MFCC feature vector distrib
Figure 2a shows the distribution of the first dimension of MFCC feature vectors extracte
the training data for the digit ‘one’.
Figure 2a. Distribution of the first
dimension of MFCC feature vectors f
the digit ‘one.’ Click on image to see
enlarged view.
We could use dfittool in Statistics Toolbox™ to fit a PDF, but the distribution looks q
arbitrary, and standard distributions do not provide a good fit. One solution is to fit a Ga
mixture model (GMM), a sum of weighted Gaussians (Figure 2b).
The complete Gaussian mixture density is parameterized by the mixture weights, mean
and covariance matrices from all component densities. For isolated digit recognition, ea
is represented by the parameters of its GMM.
To estimate the parameters of a GMM for a set of MFCC feature vectors extracted from
training speech, we use an iterative expectation-maximization (EM) algorithm to obtain
maximum likelihood (ML) estimate. Given some MFCC training data in the variable
MFCCtraindata, we use the Statistics Toolbox gmdistribution function to estimate th
parameters. This function is all that is required to perform the iterative EM calculations.
%Number of Gaussian component densities
M = 8;
model = gmdistribution.fit(MFCCtraindata,M);
© 1994-2011 The MathWorks, Inc. - Site Help - Patents - Trademarks - Privacy Policy - Pr
US/Canada
Australia
Belgium
China
Denmark
Finland
France
Germany
India
Ireland
Italy
Japan
Korea
Luxembourg
Netherlands
Norway
Portugal
Spain
Sweden
Switzerland
United Kingdom
All Other Countries