speech recognizer
On a 1.7GHz Pentium 4
Procedures
Preparation
– identify the goal;
– decide on the recognition unit: phoneme, syllable,
word, etc.;
– prepare the corpus: training, development, and
test sets;
– label part of the training data (optional);
– etc.
Procedures cont.
$$\hat{W} = \arg\max_{W} \; p(O \mid W)\, P(W)$$
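The decision rule above can be illustrated with a minimal Python sketch. It assumes a small, fixed list of candidate word sequences whose acoustic and language-model log-probabilities are already known. The function name `decode`, the candidate sentences, and all scores are hypothetical and chosen only for illustration. A real decoder searches a much larger hypothesis space instead of enumerating it.

```python
import math

def decode(candidates, acoustic_logp, lm_logp):
    """Return the candidate W that maximizes log p(O|W) + log P(W)."""
    return max(candidates, key=lambda w: acoustic_logp[w] + lm_logp[w])

# Hypothetical scores for one utterance O (illustrative values only).
candidates = ["recognize speech", "wreck a nice beach"]
acoustic_logp = {"recognize speech": -10.0, "wreck a nice beach": -9.5}
lm_logp = {
    "recognize speech": math.log(1e-2),
    "wreck a nice beach": math.log(1e-4),
}

best = decode(candidates, acoustic_logp, lm_logp)
print(best)
```

Working in the log domain turns the product p(O | W) P(W) into a sum, which avoids numerical underflow on long utterances. In this toy case the language model outweighs the slightly better acoustic score of the second candidate.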
Training
– Acoustic model training;
– Language model training;
Adaptation
– Speaker adaptation (VTLN, MLLR, MAP);
– Environment adaptation (mismatch of training and
testing);
Testing
Acoustic model training
Feature extraction, followed by iterative steps of Viterbi
state-based alignment and model estimation;
Outputs a set of decision-tree state-clustered HMMs;
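The alignment step can be sketched as follows. This is a minimal Viterbi forced alignment in Python, assuming a strict left-to-right state sequence (stay in a state or advance by one), a start in the first state, and an end in the last state. The function name `viterbi_align` and the log-likelihood table are hypothetical. Transition probabilities are omitted for brevity, and the recognizer's actual trainer is not described in the slides.

```python
import math

def viterbi_align(loglik):
    """Align frames to states; loglik[t][s] is the log-likelihood of frame t in state s."""
    num_frames, num_states = len(loglik), len(loglik[0])
    neg_inf = -math.inf
    score = [[neg_inf] * num_states for _ in range(num_frames)]
    back = [[0] * num_states for _ in range(num_frames)]
    score[0][0] = loglik[0][0]  # the path must start in the first state
    for t in range(1, num_frames):
        for s in range(num_states):
            stay = score[t - 1][s]
            move = score[t - 1][s - 1] if s > 0 else neg_inf
            if move > stay:
                score[t][s], back[t][s] = move + loglik[t][s], s - 1
            else:
                score[t][s], back[t][s] = stay + loglik[t][s], s
    # Trace back from the last state at the final frame.
    path = [num_states - 1]
    for t in range(num_frames - 1, 0, -1):
        path.append(back[t][path[-1]])
    path.reverse()
    return path

# Hypothetical scores: frames 0-1 fit state 0, frames 2-3 fit state 1.
toy_loglik = [[-1.0, -5.0], [-1.0, -5.0], [-5.0, -1.0], [-5.0, -1.0]]
alignment = viterbi_align(toy_loglik)
print(alignment)
```

In training, the frames assigned to each state would then be used to re-estimate that state's parameters, and the alignment and estimation steps would repeat.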
Feature extraction (PMVDR)
Perceptual Minimum Variance Distortionless
Response cepstral coefficients;
– fea [options] speechfile.raw featurefile.fea
Dynamic features;
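Dynamic (delta) features are commonly computed as a regression over neighboring frames. The sketch below assumes that common formula with a window of two frames and edge frames replicated. The slides do not state which formula the PMVDR front end uses, so the function `delta` and its `window` parameter are illustrative assumptions.

```python
def delta(track, window=2):
    """Delta of one coefficient track: sum_n n*(c[t+n] - c[t-n]) / (2 * sum_n n^2)."""
    denom = 2 * sum(n * n for n in range(1, window + 1))
    last = len(track) - 1
    out = []
    for t in range(len(track)):
        num = 0.0
        for n in range(1, window + 1):
            ahead = track[min(t + n, last)]   # replicate the last frame at the end
            behind = track[max(t - n, 0)]     # replicate the first frame at the start
            num += n * (ahead - behind)
        out.append(num / denom)
    return out

# A linearly rising coefficient has slope 1 away from the edges.
ramp_delta = delta([0.0, 1.0, 2.0, 3.0, 4.0, 5.0])
print(ramp_delta[2], ramp_delta[3])
```

Applying `delta` to its own output gives acceleration (delta-delta) features, which are often appended to the static coefficients as well.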
Language model I
Finite state grammar in terms of a regular
expression;
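As an illustration of a grammar written as a regular expression, the sketch below uses Python's `re` module with a hypothetical voice-dialing vocabulary. The pattern syntax is Python's, not the recognizer's own grammar format, which the slides do not show.

```python
import re

# Hypothetical command grammar: a verb followed by a place name or a digit string.
DIGIT = r"(zero|one|two|three)"
GRAMMAR = re.compile(rf"(call|dial) (home|office|{DIGIT}( {DIGIT})*)")

def accepts(utterance):
    """Return True if the whole word sequence is generated by the grammar."""
    return GRAMMAR.fullmatch(utterance) is not None

print(accepts("call home"))
print(accepts("dial one two three"))
print(accepts("home call"))
```

A finite state grammar of this kind gives every word sequence outside the pattern zero probability, which suits small command-and-control tasks better than open dictation.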
Language model II
Language model:
– $P(W) = P(w_1, w_2, \ldots, w_m)$ gives the probability of a
given word sequence;
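– expanded by the chain rule as $P(W) = \prod_{i=1}^{m} P(w_i \mid w_1, \ldots, w_{i-1})$;
– N-gram approximation: $P(w_i \mid w_1, \ldots, w_{i-1}) \approx P(w_i \mid w_{i-N+1}, \ldots, w_{i-1})$;
– calculated (maximum-likelihood estimate) as $P(w_i \mid w_{i-N+1}, \ldots, w_{i-1}) = \frac{C(w_{i-N+1}, \ldots, w_i)}{C(w_{i-N+1}, \ldots, w_{i-1})}$, where $C(\cdot)$ is a count in the training text.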
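A small Python sketch of an N-gram model with N = 2 may make this concrete. It assumes unsmoothed relative-frequency (maximum-likelihood) estimates and sentence boundary markers `<s>` and `</s>`. The function names and the three-sentence corpus are hypothetical. Practical language models add smoothing so that unseen word pairs do not receive zero probability.

```python
from collections import Counter

def train_bigram(sentences):
    """Count histories and bigrams, padding each sentence with <s> and </s>."""
    history, bigram = Counter(), Counter()
    for sentence in sentences:
        words = ["<s>"] + sentence.split() + ["</s>"]
        history.update(words[:-1])                 # every word that precedes another
        bigram.update(zip(words[:-1], words[1:]))  # adjacent word pairs
    return history, bigram

def bigram_prob(history, bigram, prev, word):
    """Relative-frequency estimate of P(word | prev); 0.0 for an unseen history."""
    if history[prev] == 0:
        return 0.0
    return bigram[(prev, word)] / history[prev]

def sentence_prob(history, bigram, sentence):
    """P(W) as a product of bigram probabilities over the padded sentence."""
    words = ["<s>"] + sentence.split() + ["</s>"]
    prob = 1.0
    for prev, word in zip(words[:-1], words[1:]):
        prob *= bigram_prob(history, bigram, prev, word)
    return prob

corpus = ["i like speech", "i like tea", "you like speech"]
history, bigram = train_bigram(corpus)
print(bigram_prob(history, bigram, "like", "speech"))
print(sentence_prob(history, bigram, "i like speech"))
```

With this corpus, "like" is followed by "speech" in two of its three occurrences, so the first value is 2/3.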