09-15-2008
Course Grading
- Homework Assignments: 30%
- Exam [12-Dec]: 40%
- Proposal (1-pager) [due: 08-Oct]; [start of Dec]; [due: 15-Dec]: 30%
Instructors (interACT, 2F, 407 S. Craig St.; Newell-Simon Hall; Doherty Hall)
- RM 203: Alex Waibel (ahw@cs.cmu.edu)
- RM 221: Ian Lane (ianlane@cs.cmu.edu)
- RM 209: (yct@cs.cmu.edu)
Speech Production
[Figure: source-filter model: a glottal pulse source G(t) and a random noise generator drive the vocal tract filter V(t) and the radiation model R(t), producing the speech signal p_L(n)]
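The source-filter model above can be sketched in a few lines: an impulse-train excitation (the source) filtered by an all-pole filter standing in for the vocal tract. This is a minimal illustration, not the lecture's method; the pitch, formant frequencies, and bandwidths below are invented, roughly vowel-like values.

```python
import numpy as np
from scipy.signal import lfilter

sr = 16000
# Excitation: impulse train at 100 Hz pitch (voiced source G(t)).
excitation = np.zeros(sr // 10)                  # 100 ms of samples
excitation[:: sr // 100] = 1.0
# Vocal tract V(t) approximated by an all-pole (IIR) filter; formant
# frequencies/bandwidths are invented, roughly vowel-like values.
poles = []
for f, bw in [(700, 80), (1200, 100), (2600, 120)]:
    r = np.exp(-np.pi * bw / sr)                 # pole radius from bandwidth
    poles += [r * np.exp(2j * np.pi * f / sr), r * np.exp(-2j * np.pi * f / sr)]
a = np.real(np.poly(poles))                      # denominator coefficients
speech = lfilter([1.0], a, excitation)           # source filtered by the tract
```

Swapping the impulse train for white noise gives the unvoiced branch of the same diagram.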
Sloppy Speech
Conversational speech. Actual input: "I have been I have been getting into"
Feature Extraction
Speech Waveform → FFT → FFT-based spectrum → Mel-scale triangular filters → Log → DCT → 39-element acoustic vector
Suggested reading:
S. Young, Large vocabulary continuous speech recognition: A review
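The FFT → mel filterbank → log → DCT chain above is the standard MFCC front-end. A minimal NumPy sketch for a single frame; the frame length, filter count, and 16 kHz sampling rate are illustrative assumptions, not values from the slides:

```python
import numpy as np
from scipy.fftpack import dct

def hz_to_mel(f):
    # Standard mel-scale warping (Hz -> mel).
    return 2595.0 * np.log10(1.0 + f / 700.0)

def mel_to_hz(m):
    return 700.0 * (10.0 ** (m / 2595.0) - 1.0)

def mfcc_frame(frame, sr=16000, n_filt=26, n_ceps=13, n_fft=512):
    """MFCCs for one frame: window -> FFT -> mel filters -> log -> DCT."""
    spectrum = np.abs(np.fft.rfft(frame * np.hamming(len(frame)), n_fft)) ** 2
    # Triangular filters equally spaced on the mel scale.
    mel_pts = np.linspace(hz_to_mel(0), hz_to_mel(sr / 2), n_filt + 2)
    bins = np.floor((n_fft + 1) * mel_to_hz(mel_pts) / sr).astype(int)
    fbank = np.zeros((n_filt, n_fft // 2 + 1))
    for i in range(n_filt):
        l, c, r = bins[i], bins[i + 1], bins[i + 2]
        fbank[i, l:c] = (np.arange(l, c) - l) / max(c - l, 1)
        fbank[i, c:r] = (r - np.arange(c, r)) / max(r - c, 1)
    log_energies = np.log(fbank @ spectrum + 1e-10)
    return dct(log_energies, type=2, norm='ortho')[:n_ceps]

frame = np.random.randn(400)          # 25 ms of audio at 16 kHz
ceps = mfcc_frame(frame)              # 13 static cepstral coefficients
```

Appending first and second time derivatives (deltas and delta-deltas) to the 13 static coefficients yields the 39-element acoustic vector mentioned above.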
[Figure: Input Speech → Front-end Processing → ??? → Output Text ("Hello world")]
Fundamental Equation
For an observed feature vector sequence x:
  Ŵ = argmax_W P(W|x) = argmax_W P(x|W) P(W) / P(x) = argmax_W P(x|W) P(W)
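The argmax over P(x|W) P(W) can be made concrete with a toy pair of hypotheses; both probabilities below are invented purely for illustration.

```python
# Toy illustration of the fundamental equation: choose the word sequence W
# that maximizes P(x|W) * P(W). All numbers are made up.
candidates = {
    "hello world": (0.02, 0.6),   # (P(x|W), P(W)): hypothetical values
    "hollow word": (0.03, 0.1),   # acoustically closer, but unlikely as text
}
best = max(candidates, key=lambda w: candidates[w][0] * candidates[w][1])
```

Even though "hollow word" has the higher acoustic score P(x|W), the language model prior P(W) tips the product toward "hello world".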
[Figure: Input Speech → Front-end Processing → P(x|W) P(W) → Output Text ("Hello world"); P(x|W) comes from the Acoustic Model, P(W) from the Language Model]
Acoustic Model
[Figure: Input Speech → P(x|W) P(W) → Output Text ("Hello world"); the Acoustic Model supplies P(x|W) over phones]
Pronunciation Dictionary (maps words to phone sequences), e.g.:
- I → /i/
- you → /j/ /u/
- we → /v/ /e/
[Figure: left-to-right HMM with transition probabilities a_12, a_23, a_33, a_34, a_44, a_45 and output probabilities b_2(y_1), b_2(y_2), b_3(y_3), b_4(y_4), b_4(y_5), emitting the acoustic vector sequence Y = y_1 y_2 y_3 y_4 y_5]
A Hidden Markov Model for each phone or senone (context-dependent model).
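An HMM like the one sketched above scores an observation sequence with the forward algorithm; a toy discrete-emission sketch, with all transition and emission probabilities invented for illustration:

```python
import numpy as np

# Toy 3-state left-to-right HMM; all probabilities are illustrative.
A = np.array([[0.6, 0.4, 0.0],     # a_ij: state transition probabilities
              [0.0, 0.7, 0.3],
              [0.0, 0.0, 1.0]])
B = np.array([[0.8, 0.2],          # b_j(y): emission probs for symbols {0, 1}
              [0.3, 0.7],
              [0.5, 0.5]])
pi = np.array([1.0, 0.0, 0.0])     # always start in the first state

def forward(obs):
    """P(Y | model): sum over all state paths via the forward recursion."""
    alpha = pi * B[:, obs[0]]
    for y in obs[1:]:
        alpha = (alpha @ A) * B[:, y]
    return alpha.sum()

p = forward([0, 1, 1, 0, 1])       # likelihood of a 5-symbol sequence Y
```

In a real recognizer the discrete emissions b_j(y) are replaced by (mixtures of) Gaussian densities over acoustic vectors, and the recursion is done in the log domain for numerical stability.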
Language Model
[Figure: Input Speech → Front-end Processing → P(x|W) P(W) → Output Text ("Hello world"); the Language Model supplies P(W)]
Language Model (likelihood of word sequences), e.g.:
- p(you | how are)
- p(today | are you)
- p(world | Hello)
Language Modelling
  P(W) = ∏_k P(w_k | w_(k-1), w_(k-2))   (trigram model)
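The conditional probabilities in P(W) are estimated from counts. A minimal maximum-likelihood sketch on a toy corpus, shown for the bigram case for brevity (the trigram case simply conditions on two predecessors); no smoothing is applied:

```python
from collections import Counter

corpus = "hello world hello there hello world".split()  # toy corpus
bigrams = Counter(zip(corpus, corpus[1:]))
unigrams = Counter(corpus)

def p_bigram(w, prev):
    """ML estimate: P(w | prev) = count(prev, w) / count(prev)."""
    return bigrams[(prev, w)] / unigrams[prev] if unigrams[prev] else 0.0

p = p_bigram("world", "hello")   # 2 of the 3 "hello" tokens precede "world"
```

Real systems add smoothing (e.g. backoff) so that unseen word pairs do not get zero probability.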
[Figure: Speech → Feature extraction → Decision (apply trained classifiers) → Hypotheses (phonemes), e.g. /h/ ...]
Training Classifiers
[Figure: Aligned speech → Feature extraction → speech features → Train classifier (e.g. for /e/) → improved classifiers]
- k-means
- LVQ
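k-means, listed above as a training method, alternates assignment and mean-update steps. A compact sketch on synthetic 2-D features standing in for frames of two phone classes (the data and cluster count are invented for illustration):

```python
import numpy as np

rng = np.random.default_rng(0)
# Two well-separated synthetic clusters of 2-D feature vectors.
data = np.vstack([rng.normal(0, 0.3, (50, 2)),
                  rng.normal(3, 0.3, (50, 2))])

def kmeans(x, k=2, iters=20):
    centers = x[rng.choice(len(x), k, replace=False)]
    for _ in range(iters):
        # Assign each point to its nearest center, then re-estimate means.
        labels = np.argmin(((x[:, None] - centers) ** 2).sum(-1), axis=1)
        centers = np.array([x[labels == j].mean(axis=0) for j in range(k)])
    return centers, labels

centers, labels = kmeans(data)
```

LVQ refines such codebook vectors further using the class labels, moving a center toward correctly classified samples and away from misclassified ones.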
Suggested reading:
X. Huang, A. Acero, H. Hon, Spoken Language Processing, Chapter 4
R. Duda and P. Hart, Pattern Classification and Scene Analysis, John Wiley & Sons, 2000 (2nd Edition)
Classification Approaches
Design dichotomies: knowledge-based / connectionist / statistical; supervised / unsupervised; parametric / non-parametric; linear / non-linear.
Knowledge-based approaches: compile expert knowledge into explicit rules.
Connectionist approaches: learn the input-output mapping from data (neural networks).
Statistical approaches: model the class-conditional distributions and decide via Bayes rule.
Classification Trees
Simple binary decision tree for height classification:
T = tall, t = medium-tall, M = medium, m = medium-short, S = short
Goal: predict the height class of a new person.
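Such a tree is just nested threshold comparisons. A hypothetical encoding of the five-class height tree; the cut-points in cm are invented for illustration and are not from the slides:

```python
def height_class(cm):
    # Hypothetical binary decision tree over height in cm;
    # labels follow the slide: S, m, M, t, T.
    if cm < 160:
        return "S" if cm < 150 else "m"   # short / medium-short
    if cm < 180:
        return "M" if cm < 170 else "t"   # medium / medium-tall
    return "T"                            # tall

label = height_class(172)   # falls in the 170-180 band: medium-tall
```

Learning algorithms such as CART choose these split thresholds automatically from labeled training data.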
Pattern Recognition
Types of classifiers:
- Bayes Classifier
- K-Nearest Neighbor
- Connectionist Methods
  - Perceptron
  - Multilayer Perceptrons
Supervised vs. Unsupervised
Supervised training: the class to be recognized is known for each sample in the training data. Requires a priori knowledge of useful features and knowledge/labeling of each training token (cost!).
Unsupervised training: the class is not known and the structure is to be discovered automatically. Feature-space reduction; examples: clustering, auto-associative nets.
Unsupervised Classification
[Figure: unlabeled samples (+) scattered in feature space F1 × F2; the cluster structure is to be discovered automatically]
Supervised Classification
[Figure: labeled samples in Age × Income space, marked credit-worthy vs. non-credit-worthy (+)]
Classification Problem
[Figure: labeled credit-worthy / non-credit-worthy (+) samples in Age × Income space; given a new point x = (x1, x2), is Joe credit-worthy?]
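The "Is Joe credit-worthy?" question is exactly what a k-nearest-neighbor classifier (listed earlier under types of classifiers) answers: vote among the labeled training points closest to the query. A sketch with invented (age, income) data:

```python
import numpy as np

# Invented training data: (age, income in k$), label 1 = credit-worthy.
X = np.array([[25, 20], [30, 90], [45, 110], [50, 30], [35, 100], [22, 15]])
y = np.array([0, 1, 1, 0, 1, 0])

def knn_predict(query, k=3):
    """Majority vote among the k nearest training samples."""
    dist = np.linalg.norm(X - query, axis=1)
    nearest = y[np.argsort(dist)[:k]]
    return int(round(nearest.mean()))

joe = knn_predict(np.array([40, 95]))   # classify Joe: age 40, income 95k
```

Note that raw Euclidean distance mixes units (years vs. k$); in practice the features would be normalized first.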
Parametric vs. Non-parametric
[Figure: histograms of good loans and bad loans over Income]
Parametric: assume a functional form for the class-conditional densities (e.g. Gaussian) and estimate its parameters.
Non-parametric: estimate the densities directly from the training data without assuming a particular form.
Bayes Rule
Given the prior P(ω_j) and an observation of x:
  P(ω_j|x) = p(x|ω_j) P(ω_j) / p(x)
where
  p(x) = Σ_j p(x|ω_j) P(ω_j)
Decision rule: decide ω_1 if P(ω_1|x) > P(ω_2|x), else decide ω_2.
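A quick worked instance of the rule above; the priors and likelihoods are made-up numbers chosen only to show the computation:

```python
# Two classes with priors and class-conditional likelihoods for one
# observation x; all values are illustrative.
prior = {1: 0.7, 2: 0.3}            # P(omega_j)
lik = {1: 0.2, 2: 0.6}              # p(x | omega_j)
evidence = sum(lik[j] * prior[j] for j in prior)          # p(x)
post = {j: lik[j] * prior[j] / evidence for j in prior}   # P(omega_j | x)
decision = max(post, key=post.get)
```

Here the larger prior on class 1 is outweighed by the larger likelihood of class 2, so the posterior favors class 2.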
Classification Error
Bayes decision rule: place the decision boundary such that the class ω_i with the maximum value of p(x|ω_i) P(ω_i) is chosen; the tail integral area P(error) then becomes minimal.
Discriminant Functions
Decide class ω_i if g_i(x) > g_j(x) for all j ≠ i:
  g_i(x) = P(ω_i|x) = p(x|ω_i) P(ω_i) / Σ_(j=1..c) p(x|ω_j) P(ω_j)
The denominator is independent of the class i, so equivalent discriminants are:
  g_i(x) = p(x|ω_i) P(ω_i)
  g_i(x) = log(p(x|ω_i)) + log(P(ω_i))
Problems:
- limited training data
- limited computation
- class labeling potentially costly and prone to error
- classes may not be known
- good features not known
Parametric Solution:
Assume that p(x|ω_i) has a particular parametric form.
Most common representative: multivariate normal density.
Gaussian Densities
The most frequently used model for (preprocessed) speech signals is the Gaussian density. The "size" of the parameter space is often measured in the number of densities.
A multivariate Gaussian density looks like this:
  N(x; μ, Σ) = (2π)^(−d/2) |Σ|^(−1/2) exp(−½ (x − μ)^t Σ^(−1) (x − μ))
Gaussian Classifier
Each class ω_i is described by a mean vector μ_i and a covariance matrix Σ_i.
Estimation of Parameters (maximum likelihood):
  μ = (1/n) Σ_(k=1..n) x_k
  Σ = (1/n) Σ_(k=1..n) (x_k − μ)(x_k − μ)^t
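The ML estimates above plug directly into the log-discriminant g_i(x) = log p(x|ω_i) + log P(ω_i). A sketch on synthetic 2-D data (class means, priors, and sample sizes are invented; the constant term common to all classes is dropped):

```python
import numpy as np

rng = np.random.default_rng(1)
# Synthetic 2-D training samples for two classes.
X0 = rng.normal([0, 0], 1.0, (100, 2))
X1 = rng.normal([3, 3], 1.0, (100, 2))

def fit(X):
    """ML estimates: mu = sample mean, Sigma = (1/n) sum (x-mu)(x-mu)^t."""
    mu = X.mean(axis=0)
    d = X - mu
    return mu, d.T @ d / len(X)

def log_g(x, mu, sigma, prior):
    """g_i(x) = log p(x|omega_i) + log P(omega_i) for a Gaussian density
    (the class-independent -d/2 log 2*pi term is omitted)."""
    d = x - mu
    return (-0.5 * d @ np.linalg.solve(sigma, d)
            - 0.5 * np.log(np.linalg.det(sigma)) + np.log(prior))

params = [fit(X0), fit(X1)]
x = np.array([2.5, 2.8])           # query point near the class-1 mean
cls = int(log_g(x, *params[1], 0.5) > log_g(x, *params[0], 0.5))
```

With equal priors the decision reduces to comparing the Mahalanobis distances to the two class means, corrected by the covariance determinants.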
Features:
Any features?
Curse of Dimensionality
Reason: the number of training samples needed to reliably estimate a density grows exponentially with the dimensionality of the feature space.
Solution: reduce dimensionality
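One standard way to reduce dimensionality (not specified on the slide, so taken here as an example) is principal component analysis: project the data onto the eigenvectors of its covariance matrix with the largest eigenvalues. A minimal sketch on synthetic features:

```python
import numpy as np

rng = np.random.default_rng(0)
# 50-dimensional features whose variance lives mostly in three directions.
X = rng.normal(size=(200, 50)) * 0.1
X[:, :3] += rng.normal(size=(200, 3)) * 10   # three high-variance dims

def pca_reduce(X, k):
    """Project centered data onto the top-k principal components."""
    Xc = X - X.mean(axis=0)
    cov = Xc.T @ Xc / len(Xc)
    vals, vecs = np.linalg.eigh(cov)          # eigenvalues in ascending order
    return Xc @ vecs[:, -k:]                  # keep the k largest directions

Z = pca_reduce(X, 3)   # 50-D features reduced to 3-D
```

Fewer dimensions means far fewer Gaussian parameters to estimate from the same amount of training data.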
Trainability
Problems: [Figure: f(x)]