Pattern Classification
Introduction
Parametric classifiers
Semi-parametric classifiers
Dimensionality reduction
Significance testing
Pattern Classification 1
Pattern Classification
Goal: To classify objects (or patterns) into categories (or classes)
[Diagram: input → Feature Extraction → feature vector x → Classifier → class ω_i]
1. Supervised: Classes are known beforehand, and data samples of each class are available
2. Unsupervised: Classes (and/or number of classes) are not known beforehand, and must be inferred from data
Probability Basics
Discrete probability mass function (PMF): P(ω_i), with Σ_i P(ω_i) = 1
Continuous probability density function (PDF): p(x), with ∫ p(x) dx = 1
Expected value: E(x) = ∫ x p(x) dx
Kullback-Leibler Distance
Can be used to compute a distance between two probability mass distributions, P(z_i) and Q(z_i):
D(P ‖ Q) = Σ_i P(z_i) log (P(z_i) / Q(z_i)) ≥ 0
Known as relative entropy in information theory
The divergence of P(z_i) and Q(z_i) is the symmetric sum D(P ‖ Q) + D(Q ‖ P)
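As a minimal sketch of these two quantities (function names are my own, not from the lecture), with the usual convention that terms where P(z_i) = 0 contribute nothing:

```python
import math

def kl_divergence(p, q):
    # D(P || Q) = sum_i P(z_i) * log(P(z_i) / Q(z_i));
    # terms with P(z_i) = 0 are skipped (0 * log 0 taken as 0).
    return sum(pi * math.log(pi / qi) for pi, qi in zip(p, q) if pi > 0)

def symmetric_divergence(p, q):
    # The symmetric sum D(P || Q) + D(Q || P).
    return kl_divergence(p, q) + kl_divergence(q, p)
```

Note that D(P ‖ Q) alone is not symmetric in its arguments, which is why the symmetric sum is used as a divergence.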
Bayes Theorem
[Figure: class-conditional densities p(x|ω_1) and p(x|ω_2)]

Define:
{ω_i}: a set of M mutually exclusive classes
P(ω_i): a priori probability for class ω_i
p(x|ω_i): PDF for feature vector x in class ω_i
P(ω_i|x): a posteriori probability of ω_i given x

From Bayes theorem:
P(ω_i|x) = p(x|ω_i) P(ω_i) / p(x), where p(x) = Σ_{i=1}^{M} p(x|ω_i) P(ω_i)
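The posterior computation can be sketched directly (a hypothetical helper, not from the lecture):

```python
def posterior(likelihoods, priors):
    # P(w_i | x) = p(x | w_i) P(w_i) / p(x),
    # where the evidence p(x) = sum_i p(x | w_i) P(w_i).
    evidence = sum(l * p for l, p in zip(likelihoods, priors))
    return [l * p / evidence for l, p in zip(likelihoods, priors)]
```

The returned posteriors sum to one by construction, since each term is normalized by the evidence.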
Pattern Classication 5
To minimize P(error|x) (and P(error)): choose ω_i if P(ω_i|x) > P(ω_j|x) for all j ≠ i
For a two-class problem this decision rule means:
Choose ω_1 if p(x|ω_1)P(ω_1)/p(x) > p(x|ω_2)P(ω_2)/p(x); else ω_2
This rule can be expressed as a likelihood ratio:
Choose ω_1 if p(x|ω_1)/p(x|ω_2) > P(ω_2)/P(ω_1); else choose ω_2
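An illustrative sketch of the likelihood-ratio rule (names are my own):

```python
def likelihood_ratio_decision(p1, p2, prior1, prior2):
    # Choose class 1 if p(x|w1)/p(x|w2) > P(w2)/P(w1), else class 2.
    return 1 if p1 / p2 > prior2 / prior1 else 2
```

With equal priors the threshold is 1, so the rule reduces to picking the class with the larger likelihood; a skewed prior shifts the threshold accordingly.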
Bayes Risk
Define cost function λ_ij and conditional risk R(ω_i|x):
λ_ij is the cost of classifying x as ω_i when it is really ω_j
R(ω_i|x) is the risk for classifying x as class ω_i:
R(ω_i|x) = Σ_{j=1}^{M} λ_ij P(ω_j|x)
Bayes risk is the minimum risk which can be achieved: choose ω_i if R(ω_i|x) < R(ω_j|x) for all j ≠ i
Bayes risk corresponds to minimum P(error|x) when:
All errors have equal cost (λ_ij = 1, i ≠ j)
There is no cost for being correct (λ_ii = 0)
In that case R(ω_i|x) = Σ_{j≠i} P(ω_j|x) = 1 − P(ω_i|x)
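The conditional risk and minimum-risk rule can be sketched as follows, assuming the cost matrix is indexed as cost[i][j] = λ_ij (an assumption of this sketch, not stated in the slide):

```python
def conditional_risk(cost, posteriors, i):
    # R(w_i | x) = sum_j lambda_ij * P(w_j | x)
    return sum(cost[i][j] * pj for j, pj in enumerate(posteriors))

def min_risk_decision(cost, posteriors):
    # Choose the class index whose conditional risk is smallest.
    risks = [conditional_risk(cost, posteriors, i)
             for i in range(len(posteriors))]
    return risks.index(min(risks))
```

With the zero-one cost matrix (λ_ij = 1 for i ≠ j, λ_ii = 0) this reproduces the maximum a posteriori decision, as the slide states.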
6.345 Automatic Speech Recognition — Pattern Classification 7
Discriminant Functions
Alternative formulation of Bayes decision rule: define a discriminant function, g_i(x), for each class ω_i, and choose ω_i if g_i(x) > g_j(x) for all j ≠ i
Functions yielding identical classification results:
g_i(x) = P(ω_i|x)
       = p(x|ω_i) P(ω_i)
       = log p(x|ω_i) + log P(ω_i)
Choice of function impacts computational costs
Discriminant functions partition feature space into decision regions, separated by decision boundaries
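The log form is often the cheapest of the three in practice; a sketch of a classifier built on it (names hypothetical):

```python
import math

def classify(log_likelihoods, priors):
    # g_i(x) = log p(x | w_i) + log P(w_i); choose the class with largest g_i.
    scores = [ll + math.log(p) for ll, p in zip(log_likelihoods, priors)]
    return scores.index(max(scores))
```

Because log is monotonic, the argmax is the same as for the posterior itself, which is why all three forms yield identical classification results.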
Density Estimation
Used to estimate the underlying PDF p(x|ω_i)
Parametric methods:
  Assume a specific functional form for the PDF
  Optimize PDF parameters to fit data
Non-parametric methods:
  Determine the form of the PDF from the data
  Grow parameter set size with the amount of data
Semi-parametric methods:
  Use a general class of functional forms for the PDF
  Can vary parameter set independently from data
  Use unsupervised methods to estimate parameters
Parametric Classifiers
Gaussian distributions
Maximum likelihood (ML) parameter estimation
Multivariate Gaussians
Gaussian classifiers
Parametric Classifiers 1
Gaussian Distributions
Gaussian PDFs are reasonable when a feature vector can be viewed as a perturbation around a reference
[Figure: univariate Gaussian PDF; vertical axis: probability density, 0.0–0.4]
Simple estimation procedures for model parameters
Classification often reduced to simple distance metrics
Gaussian distributions also called Normal
The PDF is centered around the mean: μ = E(x) = ∫ x p(x) dx
The spread of the PDF is determined by the variance: σ² = E((x − μ)²) = ∫ (x − μ)² p(x) dx
Maximum Likelihood Parameter Estimation

The likelihood of parameters θ given i.i.d. data {x_i} is L(θ) = Π_i p(x_i|θ)
ML solutions can often be obtained via the derivative: ∂L(θ)/∂θ = 0
For Gaussian distributions log L(θ) is easier to solve
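For a univariate Gaussian, setting the derivative of log L(θ) to zero yields the familiar closed-form estimates; a sketch of that standard result (function name is my own):

```python
def ml_gaussian(data):
    # Closed-form ML estimates obtained from d/dtheta log L(theta) = 0:
    # the sample mean and the biased (1/n) sample variance.
    n = len(data)
    mu = sum(data) / n
    var = sum((x - mu) ** 2 for x in data) / n
    return mu, var
```

Note the ML variance divides by n, not n − 1, so it is a biased estimator.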
If the dimensions of x are statistically independent, each dimension x_i is a univariate Gaussian:
p(x_i) ∼ N(μ_i, Σ_ii), where Σ_ii = σ_i²
[Figures: PDF contour plots for two-dimensional Gaussian distributions over (x_1, x_2) ∈ [−4, 4] × [−4, 4]]
Multivariate ML Estimation
The ML estimates for parameters θ = {θ_1, ..., θ_l} are determined by maximizing the joint likelihood L(θ) of a set of i.i.d. data X = {x_1, ..., x_n}:
L(θ) = p(X|θ) = p(x_1, ..., x_n|θ) = Π_{i=1}^{n} p(x_i|θ)
To find θ̂ we solve ∂_θ L(θ) = 0, or ∂_θ log L(θ) = 0
The ML estimates for the mean and covariance are:
μ̂ = (1/n) Σ_i x_i        Σ̂ = (1/n) Σ_i (x_i − μ̂)(x_i − μ̂)^t
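A NumPy sketch of these estimates, assuming the samples are the rows of an (n, d) array (a convention of this sketch):

```python
import numpy as np

def ml_multivariate(X):
    # X: (n, d) array of i.i.d. samples, one sample per row.
    # mu_hat    = (1/n) sum_i x_i
    # Sigma_hat = (1/n) sum_i (x_i - mu_hat)(x_i - mu_hat)^t
    n = X.shape[0]
    mu = X.mean(axis=0)
    centered = X - mu
    sigma = centered.T @ centered / n
    return mu, sigma
```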
Gaussian Classifier: Σ_i = σ²I
Each class has the same covariance structure: statistically independent dimensions with variance σ²
The equivalent discriminant functions are:
g_i(x) = −‖x − μ_i‖² / (2σ²) + log P(ω_i)
If each class is equally likely, this is a minimum distance classifier, a form of template matching
The discriminant functions can be replaced by the following linear expression:
g_i(x) = w_i^t x + ω_i0, where w_i = μ_i / σ² and ω_i0 = −(1/(2σ²)) μ_i^t μ_i + log P(ω_i)
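A sketch of this classifier using the quadratic form of g_i(x), which yields the same decisions as the linear expression (names are my own):

```python
import numpy as np

def classify_shared_isotropic(x, means, priors, sigma2):
    # g_i(x) = -||x - mu_i||^2 / (2 sigma^2) + log P(w_i)
    scores = [-np.sum((x - m) ** 2) / (2.0 * sigma2) + np.log(p)
              for m, p in zip(means, priors)]
    return int(np.argmax(scores))
```

With equal priors the log P(ω_i) terms cancel and the rule picks the nearest mean, i.e. the minimum distance classifier.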
Gaussian Classifier: Σ_i = σ²I
For distributions with a common covariance structure the decision regions are hyperplanes
[Figure: sample data from Gaussian classes with hyperplane decision boundaries]
Gaussian Classifier: Σ_i = Σ
Each class has the same covariance structure
The equivalent discriminant functions are:
g_i(x) = −(1/2)(x − μ_i)^t Σ⁻¹ (x − μ_i) + log P(ω_i)
If each class is equally likely, the minimum error decision rule is the squared Mahalanobis distance
The discriminant functions remain linear expressions:
g_i(x) = w_i^t x + ω_i0, where w_i = Σ⁻¹ μ_i and ω_i0 = −(1/2) μ_i^t Σ⁻¹ μ_i + log P(ω_i)
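The shared-covariance rule can be sketched the same way, again using the quadratic (Mahalanobis) form, which is decision-equivalent to the linear one:

```python
import numpy as np

def classify_shared_covariance(x, means, priors, sigma):
    # g_i(x) = -(1/2)(x - mu_i)^t Sigma^{-1} (x - mu_i) + log P(w_i)
    inv = np.linalg.inv(sigma)
    scores = []
    for m, p in zip(means, priors):
        d = x - m
        scores.append(-0.5 * d @ inv @ d + np.log(p))
    return int(np.argmax(scores))
```

When Σ = σ²I this reduces to the minimum distance classifier of the previous slide.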
Maximum A Posteriori Parameter Estimation

With a Gaussian prior N(μ_0, σ_0²) on the mean of a Gaussian with known variance σ², the posterior after n observations has mean and variance:
μ_n = (nσ_0² / (nσ_0² + σ²)) μ̂ + (σ² / (nσ_0² + σ²)) μ_0
σ_n² = σ_0² σ² / (nσ_0² + σ²)
where μ̂ is the sample mean of the n observations
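A sketch of these update equations (argument names are my own):

```python
def map_mean(n, sample_mean, mu0, sigma0_sq, sigma_sq):
    # mu_n  = (n s0 / (n s0 + s)) * sample_mean + (s / (n s0 + s)) * mu0
    # var_n = s0 * s / (n s0 + s), with s0 = sigma0^2, s = sigma^2
    denom = n * sigma0_sq + sigma_sq
    mu_n = (n * sigma0_sq / denom) * sample_mean + (sigma_sq / denom) * mu0
    var_n = sigma0_sq * sigma_sq / denom
    return mu_n, var_n
```

With no data (n = 0) the estimate is the prior mean; as n grows the estimate approaches the sample mean and the posterior variance shrinks toward zero.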
References

Huang, Acero, and Hon, Spoken Language Processing, Prentice-Hall, 2001.
Duda, Hart, and Stork, Pattern Classification, John Wiley & Sons, 2001.
Atal and Rabiner, "A Pattern Recognition Approach to Voiced-Unvoiced-Silence Classification with Applications to Speech Recognition," IEEE Trans. ASSP, 24(3), 1976.