

Pattern Classification

All materials in these slides were taken from Pattern Classification (2nd ed.) by R. O. Duda, P. E. Hart and D. G. Stork, John Wiley & Sons, 2000, with the permission of the authors and the publisher.


Chapter 2 (Part 2):


Bayesian Decision Theory
(Sections 2.3-2.5)

Minimum-Error-Rate Classification
Classifiers, Discriminant Functions and Decision Surfaces
The Normal Density

Minimum-Error-Rate Classification

Actions are decisions on classes.

If action αi is taken and the true state of nature is ωj, then the decision is correct if i = j and in error if i ≠ j.

Seek a decision rule that minimizes the probability of error, which is the error rate.


Introduction of the zero-one loss function:

$$\lambda(\alpha_i, \omega_j) = \begin{cases} 0 & i = j \\ 1 & i \neq j \end{cases} \qquad i, j = 1, \dots, c$$

Therefore, the conditional risk is:

$$R(\alpha_i \mid x) = \sum_{j=1}^{c} \lambda(\alpha_i \mid \omega_j)\, P(\omega_j \mid x) = \sum_{j \neq i} P(\omega_j \mid x) = 1 - P(\omega_i \mid x)$$

The risk corresponding to this loss function is the average probability of error.
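
As an illustration, here is a minimal NumPy sketch of this computation; the three-class posteriors and the zero-one loss matrix are made-up values, not taken from the slides. It evaluates the conditional risk R(αi | x) for every action and confirms that, under zero-one loss, the risk is 1 − P(ωi | x), so the minimum-risk action is the class with the largest posterior.

```python
import numpy as np

# Hypothetical posteriors P(omega_j | x) for c = 3 classes (they sum to 1).
posteriors = np.array([0.2, 0.5, 0.3])
c = len(posteriors)

# Zero-one loss: lambda(alpha_i, omega_j) = 0 if i == j, else 1.
loss = np.ones((c, c)) - np.eye(c)

# Conditional risk R(alpha_i | x) = sum_j lambda(i, j) * P(omega_j | x).
risk = loss @ posteriors

print(risk)             # [0.8 0.5 0.7], i.e. 1 - posteriors
print(np.argmin(risk))  # 1, the same index as np.argmax(posteriors)
```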

The Bayes decision rule is based on minimizing this risk.

Minimizing the risk requires selecting the αi that maximizes P(ωi | x)
(since R(αi | x) = 1 − P(ωi | x)).

For minimum error rate:

Decide ωi if P(ωi | x) > P(ωj | x) for all j ≠ i

Decision regions and the zero-one loss function: using the likelihood ratio formula, let

$$\theta_\lambda = \frac{\lambda_{12} - \lambda_{22}}{\lambda_{21} - \lambda_{11}} \cdot \frac{P(\omega_2)}{P(\omega_1)}$$

then decide ω1 if:

$$\frac{P(x \mid \omega_1)}{P(x \mid \omega_2)} > \theta_\lambda$$

If λ is the zero-one loss function, which means

$$\lambda = \begin{pmatrix} 0 & 1 \\ 1 & 0 \end{pmatrix}, \qquad \text{then} \quad \theta_\lambda = \frac{P(\omega_2)}{P(\omega_1)} = \theta_a$$

whereas if

$$\lambda = \begin{pmatrix} 0 & 2 \\ 1 & 0 \end{pmatrix}, \qquad \text{then} \quad \theta_\lambda = \frac{2\,P(\omega_2)}{P(\omega_1)} = \theta_b$$
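
A small sketch of this threshold test, assuming hypothetical priors, unit-variance Gaussian class-conditional densities, and an observation x chosen for illustration: it compares the likelihood ratio against θa (zero-one loss) and θb (the asymmetric loss above).

```python
import numpy as np

# Hypothetical priors and unit-variance Gaussian class-conditional densities.
prior1, prior2 = 0.6, 0.4

def pdf(x, mu):
    """p(x | omega) for a Gaussian with mean mu and sigma = 1."""
    return np.exp(-0.5 * (x - mu) ** 2) / np.sqrt(2 * np.pi)

x = 0.9                                        # observation to classify
likelihood_ratio = pdf(x, 0.0) / pdf(x, 1.5)   # p(x|omega_1) / p(x|omega_2)

theta_a = prior2 / prior1        # zero-one loss
theta_b = 2 * prior2 / prior1    # loss matrix with lambda_12 = 2

decision_a = 1 if likelihood_ratio > theta_a else 2
decision_b = 1 if likelihood_ratio > theta_b else 2
print(likelihood_ratio, theta_a, theta_b, decision_a, decision_b)
```

With these example numbers the likelihood ratio lands between the two thresholds, so the zero-one loss decides ω1 while the loss that penalizes misclassifying ω2 more heavily decides ω2, illustrating how the loss matrix shifts the decision boundary.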

Classifiers, Discriminant Functions and Decision Surfaces

The multi-category case

Set of discriminant functions gi(x), i = 1, …, c

The classifier assigns a feature vector x to class ωi if:

gi(x) > gj(x) for all j ≠ i


Let gi(x) = −R(αi | x)
(max. discriminant corresponds to min. risk!)

For the minimum error rate, we take

gi(x) = P(ωi | x)
(max. discriminant corresponds to max. posterior!)

gi(x) ∝ P(x | ωi) P(ωi)

gi(x) = ln P(x | ωi) + ln P(ωi)
(ln: natural logarithm!)
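
As a sketch, a minimum-error-rate classifier built from the log form gi(x) = ln P(x | ωi) + ln P(ωi); the univariate Gaussian class-conditional densities and priors below are hypothetical stand-ins, not values from the text.

```python
import numpy as np

# Hypothetical class models: univariate Gaussian likelihoods and priors.
means  = np.array([0.0, 2.0, 4.0])
sigmas = np.array([1.0, 1.0, 0.5])
priors = np.array([0.5, 0.3, 0.2])

def g(x):
    """Discriminants g_i(x) = ln p(x | omega_i) + ln P(omega_i), all i at once."""
    log_likelihood = (-0.5 * ((x - means) / sigmas) ** 2
                      - np.log(np.sqrt(2 * np.pi) * sigmas))
    return log_likelihood + np.log(priors)

def classify(x):
    """Assign x to the class omega_i with the largest discriminant g_i(x)."""
    return np.argmax(g(x)) + 1   # class labels 1..c

print(classify(0.5), classify(3.9))   # expected: 1 3
```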


The feature space is divided into c decision regions:
if gi(x) > gj(x) for all j ≠ i, then x is in Ri
(Ri means: assign x to ωi)
The two-category case

A classifier is a dichotomizer that has two discriminant functions g1 and g2.

Let g(x) ≡ g1(x) − g2(x)

Decide ω1 if g(x) > 0; otherwise decide ω2

The computation of g(x):

$$g(x) = P(\omega_1 \mid x) - P(\omega_2 \mid x) = \ln \frac{P(x \mid \omega_1)}{P(x \mid \omega_2)} + \ln \frac{P(\omega_1)}{P(\omega_2)}$$
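
A minimal sketch of a dichotomizer in this log-ratio form, assuming two hypothetical Gaussian class-conditionals with a shared variance and made-up priors:

```python
import numpy as np

# Hypothetical two-class problem: Gaussian likelihoods with a shared sigma.
mu1, mu2 = 0.0, 2.0          # class-conditional means
sigma = 1.0                  # shared standard deviation
P1, P2 = 0.7, 0.3            # priors P(omega_1), P(omega_2)

def g(x):
    """g(x) = ln[p(x|omega_1)/p(x|omega_2)] + ln[P(omega_1)/P(omega_2)]."""
    log_likelihood_ratio = (0.5 * ((x - mu2) / sigma) ** 2
                            - 0.5 * ((x - mu1) / sigma) ** 2)
    return log_likelihood_ratio + np.log(P1 / P2)

x = 1.2
print("decide omega_1" if g(x) > 0 else "decide omega_2")
```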


The Normal Density

Univariate density
A density that is analytically tractable
A continuous density
Many processes are asymptotically Gaussian
Handwritten characters and speech sounds can be viewed as ideal or prototype patterns corrupted by a random process (central limit theorem)

$$P(x) = \frac{1}{\sqrt{2\pi}\,\sigma} \exp\left[ -\frac{1}{2} \left( \frac{x - \mu}{\sigma} \right)^{2} \right]$$

where:
μ = mean (or expected value) of x
σ² = expected squared deviation, or variance
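
For concreteness, the univariate density can be evaluated directly with NumPy; the μ and σ used below are arbitrary illustrative values.

```python
import numpy as np

def univariate_normal(x, mu, sigma):
    """P(x) = exp(-0.5 * ((x - mu) / sigma)**2) / (sqrt(2*pi) * sigma)."""
    return np.exp(-0.5 * ((x - mu) / sigma) ** 2) / (np.sqrt(2 * np.pi) * sigma)

# Peak of the standard normal (mu = 0, sigma = 1): 1/sqrt(2*pi) ~ 0.3989
print(univariate_normal(0.0, mu=0.0, sigma=1.0))
```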

Multivariate density
The multivariate normal density in d dimensions is:

$$P(x) = \frac{1}{(2\pi)^{d/2}\, |\Sigma|^{1/2}} \exp\left[ -\frac{1}{2} (x - \mu)^{t} \Sigma^{-1} (x - \mu) \right]$$

where:
x = (x1, x2, …, xd)^t (t stands for the transpose vector form)
μ = (μ1, μ2, …, μd)^t is the mean vector
Σ = d×d covariance matrix
|Σ| and Σ⁻¹ are its determinant and inverse, respectively
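
A corresponding sketch for the d-dimensional case using NumPy's linear-algebra routines; the mean vector and covariance matrix below are arbitrary examples.

```python
import numpy as np

def multivariate_normal(x, mu, sigma):
    """P(x) = exp(-0.5 (x-mu)^t Sigma^-1 (x-mu)) / ((2 pi)^(d/2) |Sigma|^(1/2))."""
    d = len(mu)
    diff = x - mu
    norm = (2 * np.pi) ** (d / 2) * np.sqrt(np.linalg.det(sigma))
    return np.exp(-0.5 * diff @ np.linalg.inv(sigma) @ diff) / norm

mu = np.array([0.0, 1.0])
sigma = np.array([[1.0, 0.3],
                  [0.3, 2.0]])   # d x d covariance matrix
print(multivariate_normal(np.array([0.5, 1.5]), mu, sigma))
```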

