
Ho Chi Minh City University of Technology

Faculty of Electrical and Electronics Engineering


Department of Telecommunications
Lectured by Ha Hoang Kha, Ph.D.
Ho Chi Minh City University of Technology
Email: hahoangkha@gmail.com
Bayesian Classifiers
Content
Bayes classifier introduction
Bayes classifiers
Naïve Bayes classifiers
Optimum statistical classifiers
References
Slides are adapted from the following resource:
R. C. Gonzalez and R. E. Woods, Digital Image Processing, 2nd ed., Prentice Hall, 2002.
1. Bayesian Classifier Introduction

[Slides 4–11: introductory figures and motivating examples; the graphical content is not recoverable from this extraction.]
2. Bayesian Classifier

[Slides 12–16: formulation of the Bayesian classifier; content not recoverable from this extraction.]

Naïve Bayesian Classifier

[Slides 17–18: the naïve Bayes formulation; content not recoverable from this extraction.]
Advantages and Disadvantages of the Naïve Bayesian Classifier
Advantages:
Fast to train (a single scan of the data) and fast to classify
Not sensitive to irrelevant features
Handles both real-valued and discrete data
Handles streaming data well
Disadvantages:
Assumes conditional independence of the features, which rarely holds exactly in practice
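To see how these properties arise, here is a minimal Gaussian naïve Bayes sketch in Python (not taken from the slides; the class name, the per-feature Gaussian model, and the small variance floor are our own choices). Training is a single scan that stores per-class priors, means, and variances, and classification maximizes ln P(ω_j) + Σ_i ln p(x_i | ω_j):

```python
import numpy as np

class GaussianNaiveBayes:
    """Naive Bayes with one 1-D Gaussian per feature and class:
    features are assumed conditionally independent given the class."""

    def fit(self, X, y):
        self.classes = np.unique(y)
        self.priors, self.means, self.vars = {}, {}, {}
        for c in self.classes:
            Xc = X[y == c]
            self.priors[c] = len(Xc) / len(X)     # P(w_j) from class frequencies
            self.means[c] = Xc.mean(axis=0)
            self.vars[c] = Xc.var(axis=0) + 1e-9  # floor to avoid zero variance
        return self

    def predict(self, X):
        # Decide argmax_j [ ln P(w_j) + sum_i ln p(x_i | w_j) ]
        scores = np.column_stack([
            np.log(self.priors[c])
            - 0.5 * np.sum(np.log(2 * np.pi * self.vars[c])
                           + (X - self.means[c]) ** 2 / self.vars[c], axis=1)
            for c in self.classes
        ])
        return self.classes[np.argmax(scores, axis=1)]
```

Because training only accumulates per-class sufficient statistics, a single pass over the data suffices, and the same statistics can be updated incrementally for streaming data.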
3. Optimum Statistical Classifiers
The probability that a pattern x comes from class ω_i is p(ω_i | x).
The average loss (risk) incurred in assigning x to class ω_j is

r_j(\mathbf{x}) = \sum_{k=1}^{W} L_{kj}\, p(\omega_k \mid \mathbf{x})

where L_{kj} is the loss incurred if x actually came from class ω_k but is assigned to class ω_j, and W is the number of classes.
Using basic probability theory, p(A | B) p(B) = p(B | A) p(A), we get

r_j(\mathbf{x}) = \frac{1}{p(\mathbf{x})} \sum_{k=1}^{W} L_{kj}\, p(\mathbf{x} \mid \omega_k)\, P(\omega_k)
Bayes Classifier
Because 1/p(x) is positive and common to all r_j(x), it can be dropped without affecting the comparison among the r_j(x):

r_j(\mathbf{x}) = \sum_{k=1}^{W} L_{kj}\, p(\mathbf{x} \mid \omega_k)\, P(\omega_k)    (1)

The classifier assigns x to the class with the smallest average loss; this is the Bayes classifier. That is, x is assigned to class ω_i if

\sum_{k=1}^{W} L_{ki}\, p(\mathbf{x} \mid \omega_k)\, P(\omega_k) < \sum_{q=1}^{W} L_{qj}\, p(\mathbf{x} \mid \omega_q)\, P(\omega_q) \quad \text{for all } j \neq i
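To make the rule concrete, here is a small numerical sketch (the loss matrix, priors, and likelihood values are made-up illustration numbers, not from the slides) that evaluates the average losses r_j(x) of Eq. (1) and assigns x to the class with the smallest one:

```python
import numpy as np

# Hypothetical W = 2 problem. L[k, j] is the loss of assigning x to
# class j+1 when it actually came from class k+1 (zero on the diagonal).
L = np.array([[0.0, 1.0],
              [5.0, 0.0]])          # missing class 2 costs five times more
priors = np.array([0.7, 0.3])       # P(w_k)
likelihoods = np.array([0.2, 0.6])  # p(x | w_k) at the observed x

# Eq. (1): r_j(x) = sum_k L[k, j] p(x | w_k) P(w_k)
risks = L.T @ (likelihoods * priors)
print(risks)                 # [0.9  0.14]
print(np.argmin(risks) + 1)  # -> 2: assign x to class w_2
```

With a 0-1 loss the same computation reduces to comparing p(x | ω_j) P(ω_j), which is exactly what the next slides derive.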
The Loss Function L_{ij}
Assign zero loss to a correct decision and the same nonzero value (say, 1) to any incorrect decision:

L_{ij} = 1 - \delta_{ij}    (2)

where \delta_{ij} = 1 if i = j and \delta_{ij} = 0 if i \neq j.
Bayes Classifier
Substituting (2) into (1) yields

r_j(\mathbf{x}) = \sum_{k=1}^{W} (1 - \delta_{kj})\, p(\mathbf{x} \mid \omega_k)\, P(\omega_k) = p(\mathbf{x}) - p(\mathbf{x} \mid \omega_j)\, P(\omega_j)

Since p(x) is common to all classes, it is dropped; the classifier assigns x to class ω_i if, for all j \neq i,

p(\mathbf{x} \mid \omega_i)\, P(\omega_i) > p(\mathbf{x} \mid \omega_j)\, P(\omega_j), \quad j = 1, 2, \ldots, W
Decision Function
Using the Bayes classifier with a 0-1 loss function, the decision function for class ω_j is

d_j(\mathbf{x}) = p(\mathbf{x} \mid \omega_j)\, P(\omega_j), \quad j = 1, 2, \ldots, W

Now the questions are:
How to get P(ω_j)?
How to estimate p(x | ω_j)?
Using the Gaussian Distribution
The most prevalent form assumed for p(x | ω_j) is the Gaussian probability density function.
Consider first a 1-D problem with two pattern classes (W = 2):

d_j(x) = p(x \mid \omega_j)\, P(\omega_j) = \frac{1}{\sqrt{2\pi}\,\sigma_j}\, e^{-\frac{(x - m_j)^2}{2\sigma_j^2}}\, P(\omega_j), \quad j = 1, 2

where m_j is the mean and \sigma_j^2 is the variance of class ω_j.
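As a quick sketch (the parameter values are arbitrary illustration numbers), the two decision functions can be evaluated on a grid to locate the decision boundary numerically:

```python
import numpy as np

def d(x, m, var, prior):
    """1-D Gaussian decision function d_j(x) = p(x | w_j) P(w_j)."""
    return prior * np.exp(-(x - m) ** 2 / (2 * var)) / np.sqrt(2 * np.pi * var)

x = np.linspace(-2.0, 6.0, 801)
d1 = d(x, m=1.0, var=1.0, prior=0.5)
d2 = d(x, m=3.0, var=1.0, prior=0.5)
# Assign class 1 where d1 > d2, class 2 elsewhere; the boundary is the
# crossing point of the two curves (the midpoint x = 2 for these values).
print(x[np.argmin(np.abs(d1 - d2))])  # 2.0
```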
Example
With d_j(x) = p(x \mid \omega_j)\, P(\omega_j), where is the decision boundary if
1. P(\omega_1) = P(\omega_2)
2. P(\omega_1) > P(\omega_2)
3. P(\omega_1) < P(\omega_2)
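One worked answer, under the extra assumption (not stated on the slide) that the two variances are equal, \sigma_1 = \sigma_2 = \sigma: the boundary x_0 satisfies d_1(x_0) = d_2(x_0), and taking logarithms gives

\ln P(\omega_1) - \frac{(x_0 - m_1)^2}{2\sigma^2} = \ln P(\omega_2) - \frac{(x_0 - m_2)^2}{2\sigma^2}
\quad\Longrightarrow\quad
x_0 = \frac{m_1 + m_2}{2} + \frac{\sigma^2}{m_1 - m_2}\, \ln\frac{P(\omega_2)}{P(\omega_1)}

In case 1 the boundary is the midpoint (m_1 + m_2)/2; in case 2 the logarithm is negative and x_0 shifts toward m_2, enlarging the region assigned to the more probable class \omega_1; case 3 is the mirror image.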
N-D Gaussian
For the jth pattern class,

p(\mathbf{x} \mid \omega_j) = \frac{1}{(2\pi)^{n/2}\, |\mathbf{C}_j|^{1/2}}\, e^{-\frac{1}{2} (\mathbf{x} - \mathbf{m}_j)^T \mathbf{C}_j^{-1} (\mathbf{x} - \mathbf{m}_j)}

where the mean vector and covariance matrix of class ω_j are estimated from its N_j training patterns as

\mathbf{m}_j = \frac{1}{N_j} \sum_{\mathbf{x} \in \omega_j} \mathbf{x}, \qquad \mathbf{C}_j = \frac{1}{N_j} \sum_{\mathbf{x} \in \omega_j} \mathbf{x}\mathbf{x}^T - \mathbf{m}_j \mathbf{m}_j^T
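These estimates translate directly into code; a sketch (function and variable names are our own):

```python
import numpy as np

def estimate_class_params(X):
    """Mean vector m_j and covariance matrix C_j of one class,
    estimated from the rows of X (its N_j training patterns)
    with the formulas above."""
    N = X.shape[0]
    m = X.sum(axis=0) / N                # m_j = (1/N_j) sum_x x
    C = (X.T @ X) / N - np.outer(m, m)   # C_j = (1/N_j) sum_x x x^T - m_j m_j^T
    return m, C
```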
N-D Gaussian
It is more convenient to work with the logarithm of the decision function:

d_j(\mathbf{x}) = \ln[p(\mathbf{x} \mid \omega_j)\, P(\omega_j)] = \ln P(\omega_j) - \frac{n}{2} \ln 2\pi - \frac{1}{2} \ln |\mathbf{C}_j| - \frac{1}{2} (\mathbf{x} - \mathbf{m}_j)^T \mathbf{C}_j^{-1} (\mathbf{x} - \mathbf{m}_j)

If all covariance matrices are equal (\mathbf{C}_j = \mathbf{C}), the terms common to all classes drop out and the decision function becomes linear in x:

d_j(\mathbf{x}) = \ln P(\omega_j) + \mathbf{x}^T \mathbf{C}^{-1} \mathbf{m}_j - \frac{1}{2} \mathbf{m}_j^T \mathbf{C}^{-1} \mathbf{m}_j
For C = I
If \mathbf{C} = \mathbf{I} (the identity matrix) and P(\omega_j) = 1/W for every class, we get

d_j(\mathbf{x}) = \mathbf{x}^T \mathbf{m}_j - \frac{1}{2} \mathbf{m}_j^T \mathbf{m}_j, \quad j = 1, 2, \ldots, W

which is the minimum distance classifier: since \|\mathbf{x} - \mathbf{m}_j\|^2 = \mathbf{x}^T\mathbf{x} - 2\mathbf{x}^T\mathbf{m}_j + \mathbf{m}_j^T\mathbf{m}_j and \mathbf{x}^T\mathbf{x} is common to all classes, maximizing d_j(\mathbf{x}) is equivalent to assigning x to the nearest class mean.
Gaussian pattern classes satisfying these conditions are spherical clouds of identical shape in N-D space.
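A minimal sketch of the resulting classifier, using the class means from the example on the next slide:

```python
import numpy as np

def min_distance_classify(x, means):
    """Assign x to the class maximizing d_j(x) = x^T m_j - 0.5 m_j^T m_j,
    i.e., to the nearest class mean."""
    scores = [x @ m - 0.5 * (m @ m) for m in means]
    return int(np.argmax(scores))

means = [np.array([3, 1, 1]) / 4, np.array([1, 3, 3]) / 4]
print(min_distance_classify(np.array([1.0, 0.0, 0.0]), means))  # -> 0 (class 1)
```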
Example
Two pattern classes with

\mathbf{m}_1 = \frac{1}{4} \begin{pmatrix} 3 \\ 1 \\ 1 \end{pmatrix}, \qquad \mathbf{m}_2 = \frac{1}{4} \begin{pmatrix} 1 \\ 3 \\ 3 \end{pmatrix}, \qquad \mathbf{C}_1 = \mathbf{C}_2 = \frac{1}{16} \begin{pmatrix} 3 & 1 & 1 \\ 1 & 3 & -1 \\ 1 & -1 & 3 \end{pmatrix}

[Figure on slide 30: the 3-D patterns and the resulting decision boundary.]
Example
Assuming P(\omega_1) = P(\omega_2) = 1/2 and dropping \ln P(\omega_j), which is common to all classes, the decision functions reduce to

d_j(\mathbf{x}) = \mathbf{x}^T \mathbf{C}^{-1} \mathbf{m}_j - \frac{1}{2} \mathbf{m}_j^T \mathbf{C}^{-1} \mathbf{m}_j

with

\mathbf{C}^{-1} = \begin{pmatrix} 8 & -4 & -4 \\ -4 & 8 & 4 \\ -4 & 4 & 8 \end{pmatrix}

We get

d_1(\mathbf{x}) = 4x_1 - 1.5 \qquad \text{and} \qquad d_2(\mathbf{x}) = -4x_1 + 8x_2 + 8x_3 - 5.5

The decision surface is

d_1(\mathbf{x}) - d_2(\mathbf{x}) = 8x_1 - 8x_2 - 8x_3 + 4 = 0
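These numbers can be checked directly; a verification sketch that recomputes the weights and biases of the two linear decision functions:

```python
import numpy as np

m1 = np.array([3, 1, 1]) / 4
m2 = np.array([1, 3, 3]) / 4
C = np.array([[3, 1, 1],
              [1, 3, -1],
              [1, -1, 3]]) / 16
C_inv = np.linalg.inv(C)

# d_j(x) = x^T C^{-1} m_j - 0.5 m_j^T C^{-1} m_j
for m in (m1, m2):
    w = C_inv @ m
    b = -0.5 * m @ C_inv @ m
    print(np.round(w, 6), round(b, 6))
# ~ [4. 0. 0.] -1.5 and [-4. 8. 8.] -5.5, so
# d_1(x) - d_2(x) = 8 x1 - 8 x2 - 8 x3 + 4 = 0 is the decision surface.
```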