ENGR 691/692 Section 66 (Fall 06): Machine Learning


Homework 1: Bayesian Decision Theory (solutions)

Assigned: August 30
Due: September 13

Problem 1: (22 pts) Let the conditional densities for a two-category one-dimensional problem be given by the following Cauchy distribution:
$$p(x|\omega_i) = \frac{1}{\pi b}\cdot\frac{1}{1 + \left(\frac{x - a_i}{b}\right)^2}\,, \quad i = 1, 2.$$
1. (6 pts) By explicit integration, check that the distributions are indeed normalized.
2. (9 pts) Assuming $P(\omega_1) = P(\omega_2)$, show that $P(\omega_1|x) = P(\omega_2|x)$ if $x = \frac{a_1 + a_2}{2}$, that is, the minimum error decision boundary is a point midway between the peaks of the two distributions, regardless of $b$.

3. (7 pts) Show that the minimum probability of error is given by
$$P(\text{error}) = \frac{1}{2} - \frac{1}{\pi}\tan^{-1}\left|\frac{a_1 - a_2}{2b}\right|.$$
Answer:


1. We have
$$\int_{-\infty}^{\infty} p(x|\omega_i)\,dx = \frac{1}{\pi b}\int_{-\infty}^{\infty} \frac{1}{1 + \left(\frac{x - a_i}{b}\right)^2}\,dx\,.$$
We substitute $y = \frac{x - a_i}{b}$ into the above and get
$$\int_{-\infty}^{\infty} p(x|\omega_i)\,dx = \frac{1}{\pi}\int_{-\infty}^{\infty}\frac{1}{1 + y^2}\,dy = \frac{1}{\pi}\Big[\tan^{-1}(y)\Big]_{-\infty}^{\infty} = \frac{1}{\pi}\left(\frac{\pi}{2} + \frac{\pi}{2}\right) = 1\,.$$

2. By setting $p(x|\omega_1)P(\omega_1) = p(x|\omega_2)P(\omega_2)$, we have
$$\frac{1}{2}\cdot\frac{1}{\pi b}\cdot\frac{1}{1 + \left(\frac{x - a_1}{b}\right)^2} = \frac{1}{2}\cdot\frac{1}{\pi b}\cdot\frac{1}{1 + \left(\frac{x - a_2}{b}\right)^2}\,,$$
or, equivalently,
$$x - a_1 = \pm(x - a_2)\,.$$
For $a_1 \neq a_2$, this implies that $x = \frac{a_1 + a_2}{2}$.

3. Without loss of generality, we assume $a_2 > a_1$. The probability of error is defined as
$$P(\text{error}) = \int P(\text{error}, x)\,dx = \int P(\text{error}|x)\,p(x)\,dx\,.$$
Note that the decision boundary is at $\frac{a_1 + a_2}{2}$, hence
$$P(\text{error}|x) = \begin{cases} P(\omega_2|x) = \dfrac{p(x|\omega_2)P(\omega_2)}{p(x)} & \text{if } x \le \frac{a_1 + a_2}{2}\,,\\[8pt] P(\omega_1|x) = \dfrac{p(x|\omega_1)P(\omega_1)}{p(x)} & \text{if } x > \frac{a_1 + a_2}{2}\,. \end{cases}$$
Therefore, the probability of error is
$$P(\text{error}) = \int_{-\infty}^{\frac{a_1 + a_2}{2}} p(x|\omega_2)P(\omega_2)\,dx + \int_{\frac{a_1 + a_2}{2}}^{\infty} p(x|\omega_1)P(\omega_1)\,dx = \frac{1}{2\pi b}\int_{-\infty}^{\frac{a_1 + a_2}{2}} \frac{dx}{1 + \left(\frac{x - a_2}{b}\right)^2} + \frac{1}{2\pi b}\int_{\frac{a_1 + a_2}{2}}^{\infty} \frac{dx}{1 + \left(\frac{x - a_1}{b}\right)^2}\,.$$
We substitute $y = \frac{x - a_2}{b}$ and $z = \frac{x - a_1}{b}$ into the above and get
$$\begin{aligned}
P(\text{error}) &= \frac{1}{2\pi}\int_{-\infty}^{\frac{a_1 - a_2}{2b}} \frac{1}{1 + y^2}\,dy + \frac{1}{2\pi}\int_{\frac{a_2 - a_1}{2b}}^{\infty} \frac{1}{1 + z^2}\,dz\\
&= \frac{1}{2\pi}\Big[\tan^{-1}(y)\Big]_{-\infty}^{\frac{a_1 - a_2}{2b}} + \frac{1}{2\pi}\Big[\tan^{-1}(z)\Big]_{\frac{a_2 - a_1}{2b}}^{\infty}\\
&= \frac{1}{2\pi}\left[\tan^{-1}\left(\frac{a_1 - a_2}{2b}\right) + \frac{\pi}{2}\right] + \frac{1}{2\pi}\left[\frac{\pi}{2} - \tan^{-1}\left(\frac{a_2 - a_1}{2b}\right)\right]\\
&= \frac{1}{2} - \frac{1}{\pi}\tan^{-1}\left(\frac{a_2 - a_1}{2b}\right)\,.
\end{aligned}$$
Similarly, if $a_1 > a_2$, we have $P(\text{error}) = \frac{1}{2} - \frac{1}{\pi}\tan^{-1}\left(\frac{a_1 - a_2}{2b}\right)$. Therefore, we have shown that
$$P(\text{error}) = \frac{1}{2} - \frac{1}{\pi}\tan^{-1}\left|\frac{a_1 - a_2}{2b}\right|\,.$$
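As a quick numerical sanity check (not part of the original assignment), the closed form can be compared against direct numerical integration of the two error integrals; the parameter values below are arbitrary illustrative choices.

```python
import numpy as np
from scipy import integrate

a1, a2, b = 0.0, 2.0, 1.0  # arbitrary illustrative parameters

def cauchy(x, a):
    """Cauchy density p(x|omega_i) with location a and scale b."""
    return 1.0 / (np.pi * b * (1.0 + ((x - a) / b) ** 2))

mid = (a1 + a2) / 2.0  # minimum-error decision boundary

# P(error) = 1/2 * integral of p(x|w2) left of the boundary
#          + 1/2 * integral of p(x|w1) right of the boundary
left, _ = integrate.quad(cauchy, -np.inf, mid, args=(a2,))
right, _ = integrate.quad(cauchy, mid, np.inf, args=(a1,))
numeric = 0.5 * left + 0.5 * right

closed_form = 0.5 - np.arctan(abs(a1 - a2) / (2.0 * b)) / np.pi
print(numeric, closed_form)  # both print 0.25 for these parameters
```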

Problem 2: (21 pts) Let $\omega_{\max}(x)$ be the state of nature for which $P(\omega_{\max}|x) \ge P(\omega_i|x)$ for all $i$, $i = 1, \ldots, c$.
1. (7 pts) Show that $P(\omega_{\max}|x) \ge \frac{1}{c}$.
2. (7 pts) Show that for the minimum-error-rate decision rule the average probability of error is given by
$$P(\text{error}) = 1 - \int P(\omega_{\max}|x)\,p(x)\,dx\,.$$
3. (7 pts) Show that $P(\text{error}) \le \frac{c-1}{c}$.
Answer:
1. Since $P(\omega_{\max}|x) \ge P(\omega_i|x)$ for all $i$, we have
$$c\,P(\omega_{\max}|x) \ge \sum_{i=1}^{c} P(\omega_i|x) = 1\,,$$
which implies that $P(\omega_{\max}|x) \ge \frac{1}{c}$.
2. By definition,
$$P(\text{error}) = \int P(\text{error}|x)\,p(x)\,dx = \int \left[1 - P(\omega_{\max}|x)\right] p(x)\,dx = 1 - \int P(\omega_{\max}|x)\,p(x)\,dx\,.$$
3. From 1 and 2, it is clear that
$$P(\text{error}) = 1 - \int P(\omega_{\max}|x)\,p(x)\,dx \le 1 - \int \frac{1}{c}\,p(x)\,dx = 1 - \frac{1}{c} = \frac{c-1}{c}\,.$$
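The bound is easy to confirm numerically. The sketch below (not part of the original solution) evaluates $P(\text{error}) = 1 - \int P(\omega_{\max}|x)\,p(x)\,dx$ for the three-category Gaussian setup that appears later in Question 5 and checks it against $(c-1)/c$.

```python
import numpy as np
from scipy import integrate
from scipy.stats import norm

priors = np.array([0.5, 0.25, 0.25])  # P(omega_1), P(omega_2), P(omega_3)
means = np.array([0.0, 0.5, 1.0])     # unit-variance class conditionals

def integrand(x):
    """P(omega_max|x) * p(x) = max_j p(x|omega_j) P(omega_j)."""
    return (priors * norm.pdf(x, loc=means, scale=1.0)).max()

val, _ = integrate.quad(integrand, -np.inf, np.inf)
p_error = 1.0 - val
c = len(priors)
print(p_error, (c - 1) / c)  # P(error) stays well below the (c-1)/c bound
```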

Problem 3: (22 pts) In many machine learning applications, one has the option either to assign the pattern to one of $c$ classes, or to reject it as being unrecognizable. If the cost for rejects is not too high, rejection may be a desirable action. Let
$$\lambda(\alpha_i|\omega_j) = \begin{cases} 0 & i = j\,, \quad i, j = 1, \ldots, c\,,\\ \lambda_r & i = c + 1\,,\\ \lambda_s & \text{otherwise}\,, \end{cases}$$
where $\lambda_r$ is the loss incurred for choosing the $(c+1)$-th action, rejection, and $\lambda_s$ is the loss incurred for making any substitution error.
1. (10 pts) Please derive the decision rule with the minimum risk.
2. (6 pts) What happens if $\lambda_r = 0$?
3. (6 pts) What happens if $\lambda_r > \lambda_s$?
Answer:
1. For $i = 1, \ldots, c$,
$$R(\alpha_i|x) = \sum_{j=1}^{c} \lambda(\alpha_i|\omega_j)\,P(\omega_j|x) = \lambda_s \sum_{\substack{j=1\\ j\neq i}}^{c} P(\omega_j|x) = \lambda_s\left[1 - P(\omega_i|x)\right].$$
For $i = c + 1$,
$$R(\alpha_{c+1}|x) = \lambda_r\,.$$
Therefore, the minimum risk is achieved if we decide $\omega_i$ when $R(\alpha_i|x) \le R(\alpha_{c+1}|x)$, i.e., $P(\omega_i|x) \ge 1 - \frac{\lambda_r}{\lambda_s}$, and reject otherwise.
2. If $\lambda_r = 0$, we always reject.
3. If $\lambda_r > \lambda_s$, we will never reject.
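A minimal sketch of this reject-option rule, assuming class posteriors are already available (the function name and the numbers in the example are illustrative, not from the original):

```python
import numpy as np

def classify_with_reject(posteriors, lambda_r, lambda_s):
    """Return the chosen class index, or -1 to signal rejection.

    posteriors: array of P(omega_i|x), i = 1..c.
    Decide the maximum-posterior class when its posterior is at
    least 1 - lambda_r/lambda_s; otherwise rejecting is cheaper.
    """
    best = int(np.argmax(posteriors))
    if posteriors[best] >= 1.0 - lambda_r / lambda_s:
        return best
    return -1  # reject

# A confident posterior is classified; an ambiguous one is rejected.
print(classify_with_reject(np.array([0.9, 0.05, 0.05]), 0.3, 1.0))  # 0
print(classify_with_reject(np.array([0.4, 0.35, 0.25]), 0.3, 1.0))  # -1
```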
Problem 4: (12 pts + 10 extra pts) Let the components of the vector $\mathbf{x} = [x_1, \ldots, x_d]^T$ be binary-valued (0 or 1), and let $P(\omega_j)$ be the prior probability for the state of nature $\omega_j$, $j = 1, \ldots, c$. We define
$$p_{ij} = P(x_i = 1|\omega_j)\,, \quad i = 1, \ldots, d\,, \quad j = 1, \ldots, c\,,$$
with the components $x_i$ being statistically independent for all $\mathbf{x}$ in $\omega_j$.
1. (12 pts) Show that the minimum probability of error is achieved by the following decision rule: Decide $\omega_k$ if $g_k(\mathbf{x}) \ge g_j(\mathbf{x})$ for all $j$ and $k$, where
$$g_j(\mathbf{x}) = \sum_{i=1}^{d} x_i \ln\frac{p_{ij}}{1 - p_{ij}} + \sum_{i=1}^{d} \ln(1 - p_{ij}) + \ln P(\omega_j)\,.$$
2. (10 extra pts) If the components of $\mathbf{x}$ are ternary-valued ($-1$, 0, or 1), show that a minimum probability of error decision rule can be derived that involves discriminant functions $g_j(\mathbf{x})$ that are quadratic functions of the components $x_i$.

Answer:
1. Consider the following discriminant function:
$$g_j(\mathbf{x}) = \ln\left[p(\mathbf{x}|\omega_j)P(\omega_j)\right] = \ln p(\mathbf{x}|\omega_j) + \ln P(\omega_j)\,.$$
Since the components of $\mathbf{x}$ are statistically independent for all $\mathbf{x}$ in $\omega_j$, we can write the density as a product:
$$p(\mathbf{x}|\omega_j) = \prod_{i=1}^{d} p(x_i|\omega_j) = \prod_{i=1}^{d} p_{ij}^{x_i}(1 - p_{ij})^{1 - x_i}\,.$$
Thus we have the discriminant function
$$g_j(\mathbf{x}) = \sum_{i=1}^{d}\left[x_i \ln p_{ij} + (1 - x_i)\ln(1 - p_{ij})\right] + \ln P(\omega_j) = \sum_{i=1}^{d} x_i \ln\frac{p_{ij}}{1 - p_{ij}} + \sum_{i=1}^{d} \ln(1 - p_{ij}) + \ln P(\omega_j)\,.$$
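As an illustrative check (all probabilities below are made up), the linear form of $g_j$ can be verified against the log joint probability computed directly from the product density:

```python
import numpy as np

P_w = np.array([0.6, 0.4])    # made-up priors P(omega_j)
p = np.array([[0.8, 0.3],     # p[i, j] = P(x_i = 1 | omega_j)
              [0.5, 0.9],
              [0.2, 0.4]])

def g(x, j):
    """Linear discriminant g_j(x) from the solution above."""
    return (x @ np.log(p[:, j] / (1 - p[:, j]))
            + np.log(1 - p[:, j]).sum() + np.log(P_w[j]))

x = np.array([1, 0, 1])
for j in range(2):
    direct = np.log(np.prod(p[:, j]**x * (1 - p[:, j])**(1 - x)) * P_w[j])
    assert np.isclose(g(x, j), direct)  # the two forms agree

print("decide class", max(range(2), key=lambda j: g(x, j)))
```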

2. Consider the following discriminant function:
$$g_j(\mathbf{x}) = \ln\left[p(\mathbf{x}|\omega_j)P(\omega_j)\right] = \ln p(\mathbf{x}|\omega_j) + \ln P(\omega_j)\,.$$
The components of $\mathbf{x}$ are statistically independent for all $\mathbf{x}$ in $\omega_j$, therefore
$$p(\mathbf{x}|\omega_j) = \prod_{i=1}^{d} p(x_i|\omega_j)\,.$$
Let
$$p_{ij} = P(x_i = 1|\omega_j)\,, \quad q_{ij} = P(x_i = 0|\omega_j)\,, \quad r_{ij} = P(x_i = -1|\omega_j)\,.$$
It is not hard to check that
$$p(x_i|\omega_j) = p_{ij}^{\frac{1}{2}x_i + \frac{1}{2}x_i^2}\, q_{ij}^{1 - x_i^2}\, r_{ij}^{-\frac{1}{2}x_i + \frac{1}{2}x_i^2}\,,$$
since the exponents select exactly one of the three factors for each of $x_i = 1, 0, -1$.

Thus the discriminant functions can be written as
$$\begin{aligned}
g_j(\mathbf{x}) &= \sum_{i=1}^{d}\left[\left(\tfrac{1}{2}x_i + \tfrac{1}{2}x_i^2\right)\ln p_{ij} + (1 - x_i^2)\ln q_{ij} + \left(-\tfrac{1}{2}x_i + \tfrac{1}{2}x_i^2\right)\ln r_{ij}\right] + \ln P(\omega_j)\\
&= \frac{1}{2}\sum_{i=1}^{d} x_i^2 \ln\frac{p_{ij}\,r_{ij}}{q_{ij}^2} + \frac{1}{2}\sum_{i=1}^{d} x_i \ln\frac{p_{ij}}{r_{ij}} + \sum_{i=1}^{d} \ln q_{ij} + \ln P(\omega_j)\,,
\end{aligned}$$
which are quadratic functions of the components $x_i$.
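A quick sketch (with made-up probabilities for a single component) verifying the exponent identity used above for all three ternary values:

```python
import numpy as np

# Made-up probabilities for one (i, j) pair; they sum to 1
p_ij, q_ij, r_ij = 0.5, 0.3, 0.2  # P(x=1), P(x=0), P(x=-1)

def density(x):
    """p(x_i|omega_j) written via the exponent trick from the solution."""
    return (p_ij ** (0.5 * x + 0.5 * x**2)
            * q_ij ** (1 - x**2)
            * r_ij ** (-0.5 * x + 0.5 * x**2))

# The exponents select exactly one factor for each value of x_i
assert np.isclose(density(1), p_ij)
assert np.isclose(density(0), q_ij)
assert np.isclose(density(-1), r_ij)
```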
Question 5: (23 pts) Suppose we have three categories with prior probabilities $P(\omega_1) = 0.5$, $P(\omega_2) = P(\omega_3) = 0.25$ and the class-conditional probability distributions
$$p(x|\omega_1) \sim N(0, 1)\,, \quad p(x|\omega_2) \sim N(0.5, 1)\,, \quad p(x|\omega_3) \sim N(1, 1)\,,$$
where $N(\mu, \sigma^2)$ represents the normal distribution with density function
$$p(x) = \frac{1}{\sqrt{2\pi}\,\sigma}\, e^{-\frac{(x - \mu)^2}{2\sigma^2}}\,.$$
We sample the following sequence of four points: $x = 0.6, 0.1, 0.9, 1.1$.
1. (9 pts) Calculate explicitly the probability that the sequence actually came from $\omega_1, \omega_3, \omega_3, \omega_2$.
2. (6 pts) Repeat for the sequence $\omega_1, \omega_2, \omega_2, \omega_3$.
3. (8 pts) Find the sequence of states having the maximum probability.
Answer: It is straightforward to compute that
$$\begin{array}{lll}
p(0.6|\omega_1) = 0.333225 & p(0.6|\omega_2) = 0.396953 & p(0.6|\omega_3) = 0.368270\\
p(0.1|\omega_1) = 0.396953 & p(0.1|\omega_2) = 0.368270 & p(0.1|\omega_3) = 0.266085\\
p(0.9|\omega_1) = 0.266085 & p(0.9|\omega_2) = 0.368270 & p(0.9|\omega_3) = 0.396953\\
p(1.1|\omega_1) = 0.217852 & p(1.1|\omega_2) = 0.333225 & p(1.1|\omega_3) = 0.396953\,.
\end{array}$$
We denote $X = (x_1, x_2, x_3, x_4)$ and $\omega = (\omega(1), \omega(2), \omega(3), \omega(4))$. Clearly, there are $3^4 = 81$ possible values of $\omega$, such as
$$(\omega_1, \omega_1, \omega_1, \omega_1)\,,\ (\omega_1, \omega_1, \omega_1, \omega_2)\,,\ (\omega_1, \omega_1, \omega_1, \omega_3)\,,\ (\omega_1, \omega_1, \omega_2, \omega_1)\,,\ \ldots,\ (\omega_3, \omega_3, \omega_3, \omega_3)\,.$$
For each possible value of $\omega$, we calculate $P(\omega)$ and $p(X|\omega)$ using the following expressions, which assume that the $x_i$ and the $\omega(i)$ are independent across positions:
$$p(X|\omega) = \prod_{i=1}^{4} p(x_i|\omega(i))\,, \qquad P(\omega) = \prod_{i=1}^{4} P(\omega(i))\,.$$

For example, if $\omega = (\omega_1, \omega_3, \omega_3, \omega_2)$ and $X = (0.6, 0.1, 0.9, 1.1)$, then we have
$$\begin{aligned}
p(X|\omega) &= p((0.6, 0.1, 0.9, 1.1)|(\omega_1, \omega_3, \omega_3, \omega_2))\\
&= p(0.6|\omega_1)\,p(0.1|\omega_3)\,p(0.9|\omega_3)\,p(1.1|\omega_2)\\
&= 0.333225 \times 0.266085 \times 0.396953 \times 0.333225\\
&= 0.01173
\end{aligned}$$
and
$$P(\omega) = P(\omega(1))\,P(\omega(2))\,P(\omega(3))\,P(\omega(4)) = \frac{1}{2}\cdot\frac{1}{4}\cdot\frac{1}{4}\cdot\frac{1}{4} = 0.0078125\,.$$

1. Given $X = (0.6, 0.1, 0.9, 1.1)$ and $\omega = (\omega_1, \omega_3, \omega_3, \omega_2)$, we have
$$\begin{aligned}
p(X) &= p(x_1 = 0.6, x_2 = 0.1, x_3 = 0.9, x_4 = 1.1)\\
&= \sum_{\omega} p(x_1 = 0.6, x_2 = 0.1, x_3 = 0.9, x_4 = 1.1|\omega)\,P(\omega)\\
&= p(0.6|\omega_1)\,p(0.1|\omega_1)\,p(0.9|\omega_1)\,p(1.1|\omega_1)\,P(\omega_1)P(\omega_1)P(\omega_1)P(\omega_1)\\
&\quad + p(0.6|\omega_1)\,p(0.1|\omega_1)\,p(0.9|\omega_1)\,p(1.1|\omega_2)\,P(\omega_1)P(\omega_1)P(\omega_1)P(\omega_2)\\
&\quad + \cdots\\
&\quad + p(0.6|\omega_3)\,p(0.1|\omega_3)\,p(0.9|\omega_3)\,p(1.1|\omega_3)\,P(\omega_3)P(\omega_3)P(\omega_3)P(\omega_3)\\
&= 0.012083\,.
\end{aligned}$$
Therefore,
$$P(\omega|X) = P(\omega_1, \omega_3, \omega_3, \omega_2|0.6, 0.1, 0.9, 1.1) = \frac{p(0.6, 0.1, 0.9, 1.1|\omega_1, \omega_3, \omega_3, \omega_2)\,P(\omega_1, \omega_3, \omega_3, \omega_2)}{p(X)} = \frac{0.01173 \times 0.0078125}{0.012083} = 0.007584\,.$$

2. Following the steps in part 1, we have
$$P(\omega_1, \omega_2, \omega_2, \omega_3|0.6, 0.1, 0.9, 1.1) = \frac{p(0.6, 0.1, 0.9, 1.1|\omega_1, \omega_2, \omega_2, \omega_3)\,P(\omega_1, \omega_2, \omega_2, \omega_3)}{p(X)} = \frac{0.01794 \times 0.0078125}{0.012083} = 0.01160\,.$$

3. The sequence $\omega = (\omega_1, \omega_1, \omega_1, \omega_1)$ is the one most likely to have generated $X = (0.6, 0.1, 0.9, 1.1)$; its posterior probability is $P(\omega|X) = 0.03966$.
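A short sketch (using only the quantities defined in the problem) that enumerates all $3^4 = 81$ state sequences and reproduces the numbers above:

```python
import itertools
import math

priors = [0.5, 0.25, 0.25]  # P(omega_1), P(omega_2), P(omega_3)
means = [0.0, 0.5, 1.0]     # class-conditional means, all sigma = 1
X = [0.6, 0.1, 0.9, 1.1]

def pdf(x, mu):
    """Normal density N(mu, 1) evaluated at x."""
    return math.exp(-0.5 * (x - mu) ** 2) / math.sqrt(2 * math.pi)

def joint(seq):
    """p(X|omega) * P(omega) for a 0-indexed state sequence."""
    return math.prod(pdf(x, means[j]) * priors[j] for x, j in zip(X, seq))

# Evidence p(X): sum over all 81 sequences
p_X = sum(joint(seq) for seq in itertools.product(range(3), repeat=4))

print(joint((0, 2, 2, 1)) / p_X)  # part 1: ~0.007584
print(joint((0, 1, 1, 2)) / p_X)  # part 2: ~0.011600
best = max(itertools.product(range(3), repeat=4), key=joint)
print(best, joint(best) / p_X)    # part 3: (0, 0, 0, 0), ~0.03966
```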
