Assigned: August 30
Due: September 13
Problem 1: (22 pts) Let the conditional densities for a two-category one-dimensional problem be given by the following Cauchy distribution:
\[
p(x|\omega_i) = \frac{1}{\pi b}\cdot\frac{1}{1 + \left(\frac{x - a_i}{b}\right)^2}, \quad i = 1, 2.
\]
1. (6 pts) By explicit integration, check that the distributions are indeed normalized.
2. (9 pts) Assuming $P(\omega_1) = P(\omega_2)$, show that $P(\omega_1|x) = P(\omega_2|x)$ if $x = \frac{a_1 + a_2}{2}$, that is, the minimum error decision boundary is a point midway between the peaks of the two distributions, regardless of $b$.
3. (7 pts) Show that the minimum probability of error is given by $P(\text{error}) = \frac{1}{2} - \frac{1}{\pi}\tan^{-1}\left|\frac{a_2 - a_1}{2b}\right|$.
Answer:
1. We substitute $y = \frac{x - a_i}{b}$:
\[
\int_{-\infty}^{\infty} p(x|\omega_i)\,dx
= \frac{1}{\pi b}\int_{-\infty}^{\infty} \frac{dx}{1 + \left(\frac{x - a_i}{b}\right)^2}
= \frac{1}{\pi}\int_{-\infty}^{\infty} \frac{dy}{1 + y^2}
= \frac{1}{\pi}\Big[\tan^{-1}(y)\Big]_{-\infty}^{\infty}
= \frac{1}{\pi}\left(\frac{\pi}{2} + \frac{\pi}{2}\right)
= 1 .
\]
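The normalization can also be checked numerically. Below is a minimal sketch in plain Python (the values $a = 2.0$ and $b = 0.5$ are arbitrary illustrations, not from the problem): it integrates the Cauchy density with a midpoint rule over a wide interval; the truncated tails carry only about $2b/(\pi L)$ of the mass.

```python
import math

def cauchy_pdf(x, a, b):
    """Cauchy density p(x | omega_i) with location a and scale b."""
    return 1.0 / (math.pi * b * (1.0 + ((x - a) / b) ** 2))

def integrate(f, lo, hi, n=200_000):
    """Midpoint-rule numerical integration of f over [lo, hi]."""
    h = (hi - lo) / n
    return sum(f(lo + (k + 0.5) * h) for k in range(n)) * h

# Truncating at +/-1000 leaves tail mass of order 2b/(pi*1000), so the
# result should be very close to 1 for moderate a and b.
mass = integrate(lambda x: cauchy_pdf(x, a=2.0, b=0.5), -1000, 1000)
print(round(mass, 3))  # close to 1
```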
2. Since $P(\omega_1) = P(\omega_2)$, the condition $P(\omega_1|x) = P(\omega_2|x)$ is equivalent to $p(x|\omega_1) = p(x|\omega_2)$, i.e.,
\[
\frac{1}{\pi b\left[1 + \left(\frac{x - a_1}{b}\right)^2\right]}
= \frac{1}{\pi b\left[1 + \left(\frac{x - a_2}{b}\right)^2\right]},
\]
or, equivalently,
\[
(x - a_1)^2 = (x - a_2)^2 , \quad \text{i.e.,} \quad x - a_1 = -(x - a_2) .
\]
For $a_1 \neq a_2$, this implies that $x = \frac{a_1 + a_2}{2}$.
3. The probability of error is
\[
P(\text{error}) = \int P(\text{error}|x)\,p(x)\,dx .
\]
Assume $a_1 < a_2$. With the decision boundary at $x = \frac{a_1+a_2}{2}$, we decide $\omega_1$ if $x \le \frac{a_1+a_2}{2}$ and $\omega_2$ otherwise, hence
\[
P(\text{error}|x) =
\begin{cases}
P(\omega_2|x) = \dfrac{p(x|\omega_2)P(\omega_2)}{p(x)} & \text{if } x \le \frac{a_1+a_2}{2},\\[2mm]
P(\omega_1|x) = \dfrac{p(x|\omega_1)P(\omega_1)}{p(x)} & \text{if } x > \frac{a_1+a_2}{2}.
\end{cases}
\]
Therefore, the probability of error is
\[
P(\text{error})
= \int_{-\infty}^{\frac{a_1+a_2}{2}} p(x|\omega_2)P(\omega_2)\,dx
+ \int_{\frac{a_1+a_2}{2}}^{\infty} p(x|\omega_1)P(\omega_1)\,dx
\]
\[
= \frac{1}{2\pi b}\int_{-\infty}^{\frac{a_1+a_2}{2}} \frac{dx}{1 + \left(\frac{x-a_2}{b}\right)^2}
+ \frac{1}{2\pi b}\int_{\frac{a_1+a_2}{2}}^{\infty} \frac{dx}{1 + \left(\frac{x-a_1}{b}\right)^2} .
\]
We substitute $y = \frac{x - a_2}{b}$ and $z = \frac{x - a_1}{b}$:
\[
P(\text{error})
= \frac{1}{2\pi}\int_{-\infty}^{\frac{a_1-a_2}{2b}} \frac{dy}{1+y^2}
+ \frac{1}{2\pi}\int_{\frac{a_2-a_1}{2b}}^{\infty} \frac{dz}{1+z^2}
\]
\[
= \frac{1}{2\pi}\Big[\tan^{-1}(y)\Big]_{-\infty}^{\frac{a_1-a_2}{2b}}
+ \frac{1}{2\pi}\Big[\tan^{-1}(z)\Big]_{\frac{a_2-a_1}{2b}}^{\infty}
\]
\[
= \frac{1}{2\pi}\left[\tan^{-1}\!\left(\frac{a_1-a_2}{2b}\right) + \frac{\pi}{2}\right]
+ \frac{1}{2\pi}\left[\frac{\pi}{2} - \tan^{-1}\!\left(\frac{a_2-a_1}{2b}\right)\right]
\]
\[
= \frac{1}{2} - \frac{1}{\pi}\tan^{-1}\!\left(\frac{a_2-a_1}{2b}\right)
= \frac{1}{2} - \frac{1}{\pi}\tan^{-1}\!\left|\frac{a_2-a_1}{2b}\right| .
\]
Problem 2: (21 pts) Let $\omega_{\max}(x)$ be the state of nature for which $P(\omega_{\max}|x) \ge P(\omega_i|x)$ for all $i$, $i = 1, \ldots, c$.
1. (7 pts) Show that $P(\omega_{\max}|x) \ge \frac{1}{c}$.
2. (7 pts) Show that for the minimum-error-rate decision rule the average probability of error is given by
\[
P(\text{error}) = 1 - \int P(\omega_{\max}|x)\,p(x)\,dx .
\]
3. (7 pts) Show that $P(\text{error}) \le \frac{c-1}{c}$.
Answer:
1. Since $P(\omega_{\max}|x) \ge P(\omega_i|x)$ for every $i$, summing over all $c$ classes gives
\[
c\,P(\omega_{\max}|x) \ge \sum_{i=1}^{c} P(\omega_i|x) = 1 ,
\]
which implies that $P(\omega_{\max}|x) \ge \frac{1}{c}$.
2. By definition,
\[
P(\text{error}) = \int P(\text{error}|x)\,p(x)\,dx .
\]
The minimum-error-rate rule always decides $\omega_{\max}(x)$, so $P(\text{error}|x) = 1 - P(\omega_{\max}|x)$, and therefore
\[
P(\text{error}) = \int \left[1 - P(\omega_{\max}|x)\right] p(x)\,dx
= 1 - \int P(\omega_{\max}|x)\,p(x)\,dx ,
\]
using $\int p(x)\,dx = 1$.
3. From 1 and 2, it is clear that
\[
P(\text{error}) = 1 - \int P(\omega_{\max}|x)\,p(x)\,dx
\le 1 - \frac{1}{c}\int p(x)\,dx
= 1 - \frac{1}{c} = \frac{c-1}{c} .
\]
Problem 3: (22 pts) In many machine learning applications, one has the option either to assign the pattern to one of $c$ classes, or to reject it as being unrecognizable. If the cost for rejects is not too high, rejection may be a desirable action. Let
\[
\lambda(\alpha_i|\omega_j) =
\begin{cases}
0 & i = j, \quad i, j = 1, \ldots, c \\
\lambda_r & i = c + 1 \\
\lambda_s & \text{otherwise,}
\end{cases}
\]
where $\lambda_r$ is the loss incurred for choosing the $(c+1)$th action, rejection, and $\lambda_s$ is the loss incurred for making any substitution error.
1. (10 pts) Derive the decision rule with the minimum risk.
2. (6 pts) What happens if $\lambda_r = 0$?
3. (6 pts) What happens if $\lambda_r > \lambda_s$?
Answer:
1. For $i = 1, \ldots, c$,
\[
R(\alpha_i|x) = \sum_{j=1}^{c} \lambda(\alpha_i|\omega_j) P(\omega_j|x)
= \lambda_s \sum_{j=1,\, j \neq i}^{c} P(\omega_j|x)
= \lambda_s \left[1 - P(\omega_i|x)\right] .
\]
For $i = c + 1$,
\[
R(\alpha_{c+1}|x) = \lambda_r .
\]
Among the $c$ class actions, $R(\alpha_i|x)$ is minimized by the class with the largest posterior. Therefore, the minimum risk is achieved if we decide the $\omega_i$ maximizing $P(\omega_i|x)$ whenever $R(\alpha_i|x) \le R(\alpha_{c+1}|x)$, i.e., $P(\omega_i|x) \ge 1 - \frac{\lambda_r}{\lambda_s}$, and reject otherwise.
2. If $\lambda_r = 0$, we always reject: rejection is free, while classifying would require $P(\omega_i|x) \ge 1$.
3. If $\lambda_r > \lambda_s$, we never reject: the threshold $1 - \frac{\lambda_r}{\lambda_s}$ is negative, so the condition always holds.
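The resulting rule is easy to state in code. A minimal sketch (the function name `decide_with_reject` and its interface are my own, not from the text):

```python
def decide_with_reject(posteriors, lam_r, lam_s):
    """Minimum-risk rule with a reject option.

    posteriors: list of P(omega_i | x); lam_r: rejection loss;
    lam_s: substitution loss. Returns the 0-based index of the chosen
    class, or None for reject.
    """
    best = max(range(len(posteriors)), key=lambda i: posteriors[i])
    # Decide omega_best iff P(omega_best | x) >= 1 - lam_r / lam_s.
    if posteriors[best] >= 1.0 - lam_r / lam_s:
        return best
    return None

print(decide_with_reject([0.6, 0.3, 0.1], lam_r=0.5, lam_s=1.0))    # 0     (0.6 >= 0.5)
print(decide_with_reject([0.4, 0.35, 0.25], lam_r=0.5, lam_s=1.0))  # None  (0.4 < 0.5)
print(decide_with_reject([0.4, 0.35, 0.25], lam_r=2.0, lam_s=1.0))  # 0     (never reject when lam_r > lam_s)
```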
Problem 4: (12 pts + 10 extra pts) Let the components of the vector $x = [x_1, \ldots, x_d]^T$ be binary-valued (0 or 1), and let $P(\omega_j)$ be the prior probability for the state of nature $\omega_j$, $j = 1, \ldots, c$. We define
\[
p_{ij} = P(x_i = 1|\omega_j), \quad i = 1, \ldots, d, \quad j = 1, \ldots, c,
\]
with the components $x_i$ being statistically independent for all $x$ in $\omega_j$.
1. (12 pts) Show that the minimum probability of error is achieved by the following decision rule: decide $\omega_k$ if $g_k(x) \ge g_j(x)$ for all $j$ and $k$, where
\[
g_j(x) = \sum_{i=1}^{d} x_i \ln\frac{p_{ij}}{1 - p_{ij}} + \sum_{i=1}^{d} \ln(1 - p_{ij}) + \ln P(\omega_j) .
\]
2. (10 extra pts) If the components of $x$ are ternary-valued (1, 0, or $-1$), show that a minimum probability of error decision rule can be derived that involves discriminant functions $g_j(x)$ that are quadratic functions of the components $x_i$.
Answer:
1. Consider the following discriminant function:
\[
g_j(x) = \ln\left[p(x|\omega_j)P(\omega_j)\right] = \ln p(x|\omega_j) + \ln P(\omega_j) .
\]
Since the components of $x$ are statistically independent for all $x$ in $\omega_j$, we can write the density as a product:
\[
p(x|\omega_j) = \prod_{i=1}^{d} p(x_i|\omega_j),
\quad \text{with} \quad
p(x_i|\omega_j) = p_{ij}^{\,x_i}(1 - p_{ij})^{1 - x_i} \ \text{for } x_i \in \{0, 1\}.
\]
Hence
\[
g_j(x) = \sum_{i=1}^{d} \left[x_i \ln p_{ij} + (1 - x_i)\ln(1 - p_{ij})\right] + \ln P(\omega_j)
= \sum_{i=1}^{d} x_i \ln\frac{p_{ij}}{1 - p_{ij}} + \sum_{i=1}^{d} \ln(1 - p_{ij}) + \ln P(\omega_j) .
\]
Since $\ln$ is monotonic, maximizing $g_j(x)$ is equivalent to maximizing the posterior $P(\omega_j|x)$, so this decision rule achieves the minimum probability of error.
2. Let
\[
p_{ij} = P(x_i = 1|\omega_j), \quad
q_{ij} = P(x_i = 0|\omega_j), \quad
r_{ij} = P(x_i = -1|\omega_j) .
\]
For $x_i \in \{-1, 0, 1\}$ the component densities can be written as
\[
p(x_i|\omega_j) = p_{ij}^{\,x_i(x_i+1)/2}\, q_{ij}^{\,1 - x_i^2}\, r_{ij}^{\,x_i(x_i-1)/2},
\]
since the exponents select exactly one of the three factors for each value of $x_i$. Taking logarithms and summing over the independent components as before,
\[
g_j(x) = \frac{1}{2}\sum_{i=1}^{d} x_i^2 \ln\frac{p_{ij}\, r_{ij}}{q_{ij}^2}
+ \frac{1}{2}\sum_{i=1}^{d} x_i \ln\frac{p_{ij}}{r_{ij}}
+ \sum_{i=1}^{d} \ln q_{ij} + \ln P(\omega_j),
\]
which are quadratic functions of the components $x_i$.
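The identity can again be verified mechanically: for every $x_i \in \{-1, 0, 1\}$ the quadratic expression should reproduce $\ln p(x_i|\omega_j)$. A sketch with arbitrary illustrative probabilities, each triple $(p, q, r)$ summing to 1 (the $\ln P(\omega_j)$ term is omitted since it is common to both sides):

```python
import math

def g_ternary(x, p, q, r):
    """Quadratic discriminant terms for ternary features (prior term omitted)."""
    return sum(0.5 * xi * xi * math.log(pi * ri / (qi * qi))
               + 0.5 * xi * math.log(pi / ri)
               + math.log(qi)
               for xi, pi, qi, ri in zip(x, p, q, r))

def log_like(x, p, q, r):
    """ln p(x|omega_j) computed by direct table lookup."""
    def component(xi, pi, qi, ri):
        return {1: pi, 0: qi, -1: ri}[xi]
    return sum(math.log(component(xi, pi, qi, ri))
               for xi, pi, qi, ri in zip(x, p, q, r))

p, q, r = [0.5, 0.2], [0.3, 0.5], [0.2, 0.3]
for x in ([1, -1], [0, 0], [-1, 1], [1, 0]):
    assert abs(g_ternary(x, p, q, r) - log_like(x, p, q, r)) < 1e-12
print("quadratic form matches the log-likelihood")
```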
Question 5: (23 pts) Suppose we have three categories with prior probabilities $P(\omega_1) = 0.5$, $P(\omega_2) = P(\omega_3) = 0.25$ and the class conditional probability distributions
\[
p(x|\omega_1) \sim N(0, 1), \quad p(x|\omega_2) \sim N(0.5, 1), \quad p(x|\omega_3) \sim N(1, 1),
\]
where $N(\mu, \sigma^2)$ represents the normal distribution with density function
\[
p(x) = \frac{1}{\sqrt{2\pi}\,\sigma}\, e^{-\frac{(x-\mu)^2}{2\sigma^2}} .
\]
We sample the following sequence of four points: $x = 0.6, 0.1, 0.9, 1.1$.
1. (9 pts) Calculate explicitly the probability that the sequence actually came from $\omega_1, \omega_3, \omega_3, \omega_2$.
2. (6 pts) Repeat for the sequence $\omega_1, \omega_2, \omega_2, \omega_3$.
3. (8 pts) Find the sequence of states having the maximum probability.
Answer: It is straightforward to compute that
\[
\begin{array}{lll}
p(0.6|\omega_1) = 0.333225 & p(0.6|\omega_2) = 0.396953 & p(0.6|\omega_3) = 0.368270 \\
p(0.1|\omega_1) = 0.396953 & p(0.1|\omega_2) = 0.368270 & p(0.1|\omega_3) = 0.266085 \\
p(0.9|\omega_1) = 0.266085 & p(0.9|\omega_2) = 0.368270 & p(0.9|\omega_3) = 0.396953 \\
p(1.1|\omega_1) = 0.217852 & p(1.1|\omega_2) = 0.333225 & p(1.1|\omega_3) = 0.396953 \,.
\end{array}
\]
1. For each possible sequence $\omega = (\omega(1), \omega(2), \omega(3), \omega(4))$, we calculate $P(\omega)$ and $p(X|\omega)$ using the following, which assumes the independence of the $x_i$ and the $\omega(i)$:
\[
p(X|\omega) = \prod_{i=1}^{4} p(x_i|\omega(i)), \qquad
P(\omega) = \prod_{i=1}^{4} P(\omega(i)) .
\]
For the sequence $\omega = (\omega_1, \omega_3, \omega_3, \omega_2)$,
\[
P(\omega) = P(\omega_1)P(\omega_3)P(\omega_3)P(\omega_2)
= \frac{1}{2}\cdot\frac{1}{4}\cdot\frac{1}{4}\cdot\frac{1}{4}
= 0.0078125 .
\]
The evidence $p(X)$ is obtained by summing over all $3^4 = 81$ possible state sequences:
\[
\begin{aligned}
p(X) &= p(x_1 = 0.6, x_2 = 0.1, x_3 = 0.9, x_4 = 1.1|\omega_1, \omega_1, \omega_1, \omega_1)P(\omega_1, \omega_1, \omega_1, \omega_1) \\
&\quad + p(x_1 = 0.6, x_2 = 0.1, x_3 = 0.9, x_4 = 1.1|\omega_1, \omega_1, \omega_1, \omega_2)P(\omega_1, \omega_1, \omega_1, \omega_2) \\
&\quad + \cdots \\
&\quad + p(x_1 = 0.6, x_2 = 0.1, x_3 = 0.9, x_4 = 1.1|\omega_3, \omega_3, \omega_3, \omega_3)P(\omega_3, \omega_3, \omega_3, \omega_3) \\
&= p(0.6|\omega_1)p(0.1|\omega_1)p(0.9|\omega_1)p(1.1|\omega_1)P(\omega_1)P(\omega_1)P(\omega_1)P(\omega_1) \\
&\quad + \cdots + p(0.6|\omega_3)p(0.1|\omega_3)p(0.9|\omega_3)p(1.1|\omega_3)P(\omega_3)P(\omega_3)P(\omega_3)P(\omega_3) \\
&= 0.012083 .
\end{aligned}
\]
Therefore,
\[
P(\omega|X) = \frac{p(X|\omega)P(\omega)}{p(X)}
= \frac{0.333225 \times 0.266085 \times 0.396953 \times 0.333225 \times 0.0078125}{0.012083}
\approx 0.00758 .
\]
2. Similarly, for the sequence $\omega = (\omega_1, \omega_2, \omega_2, \omega_3)$,
\[
P(\omega|X) = \frac{0.333225 \times 0.368270 \times 0.368270 \times 0.396953 \times 0.0078125}{0.012083}
\approx 0.0116 .
\]
3. The sequence $\omega = (\omega_1, \omega_1, \omega_1, \omega_1)$ has the maximum probability given the observed $X = (0.6, 0.1, 0.9, 1.1)$. This maximum probability is $P(\omega|X) = 0.03966$.
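The whole calculation can be reproduced by brute force over the $3^4 = 81$ state sequences, using the priors and means given in the problem:

```python
import math
from itertools import product

priors = {1: 0.5, 2: 0.25, 3: 0.25}
means = {1: 0.0, 2: 0.5, 3: 1.0}
X = [0.6, 0.1, 0.9, 1.1]

def norm_pdf(x, mu, sigma=1.0):
    """Density of N(mu, sigma^2) at x."""
    return math.exp(-(x - mu) ** 2 / (2 * sigma ** 2)) / (math.sqrt(2 * math.pi) * sigma)

def joint(seq):
    """p(X|omega) P(omega) for a state sequence like (1, 3, 3, 2)."""
    out = 1.0
    for x, w in zip(X, seq):
        out *= norm_pdf(x, means[w]) * priors[w]
    return out

# Evidence: sum over all 3^4 = 81 state sequences.
p_X = sum(joint(seq) for seq in product([1, 2, 3], repeat=4))
print(round(p_X, 5))                        # 0.01208
print(round(joint((1, 3, 3, 2)) / p_X, 5))  # part 1: 0.00758
print(round(joint((1, 2, 2, 3)) / p_X, 5))  # part 2: 0.0116
best = max(product([1, 2, 3], repeat=4), key=joint)
print(best, round(joint(best) / p_X, 5))    # part 3: (1, 1, 1, 1) 0.03966
```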