
CSE 455/555 Spring 2013 Homework 8: Non-parametric Techniques

Jason J. Corso
Computer Science and Engineering
SUNY at Buffalo
jcorso@buffalo.edu
Solution by Yingbo Zhou

This assignment does not need to be submitted and will not be graded, but students are advised to work through the
problems to ensure they understand the material.
You are both allowed and encouraged to work in groups on this and other homework assignments in this class.
These are challenging topics, and working together will both make them easier to decipher and help you ensure that
you truly understand them.

1. Kernel Density Estimation and k-Nearest Neighbors


Suppose you are given a dataset X = {0, 0, 0, 1, 2, 2, 2, 2, 3, 4, 4, 4, 5, 5}
(a) Using the following kernel function with a bandwidth of 3, calculate the kernel density estimate at each of the points x ∈ {0, 1, 2, 3, 4, 5}.

$$K(u) = \begin{cases} 1 & |u| \le \frac{1}{2} \\ 0 & \text{otherwise} \end{cases}$$
Solution:

$$p_n(x) = \frac{k_n}{n V_n} = \frac{1}{n} \sum_{i=1}^{n} \frac{1}{h_n^d} K\!\left(\frac{x - x_i}{h_n}\right)$$

$$\begin{aligned}
p_n(0) &= \frac{1}{14} \sum_{i=1}^{14} \frac{1}{3} K\!\left(\frac{0 - x_i}{3}\right) \\
&= \frac{1}{14 \cdot 3} \left[ 3\,K(0) + K\!\left(\tfrac{1}{3}\right) + 4\,K\!\left(\tfrac{2}{3}\right) + K(1) + 3\,K\!\left(\tfrac{4}{3}\right) + 2\,K\!\left(\tfrac{5}{3}\right) \right] \\
&= \frac{1}{14 \cdot 3} \left( 3 \cdot 1 + 1 + 4 \cdot 0 + 0 + 3 \cdot 0 + 2 \cdot 0 \right) \\
&= \frac{4}{42}
\end{aligned}$$

Similarly, p_n(1) = 8/42, p_n(2) = 6/42, p_n(3) = 8/42, p_n(4) = 6/42, p_n(5) = 5/42.
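These values are easy to check numerically. Below is a small Python sketch (my addition, not part of the original solution) that evaluates the estimator above with the box kernel; the function names are arbitrary.

```python
# Verify the kernel density estimates for part (a).
X = [0, 0, 0, 1, 2, 2, 2, 2, 3, 4, 4, 4, 5, 5]
h = 3  # bandwidth

def K(u):
    # Box kernel: 1 inside the window |u| <= 1/2, 0 outside.
    return 1.0 if abs(u) <= 0.5 else 0.0

def p(x):
    # p_n(x) = (1/n) * sum_i (1/h) * K((x - x_i) / h), with d = 1.
    return sum(K((x - xi) / h) for xi in X) / (len(X) * h)

for x in range(6):
    print(x, p(x))  # 4/42, 8/42, 6/42, 8/42, 6/42, 5/42
```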
(b) Consider the effect of the bandwidth in kernel density estimation: in what case will it result in estimates of high bias, and in what case will it result in estimates of high variance?
Solution:
When the bandwidth is too small, the estimate covers only a small amount of nearby data and is therefore quite sensitive to noise, resulting in estimates of high variance. On the other hand, if the bandwidth is too large, the estimate averages over a lot of data, which yields a stable but highly erroneous estimate, since every point is likely to take on a similar probability. In other words, too large a bandwidth results in an estimate of high bias.
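As a quick numerical illustration (my addition, reusing the box kernel and data from part (a)): a very small bandwidth produces spiky estimates that chase individual samples, while a very large one flattens the estimate to a near-constant.

```python
# Effect of bandwidth on the box-kernel density estimate from part (a).
X = [0, 0, 0, 1, 2, 2, 2, 2, 3, 4, 4, 4, 5, 5]

def p(x, h):
    return sum(1.0 if abs((x - xi) / h) <= 0.5 else 0.0 for xi in X) / (len(X) * h)

for h in (0.5, 3, 20):
    print(h, [round(p(x, h), 3) for x in range(6)])
# h = 0.5: estimates jump between 1/7 and 4/7 -> high variance
# h = 20 : every point gets exactly 0.05      -> high bias
```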

(c) Consider the analogous question for k-nearest neighbors: what is the effect of the choice of k?
Solution:
In the case of k-nearest neighbors, instead of fixing V_n we fix k_n = k and determine V_n from k and the data. So when k is small, the volume V_n is very sensitive to the query location, which results in high-variance estimates. At the other extreme, when k is very large, V_n is almost constant everywhere, so the estimate is nearly the same regardless of location, and the estimates are therefore of high bias. In other words, we are unable to extract the underlying structure of the data.
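The same effect can be seen numerically with the 1-D k-NN density estimate p(x) = k/(n V_n), where V_n is twice the distance from x to its k-th nearest sample. The sketch below is my own illustration on the data from part (a), not part of the original solution.

```python
# 1-D k-NN density estimate: p(x) = k / (n * V), with V = 2 * r_k.
X = [0, 0, 0, 1, 2, 2, 2, 2, 3, 4, 4, 4, 5, 5]

def p_knn(x, k):
    r_k = sorted(abs(x - xi) for xi in X)[k - 1]  # distance to k-th neighbor
    return float('inf') if r_k == 0 else k / (len(X) * 2 * r_k)

for k in (1, 7, 14):
    print(k, [round(p_knn(x, k), 3) for x in (0.0, 0.25, 0.5)])
# k = 1 : inf, 0.143, 0.071 -- wild swings near samples (high variance)
# k = 14: 0.1, 0.105, 0.111 -- nearly flat everywhere   (high bias)
```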
(d) Suppose that we are classifying d-dimensional data using the k-nearest neighbor method. Show that the effective number of parameters used by k-nearest neighbors is on the order of N/k, where N is the number of training examples.
Hint: Think of the cases where k = 1 and k = N.
Solution:
k-nearest neighbor classification effectively partitions the dataset into roughly N/k disjoint regions, and if a sample falls in one of the regions we assign it the class of that region. So the effective number of parameters that affect the classification is the number of regions, i.e., N/k. For instance, k = 1 gives N single-point regions (N effective parameters), while k = N gives a single region covering all the data (one effective parameter).
2. Figure-ground segmentation
Implement the kernel-density-estimation-based method for foreground and background segmentation. The method is described in the attached paper. (A minimal sketch of one possible core update appears after the list below.)
(a) Be sure to implement weighted kernel-density estimation (based on the current probability of foreground
and probability of background).
(b) Use exactly the same color-based feature space that they do.
(c) You do not need to implement the method of normalized KL-divergence for selecting the kernel scale
for initialization. Just set it to some reasonable value manually.
(d) You do not need to use the method based on sample variance to set the kernel bandwidth. Just set it to
something reasonable (0.1).
(e) You have the choice of either implementing a sampling-based version as Zhao and Davis have done (i.e.,
take 6% of the pixels each round), or you can simply process all of the pixels.
(f) Rather than implementing the Gaussian kernel as they have, use the Epanechnikov kernel:

$$K(u) = \frac{3}{4}\,(1 - u^2)\,\mathbb{1}(|u| \le 1)$$
(g) Have your system iterate for a fixed number of iterations (say 25).
(h) Set the bandwidth parameter to some different values (say 0.1, 0.01, or 0.2), and see the effect on the result for the provided flower.ppm and butterfly.ppm files.
(i) Also try this method on some images of your own interest and look at the result. (You probably want to resize your image to a lower resolution, like 240 × 180, before processing it.)
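For reference, here is a minimal Python sketch of the core iterative update, under several assumptions of my own: a product-form Epanechnikov kernel across feature dimensions, a random soft initialization of the foreground probabilities, and arbitrary function and variable names. It is one plausible reading of the weighted-KDE update, not the reference implementation from the paper.

```python
import numpy as np

def epanechnikov(u):
    # K(u) = (3/4) (1 - u^2) for |u| <= 1, else 0.
    return np.where(np.abs(u) <= 1.0, 0.75 * (1.0 - u ** 2), 0.0)

def segment(features, h=0.1, n_iters=25, sample_frac=0.06, seed=0):
    """features: (N, d) array of per-pixel color features scaled to [0, 1].
    Returns the estimated per-pixel probability of foreground."""
    rng = np.random.default_rng(seed)
    n = features.shape[0]
    p_fg = rng.uniform(0.25, 0.75, size=n)  # arbitrary soft initialization
    for _ in range(n_iters):
        # Sample a fraction of the pixels to act as kernel centers.
        idx = rng.choice(n, size=max(1, int(sample_frac * n)), replace=False)
        f = np.zeros(n)
        b = np.zeros(n)
        for j in idx:
            # Product kernel across the d feature dimensions.
            k = epanechnikov((features - features[j]) / h).prod(axis=1)
            f += p_fg[j] * k          # weighted by current P(foreground)
            b += (1.0 - p_fg[j]) * k  # weighted by current P(background)
        # Normalization constants cancel in the posterior ratio.
        p_fg = (f + 1e-12) / (f + b + 2e-12)
    return p_fg
```

To try it on flower.ppm, load the image, reshape its normalized pixel colors into an (N, d) array, call segment, reshape the result back to the image grid, and threshold at 0.5.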
