an introduction to
machine learning
Xiaojin Zhu
jerryzhu@cs.wisc.edu
Computer Sciences Department
University of Wisconsin, Madison
slide 1
Outline
Types of learning
Classification: features, classes, classifier
K-nearest-neighbor, instance-based learning
Distance metric
Training error, test error
Overfitting
Held-out set, k-fold cross validation, leave-one-out
slide 2
machine learning
slide 3
1NN algorithm:
1. Find the training point x_j closest to x_new w.r.t. Euclidean distance
   [(x_j(1) - x_new(1))^2 + ... + (x_j(d) - x_new(d))^2]^(1/2)
2. Classify by y_new^1nn = y_j
slide 9
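The two steps above can be sketched in a few lines of Python (a minimal illustration, not code from the slides; function names are mine):

```python
import math

def euclidean(a, b):
    # [(a(1)-b(1))^2 + ... + (a(d)-b(d))^2]^(1/2)
    return math.sqrt(sum((ai - bi) ** 2 for ai, bi in zip(a, b)))

def classify_1nn(train_x, train_y, x_new):
    # Step 1: find the training point closest to x_new
    j = min(range(len(train_x)), key=lambda i: euclidean(train_x[i], x_new))
    # Step 2: copy that point's label
    return train_y[j]
```

For example, with training points (0,0) labeled 'a' and (5,5) labeled 'b', the query (1,1) is classified 'a' because (0,0) is its nearest neighbor.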
Instance-based learning
K-nearest-neighbor (kNN)
kNN algorithm:
1. Find the k training points closest to x_new w.r.t. Euclidean distance
2. Classify by y_new^knn = majority vote among the k points
How many neighbors to consider? (k)
What distance to use? (Euclidean)
How to combine neighbors' labels? (majority vote)
slide 11
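The kNN rule generalizes the 1NN sketch: keep the k nearest points and take a majority vote. A minimal Python version (illustrative only; names are mine):

```python
import math
from collections import Counter

def classify_knn(train_x, train_y, x_new, k):
    # Distance from a training point to the query (Euclidean)
    dist = lambda a: math.sqrt(sum((ai - bi) ** 2 for ai, bi in zip(a, x_new)))
    # Indices of the k training points closest to x_new
    nearest = sorted(range(len(train_x)), key=lambda i: dist(train_x[i]))[:k]
    # Majority vote among the k labels
    return Counter(train_y[i] for i in nearest).most_common(1)[0][0]
```

With k = 1 this reduces to the 1NN rule from the previous slide.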
kNN demo
http://www.cs.cmu.edu/~zhuxj/courseproject/knndemo/KNN.html
slide 12
Weighted kNN
slide 13
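The weighted-kNN figure is not preserved in this extraction; one common version of the idea weights each neighbor's vote by the inverse of its distance, so closer neighbors count more. A sketch under that assumption (the weighting scheme and names are mine, not taken from the slide):

```python
import math
from collections import defaultdict

def classify_weighted_knn(train_x, train_y, x_new, k, eps=1e-9):
    dist = lambda a: math.sqrt(sum((ai - bi) ** 2 for ai, bi in zip(a, x_new)))
    # The k nearest training points
    nearest = sorted(range(len(train_x)), key=lambda i: dist(train_x[i]))[:k]
    # Each neighbor votes with weight 1/distance (eps avoids division by zero)
    votes = defaultdict(float)
    for i in nearest:
        votes[train_y[i]] += 1.0 / (dist(train_x[i]) + eps)
    return max(votes, key=votes.get)
```

Note the effect of weighting: with training points (0,0) labeled 'a' and (2,2), (2.1,2.1) labeled 'b', the query (0.5,0.5) with k = 3 is classified 'a' even though plain majority vote would say 'b', because the single 'a' neighbor is much closer.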
slide 14
L1 norm (sum of absolute coordinate differences)
L(max) norm (largest absolute coordinate difference)
slide 15
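Euclidean distance is not the only choice: the L1 and L(max) norms named above are alternative distance functions. As plain Python (illustrative helpers, names mine):

```python
def l1_dist(a, b):
    # L1 (Manhattan) norm: sum of absolute coordinate differences
    return sum(abs(ai - bi) for ai, bi in zip(a, b))

def lmax_dist(a, b):
    # L(max) / L-infinity norm: largest absolute coordinate difference
    return max(abs(ai - bi) for ai, bi in zip(a, b))
```

For the pair (0,0) and (3,4): Euclidean distance is 5, L1 distance is 7, and L(max) distance is 4, so the choice of norm changes which points count as "near."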
Distance metric
However, choosing an appropriate distance metric is a hard problem.
Important questions:
1. How to choose the scaling factors (a1, ..., ad)?
2. What if some feature dimensions are simply noise?
3. How to choose K?
These are the parameters of the classifier.
slide 16
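The scaling factors (a1, ..., ad) rescale each feature dimension inside the distance. A minimal sketch of a scaled Euclidean distance (my illustration of the idea, not the slide's code):

```python
import math

def scaled_euclidean(a, b, scale):
    # [a1(x1-z1)^2 + ... + ad(xd-zd)^2]^(1/2), with scale = (a1, ..., ad)
    return math.sqrt(sum(s * (ai - bi) ** 2
                         for s, ai, bi in zip(scale, a, b)))
```

Setting a scaling factor to 0 makes kNN ignore that dimension entirely, which is one way to deal with a feature that is pure noise: with scale (1, 0), the points (0,0) and (3,100) are at distance 3, no matter how wild the second coordinate is.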
Evaluate classifiers
During training
You're given a training set (x1, y1), (x2, y2), ..., (xn, yn).
You write kNN code. You now have a classifier, a black box which generates y_new^knn for any x_new.
During testing
For new test data x_n+1, ..., x_n+m, your classifier generates labels y_n+1^knn, ..., y_n+m^knn.
slide 17
Parameter selection
During training
You're given a training set (x1, y1), (x2, y2), ..., (xn, yn).
You write kNN code. You now have a classifier, a black box which estimates y_new for any x_new.
During testing
For new test data x_n+1, ..., x_n+m, your classifier generates labels y_n+1^knn, ..., y_n+m^knn.
Good idea: pick the parameters (e.g. K) that maximize test set accuracy. But most of the time we only have the training set when we design a classifier.
slide 18
There is something very wrong; can you see why?
Hint: consider K = 1.
slide 19
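The hint can be checked directly: on the training set itself, each point's nearest neighbor is the point itself (distance 0), so 1NN always scores 100% training accuracy, no matter how noisy the labels are. That is why picking K to maximize accuracy on the training data is wrong. A small demonstration (illustrative code, names mine):

```python
import math

def classify_1nn(train_x, train_y, x_new):
    dist = lambda a: math.sqrt(sum((ai - bi) ** 2 for ai, bi in zip(a, x_new)))
    j = min(range(len(train_x)), key=lambda i: dist(train_x[i]))
    return train_y[j]

# Evaluate 1NN on its own (distinct) training points: every query finds
# itself at distance 0, so every label is reproduced exactly.
train_x = [(0, 0), (1, 0), (0, 1), (1, 1)]
train_y = ['a', 'b', 'b', 'a']
train_acc = sum(classify_1nn(train_x, train_y, x) == y
                for x, y in zip(train_x, train_y)) / len(train_x)
print(train_acc)  # 1.0 -- perfect "accuracy", regardless of the labels
```

This perfect training score says nothing about accuracy on new data; it is the textbook symptom of overfitting, which the next slides turn to.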
Overfitting
slide 20
Types of learning
K-nearest-neighbor, instance-based learning
Distance metric
Training error, test error
Overfitting
Held-out set, k-fold cross validation, leave-one-out
slide 24
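The evaluation ideas in the outline (held-out set, k-fold cross validation, leave-one-out) can be sketched in one routine: split the data into k folds, hold out each fold in turn, train on the rest, and average the held-out error rates. With k equal to n this is leave-one-out; with k = 2 and one pass it resembles a simple held-out set. A minimal sketch (my illustration; the fold-assignment scheme and names are assumptions, not from the slides):

```python
def k_fold_cv_error(classifier_factory, xs, ys, k_folds):
    # classifier_factory(train_x, train_y) must return a function x -> label.
    n = len(xs)
    fold_errors = []
    for f in range(k_folds):
        test_idx = set(range(f, n, k_folds))   # every k-th point goes to fold f
        tr_x = [xs[i] for i in range(n) if i not in test_idx]
        tr_y = [ys[i] for i in range(n) if i not in test_idx]
        classify = classifier_factory(tr_x, tr_y)
        wrong = sum(1 for i in test_idx if classify(xs[i]) != ys[i])
        fold_errors.append(wrong / len(test_idx))
    # Average held-out error over the k folds
    return sum(fold_errors) / k_folds

cv_err = k_fold_cv_error(lambda tx, ty: (lambda x: 'a'),
                         [(0,), (1,), (2,), (3,)],
                         ['a', 'a', 'b', 'a'], k_folds=2)
print(cv_err)  # 0.25
```

To choose K for kNN, one would run this once per candidate K and keep the K with the lowest cross-validation error, never touching the true test set.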