Data Science
Classification and Clustering
Hanspeter Pfister & Joe Blitzstein
pfister@seas.harvard.edu / blitzstein@stat.harvard.edu
[Figure: geometry of least squares — y, its projection onto the column space of X, and the residual.]
This Week
HW2 due 10/3 at 11:59 pm.
http://www.kontagent.com/kaleidoscope/2013/01/09/kscope-profile-what-george-clooney-can-teach-you-about-ltv-and-machine-learning/
Discriminative vs. Generative Classifiers
http://cvxopt.org/examples/book/logreg.html
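The cvxopt link above solves logistic regression as a convex optimization problem. As a rough sketch of the same model (toy code of my own, not cvxopt's API): gradient ascent on the log-likelihood for a one-feature logistic regression.

```python
import math

def fit_logreg(xs, ys, lr=0.1, steps=2000):
    # Logistic regression by gradient ascent on the log-likelihood.
    # One feature plus an intercept; a sketch, not cvxopt's solver.
    w, b = 0.0, 0.0
    for _ in range(steps):
        gw = gb = 0.0
        for x, y in zip(xs, ys):
            p = 1.0 / (1.0 + math.exp(-(w * x + b)))  # predicted P(y=1 | x)
            gw += (y - p) * x                         # gradient w.r.t. w
            gb += (y - p)                             # gradient w.r.t. b
        w += lr * gw / len(xs)
        b += lr * gb / len(xs)
    return w, b

# toy data: negatives at x < 0, positives at x > 0
w, b = fit_logreg([-2, -1, -0.5, 0.5, 1, 2], [0, 0, 0, 1, 1, 1])
print(w, b)  # w should come out positive
```

The fitted sigmoid 1 / (1 + e^−(wx+b)) is the decision function; thresholding it at 0.5 gives a linear decision boundary.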
[Figure: two classifiers — left, a linear decision boundary; right, a nonlinear decision boundary.]
y = f(x) + noise
Can we learn f from the data?
Let's consider three methods.
Moore, www.cs.cmu.edu/~awm/tutorials
Linear in x
Linear Regression
Moore, www.cs.cmu.edu/~awm/tutorials
Quadratic in x
Quadratic Regression
Moore, www.cs.cmu.edu/~awm/tutorials
Join-the-dots
Also known as piecewise linear nonparametric regression, if that makes you feel better.
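"Join-the-dots" is just piecewise-linear interpolation through the training points. A one-line sketch in Python (numpy, not from the slides):

```python
import numpy as np

# training data, sorted by x
x = np.array([0.0, 1.0, 2.0, 3.0])
y = np.array([1.0, 3.0, 2.0, 5.0])

# join-the-dots prediction: linearly interpolate between neighboring points
print(np.interp(1.5, x, y))  # halfway between (1, 3) and (2, 2): 2.5
```

Note that this fits the training data perfectly by construction — which is exactly why "best fit to the data" is the wrong criterion, as the next slides show.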
Moore, www.cs.cmu.edu/~awm/tutorials
What do we really want?
[Figure: two panels of the same y-vs-x data, fit two different ways.]
Moore, www.cs.cmu.edu/~awm/tutorials
Why not choose the method with the best fit to the data?
plot(x, y2)
curve(7*x^2 - 0.5*x, add=TRUE, col="grey")
Figure 3.2: Scatter-plot showing sample data and the true, quadratic regression curve
Shalizi, http://www.stat.cmu.edu/~cshalizi/ADAfaEPoV/ADAfaEPoV.pdf
In-Sample MSE
[Figure: in-sample mean squared error (log scale, 0.5–100) vs. polynomial degree 0–8.]
Shalizi, http://www.stat.cmu.edu/~cshalizi/ADAfaEPoV/ADAfaEPoV.pdf
mse.q = vector(length=10)
for (degree in 0:9) {
  # completion of the truncated fragment: fit a degree-d polynomial
  # and record its in-sample MSE (degree 0 is the intercept-only model)
  fm = if (degree == 0) lm(y ~ 1) else lm(y ~ poly(x, degree))
  mse.q[degree + 1] = mean(residuals(fm)^2)
}
Out-Of-Sample MSE
[Figure: out-of-sample mean squared error (log scale, 0.5–100) vs. polynomial degree 0–8.]
Shalizi, http://www.stat.cmu.edu/~cshalizi/ADAfaEPoV/ADAfaEPoV.pdf
gmse.q = vector(length=10)
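The same in-sample vs. out-of-sample comparison can be sketched in Python (numpy rather than Shalizi's R; the sample sizes and noise level here are my assumptions, matching the true quadratic curve from Figure 3.2):

```python
import numpy as np

rng = np.random.default_rng(0)
x = rng.uniform(-2, 2, 40)
y = 7 * x**2 - 0.5 * x + rng.normal(0, 3, x.size)                  # training sample
x_new = rng.uniform(-2, 2, 40)
y_new = 7 * x_new**2 - 0.5 * x_new + rng.normal(0, 3, x_new.size)  # fresh sample

for degree in (1, 2, 9):
    coef = np.polyfit(x, y, degree)                                # least-squares polynomial fit
    mse_in = np.mean((np.polyval(coef, x) - y) ** 2)               # in-sample: only falls with degree
    mse_out = np.mean((np.polyval(coef, x_new) - y_new) ** 2)      # out-of-sample: can rise again
    print(degree, mse_in, mse_out)
```

In-sample MSE is guaranteed not to increase with degree (the models are nested), which is why it can't tell you when to stop; out-of-sample MSE can.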
kNN (k Nearest Neighbors)
http://scott.fortmann-roe.com/docs/BiasVariance.html
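For classification, kNN predicts by majority vote among the k training points closest to the query (a minimal sketch; the toy data and function name are mine):

```python
import math
from collections import Counter

def knn_classify(train, query, k=3):
    # train: list of (feature_tuple, label) pairs.
    # Sort by Euclidean distance to the query, vote among the k nearest.
    nearest = sorted(train, key=lambda p: math.dist(p[0], query))[:k]
    return Counter(label for _, label in nearest).most_common(1)[0][0]

train = [((0, 0), "a"), ((0, 1), "a"), ((1, 0), "a"),
         ((5, 5), "b"), ((5, 6), "b"), ((6, 5), "b")]
print(knn_classify(train, (1, 1)))    # nearest three are all "a"
print(knn_classify(train, (5, 5.5)))  # nearest three are all "b"
```

Small k gives low bias but high variance (jagged boundaries); large k smooths the boundary at the cost of bias — the trade-off discussed at the link above.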
$$\hat{r}_{ui} = \frac{\sum_{j \in N(i;u)} s_{ij}\, r_{uj}}{\sum_{j \in N(i;u)} s_{ij}}$$
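In code, the prediction is a similarity-weighted average of the user's ratings over the neighbor set N(i; u) (a sketch; the dict-based representation is my choice):

```python
def predict_rating(sims, ratings, neighbors):
    # sims[j]:    similarity s_ij between item i and neighbor item j
    # ratings[j]: user u's rating r_uj of neighbor item j
    # neighbors:  the neighbor set N(i; u)
    num = sum(sims[j] * ratings[j] for j in neighbors)
    den = sum(sims[j] for j in neighbors)
    return num / den

# (4.0 * 1.0 + 2.0 * 0.5) / (1.0 + 0.5) = 10/3
print(predict_rating({1: 1.0, 2: 0.5}, {1: 4.0, 2: 2.0}, [1, 2]))
```

More similar neighbors pull the prediction harder toward their rating, and the denominator keeps the result on the rating scale.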
n probability
2 0.79
3 0.52
6 0.08
10 0.002
15 0.00001
Gelman, http://andrewgelman.com/2004/10/27/the_blessing_of/
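The probabilities in the table are consistent with the fraction of the unit hypercube occupied by its inscribed ball, which collapses as the dimension n grows — one face of the curse of dimensionality. Checking with the exact volume formula (my verification, not from the slide):

```python
import math

def inscribed_ball_fraction(n):
    # Volume of the radius-1/2 ball over the unit n-cube:
    # pi^(n/2) / Gamma(n/2 + 1) * (1/2)^n
    return math.pi ** (n / 2) / math.gamma(n / 2 + 1) * 0.5 ** n

for n in (2, 3, 6, 10, 15):
    print(n, inscribed_ball_fraction(n))
```

In high dimensions nearly all of the cube's volume sits in the corners, so "nearest" neighbors are not very near — which is exactly why kNN degrades as dimension grows.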
k-means Clustering
Lossy Compression
Suppose you transmit the coordinates of points drawn randomly from this dataset. You can install decoding software at the receiver. You're only allowed to send two bits per point. It will have to be a lossy transmission. Loss = Sum Squared Error between decoded coords and original coords. What encoder/decoder will lose the least information?
Moore, www.cs.cmu.edu/~awm/tutorials
Copyright 2001, 2004, Andrew W. Moore, K-means and Hierarchical Clustering: Slide 3
K-means
1. Ask user how many clusters they'd like. (e.g. k=5)
2. Randomly guess k cluster Center locations
Moore, www.cs.cmu.edu/~awm/tutorials
Copyright 2001, 2004, Andrew W. Moore, K-means and Hierarchical Clustering: Slide 7
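Steps 1–2 above continue by alternating an assignment step (each point joins its nearest center) and an update step (each center moves to its cluster's mean). A minimal 2-D sketch (my code, not Moore's):

```python
import random

def kmeans(points, k, iters=20, seed=0):
    rng = random.Random(seed)
    centers = rng.sample(points, k)  # step 2: random initial center locations
    for _ in range(iters):
        # assignment step: each point joins its nearest center's cluster
        clusters = [[] for _ in range(k)]
        for p in points:
            c = min(range(k),
                    key=lambda j: (p[0] - centers[j][0]) ** 2 + (p[1] - centers[j][1]) ** 2)
            clusters[c].append(p)
        # update step: move each center to the mean of its cluster
        # (an empty cluster keeps its old center)
        centers = [(sum(p[0] for p in cl) / len(cl), sum(p[1] for p in cl) / len(cl))
                   if cl else centers[i]
                   for i, cl in enumerate(clusters)]
    return centers

pts = [(0, 0), (0, 1), (1, 0), (10, 10), (10, 11), (11, 10)]
print(sorted(kmeans(pts, 2)))
```

Each iteration can only lower the sum-squared-error loss from the lossy-compression slide, so the algorithm always converges — though possibly to a local optimum, which is why the initial guess matters.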
[Figure: two scatter-plot panels, (a) and (b), of 2-D data on 0–10 axes.]
MacKay, http://www.inference.phy.cam.ac.uk/itila/Potter.html