
Instance-Based Learning

[Read Ch. 8]

k-Nearest Neighbor
Locally weighted regression
Radial basis functions
Case-based reasoning
Lazy and eager learning


lecture slides for textbook Machine Learning, c Tom M. Mitchell, McGraw Hill, 1997

Instance-Based Learning

Key idea: just store all training examples $\langle x_i, f(x_i) \rangle$

Nearest neighbor:
Given query instance $x_q$, first locate nearest training example $x_n$, then estimate $\hat{f}(x_q) \leftarrow f(x_n)$

$k$-Nearest neighbor:
Given $x_q$, take vote among its $k$ nearest nbrs (if discrete-valued target function)
take mean of $f$ values of $k$ nearest nbrs (if real-valued)

$$\hat{f}(x_q) \leftarrow \frac{\sum_{i=1}^{k} f(x_i)}{k}$$
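The $k$-NN rule above can be sketched in a few lines (a minimal illustration; `knn_predict` and the toy data are hypothetical, not from the slides):

```python
import numpy as np

def knn_predict(X_train, y_train, x_q, k=3, discrete=True):
    """Estimate f(x_q) from the k nearest stored examples (Euclidean distance)."""
    dists = np.linalg.norm(X_train - x_q, axis=1)
    nearest = np.argsort(dists)[:k]               # indices of k nearest neighbors
    if discrete:
        vals, counts = np.unique(y_train[nearest], return_counts=True)
        return vals[np.argmax(counts)]            # majority vote
    return y_train[nearest].mean()                # mean of neighbor f values

X = np.array([[0.0, 0.0], [1.0, 0.0], [0.0, 1.0], [5.0, 5.0]])
y = np.array([0, 0, 0, 1])
print(knn_predict(X, y, np.array([0.2, 0.1]), k=3))  # → 0
```

Note that "training" is literally just storing `X` and `y`; all work happens at query time.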


When To Consider Nearest Neighbor

Instances map to points in $\Re^n$
Less than 20 attributes per instance
Lots of training data

Advantages:
Training is very fast
Learn complex target functions
Don't lose information

Disadvantages:
Slow at query time
Easily fooled by irrelevant attributes


Voronoi Diagram

[Figure: Voronoi diagram — the training points partition instance space into cells; the query point $x_q$ takes the label of the cell it falls in.]


Behavior in the Limit

Consider $p(x)$, the probability that instance $x$ will be labeled 1 (positive) versus 0 (negative).

Nearest neighbor:
As number of training examples $\to \infty$, approaches Gibbs Algorithm
Gibbs: with probability $p(x)$ predict 1, else 0

$k$-Nearest neighbor:
As number of training examples $\to \infty$ and $k$ gets large, approaches Bayes optimal
Bayes optimal: if $p(x) > .5$ then predict 1, else 0

Note Gibbs has at most twice the expected error of Bayes optimal
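The "at most twice" claim is easy to sanity-check: at a point where $p(x) = p$, the Gibbs predictor errs with probability $2p(1-p)$, while Bayes optimal errs with probability $\min(p, 1-p)$ (a quick numeric check, not from the slides):

```python
# Gibbs expected error at a point: P(predict 1)P(label 0) + P(predict 0)P(label 1).
# Bayes optimal error at the same point: probability of the minority label.
for p in [0.0, 0.1, 0.3, 0.5, 0.7, 0.99]:
    gibbs = 2 * p * (1 - p)
    bayes = min(p, 1 - p)
    assert gibbs <= 2 * bayes + 1e-12   # Gibbs error ≤ twice Bayes optimal error
```

The bound is tight as $p \to 0$ or $p \to 1$, where $2p(1-p) \approx 2\min(p, 1-p)$.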


Distance-Weighted kNN

Might want weight nearer neighbors more heavily...

$$\hat{f}(x_q) \leftarrow \frac{\sum_{i=1}^{k} w_i f(x_i)}{\sum_{i=1}^{k} w_i}$$

where

$$w_i \equiv \frac{1}{d(x_q, x_i)^2}$$

and $d(x_q, x_i)$ is distance between $x_q$ and $x_i$

Note now it makes sense to use all training examples instead of just $k$
$\to$ Shepard's method
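A sketch of Shepard's method — all training examples weighted by inverse squared distance (the exact-match branch and toy data are illustrative assumptions):

```python
import numpy as np

def dw_knn_predict(X_train, y_train, x_q):
    """Shepard's method sketch: weight every training example by 1/d^2."""
    d2 = np.sum((X_train - x_q) ** 2, axis=1)     # squared distances d(x_q, x_i)^2
    if np.any(d2 == 0):                           # query equals a stored example
        return y_train[np.argmin(d2)]             # return its stored f value
    w = 1.0 / d2
    return np.sum(w * y_train) / np.sum(w)        # weighted mean of f values

X = np.array([[0.0], [1.0], [2.0]])
y = np.array([0.0, 1.0, 2.0])
print(dw_knn_predict(X, y, np.array([1.0])))  # → 1.0 (exact match)
```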


Curse of Dimensionality

Imagine instances described by 20 attributes, but only 2 are relevant to target function.

Curse of dimensionality: nearest nbr is easily misled when $X$ is high-dimensional

One approach:
Stretch $j$th axis by weight $z_j$, where $z_1, \ldots, z_n$ chosen to minimize prediction error
Use cross-validation to automatically choose weights $z_1, \ldots, z_n$
Note setting $z_j$ to zero eliminates this dimension altogether

[see Moore and Lee, 1994]
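A sketch of the axis-stretching idea (the weights here are set by hand for illustration; in practice cross-validation would choose them):

```python
import numpy as np

def weighted_distance(x, y, z):
    """Distance after stretching the j-th axis by weight z_j."""
    return np.sqrt(np.sum((z * (x - y)) ** 2))

x = np.array([1.0, 2.0, 100.0])    # third attribute is irrelevant noise
y = np.array([1.0, 2.5, -40.0])
z = np.array([1.0, 1.0, 0.0])      # z_3 = 0 eliminates the noisy dimension
print(weighted_distance(x, y, z))  # → 0.5
```

With $z_3 = 0$ the irrelevant attribute no longer dominates the distance; nearest neighbors are chosen by the two relevant attributes alone.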


Locally Weighted Regression

Note $k$NN forms local approximation to $f$ for each query point $x_q$

Why not form an explicit approximation $\hat{f}(x)$ for region surrounding $x_q$?
Fit linear function to $k$ nearest neighbors
Fit quadratic, ...
Produces "piecewise approximation" to $f$

Several choices of error to minimize:

Squared error over $k$ nearest neighbors
$$E_1(x_q) \equiv \frac{1}{2} \sum_{x \in k \text{ nearest nbrs of } x_q} (f(x) - \hat{f}(x))^2$$

Distance-weighted squared error over all nbrs
$$E_2(x_q) \equiv \frac{1}{2} \sum_{x \in D} (f(x) - \hat{f}(x))^2 \, K(d(x_q, x))$$

$\ldots$
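Minimizing the distance-weighted error $E_2$ with a linear $\hat{f}$ reduces to weighted least squares. A minimal sketch, assuming a Gaussian kernel with width `tau` (both the helper name and the parameter are illustrative):

```python
import numpy as np

def lwr_predict(X, y, x_q, tau=1.0):
    """Fit a linear model around x_q, weighting examples by a Gaussian kernel."""
    A = np.c_[np.ones(len(X)), X]                 # design matrix with intercept
    d2 = np.sum((X - x_q) ** 2, axis=1)
    sw = np.sqrt(np.exp(-d2 / (2 * tau ** 2)))    # sqrt of kernel K(d(x_q, x))
    beta, *_ = np.linalg.lstsq(A * sw[:, None], y * sw, rcond=None)
    return np.r_[1.0, x_q] @ beta                 # evaluate the local fit at x_q

X = np.linspace(0, 4, 9).reshape(-1, 1)
y = 2 * X.ravel() + 1                             # exactly linear target
print(round(lwr_predict(X, y, np.array([2.0])), 6))  # → 5.0
```

Multiplying each row by the square root of its kernel weight makes ordinary least squares minimize exactly the weighted sum of squared errors in $E_2$.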


Radial Basis Function Networks

Global approximation to target function, in terms of linear combination of local approximations
Used, e.g., for image classification
A different kind of neural network
Closely related to distance-weighted regression, but "eager" instead of "lazy"


Radial Basis Function Networks

[Figure: network with input attributes $a_1(x), a_2(x), \ldots, a_n(x)$, a hidden layer of kernel units, and a linear output unit with weights $w_0, w_1, \ldots, w_k$.]

where $a_i(x)$ are the attributes describing instance $x$, and

$$f(x) = w_0 + \sum_{u=1}^{k} w_u K_u(d(x_u, x))$$

One common choice for $K_u(d(x_u, x))$ is

$$K_u(d(x_u, x)) = e^{-\frac{1}{2\sigma_u^2} d^2(x_u, x)}$$
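The forward pass of this network is just the equation above evaluated directly (a sketch; the kernel centers, widths, and weights are arbitrary example values):

```python
import numpy as np

def rbf_forward(x, centers, sigmas, w0, w):
    """f(x) = w0 + sum_u w_u * exp(-d^2(x_u, x) / (2 sigma_u^2))."""
    d2 = np.sum((centers - x) ** 2, axis=1)       # squared distances to kernel centers
    K = np.exp(-d2 / (2 * sigmas ** 2))           # Gaussian kernel activations
    return w0 + w @ K                             # linear output layer

centers = np.array([[0.0], [1.0]])
out = rbf_forward(np.array([0.0]), centers, np.array([1.0, 1.0]),
                  w0=0.5, w=np.array([2.0, 0.0]))
print(out)  # → 2.5 (the kernel at its own center contributes exp(0) = 1)
```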


Training Radial Basis Function Networks

Q1: What $x_u$ to use for each kernel function $K_u(d(x_u, x))$?
Scatter uniformly throughout instance space
Or use training instances (reflects instance distribution)

Q2: How to train weights (assume here $K_u$ are Gaussian)?
First choose variance (and perhaps mean) for each $K_u$
- e.g., use EM
Then hold $K_u$ fixed, and train linear output layer
- efficient methods to fit linear function
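The two-stage scheme — kernels fixed first, linear output layer fit second — can be sketched as follows (a shared `sigma` and the helper name `train_rbf` are simplifying assumptions):

```python
import numpy as np

def train_rbf(X, y, centers, sigma=1.0):
    """Hold the Gaussian kernels fixed, then fit output weights by least squares."""
    d2 = ((X[:, None, :] - centers[None, :, :]) ** 2).sum(-1)
    Phi = np.c_[np.ones(len(X)), np.exp(-d2 / (2 * sigma ** 2))]  # [1, K_1..K_k]
    w, *_ = np.linalg.lstsq(Phi, y, rcond=None)   # efficient linear fit
    return w                                      # [w_0, w_1, ..., w_k]

X = np.array([[0.0], [1.0], [2.0]])
y = np.array([0.0, 1.0, 0.0])
w = train_rbf(X, y, centers=X)                    # centers at training instances
```

Because the hidden layer is frozen, the second stage is an ordinary linear regression on the kernel activations, which is cheap and has a closed-form solution.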


Case-Based Reasoning

Can apply instance-based learning even when $X \neq \Re^n$
$\to$ need different "distance" metric

Case-Based Reasoning is instance-based learning applied to instances with symbolic logic descriptions

((user-complaint error53-on-shutdown)
 (cpu-model PowerPC)
 (operating-system Windows)
 (network-connection PCIA)
 (memory 48meg)
 (installed-applications Excel Netscape VirusScan)
 (disk 1gig)
 (likely-cause ???))
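With symbolic descriptions like the one above, a "distance" must compare attribute-value pairs rather than points in $\Re^n$. A toy similarity measure (a hypothetical sketch — real CBR systems, including CADET below, use much richer structural matching):

```python
def case_similarity(case_a, case_b):
    """Toy CBR metric: fraction of shared attributes whose values agree."""
    shared = set(case_a) & set(case_b)
    if not shared:
        return 0.0
    return sum(case_a[k] == case_b[k] for k in shared) / len(shared)

a = {"cpu-model": "PowerPC", "operating-system": "Windows", "memory": "48meg"}
b = {"cpu-model": "PowerPC", "operating-system": "Linux",   "memory": "48meg"}
print(case_similarity(a, b))  # 2 of 3 shared attributes agree → 0.666...
```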


Case-Based Reasoning in CADET

CADET: 75 stored examples of mechanical devices
each training example: $\langle$ qualitative function, mechanical structure $\rangle$
new query: desired function
target value: mechanical structure for this function

Distance metric: match qualitative function descriptions


Case-Based Reasoning in CADET

[Figure: a stored case, a T-junction pipe. Structure: inflows $Q_1, T_1$ and $Q_2, T_2$ joining into outflow $Q_3, T_3$ ($Q$ = waterflow, $T$ = temperature). Function: qualitative graph in which $Q_1$ and $Q_2$ positively influence $Q_3$, and $T_1$ and $T_2$ influence $T_3$.]

[Figure: a problem specification, a water faucet. Function: control inputs $C_t, C_f$ and inflows $Q_c, Q_h, T_c, T_h$ qualitatively determining the mixed outflow $Q_m$; structure to be found.]


Case-Based Reasoning in CADET

Instances represented by rich structural descriptions
Multiple cases retrieved (and combined) to form solution to new problem
Tight coupling between case retrieval and problem solving

Bottom line:
Simple matching of cases useful for tasks such as answering help-desk queries
Area of ongoing research


Lazy and Eager Learning

Lazy: wait for query before generalizing
$k$-Nearest Neighbor, Case-based reasoning

Eager: generalize before seeing query
Radial basis function networks, ID3, Backpropagation, NaiveBayes, $\ldots$

Does it matter?
Eager learner must create global approximation
Lazy learner can create many local approximations
If they use same $H$, lazy can represent more complex fns (e.g., consider $H$ = linear functions)
