
Lecture 14: Neural Networks
Machine Learning
March 18, 2010

Last Time
Perceptrons
Perceptron Loss vs. Logistic Regression Loss
Training Perceptrons and Logistic Regression Models using Gradient Descent

Today
Multilayer Neural Networks
Feed-Forward
Error Back-Propagation

Recall: The Neuron Metaphor

Neurons
accept information from multiple inputs, transmit information to other neurons.

Multiply inputs by weights along edges
Apply some function to the set of inputs at each node

[Figure: a single neuron with inputs x_1, x_2, ..., x_D]
Types of Neurons

Linear Neuron

Logistic Neuron

Perceptron

[Figure: diagrams of a linear neuron, a logistic neuron, and a perceptron, each computing f(θ, x) from a bias unit and inputs x_1 ... x_D]

Potentially more. Require a convex loss function for gradient descent training.
5
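As a concrete illustration of these three neuron types, here is a minimal NumPy sketch (an assumption for illustration, not taken from the slides): a linear output, a logistic (sigmoid) output, and the perceptron's hard threshold, all applied to the same weighted sum of inputs.

```python
import numpy as np

def linear_neuron(theta, x):
    # Linear neuron: output is the bias plus the weighted sum of inputs.
    return theta[0] + np.dot(theta[1:], x)

def logistic_neuron(theta, x):
    # Logistic neuron: squash the weighted sum through the sigmoid.
    a = theta[0] + np.dot(theta[1:], x)
    return 1.0 / (1.0 + np.exp(-a))

def perceptron(theta, x):
    # Perceptron: hard threshold on the weighted sum.
    a = theta[0] + np.dot(theta[1:], x)
    return 1.0 if a >= 0 else 0.0

x = np.array([0.5, -1.0, 2.0])           # inputs x_1 ... x_D
theta = np.array([0.1, 0.4, 0.3, -0.2])  # bias plus one weight per input
print(linear_neuron(theta, x), logistic_neuron(theta, x), perceptron(theta, x))
```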

Multilayer Networks
Cascade neurons together
The output from one layer is the input to the next
Each layer has its own set of weights

[Figure: a network with inputs x_0 ... x_P feeding successive weight layers θ_{0,*}, θ_{1,*}, θ_{2,*} and producing f(x, θ)]

Linear Regression Neural Networks

What happens when we arrange linear neurons in a multilayer network?

[Figure: a two-layer network of linear neurons with inputs x_0 ... x_D, first-layer weights θ_{0,i}, second-layer weights θ_{1,i}, and output f(x, θ)]

Linear Regression Neural Networks

Nothing special happens.
The product of two linear transformations is itself a linear transformation.

f(x, \theta) = \sum_{i=0}^{D} \theta_{1,i} \sum_{n=0}^{N-1} \theta_{0,i,n} x_n

f(x, \theta) = \sum_{i=0}^{D} \theta_{1,i} [\theta_{0,i}]^T x

f(x, \theta) = \sum_{i=0}^{D} [\hat{\theta}_i]^T x

8
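A quick numerical check of this collapse (a sketch with made-up weights, not from the slides): composing two linear layers gives exactly the same predictions as the single linear map obtained by multiplying their weight matrices.

```python
import numpy as np

rng = np.random.default_rng(0)
D_in, D_hidden = 4, 3
theta_0 = rng.normal(size=(D_hidden, D_in))   # first linear layer
theta_1 = rng.normal(size=(1, D_hidden))      # second linear layer

x = rng.normal(size=D_in)

two_layer = theta_1 @ (theta_0 @ x)           # cascade of linear neurons
theta_hat = theta_1 @ theta_0                 # collapsed single linear map
one_layer = theta_hat @ x

print(np.allclose(two_layer, one_layer))      # True: no extra expressive power
```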

Neural Networks
We want to introduce non-linearities to the network.
Non-linearities allow a network to identify complex regions in space.

[Figure: the same two-layer network, now with non-linear activations, mapping x_0 ... x_D to f(x, θ)]

Linear Separability
A 1-layer network cannot handle XOR (see the sketch below)
More layers can handle more complicated spaces, but require more parameters
Each node splits the feature space with a hyperplane
If the second layer is an AND, a 2-layer network can represent any convex hull.

10
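To make the XOR point concrete, here is a minimal sketch (hand-picked weights, an illustration rather than anything from the slides) of a 2-layer network of threshold units that computes XOR as "(x1 OR x2) AND NOT (x1 AND x2)", something no single linear threshold unit can represent.

```python
import numpy as np

def step(a):
    # Hard-threshold activation, as in the perceptron.
    return (a >= 0).astype(float)

def xor_net(x1, x2):
    x = np.array([1.0, x1, x2])                 # leading 1 is the bias input
    # Hidden layer: h1 fires for "x1 OR x2", h2 fires for "x1 AND x2".
    h1 = step(np.dot([-0.5, 1.0, 1.0], x))
    h2 = step(np.dot([-1.5, 1.0, 1.0], x))
    # Output layer: fire when h1 is on and h2 is off, i.e. XOR.
    return step(np.dot([-0.5, 1.0, -1.0], [1.0, h1, h2]))

for x1, x2 in [(0, 0), (0, 1), (1, 0), (1, 1)]:
    print(x1, x2, int(xor_net(x1, x2)))         # prints 0, 1, 1, 0
```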

Feed-Forward Networks
Predictions are fed forward through the network to classify

[Figure: activations propagating layer by layer from inputs x_0 ... x_P through weight layers θ_{0,*}, θ_{1,*}, θ_{2,*}]

11


Error Backpropagation
We will do gradient descent on the whole network.
Training will proceed from the last layer to the first.

[Figure: the feed-forward network from before, with output f(x, θ)]

17

Error Backpropagation
Introduce variables over the neural network

\theta = \{ w_{ij}, w_{jk}, w_{kl} \}

[Figure: the network with its three weight layers w_{ij}, w_{jk}, w_{kl} labeled, inputs x_0 ... x_P, output f(x, θ)]

18

Error Backpropagation
Introduce variables over the neural network
Distinguish the input and output of each node

\theta = \{ w_{ij}, w_{jk}, w_{kl} \}

[Figure: each node has a weighted input a and an output z: inputs z_i, then a_j, z_j, then a_k, z_k, then a_l, z_l = f(x, θ)]

19

Error Backpropagation

\theta = \{ w_{ij}, w_{jk}, w_{kl} \}

a_j = \sum_i w_{ij} z_i \qquad z_j = g(a_j)
a_k = \sum_j w_{jk} z_j \qquad z_k = g(a_k)
a_l = \sum_k w_{kl} z_k \qquad z_l = g(a_l)

[Figure: the network annotated with z_i, w_{ij}, a_j, z_j, w_{jk}, a_k, z_k, w_{kl}, a_l, z_l and output f(x, θ)]

20
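The forward pass in this notation can be written as a short NumPy sketch (the layer sizes and the use of the logistic function for g are assumptions for illustration, not part of the slides):

```python
import numpy as np

def g(a):
    # Assume a logistic (sigmoid) activation for every layer.
    return 1.0 / (1.0 + np.exp(-a))

def forward(x, w_ij, w_jk, w_kl):
    """Forward pass: a = weighted input to a node, z = its output."""
    z_i = x                      # inputs are the "outputs" of the first layer
    a_j = w_ij @ z_i             # a_j = sum_i w_ij z_i
    z_j = g(a_j)
    a_k = w_jk @ z_j             # a_k = sum_j w_jk z_j
    z_k = g(a_k)
    a_l = w_kl @ z_k             # a_l = sum_k w_kl z_k
    z_l = g(a_l)                 # z_l = f(x, theta)
    return a_j, z_j, a_k, z_k, a_l, z_l

rng = np.random.default_rng(0)
w_ij = rng.normal(size=(5, 4))   # 4 inputs -> 5 hidden units (illustrative sizes)
w_jk = rng.normal(size=(3, 5))   # 5 -> 3 hidden units
w_kl = rng.normal(size=(1, 3))   # 3 -> 1 output
x = rng.normal(size=4)
print(forward(x, w_ij, w_jk, w_kl)[-1])   # network output f(x, theta)
```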

Error Backpropagation

Training: Take the gradient of the last component and iterate backwards

\theta = \{ w_{ij}, w_{jk}, w_{kl} \}

a_j = \sum_i w_{ij} z_i \qquad z_j = g(a_j)
a_k = \sum_j w_{jk} z_j \qquad z_k = g(a_k)
a_l = \sum_k w_{kl} z_k \qquad z_l = g(a_l)

[Figure: the annotated network as before]

21

Error Backpropagation
Empirical Risk Function

R(\theta) = \frac{1}{N} \sum_{n=0}^{N-1} L(y_n, f(x_n))
 = \frac{1}{N} \sum_{n=0}^{N-1} \frac{1}{2} \left( y_n - f(x_n) \right)^2
 = \frac{1}{N} \sum_{n=0}^{N-1} \frac{1}{2} \left( y_n - g\Big( \sum_k w_{kl} \, g\big( \sum_j w_{jk} \, g( \sum_i w_{ij} x_{n,i} ) \big) \Big) \right)^2

[Figure: the annotated network as before]

22
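Using the forward pass sketched earlier, the empirical risk over a data set could be computed as follows (squared-error loss as on the slide; forward() is the function from the earlier sketch, and the data are assumed to be given):

```python
def empirical_risk(X, y, w_ij, w_jk, w_kl):
    """R(theta) = (1/N) * sum_n 1/2 * (y_n - f(x_n))^2, squared-error loss."""
    total = 0.0
    for x_n, y_n in zip(X, y):
        # forward() is the forward-pass function from the earlier sketch.
        f_n = forward(x_n, w_ij, w_jk, w_kl)[-1]
        total += 0.5 * float((y_n - f_n) ** 2)
    return total / len(X)
```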

Error Backpropagation
Optimize last layer weights w_{kl}

L_n = \frac{1}{2} \left( y_n - f(x_n) \right)^2

Calculus chain rule

\frac{\partial R}{\partial w_{kl}} = \frac{1}{N} \sum_n \frac{\partial L_n}{\partial a_{l,n}} \frac{\partial a_{l,n}}{\partial w_{kl}}

[Figure: the annotated network as before]

23

Error Backpropagation
Optimize last layer weights w_{kl}

L_n = \frac{1}{2} \left( y_n - f(x_n) \right)^2

Calculus chain rule

\frac{\partial R}{\partial w_{kl}} = \frac{1}{N} \sum_n \frac{\partial L_n}{\partial a_{l,n}} \frac{\partial a_{l,n}}{\partial w_{kl}}
 = \frac{1}{N} \sum_n \frac{\partial \frac{1}{2}(y_n - g(a_{l,n}))^2}{\partial a_{l,n}} \frac{\partial a_{l,n}}{\partial w_{kl}}

[Figure: the annotated network as before]

24

Error Backpropagation
Optimize last layer weights w_{kl}

L_n = \frac{1}{2} \left( y_n - f(x_n) \right)^2

Calculus chain rule

\frac{\partial R}{\partial w_{kl}} = \frac{1}{N} \sum_n \frac{\partial L_n}{\partial a_{l,n}} \frac{\partial a_{l,n}}{\partial w_{kl}}
 = \frac{1}{N} \sum_n \frac{\partial \frac{1}{2}(y_n - g(a_{l,n}))^2}{\partial a_{l,n}} \frac{\partial \sum_k z_{k,n} w_{kl}}{\partial w_{kl}}

[Figure: the annotated network as before]

25

Error Backpropagation
Optimize last layer weights w_{kl}

L_n = \frac{1}{2} \left( y_n - f(x_n) \right)^2

Calculus chain rule

\frac{\partial R}{\partial w_{kl}} = \frac{1}{N} \sum_n \frac{\partial L_n}{\partial a_{l,n}} \frac{\partial a_{l,n}}{\partial w_{kl}}
 = \frac{1}{N} \sum_n \frac{\partial \frac{1}{2}(y_n - g(a_{l,n}))^2}{\partial a_{l,n}} \frac{\partial \sum_k z_{k,n} w_{kl}}{\partial w_{kl}}
 = \frac{1}{N} \sum_n \left[ -(y_n - z_{l,n}) \, g'(a_{l,n}) \right] z_{k,n}

[Figure: the annotated network as before]

26

Error Backpropagation
Optimize last layer weights w_{kl}

L_n = \frac{1}{2} \left( y_n - f(x_n) \right)^2

Calculus chain rule

\frac{\partial R}{\partial w_{kl}} = \frac{1}{N} \sum_n \frac{\partial L_n}{\partial a_{l,n}} \frac{\partial a_{l,n}}{\partial w_{kl}}
 = \frac{1}{N} \sum_n \frac{\partial \frac{1}{2}(y_n - g(a_{l,n}))^2}{\partial a_{l,n}} \frac{\partial \sum_k z_{k,n} w_{kl}}{\partial w_{kl}}
 = \frac{1}{N} \sum_n \left[ -(y_n - z_{l,n}) \, g'(a_{l,n}) \right] z_{k,n}
 = \frac{1}{N} \sum_n \delta_{l,n} \, z_{k,n}

[Figure: the annotated network as before]

27

Error Backpropagation
Optimize last hidden weights w_{jk}

\frac{\partial R}{\partial w_{jk}} = \frac{1}{N} \sum_n \frac{\partial L_n}{\partial a_{k,n}} \frac{\partial a_{k,n}}{\partial w_{jk}}

\frac{\partial R}{\partial w_{kl}} = \frac{1}{N} \sum_n \delta_{l,n} \, z_{k,n}

[Figure: the annotated network as before]

28

Error Backpropagation
Optimize last hidden weights w_{jk}

Multivariate chain rule

\frac{\partial R}{\partial w_{jk}} = \frac{1}{N} \sum_n \left[ \sum_l \frac{\partial L_n}{\partial a_{l,n}} \frac{\partial a_{l,n}}{\partial a_{k,n}} \right] \frac{\partial a_{k,n}}{\partial w_{jk}}

\frac{\partial R}{\partial w_{kl}} = \frac{1}{N} \sum_n \delta_{l,n} \, z_{k,n}

[Figure: the annotated network as before]

29

Error Backpropagation
Optimize last hidden weights w_{jk}

Multivariate chain rule

\frac{\partial R}{\partial w_{jk}} = \frac{1}{N} \sum_n \left[ \sum_l \frac{\partial L_n}{\partial a_{l,n}} \frac{\partial a_{l,n}}{\partial a_{k,n}} \right] \frac{\partial a_{k,n}}{\partial w_{jk}}
 = \frac{1}{N} \sum_n \left[ \sum_l \delta_{l,n} \frac{\partial a_{l,n}}{\partial a_{k,n}} \right] z_{j,n}

\frac{\partial R}{\partial w_{kl}} = \frac{1}{N} \sum_n \delta_{l,n} \, z_{k,n}

[Figure: the annotated network as before]

30

Error Backpropagation
Optimize last hidden weights w_{jk}

Multivariate chain rule

a_l = \sum_k w_{kl} \, g(a_k)

\frac{\partial R}{\partial w_{jk}} = \frac{1}{N} \sum_n \left[ \sum_l \frac{\partial L_n}{\partial a_{l,n}} \frac{\partial a_{l,n}}{\partial a_{k,n}} \right] \frac{\partial a_{k,n}}{\partial w_{jk}}
 = \frac{1}{N} \sum_n \left[ \sum_l \delta_{l,n} \frac{\partial a_{l,n}}{\partial a_{k,n}} \right] z_{j,n}

\frac{\partial R}{\partial w_{kl}} = \frac{1}{N} \sum_n \delta_{l,n} \, z_{k,n}

[Figure: the annotated network as before]

31

Error Backpropagation
Optimize last hidden weights w_{jk}

Multivariate chain rule

a_l = \sum_k w_{kl} \, g(a_k)

\frac{\partial R}{\partial w_{jk}} = \frac{1}{N} \sum_n \left[ \sum_l \frac{\partial L_n}{\partial a_{l,n}} \frac{\partial a_{l,n}}{\partial a_{k,n}} \right] \frac{\partial a_{k,n}}{\partial w_{jk}}
 = \frac{1}{N} \sum_n \left[ \sum_l \delta_{l,n} \, w_{kl} \, g'(a_{k,n}) \right] z_{j,n}
 = \frac{1}{N} \sum_n \delta_{k,n} \, z_{j,n}

\frac{\partial R}{\partial w_{kl}} = \frac{1}{N} \sum_n \delta_{l,n} \, z_{k,n}

[Figure: the annotated network as before]

32

Error Backpropagation
Repeat for all previous layers

\frac{\partial R}{\partial w_{kl}} = \frac{1}{N} \sum_n \frac{\partial L_n}{\partial a_{l,n}} \frac{\partial a_{l,n}}{\partial w_{kl}}
 = \frac{1}{N} \sum_n \left[ -(y_n - z_{l,n}) \, g'(a_{l,n}) \right] z_{k,n} = \frac{1}{N} \sum_n \delta_{l,n} \, z_{k,n}

\frac{\partial R}{\partial w_{jk}} = \frac{1}{N} \sum_n \frac{\partial L_n}{\partial a_{k,n}} \frac{\partial a_{k,n}}{\partial w_{jk}}
 = \frac{1}{N} \sum_n \left[ \sum_l \delta_{l,n} \, w_{kl} \, g'(a_{k,n}) \right] z_{j,n} = \frac{1}{N} \sum_n \delta_{k,n} \, z_{j,n}

\frac{\partial R}{\partial w_{ij}} = \frac{1}{N} \sum_n \frac{\partial L_n}{\partial a_{j,n}} \frac{\partial a_{j,n}}{\partial w_{ij}}
 = \frac{1}{N} \sum_n \left[ \sum_k \delta_{k,n} \, w_{jk} \, g'(a_{j,n}) \right] z_{i,n} = \frac{1}{N} \sum_n \delta_{j,n} \, z_{i,n}

[Figure: the annotated network as before]

33
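A compact NumPy sketch of these three gradient computations (a worked illustration under the same assumptions as the earlier forward-pass sketch: logistic g, a single output node, weights stored as matrices with one row per downstream node):

```python
import numpy as np

def g(a):
    return 1.0 / (1.0 + np.exp(-a))

def g_prime(a):
    s = g(a)
    return s * (1.0 - s)

def backprop_gradients(X, y, w_ij, w_jk, w_kl):
    """Average gradients of R(theta) = (1/N) sum_n 1/2 (y_n - f(x_n))^2."""
    grad_ij = np.zeros_like(w_ij)
    grad_jk = np.zeros_like(w_jk)
    grad_kl = np.zeros_like(w_kl)
    N = len(X)
    for x_n, y_n in zip(X, y):
        # Forward pass, keeping every a and z.
        z_i = x_n
        a_j = w_ij @ z_i; z_j = g(a_j)
        a_k = w_jk @ z_j; z_k = g(a_k)
        a_l = w_kl @ z_k; z_l = g(a_l)
        # Backward pass: deltas flow from the output layer back toward the input.
        delta_l = -(y_n - z_l) * g_prime(a_l)          # output layer
        delta_k = (w_kl.T @ delta_l) * g_prime(a_k)    # last hidden layer
        delta_j = (w_jk.T @ delta_k) * g_prime(a_j)    # first hidden layer
        # dR/dw = (1/N) sum_n delta * z (outer products give one entry per weight).
        grad_kl += np.outer(delta_l, z_k) / N
        grad_jk += np.outer(delta_k, z_j) / N
        grad_ij += np.outer(delta_j, z_i) / N
    return grad_ij, grad_jk, grad_kl
```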

Error Backpropagation
Now that we have well-defined gradients for each parameter, update using Gradient Descent:

w_{ij}^{t+1} = w_{ij}^{t} - \eta \frac{\partial R}{\partial w_{ij}}
w_{jk}^{t+1} = w_{jk}^{t} - \eta \frac{\partial R}{\partial w_{jk}}
w_{kl}^{t+1} = w_{kl}^{t} - \eta \frac{\partial R}{\partial w_{kl}}

[Figure: the annotated network as before]

34
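Tying the pieces together, a minimal training loop might look like this (the toy data, layer sizes, learning rate, and iteration count are arbitrary illustrative choices; backprop_gradients() is the function from the previous sketch):

```python
import numpy as np

rng = np.random.default_rng(0)
X = rng.normal(size=(20, 4))                     # 20 toy examples, 4 features
y = rng.integers(0, 2, size=(20, 1)).astype(float)
w_ij = rng.normal(scale=0.1, size=(5, 4))
w_jk = rng.normal(scale=0.1, size=(3, 5))
w_kl = rng.normal(scale=0.1, size=(1, 3))

eta = 0.1                                        # learning rate (illustrative)
for t in range(1000):
    # backprop_gradients() is defined in the previous sketch.
    grad_ij, grad_jk, grad_kl = backprop_gradients(X, y, w_ij, w_jk, w_kl)
    w_ij -= eta * grad_ij                        # w^{t+1} = w^t - eta * dR/dw
    w_jk -= eta * grad_jk
    w_kl -= eta * grad_kl
```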

Error Back-propagation
Error backprop unravels the multivariate chain rule and solves the gradient for each partial component separately.
The target values for each layer come from the next layer.
This feeds the errors back along the network.

[Figure: the annotated network as before]

35

Problems with Neural Networks

Interpretation of Hidden Layers
Overfitting

36

Interpretation of Hidden Layers

What are the hidden layers doing?!
Feature Extraction
The non-linearities in the feature extraction can make interpretation of the hidden layers very difficult.
This leads to Neural Networks being treated as black boxes.

37

Overfitting in Neural Networks

Neural Networks are especially prone to overfitting.
Recall Perceptron Error
Zero error is possible, but so is more extreme overfitting

[Figure: Perceptron loss vs. Logistic Regression loss]

38

Bayesian Neural Networks

Bayesian Logistic Regression by inserting a prior on the weights
Equivalent to L2 Regularization

We can do the same here.
Error Backprop then becomes Maximum A Posteriori (MAP) rather than Maximum Likelihood (ML) training

R(\theta) = \frac{1}{N} \sum_{n=0}^{N-1} L(y_n, f(x_n)) + \lambda \|\theta\|^2

39
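In the backprop sketches above, this MAP / L2-regularized objective only changes each weight gradient by an additive weight-decay term (lam below is an assumed regularization strength, and the data and weights are those of the earlier training-loop sketch):

```python
lam = 0.01   # assumed regularization strength (the lambda in the slide's R(theta))

# Gradient of R(theta) + lambda * ||theta||^2 adds 2 * lambda * w to each gradient.
grad_ij, grad_jk, grad_kl = backprop_gradients(X, y, w_ij, w_jk, w_kl)
grad_ij += 2 * lam * w_ij
grad_jk += 2 * lam * w_jk
grad_kl += 2 * lam * w_kl
```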

Handwriting Recognition
Demo: http://yann.lecun.com/exdb/lenet/index.html

40

Convolutional Network
The network is not fully connected.
Different nodes are responsible for different regions of the image.
This allows for robustness to transformations.
41
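As a rough illustration of this local connectivity (a sketch, not LeNet itself): each output node below sees only a small patch of the input image, and every node shares the same kernel weights.

```python
import numpy as np

def conv2d_valid(image, kernel):
    # Each output node is connected only to a small patch of the image,
    # and all nodes share the same kernel weights.
    H, W = image.shape
    kh, kw = kernel.shape
    out = np.zeros((H - kh + 1, W - kw + 1))
    for r in range(out.shape[0]):
        for c in range(out.shape[1]):
            out[r, c] = np.sum(image[r:r + kh, c:c + kw] * kernel)
    return out

image = np.arange(36, dtype=float).reshape(6, 6)   # toy 6x6 "image"
kernel = np.array([[1.0, 0.0, -1.0]] * 3)          # simple edge-like filter
print(conv2d_valid(image, kernel).shape)           # (4, 4) feature map
```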

Other Neural Networks

Multiple Outputs
Skip Layer Network
Recurrent Neural Networks

42

Multiple Outputs

[Figure: the network with several output nodes instead of one]

Used for N-way classification.
Each node in the output layer corresponds to a different class.
No guarantee that the sum of the output vector will equal 1.
43
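For instance (a toy sketch with random weights), with one logistic output node per class the outputs each lie in (0, 1) but need not sum to 1:

```python
import numpy as np

rng = np.random.default_rng(0)
x = rng.normal(size=4)
W_out = rng.normal(size=(3, 4))                 # one row of weights per class

outputs = 1.0 / (1.0 + np.exp(-(W_out @ x)))    # independent logistic outputs
print(outputs, outputs.sum())                   # the sum is generally not 1
```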

Skip Layer Network

Input nodes are also sent directly to the output layer.

[Figure: inputs x_0, x_1, x_2 connected both to the hidden layer and directly to the output f(x, θ)]

44

Recurrent Neural Networks

Output or hidden layer information is stored in a context or memory layer.

[Figure: Input Layer feeds Hidden Layer feeds Output Layer, with a Context Layer that stores the previous state and feeds back into the Hidden Layer]

45
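A minimal sketch of this context-layer idea (an Elman-style update with made-up sizes and data; an illustration rather than the slides' exact architecture): at each step the hidden layer sees both the current input and the previous hidden state stored in the context layer.

```python
import numpy as np

def sigmoid(a):
    return 1.0 / (1.0 + np.exp(-a))

rng = np.random.default_rng(0)
n_in, n_hidden, n_out = 3, 5, 2
W_in = rng.normal(scale=0.1, size=(n_hidden, n_in))       # input -> hidden
W_ctx = rng.normal(scale=0.1, size=(n_hidden, n_hidden))  # context -> hidden
W_out = rng.normal(scale=0.1, size=(n_out, n_hidden))     # hidden -> output

context = np.zeros(n_hidden)             # the "memory" of the previous step
for x_t in rng.normal(size=(4, n_in)):   # a short input sequence
    hidden = sigmoid(W_in @ x_t + W_ctx @ context)
    output = sigmoid(W_out @ hidden)
    context = hidden                     # store hidden state for the next step
    print(output)
```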


Time Delayed Recurrent Neural Networks (TDRNN)

Outputs from time t are used as inputs to the hidden layer at time t+1.
With an optional decay

[Figure: Input Layer, Hidden Layer, Output Layer, with the output at time t fed back into the Hidden Layer at time t+1]

47

Maximum Margin
Perceptron can lead to many equally valid choices for the decision boundary

Are these really equally valid?


48

Max Margin
How can we pick which is best? Maximize the size of the margin.

[Figure: two decision boundaries, one with a small margin and one with a large margin]

Are these really equally valid?


49

Next Time
Maximum Margin Classifiers
Support Vector Machines
Kernel Methods

50
