Neural Networks 1
Topics
– Perceptrons
  structure
  training
  expressiveness
– Multilayer networks
  possible structures
  activation functions
  training with gradient descent and backpropagation
  expressiveness
Consider humans:
– Neuron switching time ~ 0.001 second
– Number of neurons ~ 10^10
– Connections per neuron ~ 10^4–10^5
– Scene recognition time ~ 0.1 second
– 100 inference steps doesn’t seem like enough
Properties:
– Many neuron-like threshold switching units
– Many weighted interconnections among units
– Highly parallel, distributed process
– Emphasis on tuning weights automatically
[Figure: a single perceptron with inputs X1, X2, X3 connected by weighted links to an output node Y with threshold t.]
The output node sums up its input values according to the weights of their links, then thresholds at t:

$y = \operatorname{sign}\Big(\sum_j w_j x_j - t\Big)$
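The thresholded sum above can be sketched in a few lines of Python (the weights, inputs, and threshold here are made-up values for illustration, not from the slides):

```python
import numpy as np

def perceptron_output(x, w, t):
    """Threshold unit: fire +1 if the weighted input sum exceeds t, else -1."""
    return 1 if np.dot(w, x) - t > 0 else -1

# Hypothetical inputs, weights, and threshold.
x = np.array([1.0, 0.0, 1.0])
w = np.array([0.5, -0.2, 0.3])
print(perceptron_output(x, w, t=0.4))  # sum = 0.8 > 0.4, so the unit fires: 1
```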
– Delta rule
  train on unthresholded outputs
  driven by continuous differences between correct and predicted outputs
  modify weights via gradient descent
With learning rate $\eta = 0.1$, the delta rule minimizes the squared error

$E = \frac{1}{2} \sum_i (y_i - \hat{y}_i)^2$

by updating each weight in proportion to the error times the corresponding input:

$\Delta w_1 = \eta \sum_i (y_i - \hat{y}_i)\, x_{i1}, \qquad \Delta w_2 = \eta \sum_i (y_i - \hat{y}_i)\, x_{i2}$

[Figure: worked numeric example of one delta-rule update; the specific values are not recoverable from the extraction.]
Note that the error is the difference between the correct output and the unthresholded weighted sum of inputs, a continuous quantity (rather than the binary difference between the correct output and the thresholded output).
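The delta rule as described above can be sketched as a batch gradient-descent loop on the unthresholded output. This is a minimal illustration; the toy data, learning rate, and iteration count are my own choices, not from the slides:

```python
import numpy as np

def delta_rule_epoch(X, y, w, eta=0.1):
    """One batch delta-rule pass: w_j += eta * sum_i (y_i - yhat_i) * x_ij,
    where yhat = Xw is the continuous, unthresholded output."""
    y_hat = X @ w                      # unthresholded predictions
    return w + eta * X.T @ (y - y_hat)

# Toy training set (made up): learn w so that Xw approximates y.
X = np.array([[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]])
y = np.array([1.0, -1.0, 0.0])
w = np.zeros(2)
for _ in range(100):
    w = delta_rule_epoch(X, y, w)
print(np.round(w, 3))  # converges toward the least-squares solution [1, -1]
```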
Gradient:

$\nabla E(\mathbf{w}) = \left[ \frac{\partial E}{\partial w_0}, \frac{\partial E}{\partial w_1}, \dots, \frac{\partial E}{\partial w_d} \right]$
$$
\begin{aligned}
\frac{\partial E}{\partial w_j}
&= \frac{\partial}{\partial w_j} \, \frac{1}{2} \sum_i (y_i - \hat{y}_i)^2 \\
&= \frac{1}{2} \sum_i \frac{\partial}{\partial w_j} (y_i - \hat{y}_i)^2 \\
&= \frac{1}{2} \sum_i 2\,(y_i - \hat{y}_i)\, \frac{\partial}{\partial w_j} (y_i - \hat{y}_i) \\
&= \sum_i (y_i - \hat{y}_i)\, \frac{\partial}{\partial w_j} \big(y_i - \mathbf{w} \cdot \mathbf{x}_i\big) \\
\frac{\partial E}{\partial w_j}
&= \sum_i (y_i - \hat{y}_i)\,(-x_{ij})
\end{aligned}
$$
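One way to sanity-check the closed-form gradient derived above is to compare it against a finite-difference estimate of E. The random data and tolerances below are my own illustration:

```python
import numpy as np

def grad_E(w, X, y):
    """dE/dw_j = sum_i (y_i - yhat_i) * (-x_ij), with yhat = Xw."""
    return -X.T @ (y - X @ w)

def E(w, X, y):
    """Squared error E = 1/2 * sum_i (y_i - yhat_i)^2."""
    r = y - X @ w
    return 0.5 * r @ r

# Compare against a central finite-difference estimate at a random point.
rng = np.random.default_rng(0)
X, y, w = rng.normal(size=(5, 3)), rng.normal(size=5), rng.normal(size=3)
eps = 1e-6
fd = np.array([(E(w + eps * np.eye(3)[j], X, y)
                - E(w - eps * np.eye(3)[j], X, y)) / (2 * eps)
               for j in range(3)])
print(np.allclose(grad_E(w, X, y), fd, atol=1e-4))  # True
```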
Batch mode
– Compute errors and weight updates for a block of
samples (maybe all samples).
– Apply all updates simultaneously to weights.
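The two batch-mode bullets can be made concrete: compute the per-sample updates for the whole block first, then change the weights once. A sketch with made-up toy data:

```python
import numpy as np

def batch_update(X, y, w, eta=0.05):
    """Batch mode: accumulate per-sample delta-rule updates over the block,
    then apply them to the weights simultaneously."""
    delta_w = np.zeros_like(w)
    for x_i, y_i in zip(X, y):                   # errors for every sample first
        delta_w += eta * (y_i - w @ x_i) * x_i   # no weight change yet
    return w + delta_w                           # one simultaneous update

# Toy block of samples (hypothetical values).
X = np.array([[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]])
y = np.array([1.0, -1.0, 0.0])
w = batch_update(X, y, np.zeros(2))
print(w)  # summed update: [0.05, -0.05]
```

Because the weights are frozen while the block is processed, this equals the vectorized step `w + eta * X.T @ (y - X @ w)`.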
[Figure: perceptron with input nodes X1, X2, X3 and a single output node Y with threshold t.]

$y = I\Big(\sum_j w_j x_j - t\Big)$

where $I(\cdot)$ is the indicator (step) function applied to the thresholded sum.