Artificial Neural Networks
Introduction
History
1943: McCulloch and Pitts show that simple artificial neurons can compute logical functions such as
a or b
a and b
1950s: Farley and Clark, and later Rochester, Holland, Haibt, and Duda, run the first computer simulations of neural networks
History
1958: Rosenblatt's perceptron, a three-layer system:
Input nodes
Output node
Association layer
Can learn simple functions by adjusting weights
History
Single neuron:
Input is weighted and summed
The output function fires if the weighted sum of the input exceeds a threshold
History
1980s: backpropagation makes it possible to train multiple layers of neurons
Better results renew interest in ANNs
ANN Promises
Combine the speed of computers with the ability to learn from examples
Neuron Model
Natural neurons
[Figure: structure of a natural neuron]
Bias Nodes
Add a node with constant output 1 to each layer; its weights take over the role of thresholds
Forward propagation
Calculate the output of every node, layer by layer, from the inputs to the output
Neuron Model
Firing Rules:
Threshold rules:
Perceptron rule:
$$\varphi(v) = \begin{cases} 1 & \text{if } v > 0 \\ \tfrac{1}{2} & \text{if } v = 0 \\ 0 & \text{if } v < 0 \end{cases}$$
Neuron Model
Smooth firing rules:
Hyperbolic tangent function:
$$\varphi(v) = \tanh(v/2) = \frac{1 - \exp(-v)}{1 + \exp(-v)}$$
Logistic activation function:
$$y_j(n) = \varphi(v_j(n)) = \frac{1}{1 + \exp(-v_j(n))}$$
where the net input to neuron j is
$$v_j(n) = \sum_{i=0}^{p} w_{ji}\, y_i(n)$$
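As an illustration, these firing rules are easy to state in Python (a sketch; the function names are ours, not from the slides):

import math

def threshold(v):
    # Perceptron-style rule: fire once the net input exceeds the threshold.
    return 1 if v > 0 else 0

def logistic(v):
    # Logistic activation: a smooth, differentiable threshold.
    return 1.0 / (1.0 + math.exp(-v))

def tanh_rule(v):
    # The same S-curve rescaled to (-1, 1):
    # tanh(v/2) = (1 - exp(-v)) / (1 + exp(-v))
    return math.tanh(v / 2.0)

def net_input(weights, inputs):
    # v_j = sum_i w_ji * y_i
    return sum(w * y for w, y in zip(weights, inputs))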
Bias Node
A single neuron with a bias input can compute the and of its inputs:
w0,3 = 0.6
w1,3 = 0.6
w2,3 = -0.9 (weight from the bias node)
With a threshold rule the neuron fires only for input (1,1), since 0.6 + 0.6 - 0.9 > 0 while 0.6 - 0.9 < 0.
XOR of inputs
[Figure: a trained 2-2-1 xor network (nodes 1-4 plus a bias node). The learned weights shown include 5.9, 5.1, 4, 4.6, -5.2, -3.2, 5.2, -2.6, -2.7, and -4.8; the slides step through the node outputs for the inputs (0,0) and (0,1).]
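xor is not linearly separable, so a single neuron cannot compute it; a hidden layer is required. Since the figure's weight-to-edge assignment is lost, the following sketch uses illustrative hand-picked weights (not the slide's learned ones) for a 2-2-1 logistic network that computes xor:

import math

def phi(v):
    return 1.0 / (1.0 + math.exp(-v))

def xor_net(x1, x2):
    h_or  = phi(6 * x1 + 6 * x2 - 3)   # fires if at least one input is 1
    h_and = phi(6 * x1 + 6 * x2 - 9)   # fires only if both inputs are 1
    return phi(12 * h_or - 12 * h_and - 6)  # "or but not and" = xor

for x1 in (0, 1):
    for x2 in (0, 1):
        print(x1, x2, round(xor_net(x1, x2), 3))  # ~0, ~1, ~1, ~0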
[Figure: density plot of the network output as a function of the two inputs]
ANN Training
Start with randomly chosen weights
Train on a training set of inputs with known target outputs
During training, adjust the weights to reduce the error
ANN Training
Error for a single input x:
$$E(x) = \frac{1}{2}\sum_{k}\bigl(y_k(x) - t_k(x)\bigr)^2$$
y: observed output
t: target output
The sum runs over all output neurons k.
ANN Training
Adjust weights and thresholds by gradient descent with learning rate $\eta$:
$$w_{jk} \leftarrow w_{jk} - \eta\,\frac{\partial E(x)}{\partial w_{jk}}$$
$$\theta_k \leftarrow \theta_k - \eta\,\frac{\partial E(x)}{\partial \theta_k}$$
ANN Training
For a weight w_jk from neuron j into output neuron k, the gradient has a simple form:
$$\frac{\partial E(x)}{\partial w_{jk}} = -\,y_j\,\delta_k, \qquad \delta_k = \varphi'(\text{net}_k)\,(t_k - y_k)$$
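In code, the error and one gradient-descent step on a single weight might look like this (a sketch; the learning rate 0.1 is an arbitrary choice):

def error(outputs, targets):
    # E(x) = 1/2 * sum_k (y_k - t_k)^2
    return 0.5 * sum((y - t) ** 2 for y, t in zip(outputs, targets))

def descend(w_jk, y_j, delta_k, eta=0.1):
    # dE/dw_jk = -y_j * delta_k, so the update w -= eta * dE/dw
    # becomes w += eta * y_j * delta_k.
    return w_jk + eta * y_j * delta_k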
ANN Training
Example: a small network learning xor
[Figure: network with input nodes x1 and x2, hidden neurons 2 and 3, output neuron 4, and a bias node; the starting weights shown include 0.1, -0.5, 1, 1, -1, -0.5, and -0.5]
Outputs and errors with the starting weights (targets are xor, E = (y - t)^2):
x1 x2 | y        | E
0  0  | 0.687349 | 0.472448
0  1  | 0.667459 | 0.110583
1  0  | 0.698070 | 0.0911618
1  1  | 0.676727 | 0.457959
Average Error is 0.283038
To adjust the weights into output neuron 4, first evaluate the derivative of the activation function at node 4's net input:
φ'(net4(0,0)) = φ'(0.787754) = 0.214900
φ'(net4(0,1)) = φ'(0.696717) = 0.221957
φ'(net4(1,0)) = φ'(0.838124) = 0.210768
φ'(net4(1,1)) = φ'(0.738770) = 0.218768
Using $\delta_k = \varphi'(\text{net}_k)\,(t_k - y_k)$ for the output neuron:
Input (0,0): δ4 = φ'(net4(0,0))·(0 - y4(0,0)) = -0.152928
Input (0,1): δ4 = φ'(net4(0,1))·(1 - y4(0,1)) = 0.0682324
Input (1,0): δ4 = φ'(net4(1,0))·(1 - y4(1,0)) = 0.0593889
Input (1,1): δ4 = φ'(net4(1,1))·(0 - y4(1,1)) = -0.153776
Average: δ4 = -0.0447706
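A sketch of this per-input delta computation, assuming the logistic activation so that φ'(v) = φ(v)(1 - φ(v)):

import math

def phi(v):
    return 1.0 / (1.0 + math.exp(-v))

def phi_prime(v):
    y = phi(v)
    return y * (1.0 - y)   # derivative of the logistic function

def output_delta(net_k, target_k):
    # delta_k = phi'(net_k) * (t_k - y_k) for one training input
    return phi_prime(net_k) * (target_k - phi(net_k))

def average_delta(nets, targets):
    # average over the training inputs, as on the slides
    deltas = [output_delta(n, t) for n, t in zip(nets, targets)]
    return sum(deltas) / len(deltas)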
The gradient for weight w42 (from neuron 2 into neuron 4) is E4,2 = ∂E/∂w42 = -output(neuron 2)·δ4:
For (0,0): E4,2 = 0.0577366
For (0,1): E4,2 = -0.0424719
For (1,0): E4,2 = -0.0159721
For (1,1): E4,2 = 0.0768878
Average is 0.0190451
The gradients for the remaining weights into node 4 follow the same pattern. For the weight from neuron 3, E4,3 = -output(neuron 3)·δ4, for example:
For (1,0): E4,3 = -0.0108341
For (1,1): E4,3 = 0.0580565
For the bias weight, the bias node always outputs 1, so E4,B = -δ4 for each input.
Average is 0.0447706
We now have adjusted all the weights into the output layer.
Next, we adjust the hidden layer.
The target output is given by the delta values of the output layer.
More formally:
Assume that j is a hidden neuron and that δk is the delta-value of an output neuron k.
While the example has only one output neuron, most ANNs have more. Summing over k means summing over all output neurons.
wkj is the weight from neuron j into neuron k:
$$\delta_j = \varphi'(\text{net}_j)\sum_k \delta_k\, w_{kj}$$
The weight gradients in the hidden layer are then, as before,
$$\frac{\partial E}{\partial w_{ji}} = -\,y_i\,\delta_j$$
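As a sketch (reusing phi_prime from the earlier snippet), the hidden-layer delta is:

def hidden_delta(net_j, downstream):
    # downstream: (delta_k, w_kj) pairs over all output neurons k,
    # where w_kj is the weight from hidden neuron j into neuron k.
    # delta_j = phi'(net_j) * sum_k delta_k * w_kj
    return phi_prime(net_j) * sum(d_k * w_kj for d_k, w_kj in downstream)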
For hidden neuron 2 this gives:
δ2(0,1) = 0.00160349
δ2(1,0) = 0.00116766
δ2(1,1) = -0.00384439
Gradient for the weight from input node 0 into neuron 2, E2,0 = -y0·δ2:
E2,0(0,1) = 0.00179694
E2,0(1,0) = -0.000853626
E2,0(1,1) = 0.00281047
Gradient for the weight from input node 1 into neuron 2, E2,1 = -y1·δ2:
E2,1(0,1) = -0.00117224
E2,1(1,0) = -0.000583829
E2,1(1,1) = 0.00281047
Gradient for the bias weight into neuron 2, E2,B = -δ2:
E2,B(0,0) = 0.00179694
E2,B(0,1) = -0.00117224
E2,B(1,0) = -0.000583829
E2,B(1,1) = 0.00281047
The weights into hidden neuron 3 are treated in exactly the same way. Averaging each gradient over the four inputs and stepping against it completes one iteration of training.
ANN Training
XOR is too simple an example: the quality of this ANN is measured on a finite set of inputs.
More relevant are ANNs that are trained on a training set and unleashed on real data.
ANN Training
Quality measured on the training set alone is misleading:
Need separate test sets.
A common mistake:
If results on the test set are used to keep tuning the network, then, in a subtle way, the test set has become part of the training set.
ANN Training
Convergence
A learning rate that is too large overcorrects the weights
A learning rate that is too small undercorrects the weights and training becomes slow
ANN Training
Overtraining
An overtrained ANN memorizes the training set instead of generalizing
[Figure: error on the training set keeps falling with training time, while error on the test set eventually rises]
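A standard guard against overtraining is early stopping: keep the weights from the epoch with the lowest error on held-out data. A sketch, assuming a hypothetical network object with train_epoch, error, and get/set_weights methods (the held-out data should be a separate validation set, not the final test set, for exactly the reason above):

def train_with_early_stopping(net, train_set, validation_set, max_epochs=10000):
    best_error = float("inf")
    best_weights = net.get_weights()
    for epoch in range(max_epochs):
        net.train_epoch(train_set)        # one pass of weight updates
        e = net.error(validation_set)     # estimate of generalization error
        if e < best_error:                # remember the best weights seen
            best_error = e
            best_weights = net.get_weights()
    net.set_weights(best_weights)         # roll back to the best epoch
    return best_error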
ANN Training
Improving Convergence
Many techniques exist, for example:
Simulated annealing
Sophisticated gradient descent methods
ANN Design
Number of layers
Apparently, a single hidden layer is sufficient for most problems
ANN Design
Interpreting Output
ANNs produce real-valued outputs
Can threshold them to obtain discrete answers such as class labels
ANN Applications
Pattern recognition
  Network attacks
  Breast cancer
  Handwriting recognition
Pattern completion
Auto-association
Noise reduction
Compression
Finding anomalies
ANN Future
ANNs can do some things really well
They lack the structure found in most natural neural networks
Pseudo-Code
phi: activation function
phid: derivative of the activation function
Pseudo-Code
Forward Propagation:
Input layer:
foreach inputnode i
    output_i = x_i
Hidden layer nodes j:
foreach hiddenneuron j
    output_j = phi( sum_i w_ji * output_i )
Pseudo-Code
ActivateLayer(input, output)
foreach inputneuron i
    calculate output_i
foreach hiddenneuron j
    calculate output_j
foreach outputneuron k
    calculate output_k
output = {output_k}
Pseudo-Code
Output Error
Error()
{
    foreach input in InputSet
        Error_input = sum_{k output neuron} (target_k - output_k)^2
    return Average(Error_input, InputSet)
}
Pseudo-Code
Gradient Calculation
We calculate the gradient of the error with respect to each weight.
The delta values are computed from the output layer backwards.
For each output neuron k calculate:
$$\frac{\partial E}{\partial w_{kj}} = -\,\text{output}_j\,\delta_k$$
For each hidden neuron j calculate:
$$\frac{\partial E}{\partial w_{ji}} = -\,\text{output}_i\,\delta_j$$
Pseudo-Code
These calculations were done for a single input.
Now calculate the average gradient over all inputs (and for all weights).
You also need to calculate the gradients for the bias weights and average them.
Pseudo-Code
Training loop:
Initialize all weights to small random values (between -1 and 1)
For a maximum number of iterations do
    calculate the average gradients and update all weights
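Putting the pseudocode together, a minimal runnable version for a 2-2-1 network on xor might look as follows (a sketch: the architecture is hard-wired, the learning rate and iteration count are arbitrary, and batch gradient descent on xor can stall in a local minimum for unlucky initial weights):

import math
import random

def phi(v):
    return 1.0 / (1.0 + math.exp(-v))   # activation function

def phid(v):
    y = phi(v)
    return y * (1.0 - y)                # derivative of the activation function

def forward(x, w_h, w_o):
    # w_h[j] = [w from x0, w from x1, bias weight] for hidden neuron j;
    # w_o    = [w from hidden 0, w from hidden 1, bias weight].
    nets_h = [w[0] * x[0] + w[1] * x[1] + w[2] for w in w_h]
    out_h = [phi(n) for n in nets_h]
    net_o = w_o[0] * out_h[0] + w_o[1] * out_h[1] + w_o[2]
    return nets_h, out_h, net_o, phi(net_o)

def train(inputs, targets, eta=0.5, iterations=20000):
    rnd = random.Random(1)
    w_h = [[rnd.uniform(-1, 1) for _ in range(3)] for _ in range(2)]
    w_o = [rnd.uniform(-1, 1) for _ in range(3)]
    n = len(inputs)
    for _ in range(iterations):
        g_h = [[0.0] * 3 for _ in range(2)]   # gradient accumulators
        g_o = [0.0] * 3
        for x, t in zip(inputs, targets):
            nets_h, out_h, net_o, y = forward(x, w_h, w_o)
            delta_o = phid(net_o) * (t - y)
            g_o[0] += -out_h[0] * delta_o
            g_o[1] += -out_h[1] * delta_o
            g_o[2] += -delta_o                # bias node always outputs 1
            for j in range(2):
                delta_j = phid(nets_h[j]) * delta_o * w_o[j]
                g_h[j][0] += -x[0] * delta_j
                g_h[j][1] += -x[1] * delta_j
                g_h[j][2] += -delta_j
        for j in range(2):                    # step against the average gradient
            for i in range(3):
                w_h[j][i] -= eta * g_h[j][i] / n
        for i in range(3):
            w_o[i] -= eta * g_o[i] / n
    return w_h, w_o

xs = [(0, 0), (0, 1), (1, 0), (1, 1)]
ts = [0, 1, 1, 0]
w_h, w_o = train(xs, ts)
for x, t in zip(xs, ts):
    print(x, t, round(forward(x, w_h, w_o)[3], 3))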
Pseudo-Code
Worked example:
Node 0: x0
Node 1: x1
Node 2: o2 = φ(x0 + x1 - 0.5)
Node 3: o3 = φ(0.5·x1 - 1)
Node 4: o4 = φ(0.3·o2 - 0.7·o3 + 1)
Calculate outputs:
x0 x1 | y = o4
0  0  | 0.7160
0  1  | 0.7155
1  0  | 0.7308
1  1  | 0.7273
Calculate the error (targets are xor, E = (y - t)²/2):
x0 x1 | y = o4 | E
0  0  | 0.7160 | 0.2564
0  1  | 0.7155 | 0.0405
1  0  | 0.7308 | 0.0362
1  1  | 0.7273 | 0.264487
Need to calculate net4, the weighted input of all inputs into node 4:
net4(x0,x1) = 0.3·o2(x0,x1) - 0.7·o3(x0,x1) + 1
net4 = (net4(0,0) + net4(0,1) + net4(1,0) + net4(1,1))/4
This gives 0.956734
We now calculate the delta values of output node 4, δ4 = φ'(net4)·(t - o4):
δ4(0,0) = φ'(net4(0,0))·(0 - o4(0,0)) = -0.14559
δ4(0,1) = φ'(net4(0,1))·(1 - o4(0,1)) = 0.05790
δ4(1,0) = φ'(net4(1,0))·(1 - o4(1,0)) = 0.05297
δ4(1,1) = φ'(net4(1,1))·(0 - o4(1,1)) = -0.14425
On average δ4 = -0.044741
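These values can be checked with a few lines of Python; the logistic activation reproduces the slides' numbers up to rounding:

import math

def phi(v):
    return 1.0 / (1.0 + math.exp(-v))

def net4(x0, x1):
    o2 = phi(x0 + x1 - 0.5)           # node 2
    o3 = phi(0.5 * x1 - 1)            # node 3
    return 0.3 * o2 - 0.7 * o3 + 1    # weighted input into node 4

inputs = [(0, 0), (0, 1), (1, 0), (1, 1)]
targets = [0, 1, 1, 0]

nets = [net4(x0, x1) for x0, x1 in inputs]
print(round(sum(nets) / 4, 6))        # 0.956734

for x, t, n in zip(inputs, targets, nets):
    y = phi(n)
    delta4 = y * (1 - y) * (t - y)    # phi'(net4) * (t - o4)
    print(x, round(y, 4), round(delta4, 5))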
With the average δ4 = -0.044741, the gradient for weight w42 from neuron 2 into node 4 follows from E4,2 = -o2·δ4, averaged over the inputs:
E4,2 = 0.00708
Therefore, the new weight w42 is 0.2993
(a gradient-descent step with learning rate 0.1: w42 = 0.3 - 0.1·0.00708 ≈ 0.2993)