
Neural Nets Using Backpropagation

Chris Marriott
Ryan Shirley
CJ Baker
Thomas Tannahill

Agenda

Review of Neural Nets and Backpropagation
Backpropagation: The Math
Advantages and Disadvantages of Gradient Descent and other algorithms
Enhancements of Gradient Descent
Other ways of minimizing error

Review

Approach that developed from an analysis of the human brain
Nodes created as an analog to neurons
Mainly used for classification problems (e.g. character recognition, voice recognition, medical applications)

Review

Neurons have weighted inputs, a threshold value, an activation function, and an output

[Figure: a single neuron with weighted inputs and one output]

Activation function = f(Σ(input_i × weight_i))
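A minimal sketch of such a neuron in Python (not from the slides; the step activation, weights, and threshold are illustrative choices):

# A single threshold neuron: weighted sum of the inputs passed through a
# step activation (output 1 if the sum exceeds the threshold, else 0).
def neuron(inputs, weights, threshold):
    weighted_sum = sum(x * w for x, w in zip(inputs, weights))
    return 1 if weighted_sum > threshold else 0

# Example: a 2-input AND gate (both weights 1, threshold 1.5).
print(neuron([1, 1], [1, 1], 1.5))  # -> 1
print(neuron([1, 0], [1, 1], 1.5))  # -> 0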

Review
4-Input AND

[Figure: a 4-input AND built from three threshold units; two first-layer units each take a pair of inputs and feed one output unit. Every threshold is 1.5, all weights are 1, and each unit outputs 1 if active, 0 otherwise]
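One plausible reading of the figure, sketched in Python: two first-layer units each AND one pair of inputs, and a second-layer unit ANDs their outputs, with all weights 1 and every threshold 1.5 as stated above. The exact wiring is an assumption read off the lost diagram.

# Three threshold units wired as a 4-input AND (all weights 1, threshold 1.5).
def threshold_unit(inputs, weights=(1, 1), threshold=1.5):
    return 1 if sum(x * w for x, w in zip(inputs, weights)) > threshold else 0

def and4(a, b, c, d):
    h1 = threshold_unit([a, b])      # first-layer unit: a AND b
    h2 = threshold_unit([c, d])      # first-layer unit: c AND d
    return threshold_unit([h1, h2])  # output unit: (a AND b) AND (c AND d)

print(and4(1, 1, 1, 1))  # -> 1
print(and4(1, 1, 0, 1))  # -> 0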

Review

Output space for AND gate

[Figure: the four input points (0,0), (0,1), (1,0), (1,1) plotted in the Input1/Input2 plane, with the decision boundary 1.5 = w1·I1 + w2·I2 separating (1,1) from the other three points]

Review

Output space for XOR gate

Demonstrates the need for a hidden layer: no single line can separate the inputs that output 1, (0,1) and (1,0), from those that output 0, (0,0) and (1,1)

[Figure: the four input points plotted in the Input1/Input2 plane]
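To back up the claim, here is a small brute-force check (not from the slides): over a coarse grid of weights and thresholds, some single threshold unit reproduces AND but none reproduces XOR. The grid search is only illustrative, not a proof of non-separability.

# Brute-force search over a coarse grid of weights and thresholds: AND can be
# realised by a single threshold unit, XOR cannot.
import itertools

def unit(x1, x2, w1, w2, t):
    return 1 if w1 * x1 + w2 * x2 > t else 0

def realizable(target):
    grid = [i / 2 for i in range(-8, 9)]  # -4.0 ... 4.0 in steps of 0.5
    for w1, w2, t in itertools.product(grid, repeat=3):
        if all(unit(x1, x2, w1, w2, t) == target[(x1, x2)] for x1, x2 in target):
            return True
    return False

AND = {(0, 0): 0, (0, 1): 0, (1, 0): 0, (1, 1): 1}
XOR = {(0, 0): 0, (0, 1): 1, (1, 0): 1, (1, 1): 0}
print(realizable(AND))  # -> True
print(realizable(XOR))  # -> False: XOR is not linearly separable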

Backpropagation: The Math

General multi-layered neural network

[Figure: a fully connected network with an input layer, a hidden layer, and an output layer; units are indexed 0, 1, ..., i, ..., 9, with labels X0,0 ... X9,0 on the units and W0,0, W1,0, ..., Wi,0 on the connections into unit 0]

Backpropagation: The Math

Backpropagation

Calculation of hidden layer activation values
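The equation itself did not survive extraction; a reconstruction in standard notation (assuming sigmoid hidden units and weights W_{i,j} from input i to hidden unit j; the slides' exact symbols may differ):

X_j = f\Big(\sum_i W_{i,j}\, x_i\Big), \qquad f(a) = \frac{1}{1 + e^{-a}}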

Backpropagation: The Math

Backpropagation

Calculation of output layer activation values
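Again the equation is missing from the extracted text; the standard form, under the same assumptions as above, with W_{j,k} the weight from hidden unit j to output unit k:

O_k = f\Big(\sum_j W_{j,k}\, X_j\Big)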

Backpropagation: The Math

Backpropagation

Calculation of error

δ_k = f(D_k) - f(O_k)   (the difference between the desired and the actual output at output unit k)

Backpropagation: The Math

Backpropagation

Gradient Descent objective function

Gradient Descent termination condition
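Both formulas are missing from the extracted text; the usual choices (a reconstruction, not necessarily the slides' exact notation) are the summed squared error over the output units, with training stopped once it drops below a small tolerance ε or a maximum number of iterations is reached:

E = \frac{1}{2} \sum_k \delta_k^2, \qquad \text{terminate when } E < \varepsilon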

Backpropagation: The Math

Backpropagation

Output layer weight recalculation, using the learning rate (e.g. 0.25) and the error δ_k at output unit k
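The update rule itself is missing; in the standard form (a reconstruction using the learning rate η and the error δ_k defined above):

W_{j,k} \leftarrow W_{j,k} + \eta\, \delta_k\, X_j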

Backpropagation: The Math

Backpropagation

Hidden Layer weight recalculation
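The corresponding equation is also missing; the standard form backpropagates the output errors through the output weights and scales by the derivative of the sigmoid (a reconstruction, same caveats as above):

\delta_j = X_j\,(1 - X_j) \sum_k W_{j,k}\, \delta_k, \qquad W_{i,j} \leftarrow W_{i,j} + \eta\, \delta_j\, x_i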

Backpropagation Using Gradient Descent

Advantages

Relatively simple implementation (see the sketch after this list)
Standard method and generally works well

Disadvantages

Slow and inefficient
Can get stuck in local minima, resulting in suboptimal solutions
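As an indication of how simple the implementation is, here is a compact sketch of one gradient-descent training step for a one-hidden-layer sigmoid network. The layer sizes, initialization, and the inclusion of the sigmoid derivative in the output error are illustrative textbook choices, not taken from the slides.

import numpy as np

def sigmoid(a):
    return 1.0 / (1.0 + np.exp(-a))

def train_step(x, d, W_hidden, W_output, eta=0.25):
    # Forward pass: hidden and output activations.
    h = sigmoid(W_hidden @ x)
    o = sigmoid(W_output @ h)

    # Backward pass: output error, then error backpropagated to the hidden layer.
    delta_out = (d - o) * o * (1 - o)
    delta_hid = (W_output.T @ delta_out) * h * (1 - h)

    # Gradient-descent weight updates (in place).
    W_output += eta * np.outer(delta_out, h)
    W_hidden += eta * np.outer(delta_hid, x)
    return 0.5 * np.sum((d - o) ** 2)

# Tiny usage example: a 4-2-1 network trained on a single pattern.
rng = np.random.default_rng(0)
W_hidden = rng.normal(scale=0.5, size=(2, 4))
W_output = rng.normal(scale=0.5, size=(1, 2))
x, d = np.array([1.0, 0.0, 1.0, 1.0]), np.array([1.0])
for _ in range(200):
    err = train_step(x, d, W_hidden, W_output)
print(round(err, 6))  # the squared error shrinks towards 0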

Local Minima

[Figure: an error surface with a shallow local minimum and the deeper global minimum]

Alternatives To Gradient Descent

Simulated Annealing

Advantages

Can converge to the optimal solution (global minimum), given a sufficiently slow cooling schedule

Disadvantages

May be slower than gradient descent
Much more complicated implementation

Alternatives To Gradient Descent

Genetic Algorithms/Evolutionary Strategies

Advantages

Faster than simulated annealing
Less likely to get stuck in local minima

Disadvantages

Slower than gradient descent
Memory intensive for large nets

Alternatives To Gradient Descent

Simplex Algorithm

Advantages

Similar to gradient descent but faster
Easy to implement

Disadvantages

Does not guarantee a global minimum

Enhancements To Gradient Descent

Momentum

Adds a percentage of the last movement to the current movement

Enhancements To Gradient Descent

Momentum

Useful for getting over small bumps in the error function
Often finds a minimum in fewer steps

Δw(t) = -η·δ·y + α·Δw(t-1)

Δw is the change in weight
η is the learning rate
δ is the error
y depends on which layer we are calculating
α is the momentum parameter
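A minimal sketch of this update for a single weight; the sign convention and the use of δ·y for the gradient contribution follow the slide, while the η and α values are illustrative:

# Momentum update for one weight: the new step adds a fraction (alpha) of the
# previous step to the ordinary gradient-descent step.
def momentum_update(w, delta, y, prev_dw, eta=0.25, alpha=0.9):
    dw = -eta * delta * y + alpha * prev_dw
    return w + dw, dw   # return the new weight and the step, reused next time

# Usage: carry the previous step along between iterations.
w, prev_dw = 0.5, 0.0
for delta, y in [(0.2, 1.0), (0.15, 1.0), (0.1, 1.0)]:
    w, prev_dw = momentum_update(w, delta, y, prev_dw)
print(round(w, 4))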

Enhancements To Gradient Descent

Adaptive Backpropagation Algorithm

Assigns each weight its own learning rate
Each learning rate is determined by the sign of the gradient of the error function from the last iteration

If the signs are equal, the slope is more likely to be shallow, so the learning rate is increased
If the signs differ, the slope is more likely to be steep, so the learning rate is decreased

This speeds up progress along gradual slopes (a sketch of the update follows below)
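A sketch of the per-weight adjustment described in this list; the increase and decrease factors are illustrative, and published adaptive-backpropagation variants choose them differently:

# Per-weight adaptive learning rate: compare the sign of the current gradient
# with the previous one and grow or shrink that weight's own learning rate.
def adapt_learning_rate(lr, grad, prev_grad, increase=1.1, decrease=0.5):
    if grad * prev_grad > 0:       # same sign: likely a shallow slope
        return lr * increase
    if grad * prev_grad < 0:       # sign flipped: likely a steep slope
        return lr * decrease
    return lr                      # one gradient is zero: leave the rate alone

def adaptive_step(w, grad, prev_grad, lr):
    lr = adapt_learning_rate(lr, grad, prev_grad)
    return w - lr * grad, lr       # gradient-descent step with the adapted rate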

Enhancements To Gradient Descent

Adaptive Backpropagation

Possible problem:

Since the error is minimized for each weight separately, the overall error may increase

Solution:

Calculate the total output error after each adaptation; if it is greater than the previous error, reject that adaptation and calculate new learning rates

Enhancements To Gradient Descent

SuperSAB (Super Self-Adapting Backpropagation)

Combines the momentum and adaptive methods
Uses the adaptive method and momentum as long as the sign of the gradient does not change

The two effects add up, resulting in faster traversal of gradual slopes

When the sign of the gradient does change, the momentum term offsets the drastic drop in learning rate
This allows the search to roll up the other side of the minimum, possibly escaping local minima (a rough sketch follows below)
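A rough sketch of such a combined update for one weight. The published SuperSAB rule differs in its details, so this only illustrates the behaviour described above; all factors are illustrative:

# SuperSAB-style update for one weight: adapt the learning rate by gradient
# sign, but always keep the momentum term so a sign change (and the resulting
# drop in learning rate) does not kill the step outright.
def supersab_update(w, grad, prev_grad, lr, prev_dw,
                    increase=1.05, decrease=0.5, alpha=0.9):
    if grad * prev_grad >= 0:
        lr *= increase                  # sign unchanged: keep accelerating
    else:
        lr *= decrease                  # sign changed: cut the learning rate ...
    dw = -lr * grad + alpha * prev_dw   # ... while momentum softens the drop
    return w + dw, lr, dw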

Enhancements To Gradient Descent

SuperSAB

Experiments show that SuperSAB converges faster than gradient descent
Overall, the algorithm is less sensitive and so is less likely to get caught in local minima

Other Ways To Minimize Error

Varying training data

Add noise to the training data
Cycle through input classes
Randomly select from input classes
Randomly change the value of an input node, with low probability (see the sketch after this list)

Retrain with expected inputs after initial training
E.g. speech recognition
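A small sketch of the two data-variation ideas above, applied to one training pattern; the flip probability and noise scale are arbitrary example values:

import random

# Perturb one training pattern: occasionally flip an input value and add a
# little Gaussian noise to every input.
def perturb(inputs, flip_prob=0.05, noise_scale=0.1):
    perturbed = []
    for x in inputs:
        if random.random() < flip_prob:
            x = 1 - x                                           # randomly change an input node
        perturbed.append(x + random.gauss(0.0, noise_scale))    # add noise
    return perturbed

print(perturb([1, 0, 1, 1]))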

Other Ways To Minimize Error

Adding and removing neurons from layers

Adding neurons speeds up learning but may cause a loss in generalization
Removing neurons has the opposite effect

Resources

Artificial Neural Networks, Backpropagation, J. Henseler
Artificial Intelligence: A Modern Approach, S. Russell & P. Norvig
501 notes, J.R. Parker
www.dontveter.com/bpr/bpr.html
www.dse.doc.ic.ac.uk/~nd/surprise_96/journal/vl4/cs11/report.html
