
A VARIANT OF BACK-PROPAGATION ALGORITHM

FOR MULTILAYER FEED-FORWARD NETWORK LEARNING


A. Khavare Ajinkya A, B. Bidkar Harshal S
DKTES’s Textile & Engineering Institute, (Rajwada)
Ichalkaranji, Maharashtra
aju_kh@yahoo.com, harshal.bidkar@gmail.com

Abstract
Artificial neural networks (ANNs) provide a general,
practical method for learning real-valued, discrete-valued
and vector-valued functions. Algorithms such as Back-
propagation use gradient descent to tune network
parameters to best fit a training set of input-output pairs.
ANN learning is robust to errors in the training data and
has been successfully applied to problems such as
interpreting visual scenes, speech recognition and learning robot control strategies.
In this paper, a variant of the Back-propagation algorithm is proposed for feed-forward neural network learning. The proposed algorithm improves Back-propagation training in terms of quick convergence of the solution, depending on the slope of the error graph, and increases the speed of convergence of the system.
Keywords: Neural Networks, Adaptive navigation.

1. Introduction
Machine learning refers to a system capable of the autonomous acquisition and integration of knowledge. This capacity to learn from experience, analytical observation, and other means results in a system that can continuously self-improve and thereby offer increased efficiency and effectiveness. Over the past 50 years, the study of machine learning has grown from the efforts of a handful of computer engineers exploring whether computers could learn to play games, and a field of statistics that largely ignored computational considerations, to a broad discipline that has produced fundamental statistical-computational theories of learning processes, has designed learning algorithms that are routinely used in commercial systems from speech recognition to computer vision, and has spun off an industry in data mining to discover hidden regularities in the growing volume of online data.
Neural network learning methods provide a robust approach to approximating real-valued, discrete-valued, and vector-valued target functions. For certain types of problems, such as learning to interpret complex real-world sensor data, artificial neural networks are among the most effective learning methods currently known.
The remainder of the paper is organized as follows: Section (2) focuses on theoretical concepts of the Perceptron and the Multilayer Feed-forward network. Section (3) emphasizes the Back-propagation algorithm and its variant. Lastly, we conclude by providing results.

2. Perceptrons & FNN
Perceptrons are single-layered units of ANN systems, as illustrated in Fig: 1.

Fig: 1 A Perceptron

A perceptron takes a vector of real-valued inputs, calculates a linear combination of these inputs, then outputs a 1 if the result is greater than some threshold and -1 otherwise. More precisely, given inputs x1 through xn, the output o(x1, ..., xn) computed by the perceptron is

o(x1, ..., xn) = 1 if w0 + w1 x1 + ... + wn xn > 0, and -1 otherwise,

where each wi is a real-valued constant, or weight, that determines the contribution of input xi to the perceptron output.
Learning a perceptron involves choosing values for the weights w0, ..., wn. Let us begin by understanding how to learn the weights for a single perceptron. Here the precise learning problem is to determine a weight vector that causes the perceptron to produce the correct +1/-1 output. We can train a perceptron using the perceptron rule.
One way to learn an acceptable weight vector is to begin with random weights, then iteratively apply the perceptron to each training example, modifying the perceptron weights whenever it misclassifies an example. This process is repeated, iterating through the training examples as many times as needed, until the perceptron classifies all training examples correctly.
But in some cases a multi-layer perceptron network, i.e. a feed-forward neural network, is needed, namely whenever nonlinear decision surfaces are to be used. Fig: 2 shows such a feed-forward network.

Fig: 2 Feed-forward Neural Network
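To make the perceptron rule concrete, the following minimal Python sketch computes the perceptron output and applies the rule described above. It is an illustration only; the learning rate eta, the epoch limit and the AND-function example are assumptions, not details from the paper.

```python
import numpy as np

def perceptron_output(w, x):
    # o(x1,...,xn) = 1 if w0 + w1*x1 + ... + wn*xn > 0, else -1
    return 1 if w[0] + np.dot(w[1:], x) > 0 else -1

def train_perceptron(examples, eta=0.1, max_epochs=100):
    """Perceptron rule: start from random weights and adjust them whenever
    an example is misclassified, until every example is classified correctly."""
    n = len(examples[0][0])
    w = np.random.uniform(-0.5, 0.5, n + 1)          # w[0] acts as the threshold weight
    for _ in range(max_epochs):
        errors = 0
        for x, t in examples:                        # t is the target output (+1 or -1)
            o = perceptron_output(w, x)
            if o != t:
                w[0] += eta * (t - o)                # update the threshold weight
                w[1:] += eta * (t - o) * np.asarray(x, dtype=float)
                errors += 1
        if errors == 0:                              # all training examples classified correctly
            break
    return w

# Example: learning the two-input AND function
data = [((0, 0), -1), ((0, 1), -1), ((1, 0), -1), ((1, 1), 1)]
print(train_perceptron(data))
```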


Each neuron in one layer has directed connections to the neurons of the subsequent layer. In many applications the units of these networks apply a sigmoid function as an activation function. Inputs are provided to the input layer and the output is taken at the output layer; in this computation, hidden layers are provided to allow other, more complex computations.
Feed-forward neural networks (FNN) have been widely used for various tasks, such as pattern recognition, function approximation, dynamical modelling, data mining and time series forecasting. The training of FNN is mainly undertaken using back-propagation (BP) based learning.
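As a small illustration of such a network, the sketch below performs one forward pass through an input-hidden-output network with sigmoid units. The layer sizes and the random initialization are illustrative assumptions, not values taken from the paper.

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def forward(x, w_ih, b_h, w_ho, b_o):
    """One forward pass through a feed-forward network in which
    every unit uses the sigmoid activation function."""
    y_hidden = sigmoid(w_ih @ x + b_h)          # hidden-layer responses
    y_output = sigmoid(w_ho @ y_hidden + b_o)   # output-layer responses
    return y_hidden, y_output

# Illustrative sizes: 8 inputs, 8 hidden units, 1 output
rng = np.random.default_rng(0)
w_ih, b_h = rng.uniform(-1, 1, (8, 8)), rng.uniform(-1, 1, 8)
w_ho, b_o = rng.uniform(-1, 1, (1, 8)), rng.uniform(-1, 1, 1)
print(forward(np.ones(8), w_ih, b_h, w_ho, b_o)[1])
```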

3. Back-Propagation Algorithm
Back-propagation, or propagation of error, is a common method of teaching artificial neural networks how to perform a given task. It was first described by Paul Werbos in 1974.
In back-propagation, there are two phases in its learning cycle: one to propagate the input pattern through the network and the other to adapt the output by changing the weights in the network. It is the error signals that are back-propagated in the network operation to the hidden layer(s). The portion of the error signal that a hidden-layer neuron receives in this process is an estimate of the contribution of that particular neuron to the output error. By adjusting the weights of the connections on this basis, the squared error, or some other metric, is reduced in each cycle and finally minimized, if possible.

Mathematical Analysis: Assume a network with N inputs and M outputs. Let xi be the input to the ith neuron in the input layer, Bj be the output of the jth neuron before activation, yj be the output after activation, bj be the bias between the input and hidden layer, bk be the bias between the hidden and output layer, wij be the weight between the input and the hidden layers, and wjk be the weight between the hidden and output layers. Let η be the learning rate and δ the error. Also, let i, j and k be the indexes of the input, hidden and output layers respectively.
The response of each unit is computed as:

Bj = Σi wij xi + bj        (1)
yj = f(Bj) = 1 / (1 + e^(-Bj))        (2)

Weights and bias between the input and hidden layer are updated as follows:

wij = wij + η δj xi        (3)
bj = bj + η δj        (4)

where δj is the error between the input and hidden layer and is calculated as:

δj = yj (1 - yj) Σk δk wjk        (5)

Weights and bias between the hidden and output layer are updated as follows:

wjk = wjk + η δk yj        (6)
bk = bk + η δk        (7)

and δk is the error between the hidden and output layer, calculated as:

δk = yk (1 - yk) (tk - yk)        (8)

where tk is the target output and yk the actual output of the kth output neuron.
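The update rules above can be summarized in a short Python sketch of one back-propagation step for a single training pattern, written in the same notation (wij, bj, wjk, bk, η, δj, δk). This is a minimal sketch for illustration; the array shapes, function names and the default learning rate are assumptions.

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def bp_step(x, t, w_ij, b_j, w_jk, b_k, eta=0.5):
    """One plain back-propagation step for a single pattern x with target t:
    forward pass, error terms for both layers, then weight and bias updates."""
    # Forward pass (response of each unit)
    y_j = sigmoid(w_ij @ x + b_j)                    # hidden-layer outputs
    y_k = sigmoid(w_jk @ y_j + b_k)                  # output-layer outputs
    # Error terms
    delta_k = y_k * (1 - y_k) * (t - y_k)            # error at the output layer
    delta_j = y_j * (1 - y_j) * (w_jk.T @ delta_k)   # error back-propagated to the hidden layer
    # Weight and bias updates
    w_jk += eta * np.outer(delta_k, y_j)
    b_k  += eta * delta_k
    w_ij += eta * np.outer(delta_j, x)
    b_j  += eta * delta_j
    return w_ij, b_j, w_jk, b_k, y_k
```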
4. Variant of Back-propagation Algorithm
The Back-propagation algorithm described above has many shortcomings. The time complexity of the algorithm is high and it gets trapped frequently in sub-optimal solutions. It is also difficult to get an optimum step size for the learning process, since a large step size would mean faster learning, which may miss an optimal solution altogether, and a small step size would mean a very high time complexity for the learning process. Hence, we discuss a variant of the above algorithm with the following changes.
A) Momentum: A simple change to the training law that sometimes results in much faster training is the addition of a momentum term. With this change, the weight change continues in the direction it was heading. This weight change, in the absence of error, would be a constant multiple of the previous weight change. The momentum term is an attempt to keep the weight-change process moving, and it also makes the convergence faster and the training more stable.
B) Dynamic control for the learning rate and the momentum: Learning parameters such as the learning rate and momentum serve a better purpose if they can be changed dynamically during the course of the training. The learning rate can be high when the system is far from the goal, and can be decreased when the system gets nearer to the goal, so that the optimal solution is not missed.
C) Gradient Following: Gradient following has been added to enable quick convergence of the solution. When the system is far away from the solution, the learning rate is further increased by a constant parameter C1, and when the system is close to a solution, the learning rate is decreased by a constant parameter C2.
D) Speed Factor: To increase the speed of convergence of the system, a speed factor S has been used.

Mathematical Analysis: The above algorithm is modified with suitable steps as follows.
A) Momentum: Let the momentum term be α. Then equation (3) and equation (4) would be modified as:

wij = wij + η δj xi + α ∆wij(prev)        (9)
bj = bj + η δj + α ∆bj(prev)        (10)

where ∆wij(prev) and ∆bj(prev) denote the previous changes made to wij and bj.


The term δj would be calculated as in equation (5). Equation (6) and equation (7) would be modified as:

wjk = wjk + η δk yj + α ∆wjk(prev)        (11)
bk = bk + η δk + α ∆bk(prev)        (12)

The term δk would be calculated as in equation (8).
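As an illustration of the momentum term, the sketch below shows a hidden-to-output weight update that adds α times the previous weight change to the plain gradient step. The bookkeeping variable prev_dw_jk and the default values of η and α are assumptions.

```python
import numpy as np

def momentum_update(w_jk, prev_dw_jk, delta_k, y_j, eta=0.5, alpha=0.9):
    """Weight update with a momentum term: the new change is the plain
    back-propagation step plus alpha times the previous change."""
    dw_jk = eta * np.outer(delta_k, y_j) + alpha * prev_dw_jk
    return w_jk + dw_jk, dw_jk   # dw_jk becomes the "previous change" on the next step
```

The bias updates and the input-to-hidden updates would carry an analogous momentum term.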

B) Dynamic Control for learning rate and momentum: If changing the weight decreases the cost function (mean squared error), then the learning rate is increased, and otherwise it is decreased.
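Since the exact expression for the adjusted learning rate is not given above, the sketch below is only one plausible reading of this dynamic control: raise the learning rate while a weight change keeps decreasing the mean squared error, and lower it otherwise. The adjustment factors and the bounds on η are assumptions.

```python
def adapt_learning_rate(eta, prev_error, new_error, up=1.05, down=0.7,
                        eta_min=1e-4, eta_max=1.0):
    """Increase the learning rate while the cost function (mean squared error)
    keeps decreasing; decrease it when the last change made the error worse."""
    eta = eta * up if new_error < prev_error else eta * down
    return min(max(eta, eta_min), eta_max)
```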

C) Gradient Following: Let C1 and C2 be two constants such that C1 > 1 and 0 < C2 < 1, and let ∆max and ∆min be the maximum and minimum change permissible for the weight change. If (∂E/∂w) is the gradient-following term, then three cases need to be considered.
D) Speed Factor: Let S be the speed factor. Equation (9) and equation (10) would be modified to include S, and similarly equation (11) and equation (12) would be modified.
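Since the three cases and the speed-factor forms of equations (9)-(12) are not spelled out above, the sketch below is only an interpretation: it grows the per-weight step by C1 while the gradient ∂E/∂w keeps its sign, shrinks it by C2 when the sign flips, clamps it to [∆min, ∆max], and scales the resulting weight change by the speed factor S. The sign-based test and all default values are assumptions, not the paper's exact equations.

```python
import numpy as np

def gradient_following_step(step, grad, prev_grad, C1=1.2, C2=0.5,
                            d_min=1e-6, d_max=50.0, S=1.0):
    """Per-weight step-size control: grow the step (C1 > 1) while dE/dw keeps
    its sign, shrink it (0 < C2 < 1) when the sign flips, keep it inside
    [d_min, d_max], and scale the final change by the speed factor S."""
    same_sign = grad * prev_grad > 0
    sign_flip = grad * prev_grad < 0
    step = np.where(same_sign, step * C1, step)
    step = np.where(sign_flip, step * C2, step)
    step = np.clip(step, d_min, d_max)
    return step, -S * np.sign(grad) * step      # weight change moves against the gradient
```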
5. Experimental Study
The algorithm proposed in this paper was tested on the training of standard multilayer feed-forward networks (FNNs) for the 8-bit parity problem. The selection of initial weights is important in feed-forward neural network training. If the initial weights are very small, the back-propagated error is so small that practically no change takes place for some weights, and therefore more iterations are necessary to decrease the error. Large values of the weights result in a speed-up of learning, but they can lead to saturation and to flat regions of the error surface where training is slow. Keeping this in consideration, the experiment was conducted using the same initial weight vectors, which were randomly chosen from a uniform distribution in (-1, 1).


Fig: 3 shows us the 8-bit parity problem neural network (8-8-1).

Fig: 3 8-bit parity problem neural network
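For illustration, the 8-bit parity training set and the initial weights of an 8-8-1 network drawn uniformly from (-1, 1) could be set up as below. This is a sketch of the experimental setup as described, not the authors' code; the random seed and variable names are assumptions.

```python
import itertools
import numpy as np

# 8-bit parity: the target is 1 when the input pattern has an odd number of ones, else 0
X = np.array(list(itertools.product([0, 1], repeat=8)), dtype=float)
T = X.sum(axis=1) % 2                                  # 256 patterns with one target each

# 8-8-1 network with initial weights chosen uniformly from (-1, 1)
rng = np.random.default_rng(42)
w_ij, b_j = rng.uniform(-1, 1, (8, 8)), rng.uniform(-1, 1, 8)
w_jk, b_k = rng.uniform(-1, 1, (1, 8)), rng.uniform(-1, 1, 1)
print(X.shape, T.shape)
```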
The initial learning rate was kept constant for both algorithms. It was chosen carefully so that the Back-propagation training algorithm rapidly converges, without oscillating, toward a global minimum. Then all the other learning parameters were tuned by trying different values and comparing the number of successes exhibited by five simulation runs that started from the same initial weights.
Fig: 4 shows the results of training the 8-8-1 network (eight inputs, one hidden layer with eight nodes and one output node) on the 8-bit parity problem. It can be observed that training is considered successful for the given dataset for the chosen speed constant and momentum. It can be seen that the training time is drastically reduced with the proposed algorithm. In Back-propagation, as the number of cycles increases, the training time increases rapidly, but in all cases the training time of the proposed speed algorithm increases only gradually. Also, for changes in the momentum and speed terms, there was not much difference in the training time.

Fig: 4 Comparison of training time between the Back-propagation algorithm and the proposed algorithm for different momentum and speed values on the 8-bit parity problem

6. Conclusion
A variant of Back-propagation has been proposed for the training of feed-forward neural networks. The convergence properties of both algorithms have been studied, and the conclusion was reached that the new algorithm is globally convergent. The proposed algorithm was tested on the available training tasks. These results point to the conclusion that the proposed method stands as a very promising new tool for the efficient training of neural networks in terms of time. It also proves to be much more accurate than the existing Back-propagation algorithm. In addition, the error-correction rate achieved is much faster, and the training time is also much shorter, as shown in the results.

7. Acknowledgements
We would like to acknowledge our Head of Department, Prof. Mrs. L. S. Admuthe, who gave her valuable suggestions in preparing this paper. We would also like to thank our friends who supported us in preparing this paper.

8. References
[1] J. M. Zurada, “Introduction to artificial neural
systems,” M. G. Road, Mumbai: Jaico, (2002).
[2] P. Mehra and B. W. Wah, “Artificial neural networks:
concepts and theory,” IEEE Comput. Society Press,
(1992).
[3] E.M. Johansson, F.U. Dowla, and D.M. Goodman,
“Backpropagation learning for multi-layer feed-
forward neural networks using the conjugate gradient
method,” Intl. J. Neural Systems, vol. 2: pp. 291-301
(1992).


[4] X. Yu, M. O. Efe, and O. Kaynak, “A general backpropagation algorithm for feed-forward neural networks learning,” IEEE Trans. Neural Networks, vol. 13, no. 1, pp. 251-254 (January 2002).
[5] T. M. Mitchell, Machine Learning. McGraw-Hill Science/Engineering/Math (March 1, 1997).
[6] Bogdan M. and David Hunter, “Solving parity-N problems with feedforward neural networks,” (1992).
[7] J. Hertz, A. Krogh, R. Palmer, Introduction to the
Theory of Neural Computation. MA: Addison-
Wesley, 1991.

