
Development of Self Learning Systems Using Neural Networks
*Dr. H. K. Anasuya Devi, **Vikram Nag Ashoka, **Rakshithkumar N
*CCE-Faculty, **CCE-Student, Indian Institute of Science, Bangalore-560012
In the 21st century we have made machines smarter by imparting knowledge
to them. We have made devices smarter by instilling in them a sense of
what the user wants to do, how the user performs a particular task, and
the environment in which the machine or device is being used.
In the past few years we have gone a step further and are building
products and services that provide information about things we want to
know even before we ask for them. This is possible only because machines
have the capacity to learn the user's behavior and preferences. We have
developed various techniques to guide machines in learning, and here we
explore an atypical technique for replicating one machine's behavior in
another. This is human computer interaction.
The objective of this paper is to build a process that can be used to
transfer the expertise of a trained neural network to an untrained
neural network using a feedback mechanism. Using this novel approach we
can reduce the number of training examples and the time required to
train a neural network.

1.1 Neural Networks:

Neural networks are a paradigm inspired by the nervous system of the
human body. The nervous system is composed of many neurons. Each neuron
receives multiple inputs from various axons. A neuron has two states:
excited and unexcited. It reaches the excited state only if it can
respond to the signals it receives. All the information from the neurons
is transferred to the brain, which processes it further. A similar idea
is used in neural networks, where the weights assigned to the different
inputs determine the output of the network. Because of this mechanism,
neural networks can be used in applications where learning during the
process is necessary. This significant characteristic of neural networks
helps in building self-learning systems.
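As a minimal sketch of how the weights assigned to the inputs determine a neuron's output, the snippet below models one artificial neuron with a simple threshold. The function name `neuron_output`, the weights and the threshold value are illustrative assumptions and are not taken from this paper.

```python
def neuron_output(inputs, weights, threshold=0.5):
    """Return 1 (excited) if the weighted input sum reaches the threshold, else 0."""
    if len(inputs) != len(weights):
        raise ValueError("inputs and weights must have the same length")
    # Weighted sum of all incoming signals.
    activation = sum(x * w for x, w in zip(inputs, weights))
    return 1 if activation >= threshold else 0


# Hypothetical weights: the first input matters more than the second.
weights = [0.6, 0.2]
print(neuron_output([1, 0], weights))  # excited, because 0.6 >= 0.5
print(neuron_output([0, 1], weights))  # unexcited, because 0.2 < 0.5
```

Changing the weights changes which input patterns excite the neuron, which is the property that training exploits.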
1.2 Training Artificial Neural Network
Once a network has been structured for a
particular application, that network is ready
to be trained. To start this process the initial
weights are chosen randomly. Then, the
training, or learning, begins. There are two
main approaches used for training: supervised and unsupervised.

1.2.1 Supervised Training.

In supervised training, both the inputs and
the outputs are provided. The network then
processes the inputs and compares its
resulting outputs against the desired outputs.
Errors are then propagated back through the
system, causing the system to adjust the
weights which control the network. This
process occurs over and over as the weights
are continually tweaked. The set of data
which enables the training is called the
"training set." During the training of a
network the same set of data is processed
many times as the connection weights are
ever refined. For example, teak wood and
rose wood can be distinguished using their
distinct characteristic features.
However, with this method we face a problem if the classes have
overlapping features, and thus some networks never learn. This could be
because the input data does not contain the specific information from
which the desired output is derived. Networks also fail to converge if
there is not enough data to enable complete learning.
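The supervised loop described in this section (compare the output with the desired output, adjust the weights, and process the same training set many times) can be sketched as follows. This is only an illustration under stated assumptions: it uses the simple perceptron update rule on a single threshold neuron rather than full error backpropagation, it starts from zero weights instead of random ones so that the run is reproducible, and the toy training set (logical AND) and all names are hypothetical.

```python
def predict(weights, bias, x):
    """Threshold unit: 1 if the weighted sum is positive, else 0."""
    s = sum(w * xi for w, xi in zip(weights, x)) + bias
    return 1 if s > 0 else 0


def train(training_set, learning_rate=1.0, max_epochs=100):
    """Repeatedly process the training set, nudging weights by the output error."""
    n_inputs = len(training_set[0][0])
    weights = [0.0] * n_inputs  # zeros instead of random values, for reproducibility
    bias = 0.0
    for _ in range(max_epochs):
        mistakes = 0
        for x, desired in training_set:
            error = desired - predict(weights, bias, x)
            if error != 0:
                mistakes += 1
                weights = [w + learning_rate * error * xi
                           for w, xi in zip(weights, x)]
                bias += learning_rate * error
        if mistakes == 0:  # every example is now classified correctly
            break
    return weights, bias


# Toy training set (logical AND): inputs paired with desired outputs.
training_set = [((0, 0), 0), ((0, 1), 0), ((1, 0), 0), ((1, 1), 1)]
weights, bias = train(training_set)
print([predict(weights, bias, x) for x, _ in training_set])
```

If the two classes overlapped so that no weight setting separated them, the loop would simply run until `max_epochs` without reaching zero mistakes, which mirrors the non-convergence problem noted above.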
1.2.2 Unsupervised or Adaptive Training.
The other type of training is called
unsupervised training. In unsupervised
training, the network is provided with inputs
but not with desired outputs. The system
itself must then decide what features it will
use to group the input data. This is often
referred to as self-organization or adaption.
One of the leading researchers in unsupervised learning, Teuvo Kohonen
(add reference), has pointed out that the lack of topology in neural
network models makes today's neural networks just simple abstractions of
the real neural networks within the brain. As this research continues,
more powerful self-learning networks may become possible. But currently,
this field remains one that is still in the laboratory.
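To make the idea of a network grouping its inputs without desired outputs more concrete, here is a small sketch of winner-take-all competitive learning on one-dimensional data. It is an assumption-laden illustration: the data, the initial prototype values and the learning rate are invented, and the sketch omits the neighbourhood topology that self-organizing maps add on top of this basic rule.

```python
def nearest(prototypes, x):
    """Index of the prototype closest to the input x."""
    return min(range(len(prototypes)), key=lambda i: abs(prototypes[i] - x))


def self_organize(data, prototypes, rate=0.5, epochs=10):
    """Move only the winning prototype toward each input; no labels are used."""
    prototypes = list(prototypes)
    for _ in range(epochs):
        for x in data:
            winner = nearest(prototypes, x)
            prototypes[winner] += rate * (x - prototypes[winner])
    return prototypes


# Hypothetical inputs that form two natural groups, around 1.0 and around 5.0.
data = [1.0, 1.2, 0.8, 5.0, 5.2, 4.8]
prototypes = self_organize(data, prototypes=[0.0, 6.0])
groups = [nearest(prototypes, x) for x in data]
print(prototypes, groups)
```

Each prototype drifts toward the inputs it wins, so the grouping emerges from the data alone.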

1.2.3 Reinforcement learning

This is an emerging training technique in which a machine learns by
interacting with the environment. Reinforcement learning is learning
what to do (how to map situations to actions) so as to maximize a
numerical reward signal. The learner is not told which actions to take,
as in most forms of machine learning, but instead must discover which
actions yield the most reward by trying them. In the most interesting
and challenging cases, actions may affect not only the immediate reward
but also the next situation and, through that, all subsequent rewards.
These two characteristics, trial-and-error search and delayed reward,
are the two most important distinguishing features of reinforcement
learning.
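The trial-and-error search for rewarding actions can be illustrated with an epsilon-greedy learner on a small stochastic task. This is a sketch under assumptions, not part of the method proposed in this paper: the reward probabilities, the value of `epsilon`, the step count and the function name `run_bandit` are all hypothetical. With probability `epsilon` the learner tries a random action, otherwise it picks the action with the best reward estimate so far.

```python
import random


def run_bandit(true_means, steps=5000, epsilon=0.1, seed=0):
    """Epsilon-greedy on a Bernoulli bandit; returns (value estimates, pull counts)."""
    rng = random.Random(seed)
    n = len(true_means)
    estimates = [0.0] * n  # running average reward per action
    counts = [0] * n
    for _ in range(steps):
        if rng.random() < epsilon:
            action = int(rng.random() * n)  # try any action at random
        else:
            action = max(range(n), key=lambda a: estimates[a])  # use best estimate
        reward = 1.0 if rng.random() < true_means[action] else 0.0
        counts[action] += 1
        # Incremental mean: the estimate improves the more often an action is tried.
        estimates[action] += (reward - estimates[action]) / counts[action]
    return estimates, counts


estimates, counts = run_bandit([0.1, 0.5, 0.9])  # hypothetical reward probabilities
print("estimated values:", [round(v, 2) for v in estimates])
print("pulls per action:", counts)
```

Because rewards are random, a single trial of an action says little about it; the running averages only become reliable after many pulls.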

Reinforcement learning has many elements.

1. Model of the environment: State transitions, which may be stochastic
or deterministic.
2. Policy: Maps states to the actions to be taken in those states.
3. Reward function: Rewards are provided by the environment and are
numbers indicating the desirability of the output.
4. Value function: Values are numbers indicating the long-term
desirability of the state. These are estimated from sequences of
observed actions and rewards.

One of the challenges that arise in reinforcement learning is the
trade-off between exploration and exploitation. To obtain a lot of
reward, a reinforcement learning agent must prefer actions that it has
tried in the past and found to be effective in producing reward. But to
discover such actions, it has to try actions that it has not selected
before. The agent has to exploit what it already knows in order to
obtain reward, but it also has to explore in order to make better action
selections in the future. The dilemma is that neither exploration nor
exploitation can be pursued exclusively without failing at the task. The
agent must try a variety of actions and progressively favor those that
appear to be best. On a stochastic task, each action must be tried many
times to gain a reliable estimate of its expected reward.

2.1 Existing techniques for training neural networks:
There are many alternative learning methods and variants for neural
networks. In the case of feedforward multilayer networks the first
successful algorithm was the classical backpropagation (Rumelhart et
al., 1986). Although this approach is very useful for training this kind
of neural network, its main drawback is slow learning speed.
2.1.1 Various attempts made to increase the learning speed of
Backpropagation
In order to solve the problem of slow learning speed, several variations
of the initial algorithm and also new methods have been proposed.

Modifications of the standard algorithms: Some relevant modifications of
the backpropagation method have been proposed. Sperduti and Antonina
(1993) extend the backpropagation framework by adding a gradient descent
to the sigmoid's steepness parameters.

Methods based on linear least-squares: Some algorithms based on linear
least-squares methods have been proposed to initialize or train
feedforward neural networks (Biegler-Konig and Barmann, 1993; Pethel et
al., 1993; Yam et al., 1997; Cherkassky and Mulier, 1998; Castillo et
al., 2002; Fontenla-Romero et al., 2003). These methods are mostly based
on minimizing the mean squared error (MSE) between the signal of an
output neuron, before the output nonlinearity, and a modified desired
output, which is exactly the actual desired output passed through the
inverse of the nonlinearity.

Adaptive step size: In the standard backpropagation method the learning
rate, which determines the magnitude of the changes in the weights for
each iteration of the algorithm, is fixed at the beginning of the
learning process. Several heuristic methods for the dynamical adaptation
of the learning rate have been developed (Hush and Salas, 1988; Jacobs,
1988; Vogl et al., 1988). Another interesting algorithm is SuperSAB,
proposed by Tollenaere (1990). This method is an adaptive acceleration
strategy for error backpropagation learning that converges faster than
gradient descent with the optimal step size value, reducing the
sensitivity to parameter values.

Rescaling of variables: The error signal involves the derivative of the
neural function, which is multiplied in each layer. Therefore, the
elements of the Jacobian matrix can differ greatly in magnitude for
different layers. To solve this problem Rigler et al. (1991) have
proposed a rescaling of these elements.
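Of the approaches listed above, the linear least-squares idea is perhaps the easiest to show in a few lines. The sketch below is not any of the cited algorithms; it only illustrates the core trick for a single sigmoid output neuron with one input: pass the desired outputs through the inverse of the nonlinearity (the logit) and fit the pre-activation signal by ordinary least squares. The function names and the toy data are assumptions.

```python
import math


def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))


def logit(d):
    """Inverse of the sigmoid; d must lie strictly between 0 and 1."""
    return math.log(d / (1.0 - d))


def fit_output_neuron(xs, desired):
    """Least-squares fit of w, b so that w*x + b matches logit(desired)."""
    targets = [logit(d) for d in desired]  # desired outputs, before the nonlinearity
    n = len(xs)
    mean_x = sum(xs) / n
    mean_t = sum(targets) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxt = sum((x - mean_x) * (t - mean_t) for x, t in zip(xs, targets))
    w = sxt / sxx
    b = mean_t - w * mean_x
    return w, b


# Toy data generated by a neuron with weight 2 and bias -1 (hypothetical values).
xs = [0.0, 0.5, 1.0, 1.5, 2.0]
desired = [sigmoid(2.0 * x - 1.0) for x in xs]
w, b = fit_output_neuron(xs, desired)
print(round(w, 6), round(b, 6))
```

Because the problem becomes linear after the inverse nonlinearity is applied, the weights are obtained in closed form with no iterative gradient descent, which is where the speed advantage of this family of methods comes from.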

Here in this paper we take a holistic and innovative approach, a
continuous feedback mechanism, to replicate one machine's expertise at a
task in another. We employ the machine learning technique of
reinforcement learning for this purpose. Initially we have a Master
neural network which is trained to recognize a particular object with
very high accuracy. The Student is the machine trying to learn the task
from the Master in a fast and simple way.
2.1.2 Continuous Feedback mechanism
The Master acts as a neural network embedded system ready to teach its
clients (the set of computers which need to receive the information
about the task that the Master is performing). The feed forward
mechanism is used extensively here to provide relevant feedback to the
Student so as to improve the accuracy of the output produced by the
Student. The feed backward mechanism is used to provide the output
produced by the Student to the Master so that it can analyze the output
and generate relevant feedback. Here we use the mathematical framework
of Markov decision processes and the reinforcement learning technique to
help us provide accurate and relevant feedback.

For illustration purposes we consider the Master to be the one who is
trained to recognize an image of a specific car with high accuracy.
Student has no information about the task it
is trying to accomplish.
During the first iteration the Student is
given a random set of inputs so that the
generated output can be fed forward as the
input to the Master. Master analyses the
output from the student.
Analysis consists of various parts such as

Orientation: The orientation of an object such as a line, plane or rigid
body is part of the description of how it is placed in the space it is
in. Namely, it is the imaginary rotation that is needed to move the
object from a reference placement to its current placement. A rotation
may not be enough to reach the current placement. It may be necessary to
add an imaginary translation, called the object's location (or position,
or linear position). The location and orientation together fully
describe how the object is placed in space. The Master checks whether
the orientation of the image generated by the Student is as required to
produce the necessary output. This information is sent to the Student
via one of the input neurons, as shown in Fig (1). The Student adjusts
the weights of the neuron accordingly.
Height: The Master checks whether changes must be made for the image
generated by the Student to be of the required height. If so, this
information is sent to the Student via one of the input neurons, as
shown in Fig (1). The Student adjusts the weights of the neuron
accordingly.

Width: The Master checks whether the width of the generated image is in
accordance with the required output. If not, it provides information
which is sent to the Student via one of the input neurons, as shown in
Fig (1). The Student adjusts the weights of the neuron accordingly.

Depth: The Master analyses the depth of the generated image and then
recommends changes to be made to the weights of the input neurons of the
Student. This information is sent to the Student via one of the input
neurons, as shown in Fig (1). The Student adjusts the weights of the
neuron accordingly.
The process of transferring the expertise to an untrained neural network
involves a continuous feedback mechanism. The feedback depends on many
factors, such as the orientation, size and color of the generated image,
obtained by processing the pixel information. The Master neural network
generates the feedback, known as the reward function, which is
deterministic and helps the Student neural network to alter its
generated image so as to improve its accuracy.
The Master has the mechanism to generate the feedback with which it can
bring the untrained neural network up to the necessary level of
expertise. The feedback provided has multiple sections, each describing
changes to be made to the Student-generated image so as to improve its
accuracy. These sections cover changes with respect to the orientation,
height, width and depth of the generated image. This feedback process
continues until the expertise of the Student neural network is at the
same level as that of its Master.
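The feedback loop described above can be sketched in a heavily simplified form. The code below is an illustrative assumption rather than the authors' implementation: the Master and the Student are reduced to plain objects holding one number per attribute (orientation, height, width, depth) instead of real neural networks, the feedback is simply the signed difference per attribute, and the class names, the `rate` and the `tolerance` are hypothetical.

```python
ATTRIBUTES = ("orientation", "height", "width", "depth")


class Master:
    """Stands in for the trained network: knows the required attribute values."""

    def __init__(self, reference):
        self.reference = reference

    def feedback(self, generated):
        # One feedback section per attribute: the signed change the Student should make.
        return {k: self.reference[k] - generated[k] for k in ATTRIBUTES}


class Student:
    """Stands in for the untrained network: one adjustable value per attribute."""

    def __init__(self, initial):
        self.params = dict(initial)

    def generate(self):
        return dict(self.params)

    def adjust(self, feedback, rate=0.5):
        for k in ATTRIBUTES:
            self.params[k] += rate * feedback[k]


def transfer(master, student, tolerance=1e-3, max_rounds=100):
    """Repeat feedback rounds; return the round at which all sections are within tolerance."""
    for round_no in range(1, max_rounds + 1):
        fb = master.feedback(student.generate())
        if all(abs(v) <= tolerance for v in fb.values()):
            return round_no
        student.adjust(fb)
    return None  # the Student did not reach the Master's level in time


master = Master({"orientation": 90.0, "height": 1.5, "width": 4.0, "depth": 1.8})
student = Student({k: 0.0 for k in ATTRIBUTES})
rounds = transfer(master, student)
print(rounds, student.generate())
```

In this toy setting each round halves the remaining error, so the loop ends after a small number of rounds; with real networks the Student's adjustment step would be a weight update and convergence would not be this regular.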


Here we choose actions over time so as to maximize the expected value of
the total payoff:
$E[R(s_0) + \gamma R(s_1) + \gamma^2 R(s_2) + \cdots]$
where $\gamma$ is the discount factor.
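For a single observed sequence of rewards, the quantity inside the expectation can be computed directly. The helper below is a small sketch with an assumed name (`discounted_return`); the expectation itself would be an average of this value over many sequences.

```python
def discounted_return(rewards, gamma):
    """Sum of gamma**t * R(s_t) over the observed reward sequence."""
    if not 0.0 <= gamma <= 1.0:
        raise ValueError("gamma must lie in [0, 1]")
    total = 0.0
    # Work backwards so each step is: reward now + gamma * (everything after).
    for reward in reversed(rewards):
        total = reward + gamma * total
    return total


print(discounted_return([1.0, 1.0, 1.0], gamma=0.5))  # 1 + 0.5 + 0.25 = 1.75
```

A smaller discount factor makes later rewards count for less, so the agent favors payoffs that arrive sooner.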

Fig (1) : Continuous feedback mechanism.

3. Results and discussions:

At the end of this process we have built intelligent agents trained for
a specific task. This novel technique can be used to replicate the
expertise of neural networks in a faster and simpler manner. With the
continuous feedback mechanism we can eliminate the problem where the
neural network does not learn at all, either because of insufficient
data about the required output or because of insufficient data about the
parameters required to generate that output. The continuous feedback
mechanism also helps us to eliminate the large number of training
examples otherwise necessary to train the neural network for a
particular task. Duplication is very fast with this technique. All we
now need is the data and time required to train one neural network, and
we can then replicate its functionality using the aforementioned
mechanism.

Markov Decision Processes: MDPs provide a mathematical framework for
modeling decision making in situations where outcomes are partly random
and partly under the control of a decision maker. Each neural network is
represented by a tuple as shown below.
A tuple $(S, A, \{P\}, \gamma, R)$ where:
- $S$: Set of states; here the states can be the orientation, size and
color of the object.
- $A$: Set of actions, which include making changes to the weights of
neurons.
- $P$: State transition probabilities, which indicate whether the
adjustment of weights of neurons in state $S$ will lead to state $S'$.
- $\gamma$: Discount factor, which represents the difference between the
importance of future rewards and present rewards.
- $R$: Reward function, which represents the reward received on the
transition from state $S$ to $S'$ as a result of the adjustment of
weights of neurons.
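The tuple above maps naturally onto a small data structure. The sketch below is hypothetical throughout: the two states (`mismatch`, `match`), the two actions, the transition probabilities and the rewards are invented for illustration, and the reward is attached to the state, $R(s)$, which is a simplification of the transition-based reward described above. Value iteration is used here as one standard way to estimate the long-term value of each state; it is not claimed to be the procedure used in this paper.

```python
from typing import Dict, List, NamedTuple, Tuple


class MDP(NamedTuple):
    states: List[str]
    actions: List[str]
    # transitions[(s, a)] is a list of (probability, next_state) pairs
    transitions: Dict[Tuple[str, str], List[Tuple[float, str]]]
    gamma: float
    rewards: Dict[str, float]  # R(s)


def value_iteration(mdp, sweeps=200):
    """Estimate the long-term value of each state by repeated Bellman backups."""
    values = {s: 0.0 for s in mdp.states}
    for _ in range(sweeps):
        values = {
            s: mdp.rewards[s] + mdp.gamma * max(
                sum(p * values[s2] for p, s2 in mdp.transitions[(s, a)])
                for a in mdp.actions
            )
            for s in mdp.states
        }
    return values


# Hypothetical example: the generated image either matches the target or not.
mdp = MDP(
    states=["mismatch", "match"],
    actions=["adjust", "keep"],
    transitions={
        ("mismatch", "adjust"): [(0.8, "match"), (0.2, "mismatch")],
        ("mismatch", "keep"): [(1.0, "mismatch")],
        ("match", "adjust"): [(0.5, "match"), (0.5, "mismatch")],
        ("match", "keep"): [(1.0, "match")],
    },
    gamma=0.9,
    rewards={"mismatch": 0.0, "match": 1.0},
)
values = value_iteration(mdp)
print({s: round(v, 3) for s, v in values.items()})
```

In this toy model the `match` state ends up with the higher value, and from `mismatch` the better action is to adjust the weights, because keeping them never leaves the unrewarded state.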

4. Conclusion and Future enhancements:

The continuous feedback mechanism helps us in achieving significant
results when applied to areas such as manufacturing, operating systems,
solving classification problems, and replicating real time systems.
Some of the enhancements and applications
of the system are as below:

Automation becomes easy in manufacturing, where many machines are used
for working on the same tasks. By this method we can replicate fast and
perform tasks with ease and with a higher success rate.

Helps in building new kinds of architecture for processors and operating
systems: Using this feedback mechanism it is simpler to build operating
systems which adapt to user behavior. Once a particular task learns
about user preferences, it is simple to instill that sense among other
tasks, which may be using third party software that the user has
installed. Operating systems can inform the third party software about
the user preferences, and thus it can adapt quickly.

Useful for classification problems which involve recognizing different
objects and categorizing them.

Replicating real time systems: Consider an example where we have to
build a huge network of systems which are used for recognizing human
faces. Using this technique we can train multiple systems faster to
recognize the faces that one system has learnt. Therefore we can use
this technique to combine multiple systems which are spread across a
large geographical area, so that a person from point A who is recognized
by the system at A can also be recognized when he travels to point B.
This is illustrated in Fig (2).

Can help train standby systems which can replace an existing server used
for memory or processor intensive computing.

Fig (2) : Neural Network.

[1] L. B. Almeida, T. Langlois, J. D. Amaral, and A. Plakhov. Parameter adaptation in stochastic optimization. In D. Saad, editor, On-line Learning in Neural Networks, chapter 6, pages 111-134. Cambridge University Press,
[2] R. Battiti. First and second order methods for learning: Between steepest descent and Newton's method. Neural Computation, 4(2):141-166, 1992.
[3] E. M. L. Beale. A derivation of conjugate gradients. In F. A. Lootsma, editor, Numerical methods for nonlinear optimization, pages 39-43. Academic Press, London, 1972.
[4] F. Biegler-Konig and F. Barmann. A learning algorithm for multilayered neural networks based on linear least-squares problems. Neural Networks, 6:127-131, 1993.
[5] W. L. Buntine and A. S. Weigend. Computing second derivatives in feed-forward networks: A review. IEEE Transactions on Neural Networks, 5(3):480-488, 1993.
[6] E. Castillo, A. Cobo, J. M. Gutierrez, and R. E. Pruneda. Working with differential, functional and difference equations using functional networks. Applied Mathematical Modelling, 23(2):89-107, 1999.
[7] E. Castillo, J. M. Gutierrez, and A. Hadi. Sensitivity analysis in discrete bayesian networks. IEEE Transactions on Systems, Man and Cybernetics, 26(7):412-423, 1997.