

Neural Networks
What is a Neural Network?

The human brain is a highly complex, nonlinear and parallel computer. It has the
capability to organize its structural constituents, known as neurons, so as to perform
certain computations many times faster than the fastest digital computer in existence.
A neural network is a massively parallel distributed processor made up of simple
processing units, which has a natural propensity for storing experiential knowledge and
making it available for use.
It resembles the brain in two respects:
1. Knowledge is acquired by the network from its environment through a learning process.
2. Interneuron connection strengths, known as synaptic weights, are used to store
the acquired knowledge.
The procedure used to perform the learning process is called a learning algorithm.
Its function is to modify the synaptic weights of the network to attain a desired design
objective.
Neural networks are also referred to as neurocomputers, connectionist networks, and
parallel distributed processors.

Benefits of Neural Networks

A neural network derives its computing power through
1. its massively parallel distributed structure
2. its ability to learn and generalize

Properties and Capabilities of Neural Networks

1. Nonlinearity
Nonlinearity is a highly important property, particularly if the underlying physical
mechanism responsible for generation of the input signal is inherently nonlinear.
2. Input-output mapping
 • Supervised learning
 • Working through training samples or task examples.
3. Adaptivity
 • Adapting the synaptic weights to changes in the surrounding environment.
4. Evidential response
5. Contextual information
6. Fault tolerance
7. VLSI implementability
8. Uniformity of analysis and design
9. Neurobiological analogy

Human Brain
The human nervous system may be viewed as a three-stage system.

Central to the nervous system is the brain. It is represented by the neural net. The
brain continually receives the information, perceives it, and makes appropriate decisions.
The arrows pointing from left to right indicate the forward transmission of
information-bearing signals through the system. The arrows pointing from right to left signify the
presence of feedback in the system.
The receptors convert stimuli from the human body or the external environment
into electrical impulses that convey information to the neural net (the brain). The effectors
convert electrical impulses generated by the neural net into discernible responses as
system outputs.
Typically, neurons are five to six orders of magnitude slower than silicon gates.
Events in a silicon chip happen in the nanosecond (10^-9 s) range, whereas neural events
happen in the millisecond (10^-3 s) range.
It is estimated that there are approximately 10 billion neurons and 60 trillion
synapses or connections in the human brain.
Synapses are elementary structural and functional units that mediate the
interactions between neurons. The most common kind of synapse is a chemical synapse.
A chemical synapse operates as follows. A pre-synaptic process liberates a
transmitter substance that diffuses across the synaptic junction between neurons and then
acts on a post-synaptic process. Thus, a synapse converts a pre-synaptic electrical signal into
a chemical signal and then back into a post-synaptic electrical signal.
Structural organization of levels in the brain

The synapses represent the most fundamental level, depending on molecules and
ions for their action.
A neural microcircuit refers to an assembly of synapses organized into patterns of
connectivity to produce a functional operation of interest.
The neural microcircuits are grouped to form dendritic subunits within the
dendritic trees of individual neurons.
The whole neuron is about 100 μm in size. It contains several dendritic subunits.
The local circuits are made up of neurons with similar or different properties. Each
circuit is about 1 mm in size. These neural assemblies perform operations characteristic
of a localized region in the brain.
The interregional circuits are made up of pathways, columns and topographic
maps, which involve multiple regions located in different parts of the brain.
Topographic maps are organized to respond to incoming sensory information.
The central nervous system is the final level of complexity where the topographic
maps and other interregional circuits mediate specific types of behavior.

Models of a Neuron
A neuron is an information-processing unit that is fundamental to the operation of a
neural network. Its model can be shown in the following block diagram.

The neuronal model has three basic elements:

1. A set of synapses each of which is characterized by a weight or strength of its own.
Each synapse has two parts: a signal xj and a weight wkj. Here wkj refers to the weight of
the kth neuron with respect to the jth input signal. The synaptic weight may range
through positive as well as negative values.
2. An adder for summing the input signals, weighted by the respective synapses of the
neuron.
3. An activation function for limiting the amplitude of the output of a neuron.
The neuron model also includes an externally applied bias, bk. The bias has the effect
of increasing or lowering the net input of the activation function, depending on whether it
is positive or negative, respectively.
A neuron k may be mathematically described as follows:

$$u_k = \sum_{j=1}^{m} w_{kj} x_j, \qquad y_k = \varphi(u_k + b_k)$$ ---- (1)

where x1, x2, …, xm are the input signals; wk1, wk2, …, wkm are the synaptic weights of
neuron k; uk is the linear combiner output due to the input signals; bk is the bias; vk is the
induced local field; φ(.) is the activation function; and yk is the output signal of neuron k.
The use of bias bk has the effect of applying an affine transformation to the output uk
of the linear combiner.
So, we can have
vk = uk + bk ---- (2)
Now, equation (1) can be rewritten as follows:

$$v_k = \sum_{j=0}^{m} w_{kj} x_j, \qquad y_k = \varphi(v_k)$$

Due to this affine transformation, the graph of vk versus uk no longer passes through
the origin.
vk is called the induced local field or activation potential of neuron k. In vk we have
added a new synapse. Its input is x0 = +1 and its weight is wk0 = bk.
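
As a rough illustration of the neuron model above, the following Python sketch computes the linear combiner output uk, the induced local field vk, and the output yk, once with an explicit bias and once with the bias absorbed as the weight wk0 on a fixed input x0 = +1. The function names, the choice of a logistic activation, and the numeric values are assumptions made for this example only.

```python
import math

def linear_combiner(weights, inputs):
    """u_k: weighted sum of the input signals (no bias)."""
    return sum(w * x for w, x in zip(weights, inputs))

def logistic(v):
    """Assumed activation function for this sketch."""
    return 1.0 / (1.0 + math.exp(-v))

def neuron_output(weights, inputs, bias, activation):
    """y_k = phi(u_k + b_k), with the bias added explicitly."""
    v = linear_combiner(weights, inputs) + bias  # induced local field v_k
    return activation(v)

def neuron_output_absorbed_bias(weights, inputs, bias, activation):
    """Same neuron, with the bias treated as weight w_k0 on input x_0 = +1."""
    w = [bias] + list(weights)
    x = [1.0] + list(inputs)
    return activation(linear_combiner(w, x))

# Hypothetical example values.
w = [0.5, -1.0, 2.0]
x = [1.0, 2.0, 0.5]
b = 0.25
y_explicit = neuron_output(w, x, b, logistic)
y_absorbed = neuron_output_absorbed_bias(w, x, b, logistic)
print(y_explicit, y_absorbed)
```

Both calls should print the same value, which is the point of treating the bias as an extra synapse.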
Types of Activation Function

1. Threshold function
2. Piecewise-linear function
3. Sigmoid function

The activation function defines the output of a neuron in terms of the induced local
field vk.

1. Threshold function
The function is defined as

$$\varphi(v) = \begin{cases} 1, & \text{if } v \ge 0 \\ 0, & \text{if } v < 0 \end{cases}$$

This form of a threshold function is also called the Heaviside function.

Correspondingly, the output of neuron k is expressed as

$$y_k = \begin{cases} 1, & \text{if } v_k \ge 0 \\ 0, & \text{if } v_k < 0 \end{cases}$$
where
$$v_k = \sum_{j=1}^{m} w_{kj} x_j + b_k$$

This model is also called the McCulloch-Pitts model. In this model, the output of a
neuron is 1 if the induced local field of that neuron is nonnegative, and 0 otherwise. This
statement describes the all-or-none property of the model.
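
The all-or-none behavior can be sketched in a few lines of Python. The weights and bias below are hypothetical values chosen so that the neuron acts as a two-input AND gate; they do not come from the notes.

```python
def threshold(v):
    """Heaviside threshold: 1 if v >= 0, otherwise 0."""
    return 1 if v >= 0 else 0

def mcculloch_pitts(weights, inputs, bias):
    """Output of a threshold neuron for the given inputs."""
    v = sum(w * x for w, x in zip(weights, inputs)) + bias  # induced local field
    return threshold(v)

# Hypothetical parameters: the neuron fires only when both inputs are 1.
AND_WEIGHTS = [1.0, 1.0]
AND_BIAS = -1.5

truth_table = {
    (a, b): mcculloch_pitts(AND_WEIGHTS, [a, b], AND_BIAS)
    for a in (0, 1)
    for b in (0, 1)
}
print(truth_table)
```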
2. Piecewise-linear function
The activation function, here, is defined as

$$\varphi(v) = \begin{cases} 1, & v \ge +\tfrac{1}{2} \\ v, & +\tfrac{1}{2} > v > -\tfrac{1}{2} \\ 0, & v \le -\tfrac{1}{2} \end{cases}$$

where the amplification factor inside the linear region of operation is assumed to be unity.
Two situations can be observed for this function:
 • A linear combiner arises if the linear region of operation is maintained without
   running into saturation.
 • The piecewise-linear function reduces to a threshold function if the amplification
   factor of the linear region is made infinitely large.

3. Sigmoid function
This is the most common form of activation function used in the construction of
artificial neural networks. It is defined as a strictly increasing function that exhibits a
graceful balance between linear and nonlinear behavior.
An example of a sigmoid function is the logistic function, which is defined as

$$\varphi(v) = \frac{1}{1 + e^{-av}}$$

where a is the slope parameter of the sigmoid function.
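
To get a feel for the slope parameter a, the short Python sketch below evaluates the logistic function at a few sample points for increasing values of a and prints the threshold function at the same points for comparison. The sample points and the values of a are arbitrary choices for illustration.

```python
import math

def logistic(v, a=1.0):
    """Logistic sigmoid 1 / (1 + exp(-a * v)) with slope parameter a."""
    return 1.0 / (1.0 + math.exp(-a * v))

def threshold(v):
    """Heaviside threshold, shown only for comparison."""
    return 1 if v >= 0 else 0

sample_points = [-1.0, -0.1, 0.1, 1.0]  # arbitrary test inputs
for a in (1.0, 5.0, 50.0):
    values = [round(logistic(v, a), 4) for v in sample_points]
    print(f"a = {a:5.1f}: {values}")
print("threshold:", [threshold(v) for v in sample_points])
```

As a grows, the logistic values at these points move toward the 0/1 outputs of the threshold function.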
Neural Networks and Directed Graphs
The neural network can be represented through a signal-flow graph. A signal-flow
graph is a network of directed links (branches) that are interconnected at certain points
called nodes. A typical node j has an associated node signal xj. A typical directed link
originates at node j and terminates on node k; it has an associated transfer function or
transmittance that specifies the manner in which the signal yk at node k depends on the signal
xj at node j.
The flow of signals in the various parts of the graph is directed by three basic rules.

Rule-1: A signal flows along a link only in the direction defined by the arrow on the link.
There are two types of links:
 • Synaptic links: whose behavior is governed by a linear input-output relation.
   Here, we have yk = wkj xj.
For example,

 • Activation links: whose behavior is governed by a nonlinear input-output
   relation.
For example,

Rule-2: A node signal equals the algebraic sum of all signals entering the pertinent node via
the incoming links. This is also called the synaptic convergence or fan-in.
For example,

Rule-3: The signal at a node is transmitted to each outgoing link originating from that node.
For example,

This rule is also called the synaptic divergence or fan-out.

A neural network is a directed graph consisting of nodes with interconnecting
synaptic and activation links. It is characterized by four properties:
1. Each neuron is represented by a set of linear synaptic links, an externally applied
bias, and a possibly nonlinear activation link. The bias is represented by a
synaptic link connected to an input fixed at +1.
2. The synaptic links of a neuron weight their respective input signals.
3. The weighted sum of the input signals defines the induced local field of the
neuron under study.
4. The activation link squashes the induced local field of the neuron to produce an
output.
Note: A digraph describes not only the signal flow from neuron to neuron, but also the
signal flow inside each neuron.

Neural Networks and Architectures

There are three fundamentally different classes of network architectures:
1. Single-layer feedforward networks
2. Multilayer feedforward networks
3. Recurrent networks or neural networks with feedback

1. Single-layer feedforward networks

In a layered neural network, the neurons are organized in the form of layers. The
simplest form of a layered network has an input layer of source nodes that project onto an
output layer but not vice versa.
For example,

The above network is a feedforward or acyclic type. This is also called a single-layer
network. The single layer refers to the output layer as computations take place only at the
output nodes.

2. Multilayer feedforward networks

In this class, a neural network has one or more hidden layers, whose computation
nodes are called hidden neurons or hidden units. The function of hidden neurons is to
intervene between the external input and the network output in a useful manner. By
adding one or more hidden layers, the network is enabled to extract higher-order statistics.
This is particularly valuable when the size of the input layer is large. For example,
 • The source nodes in the input layer supply respective elements of the
   activation pattern (input vector), which constitute the input signals applied
   to the second layer.
 • The output signals of the second layer are used as inputs to the
   third layer, and so on for the rest of the network.
 • The set of output signals of the neurons in the output layer constitutes the
   overall response of the network to the activation pattern supplied by the
   source nodes in the input layer.
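
The layer-by-layer flow of signals described above can be sketched as a simple forward pass. This is only a minimal illustration: the 3-2-1 layer sizes, the weight values, and the use of a logistic activation in every neuron are assumptions for the example.

```python
import math

def logistic(v):
    return 1.0 / (1.0 + math.exp(-v))

def layer_forward(weights, biases, inputs):
    """One layer: each row of `weights` holds the synaptic weights of one neuron."""
    return [
        logistic(sum(w * x for w, x in zip(row, inputs)) + b)
        for row, b in zip(weights, biases)
    ]

def network_forward(layers, inputs):
    """Feed the output signals of each layer into the next layer."""
    signal = inputs
    for weights, biases in layers:
        signal = layer_forward(weights, biases, signal)
    return signal

# Hypothetical network: 3 source nodes, 2 hidden neurons, 1 output neuron.
layers = [
    ([[0.2, -0.4, 0.1], [0.7, 0.3, -0.5]], [0.0, 0.1]),  # hidden layer
    ([[1.0, -1.0]], [0.05]),                              # output layer
]
output = network_forward(layers, [1.0, 0.5, -1.0])
print(output)
```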

3. Recurrent networks or neural networks with feedback

In this class, a network will have at least one feedback loop.
For example,

The above is a recurrent network with no hidden neurons. The presence of feedback
loops has an impact on the learning capability of the network and on its performance.
Moreover, the feedback loops involve the use of unit-delay elements (denoted by z^-1),
which result in nonlinear dynamical behavior when the network contains nonlinear units.
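
A minimal sketch of such a feedback loop is given below: the outputs at time n are held by the unit-delay elements and fed back as inputs to produce the outputs at time n + 1. The three-neuron size, the weight matrix with a zero diagonal (no self-feedback), and the tanh nonlinearity are all assumptions made for illustration.

```python
import math

def step(weights, state):
    """One time step: `state` holds the delayed outputs fed back to every neuron."""
    return [
        math.tanh(sum(w * s for w, s in zip(row, state)))
        for row in weights
    ]

# Hypothetical 3-neuron recurrent network without self-feedback.
W = [
    [0.0, 0.5, -0.3],
    [0.4, 0.0, 0.2],
    [-0.6, 0.1, 0.0],
]
state = [1.0, -1.0, 0.5]   # arbitrary initial outputs
history = [state]
for _ in range(5):
    state = step(W, state)
    history.append(state)

for n, s in enumerate(history):
    print(n, [round(v, 4) for v in s])
```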
Knowledge Representation
Knowledge refers to stored information or models used by a person or machine to
interpret, predict, and appropriately respond to the outside world.
Knowledge representation involves the following:
1. Identifying the information that is to be processed
2. Physically encoding the information for subsequent use
Knowledge representation is goal directed. In real-world applications of “intelligent”
machines, a good solution depends on a good representation of knowledge.
A major task for a neural network is to provide a model for a real-time environment
into which it is embedded. Knowledge of the world consists of two kinds of information:
1. Prior information: It gives the known state of the world. It is represented by facts
about what is and what has been known.
2. Observations: These are the measures of the world. These are obtained by the
sensors that probe the environment where the neural network operates.
The set of input-output pairs, with each pair consisting of an input signal and the
corresponding desired response is called a set of training data or training sample.
Ex: Handwritten digit recognition.

The training sample consists of a large variety of handwritten digits that are
representative of a real-time situation. Given such a set of examples, the design of a neural
network may proceed as follows:
Step-1: Select an appropriate architecture for the NN, with an input layer consisting of
source nodes equal in number to the pixels of an input image, and an output layer
consisting of 10 neurons (one for each digit). A subset of examples is then used to
train the network by means of a suitable algorithm. This phase is the learning phase.
Step-2: The recognition performance of the trained network is tested with data not seen
before. Here, an input image is presented to the network without its corresponding
digit. The NN then uses the knowledge stored in its synaptic weights to
produce the required output digit. This phase is called generalization.

Note: The training data for a NN may consist of both positive and negative examples.
Example: A simple neuronal model for recognizing handwritten digits.
Consider an input set X of key patterns X1, X2, X3, ……
Each key pattern represents a specific handwritten digit.
The network has k neurons.
Let W = {w1j(i), w2j(i), w3j(i), ……}, for j= 1,2,3, …., k be the set of weights of X1, X2,
X3, ….. with respect to each of the k neurons in the network. i refers to an instance.
Let y(j) be the generated output of neuron j for j=1,2,…k.
Let d(j) be the desired output of neuron j, for j=1,2,…..k.
Let e(j)= d(j) – y(j) be the error that is calculated at neuron j, for j = 1,2,…,k.
Now we design the neuronal model for the system as follows.

In the above model, each neuron computes a specific digit j. With every key pattern,
synapses are established to every neuron in the model. We assumed that the weights of
each key pattern can be either 0 or 1.
Ex: Let the key pattern x1 correspond to the handwritten digit 1. Then its synaptic weight
w11(i) should be 1 for the 1st neuron, and all other synaptic weights for x1 must be 0.
Weight matrix for the above model can be as follows.

Now the output for the neuron will be computed as follows.

Y(1) = w11x1+w21x2+w31x3+……………….+w91x9
= 1.(x1)+0.(x2)+0.(x3)+…………..+0.(x9)
= x1
This means that neuron 1 is designed to recognize only the key pattern x1, which
corresponds to the hand written digit 1. In the same way all other neurons in the model
have to recognize their respective digits.
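
The computation of Y(1) above can be repeated for every neuron with a short Python sketch. It assumes nine key patterns and nine neurons, as in the sum above, and a 0/1 weight matrix in which key pattern i is connected with weight 1 only to neuron i; the input vector is a made-up example.

```python
NUM_PATTERNS = 9  # key patterns x1..x9, as in the sum above

# weights[i][j] is the weight from key pattern i+1 to neuron j+1 (0-based lists).
# Assumption: each key pattern drives only "its own" neuron.
weights = [
    [1 if i == j else 0 for j in range(NUM_PATTERNS)]
    for i in range(NUM_PATTERNS)
]

def neuron_outputs(x):
    """Y(j) = sum over i of w_ij * x_i, for every neuron j."""
    return [
        sum(weights[i][j] * x[i] for i in range(NUM_PATTERNS))
        for j in range(NUM_PATTERNS)
    ]

# Hypothetical input in which only key pattern x1 is present.
x = [1, 0, 0, 0, 0, 0, 0, 0, 0]
y = neuron_outputs(x)
print(y)
```

With this weight matrix each output Y(j) simply equals xj, so only neuron 1 responds to the input.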
Rules for knowledge representation
Rule-1: Similar inputs from similar classes should produce similar representations
inside the network, and should belong to the same category. The concept of
Euclidean distance is used as a measure of the similarity between inputs.
Let Xi denote the m x 1 vector.
Xi = [xi1, xi2, …, xim]T.
The vector Xi defines a point in an m-dimensional space called Euclidean
space, denoted by R^m.
Now, the Euclidean distance between Xi and Xj is defined by
$$d(X_i, X_j) = \|X_i - X_j\| = \left[ \sum_{k=1}^{m} (x_{ik} - x_{jk})^2 \right]^{1/2}$$
The smaller the distance d(Xi, Xj), the more similar the two inputs Xi and Xj are.
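
A small Python sketch of this similarity measure follows. The helper `nearest`, which picks the stored vector closest to a query, and the sample vectors are hypothetical additions for illustration.

```python
import math

def euclidean_distance(xi, xj):
    """d(Xi, Xj) = sqrt(sum_k (x_ik - x_jk)^2) for two m-dimensional vectors."""
    if len(xi) != len(xj):
        raise ValueError("vectors must have the same dimension")
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(xi, xj)))

def nearest(query, candidates):
    """Return the candidate vector most similar to `query` (smallest distance)."""
    return min(candidates, key=lambda c: euclidean_distance(query, c))

# Made-up example vectors.
stored = [[0.0, 0.0], [5.0, 5.0], [1.0, 2.0]]
print(euclidean_distance([0.0, 0.0], [3.0, 4.0]))
print(nearest([1.0, 1.0], stored))
```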

Rule-2: Items to be categorized as separate classes should be given widely different
representations in the network.
This rule is the exact opposite of rule-1.
Rule-3: If a particular feature is important, then there should be a large number of
neurons involved in the representation of that item in the network.
Ex: A radar application involving the detection of a target in the presence of
clutter. The detection performance of such a radar system is measured in
terms of two probabilities.
 • Probability of detection
 • Probability of false alarm
Rule-4: Prior information and invariances should be built into the design of a NN,
thereby simplifying the network design by not having to learn them.

How to build prior information into NN design?

We can use a combination of two techniques:
1. Restricting the network architecture through the use of local connections known
as receptive fields.
2. Constraining the choice of synaptic weight through the use of weight sharing.

How to build invariances into NN design?

 • Coping with a range of transformations of the observed signals.
 • Pattern recognition.
 • Need of a system that is capable of understanding the whole environment.
A primary requirement of pattern recognition is to design a classifier that is
invariant to the transformations.
There are three techniques for rendering classifier-type NNs invariant to transformations:
1. Invariance by structure
2. Invariance by training
3. Invariant feature space
Basic Learning Laws
A neural network learns about its environment through an interactive process of
adjustments applied to its synaptic weights and bias levels. The network becomes more
knowledgeable after each iteration of the learning process.
Learning is a process by which the free parameters of a neural network are adapted
through a process of stimulation by the environment in which the network is embedded.
The operation of a neural network is governed by neuronal dynamics. Neuronal
dynamics consists of two parts: one corresponding to the dynamics of the activation state
and the other corresponding to the dynamics of the synaptic weights.
The Short Term Memory (STM) in neural networks is modeled by the activation
state of the network. The Long Term Memory (LTM) corresponds to the encoded pattern
information in the synaptic weights due to learning.
Learning laws are merely implementation models of synaptic dynamics. Typically, a
model of synaptic dynamics is described in terms of expressions for the first derivative of
the weights. They are called learning equations.
Learning laws describe the weight vector for the ith processing unit at time instant
(t+1) in terms of the weight vector at time instant (t) as follows:
$$W_i(t+1) = W_i(t) + \Delta W_i(t)$$
where ΔWi(t) is the change in the weight vector.
There are different methods for implementing the learning feature of a neural
network, leading to several learning laws. Some basic learning laws are discussed below.
All these learning laws use only local information for adjusting the weight of the connection
between two units.

Hebb’s Laws
Here the change in the weight vector is given by
$$\Delta W_i(t) = \eta\, f(W_i^T a)\, a$$
Therefore, the jth component of ΔWi is given by
$$\Delta w_{ij} = \eta\, f(W_i^T a)\, a_j = \eta\, s_i a_j, \quad \text{for } j = 1, 2, \ldots, M$$
where si is the output signal of the ith unit, a is the input vector, and η is the
learning-rate parameter.
Hebb's law states that the weight increment is proportional to the product of
the input data and the resulting output signal of the unit. This law requires weight
initialization to small random values around wij = 0 prior to learning. This law represents
unsupervised learning.
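
A minimal sketch of one Hebbian update is shown below. The linear output function (f(x) = x), the learning-rate value, the input vector, and the seeded random initialization are assumptions chosen for the example.

```python
import random

def dot(u, v):
    return sum(x * y for x, y in zip(u, v))

def hebb_update(w, a, eta=0.1, f=lambda x: x):
    """Return w + eta * s * a, where s = f(w^T a) is the unit's output signal."""
    s = f(dot(w, a))
    return [wj + eta * s * aj for wj, aj in zip(w, a)]

# Small random initial weights around zero, as the law requires.
rng = random.Random(0)
w = [rng.uniform(-0.01, 0.01) for _ in range(3)]
a = [1.0, -1.0, 0.5]          # hypothetical input vector
for _ in range(5):
    w = hebb_update(w, a)
print(w)
```

No desired output appears anywhere in the update, which is why the law is unsupervised.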
Perceptron Learning Law

Here the change in the weight vector is given by

$$\Delta W_i = \eta\,[d_i - \mathrm{sgn}(W_i^T a)]\,a$$
where sgn(x) is the sign of x. Therefore, we have
$$\Delta w_{ij} = \eta\,[d_i - \mathrm{sgn}(W_i^T a)]\,a_j = \eta\,(d_i - s_i)\,a_j, \quad \text{for } j = 1, 2, \ldots, M$$

The perceptron law is applicable only for bipolar output functions f(.). This is also
called discrete perceptron learning law. The expression for wij shows that the weights are
adjusted only if the actual output si is incorrect, since the term in the square brackets is
zero for the correct output.
This is a supervised learning law, as the law requires a desired output for each
input. In implementation, the weights can be initialized to any random initial values, as
they are not critical. The weights converge to the final values eventually by repeated use of
the input-output pattern pairs, provided the pattern pairs are representable by the system.
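
The perceptron law can be sketched as follows. The training set is a hypothetical bipolar AND problem whose first input is fixed at +1 to act as a bias; the learning rate, the zero initial weights, and the convention sgn(0) = +1 are assumptions of this example.

```python
def dot(u, v):
    return sum(x * y for x, y in zip(u, v))

def sgn(x):
    """Bipolar sign function; sgn(0) is taken as +1 here (an assumption)."""
    return 1 if x >= 0 else -1

def perceptron_update(w, a, d, eta):
    """w + eta * (d - sgn(w^T a)) * a; unchanged when the output is correct."""
    s = sgn(dot(w, a))
    return [wj + eta * (d - s) * aj for wj, aj in zip(w, a)]

def train_perceptron(samples, w, eta=0.1, max_epochs=100):
    """Repeat the pattern pairs until a full pass changes no weight."""
    for _ in range(max_epochs):
        changed = False
        for a, d in samples:
            new_w = perceptron_update(w, a, d, eta)
            changed = changed or new_w != w
            w = new_w
        if not changed:
            break
    return w

# Bipolar AND; the leading 1 in each input is the fixed bias input.
AND_SAMPLES = [
    ([1, -1, -1], -1),
    ([1, -1, 1], -1),
    ([1, 1, -1], -1),
    ([1, 1, 1], 1),
]
w_final = train_perceptron(AND_SAMPLES, [0.0, 0.0, 0.0])
print(w_final)
```

The AND problem is linearly separable, so the pattern pairs are representable by a single unit and the loop stops once every sample is classified correctly.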

Delta Learning Law

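Here the change in the weight vector is given by

$$\Delta W_i = \eta\,[d_i - f(W_i^T a)]\, f'(W_i^T a)\, a$$
where f'(x) is the derivative of f with respect to x. Hence,
$$\Delta w_{ij} = \eta\,[d_i - f(W_i^T a)]\, f'(W_i^T a)\, a_j = \eta\,[d_i - s_i]\, f'(x_i)\, a_j, \quad \text{for } j = 1, 2, \ldots, M$$
with xi = WiTa.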
This law is valid only for a differentiable output function, as it depends on the
derivative of the output function f(.). It is a supervised learning law since the change in the
weight is based on the error between the desired and the actual output values for a given
input.
Delta learning law can also be viewed as a continuous perceptron learning law.
In implementation, the weights can be initialized to any random values as the values
are not very critical. The weights converge to the final values eventually by repeated use of
the input-output pattern pairs. The convergence can be more or less guaranteed by using
more layers of processing units in between the input and output layers. The delta learning
law can be generalized to the case of multiple layers of a feedforward network.
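
A minimal single-unit sketch of the delta law is given below, assuming a logistic output function (one possible differentiable choice), a single made-up training pair, and a learning rate of 1.0.

```python
import math

def dot(u, v):
    return sum(x * y for x, y in zip(u, v))

def logistic(x):
    return 1.0 / (1.0 + math.exp(-x))

def logistic_prime(x):
    """Derivative of the logistic function: f(x) * (1 - f(x))."""
    s = logistic(x)
    return s * (1.0 - s)

def delta_update(w, a, d, eta):
    """w + eta * (d - f(x)) * f'(x) * a, with x = w^T a."""
    x = dot(w, a)
    s = logistic(x)
    return [wj + eta * (d - s) * logistic_prime(x) * aj for wj, aj in zip(w, a)]

# Hypothetical training pair: input a with desired output d.
w = [0.0, 0.0]
a, d = [1.0, 1.0], 1.0
initial_error = abs(d - logistic(dot(w, a)))
for _ in range(200):
    w = delta_update(w, a, d, eta=1.0)
final_error = abs(d - logistic(dot(w, a)))
print(initial_error, final_error)
```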
Widrow and Hoff LMS Learning Law

Here, the change in the weight vector is given by

$$\Delta W_i = \eta\,[d_i - W_i^T a]\,a$$
$$\Delta w_{ij} = \eta\,[d_i - W_i^T a]\,a_j, \quad \text{for } j = 1, 2, \ldots, M$$
This is a supervised learning law and is a special case of the delta learning law,
where the output function is assumed linear, i.e., f(xi) = xi.
In this case the change in the weight is made proportional to the negative gradient
of the error between the desired output and the continuous activation value, which is also
the continuous output signal due to linearity of the output function. Hence, this is also
called the Least Mean Squared (LMS) error learning law.
In implementation, the weights may be initialized to any values. The input-output
pattern pairs data is applied several times to achieve convergence of the weights for a
given set of training data. The convergence is not guaranteed for any arbitrary training data set.
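
The LMS law can be sketched with a few lines of Python. The training pairs below are generated from a hypothetical linear target d = 2*a1 - a2, so an exact solution exists; the learning rate and the number of passes are arbitrary choices.

```python
def dot(u, v):
    return sum(x * y for x, y in zip(u, v))

def lms_update(w, a, d, eta):
    """w + eta * (d - w^T a) * a, i.e. the delta law with a linear output."""
    error = d - dot(w, a)
    return [wj + eta * error * aj for wj, aj in zip(w, a)]

# Hypothetical data consistent with the linear target d = 2*a1 - a2.
samples = [
    ([1.0, 0.0], 2.0),
    ([0.0, 1.0], -1.0),
    ([1.0, 1.0], 1.0),
]
w = [0.0, 0.0]
for _ in range(500):            # apply the pattern pairs several times
    for a, d in samples:
        w = lms_update(w, a, d, eta=0.1)
print(w)
```

For this consistent data set the weights approach [2, -1]; for arbitrary data the law only reduces the squared error on average.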

Correlation Learning Law

Here, the change in the weight vector is given by

$$\Delta W_i = \eta\, d_i\, a$$
$$\Delta w_{ij} = \eta\, d_i\, a_j$$
This is a special case of the Hebbian learning with the output signal (si) being
replaced by the desired signal (di). But the Hebbian learning is an unsupervised learning,
whereas the correlation learning is a supervised learning, since it uses the desired output
value to adjust the weights. In the implementation of the learning law, the weights are
initialised to small random values close to zero, i.e., wij ≈ 0.
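
For comparison with the Hebbian case, a one-line sketch of the correlation update follows; the input vector, desired signal, and learning rate are made-up values.

```python
def correlation_update(w, a, d, eta):
    """w + eta * d * a: the Hebbian form with the desired signal d in place of s."""
    return [wj + eta * d * aj for wj, aj in zip(w, a)]

# Hypothetical values, starting from zero weights.
w = correlation_update([0.0, 0.0], [1.0, -1.0], d=1.0, eta=0.5)
print(w)
```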
Instar (Winner-take-all) Learning Law

This is relevant for a collection of neurons, organized in a layer as shown below.

All the inputs are connected to each of the units in the output layer in a feedforward
manner. For a given input vector a, the output from each unit i is computed using the
weighted sum WiTa. The unit k that gives maximum output is identified. That is,
$$W_k^T a = \max_i \,(W_i^T a)$$

Then the weight vector leading to the kth unit is adjusted as follows:
$$\Delta W_k = \eta\,(a - W_k)$$
$$\Delta w_{kj} = \eta\,(a_j - w_{kj}), \quad \text{for } j = 1, 2, \ldots, M$$
The final weight vector tends to represent a group of input vectors within a small
neighbourhood. This is a case of unsupervised learning. In implementation, the values of
the weight vectors are initialized to random values prior to learning, and the vector lengths
are normalized during learning.
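
One winner-take-all step can be sketched as below. The two-unit layer, the initial weight vectors, the input, and the learning rate are hypothetical; the winner's weight vector is renormalized to unit length after the update, which is one way to normalize during learning.

```python
import math

def dot(u, v):
    return sum(x * y for x, y in zip(u, v))

def normalize(v):
    """Scale a vector to unit length."""
    norm = math.sqrt(dot(v, v))
    return [x / norm for x in v]

def instar_step(weights, a, eta):
    """Find the unit with maximum w_i^T a and move only its weights toward a."""
    k = max(range(len(weights)), key=lambda i: dot(weights[i], a))
    updated = [wkj + eta * (aj - wkj) for wkj, aj in zip(weights[k], a)]
    new_weights = [list(w) for w in weights]
    new_weights[k] = normalize(updated)
    return k, new_weights

# Hypothetical layer of two units and one input vector.
weights = [[1.0, 0.0], [0.0, 1.0]]
a = [0.6, 0.8]
k, new_weights = instar_step(weights, a, eta=0.5)
print(k, new_weights)
```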

Outstar Learning Law

The outstar learning law is also related to a group of units arranged in a layer as
shown below.
In this law the weights are adjusted so as to capture the desired output pattern
characteristics. The adjustment of the weights is given by
$$\Delta w_{jk} = \eta\,(d_j - w_{jk}), \quad \text{for } j = 1, 2, \ldots, M$$
where the kth unit is the only active unit in the input layer. The vector d = (d1, d2, …, dM)T is
the desired response from the layer of M units.
The outstar learning is a supervised learning law, and it is used with a network of
instars to capture the characteristics of the input and output patterns for data compression.
In implementation, the weight vectors are initialized to zero prior to learning.
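
A minimal sketch of the outstar update follows, assuming that unit k is the only active input unit, so only the weights fanning out from it are adjusted. The desired response vector, the learning rate, and the number of repetitions are made-up values.

```python
def outstar_update(w_k, d, eta):
    """w_k[j] is the weight w_jk from active input unit k to output unit j."""
    return [wjk + eta * (dj - wjk) for wjk, dj in zip(w_k, d)]

d = [0.2, -0.5, 1.0]        # hypothetical desired response of M = 3 units
w_k = [0.0, 0.0, 0.0]       # weights initialized to zero prior to learning
for _ in range(40):
    w_k = outstar_update(w_k, d, eta=0.5)
print(w_k)
```

With repeated presentations the outgoing weights approach the desired output pattern d.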

Pattern Recognition
Data refers to the collection of raw facts, whereas, the pattern refers to an observed
sequence of facts.
The main difference between human and machine intelligence comes from the fact
that humans perceive everything as a pattern, whereas for a machine everything is data.
Even in routine data consisting of integer numbers (like telephone numbers, bank account
numbers, car numbers) humans tend to perceive a pattern. If there is no pattern, then it is
very difficult for a human being to remember and reproduce the data later.
Thus storage and recall operations in human beings and machines are performed by
different mechanisms. The pattern nature in storage and recall automatically gives
robustness and fault tolerance for the human system.
Pattern recognition tasks
Pattern recognition is the process of identifying a specified sequence that is hidden
in a large amount of data.
Following are the pattern recognition tasks.
1. Pattern association
2. Pattern classification
3. Pattern mapping
4. Pattern grouping
5. Feature mapping
6. Pattern variability
7. Temporal patterns
8. Stability-plasticity dilemma
Basic ANN Models for Pattern Recognition Problems

1. Feedforward ANN
 • Pattern association
 • Pattern classification
 • Pattern mapping/classification
2. Feedback ANN
 • Autoassociation
 • Pattern storage (LTM)
 • Pattern environment storage (LTM)
3. Feedforward and Feedback (Competitive Learning) ANN
 • Pattern storage (STM)
 • Pattern clustering
 • Feature mapping

In any pattern recognition task we have a set of input patterns and the
corresponding output patterns. Depending on the nature of the output patterns and the
nature of the task environment, the problem could be identified as one of association or
classification or mapping.
The given set of input-output pattern pairs form only a few samples of an unknown
system. From these samples the pattern recognition model should capture the
characteristics of the system.
Without looking into the details of the system, let us assume that the input-output
patterns are available or given to us. Without loss of generality, let us also assume that the
patterns could be represented as vectors in multidimensional spaces.