
How the use of a neural network can simplify the coding of OCR applications.

• Download demo project - 22.6 Kb
• Download source - 51.1 Kb

Introduction

A lot of people today are trying to write their own OCR (Optical Character Recognition) system or to improve the quality of an existing one.

This article shows how the use of an artificial neural network simplifies development of an optical character recognition application, while achieving high recognition quality and good performance.

Background

Developing a proprietary OCR system is a complicated task that requires a lot of effort. Such systems tend to be really complicated and can hide a lot of logic behind the code. The use of an artificial neural network in OCR applications can dramatically simplify the code and improve the quality of recognition while achieving good performance. Another benefit of using a neural network in OCR is the extensibility of the system - the ability to recognize more character sets than initially defined. Most traditional OCR systems are not extensible enough. Why? Because a task such as working with tens of thousands of Chinese characters, for example, is not as easy as working with a 68-character English typed set, and it can easily bring a traditional system to its knees!

Well, the Artificial Neural Network (ANN) is a wonderful tool that can help to resolve such problems. The ANN is an information-processing paradigm inspired by the way the human brain processes information. Artificial neural networks are collections of mathematical models that represent some of the observed properties of biological nervous systems and draw on the analogies of adaptive biological learning. The key element of an ANN is its topology. The ANN consists of a large number of highly interconnected processing elements (nodes) that are tied together with weighted connections (links). Learning in biological systems involves adjustments to the synaptic connections that exist between the neurons. This is true for an ANN as well. Learning typically occurs by example, through training or exposure to a set of input/output data (patterns), where the training algorithm adjusts the link weights. The link weights store the knowledge necessary to solve specific problems.

Although they originated in the late 1950s, neural networks didn't gain much popularity until the 1980s, a computer boom era. Today ANNs are mostly used to solve complex real-world problems. They are often good at solving problems that are too complex for conventional technologies (e.g., problems that do not have an algorithmic solution or for which an algorithmic solution is too complex to be found) and are often well suited to problems that people are good at solving but for which traditional methods are not. They are good pattern recognition engines and robust classifiers, with the ability to generalize when making decisions based on imprecise input data. They offer ideal solutions to a variety of classification problems such as speech, character and signal recognition, as well as functional prediction and system modeling, where the physical processes are not understood or are highly complex. The advantage of ANNs lies in their resilience against distortions in the input data and their capability to learn.

Using the code

In this article I use a sample application from the Neuro.NET library to show how to use a Backpropagation neural network in a simple OCR application.

Let's assume that you have already gone through all the image pre-processing routines (resampling, deskew, zoning, blocking, etc.) and you already have images of the characters from your document. (In the example I simply generate those images.)

Creating the neural network.

Let's construct the network first. In this example I use a Backpropagation neural network. The Backpropagation network is a multilayer perceptron model with an input layer, one or more hidden layers, and an output layer.

The nodes in the Backpropagation neural network are interconnected via weighted links, with each node usually connecting to the next layer up, until the output layer, which provides the output of the network. The input pattern values are presented and assigned to the nodes of the input layer. The input values are initialized to values between -1 and 1. The nodes in the next layer receive the input values through links and compute output values of their own, which are then passed to the next layer. These values propagate forward through the layers until the output layer is reached, that is, until each output layer node has produced an output value for the network. The desired output for the input pattern is used to compute an error value for each node in the output layer, which is then propagated backwards through the network (this is where the network's name comes from) as the delta rule is used to adjust the link weights to produce output closer to the desired values. Once the error produced by the patterns in the training set is below a given tolerance, the training is complete and the network can be presented new input patterns and produce an output based on the experience it gained from the learning process.
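
For readers who want to see the mechanics behind that description, here is a minimal sketch (my own illustration, not the Neuro.NET implementation) of how a single node could compute its output from its weighted links and a sigmoid activation:

//Illustrative sketch only: a node sums its weighted inputs plus a bias and
//squashes the result with a sigmoid activation function.
static double NodeOutput(double[] inputs, double[] weights, double bias)
{
   double sum = bias;
   for (int i = 0; i < inputs.Length; i++)
      sum += inputs[i] * weights[i];        //weighted contribution of each link
   return 1.0 / (1.0 + Math.Exp(-sum));     //sigmoid squashing function
}

The actual activation function and weight update rule used by the library may differ; the point is only that each node's output is a simple function of the weighted sum of its inputs.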

I will use the library class BackPropagationRPROPNetwork to construct my own OCRNetwork.


//Inherit from the Backpropagation neural network
public class OCRNetwork: BackPropagationRPROPNetwork
{
   //Override the method of the base class in order to implement our
   //own training method
   public override void Train(PatternsCollection patterns)
   {
      ...
   }
}

I override the Train method of the base class to implement my own training method. Why do I need to do it? For one simple reason: the training progress of the network is measured by the quality of the produced result and the speed of training. You have to establish the criteria for when the quality of the network output is acceptable to you and when you can stop the training process. The implementation I provide here has proven (based on my experience) to be fast and accurate. I decided that I can stop the training process when the network is able to recognize all of the patterns, without a single error. So, here is the implementation of my training method.


public override void Train(PatternsCollection patterns)
{
   if (patterns != null)
   {
      double error = 0;
      int good = 0;
      // Train until all patterns are recognized correctly
      while (good < patterns.Count)
      {
         good = 0;
         for (int i = 0; i < patterns.Count; i++)
         {
            //Set the input values of the network
            for (int k = 0; k < NodesInLayer(0); k++)
               nodes[k].Value = patterns[i].Input[k];
            //Run the network
            this.Run();
            //Set the expected result
            for (int k = 0; k < this.OutputNodesCount; k++)
               this.OutputNode(k).Error = patterns[i].Output[k];
            //Make the network remember the corresponding output
            //values. (Teach the network)
            this.Learn();
            //See if the network produced the correct result during
            //this iteration
            if (BestNodeIndex == OutputPatternIndex(patterns[i]))
               good++;
         }
         //Adjust the weights of the links in the network to their
         //average value. (An epoch training technique)
         foreach (NeuroLink link in links)
            ((EpochBackPropagationLink)link).Epoch(patterns.Count);
      }
   }
}

Also, I have implemented a BestNodeIndex property that returns the index of the node having the maximum value and the minimal error. The OutputPatternIndex method returns the index of the pattern output element having a value of 1. If those indices match, the network has produced the correct result. Here is how the BestNodeIndex implementation looks:


public int BestNodeIndex
{
   get
   {
      int result = -1;
      double aMaxNodeValue = 0;
      double aMinError = double.PositiveInfinity;
      for (int i = 0; i < this.OutputNodesCount; i++)
      {
         NeuroNode node = OutputNode(i);
         //Look for a node with maximum value or lesser error
         if ((node.Value > aMaxNodeValue) ||
             ((node.Value >= aMaxNodeValue) && (node.Error < aMinError)))
         {
            aMaxNodeValue = node.Value;
            aMinError = node.Error;
            result = i;
         }
      }
      return result;
   }
}

Creating an instance of the neural network is as simple as it gets. The network has one constructor parameter: an integer array describing the number of nodes in each layer of the network. The first layer in the network is the input layer. The number of elements in this layer corresponds to the number of elements in the input pattern and is equal to the number of elements in the digitized image matrix (we will talk about it later). The network may have multiple middle layers with a different number of nodes in each layer. In this example I use only one middle layer and apply an unofficial "rule of thumb" to determine the number of nodes in it:


NodesNumber = (InputsCount+OutputsCount) / 2

Note: You can experiment by adding more middle layers and using a different number of nodes in them, just to see how it affects the training speed and recognition quality of the network.

The last layer in the network is the output layer. This is the layer where we look for the results. I define the number of nodes in this layer to be equal to the number of characters that we are going to recognize.


//Create an instance of the network
backpropNetwork = new OCRNetwork(new int[3] {aMatrixDim * aMatrixDim,
            (aMatrixDim * aMatrixDim + aCharsCount)/2, aCharsCount});

Creating training patterns


Now let's talk about the training patterns. These patterns will be used to teach the neural network to recognize the images. Basically, each training pattern consists of two single-dimensional arrays of floating-point numbers: the Inputs and Outputs arrays.


/// <summary>
/// A class representing single training pattern and is used to train a
/// neural network. Contains input data and expected results arrays.
/// </summary>
public class Pattern: NeuroObject
{
private double[] inputs, outputs;
...
}

The Inputs array contains your input data. In our case it is a digitized representation of the character's image. By "digitizing" the image I mean the process of creating a brightness map (or a map of the absolute value of the color vector, whatever you choose) of the image. To create this map I split the image into squares and calculate the average value of each square. Then I store those values in the array.

I have implemented the CharToDoubleArray method of the network to digitize the image. There I use the absolute value of the color for each element of the matrix. (No doubt you can use other techniques there...) After the image is digitized, I have to scale down the results in order to fit them into the input value range of the network. To do this I wrote a Scale method, where I look for the maximum element value of the matrix and then divide all elements of the matrix by it. So, the implementation of CharToDoubleArray looks like this:


//aSrc - an image of the character
//aArrayDim - dimension of the pattern matrix
//calculate image quantization X step
double xStep = (double)aSrc.Width/(double)aArrayDim;
//calculate image quantization Y step
double yStep = (double)aSrc.Height/(double)aArrayDim;
double[] result = new double[aArrayDim * aArrayDim];
for (int i = 0; i < aSrc.Width; i++)
   for (int j = 0; j < aSrc.Height; j++)
   {
      //calculate the matrix address
      int x = (int)(i/xStep);
      int y = (int)(j/yStep);
      //Get the color of the pixel
      Color c = aSrc.GetPixel(i,j);
      //Absolute value of the color, but I guess it is possible to
      //use a single color component too...
      result[y * aArrayDim + x] += Math.Sqrt(c.R*c.R + c.B*c.B + c.G*c.G);
   }
//Scale the matrix to fit values into a range from 0..1 (required by
//the ANN). In this method we look for the maximum value of the elements
//and then divide all elements of the matrix by this maximum value.
return Scale(result);
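
The Scale method itself is not shown in the fragment above. A minimal sketch of it, assuming it simply finds the maximum element and divides everything by it (as described in the text), could look like this:

//Hypothetical sketch of Scale: divide all elements by the maximum value so
//the digitized matrix fits into the 0..1 range expected by the network.
private double[] Scale(double[] matrix)
{
   double max = 0;
   foreach (double v in matrix)
      if (v > max) max = v;
   if (max != 0)
      for (int i = 0; i < matrix.Length; i++)
         matrix[i] /= max;
   return matrix;
}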

The Outputs array of the pattern represents the expected result, the result that the network will use during the training. There are as many elements in this array as there are characters we are going to recognize. So, for instance, to teach the network to recognize the English letters from "A" to "Z" we will need 26 elements in the Outputs array. Make it 52 if you decide to include lower-case letters. Each element corresponds to a single letter. The Inputs of each pattern are set to the digitized image data, and the corresponding element in the Outputs array is set to 1, so the network knows which output (letter) corresponds to the input data. The method CreateTrainingPatterns does this job for me.


public PatternsCollection CreateTrainingPatterns(Font font)
{
   //Create the pattern collection:
   // as many inputs as there are elements in the digitized image matrix,
   // as many outputs as there are characters we are going to recognize.
   PatternsCollection result = new PatternsCollection(aCharsCount,
             aMatrixDim * aMatrixDim, aCharsCount);
   // generate one pattern for each character
   for (int i = 0; i < aCharsCount; i++)
   {
      //CharToDoubleArray creates an image of the character and digitizes it.
      //You can change this method to pass the actual image of the character.
      double[] aBitMatrix = CharToDoubleArray(Convert.ToChar(aFirstChar + i),
                                              font, aMatrixDim, 0);
      //Assign the matrix values as inputs of the pattern
      for (int j = 0; j < aMatrixDim * aMatrixDim; j++)
         result[i].Input[j] = aBitMatrix[j];
      //The output value is set to 1 for the corresponding character.
      //The rest of the outputs are set to 0 by default.
      result[i].Output[i] = 1;
   }
   return result;
}

Now we have completed the creation of the patterns, and we can use them to train the neural network.

Training of the network.

To start the training process of the network, simply call the Train method and pass your training patterns to it.


//Train the network
backpropNetwork.Train(trainingPatterns);

Normally, execution will leave this method when training is complete, but in some cases it could stay there forever (!). The Train method as currently implemented relies on only one assumption: that the network training will complete sooner or later. Well, I admit, this is a wrong assumption, and network training may never complete. The most "popular" reasons for neural network training failure, together with possible solutions, are:

1. The network topology is too simple to handle the amount of training patterns you provide. Solution: create a bigger network; add more nodes into the middle layer or add more middle layers to the network.
2. The training patterns are not clear enough, not precise enough, or are too complicated for the network to differentiate. Solution: clean up the patterns or use a different type of network/training algorithm. Also, you cannot train the network to guess the next winning lottery numbers... :-)
3. Your training expectations are too high and/or not realistic. Solution: lower your expectations. The network can never be 100% "sure".
4. No reason. Solution: check the code!

Most of those reasons are very easy to resolve, and they make a good subject for a future article. Meanwhile, we can enjoy the results.

Enjoying the results

Now we can see what the network has learned. The following code fragment shows how to use the trained neural network in your OCR application.


//Get your input data
double[] aInput = ... (your digitized image of the character)
//Load the data into the network
for (int i = 0; i < backpropNetwork.InputNodesCount; i++)
   backpropNetwork.InputNode(i).Value = aInput[i];
//Run the network
backpropNetwork.Run();
//Get the result from the network and convert it to a character
return Convert.ToChar(aFirstChar + backpropNetwork.BestNodeIndex).ToString();

In order to use the network you have to load your data into the input layer. Then use the Run method to let the network process your data. Finally, get your results out of the output nodes of the network and analyze them (the BestNodeIndex property I created in the OCRNetwork class does this job for me).

License

This article, along with any associated source code and files, is licensed under The GNU
General Public License (GPLv3)
Introduction

There are many different approaches to the optical character recognition problem. One of the most common and popular is based on neural networks, which can be applied to different tasks such as pattern recognition, time series prediction, function approximation, clustering, etc.

In this article, I'll try to review some approaches to optical character recognition using artificial neural networks. The attached project is intended as a research project, so don't expect to find a ready solution for scanned document processing here.

Popular approach

The most popular and simple approach to the OCR problem is based on a feed-forward neural network with backpropagation learning. The main idea is that we should first prepare a training set and then train a neural network to recognize patterns from the training set. In the training step we teach the network to respond with the desired output for a specified input. For this purpose each training sample is represented by two components: a possible input and the desired network output for that input. After the training step is done, we can give an arbitrary input to the network and the network will form an output, from which we can resolve the pattern type presented to the network.

Let's assume that we want to train a network to recognize 26 capital letters represented as images of 5x6 pixels, something like this one:

One of the most obvious ways to convert an image to the input part of a training sample is to create a vector of size 30 (for our case), containing "1" in all positions corresponding to letter pixels and "0" in all positions corresponding to background pixels. But, in many neural network training tasks, it's preferable to represent training patterns in the so-called "bipolar" way, placing "0.5" into the input vector instead of "1" and "-0.5" instead of "0". Such pattern coding usually leads to better learning performance. Finally, our training sample should look something like this:


float[] input_letterK = new float[] {
    0.5f, -0.5f, -0.5f,  0.5f,  0.5f,
    0.5f, -0.5f,  0.5f, -0.5f, -0.5f,
    0.5f,  0.5f, -0.5f, -0.5f, -0.5f,
    0.5f, -0.5f,  0.5f, -0.5f, -0.5f,
    0.5f, -0.5f, -0.5f,  0.5f, -0.5f,
    0.5f, -0.5f, -0.5f, -0.5f,  0.5f};
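
A vector like this does not have to be typed in by hand. A minimal sketch of producing it from a small monochrome System.Drawing.Bitmap (assuming dark pixels belong to the letter; this helper is not part of the attached project) could be:

// Sketch only: convert a small black-and-white letter image into a bipolar
// input vector (0.5 for letter pixels, -0.5 for background pixels).
static float[] ToBipolarVector(System.Drawing.Bitmap image)
{
    float[] vector = new float[image.Width * image.Height];
    for (int y = 0; y < image.Height; y++)
        for (int x = 0; x < image.Width; x++)
        {
            bool isLetter = image.GetPixel(x, y).GetBrightness() < 0.5f; // dark pixel = letter
            vector[y * image.Width + x] = isLetter ? 0.5f : -0.5f;
        }
    return vector;
}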

For each possible input we need to create the desired network output to complete the training sample. For the OCR task it's very common to code each pattern as a vector of size 26 (because we have 26 different letters), placing "0.5" into the position corresponding to the pattern's type number and "-0.5" into all other positions. So, the desired output vector for the letter "K" will look something like this:


// 0.5 is placed only in the position of the "K" letter
float[] output_letterK = new float[] {
    -0.5f, -0.5f, -0.5f, -0.5f, -0.5f,
    -0.5f, -0.5f, -0.5f, -0.5f, -0.5f,
     0.5f, -0.5f, -0.5f, -0.5f, -0.5f,
    -0.5f, -0.5f, -0.5f, -0.5f, -0.5f,
    -0.5f, -0.5f, -0.5f, -0.5f, -0.5f,
    -0.5f};

After we have such training samples for all letters, we can start to train our network. The last question is about the network's structure. For the above task we can use a one-layer neural network, which will have 30 inputs corresponding to the size of the input vector and 26 neurons in the layer corresponding to the size of the output vector.

// pattern size
int patternSize = 30;
// patterns count
int patterns = 26;

// learning input vectors
float[][] input = new float[26][]
{
...
new float [] {
0.5f, -0.5f, -0.5f, 0.5f, 0.5f,
0.5f, -0.5f, 0.5f, -0.5f, -0.5f,
0.5f, 0.5f, -0.5f, -0.5f, -0.5f,
0.5f, -0.5f, 0.5f, -0.5f, -0.5f,
0.5f, -0.5f, -0.5f, 0.5f, -0.5f,
0.5f, -0.5f, -0.5f, -0.5f, 0.5f}, // Letter K
...
};
// learning output vectors
float[][] output = new float[26][]
{
...
new float [] {
-0.5f, -0.5f, -0.5f, -0.5f, -0.5f,
-0.5f, -0.5f, -0.5f, -0.5f, -0.5f,
0.5f, -0.5f, -0.5f, -0.5f, -0.5f,
-0.5f, -0.5f, -0.5f, -0.5f, -0.5f,
-0.5f, -0.5f, -0.5f, -0.5f, -0.5f,
-0.5f}, // Letter K
...
};

// create neural network
AForge.NeuralNet.Network neuralNet =
    new AForge.NeuralNet.Network(new BipolarSigmoidFunction(2.0f),
    patternSize, patterns);
// randomize network's weights
neuralNet.Randomize();

// create network teacher
AForge.NeuralNet.Learning.BackPropagationLearning teacher = new
    AForge.NeuralNet.Learning.BackPropagationLearning(neuralNet);
teacher.LearningLimit = 0.1f;
teacher.LearningRate = 0.5f;

// teach the network
int i = 0;
do
{
    teacher.LearnEpoch(input, output);
    i++;
}
while (!teacher.IsConverged);

System.Diagnostics.Debug.WriteLine("total learning epoch: " + i);

The sample above provides a complete neural network training procedure for a pattern recognition task. On each learning epoch all samples from the training set are presented to the network and the sum squared error is calculated. When the error becomes less than the specified error limit, the training is done and the network can be used for recognition.

How to recognize something? We need to present an input to the trained network and get its output. Then we should find the element in the output vector with the maximum value. The element's number will point us to the recognized pattern:


// the "K" letter, but a little bit noisy
float[] pattern = new float [] {
    0.5f, -0.5f, -0.5f,  0.5f,  0.5f,
    0.5f, -0.5f,  0.5f, -0.5f,  0.5f,
    0.5f,  0.5f, -0.5f, -0.5f, -0.5f,
    0.5f, -0.5f,  0.5f, -0.5f, -0.5f,
    0.5f, -0.5f, -0.5f,  0.5f, -0.5f,
    0.3f, -0.5f, -0.5f,  0.5f,  0.5f};

// get network's output
float[] output = neuralNet.Compute(pattern);

int i, n, maxIndex = 0;

// find the maximum value in the output
float max = output[0];
for (i = 1, n = output.Length; i < n; i++)
{
    if (output[i] > max)
    {
        max = output[i];
        maxIndex = i;
    }
}

System.Diagnostics.Debug.WriteLine(
    "network thinks it is - " + (char)((int) 'A' + maxIndex));

Another approach

The approach described above works fine. But there are some issues. Suppose we train our network using the above training set with letters of size 5x6. What should we do if we need to recognize a letter that is represented by an 8x8 image? The obvious answer is to resize the image. But what about an image which contains a letter printed at 72-point font size? I don't think we'll get a good result after resizing it to a 5x6 image. OK, let's train our network using 8x8 images, or even 16x16 to get higher accuracy. But 16x16 images lead to an input vector of size 256, which makes training the neural network much more expensive.

Another idea is based on using so-called receptors. Suppose we have an image with a letter of arbitrary size. In this approach we'll form the input vector not from the pixel values of the image, but from the receptor values. What are these receptors? Receptors are represented by a set of lines with arbitrary size and direction. A receptor takes the activated value ("0.5" in the input vector) if it crosses the letter and the deactivated value ("-0.5" in the input vector) if it does not. The size of the input vector is the same as the receptor count. By the way, we can use a set of short horizontal and vertical receptors to achieve the same effect as using pixel values for images of small size.
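
To make the idea more concrete, here is a minimal sketch (hypothetical, not taken from the attached project) of a receptor as a line segment that becomes activated when any point sampled along it falls on a letter pixel:

// Hypothetical receptor: a line segment defined by two end points, assumed to
// lie within the image. It is activated if any sampled point hits a dark pixel.
class Receptor
{
    public int X1, Y1, X2, Y2;

    public float GetValue(System.Drawing.Bitmap image)
    {
        const int samples = 20;
        for (int s = 0; s <= samples; s++)
        {
            int x = X1 + (X2 - X1) * s / samples;
            int y = Y1 + (Y2 - Y1) * s / samples;
            if (image.GetPixel(x, y).GetBrightness() < 0.5f)
                return 0.5f;    // the receptor crosses the letter: activated
        }
        return -0.5f;           // no letter pixel found along the line: deactivated
    }
}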

The advantage of this method is that we can train the neural network on big images even with a small number of receptors. Resizing an image with a letter to 75x75 (or even 150x150) pixels will not lead to bad image quality, so it will be much easier to recognize. On the other hand, we can always easily resize our receptor set, because the receptors are defined as lines with two coordinates. Another big advantage is that we can try to generate a rather small receptor set which will be able to recognize the entire training set using only the most significant features of the letters.

But there are some disadvantages. The described approach can be applied only to the OCR task. It's not possible to recognize complex patterns, because in that case we would need too many receptors. But, as we are doing research in the OCR area, this will not disturb us very much. There is another question, which is much harder and requires more research: how to generate the receptor set? Manually or randomly? How can we be sure that the set is optimal?

We can use the following approach for receptor set generation: first, we'll randomly generate a large set of receptors, and then we'll choose a specified number of the best ones. How do we decide whether a given receptor is good or not? We'll use entropy, which is well known from information theory. Let me explain it with a small example:

Here is a table which contains some training data. We can see five types of objects, which are represented by rows, and three receptors, which are represented by columns. Each object has five different variants. Let's describe, for example, the first value of the table: "11101". It means that the first receptor crosses the first variant of the first object; it also crosses the second, third and fifth variants, but it does not cross the fourth variant. Is that clear? OK, let's look at the fifth row, first column: the receptor crosses the first, third and fifth variants, and it does not cross the second and the fourth variants.

We'll use two concepts: inner entropy and outer entropy. Inner entropy is the entropy of a specified receptor for a specified object. The inner entropy tells us how good the specified receptor is at recognizing the specified object, and the value should be as small as possible. Look at the second row and the first column, where we have "11111". The entropy of this set is 0. That's good, because this receptor is 100% sure that we are working with the second object: it has the same value for all the variants of the object. The same is true for the second row and the second column, "00000"; its entropy is 0 too. But look at the fifth row and the first column: "10101". The entropy of this set is 0.971, and that's bad, because the receptor is not sure about the specified object. The outer entropy is calculated for the whole column: the closer it is to 1, the better. Why? If the outer entropy is small, then the receptor is useless, because it cannot separate patterns. The best receptor should be activated for one half of all the objects and deactivated for the other half. So, here is the final formula for calculating a receptor's usability: usability = OuterEntropy * (1 - AverageInnerEntropy). The average inner entropy is just the sum of the inner entropies of the receptor for all the objects divided by the number of objects.

So, using this idea, we initially generate a big set of receptors at random, for example 500. Then we generate temporary training inputs using these receptors. On the basis of this data, we can filter out a predefined number of receptors; for example, we can keep the 100 receptors with the best usability. The filtering procedure reduces the amount of training data and the neural network's input count. Then, with the filtered training data, we can continue with training our network in the same manner as described in the first approach.
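
As an illustration of the usability formula, here is a minimal sketch (again hypothetical, not the project's code) that computes the Shannon entropy of a receptor's activation values and the resulting usability; for the "10101" example above the entropy comes out to about 0.971:

// Sketch: Shannon entropy of a set of binary receptor activations, and the
// usability measure usability = OuterEntropy * (1 - AverageInnerEntropy).
static double Entropy(IEnumerable<bool> bits)
{
    int ones = 0, total = 0;
    foreach (bool b in bits) { if (b) ones++; total++; }
    double p = (double)ones / total;
    if (p == 0.0 || p == 1.0) return 0.0;    // perfectly consistent set
    return -(p * Math.Log(p, 2) + (1 - p) * Math.Log(1 - p, 2));
}

// activations[obj][variant] holds the receptor's value for each variant of each object
static double Usability(bool[][] activations)
{
    double innerSum = 0;
    var allValues = new List<bool>();
    foreach (bool[] objectVariants in activations)
    {
        innerSum += Entropy(objectVariants);  // inner entropy, per object
        allValues.AddRange(objectVariants);
    }
    double averageInner = innerSum / activations.Length;
    double outer = Entropy(allValues);        // outer entropy, over the whole column
    return outer * (1.0 - averageInner);
}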

Test application

A test application which tries to implement the second approach is provided with the article. How do we use it? Let's try the first test:

• We need to generate the initial receptor set. On application startup it's already generated, so we can skip this step if we are not planning to change the initial amount of receptors or the filtered amount.
• Select the fonts which will be used for teaching the network. Let it be the regular Arial font for the first time.
• Generate data. In this step the initial training data will be generated.
• Filter data. In this step the initial receptor set as well as the training data will be filtered.
• Create network - a neural network will be created.
• Train network - neural network training.
• Look at the misclassified value. It should be "0/26", which means that the trained network can successfully recognize all patterns from the training set.
• We can verify this by using the "Draw" and "Recognize" buttons.

After performing all these steps we find that the concept works! You can try a more complex test by choosing all regular fonts. But don't forget to turn on the "Scale" option before data generation; this option will scale all images from the training set. Then you can set the error limit of the second pass to "0.5" for faster training, or leave it at "0.1" if you are not in a hurry. At the end of training you should get a misclassified value of "0/130", and you can check that all images from the training set can be recognized.

You can even try to teach the network all fonts, regular and italic. You should use the "Scale" option for this, and you will need to play a little with the learning speed and error limit values. You can also try to use a two-layer network. I was able to get a misclassified value of "4/260" with only 100 receptors.

Conclusion

From the above tests, it looks like the second approach is able to perform the OCR task. But all our experiments were made using an ideal training set. The application also allows recognizing hand-drawn letters, but we should always use the "Scale" option in this case, and still the results are not outstanding. Possible future research can be done in the direction of better receptor set generation and filtering, and of image scaling.

But still, it works! And with some additional research and improvements we can try to use it for some real tasks.
John Bullinaria's Step by Step Guide to Implementing a Neural Network in C

By John A. Bullinaria from the School of Computer Science of The University of Birmingham, UK.

This document contains a step by step guide to implementing a simple neural network in C. It is aimed
mainly at students who wish to (or have been told to) incorporate a neural network learning component
into a larger system they are building. Obviously there are many types of neural network one could
consider using - here I shall concentrate on one particularly common and useful type, namely a simple
three-layer feed-forward back-propagation network (multi layer perceptron).

This type of network will be useful when we have a set of input vectors and a corresponding set of output
vectors, and our system must produce an appropriate output for each input it is given. Of course, if we
already have a complete noise-free set of input and output vectors, then a simple look-up table would
suffice. However, if we want the system to generalize, i.e. produce appropriate outputs for inputs it has
never seen before, then a neural network that has learned how to map between the known inputs and
outputs (i.e. our training set) will often do a pretty good job for new inputs as well.

I shall assume that the reader is already familiar with C, and, for more details about neural networks in
general, simply refer the reader to the newsgroup comp.ai.neural-nets and the associated Neural
Networks FAQ. So, let us begin...

A single neuron (i.e. processing unit) takes its total input In and produces an output activation Out. I shall take this to be the sigmoid function

Out = 1.0/(1.0 + exp(-In)); /* Out = Sigmoid(In) */

though other activation functions are often used (e.g. linear or hyperbolic tangent). This has the effect of
squashing the infinite range of In into the range 0 to 1. It also has the convenient property that its
derivative takes the particularly simple form

Sigmoid_Derivative = Sigmoid * (1.0 - Sigmoid) ;

Typically, the input In into a given neuron will be the weighted sum of output activations feeding
in from a number of other neurons. It is convenient to think of the activations flowing through
layers of neurons. So, if there are NumUnits1 neurons in layer 1, the total activation flowing into
our layer 2 neuron is just the sum over Layer1Out[i]*Weight[i], where Weight[i] is the
strength/weight of the connection between unit i in layer 1 and our unit in layer 2. Each neuron
will also have a bias, or resting state, that is added to the sum of inputs, and it is convenient to
call this weight[0]. We can then write

Layer2In = Weight[0] ;     /* start with the bias */
for( i = 1 ; i <= NumUnits1 ; i++ ) {     /* i loop over layer 1 units */
    Layer2In += Layer1Out[i] * Weight[i] ;     /* add in weighted contributions from layer 1 */
}
Layer2Out = 1.0/(1.0 + exp(-Layer2In)) ;     /* compute sigmoid to give activation */

Normally layer 2 will have many units as well, so it is appropriate to write the weights between unit i in
layer 1 and unit j in layer 2 as an array Weight[i][j]. Thus to get the output of unit j in layer 2 we have

Layer2In[j] = Weight[0][j] ;
for( i = 1 ; i <= NumUnits1 ; i++ ) {
    Layer2In[j] += Layer1Out[i] * Weight[i][j] ;
}
Layer2Out[j] = 1.0/(1.0 + exp(-Layer2In[j])) ;

Remember that in C the array indices start from zero, not one, so we would declare our
variables as

double Layer1Out[NumUnits1+1] ;
double Layer2In[NumUnits2+1] ;
double Layer2Out[NumUnits2+1] ;
double Weight[NumUnits1+1][NumUnits2+1] ;

(or, more likely, declare pointers and use calloc or malloc to allocate the memory). Naturally, we
need another loop to get all the layer 2 outputs

for( j = 1 ; j <= NumUnits2 ; j++ ) {
    Layer2In[j] = Weight[0][j] ;
    for( i = 1 ; i <= NumUnits1 ; i++ ) {
        Layer2In[j] += Layer1Out[i] * Weight[i][j] ;
    }
    Layer2Out[j] = 1.0/(1.0 + exp(-Layer2In[j])) ;
}

Three layer networks are necessary and sufficient for most purposes, so our layer 2 outputs feed into a
third layer in the same way as above

for( j = 1 ; j <= NumUnits2 ; j++ ) {     /* j loop computes layer 2 activations */
    Layer2In[j] = Weight12[0][j] ;
    for( i = 1 ; i <= NumUnits1 ; i++ ) {
        Layer2In[j] += Layer1Out[i] * Weight12[i][j] ;
    }
    Layer2Out[j] = 1.0/(1.0 + exp(-Layer2In[j])) ;
}
for( k = 1 ; k <= NumUnits3 ; k++ ) {     /* k loop computes layer 3 activations */
    Layer3In[k] = Weight23[0][k] ;
    for( j = 1 ; j <= NumUnits2 ; j++ ) {
        Layer3In[k] += Layer2Out[j] * Weight23[j][k] ;
    }
    Layer3Out[k] = 1.0/(1.0 + exp(-Layer3In[k])) ;
}

The code can start to become confusing at this point - I find that keeping a separate index i, j, k for each
layer helps, as does an intuitive notation for distinguishing between the different layers of weights
Weight12 and Weight23. For obvious reasons, for three layer networks, it is traditional to call layer 1 the
Input layer, layer 2 the Hidden layer, and layer 3 the Output layer. Our network thus takes on the familiar
form that we shall use for the rest of this document

Also, to save getting all the In's and Out's confused, we can write LayerNIn as SumN. Our code can thus
be written

for( j = 1 ; j <= NumHidden ; j++ ) {     /* j loop computes hidden unit activations */
    SumH[j] = WeightIH[0][j] ;
    for( i = 1 ; i <= NumInput ; i++ ) {
        SumH[j] += Input[i] * WeightIH[i][j] ;
    }
    Hidden[j] = 1.0/(1.0 + exp(-SumH[j])) ;
}
for( k = 1 ; k <= NumOutput ; k++ ) {     /* k loop computes output unit activations */
    SumO[k] = WeightHO[0][k] ;
    for( j = 1 ; j <= NumHidden ; j++ ) {
        SumO[k] += Hidden[j] * WeightHO[j][k] ;
    }
    Output[k] = 1.0/(1.0 + exp(-SumO[k])) ;
}

Generally we will have a whole set of NumPattern training patterns, i.e. pairs of input and target output
vectors,

Input[p][i] , Target[p][k]

labelled by the index p. The network learns by minimizing some measure of the error of the network's
actual outputs compared with the target outputs. For example, the sum squared error over all output units
k and all training patterns p will be given by

Error = 0.0 ;
for( p = 1 ; p <= NumPattern ; p++ ) {
    for( k = 1 ; k <= NumOutput ; k++ ) {
        Error += 0.5 * (Target[p][k] - Output[p][k]) * (Target[p][k] - Output[p][k]) ;
    }
}

(The factor of 0.5 is conventionally included to simplify the algebra in deriving the learning algorithm.) If
we insert the above code for computing the network outputs into the p loop of this, we end up with

Error = 0.0 ;
for( p = 1 ; p <= NumPattern ; p++ ) {     /* p loop over training patterns */
    for( j = 1 ; j <= NumHidden ; j++ ) {     /* j loop over hidden units */
        SumH[p][j] = WeightIH[0][j] ;
        for( i = 1 ; i <= NumInput ; i++ ) {
            SumH[p][j] += Input[p][i] * WeightIH[i][j] ;
        }
        Hidden[p][j] = 1.0/(1.0 + exp(-SumH[p][j])) ;
    }
    for( k = 1 ; k <= NumOutput ; k++ ) {     /* k loop over output units */
        SumO[p][k] = WeightHO[0][k] ;
        for( j = 1 ; j <= NumHidden ; j++ ) {
            SumO[p][k] += Hidden[p][j] * WeightHO[j][k] ;
        }
        Output[p][k] = 1.0/(1.0 + exp(-SumO[p][k])) ;
        Error += 0.5 * (Target[p][k] - Output[p][k]) * (Target[p][k] - Output[p][k]) ;   /* Sum Squared Error */
    }
}

I'll leave the reader to dispense with any indices that they don't need for the purposes of their own system
(e.g. the indices on SumH and SumO).

The next stage is to iteratively adjust the weights to minimize the network's error. A standard way to do
this is by 'gradient descent' on the error function. We can compute how much the error is changed by a
small change in each weight (i.e. compute the partial derivatives dError/dWeight) and shift the weights by
a small amount in the direction that reduces the error. The literature is full of variations on this general
approach - I shall begin with the 'standard on-line back-propagation with momentum' algorithm. This is not
the place to go through all the mathematics, but for the above sum squared error we can compute and
apply one iteration (or 'epoch') of the required weight changes DeltaWeightIH and DeltaWeightHO using

Error = 0.0 ;
for( p = 1 ; p <= NumPattern ; p++ ) {     /* repeat for all the training patterns */
    for( j = 1 ; j <= NumHidden ; j++ ) {     /* compute hidden unit activations */
        SumH[p][j] = WeightIH[0][j] ;
        for( i = 1 ; i <= NumInput ; i++ ) {
            SumH[p][j] += Input[p][i] * WeightIH[i][j] ;
        }
        Hidden[p][j] = 1.0/(1.0 + exp(-SumH[p][j])) ;
    }
    for( k = 1 ; k <= NumOutput ; k++ ) {     /* compute output unit activations and errors */
        SumO[p][k] = WeightHO[0][k] ;
        for( j = 1 ; j <= NumHidden ; j++ ) {
            SumO[p][k] += Hidden[p][j] * WeightHO[j][k] ;
        }
        Output[p][k] = 1.0/(1.0 + exp(-SumO[p][k])) ;
        Error += 0.5 * (Target[p][k] - Output[p][k]) * (Target[p][k] - Output[p][k]) ;
        DeltaO[k] = (Target[p][k] - Output[p][k]) * Output[p][k] * (1.0 - Output[p][k]) ;
    }
    for( j = 1 ; j <= NumHidden ; j++ ) {     /* 'back-propagate' errors to hidden layer */
        SumDOW[j] = 0.0 ;
        for( k = 1 ; k <= NumOutput ; k++ ) {
            SumDOW[j] += WeightHO[j][k] * DeltaO[k] ;
        }
        DeltaH[j] = SumDOW[j] * Hidden[p][j] * (1.0 - Hidden[p][j]) ;
    }
    for( j = 1 ; j <= NumHidden ; j++ ) {     /* update weights WeightIH */
        DeltaWeightIH[0][j] = eta * DeltaH[j] + alpha * DeltaWeightIH[0][j] ;
        WeightIH[0][j] += DeltaWeightIH[0][j] ;
        for( i = 1 ; i <= NumInput ; i++ ) {
            DeltaWeightIH[i][j] = eta * Input[p][i] * DeltaH[j] + alpha * DeltaWeightIH[i][j] ;
            WeightIH[i][j] += DeltaWeightIH[i][j] ;
        }
    }
    for( k = 1 ; k <= NumOutput ; k++ ) {     /* update weights WeightHO */
        DeltaWeightHO[0][k] = eta * DeltaO[k] + alpha * DeltaWeightHO[0][k] ;
        WeightHO[0][k] += DeltaWeightHO[0][k] ;
        for( j = 1 ; j <= NumHidden ; j++ ) {
            DeltaWeightHO[j][k] = eta * Hidden[p][j] * DeltaO[k] + alpha * DeltaWeightHO[j][k] ;
            WeightHO[j][k] += DeltaWeightHO[j][k] ;
        }
    }
}
(There is clearly plenty of scope for re-ordering, combining and simplifying the loops here - I will leave that
for the reader to do once they have understood what the separate code sections are doing.) The weight
changes DeltaWeightIH and DeltaWeightHO are each made up of two components. First, the eta
component that is the gradient descent contribution. Second, the alpha component that is a 'momentum'
term which effectively keeps a moving average of the gradient descent weight change contributions, and
thus smoothes out the overall weight changes. Fixing good values of the learning parameters eta and
alpha is usually a matter of trial and error. Certainly alpha must be in the range 0 to 1, and a non-zero
value does usually speed up learning. Finding a good value for eta will depend on the problem, and also
on the value chosen for alpha. If it is set too low, the training will be unnecessarily slow. Having it too
large will cause the weight changes to oscillate wildly, and can slow down or even prevent learning
altogether. (I generally start by trying eta = 0.1 and explore the effects of repeatedly doubling or halving
it.)

The complete training process will consist of repeating the above weight updates for a number of epochs (using another for loop) until some error criterion is met, for example the Error falls below some chosen small number. (Note that, with sigmoids on the outputs, the Error can only reach exactly zero if the weights reach infinity! Note also that sometimes the training can get stuck in a 'local minimum' of the error function and never get anywhere near the actual minimum.) So, we need to wrap the last block of code in something like

for( epoch = 1 ; epoch < LARGENUMBER ; epoch++ ) {

    /* ABOVE CODE FOR ONE ITERATION */

    if( Error < SMALLNUMBER ) break ;

}

If the training patterns are presented in the same systematic order during each epoch, it is possible for
weight oscillations to occur. It is therefore generally a good idea to use a new random order for the
training patterns for each epoch. If we put the NumPattern training pattern indices p in random order into
an array ranpat[], then it is simply a matter of replacing our training pattern loop

for( p = 1 ; p <= NumPattern ; p++ ) {

with

for( np = 1 ; np <= NumPattern ; np++ ) {

p = ranpat[np] ;

Generating the random array ranpat[] is not quite so simple, but the following code will do the job

for( p = 1 ; p <= NumPattern ; p++ ) {     /* set up ordered array */
    ranpat[p] = p ;
}
for( p = 1 ; p <= NumPattern ; p++) {     /* swap random elements into each position */
    np = p + rando() * ( NumPattern + 1 - p ) ;
    op = ranpat[p] ; ranpat[p] = ranpat[np] ; ranpat[np] = op ;
}

Naturally, one must set some initial network weights to start the learning process. Starting all the weights
at zero is generally not a good idea, as that is often a local minimum of the error function. It is normal to
initialize all the weights with small random values. If rando() is your favourite random number generator
function that returns a flat distribution of random numbers in the range 0 to 1, and smallwt is the maximum
absolute size of your initial weights, then an appropriate section of weight initialization code would be

for( j = 1 ; j <= NumHidden ; j++ ) {     /* initialize WeightIH and DeltaWeightIH */
    for( i = 0 ; i <= NumInput ; i++ ) {
        DeltaWeightIH[i][j] = 0.0 ;
        WeightIH[i][j] = 2.0 * ( rando() - 0.5 ) * smallwt ;
    }
}
for( k = 1 ; k <= NumOutput ; k ++ ) {     /* initialize WeightHO and DeltaWeightHO */
    for( j = 0 ; j <= NumHidden ; j++ ) {
        DeltaWeightHO[j][k] = 0.0 ;
        WeightHO[j][k] = 2.0 * ( rando() - 0.5 ) * smallwt ;
    }
}
Note that it is a good idea to set all the initial DeltaWeights to zero at the same time.

We now have enough code to put together a working neural network program. I have cut and pasted the
above code into the file nn.c (which your browser should allow you to save into your own file space). I
have added the standard #includes, declared all the variables, hard coded the standard XOR training data
and values for eta, alpha and smallwt, #defined an over simple rando(), added some print statements to
show what the network is doing, and wrapped the whole lot in a main(){ }. The file should compile and run
in the normal way (e.g. using the UNIX commands 'cc nn.c -O -lm -o nn' and 'nn').

I've left plenty for the reader to do to convert this into a useful program, for example:

• Read the training data from a file
• Allow the parameters (eta, alpha, smallwt, NumHidden, etc.) to be varied at runtime
• Have appropriate array sizes determined and their memory allocated at runtime
• Saving of weights to file, and reading them back in again
• Plotting of errors, output activations, etc. during training

There are also numerous network variations that could be implemented, for example:

• Batch learning, rather than on-line learning
• Alternative activation functions (linear, tanh, etc.)
• Real (rather than binary) valued outputs require linear output functions

Output[p][k] = SumO[p][k] ;

DeltaO[k] = Target[p][k] - Output[p][k] ;

• Cross-Entropy cost function rather than Sum Squared Error

Error -= ( Target[p][k] * log( Output[p][k] ) + ( 1.0 - Target[p][k] ) * log( 1.0 - Output[p][k] ) ) ;

DeltaO[k] = Target[p][k] - Output[p][k] ;

• Separate training, validation and testing sets
• Weight decay / Regularization
Image Recognition with Neural Networks
By Murat Firat | 30 Oct 2007
This article contains a brief description of BackPropagation Artificial Neural
Network and its implementation for Image Recognition

• Download source - 286.16 KB
• Download demo project - 257.52 KB

Introduction

Artificial Neural Networks are a recent development tool modeled on biological neural networks. The powerful side of this new tool is its ability to solve problems that are very hard to solve with traditional computing methods (e.g. with algorithms). This work briefly explains Artificial Neural Networks and their applications, describing how to implement a simple ANN for image recognition.

Background
I will try to make the idea clear to the reader who is just interested in the topic.

About Artificial Neural Networks (ANNs)

Artificial Neural Networks (ANNs) are an approach that follows a different path from traditional computing methods to solve problems. Since conventional computers use an algorithmic approach, if the specific steps that the computer needs to follow are not known, the computer cannot solve the problem. That means traditional computing methods can only solve problems that we already understand and know how to solve. ANNs, however, are in some ways much more powerful because they can solve problems that we do not know exactly how to solve. That's why, of late, their usage has been spreading over a wide range of areas including virus detection, robot control, intrusion detection systems, pattern (image, fingerprint, noise...) recognition and so on.

ANNs have the ability to adapt, learn, generalize, cluster and organize data. There are many structures of ANNs including Perceptron, Adaline, Madaline, Kohonen, BackPropagation and many others. The BackPropagation ANN is probably the most commonly used, as it is very simple to implement and effective. In this work, we will deal with BackPropagation ANNs.

A BackPropagation ANN contains one or more layers, each of which is linked to the next layer. The first layer is called the "input layer", which meets the initial input (e.g. pixels from a letter), and the last one is the "output layer", which usually holds the input's identifier (e.g. the name of the input letter). The layers between the input and output layers are called "hidden layer(s)"; they propagate the previous layer's outputs to the next layer and [back] propagate the following layer's error to the previous layer. These are, in fact, the main operations of training a BackPropagation ANN, which follows a few steps.

A typical BackPropagation ANN is as depicted below. The black nodes (on the extreme left) are the initial inputs. Training such a network involves two phases. In the first phase, the inputs are propagated forward to compute the output of each output node. Then each of these outputs is subtracted from its desired output, giving an error [one error for each output node]. In the second phase, each of these output errors is passed backward and the weights are adjusted. These two phases are repeated until the sum of [squared output errors] reaches an acceptable value.
Implementation

The network layers in the figure above are implemented as arrays of structs. The nodes of
the layers are implemented as follows:


[Serializable]
struct PreInput
{
public double Value;
public double[] Weights;
};

[Serializable]
struct Input
{
public double InputSum;
public double Output;
public double Error;
public double[] Weights;
};

[Serializable]
struct Hidden
{
public double InputSum;
public double Output;
public double Error;
public double[] Weights;
};

[Serializable]
struct Output<T> where T : IComparable<T>
{
public double InputSum;
public double output;
public double Error;
public double Target;
public T Value;
};

The layers in the figure are implemented as follows (for a three layer network):


private PreInput[] PreInputLayer;
private Input[] InputLayer;
private Hidden[] HiddenLayer;
private Output<string>[] OutputLayer;

Training the network can be summarized as follows:

• Apply input to the network.
• Calculate the output.
• Compare the resulting output with the desired output for the given input. This is
called the error.
• Modify the weights for all neurons using the error.
• Repeat the process until the error reaches an acceptable value (e.g. error < 1%),
which means that the NN was trained successfully, or if we reach a maximum count
of iterations, which means that the NN training was not successful.

It is represented as shown below:


void TrainNetwork(TrainingSet, MaxError)
{
    while(CurrentError > MaxError)
    {
        foreach(Pattern in TrainingSet)
        {
            ForwardPropagate(Pattern); //calculate output
            BackPropagate();           //fix errors, update weights
        }
    }
}

This is implemented as follows:


public bool Train()
{
double currentError = 0;
int currentIteration = 0;
NeuralEventArgs Args = new NeuralEventArgs() ;

do
{
currentError = 0;
foreach (KeyValuePair<T, double[]> p in TrainingSet)
{
NeuralNet.ForwardPropagate(p.Value, p.Key);
NeuralNet.BackPropagate();
currentError += NeuralNet.GetError();
}

currentIteration++;

if (IterationChanged != null && currentIteration % 5 == 0)
{
Args.CurrentError = currentError;
Args.CurrentIteration = currentIteration;
IterationChanged(this, Args);
}

} while (currentError > maximumError && currentIteration < maximumIteration && !Args.Stop);

if (IterationChanged != null)
{
Args.CurrentError = currentError;
Args.CurrentIteration = currentIteration;
IterationChanged(this, Args);
}
if (currentIteration >= maximumIteration || Args.Stop)
return false;//Training Not Successful

return true;
}

Where the ForwardPropagate(..) and BackPropagate() methods are as shown below for a three-layer network:


private void ForwardPropagate(double[] pattern, T output)
{
int i, j;
double total;
//Apply input to the network
for (i = 0; i < PreInputNum; i++)
{
PreInputLayer[i].Value = pattern[i];
}
//Calculate The First(Input) Layer's Inputs and Outputs
for (i = 0; i < InputNum; i++)
{
total = 0.0;
for (j = 0; j < PreInputNum; j++)
{
total += PreInputLayer[j].Value * PreInputLayer[j].Weights[i];
}
InputLayer[i].InputSum = total;
InputLayer[i].Output = F(total);
}
//Calculate The Second(Hidden) Layer's Inputs and Outputs
for (i = 0; i < HiddenNum; i++)
{
total = 0.0;
for (j = 0; j < InputNum; j++)
{
total += InputLayer[j].Output * InputLayer[j].Weights[i];
}

HiddenLayer[i].InputSum = total;
HiddenLayer[i].Output = F(total);
}
//Calculate The Third(Output) Layer's Inputs, Outputs, Targets and Errors
for (i = 0; i < OutputNum; i++)
{
total = 0.0;
for (j = 0; j < HiddenNum; j++)
{
total += HiddenLayer[j].Output * HiddenLayer[j].Weights[i];
}

OutputLayer[i].InputSum = total;
OutputLayer[i].output = F(total);
OutputLayer[i].Target = OutputLayer[i].Value.CompareTo(output) == 0 ? 1.0 : 0.0;
OutputLayer[i].Error = (OutputLayer[i].Target - OutputLayer[i].output) *
                       (OutputLayer[i].output) * (1 - OutputLayer[i].output);
}
}

private void BackPropagate()
{
int i, j;
double total;
//Fix Hidden Layer's Error
for (i = 0; i < HiddenNum; i++)
{
total = 0.0;
for (j = 0; j < OutputNum; j++)
{
total += HiddenLayer[i].Weights[j] * OutputLayer[j].Error;
}
HiddenLayer[i].Error = total;
}
//Fix Input Layer's Error
for (i = 0; i < InputNum; i++)
{
total = 0.0;
for (j = 0; j < HiddenNum; j++)
{
total += InputLayer[i].Weights[j] * HiddenLayer[j].Error;
}
InputLayer[i].Error = total;
}
//Update The First Layer's Weights
for (i = 0; i < InputNum; i++)
{
for(j = 0; j < PreInputNum; j++)
{
PreInputLayer[j].Weights[i] +=
LearningRate * InputLayer[i].Error * PreInputLayer[j].Value;
}
}
//Update The Second Layer's Weights
for (i = 0; i < HiddenNum; i++)
{
for (j = 0; j < InputNum; j++)
{
InputLayer[j].Weights[i] +=
LearningRate * HiddenLayer[i].Error * InputLayer[j].Output;
}
}
//Update The Third Layer's Weights
for (i = 0; i < OutputNum; i++)
{
for (j = 0; j < HiddenNum; j++)
{
HiddenLayer[j].Weights[i] +=
LearningRate * OutputLayer[i].Error * HiddenLayer[j].Output;
}
}
}

Testing the App

The program trains the network using bitmap images that are located in a folder. This folder
must be in the following format:

• There must be one (input) folder that contains the input images [*.bmp].
• Each image's name is the target (or output) value for the network (the pixel values of the image are the inputs, of course).

Since testing the classes requires training the network first, there must be a folder in this format. The "PATTERNS" and "ICONS" folders [depicted below] in the Debug folder fit this format.
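
A minimal sketch of reading such a folder (a hypothetical helper, not part of the article's classes, assuming the file name without its extension is used as the target value and the pixel brightness values as the inputs) could look like this:

// Sketch: build a training set from a folder of bitmaps. The file name (without
// extension) becomes the target value; the pixel brightness values become the inputs.
static Dictionary<string, double[]> LoadTrainingSet(string folder)
{
    var trainingSet = new Dictionary<string, double[]>();
    foreach (string file in System.IO.Directory.GetFiles(folder, "*.bmp"))
    {
        using (var bmp = new System.Drawing.Bitmap(file))
        {
            double[] inputs = new double[bmp.Width * bmp.Height];
            for (int y = 0; y < bmp.Height; y++)
                for (int x = 0; x < bmp.Width; x++)
                    inputs[y * bmp.Width + x] = bmp.GetPixel(x, y).GetBrightness();
            trainingSet[System.IO.Path.GetFileNameWithoutExtension(file)] = inputs;
        }
    }
    return trainingSet;
}
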
History

• 30th September, 2007: Simplified the app
• 24th June, 2007: Initial Release

References & External Links

• Principles of training multi-layer neural network using backpropagation algorithm
• Neural Networks by Christos Stergiou and Dimitrios Siganos
• An Introduction to Neural Networks, Ben Krose & Patrick van der Smagt

License

This article, along with any associated source code and files, is licensed under The Code
Project Open License (CPOL)
