
High Speed Image Segmentation using a Binary Neural Network
Jim Austin
Advanced Computer Architecture Group
Department of Computer Science
University of York, York
YO1 5DD, UK
August 1, 1996

ABSTRACT
In the very near future large amounts of Remotely Sensed data will become
available on a daily basis. Unfortunately, it is not clear if the processing methods
are available to deal with this data in a timely fashion. This paper describes research
towards an approach which will allow a user to perform a rapid pre-search of large
amounts of image data for regions of interest based on texture. The method is
based on a novel neural network architecture (ADAM) that is designed primarily
for speed of operation by making use of computationally simple pre-processing and
only uses Boolean operations in the weights of the network. To facilitate interactive
use of the network, it is capable of rapid training. The paper outlines the neural
network, its application to RS data in comparison with other methods, and briefly
describes a fast hardware implementation of the network.

1 Introduction
The advent of new satellites producing tens of megabytes of image-data per day has
placed a challenge on the image processing community to derive methods that are
capable of delivering timely results, and ways to reduce data to a manageable size.
Although there are many techniques for deriving useful information from remotely
sensed (RS) data, it is not clear whether this is possible at reasonable cost, and in
a way that the naive user is capable of exploiting.
This paper focuses on the every day use of RS data by people such as farmers,
Government departments, environmental experts etc. who require results from RS
data quickly, easily and at low cost. Typical queries would aim to identify and
quantify regions of interest (ROI) within images and to calculate the area of these
regions. For example, farmers may require indications of crop yields, planning
officials may want to know the amount of urban area usage. This places some very
great demands on the designers of such systems, which are far from being met.
It is essential that, given a large resource of image data, a user is able to quickly
specify what they require from the data. Because of the nature of image data,
specifying the required information by a naive user is most simply done by example,
i.e. by the user selecting samples of interest in the image areas using a 'point and
click' method. We can imagine that a user might use one or two example images to
 austin@minster.york.ac.uk

select what they see as important areas of interest, then submit these to a central
server where the images are held. These systems would then, using guidance from
the user, search large volumes of RS data, and return the results to the user.
There are a number of methods that could be used within this framework to
deliver the results to the user. Within these, neural networks are a strong candidate.
They provide a single robust method that can be used for many tasks. They are
inherently parallel, permitting the possibility of fast operation. They can be trained
using a set of examples provided by the user. They offer the potential to be designed
for optimal use automatically. Despite these strengths, their full application is
limited by their training speed, which becomes very restrictive with large image
sizes, typical of ROI problems. This raises the need for highly specialised and
expensive neural hardware to deal with the training problem (see Day in this book).
The main limitation of neural networks stems from the methods used to train
the networks. These rely on techniques that search for an optimal solution to the
problem. Because the space of possible solutions increases, at worst, exponentially
with the dimensionality of the problem, the training times can become quickly
impractical. A number of researchers have realized this and are beginning to propose
faster training methods (See Cruse, Leppelmann and Bode in this book). The work
described here has addressed this problem, firstly by the use of a simple training
method that scales well with the dimensionality of the problems, and secondly by
ensuring that the method can be run very quickly on current digital computers
through its use of binary logic operations, or faster on relatively low cost add-on
hardware.
The paper first describes the neural network used in the work and then goes on
to summarise comparative work that has assessed the method. The final section
briefly describes the implementation of the network in high speed hardware and
presents some indications of performance.

2 The binary neural network


Reducing the training time of a neural network is essential for their practical use
in RS. As outlined in the introduction, conventional network learning is based on
an optimisation method that searches the weight space of a neural network to find
the set of weights that will minimize some error criterion. Although this can result
in a robust classifier, the search time can be excessive. It is well known that other
methods of image recognition can get quite good classification results at some cost
to recognition time. For example, the K nearest neighbor (k-nn [1]) method is
a particularly simple, but relatively successful method of image classification. In
simple terms, the k-nn method uses a set, S, of example images for each
classification. These images are specified by the user. The method classifies an unknown
image as belonging to class S if k of the examples from S are closer to the example
than for any other class. Unfortunately, it suffers from particularly slow recognition
times, due to the need to perform (1) a distance measure to all the stored examples,
and (2) a sort of the resulting matches to obtain the k closest matching examples.
This problem is made worse due to the large numbers of examples needed to get
reasonable recognition success. However, setting the method up for recognition is
particularly quick.
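The two costly recall steps above can be sketched as follows. This is an illustrative sketch, not code from the work described here; the toy data, class names and function name are invented for the example.

```python
import numpy as np

def knn_classify(x, examples, labels, k=3):
    """Classify x by majority vote among its k nearest stored examples."""
    # (1) distance measure to all the stored examples -- the costly step
    dists = np.linalg.norm(examples - x, axis=1)
    # (2) sort the matches to obtain the k closest examples
    nearest = np.argsort(dists)[:k]
    votes = [labels[i] for i in nearest]
    return max(set(votes), key=votes.count)

# Toy data: two classes of 4-pixel "images" (invented for illustration)
train = np.array([[0, 0, 0, 1], [0, 0, 1, 1],
                  [1, 1, 1, 0], [1, 1, 0, 0]], dtype=float)
y = ["rural", "rural", "urban", "urban"]
result = knn_classify(np.array([0, 0, 0, 0], dtype=float), train, y, k=3)
```

Both steps grow with the number of stored examples, which is why recognition slows down when many examples are needed, while "training" (simply storing the examples) is immediate.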
An alternative method to k-nn, the N tuple method, was developed by Bledsoe
and Browning [2]. The N tuple method operates in a similar way to the k-nn method,
but does not suffer from long recognition times. It achieves this by combining all the
training examples from a class into one template, achieved by the use of a feature
pre-processing method called N tuple sampling.
Figure 1 shows an outline of the N tuple method. Both training and testing

consist of an N tuple pre-process stage. The result of N tuple pre-processing is fed
to a storage system based on a neural network.

Figure 1: The basic N tuple method.

In the pre-processing stage, the image is broken into a number of samples, called
N tuples, each of which is made up of N pixels from the image. Each pixel is taken
at random from the image. Each tuple is fed through a function, typically called
a decoder, which assigns one of P states to the tuple, i.e. F(N tuple) → state.
Each state produced by the decoder function uniquely identifies a combination of
pixel values in a given tuple. In essence the N tuple decoder function is a feature
recognition system. But, whereas feature recognisers usually identify edges, lines
etc. in an image, the decoder function recognises arbitrary pixel value combinations.
There are two reasons for this: the first is that it makes no assumptions about the
contents of the image, i.e. that edges are the important features; the second is that it
allows simple and fast methods to be used to compute the tuple function. Each
state feeds directly to an input of a neuron. Note that the state value is typically
binary, thus the input to the network is binary. The decoder function used in the
current work is described later.
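As a concrete illustration of the pre-processing stage, the following sketch samples a binary image into N tuples at fixed random pixel positions and uses the simple 1-in-P binary decoder, reading each tuple as an N-bit number. The sizes and the random mapping are invented for the example; the grey scale decoder actually used in this work is different and is described later.

```python
import numpy as np

rng = np.random.default_rng(0)

N = 4                        # pixels per tuple
P = 2 ** N                   # states a binary 1-in-P decoder can produce
IMAGE_SIZE = 64              # an 8x8 binary image, flattened
NUM_TUPLES = IMAGE_SIZE // N

# Fixed random mapping: each pixel is sampled by exactly one tuple
mapping = rng.permutation(IMAGE_SIZE).reshape(NUM_TUPLES, N)

def tuple_states(image):
    """F(N tuple) -> state: read N pixels per tuple, decode to an integer.

    Each state uniquely identifies one combination of binary pixel values
    in its tuple, so the decoder acts as an arbitrary-feature recogniser.
    """
    pixels = image[mapping]           # shape (NUM_TUPLES, N)
    weights = 1 << np.arange(N)       # treat each tuple as a binary number
    return pixels @ weights           # one state in [0, P) per tuple

img = rng.integers(0, 2, IMAGE_SIZE)
states = tuple_states(img)
```

The decoding step is a handful of index and shift operations per tuple, which is what makes the pre-processing computationally simple.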
During training, the states produced by the decoder functions are recorded in
the neural networks. Each class is assigned an individual single neuron. To train
the network a simple hebbian learning method is used. First, the input image is
presented and the outputs of the decoder functions computed. These are then fed
to the neuron selected for that class of image. Where the input to the neuron is set
to one, the weights are set to one (all weights are set to zero initially). Where an
input is set to zero, the weight remains unchanged.
The neurons only need record if the tuple state occurred or not. In some
applications the frequency of occurrence of the N tuple decoder state over all the training
images is recorded. In the simplest case (most often used) no frequency information
is stored, only binary `state has occurred' is recorded. This is to allow very simple
and fast implementation in hardware. In many cases the performance loss of this is
minimal [3].
Recognition is a simple process of feeding the image sample to the network,
passing it through the decoder functions and then into the network. The usual sum
of weights times inputs is used. The result of this passes directly to the output of
the network, no activation function is used. The neuron with the highest response
indicates the class of the data.
This simple method is fast in training, because all that is recorded is the tuple
states, and in recognition only one neuron per class is used, making the method
particularly fast.
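The training and recognition steps described above can be sketched as a minimal classifier with one neuron per class and Boolean weights, storing only `state has occurred' information. The sizes, class names and tuple states below are invented for the example.

```python
import numpy as np

class NTupleClassifier:
    """Minimal binary N tuple network: one neuron per class, Boolean weights."""

    def __init__(self, num_tuples, states_per_tuple):
        self.num_tuples = num_tuples
        self.P = states_per_tuple
        self.weights = {}   # class label -> binary weight matrix

    def train(self, states, label):
        # One-shot Hebbian step: where the input is one, set the weight to one;
        # where the input is zero, the weight is unchanged (all start at zero).
        w = self.weights.setdefault(
            label, np.zeros((self.num_tuples, self.P), dtype=bool))
        w[np.arange(self.num_tuples), states] = True

    def classify(self, states):
        # Sum of weights times inputs, no activation function;
        # the neuron with the highest response indicates the class.
        scores = {c: int(w[np.arange(self.num_tuples), states].sum())
                  for c, w in self.weights.items()}
        return max(scores, key=scores.get)

net = NTupleClassifier(num_tuples=4, states_per_tuple=16)
net.train(np.array([3, 7, 1, 0]), "urban")
net.train(np.array([12, 2, 9, 5]), "rural")
result = net.classify(np.array([3, 7, 9, 0]))   # 3 of 4 tuple states match "urban"
```

Training is a single pass of Boolean writes per example, which is why it scales well with the dimensionality of the problem.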

2.1 Extension to the ADAM memory


The N tuple method was extended to form an associative memory [4], used for
image recognition and scene analysis. The aim was to develop a system that could
associate an input pattern to another pattern for complex image analysis tasks. For
example, the memory could be used to associate image regions with their carto-
graphic equivalents (see section 3.2 for an example of road detection). To allow
this, a second layer of neurons was added which allowed the recall of the associ-
ated image, and a method of selecting the hidden layer was specified to allow fast
training of large numbers of associations.

Figure 2: The Advanced Distributed Associative Memory, ADAM.

The architecture of the Advanced Distributed Associative Memory (ADAM) is


shown in Fig 2. The network is trained in `one shot' as described in Bolt, Austin
and Morgan [5]. The approach is simple and fast, and allows some degree of analysis
to define the number of associations that can be stored for a given size of network.
All the results given in section 3 use the first stage of the ADAM memory, and
as such use it as a simple N tuple network. However, the hardware implementation
described in section 4 supports the full ADAM implementation.
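A minimal sketch of the two-stage association is given below, under the assumption that each stage is a binary correlation matrix memory trained by a Boolean outer product and recalled with a k-point threshold (keeping the k highest sums, matching the k bits set in the class layer). All the patterns and sizes are invented for illustration.

```python
import numpy as np

def train_cmm(M, x, y):
    """One-shot training of a binary correlation matrix memory: M |= y x^T."""
    M |= np.outer(y, x).astype(bool)

def recall_cmm(M, x, k):
    """Recall: sum weights times inputs, keep the k highest (k-point threshold)."""
    sums = M.astype(int) @ x.astype(int)
    out = np.zeros(len(sums), dtype=bool)
    out[np.argsort(sums)[-k:]] = True
    return out

# First stage: tuple-state vector -> sparse class pattern with k=2 bits set
x = np.array([1, 0, 0, 1, 0, 1], dtype=bool)    # hypothetical tuple-state vector
cls = np.array([0, 1, 0, 1, 0], dtype=bool)     # class (separator) pattern, k=2
M1 = np.zeros((5, 6), dtype=bool)
train_cmm(M1, x, cls)

# Second stage: class pattern -> associated output (e.g. a cartographic icon)
out = np.array([1, 1, 0, 0], dtype=bool)
M2 = np.zeros((4, 5), dtype=bool)
train_cmm(M2, cls, out)

recalled_cls = recall_cmm(M1, x, k=2)
recalled_out = recall_cmm(M2, recalled_cls, k=2)
```

The sparse class layer is what keeps training one-shot: each association only sets a small number of Boolean weights in each matrix.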

2.2 Parameter selection


The introduction indicated that the use of the system should be as simple as pos-
sible. In neural network methods that use optimised learning, it is particularly
difficult and time consuming to select the parameters that define such things as the
number of units, the number of layers etc. This is because no formal model exists
for their definition.
The same problem exists with N tuple based systems, in that the user must
specify the size of the tuples, the number of tuples to be used and the number of
training examples. However, because training is so fast it is possible to define a
learning method that cycles through all the possible combinations of parameters
to find the optimum. In addition, quite a lot is known about the effect of the
parameters [6], which can be used to speed up the parameter search.
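The exhaustive parameter cycle can be sketched as follows; `train_fn` and `score_fn` are hypothetical caller-supplied hooks, and the stand-in score function is invented purely to make the example runnable.

```python
import itertools

def best_parameters(train_fn, score_fn, tuple_sizes, tuple_counts):
    """Cycle through all parameter combinations, keeping the best scorer.

    train_fn(n, t) builds and trains a network with tuple size n and t tuples;
    score_fn(net) returns its accuracy on a validation set.
    """
    best_params, best_acc = None, -1.0
    for n, t in itertools.product(tuple_sizes, tuple_counts):
        net = train_fn(n, t)
        acc = score_fn(net)
        if acc > best_acc:
            best_params, best_acc = (n, t), acc
    return best_params, best_acc

# Toy usage with a stand-in score that happens to prefer (4, 64)
params, acc = best_parameters(
    train_fn=lambda n, t: (n, t),
    score_fn=lambda net: 1.0 - abs(net[0] - 4) * 0.1 - abs(net[1] - 64) / 1000,
    tuple_sizes=[2, 4, 8],
    tuple_counts=[16, 64, 256],
)
```

Such a search is only practical because each inner training run is near-instantaneous; with an optimisation-trained network the same loop would multiply an already long training time.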

For the ADAM network the user needs to supply the size of the hidden layer
(class or separator layer), as well as the bits set to one in that layer. A great deal is
known about the effect of these parameters on the performance of the network [7].

2.3 Strengths and weaknesses


The networks based on the N tuple method have two great strengths: they can be
trained quickly, and they can be implemented simply in conventional computers to
operate at speed. These advantages come at the cost of recognition robustness. In
an MLP or other types of network, the user accepts long training times for possibly
high accuracy. However, in many applications, such as the one given here, the long
training times can result in a system that is inapplicable. It has been recently shown
that the N tuple method can result in quite reasonable recognition performance if
used with care [8].
It may be noted that there are many extensions to the basic N tuple method,
into all types of what is known as Binary, Weightless or RAM based networks [9].
Many of these improve the recognition success at some small cost to training speed
and implementation efficiency.

3 Examples of image analysis


The N tuple network and the ADAM memory have been used extensively for image
analysis tasks [10], [11]. Their application to the analysis of infra-red line scan (IRLS)
images is of particular interest here [12]. To illustrate the trade-off in accuracy, the
image in Fig. 3 was used in comparative studies by Ducksbury [13] against a Pearl
Bayes Network (PBN) method using conventional image operators. The results of
these studies are compared and summarised here.

Figure 3: The image used in all examples in this paper. It is an infra-red line scan
image taken from 3000ft by an aircraft, 512 x 512 pixels, 8 bits per pixel.

The images for the comparison are taken from an infra-red scanning sensor carried
by an aircraft flying at 3000ft. The area around Luton, UK was used. The data
was collected line by line as the aircraft flew along. Each line in the image is one
sweep of the IR sensor from horizon to horizon. This results in barrel distortion of
the image, and the motion of the aircraft results in line distortions. An example of
one of the images is shown in Fig. 3, which is used for the comparisons given in
this paper.
3.1 Image segmentation
Image segmentation is a central task in RS data analysis. The region of interest
must be identified by segmenting areas which fall into the same class. This would
normally be followed by an area count or some other statistics required by the user.
The problem is often dealt with on a pixel basis, using the multi-band data
to classify the pixels into image classes. However, these methods cannot segment
regions such as urban areas from rural ones, due to the contextual information required.
The difficulty in this task is centred on the amount of contextual data needed to
classify each region. The most standard approach is to use a
small window of pixels to classify a small region of the image. Most approaches use a
selection of typical regions with known classifications (ground truth). A convolution
(scan) of the whole image, selecting regions of the same size for classification, is then
performed using the system. This is a relatively fast approach compared to other
methods which involve diffusion of information from contextual regions to a central
pixel.
The simple convolution approach can require an input of up to 256 pixels, or
more, to achieve a successful segmentation. For networks such as the MLP the
relatively large input window can result in long training times (many hours) on
conventional workstations.
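The convolution (scan) approach described above can be sketched as a window-by-window classification; the window size, step and stand-in classifier below are invented for the example.

```python
import numpy as np

def segment(image, classify, window=16, step=16):
    """Scan a trained classifier over the image in window-sized steps.

    classify(patch) -> class label. Returns a coarse label map with one
    label per window position.
    """
    h, w = image.shape
    rows = (h - window) // step + 1
    cols = (w - window) // step + 1
    labels = np.empty((rows, cols), dtype=object)
    for r in range(rows):
        for c in range(cols):
            patch = image[r * step:r * step + window,
                          c * step:c * step + window]
            labels[r, c] = classify(patch)
    return labels

# Toy usage: call bright windows "urban" and dark windows "rural"
img = np.zeros((64, 64))
img[:, 32:] = 255
lab = segment(img, lambda p: "urban" if p.mean() > 128 else "rural")
```

In the real system the stand-in lambda would be replaced by the trained N tuple network, so the cost per window is one pre-processing pass plus one Boolean sum per class.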

Figure 4: The results of segmenting the image using the ADAM neural network.

The use of ADAM and the N tuple method for segmentation of urban and rural

areas in infra-red line scan (IRLS) data is reported in Austin [14]. The method
used a gray scale version of the N tuple method, first described in Austin (1988),
which uses a ranking method in the tuple function. The network was trained on 24
examples of rural and 24 of urban areas, based on 16x16 pixel segments cut from a
training image. The training was almost instantaneous on a 10MIPs workstation.
The network was then used to classify the rest of the same image, by convolving
the network at 16 x 16 pixel increments. The results are shown in Fig. 4.

Figure 5: The results of segmenting the image using the PBN.

As the network was trained on central regions of the rural area the recognition
performance was good in that region, but limited in the right hand region due to
distortion of the image.
The same problem was investigated by Ducksbury [13] using a Pearl Bayes network
(PBN). The approach first analysed the image using three image filters, creating
feature maps for edges, extrema and distribution types. This was then fed to a
PBN which was set up to apply a number of relations between the image features.
The results of this process are shown in Fig 5.
The results show that the PBN has developed a clear separation between the
urban and field areas which is comparable with the neural network based method.
Both methods are implementable in parallel hardware for high speed. However,
the PBN needed specific features to be selected to allow the correct classification
of the regions, whereas the neural network required no pre-selection of the relevant
features. The advantage of the neural network is clearly in its ability to use raw
unprocessed image data. This is highlighted in the next section which shows that
the same network can be used to detect road segments with some success.
3.2 Road detection
Many problems require the annotation of the image to allow identification of the
image contents. This is the main task of cartographers. It is a time consuming and

expensive process in which all image features must be identified and noted. One
such task is road detection, where road-like features must be identified and labeled.
This is typically a complex task, which first requires the identification of line-like
fragments and then the joining together of these into connected line segments. The
following summarises the application of ADAM [12] to the initial line segment
identification.
The neural network based approach is identical to the segmentation method. A
set of example roads were selected by hand and trained into the ADAM network.
In this case the ADAM network was set up to recall an icon of the road at the angle
given. The data used for training is shown in Fig. 6.

Figure 6: The data used to train the neural network. The data consists of four sets
of image patches, one for each angle of road (bottom four), and one non-class
data set (top).

The image regions were 8 x 8 in size, and the gray scale method described in the
previous section was used. The result of convolving the image with this network is
given in Fig 7. The method was very good at finding a road if there was one. However,
there were a large number of false positives. The method clearly provides a fast
pre-search of an image with few false negatives. Subsequent processing on the regions
indicated would then provide a robust result.

4 Implementation of the ADAM network


The N tuple method has been shown to be implementable in dedicated hardware for
high speed operation for some time [15]. The essential feature of the method is its
similarity to the random access memories (RAM) used in large quantities in computers.
The basic binary version of the N tuple method (using a simple 1 in N binary decoder
as the tuple function) is basically a RAM. Although it is not practical to use RAMs
directly as the neurons and tuple functions, their applicability has given the name
to the method (RAM based neural networks).
Recent work has shown how the weight matrix and recall process can be im-
plemented in field programmable gate arrays [16]. The N tuple pre-processing is
undertaken by high performance digital signal processors. The latest implement-
ation of ADAM, using the Sum and Threshold implementation (SAT 2), achieves
good performance (using prepared tuple state data). A 512 x 512 image with an
N tuple size 4 can be processed in 32 micro-seconds. The convolution used in the

Figure 7: The result of using the ADAM network to find road segments.

image segmentation task takes in the order of 13 milliseconds. This is equal to the
speed of an SGI workstation, using a single MIPS R4600SC, 150MHz processor.
The card used to implement the memories is about half the cost of the workstation.
The card used to embed the SAT processor is based on a VME implementation,
which allows its use in parallel with a number of other cards, or by itself in a
workstation. Our current system is designed to incorporate 3 SAT based cards in
a VME based SGI Challenge machine, which contains 4 MIPS R4600SC processors
for image handling and pre-processing. The weights are paged from the Challenge
machine to the cards for processing tasks. In this configuration the machine acts
as an image pre-processing engine.
The software to support the ADAM memory is written in C. It allows the user to
create, delete, train and test the memories, as well as store and retrieve trained
memories, on a UNIX based workstation. A copy of the software is available from
our web site 1 . It was used to obtain the results given earlier in this paper. The
SAT based hardware uses a C++ based library which is currently under test.
Current standard computer technology is ideally suited to the implementation of
RAM based networks such as ADAM. The use of a binary weight matrix and simple
sum operations means that good performance can be achieved. This opens the
possibility for a user to locally train the neural network on their own workstation,
and evaluate its ability on a small test set. Subsequent large scale analysis on
many images can then be left for dedicated high performance systems (as a shared
resource), such as the one described above.
The advent of such technology at a reasonable price allows the processing of
large amounts of image data in near real-time.
1 http://Dcpu1.cs.york.ac.uk:6666/~aaron/adam/adam.html

5 Conclusion
This paper has shown how binary neural networks can be used to process large
images in a reasonable amount of time. It has illustrated that small training sets
may be trained very quickly into the network and achieve very good results. The
methods show good performance in comparison with other techniques. The advant-
age of very high speed implementations of the method provides a direct route to
the analysis of large data sets in an interactive environment.

6 Acknowledgements
This work has been supported in part by the University of York Department of
Computer Science; The DTI, EPSRC and British Aerospace under an IED grant;
the EPSRC, and British Aerospace within the EPSRC AIKMS initiative. The work
has been undertaken in the Advanced Computer Architecture Group by a large
number of people, including Guy Smith, Martin Brown, Aaron Turner, Ian Kelly,
John Kennedy, Rick Pack and Steven Buckle.

7 References
[1] R O Duda and P E Hart. Pattern Classification and Scene Analysis. John
Wiley, 1973.
[2] W. W. Bledsoe and I. Browning. Pattern recognition and reading by machine.
Proc. Joint Comp. Conference, pages 255–232, 1959.
[3] G Smith and J Austin. Analysing aerial photographs with ADAM. International
Joint Conference on Neural Networks, June 7-11 1992.
[4] J Austin and T J Stonham. An associative memory for use in image recognition
and occlusion analysis. Image and Vision Computing, 5(4):251–261, Nov. 1987.
[5] G Bolt, J Austin, and G Morgan. Uniform tuple storage in ADAM. Pattern
Recognition Letters, 13:339–344, 1992.
[6] J Stonham. Practical pattern recognition. In I. Aleksander, editor, Advanced
Digital Information Systems, pages 231–272. Prentice Hall International, 1985.
[7] M Turner and J Austin. Storage analysis of correlation matrix memories, in
preparation, 1996.
[8] R Rohwer and M Morciniec. The theoretical and experimental status of the
n-tuple classifier. Technical report, Neural Computing Research Group, De-
partment of Computer Science and Applied Mathematics, Aston University,
1995.
[9] J Austin. A review of RAM based neural networks. In Proc. of the Fourth
International Conference on Microelectronics for Neural Networks and Fuzzy
Systems, pages 58–66, Turin, 1994. IEEE Computer Society Press.
[10] A.W. Anderson, S.S. Christensen, and T.M. Jorgensen. An active vision system
for robot guidance using a low cost neural network board. In Proceedings of
the European Robotics and Intelligent Systems Conference, (EURISCON'94)
Malaga, Spain, August 22-25, 1994.

[11] D Bissett. Weightless Neural Network Workshop '95. University of Kent,
Canterbury, UK., June 1995.
[12] J Austin and S Buckle. Segmentation and matching in infra-red airborne
images using a binary neural network. In J Taylor, editor, Neural Networks,
pages 95–118. Alfred Waller, 1995.
[13] P G Ducksbury. Parallel texture region segmentation using a Pearl Bayes net-
work. In John Illingworth, editor, British Machine Vision Conference, pages
187–196. BMVC Press, 1993.
[14] J Austin. Grey scale N tuple processing. In Josef Kittler, editor, Lecture
Notes in Computer Science, Pattern Recognition: 4th International Conference,
Cambridge, UK, volume 301, pages 110–120, Berlin, 1988. Springer-Verlag.
[15] I. Aleksander, W. V. Thomas, and P. A. Bowden. WISARD: A radical step
forward in image recognition. Sensor Review, pages 120–124, 1984.
[16] J Kennedy, J Austin, R Pack, and B Cass. C-NNAP - a parallel processing
architecture for binary neural networks. ICNN 95, June 1995.

