
Neural Chip SAND/1 for Real Time Pattern Recognition

W. Eppler, T. Fischer, H. Gemmeke, T. Köder, R. Stotzka


Forschungszentrum Karlsruhe E-mail: eppler@hpe.fzk.de

Abstract
The neural chip SAND/1 (Simple Applicable Neural Device) was designed to accelerate neural network computations at very low cost. Low cost means that only a few peripheral chips are needed to use the neural network chip in applications. The design criteria of the chip are described. A PCI board was developed and a VME board is under construction to facilitate applications. At present, time series prediction, multidimensional curve fitting, and pattern recognition in gamma-shower detection and mammography are under evaluation. The last application is described in more detail.

I. INTRODUCTION
The neural chip SAND/1 (Simple Applicable Neural Device) was designed to accelerate neural network computations at very low cost [7]. Low cost means that only a few peripheral chips are needed to use the neural network chip in applications. We had three application domains in mind: acceleration of neural computations on a standard personal computer with a PCI bus, industrial and physics applications on VME platforms, and stand-alone versions in technical environments. The last domain in particular is the reason why the neural transfer function was implemented not with an additional microprocessor but with a freely programmable lookup table. Another feature of the chip is the processing of several input patterns in one block. This brings the processing performance of the chip into line with the transfer rate of weights and activities. Four parallel data processing units are implemented on a single neuro-chip. Each of them has a 16-bit adder, a 16-bit multiplier and a 40-bit accumulator. A special cut mechanism provides a nearly optimal reduction of the internal data size to the external 16-bit words. The performance is 200 MCPS with a 50 MHz clock. This leads to a maximum of 600 MOPS when using radial basis function networks. The chip has been available since March 1997, a PCI board has been under test since April 1997, and a VME board is expected in spring 1998.

First applications are pattern recognition tasks in mammography and γ-shower experiments. The exploitation of γ-energies between 10 and 300 GeV is of great importance and requires new experimental techniques and intelligent methods of data analysis. The expected trigger rate of the Atmospheric Cherenkov Telescope (ACT) in the MAGIC project can be decreased considerably by using neural network hardware triggers. The trigger will provide reliable muon and hadron background rejection, which is now available only off-line. The computing power of SAND/1 is sufficient for the pattern recognition of Cherenkov images, as the expected decision time does not exceed a few tenths of microseconds. The neural network is trained with detailed simulations of Extensive Air Showers traversing the atmosphere and of the response of the apparatus (see [9], but also note [10]).

X-ray mammography is one of the most effective methods in the diagnosis of breast cancer. The detection of grouped micro-calcifications is often the only early indicator of malignant structures within the tissue. Micro-calcifications are often very small (50 to 100 µm) and overlapped by other structures. Automatic computer-assisted detection may thus reduce the risk of missing some micro-calcifications and lead to a more reliable diagnosis. The image convolution that reveals calcifications is a very time-consuming task and thus not feasible in real time with today's microcomputers. Several SAND/1 chips speed up these convolutions. The nonlinear transfer function is used to implement a decision threshold that marks regions containing micro-calcifications. Further details are given in Section V.

First, some neural network basics are introduced to explain the necessary computations. The following sections present an overview of the SAND neuro-chip and neuro-board. At the end of the paper, comparisons with other chips are made.

II. NEURAL NETWORK BASICS


The most commonly used neural networks are feedforward networks. In feedforward networks neurons are arranged in layers without back-loops. There are no connections between neurons of the same layer. The basic element of a neural network is an artificial neuron described by
$$x_i = f\left(\sum_{j=1}^{n} w_{ij}\, o_j + \theta_i\right) \qquad (1)$$

where $w_{ij}$ are the connection weights from neurons j to neuron i and $\theta_i$ is a bias. The input activities $o_j$ are multiplied with the connection weights, accumulated, and transferred by a nonlinear activation function f to the output activity $x_i$ (Fig. 1).

Fig. 1 : Model of an artificial neuron (inputs o_1 ... o_n, weights w_i1 ... w_in, output x_i)
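The neuron model of equation (1) can be illustrated with a few lines of Python. This is only a minimal sketch: the function names `sigmoid` and `neuron_output` are hypothetical and chosen for illustration, floating-point arithmetic is used instead of the 16-bit fixed-point words of the chip, and the logistic function is assumed as the activation f.

```python
import math

def sigmoid(x):
    """Logistic activation, f(x) = 1 / (1 + exp(-x)); assumed choice of f."""
    return 1.0 / (1.0 + math.exp(-x))

def neuron_output(weights, inputs, bias, f=sigmoid):
    """Evaluate x_i = f(sum_j w_ij * o_j + bias) for a single neuron.

    Hypothetical helper for illustration; not part of any SAND software.
    """
    if len(weights) != len(inputs):
        raise ValueError("weights and inputs must have the same length")
    acc = bias
    for w, o in zip(weights, inputs):
        acc += w * o  # one multiply-accumulate step per connection
    return f(acc)

# Three inputs: 0.5*1.0 + (-0.25)*2.0 + 1.0*0.0 = 0.0, and sigmoid(0) = 0.5
x_i = neuron_output([0.5, -0.25, 1.0], [1.0, 2.0, 0.0], bias=0.0)
```

The loop body is the multiply-accumulate operation that dominates the cost of a neural network evaluation; the activation function is applied only once per neuron.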

If all neurons of one layer are regarded (Fig. 2), the function of the complete layer can be described as a matrix/vector multiplication

$$x = f(W \cdot o), \qquad (2)$$

where o is the input vector and W the weight matrix, which holds all connection weights between two adjacent layers. The sigmoidal function

$$f(x) = \frac{1}{1 + e^{-x}} \qquad (3)$$

is mostly used as the activation function in feedforward networks. To increase the calculation speed of a neural network, neurons have to work in parallel. On the other hand, high flexibility concerning the structure of neural networks should be ensured. To meet both demands, only neurons within the same layer are processed in parallel, whereas the various layers are processed sequentially. The architecture of the chip and the design criteria of the system are described in the following.

Fig. 2 : Part of a feedforward network (flow of data from inputs o_1 ... o_n to outputs x_1 ... x_m)

III. ARCHITECTURE OF SAND

SAND is a cascadable, systolic processor array designed for fast processing of neural networks. The neuro-chip SAND can be used for multi-layer perceptrons (MLP), radial basis function networks (RBF) and Kohonen's self-organizing feature maps (SOM). With these most common neural network types, SAND covers about 75% of all important applications. This estimate is the result of an analysis of 154 applications found in the literature. In the following, the organization of the neural processor is explained for feedforward networks, which are used most often.

Looking at the matrix/vector multiplication introduced in the last section, it is obvious that n*m multiplications have to be processed, where n is the number of input (transmitting) neurons and m is the number of output (receiving) neurons. To compute these multiplications, a transfer of n*m+n words (weight matrix and activation vector) is necessary. The ratio of transfer rate to computation rate is rather bad: about one word has to be transferred per multiplication. A better solution is to process multiple (p) input patterns simultaneously instead of one. Then the activation vector is replaced by a matrix with p columns. The number of multiplications becomes n*m*p, while only n*(m+p) words have to be transferred. Each of the processor elements (PE) of SAND works for four cycles (i.e. p=4) with the same weight. Every fourth cycle the weight is updated, so there is a continuous flow of weights on the weight bus. In the same period of four cycles, four activities are loaded into SAND's processor elements. These activities are transferred via registers from one PE to the next. There is thus a continuous flow of data on both the activity bus and the weight bus. Because of the way data and commands are handled, the architecture of SAND is a systolic processor array.

Fig. 3 shows the architecture of SAND, which consists of four parallel processing elements, each equipped with an ALU and an auto-cut module. The ALU is used for the multiplication of vectors within the matrix/matrix multiplication. Due to the accumulation of activities, the word width grows from 16 bit up to 40 bit if the number of input neurons is limited to 512. To be compatible with external memories (16 bit) and with the width of activities outside the chip (16 bit), a window of 16 bit must be cut out of the 40 bit. The position of this window can be influenced by a user-defined selection of an appropriate weight range. The cut is done in the auto-cut module, which automatically checks whether an overflow or underflow occurs. To minimize the error caused by the cut, an automatic adaptation of the accuracy is performed in a second step.

For self-organizing (Kohonen) maps it is important to find extremal values in the flow of output activities. Therefore, a postprocessing module is used which can work in two modes: search for a maximum or search for a minimum.

The appropriate activation function f(x) is realized outside the chip with a lookup table. Some types of neural networks require both a linear function f(x)=x and a non-linear function like the sigmoidal function. Therefore SAND has two outputs: one for addresses of the lookup table and one for linear data.

For the calculation of expression (1), a multiplier and an adder are needed to perform a fast multiplication of vectors. To increase speed, both elements are placed within a pipeline. In a first step the input activities are multiplied with the corresponding weights; the products are then added to the previous values. Because four patterns are processed, four accumulation registers are required within each PE. For some neural networks it is also necessary to calculate the Euclidean distance between two vectors. Therefore SAND's ALU is equipped with an additional adder, which is also placed in the pipeline. This feature is mainly used for Kohonen feature maps and radial basis function networks (RBF).
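The benefit of processing several input patterns in one block can be made concrete with a small Python sketch. It counts transferred words and multiplications for one layer and shows a loop order in which every weight is fetched once and reused for all p patterns, with one accumulator per pattern. The function names `words_and_macs` and `layer_block` are hypothetical, and the sketch models only the arithmetic, not the timing or the systolic data flow of the chip.

```python
def words_and_macs(n, m, p):
    """Return (words transferred, multiplications) for one layer and p patterns."""
    words = n * m + n * p  # weight matrix plus p activation vectors
    macs = n * m * p       # every weight is used once per pattern
    return words, macs

def layer_block(W, O):
    """Compute the net inputs of one layer for a block of patterns.

    W is an m x n weight matrix (list of rows); O is an n x p matrix whose
    columns are the input patterns. Hypothetical helper for illustration.
    """
    m, n, p = len(W), len(O), len(O[0])
    X = [[0] * p for _ in range(m)]  # p accumulators per output neuron
    for i in range(m):
        for j in range(n):
            w = W[i][j]              # one weight transfer ...
            for k in range(p):       # ... reused for all p patterns
                X[i][k] += w * O[j][k]
    return X

for p in (1, 4):
    words, macs = words_and_macs(n=64, m=64, p=p)
    print(p, words, macs, round(macs / words, 2))
```

For n = m = 64, going from p = 1 to p = 4 raises the number of multiplications per transferred word from about 0.98 to about 3.76, which is why the block processing relaxes the bandwidth requirement on the weight bus.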
Fig. 3 : Block structure of SAND (four processing elements PE 1 to PE 4, each with 16-bit registers for activities and weights, an adder, an ALU with 40-bit result and an auto-cut module reducing the result to 16 bit; a register bank and a post-processing min/max search unit feed the 16-bit address and data outputs)

IV. STRUCTURE AND OPERATION OF PCI BOARD

For acceleration of neural simulations, a PCI board was designed which can work with up to four SAND chips running at 50 MHz. The performance can be scaled from 200 MOPS (one SAND processor) up to 800 MOPS (four SAND processors). The board contains several memory blocks: incoming activities, intermediate results and final results are stored in three FIFO chips. Weights are stored in distributed memories attached to the SAND processors. Incoming activities are multiplied with the corresponding weights and accumulated within the SAND processors. The activation function is calculated with a look-up table implemented in a fast SRAM external to SAND. The flow of data to and from the board is controlled by the data stream controller (DSC), which also holds the configuration of the neural network. Control of the SAND processors, access to memories, and initialization of weight memories and look-up table are performed by a sequencer (SEQ).

The exchange of data between host and PCI board is asynchronous. A simple protocol synchronizes both communication partners. On the PCI board two components handle the PCI interface: the PCI controller AMCC5593 and the data stream controller (DSC), which is implemented in a field programmable gate array (FPGA). Another controller, the sequencer (SEQ), produces the signals controlling the SAND chip. The DSC outputs SEQ instructions, which are interpreted by the sequencer. The PCI interface transfers commands and data. Commands are sent to the mailbox or to the DSC; data are sent to the input queue (FIFO) of SAND. This FIFO holds at most two complete data sets. One data set consists of four consecutive input patterns for the neural network. While SAND is working on the current data set, a second set can be read into the FIFO.

V. APPLICATION OF SAND IN MAMMOGRAPHY

In this paper one application is described in more detail that uses neural networks only for implementation purposes. No training phase is required because all weights are determined analytically. As shown in equation (1), the computation of a neuron is mainly a summation of products followed by a transfer function. This transfer function is often non-linear, in many cases of sigmoidal shape, but linear functions are also possible. For example, in radial basis function networks the neurons of the output layer are linear because f is the identity function. Some common concepts and methods used in pattern recognition tasks are based on similar computations. Examples are the convolution and the correlation of functions, FIR filters and the FFT. Table 1 shows their similarity to the neuron equation (1).

Table 1. Pattern Processing Methods

Method        Computation
convolution   $g(x,y) = h(x,y) * f(x,y) = \sum_{m,n} h(m,n)\, f(x-m,\, y-n)$
correlation   $g(x,y) = h(x,y) \circ f(x,y) = \sum_{m,n} h^{*}(m,n)\, f(x+m,\, y+n)$
FIR filter    $g(x) = \sum_{k} c(k)\, f(x-k)$
FFT           $F(u) = \frac{1}{N} \sum_{x} f(x)\, W_N^{ux}$, with $W_N^{ux} = e^{-j 2\pi u x / N}$

All these functions are linear and may be reduced to a matrix multiplication.
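As an illustration of this reduction, the following Python sketch writes a one-dimensional FIR filter, g(x) = sum over k of c(k) f(x-k), as a matrix/vector product, so that every output sample corresponds to one linear neuron whose weights are shifted filter coefficients. The helper names `fir_direct`, `fir_matrix` and `matvec` are hypothetical, and the signal is assumed to be zero outside its range; this is a sketch under these assumptions, not the mapping used on the SAND board.

```python
def fir_direct(c, f):
    """g(x) = sum_k c(k) * f(x - k), with f taken as zero outside its range."""
    g = []
    for x in range(len(f)):
        acc = 0
        for k in range(len(c)):
            if 0 <= x - k < len(f):
                acc += c[k] * f[x - k]
        g.append(acc)
    return g

def fir_matrix(c, length):
    """Build the weight matrix W with W[x][j] = c(x - j), zero elsewhere."""
    return [[c[x - j] if 0 <= x - j < len(c) else 0 for j in range(length)]
            for x in range(length)]

def matvec(W, o):
    """Plain matrix/vector product: one linear neuron per output sample."""
    return [sum(w * v for w, v in zip(row, o)) for row in W]

c = [1, 2, 1]        # illustrative filter coefficients
f = [3, 0, 1, 4, 2]  # illustrative input signal
W = fir_matrix(c, len(f))
assert matvec(W, f) == fir_direct(c, f)  # both give [3, 6, 4, 6, 11]
```

The same construction carries over to two-dimensional convolution and correlation: the filter coefficients become the weights of a neuron, and the image neighbourhood becomes its input vector.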

In mammography, small micro-calcifications have to be detected in X-ray images. Even for experts they are hard to detect and are not always noticed. Filtering and other image processing techniques help to sharpen the micro-calcification structures and emphasize the relevant information within the images. In our mammography workstation project various processing layers with different filter functions are used. On the first layer a binomial/highpass filter with the following filter matrix is used:

Table 2. Binomial/highpass Filter Matrix

-1  -4  -6  -4  -1
-4  16  24  16  -4
-6  24  36  24  -6
-4  16  24  16  -4
-1  -4  -6  -4  -1

This matrix b is convolved with the original image f (first image in Fig. 4). The result g=b*f is shown in the second image of Fig. 4. Another filter used is a smoothing low-pass filter s. It attenuates high amplitudes and is applied to the original image. The result of this filter, l=s*f, is used to create a difference image d=g-l that shows significant spots in the image. A subsequent threshold operation produces the binary image h=t(d), shown in the last image of Fig. 4. The whole process can be computed by a neural network with the topology also shown in Fig. 4. The first hidden neuron represents the filter function b*f, the second hidden neuron the filter s*f. In the output layer the difference of both filters is computed, and the threshold is easily implemented by the bias and the non-linear sigmoid function of the output neuron.

Figure 4: Example of filter implementation in a neural network (input f; hidden neurons l=s*f and g=b*f; output th(g-l)).

VI. TECHNICAL DATA AND PERFORMANCE OF SAND

SAND is manufactured in a 0.8 µm CMOS process, using a sea-of-gates technology with almost 50K gates. The package of SAND is a PGA with 120 signal pins. The non-linear activation function is calculated with a freely programmable look-up table, allowing a maximum of flexibility. A controller chip, the memories, the look-up table and the SAND chip are arranged as a fixed modular unit guaranteeing the tight timing for up to 50 MHz operation.

Three available neuro-chips in particular fulfil partly similar requirements as SAND: the MA16 of Siemens [1], CNAPS of Adaptive Solutions [3] and ETANN of Intel [6]. SYNAPSE is a neuro-computer with one MA16. It is available as a PC board [8]. For industrial applications, a stand-alone solution without host computer requires many other chips and at least one micro-controller. CNAPS and ETANN have problems with their low precision. CNAPS works with 8 bit accuracy, or with 16 bit at less than half the rate. Even worse, the analog ETANN computes with approximately 6 bit accuracy. Sometimes poor accuracy may be compensated by non-linear data transformations. But for on-chip training of the neural network a minimal data length of 16 bits seems to be necessary to find the global optimum (but see the special example in [12,13], where 4 bits are shown to be sufficient using tabu search).

There are two well-known applications of digital neural network processors in second-level triggers: CNAPS [3] in H1 [2,11] and MA16 [1] in WA92 [4], see Tab. 3. The neural processor module based on SAND demonstrates a throughput similar to the CNAPS board and competes successfully with it when the data acquisition system is equipped with an event buffer. Moreover, the module allows processing of input activities with higher accuracy. The SAND processor module shows higher throughput than the trigger module based on MA16 due to the simultaneous processing of four events and the higher clock frequency of the SAND board.
Tab. 3 : Comparison of existing MA16 and CNAPS data with SAND (from simulation of VME-board)

Experiment  Processor  ANN structure  Input activities  Computation time
H1          CNAPS      64x64x1        8-bit/20 MHz
H1          SAND       64x64x1        16-bit/50 MHz     5.1 µs
WA92        MA16       16x5x1         16-bit/8 MHz      5.5 µs
WA92        SAND       16x5x1         16-bit/50 MHz     0.5 µs

Remaining entries of the computation time and latency time columns, in the order in which they appear in the source: 8 µs, 3.6 µs, 8 µs, unknown, 27 µs.

Fig. 5 : PCI neuro-board

VII. CONCLUSION
Depending on the application, several design criteria for ANN chips have to be met. These differ in part, especially with respect to the size and the type of the neural network. SAND processes feedforward networks, Kohonen feature maps and radial basis function networks with comparable speed. The central processing unit of the chip was designed in such a way that only a few additional devices are required compared to previous designs. To facilitate stand-alone operation of SAND, the neuron activities are buffered. Because of the modular structure, performance improvements can be achieved by adding more processing elements.

Future developments of general-purpose microprocessors like the Pentium P55C from Intel, the K6 from AMD, the M2 from Cyrix and others have to be watched carefully. Their MMX instructions and the use of parallel integer units on chip enable these devices to perform very fast matrix multiplications. Up to now the independent parallel transfer of data is a problem, so that they cannot compete with the performance of SAND. Because of many restrictions, their internal pipeline organization is not appropriate for the fast computation of neural networks. Other processors (from MIPS, with MDMX instructions) or the digital signal processor TMS320C80 from TI aim in the same direction and deal with similar problems. But in the near future this may change.

The SAND chip is produced by IMS¹, the PCI board with four SAND chips by INCO² [5], Leipzig. Faster versions using full-custom design and supporting fast hardware learning are under development at FZK.

VIII. REFERENCES
[1] U. Ramacher et al., Design of a First Generation Neurocomputer, in VLSI-Design of Neural Networks, eds. U. Ramacher and U. Rückert, Kluwer Academic Publishers, 1991
[2] D. Goldner et al., Proceedings of 4th International Workshop on Software Engineering, Artificial Intelligence and Expert Systems for High Energy and Nuclear Physics, April 3-8, 1995, Pisa, Italy, pp. 333-340
[3] Adaptive Solutions, CNAPS Product Information, 1995
[4] C. Baldanza et al., NIM A 376 (1996) 411 and NIM A 373 (1996) 261
[5] T. Becher, W. Eppler, T. Fischer, H. Gemmeke, G. Kock, The MiND-project: building, applying and speeding-up neural networks using the SAND neuroprocessor, Proceedings of EUFIT 97 (ed. H. J. Zimmermann), Aachen, Germany
[6] C.S. Lindsey and B. Denby, A study of the Intel ETANN VLSI Neural Network for an Electron Isolation Trigger, 1992, internal CDF Note CDF/DOC/CDF/Public/1850
[7] T. Fischer, H. Gemmeke, W. Eppler, A. Menchikov, S. Neusser, Novel digital hardware for trigger applications in particle physics, in Proc. of 2nd Workshop on Electronics for LHC Experiments, Balatonfüred, Hungary, 1996
[8] Siemens Corp., Neurocomputers and Tools, SYNAPSE 2 homepage at http://www.snat.de/welcome.htm
[9] E. Lorenz, Nucl. Phys. B (Proc. Suppl.) 48 (1996) 494
[10] S. Omner, S. Westerhoff, H. Meyer and HEGRA collaboration, Search for TeV gamma-rays from extragalactic point sources with neural photon/hadron separation, Nucl. Inst. Meth. 389 (1997) 204
[11] J.K. Kohne et al., Realization of a second level neural network trigger for the H1 experiment at HERA, Nucl. Inst. Meth. 389 (1997) 128
[12] G. Anzellotti et al., TOTEM: a highly parallel chip for triggering applications with inductive learning based on the reactive tabu search, Int. Jour. Mod. Phys. C6 (1995) 555
[13] S. Dusini et al., The neurochip TOTEM: a case study in HEP, ICHEP97

¹ IMS, Allmandring 30a, D-70569 Stuttgart, Germany
² INCO Systeme, Stöhrerstraße 17, D-04347 Leipzig, Germany
