
JOURNAL OF COMPUTING, VOLUME 2, ISSUE 9, SEPTEMBER 2010, ISSN 2151-9617
WWW.JOURNALOFCOMPUTING.ORG

Back-Propagation on Horizontal-Vertical-Reconfigurable-Mesh
Seyed Abolfazl Mousavi, Ali Moeini, Mohammad Reza Salehnamadi

S.A. Mousavi is with the Department of Computer Engineering, South Tehran Branch, Islamic Azad University, Tehran, Iran.
A. Moeini is with the Department of Algorithms and Computation, Faculty of Engineering, University of Tehran, Tehran, Iran.
M.R. Salehnamadi is with the Department of Computer Engineering, South Tehran Branch, Islamic Azad University, Tehran, Iran.

Abstract— This article presents a new adaptive approach for implementing the Check-Boarding algorithm on the Horizontal-Vertical Reconfigurable Mesh (HVRM). First, the Back-Propagation algorithm is summarized; second, the computational model used in the basic Check-Boarding algorithm is explained; finally, the Reconfigurable Mesh and the new method are presented.

Index Terms—Neural Network; Reconfigurable Mesh; Check-Boarding; Back-Propagation; Hypercube; Reconfiguration; RTR

——————————  ——————————

1 INTRODUCTION

Back-Propagation is a learning algorithm embedded in several neural network implementations. It has been realized with various technologies, from General Purpose Processors to ASICs, but during the 1990s the computational-model approach attracted interest as a way to execute a general domain of programs while taking advantage of parallelism. Check-Boarding is one such approach, originally mapped onto the hypercube. The HVRM has properties that make it a better target for Check-Boarding, and this article revises the mapping accordingly; at the end, the two models are compared with respect to scalability, run-time performance and the area occupied by each model.

This study is carried out in theory; a proof-of-concept implementation remains future work.
2 NEURAL NETWORK

A Neural Network is a collection of computational nodes joined together through a connection network. Before use it must be trained by altering the weight of each link under a learning algorithm; in the process the network takes its final form. The first and foremost algorithm of this sort is Back-Propagation [1], applied to the MLP. If the number of nodes is the same in every layer the ANN is uniform, otherwise it is non-uniform. The maximum number of perceptrons per layer is assumed to be n, and f, w, y, θ respectively denote the activation function, weight, output and bias. To support all kinds of mappings, in accordance with the universal approximation theorem, the activation function should be nonlinear. BP has two stages: first from the beginning of the network to its end, and second from the end back to the beginning. In the first stage the outputs and the error are computed for the current state; in the second the weights are adjusted layer by layer, from the last layer to the first, under the influence of that error. The adjustment may be performed per pattern, per epoch or after a complete training run. Note that training is iterative, and a collection of training data can be fed in more than once according to the following formulas, which take the standard per-pattern form (d_j is the desired output, net_j the weighted input of node j, δ_j its error term and η the learning rate):

net_j = Σ_i w_ij · y_i + θ_j                                          (1)
y_j = f(net_j)                                                        (2)
δ_j = (d_j - y_j) · f'(net_j) for an output node j;
δ_j = f'(net_j) · Σ_k w_jk · δ_k for a hidden node j                  (3)
Δw_ij = η · δ_j · y_i                                                 (4)
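As an illustration of rules (1)-(4), the sketch below applies one per-pattern update to a one-hidden-layer MLP in plain Python. The layer sizes, the sigmoid activation and the learning rate eta are illustrative assumptions, not values taken from the article.

import math

def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))

def forward(inputs, W, theta):
    # Equations (1) and (2): net_j = sum_i w_ij * y_i + theta_j and y_j = f(net_j).
    nets = [sum(w_row[i] * inputs[i] for i in range(len(inputs))) + b
            for w_row, b in zip(W, theta)]
    return [sigmoid(net) for net in nets]

def train_pattern(x, d, W1, b1, W2, b2, eta=0.5):
    h = forward(x, W1, b1)                     # hidden layer outputs
    y = forward(h, W2, b2)                     # output layer outputs
    # Equation (3), output nodes: delta_j = (d_j - y_j) * f'(net_j); f'(net) = y * (1 - y) for the sigmoid.
    d2 = [(d[j] - y[j]) * y[j] * (1.0 - y[j]) for j in range(len(y))]
    # Equation (3), hidden nodes: delta_j = f'(net_j) * sum_k w_jk * delta_k.
    d1 = [h[i] * (1.0 - h[i]) * sum(W2[k][i] * d2[k] for k in range(len(d2)))
          for i in range(len(h))]
    # Equation (4): delta_w_ij = eta * delta_j * y_i (biases are updated the same way).
    for j in range(len(W2)):
        for i in range(len(h)):
            W2[j][i] += eta * d2[j] * h[i]
        b2[j] += eta * d2[j]
    for j in range(len(W1)):
        for i in range(len(x)):
            W1[j][i] += eta * d1[j] * x[i]
        b1[j] += eta * d1[j]
    return y

W1, b1 = [[0.5, -0.2], [0.3, 0.8]], [0.0, 0.0]   # 2 inputs -> 2 hidden nodes
W2, b2 = [[0.7, -0.4]], [0.0]                    # 2 hidden nodes -> 1 output node
print(train_pattern([1.0, 0.0], [1.0], W1, b1, W2, b2))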
3 COMPUTATIONAL MODEL

ASIC-like circuits are designed on a Field Programmable Gate Array (FPGA) or other logic arrays with a high-level hardware description language such as VHDL or Verilog. Since the fabric is devoted to a single special purpose, area consumption is not optimized and a broad range of hardware explorations is lost. At the opposite end, General Purpose Programming pays the extra overhead of a fetch-decode-execute cycle in a sequential running regime [2]. These two ends of the programming spectrum meet in the computational model. The hypercube is one such model, defined recursively on the basis of the Reflected Gray Code: an extra bit is added to the left of the labels of an (n-1)-cube, and the equivalent points of the 0-prefixed and 1-prefixed cubes are linked. Neighboring nodes differ in exactly one bit, the shortest path between two nodes is given by their Hamming distance, and the diameter of the topology therefore equals n. The good availability of this model is bought at the cost of many wires and an excessive degree, n, at each node. Scalability of the model is limited because the number of nodes is 2^n.
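A small sketch of these hypercube properties, assuming nothing beyond the definition above: node labels are n-bit integers, neighbors differ in one bit, and the Hamming distance gives the routing distance.

def neighbors(node, n):
    # In an n-dimensional hypercube, the neighbors of a node are obtained
    # by flipping each of its n label bits in turn.
    return [node ^ (1 << bit) for bit in range(n)]

def hamming_distance(a, b):
    # The shortest-path length between two nodes equals their Hamming distance.
    return bin(a ^ b).count("1")

n = 3                                   # 3-cube: 2**3 = 8 nodes
print(neighbors(0b000, n))              # [1, 2, 4] -> labels 001, 010, 100
print(hamming_distance(0b000, 0b111))   # 3, the diameter of the 3-cube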

4 RECONFIGURATION

While a computational model and the hardware are shared among several programs, the hardware is deployed more effectively by mapping the computational model onto it. Programs advance consecutively and each has its own configuration and state, so the logic array must change its setting at every new entry; reconfiguration is therefore an inevitable outcome.

In static reconfiguration the hardware configuration is altered step by step according to the source code. Run-Time Reconfiguration (RTR), which is dynamic, is instead invoked to evolve the computation of a program by changing the state of the physical substrate at run-time. RTR is applied at various levels, from coarse grain to fine granularity.

Involving reconfiguration in programming in a fine-grain manner is done by attaching switches to a static topology. The Reconfigurable Mesh [3] is one of these evolutionary models, in which every PE is equipped with local switches; the acronym RM is used for it hereafter. The mesh is a simple, regular and popular shape that fits well on a logic array such as an FPGA. It is classified by tile pattern, wrap-around links and communication mode, the topology being formed by the tile pattern and the wrap-around links [4].

Data transportation through this network follows two different approaches: link-oriented and bus-oriented. With links, a message is sent in a point-to-point manner. In contrast, an ideal bus can convey an entry to all destinations in constant time; this is not physically achievable, so a propagation delay is loaded onto the computational model. In other words, sewing a bus onto the topology makes the propagation delay part of the features of the computational model. The delay is modeled either as a constant or as a function of the number of connected Processing Elements (PEs). Three types are currently considered:
1. Constant-cost: delay = O(1);
2. Log-cost: delay = O(log(n)), where n is the number of bus-connected PEs;
3. Bend-cost: delay = the number of bends counted along the bus structure.

The PEs of an RM can read/write the link (bus) and the local memory, alter the wiring between their local switches, and perform arithmetic/logic operations. Link (bus) accessibility is the same as for PRAM memory (CRCW, CREW, ERCW and EREW); RM variants are therefore classified not only as link- or bus-oriented but also by link or bus accessibility.

The intervention of switches gives the model a flexible arrangement: 15 connection patterns are possible within a mesh PE, and the legitimate patterns differ in practice between variants, so they are counted among the features of an RM. The more connection patterns exist, the more computational power is achieved [3].

Many algorithms have been implemented on different types of RM, but which of them is efficient and uses reconfiguration properly? The important factor is scalability. Algorithmic scalability is clearly endangered by lavish use of reconfiguration. A computational model is chosen with the algorithmic demands in mind; designers usually state their solutions on an unrestricted model, meaning that the size of the model depends on the size of the algorithm, whereas in practice only a smaller machine exists. Every missing processor is then replaced by memory and its function is overloaded onto another processor. This property is weakened in a dynamic model by frequent modification of the arrangement, so it should be evaluated during algorithm design by trading reconfiguration use off against scalability. This is possible by taking a general view of a reconfigurable model and a closer view of the algorithm mapping in the presence of heavy reconfiguration use: the general view categorizes reconfiguration into optimal, strong and weak, and the closer view measures the degree of scalability with the purpose of reducing reconfiguration use. This measurement is done on the strong and weak models.
5 CHECK-BOARDING ON RECONFIGURABLE MESH

Check-Boarding is a distribution of weights over the hypercube that exploits multi-node broadcasting to gain better performance. Each layer is distributed over all PEs without any dependency on the other layers: every input weight is stored at the column given by its input index, on the row that is responsible for processing the corresponding internal neural node. Data is fed into a layer from the diagonal nodes by vertical broadcasting; after the product of data and weight is formed inside the PEs, the results are accumulated by parallel horizontal operations and placed on the diagonal nodes as inputs to the next layer [5]. The major feature of Check-Boarding is this arrangement, while multi-node broadcasting [6] is the hypercube benefit used to achieve better performance; it is the arrangement itself that is referred to as Check-Boarding. A contribution on RM was presented in [7], but it was designed regardless of model scalability, which is weak there. In this section an algorithm using the Check-Boarding arrangement is described on the CREW Horizontal-Vertical RM (CREW-HVRM), which is bus-oriented. The HVRM has only two legitimate connection patterns, North-to-South and East-to-West, and both support bidirectional transportation. These limited patterns impose a modest broadcasting delay on the bus, at best O(1) and at worst O(log(n)), where n is the number of PEs along the length or width of the mesh. In addition, the model has optimal scalability [3].
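A minimal sketch of this arrangement for one layer, assuming the common reading of the description above: weight w[i][j] (input j feeding neuron i) lives on PE(i,j), the layer inputs sit on the diagonal PEs, each column broadcasts its input, each row accumulates its products, and the row results land back on the diagonal. The mesh is simulated here with plain Python lists; names such as place_weights are illustrative, not taken from the article.

def place_weights(W):
    # Check-Boarding layout: PE(i, j) holds w[i][j], the weight of input j
    # into neuron i, so row i owns everything needed to compute neuron i.
    n = len(W)
    return [[W[i][j] for j in range(n)] for i in range(n)]

def forward_layer(W, x):
    n = len(W)
    pe_w = place_weights(W)
    # Diagonal PE(j, j) holds input x[j]; a vertical broadcast copies it
    # to every PE of column j.
    col_in = [[x[j] for _ in range(n)] for j in range(n)]
    # Each PE multiplies its weight by the broadcast input; a horizontal
    # reduction over row i then yields the net input of neuron i, which
    # is placed back on diagonal PE(i, i) as input to the next layer.
    return [sum(pe_w[i][j] * col_in[j][i] for j in range(n)) for i in range(n)]

W = [[0.1, 0.2], [0.3, 0.4]]
print(forward_layer(W, [1.0, 2.0]))   # net inputs of the two neurons: [0.5, 1.1]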
Check-Boarding is performed in a forward and a backward pass. The forward pass starts with the entry of data from the I/O port into the diagonal PEs; the inputs are then broadcast through the columns, and each summation result is calculated by traversing a binary tree from the leaves to the root, the diagonal nodes acting as roots. At the end, the outputs are ready for the next layer. This process continues until the results of the neural network are worked out in the last layer. Computing the margin between the desired network outputs and the computed values is essential for running the backward pass; note that the desired outputs also enter from the I/O port. Broadcasting along rows followed by traversal of binary trees along columns is what distinguishes the backward pass from the forward one. Although these operations act along different dimensions (first column broadcast, then row addition and finally row broadcast in the forward pass; first row broadcast, then column addition and column broadcast in the backward pass), the structures are the same:
1. Binary tree addition;
2. Broadcasting.
The basic operations used in the present algorithm are introduced below.

If an RM with n by n elements is indexed with the leftmost column as 0, the rightmost column as n-1, the topmost row as 0 and the bottommost row as n-1, then every PE can be addressed by the notation PE(i,j), where i indicates the row and j the column. Accordingly, the Head of a sub-bus is its end nearer to PE(0,0), and the Tail is the end closer to PE(n-1,n-1).

Fig. 1. Example of Head and Tail on an HVRM sub-bus.

The Connect function is used to configure a bus in a horizontal or vertical manner between two nodes. Head and Tail must be separated from the neighbors that are not included in the bus. Note that the internal wire connection state of a PE is set according to its Switches bit-vector.

The Transfer function initializes the switches through Connect, and a packet is then sent through the bus: WriteOnBus communicates a value from BusRegister onto the bus, and ListenToBus from the bus into BusRegister. Transfer is used for both adding and broadcasting.

Fig. 2. Example of sending data from Head to Tail according to Fig. 1.
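The pseudo-code listings of these primitives are not reproduced here; the following sketch simulates one row of an HVRM in Python to show how Connect and Transfer could cooperate. The class name, the list-based bus and all identifiers other than Switches, BusRegister, WriteOnBus and ListenToBus (which appear in the text) are assumptions made for the sake of the example.

class RowHVRM:
    """Simulation of a single HVRM row with an East-West sub-bus."""

    def __init__(self, n):
        self.n = n
        self.switches = [False] * n      # Switches bit-vector: True = fused into the sub-bus
        self.bus_register = [None] * n   # one BusRegister per PE
        self.bus_value = None            # value currently driven on the sub-bus

    def connect(self, head, tail):
        # Configure an East-West sub-bus from PE(head) to PE(tail); PEs outside
        # [head, tail] stay disconnected, so Head and Tail are separated from
        # the neighbors that are not included.
        self.switches = [head <= j <= tail for j in range(self.n)]

    def write_on_bus(self, j, value):
        if self.switches[j]:             # only a connected PE may drive the bus
            self.bus_value = value

    def listen_to_bus(self, j):
        if self.switches[j]:
            self.bus_register[j] = self.bus_value

    def transfer(self, head, tail, value):
        # Transfer = Connect, then one write followed by every connected PE listening;
        # used both for broadcasting and as a step of the additions described next.
        self.connect(head, tail)
        self.write_on_bus(head, value)
        for j in range(head, tail + 1):
            self.listen_to_bus(j)

row = RowHVRM(8)
row.transfer(0, 7, 3.5)        # broadcast 3.5 from the Head to the whole row
print(row.bus_register)        # [3.5, 3.5, 3.5, 3.5, 3.5, 3.5, 3.5, 3.5]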

Sum is the basis of addition. RowSum is an example of addition along the 0th row; it can be deployed for the other rows, and the result is then delivered by RowBroadcast(0,y), the diagonal nodes still being assumed as roots. ColumnSum is written in a similar fashion. The traversal of the binary tree from the leaves to the leftmost node, the root of the tree, is depicted in Fig. 3 for 8 nodes; it takes 3 steps, the depth of the binary tree being log(8).

Fig. 3. Example of adding values in a row of an 8-PE HVRM. The label attached to each long link of the same length indicates the phase number of the parallel transfer between nodes.
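A compact sketch of this leaves-to-root addition, assuming the log-step pairing suggested by Fig. 3: in each phase every receiving PE obtains, over a sub-bus, the partial sum held a fixed distance to its right, so 8 values are reduced in 3 phases and the total ends at the leftmost PE, the root of the row. The function name row_sum is reused from the text; the sequential loop stands in for steps that run in parallel on the hardware.

def row_sum(values):
    # Binary-tree addition across one row; values[j] is the partial sum on PE(row, j).
    partial = list(values)
    n = len(partial)
    step = 1
    while step < n:
        # One phase: every PE at an index that is a multiple of 2*step receives,
        # in parallel, the partial sum held `step` positions to its right.
        for j in range(0, n, 2 * step):
            if j + step < n:
                partial[j] += partial[j + step]
        step *= 2
    return partial[0]          # the leftmost PE, the root of the tree, holds the total

print(row_sum([1, 2, 3, 4, 5, 6, 7, 8]))   # 36, reached in log2(8) = 3 phases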

The forward and backward pass pseudo-codes have the same shape; the forward pass is shown as the evident case. The pseudo-code applies to a uniform neural net having L layers with n nodes in each layer. The first step is ColumnBroadcast, which has the same structure as RowBroadcast, with x the row and y the column of the node sending the packet to all other nodes of the same column. Every PE keeps its weights for the L layers in the float array PE.w, and the outputs of the nodes are duplicated in the PEs through PE.y; the output is held during the backward phase for the adaptation of the link weights.
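The actual listing is not reproduced here; below is a minimal sketch of how such a forward pass could look on a simulated n-by-n mesh, assuming PE.w[l] holds the weight of layer l as in the text and reusing the broadcast and row-reduction ideas above. Identifiers other than PE.w and PE.y (for example Mesh, column_broadcast, row_net) are illustrative assumptions.

class PE:
    def __init__(self, L):
        self.w = [0.0] * L    # PE.w: this PE's weight for each of the L layers
        self.y = 0.0          # PE.y: duplicated node output, kept for the backward phase
        self.x = 0.0          # input received over the column bus

class Mesh:
    def __init__(self, n, L):
        self.n = n
        self.pe = [[PE(L) for _ in range(n)] for _ in range(n)]

    def column_broadcast(self, inputs):
        # Diagonal PE(j, j) sends inputs[j] down column j (same structure as RowBroadcast).
        for j in range(self.n):
            for i in range(self.n):
                self.pe[i][j].x = inputs[j]

    def row_net(self, layer):
        # Addition of w * x along each row; the result belongs to the diagonal PE
        # of that row and becomes the input of the next layer.
        return [sum(self.pe[i][j].w[layer] * self.pe[i][j].x for j in range(self.n))
                for i in range(self.n)]

    def forward(self, inputs, activation):
        for layer in range(len(self.pe[0][0].w)):
            self.column_broadcast(inputs)
            nets = self.row_net(layer)
            inputs = [activation(v) for v in nets]
            for i in range(self.n):
                for j in range(self.n):
                    self.pe[i][j].y = inputs[i]   # node outputs duplicated in the PEs (PE.y)
        return inputs

mesh = Mesh(n=2, L=1)
mesh.pe[0][0].w[0], mesh.pe[0][1].w[0] = 0.1, 0.2
mesh.pe[1][0].w[0], mesh.pe[1][1].w[0] = 0.3, 0.4
print(mesh.forward([1.0, 2.0], activation=lambda v: v))   # [0.5, 1.1] with identity activation

On real HVRM hardware the inner loops of column_broadcast and row_net would be carried out by the bus broadcast of Transfer and by the log-step tree addition of the previous sketch, with all rows and columns working in parallel.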

6 MISCELLANEOUS

Suppose that an HVRM with m by m elements is intended to host an algorithm stated on a computational model of n by n PEs, with n greater than m; this is the situation that occurs in a real environment. In order to execute the program, the local data and the functional duties are divided among the existing PEs. The duties instruct the PEs that hold the corresponding data to complete the computation, so the main obstacle to this division is the efficient distribution of the data owned by the computational model's processors, while the duties are assigned automatically. As mentioned at the beginning of Section 5, the HVRM has this quality; an HVRM can therefore accept this situation, and furthermore non-uniform neural nets. This brings the additional benefit that an HVRM mapped to this algorithm need not have the same width and length. Scalability is thus improved in comparison with the hypercube: while an n-dimensional hypercube must have exactly 2^n elements, an HVRM can be designed with different numbers of elements along its width and length.
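A small sketch of one way such a division could be organized, namely a block (fold-over) mapping in which physical PE (i mod m, j mod m) absorbs the data and duties of every logical PE (i, j) of the n-by-n model. The mapping choice and the name fold_mapping are assumptions made for illustration, not a scheme prescribed by the article.

def fold_mapping(n, m):
    # Assign each logical PE (i, j) of the n x n computational model to a
    # physical PE of the m x m HVRM; the physical PE emulates its logical
    # PEs one after another and keeps their local data in its memory.
    assignment = {}
    for i in range(n):
        for j in range(n):
            assignment[(i, j)] = (i % m, j % m)
    return assignment

mapping = fold_mapping(n=4, m=2)
print(mapping[(3, 2)])             # logical PE (3, 2) is emulated by physical PE (1, 0)
load = max(list(mapping.values()).count(p) for p in set(mapping.values()))
print(load)                        # each physical PE emulates (n/m)^2 = 4 logical PEs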
Broadcasting is the main factor in the computation of run-time performance. On the hypercube, multi-node broadcasting takes O(log(P)) time, while on the HVRM each broadcast costs at most the bus delay stated in Section 5, O(log(n)) with n the number of PEs along one side of the mesh; P is the number of PEs.

The area occupied by the hypercube or the HVRM can be expressed as a function of the number of links in each model, and the number of links is the product of the node degree and P. In the hypercube it equals P·log(P), while in the HVRM, whose node degree is at most 4, it is less than 4P.
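As a quick numeric illustration of these link counts, take P = 256 PEs as an assumed machine size (a 16 by 16 mesh, or an 8-dimensional hypercube):

P = 256
hypercube_links = P * 8      # node degree log2(256) = 8, so P * log(P) links
hvrm_links = 4 * P           # mesh node degree is at most 4, so fewer than 4 * P links
print(hypercube_links, hvrm_links)   # 2048 versus 1024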
7 CONCLUSION

In summary, Table 1 compares the two models; it indicates that the HVRM allows a considerably better implementation of Check-Boarding than the hypercube does, and of course this still needs simulations, which are the future work.

TABLE 1
COMPARISON BETWEEN HYPERCUBE AND HVRM

Implementation | Scalability                                | Run-Time Performance (broadcasting)  | Area
Hypercube      | Limited: it increases in powers of 2 (2^n) | Log(P)                               | P·Log(P)
HVRM           | Optimal: width and length may differ       | At most O(log(n)), n = PEs per side  | O(P)

ACKNOWLEDGMENT

The authors hope that readers will improve this work by raising new issues in our mail box, and finally we send our special regards to S.Z. Mousavi for drawing the figures.

REFERENCES

[1] Ben Krose and Patrick van der Smagt, "Fundamentals; Perceptron and Adaline; Back-Propagation" (chapters 2-4), in An Introduction to Neural Networks, 8th ed., The University of Amsterdam, 1996, http://www.divshare.com/download/7105390-f51.
[2] Kiran Bondalapati and Viktor K. Prasanna, "Reconfigurable Computing: Architectures, Models and Algorithms," Current Science, vol. 78, pp. 828-837, 2000.
[3] Ramachandran Vaidyanathan and Jerry L. Trahan, Dynamic Reconfiguration: Architectures and Algorithms, 1st ed., Springer, 2004.
[4] Behrooz Parhami, "Mesh-Based Architectures; Low-Diameter Architectures," in Introduction to Parallel Processing: Algorithms and Architectures, 1st ed., Springer, 1999, pp. 169-340.
[5] Vipin Kumar, Shashi Shekhar, and Minesh B. Amin, "A Scalable Parallel Formulation of the Backpropagation Algorithm for Hypercubes and Related Architectures," IEEE Transactions on Parallel and Distributed Systems, vol. 5, no. 10, pp. 1073-1090, October 1994.
[6] Dimitri P. Bertsekas and John N. Tsitsiklis, "Hypercube Mappings," in Parallel and Distributed Computation: Numerical Methods, 1st ed., Athena Scientific, Massachusetts, 1997, ch. 1, pp. 50-65.
[7] Jing Fu Jenq and Wing Ning Li, "Artificial Neural Networks on Reconfigurable Meshes," in Parallel and Distributed Processing (Workshop on Biologically Inspired Solutions to Parallel Processing Problems, A.Y. Zomaya, F. Ercal and S. Olariu, eds.), vol. 1388, Springer, Berlin Heidelberg, 1998, pp. 234-242.

S.A. Mousavi received his M.S. degree from Islamic Azad University and his bachelor's degree from Ferdowsi University.
A. Moeini teaches Computer Science at the University of Tehran.
M.R. Salehnamadi teaches Software Engineering at Islamic Azad University.
