In this section the costs of dealing with arbitrarily placeable RTRMs on a fixed mesh of PRRs, which is defined at design time, are assessed. The goal is to find a manageable solution for the arbitrary placement of RTRMs that on the one hand keeps the number of partial configuration bit stream files small and on the other hand provides constant low communication latencies and simultaneous communication access for multiple RTRMs. Figure 1.a shows an example in which RTRMs are logically grouped in communication domains, and Figure 1.b represents a valid placement of these RTRMs in PRRs on an FPGA.

3.1. Costs of Communication with RTRMs

Different approaches have been proposed. The FPGA manufacturer Xilinx uses hard macros, also known as bus macros, even though it provides a single directed

All buses have in common that there can be only one transmitter at a time. If more than one message is allowed to travel through the network at once, routers are used, cf. [9], [10]. Because low and constant latencies are difficult to achieve on routed topologies and buses are limited to a single message transmission at a time, a fully meshed network, where each RTRM has an extra communication path to each of the other RTRMs, is considered. If non-arbitrary placement of RTRMs is assumed, the implementation costs of the communication infrastructure based on FPGA logic and routing resources for a fully meshed network consist only of routing resources if communication paths are unregistered (unbuffered). All interconnect topologies for communication with RTRMs have in common that the amount of communication is limited in bandwidth and latencies could be high compared to a fully static design.
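The fully meshed topology described above can be illustrated with a minimal sketch. It assumes that every communication path is a unidirectional sender-to-receiver channel, so r RTRMs need r · (r − 1) channels; the function name `mesh_channels` and the RTRM labels are hypothetical and only serve the illustration.

```python
from itertools import permutations


def mesh_channels(rtrms):
    """Return every directed (sender, receiver) pair of a fully meshed network."""
    # Assumption: one dedicated unidirectional channel per ordered pair.
    return list(permutations(rtrms, 2))


channels = mesh_channels(["RTRM1", "RTRM2", "RTRM3", "RTRM4"])
# Unlike a bus, all of these channels can carry a message at the same time.
print(len(channels), "dedicated channels")
```

The quadratic growth of the channel count is the reason the implementation cost of such a network has to be examined before it is chosen over a bus or a routed topology.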
3.2. Costs of Arbitrary Placement of RTRMs

For the calculation of the amount of configuration bit

[Figure labels: RTRM 1 and RTRM 2, each with ports sb_out[5], sb_in[5], data_out[32], data_in[32]]
[Figure 3 labels: com-port S, reg S, Network, reg R, com-port R, S1, S2, clk]

Figure 3. Schematic view of the communication of two RTRMs through the communication channel

[Figure legend, operating modes: 1 non registered; 2 registered at sender side; 3 registered at receiver side; 4 registered at both sides. Timing labels: A.1 sender transmit; C in internal receiver reg]

chronous (registered) data transfer through the com-
For the physical silicon implementation of all switches, sending and receiving, 2 · r · (r − 1) · w multiplexers of type (r − 1)-to-1 can be used, where r is the maximum number of RTRMs that can be placed on the FPGA at the same time and w is the width of a single communication port. The number of transistors can be reduced further if tristate buffers are used. This is possible because the transmission is unidirectional and point to point, and can be terminated at the receiver side. The buffers depend neither on the load nor on the length of the transmission line, which should be made equal for all communication channels, independently of the physical location, to ease the timing analysis during the place and route process of a customer design using RTRM designs. The number of tristate buffers for a configurable fully meshed network with r PRRs/RTRMs is 2 · r · (r − 1)² · w. For a schematic view of an outgoing and incoming switch implemented with tristate buffers, cf. Figure 7. The implementation of a fully meshed silicon network for 16 to 24 RTRMs with 40 bit port width is feasible when compared to the number of transistors used for Virtex-5 (1.1 billion) and Altera Stratix IV (2.5 billion) FPGAs. An implementation with 16 (24) RTRMs and 40 bit width (with registers for buffers and config for switches) using tristate buffers takes about 1.4‰ (4.4‰) of the Virtex-5 FPGA transistors, or about 2.4‰ (8.1‰) for the multiplexer solution. Table 1 shows the number of transistors needed for the silicon implementation of a configurable, fully meshed network on chip for different sizes of RTR grids with 40 bit communication port width. The transistors used for the switches and the D-Flip-Flops (D-FF) for buffers and config registers are listed in detail. The fully meshed network is a silicon hard block and is exposed to an RTRM through communication channel hard blocks (primitives), one for each communication partner. The hard blocks of the communication ports must be located within a PRR or clock region of an FPGA. The sending and receiving ports can be equally distributed over the PRR (cf. Figure 8.a) or placed near the center of the FPGA (cf. Figure 8.b), which is the preferred variant because it allows additional RTRMs working at a slower speed behind a master RTRM. Building larger PRRs allows channel combining to double the bandwidth (cf. Figure 9.a and Figure 9.b for 8 RTRMs with 16 communication channels). The silicon solution provides a high speed, i.e. low jitter, low latency, path compared to an FPGA-routed fully meshed network design, which needs up to 10 ns (cf. [14]) to reach the most distant RTRMs.

4.5. HDL Design

RTRMs which want to utilize the configurable, fully meshed network have to instantiate the communication channel primitive COM_CHANNEL for each connection to a distant RTRM (cf. Figure 10) in HDL (e.g. Verilog or VHDL). This is comparable with the instantiation of a BlockRAM (BRAM) or multiplier
[Figure 7 labels: RTRM 1; com-port 1, com-port 2, com-port 3, each with reg; clk; config]

Figure 7. Tristate buffer solution for the sending and receiving switch in a RTR grid with a cFMN

[Figure 8 panels: (a) Location of hard blocks equally distributed inside a PRR; (b) Location of hard blocks near the center of the FPGA]

Figure 8. Location of com-ports as hard blocks for a RTR grid with 16 PRRs/RTRMs

[Figure 9 panels: (a) Location of hard blocks equally distributed inside a PRR; (b) Location of hard blocks near the center of the FPGA]

Figure 9. Location of com-ports as hard block for a RTR grid with 8 PRRs/RTRMs

cFMN (40 bit/port), #PRRs                     2      4      8      16     24
#transistors for tristate sol. (mil.)         0.00   0.01   0.13   1.15   4.06
#transistors for multiplexer sol. (mil.)      0.00   0.02   0.25   2.30   8.13
#transistors for regs (D-FF) (mil.)           0.00   0.02   0.07   0.31   0.71
#transistors for config (D-FF)                0      768    5376   30T    88T
Σ transistors for tristate sol. (mil.)        0.00   0.03   0.20   1.49   4.86
Σ transistors for muxer sol. (mil.)           0.00   0.04   0.33   2.64   8.92
Virtex-5 transistors tristate util. (%)       0.00   0.00   0.01   0.14   0.44
Virtex-5 transistors muxer util. (%)          0.00   0.00   0.03   0.24   0.81

Table 1. Resource utilization for RTR grids with different amount of PRRs using a NoC - both with 40 bit / com-port
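The rows of Table 1 can be approximately reproduced from the two switch formulas given in the text, 2 · r · (r − 1) · w multiplexers and 2 · r · (r − 1)² · w tristate buffers (reading the exponent as a square). The following is only a sketch under stated assumptions: the per-element transistor costs (4 per tristate buffer, 8 per multiplexer input, 16 per D-FF) and the number of config bits per channel end (ceil(log2(r − 1))) are not stated in the text; they were chosen because they match the table values. The function name `cfmn_transistors` is hypothetical.

```python
# Per-element transistor costs. ASSUMPTIONS inferred by fitting Table 1;
# the text does not state them.
T_PER_TRISTATE = 4       # transistors per tristate buffer
T_PER_MUX_INPUT = 8      # transistors per input of an (r-1)-to-1 multiplexer
T_PER_DFF = 16           # transistors per D flip-flop
VIRTEX5 = 1_100_000_000  # Virtex-5 transistor count quoted in the text


def cfmn_transistors(r, w=40):
    """Estimate transistor counts of a cFMN with r PRRs and w-bit com-ports."""
    ends = 2 * r * (r - 1)                  # sending plus receiving channel ends
    tristate = ends * (r - 1) * w * T_PER_TRISTATE  # 2*r*(r-1)^2*w buffers
    mux = ends * w * (r - 1) * T_PER_MUX_INPUT      # 2*r*(r-1)*w multiplexers
    regs = ends * w * T_PER_DFF                     # one D-FF per buffered bit
    select_bits = (r - 2).bit_length()      # ceil(log2(r-1)) per end (assumed)
    config = ends * select_bits * T_PER_DFF
    return {
        "tristate": tristate, "mux": mux, "regs": regs, "config": config,
        "total_tristate": tristate + regs + config,
        "total_mux": mux + regs + config,
    }


for r in (2, 4, 8, 16, 24):
    t = cfmn_transistors(r)
    print(r, t["total_tristate"], t["total_mux"],
          f"{1000 * t['total_tristate'] / VIRTEX5:.1f} permille")
```

Under these assumptions, r = 16 yields 1,489,920 transistors for the tristate variant and 2,641,920 for the multiplexer variant, which agrees with the 1.49 and 2.64 million in Table 1 and with the quoted shares of about 1.4‰ and 2.4‰ of a Virtex-5.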
com_channel_inst : COM_CHANNEL
  generic map (
    CHANNEL_NUMBER => number_per_module
  )
  port map (
    data_send    => rtrm_com_data_out,  -- data leaving this RTRM
    data_receive => rtrm_com_data_in,   -- data arriving at this RTRM
    config       => config,
    clk          => clk
  );
configurable logic on a FPGA and a dedicated silicon network on chip solution. Due to the long synthe-
tested on a host system in conjunction with a Xilinx XC5VLX50T FPGA PCI Express card.

6. Conclusion

In this paper a solution has been presented for the arbitrary placement of RTRMs in an RTR grid on DPR-capable FPGAs based on a dedicated silicon communication network. The implementation, configuration and utilization of the dedicated, silicon, configurable, fully meshed network was shown. The silicon network provides a high speed, i.e. low jitter, low latency, solution and consumes only a tiny number of transistors compared to an implementation with FPGA resources. The communication infrastructure allows different applications built on RTRMs to coexist and communicate without interference, in contrast to other known bus or routed network solutions for FPGAs.

The communication channels are customizable to the requirements of the application, which is able to run self-defined protocols over the channels. For real-time applications the communication channels provide a constant and low latency. Additionally, the run-time environment system for managing such a network has been presented for different fields of applications.

7. Future Work

In this paper we have concentrated on a network design with only one network clock frequency for all modules. Communication channels working at different clock speeds seem to be promising.

8. Acknowledgment

The project is performed in collaboration with the Center of Advanced Study Böblingen, IBM Deutschland Research & Development GmbH in Germany.

References

[1] N. A. Woods and T. VanCourt, “FPGA Acceleration of Quasi-Monte Carlo in Finance,” in FPL. IEEE, 2008, pp. 335–340.

[2] G. L. Zhang, P. H. W. Leong, C. H. Ho, K. H. Tsoi, C. C. C. Cheung, D.-U. Lee, R. C. C. Cheung, and W. Luk, “Reconfigurable Acceleration for Monte Carlo Based Financial Simulation,” in FPT, G. J. Brebner, S. Chakraborty, and W.-F. Wong, Eds. IEEE, 2005, pp. 215–222.

[3] J. Strunk, A. Heinig, T. Volkmer, W. Rehm, and H. Schick, “Run-Time Reconfiguration for Hypertransport coupled FPGAs using ACCFS,” in Proceedings of the Workshop on HyperTransport Research and Applications (WHTRA), 2009.

[4] P. Chen and A. Ye, “The effect of sparse switch patterns on the area efficiency of multi-bit routing resources in field-programmable gate arrays,” in Field Programmable Logic and Applications, 2008. FPL 2008. International Conference on, Sept. 2008, pp. 427–430.

[5] Xilinx, “Two flows for partial reconfiguration: Module based or difference based,” in Application Note: Virtex, Virtex-E, Virtex-II, Virtex-II Pro Families (XAPP290), 2004.

[6] J. Hagemeyer, B. Kettelhoit, M. Koester, and M. Porrmann, “Design of Homogeneous Communication Infrastructures for Partially Reconfigurable FPGAs,” in ERSA. CSREA Press, 2007.

[7] D. Koch, C. Beckhoff, and J. Teich, “ReCoBus-Builder a Novel Tool and Technique to Build Statically and Dynamically Reconfigurable Systems for FPGAs,” in Proceedings of International Conference on Field-Programmable Logic and Applications (FPL 08), Heidelberg, Germany, 2008.

[8] D. Koch, C. Beckhoff, and J. Teich, “A Communication Architecture for Complex Runtime Reconfigurable Systems and its Implementation on Spartan-3 FPGAs,” in Proceedings of the 17th ACM/SIGDA International Symposium on Field-Programmable Gate Arrays (FPGA 2009). Monterey, California, USA: ACM, Feb. 2009, pp. 233–236.

[9] J. Surisi, C. Patterson, and P. Athanas, “An efficient run-time router for connecting modules in FPGAs,” in Proceedings of International Conference on Field-Programmable Logic and Applications (FPL 08), Heidelberg, Germany, 2008.

[10] T. Pionteck, C. Albrecht, K. Maehle, E., Hübner, M., and Becker, J., “Communication Architectures for Dynamically Reconfigurable FPGA Designs,” in Proceedings of IEEE International Parallel and Distributed Processing Symposium, IPDPS USA, 2007.

[11] E. Lubbers and M. Planner, “ReconOS: An RTOS Supporting Hard- and Software Threads,” Field Programmable Logic and Applications, 2007. FPL 2007. International Conference on, pp. 441–446, Aug. 2007.

[12] H. K.-H. So and R. Bordersen, “File System Access From Reconfigurable FPGA Hardware Processes In BORPH,” in Proceedings of the 2008 IEEE International Conference on Field-Programmable Logic, FPL 2008, 8-10 September, Heidelberg. IEEE, 2008.

[13] “Virtex-II Platform FPGAs: Complete Data Sheet,” Xilinx DS031, vol. 3.5, p. 20, 2007.

[14] J. Strunk, T. Volkmer, K. Stephan, W. Rehm, and H. Schick, “Impact on Run-Time Reconfiguration on Design and Speed - A Case Study Based on a Grid of Run-Time Reconfigurable Modules inside a FPGA,” in Proceedings of the Reconfigurable Architectures Workshop (RAW) / IPDPS, 2009.