
International Conference on Computing, Communication and Automation (ICCCA2016)

FPGA Based Design of Area Efficient Router Architecture for Network on Chip (NoC)
Mayank Kumar, Kishore Kumar, Sanjiv Kumar Gupta and Yogendera Kumar
VLSI Division, School of Electrical, Electronics and Communication Engineering
Galgotias University, Plot no. 2, Sector 17-A, Yamuna Expressway
Greater Noida, 201301 (UP) India
vlsi.mayank09@gmail.com, kishore.9953@gmail.com, sanjiv.gupta.iete@gmail.com, yogiuor@gmail.com
Abstract - An FPGA-based design of an area-efficient router architecture for NoC is proposed in the present work. Design entry of the proposed router is done using the Verilog Hardware Description Language (Verilog HDL). The designed router has four channels (east, west, north and south). Each channel consists of first-in first-out (FIFO) buffers and multiplexers. The buffers store data in binary form and the multiplexers control the data inputs and outputs. After the channels were designed, a crossbar switch was designed and all the components were integrated to form the complete router architecture. The Modelsim simulator is used to simulate the proposed router and Xilinx ISE 14.1 is used to obtain the RTL view of the proposed design. The proposed design is synthesized for a SPARTAN-6 FPGA. In the proposed work the area of the router is reduced by reducing the number of LUTs. The number of LUTs used in the crossbar switch is obtained from the synthesis report. The results show that the proposed router is area efficient.

Keywords - System on chip (SoC), Network on Chip (NoC), Router, Buffers, FIFO, Crossbar, LUTs.

I. INTRODUCTION

NoC is a technology envisioned to remove the shortcomings of buses. It is an approach to designing the communication subsystem between IP cores in a SoC design [1-2]. Systems on chip use dedicated buses for communication among the various IP cores. These buses do not provide enough flexibility for the communication. NoC is an alternative paradigm that removes the problems related to buses by using a communication network of switches/routers connecting the IP cores [3-4]. Although systems on chip designed using NoCs are becoming popular, solve problems of bus-based designs and are considered the future of ASIC design [5-6], these designs face several design problems. The first is choosing a suitable topology for the target NoC such that the design constraints and performance needs are satisfied. The second is the design of the network interfaces that access the on-chip network and of the routers that provide the physical interconnection channels to transport data between processing devices. The third is the choice of communication protocols suitable for on-chip interconnection networks. Finally, as technology scales and switching speed grows, future networks on chip become more prone to faults and errors. Fault tolerance is becoming critical for on-chip communications [7]. A NoC design takes comparatively more space than a bus-based solution, as various routing and arbitration approaches, as well as various organizations of the communication infrastructure, can be implemented. The NoC paradigm is well suited to making SoC platforms adaptable and scalable over several technology generations. NoC platforms may allow design productivity to increase as fast as technology capabilities and may eventually close the design productivity gap [8]. Moreover, the inherent redundancy of a NoC helps to tolerate faults and deal with communication bottlenecks [9]. As small area is an important requirement for current NoCs, the present work focuses on reducing the area of the routers used in NoCs.

II. LITERATURE OVERVIEW

A new pipelined router design by Antoni Roca et al [10] focuses on minimizing the router latency. First, the router components that bound the router frequency were identified by examining the critical paths. The component that limits the performance of the router is the arbiter, so the designers split it into multiple smaller arbiters. L. Benini et al [11] proposed a solution to the critical communication problem between multiple IP cores. The original router architecture was embedded in the system-on-chip interconnect network (NoC), which has a parametric router architecture. Noopur Sharma et al [12] compared the packet delay and outcomes of the XY routing algorithm and the OE routing algorithm. Pan Hao et al [13] addressed the problem of hampered communication and clocking in the architecture. Debora Matos et al [14] designed an architecture in which the buffer size is reconfigurable. With this method excessive latency and power dissipation can be reduced, and the architecture is also area efficient. Phi Hung et al [15] described how various components are integrated in the NoC architecture and how reconfigurable components, such as IP cores and fixed IPs, are modeled. That work presents a new design approach for customizing the routers in a network-on-chip for reconfigurable systems.

ISBN: 978-1-5090-1666-2/16/$31.00 ©2016 IEEE


M. Vestias et al [16] proposed a new approach to intra-chip network architecture. The work discusses that computation using 16-bit and 32-bit data can be considered in the future, given the trade-off between physical cost and real-time demands. The designers used a folded torus topology with sixty-four processing elements for intra-chip communication to provide guaranteed throughput with deadlock-free, livelock-free and in-order data delivery, which is suitable for NoC-based real-time processing applications. A test chip was fabricated with this approach in 1P6M 130 nm CMOS technology. The work of D. Bertozzi et al [17] presents a method to increase throughput. The complete NoC design is partitioned into steps such as topology selection, implementation and execution. Topology selection is the main part of NoC design. The performance of the NoC depends on the topology because the routing algorithm is designed according to the selected topology. This work demonstrates automatic topology generation. Minseon Ahn et al [18] introduced the concept of the pseudo-circuit, which is used to accelerate on-chip communication. This technique is useful for interconnection networks.

III. PROPOSED FF DESIGN

The proposed architecture uses the concept of borrowing buffers from neighbor channels that are not using them at a particular time. This reduces the need for large buffer depths and improves the overall performance of the router. The proposed architecture allows a different buffer size to be configured for each channel. Depending on the buffer depth it needs, a channel occupies the empty buffers of a neighboring channel. Using the neighbor's empty buffer slots reduces the connection cost. In this way each channel may have up to three times more buffer slots than its original buffer, whose size is defined at design time. In the proposed architecture 8:1 multiplexers are used instead of 4:1 multiplexers to allow the programming process. Fig. 1 shows the architecture of the South Channel of the proposed work. In the proposed architecture a buffer depth of eight is used for each channel. As shown in Fig. 2, each channel consists of five multiplexers. Two of these multiplexers control the incoming and outgoing data; the remaining three are needed by the FIFO to control the read and write process, and the size of these controlling multiplexers increases with the depth of the buffer. The finite state machine of the FIFO manages these multiplexers. To reduce the routing area, the proposed router architecture adopts the strategy of reducing the total number of Lookup Tables (LUTs).

In this design only the North, South, East and West channels can use adjacent channels. The use of adjacent channels by the local channel is not considered.

Fig. 1. Proposed South Channel

If the loan granularity is increased, only small changes occur in the area of the NoC router, because the FIFO control circuit controls the FIFO slots. FIFOs are used to implement the buffers; the FIFO pointers are incremented to each new slot, and this is not affected by the granularity. A loan granularity of more than one slot degrades performance, and area and power are not reduced beyond a limit.

Fig. 2 shows the complete router architecture, which consists of all the channels of the proposed router. Each channel receives three data inputs. To understand the working of the router, consider the south channel. It has its own input (din S), the left-side neighbor input (din W) and the right-side neighbor input (din E). To understand the operation, suppose the buffers have a depth of eight and the router needs to be reconfigured as follows: the South Channel requires a buffer depth of 15, the East Channel requires a buffer depth of 4, the West Channel requires a buffer depth of 5, and the North Channel requires a buffer depth of 4. In this case the South Channel, which has eight slots of its own, has to borrow seven buffer slots from its neighbor channels. As the East Channel occupies four of its eight slots, it can lend four slots to its neighbor, the South Channel; the South Channel then still requires three more buffer slots.

Only five slots are occupied by the West Channel, so its three empty slots can be borrowed by the South Channel. A flit of the South Channel that was stored in the East Channel is sent to the output as follows: the flit travels from the East Channel towards the South Channel (d E S) and leaves through the output of the South Channel (dout S) via a multiplexer. The outputs of the south channel are its own output (dout S) and two other outputs (d S E and d S W), which carry flits stored in the south channel on behalf of its neighbors.
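The slot arithmetic of this example can be checked with a short sketch. This is only an illustrative model, not the authors' Verilog: the function name `borrow_plan`, the dictionary layout and the order in which neighbors are asked (East first, then West, as in the text) are assumptions made for this example.

```python
def borrow_plan(needs, channel, neighbors, depth=8):
    """Return {neighbor: slots_lent} so that `channel` reaches its required depth.

    needs     -- required buffer depth per channel, e.g. {"S": 15, "E": 4, ...}
    channel   -- the channel that may need extra slots
    neighbors -- adjacent channels, in the order they are asked (assumed order)
    depth     -- physical buffer depth of every channel (eight in the paper)
    """
    shortfall = max(0, needs[channel] - depth)
    plan = {}
    for name in neighbors:
        free = max(0, depth - needs[name])   # slots the neighbor does not use
        lent = min(free, shortfall)
        plan[name] = lent
        shortfall -= lent
    if shortfall:
        raise ValueError(f"{channel} is still short of {shortfall} slot(s)")
    return plan


# Requirements taken from the worked example in the text.
needs = {"S": 15, "E": 4, "W": 5, "N": 4}
plan = borrow_plan(needs, "S", ["E", "W"])
print(plan)  # {'E': 4, 'W': 3}
```

With its own eight slots plus four from East and three from West, the South Channel reaches the required depth of 15, which is also below the stated upper bound of three times the original depth.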


Fig. 2. Proposed Router Architecture

In the proposed work a channel is reconfigured with the help of the buffers of the neighboring channels. To meet the need for extra buffer slots, the neighbor channels' buffers are used slot by slot when those slots are free. The proposed router architecture is thus designed with a buffer depth of eight, three 8:1 multiplexers and a fixed-priority arbiter. In our proposal we reduced the number of LUTs of the fixed-priority arbiter, which helps us obtain the proposed area-efficient NoC router architecture.

A. Fixed-Priority Arbiter
We have designed the NoC router architecture using the fixed-priority arbiter technique. We have also increased the buffer depth by using 8:1 multiplexers and a control register. In a router, mediating access to a shared resource between multiple agents is one of the fundamental operations performed by the control logic. In particular, if one or more agents request access to the shared resource, one of them will receive a grant. We took advantage of this property to simplify the logic in cases where we need to know whether a resource was granted but not which agent it was granted to. With a fixed-priority arbiter, resources are shared according to a predetermined priority order. A straightforward implementation using a linear array of basic bit cells is shown in Fig. 3. Each cell generates a grant gi if both its request input ri and the incoming priority signal ci are asserted. Moreover, the incoming priority signal is propagated to the next cell only if ri is not asserted. This design minimizes the hardware complexity; nevertheless, the delay of the path from the first request to the last grant scales linearly with the number of inputs, as indicated by the dashed arrow in Fig. 3.

Fig. 3. Fixed priority arbiter

If a large number of inputs must be supported, we can improve the delay by taking advantage of the fact that the logic equations for the gi and ci+1 outputs are structurally identical to those for a binary half adder's sum and carry outputs, respectively. As such, it is possible to transform the design shown in Fig. 3 into an equivalent prefix network that computes the propagation conditions for the initial priority signal hierarchically, causing the delay to scale logarithmically with the number of inputs. Fig. 4 shows the general architecture of the fixed-priority arbiter. In the proposed work we have used the fixed-priority arbiter shown in Fig. 4. The fixed-priority arbiter schedules packets that have the same priority and the same destination output port.
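The two arbiter structures described above can be sketched behaviourally as follows. This is a software model for illustration, not the synthesized Verilog: it assumes that index 0 is the highest-priority input and that the initial priority signal c0 is asserted, and the function names are hypothetical. The cell equations used are gi = ri AND ci and ci+1 = ci AND NOT ri, as stated in the text; the prefix version uses a Kogge-Stone style prefix OR as one possible prefix network.

```python
from itertools import product


def ripple_arbiter(requests):
    """Linear array of bit cells: g_i = r_i AND c_i, c_(i+1) = c_i AND NOT r_i."""
    grants = []
    carry = True  # c_0: priority enters at input 0 (assumed highest priority)
    for r in requests:
        r = bool(r)
        grants.append(r and carry)
        carry = carry and not r  # priority passes on only if r_i is not asserted
    return grants


def prefix_arbiter(requests):
    """Same grants, computed with a parallel-prefix OR in about log2(n) steps."""
    reqs = [bool(r) for r in requests]
    n = len(reqs)
    seen = list(reqs)  # becomes seen[i] = OR of reqs[0..i]
    step = 1
    while step < n:
        # every position combines with the value `step` places before it
        seen = [seen[i] or (i >= step and seen[i - step]) for i in range(n)]
        step *= 2
    # input i wins if it requests and no higher-priority input requested
    return [reqs[i] and not (i > 0 and seen[i - 1]) for i in range(n)]


example = [False, True, False, True, True]
print(ripple_arbiter(example))  # [False, True, False, False, False]

# Both structures agree for every request pattern on five inputs.
all_agree = all(
    ripple_arbiter(p) == prefix_arbiter(p) for p in product([False, True], repeat=5)
)
```

In the ripple version the loop body corresponds to one bit cell, so the longest path visits every cell; in the prefix version the number of combining steps grows with the logarithm of the number of inputs.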

Fig. 4. General Architecture of Fixed-Priority Arbiter

IV. RESULTS & DISCUSSION

A. South Channel operation

The south channel of the proposed NoC router architecture with the fixed-priority arbiter was simulated on the Modelsim simulator version 10.4a and synthesized successfully on Xilinx ISE Design Suite version 14.7. The simulated waveforms in Fig. 5 verify that this channel is ready to store data at its buffer locations and, after storing the data, is ready to communicate with the other channels.

Fig. 5. South Channel: Simulation Waveform

B. Fixed-Priority Arbiter: RTL Schematic View

The RTL schematic view of the fixed-priority arbiter is shown in Fig. 6. We have designed the crossbar with the fixed-priority arbiter technique, where the crossbar has five inputs (South, East, West, North and Local) and five outputs. The input selected according to the set priority appears at the crossbar output terminal. The RTL schematic is obtained using Xilinx ISE Design Suite ver. 14.7.

Fig. 7 shows the RTL view of the south channel and Fig. 8 shows the RTL schematic view of the complete proposed router architecture. In the RTL schematic view of the complete router architecture, the control signal for each channel is 32 bits wide (31:0) and drives all the multiplexers so that they pass the selected input to the output terminal. The clock signal provides the clock pulse to the different blocks, and the reset signal resets all the signals of the router to their initial state. The channel outputs are connected to the channel inputs of the crossbar, and the crossbar outputs are controlled by the arbiter. The read and write signals of the flit storage are used only for reading and writing. Each channel has three-bit (2:0) flit storage.

As shown in Fig. 8, the router consists of four channels and a fixed-priority arbiter. Each channel has three 8:1 multiplexers, which are used for writing and reading flits, and two 4:1 multiplexers, of which one is used for writing flits to the buffer and one for reading the stored flits.

Fig. 7. RTL Schematic View of South Channel
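How a five-input, five-output crossbar can be steered by fixed-priority arbitration is sketched below as a behavioural model. The priority order (South, East, West, North, Local, following the order in which the text lists the inputs), the function name `crossbar` and the dictionary-based interface are assumptions for illustration; the paper does not state the actual priority order or port encoding.

```python
PORTS = ["S", "E", "W", "N", "L"]  # assumed fixed priority, highest first


def crossbar(requests, flits):
    """Resolve one cycle of a 5x5 crossbar with fixed-priority arbitration.

    requests -- {input_port: requested_output_port}
    flits    -- {input_port: flit value at the head of that input}
    Returns {output_port: (granted_input_port, flit)}.
    """
    outputs = {}
    for inp in PORTS:                      # visit inputs in priority order
        dest = requests.get(inp)
        if dest is not None and dest not in outputs:
            outputs[dest] = (inp, flits[inp])   # first requester wins the output
    return outputs


# South and East both want the North output; South has higher priority here.
requests = {"S": "N", "E": "N", "W": "L"}
flits = {"S": 0b101, "E": 0b011, "W": 0b110}   # three-bit flits, as in the text
result = crossbar(requests, flits)
print(result)  # {'N': ('S', 5), 'L': ('W', 6)}
```

The East input loses the contention in this cycle and would keep its flit buffered until the North output is free.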

Fig. 8. RTL Schematic View of complete Router Architecture

Fig. 9. Complete Router: Simulation Waveform

C. Area calculation

To obtain the area-efficient architecture for the NoC router, the same target device, XC6SLX4-3TQG144, is used throughout in Xilinx ISE Design Suite ver. 14.7, and all results are synthesized on this device. We calculate the number of LUTs of the original router's crossbar [19] and of our proposed router's crossbar. A design that uses fewer LUTs takes less space to implement. Table 1 shows the LUT counts of the original crossbar and the fixed-priority arbiter, where the original crossbar takes 35 LUTs to implement and our fixed-priority arbiter takes 5 LUTs. This is a reduction of 30 LUTs when the fixed-priority arbiter is used.

Table 1. Comparison of LUTs between Original Router and Proposed Router Architecture.

Resources      Original Router Architecture   Proposed Router Architecture
Slices         92                             92
LUTs           235                            204
Flip Flops     92                             92
Bonded IOBs    201                            89

V. CONCLUSION
The current work aims at an area-efficient design of a router for NoC applications. The router is an important component of a NoC design because it determines various network parameters such as latency, throughput and delay. In the proposed work the baseline router architecture is used and the router is designed for five inputs and five outputs.
The simulation was done using Modelsim version 10.4a and the synthesis was done using the Xilinx ISE Design Suite version 14.7 tool. Comparing the proposed architecture with the baseline architecture shows that the proposed architecture performs better than the baseline. It has constant delay, constant latency and high throughput. In addition, it supports concurrent transmission, which gives it more flexibility than the baseline architecture, and it is less error prone. The proposed architecture occupies less area than the baseline architecture.

REFERENCES
[1] A. Bhanwal, M. Kumar, Y. Kumar, FPGA based Design of Low Power Reconfigurable Router for Network on Chip (NoC), IEEE International Conference on Computing, Communication and Automation (ICCCA2015), pp. 1320-1326, 2015.
[2] B. Attia, W. Chouchene, A. Zitouni, N. Abid, and R. Tourki, A Modular Router Architecture Design For Network on Chip, 8th IEEE International Multi-Conference on Systems, Signals & Devices, pp. 493-495, 2011.
[3] International Technology Roadmap for Semiconductors, report 2012. Online. Available: http://www.itrs.net/.
[4] R. Saleh, S. Mirabbasi, A. Hu, M. Greenstreet, G. Lemieux, P. P. Pande, C. Grecu, and A. Ivanov, System-on-Chip: Reuse and Integration, Proceedings of the IEEE, vol. 94, no. 6, pp. 1050-1069, Jun. 2006.
[5] T. Bjerregaard and S. Mahadevan, A Survey of Research and Practices of Network-on-Chip, ACM Computing Surveys, vol. 38, no. 1, pp. 1-51, 2006.
[6] L. Benini and G. D. Micheli, Networks on chips: a new SoC paradigm, Comput., vol. 35, no. 1, pp. 70-78, 2002.

[7] R. Holsmark and M. Högberg, Modelling and Prototyping of a Network on Chip, Master of Science Thesis, Electronics, 2002. Online. Available: http://hem.fyristorg.com/.
[8] M. Ali, M. Welzl, and M. Zwicknagl, Networks on Chips: Scalable Interconnects for Future Systems on Chips, 4th IEEE European Conference on Circuits and Systems for Communications, pp. 240-245, 2008.
[9] M. Pirretti, G. M. Link, R. R. Brooks, N. Vijaykrishnan, M. Kandemir, and M. J. Irwin, Fault tolerant algorithms for network-on-chip interconnect, Proceedings. IEEE Computer Society Annual Symposium on VLSI, pp. 46-51, Feb. 2004.
[10] A. Roca, J. Flich, F. Silla, J. Duato, A Latency-Efficient Router Architecture for CMP Systems, 2010 13th Euromicro Conference on Digital System Design: Architectures, Methods and Tools, pp. 165-172, 2010.
[11] L. Benini and G. D. Micheli, Network on chips: a new SoCs paradigm, IEEE Computer, vol. 35, no. 1, pp. 70-78, Jan. 2002.
[12] N. Sharma, S. Gadag, An Efficient Way to Increase Performance by Using Low Power Reconfigurable Routers, IOSR Journal of Electronics and Communication Engineering (IOSR JECE), Volume 8, Issue 6, pp. 39-44, Nov.-Dec. 2013.
[13] P. Hao, H. Qi, D. Jiaqin, P. Pan, Comparison of 2D MESH Routing Algorithm in NoC, IEEE 9th International Conference, pp. 791-795, 2011.
[14] D. Matos, C. Concatto, M. Kreutz, F. Kastensmidt, L. Carro, and A. Susin, Reconfigurable Routers for Low Power and High Performance, IEEE Transactions on Very Large Scale Integration (VLSI) Systems, vol. 19, no. 11, pp. 2045-2057, Nov. 2011.
[15] P. H. Pham, P. Mau and C. Kim, A 64-PE Folded-Torus Intra-chip Communication Fabric for Guaranteed Throughput in Network-on-Chip Based Applications, IEEE Custom Integrated Circuit Conference (CICC), pp. 645-648, 2009.
[16] M. Vestias and H. Neto, Router design for application specific network-on-chip on reconfigurable systems, Field Program. Logic Appl., vol. 1, pp. 389-394, 2007.
[17] D. Bertozzi, A. Jalabert, M. Srinivasan, R. Tamhankar, S. Stergiou, L. Benini, and G. D. Micheli, NoCs synthesis flow for customized domain specific multiprocessor systems-on-chip, IEEE Trans. Parallel Distrib. Syst., vol. 16, no. 2, pp. 113-129, Feb. 2005.
[18] M. Ahn and E. Jung Kim, Pseudo-Circuit: Accelerating Communication for On-Chip Interconnection Networks, in Proceedings of the 43rd Annual IEEE/ACM International Symposium on Microarchitecture, pp. 399-408, 2010.
[19] D. Matos, C. Concatto, M. Kreutz, F. Kastensmidt, L. Carro and A. Susin, Reconfigurable routers for low power and high performance, IEEE Transactions on Very Large Scale Integration (VLSI) Systems, vol. 19, no. 11, pp. 2045-2057, Nov. 2011.
