NOC-torus-Folded: New Distributed Networks Architecture

Teamour Esmaeili
Dept. of Computer Engineering
DareShahr Branch,
Islamic Azad University, Iran
Ghazal Lak
DareShahr Branch,
Akram Noori Rad
DareShahr Branch,
Abstract: The folded torus interconnection topology is widely used in massively parallel machines. Defects in the
manufacturing of integrated circuits are almost inevitable, and rapid technology scaling has made the components of a
Network-on-Chip (NoC) more susceptible to faults. Therefore, it is crucial to sustain chip production yield and reliable
operation in the presence of defects.
Index Terms: Network on Chip (NoC), NOC-torus-Folded, NS-2, simulation.
1 INTRODUCTION
In chip multiprocessors (CMPs), data access latency
depends on the memory hierarchy organization, the
on-chip interconnect (NoC), and the running workload.
Reducing data access latency is vital to achieving per-
formance improvements and scalability of threaded ap-
plications. Multithreaded applications generally exhibit
sharing of data among the program threads, which gen-
erates coherence and data traffic on the NoC.
Many NoC designs exploit communication locality to
reduce communication latency by configuring special fast
paths on which communication is faster than the rest of
the NoC. Communication patterns are directly affected by
the cache organization. However, many cache organiza-
tions are designed in isolation of the underlying NoC or
assume a simple NoC design, thus possibly missing
optimization opportunities. In this work, we present a
NoC-aware design to reduce data access latency, improve
network utilization, and improve overall system
performance.
The number of processor, memory and accelerator
cores on systems-on-chip is rapidly increasing to support
evolving standards and new applications. Computation
and communication complexity is skyrocketing, and scal-
ability centric design paradigms are critically needed [1].
Networks-on-Chip (NoCs) have emerged as the best
alternative for providing high-performance communication
in future Systems-on-Chip (SoCs) with dozens of
cores integrated on a single silicon die. Mapping an
application onto the on-chip network is the first and most
important step in the design flow, as it dominates the overall
performance and cost [2]. Several approaches to topological
mapping in NoCs have been proposed in the
literature [3].
Mapping algorithms mostly focus on the 2D mesh
topology, which is the most popular topology in NoC
design due to its layout efficiency, good electrical properties,
and simplicity in addressing on-chip resources. Another
concern in NoC implementation is selecting an efficient
routing strategy while providing freedom from deadlock.
The routing algorithm determines the path that each
packet follows between a source-destination pair. In
future chip generations, faults will appear with increasing
probability due to the susceptibility of shrinking feature
sizes to process variability, age-related degradation,
crosstalk, and single-event upsets. To sustain chip pro-
duction yield and reliable operation, very large numbers
of faults will have to be tolerated [4, 5]. This argument
strengthens the notion that chips need to be designed
with some level of built-in fault tolerance. Furthermore,
relaxing the requirement of 100% correctness in the
operation of various components and on-chip channels
profoundly reduces the manufacturing cost as well as the
cost incurred by test and verification [6].
Multi-core processor performance is dependent on the
data access latency, which is highly dependent on the
design of the on-chip interconnect (NoC) and the organi-
zation of the memory caches. The cache organization af-
fects the distance between where a data block is stored on
chip and the core(s) accessing the data. The cache organi-
zation also affects the utilization of the cache capacity,
which in turn affects the number of misses that require
the costly off-chip accesses. As the number of cores in the
system increases, the data access latency becomes an even
greater bottleneck.
In the domain of network-on-chip (NoC), previous
research has attempted to reduce communication latency by
a variety of approaches. One approach is based on
reducing the global hop count [7, 8, 9, 10, 11, 12]. Another
approach provides fast delivery of high-priority cache
blocks [13]. A third approach configures fast paths or
circuits, which may be optical, through the NoC such that
traffic traveling on these fast paths enjoys lower latency
than regular traffic [14, 15, 16]. Communication locality is
exploited in [14, 15, 17] such that the communication over
a subset of source-destination node pairs is given higher
priority than the rest of the traffic, and explicit circuits
connecting the source-destination pairs are configured to
carry this higher-priority traffic.
JOURNAL OF COMPUTING, VOLUME 4, ISSUE 6, JUNE 2012, ISSN (Online) 2151-9617
https://sites.google.com/site/journalofcomputing
WWW.JOURNALOFCOMPUTING.ORG 154
Static non-uniform cache architecture (S-NUCA) [18]
and Private [19] caches represent the two ends of the
cache organization spectrum. However, neither of them is
a perfect solution for CMPs. S-NUCA caches make better
use of cache capacity, given that only one copy of
a data block is retained in the cache, but suffer from
high data access latency since they interleave data blocks
across physically distributed cache banks, rarely
associating the data with the core or cores that use it. Private
caches allow fast access to on-chip data blocks but suffer
from low cache capacity utilization due to data
replication, thus resulting in many costly off-chip data accesses.
Many researchers have suggested hybrid cache organizations
that attempt to keep the benefits of both S-NUCA and
private caches while avoiding their shortcomings [20-26].
Most of these cache proposals assumed a simple 2-D
packet-switched mesh interconnect. Such interconnects
can be augmented with the ability to configure fast paths
[15-17].
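The static interleaving that makes S-NUCA placement independent of the requesting core can be illustrated with a minimal Python sketch. It assumes a simple block-address modulo mapping, 64-byte blocks, and 16 banks; these parameters and the function name are hypothetical and are not taken from [18], whose actual mapping may differ.

```python
def snuca_home_bank(address, block_size=64, num_banks=16):
    """Return the bank that statically holds the block containing `address`.

    Assumed mapping: block number modulo the number of banks.
    The requesting core plays no role in the placement.
    """
    block_number = address // block_size
    return block_number % num_banks


if __name__ == "__main__":
    # Consecutive blocks land in consecutive banks and wrap around.
    for addr in (0x0000, 0x0040, 0x0080, 0x0400):
        print(hex(addr), "-> bank", snuca_home_bank(addr))
```

Because the bank is a function of the address only, a core may have to reach a distant bank for data that only it uses, which is the latency drawback noted above.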
2 SYSTEM ARCHITECTURE
Network topology determines the connectivity among
nodes and is therefore a first-order determinant of
network performance and energy efficiency. Since the
ability of the network to disseminate information efficiently
depends largely on the topology, we focus in particular on
different types of topologies. Figure 1 shows the torus
NoC and folded torus topologies.
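The relationship between the two layouts can be illustrated with a short Python sketch. It assumes the usual folded-torus construction, in which the logical links are exactly those of a k x k torus and only the physical placement of the nodes changes: the nodes of each ring are interleaved so that no link spans more than two tile positions. The function names, the interleaving rule, and the unit tile distance are illustrative assumptions, not part of the simulation setup described later.

```python
def torus_links(k):
    """Undirected links of a k x k torus (assumes k >= 3); nodes are (row, col)."""
    links = set()
    for r in range(k):
        for c in range(k):
            links.add(frozenset({(r, c), ((r + 1) % k, c)}))  # vertical ring
            links.add(frozenset({(r, c), (r, (c + 1) % k)}))  # horizontal ring
    return links


def fold_position(i, k):
    """Physical slot of logical ring index i after folding (assumed interleaving)."""
    return 2 * i if 2 * i < k else 2 * (k - 1 - i) + 1


def max_link_length(k, folded):
    """Longest physical link, measured in tile positions (Manhattan distance)."""
    place = (lambda i: fold_position(i, k)) if folded else (lambda i: i)
    longest = 0
    for link in torus_links(k):
        (r1, c1), (r2, c2) = tuple(link)
        length = abs(place(r1) - place(r2)) + abs(place(c1) - place(c2))
        longest = max(longest, length)
    return longest


if __name__ == "__main__":
    k = 4
    print("links:", len(torus_links(k)))
    print("folded slots:", [fold_position(i, k) for i in range(k)])
    print("longest link, plain torus:", max_link_length(k, folded=False))
    print("longest link, folded torus:", max_link_length(k, folded=True))
```

For k = 4, the plain layout has wraparound links that span three tile positions, whereas the folded placement limits every link to at most two.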
3 SIMULATION METHODOLOGY
In the last few years, different network simulation tools
have been developed. One of the most popular is the ns2
Network Simulator (Breslau et al. 2000). Ns2 is an
open-source, event-driven network simulator. It provides
support for the simulation of IP-based networks. In particular, ns2
provides researchers with:
- Multicast and unicast routing protocols;
- Different transport protocols (TCP, UDP, RTP, etc.);
- Most common applications (FTP, Telnet, HTTP).
Localization of a network element (an agent in ns2
terminology) in a simulation scenario is a two-step process,
which requires:
- The localization of the node in which the agent must be
instantiated;
- The instantiation of the agent.
To support agent instantiation in large topologies we
have included in the extended NAM Editor the concept of
Node Set. A Node Set is a set of nodes selected according
to one of the following criteria:
- Leaf node,
- Mutual distance,
- Randomly.
When a network topology is created with one of the
supported topology generators, some Node Sets are
automatically created, reflecting the topology model. For
instance, in the case of a transit-stub topology, one Node Set is
associated with each transit domain and a different one with
each stub domain.
To make this tool even more flexible, we made the GUI of
our extended NAM Editor customizable by the end user.
3.1 Simulation Details
In this paper, we have modeled our architecture concepts
with the widely used network simulator ns-2 [4]. NS2 has
been widely applied in research related to the design and
evaluation of computer networks and to evaluate various
design options for architectures [27], including the design
of routers, communication protocols, etc.
Ns-2 [28] is a discrete event network simulator designed
for simulation of ordinary networks of computers. As
many models of network components are provided, the
user can simulate at a high abstraction level. Yet, it is
possible to implement new components in the network
model. ns-2 has support for local area networks, mobile
networks and even satellite networks. Two computer
languages are used in ns-2, namely C++ and OTcl.
Fig. 1. (a) Torus NoC topology (k=4); (b) folded torus (k=4)
We use the Network Simulator ns-2 [29], [30],
which has been used extensively in research on the
design and evaluation of public-domain computer networks,
to evaluate various design options for the NoC architecture,
including the design of the router, the communication protocol,
and the routing algorithms.
NS-2 is an open-source, object-oriented, discrete-event-driven
network simulator written in C++ and OTcl. It is
a very common and widely used tool for simulating small
and large area networks [31].
4 SIMULATION RESULTS
In this section, simulation results are presented. We have
simulated different levels of NOC-torus-Folded topologies,
which have a recursive structure, using the NS-2
simulator. Each topology is simulated at a different
size. The simulation figures are shown below.
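As an illustration of how such a topology can be described to NS-2, the following Python sketch generates an OTcl script that creates k x k nodes and connects them with torus links. It is a minimal sketch under stated assumptions, not the script used for the figures: the link bandwidth, delay, queue type, trace file name, and stop time are placeholder values, and the helper name is hypothetical. The folded variant has the same logical links, so only the node placement in the NAM view would differ.

```python
def torus_ns2_script(k, bandwidth="1Mb", delay="10ms", queue="DropTail"):
    """Return OTcl text that builds a k x k torus in ns-2 (assumes k >= 3)."""
    lines = [
        "set ns [new Simulator]",
        "set nf [open out.nam w]",   # placeholder NAM trace file
        "$ns namtrace-all $nf",
    ]
    # One ns-2 node per tile, indexed row-major.
    for node in range(k * k):
        lines.append(f"set n({node}) [$ns node]")
    # Each tile links to its right and lower neighbour, with wraparound.
    for r in range(k):
        for c in range(k):
            src = r * k + c
            right = r * k + (c + 1) % k
            down = ((r + 1) % k) * k + c
            for dst in (right, down):
                lines.append(
                    f"$ns duplex-link $n({src}) $n({dst}) {bandwidth} {delay} {queue}"
                )
    # Standard termination procedure: flush the trace and exit.
    lines += [
        "proc finish {} {",
        "    global ns nf",
        "    $ns flush-trace",
        "    close $nf",
        "    exit 0",
        "}",
        '$ns at 1.0 "finish"',       # placeholder stop time
        "$ns run",
    ]
    return "\n".join(lines)


if __name__ == "__main__":
    print(torus_ns2_script(4))
```

Calling the function with k = 4 and k = 6 yields scripts for the 16-node and 36-node cases considered below; traffic agents and applications would still have to be attached to the nodes.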
4.1 NOC-torus-Folded 4*4
Simulations with a large number of nodes may be displayed
in different views. For example, Figures 2 and 3
show different views of the NOC-torus-Folded topology,
each of which consists of 16 nodes.
4.2 NOC-torus-Folded 6*6
Figure 4 shows the NOC-torus-Folded topology, which
consists of 36 nodes.
Fig.2. The 1st view of 4*4 NOC-FOLDED-TORUS
Fig.3. The 2nd view of 4*4 NOC-FOLDED-TORUS
Fig.4. The 6*6 NOC-FOLDED-TORUS
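To put the two network sizes in perspective, the following Python sketch computes the diameter and the average minimal hop count of a k x k torus. It assumes shortest-path routing over the wraparound links and counts every ordered pair of distinct nodes; it is an analytical illustration with hypothetical function names, not an NS-2 measurement. Folding does not change these values, since the logical connectivity is the same.

```python
from itertools import product


def ring_distance(a, b, k):
    """Minimal number of hops between positions a and b on a ring of k nodes."""
    d = abs(a - b)
    return min(d, k - d)


def torus_hops(src, dst, k):
    """Minimal hop count between two (row, col) nodes of a k x k torus."""
    return ring_distance(src[0], dst[0], k) + ring_distance(src[1], dst[1], k)


def hop_statistics(k):
    """Return (diameter, average hops) over all ordered pairs of distinct nodes."""
    nodes = list(product(range(k), repeat=2))
    hops = [torus_hops(s, d, k) for s in nodes for d in nodes if s != d]
    return max(hops), sum(hops) / len(hops)


if __name__ == "__main__":
    for k in (4, 6):
        diameter, average = hop_statistics(k)
        print(f"{k}*{k}: diameter = {diameter}, average hops = {average:.3f}")
```

Under these assumptions, the diameter is 4 hops for the 4*4 network and 6 hops for the 6*6 network.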
REFERENCES
[1] L. Benini, "Application Specific NoC Design," date,
vol. 1, pp.105, Proceedings of the Design Automation
& Test in Europe Conference Vol. 1, 2006.
[2] W. Shen, C. Chao, Y. Lien, A. Wu, "A New Binomial
Mapping and Optimization Algorithm for Reduced-
Complexity Mesh-Based On-Chip Network," nocs,
pp.317-322, First International Symposium on Net-
works-on-Chip (NOCS'07), 2007.
[3] A. RoshanFekr, M. Janidarmian, V. Samadi Bok-
haraei, A. Khademzadeh, "Yield Enhancement with a
Novel Method in Design of Application-Specific
Networks on Chips," Electrical Engineering and Ap-
plied Computing, Volume 90, 247-257, 2011.
[4] S. Furber, "The future of computer technology and its
implications for the computer industry," Comput. J.,
vol. 51, no. 6, pp. 735-740, 2008.
[5] S. Borkar, "Designing Reliable Systems from Unreli-
able Components: The Challenges of Transistor Vari-
ability and Degradation," IEEE Micro, vol. 25, no. 6,
pp. 10-16, Nov./Dec. 2005.
[6] T. Dumitras, R. Marculescu, "On-Chip Stochastic
Communication," date, vol. 1, pp.10790, Design, Au-
tomation and Test in Europe Conference and Exhibi-
tion (DATE'03), 2003.
[7] J. D. Balfour and W. J. Dally. Design tradeoffs for
tiled cmp on-chip networks. In ICS, pages 187-198,
2006.
[8] S. Bourduas and Z. Zilic. A hybrid ring/mesh inter-
connect for network-on-chip using hierarchical rings
for global routing. In NOCS, pages 195-204, 2007.
[9] R. Das, S. Eachempati, A. K. Mishra, N. Vijaykrish-
nan, and C. R. Das. Design and evaluation of a hier-
archical on-chip interconnect for next-generation
cmps. In HPCA, pages 175-186, 2009.
[10] B. Grot, J. Hestness, S.W. Keckler, and O. Mutlu. Ex-
press cube topologies for on-chip interconnects. In
HPCA, pages 163-174, 2009.
[11] J. Kim, J. D. Balfour, and W. J. Dally. Flattened
butterfly topology for on-chip networks. In MICRO,
pages 172-182, 2007.
[12] Y. Xu, Y. Du, B. Zhao, X. Zhou, Y. Zhang, and J.
Yang. A low-radix and low-diameter 3d interconnec-
tion network design. In HPCA, pages 30-42, 2009.
[13] E. Bolotin, Z. Guz, I. Cidon, R. Ginosar, and A. Ko-
lodny. The power of priority: Noc based distributed
cache coherency. In NOCS, pages 117-126, 2007.
[14] I. Artundo, W. Heirman, M. Loperena, C. Debaes, J. V.
Campenhout, and H. Thienpont. Low-power
reconfigurable network architecture for on-chip photonic
interconnects. High-Performance Interconnects,
Symposium on, 0:163-169, 2009.
[15] N. D. E. Jerger, L.-S. Peh, and M. H. Lipasti. Circuit-
switched coherence. In NOCS, pages 193-202, 2008.
[16] A. Kumar, L.-S. Peh, P. Kundu, and N. K. Jha. Ex-
press virtual channels: towards the ideal interconnec-
tion fabric. In ISCA, pages 150-161, 2007.
[17] A. Abousamra, R. Melhem, and A. K. Jones. Winning
with pinning in NoC. In Proc. of Hot Interconnects
(HOTI), 2009.
[18] C. Kim, D. Burger, and S.W. Keckler. Nonuniform
cache architectures for wire-delay dominated on-chip
caches. IEEE Micro, 23(6):99-107, 2003.
[19] J. A. Brown, R. Kumar, and D. M. Tullsen. Proximity-
aware directory-based coherence for multi-core proc-
essor architectures. In SPAA, pages 126-134, 2007.
[20] B. M. Beckmann, M. R. Marty, and D. A. Wood. Asr:
Adaptive selective replication for cmp caches. In MI-
CRO, pages 443-454, 2006.
[21] J. Chang and G. S. Sohi. Cooperative caching for chip
multiprocessors. In ISCA, pages 264-276, 2006.
[22] Z. Chishti, M. D. Powell, and T. N. Vijaykumar. Op-
timizing replication, communication, and capacity al-
location in cmps. In ISCA, pages 357-368, 2005.
[23] Z. Guz, I. Keidar, A. Kolodny, and U. C. Weiser. Util-
izing shared data in chip multiprocessors with the
nahalal architecture. In SPAA, pages 1-10, 2008.
[24] N. Hardavellas, M. Ferdman, B. Falsafi, and A. Ailamaki.
Reactive nuca: near-optimal block placement
and replication in distributed caches. In ISCA, pages
184-195, 2009.
[25] J. Huh, C. Kim, H. Shah, L. Zhang, D. Burger, and
S.W. Keckler. A nuca substrate for flexible cmp cache
sharing. In ICS, pages 31-40, 2005.
[26] M. Zhang and K. Asanovic. Victim replication: Max-
imizing capacity while hiding wire delay in tiled chip
multiprocessors. In ISCA, pages 336-345, 2005.
[27] R. Lemaire, F. Clermidy, Y. Durand, D. Lattard, and
A. Jerraya, "Performance Evaluation of a NoC-Based
Design for MC-CDMA Telecommunications Using
NS-2," in The 16th IEEE International Workshop on
Rapid System Prototyping, Jun. 2005, pp. 24-30.
[28] L. Breslau, D. Estrin, K. Fall, S. Floyd, J. Heidemann,
A. Helmy, P. Huang, S. McCanne, K. Varadhan, Y. Xu,
and H. Yu. "Advances in network simulation",
IEEE Computer, 33(5):59-67, May 2000.
[29] LBNL Network Simulator, http://www-nrg.ee.lbl.gov/ns/
[30] The Network Simulator ns-2, available at
http://www.isi.edu/nsnam/ns/
[31] M. Ali, M. Welzl, A. Adnan, F. Nadeem, "Using the
NS-2 Network Simulator for Evaluating Network on
Chips (NoC)," International Conference on Emerging
Technologies, pp. 506-512, 2006.
