A Detailed On-Chip Network Model Inside a Full-System Simulator
Tushar Krishna
Assistant Professor, School of ECE and CS, Georgia Institute of Technology
tushar@ece.gatech.edu
gem5 workshop, ARM Research Summit
September 11, 2017
http://synergy.ece.gatech.edu/tools/garnet
Overview
• What are Networks-On-Chip (NoCs)?
• Why model NoCs accurately
• Garnet2.0
  • Configuration
    • Topology
    • Routing
    • Flow-Control
    • Router Microarchitecture
  • System Integration
    • Network Interface
    • Network Parameters
    • Running Garnet with Ruby
    • Ruby Garnet Standalone
    • Output Stats
• Extensions and FAQs
Garnet2.0 Tutorial | ARM Research Summit 2017 Tushar Krishna | Georgia Institute of Technology September 11, 2017 2
Networks-on-Chip
Introduction to NoCs
[Figure: request (Req), forward (Fwd), and response (Rsp) messages carried over the On-Chip Network]
Modern NoCs
[Figure: a “Tile” containing a Core with L1 D$ and L1 I$, an L2$, an L3$/Directory slice, a Network Interface, and a Router]
Network Architecture
• Topology
• Routing
• Flow Control
• Router Microarchitecture
Topology: how to connect nodes with links
(Analogy: a road network)
Routing: which path a message should take
(Analogy: the series of road segments from source to destination)
Flow Control: when a message stops or proceeds
(Analogy: traffic signals / stop signs at the end of each road segment)
Router Microarchitecture: how to build the routers
(Analogy: the design of a traffic intersection, i.e., the number of lanes and the algorithm for turning lights red/green)
Why model NoCs accurately?
[Figure, Case Study I: Directory vs. Broadcast Protocols. Normalized Runtime of Full-state Directory, HyperTransport, and Token Coherence on three NoCs: Baseline NoC (Intel SCC), FANOUT + FANIN NoC (Krishna et al, MICRO 2011), and SMART NoC (Krishna et al, HPCA 2013).]
[Figure, Case Study II: Private vs. Shared L2. Normalized Runtime of Shared L2 and Private L2 on Baseline NoC (Intel SCC) and SMART NoC (Krishna et al, HPCA 2013).]
Both case studies: 64-core CMP with different NoC microarchitectures, running PARSEC workloads in full-system gem5. Average runtime plotted.
Garnet2.0
• Detailed NoC model
  • Currently part of the Ruby memory system in gem5
  • Original version (5-stage pipeline) released in 2009
    • Developed by Niket Agarwal (currently at Google) and myself
  • New version (1-stage pipeline, more configurability) released in 2016
• Resources
  • Source: src/mem/ruby/network/garnet2.0
  • gem5 wiki page: www.gem5.org/garnet2.0
  • Dev patches + practice labs: http://synergy.ece.gatech.edu/garnet
Topology in configs/topologies/
• Each topology is a Python file
[Figure: example topology connecting Dir nodes and Core + L1$ nodes]
Topology Configurable Parameters
• Router
  • router_latency (in cycles)
    • Can be set per router
    • Defined in src/mem/ruby/network/BasicRouter.py
• Link
  • link_latency (in cycles)
    • Can be set per link
    • Defined in src/mem/ruby/network/BasicLink.py
  • weight (i.e., link weight)
    • Used to bias the routing algorithm [later slides]
  • src_outport (string) and dst_inport (string)
    • Port direction (e.g., “East”)
    • Helps with readability of the config file and with adaptive routing algorithms
  • bw_multiplier (value)
    • Used by Ruby’s simple network model, NOT by Garnet
    • Link bandwidth is set inside Garnet via the ni_flit_size parameter [later slides]
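To show how these per-link parameters fit together, here is a minimal sketch in plain Python (the language of gem5's topology files) that builds the link list for a rows x cols mesh. It does not use gem5's Router/IntLink classes: `Link` and `build_mesh_links` are hypothetical stand-ins, and the direction convention (col + 1 is East, row + 1 is North) is an assumption. The weights are assumed to follow the Mesh_XY pattern, where East/West links get weight 1 and North/South links get weight 2.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class Link:
    """Hypothetical stand-in for a topology link (not gem5's IntLink)."""
    link_id: int
    src: int          # source router id
    dst: int          # destination router id
    src_outport: str  # direction name at the source router, e.g. "East"
    dst_inport: str   # direction name at the destination router, e.g. "West"
    latency: int      # link_latency in cycles
    weight: int       # bias for table-based routing (lower is preferred)


def build_mesh_links(rows: int, cols: int, link_latency: int = 1) -> List[Link]:
    """Create unidirectional links for a rows x cols mesh.

    Router id = row * cols + col. Assumed convention for this sketch:
    col + 1 is "East" and row + 1 is "North". East/West links get weight 1
    and North/South links weight 2, so X-direction links are preferred.
    """
    links: List[Link] = []

    def add(src: int, dst: int, out: str, inp: str, weight: int) -> None:
        links.append(Link(len(links), src, dst, out, inp, link_latency, weight))

    for row in range(rows):
        for col in range(cols):
            rid = row * cols + col
            if col + 1 < cols:  # horizontal neighbor pair, both directions
                add(rid, rid + 1, "East", "West", 1)
                add(rid + 1, rid, "West", "East", 1)
            if row + 1 < rows:  # vertical neighbor pair, both directions
                add(rid, rid + cols, "North", "South", 2)
                add(rid + cols, rid, "South", "North", 2)
    return links


if __name__ == "__main__":
    for link in build_mesh_links(2, 2):
        print(link)
```

In a real topology file, the same values would be passed as router_latency, link_latency, weight, src_outport, and dst_inport on gem5's own router and link objects.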
Default Topologies Supported
• Pt2Pt
• Crossbar
• Mesh
• Mesh_XY
• Mesh_westfirst
• MeshDirCorners_XY
• Cluster
Routing
• Routing table
  • Automatically populated based on the topology
  • All messages use the shortest path
  • If there are multiple options, the path with the smaller weight is chosen
  • Deterministic routing
• Custom
  • Users can leverage the outport/inport direction names associated with each port to implement custom algorithms (e.g., adaptive)
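A minimal sketch of the table-based behavior described above, assuming "shortest path" means fewest hops: among the outports that lie on some shortest path to the destination, the one with the smallest link weight wins, and any remaining tie is broken randomly. The adjacency format, `MESH_2X2`, and the function names are hypothetical; this is not Garnet's RoutingUnit code, and a real table would be precomputed once rather than rebuilt per lookup.

```python
import random
from collections import deque
from typing import Dict, List, Tuple

# Hypothetical adjacency format: router -> [(neighbor, outport, weight), ...]
Graph = Dict[int, List[Tuple[int, str, int]]]

# 2x2 mesh (router = row * 2 + col); E/W weight 1, N/S weight 2.
MESH_2X2: Graph = {
    0: [(1, "East", 1), (2, "North", 2)],
    1: [(0, "West", 1), (3, "North", 2)],
    2: [(3, "East", 1), (0, "South", 2)],
    3: [(2, "West", 1), (1, "South", 2)],
}


def hop_distances(graph: Graph, dst: int) -> Dict[int, int]:
    """BFS backwards from dst: number of hops from each router to dst."""
    reverse: Dict[int, List[int]] = {}
    for src, edges in graph.items():
        for nbr, _, _ in edges:
            reverse.setdefault(nbr, []).append(src)
    dist = {dst: 0}
    queue = deque([dst])
    while queue:
        node = queue.popleft()
        for prev in reverse.get(node, []):
            if prev not in dist:
                dist[prev] = dist[node] + 1
                queue.append(prev)
    return dist


def lookup_outport(graph: Graph, router: int, dst: int, rng=random) -> str:
    """Choose an outport on a shortest (min-hop) path; lowest weight wins."""
    if router == dst:
        raise ValueError("packet is already at its destination router")
    dist = hop_distances(graph, dst)
    if router not in dist:
        raise ValueError("destination is unreachable")
    candidates = [(weight, port) for nbr, port, weight in graph[router]
                  if dist.get(nbr) == dist[router] - 1]
    best = min(weight for weight, _ in candidates)
    # Remaining ties (e.g. parallel links with equal weight) are broken randomly.
    return rng.choice([port for weight, port in candidates if weight == best])


if __name__ == "__main__":
    print(lookup_outport(MESH_2X2, 0, 3))  # East: X hops before Y hops
```

With E/W links lighter than N/S links, this weight-based choice reproduces XY routing on the mesh.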
Deadlock Avoidance
• Deadlock: a condition in which a set of agents wait indefinitely trying to acquire a set of resources
[Figure: agents A, B, C, D at routers 0 to 3, each waiting on a resource held by another, forming a cycle]
Deadlock-free Routing Algorithms
• Deadlocks may occur if the turns taken form a cycle
[Figure: Mesh_XY topology with link weights of 1 and 2; example sources SA, SB, SC]
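The condition that deadlock is possible when the turns taken form a cycle can be checked mechanically on a channel dependency graph: an edge from link a to link b means a packet holding a may wait for b. The sketch below is a hypothetical helper (not part of Garnet) that detects such a cycle with a depth-first search. The link names assume the 2x2 mesh layout 0 East of 1, 2 and 3 one row North.

```python
from typing import Dict, Hashable, List


def has_cycle(deps: Dict[Hashable, List[Hashable]]) -> bool:
    """Return True if the dependency graph contains a directed cycle.

    deps maps each channel to the channels a packet holding it may wait on.
    Recursive DFS with three colors: white = unvisited, gray = on the
    current DFS path, black = fully explored.
    """
    WHITE, GRAY, BLACK = 0, 1, 2
    color: Dict[Hashable, int] = {}

    def visit(node) -> bool:
        color[node] = GRAY
        for nxt in deps.get(node, []):
            state = color.get(nxt, WHITE)
            if state == GRAY:  # back edge to the current path: cycle
                return True
            if state == WHITE and visit(nxt):
                return True
        color[node] = BLACK
        return False

    return any(color.get(n, WHITE) == WHITE and visit(n) for n in list(deps))


# All four turns around a 2x2 ring allowed: a cyclic dependency.
RING = {"0->1": ["1->3"], "1->3": ["3->2"], "3->2": ["2->0"], "2->0": ["0->1"]}
# XY routing forbids Y-to-X turns (1->3 then 3->2, 2->0 then 0->1).
XY_ONLY = {"0->1": ["1->3"], "3->2": ["2->0"]}

if __name__ == "__main__":
    print(has_cycle(RING), has_cycle(XY_ONLY))  # True False
```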
Deadlock-free Routing Algorithms
• Assign weights to bias which links are used first (to ensure no cyclic dependence)
West-first routing: go West first, then North/South
[Figure: Mesh_westfirst topology with link weights of 1 and 2; sources SA, SB, SC and destinations DA, DB, DC]
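West-first is a turn restriction: a packet takes all of its West hops before any other direction, so it never turns into West. A minimal sketch that checks a route, given as a list of direction names (a hypothetical representation), against that rule:

```python
from typing import Iterable


def is_west_first(route: Iterable[str]) -> bool:
    """True if every "West" hop precedes all non-West hops.

    Turning into West after moving North, South, or East is exactly
    the turn that west-first routing forbids.
    """
    left_west = False
    for hop in route:
        if hop == "West":
            if left_west:
                return False
        else:
            left_west = True
    return True


if __name__ == "__main__":
    print(is_west_first(["West", "West", "North"]))  # True
    print(is_west_first(["North", "West"]))          # False
```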
Flow Control
• Virtual Channels
  • The coherence protocol requires a certain number of virtual networks (vnets) / message classes to avoid protocol deadlocks
    • This is the minimum number of VCs required
  • Within each vnet, there can be more than one VC to boost network performance
  • In Garnet, only one packet can use a VC inside a router at a time
  • VCs in vnets carrying control messages are 1 flit deep
  • VCs in vnets carrying data (cacheline) messages are 4-5 flits deep
• Credits
  • Each VC conveys its buffer availability by sending credits to its upstream router
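Credit-based flow control for a single VC can be sketched as a counter on the upstream side: it starts at the downstream buffer depth, is decremented per flit sent, and is incremented when the downstream router reads a flit out. `CreditChannel` is a hypothetical class; for simplicity it returns credits instantly, whereas the router schedules the credit on the credit link for the next cycle.

```python
from collections import deque


class CreditChannel:
    """Sketch of credit-based flow control for one VC (hypothetical class).

    The upstream side starts with one credit per downstream buffer slot,
    e.g. 1 for a control VC or 4 for a data VC.
    """

    def __init__(self, buffer_depth: int):
        self.credits = buffer_depth  # upstream's view of free downstream slots
        self.buffer = deque()        # downstream VC buffer

    def can_send(self) -> bool:
        return self.credits > 0

    def send_flit(self, flit) -> None:
        """Upstream: consume a credit and write the flit downstream."""
        if not self.can_send():
            raise RuntimeError("no credits: the flit must stall upstream")
        self.credits -= 1
        self.buffer.append(flit)

    def read_flit(self):
        """Downstream: read a flit out and return one credit upstream."""
        flit = self.buffer.popleft()
        self.credits += 1
        return flit
```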
Conventional VC Router Microarch
[Figure: VC router with input ports holding per-vnet input VCs (VC 0 to VC n) in input buffers, plus Route Compute, a VC Allocator, a Switch Allocator, and a Crossbar Switch]
• BW: Buffer Write
• RC: Route Compute
• VA: VC Allocation. Input VCs arbitrate for “output” VCs (the input VCs at the next router)
• SA: Switch Allocation. Input ports arbitrate for output ports
• BR: Buffer Read
• ST: Switch Traversal
• LT: Link Traversal
Pipeline: BW → RC → VA → SA → BR → ST → LT
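One way to see what pipeline depth costs is the common textbook zero-load approximation: with no contention, latency is roughly hops × (router cycles + link cycles) plus (num_flits - 1) cycles for the body and tail flits to follow the head. The function below is a hedged sketch of that formula only; it ignores NI latency and is not how Garnet computes its statistics.

```python
def zero_load_latency(hops: int, router_cycles: int, link_cycles: int,
                      num_flits: int) -> int:
    """Approximate head-to-tail packet latency in cycles with no contention.

    hops: routers traversed, each followed by one link.
    router_cycles: router pipeline depth (e.g. 5 or 1).
    num_flits: packet length; each flit after the head adds one cycle.
    """
    if hops < 0 or num_flits < 1:
        raise ValueError("hops must be >= 0 and num_flits >= 1")
    return hops * (router_cycles + link_cycles) + (num_flits - 1)


if __name__ == "__main__":
    # 6 hops, 1-cycle links, 5-flit data packet:
    print(zero_load_latency(6, 5, 1, 5))  # 40 with a 5-stage router
    print(zero_load_latency(6, 1, 1, 5))  # 16 with a 1-stage router
```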
Single-Cycle Router Implementation
Router.cc → wakeup() performs, each cycle:
1. InputUnit.cc → wakeup(): for flits arriving from NetworkLink.cc, Buffer Write into VirtualChannel.cc (VN0: VC 0, VC 1; VN1: VC 2, VC 3) and Route Compute
2. OutputUnit.cc → wakeup(): receive credits from CreditLink.cc and update the output VC state (OutVCState.cc)
3. SwitchAllocator.cc → wakeup(): arbitrate inports; arbitrate outports; VC allocate (has_free_vc(), select_free_vc()); Buffer Read (send flit to switch); send credit (schedule credit link wakeup for the next cycle)
4. CrossbarSwitch.cc → wakeup(): Switch Traversal; push winner flits onto the link queue and schedule the output link (NetworkLink.cc) wakeup for the next cycle
For a multi-cycle router, add delay in the flit before it can do SA.
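The "arbitrate inports, then arbitrate outports" step is a separable, input-first switch allocator. A minimal round-robin sketch, assuming each input VC has already computed its requested outport; the class and its request format are hypothetical and much simpler than SwitchAllocator.cc (no VC allocation, no credit checks).

```python
from typing import Dict, List, Optional, Tuple


class SeparableAllocator:
    """Input-first separable switch allocator with round-robin arbiters (sketch)."""

    def __init__(self, num_inports: int, num_outports: int, vcs_per_port: int):
        self.num_inports = num_inports
        self.num_outports = num_outports
        self.vcs_per_port = vcs_per_port
        self.in_ptr = [0] * num_inports    # next VC to favor at each inport
        self.out_ptr = [0] * num_outports  # next inport to favor at each outport

    def allocate(self, requests: List[List[Optional[int]]]) -> Dict[int, Tuple[int, int]]:
        """requests[inport][vc] is the outport that VC wants, or None.

        Returns {outport: (inport, vc)} for this cycle's winners.
        """
        # Stage 1 ("arbitrate inports"): each inport nominates one VC.
        nominee: Dict[int, Tuple[int, int]] = {}  # inport -> (vc, outport)
        for inport in range(self.num_inports):
            for k in range(self.vcs_per_port):
                vc = (self.in_ptr[inport] + k) % self.vcs_per_port
                outport = requests[inport][vc]
                if outport is not None:
                    nominee[inport] = (vc, outport)
                    break
        # Stage 2 ("arbitrate outports"): each outport grants one nominee.
        grants: Dict[int, Tuple[int, int]] = {}
        for outport in range(self.num_outports):
            for k in range(self.num_inports):
                inport = (self.out_ptr[outport] + k) % self.num_inports
                if inport in nominee and nominee[inport][1] == outport:
                    vc = nominee[inport][0]
                    grants[outport] = (inport, vc)
                    # Rotate priority past the winners for fairness.
                    self.out_ptr[outport] = (inport + 1) % self.num_inports
                    self.in_ptr[inport] = (vc + 1) % self.vcs_per_port
                    break
        return grants
```

Because each inport nominates only one VC, an outport can go idle even when another VC wanted it; that is the usual price of a separable allocator in exchange for simple, fast arbiters.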
System Organization
[Figure: GarnetNetwork.cc containing Network Interfaces, Routers, Network Links, Credit Links, and Stats; the interface to the coherence protocol is the “MessageBuffers” from Ruby]
Network Interface
• Msg_size decides the number of flits
• Additional info can be added in the flit
[Figure: cache controllers (Core + L1$, L2$, Dir, DMA) each connect through a Network Interface (NI) to a router (R); the NI holds VCs grouped by vnet (VN0, VN1) and has ingress and egress paths to/from the router. Implemented in NetworkInterface.cc → wakeup().]
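"Msg_size decides the number of flits" amounts to a ceiling division of the message size by the flit size. A hedged sketch, assuming an 8-byte control message and a 72-byte data message (a 64-byte cache line plus an 8-byte header); with the default 16 B ni_flit_size this gives the 1-flit ctrl / 5-flit data split quoted on the parameters slide. `num_flits`, `CTRL_BYTES`, and `DATA_BYTES` are illustrative names.

```python
import math

# Assumed message sizes for this sketch.
CTRL_BYTES = 8        # control message
DATA_BYTES = 64 + 8   # 64 B cache line + 8 B header


def num_flits(msg_size_bytes: int, ni_flit_size_bytes: int = 16) -> int:
    """Flits needed to carry one message: ceil(message size / flit size)."""
    if msg_size_bytes <= 0 or ni_flit_size_bytes <= 0:
        raise ValueError("sizes must be positive")
    return math.ceil(msg_size_bytes / ni_flit_size_bytes)


if __name__ == "__main__":
    print(num_flits(CTRL_BYTES), num_flits(DATA_BYTES))  # 1 5
```

Raising the flit size (i.e., link bandwidth) shrinks data packets, e.g. to 3 flits at 32 B.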
Garnet Configurable Parameters
• ni_flit_size
  • Default = 16B (128b) → 1-flit ctrl, 5-flit data
  • This sets the bandwidth of each physical link
• vcs_per_vnet
  • Total VCs in each inport = num_vnets * vcs_per_vnet
• buffers_per_data_vc
  • Default = 4
• buffers_per_ctrl_vc
  • Default = 1
• routing_algorithm
  • Weight-based table or custom
• Defined in: src/mem/ruby/network/garnet2.0/GarnetNetwork.py
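The VC and buffer parameters combine into a per-inport budget. A small sketch, assuming each vnet is classified as either a control vnet or a data vnet (`inport_resources` and the `vnet_is_data` list are hypothetical): VCs = num_vnets * vcs_per_vnet, and flit buffers sum the per-VC depth over all vnets.

```python
from typing import Sequence, Tuple


def inport_resources(vnet_is_data: Sequence[bool], vcs_per_vnet: int,
                     buffers_per_data_vc: int = 4,
                     buffers_per_ctrl_vc: int = 1) -> Tuple[int, int]:
    """Return (total VCs, total flit buffers) for one router inport.

    vnet_is_data[i] says whether vnet i carries data (multi-flit) packets.
    """
    total_vcs = len(vnet_is_data) * vcs_per_vnet
    total_buffers = sum(
        vcs_per_vnet * (buffers_per_data_vc if is_data else buffers_per_ctrl_vc)
        for is_data in vnet_is_data)
    return total_vcs, total_buffers


if __name__ == "__main__":
    # Three vnets (two ctrl, one data), 4 VCs per vnet, default depths.
    print(inport_resources([False, False, True], 4))  # (12, 24)
```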
Command Line Parameters
• Definitions and default values, in src/mem/ruby/network/:
  • BasicRouter.py, BasicLink.py, Network.py, garnet2.0/GarnetNetwork.py
• configs/:
  • ruby/Ruby.py, network/Network.py, common/Options.py, example/garnet_synth_test.py
Running Garnet with Ruby
• Build a Ruby coherence protocol
scons build/X86_MOESI_hammer/gem5.opt PROTOCOL=MOESI_hammer
Running Garnet Standalone
• Build the Garnet_standalone protocol
scons build/NULL/gem5.debug PROTOCOL=Garnet_standalone
  • A dummy protocol used only for injecting traffic through the Garnet Synthetic Traffic tester
  • 3 virtual networks: vnet 0 and vnet 1 inject ctrl (1-flit) packets; vnet 2 injects data (5-flit) packets
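To make this injection model concrete, here is a hypothetical Bernoulli uniform-random injector: each cycle, every node injects a packet with probability injection_rate to a uniformly chosen other node, on a randomly chosen vnet, with vnets 0 and 1 carrying 1-flit packets and vnet 2 carrying 5-flit packets. It is an illustrative sketch, not the Garnet Synthetic Traffic tester's code, and its parameter names are not the tester's options.

```python
import random
from dataclasses import dataclass
from typing import List

# Packet length per vnet, as described above: vnets 0/1 ctrl, vnet 2 data.
FLITS_PER_VNET = {0: 1, 1: 1, 2: 5}


@dataclass
class Packet:
    cycle: int
    src: int
    dst: int
    vnet: int
    num_flits: int


def generate_traffic(num_nodes: int, injection_rate: float, cycles: int,
                     seed: int = 0) -> List[Packet]:
    """Bernoulli injection with uniform-random destinations (sketch)."""
    if not 0.0 <= injection_rate <= 1.0:
        raise ValueError("injection_rate is a per-node, per-cycle probability")
    if num_nodes < 2:
        raise ValueError("need at least two nodes")
    rng = random.Random(seed)
    packets: List[Packet] = []
    for cycle in range(cycles):
        for src in range(num_nodes):
            if rng.random() < injection_rate:
                dst = rng.randrange(num_nodes - 1)
                if dst >= src:  # skip the source itself as a destination
                    dst += 1
                vnet = rng.choice(list(FLITS_PER_VNET))
                packets.append(Packet(cycle, src, dst, vnet, FLITS_PER_VNET[vnet]))
    return packets
```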
m5out/config.ini
Output Stats
m5out/stats.txt
Extensions and FAQs
• Topology
  • How do we model links with different bandwidths in Garnet?
    • Inherently, that would require support in the router to manage multiple flit sizes. Instead, you can add multiple links between the same nodes to model higher bandwidth. If the links have the same weight, Garnet will randomly choose between them.
  • Can we model a heterogeneous CPU-GPU system?
    • Yes. The current AMD GPU model models a cluster of CPUs connected to a GPU.
  • Can we model indirect networks such as Clos?
    • Yes. There can be additional routers that are not connected to any controller and act purely as switches.
  • Can we model large-scale HPC networks?
    • Garnet can model a network of any size, and 256-node standalone simulations run easily. Beyond that, however, gem5 cannot instantiate more directories (which it uses as destination nodes). I have a patch on my website for running 1024-node synthetic simulations, but these run quite slowly.
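Following the Clos answer above, a hedged sketch of a 3-stage Clos link list: controllers would attach only to the ingress and egress routers, while the middle-stage routers have no controller and act purely as switches. The router numbering and the (src, dst) tuple format are hypothetical; a real gem5 topology file would create the corresponding router and link objects instead.

```python
from typing import Dict, List, Tuple


def clos_topology(num_edge: int, num_middle: int) -> Dict[str, list]:
    """3-stage Clos: every ingress router links to every middle router,
    and every middle router links to every egress router.

    Controllers (NIs) would attach only to ingress/egress routers; the
    middle routers have no controller and act purely as switches.
    """
    ingress = list(range(num_edge))
    middle = list(range(num_edge, num_edge + num_middle))
    egress = list(range(num_edge + num_middle, 2 * num_edge + num_middle))
    links: List[Tuple[int, int]] = []
    links += [(i, m) for i in ingress for m in middle]
    links += [(m, e) for m in middle for e in egress]
    return {"ingress": ingress, "middle": middle, "egress": egress, "links": links}
```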
Extensions and FAQs
• Routing
  • How do we model an adaptive routing algorithm?
    • If you want to use internal NoC metrics (such as the number of credits at an output port) to make routing decisions, do not use table-based routing. Instead, set the routing algorithm to custom and implement your own routing function inside RoutingUnit.cc. See outportComputeXY() for reference.
• Flow Control
  • Can we implement alternate deadlock avoidance schemes (such as escape VCs or dateline)?
    • You can update the VC selection scheme inside SwitchAllocator to control which VCs get allocated.
• Microarchitecture
  • Can we model a variable number of VCs in each router?
    • Currently the codebase is tightly tied to the global vcs_per_vnet parameter. To model a variable number of VCs, one workaround is to have every router instantiate the same number of VCs but modify the VC selection to never allocate certain VCs.
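For the adaptive-routing answer above, a hedged Python model of the kind of function one might write: it computes the productive directions toward the destination from the X/Y offsets (the same idea an XY function such as outportComputeXY() works from) and, when there are two, picks the outport with more free credits. The (x, y) coordinates, the direction convention, the "Local" ejection name, and the free_credits dictionary are assumptions; this is not RoutingUnit.cc code, and a real adaptive scheme still needs a deadlock-avoidance mechanism such as a turn restriction or escape VCs.

```python
from typing import Dict, List, Tuple

Coord = Tuple[int, int]


def productive_outports(cur: Coord, dst: Coord) -> List[str]:
    """Directions that reduce the distance to dst (assumed: +x East, +y North)."""
    (cx, cy), (dx, dy) = cur, dst
    ports: List[str] = []
    if dx > cx:
        ports.append("East")
    elif dx < cx:
        ports.append("West")
    if dy > cy:
        ports.append("North")
    elif dy < cy:
        ports.append("South")
    return ports


def xy_outport(cur: Coord, dst: Coord) -> str:
    """Deterministic XY: finish all X hops before any Y hop."""
    ports = productive_outports(cur, dst)
    return ports[0] if ports else "Local"  # X direction, if any, is listed first


def adaptive_outport(cur: Coord, dst: Coord, free_credits: Dict[str, int]) -> str:
    """Minimal adaptive: among productive ports, pick the one with most credits."""
    ports = productive_outports(cur, dst)
    if not ports:
        return "Local"
    # max() keeps the first port on ties, so X is preferred when credits are equal.
    return max(ports, key=lambda p: free_credits.get(p, 0))
```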
Conclusions
• Garnet2.0 is an open-source research vehicle
  • Use it and contribute to it!
• Actively maintained by the following people:
  • Georgia Tech: Tushar Krishna
  • AMD Research: Brad Beckmann, Onur Kayiran, Jieming Yin, Matt Porembas
• If you have any questions, email the gem5-users or gem5-dev mailing lists