Presented by:
Andrew W. Moore and David Miller
(University of Cambridge)
Martin Žádník
(Brno University of Technology)
Nadi Sarrar
(TU-Berlin/T-Labs)
http://NetFPGA.org
NetFPGA Cambridge Spring School 15-19 Mar 2010
Welcome
Please organize into teams
2 or 3 people per computer
NetFPGA homepage
http://NetFPGA.org
18:00 Punt trip – weather dependent
19:30 Dinner – India House
Day 2 – Tuesday 16th March, 2010
Day 4 – Thursday 18th March, 2010
9:00 – 17:30 Work on Projects
NetFPGA users available for Questions and Answers
PC with NetFPGA
• Networking software running on a standard PC (CPU, memory, PCI)
• A hardware accelerator built with a Field Programmable Gate Array (FPGA) driving Gigabit network links
[Figure: NetFPGA board with FPGA, memory and four 1GE ports, attached to the PC over PCI]
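[Figure: user-space routing protocols (OSPF, BGP, PW-OSPF, My Protocol) maintain the kernel routing table, which is mirrored ("Mirror") over PCI into the FPGA. The FPGA holds the IPv4 router with its forwarding table and packet buffer, plus memory and four 1GE ports]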
Software on the host: Java GUI Front Panel (Extensible), NetFPGA Driver
Verilog with EDA Tools (Xilinx, Mentor, etc.)
1. Design
2. Simulate
3. Synthesize
4. Download
[Figure: FPGA datapath with L2 Parse, L3 Parse, IP Lookup, My Block, In Q Mgmt and Out Q Mgmt between the 1GE ports]
Verilog modules interconnected by FIFO interfaces
Usage: Creating new systems
Verilog with EDA Tools (Xilinx, Mentor, etc.)
1. Design
2. Simulate
3. Synthesize
4. Download
[Figure: host (CPU, memory, NetFPGA Driver) connected over PCI to the FPGA running "My Design", with memory and four 1GE ports]
(1GE MAC is soft/replaceable)
[Figure: example network of routers R1 to R5 connecting hosts A to F]
Forwarding table shown in the figure:
Destination   Next Hop
D             R3
E             R3
F             R5
IP header fields shown: TTL, Protocol, Header Checksum, Source Address, Destination Address, Options (if any), Data
• Control Plane (software): Management & CLI, Routing Protocols, Routing Table
• Datapath (hardware, per-packet processing): Forwarding Table, Switching
• Header Processing: Lookup IP Address, Update Header, Queue Packet
[Figure: each packet (Data, Hdr) passes through header processing, which consults the Forwarding Table held in table memory, and is then queued in Buffer Memory]
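[Figure: prefixes drawn on the IPv4 address line from 0 to 2^32 - 1: 65/8, 128.9/16 (2^16 addresses starting at 128.9.0.0) and 142.12/19]
[Figure: lookup of address 128.9.16.14 among the prefixes 128.9/16, 128.9.16/20, 128.9.176/20, 128.9.19/24 and 128.9.25/24]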
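The lookup in the figure can be sketched in a few lines of Python. This is only a behavioural illustration using the standard `ipaddress` module, not how the NetFPGA hardware stores or searches its table; the prefixes are the ones from the figure, written out in full dotted form.

```python
import ipaddress

# Prefixes from the figure, expanded to full dotted notation.
PREFIXES = [ipaddress.ip_network(p) for p in (
    "65.0.0.0/8", "128.9.0.0/16", "142.12.0.0/19",
    "128.9.16.0/20", "128.9.176.0/20", "128.9.19.0/24", "128.9.25.0/24",
)]

def longest_prefix_match(address, prefixes=PREFIXES):
    """Return the most specific prefix containing address, or None."""
    addr = ipaddress.ip_address(address)
    matches = [p for p in prefixes if addr in p]
    # The longest prefix length is the most specific match.
    return max(matches, key=lambda p: p.prefixlen, default=None)

print(longest_prefix_match("128.9.16.14"))  # 128.9.16.0/20
```

Both 128.9/16 and 128.9.16/20 contain 128.9.16.14; the /20 wins because it is the longer prefix. A linear scan is used here for clarity only.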
• Linux user-level processes: Management & CLI, Routing Protocols, Exception Processing, Routing Table
• Verilog on NetFPGA PCI board: Forwarding Table, Switching
• Fully programmable
– FPGA hardware
• Low cost
• Open-source Software
– Drivers in C and C++
– Memories
• 36Mbits Static RAM
• 512Mbits DDR2 Dynamic RAM
– FPGA Resources
• Block RAMs
• Configurable Logic Block (CLBs)
• Memory Mapped Registers
[Figure: host software stack. User space and the Linux kernel (packet forwarding table) reach the NetFPGA Router Hardware over PCI through four VI interfaces (nf2c0 .. 3, four GE ports) and a NIC over PCI-e (eth1 .. 2, two GE ports)]
• Motherboard
– Standard AMD or Intel-based
x86 computer with PCI
and PCI-express slots
• Processor
– Dual or Quad-Core CPU
• Operating System
– Linux CentOS 5.2
NetFPGA inserts in
PCI or PCI-X slot
2U Server
(Dell 2950)
1U Server
(Accent Technology Inc.)
Thanks: Brian Cashman for providing machine
• Managed
• Power
• Console
• LANs
• Provides 4*40=160 Gbps of full line-rate processing bandwidth
[Figure: one workstation with a dual NIC (eth1 .. 2, GE), control software, and the NetFPGA Internet Router Hardware with GE ports]
• eth2 : Server for Neighbor
• nf2c1 : Neighbor
• nf2c0 : Ring - Right
[Figure: a video server delivers streaming HD video through a chain of NetFPGA routers to a video display. Each machine has CPU x2, CAD Tools, a PCI-e NIC (GE) and a NetFPGA Internet Router Hardware card on PCI with GE ports]
Each NetFPGA card has four ports
Port 2 connected to Client / Server
[Figure: cabling of ports 0 to 3 and eth on ten NetFPGA cards; video flows from the Video Server to the Video Client (HD Display) along the shortest path]
Cable Configuration for Demo 1
[Figure: cabling of ports 0 to 3 and eth on ten NetFPGA cards]
• Objectives
– Become familiar with
Stanford Reference Router
– Observe PW-OSPF
re-routing traffic around a failure
• Video server
– Source files
/var/www/html/video
– Network URL :
http://192.168.Net.Host/video
• Video client
– Windows Media Player
– Linux mplayer
• Video traffic
– MPEG2 HDTV (35 Mbps)
– MPEG2 TV (9 Mbps)
– DVI (3 Mbps)
– WMF (1.7 Mbps)
[Figure: demo network topology. Machines 1 to 5 sit on subnets 192.168.3.*, 192.168.6.*, 192.168.9.*, 192.168.12.* and 192.168.15.*, followed by 192.168.18.*. Each link end is labelled with the last two octets of its address (for example .3.1 and .3.2). After a failure the routers re-route the traffic to the client]
Processors
– Network Processors
– General Purpose Processors
[Figure: Configurable Logic Block (CLB) of a 3rd generation LUT-based FPGA, with 4-input LUTs (inputs G1 to G4, F4), a 3-input LUT (H, input H1), multiplexers, and D flip-flops driving outputs Y, YQ and X]
– Global routing
– Local interconnect
Macro Blocks
– Block Memories
– Microprocessor
[Figure: FPGA fabric with I/O Blocks and Macro Blocks (uP, Mem)]
[Figure: NetFPGA board block diagram: DDR2 (64MB), SRAM (18Mb), 1GE PHYs, SATA board-to-board interconnect (3 Gb), FIFO packet buffers, and control / PCI interface]
circuits and cores.
• Synthesizable
– Generates hardware from description
From: http://eesun.free.fr/DOC/VERILOG/synvlg.html
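reg s;                      // assigned inside an always block, so declared reg
always @*                   // combinational: re-evaluated when any input changes
begin
    case (select2)          // 2-bit select input
        2'b00:   s = a;
        2'b01:   s = b;
        default: s = c;     // 2'b10 and 2'b11 both select c
    endcase
end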
From: http://eesun.free.fr/DOC/VERILOG/synvlg.html
– Clock Event
• Example: Rising edge
– Flip/Flop
• Transfers value from Din to Dout on clock event
[Figure: timing diagram starting at t=0. Din takes the values A, B, C on successive clock transitions while Dout shows S0, A, B]
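The timing diagram (Din = A, B, C while Dout = S0, A, B) can be reproduced with a small behavioural model. This is a Python sketch of an ideal D flip-flop sampled once per clock cycle, not synthesizable code; the function name is made up for illustration.

```python
def d_flip_flop(din_per_cycle, initial_state):
    """Return the Dout value seen in each cycle of an ideal D flip-flop."""
    dout = []
    state = initial_state
    for din in din_per_cycle:
        dout.append(state)  # output during this cycle is the stored value
        state = din         # the clock edge ending the cycle captures Din
    return dout

print(d_flip_flop(["A", "B", "C"], "S0"))  # ['S0', 'A', 'B']
```

Dout always lags Din by one clock event, which is what lets a flip-flop act as state storage.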
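[Figure: synchronous state machine. Combinational Logic computes the Next State S(t+1) as a function of the inputs X and the current state S(t); D flip-flops (D, Q) provide the State Storage]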
From: http://eesun.free.fr/DOC/VERILOG/synvlg.html
• Five stages
– Input
– Input arbitration
– Routing decision and packet modification
– Output queuing
– Output
• Packet-based module interface
• Pluggable design
[Figure: reference pipeline. Four MAC RxQs and four CPU RxQs feed the Input Arbiter, then Output Port Lookup, then Output Queues, which feed four MAC TxQs and four CPU TxQs]
Objectives:
– Learn how to build hardware
– Run the software
– Explore router architecture
Execution
– Start synthesis
– Rerun the GUI with the new hardware
– Test connectivity and statistics with pings
– Explore pipeline in the details page
– Explore detailed statistics in the details page
Start terminal, cd to
“NF2/projects/tutorial_router/synth”
Start synthesis
with “make”
cd to “NF2/projects/tutorial_router/sw”
…and statistics
• Universally applied rule-of-thumb:
– A router needs a buffer size: B = 2T*C
– 2T is the two-way propagation delay (or just 250ms)
– C is capacity of bottleneck link
• Context
– Mandated in backbone and edge routers
– Appears in RFPs and IETF architectural guidelines
– Already known by inventors of TCP
• [Van Jacobson, 1988]
– Has major consequences for router design
Buffer size             2T*C        2T*C/sqrt(n) (1)    O(log W) (2)
# packets at 10Gb/s     1,000,000   10,000              20
(1) Assume: Large number of desynchronized flows; 100% utilization
(2) Assume: Large number of desynchronized flows; <100% utilization
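As a rough worked example, the two closed-form buffer sizes can be computed directly. The sketch below uses the slide's figures of 2T = 250 ms and C = 10 Gb/s; the flow count n = 10,000 is an assumption, chosen because the slide's packet counts (1,000,000 versus 10,000) differ by a factor of 100 = sqrt(10,000). The function names are illustrative.

```python
import math

def rule_of_thumb_bits(two_t_seconds, capacity_bps):
    """Buffer size B = 2T*C, in bits."""
    return two_t_seconds * capacity_bps

def small_buffer_bits(two_t_seconds, capacity_bps, n_flows):
    """Buffer size 2T*C/sqrt(n) for n desynchronized flows, in bits."""
    return two_t_seconds * capacity_bps / math.sqrt(n_flows)

two_t = 0.250      # two-way propagation delay from the slide (seconds)
capacity = 10e9    # bottleneck link capacity from the slide (bits/s)
n_flows = 10_000   # assumed number of flows, not stated on the slide

full = rule_of_thumb_bits(two_t, capacity)
small = small_buffer_bits(two_t, capacity, n_flows)
print(f"2T*C         = {full / 8e6:.1f} MB")
print(f"2T*C/sqrt(n) = {small / 8e6:.3f} MB")
print(f"reduction    = {full / small:.0f}x")
```

Under these assumptions the rule of thumb asks for about 312.5 MB of buffering, while the sqrt(n) result needs about 1% of that.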
Objective:
– Use the NetFPGA to understand how large a
buffer we need for a single TCP flow.
[Figure: Time evolution of a single TCP flow through a router. Buffer is 2T*C]
[Figure: Time evolution of a single TCP flow through a router. Buffer is < 2T*C]
[Figure: each video server delivers streaming HD video to the adjacent client. Each machine has CPU x2, a PCI-e NIC (GE) and a NetFPGA Internet Router Hardware card on PCI with GE ports]
[Figure: the Adjacent Web & Video Server (eth2) connects to nf2c1 of the NetFPGA Router; the Local Host (eth1) connects to nf2c2]
[Figure: interface addresses (last two octets) on the links between adjacent machines, ranging from .14.x to .29.x]
This configuration allows you to modify and test your router without affecting others
[Figure: cabling of nf2c ports 0 to 3 and eth on ten NetFPGA cards]
Eth1
6 7 8 9 0
19.1 22.1 25.1 28.1 1.1
Eth2
17.1 20.1 23.1 26.1 29.1
5 4 3 2 1
Objectives
– Observe router with new modules
– New modules: rate limiting, event capture
Execution
– Run event capture router
– Look at routing tables
– Explore details pane
– Start TCP transfer, look at queue occupancy
– Change rate, look at queue occupancy
Type
“./tut_adv_router_gui.pl”
Two modules added
1. Event Capture to capture output queue events (writes, reads, drops)
[Figure: pipeline of MAC and CPU RxQs, Input Arbiter, Output Port Lookup, Event Capture, Output Queues, MAC TxQ]
Demo 2: Step 3 - Decrease the Link Rate
Check Enabled
Type
“./iperf.sh”
Objectives
– Add new modules to datapath
– Synthesize and test router
Execution
– Open user_data_path.v, uncomment delay/rate/event capture modules
– Synthesize
– After synthesis, test the new system
Open terminal
Type “emacs NF2/projects/tutorial_router/src/user_data_path.v”
Exercise 2: Step 2 - Add Wires
Save (ctrl+x, ctrl+s)
Start terminal, cd to
“NF2/projects/tutorial_router/synth”
Start synthesis
with “make”
[Figure: software reaches the NetFPGA over the PCI Bus through the nf2c0 to nf2c3 interfaces (CPU RxQ/TxQ) and through ioctl (nf2_reg_grp). The NetFPGA user data path sits between the CPU queues and the MAC RxQ/TxQ, which connect to Ethernet]
Life of a Packet through the Hardware
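Example: a packet from 192.168.1.x (port0) to 192.168.2.y (port2)
[Figure: the packet moves through the pipeline from the Input Arbiter to the Output Queues]
Signals between modules: data, ctrl, wr, rdy

Generic layout of a packet on the bus:
ctrl   data
0      Eth Hdr
0      IP Hdr
0      …
0x10   Last word of packet

The example packet as it enters the pipeline:
ctrl   data
0xff   Pkt length, input port = 0
0      Eth Hdr: Dst MAC = port 0, Ethertype = IP
0      IP Hdr: IP Dst: 192.168.2.3, TTL: 64, Csum: 0x3ab4
0      Data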
Output Port Lookup
[Figure: the packet is directed to output queue OQ4 (queues OQ0 to OQ7)]
ctrl   data
0xff   Pkt length, input port = 0, output port = 4
0      Eth Hdr: Dst MAC = nextHop, Src MAC = port 4, Ethertype = IP
0      IP Hdr: IP Dst: 192.168.2.3, TTL: 64 → 63, Csum: 0x3ab4 → 0x3ac2
0      Data
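The slide shows the TTL going from 64 to 63 together with a new header checksum. The Python sketch below illustrates that step in software: decrement the TTL, then recompute the IPv4 header checksum (one's-complement sum of the 16-bit header words). The sample header is a commonly used 20-byte example and is an assumption; it is not the packet from the slide, so the checksum values differ from those shown there.

```python
import struct

def ipv4_checksum(header: bytes) -> int:
    """One's-complement checksum over 16-bit words (checksum field zeroed)."""
    total = sum(struct.unpack(f"!{len(header) // 2}H", header))
    while total >> 16:                       # fold carries back into 16 bits
        total = (total & 0xFFFF) + (total >> 16)
    return ~total & 0xFFFF

def decrement_ttl(header: bytes) -> bytes:
    """Return a copy of the header with TTL - 1 and a recomputed checksum."""
    hdr = bytearray(header)
    if hdr[8] <= 1:                          # byte 8 is the TTL
        raise ValueError("TTL expired: packet takes the exception path")
    hdr[8] -= 1
    hdr[10:12] = b"\x00\x00"                 # bytes 10-11 are the checksum
    struct.pack_into("!H", hdr, 10, ipv4_checksum(bytes(hdr)))
    return bytes(hdr)

# Assumed sample header: TTL 64 (0x40), protocol UDP, checksum 0xb861.
SAMPLE = bytes.fromhex("45000073000040004011b861c0a80001c0a800c7")
forwarded = decrement_ttl(SAMPLE)
print(forwarded[8], hex(struct.unpack_from("!H", forwarded, 10)[0]))  # 63 0xb961
```

Because only the TTL byte changes, the new checksum differs from the old one by 0x0100 in this example (0xb861 becomes 0xb961). Hardware commonly exploits this with an incremental update instead of a full recomputation.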
Output Port Lookup
1- Check input port matches Dst MAC
2- Check TTL, checksum – EXCEPTION!
3- Add output port module
ctrl   data
0xff   Pkt length, input port = 0, output port = 1
0      Eth Hdr: Dst MAC = 0, Src MAC = x, Ethertype = IP
0      IP Hdr: IP Dst: 192.168.2.3, TTL: 1, Csum: 0x3ab4
0      Data
Output Queues
[Figure: the exception packet is placed in OQ1 (queues OQ0, OQ1, OQ2 to OQ7)]
ctrl   data
0xff   Pkt length, input port = 0, output port = 1
0      Eth Hdr: Dst MAC = 0, Src MAC = x, Ethertype = IP
0      IP Hdr: IP Dst: 192.168.2.3, TTL: 1, Csum: 0x3ab4
0      Data
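The three Output Port Lookup checks and the two example packets (output port 4 for the forwarded packet, output port 1 for the TTL = 1 packet) can be summarised as a small decision function. This Python sketch rests on two assumptions inferred from those examples rather than stated on the slides: even output queue numbers belong to MAC ports and the next odd number to the matching CPU queue, and a packet whose Dst MAC does not match the input port is not forwarded. The function and parameter names are illustrative.

```python
def output_port_lookup(input_port, dst_mac_matches, ttl, checksum_ok, next_hop_port):
    """Return the output queue number for a packet, or None if not forwarded."""
    # 1- Check input port matches Dst MAC (assumed: otherwise not forwarded).
    if not dst_mac_matches:
        return None
    # 2- Check TTL, checksum: a failure is an exception handled by software.
    if ttl <= 1 or not checksum_ok:
        return input_port + 1   # assumed CPU queue paired with the input MAC port
    # 3- Add the output port chosen by the forwarding table lookup.
    return next_hop_port

print(output_port_lookup(0, True, 64, True, 4))  # 4: forwarded in hardware
print(output_port_lookup(0, True, 1, True, 4))   # 1: exception, sent to the CPU
```

Following the slide's wording, a checksum failure is treated as an exception here as well; an actual design may choose to drop such packets instead.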
NetFPGA-Host Interaction
• Linux driver interfaces with hardware
– Packet interface via standard Linux network stack
e.g.:
readReg(&nf2, OQ_NUM_PKTS_STORED_0, &val);
[Figure: packet transfer over the PCI Bus]
2. Interrupt notifies driver of packet arrival
3. Driver sets up and initiates DMA transfer
4. NetFPGA transfers packet via DMA
5. Interrupt signals completion of DMA
[Figure: register access over the PCI Bus]
2. Driver performs PCI memory read/write
Objectives
– Add counter and FSM to the code
– Synthesize and test router
Execution
– Open drop_nth_packet.v
– Insert counter code
– Synthesize
– After synthesis, test the new system.
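Before writing the counter in Verilog, it can help to pin down the intended behaviour in software. The class below is a hypothetical Python model of "drop every Nth packet" built from a single counter; it is not the contents of drop_nth_packet.v, and the names are made up for illustration.

```python
class DropNthPacket:
    """Behavioural model: forward packets, dropping every Nth one."""

    def __init__(self, n):
        if n < 1:
            raise ValueError("n must be at least 1")
        self.n = n
        self.count = 0          # packets seen since the last drop

    def forward(self):
        """Return True if the next packet is forwarded, False if dropped."""
        self.count += 1
        if self.count == self.n:
            self.count = 0      # reset the counter after dropping
            return False
        return True

dropper = DropNthPacket(4)
print([dropper.forward() for _ in range(8)])
# [True, True, True, False, True, True, True, False]
```

In hardware the same counter would advance once per packet (for example at the end of each packet) rather than once per data word, with a small FSM tracking whether the current packet is being passed or discarded.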
One module added
1. Drop Nth Packet to drop every Nth packet
[Figure: pipeline of MAC and CPU RxQs, Input Arbiter, Output Queues, Rate Limiter, MAC and CPU TxQs]
Exercise 3: Step 1 - Open the Source
Open terminal
Type “emacs NF2/projects/tutorial_router/src/drop_nth_packet.v”
Exercise 3: Step 2 - Add Counter to Module
Start terminal, cd to
“NF2/projects/tutorial_router/synth”
•Rice University
•Network Systems Architecture (since ‘08)
http://comp519.cs.rice.edu/
•Cambridge University
•Build an Internet Router (since ‘09)
Quarter-long course targeted at graduates
http://www.cl.cam.ac.uk/teaching/0910/P33/
•University of Wisconsin
•CS838 “Rethinking the Internet Architecture”
http://pages.cs.wisc.edu/~akella/CS838/F09/
See: http://netfpga.org/teachers.html
[Figure: course project structure. The router software (Management & CLI, Routing Protocols, Exception Processing, Routing Table) is developed against emulated h/w in VNS, then combined with the hardware (Forwarding Table, Switching)]
Project stages shown: 4-port non-learning switch, 4-port learning switch, IPv4 router forwarding path, Integrate with S/W, Interoperability, Testing
• Innovate and add!
• Presentations
• Judges
Stanford CS344
http://cs344.stanford.edu
Cambridge P33
http://www.cl.cam.ac.uk/teaching/0910/P33/
Beijing, China
SIGMETRICS - San Diego, California, USA
Watch the counts of received and sent packets to see the module drop every Nth packet. Ping a local machine (e.g. 192.168.7.1) and watch for missing pings.
• Access the Beta code
• Join the netfpga-beta mailing list
• Join the discussion forum
• List your project on the Wiki
• Link your project homepage
From: Methodology to Contribute NetFPGA Modules, by G. Adam Covington, Glen Gibb, Jad Naous,
John Lockwood, Nick McKeown; IEEE Microelectronics System Education (MSE), June 2009.
on : http://netfpga.org/php/publications.php