
Gigabit Ethernet Switching

Kenny Ranerup
Director of Engineering
SwitchCore AB
kenny.ranerup@switchcore.com

Ethernet

Ethernet is a communication protocol at OSI layers 1 and 2 (physical and data link).
It was created in the early 80s and has since evolved from 10 Mbit/s to today's 1 Gbit/s, with 10 Gbit/s on its way.
The original usage was communication between a small number of scientific computers.

Kenny Ranerup 1999-11-1

Ethernet Packet Format

[Frame layout: IPG | preamble | dst addr (6 bytes) | src addr (6 bytes) | type | data (46-1500 bytes) | CRC | IPG | preamble | ...]

Data is divided into variable-length packets.
Minimum frame size is 64 bytes and maximum is 1518 bytes.
Minimum inter-packet gap is 96 bits.
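The frame layout above can be expressed as a small parser. A minimal sketch in Python, with the field offsets taken from the slide (6-byte addresses, 2-byte type, 46-1500 bytes of data, 4-byte CRC); `parse_frame` is an illustrative name, not part of any standard API.

```python
import struct

# Frame size limits from the slide: 64-1518 bytes
# (header 14 + payload 46-1500 + CRC 4); the 96-bit IPG
# and the preamble are on the wire, not in the frame.
MIN_FRAME, MAX_FRAME = 64, 1518

def parse_frame(frame: bytes):
    """Split an Ethernet frame (without preamble) into its fields."""
    if not MIN_FRAME <= len(frame) <= MAX_FRAME:
        raise ValueError("frame size out of range")
    dst, src = frame[0:6], frame[6:12]
    (ethertype,) = struct.unpack("!H", frame[12:14])
    payload, crc = frame[14:-4], frame[-4:]
    return dst, src, ethertype, payload, crc

# Example: a minimum-size frame carrying 46 bytes of payload.
frame = bytes(6) + bytes(6) + struct.pack("!H", 0x0800) + bytes(46) + bytes(4)
dst, src, ethertype, payload, crc = parse_frame(frame)
assert len(frame) == 64 and len(payload) == 46
```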

Ethernet First Generation

10 Mbit/s.
Coaxial cable.
Shared medium.
Access protocol: CSMA/CD.
10-100 machines on one network.
Physical size limited.

Ethernet Access Protocol

Carrier Sense Multiple Access with Collision Detect (CSMA/CD):
One shared medium.
Wait until the medium is free, then transmit.
If a collision occurs, retransmit after a random period.
If collisions repeat, back off exponentially.
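The backoff rule on this slide is the truncated binary exponential backoff of 802.3. A toy sketch, with the classic 512-bit slot time and the standard 16-attempt limit:

```python
import random

SLOT_TIME_BITS = 512          # one slot = 512 bit times in classic Ethernet
MAX_ATTEMPTS = 16             # 802.3 gives up after 16 attempts
BACKOFF_LIMIT = 10            # the exponent is capped at 10

def backoff_delay(attempt: int) -> int:
    """Truncated binary exponential backoff: after the n-th collision,
    wait r slot times with r drawn uniformly from [0, 2^min(n,10) - 1]."""
    if attempt >= MAX_ATTEMPTS:
        raise RuntimeError("excessive collisions, frame dropped")
    k = min(attempt, BACKOFF_LIMIT)
    return random.randrange(2 ** k) * SLOT_TIME_BITS

# After the first collision the station waits 0 or 1 slot; the range
# doubles with each further collision, spreading retransmissions out.
delays = [backoff_delay(n) for n in range(1, 4)]
assert all(d % SLOT_TIME_BITS == 0 for d in delays)
```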

Ethernet First Generation PHY: 10Base-5

One sender and multiple receivers.
Simplex, i.e. it's not possible to send and receive at the same time.
Collisions are detected by sensing the voltage levels on the line.
Cable and cable connectors are expensive and fragile.

Ethernet PHY: 10Base-T

The shared coaxial cable is replaced with hubs and dedicated cables.
10 Mbit/s.
Unshielded twisted pair cable (UTP).

Ethernet PHY: 10Base-T

Logically, but not physically, shared medium.
Still CSMA/CD.
The cable is a cheap unshielded twisted pair cable (voice-grade telephone cable): one twisted pair for transmitting and one pair for receiving.
Collisions are detected by simultaneous activity on the receive and transmit pairs.
Point-to-point links, i.e. the cable is dedicated to one station.
Hubs are active devices that propagate all transmissions and collisions to all stations with no delay or buffering.
Simplex.

Ethernet PHY: 100Base-TX

100 Mbit/s.
Same logical and physical structure as 10Base-T.
Same cable type as 10Base-T (two twisted pairs) but requires Category 5 cable.
More advanced physical encoding (4B/5B, MLT-3) allows a 100 Mbit/s data rate in only a 31 MHz frequency spectrum.

Ethernet PHY: 1000Base-X

The amount of bandwidth wasted due to the time it takes to propagate a collision throughout the network is large.
One solution uses carrier extension and frame bursting to lower the overhead.
Full duplex with a switched network is the preferred solution; this avoids collisions altogether.

Ethernet PHY: 1000Base-SX/LX

1 Gbit/s.
Based on the Fibre Channel physical layer, which is fibre-optic.
Data is transmitted over a pair of fibres.
Available today.

Ethernet PHY: 1000Base-T

Four pairs of Category 5 twisted-pair copper wire, i.e. the same cable as 100Base-T.
1 Gbit/s is accomplished with 250 Mbit/s per pair, simultaneously in both directions.
Extremely complex signal processing is required (150 Gops).
First-generation chips are available today.

1000Base-T: Encoding

5-level pulse amplitude modulation.
Trellis encoding at the transmitter, combined with a Viterbi decoder at the receiver, is used to gain 6 dB in SNR.
Partial-response spectrum shaping is used to keep the power spectral density below that of the 100Base-TX standard.

1000Base-T: Signal Processing

Each receiver (one per wire pair) has an echo cancellation filter to deal with the echo from its own transmitter.
Each receiver has three adaptive cancellation filters to cancel the effects of near-end crosstalk (NEXT) from the three other wire pairs.

Ethernet architecture: Problems

The shared medium limits total bandwidth to the bandwidth of one station.
All transmissions (and collisions) must be propagated throughout the network before the next transmission can begin. This limits the physical size of the network and the maximum packet rate.
CSMA/CD efficiency drops under heavy load due to collision overhead.
New arbitration methods, e.g. prioritized traffic, aren't possible since the CSMA/CD arbitration method is built into the protocol.

Ethernet architecture: Switches

The arbitration protocol is hidden inside the switch fabric, i.e. no longer limited by CSMA/CD.
Many senders and receivers at the same time.
Total bandwidth is only limited by the switch fabric's internal design.

Switch port

Full duplex, i.e. simultaneous receive and transmit.

Ethernet architecture: Switches

Can be extended to arbitrary size with interswitch links (uplinks).
Switch ports can have different speeds (10/100/1000).
Easier upgrade path to higher speeds, since they can coexist.
The backbone connection (uplink) can be a higher speed.

Inside packet switches

Address lookup: How does a switch determine where to send an incoming packet?
Forwarding: How are packets copied from input port to output port?
Queuing: What happens when several input ports send to the same output port?

Address Lookup

For each received packet, the port to send the packet to is determined by looking up the packet header's Ethernet destination address in a table that translates addresses to port numbers.

dst adr   port
16271     3
23179     1
73544     3
...

Forwarding

Transfer the packet from the input port to the correct output port.
One common solution is a crossbar, which can connect all inputs to all outputs.

Queuing

In some cases the packet has to be stored temporarily before transmitting:
If the output port speed differs from the input port's.
If several inputs transmit to the same output.
If some packets have higher priority and are therefore not sent in arrival order.

A closer look at Address Lookup

Address lookup consists of two primary functions:
Provide a translation table between destination address and port number.
Update the table according to changes in the network.

Switching versus routing

The differences between switches and routers in LANs:
Address lookup is based on the L2 (Ethernet) address in a switch and on the L3 (IP/IPX) address in a router.
Address learning is based on routing protocols (OSPF, RIP) in a router and on 802.1 bridge learning in a switch.
A switch forwards broadcast packets.

Layer-2 switching

Address lookup:
The Ethernet address (L2) determines which switch port a packet is sent to.
The address lookup must translate a 48-bit Ethernet MAC address to a destination port number.

Address learning/updates:
The switch learns the address-to-port mapping by listening to the source addresses of all incoming frames, thereby determining which addresses are present at each port.
If an address isn't present in the lookup table, the frame is flooded (broadcast) to all ports. This simulates a shared LAN.
Due to the need for broadcasts for address learning (and other functions), L2 switches don't scale very well.
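The learning and flooding behaviour described above can be modelled in a few lines. A minimal sketch; the `LearningSwitch` class is hypothetical, for illustration only:

```python
class LearningSwitch:
    """Toy model of 802.1 bridge learning as described above:
    learn source addresses, flood unknown destinations."""
    def __init__(self, num_ports: int):
        self.ports = range(num_ports)
        self.table = {}                      # MAC address -> port

    def receive(self, in_port: int, src: str, dst: str):
        self.table[src] = in_port            # learn where src lives
        if dst in self.table:
            out = self.table[dst]
            return [] if out == in_port else [out]
        # unknown destination: flood to all ports except the input
        return [p for p in self.ports if p != in_port]

sw = LearningSwitch(4)
assert sw.receive(0, "A", "B") == [1, 2, 3]   # B unknown: flood
assert sw.receive(1, "B", "A") == [0]         # A was learned on port 0
```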

Layer-3 switching (routing)

Address lookup:
The IP (L3) address is used to determine the destination port.
The address lookup must translate a 32-bit subnetted IP address into a port number.
Subnet addressing means that a varying number of bits is used to determine the network number / destination port. This complicates the design of the address lookup table.
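The variable-length network part described above is what makes an L3 lookup a longest-prefix match. A sketch using Python's `ipaddress` module, with a made-up three-entry table:

```python
import ipaddress

# Hypothetical forwarding table (prefix -> port). The varying prefix
# length is what complicates hardware lookup table design.
routes = [
    (ipaddress.ip_network("10.0.0.0/8"), 1),
    (ipaddress.ip_network("10.1.0.0/16"), 2),
    (ipaddress.ip_network("0.0.0.0/0"), 3),    # default route
]

def lookup(dst: str) -> int:
    """Longest-prefix match: the most specific matching route wins."""
    addr = ipaddress.ip_address(dst)
    best = max((net for net, _ in routes if addr in net),
               key=lambda net: net.prefixlen)
    return next(port for net, port in routes if net == best)

assert lookup("10.1.2.3") == 2     # /16 beats /8
assert lookup("10.2.0.1") == 1
assert lookup("192.0.2.1") == 3    # falls through to the default
```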

Layer-3 switching

Address learning/updates:
Routing protocols, e.g. RIP, OSPF, BGP, are used to administer address learning and updates.
These routing protocols are much more efficient than L2 address learning and can therefore scale to very large networks, e.g. the Internet.

Layer 4-7 switching

A confusing term, usually used to describe some type of server or link load balancing.

A closer look at switch fabrics


Switch Fabric Types

Bus based.
Crossbar switch.
Multi-stage network.
Shared memory.
Combinations thereof.

Bus Based

Each switch port is connected to a shared high-speed bus.
Much like 10Base-5 Ethernet, but can have much higher bandwidth.
Limited scalability, since the required bus bandwidth is the total port bandwidth.

Crossbar switch

Physical connection between all ports.
Limited scalability due to the quadratic increase in the number of wires.

Multi-stage network

Scalable.
Requires a large number of chips.
Store-and-forward results in long latencies.

Shared memory

Limited scalability, since the required memory bandwidth is the total port bandwidth.
Very efficient solution if built on a single chip.

A closer look at queuing

Queuing / Packet Buffering

Since packets from several input ports can arrive for the same output port simultaneously, some temporary packet buffer is needed.
The packet buffer is usually implemented as a FIFO queue.

Input Buffering

Packets are buffered at the input port.
The problem with input buffering is head-of-line blocking: the first packet, destined for port 1, waits for port 1 to become available, and the following three packets are blocked although they are destined for port 2, which is free.
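The head-of-line example above can be reproduced with a toy FIFO model; the port numbers and the `serviceable` helper are illustrative:

```python
from collections import deque

# Toy version of the slide's example: one input FIFO holds packets
# destined for ports [1, 2, 2, 2]; port 1 is congested, port 2 is free.
busy = {1}                          # output port 1 is busy

def serviceable(q: deque, busy: set) -> int:
    """With FIFO (head-of-line) service, count packets that can go now."""
    sent = 0
    while q and q[0] not in busy:
        q.popleft()
        sent += 1
    return sent

assert serviceable(deque([1, 2, 2, 2]), busy) == 0   # head blocks everything
assert serviceable(deque([2, 2, 2, 1]), busy) == 3   # same packets, no block
```

The first assertion is the pathological case: three deliverable packets wait behind one blocked packet.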

Output buffering

No head-of-line blocking.
Broadcast traffic is duplicated.
The buffer bandwidth must be much higher, since the buffer must be able to swallow traffic from all input ports.

Packet buffering

Packets are variable length with a nonuniform length distribution.

[Histogram: packet length in bytes, 64-1500]

Buffer fragmentation

One frame per cell: large internal fragmentation.
Frames spanning multiple cells: small external fragmentation.

[Diagram: a 1518-byte frame stored in 256-byte cells]

Cut-through / Store-and-forward

Store-and-forward: a complete frame is buffered before transmission starts on the destination port. This results in long latency.
Cut-through: transmission starts as soon as the destination port is known. This results in short latency, but if the frame is faulty the fault is propagated.

Congestion

When an output port receives more than it can transmit, the port becomes congested and its queue starts to grow very fast.

[Diagram: 0.5 Gbps and 0.8 Gbps input flows converging on a 1.0 Gbps output port]

Congestion Control

There are several methods of relieving a congestion situation:
Signaling the transmitter to stop further transmissions.
Dropping excess frames.
Congestion notification.
Admission control, which requires allocation of bandwidth before transmission.

Flow Control / Pause control

PAUSE frames are sent to the traffic sources on the ports that cause the congestion.
This affects all types of traffic from a source, which means that the method can't support real-time traffic.
It's a link-level congestion control scheme and therefore can't handle end-to-end requirements effectively.
It works well with protocols that don't have any congestion control built in.
TCP/IP doesn't work well with this scheme.

Admission Control

Another method of avoiding congestion is admission control, where each connection through the network must be approved.
With admission control it's possible to control the network load and therefore give tight guarantees on available bandwidth and latency.
This solution doesn't scale very well to Internet size, since each connection has to be approved by all intermediate stations.

Explicit Congestion Indication

One method of handling congestion is to signal explicitly to nodes in the network that a congestion situation has occurred.
Pause control is a link-level ECI.

Implicit Congestion Indication

Lost packets can be used as a congestion indicator in networks with a very low bit error rate (BER).
Switches can use this to control the bandwidth of TCP connections by dropping packets on purpose.

TCP/IP Congestion Control

TCP/IP has built-in congestion control mechanisms based on e.g. the transmission window and slow start.

TCP window:
The transmission window limits the number of bytes allowed to be sent without receiving an acknowledgement.
When packets are dropped in the network, the transmitter decreases the window size, thereby limiting the transmitted bandwidth.

TCP/IP Congestion Control

TCP slow start:
To avoid severe congestion after a congestion situation disappears, a TCP transmitter is not allowed to start sending at full bandwidth immediately.
Instead, a gradual increase in allowed bandwidth, up to the maximum available bandwidth, is used.
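The window and slow-start behaviour described above can be caricatured in a few lines. A toy model only, with illustrative constants rather than anything from the TCP specification:

```python
def cwnd_trace(rounds, loss_at, ssthresh=32):
    """Toy model of the behaviour described above: exponential growth
    (slow start) up to ssthresh, then linear growth; on a loss the
    window collapses and the sender ramps up again."""
    cwnd, trace = 1, []
    for r in range(rounds):
        trace.append(cwnd)
        if r == loss_at:
            ssthresh = max(cwnd // 2, 1)
            cwnd = 1                      # restart in slow start
        elif cwnd < ssthresh:
            cwnd *= 2                     # slow start: double per round
        else:
            cwnd += 1                     # congestion avoidance: +1 per round
    return trace

trace = cwnd_trace(8, loss_at=4)
# growth 1, 2, 4, 8, 16, then a loss -> back to 1 and ramp up again
assert trace == [1, 2, 4, 8, 16, 1, 2, 4]
```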

Quality of Service (QoS)

For multimedia and other sensitive traffic types, some type of quality assurance is desirable:
Guaranteed minimum bandwidth.
Guaranteed maximum latency.
Guaranteed maximum bandwidth/latency variation.

QoS

In a shared Ethernet it's difficult to guarantee QoS due to the nature of CSMA/CD.
In a switched network, latency and bandwidth can be controlled within a single switch, but end-to-end guarantees spanning multiple switches require control protocols.

QoS: Latency

Latency depends on the traffic behavior, the queue lengths in the switch and the arbitration protocol within the switch.
By separating different types of traffic into different queues, it's possible to separate bursty traffic from traffic that requires low latency. This doesn't guarantee latency, though.

QoS: Bandwidth

Bandwidth allocation: the source tries to allocate bandwidth, and the switch either rejects or accepts the request, thereby never oversubscribing its capacity.
Traffic rejection by priority or class: by rejecting (dropping) insensitive traffic when the switch gets congested, sensitive traffic can get its required bandwidth.

Class-of-Service / DiffServ

Define a small number of classes, e.g. real-time and best-effort.
Classify packets into one of the classes as they enter the network, and tag them with a class identifier.
Switches in the network can then treat packets according to class, without per-connection information.
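The classify-and-tag idea can be sketched as follows. The class identifiers and field names are hypothetical (real DiffServ uses DSCP code points in the IP header), but the mechanism is the same: tag at the edge, treat by class in the interior.

```python
# Hypothetical class identifiers for a two-class network.
REAL_TIME, BEST_EFFORT = 0, 1

def classify(pkt: dict) -> int:
    """Boundary node: map a packet to a class and tag it."""
    pkt["class"] = REAL_TIME if pkt.get("proto") == "rtp" else BEST_EFFORT
    return pkt["class"]

def enqueue(queues: dict, pkt: dict):
    """Interior node: per-class queues, no per-connection state."""
    queues.setdefault(pkt["class"], []).append(pkt)

queues = {}
for pkt in [{"proto": "rtp"}, {"proto": "tcp"}, {"proto": "rtp"}]:
    classify(pkt)
    enqueue(queues, pkt)
assert len(queues[REAL_TIME]) == 2 and len(queues[BEST_EFFORT]) == 1
```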

DiffServ Domain

[Diagram: DiffServ domains of switches/routers, with boundary nodes at the domain edges and interior nodes inside]

VLSI Architecture for Gigabit Ethernet Switches

VLSI Architectures for Packet Switching

Design goals:
Low cost per port → high integration.
High aggregate performance → high integration.
Low power dissipation → high integration.
Scalability.

Switch Fabrics

Desired characteristics:
Non-blocking, i.e. can forward full traffic from all input ports to all outputs without dropping packets.
Low cost per port.

Switch Architecture

[Diagram: MACs → serial-to-parallel converter → shared buffer memory → parallel-to-serial converter → MACs, with a CAM for the address lookup]

Shared Memory Switch

Advantages:
Low chip count (a single-chip switch is possible).
Low power consumption.

Disadvantages:
High shared memory bandwidth.
High lookup rate.

Switching Challenges at Gigabit Speed

The required shared memory bandwidth is very large.
Each incoming packet results in several address lookups to determine the destination port.
Each Gigabit Ethernet link uses many pins.

Switching at 1 Gbit/s

Each link runs at 1 Gbit/s full duplex.
This results in up to 1.5 million packets/s per direction.
A typical switch will have 32 ports on a single chip (within two years).
Total packet rate: 48 million packets/s.

Shared Memory Bandwidth

32 ports @ 1 Gbit/s: 32 * 2 * 125 Mbyte/s = 8 Gbyte/s.
32 ports @ 10 Gbit/s: 32 * 2 * 1250 Mbyte/s = 80 Gbyte/s.
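The packet-rate figures above follow from the minimum frame size plus preamble and inter-packet gap. A quick check of the slide's arithmetic:

```python
# A minimum-size frame occupies 64 bytes + 8 bytes preamble
# + 12 bytes (96-bit) inter-packet gap = 84 byte times on the wire.
WIRE_BYTES = 64 + 8 + 12

pkts_per_s = 1e9 / (WIRE_BYTES * 8)          # one 1 Gbit/s link, one direction
assert round(pkts_per_s / 1e6, 1) == 1.5     # ~1.5 million packets/s

total_pps = 32 * pkts_per_s                  # 32 ports
assert round(total_pps / 1e6) == 48          # ~48 million packets/s

# Shared memory must absorb both directions of every port:
assert 32 * 2 * 125 == 8000                  # Mbyte/s at 1 Gbit/s -> 8 Gbyte/s
assert 32 * 2 * 1250 == 80000                # at 10 Gbit/s -> 80 Gbyte/s
```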

Shared Memory Bandwidth

One solution is to convert the data stream into a more parallel stream.
Using a 512-bit-wide memory, the 8 Gbyte/s requires only 125 MHz.

Address Lookup Rate

32 ports @ 1 Gbit/s: 48 million packets/s at 3 lookups/packet → 144 million address lookups/s.
32 ports @ 10 Gbit/s: 1440 million address lookups/s.

Address Lookup Rate

It's clear that lookup algorithms like binary search trees and hash lookups will be difficult to implement at these speeds.
Special full-custom CAMs are therefore used.

Pin Count

32 ports @ 10 Gbit/s (full duplex): 32 * 10 * 2 = 640 pins @ 1 GHz, i.e. 1 Gbit/s per pin.
Low swing to reduce average power and peak currents.
High frequency to reduce the number of pins.
Very difficult to avoid skew on the PCB; signaling should probably be asynchronous.
Signals mustn't switch simultaneously.
The interface must be standardized.

Processor Performance

Single processor: 48 million packets/s (32 ports @ 1 Gbit/s) on a 1000 MIPS processor leaves about 20 instructions/packet.
Multiprocessor (one processor per port): 1.5 million packets/s per port on a 1000 MIPS processor leaves about 670 instructions/packet.

Packet Processing

The amount of processing performed on each packet is increasing rapidly as more advanced services are introduced.
Programmable solutions are required to give sufficient flexibility and speed of implementation.

Processor Power Budget

Assume a total power budget of 40 W for a switch with 32 ports @ 1 Gbit/s.
Assume half the power is consumed by the port processors, each running at 1 GHz:
40 / 2 / 32 = 0.6 W per processor @ 1 GHz.
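The instruction and power budgets above are straightforward divisions; checking them:

```python
# Reproducing the processing-budget arithmetic from the slides.
MIPS = 1000e6                          # a 1000 MIPS processor

single_cpu = MIPS / 48e6               # one CPU serving 48 Mpackets/s
assert int(single_cpu) == 20           # ~20 instructions/packet

per_port_cpu = MIPS / 1.5e6            # one CPU per port, 1.5 Mpackets/s
assert round(per_port_cpu, -1) == 670  # ~670 instructions/packet

# Power budget: 40 W total, half of it for the 32 port processors.
watts_per_cpu = 40 / 2 / 32
assert watts_per_cpu == 0.625          # ~0.6 W per 1 GHz processor
```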

Design Flow

Single Chip Shared Memory Switch

[Block diagram: MACs with packet decoders (PDEC) feed a serial-to-parallel converter into the shared buffer memory; a CAM performs the address lookup; re-encoders (ReEnc) and a parallel-to-serial stage drive the output MACs; support blocks include a CPU interface, a Q-Engine and a Rambus interface]

Design Approach

Choose an architecture that makes it possible to utilize the silicon.
The design is divided into a number of blocks, with 1-2 designers per block.
Critical blocks are designed in full custom, mostly memory or datapath-like structures.
All other blocks are designed at RTL level and synthesized to standard cells.

Standard Cell Design

Design and simulate at RTL level:
The design is entered mostly as synthesizable Verilog code, using a text editor.
Verilog/VHDL descriptions are simulated at RTL level using ModelSim.
Full-custom blocks are modeled at RTL with timing (including compiled memories).
The RTL code is synthesized continually during design.

Synthesis:
Synopsys synthesis using a standard cell library.
Timing requirements are set up at block boundaries to allow for long communication paths between blocks.

Static Timing Analysis:
Performed initially in Synopsys at block level.
Chip-level timing analysis is performed using PrimeTime and also in the back-end tools.

Full-Custom Design

Some full-custom blocks are drawn manually using the GDT/Led editor.
Regular structures are generated using the L language in GDT; examples are SRAMs and CAMs.
Fab-independent 0.25 um design rules are used.

Methodology Comparison

Build a serial-to-parallel converter that converts 16 8-bit-wide data streams into a single 512-bit-wide stream.
Compare:
a semi-custom approach using synthesized standard cells.
a full-custom layout structure.
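The converter being compared can be modelled behaviourally. A sketch assuming 16 byte-wide streams packed over 4 cycles into one 512-bit word (16 streams x 8 bits = 128 bits per cycle); the function name is illustrative:

```python
def serial_to_parallel(streams):
    """Behavioral sketch of the converter in the comparison: 16 byte-wide
    input streams are packed into 512-bit words. At 128 bits per cycle,
    one 512-bit word is filled every 4 cycles."""
    assert len(streams) == 16
    words = []
    cycles = len(streams[0])
    for start in range(0, cycles, 4):
        word = 0
        for cycle in range(start, start + 4):
            for s in streams:
                word = (word << 8) | s[cycle]   # shift in one byte per stream
        words.append(word)                      # one 512-bit value
    return words

# 16 streams with 4 cycles of data each -> exactly one 512-bit word.
streams = [[i, 0, 0, 0] for i in range(16)]
words = serial_to_parallel(streams)
assert len(words) == 1 and words[0].bit_length() <= 512
```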

Methodology Comparison

                           Area     Power
Semi-custom (others)       30 mm2   6.3 W
Full custom (SwitchCore)   12 mm2   0.5 W

The semi-custom approach takes ~3x the area and ~12x the power.
Assumes a 0.25 um, 2.5 V process.

SwitchCore AB

Founded 1997 to commercialize a Swedish research project jointly developed at LTH and LiTH.
The research project demonstrated a single-chip 8 x 10 Gbit/s ATM switch.
Current products:
A single-chip 16-port Gigabit Ethernet switch.
A single-chip 24 FE + 4 GE Ethernet switch.
A CAM chip for large address tables.

CXe-16 Prototype Chip

A single-chip 16-port Gigabit Ethernet switch.

Characteristics

Non-blocking switch fabric with 32 Gbit/s switching bandwidth.
16 x 1 Gbit/s full duplex Ethernet ports.
Layer-2 and layer-3 switching.
Incoming traffic: 31 million packets/s.

References

Gigabit Ethernet, Jayant Kadambi et al., ISBN 0-13-913286-4.
Interconnections, Second Edition: Bridges, Routers, Switches and Internetworking Protocols, Radia Perlman, ISBN 0-201-63448-1.
IP Switching, Christopher Y. Metz, ISBN 0-07-041953-1.
http://www.switchcore.com
