
Gigabit Ethernet Switching

Kenny Ranerup
Director of Engineering
SwitchCore AB
kenny.ranerup@switchcore.com

Ethernet

Ethernet is a communication protocol at OSI layers 1 and 2 (physical and data link).
It was created in the early 80s and has since evolved from 10 Mbit/s to today's 1 Gbit/s, with 10 Gbit/s on its way.
The original usage was communication between a small number of scientific computers.

Kenny Ranerup 1999-11-1

Ethernet Packet Format

[Frame layout: IPG | preamble | dst addr (6 bytes) | src addr (6 bytes) | type | data (46-1500 bytes) | CRC | IPG | preamble | ...]

Data is divided into variable-length packets.
Minimum frame size is 64 bytes and maximum is 1518 bytes.
Minimum inter-packet gap is 96 bits.
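The frame layout above can be expressed as a small parser. A minimal sketch in Python, with the field offsets taken from the slide (6-byte addresses, 2-byte type, 46-1500 bytes of data, 4-byte CRC); `parse_frame` is an illustrative name, not part of any standard API.

```python
import struct

# Frame size limits from the slide: 64-1518 bytes
# (header 14 + payload 46-1500 + CRC 4); the 96-bit IPG
# and the preamble are on the wire, not in the frame.
MIN_FRAME, MAX_FRAME = 64, 1518

def parse_frame(frame: bytes):
    """Split an Ethernet frame (without preamble) into its fields."""
    if not MIN_FRAME <= len(frame) <= MAX_FRAME:
        raise ValueError("frame size out of range")
    dst, src = frame[0:6], frame[6:12]
    (ethertype,) = struct.unpack("!H", frame[12:14])
    payload, crc = frame[14:-4], frame[-4:]
    return dst, src, ethertype, payload, crc

# Example: a minimum-size frame carrying 46 bytes of payload.
frame = bytes(6) + bytes(6) + struct.pack("!H", 0x0800) + bytes(46) + bytes(4)
dst, src, ethertype, payload, crc = parse_frame(frame)
assert len(frame) == 64 and len(payload) == 46
```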

Ethernet First Generation

10 Mbit/s.
Coaxial cable.
Shared medium.
Access protocol: CSMA/CD.
10-100 machines on one network.
Physical size limited.

Ethernet Access Protocol

Carrier Sense Multiple Access with Collision Detect (CSMA/CD):
One shared medium.
Wait until the medium is free, then transmit.
If a collision occurs, retransmit after a random period.
If collisions repeat, back off exponentially.
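The backoff rule on this slide is the truncated binary exponential backoff of 802.3. A toy sketch, with the classic 512-bit slot time and the standard 16-attempt limit:

```python
import random

SLOT_TIME_BITS = 512          # one slot = 512 bit times in classic Ethernet
MAX_ATTEMPTS = 16             # 802.3 gives up after 16 attempts
BACKOFF_LIMIT = 10            # the exponent is capped at 10

def backoff_delay(attempt: int) -> int:
    """Truncated binary exponential backoff: after the n-th collision,
    wait r slot times with r drawn uniformly from [0, 2^min(n,10) - 1]."""
    if attempt >= MAX_ATTEMPTS:
        raise RuntimeError("excessive collisions, frame dropped")
    k = min(attempt, BACKOFF_LIMIT)
    return random.randrange(2 ** k) * SLOT_TIME_BITS

# After the first collision the station waits 0 or 1 slot; the range
# doubles with each further collision, spreading retransmissions out.
delays = [backoff_delay(n) for n in range(1, 4)]
assert all(d % SLOT_TIME_BITS == 0 for d in delays)
```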

Ethernet First Generation PHY: 10Base-5

One sender and multiple receivers.
Simplex, i.e. it's not possible to send and receive at the same time.
Collisions are detected by sensing the voltage levels on the line.
Cable and cable connectors are expensive and fragile.

Ethernet PHY: 10Base-T

The shared coaxial cable is replaced with hubs and dedicated cables.
10 Mbit/s.
Unshielded twisted pair cable (UTP).

Ethernet PHY: 10Base-T

Logically, but not physically, shared medium.
Still CSMA/CD.
The cable is a cheap unshielded twisted pair cable (voice-grade telephone cable): one twisted pair for transmitting and one pair for receiving.
Collisions are detected by simultaneous activity on the receive and transmit pairs.
Point-to-point links, i.e. the cable is dedicated to one station.
Hubs are active devices that propagate all transmissions and collisions to all stations with no delay or buffering.
Simplex.

Ethernet PHY: 100Base-TX

100 Mbit/s.
Same logical and physical structure as 10Base-T.
Same cable type as 10Base-T (two twisted pairs) but requires Category 5 cable.
More advanced physical encoding (4B/5B, MLT-3) allows a 100 Mbit/s data rate in only a 31 MHz frequency spectrum.

Ethernet PHY: 1000Base-X

The amount of bandwidth wasted due to the time it takes to propagate a collision throughout the network is large.
One solution uses carrier extension and frame bursting to lower the overhead.
Full duplex with a switched network is the preferred solution; this avoids collisions altogether.

Ethernet PHY: 1000Base-SX/LX

1 Gbit/s.
Based on the Fibre Channel physical layer, which is fibre-optic.
Data is transmitted over a pair of fibres.
Available today.

Ethernet PHY: 1000Base-T

Four pairs of Category 5 twisted-pair copper wire, i.e. the same cable as 100Base-T.
1 Gbit/s is accomplished with 250 Mbit/s per pair, simultaneously in both directions.
Extremely complex signal processing is required (150 Gops).
First-generation chips are available today.

1000Base-T: Encoding

5-level pulse amplitude modulation.
Trellis encoding at the transmitter, combined with a Viterbi decoder at the receiver, is used to gain 6 dB in SNR.
Partial-response spectrum shaping is used to keep the power spectral density below that of the 100Base-TX standard.

1000Base-T: Signal Processing

Each receiver (one per wire pair) has an echo cancellation filter to deal with the echo from its own transmitter.
Each receiver has three adaptive cancellation filters to cancel the effects of near-end crosstalk (NEXT) from the three other wire pairs.

Ethernet architecture: Problems

The shared medium limits total bandwidth to the bandwidth of one station.
All transmissions (and collisions) must be propagated throughout the network before the next transmission can begin. This limits the physical size of the network and the maximum packet rate.
CSMA/CD efficiency drops under heavy load due to collision overhead.
New arbitration methods, e.g. prioritized traffic, aren't possible since the CSMA/CD arbitration method is built into the protocol.

Ethernet architecture: Switches

The arbitration protocol is hidden inside the switch fabric, i.e. no longer limited by CSMA/CD.
Many senders and receivers at the same time.
Total bandwidth is only limited by the switch fabric's internal design.

Switch port

Full duplex, i.e. simultaneous receive and transmit.

Ethernet architecture: Switches

Can be extended to arbitrary size with interswitch links (uplinks).
Switch ports can have different speeds (10/100/1000).
Easier upgrade path to higher speeds, since they can coexist.
The backbone connection (uplink) can be a higher speed.

Inside packet switches

Address lookup: How does a switch determine where to send an incoming packet?
Forwarding: How are packets copied from input port to output port?
Queuing: What happens when several input ports send to the same output port?

Address Lookup

For each received packet, the port to send the packet to is determined by looking up the packet header's Ethernet destination address in a table that translates addresses to port numbers.

dst adr   port
16271     3
23179     1
73544     3
...

Forwarding

Transfer the packet from the input port to the correct output port.
One common solution is a crossbar, which can connect all inputs to all outputs.

Queuing

In some cases the packet has to be stored temporarily before transmitting:
If the output port speed differs from the input port's.
If several inputs transmit to the same output.
If some packets have higher priority and are therefore not sent in arrival order.

A closer look at Address Lookup

Address lookup consists of two primary functions:
Provide a translation table between destination address and port number.
Update the table according to changes in the network.

Switching versus routing

The differences between switches and routers in LANs:
Address lookup is based on the L2 (Ethernet) address in a switch and on the L3 (IP/IPX) address in a router.
Address learning is based on routing protocols (OSPF, RIP) in a router and on 802.1 bridge learning in a switch.
A switch forwards broadcast packets.

Layer-2 switching

Address lookup:
The Ethernet address (L2) determines which switch port a packet is sent to.
The address lookup must translate a 48-bit Ethernet MAC address to a destination port number.

Address learning/updates:
The switch learns the address-to-port mapping by listening to the source addresses of all incoming frames, thereby determining which addresses are present at each port.
If an address isn't present in the lookup table, the frame is flooded (broadcast) to all ports. This simulates a shared LAN.
Due to the need for broadcasts for address learning (and other functions), L2 switches don't scale very well.
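The learning and flooding behaviour described above can be modelled in a few lines. A minimal sketch; the `LearningSwitch` class is hypothetical, for illustration only:

```python
class LearningSwitch:
    """Toy model of 802.1 bridge learning as described above:
    learn source addresses, flood unknown destinations."""
    def __init__(self, num_ports: int):
        self.ports = range(num_ports)
        self.table = {}                      # MAC address -> port

    def receive(self, in_port: int, src: str, dst: str):
        self.table[src] = in_port            # learn where src lives
        if dst in self.table:
            out = self.table[dst]
            return [] if out == in_port else [out]
        # unknown destination: flood to all ports except the input
        return [p for p in self.ports if p != in_port]

sw = LearningSwitch(4)
assert sw.receive(0, "A", "B") == [1, 2, 3]   # B unknown: flood
assert sw.receive(1, "B", "A") == [0]         # A was learned on port 0
```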

Layer-3 switching (routing)

Address lookup:
The IP (L3) address is used to determine the destination port.
The address lookup must translate a 32-bit subnetted IP address into a port number.
Subnet addressing means that a varying number of bits is used to determine the network number / destination port. This complicates the design of the address lookup table.
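The variable-length network part described above is what makes an L3 lookup a longest-prefix match. A sketch using Python's `ipaddress` module, with a made-up three-entry table:

```python
import ipaddress

# Hypothetical forwarding table (prefix -> port). The varying prefix
# length is what complicates hardware lookup table design.
routes = [
    (ipaddress.ip_network("10.0.0.0/8"), 1),
    (ipaddress.ip_network("10.1.0.0/16"), 2),
    (ipaddress.ip_network("0.0.0.0/0"), 3),    # default route
]

def lookup(dst: str) -> int:
    """Longest-prefix match: the most specific matching route wins."""
    addr = ipaddress.ip_address(dst)
    best = max((net for net, _ in routes if addr in net),
               key=lambda net: net.prefixlen)
    return next(port for net, port in routes if net == best)

assert lookup("10.1.2.3") == 2     # /16 beats /8
assert lookup("10.2.0.1") == 1
assert lookup("192.0.2.1") == 3    # falls through to the default
```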

Layer-3 switching

Address learning/updates:
Routing protocols, e.g. RIP, OSPF, BGP, are used to administer address learning and updates.
These routing protocols are much more efficient than L2 address learning and can therefore scale to very large networks, e.g. the Internet.

Layer 4-7 switching

A confusing term, usually used to describe some type of server or link load balancing.

A closer look at switch fabrics


Switch Fabric Types

Bus based.
Crossbar switch.
Multi-stage network.
Shared memory.
Combinations thereof.

Bus Based

Each switch port is connected to a shared high-speed bus.
Much like 10Base-5 Ethernet, but can have much higher bandwidth.
Limited scalability, since the required bus bandwidth is the total port bandwidth.

Crossbar switch

Physical connection between all ports.
Limited scalability due to the quadratic increase in the number of wires.

Multi-stage network

Scalable.
Requires a large number of chips.
Store-and-forward results in long latencies.

Shared memory

Limited scalability, since the required memory bandwidth is the total port bandwidth.
Very efficient solution if built on a single chip.

A closer look at queuing

Queuing / Packet Buffering

Since packets from several input ports can arrive for the same output port simultaneously, some temporary packet buffer is needed.
The packet buffer is usually implemented as a FIFO queue.

Input Buffering

Packets are buffered at the input port.
The problem with input buffering is head-of-line blocking: the first packet, destined for port 1, waits for port 1 to become available, and the following three packets are blocked although they are destined for port 2, which is free.
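The head-of-line example above can be reproduced with a toy FIFO model; the port numbers and the `serviceable` helper are illustrative:

```python
from collections import deque

# Toy version of the slide's example: one input FIFO holds packets
# destined for ports [1, 2, 2, 2]; port 1 is congested, port 2 is free.
busy = {1}                          # output port 1 is busy

def serviceable(q: deque, busy: set) -> int:
    """With FIFO (head-of-line) service, count packets that can go now."""
    sent = 0
    while q and q[0] not in busy:
        q.popleft()
        sent += 1
    return sent

assert serviceable(deque([1, 2, 2, 2]), busy) == 0   # head blocks everything
assert serviceable(deque([2, 2, 2, 1]), busy) == 3   # same packets, no block
```

The first assertion is the pathological case: three deliverable packets wait behind one blocked packet.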

Output buffering

No head-of-line blocking.
Broadcast traffic is duplicated.
The buffer bandwidth must be much higher, since the buffer must be able to swallow traffic from all input ports.

Packet buffering

Packets are variable length with a nonuniform length distribution.

[Histogram: packet length in bytes, 64-1500]

Buffer fragmentation

One frame per cell: large internal fragmentation.
Frames spanning multiple cells: small external fragmentation.

[Diagram: a 1518-byte frame stored in 256-byte cells]

Cut-through / Store-and-forward

Store-and-forward: a complete frame is buffered before transmission starts on the destination port. This results in long latency.
Cut-through: transmission starts as soon as the destination port is known. This results in short latency, but if the frame is faulty the fault is propagated.

Congestion

When an output port receives more than it can transmit, the port becomes congested and its queue starts to grow very fast.

[Diagram: 0.5 Gbps and 0.8 Gbps input flows converging on a 1.0 Gbps output port]

Congestion Control

There are several methods of relieving a congestion situation:
Signaling the transmitter to stop further transmissions.
Dropping excess frames.
Congestion notification.
Admission control, which requires allocation of bandwidth before transmission.

Flow Control / Pause control

PAUSE frames are sent to the traffic sources on the ports that cause the congestion.
This affects all types of traffic from a source, which means that the method can't support real-time traffic.
It's a link-level congestion control scheme and therefore can't handle end-to-end requirements effectively.
It works well with protocols that don't have any congestion control built in.
TCP/IP doesn't work well with this scheme.

Admission Control

Another method of avoiding congestion is admission control, where each connection through the network must be approved.
With admission control it's possible to control the network load and therefore give tight guarantees on available bandwidth and latency.
This solution doesn't scale very well to Internet size, since each connection has to be approved by all intermediate stations.

Explicit Congestion Indication

One method of handling congestion is to signal explicitly to nodes in the network that a congestion situation has occurred.
Pause control is a link-level ECI.

Implicit Congestion Indication

Lost packets can be used as a congestion indicator in networks with a very low bit error rate (BER).
Switches can use this to control the bandwidth of TCP connections by dropping packets on purpose.

TCP/IP Congestion Control

TCP/IP has built-in congestion control mechanisms based on e.g. the transmission window and slow start.

TCP window:
The transmission window limits the number of bytes allowed to be sent without receiving an acknowledgement.
When packets are dropped in the network, the transmitter decreases the window size, thereby limiting the transmitted bandwidth.

TCP/IP Congestion Control

TCP slow start:
To avoid severe congestion after a congestion situation disappears, a TCP transmitter is not allowed to start sending at full bandwidth immediately.
Instead, a gradual increase in allowed bandwidth, up to the maximum available bandwidth, is used.
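The window and slow-start behaviour described above can be caricatured in a few lines. A toy model only, with illustrative constants rather than anything from the TCP specification:

```python
def cwnd_trace(rounds, loss_at, ssthresh=32):
    """Toy model of the behaviour described above: exponential growth
    (slow start) up to ssthresh, then linear growth; on a loss the
    window collapses and the sender ramps up again."""
    cwnd, trace = 1, []
    for r in range(rounds):
        trace.append(cwnd)
        if r == loss_at:
            ssthresh = max(cwnd // 2, 1)
            cwnd = 1                      # restart in slow start
        elif cwnd < ssthresh:
            cwnd *= 2                     # slow start: double per round
        else:
            cwnd += 1                     # congestion avoidance: +1 per round
    return trace

trace = cwnd_trace(8, loss_at=4)
# growth 1, 2, 4, 8, 16, then a loss -> back to 1 and ramp up again
assert trace == [1, 2, 4, 8, 16, 1, 2, 4]
```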

Quality of Service (QoS)

For multimedia and other sensitive traffic types, some type of quality assurance is desirable:
Guaranteed minimum bandwidth.
Guaranteed maximum latency.
Guaranteed maximum bandwidth/latency variation.

QoS

In a shared Ethernet it's difficult to guarantee QoS due to the nature of CSMA/CD.
In a switched network, latency and bandwidth can be controlled within a single switch, but end-to-end guarantees spanning multiple switches require control protocols.

QoS: Latency

Latency depends on the traffic behavior, the queue lengths in the switch and the arbitration protocol within the switch.
By separating different types of traffic into different queues, it's possible to separate bursty traffic from traffic that requires low latency. This doesn't guarantee latency, though.

QoS: Bandwidth

Bandwidth allocation: the source tries to allocate bandwidth, and the switch either rejects or accepts the request, thereby never oversubscribing its capacity.
Traffic rejection by priority or class: by rejecting (dropping) insensitive traffic when the switch gets congested, sensitive traffic can get its required bandwidth.

Class-of-Service / DiffServ

Define a small number of classes, e.g. real-time and best-effort.
Classify packets into one of the classes as they enter the network, and tag them with a class identifier.
Switches in the network can then treat packets according to class, without per-connection information.
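The classify-and-tag idea can be sketched as follows. The class identifiers and field names are hypothetical (real DiffServ uses DSCP code points in the IP header), but the mechanism is the same: tag at the edge, treat by class in the interior.

```python
# Hypothetical class identifiers for a two-class network.
REAL_TIME, BEST_EFFORT = 0, 1

def classify(pkt: dict) -> int:
    """Boundary node: map a packet to a class and tag it."""
    pkt["class"] = REAL_TIME if pkt.get("proto") == "rtp" else BEST_EFFORT
    return pkt["class"]

def enqueue(queues: dict, pkt: dict):
    """Interior node: per-class queues, no per-connection state."""
    queues.setdefault(pkt["class"], []).append(pkt)

queues = {}
for pkt in [{"proto": "rtp"}, {"proto": "tcp"}, {"proto": "rtp"}]:
    classify(pkt)
    enqueue(queues, pkt)
assert len(queues[REAL_TIME]) == 2 and len(queues[BEST_EFFORT]) == 1
```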

DiffServ Domain

[Diagram: DiffServ domains of switches/routers, with boundary nodes at the domain edges and interior nodes inside]

VLSI Architecture for Gigabit Ethernet Switches

VLSI Architectures for Packet Switching

Design goals:
Low cost per port → high integration.
High aggregate performance → high integration.
Low power dissipation → high integration.
Scalability.

Switch Fabrics

Desired characteristics:
Non-blocking, i.e. can forward full traffic from all input ports to all outputs without dropping packets.
Low cost per port.

Switch Architecture

[Diagram: MACs → serial-to-parallel converter → shared buffer memory → parallel-to-serial converter → MACs, with a CAM for the address lookup]

Shared Memory Switch

Advantages:
Low chip count (a single-chip switch is possible).
Low power consumption.

Disadvantages:
High shared memory bandwidth.
High lookup rate.

Switching Challenges at Gigabit Speed

The required shared memory bandwidth is very large.
Each incoming packet results in several address lookups to determine the destination port.
Each Gigabit Ethernet link uses many pins.

Switching at 1 Gbit/s

Each link runs at 1 Gbit/s full duplex.
This results in up to 1.5 million packets/s per direction.
A typical switch will have 32 ports on a single chip (within two years).
Total packet rate: 48 million packets/s.

Shared Memory Bandwidth

32 ports @ 1 Gbit/s: 32 * 2 * 125 Mbyte/s = 8 Gbyte/s.
32 ports @ 10 Gbit/s: 32 * 2 * 1250 Mbyte/s = 80 Gbyte/s.
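The packet-rate figures above follow from the minimum frame size plus preamble and inter-packet gap. A quick check of the slide's arithmetic:

```python
# A minimum-size frame occupies 64 bytes + 8 bytes preamble
# + 12 bytes (96-bit) inter-packet gap = 84 byte times on the wire.
WIRE_BYTES = 64 + 8 + 12

pkts_per_s = 1e9 / (WIRE_BYTES * 8)          # one 1 Gbit/s link, one direction
assert round(pkts_per_s / 1e6, 1) == 1.5     # ~1.5 million packets/s

total_pps = 32 * pkts_per_s                  # 32 ports
assert round(total_pps / 1e6) == 48          # ~48 million packets/s

# Shared memory must absorb both directions of every port:
assert 32 * 2 * 125 == 8000                  # Mbyte/s at 1 Gbit/s -> 8 Gbyte/s
assert 32 * 2 * 1250 == 80000                # at 10 Gbit/s -> 80 Gbyte/s
```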

Shared Memory Bandwidth

One solution is to convert the data stream into a more parallel stream.
Using a 512-bit-wide memory, the 8 Gbyte/s requires only 125 MHz.

Address Lookup Rate

32 ports @ 1 Gbit/s: 48 million packets/s at 3 lookups/packet → 144 million address lookups/s.
32 ports @ 10 Gbit/s: 1440 million address lookups/s.

Address Lookup Rate

It's clear that lookup algorithms like binary search trees and hash lookups will be difficult to implement at these speeds.
Special full-custom CAMs are therefore used.

Pin Count

32 ports @ 10 Gbit/s (full duplex): 32 * 10 * 2 = 640 pins @ 1 GHz, i.e. 1 Gbit/s per pin.
Low swing to reduce average power and peak currents.
High frequency to reduce the number of pins.
Very difficult to avoid skew on the PCB; signaling should probably be asynchronous.
Signals mustn't switch simultaneously.
The interface must be standardized.

Processor Performance

Single processor: 48 million packets/s (32 ports @ 1 Gbit/s) on a 1000 MIPS processor leaves about 20 instructions/packet.
Multiprocessor (one processor per port): 1.5 million packets/s per port on a 1000 MIPS processor leaves about 670 instructions/packet.

Packet Processing

The amount of processing performed on each packet is increasing rapidly as more advanced services are introduced.
Programmable solutions are required to give sufficient flexibility and speed of implementation.

Processor Power Budget

Assume a total power budget of 40 W for a switch with 32 ports @ 1 Gbit/s.
Assume half the power is consumed by the port processors, each running at 1 GHz:
40 / 2 / 32 = 0.6 W per processor @ 1 GHz.
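The instruction and power budgets above are straightforward divisions; checking them:

```python
# Reproducing the processing-budget arithmetic from the slides.
MIPS = 1000e6                          # a 1000 MIPS processor

single_cpu = MIPS / 48e6               # one CPU serving 48 Mpackets/s
assert int(single_cpu) == 20           # ~20 instructions/packet

per_port_cpu = MIPS / 1.5e6            # one CPU per port, 1.5 Mpackets/s
assert round(per_port_cpu, -1) == 670  # ~670 instructions/packet

# Power budget: 40 W total, half of it for the 32 port processors.
watts_per_cpu = 40 / 2 / 32
assert watts_per_cpu == 0.625          # ~0.6 W per 1 GHz processor
```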

Design Flow

Single Chip Shared Memory Switch

[Block diagram: MACs with packet decoders (PDEC) feed a serial-to-parallel converter into the shared buffer memory; a CAM performs the address lookup; re-encoders (ReEnc) and a parallel-to-serial stage drive the output MACs; support blocks include a CPU interface, a Q-Engine and a Rambus interface]

Design Approach

Choose an architecture that makes it possible to utilize the silicon.
The design is divided into a number of blocks, with 1-2 designers per block.
Critical blocks are designed in full custom, mostly memory or datapath-like structures.
All other blocks are designed at RTL level and synthesized to standard cells.

Standard Cell Design

Design and simulate at RTL level:
The design is entered mostly as synthesizable Verilog code, using a text editor.
Verilog/VHDL descriptions are simulated at RTL level using ModelSim.
Full-custom blocks are modeled at RTL with timing (including compiled memories).
The RTL code is synthesized continually during design.

Synthesis:
Synopsys synthesis using a standard cell library.
Timing requirements are set up at block boundaries to allow for long communication paths between blocks.

Static Timing Analysis:
Performed initially in Synopsys at block level.
Chip-level timing analysis is performed using PrimeTime and also in the back-end tools.

Full-Custom Design

Some full-custom blocks are drawn manually using the GDT/Led editor.
Regular structures are generated using the L language in GDT; examples are SRAMs and CAMs.
Fab-independent 0.25 um design rules are used.

Methodology Comparison

Build a serial-to-parallel converter that converts 16 8-bit-wide data streams into a single 512-bit-wide stream.
Compare:
a semi-custom approach using synthesized standard cells.
a full-custom layout structure.
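The converter being compared can be modelled behaviourally. A sketch assuming 16 byte-wide streams packed over 4 cycles into one 512-bit word (16 streams x 8 bits = 128 bits per cycle); the function name is illustrative:

```python
def serial_to_parallel(streams):
    """Behavioral sketch of the converter in the comparison: 16 byte-wide
    input streams are packed into 512-bit words. At 128 bits per cycle,
    one 512-bit word is filled every 4 cycles."""
    assert len(streams) == 16
    words = []
    cycles = len(streams[0])
    for start in range(0, cycles, 4):
        word = 0
        for cycle in range(start, start + 4):
            for s in streams:
                word = (word << 8) | s[cycle]   # shift in one byte per stream
        words.append(word)                      # one 512-bit value
    return words

# 16 streams with 4 cycles of data each -> exactly one 512-bit word.
streams = [[i, 0, 0, 0] for i in range(16)]
words = serial_to_parallel(streams)
assert len(words) == 1 and words[0].bit_length() <= 512
```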

Methodology Comparison

                           Area     Power
Semi-custom (others)       30 mm2   6.3 W
Full custom (SwitchCore)   12 mm2   0.5 W

The semi-custom approach takes ~3x the area and ~12x the power.
Assumes a 0.25 um, 2.5 V process.

SwitchCore AB

Founded 1997 to commercialize a Swedish research project jointly developed at LTH and LiTH.
The research project demonstrated a single-chip 8 x 10 Gbit/s ATM switch.
Current products:
A single-chip 16-port Gigabit Ethernet switch.
A single-chip 24 FE + 4 GE Ethernet switch.
A CAM chip for large address tables.

CXe-16 Prototype Chip

A single-chip 16-port Gigabit Ethernet switch.

Characteristics

Non-blocking switch fabric with 32 Gbit/s switching bandwidth.
16 x 1 Gbit/s full duplex Ethernet ports.
Layer-2 and layer-3 switching.
Incoming traffic: 31 million packets/s.

References

Gigabit Ethernet, Jayant Kadambi et al., ISBN 0-13-913286-4.
Interconnections, Second Edition: Bridges, Routers, Switches and Internetworking Protocols, Radia Perlman, ISBN 0-201-63448-1.
IP Switching, Christopher Y. Metz, ISBN 0-07-041953-1.
http://www.switchcore.com
