
Alexander Paul

paulalex@de.ibm.com
ETS - Enhanced Technical Support

AIX-VUG
Demystifying 10 Gb Ethernet Performance

AIX-VUG Demystifying 10 Gb Ethernet Performance

Tools: How to benchmark?


Method 1: AIX FTP Client and Server
ftp> put "| dd if=/dev/zero bs=1M count=1000" /dev/null
1048576000 bytes sent in 5.991 seconds (1.709e+05 Kbytes/s)

~1.3 Gbit/s
Warning: the ftp client and the ftpd server are single-threaded processes!
Pid      Command  Inuse  Pin   Pgsp  Virtual  64-bit  Mthrd  16MB
9240598  ftpd     24497  9668  0     23940    N       N      N
8782016  ftp      23847  9668  0     23766    N       N      N

Running multiple ftp client sessions in parallel to get the desired overall throughput
Example:
#vi .netrc
machine 10gbench2 login root password foo
macdef init
put "| dd if=/dev/zero bs=1M count=1000" /dev/null
bye
<insert a blank line here>

chmod 400 .netrc


for i in 1 2 3 4 5 6 7 8
> do
> ftp 10gbench2 | grep seconds &
> done
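
To aggregate the per-session results into one number, the grep output can be piped through awk - a minimal sketch, assuming the ftp summary line keeps the format shown above with the Kbytes/s figure as the seventh whitespace-separated field:

{ for i in 1 2 3 4 5 6 7 8
  do
    ftp 10gbench2 | grep seconds &   # run the eight transfers in parallel
  done
  wait                               # wait for all background sessions
} | awk '{ gsub(/[()]/, "", $7); sum += $7 } END { printf "Aggregate: ~%.2f Gbit/s\n", sum*8/1000000 }'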

AIX-VUG Demystifying 10 Gb Ethernet Performance

Tools: How to benchmark?


Method 2: iperf
Open Source network benchmarking tool, written in C

Easy to use for determining network throughput


TCP and UDP benchmarks possible
Multithreaded program
iperf binary can be started in client- or server-mode

Available in a compiled version for AIX @Perzl.org:

http://www.oss4aix.org/download/RPMS/iperf/

# iperf -s
------------------------------------------------------------
Server listening on TCP port 5001
TCP window size: 16.0 KByte (default)
------------------------------------------------------------

# iperf -c localhost -t 60 -P 8
------------------------------------------------------------
Client connecting to localhost, TCP port 5001
TCP window size: 132 KByte (default)
------------------------------------------------------------
[ ID] Interval       Transfer     Bandwidth
[ 10] 0.0-60.0 sec   40.7 GBytes  5.82 Gbits/sec
[  3] 0.0-60.0 sec   40.6 GBytes  5.81 Gbits/sec
[  4] 0.0-60.0 sec   40.4 GBytes  5.79 Gbits/sec
[  5] 0.0-60.0 sec   40.5 GBytes  5.80 Gbits/sec
[  6] 0.0-60.0 sec   40.9 GBytes  5.86 Gbits/sec
[  7] 0.0-60.0 sec   40.7 GBytes  5.83 Gbits/sec
[  8] 0.0-60.0 sec   40.6 GBytes  5.82 Gbits/sec
[  9] 0.0-60.0 sec   40.7 GBytes  5.82 Gbits/sec
[SUM] 0.0-60.0 sec    325 GBytes  46.5 Gbits/sec
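
The same binary can also be pointed at a remote host instead of localhost, and a UDP run gives a feel for packet rates as well (hostname and values below are only illustrative):

# receiver
iperf -s -w 256k

# sender: 8 parallel TCP streams for 60 s, interim report every 10 s
iperf -c 10gbench2 -w 256k -t 60 -P 8 -i 10

# UDP variant (the server needs -u as well), offered load 1000 Mbit/s
iperf -s -u
iperf -c 10gbench2 -u -b 1000M -t 60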

AIX-VUG Demystifying 10 Gb Ethernet Performance

Tools: How to benchmark?


Method 2.1: jperf
Graphical Java front end for iperf
Requires preinstallation of iperf on the sender and receiver side
The GUI initiates iperf with options and in client or server mode
The corresponding side can use iperf in GUI or CLI mode

iperf -c trade3 -P 4 -i 1 -p 5001 -f k -t 10

[Screenshot: jperf GUI - control panel, save and load of benchmark runs, graphical output,
CLI output; TCP settings: buffer length, window size, MSS, no delay]

AIX-VUG Demystifying 10 Gb Ethernet Performance

Tools: How to benchmark?


Method 3: netperf
More advanced Open Source server and client toolset, written in C
Installing Netperf:
Get netperf source code from ftp://ftp.netperf.org/netperf
gunzip netperf-2.5.0.tar.gz
tar -xvf netperf-2.5.0.tar
cd netperf-2.5.0
./configure CFLAGS="-Wl,-bnoobjreorder -lperfstat" --enable-burst
make ; make install
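
Before the netperf client can be used, the companion daemon netserver must be running on the remote (receiving) system - a minimal sketch, using netperf's default control port 12865:

# on the remote system (build and install netperf there as well)
netserver -p 12865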

Throughput benchmark:

# netperf -H 192.168.50.1 -t TCP_STREAM -v 0 -f m -i 3
MIGRATED TCP STREAM TEST from 0.0.0.0 (0.0.0.0) port 0 AF_INET to
192.168.50.1 (192.168.50.1) port 0 AF_INET : +/-2.500% @ 99% conf.
Throughput in Mbit/s
1724.45

TCP RTT benchmark:

# netperf -H 192.168.50.1 -t TCP_RR -v 50
MIGRATED TCP REQUEST/RESPONSE TEST from 0.0.0.0 (0.0.0.0) port 0 AF_INET to
192.168.50.1 (192.168.50.1) port 0 AF_INET : first burst 0
Alignment      Offset         RoundTrip  Trans      Throughput
Local  Remote  Local  Remote  Latency    Rate       10^6bits/s
Send   Recv    Send   Recv    usec/Tran  per sec    Outbound  Inbound
8      0       0      0       66.075     15134.359  0.121     0.12

TCP Round Trip Time in usec/Transaction
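
The transaction size of the TCP_RR test can be varied with netperf's test-specific options after the -- separator; the sizes below are only illustrative:

# 1-byte request / 1-byte response (pure round-trip latency)
netperf -H 192.168.50.1 -t TCP_RR -- -r 1,1

# 128-byte request / 8192-byte response (closer to an application transaction)
netperf -H 192.168.50.1 -t TCP_RR -- -r 128,8192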

AIX-VUG Demystifying 10 Gb Ethernet Performance

Let's test the performance of a #5287 adapter in a Power 720 system

Adapter:
#5287 PCIe2 2-port 10GbE SR
Part Number: 74Y2094
Emulex chipset

Benchmark system:
Power 720 (8202-E4C)
8 POWER7 cores, 3024 MHz
FW AL740_100

Throughput test

Description                                          Feature Code
PCIe2 (Gen2) Low Profile 2-Port 10GbE SR             FC 5284
PCIe2 (Gen2) Low Profile 2-Port 10GbE SFP+ Copper    FC 5286
PCIe2 (Gen2) Full Height 2-Port 10GbE SR             FC 5287
PCIe2 (Gen2) Full Height 2-Port 10GbE SFP+ Copper    FC 5288

AIX-VUG Demystifying 10 Gb Ethernet Performance

Benchmark environment - where is the problem?

[Diagram: AIX LPAR 1 on a Power 750 (8408-E8D) and AIX LPAR 2 on a Power 720 (8202-E4C),
each connected through a Virtual I/O Server (Virt. Eth., PVID 10, vSwitch, SEA, Etherchannel,
10GbE SR #5287) to a 10 GbE network; annotated throughputs of ~9 Gbit/s and ~3 Gbit/s]

AIX-VUG Demystifying 10 Gb Ethernet Performance

Ethernet Switching on IBM Power Systems

Switching is a hypervisor function and includes the following major layer 2 tasks:
Frame forwarding is performed as a memory transfer, initiated with an
H_SEND_LOGICAL_LAN call at the sending side
Source MAC address learning from incoming Ethernet frames
Broadcast and multicast forwarding
Frame queuing and forwarding in two directions:
Incoming: frames received by the hypervisor
Outgoing: frames delivered to a Virtual Ethernet Adapter
Processing of header information for IEEE 802.1q tagged frames (VLAN / CoS)

[Diagram: frame forwarding as a memcpy between client LPAR level and hypervisor level,
shown for the sending direction]

AIX-VUG Demystifying 10 Gb Ethernet Performance

Throughput baseline Virtual Ethernet - EC=0.4, capped


Benchmark with 8 parallel TCP sessions
Configuration:
Client LPAR:
Power 770 9117-MMB
AIX 6.1 TL6 SP 3
EC=0.4 Units, capped
2 VPs
Virtual Ethernet Adapter, MTU 1500

Server LPAR:
Power 770 9117-MMB (Same as client)
AIX 6.1 TL6 SP 3
EC=3.0 Units, uncapped
4 VPs
Virtual Ethernet Adapter, MTU 1500

What do you think is the resulting throughput?


[Diagram: capped client LPAR and uncapped server LPAR, each with a Virtual Ethernet Adapter on
PVID 1 / VLAN 1, connected through the PHYP switch; traffic flows from client to server]

AIX-VUG Demystifying 10 Gb Ethernet Performance

Throughput baseline Virtual Ethernet - EC=0.4, capped

MTU 1500:

Client connecting to 192.168.2.3, TCP port 5001
TCP window size: 256 KByte (default)
------------------------------------------------------------
[ ID] Interval        Transfer     Bandwidth
[  4] 0.0-300.0 sec   1.97 GBytes  56.5 Mbits/sec
[  8] 0.0-300.0 sec   2.12 GBytes  60.6 Mbits/sec
[  5] 0.0-300.0 sec   1.94 GBytes  55.6 Mbits/sec
[ 10] 0.0-300.0 sec   2.00 GBytes  57.4 Mbits/sec
[  3] 0.0-300.0 sec   1.98 GBytes  56.8 Mbits/sec
[  9] 0.0-300.0 sec   1.87 GBytes  53.5 Mbits/sec
[  6] 0.0-300.0 sec   1.93 GBytes  55.2 Mbits/sec
[  7] 0.0-300.0 sec   1.95 GBytes  55.8 Mbits/sec
[SUM] 0.0-300.0 sec   15.8 GBytes   451 Mbits/sec

AIX-VUG Demystifying 10 Gb Ethernet Performance

Throughput as a function of CPU time for Virtual Ethernet


Benchmark with 8 parallel TCP sessions
Configuration:
Client LPAR:
Power 770 9117-MMB
AIX 6.1 TL6 SP 3
capped
2 VPs
Virtual Ethernet Adapter, MTU 1500

Server LPAR:
Power 770 9117-MMB (Same as client)
AIX 6.1 TL6 SP 3
uncapped, EC=3.0 Units
4 VPs
Virtual Ethernet Adapter, MTU 1500
Maximum throughput ~ 1,25 Gbit/s

[Chart: Throughput Virtual Ethernet MTU 1500 - throughput [Gbps] vs. CPU units (0 to 1,6);
the baseline at 0.4 CPU units is 451 Mbit/s]

AIX-VUG Demystifying 10 Gb Ethernet Performance

Virtual Processor dispatching


[Chart: TP [Gbps] vs. CPU units; two highlighted data points at physc=0,66 and physc=0,68,
differing by minus 19 Mbit/s]

mpstat -s with physc=0,66:
--------------------------------------------------------------
Proc0                                    Proc4
65.91%                                   0.01%
cpu0    cpu1    cpu2    cpu3             cpu4    cpu5    cpu6    cpu7
32.81%  16.36%  8.24%   8.50%            0.00%   0.00%   0.00%   0.01%
--------------------------------------------------------------

mpstat -s with physc=0,68:
--------------------------------------------------------------
Proc0                                    Proc4
52.82%                                   14.43%
cpu0    cpu1    cpu2    cpu3             cpu4    cpu5    cpu6    cpu7
26.09%  12.18%  7.21%   7.34%            5.33%   3.27%   3.02%   2.82%
--------------------------------------------------------------

AIX-VUG Demystifying 10 Gb Ethernet Performance

Throughput as a function of CPU time for Virtual Ethernet


Benchmark with 8 parallel TCP sessions
Configuration:
Client LPAR:
Power 720 8202-E4C
AIX 7.1 TL1 SP 3
EC=0.4 Units, capped
2 VPs
Virtual Ethernet Adapter, MTU 1500

Server LPAR:
Power 720 8202-E4C (Same as client)
AIX 7.1 TL1 SP 3
EC=3.0 Units, uncapped
4 VPs
Virtual Ethernet Adapter, MTU 1500
Maximum throughput ~ 1,6 Gbit/s

[Chart: Throughput Virtual Ethernet MTU 1500 - throughput [Gbps] vs. CPU units (0 to 1,8),
comparing "TP 9117-MMB default" and "TP 8202-E4C default";
the baseline on the 8202-E4C at 0.4 CPU units is 991 Mbit/s]

AIX-VUG Demystifying 10 Gb Ethernet Performance

Virtual Processor dispatching


[Chart: TP [Gbps] vs. CPU units; two highlighted data points at physc=0,45 and physc=0,50,
differing by minus 210 Mbit/s]

mpstat -s with physc=0,45:
--------------------------------------------------------------
Proc0                                    Proc4
44.98%                                   0.01%
cpu0    cpu1    cpu2    cpu3             cpu4    cpu5    cpu6    cpu7
22.39%  11.16%  5.62%   5.80%            0.00%   0.00%   0.00%   0.01%
--------------------------------------------------------------

mpstat -s with physc=0,50:
--------------------------------------------------------------
Proc0                                    Proc4
39.22%                                   10.72%
cpu0    cpu1    cpu2    cpu3             cpu4    cpu5    cpu6    cpu7
19.37%  9.04%   5.35%   5.45%            3.96%   2.43%   2.24%   2.09%
--------------------------------------------------------------

AIX-VUG Demystifying 10 Gb Ethernet Performance

What is the reason for the limited maximum throughput?


Starting with a TCP trace analysis:

                          a->b                    b->a
total packets:            8081                    3639
ack pkts sent:            8081                    3639
pure acks sent:           0                       3639
sack pkts sent:           0                       0
dsack pkts sent:          0                       0
max sack blks/ack:        0                       0
unique bytes sent:        11698392                0
actual data pkts:         8081                    0
actual data bytes:        11701288                0
rexmt data pkts:          2                       0
rexmt data bytes:         2896                    0
zwnd probe pkts:          0                       0
zwnd probe bytes:         0                       0
outoforder pkts:          2                       0
pushed data pkts:         5                       0
SYN/FIN pkts sent:        0/0                     0/0
req 1323 ws/ts:           N/Y                     N/Y
urgent data pkts:         0                       0
urgent data bytes:        0                       0
mss requested:            0                       0
max segm size:            1448                    0
min segm size:            1448                    0
avg segm size:            1447                    0
max win adv:              32761                   65522
min win adv:              32761                   38734
zero win adv:             0                       0
avg win adv:              32761                   65509
initial window:           1448 bytes              0 bytes
initial window:           1 pkts                  0 pkts
ttl stream length:        NA                      NA
missed data:              NA                      NA
truncated data:           11588154 bytes          0 bytes
truncated packets:        8081 pkts               0 pkts
data xmit time:           3.929 secs              0.000 secs
idletime max:             211.6 ms                211.6 ms
throughput:               2977123 Bps             0 Bps

RTT samples:              2706                    0
RTT min:                  11.7 ms                 0.0 ms
RTT max:                  46.9 ms                 0.0 ms
RTT avg:                  25.0 ms                 0.0 ms
RTT stdev:                6.9 ms                  0.0 ms

Less than 0,03 % retransmissions
Sufficient TCP buffer space on the receiving side
Gaps with idle time and no use of SACK result in relatively high segment round-trip times (RTT)

AIX-VUG Demystifying 10 Gb Ethernet Performance

What is the reason for the limited throughput?


Kernel trace curt report from client with 0.4 CPU units, capped:

Hypervisor Calls Summary
------------------------
Count    Total Time  % sys  Avg Time  Min Time  Max Time  Tot ETime  Avg ETime  Min ETime  Max ETime  HCALL (Caller Address)
         (msec)      time   (msec)    (msec)    (msec)    (msec)     (msec)     (msec)     (msec)
=======  ==========  =====  ========  ========  ========  =========  =========  =========  =========  ========================
30419    157.0195    0.63%  0.0052    0.0004    0.0301    363.9006   0.0120     0.0019     7.5224     H_SEND_LOGICAL_LAN((unknown) 41977e8)
18173    26.2612     0.11%  0.0014    0.0005    0.0162    38.4263    0.0021     0.0007     6.0742     H_ADD_LOGICAL_LAN_BUFFER((unknown) 4191
3189     2.7187      0.01%  0.0009    0.0005    0.0050    2.7187     0.0009     0.0005     0.0050     H_PROD((unknown) 6ffb8)
693      1.1146      0.00%  0.0016    0.0010    0.0035    1.1146     0.0016     0.0010     0.0035     H_XIRR((unknown) 41187cc)
689      0.7688      0.00%  0.0011    0.0005    0.0026    0.7688     0.0011     0.0005     0.0026     H_EOI((unknown) 41149b8)
689      0.3535      0.00%  0.0005    0.0003    0.0046    0.3535     0.0005     0.0003     0.0046     H_CPPR((unknown) 4112b08)

Kernel trace curt report from client with 1.3 CPU units:

Hypervisor Calls Summary
------------------------
Count    Total Time  % sys  Avg Time  Min Time  Max Time  Tot ETime  Avg ETime  Min ETime  Max ETime  HCALL (Caller Address)
         (msec)      time   (msec)    (msec)    (msec)    (msec)     (msec)     (msec)     (msec)
=======  ==========  =====  ========  ========  ========  =========  =========  =========  =========  ========================
27187    133.6836    4.39%  0.0049    0.0005    0.0221    133.9216   0.0049     0.0020     0.0221     H_SEND_LOGICAL_LAN((unknown) 41977e8)
13489    16.9196     0.56%  0.0013    0.0008    0.0129    16.9269    0.0013     0.0008     0.0129     H_ADD_LOGICAL_LAN_BUFFER((unknown) 4191
2104     3.4490      0.11%  0.0016    0.0006    0.0081    3.4490     0.0016     0.0006     0.0081     H_PROD((unknown) 6ffb8)
502      0.7127      0.02%  0.0014    0.0009    0.0026    0.7127     0.0014     0.0009     0.0026     H_XIRR((unknown) 41187cc)
501      0.5384      0.02%  0.0011    0.0007    0.0157    0.5384     0.0011     0.0007     0.0157     H_EOI((unknown) 41149b8)
501      0.1983      0.01%  0.0004    0.0003    0.0009    0.1983     0.0004     0.0003     0.0009     H_CPPR((unknown) 4112b08)
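
For reference, a curt report like the ones above is generated from a raw kernel trace - a minimal sketch (interval and file names are examples; on a busy system keep the trace window short):

# capture a short kernel trace
trace -a -o /tmp/trace.raw
sleep 10
trcstop

# post-process it with curt
curt -i /tmp/trace.raw -o /tmp/curt.out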

AIX-VUG Demystifying 10 Gb Ethernet Performance

CPU consumption from an overall perspective
[Diagram: client LPAR on a Power 770 (9117-MMB) and server LPAR on a Power 720 (8202-E4C);
on each system a Virtual I/O Server bridges a Virtual Ethernet Adapter (PVID 1) via SEA and
Etherchannel over 10GbE SR adapters to the 10 GbE network]

AIX-VUG Demystifying 10 Gb Ethernet Performance

Anatomy of seaproc
seaproc is a 64-bit, multithreaded kernel process
Each active Shared Ethernet Adapter runs a dedicated seaproc instance
seaproc needs CPU cycles for bridging activity
The efficiency of a particular Shared Ethernet Adapter depends on how well the corresponding
seaproc threads can perform
# ps -alk | grep seaproc
  40303 A   0 3080304  1  0 37 -- 86c0bb190 1024 *  -   0:00  seaproc
  40303 A  10 3801156  1  0 37 -- 87cc7f190 1024 *  -  22:47  seaproc
  40303 A   0 3866764  1  0 37 -- 82c0cb190 1024 *  - 126:14  seaproc

while true; do ps -lm -p 13369558 -o THREAD; sleep 2; done

USER       PID  PPID      TID ST  CP PRI SC WCHAN            F      TT BND COMMAND
root  13369558     1        - A   71  37  7 *                40303   -   - seaproc
   -         -     -  5963991 S    0  37  1 f1000a001c3c1318 1400    -   - -
   -         -     -  6160625 S    0  37  1 f1000a001bd00c78 1400    -   - -
   -         -     - 15663143 S    0  37  1 f1000a001c060fc8 1400    -   - -
   -         -     - 18153487 S    0  37  1 f1000a001beb0e20 1400    -   - -
   -         -     - 18350131 R   36  37  1                  1000    -   - -
   -         -     - 22413341 S    0  37  1 f1000a001c5714c0 1400    -   - -
   -         -     - 24117287 R   35  37  1                  1400    -   - -

AIX-VUG Demystifying 10 Gb Ethernet Performance

Sizing diagram for shared CPU units in a SEA environment


How many CPU units are needed for a particular throughput?
Which instances are involved in network activity from an overall perspective?

Throughput: 930 Mbit/s - overall CPU utilization: 2,82 CPU units
  Sending VIOS:    0,74
  Receiving VIOS:  0,73
  Server LPAR:     0,95
  Client LPAR:     0,4

Numbers are dependent on Power Systems Model and hardware configuration

AIX-VUG Demystifying 10 Gb Ethernet Performance

Overall CPU consumption in a 10GbE PCIe2 environment with SEA


Max. ~1,6 Gbit/s

[Chart: TP [Gb/s] vs. overall CPU consumption in a 10GbE PCIe2 environment with SEA]

AIX-VUG Demystifying 10 Gb Ethernet Performance

Jumbo Frames
The term Jumbo Frame specifies a payload size of more than 1500 bytes and up to
9000 bytes encapsulated within one Ethernet frame
Jumbo Frames can significantly reduce the CPU time needed for data forwarding
Using Jumbo Frames has no effect for data packets with less than 1500 bytes of payload
Jumbo Frames must be implemented on an end-to-end basis
Networking equipment (physical and virtual) on all potential paths between sender and receiver
must be configured for Jumbo Frames
Internally on Power Systems (examples follow on the next slides):
Virtual Ethernet Adapters in client partitions
Shared Ethernet Adapters in VIO-Servers
Etherchannel devices
Physical network adapters
At data center level:
Access layer switches
Aggregation and core multilayer switches and routers
Security devices like firewalls and Intrusion Detection Systems
Layer 3 devices like routers and firewalls can fragment Jumbo Frames into smaller MTU-sized
data packets, but with an impact on performance (cf. Andrew S. Tanenbaum: Computer Networks)

AIX-VUG Demystifying 10 Gb Ethernet Performance

Using Jumbo Frames within Shared Ethernet Adapter setup


Benchmark with 8 parallel TCP sessions
Configuration:
Managed System: Power 720 - 8202-E4C
Client LPAR:
AIX 7.1 TL1 SP 3, capped, weight 128 Units, 2 VPs
Virtual Ethernet Adapter, MTU 9000

Server LPAR:
AIX 7.1 TL1 SP 3, uncapped,
weight 128, 4VPs
Virtual Ethernet Adapter, MTU 9000
Virtual I/O Servers:
EC=2.0 Units, uncapped, weight 255
PCIe2 2-port 10GbE SR Adapter
To be tuned for MTU 9000:

[Diagram: Power 720 (8202-E4C) with client LPAR and server LPAR, each bridged by a Virtual I/O
Server (Virt. Eth. PVID 1, vSwitch, SEA, 10GbE SR) to the 10 GbE network]

AIX-VUG Demystifying 10 Gb Ethernet Performance

Using Virtual Ethernet Adapters with MTU 9000


# lsattr -El en0
alias4
alias6
arp         on
authority
broadcast
mtu         9000
netaddr     10.31.203.194
netaddr6
netmask     255.255.255.0
prefixlen
remmtu      576
rfc1323
[...]

The MTU size of Virtual Ethernet Adapters can be dynamically changed from 1500 bytes (default)
to 9000 bytes:
# chdev -l en0 -a mtu=9000

[Diagram: client LPAR and server LPAR, each with a Virtual Ethernet Adapter on PVID 1 / VLAN 1,
connected through the PHYP switch]

AIX-VUG Demystifying 10 Gb Ethernet Performance

Client and VIOS setup with Jumbo Frame support


Checklist for configuring MTU 9000
1. @VIOS: Bring the SEA down:
   vios$ rmdev -dev <SEA> -ucfg
   entS Defined
2. @VIOS: Enable Jumbo Frame support for the real adapter:
   vios$ chdev -dev <Real> -attr jumbo_frames=yes
   entR changed
3. @VIOS: Enable Jumbo Frame support for the Shared Ethernet Adapter:
   vios$ chdev -dev <SEA> -attr jumbo_frames=yes
   entS changed
4. @Clients: Change the MTU size of the virtual Ethernet interfaces to 9000 bytes:
   client# chdev -l en0 -a mtu=9000
   en0 changed
5. @VIOS: Reactivate the SEA:
   vios$ cfgdev -dev <SEA>
   vios$ lsdev | grep <SEA>
   entS Available
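
A quick way to verify the result end to end (a sketch; device names are examples):

client# lsattr -El en0 -a mtu             (should now report 9000)
client# netstat -in                       (MTU column for the interface should show 9000)
vios$  entstat -all <SEA> | grep -i jumbo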


AIX-VUG Demystifying 10 Gb Ethernet Performance

Throughput baseline Virtual Ethernet - EC=0.4, capped

MTU 9000:

Client connecting to 192.168.2.3, TCP port 5001
TCP window size: 262 KByte (default)
------------------------------------------------------------
[ ID] Interval        Transfer     Bandwidth
[ 10] 0.0-300.0 sec   7.21 GBytes   207 Mbits/sec
[  3] 0.0-300.0 sec   7.30 GBytes   209 Mbits/sec
[  4] 0.0-300.0 sec   7.27 GBytes   208 Mbits/sec
[  5] 0.0-300.0 sec   7.22 GBytes   207 Mbits/sec
[  9] 0.0-300.0 sec   7.12 GBytes   204 Mbits/sec
[  7] 0.0-300.0 sec   7.23 GBytes   207 Mbits/sec
[  8] 0.0-300.0 sec   7.19 GBytes   206 Mbits/sec
[  6] 0.0-300.0 sec   7.25 GBytes   208 Mbits/sec
[SUM] 0.0-300.0 sec   57.8 GBytes  1.65 Gbits/sec

AIX-VUG Demystifying 10 Gb Ethernet Performance

Throughput as a function of CPU time for Virtual Ethernet


Benchmark with 8 parallel TCP sessions
Configuration:
Client LPAR:
Power 770 9117-MMB
AIX 6.1 TL6 SP 3
capped
2 VPs
Virtual Ethernet Adapter, MTU 9000

Server LPAR:
Power 770 9117-MMB (Same as client)
AIX 6.1 TL6 SP 3
uncapped, EC=3.0 Units
4 VPs
Virtual Ethernet Adapter, MTU 9000

Significant throughput scaling, by a factor of ~3

AIX-VUG Demystifying 10 Gb Ethernet Performance

Throughput baseline Virtual Ethernet - EC=0.4, capped


Benchmark with 8 parallel TCP sessions
Configuration:
Client LPAR:
Power 720 8202-E4C
AIX 7.1 TL1 SP 3
EC=0.4 Units, capped
2 VPs
Virtual Ethernet Adapter, MTU 1500

Server LPAR:
Power 720 8202-E4C (Same as client)
AIX 7.1 TL1 SP 3
EC=3.0 Units, uncapped
4 VPs
Virtual Ethernet Adapter, MTU 1500

[Diagram: capped client LPAR and uncapped server LPAR, each with a Virtual Ethernet Adapter on
PVID 1 / VLAN 1, connected through the PHYP switch; traffic flows from client to server]

AIX-VUG Demystifying 10 Gb Ethernet Performance

Throughput as a function of CPU time for Virtual Ethernet


Benchmark with 8 parallel TCP sessions
Configuration:
Client LPAR:
Power 720 8202-E4C
AIX 7.1 TL1 SP 3
EC=0.4 Units, capped
2 VPs
Virtual Ethernet Adapter, MTU 9000

Server LPAR:
Power 720 8202-E4C (Same as client)
AIX 7.1 TL1 SP 3
EC=3.0 Units, uncapped
4 VPs
Virtual Ethernet Adapter, MTU 9000

[Chart: Throughput Virtual Ethernet MTU 9000 - throughput [Gbps] vs. CPU units (0 to 2),
comparing "TP 8202-E4C MTU 9000" and "TP 8202-E4C MTU 1500"; average scaling ~ x3,3]

AIX-VUG Demystifying 10 Gb Ethernet Performance

Overall CPU consumption in a 10GbE PCIe2 environment with SEA

Here was the limit with MTU 1500

[Chart: TP [Gb/s] vs. overall CPU consumption with SEA; callout marking the previous limit with MTU 1500]

AIX-VUG Demystifying 10 Gb Ethernet Performance

Virtual Processor dispatching


[Chart: TP [Gbps] vs. CPU units; two highlighted data points at physc=0,45 and physc=0,60,
differing by minus 750 Mbit/s]

mpstat -s with physc=0,45:
--------------------------------------------------------------
Proc0                                    Proc4
44.96%                                   0.00%
cpu0    cpu1    cpu2    cpu3             cpu4    cpu5    cpu6    cpu7
22.38%  11.16%  5.62%   5.80%            0.00%   0.00%   0.00%   0.00%
--------------------------------------------------------------

mpstat -s with physc=0,60:
--------------------------------------------------------------
Proc0                                    Proc4
47.04%                                   12.86%
cpu0    cpu1    cpu2    cpu3             cpu4    cpu5    cpu6    cpu7
23.23%  10.85%  6.42%   6.54%            4.75%   2.91%   2.69%   2.51%
--------------------------------------------------------------

AIX-VUG Demystifying 10 Gb Ethernet Performance

Throughput baseline Virtual Ethernet MTU 65390 - EC=0.4, capped


Benchmark with 8 parallel TCP sessions
Configuration:
Client LPAR:
Power 770 9117-MMD
AIX 7.1 TL2 SP 2
EC=0.4 Units, capped
4 VPs
Virtual Ethernet Adapter, MTU 65390

Server LPAR:
Power 770 9117-MMD (Same as client)
AIX 7.1 TL2 SP 2
EC=3.0 Units, uncapped
4 VPs
Virtual Ethernet Adapter, MTU 65390

[Diagram: capped client LPAR and uncapped server LPAR, each with a Virtual Ethernet Adapter on
PVID 1 / VLAN 1, connected through the PHYP switch; traffic flows from client to server]

AIX-VUG Demystifying 10 Gb Ethernet Performance

Throughput baseline Virtual Ethernet MTU 65390 - EC=0.4, capped


Client connecting to 192.168.10.81, TCP port 5001
TCP window size: 319 KByte (default)
-----------------------------------------------------------[ ID] Interval
Transfer
Bandwidth
[ 3] 0.0-300.0 sec 13.7 GBytes
391 Mbits/sec
[ 5] 0.0-300.0 sec 12.4 GBytes
356 Mbits/sec
382 Mbits/sec
MTU 65390: [ 8] 0.0-300.0 sec 13.3 GBytes
[ 7] 0.0-300.0 sec 14.3 GBytes
410 Mbits/sec
[ 10] 0.0-300.0 sec 14.2 GBytes
407 Mbits/sec
[ 6] 0.0-300.0 sec 15.0 GBytes
430 Mbits/sec
[ 9] 0.0-300.0 sec 12.9 GBytes
368 Mbits/sec
[ 4] 0.0-300.0 sec 11.9 GBytes
342 Mbits/sec
[SUM] 0.0-300.0 sec
108 GBytes 3.09 Gbits/sec

32

AIX-VUG Demystifying 10 Gb Ethernet Performance

Throughput baseline Virtual Ethernet MTU 65390 - uncapped


Client connecting to 192.168.10.81, TCP port 5001
TCP window size: 319 KByte (default)
-----------------------------------------------------------[ ID] Interval
Transfer
Bandwidth
[ 10] 0.0-300.0 sec 86.2 GBytes 2.47 Gbits/sec
[ 3] 0.0-300.0 sec 86.1 GBytes 2.47 Gbits/sec
MTU 65390: [ 4] 0.0-300.0 sec 86.3 GBytes 2.47 Gbits/sec
[ 5] 0.0-300.0 sec 86.2 GBytes 2.47 Gbits/sec
[ 6] 0.0-300.0 sec 86.1 GBytes 2.47 Gbits/sec
[ 7] 0.0-300.0 sec 85.8 GBytes 2.46 Gbits/sec
[ 8] 0.0-300.0 sec 86.2 GBytes 2.47 Gbits/sec
[ 9] 0.0-300.0 sec 86.1 GBytes 2.47 Gbits/sec
[SUM] 0.0-300.0 sec
689 GBytes 19.7 Gbits/sec

33
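
The large virtual-Ethernet MTU is set the same way as MTU 9000 earlier in this deck (a sketch; only useful for LPAR-to-LPAR traffic inside the machine, since no physical Ethernet carries frames of this size):

# chdev -l en0 -a mtu=65390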

AIX-VUG Demystifying 10 Gb Ethernet Performance

Segmentation offload / aggregation within Shared Ethernet Adapter setup


Benchmark with 8 parallel TCP sessions
Configuration:
Managed System: Power 720 - 8202-E4C
Client LPAR:
AIX 7.1 TL1 SP 3, capped, weight 128 Units, 2 VPs
Virtual Ethernet Adapter, MTU 1500
Server LPAR:
AIX 7.1 TL1 SP 3, uncapped, weight 128, 4VPs
Virtual Ethernet Adapter, MTU 1500
Virtual I/O Servers:
EC=2.0 Units, uncapped, weight 255
PCIe2 2-port 10GbE SR Adapter

[Diagram: Power 720 (8202-E4C) with a capped client LPAR and an uncapped server LPAR, each
bridged by a Virtual I/O Server (Virt. Eth. PVID 1, vSwitch, SEA, 10GbE SR) to the 10 GbE
network; the VIOS / SEA path is the part to be tuned]

AIX-VUG Demystifying 10 Gb Ethernet Performance

Segmentation offload (largesend)


The task of segmenting data into frames with an appropriate MTU size is
offloaded from the operating-system level to the physical network adapter
Benefits of segmentation offload:
Significantly reduces the CPU consumption on client partitions and VIO-Servers
for packet delivery in the sending direction
Increases the effective VIO-Server outbound throughput for high-speed
network connections
Segmentation offload allows the client partition to send 64 kilobytes of data at a time through
a Virtual Ethernet Adapter
Segmentation offload needs configuration on Power Systems client and VIOS
partitions only! It does not affect the configuration of physical network equipment
The following configuration steps are needed (example configuration on the following slides):
Client partition: Enable largesend at the interface level
VIO-Server partition:
Configure largesend=1 for the Shared Ethernet Adapter
Ensure that the large_send=yes option is set for physical adapters
Etherchannel devices don't need further configuration
Checksum offload is implicitly active in the sending direction when segmentation
offload is enabled

AIX-VUG Demystifying 10 Gb Ethernet Performance

Segment aggregation (large receive)


Segment aggregation allows the buffering of multiple Ethernet frames at VIO-Server level
and passes 64-kilobyte data chunks to the client Virtual Ethernet Adapters
Benefits of large receive:
Significantly reduces the CPU consumption on client partitions and VIO-Servers
for packet delivery in the receiving direction
Increases the effective inbound throughput from the VIO-Server to the client partitions
Reduces the number of interrupts on VIO-Servers and client partitions
The following configuration steps need to be done (example configuration on the following slides):
VIO-Server partition:
Configure large_receive=yes for the Shared Ethernet Adapter
Ensure that the large_receive=yes option is set for physical adapters
Etherchannel devices don't need further configuration


AIX-VUG Demystifying 10 Gb Ethernet Performance

Enable segmentation offload and aggregation


Virtual I/O Server tuning:
Enable TCP segmentation offload by setting the largesend attribute on the SEA:
chdev -dev <SEA> -attr largesend=1
Enable TCP receive segment aggregation on the SEA:
chdev -dev <SEA> -attr large_receive=yes
LPAR tuning:
Activate TCP segmentation offload towards the physical adapter HW:
ifconfig <enX> largesend
Or, since AIX 7.1 TL1 / AIX 6.1 TL7 (lsattr -El en0):
mtu_bypass   off   Enable/Disable largesend for virtual Ethernet   True
chdev -l <enX> -a mtu_bypass=on


Largesend on the physical 10 GbE SR adapter is enabled by default.

[Diagram: Virtual I/O Server with physical adapter, SEA and Virt. Eth. (PVID 1), bridging
through the PHYP switch to a client LPAR with a Virt. Eth. adapter (PVID 1 / VLAN 1)]
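
Once enabled, both offload settings can be verified from the VIOS with entstat (a sketch; the adapter name is a placeholder - the same lines appear in the adapter statistics shown later in this deck):

$ entstat -all <SEA> | grep -i -E 'segmentation|aggregation'
Transmit TCP segmentation offload: Enabled
Receive TCP segment aggregation: Enabled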

AIX-VUG Demystifying 10 Gb Ethernet Performance

Overall CPU consumption in a 10GbE PCIe2 environment with SEA

Here was the limit with default settings

[Chart: TP [Gb/s] vs. overall CPU consumption with SEA; callout marking the previous limit with default settings]

AIX-VUG Demystifying 10 Gb Ethernet Performance

Overall CPU consumption in a 10GbE PCIe2 environment with SEA

[Chart: TP [Gb/s] vs. overall CPU consumption in a 10GbE PCIe2 environment with SEA]

AIX-VUG Demystifying 10 Gb Ethernet Performance

Fixes To Known Issues


IV07193: AIX SEA thread lock contention prevents scale up
IV08263: LARGE SEND PACKETS CAUSES TX TIME OUT.
U842917 devices.pciex.e4145616e4140518.rte 7.1.1.15
IV12776: ENTSTAT DISPLAYS INCORRECT HEA PACKETS DISCARDED COUNT
U843503 devices.chrp.IBM.lhea.rte 7.1.1.15
IV12784: WHEN JUMBO FRAMES IS ENABLED THE PORTS MIGHT DROP ALL THE
U842917 devices.pciex.e4145616e4140518.rte 7.1.1.15
U849664 devices.pciex.e4145616e4140518.rte 7.1.1.4
IV13811: ADAPTERS USING GOENTDD MAY STOP TRANSMITTING
U840901 devices.pci.14106902.rte 7.1.1.15
IV13813: LARGESEND FLAG ON SMALL PACKETS CAN CAUSE ADAPTER TO
MISBEHAVE
U840901 devices.pci.14106902.rte 7.1.1.15

AIX-VUG Demystifying 10 Gb Ethernet Performance

Fixes To Known Issues Cont.


IV15406: SACK - EXPONENTIAL TCP RETRANSMIT ENDS IN RESET "RST" CONNECTION
U843468 bos.net.tcp.client 7.1.1.15
IV17613: LARGESEND DOESNT WORK FROM VIOC TO VIOS WHEN IPSEC IS ENABLED
U843468 bos.net.tcp.client 7.1.1.15
IV17616: UNNECESSARY LARGESEND THROTTLING RESULTING IN POOR
PERFORMANCE
U843468 bos.net.tcp.client 7.1.1.15
IV17666: ALLOCATE NEW LOCK IDS FOR MASON DEVICE DRIVER
IV18708: HEA CREATES PACKET STORM WITH LARGESEND AND 0 MSS
U843503 devices.chrp.IBM.lhea.rte 7.1.1.15
IV18714: ISNO VALUES NOT SETUP FOR 10GIGE
U843061 devices.pciex.a21910071410d003.rte 7.1.1.15

AIX-VUG Demystifying 10 Gb Ethernet Performance

Sizing diagram for shared CPU units in a SEA environment


How many CPU units are necessary to achieve the desired throughput results?
Which instances are involved in network activity from an overall perspective?

Benchmark with effective throughput: 930 Mbit/s - overall CPU utilization: 2,82 CPU units
  Sending VIOS:    0,74
  Receiving VIOS:  0,73
  Server LPAR:     0,95
  Client LPAR:     0,4
  (improved with appropriate tuning)

Results are dependent on Power Systems Model and hardware configuration

AIX-VUG Demystifying 10 Gb Ethernet Performance

Sudden very high latency events in a bottleneck situation


[Trace graph: three traffic stops of ~400 ms each (one annotated with 405 ms) within a
10-second time frame]

AIX-VUG Demystifying 10 Gb Ethernet Performance

Most important rule for Shared Processor LPARs


Plan sufficient Entitled Capacity (EC) on:
Client LPAR working as network client
(provides sufficient CPU time for the hypervisor to process H_SEND_LOGICAL_LAN)
Client LPAR working as network server (reassembling data)
VIO-Server on the outgoing side (efficient packet pickup from the hypervisor switch and
hand-over to the Etherchannel or physical adapter device drivers)
VIO-Server on the incoming side
Look for high Virtual Context Switches (vcsw) rates:
# lparstat 2
%user  %sys  %wait  %idle  physc  %entc  lbusy   vcsw  phint
-----  ----  -----  -----  -----  -----  -----  -----  -----
  1.2  54.6    0.0   44.2   0.72   82.3    8.6   9178   1326
  1.5  52.8    0.0   45.6   0.81   92.6   17.4  10458   1296
  1.4  48.7    0.0   49.9   0.96  110.6   18.4   9532   1312


AIX-VUG Demystifying 10 Gb Ethernet Performance

Client LPAR with insufficient entitlement


Client with EC=0.5, uncapped, weight=128
Client with network load ~ 8,3 Gbit/s
Curt report:

Total Physical CPU time (msec)  = 750.95
Physical CPU percentage         = 66.24
Physical processor affinity     = 0.171925
Dispatch Histogram for processor (PHYSICAL CPUid : times_dispatched).
  PHYSICAL CPU  0 : 193
  PHYSICAL CPU  4 : 185
  PHYSICAL CPU  8 : 228
  PHYSICAL CPU 12 : 199
  PHYSICAL CPU 16 : 171
  PHYSICAL CPU 20 : 223
  PHYSICAL CPU 24 : 170
  PHYSICAL CPU 28 : 184
Total number of preemptions     = 1553
Total number of H_CEDE          = 0           with preemption = 0
Total number of H_CONFER        = 0           with preemption = 0

AIX-VUG Demystifying 10 Gb Ethernet Performance

Client LPAR with sufficient entitlement


Client with EC=1.0, uncapped, weight=128
Client with network load ~ 12,4 Gbit/s
Curt report:

Total Physical CPU time (msec)  = 957.54
Physical CPU percentage         = 87.56
Physical processor affinity     = 0.176370
Dispatch Histogram for processor (PHYSICAL CPUid : times_dispatched).
  PHYSICAL CPU  0 : 78
  PHYSICAL CPU  4 : 59
  PHYSICAL CPU  8 : 85
  PHYSICAL CPU 12 : 62
  PHYSICAL CPU 16 : 68
  PHYSICAL CPU 20 : 77
  PHYSICAL CPU 24 : 65
  PHYSICAL CPU 28 : 90
Total number of preemptions     = 584
Total number of H_CEDE          = 38946822    with preemption = 0
Total number of H_CONFER        = 0           with preemption = 0

AIX-VUG Demystifying 10 Gb Ethernet Performance

Thread dispatch time and Network Virtualization


Client with sufficient entitlement:
419 Virtual CPU preemption/dispatch data
Preempt: Timeout, Dispatch: Timeslice vProcIndex=001F
rtrdelta=0.000 us enqdelta=0.000 us exdelta=13.718 us
start wait=12.105394 ms end wait=12.119112 ms
SRR0=000000000009AC3C SRR1=8000000000001032
dist: local srad=0 assoc=0

Client with insufficient entitlement:

Virtual CPU preemption/dispatch data


Preempt: Timeout, Dispatch: Timeslice vProcIndex=0005
rtrdelta=0.000 us enqdelta=500.164 us exdelta=57.324 us
start wait=0.000000 ms end wait=0.112652 ms
SRR0=000000000000A4C8 SRR1=8000000000009032
dist: local srad=0 assoc=0
TB delta until ready to run: number of tics the VP had nothing to do (after h_cede or h_confer)
TB delta until enqueued: wait on the frozen queue (entitled capacity had expired)
TB delta until running: wait on the dispatcher for a physical CPU

AIX-VUG Demystifying 10 Gb Ethernet Performance

Flow diagram with bottleneck removed


AIX-VUG Demystifying 10 Gb Ethernet Performance

No Resource Errors on Virtual Ethernet Adapters

No Resource Errors can occur when the appropriate amount of memory cannot be added quickly
enough to VENT buffer space. This has mainly two reasons: too much workload, or too little
access to CPU time.

ETHERNET STATISTICS (ent8) :
Device Type: Shared Ethernet Adapter
Hardware Address: 00:21:5e:e2:27:22
Elapsed Time: 111 days 19 hours 15 minutes 32 seconds

Transmit Statistics:                        Receive Statistics:
--------------------                        -------------------
Packets: 140542106901                       Packets: 148017042933
Bytes: 136466349514017                      Bytes: 141743288103445
Interrupts: 0                               Interrupts: 68921391370
Transmit Errors: 0                          Receive Errors: 0
Packets Dropped: 0                          Packets Dropped: 235745
                                            Bad Packets: 0
Max Packets on S/W Transmit Queue: 321
S/W Transmit Queue Overflow: 0
Current S/W+H/W Transmit Queue Length: 1

Elapsed Time: 0 days 0 hours 0 minutes 0 seconds
Broadcast Packets: 107560097                Broadcast Packets: 215156995
Multicast Packets: 118240081                Multicast Packets: 252467976
No Carrier Sense: 0                         CRC Errors: 0
DMA Underrun: 0                             DMA Overrun: 0
Lost CTS Errors: 0                          Alignment Errors: 0
Max Collision Errors: 0                     No Resource Errors: 235745
Late Collision Errors: 0                    Receive Collision Errors: 0
Deferred: 0                                 Packet Too Short Errors: 0
SQE Test: 0                                 Packet Too Long Errors: 0
Timeout Errors: 0                           Packets Discarded by Adapter: 0
Single Collision Count: 0                   Receiver Start Count: 0
[...]
Hypervisor Receive Failures: 235745

AIX-VUG Demystifying 10 Gb Ethernet Performance

No Resource Errors Virtual Ethernet Adapters


entstat -d (-all on ioscli) provides statistics about preallocated buffer space for Virtual
Ethernet Adapters
Min Buffers is the number of pre-allocated buffers, Max Buffers is the absolute maximum
Max Allocated represents the maximum number of buffers allocated
The number of buffers is dynamically adjusted between Min Buffers and Max Buffers
Always running at the maximum (Max Allocated = Max Buffers) is not a good idea and is normally
a hint of a serious bottleneck for latency and throughput
Buffer post-allocation (Max Allocated >= Min Buffers) also takes time and can negatively affect
response times of high-speed workloads
$ entstat -all <SEA>
Move down to the Virtual (Trunk) Adapter statistics:

Receive Information
  Receive Buffers
    Buffer Type          Tiny    Small   Medium  Large   Huge
    Min Buffers          512     512     128     24      24
    Max Buffers          2048    2048    256     64      64
    Allocated            512     512     128     24      24
    Registered           512     512     128     24      24
    History
      Max Allocated      512     1750    128     24      24
      Lowest Registered  508     502     128     24      24

AIX-VUG Demystifying 10 Gb Ethernet Performance

No Resource Errors with high throughputs

Receive Information
  Receive Buffers
    Buffer Type          Tiny       Small      Medium     Large      Huge
    Min Buffers          512        512        128        24         24
    Max Buffers          2048       2048       256        64         64
    Allocated            512        512        128        24         24
    Registered           512        512        128        24         24
    History
      Max Allocated      512        523        138        39         64
      Lowest Registered  509        502        123        19         18

    max alloc:           = min buf  > min buf  > min buf  > min buf  > min buf
                         < max buf  < max buf  < max buf  < max buf  = max buf

AIX-VUG Demystifying 10 Gb Ethernet Performance

No Resource Errors on the SEA's Virtual Adapter

Tuning the Virtual Ethernet Adapter buffers:
On all bridging Virtual Ethernet Adapters (VENT) configured for the SEA
Reboot required - use the -P option for chdev if the SEA is in use

VIOS devices: ent0 (phys), ent1 (VENT), ent2 (SEA)

chdev -l <VENT> -a max_buf_huge=128 -P
chdev -l <VENT> -a min_buf_huge=64 -P
chdev -l <VENT> -a max_buf_large=128 -P
chdev -l <VENT> -a min_buf_large=64 -P
chdev -l <VENT> -a max_buf_medium=512 -P
chdev -l <VENT> -a min_buf_medium=256 -P
chdev -l <VENT> -a max_buf_small=4096 -P
chdev -l <VENT> -a min_buf_small=2048 -P
chdev -l <VENT> -a max_buf_tiny=4096 -P
chdev -l <VENT> -a min_buf_tiny=2048 -P
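
After the reboot (or after the SEA has been reconfigured), the new limits and the error counter can be re-checked - a short sketch, using the attribute names from above:

lsattr -El <VENT> | grep buf
entstat -d <VENT> | grep -i 'no resource'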

AIX-VUG Demystifying 10 Gb Ethernet Performance

No Resource Errors disappeared after buffer adjustment

Receive Information
  Receive Buffers
    Buffer Type          Tiny       Small      Medium     Large      Huge
    Min Buffers          1024       1024       256        48         48
    Max Buffers          4096       4096       512        128        128
    Allocated            1024       1024       256        48         48
    Registered           1024       1024       256        48         48
    History
      Max Allocated      1024       1024       256        48         48
      Lowest Registered  1023       1024       256        48         48

    max alloc: = min buf, < max buf (for all buffer types)

AIX-VUG Demystifying 10 Gb Ethernet Performance

Flow control

# entstat -d ent5
PCIe2 2-port 10GbE SR Adapter (a21910071410d003) Specific Statistics:
--------------------------------------------------------------------Link Status: Up
Media Speed Running: 10 Gbps Full Duplex
PCI Mode: PCI-Express X8
Relaxed Ordering: Disabled
TLP Size: 512
MRR Size: 4096
PCIe Link Speed: 5.0 Gbps
Firmware Operating Mode: Legacy
Jumbo Frames: Enabled
Transmit TCP segmentation offload: Enabled
Receive TCP segment aggregation: Enabled
Transmit and receive flow control status: Enabled
Number of XOFF packets transmitted: 0
Number of XON packets transmitted: 0
Number of XOFF packets received: 0
Number of XON packets received: 0
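
If flow control has to be changed, it is an adapter ODM attribute on many AIX 10GbE drivers - a sketch only; check the attribute name for your driver first, and note the device must be reconfigured for the change to take effect:

# lsattr -El ent5 | grep flow
# chdev -l ent5 -a flow_ctrl=no -P      (example: disable; -P defers the change until the device is reconfigured)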


AIX-VUG Demystifying 10 Gb Ethernet Performance

Latency

AIX localhost in a SPP:
Alignment      Offset         RoundTrip  Trans      Throughput
Local  Remote  Local  Remote  Latency    Rate       10^6bits/s
Send   Recv    Send   Recv    usec/Tran  per sec    Outbound  Inbound
8      0       0      0       30.769     32500.306  0.260     0.260

Virt. Adapter to Virt. Adapter through the same vSwitch:
Alignment      Offset         RoundTrip  Trans      Throughput
Local  Remote  Local  Remote  Latency    Rate       10^6bits/s
Send   Recv    Send   Recv    usec/Tran  per sec    Outbound  Inbound
8      0       0      0       57.785     17305.403  0.138     0.138

Cross system through VIOS / SEA with some load:
Alignment      Offset         RoundTrip  Trans      Throughput
Local  Remote  Local  Remote  Latency    Rate       10^6bits/s
Send   Recv    Send   Recv    usec/Tran  per sec    Outbound  Inbound
8      0       0      0       185.065    5403.498   0.043     0.043

Cross system through VIOS / SEA with very high TP load:
Alignment      Offset         RoundTrip  Trans      Throughput
Local  Remote  Local  Remote  Latency    Rate       10^6bits/s
Send   Recv    Send   Recv    usec/Tran  per sec    Outbound  Inbound
8      0       0      0       2930.229   341.270    0.003     0.003

AIX-VUG Demystifying 10 Gb Ethernet Performance

Throughput as a function of tcp_recvspace in a 10 GbE VIOS environment (180 µs latency)

[Chart: throughput [Gbit/s] vs. tcp_recvspace (25000, 50000, 75000, 100000, 150000, 262144);
throughput rises from 5,61 Gbit/s at 25000 to 9,39 Gbit/s at 262144
(intermediate values: 6,94 / 7,82 / 8,42 / 8,96)]

AIX-VUG Demystifying 10 Gb Ethernet Performance

Throughput as a function of tcp_recvspace and latency in a 10 GbE VIOS environment

[Chart: throughput [Gbit/s] vs. tcp_recvspace (25000 to 262144) for two latencies;
the 180 µs series ranges from 5,61 to 9,39 Gbit/s,
the 50 µs series from 7,01 up to roughly 15 Gbit/s]
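
A minimal sketch of how such buffer values are typically applied on AIX (values are illustrative; rfc1323 window scaling is needed for buffers above 64 KB, and interface-specific network options set with chdev take precedence over the global no tunables):

# system-wide defaults
no -p -o tcp_recvspace=262144 -o tcp_sendspace=262144 -o rfc1323=1

# or per interface (ISNO)
chdev -l en0 -a tcp_recvspace=262144 -a tcp_sendspace=262144 -a rfc1323=1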

AIX-VUG Demystifying 10 Gb Ethernet Performance

Latency and Transaction Rate for Virtual I/O Servers


Sending/Receiving VIOS with shared CPU units:
Alignment      Offset         RoundTrip  Trans
Local  Remote  Local  Remote  Latency    Rate
Send   Recv    Send   Recv    usec/Tran  per sec
8      0       0      0       162.928    6137.665

Sending/Receiving VIOS with dedicated donating CPUs:
Alignment      Offset         RoundTrip  Trans
Local  Remote  Local  Remote  Latency    Rate
Send   Recv    Send   Recv    usec/Tran  per sec
8      0       0      0       146.819    6811.107

~ 11 % better transaction rate

AIX-VUG Demystifying 10 Gb Ethernet Performance

Shared Ethernet Adapter Thread mode


Deactivate thread operation mode for the SEA on receiving VIO-Server
Can significantly boost throughput:
No congestion: Reduce latency for bridging by 9 %
Congestion - 8 TCP-Sessions with 7,5 Gbps load:
Reduce latency for transaction oriented traffic by 25 %

Receiving SEA in thread mode:
Alignment      Offset         RoundTrip  Trans
Local  Remote  Local  Remote  Latency    Rate
Send   Recv    Send   Recv    usec/Tran  per sec
8      0       0      0       159.369    6274.746

Receiving SEA in non-thread mode:
Alignment      Offset         RoundTrip  Trans
Local  Remote  Local  Remote  Latency    Rate
Send   Recv    Send   Recv    usec/Tran  per sec
8      0       0      0       145.862    6855.812

~ 9,2 % better latency
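
A short sketch of how the mode is switched on the VIOS (thread is the SEA attribute: 1 = threaded, the default; 0 = interrupt mode; weigh the latency gain against the impact on other VIOS workloads such as virtual SCSI before changing it):

$ chdev -dev <SEA> -attr thread=0     (non-threaded / interrupt mode)
$ chdev -dev <SEA> -attr thread=1     (back to threaded mode)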

AIX-VUG Demystifying 10 Gb Ethernet Performance

Planning for latency sensitive workloads

"Latency sensitive" is understood here as guaranteed packet delivery within a
certain amount of time
Operation examples: transaction-oriented workloads against application servers,
database backends
Plan dedicated donating CPUs for systems with high transaction rates
(reduces the forwarding time by avoiding VP dispatch latency)
Deactivate thread operation mode for the SEA on the receiving VIO-Server
Can significantly boost throughput
Can reduce latency for bridging by up to 25 %


AIX-VUG Demystifying 10 Gb Ethernet Performance

RDMA and RoCEE


RDMA stands for Remote Direct Memory Access. RDMA enables an application to write
directly to physical memory on a remote system. RDMA supports direct memory access
from the memory of one system into another system's memory without the operating-system
overhead of copying data from the network stack into the application memory area.
By eliminating the operating-system involvement, this promotes high-throughput,
low-latency communication.

RoCEE (RDMA over Converged Enhanced Ethernet) is a protocol that implements
Remote Direct Memory Access (RDMA) over 10 Gigabit Ethernet networks.
Source: Introduction to Ethernet Latency; Qlogic
http://www.qlogic.com/Resources/Documents/TechnologyBriefs/Adapters/Tech_Brief_Introduction_to_Ethernet_Latency.pdf


AIX-VUG Demystifying 10 Gb Ethernet Performance

PCIe2 10GbE RoCE Converged Host Bus Adapter


Supported with AIX 6.1 TL08, AIX 7.1 TL02 and VIOS 2.2.2.1
lsdev shows the hba and roce devices but no ent adapters:
# lsdev
hba0    Available 09-00     PCIe2 10GbE RoCE Converged Host Bus Adapter (b315506714101604)
roce0   Available           PCIe2 10GbE RoCE Converged Network Adapter

The PCIe2 10 GbE RoCE Adapter is preconfigured to operate in the RDMA configuration mode.
This can be changed with the following procedure:
# rmdev -dl roce0
# rmdev -dl hba0
# cfgmgr
# rmdev -dl roce0
# chdev -l hba0 -a stack_type=ofed
# lsdev | grep RoCE
ent7    Available 09-00-01  RoCE Converged Network Adapter
ent8    Available 09-00-02  RoCE Converged Network Adapter
hba0    Available 09-00     PCIe2 10GbE RoCE Converged Host Bus Adapter (b315506714101604)


AIX-VUG Demystifying 10 Gb Ethernet Performance

Network performance tuning overview


[Table: each tuning option rated for throughput gain, latency gain, packet/load distribution and
ease of implementation / risk (ratings HIGH / MED / GOOD / POOR); additional notes include
"Sufficient ENTC", "May have a slight impact" and "Gives precedence over all other VIOS tasks"]

Option                 Notes
#1:  VIOS design
#2:  Jumbo Frames      ~x3 throughput gain; Jumbo Frame support for all devices must be ensured
#3:  Large Send        ~x3-4 throughput gain for outgoing packets
#4:  Large Receive     ~x3-4 throughput gain for incoming packets
#5:  EC hashing
#6:  CPU entitlement
#7:  Ded. CPUs
#8:  TCP buffer
#9:  SEA thread mode
#10: RDMA / RoCEE

AIX-VUG Demystifying 10 Gb Ethernet Performance

Do you remember this problem?


~ 9 Gbit/s
AIX LPAR 1

AIX LPAR 2

~ 3 Gbit/s
Power 750 - 8408-E8D
AIX PAR 1

Power 720 - 8202-E4C


Virtual I/O Server

SEA

Virt.
Eth.

10GbE
SR

Virt.
Eth.

PVID 10

vSwitch

64

Etherchannel

10GbE
SR

PVID 10

AIX LPAR 2

10 GbE
Network

10GbE
SR
#5287

AIX-VUG Demystifying 10 Gb Ethernet Performance

Don't let this be you!

Wrong expectation: operating systems and applications simply adopt the line speed of a
10 GbE adapter.

Fact is:
A 10 GbE adapter provides a physical line speed of up to 10 Gbit/s
The network performance of an OS or an application depends on:
  available CPU power for application and OS network stack
  Maximum Transmission Unit size
  distance between sender and receiver
  offloading features
  coalescing and aggregation features
  TCP configuration

(Cartoon caption: "I'll never get 10 Gig...")

AIX-VUG Demystifying 10 Gb Ethernet Performance

THANK YOU!
VIELEN DANK!
Alexander Paul
paulalex@de.ibm.com
Enhanced Technical Support (ETS)

Meet you at
Enterprise2014

