Today's topics
Virtual switching technologies in Linux
bridge
Behaves like a hardware switch (IEEE 802.1D)
A common NIC drops unicast frames not addressed to its own MAC address unless it is in promiscuous mode
Many NICs also filter multicast / VLAN-tagged frames by default
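[Figure: without a bridge, eth0 hands frames directly to TCP/IP in the kernel. With a bridge, eth0 and eth1 are in promiscuous mode, the bridge handler hook receives their frames, and a frame is passed to the upper layer (TCP/IP) through br0 only if its destination MAC is that of the bridge device.]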
Copyright 2014 NTT Corp. All Rights Reserved.
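A minimal sketch of the setup in the figure with iproute2, assuming ports named eth0 and eth1 and a bridge named br0. Commands are only printed by default; run the script as root with `RUN=` (empty) to apply them. After a port is attached, `ip -d link show` on it should normally report a nonzero promiscuity count, because the bridge puts its ports into promiscuous mode.

```shell
#!/bin/sh
# Dry run by default: commands are printed instead of executed.
# Run as root with RUN= (empty) to actually apply them.
RUN=${RUN-echo}

setup_bridge() {
    $RUN ip link add name br0 type bridge        # create the bridge device
    for port in eth0 eth1; do
        $RUN ip link set dev "$port" master br0  # attach the port to br0
        $RUN ip link set dev "$port" up
    done
    $RUN ip link set dev br0 up
    $RUN ip -d link show eth0                    # expect "promiscuity 1"
}

setup_bridge
```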
macvlan
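VLAN-like separation using MAC addresses instead of 802.1Q tags
4 modes
private: macvlans on the same lower device cannot talk to each other, even via the external switch
vepa: traffic between macvlans goes out to the external switch, which must send it back
bridge: traffic between macvlans is forwarded directly inside the macvlan driver
passthru: a single macvlan takes over the lower device, which is put in promiscuous mode
Uses unicast filtering, if supported, instead of promiscuous mode (except for passthru)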
[Figure: macvlan0 (MAC address A) and macvlan1 (MAC address B) sit on eth0 behind the macvlan handler hook; eth0 uses unicast filtering for A and B.]
[Figure: macvlan0 (MAC address A) and macvlan1 (MAC address B) on eth0, connected to an external switch and an external gateway.]
[Figures: macvlan0 (MAC address A) and macvlan1 (MAC address B) on eth0, connected to an external switch, in the remaining modes; one panel notes "Allow traffic between macvlans (via macvlan stack)", and the last shows a single macvlan0 with eth0 in promiscuous mode.]
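The four modes can be tried with iproute2. A minimal sketch, assuming a lower device eth0; the macvlan names are hypothetical. Commands are printed by default; run as root with `RUN=` to apply.

```shell
#!/bin/sh
# Dry run by default (commands are printed); run as root with RUN= to apply.
RUN=${RUN-echo}

# Create one macvlan on top of eth0 in the given mode.
add_macvlan() {
    name=$1
    mode=$2
    case $mode in
        private|vepa|bridge|passthru) ;;
        *) echo "unknown macvlan mode: $mode" >&2; return 1 ;;
    esac
    $RUN ip link add link eth0 name "$name" type macvlan mode "$mode"
    $RUN ip link set dev "$name" up
}

# Two macvlans in bridge mode can reach each other without leaving the host.
# (In passthru mode, only one macvlan may exist on eth0.)
add_macvlan macvlan0 bridge
add_macvlan macvlan1 bridge
```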
Open vSwitch
Supports OpenFlow
Can be used as a normal switch as well
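[Figure: control plane in user space: the ovs-vswitchd daemon holds the flow table and FDB and talks to an OpenFlow controller. Data plane in the kernel: the openvswitch datapath flow table (a cache) receives frames from eth0 and eth1 (promiscuous mode) through the handler hook; cache misses go up to user space as upcalls.]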
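A minimal sketch of the same layout with the Open vSwitch tools, assuming a bridge br0 with ports eth0 and eth1. With no OpenFlow controller configured, OVS normally installs a default flow with the NORMAL action, so the bridge acts as an ordinary learning switch; the inspection commands show the three tables in the figure. Commands are printed by default; run as root with `RUN=` to apply.

```shell
#!/bin/sh
# Dry run by default (commands are printed); run as root with RUN= to apply.
RUN=${RUN-echo}

setup_ovs() {
    $RUN ovs-vsctl add-br br0          # bridge managed by ovs-vswitchd
    $RUN ovs-vsctl add-port br0 eth0
    $RUN ovs-vsctl add-port br0 eth1
}

inspect_ovs() {
    $RUN ovs-ofctl dump-flows br0      # OpenFlow flow table (user space)
    $RUN ovs-appctl fdb/show br0       # MAC learning table used by NORMAL
    $RUN ovs-dpctl dump-flows          # flows cached in the kernel datapath
}

setup_ovs
inspect_ovs
```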
Hardware switch
NIC embedded switch (in SR-IOV device)
[Figure: a guest's eth0 is backed by qemu/vhost, which reads and writes the fd of tap0 (through VFS); tap0 and the host's eth0 are ports of a bridge.]
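Attaching a guest to a bridge through a tap device can be sketched as below, assuming the bridge br0 already exists; tap0 and the qemu options in the comment are illustrative. Commands are printed by default; run as root with `RUN=` to apply.

```shell
#!/bin/sh
# Dry run by default (commands are printed); run as root with RUN= to apply.
RUN=${RUN-echo}

# Create a tap device and attach it to an existing bridge.
attach_tap() {
    tap=$1
    br=$2
    $RUN ip tuntap add dev "$tap" mode tap
    $RUN ip link set dev "$tap" master "$br"
    $RUN ip link set dev "$tap" up
}

attach_tap tap0 br0
# qemu/vhost then reads and writes tap0's fd, e.g. (illustrative options):
#   qemu-system-x86_64 ... -netdev tap,id=net0,ifname=tap0,script=no,downscript=no,vhost=on \
#                          -device virtio-net-pci,netdev=net0
```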
[Figure: two guests, each backed by qemu/vhost reading and writing the fd of macvtap0 / macvtap1; both macvtaps sit on the host's eth0 through macvlan.]
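The macvtap variant replaces tap + bridge with a macvtap device on eth0. A minimal sketch with hypothetical names; `bridge` mode lets the two guests reach each other, while `passthru` would give a single guest the whole lower device. Commands are printed by default; run as root with `RUN=` to apply.

```shell
#!/bin/sh
# Dry run by default (commands are printed); run as root with RUN= to apply.
RUN=${RUN-echo}

# Create a macvtap on eth0 in the given macvlan mode.
add_macvtap() {
    name=$1
    mode=$2
    $RUN ip link add link eth0 name "$name" type macvtap mode "$mode"
    $RUN ip link set dev "$name" up
}

add_macvtap macvtap0 bridge
add_macvtap macvtap1 bridge
# Each macvtap has a character device /dev/tapN, where N is its ifindex
# (cat /sys/class/net/macvtap0/ifindex); qemu/vhost reads and writes that fd.
```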
[Figure: one guest backed by qemu/vhost reading and writing the fd of macvtap0, which sits on eth0 through macvlan; eth0 is in promiscuous mode.]
[Figure: a guest backed by qemu/vhost reading and writing the fd of tap0 (through VFS); tap0 and the host's eth0 are ports of the openvswitch kernel datapath.]
[Figure: an SR-IOV NIC with one PF (eth0) and two VFs (eth0_0, eth0_1) connected by the NIC's embedded switch.]
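Creating the VFs behind the embedded switch can be sketched through the generic sriov_numvfs sysfs attribute (ixgbe also has a max_vfs module parameter). The VF MAC addresses are hypothetical, and the VF netdev names (eth0_0, eth0_1 in the figure) depend on the system. Commands are printed by default; run as root with `RUN=` to apply.

```shell
#!/bin/sh
# Dry run by default (commands are printed); run as root with RUN= to apply.
RUN=${RUN-echo}

# Write VALUE ($2) to a sysfs FILE ($1), or just print the equivalent command.
sysfs_write() {
    if [ -n "$RUN" ]; then
        echo "echo $2 > $1"
    else
        echo "$2" > "$1"
    fi
}

setup_vfs() {
    # Create 2 VFs on the PF (write 0 first if VFs already exist).
    sysfs_write /sys/class/net/eth0/device/sriov_numvfs 2
    # Give each VF a fixed MAC (hypothetical addresses).
    $RUN ip link set dev eth0 vf 0 mac 52:54:00:00:00:01
    $RUN ip link set dev eth0 vf 1 mac 52:54:00:00:00:02
    $RUN ip link show dev eth0   # the PF lists its VFs as "vf 0", "vf 1"
}

setup_vfs
```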
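[Figure: VFs eth0_0 and eth0_1 are assigned to two guests via qemu (PCI passthrough); the PF eth0 stays in the host, and traffic is switched by the embedded switch.]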
[Figure: two guests backed by qemu/vhost reading and writing the fds of macvtap0 and macvtap1, which sit on VFs eth0_0 and eth0_1; the PF eth0 and the VFs are connected by the embedded switch.]
Performance of switches
Environment
Test results
Throughput
Overhead on host
Performance: environment
kernel 3.14.4 (2014/5/13 Release)
Host: Xeon E5-2407, 4 cores × 2 sockets
NIC: 10GbE, Intel 82599 chip (ixgbe)
Guest: 2 cores (*1)
HW Switch: BLADE G8124
Benchmark tool: netperf-2.6
[Figure: two hosts with 82599 NICs connected through the BLADE G8124 switch; netperf sends UDP packets between them.]
*1: Pinning on host: vCPUs -> CPU0-3, vhost -> CPU1. NIC IRQ affinity on host: 0x1 (CPU0).
Pinning on guest: netserver process -> CPU1. NIC IRQ affinity on guest: 0x1 (CPU0).
Performance: throughput
[Figure: receive throughput on guest, in Gbps]
Performance: overhead on host
SR-IOV (PCI passthrough) has the lowest overhead
[Figure: host CPU usage of vcpu0, vcpu1 and vhost, broken down into user, system, hardirq and softirq]
Commands
brctl
ip / bridge
FDB manipulation
FDB
Forwarding database
Learning: packet arrival triggers entry creation
The source MAC address is recorded together with the incoming port
[Figure: a packet from aa:bb:cc:dd:ee:ff arrives on eth0, and the bridge learns this FDB entry]
MAC address        Dst
aa:bb:cc:dd:ee:ff  eth0
FDB manipulation
FDB manipulation commands
Since kernel 3.0
# bridge fdb add <mac address> dev <port> master temp
# bridge fdb del <mac address> dev <port> master
[Figure: the command creates a bridge FDB entry that maps the specified MAC address to the specified port (eth0 or eth1)]
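A minimal sketch of pinning a MAC to a bridge port and removing it again, assuming br0 with port eth1; the MAC address is hypothetical. `bridge fdb show br br0` lists both learned and static entries. Commands are printed by default; run as root with `RUN=` to apply.

```shell
#!/bin/sh
# Dry run by default (commands are printed); run as root with RUN= to apply.
RUN=${RUN-echo}
MAC=52:54:00:aa:bb:cc   # hypothetical MAC address

# Static bridge entry: frames to MAC ($1) are forwarded out port $2.
fdb_pin() {
    $RUN bridge fdb add "$1" dev "$2" master temp
}

# Remove the entry again.
fdb_unpin() {
    $RUN bridge fdb del "$1" dev "$2" master
}

fdb_pin "$MAC" eth1
$RUN bridge fdb show br br0   # learned and static entries of br0
fdb_unpin "$MAC" eth1
```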
FDB manipulation
# bridge fdb add <mac address> dev <port> master temp
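What's "temp"?
Without "temp", the entry is permanent: the bridge treats the MAC as local, so matching frames are delivered to br0 itself
With "temp", matching frames are forwarded to the specified port
[Figure: bridge br0 with ports eth0 and eth1; a frame matching a permanent entry goes up to br0, one matching a temp entry goes out the specified port]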
FDB manipulation
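What's "master"?
"master" adds the entry to the FDB of the bridge the port belongs to
"self" (the default when neither is given) passes the entry to the port device's own driver
[Figure: "master" targets the bridge, which forwards to the specified port; "self" targets eth0 / eth1 themselves]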
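The difference between the two targets, as a sketch with a hypothetical MAC address. When neither `master` nor `self` is given, iproute2 assumes `self`; the `self` form below keeps the default permanent state, matching the plain `bridge fdb add` command used in the Intel 82599 example. Commands are printed by default; run as root with `RUN=` to apply.

```shell
#!/bin/sh
# Dry run by default (commands are printed); run as root with RUN= to apply.
RUN=${RUN-echo}
MAC=52:54:00:aa:bb:cc   # hypothetical MAC address

# Add an FDB entry for a MAC on a device, either in the bridge ("master")
# or in the port device's own driver ("self").
fdb_add() {
    mac=$1
    dev=$2
    target=$3
    case $target in
        master) $RUN bridge fdb add "$mac" dev "$dev" master temp ;;
        self)   $RUN bridge fdb add "$mac" dev "$dev" self ;;
        *) echo "target must be master or self" >&2; return 1 ;;
    esac
}

fdb_add "$MAC" eth0 master   # entry lives in the bridge eth0 belongs to
fdb_add "$MAC" eth0 self     # entry lives in eth0's driver (e.g. embedded switch)
```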
FDB manipulation
When to use "self"?
Some NIC embedded switches support this command
ixgbe, qlcnic
[Figure: "master" goes to the bridge; "self" on the PF eth0 goes to the embedded switch that connects the PF to VFs eth0_0 and eth0_1]
FDB manipulation
Example: Intel 82599 (ixgbe)
[Figure: Guest 1 (MAC A, interface eth1) is connected via a tap to a bridge on the PF eth0 (MAC B); Guest 2 (MAC C) uses VF eth0_0; a frame with destination A reaches the embedded switch]
FDB manipulation
Example: Intel 82599 (ixgbe)
Run "bridge fdb add A dev eth0" on the host (no "master", so the entry goes to the embedded switch)
Traffic to A is then forwarded to the PF, and so to the bridge
VLAN filtering
802.1Q Bridge
Filter packets according to VLAN tag
Forward packets according to VLAN tag as well as MAC address
Insert / strip VLAN tag
[Figure: the bridge FDB is keyed by MAC address and VLAN]
MAC address        VLAN  Dst
aa:bb:cc:dd:ee:ff  10    eth0
VLAN filtering
Ingress / egress filtering policy
Incoming / outgoing packets are dropped unless their VLAN is allowed on the port
Per-port, per-VLAN policy
Default is "disallow all VLANs": all packets are dropped
Filtering table
Port  Allowed VLANs
eth0  10, 20
eth1  20, 30
[Figure: a packet with VID 10 arriving on eth0 passes ingress filtering (VLAN 10 is allowed on eth0) but is dropped at egress on eth1 (VLAN 10 is not allowed there)]
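The filtering table above maps directly onto `bridge vlan add` commands. A minimal sketch, assuming br0 has ports eth0 and eth1; it uses the `vlan_filtering` option of `ip link` (newer iproute2) instead of the sysfs file. Later kernels also give each new port VLAN 1 as PVID by default (the bridge's default_pvid setting), which may need removing with `bridge vlan del`. Commands are printed by default; run as root with `RUN=` to apply.

```shell
#!/bin/sh
# Dry run by default (commands are printed); run as root with RUN= to apply.
RUN=${RUN-echo}

# Allow each given VID (tagged) on one bridge port.
allow_vlans() {
    port=$1
    shift
    for vid in "$@"; do
        $RUN bridge vlan add dev "$port" vid "$vid"
    done
}

$RUN ip link set dev br0 type bridge vlan_filtering 1
allow_vlans eth0 10 20    # VID 10 passes ingress on eth0 ...
allow_vlans eth1 20 30    # ... but is dropped at egress on eth1
$RUN bridge vlan show
```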
VLAN filtering
PVID (Port VID)
Untagged (and VID 0) packets are assigned this VID
Per-port configuration
Default PVID is none (untagged packets are discarded)
[Figure: filtering table with columns Port, Allowed VLANs, PVID and Egress Untag. An untagged packet entering eth0 is assigned VID 20 (eth0's PVID; tag inserted), and leaves eth1 untagged (VID 20 is egress-untagged on eth1; tag stripped)]
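The PVID / egress-untag example above, as a sketch assuming br0 with ports eth0 and eth1 and VLAN filtering already enabled. Commands are printed by default; run as root with `RUN=` to apply.

```shell
#!/bin/sh
# Dry run by default (commands are printed); run as root with RUN= to apply.
RUN=${RUN-echo}

setup_pvid() {
    # Untagged (or VID 0) frames entering eth0 are assigned VID 20.
    $RUN bridge vlan add dev eth0 vid 20 pvid
    # VID 20 frames leave eth1 with the tag stripped.
    $RUN bridge vlan add dev eth1 vid 20 untagged
}

setup_pvid
```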
VLAN filtering
Commands
Enable VLAN filtering (disabled by default)
# echo 1 > /sys/class/net/<bridge>/bridge/vlan_filtering
Dump setting
# bridge vlan show
[Figure: without VLAN filtering: one guest's tap0 is in bridge br10 with VLAN device eth0.10, the other guest's tap1 is in br20 with eth0.20; one bridge per VLAN on top of eth0]
[Figure: with VLAN filtering: a single bridge br0; tap0 is pvid/untag for VLAN 10, tap1 is pvid/untag for VLAN 20, and eth0 carries VLANs 10 and 20]
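The single-bridge layout can be sketched as follows, assuming br0 already has ports tap0, tap1 and eth0; each tap acts as an access port for one VLAN and eth0 as a tagged trunk. Commands are printed by default; run as root with `RUN=` to apply.

```shell
#!/bin/sh
# Dry run by default (commands are printed); run as root with RUN= to apply.
RUN=${RUN-echo}

setup_vlan_bridge() {
    $RUN ip link set dev br0 type bridge vlan_filtering 1
    # Guest ports: untagged toward the guest, VLAN 10 / 20 inside the bridge.
    $RUN bridge vlan add dev tap0 vid 10 pvid untagged
    $RUN bridge vlan add dev tap1 vid 20 pvid untagged
    # Uplink carries both VLANs tagged.
    $RUN bridge vlan add dev eth0 vid 10
    $RUN bridge vlan add dev eth0 vid 20
}

setup_vlan_bridge
```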
[Figure: a bridge whose guest ports tap0 and tap1 have no learning / no flooding, while the uplink eth0 keeps learning / flooding]
Commands
# echo 0 > /sys/class/net/<port>/brport/learning
# echo 0 > /sys/class/net/<port>/brport/unicast_flooding
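The same per-port settings through iproute2, as a sketch (the port names are assumptions); `learning` and `flood` correspond to the brport/learning and brport/unicast_flooding attributes above. Commands are printed by default; run as root with `RUN=` to apply.

```shell
#!/bin/sh
# Dry run by default (commands are printed); run as root with RUN= to apply.
RUN=${RUN-echo}

# Stop a guest-facing port from learning source MACs and from receiving
# flooded unknown-unicast frames.
isolate_port() {
    $RUN bridge link set dev "$1" learning off flood off
}

for p in tap0 tap1; do
    isolate_port "$p"
done
```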
[Figure: frame with a .1Q tag followed by the payload]
Bridge preserves the guest's .1Q tag (VID 30) when inserting the .1ad tag (VID 10)
[Figure: Guest A (eth0.30) sends frames with .1Q VID 30 through tap0 to a bridge in .1ad mode (ports: tap0 pvid/untag VLAN 10, tap1 pvid/untag VLAN 20, eth0 VLANs 10 / 20). On eth0 the frame carries a .1ad VID 10 tag outside the .1Q VID 30 tag and crosses the .1ad network to the customer's other site. A second guest (Guest C) is also shown.]
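A sketch of the 802.1ad bridge in the figure, assuming br0 with ports tap0, tap1 and eth0, and a kernel and iproute2 recent enough to support the bridge's vlan_protocol option. In .1ad mode a guest's .1Q-tagged frame is treated as untagged, so it receives the tap's PVID as an outer tag. Inside Guest A, eth0.30 would be an ordinary 802.1Q VLAN device (ip link add link eth0 name eth0.30 type vlan id 30). Commands are printed by default; run as root with `RUN=` to apply.

```shell
#!/bin/sh
# Dry run by default (commands are printed); run as root with RUN= to apply.
RUN=${RUN-echo}

setup_qinq() {
    # Use the 802.1ad TPID (0x88a8) for the bridge's VLAN tags.
    $RUN ip link set dev br0 type bridge vlan_filtering 1 vlan_protocol 802.1ad
    # Customer-facing taps: frames, including any guest .1Q tag, are put
    # into S-VLAN 10 or 20; the outer tag is removed on the way back out.
    $RUN bridge vlan add dev tap0 vid 10 pvid untagged
    $RUN bridge vlan add dev tap1 vid 20 pvid untagged
    # Uplink to the .1ad network carries both S-VLANs tagged.
    $RUN bridge vlan add dev eth0 vid 10
    $RUN bridge vlan add dev eth0 vid 20
}

setup_qinq
```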
Non-promiscuous bridge
If there is only one learning / flooding port, it can be non-promiscuous
Instead of promiscuous mode, unicast filtering is set up for the MAC addresses of static FDB entries
Automatically enabled when all of these conditions hold:
There is at most one learning & flooding port
The bridge itself is not in promiscuous mode
VLAN filtering is enabled
[Figure: guest ports tap0 and tap1 with no learning / no flooding; the uplink eth0 with learning / flooding stays non-promiscuous]
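A sketch that meets the conditions above, assuming br0 with ports eth0 (uplink), tap0 and tap1; the guest MAC addresses and the use of VLAN 1 are hypothetical. Each guest MAC gets a static FDB entry so the bridge can still deliver frames to the taps without learning or flooding. Commands are printed by default; run as root with `RUN=` to apply.

```shell
#!/bin/sh
# Dry run by default (commands are printed); run as root with RUN= to apply.
RUN=${RUN-echo}

# Hypothetical tap-to-guest-MAC mapping; replace with each guest's real MAC.
GUESTS="tap0=52:54:00:00:00:01 tap1=52:54:00:00:00:02"

setup_nonpromisc() {
    # Condition: VLAN filtering must be enabled on the bridge.
    $RUN ip link set dev br0 type bridge vlan_filtering 1
    # With filtering on, every port needs an allowed VLAN (VID 1 here).
    for port in eth0 tap0 tap1; do
        $RUN bridge vlan add dev "$port" vid 1 pvid untagged
    done
    for entry in $GUESTS; do
        tap=${entry%%=*}   # text before "="
        mac=${entry#*=}    # text after "="
        # Condition: guest ports neither learn nor flood, leaving eth0 as
        # the only learning & flooding port.
        $RUN bridge link set dev "$tap" learning off flood off
        # Static entry: frames to the guest reach its tap, and the MAC can be
        # put in eth0's unicast filter instead of using promiscuous mode.
        $RUN bridge fdb add "$mac" dev "$tap" master temp
    done
}

setup_nonpromisc
```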
Summary
Linux has 3 types of software switches
bridge, macvlan (macvtap), Open vSwitch
SR-IOV NIC embedded switch can also be used for KVM