
SO_REUSEPORT

Scaling Techniques for Servers with High Connection Rates

Ying Cai

ycai@google.com

Problems

- Servers with high connection/transaction rates
  - TCP servers, e.g. web servers
  - UDP servers, e.g. DNS servers
- On multi-core systems, using multiple servicing threads, e.g. one thread per servicing core
- The single server socket becomes a bottleneck
  - Cache line bounces
  - Hard to achieve load balance
- Things will only get worse with more cores

Scenario

Single TCP Server Socket - Solution 1

- Use a listener thread to dispatch established connections to server threads (see the sketch below)
- The single listener thread becomes a bottleneck due to the high connection rate
- Cache misses on the socket structure
- Load balance is not an issue here
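
A minimal sketch of this design, assuming a pre-created listening fd; the pipe-based hand-off and all names here are illustrative, not from the slides:

    /* Solution 1 sketch: one listener thread accept()s and hands connected
     * fds to worker threads over a pipe (fds are process-wide, so passing
     * the integer is enough within one process). */
    #include <pthread.h>
    #include <sys/socket.h>
    #include <unistd.h>

    static int dispatch_pipe[2];        /* create with pipe() at startup */

    static void *listener_thread(void *arg)
    {
        int lfd = *(int *)arg;          /* the single listening socket */
        for (;;) {
            int cfd = accept(lfd, NULL, NULL);
            if (cfd < 0)
                continue;
            /* Every new connection funnels through this one thread -
             * the bottleneck described above. */
            write(dispatch_pipe[1], &cfd, sizeof(cfd));
        }
        return NULL;
    }

    static void *worker_thread(void *arg)
    {
        (void)arg;
        int cfd;
        while (read(dispatch_pipe[0], &cfd, sizeof(cfd)) == (ssize_t)sizeof(cfd)) {
            /* ... serve the connection on this thread ... */
            close(cfd);
        }
        return NULL;
    }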

Single TCP Server Socket - Solution 2

- All server threads accept() on the single server socket (a minimal sketch follows)
- Lock contention on the server socket
- Cache line bouncing of the server socket
- Loads (number of accepted connections per thread) are usually not balanced
  - Larger latency on busier CPUs
  - Balance can almost be achieved by calling accept() at random intervals, but it is hard to choose the interval value, and this may introduce latency
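
For contrast, a minimal sketch of this approach (names illustrative): every thread loops on accept() against the one shared socket, so the kernel serializes them on the socket lock.

    #include <sys/socket.h>
    #include <unistd.h>

    /* Solution 2 sketch: N server threads all accept() on the same
     * listening fd; the lock contention and cache line bouncing happen
     * on that one shared socket. */
    static void *server_thread(void *arg)
    {
        int lfd = *(int *)arg;                  /* shared listening socket */
        for (;;) {
            int cfd = accept(lfd, NULL, NULL);  /* contends with all peers */
            if (cfd < 0)
                continue;
            /* ... serve the connection on this thread ... */
            close(cfd);
        }
        return NULL;
    }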

Single UDP Server Socket

- Has the same issues as TCP
- SO_REUSEADDR allows multiple UDP sockets to bind() to the same local IP address and UDP port, but it will not distribute packets among them; it was not designed to solve this problem (illustrated below)
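
A small illustration of that point (the helper name is assumed, not from the slides); both bind() calls succeed, but, as noted above, the kernel does not spread incoming datagrams across the two sockets:

    #include <arpa/inet.h>
    #include <netinet/in.h>
    #include <string.h>
    #include <sys/socket.h>

    /* With SO_REUSEADDR, two UDP sockets can bind() the same addr:port,
     * but datagrams are not load-balanced between them. */
    static int bind_udp(unsigned short port)
    {
        int fd = socket(AF_INET, SOCK_DGRAM, 0);
        int one = 1;
        setsockopt(fd, SOL_SOCKET, SO_REUSEADDR, &one, sizeof(one));

        struct sockaddr_in a;
        memset(&a, 0, sizeof(a));
        a.sin_family = AF_INET;
        a.sin_addr.s_addr = htonl(INADDR_ANY);
        a.sin_port = htons(port);
        bind(fd, (struct sockaddr *)&a, sizeof(a));  /* succeeds for both */
        return fd;
    }

    /* bind_udp(5353); bind_udp(5353);  - both bind, only one receives */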

New Socket Option - SO_REUSEPORT

- Allows multiple sockets to bind()/listen() on the same local address and TCP/UDP port
- Every thread can have its own server socket
- No locking contention on the server socket
- Load balance is achieved by the kernel - the kernel randomly picks a socket to receive the TCP connection or UDP request
- For security reasons, all these sockets must be opened by the same user, so other users cannot "steal" packets

SO_REUSEPORT

How to enable

1. sysctl net.core.allow_reuseport=1
2. Before bind(), setsockopt SO_REUSEADDR and SO_REUSEPORT
3. Then the same as a normal socket - bind()/listen()/accept()
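
A minimal sketch of these three steps for a TCP server, assuming a kernel carrying this patch; SO_REUSEPORT may be missing from older userspace headers, so it is defined below as an assumption (15 is the Linux value):

    #include <arpa/inet.h>
    #include <netinet/in.h>
    #include <stdio.h>
    #include <stdlib.h>
    #include <string.h>
    #include <sys/socket.h>

    #ifndef SO_REUSEPORT
    #define SO_REUSEPORT 15  /* assumed: Linux value, absent in old headers */
    #endif

    static int make_listener(unsigned short port)
    {
        int fd = socket(AF_INET, SOCK_STREAM, 0);
        if (fd < 0) { perror("socket"); exit(1); }

        /* Step 2: set both options before bind(). */
        int one = 1;
        setsockopt(fd, SOL_SOCKET, SO_REUSEADDR, &one, sizeof(one));
        if (setsockopt(fd, SOL_SOCKET, SO_REUSEPORT, &one, sizeof(one)) < 0) {
            perror("setsockopt(SO_REUSEPORT)");  /* kernel without support */
            exit(1);
        }

        struct sockaddr_in addr;
        memset(&addr, 0, sizeof(addr));
        addr.sin_family = AF_INET;
        addr.sin_addr.s_addr = htonl(INADDR_ANY);
        addr.sin_port = htons(port);

        /* Step 3: bind()/listen() as for a normal socket. */
        if (bind(fd, (struct sockaddr *)&addr, sizeof(addr)) < 0) {
            perror("bind"); exit(1);
        }
        if (listen(fd, 128) < 0) { perror("listen"); exit(1); }
        return fd;
    }

Each servicing thread calls make_listener() with the same port and then accept()s on its own fd, so no socket is shared between threads.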

Status

- Developed by Tom Herbert at Google
- Submitted upstream, but it has not been accepted yet
- Deployed internally at Google
- Will be deployed on Google Front End servers
- Already deployed on Google DNS servers; one test showed an improvement from 50k requests/s with some losses to 80k requests/s without loss

Known Issues - Hashing

- The hash is based on the 4-tuple and the number of server sockets, so if that number changes (a server socket is opened or closed), a packet may be hashed to a different socket and the TCP connection cannot be established (see the toy model below)
- Solution 1: Use a fixed number of server sockets
- Solution 2: Allow multiple server sockets to share the TCP request table
- Solution 3: Do not use the hash; pick the local server socket on the same CPU
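
An illustrative toy model of the problem (not the patch's actual hash function): if the socket is picked as hash(4-tuple) mod N, the same flow maps to a different socket once N changes, so a mid-handshake connection's state is no longer found.

    #include <stdint.h>
    #include <stdio.h>

    /* Toy model: listening socket index = flow hash modulo socket count.
     * The mapping shifts whenever a socket is opened or closed. */
    static uint32_t pick_socket(uint32_t flow_hash, uint32_t num_sockets)
    {
        return flow_hash % num_sockets;
    }

    int main(void)
    {
        uint32_t flow = 10;                    /* stand-in for a 4-tuple hash */
        printf("%u\n", pick_socket(flow, 4));  /* 2 */
        printf("%u\n", pick_socket(flow, 5));  /* 0 - same flow, new socket */
        return 0;
    }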

Known Issues - Cache

- Has not solved the cache line bouncing problem completely
- Solved: the accepting thread is the processing thread
- Unsolved: the processed packets can come from another CPU
  - Instead of distributing randomly, deliver to the thread/socket on the same CPU

Silo'ing

Interactions with RFS/RPS/XPS-mq - TCP

- Bind server threads to CPUs (a pinning sketch follows)
- RPS (Receive Packet Steering) distributes the TCP SYN packets to CPUs
- The TCP connection is accept()ed by the server thread bound to that CPU
- Use XPS-mq (Transmit Packet Steering for multiqueue) to send replies using the transmit queue associated with this CPU
- Either RFS (Receive Flow Steering) or RPS can guarantee that succeeding packets of the same connection will be delivered to that CPU
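
A hedged sketch of the first step, using the Linux pthread affinity API (the helper name is illustrative):

    #define _GNU_SOURCE
    #include <pthread.h>
    #include <sched.h>

    /* Pin a servicing thread to one CPU so that, with RPS/RFS steering a
     * flow's packets to that same CPU, accept() and processing stay local. */
    static int pin_to_cpu(pthread_t t, int cpu)
    {
        cpu_set_t set;
        CPU_ZERO(&set);
        CPU_SET(cpu, &set);
        return pthread_setaffinity_np(t, sizeof(set), &set);
    }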

Interactions with RFS/RPS/XPS-mq - TCP

- RFS/RPS is not needed if RxQs are set up per CPU
- But hardware may not support as many RxQs as there are CPUs

Interactions with RFS/RPS/XPS-mq - UDP

Similar to TCP

Interactions with scheduler

- Some scheduler mechanisms may harm performance
- Affine wakeup - too aggressive in certain conditions, causing cache misses

Other Scalability Issues

- Locking contention
  - HTB Qdisc

Questions?
