Let’s have a look at how TCP sessions are established… and what can go wrong!
The TCP protocol is a connection-oriented protocol, which means that a connection is established and
maintained until the application programs at each end have finished exchanging messages. TCP works
with the Internet Protocol (IP).
TCP provides reliable, ordered, and error-checked transmission. To do so, TCP relies on a handshake
and on flags such as SYN, ACK, PSH (push), FIN, and RST (reset) to keep the connection alive and to
avoid losing any information.
TCP is used under a number of application protocols, such as HTTP, so it is important to know how to
diagnose TCP issues. In this series of articles, we will explain TCP meta information, why it is
important for performance troubleshooting, and how to measure it easily with SkyLIGHT PVX.
A TCP connection is established with a 3-way handshake made of SYN, SYN+ACK and ACK packets.
From this handshake, we can extract a performance metric called Connection Time (CT), which
summarizes how fast a session can be set up between a client and a server over a network. For more
details, see this excellent article on Wikipedia.
1. The ‘SYN’ is the first packet sent from a client to a server; it literally asks a server to open a
connection with it
2. If it’s possible, the server will respond with a ‘SYN+ACK’, which means “I received your ‘SYN’
and I’m OK”
3. And finally, the client sends an ‘ACK’ to validate the connection
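As a rough illustration of what Connection Time captures, the sketch below times a TCP connect from the client side using only the Python standard library. It is an active measurement made by the client itself, whereas SkyLIGHT PVX derives the metric from observed traffic, so the two are related but not identical. The function name `measure_connection_time` and the throwaway loopback listener are assumptions made for the demo.

```python
import socket
import time

def measure_connection_time(host: str, port: int, timeout: float = 3.0) -> float:
    """Return the seconds spent in connect(), as seen by the client."""
    start = time.perf_counter()
    # create_connection() returns once the three-way handshake has completed
    with socket.create_connection((host, port), timeout=timeout):
        return time.perf_counter() - start

# Demo against a throwaway listener on the loopback interface
listener = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
listener.bind(("127.0.0.1", 0))   # port 0: let the OS pick a free port
listener.listen(1)
demo_port = listener.getsockname()[1]

connection_time = measure_connection_time("127.0.0.1", demo_port)
print(f"Connection Time: {connection_time * 1000:.3f} ms")
listener.close()
```

On a loopback interface the value is tiny; against a remote server it is dominated by the network round trip plus the time the server needs to answer the SYN.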
A first case you can easily diagnose with SkyLIGHT PVX is: “Can my clients connect to my
servers?” In the PVX navigation menu, go to Application → Clients, then choose the TCP theme and set
the filter called “Only Unilateral Flow”. The pattern to look for is traffic from the client to the
server with no response from the server.
In other words, you want to see the top client IPs whose flows come from the client only, without
any responses.
We set the filter to see unilateral flows, which shows mostly ‘SYN’ issues; however, you could also
get other types of flows. To query only the ‘SYN’ packets that did not lead to a connection, use a
custom filter:
Figure 3 – SkyLIGHT PVX finds unilateral flows and sorts them.
As you can see in the results above, several IPs try to connect to a server (SYN > 0) but never
succeed (Connections = 0).
Two likely explanations:
A firewall denies those connections. In this case, you could apply the same query to client zones
(in the same menu) to see if the IPs are in the same zone.
The server does not exist anymore or is not available. This happens frequently when a server IP
is changed, yet some clients continue to query the old one.
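The same selection (SYN > 0 and Connections = 0, sorted by SYN count) can be sketched outside the product on any per-client counters you have exported. The `ClientStats` record and the sample values below are hypothetical; they are not a SkyLIGHT PVX export format.

```python
from dataclasses import dataclass

@dataclass
class ClientStats:
    client_ip: str
    syn_count: int      # SYN packets sent by the client
    connections: int    # handshakes that completed

def failed_clients(stats):
    """Clients that sent SYNs but never got a connection, busiest first."""
    failed = [s for s in stats if s.syn_count > 0 and s.connections == 0]
    return sorted(failed, key=lambda s: s.syn_count, reverse=True)

# Hypothetical sample data
sample = [
    ClientStats("10.0.0.5", syn_count=12, connections=0),
    ClientStats("10.0.0.7", syn_count=3, connections=3),
    ClientStats("10.0.0.9", syn_count=40, connections=0),
]
for s in failed_clients(sample):
    print(s.client_ip, s.syn_count)
```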
In a perfect world, you should have 1 ‘SYN’ per TCP connection. SkyLIGHT PVX provides a metric to
see this connection efficiency: the ‘SYN’ per Connection rate (the number of SYN packets compared
to the number of TCP sessions set up). This metric is available in the ‘details’
tables by using the TCP theme. You can also graph its evolution over time in Application → Custom
charts.
A bad ‘SYN’ efficiency is sometimes a network issue: the failed connection attempts are caused by
packet loss or congestion. You can check this assumption by looking at the Connection Time. If it
remains low and several hosts are affected, then it’s probably a network issue.
However, if the Connection Time is high, the issue is probably on the server side: it is overloaded
and cannot answer all clients. Finally, if the ‘SYN’ ratio is huge, you may have a security issue,
such as a DDoS attack.
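The reasoning above can be written down as a small decision helper. This is only a sketch: the three thresholds are illustrative assumptions, not values recommended by SkyLIGHT PVX, and should be tuned to your own baseline. The case with zero connections is deliberately rejected, since it is the unilateral-flow situation discussed earlier.

```python
def syn_per_connection(syn_count: int, connections: int) -> float:
    """SYN packets per established session; 1.0 is the ideal."""
    if connections <= 0:
        raise ValueError("no established connection: treat as a unilateral flow")
    return syn_count / connections

def likely_cause(ratio: float, connection_time_ms: float,
                 ratio_warn: float = 1.5, ratio_attack: float = 50.0,
                 ct_high_ms: float = 200.0) -> str:
    # All three thresholds are assumptions chosen for the example.
    if ratio <= ratio_warn:
        return "healthy"
    if ratio >= ratio_attack:
        return "possible security issue"
    if connection_time_ms >= ct_high_ms:
        return "server side (overloaded)"
    return "network side (loss or congestion)"

ratio = syn_per_connection(syn_count=30, connections=10)
print(ratio, likely_cause(ratio, connection_time_ms=15.0))
```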
The network latency – RTT (Round Trip Time) – can give you another indication that the issue is on the
network side. SkyLIGHT PVX provides the RTT in the Network Performances metric theme.
Conclusion
In this first article, we gave a short presentation of TCP performance metrics and of how the TCP
protocol handles connections with SYN / SYN+ACK / ACK packets. We also saw some common failure cases
that can be diagnosed easily with SkyLIGHT PVX.
To troubleshoot these kinds of issues, we used the Top Clients, Top Client Zones and Custom Charts pages.
To go further, we used “Advanced Filter: Unilateral Flows” to filter flows with no responses.
We introduced several metrics: the number of ‘SYN’ and ‘Handshakes’ (connections), the SYN Efficiency
and the Connection Time.
Paradoxically, it is more complex to close a TCP connection than it is to create one! This is due to the
fact that resources must be correctly released on both sides of the connection, host A and host B.
The standard way to close TCP sessions is to send a FIN packet, then wait for a FIN response from the
other party.
1. A sends a FIN packet; it can release some resources but awaits the response of the other
party (Fin Wait)
2. B receives the FIN packet, acknowledges it, and must release resources; it waits for its own
application to close the connection (Close Wait)
3. B can now send a FIN to A and then await its acknowledgement (Last Ack).
4. A acknowledges B’s FIN and has almost finished its job, but it must wait for a while in case that
final ACK was lost on the network (Time Wait); it may have to send the final ACK another time.
5. B eventually receives the final ACK and destroys (kills) the connection.
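The steps above can be sketched as a small state table, limited to the simple case where A closes first. The state names follow those used by the TCP specification (written here with underscores); the event labels and the `walk` helper are made up for the illustration.

```python
# (state, event) -> next state, for the case where host A closes first
TRANSITIONS = {
    # Host A: the side that sends the first FIN
    ("ESTABLISHED", "send FIN"): "FIN_WAIT_1",
    ("FIN_WAIT_1", "recv ACK"): "FIN_WAIT_2",
    ("FIN_WAIT_2", "recv FIN, send ACK"): "TIME_WAIT",
    ("TIME_WAIT", "timer expires"): "CLOSED",
    # Host B: the side that receives the first FIN
    ("ESTABLISHED", "recv FIN, send ACK"): "CLOSE_WAIT",
    ("CLOSE_WAIT", "application closes, send FIN"): "LAST_ACK",
    ("LAST_ACK", "recv ACK"): "CLOSED",
}

def walk(events, state="ESTABLISHED"):
    """Return the list of states visited while applying the events in order."""
    path = [state]
    for event in events:
        state = TRANSITIONS[(state, event)]
        path.append(state)
    return path

host_a = walk(["send FIN", "recv ACK", "recv FIN, send ACK", "timer expires"])
host_b = walk(["recv FIN, send ACK", "application closes, send FIN", "recv ACK"])
print(" -> ".join(host_a))
print(" -> ".join(host_b))
```

Any event that is not valid in the current state raises a `KeyError`, which is a crude stand-in for the abnormal cases that the RST packet handles in real TCP.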
This works fine in a perfect world. However, what happens when one side of the conversation is broken?
That’s why the Reset (RST) packet exists.
Those abnormal terminations (i.e., either an aborted setup or a disconnection) could appear due to:
SkyLIGHT™ PVX provides metrics to see if you close TCP connections properly or not. By selecting
the theme “TCP Events“, you get the count of FIN and RST packets in both directions.
If the SYN rate per connection and the server RST count are both high (Figure 3 – 1st row), this means
that the server is refusing the clients’ connection requests. By drilling down to the conversations
(see Figure 4), you get the precise server ports and applications that cause this issue. Below, we see
some attempts to connect to a VPN from an IP address using port “1194”.
Figure 4 – SkyLIGHT PVX showing the cause of RST without any connections.
For Advanced Users of SkyLIGHT PVX
If you want to filter on data with or without RST or FIN packets, SkyLIGHT PVX provides some custom
filters. In the previous article, we already saw how to filter on connections.
Examples:
Sometimes, RST packets are quite “normal”, for example when the user manually interrupts a huge data
transfer. The TCP session is sending packets as fast as possible, so when the client sends the FIN and
closes its side, the server is still sending lots of data for a moment. The client then
sends RST packets until the server stops sending data. In this case, the close still shows a client FIN
(then a server FIN), but in addition, you will see some RST packets.
It is also important to note that some applications do not close sessions properly and simply use an RST
to close every session. While this is not a good practice, you must be aware that some applications are
developed this way.
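To make the different endings easier to tell apart, here is a minimal sketch that labels one session from its FIN and RST counters. The function name, its parameters and the labels are assumptions for the example; they do not correspond to a SkyLIGHT PVX filter syntax.

```python
def classify_close(established: bool, fin_count: int, rst_count: int) -> str:
    """Label how a session ended from its FIN/RST packet counts (both directions)."""
    if not established:
        # e.g. the server answers a SYN with an RST because the port is closed
        return "RST without connection" if rst_count else "never established"
    if fin_count and rst_count:
        # e.g. a large transfer interrupted by the user
        return "FIN + RST"
    if rst_count:
        # e.g. an application that always closes with a reset
        return "RST only"
    if fin_count >= 2:
        return "double FIN"
    return "still open or half closed"

print(classify_close(True, fin_count=2, rst_count=0))
print(classify_close(False, fin_count=0, rst_count=3))
```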
It could also be relevant to graph some RST or FIN metrics over time. SkyLIGHT PVX provides a
metric to graph the rate of RST per connection over time using Custom Graphs (Figure 5).
Figure 5 – SkyLIGHT PVX custom chart showing TCP RST over time in both directions (from client and
server).
Conclusion
We have seen how closing a TCP connection can be more complex than opening one. The session can be
closed by a double FIN, by a mix of FIN + RST, or only by RST packets. However, RST packets can also
be sent without any connection.
SkyLIGHT PVX helps to diagnose session issues by reporting statistics about FIN, RST, SYN, and
connections. It is also able to graph all the metrics over time, especially the RST per connection.
If a frame is corrupted between two points of a connection due to cabling issues, duplex problems,
or other layer 1 events, the receiver will determine that the data is corrupted and drop it. In most
cases, an error counter will be incremented on the interface, which helps when locating where the loss
occurred.
Traffic congestion can cause input/output discards on interface links, especially when translating
between link speeds (10Gbps to 1Gbps for example). On these connections, the egress link may not be
able to keep up with the amount of ingress traffic, which may result in dropped packets. The sender of
the traffic will detect that the loss occurred and retransmit. These drops are typically labelled as
“discards” on interfaces.
As we have seen in this series, TCP is a connection-oriented protocol. Part of the function of
establishing a connection is creating the mechanism to track data that has been sent and acknowledge
what is received. This way, TCP can detect if a packet goes missing and resend it accordingly, ensuring
reliable transmission of data.
Yes. Despite the maturity of network links to 10Gbps and beyond, packet loss is still an underlying
network event that impacts applications today. To troubleshoot these issues, we first need to understand
how packets are dropped, how we can detect these events, and how we can resolve them.
TCP Retransmissions
Each byte of data sent in a TCP connection has an associated sequence number. This is indicated on the
sequence number field of the TCP header.
When the receiving socket detects an incoming segment of data, it uses the acknowledgement number in
the TCP header to indicate receipt. After sending a packet of data, the sender will start a retransmission
timer of variable length. If it does not receive an acknowledgment before the timer expires, the sender
will assume the segment has been lost and will retransmit it.
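The article does not say how the "variable length" of the timer is chosen. One standard way is the retransmission timeout (RTO) calculation from RFC 6298, sketched below. The class name and the default 1 ms clock granularity are assumptions; the formulas and the constants 1/8, 1/4 and 4 are those of the RFC.

```python
class RtoEstimator:
    """Retransmission timeout estimator using the RFC 6298 formulas."""
    ALPHA = 1 / 8   # gain for the smoothed RTT (SRTT)
    BETA = 1 / 4    # gain for the RTT variation (RTTVAR)
    K = 4

    def __init__(self, granularity: float = 0.001, min_rto: float = 1.0):
        self.granularity = granularity  # clock granularity G (assumed 1 ms)
        self.min_rto = min_rto          # the RFC recommends a 1 second floor
        self.srtt = None
        self.rttvar = None

    def update(self, rtt: float) -> float:
        """Feed one RTT measurement (seconds) and return the new RTO."""
        if self.srtt is None:
            # First measurement
            self.srtt = rtt
            self.rttvar = rtt / 2
        else:
            # RTTVAR is updated before SRTT, using the old SRTT
            self.rttvar = (1 - self.BETA) * self.rttvar + self.BETA * abs(self.srtt - rtt)
            self.srtt = (1 - self.ALPHA) * self.srtt + self.ALPHA * rtt
        rto = self.srtt + max(self.granularity, self.K * self.rttvar)
        return max(self.min_rto, rto)

estimator = RtoEstimator(min_rto=0.0)   # floor disabled to show the raw values
for sample in (0.100, 0.200, 0.120):
    print(f"RTT {sample:.3f} s -> RTO {estimator.update(sample):.4f} s")
```

The timer therefore grows when round-trip times become longer or more erratic, and shrinks again when the path is stable.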
TCP header
The TCP retransmission mechanism ensures that data is reliably sent from end to end. If retransmissions
are detected in a TCP connection, it is logical to assume that packet loss has occurred on the network
somewhere between client and server.
Most packet analyzers will indicate a duplicate acknowledgment condition when two ACK packets are
detected with the same ACK numbers.
Sending TCP sockets usually transmit data in a series. Rather than sending one segment of data at a time
and waiting for an acknowledgement, transmitting stations will send several packets in succession. If
one of these packets in the stream goes missing, the receiving socket can indicate which packet was lost
using selective acknowledgments.
These allow the receiver to continue to acknowledge incoming data while informing the sender of the
missing packet(s) in the stream.
As shown above, selective acknowledgements will use the ACK number in the TCP header to indicate
which packet was lost. At the same time, in these ACK packets, the receiver can use the SACK option in
the TCP header to show which packets have been successfully received after the point of loss.
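As a sketch of how a sender can read such a packet, the function below derives the missing byte ranges from the cumulative ACK number and the SACK blocks. Each block is given as (left edge, right edge), where the right edge is the sequence number just after the last byte received. The function name and sample numbers are hypothetical, and sequence number wraparound is ignored for simplicity.

```python
def missing_ranges(ack: int, sack_blocks):
    """Return the gaps [start, end) between the cumulative ACK and the SACK blocks."""
    holes = []
    expected = ack                      # next byte the receiver is waiting for
    for left, right in sorted(sack_blocks):
        if left > expected:
            holes.append((expected, left))
        expected = max(expected, right)
    return holes

# The receiver has everything up to byte 1000, plus two later blocks
print(missing_ranges(1000, [(2460, 3920), (5380, 6840)]))
# prints [(1000, 2460), (3920, 5380)]
```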
The SACK option is a function that is advertised by each station at the beginning of the TCP connection.
Most network analyzers will flag these packets as duplicate acknowledgements because the ACK
number will stay the same until the missing packet is retransmitted, filling the gap in the sequence.
Typically, duplicate acknowledgements mean that one or more packets have been lost in the stream and
the connection is attempting to recover. They are a common symptom of packet loss. In most cases,
once the sender receives three duplicate acknowledgments, it will immediately retransmit the missing
packet instead of waiting for a timer to expire. These are called fast retransmissions.
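The three-duplicate rule can be sketched on a plain list of ACK numbers, as below. This is a simplification: a real TCP stack also checks that the duplicate ACK carries no data and does not change the window. The function name and the sample ACK values are made up.

```python
def fast_retransmit_points(ack_numbers, threshold: int = 3):
    """Return (index, ack) for each ACK number that gets repeated `threshold` times."""
    triggers = []
    last_ack, dup_count = None, 0
    for i, ack in enumerate(ack_numbers):
        if ack == last_ack:
            dup_count += 1
            if dup_count == threshold:      # fire once, on the third duplicate
                triggers.append((i, ack))
        else:
            last_ack, dup_count = ack, 0    # new data acknowledged: reset the counter
    return triggers

acks = [1000, 2460, 2460, 2460, 2460, 2460, 8300]
print(fast_retransmit_points(acks))
# prints [(4, 2460)]
```

In the sample, the segment starting at byte 2460 is retransmitted when the fifth packet arrives, and the jump to 8300 shows the gap being filled.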
Connections with more latency between client and server will typically have more duplicate
acknowledgement packets when a segment is lost. In high latency connections, it is possible to observe
several hundred duplicate acknowledgements for a single lost packet.
Conclusion
If TCP Retransmissions and duplicate acknowledgments are detected on a connection, don’t assume that
the sky is falling and performance has come to a screeching halt. Depending on the network between
endpoints, a small amount of them may be normal.
For example, if a service provider is connecting end users to applications in a data center, or if the
application is hosted in a cloud environment, there are several connections that are beyond the control
and visibility of the network team. End users may perceive performance as normal, but a small number
of retransmissions may exist.
Lost packets require retransmissions, which take time, which will slow applications down. Depending
on how many occur and how fast the endpoints can recover the missing packets, they can significantly
impact application performance.
In these cases, walk the link between client and server, analyzing link-level errors for all infrastructure
devices you control. You may discover the faulty cable, Frame Check Sequence (FCS) error counter,
or discard indicator that is contributing to the packet loss.
The size of the TCP Receive Window is communicated to the connection partner using the window size
value field of the TCP header. This field tells the link partner how much data can be sent on the wire
before an acknowledgment is received. If the receiver is not able to process the data as fast as it arrives,
gradually the receive buffer will fill and the TCP window will be reduced in the acknowledgment
packets. This will alert the sender that it needs to reduce the amount of data sent or allow the receiver
time to clear the buffer.
In the above diagram, the client and server are advertising their window size values as they
communicate. Each TCP header will display the most recent window value, which can grow or shrink as
the connection progresses. In this example, the client has a TCP receive window of 65,535 bytes, and
the server has 5,840. For many applications, since clients tend to receive data rather than send it, clients
often have a larger allocated window size. After the handshake, the client sends an HTTP GET request
to the server, which is quickly processed. Two response packets from the server arrive at the client,
which sends an acknowledgment along with an updated window size. The client was able to process the
data packets out of the TCP buffer as fast as they came in, so the window size was not reduced. The
client still has a full window available for receiving data – 65,535 bytes.
In another example, a client is requesting data from a server and begins to receive the data. However, in
this case, the client is not able to quickly process the incoming data. The TCP buffer begins to fill, as
indicated by the reduced window value.
TCP Receive Window and TCP buffer
The acknowledgements from the client indicate that the window is shrinking. As long as the
window value does not fall to zero, this behavior will largely go unnoticed by the end user. Although the
number is slightly reduced, there is still plenty of room in the buffer for data transfer to continue. In
many cases, the client can catch up and will process the data out of the buffer, clearing the window out
and increasing the window value.
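The relationship between the buffer and the advertised window can be sketched with a toy model: the window is simply the free space left in the receive buffer. The `ReceiveBuffer` class is an illustration only and ignores many details of real TCP stacks.

```python
class ReceiveBuffer:
    """Toy model: advertised window = buffer capacity minus unread bytes."""

    def __init__(self, capacity: int):
        self.capacity = capacity
        self.unread = 0   # bytes received but not yet read by the application

    @property
    def advertised_window(self) -> int:
        return self.capacity - self.unread

    def receive(self, nbytes: int) -> int:
        """Store incoming bytes; anything beyond the window is not accepted."""
        accepted = min(nbytes, self.advertised_window)
        self.unread += accepted
        return accepted

    def application_read(self, nbytes: int) -> int:
        """The application drains the buffer, which reopens the window."""
        consumed = min(nbytes, self.unread)
        self.unread -= consumed
        return consumed

buf = ReceiveBuffer(capacity=65_535)
buf.receive(1460 * 10)            # ten full-size segments arrive
print(buf.advertised_window)      # 50935: the window has shrunk
buf.application_read(1460 * 10)   # the application catches up
print(buf.advertised_window)      # 65535: back to a full window
```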
The TCP header value allocated for the window size is two bytes long. This means that the highest
possible numeric value for a receive window is 65,535 bytes. In today’s networks, this window size is
not enough to provide optimal traffic flow, especially on long, fat networks (links that have high
bandwidth and high latency). In its native state, TCP cannot take advantage of these high-performance
links since it can only send a maximum of 65,535 bytes at a time.
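A quick calculation shows why this matters. If at most one window of data can be in flight per round trip, throughput is capped at window size divided by RTT, whatever the link speed. The helper name and the RTT values below are chosen for the example.

```python
def max_throughput_bps(window_bytes: int, rtt_seconds: float) -> float:
    """Upper bound in bits per second when one window is sent per round trip."""
    return window_bytes * 8 / rtt_seconds

for rtt_ms in (1, 20, 100):
    mbps = max_throughput_bps(65_535, rtt_ms / 1000) / 1e6
    print(f"RTT {rtt_ms:>3} ms -> at most {mbps:,.1f} Mbit/s")
```

With a 100 ms round trip, an unscaled window limits a single connection to roughly 5 Mbit/s, even on a 10Gbps link.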
For this reason, TCP Options were introduced in RFC 1323 that enable the TCP receive window to be
increased exponentially. The specific function is called TCP Window Scaling, which is advertised in
the handshake process. When advertising its window, a client or server will also advertise the scale
factor (multiplier) that will be used for the life of the connection.
TCP Window Size information seen in Wireshark
In the image above, the sender of this packet is advertising a TCP Window of 63,792 bytes and is using
a scaling factor of four. This means that the true window size is 63,792 x 4 (255,168 bytes). Using
scaling windows allows endpoints to advertise a window size of over 1GB. To use window scaling,
both sides of the connection must advertise this capability in the handshake process. If one side or the
other cannot support scaling, then neither will use this function. The scale factor, or multiplier, will only
be sent in the SYN packets during the handshake and will be used for the life of the connection. This is
one reason why it is so important to capture the handshake process when performing TCP analysis.
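The arithmetic can be checked with a few lines. On the wire, the Window Scale option carries a shift count rather than the multiplier itself, so a multiplier of four corresponds to a shift count of 2, and the largest shift count allowed is 14. The function name is an assumption for the example.

```python
def effective_window(window_field: int, shift_count: int) -> int:
    """Window in bytes: the 16-bit header value shifted left by the negotiated count."""
    if not 0 <= window_field <= 0xFFFF:
        raise ValueError("the window field is 16 bits wide")
    if not 0 <= shift_count <= 14:
        raise ValueError("the shift count is limited to 14")
    return window_field << shift_count

print(effective_window(63_792, 2))    # multiplier 2**2 = 4, prints 255168
print(effective_window(0xFFFF, 14))   # largest possible window, prints 1073725440
```

Without the shift count from the SYN packets, an analyzer can only show the raw 16-bit value, which is why a capture that misses the handshake can be misleading.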
When a client (or server – but it is usually the client) advertises a zero value for its window size, this
indicates that the TCP receive buffer is full and it cannot receive any more data. It may have a stuck
process or be busy with some other task, which can cause the TCP receive buffer to fill. Zero
Windows can also be caused by a problem within the application, where the application is not reading
data out of the TCP buffer.
A TCP Zero Window from a client will halt the data transmission from the server side, allowing time for
the problem station to clear its buffer. When the client begins to digest the data, it will let the server
know to resume the data flow by sending a TCP Window Update packet. This will advertise an
increased window size and the flow will resume.
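Given the window values advertised by one station over time, the stall periods can be extracted as sketched below. The input format, a time-ordered list of (timestamp, window) pairs, and the sample capture are hypothetical.

```python
def zero_window_episodes(samples):
    """Return (start, end) pairs: first zero window seen, then the next window update.

    `samples` is a time-ordered iterable of (timestamp_seconds, advertised_window).
    An episode still open at the end of the capture has end=None.
    """
    episodes, start = [], None
    for ts, window in samples:
        if window == 0 and start is None:
            start = ts                      # the receiver just closed its window
        elif window > 0 and start is not None:
            episodes.append((start, ts))    # window update: data flow can resume
            start = None
    if start is not None:
        episodes.append((start, None))
    return episodes

capture = [(0.00, 8192), (0.05, 2048), (0.10, 0), (0.60, 0), (0.90, 16384), (1.20, 0)]
print(zero_window_episodes(capture))
# prints [(0.1, 0.9), (1.2, None)]
```

The length of each episode is time during which the sender had to stop, so it adds directly to the transfer time seen by the user.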
Window problems are usually observed on applications that move a lot of data such as backups, file
transfers, and large downloads. If a performance problem is hampering data transfer, look for window
problems on the receiver.
SkyLIGHT PVX can monitor for Zero Window conditions and displays statistics about which
connections suffered them and when. If these problems are observed in SkyLIGHT PVX, focus on the
station that is advertising the Zero Window value. Remember that this indicates the TCP receive buffer
has been exhausted and data flow will stop until the buffer is cleared. These are usually caused by stuck
processes on the client, under-resourced PCs or an application that is not tuned to receive high rates of
data.
You can easily drill down to the clients involved in the phenomenon and confirm the impact on the data
transfers and End User Response Times:
You could also view the evolution through time to understand if it is a continuous or intermittent issue:
Zero Windows events trend through time
Why Should You Care About TCP Window Problems and TCP Windowing in General?
You should care about TCP window problems because they ultimately determine the speed of data
transfers and hence the experience of your users accessing the applications. In this video, you will
learn more about TCP windows in general, TCP Receive windows in particular and discover how they
can impact performance.
According to Wikipedia,