The goal of the transport layer is to provide an efficient, reliable, and cost-effective data transport service. The transport layer sends segments to the network layer (which turns them into packets, which in turn are transformed into frames by the data link layer).
The transport and network layers are very similar, but the transport layer runs only on the user's machine and is thus much easier to customize. It gives users full control over the connection.
The existence of the transport layer makes it possible for programmers to write programs that
work on a wide variety of networks. The transport service does this by providing primitives to
use. If the network layer were faultless, these primitives wouldn't be necessary. However, the real world works differently. For this reason, a distinction is often made between layers 1 through 4 and the layers above them.
The bottom four can be seen as the transport service providers and the upper layers can be
seen as the transport service user.
We use the term Transport Protocol Data Unit (TPDU) to describe messages sent from transport entity to transport entity.
Berkeley sockets
The primitives for Berkeley sockets extend the transport service primitives (these are used by TCP):
1. Socket: create a new communication endpoint
2. Bind: assign a local address to the socket
3. Listen: announce willingness to accept connections; give the queue size
4. Accept: passively accept an incoming connection request
5. Connect: actively attempt to establish a connection
6. Send: send data over the connection
7. Receive: receive data from the connection
8. Close: release the connection
The first four primitives here are executed by a server socket in that order:
• The SOCKET call returns a file descriptor to use in later system calls
• Using BIND, a socket file descriptor gets an address to listen on
• The LISTEN call allocates enough space to queue incoming requests
• ACCEPT waits for incoming connections; when one arrives, it creates a new socket file descriptor similar to the original one. The program can now fork off to handle this new connection and wait for the next one with another ACCEPT call
A connection is only released once both parties, server and client, have executed CLOSE.
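The server/client sequence above maps directly onto, for example, Python's socket module. A minimal sketch (a one-shot echo server on localhost; a real server would fork or thread per connection and loop on accept):

```python
import socket
import threading

def echo_server(ready):
    # SOCKET: create a communication endpoint
    srv = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
    # BIND: assign a local address (port 0 lets the OS pick a free port)
    srv.bind(("127.0.0.1", 0))
    # LISTEN: allocate a queue for incoming connection requests
    srv.listen(5)
    ready["port"] = srv.getsockname()[1]
    ready["event"].set()
    # ACCEPT: block until a request arrives; returns a NEW socket
    conn, _addr = srv.accept()
    data = conn.recv(1024)
    conn.sendall(data)          # echo the data back
    conn.close()
    srv.close()

ready = {"event": threading.Event()}
threading.Thread(target=echo_server, args=(ready,), daemon=True).start()
ready["event"].wait()

cli = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
cli.settimeout(5)
cli.connect(("127.0.0.1", ready["port"]))   # CONNECT
cli.sendall(b"hello")                        # SEND
reply = cli.recv(1024)                       # RECEIVE
cli.close()                                  # CLOSE
```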
Addressing
Packets don’t just need a destination computer, they also need a destination port that identifies which program the packet is intended for.
An IP address is an NSAP (Network Service Access Point), because it’s all the identification the network needs. In some networks, an NSAP may be shared between multiple computers.
A port is a TSAP (Transport Service Access Point); it runs on top of an NSAP and allows two TSAP interfaces (a local and a remote one) to communicate.
It is not very efficient to have all servers listening on their ports all day long, so the initial connection protocol is used instead. A proxy, the process server, receives the requests and spawns a new server if needed. But this is only applicable when servers can be created on demand (i.e. there needs to be a known mapping between a port and an application).
Connection establishment
Problem: What if a packet for a connection establishment times out, is resent, and the original then still arrives, so it gets duplicated?
Solution: Guarantee a maximum lifetime for packets (about 120 s on the Internet) using a hop counter, and give each packet a unique identifier that may not be repeated until that lifetime expires.
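One common way to get such identifiers is to derive the initial sequence number from a coarse clock. A minimal sketch (the 250 Hz clock rate and 32-bit width are illustrative assumptions, not fixed by the text):

```python
def initial_seq(now, clock_hz=250, bits=32):
    # assumption: the identifier is the low-order `bits` bits of a
    # coarse clock, so a value cannot recur until old packets
    # (maximum lifetime T) have died out
    return int(now * clock_hz) % (2 ** bits)
```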
Forbidden region
The forbidden region is the set of sequence numbers that could currently be chosen as an initial sequence number by the clock, and is thus forbidden for use as ordinary data sequence numbers. There are two ways we could enter this region:
• We send too much data too fast, in which case we’ll hit the forbidden region from the right. To prevent this, send at most 1 segment per clock tick (though this limit is very unlikely to be reached anyway)
• We send too little data, in which case the forbidden region catches up with us from the left. This limits how long a connection can last (max 4 hours).
Say data remains in the network at most T seconds. Then we should not send a segment with a sequence number that could be an initial sequence number within the next T seconds, otherwise we might get duplicates.
When sequence numbers get close to the forbidden region, we can choose a new sequence
number.
If the initial request is repeated (for example, if the ACK got lost), host 2 just repeats its ACK with the same sequence number, and host 1 sends a reject. So this handshake can handle duplicates; the first connection will still work!
Connection release
There are 2 styles of releasing connections:
• Asymmetric (one party just stops the connection, informing the other party but not waiting for an ACK; this may result in data loss)
• Symmetric (both parties must agree before ending the connection)
The two-army problem comes into play for symmetric release, as we are never sure whether the other party got the last message. This problem is unsolvable, so we add an extra rule: if the other party hasn’t communicated for a while after a disconnection request, the connection is released. The initiating party retries n times and then releases the connection.
Error control & flow control
As a final check, we want to make sure that the packet got through the entire network correctly. Packets may get corrupted by broken routers, and lower-layer error control does not catch this; sequence numbers don't tell you this either. For error control, we run a known method (CRC, Hamming codes, etc.).
If the receiver doesn't have the capacity to receive all the packets sent (but the network has), the receiver's transport layer can tell the sender to slow down. This is flow control. At the transport layer, we have a higher delay than at the data link layer. We still use sliding windows, but since the windows may be huge (because of the large bandwidth-delay product), we want to use dynamic buffers, shared by multiple connections.
Multiplexing
Multiple processes sharing a single network connection (e.g. the same IP address) is called multiplexing.
A single process using multiple network connections (to increase bandwidth or reliability) is
called inverse multiplexing.
Crash recovery
The hard problem is recovering from host crashes. We don’t know whether we already passed the data on to the next layer. If we first send an ACK and then pass on the data, we might have sent an ACK without passing on the data. The other order has the same problem. This problem is not solvable at a given layer k without using layer k + 1. The transport layer can only propagate the problem up to the application layer.
Congestion control
The network layer already does a lot of congestion control, but at some point the best way to
fix congestion issues is to just send fewer packets into the network. This is the responsibility of
the transport layer. Congestion control is used when the network can't handle the number of packets sent over it (whereas with flow control the receiver couldn't handle the load).
It could be that the total bandwidth is B = 2 with N = 3 machines, which means each machine gets 2/3 of the bandwidth. But this will not work if 2 of the machines are wired to 1 router: then you exceed the bandwidth of that one router!
Even if you were trying to divide bandwidth fairly, it is hard to determine how many machines there are and how much bandwidth each machine should get. Besides, the path of a machine's packets depends on the destination, so the amount of network congestion also depends on that. So we have a few unknowns:
• Available bandwidth
• Network topology
• Other clients
Max-min fairness
Max-min fairness is a frequently used technique. It maximizes the minimum bandwidth, then uses excess bandwidth where possible. An allocation is max-min fair if we can’t increase any flow without decreasing the flow of a link with a smaller flow. A disadvantage of this is that we might not use the total bandwidth.
The algorithm must also converge quickly, as bandwidth will change rapidly.
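The allocation itself can be computed by "water filling": grow all flows equally until some link saturates, freeze the flows crossing it, and repeat. A minimal sketch (the link/flow representation is an assumption for illustration):

```python
def max_min_fair(capacities, flows):
    # capacities: dict mapping link name -> capacity
    # flows: list of sets; flows[i] is the set of links flow i traverses
    rate = [0.0] * len(flows)
    frozen = [False] * len(flows)
    cap = dict(capacities)
    while not all(frozen):
        # how many still-growing flows cross each link
        active = {l: sum(1 for i in range(len(flows))
                         if not frozen[i] and l in flows[i])
                  for l in cap}
        # largest equal increment before some link saturates
        inc = min(cap[l] / active[l] for l in cap if active[l] > 0)
        for i in range(len(flows)):
            if not frozen[i]:
                rate[i] += inc
        for l in cap:
            cap[l] -= inc * active[l]
        # flows crossing a saturated link can grow no further
        for i in range(len(flows)):
            if not frozen[i] and any(cap[l] < 1e-9 for l in flows[i]):
                frozen[i] = True
    return rate
```

For two links of capacity 1 and 2, with one flow on each link and one flow crossing both, the shared flow is limited to 0.5 and the flow on the big link picks up the slack.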
Dynamic bandwidth
We can just use trial and error to adjust the bandwidth dynamically. We keep trying to increase the bandwidth usage and slow down when we receive a congestion signal.
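This trial-and-error rule is AIMD in its simplest form. A minimal sketch (the increment and decrease factor are assumptions; real protocols tune these):

```python
def adjust_rate(rate, congested, add=1.0, mult=0.5):
    # additive increase while things go well,
    # multiplicative decrease on a congestion signal
    return rate * mult if congested else rate + add
```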
Wireless issues
Wireless networks are unreliable and lose packets all the time. To keep the transport layer from becoming useless, the lower layers hide most packet losses from it, and only report a loss to the transport layer if the packet is still lost after multiple retries.
UDP
Header
UDP has an 8-byte header:
Source port: The port where the source application sent the data from
Destination port: The port where the target application is located at
UDP Length: The amount of bytes of the UDP header and body together, the minimum is 8
(because of the header size) and the maximum is the maximum size of an IP packet.
Checksum: the one's complement of the one's complement sum of all 16-bit words; when the receiver adds up everything including the checksum, the result should be all 1 bits (this field is optional)
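The one's complement sum can be sketched in a few lines; any carry out of the top 16 bits is folded back into the low bits, and the final sum is complemented:

```python
def internet_checksum(data: bytes) -> int:
    # one's complement sum of 16-bit words, then complemented
    if len(data) % 2:
        data += b"\x00"           # pad odd-length data with a zero byte
    total = 0
    for i in range(0, len(data), 2):
        total += (data[i] << 8) | data[i + 1]
        total = (total & 0xFFFF) + (total >> 16)  # fold carry back in
    return ~total & 0xFFFF
```

Appending the computed checksum to the data and summing again yields all 1s, i.e. a recomputed checksum of 0.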
Application
UDP is used in applications where losing a few packets isn't too bad and speed is more important. This could be video streaming, realtime games, etc. It does mean that the application layer needs to correct for losses itself.
TCP
Some properties:
• In TCP, both the sender and the receiver create endpoints called sockets. Each socket is
identified by an IP address and a port. All TCP connections are full duplex and point-to-
point
• TCP is a byte stream, not a message stream. So if you send 4 × 512 bytes, this may be
received as 1 × 2048 bytes by the receiver. TCP may also buffer data until it received
enough to send in a single packet
• TCP uses path MTU discovery to find the maximum transfer unit (MTU) it may send. In practice, the MTU is often 1500 bytes (the maximum Ethernet payload size)
• TCP is much, much more complex than UDP
Header
CWR and ECE: Signal congestion; ECE (ECN-Echo) tells the sender to slow down, CWR (Congestion Window Reduced) says the sender has slowed down
URG: Urgent packet
ACK: Set to 1 if acknowledgement number is valid, set to 0 if acknowledgement number should
be ignored because it’s not used
PSH: This data must be sent to the application layer ASAP, and must not be buffered
RST: Reset this connection, because of a problem (like a host crash)
SYN: Used to establish connections; the connection request has SYN = 1, ACK = 0, and the reply has SYN = 1, ACK = 1
FIN: Used to release a connection
Window size: How many more bytes may be sent, starting at the last byte that was
acknowledged (used for flow control)
Checksum: a 16-bit one's complement checksum over the header, the data, and a pseudo-header
Options: May be used to provide extra facilities that are not provided by the regular header
TCP conversations
Connection establishment
Uses the 3-way handshake described earlier. One side waits for a connection, the other initiates
it. It also works perfectly if both parties initiate a connection at the same time, since both will
acknowledge with the correct sequence number, thus making only one connection.
An interesting attack is SYN flooding, where we send a large amount of connection requests to
a host, and the host has to remember all connections. This can be solved by not remembering
the initial connections, but rather generating the sequence number from the IP, port, and a local
secret.
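This stateless defence is the idea behind SYN cookies. A minimal sketch (the hashing scheme, the coarse timestamp, and the secret are illustrative assumptions, not the exact production algorithm):

```python
import hashlib

SECRET = b"per-host-secret"  # assumption: a random secret chosen at boot

def syn_cookie_isn(src_ip, src_port, dst_ip, dst_port, coarse_time):
    # derive the initial sequence number from the connection 4-tuple,
    # a coarse timestamp, and the secret; nothing needs to be stored,
    # so half-open connections cost the server no memory
    msg = f"{src_ip}:{src_port}>{dst_ip}:{dst_port}@{coarse_time}".encode()
    digest = hashlib.sha256(msg + SECRET).digest()
    return int.from_bytes(digest[:4], "big")
```

When the final ACK of the handshake arrives, the server recomputes the cookie from the packet's addresses and checks it against the acknowledged sequence number.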
Connection release
We can stop a connection once both sides have sent a packet with the FIN bit set. When that bit is set, the connection is stopped in one direction for new data. If there is no acknowledgement of a FIN within 2 packet lifetimes, the connection is dropped.
Buffer management
TCP separates the issues of correctly receiving segments (using the ACK field) and buffer management (using the WIN field). This is a valid TCP conversation:
In TCP, received packets are buffered by the receiver. The amount of buffer space available is sent back to the sender (piggybacked on other messages). This way, the sender knows how much it may send, so the receiver's buffer never overflows; if it did overflow, data could be lost.
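The sender-side bookkeeping is simple: it may never have more unacknowledged bytes outstanding than the receiver last advertised. A minimal sketch (names are illustrative):

```python
def usable_window(last_byte_sent, last_byte_acked, advertised_win):
    # bytes the sender may still transmit without overrunning
    # the receiver's advertised buffer space
    outstanding = last_byte_sent - last_byte_acked
    return max(advertised_win - outstanding, 0)
```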
Sequence numbers
• In TCP, every data byte has its own sequence number
• SYN and FIN also increase sequence/ack numbers
• In a TCP segment sent from S to R:
• Sequence number: the number of bytes R should have received before this segment + the initial sequence number of S
• Acknowledgement number: the number of bytes received from R + the initial sequence number of R
• The initial sequence number is clock based but with additional randomness (for security
purposes, to avoid an impersonation attack)
• The sequence number always denotes the start of the data, not the end.
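The rule that SYN and FIN each consume one sequence number, just like a data byte, can be sketched as:

```python
def next_expected(seq, payload_len, syn=False, fin=False):
    # the acknowledgement number a receiver sends back: the payload
    # bytes, plus one sequence number each for the SYN and FIN flags
    return seq + payload_len + int(syn) + int(fin)
```

For example, a SYN with initial sequence number 1000 is acknowledged with 1001, a following 500-byte segment with 1501, and a FIN after that with 1502.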
Flow control
When the window is 0, no packets may be sent by the sender, except:
• Urgent data, for example ending the connection
• A 1-byte packet forcing the receiver to re-announce the next byte expected and the window size, called a window probe. This is useful in case a window update ever gets lost.
Another problem is the silly window problem, where the sender sends data in large blocks, but the receiver reads 1 byte at a time, causing a flood of tiny window updates. The solution is to have the receiver only send window updates once a decent amount of new space is available. Specifically, the receiver should only advertise new space once half its buffer is empty, or it has freed another full maximum segment size.
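This receiver-side rule (Clark's fix) can be sketched as follows; the buffer size and MSS values in the usage below are illustrative:

```python
def advertised_window(free_space, buffer_size, mss):
    # advertise space only once at least half the buffer, or one
    # full maximum segment size, is free; otherwise advertise zero
    # so the sender holds off instead of sending tiny segments
    if free_space >= min(buffer_size // 2, mss):
        return free_space
    return 0
```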
Retransmissions
We retransmit segments after a timer expires. We also retransmit a segment if we received an
acknowledgement of the next segment (called fast retransmission). This has the risk that a
segment might be sent twice. It may also use the time between ACKs, this is called an ACK
clock
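Both triggers can be combined in one check. A minimal sketch (the duplicate-ACK threshold of 3 matches common practice; the rest of the names are illustrative):

```python
def should_retransmit(sent_at, now, rto, dup_acks, dup_threshold=3):
    # retransmit on timeout, or early (fast retransmission) after
    # several duplicate ACKs for the same byte
    return (now - sent_at) >= rto or dup_acks >= dup_threshold
```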
ACKs
Senders are not required to send data as soon as it comes in, and receivers are not required to acknowledge instantly. We can use this to optimize performance. In TCP, the receiver may wait up to 500 ms before sending an acknowledgement, in the hope of getting a free ride with some outgoing data (a piggybacked ACK: the ACK field is set to 1 and the acknowledgement number is updated on a data segment).
If there is nothing to send back within that delay, a bare ACK (only a TCP header, with the ACK field set to 1) is sent.
Examples
A few examples of conversations:
Congestion control
TCP maintains a congestion window, whose size is the amount of bytes that the sender may
have on the network at any time. The window size is adjusted according to the AIMD rule
(additive increase, multiplicative decrease). Thus there are two different windows in TCP: you never send more than the minimum of the two.
Slow start
AIMD alone would take a long time to reach a high speed on fast networks. Instead, we use slow start, which grows the window exponentially at first. To keep this algorithm from going out of control, we cap it at a threshold, which initially is set to infinity. Slow start was first implemented in TCP Tahoe.
Slow start:
• Start by sending out at most 4 segments
• For each segment that is acknowledged, send out another 2 segments
• When the threshold is hit, switch to additive increase
• When a packet is lost, set the threshold to half of the congestion window, and the entire
process is restarted.
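The window evolution above can be sketched per event, with the window measured in segments (mss = 1); this is a simplified Tahoe-style model, not a full TCP implementation:

```python
def on_ack(cwnd, ssthresh, mss=1):
    # below the threshold: slow start, +1 MSS per ACK
    # (which doubles the window each round trip)
    if cwnd < ssthresh:
        return cwnd + mss
    # above it: additive increase, roughly +1 MSS per round trip
    return cwnd + max(mss * mss // cwnd, 1)

def on_loss(cwnd, ssthresh, mss=1):
    # halve the threshold and restart slow start from one segment
    return mss, max(cwnd // 2, 2 * mss)
```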
To improve the algorithm, we can use fast recovery: instead of dropping the window all the way back to one segment, we drop to the threshold directly. TCP Tahoe with fast recovery is known as TCP Reno.
Newer TCP versions all use different variations of the same idea of decreasing congestion
window and threshold.