
Transport Layer (chapter 6)

The goal of the transport layer is to provide efficient, reliable, and cost-effective data transport. The
transport layer hands segments to the network layer (which turns them into packets, which in turn
are transformed into frames by the data link layer).

The transport and network layers are very similar, but the transport layer runs only on the end
hosts and is thus much easier to customize. It gives the users full control over the
connection.

The transport service


The transport service has a lot in common with the network layer's services. Both have
three phases: establishment, data transfer, and release. Where the network service may be
unreliable, the transport layer aims to offer reliable (100% in the connection-oriented case)
connections on top of it.

The existence of the transport layer makes it possible for programmers to write programs that
work on a wide variety of networks. The transport service does this by providing primitives to
use. If the network layer were faultless, these primitives wouldn't be necessary. However, the real
world works differently. For this reason a distinction is often made between layers 1
through 4 and the layers above them.

The bottom four can be seen as the transport service providers and the upper layers can be
seen as the transport service user.

Transport service primitives


Some example primitives for a simple transport service:
1. Listen: wait for another process to contact us (blocks until some process tries to
connect)
2. Connect: actively try to connect to a process that is listening
3. Send: send data over the established connection
4. Receive: receive data over the established connection (blocks until a data packet arrives)
5. Disconnect: release the connection

We use the term Transport Protocol Data Unit (TPDU) to describe messages sent from
transport entity to transport entity.

Berkeley sockets
The primitives for Berkeley sockets extend the transport service primitives (these are the ones
used by TCP):
1. Socket: create a new communication endpoint
2. Bind: assign a local address to the socket
3. Listen
4. Accept: passively accept an incoming connection request
5. Connect
6. Send
7. Receive
8. Close

The first four primitives here are executed by a server socket in that order:
• The SOCKET call returns a file descriptor to use in later system calls
• Using BIND, a socket file descriptor is given an address to listen on
• The LISTEN call allocates enough space to queue incoming requests
• ACCEPT waits for incoming connections, and when one arrives it creates a new socket
file descriptor similar to the original one. The program can then fork off to handle this
new connection and go back to waiting with another ACCEPT call

The client side does something similar in a slightly different order:


• First a SOCKET call is made to create one
• Then CONNECT is used, which blocks the caller until a connection is established with a
server (timeouts can be used here)
• Once a connection is established, both sides can use SEND and RECEIVE

A connection is only released once both parties, server and client, have executed CLOSE.
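To make this concrete, here is a minimal sketch of the server and client sides using Python's socket module, which exposes exactly these primitives. The address 127.0.0.1:9000 and the buffer sizes are arbitrary example values:

```python
import socket

# Server side: SOCKET, BIND, LISTEN, ACCEPT, then RECEIVE/SEND, CLOSE.
def server():
    s = socket.socket(socket.AF_INET, socket.SOCK_STREAM)  # SOCKET
    s.bind(("127.0.0.1", 9000))                            # BIND an address to listen on
    s.listen(5)                                            # LISTEN, queue up to 5 requests
    conn, addr = s.accept()                                # ACCEPT blocks; returns a new socket
    data = conn.recv(1024)                                 # RECEIVE blocks until data arrives
    conn.send(data)                                        # SEND the data back (echo)
    conn.close()                                           # CLOSE this connection
    s.close()

# Client side: SOCKET, CONNECT, then SEND/RECEIVE, CLOSE.
def client():
    s = socket.socket(socket.AF_INET, socket.SOCK_STREAM)  # SOCKET
    s.connect(("127.0.0.1", 9000))                         # CONNECT blocks until established
    s.send(b"hello")                                       # SEND over the connection
    reply = s.recv(1024)                                   # RECEIVE the echo
    s.close()                                              # CLOSE; fully released once both sides close
    return reply
```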

Elements of transport protocols


Similarities and differences with the data link layer
• Similar: both have to deal with error control, sequencing, and flow control.
• Different:
• The environment is different: we need addressing, and we need to set up
connections explicitly
• There is storage capacity in the network, so packets may arrive out of order, or
packets may be duplicated
• A final problem for the transport layer is that the amount of buffer space and the
available bandwidth may fluctuate wildly

Addressing
Packets don’t just need a destination computer, they also need a destination port that says
which program the packet is intended for.

An IP address is an NSAP (Network Service Access Point), because it’s all the identification the
network needs. In some networks, an NSAP may be shared between multiple computers.
A port is a TSAP (Transport Service Access Point); it runs on top of an NSAP and allows two TSAP
interfaces (a local and a remote one) to communicate.
It is not very efficient to have every server listening on its port all day long, so the initial
connection protocol is used: a process server acts as a proxy, receives the requests, and spawns
the requested server when needed. But this is only applicable when servers can be created on
demand (i.e. there needs to be a known mapping between ports and applications).

Connection establishment
Problem: What if a packet for a connection establishment times out, is resent, and the original
still arrives later, so the request gets duplicated?
Solution: Guarantee a maximum lifetime for packets (120 s on the Internet) using a hop counter,
and give each packet a unique identifier which may not be reused until that lifetime expires.

Why use sequence numbers?


• Detecting data loss: missing sequence number → lost data
• Re-ordering data: sequence numbers not in ascending order → change order
• Detecting duplicates: two segments with the same sequence number → discard one of
them

Forbidden region
The forbidden region is the set of sequence numbers that could currently be chosen as an initial
sequence number, and that is therefore forbidden for use on a connection. There are two ways
we could enter this region:
• We send too much data, in which case we’ll hit the forbidden region from the right. To fix
this, send at most 1 segment per clock tick (though this case is very unlikely anyway)
• We send too little data, in which case we’ll hit the forbidden region from the left. This
limits how long a connection can last (max 4 hours).

Say data remains in the network at most T seconds. Then we should not send a segment with
a sequence number that could be chosen as an initial sequence number within the next T seconds,
otherwise we might get duplicates.
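As a sketch of this rule, the check below tests whether a sequence number falls in the forbidden region, assuming a clock-driven initial sequence number. The sequence-space size, clock rate, and value of T are illustrative, not prescribed by any particular protocol:

```python
SEQ_BITS = 32
SEQ_SPACE = 2 ** SEQ_BITS
CLOCK_RATE = 1      # initial-sequence-number increments per second (illustrative)
T = 120             # assumed maximum packet lifetime in seconds (illustrative)

def in_forbidden_region(seq: int, now: float) -> bool:
    """True if `seq` could be chosen as an initial sequence number
    within the next T seconds, i.e. lies in the forbidden region."""
    isn_now = int(now * CLOCK_RATE) % SEQ_SPACE
    isn_later = int((now + T) * CLOCK_RATE) % SEQ_SPACE
    if isn_now <= isn_later:
        return isn_now <= seq <= isn_later
    return seq >= isn_now or seq <= isn_later  # the region wraps around
```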

Increasing sequence numbers


We increase the sequence numbers based on the segment size. The increase should be related
to the size because:
• a part of a segment might be dropped
• segments may be fragmented

When sequence numbers get close to the forbidden region, we can choose a new sequence
number.

Choosing a sequence number


We choose this sequence number based on the time of day, so the scheme keeps working even
if the host crashes! After a crash we choose a new sequence number based on the time of day
and our predefined rules.
The 3-way handshake
• First, host 1 sends a connection request to host 2 with host 1’s sequence number
(SYNchronize)
• Then host 2 sends an ACK repeating this sequence number, and including its own
sequence number (SYNchronize-ACKnowledgement). We have 2 sequence numbers on
a connection, one per endpoint. Here we agree on the initial sequence numbers.
• Finally, host 1 can now send data using this new connection (ACKnowledge). Usually we
only increase the sequence number after this first data.

If the initial request is now repeated (for example, because the ACK got lost), host 2 just repeats
its ACK with the same number, but host 1 sends a reject. So this handshake can handle duplicates;
the first connection will still work!
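A toy model of the exchange (the dictionaries stand in for real segments, and the field names are simplified):

```python
import random

def three_way_handshake():
    x = random.randrange(2 ** 32)        # host 1 picks its initial sequence number
    syn = {"SYN": 1, "seq": x}           # 1) host 1 -> host 2: SYNchronize

    y = random.randrange(2 ** 32)        # host 2 picks its own initial sequence number
    syn_ack = {"SYN": 1, "ACK": 1,       # 2) host 2 -> host 1: SYN-ACK,
               "seq": y, "ack": x + 1}   #    repeating host 1's number

    ack = {"ACK": 1,                     # 3) host 1 -> host 2: ACK, may carry data
           "seq": x + 1, "ack": y + 1}
    return syn, syn_ack, ack             # both sides now agree on x and y
```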

Connection release
There are 2 styles of releasing connections
• Asymmetric (one party just stops the connection, informing the other party but not
waiting for an ACK; this may result in data loss)
• Symmetric (both parties must agree before ending the connection)

The two-army problem comes into play for symmetric release, as we are never sure whether the
other party got the last message. This problem is unsolvable in theory, so in practice we add an
extra condition: if the other party hasn’t communicated for a while after a disconnect request,
release the connection anyway. The initiating party will retry n times and then release the
connection.
Error control & flow control
As a final check, we want to make sure that the packet got through the entire network correctly.
Packets may get corrupted inside broken routers, where no link-layer error control catches it, and
sequence numbers don't tell you this either. For error control, we run a known method (CRC,
Hamming codes, etc.).

If the receiver doesn't have the capacity to receive all the packets being sent (while the
network does), the receiver's transport layer can tell the sender to slow down. This is flow
control. On the transport layer, we have a higher delay than in the data link layer. We still use
sliding windows, but since the windows may be huge (because we have a large bandwidth-
delay product), we want to use dynamic buffers, shared by multiple connections.
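A minimal sketch of that window accounting, where the window advertised to the sender simply tracks free buffer space (the class and method names are hypothetical):

```python
class ReceiveBuffer:
    """Toy receive buffer whose free space becomes the advertised window."""
    def __init__(self, size: int):
        self.size = size
        self.used = 0

    def window(self) -> int:
        return self.size - self.used   # credit the sender may still use

    def on_segment(self, nbytes: int):
        self.used += nbytes            # arriving data occupies buffer space

    def on_application_read(self, nbytes: int):
        self.used -= nbytes            # reading frees space, enlarging the window
```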

Multiplexing
Multiple processes sharing a single network connection (e.g. the same IP address) is
called multiplexing.
A single process using multiple network connections (to increase bandwidth or reliability) is
called inverse multiplexing.
Crash recovery
The hard problem is recovering from host crashes: after a crash we don’t know whether we had
already passed the data on to the next layer. If we first send an ACK and then pass on the data,
we might have sent the ACK without delivering the data; the other order has the mirror-image
problem. This problem is not solvable within a layer k itself; it can only be handled with help
from layer k + 1. The transport layer can therefore propagate the problem to the application layer.

Congestion control
The network layer already does a lot of congestion control, but at some point the best way to
fix congestion is simply to send fewer packets into the network. This is the responsibility of
the transport layer. Congestion control is used when the network can't handle the number of
packets sent over it (whereas with flow control the receiver couldn't handle the load).

Power = load (used bandwidth) / delay

Fair bandwidth allocation


If we have a total bandwidth of B with N machines, each machine should get B/N bandwidth.
Unfortunately this does not always work.

It could be that the total bandwidth is B = 2 with N = 3, which means each machine gets 2/3
bandwidth. But this will not work if 2 of the machines are wired to 1 router: then you exceed the
bandwidth of that one router!

Even if you were trying to divide bandwidth fairly, it is hard to determine how many machines
there are and how much bandwidth each machine should get. Besides, the path a machine's
packets take depends on the destination, so the amount of network congestion also depends on
that. So we have a few unknowns:
• Available bandwidth
• Network topology
• Other clients

Max-min fairness
Max-min fairness is a frequently used criterion. It maximizes the minimum bandwidth first,
then uses excess bandwidth where possible. An allocation is max-min fair if we can’t increase
the bandwidth of one flow without decreasing the bandwidth of a flow with an equal or smaller
allocation. A disadvantage of this is that we might not use the total bandwidth.
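A sketch of computing a max-min fair allocation on one shared link by progressive filling (the function name and interface are illustrative):

```python
def max_min_fair(demands: list[float], capacity: float) -> list[float]:
    """Give every flow an equal share; flows that want less than their
    share are satisfied, and their leftovers are redistributed."""
    alloc = [0.0] * len(demands)
    active = set(range(len(demands)))
    remaining = capacity
    while active:
        share = remaining / len(active)
        satisfied = {i for i in active if demands[i] - alloc[i] <= share}
        if not satisfied:              # nobody is satisfied by an equal share:
            for i in active:
                alloc[i] += share      # split the rest equally and stop
            break
        for i in satisfied:            # small flows get all they asked for...
            remaining -= demands[i] - alloc[i]
            alloc[i] = demands[i]
        active -= satisfied            # ...the excess goes to the other flows
    return alloc

# Example: capacity 2 shared by three flows that each want 2 -> 2/3 each.
print(max_min_fair([2, 2, 2], 2.0))
```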

How do we choose what to base the fairness on?


• Per computer: busy servers get the same bandwidth as a mobile phone
• Per connection: encourages opening more connections

The algorithm must also converge quickly, as bandwidth will change rapidly.

Dynamic bandwidth
We can use trial and error to adjust the bandwidth dynamically: we keep trying to increase the
bandwidth usage and slow down when we receive a congestion signal.

We can detect this congestion by:


1. Explicit feedback from routers (not very common; routers are kept simple)
2. Loss: if we lose packets, we decrease bandwidth
3. Latency: if the delays between ACKs are bigger than the delays between sending segments

Regulating the sending rate


We must find some way for a host to know how much it may send, i.e. to find the optimal
operating point. The options are:
• Additive increase/decrease: add a constant amount to the speed each round. If fine, do it
again, otherwise subtract
• Multiplicative increase/decrease: multiply the speed by a constant instead. If fine, do it
again, otherwise divide
• The combination of additive increase and multiplicative decrease (AIMD) is the one that
makes competing senders converge to the fairness line (see the sketch below).
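A minimal AIMD sketch, with illustrative constants (add 1 unit per round, halve on congestion):

```python
def aimd_step(rate: float, congested: bool,
              increase: float = 1.0, decrease: float = 0.5) -> float:
    """One AIMD adjustment of the sending rate."""
    if congested:
        return max(1.0, rate * decrease)  # multiplicative decrease
    return rate + increase                # additive increase
```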

Wireless issues
Wireless networks are unreliable and lose packets all the time. To keep the transport layer from
becoming useless (it would interpret every loss as congestion), the lower layers hide most packet
losses from it, and only report a loss if the packet is still lost after multiple retries.

UDP (user datagram protocol)


Connectionless protocol, doing almost nothing except sending packets to applications.

UDP does do:


• End to end error detection (Optional)
• Demultiplexing

UDP does not do:


• Flow control
• Congestion control
• Retransmissions

Header
It has an 8-byte header:

Source port: The port where the source application sent the data from
Destination port: The port where the target application is located at
UDP length: The number of bytes of the UDP header and body together; the minimum is 8
(because of the header size) and the maximum is the maximum size of an IP packet.
Checksum: The one's complement sum of all 16-bit words; verified over a correct segment
(checksum field included), the result should be 0 (optional). A sketch follows below.
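A sketch of that checksum, assuming the standard Internet checksum algorithm (one's complement arithmetic on 16-bit words):

```python
def internet_checksum(data: bytes) -> int:
    """One's complement of the one's complement sum of 16-bit words.
    Run over a segment with a correct checksum field, this returns 0."""
    if len(data) % 2:
        data += b"\x00"                           # pad to whole 16-bit words
    total = 0
    for i in range(0, len(data), 2):
        total += (data[i] << 8) | data[i + 1]     # add the next word
        total = (total & 0xFFFF) + (total >> 16)  # fold the carry back in
    return ~total & 0xFFFF
```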
Application
UDP is used in applications where losing a few packets isn't too bad and speed is more
important. This could be video streaming, realtime games, etc. It does mean that the
application layer needs to correct for losses itself.

TCP (transmission control protocol)


TCP is a very reliable but slow protocol. It provides a reliable end-to-end byte stream over an
unreliable network.

Some properties:
• In TCP, both the sender and the receiver create endpoints called sockets. Each socket is
identified by an IP address and a port. All TCP connections are full duplex and point-to-
point
• TCP is a byte stream, not a message stream. So if you send 4 × 512 bytes, this may be
received as 1 × 2048 bytes by the receiver. TCP may also buffer data until it has
enough to send in a single packet
• The TCP protocol uses path MTU discovery to find the maximum transmission unit (MTU)
it may send. In practice, the MTU is often 1500 bytes (the maximum Ethernet payload size)
• TCP is much, much more complex than UDP

Header

Source & Destination port: Identify local endpoints of the connection.


Sequence number: Specifies current segment byte id (specified more later)
Acknowledgement number: Specifies the next byte id it expects
Header length: Amount of 32-bit words in the header

CWR and ECE: Signal congestion. ECE tells the sender that congestion was experienced and it
must slow down; CWR is the sender's confirmation that it has slowed down (Congestion
Window Reduced)
URG: Urgent packet
ACK: Set to 1 if the acknowledgement number is valid, set to 0 if the acknowledgement number
should be ignored because it’s not used
PSH: This data must be delivered to the application layer ASAP, and must not be buffered
RST: Reset this connection, because of a problem (like a host crash)
SYN: Used to establish connections; the connection request has SYN = 1, ACK = 0, and the
reply has SYN = 1, ACK = 1
FIN: Used to release a connection

Window size: How many more bytes may be sent, starting at the last byte that was
acknowledged (used for flow control)
Checksum: The same Internet checksum as in UDP (a one's complement sum), computed over
the header, the data, and a pseudo-header
Options: May be used to provide extra facilities that are not provided by the regular header
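As an illustration, here is a sketch that unpacks the 20-byte fixed part of the TCP header from raw bytes with Python's struct module (options are ignored and only the flag bits are decoded):

```python
import struct

def parse_tcp_header(segment: bytes) -> dict:
    (src_port, dst_port, seq, ack,
     offset_reserved, flags, window,
     checksum, urgent) = struct.unpack("!HHIIBBHHH", segment[:20])
    return {
        "src_port": src_port, "dst_port": dst_port,
        "seq": seq, "ack": ack,
        "header_len": (offset_reserved >> 4) * 4,  # upper 4 bits: 32-bit words
        "FIN": bool(flags & 0x01), "SYN": bool(flags & 0x02),
        "RST": bool(flags & 0x04), "PSH": bool(flags & 0x08),
        "ACK": bool(flags & 0x10), "URG": bool(flags & 0x20),
        "window": window, "checksum": checksum, "urgent": urgent,
    }
```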

TCP conversations
Connection establishment
Uses the 3-way handshake described earlier. One side waits for a connection, the other initiates
it. It also works if both parties initiate a connection at the same time: since both will
acknowledge with the correct sequence number, only one connection is established.

An interesting attack is SYN flooding, where an attacker sends a large number of connection
requests that the host would have to remember. This can be countered by not storing state for
the initial requests, but instead deriving the sequence number from the IP, port, and a local
secret (SYN cookies).

Connection release
A connection can be stopped once both sides have sent a packet with the FIN bit set. When that
bit is set, the connection is closed for new data in that direction. If there is no acknowledgement
to a FIN within 2 packet lifetimes, the connection is dropped anyway.

Buffer management
TCP separates the issue of correctly receiving segments (the ACK field) from buffer
management (the WIN field). Received segments are buffered by the receiver, and the amount
of buffer space available is sent back to the sender (piggybacked on other messages). This way,
the sender knows when to send how much, so that the receiver's buffer never overflows; if it
did overflow, data could be lost.

Sequence numbers
• In TCP, every data byte has its own sequence number
• SYN and FIN also increase sequence/ack numbers
• In a TCP segment sent from S to R:
• Sequence number: the number of bytes R should have received before this
segment, plus the initial sequence number of S
• Acknowledgement number: the number of bytes received from R, plus the initial
sequence number of R
• The initial sequence number is clock based but with additional randomness (for security
purposes, to avoid an impersonation attack)
• The sequence number always denotes the start of the data, not the end (see the worked
example below).
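A small worked example of these rules, with arbitrary initial sequence numbers:

```python
isn_s, isn_r = 1000, 5000   # arbitrary initial sequence numbers of S and R

# The SYN consumes one sequence number, so S's first data byte is 1001.
seq_first_data = isn_s + 1

# S sends 100 data bytes; R acknowledges by naming the NEXT byte it expects.
ack_from_r = seq_first_data + 100   # 1101
```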

Flow control
When the window is 0, no packets may be sent by the sender, except:
• Urgent data, for example ending the connection
• A 1-byte packet forcing the receiver to re-announce the next byte expected and the
window size, called a window probe. This is useful in case a window update ever gets lost.

Another problem is the silly window syndrome, where the sender sends data in large blocks, but
the receiving application reads 1 byte at a time, causing a stream of tiny window updates. The
solution is to have the receiver only send window updates once a decent amount of new space is
available. Specifically, the receiver should only advertise new space once half its buffer is empty,
or once it has freed another maximum segment size.
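A sketch of that receiver-side rule (Clark's solution), with an illustrative interface:

```python
def advertised_window(free_space: int, buffer_size: int, mss: int) -> int:
    """Only announce new window space once a full segment fits or half
    the buffer is empty; otherwise keep the window closed."""
    if free_space >= min(mss, buffer_size // 2):
        return free_space
    return 0   # suppress silly little window updates
```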

Retransmissions
We retransmit segments after a timer expires. We also retransmit a segment when we receive
several duplicate acknowledgements for it (called fast retransmission), at the risk that a
segment is occasionally sent twice. TCP can also pace itself using the time between incoming
ACKs; this is called the ACK clock.

ACKs
Senders are not required to send data as soon as it comes in, and receivers are not
required to acknowledge instantly. We can use this to optimize performance. In TCP, a receiver
may wait up to 500 ms before sending the acknowledgement, in the hope of getting a free ride
on some outgoing data (a piggybacked ACK).

If we have nothing to send back, we send a bare ACK segment directly (just the TCP header),
with the ACK field set to 1.

Congestion control
TCP maintains a congestion window, whose size is the number of bytes that the sender may
have in the network at any time. The window size is adjusted according to the AIMD rule
(additive increase, multiplicative decrease). Thus there are two different windows in TCP, and
you never send more than the minimum of the two.

Slow start
AIMD takes a long time to reach high speed on fast networks. Instead, connections begin with
slow start, which grows the window exponentially at first. To keep this algorithm from going out
of control, it is capped at a threshold, which at first is set to infinity. Slow start was first
implemented in TCP Tahoe.

Slow start:
• Start by sending out at most 4 segments
• For each segment that is acknowledged, send out another 2 segments
• When the threshold is hit, switch to additive increase
• When a packet is lost, set the threshold to half of the congestion window, and restart
the entire process.
To improve the algorithm, we can use fast recovery: instead of dropping all the way back to the
initial window, we drop to the threshold directly. TCP Tahoe with fast recovery is TCP Reno.
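A sketch of slow start with a threshold, counting the window in segments (Tahoe-style restart on loss; the function names are illustrative):

```python
def on_ack(cwnd: float, ssthresh: float) -> float:
    """Grow the congestion window on each acknowledgement."""
    if cwnd < ssthresh:
        return cwnd + 1        # slow start: roughly doubles every RTT
    return cwnd + 1 / cwnd     # additive increase past the threshold

def on_loss(cwnd: float) -> tuple[float, float]:
    """On a lost packet: halve the threshold, restart from the initial window.
    (Reno's fast recovery would resume at ssthresh instead of restarting.)"""
    ssthresh = max(cwnd / 2, 2)
    return 1.0, ssthresh       # (new cwnd, new ssthresh)
```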

Newer TCP versions all use different variations of the same idea of decreasing congestion
window and threshold.

Even more minor improvements have been made:


• We now have selective acknowledgements, which means we can acknowledge a range
of bytes and thus have a better knowledge of what segments were lost.
• We can now use explicit congestion notifications (we explicitly say the network is
congested), in addition to packet losses. We can get these signals from the IP layer.
• Some TCP protocols use precise congestion signals, telling them exactly at what
bandwidth they can send
