STUDENT GUIDE
1. Safety Warning
Lethal and other dangerous voltages may be present within the products used herein. The user is strongly advised not to wear
conductive jewelry while working on the products. Always observe all safety precautions and do not work on the equipment
alone.
The equipment used during this course may be electrostatic sensitive. Please observe correct anti-static precautions.
2. Trade Marks
Alcatel-Lucent and MainStreet are trademarks of Alcatel-Lucent.
All other trademarks, service marks and logos (“Marks”) are the property of their respective holders, including Alcatel-Lucent.
Users are not permitted to use these Marks without the prior consent of Alcatel-Lucent or such third party owning the Mark. The
absence of a Mark identifier is not a representation that a particular product or service name is not a Mark.
Alcatel-Lucent assumes no responsibility for the accuracy of the information presented herein, which may be subject to change
without notice.
3. Copyright
This document contains information that is proprietary to Alcatel-Lucent and may be used for training purposes only. No other
use or transmission of all or any part of this document is permitted without Alcatel-Lucent’s written permission, and must
include all copyright and other proprietary notices. No other use or transmission of all or any part of its contents may be used,
copied, disclosed or conveyed to any party in any manner whatsoever without prior written permission from Alcatel-Lucent.
Use or transmission of all or any part of this document in violation of any applicable legislation is hereby expressly prohibited.
User obtains no rights in the information or in any product, process, technology or trademark which it includes or describes, and
is expressly prohibited from modifying the information or creating derivative works without the express written consent of
Alcatel-Lucent.
All Rights Reserved © Alcatel-Lucent 2008
Technology
IP for mobile networks
4. Disclaimer
In no event will Alcatel-Lucent be liable for any direct, indirect, special, incidental or consequential damages, including lost
profits, lost business or lost data, resulting from the use of or reliance upon the information, whether or not Alcatel-Lucent has
been advised of the possibility of such damages.
Mention of non-Alcatel-Lucent products or services is for information purposes only and constitutes neither an endorsement, nor
a recommendation.
This course is intended to train the student about the overall look, feel, and use of Alcatel-Lucent products. The information
contained herein is representational only. In the interest of file size, simplicity, and compatibility and, in some cases, due to
contractual limitations, certain compromises have been made and therefore some features are not entirely accurate.
Please refer to technical practices supplied by Alcatel-Lucent for current information concerning Alcatel-Lucent equipment and
its operation, or contact your nearest Alcatel-Lucent representative for more information.
The Alcatel-Lucent products described or used herein are presented for demonstration and training purposes only. Alcatel-
Lucent disclaims any warranties in connection with the products as used and described in the courses or the related
documentation, whether express, implied, or statutory. Alcatel-Lucent specifically disclaims all implied warranties, including
warranties of merchantability, non-infringement and fitness for a particular purpose, or arising from a course of dealing, usage
or trade practice.
Alcatel-Lucent is not responsible for any failures caused by: server errors, misdirected or redirected transmissions, failed
internet connections, interruptions, any computer virus or any other defect, whether human or technical in nature.
5. Governing Law
The products, documentation and information contained herein, as well as these Terms of Use and Legal Notices are governed by
the laws of France, excluding its conflict of law rules. If any provision of these Terms of Use and Legal Notices, or the
application thereof to any person or circumstances, is held invalid or unenforceable for any reason, including, but not
limited to, the warranty disclaimers and liability limitations, then such provision shall be deemed superseded by a valid,
enforceable provision that matches, as closely as possible, the original provision, and the other provisions of these Terms
of Use and Legal Notices shall remain in full force and effect.
1. TCP/IP Basics
   1. Basic Concepts
2. Ethernet Technology
   1. Bridges and Switches
   2. Virtual LANs
3. Point to Point Transport
   1. PPP/ML-PPP
4. IP Layer
   1. IP addressing
   2. Routing principles
   3. Redundancy (HSRP/VRRP)
5. Transport Layer
   1. User Datagram Protocol (UDP)
   2. Transmission Control Protocol (TCP)
   3. SIGTRAN
6. Application Services
   1. Synchronization (NTP)
   2. FTP/SFTP
   3. Voice over IP (VoIP)
7. Quality of Service
   1. QoS problems
   2. Mechanisms of QoS
8. MPLS Overview
   1. Label switching
   2. Traffic engineering
   3. MPLS services
9. Introduction to IPSEC
   1. Security association
   2. Tunnel setup
   3. IKE
Conventions used in this guide
Note
Provides you with additional information about the topic being discussed.
Although this information is not required knowledge, you might find it useful or
interesting.
Technical Reference
(1) 24.348.98 – Points you to the exact section of Alcatel-Lucent Technical
Practices where you can find more information on the topic being discussed.
Warning
Alerts you to instances where non-compliance could result in equipment damage or
personal injury.
Section 1
TCP/IP Overview
Technology
IP for Mobile Networks
TTP18031 D0 SG DEN I1.0
Section 1 Page 1
Module Objectives
Section 1 Page 2
1.1 Basic Concepts
Section 1 Page 3
1 Basic Concepts
Network Categories
Networks generally fall into three categories, depending on their size and geographical coverage:
Local Area Network (LAN): coverage is limited to a university campus, company premises, etc.
Metropolitan Area Network (MAN): coverage extends to a geographical area the size of a town. MANs
provide high-speed links between several LANs in the same geographical area (less than one hundred
kilometers).
Wide Area Network (WAN): coverage extends over long distances, interconnecting networks at the scale of a
country or a continent.
Section 1 Page 4
1 Basic Concepts
Network Topologies
Bus
Star
Ring
An IT system is made up of computers connected to each other by communication links (network cables, etc.)
and hardware devices (network boards and other equipment that enables data to circulate properly). The
physical layout of the network (the spatial configuration) is known as the physical topology. Topologies
generally fall into the following categories:
bus topology: in a bus topology, all the computers are connected to the same transmission link.
star topology: in a star topology, the computers in the network are connected to a central equipment
system.
ring topology: in a network with ring topology, the computers are connected to each other in a ring and
communicate in turn.
Section 1 Page 5
1 Basic Concepts
Connectionless Communication Mode
[Figure: a connectionless network. Packets P1, P2 and P3 sent to the same destination may each follow a different route.]
In a connectionless network:
No connection is established: flows to the same destination can travel along different routes.
Section 1 Page 6
1 Basic Concepts
Connection-Oriented Communication Mode
[Figure: a connection-oriented network compared with a connectionless network. Communication proceeds in three phases: path establishment, data transfer, path release.]
All Rights Reserved © Alcatel-Lucent 2009
TCP/IP Overview
Technology IP for Mobile Networks
In a connection-oriented network, a connection must be established when two devices wish to communicate.
The intermediate nodes must preserve the context of this connection.
Section 1 Page 7
1 Basic Concepts
Network Interconnection
[Figure: TCP/IP network interconnection linking several LANs across a WAN.]
Connecting networks can involve local business networks based on the following types of topology:
bus
ring
star
Connecting networks can also involve long-haul mesh networks such as:
ATM
Frame Relay
Public Switched Telephone Networks
The role of TCP/IP is therefore to provide universal communication services over diverse physical networks.
Section 1 Page 8
1 Basic Concepts
Communication Needs
Network interconnection also brings into play different operating systems, the main ones being:
DOS
Unix
Linux
These operating systems function on machines built by different equipment manufacturers.
Rules therefore had to be defined to enable dialog. These communication rules are known as protocols.
Additional software also had to be developed and integrated in the TCP/IP protocol stack to make it easier for
users wishing to:
transfer files,
exchange e-mails,
surf the internet,
perform many other tasks.
Section 1 Page 9
1 Basic Concepts
TCP/IP Model
OSI layer          TCP/IP protocols
7 Application    \
6 Presentation    } HTTP, TELNET, FTP, SMTP, DNS, TFTP, SNMP
5 Session        /
4 Transport        TCP, UDP
3 Network          IP, ICMP, ARP
2 Data Link        IEEE 802.2 (LLC) / 802.1 (Bridging); IEEE 802.3 (CSMA/CD), PPP/ML-PPP, ATM, HDLC...
1 Physical         100Base-T, 1000Base-T, 1000Base-SX, 1000Base-LX, 1000Base-CX
When people refer to communication software, they generally mean the Open Systems Interconnection (OSI) architecture,
which was developed by the International Organization for Standardization (ISO) between 1977 and 1984. The OSI model is broken down
into 7 layers. Each layer plays a specific role: the physical layer is responsible for the transmission of bits over the
transmission medium; the data link layer is responsible for the transmission of frames between devices that are
interconnected physically; the network layer is responsible for routing packets within the network; the transport layer is
responsible for end-to-end message transmission; the session layer is responsible for dialog synchronization; the
presentation layer is responsible for data representation and format conversion; and the application layer is responsible
for hosting network-oriented utilities and applications.
TCP/IP does not follow exactly the same pattern as OSI. The lower-level TCP/IP protocols do not fulfill the role defined by
OSI for the physical and data link layers. At level 3, IP complies with the OSI model. You will discover other very
important network-layer protocols such as ARP and ICMP. At level 4, two transport protocols are used: TCP and UDP. Finally,
services are integrated in the three upper layers of the OSI model.
Here are a few examples: HTTP for surfing the internet; Telnet for remote control of a device; FTP for file transfer; SMTP
for e-mail exchange; DNS for internet addressing; TFTP for file transfer; SNMP for network administration.
When people refer to TCP/IP layers or protocols, they are referring not only to these two protocols but to all the
protocols in the stack, which includes TCP and IP.
The TCP/IP sources are available free of charge and were developed independently of any particular architecture,
operating system, or proprietary structure. They can therefore be transported over any type of platform. They form an
open system that is continually evolving and therefore highly popular.
TCP/IP operates over a diverse range of media and technologies such as serial links, coaxial cables, optical fiber, radio
links, ADSL, ATM networks, etc.
The addressing mode is shared by all TCP/IP users regardless of the platform they use. If the address is unique,
communication can take place even if the hosts are on different sides of the world.
The higher protocols are standardized to allow for wide-ranging developments over all types of machines.
Section 1 Page 10
1 Basic Concepts
Standardization
ISOC (Internet Society)
  IAB — Internet Architecture Board
    IETF — Internet Engineering Task Force
      IESG — Internet Engineering Steering Group
        Working Groups
  ICANN — Internet Corporation for Assigned Names and Numbers (www.icann.org)
RFC editor: http://www.rfc-editor.org/rfcsearch.html
TCP/IP Standardization
The organization responsible for standardization is the "Internet Society". It is made up of individual members
as well as organizations and industrial companies.
The Internet Society is headed by the IAB, which comprises twelve members elected for 2 years.
The IAB is supported by the IETF for studies into new standards and the IANA, which is mainly charged with
assigning official values to certain fields of various protocols and allocating Internet IP addresses.
The IETF is managed by the IESG.
The IETF is divided into Areas. Working Groups are set up within the Areas.
Each Area specializes in a particular Internet field:
one Area is responsible for applications.
another for the Internet.
another for routing.
another for security issues.
another for transport protocols.
the final Area for performance.
It should be noted that the IANA, which was originally formed under the auspices of the American
government, now answers to the ICANN, a non-governmental organization. The new organization has not
affected the responsibilities of the IANA, which continues performing the same functions.
The standards are issued in the form of Request For Comments (RFCs) and are free of charge and available
online.
Section 1 Page 11
1 Basic Concepts
Use of Layers in a TCP/IP Communication
When two users wish to communicate, one is the Client (in the IP world, the client is defined as the user requesting a
service) and the other is the Server (the user providing the service).
Here, the Server is capable of providing various services, but the Client wishes to request one service only.
The transport layer is charged with targeting the required service. For this, each application is allocated an
official number known as a "port number". (N.B. the IANA is responsible for allocating a port number to every
new service.) The transport layer sends the datagram to the IP layer below. This IP packet must be delivered to the
remote server, so every machine connected to the IP network is assigned a logical address called an IP address.
One of IP's jobs is to insert a header; the main fields in this header are the packet's source and destination
addresses. The packet is then sent to the data link layer, which encapsulates it in a frame with a header containing
the physical source and destination addresses. Finally, the frame is transferred to the transmission medium.
All the machines connected to this transmission medium analyze the frame header, but only the router interface
recognizes its physical address; it therefore extracts the contents of the frame and transmits them to the IP layer
above. The router’s network layer analyzes the packet header, especially its destination IP
address. Its routing table indicates the outgoing interface and the next physically connected device the
packet must pass through to reach its final destination. The IP packet is transferred to the data link layer,
which encapsulates it in a frame. This time, the physical source address is the source router interface address
and the physical destination address is the address of the next router interface. Once again, only the router
recognizes its physical address in the frame transported by the transmission medium. It therefore extracts the
packet from the frame and sends its contents to its network layer. The network layer routes the packet to the
outgoing interface using its routing table.
Finally, the frame is transferred to the last link. The destination machine recognizes its physical address in
the header and sends the contents to its IP. The IP of the final destination machine recognizes its own IP
address in the destination IP field of the packet received. The contents of the packet are then sent to the
transport layer, which examines the header. Thanks to the destination port number contained in the layer-4
protocol header, the data is routed to the service chosen by the Client.
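The encapsulation chain described above can be sketched in a few lines of Python. This is a minimal illustration, not a real protocol stack: the headers are simplified to their key fields, and all addresses and port numbers are made-up illustrative values.

```python
# Minimal sketch of TCP/IP encapsulation: each layer wraps the payload
# of the layer above with its own header. Addresses are illustrative.

def transport_encapsulate(data: bytes, src_port: int, dst_port: int) -> dict:
    # The transport layer targets the service via the destination port.
    return {"src_port": src_port, "dst_port": dst_port, "payload": data}

def ip_encapsulate(segment: dict, src_ip: str, dst_ip: str) -> dict:
    # IP adds a header whose main fields are the source and
    # destination logical (IP) addresses.
    return {"src_ip": src_ip, "dst_ip": dst_ip, "payload": segment}

def link_encapsulate(packet: dict, src_mac: str, dst_mac: str) -> dict:
    # The data link layer frames the packet with physical addresses.
    return {"src_mac": src_mac, "dst_mac": dst_mac, "payload": packet}

segment = transport_encapsulate(b"GET /", src_port=49152, dst_port=80)
packet = ip_encapsulate(segment, "192.0.2.10", "198.51.100.7")
frame = link_encapsulate(packet, "00:35:d6:39:cb:0a", "00:6f:66:32:0b:08")

# At a router, the frame is re-encapsulated: the MAC addresses change
# on every hop, while the IP header travels end to end unmodified.
next_hop_frame = link_encapsulate(frame["payload"],
                                  "00:80:9f:01:02:03", "00:10:7b:aa:bb:cc")
assert next_hop_frame["payload"]["dst_ip"] == "198.51.100.7"
```

Note how only the outermost (link-layer) envelope is rebuilt at the router, which is exactly the behavior the walkthrough above describes.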
Section 1 Page 12
Answer the Questions
The OSI reference model is quite similar to TCP/IP, with one major
exception. Where does the difference come from?
Layer 1
Layer 3
Section 1 Page 13
Answer the Questions [cont.]
What are the attributes of protocol layering that are used by TCP/IP?
Section 1 Page 14
Blank page
Section 1 Page 15
End of Section
Section 1 Page 16
Section 2
Ethernet technology
Technology
IP for Mobile Networks
TTP18031 D0 SG DEN I1.0
Section 2 Page 1
Module Objectives
Section 2 Page 2
1. Ethernet principles
Section 2 Page 3
1 Ethernet principles
CSMA/CD mechanism
[Figure: a 4-port Hub (multiport repeater) with RJ45 connectors; station-to-hub links are under 100 m. Each transceiver detects collisions via internal loopback. Note: the Hub does not forward the signal on the input port.]
It should be noted that a Hub receiving signals on the receive pair of one of its ports routes these signals
to the transmit pairs of all the other ports, except the port that received the signals (ingress port).
To ensure collision detection, each 10/100Base-T network interface card (NIC) has an internal loopback.
Section 2 Page 4
1 Ethernet principles
10/100Base-T: Link Status
[Figure: link status on 10/100Base-T. When the segment is idle, each device emits a Link Test Pulse (Normal Link Pulse) every 16.8 ms on its transmit pair; the device at the other end monitors these pulses and lights its "Link" LED. A broken link is detected by the absence of pulses.]
A machine that does not realize it has a faulty transceiver may start transmitting despite CSMA and cause
collisions. To prevent such a situation from arising, a signal is emitted (when the segment is inactive) to
validate the link. This signal is known as the "Link Test Pulse" or "Normal Link Pulse" and is a 5MHz pulse
emitted every 16.8ms.
In general, a LED is associated with the signal. If the "Link" LEDs on the two interconnected devices are on,
the segment is functioning correctly.
When there are no frames to transmit, each device emits a series of test signals (link test pulses),
interspersed with silences, over the transmit pair. The receive pair of the transceiver at the other end of the
link waits for this signal in order to check the integrity of the line or rather of its receive pair (pair 2).
Section 2 Page 5
1 Ethernet principles
10/100/1000 Base T: Cables
Baseband transmission over twisted pair: UTP category 5 or STP category 5, with RJ45 connectors.
Section 2 Page 6
1 Ethernet principles
10Base-T: Hub Connection
[Figure: 10Base-T Hub cascading. Each station-to-Hub and Hub-to-Hub link is limited to 100 m; at most 4 repeaters between two stations, for a maximum station-to-station distance of 500 m.]
To increase the number of ports on a 10Base-T LAN, several Hubs can be cascaded. The distance between 2
Hubs is also limited to 100 meters.
The maximum distance between 2 stations is limited to 500m and there can be no more than 4 Hubs between
2 stations.
Section 2 Page 7
1 Ethernet principles
Fast Ethernet 100Base-T: Hub Connection
[Figure: 100Base-T Hub cascading. Station-to-Hub links remain limited to 100 m, but the Hub-to-Hub link is limited to about 20 m; at most 2 repeaters between two stations, for a maximum station-to-station distance of 220 m.]
Although the maximum distance between the stations and the Hub is still 100 meters, the maximum distance
between Hubs has fallen to around 20 meters.
The number of Hubs between 2 stations must not exceed 2, which means that the maximum distance
between 2 stations falls to 220 meters.
Section 2 Page 8
1 Ethernet principles
Logical Address and Physical Address
The Medium Access Control (MAC) is part of the data link layer and is responsible for transmitting blocks of
bits (i.e. frames) between devices that are connected to each other physically.
Before looking in detail at the format of a MAC frame, let’s consider the different addressing methods in
TCP/IP.
The logical address could be compared to people’s names, and the physical address to telephone
numbers.
When a person, let’s say Alice, wishes to communicate with Bob, her first thought is:
"I’m going to call Bob." However, when she actually makes the call, she will probably have to look in a phone
directory and dial Bob’s telephone number.
The principle is the same in TCP/IP. A station wishing to send a data packet to another station indicates the
logical IP address of the remote station. But, in practice, this IP packet will be transported in a frame using
physical addresses. Later on, you will see that the tables mapping logical addresses to physical addresses are
generated automatically by means of the Address Resolution Protocol (ARP).
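The phone-directory analogy above can be sketched as a simple lookup table. This is only an illustration of the idea: in a real stack the table is the ARP cache, a cache miss triggers a broadcast ARP request, and entries expire. All addresses here are made up.

```python
# Sketch of the name-to-number lookup the analogy describes: a table
# mapping a logical IP address to the physical MAC address actually
# placed in the frame. Entries are illustrative, not real.
arp_cache = {
    "192.0.2.10": "00:35:d6:39:cb:0a",
    "192.0.2.20": "00:6f:66:32:0b:08",
}

def resolve(ip: str):
    # A real stack would broadcast an ARP request on a cache miss;
    # in this sketch a miss simply returns None.
    return arp_cache.get(ip)

assert resolve("192.0.2.20") == "00:6f:66:32:0b:08"
assert resolve("192.0.2.99") is None
```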
Section 2 Page 9
1 Ethernet principles
Unicast MAC Address
00.35.d6.39.cb.0a 00.6f.66.32.0b.08
MAC MAC
There are different types of MAC addresses. First of all, the unicast address: this type of address is assigned
to each Ethernet card and is globally unique.
It should be noted that a station with n interfaces will have n MAC addresses.
Unicast addressing is used when a frame needs to be sent to a single, specific station.
The frame placed on the transmission medium can be read by all the stations connected to the LAN.
All of the station interface cards decode the destination MAC address field.
But only the station whose address matches the destination MAC address interrupts its processor to deliver the
contents of the frame to it. The other stations ignore the frame.
Section 2 Page 10
1 Ethernet principles
Broadcast MAC Address
Dest: ff.ff.ff.ff.ff.ff
00.35.d6.39.cb.0a 00.6f.66.32.0b.08
MAC MAC
This time, a station wishes to send data to all the stations connected to the LAN. Rather than sending n
frames in unicast mode, the transmit station (egress station) uses broadcast addressing. This means that the
destination MAC address field contains only 1s.
Section 2 Page 11
1 Ethernet principles
Multicast MAC Address
00.35.d6.39.cb.0a 00.6f.66.32.0b.08
MAC MAC 01.00.5e.00.00.09
Certain stations can join a group and receive a second address, known as a multicast address, that is shared
by all stations in the group.
A station wishing to send a frame solely to the stations in the group puts the multicast address in the
destination address field of the frame.
All interfaces connected to the link decode the frame but only stations with the multicast address interrupt
their processors to deliver them the frame data.
Section 2 Page 12
1 Ethernet principles
MAC Address - Details
U/L bit: 0 = Universal (globally unique address); 1 = Local (local meaning)
I/G bit: 0 = Individual (or Unicast), associated with a single piece of equipment; 1 = Group (or Multicast), associated with a group of equipment
Examples of manufacturer-managed prefixes: CISCO: 00.10.7B.xx.xx.xx; ALU: 00.80.9F.xx.xx.xx
Some people may wonder whether, with the explosion of Internet, 48 bits is enough to cover current, and
indeed future, requirements.
In fact, 48 bits is more than enough since it offers a capacity of around 281 thousand billion combinations.
Even if the first 2 bits have special functions, there is still enough capacity to provide every man, woman and
child on the planet 12,000 Ethernet cards.
Let’s look at it from another angle: if industry produced 100 million interface cards a day, every day of the
year (i.e. 500 times more than is currently produced), it would take 2,000 years to use up the address space
available.
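The I/G and U/L bits described above are the two least-significant bits of the first address byte. A small Python sketch shows how an address can be classified; the helper name `parse_mac` and the sample addresses (reusing the ALU and multicast examples from this guide) are illustrative.

```python
def parse_mac(mac: str) -> dict:
    """Classify a MAC address using the I/G and U/L bits of its first byte."""
    octets = [int(x, 16) for x in mac.replace(".", ":").split(":")]
    first = octets[0]
    return {
        "multicast": bool(first & 0x01),    # I/G bit: 1 = group (multicast)
        "local": bool(first & 0x02),        # U/L bit: 1 = locally administered
        "broadcast": octets == [0xFF] * 6,  # destination address of all 1s
        "oui": octets[:3],                  # manufacturer-managed prefix
    }

assert parse_mac("00.80.9f.12.34.56")["multicast"] is False  # unicast, ALU prefix
assert parse_mac("01:00:5e:00:00:09")["multicast"] is True   # group address
assert parse_mac("ff:ff:ff:ff:ff:ff")["broadcast"] is True

# The 48-bit space mentioned above: about 2.8e14 combinations.
assert 2 ** 48 == 281_474_976_710_656
```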
Section 2 Page 13
1 Ethernet principles
Ethernet frame format
Frame length: 64 ≤ length ≤ 1518 bytes
Bytes:   7          1     6            6           2           46 to 1500       4
         Preamble   SFD   MAC @ dest.  MAC @ src.  Ethertype   Data + padding   FCS
Preamble: 7 x ‘AA’ (synchronization); SFD (Start Frame Delimiter): 10101011 (control)
Ethertype: indicates the higher-level protocol; value > 5DC hex (1500 decimal). Examples: IP: 0800H; ARP: 0806H; IPv6: 86DDH
MTU (Maximum Transmission Unit): 1500 bytes; minimum data size: 46 bytes (padding added if necessary)
IP: Internet Protocol
ARP: Address Resolution Protocol
FCS: Frame Check Sequence
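The header layout above can be decoded with a few lines of Python. This is a simplified sketch: the preamble and SFD are stripped by the hardware and the FCS is not checked, so only the 14-byte MAC header is parsed. The sample frame reuses the MAC addresses from the earlier slides and the 0x0800 Ethertype for IP.

```python
import struct

def parse_ethernet(frame: bytes) -> dict:
    # Header: 6-byte destination MAC, 6-byte source MAC, then a 2-byte
    # field that is either an Ethertype (Ethernet II) or a length (802.3).
    dst, src, etype = struct.unpack("!6s6sH", frame[:14])
    fmt = lambda b: ":".join(f"{x:02x}" for x in b)
    return {
        "dst": fmt(dst),
        "src": fmt(src),
        # Values above 0x05DC (1500) are Ethertypes; values up to 1500
        # are 802.3 payload lengths, as the text explains.
        "is_ethernet_ii": etype > 0x05DC,
        "ethertype_or_length": etype,
        "payload": frame[14:],
    }

frame = (bytes.fromhex("006f66320b08")     # destination MAC
         + bytes.fromhex("0035d639cb0a")   # source MAC
         + b"\x08\x00"                     # Ethertype 0x0800 = IP
         + b"\x00" * 46)                   # minimum 46-byte payload
hdr = parse_ethernet(frame)
assert hdr["is_ethernet_ii"] and hdr["ethertype_or_length"] == 0x0800
```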
Section 2 Page 14
1 Ethernet principles
Other Ethernet frame formats
Ethernet II frame: MAC @ dest. | MAC @ src. | Ethertype (e.g. 0800 for IP) | Data ≤ 1500 | Padding | FCS
IEEE 802.3 frame:  MAC @ dest. | MAC @ src. | Length ≤ 1500 | LLC + SNAP + Data ≤ 1497 | Padding | FCS
SNAP header: O.U.I (3 bytes, 00.00.00) + PID (2 bytes, e.g. 0800 for IP); the IP packet is then limited to 1492 bytes
In Ethernet II, an IP packet is directly encapsulated in the MAC frame. The maximum packet length is 1500
bytes. Encapsulation is described in RFC 894.
In 1983, IEEE decided to standardize this protocol. In IEEE, the packet first goes through the Subnetwork
Access Protocol (SNAP) where 5 bytes are added. The main one is the Protocol Identification (PID) byte, which
indicates the encapsulated protocol.
Next, it goes through a Logical Link Control (LLC) where:
the DSAP and SSAP fields contain the value "AA", which indicates that LLC encapsulates SNAP,
the Control field contains the value "03", which signifies "Unnumbered Information".
And finally, IEEE 802.3 formats the frame. The format of the IEEE 802.3 frames for Ethernet is identical to the
Ethernet II format except for one field: the Ethertype field from Ethernet II has been replaced by a payload
length field, which necessarily takes a value less than or equal to 1500 in decimal or 5DC in hexadecimal.
Encapsulation is described in RFC 1042.
N.B. When using SNAP encapsulation, the maximum size for IP packets is 1492 bytes.
Section 2 Page 15
Answer the Questions
Section 2 Page 16
Answer the Questions [cont.]
802.2 Ethernet
IP Network Address
Section 2 Page 17
2. Bridges and Switches
Section 2 Page 18
2. Bridges and Switches
Repeaters
[Figure: a repeater joining two segments. Functions: media adaptation, signal amplification.]
You saw earlier that the length of Ethernet segments is limited and that to extend a LAN, repeaters are
needed to regenerate the signals.
Certain repeaters can also work as adapters enabling transfer from 10Base2 to 10Base5 or 10Base-T.
Repeaters are just signal amplifier devices. They are not intelligent devices.
So, when a station transmits a frame to another station located on the same segment, the repeater
propagates the signals over the other segments. This means that any station located on another segment is
prevented from accessing the transmission medium until the operation is complete.
Putting all stations on the same LAN is the first simple, low-cost step for a local area network. The downside
of this type of architecture is that the number of collisions increases rapidly as traffic increases, which
significantly reduces the speed at which data is exchanged.
It would be useful to have devices capable of filtering. An initial solution could be the use of bridges.
Section 2 Page 19
2. Bridges and Switches
Bridges _ Frame Forwarding
[Figure: a bridge with two ports. Eth0 connects to LAN 1 (stations a, b, c); Eth1 connects to LAN 2 (stations d, e, f). A frame from c to a stays on LAN 1; a frame from c to f is forwarded to LAN 2.]
Filtering table:
MAC@   Port
a      eth0
b      eth0
c      eth0
d      eth1
e      eth1
f      eth1
The filtering configuration can be defined manually by storing in the bridge memory the MAC addresses of the
stations associated with each of these ports.
When a frame is moving along a segment, the bridge analyzes the destination MAC address. If the address is
on the same port as the one that detected the frame, the bridge blocks the frame.
If this is not the case, the bridge propagates the frame to the port that corresponds to the destination MAC
address.
It should be noted that bridges do not filter broadcasts and multicasts.
Section 2 Page 20
2. Bridges and Switches
Self-Learning Bridge
[Figure: a self-learning bridge with two ports. When station a transmits, the bridge records (a, port 2) in its table. A later frame addressed to a arriving on port 2 is filtered; a frame addressed to a arriving on port 1 is forwarded to port 2. A frame to an unknown address is flooded to the other ports.]
As you have seen, the "Self-Learning Bridge" mechanism has its limits: it can only function if there are no
loops in the network.
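The self-learning mechanism described above can be sketched as a forwarding table keyed by source MAC address. This is a deliberately simplified model: real bridges also age entries out, and, as the text notes, the mechanism alone cannot cope with loops.

```python
class LearningBridge:
    """Minimal model of a self-learning bridge: learn source
    addresses, then forward to a known port or flood otherwise."""

    def __init__(self, ports):
        self.ports = set(ports)
        self.table = {}  # MAC address -> port it was last seen on

    def receive(self, src_mac, dst_mac, in_port):
        # Learning: remember which port the source was seen on.
        self.table[src_mac] = in_port
        out = self.table.get(dst_mac)
        if out == in_port:
            return []          # filter: destination is on the same segment
        if out is not None:
            return [out]       # forward to the single known port
        return sorted(self.ports - {in_port})  # flood: unknown destination

bridge = LearningBridge(ports=[1, 2])
assert bridge.receive("a", "b", in_port=2) == [1]  # b unknown: flood
assert bridge.receive("b", "a", in_port=1) == [2]  # a was learned on port 2
assert bridge.receive("c", "a", in_port=2) == []   # a is on the same segment: filter
```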
Section 2 Page 21
2. Bridges and Switches
Spanning Tree Protocol
[Figure: a bridged network containing loops; each link is labeled with its cost.]
To overcome this problem but still maintain the automatic mechanism, a special protocol known as the
Spanning Tree Protocol (STP) is implemented in the bridges.
This relatively complex protocol uses Bridge Protocol Data Unit (BPDU) messages to establish specific dialog
between the bridges.
The bridges represent the network topology in the form of a tree. They select a bridge to be the root bridge
and then draw in the connections to form a tree structure. The nodes represent the bridges and the leaves on
the tree are the stations.
The bridges detect loops and remove them. This means there is only one path for getting from one station to
another station, as with a tree for getting from one leaf to another.
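One building block of the tree construction described above can be sketched in Python: electing the root bridge as the one with the lowest bridge ID (priority, then MAC address). This is only a sketch of that single step; real STP then exchanges BPDUs, computes least-cost paths to the root, and blocks the ports that would close a loop, all of which is omitted here. The bridge IDs are made-up values.

```python
# Root bridge election, the first step of the Spanning Tree Protocol:
# the bridge with the lowest (priority, MAC) bridge ID becomes root.
# Python compares tuples element by element, so min() implements
# "lowest priority wins, MAC address breaks ties" directly.

def elect_root(bridges):
    """bridges: list of (priority, mac) tuples; the lowest tuple wins."""
    return min(bridges)

bridges = [
    (32768, "00:80:9f:00:00:02"),
    (32768, "00:10:7b:00:00:01"),  # same priority: lower MAC would win
    (4096,  "00:35:d6:00:00:03"),  # lower priority: wins outright
]
assert elect_root(bridges) == (4096, "00:35:d6:00:00:03")
```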
Section 2 Page 22
2. Bridges and Switches
Switch: Principle
[Figure: a 4-port 10 Mb/s switch. The switching fabric allows several communications to take place simultaneously.]
When a station transmits a frame, the Switch, just like a bridge, analyzes the destination MAC address and,
based on the information in its filter memory, sends the frame to the appropriate link(s).
At the same time, another station can also transmit a frame that will be routed by the Switch to the right
output port(s).
So, unlike the Hub, the Switch makes it possible to increase transmission-medium bandwidth by performing
several operations simultaneously.
Section 2 Page 23
2. Bridges and Switches
Switch: Full and Half Duplex
[Figure: Hub vs. Switch. On a Hub, transmit and receive share the medium, internal loopback is used for collision detection, and operation is half duplex. On a Switch, each port has transmit and receive buffers, so both directions can be used at once: full duplex.]
Segmentation
On a segment with several stations, various mechanisms must be implemented:
A mechanism for accessing the transmission medium i.e. listening to the link to determine whether it is
available or unavailable,
A mechanism for detecting collisions.
In this configuration, communication is necessarily in half-duplex mode: at any given time, a single station
transmits while the others listen.
Collisions can occur when frames transmitted by several stations overlap on the receive pair.
Generally, therefore, both the station side and the switch side can be configured to function in half-duplex or
full-duplex mode.
Micro-segmentation
In the case of micro-segmentation, where a single station is connected to a switch port, collisions cannot
occur. Indeed, there is only one transmitter on a pair.
Consequently, the station wishing to transmit does not need to use the collision-detection mechanism.
Moreover, the station should function in full-duplex mode if it has that capability.
By default, the NICs of stations wishing to transmit listen to the transmission medium beforehand. If they
detect traffic, they postpone transmission to avoid causing a collision.
So, if on a micro segment this mechanism is not disabled, the station (or the port of the Switch in the other
direction) will continue to function in half-duplex mode and delay transmission for fear of causing a collision.
The NIC internal loopback mechanism must therefore be disabled. This can be configured manually or via the
auto-negotiation mechanism.
Section 2 Page 24
2. Bridges and Switches
Switch: Auto-Negotiation
Auto-Negotiation: Fast Link Pulse, bursts of 17 to 33 pulses (2 ms).
Most Ethernet interfaces, such as adapters (NICs) for PCs or workstations and Switches, are capable of
adapting their transmission speed (10/100) and mode (Half or Full Duplex).
This is done at start-up by exchanging the Fast Link Pulse (FLP), which is the equivalent of the Normal Link
Pulse (NLP) used for the 10Base-T integrity test.
This means that two devices with auto-negotiation capability can define the best method for working
together from the options specified below (in order of preference):
1. Full-duplex 100Base-TX
2. 100Base-T4
3. 100Base-TX
4. Full-duplex 10Base-T
5. 10Base-T
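The priority resolution above amounts to picking the highest-priority mode advertised by both ends. A minimal sketch (the mode labels and function name are illustrative, not a standard API):

```python
# Auto-negotiation picks the best mode advertised by BOTH ends,
# walking the standard priority order from the list above.
# The mode labels are illustrative, not a standard API.
PRIORITY = [
    "100Base-TX full-duplex",
    "100Base-T4",
    "100Base-TX",
    "10Base-T full-duplex",
    "10Base-T",
]

def negotiate(local, remote):
    """Return the highest-priority mode common to both ends, or None."""
    for mode in PRIORITY:
        if mode in local and mode in remote:
            return mode
    return None
```

For example, a half-duplex-only 10/100 NIC facing a fully capable switch port settles on 100Base-TX.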
Section 2 Page 25
2. Bridges and Switches
Switch: Full-Duplex Mode Advantage
[Diagram: segmentation vs. micro-segmentation — with a hub: shared bandwidth (10 Mb/s), half duplex,
access contention, collision detection, transmission = reception, limited length. With a switch
(micro-segmentation): independent rate for each station (10 or 100 Mb/s), full bandwidth, full duplex,
no need for access contention, no need for collision detection, free transmission medium, no delay,
extended length]
With segmentation, transmission speed is the same for all stations; with micro-segmentation, transmission
speed is independent between stations.
With segmentation, the bandwidth is shared between all the stations; with micro-segmentation, each
station uses the full bandwidth.
With segmentation, the medium-control mechanism must be implemented, implying operation in half-
duplex mode; with micro-segmentation, this mechanism isn’t required and full-duplex mode is therefore
possible.
Finally, with segmentation, the maximum distance between two stations is limited to enable collision
detection; with micro-segmentation, there is no such limit since collisions are no longer possible. The
limit depends solely on the signal transmission technique, and repeaters can always be installed.
Section 2 Page 26
2. Bridges and Switches
Network design (1) _ Hubs
[Diagram: building cabled with hubs — Sales, R&D, Finances, Import and Export departments;
(1) wiring, (2) communication between two stations floods the whole LAN]
Let’s now consider a scenario in which a building is cabled using Hubs and how communication takes place
between two stations.
The frames exchanged are broadcast over the whole LAN, preventing other exchanges from taking place
simultaneously and also bothering stations that are not concerned by the transaction.
Section 2 Page 27
2. Bridges and Switches
Network design (2) _ Bridge and hubs
[Diagram: the same building with a filtering bridge between two hub segments — Sales, R&D,
Finances, Import and Export departments]
Compared with a cable set-up based on segmentation, you can see that communication is more effective
when the stations are on the same segment.
Section 2 Page 28
2. Bridges and Switches
Network design (2) _ Bridge and hubs
[Diagram: the same bridged set-up, with traffic crossing the bridge between stations located on
different segments]
But the same drawbacks exist for communications between stations located on different segments.
Section 2 Page 29
2. Bridges and Switches
Network design (3) _ Switches
[Diagram: building cabled with a switch (micro-segmentation) — Sales, R&D, Finances, Import and
Export departments; (1) wiring, (2) communication]
Micro-segmentation simplifies cabling, since the connections are centralized in a single technical
location. A switch usually has a large number of ports, and several switches can be stacked and
interconnected using special links.
Section 2 Page 30
Answer the Questions
Simpler Management
Support of Voice
Section 2 Page 31
Answer the Questions [cont.]
Auto-negotiation
Full duplex
Half duplex
Spanning Tree
Section 2 Page 32
Answer the Questions [cont.]
Section 2 Page 33
Answer the Questions [cont.]
No matches Filter
Section 2 Page 34
Answer the Questions [cont.]
Ethernet Physical
Auto-negotiation Network
IP Transport
Section 2 Page 35
3. Virtual LAN
Section 2 Page 36
3. Virtual LANs
Problem
[Diagram: switch (SW) with Finances (F) and Marketing (M) stations; a broadcast frame
(ff:ff:ff:ff:ff:ff) is flooded to every port]
F _ Finances
M _ Marketing
Physical and logical topology: a single network
Broadcast traffic is seen and processed by all the users connected to the switch, regardless of
whether they are concerned by the content of the message. Security is also weak in this
environment: a user with a packet sniffer is able to see the content of many messages.
Section 2 Page 37
3. Virtual LANs
Solution
VLAN id Members
10 (Marketing) Ports 2, 5, 6
20 (Finances) Ports 1, 3, 4
[Diagram: the same switch with the VLAN table above — a broadcast frame is now flooded only to the
ports of its own VLAN]
The best solution to simple broadcast contention is the use of VLANs. Even though users are still
physically connected to the same device, they are isolated in different logical networks, and no
traffic from one VLAN can be seen by a user of another VLAN.
The simplest way to create a VLAN in a switch is per port: each port is explicitly assigned to a VLAN.
The port-VLAN association is stored by the switch in the VLAN table. Each VLAN is identified by a
VLAN ID, a number between 0 and 4095. Usually, VLANs are also given a label that is easier to remember
than a number. By default, all ports in the switch are members of VLAN 1; configuring a VLAN for a port
means removing the port from VLAN 1 and assigning it to the new VLAN.
After VLANs have been implemented, instead of forwarding broadcast traffic to every port, the switch
will forward a broadcast frame only to the ports that are members of the same VLAN as the port
originating it. Unicast traffic will be forwarded to the destination port only if it is a member of the same
VLAN as the source.
Inter-VLAN communication is not possible at layer 2: a layer 2 switch cannot switch frames between two
different VLANs.
Other methods to implement VLANs: by MAC address, by protocol, or LANE (LAN Emulation, for ATM
transport).
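The flooding rule just described can be sketched with the per-port VLAN table from the slide (port numbers as shown above; the helper name is ours):

```python
# Per-port VLAN table from the slide: VLAN id -> member ports.
vlan_members = {10: {2, 5, 6},   # Marketing
                20: {1, 3, 4}}   # Finances
port_vlan = {p: v for v, ports in vlan_members.items() for p in ports}

def broadcast_out_ports(ingress_port):
    """A broadcast frame is flooded only to the other ports of the
    ingress port's own VLAN, never to ports of another VLAN."""
    vlan = port_vlan[ingress_port]
    return sorted(vlan_members[vlan] - {ingress_port})
```

A broadcast received on port 2 (Marketing) goes out ports 5 and 6 only; the Finances ports never see it.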
Section 2 Page 38
3. Virtual LANs
Access links
VLAN Members
10 (Marketing) Ports 2, 5, 6
20 (Finances) Ports 1, 3, 4
[Diagram: Ethernet switch with Finances (F) stations attached to untagged access ports; broadcast
frames stay within the VLAN]
An access port is a switch port that is connected to a terminal device, e.g. a PC or a printer. It is a
member of a single VLAN.
As all the traffic originated on or destined for this port belongs to the same VLAN, no particular
mechanism is needed to mark the frames (the VLAN membership of the port is already known to the
switch). In this case, the port is untagged. The untagged VLAN is also called the native VLAN.
Section 2 Page 39
3. Virtual LANs
VLAN spanning multiple switches _ Problem
[Diagram: switches SW1 and SW2 interconnected through Port 7 on each. SW1 stations on Ports 1-6:
F M F F M M; SW2 stations on Ports 1-6: F E M F E M (F = Finances, M = Marketing, E = Engineering).
A broadcast frame arrives at SW2 on Port 7 — which VLAN does it belong to?]
SW1 VLAN table:
VLAN id Members
10 (Marketing) Ports 2, 5, 6, 7
20 (Finances) Ports 1, 3, 4, 7
SW2 VLAN table:
VLAN id Members
10 (Marketing) Ports 3, 6, 7
11 (Engineering) Ports 2, 5
20 (Finances) Ports 1, 4, 7
Section 2 Page 40
3. Virtual LANs
VLAN tagging
[Diagram: the same two switches exchanging broadcast frames across the inter-switch link; each frame
crossing the link carries its VLAN identity. Stations SW1: F M F F M M; SW2: F E M F E M]
To extend a VLAN across several switches, the switches are interconnected using trunks.
Unlike access links, trunks can carry the traffic of multiple VLANs. To identify the VLAN a frame
belongs to, a label or tag is added to the frame; it contains information about the VLAN originating
the frame. A frame carrying a VLAN tag is called a tagged frame.
In a trunk, only one VLAN can be untagged (the native VLAN). Frames originating in all the other VLANs
must be labelled before transport.
Section 2 Page 41
3. Virtual LANs
Trunking
Frame on the trunk: Dest | Src | 802.1q tag | Ethertype | Data | FCS
[Diagram: SW1 and SW2 interconnected through Port 7 on each]
Port 7 is member of:
VLAN 10 -> tag = 10
VLAN 20 -> tag = 20
VLAN 1 -> untagged
By default, a trunk carries all the VLANs configured in the switch. The process of removing unused
VLANs from a trunk is called "VLAN pruning".
Section 2 Page 42
3. Virtual LANs
802.1Q tagging
802.1Q tagged frame format:
Destination Address
Source Address
Ethertype = 0x8100 (indicates that the next field contains a VLAN tag)
Tag Control Information (2 bytes)
Length/Type
Data
PAD
FCS
The complete tag is 4 bytes: the 0x8100 Ethertype (TPID) plus the 2-byte Tag Control Information,
whose fields are:
User Priority (3 bits) _ used for Class of Service (CoS) marking in 802.1p
CFI (1 bit) _ Canonical Format Identifier; set to 0 for Ethernet networks
VID (VLAN ID, 12 bits) _ VLAN identifier; it can take values in the range between 0 and 4095.
Value 1 is usually assigned to the Default VLAN.
The tagging scheme proposed by the 802.3ac standard recommends the addition of the four octets after
the source MAC address. Their presence is indicated by a particular value of the EtherType field (called
TPID), which has been fixed to be equal to 0x8100. When a frame has the EtherType equal to 0x8100,
this frame carries the tag IEEE 802.1Q/802.1p. The tag is stored in the following two octets and it
contains 3 bits of user priority, 1 bit of Canonical Format Identifier (CFI), and 12 bits of VLAN ID (VID).
The 3 bits of user priority are used by the 802.1p standard; the CFI is used for compatibility reasons
between Ethernet-type networks and Token Ring-type networks. The VID is the identification of the
VLAN, which is basically used by the 802.1Q standard; being on 12 bits, it allows the identification of
4096 VLANs.
After the two octets of the TPID and the two octets of the Tag Control Information field come the two
octets that, in an untagged frame, would have followed the Source Address field (where the TPID now
sits). They contain either the MAC length in the case of IEEE 802.3 or the EtherType in the case of
Ethernet II.
Note _ Adding a tag to a frame implies that the FCS field has to be recomputed by the switch.
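The four tag octets described above pack directly into bytes; a minimal sketch (the helper name is ours):

```python
def dot1q_tag(vid, priority=0, cfi=0):
    """Build the 4 tag octets: TPID 0x8100 followed by the 16-bit TCI
    (3 bits User Priority, 1 bit CFI, 12 bits VLAN ID)."""
    if not (0 <= vid < 4096 and 0 <= priority < 8 and cfi in (0, 1)):
        raise ValueError("field out of range")
    tci = (priority << 13) | (cfi << 12) | vid
    return bytes([0x81, 0x00, tci >> 8, tci & 0xFF])
```

For example, VLAN 10 with default priority yields the octets 81 00 00 0A.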
Section 2 Page 43
3. Virtual LANs
Aggregation layer problem
[Diagram: three Customer 1 sites using VLANs 40, 41 and 42, connected through the provider network —
on which link should the egress switch forward a frame tagged 40?]
Frame: Dest | Src | 802.1q tag | Ethertype | Data | FCS
A Service Provider that offers transport services to its clients must support the client VLANs, i.e.
transparently transport the VLAN tag across the network. This means that all the provider's customers
share the same VLAN space, i.e. the VLAN ID range 1 to 4095.
Two customers configuring their networks independently might choose identical VLAN identifiers. In
that case, the provider's egress switch cannot determine which customer network is the actual
destination of the frame, so no overlapping can be allowed. Besides, the maximum of 4095 VLANs is
usually sufficient for an enterprise network but might not be enough for a provider network.
Section 2 Page 44
3. Virtual LANs
Q in Q tagging
VLAN ID 10 -> Customer 1 -> Port 2
VLAN ID 20 -> Customer 2 -> Port 5
[Diagram: Customer 1 sites (VLANs 40, 41) and Customer 2 sites (VLANs 30, 40) connected across the
Service Provider network; the provider pushes an outer tag identifying the customer in front of the
customer tag, e.g. 10|40 for Customer 1's VLAN 40]
Frame: Dest | Src | Customer ID | Site ID | Ethertype | Packet | FCS
A solution to the problem in the previous slide is the use of an additional VLAN tag. This tag can be
inserted by the provider or by the remote CPE, and it identifies the customer or the service. This
method of encapsulation is called Q-in-Q.
With Q-in-Q encapsulation, every customer can potentially use the whole VLAN ID space.
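Pushing the outer tag is just an insertion after the two MAC addresses. A sketch (function names are ours; we use TPID 0x8100 for simplicity, though provider bridges commonly use 0x88A8 per 802.1ad):

```python
def tag(vid, tpid=0x8100):
    """4 tag octets: TPID plus a TCI carrying only the VLAN ID."""
    return bytes([tpid >> 8, tpid & 0xFF, (vid >> 8) & 0x0F, vid & 0xFF])

def push_outer_tag(frame, provider_vid):
    """Insert the provider (outer) tag right after the two 6-byte MAC
    addresses, leaving the customer's own tag untouched behind it."""
    return frame[:12] + tag(provider_vid) + frame[12:]

# customer frame already carrying customer tag VLAN 40 (MACs zeroed for brevity)
customer = bytes(12) + tag(40) + b"\x08\x00payload"
qinq = push_outer_tag(customer, 10)
```

After the push, the frame carries 10|40: the provider network switches on the outer tag and pops it at the egress, restoring the customer frame.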
Section 2 Page 45
4. LAN Authentication
Section 2 Page 46
4. LAN Authentication
Who are you ?
Authorized User
Protected resources
Unauthorized User
Section 2 Page 47
4. LAN Authentication
802.1x components
[Diagram: supplicants on the left — authenticators in the middle (a switch acting as Network Access
Server, a wireless Access Point) — authentication server and the protected network on the right;
(2) supplicant-authenticator exchange, (3) authenticator-server exchange; a wireless supplicant first
forms an association with the Access Point]
1. The authenticator detects the presence of the client and sets the port to the "unauthorized" state. The authenticator sends an EAP-Request to the supplicant.
2. The supplicant responds and the authenticator forwards the response to the RADIUS server, which verifies the client's credentials.
3. If the authentication server accepts the request, the authenticator sets the port to the "authorized" state and normal traffic is forwarded.
IEEE 802.1X is an IEEE standard for port-based Network Access Control. It provides an authentication
mechanism to devices wishing to attach to a LAN, either establishing a point-to-point connection or
preventing it if authentication fails. It is used for most wireless 802.11 access points and is based on the
Extensible Authentication Protocol (EAP).
802.1X involves communications between a supplicant, authenticator, and authentication server. The
supplicant is often software on a client device, such as a laptop, the authenticator is a wired Ethernet
switch or wireless access point, and an authentication server is generally a RADIUS database. The
authenticator acts like a security guard to a protected network. The supplicant (i.e., client device) is not
allowed access through the authenticator to the protected side of the network until the supplicant’s
identity is authorized.
Upon detection of the new client (supplicant), the port on the switch (authenticator) is enabled and set to
the "unauthorized" state. In this state, only 802.1X traffic is allowed; other traffic, such as DHCP and HTTP, is
blocked at the data link layer. The authenticator sends out the EAP-Request identity to the supplicant, the
supplicant responds with the EAP-response packet that the authenticator forwards to the authenticating
server. If the authenticating server accepts the request, the authenticator sets the port to the "authorized"
mode and normal traffic is allowed. When the supplicant logs off, it sends an EAP-logoff message to the
authenticator. The authenticator then sets the port to the "unauthorized" state, once again blocking all non-
EAP traffic.
Note_ In wireless environments, instead of a physical link, the supplicant creates an association with an
access point.
Section 2 Page 48
4. LAN Authentication
EAP message format
EAP packet format:
Code (1 byte) | Identifier (1 byte) | Total packet length (2 bytes) | Data
Codes:
1 Request
2 Response
3 Success
4 Failure
EAP Request/Response packets carry, in the Data field: Type (1 byte) | Length | Type-Data. Types:
1 = Identity
2 = Notification
3 = Nak (response only)
4 = MD5-Challenge
5 = OTP (One Time Password)
9 = RSA Public Key Authentication
13 = EAP-TLS
17 = EAP-Cisco Wireless (LEAP)
21 = EAP-TTLS
22 = Remote Access Service
23 = UMTS Authentication and Key Agreement
25 = PEAP
26 = MS-EAP Authentication
…
EAP Success/Failure packets carry only: Code (1 byte) | Identifier (1 byte) | Total packet length (2 bytes)
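The format above packs directly; a minimal sketch for building EAP packets (the function name is ours):

```python
import struct

def eap_packet(code, identifier, eap_type=None, type_data=b""):
    """Code (1) | Identifier (1) | Length (2, whole packet) | [Type | Type-Data].
    Success/Failure packets (codes 3 and 4) carry no Type field."""
    body = b"" if eap_type is None else bytes([eap_type]) + type_data
    return struct.pack("!BBH", code, identifier, 4 + len(body)) + body

# EAP-Request/Identity: Code 1 (Request), Type 1 (Identity)
request_identity = eap_packet(1, 0, eap_type=1)
```

The EAP-Request/Identity built here is the first message the authenticator sends to a newly detected supplicant.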
Section 2 Page 49
4. LAN Authentication
802.1x authentication
[Diagram: 802.1X exchange — presence detected; EAPOL carries the EAP Identity Request and the
subsequent EAP messages between supplicant and authenticator; if the server rejects (RADIUS
Access-Reject), the supplicant receives an EAP-Failure]
The protocol used to carry the EAP messages between the supplicant and the authenticator in 802.1X is
called EAP encapsulation over LANs (EAPOL). It is currently defined for Ethernet-like LANs, including
802.11 wireless, as well as token ring LANs such as FDDI. A "type 0" EAPOL frame carries an EAP
message; the "type 0" indicates to the receiver (either supplicant or authenticator) that it should
strip off the EAPOL encapsulation and process the EAP data.
EAP messages are encapsulated and transported within Ethernet frames with the Ethertype field set to
the value 0x888E. EAPOL carries the messages across the LAN between the supplicant and the
authenticator, while RADIUS or Diameter carries them between the authenticator and the
authentication server.
Section 2 Page 50
Blank page
Section 2 Page 51
End of Section
Section 2 Page 52
Section 3
Point to Point Transport
IP Technology
IP for Mobile Networks
TTP18031 D0 SG DEN I1.0
Document History
PPP frame format:
Flag | Address | Control | Protocol | Payload | FCS | Flag
7E | FF | 03 | 2 bytes | maximum 1500 bytes | 2 or 4 bytes | 7E
[Diagram: PPP connection between two routers across a transport network (leased line, SDH/PDH,
ISDN, PSTN, L2TP/GRE tunnels, etc.)]
PPP is a connection-oriented protocol that enables layer two links over a variety of different physical
layer connections. It is supported on both synchronous and asynchronous lines, and can operate in half-
duplex or full-duplex mode. It was designed to carry IP traffic but is general enough to allow any type of
network layer datagram to be sent over a PPP connection. As its name implies, it is for point-to-point
connections between exactly two devices, and assumes that frames are sent and received in the same
order.
PPP is a complete link layer protocol suite for devices using TCP/IP, which provides framing,
encapsulation, authentication, quality monitoring and other features to enable robust operation of
TCP/IP over a variety of physical layer connections.
Flag: Indicates the start of a PPP frame. Always has the value “01111110” binary (0x7E)
Address: this field has no real meaning; it is thus always set to "11111111" (0xFF or 255 decimal),
which is equivalent to a broadcast (it means "all stations").
Control: in PPP it is always set to "00000011" (3 decimal).
Protocol: Identifies the protocol of the datagram encapsulated in the Information field of the frame.
Information: Zero or more bytes of payload that contains either data or control information, depending
on the frame type. For regular PPP data frames the network-layer datagram is encapsulated here. For
control frames, the control information fields are placed here instead.
Padding: In some cases, additional dummy bytes may be added to pad out the size of the PPP frame.
Frame Check Sequence (FCS): A checksum computed over the frame to provide basic protection against
errors in transmission. This is a CRC code similar to the one used for other layer two protocol error
protection schemes such as the one used in Ethernet. It can be either 16 bits or 32 bits in size (default is
16 bits). The FCS is calculated over the Address, Control, Protocol, Information and Padding fields.
Flag: Indicates the end of a PPP frame. Always has the value “01111110” binary (0x7E)
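The 16-bit FCS computation follows RFC 1662; a sketch of both sides (the frame below is illustrative):

```python
def _crc16_run(data):
    """RFC 1662 CRC run: init 0xFFFF, reflected polynomial 0x8408."""
    fcs = 0xFFFF
    for byte in data:
        fcs ^= byte
        for _ in range(8):
            fcs = (fcs >> 1) ^ 0x8408 if fcs & 1 else fcs >> 1
    return fcs

def ppp_fcs16(data):
    """Transmitted FCS: ones-complement of the final CRC remainder."""
    return _crc16_run(data) ^ 0xFFFF

def fcs_ok(frame):
    """Good-FCS check: the CRC run over the whole frame, FCS included
    (LSB first), leaves the fixed residue 0xF0B8."""
    return _crc16_run(frame) == 0xF0B8

# Address, Control, Protocol (C021 = LCP), then some data
body = bytes([0xFF, 0x03, 0xC0, 0x21]) + b"hello"
fcs = ppp_fcs16(body)
frame = body + bytes([fcs & 0xFF, fcs >> 8])   # FCS transmitted LSB first
```

The FCS is calculated over the Address, Control, Protocol, Information and Padding fields, exactly as described above; the Flags are excluded.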
All Rights Reserved © Alcatel-Lucent 2009
Data transfer
Even though PPP is called a “protocol” and even though it is considered part of TCP/IP—depending on
whom you ask—it is really more a protocol suite than a particular protocol. The operation of PPP is based
on procedures defined in many individual protocols.
PPP Encapsulation Method: The primary job of PPP is to take higher-layer messages such as IP datagrams
and encapsulate them for transmission over the underlying physical layer link. To this end, PPP defines a
special frame format for encapsulating data for transmission, based on the framing used in the HDLC
protocol. The PPP frame has been specially designed to be small in size and contain only simple fields, to
maximize bandwidth efficiency and speed in processing.
Link Control Protocol (LCP): The PPP Link Control Protocol (LCP) is responsible for setting up,
maintaining and terminating the link between devices. It is a flexible, extensible protocol that allows
many configuration parameters to be exchanged to ensure that both devices agree on how the link will
be used.
Network Control Protocols (NCPs): PPP supports the encapsulation of many different layer three
datagram types. Some of these require additional setup before the link can be activated. After the
general link setup is completed with LCP, control is passed to the PPP Network Control Protocol (NCP)
specific to the layer three protocol being carried on the PPP link. For example, when IP is carried over
PPP the NCP used is the PPP Internet Protocol Control Protocol (IPCP). Other NCPs are defined for
supporting the IPX protocol, the NetBIOS Frames (NBF) protocol, and so forth.
PPP protocol stack:
Network layer: IP | IPX | AppleTalk
PPP: NCPs; authentication protocols (CHAP, PAP)
Link layer: LCP over HDLC-based framing
LCP Support Protocols: Several protocols are included in the PPP suite that are used during the link
negotiation process, either to manage it or to configure options. Examples include the authentication
protocols CHAP and PAP, which are used by LCP during the optional authentication phase.
LCP Optional Feature Protocols: A number of protocols have been added to the basic PPP suite over the
years to enhance its operation after a link has been set up and datagrams are being passed between
devices. For example, the PPP Compression Control Protocol (CCP) allows compression of PPP data, the
PPP Encryption Control Protocol (ECP) enables datagrams to be encrypted for security, and the PPP
Multilink Protocol (ML/PPP) allows a single PPP link to be operated over multiple physical links. The use
of these features often also requires additional setup during link negotiation, so several define
extensions (such as extra configuration options) that are negotiated as part of LCP.
LCP packet format:
Code (8 bits) | Identifier (8 bits) | Length (16 bits) | Data
Length = Code + Identifier + Length + Data
Set-up codes:
1: Configure-Request
2: Configure-Ack
3: Configure-Nak
4: Configure-Reject
Each option in the Data field is encoded as Type | Length | Data, where Length = Type + Length + Data.
1. Link Configuration packets used to establish and configure a link (Configure-Request, Configure-
Ack, Configure-Nak and Configure-Reject).
2. Link Termination packets used to terminate a link (Terminate-Request and Terminate-Ack).
3. Link Maintenance packets used to manage and debug a link (Code-Reject, Protocol-Reject, Echo-
Request, Echo-Reply, and Discard-Request).
LCP configuration options (Type | Length | Data | Default):
Maximum-Receive-Unit | 01 | 04 | Maximum Receive Unit (16 bits) | default 1500
Authentication-Protocol | 03 | ≥04 | C023 (PAP) or C223 (CHAP) + Data | default: no authentication
Async-Control-Character-Map | 02 | 06 | character map (32 bits) | default 0xffffffff
Maximum-Receive-Unit
This Configuration Option may be sent to inform the peer that the implementation can receive
larger frames, or to request that the peer send smaller frames. If smaller frames are requested, an
implementation MUST still be able to receive 1500 octet frames in case link synchronization is lost.
Authentication-Protocol
On some links it may be desirable to require a peer to authenticate itself before allowing network-
layer protocol packets to be exchanged. This Configuration Option provides a way to negotiate the
use of a specific authentication protocol. By default, authentication is not necessary.
Quality-Protocol
On some links it may be desirable to determine when, and how often, the link is dropping data. This
process is called link quality monitoring.
This Configuration Option provides a way to negotiate the use of a specific protocol for link quality
monitoring. By default, link quality monitoring is disabled.
Async-Control-Character-Map
This Configuration Option provides a way to negotiate the use of control character mapping on
asynchronous links. By default, PPP maps all control characters into an appropriate two character
sequence. However, it is rarely necessary to map all control characters and often it is unnecessary
to map any characters.
Protocol-Field-Compression | 07 | 02 | default: no compression
Magic-Number
The Magic-Number field is four octets and aids in detecting links which are in the looped-back
condition.
Protocol-Field-Compression
This Configuration Option provides a way to negotiate the compression of the Data Link Layer
Protocol field. By default, all implementations MUST transmit standard PPP frames with two octet
Protocol fields. However, PPP Protocol field numbers are chosen such that some values may be
compressed into a single octet form which is clearly distinguishable from the two octet form.
Address-and-Control-Field-Compression
This Configuration Option provides a way to negotiate the compression of the Data Link Layer
Address and Control fields. By default, all implementations MUST transmit frames with Address
and Control fields and MUST use the hexadecimal values 0xff and 0x03 respectively. Since these
fields have constant values, they are easily compressed. This Configuration Option is sent to
inform the peer that the implementation can receive compressed Address and Control fields.
Compressed Address and Control fields are formed by simply omitting them.
Callback
This Configuration Option provides a method for an implementation to request a dial-up peer to
call back. This option might be used for many diverse purposes, such as savings on toll charges.
Compound-Frames
This Configuration Option provides a method for an implementation to send multiple PPP
encapsulated packets within the same frame.
LCP negotiation example between A and B:
A -> B: Configure-Request, Id: 20 { MRU: 1000; asyncmap: 0; Auth: PAP; Magic-Number: 2f4e6a;
Addr/Ctl compression }
B's view: MRU 1000 (ack); asyncmap 0 (nak); Auth PAP (ack); Magic-Number 2f4e6a (ack);
Addr/Ctl compression (ack)
B -> A: Configure-Nak, Id: 20 { asyncmap: 0x2000 }
A prefers the default value of asyncmap, so it omits the option from its next request:
A -> B: Configure-Request, Id: 21 { MRU: 1000; Auth: PAP; Magic-Number: 2f4e6a; Addr/Ctl compression }
B -> A: Configure-Ack, Id: 21 { MRU: 1000; Auth: PAP; Magic-Number: 2f4e6a; Addr/Ctl compression }
The process starts with the initiating device (e.g. A) creating a Configure-Request frame that contains a
variable number of configuration options that it wants to see set up on the link. This is basically device
A's "wish list" for how it wants the link created.
The other device receives the Configure-Request and processes it. It then has three choices of how to
respond:
If all the options that device A sent are recognized and their values are acceptable, device B
returns a Configure-Ack.
If all the options that device A sent are valid ones that device B recognizes and is capable of
negotiating, but it doesn't accept the values device A sent, then device B returns a Configure-Nak
("negative acknowledge") frame. This message includes a copy of each configuration option that B
found unacceptable.
If any of the options that A sent were either unrecognized by B, or represent ways of using the link
that B considers not only unacceptable but not even subject to negotiation, it returns a Configure-
Reject containing each of the objectionable options.
Even after receiving a reject, device A can retry the negotiation with a new Configure-Request.
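The three-way choice can be sketched as a responder rule (the names and the encoding of options as (type, value) pairs are ours):

```python
def respond(options, recognized, acceptable):
    """options: list of (type, value) pairs from a Configure-Request.
    recognized: set of option types this device understands.
    acceptable: predicate deciding whether a (type, value) pair is agreeable."""
    # Unrecognized or non-negotiable options trump everything: Reject.
    rejected = [o for o in options if o[0] not in recognized]
    if rejected:
        return ("Configure-Reject", rejected)
    # Recognized but with unacceptable values: Nak listing only those.
    naked = [o for o in options if not acceptable(o)]
    if naked:
        return ("Configure-Nak", naked)
    return ("Configure-Ack", options)
```

Mirroring the exchange above, a request carrying asyncmap 0 draws a Nak listing only that option; the peer then retries without it.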
Example trace:
LCP Conf-Req Id:1 { MRU:1524, Async_map:0x00000000, Authent_prot:PAP, Prot_comp, Addr/ctl_comp }
LCP Conf-Ack Id:2 { Async_map:0x000a0000, Magic_number:0x00217cbb, Prot_comp, Addr/ctl_comp }
[Diagram: PAP exchange — (1) the user enters user name "Jack" and password "secret"; (2) the PAP
Authenticate-Request carries them in the clear; (3) the server compares them with its stored entry
("Jack" + "secret"); (4) PAP Authenticate-Ack]
The Password Authentication Protocol (PAP) provides a simple method for the peer to establish its
identity using a 2-way handshake. This is done only upon initial link establishment.
PAP is not a strong authentication method. Passwords are sent over the circuit "in the clear", and there
is no protection from playback.
When PAP is enabled, the remote router attempting to connect to the access server is required to send
an authentication request. If the username and password specified in the authentication request are
accepted, the Cisco IOS software sends an authentication acknowledgement.
After you have enabled CHAP or PAP, the access server will require authentication from remote devices
dialing in to the access server. If the remote device does not support the enabled protocol, the call will
be dropped.
To use CHAP or PAP, you must perform the following tasks:
1. Enable PPP encapsulation.
2. Enable CHAP or PAP on the interface.
3. For CHAP, configure host name authentication and the secret or password for each remote
system with which authentication is required.
PAP packet format (PPP Protocol field = C023):
Code (1 byte) | Identifier (1 byte) | Length (2 bytes) | Data
Codes:
1: Authenticate-Request
2: Authenticate-Ack
3: Authenticate-Nak
Authenticate-Request data: Peer-ID Length (1 byte) | Peer-ID | Passwd-Length (1 byte) | Password
Authenticate-Ack/Nak data: Msg-Length (1 byte) | Message
RFC 1334
The Code field is one octet and identifies the type of PAP packet. PAP Codes are assigned as follows:
1 Authenticate-Request
2 Authenticate-Ack
3 Authenticate-Nak
Identifier
The Identifier field is one octet and aids in matching requests and replies.
Length
The Length field is two octets and indicates the length of the PAP packet including the Code,
Identifier, Length and Data fields. Octets outside the range of the Length field should be treated as
Data Link Layer padding and should be ignored on reception.
Data
The Data field is zero or more octets. The format of the Data field is determined by the Code
field.
Peer-ID
The Peer-ID field is zero or more octets and indicates the name of the peer to be
authenticated.
Password
The Password field is zero or more octets and indicates the password to be used for
authentication.
Message
The Message field is zero or more octets, and its contents are implementation
dependent. It is intended to be human readable, and MUST NOT affect operation of the
protocol. It is recommended that the message contain displayable ASCII characters.
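An Authenticate-Request packs straight from the format above (the function name is ours):

```python
import struct

def pap_authenticate_request(identifier, peer_id, password):
    """Code 1 | Identifier | Length (whole packet) |
    Peer-ID Length | Peer-ID | Passwd-Length | Password."""
    data = bytes([len(peer_id)]) + peer_id + bytes([len(password)]) + password
    return struct.pack("!BBH", 1, identifier, 4 + len(data)) + data

pkt = pap_authenticate_request(1, b"Jack", b"secret")
```

Note that the password travels as plain octets in the packet, which is exactly why PAP is considered weak.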
[Diagram: CHAP exchange — (4) the remote device feeds the ID, its secret and the random challenge
into the MD5 non-reversible algorithm; (5) the Response carries the hash and the name "Jack";
(6) the access server recomputes the hash with the secret it stores for "Jack" and compares (=);
(7) Success _ authentication succeeded]
The Challenge-Handshake Authentication Protocol (CHAP) is used to periodically verify the identity of the
peer using a 3-way handshake.
When CHAP is enabled on an interface and a remote device attempts to connect to it, the access server
sends a CHAP packet to the remote device. The CHAP packet requests or "challenges" the remote
device to respond. The challenge packet consists of an ID, a random number, and the host name of the
local router.
When the remote device receives the challenge packet, it concatenates the ID, its own password (the
shared secret), and the random number, and runs the result through a one-way hash (MD5). The
remote device sends the result back to the access server, along with the name associated with the
password used in the hashing process.
When the access server receives the response, it uses the name it received to retrieve the password
stored in its user database. The retrieved password should be the same password the remote device
used. The access server then computes the same hash with the retrieved password; if the result
matches the value in the response packet, authentication succeeds.
The benefit of using CHAP authentication is that the remote device's password is never transmitted in
clear text. This prevents other devices from stealing it and gaining illegal access to the ISP's network.
CHAP transactions occur only at the time a link is established. The access server does not request a
password during the rest of the call. (The local device can, however, respond to such requests from
other devices during a call.)
After you have enabled CHAP, the access server will require authentication from remote devices dialing
in to the access server. If the remote device does not support the enabled protocol, the call will be
dropped.
CHAP packet format (PPP Protocol field = C223):
Code (1 byte) | Identifier (1 byte) | Length (2 bytes) | Data
Codes:
1: Challenge
2: Response
3: Success
4: Failure
Challenge/Response data: Value-Size (1 byte) | Value (128 bits for MD5) | Name of the system
transmitting this packet
The Challenge packet is used to begin the Challenge-Handshake Authentication Protocol. The
authenticator MUST transmit a CHAP packet with the Code field set to 1 (Challenge).
A Challenge packet MAY also be transmitted at any time during the Network-Layer Protocol phase to
ensure that the connection has not been altered.
Whenever a Challenge packet is received, the peer MUST transmit a CHAP packet with the Code field set
to 2 (Response).
Whenever a Response packet is received, the authenticator compares the Response Value with its own
calculation of the expected value. Based on this comparison, the authenticator MUST send a Success or
Failure packet
The Challenge Value is a variable stream of octets. The importance of the uniqueness of the Challenge
Value. The Challenge Value MUST be changed each time a Challenge is sent.
The Response Value is the one-way hash calculated over a stream of octets consisting of the Identifier,
followed by (concatenated with) the "secret", followed by (concatenated with) the Challenge Value.
The Name field is one or more octets representing the identification of the system transmitting the
packet.
The Message field is zero or more octets, and its contents are implementation dependent. It is intended
to be human readable, and MUST NOT affect operation of the protocol. It is recommended that the
message contain displayable ASCII characters.
Note: Because the Success might be lost, the authenticator MUST allow repeated Response packets after
completing the Authentication phase. To prevent discovery of alternative Names and Secrets, any
Response packets received having the current Challenge Identifier MUST return the same reply Code
returned when the Authentication phase completed (the message portion MAY be different). Any
Response packets received during any other phase MUST be silently discarded.
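The Response computation described above (MD5 over the Identifier, the secret, and the Challenge Value, concatenated in that order) can be sketched in a few lines of Python. Variable names are illustrative only, not taken from any particular implementation:

```python
import hashlib
import os

def chap_response(identifier: int, secret: bytes, challenge: bytes) -> bytes:
    """Response Value = MD5(Identifier || secret || Challenge), per the text above."""
    return hashlib.md5(bytes([identifier]) + secret + challenge).digest()

# Authenticator side: issue a fresh Challenge (it MUST change every time).
identifier = 1
challenge = os.urandom(16)
secret = b"shared-secret"          # known to both sides, never sent on the wire

# Peer side: compute and send the Response Value.
response = chap_response(identifier, secret, challenge)

# Authenticator side: recompute with the locally stored secret and compare.
expected = chap_response(identifier, secret, challenge)
print("Success" if expected == response else "Failure")
```

Because only the hash travels on the link, an eavesdropper sees the Challenge and the Response but never the secret itself.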
NCP for IP (IPCP)
Link management codes: 5: Terminate-Request, 6: Terminate-Ack, 7: Code-Reject.
IPCP configuration option types:
1: obsolete
2: IP compression protocol (RFC 1332)
3: IP Address (RFC 1332)
4: Mobile-IPv4 (RFC 2290)
129: Primary DNS Server Address (RFC 1877)
130: Primary NBNS Server Address (RFC 1877)
131: Secondary DNS Server Address (RFC 1877)
132: Secondary NBNS Server Address (RFC 1877)
The IP Control Protocol (IPCP) is the NCP for IP and is responsible for configuring, enabling, and disabling
the IP protocol on both ends of the point-to-point link. The IPCP options negotiation sequence is the
same as for LCP, thus allowing the possibility of reusing the code.
IP-Address _ provides a way to negotiate the IP address to be used on the local end of the link. It allows
the sender of the Configure-Request to state which IP-address is desired, or to request that the peer
provide the information. The peer can provide this information by NAKing the option, and returning a
valid IP-address.
If negotiation about the remote IP-address is required, and the peer did not provide the option in its
Configure-Request, the option SHOULD be appended to a Configure-Nak. The value of the IP-address
given must be acceptable as the remote IP-address, or indicate a request that the peer provide the
information. By default, no IP address is assigned.
DNS Server Address _ defines a method for negotiating with the remote peer the address of the primary
and secondary DNS server to be used on the local end of the link. If the local peer requests an invalid
server address (which it will typically do intentionally), the remote peer specifies the address by NAKing
this option and returning the IP address of a valid DNS server. By default, no address is provided.
NBNS Server Address _ defines a method for negotiating with the remote peer the address of the
primary and secondary NBNS server to be used on the local end of the link. If the local peer requests an
invalid server address (which it will typically do intentionally), the remote peer specifies the address by
NAKing this option and returning the IP address of a valid NBNS server. By default, no primary NBNS
address is provided.
IPCP address negotiation between client and ISP (PPP Protocol field 0x8021):
1. Client → ISP: Configure-Request (Code=01, Ident, Length=0A), option 03 (IP-Address), length 06, value 0.0.0.0 (or the wished IP address).
2. ISP → Client: Configure-Nak (Code=03, Ident, Length=0A), option 03 06, carrying a valid IP address: 194.1.2.3.
3. Client → ISP: Configure-Request (Code=01, Ident, Length=0A), option 03 06, value 194.1.2.3.
4. ISP → Client: Configure-Ack (Code=02, Ident, Length=0A), option 03 06, value 194.1.2.3.
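The packets of this exchange can be reproduced byte for byte with a short Python sketch. The `ipcp_packet` helper below is hypothetical, written only to illustrate the Code/Identifier/Length layout and the type-3 (IP-Address, length 6) option:

```python
import struct
import socket

def ipcp_packet(code: int, identifier: int, ip: str) -> bytes:
    """IPCP packet carrying a single IP-Address option (type 3, length 6)."""
    option = struct.pack("!BB4s", 3, 6, socket.inet_aton(ip))
    length = 4 + len(option)            # Code + Identifier + Length + options
    return struct.pack("!BBH", code, identifier, length) + option

CONF_REQ, CONF_ACK, CONF_NAK = 1, 2, 3

# Client asks the peer to provide an address (0.0.0.0 = "please assign one").
req = ipcp_packet(CONF_REQ, 1, "0.0.0.0")
# The ISP answers with a Configure-Nak carrying a valid address.
nak = ipcp_packet(CONF_NAK, 1, "194.1.2.3")
print(req.hex())   # -> 0101000a030600000000
```

The hex dump matches the fields shown in the exchange: Code 01, Identifier 01, Length 000A, then option 03 06 followed by the four address bytes.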
One important option used with IPCP is Van Jacobson Header Compression, which reduces the size of
the combined IP and TCP headers from 40 bytes to approximately 4. It does so by recording the state of
a set of TCP connections at each end of the link and replacing the full headers with encoded updates for
the normal case, in which many of the fields are unchanged or are incremented by small amounts
between successive IP datagrams of a session. This compression is described in RFC 1144.
Compression negotiation (CCP) link management codes: 5: Terminate-Request, 6: Terminate-Ack, 7: Code-Reject, 14: Reset-Request, 15: Reset-Ack.
Configuration option types:
0: OUI
1: Predictor type 1
2: Predictor type 2
3: Puddle Jumper
4-15: unassigned
16: Hewlett-Packard PPC
17: Stac Electronics LZS
18: Microsoft PPC
19: Gandalf FZA
20: V.42bis compression
21: BSD LZW Compress
PPP frame carrying an IP datagram:
Flag (0x7E, 1 byte) | Address (0xFF, 1) | Control (0x03, 1) | Protocol (0x0021 = IP, 2) | IP datagram | CRC (2) | Flag (0x7E, 1)
The Address, Control, and Protocol fields can be compressed.
3 20 All Rights Reserved © Alcatel-Lucent 2009
Point to Point Transport
IP Technology IP for Mobile Networks
[Slide: Multilink PPP architecture — the Transport and Network Layer Protocols sit above a Multilink PPP sublayer, which in turn drives several parallel PPP links.]
Multilink PPP is an optional feature of PPP, so it must be designed to integrate seamlessly into regular
PPP operation. To accomplish this, MP is implemented as a new architectural “sublayer” within PPP. In
essence, an Multilink PPP sublayer is inserted between the “regular” PPP mechanism and any network
layer protocols using PPP. This allows MP to take all network layer data to be sent over the PPP link and
spread it over multiple physical connections, without causing either the normal PPP mechanisms or the
network layer protocol interfaces to PPP to “break”.
It works by fragmenting whole PPP frames and sending the fragments over different physical links.
LCP Configure-Request {MRU = 1490; MRRU = 1490; End-Point Disc = 00-00-10-0B-F2-3A} → Ack
To use Multilink PPP, both devices must have it implemented as part of their PPP software and must
negotiate its use. This is done by LCP as part of the negotiation of basic link parameters in the Link
Establishment phase. Three new configuration options are defined to be negotiated to enable Multilink
PPP:
Multilink Maximum Received Reconstructed Unit: Provides the basic indication that the device
starting the negotiation supports MP and wants to use it. The option contains a value specifying the
maximum size of PPP frame it supports. If the device receiving this option does not support Multilink
PPP it must respond with a Configure-Reject LCP message.
Multilink Short Sequence Number Header Format: Allows devices to negotiate use of a shorter
sequence number field for MP frames, for efficiency.
Endpoint Discriminator: Uniquely identifies the system; used to allow devices to determine which links
go to which other devices.
Before MP can be used, a successful negotiation of at least the Multilink Maximum Received Reconstructed
Unit option must be performed on each of the links between the two devices. Once this is done and an LCP
link exists for each of the physical links, a virtual bundle is made of the LCP links and Multilink PPP is
enabled.
[Slide: the Multilink PPP sublayer splits a PPP frame into Frag.1, Frag.2 and Frag.3, sends one fragment over each of PPP Lines 1-3, and the receiving MP sublayer reassembles the original PPP frame.]
Multilink PPP basically sits between the network layer and the regular PPP links and acts as a “middleman”:
Transmission: Multilink PPP accepts datagrams received from any of the network layer protocols configured
using appropriate NCPs. It first encapsulates them into a modified version of the regular PPP frame. It then
takes that frame and decides how to transmit it over the multiple physical links. Typically, this is done by
dividing the frame into fragments that are evenly spread out over the set of links. These are then
encapsulated and sent over the physical links. However, alternate strategies can also be implemented,
such as alternating full-sized frames between the links. Also, smaller frames are typically not
fragmented, nor are control frames such as those used for link configuration.
Reception: Multilink PPP takes the fragments received from all physical links and reassembles them into
the original PPP frame. That frame is then processed like any PPP frame, by looking at its Protocol field and
passing it to the appropriate network layer protocol.
The fragmenting of data in MP introduces a number of complexities that the protocol must handle. For
example, since fragments are being sent simultaneously, we need to identify them with a sequence
number to facilitate reassembly. We also need some control information to identify the first and last
fragments of a frame.
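The fragment/reassemble logic just described can be sketched as follows. This is a deliberately minimal model, assuming even splitting across links and ignoring loss, reordering windows and flow control; `mp_fragment` and `mp_reassemble` are illustrative names, not a real API:

```python
def mp_fragment(frame: bytes, n_links: int, first_seq: int = 0):
    """Split a PPP frame into at most n_links fragments, tagging each with the
    Begin/End flags and consecutive sequence numbers used by Multilink PPP."""
    size = -(-len(frame) // n_links)                  # ceiling division
    chunks = [frame[i:i + size] for i in range(0, len(frame), size)]
    return [
        {"B": i == 0, "E": i == len(chunks) - 1,
         "seq": first_seq + i, "data": chunk}
        for i, chunk in enumerate(chunks)
    ]

def mp_reassemble(fragments):
    """Reorder fragments by sequence number and rebuild the original frame;
    the B and E flags delimit the frame."""
    ordered = sorted(fragments, key=lambda f: f["seq"])
    assert ordered[0]["B"] and ordered[-1]["E"], "incomplete frame"
    return b"".join(f["data"] for f in ordered)

frags = mp_fragment(b"some network-layer payload", 3)
# Fragments may arrive in any order; sequence numbers restore it.
assert mp_reassemble(frags[::-1]) == b"some network-layer payload"
```

The sequence numbers and B/E flags in the dictionaries play exactly the role described in the paragraph above: they let the receiver detect the first and last fragments and put the pieces back in order.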
[Slide: first MP fragment sent on Line 1 — Flag 0x7E | Address 0xFF | Control 0x03 | MP Protocol 0x003D | MP Flags + Sequence Number | fragment data (starting with the 1-byte Protocol 0x21 of the original frame, then the IP header and IP data) | CRC. The MP sublayer sits between the Network Layer and PPP.]
Several of the fields that normally appear in a “whole” PPP frame aren’t needed if that frame is going to
then be divided and placed into other PPP Multilink frames, so when fragmentation is to occur, they are
omitted when the original PPP frame is constructed for efficiency’s sake. Specifically:
The Flag fields at the start and end are used only for framing for transmission and aren’t needed in the
logical frame being fragmented.
The FCS field is not needed, because each fragment has its own FCS field.
The compression options that are possible for any PPP frame are used when creating this original frame:
Address and Control Field Compression and Protocol Compression. This means that there are no Address
or Control fields in the frame, and the Protocol field is only one byte in size.
These changes save a full eight bytes on each PPP frame to be fragmented. As a result, the original PPP
frame has a very small header, consisting of only a one-byte Protocol field. The Protocol value of each
fragment is set to 0x003D to indicate a MP fragment, while the Protocol field of the original frame becomes
the first byte of “data” in the first fragment.
Beginning Fragment Flag _ When set to 1, flags this fragment as the first of the split-up PPP frame. It is
set to 0 for other fragments.
Ending Fragment Flag _ When set to 1, flags this fragment as the last of the split-up PPP frame. It is set
to 0 for other fragments.
Reserved (2 or 6 bits) _ Not used, set to zero.
Sequence Number (12 or 24 bits) _ When a frame is split up, the fragments are given consecutive sequence
numbers so the receiving device can properly reassemble them.
Fragment Data: The actual fragment from the original PPP frame.
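Packing the short-format header described above (B flag, E flag, 2 reserved bits, 12-bit sequence number, 16 bits in total) can be sketched as:

```python
import struct

def mp_short_header(begin: bool, end: bool, seq: int) -> bytes:
    """Short-format MP fragment header: B (1 bit), E (1 bit),
    2 reserved zero bits, then a 12-bit sequence number."""
    assert 0 <= seq < 4096, "short format allows only 12-bit sequence numbers"
    word = (begin << 15) | (end << 14) | (seq & 0x0FFF)
    return struct.pack("!H", word)

# First fragment of a frame, sequence number 5:
print(mp_short_header(True, False, 5).hex())   # -> 8005
```

The long format works the same way but uses 6 reserved bits and a 24-bit sequence number.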
Section 4
IP Layer
IP Technology
IP for Mobile Networks
TTP18031 D0 SG DEN I1.0
Telephone dialing
French RTC Finnish RTC
Country code= 33 Country code= 358
To understand the IP addressing format, an analogy can be drawn with the telephone numbering system.
Various countries have telephone networks.
Each country has a country code. Some codes comprise only one digit, some two, others three, etc.
So, to reach a particular telephone, you need to dial a number made up of:
a country code,
a designation number.
The boundary between the two fields varies according to the size of the country.
The total number of digits cannot exceed a certain limit. This means that small countries with 4-digit
country codes have less capacity in terms of the number of possible subscribers than large countries
with single-digit country codes.
This is also the case with IP addressing where there are:
a few large networks,
a few more medium-sized networks,
a large number of small networks.
A device IP address is divided into two parts:
the Network Identifier (or Net ID),
the station identifier known as the Host ID.
The boundary between these 2 fields also varies.
The boundary can be placed in one of 3 positions and thus determines three types of network:
class-A networks,
class-B networks,
class-C networks.
[Slide: address formats — Class A: leading bit 0, Net ID (7 bits), Host ID (24 bits). Class C: leading bits 110, Net ID (21 bits), Host ID (8 bits); number of networks: 2,097,152; number of hosts: 254; Net IDs from 192.0.0.0 to 223.255.255.0.]
IP Protocol
The Class-A network type, which uses 7 bits for the Net ID, enables the creation of only 126 networks.
Obviously, 128 combinations are possible with 7 bits but, as you will see later on, certain values are
reserved. The 24-bit Host ID means that a large number of stations can be connected per network (up to
16,777,214). So, Net IDs for Class-A networks can range from 1.0.0.0 to 126.0.0.0
The Class-B network type, which uses 14 bits for the Net ID, enables the creation of 16,384 networks. The
16-bit Host ID means that a maximum of 65,534 stations can be connected per network. So, Net IDs for
Class-B networks can range from 128.0.0.0 to 191.255.0.0
The Class-C network type, which uses 21 bits for the Net ID, enables the creation of up to 2,097,152
networks. However, with only 8 bits for the Host ID no more than 254 stations can be connected per
network. So, Net IDs for Class-C networks can range from 192.0.0.0 to 223.255.255.0
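The class ranges above can be checked mechanically from the first byte of an address. The following sketch (an illustrative helper, not a standard function) encodes exactly the ranges given in the text:

```python
def address_class(ip: str) -> str:
    """Classify an IP address by its first byte (classful rules only)."""
    first = int(ip.split(".")[0])
    if first == 127:
        return "loopback"
    if 1 <= first <= 126:
        return "A"       # leading bit 0:    7-bit Net ID, 24-bit Host ID
    if 128 <= first <= 191:
        return "B"       # leading bits 10:  14-bit Net ID, 16-bit Host ID
    if 192 <= first <= 223:
        return "C"       # leading bits 110: 21-bit Net ID, 8-bit Host ID
    return "D/E (multicast/reserved)"

print(address_class("128.5.4.1"))    # -> B
print(address_class("201.78.48.1"))  # -> C
```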
The IP addresses, which are made up of 32 bits, enable over 4 billion combinations. This would seem to be
enough capacity to satisfy the world’s IP-address requirements.
So why is there a lack of IP addresses at the moment?
Because of this class-based organization.
Because Class-C networks allow a maximum of only 254 hosts, they severely restrict the development
potential of a business’s network. So much so that in the 80s, even small businesses were asking for Class-B
Net IDs, which enable the connection of 65,000 hosts.
In reality, few Class-B networks actually use all the IP-address potential available.
If a Class-B Net ID is assigned to a network and only 2,000 addresses are used, the other 63,000 addresses
are unusable and therefore completely wasted.
Indeed, the same Net ID cannot be used elsewhere in the world. You will see later on that the routers
analyze the destination address of IP packets and first try to reach the network (i.e. the Net ID) of the
destination station. If several networks located in different areas have the same Net ID, you can imagine
the confusion at router level.
[Slide: an IP-level broadcast — station 172.245.0.1 on network 172.245.0.0 sends an IP packet with destination address 255.255.255.255 (all 32 bits set to 1); this triggers an Ethernet-level broadcast: MAC@dest ff:ff:ff:ff:ff:ff, MAC@src 01:00:2a:01:22:11, Type 0800, data, FCS.]
We have seen that special multicast and broadcast addresses are used at MAC level. Similarly, special IP
addresses have also been defined at IP level.
A station wishing to transmit an IP packet to all stations connected to the same network uses a broadcast.
In such cases, all the IP-address bits are set to 1.
An IP-level broadcast, which has the destination address 255.255.255.255, automatically triggers a MAC-
level broadcast, which has a destination MAC address in which all the bits are set to 1 (ff:ff:ff:ff:ff:ff).
[Slide: a station without an IP address broadcasts its request in a frame with MAC@dest ff:ff:ff:ff:ff:ff, Type 0800; a DHCP (or bootp) server allocates an address from its address pool.]
The station addresses can be provided dynamically by a server. This server can be a "bootp" server or a
"DHCP" server.
Therefore, a station without an IP address that wishes to communicate over the network first sends an IP-
address request to a server.
The station does this by generating an IP packet with the source address 0.0.0.0 (signifying unknown
address) and a broadcast destination address (because the station doesn’t know the server address).
This packet is handed to the MAC protocol, which encapsulates it in a broadcast frame.
The server will take an available address from its address pool.
The special address 0.0.0.0 is used as the source address at start-up only.
The class-A network 127.0.0.0 is defined as the loopback network. Addresses from that network are
assigned to interfaces that process data within the local system. These loopback interfaces do not access a
physical network.
[Slide: a router with two interfaces — stations 200.98.76.2, 200.98.76.3, … 200.98.76.253 on one Class-C network and 192.100.17.2, 192.100.17.3, … 192.100.17.253 on the other. A Class-C network allows 254 hosts maximum.]
Let’s now take the example of this router, which has two interfaces. The interface connected to the
first network is assigned an address on that network, and all the stations connected to it have addresses
containing that network’s Net ID. The same applies to the second network: the router interface is
assigned an address on this network and all the stations connected to the second network will have an
address containing this network’s Net ID.
To conclude
A Class-C network has 254 addresses: Host IDs "0" and "255" are reserved.
[Slide: public IP addresses (e.g. 154.11.22.33, 195.51.63.1, 9.1.2.3) are assigned by the IANA and are globally unique; private addresses (e.g. 10.6.7.8 in network 10.0.0.0) can be reused in several private networks at once but cannot circulate on the Internet.]
A public address is an official address assigned by the IANA, which is the body responsible for allocating
Internet IP addresses.
This type of address is globally unique.
The IANA has set aside certain blocks of addresses for private networks.
These addresses are never assigned to Internet stations and cannot circulate on the Internet.
Several private networks can use the same Net ID. There is no ambiguity as long as the networks are not
interconnected.
Private address blocks set aside by the IANA (RFC 1918):
class A: 10.0.0.0 to 10.255.255.255 (1 network)
class B: 172.16.0.0 to 172.31.255.255 (16 networks)
class C: 192.168.0.0 to 192.168.255.255 (256 networks)
[Slide: station 10.10.10.8 in Intranet 1 (Net ID 10.10.10.0) sends a packet to 194.5.3.12 on the Internet; because its source address is private, the Internet access router deletes the packet.]
Let’s assume that a private network administrator decides to connect his/her network to the Internet.
But private IP addresses are not allowed to circulate over the Internet. The Internet access router destroys
any packet with private addresses.
[Slide: NAT — the access router of private network 10.10.10.0 holds a pool of public addresses (212.17.22.21, 212.17.22.22, 212.17.22.23). Station 10.10.10.4 sends a packet with IPsrc 10.10.10.4, IPdest 194.5.3.12; the NAT function rewrites it to IPsrc 212.17.22.21, IPdest 194.5.3.12 before forwarding it to the Internet.]
A solution does exist to enable private stations to communicate with other stations on the Internet: the
Network Address Translation (NAT) function.
The administrator asks the IANA to allocate a public address and configures the NAT function in the Internet
access router.
When a station from the private network sends a packet to a station on the Internet, the access router
intercepts the packet, stores the source IP address and replaces it with an available public IP address from
the pool.
The packet has been transformed and can now circulate over the Internet.
The Internet server can reply by exchanging the source and destination public addresses in the IP-packet
header.
The access router consults its table to restore the private IP address before sending the packet to the
private network.
The NAT function has its limits: at any given time, the number of stations simultaneously accessing the
Internet cannot exceed the number of public addresses allocated by the IANA.
Other mechanisms can be used such as Port Address Translation (PAT) or proxies, which are beyond the
scope of the TCP/IP beginners course.
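The address-translation bookkeeping described above can be sketched as follows. This is a deliberately simplified model (one public address per active station, no ports, no timeouts; the `Nat` class and its method names are purely illustrative), matching basic NAT rather than PAT:

```python
class Nat:
    """Minimal source-NAT sketch: one pooled public address per private station."""
    def __init__(self, public_pool):
        self.free = list(public_pool)
        self.private_to_public = {}

    def outbound(self, src, dst):
        # Replace the private source address with a public one from the pool.
        if src not in self.private_to_public:
            if not self.free:
                raise RuntimeError("pool exhausted: no more simultaneous stations")
            self.private_to_public[src] = self.free.pop(0)
        return self.private_to_public[src], dst

    def inbound(self, src, dst):
        # Restore the private destination address from the translation table.
        public_to_private = {v: k for k, v in self.private_to_public.items()}
        return src, public_to_private[dst]

nat = Nat(["212.17.22.21", "212.17.22.22", "212.17.22.23"])
print(nat.outbound("10.10.10.4", "194.5.3.12"))   # -> ('212.17.22.21', '194.5.3.12')
print(nat.inbound("194.5.3.12", "212.17.22.21"))  # -> ('194.5.3.12', '10.10.10.4')
```

The `RuntimeError` branch is exactly the limit mentioned above: with plain NAT, simultaneous stations cannot outnumber pooled public addresses.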
Principle:
Several servers can reply to a request.
DHCP-Request: client accepts the server’s offer. Also used to extend the lease.
DHCP-Nack: this message can be sent back to the client when, for example, the server refuses to extend
the lease or the client was too slow to reply to the offer.
[Slide: sending to another network — station 1.0.0.1 (MAC 102030) must reach 2.0.0.2. At IP level it asks: is the IP destination within the local net? No → use the default gateway, 1.0.0.254. The ARP cache holds 1.0.0.2 → 405060 but not yet the gateway, so an ARP request/response retrieves the router's MAC address (908070, for interface IP 1.0.0.254). The frame is then sent with MAC@dest 908070, MAC@src 102030, Type 0800, carrying the packet IP@src 1.0.0.1, IP@dest 2.0.0.2.]
As you can imagine, if the destination IP address is the address of a station located on the other side of the globe, you
cannot use the broadcast mechanism as it will flood the Internet with messages. That’s precisely why routers never
propagate broadcasts. A broadcast is always restricted to the network in which it was generated. How, then, can
stations in different networks communicate with each other?
In fact, at the IP level of a station, when a packet needs to be sent, the first question IP considers is "Is the destination
address inside or outside the network?"
If the destination address is inside the same network, the usual procedure applies: consultation of the ARP table, ARP
procedure if necessary, etc.
If, however, the destination address is outside the network, the station configuration must indicate the address of the
default router through which the packet must be routed to reach the destination.
This parameter is often called the "default gateway". The transmit station must now transfer the packet as far as this
default gateway. The default gateway has an interface connected to the same network as the transmit station and
therefore has an IP address in the same network (with the same Net ID).
This station knows how to send a packet to another station connected to the same network. It consults its ARP table. If
the MAC address of the default gateway is not yet known, it initiates an ARP procedure by generating a request in the
form of a broadcast. This broadcast will not leave the network but will reach the interface of the router that is
connected to the same network.
The router will reply by sending its interface MAC address. This MAC address will be stored in the station ARP table.
And, finally, the IP packet intended for the remote station will be encapsulated in a frame whose destination MAC
address is the MAC address of the next router. This router is appropriately named the "next hop".
It is now the router's job to consult its routing table to establish which is the best outgoing interface to use to reach the
final destination. Once again, the routing table indicates the IP address of the next router that will move the packet
nearer to its final destination. A new ARP procedure might be initiated between these 2 routers to retrieve the MAC
address of the next router, and so on.
So, once again, you can see that the physical addresses are used constantly to move the IP packets through the network
to their final destination.
[Slide: host configuration — host IP@ 128.5.4.1 (Class B), default gateway 128.5.15.5. Destination IP@ 128.5.26.2 has the same Net ID → same network. The ARP cache gives 128.5.26.2 → MAC 908070 (and 128.5.15.5 → MAC 405060), so the frame is sent directly to the destination station's MAC address.]
Once the station has been configured, when an IP packet needs to be sent to the address 128.5.26.2, the
station determines whether this IP address is inside or outside its network.
First of all, it analyzes its own IP address to determine which class its own network belongs to. In this
example, 128 indicates a Class-B network address.
Once the station knows the class, it knows where the boundary is between the Net ID and the Host ID for its
own network. Here, the Net ID is two bytes long.
The station therefore compares just the Net ID bytes of the source and destination addresses.
In this example, the Net IDs are identical, which means that the destination IP address is located in the
same network as the transmit station.
The station does not need to send the packet through the default gateway. It just needs to consult the ARP
table directly and possibly initiate an ARP procedure on its LAN if the corresponding MAC address is not yet
known. Here, the ARP table has been updated.
The transmit station can therefore encapsulate the packet in an Ethernet frame whose destination MAC
address will be the MAC address of the IP packet destination station.
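The class-based "inside or outside?" test used in this example can be sketched as follows (an illustrative helper assuming pure classful addressing, before subnet masks are introduced below):

```python
def same_classful_network(src: str, dst: str) -> bool:
    """Compare Net IDs using the address class of src (classful rule)."""
    first = int(src.split(".")[0])
    # Class A: 1-byte Net ID; Class B: 2 bytes; Class C: 3 bytes.
    netid_bytes = 1 if first <= 126 else 2 if first <= 191 else 3
    return src.split(".")[:netid_bytes] == dst.split(".")[:netid_bytes]

# 128.x.y.z is Class B, so only the first two bytes are compared.
print(same_classful_network("128.5.4.1", "128.5.26.2"))  # -> True: deliver directly
print(same_classful_network("128.5.4.1", "128.6.6.6"))   # -> False: use the gateway
```

The two calls reproduce the two worked examples: same Net ID → direct delivery after an ARP lookup of the destination; different Net ID → the frame goes to the default gateway.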
[Slide: host IP@ 128.5.4.1 (Class B), default gateway 128.5.15.5. Destination IP@ 128.6.6.6 has a different Net ID → other network. The ARP cache gives 128.5.15.5 → MAC 405060, so the frame is sent to the router's MAC address.]
Let’s now assume that this station wishes to send a packet to IP address 128.6.6.6
Once again, it analyzes its own IP address and determines that it’s a Class-B network address. It isolates
the two Net ID bytes and compares them.
This time, the Net IDs are different and the destination station is therefore located in another network.
This means that the packet must go through a router, which will be the default gateway defined in the
station configuration.
The station knows the router’s IP address and now needs to find the corresponding MAC address.
The station consults its ARP cache. In this case, the cache contains the MAC address. If it had not contained
the address, the station would have launched an ARP procedure.
The packet is encapsulated in an Ethernet frame whose destination MAC address is the address of the next
router on the route leading to the final destination (rather than the MAC address of the final destination
station).
[Slide: network 128.5.0.0 divided into subnetwork 128.5.4.0 (stations 128.5.4.1 to 128.5.4.5) and subnetwork 128.5.8.0 (stations 128.5.8.1 to 128.5.8.5), interconnected via the Internet.]
The class-based system for network classification lacks the flexibility needed to handle the explosion in the
number of IP networks and devices.
In 1984, to prevent too many stations from being connected to the same network, and also because the
distance between sites was increasing, the decision was taken to introduce the "subnetwork" or "subnet"
concept, with the aim of offering administrators of large networks an extra hierarchical level.
The Net IDs of these subnetworks borrow a few bits from the Host ID to ensure that the subnetworks are
clearly identified.
Here, the Class-B network 128.5.0.0, which had a capacity of around 65,000 host stations, has been
divided into 2 subnetworks with Net IDs 128.5.4.0 and 128.5.8.0 respectively.
So three bytes are used for the Net ID in these subnetworks.
And, of course, all the stations belonging to network 128.5.4.0 have IP addresses starting with 128.5.4 and
all the stations connected to network 128.5.8.0 have IP addresses starting with 128.5.8
[Slide: with subnetting, the same question — is the IP destination within the local net? — decides between direct delivery and the default gateway. Stations 128.5.4.3 (MAC 102030) and 128.5.4.5 (MAC 708090) share the subnet with the router interface 128.5.4.1 (MAC 304050), which is the default gateway.]
Since the introduction of the subnetwork concept, a new parameter has also been developed: the "Subnet
Mask".
[Slide: a subnet mask in binary — e.g. 22 one-bits followed by 10 zero-bits (a /22 mask); the boundary can sit at /20, /21, /22, /23, /24, etc.]
Let’s look at Subnet Mask and at the mechanism for determining whether a destination IP address is "inside"
or "outside" the transmit station network.
There are two IP addresses: a source address and a destination address. The question is: Are these two
addresses in the same subnetwork?
If the Net ID is 3 bytes long, the answer is no.
If the Net ID is 2 bytes long, the answer is yes.
It is clear that an additional parameter is required to indicate the length of the Net ID. This parameter is
the Subnet Mask.
You will see that the difficulty in processing addresses lies in the fact that they are expressed as decimal
numbers.
To make things completely clear, let’s convert the mask into binary, then apply the mask to both the
source and destination address.
The Net ID of the source IP address now appears clearly and can be compared to the corresponding bits of
the destination address.
You can now see clearly that the 2 addresses are in the same subnetwork.
What is the Net ID of this subnetwork? Once again, there is a slight difficulty concerning translation of the
third byte.
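The masking operation just described is a plain bitwise AND of the address and the mask. A short sketch using Python's standard `socket` and `struct` modules makes the "difficult" third byte explicit:

```python
import socket
import struct

def net_id(ip: str, mask: str) -> str:
    """Bitwise AND of an address and its subnet mask, in dotted decimal."""
    a = struct.unpack("!I", socket.inet_aton(ip))[0]
    m = struct.unpack("!I", socket.inet_aton(mask))[0]
    return socket.inet_ntoa(struct.pack("!I", a & m))

# With a 3-byte (/24) mask, the two addresses fall in different subnetworks;
# with a 2-byte (/16) mask, they would be in the same one.
print(net_id("128.5.4.3", "255.255.255.0"))  # -> 128.5.4.0
print(net_id("128.5.8.4", "255.255.255.0"))  # -> 128.5.8.0
print(net_id("128.5.8.4", "255.255.0.0"))    # -> 128.5.0.0
```

Comparing the two resulting Net IDs answers the "inside or outside?" question directly, without any decimal-to-binary conversion by hand.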
[Slide: prefix notation — a mask of 22 one-bits followed by 10 zero-bits can be written in dotted decimal or as a prefix length; the address 138.5.19.37 with this mask is noted 138.5.19.37/22.]
[Slide: PC configuration — host IP@ 128.5.4.3 (MAC 102030), subnet mask 255.255.255.0, default gateway 128.5.4.1 (MAC 304050). Destination IP@ 128.5.8.4 is outside the subnet, so the frame (MAC@dest 304050, MAC@src 102030, Type 0800, FCS) carries the packet IPsrc 128.5.4.3, IPdest 128.5.8.4 to the router, which forwards it towards subnet 128.5.8.0 (router interface 128.5.8.1, destination station 128.5.8.4).]
Let’s now consider whether the mask has solved the problem of communicating between subnetworks.
The subnet mask must be included in all station configurations along with the default gateway and the IP
address.
Thanks to previous traffic, the ARP cache already contains the MAC addresses of the stations in the same
network.
This station wishes to transmit a packet to the station with the address 128.5.8.4.
From now on, it’s the mask rather than the class that determines the Net ID of the source network.
This time, then, the transmit station discovers that the destination address is outside the network and so
sends the packet to the default gateway using the address in the configuration.
The station consults its ARP cache, which contains the corresponding MAC address.
A frame can therefore be transmitted to the router. The frame contains the IP packet intended for the
remote station.
"Classful" addressing
Which class of network is to be selected?
Waste of IP addresses
"Classless" addressing
Network aggregation
Historically, IP addresses were assigned within classes: Class A (8 bits of network address, 24 bits of host
address), Class B (16 bits of network address, 16 bits of host address) and Class C (24 bits of network
address, 8 bits of host address). With the advent of CIDR, address space is now allocated on a bit boundary
basis.
Network: 201.78.48.0/23
11001001 01001110 00110000 00000000
≡ 2 class-C networks, room for about 500 hosts.
Net1: 201.78.48.0/23 = 11001001 01001110 00110000 00000000
Net2: 201.78.50.0/23 = 11001001 01001110 00110010 00000000
Net3: 201.78.52.0/22 = 11001001 01001110 00110100 00000000
Net4: 201.78.56.0/21 = 11001001 01001110 00111000 00000000
[Slide: route aggregation — Net2 (201.78.50.0/23, 510 hosts) and Net3 (201.78.52.0/22, 1022 hosts); a single routing-table entry with destination 201.78.48.0/22 and next hop IP@2 summarizes the adjacent /23 networks.]
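Python's standard `ipaddress` module can check this kind of aggregation for us; the following sketch performs the same binary comparison as the tables above:

```python
import ipaddress

# Two adjacent /23 networks collapse into a single /22 routing-table entry.
nets = [ipaddress.ip_network("201.78.48.0/23"),
        ipaddress.ip_network("201.78.50.0/23")]
summary = list(ipaddress.collapse_addresses(nets))
print(summary)        # -> [IPv4Network('201.78.48.0/22')]

# A /23 offers 510 usable host addresses (all-zeros and all-ones are reserved).
print(ipaddress.ip_network("201.78.48.0/23").num_addresses - 2)  # -> 510
```

Publishing the single summarized prefix instead of each constituent network is exactly what keeps backbone routing tables small.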
Static
Generates no traffic and saves bandwidth
Easy to create for simple networks
Manual configuration
No re-routing in case of failure
Risk of configuration errors
Dynamic
Automatically re-routes the traffic in case of failure
Ideal for large networks
Generates traffic on the network
Leads to a processing overload in the routers
Static Routing
Static routing is carried out manually by the network administrator. The administrator is responsible for
detecting and propagating routes throughout the network. The administrator enters the routes manually in
the configuration of each of the network’s routing devices.
Once the router has been configured, it simply transfers the packets using the predetermined ports. There
is no communication between the routers concerning the actual network topology.
In small networks with little redundancy, the static routing process is quite easy to manage. However, this
method has certain drawbacks as far as the management of IP routing tables is concerned:
the static routes require a high level of coordination and maintenance in complex network environments,
the static routes do not adapt dynamically to the operating state of the network. When a destination
subnetwork becomes unreachable, the static routes leading to this network remain in the routing table.
Traffic continues to be transmitted to this destination. Until the network administrator updates the static
routes in line with the new network topology, traffic cannot be routed along other existing routes.
Dynamic Routing
Dynamic routing algorithms enable routers to detect and adapt automatically to the routes in the network.
Routing algorithms:
RIP: Routing Information Protocol
OSPF: Open Shortest Path First
IS-IS: Intermediate System to Intermediate System
EIGRP: Enhanced Interior Gateway Routing Protocol
BGP: Border Gateway Protocol (path vector)
Several dynamic routing protocols are currently used for automatic route detection. The difference
between these protocols lies in the way they detect and calculate new routes to destination networks.
They can be divided into two main categories:
[Figure: several autonomous systems (Janet running IS-IS, Sphinx and Sprint running OSPF, DFN running IGRP, Renater running EIGRP) interconnected across the INTERNET by BGP.]
2 classes of protocols:
Interior Gateway Protocol (RIP, IGRP, OSPF, IS-IS, etc.)
Exterior Gateway Protocol (EGP, BGP)
Autonomous Systems (ASs) are logical portions of a larger IP network. ASs are usually networks inside
organizations. They are controlled by a single administration authority.
Certain routing protocols are used to determine routing paths within an AS while others are used to
interconnect several ASs:
Interior Gateway Protocols: enable routers to exchange information within an AS. Examples: OSPF and
RIP.
Exterior Gateway Protocols: enable ASs to exchange information with other ASs. Example: BGP.
The interior protocols are used to manage routing information within each AS. The figure also shows the
exterior protocols, which manage information on routing between ASs.
Numerous interior routing processes can be used within an AS. When this arises, the AS must present itself
to the other ASs with a single, coherent routing plan. The AS must provide a coherent view of its internal
destinations.
[Figure: four networks (204.92.75.0, 204.92.76.0, 204.92.77.0 and 192.168.201.0) interconnected by routers R1 and R2; each interface carries the host part of its address (.1, .2, .25, etc.).]

R1 interface configuration:
# interface e1
ip address 204.92.76.2 255.255.255.0
# interface e0
ip address 192.168.201.1 255.255.255.0

R1 routing table:
Network            Mask            Next hop      If
204.92.76.0        255.255.255.0   -             e1
192.168.201.0      255.255.255.0   -             e0
0.0.0.0 (default)  0.0.0.0         204.92.76.1   e1
Let’s now look at what a routing plan is by means of the following example.
There are 4 networks:
the network with Net ID 204.92.77.0
the network 204.92.75.0
the network 204.92.76.0
the network 192.168.201.0
As usual, each router interface has an IP address in the network it belongs to.
Let’s now look at the R1 routing table, or rather let’s construct the R1 routing table.
The routing table will not include routes to every station as it would be enormous. Instead, it will include
the routes needed to reach each network. A network is represented by its Net ID, that is, an IP address
associated with a mask.
First route: to reach the stations in network 204.92.76, traffic doesn’t need to go through another router as
Ethernet interface 1 (e1) is connected directly to this network.
Similarly, to reach the stations in network 192.168.201, traffic can go through Ethernet interface 0 (e0).
We could then continue to describe all the other networks. But, let’s imagine that all the world’s other
internet networks are located on the left of R1. Describing all the networks would be tedious and the
routing table would be huge. So, to make the task easier, a default route can be included in the routing
table. This default route would be used solely when no other route in the table can be used to route the
packet.
So, here, any IP packet whose destination address doesn’t begin with 204.92.76 or 192.168.201 must be
sent to router R2, which is known as the "next hop". The routing table therefore contains the IP address of the router R2 interface that shares the same network as R1. It also contains router R1's outgoing interface.
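The lookup logic just described, including the default-route fallback, can be sketched in Python. The table layout and the route() helper are illustrative, not real router code:

```python
import ipaddress

# R1's routing table from the example:
# (network, next hop or None if directly connected, outgoing interface)
r1_table = [
    (ipaddress.ip_network("204.92.76.0/24"), None, "e1"),
    (ipaddress.ip_network("192.168.201.0/24"), None, "e0"),
    (ipaddress.ip_network("0.0.0.0/0"), ipaddress.ip_address("204.92.76.1"), "e1"),
]

def route(table, destination):
    """Return (next hop, interface) for a destination address.
    The default route 0.0.0.0/0 matches everything, so it is only
    chosen when no more specific entry matches."""
    dest = ipaddress.ip_address(destination)
    matches = [entry for entry in table if dest in entry[0]]
    # Prefer the most specific (longest) prefix; /0 loses every tie.
    net, next_hop, iface = max(matches, key=lambda e: e[0].prefixlen)
    return next_hop, iface

print(route(r1_table, "192.168.201.7"))  # directly connected via e0
print(route(r1_table, "204.92.75.8"))    # falls through to the default route
```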
[Figure: the same four networks, written with their prefix lengths: 204.92.75.0/24, 204.92.77.0/24, 204.92.76.0/24 and 192.168.201.0/24, interconnected by R1 and R2.]

R1 routing table:
Network            Mask            Next hop      If
204.92.76.0        255.255.255.0   -             e1
192.168.201.0      255.255.255.0   -             e0
0.0.0.0 (default)  0.0.0.0         204.92.76.1   e1
Exercise
Try and fill in the routing table for router R2.
Several solutions are possible.
[Figure: the same topology seen from router R2, whose interfaces e0, e1 and e2 connect it to networks 204.92.76.0/24, 204.92.77.0/24 and 204.92.75.0/24; R1 links 204.92.76.0/24 to 192.168.201.0/24.]
You can start by adding the routes to the networks connected directly to router R2.
To reach network 204.92.76, go through Ethernet interface 0.
To reach network 204.92.77, go through Ethernet interface 1.
To reach network 204.92.75, go through Ethernet interface 2.
Finally, add the route to the remote network: to reach network 192.168.201, go to the next hop (i.e. router R1) via Ethernet interface 0.
[Figure: the diagram is altered so that R1 gains an interface e2 connected directly to network 204.92.77, giving two possible routes to that network.]
Let’s now alter the diagram so that there are several routes leading to a destination. The routing table
must be updated. So, in R1 there is now a second direct route to 204.92.77 through Ethernet interface 2.
The question that now arises is "Which one of the 2 routes will R1 choose to reach network 204.92.77?". This
is the role of another routing-table parameter known as the "metric".
Here, for example, the metric corresponds to the number of hops to the destination station. It is 0 when
the network is connected directly. The router chooses the lowest-cost route.
The routing table is not quite up to date. At the moment, it shows only one route for reaching network 204.92.75 (the route through next hop R2) when, in fact, another route via Ethernet 2 and the next hop 204.92.77.1 can be used.
The type of routing just constructed is static routing, which means that it is set up by an operator.
You can see that:
static routing is relatively complex to set up in a large network,
design errors, route omissions and even typing errors can easily occur in the routing tables.
But, on top of that, this type of routing is not self-adjusting. This means that it cannot adjust to events
that occur in the network such as link breakage, router failure, etc.
It is for these reasons that dynamic routing protocols such as RIP, OSPF, BGP, etc. were developed.
The levels of performance and sophistication of these protocols vary and they all offer certain advantages
and disadvantages.
Similarly, static routing can also offer advantages in certain specific circumstances.
Routing table (destination prefix, next hop):
192.168.0.0/16   R4
194.1.0.0/16     R1
194.1.16.0/20    R2
192.168.1.0/24   R3
Datagram destination address: 192.168.1.17
Problem:
Which of the two entries should be used for a datagram destined to 192.168.1.17? A priori, one cannot tell, because the datagram does not carry the prefix length (mask).
Rule:
Retain the entry which has the longest prefix.
It is thus necessary:
to scan the whole routing table,
to retain all the matching prefixes, and
to choose, among those, the one which has the longest mask. Here, it is 192.168.1.0/24.
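The rule can be sketched as a plain linear scan (real routers use specialized structures such as tries, but the logic is the same; the table reuses the example's prefixes and router names):

```python
import ipaddress

# The routing table from the example; values are the next-hop routers.
table = {
    ipaddress.ip_network("192.168.0.0/16"): "R4",
    ipaddress.ip_network("194.1.0.0/16"): "R1",
    ipaddress.ip_network("194.1.16.0/20"): "R2",
    ipaddress.ip_network("192.168.1.0/24"): "R3",
}

def longest_prefix_match(table, destination):
    """Scan the whole table, keep every matching prefix, and
    return the entry with the longest mask."""
    dest = ipaddress.ip_address(destination)
    candidates = [net for net in table if dest in net]
    best = max(candidates, key=lambda net: net.prefixlen)
    return best, table[best]

net, next_hop = longest_prefix_match(table, "192.168.1.17")
print(net, "->", next_hop)   # 192.168.1.0/24 -> R3
```

Both 192.168.0.0/16 and 192.168.1.0/24 match the destination; the /24 wins because its mask is longer.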
[Figure: the TTL field is decremented at each router (e.g. 61, then 60); a packet whose TTL reaches 0 is discarded.]

[Figure: the IP header (Version, Header length, ToS, Datagram length, Identification, Flags, Offset, ...) carries a protocol field whose values 1, 6 and 17 designate the higher-level protocols; at the link layer, the MAC frame (MAC@ dest., MAC@ src., EtherType 0800 for IP, Data, FCS) plays the same demultiplexing role.]
When the destination station of a MAC frame receives the frame, it is the EtherType field that indicates
which higher-level protocol the contents must be sent to.
This is also the case for IP. The "protocol" field indicates which higher-level protocol is the destination of
the packet data.
The IANA assigns the official codes for this field.
The protocols encapsulated in IP are ICMP, UDP and TCP.
The TCP and UDP protocols will be studied during this training module.
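The demultiplexing described above can be sketched as a simple dispatch on the protocol code. The handler strings are hypothetical placeholders; the codes 1, 6 and 17 are the IANA-assigned values for ICMP, TCP and UDP:

```python
# IANA-assigned values of the IP "protocol" field for the three
# protocols mentioned above.
IP_PROTOCOLS = {1: "ICMP", 6: "TCP", 17: "UDP"}

def demultiplex(protocol_field, payload):
    """Hand the packet payload to the upper layer named by the
    protocol field, or drop it if the code is unknown."""
    name = IP_PROTOCOLS.get(protocol_field)
    if name is None:
        return f"dropped (unknown protocol {protocol_field})"
    return f"delivered to {name}: {len(payload)} bytes"

print(demultiplex(17, b"\x00" * 8))   # delivered to UDP: 8 bytes
print(demultiplex(99, b""))           # dropped (unknown protocol 99)
```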
[Figure: end-to-end communication through two routers. The transport data and the IP addresses (IP@ a to b) are carried unchanged across the whole path, while the physical addresses change on every link: Phys@ 1 to 2, then 8 to 7, then 4 to 15.]
When two users wish to communicate, one is the Client because in the IP world the client is defined as the
user requesting the service while the other is the Server because that user provides the service.
Here, the Server is capable of providing various services but the Client wishes to request one service only.
The transport layer is charged with targeting the required service. For this, each application is allocated an
official number known as a "port number". (N.B. the IANA is responsible for allocating a port number to
every new service.) The transport layer sends the datagram to the lower-layer IP. This IP packet must be
sent to the remote server. For this reason, every machine connected to the IP network is therefore assigned
a logical address called an IP address. One of IP's jobs is to insert a header. The main fields in this header are
the packet source and destination addresses. The packet is then sent to the data link layer, which
encapsulates it in a frame with a header containing the physical source and destination addresses. Finally,
the frame is transferred to the transmission medium.
All the machines connected to this transmission medium analyze the frame header but because only the
router interface recognizes its physical address it extracts the contents of the frame and transmits them to
the upper-layer IP. The router’s network layer analyzes the packet header, especially its destination IP
address. Its routing table indicates the outgoing interface and the next physically connected device the
packet must pass through to reach its final destination. The IP packet is transferred to the data link layer,
which encapsulates it in a frame. This time, the physical source address is the source router interface
address and the physical destination address is the address of the next router interface. Once again, only
the router recognizes its physical address in the frame transported by the transmission medium. It
therefore extracts the packet from the frame and sends its contents to its network layer. The network layer
routes the packet to the outgoing interface using its routing table.
Finally, the frame is transferred to the last link. The destination machine recognizes its physical address in
the header and sends the contents to its IP. The IP of the final destination machine recognizes its own IP
address in the destination IP field of the packet received. The contents of the packet are then sent to the
transport layer, which examines the header. Thanks to the destination port number contained in the layer-
4 protocol header, the data is routed to the service chosen by the Client.
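The walkthrough above can be condensed into a toy model: dictionaries stand in for real headers, and the addresses are the illustrative ones from the figure (IP "a" to "b", physical hops 1 to 2, 8 to 7, 4 to 15):

```python
# A toy sketch of encapsulation: each layer wraps the payload from the
# layer above with its own header fields. Nothing here touches a real
# network stack.

def encapsulate(data, dst_port, src_ip, dst_ip, src_mac, dst_mac):
    segment = {"dst_port": dst_port, "data": data}                       # transport
    packet = {"src_ip": src_ip, "dst_ip": dst_ip, "payload": segment}    # IP
    frame = {"src_mac": src_mac, "dst_mac": dst_mac, "payload": packet}  # link
    return frame

def forward(frame, next_src_mac, next_dst_mac):
    """A router re-frames the packet: the IP header is untouched,
    only the physical addresses change on each hop."""
    return {"src_mac": next_src_mac, "dst_mac": next_dst_mac,
            "payload": frame["payload"]}

frame = encapsulate(b"hello", 80, "a", "b", 1, 2)
frame = forward(frame, 8, 7)    # first router
frame = forward(frame, 4, 15)   # second router
assert frame["payload"]["dst_ip"] == "b"                # end-to-end, unchanged
assert (frame["src_mac"], frame["dst_mac"]) == (4, 15)  # hop-by-hop
```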
But what does IP provide?
Not reliable
No error recovery
Best effort
Connectionless
IP is not reliable. This means that it cannot guarantee that the data it sends will be routed correctly. In the
event that a packet is lost, IP does not perform error recovery.
IP offers a connectionless service. This means that it does not communicate with the other remote IP
layers. Each datagram is managed independently from the other datagrams even when a large file is being
transferred between remote entities. This implies that the datagrams can be mixed up, duplicated, lost or
altered.
IP just tries to deliver the datagrams and provides a "Best effort" service.
[Figure: host 10.1.1.10 sends traffic to 20.20.20.4 through its default gateway (10.1.1.2) on segment 10.1.1.0/24; Router A and Router B both attach the segment to the outside world.]
Router A is the default gateway responsible for handling packets for network 10.1.1.0/24. If the
connection between Router A and the network goes down, or if the router becomes unavailable, fast-converging routing protocols such as the Enhanced Interior Gateway Routing Protocol (Enhanced IGRP) and Open Shortest Path First (OSPF) can respond within seconds, so that Router B is prepared to transfer packets that would otherwise have gone through Router A.
However, in spite of fast convergence, if Router A goes down, the users in network 10.1.1.0 might not be
able to communicate with the external segments even after the routing protocol has converged. That's
because IP hosts usually do not participate in routing protocols. Instead, they are configured statically with the address of a single router, such as Router A. Until someone manually modifies the configuration of the machine to use the address of Router B instead of Router A, the user cannot communicate with the
other network segments.
Some IP hosts use proxy Address Resolution Protocol (ARP) to select a router. If the user’s workstation
was running proxy ARP, it would send an ARP request for the IP address 20.20.20.4. Router A would reply
on behalf of that station and would offer its own media access control (MAC) address. With proxy ARP, stations in external segments are seen as if they were connected to the same segment. If Router A fails,
machine 10.1.1.10 will continue to send packets destined for 20.20.20.4 to the MAC address of Router A
even though those packets have nowhere to go and are lost. The user either waits for ARP to acquire the
MAC address of Router B by sending another ARP request or reboots the workstation to force it to send an
ARP request. In either case, for a significant period of time, it will not be able to communicate with any external destination, even when routing protocols have converged and Router B is ready to forward
packets.
Some IP hosts use the Routing Information Protocol (RIP) to discover routers. The drawback of using RIP is
that it is slow to adapt to changes in the topology. If stations in network 10.1.1.0 were configured to use
RIP, 3 to 10 minutes might elapse before RIP makes another router available.
Some newer IP hosts use the ICMP Router Discovery Protocol (IRDP) to find a new router when a route
becomes unavailable. A host that runs IRDP listens for hello multicast messages from its configured
router and uses an alternate router when it no longer receives those hello messages. If the station was
running IRDP, it would detect that Router A is no longer sending hello messages and would start sending
its packets to Router B. However, for legacy devices that do not support IRDP, it is not an option.
[Figure: Router A (active) and Router B (standby) form a virtual router with interface IP @ 10.1.1.2 and MAC @ 00:10:7B:81:9C:EC; the standby group number identifies the participating physical interfaces.]
One way to achieve high availability is to use HSRP, which provides network redundancy for IP networks, ensuring that user traffic is forwarded immediately and recovers transparently from first-hop failures in router interfaces.
By sharing an IP address and a MAC (Layer 2) address, two or more routers can act as a single "virtual"
router. The members of the virtual router group continually exchange status messages. This way, one
router can assume the routing responsibility of another, should it go out of commission for either planned
or unplanned reasons. Hosts continue to forward IP packets to a consistent IP and MAC address, and the
changeover of devices doing the routing is transparent.
Using HSRP, a set of routers works in concert to present the illusion of a single virtual router to the hosts
on the LAN. This set is known as an HSRP group or a standby group. A single router elected from the
group is responsible for forwarding the packets that hosts send to the virtual router. This router is known
as the Active router. Another router is elected as the Standby router. In the event that the Active router
fails, the Standby assumes the packet-forwarding duties of the Active router. Although an arbitrary
number of routers may run HSRP, only the Active router forwards the packets sent to the virtual router.
To minimize network traffic, only the Active and Standby routers send periodic HSRP messages once the
protocol has completed the election process. If the Active router fails, the Standby router takes over as
the Active router. If the Standby router fails or becomes the Active router, then another router is
elected as the Standby router.
On a particular LAN, multiple hot standby groups may coexist and overlap. Each standby group emulates
a single virtual router. The individual routers may participate in multiple groups. In this case, the router
maintains separate state and timers for each group.
Each standby group has a single, well-known MAC address, as well as an IP address.
In most cases when you configure routers to be part of an HSRP group, they listen for the HSRP MAC
address for that group as well as their own burned-in MAC address. The exception is routers whose
Ethernet controllers only recognize a single MAC address (for example, the Lance controller on the Cisco
2500 and Cisco 4500 routers). These routers use the HSRP MAC address when they are the Active router,
and their burned-in address when they are not.
HSRP uses the following MAC address on all media except Token Ring:
0000.0c07.ac** (where ** is the HSRP group number)
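The virtual MAC address encodes the group number in its last byte. A small sketch (the helper name is ours; the 0000.0c07.ac prefix is the standard one quoted above):

```python
def hsrp_virtual_mac(group):
    """Build the HSRP virtual MAC address for a standby group:
    the well-known prefix 0000.0c07.ac followed by the group
    number as one hexadecimal byte."""
    if not 0 <= group <= 255:
        raise ValueError("HSRP group numbers fit in one byte")
    return f"0000.0c07.ac{group:02x}"

print(hsrp_virtual_mac(1))    # 0000.0c07.ac01
print(hsrp_virtual_mac(47))   # 0000.0c07.ac2f
```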
[Figure: Router A (active, 10.1.1.1) and Router B (standby, 10.1.1.2) send hello messages to 224.0.0.2. When no more hellos arrive from Router A, Router B enters active mode.]
The routers in an HSRP group send and receive keepalives using the multicast address 224.0.0.2 and UDP port 1985. By default, the hello interval is 3 seconds. Once 3 hello intervals pass without hearing from the active router, the standby router automatically becomes the active router. Each router is configured with a priority number; the router with the highest priority number in a standby group is the active router.
Preemption
The HSRP preemption feature enables the router with the highest priority to immediately become the Active router. Priority is determined first by the priority value that you configure, and then by the IP address. In each case, a higher value indicates greater priority.
When a higher priority router preempts a lower priority router, it sends a coup message. When a lower
priority active router receives a coup message or hello message from a higher priority active router, it
changes to the speak state and sends a resign message.
Preempt Delay
The preempt delay feature allows preemption to be delayed for a configurable time period, allowing the
router to populate its routing table before becoming the active router.
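The election and hold-time rules above can be sketched as follows. Timings are simulated numbers, not real packet timestamps, and the election is reduced to its priority rule:

```python
# A sketch of HSRP failover: hellos arrive every 3 seconds, and a
# router is considered dead once 3 hello intervals pass without one.
HELLO_INTERVAL = 3
HOLD_TIME = 3 * HELLO_INTERVAL

def active_router(routers, now):
    """routers maps name -> (priority, time of last hello heard).
    The highest-priority router whose hellos are still fresh is
    the active router; None means nobody is forwarding."""
    alive = {name: prio for name, (prio, last) in routers.items()
             if now - last <= HOLD_TIME}
    return max(alive, key=alive.get) if alive else None

routers = {"A": (110, 0.0), "B": (100, 0.0)}
assert active_router(routers, 5.0) == "A"      # A's hellos are fresh, A wins
assert active_router(routers, 20.0) is None    # both have gone silent
routers["B"] = (100, 18.0)                     # only B keeps sending hellos
assert active_router(routers, 20.0) == "B"     # the standby takes over
```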
Section 5
Transport Layer
[Figure: position of UDP in the TCP/IP stack. Network layer: IP, with ICMP and ARP alongside; link layer: 802.2 LLC and SNAP; physical layer: optical fiber, 10Base-T twisted pair, 10Base2, 10Base5.]

[Figure: packets P1, P2 and P3 cross the IP network by different routes and arrive in a different order from the one in which they were sent.]
You have already seen that IP offers service in connectionless mode only.
This means that the IP network does not ensure that all the packets from the same flow follow the same
route and therefore cannot guarantee that these packets will arrive in the same order they were
transmitted.
UDP also functions in connectionless mode and therefore does not offer mechanisms for reordering
packets.
[Figure: two users exchange letters over an unreliable service.]
You know that IP is not a reliable protocol. Does UDP increase reliability?
For example, a user sends a letter and requests a reply. If a reply has not been received after n days, the
user can send the letter again.
[Figure: a DNS application wishing reliability runs over UDP, which is not reliable, across the Internet. The client asks the name server "What is the IP@ of 'alcatel.com'?" and receives the answer "'alcatel.com' = 169.109.33.06".]
Other applications are based on UDP even though they need a good level of reliability.
These are generally applications that need to perform extremely simple exchanges such as "request-
reply" exchanges.
Take the case of the Domain Name System (DNS), which uses a "name server" to translate domain names
such as "alcatel.com" into IP addresses.
This is done using a dialog protocol that runs on top of UDP.
When a Client asks for a translation, it obviously wishes to receive a result. However, this level of
reliability is not guaranteed with UDP.
So, the DNS application asks a name server to translate a domain name. This request is made using non-
reliable UDP and IP.
As it happens, the packet is destroyed in the network but there is no reaction from IP or UDP.
It is therefore up to the application to recover the error.
How does it do this?
Quite simply, by triggering a reply Timer when the request is sent.
If, at timeout, a reply has not been received, the Client simply resends the request.
And, hopefully, this time the exchange will proceed as planned.
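The timer-and-resend recovery just described can be sketched with real UDP sockets on the loopback interface. The "name server" below is a local stand-in that deliberately drops the first request to simulate a lost packet, and the returned address is made up:

```python
import socket
import threading

def flaky_server(sock):
    sock.recvfrom(512)                   # first request: silently dropped
    data, addr = sock.recvfrom(512)      # the retry: answered
    sock.sendto(b"93.184.216.34", addr)  # a made-up answer

server = socket.socket(socket.AF_INET, socket.SOCK_DGRAM)
server.bind(("127.0.0.1", 0))
threading.Thread(target=flaky_server, args=(server,), daemon=True).start()

client = socket.socket(socket.AF_INET, socket.SOCK_DGRAM)
client.settimeout(0.5)                   # the reply timer
request = b"what is the IP of example.com?"
for attempt in range(5):
    client.sendto(request, server.getsockname())
    try:
        reply, _ = client.recvfrom(512)
        break                            # got an answer, stop retrying
    except socket.timeout:
        continue                         # timeout: resend the request

print(reply.decode(), "after", attempt + 1, "attempts")
```

Neither IP nor UDP reacts to the lost datagram; the retry loop in the application is the whole recovery mechanism.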
[Figure: the UDP header, followed by the data.]
Now that you have seen the UDP applications, let's look at the fields that make up the UDP header.
You are now familiar with the role of the source and destination port fields.
[Slide summary: UDP offers a connectionless, unreliable service with no flow control and no error recovery, multiplexing several applications (1, 2, 3).]

Exercises:
Which of these applications suit UDP: electronic mail, file transfer, Voice over IP?
For what does the Real-time Transport Control Protocol (RTCP) provide a performance monitoring channel: an IP packet, or an RTP flow?
Does UDP provide packet retransmission: yes or no?
[Figure: position of TCP in the TCP/IP stack. Network layer: IP, with ICMP and ARP alongside; link layer: 802.2 LLC and SNAP over MAC (FDDI, Token Ring, 802.3 Ethernet, ISO Ethernet V2); physical layer: optical fiber, 10Base-T, 10Base2, 10Base5.]
Illustrating the position of TCP in the TCP/IP stack obviously shows that TCP is located in the transport
layer but, more particularly, it presents the main applications that run over this protocol:
HTTP, which enables users to surf the internet.
FTP, which enables effective file transfer.
TELNET, which enables systems to be remote controlled.
SMTP, which enables the sending of electronic mail.
DNS, which is used for translating domain names into IP addresses and which has the particular feature
of functioning over both UDP, as seen previously, and over TCP. In fact, it uses TCP solely to update
databases between name servers.
[Figure: packets sent in the order P1, P2, P3 arrive out of order; TCP delivers them to the application in the original order.]
Although TCP is installed over IP, which is a connectionless protocol, it offers a connection-oriented
service. This means that TCP ensures that packets sent in a particular order over an IP network will be
delivered to the applications in the order they were sent. To make this possible, TCP must insert
sequence numbers in the datagrams.
[Figure: a cash dispenser application requests a withdrawal of 50€ from the central bank. TCP is reliable: segment P1 is acknowledged with P1-OK even though the underlying IP network is not reliable.]
[Figure: the TCP header, followed by optional data.]
As in UDP, certain ports are used for particular services such as Discard, Echo, Time, etc.
Several fields are used to ensure that packets are sequenced correctly and errors recovered:
the sequence number of the first byte in this packet,
the acknowledgement number, which indicates the next byte expected by the other station.
[Figure: the three-way handshake. The client sends SYN (Seq = x). The server answers with SYN (Seq = y) / ACK (Ack = x + 1) after the Connect-Response primitive. The client completes the exchange with ACK (Ack = y + 1) (Seq = x + 1), and Connect-Confirm is signaled.]
Communication between 2 applications operating over TCP therefore begins with a connection
establishment procedure called the "three-way handshake".
The Client application sends TCP a Connect-Request primitive with the destination port, the IP address,
etc.
TCP on the Client side starts by selecting a sequence number at random. It inserts it in the TCP header
and sets the SYN (synchronization) flag.
TCP on the Server side (the remote station) uses the primitive Connect-Indication to notify the
application corresponding to the port number and provides certain parameters such as the calling IP
address.
The Server application uses the primitive Connect-Response to ask TCP to accept the connection.
The TCP on the Server side then chooses a random sequence number.
It sends back its own sequence number along with the SYN flag and indicates that the request has been
received by setting the ACK flag and sending back the sequence number received incremented by 1.
When TCP on the Server side sends this header back, it is not sure that the information will reach its
destination. For this reason, the other station acknowledges receipt of the message by incrementing its
sequence number and sending back the sequence number received plus 1.
In the meantime, TCP on the Client side uses the primitive Connect-Confirm to inform its application
that the session connection has been established.
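The sequence-number arithmetic of the handshake can be sketched as follows (dictionaries stand in for real TCP headers; the modulo keeps the 32-bit wrap-around explicit):

```python
import random

# Each side picks a random initial sequence number and acknowledges
# the other side's number plus one.
x = random.randrange(2**32)   # client's initial sequence number
y = random.randrange(2**32)   # server's initial sequence number

syn = {"flags": {"SYN"}, "seq": x}
syn_ack = {"flags": {"SYN", "ACK"}, "seq": y,
           "ack": (syn["seq"] + 1) % 2**32}
ack = {"flags": {"ACK"}, "seq": syn_ack["ack"],
       "ack": (syn_ack["seq"] + 1) % 2**32}

assert syn_ack["ack"] == (x + 1) % 2**32   # server acknowledges the client's SYN
assert ack["ack"] == (y + 1) % 2**32       # client acknowledges the server's SYN
```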
[Figure: after the establishment phase, the transfer phase begins with Seq = 40:
Data-Request("abcd") sends Data "abcd" with Seq = 40; the receiver delivers Data-Indication("abcd") and returns ACK = 44.
Data-Request("efg") sends Data "efg" with Seq = 44, carried over a slower route.
Data-Request("hi") sends Data "hi" with Seq = 47.
Data-Request("jkl") sends Data "jkl" with Seq = 49.
When "efg" finally arrives, the receiver delivers Data-Indication("efghijkl") and returns ACK = 52.]
Once the session has been established between the 2 applications, data can be exchanged in both
directions.
To make this example easier to understand, the data will be transferred in one direction only.
It should be noted that TCP must ensure that the data is passed on to the applications in the same order
it was sent.
Let’s assume that the sequence number is currently 40. The application uses the Data-Request primitive
to ask its TCP to transmit the 4 characters "abcd". TCP therefore sends this data with its current
sequence number, that is, 40.
The remote station passes this data on to its application. But TCP itself acknowledges receipt of the data
by sending back an acknowledgement number equal to the sequence number received incremented by
the number of bytes received, which in this case is 4.
In the meantime, the sender wishes to send 3 more characters, "efg". Unlike with UDP, TCP doesn’t wait
for acknowledgement of receipt of the previous data before transmitting the new data.
TCP therefore transmits this data immediately. The sequence number of the first byte in the segment is
now 44.
This segment may be carried over another route, which will mean a longer transmission delay.
Next, 2 other characters, "hi", are transmitted with the sequence number 47. This time, the route taken
by the segment is a lot quicker and the segment even reaches its destination before the previous
segment.
Next, another 3 characters, "jkl", with the sequence number 49 are sent along the same faster route.
This segment also overtakes the segment that is still being transported over the longer route.
On the receive side, this data is not passed on to the application because it is no longer in the order in which it was sent. Only when the middle segment is received is all the data waiting in TCP passed on to the application. And only then is the acknowledgement sent back to the sender.
This example shows how TCP uses sequence numbers to ensure that data is delivered in the same order it
was sent.
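The reordering just walked through can be sketched as a small receive buffer, reusing the sequence numbers from the example (the Receiver class is our own illustration, not real TCP code):

```python
# Segments "hi" (seq 47) and "jkl" (seq 49) arrive before "efg"
# (seq 44) and are buffered until the gap is filled.

class Receiver:
    def __init__(self, next_seq):
        self.next_seq = next_seq   # next byte expected (the ACK value)
        self.buffer = {}           # out-of-order segments, keyed by seq
        self.delivered = b""       # what the application has received

    def receive(self, seq, data):
        self.buffer[seq] = data
        # Deliver every contiguous segment starting at next_seq.
        while self.next_seq in self.buffer:
            chunk = self.buffer.pop(self.next_seq)
            self.delivered += chunk
            self.next_seq += len(chunk)
        return self.next_seq       # sent back as the acknowledgement

rx = Receiver(next_seq=40)
assert rx.receive(40, b"abcd") == 44   # in order: delivered, ACK=44
assert rx.receive(47, b"hi") == 44     # gap at 44: buffered, ACK stays 44
assert rx.receive(49, b"jkl") == 44    # still buffered
assert rx.receive(44, b"efg") == 52    # gap filled: all delivered, ACK=52
assert rx.delivered == b"abcdefghijkl"
```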
So, you have seen how TCP offers a connection-oriented service that:
draws on procedures for opening and closing sessions and transferring data.
defines mechanisms for reordering data.
Each station transmits and receives data, but the receive function grants the other station's transmit function a credit that represents the number of incoming bytes the receive function is willing to accept. The amount of credit varies dynamically and is defined via the "Window size" field in the TCP header.
[Figure: a station sends a segment across the INTERNET and waits for the acknowledgement; the Retransmit Timeout (β) governs how long it waits before retransmitting the segment.]
TCP uses various Timers. The main one, the Retransmit Timeout, is used for the waiting-for-
acknowledgement period.
The problem, of course, lies in assigning the right value to this timer, since the time taken to acknowledge a segment depends on numerous parameters:
distance between the stations,
link speed,
system processing time,
traffic in the network,
etc.
Instead of assigning a set value to the timer, TCP sets the timer according to a parameter known as RTT
or Round Trip Time. This parameter measures the time between when a segment is sent and when an
acknowledgement is received.
The Retransmit Timeout is then set based on this RTT.
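The classic calculation (from RFC 793) folds each RTT measurement into a smoothed running average and sets the timeout to a multiple of it. The ALPHA and BETA values below are one choice from the ranges the RFC suggests (0.8 to 0.9 and 1.3 to 2.0):

```python
ALPHA = 0.9   # smoothing factor: weight given to the old estimate
BETA = 2.0    # delay variance factor

def update(srtt, measured_rtt):
    """Fold a new RTT measurement into the smoothed RTT and derive
    the Retransmit Timeout from it."""
    srtt = ALPHA * srtt + (1 - ALPHA) * measured_rtt
    rto = BETA * srtt
    return srtt, rto

srtt = 0.100                             # initial estimate: 100 ms
for sample in (0.120, 0.110, 0.300):     # 300 ms: a sudden spike
    srtt, rto = update(srtt, sample)
    print(f"RTT sample {sample*1000:.0f} ms -> "
          f"SRTT {srtt*1000:.1f} ms, RTO {rto*1000:.1f} ms")
```

Note how the smoothing damps the 300 ms spike instead of letting one slow round trip dictate the whole timeout.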
[Figure: slow start between emitter and receiver. A 512-byte segment is acknowledged (Ack, Window size = x) after one Round Trip Time; cwnd then grows from 1 to 2 to 4 segments, an exponential increase.]
You have seen the flow-control mechanisms that use the "Window size" field in the TCP header. This field
is set by the end stations, implying that the flow control is end-to-end flow control.
But how does flow control work when there is, say, congestion in the network?
Routers, of course, only process data up to level 3. They do not intervene in level 4 TCP and therefore do
not modify parameters in the segment.
It is therefore TCP that implements a congestion-control algorithm. It is not based on another protocol or
particular fields in the messages exchanged but consists in analyzing network behavior and, in particular,
the network’s ability to return acknowledgements.
If an acknowledgement is not returned, you could assume that the segment has been destroyed during
transmission because a particular interface has changed one of the bits in the frame. In practice, this
type of error is relatively uncommon and accounts for less than 1% of messages transmitted. When an
acknowledgement is not returned, it is usually due to congestion in the network.
TCP implements an algorithm known as "slow start".
The transmit station starts by subjecting the network to a kind of test that consists in transmitting a
segment to the remote station.
If the transmit station receives an acknowledgement, it tests the network again by this time transmitting
two consecutive segments.
If it receives the corresponding acknowledgements, it then transmits 4 consecutive segments and waits
for the acknowledgements and so on, exponentially, until a segment or acknowledgement is lost, in
which case another "slow start" process begins.
Numerous algorithms have been suggested over recent years and engineers are continuing to look for
other solutions.
New TCP implementations generally use a combination of the 4 basic internet standard algorithms:
the "slow-start" algorithm that you have just seen,
the "congestion avoidance" algorithm,
the "fast retransmit" algorithm,
the "fast recovery" algorithm.
[Figure: congestion window (in segments) over time. Slow start increases it exponentially up to the slow start threshold (ssthreshold), after which congestion avoidance increases it linearly. When congestion is detected (here at 16 segments), the threshold is halved (16/2 = 8) and slow start begins again.]
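The slow-start and congestion-avoidance growth pattern described above can be sketched as a simplified per-round-trip model (real TCP grows cwnd per acknowledged segment, not per round trip; the threshold value is illustrative):

```python
def next_cwnd(cwnd, ssthresh):
    """One round trip of window growth: double during slow start,
    add one segment per round trip during congestion avoidance."""
    if cwnd < ssthresh:
        return min(cwnd * 2, ssthresh)   # slow start: exponential
    return cwnd + 1                      # congestion avoidance: linear

cwnd, ssthresh = 1, 16
history = []
for _ in range(8):
    history.append(cwnd)
    cwnd = next_cwnd(cwnd, ssthresh)
print(history)   # [1, 2, 4, 8, 16, 17, 18, 19]

# On congestion detection, the threshold is halved and slow start
# begins again from one segment.
ssthresh, cwnd = cwnd // 2, 1
```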
Synthesis
TCP provides:
Flow control
Reliability
Error recovery
Multiplexing/demultiplexing
Connection-oriented service
You have now seen the basics of TCP. A more extensive examination of TCP could include:
a more detailed look at flow-control algorithms with the "Nagle" and "fast retransmit" algorithms,
an analysis of selective acknowledgement mechanisms,
etc.
However, this would require much more time and would only really be useful for developers.
To summarize and conclude, it can be said that TCP offers applications a large number of services:
Firstly, it provides reliability thanks to the use of sequence numbers and acknowledgement
mechanisms.
It also implements error recovery.
It provides full-duplex flow-control mechanisms, which optimize communication.
Although it operates on a datagram network, it provides connection-oriented service, which ensures
that data is delivered in the order it was transmitted.
And finally, it enables the multiplexing of several data flows.
You now have a solid grasp of the basics of transport-level TCP/IP and are capable of identifying the
advantages and disadvantages of TCP compared with UDP.
What are the two TCP fields which are used to assure reliable delivery of data?
What are the possible actions that a receiver can take to slow down the pace at which a sender transmits segments?
[Exercise: a 2 MB file is transferred with window sizes of 8, 16, 32 and 64 kBytes; fill in the missing values on the graph (vertical axis 0 to 40).]
[Figure: SIGTRAN protocol stacks. The M2PA or M2UA adaptation layer runs over SCTP over IP, replacing MTP-2 over MTP-1, to carry SS7 service messages across an IP network.]
To reliably transport SS7 messages over IP networks, the IETF SIGTRAN working group devised the Stream Control Transmission Protocol (SCTP). SCTP allows the reliable transfer of signaling messages between signaling endpoints in an IP network. To establish an association between SCTP endpoints, one endpoint provides the other endpoint with a list of its transport addresses (multiple IP addresses in combination with an SCTP port). These transport addresses identify the addresses which will send and receive SCTP packets.
SCTP Endpoint: an SCTP endpoint is a logical sender or receiver of SCTP segments. An endpoint is a combination of one or more IP addresses and a port number.
Association: SCTP works by establishing a relationship between SCTP endpoints. Such a relationship is known as an association and is defined by the SCTP endpoints involved and the current protocol state.
Segments and Chunks: when SCTP wishes to send a piece of information to the remote end, it sends an SCTP segment to the IP layer, and IP routes the packet to the destination. A number of chunks follow the common SCTP header, and each chunk is comprised of a chunk header plus some chunk-specific content. This content can be either SCTP control information or SCTP user information.
Streams: a stream is a one-way logical channel between SCTP endpoints, a sequence of user messages between two SCTP users. During association establishment, the number of streams from SCTP endpoint A to B and from B to A is specified.
Diagram: file transfer over TCP versus SCTP. With a single TCP connection between endpoints A and B, the records of files 1, 2 and 3 are delivered in one strict global order, so a delayed record holds up everything buffered behind it. With an SCTP association, each file's records travel in their own stream (streams 0, 1 and 2): records are ordered within a stream, but the streams are delivered independently of each other.
IP signaling traffic is usually composed of many independent message sequences between many different signalling endpoints. SCTP allows signaling messages to be independently ordered within multiple streams (unidirectional logical channels established from one SCTP endpoint to another) to ensure in-sequence delivery between associated endpoints. By transferring independent message sequences in separate SCTP streams, it is less likely that the retransmission of a lost message will affect the timely delivery of other messages in unrelated sequences (a problem called head-of-line blocking). Because TCP does enforce head-of-line blocking, the SIGTRAN Working Group recommends SCTP rather than TCP for the transmission of signalling messages over IP networks.
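Head-of-line blocking can be illustrated with a small simulation. This is a simplified model written for this guide (not a real SCTP implementation): messages are released only in sequence order within their channel, so a single ordered channel stalls everything behind a retransmitted message, while independent streams do not.

```python
def deliver(arrivals):
    """arrivals: list of (channel, seq, name) tuples in arrival order.
    A message is released only when all lower sequence numbers of its
    channel have been released (in-sequence delivery per channel)."""
    expected, pending, released = {}, {}, []
    for chan, seq, name in arrivals:
        pending.setdefault(chan, {})[seq] = name
        nxt = expected.get(chan, 0)
        while nxt in pending[chan]:          # release contiguous messages
            released.append(pending[chan].pop(nxt))
            nxt += 1
        expected[chan] = nxt
    return released

# Message a0 is lost and retransmitted last; b0 and b1 arrive on time.
# TCP-like: one channel with global sequence numbers a0=0, b0=1, a1=2, b1=3.
tcp_like = deliver([("c", 2, "a1"), ("c", 1, "b0"), ("c", 3, "b1"), ("c", 0, "a0")])
# SCTP-like: two streams, each with its own sequence numbers.
sctp_like = deliver([("A", 1, "a1"), ("B", 0, "b0"), ("B", 1, "b1"), ("A", 0, "a0")])

print(tcp_like)   # ['a0', 'b0', 'a1', 'b1'] -- nothing delivered until a0 arrives
print(sctp_like)  # ['b0', 'b1', 'a0', 'a1'] -- stream B is not blocked by stream A
```

In the single-channel case the unrelated messages b0 and b1 wait for the retransmitted a0; with per-stream sequencing they are delivered as soon as they arrive.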
Diagram: a signalling relationship between an SS7 network (MTP1 at the physical layer) and an IP network (IP transport).
An Application Server contains a set of one or more unique Application Server Processes (ASPs). Normally, one or more of these ASPs must be actively processing traffic.
Diagram: M2UA is not symmetrical. On the ASP side, the MTP3 user (e.g. ISUP) sits on MTP3 over M2UA/SCTP/IP; on the SGW side, M2UA/SCTP/IP faces the IP network while MTP2/MTP1 face the SS7 network. MTP3 primitives are exchanged between the ASP and the SGW.
Section 6
Application Services
IP Technology
IP for Mobile Networks
TTP18031 D0 SG DEN I1.0
Diagram: the NTP hierarchy. A stratum 1 NTP server serves stratum 2 clients; stratum 2 machines act as client/server nodes, with peer relationships between nodes of the same stratum, and in turn serve stratum 3 clients/servers, and so on down the tree.
NTP is a protocol designed to synchronize the clocks of computers over a network. This protocol has been specifically
designed for Internet environments and uses a client/server model to provide service. NTP version 3 is an internet draft
standard, formalized in RFC 1305. NTP version 4 is a significant revision of the NTP standard, and is the current
development version, but has not been formalized in an RFC.
At the top of any NTP hierarchy are one or more reference clocks. These are electronic clocks synchronized to a
common time reference, for instance, GPS signals, radio signals or extremely accurate frequency control. The accuracy
of the other clocks is judged according to how “close” that clock is to the reference clock (stratum), the network
latency to the clock and its claimed accuracy.
NTP uses the UDP protocol on port 123 for communication between clients and servers. Attempts are made at
designated intervals until the server responds. The interval ranges from once every minute up to 17 minutes depending
on a number of factors.
NTP works on a hierarchical model in which a small number of servers give time to a large number of clients. The clients on each level, or stratum, are in turn potential servers to an even larger number of clients of a higher-numbered stratum. Stratum numbers increase from the primary (stratum 1) servers to the higher-numbered strata at the leaves of the tree. Clients can use time information from multiple servers to automatically determine the best source of time and prevent bad sources from corrupting their own time.
Servers that are directly connected to the reference clock are termed stratum 1. The reference clock itself is referred to as stratum 0. Clients never communicate directly with a stratum 0 device; they always go through a stratum 1 server synchronized to it.
Clients of stratum 1 servers are referred to as stratum 2 clients. If they serve time to clients, they are also referred to as stratum 2 servers, and the clients they serve are known as stratum 3 clients. This continues to higher-numbered strata.
The maximum NTP stratum number for a client is 15; in practice, however, it is rare to find clients with a stratum number above 4 or 5 in real-world configurations.
Diagram: Device_A synchronizes to Device_B (the sync source).
Offset (Device_A) = ((T2 - T1) + (T3 - T4)) / 2 = ((10:14:00 - 10:00:00) + (10:14:01 - 10:00:03)) / 2 = 13 min 59 s
1. Device_A sends an NTP packet to Device_B, with the timestamp identifying the time when it is sent
(that is, 10:00:00, noted as T1) carried.
2. When the packet arrives, Device_B inserts its own timestamp, which identifies 10:14:00 (noted as T2)
into the packet.
3. Before this NTP packet leaves, Device_B inserts its own timestamp once again, which identifies 10:14:01
(noted as T3).
4. When receiving the response packet, Device_A inserts a new timestamp, which identifies 10:00:03am
(noted as T4), into it.
At this time, Device_A has enough information to calculate the following two parameters:
the round-trip delay of an NTP packet between Device_A and Device_B: delay = (T4 - T1) - (T3 - T2) = 2 s,
and the offset of its clock relative to Device_B's clock.
Device_A can then set its own clock according to the above information to synchronize its clock to that of
Device_B
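The arithmetic of the four timestamps can be sketched directly. Note that the offset is the average of (T2 - T1) and (T3 - T4); the timestamps are converted to seconds since midnight for clarity:

```python
def hms(h, m, s):
    """Convert hours/minutes/seconds to seconds since midnight."""
    return h * 3600 + m * 60 + s

T1 = hms(10, 0, 0)    # Device_A sends the request
T2 = hms(10, 14, 0)   # Device_B receives the request
T3 = hms(10, 14, 1)   # Device_B sends the reply
T4 = hms(10, 0, 3)    # Device_A receives the reply

offset = ((T2 - T1) + (T3 - T4)) / 2   # Device_B's clock minus Device_A's
delay = (T4 - T1) - (T3 - T2)          # round-trip network delay

print(offset)   # 839.0 seconds, i.e. 13 min 59 s
print(delay)    # 2 seconds
```

Device_A adds the computed offset to its own clock to synchronize with Device_B.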
Diagram: NTP operating modes. In broadcast/multicast mode, the NTP server periodically broadcasts or multicasts time to clients 1 and 2. In client/server mode, each client sends an NTP request (1) and the server returns an NTP response (2).
The bandwidth requirements for NTP are also minimal. Unencrypted NTP Ethernet packets are 90 bytes long
(76 bytes long at the IP layer). A broadcast server sends out a packet about every 64 seconds. A non-
broadcast client/server requires 2 packets per transaction. When first started, transactions occur about
once per minute, increasing gradually to once per 17 minutes under normal conditions. Poorly synchronized
clients will tend to poll more often than well synchronized clients. In NTP version 4 implementations, the
minimum and maximum intervals can be extended beyond these limits, if necessary.
A unicast client sends a request to a designated server at its unicast address and expects a reply from which
it can determine the time and, optionally, the roundtrip delay and local clock offset relative to the
server.
A multicast server periodically sends an unsolicited message to a designated IPv4 or IPv6 local broadcast address or multicast group address and ordinarily expects no requests from clients. A multicast client listens on this address and ordinarily sends no requests.
For IPv4, the IANA has assigned the multicast group address 224.0.1.1 for NTP, which is used both by
multicast servers and anycast clients.
Diagram: an FTP session over a TCP/IP network — the client issues Get "file1" on the control connection, and the transfer takes place on a separate data connection (data port 1955 in this example).
FTP is a standardized protocol (STD 9). It is described in the standard RFC 959 – File Transfer Protocol (FTP)
and the update RFC 2228 – FTP Security Extensions.
To access files on a remote station, the user must provide the server with user identification information.
The server is responsible for authenticating the information before authorizing access to the files.
FTP uses TCP as its transport protocol in order to offer reliable end-to-end connections.
The FTP server waits for connection requests on ports 20 and 21. Two connections are used:
The first one, the control connection, is for the login and uses the TELNET protocol.
The second one, the data connection, is for data transfer.
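As a sketch, an FTP transfer using Python's standard ftplib follows the same two-connection model; the host name, credentials and file name below are placeholders:

```python
from ftplib import FTP

def fetch_file(host, user, password, remote_name, local_name):
    """Download one file: log in on the control connection (port 21),
    then receive the file contents on a separate data connection."""
    ftp = FTP(host)                       # opens the control connection
    ftp.login(user=user, passwd=password) # authentication on the control connection
    with open(local_name, "wb") as f:
        ftp.retrbinary("RETR " + remote_name, f.write)  # data connection transfer
    ftp.quit()

# Example call (placeholder server):
# fetch_file("ftp.example.com", "student", "secret", "file1", "file1")
```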
SSH File Transfer Protocol (sometimes called Secure File Transfer Protocol or SFTP) is a network protocol that provides file transfer and manipulation functionality over any reliable data stream. It is typically used with version two of the SSH protocol (TCP port 22) to provide secure file transfer, but is intended to be usable with other protocols as well.
The SFTP protocol allows for a range of operations on remote files – it is more like a remote file system
protocol. An SFTP client's extra capabilities compared to an SCP client include resuming interrupted
transfers, directory listings, and remote file removal.
SFTP is not FTP run over SSH, but rather a new protocol designed from the ground up by the IETF SECSH working group.
Note _ In WinSCP, no mechanism is provided for key generation. A program like puttygen.exe is necessary to generate the key files.
Diagram: audio and video applications each use RTP and RTCP, carried over UDP over IP. RTP: Real-time Transport Protocol; RTCP: Real-time Transport Control Protocol.
RTP can: transport real-time data, recover the time base, and allow conferencing.
RTP cannot: act at the routers' level, control the QoS (Quality of Service), make resource reservations, or either guarantee packet delivery or retransmit missing packets.
Diagram: voice samples 1 to 20 are sent at regular intervals. After a variable network delay they arrive out of order; a reconstruction delay is then applied before playout. Packet 5 never arrives, and packets 13 and 16 arrive too late for their playout slots.
In addition to this variable transmission delay, time is required to reorganize and reconstruct the flow.
When the flow is reconstructed in this example, it is considered that:
packet 5 has been lost,
packets 13 and 16 have arrived too late to be included in the reconstructed flow.
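The reconstruction step above is the job of a de-jitter buffer. A minimal sketch, assuming 20 ms packets and invented arrival times in milliseconds: packet n is played at (playout_delay + n * 20) ms, and anything arriving after its slot is discarded.

```python
def reconstruct(arrivals, playout_delay=60, period=20):
    """arrivals: {seq: arrival_time_ms}. Returns the seqs actually played."""
    played = []
    for seq in sorted(arrivals):
        slot = playout_delay + seq * period   # scheduled playout instant
        if arrivals[seq] <= slot:
            played.append(seq)                # arrived in time for its slot
    return played

# Packet 5 is lost; packets 13 and 16 are badly delayed by the network.
arrivals = {n: n * 20 + 15 for n in range(20) if n != 5}  # nominal jitter of 15 ms
arrivals[13] = 13 * 20 + 90    # arrives after its playout slot
arrivals[16] = 16 * 20 + 90

missing = [n for n in range(20) if n not in reconstruct(arrivals)]
print(missing)   # [5, 13, 16] -- matches the example above
```

A larger playout delay would recover packets 13 and 16 at the cost of a longer mouth-to-ear delay; this is the trade-off adaptive de-jitter buffers manage.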
Diagram: the RTP header — Timestamp; Synchronization Source Identifier (SSRC), which identifies the source (important in conference mode); Contributing Source Identifier (CSRC); a profile-dependent part; and the data (payload).
The Version field, V: 2 bits. Indicates the version of the protocol (V=2).
The Padding field, P: 1 bit. If P equals 1, the packet contains additional padding bytes to complete the
last packet.
The Extension field, X: 1 bit. If X equals 1, the header is followed by an extension packet.
The CSRC count field, CC: 4 bits. Contains the number of CSRCs that follow the header.
The Marker field, M: 1 bit. Its meaning is defined by an application profile.
The Payload Type (PT) field: 7 bits. This field identifies the type of payload (audio, video, image, text,
html, etc.) See the IANA site “ASSIGNED NUMBERS” (http://www.iana.org/numbers.html) for the
various standardised codes (RTP Payload types (PT) for standard audio and video encodings).
The Sequence number field: 16 bits. Its initial value is random and increments by one each time a
packet is sent. It can be used to detect packet loss.
The Timestamp field: 32 bits. Reflects the sampling instant of the first byte in the RTP packet. The
sampling instant must be derived from a clock that increments monotonically and linearly in time to
allow synchronisation and jitter calculations.
The SSRC field: 32 bits. A unique synchronisation source identifier chosen randomly by the application. The SSRC field identifies the synchronisation source (or more simply the “source”). This identifier is chosen randomly and has the advantage of being unique amongst all the sources in the same session.
The CSRC field: 32 bits. Identifies the contributing sources (conference). The list of CSRCs identifies the sources (SSRCs) which have contributed to the data contained in the packet carrying these identifiers. The number of identifiers is given in the CC field.
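The fixed 12-byte header layout can be sketched with Python's struct module. This assumes no CSRC list (CC = 0); the field values are invented for illustration:

```python
import struct

def build_rtp_header(pt, seq, timestamp, ssrc, version=2, p=0, x=0, cc=0, m=0):
    """Pack the 12-byte fixed RTP header (network byte order)."""
    byte0 = (version << 6) | (p << 5) | (x << 4) | cc   # V, P, X, CC
    byte1 = (m << 7) | pt                               # M, PT
    return struct.pack("!BBHII", byte0, byte1, seq, timestamp, ssrc)

def parse_rtp_header(data):
    """Unpack the fixed header fields from the first 12 bytes."""
    byte0, byte1, seq, timestamp, ssrc = struct.unpack("!BBHII", data[:12])
    return {"version": byte0 >> 6, "padding": (byte0 >> 5) & 1,
            "extension": (byte0 >> 4) & 1, "cc": byte0 & 0x0F,
            "marker": byte1 >> 7, "payload_type": byte1 & 0x7F,
            "seq": seq, "timestamp": timestamp, "ssrc": ssrc}

hdr = build_rtp_header(pt=0, seq=4711, timestamp=160, ssrc=0x12345678)
print(parse_rtp_header(hdr))
```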
Diagram: with 8 kHz sampling, one timestamp unit is 125 µs. A packet carrying 20 ms of voice contains 160 samples, so the first packet has timestamp 0, the next has timestamp 160, and so on every 20 ms.
The 32-bit Timestamp field reflects the sampling instant of the first byte in the RTP packet. The
sampling instant must be derived from a clock that increments monotonically and linearly in time to
allow synchronization and jitter calculations.
Diagram: an analog voice signal is sampled at a frequency of 8 kHz (8,000 samples per second, t = 0.125 ms) with a 3-bit amplitude giving 8 different values (000 to 111).
Result: 101 100 100 101 000 010 011 011 011 001 110 101 100
Sampling
The method used to digitize an analog signal such as voice depends on two parameters: frequency and amplitude.
Together these parameters determine the quality of the sample and the amount of information required to reconstruct the message.
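The 3-bit quantization shown on the slide can be sketched as follows; the 8-level mapping and the test signal are invented for illustration:

```python
import math

def quantize(value, bits=3):
    """Map a value in [-1, 1] to one of 2**bits amplitude levels,
    returned as a binary code (e.g. '000' .. '111' for 3 bits)."""
    levels = 2 ** bits
    index = min(int((value + 1) / 2 * levels), levels - 1)
    return format(index, "0{}b".format(bits))

# 13 samples of a sine wave, mirroring the 13 codes on the slide.
samples = [math.sin(2 * math.pi * t / 13) for t in range(13)]
print(" ".join(quantize(s) for s in samples))
```

Increasing `bits` (the amplitude resolution) or the number of samples brings the reconstruction closer to the original signal, at the cost of more information to transport.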
Diagram: decoding the sequence 101 100 100 101 000 010 011 011 011 001 110 101 100 reproduces a stepped approximation of the original signal.
Compression required
When an analog signal is reconstructed using digital information, the reconstructed signal differs from the
original one.
If we want to produce a digital signal that is closer to the original, the number of samples and the
amplitude must be increased. However, this means that the amount of information to be transported also
increases.
5 carrots
1 onion chopped
20 g butter
1 chicken stock cube
1 potato
150 g chicken
1 litre water
Salt and pepper
To illustrate the various compression families, let’s consider the example of 3 chefs who wish to write
down the list of ingredients required to make carrot soup.
First method: the first chef writes the list of ingredients as it is —
Ingredients: 5 carrots, 1 onion chopped, 20 g butter, 1 chicken stock cube, 1 potato, 150 g chicken, 1 litre water, salt and pepper — 92 characters.
The first chef writes the list of ingredients as it is. He uses 92 characters.
Second method, with a shared lexicon: C = carrot, O = onion chopped, B = butter, P = chicken stock cube, T = potato, PL = 150 g chicken, E = water, SP = salt and pepper.
Ingredients: 5 C, 1 O, 20 g of B, 1 P, 1 T, 1 PL, 1 E, SP — 21 characters.
The second chef uses a standard lexicon from a recipe book and writes the list of ingredients using the
lexicon. He uses 21 characters.
The third chef uses a standard lexicon from a recipe book and, drawing on his experience, determines the ingredients that add the least taste to the soup. He then writes the list of ingredients, leaving some of them out because they don't make much difference to the soup. He uses 13 characters.
Associate each chef with his method for writing the recipe:
92 characters — no compression,
21 characters — compression with shared lexicon,
13 characters — destructive compression with shared lexicon.
Diagram: destructive compression. One chef judges that the chicken stock cube and the butter do not give this soup much taste; another judges that the onion does not. Each removes different ingredients, so the two shortened lists (e.g. Ingredients: 5 C, 1 O, 1 T, 1 PL, 1 E, SP versus 5 C, 1 T, 1 PL, 1 E, SP) differ from the original recipe and from each other.
When a chef decides which ingredients must be removed, he changes the list of ingredients slightly. He
therefore changes the original recipe. Another chef may then read the recipe and also decide to change it.
The soup could end up tasting different from the original recipe.
They all make carrot soup but the quality of the soup is subjective…
The recipe has been changed so it must now be decided which is the nicest carrot soup.
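The second chef's method is essentially dictionary coding: both sides share a lexicon, so only short codes need to be transmitted. A toy sketch (the lexicon is the recipe example above; real codecs share codebooks in the same way):

```python
lexicon = {
    "carrots": "C", "onion chopped": "O", "butter": "B",
    "chicken stock cube": "P", "potato": "T", "150 g chicken": "PL",
    "water": "E", "salt and pepper": "SP",
}

def compress(text, lexicon):
    """Replace known phrases with their codes, longest phrases first."""
    for phrase, code in sorted(lexicon.items(), key=lambda kv: -len(kv[0])):
        text = text.replace(phrase, code)
    return text

def decompress(text, lexicon):
    """Invert the lexicon, expanding longest codes first."""
    reverse = {code: phrase for phrase, code in lexicon.items()}
    for code, phrase in sorted(reverse.items(), key=lambda kv: -len(kv[0])):
        text = text.replace(code, phrase)
    return text

recipe = "5 carrots, 1 onion chopped, 150 g chicken, salt and pepper"
short = compress(recipe, lexicon)
print(short)                                   # 5 C, 1 O, PL, SP
assert decompress(short, lexicon) == recipe    # lossless round trip
```

Note that this round trip is lossless; the third chef's method is different in kind, because the dropped ingredients can never be recovered.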
Table: R factor versus MOS (Mean Opinion Score) — the R factor is considered more reliable.
R 90-100, MOS 4.1-5.0: very satisfied
R 80-90, MOS 3.7-4.1: satisfied
R 70-80, MOS 3.4-3.7: some users dissatisfied
R 60-70, MOS 2.9-3.4: many users dissatisfied
R 50-60, MOS 2.4-2.9: nearly all users dissatisfied
R 0-50, MOS 1-2.4: not recommended
The MOS terminology is defined by ITU-T P.800.1; the PESQ (Perceptual Evaluation of Speech Quality) MOS is defined by ITU-T P.862.
R Factor
The ITU-T has defined a model for assessing the quality of a codec. This benchmark can be used to compare the quality of one codec with another.
The R factor is calculated on a scale of 0 to 100 (the E-model) based on user perception: 100 is excellent and 0 is poor. The R factor calculation begins with an unimpaired signal. If there is no network or equipment, quality is perfect.
This is expressed by the equation:
R = R0 (e.g. 93.2)
But the network and equipment impair the signal, thus reducing signal quality as it travels from one end to
the other:
R = R0 - Is - Id - Ie-eff + A, where:
R0: represents the basic signal-to-noise ratio, including noise sources such as circuit noise and room noise.
Is: a combination of all impairments which occur more or less simultaneously with the voice signal.
Id: represents the impairments caused by delay.
Ie-eff: the effective equipment impairment, representing impairments caused by low bit-rate codecs; it also includes impairment due to packet losses of random distribution.
A: this advantage factor allows for compensation of impairment factors when the user gains other advantages from the access (e.g. mobility).
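The equation can be evaluated directly. In the sketch below, only the R0 default of 93.2 comes from the text; the impairment values are invented for illustration:

```python
R0 = 93.2        # basic signal-to-noise ratio (default value cited above)
Is = 1.4         # simultaneous impairments (illustrative assumption)
Id = 7.0         # delay impairments (illustrative assumption)
Ie_eff = 11.0    # effective equipment impairment, e.g. a low bit-rate codec (assumption)
A = 0.0          # advantage factor (none assumed)

R = R0 - Is - Id - Ie_eff + A
print(round(R, 1))   # 73.8 -> in the "some users dissatisfied" band of the table
```

Each impairment subtracts directly from the budget of 93.2, which is why stacking a long delay on top of a low bit-rate codec quickly pushes R below the "satisfied" threshold.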
Slide: G.711 encoding — 8-bit amplitude, 8 kHz sampling (t = 0.125 ms), bit rate 64 kbps.
These codecs do not use compression. This means that the rate is calculated using the following formula: rate = sampling frequency x bits per sample (e.g. 8,000 x 8 = 64 kbps).
G.711 is the reference codec. It works as previously described with an amplitude of 8 bits and a sampling
frequency of 8 kHz.
G.726 Adaptive Differential Pulse Code Modulation (ADPCM) uses a compression whereby only the
difference between two samples is encoded. In this case, the amplitude can be reduced to 2 bits with an
acceptable loss of quality.
Slide: G.726 ADPCM — amplitude 2 to 5 bits, sampling 8 kHz. Instead of the absolute sample value, only the difference between consecutive sample values is encoded. Rate: 16 to 40 kbps (usually 32 kbps).
These codecs use compression methods, which means that the formula used for the previous codec is not
applicable.
The information can be further compressed using a lexicon and an algorithm which models the human vocal tract mathematically. As soon as the receiver and the sender have agreed on the lexicon to be used, the model then sends the vocal cord impulses only.
Slide: G.729 — rate 8 kbps, sampling 8 kHz, 20 bytes every 20 ms (2 bytes every 20 ms during silences), encoding delay 15 ms.
AMR — rate 4.75 to 12.2 kbps, sampling 8 kHz, between 95 and 244 bits every 20 ms (39 bits every 160 ms during silences), encoding delay 20 ms.
These codecs use compression methods, which means that the formula used for the previous codec is not
applicable.
G.729 samples the voice using a similar method to G.711. The information is then compressed using a lexicon and an algorithm which models the human vocal tract mathematically. As soon as the receiver and the sender have agreed on the lexicon to be used, the model then sends the vocal cord impulses only.
Adaptive Multi-Rate (AMR) uses a compression similar to G.729. However, the rate is not fixed: eight levels of quality and data rate have been defined (from AMR 4.75 kbps to 12.2 kbps).
Frequencies
The encoding of the sound is based on human hearing. Among the principal properties, three will be used
to compress a sampled audio flow:
Sensitivity to certain frequencies: the human ear cannot hear certain frequencies.
Recall that the frequency of a sound indicates its pitch, much as the colour of an object is due to a frequency of light. A high-pitched sound has a high frequency, whereas a low-pitched sound has a low frequency.
Certain sounds are too high-pitched to be perceived by the human ear; these ultrasounds can, however, be perceived by certain animals. Other sounds are too low-pitched to be heard; these are known as infrasound. Inaudible sounds do not require encoding.
Frequency hiding: a strong sound will hide a lower-level sound with a close frequency.
Temporal hiding: the ear also tends to mask sounds produced just before or after the emission of a relatively strong noise.
This noise drowns out any sound emitted just afterwards. These sounds are not perceived by the human ear and therefore do not require encoding.
Diagram: codec quality on the R factor / MOS scale — G.711 (64 kbps) rates “very satisfied”; AMR (12.2 kbps) and G.726 (32 kbps) rate “satisfied”; G.729 falls around “some users dissatisfied”.
Slide: packet size — G.729 (8 kbps) produces a 20-byte payload every 20 ms; with VAD (Voice Activity Detection, assuming 35% silence) the average payload drops to about 14 bytes.
Voice over IP (VoIP) sends packet information. To understand the calculations of traffic over IP, imagine
that you are filling a cup from a jug. The flow from the jug is constant, but a drop leaves the cup every 20
ms.
To calculate the rate over IP we need to convert the codec rate into the number of bytes transferred
during 20 ms.
Per-packet overhead is IP (20 bytes) + UDP (8 bytes) + RTP (12 bytes), with one packet every 20 ms:
G.711: 20 + 8 + 12 + 160-byte payload.
G.726: 20 + 8 + 12 + 80-byte payload — useful bandwidth 32 kbps, used bandwidth about 50 kbps.
AMR 12.2: 20 + 8 + 12 + 4 (NbUP) + 30.5-byte payload — useful bandwidth 12.2 kbps.
G.729: 20 + 8 + 12 + 20-byte payload (about 14 bytes on average with VAD) — useful bandwidth 8 kbps, used bandwidth about 23 kbps.
RTP is a datagram protocol that is designed for real-time data such as streaming audio and video.
UDP is a connectionless datagram protocol. It is a "best effort" or "unreliable" protocol - not because it is
particularly unreliable, but because it does not verify that packets have reached their destination, and
gives no guarantee that they will arrive in order.
IP performs the basic task of getting packets of data from source to destination. IP can carry data for a
number of different higher level protocols; these protocols are each identified by a unique IP Protocol
Number.
Diagram: number of simultaneous calls over an STM-1 (149 Mb/s) for each codec — TDM with G.711: 1,953 calls (63 x 31); VoIP: G.711 1,848 calls, G.726 2,995, AMR 5,546, G.729 6,511 — plotted against the resulting R factor, which ranges from about 89.3 down to 68.9 for G.729.
This diagram shows that with an STM-1 (149 Mb/s), voice over IP is worthwhile only if you do not use G.711.
In fact, an STM-1 can transport 63 PCM systems using G.711 in TDM, which means 31 TSs x 63 PCMs = 1,953 calls.
VoIP becomes viable with G.726, AMR and G.729; however, with G.729 quality decreases to the R factor rating “many users dissatisfied”.
Section 7
Quality of Service
IP Technology
IP for Mobile Networks
TTP18031 D0 SG DEN I1.0
7. QoS in IP networks
Why implement Quality of Service?
Diagram: a converged IP network carrying broadcast TV, VoIP, audio/video conferencing and streaming video, interconnected with a PBX and the PSTN/ISDN.
Up to now, IP networks were dedicated to transporting data, and only best effort was provided to the users.
As we converge the network and put voice and video applications on the IP network, delay affects interactive conversation.
The ITU says that a packet delay of 150-200 ms degrades the interactivity in a conversation.
IP needs enough intelligence to differentiate one packet from another and provide different service levels based on the requirements of the applications.
7. QoS in IP networks
What is Quality of Service?
QoS is characterized by delay, throughput and jitter.
7. QoS in IP networks
Delay and jitter
Delay: the amount of time it takes a packet to get through the network.
Delay = f(route, line speed, queue size, network load)
Jitter: the variation in delay between packets.
7. QoS in IP networks
Causes of the delay
Diagram: serialization delay — a 1500-byte frame takes 187 ms on a low-speed link (64 kb/s) but only 1.2 µs on a high-speed link (10 Gb/s). Propagation delay — about 5 ms per 1000 km.
In advanced high-speed routers, the switching delay is of the order of tens of microseconds and is therefore negligible. Thus, the one-way delay in a network is caused by three main components:
Serialization delay at each hop. This is the time it takes to clock all the bits of the packet onto the wire. This is very significant on a low-speed link (187 milliseconds (ms) for a 1500-byte packet on a 64-kbps link) and is entirely negligible at high speeds (1.2 microseconds for a 1500-byte packet on a 10-Gbps link). For a given link, this is clearly a fixed delay.
Propagation delay end-to-end. This is the time it takes for the signal to physically propagate from one end of the link to the other. This is constrained by the speed of light on fiber (or the propagation speed of electrical signals on copper) and is about 5 ms per 1000 km. Again, for a given link, this is a fixed delay.
Queuing delay at each hop. This is the time spent by the packet in an egress queue waiting for transmission of other packets before it can be sent on the wire. This delay varies with queue occupancy, which in turn depends on the packet arrival distribution and queue service rate.
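The two fixed components can be checked with a few lines of arithmetic. The propagation speed below assumes roughly two-thirds of the speed of light, consistent with the 5 ms per 1000 km figure:

```python
def serialization_delay_s(packet_bytes, link_bps):
    """Time to clock all bits of the packet onto the wire."""
    return packet_bytes * 8 / link_bps

def propagation_delay_s(distance_km, speed_km_per_s=200_000):
    """Signal travel time; ~200,000 km/s assumes ~2/3 c on fiber."""
    return distance_km / speed_km_per_s

print(serialization_delay_s(1500, 64_000))    # 0.1875 s  (~187 ms on a 64 kb/s link)
print(serialization_delay_s(1500, 10**10))    # 1.2e-06 s (1.2 us on a 10 Gb/s link)
print(propagation_delay_s(1000))              # 0.005 s   (5 ms per 1000 km)
```

Only the queuing delay cannot be computed this way, since it depends on the traffic load at each hop.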
7. QoS in IP networks
QoS requirements
Interactive applications:
300 ms < delay < 400 ms
jitter not really relevant
0.5% < packet loss < 1% (involves rare retransmissions)
Non-interactive applications:
0.1% < packet loss < 0.5% (drives the throughput via TCP)
delay irrelevant
jitter irrelevant
Although many applications using a given network may each potentially have their own specific QoS
requirements, they can actually be grouped into a limited number of broad categories with similar QoS
requirements. These categories are called classes of service. The number and definition of such classes of
service is arbitrary and depends on the environment.
In the context of telephony, we'll refer to the delay between when a sound is made by a speaker and when that sound is heard by a listener as the mouth-to-ear delay. Telephony users are very sensitive to this mouth-to-ear delay because it impacts conversational dynamics and can result in echo. A mouth-to-ear delay below 150 ms results in very high-quality perception for the vast majority of telephony users. Hence, this is used as the design target for very high-quality voice over IP (VoIP) applications. Less-stringent design targets are also used in some environments where good or medium quality is acceptable.
Because the codec on the receiving VoIP gateway effectively needs to decode a constant rate of voice
samples, a de-jitter buffer is used to compensate for the delay variation in the received stream. This
buffer effectively turns the delay variation into a fixed delay. VoIP gateways commonly use an adaptive
de-jitter buffer that dynamically adjusts its size to the delay variation currently observed. This means
that the delay variation experienced by packets in the network directly contributes to the mouth-to-ear
delay.
Therefore, assuming a delay budget of 40 ms for the telephony application itself (packetization time, voice
activity detection, codec encoding, codec decoding, and so on), you see that the sum of the VoIP one-way
delay target and the delay variation target for the network for high-quality telephony is 110 ms end to
end (including both the core and access links).
Assuming random distribution of loss, a packet loss of 0.1- 0.5 % results in virtually undetectable, or very
tolerable, service degradation and is often used as the target for high-quality VoIP services.
For interactive mission-critical applications, an end-to-end RTT on the order of 300-400 ms is usually a
sufficient target to ensure that an end user can work without being affected by network-induced delay.
Delay variation is not really relevant. A loss ratio of about 0.5-1% may be targeted for such applications,
resulting in sufficiently rare retransmissions.
For noninteractive mission-critical applications, the key QoS element is to maintain a low loss ratio (with
a target in the range of 0.1-0.5 %) because this is what drives the throughput via the TCP congestion
avoidance mechanisms. Only loose commitments on delay are necessary for these applications, and delay
variation is irrelevant.
7. QoS in IP networks
Admission control and queue management
Diagram: network traffic load varying over a 24-hour period.
In the absence of routing change, because the serialization delay and propagation delay are fixed by physics
for a given path, the delay variation in a network results exclusively from variation in the queuing delay
at every hop. In the event of a routing change, the corresponding change of the traffic path is likely to
result in a sudden variation in delay.
7. QoS in IP networks
Line speed and delay
Diagram: on a 56 kb/s link, a 66-byte voice packet queued behind a 1500-byte data packet suffers 214 ms of serialization delay. On a 1 Gb/s link, the same 1500-byte packet adds only about 12 µs of serialization delay.
Gigabit Ethernet changes the way you look at statistical multiplexing. Let me remind you that a standard
Ethernet frame is 1500 bytes. This means a 1500 byte packet can be transmitted in roughly 12
microseconds across a 1 Gbps link, assuming that the link actually delivers the full 1 Gbps. In reality,
8B/10B encoding reduces that a little, but not significantly. If a voice packet has to wait around for even
100 microseconds before it can be forwarded, who cares? You need to deliver voice packets in 150,000
microseconds, end to end to keep your voice users happy, otherwise, they will complain about delay.
7. QoS in IP networks
Admission control: the SLA (Service Level Agreement)
Diagram: the network operator and the user sign a legal contract, the Service Level Agreement, which specifies guarantees on performance: packet loss, end-to-end delay and availability. Traffic shaping is applied where the user's traffic enters the IP network.
The network operator guarantees a certain quality of service. In return, it must check that the customer
does not exceed his contracted rights. For this reason it sets up the following functions:
"Traffic admission control": admission control describes how carriers control the traffic entering the
network.
"Traffic shaping" (or "traffic policing"): traffic shaping controls the rate at which traffic enters the
network. Typically carriers shape traffic to ensure that the customer conforms to the Service Level
Agreement. For example, if the customer sends high-priority traffic at 100 kb/s, the carrier "shapes" it
at the network entry point to ensure that only 100 kb/s enters the network.
Section 7 Page 11
7. QoS in IP networks
Queue management
[Slide: a switch/router with multi-priority queues (high, medium, low) on each input and output port
around the switch fabric. Traffic is covered by an SLA, with explicit admission control and shaping;
this approach costs more.]
Today, modern switches and routers have multiple priority queues per port, for example:
High priority
Medium priority
Low priority
allowing packets to be placed in the queue appropriate to the desired QoS.
The packets are then extracted from the various queues according to the deadlines to be respected.
The network device (switch or router) receiving incoming packets selects the queue according to the
markings in the IP packet (the DiffServ Code Points).
Tail-drop management discards packets that arrive after a queue has reached its maximum capacity.
Section 7 Page 12
7. QoS in IP networks
Stateful/stateless QoS
[Slide: with stateful QoS (ATM/FR), each node along the path stores information about every flow; with
stateless QoS, nodes store no information about flows.]
A flow is a sequence of packets from a source towards a destination that requires the same network
service, for example all the packets of one conversation.
Stateless/stateful
Unlike "stateful" equipment (ATM, Frame Relay and X.25 switches), "stateless" equipment stores no
information about flows. Routers are stateless equipment: when they receive a packet, they process it and
dispatch it on the exit interface, but store no information following this treatment. Another packet,
belonging however to the same flow, will have to undergo the same treatment again.
Historically, stateful QoS ("IntServ") was favoured in networks of the ATM and Frame Relay type, but on
the public Internet it poses scaling problems: how can every network element memorize QoS information
for millions of connections?
For this reason the IETF, in the 1990s, moved to stateless QoS in the form of "Differentiated Services"
(DiffServ).
The Type of Service field (IPv4) or Traffic Class field (IPv6) is used to manage QoS; its value
determines the queue a packet will use.
Section 7 Page 13
7. QoS in IP networks
Stateful QoS: Integrated Services (RSVP)
[Slide: with RSVP, resources are reserved hop by hop at every router along the path.]
Section 7 Page 14
7. QoS in IP networks
Stateless QoS: Type of Service
Service Type:
The service type is an indication of the quality of service requested for this IP datagram
The Type of Service is used to indicate the quality of the service desired. The type of service is an
abstract or generalized set of parameters which characterize the service choices provided in the
networks that make up the internet. This type of service indication is to be used by gateways to select
the actual transmission parameters for a particular network, the network to be used for the next hop,
or the next gateway when routing an internet datagram.
Section 7 Page 15
7. QoS in IP networks
ToS: Precedence (RFC 791)
The Precedence field (bits 0-2 of the ToS byte) is intended to denote the importance or priority of the
datagram. It specifies the nature and priority of the datagram:
• 000: Routine
• 001: Priority
• 010: Immediate
• 011: Flash
• 100: Flash override
• 101: Critical
• 110: Internetwork control
• 111: Network control
Section 7 Page 16
7. QoS in IP networks
ToS: Precedence management
[Slide: under congestion, a router serves its precedence queues (Prec 4 down to Prec 0) in priority
order before forwarding into the IP network.]
Section 7 Page 17
7. QoS in IP networks
DiffServ
[Slide: the IPv4 header with its Type of Service byte highlighted. In DiffServ, bits 0-5 of this byte
form the Code Point (the first three bits being the Class Selector); code-point pool 0 is for standard
assignments, pool 1 for experimental or local usage.]
Advantages:
No signaling protocol: each packet conveys its own QoS marking
No per-flow information to memorize in each network element
No dimensioning problem
Section 7 Page 18
7. QoS in IP networks
Diffserv: principle of operation
[Slide: incoming traffic passes through a classifier, then traffic conditioning (meter, marker,
shaper/dropper), then queue management and an output scheduler. Example Per-Hop Behaviors and link
shares: EF 65%, AF2 20%, BE 5%.]
DiffServ is a flexible model. It is up to each operator to decide how many classes of service to support,
which PHBs to use, which traffic conditioning mechanisms to use, and how to allocate capacity to each
PHB to achieve the required QoS for each class of service.
The DiffServ Code Point determines the Per-Hop Behavior of the network nodes.
If all the traffic on an access link uses the same Code Point, then the PHB depends upon load.
Traffic in the high-priority queue should wait less time and experience better network quality of service.
First of all, the packets received on an interface are classified according to their PHB.
There are three principal types of traffic, corresponding to three types of PHB:
EF, "Expedited Forwarding": traffic needing low delay, little jitter and a guaranteed bandwidth
AF, "Assured Forwarding": traffic whose bandwidth can be divided according to policies
BE, "Best Effort": traffic for which the network will nevertheless make its best effort
Before being directed towards the exit interface, the traffic passes through the "traffic conditioning"
process, where it is measured in order to check that it strictly respects the subscribed contract in
terms of rate, volume, etc.
This traffic can be downgraded, i.e. re-marked with a lower-quality PHB, or even discarded, according to
the adopted policy.
The "scheduler" process applies a flow-treatment policy to forward packets to the exit interface.
Another process, queue management, is also set up, mainly in the event of congestion, in order to drop
certain packets as they enter a congested router and so avoid aggravating the problem.
Section 7 Page 19
7. QoS in IP networks
Diffserv : encoding
The DSCP (Differentiated Services Code Point) occupies bits 0-5 of the DS field; bits 6-7 are unused by
DiffServ.

DSCP (bits 0-5)   PHB
0 0 0 0 0 0       Best effort (default)
0 0 1 x x 0       Class 1 (Assured Forwarding)
0 1 0 x x 0       Class 2 (Assured Forwarding)
0 1 1 x x 0       Class 3 (Assured Forwarding)
1 0 0 x x 0       Class 4 (Assured Forwarding)
1 0 1 1 1 0       Expedited Forwarding
Best effort is the default forwarding behavior, available in all routers (including those not running
DiffServ), whose responsibility is simply to deliver as many packets as possible as soon as possible.
This PHB is intended for all traffic for which no special QoS commitments are contracted. The default PHB
essentially specifies that a packet marked with a DSCP value of 000000 receives the traditional
best-effort service.
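As a sketch of how a router might read the marking, the helpers below (illustrative names, not a vendor API) extract the DSCP from the ToS byte and map it to the PHBs of the table above:

```python
# The six high-order bits of the IPv4 ToS byte form the DSCP; the two
# low-order bits are unused by DiffServ (they later became ECN).

def dscp_from_tos(tos_byte):
    return tos_byte >> 2

def phb_name(dscp):
    """Map a code point to its standard PHB (illustrative subset)."""
    if dscp == 0:
        return "Default (best effort)"
    if dscp == 0b101110:
        return "EF (Expedited Forwarding)"
    # AF code points: class (1..4) in the top 3 bits, then dp1..dp3, then 0
    if dscp >> 3 in (1, 2, 3, 4) and dscp & 0b111 in (0b010, 0b100, 0b110):
        return f"AF{dscp >> 3}{(dscp & 0b111) >> 1}"
    return "unassigned/local"

print(phb_name(dscp_from_tos(0xB8)))  # ToS 0xB8 -> DSCP 46 -> EF
```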
Section 7 Page 20
7. QoS in IP networks
Diffserv : Assured Forwarding
In a typical application, a company uses the Internet to interconnect its geographically distributed
sites and wants an assurance that IP packets within this intranet are forwarded with high probability as
long as the aggregate traffic from each site does not exceed the subscribed information rate.
Assured Forwarding (AF) PHB group is a means for a provider to offer different levels of forwarding
assurances for IP packets received from a customer.
Four AF classes are defined, where each AF class is in each node allocated a certain amount of
forwarding resources (buffer space and bandwidth). IP packets that wish to use the services provided by
the AF PHB group are assigned by the customer or the provider into one or more of these AF classes
according to the services that the customer has subscribed to.
Within each AF class IP packets are marked (again by the customer or the provider) with one of three
possible drop precedence values. In case of congestion, the drop precedence of a packet determines the
relative importance of the packet within the AF class. A congested node tries to protect packets with a
lower drop precedence value from being lost by preferably discarding packets with a higher drop
precedence value.
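The AF code-point layout (class bits, then drop-precedence bits, then a zero) can be expressed as a small formula; `af_dscp` is a hypothetical helper, with values per RFC 2597:

```python
# AFxy code points: class x in 1..4, drop precedence y in 1..3.
# In binary the DSCP is xxx yy0, i.e. 8*x + 2*y.

def af_dscp(af_class, drop_prec):
    """DSCP value for AF<class><drop_prec>."""
    if not (1 <= af_class <= 4 and 1 <= drop_prec <= 3):
        raise ValueError("AF class is 1..4, drop precedence is 1..3")
    return (af_class << 3) | (drop_prec << 1)

# AF11=10, AF12=12, AF13=14, ..., AF41=34, AF42=36, AF43=38
table = {f"AF{x}{y}": af_dscp(x, y) for x in range(1, 5) for y in range(1, 4)}
print(table["AF31"])  # 26
```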
Section 7 Page 21
7. QoS in IP networks
Diffserv: traffic control by Token Bucket
[Slide: tokens enter the bucket at a constant rate. For each arriving packet, tokens equivalent to the
packet size are removed if available in the bucket: the packet is "in-profile". If there are not enough
tokens, the packet is "out-of-profile" and is either discarded or re-marked.]
Policing is carried out by the access equipment at the edge of the network ("access edge"). The method
used is the token bucket.
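The token-bucket check can be sketched as follows (an illustrative class, not a vendor implementation; timestamps are supplied by the caller to keep the example deterministic):

```python
# Minimal token-bucket policer sketch. Tokens accumulate at a constant
# rate up to the burst size; a packet is "in-profile" only if enough
# tokens remain in the bucket to cover its size.

class TokenBucket:
    def __init__(self, rate_bytes_per_s, burst_bytes, start=0.0):
        self.rate = rate_bytes_per_s
        self.capacity = burst_bytes
        self.tokens = burst_bytes          # bucket starts full
        self.last = start

    def conforms(self, packet_bytes, now):
        # Refill for the elapsed time, clamped to the bucket capacity.
        self.tokens = min(self.capacity,
                          self.tokens + (now - self.last) * self.rate)
        self.last = now
        if self.tokens >= packet_bytes:
            self.tokens -= packet_bytes
            return True                    # in-profile: forward unchanged
        return False                       # out-of-profile: drop or re-mark

bucket = TokenBucket(rate_bytes_per_s=12_500, burst_bytes=3_000)  # ~100 kb/s
# Two 1500-byte packets fit the burst; the third is out-of-profile.
print([bucket.conforms(1_500, now=0.0) for _ in range(3)])  # → [True, True, False]
```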
Section 7 Page 22
7. QoS in IP networks
Queue management - FIFO
[Slide: a switch/router with a single FIFO queue on each input and output port around the switch fabric;
packets arriving at a full queue are tail-dropped.]
Queuing occurs at every switch or router between the source and the destination. Delay-sensitive traffic
requires queue management throughout the network to support end-to-end QoS requirements.
A queue holds packets awaiting access to a resource:
Input queues hold traffic awaiting access to the switch fabric
Output queues hold traffic awaiting transmission onto the link
The highest level of QoS appears when queues are empty, i.e. all controllable delay is controlled and
only uncontrollable delay remains.
The method of managing the queue allows for some delay control.
Delay variation, or "jitter", is very important to real-time applications. Transmitted real-time flows
must be played out by the destination at a constant rate. If all the packets in a flow encounter the same
queue length, they wait about the same time. However, queue sizes vary over time, resulting in packets
experiencing different delays through the network.
Using queue management techniques, we can try to eliminate or minimize jitter for delay-sensitive traffic.
FIFO queuing:
The first packet into the queue is the first one out. The device services the packet at the front of the
queue; arriving packets land at the tail of the queue.
Queue depth identifies the size, or maximum number of packets, of the queue.
If the queue is full to capacity, the equipment simply drops arriving packets.
FIFO queuing has some negative characteristics, especially when used in a traffic-prioritized network.
Devices do not process packets based on an established hierarchy: high-priority packets may be delayed
or discarded while the network devices process low-priority packets.
Section 7 Page 23
7. QoS in IP networks
Multi-priority queuing
[Slide: a switch/router with high, medium and low priority queues on each input and output port around
the switch fabric.]
Multi-priority queuing uses a hierarchy to determine which packets to service first, by defining a
separate queue for each priority level. The device analyzes each packet to determine its priority and
places the packet into the appropriate queue based on that determination.
Routers and switches using multi-priority queuing process high-priority traffic first, then medium, then
low-priority traffic. Therefore, incoming high-priority traffic delays medium-priority traffic, which
delays low-priority traffic.
Because the device services high-priority packets regardless of the medium- or low-priority network load,
medium- and low-priority packets may not be serviced at all and are therefore discarded. This
queue-management problem is known as "head-of-line blocking".
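A minimal sketch of strict multi-priority service (hypothetical names), showing why lower queues can starve under sustained high-priority load:

```python
from collections import deque

# Strict priority scheduling: always drain the highest non-empty queue.

queues = {"high": deque(), "medium": deque(), "low": deque()}

def enqueue(priority, packet):
    queues[priority].append(packet)

def dequeue():
    for priority in ("high", "medium", "low"):
        if queues[priority]:
            return queues[priority].popleft()
    return None  # all queues empty

for prio, pkt in [("low", "L1"), ("medium", "M1"), ("high", "H1"), ("high", "H2")]:
    enqueue(prio, pkt)

# Arrival order is ignored: both high-priority packets go first.
print([dequeue() for _ in range(4)])  # → ['H1', 'H2', 'M1', 'L1']
```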
Class-based queuing
Network devices may employ a more sophisticated method of queuing to avoid head-of-line blocking, known
as "class-based queuing" (CBQ).
CBQ does not assign absolute priorities to traffic, but rather assigns a ratio of the resource (e.g.
bandwidth) to each class, or priority. If a particular class uses less than its allocated portion, the
other classes can use the remainder.
Section 7 Page 24
7. QoS in IP networks
WFQ: Weighted Fair Queuing
[Slide: four WFQ queues weighted at 40%, 30%, 20% and 10% of the output resource.]
Weighted Fair Queuing allocates a certain ratio of the resource to each priority but, unlike CBQ, it
accommodates and manages traffic consisting of variably sized packets.
WFQ can give certain traffic priority without starving lower-priority queues, and maintains the resource
allocation ratio constant over time. Computational complexity is the major disadvantage of WFQ.
Assume we have 4 queues and we have strategically weighted the resource for each.
To make weighted factoring work, we must dynamically understand the traffic on the network and be
able to dynamically change the weight of the available resource in response to traffic patterns.
In this example, we want to maintain a ratio between the priority queues of 4 to 3 to 2 to 1, from the
highest to the lowest.
If the switch or router sends a 1500-byte packet from the lowest-priority queue, it will not serve that
queue again until it has served:
4 times that amount from the high-priority queue,
3 times that amount from the medium-priority queue,
and 2 times that amount from the low-priority queue.
If some queues are empty, the device reallocates the resource to the non-empty queues in such a way as
to maintain the same ratios.
In the lower example, because the two higher-priority queues are empty, the weighting between the low
and lower queues is 2 to 1, so the low-priority queue gets 67% of the resource and the lower queue gets
33% of the resource. WFQ constantly recalculates resource allocation.
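WFQ proper assigns a virtual finish time to every packet; a common practical approximation that achieves the same weighted ratios with variable packet sizes is deficit round robin, sketched here (our own illustrative code) with the 4:3:2:1 weights of the example:

```python
from collections import deque

# Deficit round robin: each queue earns a byte quantum proportional to
# its weight per round, so long-run service converges to the configured
# ratio even with variably sized packets.

def drr(queues, quanta, rounds):
    """queues: list of deques of packet sizes; returns bytes sent per queue."""
    sent = [0] * len(queues)
    deficit = [0] * len(queues)
    for _ in range(rounds):
        for i, q in enumerate(queues):
            if not q:
                deficit[i] = 0      # an empty queue keeps no credit
                continue
            deficit[i] += quanta[i]
            while q and q[0] <= deficit[i]:
                size = q.popleft()
                deficit[i] -= size
                sent[i] += size
    return sent

queues = [deque([1500] * 100) for _ in range(4)]
sent = drr(queues, quanta=[6000, 4500, 3000, 1500], rounds=10)
print(sent)  # → [60000, 45000, 30000, 15000], i.e. the 4:3:2:1 ratio
```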
Section 7 Page 25
7. QoS in IP networks
WFQ: discard probability
[Slide: a queue of maximum depth, with packets entering at the tail and leaving at the head. With tail
drop, the probability of an arriving packet being discarded stays at zero until the queue fills, then
jumps to 1 at 100% fill.]
Any queue with a maximum depth can fill up, causing the discard of arriving packets.
Straight tail drop describes the process of dropping packets based solely on the space available in the
queue when packets arrive.
An empty queue offers a 0% probability of dropped packets; the probability becomes 100% once the queue
reaches capacity.
This abrupt probability shift causes the problem known as performance oscillation.
Section 7 Page 26
7. QoS in IP networks
Performance oscillation with WFQ
[Slide: when the queue reaches 100% fill, packets from many TCP flows are dropped at once.]
TCP offers some built-in network congestion control mechanisms. If a given TCP flow experiences a certain
pattern of packet loss (unacknowledged packets), it assumes network congestion. TCP very quickly
decreases the transmission rate for packets of that flow, then slowly builds the transmission rate back
up as congestion eases.
However, if the network drops packets from many TCP flows within a short period of time, they will all
slow down and then build back up again.
This oscillation results in an inefficient use of network resources:
When many TCP flows slow down, throughput drops and the network operates with resources underutilized.
When many TCP flows speed up, the network congests and drops packets, and network throughput
subsequently falls.
Section 7 Page 27
7. QoS in IP networks
RED: Random Early Detection
[Slide: with RED, the probability of an arriving packet being discarded rises gradually with queue fill,
and packets are dropped randomly, rather than jumping straight to 1 as with tail drop.]
A more efficient model allows traffic to build to a point supporting high throughput without experiencing
congestion. This efficiency is the goal of "Random Early Detection" (RED).
RED drops some packets before the queue fills. The probability of packet loss increases with the
occupancy of the queue.
In this way congestion impacts only a small subset of the TCP flows. The affected flows slow their
transmission rate, reducing the load on the network enough to avoid full queues and oscillation while
maintaining acceptable overall throughput.
RED strives to achieve "smoothed queue occupancy". The average queue threshold describes the average
length of the queue maintained over some period of time. If the queue exceeds the average queue
threshold, the probability of arriving packets being dropped increases and we start randomly dropping
incoming packets. We do not want to overreact and drop packets when we do not really need to do so.
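The drop curve described above can be sketched as a function (parameter names are ours; real RED implementations apply this to an exponentially weighted moving average of the queue length, not the instantaneous length):

```python
# RED drop probability: zero below min_th, rising linearly to max_p
# between min_th and max_th, and 1 at or above max_th.

def red_drop_probability(avg_queue, min_th, max_th, max_p):
    if avg_queue < min_th:
        return 0.0
    if avg_queue >= max_th:
        return 1.0
    return max_p * (avg_queue - min_th) / (max_th - min_th)

# Halfway between the thresholds the drop probability is max_p / 2.
print(red_drop_probability(30, min_th=20, max_th=40, max_p=0.1))  # → 0.05
```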
Section 7 Page 28
7. QoS in IP networks
WRED: Weighted Random Early Detection
[Slide: WRED applies RED with one threshold per queue; the low-priority queue starts discarding at a
lower fill level than the high-priority queue.]
Section 7 Page 29
Answer the questions
Section 7 Page 30
Answer the questions
Section 7 Page 31
Answer the questions
Section 7 Page 32
Answer the questions
Section 7 Page 33
End of Section
Section 7 Page 34
Section 8
Multiprotocol Label Switching
(MPLS)
IP Technology
IP for Mobile Networks
TTP18031 D0 SG DEN I1.0
Section 8 Page 1
Document History
Section 8 Page 2
1. Label Switching Principles
Section 8 Page 3
Why MPLS?
A fast switchover to a backup path in case of failure (in the order of a few
milliseconds).
Section 8 Page 4
8. Multiprotocol Label Switching
MPLS location
[Slide: MPLS sits between layer 2 and layer 3 of the stack: applications (layers 7-5), TCP/UDP (layer
4), IP at layer 3 with its routing table (e.g. 134.5.0.0/16 -> 200.5.1.5, 134.5.1.0/24 -> 200.2.3.4),
then MPLS with its table mapping (in-port, in-label) to (out-port, out-label) (e.g. (2, 84) -> (4, 12),
(2, 85) -> (3, 99)), layer 2, and the physical layer (optical or electrical).]
Section 8 Page 5
8. Multiprotocol Label Switching
LSR : Label Switch Router
[Slide: an MPLS network of LSRs: IP routing at the edges, label switching inside.]
A Label Switch Router (LSR) is a traditional router with more processing capacity that implements the
MPLS protocols. It knows, amongst other things, how to manage a second table in addition to the routing
table: the label switching table.
An LSR can be:
An IP router
An ATM switch
A Frame Relay switch
A DWDM optical switch
The label table depends completely on the traditional IP routing table:
if the IP routing table is modified, the label table must be modified.
Section 8 Page 6
8. Multiprotocol Label Switching
LER : Label Edge Router
[Slide: an MPLS network. The ingress LER processes traffic as it enters the MPLS domain: it examines
inbound IP packets, classifies each packet for QoS and assigns the initial label (label push). Transit
LSRs process traffic within the MPLS domain, forwarding MPLS packets using label swapping (label swap).
The egress LER processes traffic as it leaves the MPLS domain and removes the label (label pop).]
The LER converts both IP packets into MPLS packets and MPLS packets into IP packets.
On the ingress side, the LER examines the incoming packet to determine whether the packet should be
labeled. In an MPLS network, the LERs serve as quality of service (QoS) decision points.
The function of the LSR is to examine incoming packets. Provided that a label is present, the LSR will look
up and follow the label instructions and then forward the packet according to the instructions. The LSR
performs a label-swapping function.
Section 8 Page 7
8. Multiprotocol Label Switching
LSP: Label Switched Path
[Slide: a Label Switched Path crossing the MPLS network from ingress LER to egress LER; the label
changes at each LSR along the path (e.g. 21, 56, 32).]
A path through the network, known as a Label Switched Path (LSP), must be defined and the QoS
parameters along that path must be established. The QoS parameters determine
how many resources to commit to the path, and
what queuing and discarding policy to establish at each LSR for packets
Section 8 Page 8
8. Multiprotocol Label Switching
Principle of label switching
MPLS does not replace classical IP routing but optimizes it.

Switching table:
In (port, label)    Out (port, label)
(1, 22)             (2, 17)
(1, 24)             (3, 17)
(1, 25)             (4, 19)
(2, 23)             (3, 12)

[Slide: an IP packet (source 154.1.2.3, destination 86.6.7.8) arrives on port 1 carrying label 25 and
leaves on port 4 carrying label 19.]
Label swapping is based on an exact match, not on the longest-prefix match used in IP routing.
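The exact-match lookup can be sketched with a plain dictionary, using the table values from this slide:

```python
# Label swapping: the (in-port, in-label) pair is looked up by exact
# match, yielding an (out-port, out-label) pair.

switching_table = {
    (1, 22): (2, 17),
    (1, 24): (3, 17),
    (1, 25): (4, 19),
    (2, 23): (3, 12),
}

def swap(in_port, in_label):
    # KeyError models a packet arriving with an unknown label.
    return switching_table[(in_port, in_label)]

print(swap(1, 25))  # → (4, 19)
```

Because each lookup is a constant-time exact match, label switching avoids the longest-prefix search that IP forwarding requires.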
Section 8 Page 9
8. Multiprotocol Label Switching
Principle of the FEC (Forwarding Equivalence Class)
[Slide: packets towards destinations IP@1 and IP@2 are mapped to the same FEC at the ingress and follow
the same LSP, carrying the same label at each hop.]
The "Forwarding Equivalence Class" is an important concept in MPLS. An FEC is any subset of packets that
are treated the same way by a router. "Treated the same way" can mean forwarded out the same interface
with the same next hop and label; it can also mean given the same class of service, output on the same
queue, given the same drop preference, or any other option available to the network operator.
When a packet enters the MPLS network at the ingress node, the packet is mapped into an FEC.
FECs also allow for greater scalability in MPLS. The limited flexibility and large numbers of
(short-lived) flows in the Internet limit the applicability of both IP Switching and MPOA (Multiprotocol
over ATM). With MPLS, the aggregation of flows into FECs of variable granularity provides scalability
that meets the demands of the public Internet as well as enterprise applications.
Section 8 Page 10
8. Multiprotocol Label Switching
Flow aggregation
[Slide: several prefixes are aggregated into FECs: 134.5.0.0/16, 200.3.2.0/24 and 56.42.1.0/24 into a
FEC carrying label 91; 123.2.0.0/16 and 10.8.128.0/20 into a FEC carrying label 52. All packets of one
FEC follow the same LSP through the MPLS network.]
Aggregation can also be done:
• By protocol
• By application (destination port)
• By traffic priority
• By source address
• ...
FEC: Forwarding Equivalence Class
FEC = “A subset of packets that are all treated the same way by a router”
The concept of FECs provides for a great deal of flexibility and scalability
In conventional routing, a packet is assigned to a FEC at each hop (i.e. an L3 look-up); in MPLS this
is done only once, at the network ingress.
The mapping can also be done on a wide variety of parameters, address prefix (or host),
source/destination address pair, or ingress interface. This greater flexibility adds functionality to
MPLS that is not available in traditional IP routing.
The FEC for a packet can be determined by one or more of a number of parameters, as specified by the
network manager. Among the possible parameters:
Source or destination IP addresses or IP network addresses
Source or destination port numbers
IP protocol ID
Differentiated services codepoint
IPv6 flow label
…..
Section 8 Page 11
8. Multiprotocol Label Switching
MPLS Forwarding — Example
[Slide: at the ingress E-LSR, the routing table maps destinations to LSPs (134.5.0.0/16 -> LSP3,
200.3.2.0/24 -> LSP5) and the MPLS table pushes the initial label (LSP3: push (2, 84); LSP5: push
(3, 99)). A transit LSR swaps (2, 84) to (6, 31). The egress LSR pops the label and forwards the packet
to its destination (134.5.6.1 or 200.3.1.1) using its own IP routing table.]
The labels are imposed on the packets only once, at the periphery of the MPLS network, by the ingress
E-LSR (Edge Label Switch Router), where the datagram is processed in order to assign it a specific
label.
What is important here is that this classification is carried out only once: the first time a datagram
of a flow arrives at the ingress E-LSR.
The label is removed at the other end by the egress E-LSR.
The mechanism is thus as follows:
The ingress LSR (E-LSR) receives the IP packet, classifies it, assigns a label and transmits the
labeled packet.
The transit LSRs use the label in the packet to switch it until the packet reaches the egress LSR.
The egress LSR removes the label and routes the packet to its final destination.
Section 8 Page 12
8. Multiprotocol Label Switching
Penultimate Hop Popping
The label at the top of the stack is removed (popped) by the upstream neighbor of the egress LSR
The egress LSR does not have to do a label lookup and remove the label itself
• One lookup is saved in the egress LSR
The egress LSR still needs to do an IP lookup to find the more specific route
The egress LSR need NOT receive a labelled packet
Section 8 Page 13
8. Multiprotocol Label Switching
Hierarchical LSP tunnels: label stacking
[Slide: LSPb and LSPc are aggregated into a tunnel. At the tunnel head-end a tunnel label (e.g. 13) is
pushed on top of the existing label; inside the tunnel, transit LSRs swap only the top label (42 -> 18
-> 31); at the tunnel tail the top label (11) is popped, revealing the inner label.]
One of the most powerful features of MPLS is label stacking. A labelled packet may carry many labels,
organized as a last-in-first-out stack. Processing is always based on the top label. At any LSR, a label
may be added to the stack (push operation) or removed from the stack (pop operation). Label stacking
allows the aggregation of LSPs into a single LSP for a portion of the route through a network, creating a
tunnel. At the beginning of the tunnel, an LSR assigns the same label to packets from a number of
LSPs by pushing the label onto the stack of each packet. At the end of the tunnel, another LSR pops
the top element from the label stack, revealing the inner label. This is similar to ATM, which has one
level of stacking (virtual channels inside virtual paths), but MPLS supports unlimited stacking.
Label stacking provides considerable flexibility. An enterprise could establish MPLS-enabled networks at
various sites and establish numerous LSPs at each site. The enterprise could then use label stacking to
aggregate multiple flows of its own traffic before handing it to an access provider. The access provider
could aggregate traffic from multiple enterprises before handing it to a larger service provider. Service
providers could aggregate many LSPs into a relatively small number of tunnels between points of
presence. Fewer tunnels means smaller tables, making it easier for a provider to scale the network
core.
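The push/pop behaviour above can be sketched with an ordinary list used as a last-in-first-out stack (label values are illustrative):

```python
# Label stack as a LIFO list: the end of the list is the top of stack.

def push(stack, label):
    stack.append(label)          # new top of stack

def pop(stack):
    return stack.pop()           # removes and returns the top label

packet_stack = []
push(packet_stack, 25)           # inner LSP label, added at the ingress LER
push(packet_stack, 13)           # tunnel label, added at the tunnel head-end
top = pop(packet_stack)          # tunnel tail pops the outer label
print(top, packet_stack)         # → 13 [25]  (inner label now on top)
```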
Section 8 Page 14
8. Multiprotocol Label Switching
MPLS shim label
[Slide: the 32-bit shim header: Label value (20 bits), Exp (3 bits, experimental use), S (1 bit, bottom
of stack), TTL (8 bits, Time To Live).]
Exp: 3 bits reserved for experimental use; for example, these bits could communicate DS
(Differentiated Services) information or PHB (Per-Hop Behaviour) guidance
S: set to one for the bottom (oldest) entry in the stack, and zero for all other entries
Time To Live (TTL): 8 bits used to encode a hop count, or time to live, value
Label value: locally significant 20-bit label
Labels 0 through 15 are reserved labels, as specified in draft-ietf-mpls-label-encaps (published as RFC 3032).
A value of 0 represents the "IPv4 Explicit NULL Label". This label value is only legal when it is the sole
label stack entry. It indicates that the label stack must be popped, and the forwarding of the packet
must then be based on the IPv4 header.
A value of 1 represents the "Router Alert Label". This label value is legal anywhere in the label stack
except at the bottom. When a received packet contains this label value at the top of the label
stack, it is delivered to a local software module for processing. The actual forwarding of the packet
is determined by the label beneath it in the stack. However, if the packet is forwarded further, the
Router Alert Label should be pushed back onto the label stack before forwarding. The use of this
label is analogous to the use of the "Router Alert Option" in IP packets. Since this label cannot occur
at the bottom of the stack, it is not associated with a particular network layer protocol.
A value of 2 represents the "IPv6 Explicit NULL Label". This label value is only legal when it is the
sole label stack entry. It indicates that the label stack must be popped, and the forwarding of the
packet must then be based on the IPv6 header.
A value of 3 represents the "Implicit NULL Label". This is a label that an LSR may assign and
distribute, but which never actually appears in the encapsulation. When an LSR would otherwise
replace the label at the top of the stack with a new label, but the new label is "Implicit NULL", the
LSR will pop the stack instead of doing the replacement. Although this value may never appear in
the encapsulation, it needs to be specified in the Label Distribution Protocol, so a value is reserved.
Values 4-15 are reserved for future use.
Section 8 Page 15
8. Multiprotocol Label Switching
Bottom of stack
[Slide: packets on LSPa, LSPb and LSPc each carry a label with S=1 (bottom of stack). When the LSPs are
aggregated into a tunnel, an outer label is pushed with S=0 while the inner label keeps S=1; the MPLS
tables show the corresponding push operations, e.g. (1,25) push (3,42) and (2,13) push (3,42).]
Section 8 Page 16
8. Multiprotocol Label Switching
Time To Live (TTL)
[Slide: the shim label fields (Label, EXP, S, TTL). As an IP packet crosses the MPLS network, the
ingress LER copies the IP TTL into the label TTL, each LSR decrements only the label TTL (e.g. 9 -> 8
-> 7) while the IP TTL inside the packet is unchanged, and the egress LER writes the decremented value
back into the IP header.]
• A key field in the IP packet header is the TTL field (IPv4), or Hop Limit (IPv6). In an ordinary IP-based
internet, this field is decremented at each router and the packet is dropped if the count falls to zero. This is
done to avoid looping or having the packet remain too long in the internet because of faulty routing.
Because an LSR does not examine the IP header, the TTL field is included in the label so that the TTL
function is still supported. The rules for processing the TTL field in the label are as follows:
• When an IP packet arrives at an ingress edge LSR of an MPLS domain, a single label stack entry is added
to the packet. The TTL value of this label stack entry is set to the value of the IP TTL value. If the IP TTL
field needs to be decremented, as part of the IP processing, it is assumed that this has already been done.
• When an MPLS packet arrives at an internal LSR of an MPLS domain, the TTL value in the top label stack
entry is decremented. Then:
• If this value is zero, the MPLS packet is not forwarded. Depending on the label value in the label stack
entry, the packet may be simply discarded, or it may be passed to the appropriate "ordinary" network
layer for error processing (for example, for the generation of an Internet Control Message Protocol
[ICMP] error message).
• If this value is positive, it is placed in the TTL field of the top label stack entry for the outgoing MPLS
packet, and the packet is forwarded. The outgoing TTL value is a function solely of the incoming TTL
value, and is independent of whether any labels are pushed or popped before forwarding. There is no
significance to the value of the TTL field in any label stack entry that is not at the top of the stack.
• When an MPLS packet arrives at an egress edge LSR of an MPLS domain, the TTL value in the single
label stack entry is decremented and the label is popped, resulting in an empty label stack. Then:
• If this value is zero, the IP packet is not forwarded. Depending on the label value in the label stack entry,
the packet may be simply discarded, or it may be passed to the appropriate "ordinary" network layer for
error processing.
• If this value is positive, it is placed in the TTL field of the IP header, and the IP packet is forwarded using
ordinary IP routing. Note that the IP header checksum must be modified prior to forwarding.
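The three TTL rules above can be sketched as small functions. This is an illustrative model only — the function names and the packet representation are hypothetical, not from any real forwarding plane:

```python
# Sketch of the label TTL rules described above (per RFC 3032 behaviour).

def ingress_push(ip_ttl):
    """Ingress edge LSR: copy the (already decremented) IP TTL into the label."""
    return {"label_ttl": ip_ttl}

def transit_forward(label_ttl):
    """Internal LSR: decrement the top label TTL; stop forwarding at zero."""
    label_ttl -= 1
    if label_ttl == 0:
        return None          # not forwarded; may trigger ICMP error processing
    return label_ttl         # placed in the outgoing top label entry

def egress_pop(label_ttl):
    """Egress edge LSR: decrement, pop the label, write TTL back to the IP header."""
    label_ttl -= 1
    if label_ttl == 0:
        return None          # IP packet is not forwarded
    return label_ttl         # new IP TTL; the IP header checksum must be recomputed
```

Note that only the top label entry's TTL matters at each hop, exactly as the last rule above states.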
Section 8 Page 17
8. Multiprotocol Label Switching
Transparent TTL
(Figure: a packet 80.1.2.3→209.8.7.6 enters the MPLS network at ingress LER1 with IP TTL 3. Inside the privately addressed MPLS network the label TTL counts down 255 → 254 → 253 as labels 25, 46 and 63 are swapped along LSR2, LSR4 and LER5, while the IP TTL carried inside the packet is not touched by the internal LSRs.)
In transparent mode, the ingress router sets the label TTL to 255, a value high enough to allow the
packet to cross the MPLS network under normal conditions (no loop). The IP TTL field is decremented
by 1 by the ingress LER. When the MPLS label is removed by the egress LER, the IP TTL is not updated
with the value of the TTL in the MPLS label. The egress LER decrements the IP TTL by 1, just as a
normal router would do.
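The transparent behaviour can be contrasted with the rules of the previous page in a short sketch. This is a toy model under the assumptions stated above (label TTL seeded at 255, IP TTL touched only by the two LERs); the function name is illustrative:

```python
# Transparent TTL mode: the MPLS cloud hides its internal hop count from the
# customer's IP TTL. Only the ingress and egress LERs decrement the IP TTL.

def transparent_transit(ip_ttl, internal_hops):
    ip_ttl -= 1                     # ingress LER decrements the IP TTL once
    label_ttl = 255                 # label TTL set high, NOT copied from IP TTL
    for _ in range(internal_hops):
        label_ttl -= 1              # internal LSRs touch only the label TTL
    ip_ttl -= 1                     # egress LER decrements once more; the label
                                    # TTL value is discarded, not copied back
    return ip_ttl, label_ttl
```

With the figure's values (IP TTL 3 entering, two label swaps before the egress), the packet leaves with IP TTL 1 no matter how many LSRs it actually crossed.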
EXPerimental : direct mapping
(Figure: the 3-bit EXP field of the MPLS shim header (Label | EXP | S | TTL) can be mapped directly from the IP header's ToS byte — either the Class bits of the DiffServ Code Point or the 3 IP Precedence bits — or from Layer-2 priority bits carried in the Ethernet header.)
The EXP field of the MPLS Shim Header is used by the LSR to determine the PHB to be applied to the
packet.
The Exp bits are set by creating an ingress policy on the ingress LSR. This ingress policy sets the Exp bits in
relation to values associated with the frames and packets traversing the LSP. For example, if a VLAN
trunk port is tunneled through the LSP, the EXP bits can be set by directly copying the values contained
within the three 802.1p priority bits of the 802.1Q headers. Once packets/frames have reached the egress
LSR, an egress policy can be created on the egress LSR that maps the Exp bits back into the bit values of
the packets or frames.
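The direct copy described above amounts to simple bit extraction. The following sketch is illustrative (the function names are hypothetical); it assumes the standard bit layouts of the 802.1Q TCI and the IP ToS byte:

```python
# Direct mapping into the 3-bit EXP field of the MPLS shim header.

def dot1p_to_exp(tci):
    """Extract the 3 PCP (802.1p priority) bits from a 16-bit 802.1Q TCI."""
    return (tci >> 13) & 0b111

def prec_to_exp(tos):
    """Use the 3 IP Precedence bits (top of the ToS byte) as the EXP value."""
    return (tos >> 5) & 0b111
```

For example, a ToS byte of 0xB8 (DSCP 46, Expedited Forwarding) yields EXP 5. An egress policy would perform the inverse mapping, writing the EXP value back into the 802.1p or precedence bits.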
Notion of Upstream and Downstream LSRs
(Figure: an LSP runs from the ingress LER (A) through LSR B to the egress LER (C), which attaches network 171.68.10/24. With respect to the packet flow toward 171.68.10/24, A is the upstream LSR and B is its downstream LSR.)
MPLS networks allocate labels from the downstream direction toward the upstream routers, that is,
toward the source of a packet flow.
The term “downstream” refers to the direction of packet flow; control messages usually flow
“upstream”.
Label distribution method
(Figure: two ways to distribute a binding for FEC net_x. Downstream on demand: the upstream LSR sends a request ("Demand: FEC net_x") and the downstream LSR answers with "Response: FEC net_x → label y". Downstream unsolicited: the downstream LSR advertises "FEC net_x → label y" without being asked.)
Downstream unsolicited distribution allows an LSR to distribute label bindings to LSRs that have not
explicitly requested them.
Both methods can be used in the same network at the same time; however, each LSR must be aware of the
distribution method used by its peer.
Label distribution control
Independent control: each router builds its switching table from its routing table and informs its neighbors as soon as it can, so bindings may be advertised in any order.
Ordered control: label mappings propagate in sequence, starting from the egress LER back toward the ingress.
(Figure: the same LER–LSR–LSR–LER chain under both modes; the numbers show the order in which the bindings are advertised.)
Both methods are supported in the standard and can be fully interoperable
“Downstream unsolicited” and “Ordered control”
(Figure: LSR1, which attaches 171.68.10/24, generates label #99 for the FEC and advertises the binding "FEC 171.68.10.0/24, use label #99" unsolicited to its neighbors. Each LSR in turn allocates its own label and propagates the binding upstream — LSR3 advertises #216 and installs the swap #216→#99, LSR5 advertises #612 and installs #612→#99 — until the upstream LSRs (LSR4, LSR6, LSR7) are reached.)
LSR1 generates a label for the FEC and communicates the binding to LSR2
….
“Downstream On-Demand” and “Ordered control”
(Figure: label requests for FEC 171.68.10.0/24 travel downstream, hop by hop, from LSR4 via LSR3 and LSR2 to the egress LSR1. Label mappings then travel back upstream in order — e.g. #99 from LSR1, #33 from LSR2 toward LSR5, #216 from LSR3 toward LSR4 — each LSR installing its swap entry as the mapping passes.)
1- LSR4 recognizes LSR3 as its next-hop for an FEC. A request is made to LSR3 for a binding between
the FEC and a label
2- LSR3 recognizes LSR2 as its next-hop for an FEC. A request is made to LSR2 for a binding between
the FEC and a label
3- LSR2 recognizes LSR1 as its next-hop for an FEC. A request is made to LSR1 for a binding between
the FEC and a label
4- LSR1 is the ‘egress’ LSR to that particular FEC so, LSR1 replies to LSR2 with a label. LSR2 updates its
switching table.
5- Now that LSR2 has received a label binding from its downstream LSR1, it replies to the earlier
request from upstream LSR3 with a label of its own. LSR3 updates its switching table.
6- Now that LSR3 has received a label binding from its downstream LSR2, it replies to the earlier
request from upstream LSR4 with a label of its own. LSR4 updates its switching table.
7 – LSR6 recognizes LSR5 as its next-hop for an FEC. A request is made to LSR5 for a binding between
the FEC and a label
8 – LSR5 recognizes LSR2 as its next-hop for an FEC. A request is made to LSR2 for a binding between
the FEC and a label
9 - LSR2 already recognizes the FEC and has a next hop for it, so it creates a binding and replies to LSR5
….
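The recursive request/response walk of steps 1–6 can be sketched as follows. The topology, label values and data structures are hypothetical — this only illustrates the ordered-control idea that an LSR answers upstream only after its own downstream binding exists:

```python
# Toy model of downstream-on-demand with ordered control: requests recurse
# toward the egress, labels are allocated on the way back, and each hop
# installs a switching entry (out_label from downstream, in_label of its own).

NEXT_HOP = {"LSR4": "LSR3", "LSR3": "LSR2", "LSR2": "LSR1"}  # toward the FEC
tables = {}
next_label = [100]                                 # simple label allocator

def request_label(lsr, fec):
    if lsr not in NEXT_HOP:                        # egress LSR for this FEC
        label = next_label[0]; next_label[0] += 1
        tables.setdefault(lsr, {})[fec] = ("pop", label)   # (action, in-label)
        return label
    out_label = request_label(NEXT_HOP[lsr], fec)  # ordered: wait for downstream
    in_label = next_label[0]; next_label[0] += 1
    tables.setdefault(lsr, {})[fec] = (out_label, in_label)  # (out, in)
    return in_label

request_label("LSR4", "171.68.10.0/24")
```

After the call, every LSR along the chain holds a consistent swap entry: each node's out-label is exactly the in-label its downstream neighbor advertised.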
Label retention modes
An LSR may receive label bindings for the same FEC from multiple LSRs.
(Figure: LSR1, which attaches 171.68.10/24, advertises label #33 to its neighbors; LSR5 receives different bindings for the same FEC — #576, #63, #45 — from LSR2, LSR3 and LSR4, and must decide which of them to keep.)
Label Retention method trades off between label capacity and speed of adaptation to routing changes
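That trade-off can be made concrete with a small sketch. The peer names and label values are hypothetical; "liberal" and "conservative" are the two standard retention modes:

```python
# Liberal retention keeps bindings from all peers (more label state, fast
# failover); conservative retention keeps only the current next hop's binding.

bindings = {"LSR2": 33, "LSR3": 63, "LSR4": 45}   # label advertised per peer
next_hop = "LSR2"

liberal = dict(bindings)                          # keep everything
conservative = {next_hop: bindings[next_hop]}     # keep next hop only

# On a routing change to LSR3, liberal mode already holds a usable label,
# while conservative mode must wait for a new binding from LSR3.
new_next_hop = "LSR3"
usable_now = new_next_hop in liberal              # True: adapt immediately
must_resignal = new_next_hop not in conservative  # True: must ask again
```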
Label Distribution Protocols
• Automatically, based on existing IP routing tables: the LDP protocol or the MP-BGP protocol.
• By explicitly supplying the path that the LSPs must follow and the quality of service they must ensure. These solutions are based on two protocols:
The ReSerVation Protocol – Traffic Engineering (RSVP-TE) is a modification of RSVP, a protocol already
present in the equipment of many manufacturers.
LDP is the hop-by-hop distribution protocol defined by the MPLS working group of the IETF. It is totally
independent of the pre-existing protocols. The operating mode of LDP is based on the model of the IP
routing protocols. LDP uses the routing table generated by these protocols to build the MPLS switching
tables. The principle of LDP is simple: each LSR assigns a label to each of its neighbor LSRs for each
equivalence class recognized in its routing table. The neighbor then uses this label for all the
packets of this equivalence class that it sends to the LSR.
LDP: Functions
• Neighbor management
LDP is the hop-by-hop distribution protocol defined by the MPLS working group of IETF. It is totally
independent of the pre-existing protocols.
The Label Distribution Protocol (LDP) is entirely automatic. This protocol builds, on the basis of the
information contained in the IP routing tables, the LSPs for each of the equivalence classes recognized
in the routing tables. With this approach, paths are built hop by hop with an operation principle similar
to that of the IP routing protocols.
It uses the routing table to build the MPLS switching tables. It establishes automatically a path (LSP) for
each equivalence class. It offers different modes of distribution and of conservation of labels, thanks
to which it can adapt to different uses.
For LDP to work, all the internal LSRs of a domain must know the same FECs. To that end, it is
possible to aggregate the entries of the IP routing tables inside an MPLS domain, but the border LSRs
are the only ones that can aggregate prefixes. If prefixes were aggregated inside a domain,
downstream LSRs would no longer be able to de-aggregate the packets that, although destined for
different networks, would carry the same label.
LDP: Association Exchange
(Figure: an upstream LSR, attached to NetID x, and a downstream LSR, attached to NetID y, exchange Label Mapping messages ("FEC: NetID y → #L22", "FEC: NetID x → #L63"); the upstream LSR can return a Label Release ("FEC: NetID x → #L63"), and the downstream LSR can send a Label Withdraw ("FEC: NetID y → #L22").)
Once an LDP session is established, several types of message are used to exchange Label/FEC
associations:
Label Request (F): this message is sent by the upstream LSR to ask which label must be used for the
packets belonging to the FEC.
Label Mapping (F, L): the downstream LSR uses this message to assign the upstream LSR a label to
be used for the packets corresponding to the FEC. This message can be spontaneous or can be sent on
receipt of a Label Request.
Label Withdraw (F, {L, *}): the downstream LSR informs the upstream LSR that the label L / FEC F
association is no longer valid and that this label must no longer be used. When the label is omitted
(*), all the associations corresponding to FEC F are invalidated. The downstream LSR uses this
message, for example, after a routing change or when it can no longer route FEC F.
Label Release (F, L): the upstream LSR informs the downstream LSR that it no longer needs the F/L
association. The upstream LSR may send this message because the routing has just changed or
because it received an unsolicited and unnecessary label binding.
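How an upstream LSR might react to these messages can be sketched with a tiny state machine. The class, message tuples and field names are illustrative only (F = FEC, L = label, following the text), not a real LDP codec:

```python
# Minimal model of the upstream side of the association exchange.

class UpstreamLSR:
    def __init__(self):
        self.bindings = {}                       # FEC -> label

    def on_label_mapping(self, fec, label):
        """Store a binding; it may arrive unsolicited or answer a request."""
        self.bindings[fec] = label

    def on_label_withdraw(self, fec, label=None):
        """Label omitted (None, i.e. '*') invalidates every binding for the FEC.
        Respond with a Label Release to confirm the binding is unneeded."""
        if label is None or self.bindings.get(fec) == label:
            self.bindings.pop(fec, None)
            return ("Label Release", fec, label)

u = UpstreamLSR()
u.on_label_mapping("NetID_y", "L22")
release = u.on_label_withdraw("NetID_y")   # wildcard withdraw -> release
```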
LDP – Label Distribution Protocol
(Figure: an LSP toward network 138.120 is established across three LSRs. LSR C, the egress, advertises Label Mapping (LSP-id: x, label 33); its table reads "in if 1, in label 33, FEC 138.120, out: –". LSR B advertises label 25 upstream and installs "in if 1, in label 25, FEC 138.120, out if 3, out label 33". LSR A, the ingress attached to 192.168, installs "FEC 138.120 → out if 3, out label 25".)
LDP defines a set of procedures and messages by which one LSR (Label Switched Router) informs another
of the label bindings it has made. The LSR uses this protocol to establish label switched paths through a
network by mapping network layer routing information directly to data-link layer switched paths.
Two LSRs (Label Switched Routers) that use LDP to exchange label mapping information are known as
LDP peers, and they have an LDP session between them. In a single session, each peer is able to learn
about the other's label mappings; in other words, the protocol is bidirectional.
Label Distribution Protocol (LDP) is often used to establish MPLS LSPs when traffic engineering is not
required. It establishes LSPs that follow the existing IP routing, and is particularly well suited for
establishing a full mesh of LSPs between all of the routers on the network.
In unsolicited mode, the egress routers broadcast label mappings for each external link to all of their
neighbors. These broadcasts are fanned across every link through the network until they reach the
ingress routers. At each hop, they inform the upstream router of the label mapping to use for
each external link, and by flooding the network they establish LSPs between all of the external
links.
The main advantage of LDP over RSVP is the ease of setting up a full mesh of tunnels using unsolicited
mode, so it is most often used in this mode to set up the underlying mesh of tunnels needed by
MPLS-enabled VPNs.
2. MPLS Traffic Engineering
Drawbacks of IP routing
Rerouting traffic by raising metrics along the current path does have the desired effect of forcing the
traffic onto another path.
Since interior gateway protocol (IGP) route calculation was topology driven and based on a simple
additive metric such as the hop count or an administrative value, the traffic patterns on the network
were not taken into account when the IGP calculated its forwarding table. As a result, traffic was not
evenly distributed across the network's links, causing inefficient use of expensive resources. Some links
became congested, while other links remained underutilized. This might have been satisfactory in a
sparsely-connected network, but in a richly-connected network (that is, bigger, more thickly meshed
and more redundant) it is necessary to control the paths that traffic takes in order to balance loads.
As Internet service provider (ISP) networks became more richly connected, it became more difficult to
ensure that a metric adjustment in one part of the network did not cause problems in another part of
the network. Traffic engineering based on metric manipulation offers a trial-and-error approach rather
than a scientific solution to an increasingly complex problem.
IGP metric manipulation has a “snap” effect when it comes to redirecting traffic… (not an “even”
distribution)
Goal
Traffic engineering:
• Quick re-routing
MPLS-TE
(Figure: an ingress LSR must carry 500 Mb/s toward an egress LSR across a network whose link capacities range from 45 Mb/s to 2.5 Gb/s, so the path must be chosen to fit the traffic rather than simply following the IGP shortest path.)
Metric-based traffic controls continued to be an adequate traffic engineering solution until 1994 or 1995.
At this point, some ISPs reached a size at which they did not feel comfortable moving forward with
either metric-based traffic controls or router-based cores.
Traditional software-based routers had the potential to become traffic bottlenecks under heavy load
because their aggregate bandwidth and packet-processing capabilities were limited.
It became increasingly difficult to ensure that a metric adjustment in one part of a huge network did not
create a new problem in another part. And router-based cores did not offer the high-speed interfaces
or deterministic performance that ISPs required as they planned to grow their core networks.
Constrained SPF
The CSPF calculation uses the TE database built by OSPF-TE or IS-IS, which advertises per-link constraints: available bandwidth, priority, attributes, administrative weight.

Path      Cost   Available BW
a-c       1      10 Mb/s
a-b-c     3      100 Mb/s
a-d-e-c   4      500 Mb/s

(Figure: a tunnel a→c requiring 200 Mb/s can use neither the direct link a-c (cost 1, only 10 Mb/s) nor a-b-c (cost 3, limited to 100 Mb/s by link b-c); CSPF selects a-d-e-c, cost 4, with 500 Mb/s available.)
The ingress LSR determines the physical path for each LSP by applying a Constrained Shortest Path First
(CSPF) algorithm to the information in the TE-database . CSPF is a shortest-path-first algorithm that
has been modified to take into account specific restrictions when calculating the shortest path across
the network. Input into the CSPF algorithm includes:
Topology link-state information learned from the IGP and maintained in the TE-database
Attributes associated with the state of network resources (such as total link bandwidth, reserved link
bandwidth, available link bandwidth, and link color) that are carried by IGP extensions and stored in
the TE-database
Administrative attributes required to support traffic traversing the proposed LSP (such as bandwidth
requirements, maximum hop count, and administrative policy requirements) that are obtained from
user configuration
The output of the CSPF calculation is an explicit route consisting of a sequence of LSR addresses that
provides the shortest path through the network that meets the constraints. This explicit route is then
passed to the signaling component, which establishes forwarding state in the LSRs along the LSP. The
CSPF algorithm is repeated for each LSP that the ingress LSR is required to generate.
The network continuously keeps track of these constraints and floods them through IGP extensions. For a new
LSP to be launched in the network, the operator configures the LSP constraints at the ingress LSR; the
network then actively participates in selecting an LSP path that meets the constraints and expresses it as an
explicit route.
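The CSPF idea — ordinary shortest-path-first, run after pruning links that cannot satisfy the constraint — can be shown in a few lines. The topology reproduces the slide's figures; the function and data structures are a simplified sketch, not a router implementation:

```python
# Constrained SPF: prune links with insufficient bandwidth, then run Dijkstra.

import heapq

LINKS = {  # (from, to): (cost, available bandwidth in Mb/s)
    ("a", "c"): (1, 10), ("a", "b"): (1, 200), ("b", "c"): (2, 100),
    ("a", "d"): (1, 1000), ("d", "e"): (2, 500), ("e", "c"): (1, 1000),
}

def cspf(src, dst, bw):
    adj = {}
    for (u, v), (cost, avail) in LINKS.items():
        if avail >= bw:                       # constraint check = pruning step
            adj.setdefault(u, []).append((v, cost))
    heap, seen = [(0, src, [src])], set()
    while heap:                               # plain shortest-path-first
        dist, node, path = heapq.heappop(heap)
        if node == dst:
            return dist, path
        if node in seen:
            continue
        seen.add(node)
        for nxt, cost in adj.get(node, []):
            heapq.heappush(heap, (dist + cost, nxt, path + [nxt]))
    return None                               # no path satisfies the constraint

cspf("a", "c", 200)   # a-c and b-c are pruned -> (4, ['a', 'd', 'e', 'c'])
```

With a 200 Mb/s demand the result matches the slide: the cheaper paths fail the bandwidth constraint and a-d-e-c (cost 4) is returned as the explicit route.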
MPLS-TE components
• Destination
• Bandwidth (maximum, reservable, unreserved per priority)
• Affinities
• Preemption
• Protection by Fast Reroute
• Optimized metric
Destination The source of the TE LSP is the head-end router where the TE LSP is configured, whereas its
destination must be explicitly configured.
Bandwidth One of the attributes of a TE LSP is obviously the bandwidth required for the TE LSP. The
traffic flow pattern between two points is rarely a constant and is usually a function of the time of
day, not to mention the traffic growth triggered by the introduction of new services in the network or
just an accrued use of existing services. Hence, it is the responsibility of the network administrator to
determine the bandwidth requirement between two points and how often it should be reevaluated.
You can adopt a very conservative approach by considering the traffic peak, X percent of the peak or
averaged bandwidth values. After you determine the bandwidth requirement, you can apply an
over/underbooking ratio, depending on the overall objectives. Another approach consists of relying on
the routers to compute the required bandwidth based on the observed traffic sent to a particular TE
LSP.
Affinities An affinity is a field that is matched against the attributes of the set of links a TE LSP traverses.
Preemption The notion of preemption refers to the ability to define up to seven levels of priority. In the
case of resource contention, this allows a higher-priority TE LSP to preempt (and, consequently, tear
down) lower-priority TE LSP(s) if both cannot be accommodated due to lack of bandwidth resources on
a link.
Protection by Fast Reroute MPLS Traffic Engineering provides an efficient local protection scheme
called Fast Reroute to quickly reroute TE LSPs to a presignaled backup tunnel within tens of
milliseconds
Optimized Metric The notion of shortest path is always related to a particular metric. Typically, in an IP
network, each link has a metric, and the shortest path is the path such that the sum of the link metrics
along the path is minimal. MPLS TE also uses metrics to pick the shortest path for a tunnel that
satisfies the constraints specified. MPLS TE has introduced its own metric. When MPLS TE is configured
on a link, the router can flood two metrics for a particular link: the IGP and TE metrics (which may or
may not be the same).
Explicit Paths
Path to Y:

Hop         Strict/loose
10.1.1.1    Strict
10.2.2.2    Loose
10.3.3.3    Strict

(Figure: the tunnel a→g must pass directly through 10.1.1.1 (strict hop, mandatory path), may reach 10.2.2.2 through intermediate routers (loose path), and must then go directly through 10.3.3.3.)
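The strict/loose semantics can be illustrated with a small validity check. The adjacency data and function are hypothetical — a real implementation would verify loose hops against IGP reachability rather than accept them unconditionally:

```python
# Strict hop: must be directly adjacent to the previous hop.
# Loose hop: may be reached through intermediate routers.

ADJACENT = {
    "head": {"10.1.1.1"},
    "10.1.1.1": {"10.9.9.9"},          # 10.2.2.2 is NOT a direct neighbour
    "10.9.9.9": {"10.2.2.2"},
    "10.2.2.2": {"10.3.3.3"},
}

def valid_ero(start, hops):
    """hops: list of (address, 'strict' | 'loose')."""
    prev = start
    for addr, mode in hops:
        if mode == "strict" and addr not in ADJACENT.get(prev, set()):
            return False                # strict hop must be one hop away
        prev = addr                     # loose hops may be routed indirectly
    return True

valid_ero("head", [("10.1.1.1", "strict"), ("10.2.2.2", "loose"),
                   ("10.3.3.3", "strict")])    # True
```

Changing the 10.2.2.2 hop to strict makes the path invalid, because an intermediate router (10.9.9.9 in this toy topology) sits in between.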
RSVP-TE : LSP and path
(Figure: a tunnel, or LSP, from A to G can be realized over different paths — Path 1, Path 2 — through intermediate nodes such as D and E.)
RSVP-TE : Principle
Generic RSVP (Resource reSerVation Protocol) uses a message exchange to reserve resources across a
network for IP flows. The Extensions to RSVP for LSP Tunnels (RSVP-TE) enhances generic RSVP so that
it can be used to distribute MPLS labels.
RSVP-TE is a separate protocol at the IP level. It uses IP datagrams (or UDP at the margins of the
network) to communicate between LSR peers. It does not require the maintenance of TCP sessions,
but as a consequence of this it must handle the loss of control messages.
The basic flow for setting up an LSP using RSVP-TE for LSP Tunnels is :
1. The traffic parameters required for the session or administrative policies for the network enable LSR
A to determine that the route for the new LSP should go through LSR B, which might not be the same
as the hop-by-hop route to LSR-C. LSR A builds a Path message with an explicit route of (B,C) and
details of the traffic parameters requested for the new route.
2. LSR A then forwards the Path to LSR B as an IP datagram.
3. LSR B receives the Path request, determines that it is not the egress for this LSP, and forwards the
request along the route specified in the request. It modifies the explicit route in the Path message
and passes the message to LSR-C.
4. LSR C determines that it is the egress for this new LSP, determines from the requested traffic
parameters what bandwidth it needs to reserve and allocates the resources required. It selects a
label for the new LSP
5. LSR-C distributes the label to LSR B in a Resv message, which also contains actual details of the
reservation required for the LSP.
6. LSR B receives the Resv message and matches it to the original request using the LSP ID contained in
both the Path and Resv messages.
7. LSR B determines what resources to reserve from the details in the Resv message, allocates a label
for the LSP, and sets up its forwarding table.
8. LSR-B passes the new label to LSR A in a Resv message.
9. The processing at LSR A is similar, but it does not have to allocate a new label and forward this to an
upstream LSR because it is the ingress LSR for the new LSP.
Path and Resv messages are refreshed periodically unless refresh is suppressed.
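The end result of the Path/Resv exchange above is a consistent label chain. The following toy model only computes that result — node names, label values and the table layout are illustrative, and the actual message encoding is omitted:

```python
# The Path message carries the explicit route downstream; labels are assigned
# on the Resv message travelling back upstream, egress first.

def signal_lsp(route, first_label=16):
    """route: [ingress, ..., egress]. Returns node -> (in-label, out-label)."""
    tables, label, downstream_label = {}, first_label, None
    for node in reversed(route):                  # Resv walks back upstream
        if node == route[0]:
            tables[node] = (None, downstream_label)   # ingress: push only
        else:
            tables[node] = (label, downstream_label)  # (in-label, out-label)
            downstream_label, label = label, label + 1
    return tables

tables = signal_lsp(["A", "B", "C"])
# A pushes 17, B swaps 17 -> 16, C receives 16 and terminates the LSP.
```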
Path Protection – Secondary/Standby LSP
(Figure: when a failure occurs on the primary LSP, the head-end router moves the traffic onto a backup tunnel along an alternate path.)
The default mode of network recovery of MPLS Traffic Engineering, is a global restoration mechanism:
Global The node in charge of rerouting a TE LSP affected by a network element failure is the head-
end router.
Restoration When the head-end router is notified of the failure, a new path is dynamically
computed, and the TE LSP is signaled along the new alternate path (assuming one can be found).
An LSP is initially set up. The link fails. After a period of time (the fault detection time), the upstream
router detects the failure. This period essentially depends on the failure type and the Layer 1
or 2 protocol. If you assume a Packet over SONET (PoS) interface, the failure detection time is
usually on the order of a few milliseconds. In the absence of a hold-off timer, the router upstream of
the failure immediately sends the failure notification (an RSVP-TE Path Error message) to the head-end
router.
Accurately quantifying the time required to perform the set of operations just described is particularly
difficult because of the many variables involved. These include the network topology (and hence the
number of nodes the failure notification and the new LSP signaling messages have to go through and
the propagation times of those through fiber), the number of TE LSPs affected by the failure, CPU
processor on the routers, and so on. We can provide an order of magnitude. On a significantly large and
loaded network, the CSPF time and RSVP-TE processing time per node are usually a few milliseconds.
Then the propagation delay must be taken into account in the failure notification time as well as in the
signaling time. So, on a continental network, MPLS TE head-end rerouting would be on the order of
hundreds of milliseconds.
MPLS TE reroute is undoubtedly the simplest MPLS TE recovery mechanism because it does not require
any specific configuration and minimizes the required amount of backup state in the network. The
downside is that its rerouting time is not as fast and predictable as the other MPLS TE recovery
techniques discussed next. Indeed, the fault first has to be signaled to the head-end router,
followed by a path computation and the signaling of a new TE LSP along another path, if one exists
(with the risk that no backup path can be found, or at least none with equivalent constraints).
7750 SR : Up to seven secondary or standby LSPs can be specified for each primary LSP. All the
secondary paths are considered equal and the first available path is used.
Fast Reroute
(Figure: a protected LSP R1→R2→R3→R4→R5, with detour or bypass LSPs provided through R6–R9.)
R1’s backup: R1>R6>R7>R8>R3
R2’s backup: R2>R7>R8>R4
R3’s backup: R3>R8>R9>R5
R4’s backup: R4>R9>R5
There are two different methods for local protection. In the one-to-one backup method, a PLR (Point of Local
Repair) computes a separate backup LSP, called a detour LSP, for each LSP that the PLR protects. In the
facility backup method, the PLR creates a single bypass tunnel that can be used to protect multiple LSPs.
The facility backup fast reroute method uses a facility backup tunnel, or bypass, to bypass a failed link
or a failed node. This method takes advantage of MPLS's label stacking capabilities, and all LSPs
protected using this method share a single, common bypass tunnel. Their original labels
are left intact, and another label is pushed on top to direct them through the bypass tunnel. At the egress
end of the tunnel, the traffic is merged back into the original path by popping the outer label and
examining the inner label to find out where the packet should go.
One-to-One Backup: a local repair method in which a backup LSP is separately created for each protected
LSP at a Point of Local Repair.
Each upstream node sets up a detour LSP that avoids only the immediate downstream node, and merges
back on to the actual path of the LSP as soon as possible. If it is not possible to set up a detour LSP
that avoids the immediate downstream node, a detour can be set up to the downstream node on a
different interface.
The detour LSP may take one or more hops before merging back on to the main LSP path.
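The label-stacking trick behind facility backup is simple enough to sketch directly. All label values here are hypothetical; the two functions model the Point of Local Repair and the merge point at the far end of the bypass:

```python
# Facility backup: the protected LSP's label is left intact; the bypass label
# is pushed on top at the PLR and popped again at the bypass egress.

def plr_reroute(packet_labels, bypass_label):
    """On failure, push the bypass-tunnel label over the existing stack."""
    return [bypass_label] + packet_labels

def merge_point(packet_labels):
    """At the bypass egress, pop the outer label; the inner label remains,
    so the packet merges back onto its original LSP."""
    return packet_labels[1:]

stack = plr_reroute([99], bypass_label=700)   # [700, 99]
stack = merge_point(stack)                    # [99] -> original LSP resumes
```

Because the inner label is untouched, one bypass tunnel can protect any number of LSPs crossing the failed link.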
3. MPLS VPN Services
Layer 2 VPN services
(Figure: PE routers interconnect customer LANs A, B, C and D across the provider network, offering both point-to-point services and the point-to-multipoint service VPLS: Virtual Private LAN Service.)
Layer-3 VPNs worked well for a number of customers; however, there was a significant percentage of the
marketplace using legacy systems and networks for whom a Layer-2 VPN solution would be better suited.
Businesses in the marketplace found that Layer-3 VPNs met only part of the end users’ requirements.
Back in the early days of MPLS implementation, early adopters of the technology discovered that there
was a market demand for Layer-2 VPNs as well.
For MPLS carriers wishing to capture the FR and ATM market place, VPWS offers rapid service conversion.
Customers will be able to maintain their FR or ATM connection with the same equipment. The difference
is that traffic will now be carried encapsulated in an MPLS header and run over an MPLS network.
In VPWS, the service providers provide a pseudo-wire across the network. This overlay model provides
circuit emulation from customer to customer. It provides services similar to ATM and FR; however,
significant cost savings can be realized using MPLS
As these needs were identified, different architectures were suggested for MPLS Layer-2 VPNs, including:
PWE3 (Pseudo Wire Emulation Edge to Edge), equivalent to VLL (Virtual Leased Line). One of the important
features of this solution is that the configuration and management required in the provider network is much
simpler than that for leased lines or the MPLS and Martini solutions mentioned above; this makes it
cheaper for the provider to supply such a service.
In addition, this type of VPWS is more flexible than using leased lines.
3.1 Virtual Private Wire Service (VPWS)
Point to Point VPN (Pseudowire) Principle
(Figure: pseudo-wires between PE1 and PE3 carry the Blue and Green customers' point-to-point traffic between their sites 1 and 2; each pseudo-wire is carried inside an LSP across the P routers.)
Signaling protocol (pseudo-wire setup and control): LDP (RFC 4447) or MP-BGP (RFC 4761)
Encapsulation – Ethernet
RFC 4448 : Encapsulation Methods for Transport of Ethernet over MPLS Networks
QoS Considerations
The ingress PE may consider the user priority (PRI) field [802.1Q] of the VLAN tag header when determining
the value to be placed in a QoS field of the encapsulating protocol (e.g., the EXP fields of the MPLS label
stack). In a similar way, the egress PE may consider the QoS field of the MPLS protocol (e.g., the EXP fields
of the MPLS label stack) when queuing the frame for transmission toward the CE.
Signaling – LDP (TLDP)
(Figure: a pseudowire with PWID 66 between PE1 and PE2. Each PE is manually configured with the PWID and the remote PE address, then sends a targeted-LDP LABEL_MAPPING message with a FEC of type "Virtual Circuit" carrying the PWID (66), the PW type (Ethernet), the interface parameters (MTU 1500, control word present/not present) and its label: PE1 advertises VPN label 18, PE2 advertises VPN label 23.)
RFC 4447 : Pseudowire Setup and Maintenance Using LDP
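The exchange above can be modelled in a few lines. This is a simplified sketch of the RFC 4447 behaviour — the dictionaries and helper names are hypothetical, and real implementations compare many more interface parameters:

```python
# Each PE advertises a Virtual Circuit FEC plus its label; the PW comes up
# only when both sides agree on the FEC contents (PW ID, type, MTU, ...).

def vc_fec(pw_id, pw_type="Ethernet", mtu=1500, control_word=False):
    return {"pw_id": pw_id, "pw_type": pw_type,
            "mtu": mtu, "control_word": control_word}

def pw_operational(local_fec, local_label, remote_fec, remote_label):
    """Seen from the local PE: transmit with the peer's label, receive on
    the label we advertised. Mismatched parameters keep the PW down."""
    if local_fec != remote_fec:
        return None
    return {"tx_label": remote_label, "rx_label": local_label}

pe1, pe2 = vc_fec(66), vc_fec(66)
pw = pw_operational(pe1, 18, pe2, 23)   # PE1 sends with 23, receives on 18
```

An MTU mismatch (say 1500 vs. 9000) leaves the pseudowire non-operational, which is one of the classic PW troubleshooting cases.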
Signaling – LDP/MP-BGP
(Figure: an FR network carrying VCs 100/101, 200/201 and 300/301 between CE1, CE2 and CE3 is migrated to an MPLS network. Each PE keeps a local mapping between its attachment VCs and the labels exchanged with the remote PEs — for example VC100 ← L12 (CE1←CE2) and VC101 ← L13 (CE1←CE3) at CE1's PE — and LSPs between the PEs carry the labeled traffic.)
RFC 4761 : Using BGP for Auto-Discovery and Signaling
Another solution is described in draft-kompella-ppvpn-l2vpn. This draft gives a mechanism for creating a
VPWS using MP-BGP as both an auto-discovery protocol and a signaling protocol.
In this solution, each PE device uses Multi-Protocol BGP (MP-BGP) to advertise the CE devices and VPNs
connected to it, together with the MPLS labels used to route data to them. Consequently, when this
information is received by the other PE devices, they learn how to set up the VPWS.
Data exchange
(Figure: a frame entering on VC100 at CE1's PE is sent into the LSP carrying the VPN label the remote PE advertised for that circuit (e.g. L12 for the CE1–CE2 circuit), under the LSP transport label; at the egress PE the VPN label is popped and the frame is delivered on VC200. The reverse direction and the CE1–CE3 and CE2–CE3 circuits use the other labels of the local mapping tables, e.g. VC300 ← L31 (CE3←CE1) and VC301 ← L32 (CE3←CE2).)
Lasserre-V.Kompella vs. K.Kompella

                                     Lasserre-V.Kompella            K.Kompella
Signaling or “auto-configuration”
(tunnel establishment and routing
information exchange)                LDP                            MP-BGP
Auto-discovery (learning which
other PE routers participate in
the VPLS)                            none — to be done manually     MP-BGP
                                     or with proprietary
                                     solutions; complex; consumes
                                     bandwidth
Signaling also called “auto-configuration” : the mechanism by which tunnels are established and routing
information are exchanged
Auto-discovery : process by which one PE router learns which other PE routers are participating in the
VPLS.
The main difference between the two drafts is that Vach advocates using the LDP protocol for VPLS
signaling setup, while Kireeti says MP-BGP can do that and discover other VPLS nodes
Currently, Juniper is the only company supporting Kireeti's Draft Kompella. Most vendors planning on
offering VPLS are behind Vach’s solution, co-authored with Marc Lasserre
The two drafts have very similar names and both relate to how routers assign labels, but there are subtle
differences.
Alcatel supports the approach to label distribution specified in a draft named “Lasserre-V.Kompella”. This
specification uses the LDP protocol for assigning the label for a pseudo-wire LSP. This is convenient because
routers in an MPLS network already support LDP signaling for their LSPs. LDP has been designed to establish
signaling relationships with directly connected neighbors as well as indirectly connected neighbors, and is
easily extensible.
The Lasserre-V.Kompella draft does not define an auto-discovery method, so there is a need either to extend
LDP, to do the discovery manually, or to develop proprietary solutions.
The alternative approach is supported by Juniper. It is named “K. Kompella” and uses MP-BGP for
signaling the assigned labels. Again, the routers in an MPLS network already use BGP, and MP-BGP is already
used for the MPLS L3 VPN service, so this is convenient. However, since BGP is essentially a broadcast
protocol, it may not be bandwidth efficient.
K-Kompella pros:
• Similar to L3VPNs (uses MP-BGP, like L3VPNs)
• Easier to add PEs to a VPN
• No need to run LDP
• Uses auto-discovery
K-Kompella cons:
• Not as widely supported as Lasserre-V.Kompella
• BGP is essentially a broadcast mechanism (wasted bandwidth, security)
3.2 Virtual Private LAN Services (VPLS)
VPLS: Virtual Private LAN Service
(Figure: VPLS A, a point-to-multipoint service; customer sites attached through MTUs. MTU: Multi-Tenant Unit.)
One of the main differences between a VPWS and the VPLS described above is that the VPWS only provides
a point-to-point service, whereas the VPLS provides a point-to-multipoint service. This also means that
the requirements on the CE devices are quite different. In a VPWS, layer 2 switching must be carried out
by the CE routers, which have to choose which Virtual Wire to use to send data to another customer site.
In comparison, the CE routers in a VPLS simply send all traffic destined for other sites to the PE router.
Customers designated VPLS A and VPLS B are part of two independent Virtual Private LANs.
Tunnel LSPs are set up between PEs.
Layer 2 VC LSPs are set up inside the tunnel LSPs.
The PE at the ingress side examines Layer-2 addresses and forwards frames to the PE on the
egress side based upon Layer-2 switching or bridging tables.
All customer sites using VPLS appear to be on the same LAN, regardless of their location. From the customer
edge device's point of view, the WAN is not visible.
Customer edge devices appear to each other as if connected via a single logical learning bridge with fully
meshed ports.
Defined in draft-lasserre-vkompella-vpls-l2vpn-08.txt
VPLS: LAN Emulation
(Figure: sites A, B and C attached to PE1, PE2 and PE3 across an MPLS network of P routers; the VPLS behaves as an IEEE 802.1D bridge with MAC learning, emulating a LAN switch between the sites.)
VPLS : Virtual Forwarding Instance
(Figure: three customer sites attached to PEs across an MPLS network. Each PE holds a Red VFI table and a Blue VFI table; the physical port or VLAN tag, e.g. VLAN tag 8 on port e1, selects the VFI, and the VFI maps the frame to a pseudo-wire carried inside an LSP towards the remote PE.)
VFI: Virtual Forwarding Instance
Provider Edge routers track MAC addresses in VPLS networks by using Virtual Forwarding Instances (VFIs).
VFIs are tables that contain the MAC addresses for a given VPLS service or customer.
VFIs can be assigned to a physical port, such as an Ethernet interface, or to a VLAN.
VFIs separate one customer’s MAC addresses and VLANs from another's.
Thus, PEs associate received frames with a particular pseudo-wire, using the VFI assigned to the port.
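As an illustration, a VFI can be pictured as a small per-customer MAC table. The sketch below (hypothetical names, not Alcatel-Lucent code) shows how the arrival port selects the VFI and how the VFI maps a destination MAC address to a pseudo-wire:

```python
# Hypothetical sketch of a Virtual Forwarding Instance (VFI): per-customer
# MAC tables kept separate so that overlapping addresses never collide.
class VFI:
    def __init__(self, name):
        self.name = name
        self.mac_table = {}          # MAC address -> pseudo-wire or local port

    def learn(self, mac, port):
        self.mac_table[mac] = port   # source-MAC learning on frame arrival

    def forward(self, mac):
        # Known destination -> one pseudo-wire; unknown -> flood (None here)
        return self.mac_table.get(mac)

# One VFI per customer/service; the port a frame arrives on selects the VFI.
port_to_vfi = {"e1": VFI("Red"), "e2": VFI("Blue")}
port_to_vfi["e1"].learn("00:aa:bb:cc:dd:01", "pw-to-PE2")
```

The same MAC address learned in the Blue VFI would map independently, which is exactly the isolation property described above.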
VPLS: Encapsulations
(Figure: each PE runs a self-learning bridge with the Spanning Tree Protocol per VFI. The Red VFI table maps MAC addresses either to a local interface, e.g. MACs a, b, c to port e1, or to a remote pseudo-wire, e.g. MACs d, e, f to VPN label 34 inside LSP label 56. A frame a to d from Site 1 is carried as [LSP label][VPN label][control word][Ethernet frame]; in the example, LSP PE1 to PE2 uses labels 56, 25, 3 and LSP PE2 to PE1 uses labels 12, 27, 3.)
These VFIs contain MAC addresses and/or VLAN tags as well as any QoS policies. They also contain the inner
labels used for a given pseudo-wire or set of pseudo-wires established for the customer.
Here we see the encapsulation of Ethernet over an MPLS network providing the VPLS service.
A standard Ethernet frame is received off the LAN from the customer edge switch (this can also be an MTU).
The frame is forwarded to the PE. The PE then looks up the VFI assigned to the port. From the information
stored in the VFI, the PE adds the VPLS/MPLS headers, which include:
A control word
A VPN label that represents the pseudo-wire
The network MPLS label that reaches the destination PE
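The header-addition step can be sketched as follows. The byte layout is a simplified illustration (standard MPLS shim format, zeroed control word), and the label values are examples, not a wire-accurate implementation:

```python
# Illustrative encoding of the VPLS encapsulation order:
# [outer/tunnel label][inner/VPN label][control word][Ethernet frame]
def encapsulate(frame: bytes, vpn_label: int, tunnel_label: int) -> bytes:
    def mpls(label, bottom):
        # MPLS shim: 20-bit label, 3-bit TC (0), 1-bit bottom-of-stack, 8-bit TTL (64)
        word = (label << 12) | (int(bottom) << 8) | 64
        return word.to_bytes(4, "big")
    control_word = bytes(4)  # all-zero control word for simplicity
    return mpls(tunnel_label, False) + mpls(vpn_label, True) + control_word + frame
```

Only the inner (VPN) label carries the bottom-of-stack bit, since it sits closest to the payload.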
VPLS: Hierarchical VPLS (H-VPLS)
(Figure: a flat full-mesh topology causes a VPLS scalability problem; hierarchical VPLS reduces the mesh.)
VPLS requires a full mesh of pseudo-wires between all PE devices, causing scalability problems.
It is beneficial to select one PE as a hub and to only set up the mesh of tunnels between this hub PE
and the other (spoke) PEs.
This architecture has a direct impact on the signaling overhead.
This approach seems to be well established as a good solution to the core LSP scalability issue.
It reduces:
the number of connections
the replication requirement (in the basic model, when a frame is received whose destination MAC
address is unknown, the PE has to flood the frame to all other PE routers in the mesh)
However, it does not reduce the number of MAC addresses that need to be maintained; the PE still does the
Ethernet bridging.
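The reduction in connections can be quantified: a full mesh of n PEs needs n(n-1)/2 pseudo-wires, while a hub-based hierarchy needs far fewer. The hub/spoke split below is illustrative:

```python
# Full-mesh VPLS needs n*(n-1)/2 pseudo-wires; a two-tier hierarchy with a
# small meshed hub layer and the remaining PEs as spokes needs far fewer.
def full_mesh(n):
    return n * (n - 1) // 2

def h_vpls(n, hubs):
    # hubs meshed together, each remaining PE homed to one hub (illustrative)
    return hubs * (hubs - 1) // 2 + (n - hubs)
```

For 100 PEs, the flat mesh needs 4950 pseudo-wires; with 10 meshed hubs the count drops to 45 + 90 = 135.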
VPLS: De-coupled VPLS
(Figure: MTUs aggregate the customer VLANs towards the PEs; each MTU maintains hundreds of MAC addresses. MTU: Multi-Tenant Unit.)
De-coupled VPLS distributes the VPLS functions between PEs and MTUs.
De-coupled VPLS reduces the number of MAC addresses to maintain and the number of signaling
connections, but does not limit the number of pseudo-wires as hierarchical VPLS does.
All Ethernet MAC functions (MAC switching, learning, aging, flooding, STP, etc.) and pseudo-wire
termination functions are performed in the MTU, while the auto-discovery and LSR (MPLS) functions are
performed in the PEs.
The link between MTU and PE is able to carry multiple virtual circuits implemented using VLAN tags (or
MPLS labels).
The PE acts as an LSR/LER. It does not implement Ethernet bridging functions.
The result of this architecture is that MTUs perform all the replication and MAC functions, and the PEs
establish a pseudo-wire mesh for each MTU-to-MTU link necessary for connectivity, using MPLS
provisioning and signaling.
3.3 Virtual Private Routed Network (VPRN)
VRF: Virtual Routing and Forwarding
(Figure: each PE holds one VRF per attached VPN, e.g. VRF Red, VRF Blue and VRF Yellow, connecting the corresponding CE routers to each other across the MPLS core.)
In this architecture, each PE maintains a virtual router, with one forwarding table per VPN. Fully meshed tunnels
are advertised across the core using VR protocols. The core of the MPLS network does not combine data
from several sites. Since the data is kept separate, this design has the added benefit of additional
security, in that a misconfiguration will not impact the security of the data. The downside of this design is
scalability and the need for complex configuration.
Each VPN needs a separate Virtual Routing and Forwarding instance (VRF) in each PE router to:
Provide VPN isolation
Allow the use of overlapping, private IP address space by different organizations
PE to CE Router Connectivity
(Figure: CE routers exchange routes with their PE over OSPF, RIP, eBGP or static routing; the PEs exchange customer routes with each other over MP-BGP across the MPLS network.)
Note:
Customer routes need to be advertised between PE routers
Customer routes are not leaked into backbone IGP
Overlapping VPN
(Figure: sites of the Red, Blue and Green VPNs attached to PEs holding the corresponding VRFs across the MPLS network; different VPNs reuse overlapping prefixes such as 10.2/16.)
CE-PE routing
At a PE, a VRF represents the context that is specific to an attached VPN; a VRF is primarily associated with
(is identified by) the one or more sub-interfaces through which the sites belonging to this VPN are
connected.
In this example:
PE 1 is configured to associate VRF Red with the interface (or subinterface) if_11 over which it learns
routes from CE 1. When CE 1 advertises the route for prefix 10.1/16 to PE 1, PE 1 installs a local route to
10.1/16 in VRF Red.
PE 2 is configured to associate VRF Green with the interface (or subinterface) if_13 over which it learns
routes from CE 3. When CE 3 advertises the route for prefix 10.1/16 to PE 2, PE 2 installs a local route to
10.1/16 in VRF Green.
Then, the routes have to be propagated through the MPLS network.
Route Distinguisher and VPN-IPv4
(Figure: CE1 advertises 10.1/16 to PE1 over if_11; CE3, in the Green VPN at Site 3, advertises 10.1/16 over if_13.)
A Route Distinguisher is an 8-byte value with three defined formats:
Type 0 (00 00): 2-byte Autonomous System Number (ASN), assigned by IANA, followed by a 4-byte assigned-number sub-field.
Type 1 (00 01): 4-byte IP address (typically the loopback address of the PE router that originates the route, used when the MPLS/VPN network uses a private ASN), followed by a 2-byte assigned-number sub-field.
Type 2 (00 02): 4-byte ASN, assigned by IANA, followed by a 2-byte assigned-number sub-field.
A VPN-IPv4 address is a 12-byte quantity composed of an 8-byte RD followed by a 4-byte IPv4 address
prefix.
The service provider must ensure that each RD is globally unique. For this reason, the use of the public ASN
space or the public IP address space guarantees that each RD is globally unique.
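The 12-byte composition can be sketched directly. This example builds a Type 0 RD; the ASN and assigned-number values are illustrative:

```python
# Type-0 Route Distinguisher (2-byte type, 2-byte ASN, 4-byte assigned number)
# prepended to an IPv4 prefix gives the 12-byte VPN-IPv4 address.
import struct, ipaddress

def vpn_ipv4(asn, assigned, prefix):
    rd = struct.pack("!HHI", 0, asn, assigned)        # 8-byte type-0 RD
    return rd + ipaddress.ip_address(prefix).packed   # + 4-byte IPv4 = 12 bytes
```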
Notes :
VPN-IPv4 addresses are used only within the service provider network.
VPN customers are not aware of the use of VPN-IPv4 addresses.
VPN-IPv4 addresses are carried only in routing protocols that run across the provider's backbone.
VPN-IPv4 addresses are not carried in the packet headers of VPN data traffic as it crosses the provider's
backbone.
Route Distinguisher
The route distinguisher (RD) must be defined at VRF creation time. A Route Distinguisher makes non-unique
routes unique. It is carried in MP-BGP update messages.
This parameter is used when the VPN private routes are distributed via the backbone to the other sites. The
RDs enable the overlapping of addresses between VPNs.
Route distinguishers are not automatically set up at the PE router; instead, each element requires manual
input based on the topology design of the VPN, and therefore each VPN requires manual set-up of VRFs.
The VRF tables have attributes. The network administrator configures these attributes, including the route
distinguisher, to control the distribution of VPN routes to the VPN members.
All further customer-related VPN operations are fully automated by the MPLS network, significantly simplifying
operations and reducing operational costs for the service provider.
VPN labels exchange
(Figure: label exchange for VRF Red and VRF Blue between PE1 and PE2 across P routers. The LSP label tables show the usual push, swap and pop operations: for example, PE1 pushes outer label 12 towards PE2, a P router swaps 12 for 19, and the penultimate hop pops the outer label. The inner VPN labels, e.g. 1001 and 2001 for the Red VRFs and 1002 and 2002 for the Blue VRFs, stay unchanged end to end.)
Scalability is enhanced because PE routers are not required to maintain a dedicated VRF for all of the VPNs
supported by the provider's network. Each PE router is only required to maintain a VRF for each of its
directly connected sites.
User data flow
(Figure: two simultaneous transfers across PE1, Px, Py and PE2. A packet 10.1.2.3 to 10.2.4.2 from Site 1 (Red VPN, 10.1/16) enters PE1 on if_11 and is sent with label stack outer 12, inner 2001; Px swaps 12 for 19; Py pops the outer label; PE2 maps inner label 2001 to VRF Red and forwards to CE4 over if_21. A packet from Site 3 (Green VPN) towards 10.4/16 follows the same path with inner label 2002. Excerpts of the VRF output tables show, per route, the output interface, inner label and outer label, e.g. 10.4/16 via if_1a with inner 2002 and outer 12.)
Route distribution on the control plane has enabled the building of the VRFs and thus prepared the transfer
of IP traffic between sites. The above figure illustrates two simultaneous data transfers:
from a host at Site 1 to, for example, some server at Site 4 (with IP address 10.2.4.2) and,
from a host at Site 3 to some other server at Site 5 (with IP address 10.4.1.8).
When the IP packet with destination address 10.2.4.2 is received by PE1 from CE1, since all packets that
arrive on if_11 are associated with VRF Red, the Red VRF is interrogated; the entry corresponding to the
10.2/16 route indicates if_1a as output interface, and a label stack:
Outer label (12): which identifies the remote PE
Inner label (2001): which identifies the remote CE
The label stack is inserted in front of the IP packet, the data link header is inserted in front of the label
stack, and the resulting frame is queued on the output interface. Similarly, when the IP packet with
destination address 10.4.1.8 is received by PE1 from CE3, the Green VRF is interrogated and the entry
corresponding to the 10.4/16 route indicates if_1a as output interface, 12+2002 as label stack, as well as
(not shown) a data link header. The label stack is inserted in front of the IP packet, the data link header
is inserted in front of the label stack and the resulting frame is queued on the output interface.
The two frames are sent out on PE1's output interface if_1a; at the Px router, the top labels
are swapped (19 replaces 12) and the labelled packets are forwarded towards Py, which is the penultimate
hop in the LSP.
As a result, the outer labels are popped and the packets are sent towards PE2 with only the inner label in front.
At the egress PE2, the relevant VRF sub-interface is retrieved from the VPN label and the original IPv4 packet
is finally forwarded to the CE, through which the server within the site is reached.
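The data-plane steps above can be walked through with a toy model. The label values come from the figure; the function names are illustrative:

```python
# Toy walk of the data-plane steps: push at PE1, swap at Px,
# pop at the penultimate hop Py, VPN-label lookup at PE2.
def pe1_push(packet):
    return [12, 2001], packet        # outer tunnel label 12, inner VPN label 2001

def px_swap(stack, packet):
    return [19] + stack[1:], packet  # 19 replaces 12

def py_pop(stack, packet):
    return stack[1:], packet         # penultimate-hop popping of the outer label

def pe2_deliver(stack, packet):
    vrf = {2001: "VRF Red", 2002: "VRF Green"}[stack[0]]
    return vrf, packet               # the VPN label selects the VRF, then the CE
```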
End of Section
Section 9
IPSEC VPN Services
IP Technology / IP for Mobile Networks
TTP18031 D0 SG DEN I1.0
1. IPSEC Services
9 IPSEC VPN Services
Services Offered by IPSEC
Integrity check
Authentication of data
Confidentiality
The IP Security Protocol (IPsec) is a set of mechanisms intended to protect the traffic at the IP level (IPv4 or
IPv6).
A first release of the proposed mechanisms was published in the form of an RFC in 1995, but didn't deal with
key management. A second release, dealing with the IKE key management protocol, was published in
November 1998.
Operating Modes: Transport Mode
(Figure: transport mode. The packet A to B keeps its original IP header, so the source and destination addresses stay visible and the packet is not entirely protected; the payload is protected between the IPsec stacks of hosts A and B across the Internet.)
In transport mode, only the data coming from the higher-level protocol and carried by the IP datagram is
protected.
This mode can only be used between terminal devices. Indeed, if intermediate devices were used, the risk
would be that, depending on routing, the packet reaches its final destination without going through the
gateway that is supposed to decrypt it.
Operating Modes: Tunnel Mode (example 1)
(Figure: tunnel mode between IPsec gateways x and y. The original packet A to B is encapsulated in a new packet x to y, so the original source and destination addresses are hidden; the inner packet is entirely protected across the Internet between the two intranets.)
In tunnel mode, the IP header is also protected (authentication, integrity and/or confidentiality). The whole
packet is encapsulated in a new packet. The purpose of the header of this new packet is to transport the
initial packet up to the end of the tunnel, where the packet is de-encapsulated. Therefore, the tunnel mode
can be used both by terminal devices and by security gateways.
This mode provides greater protection against traffic analysis because it hides the original source and
final destination addresses.
Operating Modes: Tunnel Mode for Dial-in (example 2)
(Figure: tunnel mode for dial-in. A remote host A builds a tunnel A to y towards the IPsec gateway y in front of the intranet; the inner packet A to B is entirely protected across the Internet.)
In addition to standard IP processing, IPsec uses two security mechanisms to provide security for IP traffic:
Authentication Header (AH) and Encapsulating Security Payload (ESP).
AH
AH does not offer confidentiality, which means that widespread use of this standard is possible over the
Internet, including in places where exporting, importing and using encryption for confidentiality purposes is
restricted by law. This is one of the reasons why two distinct mechanisms are used.
In AH, integrity and authentication are provided together, using an additional block of data attached to
the message to be protected. This data block is called the Integrity Control Value (ICV), which refers
generically to:
either a Message Authentication Code (MAC),
or a digital signature.
For reasons of performance, the algorithms currently offered are all integrity check algorithms.
Anti-replay protection is provided using a sequence number. It is available only if Internet Key Exchange (IKE)
is used because in manual mode there is no "connection open" that enables the counter to be reset.
ESP
Confidentiality can be selected independently of the other services. However, using confidentiality without
integrity/authentication (directly in ESP or with AH) leaves traffic vulnerable to certain types of active
attack that could weaken the confidentiality service. As in AH, the authentication and integrity services go
hand-in-hand and are often referred to as "authentication". They are based on the use of an ICV (in practice,
a MAC). Anti-replay protection can only be selected if authentication has been selected and IKE is used. It is
provided using a sequence number that is checked by the recipient of the packets.
Unlike AH, where an additional header is simply added to the IP packet, ESP uses the encapsulation function:
the original data is encrypted then encapsulated in a trailer header.
Authentication Header (AH)
(Figure: AH header format: Next header, Length, Security Parameters Index, Sequence number, Authentication data (ICV: Integrity Control Value).)
Next header: It identifies the type of payload that follows the AH header.
Length: It indicates the header length in 32-bit words, minus 2 (since AH is an extension of the IPv6
header).
Security Parameters Index (SPI): The SPI field is a 32-bit arbitrary value that, when combined with the
destination IP address, identifies the unique Security Association (SA) of this datagram.
Sequence number: This field gives the packet number and is incremented by 1 at each transmission. This
prevents replay (anti-replay protection), since this number is not allowed to "cycle" for a
given SA (a new SA must then be created after 2^32 packets). This field is mandatory for the sender, but the
recipient is not required to process it. In the latter case, the number is allowed to cycle.
Authentication data: It contains the Integrity Control Value (ICV) of the packet.
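The anti-replay check can be sketched as a sliding window over received sequence numbers. This is a simplified model; real implementations typically keep a fixed-size bitmap:

```python
# Sketch of the AH/ESP anti-replay check: a sliding window over sequence
# numbers; duplicates and too-old packets are rejected.
class ReplayWindow:
    def __init__(self, size=64):
        self.size = size
        self.top = 0           # highest sequence number accepted so far
        self.seen = set()

    def check(self, seq):
        if seq > self.top:                     # advances the window
            self.seen = {s for s in self.seen if s > seq - self.size}
            self.seen.add(seq)
            self.top = seq
            return True
        if seq <= self.top - self.size or seq in self.seen:
            return False                       # too old, or a replay
        self.seen.add(seq)                     # late but inside the window
        return True
```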
AH: Next Header Field: Example
Next header:
It identifies the type of payload that follows the AH header. The value of this field is taken from the IANA
registry (see the site http://www.iana.org/assignments/port-numbers).
AH can be used in transport or tunnel mode.
When used in transport mode, it carries the value of the protected higher-level protocol, namely
UDP or TCP.
When used in tunnel mode:
The value 4 indicates, in IPv4, an IP-in-IP encapsulation.
The value 41 indicates an IPv6 encapsulation.
AH: Authentication Data
Authentication data:
This field contains the Integrity Control Value (ICV) of the packet.
The length of this field must be a multiple of 32 bits. All the implementations must respect this length and
therefore add padding data to this field if required.
Some fields may be modified by intermediate routers. Consequently, they must not be taken into account in
the calculation of authentication anymore. Thus, the fields excluded from the authentication are:
Type of Service (TOS)
Fragment Offset (always set to 0 since AH only applies to unfragmented packets)
Flags
Time To Live (TTL)
IP header checksum
Options
The default algorithms that must be supplied by all implementations of IPsec for AH are HMAC-MD5-96
[RFC2403] and HMAC-SHA-1-96 [RFC2404].
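Both default algorithms truncate the MAC to its first 96 bits. Python's standard library is enough to sketch the ICV computation:

```python
# The truncated MACs named above keep only the first 96 bits (12 bytes)
# of the HMAC output as the ICV.
import hmac, hashlib

def icv_hmac_sha1_96(key: bytes, message: bytes) -> bytes:
    return hmac.new(key, message, hashlib.sha1).digest()[:12]
```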
Encapsulating Security Payload (ESP)
(Figure: ESP packet format: SPI, Sequence number, Payload data, Padding, Pad length, Next header, Authentication data.)
The fields SPI, Sequence number, Next header and Authentication data (optional) are defined as for AH.
The Payload data field contains the encrypted data. Note that:
If the encryption algorithm (for example, DES in Cipher Block Chaining (CBC) mode) needs cryptographic
synchronization, i.e. an Initialization Vector (IV), then this data may be carried at the beginning of
the Payload data field.
The Padding field provides padding for the following reasons:
In the case of block encryption, the algorithm may require a certain size of data to be encrypted;
padding ensures the content has the size required by the algorithm.
Padding is also used to align the end of the ESP packet on a 4-byte boundary.
The main algorithms that can be used with ESP are:
Confidentiality:
triple DES (mandatory) (168-bit key),
DES (56-bit key),
RC5, AES, CAST, IDEA, IDEA triple, Blowfish, RC4,
NULL when there is no need of encryption.
Authentication:
HMAC-MD5 (mandatory),
HMAC-SHA-1 (mandatory),
DES-MAC, HMAC-RIPE-MD, KPDK-MD5
NULL when authenticity is not selected.
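The padding rule can be sketched as follows, assuming an 8-byte cipher block and tunnel mode (Next header = 4); the trailer layout follows the ESP fields described above:

```python
# Sketch of the ESP trailer: pad so that (payload + padding + 2 trailer
# bytes) is a multiple of the cipher block size, with monotone pad bytes.
def esp_pad(payload: bytes, block: int = 8) -> bytes:
    pad_len = (-(len(payload) + 2)) % block
    padding = bytes(range(1, pad_len + 1))          # pad bytes 1, 2, 3, ...
    # trailer: 1-byte pad length, 1-byte next header (4 = IP-in-IP, tunnel mode)
    return payload + padding + bytes([pad_len]) + bytes([4])
```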
ESP Format
(Figure: tunnel mode example. The outer IPv4 header, with source X, destination Y and protocol 50 (ESP), is followed by the ESP header (SPI, Sequence number); the encrypted part carries the original IP header (source A, destination B), the data, the padding, the pad length and the Next header value 4 (the ESP trailer); the ESP authentication data closes the packet.)
The emitter:
Encapsulates, in the Payload data field of ESP, the data carried by the original datagram, plus the
original IP header in tunnel mode.
Adds padding if necessary.
Encrypts the result (Data, Padding, Pad length and Next header fields).
If necessary, adds cryptographic synchronization data (an initialization vector) at the beginning of the
Payload data field.
If authentication has been selected, it is always applied after the data has been encrypted. This
makes it possible to check the validity of a received datagram before decrypting it,
which is an expensive operation. Unlike AH, the authentication in ESP only covers the ESP packet
(header + payload + trailer) and includes neither the outer IP header nor the Authentication data field.
ESP Position in Transport Mode
In transport mode, only the data coming from the higher-level protocol and carried by the IP datagram is
protected.
How to Find the Path Maximum Transmission Unit
(Figure: Path MTU discovery in two phases. Phase 1: a 1500-byte packet with the DF (don't fragment) flag set crosses a link with MTU 1536 but is stopped at the router whose next-hop link has MTU 1024; that router returns an ICMP destination-unreachable message carrying the next-hop MTU. The ICMP message contains: Type = 3, Code = 4, checksum, the next-hop MTU, and the original IP header plus the first 64 bits of its payload. Phase 2: the sender retries with 1024 bytes and is stopped again at a link with MTU 512.)
It is essential to know the Path Maximum Transmission Unit (PMTU), mainly when there is a large amount of data
to be transmitted. Indeed, if long packets are sent along the path, some routers will have to perform
fragmentation, which is expensive in terms of resources and processing time. The recipient will also have to
perform complex re-assembly operations.
Generally, data transfer applications (FTP, for example) prefer to determine the PMTU and to emit packets
that do not exceed it, to get faster transfers.
The PMTU is discovered by emitting IP packets with the "don’t fragment" flag set.
At first, the emitter transmits a packet of a maximum length.
A router that cannot forward a packet of such a length sends back in an ICMP message the value of the
next MTU.
The sender can then emit a new packet whose length is equal to the received MTU. This packet is
emitted with the "don’t fragment" flag.
The previous 2 steps are repeated until a packet reaches the recipient.
The length of the last packet correctly transmitted is used as a reference for the rest of the traffic.
This way, the sender can find the MTU of a path (PMTU).
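The loop above can be simulated over a list of link MTUs. This is a toy model: each element is one hop's MTU, and the first hop too small for the current packet size is assumed to report its MTU:

```python
# Simulation of PMTU discovery: send with DF set, shrink to the MTU reported
# in each ICMP "fragmentation needed" message until a packet gets through.
def discover_pmtu(link_mtus, start=1500):
    size = min(start, link_mtus[0])
    while True:
        bottleneck = min(link_mtus)
        if size <= bottleneck:
            return size                 # the packet reached the recipient
        # first router whose link MTU is too small reports its next-hop MTU
        size = next(m for m in link_mtus if m < size)
```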
Information Sent Back in the ICMP Message
(Figure: the returned data includes the original IP header (version, header length, ToS, datagram length, identification, DF flag, offset, TTL, protocol, checksum, source address A, destination address B) followed by the first 64 bits of the payload, i.e. the start of the TCP/UDP header.)
The ICMP message sent back by the router that cannot forward the packet, since fragmentation is
forbidden, is the following:
Type = 3 (Destination Unreachable)
Code = 4 (Fragmentation needed and DF set)
Next-Hop MTU in the low-order 16 bits of the second word of the ICMP header (called "unused" in RFC 792),
with the high-order 16 bits set to zero.
Data: contains the IP header + the first 64 bits of the packet that caused this ICMP message.
Thanks to these 64 bits, the sender is able to identify the application that initiated the transmission (source
and destination port numbers).
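The message layout can be sketched with a struct; the checksum computation is omitted, and the field values follow the description above:

```python
# Hypothetical packing of the ICMP "fragmentation needed" message:
# Type=3, Code=4, checksum (left 0 here), 16 unused bits, 16-bit next-hop
# MTU, then the original IP header plus the first 64 bits of its payload.
import struct

def icmp_frag_needed(next_hop_mtu: int, original: bytes) -> bytes:
    header = struct.pack("!BBHHH", 3, 4, 0, 0, next_hop_mtu)
    return header + original[:20 + 8]   # 20-byte IP header + 8 bytes (64 bits)
```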
"Don’t Frag" Flag
(Figure: when the IPsec gateway (FW) builds the tunnel-mode packet, the DF flag of the original inner header must be copied into the outer header, so that Path MTU discovery keeps working through the tunnel.)
Information Sent Back in the ICMP Message if IPsec
PMTU calculation
The PMTU calculation that is sent back to the host must take into account the IPsec header that has been
added, whichever it is: AH transport, ESP transport, AH/ESP transport, ESP tunnel, AH tunnel.
Note: In certain situations, the addition of IPsec headers might result in the calculation of an effective
PMTU (as seen by the host or the application) that is too small. To avoid this, the implementation can
set a threshold below which it does not report a reduced PMTU. The implementation then applies
IPsec and fragments the resulting packet according to the PMTU. As a consequence,
the available bandwidth is used more effectively.
Reminder: NAT Function
The Network Address Translation (NAT) and Port Address Translation (PAT) functions allow several users to
access the Internet simultaneously.
Several NAT Devices May Be Crossed
(Figure: a TCP connection from host @A, port 1234, crosses two NAT devices. The intranet NAT maps @A:1234 to @1:4567; the ISP NAT maps @1:4567 to @X:8901. The server @B sees @X:8901, and each NAT translates the return traffic back step by step.)
IPsec Problem Inherent to NAT
(Figure: an ESP packet (IP protocol 50) sent from behind @1 towards @Y reaches the ISP NAT. ESP carries no port numbers and its content cannot be modified, so the NAT cannot build its translation entry: the port columns of the mapping for protocol ESP remain undefined.)
NAT Traversal
(Figure: with NAT traversal, the ESP packet is encapsulated in UDP (IP protocol 17). The inner UDP header uses port 500 on both sides, the same port as the IKE traffic; the NAT translates it like any UDP flow (here 500 to 4567), and the far IPsec gateway removes the UDP header before normal IPsec processing.)
The UDP header is a standard [RFC0768] header, where the Source Port and Destination Port MUST be the same
as that used by IKE traffic. The IPv4 UDP Checksum SHOULD be transmitted as a zero value, and receivers
MUST NOT depend on the UDP checksum being a zero value. The SPI field in the ESP header MUST NOT be a
zero value.
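A sketch of that encapsulation, following the figure's port numbers (deployments per RFC 3948 switch to port 4500 once a NAT is detected):

```python
# Sketch of UDP encapsulation of ESP for NAT traversal: a UDP header on the
# IKE port with a zero checksum, followed by the unchanged ESP packet.
import struct

def udp_encapsulate_esp(esp_packet: bytes, src_port=500, dst_port=500) -> bytes:
    # The SPI (first 4 bytes of the ESP header) must not be zero
    assert int.from_bytes(esp_packet[:4], "big") != 0
    length = 8 + len(esp_packet)
    # UDP header with the checksum transmitted as zero, as the text requires
    return struct.pack("!HHHH", src_port, dst_port, length, 0) + esp_packet
```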
Answer the Questions
If two IPsec tunnels are set up between the same pair of entities, which
parameter makes it possible to identify each tunnel accurately?
Impossible at IPsec level as the packet is encrypted
The SPI field
The port number at transport level
The Authentication data field
Answer the Questions
Tunnel
Transport
2. IPSEC operation
Security Association (SA)
(Figure: a Security Association is identified by the destination IP address, the protocol (AH or ESP) and the SPI; its parameters include the encryption algorithm, the protocol, and so on.)
The mechanisms mentioned previously rely on cryptography and consequently use a number of
parameters (encryption algorithms, keys, selected mechanisms, etc.) on which the communicating parties
must agree. IPsec uses the Security Association (SA) to manage these parameters.
An IPsec Security Association is a simplex connection that supplies security services to the traffic
it transports. It can be considered as a data structure storing the set of parameters
associated with a given communication.
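An SA can be modeled as a record identified by the triple (destination IP address, security protocol, SPI); the field names and default values below are illustrative:

```python
# Sketch of an SA as a data structure, keyed in the SA database by the
# identifying triple (destination address, AH/ESP, SPI).
from dataclasses import dataclass

@dataclass(frozen=True)
class SecurityAssociation:
    dest_ip: str
    protocol: str          # "AH" or "ESP"
    spi: int
    cipher: str = "3DES"   # illustrative parameter set
    auth: str = "HMAC-SHA-1"
    mode: str = "tunnel"

sad = {}                   # the SA database, keyed by the identifying triple
sa = SecurityAssociation("192.0.2.1", "ESP", 0x1234)
sad[(sa.dest_ip, sa.protocol, sa.spi)] = sa
```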
Security Association Database (SAD)
(Figure: each SA is one-way; a gateway may hold several SAs towards several partners, and several SAs towards the same partner, selected as a function of the traffic type or destination. The SAD for outgoing traffic and the SAD for incoming traffic list these SAs separately.)
The Security Association Database (SAD) manages the active security associations. It contains all
the parameters relative to each SA. The IPsec gateway looks up the SAD to know how to process each packet,
whether sent or received.
Indeed:
Several SAs may be set up between several partners.
Several SAs may be set up towards the same partner.
Different types of protection may be defined according to the type of application.
Different types of protection may be defined according to the direction.
SAD: Synthesis
(Figure: the SAD holds one entry per SA, identified by destination IP address, protocol (AH/ESP) and SPI, with the parameters listed below.)
The IPsec processing between two partners requires the following parameters:
Sequence number counter,
Policy when the counter reaches the maximum value,
Anti-replay window for the incoming traffic,
Selection of the AH or ESP algorithm and of the associated parameters,
Time To Live (in seconds or in bytes),
Mode (transport or tunnel),
Path MTU information.
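The parameter list above amounts to one record per SA. The following Python dataclass is only an illustrative sketch; the field names and default values are ours, not taken from any implementation.

```python
from dataclasses import dataclass

@dataclass
class SAParameters:
    seq_counter: int = 0             # sequence number counter
    overflow_policy: str = "rekey"   # policy when the counter reaches its maximum
    replay_window: int = 64          # anti-replay window for incoming traffic
    auth_algo: str = "HMAC-MD5"      # AH/ESP authentication algorithm selected
    enc_algo: str = "3DES"           # ESP encryption algorithm selected
    lifetime: int = 3600             # Time To Live (in seconds or in bytes)
    mode: str = "tunnel"             # transport or tunnel
    path_mtu: int = 1400             # path MTU information

    def next_seq(self):
        # Each outgoing packet consumes one sequence number; overflow_policy
        # decides what happens when the counter reaches its maximum value.
        self.seq_counter += 1
        return self.seq_counter
```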
Section 9 Page 28
Security Policy Database (SPD) (Example of Outgoing Traffic)
[Figure: an outgoing IP packet (source 155.2.8.1, destination 194.1.2.6, protocol 6 (TCP), source port
1024, destination port 21 (FTP)) is matched against the SPD. A first entry (dest IP@ = 194.1.2.*, src
IP@ = 155.2.8.*, transport = TCP, dest port = 21 (ftp), action: apply, SA Id1) points to SA1 in the
outgoing SAD (ESP, algorithms, Time To Live, transport/tunnel mode). A second entry (dest IP@ = 129.9.9.9,
src IP@ = 155.2.8.2, transport = TCP, dest port = any, action: apply, SA Id2) points to SA2 (AH). SA1 runs
from gateway x (network 155.2.8.0) across the Internet to gateway y (network 194.1.2.0); SA2 runs to host
129.9.9.9]
The SPD selectors used to select an SA include the source and destination IP addresses, the ToS, the
transport protocol and the port numbers (SA endpoint).
Three actions are possible: discard (the IP packet is deleted), bypass IPsec (the packet goes through in
the clear), and apply IPsec (the security services contained in an SA or a group of SAs (SA bundle) are
applied).
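The selector matching and the three actions can be sketched as follows. The wildcard convention ("*" and trailing-star prefixes) and the entry layout are simplifying assumptions; the example entries reuse the addresses from this slide.

```python
# Illustrative sketch of an outgoing-traffic SPD lookup.
def _match(value, pattern):
    # "*" matches anything; "194.1.2.*" matches an address prefix.
    if pattern == "*":
        return True
    if pattern.endswith("*"):
        return str(value).startswith(pattern[:-1])
    return str(value) == str(pattern)

SPD = [
    {"dst": "194.1.2.*", "src": "155.2.8.*", "proto": "TCP", "dport": "21",
     "action": "apply", "sa_id": 1},
    {"dst": "129.9.9.9", "src": "155.2.8.2", "proto": "TCP", "dport": "*",
     "action": "apply", "sa_id": 2},
]

def spd_lookup(dst, src, proto, dport):
    # First matching entry wins; unmatched traffic falls to the default policy.
    for entry in SPD:
        if (_match(dst, entry["dst"]) and _match(src, entry["src"])
                and _match(proto, entry["proto"])
                and _match(dport, entry["dport"])):
            return entry["action"], entry.get("sa_id")
    return "discard", None  # default policy: drop unmatched traffic
```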
Section 9 Page 29
SPD and SAD Management
[Figure: the administrator configures the policies, either manually or through an application (1).
Outgoing traffic from applications (HTTP, FTP, POP, …) passes through the sockets and transport (TCP,
UDP) layers and reaches the IP / IPsec (AH, ESP) layer (2), which looks up the SPD (3); the SPD points
to the SAD, which is looked up in turn (4, 5). When no SA exists, a request triggers the Internet Key
Exchange (IKE), which negotiates, modifies or discards SAs and creates the SAD entries (6)]
Outgoing traffic:
When the IPsec "layer" receives data to be sent:
• At first, it looks up the Security Policy Database (SPD) to determine how to process this data.
• If this database indicates that security mechanisms must be applied to the traffic, it retrieves the
required characteristics of the corresponding SA and looks up the SA Database (SAD):
– If the required SA already exists, it is used to process the traffic concerned.
– If not, IPsec uses IKE to set up a new SA with the required characteristics.
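A hedged sketch of that decision path, with a stand-in ike_negotiate() in place of a real IKE exchange (all names and structures here are illustrative assumptions):

```python
# SPD lookup first, then SAD lookup, with IKE as a fallback.
spd = {"155.2.8.0/24->194.1.2.0/24": "apply"}   # policy: protect this flow
sad = {}                                         # no SA established yet

def ike_negotiate(flow):
    # Placeholder for an IKE exchange that would create the SA on demand.
    sad[flow] = {"spi": 0x1001, "proto": "ESP"}
    return sad[flow]

def outgoing(flow):
    policy = spd.get(flow, "discard")
    if policy == "bypass":
        return "sent-clear"
    if policy == "discard":
        return "dropped"
    # Policy says "apply": reuse the SA if present, else trigger IKE.
    sa = sad.get(flow) or ike_negotiate(flow)
    return f"protected-with-spi-{sa['spi']:#x}"

result = outgoing("155.2.8.0/24->194.1.2.0/24")
```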
Section 9 Page 30
SA negotiation _ Key Management
Manual
Automatic
With "Certificates"
The distribution and the management of keys are critical operations. IPsec uses two methods of key distribution:
Manual
Automatic
Manual key exchange
The administrators at each end of a tunnel must configure all the security parameters. This principle can be
applied in small, static networks. However, key distribution becomes problematic over long distances, since
the keys may be compromised in transit. Moreover, the keys must be regenerated regularly.
IKE supplies a method for:
Negotiating the protocols, algorithms and keys to be used.
Authenticating the parties, that is, making sure that you are communicating with the right party from the
beginning of the exchange (primary authentication services).
Managing the keys once they are chosen (key management).
Supplying the mechanisms to manage the keys.
Section 9 Page 31
Origins of IKE
[Figure: IKE derives from ISAKMP (which defines the procedures for authentication and SA management),
OAKLEY (which defines the groups used for the Diffie-Hellman exchange) and SKEME (Secure Key Exchange
Mechanism); a Domain of Interpretation (DOI) binds IKE to IPsec]
IKE is derived from a set of protocols: ISAKMP, OAKLEY and SKEME. Together they constitute a protocol
stack enabling automatic key exchange.
OAKLEY
OAKLEY defines the groups that will be used for the Diffie-Hellman exchange. There are 5 groups, called
the OAKLEY Groups: three groups of classical modular exponentiation (MODP) and two elliptic-curve groups.
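The Diffie-Hellman exchange that these MODP groups parameterise can be illustrated with toy numbers. The prime below is far too small for real use (the OAKLEY groups use primes of 768 bits and up); it only shows the mechanism.

```python
# Toy modular-exponentiation Diffie-Hellman exchange.
p = 0xFFFFFFFB  # small prime stand-in for an OAKLEY MODP modulus
g = 5           # generator

a_priv = 123456          # Alice's secret exponent (never transmitted)
b_priv = 654321          # Bob's secret exponent (never transmitted)

a_pub = pow(g, a_priv, p)   # public values actually sent over the network
b_pub = pow(g, b_priv, p)

# Each side combines its own secret with the peer's public value.
a_secret = pow(b_pub, a_priv, p)
b_secret = pow(a_pub, b_priv, p)
assert a_secret == b_secret  # both ends derive the same shared secret
```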
Section 9 Page 32
IKE Phases
Aggressive mode: serves the same purpose, but the partners' identities are not protected (simpler and
faster)
IKE phases
IKE is a two-phase protocol:
Phase 1:
Both partners set up a secure channel (IKE SA) to execute IKE. They negotiate how to authenticate and
secure the channel.
Phase 2
Both partners negotiate the IPsec SA parameters.
IKE modes
Oakley supplies three modes of key exchange and SA establishment: main mode, aggressive mode and quick mode.
Section 9 Page 33
IKE Phase 1 _ Main Mode
[Figure: six messages between the initiator and the responder: messages 1 and 2 negotiate the basic and
hash algorithms; messages 3 and 4 exchange the public keys and signatures; messages 5 and 6 check the
identities (encrypted exchange)]
IKE Main Mode consists of six messages exchanged between the initiator and the responder in order to
set up an IKE SA. The first 4 messages are sent in the clear and are used to determine the security
parameters of the future exchanges.
In the first exchange (messages 1 and 2), both parties agree on the basic and hash algorithms:
Authentication (Preshared-key / RSA certificate).
Hash (MD5 / SHA-1 / …)
Encryption (DES / 3DES / AES / …)
DH groups (1 .. 5)
In the second exchange (messages 3 and 4), they exchange their public keys and signatures.
In the third exchange (messages 5 and 6), they check their identities.
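The first exchange can be sketched as a proposal selection. The proposal format and the "first acceptable offer wins" rule are simplifying assumptions, not the actual ISAKMP payload layout:

```python
# Sketch of the phase-1 parameter negotiation (messages 1 and 2).
initiator_proposals = [
    {"auth": "RSA-certificate", "hash": "SHA-1", "enc": "AES",  "dh_group": 5},
    {"auth": "Preshared-key",   "hash": "MD5",   "enc": "3DES", "dh_group": 2},
]

responder_supported = [
    {"auth": "Preshared-key", "hash": "MD5", "enc": "3DES", "dh_group": 2},
]

def choose(proposals, supported):
    # Keep the initiator's preference order; take the first acceptable offer.
    for proposal in proposals:
        if proposal in supported:
            return proposal
    return None  # no common policy: the negotiation fails

chosen = choose(initiator_proposals, responder_supported)
```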
Section 9 Page 34
Parameters Used in the IKE SA Negotiation
Encryption algorithms
(DES, 3DES, AES)
Hash algorithms
(MD5, SHA)
Authentication method
(Preshared-Key or Certificate)
Section 9 Page 35
IKE Phase 2 _ Messages
[Figure: three messages: messages 1 and 2 negotiate the security protocols for IPsec; message 3
authenticates the exchange; each partner assigns its own SPI (x and y)]
Section 9 Page 36
Negotiation of SAs for the Data
Hash algorithms
An authentication method
For the IPsec AH protocol, the transform algorithms that can be negotiated are MD5, SHA and DES (MD5 and
SHA being mandatory to implement).
For the IPsec ESP protocol, the transform algorithms that can be negotiated as a basis for authentication
are MD5, SHA and DES. The possible encryption algorithms are DES, 3DES, RC5, IDEA, … (DES being mandatory
to implement).
Section 9 Page 37
Perfect Forward Secrecy (PFS)
[Figure: Alice and Bob exchanging encrypted traffic across a public network]
When encrypted data goes through a public network, an attacker has many opportunities to intercept it.
You can reduce the risk of interception by using larger and larger keys, but the larger the keys, the
slower and more complex the encryption, which may degrade network performance.
A good compromise consists in using keys of a reasonable length and changing them frequently. This
approach has a problem of its own: the new keys must not be generated from the old ones. Indeed, if one
key is discovered, all the traffic might be compromised.
So it is necessary to implement a method that generates a new key that does not depend at all on the
value of the current key. Then, if someone intercepts your current key, that person can analyze only a
small part of the traffic and will have to crack another, entirely independent key to analyze the rest.
Two variants may be used to generate the keys associated with the encryption, hashing and authentication
of the SAs specific to the negotiated application:
They can simply be generated from the ISAKMP SAs.
New keys, independent of the ISAKMP SA keys, are generated by exchanging new DH values.
This second variant is called Perfect Forward Secrecy.
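A small sketch of the contrast, using SHA-256 as a stand-in key-derivation function (the function names and derivation details are illustrative assumptions, not the IKE key schedule):

```python
import hashlib

def kdf(*parts):
    # Stand-in key-derivation function: hash all inputs together.
    h = hashlib.sha256()
    for part in parts:
        h.update(str(part).encode())
    return h.hexdigest()

isakmp_secret = "phase1-shared-secret"

# Without PFS: an attacker who learns isakmp_secret can re-derive both keys.
key_no_pfs_1 = kdf(isakmp_secret, "sa1")
key_no_pfs_2 = kdf(isakmp_secret, "sa2")

# With PFS: each key mixes in a fresh DH result unrelated to isakmp_secret,
# so compromising one key reveals nothing about the others.
p, g = 0xFFFFFFFB, 5

def fresh_dh():
    import random
    a, b = random.randrange(2, p), random.randrange(2, p)
    return pow(pow(g, a, p), b, p)   # shared secret g^(ab) mod p

key_pfs_1 = kdf(fresh_dh(), "sa1")
key_pfs_2 = kdf(fresh_dh(), "sa2")
```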
Section 9 Page 38
Answer the Questions
SPD
SAD
Source IP address
Destination IP address
Port number
ToS
Protocol
Section 9 Page 39
Answer the Questions
SPD
SAD
Section 9 Page 40
Answer the Questions
Main mode
Aggressive mode
Quick mode
Section 9 Page 41
Answer the Questions
Even with automatic IKE key management, the Preshared-key method still requires the manual entry of a
secret key. What is its role?
To ensure authentication
To perform encryption
To perform a hash
Section 9 Page 42
Answer the Questions
Section 9 Page 43
End of Section
Section 9 Page 44