
Telecommunication Systems

SIP protocol extensions

Author: Franz Edler / October 2011

1 Overview
1.1 Topic and Goal of Lesson
1.2 Exercises
1.3 Audience and Preconditions
2 The best way to proceed
2.1 Recommendation of the Lecturer
2.2 Schedule and Timing
3 Event State Publication
3.1 The state publication model
3.2 Protocol overview
3.3 Publish framework applied for presence
4 Event Packages
4.1 Presence Event Package
4.1.1 Package definition
4.1.2 Presence information
4.2 Watcher Information Event Template-Package
4.3 An INVITE-Initiated Dialog Event Package for SIP
4.4 Further event packages
4.4.1 Message Summary and Message Waiting Indication Event
4.4.2 Event Package for Conference State
4.4.3 Event Package for Registrations
4.4.4 Refer event
4.4.5 Debug Event
5 The UPDATE method
6 Resource Management
6.1 Protocol overview
6.2 SDP parameters and attributes
6.3 Option Tag
7 Third Party Session Control
8 REFER Method
8.1 Referred-By header field
8.2 Replaces header field
9 Conferencing
9.1 Tightly Coupled SIP Conference
9.1.1 Creation of an Ad-hoc conference
9.1.2 Immediate Conference creation with a URI list
9.1.3 Floor Control
9.2 Decentralized Conferencing
9.3 Joining a conference
9.4 Join header field
10 SIP Based Messaging
10.1 Page Mode Instant Messaging
10.2 Session Mode Instant Messaging with MSRP
11 INFO method
12 Service Configuration
12.1 Overview on XML
12.2 The XML Configuration Access Protocol (XCAP)
12.2.1 XCAP Overview
12.2.2 XCAP Application usage
12.2.3 XCAP URIs
12.2.4 Entity Tags and conditional operations
12.2.5 Subscriptions to changes in XML documents
13 NAT and Firewall Traversal
13.1 Network Address Translation
13.2 Firewalls
13.3 Problems caused by NAT and Firewall Traversal
13.4 SIP Protocol Enhancements
13.4.1 Symmetric Response Routing
13.4.2 Symmetric RTP/RTCP
13.4.3 RTCP attribute in SDP
13.5 Classical NAT and FW Traversal Solutions
13.5.1 NAT and FW categorisation
13.5.2 (Classic) STUN protocol
13.6 The perfect NAT and FW Traversal Solution
13.6.1 NAT and FW Behavior Requirements
13.6.2 The new STUN protocol
13.6.3 Traversal Using Relays around NAT (TURN)
13.6.4 Interactive Connectivity establishment
13.6.5 Client initiated connections
13.7 External and proprietary Solutions
13.7.1 Application Layer Gateways
13.7.2 UPnP
13.7.3 Skype
13.7.4 SIP Express Router
14 Session Timer
15 Caller Preferences and UA Capabilities
15.1 User Agent Capabilities
15.1.1 Feature tags
15.1.2 Expression of capabilities
15.2 Caller Preferences
15.2.1 Feature preferences
15.2.2 Request handling preferences
16 Global Routable User URI (GRUU)
17 Identity Management
18 ENUM
19 Privacy Mechanism
20 Reason
21 Path
22 Service-Route
23 Request History
24 SIP-Connected-Id
25 Questions

1 Overview
1.1 Topic and Goal of Lesson
The main topic of the lecture is a basic understanding of the Session Initiation Protocol (SIP). This protocol originated in Internet standardization but was later also accepted by traditional network operators as the basis for a modern IP-based network architecture.

This is the second lecture note on SIP. It builds on the content of the first lecture note (basic SIP protocol) and covers some of the most important protocol extensions to SIP, with a specific view to the application of SIP in commercial operator networks including IMS [1].

At the end of the lesson the student will have a good understanding of SIP and will be able to analyze SIP message flows, find bugs and identify misbehavior in implementations.

Ideally the lesson is accompanied by practical exercises in the lab using, for example, open source implementations of SIP servers [2] and free SIP clients. A VMware [3] image is available at the University institute; it enables the student to verify and deepen the basic knowledge of SIP by running a SIP server on his or her own notebook computer.

The lecture also encourages the interested student to consult RFCs in certain situations, to get first-hand information on protocol details and to become acquainted with reading an RFC.

1.2 Exercises
The last chapter of the lecture note includes a list of questions on each chapter which the student should be able to answer after the lecture. These questions are also a good basis for preparing for the final examination.

1.3 Audience and Preconditions

The lecture is targeted at master's students of the University of Applied Sciences. A good understanding of the basic Internet protocols (TCP/IP, DNS, etc.) is required.

[1] IMS = IP Multimedia Subsystem; a SIP-based network architecture used by fixed and mobile network operators.
[2] Examples of open source SIP servers are: Kamailio (the Open Source SIP Server), the OpenSIPS Project, and SER, the SIP Express Router (the mother of the above projects).
[3] VMware: a software virtualization product used to run e.g. a GNU/Linux server on a notebook computer on top of Windows.

2 The best way to proceed

2.1 Recommendation of the Lecturer
The lecturer expects the students to prepare themselves according to the schedule of the lecture. The student should read the relevant chapters of the lecture note in advance, listen to the lecture and raise hands for questions if something is not as clear as it should be.

2.2 Schedule and Timing

The current schedule of the lesson can be found on the intranet web site. Do not underestimate the complexity of SIP, and start asking questions early! Ideally, as already mentioned, the theory should be accompanied by practical experience. Ask the lecturer for advice if required.

3 Event State Publication

The event notification framework [4] has already been explained in the first lecture note. It is based on SUBSCRIBE and NOTIFY requests. It enables a watcher to subscribe to a specific event and a notifier to send spontaneous notifications about state changes. This model can run into scalability problems when there are many resources and watchers. The principal problem is shown in Figure 1. There are six users, some of whom subscribe to the presence state of some other users. The scalability problem becomes obvious when we imagine a group of 20 or more users who are all interested in each other's presence state.







Figure 1: Mutual subscription to presence state information

To overcome the scalability issue, a framework for publishing event states [5] has been defined. Without this extension a resource has to send all NOTIFY requests itself and will probably run into performance problems when the group of watchers becomes large.

The event publishing framework enables an event publishing agent (EPA) to publish its state change to an event state compositor (ESC), which aggregates the state from the various EPAs of a resource. The aggregated state is offered to a state agent, which acts on behalf of the resource, processes SUBSCRIBE requests from watchers and sends NOTIFY requests in response. The resources never receive any of the SUBSCRIBE requests. The task of sending NOTIFY requests is thus delegated to a state agent, which is implemented on a powerful server.

A further advantage of the publishing mechanism is that the state agent may correlate and combine the state information of a distributed resource into a single NOTIFY request, which reduces network traffic.

[4] RFC 3265: SIP-Specific Event Notification
[5] RFC 3903: SIP Extension for Event State Publication
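The load argument of this chapter can be made concrete with a small calculation. The following Python sketch is only an illustration: it assumes the worst case in which every user of a group watches every other user, and the function names are invented for this example.

```python
def full_mesh_subscriptions(users: int) -> int:
    """Number of subscriptions when every user watches every other user."""
    return users * (users - 1)


def requests_sent_by_resource(watchers: int, uses_state_agent: bool) -> int:
    """Requests a resource sends for ONE state change.

    Without a state agent it sends one NOTIFY per watcher; with a state
    agent it sends a single PUBLISH and the server sends the NOTIFYs.
    """
    return 1 if uses_state_agent else watchers


for group in (6, 20):
    watchers = group - 1  # everybody else in the group (worst-case assumption)
    print(group, "users:",
          full_mesh_subscriptions(group), "subscriptions,",
          requests_sent_by_resource(watchers, False), "NOTIFYs per change without,",
          requests_sent_by_resource(watchers, True), "PUBLISH with a state agent")
```

The total number of subscriptions is not reduced by the publishing framework; what changes is that the resource itself sends one request per state change instead of one per watcher.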

3.1 The state publication model

The state publication model described above is shown in Figure 2.

[Figure 2 diagram: Watcher 1 and Watcher 2 (SUBSCRIBE, NOTIFY), Event State Compositor and State Agent, Event Publishing Agents 1 to 3]

Figure 2: State publication model

In this example the event state compositor receives state information about a distributed resource, aggregates this information and offers it to a state agent. The state agent acts on behalf of the resource, receives the SUBSCRIBE requests of watchers and sends NOTIFY requests when the (composite) state changes.

As a practical example we may apply this concept to the well-known presence state: a resource (the presence state of a person [6]) may use three user agents which publish presence state (e.g. notebook, PDA and phone). Each of the user agents publishes its current state (on-line/off-line) to the event state compositor using a PUBLISH request.

The event state compositor aggregates the presence state information and offers the composite state to the state agent. When a watcher requests presence state information for a person, it sends a SUBSCRIBE request, which is forwarded to the state agent. The watcher then receives the composite state information in one NOTIFY request sent by the state agent.

[6] For the presence entity of a person the term presentity has also been defined.

The state publication model may be applied to any event for which an event package has been defined. A further example is the message waiting event. The state resources in this case are the different mailboxes (e-mail, voice-mail, etc.) of a user. These resources send PUBLISH requests whenever, for example, a new message arrives. The event state compositor aggregates the state information and the state agent sends the composite state of all mailboxes in a NOTIFY request.

Figure 3 shows an example message flow of combined SUBSCRIBE, PUBLISH and NOTIFY operations. Details of the message flow are explained in the next chapter.
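The aggregation step can be illustrated with a few lines of Python. This is a minimal sketch and not a real compositor: the class name, the EPA names and the composition policy (the presentity counts as on-line as soon as one EPA reports on-line) are assumptions made for this example.

```python
from dataclasses import dataclass, field
from typing import Dict


@dataclass
class EventStateCompositor:
    # Maps the name of a publishing agent to its last published state.
    published: Dict[str, str] = field(default_factory=dict)

    def publish(self, epa: str, state: str) -> str:
        """Store the state of one EPA and return the new composite state."""
        self.published[epa] = state
        return self.composite()

    def composite(self) -> str:
        """Assumed policy: on-line if at least one EPA reports on-line."""
        return "on-line" if "on-line" in self.published.values() else "off-line"


esc = EventStateCompositor()
esc.publish("notebook", "off-line")
esc.publish("pda", "off-line")
print(esc.publish("phone", "on-line"))   # composite state after three publications
print(esc.publish("phone", "off-line"))  # composite state after the phone goes off-line
```

A state agent would send a NOTIFY request to the watchers only when the value returned by composite() changes.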

[Figure 3 diagram: message flow between the Event Publishing Agent (EPA) and the Event State Compositor and State Agent, in three phases: initial publication, state refresh, state modification]

Figure 3: Combined PUBLISH, SUBSCRIBE and NOTIFY message flow

3.2 Protocol overview

RFC 3903 defines a new SIP method, PUBLISH, for publishing event state. A PUBLISH request is comparable to a REGISTER request: it allows a user to create, modify and remove state in a State Agent (comparable to a registrar). The State Agent manages the state on behalf of the user.

The user may have multiple User Agents or endpoints that publish event state. Each endpoint may publish its own unique state, out of which the event state compositor generates the composite event state of the resource. In addition to a particular resource, all published event state is associated with a specific event package. With a subscription to that event package, a watcher is able to discover the composite event state of all of the active publications.

The User Agent Client (UAC) that publishes event state is called an Event Publication Agent (EPA). The entity that processes the PUBLISH request is called an Event State Compositor (ESC).

An interesting question is how the Request-URI (R-URI) of PUBLISH and SUBSCRIBE requests is populated. The answer is: the R-URI of PUBLISH and SUBSCRIBE requests is set to the AoR of the resource. For an INVITE request the inbound proxy queries the location database and forwards the request to the actual location of a user agent. SUBSCRIBE and PUBLISH requests, in contrast, have to be routed to the State Agent if a State Agent is used. This requires special handling of SUBSCRIBE and PUBLISH requests at an inbound proxy server.

PUBLISH requests create soft state information in the ESC. This event soft state has a defined lifetime and expires after a negotiated amount of time, so the publication has to be refreshed by subsequent PUBLISH requests. There may also be hard state information provisioned for each resource for a particular event package. The hard state represents state information that is present at all times and does not expire. The ESC may use event hard state in the absence of, or in addition to, soft state information provided through the PUBLISH mechanism.

The body of a PUBLISH request carries the published event state. In response to every successful PUBLISH request, the ESC assigns an identifier to the publication in the form of an entity-tag (ETag). The entity-tag is used to keep state information in sync between the resource (EPA) and the state agent. The EPA includes this entity-tag in a SIP-If-Match header field in any subsequent PUBLISH request that modifies, refreshes or removes the event state of that publication. When the event state expires or is explicitly removed, the entity-tag associated with it becomes invalid.

An Expires header field in a PUBLISH request designates the lifetime of a published soft state. The ESC may accept or shorten that interval but it will never increase it. If the lifetime interval of a published state is too short for an ESC, it rejects the PUBLISH request with a 423 (Interval Too Brief) response containing a Min-Expires value which the EPA has to follow. This is the same mechanism as used by the registrar with the Expires header field value in REGISTER requests.

For the PUBLISH request two new header fields are defined:

SIP-ETag: This header field is generated by the ESC and contains an entity-tag. Whenever an ESC receives a PUBLISH request it marks its current state with a SIP-ETag value and returns this value in a 200 (OK) response. The EPA then uses the entity-tag value in refreshes and modifications, which distinguishes them from an initial state publication.

SIP-If-Match: The EPA re-uses the latest SIP-ETag value received from the ESC and repeats that value in a new PUBLISH request. The first (initial) PUBLISH request of an EPA does not contain a SIP-If-Match header field.

The different publication operations are distinguished by the presence of the SIP-If-Match header field, the presence of a message body and the value of the Expires header field, according to the table below.

Operation   Message body   SIP-If-Match header field   Expires value
Initial     yes            no                          > 0
Refresh     no             yes                         > 0
Modify      yes            yes                         > 0
Remove      no             yes                         0

In case the entity tag in the SIP-If-Match header field in a PUBLISH request does not contain the expected value the ESC will reject the request with a failure response 412 (Conditional Request Failed). This is a new failure response code defined by RFC 3903.
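The rules of this chapter (the operation table, the 412 response for a stale entity-tag and the 423 response for a too short lifetime) can be summarised in a small Python sketch. It is a toy model under stated assumptions, not an implementation of RFC 3903: the class MiniEsc, the limits MIN_EXPIRES and MAX_EXPIRES, the generated tag format, the 400 answer for combinations outside the table and the order of the checks are all chosen for illustration only.

```python
import itertools
from typing import Dict, Optional, Tuple

# (has body, has SIP-If-Match, Expires > 0) -> operation, as in the table above
OPERATIONS = {
    (True, False, True): "initial",
    (False, True, True): "refresh",
    (True, True, True): "modify",
    (False, True, False): "remove",
}


def classify_publish(has_body: bool, has_if_match: bool, expires: int) -> str:
    return OPERATIONS.get((has_body, has_if_match, expires > 0), "invalid")


class MiniEsc:
    MIN_EXPIRES = 60    # assumption: shortest lifetime this ESC accepts
    MAX_EXPIRES = 1800  # assumption: longest lifetime this ESC grants

    def __init__(self) -> None:
        self._counter = itertools.count(1)
        self.state: Dict[str, Optional[str]] = {}  # entity-tag -> published body

    def handle(self, body: Optional[str], if_match: Optional[str],
               expires: int) -> Tuple[int, Optional[str], int]:
        """Return (status code, new SIP-ETag or None, Expires or Min-Expires)."""
        op = classify_publish(body is not None, if_match is not None, expires)
        if op == "invalid":
            return 400, None, 0                 # assumption: generic rejection
        if op != "initial" and if_match not in self.state:
            return 412, None, 0                 # Conditional Request Failed
        if op != "remove" and expires < self.MIN_EXPIRES:
            return 423, None, self.MIN_EXPIRES  # Interval Too Brief
        old = self.state.pop(if_match, None) if if_match else None
        if op == "remove":
            return 200, None, 0
        new_tag = f"etag{next(self._counter)}"  # every success gets a new tag
        self.state[new_tag] = body if body is not None else old
        return 200, new_tag, min(expires, self.MAX_EXPIRES)


esc = MiniEsc()
print(esc.handle("<presence/>", None, 3600))  # initial publication
print(esc.handle(None, "etag1", 1800))        # refresh with the received tag
print(esc.handle("<presence/>", "etag1", 1800))  # stale tag is rejected
```

The third call is rejected because the refresh has already replaced the first entity-tag, which is exactly the situation the 412 response is meant for.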

3.3 Publish framework applied for presence

The first application for the PUBLISH framework was the presence event. In this case a specific terminology is used:

The Event Publishing Agent (EPA) is called the Presence User Agent (PUA).
The Event State Compositor is called the Presence Compositor.
The State Agent is called the Presence Agent (PA).

More details on the presence event package can be found in chapter 4.1. Figure 4 shows the content of an initial PUBLISH request and the 200 (OK) response for a presence event. The actual state information for presence within the message body is not shown in this figure. It is an XML-formatted Presence Information Data Format (PIDF) document.


    PUBLISH SIP/2.0
    Via: SIP/2.0/UDP;branch=z9hG4bK652hsge
    To: <>
    From: <>;tag=1234wxyz
    Call-ID:
    CSeq: 1 PUBLISH
    Max-Forwards: 70
    Expires: 3600
    Event: presence
    Content-Type: application/pidf+xml
    Content-Length: ...

    [Published PIDF document]

    SIP/2.0 200 OK
    Via: SIP/2.0/UDP;branch=z9hG4bK652hsge;received=
    To: <>;tag=1a2b3c4d
    From: <>;tag=1234wxyz
    Call-ID:
    CSeq: 1 PUBLISH
    SIP-ETag: dx200xyz
    Expires: 1800

Figure 4: PUBLISH initial state publication

The initial PUBLISH request does not include a SIP-If-Match header field, but the 200 (OK) response contains a SIP-ETag header field as expected. The example also shows that the presence server has reduced the Expires header field value in the 200 (OK) response from 3600 to 1800 (seconds).

The next figure (Figure 5) shows a state refresh cycle, which takes place when the presence user agent determines that the previously published state of Figure 4 is about to expire. The PUBLISH request now uses the value of the previously received SIP-ETag in the SIP-If-Match header field. It does not contain a message body because the state did not change. In the 200 (OK) response the presence server inserts a new SIP-ETag value. As no state change has occurred, the presence server in this case does not send any NOTIFY requests (refer also to Figure 3).


    PUBLISH SIP/2.0
    Via: SIP/2.0/UDP;branch=z9hG4bK771ash02
    To: <>
    From: <>;tag=1234kljk
    Call-ID:
    CSeq: 1 PUBLISH
    Max-Forwards: 70
    SIP-If-Match: dx200xyz
    Expires: 1800
    Event: presence
    Content-Length: 0

    SIP/2.0 200 OK
    Via: SIP/2.0/UDP;branch=z9hG4bK771ash02;received=
    To: <>;tag=2affde434
    From: <>;tag=1234kljk
    Call-ID:
    CSeq: 1 PUBLISH
    SIP-ETag: kwj449x
    Expires: 1800

Figure 5: PUBLISH state refresh

Figure 6 shows the situation of a state change of a presence user agent. When the PUA detects a change of state it sends a PUBLISH request with the updated state information in the message body. The SIP-If-Match header field again refers to the last received entity-tag value.


    PUBLISH SIP/2.0
    Via: SIP/2.0/UDP;branch=z9hG4bKcdad2
    To: <>
    From: <>;tag=54321mm
    Call-ID:
    CSeq: 1 PUBLISH
    Max-Forwards: 70
    SIP-If-Match: kwj449x
    Expires: 1800
    Event: presence
    Content-Type: application/pidf+xml
    Content-Length: ...

    [Published PIDF Document]

    SIP/2.0 200 OK
    Via: SIP/2.0/UDP;branch=z9hG4bKcdad2;received=
    To: <>;tag=effe22aa
    From: <>;tag=54321mm
    Call-ID:
    CSeq: 1 PUBLISH
    SIP-ETag: qwi982ks
    Expires: 1800

Figure 6: PUBLISH state change


4 Event Packages
The event notification framework of RFC 3265 is only the framework for implementing event handling. Every specific event has to be specified in a separate RFC as a so-called event package. An event package is a specific instantiation of the event notification framework: it defines the specific event and its characteristics, such as the name of the event, the message bodies of NOTIFY and SUBSCRIBE, state information etc. [7]

This chapter introduces some frequently used event packages. A current list of the event packages already specified (with references to their RFCs) can be found at IANA, the Internet Assigned Numbers Authority. Currently 20 event packages are specified and some others are under discussion in various IETF working groups.

4.1 Presence Event Package

4.1.1 Package definition
RFC 3856 defines the following entities:

Presence User Agent (PUA): This is the source of presence state information of a presentity (it may carry only a part of the whole presence state information).

Presence Agent (PA): This is the network element that receives SUBSCRIBE requests and sends NOTIFY requests. It has presence state information of a presentity, which it has gathered by whatever mechanism (e.g. PUBLISH requests, access to the location server database, etc.).

Presence Server: This is a server that usually acts as a PA, but it may also act as a proxy server and forward SUBSCRIBE requests for those presentities for which it is not responsible.

Edge Presence Server: If there is no centralised presence server, a PUA may also act as a presence server. This usually means a less scalable solution.

PRES URI and SIP URI
A presentity may be addressed by a pres URI, which has been defined by IMPP as a protocol-neutral addressing scheme. If SIP is the underlying presence protocol, a SIP URI is used.

Authorisation
When a watcher subscribes to the presence state of a presentity, the presence agent (PA) usually authenticates and authorises the watcher, because presence information is considered sensitive. The authorisation may be based on policy rules or on an explicit authorisation by the owner of the presentity. Policy rules may be provisioned in the form of an XML-based document on an XML document server (XDMS). Centralised XML documents can be provisioned with the XCAP [9] protocol.

The authorisation rules may define different levels of authorisation, so that not every watcher gets the same amount of information. Geographic location information, for example, can be part of the presence state but will perhaps not be offered to everybody.

If an authorisation cannot be resolved immediately via policy rules, the SUBSCRIBE request is answered with a 202 (Accepted) response and the Subscription-State header field is set to pending. The NOTIFY request in this case (which must always be sent when a SUBSCRIBE has been received) will only contain neutral or dummy state information. The owner of the presentity may be notified of the new authorisation request by subscribing to a watcher information event on its own presence. The watcher information event is a kind of meta event which can be applied to every event package. Further details on the watcher information event can be found in chapter 4.2.

Message Bodies
The SUBSCRIBE request may contain a message body describing filter information. A filter may reduce the amount of state information to a specific aspect in which the watcher is interested, e.g. the possibility to send instant messages. The NOTIFY request contains the presence state information, whose format may have different levels of detail. It may be the simple PIDF-based information or it may be enriched by various extensions (see next chapter). In any case the format of the NOTIFY message body must be one the watcher is able to understand (Accept header field of the SUBSCRIBE request).

[7] See Lecture Note 1, chapter 15.2.
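The authorisation step described in this chapter can be sketched as a small decision function. The sketch below is an illustration only: the policy dictionary, its "allow" and "block" values and the example URIs are hypothetical, and the 200 and 403 branches are assumptions added for completeness, since the text above only describes the 202 (Accepted) case.

```python
from typing import Dict, Optional, Tuple


def answer_subscribe(watcher: str,
                     policy: Dict[str, str]) -> Tuple[int, Optional[str]]:
    """Return (response code, Subscription-State) for a presence SUBSCRIBE."""
    decision = policy.get(watcher)
    if decision == "allow":
        return 200, "active"   # assumption: authorised immediately by policy
    if decision == "block":
        return 403, None       # assumption: rejected, no subscription created
    # No rule found: the owner of the presentity has to decide first.
    return 202, "pending"


def notify_body(subscription_state: str, presence_document: str) -> str:
    """A pending subscription only gets neutral (dummy) state information."""
    return presence_document if subscription_state == "active" else "[neutral state]"


policy = {"sip:friend@example.com": "allow", "sip:spammer@example.com": "block"}
code, state = answer_subscribe("sip:stranger@example.com", policy)
print(code, state, notify_body(state, "[PIDF document]"))
```

In a real presence agent the policy would come from the XML document on the XDMS rather than from a dictionary.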

4.1.2 Presence information

The structure of a presence document is based on the PIDF data model defined in RFC 3863 [10], which is compliant with the Common Profile for Presence (CPP, RFC 3859 [11]). Both documents define the protocol-neutral presence data format.

PIDF data format
The PIDF-based presence information is very limited in describing presence (this was the price for interoperability). It is denoted as application/pidf+xml in the Accept and Content-Type header fields of SIP messages (e.g. SUBSCRIBE and NOTIFY requests). Figure 7 shows a simple PIDF example and Figure 8 shows the structure behind PIDF.

[9] RFC 4825: The XML Configuration Access Protocol (XCAP)
[10] RFC 3863: Presence Information Data Format (PIDF)
[11] RFC 3859: Common Profile for Presence (CPP)


    <?xml version="1.0" encoding="UTF-8"?>
    <presence xmlns="urn:ietf:params:xml:ns:pidf" entity="">
      <tuple id="sg89ae">
        <status>
          <basic>open</basic>
        </status>
        <contact priority="0.8">tel:+09012345678</contact>
      </tuple>
    </presence>

Figure 7: Simple presence data in PIDF structure


<presence>: contains an entity attribute with the name of the presentity and the namespace declaration.
<tuple>: tuples provide a way of segmenting presence information; each <tuple> element must contain an id attribute.
<status> / <basic>: the optional <basic> element contains either open or closed, expressing the ability to receive instant messages.
<contact>: optional; contains a communications address and may contain a priority attribute.
<note>: optional; may contain a human-readable comment.
<timestamp>: optional; contains the date and time of the status change of this tuple.
<extension>: extension elements may be added; the Presence Data Model (RFC 4479) uses two such extension elements for the <person> and <device> components.

Figure 8: Structure of PIDF (Presence Information Data Format)


Data model for presence

The PIDF data format for presence has been used by the SIMPLE group as the basis of a presence data model [12]. This data model for presence offers the possibility to map real-world communications systems, built around SIP in particular, into a presence document. There are three components assigned to a presentity in the data model: the person, the service and the device. Each attribute in a presence document is affiliated to the service, person or device because it describes a facet of that service, person or device. Figure 9 shows that model and possible relationships between the components.

The person component models information about the presentity under consideration. A person may represent a group such as a help desk. Examples of presence attributes related to a person are her/his activity, her/his willingness to communicate, her/his picture. The model supports only one person component per presentity.

The service data components model the forms of communications for interacting with the presentity. Examples of services through which a presentity may communicate are sessions (audio, video), Instant Messaging, E-mail etc.

The device data components model the physical equipment in which services execute: for instance a PC, a PDA, or a mobile phone. A given service may execute in more than one device, therefore the mapping of services to devices is many to many. Devices are uniquely identified with a device ID.









Figure 9: Presence Data Model of SIMPLE


[12] RFC 4479: A Data Model for Presence


The presence data model of Figure 9 has now been mapped to the PIDF data format of Figure 8. The solution was to use the existing <tuple> element to represent the service and to add the <person> and <device> elements as extension elements.

Extension to PIDF: RPID

PIDF does not define presence attributes beyond the <basic> status element. RFC 4480 [13] therefore defines Rich Presence Extensions to PIDF. These are additional presence attributes that extend the PIDF <tuple> element and the <device> and <person> elements defined in the data model. The extensions have been chosen to provide features common in existing presence systems, in addition to elements that could readily be derived automatically from existing sources of presence, such as calendaring systems or communication devices, or sources describing the user's current physical environment.

Table 1 shows which component of the data model for presence can be enriched by the elements defined in RPID. It also indicates whether from/until attributes are applicable as well as whether a <note> element can be included in the element. Elements that do not have from/until parameters must not appear more than once in each <person>, <tuple>, or <device>. The additional data elements defined by RPID are briefly explained below. This should give an impression of the detailed presence information that may be offered.

from/until attributes <activities> <class> <deviceID> <mood> <place-is> <place-type> <privacy> <relationship> <service-class> <sphere> <status-icon> <time-offset> <user-input> x x x x x x x x

<note> x

<person> x x

<tuple> service x x


x x x x x x

x x x x x x x x x x x x x x

Table 1: Mapping RPID elements to data model components


[13] RFC 4480: RPID: Rich Presence Extensions to PIDF


The <activities> element describes what the person is currently doing. A person can be engaged in multiple activities at the same time, e.g., traveling and having a meal. This information enables a watcher to evaluate how appropriate a communication attempt is and which way of communicating is better. Here are some examples of activities: away, appointment, meeting, meal, breakfast, lunch, dinner, busy, holiday, in-transit, looking-for-work (for paid work), sleeping, travel... Most of them can be derived from calendar information.

The <class> element describes the class of the service, device, or person. Multiple elements can have the same class name within a presence document, but each person, service, or device can only have one class label. The naming of classes is left to the presentity.

The <deviceID> element represents a way to map a service component to a device component. One service can be provided by multiple devices, so that each service tuple may contain zero or more <deviceID> elements.

The <mood> element describes the mood of the person. For example: confused, amazed.

The <place-is> element describes properties of the place the person is currently at. This offers the watcher an indication of what kind of communication is likely to be successful. Each major media type has its own set of attributes:
- audio (noisy, ok, quiet, unknown)
- video (toobright, ok, dark, unknown)
- text (uncomfortable, inappropriate, ok, unknown)

The <place-type> element describes the type of place the person is currently at. This offers the watcher an indication of what kind of communication is likely to be appropriate. The initial set of values is defined in RFC 4589 [14].

The <privacy> element indicates which types of communication third parties in the vicinity of the presentity are unlikely to be able to intercept accidentally or intentionally.

The <relationship> element extends <tuple> and designates the type of relationship an alternate contact has with the presentity. This element is provided only if the tuple refers to somebody other than the presentity. Relationship values include "family", "friend", "associate" (e.g., for a colleague), "assistant", "supervisor", "self", and "unknown". The default is "self".

The <service-class> element extends <tuple> and designates the type of service offered: electronic, postal, courier, freight, in-person...

The <sphere> element designates the current state and role that the person plays. For example, it might describe whether the person is in a work mode, at home, or participating in activities related to some other organization such as the IETF or a church. RFC 4480 does not define names for these spheres except for two common ones, "work" and "home", as well as "unknown".

[14] RFC 4589: Location Types Registry


Spheres allow the person to easily turn on or off certain rules that depend on what groups of people should be made aware of the person's status.

The <status-icon> element includes a URI pointing to an image (icon) representing the current status of the person or service. The watcher may use this information to represent the status in a graphical user interface.

The <time-offset> element describes the number of minutes of offset from UTC at the person's current location. A positive number indicates that the local time-of-day is ahead (i.e., east of) Universal Time, while a negative number indicates that the local time-of-day is behind (i.e., west of) Universal Time.

The <user-input> element records the user-input or usage state of the service or device, based on human user input, e.g., keyboard, pointing device, or voice.

Further extensions to PIDF

The following extensions to PIDF are mentioned only briefly. The interested student may look into the referenced documents.

Timed Presence: The indication of status information for time intervals, either in the past or in the future, can be achieved via the <timed-status> element, defined in RFC 4481 [15] as a child of the <tuple> element.

Contact Information: RFC 4482 [16] describes elements for providing a "business card", including references to the homepage, map, representative sound, display name, and an icon.

Geographic Location: RFC 4119 [17] describes an object format for carrying geographical information. It extends the 'status' element of PIDF with a complex element called 'geopriv'.

SIP User Agent Capabilities: The SIP User Agent Capabilities defined in RFC 3840 (see also chapter 15.1 on page 101) can be added to RPID.

[15] RFC 4481: Timed Presence Extensions to PIDF
[16] RFC 4482: CIPID: Contact Information in PIDF
[17] RFC 4119: A Presence-based GEOPRIV location object format

4.2 Watcher Information Event Template-Package

The Watcher Information Event Template-Package [18] is a special meta-package which can be applied to any event package. It is a regular SIP event package, but it is always associated with some other event package. The Watcher Information Event Template-Package is denoted by the token "winfo", which is appended to the name of the event to which it is applied.

For any event package, such as presence, there exists a set (perhaps an empty set) of subscriptions that have been created or are requested by users trying to get the state of a resource in that package. This set of subscriptions changes over time as new subscriptions are requested by users, old subscriptions expire, and subscriptions are approved or rejected by the owners of that resource. The set of users subscribed to a particular resource for a specific event package, and the state of their subscriptions, is referred to as watcher information. Since this state is itself dynamic, it is reasonable to subscribe to it in order to learn about changes to it. The watcher information event template-package is meant to facilitate exactly that: tracking the state of subscriptions to a resource in another package.

The most prominent example usage of this event package is the presence event. When the presence function is handled by a centralized presence agent, the presentity no longer notices when a new watcher attempts to subscribe to its presence state. The watcher information event package enables the presentity to get notifications when the number or state of watchers changes. The event package in this case is named "presence.winfo".
Presence Agent
SUBSCRIBE (presence.winfo)
NOTIFY (presence.winfo)







Figure 10: Application of Watcher Info event package to the presence event


[18] RFC 3857: A Watcher Information Event Template-Package for SIP


The application of the watcher info event package to the presence event is illustrated in Figure 10. An example SUBSCRIBE and NOTIFY request for the presence.winfo package is shown in Figure 11. It shows the SUBSCRIBE request of the presentity B to its own watcher information event state and the NOTIFY request it receives when A subscribes to B's presence. In this case the presence subscription of A requires authorisation (status pending).

SUBSCRIBE SIP/2.0
Via: SIP/2.0/UDP;branch=z9hG4bKnashds7
From:;tag=123s8a
To:
Call-ID:
Max-Forwards: 70
CSeq: 9887 SUBSCRIBE
Contact:
Event: presence.winfo

NOTIFY SIP/2.0
Via: SIP/2.0/UDP;branch=z9hG4bKna66g
From:;tag=xyz887
To:;tag=123s8a
Call-ID:
Max-Forwards: 70
CSeq: 1288 NOTIFY
Contact:
Event: presence.winfo
Content-Type: application/watcherinfo+xml
Content-Length: ...

<?xml version="1.0"?>
<watcherinfo xmlns="urn:ietf:params:xml:ns:watcherinfo" version="0" state="full">
  <watcher-list resource="" package="presence">
    <watcher id="7768a77s" event="subscribe" status="pending"></watcher>
  </watcher-list>
</watcherinfo>

Figure 11: SUBSCRIBE and NOTIFY on presence.winfo event package

The message body of the NOTIFY request contains a watcher information document. This document describes some or all of the watchers for a resource within a given package, and the state of their subscriptions. The format of the document is named application/watcherinfo+xml and is defined in RFC 3858 [19].
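The following Python sketch illustrates how a presentity could pick the watchers still waiting for authorisation out of such a watcher information document. The function name pending_watchers and the URIs sip:a@example.com and sip:b@example.com are hypothetical; Figure 11 does not show the actual addresses.

```python
import xml.etree.ElementTree as ET

# Namespace of the watcher information document as used in Figure 11.
WINFO_NS = {"wi": "urn:ietf:params:xml:ns:watcherinfo"}

# Resource and watcher URIs are hypothetical placeholders.
SAMPLE_WINFO = """<?xml version="1.0"?>
<watcherinfo xmlns="urn:ietf:params:xml:ns:watcherinfo" version="0" state="full">
  <watcher-list resource="sip:b@example.com" package="presence">
    <watcher id="7768a77s" event="subscribe" status="pending">sip:a@example.com</watcher>
  </watcher-list>
</watcherinfo>"""


def pending_watchers(winfo_text):
    """Return (resource, watcher URI) pairs whose subscription is pending."""
    root = ET.fromstring(winfo_text.encode("utf-8"))
    pending = []
    for watcher_list in root.findall("wi:watcher-list", WINFO_NS):
        for watcher in watcher_list.findall("wi:watcher", WINFO_NS):
            if watcher.get("status") == "pending":
                uri = (watcher.text or "").strip()
                pending.append((watcher_list.get("resource"), uri))
    return pending


if __name__ == "__main__":
    for resource, uri in pending_watchers(SAMPLE_WINFO):
        print(uri, "asks to watch", resource)
```

A user agent could present each returned pair to the user as an authorisation request.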


[19] RFC 3858: An XML Based Format for Watcher Information


4.3 An INVITE-Initiated Dialog Event Package for SIP

There are situations before session setup or during a session where knowledge about the dialog state of session partners may enable advanced applications. Examples of such applications are:

- Automatic call-back: This feature is already used in the PSTN. When user A calls user B but user B is busy, user A would like to get a call-back when user B hangs up. When B hangs up, user A's phone rings. When A picks up, A hears ringing while being connected to B. To implement this with SIP, a mechanism is required for A to receive a notification when the dialogs at B are complete.
- Presence enabled conferencing: In this application, user A wishes to set up a conference call with users B and C. Rather than being scheduled, the call is created automatically when A, B and C are all available. To do this, the server providing the application would like to know whether A, B, and C are "online", not idle, and not in a phone call. If the server subscribes to the dialog state of those users it will receive notifications as these states change.
- Shared line services: These are services where a group of user agents share some common identities and each user agent requires knowledge about the state of the others.

These applications can be supported by the INVITE-initiated dialog event package [20]. This event package enables a user to subscribe to the dialog state of another user agent. Whenever the state of the monitored user changes a NOTIFY request is sent.

There are of course some privacy concerns regarding this event package. Not everybody should be able to see all details of the activities of a user agent. Therefore several options are defined with the event package. Provided that the user is authorised at all to get any dialog state information, the minimum information offered is the actual dialog state only (e.g. idle/terminated, trying, proceeding, confirmed). In contrast, the maximum amount of state information may include some detailed header fields and even SDP data about the sessions of the monitored user agent.

The event is named dialog. The dialog state information is carried in the message body of the NOTIFY requests, formatted as an XML document of type "application/dialog-info+xml".


[20] RFC 4235: An INVITE-Initiated Dialog Event Package for SIP


Figure 12 shows an example of minimum dialog state information carried in a dialog state XML document.

<?xml version="1.0"?>
<dialog-info xmlns="urn:ietf:params:xml:ns:dialog-info" version="0" state="full" entity="">
  <dialog id="as7d900as8">
    <state>confirmed</state>
  </dialog>
</dialog-info>

Figure 12: Minimum dialog state information XML document

The state information in Figure 12 shows the actual state of a dialog: confirmed. This means that the user agent sending this information has received 200 (OK) and is engaged in a session. When the session is finished the state information will change to terminated and again a NOTIFY request will be sent. RFC 4235 defines a dialog state machine which specifies when a certain state transition happens.

The XML document also contains a version, a state and an entity attribute. The version contains a number which is incremented with every NOTIFY request, the state attribute describes the information as either full or partial, and the entity contains the URI that identifies the user whose dialog information is provided.

The next example shows more detailed dialog state information. Figure 13 shows an INVITE request sent by a UAC which is monitored by an INVITE-Initiated Dialog Event. This INVITE request evokes a NOTIFY request with the XML document shown in Figure 14.

INVITE SIP/2.0
Via: SIP/2.0/UDP;branch=z9hG4bKnashds8
Max-Forwards: 70
To: Bob <>
From: Alice <>;tag=1928301774
Call-ID: a84b4c76e66710
CSeq: 314159 INVITE
Contact: <>
Content-Type: application/sdp
Content-Length: 142

[SDP not shown]

Figure 13: Example INVITE request


<?xml version="1.0"?>
<dialog-info xmlns="urn:ietf:params:xml:ns:dialog-info" version="0" state="full" entity="">
  <dialog id="as7d900as8" call-id="a84b4c76e66710" local-tag="1928301774" direction="initiator">
    <state>trying</state>
  </dialog>
</dialog-info>

Figure 14: Corresponding XML document in a NOTIFY request

The XML document in Figure 14 also includes details of the dialog ID (call-id, local-tag). As the dialog setup proceeds, additional NOTIFY requests are sent with the state values early and confirmed, and the remote-tag attribute is included as well. The event package definition also allows sending partial state information where only the changed parts are included.

The application of the INVITE initiated dialog event to implement an automatic call-back service is shown in Figure 15.


User Agent A (Caller), User Agent B (Callee)

User Agent A invites User Agent B. User Agent B is already engaged in a session and responds with 486 (Busy).
INVITE
486 Busy
ACK

User Agent A activates automatic call-back by subscribing to the dialog event. The XML document delivered immediately in the NOTIFY message body may look like:

<?xml version="1.0"?>
<dialog-info entity="">
  <dialog id="as7d900as8">
    <state event="2xx">confirmed</state>
  </dialog>
</dialog-info>

After some time the session of User Agent B ends and User Agent B sends a NOTIFY request to User Agent A. The NOTIFY request now contains the following XML document:

<?xml version="1.0"?>
<dialog-info entity="">
  <dialog id="as7d900as8">
    <state event="4xx">terminated</state>
  </dialog>
</dialog-info>

When the dialog terminates, the subscription also ends automatically. Now an automatic call-back may be started by User Agent A.
INVITE
180 Ringing

Figure 15: Call-back service based on an INVITE initiated dialog event
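The decision logic of User Agent A in the call-back scenario can be sketched in Python as follows. This is only an illustration under simplifying assumptions: every NOTIFY is treated as carrying full state, the documents use the dialog-info namespace of Figure 12, and the names dialog_states, may_call_back and make_doc as well as the entity URI sip:b@example.com are hypothetical.

```python
import xml.etree.ElementTree as ET

# Namespace of the dialog information document as used in Figure 12.
DLG_NS = {"di": "urn:ietf:params:xml:ns:dialog-info"}


def dialog_states(dialog_info_text):
    """Map each dialog id in a dialog-info document to its <state> value."""
    root = ET.fromstring(dialog_info_text.encode("utf-8"))
    return {
        dialog.get("id"): (dialog.findtext("di:state", default="", namespaces=DLG_NS)).strip()
        for dialog in root.findall("di:dialog", DLG_NS)
    }


def may_call_back(dialog_info_text):
    """True when no dialog of the monitored user agent is still active.

    A document without any <dialog> element is treated as idle.
    """
    return all(state == "terminated" for state in dialog_states(dialog_info_text).values())


def make_doc(state):
    """Build a small full-state document for testing (hypothetical entity URI)."""
    return (
        '<?xml version="1.0"?>'
        '<dialog-info xmlns="urn:ietf:params:xml:ns:dialog-info" '
        'version="0" state="full" entity="sip:b@example.com">'
        '<dialog id="as7d900as8"><state>%s</state></dialog>'
        '</dialog-info>' % state
    )


if __name__ == "__main__":
    print(may_call_back(make_doc("confirmed")))   # B is still in a session
    print(may_call_back(make_doc("terminated")))  # B has hung up
```

A real implementation would also have to merge partial notifications and respect the version attribute, which this sketch leaves out.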

RFC 4235 also defines two new media feature tags (see chapter 15.1 on page 101), which are sometimes used in combination with an INVITE initiated dialog event:

sip.rendering: This feature tag indicates whether the user agent is actually rendering any media stream. It may take the values "yes", "no", and "unknown". The feature tag sip.rendering=no indicates that the user agent currently ignores the media stream received. This is typically used when putting a session partner on hold.

sip.byeless: This feature tag indicates that the user agent is not able to terminate a session on its own. This may be used by an announcement machine continuously playing an announcement.


4.4 Further event packages

The following event packages are described only briefly. The method is always the same: whenever a possible application for a specific event is found, an RFC defining the details of the event package is created. An up-to-date overview of already defined event packages can be found at IANA, the official registration authority for Internet numbers and protocol parameters. Event packages are registered at:

4.4.1 Message Summary and Message Waiting Indication Event

RFC 3842 [21] defines an event to carry message waiting status and message summaries from a messaging system to an interested User Agent. Message Waiting Indication is a common feature of telephone networks. It typically involves an audible or visible indication that messages are waiting, such as playing a special dial tone (which in telephone networks is called message-waiting dial tone), lighting a light or indicator on the phone, displaying icons or text, or some combination.

A User Agent (typically an IP phone or SIP software User Agent) subscribes to the status of its messages. The notifier then notifies the subscriber each time the messages of the messaging account have changed. It sends a message summary in the body of a NOTIFY, encoded as shown below in Figure 16.

NOTIFY SIP/2.0
To: <>;tag=78923
From: <>;tag=4442
Date: Mon, 10 Jul 2000 04:28:53 GMT
Contact: <>
Call-ID:
CSeq: 31 NOTIFY
Event: message-summary
Subscription-State: active
Content-Type: application/simple-message-summary
Content-Length: 503

Messages-Waiting: yes
Message-Account:
Voice-Message: 4/8 (1/2)

Figure 16: Message Summary and Message Waiting Indication Event

[21] RFC 3842: A Message Summary and Message Waiting Indication Event Package for SIP

In this case the Content-Type of the message body is defined as application/simple-message-summary, which is plain text based and not XML format. The numbers in the last line of the message body show the number of new/old messages and, in parentheses, the number of new/old urgent messages. A User Agent may also explicitly fetch the current status by sending a SUBSCRIBE request with the Expires header field set to zero.
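Because the summary body is plain text, it can be read line by line. The Python sketch below parses a body like the one in Figure 16; the function name parse_message_summary, the result layout and the account URI sip:alice@vmail.example.com are assumptions for illustration (Figure 16 does not show the account value), and only the line forms that appear in Figure 16 are handled.

```python
import re

# Matches lines such as "Voice-Message: 4/8 (1/2)"; the urgent part is optional.
SUMMARY_LINE = re.compile(
    r"^(?P<cls>[A-Za-z-]+-Message):\s*(?P<new>\d+)/(?P<old>\d+)"
    r"(?:\s*\((?P<unew>\d+)/(?P<uold>\d+)\))?\s*$",
    re.IGNORECASE,
)

# The account URI is a hypothetical placeholder.
SAMPLE_BODY = (
    "Messages-Waiting: yes\r\n"
    "Message-Account: sip:alice@vmail.example.com\r\n"
    "Voice-Message: 4/8 (1/2)\r\n"
)


def parse_message_summary(body):
    """Parse an application/simple-message-summary body into a dict."""
    result = {"waiting": None, "account": None, "classes": {}}
    for line in body.splitlines():
        line = line.strip()
        if not line:
            continue
        name, _, value = line.partition(":")
        name = name.strip().lower()
        value = value.strip()
        if name == "messages-waiting":
            result["waiting"] = value.lower() == "yes"
        elif name == "message-account":
            result["account"] = value
        else:
            match = SUMMARY_LINE.match(line)
            if match:
                result["classes"][match.group("cls").lower()] = {
                    "new": int(match.group("new")),
                    "old": int(match.group("old")),
                    "urgent_new": int(match.group("unew") or 0),
                    "urgent_old": int(match.group("uold") or 0),
                }
    return result


if __name__ == "__main__":
    print(parse_message_summary(SAMPLE_BODY))
```

A phone could use the waiting flag to switch its indicator lamp and the per-class counters to display text such as "4 new voice messages".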

4.4.2 Event Package for Conference State

RFC 4575 defines an event package for conferencing state [22]. In SIP, conferences are represented by URIs. These URIs identify a SIP user agent called a focus that is responsible for ensuring that all users in the conference can communicate with each other. The conference package allows users to subscribe to a conference URI. Notifications are sent about changes in the membership of this conference and optionally about changes in the state of additional conference components. The message body is defined as an XML document named conference-info+xml. The structure of the XML document contains
- conference description
- conference-state
- users (identity, media information)
- sidebars
- etc.

Due to the huge amount of data, notifications do not normally contain full state; rather, they indicate only the state that has changed.

4.4.3 Event Package for Registrations

A rather unexpected event package has been required by the IMS (IP Multimedia Subsystem [23]) standardisation group. This package fills a gap in the registration procedure. The registration is only a one-way procedure which allows the user to register at the network, but there is no way for a network operator to actively de-register a user. Such a procedure is required in commercial networks, e.g. in case a user advises the operator that his/her mobile terminal was lost or stolen. In this case the network nodes at the border of the network (P-CSCF [24]), which are responsible for offering access to the network, are immediately informed about the user being deregistered. To get this information the P-CSCFs subscribe to the registration event [25].

[22] RFC 4575: Event Package for Conference State
[23] IMS defines the SIP based network architecture for carrier networks
[24] P-CSCF: Proxy Call Session Control Function
[25] RFC 3680: A SIP Event Package for Registrations

4.4.4 Refer event

The REFER request enables a user agent to request the recipient of the REFER request to refer to a resource provided in the request. It also provides an implicit subscription mechanism allowing the party sending the REFER to be notified of the outcome of the referenced request. The NOTIFY sent after receipt of the REFER request informs the referrer about the outcome of the requested action. Further details can be found in chapter 7 on page 39 at the description of the REFER method.

4.4.5 Debug Event

For debugging purposes user agents and network nodes should be able to log and transfer debug messages (SIP trace) on occasion. This is handled by user agents and network nodes subscribing to a debug event. The actual trigger conditions for starting the logging are transferred by a NOTIFY request [26].


[26] SIP Event Package for Debugging: draft-dawes-sipping-debug-event


5 The UPDATE method

The INVITE method is used for the initiation and modification of sessions. However, this method always affects two important pieces of state. It impacts the session (the media streams SIP sets up) and also the dialog (the signaling state that SIP itself defines). While this is reasonable in many cases, there are some important scenarios in which this coupling causes complications.

The primary difficulty arises when parameters of a session need to be modified before the initial INVITE has been answered. An example of this situation is a session setup where resource reservation is used and the successful reservation has to be confirmed to the session partner. The status of resource reservation is transported in special attributes of SDP (see chapter 6 on page 33) and therefore an additional SDP offer/answer has to be used. However, a re-INVITE cannot be used for this purpose, because the first INVITE transaction is still pending.

As a consequence a solution has been defined that allows the caller or callee to provide updated session information before a final response to the initial INVITE request has been generated. The UPDATE method [27] fulfils that need. It can be sent by a UA within a dialog (early or confirmed) to update session parameters without impacting the dialog state itself. Unlike INVITE, the UPDATE method is a simple transaction which is answered immediately. Therefore no additional ACK request is defined for an UPDATE request.

Figure 17 shows an example message flow including two UPDATE transactions during session setup. An UPDATE request can only be sent when a dialog has been established. Remember that the dialog is already established as an early dialog after receiving the first response from the UAS (180 Ringing). But this is an unreliable response, and to be sure that the UAC has received the necessary information for continuing the dialog (Contact header field, tag parameter in To header field) the reliability mechanism for provisional responses (PRACK request) is used here.

After the dialog has been established both sides may send an UPDATE request. Figure 17 shows two UPDATE transactions, one initiated by the caller and the other initiated by the callee. [28] The UPDATE requests contain modified SDP offers and the 200 (OK) responses the answers. In the above example three offer/answer exchanges take place. [29]

[27] RFC 3311: SIP UPDATE Method
[28] This example is probably more hypothetical and should only show that UPDATE requests can be sent from either side. In practical situations usually only one UPDATE transaction is used.
[29] This is realistic in case of mobile networks (IMS), but in that case the second offer/answer exchange is carried in the PRACK request.

User Agent A (Caller), User Agent B (Callee)

INVITE (with SDP offer 1)
180 Ringing (with SDP answer 1)
PRACK
200 OK
UPDATE (with SDP offer 2)
200 OK (with SDP answer 2)
UPDATE (with SDP offer 3)
200 OK (with SDP answer 3)
200 OK (on INVITE)
ACK

Figure 17: UPDATE Call Flow

A user agent should be sure that the peer user agent supports the UPDATE method. Therefore the INVITE request and the 180 (Ringing) response should contain an Allow header field showing support for the UPDATE method. An UPDATE request may also be used during confirmed dialogs (after the INVITE transaction is finished), but in that case a re-INVITE is recommended. A re-INVITE leaves time for the user to approve the change, because an INVITE transaction may remain pending until the final response and ACK, while an UPDATE request has to be answered immediately. The main application of UPDATE requests is Resource Management as explained in the next chapter.
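The check for UPDATE support via the Allow header field can be sketched in a few lines of Python. The function name allowed_methods and the sample response are hypothetical, and the sketch deliberately ignores header line folding and other parsing details of a real SIP stack.

```python
# A shortened, hypothetical 180 (Ringing) response; only the Allow header matters here.
SAMPLE_RESPONSE = (
    "SIP/2.0 180 Ringing\r\n"
    "CSeq: 314159 INVITE\r\n"
    "Allow: INVITE, ACK, CANCEL, BYE, PRACK, UPDATE\r\n"
    "Content-Length: 0\r\n"
    "\r\n"
)


def allowed_methods(message_text):
    """Collect the methods listed in all Allow header fields of a SIP message."""
    methods = set()
    for line in message_text.splitlines():
        if line == "":
            break  # an empty line ends the header section
        name, sep, value = line.partition(":")
        # Header field names are case-insensitive, method names are not.
        if sep and name.strip().lower() == "allow":
            methods.update(m.strip() for m in value.split(",") if m.strip())
    return methods


def peer_supports_update(message_text):
    """True if the peer announced the UPDATE method in an Allow header field."""
    return "UPDATE" in allowed_methods(message_text)


if __name__ == "__main__":
    print(peer_supports_update(SAMPLE_RESPONSE))
```

If the check fails, a user agent would have to fall back to a re-INVITE after the initial INVITE transaction has completed.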


6 Resource Management
Some networks (e.g. mobile networks using SIP) require that at session establishment time, once the callee has been alerted, the chances of a session establishment failure are minimal. One major source of failure, in particular in mobile networks, is the inability to reserve network resources for a session. This could lead to so-called ghost rings, where the callee is alerted but the session cannot be set up successfully due to lack of resources. In order to minimize "ghost rings", it is necessary to reserve network resources for the session before the callee is alerted.

However, the reservation of network resources frequently requires knowledge about the session parameters from the callee. This information is obtained as a result of the initial offer/answer exchange carried in SIP. This exchange normally causes the "phone to ring", thus introducing a chicken-and-egg problem: resources cannot be reserved without performing an initial offer/answer exchange, and the initial offer/answer exchange always causes alerting, which might not be appropriate as long as the necessary resources are not reserved.

The solution to this problem is the concept of preconditions. Preconditions are a set of constraints about the session which are introduced in the offer. The recipient of an offer including preconditions generates an answer, but does not alert the user or otherwise proceed with session establishment until the preconditions are met. The session setup is suspended until an event indicates that the preconditions are met. This can be a local event (such as a confirmation of a resource reservation) or a new offer sent by the caller.

The precondition issue is media stream specific. Therefore the solution is based on extending SDP rather than on extending SIP. The solution is specified in RFC 3312 [30].

Additional remark on updates: The original RFC 3312 based solution is QoS specific. In the meantime two additional applications for preconditions were identified:
- Usage of preconditions to enable mobility solutions [31]
- Usage of preconditions to enable protection of media streams (media security) [32]

[30] RFC 3312: Integration of Resource Management and SIP
[31] RFC 4032: Update to SIP preconditions Framework
[32] RFC 5027: Security preconditions for SDP media streams

6.1 Protocol overview

The basic idea of extending SDP for support of preconditions is to define two state variables that affect the media stream: current status and desired status. The desired status defines a threshold for the current status that must be reached. Session establishment stops until the current status reaches or surpasses this threshold. Once this threshold is reached or surpassed, session establishment resumes. For example, the following values for current and desired status would not allow session establishment to resume:
current status = resources reserved in the send direction
desired status = resources reserved in both (sendrecv) directions

On the other hand, the values of the example below would make session establishment resume:
current status = resources reserved in both (sendrecv) directions
desired status = resources reserved in the send direction
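The threshold comparison of the two examples can be illustrated with a small Python sketch. It interprets "reaches or surpasses" as "the current status covers every direction the desired status asks for"; this interpretation and the names COVERED_DIRECTIONS and precondition_met are assumptions made for illustration.

```python
# Each status value is mapped to the set of directions it covers.
COVERED_DIRECTIONS = {
    "none": set(),
    "send": {"send"},
    "recv": {"recv"},
    "sendrecv": {"send", "recv"},
}


def precondition_met(current, desired):
    """True if the current status reaches or surpasses the desired status."""
    return COVERED_DIRECTIONS[desired] <= COVERED_DIRECTIONS[current]


if __name__ == "__main__":
    # First example: send only, but both directions are desired.
    print(precondition_met("send", "sendrecv"))
    # Second example: both directions reserved, only send desired.
    print(precondition_met("sendrecv", "send"))
```

With this subset test, session establishment stays suspended in the first example and resumes in the second.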

These two state variables are mapped to new attributes for the media stream in SDP and are exchanged with the offer/answer cycle. Thus both session partners have a shared view on the resource situation and they know when they have to stop session setup to wait for a condition to be met. Figure 18 shows a basic session setup using SDP preconditions as it is applied in mobile networks. User Agent A includes quality of service preconditions in the SDP of the initial INVITE. User Agent A does not want User Agent B to be alerted until there are network resources reserved in both directions end-to-end. User Agent B agrees to reserve network resources for this session before alerting the callee. Both user agents will handle resource reservation in their local access segment. This is the segment where in fact resources have to be reserved in a mobile network (radio link).


User Agent A (Caller)

INVITE (with SDP offer 1) 183 Session Progress (with SDP answer 1) PRACK (with offer 2) 200 OK (with answer 2)

User Agent B (Callee)

Figure 18: Basic Session Setup using preconditions User Agent B returns a 183 (Session Progress) response to User Agent A asking A to confirm when resources have been reserved in the local segment of A. In mobile networks it is necessary to agree on a specific codec before resource reservation can start due to different bandwidth requirements of different codecs. In SDP answer 1 of the 183 (Session Progress) response there might be more the one codec at disposal. Now User Agent A decides on the codec to be used (because he/she probably has to pay for the session) und tells its decision also User Agent B in a PRACK request containing offer 2. With sending / receiving the PRACK request both sides start the resource reservation mechanism33. User Agent A finishes resource reservation and informs User Agent B with an UPDATE request. User Agent B has already finished resource reservation in above example and now alerts the user and sends a 180 (Ringing) response. Then the session setup proceeds as usual.

Resource Reservation

Resource Reservation

UPDATE (with SDP offer 3) 200 OK (with SDP answer 3) 180 Ringing


200 OK (on INVITE) ACK


[33] The resource reservation mechanism is independent of SDP signaling and depends on the transport network technology in place. A transport layer mechanism for QoS supported by most routers is RSVP (RFC 2205: Resource Reservation Protocol).

6.2 SDP parameters and attributes

Three new SDP attributes are defined for preconditions:

a=curr:precondition-type status-type direction-tag
a=des:precondition-type strength-tag status-type direction-tag
a=conf:precondition-type status-type direction-tag

The current status attribute curr carries the current status of network resources for a particular media stream.

The desired status attribute des carries the preconditions for a particular media stream. When the direction-tag of the current status attribute with a given precondition-type/status-type for a particular stream is equal to (or better than) the direction-tag of the desired status attribute with the same precondition-type/status-type for that stream, the preconditions are considered to be met for that stream.

The confirmation status attribute conf carries threshold conditions for a media stream. When the status of network resources reaches these conditions, the peer user agent must send an update of the session description containing an updated current status attribute for this particular media stream (a confirmation).

The attributes use the following parameters:
- precondition-type: RFC 3312 defines only one type, qos, for Quality of Service. RFC 5027 additionally defines a precondition type for security, sec.
- status-type: This parameter indicates whether preconditions have to be met end-to-end or only per segment (values are "e2e", "local", "remote").
- strength-tag: This tag indicates whether the callee can be alerted in case the network fails to meet the preconditions (values are "mandatory", "optional", "none", "failure", "unknown").
- direction-tag: This parameter indicates the direction to which a particular attribute applies (values are "none", "send", "recv", "sendrecv").

Coming back to the example session setup with preconditions in Figure 18, the precondition attributes in the SDP parts may look as follows:

Offer 1:
a=curr:qos local none
a=curr:qos remote none
a=des:qos mandatory local sendrecv
a=des:qos none remote sendrecv

This is the initial position of User Agent A. It will take care of QoS on the local segment but cannot do that for the remote segment.

Answer 1:
a=curr:qos local none
a=curr:qos remote none
a=des:qos mandatory local sendrecv
a=des:qos mandatory remote sendrecv
a=conf:qos remote sendrecv

User Agent B will take care of its own local segment but requires a confirmation when resources are reserved at the remote side. Otherwise it will not alert the user.

Offer 2:
a=curr:qos local none
a=curr:qos remote none
a=des:qos mandatory local sendrecv
a=des:qos mandatory remote sendrecv

Offer 2 reflects the qos mandatory condition from the remote side. In addition the list of codecs has been reduced to exactly one (not shown here).

Answer 2:
a=curr:qos local none
a=curr:qos remote none
a=des:qos mandatory local sendrecv
a=des:qos mandatory remote sendrecv
a=conf:qos remote sendrecv

Nothing has changed since Answer 1.

Offer 3:
a=curr:qos local sendrecv
a=curr:qos remote none
a=des:qos mandatory local sendrecv
a=des:qos mandatory remote sendrecv

Now User Agent A confirms QoS readiness in its local segment.

Answer 3:
a=curr:qos local sendrecv
a=curr:qos remote sendrecv
a=des:qos mandatory local sendrecv
a=des:qos mandatory remote sendrecv

User Agent B reflects QoS readiness on both the remote and the local side. The user will be alerted now.
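The rule that preconditions are met once the current status is "equal to or better than" the desired status can be illustrated with a small Python sketch. It is not taken from RFC 3312; the function names are hypothetical, direction tags are modelled as sets so that "better than" becomes a subset test, and only desired statuses with strength mandatory are treated as blocking (a simplifying assumption).

```python
# Direction tags as sets, so "equal to or better than" becomes a subset test.
DIRECTIONS = {
    "none": set(),
    "send": {"send"},
    "recv": {"recv"},
    "sendrecv": {"send", "recv"},
}


def parse_preconditions(sdp_lines):
    """Collect curr and des attributes keyed by (precondition-type, status-type)."""
    curr, des = {}, {}
    for line in sdp_lines:
        line = line.strip()
        if line.startswith("a=curr:"):
            ptype, status, direction = line[len("a=curr:"):].split()
            curr[(ptype, status)] = direction
        elif line.startswith("a=des:"):
            ptype, strength, status, direction = line[len("a=des:"):].split()
            des[(ptype, status)] = (strength, direction)
    return curr, des


def preconditions_met(sdp_lines):
    """True if every mandatory desired status is covered by the current status."""
    curr, des = parse_preconditions(sdp_lines)
    for key, (strength, wanted) in des.items():
        if strength != "mandatory":
            continue  # assumption: only mandatory preconditions block alerting
        current = curr.get(key, "none")
        if not DIRECTIONS[wanted] <= DIRECTIONS[current]:
            return False
    return True


offer_1 = [
    "a=curr:qos local none",
    "a=curr:qos remote none",
    "a=des:qos mandatory local sendrecv",
    "a=des:qos none remote sendrecv",
]
answer_3 = [
    "a=curr:qos local sendrecv",
    "a=curr:qos remote sendrecv",
    "a=des:qos mandatory local sendrecv",
    "a=des:qos mandatory remote sendrecv",
]

print(preconditions_met(offer_1))   # False
print(preconditions_met(answer_3))  # True
```

With the attribute lines of Offer 1 the check fails because the local segment has not reserved anything yet; with those of Answer 3 both segments report sendrecv and the callee may be alerted.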


6.3 Option Tag

Like many SIP extensions, the precondition extension defines an option tag, precondition, which can be used in the Require and Supported header fields. If the SDP preconditions contain a mandatory strength-tag, the user agent must include the precondition option tag in the Require header field:

Require: precondition

If the peer user agent does not support preconditions, the session setup is rejected.


7 Third Party Session Control

Sessions are usually controlled (set-up and terminated) by the session partners themselves. But there are situations where a third party (a controller) may be involved in session set-up. A practical example is a click-to-dial service, where a web-site offers the set-up of a call. The session is then set-up by some SIP code embedded in the web-site (3rd party). Figure 19 shows how such a SIP session can be set-up.

(Figure 19 message sequence between Party A, the controller and Party B.)
- The controller sets up a call to A with no SDP in the INVITE. A responds with its connection SDP data in the 200 OK. The controller sends a hold SDP in the ACK. Messages between controller and A: INVITE with no SDP; 200 OK with SDP from A; ACK.
- The controller sets up a call to B with no SDP in the INVITE. B responds with its connection SDP data in the 200 OK. Messages between controller and B: INVITE with no SDP; 200 OK with SDP from B.
- The controller re-INVITEs A with the SDP data from B. A responds with its connection SDP data in the 200 OK (again). The controller sends the SDP from A to B in the ACK and sends an ACK to A. Messages: INVITE with SDP from B (to A); 200 OK with SDP from A; ACK with SDP from A (to B); ACK (to A).
- The media stream flows directly between A and B.
- A terminates the session with BYE and the controller sends BYE to B. Both transactions are confirmed. Messages: BYE, 200 OK (between A and the controller); BYE, 200 OK (between the controller and B).

Figure 19: Third party call-control

The message flow in Figure 19 makes use of the fact that an INVITE may be sent without SDP. In this case the SDP offer/answer has to be exchanged in the 200 OK and the ACK. The message flow above is only one possibility for 3rd party session set-up; RFC 3725 [34] gives some more examples. The example once again shows the flexibility of SIP and its nature as a toolbox of functions which may be combined to create a service. Note that all signaling is originated and terminated at the controller, but media is sent directly between party A and party B. No additional SIP protocol extensions are required for this behavior, just basic SIP.


[34] RFC 3725: Best Current Practices for Third Party Call Control (3pcc) in SIP.

8 REFER Method
The REFER method [35] is a SIP extension that requests that the recipient refer to a resource provided in the request. This can be used to enable many applications, including call transfer. The REFER method also implicitly establishes (without sending a SUBSCRIBE request) a short-lived subscription to the refer event. The refer event allows the party sending the REFER to be notified of the outcome of the referenced request.

The NOTIFY body of a REFER has the Content-Type message/sipfrag, which is defined in RFC 3420 [36]. Compared with the Content-Type message/sip, sipfrag allows only specific parts of a SIP message to be inserted. In case of the refer event the message body of the NOTIFY typically contains only the status line for provisional responses and the full response including the dialog data for a 200 OK. The dialog data allow the recipient of the NOTIFY to take control of the session later and get the session partner back again via the Replaces header field of an INVITE request (see chapter 8.2).

The REFER request uses a new header field, Refer-To, which indicates the target to be referred to. When a User Agent sends a REFER request, the recipient will contact the resource addressed by the Refer-To header field in the request and will also notify the referrer of the outcome (success or no success) of the operation. In case of the call transfer service (the usual case for REFER) the address in the Refer-To header field contains the SIP URI of the person to whom the call will be transferred. But the semantics of the Refer-To header field are much broader: it may also contain the address of a web-site. In addition, various URI parameters in the Refer-To address may further define how the addressed resource should be contacted (e.g. the URI parameter method=INVITE causes the referee to use the INVITE method).

The REFER method may be used within a dialog or outside of a dialog, but the most common case is to transfer existing calls, and in this case it is sent within an existing dialog. User Agents often do not accept REFER requests outside of a dialog. If REFER is not used within a dialog, a new dialog is created. REFER and NOTIFY requests are part of the dialog. SUBSCRIBE is not used because of the implicit subscription.

Figure 20 shows a typical message flow of a simple unattended call transfer service using the REFER request. The call transfer is called unattended because Alice (the referrer) does not set up a session with the refer target (Carol) before the transfer to explain the reason why the call is transferred. This is perhaps impolite but simpler to explain from a call flow perspective. Later we will see the more realistic example of an attended call transfer (see chapter 8.2).

[35] RFC 3515: The SIP Refer Method
[36] RFC 3420: The media type message/sipfrag

The REFER request in Figure 20 is sent within an existing dialog. The Refer-To header field contains the address of the Refer-Target (Carol). The REFER request is executed as a simple transaction causing the referee (Bob) to respond immediately. At this time the referee does not know the result of the action initiated by REFER request (in above example an INVITE request to the refer target). Therefore the referee responds with 202 Accepted and sends NOTIFY requests to the referrer (Alice) to keep her informed about the result of the initiated action.
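How a referrer might interpret the message/sipfrag body of such a NOTIFY can be sketched in a few lines of Python. This is only an illustration under assumptions: the function name refer_outcome is hypothetical, the body is assumed to start with a SIP status line, and error handling for empty or malformed bodies is left out.

```python
def refer_outcome(sipfrag_body):
    """Classify the status line found at the start of a message/sipfrag body."""
    status_line = sipfrag_body.strip().splitlines()[0]
    _version, code, *_reason = status_line.split()
    code = int(code)
    if code < 200:
        return "pending"   # provisional response, e.g. 180 Ringing
    if code < 300:
        return "success"   # the referenced request was accepted
    return "failure"       # the referenced request failed


print(refer_outcome("SIP/2.0 180 Ringing"))    # pending
print(refer_outcome("SIP/2.0 200 OK"))         # success
print(refer_outcome("SIP/2.0 486 Busy Here"))  # failure
```

In a call transfer the referrer would typically keep the original session until a "success" outcome arrives, or terminate it earlier as Alice does in Figure 20.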
(Figure 20 message sequence between Alice, the referrer, Bob, the referee, and Carol, the refer target.)
- A dialog and session exists between Alice and Bob.
- Alice starts a call transfer. A subscription to the refer event is created implicitly. Messages between Alice and Bob: REFER (Refer-To: Carol), 202 Accepted.
- Bob contacts Carol. Messages between Bob and Carol: INVITE, 180 Ringing.
- The first NOTIFY informs Alice that the User Agent of Carol is ringing. Alice's user agent automatically terminates the session. Messages between Alice and Bob: NOTIFY, 200 OK, BYE, 200 OK (end of session and dialog).
- The second NOTIFY informs Alice that Carol has accepted the session. Messages between Bob and Carol: 200 OK, ACK; a session and dialog now exist between Bob and Carol.

Figure 20: Call transfer example based on a REFER request

There are some situations where the implicit subscription to the refer event is not necessary. In this case a further extension allows the referrer to suppress the implicit subscription [37].


[37] RFC 4488: SIP REFER without Subscription


8.1 Referred-By header field

If the Refer-To header field of a REFER request contains a SIP URI, the refer target is usually contacted by a SIP INVITE request. This INVITE request (the INVITE sent from Bob to Carol in Figure 20) cannot be distinguished from an INVITE sent by Bob without any REFER-based action behind it. In many cases this is not enough: the refer target should be able to recognise that the arriving INVITE request has been referred. This gap is closed by the Referred-By header field defined in RFC 3892 [38], which is quite simple. Figure 21 shows the Referred-By mechanism.



(Figure 21 message sequence between Referrer, Referee and Refer Target.)

From the referrer to the referee:
REFER referee
Refer-To: target
Referred-By: referrer

From the referee to the refer target:
INVITE target
Referred-By: referrer

Figure 21: Referred-By mechanism

The referrer adds a Referred-By header field containing its own identity to the REFER request. This header field is copied into the referenced request (INVITE).

One may detect a security issue in the simple mechanism shown in Figure 21, because a man-in-the-middle attacker can easily fake a Referred-By header field. Imagine the boss of a company who only accepts calls referred by his secretary. This would be easy to fake. RFC 3892 addresses this situation and offers a solution based on an Authenticated Identity Body [39] (AIB). The AIB provides a signature which is included in the message body.

Figure 22 shows how the Referred-By header field is secured by an AIB. A content-identifier parameter (cid) is added to the Referred-By header field, and the identifier points to a separate part of the message body which contains a signature over the Referred-By header field. The refer target is now able to verify the authenticity of the Referred-By header field.

[38] RFC 3892: The SIP Referred-By Mechanism
[39] RFC 3893: SIP Authenticated Identity Body (AIB) Format



(Figure 22 message sequence between Referrer, Referee and Refer Target.)

From the referrer to the referee:
REFER referee
Refer-To: target
Referred-By: referrer; cid=X
Additional message body part (MIME)
Content-ID: X
<Referred-By Token>

From the referee to the refer target:
INVITE target
Referred-By: referrer; cid=X
Additional message body part (MIME)
Content-ID: X
<Referred-By Token>

Figure 22: Referred-By header field secured by an AIB

The refer target also has the possibility to reject a referenced request that arrives without a valid referrer identity with the response 429 (Provide Referrer Identity).

8.2 Replaces header field

The Replaces header field enhances the possibilities of peer-to-peer call control procedures and is often used in combination with procedures initiated by a REFER request. The Replaces header field is defined in RFC 3891 [40] and is used to logically replace an existing SIP dialog with a new SIP dialog. This mechanism can be used to enable a variety of features, for example "Attended Transfer" and "Call Pickup", as will be shown later.

The Replaces header field contains the components of a dialog-id (Call-ID, From-tag, To-tag) and refers to an existing dialog. If an INVITE request contains a Replaces header field, the new session seamlessly replaces the existing session identified by the Replaces header field. If the dialog-id within the Replaces header field does not match an existing dialog, the request is rejected with 481 (Call/Transaction Does Not Exist).

Example:
Replaces:;from-tag=r33th4x0r;to-tag=ff87ff

The usage of the Replaces header field is best shown in an example. Figure 23 shows the scenario of an attended call transfer. A few details in the message content are explained below.
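The matching rule for the Replaces header field can be sketched as follows. This is a simplified illustration, not the procedure of RFC 3891 in full: the function names and the Call-ID call-1@host.example are hypothetical, the dialog table is a plain set of (Call-ID, to-tag, from-tag) tuples, and authorization checks are ignored.

```python
def parse_replaces(value):
    """Split a Replaces header value into (Call-ID, to-tag, from-tag)."""
    call_id, *params = [part.strip() for part in value.split(";")]
    tags = dict(p.split("=", 1) for p in params if "=" in p)
    return call_id, tags.get("to-tag"), tags.get("from-tag")


def status_for_replaces(value, dialogs):
    """Return 200 if the referenced dialog exists, otherwise 481."""
    return 200 if parse_replaces(value) in dialogs else 481


# Hypothetical dialog table of the receiving user agent.
dialogs = {("call-1@host.example", "ff87ff", "r33th4x0r")}

print(status_for_replaces(
    "call-1@host.example;from-tag=r33th4x0r;to-tag=ff87ff", dialogs))  # 200
print(status_for_replaces(
    "unknown@host.example;from-tag=a;to-tag=b", dialogs))              # 481
```

The order of the two tag parameters in the header value does not matter, because they are looked up by name.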


[40] RFC 3891: The SIP Replaces Header


(Figure 23 message sequence between Alice, Bob and Carol.)
- Alice calls Bob: INVITE, 180 Ringing, 200 OK, ACK. A session and dialog exist between Alice and Bob.
- Bob puts Alice on hold: INVITE (hold) (1), 200 OK, ACK. No RTP is exchanged.
- Bob calls Carol: INVITE, 180 Ringing, 200 OK, ACK. A session and dialog exist between Bob and Carol.
- Bob puts Carol on hold: INVITE (hold), 200 OK, ACK. No RTP is exchanged.
- Bob transfers the existing session with Carol to Alice, including the Replaces header field: REFER (Refer-To: Carol) (2), 202 Accepted, NOTIFY, 200 OK.
- Alice calls Carol, replacing the existing session with Bob: INVITE (Replaces: dialog with Bob) (3), 200 OK, ACK. A session and dialog exist between Alice and Carol.
- Carol terminates the session with Bob: BYE, 200 OK.
- Alice reports the successful transfer to Bob: NOTIFY, 200 OK.
- Bob terminates the session with Alice: BYE, 200 OK.

Figure 23: Attended call transfer using REFER and Replaces

The INVITE request (1) of Bob to put Alice on hold is shown in Figure 24. To put a session partner on hold the SDP attribute a=sendonly is used. In addition the media feature tag sip.rendering="no" in the Contact header field is used to make sure that no received media will be rendered during hold. The a=sendonly attribute of an SDP offer is reflected in the SDP answer with the attribute a=recvonly.

INVITE SIP/2.0
Via: SIP/2.0/TLS ;branch=z9hG4bKnashds7
Max-Forwards: 70
From: Bob <>;tag=23431
To: Alice <>;tag=1234567
Call-ID:
CSeq: 1024 INVITE
Contact: <>;+sip.rendering="no"
Content-Type: application/sdp
Allow: INVITE, ACK, CANCEL, OPTIONS, BYE, REFER, NOTIFY
Supported: replaces
Content-Length: ...

v=0
o=bob 2890844527 2890844528 IN IP4
s=
c=IN IP4
t=0 0
m=audio 3456 RTP/AVP 0
a=rtpmap:0 PCMU/8000
a=sendonly

Figure 24: INVITE request to put Alice on hold (1)

The next interesting request is the REFER request (2) sent by Bob to Alice.

REFER SIP/2.0
Via: SIP/2.0/TLS ;branch=z9hG4bKnashds2g
Max-Forwards: 70
From: Bob <>;tag=23431
To: Alice <>;tag=1234567
Call-ID:
CSeq: 1025 REFER
Refer-To: < %3Bfrom-tag%3D8675309&Require=replaces>
Referred-By: <>
Contact: <>
Content-Length: 0

Figure 25: REFER request from Bob to Alice (2)


In contrast to the REFER request of an unattended call transfer, this REFER request contains a Replaces header field referring to the existing dialog between Bob and Carol, and a Referred-By header field. Figure 25 shows the REFER request. The Refer-To header field contains the refer target address, which is the Contact address of Carol (not the AoR) to guarantee that the right instance of the user agent is addressed. The Contact URI is extended by the Replaces header field (after the question mark). One will notice that within the Replaces header field reserved characters are escaped (%HEX notation for @, = and ;). This is a syntax rule in SIP to avoid ambiguity. The Replaces header field contains the three parameters of the dialog-id: Call-ID, To-tag and From-tag. The INVITE request (3) sent from Alice to Carol includes the (now unescaped) Replaces header field as shown in Figure 26.
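The escaping of the Replaces value inside the Refer-To URI can be reproduced with the Python standard library. The sketch below is only an illustration: the function name, the target URI sips:carol@client.example and the Call-ID abc123@host.example are hypothetical placeholders, while the two tag values are the ones used in Figure 26.

```python
from urllib.parse import quote, unquote


def refer_to_with_replaces(target_uri, call_id, to_tag, from_tag):
    """Build a Refer-To value with an escaped Replaces header in the URI."""
    replaces = f"{call_id};to-tag={to_tag};from-tag={from_tag}"
    # safe="" forces '@', ';' and '=' to be percent-encoded as well.
    escaped = quote(replaces, safe="")
    return f"<{target_uri}?Replaces={escaped}&Require=replaces>"


value = refer_to_with_replaces(
    "sips:carol@client.example",  # hypothetical Contact URI of Carol
    "abc123@host.example",        # hypothetical Call-ID
    "5f35a3",
    "8675309",
)
print(value)
# <sips:carol@client.example?Replaces=abc123%40host.example%3Bto-tag%3D5f35a3%3Bfrom-tag%3D8675309&Require=replaces>

# The referee reverses the escaping before copying the value into the INVITE.
print(unquote("abc123%40host.example%3Bto-tag%3D5f35a3%3Bfrom-tag%3D8675309"))
# abc123@host.example;to-tag=5f35a3;from-tag=8675309
```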

INVITE;gr SIP/2.0
Via: SIP/2.0/TLS ;branch=z9hG4bKadfe4ko
To: Carol <>
Max-Forwards: 70
From: Alice <>;tag=3461
Call-ID:
CSeq: 1 INVITE
Require: replaces
Referred-By: <>
Replaces: ;to-tag=5f35a3;from-tag=8675309
Contact: <>
Allow: INVITE, ACK, CANCEL, OPTIONS, BYE, REFER, NOTIFY
Supported: replaces
Content-Type: application/sdp
Content-Length: ...

v=0
o=alice 2890844989 2890844989 IN IP4
s=
c=IN IP4
t=0 0
m=audio 3458 RTP/AVP 0
a=rtpmap:0 PCMU/8000

Figure 26: INVITE with Replaces header field (3)


The full message flow with all details of the message contents can be studied in RFC 5359 [41]. This RFC contains many examples of possible services and may be interesting to analyse. One important remark has to be made: the service examples are to be considered as examples only, because in some cases there are different choices for implementing a service. This is the result of the toolbox nature of SIP, where all functions (e.g. the Replaces header field extension) have to be seen as tools (service primitives) within the SIP toolbox. As long as the syntax and semantics of an extension like the REFER method or the Replaces header field are implemented correctly, interoperability should not be an issue.


[41] RFC 5359: SIP Service Examples


9 Conferencing
The core SIP specification provides a way to set up and manage sessions between two User Agents. It is possible to create and control a multi-party conference using this specification. However, in such a scenario, referred to as the loosely coupled conference model, a signaling relationship does not exist between all participants in the conference. Such conference situations can be accomplished by using multicast. Alternatively, a UA can maintain multiple dialogs with multiple User Agents while also acting as a media mixer. While the User Agent that is acting as the conference controller/mixer has knowledge of the other User Agents involved in the conference, the other User Agents do not know about each other. Additionally this scenario puts extra strain on the resources of the controlling User Agent by forcing it to do both: control the signaling and mix the media streams.

RFC 4353 [42] introduces an architecture in which a central entity, called a focus, provides a variety of conference functions and mixing of media. In this type of conference, referred to as the tightly coupled conference model, each UA involved in the conference connects to the focus and maintains its own SIP dialog with it. This specification also defines a logical function called a conference policy server that stores the conference policy, which is simply a set of rules governing a particular conference. The focus must be able to access this conference policy to determine how the conference should operate, for example whether a particular UA is allowed to join the conference. The specification also defines a second logical function called a conference notification service. This is a service that a conference participant can subscribe to in order to receive notifications when changes in conference state occur.

In this model, a UA participating in a conference can SUBSCRIBE to the conference URI and be alerted via SIP NOTIFY messages when the state of the conference changes, such as when participants enter and leave the conference. Often the conference focus, policy server, and notification service are located in the same physical entity. RFC 4575 [43] defines an event package for notifying the participants of a tightly coupled conference of the conference state. RFC 4579 [44] uses the concepts from RFC 4353 and RFC 4575 to define a set of recommended practices for creating and controlling conferences.

[42] RFC 4353: A Framework for Conferencing with SIP
[43] RFC 4575: A Session Initiation Protocol (SIP) Event Package for Conference State
[44] RFC 4579: SIP Call Control - Conferencing for User Agents

9.1 Tightly Coupled SIP Conference

Figure 27 shows the architecture of a conferencing server for tightly coupled conferences. It is designed as a one-box conferencing server and contains several functional elements:
- Conference focus: The focus is a SIP user agent that is addressed by a conference URI which identifies a conference. The conference focus maintains a SIP signaling relationship with each participant in the conference and is responsible for ensuring, in some way, that each participant receives the media that make up the conference.
- Mixer: A mixer receives a set of media streams of the same type from the participants of the conference. It combines (mixes) the media and redistributes the result to each participant.
- Conference Policy Server: A conference policy server stores and manipulates the conference policy, for example by maintaining the list of participants to be invited to a conference.
- Conference Notification Server: The participants of a conference should be informed about all current participants of the conference and about participants joining or leaving it. For that function RFC 4575 defines the conferencing event package. The notification server accepts subscriptions by the participants to the conference state and notifies the participants about changes to that state.

(Figure 27 shows the Conference Server, which contains the Conference Policy Server, the Conference Notification Server, the Conference Focus and the Conference Mixer, together with Participant A, Participant B and Participant C.)

Figure 27: Conferencing Server architecture


Tightly coupled conferences are hosted by a central point of control, the conference focus, to which every participant has a signaling connection. The conference focus uses a conference-specific SIP address which is shared among the participants. Closely coupled with the conference focus is a conference mixer. The mixer terminates and re-originates the media streams. The conference focus controls the conference mixer. It knows the SDP parameters for mixing media streams, which are contained in the SIP signaling messages. The focus controls the mixer via the H.248 protocol [45], also known as the MEGACO protocol [46].

The mixer simply does the following:
- it receives the media stream from A and sends the combined media from B+C to A,
- it receives the media stream from B and sends the combined media from A+C to B,
- it receives the media stream from C and sends the combined media from A+B to C.
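The mixing rule, in which every participant receives the sum of all other participants' streams, can be illustrated with a short Python sketch. It is a toy model under assumptions: streams are equal-length lists of integer samples, the function name mix_minus is hypothetical, and real mixers additionally handle codecs, timing and clipping.

```python
def mix_minus(streams):
    """For each participant, return the sum of all other participants' samples."""
    names = list(streams)
    length = len(next(iter(streams.values())))
    # Sum all streams once, then subtract each participant's own contribution.
    total = [sum(streams[name][i] for name in names) for i in range(length)]
    return {
        name: [total[i] - streams[name][i] for i in range(length)]
        for name in names
    }


streams = {
    "A": [1, 2, 3],
    "B": [10, 20, 30],
    "C": [100, 200, 300],
}
mixed = mix_minus(streams)
print(mixed["A"])  # [110, 220, 330]  (B + C)
print(mixed["B"])  # [101, 202, 303]  (A + C)
print(mixed["C"])  # [11, 22, 33]     (A + B)
```

Computing the total once and subtracting the own stream keeps the work proportional to the number of participants instead of its square.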

9.1.1 Creation of an Ad-hoc conference

A network operator can allow a user to create an ad-hoc conference. Such a conference has no specified start time and is automatically established as soon as the first user joins the conference. In order to do this, the user creating a conference must call a so-called conference-factory URI provided by the conference-focus:

The message body of the INVITE request contains all media streams that the user wants to establish for this conference. The conference-focus then checks whether the user is allowed to create an ad-hoc conference (via the policy server) and whether resources for that conference are available at the mixer. If the focus accepts the ad-hoc conference, it sends a dedicated conference URI back to the user within the Contact header field of the 200 OK response.

SIP/2.0 200 OK
Contact:;isfocus

The conference-focus indicates in this address that it will act as a focus for the ad-hoc conference by adding an isfocus feature parameter (see chapter 15). The next step for the creator of the conference is to get the participants invited to the conference. This can be accomplished, for example, by sending the conference URI to the participants via messaging (see chapter 10). The participants then individually set up the session with the conference-focus using the dedicated conference URI.

[45] H.248: Gateway control protocol Version 3
[46] RFC 3525: Gateway Control Protocol Version 1

9.1.2 Immediate Conference creation with a URI list

The conference creator may also use a more convenient method to set up the conference by putting the URIs of the participants into a so-called URI list [47] and letting the conference focus invite the participants immediately. The conference creator attaches the URI list to the message body of the INVITE request towards the conference-factory URI. Because the message body of the INVITE request already contains an SDP, a multipart message body is used. This is shown in Figure 28. From this we see that the user wants to create the ad-hoc conference (based on the conference-factory URI in the request line) and that three additional users should be called directly into the conference. In addition we can see another XML attribute, copyControl, which is defined in RFC 5364 [48]. The purpose of this attribute is well known from e-mail systems, where the recipients can be qualified as To, Cc, or Bcc.

INVITE SIP/2.0
Content-Type: multipart/mixed;boundary="boundary1"

--boundary1
Content-Type: application/sdp

//SDP Information not shown here

--boundary1
Content-Type: application/resource-lists+xml
Content-Disposition: recipient-list

<?xml version="1.0" encoding="UTF-8"?>
<resource-lists xmlns="urn:ietf:params:xml:ns:resource-lists"
                xmlns:cp="urn:ietf:params:xml:ns:copycontrol">
  <list>
    <entry uri="" cp:copyControl="to" />
    <entry uri="" cp:copyControl="to" />
    <entry uri="" cp:copyControl="to" />
  </list>
</resource-lists>
--boundary1--

Figure 28: URI list contained in a multipart message body
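A URI list like the one in Figure 28 could be generated with Python's standard xml.etree.ElementTree module. The sketch below is an assumption-laden illustration: the function name build_uri_list and the participant URIs under example.com are hypothetical, and only the resource-lists part of the multipart body is produced.

```python
import xml.etree.ElementTree as ET

RL_NS = "urn:ietf:params:xml:ns:resource-lists"
CP_NS = "urn:ietf:params:xml:ns:copycontrol"


def build_uri_list(entries):
    """Build a resource-lists document from (uri, copy_control) pairs."""
    ET.register_namespace("", RL_NS)    # default namespace, no prefix
    ET.register_namespace("cp", CP_NS)  # prefix used for copyControl
    root = ET.Element(f"{{{RL_NS}}}resource-lists")
    uri_list = ET.SubElement(root, f"{{{RL_NS}}}list")
    for uri, copy_control in entries:
        ET.SubElement(
            uri_list,
            f"{{{RL_NS}}}entry",
            {"uri": uri, f"{{{CP_NS}}}copyControl": copy_control},
        )
    return ET.tostring(root, encoding="unicode")


# Hypothetical participants, all addressed as "to" recipients.
xml_body = build_uri_list([
    ("sip:bill@example.com", "to"),
    ("sip:joe@example.com", "to"),
    ("sip:ted@example.com", "to"),
])
print(xml_body)
```

The resulting string would be placed in the body part with Content-Type application/resource-lists+xml.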

[47] RFC 5366: Conference Establishment Using Request-Contained Lists in SIP
[48] RFC 5364: Extensible Markup Language (XML) Format Extension for Representing Copy Control Attributes in Resource Lists

9.1.3 Floor Control

Floor control is an optional feature for conferences by which the right to speak can be explicitly given to specific participants. Floor control is performed by means of the Binary Floor Control Protocol (BFCP) [49]. It allows e.g. the following:
- a conference participant can request the floor;
- a conference moderator can grant the floor based on a floor request or deny it;
- all conference participants get informed about the current status of the floor.

BFCP uses TCP connections between the participants and binary coded information. For more details see RFC 4582.

9.2 Decentralized Conferencing

In the previous chapter we assumed that a centralized conferencing server within the network offers the conferencing service. Such a server typically offers the service for conferences in which many participants are involved. For small conferences with only a few participants (typically three), the conference signaling and media mixing is often provided directly by one of the participating SIP User Agents. No specific server is required in this case.

9.3 Joining a conference

Besides the traditional method of using INVITE requests to participate in (join) a conference there are also other methods. As an overview, the following methods can be used to join a conference:
- UA sends INVITE to focus
- Focus sends INVITE to UA
- UA sends REFER to focus
- UA sends JOIN header

The first two methods have already been explained in the previous chapter on centralized and tightly coupled conferences. The REFER based method is an alternative for the conference initiator to bring other participants directly into the conference. The initiator simply sends a REFER request to the participant which includes a Refer-To header field with the conference URI. Another method is to use a Join header field as explained in the next chapter.


[49] RFC 4582: The Binary Floor Control Protocol (BFCP)


9.4 Join header field

The Join header field [50] is used in an INVITE request to request that the dialog (session) be joined with an existing dialog (session). The parameters of the Join header field identify the dialog to be joined by the Call-ID, To-tag and From-tag, similar to the Replaces header field, for example:

Join: 12adf2f34456gs5;to-tag=12345;from-tag=54321

If the Join header field references a point-to-point dialog between two user agents, the Join header field is effectively a request to turn the call into a conference call. If the dialog is already part of a conference, the Join header field is a request to be added to the conference.

An example call flow is shown in Figure 29. Carol wants to join an existing session between Alice and Bob. She sends an INVITE request whose Join header field refers to the existing dialog which she wants to join. Bob informs Alice that he now acts as a conference focus with a re-INVITE request containing the isfocus feature tag in the Contact header field. When this is accepted by Alice, the conference set-up is finished by accepting the INVITE of Carol. From now on Bob acts as a mixer within the established conference.

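(Figure 29 message sequence between Alice, Bob and Carol.)
- A media stream exists between Alice and Bob.
- Carol to Bob: INVITE (Join: dialog A-B); Bob answers with 180 Ringing.
- Bob to Alice: INVITE (Contact: isfocus), 200 OK, ACK.
- Bob to Carol: 200 OK (Contact: isfocus); Carol answers with ACK.
- Media streams now exist between Alice and Bob and between Bob and Carol.

Figure 29: Use of Join to create a conference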


[50] RFC 3911: SIP "Join" Header


10 SIP Based Messaging

During the early years of SIP standardization (the 1990s) there was also a hype regarding Instant Messaging. Many famous internet companies like Google and Yahoo used proprietary messenger products to keep their customers tied to their services. These messengers also offered presence functionality with buddy-lists, so that a messenger user could easily see which of her/his friends was available for a messenger chat. It was therefore a natural evolution of SIP to enhance the protocol with presence functionality and instant messaging. The presence part is covered by chapter 4.1. This chapter is dedicated to Instant Messaging [51]. Two different modes of Instant Messaging have been defined: the page mode and the session mode.

10.1 Page Mode Instant Messaging

The page mode of instant messaging offers a user the possibility to spontaneously send a message without setting up a dialog. The basis of page mode instant messaging is the MESSAGE method [52]. The semantics of a MESSAGE method are only loosely defined. It is what it is: a message. The content of the message is carried in the message body, and the content type may be any type the recipient is able to understand. Remember: the Content-Type header field describes the type of a message body, and the Accept header field shows which content types a user agent may accept.

MESSAGE requests do not themselves initiate a SIP dialog. Under normal usage each Instant Message stands alone, much like pager messages. MESSAGE requests may also be sent in the context of a dialog initiated by some other SIP request.

A typical MESSAGE request is shown below in Figure 30. In this case the content type is simply text/plain. Each MESSAGE is a standalone SIP transaction and therefore requires a response, which is usually 200 (OK). In some implementations (message box) which have the ability to store messages when they cannot be delivered, a 202 (Accepted) response is also an option.

The drawback of page mode messaging is that the SIP signaling path is used for routing the MESSAGE to the recipient. This may lead to overload of the signaling elements. In addition, in case of UDP the reliability of message transport is limited.


[51] Both features, Presence and Instant Messaging, have been defined by the special IETF working group SIMPLE (SIP for Instant Messaging and Presence Leveraging Extensions).
[52] RFC 3428: SIP Extension for Instant Messaging

MESSAGE SIP/2.0
Via: SIP/2.0/TCP;branch=z9hG4bK776sgdkse
Max-Forwards: 70
From:;tag=49583
To:
Call-ID: asd88asd77a@
CSeq: 1 MESSAGE
Content-Type: text/plain
Content-Length: 18

Watson, come here.

Figure 30: Example of a SIP MESSAGE method

The MESSAGE method can be used inside and outside of a dialog. Inside a dialog (e.g. within an INVITE based dialog) the MESSAGE request may be sent directly between the user agents (end-to-end); otherwise the routing path (inbound and perhaps also an outbound proxy, etc.) will be used. For longer messaging sessions (e.g. chat) the session based messaging method should be used.
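The relation between the message body and the Content-Length value of 18 in Figure 30 can be checked with a small Python sketch that assembles a page mode MESSAGE request. It is a minimal illustration under assumptions: the function name build_message, the addresses under example.com and the Call-ID are hypothetical, and the Via header field that a real SIP stack adds is omitted.

```python
def build_message(request_uri, from_uri, to_uri, call_id, text):
    """Assemble a MESSAGE request as bytes; Content-Length counts body bytes."""
    body = text.encode("utf-8")
    headers = [
        f"MESSAGE {request_uri} SIP/2.0",
        "Max-Forwards: 70",
        f"From: <{from_uri}>;tag=49583",
        f"To: <{to_uri}>",
        f"Call-ID: {call_id}",
        "CSeq: 1 MESSAGE",
        "Content-Type: text/plain",
        f"Content-Length: {len(body)}",
    ]
    # Header lines end with CRLF; an empty line separates headers and body.
    return "\r\n".join(headers).encode("utf-8") + b"\r\n\r\n" + body


request = build_message(
    "sip:user2@example.com",   # hypothetical recipient
    "sip:user1@example.com",   # hypothetical sender
    "sip:user2@example.com",
    "asd88asd77a@host.example.com",
    "Watson, come here.",
)
print(request.decode("utf-8"))
```

Counting the encoded bytes rather than the characters matters as soon as the text contains non-ASCII characters.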

10.2 Session Mode Instant Messaging with MSRP

A series of related instant messages between two or more parties can be viewed as part of a "message session", that is, a conversational exchange of messages with a definite beginning and end. This is usually also called a chat session. The session mode of instant messaging uses SIP capabilities for session setup (INVITE-ACK) and tear down (BYE). Message sessions are treated just like any other media stream. The main differences to an audio session in session setup are that within SDP
- for the media (m) line the token message is used instead of audio, and
- for the transport protocol the new protocol TCP/MSRP is used instead of RTP/AVP.

The Message Session Relay Protocol (MSRP) 53 is a protocol for transmitting a series of related instant messages in the context of a session. An INVITE request for the setup of a messaging session is shown in Figure 31. Alice invites Bob to a messaging session (some lines in the message header and in the SDP message body are removed for clarity and brevity, because the focus is on the SDP part).


53 RFC 4975: Message Session Relay Protocol


    INVITE SIP/2.0
    To: <>
    From: <>;tag=786
    Call-ID: 3413an89KU
    Content-Type: application/sdp

    c=IN IP4
    m=message 7654 TCP/MSRP *
    a=accept-types:text/plain
    a=path:msrp://;tcp

Figure 31: Example session setup of a messaging session (MSRP)

As with audio or video sessions, the offer/answer model of SDP is applied. The offer is shown in Figure 31 and tells where Alice is willing to receive the instant messaging stream. The path attribute carries an MSRP URI which describes the endpoint of the session. Bob responds to the INVITE request with a 200 OK response containing the SDP answer as shown in Figure 32.
Based on the MSRP offer/answer exchange an additional TCP connection is set up between Bob and Alice for the MSRP session.

    SIP/2.0 200 OK
    To: <>;tag=087js
    From: <>;tag=786
    Call-ID: 3413an89KU
    Content-Type: application/sdp

    c=IN IP4
    m=message 12763 TCP/MSRP *
    a=accept-types:text/plain
    a=path:msrp://;tcp

Figure 32: Example SDP answer for MSRP session.
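To make the role of the individual SDP lines more concrete, the following Python sketch extracts the MSRP relevant parts (the m= line, the accept-types attribute and the path attribute) from an SDP body. It is a minimal sketch, not a complete SDP parser. The function name and the sample SDP, including the host name and the session part of the MSRP URI, are assumptions made for illustration.

```python
def parse_msrp_sdp(sdp):
    """Extract port, transport, accepted types and MSRP path from an SDP body."""
    result = {"port": None, "transport": None, "accept_types": [], "path": []}
    for line in sdp.splitlines():
        line = line.strip()
        if line.startswith("m=message "):
            # m=<media> <port> <transport> <format list>
            fields = line[2:].split()
            result["port"] = int(fields[1])
            result["transport"] = fields[2]
        elif line.startswith("a=accept-types:"):
            result["accept_types"] = line.split(":", 1)[1].split()
        elif line.startswith("a=path:"):
            # The path may contain several URIs when relays are involved.
            result["path"] = line.split(":", 1)[1].split()
    return result


# Hypothetical answer from Bob; host name and session id are invented.
sample_sdp = "\r\n".join([
    "c=IN IP4 bob.example.com",
    "m=message 12763 TCP/MSRP *",
    "a=accept-types:text/plain",
    "a=path:msrp://bob.example.com:12763/kjhd37s2s20w2a;tcp",
])
answer = parse_msrp_sdp(sample_sdp)
print(answer)
```

The last URI in the path attribute is the one Alice would connect to when no relays are involved.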

MSRP session

The Message Session Relay Protocol defines two request types (also called methods): SEND and REPORT. SEND requests are used to deliver a complete message or a chunk (a portion of a complete message), while REPORT requests report on the status of a previously sent message. When Alice receives Bob's answer, she checks whether she already has an existing connection to Bob. If not, she opens a new TCP connection to Bob using the MSRP URI Bob provided in the SDP. Alice then delivers a SEND request to Bob with her initial message, and Bob replies indicating that Alice's request was received successfully.

A typical SEND request is shown in Figure 33 below. SEND and REPORT requests start MSRP transactions, which are answered with a 200 OK response.

    MSRP a786hjs2 SEND
    To-Path: msrp://;tcp
    From-Path: msrp://;tcp
    Message-ID: 87652491
    Byte-Range: 1-25/25
    Content-Type: text/plain

    Hey Bob, are you there?
    -------a786hjs2$

    MSRP a786hjs2 200 OK
    To-Path: msrp://;tcp
    From-Path: msrp://;tcp
    -------a786hjs2$

Figure 33: Example MSRP SEND request

Alice's request begins with the MSRP start line, which contains a transaction identifier (a786hjs2) that is also used for request framing. Next she includes the path of MSRP URIs to the destination in the To-Path header field, and her own MSRP URI in the From-Path header field. In this typical case there is just one "hop", so there is only one URI in each path header field (if MSRP relays are in between, additional MSRP URIs are included in the path header fields). Alice also includes a message ID (87652491), which she can use to correlate status reports with the original message. Next she puts the actual content. Finally, she closes the request with an end-line of seven hyphens, the transaction identifier, and a "$" to indicate that this request contains the end of a complete message, in contrast to a chunk only.

The main purpose of the MSRP URIs is to provide some degree of security. Alice and Bob choose their MSRP URIs in such a way that it is difficult to guess the exact URI. Alice and Bob can reject requests to URIs they are not expecting to service and which they cannot correlate with the probable sender. Alice and Bob can also use TLS to provide channel security over this hop. To receive MSRP requests over a TLS protected connection, Alice or Bob could advertise URIs with the "msrps" scheme instead of "msrp".

As already mentioned, intermediary network nodes can be included in a messaging session: MSRP relays as defined in RFC 4976 54.
Intermediate message session relays may be used for enhanced security and authentication. An additional AUTH request is defined for MSRP relays. A typical use of MSRP relays is instant messaging for trading. In this case a trusted record of the message exchange is necessary.
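The framing rules described for Figure 33 (start line with transaction identifier, path header fields, Byte-Range, end-line with a continuation flag) can be sketched in Python as follows. This is a simplified illustration under stated assumptions: the function name, the way transaction identifiers are derived and the MSRP URIs are invented for the example, the message is assumed to be non-empty, and "+" is used as the flag for a chunk that is not the last one, as defined in RFC 4975.

```python
def msrp_send_chunks(tid_prefix, to_path, from_path, message_id, text, chunk_size):
    """Split one message into MSRP SEND requests (bytes), one per chunk."""
    data = text.encode("utf-8")
    total = len(data)
    requests = []
    for n, start in enumerate(range(0, total, chunk_size)):
        chunk = data[start:start + chunk_size]
        end = start + len(chunk)
        flag = "$" if end == total else "+"   # "$" marks the end of the message
        tid = f"{tid_prefix}{n}"              # each chunk is its own transaction
        header = "\r\n".join([
            f"MSRP {tid} SEND",
            f"To-Path: {to_path}",
            f"From-Path: {from_path}",
            f"Message-ID: {message_id}",      # identical for all chunks of a message
            f"Byte-Range: {start + 1}-{end}/{total}",
            "Content-Type: text/plain",
        ]).encode("utf-8")
        end_line = f"-------{tid}{flag}".encode("utf-8")
        requests.append(header + b"\r\n\r\n" + chunk + b"\r\n" + end_line + b"\r\n")
    return requests


# Hypothetical MSRP URIs; chunk size kept tiny to force three chunks.
chunks = msrp_send_chunks(
    tid_prefix="tid",
    to_path="msrp://bob.example.com:12763/kjhd37s2s20w2a;tcp",
    from_path="msrp://alice.example.com:7654/jshA7weztas;tcp",
    message_id="87652491",
    text="Hello Bob, this is Alice.",
    chunk_size=10,
)
for c in chunks:
    print(c.decode("utf-8"))
```

The receiver can reassemble the message from the Byte-Range values because all chunks carry the same Message-ID.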

54 RFC 4976: Relay Extensions for MSRP


An example of MSRP path header fields including security (TLS) and two messaging relays is shown below:

    To-Path: msrps://;tcp msrps://;tcp
    From-Path: msrps://;tcp msrps://;tcp


11 INFO method
The purpose of the INFO message is to carry application level information between endpoints, using the SIP dialog signaling path. The INFO method does not, however, update the characteristics of a SIP dialog or session. It only allows the applications that use the SIP session to exchange information. The INFO method was originally defined in RFC 2976 55, which is now called the legacy INFO method. One of the first applications of this INFO method was to exchange encapsulated ISUP signaling between PSTN-SIP gateways as shown in Figure 34. This was a typical scenario in the early days of Voice over IP when an operator offered cheap long-distance calls via the Internet (so-called toll bypass).


Figure 34: Transport of ISUP messages within SIP (diagram labels: Internet (SIP), PSTN-SIP Gateway, SIP messages with encapsulated ISUP)

The above figure shows the principle of the long-distance toll bypass method. PSTN-SIP gateways on the left and right side, which take the role of SIP user agents, map the ISUP messages for session set-up and release to equivalent SIP messages 56, 57. But it is not possible to map all information elements of ISUP without loss, because not all ISUP information fields have an equivalent representation within SIP. Therefore an additional mechanism has been defined to transport all ISUP information unaltered through the SIP network: the encapsulation of ISUP messages within SIP message bodies 58. With this, all ISUP messages for call set-up and release can be mapped to equivalent SIP messages, but there are additional signaling messages for so-called mid-call services for which an appropriate SIP message was missing. For this case the legacy INFO message is used.

55 RFC 2976: The SIP INFO Method
56 RFC 3398: ISUP to SIP Mapping
57 RFC 3372: SIP for Telephones (SIP-T): Context and Architectures
58 RFC 3204: MIME media types for ISUP and QSIG Objects

Several other applications for the SIP INFO method have been defined which are not discussed further here. Then a drawback of the legacy INFO method became obvious: a request carried no indication of the application for which it was used.

After some discussion the INFO method was redefined in a backward compatible manner. The new INFO method 59 also includes an Info Package mechanism. The Info Package specification defines the content and semantics of the information carried in an INFO message associated with the Info Package.

The Info Package mechanism also enables the User Agents to indicate which Info Packages they are willing to receive and for which Info Package a specific INFO request is used. For that purpose two new header fields have been defined:
- The Recv-Info header field lists the names of the Info Packages for which a User Agent is willing to receive INFO requests. The Recv-Info header field may also be empty if the User Agent does not want to receive any INFO request. The Recv-Info header field is included in a dialog initiating request (typically INVITE). The receiver also includes a Recv-Info header field in the response. Then both sides know which Info Packages the partner is able to process.
- The Info-Package header field is included in an INFO request to indicate which Info Package is associated with the request.

Figure 35 shows an exchange of Recv-Info header fields in an INVITE request and the corresponding 200 (OK) response. The UAC sends an initial INVITE request, where the UAC indicates that it is willing to receive INFO requests for Info Packages P and R. The UAS sends a 200 (OK) response back to the UAC, where the UAS indicates that it is willing to receive INFO requests for Info Packages R and T.

Figure 36 shows an INFO request with a single payload. It refers to the Info Package foo. The corresponding specification of the Info Package foo must also describe the syntax and semantics of the content type application/foo. As an alternative to a single payload, an INFO request may also contain multiple message body parts.


59 RFC 6086: SIP INFO Method and Package Framework


    INVITE SIP/2.0
    Via: SIP/2.0/TCP;branch=z9hG4bK776
    Max-Forwards: 70
    To: Bob <>
    From: Alice <>;tag=1928301774
    Call-ID:
    CSeq: 314159 INVITE
    Recv-Info: P, R
    Contact: <>
    Content-Type: application/sdp
    Content-Length: ...

    ...

    SIP/2.0 200 OK
    Via: SIP/2.0/TCP;branch=z9hG4bK776; received=
    To: Bob <>;tag=a6c85cf
    From: Alice <>;tag=1928301774
    Call-ID:
    CSeq: 314159 INVITE
    Contact: <>
    Recv-Info: R, T
    Content-Type: application/sdp
    Content-Length: ...

    ...

Figure 35: Exchange of Recv-Info header fields at dialog establishment

    INFO SIP/2.0
    Via: SIP/2.0/UDP;branch=z9hG4bKnabcdef
    To: Bob <>;tag=a6c85cf
    From: Alice <>;tag=1928301774
    Call-Id:
    CSeq: 314333 INFO
    Info-Package: foo
    Content-type: application/foo
    Content-Disposition: Info-Package
    Content-length: 24

    I am a foo message type

Figure 36: Example INFO method
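The outcome of the Recv-Info exchange in Figure 35 can be sketched with a few lines of Python: each side may only send INFO requests for packages that the peer has listed. The function names are invented for this illustration, and the sketch makes simplifying assumptions (exact string comparison of package names, no package parameters).

```python
def parse_recv_info(value):
    """Turn a Recv-Info header value such as 'P, R' into a set of package names."""
    return {name.strip() for name in value.split(",") if name.strip()}


def may_send_info(package, peer_recv_info):
    """An INFO request for 'package' is allowed only if the peer announced it."""
    return package in parse_recv_info(peer_recv_info)


uac_recv_info = "P, R"   # sent by the UAC in the INVITE (Figure 35)
uas_recv_info = "R, T"   # sent by the UAS in the 200 (OK) response

# Packages the UAC may use towards the UAS, and vice versa.
print(sorted(parse_recv_info(uas_recv_info)))   # what the UAC may send
print(sorted(parse_recv_info(uac_recv_info)))   # what the UAS may send
print(may_send_info("P", uas_recv_info))        # UAC wants to send package P
```

Note that the two directions are independent: in the example the UAC may send INFO requests for R and T, while the UAS may send them for P and R.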


12 Service Configuration
Many services require a mechanism that allows users to manage configuration parameters. The presence service, which is presented in chapter 4.1, is a prominent candidate for parameter configuration, e.g. buddy lists and authorizations. For service configuration the user has to manipulate data on some server. These data are nowadays stored in XML documents because of their platform independence and the possibility to structure the data well.

12.1 Overview on XML

The following chapter assumes that the reader is already familiar with the basics of XML 60. As a short recap: XML documents are a text based (human readable) representation of data. Figure 37 shows a simple example of a presence status expressed in an XML document.

    <?xml version="1.0" encoding="UTF-8"?>
    <presence xmlns="urn:ietf:params:xml:ns:pidf" entity="">
      <tuple id="sg89ae">
        <status>
          <basic>open</basic>
        </status>
        <note>I'm in London at the moment</note>
      </tuple>
    </presence>

Figure 37: Example of a simple XML document

After the XML declaration in the first line the data follow in a tree-like structure. Each node in the tree is called an XML element. XML elements start with an opening tag that contains the name of the element enclosed in angle brackets, e.g. <status>, and terminate with a closing tag that contains a slash / and the name of the element, e.g. </status>. XML elements can contain other child elements. XML elements usually contain a text node that represents a value. In the example the value of the <note> element is I'm in London at the moment. XML elements can also be empty, in which case a compact notation combines the opening and closing tags by placing a slash / after the element name. For example <test/> is an empty element. XML elements can also contain attributes that further characterize the element by defining its metadata. In the above example the presence element contains two attributes: xmlns and entity. Unlike elements, an attribute cannot be written without a value.


60 XML = Extensible Markup Language, specified by the World Wide Web Consortium

XML documents are usually structured according to predefined rules. These rules are typically defined in separate documents such as a Document Type Definition (DTD) or an XML schema. An important attribute is the namespace attribute xmlns. By referring to a globally unique namespace, ambiguity of element names is avoided. In IETF documents the namespaces are usually URNs 61.
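The role of the namespace becomes visible as soon as the document is processed by a program. The following Python sketch reads a presence document like the one in Figure 37 with the standard library module xml.etree.ElementTree; elements have to be addressed together with their namespace. The value of the entity attribute is a made-up example, and the prefix p is an arbitrary local choice.

```python
import xml.etree.ElementTree as ET

# Namespace of the presence document; the prefix "p" is only a local shorthand.
PIDF_NS = {"p": "urn:ietf:params:xml:ns:pidf"}

# Presence document as in Figure 37; the entity value is hypothetical.
document = b"""<?xml version="1.0" encoding="UTF-8"?>
<presence xmlns="urn:ietf:params:xml:ns:pidf" entity="pres:alice@example.com">
  <tuple id="sg89ae">
    <status>
      <basic>open</basic>
    </status>
    <note>I'm in London at the moment</note>
  </tuple>
</presence>"""

root = ET.fromstring(document)

entity = root.get("entity")                                   # attribute value
basic = root.find("p:tuple/p:status/p:basic", PIDF_NS).text   # element value
note = root.find("p:tuple/p:note", PIDF_NS).text

print(root.tag)   # the tag name includes the namespace in braces
print(entity, basic, note)
```

A lookup without the namespace, for example root.find("tuple"), would not find the element, because the default namespace declared by xmlns applies to all child elements.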

12.2 The XML Configuration Access Protocol (XCAP)

12.2.1 XCAP Overview
When configuration data are defined in XML documents, the first issue is that a user creates an XML document and wants to upload it to a server. The server will use the XML document to personalize the offered service. The best candidate for uploading a document is HTTP, since it provides the POST and PUT methods for transferring files from a client to a server. It also contains a GET method for downloading a document from a server. However, plain HTTP is not flexible enough for manipulating configuration data because it is restricted to whole documents. For configuration tasks it is often required to modify only a small piece of data in an XML document. This is where XCAP 62 comes in. XCAP consists of a set of conventions and rules for using HTTP to upload and download complete XML documents, or portions of them, to and from a server. Strictly speaking, XCAP is not a new protocol but a set of conventions for using HTTP to manage remotely stored XML documents. Figure 38 shows the schematic representation of the protocol stack used by XCAP.


Figure 38: The XCAP protocol stack

XCAP provides a client with the means to read, write and modify XML configuration data remotely stored on a server. The configuration data may be a complete XML document, an element or an attribute. XCAP only defines conventions that map XML documents and their components (elements, attributes) to HTTP URIs. Figure 39 shows an example of an XCAP request.

61 RFC 3986: Uniform Resource Identifier (URI): Generic Syntax
62 RFC 4825: The Extensible Markup Language (XML) Configuration Access Protocol (XCAP)

    PUT HTTP/1.1
    Content-Type: application/resource-lists+xml
    Content-Length: 460

    <?xml version="1.0" encoding="UTF-8"?>
    <resource-lists xmlns:xsi="">
      <list name="family" uri="" subscribeable="true">
        <entry name="Bob" uri="">
          <display-name>Bob</display-name>
        </entry>
        <entry name="Cynthia" uri="">
          <display-name>Cynthia</display-name>
        </entry>
      </list>
    </resource-lists>

Figure 39: Example of an XCAP operation

The above XCAP operation (HTTP PUT request) is used by Alice to create a new presence list named family. The list contains the members of her family. Two URIs are initially added to the list: Bob's and Cynthia's. XCAP defines two new functional elements: an XCAP client and an XCAP server. They are depicted in Figure 40.

Figure 40: XCAP functional elements (diagram: the XCAP client sends HTTP requests to the XCAP server and receives HTTP responses)

An XCAP client is an HTTP/1.1 compliant client that supports the rules and conventions specified by XCAP. It sends HTTP requests and receives HTTP responses. An XCAP server is an HTTP/1.1 compliant server that supports the rules and conventions specified by XCAP. It receives HTTP requests and sends HTTP responses.


12.2.2 XCAP Application usage

XCAP is a generic protocol which can be used for different purposes related to application data configuration based on XML documents. In the case of the presence service (see chapter 4.1), for example, XCAP is used for three applications:
- to define the list of buddies in whose presence status the user is interested,
- to control whether watchers can see all or only part of the presence information,
- to manipulate presence documents explicitly (hard state information).

In the case of a centralized conferencing service the creator of a conference can use XCAP to configure the list of participants. Due to this versatility XCAP uses the concept of application usage. An application usage defines how a particular application uses XCAP to interact with an XCAP server. Each application usage is uniquely identified by an AUID (Application Unique ID). The AUID is a string which is included in the HTTP URIs that identify XCAP resources (see next chapter). There are standardized and vendor proprietary AUIDs. For the above mentioned XCAP usage for the presence service the following application usages have been standardized:
- XCAP application usage for resource lists 63
- XCAP application usage for presence authorization 64
- XCAP application usage for manipulating presence documents 65

12.2.3 XCAP URIs

XCAP is able to manage whole XML documents, XML elements, XML element values and XML attributes. Each of them is considered a resource and is identified by a specific HTTP URI (XCAP URI). Figure 41 shows two example XCAP URIs addressing whole documents. The URIs start with http:// followed by the hostname of the server and the directory where the documents are located. This part is the XCAP root locator. The root locator is followed by the document selector. The document selector starts with the application usage (AUID) and one of two possible subtrees: global or users. The global subtree identifies documents which are common to all users, whereas the users subtree applies to documents specific to a user. In the latter case the name of the user is appended. Then the actual XML document follows.

63 RFC 4826: Extensible Markup Language (XML) Formats for Representing Resource Lists
64 RFC 5025: Presence Authorization Rules
65 RFC 4827: An XCAP Usage for Manipulating Presence Document Contents

Figure 41: XCAP URI consisting of XCAP root and document selector (diagram labels: the XCAP root locator covers hostname and directory; the document selector covers AUID, subtree users and XML document)

The above XCAP URIs address whole XML documents. When a specific element within the XML document is selected (not shown above), the node separator /~~/ follows, then the node hierarchy and optionally the addressed attribute. A valid XCAP URI which addresses a specific entry in a resource list might be (all in one line):
http://xcap-root@net1.test/root/resource-lists/users/sip:alice@net1.test/resource-list.xml/~~/list[3]/entry[@uri=sip:dave@net1.test]

This XCAP URI addresses, within the third <list> element, the <entry> element whose uri attribute has the value sip:dave@net1.test. It might be used in an HTTP PUT request to create or replace the specific element or in an HTTP DELETE request to delete the element.
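The composition of an XCAP URI from root locator, document selector and node selector can be sketched in Python as shown below. The helper name xcap_uri and the root http://xcap.example.com/root are assumptions for this example. The sketch also assumes, following RFC 4825, that the attribute value in a predicate is written in quotes and that characters such as brackets and quotes in the node selector are percent-encoded when the URI is placed into an HTTP request.

```python
from urllib.parse import quote


def xcap_uri(root, auid, user, document, node_selector=None):
    """Build an XCAP URI for a document in the 'users' subtree.

    root          -- XCAP root locator (hostname and directory)
    auid          -- application usage, e.g. 'resource-lists'
    user          -- user identifier appended below the 'users' subtree
    document      -- file name of the XML document
    node_selector -- optional path to an element or attribute
    """
    uri = f"{root.rstrip('/')}/{auid}/users/{quote(user, safe=':@')}/{document}"
    if node_selector:
        # "/~~/" separates the document selector from the node selector.
        uri += "/~~/" + quote(node_selector, safe="/@=:")
    return uri


doc_uri = xcap_uri("http://xcap.example.com/root", "resource-lists",
                   "sip:alice@net1.test", "resource-list.xml")
node_uri = xcap_uri("http://xcap.example.com/root", "resource-lists",
                    "sip:alice@net1.test", "resource-list.xml",
                    'list[3]/entry[@uri="sip:dave@net1.test"]')
print(doc_uri)
print(node_uri)
```

In the second URI the brackets appear as %5B and %5D and the quotes as %22, which is the form an HTTP client would actually put on the request line.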

12.2.4 Entity Tags and conditional operations

When an XCAP client has fetched an XCAP element from an XCAP server, it cannot be sure some time later that the content of the XCAP element is still valid. Another XCAP client might have changed the content, or it might have been changed by some other mechanism. To express the validity of an XCAP resource the entity tag mechanism from HTTP is re-used. An entity tag is an opaque string of characters that is associated with the content of a resource. It is a kind of fingerprint. Entity tags are transported in an ETag header field. Figure 42 shows the mechanism used in HTTP. Between the first and the second GET request some (maybe external) action changes the content. The corresponding HTTP responses show the changed ETag header field.


    Client: GET resource
    Server: 200 OK, ETag: 1, MIME body
    (content changes)
    Client: GET resource
    Server: 200 OK, ETag: 2, MIME body

Figure 42: Entity tags in HTTP

Entity tags are used in conditional HTTP requests and also in XCAP operations. To avoid unnecessary download of data (MIME body), conditional HTTP requests can be used. Figure 43 shows conditional HTTP requests which use the If-Match and If-None-Match header fields. The response to the first GET request in the figure does not include a MIME body because the client's assumption of ETag value 2 is still correct (304 Not Modified response). When the client now updates the resource, it gets a new ETag value in the response. Then some change happens and a new ETag is assigned. When the client updates the resource again, referring to the outdated ETag, it gets an error response (412 Precondition Failed) and has to fetch the resource again before updating.

The mechanism of conditional HTTP requests can be re-used by XCAP, which uses HTTP as the underlying protocol. Conditional XCAP requests are very useful. Before the client adds a new friend to the presence list, it should make sure that it already has the latest version of the presence list. If it does not, the operation might lead to an undesired result.


    Client: GET resource, If-None-Match: 2
    Server: 304 Not Modified
    Client: PUT resource, If-Match: 2, MIME body
    Server: 200 OK, ETag: 3
    (content changes, new ETag: 4)
    Client: PUT resource, If-Match: 3, MIME body
    Server: 412 Precondition Failed
    Client: GET resource, If-None-Match: 2
    Server: 200 OK, ETag: 4, MIME body

Figure 43: Conditional HTTP requests
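The behaviour shown in Figure 43 can be replayed with a small in-memory model. The following Python sketch does not talk HTTP at all; it only imitates how a server might evaluate the If-None-Match and If-Match header fields. The class name, the method names and the use of a simple counter as entity tag are assumptions for this illustration (real entity tags are opaque strings).

```python
class Resource:
    """Minimal model of a server-side resource with an entity tag."""

    def __init__(self, body, version=1):
        self.body = body
        self.version = version          # a counter stands in for the opaque ETag

    @property
    def etag(self):
        return str(self.version)

    def get(self, if_none_match=None):
        """Return (status, etag, body); 304 carries no body."""
        if if_none_match == self.etag:
            return 304, self.etag, None
        return 200, self.etag, self.body

    def put(self, body, if_match=None):
        """Update only if the client's ETag is still current."""
        if if_match is not None and if_match != self.etag:
            return 412, None, None
        self.body = body
        self.version += 1               # every change yields a new ETag
        return 200, self.etag, None


res = Resource("list v2", version=2)
step1 = res.get(if_none_match="2")               # client copy is still valid
step2 = res.put("list v3", if_match="2")         # update succeeds, new ETag
res.put("changed by another client")             # external change
step3 = res.put("list v3b", if_match="3")        # outdated ETag is rejected
step4 = res.get(if_none_match="2")               # fetch the current content
print(step1, step2, step3, step4)
```

After the rejected update the client would use the body and ETag from the last GET as the new basis for its modification.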

12.2.5 Subscriptions to changes in XML documents

In general, different clients may be authorized to modify a given XML document. This may lead to the problem that an update of an XML document is not recognized by a client. Imagine that an XCAP client fetches the current list of buddies when the client starts. Some time later the buddy list may be updated by the same user via another XCAP client. The first client will not be notified of the changes made by the second client. This is a typical problem in situations where an XML document is accessed from different devices such as a computer and a mobile device. A possible solution to this problem is to fetch the XML document periodically. This limits the time a client works with an outdated document to the polling interval, but causes additional XCAP traffic.


A more accurate solution is offered by the combination of two specifications:
- The XCAP-Diff Format 66
- The XCAP-Diff event package 67

These two specifications provide a subscription/notification mechanism to keep one or more XML documents synchronized with those stored on an XCAP server. The XCAP-Diff Format specifies an XML format to express changes in an XML document, and the XCAP-Diff event package enables automatic notification in case the content changes.

The XCAP-Diff mechanism allows the terminal to subscribe not only to changes in a whole document, but also to changes in a particular element or attribute of an XML document. Furthermore, the subscriber can issue a subscription to a collection of XML documents, elements and attributes, even if they are contained in different XML documents. The list of resources to be watched may be maintained in another XML document called a resource list. This list is then referenced in the message body of the SUBSCRIBE request.

There are several ways in which the server can express the differences. The client may select a specific handling by using the diff-processing parameter specified for the Event header field. The diff-processing parameter may take one of three values:
- no-patching: if the subscription refers to a whole XML document, the document is not included in the notification, only the new entity tag.
- xcap-patching: the client is interested in the actual changes, also in the case of a subscription to whole documents.
- aggregate: the server may aggregate several updates into a single notification. Whether aggregation is applied and how many updates are aggregated is a matter of local policy.

An example subscription to XML document changes is shown in Figure 44. Please note that in this operation XCAP and SIP operations are combined.

66 RFC 5874: An XML Document Format for Indicating a Change in XCAP Resources
67 RFC 5875: An XCAP Diff Event Package

    Client: SUBSCRIBE, Event: xcap-diff, [resource-list]
    Server: 200 OK
    Server: NOTIFY, Event: xcap-diff, [Initial XCAP-Diff document]
    Client: 200 OK
    (content changes)
    Server: NOTIFY, Event: xcap-diff, [XCAP-Diff document]
    Client: 200 OK

Figure 44: Subscription to changes in XML documents


13 NAT and Firewall Traversal

SIP and all its protocol extensions offer a perfect toolbox for session oriented services on the Internet. But a precondition for SIP and all the extensions to work is a clean end-to-end architecture of the Internet 68. This means that all the application logic is implemented in end systems (user agents, proxy servers) and the transport network in between only takes care of delivering the IP packets transparently. But the reality of the public Internet today heavily disturbs the concept of end-to-end transparency. The reasons for this are network address translation (NAT) elements and firewalls, typically implemented in a router.

Almost two decades ago it became apparent that the address space was going to be exhausted due to the tremendous growth of the Internet. Many enterprises also wanted to use Internet technology for their internal networks, but IP addresses became a scarce resource. A number of activities were started in those days to overcome the problem:
- Provision of an enhanced addressing space in IPv6
- Classless Inter-domain Routing (CIDR) 69
- Definition of a reserved IPv4 address space for private internets in RFC 1918 70

The usage of the private address space offered by RFC 1918 requires NAT. It was initially considered to be a short term solution only, but the technique is meanwhile used extensively due to its inherent security advantages. With private address space the main issue of address exhaustion has been reduced so significantly that the introduction of IPv6 (the long term solution) is still proceeding only at a very slow pace. The consequence is the loss of transparency of the Internet and the resulting issues with multimedia protocols and SIP in real life.

13.1 Network Address Translation

The basic mechanism of Network Address Translation (NAT) is quite simple. It comes in two flavours: NAT and NAPT 71. In the case of NAT the private IP address is mapped to a public IP address irrespective of the port numbers used. But the number of hosts in the private network which may communicate with hosts on the public network simultaneously is limited by the number of public IP addresses assigned: only as many hosts in the private network may communicate with hosts on the public network as public IP addresses are available.

68 RFC 1958: Architectural Principles of the Internet
69 RFC 4632: Classless Inter-domain Routing (CIDR)
70 RFC 1918: Address Allocation for Private Internets
71 RFC 3022: Traditional IP Network Address Translator (Traditional NAT)

This limitation is avoided by also using different port numbers as an additional addressing layer. This method is called NAPT (Network Address and Port Translation) and is the predominant NAT mechanism used today. The principle of NAPT is well known and shown in Figure 45. Two clients in the private network with different IP addresses use the same port (5060) to communicate with the SIP server on the public Internet. Requests sent by both clients are mapped to the public IP address of the NAPT router, and the NAPT mechanism assigns a different source port to each (5060, 23544). The NAPT router keeps a mapping table to forward responses from the SIP server to the correct client. The task of a NAPT box is to create a mapping if required and to exchange IP addresses and port numbers in the packet headers accordingly when packets traverse the NAPT box. The NAPT mapping usually has a restricted lifetime (e.g. 2 minutes) and must be refreshed regularly by sending packets between client and server.

[Diagram: Client 1 and Client 2 each send a request with source port 5060 and destination port 5060 through the router with NAPT. Source address mapping (internal IP to external IP): Src1 5060 to 5060, Src2 5060 to 23544. The server's responses have source port 5060 and destination ports 5060 (Dst1) and 23544 (Dst2).]

Figure 45: Principle mechanism of NAPT
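The mapping table of Figure 45 can be modelled in a few lines of Python. The sketch below is a toy model under stated assumptions: the class name Napt, the private addresses 192.168.1.10 and 192.168.1.11, the public address 203.0.113.1 and the start value of the dynamic port range are all invented for the example. It keeps the source port if it is still free on the public side and otherwise assigns the next free port; the lifetime of mappings and the rewriting of real packets are left out.

```python
class Napt:
    """Toy model of a NAPT mapping table (no timeouts, no real packets)."""

    def __init__(self, public_ip, first_dynamic_port=20000):
        self.public_ip = public_ip
        self.out = {}     # (private_ip, private_port) -> public_port
        self.back = {}    # public_port -> (private_ip, private_port)
        self._next = first_dynamic_port

    def _allocate(self):
        # Skip ports that are already used on the public side.
        while self._next in self.back:
            self._next += 1
        return self._next

    def outbound(self, src_ip, src_port):
        """Translate the source of an outgoing packet, creating a mapping if needed."""
        key = (src_ip, src_port)
        if key not in self.out:
            port = src_port if src_port not in self.back else self._allocate()
            self.out[key] = port
            self.back[port] = key
        return self.public_ip, self.out[key]

    def inbound(self, dst_port):
        """Find the private destination for an incoming packet, or None."""
        return self.back.get(dst_port)


napt = Napt("203.0.113.1", first_dynamic_port=23544)
client1 = napt.outbound("192.168.1.10", 5060)
client2 = napt.outbound("192.168.1.11", 5060)
print(client1, client2)
print(napt.inbound(23544))   # response for client 2
print(napt.inbound(9999))    # no mapping: the packet would be dropped
```

The last call illustrates why a host behind NAPT is not reachable from outside unless a mapping has been created by outgoing traffic first.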

13.2 Firewalls
Firewalls are typically implemented in the same equipment as NAT. That means a NAT box usually also has firewalling capabilities, and both functions cannot be controlled independently. This fact led to a specific categorization model in the past (see chapter 13.5.1 on page 78), which unfortunately did not hold. The behavior of NAT boxes turned out to be unpredictable in many cases, and therefore the method of determining a NAT characteristic as used in STUN (see chapter 13.5.2 on page 79) should not be used any longer.

13.3 Problems caused by NAT and Firewall Traversal

The principal problems caused by NAT have been well known 72 for many years. Despite this, protocol enhancements that help SIP to cope with NAT were not in focus from the beginning and were added only a few years later. Maybe the designers of SIP expected that IPv6 would be deployed in time and that NAT mechanisms would therefore become obsolete.

Why is NAT so bad for SIP? The root cause of the problem is that SIP uses numeric IP addresses within the protocol payload in several positions 73. The typical NAT mechanism in a NAPT box only replaces the IP addresses and port numbers in the packet headers but usually does not touch addresses inside the protocol. Figure 46 shows an INVITE request originating from a private network behind NAT. The critical addresses within the request are marked in red. These are addresses from the private address space which cannot be routed outside of the private network. The problems caused by NAT in detail are:
- Via header field: Responses cannot be sent back to the client by the SIP network element that received the request.
- Contact header field: The contact address cannot be used for direct communication between UAC and UAS during the dialog.
- SDP: The advertised address and port cannot be used for receiving media.

The consequence is that signaling and media streams are impacted, which sometimes results in:
- Signaling only in one direction
- Session setup without media connection (ghost ring)
- Unidirectional media, etc.

Without additional effort it is impossible to use SIP in NATed network environments. During the several years of SIP standardization various solutions have been proposed and some add-ons to SIP have been defined. The next chapters explain the most important enhancements and protocol extensions in this area. We can find enhancements on the user agent side, enhancements within the network, or solutions impacting both sides including the use of additional servers.
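The three problem areas listed above (Via, Contact and SDP) can be made visible with a short Python sketch that searches a SIP message for addresses from the private ranges of RFC 1918 (10.0.0.0/8, 172.16.0.0/12, 192.168.0.0/16). The function name and the sample message with its address 192.168.1.10 are assumptions for this illustration; the sketch handles IPv4 literals only and is not a SIP parser.

```python
import ipaddress
import re

# Private address ranges defined in RFC 1918.
RFC1918 = [ipaddress.ip_network(n)
           for n in ("10.0.0.0/8", "172.16.0.0/12", "192.168.0.0/16")]
IPV4 = re.compile(r"\b(?:\d{1,3}\.){3}\d{1,3}\b")


def private_addresses(sip_message):
    """Return (field, address) pairs for private IPv4 literals in Via, Contact and SDP."""
    findings = []
    for line in sip_message.splitlines():
        if not line.lower().startswith(("via:", "contact:", "c=", "o=")):
            continue
        field = line[:2] if line[1:2] == "=" else line.split(":", 1)[0]
        for literal in IPV4.findall(line):
            try:
                addr = ipaddress.ip_address(literal)
            except ValueError:
                continue                  # not a valid IPv4 address
            if any(addr in net for net in RFC1918):
                findings.append((field, literal))
    return findings


# Hypothetical request from a client behind NAT.
sample = "\r\n".join([
    "INVITE sip:bob@example.com SIP/2.0",
    "Via: SIP/2.0/TCP 192.168.1.10:5060;branch=z9hG4bK-1",
    "Contact: <sip:klaus@192.168.1.10:5060;transport=TCP>",
    "Content-Type: application/sdp",
    "",
    "v=0",
    "o=- 7 2 IN IP4 192.168.1.10",
    "c=IN IP4 192.168.1.10",
    "m=audio 63912 RTP/AVP 0",
])
findings = private_addresses(sample)
print(findings)
```

Every entry in the result marks a place where a receiver outside the private network would be given an address it cannot reach.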


72 RFC 2663: IP Network Address Translator (NAT) Terminology and Considerations; RFC 3027: Protocol Complications with the IP Network Address Translator
73 From an expert point of view one can argue that the layering rules of protocols have been violated by SIP. Each layer should use its own addressing mechanism and not re-use addresses of a lower layer in an upper layer.

    INVITE SIP/2.0
    Via: SIP/2.0/TCP;branch=z9hG4bK-f275c654cd6b756d
    Max-Forwards: 70
    To: "Franz Edler"<>
    From: "Klaus Berner"<>;tag=3f542204
    Call-ID: NWU4MDQwNzVjODBhMjhkMjdhMDkxMjhlODkxMGE3NDI.
    Contact: <sip: klaus.berner@;transport=TCP>
    CSeq: 1 INVITE
    Allow: INVITE, ACK, CANCEL, OPTIONS, BYE, REFER, NOTIFY, MESSAGE, SUBSCRIBE
    Content-Type: application/sdp
    User-Agent: eyeBeam release 1100z stamp 47739
    Content-Length: 417

    v=0
    o=- 7 2 IN IP4
    s=CounterPath eyeBeam 1.5
    c=IN IP4
    t=0 0
    m=audio 63912 RTP/AVP 107 100 106 6 0 105 8 18 3 5 101
    a=fmtp:18 annexb=yes
    a=fmtp:101 0-15
    a=rtpmap:107 BV32/16000
    a=rtpmap:100 SPEEX/16000
    a=rtpmap:106 SPEEX-FEC/16000
    a=rtpmap:105 SPEEX-FEC/8000
    a=rtpmap:18 G729/8000
    a=rtpmap:101 telephone-event/8000
    a=sendrecv
    a=x-rtp-session-id:DB819CF67F894567AC7C1F0F9763A62C

Figure 46: SIP INVITE request originated from behind NAT

The major drawback of most of these solutions is their brittleness: some of the solutions do not work in all environments, some need special configuration and experience on the client side and/or special resources (servers) in the network, and it usually requires a lot of expertise to predict whether a solution will work in a given network constellation. This is the bad news. It is therefore no surprise that the efforts of the last years concentrated on finding and defining an easy to configure one-size-fits-all solution for NAT and firewall traversal. The good news is that such a solution has now been defined. It is a combination of two RFCs, which will be explained later:
- Client initiated connections: see chapter 13.6.5 on page 92
- Interactive Connectivity Establishment: see chapter 13.6.3 on page 86

But before explaining this solution some of the other methods will be mentioned, because they are still in use today and are also applied in IMS [74].


[74] IMS = IP Multimedia Subsystem


13.4 SIP Protocol Enhancements

During the first years of bad experience with NAT and FW traversal for SIP some possible gaps have been closed by enhancing the protocol.

13.4.1 Symmetric Response Routing

One possible gap was identified by analyzing how the Via header field is created. The Via header field indicates the path taken by the request so far and the path that should be followed when routing responses. A Via header field value contains the transport protocol used to send the message, the client's host name or network address and possibly the port number at which it wishes to receive responses. A Via header field value may also contain parameters such as "received" and "branch", among others. A typical Via header field may look like
Via: SIP/2.0/UDP;branch=z9hG4bK87asdks7

The rule for a server (the next hop, which has to re-use the Via header field for routing responses) that receives a request is as follows: When the server receives a request it must examine the value of the "sent-by" parameter in the top Via header field value. If the host portion of the "sent-by" parameter contains a domain name, or if it contains an IP address that differs from the packet source address, the server must add a "received" parameter to that Via header field. This parameter must contain the source address from which the packet was received. This assists the server in sending the response, since it must be sent to the source IP address from which the request came.

In case of NAT this means that an outbound proxy server will always add the external IP address from which it received the request to the Via header. But one important parameter for routing responses back successfully is still missing: the port number. RFC 3581 [75] fills this gap. This protocol extension defines a new parameter for the Via header field, called rport. When a client adds an empty rport parameter to the Via header field of a request, the server must also enter the port from which the request originated. The combination of the received and rport parameters then contains both the IP address and the port number of the external address in case of NAT.

Applying this rule to an example: a client behind NAT adds an empty rport parameter to its request.
Via: SIP/2.0/UDP;rport;branch=z9hG4bK87asdks7

The server then complements the Via header field with the IP address and port number from which it received the request:
Via: SIP/2.0/UDP; received=;rport=9876;branch=z9hG4bK87asdks7

The response will then be sent back to port 9876 of the address given in the received parameter. The rport procedure is initiated by the client when it adds an empty rport parameter. Therefore the addition of an rport parameter is usually controlled via a provisioning parameter of the user agent. Note that this extension is only necessary in case of UDP. If TCP is used as transport protocol, responses are always sent on the connection set up by the request.

[75] RFC 3581: An Extension to SIP for Symmetric Response Routing
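The server-side rule described above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the function name add_received_rport is hypothetical, the addresses are documentation addresses invented for the example (the addresses of the original example are not reproduced here), and the Via value is split naively, ignoring IPv6 literals and quoted parameters. Following RFC 3581, the sketch adds received whenever rport is present.

```python
def add_received_rport(via_value: str, src_ip: str, src_port: int) -> str:
    """Rewrite the top Via value the way a server might on receipt of a request.

    via_value: e.g. "SIP/2.0/UDP 10.0.0.5:5060;rport;branch=z9hG4bK87asdks7"
    src_ip, src_port: source address and port of the received packet.
    """
    protocol, _, rest = via_value.partition(" ")
    sent_by, *params = [part.strip() for part in rest.split(";")]
    host = sent_by.rsplit(":", 1)[0] if ":" in sent_by else sent_by

    rewritten = []
    client_asked_for_rport = False
    for param in params:
        if param == "rport":                  # empty rport: fill in the source port
            rewritten.append(f"rport={src_port}")
            client_asked_for_rport = True
        else:
            rewritten.append(param)

    # received is added if sent-by differs from the packet source,
    # and (per RFC 3581) always when the client asked for rport.
    if client_asked_for_rport or host != src_ip:
        rewritten.insert(0, f"received={src_ip}")

    return f"{protocol} {sent_by};" + ";".join(rewritten)


if __name__ == "__main__":
    via = "SIP/2.0/UDP 10.0.0.5:5060;rport;branch=z9hG4bK87asdks7"
    print(add_received_rport(via, "192.0.2.1", 9876))
```

The response is then sent to the address in received and the port in rport instead of the private sent-by value.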

13.4.2 Symmetric RTP/RTCP

Another extension improves the chance that a media stream successfully traverses a NAT even when the private IP address and port are advertised in SDP. RFC 4961 [76] recommends using one UDP port pair for both communication directions of bidirectional RTP and RTP Control Protocol (RTCP) sessions, commonly called "symmetric RTP" and "symmetric RTCP". If a client uses the same port for both directions of media, the session partner can recognize media packets from it even though their source IP address and port do not match the IP address and port advertised in SDP (due to NAT mapping). The trick is that a client which receives such media packets should ignore the IP address and port of the SDP and use the observed source IP address and port as target address for sending its own media stream. In practice a client sometimes starts sending media packets to the address and port received in SDP, and when it receives the first media packet from the peer from a different IP address and port than the advertised one, it switches to the new address and port (from red to blue). Figure 47 shows the RTP/RTCP addressing situation.

[Figure 47 labels: private IP-address/port, external IP-address/port, public IP-address/port]

Figure 47: Symmetric RTP/RTCP


[76] RFC 4961: Symmetric RTP / RTP Control Protocol (RTCP)


Be aware that this trick only works if one session partner is on the public Internet (or uses a correctly mapped public address). If that partner is also behind NAT (green line) the trick won't work, because then it will not receive any media packet. Symmetric RTP/RTCP behavior is also a parameter which is usually provisioned at a SIP client.
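The switch from the advertised to the observed address can be sketched as follows. This is a minimal illustration, not taken from any real SIP stack: the class name MediaTarget and its methods are hypothetical, addresses are plain (ip, port) tuples, and the example addresses are invented documentation addresses.

```python
class MediaTarget:
    """Tracks where outgoing RTP should be sent for one media stream."""

    def __init__(self, sdp_address):
        # Start with the address/port advertised in the peer's SDP.
        self.target = sdp_address
        self.switched = False

    def on_packet_received(self, source_address):
        # If media arrives from a different address/port than advertised,
        # a NAT has rewritten it: send to the observed source from now on.
        if source_address != self.target:
            self.target = source_address
            self.switched = True


if __name__ == "__main__":
    stream = MediaTarget(("10.0.0.5", 63912))       # private address from SDP
    stream.on_packet_received(("192.0.2.1", 40000))  # NAT-mapped source
    print(stream.target, stream.switched)
```

Real implementations usually also decide when such a switch is acceptable (for example only for the first packets of a stream), which this sketch leaves out.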

13.4.3 RTCP attribute in SDP

SDP is used to describe the parameters of media streams used in multimedia sessions. When a session requires multiple ports (typically for RTP and RTCP), SDP assumes that these ports have consecutive numbers. However, when the session crosses a network address translation device whose port mapping does not preserve contiguity, the ordering of ports can be destroyed by the translation. To handle this situation RFC 3605 [77] defines an extension attribute to SDP. An enhanced SDP could then look like:
m=audio 49170 RTP/AVP 0
a=rtcp:53020
In this case RTCP uses an explicitly signaled port (53020) instead of the default port 49171.
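How a receiver of such an SDP might pick the RTCP port can be sketched as follows. The function name rtp_rtcp_ports is hypothetical, and the parsing is deliberately simplified: it ignores the optional address part of a=rtcp and the port-count syntax of the m= line.

```python
def rtp_rtcp_ports(sdp: str) -> dict:
    """Return {media type: (rtp_port, rtcp_port)} for each m= line."""
    ports = {}
    current = None
    for raw in sdp.splitlines():
        line = raw.strip()
        if line.startswith("m="):
            media, port = line[2:].split()[:2]
            current = media
            # Default assumption of SDP: RTCP uses the next higher port.
            ports[current] = (int(port), int(port) + 1)
        elif line.startswith("a=rtcp:") and current is not None:
            # Explicit RTCP port signaled via the RFC 3605 attribute.
            rtcp_port = int(line[len("a=rtcp:"):].split()[0])
            ports[current] = (ports[current][0], rtcp_port)
    return ports


if __name__ == "__main__":
    print(rtp_rtcp_ports("m=audio 49170 RTP/AVP 0\na=rtcp:53020"))
```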

13.5 Classical NAT and FW Traversal Solutions

In the early years of working on viable solutions for NAT and firewall traversal the first step was to analyze the behavior of NAT boxes, which usually combine NAT and firewall behavior. The outcome of the analysis was a categorization of NAT/FW behavior into four classes. For three of the classes (cone NAT behavior) a comfortable solution on the client side was found by using an external STUN server. The fourth class (symmetric NAT) cannot be solved by the client alone, and in this case an additional media relay server (TURN server) is necessary. After some time of experience with STUN and TURN it turned out that in reality the behavior of NAT boxes is not as predictable as assumed for the STUN solution (e.g. NAT boxes sometimes change their behavior during operation). Therefore the methodology has been thoroughly reworked, and the outcome is the ICE methodology (see chapter 13.6.4). But the historical STUN based solutions are still in use and therefore the methodology is explained below.


[77] RFC 3605: RTCP attribute in SDP


13.5.1 NAT and FW categorisation

The STUN methodology (see next chapter) assumes that NAT boxes behave according to one of the following four categories:
- Full Cone
- Restricted Cone
- Port Restricted Cone
- Symmetric

The three cone NAT categories are shown in Figure 48. The common characteristic of all three flavours is that the NAT mapping only depends on the IP address and port on the private side (IP:, Port: 8000). Each host on the public side may reach this IP address and port using the same mapping (IP:, Port: 12345). The difference between the three flavours is the firewall behavior, which is also shown in Figure 48. In case of Full Cone there is no firewall at all and therefore no restriction: the inside host is always reachable from any outside host. Restricted Cone means that the inside host is only reachable from an outside host if the inside host has sent a packet to that outside host within a certain time interval. The IP address of the outside host alone is sufficient for traversing the firewall. If the port of the previously sent packet must match as well, we have the Port Restricted Cone behavior.

[Figure 48: NAT enabled firewall between private and public address space, with Host A and Host B; the private IP address/port (port 8000) is mapped to the public IP address/port (port 12345). Annotations:]
- Full Cone: Hosts on the public side may always reach the host on the private side using the mapped address/port.
- Restricted Cone: Hosts on the public side may only reach the host on the private side if the private host has sent a packet to their IP address before.
- Port Restricted Cone: Hosts on the public side may only reach the host on the private side if the private host has sent a packet to their IP address and port before.

Figure 48: Cone NAT behavior

All three cone variants are SIP friendly, because the mapping of an internal IP-address/port to an outside IP-address/port stays the same once it is known.


In contrast to that we have the Symmetric NAT behavior as shown in Figure 49.

[Figure 49: NAT enabled firewall between private and public address space; the same private IP address/port (port 8000) is mapped to port 12345 towards Host A and to port 45678 towards Host B. Annotation:]
Symmetric NAT: The IP-address/port mapping of the same inside IP-address/port is destination dependent. The mapping cannot be predicted. If, for example, Host B used the mapping of Host A, the packets would be dropped by the firewall, and vice versa.

Figure 49: Symmetric NAT behavior

In case of Symmetric NAT behavior the mapping is not predictable. This is very SIP unfriendly. As long as we do not know the IP-address/port of the peer, every IP-address/port put into the Via or Contact header field or into SDP will be incorrect. Unfortunately an increasing number of NAT boxes follow a more symmetric behavior because of its stronger firewall characteristic.
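The difference between the four categories can be made concrete with a toy model. The following sketch is an assumption-laden simplification: the class Nat, its method names and the port numbering are invented for illustration, mapping timers are omitted, and addresses are (ip, port) tuples with documentation addresses.

```python
import itertools


class Nat:
    """Toy model of the four NAT categories (no timers, single public IP)."""

    def __init__(self, kind):
        # kind: "full", "restricted", "port_restricted" or "symmetric"
        self.kind = kind
        self._ports = itertools.count(12345)
        self._mappings = {}   # mapping key -> public port
        self._contacted = {}  # public port -> destinations already contacted

    def outbound(self, private, destination):
        """Packet from inside to outside; returns the public source port."""
        # Cone NATs map per private address only; symmetric NATs map per
        # (private address, destination) pair.
        key = (private, destination) if self.kind == "symmetric" else private
        if key not in self._mappings:
            self._mappings[key] = next(self._ports)
        port = self._mappings[key]
        self._contacted.setdefault(port, set()).add(destination)
        return port

    def inbound_allowed(self, public_port, source):
        """Would a packet from 'source' to 'public_port' reach the inside host?"""
        contacted = self._contacted.get(public_port)
        if contacted is None:
            return False                       # no mapping exists
        if self.kind == "full":
            return True                        # no filtering at all
        if self.kind == "restricted":
            return source[0] in {ip for ip, _ in contacted}
        return source in contacted             # port restricted and symmetric


if __name__ == "__main__":
    inside, host_a, host_b = ("10.0.0.1", 8000), ("198.51.100.1", 5060), ("203.0.113.9", 5060)
    for kind in ("full", "restricted", "port_restricted", "symmetric"):
        nat = Nat(kind)
        print(kind, nat.outbound(inside, host_a), nat.outbound(inside, host_b))
```

For the three cone types both outbound calls return the same public port; only the symmetric NAT allocates a second one.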

13.5.2 (Classic) STUN protocol

RFC 3489 defines the Simple Traversal of UDP Through NATs protocol and methodology, called STUN [78]. The basic idea behind STUN is to use a server on the outside network which helps
a) to find the NAT mapping behavior according to the above four categories, and
b) in case of cone behavior, to get the mapping used by the NAT box.

The first step a) is called the NAT behavior discovery process. It follows a defined algorithm, roughly drawn in Figure 50. If the outcome of step a) is symmetric NAT then the mission is finished: nothing can be done from the client side. A media relay server (TURN server) has to be used in this case (see chapter 13.6.3 on page 86). But if the outcome of step a) is a cone NAT behavior, then the client queries the mapping of the different ports it plans to use (signaling port, media ports) and inserts the mapped ports into its SIP requests and responses. By using the STUN algorithm the client may self-repair the NAT situation [79] so that, seen from an outside server (registrar or proxy server), the client does not appear to be behind NAT.

[78] RFC 3489: STUN - Simple Traversal of UDP Through Network Address Translators (NATs)

[Figure 50: a client (IP: , Port: 8000) in the private address space behind a NAT enabled firewall and a STUN server in the public address space. Annotation:]
STUN server: A STUN server is located on the outside (public) network and uses two IP addresses. During NAT behaviour discovery the client asks, for example, the STUN server to reflect back a packet from its other IP address. If this packet does not arrive, the client concludes that it is behind a symmetric NAT. If the packet arrives, the client concludes that it is behind a cone NAT, and the server mirrors the mapped address in the reflected packet.

Figure 50: STUN server and algorithm

The (classic) STUN protocol defines different packet formats to control a STUN server. It uses the dedicated port number 3478 and can be used with UDP or TCP transport protocol.
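The discovery process is more fine-grained than the rough description above. The following sketch shows the decision logic as laid out in RFC 3489, reduced to a pure function over the outcomes of the individual tests. The function name classify_nat and its parameter names are hypothetical, and actually sending the test requests to a STUN server is left out.

```python
def classify_nat(local_addr, test1, test2, test1_alt, test3):
    """Classify the NAT from the outcomes of the RFC 3489 tests.

    local_addr: (ip, port) the client sends from.
    test1:      mapped (ip, port) reported by Test I, or None if no response.
    test2:      True if Test II (server replies from its other IP and port) got a response.
    test1_alt:  mapped (ip, port) from Test I repeated towards the server's second address.
    test3:      True if Test III (server replies from the same IP, other port) got a response.
    """
    if test1 is None:
        return "UDP blocked"
    if test1 == local_addr:
        # No address translation on the path.
        return "open Internet" if test2 else "symmetric UDP firewall"
    if test2:
        return "full cone"
    if test1_alt != test1:
        # Mapping depends on the destination.
        return "symmetric"
    return "restricted cone" if test3 else "port restricted cone"


if __name__ == "__main__":
    local = ("10.0.0.1", 8000)
    mapped = ("192.0.2.1", 12345)
    print(classify_nat(local, mapped, False, ("192.0.2.1", 45678), False))
```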

13.6 The perfect NAT and FW Traversal Solution

Starting with a short overview this chapter describes the final and perfect NAT and FW traversal solution. The only drawback of this solution: it is a little bit complex. As a first step the shortfall of the past, that NAT behavior was never defined, was eliminated by creating RFC 4787. This RFC lists all requirements for a NAT box to behave SIP friendly (chapter 13.6.1). The RFC comes late, because millions of NAT boxes are already in the field, but it is perhaps not too late if one believes that IPv4 will still be around for some time. In a next step it was recognized that the classic STUN based NAT traversal solution has some flaws, because not all NAT boxes behave as expected and because the protocol also has some security vulnerabilities. Then two new solutions for NAT traversal were designed, ICE and SIP outbound:
- ICE cares for traversal of media packets through NAT (chapter 13.6.4).
- SIP outbound cares for traversal of signaling through NAT (chapter 13.6.5 on page 92).

Both solutions need additional protocol support by a redefined STUN protocol (chapter 13.6.2 on page 82).


[79] This method is also called UNSAF: Unilateral Self Address Fixing.


With all these ingredients every NAT situation can now be solved automatically in the most economical way. If SIP unfriendly NAT implementations are involved, a TURN server (chapter 13.6.3 on page 86) will automatically be inserted into the media path, but this is only the last resort if other solutions fail.

13.6.1 NAT and FW Behavior Requirements

RFC 4787 [80] defines a basic terminology for describing different types of Network Address Translation (NAT) behavior when handling unicast UDP packets. It also defines a set of requirements that would allow many applications like SIP to work consistently. If NAT boxes meet the proposed set of requirements, SIP traffic will function properly when traversing the NAT. In view of the ICE methodology (chapter 13.6.4) this means that the last resort method (inclusion of a TURN server into the media stream) would not be needed anymore.

The following aspects are covered by RFC 4787. This is only a short summary of the requirements, to give an impression of which behavior aspects of a NAT box might disturb SIP traffic traversal.

Network address and port translation
- Address and port mapping should be endpoint independent.
- An overloading of mapped ports must be avoided.
- Port parity (even and odd port numbers) should be preserved.
- The lifetime of port mappings should be greater than 2 minutes (5 min. recommended).
- Port mappings should be refreshed when packets traverse from inside to outside.
- In case private address space is used on both sides an overlap must be avoided.

Packet filtering
- The filtering should be endpoint independent.

Hairpinning
- The NAT box should support hairpinning (see Figure 51 below).

Application Layer Gateway
- It should be possible to disable an ALG function.

Deterministic properties
- The NAT algorithm should be deterministic and not be changed on the fly.

ICMP Destination Unreachable behavior
- The receipt of an ICMP message should not terminate the NAT mapping.

Packet fragmentation
- The NAT box should honor the DF (Don't Fragment) bit set by the internal traffic.
- The NAT box should be able to receive in-order and out-of-order packets.

[80] RFC 4787: NAT Behavioral Requirements for Unicast UDP

[Figure 51: NAT enabled firewall between private and public address space, with the internal client B:b and its external mapped address Y:y. Annotation:]
Hairpinning behaviour: If the internal client with IP-address/port A:a sends a packet to a second internal client B:b using the external mapped address Y:y of its peer as destination, the packet should not be sent to the outside domain but rather be hairpinned internally in the NAT box. It is questionable whether a packet sent to the outside domain would be returned to the same IP address. A router usually discards such packets instead of sending them back.

Figure 51: Hairpinning behavior

13.6.2 The new STUN protocol

With RFC 5389 [81] the classical STUN has been redefined. It obsoletes the previous version of the protocol and has also changed its name: the abbreviation STUN now stands for Session Traversal Utilities for NAT instead of Simple Traversal of UDP Through NATs. The major differences between the two versions are the following:

The classic STUN protocol is a dedicated protocol to support traversal of UDP packets through NATs. It uses a dedicated port number (3478) and has two major tasks:
- discover the NAT type (one out of four types)
- get the mapping in case of cone NAT behavior
The classic STUN offers a complete solution for NAT traversal, but unfortunately does not work in all situations due to unpredictable behavior of NAT boxes.

The new STUN protocol is a protocol that serves as a tool for other protocols in dealing with NAT traversal. It can be used by an endpoint to determine the IP address and port allocated to it by a NAT. It can also be used to check connectivity between two endpoints, and as a keep-alive protocol to maintain NAT bindings. STUN works with many existing NATs, and does not require any special behavior from them.


[81] RFC 5389: Session Traversal Utilities for NAT (STUN)


STUN is not a NAT traversal solution by itself. Rather, it is a tool to be used in the context of a NAT traversal solution. It actually enables four different applications, called STUN usages:
a) It enables an endpoint to determine the IP address/port mapping used by NAT.
b) It provides a way for an endpoint to keep a NAT binding alive.
c) The protocol can be used to execute connectivity checks between two endpoints.
d) The protocol can be used to relay packets between two endpoints.
In keeping with its tool nature, the new STUN protocol defines an extensible packet format, defines operation over several transport protocols, and provides for two forms of authentication.

STUN usages

STUN is intended to be used in the context of one or more NAT traversal solutions. These solutions are known as STUN usages. Each usage describes how STUN is utilized to achieve a NAT traversal solution. A usage typically indicates when STUN messages get sent, which optional attributes to include, what server is used, and what authentication mechanism is to be used. Three usages of STUN are further described below:
- Interactive Connectivity Establishment (ICE)
- Client initiated connections (SIP outbound)
- Traversal Using Relays around NAT (TURN)

Next the protocol structure is described.


STUN protocol structure

STUN is a client/server protocol supporting two types of transactions. One is a request/response transaction in which a client sends a request to a server, and the server returns a response. The second is an indication transaction in which either agent (client or server) sends an indication which generates no response. STUN is a binary protocol. All STUN messages start with a fixed header (see Figure 52 below) that includes a STUN message type (comprising class and method), the message length, a magic cookie and a transaction ID. The class indicates whether this is a request, a success response, an error response, or an indication. The method indicates which of the various requests or indications this is. The base STUN specification defines just one method, Binding, but the TURN usage also defines the methods Allocate, Refresh, ChannelBind, Send and Data. The transaction ID is used to match requests and responses.




[Figure 52: header layout with the fields STUN Message Type, Message Length, Magic Cookie and Transaction ID (96 bits)]

Figure 52: Format of the STUN message header

After the STUN header zero or more attributes may follow. Each attribute is TLV encoded (Type-Length-Value) as shown in Figure 53.
[Figure 53: attribute layout on a 32-bit scale (bit marks 16 and 31) with the fields Type, Length and Value (variable)]

Figure 53: Format of a STUN attribute (TLV)

The STUN protocol defines a basic set of attributes and some usages define additional extension attributes.
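The header and attribute formats can be illustrated with Python's struct module. The sketch below assumes the field sizes of RFC 5389 (16-bit message type, 16-bit message length, 32-bit magic cookie 0x2112A442, 96-bit transaction ID; 16-bit attribute type and length, value padded to a multiple of 4 bytes). The function names are hypothetical, and the SOFTWARE attribute (type 0x8022) only serves as an example payload.

```python
import os
import struct
from typing import Optional

MAGIC_COOKIE = 0x2112A442
BINDING_REQUEST = 0x0001


def encode_attribute(attr_type: int, value: bytes) -> bytes:
    """TLV-encode one attribute; the value is padded to a 4-byte boundary."""
    padding = (-len(value)) % 4
    return struct.pack("!HH", attr_type, len(value)) + value + b"\x00" * padding


def encode_message(msg_type: int, attributes: bytes = b"",
                   transaction_id: Optional[bytes] = None) -> bytes:
    """Build a STUN message: 20-byte header followed by the attributes."""
    tid = transaction_id if transaction_id is not None else os.urandom(12)
    header = struct.pack("!HHI", msg_type, len(attributes), MAGIC_COOKIE) + tid
    return header + attributes


def decode_header(data: bytes):
    """Return (message type, message length, magic cookie, transaction ID)."""
    msg_type, length, cookie = struct.unpack("!HHI", data[:8])
    return msg_type, length, cookie, data[8:20]


if __name__ == "__main__":
    software = encode_attribute(0x8022, b"demo")
    message = encode_message(BINDING_REQUEST, software)
    print(len(message), decode_header(message))
```

Note that the message length counts only the attributes, not the 20-byte header.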


In the specific case of a Binding request/response transaction, a Binding Request is sent from a STUN client to a STUN server (see Figure 54). The STUN client is embedded in an application and multiplexed with the application protocol. When the Binding Request arrives at the STUN server, it may have passed through one or more NATs between the STUN client and the STUN server (in Figure 54 there are two such NATs). As the Binding Request message passes through a NAT, the NAT will modify the source transport address (that is, the source IP address and the source port) of the packet. As a result, the source transport address of the request received by the server will be the public IP address and port created by the NAT closest to the server. This is called a reflexive transport address. The STUN server copies that source transport address into an XOR-MAPPED-ADDRESS attribute in the STUN Binding Response and sends the Binding Response back to the STUN client. As this packet passes back through a NAT, the NAT will modify the destination transport address in the IP header, but the transport address in the XOR-MAPPED-ADDRESS attribute within the body of the STUN response will remain untouched. In this way, the client can learn the reflexive transport address allocated by the outermost NAT for a specific protocol.
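How the reflexive transport address is carried in the XOR-MAPPED-ADDRESS attribute can be sketched for IPv4. In RFC 5389 the port is XORed with the upper 16 bits of the magic cookie and the address with the whole cookie, so that address-rewriting middleboxes do not recognize and modify it. The function names are hypothetical, and the address 192.0.2.1:32853 is just a documentation example.

```python
import socket
import struct

MAGIC_COOKIE = 0x2112A442
FAMILY_IPV4 = 0x01


def encode_xor_mapped_ipv4(ip: str, port: int) -> bytes:
    """Build the value of an XOR-MAPPED-ADDRESS attribute (IPv4 only)."""
    address = struct.unpack("!I", socket.inet_aton(ip))[0]
    return struct.pack("!BBHI", 0, FAMILY_IPV4,
                       port ^ (MAGIC_COOKIE >> 16), address ^ MAGIC_COOKIE)


def decode_xor_mapped_ipv4(value: bytes):
    """Recover (ip, port) from the attribute value (IPv4 only)."""
    _reserved, family, xport, xaddress = struct.unpack("!BBHI", value[:8])
    if family != FAMILY_IPV4:
        raise ValueError("only IPv4 is handled in this sketch")
    port = xport ^ (MAGIC_COOKIE >> 16)
    ip = socket.inet_ntoa(struct.pack("!I", xaddress ^ MAGIC_COOKIE))
    return ip, port


if __name__ == "__main__":
    value = encode_xor_mapped_ipv4("192.0.2.1", 32853)
    print(value.hex(), decode_xor_mapped_ipv4(value))
```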


[Figure 54: an application with embedded STUN client in private network 1, connected via private network 2 to the STUN server in the public network; Binding Request and Binding Response are exchanged, and the client's server reflexive transport address is marked]

Figure 54: STUN Binding Request/Response

By using the Binding method of STUN a client can acquire its reflexive transport addresses for all its communication protocols. It may then use these addresses inside the payload of protocols and can thus communicate with its peers even if it is located behind NAT. The reflexive transport addresses are only usable if the NAT behaves well. If the NAT mapping depends on the destination IP-address/port, the NAT behaves badly, and as a consequence the acquired reflexive transport addresses will not be usable.

A very interesting feature of the new STUN protocol is that it has been designed to be included (multiplexed) in other application protocols, using the same port as the application protocol. This is necessary because at some NATs different destination ports of a packet get a different address mapping. To be able to detect the specific NAT mapping of an application protocol (e.g. SIP with port 5060) the STUN Binding request/response must be sent within the same protocol. Therefore the STUN protocol elements have to be multiplexed within the particular application protocol. The challenge was to design protocol characteristics into STUN that guarantee no code collision and allow an application to discriminate a STUN packet from other packets of the application protocol. STUN provides the following characteristics in the STUN header for this purpose:
- The protocol header starts with two zero bits.
- The message type must contain reasonable values.
- The message length must be correct.
- A specific magic cookie with a fixed value must be present at the correct position.

If these characteristics are not sufficient to distinguish the packets, STUN packets can also contain a fingerprint value carried in a protocol extension field.

STUN further defines a set of optional procedures (mechanisms) that may be applied in a specific usage of the protocol. These mechanisms include DNS discovery to locate a STUN server, a redirection technique to an alternate server, and two authentication and message integrity exchanges. The authentication mechanisms are based on a username, password, and message-integrity value. Two authentication mechanisms are defined, the long-term credential mechanism and the short-term credential mechanism. The long-term credential mechanism is based on a pre-provisioned username and password and a digest challenge/response exchange similar to HTTP. The short-term credential mechanism uses some out-of-band method (e.g. SIP signaling) to exchange a username and password between the client and the server prior to the STUN exchange.
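The header checks used for demultiplexing can be sketched as a single predicate. This is an illustration only: the function name looks_like_stun is hypothetical, the magic cookie value and the 20-byte header size are those of RFC 5389, the check of "reasonable" message type values is left out, and the optional fingerprint is not evaluated.

```python
import struct

MAGIC_COOKIE = 0x2112A442
HEADER_SIZE = 20


def looks_like_stun(packet: bytes) -> bool:
    """Heuristic check whether a received packet is a STUN message."""
    if len(packet) < HEADER_SIZE:
        return False
    msg_type, length, cookie = struct.unpack("!HHI", packet[:8])
    return (
        (msg_type & 0xC000) == 0                   # first two bits are zero
        and cookie == MAGIC_COOKIE                 # fixed value at fixed position
        and length % 4 == 0                        # attributes are 4-byte aligned
        and length == len(packet) - HEADER_SIZE    # length field matches packet
    )


if __name__ == "__main__":
    binding_request = struct.pack("!HHI", 0x0001, 0, MAGIC_COOKIE) + bytes(12)
    print(looks_like_stun(binding_request))
    print(looks_like_stun(b"INVITE sip:bob@example.com SIP/2.0\r\n"))
```

A SIP message fails the first test already, because its first character is an ASCII letter whose two most significant bits are not both zero.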

13.6.3 Traversal Using Relays around NAT (TURN)

If a host is located behind a NAT, then in certain situations (e.g. destination dependent address mapping) it can be impossible for that host to communicate directly with other hosts (peers) located behind other NATs. In these situations, it is necessary for the host to use the services of an intermediate node that acts as a communication relay. This relay server is called a TURN server and it acts as an anchor with a fixed address where media packets are relayed. The TURN [82] specification defines a protocol that allows the host to control the operation of the relay server and to exchange packets with its peers using this server.

The only reliable way to obtain a UDP transport address that can be used for corresponding with a peer through such a NAT is to make use of a relay server. The relay server sits on the public side of the NAT, and allocates transport addresses to clients reaching it from behind the private side of the NAT. These allocated transport addresses, called relayed transport addresses, are IP addresses and ports on the relay. When the relay server receives a packet on one of these allocated addresses, the relay server forwards it toward the client.

The TURN specification makes use of an extension to the STUN protocol. It allows a client to request a relayed transport address on a TURN server. Figure 55 shows the relayed transport address of a client (yellow) together with its server reflexive address obtained via the STUN Binding method (blue) and the physical transport address (light green). The client behind NAT may in principle offer three different addresses to a potential peer. This is also the basis of the ICE methodology.

A relayed transport address of a TURN server will work in any case, irrespective of how bad a NAT may be. Why does the relay method always work? Because the relay server uses a fixed address, and therefore a potential issue with an IP-address/port dependent mapping is avoided. The variability of destination addresses is shielded by the TURN server, which then relays packets to the variable destinations. But there is an obvious drawback of a TURN server: it costs network resources and causes additional delay for the media stream. Therefore a TURN server is usually used as a last resort solution.

[82] RFC 5766: Traversal Using Relays around NAT (TURN): Relay Extensions to Session Traversal Utilities for NAT (STUN)

[Figure 55: a client (application with STUN/TURN client) in the private network behind a NAT; TURN server, Peer A and Peer B in the public network. Labeled addresses: client's host transport address, client's server reflexive transport address, client's relayed transport address, TURN server address, Peer A's server reflexive transport address, Peer B's host transport address]

Figure 55: Client's Host, Server Reflexive and Relayed Transport Address


Exchanging Data with Peers

Figure 56 shows the setup of relay ports and the exchange of data between clients, peers and a TURN server. The client requests the allocation of relay ports from the TURN server. After that it may use the allocated ports to send data to its peers. The address of the peer is an attribute in the Send indication. Data received from the peer are delivered in a Data indication. Allocate request/response, Send indication and Data indication are TURN commands (STUN extensions).

The overhead of the Send and Data indications (36 bytes) is relatively high for applications like voice transport. Therefore an optimised procedure may be used in this case: the setup of data channels. Figure 57 shows the setup of a data channel on a TURN server. Data channels have to be set up via a ChannelBind request/response (in Figure 57 for Peer A). The ChannelBind request maps the destination address to a slim 4-byte header, thus enabling a more efficient data transport.

[Figure 56: message sequence between TURN client, TURN server, Peer A and Peer B: Allocate Request, Allocate Response; Send (Peer A) and Data (Peer A) indications with the corresponding data exchanged with Peer A; Send (Peer B) and Data (Peer B) indications with the corresponding data exchanged with Peer B]

Figure 56: Setup of Relay Ports and Data Exchange with Send and Data Indication


[Figure 57: message sequence between TURN client, TURN server, Peer A and Peer B: Allocate Request, Allocate Response; ChannelBind Request (Peer A to 0x4001), ChannelBind Response; data exchanged with Peer A as channel data [0x4001]; Send (Peer B) and Data (Peer B) indications for data exchanged with Peer B]

Figure 57: Setup of Relay Ports and Data Channels

The TURN protocol can be used in isolation, but is more properly used as part of the ICE (Interactive Connectivity Establishment) approach to NAT traversal.

Some final remarks on the TURN protocol: In addition to the principal mechanism of relaying packets, the protocol also includes authentication mechanisms for Allocate transactions to avoid security issues (DoS attacks). Refresh mechanisms are also defined for allocations, so that resources are not occupied endlessly in case of loss of control data.
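The slim 4-byte header used on a bound channel can be sketched as follows. The sketch assumes the ChannelData layout of RFC 5766 (16-bit channel number followed by a 16-bit payload length) and the channel number range 0x4000 to 0x7FFF. The function names are hypothetical, and the padding required over stream transports is left out.

```python
import struct


def encode_channel_data(channel: int, payload: bytes) -> bytes:
    """Prefix the payload with the 4-byte channel header."""
    if not 0x4000 <= channel <= 0x7FFF:
        raise ValueError("channel number outside the range assumed here")
    return struct.pack("!HH", channel, len(payload)) + payload


def decode_channel_data(packet: bytes):
    """Return (channel number, payload) of a channel data packet."""
    channel, length = struct.unpack("!HH", packet[:4])
    return channel, packet[4:4 + length]


if __name__ == "__main__":
    rtp_payload = bytes(160)   # e.g. 20 ms of G.711 audio
    packet = encode_channel_data(0x4001, rtp_payload)
    print(len(packet) - len(rtp_payload), "bytes overhead")
```

Because the first two bits of a channel number in this range are 01, such a packet cannot be confused with a STUN message, whose header starts with two zero bits.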

13.6.4 Interactive Connectivity Establishment

After understanding the concepts of STUN and TURN (in particular Figure 55) it should not be too difficult to understand the ICE methodology. ICE (Interactive Connectivity Establishment) [83] is the only solution for NAT traversal of UDP-based multimedia sessions that always works. ICE is based on an extension to the offer/answer model, and works by including a multiplicity of IP addresses and ports in SDP offers and answers, which are then tested for connectivity by peer-to-peer connectivity checks. The IP addresses and ports included in the SDP are the physical addresses of the host and the addresses gathered by using STUN and TURN. STUN based connectivity checks allow selecting the best (most economic) address and port pairs.


[83] RFC 5245: Interactive Connectivity Establishment (ICE) - A Protocol for NAT Traversal for Offer/Answer Protocols

Figure 58 shows the deployment scenario of an ICE based solution. There are two clients which are behind different NATs and which want to set up a media session. A precondition for ICE to be applied is an established signaling connection for each of the clients (blue arrows). This precondition can be fulfilled by following the method of client initiated connections (chapter 13.6.5 on page 92). Each client additionally has access to a STUN and a TURN server [84], which typically may be collocated. The clients use STUN Binding and TURN Allocate requests to get additional IP-address/port combinations at which they may be reachable. Together with the physical address on the interface each client will have three candidate addresses. These candidate addresses are required for each media stream the client wants to use (for RTP and RTCP).

[Figure 58: two clients behind different NATs, each with SIP signalling to a SIP server and with access to a STUN/TURN server]

Figure 58: ICE deployment scenario

Figure 59 shows again the three address types and their relationship. When Alice wants to set up a session she gathers all usable (candidate) addresses for receiving media and sends an INVITE request. This request contains a modified SDP which includes additional attributes for all the candidate addresses. One such additional attribute is shown below (line folded for readability):


[84] A scenario in which only a STUN server is available is also valid, but some NAT situations may then not be covered.

a=candidate:2 1 UDP 1694498815 45664 typ srflx raddr rport 8998

The a=candidate line contains various parameters such as the IP address and port, a type (srflx = server reflexive) and a priority (1694498815), among others [85]. When Bob receives the INVITE request he does the same: he also gathers all candidate addresses and includes the a=candidate lines in his SDP answer.

[Figure 59: a client behind NAT and a STUN/TURN server, with the local address at the client, the server reflexive address at the NAT and the relayed address at the STUN/TURN server]

Figure 59: Candidate addresses and their relationship

When both peers have exchanged their candidate addresses, they both set up possible address pairs and start connectivity tests. The list of possible address pairs is prioritised on both sides so that the most preferable pairs are tested for connectivity before the others (local addresses are preferred over server reflexive addresses, and relayed addresses have the lowest priority). The connectivity test uses a STUN Binding request/response. That means that each peer must listen with an internal STUN server on all advertised candidate addresses/ports and respond accordingly. When a connectivity test succeeds, a usable address pair has been found and further tests are stopped. The detailed procedure is much more complex and should be read in RFC 5245 if required.
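The prioritisation can be made concrete with the priority formulas of RFC 5245. The sketch below assumes the type preference values recommended there (host 126, peer reflexive 110, server reflexive 100, relayed 0) and a local preference of 65535; the function names are hypothetical. With these assumptions a server reflexive candidate for component 1 gets exactly the priority 1694498815 of the a=candidate example.

```python
# Type preferences as recommended in RFC 5245 (assumed here).
TYPE_PREFERENCE = {"host": 126, "prflx": 110, "srflx": 100, "relay": 0}


def candidate_priority(cand_type: str, component_id: int,
                       local_preference: int = 65535) -> int:
    """priority = 2^24 * type pref + 2^8 * local pref + (256 - component ID)"""
    return ((TYPE_PREFERENCE[cand_type] << 24)
            + (local_preference << 8)
            + (256 - component_id))


def pair_priority(controlling: int, controlled: int) -> int:
    """Priority of a candidate pair from the two candidate priorities."""
    g, d = controlling, controlled
    return (min(g, d) << 32) + 2 * max(g, d) + (1 if g > d else 0)


if __name__ == "__main__":
    for cand_type in ("host", "srflx", "relay"):
        print(cand_type, candidate_priority(cand_type, component_id=1))
```

Sorting the candidate pairs by descending pair priority yields the order in which the connectivity checks are attempted.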


[85] Explaining all details is out of scope of this lesson as the ICE RFC is a complex one.

13.6.5 Client initiated connections

The usual situation for a User Agent behind NAT/FW is that the User Agent can set-up connections to a registrar or proxy server but connections in the reverse direction are often not possible. This is because NAT/FW devices will only allow outgoing connections and the pinhole in the firewall is open only for a limited time in the range of typically 30sec 2min. RFC 5626 Managing Client Initiated Connections in SIP86 (also called SIP outbound) is the proposed solution to enable signaling connections also in the reverse direction (from the inbound proxy server) to the User Agent even if NAT/FW boxes are between. SIP outbound combined with the ICE methodology for media connections through NAT/FW constitute a full solution for NAT/FW traversal. The key idea of SIP outbound is that a UA creates a signaling flow when it sends a REGISTER request to the registrar/inbound proxy87. The identity of the signaling flow is maintained in the location server and reused for inbound connections by the proxy server. Remember that the usual method for an inbound proxy server is to resolve the Contact URI and forward a request irrespective of an existing connection in the reverse direction. Using SIP outbound the inbound proxy server now forwards the request over an existing flow instead of resolving the Contact URI. A flow whether it is based on UDP or TCP is identified by two parameters inserted by the user agent in the Contact header field during registration: The instance-id: a media feature tag88 in the Contact header field which uniquely identifies a specific UA instance. The reg-id: a new parameter in the Contact header field, which identifies a signaling flow towards the UA instance. The UA may register multiple times in parallel creating different flows with different reg-id values. To enable a UA to receive incoming requests through NAT/FW the UA has to connect to a server and to create a flow. 
Since the server cannot connect to the UA, the UA has to make sure that a flow is always active. This requires the UA to detect when a flow fails. Since such detection takes time and leaves a window of opportunity for missed incoming requests, the mechanism allows the UA to register over multiple flows at the same time. An inbound proxy server will recognise multiple flows towards a UA because they are identified by the same instance-id.

SIP outbound also defines two keep-alive schemes, depending on the transport protocol. The keep-alive mechanism is used to keep NAT bindings fresh and to allow the UA to detect when a flow has failed. UAs use a simple periodic message (a ping) as a keep-alive mechanism to keep their flow to the proxy or registrar alive. The ping message has to be answered by a pong message from the server, which allows the UA to detect when a flow has failed.
- For connection-oriented transports such as TCP the ping is a double carriage-return line-feed sequence (CRLF CRLF) and the pong is a single CRLF.
- For transports that are not connection-oriented the ping-pong messages are accomplished by a STUN Binding request/response transaction.

[86] RFC 5626: Managing Client Initiated Connections in SIP
[87] In the simple scenario we assume that the registrar also takes the role of the inbound proxy.
[88] For media feature tags see chapter 15.1.
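The CRLF keep-alive for connection-oriented transports can be illustrated with a minimal Python sketch. The helper names classify_keepalive and flow_failed are hypothetical, and the 10 second default for the pong timeout is an assumption chosen for illustration; a real UA would take the value from its configuration or from the specification.

```python
# Sketch only: hypothetical helpers, not part of any SIP stack.
PING = b"\r\n\r\n"  # double CRLF sent by the UA
PONG = b"\r\n"      # single CRLF returned by the server


def classify_keepalive(data: bytes) -> str:
    """Return 'ping', 'pong' or 'other' for bytes read between SIP messages."""
    if data == PING:
        return "ping"
    if data == PONG:
        return "pong"
    return "other"


def flow_failed(seconds_since_ping: float, pong_seen: bool,
                timeout: float = 10.0) -> bool:
    """Treat the flow as failed if no pong arrived within `timeout` seconds.

    The default timeout is an assumption made for this example.
    """
    return (not pong_seen) and seconds_since_ping > timeout
```

When flow_failed returns True, the UA would open a new connection and register again to re-create the flow.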

Figure 60 shows an example of a REGISTER request creating a signaling flow according to SIP outbound (RFC 5626):

REGISTER SIP/2.0
Via: SIP/2.0/TCP;branch=z9hG4bK-bad0ce-11-1036
Max-Forwards: 70
From: Bob <>;tag=d879h76
To: Bob <>
Call-ID: 8921348ju72je840.204
CSeq: 1 REGISTER
Supported: outbound
Contact: <sip:line1@;transport=tcp>; reg-id=1;
    ;+sip.instance="<urn:uuid:00000000-0000-1000-8000-000A95A0E128>"
Content-Length: 0

Figure 60: REGISTER request creating a signaling flow

SIP outbound further defines an option tag outbound, which the UA must insert in the Supported header field at registration.

Finally it should be mentioned that the above mechanism describes only the simple scenario of a co-located registrar and inbound proxy. SIP outbound may also be applied to more complex scenarios, e.g. with a pair of edge proxies located between the UA and the registrar/proxy.
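How a location server might keep track of flows can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the class FlowTable and its methods are hypothetical names, the connection is represented by an opaque handle, and the real data model of a registrar is considerably richer.

```python
from dataclasses import dataclass, field


@dataclass
class FlowTable:
    """Hypothetical flow store: (AoR, instance-id, reg-id) -> connection handle."""
    flows: dict = field(default_factory=dict)

    def register(self, aor, instance_id, reg_id, connection):
        # Remember the connection on which the REGISTER request arrived.
        self.flows[(aor, instance_id, reg_id)] = connection

    def flow_failed(self, aor, instance_id, reg_id):
        # Forget a flow once it is known to be dead.
        self.flows.pop((aor, instance_id, reg_id), None)

    def flows_for_instance(self, aor, instance_id):
        """All live flows towards one UA instance, ordered by reg-id."""
        matches = [(key[2], conn) for key, conn in self.flows.items()
                   if key[0] == aor and key[1] == instance_id]
        return [conn for _, conn in sorted(matches, key=lambda m: m[0])]
```

An inbound proxy would pick one of the returned connections to forward a request instead of resolving the Contact URI. Because all flows of one UA share the same instance-id, a second flow remains available when the first one fails.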


13.7 External and proprietary Solutions

Besides the NAT/FW traversal tools and solutions created around the SIP standardization, there are also other solutions and methods for NAT/FW traversal, which are briefly mentioned in this chapter.

13.7.1 Application Layer Gateways

Application Layer Gateways (ALGs) were, in the early days of SIP, the most natural solution for NAT/FW traversal. The basic idea of an ALG is to repair the SIP protocol at the network node where the problem has its roots: the NAT/FW box. Figure 61 shows the basic protocol structure of an ALG.

[Figure: NAT/FW box with an application (SIP) layer on top of two transport / IP / network stacks, one for each side]

Figure 61: Protocol structure of an Application Layer Gateway

An ALG in principle works as a B2BUA, because it has to modify many parameters in the user agent domain. At first sight it makes sense to implement such functionality in a NAT/FW box. The main reason is that the NAT/FW box (e.g. a DSL router for a home network) knows the address mapping created by NAT and also the FW rules that have been set up. It therefore seems quite natural that the NAT/FW box should make the necessary modifications in SIP requests and responses by replacing the address/port values accordingly and opening ports in the firewall. But this also means that the NAT/FW box now becomes application aware.

The drawback of ALGs is twofold:
- The NAT/FW box must implement and fully understand the SIP protocol. In view of the many extensions, the probability is high that some extensions are not properly supported and the end-to-end communication is broken.
- The business role of the NAT/FW vendor is a different one. The vendor usually does not have a big interest in permanently evolving the product according to the innovations produced by the SIP protocol groups.


13.7.2 UPnP
Another solution outside of standards is the Universal Plug and Play (UPnP) industry initiative. The UPnP standards were mainly pushed by Microsoft and enable a client to control a residential gateway, typically a NAT/FW router. A UPnP-enabled client can query the address mapping from the gateway via the SOAP protocol. Compared with an Application Layer Gateway, the advantage of UPnP is that the gateway does not need to implement all the protocol logic of SIP. The drawback of UPnP may be the security risk of having a home gateway that is controlled by different applications. In addition, UPnP uses broadcast messages to advertise network infrastructure information, a fact that is not well accepted by some security managers.

13.7.3 Skype
After all the complexity of NAT/FW traversal for SIP shown in the previous chapters, a question might arise: how does Skype, the well-known VoIP application, handle these issues? Skype does an excellent job in this area. No NAT/FW configuration is known where Skype media streams are blocked. But the details of how this is handled are not fully transparent, because Skype is a proprietary peer-to-peer technology and uses an encrypted protocol. There has been some research and reverse engineering of the protocol in the past [89]. The main outcomes in the area of NAT/FW were:
- Skype tunnels signaling and media streams through port 8080 (http) as a kind of last resort if no other method succeeds.
- Skype also uses some kind of TURN server in case of a bad NAT, but the servers that the media streams traverse are not provisioned by Skype; they are super-nodes. These are the hosts of some high-performance users with high-bandwidth access and a fixed IP address (with or without their knowledge).

Also with Skype we can conclude that there is no easier way to traverse bad NATs.

13.7.4 SIP Express Router

The SIP Express Router (SER) is a famous open source SIP server, which is also used at the University in the labs and as a public service. The handling of NAT/FW traversal was an issue from the beginning, and SER offers various countermeasures. The usage of a classic STUN server is proposed as a first step. For all users behind a bad NAT, SER offers two competing TURN-like technologies:
- An external TURN-like server (RTP proxy or media proxy).
- A software module within the server which corrects the unusable private IP addresses and wrong SDP data (nathelper or mediaproxy module).

This solution is interesting insofar as the TURN-like servers of SER are included automatically (the SDP data are modified) without any knowledge of the user agents.

[89] An Analysis of the Skype Peer-to-Peer Internet Telephony Protocol


14 Session Timer
SIP does not define a keep-alive mechanism for the sessions it establishes. Although the user agents may be able to determine whether the session has timed out by using session-specific mechanisms, SIP proxy servers in the signaling path are not able to do so. The result is that a call-stateful proxy server will not always be able to determine whether a session is still active. For instance, when a user agent fails to send a BYE message at the end of a session, or when the BYE message gets lost due to network problems, a call-stateful proxy will not know that the session has ended. In this situation the call-stateful proxy retains state for the call and has no means to determine whether the call state information is still valid.

To resolve this problem the SIP session timer extension defines a keep-alive mechanism for SIP sessions. UAs send periodic re-INVITE or UPDATE requests (referred to as session refresh requests) to keep the session alive. The interval for the session refresh requests is determined through a negotiation mechanism at session setup. If a session refresh request is not received before the interval passes, the session is considered to be terminated. Both UAs are supposed to send a BYE, and call-stateful proxies can remove any state for the call.

The SIP session timer extension is defined in RFC 4028 [90]. The solution works as long as at least one of the two participants in a dialog understands the extension. Two new header fields (Session-Expires and Min-SE) and a new response code (422) are defined:
- The Session-Expires header field defines the time interval for refreshing the session and also carries a parameter that states who is the refresher (uac or uas).
- The Min-SE header field defines a minimum value for this time interval to avoid overload of network elements.
- The response code 422 Session Interval Too Small prevents session setup with refresh timer values that are too short.

Via the option tag timer a User Agent can signal whether it supports (Supported header field) or even requires (Require header field) this protocol extension. The default time interval for refreshing audio sessions is proposed to be 30 min.

The session timer extension allows the user agents or proxy servers to negotiate the timer value and to define which of the two User Agents is responsible for refreshing the session. Both user agents and any SIP proxy server in between may activate the session timer procedure. The only precondition is that one of the two user agents supports the extension. SIP proxies in between are able to reject the session timer procedure if the refresh intervals are too short.

Figure 62 and Figure 63 show an example of a session setup with the SIP session timer extension. In this example both the UAC and the UAS support the session timer extension, and the assumption is that both proxy servers use Record-Route to stay within the signaling path.

[90] RFC 4028: Session Timers in the Session Initiation Protocol


The session starts when Alice sends an INVITE request to Bob. Alice starts the session with a proposed session timer (Session-Expires header field, SE) of 120 seconds. This session timer is rejected by proxy P1 because the timer value is too short for the proxy. Proxy P1 requires a minimum session timer of 1800 seconds (Min-SE header field, MSE). Alice sends a new INVITE request, this time with a Session-Expires header field (SE) value of 1800. Proxy P1 forwards the INVITE to proxy P2, and now the same rejection happens: proxy P2 requires a minimum session timer of 3600 seconds. The third INVITE request of Alice, with an SE and MSE value of 3600, now succeeds and arrives at Bob. Bob's user agent also decides that Alice is to be the refresher of the session (it adds a refresher=uac parameter to the Session-Expires header field). After some time (before the session timer at Alice expires) Alice's user agent refreshes the session with an UPDATE request.
[Figure: message sequence chart between Alice, Proxy P1, Proxy P2 and Bob, "Timer negotiation during session set-up" (SE: Session-Expires, MSE: Min-SE):
 1. INVITE (SE: 120) is answered by P1 with 422 Session Interval Too Small (MSE: 1800).
 2. INVITE (SE: 1800, MSE: 1800) passes P1 and is answered by P2 with 422 Session Interval Too Small (MSE: 3600).
 3. INVITE (SE: 3600, MSE: 3600) reaches Bob, who answers with 200 OK (SE: 3600), followed by ACK.]

Figure 62: SIP Session timer extension (part 1)
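The retry loop of this negotiation can be sketched in Python. The function name negotiate_session_timer is hypothetical, and the model is deliberately simplified: each proxy on the path is reduced to the Min-SE value it insists on, and the UAC simply raises its Session-Expires value to the Min-SE it received in the 422 response.

```python
def negotiate_session_timer(proposed_se, proxy_min_se):
    """Simulate the 422 retry loop; return (final_se, number_of_invites).

    proxy_min_se lists the Min-SE each proxy on the path requires, in path
    order. This is a simplified model for illustration only.
    """
    se = proposed_se
    min_se = 0
    invites = 0
    while True:
        invites += 1
        rejected = False
        for required in proxy_min_se:
            if se < required:
                # The proxy answers 422 with its Min-SE; the UAC raises
                # Min-SE and Session-Expires for the next attempt.
                min_se = max(min_se, required)
                se = max(se, min_se)
                rejected = True
                break
        if not rejected:
            return se, invites
```

With the values of the example (a proposal of 120 seconds, P1 requiring 1800 and P2 requiring 3600) the function returns a session interval of 3600 after three INVITE requests.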



[Figure: continuation of the message sequence chart between Alice, Proxy P1, Proxy P2 and Bob: the session is refreshed with UPDATE (SE: 3600) and 200 OK (SE: 3600); after a UA crashes, the BYE requests end in 408 Timeout, and the state is removed in the SIP proxies after session timeout irrespective of whether a user agent clears the session.]

Figure 63: SIP Session timer extension (part 2)

After some time the user agent of Alice crashes. No SIP signaling is sent anymore. After the session timeout at the user agent of Bob, the user agent sends a BYE request. This BYE request times out at proxy P1 (and perhaps also at proxy P2 and at the UA of Bob) because the UA of Alice does not answer anymore. Independent of the BYE request of Bob, the activated SIP session timer extension within the SIP proxy servers causes the session state to be removed in both SIP proxy servers.

The following messages show the header fields of the SIP session timer extension for the above example. The first INVITE request sent by Alice may look like:
INVITE SIP/2.0
Via: SIP/2.0/TLS;branch=z9hG4bKnashds8
Supported: timer
Session-Expires: 120
Max-Forwards: 70
To: Bob <>
From: Alice <>;tag=1928301774
Call-ID: a84b4c76e66710
CSeq: 314159 INVITE
Contact: <>
Content-Type: application/sdp
Content-Length: 142

(Alice's SDP not shown)


The rejection response sent by proxy P1 then may look like:

SIP/2.0 422 Session Interval Too Small
Via: SIP/2.0/TLS;branch=z9hG4bKnashds8
    ;received=
Min-SE: 1800
To: Bob <>;tag=9a8kz
From: Alice <>;tag=1928301774
Call-ID: a84b4c76e66710
CSeq: 314159 INVITE

The 2nd INVITE request of Alice may then look like:

INVITE SIP/2.0
Via: SIP/2.0/TLS;branch=z9hG4bKnashds9
Supported: timer
Session-Expires: 1800
Min-SE: 1800
Max-Forwards: 70
To: Bob <>
From: Alice <>;tag=1928301774
Call-ID: a84b4c76e66710
CSeq: 314160 INVITE
Contact: <>
Content-Type: application/sdp
Content-Length: 142

(Alice's SDP not shown)

As a last example message the 200 OK of Bob may look like:

SIP/2.0 200 OK
Via: SIP/2.0/TLS;branch=z9hG4bKutnsd983c
    ;received=
Via: SIP/2.0/TLS;branch=z9hG4bKnashds10
    ;received=
Require: timer
Supported: timer
Record-Route:;lr
Record-Route:;lr
Session-Expires: 3600;refresher=uac
To: Bob <>;tag=9as888nd
From: Alice <>;tag=1928301774
Call-ID: a84b4c76e66710
CSeq: 314161 INVITE
Contact: <sips:bob@>
Content-Type: application/sdp
Content-Length: 142

(Bob's SDP not shown)

It contains the refresher parameter within the Session-Expires header field.
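How a user agent might evaluate this header field can be sketched in Python. The function names are hypothetical. The rule of refreshing after half of the interval is an assumption taken from the recommendation in RFC 4028; the text above only requires that the refresh happens before the session timer expires.

```python
def parse_session_expires(value):
    """Split e.g. '3600;refresher=uac' into (3600, 'uac').

    The refresher is None if the parameter is absent.
    """
    parts = [p.strip() for p in value.split(";")]
    interval = int(parts[0])
    refresher = None
    for param in parts[1:]:
        name, _, val = param.partition("=")
        if name.strip().lower() == "refresher":
            refresher = val.strip().lower()
    return interval, refresher


def refresh_after(interval):
    """Seconds the refresher waits before sending re-INVITE or UPDATE.

    Assumption for this sketch: refresh once half the interval has elapsed.
    """
    return interval // 2
```

For the 200 OK shown above, the parsed result tells Alice (the UAC) that she is the refresher and that the session interval is 3600 seconds.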


15 Caller Preferences and UA Capabilities

15.1 User Agent Capabilities
SIP user agents vary widely in their capabilities and in the types of devices they represent. Frequently it is important for the session partners to learn each other's capabilities and characteristics. Two examples illustrate where this information might be needed:
- One user agent, a PC-based application, is communicating with another that is embedded in a limited-function device. The PC would like to be able to "grey out" those components of the user interface that represent features or capabilities not supported by the peer UA. This requires a way to exchange capability information within a dialog.
- A user has two devices at their disposal. One is a videophone and the other a voice-only wireless phone. A caller wants to interact with the user using video. The caller would like the call to be routed to the device which supports video. Therefore the INVITE request should contain parameters that express a preference for routing to a device with the specified capabilities.

SIP already has some support for the expression of capabilities. The Allow, Accept, Accept-Language, and Supported header fields convey some information about the capabilities of a user agent. However, these header fields convey only a small part of the information that is needed, and they do not provide a general framework for the expression of capabilities.

RFC 3840 [91] provides a more general framework for the indication of capabilities and characteristics in SIP. Based on this SIP extension, capability and characteristic information about a UA is carried as parameters of the Contact header field. These parameters can be used within REGISTER requests and responses, OPTIONS responses, and requests and responses that create dialogs (such as INVITE).

The basis for these parameters added to the Contact header field is an already existing Content Negotiation Framework [92]. This framework is based on media feature tags which can take different values. The combination of several feature tags with individual values defines a feature set. To make the description of the feature sets supported by a user agent flexible, feature set predicates are defined. Feature set predicates are logical expressions over feature sets containing conjunctions, disjunctions, negations, filters etc. Such a feature set predicate can be added to a Contact header field and is then called a Contact predicate.

To avoid too much theory, the next chapter lists the feature tags that can be used to express the user agent capabilities and also shows an example.

[91] RFC 3840: Indicating User Agent Capabilities in SIP
[92] RFC 2703: Protocol-independent Content Negotiation Framework

15.1.1 Feature tags

The feature tags are defined in different semantic trees. RFC 3840 / RFC 4235 / RFC 4569 define the sip-tree with the following 21 feature tags:

sip.audio: This feature tag indicates that the device supports audio as a streaming media type.
sip.application: This feature tag indicates that the device supports application as a streaming media type. This feature tag exists primarily for completeness. Since so many MIME types are underneath application, indicating the ability to support applications provides little useful information.
sip.data: This feature tag indicates that the device supports data as a streaming media type.
sip.control: This feature tag indicates that the device supports control as a streaming media type.
sip.video: This feature tag indicates that the device supports video as a streaming media type.
sip.text: This feature tag indicates that the device supports text as a streaming media type.
sip.automata: The sip.automata feature tag is a boolean value that indicates whether the UA represents an automaton (such as a voicemail server, conference server, IVR, or recording device) or a human.
sip.class: This feature tag indicates the setting, business or personal, in which a communications device is used.
sip.duplex: The sip.duplex media feature tag indicates whether a communications device can simultaneously send and receive media ("full"), alternate between sending and receiving ("half"), can only receive ("receive-only") or only send ("send-only").
sip.mobility: The sip.mobility feature tag indicates whether the device is fixed (meaning that it is associated with a fixed point of contact with the network), or mobile (meaning that it is not associated with a fixed point of contact). Note that cordless phones are fixed, not mobile, based on this definition.
sip.description: The sip.description feature tag provides a textual description of the device.
sip.events: Each feature tag value indicates a SIP event package supported by a SIP UA. The values for this tag equal the event package names that are registered by each event package.
sip.priority: The sip.priority feature tag indicates the call priorities the device is willing to handle. A value of X means that the device is willing to take requests with priority X and higher. This does not imply that a phone has to reject calls of lower priority. As always, the decision on handling of such calls is a matter of local policy.
sip.methods: Each value of the sip.methods (note the plurality) feature tag indicates a SIP method supported by this UA. In this case, "supported" means that the UA can receive requests with this method. In that sense, it has the same connotation as the Allow header field.
sip.extensions: Each value of the sip.extensions feature tag is a SIP extension (each of which is defined by an option-tag registered with IANA) that is understood by the UA. Understood, in this context, means that the option tag would be included in a Supported header field in a request.
sip.schemes: Each value of the sip.schemes (note the plurality) media feature tag indicates a URI scheme that is supported by a UA. Supported implies, for example, that the UA would know how to handle a URI of that scheme in the Contact header field of a redirect response.
sip.actor: This feature tag indicates the type of entity that is available at this URI.
sip.isfocus: This feature tag indicates that the UA is a conference server, also known as a focus, and will mix together the media for all calls to the same URI.
sip.byeless: The feature tag is a boolean flag. When set it indicates that the device is incapable of terminating a session autonomously.
sip.rendering: This feature tag contains one of three tokens indicating if the device is rendering any media from the current session ("yes"), none of the media from the current session ("no"), or if this status is not known to the device ("unknown").
sip.message: This feature tag indicates that the device supports message as a streaming media type.



15.1.2 Expression of capabilities

Based on the feature tags listed above, the capabilities and characteristics of a user agent can be expressed as a logical term like the one shown below:

(& (sip.audio=TRUE)
   (sip.video=TRUE)
   (sip.actor=msg-taker)
   (sip.automata=TRUE)
   (sip.mobility=fixed)
   (| (sip.methods=INVITE) (sip.methods=BYE) (sip.methods=OPTIONS)
      (sip.methods=ACK) (sip.methods=CANCEL)))

The above example means that the UA supports audio and video sessions, can also act as a mailbox (actor=msg-taker and automata=true), is a fixed client and supports the methods INVITE, BYE, OPTIONS, ACK and CANCEL. In a REGISTER request the above example is expressed by the following predicate included in the Contact header field:

Contact: <>;audio;video;
    actor="msg-taker";automata;mobility="fixed";
    methods="INVITE,BYE,OPTIONS,ACK,CANCEL"

In some cases there is (unfortunately) some overlap with existing header fields (e.g. the Allow and Supported header fields). In case of overlap the specific header field has higher priority. In case of a REGISTER request the feature tags are stored in the location database, where they are available to a filter mechanism for terminating requests.
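The mapping from a set of capabilities to Contact header field parameters can be sketched in Python. The function contact_feature_params is a hypothetical helper; it covers only the three cases that occur in the example above (boolean tags, single string values and lists of values) and simply skips tags whose value is False, which is a simplification made for this sketch.

```python
def contact_feature_params(capabilities):
    """Render a capability dict as Contact header field parameters.

    True        -> bare tag name (e.g. audio)
    str         -> name="value"
    list/tuple  -> name="v1,v2,..."
    Tag names are given without the 'sip.' prefix, as in the Contact example.
    """
    params = []
    for name, value in capabilities.items():
        if value is True:
            params.append(name)
        elif value is False:
            continue  # simplification: false boolean tags are left out
        elif isinstance(value, (list, tuple)):
            params.append('%s="%s"' % (name, ",".join(value)))
        else:
            params.append('%s="%s"' % (name, value))
    return ";".join(params)
```

Called with the capabilities of the example (audio, video, actor msg-taker, automata, mobility fixed and the five methods), the function produces the parameter string of the Contact header field shown above.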

15.2 Caller Preferences

15.2.1 Feature preferences
The user agent capabilities explained in the previous chapter would be useless if there were no way to address and select target user agents with specific features. This is what RFC 3841 [93] defines: caller preferences.

RFC 3841 defines two header fields for selecting a User Agent with certain capabilities and characteristics. The two header fields are Accept-Contact and Reject-Contact, and they can be added to an INVITE request. With both header fields a caller can express which features the UA of the session partner shall support and which ones should be avoided. The following example

Reject-Contact: *;actor="msg-taker"
Accept-Contact: *;audio;require
Accept-Contact: *;mobility="mobile";methods="INVITE";class="business"

means that the target UA
- should not be a mailbox (reject actor=msg-taker),
- must support audio (enforced by the require tag),
- should be a mobile equipment used for business purposes and
- should support the INVITE method.

The exact selection algorithm heavily depends on the presence of the additional parameters require or explicit, which can be added to a feature tag. Require means a must-criterion for the selection, and explicit means that a feature tag must have been explicitly declared by the User Agent during registration.

Based on the feature capabilities and characteristics of a UA stored during the registration and on the feature preferences expressed in the Accept-Contact and Reject-Contact header fields of an INVITE request, a specific selection of a UA can be reached. This offers several possibilities to create services like [94]:
- Routing of INVITE and MESSAGE to Different UA
- Audio/Video vs. Audio Only
- Forcing Audio/Video
- Third-Party Call Control: Forcing Media
- Maximizing Media Overlaps
- Multilingual Lines
- I Hate Voicemail!
- I Hate People!
- Prefer Voicemail
- Routing to an Executive
- Speak to the Executive
- Mobile Phone Only
- Simultaneous Languages
- The Number You Have Called
- The Number You Have Called, Take Two
- Forwarding to a Colleague

Besides the above examples, the Caller Preferences and User Agent Capabilities extension is used within IMS to express the requirements and capabilities to support certain service sets.

[93] RFC 3841: Caller Preferences for the Session Initiation Protocol
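The selection of a target UA can be illustrated with a strongly simplified Python sketch. It is not the matching algorithm of RFC 3841, which additionally takes the explicit parameter and q-values into account. The function names and the data model (each feature as a set of string values, boolean tags as the set {"TRUE"}) are assumptions made for this example.

```python
def matches(contact, wanted):
    """True if the contact declares every wanted tag with an overlapping value."""
    return all(name in contact and contact[name] & values
               for name, values in wanted.items())


def select_contacts(contacts, accept, reject):
    """Simplified caller-preference filter.

    contacts: {uri: {tag: set_of_values}} as stored at registration
    accept:   list of (features, require) pairs from Accept-Contact
    reject:   list of feature dicts from Reject-Contact
    Returns the remaining URIs, best match first.
    """
    selected = []
    for uri, features in contacts.items():
        if any(matches(features, r) for r in reject):
            continue  # a Reject-Contact predicate matched
        if any(require and not matches(features, a) for a, require in accept):
            continue  # a must-criterion (require) is not fulfilled
        score = sum(1 for a, _ in accept if matches(features, a))
        selected.append((score, uri))
    selected.sort(key=lambda item: -item[0])
    return [uri for _, uri in selected]
```

Applied to the example header fields above, a registered mailbox is dropped by the Reject-Contact predicate, a device without audio is dropped by the require parameter, and a mobile business phone is ranked ahead of a fixed phone.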

15.2.2 Request handling preferences

RFC 3841 defines a further header field, Request-Disposition, which enables a User Agent to influence the processing of a request at a server. The following processing directives can be used in the Request-Disposition header field:

proxy-directive: This type of directive indicates whether the caller would like each server to proxy ("proxy") or redirect ("redirect").

cancel-directive: This type of directive indicates whether the caller would like each proxy server to send a CANCEL request downstream ("cancel") in response to a 200 OK from the downstream server (which is the normal mode of operation, making it redundant), or whether this function should be left to the caller ("no-cancel"). If a proxy receives a request with this parameter set to "no-cancel", it SHOULD NOT CANCEL any outstanding branches upon receipt of a 2xx. However, it would still send CANCEL on any outstanding branches upon receipt of a 6xx.

fork-directive: This type of directive indicates whether a proxy should fork a request ("fork"), or proxy to only a single address ("no-fork"). If the server is requested not to fork, the server SHOULD proxy the request to the "best" address (generally the one with the highest q-value). If there are multiple addresses with the highest q-value, the server chooses one based on its local policy. The directive is ignored if "redirect" has been requested.

recurse-directive: This type of directive indicates whether a proxy server receiving a 3xx response should send requests to the addresses listed in the response ("recurse") or forward the list of addresses upstream towards the caller ("no-recurse"). The directive is ignored if "redirect" has been requested.

parallel-directive: For a forking proxy server this type of directive indicates whether the caller would like the proxy server to proxy the request to all known addresses at once ("parallel") or go through them sequentially, contacting the next address only after it has received a non-2xx or non-6xx final response for the previous one ("sequential"). The directive is ignored if "redirect" has been requested.

queue-directive: If the called party is temporarily unreachable, e.g. because it is in another call, the caller can indicate that it wants to have its call queued ("queue") or rejected immediately ("no-queue"). If the call is queued the server returns "182 Queued".

[94] These examples are use cases defined in RFC 4596: Guidelines for Usage of the SIP Caller Preferences Extension

An example of request handling directives might be:

Request-Disposition: proxy, recurse, parallel
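A server could evaluate the header field value as in the following Python sketch. The names DIRECTIVES and parse_request_disposition are hypothetical. Ignoring unknown tokens and rejecting contradicting directives of the same type are design choices of this sketch, not rules taken from the text above.

```python
# Maps each directive token to its directive type.
DIRECTIVES = {
    "proxy": "proxy-directive", "redirect": "proxy-directive",
    "cancel": "cancel-directive", "no-cancel": "cancel-directive",
    "fork": "fork-directive", "no-fork": "fork-directive",
    "recurse": "recurse-directive", "no-recurse": "recurse-directive",
    "parallel": "parallel-directive", "sequential": "parallel-directive",
    "queue": "queue-directive", "no-queue": "queue-directive",
}


def parse_request_disposition(value):
    """Return {directive type: token} for a Request-Disposition value."""
    result = {}
    for token in value.split(","):
        token = token.strip().lower()
        kind = DIRECTIVES.get(token)
        if kind is None:
            continue  # empty or unknown tokens are ignored in this sketch
        if kind in result and result[kind] != token:
            raise ValueError("conflicting %s: %s / %s"
                             % (kind, result[kind], token))
        result[kind] = token
    return result
```

For the example value "proxy, recurse, parallel" the result contains one entry each for the proxy-directive, the recurse-directive and the parallel-directive.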


16 Globally Routable User Agent URI (GRUU)

SIP uses the generic address, also called the Address-of-Record (AoR), of a user to reach the user on the Internet. The SIP inbound proxy server maps the generic address to the actual physical address of a User Agent, which has been learned at registration. A single user can have a number of User Agents (handsets, softphones, voicemail accounts, etc.) that are all referenced by the same AoR, typically leading to forking in case of inbound requests.

There are a number of situations where it is desirable to have an identifier that addresses a single User Agent rather than the group of User Agents indicated by the AoR. An example is the transfer of a session. When user A has a session with user B and wants to transfer the call to user C, it has to transfer the address of B to user C. If user A uses the AoR of B, the transfer may go to a wrong User Agent of B because of forking. If it uses the Contact address of B, this address might not be reachable by a different client (C). This sounds strange, but the reason is that a registered contact is frequently unreachable from hosts outside of the domain of the User Agent (UA). It is commonly a private address, or, when it is a public address, access to it may be blocked by firewalls. What is therefore needed is a User Agent specific address which is globally routable. The situation is shown in Figure 64.

[Figure: Alice, Bob and the inbound SIP proxy server, with the labels "Reachable via GRUU", "Reachable via SIP outbound", "Not reachable via Contact header field address if not engaged in a dialog" and "Reachable via Contact header field inside of a dialog"]

Figure 64: Reachability situation solved by GRUU


RFC 5627 [95] defines such an address (URI) with the GRUU mechanism. This mechanism enables a User Agent to request a globally routable User Agent URI (GRUU) from the registrar during registration. The registrar returns such a GRUU in the 200 OK response to the REGISTER request as a parameter in the Contact header field. If the user agent uses this GRUU in the Contact header field, it can be sure that it is globally reachable via that address.

The basic idea behind a GRUU is simple. GRUUs are issued by SIP domains and always route back to a SIP proxy in that domain. The domain maintains the binding between the GRUU and the particular User Agent instance. When a GRUU is used in a Request-URI, the request arrives at the SIP proxy. The proxy maps the GRUU to the contact for the particular User Agent instance and sends the request there. Therefore it is the registrar that has to provide a globally reachable GRUU; such a URI cannot be generated by the User Agent.

There are two different types of GRUUs defined:
- A temporary GRUU, which does not reveal the identity of the user agent.
- A public GRUU, which may reflect the identity of the user agent.

A temporary GRUU must be used whenever privacy requires hiding the underlying AoR.

A precondition for requesting a GRUU is an instance identifier, which the User Agent has to provide as a feature tag [96] at registration. A User Agent that wants to obtain a GRUU at registration must provide an instance ID in the "+sip.instance" Contact header field parameter like:

Contact: <sip:callee@>
    ;+sip.instance="<urn:uuid:f81d4fae-7dec-11d0-a765-0>"

When the registrar detects this header field parameter (in addition to a Supported: gruu header field) it provides two GRUUs in the REGISTER response. One of these is a temporary GRUU, and the other is the public GRUU. The two GRUUs are returned in the "temp-gruu" and "pub-gruu" Contact header field parameters in the response. For example:

Contact: <sip:callee@>
    ;pub-gruu="; gr=urn:uuid:f81d4fae-7dec-11d0-a765-0"
    ;temp-gruu=";gr"
    ;+sip.instance="<urn:uuid:f81d4fae-7dec-11d0-a765-0>"
    ;expires=3600

Note that all parameters of the Contact header field are sent in one line. The separation with line breaks is done only for better readability.

[95] RFC 5627: Obtaining and Using Globally Routable User Agent URIs (GRUUs) in SIP
[96] See chapter 15.1.1 Feature tags

REGISTER SIP/2.0
Via: SIP/2.0/UDP;branch=z9hG4bKnashds7
Max-Forwards: 70
From: Callee <>;tag=a73kszlfl
Supported: gruu
To: Callee <>
Call-ID: 1j9FpLxk3uxtm8tn@
CSeq: 1 REGISTER
Contact: <sip:callee@>
    ;+sip.instance="<urn:uuid:f81d4fae-7dec-11d0-a765-0>"
Content-Length: 0

SIP/2.0 200 OK
Via: SIP/2.0/UDP;branch=z9hG4bKnashds7
From: Callee <>;tag=a73kszlfl
To: Callee <> ;tag=b88sn
Call-ID: 1j9FpLxk3uxtm8tn@
CSeq: 1 REGISTER
Contact: <sip:callee@>
    ;pub-gruu=";gr=urn:uuid:f81d4fae-7dec-11d0-a765-0"
    ;temp-gruu=";gr"
    ;+sip.instance="<urn:uuid:f81d4fae-7dec-11d0-a765-0>"
    ;expires=3600
Content-Length: 0

Figure 65: GRUU allocation during registration

An example of GRUU allocation during registration is shown in Figure 65 above. A temporary and a public GRUU are assigned by the registrar in the 200 (OK) response. The Contact header field returned in the response from the registrar contains two additional header field parameters, pub-gruu and temp-gruu. Both GRUUs are valid SIP URIs. The difference is that the public GRUU contains the AoR in full readability and the attached gr parameter reflects the instance-id, while the temporary GRUU only reveals the domain where the GRUU has to be resolved to the physical address of the User Agent instance. The user part of a temp-gruu parameter contains a cryptographic string. The temporary GRUU is valid for the duration of a registration including refreshes, but the public GRUU persists across registrations, assuming that the instance identifier does not change.

Finally the question should be answered when a GRUU is used. Remember that a GRUU represents the Contact address, but without any limitation regarding global reach. Therefore a User Agent should use a GRUU whenever it populates the Contact header field of dialog-initiating (and target refreshing [97]) requests and responses. These are:
- INVITE and its 2xx response
- SUBSCRIBE and its 2xx response
- REFER and its 2xx response
- UPDATE and its 2xx response

[97] Target refresh requests are requests that may change the remote target address within a dialog. These are re-INVITE and UPDATE.

RFC 5627 shows in chapter 9 an example call flow where GRUUs are used extensively. The interested reader may look at this call flow.

A further SIP protocol extension related to GRUUs is specified in RFC 5628 [98]. In IMS a specific event package is defined which allows some nodes to learn about information stored by a SIP registrar, including the registered Contact addresses. Now that the Contact addresses have been enhanced by GRUUs, it is reasonable to extend the registration event package with GRUUs as well. This simply means that the message body of the NOTIFY requests now also contains two new elements, <pub-gruu> and <temp-gruu>.


[98] RFC 5628: Registration Event Package Extension for SIP Globally Routable User Agent URIs (GRUUs)

17 Identity Management
According to the basic SIP standard (RFC 3261) the From and To header fields of a SIP request contain the address of the initiator and the recipient of the request. The format of these addresses is an address-of-record (public address). Both header fields are significant only for the user agents (end-to-end) and therefore the header fields are not checked (screened) anywhere in the network. The recipient of a SIP request has no way to verify that the From header field has been populated correctly unless some cryptographic authentication mechanism has been applied.

SIP offers some security mechanisms including digest authentication, TLS and S/MIME. None of these three mechanisms provides the comfort and easy handling of authenticated identities which we are used to from PSTN, because:
- Digest authentication requires a shared secret between session partners. This is not easy to arrange due to the number of different session partners, and in particular it cannot be arranged in advance.
- TLS is only a hop-by-hop mechanism and furthermore requires all intermediate nodes to be trusted.
- S/MIME suffers from the lack of end-user certificates.

In PSTN one can be sure that the calling line identity offered in a call is correct and can be used to call back. But the PSTN offers conditions which are not available on the Internet: the PSTN is based on a network of operators with a trust relationship. The home network of a user is responsible for verifying the calling line identity, and all other networks rely on the correct verification of the identity in the home network. A similar model is used in the IMS (IP Multimedia Subsystem), which is based on an additional header field P-Asserted-Identity that is inserted at the edge of the network and forwarded towards the destination. Between IMS network operators there is a strict mutual trust relationship. But this model is not applicable on the Internet, where different networks are interconnected without any trust relationship.

RFC 4474 [99] defines a viable solution to offer an asserted identity in SIP similar to the one we are used to from PSTN, without big effort and without a trust relationship between service providers. The solution is based on:
- two new functional roles of servers: authentication server and verification server
- two new header fields: Identity header field and Identity-Info header field
- some additional failure response codes

Figure 66 shows the identification architecture, which is quite simple. It shows a request which is sent from Alice in one domain to Bob in another domain. The goal of the SIP identity management extension is that Bob can be sure that the From header field in the request is correct (that it corresponds to the identity of the originator).

[99] RFC 4474: Enhancements for Authenticated Identity Management

[Figure: authenticator and verifier, connected by step 4]

Figure 66: Identification architecture

The following steps show how this is done:
1. Alice sends the request to an authentication server, which could be configured as an outbound proxy for Alice. The authenticator is located in the home domain of Alice and is able to authenticate the originator (e.g. via digest authentication).
2. After successfully authenticating the request, the authenticator checks the content of the From header field and verifies that the address in the From header field corresponds to the Address-of-Record of Alice.
3. The authenticator then calculates a hash code on some parts of different header fields (including the AoR of the From header field) and uses an asymmetric cryptographic algorithm to sign the hash code with its private key. It then includes the signature in an Identity header field and additionally includes an Identity-Info header field with an address where the public key may be obtained by the verification server at the target domain.
4. The request may now traverse the untrusted area of the Internet and arrives at the target domain, where it is handled by the verification server (verifier).
5. The verifier takes the public key of the originating domain (according to the domain part of the AoR in the From header field). If it does not have this public key in its cache, it fetches the key via the reference address contained in the Identity-Info header field. The verifier calculates the hash code on the same header fields and verifies the signature using this public key.
6. If the signature is correct, the verifier concludes that the From header field contains the authenticated identity of the originator of the request and forwards the request to Bob. Otherwise it rejects the request with the failure code 438 (Invalid Identity Header).

Figure 67 shows an INVITE request forwarded by the authenticator after inserting the Identity and Identity-Info header fields (bold font).
INVITE SIP/2.0
Via: SIP/2.0/TLS;branch=z9hG4bKnashds8
To: Bob <>
From: Alice <>;tag=1928301774
Call-ID: a84b4c76e66710
CSeq: 314159 INVITE
Max-Forwards: 70
Date: Thu, 21 Feb 2002 13:02:03 GMT
Contact: <>
Identity: "ZYNBbHC00VMZr2kZt6VmCvPonWJMGvQTBDqghoWeLxJfzB2a1pxAr3VgrB0SsSAa
  ifsRdiOPoQZYOy2wrVghuhcsMbHWUSFxI6p6q5TOQXHMmz6uEo3svJsSH49thyGn
  FVcnyaZ++yRlBYYQTLqWzJ+KVhPKbfU/pryhVn9Yc6U="
Identity-Info: <>;alg=rsa-sha1
Content-Type: application/sdp
Content-Length: 147

v=0
o=UserA 2890844526 2890844526 IN IP4
s=Session SDP
c=IN IP4
t=0 0
m=audio 49172 RTP/AVP 0
a=rtpmap:0 PCMU/8000

Figure 67: INVITE request including Identity and Identity-Info header field

RFC 4474 defines exactly how the hash code has to be calculated. It is calculated on a concatenated string comprising parts of the following header fields:
- From: AoR part
- To: AoR part
- Call-ID: value
- CSeq: numeric value and method name
- Date: value
- Contact: AoR part
- Message body: whole content
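The concatenation and hashing step can be sketched as follows. This is a simplified illustration under stated assumptions: the parts are joined with "|" as in the digest-string of RFC 4474, SHA-1 is used because Figure 67 indicates alg=rsa-sha1, and the RSA signing of the hash with the private key of the authenticator is left out because it needs a key pair and a cryptographic library. The function names are hypothetical.

```python
import hashlib


def identity_digest_string(from_aor, to_aor, call_id, cseq_num, method,
                           date, contact_uri, body):
    """Join the protected parts with '|' in the order listed above."""
    parts = [from_aor, to_aor, call_id, f"{cseq_num} {method}",
             date, contact_uri, body]
    return "|".join(parts)


def identity_hash(digest_string):
    """SHA-1 hash of the digest string; this is the value that gets signed."""
    return hashlib.sha1(digest_string.encode("utf-8")).digest()
```

The verifier would build the same string from the received request, compute the same hash and check it against the signature carried in the Identity header field. Any change to one of the listed parts (for example a replayed request with a new Date) changes the hash and makes the signature check fail.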


The Address-of-Record of the From header field is the primary goal of protection, but the other parts included increase the level of protection. The Date header field (it comprises a date and a time string) protects against replay attacks. In practical scenarios both new functional components (authenticator and verifier) will be implemented as additional tasks of existing SIP proxy servers. The authenticator will be part of an outbound proxy server and the verifier will be part of an inbound proxy server. The authenticator has to guarantee the authenticity of the originator. This can best be done via authentication of each request (via proxy authentication using digest algorithm) or by using a TLS connection towards the user agent in which case a one-time authentication of the user is sufficient. The verifier function of the inbound proxy server may differentiate requests including a verified identity header field from those not containing such a header field. It may use an Alert-Info header field to mark all identity verified INVITE requests and thus let the user decide how to handle unauthenticated requests.


18 ENUM
SIP based sessions and telephone calls on the traditional PSTN use different addressing mechanisms:
- SIP based sessions use SIP URIs, usually including an alphanumeric user part.
- Telephone calls use telephone numbers according to an ITU-T standard: E.164 [100].

We already have seen that a telephone number can be mapped into the user part of a SIP URI using the user=phone parameter or it can be expressed as a TEL-URI. This mechanism allows using a telephone number in a SIP session. In the legacy telecom world a user addressed with a telephone number is expected to reside in the PSTN and it may be reached from a SIP user via a SIP/PSTN gateway as shown in Figure 68 (step 1).


[Figure: SIP Operator A, SIP Operator B, PSTN Operator B, gateway, ENUM query, network migration; paths 1 to 3]

Figure 68: E.164 based subscriber addressed with SIP using ENUM

But network operators now gradually migrate their subscribers to SIP based networks. After migration the subscriber has to be addressed with a SIP URI. That means that operator B has to forward all calls via a gateway to the SIP network (step 2). Obviously it is not economical to use two gateways and the PSTN as a transit network when the subscriber could be reached directly via the Internet. In this case, however, a mapping from a telephone number (E.164) to a SIP URI is required.


[100] ITU-T E.164: The international public telecommunication numbering plan


Such a mapping is also necessary when the SIP subscriber has to be reached from a PSTN user, because a legacy telephone set only allows dialing numbers. This is where the ENUM system comes in. ENUM is a DNS based database in which E.164 numbers can be mapped to other services (to a SIP service in our example). ENUM is an abbreviation for E.164 NUmber Mapping and is defined in RFC 6116 [101] and RFC 6117 [102]. It is based on an abstract Dynamic Delegation Discovery System which is implemented on DNS.

The first step of this mapping is to convert a telephone number into an FQDN (Fully Qualified Domain Name). The rule for this transformation is simple: a specific root within DNS has been reserved, the telephone digits are reversed because of their implicit hierarchy, and a dot is put between each pair of digits. A simple example is the international telephone number +44 116 496 0348, which is transformed to an FQDN by this rule.
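The transformation rule can be written down in a few lines. The sketch below assumes the root e164.arpa, which is the root defined in RFC 6116 and not taken from the text above; the function name enum_fqdn is hypothetical.

```python
def enum_fqdn(e164_number, root="e164.arpa"):
    """Convert an E.164 number to its ENUM domain name.

    Non-digit characters ("+", spaces) are dropped, the digits are
    reversed and separated by dots, and the ENUM root is appended.
    """
    digits = [ch for ch in e164_number if ch in "0123456789"]
    return ".".join(reversed(digits)) + "." + root


print(enum_fqdn("+44 116 496 0348"))
# prints 8.4.3.0.6.9.4.6.1.1.4.4.e164.arpa
```

The reversal puts the country code next to the root, so that the DNS delegation hierarchy follows the hierarchy of the numbering plan.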

DNS NAPTR (Naming Authority Pointer) resource records are used for this mapping. Various ENUM services may be mapped to a telephone number. In our example we focus only on a protocol based mapping for SIP. In this case the NAPTR RR is effectively a rewrite rule which exchanges the FQDN for a SIP URI. This may look like:

$ORIGIN NAPTR 100 10 "u" "E2U+sip" "!^.*!!" .

This NAPTR record describes that the domain can be contacted by SIP using the SIP URI given in the replacement part of the regular expression. A list of further ENUM services to which an E.164 number can be mapped is maintained according to RFC 6117 (see the IANA registration), but the most important mapping is towards the SIP service.
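How a client could evaluate the regular expression field of such a NAPTR record is sketched below. This is a simplified illustration: the function name and the example URIs are hypothetical, the optional flag after the last delimiter is ignored, and delimiters inside the pattern are not supported. Note that according to RFC 6116 the expression is applied to the E.164 number (with leading "+"), not to the FQDN used for the DNS query.

```python
import re


def apply_naptr_regexp(naptr_regexp, e164_number):
    """Apply a NAPTR regexp field of the form '!pattern!replacement!'."""
    delim = naptr_regexp[0]                     # usually "!"
    _, pattern, replacement, _flags = naptr_regexp.split(delim)
    match = re.search(pattern, e164_number)
    if match is None:
        raise ValueError("NAPTR regexp does not match the number")
    # Back-references \1..\9 in the replacement are filled from the match.
    return match.expand(replacement)


print(apply_naptr_regexp("!^.*$!sip:info@example.com!", "+441164960348"))
# prints sip:info@example.com
```

With a record that uses a back-reference, for example `!^\+44(.*)$!sip:\1@example.com!`, the same call would yield sip:1164960348@example.com, so one record can cover a whole number range.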


[101] RFC 6116: The E.164 to Uniform Resource Identifiers (URI) Dynamic Delegation Discovery System (DDDS) Application (ENUM)
[102] RFC 6117: IANA Registration of Enumservices: Guide, Template, and IANA Considerations

19 Privacy Mechanism
Sometimes a SIP user does not want his identity to be revealed to other session partners. Not only the From and Contact header fields disclose his identity; other header fields such as Via or Route may also give hints about the origin or destination of a user. While the From header field may easily be modified by the user to obfuscate his identity, the other header fields mentioned above are system header fields which are required for correct routing within the network.

To enable privacy of a user, RFC 3323 [104] defines rules and a header field which expresses the privacy preferences of a user. The RFC proposes that a user populates the From header field with an anonymous SIP URI

From: "Anonymous" <sip:anonymous@anonymous.invalid>

and in addition adds a Privacy header field to express his privacy preferences. As the user cannot modify the other system header fields, he relies on a network service which honours his preferences and obfuscates these header fields before requests and responses are forwarded to the session partners. The Privacy header field may carry different values, which are defined as follows:
- header: The user requests that a privacy service obscure those headers which cannot be completely expunged of identifying information without the assistance of intermediaries (such as Via and Contact).
- session: The user requests that a privacy service provide anonymization also for session data (SDP).
- user: In this case the user delegates the privacy to a network service because the user agent is unable to provide privacy.
- none: The user requests that a privacy service applies no privacy functions to this message.
- critical: The user declares that the privacy services requested for this message are critical, and that therefore, if these privacy services cannot be provided by the network, this request should be rejected.

RFC 3325 [105] adds another privacy value which can be used in certain network architectures (IMS), where the identity of a user is asserted by the network (a new P-Asserted-Identity header field is defined for that):
- id: With this value the user requests that the network-asserted identity is not disclosed.

[104] RFC 3323: A Privacy Mechanism for SIP
[105] RFC 3325: Private Extensions to SIP for Asserted Identity within Trusted Networks

An example usage of the privacy header field could be:

INVITE SIP/2.0
Via: SIP/2.0/UDP;branch=z9hG4bK776asdhds
Max-Forwards: 70
To: Bob <>
From: "Anonymous" <sip:anonymous@anonymous.invalid>;tag=1928301774
Privacy: session
....
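How a privacy service might evaluate the Privacy header field is sketched below. This is only an illustration of the semantics of the values described above, in particular of critical and none. It assumes that several values are separated by semicolons (as in the header syntax of RFC 3323); the function names and the returned action strings are hypothetical.

```python
def parse_privacy(value):
    """Split a Privacy header field value into its lower-case values."""
    return {v.strip().lower() for v in value.split(";") if v.strip()}


def privacy_decision(value, supported):
    """Decide what a privacy service does with a request.

    supported: the privacy functions this (hypothetical) service offers,
    e.g. {"header", "session"}.
    Returns (action, values) where action is "forward-unchanged",
    "reject" or "apply".
    """
    requested = parse_privacy(value)
    if "none" in requested:
        return ("forward-unchanged", [])
    wanted = requested - {"critical"}
    missing = wanted - set(supported)
    if missing and "critical" in requested:
        # A critical privacy function cannot be provided: reject.
        return ("reject", sorted(missing))
    return ("apply", sorted(wanted & set(supported)))
```

For the request shown above, a service supporting header and session privacy would return ("apply", ["session"]). A request carrying "header;session;critical" at a service which supports only header privacy would be rejected.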


20 Reason
For creating services it is often useful to know why a SIP request was issued. Take as an example a CANCEL request and consider two different situations in which a CANCEL request is sent:
- The User Agent client uses CANCEL when the caller gives up after listening to the ringing tone for some time.
- A forking Proxy Server uses CANCEL to terminate pending transactions when the session set-up was successful on another branch.

In both cases the system behavior is the same, but for the User Agent server the appropriate reaction may differ. In the first case a missed call may show up on the display, but in the second case this would be misleading. Also in the case of responses, the existing mechanism based on status code and reason phrase is sometimes not sufficient to transport all information required for proper handling of a session failure. For both applications (requests and responses) a Reason header field was defined [106]. A Reason header field contains one or more reason values, each consisting of a protocol and a reason description made up of cause and text, as shown in the examples below:

Reason: SIP ;cause=200 ;text="Call completed elsewhere"
Reason: Q.850 ;cause=16 ;text="Terminated"
Reason: SIP ;cause=600 ;text="Busy Everywhere"
Reason: SIP ;cause=580 ;text="Precondition Failure"

The protocol Q.850 refers to an ITU-T standard for PSTN which defines different cause values for unsuccessful calls. In case of interworking with PSTN via a gateway, the Reason header field makes it possible to transport and process the cause value within the SIP domain. The Reason header field may appear in any request within a dialog, in any CANCEL request and in any response whose status code explicitly allows the presence of this header field.
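The CANCEL example from the beginning of this chapter can be sketched as follows. The code parses a single reason value and lets a User Agent server decide whether a cancelled call should be displayed as a missed call. The function names are hypothetical, and treating a CANCEL without Reason header field as a missed call is an assumed policy, not something prescribed by the RFC.

```python
import re

# ;name=value where value is either quoted text or a bare token
_PARAM = re.compile(r';\s*([\w-]+)\s*=\s*("[^"]*"|[^;\s]+)')


def parse_reason(value):
    """Parse one reason value into (protocol, cause, text)."""
    protocol = value.split(";", 1)[0].strip()
    params = {k.lower(): v.strip('"') for k, v in _PARAM.findall(value)}
    cause = int(params["cause"]) if "cause" in params else None
    return protocol, cause, params.get("text")


def is_missed_call(cancel_reason):
    """Return False if the CANCEL says the call was answered elsewhere."""
    if cancel_reason is None:          # no Reason header field present
        return True
    protocol, cause, _text = parse_reason(cancel_reason)
    return not (protocol.upper() == "SIP" and cause == 200)
```

With the first example above (SIP, cause 200, "Call completed elsewhere") the function returns False, so the terminal of a forked branch would not show a missed call.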


[106] RFC 3326: The Reason Header Field for the Session Initiation Protocol

21 Path
When a SIP request whose R-URI contains an AoR (Address-of-Record) reaches the inbound proxy server, this server replaces the R-URI with the Contact address of the terminal received during registration. This model assumes that the destination user is directly reachable by the inbound proxy server, but there are SIP network architectures where this is not the case. An example is the IMS network architecture, where the serving network node (S-CSCF) cannot reach the user terminal directly and always has to use a proxy network node (P-CSCF).

RFC 3327 [107] defines a SIP protocol extension which allows further network nodes to be included in the signaling path of terminating requests. It defines a Path header field, which may be used by SIP proxy servers during registration. Network nodes which need to be included in the routing of terminating requests (after re-targeting by the inbound proxy server) only have to add a Path header field during registration. This is illustrated in Figure 69. The mechanism can also be regarded as a sort of Record-Route mechanism for REGISTER requests. During registration the SIP Proxy Server inserts a Path header field with its own address. The address is stored in the location database and inserted in a Route header field automatically whenever a terminating request arrives.

[Figure 69 message flow between User Agent, SIP Proxy Server and SIP inbound proxy/registrar: the User Agent sends a REGISTER request with Supported: path; the SIP Proxy Server adds a Path header field; the registrar stores the Path header field and returns it in the 200 OK; for a terminating request the stored Path value is used as Route header field.]

Figure 69: Usage of the Path header field


[107] RFC 3327: SIP Extension Header Field for Registering Non-Adjacent Contacts

The user agent includes a Supported header field with the option tag path. It also receives the Path header field in the 200 OK response to the REGISTER request, which it usually ignores.
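The registrar side of the Path mechanism can be illustrated with a small sketch. It is a simplified model of the location database, not an implementation of a registrar: the class and method names as well as the addresses in the usage note are hypothetical.

```python
class Registrar:
    """Toy location database which stores Path values with each binding."""

    def __init__(self):
        self.bindings = {}   # AoR -> (Contact URI, list of Path URIs)

    def register(self, aor, contact, path):
        # Store the Path header field values together with the Contact.
        self.bindings[aor] = (contact, list(path))

    def retarget(self, aor):
        """Build the routing information for a terminating request."""
        contact, path = self.bindings[aor]
        # The Contact becomes the new Request-URI, the stored Path values
        # are copied (in the same order) into the Route header field.
        return {"request_uri": contact, "route": list(path)}
```

After `register("sip:bob@example.com", "sip:bob@192.0.2.4", ["<sip:p1.example.net;lr>"])`, a terminating request for sip:bob@example.com is sent with the Request-URI sip:bob@192.0.2.4 and a Route header field pointing to the proxy p1.example.net, which then forwards the request to the terminal.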


22 Service-Route
RFC 3608 [108] defines a SIP protocol extension which enables a registrar server to inform a user agent about a service route which the user agent may use when requesting originating services. The Service-Route header field is included by the registrar in the 200 OK response to a REGISTER request. The user agent stores the content of the Service-Route header field and uses the addresses contained within it as a preloaded route. Figure 70 shows the usage of the Service-Route header field in an IMS environment. The user agent (IMS terminal) registers at the S-CSCF via two additional SIP proxy servers (P-CSCF and I-CSCF). The S-CSCF is the registrar. It inserts a Service-Route header field into the 200 OK, which is stored at the IMS terminal and used as a preloaded route whenever the user agent sends an initial request into the network.






[Figure 70 message flow: the registrar creates a Service-Route header field and returns it in the 200 OK of the REGISTER request; the user agent stores the Service-Route header field and uses it as a preloaded Route header field in originating requests.]

Figure 70: Usage of the Service-Route header field


[108] RFC 3608: SIP Extension Header Field for Service Route Discovery During Registration

By this mechanism the user agent dynamically receives the address of a server which it should use as a preloaded route whenever it requests a service from the network.
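The user agent side of this mechanism can be sketched as follows. The sketch assumes that header fields are available as a list of (name, value) pairs and that each Service-Route header field carries one URI; the class name, method names and the example address are hypothetical.

```python
class UserAgent:
    """Toy user agent which remembers the service route of its registrar."""

    def __init__(self):
        self.service_route = []

    def on_register_ok(self, headers):
        # Store all Service-Route values from the 200 OK, keeping order.
        self.service_route = [value for name, value in headers
                              if name.lower() == "service-route"]

    def initial_request_headers(self):
        # Use the stored values as preloaded Route header fields.
        return [("Route", uri) for uri in self.service_route]
```

If the 200 OK contains `Service-Route: <sip:orig@scscf.example.net;lr>`, every later initial request of this user agent starts with `Route: <sip:orig@scscf.example.net;lr>`. The stored value is replaced with each new registration response.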


23 Request History
SIP offers simple mechanisms to redirect or retarget [109] a request. This can be done at a proxy server or an application server by changing the request URI. For some applications it is important to determine why and how a session arrived at a specific application and to recognize that the request has been diverted. The request history extension allows the receiving application to get this information. It is specified in RFC 4244 [110].

The extension is based on a new SIP header field History-Info and an option tag histinfo. The History-Info header field may be added to a request when it is created by the User Agent client or by a SIP Proxy Server. The History-Info header field carries the following information:
- Targeted-to-URI: This parameter captures the Request-URI before it is overwritten and forwarded.
- Index: This parameter reflects the chronological order of the Targeted-to-URIs if more than one retarget operation has been performed. It is based on a string of digits separated by dots to indicate the number of forwarding hops and retargets. It also reflects forking and nesting of requests.
- Reason: This is an optional parameter which is only added when retargeting occurs.
- Privacy: This is an optional parameter with the privacy value history, which may be added to the Targeted-to-URI or to the Privacy header field. It indicates whether a specific or all History-Info header fields should be forwarded.
- Extensions: These optional parameters allow for future extensions.

An example of a History-Info header field is shown below. Note that usually more than one History-Info header field is included in a request (to reflect the routing history step by step). As with, e.g., the Route header field, the History-Info header fields may be separate header fields, or the values of several header fields may be accumulated in one History-Info header field with the different values separated by commas, as shown below.

History-Info: <>;index=1, <>; index=1.1, <;cause=486; text="Busy Here">;index=1.2, <>;index=1.3

There are several levels of indices separated by dots. In the above example only 2 levels are shown.

[109] Retarget means that the request URI is changed during processing of the request.
[110] RFC 4244: An Extension to SIP for Request History Information

The indexing rules are roughly [111] as follows:
- The index starts at 1.
- Each forwarding hop adds a new index level. Each dot in the index reflects a hop or level of nesting, so the number of hops is reflected in the number of dots within the index.
- In case of forking, a SIP Proxy Server creates a new index for each branch. In the above example the indices 1.1, 1.2 and 1.3 reflect three branches created by a forking SIP Proxy Server.

A simple example is shown in Figure 71. It shows the principle of indexing History-Info header fields.
[Figure 71 message flow: Proxy 1 forwards the request to Proxy 2, which forks it to Bob UA1, Bob UA2 and Bob UA3.]

Proxy 1 to Proxy 2:
Supported: histinfo
History-Info: <>;index=1, <>;index=1.1

Proxy 2 to Bob UA1:
Supported: histinfo
History-Info: <>;index=1, <>;index=1.1, <>;index=1.1.1

Proxy 2 to Bob UA2:
Supported: histinfo
History-Info: <>;index=1, <>;index=1.1, <>;index=1.1.2

Proxy 2 to Bob UA3:
Supported: histinfo
History-Info: <>;index=1, <>;index=1.1, <>;index=1.1.3

Figure 71: History-Info header field - indexing example

Alice sends an INVITE request to Bob. Proxy 1 starts the History-Info chain by including the Supported header field and by creating two History-Info header fields, as it is re-targeting to Proxy 2. Proxy 2 is a forking SIP Proxy which forks the INVITE request to three User Agents of Bob.

The History-Info header field is also included in responses. It enables upstream SIP Proxy Servers and the User Agent client to make more intelligent decisions in case of failure responses, because the response reflects all routes where the request was sent, including reasons.
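The indexing principle of Figure 71 can be reproduced with a few lines of code. The sketch only covers the rough rules given above (one new level per hop, one new index per branch) and not the complete rule set of RFC 4244; the function names are hypothetical.

```python
def child_indices(parent_index, branches=1):
    """Indices a proxy assigns when it forwards a request.

    One new level is appended to the index of the entry being
    forwarded; in case of forking each branch gets its own number.
    """
    return [f"{parent_index}.{n}" for n in range(1, branches + 1)]


def hops(index):
    """Number of forwarding hops, i.e. the number of dots in the index."""
    return index.count(".")


print(child_indices("1"))        # Proxy 1 retargets: ['1.1']
print(child_indices("1.1", 3))   # Proxy 2 forks: ['1.1.1', '1.1.2', '1.1.3']
```

The output corresponds to the index values shown for Proxy 1 and for the three branches created by Proxy 2.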


[111] For more details see RFC 4244.


The request history extension is used in IMS to control all aspects of the communication diversion service. It should be mentioned that historically a Diversion header field was used to carry information about a call diversion. This header field was never standardized, but due to the lack of standards it was widely used [112]. There even exists a rule on how to map between both header fields (Diversion and History-Info) in case of interworking [113].

Another aspect of diversion services towards a voicemail system is that, when a diversion is done, not only the address (SIP URI) of the voicemail system is relevant but also the address of the user who is responsible for the diversion, and the cause. Both parameters can be added to a SIP URI as defined in RFC 4458 [114]. In such a SIP URI the diverting user is carried in a target parameter and the reason in a cause parameter, for example ;cause=486. Such a URI shows which mailbox is the target and that the reason for the diversion was user busy.

[112] RFC 5806: Diversion Indication in SIP (historic)
[113] RFC 6044: Mapping and Interworking of Diversion Information between Diversion and History-Info Headers in SIP
[114] RFC 4458: SIP URIs for Applications such as Voicemail and Interactive Voice Response (IVR)

24 SIP-Connected-Id
Chapter 17 (Identity Management) described a method to authenticate the identity of the originator of a session. But how can the identity of the participant at the terminating side be authenticated, or the identity of a new session partner when a participant hands over the session to another person? RFC 4916 [115] offers a solution for this problem. It starts with the fact that the SIP URI of the To header field reflects only the initial target of the originator but not the final destination. Because of re-targeting (changing the value of the request URI) during dialog-initiating requests, the User Agent that receives the session can have an identity different from the one in the To header field. This may happen due to features like call forwarding, call distribution (call centre), call transfer, etc.

The solution is based on an UPDATE request and an option tag from-change. It is applicable only to dialogs (usually INVITE based dialogs) and requires that the User Agents include the from-change option tag in the Supported header field of an INVITE request and of the dialog-creating [116] response. This is depicted in Figure 72.


INVITE        From: Alice ...  To: Bob ...  Supported: from-change
180 Ringing   From: Alice ...  To: Bob ...  Supported: from-change
UPDATE        From: Bob ...  To: Alice ...
200 OK
200 OK
ACK
Media stream

Figure 72: Application of Connected Identity during session set-up

[115] RFC 4916: Connected Identity in SIP
[116] A dialog-creating response is the first response from the User Agent Server to a dialog-initiating request. In case of INVITE this is e.g. a 180 (Ringing) response or a 200 (OK) response, whichever comes first.

When the UAS also supports the from-change option, an UPDATE request has to be sent during session set-up irrespective of whether the identity addressed in the To header field corresponds to the targeted user or not. In the above example the identity has not been changed. The UPDATE transaction may also be authenticated using the Identity and Identity-Info header fields presented in chapter 17. Figure 73 shows the same example, but now the identity has been changed due to some retargeting action within the network.


INVITE        From: Alice ...  To: Bob ...  Supported: from-change
180 Ringing   From: Alice ...  To: Bob ...  Supported: from-change
UPDATE        From: Carol ...  To: Alice ...
200 OK
200 OK
ACK
Media stream

Figure 73: Application of Connected Identity during session set-up with identity change

Please note that even if the SIP URI in the From header field has been changed, the associated tag (From-tag) must be kept, otherwise the UPDATE cannot be associated with the dialog. The connected identity mechanism can also be applied during a session, and also when the identity of the initiator changes. It simply offers the feature to inform the session partner whenever the identity of a participant has changed. It can also be used in re-INVITE requests.
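Why the tags must be kept can be seen from a small sketch of the dialog matching at the originating User Agent (Alice). A dialog is identified by Call-ID, local tag and remote tag, not by the URIs, so an UPDATE with a changed From URI still matches as long as the tags are unchanged. The class name, field names and example values are hypothetical.

```python
from dataclasses import dataclass


@dataclass
class Dialog:
    call_id: str
    local_tag: str         # tag Alice put into the From header field
    remote_tag: str        # tag the UAS put into the To header field
    remote_identity: str   # URI currently displayed for the peer

    def handle_update(self, call_id, from_uri, from_tag, to_tag):
        """Accept an in-dialog UPDATE and take over the new identity."""
        # In a request sent by the peer, From carries the remote tag
        # and To carries the local tag.
        if (call_id, from_tag, to_tag) != (self.call_id, self.remote_tag,
                                           self.local_tag):
            return False   # request does not belong to this dialog
        self.remote_identity = from_uri
        return True
```

In the scenario of Figure 73 Alice holds a dialog with the remote identity of Bob. When the UPDATE arrives with the URI of Carol in the From header field but with the same tags, handle_update returns True and the displayed identity changes to Carol. With a different From-tag the request would not match the dialog.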


25 Questions
After studying the relevant chapter of the lesson you should be able to answer the following questions [117]:

Chapter 3: Event State Publication
- Explain the principle of event state publication based on the principal message flow!
- What network components are involved in event state publication?
- What is the advantage of event state publication compared to event notification alone?
- What is the purpose of the PUBLISH request and does it use a message body?
- What is the purpose of the ETag and SIP-If-Match header fields?

Chapter 4: Event Packages
- What is an event package in relation to the event notification framework?
- Why has PIDF been defined as a technology neutral data format?
- Draw and explain the main components of the presence architecture!
- Why is authorisation of presence subscription usually required?
- What mechanisms for authorisation of presence subscription are available?
- For SIP an enhanced data model for presence was defined. Explain its components!
- Explain some enhancements to the PIDF data structure defined for SIP!
- What is a watcher information event? Explain its usage in case of the presence event!
- What information does the INVITE initiated dialog event offer? For which applications might it be used?
- Explain (based on the message flow) the call-back service implemented with the INVITE initiated dialog event!
- List some event packages and explain their purpose!

Chapter 5: The UPDATE method
- Which problem does the UPDATE method solve?
- What is the difference between an INVITE request and an UPDATE request?
- Draw and comment a typical message flow showing session setup including an UPDATE request!


[117] Note: questions in italics are for an advanced level only


Chapter 6: Resource Management
- Describe the principle of resource management signaling in SIP!
- Which problem does resource management solve?
- How does resource management impact the setup of a session?
- Explain the additional SDP attributes used for resource management!
- Draw and comment an example message flow showing session setup with preconditions!
- How can a user agent tell its peer that resource management should be used?

Chapter 7: Third Party Session Control
- What is a typical application for third party session control?
- Explain the message flow of a third party session control flow and show how SDP data may be exchanged between both User Agents!

Chapter 8: REFER Method
- Explain the purpose of the REFER method!
- What is a refer event and how is it related to the REFER method?
- What is the information carried in the NOTIFY body of a refer event?
- Draw and comment a message flow example of an unattended call transfer based on REFER!
- What is the purpose of the Referred-By header field?
- Why is there a security issue with the Referred-By header field and how can it be solved?
- What is the purpose of the Replaces header field? In which request is it typically used?
- How can the Replaces header field be used in an attended call transfer?

Chapter 9: Conferencing
- What is the advantage of using a central entity (conference focus) for conferencing?
- What does the Event Package for Conference State offer?
- What is the role of the mixer?
- What is the role of the policy server?
- Explain the steps of creating an ad-hoc conference!
- What is the conference-factory URI used for?
- Explain the principle of using a URI list!
- What is the floor control protocol used for?
- By which methods may participants join a conference?

Chapter 10: SIP Based Messaging
- Explain the two different modes of instant messaging in SIP!
- What is the drawback of page mode messaging?
- What does the SDP for the setup of session mode instant messaging look like?
- What is MSRP?
- What is an MSRP relay server?

Chapter 11: INFO method
- What is the INFO method used for and what is its characteristic?
- For which application is the INFO method used very often?
- What are Info-Packages and how are info packages referred to in an INFO request?

Chapter 12: Service Configuration
- Explain the principles of the XCAP protocol!
- What is an XCAP application usage and how is it related to an XCAP URI?
- Explain the principal structure of an XCAP URI including document and node selector!
- Why are entity tags necessary in XCAP protocol handling?
- Explain the XCAP-Diff event!

Chapter 13: NAT and Firewall Traversal
- Why is NAT so bad for SIP compared with other protocols?
- What are the critical points in an INVITE request where in case of NAT wrong addresses might be included?
- The classical STUN protocol classifies NAT/FW mechanisms in four categories. Explain these categories! Which one cannot be solved by classical STUN?
- Why is symmetric NAT so bad?
- What is the principle of the classical STUN server?
- Why has the classical STUN approach been re-worked?
- What is the difference between classical STUN and new STUN?
- What is a TURN server and why does it always help in case of sophisticated NAT/FW situations?
- Explain the principal concept of ICE!
- Which are the three different address categories that are used in ICE?
- What is the purpose of the SIP outbound mechanism?
- What are the drawbacks of application layer gateways?

Chapter 14: Session Timer
- Which problem does the session timer extension solve?
- Explain the principle of the session timer extension!

Chapter 15: Caller Preferences and UA Capabilities
- Explain the principle of User Agent Capabilities!
- How are UA capabilities (media feature tags) included in SIP signaling?
- Give some examples of media feature tags!
- By which header fields can a caller make use of User Agent capabilities?
- What is the purpose of the Request-Disposition header field?

Chapter 16: Globally Routable User Agent URI (GRUU)
- What is the purpose of GRUUs?
- Explain the two types of GRUUs!
- Explain the mechanism by which a GRUU is assigned!

Chapter 17: Identity Management
- Why is there a problem with the identity of the caller and the callee in basic SIP?
- How can the problem be solved in basic SIP?
- What is the drawback of the S/MIME based solution for offering secure identities?
- Explain the principal mechanism of the authenticated identity management solution!
- Draw and comment the identity management solution!
- What is the purpose of the Identity and the Identity-Info header field?

Chapter 18: ENUM
- Which problems does ENUM solve?
- How can a PSTN user be addressed by a SIP user?
- How can a SIP user be addressed by a PSTN user?
- Explain the ENUM mechanism!

Chapter 19: Privacy Mechanism
- What problem does the privacy mechanism solve?
- Which header field is the basis of the privacy mechanism?

Chapter 20: Reason
- What is the purpose of the Reason header field?
- Where can it be used? Give an example!

Chapter 21: Path
- Which problem does the Path mechanism solve?
- Explain the principle of the Path mechanism including message flow!

Chapter 22: Service-Route
- Which problem does the Service-Route mechanism solve?
- Explain the principle of the Service-Route mechanism including message flow!

Chapter 23: Request History
- What is the purpose of the History-Info header field?
- What information does the history index provide?

Chapter 24: SIP-Connected-Id
- Which problem does the SIP-Connected-ID mechanism solve?
- Explain the principal mechanism of the extension!
- What happens with the From- and To-tag when the identity is changed?
- How can the identity of the User Agent server be authenticated?