Session 1A - Overview of ITIL Service Management and ITIL

S1AP1 - Objectives

Welcome to this computer based training course in IT Service Management.

This course has been designed to provide you with sufficient knowledge to pass the ISEB and EXIN Foundation level exams.

In this introductory session we will:

Discover what ITIL is, and how ITIL fits into a quality environment.
Examine Service Management and the Organisation, the ICT infrastructure, and how we define a service in IT terms.

Finally we will examine the functions that make up the core ITIL processes.

S1AP2 - Using this Course

The user interface for this course is designed to be fairly self-explanatory. However, we will take just a few minutes to take you on a tour of the various controls and features so you will get the most from your time spent in studying the course.

Most of the screen that you are looking at is taken up with the work area. This is where material is presented, questions are asked and interactions are made at various points throughout the course. Around the edges are the various controls and items of information that you will find useful as you progress through the course. Here in the top left you will see the reference for the session that you are currently studying, the subject that is currently being covered within the session and also a detailed page reference. If you have any queries or problems you should note this reference and quote it when you contact our support staff for assistance.

Along the top and bottom areas of the screen, you will see a number of buttons to help you navigate around the course and access some of the ancillary features that are provided.

The function of the control buttons is fairly self-evident. For example, this button here will allow you to pause and restart the course at any point.

Once a page has played completely, a review bar is displayed. Clicking and dragging the button left or right will allow you to rewind or fast forward to any point in the page.

Clicking this button will take you forward to the next page in the session. The end of each page is signalled by a "ding" - when you hear that, you can then press the Next page button when you are ready to continue.

If you want to review a previous page, then the back-arrow button will return you to the page before the one you are now studying.

Clicking on this button will exit you from the current page and return you to the main menu screen.

Over here there is a progress bar gauge, which indicates how far you are through the current session.

Your position in each of the sessions is automatically recorded - or book-marked - as you start and end a page.

When you select a session from the main menu you can either put your mouse cursor over the main session title, which will always take you to the start of the session. Or, you can click anywhere on the bar gauge to return to your bookmark.

S1AP3 - Using this Course

Along the top of the screen you will find some very useful function buttons. Running from left to right these buttons allow you to access:

A course contents pane. Here the main sections are listed in the order they appear in the course. Think of this as a "Contents" page at the start of a book. By clicking on any entry here you will be taken straight to that page within the course.

The next button along will take you to a Glossary of ITIL terms. Clicking on the tabs at the top of this pane will display the glossary entries in alphabetical order.

A course keyword index pane. Think of this as the index page that you would find at the end
of a book. If you want to find out about something very specific then simply find the word on the list and click on it to be taken to the appropriate page within the course. Alternatively you can search for a specific word by typing in the search field.

Note that when you use either of these functions, the book-marking facility is temporarily switched off.

The next button along accesses your Favourites. Clicking here reveals a box where you can add the current page you are on to a list of your favourite pages in the course, or ones which you would like to review at a later time. This list can be edited by changing the name of the page for your reference and removing pages from the list. Clicking on any of your favourites in the list takes you to that page.

The FAQ button provides you with a list of Frequently Asked Questions about ITIL for your reference. The acronym buttons will take you to a directory of ITIL acronyms, which will help you understand parts of the course. Clicking on the tabs at the top of this pane will display the acronym entries in alphabetical order.

You can adjust the sound to a comfortable level by clicking the volume control button. Once you are happy with the volume level, release the slider and the panel will close.

This button opens your 'Sticky Note Pad'. Each session contains its own Sticky Note pad. You can use this to type reminders, thoughts or information that you think will help you as you prepare for the exam. If you wish you can copy and paste text from the subtitle pane into your sticky note pane. If you would like to print your sticky note pad - click here.

Clicking this button will reveal the subtitle pane. This useful function allows you to see a text transcript of the page narration. A useful slider is provided to adjust the text size, and the font can be changed by clicking here. You can print the subtitles by clicking here.

S1AP4 - Activity

S1AP5 - What is ITIL?

So what is ITIL?

ITIL is an acronym for Information Technology Infrastructure Library. It consists of a library of reference books outlining good practice guidelines for IT Service Management.

It was conceived by the UK government who approached various organisations and subject matter experts to write all of the books in the library, and it was originally published in the late 1980's.

The ITIL library is published by the Office of Government Commerce, or OGC, and in 2001 revised versions of the ITIL manuals were published to include, amongst other things, recent technological developments, such as the internet and e-commerce. Further updates to the manuals were published in 2002.

Since its inception ITIL has expanded from a library of books into a whole industry, with many organisations offering related products including training, consultancy and management tools.

S1AP6 - What is ITIL?

The ITIL Library consists of seven volumes, although the central part of the library consists of just five: Service Delivery, Service Support, Business Perspective, Infrastructure Management, and at the centre - Application Management.

Application Management holds the central position as it's the only volume in the library which deals with both Development and Service Delivery issues. There are two further ancillary volumes, which provide additional guidance. They are:

'Planning to Implement Service Management', used by Project managers who are implementing ITIL, and 'Security Management', which offers additional information on infrastructure.

For the purposes of this course we are interested in what's known as 'Core ITIL'. This core consists of the two major volumes, 'Service Support' and 'Service Delivery'.

In addition to the two main manuals we will also refer to a guidance overview booklet known as 'little ITIL', and its sister publication "A Dictionary of IT Service Management". These overview booklets are published by the IT Service Management Forum or ITSMF.

This course forms an 'introductory overview' to the content of both books, and you will find that much of the material is also summarised in the 'little ITIL' book. This 'overview' will provide you with enough knowledge to confidently sit for the Foundation Certificate in Service Management.

S1AP7 - Activity

S1AP8 - ITIL and ISO9001

Today's businesses need to concentrate on providing a 'Quality Service' and to adopt a more customer focussed approach. ITIL provides a best practice framework focusing on the provision of high quality services, and it places particular importance on customer-supplier relationships.

For example, areas within 'Service Delivery' address customer agreements and monitor targets within these agreements. On an operational level 'Service Support' processes address any changes or failings outlined in these agreements.

In both cases, there is a strong link between ITIL and recognised quality systems, such as ISO9001. ITIL's non-prescriptive nature allows the tailoring of 'Service Management', allowing it to sit comfortably alongside a recognised quality system.

Many companies require their suppliers to become registered to ISO 9001 and because of this, registered companies find that their market opportunities have increased. In addition, a company's compliance with ISO 9001 ensures that it has a sound Quality Assurance system.

Registered companies will usually benefit from reductions in Customer complaints, significant reductions in operating costs and increased demand for their products and services.

S1AP9 - The Organisation

In any organisation, managing IT services is a fundamental part of day to day operations. As well as maintaining and servicing these ongoing business functions, an organisation will often be developing new applications.

Each new application might be made up of a number of projects, or a group of projects, known as a programme. The relationship between these different projects needs to be understood and documented in order to monitor progress, change and so on.

As these projects develop they approach a transition point. A transition point is defined as the point at which responsibility for the project passes from the development team to the team responsible for end user delivery and support.

This transition point is also known as the implementation point, and it can vary depending on organisational structure and policy.

For example a development team might retain project responsibility until the end of a warranty period, at the end of which they hand over the completed project, and associated ownership, to service management staff.

ITIL defines a major process to handle the complex relationships which affect projects, and this is known as Application Management.

Application Management considers the whole 'cradle to grave' lifecycle of an application, considering issues from feasibility through productive life and final retirement of the application.

It considers applications as 'strategic resources' that need to be managed throughout their life, understanding the implications that decisions made at one stage have on later stages. Although this process isn't examined in detail in this course, it is important to understand the relationship between Service Management Guidance and the IT business as a whole.

S1AP10 - Activity

S1AP11 - The IT Infrastructure

If service provision to business is to be effective, then its implementation should be as transparent as possible.

It should be assumed that end-users have no ICT knowledge. IT Service Management staff must take a customer focused view and concentrate on providing high quality services that are available when users want them, that respond quickly to demand, and that are easily maintainable.

As IT management staff, you will be working alongside technical specialists helping to maintain the ICT infrastructure, and ensuring that delivered services are cost effective.

The ICT infrastructure is divided into 3 areas: Hardware, Software and Peopleware.

Hardware consists of all the ICT and environmental infrastructure, including mainframe computers, network equipment, workstations and so on.

Software consists of network and mainframe operating systems, database management systems, development tools and general applications and the computer data itself.

The inclusion of data here is contentious, as it's suggested by some people that a fourth infrastructure category should exist, handling data as a separate corporate resource.

And finally, Peopleware. This includes skills sets, details of training products, documentation of both products and services, working practices and general procedures.

To deliver effective services to business, all three infrastructure components should be managed and controlled efficiently.

The management of Hardware and Software is dealt with in a separate ITIL guidance volume called 'ICT Infrastructure Management'.

Our focus in this course is the management of 'Peopleware', its documents and procedures, and how it relates to Service Support and Service Delivery.

S1AP12 - What is a Service?

So, what does ITIL regard as a Service?

We all encounter business services in our everyday lives. Placing an order for goods or services for example, or when checking into a hotel, we are being offered a business service.

In most cases businesses are underpinned by IT services. The IT service consists of a set of related functions provided by IT systems in support of the business, and is seen by the customer as a coherent and self-contained entity.

Obvious examples of IT services might include e-mail, payroll and order processing.

The ITSMF's Dictionary of IT Service Management defines Service as:

An integrated composite that consists of a number of components, such as management processes, hardware, software, facilities and people, that provides a capability to satisfy a stated management need or objective.

S1AP13 - ITIL Disciplines

The core ITIL processes are made up of eleven disciplines.

Five of these disciplines relate to service delivery. These are Service Level Management, IT Financial Management, Availability Management, Capacity Management, and IT Service Continuity Management.

The remaining six disciplines make up the Service Support function. These are Service Desk, Incident Management, Problem Management, Change Management, Release Management, and Configuration Management.

All six disciplines relate to the day to day maintenance of a quality service. In ITIL terms all of these disciplines, except for Service Desk, are defined as processes. Service Desk, in ITIL, is seen as a function.

You should be aware that ITIL draws a differentiation between functions and processes and these are defined in the ITIL Dictionary. The definitions are repeated here for your convenience.

S1AP14 - ITIL Disciplines

ITIL does not mandate the creation of specific functional areas. So, for example a Problem Management team need not be separate from a Capacity Management Team and so on. In practice, many organisations do follow this model, but ITIL guidance allows you to form your own structures.

However, ITIL does suggest one good practice, and that is for Configuration, Change and Release Management to 'share' staff, and to be managed by one individual. This shared management is known as the CCRM or
Configuration, Change and Release Management function.

Although we have represented each discipline here as a separate entity a great deal of interactivity exists between each of them. Each process communicates with others in the group. In fact there is a great deal of relationship management within IT Service Management.

For example, Service Level Management deals with the provision of high quality services, provided at the right cost levels. Consequently it interacts frequently with IT Financial Management.

Interaction between other processes might be less frequent. For example, Capacity Management and IT Service Continuity Management might work together to develop a cost effective and workable strategy to handle a major disaster, such as a flood.

In this scenario, information on available capacity at a remote site or location would be provided by Capacity Management.

The pre-determined level of support required for on-going business function would be managed by IT Service Continuity Management.

These 11 disciplines and the relationship between them form the basis of this course, and are the subject of the ISEB and EXIN examinations, leading to certificates in Foundation IT Service Management.

S1AP15 - Summary

In this introductory session we have briefly examined the history of the ITIL library, its make-up, and how Service Delivery and Service Support sit at its core.

We have discussed how ITIL's flexibility allows easy integration into a recognised quality system, such as ISO9000.

We looked at the ICT infrastructure and its three constituent components, Hardware, Software and Peopleware. We highlighted Peopleware, its documents and procedures as a primary focus for this course.

We defined 'What a service is' in IT terms, and examined some less obvious examples of 'IT services'.

And finally we looked at the eleven disciplines which form the core ITIL processes, and the interactivity which exists between them within IT Service Management.

Session 2A - Service Desk


S2AP1 - Objectives

In this session we will be examining the IT Service Desk, which is described in Chapter 4 of the Service Support book of the IT Infrastructure Library.

When you have completed this session you will be able to:

List the main reasons why the establishment of a service desk can have major benefits for the organisation, the end-user and the IT provider alike.
Describe the importance of the Service Desk as a single point of contact for IT users.
Identify three of the main approaches to structuring a service desk.
Explain what is meant by "escalation" in a service desk context and identify two different types of escalation procedure.
Name at least six technological aids that can be employed to improve the efficiency of a service desk.

S2AP2 - Activity

S2AP3 - Introduction

One of the most important considerations when delivering IT Services is to ensure the provision of proper support for the users, so that when an issue or a query comes up, they can contact someone who will provide a solution or an answer.

Often, time is of the essence and users want either a rapid resolution or a work-around to the fault that will enable them to carry on with their work with a minimum of interruption.

In order to support users in this way, ITIL has three closely related chapters, namely: Service Desk, Incident Management and Problem Management.

The Service Desk is meant to be the focal point for the reporting of incidents, requests for change, or any queries that a user may have about the service. On the other hand it also provides a channel for the IT provider to communicate information to users.

The Incident Management process enables the recording, tracking, monitoring and resolution of events that are a threat to "normal service".

Problem Management addresses the underlying reasons for such incidents and seeks to implement permanent resolutions in order to prevent a recurrence.

We will be looking in more detail at both Incident Management and Problem Management in the remaining two sessions of this topic. For the rest of this session we will be examining the Service Desk function.

S2AP4 - Introduction

When a Customer or User has an issue, complaint or question, they want answers quickly. More importantly they want a result - the issue solved.

Nothing is more frustrating than calling an organisation and getting passed around until you find the right person to speak to - provided, of course, that they are not out to lunch or on holiday or that it's just after five o'clock.

ITIL Best Practice demands a single point of contact for users in their communication with the IT service provider.

Such a facility is known by various names in different organisations - some common ones being Help Desk, Call Centre or Customer Hotline. The name used by ITIL - and hence during this course - is "Service Desk".

Obviously, what ITIL is referring to in this context is an IT Service Desk but the principle can be, and often is, applied to many areas of a company's business.

So, in addition to an IT Service Desk there may be a Service Desk where customers for the company's products can call to get support. Another Service Desk may exist so that employees can get answers to queries relating to company policies, personnel issues and so on.

For the purpose of this course we will be making the assumption that the term Service Desk refers to an Information and Communications Technology - or ICT - Service Desk. The integration between IT and communications technology is so close these days that it makes
sense to handle them via the same Service Desk.

S2AP5 - Activity

S2AP6 - Why have a Service Desk?

The establishment and operation of an effective Service Desk is a relatively expensive proposition. So it is important to understand why such a facility might be needed and the benefits that it should provide.

The users of our IT services and their managers are customers in every sense of the word.

Like all customers they would quickly become frustrated and unhappy if they were unable to find somebody who could help them when they had issues with the systems on which they depend.

So customer satisfaction and retention can be listed as an important benefit.

Another guiding principle of ITIL is that IT should maintain a focus on the support of business goals. IT does not exist to provide ICT components or technology just for the sheer joy of playing with new equipment.

It is there to help the organisation achieve its business objectives. A well-staffed and efficient Service Desk is a critical element in proving to the business that IT is listening and responding to their needs.

S2AP7 - Why have a Service Desk?

An efficient Service Desk can help to reduce the overall cost of ownership of the IT department, and it can do this in a number of ways.

For example, one alternative to a Service Desk is for each group of users to have their own "super-user", to whom they can turn when things go wrong.

However, ITIL strongly suggests that IT costs can be reduced by not requiring high levels of IT skills within the business community, and by making it obvious to all how support can be achieved very quickly via a Service Desk.

Also, making better use of skilled and expensive IT staff can reduce costs. Straightforward issues can be resolved immediately by the Service Desk, leaving skilled network technicians or database experts, for example, to concentrate only on the complex faults or concentrate on improving the quality of the infrastructure.

It would usually be the case that the users or customers are performing a valuable function for the organisation.

So, any time that they are unable to operate at full efficiency as a result of a fault with the IT Services that they use will be both disruptive and costly. An effective Service Desk will significantly reduce the likelihood of such faults.

This factor becomes even more crucial in an e-business context where the lack of service will directly impact on customers and certainly lead to loss of business.

S2AP8 - Why have a Service Desk?

Another major benefit of a Service Desk is its contribution to the continuous improvement of the services offered by IT. The Service Desk will keep records of types of enquiry, the issues that are raised, the particular services, or aspects of a service, that seem to cause most issues and so on.

Identifying the most commonly occurring faults and feeding this information back quickly to the IT Service Management structure is a critical aspect of the Service Desk. In this way, the Service Desk is the thermometer by which we can monitor the health of the IT services that are being provided.

Additionally, the service desk can also operate as a "shop window" - adding value to the business by making users aware of facilities that they may not know exist - or how to make better use, in a business sense, of the facilities that they are already using.

S2AP9 - Activity

S2AP10 - Points of Contact

There is often some confusion about the terms "user" and "customer" - so far in this course we have used the words interchangeably and for many people they mean pretty much the same thing.

ITIL, however, does draw a distinction between the two terms.

A User - or End-User - is taken to mean the person who actually uses the product or service under discussion. A machine operator for example.

A Customer is the person who negotiates for the provision of the product or service, what the specification should be, any changes that may be needed and possibly the payment arrangements.

It may well be that the User and the Customer are the same person. But in many cases, for operational systems, they will be different groups of people. Customers normally being managers, and users being the operators.

These definitions are relevant here because whilst the Service Desk is the main point of contact between the User and the IT service provider, the Service Level Management process is the main point of contact between the paying customer and the provider.

In both cases the key point of reference is the IT Service itself - as defined in the Service Level Agreements - which will contain statements about hours of availability, time to resolve issues, response times and so on.

The importance of this to the Service Desk is that they must be aware of what Service Level Agreements are in place and how these relate to the questions, complaints and issues that may be being raised by users.

It may well be, for example, that a user calls in complaining of a 2 second transaction response time - when in fact the Service Level Agreement specifies that 95% of responses should be within 4 seconds.

Such an incident would be given a much lower priority than had the figures been reversed.

So, the general point is that the Service Level Agreements provide the link between the Customer, User and Service Level Management and that the Service Desk has a responsibility to act on behalf of the User within the IT infrastructure.

S2AP11 - A Single Point of Contact

As we have already seen, the idea of the Service Desk as a single point of contact is an important one in ITIL.

Some organisations will take this principle to its ultimate conclusion and have a single Service Desk as the point of contact for everything to do with the ability of the business to continue to function properly. So staff within such an organisation could call the Service Desk if the lift broke down, or a light bulb in their area failed, or if they had a query on their pension arrangements.

This kind of Service Desk has the disadvantage of demanding a very wide range of skills to be available - which normally implies a referral system being used - which in turn reduces the chances of a problem being resolved directly and immediately at the desk.

For our purposes in this course, we'll assume a Service Desk is the single point of contact for just Information and Communications Technology issues, providing users with a single telephone or fax number, or with a single web or email address.

S2AP12 - Activity

S2AP13 - A Single Point of Contact

So, as the single contact point, the first duty of the Service Desk is to act as the IT user's "friend" within the IT department.

This particularly relates to the role of the Service Desk in:

Monitoring progress on incidents and queries.
Reporting this progress back to the user.
Chasing any experts that have been assigned responsibility for resolving an issue.
Keeping an eye on any Service Level Agreements that may specify maximum acceptable response times for resolving user issues.

As the user's friend, the Service Desk has the responsibility of communicating with the user, both reactively and proactively.

"Reactively" being in response to issues and queries raised by the users and "proactively" being where the Service Desk goes out to make users aware of issues that might affect them.

It is not uncommon, for example, for the Service Desk to publish regular electronic
Session 2A - Service Desk

newsletters to the user community informing them of new facilities, changes to services and
soon.

and hence can provide local expertise to solve


local issues.

S2AP14 - A Single Point of Contact

There are a couple of obvious disadvantages to this approach, such as duplication of


resources and the maintenance of

In order to operate effectively as a single point


of contact and the user's friend, the Service

Desk should have the following ingredients: Well trained staff with good interpersonal skills.

organisation-wide standards and consistency. Also, sessions learned in one area may not be passed on to the others.

Well organised systems and processes for recording and tracking incidents and matching against previous incidents and solutions.
Appropriate technology, such as automatic call distribution equipment and knowledge-based systems that assist in identifying solutions to
issues.

Such difficulties can be minimised by the use of centralised logging of incidents and resolutions and by establishing a central configuration management database that is accessible by all the local service desks.
The big advantage of this approach, knowledge, will obviously become important the more geographically functionally dispersed the organisation's local more and sites

become. In these situations, the issue of

Enough technical competence to address users' issues directly or to interface with technical experts if necessary.
language alone may favour local service desks.

In addition, the Service Desk must have all the necessary linkages with other ITIL disciplines.

For example, there must be continuous communication with the Problem Management process - particularly when a major incident has cropped up.

There will also need to be liaison with Service Level Management so that potential breaches of Service Level Agreements can be recognised. Configuration Management records will need to be readily accessible so that, for example, a caller's IT equipment can be easily identified. Conversely, the Availability Management process will be keen to look at Service Desk records of incidents for conducting their own analyses and as part of their role in improving service availability.

S2AP15 - Service Desk Structure

A debate that often takes place in the early stages of implementing a service desk is how the desk should be structured, from a geographical perspective. There are a number of strategies that will usually be considered.

Here for example, each distinct site or region of the organisation has its own service desk -

S2AP16 - Service Desk Structure

The opposite extreme of the local service desk is the central service desk, where all incidents and queries are reported to and handled by a single centralised structure.

Centralisation has the benefit of providing consolidation of management information and improves utilisation of resources - and therefore can reduce operation costs.

There are dangers, however, in that a perceived loss of local knowledge may tempt local sites to set up their own super-users or unofficial help desks. Another major issue with this centralised approach is the cost of voice and data communications.

Particularly in an international context, careful planning will be needed, otherwise long distance telephone calls could easily drive up the cost of providing the service to unacceptable levels.

The Virtual Service Desk is based on the concept that physical location is not relevant and that whilst the Service Desk may be perceived as a centralised point, it may actually consist of several local service desks.

The Virtual Service Desk is often implemented as an example of a distributed Service Desk.


As far as the local users are concerned they are contacting a local service desk - but in reality their calls may be automatically routed to the most appropriate desk, based on proximity, time of day, staffing or whatever other criteria apply.

Although there are some complexities with this approach, it clearly has many advantages and is becoming a very common arrangement for multi-national organisations offering 24 hour, 7 day a week coverage - particularly those in the e-commerce field.

This option is obviously much more demanding on the use of technology, particularly telephony re-routing equipment, in order to ensure that the whole process appears transparent to the end user.
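
To make the routing idea a little more concrete, here is a minimal sketch in Python of how a call might be sent to the most appropriate desk. It is an illustration only and not part of ITIL: the desk names, opening hours, staffing figures and the rule "own site first, then best-staffed open desk" are all assumptions made for the example.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta, timezone
from typing import Optional, Sequence


@dataclass(frozen=True)
class Desk:
    name: str
    utc_offset_hours: int  # assumption: fixed offset, ignoring daylight saving
    opens: int             # local opening hour, 24-hour clock
    closes: int            # local closing hour, 24-hour clock
    staff_on_duty: int

    def is_open(self, now_utc: datetime) -> bool:
        # Convert the current time to the desk's local time and check its hours.
        local = now_utc.astimezone(timezone(timedelta(hours=self.utc_offset_hours)))
        return self.opens <= local.hour < self.closes


def route_call(desks: Sequence[Desk], caller_site: str, now_utc: datetime) -> Optional[Desk]:
    """Pick a desk for the call, or None if every desk is closed."""
    open_desks = [d for d in desks if d.is_open(now_utc)]
    if not open_desks:
        return None
    # Proximity first (the caller's own site), then the best-staffed open desk.
    return max(open_desks, key=lambda d: (d.name == caller_site, d.staff_on_duty))


# Hypothetical desks, for illustration only.
DESKS = [
    Desk("EMEA", utc_offset_hours=0, opens=6, closes=18, staff_on_duty=8),
    Desk("APAC", utc_offset_hours=10, opens=6, closes=18, staff_on_duty=5),
]

midday = datetime(2024, 1, 15, 12, 0, tzinfo=timezone.utc)
late_evening = datetime(2024, 1, 15, 23, 0, tzinfo=timezone.utc)
print(route_call(DESKS, "EMEA", midday).name)
print(route_call(DESKS, "EMEA", late_evening).name)
```

The caller always uses the same contact point; only the routing rule decides which physical desk answers, which is what keeps the arrangement transparent to the end user.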

S2AP17 - Service Desk Structure

The logical extension of the virtual service desk is what is sometimes called the "follow the sun" option.

This is widely used by multi-national companies - or even, these days, by local companies who want to take advantage of cheaper labour rates in other parts of the world.

So a typical "follow the sun" strategy might consist of a service desk in Australia, operating between the hours of 6am and 6pm local time, and a second desk in London operating the same hours local time there.

The aim is to provide as close to 24 hour coverage as possible for users in each hemisphere, with the European service desk coming on line just as the Australian one is closing down for the night - and vice versa.

So, people in Europe requiring support during the night will have their calls automatically re-routed to Australia.

A major advantage of this approach is that the local desk will tend to be handling local calls during the period of peak demand - so that overnight re-routing, and hence long-distance traffic, should be relatively minimal - but it's there if needed.

Of course, "follow the sun" may well be more than two service desks, depending on the location of users, time differences and coverage required.

To make this work effectively it is imperative that information about incidents is replicated or shared between the different sites so that the European Desk, for example, can continue to support a user with a query that may have been raised with the Australian Desk a few hours earlier.

S2AP18 - Activity

S2AP19 - Communicating with the Service Desk

Faults or issues can be communicated to the service desk via many mechanisms.

These can be categorised into two sorts: human generated and machine generated.

Human users can communicate using a whole range of options such as telephone, fax, voice mail, e-mail, browser-based web-forms and so on.

Machine generated communications could come from some form of system monitoring tool. For example, the loss of a particular communications link in a network would usually be reported via network monitoring software. Such incidents are often referred to as Operational Events.

These automatically generated notifications allow the Service Desk staff to inform users about possible issues caused by the fault or take action to repair it.

So when a service desk is established, the different inputs that will be encountered must be anticipated and catered for.

Clearly, some of these inputs allow potential for some form of automated response. If something comes in via e-mail then at least an acknowledgement of receipt can be generated automatically.

It may even be possible to introduce a degree of self-service where users register and track their own incidents without the need for inter-personal communication with service desk staff.

Be careful with this one though. It can all too easily be used as an excuse for the service desk not playing its role in monitoring and processing incidents on behalf of the user as the user's friend.
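
As a rough illustration of anticipating the different inputs discussed in this part of the session, the sketch below sorts an incoming contact into human generated or machine generated, and produces an automatic acknowledgement where the channel allows it. The channel labels for monitoring tools, the set of channels that can be acknowledged automatically, and the wording of the acknowledgement are assumptions made for the example.

```python
from dataclasses import dataclass
from typing import Optional

# Human channels are those named in the session; the monitoring labels are hypothetical.
HUMAN_CHANNELS = {"telephone", "fax", "voice mail", "e-mail", "web form"}
MACHINE_CHANNELS = {"network monitor", "system monitor"}
# Assumption: only these channels can be answered automatically.
AUTO_ACK_CHANNELS = {"e-mail", "web form"}


@dataclass
class Contact:
    channel: str
    sender: str
    text: str


def source_type(contact: Contact) -> str:
    """Classify a contact as human generated or machine generated."""
    if contact.channel in HUMAN_CHANNELS:
        return "human"
    if contact.channel in MACHINE_CHANNELS:
        return "machine"
    # An input nobody planned for is exactly what the desk needs to find out about.
    raise ValueError(f"unanticipated input channel: {contact.channel}")


def acknowledgement(contact: Contact, reference: str) -> Optional[str]:
    """Return an automatic acknowledgement, or None if the channel needs a person."""
    if contact.channel not in AUTO_ACK_CHANNELS:
        return None
    return f"To {contact.sender}: your report has been logged as {reference}."


mail = Contact("e-mail", "j.smith", "Cannot open the sales report")
print(source_type(mail))
print(acknowledgement(mail, "INC000123"))
```

Note that the acknowledgement only confirms receipt. Ownership of the incident still sits with the service desk.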


Also, be careful with telephone calls. If they are not handled properly it is possible that the user will hang up in frustration and not re-dial. Hence the information that would have been gained about a particular incident or query will be lost. All that would be recorded is that a call had been dropped, which in turn will be used as a key measure of service desk performance.

Lost calls of this kind are often referred to as "fugitives". There's a fault out there that cannot be investigated because it hasn't been recorded - and although the user could have been more persistent, the fault is with the service desk staff and/or their technology for not making it easier for them to report the incident.

S2AP20 - Escalation

Escalation Management is an important part of running an effective service desk. Escalation is the process of moving an incident or query to the point where it is most ably resolved.

ITIL distinguishes between Functional and Hierarchical escalation.

Here for example, in a generic rather than just ICT service desk, calls that cannot be directly handled by the service desk will be directed to experts in the relevant functional area. The percentage of calls that get passed upwards will be determined by the skill levels and training of the service desk staff.

So functional escalation is the handing over of responsibility to a functionally more competent area, in order to tackle a particular issue.

Hierarchical escalation is where issues are passed up the management chain - either because they are very serious or need higher level authority to sanction the resources needed to provide a solution.

The first level of hierarchical escalation would normally be to the service desk manager, who is usually the owner of the incident management process.

More serious issues may then go up to the problem management team, with a remit to call together the necessary specialists to resolve the incident as quickly as possible.

Very explicit parameters need to be established to govern hierarchical escalation; otherwise it is very easy for it to become the norm, rather than the exception, which would clearly be unacceptable.

S2AP21 - Service Desk Capability

Related to the escalation procedures is the general debate about how skilled and capable of resolving the issues the service desk staff should be.

ITIL does not make any recommendations in this respect because there is no absolute answer - every case must be considered on its merits.

Factors that are normally considered include the increased costs of employing more highly skilled staff against the improved service to the end-users that will almost certainly result.

Also this may be a dynamic situation with the optimum skill level changing over time. Immediately following the introduction of a new service, for example, it may be desirable to have some experts available on the service desk to handle the initial rush of calls about the new system.

Once things have bedded down it may be possible to re-locate them to more productive areas.

So at the one end of the scale we may have an unskilled service desk, merely logging and routing calls - and at the other end would be an expert desk capable of handling most, if not all, the conceivable issues at the first point of call.

In between these would be what is often called the skilled or semi-skilled service desk - and this is considered by many to be the optimal solution.

Achieving this optimal balance is an interesting and difficult task. As we have said, there are no hard and fast rules.

There is a school of thought that says a good target is to have about 70% of all issues resolved at the service desk, without further referral.

But this will vary considerably depending on the service being offered and the maturity level of that service.
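
The 70% figure is simple arithmetic once each closed incident records whether it had to be referred onwards. The sketch below shows one possible way of measuring it. The record layout and the `escalated` flag are assumptions for illustration, and the target itself is only the rule of thumb quoted above, not an ITIL requirement.

```python
from dataclasses import dataclass
from typing import Sequence

TARGET = 0.70  # the "about 70%" rule of thumb mentioned in the session


@dataclass
class ClosedIncident:
    reference: str
    escalated: bool  # True if it was referred beyond the service desk


def first_line_resolution_rate(incidents: Sequence[ClosedIncident]) -> float:
    """Share of incidents resolved at the service desk without further referral."""
    if not incidents:
        return 0.0
    resolved_at_desk = sum(1 for incident in incidents if not incident.escalated)
    return resolved_at_desk / len(incidents)


def meets_target(incidents: Sequence[ClosedIncident]) -> bool:
    return first_line_resolution_rate(incidents) >= TARGET


# Hypothetical month: ten incidents, three of which were escalated.
month = [ClosedIncident(f"INC{n:03d}", escalated=(n <= 3)) for n in range(1, 11)]
print(f"{first_line_resolution_rate(month):.0%}")
print(meets_target(month))
```

Tracking the figure per service rather than for the desk as a whole would reflect the point that the right level depends on the service and its maturity.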


Whatever skill level is adopted, the use of diagnostic scripts will increase the rate of resolution at first call, as will access to knowledge databases, change schedules and so on.

Service Level Agreements must also be accessible so that work can be prioritised depending on the SLA clauses.

S2AP22 - Service Desk Capability

Regardless of the technical skills that are put in place on the Service Desk, all operators must have certain basic attributes to make them suitable for the job.

These will include:

A customer-focussed attitude - where helping the customer is far more important and satisfying than playing with the latest technology.

An articulate nature - in particular the ability to translate technical information into something that is meaningful to the business user. This can be particularly challenging when dealing with customers who are slow to catch on or who become frustrated, irate or even abusive.

A methodical approach to questioning and the recording of facts - and the ability to maintain that approach when under severe pressure or when handling a difficult customer.

A good business perspective and understanding of what are the business critical services. This business culture is often helped by recruiting service desk staff from within the business itself.

And finally - multi-lingual capability is becoming an increasingly important attribute for some service desk staff. This is particularly true in the case of the virtual service desk, as discussed earlier, or in multi-national organisations.

S2AP23 - Activity

S2AP24 - Service Desk Technology

For the service desk to work effectively, some investment in modern technology will be needed.

Relevant technology can be categorised into two types, telephony and software.

Examples of telephony technology might be Automatic Call Distribution systems, which ensure that a bank of service desk operators are used in an optimal order and that work is smoothed out as evenly as possible.

Conference call facilities can be useful in allowing a second-line expert, for example, to be included in the conversation with the end-user.

Computer-Telephony Integration can achieve major gains in efficiency. An example of this would be the identification of an incoming caller based on their telephone number and the linkage of this with a configuration management database.

This would allow all the details of the user, their facilities and equipment, and possibly service history, to be brought to the operator's screen before the call is even answered.

Useful software technology would include Intelligent Knowledge-based systems that record incidents, learn from them, identify patterns over time and are able to suggest probable causes and solutions.

In addition, database access would provide fast identification of known errors, problems or any information that would help to provide a better answer to a call.

Also being widely implemented are Automatic Referral or Escalation tools which divert an issue to a pre-determined list of second-line support staff, perhaps after a certain period of time.

And finally Automatic Tracking and Alert tools could be used to monitor the status of an incident as it progresses through the various stages towards a resolution.

As with all business investments - the cost of introducing such technology must be carefully weighed against the benefits that it brings in terms of service improvements and operational efficiency.

S2AP25 - Benefits & Problems

The benefits of and potential difficulties with Service Desk are listed on page 14 of the little ITIL book and in section 4.1.8 of the Service Support manual. They are also summarised here for your convenience.


S2AP26 - Summary

In this session we have been looking at the reasons for and functions of an ICT Service Desk.

We have seen how the Service Desk's role is to act as a single point of contact and the user's friend in IT.

We have examined different strategies for structuring and resourcing a service desk and we have seen the skills and attributes that service desk staff must have if they are to operate efficiently.

Finally we have seen some of the new technology that can be employed to improve the efficiency of operation of the service desk.


Session 2B - Incident Management

Session 2B

Incident Management

S2BP1 - Objectives

When you have completed this session you will be able to:

Define the term Incident Management according to ITIL Best Practice.

Understand the difference between Incident Management and Problem Management.

Identify the key stages in an Incident's Lifecycle.

Assess the priority of Incidents.

S2BP2 - Introduction

Historically, incidents were handled by a fragmented set of processes where users faced with a problem would contact IT staff direct and any resolutions would not be documented.

Alternatively, system monitoring tools may have alerted technical specialists who would rectify the problem, but again with no central recording or control.

This approach led to poor use of expensive resources - the IT experts - and to a failure to learn lessons from previous incidents. ITIL Best Practice processes aim to resolve both of these issues.

One of the main goals of Incident Management is to restore normal service as quickly as possible, with a minimum of disruption to the business.

This has to be balanced against the efficient use of resources - and the prioritisation of different incidents that can occur simultaneously.

It is important to distinguish between Incident Management and Problem Management, which is the subject of the next session.

Incident Management is more aimed at a "quick fix" or a workaround rather than a longer term structural resolution to any fault. The priority for Incident Management is recovery of service as quickly and painlessly as possible.

Problem Management is more about identifying the underlying cause of faults and finding ways of engineering out these faults in the longer term.

This can of course lead to some conflict between the two disciplines when Incident Management staff are driven to get a system back up and running quickly.

Their colleagues in Problem Management, on the other hand, would like to have the system down for longer so that they can conduct analyses and identify strategies for designing out any problems that may exist.

S2BP3 - Activity

S2BP4 - Scope

ITIL defines an incident as "Any event which is not part of the standard operation of a service and which causes, or may cause, an interruption to, or a reduction in quality of, that service".

As we mentioned in the previous session, the Service Desk often plays a key role in Incident Management: recording incidents, monitoring their progress and retaining ownership on behalf of the user as long as the incident is still "open". It is considered good practice to record all enquiries as incidents because they are often evidence of poor quality training and/or inadequate documentation.

It may be that following the initial logging, a distinction is made between simple queries and an incident that relates to a failure or degradation of a system.

A request for a new product or service is usually regarded as a Request for Change rather than an Incident.

However, because the processes are essentially similar, many organisations include Requests for Change within the scope of Incident Management.

Automatically registered events, such as the failure of a disk drive or a network connection, are often regarded as part of normal operations. They are still included in the definition of Incidents though - albeit that the service to end-users may never be affected.


S2BP5 - Incident Lifecycle

It is very important to understand the process that an incident goes through from its initial detection right through to its point of closure.

The first step is the detection and recording of the incident. It is vital that every incident is logged with a unique ID reference - even if we know that the problem has already been reported and a fix is being produced.

Apart from the basic details about the incident, the log will normally include details of how the incident was reported and the services and Configuration Items that are affected.

Incidents can also be classified into different types for use in subsequent analysis.

The example classifications given in ITIL are Hardware, Software and Service Requests but what is sensible here will obviously depend on circumstances.

Also included in this part of the process will be the matching of the details against previously reported incidents to check for known errors, and then assigning a priority to the incident.

We will be returning to the subject of priority in a few minutes.

Initial Support may involve the application of a work-around, some sort of temporary solution that we know about from the existing problem or incident database. Alternatively, a work-around may come from the expertise of the Service Desk staff - in which case it should be recorded for future use.

In the event that the incident cannot be immediately resolved at the Service Desk, one of the vital jobs at this point of the life-cycle is to identify the correct second-line support group to whom the incident should be functionally escalated.

S2BP6 - Incident Lifecycle

Investigation and Diagnosis may result in a direct resolution or the incident being routed to the identified second line support.

This shuttling backwards and forwards of an incident between different support groups is one of the major issues for Incident Management.

If this total process is taking too long then hierarchical escalation procedures may end up being used, as we discussed in the previous session.

Resolution and Recovery may involve raising a "Request for Change" and getting that change implemented.

Recovery itself may entail the business in further actions, such as re-entering or verifying data. For example, if a disk has crashed, the problem may have been resolved by replacing the disk drive, based on an official request for change. But the service has not been recovered until the data is brought up to date from the backup or archive copies.

Incident Closure should involve some confirmation by the originating user and, where appropriate, a revised classification.

It is quite likely, for example, that an initial report of a printer problem was classified as a hardware fault - but subsequent analysis determined that the fault was actually with the software. It is important that such corrections are made to the incident classifications so that an accurate record is maintained.

It is possible for an incident to be closed whilst the underlying problem is still under investigation. This would be true where a work-around is available, for example. Some organisations have an extra category which is "Incident Closed and Underlying Cause Resolved", which they don't use until the final resolution of the underlying problem.

S2BP7 - Incident Lifecycle

Whilst all this is going on there are the issues of ownership, monitoring, tracking and communication to be maintained.

Additionally, there will be constant updating of the status of the incident as it moves through the various points of its life-cycle. All of these are proactive activities carried out by the incident management staff - which is usually the Service Desk, acting on the user's behalf. It involves generating reports, keeping users informed and managing escalations.

ITIL standard practice guidance says that all these activities remain with the Service Desk, and the use of tools to help with automatic status tracking is very important in the incident lifecycle.
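
To show what automatic status tracking might look like in its simplest form, here is a small Python sketch of an incident record that carries a unique reference, moves through the lifecycle stages named in this session, and keeps a history of each step. The reference format, the in-memory counter and the forward-only rule are simplifying assumptions for illustration and are not prescribed by ITIL.

```python
from dataclasses import dataclass, field
from enum import Enum
from itertools import count
from typing import List, Optional, Tuple


class Stage(Enum):
    DETECTION_AND_RECORDING = 1
    CLASSIFICATION_AND_INITIAL_SUPPORT = 2
    INVESTIGATION_AND_DIAGNOSIS = 3
    RESOLUTION_AND_RECOVERY = 4
    CLOSURE = 5


_sequence = count(1)  # assumption: a simple in-memory counter supplies unique IDs


def _new_reference() -> str:
    return f"INC{next(_sequence):06d}"


@dataclass
class Incident:
    summary: str
    classification: str  # e.g. "Hardware", "Software", "Service Request"
    reported_via: str
    reference: str = field(default_factory=_new_reference)
    stage: Stage = Stage.DETECTION_AND_RECORDING
    history: List[Tuple[Stage, str]] = field(default_factory=list)

    def advance(self, new_stage: Stage, note: str = "") -> None:
        # Simplification: the stage only moves forward, even though the
        # incident itself may be passed between support groups.
        if new_stage.value <= self.stage.value:
            raise ValueError("an incident only moves forward through its lifecycle")
        self.stage = new_stage
        self.history.append((new_stage, note))

    def close(self, confirmed_by_user: bool, revised_classification: Optional[str] = None) -> None:
        if not confirmed_by_user:
            raise ValueError("closure needs confirmation from the originating user")
        if revised_classification is not None:
            self.classification = revised_classification
        self.advance(Stage.CLOSURE, "closed")


printer = Incident("Printer will not print", "Hardware", reported_via="telephone")
printer.advance(Stage.CLASSIFICATION_AND_INITIAL_SUPPORT, "no known error matched")
printer.advance(Stage.INVESTIGATION_AND_DIAGNOSIS, "passed to second-line support")
printer.advance(Stage.RESOLUTION_AND_RECOVERY, "driver reinstalled")
printer.close(confirmed_by_user=True, revised_classification="Software")
print(printer.reference, printer.classification, printer.stage.name)
```

The printer example mirrors the one in the narration: the incident is logged as a hardware fault and its classification is corrected to software at closure.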

Finally, remember that we said that everything should be logged as an incident - even if it is a Service Request, i.e. a request for a standard operational item, such as a password reset for example.

If the Classification and Initial Support process determines that the incident is in fact a Service Request then the Service Request procedure will be invoked.

Because the request was raised as an incident, however, it will eventually have to be brought back into the incident lifecycle at incident closure, in order to achieve the close down of that request procedure.

S2BP8 - Incident Lifecycle

In understanding the full lifecycle of an incident it is important to know what further records and processes may be generated as a result of an incident.

When an infrastructure fault is first reported it is recorded as an incident, either by the Service Desk or direct to the incident management process by automated support tools.

Incidents can spawn problems if they are recurring incidents, or if the Service Desk or second or third-line support cannot ascertain the underlying cause.

Some problems will justify the generation of a "known error", this being an admission or statement that we are aware of the problem and we have a resolution to it.

In other cases, it may well be that a work-around is an adequate solution - at both the incident and problem levels.

A good example of this might be ahead of a major infrastructure change, where making significant changes now would not be worthwhile.

If a "known error" is generated then in most cases this will lead to a Request for Change in order for the underlying fault to be corrected. Unless, as we have said, there are good reasons why we should just live with the problem for the time being, because the cost of a short-term fix is not justified.

Once a Request for Change has been through the Change Management process as defined by ITIL, then this will lead to the release of a structural solution to the problem. This will be a permanent fix to the underlying fault, not just a work-around.

Whilst all this is going on, the Configuration Management Database should be being updated with information about the incident, any problems and their links to incidents, about any "known errors" and their links to problems, and about requests for change and their links to known errors.

So an integrated Configuration Management Database not only contains configuration item information but also related support records, such as incidents, problems, known errors, requests for change, and release records.

The absence of a Configuration Management Database will make it very difficult to harmonise separate incident recording, problem recording, and change recording systems.

We will be looking in more detail at the Configuration Management process in Session 3.

S2BP9 - Activity

S2BP10 - Priorities

Assessing the priority of an incident is a very important process that needs to be carried out early in the incident's lifecycle, since it determines what effort is going to be put into its resolution.

Priority is determined mainly by the impact and the urgency of the incident or enquiry.

However, other things can also come into play. Pragmatically, resource availability will also have a bearing. So if nobody with the right skills to solve the fault is immediately available it may have to be put down the list a little.

Another factor affecting priority may be the existence of a specific statement in a Service Level Agreement that is threatened by the incident.

Impact - in this definition - is the measure of the effect of the incident on the business. This could be measured in terms of numbers of users affected or financial loss, for example. So it is important to work very closely with the business in order to understand the factors that are considered high or low impact.

Urgency concerns the timescale in which the incident needs to be resolved.

For example, a fault with a payroll system that occurs on the 2nd of the month may well be considered less urgent than the same fault occurring on the 20th.

These two factors together dominate the ITIL model for determining priority. So a high urgency does not always mean a high priority - if the impact is considered to be relatively low. For something to be high priority both the impact and urgency must be high.

S2BP11 - Priorities

As we have already mentioned, Service Level Agreements can also influence priority.

Let's say that Incident A occurs and that this is the fourth incident relating to a particular service in the current month.

On the other hand Incident B occurs on a different service and this is the second incident to have occurred so far during the month.

In both cases, the Service Level Agreement for the service states that only four incidents per month are permissible.

In these circumstances - all other things being equal - it would be reasonable to give Incident A a higher priority.

The resources available are also likely to affect the priority given to an incident. Although if both the impact and urgency are high then it is likely resources will just have to be made available from whatever sources.

Where there are a number of medium priority incidents to resolve then clearly the ones that have suitable resources immediately available will be tackled first.

Note that when a major incident occurs - in other words one with a high impact, urgency and SLA threat - Problem Management staff must be informed so that they can provide extra support to the Service Desk team.

S2BP12 - Activity

S2BP13 - Benefits & Problems

The benefits of and potential difficulties with Incident Management are listed on page 18 of the little ITIL book and in section 5.4 of the Service Support manual. They are also summarised here for your convenience.

S2BP14 - Summary

In this session we have been examining Chapter 5 of the Service Support Manual - Incident Management.

We have seen how Incident Management is defined, the scope of Incident Management and the differences between Incident Management and Problem Management, which is the subject of the next session.

We have followed the main stages through which an Incident passes during its lifecycle and looked at the records that must be kept and the need for an integrated Configuration Management Database.

We have also examined the different factors that must be considered in determining the priority of different incidents, which may be competing for limited resources.
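
As a rough illustration of the priority rules covered in this session, the sketch below combines impact and urgency into a priority and then uses the threat to the Service Level Agreement to order incidents of equal priority. The three-level scale, the "take the lower of the two ratings" rule and the headroom calculation are assumptions chosen to match the narration, not an official ITIL matrix.

```python
from dataclasses import dataclass
from enum import IntEnum
from typing import List, Sequence


class Level(IntEnum):
    LOW = 1
    MEDIUM = 2
    HIGH = 3


@dataclass
class IncidentReport:
    reference: str
    impact: Level
    urgency: Level
    incidents_this_month: int  # on the same service, counting this one
    sla_monthly_limit: int     # incidents per month permitted by the SLA


def priority(impact: Level, urgency: Level) -> Level:
    # Assumption: take the lower of the two ratings, so that only high impact
    # combined with high urgency gives high priority.
    return min(impact, urgency)


def sla_headroom(report: IncidentReport) -> int:
    """Incidents still permitted this month before the SLA is breached."""
    return report.sla_monthly_limit - report.incidents_this_month


def work_order(reports: Sequence[IncidentReport]) -> List[IncidentReport]:
    # Highest priority first; among equals, the one closest to an SLA breach.
    return sorted(reports, key=lambda r: (-priority(r.impact, r.urgency), sla_headroom(r)))


# Incidents A and B from the narration, with hypothetical impact and urgency ratings.
incident_a = IncidentReport("A", Level.MEDIUM, Level.MEDIUM, incidents_this_month=4, sla_monthly_limit=4)
incident_b = IncidentReport("B", Level.MEDIUM, Level.MEDIUM, incidents_this_month=2, sla_monthly_limit=4)
print([r.reference for r in work_order([incident_b, incident_a])])
```

Resource availability is deliberately left out of the sketch. In practice it would be a further input, as described above.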


Session 2C - Problem Management

Session 2C

Problem Management

S2CP1 - Objectives

In this session we will be examining Problem Management, which is described in Chapter 6 of the Service Support book of the IT Infrastructure Library.

When you have completed this session you will be able to:

Define the term Problem Management according to ITIL best practice.

Identify Problem Management's reactive and proactive activities.

Recognise the standard set of activities for problem control and error control.

List the benefits gained from this process.

S2CP2 - Introduction

The final component in the ITIL infrastructure library guidance for supporting the user of IT services is Problem Management.

ITIL defines a problem as 'the unknown underlying cause of one or more incidents.'

It goes on to define the goal of Problem Management, and that is to minimise the adverse effect on the business of incidents and problems caused by errors in the infrastructure, and to proactively prevent the occurrence of incidents, problems and errors.

Broadly speaking, Problem Management exists to ensure that a process is in place which identifies once and for all the root causes of problems. It also helps minimise the effects as well as preventing potential problems occurring in the future, thereby attempting to minimise underlying problems and their causes.

Problem Management processes are usually carried out by teams of technically focused specialists who work closely with Service Desk and Incident Management staff, and with other internal and external suppliers.

S2CP3 - Activities

As is common to other ITIL processes, Problem Management responds to incidents in a reactive way, but also has a proactive element.

A proactive response adopts a forward-looking approach: it tries to prevent issues occurring by providing intelligent analysis of problem trends and statistics. Problem Management staff may even get involved in making decisions about purchasing and IT provision.

As the term suggests, a proactive response is an ongoing and methodical process. The intention is to minimise occurrences of incidents by identifying and resolving problems and known errors. We will define the difference between problems and known errors a little later in this session.

The 'reactive' requirement of problem management is to resolve Problems quickly, effectively and permanently. It should identify the underlying problems, which are causing related incidents, and find an immediate workaround.

Any workaround should allow the smooth continuation of business. When a resolution is implemented via the change management process, it should be a permanent solution that will resolve the problem and the related incidents.

Once a problem has been identified, and a satisfactory resolution found to that problem, then the change will normally be implemented through change management procedures. Whether problem management acts reactively or proactively, it is important that resources to deal with problems are prioritised on a 'business needs' basis.

This prioritisation is sometimes referred to as 'prioritising in pain factor order'. The pain factor relates to the number of people affected by incidents, and the related problem, and the seriousness of the impact on the business.

S2CP4 - Activities

As is common to other ITIL processes, the communication of management information between IT Service Management roles is very important. This information is used both internally, within the problem management team itself, and distributed to other IT Service Management roles, such as Availability Management.

For example, if IT users were encountering lots of problems caused by poor quality software delivered and supported by a third party supplier, then information gained from Problem Management would be very useful to the Contract Management team. They could use this to help the suppliers make improvements, or in evaluation or analysis of the software or supplied service. In some instances they could
also revoke the contract.

S2CP5 - Activity
S2CP6 - Activities

So how do we define the responsibilities of staff working in Problem Management? These responsibilities can be broken down into a number of focused areas.
These are:
Problem Control
Error Control
Proactive prevention of problems
Providing management information from problem data
Conducting major problem reviews

Problem Control focuses on transforming Problems into Known Errors. It does this by identifying the root cause of the problem and providing a temporary workaround where possible. This process redefines a Problem as a Known Error.

Error Control focuses on resolving Known Errors under the control of the Change Management Process. The objective of Error Control is to be aware of errors, to monitor them, and to eliminate them when feasible and financially justifiable.

Error Control has become a common process in the applications development, enhancement and maintenance environment as well as the live environment. Normally a service and its configuration items are introduced to the live environment with some Known Errors. It is important that these are recorded in a 'Known Error Database', so that when related incidents are reported in the live environment they can easily be identified.

S2CP7 - Activities

Proactive Prevention of Problems, and Providing Management Information from Problem Data, include techniques such as trend analysis, targeting support action, and providing support to the organisation. Typically 80% of incidents are caused by 20% of the IT infrastructure components. This Configuration Item information can prove useful when attempting to identify the underlying cause of incidents. The provision of management information from problem data to Availability Management, for example, can provide vital information on expected levels of availability and, as a consequence, influence statements made about availability in Service Level Agreements.

Ultimately, by redirecting the efforts of an organisation from reacting to large numbers of incidents to preventing future incidents, you provide a better overall service to your customers and make better use of the IT support organisation's resources.

Finally, conducting Major Problem Reviews. These reviews take place after a problem causing a major incident, or multiple related incidents, has been successfully resolved. It's the responsibility of the Problem Management process to review, identify and prevent the problem reoccurring in the future. Additionally, information from these reviews can identify weaknesses in problem management and incident management processes. These review procedures form part of a 'Service Improvement Programme', a key task for any ITIL conformant organisation which aims to improve value and quality.

S2CP8 - Definitions

So let's look at some problem management definitions in more detail. Firstly, the definition of a problem, which is 'The unknown underlying cause of one or more incidents'.

New Problem identification occurs when we are unable to find a match amongst the definitions of existing problems, or existing Known Error records. A Problem Record is then raised. One of the most effective Problem Management techniques is to match a number of multiple related incidents, and to realise that they have a common underlying cause.

These multiple related incidents are of particular concern to Service Managers, as they can threaten reliability clauses within Service Level Agreements or Contracts. For example, an SLA might specify that in any rolling month there will be no more than two breaks in service provision, and the duration of these breaks will be no greater than two minutes.
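
The SLA example above can be expressed as a simple check. The following is a minimal sketch, assuming that a 'rolling month' means the 30 days up to the moment of evaluation and that each break is recorded with a start time and a duration. The names ServiceBreak and sla_breached are illustrative and are not part of ITIL.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta


@dataclass
class ServiceBreak:
    start: datetime        # when the break in service began
    duration: timedelta    # how long the service was unavailable


# SLA parameters taken from the narration's example.
MAX_BREAKS = 2
MAX_DURATION = timedelta(minutes=2)
ROLLING_WINDOW = timedelta(days=30)  # assumption: "rolling month" = 30 days


def sla_breached(breaks, now):
    """Return True if the breaks inside the rolling window violate the SLA."""
    recent = [b for b in breaks if now - ROLLING_WINDOW <= b.start <= now]
    too_many = len(recent) > MAX_BREAKS
    too_long = any(b.duration > MAX_DURATION for b in recent)
    return too_many or too_long


now = datetime(2024, 3, 31)
history = [
    ServiceBreak(datetime(2024, 3, 5), timedelta(seconds=90)),
    ServiceBreak(datetime(2024, 3, 20), timedelta(seconds=60)),
]
print(sla_breached(history, now))  # two short breaks stay within the SLA
```

A third break in the same window, or a single break longer than two minutes, would make the check return True, which is the kind of early warning Problem Management is expected to pass on.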

So any train of events causing us to approach these parameters is a major concern. Hence Problem Management plays a very important role in the ITIL Service Management structure, by providing early identification of problems, and communicating this information to relevant management areas.

S2CP9 - Activity
S2CP10 - Problem Control Processes

The Problem Control process set consists of a standard set of control activities.
These are:
Identification
Recording
Classification
Investigation
Diagnosis
Review & Closure

Each reported incident passes through this process set, so let's take a few moments to define each of these in more detail.

Identification
Problems can be generated from many sources. An incident might be completely new and have no matching characteristics with records in either existing Problem or Known Error databases. It may also be a reoccurring incident, which has already been identified. Or it might come about as a result of Problem Management's proactive work, where a trend has been identified and a problem identified as a result.

Recording
Once a problem has been identified, a record is created with a unique identifier, and a link is generated to any associated records, such as the incidents that caused it, and also to any Known Errors to which it might relate. It's likely that the problem will pass through the change process, and at this point it will be linked to requests for change. Throughout this process records will also be linked to related configuration items, within the configuration management database.

S2CP11 - Problem Control Processes

Classification
Problem Classification is often an extension of the incident classification, and is used mainly to determine an appropriate allocation of resources. For example, a problem might be identified in the Local Area Network. This leads to the creation of a team of problem solvers mainly drawn from network specialists. We will discuss this classification process in more detail later in the course.

Investigation and Diagnosis
These two stages are defined separately because they form an iterative process. Initial investigation results in initial diagnosis, which leads to further investigation and so on. Ultimately the outcome from this process should be a Known Error.

These two stages are complex, and require a good technical knowledge, supported by problem solving and diagnostic skills. ITIL recommends, amongst others, two techniques to help this process. These are Kepner and Tregoe analysis and Ishikawa fishbone diagrams. Both are important mechanisms, which allow those working in Problem Management to use a structured approach to problem diagnosis.

Problem Management is unlikely to implement the resolution of an error. Once a Known Error has been identified then it is handed to Error Control. Although Error Control remains part of the Problem Management Process Set, any resolution is likely to require some level of agreed change, hence the responsibility for the resolution will transfer to Change Management.

However, for particular types of problems, there are occasions when Change Management may devolve authority to the Problem Management team. Importantly, Problem Management must still raise the necessary change records in order to do this.

S2CP12 - Problem Control Processes

Review and Closure
On resolution of every major Problem, Problem Management should complete a major problem review. The appropriate people involved in the resolution should be called to the review to determine:
What was done right?
What was done wrong?
What could be done better next time?
And finally how can we prevent the Problem from happening again?
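
The Recording activity described in this session (a problem record with a unique identifier, linked to incidents, Known Errors, requests for change and configuration items) can be pictured as a small data structure. This is only an illustrative sketch: the class name, field names and identifier formats are assumptions, not an ITIL-defined schema.

```python
from dataclasses import dataclass, field
from itertools import count

_next_id = count(1)  # simple source of unique identifiers for this sketch


@dataclass
class ProblemRecord:
    summary: str
    problem_id: str = field(default_factory=lambda: f"PRB-{next(_next_id):04d}")
    incident_ids: list = field(default_factory=list)     # incidents that caused it
    known_error_ids: list = field(default_factory=list)  # related Known Errors
    rfc_ids: list = field(default_factory=list)          # requests for change
    ci_ids: list = field(default_factory=list)           # related CIs in the CMDB
    status: str = "Identification"


record = ProblemRecord("Intermittent LAN outages on floor 2")
record.incident_ids += ["INC-101", "INC-107"]
record.ci_ids.append("CI-SWITCH-02")
record.rfc_ids.append("RFC-55")
record.status = "Classification"
print(record.problem_id, record.incident_ids, record.ci_ids)
```

Keeping the links as lists of identifiers mirrors the narration: the record itself stays small, and the associated incidents, changes and CIs remain in their own records.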
Problem closure is the last of the Problem Control Activities and is often carried out automatically when a resolution to a Known Error is implemented. However we should point out that an interim closure status can exist. For example, when a Known Error has been identified and a solution put in place, a status of 'Closed pending Post Implementation Review' could be assigned to it in either the Incident, Known Error or Problem records.

'Closed pending PIR' allows us to confirm the effectiveness of the solution prior to final closure.

For incidents, this may involve nothing more than a telephone call to the user to ensure that they are now content. For more serious Problems or Known Errors, a formal review may be required.

S2CP13 - Activity
S2CP14 - Problem Classification

When a Problem is identified, the amount of effort required to detect and recover the failing Configuration Item has to be determined. It is also important to be aware of the impact of the Problem on existing service levels. This process is known as 'classification'.

One of the main reasons for problem classification is to ensure that any group of specialists that we bring together to solve a problem is the most appropriate. If a problem is generated by the local area network, then it's important that we assemble LAN and desktop specialists.

Problem classification is also used to prioritise the sequence in which problems are addressed. If we are experiencing a large number of incidents related to several different areas of the business then priority must be assigned appropriately. Every incident, problem or change will have both an impact on the business services and urgency. Impact describes how vulnerable the business might be. For example life threatening or merely a small inconvenience.

Urgency illustrates the time that is available to avert, or at least reduce, this impact.

A problem's classification may well change as a consequence of the diagnosis activity. This first classification of a problem is described as the 'initial classification'. For example, what at first appeared to be a problem with a network might actually be the result of a database problem. The problem is then reclassified. However, it is usual to retain both the initial and final classifications, so that resource allocation to problem areas can be improved.

S2CP15 - Proactive Activities

We discussed earlier in this session how problem management works reactively to identify problems, by checking knowledge bases for records of problems, Known Errors, changes and so on.

A proactive activity involves the analysis of past incidents, and the IT infrastructure as a whole. For example, analysis might identify that a pre-existing problem at one site might reoccur at another site, which has a similar server, hardware and software configuration. Also involved is the broader analysis of the IT infrastructure itself. The examination of over-complex relationships, or single points of failure, can identify any vulnerable points that are a potential threat to a business.

This analysis might indicate that a particular network route is more heavily used than expected, and as a consequence is a potential future risk.

Often this work is carried out in conjunction with Availability Management staff, and involves careful analysis of paths through the component infrastructure that make up the various services. For example, a customer using on-line banking to read their balance may involve hundreds of different paths.

Another element of proactive problem management involves working with third party suppliers, and our own internal staff, to ensure all procedures are adequate, for example testing procedures, release procedures and so on. Internal staff can be encouraged to take part in system reviews during development, ensuring a higher level of maintainability is designed into the system.

And finally, providing access to 'knowledge bases'. Service Desk staff will be able to link recently occurring incidents to Known Errors and Problems in these bases, resulting in a better understanding of the underlying

problems and Known Errors in the Organisation.

S2CP16 - Error Control

Error Control consists of four defined processes and these are:
Error Identification and Recording
Error Assessment
Recording Error Resolution
Error Closure

Error identification and recording only comes about when a root cause and, if possible, a temporary workaround has been found.

Error assessment involves deciding on how to resolve the error and, if this is valid, raising a request for change to achieve this.

Recording Error Resolution documents that the problem has 'actually' been resolved. Here Problem Management works closely with the Change Management and Release Management process teams, and the end user.

And finally Error Closure. Closure only occurs when the relevant change has led to the business finding a satisfactory resolution to the underlying errors, problems and related incidents.

All four of these processes are classified as reactive. Error Control also has a proactive element. This proactive activity includes analysing and maintaining the Known Error Knowledge base, in order to provide support to the Service Desk, and identifying underlying trends in Known Errors.

S2CP17 - Incident Matching

Assisting Incident Management is a fundamental responsibility of Problem Management. To identify incidents, and to assign actions to them, information management moves it through an Incident matching process model. Let's look at some example incidents and follow their path through the model.

The first example is defined as a routine incident, and exits the model at the routine procedures level.

The second example is defined as a non-routine incident, in other words, one which isn't recognised at the Service Desk. Initially we will attempt to match it against our Known Error database. If a match is found, then the incident moves to 'inform user of workaround' status, and if the workaround exists the user is informed immediately.

The incident process moves on to:
Increase by one the incident count on the known error record.
Update the category data in the incident, this could involve reclassification of the incident. An incident might have been initially identified as a network error, but recognised in the Known Error database as a database related error.

The next process is to extract any permanent resolution or circumvention knowledge from the known error database. If a permanent resolution exists, then the Service Desk can execute this, often with the support of change management.

The third incident example has no match in the Known Error database. However, as it's a pre-existing Problem it does have a match in the problem database. In this case the incident then follows a similar route to our Known Error example.

Finally, the fourth example has no matches in either the Known Error or Problem databases. This incident is identified as being caused by a new problem, and a new record is raised in the Problem database. The incident is then forwarded for further support to the problem management team.

S2CP18 - Activity
S2CP19 - Benefits & Problems

The benefits of and potential difficulties with Problem Management are listed on page 22 of the little ITIL book and also in section 6.4 of the Service Support manual. They are summarised here for your convenience.

S2CP20 - Summary

In this session we have been examining Chapter 6 of the Service Support Manual, Problem Management.

We have examined in detail the standard set of control activities, and the Problem Classification, and Problem and Error identification processes.

We finished by defining the four Error Control processes, and outlining the benefits, and some possible drawbacks, of Problem Management implementation.
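
The incident matching walk-through in S2CP17 can be summarised as a short decision routine. The sketch below is a simplification under stated assumptions: the Known Error and Problem databases are plain dictionaries keyed by a symptom string, and every name in it is hypothetical rather than taken from ITIL.

```python
def match_incident(incident, known_errors, problems):
    """Follow the four example paths from the incident matching model."""
    if incident.get("routine"):
        return "handled by routine procedure"

    symptom = incident["symptom"]
    if symptom in known_errors:
        error = known_errors[symptom]
        error["incident_count"] += 1              # increase the incident count
        incident["category"] = error["category"]  # possible reclassification
        workaround = error.get("workaround", "none recorded")
        return f"inform user of workaround: {workaround}"

    if symptom in problems:
        # Assumption: an existing Problem is counted the same way as a Known Error.
        problems[symptom]["incident_count"] += 1
        return "linked to existing problem"

    # No match anywhere: raise a new Problem record and pass the incident on.
    problems[symptom] = {"incident_count": 1, "category": incident.get("category")}
    return "new problem raised, forwarded to problem management"


known_errors = {
    "db timeout": {"incident_count": 3, "category": "database",
                   "workaround": "restart the client session"},
}
problems = {"slow login": {"incident_count": 1, "category": "network"}}

incident = {"symptom": "db timeout", "category": "network"}
print(match_incident(incident, known_errors, problems))
print(incident["category"])  # reclassified using the Known Error record
```

The order of the checks follows the narration: routine incidents leave the model first, then the Known Error database is consulted, then the Problem database, and only then is a new Problem record raised.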

Session 3A - Configuration Management

S3AP1 - Objectives

In this session we will be examining the first of the three ITIL control Processes, Configuration Management, which is described in Chapter 7 of the Service Support book of the IT Infrastructure Library.
In this session we will:
Examine the relationship between Configuration Management and the Service Delivery and Service Support functions
Define a Configuration Item in ITIL terms
Look at the Configuration Management Database, and the type of information and records it contains
Describe the five Configuration Management sub-processes: Planning, Identification, Control, Status Accounting and Verification.

S3AP2 - Control Processes

Configuration Management sits at the centre of the three ITIL Control Processes.
The objective of these Processes is to ensure that:
The organisation has accurate records of its ICT assets
Changes to the IT services are executed quickly and with the minimum of business risk
An integrated set of data exists, recording details about services, their ICT components and any related support records.

ITIL guidance considers this process as the foundation on which a stable organisation is built. In any organisation, knowing what assets we have and their current status is fundamental to business stability. After all, how can we build something without knowing what we are building on, and what we have to build with?

This is how ITIL defines the four major Configuration Management goals:
To account for all IT assets and configurations within the organisation and its services.
To provide accurate information on configurations and their documentation to support all other service management processes.
To provide a sound basis for Incident Management, Problem Management, Change Management and Release Management.
To verify Configuration records against the infrastructure and correct any exceptions.

S3AP3 - Relationships

So let's start by looking at how Configuration Management relates to Service Delivery and Service Support as a whole.

ITIL places Service Level Management at the very top of our objectives because it represents service delivery's 'shop window' to customers and users alike. It's also a service to which guarantees are applied, in the form of Service Level Agreements.

Service level management is supported by several Support and Delivery processes, which amongst other things, enable Service Level Management to negotiate and comply with SLAs. This whole support structure is underpinned by the configuration management process.

ITIL guidance is explicit on this point and states that 'without effective configuration management we are not likely to effectively implement the other ITIL processes, and this will lead us to a failure to deliver a quality service.'

In ITIL terms Configuration Management can be defined as Asset Management plus relationships. By definition this statement broadens the scope of Configuration Management. Most organisations have some sort of asset management system in place, where they know the cost of equipment, where it was purchased, and its current status. Such systems will only cover hardware and bought-in software.

Existing systems are unlikely to cover the 'relationships' or linkages between these assets. This linkage is very important: making changes to one can have a knock-on effect on several others, so ITIL clearly focuses on Assets and their relationships.
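
To illustrate why the relationships matter, the following sketch records which items depend on which, and walks those links to list everything that could be affected by a change to one item. The item names and the depends_on structure are invented for illustration only.

```python
from collections import deque

# Hypothetical relationships: each item maps to the items it depends on.
depends_on = {
    "payroll-service": ["payroll-app", "hr-database"],
    "payroll-app": ["app-server-1"],
    "hr-database": ["db-server-1"],
    "app-server-1": ["lan-segment-a"],
    "db-server-1": ["lan-segment-a"],
}


def affected_by(changed_item):
    """Return every item that directly or indirectly depends on changed_item."""
    # Invert the mapping so we can walk from an item to its dependants.
    dependants = {}
    for item, deps in depends_on.items():
        for dep in deps:
            dependants.setdefault(dep, []).append(item)

    seen, queue = set(), deque([changed_item])
    while queue:
        current = queue.popleft()
        for item in dependants.get(current, []):
            if item not in seen:
                seen.add(item)
                queue.append(item)
    return sorted(seen)


print(affected_by("lan-segment-a"))
```

A plain asset register would list the five items but could not answer this question, because it holds no linkage between them.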

Because configuration management's remit is wider than pure asset management, we tend to refer to the information that Configuration Management maintains as Configuration Items or CIs, rather than IT assets.

S3AP4 - Activity
S3AP5 - CMDB

We have established that Configuration Management underpins all the Delivery and Support Processes, and it defines IT assets and services as Configuration Items. We've also established that it monitors the inter-relationships or linkages between CIs.

So how does Configuration Management store, manage and update this information? It does this by entering all this information into a Configuration Management database or CMDB.

A typical CMDB should contain information on:
Hardware, Software, Peopleware, and related documentation.
Services, and the relationship between Configuration Items.
Incidents, problems and known errors.
Changes and releases.

Records at the highest level contain information about the organisation's hardware, servers, workstations, including communications equipment and networks.
Information relating to Software, including operating systems, applications or script software, or any custom designed software.
Details about Peopleware, including information related to IT service staff and their skills.
And finally, documentation, including information related to procedures, contracts and so on.

S3AP6 - CMDB

The second level holds records related to IT services. A service might be made up of several CIs. For example a service for the personnel dept might consist of hardware, software and related documentation, all of which are individual configuration items. These items together can provide a service, and the service itself can also be defined as a configuration item.

ITIL suggests that we should be able to draw a map of how a service is assembled from its constituent components. This graphical representation can help us understand the impact of any changes we make to a CI on the service as a whole.

The CMDB is also the ideal place to hold incident records, problem records and known error records if they are held on separate systems. ITIL guidance suggests linking these databases, so that we can associate a record with any related configuration items. By doing so, future searches on a particular CI will return information relating to outstanding incident, problem or known error records.

In the change and release section of the CMDB, we may hold requests for change, change records and so on. This information is used for tracking the progress of change and release records. A release record will contain information about a number of related CIs, which make up a new release, and will describe how to achieve a change defined in the change records.

A CMDB can offer great benefits to an organisation. However the benefits might not be immediately obvious to senior management, who might suggest that a simple asset management system would be sufficient. However, asset management only addresses higher value issues in the infrastructure and doesn't examine it to the same level of detail.

Perhaps more importantly, asset management systems will not contain the linkages to incident, problem, or known errors, or to change and release management records. Nor will it document the relationships between CIs and asset records.

S3AP7 - CMDB

We briefly defined earlier in this session what constitutes a CI, and ITIL defines a Configuration Item as 'any component of an IT Infrastructure, including a documentary item such as a Service Level Agreement or Request for Change, which is, or is to be, under the control of Configuration Management and therefore subject to formal Change Control'.
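
The idea that a search on a particular CI returns its outstanding incident, problem or known error records can be sketched as follows. This is a toy model built on assumptions, not a real CMDB schema: the class name, the record layout and the identifiers are all hypothetical.

```python
from collections import defaultdict


class MiniCMDB:
    def __init__(self):
        self.cis = {}                            # ci_id -> description
        self.records_by_ci = defaultdict(list)   # ci_id -> linked support records

    def add_ci(self, ci_id, description):
        self.cis[ci_id] = description

    def link_record(self, ci_id, kind, record_id, is_open=True):
        """Associate an incident, problem or known error record with a CI."""
        if ci_id not in self.cis:
            raise KeyError(f"unknown CI: {ci_id}")
        self.records_by_ci[ci_id].append(
            {"kind": kind, "id": record_id, "open": is_open})

    def outstanding(self, ci_id):
        """Return the identifiers of records still open against this CI."""
        return [r["id"] for r in self.records_by_ci[ci_id] if r["open"]]


cmdb = MiniCMDB()
cmdb.add_ci("WS-0042", "Standard workstation, personnel dept")
cmdb.link_record("WS-0042", "incident", "INC-310")
cmdb.link_record("WS-0042", "known error", "KE-12")
cmdb.link_record("WS-0042", "incident", "INC-201", is_open=False)
print(cmdb.outstanding("WS-0042"))
```

Refusing to link a record to an unknown CI is a design choice of this sketch; it reflects the point that only items under Configuration Management control are held in the database.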

CIs will vary in type, distinguishing between hardware, software and documentation, and in some circumstances, will sub-define lower level configuration item records. For example hardware type might be made up of workstations, servers, network equipment and so on.

Whatever the CI type, it will require a unique form of identification. Firstly, a unique identifier, which should comply with a pre-defined configuration policy. Also an ID type, which categorises the item into hardware, software, peopleware and so on. Other common CI attributes might include a manufacturer's or developer's id, its location, purchase date and so on.

S3AP8 - CMDB

In addition to the CMDB, Configuration Management has linkages to two other information repositories. These are the Definitive Software Library or DSL, and the Definitive Hardware Store or DHS.

The DSL is the safe storage area for trusted software, and is managed by the Release Management process.

The DHS houses spare parts for critical equipment, and replica configuration models in the IT infrastructure. For example the DHS might contain a fully configured standard server and workstation.

Again records relating to the contents of both the DSL and DHS are held in the Configuration Management Database.

Also worth noting here is the management of software licences. This has become a major issue for many organisations, and the repercussions of illegal software use can be severe, so it's considered good practice for configuration management and release management to work jointly on this process.

In a fully ITIL implemented organisation, the configuration management team would be expected to hold information about licences, what they contain, and what they cover, as a CI in the CMDB. However, as with the DHS and DSL the physical licences might be held in a separate repository.

S3AP9 - Activity
S3AP10 - Planning

ITIL suggests that Configuration Management is made up of five sub-processes. These are: Planning, Identification, Control, Status Accounting and Verification.

Planning is carried out at the beginning of any process to establish a configuration management plan, and should be revisited regularly.

The processes of Identification, Control, Status Accounting and Verification are ongoing.

So let's look at each of these processes in a little more detail.

S3AP11 - Planning

The first of the Configuration Management sub-processes is planning. ITIL suggests five key points which should be addressed in planning, and these are:
Strategy, policy, scope and objectives.
Processes, procedures, guidelines and responsibilities.
The relationships with other ITIL processes.
The relationship with other parties carrying out Configuration Management.
And finally tools and other resource requirements.

We start by defining a strategy. For example, an organisation might want to establish a Configuration Management system, but for its 'live systems' only.

Another policy may define that all new bought-in or internally developed systems or services are to be brought under Configuration Management control at the point of hand over, but existing live systems will not be within the scope.

The scope might encompass desktop services, workstations and data centres, but not the communication network. Accurate definition of the scope is important in order to understand the amount of work involved, and the resources required.

Once the strategy, policy and scope are defined, the objectives can be outlined, and a timeframe in which to achieve them.

Remember the objectives should be 'SMART' objectives, in other words Simple, Measurable, Achievable, Realistic and Timely.

S3AP12 - Planning

Having dealt with strategy, policy, scope and objectives, our next action is to examine our processes, procedures, guidelines and responsibilities.

The organisation might already have in place processes to control assets, or manage change. Although these may not be formally identified as a Configuration Management process, they may be adapted and improved upon.

Planning procedures should be created and maintained along with other related guidelines. We will discuss this in more detail later in this session.

And finally responsibility has to be allocated. After all, these plans, processes and changes have to be carried out. So work should be allocated to staff in either a configuration management group, or a wider configuration, change and release management group.

If, in this sample scenario, configuration management is being introduced into the organisation after other ITIL processes, then it is important to define how these other processes will have to change to accommodate the new configuration management process. Alternatively, if configuration management is implemented ahead of other processes, future inter-process relationships will need to be considered.

S3AP13 - Planning

Relationships with other parties who carry out Configuration management also require particular attention. Suppliers, external software vendors, and developers might have their own CMDB with which we want to exchange information.

The final point on planning is the use of tools, and other resource requirements. Careful consideration needs to be given to CMDB implementation, whether to design and build a CMDB from scratch, or to purchase an off-the-shelf product. Vitally it should be possible to link the CMDB to system and network management tools, with the benefit of automatic CI recording to the CMDB via these tools.

S3AP14 - Identification

The second of the five Configuration Management processes is identification. The primary focus of the identification process is the establishment of the 'Configuration Item Level'. When defining a configuration item we need to establish what level of detail is appropriate.

For example, a complete workstation might be considered as a configuration item, or it could be further categorised into its component parts, and make each of these a CI. This logic must also apply to software, defining a CI as a program as a whole, or a module or sub module of that program.

Generally speaking, select a configuration item level which is most beneficial to the configuration management process. The greater the level of control required over an area or service, the greater the level of configuration management record detail.

Be careful in choosing the most appropriate level: balance information availability and the level of independent control against the resources and effort needed to support the CMDB at that level.

The key target is 'maximum control with minimum records'.

It's also worth noting that the level of configuration hierarchy could be restricted by the support tools available. For example breaking down a workstation into its monitor and screen, and then further down into its motherboard, CPU and other component parts, may be impossible if the depth of our CMDB system hierarchy is specified at two levels only.

S3AP15 - Identification

A configuration item record may contain information about configuration items below it in its hierarchy. For example, in the event of a workstation failure, the policy might be to replace the whole workstation rather than the failed component. However, CI information about the failed component could be held in the record for the workstation.
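
The trade-off between CI level and tool limits can be made concrete with a small sketch. It assumes a CMDB tool that enforces a maximum hierarchy depth (two levels here, as in the narration's example). MAX_DEPTH, CI and add_child are illustrative names only, and holding component detail as notes on the parent record is one possible reading of the policy described above.

```python
MAX_DEPTH = 2  # assumption: the tool supports two levels of hierarchy


class CI:
    def __init__(self, name, parent=None):
        self.name = name
        self.parent = parent
        self.children = []
        self.notes = []  # detail held on this record instead of as child CIs

    @property
    def depth(self):
        return 1 if self.parent is None else self.parent.depth + 1

    def add_child(self, name):
        if self.depth >= MAX_DEPTH:
            raise ValueError(
                f"cannot add {name!r}: hierarchy limited to {MAX_DEPTH} levels")
        child = CI(name, parent=self)
        self.children.append(child)
        return child


workstation = CI("workstation-17")
system_box = workstation.add_child("system box")
try:
    system_box.add_child("motherboard")
except ValueError as err:
    # Fall back to recording the component on the existing record.
    system_box.notes.append("motherboard replaced")
    print(err)
```

Raising MAX_DEPTH gives finer control at the cost of more records to maintain, which is the balance the 'maximum control with minimum records' target asks for.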

Session 3A - Configuration Management

Also consider that a CI might have linkages to other CIs other than its immediate parent. In
these circumstances the CI information would

S3AP17 - Identification

show its linkage to its parent, and also a 'used by* relationship to other CI's. It would not be

In defining the inter-relationships between CIs, there are a number of typical types' which can be used. The most frequently used in ITIL
good practice are Composition, Connection and Usage.

helpful to lose this level of detail by incorporating details into the parent CI.
Documenting these linkages in the CMDB can have a huge impact on database size. Each new CI added might identify three or four linkages. It's good practice to establish in advance the required levels of CIs in the

'Composition' is the simple parent child relationship. A workstation being the parent,

the monitor, keyboard or system box being the


child.

database, even if we don't initially populate the


database to this level. With most CMDB tools, it's far easier to have empty elements in the
database, than to have to restructure the
database at a later date.
S3AP16-Identification

'Connection' describes the relationship between hardware items. The relationship between a LAN and a server for example.
'Usage' describes the interdependency between application usage of a common software module, or the linkage from one category to the other.

Successfully building and maintaining a CMDB depends on accurately identifying and labelling its configuration structures and CI versions and types, and their linkages with other CIs. This is termed defining its scope. Defining scope identifies which items of hardware, software, peopleware and documentation are to be included. Part of this process involves identifying the number of 'configuration types', and what benefits their identification will bring.

When identifying and refining CI types, we might come across candidate CIs which are generally very similar, but have subtle differences. For instance, two workstations exist, which, except for having monitors of different sizes, are exactly the same. This slight difference in specification wouldn't justify the specification of a new CI type. To help us accommodate these anomalies we can specify these as a 'CI variant'.

Version Identification needs to address the full lifecycle of the Configuration Item, so, in addition to those items already in the live environment, items in development and awaiting release are also included. At the same time version numbers are assigned. These numbers should be monitored carefully. If, for example, the development department assigns its own version numbers, then it's important that this information is transferred to the CMDB at the point of handover.

Finally, having identified and documented information about CIs, the items should be labelled. These labels might exist in electronic format, or might be printed labels which we apply to identify the relevant CIs.

S3AP18 - Identification

During development we might want to capture information about CIs and their relationships, to reflect the position at a particular time. This is known as 'baselining'. This can be a very useful process, as baselining can provide a rollback point if things go wrong. It can provide a specification from which copies can be built, and can provide valuable review information after the implementation of a request for change.

During the baselining process, we should include the relevant related items, including documentation, procedures, peopleware and so on. Baselines should be established at formally agreed points of time. For example, before making significant change to the infrastructure.

At any point, the current configuration consists of the most recent baseline plus any approved changes that have been implemented. It's very common to take baselines of standard workstation configurations to provide a 'rollback' position if recent changes prove unsatisfactory.
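The idea that the current configuration is the most recent baseline plus the approved changes that have been implemented can be illustrated with a short Python sketch. This is only an illustration under stated assumptions: the `Change` record, the `current_configuration` function and the example CI name `WS-STD` are hypothetical and are not taken from ITIL or from any CMDB product.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Change:
    """A hypothetical record of an approved change that has been implemented."""
    ci_name: str
    attribute: str
    new_value: str


def current_configuration(baseline, implemented_changes):
    """Return the baseline with each implemented change applied in order.

    The baseline itself is never modified, so it remains available
    as the rollback position.
    """
    # Copy the nested dictionaries so the stored baseline stays untouched.
    current = {ci: dict(attrs) for ci, attrs in baseline.items()}
    for change in implemented_changes:
        current.setdefault(change.ci_name, {})[change.attribute] = change.new_value
    return current


# Assumed example data: a baseline of a standard workstation build.
baseline = {"WS-STD": {"os": "v1", "monitor": "17in"}}
changes = [Change("WS-STD", "os", "v2")]

current = current_configuration(baseline, changes)
# Rolling back simply means discarding the changes and returning to the baseline.
rolled_back = current_configuration(baseline, [])
```

Keeping the baseline immutable is the design choice that makes the 'rollback' position trustworthy: whatever happens to the current configuration, the agreed snapshot is still there.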


S3AP19 - Activity


S3AP20 - Control

The third Configuration Management activity is Control. The control of configuration items consists of three sub processes. These are: Register, Update and Archive. An additional function of the control process is to protect the integrity of configurations.

CIs are registered as they fall into the remit of IT service management. If we receive new equipment from an external supplier, at the point of handover, we should establish that information received from the supplier is accurate. In many organisations this activity has a direct link with procurement.

There are many reasons for updating a configuration item's status. For example, a change in the CI's status from 'testing' to 'live', a change of financial asset value, a change of ownership, or changes brought about by incidents, problems or known errors. All these updates have to happen under the authority of the configuration management process.

Archiving decommissioned CIs takes place when a component is no longer in use. The definition of what constitutes a redundant CI, decommissioning and timing details, would usually be specified in a predefined policy document.

Archiving involves the removal of CIs from the CMDB and archiving onto secure storage, and not necessarily the destruction of the record.

S3AP21 - Control

An additional function of the control process is to protect the integrity of configurations. The protection process safeguards against illegal changes to CIs, and procedures are maintained so that the CMDB and the information it contains are secure. Protecting the integrity of configurations must include security against theft, protection against unauthorised change or corruption, enforcing access control procedures, guarding against any environmental damage, protection against viruses, making back-up copies of the CMDB information, and the secure storage of these back-ups.

Configuration control scope must extend to 'bought in' CIs, such as commercial 'off the shelf' software, sometimes known as 'COTS' packages. By definition this will involve software licence issues, and we will be examining this in more detail in the release management session.

S3AP22 - Status Accounting

The fourth Configuration Management activity is Status Accounting. ITIL defines status accounting as: 'The reporting of all current and historical data concerned with each CI throughout its lifecycle.' Status accounting allows us to reveal a CI's past status (what has happened to it up to this point?), its present status (what state is the CI in now?), and its future status (what plans are there for this CI in the future?).

This accounting procedure enables changes to CIs and their records to be tracked, and changes in a CI's status to be documented. For example the change from 'live' status to 'withdrawn'. It can also help us establish 'baselines'.

By declaring a status of 'trusted' we save all the configuration items and relationships as a baseline. If we encounter problems at a later date, we can then retreat to this 'baselined' point. Status accounting can also be used to monitor organisational procedures, for instance, that a request for change on a configuration item was properly authorised.

S3AP23 - Verification

The fifth and final configuration management activity is Verification.

The primary function of Verification, or verification and audit as it is sometimes known, is to establish that the information in the CMDB exactly matches the real life environment. Configuration management offers little benefit if the information that it provides is out of date or inaccurate.

This verification and audit procedure should be carried out regularly but randomly. Deliberate avoidance of the change and configuration management processes is most likely to be revealed by this 'spot check' approach.

These audits involve checking the physical whereabouts of equipment, and installed software. In addition to the regular 'spot checks', verification and audit would usually be carried out at the following times:



Before a new release, or before the preparation of a baseline.

After a disaster. To establish that our records are accurate, following a major failure in the IT infrastructure.

Following detection of unauthorised changes to the infrastructure. A single unauthorised change might be concealing many others, with the result that the CMDB would not reflect the real life situation.

And we would usually carry out an audit before the live implementation of a new Configuration Management database.

Carrying out a manual verification and audit can be a time consuming and expensive procedure. ITIL recommends the use, where possible, of automated verification tools. These tools are able to roam networks and servers, reporting on installed hardware and software.

Interestingly many manufacturers are building automated management functions into their PCs.

It's also worth remembering that some verification can be carried out by the service desk staff. During calls from users, service desk staff can ascertain what hardware and software are being used, and whether this matches current configuration item records.

Finally, it's worth noting that in many large organisations, responsibility for the verification and audit process would rest with a Configuration Librarian.

S3AP24 - Relationships

As we discussed earlier in this session, configuration management is closely linked with the overall Service Support and Service Delivery processes, both supporting, and depending on these processes.

When an incident is identified it passes through these processes, and it's important to realise how the CMDB, and configuration management as a whole, support this.

The CMDB is used to read and write information by each of the service support processes throughout the incident's lifecycle. For example, when an incident occurs we will record it in the CMDB. At the same time we could examine the CIs which might be causing the incident.

When the incident moves into the Problem process, we will be recording the problem information in the CMDB, and also looking at the CMDB for related incidents. The Known Error process will have links in the database to problem records, which in turn are linked back to the configuration items that are the underlying cause.

When executing a Request for Change the Configuration Items, and their interrelationships, will be examined in order to assess the impact of the change. Change records will be stored, and their status updated as the change moves through its lifecycle. In this integrated environment we can see the fundamental role of the configuration management database, and configuration management as a whole.

The ultimate update authority always lies with the configuration management process, but this authority can be delegated in the case of incident and problem records. Configuration management also remains responsible for updating the CMDB during the change and release processes, often acting on behalf of the change and release management processes.

S3AP25 - Benefits & Problems

The benefits of and potential difficulties with Configuration Management are listed on page 26 of the little ITIL book and in section 7.4 of the Service Support manual. They are also summarised here for your convenience.

S3AP26 - Summary

In this session we have been looking at the configuration management process.

We have seen how configuration management forms the foundation on which service delivery and service support functions are built, and how all of these processes support service level management. In ITIL terms, configuration management can be defined as asset management plus relationships, and we looked at how these assets are defined as Configuration Items or CIs.

We went on to examine the configuration management database or CMDB, its structure, and the type of information and records it should contain. We also looked at how the CMDB links to the Definitive Software Library and the Definitive Hardware Store.

We discussed in detail the five Configuration Management sub-processes: Planning, Identification, Control, Status Accounting and Verification, and we went on to look at the relationship between Service Support and Service Delivery and the CMDB.

And finally we looked at the potential benefits and pitfalls when implementing configuration management.
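As a closing illustration of the Verification activity covered in this session, the comparison between what the CMDB records and what an automated tool reports as installed can be sketched in a few lines of Python. This is a minimal sketch under stated assumptions: the `audit` function, the dictionary layout (CI name mapped to version) and the sample data are hypothetical, and real verification tools will report far more detail.

```python
def audit(cmdb_records, discovered):
    """Compare CMDB records with discovered items.

    Both arguments are assumed to map a CI name to its recorded or
    observed version. The result lists the three kinds of discrepancy
    a spot check might reveal.
    """
    recorded = set(cmdb_records)
    found = set(discovered)
    return {
        # In the CMDB, but not found in the real life environment.
        "missing": sorted(recorded - found),
        # Found in the environment, but never registered in the CMDB.
        "unrecorded": sorted(found - recorded),
        # Present in both, but the versions disagree.
        "mismatched": sorted(
            ci for ci in recorded & found if cmdb_records[ci] != discovered[ci]
        ),
    }


# Assumed example data for illustration only.
cmdb = {"PC-001": "build-4", "PC-002": "build-4", "SRV-01": "2.1"}
seen = {"PC-001": "build-4", "SRV-01": "2.2", "PC-099": "build-3"}
report = audit(cmdb, seen)
```

An 'unrecorded' item is the kind of finding that may point to an unauthorised change, which is why the session recommends a full audit whenever one is detected.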


Session 3B - Change Management

S3BP1 - Objectives

In this session we will be examining the second of the ITIL control processes, Change Management, which is described in Chapter 8 of the Service Support book of the IT Infrastructure Library.

In this session we will:

Define what change is in ITIL terms, and the goal of Change Management.

Examine the relationships between Change Management and other ITIL processes.

Define a Request For Change or RFC, and examine some of its potential sources.

Look at the role of the Change Advisory Board, and the Change Advisory Board Emergency Committee.

Examine the Change Management process in detail.

S3BP2 - Introduction

The second control process within ITIL guidance is Change Management.

So what is Change Management? Well let's start by more accurately defining the term change. It has many definitions, but possibly the simplest one is the most apt. 'Change is the process of moving from one defined state to another'.

ITIL defines the goal of change management in the following way.

'To ensure that standardised methods and procedures are used for efficient and prompt handling of all changes, in order to minimise the impact of any related Incidents upon service'.

Change Management can either be restricted to changes to the ICT infrastructure and 'live' ICT services, or it can be expanded to cover all changes, including those in development areas, or changes which are the result of strategic decisions. For the purposes of this course, we will be addressing the narrower scope of change management, which deals with changes to operational services and constituent components.

S3BP3 - Introduction

There are a number of key points here which highlight why the change management process is critical to a well run IT services organisation.

The first of these is the ability to handle changes promptly and efficiently. When a need for simple and routine change occurs, Change Management should handle it in a streamlined and pre-planned manner. Where more significant and complex changes arise, they should be dealt with efficiently, but to an appropriate level of detail.

Change Management is responsible for implementing changes in the organisation with the minimum of disruption. Historically, making changes to the IT infrastructure has resulted in a loss of business, and lost production time.

ITIL guidance addresses the potential impact of proposed changes by suggesting the use of fixed change slots in what's termed a 'forward schedule of change'. As a result users are informed about upcoming changes, what the change entails, and when it will take place. As a further safety net, change management carries out impact analysis on proposed changes, and produces a backout plan, giving the organisation a point to which they can retreat if a change proves unsatisfactory.

And finally, Change Management must balance the need for change against the risks to the IT infrastructure of implementing it.

S3BP4 - Activity

S3BP5 - Relationships

By exchanging information with Capacity, Availability and Configuration management, Change Management is able to 'Assess the overall impact' of the change. Once assessed we should be able to state whether or not:

The impact is manageable, the cost of change is reasonable, and business benefits are worthwhile. Assuming positive answers to all of these, Change Management 'authorises the change'.

In many cases this authorisation is with the help of other experts who form a body known as the Change Advisory Board, and in some cases, where the change is a simple one, Change Management can be devolved. In these cases it is common for the Change management process to be devolved to Problem Management, or even to operational staff.

Throughout the change management process, there is an ongoing update of information within the Configuration Management database. For example, a CI status can now be moved to 'under change' and so on.

And finally when a change is ready for release to the wider user community, be it affecting software, hardware, documentation or related infrastructure components, it falls to Release Management to manage the actual physical implementation. Remember however, that overall responsibility for any change remains in the hands of change management.

S3BP6 - Request for Change

The trigger for the Change Management process is the receipt of a Request For Change or RFC.

ITIL defines a number of sources from which an RFC can be received. The most common and well documented are those that form part of the incident resolution lifecycle. For example, where a user identifies an incident and reports it to the service desk staff, who in turn generate an RFC. Or from Problem Management, which generates an RFC after investigation of multiple incidents has led to a known error and a proposed 'structural' resolution.

Another source of RFCs is the introduction of new or upgraded CIs. For example, newly purchased workstations, their installation, addition to the network, recognition by the server, providing help and user documentation, will all generate RFCs.

There may be a 'New or changed business requirement for an IT service', often identified by the service level review process. Again this will generate a Request for Change, and be passed on to the Change Management Process.

An RFC might arise because of customer or user dissatisfaction with a current service. This may not have been reported via incident or problem management, and it might not be outside the hours of current Service Level Agreements. However, it's important, where financially viable, to meet customers' requests.

Implementation of new or changed legislation might bring about an RFC. Particular examples include legislative changes relating to privacy, intellectual property rights, security and so on.

A major change in business requirements may generate a significant Request for Change. Such a request may have already passed through a conventional investment appraisal process, and is entering the ITIL Service Management process for a second review. The role of Service Management is to ensure full impact analysis of the effects on existing services, and on the infrastructure as a whole.

Typically, a request for change will contain such information as the sponsor, the requested date for implementation, an initial list of configuration items affected, services affected, the reason for change and initial costing information. The exact content will vary depending on the origins of the RFC.

S3BP7 - CAB

One of the main responsibilities of the Change Management Process is to establish a 'Change Advisory Board' or CAB.

The role of the CAB is to consider RFCs, and in the light of the business need make recommendations as to whether they should be accepted and implemented, or rejected. It also ensures that any RFCs which don't merit detailed consideration by the CAB are recorded. The CAB will also advise on the grouping of changes into 'releases' to minimise disruption to the organisation and maximise benefits.

Typically a CAB is made up of a Change Manager, who will usually chair the meeting, plus representatives of the customer, users, developers, other experts, consultants, outside contractors, and of course IT service Management staff.

CAB meetings may have different combinations of staff attending; however the core members of the CAB should be the chairperson, customer, user, and ITSM representatives.

In general a CAB is regarded as an advisory body, although in some organisations it is defined as an approval board. Its role is considered as advisory because the ultimate responsibility for change lies with the change management process and hence the change management staff. When making decisions about a proposed change, the CAB should consider the business, financial, technical and risk implications. It should also consider the repercussions of not implementing the change at all.

One other area for consideration when deciding whether or not to implement a change is its likely impact on IT continuity plans. Making changes to the IT infrastructure without making changes to any fall back sites can be very dangerous.

S3BP8 - Activity

S3BP9 - CAB/EC

In many large organisations IT services are required 24 hours a day, seven days a week. In such environments the need for an RFC could occur at any time. In such organisations it is usual to have a Change Advisory Board Emergency Committee in place. The CABEC are usually called in at short notice to analyse the impact of an RFC, and authorise any corrective work.

The committee would usually consist of the Change Manager, who acts as Chairperson, a senior business representative, and senior IT representative.

A word of caution here about CABEC activities. Often due to time and business pressures, comprehensive testing of changes isn't always possible. Nor are configuration items updated with status or change information. Ultimately the CAB is responsible, through the emergency committee, for ensuring that the change management and configuration management processes work together to update relevant records, as soon as possible.

S3BP10 - Change Management Process

We established earlier in this session that the trigger for the Change Management Process is the receipt of a request for change.

To address these RFCs, ITIL defines a comprehensive change management process, and we will now look at this process in some detail.

So let's start with an incoming Request for change, remembering that RFCs can come in from many sources, including the business, other service management staff, or as a direct result of incidents or problems.

The initial recipient of the RFC is the Change Manager. At this point RFCs are filtered, with the Change Manager rejecting those which, for example, have been incorrectly requested, are requests for service rather than changes, or are repeats of earlier requests.

It's usual for RFCs to be logged in the CMDB ahead of this filtering process. However, after filtering it's common for RFCs to change status and be redefined as change records.

If the change is accepted it moves to the next process, and the Change Manager allocates a priority to the change. This involves assessing Changes for impact on the business and urgency. There are two possible states for this assessment; they are 'urgent change' or 'standard change'. Whether changes are standard or urgent, the principles for processing them remain the same. However, urgent changes pass through a 'streamlined' version of the change management process, and we will be looking at this process later in this session.

In this example, the change is considered non urgent, and so passes onto the 'Categorisation' process.

Change categorisation involves an initial assessment of the actions and resources required to make the change. There are four possible outcomes from this process. These are: Standard, Minor, Significant and Major.

A 'standard' categorisation is assigned when a frequently occurring change is identified. It can then be dealt with via a pre-existing set of processes and authorisations. These change types are usually considered low risk, and don't require consideration by the CAB. An example of this might be a hard disk replacement or upgrade on a user workstation.

S3BP11 - Change Management Process

The definition of minor, significant and major will differ between organisations, and will be dependent on the current status of the IT infrastructure, and the IT service management personnel's current feelings about risk.


A 'minor change' categorisation would usually be authorised by the Change Manager, who will report their actions to the CAB after completion of the change. The aim here is to reduce the number of RFCs forwarded to the CAB by filtering out any low risk changes.

If the change is defined as either significant or major, then the CAB will have a role. In both cases, the first action is for the Change Manager to circulate RFCs to either the CAB, or in the case of a major change, to company Board or other senior management members.

As we saw earlier in this session, the CAB's role is to give advice, provide estimates on required resources and timescales, and put forward schedules for change based on priority and resource availability. The CAB will also perform detailed impact analysis, and this often requires input from ITSM specialists, for example the Capacity Manager. Eventually implementation dates and a schedule are decided upon. This information is contained in a 'forward schedule of change', which is passed to the relevant service management staff, and to the business as a whole. If changes are likely to cause disruption to the business, then this will be documented in a 'Projected Service Availability Report'.

Remember, not all RFCs considered by the CAB will be accepted. After investigation, the potential risk or financial implications might be considered too high, and outweigh any potential benefits the change might bring.

The CAB activities of estimating and scheduling may well be iterative, and the process continues until an approved change status is reached, or the RFC is rejected, in which case it might re-enter the process at the beginning in a revised form. At the point of approval, the Configuration Manager updates the Change Management Database.

S3BP11 - Diagram

S3BP13 - Change Management Process

The change has now reached the Change Building sub process. The Change Builder may actually consist of several groups of internal or external staff, who are involved in hardware, software, operating systems, documentation and so on. Change Builders are not normally permanent members of a Change Management Team, but are drawn from areas of technical expertise.

Note that a failure during the change building process is likely to result in the change returning to the CAB, possibly with a request to modify the scope of the change. It's important that all changes have a back out plan, so that if an error occurs during implementation, the change can be revised and the service restored.

Once the change is complete it moves to an Independent Tester, where the change is tested and the quality checks are carried out. If at this point a failure occurs, the change is returned to the Change Builder. If the Change is tested successfully it moves onto the Change Manager, who coordinates the implementation of the change.

Remember that the Change Manager has overall responsibility for the change, but that Release Management normally has control at a detailed physical implementation level.

Note that throughout the cycle of building and testing, and during implementation, the Configuration Management process is updating the status of change records. Typical states might include: accepted, in build, under test and so on. A change record will typically contain details of the back out plan, when it was built, CAB recommendations and scheduled implementation dates. As a consequence, the change record is frequently changed.

It's important to accurately manage the change record system within the CMDB, so that we can carry out traceability tests. Change records are usually linked to impacted infrastructure configuration item records, and also to any related incident, problem or known error records.

S3BP14 - Change Management Process

If at the point of live implementation the change fails, then the Change Builder instigates the back out plans. If however, the change is implemented successfully, it's important that the Change Manager reviews the change.

The review process can provide valuable information about our change management process, and can also identify vulnerable areas in the IT infrastructure. A successful review will trigger the 'closed' status, and the request for change or change record will be updated in the CMDB. Note the CAB itself might be involved in the review process. A failure at the review stage would identify shortcomings in the implemented change. This in turn would result in new requests for change entering the process.

S3BP15 - Activity

S3BP16 - Urgent Change Management Process

In the previous few pages we have seen how the Change Management process deals with a standard change. We will now consider how Change Management deals with an RFC which has been given an Urgent priority by the Change Manager.

The first action is for the Change Manager to call either a CAB meeting, or in an emergency situation, the CABEC. The aim of this meeting is to quickly evaluate the request for change, by assessing its impact, the resources required and its urgency. The meeting should establish whether its urgent status is justified. If the outcome suggests that the RFC status isn't urgent, then it will be rejected, and will be dealt with as a standard RFC.

If, on the other hand, the RFC status is confirmed as urgent, then it passes on to the next process and into the hands of the Change Building Team.

The Change Building Team then builds the change and prepares a back out plan. When the change is complete, as much testing as possible should be carried out. Completely untested Changes should not be implemented if at all avoidable. If this is the case, the Change Manager then coordinates the implementation of the change into the live environment.

S3BP17 - Urgent Change Management Process

If the implemented change fails, the Change Manager implements the back out plan. If the change is successful, then the Change Manager firstly ensures that records are brought up to date, carries out testing in the live environment, and at a later date, reviews the change. If after the review, the change is considered successful, then it is closed, and the Configuration Manager closes the RFC and updates the CMDB.

Let's take a few steps back, and look again at the process, assuming this time we have time to test the change. This time our built change passes from the Change Builder to the Independent Tester who carries out testing as quickly as possible. If tests are successful, then the change is forwarded to the Change Manager for coordination of implementation. If the change fails during testing, then it returns to the Change Builder process.

S3BP18 - Standard Model for Change

The Change Management process deals with Requests for Change from many areas of the organisation, and with different levels of authorisation. Where RFCs are frequent and repetitive, they can be dealt with via pre-existing and authorised processes. These processes are known as a 'standard model for change'.

Standard models needn't only apply to simple changes; often complex operations can have standard models. In general once an RFC is regularly repeated, we can create a standard model for that change.

We saw earlier in this session how the Change Manager examines RFCs and categorises them either as standard, using a standard change model, minor, significant or major. To assign one of these categories, the Change Manager examines the RFC, and considers the following:

Impact. The impact the Request for Change will have on the business, considering such factors as the number of users affected.

Novelty. Is the change familiar? Has it occurred before?

Together, Impact and Novelty can provide us with some idea about the level of risk involved with the RFC. An RFC with high impact and high novelty is certainly a higher risk.

Devolved Authorisation. Has the responsibility for change been devolved from the CAB to the Change Manager? Or further devolved to, say, the Service Desk?

Standard Model. Can the request for change be dealt with via a standard model, with a pre-established implementation process?
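The four considerations listed for the Standard Model page (Impact, Novelty, Devolved Authorisation and Standard Model) can be combined into a simple decision rule. The Python sketch below is one possible reading only: the `Impact` and `Novelty` enumerations, the `categorise` function and the order of the checks are assumptions made for illustration, chosen so that they agree with the six worked examples on the Change Categorisation page. As the course notes, each organisation will define minor, significant and major in its own way.

```python
from enum import Enum


class Impact(Enum):
    LOW = 1
    HIGH = 2
    VERY_HIGH = 3


class Novelty(Enum):
    LOW = 1
    HIGH = 2


def categorise(impact, novelty, devolved, has_standard_model):
    """Return 'standard', 'minor', 'significant' or 'major' for an RFC.

    Assumed rule order (hypothetical, not an ITIL definition):
    1. Very high business impact is authorised above the CAB: major.
    2. High impact, or authorisation not devolved, goes to the CAB: significant.
    3. A familiar change with a standard model: standard.
    4. Anything else that is low impact and devolved: minor.
    """
    if impact is Impact.VERY_HIGH:
        return "major"
    if impact is Impact.HIGH or not devolved:
        return "significant"
    if has_standard_model and novelty is Novelty.LOW:
        return "standard"
    return "minor"


# A low impact, well known change with a standard model, devolved to the Change Manager.
example_one = categorise(Impact.LOW, Novelty.LOW, devolved=True, has_standard_model=True)
# A low impact change that has not been done before.
example_two = categorise(Impact.LOW, Novelty.HIGH, devolved=True, has_standard_model=False)
```

Checking the highest-risk conditions first means a standard model can never cause a high impact change to bypass the CAB, which matches the fourth example in the categorisation table.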


S3BP19 - Change Categorisation

So let's add some content to our table.

This RFC is regarded as low impact to the business, and is a well known change, so the novelty is also low. Authorisation has been devolved to the change manager, and a standard model exists. This is a high frequency RFC.

Column 2 is slightly different. Again the RFC is regarded as low impact, but it hasn't been done before, so its novelty is high, and as a consequence, no standard model exists. Again authorisation is devolved, and it's categorised as a minor RFC. This type of RFC could act as a trigger to build a new standard model.

In our third example, the results are slightly different. Our RFC has a high degree of novelty, and no standard model exists. It will be forwarded to the CAB, so authorisation isn't devolved to the change manager. This RFC falls into the significant category.

The RFC in our fourth example has a standard model, however, business impact is considered high, so devolution to the Change Manager won't take place, and it must be examined by the CAB before the standard model processes are implemented. Hence this is regarded as a significant RFC.

As both the impact and the novelty are high, the RFC in our fifth example must also be considered by the CAB. This is also a 'significant' RFC.

In example six, we are considering a change which has a very high business impact. For example, changing from an ISDN based telephony system to ADSL. Changes of this magnitude would normally be authorised at a higher level than the CAB. This is categorised as a major RFC.

Over time, we should expect the number of standard models, and the changes passing through them, to increase. This should result in a reduction of the number of changes forwarded to the CAB, and reduce the number of ad-hoc change requests devolved to the Change Management Process.

S3BP20 - Metrics & Audit

We've seen in this session how Change Management improves the way in which an organisation implements changes. To clearly identify these improvements, Change Management measures process performance, and this is carried out in accordance with our own standards.

Measuring performance usually takes place over time to show, for example, that the number of urgent changes is reducing. So that the results can be clearly understood at all levels in the organisation, this data is usually represented in graphical form.

Regular summaries of the change process should be provided to service, customer and user management. Different management levels are likely to require different levels of information, ranging from the Service Manager, who may require a detailed weekly report, to senior management committees who only require a quarterly management summary.

Typical metrics for the change management process are:

The number of changes implemented during the measured period.

Number of changes backed out by reason code.

Number of Staff Training records up to date.

Cost per change versus estimated cost.

Number of urgent changes.

By auditing the change management process we can check for compliance to procedures. In general a change management audit should investigate:

All new software releases. Checking that they have been through a proper authorisation process.

Incident Records. Usually selected at random, and tracked through the change process.

Minutes of CAB meetings. Not only to check that CAB meetings have taken place, but also to see if identified action points have been followed through.

Forward schedule of change. To see if it has been accurately defined, and importantly, that it's been published to the user community, and is being adhered to.


And finally, that Change review records are in place for all changes.
S3BP21 - Activity
S3BP22 - Benefits & Problems

The benefits of and potential difficulties with Change Management are listed on page 33 of the little ITIL book and in section 8.4 of the Service Support manual. They are also summarised here for your convenience.
S3BP23 - Summary

In this session we have been looking at Change Management, the second ITIL control process.

We began the session by defining what change is, and the goal of Change Management, in ITIL terms.

We looked closely at the relationships between Change Management and other ITIL processes, particularly Release, Capacity, Availability and Configuration Management.

We established that the trigger for the Change Management process is the receipt of a Request For Change, and we looked in detail at some of the sources of these requests. We examined the role of the Change Advisory Board or CAB, its make up, and the role it takes in the Change Management process. We went on to look at the role of the Change Advisory Board Emergency Committee.

We studied in some detail the Change Management process for normal, standard and urgent RFCs, and defined the standard, minor, significant and major RFC categories. Finally we discussed the use of metrics and auditing, in order to evaluate the change process, and highlighted the benefits, and potential pitfalls, of the Change Management process.

Session 3C - Release Management

Session 3C Release Management
S3CP1 - Objectives
In this final session on the ITIL control processes we will be looking at Release Management, which is described in Chapter 9 of the Service Support book of the IT Infrastructure Library.

When you have completed this session you will be able to:

Describe why Release Management is needed
List the major benefits, costs and possible problems of this process
Understand how the Release Management process operates, and its relationship with other IT and Service Management processes
Describe what is meant by a Definitive Software Library (DSL), and a Definitive Hardware Store (DHS).
S3CP2 - Introduction

The third and final IT control process is Release Management. ITIL defines the goal of this process as: 'To take a holistic view of a change to an IT Service and ensure all aspects of a Release, both technical and non-technical, are considered together.' Release Management implements new software or hardware releases into the operational environment using the controlling processes of Configuration Management and Change Management.

So why do we need Release Management?

In simple terms, it's the controlling process which ensures that all aspects of a release are handled properly, including the software, hardware and documentation required. It focuses on protecting the live environment and its services through the use of formal procedures and checks.

This process requires technical competence, and its sub-processes are often performed by technical staff under the overall authority of the Change Manager.

A release is defined in ITIL as a collection of authorised changes to an IT service.

Releases are often divided into:

Major Software Releases and Hardware Upgrades. These would usually contain large amounts of new functionality, some of which may make intervening fixes to Problems redundant. A major upgrade or release usually supersedes all preceding minor upgrades.
Minor Software Releases and Hardware Upgrades. Usually containing small enhancements and fixes, some of which may have already been issued as emergency fixes. A minor upgrade or release usually supersedes all preceding emergency fixes.
And finally, Emergency software and hardware fixes, normally containing the corrections to a small number of known problems.

S3CP3 - Activity
S3CP4 - Roll Out

Release Management's holistic approach to IT service change ensures that the business as a whole and any relevant technical areas are ready to accept, implement and use a release. It is the responsibility of the Release Management process to plan and oversee the 'roll out' of these changes. 'Roll out' includes distributing all the configuration items to wherever they are used. This could be done in a number of ways, either via the internet, by email, or simply by posting CDs. In general, use whatever means best suits the business.

This all sounds very simple; however, the process becomes much more complex when hundreds of servers need to be upgraded simultaneously throughout a large geographic and cultural area. To ensure successful distribution, clear and repeatable processes as well as technical and business skills will be required.

As part of the Roll Out activities, it is likely that you will need to provide scripts to help install the release, as well as passwords to activate the release when needed. Release Management must also ensure that only the correct, authorised and tested versions are installed into the 'live' infrastructure.

Additionally Release Management ensures that we can trace where a particular version comes from, and the related changes it has undergone. This is especially important for "due diligence and governance". To make this possible, software needs to be kept secure before, during and after the move into the 'live' environment.

Release Management also agrees the exact contents of any release and a detailed roll out plan with Change Management.
S3CP5 - Release Management Process

The Release Management process encompasses three defined areas of the organisation.

The development area, its own area of pre-production, and finally the production area, or live environment.

The migration from one area to the next is only permitted subject to satisfactory results from reviews, tests and other appropriate quality checks. Independent testing might include customer acceptance testing, operational acceptance tests and so on. It may well be that significant customer acceptance testing has already been carried out.

However operational acceptance tests are very important - they ensure that anything that goes into the live environment is supportable, maintainable and robust.

Also worth noting is that any back out plans which have been prepared should also be tested.

S3CP6 - Release Management Process

Note that Change Management will decide on the particular contents of the release. It's very important that the release management team are fully aware of decisions made by other processes.

In the production environment we will have to deal with distribution, potential rebuild and implementation of software and hardware releases. There may be three separate stages: firstly to distribute software, secondly to build it or rebuild it in the live environment, and finally implementation.

Each of these three stages should be verified as accurate. For example, before implementation, we should be absolutely certain that a build process has been achieved correctly.

Note that ITIL refers to specific steps called 'Roll Out Management' and this may take place after independent testing to manage in more detail the actual implementation stages that follow. Roll out management usually comes into play when dealing with very large and complex implementations or 'roll outs'.
Throughout this process it is very important to update the CMDB. Information is held here on Release Records, and any status changes to these records should be documented.
S3CP7 - DSL & DHS

Release Management has responsibility for two critical repositories. These are the Definitive Software Library or DSL, and the Definitive Hardware Store, or DHS.
Information related to the contents of the DSL and the DHS is held in the Configuration Management Database, and responsibility for keeping these records up to date belongs to Configuration Management. The DSL contains only trusted versions of software, for example software which has been developed from valid earlier versions via correct Change Management Processes.

The DSL may consist of one disk containing all bought in and created software held in a single format. Commonly the DSL consists of separate disk volumes or servers containing software for individual environments.

Additionally the DSL could contain other software media, such as diskettes, CDs and so on, which might be stored in a separate cabinet.

Software assets are particularly vulnerable to unintended loss or corruption, so it's important to take very good care of the DSL. For example, employing adequate security and access controls. Appropriate protection against other threats, such as fire or flood, should also be in place. Backup copies of critical elements of the DSL are often kept at another location.



Finally, protect the DSL against virus infection, by running regular virus checks on any item entering the library.
The Definitive Hardware Store should be protected in a similar way, and should have specific protection against physical removal. The contents of the DHS should be updated as quickly as possible to reflect the live environment.

Storing older versions of hardware can be useful: if the organisation encounters significant problems with new configurations and software, then it's possible to revert back, by cloning these older versions. Remember, responsibility for maintaining the contents of the DSL and the DHS is shared between Release Management and Configuration Management.
S3CP8 - Release Unit & Release Type

One of the key activities of Release Management is deciding on the 'release unit', which is defined as 'the portion of the IT infrastructure that is normally released together'. The general aim is to decide the most appropriate Release-unit level for each software item or type of software. This can be set at system, application suite, program, or module level. Different release units will exist in different parts of the infrastructure.

For example an organisation may decide that a normal release unit for its order processing service should always be at system level, and as such a change to a CI which forms part of that system will result in a full release for the whole of that system. The same organisation may decide that a more appropriate Release unit for PC software should be at suite level, and so on.

Once the 'release unit' is defined, Release Management moves on to address the question of release type. Release types are divided into three categories: full release, delta release and package release.
A full release is where all components of the release unit are built, tested, distributed and released together. For example, if the release unit is at program level, then the whole program would have to be rebuilt. If it's at suite level then the whole suite, which might include many applications, would have to be rebuilt. Consequently full releases are expensive to build, distribute and install.

However they do give confidence that all the elements of a service work together successfully. They are most appropriate for major changes, and are usually scheduled over longer periods of time.

Delta releases involve distributing only the components that have changed since the last release. Consequently this is a less expensive option. Delta releases are most appropriate for fixes and urgent or emergency changes, and as such form the most frequent form of release.

To reduce the frequency of Delta and Full releases, and to provide longer periods of stability, 'Package Releases' can be used. A 'Package Release' might consist of groups of delta and/or full releases, which are held back and released simultaneously. For example, Changes to one system or suite will often require Changes to be made to others. If all these Changes have to be made at the same time, they should be included in the same package Release.

S3CP9 - Activity
S3CP10 - Release Identification

Defining Release Type involves deciding on a form of Release Identification. It's normal to use a numbering structure, which applies to two or three levels.

For example a new Payroll System might be assigned a release ID of V:1.0. An additional minor release which involves changes to some of its applications would generate a release ID of V:1.1.

An emergency fix to a small element of a module within that system might have a release ID of V:1.1.1. Remember there is no absolute limit to the levels used.

Definitions of release Type and Release units should be documented in a Release Policy.
This policy should also clarify roles and responsibilities, and information on Release frequency. The policy content is usually determined by the Release Manager, in conjunction with the Change Manager and the CAB.
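As a small illustration of the multi-level numbering described under Release Identification, the sketch below models a release ID and derives the next ID for a minor release and for an emergency fix. The class name, the method names and the rule that a higher-level bump drops the lower levels are assumptions made for this example, and the three-level ID for the emergency fix is assumed to read V:1.1.1; the course only gives the sample IDs.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ReleaseId:
    """Hypothetical multi-level release identifier, e.g. V:1.0 or V:1.1.1."""
    levels: tuple

    def __str__(self) -> str:
        return "V:" + ".".join(str(n) for n in self.levels)

    def next_major(self) -> "ReleaseId":
        # Assumption: a major release bumps level 1 and resets level 2 to 0.
        return ReleaseId((self.levels[0] + 1, 0))

    def next_minor(self) -> "ReleaseId":
        # Assumption: a minor release bumps level 2 and drops any fix level.
        return ReleaseId((self.levels[0], self.levels[1] + 1))

    def next_emergency_fix(self) -> "ReleaseId":
        # Assumption: an emergency fix adds (or bumps) a third level.
        major, minor = self.levels[0], self.levels[1]
        fix = self.levels[2] if len(self.levels) > 2 else 0
        return ReleaseId((major, minor, fix + 1))


payroll = ReleaseId((1, 0))               # new Payroll System
minor_release = payroll.next_minor()      # changes to some applications
emergency_fix = minor_release.next_emergency_fix()
print(payroll, minor_release, emergency_fix)
```

A real Release Policy would state its own numbering convention; the point here is only that each release type maps to a predictable change in the identifier.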


A Release Policy might also contain:

Guidance on the level in the IT infrastructure to be controlled.
Details on release identification and numbering conventions.
A definition of major and minor releases, plus a policy on issuing emergency fixes.
Expected deliveries for each type of release.

S3CP11 - Release Planning
We mentioned earlier in the session that Release Management is responsible for the detailed planning of releases. Amongst other things, release planning involves:
Gaining agreement on Release Content
Producing a high level release schedule
Planning resource requirements

Release planning is responsible for verifying that all of the hardware and software in use is as standard, and has been derived from the necessary definitive software library and definitive hardware store.

In addition the Release Planner develops a Release Quality Plan, to ensure all aspects of the release are quality managed, and produces a back-out plan. Where a release is going to be particularly complex it may require a specific planning phase. To facilitate this, the Release Plan is extended to Rollout planning. This expands the Release plan produced thus far, and adds details of the exact installation process developed and the agreed implementation plan. Roll out planning involves:
Producing a detailed timetable of events.
Listing all the CIs to be installed and decommissioned.
Producing Release notes and communications to End Users.
Planning Communication.

S3CP12 - Roll Out Planning

Roll out planning, together with Release Management, decides on the type of rollout approach. This might be a 'big bang', phased or pilot approach.
A Big Bang approach involves all sites receiving all functionality simultaneously. The benefit of this approach is that it offers consistency of use across the organisation. However, achieving a simultaneous upgrade can be problematic.

In a phased approach all sites could receive some functionality at the same time, with more coming later. In a Pilot approach a single site receives all functionality ahead of other sites. Note however that combinations are possible, for example a 'phased pilot' approach.
S3CP13 - Activity
S3CP14 - Legal Compliance

Compliance with software licence agreements has become critical to businesses. Ensuring these obligations are met is the joint responsibility of Release and Configuration Management. For example, when moving software to the DSL, it is important to check that what has been purchased has arrived, that it has been virus checked, and that the licence agreement has been checked.

Remember, penalties for breaching the laws on software theft are applicable to any responsible officer of the company, including those at the highest level.

There are many legal precedents for holders of software intellectual property rights arriving unannounced at premises, and impounding any equipment which they believe contains unlicensed copies of their software.
S3CP15 - Activity
S3CP16 - Benefits & Problems

The benefits of and potential difficulties with Release Management are listed on page 39 of the little ITIL book and in section 9.4 of the Service Support manual. They are also summarised here for your convenience.


S3CP17 - Summary
In this third and final session on the ITIL control processes, we have been examining Release Management.

We started the session by defining ITIL's Release Management goals, and why Release Management is necessary.
We saw how a release can be divided into Major, Minor and emergency releases, and discussed Release Management's holistic approach to IT service change, and how, as part of this approach, it produces detailed release or rollout plans.

We examined the Release Management process, and the linkages to its critical repositories, the Definitive Software Library and Definitive Hardware Store, as well as the Configuration Management Database.

We looked in some detail at release types, release units and release identification, and we concluded the session by identifying some of the benefits, and potential problems with the Release Management process.

Session 4A - Availability Management

Session 4A Availability Management
S4AP1 - Objectives

The topic for this session is Availability Management, which is described in Chapter 8 of the Service Delivery book.

Once you have completed this session you will be able to define Availability Management and describe how it relates to other ITSM components.

You will be able to recognise the main elements of the Availability lifecycle and understand the terms MTBF, MTTR and MTBSI.

You will appreciate the main responsibilities of the Availability Management process and be able to recognise several techniques which are of use in this area.
S4AP2 - Introduction

It is a fact that the IT Infrastructure is becoming ever more reliable - and hence Availability levels are generally better than they have ever been. However, Availability Management is nonetheless a critical support process for Service Level Management.
Availability is now regarded as one of the most important issues for IT service management because, even though reliability has increased, so has the dependence of businesses on their IT services.

Availability Management supports Service Level Management by actively managing the availability of services. For example it assists the Service Level Manager in negotiating and monitoring service levels.
The Service Delivery manual states that:

The goal of the Availability Management process is to optimise the capability of the IT Infrastructure, services and supporting organisation to deliver a cost effective and sustained level of Availability that enables the business to satisfy its business objectives.
The critical words here are 'cost effective'.

The business can have almost any availability it likes provided it is prepared to pay for it. One only has to look at the expenditure on safety critical systems and on general aeronautical systems to understand this.
S4AP3 - Introduction

For most organisations there is a break-even point between the benefit given by extra availability and the cost of providing it in terms of more and more advanced techniques and equipment.
Business of course is interested in the availability of its services, such as e-mail, personnel records and so on, and is not directly concerned about the availability of any components that might be vital in making up that service.

In general, the availability of a service is influenced by the complexity of that service and the systems that it is based on, by the reliability of the items in the infrastructure, by both corrective and preventative maintenance procedures - and also by our incident, problem and change management procedures.
It is important for all staff involved to understand that if a business service is unavailable because of an IT problem there will be a loss of business productivity.

This may also lead to a loss of revenue, customer dissatisfaction and extra costs in having to pay staff overtime for the work they couldn't do when the system was unavailable.
S4AP4 - Relationships & Definitions

We will now explore the relationships that exist between Availability and the various elements of the support organisation, such as Service Level Agreements, IT Services and their customers.

A customer will negotiate a Service Level Agreement with IT Services, and within the SLA there will be statements about service availability.

These statements might say that we expect 99% availability from a service measured over a one month period, or they may say we expect no more than one hour's lost service over a four-week period.

They may say we expect no more than three breaks of service totalling one hour over a monthly period.
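The three styles of SLA statement above can be compared with a little arithmetic. The sketch below is illustrative only: the function names, the 30-day month and the assumption of 24x7 agreed service hours are not from the course, and a real SLA would define its own measurement period and service hours.

```python
def allowed_downtime_minutes(availability_pct: float, period_hours: float) -> float:
    """Downtime budget implied by a percentage availability target."""
    return period_hours * 60 * (1 - availability_pct / 100)


def implied_availability_pct(downtime_minutes: float, period_hours: float) -> float:
    """Percentage availability implied by a maximum-downtime target."""
    return 100 * (1 - downtime_minutes / (period_hours * 60))


def meets_break_target(break_minutes, max_breaks=3, max_total_minutes=60) -> bool:
    """Check a period's outages against a 'no more than N breaks totalling M' target."""
    return len(break_minutes) <= max_breaks and sum(break_minutes) <= max_total_minutes


MONTH_HOURS = 30 * 24        # assumption: 30-day month, 24x7 service hours
FOUR_WEEK_HOURS = 28 * 24    # assumption: 24x7 service hours

budget = allowed_downtime_minutes(99.0, MONTH_HOURS)
implied = implied_availability_pct(60, FOUR_WEEK_HOURS)
print(f"99% over a month allows about {budget:.0f} minutes of downtime")
print(f"One hour lost in four weeks implies about {implied:.2f}% availability")
print(meets_break_target([20, 15, 10]))   # three breaks, 45 minutes in total
```

Under these assumptions, "99% over a month" tolerates a little over seven hours of lost service, which is far looser than "no more than one hour over four weeks". This is one reason the wording of the availability clause matters.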


The definition of availability and the way we phrase that will be subject to local discussions. The current best practice view is to make this statement as business focused as possible and to think in terms of unavailability rather than availability.

The generic definition of availability is: "The ability of an IT service or component to perform its required function at a stated instant or over a stated period of time."
Related terms, which are also defined in the same section of the Service Delivery manual, are Reliability, Maintainability, Serviceability and Security.

S4AP5 - Relationships & Definitions
So, customers negotiate the SLA availability clauses with the IT service through service level management processes and then, as we will be seeing in later sessions, service level management processes require underpinning support.

There are broadly two types of underpinning support, one through operational level agreements with internal suppliers, the other through underpinning contracts with external providers. In the case of internal support, such as application support, hardware support and so on, the OLA will contain statements on availability, reliability and maintainability of the components that this group is responsible for.

When we are talking about underpinning contracts the word 'serviceability' is used in ITIL as a contractual term to cover availability, reliability and maintainability when applied to components supported by external suppliers.
You can review a definition of each of the terms "availability", "reliability", "maintainability", "serviceability" and "security" by clicking on each of the buttons here.
S4AP6 - The Availability Lifecycle

It is useful to think of Availability as having a lifecycle.

So imagine that we have a timeline with time running from left to right.

Now for a particular component, let's say that a failure occurs at the time X1. This will be recorded in ITIL as an Incident.

There will then be a period of time that it takes to repair the faulty component - this is usually referred to as the Mean Time To Recover or MTTR.

Be very careful here as the R in this acronym can have a number of alternate meanings. We have defined it as "Recover" but it is also commonly taken to mean "Respond", "Repair" or "Restore". Imagine, for example, that the failure is a crashed hard disk.

There will be a period of time that it takes to "Respond" to the incident, to get an engineer on site. Then there will be a further period during which the disk is being repaired or more likely replaced. Typically, it will then take some time to "Restore" the data to the point where normal business can be resumed.

In this course we will be using the term "Recover" to encompass all of this - and the Mean Time To Recover is the average length of time that all of this takes to achieve.

Be aware though, that it may be useful to understand these other measures as they are often captured by service management organisations to check on various aspects of the availability management process.
Once normal service has been recovered there will then be a hopefully long period of time before the component fails again at time X2.

The period of time between the fault being recovered and the next failure is known as the Mean Time Between Failure or MTBF.

Hence it is easy to see that the sum of the MTTR and MTBF will give what is called the Mean Time Between System Incidents or MTBSI.

S4AP7 - MTBF, MTTR & MTBSI

We can now consider the relationships that exist between each of these three parameters and the terms Availability, Reliability and Maintainability that we have already discussed.
It is obvious that a high Mean Time Between System Incidents implies high Reliability. If components don't fail very often then the services which are based on them will be reliable services. So high MTBSI is obviously a good thing.
On the other hand, a low Mean Time To Recover is good news, since this implies a high Maintainability. This can be achieved, not only by technical means but by having good support procedures within the IT service management team so that there are no delays between an incident being detected and repair work starting.

As you might expect - a high Mean Time Between Failure is very desirable and directly equates to a high Availability.
So, typically we can see that if we want to achieve higher availability, then either increasing the Mean Time Between Failure or reducing the Mean Time To Recover - or a combination of the two - can achieve this.

All of these measures, MTBF, MTTR and MTBSI, can be applied at both the component and overall service level.

Typically, if we want to increase the overall availability either of a service or of an assembly of components, then this can be done either by increasing the reliability of each component or the resilience of the assembly or by improving the maintainability and the procedural aspects.
Increasing the MTBSI and MTBF figures and reducing the MTTR will all cost money. There will be a limit as to how much we can spend to achieve high reliability and high resilience and there will be a limit to how much we can spend to achieve instantaneous reporting and repair.
As we said at the start of this session, the business can have almost whatever availability it wants - provided that it is prepared to pay for it.

S4AP8 - Activity

S4AP9 - The Business View of Availability

All businesses rely on their IT services - but some services, or parts of services, will be more important to the business than others. For example, in an EPOS service, the critical requirement is that we are able to take payments. Other functions, such as automatic updating of stock levels, are important but not as vital as servicing the immediate customers. Therefore it may be necessary to aim for higher availability of the first part of the service than the second part.
ITIL refers to such business-critical functions as Vital Business Functions or VBFs.

The concept of Vital Business Functions is widely used in IT Service Continuity Management and Availability Management within ITIL and is a way of highlighting the services to which the business must have almost 100% availability.

Understanding each Vital Business Function allows the Cost of Unavailability of a service to be measured and reported. Such costs may be incurred through revenue loss, or overtime payments and so on, as we discussed earlier.

Cost of Unavailability is a more effective way of reporting than percentage availability because it relates to the true cost of the loss of service to the business directly.

It is important to report on trends and to agree on the measurement period. For example, "Service was available for more than 98% of the agreed service hours during the last month" may be very useful when we're reporting against service levels in Service Level Agreements, which are often expressed in the same way. Trends are very important in the whole of service management. Service improvement programmes, for example, set out to move things forward, and that relies on having some baseline against which to measure.

Section 8.7.7 of the Service Delivery Manual uses what it calls an IT Availability Metrics Model (ITAMM) as a framework for deciding on the sort of reporting that needs to be done. Because it covers such a wide range, from details of component availability right through to services, it is a basis for all reporting both internal and external.

It is beyond the scope of a Foundation course to understand much more about the ITAMM; the fact that it exists and is a basis for important reporting is all that we need to know.
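The measures discussed in this part of the session can be worked through on a small incident timeline. The sketch below is illustrative: the function names, the sample incident times and the hourly cost figure are assumptions made for the example, and the ratio MTBF / MTBSI used for percentage availability is the commonly used steady-state formula rather than something stated in the course. MTBF is taken, as in the course, to be the time from recovery to the next failure, so that MTBSI = MTTR + MTBF.

```python
from statistics import mean


def availability_measures(incidents):
    """Compute MTTR, MTBF, MTBSI and availability over complete failure cycles.

    incidents: chronologically ordered (failed_at, recovered_at) pairs, in hours.
    A cycle runs from one failure to the next, so the final incident only
    supplies the end point of the last cycle.
    """
    if len(incidents) < 2:
        raise ValueError("need at least two incidents to form one complete cycle")
    downtimes, uptimes = [], []
    for (failed, recovered), (next_failed, _) in zip(incidents, incidents[1:]):
        downtimes.append(recovered - failed)       # time to recover
        uptimes.append(next_failed - recovered)    # recovery to next failure
    mttr = mean(downtimes)
    mtbf = mean(uptimes)
    mtbsi = mttr + mtbf
    return {
        "MTTR": mttr,
        "MTBF": mtbf,
        "MTBSI": mtbsi,
        "availability_pct": 100 * mtbf / mtbsi,    # assumed steady-state formula
        "downtime_hours": sum(downtimes),
    }


def cost_of_unavailability(downtime_hours, cost_per_hour):
    """Business cost of lost service; cost_per_hour is a hypothetical figure."""
    return downtime_hours * cost_per_hour


# Hypothetical timeline: failures at hours 100, 300 and 700.
timeline = [(100, 102), (300, 301), (700, 703)]
measures = availability_measures(timeline)
cost = cost_of_unavailability(measures["downtime_hours"], 2000)
print(measures, cost)
```

For this timeline the two complete cycles give an MTTR of 1.5 hours, an MTBF of 298.5 hours and an MTBSI of 300 hours, so availability is 99.5%. Expressed as a cost at the assumed rate, the same three hours of downtime become 6000, which is the kind of figure the business is more likely to relate to.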


S4AP10 - Availability Management Responsibilities

Page 64 of the Little ITIL Book gives a useful listing of the responsibilities of the Availability Management process.


The first of these, concerning the optimisation of availability is self evident and much of this session concerns that particular point.
The second point is about determining availability requirements in business terms. It is very important to work with the service level manager and the customer so that their requirements for availability can be expressed in terms with which they feel comfortable.

In many ways the Availability Plan is analogous to the Capacity Plan and should take account of current levels of availability against the service level requirements, trends in terms of availability, new technological options and knowledge of the way business is developing. There is no absolute guideline on how far ahead the plan should look, but following the capacity management analogy, it would be reasonable to think in terms of one year at a time with a review at least every three months. The fifth item on the list of responsibilities is all about the collection, analysis and maintenance of availability data. Monitoring the various availability parameters can generate a large amount of data and
because of this it is not unusual to find an

They are often much more comfortable with discussing costs of unavailability in terms of money and time rather than percentages and
fractions.

Hence requirements must be gathered in the


relevant terms and then translated into

meaningful technical terms for discussion with suppliers of underpinning services, both
internal and external.

Availability
created.

Management

Database

being

Conversely, technical information about availability, MTBFs, MTBSIs and so on, may need to be turned back into meaningful
business terms for the customer.

This may be either as a separate entity or by adding extra information to the Configuration Management database. Item six is arguably one of the most important areas and defines the role of the availability
manager.

The third point, Predicting and Designing for expected levels of availability and security, implies that availability management staff are involved in the systems development process right from the very beginning.

It is an ITIL recommendation that Availability Management staff should be involved when the business case is being created for a new or extended service and that they remain involved all the way through the analysis and design process. The aim being to ensure that the needs of management, including availability maintainability and reliability, are built in along with security elements. This implies availability management staff having some familiarity with system development processes.

This is all about monitoring service availability against the Service Level Agreements, for the benefit of the service level manager.
The performance of internal and external

suppliers against the serviceability requirements in any underpinning contracts and targets defined in the Operational Level Agreements must also be monitored as part of this process.
The final point refers to the need for the Availability Management process to be continually looking for improvements on a proactive basis.

S4AP11 - Availability Management Responsibilities

The Availability Plan should be a long-term


plan for the proactive improvement of IT service availability within the imposed cost
constraints.

In other words, not waiting for targets to be threatened before taking action, but constantly reviewing current status and looking for cost effective ways of improving availability.
As with many of the other ITIL processes, this proactive work is critical but may be the last part of the process to be implemented.

A good plan should have goals, objectives and


deliverables and should look at the issues of

There is an additional responsibility on the


process owner, and that is to monitor the

people, processes, tools and techniques as well as looking at the technology.

effectiveness and efficiency of the availability


management processes.

achieve service levels in the area of availability, then we'll be constantly looking at


records of service level achievement or service

This can often be done by looking at how


many SLAs have been breached because of availability issues, for example.

level breaches or potential breaches.

S4AP13 - The Availability Management Process

S4AP12 - The Availability Management Process

Now let's look at the key outputs from the


process, which are:

Section 8.3 of the Service Delivery manual describes the Availability Management process
in some detail.

Availability and Recovery Design criteria for


each new or enhanced IT Service. These are

The inputs to the process include: The Availability Requirements of the


business, which are critical.

intended to help the development teams decide on how to achieve high availability.
Details of the Availability techniques that will be deployed to provide additional Infrastructure

resilience to prevent or minimise the impact of


A business impact assessment, so that the
Vital Business Functions and the

component failure to the IT Service

consequences of loss of availability are fully understood. This will help in determining priorities when setting up the Availability
Management processes for the first time.

Agreed targets of Availability, reliability and


maintainability for the IT Infrastructure components that underpin the IT Services.

Reporting of Availability, reliability and


Part of the service level negotiation process will be to determine the availability, reliability and maintainability requirements from the business. Some of these will be for existing
services whilst others will be for services that

maintainability to reflect the business, User and IT support organisation perspectives

The monitoring requirements for IT components to ensure that deviations in Availability, reliability and maintainability are detected and reported

are in conception.
Incident and Problem data will also need to be

examined. Part of the proactive work will be to investigate incidents and problems and to see which of those are caused by unavailable equipment and what the impact of these

And finally, an Availability Plan for the proactive improvement of the IT Infrastructure.
S4AP14 - Security

incidents or problems was on availability


measures.

It can be argued that the most valuable assets of IT services are the data and the ability to
process that data.

Configuration data will be very important since that will show the relationships between configuration items and the chain of configuration items that makes up a typical
service.

This is why security is such an important part


of IT service management.

The basic logic behind managing these assets


This will enable us to look for sensible places where we might decide to replace equipment by higher quality equipment with a higher reliability. Or, for other areas where we might decide to mitigate against a possible single point of failure, or SPOF in ITIL terms, by looking for alternative routing in a network or perhaps duplicating of discs or processors. Remembering that one of the jobs of availability management is to ensure we
is:

- Make sure that access is denied to unauthorised people. In other words, maintain Confidentiality.

- Make sure that the assets are trustworthy. That is, maintain Integrity.
And, make sure that assets are available to

authorised people when they need them. Or, maintain Availability.


This may lead to some conflict and possible trade-offs. For example, high availability is not necessarily good if it compromises confidentiality or integrity.

User Down Time would be equal to four hours of downtime x 1 user, giving a value of 4. Therefore the overall availability would be (400 - 4) / 400 x 100, giving a weighted availability of 99 percent.
Contrast this with the value given by the simpler basic calculation, which would be only
90%.
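As a rough illustration, the weighted calculation can be sketched in a few lines of Python. This is only a sketch of the arithmetic described in this session; the function and parameter names are our own and are not ITIL terms.

```python
def weighted_availability_percent(agreed_service_time, total_users,
                                  downtime, users_affected):
    """Weighted availability using End User Processing Time (EUPT)
    and End User Down Time (EUDT)."""
    if agreed_service_time <= 0 or total_users <= 0:
        raise ValueError("service time and user count must be positive")
    eupt = agreed_service_time * total_users   # e.g. 40 hours x 10 users = 400
    eudt = downtime * users_affected           # e.g. 4 hours x 1 user = 4
    return (eupt - eudt) / eupt * 100


# The example from this page: 40 hours a week, 10 users, 1 user down for 4 hours
weighted = weighted_availability_percent(40, 10, 4, 1)
print(f"Weighted availability: {weighted:.1f}%")
```

If all 10 users had been affected for the same four hours, the weighted figure would fall to the same value as the basic calculation.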

Within ITIL, availability aspects are the responsibility of availability management while the confidentiality and integrity issues are shared responsibilities with security
management.

Within an organisation, it may well be that the whole responsibility for CIA is devolved to the availability management team. It is very important that such responsibilities are
clarified.

It's important to note that whichever way of calculating availability is chosen has to be agreed with the users before it can be used as the mechanism that we measure and report
on.

S4AP15 - Techniques, Calculating Availability %

S4AP16 - Techniques, Absolute Availability

Percentage availability may not always be the most useful measure from a business point of
view.

One of the most basic techniques used in Availability Management is the calculation of availability in terms of a percentage.
The basic calculation is straightforward, the availability of a service or of an individual component or of a grouping of components is given by the agreed service time minus the downtime, divided by the agreed service time all times 100 to obtain a percentage value. Note that component availability is often expressed as a decimal value - always less than one - rather than as a percentage.
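The basic calculation can be written as a short Python function. This is a minimal sketch; the names are illustrative, with AST standing for agreed service time and DT for downtime.

```python
def availability_percent(agreed_service_time, downtime):
    """Basic availability: (AST - DT) / AST x 100."""
    if agreed_service_time <= 0:
        raise ValueError("agreed service time must be positive")
    if not 0 <= downtime <= agreed_service_time:
        raise ValueError("downtime must lie between 0 and the agreed service time")
    return (agreed_service_time - downtime) / agreed_service_time * 100


# 40 agreed service hours in a week with 4 hours of downtime
basic = availability_percent(40, 4)
print(f"Availability: {basic:.1f}%")
print(f"As a decimal: {basic / 100:.2f}")
```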
In order to take account of the fact that one

Absolute figures of up-time and down-time over an agreed period might be more appropriate and may be more acceptable for
the business.

So for example we could say that there were four hours of downtime out of 400 potential service hours in the last week, and that may be a more useful measure than turning that into a percentage value. This is all about agreement and trust between customer and supplier and whichever figures
are chosen should be the ones most

user losing access to the system is significantly less serious than 100 users all losing access, a weighted calculation can sometimes be more meaningful.

meaningful to the business. It is very important to understand and be consistent in the use of reporting periods.

The way this is calculated is to replace the


variables AST and DT with End User

Processing Time and End-User Down Time.

For example, an availability of 99% for a service to be achieved on each and every day
is much more demanding than the same percentage averaged over a year long reporting period.

End User Processing Time is defined as the Agreed Service Time multiplied by the total number of users (Nt).

End User Down Time is found by multiplying


the Down Time by the total number of users
affected.

It is possible to achieve a 99% availability whilst losing service for perhaps two whole days in the year. In order to achieve 99% on a daily basis, the allowable downtime on any one day would have to be reduced down to just a
few minutes.
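The effect of the reporting period can be checked with a small calculation. The sketch below assumes a service that is meant to be available 24 hours a day, every day; that assumption, and the function name, are ours and not part of the course material.

```python
def allowable_downtime_minutes(target_percent, period_hours):
    """Downtime budget, in minutes, for a target availability over a period."""
    if not 0 <= target_percent <= 100:
        raise ValueError("target must be a percentage between 0 and 100")
    return period_hours * 60 * (1 - target_percent / 100)


# Assumption: 24-hour service, so a day is 24 hours and a year is 365 x 24 hours
per_day = allowable_downtime_minutes(99, 24)
per_year = allowable_downtime_minutes(99, 365 * 24)
print(f"99% per day allows about {per_day:.1f} minutes of downtime")
print(f"99% per year allows about {per_year / 60 / 24:.2f} days of downtime")
```

Under this assumption the yearly budget is large enough to absorb an outage lasting two whole days, whereas the daily budget is only a matter of minutes.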

So, if a system is meant to be available for 40


hrs in a week and there are 10 users of the

system, EUPT will be 400.

S4AP17 - Techniques, Agreed Service Time


Great care must be taken over the definition of

If just one of the users is affected for four


hours but the other 9 users are not affected at

what agreed service time is.

all over that period of measurement, then End


For example, does it include downtime for maintenance? Is that already factored in?
In most cases we would not want to be

Calculating End-to-End availability for items arranged in parallel is a little more complicated
- as shown.

penalised for agreed downtime for maintenance or upgrades.

So for the same two components now arranged in parallel - the resulting End-to-End availability will be 99%.

In 24/7 systems however, where the requirement is for very high availability, the
figures often do include and are meant to include any time for maintenance, which will
need to be reduced to an absolute minimum.

Again it is easy to see that, unlike components arranged in series, the more CIs that are put in
parallel then the higher will be the overall

availability - but such duplication of components, or duplexing, - will necessarily


increase costs.
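For reference, the parallel case can be sketched in Python. The diagram for this page is not reproduced here, so the sketch assumes the usual formula for independent components with perfect switch-over: the assembly is unavailable only when every component is unavailable. It reproduces the 99% figure quoted above for two 90% components.

```python
from math import prod


def parallel_availability(availabilities):
    """End-to-end availability of components arranged in parallel.

    Assumes failures are independent and switch-over is instantaneous.
    Availabilities are decimals between 0 and 1.
    """
    if not availabilities:
        raise ValueError("at least one component is required")
    return 1 - prod(1 - a for a in availabilities)


print(f"Two in parallel:   {parallel_availability([0.9, 0.9]):.3f}")
print(f"Three in parallel: {parallel_availability([0.9, 0.9, 0.9]):.3f}")
```

Each extra parallel component raises the figure, but by a smaller amount each time, which is worth bearing in mind when weighing it against the extra cost.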

The pattern of downtime may also be critical and will need to be understood. For example, depending on business circumstances, 10
losses of service each of 10 minutes duration

There may also be some technical limitations

in terms of how easy it is to switch from one


component to another when one fails, but the

may be more damaging than a single loss of service for 100 minutes for the same period of
time.

general principle is one of significant improvement to assembly availability achieved


in this way.

The reporting requirement to cover such

differences will need to be closely examined


and agreed with the business.

S4AP20 - Techniques, Multiple CIs

One difficulty in both cases is finding good


In reporting and discussing availability with end
users and customers, the main areas of
values for A1 and A2.

interest will nearly always be based around services and not around components. However, internal reporting for service improvement purposes and for supplier management mechanisms will often require reporting at the component level.

S4AP18 - Activity

S4AP19 - Techniques, Multiple CIs

Assuming they are hardware components, this


could be derived from a combination of

manufacturers' engineering specifications, other similar installations and your own experience gained during testing or development.
Using a combination of those three sources will tend to give realistic values for the availability of individual components.

The formula for calculating end-to-end availability for items arranged in series is fairly simple.

Once an initial base of figures has been established then monitoring of availability over a period of time using monitoring tools and
records from the service desk of incidents can

The overall availability AT is equal to the product of the availability of each of the individual components. So if we have two components, each of which is capable of delivering 90% availability - the End-to-End availability of the assembly will be
0.9 times 0.9 or 81%. In other words,

allow an iterative improvement in the component availability figures.

S4AP21 - Techniques, Analysis Techniques

significantly less than each of the components making up the assembly. It is easy to see from this formula that the more items that are put in series, the lower will be the End-to-End availability figure.
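The series calculation is easy to try out in Python. This is a minimal sketch with illustrative names; availabilities are given as decimals.

```python
from math import prod


def series_availability(availabilities):
    """End-to-end availability of components arranged in series:
    the product of the individual component availabilities."""
    if not availabilities:
        raise ValueError("at least one component is required")
    return prod(availabilities)


print(f"Two in series:  {series_availability([0.9, 0.9]):.3f}")
print(f"Four in series: {series_availability([0.9, 0.9, 0.9, 0.9]):.3f}")
```

Adding more components to the chain can only lower the result, since each factor is less than one.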

Finally, there is a range of techniques designed to aid understanding of why availability problems are occurring in particular parts of the infrastructure and to find corrective ways of working.
The first of these techniques that we will look at is Component Failure Impact Analysis or
CFIA.


This is represented normally in a matrix showing configuration items against the services supported.

Risk analysis can be done in a variety of ways. The way that's favoured in ITIL because it originally comes from the same development
source, is known as CRAMM, CCTA Risk

For example, here we can see that service 'B' is dependent on all four of the CIs 1 to 4 being available, whilst service 'D' only requires items
3 and 4.

Analysis and Management Method.


The CCTA - or Central Computer and Telecommunications Agency was the original
name for the OGC or Office of Government

Looking another way, we can see that item 3 is


essential to all 4 services, none of them can
function without it.

Commerce. The name was changed in 2001.


We'll talk a bit more about CRAMM in the IT

Service Continuity Management session.

It is important to realise that the CFIA matrix can be used by either reading down the columns or across the rows to give us different
information. If 'B' is a service that has vital business

S4AP23 - Techniques, Analysis Techniques

One of the key requirements of availability management is to be able to achieve an understanding of why a particular lack of availability is occurring and what to do about it. There are a couple of techniques that can help us here: System Outage Analysis, or SOA, and Technical Observation Posts, or T.O.P.

functions within it, then it becomes critical to understand, at a more detailed level, how those VBFs are dependent on the components.

As a first pass analysis of dependency and understanding of where single points of failure could be critical, CFIA is very useful.
So in the example shown, CI3 is a very good candidate for attention, such as replacement with a more reliable item or duplication by the addition of a parallel assembly as a replacement for the single component.
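A CFIA matrix is simple enough to hold in a small data structure. The Python sketch below is illustrative only: the entries for services 'B' and 'D' and for CI3 follow the example described above, while the remaining entries for services 'A' and 'C' are invented to fill out the matrix.

```python
# Rows are configuration items, columns are services; True marks a dependency.
# Only the B and D columns and the CI3 row follow the example in the text;
# the other entries for services A and C are hypothetical.
cfia = {
    "CI1": {"A": True,  "B": True, "C": False, "D": False},
    "CI2": {"A": False, "B": True, "C": True,  "D": False},
    "CI3": {"A": True,  "B": True, "C": True,  "D": True},
    "CI4": {"A": False, "B": True, "C": False, "D": True},
}


def cis_for_service(matrix, service):
    """Read down a column: which CIs does this service depend on?"""
    return [ci for ci, row in matrix.items() if row[service]]


def services_hit_by(matrix, ci):
    """Read across a row: which services are affected if this CI fails?"""
    return [service for service, needed in matrix[ci].items() if needed]


def cis_needed_by_all_services(matrix):
    """CIs that every service depends on: first candidates for attention."""
    return [ci for ci, row in matrix.items() if all(row.values())]


print(cis_for_service(cfia, "D"))
print(services_hit_by(cfia, "CI3"))
print(cis_needed_by_all_services(cfia))
```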

SOA involves a detailed analysis of service interruptions. It is really a post-mortem about some of the more major incidents that have occurred in the infrastructure and trying to find some common underlying theme or cause for the availability losses.

More sophisticated information can be put in


the CFIA such as information that for service

It requires significant inter-disciplinary work


between different teams to make this work and

'B' to run, either component 3 or component 4 need to be there but not necessarily both.
This may require some extension to the notation - which is often home grown or company-specific and which is beyond the scope of this course.
tends to be managed as a small project with a particular budget and reporting period.
Setting up a Technical Observation Post or T.O.P. is an expensive process because it

involves bringing together a team of people to look at a service at a vulnerable period of its
life.

S4AP22 - Techniques, Analysis Techniques

Another useful technique is called 'Fault Tree Analysis' or FTA.

If, for example, we know that on a monthly basis there are availability problems while
assembling data for end-of-month financial

work, then a Technical Observation Post might be set up to look at this particular process.

This is a diagrammatic technique drawn initially from the world of engineering, which identifies the chain of events leading to service
failure.

In effect the T.O.P. would be watching the


process go wrong in order to more accurately understand what's happening.
This is particularly useful in cases where it
proves difficult in test conditions to simulate

It is part of a family of techniques generally referred to as Failure Mode & Effect Analysis
or FMEA.

the fault that is causing the loss of availability.



It requires an inter-disciplinary team and an

acceptance from the business that the only way of finding and resolving the issue is by allowing some availability losses to occur.
It is worth noting that in addition to the techniques that we have discussed in this section, the Availability Management process will support and work closely with proactive problem management. So many of the same

techniques used in Problem Management may also help with identifying the underlying
reasons for lost availability.
S4AP24 - Benefits & Problems

The benefits of and potential difficulties with Availability Management are listed on Page 68 of
the little ITIL book and in Section 8.3.5 of the

Service Delivery Manual.

They are also summarised here for your


convenience.

S4AP25 - Summary

In this session we have been examining the Availability Management process.


We have defined Availability Management and seen how it relates to other ITSM components.
We have considered the main elements of the

Availability lifecycle and the terms MTBF,


MTTR and MTBSI.

The main responsibilities of the Availability Management process have been defined and several techniques which are of use in this
area have been introduced.

Session 4B - Capacity Management

S4BP1 - Objectives

The Capacity Manager role requires excellent technical and business capabilities. The day-to-day activities include dealing with technical specialists and service level managers.

In this session we will be examining Capacity Management, which is covered in Chapter 6 of the Service Delivery book in the IT Infrastructure Library.

It's not usual for the Capacity Manager to communicate with customers, or to be responsible for procurement of new

equipment. However, Capacity Management will have a significant input on purchasing


decisions.

Once you have completed this session you will


be able to:

S4BP3 - A Balancing Act

Define Capacity Management, and its three


sub-processes of Business, Service and

Resource Capacity Management

The Capacity Management Process can be regarded as something of a balancing act. The organisation must provide enough capacity to
meet justified business demands, balanced

Identify Capacity Management's ongoing, ad


hoc and regular activities

against the costs that the organisation can afford to pay.

Describe the contents of the Capacity


Database and the Capacity Plan
S4BP2 - Introduction

There are two 'laws' associated with Capacity Management, which offer an insight into the demands placed on this process. The first is 'Moore's Law', which suggests that 'processing
capacity doubles every 12 to 18 months'. The second is a variation on 'Parkinson's Law',

In order that Service Level Agreements are met, it is critical that sufficient capacity is available at all times to meet the agreed
business requirements.

Capacity Management ensures that IT processing and storage capacity provision match the evolving demands of the business in a cost effective and timely manner. Of all the ITIL processes this can be regarded as one of the most proactive. ITIL defines Capacity Management's goal as:
'To understand the future business

which states that data expands to fit the space available for storage. This highlights a second 'capacity' problem, the one of supply and demand. As greater capacity becomes
available users will make use of it.

There is continual pressure from the business and customers to increase capacity, but in doing so there are costs incurred to the business. Ultimately, a decision has to be made over whether the cost of capacity provision provides enough business benefit:

requirements, the organisation's operation, the


IT infrastructure, and ensure that all current

and future capacity and performance aspects of the business requirements are provided cost effectively.'
The Capacity Management process incorporates Performance Management, Capacity Planning, and monitoring and tuning activities. In a large organisation there may be many people working in a Capacity management team under the leadership of a specialist.

However, Capacity Management must justify the cost of any capacity increases. Broadly speaking the objective is to provide:
The Right Capacity, enough but not too much
At the right cost
And critically, at the right time

In theory, if Capacity Management processes are running well, providing the right level of capacity at the right time, then they should be invisible to the business, and to most aspects of the Service Level Management.

S4BP4 - Activity

In smaller organisations it might be the role of a single individual who is supported by technical specialists from Networking, desktop
and so on.


S4BP5 - Scope
Capacity Management is also involved in the reduction of capacity or as it is sometimes known, 'managing shrinkage'. In any organisation the capacity of certain components may be being reduced whilst the capacity of others may be being increased.

the business. It tries to ensure SLAs aren't

breached because of capacity problems, and tries to improve scarce resource utilisation through the use of Demand Management.

Finally, Resource Capacity Management concentrates on the underpinning technology


resources that 'enable' business services. It

also ensures that these resources, or

An example of this might be where a mainframe-based environment is gradually being replaced by a distributed service. The capacity requirements on the mainframe will be falling while the capacity requirements on the servers will be increasing rapidly.
As we mentioned earlier in this session,

Configuration items, are not over used.


This sub process is also responsible for monitoring future development and capacity of technical components, and reporting these findings back to the business, so that they can be integrated into future plans.
S4BP7 - Activities

providing capacity to the business at the right time is critical. If capacity upgrades are too late then the infrastructure could fail. Failures might already be occurring; for example, through incidents and complaints reported to the Service Desk, or internal monitoring tools might indicate that we are operating close to capacity.
Buying in extra capacity at short notice leaves little negotiating power with external suppliers and as such is likely to be very expensive. Conversely, upgrading the infrastructure to increase capacity to then find it's under-used could in itself lead to financial problems.
S4BP6 - Sub-Processes

The Capacity Management process has a number of ongoing, iterative activities. These activities include: monitoring, analysis, tuning and implementation, and are carried out in Resource Capacity Management and Service Capacity Management. They are not normally used in Business Capacity Management, except during business reporting. The monitoring activity should include the monitoring of thresholds, and baselines or profiles of the normal operating levels.
Thresholds and baselines are set from the analysis of previously recorded data; they are the 'yardstick' by which Capacity Management
can measure utilisation of IT infrastructure

Capacity Management consists of three interrelated sub-processes, each working at different levels in the organisational structure. The three sub-processes are Business Capacity Management, Service Capacity Management, and Resource Capacity Management.

configuration items.
All thresholds should be set below the level at

which a resource is over-utilised, or below the

targets in an SLA. For example, a threshold might specify that the usage on any individual
CPU does not exceed 80% for a sustained

period of one hour. If these thresholds are


exceeded, alarms should be raised and

Business Capacity Management focuses on the future services required by the business and tries to predict future capacity. This process is responsible for the production of a Capacity Plan, which is intended to forecast the future requirements for resources to support IT Services that underpin the business
activities.

exception reports produced.
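The CPU threshold example can be sketched as a simple check over monitoring samples. This is an illustration only: it assumes one utilisation sample every five minutes, so that twelve consecutive samples cover one hour, and the names used are our own.

```python
def sustained_breach(samples, threshold=80.0, window=12):
    """Return True if `window` consecutive samples all exceed `threshold`.

    Assumption: samples are CPU utilisation percentages taken at regular
    five-minute intervals, so the default window of 12 represents one hour.
    """
    run = 0
    for value in samples:
        run = run + 1 if value > threshold else 0
        if run >= window:
            return True
    return False


busy_hour = [85.0] * 12
short_spikes = [85.0] * 11 + [70.0] + [85.0] * 11

print(sustained_breach(busy_hour))      # sustained for the full hour
print(sustained_breach(short_spikes))   # interrupted before the hour is up
```

A check of this kind would feed the alarms and exception reports; the raw samples themselves would go on to the trend reports.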

In addition to exception reports, monitoring will also produce trend reports on a daily, weekly
or monthly basis. Trend reports are intended to help predict future threshold breaches.

To work effectively, BCM requires an insight


into the business as a whole, and should be

Monitoring leads on to the analysis activity, where the monitoring data is analysed to try and identify and classify problems. Analysis then leads onto reporting, and then tuning,
where the problems are addressed, and the

able to gather medium term plans and predictions about growth or shrinkage.
Service Capacity Management is concerned with the services currently in place to support

technical parameters of the system are fine tuned to improve efficiency.


Once a tuning decision has been made it is implemented through the change management process. Finally the activity returns to

Ongoing, the day-to-day activities, Ad hoc, carried out as a result of a particular need, and Regular, which are carried out at fixed
intervals.

monitoring, and the iteration begins again. Note that tuning is an optional activity. If no problems are identified in analysis, then tuning will be unnecessary. Tuning is an expensive
activity, as it involves high levels of skill.

S4BP10 - On-Going Activities

Among the ongoing iterative activities, are those of Monitoring, Analysis, Tuning and
Implementing, which we looked at earlier in the
session.

Tuning can improve service delivery without incurring costs associated with equipment
purchase. However, using skilled resources will incur costs, particularly if they are sourced
from outside the business.

Remember this group of activities is mainly carried out at the Service and Resource sub-process level. Also note that these activities are used in Business Capacity Management's reporting activity.

Tuning at service level can ensure that

services don't clash at times of peak demand. Any excess demand can be controlled by Demand Management, or by sharing capacity. We will be looking at Demand Management in
more detail, later in this session.

Another 'on-going' Capacity Management activity is providing data to the Capacity Management Database or CDB. As you can see in the diagram, all of the other on-going
and ad hoc Capacity Management activities
provide information to the CDB.

Importantly, tuning should be carried out initially in a test environment. Only when we are confident that the change will be a benefit to the business, should it be implemented through the conventional change management
process.

The CDB provides valuable information on


who has used which resource and when. This

data can be extremely useful for other ITIL processes, particularly IT Services Financial Management.
The CDB is the cornerstone of a successful

S4BP8 - Activity
S4BP9 - Activities

In the next few pages we will look at all of the capacity management activities in more detail, and how they relate to each of the Capacity Management sub-processes of Business

Capacity Management process. Data in the CDB is stored and used by all the sub-processes of Capacity Management, because it is the repository that holds a number of different types of data including business,
service, technical, financial and utilisation data.

Capacity Management, Service Capacity Management and Resource Capacity Management.

However the CDB is unlikely to be a single


database, and probably exists in several physical locations. We will look at the make up
of the CDB later in this session.

Remember Business Capacity Management is concerned with future business requirements for IT services, its planning and timely implementation.

S4BP11 - On-Going Activities

Service Capacity Management is responsible for ensuring the performance of all services detailed in SLRs and SLA targets are monitored, measured, recorded, analysed and reported.
Resource Capacity Management monitors and measures the individual components in the IT
infrastructure.

Another on-going Capacity Management activity is Demand Management. The main objective of Demand Management is to influence the demand for computing resource
and the use of that resource.

This activity can be carried out as a short-term measure because there is insufficient Capacity to support the current workload. Or as a deliberate policy of IT management, to limit the required IT capacity in the long-term. Short-term demand management might be needed if there is a partial failure of a critical

The Capacity Management activities can be subdivided into three groups based on their frequency, and these are:


resource in the IT Infrastructure. Service provision might have to be modified until a replacement or fix is found.


Long-term Demand Management might be used when an expensive upgrade to the IT infrastructure can't be cost justified. The aim in this case, is to influence patterns of use, by using mechanisms such as physical and
financial constraints.

These modelling techniques vary in complexity and consequently cost, with Trend Analysis being the simplest and cheapest, and benchmarking being the most complex and expensive. Let's look briefly at each of these modelling types.
S4BP13 - Ad-Hoc Activities

Physical constraints might involve restricting the number of concurrent users to a specific
resource, a network router for example.

The Trend Analysis technique looks at various data over a period of time and attempts to draw a smooth curve through these figures, extrapolating the graph data forward into the future, as a way of predicting future trends.
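As a simple illustration of Trend Analysis, the sketch below fits a straight line to a series of utilisation figures and extends it forward. A straight line is the simplest possible 'smooth curve' and is our own assumption here; real tools may fit other curves. The monthly figures are made up for the example.

```python
def linear_trend(values):
    """Least-squares straight line through equally spaced observations.

    Returns (slope, intercept), with the first observation at x = 0.
    """
    n = len(values)
    if n < 2:
        raise ValueError("at least two observations are needed")
    mean_x = (n - 1) / 2
    mean_y = sum(values) / n
    sxx = sum((x - mean_x) ** 2 for x in range(n))
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in enumerate(values))
    slope = sxy / sxx
    return slope, mean_y - slope * mean_x


def forecast(values, periods_ahead):
    """Extrapolate the fitted line `periods_ahead` periods past the last value."""
    slope, intercept = linear_trend(values)
    return intercept + slope * (len(values) - 1 + periods_ahead)


# Hypothetical monthly disk utilisation figures, in percent
monthly_utilisation = [50.0, 52.0, 54.0, 56.0]
print(f"Forecast in three months: {forecast(monthly_utilisation, 3):.1f}%")
```

Comparing the forecast with a threshold gives an early warning of when that threshold is likely to be breached.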

Financial constraints might involve the use of differential charging, such as charging customers a premium to use network bandwidth during peak hours of demand.

Analytical Modelling uses mathematics to represent computer system behaviour. Typically a model is built using a software package, which can recreate a virtual version
of a computer system. When the software is executed, 'queuing theory' is used to calculate response times, and if virtual response times are sufficiently
close to those recorded in the 'real life' IT

Demand Management must be carried out sensitively, without causing damage to the business, customers, or the reputation of the IT organisation. It is essential that customers are kept informed of all the actions being
taken.

infrastructure, the model can be regarded as


S4BP12 - Ad-Hoc Activities
accurate.
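To give a feel for the kind of calculation an analytical model performs, the sketch below uses the simplest textbook queuing model, a single-server M/M/1 queue, where the average response time is 1 / (service rate - arrival rate). The choice of M/M/1, the 10% tolerance and all the names and figures are assumptions made for illustration; commercial modelling packages are considerably more sophisticated.

```python
def mm1_response_time(arrival_rate, service_rate):
    """Average response time of a single-server M/M/1 queue.

    Rates are in transactions per second; the result is in seconds.
    """
    if arrival_rate < 0 or service_rate <= 0:
        raise ValueError("rates must be positive")
    if arrival_rate >= service_rate:
        raise ValueError("arrival rate must be below service rate")
    return 1.0 / (service_rate - arrival_rate)


def model_is_close_enough(predicted, measured, tolerance=0.1):
    """Accept the model if the prediction is within `tolerance` of reality."""
    return abs(predicted - measured) <= tolerance * measured


# Hypothetical figures: 8 transactions arrive per second, the server handles 10
predicted = mm1_response_time(8, 10)
measured = 0.52   # hypothetical response time recorded on the live system
print(f"Predicted response time: {predicted:.2f} s")
print("Model accepted" if model_is_close_enough(predicted, measured) else "Model rejected")
```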

Modelling is an example of an ad hoc activity, which is used in all Capacity sub-processes. Modelling tries to predict the behaviour of components and services under a given volume of work, particularly at peak times. It then tries to understand the way in which
current service and resources are used, and

Although Analytical modelling requires less time and effort than other modelling types, typically the end results are less accurate.

Simulation modelling involves the modelling of discrete events, in other words what actually happens millisecond by millisecond, as a transaction passes from local PC through the
local area network, to server and so on.

the impact of that usage on the IT infrastructure. It attempts to predict the future from our knowledge of the past. In order to do
this we establish a 'baseline' model.

This type of modelling can be very accurate in predicting the effect of changes, but it is time consuming, and therefore costly.
However, Simulation Modelling can be cost

The baseline model reflects accurately the performance that is being achieved. Once a baseline is created, predictive modelling can
be done.

justified in organisations with very large


systems, where the cost and associated business implications are critical.

We can ask the 'what if?' questions about planned changes to the IT infrastructure. If the
baseline model is accurate then the results of

the predicted changes should also be accurate.

Finally, Benchmarking involves physically

building a replica of part of the IT infrastructure and measuring such things as its response to a reduced workload, and extrapolating these results, to see how it would perform under the
'real' workload.

The major modelling types used by Capacity


Management are: Trend Analysis Analytical Modelling
Discrete Simulation

Because Benchmarking involves the purchase of equipment, building software and simulating
significant workloads, this is the most

expensive modelling option, however, it does

and Benchmarking

give the most accurate predictivefigures.
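To make the analytical approach concrete, here is a minimal sketch in Python. It assumes the simplest queuing-theory model, a single-server M/M/1 queue, which this course does not name; the transaction rates, the measured response time and the 10% tolerance are all hypothetical figures chosen for illustration. It first checks the baseline model against a measured response time, and then asks one 'what if?' question.

```python
def mm1_response_time(arrival_rate, service_rate):
    """Mean response time (queuing plus service) of an M/M/1 queue.

    Rates are in transactions per second; the result is in seconds.
    M/M/1 is an assumed model, picked here only because it is the simplest.
    """
    if arrival_rate < 0 or service_rate <= 0:
        raise ValueError("rates must be positive")
    if arrival_rate >= service_rate:
        raise ValueError("unstable: arrivals meet or exceed service capacity")
    return 1.0 / (service_rate - arrival_rate)


def baseline_is_accurate(predicted, measured, tolerance=0.10):
    """True when the model is within `tolerance` (a fraction) of the measurement."""
    return abs(predicted - measured) <= tolerance * measured


# Hypothetical figures, not taken from the course material.
service_rate = 50.0        # transactions per second the server can complete
current_load = 40.0        # transactions per second arriving today
measured_response = 0.105  # seconds, as recorded on the 'real life' infrastructure

predicted = mm1_response_time(current_load, service_rate)
if baseline_is_accurate(predicted, measured_response):
    # Only trust a 'what if?' prediction once the baseline has been validated.
    what_if = mm1_response_time(45.0, service_rate)
    print(f"Baseline accepted; predicted response at 45 tps: {what_if:.3f} s")
else:
    print("Baseline rejected; refine the model before predicting")
```

Real modelling packages use far richer models than this. The point is only the pattern: validate the baseline first, then predict.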

S4BP14 - Ad-Hoc Activities

Another ad hoc Capacity Management activity is Application Sizing. The primary objective of Application Sizing is to estimate the resource requirements to support a modified or new application, and to ensure that it meets its required service levels.

Application Sizing has a finite lifespan. It is initiated at the beginning of a new application, or when there is likely to be a major change to an existing one. Application Sizing is complete when the completed application is accepted into the operational environment.

This activity is performed together with colleagues in system and service development, to ensure that we are fully aware of the likely impact of services being developed, designed or purchased, before they are implemented.

This provides Capacity Management with important data on future resource requirements, which can be integrated into the Capacity Plan, as well as providing valuable information for purchasing and for the development team.

Finally, a 'regular' Capacity Management activity is the production of a Capacity Plan, which is typically created annually. Information gained from the activities of monitoring, demand management, modelling and application sizing will contribute to the production of a Capacity Plan. We will be looking at the Capacity Plan in more detail later in this session.

S4BP15 - Activity

S4BP16 - Inputs & Outputs

To fully appreciate the scope of Capacity Management, we need to consider the major inputs and outputs to the process, and how these relate to the sub-processes of Business, Service and Resource Capacity Management.

Inputs to the BCM sub-process include the external suppliers of new technology, existing service levels and current SLAs, along with proposed future services and related SLRs. Other important inputs to BCM include the Business Plans, and any strategic plans together with IS and ICT plans.

The important inputs to the Service Capacity Management sub-process are: the service levels and SLAs; current information from monitoring tools related to systems, networks and services; the service review results, including any issues raised; incidents and problems related to capacity; and any SLA breaches.

RCM's key inputs include incidents or problems related to a particular component, and monitoring information related to component utilisation.

Financial Plans and Budgets are a major input to all 3 sub-processes.

Outputs from the sub-processes include a Capacity Database, Baselines and threshold information, which we looked at earlier in this session. Capacity reports such as Trend, Ad hoc and exception reports will be produced by all three sub-processes.

Other outputs include recommendations for SLAs and SLRs, as Capacity Management activities will turn initial SLRs into achievable and cost effective service level quality clauses. Charging and costing recommendations are also produced.

SCM and RCM will be suggesting 'proactive changes' and 'Service Improvements', to improve levels of capacity, or reduce costs - preferably both! Carrying out 'Effectiveness Reviews' and creating 'Audit Reports' form a basis for checking that business benefits are being achieved, and that the process users are following the 'rules'.

S4BP17 - The Capacity Management Database

Although the Capacity Management Database is represented in the ITIL guidance as a single entity, it is unlikely to exist in this form in many organisations.

The main reason for this is that much of the data held in a CDB is common to that in a fully integrated Configuration Management Database. Therefore, there is an argument for making the CDB part of the CMDB.

Software tools used by Capacity Management may have partial CDB functionality designed in to them. If this information is accessible by
other software, then a 'virtual' CDB can easily be created.

Remember the data contributors to the CDB are the key to its success. Input from the business includes the 'business strategy' and the business plan.

Service Management will provide information about SLAs and a full definition of the quality processes in place.

Data about manufacturers' specifications for existing and new technology will be provided by the technical teams. And finally, the IT Financial Management team will provide fiscal data. Additional financial information will be provided from the CMDB, in its role as a 'super' asset register.

S4BP18 - The Capacity Plan

The Capacity Plan is a major output of the Capacity Management process.

It has a standard structure and includes:

Assumptions - about levels of growth.
A Management Summary.
Business Scenarios.
A Summary of Existing Services, problems or issues with current services and current levels of utilisation.
A Resource Summary - which will show what has happened to particular components over the last year and since the last Capacity Plan.

The Capacity Plan will also contain suggestions for cost effective service improvements. A Cost Model will illustrate some costed recommendations.

Recommendations for the business - Capacity Management usually provides a number of alternatives for the business, and the plan should be produced in a timescale which allows the recommendations to be considered as part of the budget planning lifecycle.

One final note. Remember that the Capacity Plan should be updated regularly, in line with any revised business plan, or unexpected changes in the IT infrastructure.

S4BP19 - Critical Success Factors

Managing the capacity of large distributed networks is becoming increasingly complex, and the financial commitment from business to IT continues to increase.

A corporate Capacity Management process ensures that the entire organisation's capacity requirements are catered for. However, making the process work successfully depends on several critical factors. These include:

Accurate business forecasts.
An understanding of current and future technologies.
A cost effective Capacity Management process.
Working closely with other Service Management processes, for example Problem and Change Management.
Effective financial management.
Links to Service Level Management - to ensure that any business commitments are realistic.
And finally, the ability to plan and implement the appropriate IT capacity to match business needs. This provides a longer-term proactive view.

S4BP20 - Benefits & Problems

The benefits of and potential difficulties with Capacity Management are listed on Page 57 of the little ITIL book and in Section 6.4 of the Service Delivery Manual. They are also summarised here for your convenience.

S4BP21 - Summary

In this session we have been looking at the ITIL process of Capacity Management.

We have defined the goal of Capacity Management in ITIL terms, and we have looked in detail at the three Capacity Management sub-processes of Business, Service and Resource Capacity Management.

We went on to examine the iterative Capacity Management activities of Monitoring, Analysis, Tuning and Implementation, and the ad hoc and regular activities of Demand Management, Modelling and Application Sizing.

We highlighted the major inputs and outputs of the Capacity Management process, and defined the contents of the Capacity Database and the Capacity Plan. We concluded the session by defining the critical factors for successful Capacity Management implementation.

Session 5A - Service Level Management

S5AP1 - Objectives

In this session we will be examining Service Level Management, which is covered in Chapter 4 of the Service Delivery book in the IT Infrastructure Library. When you have completed this session you will be able to:

Define Service Level Management according to ITIL best practice.
Identify the core Service Level Management sub-processes and activities.
Understand the relationships between SLAs, OLAs and UPCs, and recognise the main sections of a Service Level Agreement.
List the benefits gained from the Service Level Management process.

S5AP2 - Introduction

Service Level Management is considered by many to be the heart of ITIL-driven service management.

ITIL defines its goal as:

"To maintain and gradually improve business aligned IT service quality, through a constant cycle of agreeing, monitoring, reporting and reviewing IT service achievements and through instigating actions to eradicate unacceptable levels of service."

Service Level Management exists to ensure that service targets, such as availability of services, response times and so on, are agreed and documented in a way that the business understands.

It is also there to ensure service achievements are monitored and reviewed on a regular basis.

Service Level Agreements, which are managed through the Service Level Management Process, provide specific targets against which the performance of the IT provider can be judged.

The Service Level Management Process is responsible for ensuring Service Level Agreements and underlying Operational Level Agreements or underpinning contracts are met.

S5AP3 - Why do we need SLM?

Customers have become more aware of their dependency on IT for successful business operation. Hence they feel an increased need to formalise the contractual basis on which IT services are provided, and this is where Service Level Management can help.

Often, Service Level Management is a driver for Continuous Service Improvement Programmes, also known as CSIP or SIP.

Such programmes are aimed at achieving cost-effective improvements to the services offered by the IT service provider, in a rapidly changing technical environment, without necessarily being driven by customer demand.

An example of this might be to take advantage of dramatically reduced networking costs to provide better response times than the customer originally specified, or alternatively to provide the same response times but at a much lower cost.

It is the responsibility of Service Level Management to be aware of service improvement opportunities, before the customers themselves begin to ask about them.

S5AP4 - Activity

S5AP5 - Approach to Service Provision

There are a number of ways that IT services can be provided - each having their merits and drawbacks.

In the simplest scenario there is just the external provider of the IT service and the customer organisation. Services will be provided on the basis of a contract between these two parties.

Whilst this has the benefit of simplicity, it's a risky strategy and one that generally leads to poor support for the users and poor value for money for the corporate customer.

The next approach is often said to involve an "intelligent customer" role. That is, somebody who negotiates on behalf of the customer with suppliers for service delivery. That customer has a Service Level Agreement with the
Service Level Management process, and the service is underpinned by an 'Underpinning Contract' with the suppliers. In this situation, the internal IT department adds little or no value. Such arrangements are common where an 'off-the-shelf' package solution is being provided by the supplier.

S5AP6 - Approach to Service Provision

Probably the most common arrangement is where the customer has a 'Service Level Agreement' with the Service Level Management team.

In order for that service to be provided, it is necessary for the Service Level Management team to establish 'Operational Level Agreements' with their own internal IT departments, who in turn may have an 'Underpinning Contract' with the external suppliers of the various components. Note that for any one service there may be several Operational Level Agreements and several Underpinning Contracts.

Finally, although it is much less common, the whole process can be purely internal, and no external contracts are therefore required. So the Customer has a Service Level Agreement with Service Level Management and they have an OLA with the internal IT department - and that's it.

This last arrangement is fairly unusual because most systems will depend on some external supply. It is, on the other hand, quite common for a total service to be provided on the basis of a combination of two or more of these strategies.

S5AP7 - SLA Structure, Service Based

One of the early decisions that has to be made is the structure of the SLA procedure - which is a major determinant of how many SLAs will end up being produced.

For example, if we had 1000 customers and 50 services we could theoretically produce 50,000 Service Level Agreements. This would clearly be impractical.

Fortunately most businesses don't have 1000 customers who are entirely independent of each other and so there is usually commonality of service requirements amongst groups of customers.

As an example, let's suggest 10 major groups of customers, each of which has a common set of service requirements. So by producing SLAs at the Customer Group level the number required could be reduced to 500 - more manageable but still excessive.

There are a number of ways in which this problem can be overcome - perhaps the most common one being the mapping of services onto customer groups.

So a particular service, say Service A, will be provided in a generalised format to Customer Groups 2 and 4. And in a similar way, Service D will be provided to Customer Groups 1 and 2.

This allows us to have just one SLA per service - so 50 in our previous example.

The drawback of this approach is that it tends to make each SLA more complicated, since they may have to cater for the fact that not all groups covered by a service have exactly the same requirements. If there are geographical differences between the groups as well, then this will also add to the complexity. Despite this problem, this is the most common approach that you are likely to encounter.

S5AP8 - SLA Structure, Customer Based

An alternative approach is to turn the previous model on its head and map Customer Groups onto Services.

An SLA is created for each customer group, describing all of the services that each customer group will receive.

Here for example Customer Group 2 receives three services, however they would have just one SLA, admittedly quite a complex one, detailing how they would receive Services A, C & D.

There are a couple of advantages to this approach. One is that the number of SLAs can be dramatically reduced - in our previous example with 10 customer groups and 50 services, we would end up with only 10 SLAs.

Also, it becomes relatively straightforward to introduce variances on standard services between the different customer groups.
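The SLA counts quoted in these examples can be checked with a few lines of Python. This is only a sketch: the function names are hypothetical, and the service-to-group mapping contains just the assignments mentioned in this session (Service C's other customer groups are not stated, so only Group 2 is listed for it).

```python
def sla_counts(customers, customer_groups, services):
    """Number of SLAs needed under each structure discussed in the session."""
    return {
        "per customer, per service": customers * services,
        "per customer group, per service": customer_groups * services,
        "service based": services,
        "customer based": customer_groups,
    }


def customer_view(service_to_groups):
    """Turn a service -> groups mapping into a group -> services mapping."""
    group_to_services = {}
    for service, groups in service_to_groups.items():
        for group in groups:
            group_to_services.setdefault(group, set()).add(service)
    return group_to_services


counts = sla_counts(customers=1000, customer_groups=10, services=50)
for structure, number in counts.items():
    print(f"{structure}: {number}")

# Service based view: each service lists the customer groups it is provided to.
service_to_groups = {"A": {2, 4}, "C": {2}, "D": {1, 2}}

# Customer based view: the same data 'turned on its head'.
print(sorted(customer_view(service_to_groups)[2]))  # ['A', 'C', 'D']
```

The two views hold the same information; the choice between them only changes how many documents there are and how complex each one becomes.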

The disadvantage is that the SLAs can be long and complex and contain a great deal of duplication from one another.

S5AP9 - SLA Structure, Multi-Level SLAs

A third approach to structuring SLAs is to have a multi-level or hierarchical structure.

ITIL suggests three levels, namely: Corporate, Customer and Service.

Corporate is the highest level and contains any common features that are true of all services across all customer groups. This might cover things like service desk hours, escalation procedures, contact points, roles and responsibilities, and so on.

The next level down is the Customer level. Each of the SLAs produced at this level is a description of the services for a particular group of customers. So in our previous example there would be 10 SLAs at this level.

At this level SLAs would contain everything that was common for that particular group of customers, but different from the generic services that appeared in the higher Corporate level.

Finally, Service Level sits at the bottom of the structure. Here we have a document representing each service used by that customer, and relevant to that particular customer group. It only contains information which differs from the Corporate and Customer level clauses.

Consequently we would have a larger number of SLAs, but each would be relatively short. This in itself makes change management easier.

If, for example, we decided to change the standard hours of the service desk from 9am until 7pm, to 9am until 9pm, then that change would only appear in the Corporate level SLA.

It is important when using the hierarchical structure that the correct level of authority is assigned to each level. For example, at Corporate level the document would be authorised at the highest management level liaising with IT. Customer level documents might be authorised by Department Heads: Finance, Planning, HR and so on. Individual Service Level Agreements would be authorised at the next management level down in each of these departments.

The general principle is that SLAs are authorised by paying customers on behalf of users in their part of the organisation.

S5AP10 - What is an SLA?

In their structure SLAs are rather like contracts, but they are not in themselves legal documents. However they can be included in a legal contract, particularly when establishing SLAs directly with external suppliers. In such cases an SLA would be included in the contract as a schedule.

An SLA which is used internally between departments has no legal weight; it is simply a document that has a contractual structure to it.

The purpose of an SLA is to document an agreement, and as such it shouldn't be an imposition on either the business or IT. It must always be written in unambiguous business language, and shouldn't contain any technical references which make its intention unclear and leave the Business feeling uncomfortable authorising the agreement.

So we have established what constitutes an SLA. What then is an OLA or Operational Level Agreement? Well in simple terms OLAs are agreements that define the internal IT arrangements that support SLAs.

OLAs are also known as back-to-back agreements. The most common use of an OLA is to define the relationship between Service Desk and internal support groups.

OLAs are required to ensure that the SLA targets agreed between customer and IT provider can be delivered in practice. They describe each of the separate components of the overall service delivered to the customer, often with one OLA for each support group and a contract for each supplier.

A further additional contract exists to ensure that SLAs are supported, and this is an 'underpinning contract.'

S5AP11 - What is an SLA?

Underpinning contracts are put in place with external suppliers or vendors. It's important that all targets contained within both SLAs, and OLAs that rely on these external suppliers are
'underpinned' by the appropriate level of maintenance and support contracts. For example, an internal software development team might have in place an OLA between themselves and Service Level Management. This OLA offers, amongst other things, a guaranteed response time to serious problems.

In order to guarantee these service levels, the software development team might have an underpinning contract in place with their development software vendor, ensuring that problems can be resolved well within this guaranteed response time.

A word of warning here: it's critical that any commitments made in an OLA are directly supported by the underpinning contract. For example, committing to a 4 hour fix time in an OLA would be useless if our underpinning contract only commits our supplier to a 6 hour fix time!

S5AP12 - Service Catalogues & SLRs

In the last few pages we have been looking at those agreements and contracts that form an important part of Service Level Management.

But how do we establish which services are available for inclusion in these agreements and contracts, and which ones our customers or users would like?

Well, there are two other important documents in Service Level Management, which can help us with this decision, and these are 'A Service Catalogue', and 'Service Level Requirements' or SLRs.

A Service Catalogue contains a list of all services used by each customer group. A Service Catalogue could be used internally by the service provider, for example, the Service Desk might use it to help them identify those customers entitled to a higher level of service.

It can also be used externally as a marketing tool, providing a shop window, showing all the services on offer to the business. Commonly, organisations now make this available on their intranet.

Service Catalogues exist in a number of forms. They are often created as an internal document, listing existing services when Service Level Management is initially established. At a later stage, it might be published to potential customers, and the wider business as a whole, in a more 'glossy' format.

In order to establish their exact requirements, the customer develops a Service Level Requirement document. When doing so, the customer should be realistic about potential levels of service, and related costs. Remember this is not a wish list, and sensible advice should be offered from the Service Level Management team.

There is no specific format for SLRs, and each organisation will document them in their own way.

It is important to remember that these documents, along with SLAs, OLAs and UPCs, are all subject to the ITIL Change Management Process.

S5AP13 - Activity

S5AP14 - Sub-Processes

In the next few pages we will look in some detail at the Service Level Management sub-processes. These sub-processes can be grouped into 4 stages as shown.

So let's look at these 4 stages individually, and see how they fit together to form a complete Service Level Management process.

The first stage is Initial Generic. The first activity at this stage, assuming that a Service Level Management team is in place, is to build the initial Service Catalogue.

The second related sub-process is planning the SLA structure and establishing which SLAs we need to create. This activity involves prioritising the modification of pre-existing SLAs, in order to rework them into standard formats.

Assuming we've built the Service Catalogue, agreed the SLA structure, and prioritised the work, we can move onto the second stage of 'Initial per-service', and its related sub-processes where we address specific customer issues.

The first point is to establish Service Level Requirements or SLRs. Find out what users would really like from that service, and what customers are prepared to pay for.

The second sub-process uses those SLRs to review the underpinning contracts and OLAs already in place with internal and external service providers.
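One thing worth checking during this review is the earlier warning about fix times: an OLA commitment is only safe if every underpinning contract it relies on commits the supplier to at least the same fix time. The following Python sketch shows one way such a check might look. The `Commitment` class and `unsupported_commitments` function are hypothetical names introduced for illustration, and the figures are the 4 hour and 6 hour fix times from the earlier example.

```python
from dataclasses import dataclass


@dataclass
class Commitment:
    """A fix-time commitment made in an OLA or an underpinning contract."""
    party: str
    fix_time_hours: float


def unsupported_commitments(ola, underpinning_contracts):
    """Return the underpinning contracts that are slower than the OLA promises."""
    return [
        upc for upc in underpinning_contracts
        if upc.fix_time_hours > ola.fix_time_hours
    ]


ola = Commitment("Software development team", fix_time_hours=4)
upcs = [Commitment("Development software vendor", fix_time_hours=6)]

for upc in unsupported_commitments(ola, upcs):
    print(
        f"OLA promises a {ola.fix_time_hours} hour fix, but {upc.party} "
        f"only commits to {upc.fix_time_hours} hours"
    )
```

A contract that merely equals the OLA fix time passes this check, although in practice we would want the supplier to resolve problems well within the OLA time.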


Once we are happy with both our OLAs and UPCs we can create a draft SLA.

And when the draft SLA is available, agreement should be sought from customers and users that it represents an adequate specification of service.

Once the agreement is formally signed, the SLA must be implemented. This involves informing all parties constrained by the SLA that it is in place.

S5AP15 - Sub-Processes

The third stage in the SLM process includes the on-going per-service activities of monitoring, reporting, and review and modify.

The fourth and final Service Level Management process stage is defined as ongoing generic. It involves sub-processes such as maintaining the Service Catalogue and updating it with new services.

A further activity is to review the Service Level Management process itself. By establishing Critical Success Factors we can measure performance, and we can also set Key Performance Indicators, or KPIs, for what is considered a successful service.

The final activity is to consider a Service Improvement Programme or SIP. Service Level Management should look at all provided services and their associated quality requirements to see how we can improve service levels without significant increases in cost to the business.

S5AP16 - Reporting

We briefly mentioned the activity of reporting earlier in this session. Reporting can be subdivided into either external or internal reporting.

Internal reporting involves monitoring service quality in SLAs and related OLAs and UPCs.

This detailed monitoring of service quality is normally set up by the Capacity and Availability Management processes. They will be interested in all activity which affects all service clauses, including breaks in service, time to repair, response time to users and so on.

Monitoring OLAs and UPCs will help us to understand why SLA breaches are occurring, and also to identify future trends, and possible future SLA breaches. Remember you can't control things that you can't monitor.

External reporting should be written in a simple and clear way. An exception report is a typical example of external reporting, and it should simply point out when, where and why SLA breaches or near breaches occurred. It should also explain how we intend to prevent things from getting worse.

A Service Level Agreement Monitoring Chart, or SLAM chart, is a popular mechanism for external reporting. The colour coding used here is quite common, hence this type of chart is sometimes called a RAG, or Red, Amber, Green chart. Such charts offer simple to understand graphical representations of service level parameters, and show where breaches or potential breaches have occurred.

Another important monitoring tool is the trend graph. Businesses are very interested in consistency of service as well as quality. For example, trend graphs can show that, over a three month rolling period, the trend is for greater throughput of activity and for fewer breaks in service. In displaying these trends to customers, we can convince them that we are achieving service level targets, and are likely to continue to do so.

S5AP17 - Activity

S5AP18 - SLA Contents

So what does a typical Service Level Agreement consist of?

Well, broadly speaking, its contents can be broken down into three sections: an introduction, Agreed Service Levels, and general extra statements.

The SLA introduction describes the service, its scope, the intended customer group, the commencement date and its duration. It should be written in clear and concise business terms, and it should be authorised at an appropriate level, by both parties.

'Agreed Service Levels' will define a number of measurable clauses, for example, normal hours of service, availability and reliability of the service.
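The red, amber, green colouring of a SLAM chart comes down to comparing each measured service level against its agreed target. Here is a minimal Python sketch of that comparison. The availability target, the monthly figures and the size of the 'amber' warning margin are all hypothetical; ITIL does not prescribe them, and each organisation would agree its own thresholds.

```python
from enum import Enum


class Rag(Enum):
    RED = "red"      # target breached
    AMBER = "amber"  # target met, but only just (a potential breach)
    GREEN = "green"  # target comfortably met


def rag_status(achieved, target, warning_margin=0.2):
    """Classify a 'higher is better' measure such as availability in percent.

    `warning_margin` is an assumed buffer above the target within which the
    result is shown as amber rather than green.
    """
    if achieved < target:
        return Rag.RED
    if achieved < target + warning_margin:
        return Rag.AMBER
    return Rag.GREEN


# Hypothetical monthly availability figures against a 99.5% target.
target = 99.5
monthly_availability = {"Jan": 99.9, "Feb": 99.6, "Mar": 99.2}

for month, achieved in monthly_availability.items():
    print(f"{month}: {achieved}% -> {rag_status(achieved, target).value}")
```

A measure where lower is better, such as response time, would need the comparisons reversed.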

Clauses related to 'throughput' are also common, detailing the number of transactions the service is expected to support in a defined period.

SLAs frequently contain clauses covering transaction response times. This is often broken down into several response types, including system responses, a request via mouse click on a PC for example, or an incident response, detailing the maximum time allowable in responding to an incident report.

There may be as many as 20 different measurable clauses in an SLA, against which customers will want us to report.

S5AP19 - SLA Contents

The third section in our SLA deals with the additional statements, such as service charges and how they are structured. Mechanisms for change should also be outlined in this section. Remember however, that changes to SLA clauses should be handled via the Change Management process.

If a request is received to amend an SLA clause it is important that the proposed change undergoes a thorough impact analysis. Changes in one SLA can impact on others, for example changing one SLA to allow more users on a network might have an adverse effect on other customers using the same network.

Statements on provision of service in case of a disaster are also important. It is the role of IT Service Continuity Management to create cost effective plans to deal with potential disasters, such as fire and flood. It is common to state in SLAs at what level, and how quickly, service will be available after a disaster.

Also included are statements of User and Customer responsibilities. Customer statements might include defining the maximum number of Users at any one time, or a commitment to provide data to the IT supplier in the event of weekend working for example.

This can be a lengthy section of the SLA, and it's important to remember that an SLA is an agreement between the business and IT with responsibilities on both sides.

S5AP20 - Service Reviews

In order to establish customers' perceptions of its service, Service Level Management should carry out regular service review meetings. Typically these meetings involve customers rather than users and consequently shouldn't be used as a substitute for user questionnaires and so on.

Ahead of these meetings Service Level Management staff should review customer related incident records from the service desk, so that they are able to answer any questions about these incidents.

Review meetings can lead to suggestions for change. Remember however, that they are not the place where changes are authorised.

The Service Level Management process can carry out its own internal review. This review should be carried out by the head of the Service Level Management team, or process owner.

A key activity in the review process is to review KPIs. Some typical example KPIs might include Customer Perception ratings, the number of service reviews held, and how many are held at the right time. ITIL suggests that these reviews are held on an annual basis, although many organisations hold them more frequently.

S5AP21 - The Service Level Manager

The SLM Process must be 'owned' in order to be effective and achieve successfully the benefits of implementation. The Service Level Manager must be at an appropriate level to be able to negotiate with Customers on behalf of the organisation, and to initiate and follow through actions required to improve or maintain agreed service levels.

This requires adequate seniority within the organisation and/or clearly visible management support.

It's important that the role acts as a conduit between IT specialists and the customer, translating technical language from the IT groups into understandable business language, and vice versa.

In summary we could define the characteristics of a Service Level Manager as being:

A good negotiator, firm but fair.
A good communicator, both written and oral.
Business orientated, customer focused and technically aware.
Good under pressure. This can be a stressful role, as it interfaces between two very strong minded communities.

S5AP22 - Benefits & Problems

The benefits of and potential difficulties with Service Level Management are listed on page 45 of the little ITIL book and in section 4.2.1 of the Service Delivery manual. They are also summarised here for your convenience.

S5AP23 - Summary

In this session we have been looking at Service Level Management.

We have seen how ITIL defines the goal of Service Level Management, how it often drives a Service Improvement Programme, and why it's regarded as essential to the ITIL structure as a whole.

We examined the relationships between the customer, the IT provider, and external suppliers, and went on to look at the structure of Service Level Agreements and the different ways in which we can tailor service provision to customer needs.

We went on to look at the structure of Service Level Agreements, and their relationships with Operational Level Agreements and Underpinning Contracts, and discussed how, by producing a Service Catalogue and a Service Level Requirement document, we can better satisfy customers' requirements.

We examined the Service Level Management sub-processes in detail, including planning an SLA structure, and the monitor, report, review and modify activities.

We listed the key characteristics of the Service Level Manager role, and highlighted some of the potential benefits and possible problems associated with implementing a Service Level Management process.
