Overview of
The function of the control buttons is fairly self-evident. For example, this button here will allow you to pause and restart the course at
any point.
bar is displayed. Clicking and dragging the button left or right will allow you to rewind or fast forward to any point in the page.
Clicking this button will take you forward to the
next page in the session. The end of each
quality environment. Examine Service Management and the Organisation, the ICT infrastructure, and how
we define a service in IT terms.
If you want to review a previous page, then the back-arrow button will return you to the page before the one you are now studying.
Clicking on this button will exit you from the
current page and return you to the main menu
screen.
Finally we will examine the functions that make up the core ITIL processes.
S1AP2 - Using this Course
The user interface for this course is designed to be fairly self-explanatory.
However, we will take just a few minutes to take you on a tour of the various controls and
Over here there is a progress bar gauge, which indicates how far you are through the
current session.
Your position in each of the sessions is automatically recorded - or book-marked - as you start and end a page.
When you select a session from the main menu you can either put your mouse cursor
Most of the screen that you are looking at is taken up with the work area. This is where material is presented, questions are asked and interactions are made at various points throughout the course. Around the edges are the various controls and items of information that you will find useful as you progress through the course. Here in the top left you will see the reference for the session that you are currently studying, the subject that is currently being covered within the session and also a detailed page reference. If you have any queries or problems you should note this reference and quote it when you contact our support staff for
assistance.
over the main session title, which will always take you to the start of the session. Or, you can click anywhere on the bar gauge to return
to your bookmark.
Along the top of the screen you will find some very useful function buttons. Running from left to right these buttons allow you to access: A course contents pane. Here the main sections are listed in the order they appear in the course. Think of this as a "Contents" page at the start of a book. By clicking on any entry here you will be taken straight to that page
within the course.
Along the top and bottom areas of the screen, you will see a number of buttons to help you navigate around the course and access some of the ancillary features that are provided.
The next button along will take you to a Glossary of ITIL terms. Clicking on the tabs at the top of this pane will display the glossary entries in alphabetical order.
A course keyword index pane. Think of this as the index page that you would find at the end
the appropriate page within the course. Alternatively you can search for a specific word by typing in the search field.
Note that when you use either of these functions, the book-marking facility is temporarily switched off. The next button along accesses your
It was conceived by the UK government, which approached various organisations and subject matter experts to write all of the books in the library, and it was originally published in the late 1980s.
Favourites. Clicking here reveals a box where you can add the current page you are on to a list of your favourite pages in the course, or ones which you would like to review at a later time. This list can be edited by changing the name of the page for your reference and removing pages from the list. Clicking on any of your favourites in the list takes you to that
page.
Since its inception ITIL has expanded from a library of books into a whole industry, with many organisations offering related products including training, consultancy and management tools.
S1AP6 - What is ITIL?
The FAQ button provides you with a list of Frequently Asked Questions about ITIL for your reference. The acronym buttons will take you to a directory of ITIL acronyms, which will help you understand parts of the course. Clicking on the tabs at the top of this pane will display the acronym entries in alphabetical
order.
The ITIL Library consists of seven volumes, although the central part of the library consists of just five: Service Delivery, Service Support, Business Perspective, Infrastructure Management and, at the centre, Application Management.
You can adjust the sound to a comfortable level by clicking the volume control button. Once you are happy with the volume level, release the slider and the panel will close.
Application Management holds the central position as it's the only volume in the library which deals with both Development and Service Delivery issues. There are two further ancillary volumes, which provide additional guidance. They are:
'Planning to Implement Service Management', used by project managers who are implementing ITIL.
This button opens your 'Sticky Note Pad'. Each session contains its own Sticky Note pad. You can use this to type reminders, thoughts or information that you think will help you as you prepare for the exam. If you wish you can copy and paste text from the subtitle pane into your sticky note pane. If you would like to print your sticky note pad - click here.
Clicking this button will reveal the subtitle pane. This useful function allows you to see a text transcript of the page narration. A slider is provided to adjust the text size, and the font can be changed by clicking here. You can print the subtitles by clicking here.
S1AP4 - Activity
S1AP5 - What is ITIL?
core consists of the two major volumes, 'Service Support' and 'Service Delivery'.
In addition to the two main manuals we will
also refer to a guidance overview booklet known as 'little ITIL', and its sister publication
So what is ITIL?
"A Dictionary of IT Service Management". These overview booklets are published by the
IT Service Management Forum or ITSMF.
'little ITIL' book. This 'overview' will provide you with enough knowledge to confidently sit
for the Foundation Certificate in Service
As these projects develop they approach a transition point. A transition point is defined as the point at which responsibility for the project
passes from the development team to the
Management.
For example a development team might retain project responsibility until the end of a warranty
period, at the end of which they hand over the
complex relationships which affect projects, and this is known as Application Management.
Application Management considers the whole 'cradle to grave' lifecycle of an application,
It considers applications as 'strategic resources' that need to be managed throughout their life, understanding the implications that decisions made at one stage
have on later stages. Although this process isn't examined in detail in this course, it is important to understand the
S1AP10 - Activity
S1AP11 - The IT Infrastructure
If service provision to business is to be effective, then its implementation should be as transparent as possible.
It should be assumed that end-users have no
providing high quality services that are available when users want them, that respond quickly to demand, and that are easily
maintainable.
As IT management staff, you will be working alongside technical specialists helping to maintain the ICT infrastructure, and ensuring
that delivered services are cost-effective. The ICT infrastructure is divided into three areas
IT Service
An integrated composite that consists of a number of components, such as management processes, hardware, software, facilities and people, that provides a capability to satisfy a stated management need or objective.
S1AP13 - ITIL Disciplines
environmental infrastructure, including
operating systems, database management systems, development tools and general applications and the computer data itself.
The inclusion of data here is contentious, as
some people suggest that a fourth infrastructure category should exist, handling data as a separate corporate resource.
And finally, Peopleware. This includes skill sets, details of training products, documentation of both products and services, working practices and general procedures.
To deliver effective services to business, all
Management, Problem Management, Change Management, Release Management, and Configuration Management.
All six disciplines relate to the day-to-day maintenance of a quality service. In ITIL terms, all of these disciplines except for
Service Desk are defined as processes.
Service Desk, in ITIL, is seen as a function.
You should be aware that ITIL draws a distinction between functions and
The management of Hardware and Software is dealt with in a separate ITIL guidance volume called 'ICT Infrastructure Management'.
Our focus in this course is the management of 'Peopleware', its documents and procedures, and how it relates to Service Support and Service Delivery.
S1AP12 - What is a Service?
processes and these are defined in the ITIL Dictionary. The definitions are repeated here for your convenience.
S1AP14 - ITIL Disciplines
everyday lives. When placing an order for goods or services, for example, or when checking into a hotel, we are being offered a business service. In most cases businesses are underpinned by
IT services. The IT service consists of a set of
ITIL does not mandate the creation of specific functional areas. So, for example, a Problem Management team need not be separate from a Capacity Management team and so on. In practice, many organisations do follow this model, but ITIL guidance allows you to form
your own structures.
related functions provided by IT systems in support of the business, and is seen by the
customer as a coherent and self-contained
entity.
However, ITIL does suggest one good practice, and that is for Configuration, Change
and Release Management to 'share' staff, and to be managed by one individual. This shared management is known as the CCRM or
Obvious examples of IT services might include e-mail, payroll and order processing.
Although we have represented each discipline here as a separate entity a great deal of
interactivity exists between each of them. Each process communicates with others in the
And finally we looked at the eleven disciplines which form the core ITIL processes, and the interactivity which exists between them within IT Service Management.
group. In fact there is a great deal of relationship management within IT Service Management.
For example, Service Level Management deals with the provision of high quality
services, provided at the right cost levels. Consequently it interacts frequently with IT Financial Management.
Interaction between other processes might be less frequent. For example, Capacity Management and IT Service Continuity Management might work together to develop a
cost effective and workable strategy to handle
a major disaster, such as a flood.
In this scenario, information on available
managed by IT Service Continuity Management.
S1AP15 - Summary
In this introductory session we have briefly examined the history of the ITIL library, its make-up, and how Service Delivery and Service Support sit at its core.
We have discussed how ITIL's flexibility allows easy integration into a recognised quality system, such as ISO 9000.
We looked at the ICT infrastructure and its
three constituent components, Hardware, Software and Peopleware. We highlighted Peopleware, its documents and procedures as a primary focus for this course.
We defined 'What a service is' in IT terms, and
The Incident Management process enables the recording, tracking, monitoring and resolution
of events that are a threat to "normal service".
In this session we will be examining the IT Service Desk, which is described in Chapter 4 of the Service Support book of the IT Infrastructure Library.
Problem Management addresses the
List the main reasons why the establishment of a service desk can have major benefits for
the organisation, the end-user and the IT provider alike.
We will be looking in more detail at both Incident Management and Problem Management in the remaining two sessions of this topic.
For the rest of this session we will be
Describe the importance of the Service Desk as a single point of contact for IT users. Identify three of the main approaches to
structuring a service desk.
When a Customer or User has an issue, complaint or question, they want answers quickly. More importantly, they want a result: the issue solved.
Explain what is meant by "escalation" in a service desk context and identify two different types of escalation procedure.
Name at least six technological aids that can
S2AP2 - Activity
S2AP3 - Introduction
One of the most important considerations when delivering IT Services is to ensure the provision of proper support for the users, so
either a rapid resolution or a work-around to the fault that will enable them to carry on with their work with a minimum of interruption. In order to support users in this way, ITIL has three closely related chapters, namely: Service Desk, Incident Management and Problem Management.
The Service Desk is meant to be the focal
the company's products can call to get support. Another Service Desk may exist so that employees can get answers to queries relating to company policies, personnel issues
and so on.
point for the reporting of incidents, requests for change, or any queries that a user may have
about the service. On the other hand it also
For the purpose of this course we will be making the assumption that the term Service Desk
refers to an Information and Communications
Technology - or ICT - Service Desk. The integration between IT and communications technology is so close these days that it makes
skilled network technicians or database
S2AP5 - Activity
experts, for example, to concentrate only on the complex faults or concentrate on improving the quality of the infrastructure.
It would usually be the case that the users or customers are performing a valuable function for the organisation.
proposition. So it is important to understand why such a facility might be needed and the
benefits that it should provide.
The
word.
So, any time that they are unable to operate at full efficiency as a result of a fault with the IT Services that they use will be both disruptive
and costly. An effective Service Desk will significantly
reduce the likelihood of such faults.
users of our IT services and their
This factor becomes even more crucial in an e-business context where the lack of service will
Another guiding principle of ITIL is that IT should maintain a focus on the support of business goals. IT does not exist to provide ICT components or technology just for the sheer joy of playing with new equipment.
It is there to help the organisation achieve its business objectives. A well-staffed and
efficient Service Desk is a critical element in
Another major benefit of a Service Desk is its contribution to the continuous improvement of the services offered by IT. The Service Desk will keep records of types of enquiry, the issues that are raised, the particular services, or aspects of a service, that
seem to cause most issues and so on.
Identifying the most commonly occurring faults and feeding this information back quickly to the IT Service Management structure is a critical aspect of the Service Desk. In this way, the Service Desk is the thermometer by which we can monitor the health of the IT services that are being provided. Additionally, the service desk can also operate as a "shop window" - adding value to the business by making users aware of facilities that they may not know exist - or how to make
better use, in a business sense, of the facilities
For example, one alternative to a Service Desk is for each group of users to have their own "super-user", to whom they can turn when things go wrong.
However, ITIL strongly suggests that IT costs can be reduced by not requiring high levels of IT skills within the business community, and by making it obvious to all how support can be achieved very quickly via a Service Desk.
Making better use of skilled and expensive IT staff can also reduce costs. Straightforward issues can be resolved
There is often some confusion about the terms "user" and "customer" - so far in this course
we have used the words interchangeably and for many people they mean pretty much the
same thing.
Some organisations will take this principle to its ultimate conclusion and have a single
Service Desk as the point of contact for everything to do with the ability of the business to continue to function properly. So staff within such an organisation could call the Service Desk if the lift broke down, or a light bulb in their area failed, or if they had a query on their pension arrangements.
This kind of Service Desk has the
person who actually uses the product or service under discussion. A machine operator for example.
A Customer is the person who negotiates for the provision of the product or service, what
the specification should be, any changes that may be needed and possibly the payment
arrangements.
are the same person. But in many cases, for operational systems, they will be different groups of people. Customers are normally managers, and users are the operators.
These definitions are relevant here because
Technology issues, providing users with a single telephone or fax number, or with a
single web or email address.
provider, the Service Level Management process is the main point of contact between the paying customer and the provider.
Level Agreements - which will contain statements about hours of availability, time to resolve issues, response times and so on. The importance of this to the Service Desk is that they must be aware of what Service Level Agreements are in place and how these relate to the questions, complaints and issues that users may raise.
"friend" within the IT department. This particularly relates to the role of the
Service Desk in:
It may well be, for example, that a user calls in complaining of a 2-second transaction
response time - when in fact the Service Level
Chasing any experts that have been assigned responsibility for resolving an issue. Keeping an eye on any Service Level Agreements that may specify maximum acceptable response times for resolving user
issues.
Such an incident would be given a much lower priority than had the figures been reversed.
So, the general point is that the Service Level Agreements provide the link between the
Customer, User and Service Level
Management and that the Service Desk has a responsibility to act on behalf of the User
within the IT infrastructure.
As we have already seen, the idea of the Service Desk as a single point of contact is an important one in ITIL.
It is not uncommon, for example, for the Service Desk to publish regular electronic
newsletters to the user community informing them of new facilities, changes to services and
so on.
Desk should have the following ingredients: Well trained staff with good interpersonal skills.
organisation-wide standards and consistency. Also, lessons learned in one area may not be passed on to the others.
Well organised systems and processes for recording and tracking incidents and matching against previous incidents and solutions.
Appropriate technology, such as automatic call distribution equipment and knowledge-based systems that assist in identifying solutions to
issues.
Such difficulties can be minimised by the use of centralised logging of incidents and resolutions and by establishing a central configuration management database that is accessible by all the local service desks.
The big advantage of this approach, local knowledge, will obviously become more important the more geographically and functionally dispersed the organisation's sites
Enough technical competence to address users' issues directly or to interface with technical experts if necessary.
In addition, the Service Desk must have all the
Level Management so that potential breaches of Service Level Agreements can be recognised. Configuration Management records will need to be readily accessible so that, for example, a caller's IT equipment can be easily identified. Conversely, the Availability Management process will be keen to look at Service Desk records of incidents for conducting their own analyses and as part of their role in improving service availability.
S2AP15 - Service Desk Structure
There are dangers, however, in that a perceived loss of local knowledge may tempt local sites to set up their own super-users or unofficial help desks. Another major issue with this centralised
approach is the cost of voice and data
communications.
planning will be needed, otherwise long distance telephone calls could easily drive up the cost of providing the service to
unacceptable levels.
The Virtual Service Desk is based on the
A debate that often takes place in the early stages of implementing a service desk is how
the desk should be structured, from a
geographical perspective. There are a number of strategies that will usually be considered.
and that whilst the Service Desk may be perceived as a centralised point, it may
actually consist of several local service desks.
reality their calls may be automatically routed to the most appropriate desk, based on the proximity, time of day, staffing or whatever criteria apply.
Although there are some complexities with this approach, it clearly has many advantages and is becoming a very common arrangement for multi-national organisations offering 24
This option is obviously much more demanding on the use of technology, particularly
telephony re-routing equipment, in order to ensure that the whole process appears
transparent to the end user.
So a typical "follow the sun" strategy might consist of a service desk in Australia, operating between the hours of 6am and 6pm local time, and a second desk in London operating the
same hours local time there.
Machine generated communications could come from some form of system monitoring tool. For example, the loss of a particular
communications link in a network would
as Operational Events.
The aim is to provide as close to 24 hour coverage as possible for users in each hemisphere with the European service desk coming on line just as the Australian one is closing down for the night - and vice versa.
about possible issues caused by the fault or take action to repair it.
So when a service desk is established, the different inputs that will be encountered must be anticipated and catered for.
So, people in Europe requiring support during the night will have their calls automatically re-routed to Australia.
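As a rough illustration of this overnight re-routing, here is a minimal sketch of how a call might be directed to whichever desk is open. The desk names, the fixed UTC offsets (which ignore daylight saving) and the function names are all assumptions made for this example; real telephony re-routing equipment will work differently.

```python
# Hypothetical desk table. The UTC offsets are assumptions for illustration
# and deliberately ignore daylight saving time.
DESKS = {
    "Australia": 10,  # assumed UTC+10
    "London": 0,      # assumed UTC+0
}
OPEN_HOUR, CLOSE_HOUR = 6, 18  # 6am to 6pm local time, as in the example


def open_desks(utc_hour):
    """Return the desks whose local time falls within opening hours."""
    result = []
    for name, offset in DESKS.items():
        local_hour = (utc_hour + offset) % 24
        if OPEN_HOUR <= local_hour < CLOSE_HOUR:
            result.append(name)
    return result


def route_call(caller_region, utc_hour):
    """Prefer the caller's local desk; otherwise re-route to any open desk."""
    available = open_desks(utc_hour)
    if caller_region in available:
        return caller_region
    return available[0] if available else None
```

With these assumed offsets, a London caller at 22:00 UTC is re-routed to Australia, while a call at 19:00 UTC finds neither desk open. That gap is why the coverage is described as being as close to 24 hours as possible, and why more than two desks may be needed.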
A major advantage of this approach is that the local desk will tend to be handling local calls during the period of peak demand - so that overnight re-routing, and hence long-distance traffic, should be relatively minimal - but it's
there if needed.
Clearly, some of these inputs allow potential for some form of automated response. If something comes in via e-mail then at least an acknowledgement of receipt can be generated
automatically.
It may even be possible to introduce a degree of self-service where users register and track
their own incidents without the need for inter
Of course, "follow the sun" may well be more than two service desks, depending on the
location of users, time differences and
coverage required.
To make this work effectively it is imperative that information about incidents is replicated or
shared between the different sites so that the
Be careful with this one though. It can all too easily be used as an excuse for the service desk not playing its role in monitoring and processing incidents on behalf of the user as
the user's friend.
European Desk, for example, can continue to support a user with a query that may have
been raised with the Australian Desk a few hours earlier.
Also, be careful with telephone calls. If they are not handled properly it is possible that the user will hang up in frustration and not re-dial.
Hence the information that would have been
Very explicit parameters need to be established to govern hierarchical escalation; otherwise it is very easy for it to become the norm, rather than the exception, which would clearly be unacceptable.
S2AP21 - Service Desk Capability
Related to the escalation procedures is the general debate about how skilled and capable of resolving the issues the service desk staff
should be.
had been dropped, which in turn will be used as a key measure of service desk performance.
Lost calls of this kind are often referred to as
"fugitives". There's a fault out there that cannot be investigated because it hasn't been recorded - and although the user could have been more persistent, the fault is with the service desk staff and/or their technology for not making it easier for them to report the
incident.
S2AP20 - Escalation
ITIL does not make any recommendations in this respect because there is no absolute answer - every case must be considered on its
merits.
Factors that are normally considered include the increased costs of employing more highly skilled staff against the improved service to the end-users that will almost certainly result.
Escalation Management is an important part of running an effective service desk. Escalation is the process of moving an incident or query to the point where it is most ably
resolved.
Also this may be a dynamic situation with the optimum skill level changing over time. Immediately following the introduction of a new service, for example, it may be desirable to have some experts available on the service
desk to handle the initial rush of calls about the
new system.
Here for example, in a generic rather than just ICT service desk, calls that cannot be directly handled by the service desk will be directed to experts in the relevant functional area. The percentage of calls that get passed upwards will be determined by the skill levels and training of the service desk staff.
Once things have bedded down it may be possible to re-locate them to more productive
areas.
So at one end of the scale we may have an unskilled service desk, merely logging and routing calls - and at the other end would be an expert desk capable of handling most, if not all, the conceivable issues at the first point of
call.
So functional escalation is the handing over of responsibility to a functionally more competent area, in order to tackle a particular issue.
Hierarchical escalation is where issues are
In between these would be what is often called the skilled or semi-skilled service desk - and
passed up the management chain - either because they are very serious or need higher level authority to sanction the resources needed to provide a solution.
The first level of hierarchical escalation would
normally be to the service desk manager, who is usually the owner of the incident
management process.
referral.
More serious issues may then go up to the
problem management team, with a remit to call together the necessary specialists to resolve the incident as quickly as possible.
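To make the difference between the two types of escalation concrete, here is a minimal sketch. The `Incident` fields and the `escalation_route` function are hypothetical names invented for this example; ITIL does not prescribe any particular data model or decision rule.

```python
from dataclasses import dataclass


@dataclass
class Incident:
    summary: str
    desk_can_resolve: bool          # hypothetical: within service desk skills
    serious: bool = False           # hypothetical: a very serious issue
    needs_authority: bool = False   # hypothetical: needs higher-level sign-off


def escalation_route(incident):
    """Return the kinds of escalation that apply to an incident."""
    routes = []
    if not incident.desk_can_resolve:
        # Functional: hand over to a functionally more competent area.
        routes.append("functional")
    if incident.serious or incident.needs_authority:
        # Hierarchical: pass up the management chain.
        routes.append("hierarchical")
    return routes
```

Note that the two types are not mutually exclusive in this sketch: a serious fault that the desk cannot resolve is escalated both functionally and hierarchically.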
vary considerably
Whatever skill level is adopted, the use of diagnostic scripts will increase the rate of
resolution
so on.
Examples of telephony technology might be Automatic Call Distribution systems, which ensure that a bank of service desk operators is used in an optimal order and that work is smoothed out as evenly as possible.
Conference call facilities can be useful in
Service Level Agreements must also be accessible so that work can be prioritised depending on the SLA clauses.
Regardless of the technical skills that are put in place on the Service Desk, all operators
must have certain basic attributes to make
Computer-Telephony Integration can achieve major gains in efficiency. An example of this would be the identification of an incoming caller based on their telephone number and the linkage of this with a configuration
management database.
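The caller identification described here might look something like the following sketch. The in-memory `CMDB` dictionary, its sample record and the `identify_caller` function are assumptions made purely for illustration; a real configuration management database and telephony integration would be far richer.

```python
# Hypothetical configuration management database keyed by telephone number.
CMDB = {
    "+44 20 7946 0001": {
        "user": "A. Example",
        "equipment": ["PC-0042", "PRN-0007"],
    },
}


def identify_caller(phone_number, cmdb=CMDB):
    """Return the caller's configuration record, or None if unknown."""
    return cmdb.get(phone_number)
```

When a record is found, the operator can see the caller's equipment before the conversation starts. When nothing is found, the operator falls back to asking the caller for details.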
An articulate nature - in particular the ability to translate technical information into something
that is meaningful to the business user. This
Useful software technology would include Intelligent Knowledge-based systems that record incidents, learn from them, identify patterns over time and are able to suggest
probable causes and solutions.
A methodical approach to questioning and the recording of facts - and the ability to maintain that approach when under severe pressure or when handling a difficult customer.
In addition, database access would provide fast identification of known errors, problems or any information that would help to provide a
better answer to a call.
A good business perspective and understanding of what are the business critical services. This business culture is often helped by recruiting service desk staff from within the
business itself.
issue to a pre-determined list of second-line support staff, perhaps after a certain period of
time.
And finally - multi-lingual capability is becoming an increasingly important attribute for some service desk staff. This is particularly
true in the case of the virtual service desk, as discussed earlier, or in multi-national organisations.
S2AP23 - Activity
S2AP24 - Service Desk Technology
introducing such technology must be carefully weighed against the benefits that they bring in terms of service improvements and operational efficiency.
S2AP25 - Benefits & Problems
For the service desk to work effectively, some investment in modern technology will be
needed.
The benefits of and potential difficulties with Service Desk are listed on page 14 of the little
ITIL book and in section 4.1.8 of the Service
Relevant technology can be categorised into two types, telephony and software.
Support manual. They are also summarised here for your convenience.
S2AP26 - Summary
Finally we have seen some of the new technology that can be employed to improve the efficiency of operation of the service desk.
Session 2B Incident Management
S2BP1 - Objectives
priority for Incident Management is recovery of service as quickly and painlessly as possible.
Problem Management is more about identifying the underlying cause of faults and finding ways of engineering out these faults in the longer term.
This can of course lead to some conflict
Incident Management
Their colleagues in Problem Management, on the other hand, would like to have the system
down for longer so that they can conduct
As we mentioned in the previous session, the Service Desk often plays a key role in Incident Management: recording incidents, monitoring their progress and retaining ownership on behalf of the user as long as the incident is still "open". It is considered good practice to record all enquiries as incidents because they are often evidence of poor quality training and/or inadequate documentation.
Alternatively, system monitoring tools may have alerted technical specialists who would
rectify the problem, but again with no central recording or control.
It may be that following the initial logging, a distinction is made between simple queries
and an incident that relates to a failure or
This approach led to poor use of expensive resources - the IT experts - and to a failure to learn lessons from previous incidents. ITIL Best Practice processes aim to resolve both of
these issues.
degradation of a system.
A request for a new product or service is usually regarded as a Request for Change
rather than an Incident.
One of the main goals of Incident Management is to restore normal service as quickly as possible, with a minimum of disruption to the
business.
However, because the processes are essentially similar, many organisations include Requests for Change within the scope of Incident Management.
Automatically registered events, such as the
failure of a disk drive or a network connection, are often regarded as part of normal operations. They are still included in the
simultaneously.
It is important to distinguish between Incident Management and Problem Management which is the subject of the next session.
definition of Incidents though - albeit that the service to end-users may never be affected.
It is very important to understand the process that an incident goes through from its initial detection right through to its point of closure. The first step is the detection and recording of the incident. It is vital that every incident is logged with a unique ID reference - even if we know that the problem has already been reported and a fix is being produced.
If this total process is taking too long then hierarchical escalation procedures may end up being used, as we discussed in the previous
session.
Resolution and Recovery may involve raising a "Request for Change" and getting that change
implemented.
Apart from the basic details about the incident, the log will normally include details of how the incident was reported and the services and Configuration Items that are affected.
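As a rough illustration of the log entry described here, the following Python sketch shows one way an incident record might carry a unique ID, the reporting channel, the affected services and Configuration Items, and a status that is updated as the incident moves through its life-cycle. The class, field names and the `INC-` numbering scheme are assumptions made for this example; they are not prescribed by ITIL or by this course.

```python
from dataclasses import dataclass, field
from enum import Enum
from itertools import count


class Stage(Enum):
    """Life-cycle stages, paraphrased from this session's narration."""
    DETECTION_AND_RECORDING = "detection and recording"
    CLASSIFICATION_AND_INITIAL_SUPPORT = "classification and initial support"
    INVESTIGATION_AND_DIAGNOSIS = "investigation and diagnosis"
    RESOLUTION_AND_RECOVERY = "resolution and recovery"
    CLOSURE = "closure"


_sequence = count(1)  # hypothetical ID source; a real tool would use its database


def _next_id():
    return f"INC-{next(_sequence):06d}"


@dataclass
class Incident:
    summary: str
    reported_via: str                 # how the incident was reported
    affected_services: list
    affected_cis: list                # Configuration Items affected
    incident_id: str = field(default_factory=_next_id)
    stage: Stage = Stage.DETECTION_AND_RECORDING
    history: list = field(default_factory=list)

    def move_to(self, new_stage, note=""):
        """Record every status change so the incident can be tracked."""
        self.history.append((self.stage, new_stage, note))
        self.stage = new_stage
```

Two reports of the same fault still produce two records with different IDs, which matches the advice to log every incident even when a fix is already under way. The sketch deliberately allows a move to any stage, because incidents can be routed back and forth between support groups.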
Incidents can also be classified into different
Recovery itself may entail the business in further actions, such as re-entering or verifying data. For example, if a disk has crashed, the problem may have been resolved by replacing the disk drive, based on an official request for change. But the service has not been recovered until the data is brought up to date from the backup or archive copies.
Incident Closure should involve some
Also included in this part of the process will be the matching of the details against previously reported incidents to check for known errors, and then assigning a priority to the incident.
It is quite likely, for example that an initial report of a printer problem was classified as a hardware fault - but subsequent analysis determined that the fault was actually with the software. It is important that such corrections
are made to the incident classifications so that an accurate record is maintained.
Initial Support may involve the application of a work-around, some sort of temporary solution that we know about from the existing problem or incident database. Alternatively, a work around may come from the expertise of the
Service Desk staff - in which case it should be recorded for future use.
It is possible for an incident to be closed whilst the underlying problem is still under investigation. This would be true where a work around is available, for example. Some organisations have an extra category which is "Incident Closed and Underlying Cause Resolved", which they don't use until the final resolution of the underlying problem.
S2AP7 - Incident Lifecycle
In
incident cannot be
immediately resolved at the Service Desk, one of the vital jobs at this point of the life-cycle is to identify the correct second-line support group to whom the incident should be functionally escalated.
S2BP6 - Incident Lifecycle
Investigation and Diagnosis may result in a direct resolution or the incident being routed to the identified second line support.
This shuttling backwards and forwards of an incident between different support groups is one of the major issues for Incident Management.
Whilst all this is going on there are the issues of ownership, monitoring, tracking and
communication to be maintained.
Additionally, there will be constant updating of the status of the incident as it moves through
the various points of its life-cycle. All of these are proactive activities carried out by the incident management staff - which is usually the Service Desk, acting on the users
Whilst all this is going on, the Configuration Management Database should be being updated with information about the incident,
any problems and their links to incidents, about
any "known errors" and their links to problems, and about requests for change and their links
to known errors.
Because the request was raised as an incident, however, it will eventually have to be
So an integrated Configuration Management Database not only contains configuration item information but also related support records,
such as incidents, problems, known errors,
requests for change, and release records.
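To make the idea of an integrated database more concrete, here is a minimal Python sketch of a store that holds configuration items alongside support records and the links between them. The class name `CMDB`, its methods and the record kinds are hypothetical names chosen for this illustration; a real CMDB product will have its own schema.

```python
from collections import defaultdict

# Record kinds named in the text; the identifiers themselves are assumptions.
RECORD_KINDS = {"incident", "problem", "known_error", "rfc", "release"}


class CMDB:
    def __init__(self):
        self.cis = {}                    # ci_id -> description
        self.records = {}                # record_id -> (kind, summary)
        self.links = defaultdict(set)    # item id -> ids linked to it

    def add_ci(self, ci_id, description):
        self.cis[ci_id] = description

    def add_record(self, record_id, kind, summary, linked_to=()):
        if kind not in RECORD_KINDS:
            raise ValueError(f"unknown record kind: {kind}")
        self.records[record_id] = (kind, summary)
        for other in linked_to:
            self.link(record_id, other)

    def link(self, a, b):
        """Link two items in both directions so either can find the other."""
        for item in (a, b):
            if item not in self.cis and item not in self.records:
                raise KeyError(item)
        self.links[a].add(b)
        self.links[b].add(a)

    def related(self, item_id, kind=None):
        """Return linked support records, optionally filtered by kind."""
        found = []
        for other in sorted(self.links.get(item_id, ())):
            if other in self.records:
                if kind is None or self.records[other][0] == kind:
                    found.append(other)
        return found
```

With records linked in this way, a search on one configuration item can return its incidents, and a search on an incident can return the problem raised from it.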
The absence of a Configuration Management Database will make it very difficult to harmonise separate incident recording, problem recording, and change recording
systems.
Desk or direct to the incident
Incidents can spawn problems if they are recurring incidents, or if the Service Desk or second or third-line support cannot ascertain the underlying cause. Some problems will justify the generation of a "known error", this being an admission or statement that we are aware of the problem
and we have a resolution to it.
Priority is determined mainly by the impact and the urgency of the incident or enquiry.
In other cases, it may well be that a work around is an adequate solution - at both the incident and problem levels.
A good example of this might be ahead of a major infrastructure change, where making significant changes now would not be
worthwhile.
However, other things can also come into play. Pragmatically, resource availability will also have a bearing. So if nobody with the right skills to solve the fault is immediately available
it may have to be put down the list a little.
If a "known error" is generated then in most cases this will lead to a Request for Change in order for the underlying fault to be corrected. Unless, as we have said, there are good reasons why we should just live with the problem for the time being, because the cost of a short-term fix is not justified.
Another factor affecting priority may be the existence of a specific statement in a Service Level Agreement that is threatened by the
incident.
users affected or financial loss for example. So it is important to work very closely with the
The benefits of and potential difficulties with Incident Management are listed on page 18 of
the little ITIL book and in section 5.4 of the
Service Support manual. They are summarised here for your convenience.
S2BP14 - Summary
also
considered less urgent than the same fault occurring on the 20th. These two factors together dominate the ITIL model for determining priority. So a high
In this session we have been examining Chapter 5 of the Service Support Manual, Incident Management.
urgency does not always mean a high priority - if the impact is considered to be relatively low. For something to be high priority both the impact and urgency must be high.
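The rule just stated can be sketched in a few lines of Python. The only part taken from the text is that a high priority needs both a high impact and a high urgency; the three-level scale and the treatment of the remaining combinations are assumptions made for this example. Resource availability and Service Level Agreement commitments, which also influence priority, are not modelled here.

```python
LEVELS = ("low", "medium", "high")  # assumed three-level scale


def priority(impact, urgency):
    """Combine impact and urgency into a priority.

    Only the 'high needs both' rule comes from the course text;
    the handling of other combinations is illustrative.
    """
    if impact not in LEVELS or urgency not in LEVELS:
        raise ValueError("impact and urgency must be low, medium or high")
    if impact == "high" and urgency == "high":
        return "high"
    if impact == "low" and urgency == "low":
        return "low"
    return "medium"
```

For example, `priority("low", "high")` does not return a high priority, which reflects the point that high urgency alone is not enough.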
S2BP11 - Priorities
We have seen how Incident Management is defined, the scope of Incident Management
and the differences between Incident
Management and Problem Management, which is the subject of the next session.
As we have already mentioned, Service Level Agreements can also influence priority.
We have followed the main stages through which an Incident passes during its lifecycle and looked at the records that must be kept and the need for an integrated Configuration Management Database.
We have also examined the different factors
On the other hand Incident B occurs on a different service and this is the second incident
that must be considered in determining the priority of different incidents, which may be competing for limited resources.
In these circumstances - all other things being equal - it would be reasonable to give Incident A a higher priority.
The resources available are also likely to affect the priority given to an incident. Although if both the impact and urgency are high then it is likely resources will just have to be made
available from whatever sources.
Where there are a number of medium priority incidents to resolve then clearly the ones that have suitable resources immediately available
will be tackled first.
Note that when a major incident occurs - in other words one with a high impact, urgency and SLA threat - Problem Management staff must be informed so that they can provide
extra support to the Service Desk team.
S2BP12 - Activity
Session
2C
Problem
Management
S2CP1 - Objectives
Proactive response adopts a forward-looking approach, trying to prevent issues occurring by providing intelligent analysis of problem trends and statistics. Problem Management staff may even get involved in making decisions about purchasing and IT provision.
As the term suggests a proactive response is an ongoing and methodical process. The
intention is to minimise occurrences of
The 'reactive' requirement of problem management is to resolve Problems quickly, effectively and permanently. It should identify the underlying problems, which are causing
related incidents, and find an immediate
workaround.
It goes on to define the goal of Problem Management, and that is to minimise the
adverse effect on the business of incidents and
satisfactory resolution found to that problem, then the change will normally be implemented through change management procedures. Whether problem management acts reactively
or proactively, it is important that resources to deal with them are prioritised on a 'business
needs' basis.
problems caused by errors in the infrastructure, and to proactively prevent the occurrence of incidents, problems and errors.
causes of problems. It also helps minimise the effects as well as preventing potential problems occurring in the future, thereby attempting to minimise underlying problems
and their causes.
'prioritising in pain factor order'. The pain factor relates to the number of people affected by incidents, and the related problem, and the
seriousness of the impact on the business.
S2CP4 - Activities
Problem Management processes are usually carried out by teams of technically focused specialists who work closely with Service Desk and Incident Management staff, and with other internal and external suppliers.
S2CP3 - Activities
As is common to other ITIL processes, the communication of management information between IT Service Management roles is very important. This information is used both internally, within the problem management
team itself, and distributed to other IT Service
such
as
Availability
For example, if IT users were encountering lots of problems caused by poor quality software delivered and supported by a third party supplier, then information gained from Problem Management would be very useful to the Contract Management team. They could use this to help the suppliers make improvements, or in evaluation or analysis of the software or supplied service. In some instances they could
also revoke the contract.
providing support to the organisation. Typically 80% of incidents are caused by 20% of the IT infrastructure components. This Configuration item information can prove useful when attempting to identify the underlying cause of incidents. The provision of management information from problem data to Availability Management for example, can provide vital information on expected levels of availability, and as a consequence, influence statements made about availability in Service Level Agreements.
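The 80/20 observation lends itself to a simple trend analysis. The Python sketch below counts incidents per configuration item and returns the smallest set of items that accounts for a chosen share of all incidents. The function name, its parameters and the idea of feeding it a plain list of CI identifiers are assumptions made for illustration only.

```python
from collections import Counter


def top_contributors(incident_cis, share_percent=80):
    """Return the CIs that together account for share_percent of incidents.

    incident_cis holds one CI identifier per logged incident.
    """
    counts = Counter(incident_cis)
    total = sum(counts.values())
    selected, running = [], 0
    for ci, n in counts.most_common():
        # Stop once the chosen share of incidents is already covered.
        if running * 100 >= share_percent * total:
            break
        selected.append(ci)
        running += n
    return selected
```

The items returned are natural candidates for closer investigation by Problem Management, and the counts behind them could be passed on to Availability Management.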
S2CP5 - Activity
S2CP6 - Activities
organisation from reacting to large numbers of incidents to preventing future Incidents, you provide a better overall service to your
customers and make better use of the IT
support organisation resources. Finally, conducting Major Problem Reviews. These reviews take place after a problem causing a major incident, or multiple related incidents, has been successfully resolved. It's the responsibility of the Problem Management process to review and identify the problem, and to prevent it reoccurring in the future. Additionally, information from these reviews can identify weaknesses in problem management and incident management processes. These review procedures form part of a 'Service Improvement Programme', a key task for any ITIL conformant organisation which aims to improve value and quality.
S2CP8 - Definitions
Proactive prevention of problems
Providing management information from problem data
Conducting major problem reviews
Problem Control focuses on transforming Problems into Known Errors. It does this by identifying the root cause of the problem and providing a temporary workaround where possible. This process redefines a Problem as
a Known Error.
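The transition from Problem to Known Error can be pictured as a record whose status changes once its root cause is identified. This Python sketch is only an illustration of that idea; the class, field names and status strings are hypothetical, and the workaround is treated as optional because the text says it is provided "where possible".

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class ProblemRecord:
    problem_id: str
    description: str
    root_cause: Optional[str] = None   # unknown until diagnosed
    workaround: Optional[str] = None   # temporary fix, where one exists

    @property
    def status(self):
        """A problem is redefined as a known error once its cause is known."""
        return "known_error" if self.root_cause else "problem"

    def diagnose(self, root_cause, workaround=None):
        self.root_cause = root_cause
        self.workaround = workaround
```

Once a record reports a status of known error, it would be handed on to Error Control for resolution under Change Management.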
Error Control focuses on resolving Known Errors under the control of the Change Management Process. The objective of Error
Control is to be aware of errors, to monitor them, and to eliminate them when feasible and
financially justifiable.
Error Control has become a common process in both the applications development,
enhancement and maintenance environment
So let's look at some problem management definitions in more detail. Firstly, the definition of a problem, which is 'The unknown underlying cause of one or more incidents'.
New Problem identification occurs when we
as well as the live environment; Normally a service and its configuration items are
introduced to the live environment with some
are unable to find a match amongst the definitions of existing problems, or existing
Known Error records. A Problem Record is then raised. One of the most effective Problem
when related incidents are reported in the live environment they can easily be identified.
S2CP7 - Activities
These
Multiple related
incidents
are
of
Proactive Prevention of Problems, and Providing Management Information from Problem Data includes techniques such as trend analysis, targeting support action, and
particular concern to Service Managers, as they can threaten reliability clauses within
Service Level Agreements or Contracts. For
breaks in service provision, and the duration of these breaks will be no greater than two
minutes.
So any train of events causing us to approach these parameters is a major concern. Hence
Management structure, by providing early Identification of problems, and communicating this information to relevant management
areas.
to the creation of a team of problem solvers mainly drawn from network specialists. We will discuss this classification process in more
detail later in the course.
S2CP9 - Activity
S2CP10 - Problem Control Processes
because they form an iterative process. Initial investigation results in initial diagnosis, which leads to further investigation and so on.
Identification
These two stages are complex, and require a good technical knowledge, supported by
problem solving and diagnostic skills. ITIL
Recording
Classification
Investigation Diagnosis
Review & Closure
recommends, amongst others, two techniques to help this process. These are Kepner and
Tregoe analysis and Ishikawa fishbone diagrams. Both are important mechanisms, which allow those working in Problem Management to use a structured approach to problem diagnosis.
Problems
can
the resolution of an error. Once a Known Error has been identified then it is handed to Error
sources. An incident might be completely new and have no matching characteristics with records in either existing Problem or Known Error databases. It may also be a reoccurring incident, which has already been identified. Or it might come about as a result of Problem Management's proactive work, where a trend has been identified and a problem identified as
a result.
Control. Although Error Control remains part of the Problem Management Process Set, any resolution is likely to require some level of agreed change, hence the responsibility for the resolution will transfer to Change
Management.
Recording
Once a problem has been identified, a record is created with a unique identifier, and a link is generated to any associated records, such as the incidents that caused it, and also to any Known Errors to which it might relate. It's likely that the problem will pass through the change process, and at this point it will be linked to requests for change. Throughout this process records will also be linked to related configuration items, within the configuration management database.
However, for particular types of problems, there are occasions when Change Management may devolve authority to the Problem Management team. Importantly,
Problem Management must still raise the necessary change records in order to do this.
S2CP12 - Problem Control Processes
Review and Closure
On resolution of every major Problem, Problem Management should complete a major problem review. The appropriate people
involved in the resolution should be called to the review to determine:
And finally how can we prevent the Problem from happening again?
Problem closure is the last of the Problem Control Activities and is often carried out
automatically when a resolution to a Known Error is implemented. However we should point out that an interim closure status can exist. For example, when a Known Error has been identified and a solution put in place, a status of 'Closed pending Post Implementation Review' could be assigned to it in either the
Incident, Known Error or Problem records.
A problem's classification may well change as a consequence of the diagnosis activity. This first classification of a problem is described as the 'initial classification'. For example, what at first appeared to be a problem with a network might actually be the result of a database problem. The problem is then reclassified.
However, it is usual to retain both the initial and final classifications, so that resource
'Closed pending PIR' allows us to confirm the effectiveness of the solution prior to final
closure.
problem management works reactively to identify problems, by checking knowledge bases for records of problems, Known Errors, changes and so on.
than a telephone call to the user to ensure that they are now content. For more serious
Problems or Known Errors, a formal review
may be required.
S2CP13 - Activity
S2CP14 - Problem Classification
A proactive activity involves the analysis of past incidents, and the IT infrastructure as a whole. For example, analysis might identify that a pre-existing problem at one site, might
reoccur at another site, which has a similar
server, hardware and software configuration. Also involved is the broader analysis of the IT
infrastructure itself. The examination of over
effort required to detect and recover the failing Configuration Item has to be determined. It is also important to be aware of the impact of the Problem on existing service levels. This process is known as 'classification'.
One of the main reasons for problem classification is to ensure that any group of specialists that we bring together to solve a problem is the most appropriate. If a problem is generated by the local area network, then it's important that we assemble LAN and desktop specialists.
complex relationships, or single points of failure, can identify any vulnerable points that are a potential threat to a business.
This analysis might indicate that a particular network route is more heavily used than expected, and as a consequence is a potential
future risk.
Problem classification is also used to prioritise the sequence in which problems are addressed. If we are experiencing a large
number of incidents related to several different
Often this work is carried out in conjunction with Availability Management staff, and involves careful analysis of paths through the component infrastructure that make up the various services. For example, a customer using on-line banking to read their balance may involve hundreds of different paths.
areas of the business then priority must be assigned appropriately. Every incident, problem or change will have both an impact on the business services and urgency. Impact describes how vulnerable the business might be. For example life threatening or
merely a small inconvenience.
Another element of proactive problem management involves working with third party suppliers, and our own internal staff, to ensure all procedures are adequate, for example testing procedures, release procedures and so on. Internal staff can be encouraged to take part in system reviews during development, ensuring a higher level of maintainability is designed into the system.
problems and Known Errors in the Organisation.
S2CP16 - Error Control
Error Control consists of four defined
informed immediately.
The incident process moves on to: Increase by one the incident count on the
known error record.
Error identification and recording only comes about when a root cause and, if possible, a
temporary workaround has been found.
An incident might have been initially identified as a network error, but recognised in the
Known error database as a database related
error.
Error assessment involves deciding on how to resolve the error and, if this is valid, raising a
request for change to achieve this.
The next process is to extract any permanent resolution or circumvention knowledge from the known error database. If a permanent
resolution exists, then the Service Desk can
Recording Error Resolutions documents that the problem has 'actually' been resolved. Here Problem Management works closely with Change Management and Release Management process teams, and the end-user.
The third incident example has no match in the Known Error database. However, as it's a pre-existing Problem it does have a match in the
problem database. In this case the incident
then follows a similar route to our Known Error
And finally Error Closure. Closure only occurs when the relevant change has led to the business finding a satisfactory resolution to the underlying errors, problems and related
incidents.
example.
All four of these processes are classified as reactive. Error Control also has a proactive element. This proactive activity includes analysing and maintaining the Known Error Knowledge base, in order to provide support to the Service Desk, and identifying underlying
trends in Known Errors.
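The incident matching route followed by the three example incidents in this session can be sketched as a small Python function. Matching on an exact symptom string, the dictionary layout and the outcome labels are all simplifying assumptions for this illustration; the course describes the route, not an implementation. Incrementing the count on a matching problem record is assumed by analogy with the Known Error example.

```python
def match_incident(incident, known_errors, problems):
    """Route an incident and return (outcome, advice for the Service Desk)."""
    if incident.get("routine"):
        # Routine incidents leave the model at the routine procedures level.
        return "routine_procedure", None

    key = incident["symptom"]  # assumed matching key

    if key in known_errors:
        record = known_errors[key]
        record["incident_count"] += 1  # increase the count by one
        # Prefer a permanent resolution; fall back to a circumvention.
        return "known_error", record.get("resolution") or record.get("workaround")

    if key in problems:
        record = problems[key]
        record["incident_count"] += 1
        return "existing_problem", record.get("workaround")

    # No match anywhere, so a new Problem Record is raised.
    problems[key] = {"incident_count": 1, "workaround": None}
    return "new_problem", None
```

Keeping the incident count up to date on each record is what later allows Problem Management to spot trends in Known Errors.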
S2CP18 - Activity
S2CP19 - Benefits & Problems
S2CP17 - Incident Matching
Assisting Incident Management is a fundamental responsibility of Problem Management. To identify incidents, and to assign actions to them, information management moves it through an Incident matching process model. Let's look at some example incidents and follow their path through the model. The first example is defined as a routine
incident, and exits the model at the routine
The benefits of and potential difficulties with Problem Management are listed on page 22 of
the little ITIL book and in section 6.4 of the
Service Support manual. They are summarised here for your convenience.
also
S2CP20 - Summary
In this session we have been examining Chapter 6 of the Service Support Manual, Problem Management.
We have examined in detail the standard set of
procedures level.
The second example is defined as a non-routine incident, in other words, one which isn't
identification processes.
processes, and to outline the benefits, and some possible drawbacks, of Problem Management implementation.
Session 3A -
To account for all IT assets and
Configuration Management
S3AP1 - Objectives
In this session we will be examining the first of the three ITIL control Processes, Configuration Management, which is described in Chapter 7
of the Service Support book Infrastructure Library.
In this session we will:
support all other service management processes.
of the
IT
Examine the relationship between Configuration Management and the Service Delivery and Service Support functions
Define a Configuration Item in ITIL terms
To verify Configuration records against the infrastructure and correct any exceptions.
S3AP3 - Relationships
So let's start by looking at how Configuration Management relates to Service Delivery and
Service Support as a whole.
Look at the Configuration Management Database, and the type of information and
records it contains
Describe the five Configuration Management sub-processes: Planning, Identification, Control, Status Accounting and Verification.
S3AP2 - Control Processes
several Support and Delivery processes, which amongst other things, enable Service Level
Management to negotiate and comply with SLAs. This whole support structure is underpinned by the configuration management
process.
Changes to the IT services are executed quickly and with the minimum of business risk
ITIL guidance is explicit on this point and states that 'without effective configuration management we are not likely to effectively implement the other ITIL processes, and this will lead us to a failure to deliver a quality
service.'
An integrated set of data exists, recording details about services, their ICT components and any related support records.
ITIL guidance considers this process as the foundation on which a stable organisation is built. In any organisation, knowing what assets we have and their current status is fundamental to business stability. After all, how can we build something without knowing what we are building on, and what we have to build with?
broadens the scope of Configuration Management. Most organisations have some sort of asset management system in place, where they know the cost of equipment, where it was purchased, and its current status. Such systems will only cover hardware and bought-in software.
This is how ITIL defines the four major Configuration Management goals:
Existing systems are unlikely to cover the 'relationships' or linkages between these assets. This linkage is very important: making changes to one can have a knock-on effect on
several CIs. For example a service for the personnel dept might consist of hardware, software and related documentation, all of which are individual configuration items. These items together can provide a service, and the service itself can also be defined as a configuration item.
Because configuration management's remit is wider than pure asset management, we tend to refer to the information that Configuration Management maintains as Configuration Items or CIs, rather than IT assets.
S3AP4 - Activity
S3AP5 - CMDB
ITIL suggests that we should be able to draw a map of how a service is assembled from its constituent components. This graphical representation can help us understand the impact of any changes we make to a CI on the service as a whole.
We have established that Configuration Management underpins all the Delivery and Support Processes, and it defines IT assets and services as Configuration Items.
We've also established that it monitors the
The CMDB is also the ideal place to hold incident records, problem records and known error records if they are held on separate systems. ITIL guidance suggests linking these
databases, so that we can associate a record
store, manage and update this information? It does this by entering all this information into a Configuration Management database or
CMDB.
with any related configuration items. By doing so, future searches on a particular CI will return information relating to outstanding incident, problem or known error records.
In the change and release section of the CMDB, we may hold requests for change, change records and so on. This information is used for tracking the progress of change and
release records. A release record will contain
which make up a new release, and will describe how to achieve a change defined in the change records.
A CMDB can offer great benefits to an organisation. However the benefits might not be immediately obvious to senior management, who might suggest that a simple asset management system would be sufficient. However, asset management only addresses higher value issues in the infrastructure and
doesn't examine it to the same level of detail.
Details about Peopleware, including
Perhaps more importantly, asset management systems will not contain the linkages to incident, problem, or known errors, or to change and release management records. Nor will it document the relationships between CIs
and asset records.
S3AP7 - CMDB
information including
related
to
S3AP6 - CMDB
Configuration Item as 'any component of an IT Infrastructure, including a documentary item such as a Service Level Agreement or
Request for Change, which is, or is to be,
S3AP9 - Activity
S3AP10 - Planning
level configuration item records. For example hardware type might be made up of
workstations, servers, network equipment and so on.
ITIL suggests that Configuration Management is made up of five sub-processes. These are: Planning, Identification, Control, Status Accounting and Verification.
Whatever the CI type, it will require a unique form of identification. Firstly, a unique identifier, which should comply with a predefined configuration policy. Also an ID type,
which categorises the item into hardware, software, peopleware and so on. Other common CI attributes might include a manufacturer's or developer's id, its location, purchase date and so on.
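As a hedged illustration of these attributes, the Python sketch below defines a configuration item record and checks its identifier against a naming policy before it is registered. The identifier pattern, the inclusion of a "documentation" type and all names in the code are assumptions made for this example; each organisation would define its own configuration policy.

```python
import re
from dataclasses import dataclass
from typing import Optional

# Hypothetical naming policy: 2-4 capital letters, a hyphen, four digits.
ID_POLICY = re.compile(r"^[A-Z]{2,4}-\d{4}$")
# The text lists hardware, software and peopleware; documentation is assumed.
CI_TYPES = {"hardware", "software", "peopleware", "documentation"}


@dataclass(frozen=True)
class ConfigurationItem:
    ci_id: str
    ci_type: str
    manufacturer_id: Optional[str] = None
    location: Optional[str] = None
    purchase_date: Optional[str] = None

    def __post_init__(self):
        if not ID_POLICY.match(self.ci_id):
            raise ValueError(f"{self.ci_id} does not follow the naming policy")
        if self.ci_type not in CI_TYPES:
            raise ValueError(f"unknown CI type: {self.ci_type}")


def register(registry, ci):
    """Add a CI to the registry, refusing duplicate identifiers."""
    if ci.ci_id in registry:
        raise ValueError(f"{ci.ci_id} is already registered")
    registry[ci.ci_id] = ci
```

Rejecting badly formed or duplicate identifiers at the point of entry is one simple way to keep the identifier unique across the database.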
S3AP8 - CMDB
S3AP11 - Planning
The first of the Configuration Management sub-processes is planning. ITIL suggests five
key points which should be addressed in planning, and these are:
In addition to the CMDB, Configuration Management has linkages to two other information repositories. These are the Definitive Software Library or DSL, and the
Definitive Hardware Store or DHS.
The DSL is the safe storage area for trusted software, and is managed by the Release Management process.
The DHS houses spare parts for critical equipment, and replica configuration models in the IT infrastructure. For example the DHS might contain a fully configured standard
server and workstation.
Again records relating to the contents of both the DSL and DHS are held in the Configuration Management Database.
Also worth noting here is the management of software licences. This has become a major issue for many organisations, and the repercussions of illegal software use can be severe, so it's considered good practice for configuration management and release management to work jointly on this process.
In a fully ITIL implemented organisation, the configuration management team would be expected to hold information about licences, what they contain, and what they cover, as a
CI in the CMDB. However, as with the DHS
Another policy may define that all new bought-in or internally developed systems or services are to be brought under Configuration Management control at the point of hand over, but existing live systems will not be within the
scope.
resources required.
Once the strategy, policy and scope are defined, the objectives can be outlined, and a
timeframe in which to achieve them.
management tools, with the benefit of automatic CI recording to the CMDB via these
tools.
S3AP14 - Identification
Remember the objectives should be 'SMART' objectives, in other words Simple, Measurable,
Achievable, Realistic and Timely.
S3AP12 - Planning
Having dealt with strategy, policy, scope and objectives, our next action is to examine our processes, procedures, guidelines and responsibilities.
The organisation might already have in place processes to control assets, or manage change. Although these may not be formally identified as a Configuration Management process, they may be adapted and improved
upon.
The second of the five Configuration Management processes is identification. The primary focus of the identification process is the establishment of the 'Configuration Item
appropriate.
For example, a complete workstation might be considered as a configuration item, or it could be further categorised into its component parts, and make each of these a CI. This logic must also apply to software, defining a CI as a program as a whole, or a module or sub module of that program. Generally speaking, select a configuration item
level which is most beneficial to the
Planning procedures should be created and maintained along with other related guidelines.
We will discuss this in more detail later in this
session.
And finally responsibility has to be allocated. After all, these plans, processes and changes
have to be carried out. So work should be
configuration management process. The greater the level of control required over an area or service, the greater the level of configuration management record detail.
allocated to staff in either a configuration management group, or a wider configuration, change and release management group. If, in this sample scenario, configuration management is being introduced into the organisation after other ITIL processes, then it is important to define how these other processes will have to change to accommodate the new configuration management process. Alternatively, if configuration management is implemented ahead of other processes, future inter-process relationships will need to be considered.
S3AP13 - Planning
Be careful in choosing the most appropriate level: balance information availability and the level of independent control against the resources and effort needed to support the CMDB at that level.
configuration hierarchy could be restricted by the support tools available. For example, breaking down a workstation into its monitor and screen, and then further down into its motherboard, CPU and other component parts, may be impossible if the depth of our CMDB system hierarchy is specified at two levels only.
Relationships with other parties who carry out Configuration Management also require particular attention. Suppliers, external software vendors, and developers might have their own CMDB with which we want to exchange information.
The final point on planning is the use of tools, and other resource requirements. Careful consideration needs to be given to CMDB implementation, whether to design and build a CMDB from scratch, or to purchase an off-the-shelf product. Vitally it should be possible to link the CMDB to system and network
S3AP15 - Identification
A configuration item record may contain information about configuration items below it in its hierarchy. For example, in the event of a workstation failure, the policy might be to replace the whole workstation rather than the failed component. However, CI information about the failed component could be held in the record for the workstation.
Also consider that a CI might have linkages to other CIs other than its immediate parent. In these circumstances the CI information would show its linkage to its parent, and also a 'used by' relationship to other CIs. It would not be helpful to lose this level of detail by incorporating details into the parent CI.
Documenting these linkages in the CMDB can have a huge impact on database size. Each new CI added might identify three or four linkages. It's good practice to establish in advance the required levels of CIs in the
S3AP17 - Identification
In defining the inter-relationships between CIs, there are a number of typical 'types' which can be used. The most frequently used in ITIL good practice are Composition, Connection and Usage.
'Composition' is the simple parent child relationship. A workstation being the parent,
'Connection' describes the relationship between hardware items. The relationship between a LAN and a server for example.
'Usage' describes the interdependency between application usage of a common software module, or the linkage from one category to the other.
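As an illustration only, the three relationship types could be recorded against CIs along the following lines. This is a minimal sketch: the `CMDB` class, the `RelationType` names and the CI names are hypothetical and are not taken from any ITIL tool.

```python
from dataclasses import dataclass, field
from enum import Enum


class RelationType(Enum):
    COMPOSITION = "composition"  # parent-child, e.g. workstation -> monitor
    CONNECTION = "connection"    # link between hardware items, e.g. LAN -> server
    USAGE = "usage"              # e.g. an application using a common module


@dataclass
class CMDB:
    # Each relationship is stored as (source CI, relation type, target CI).
    links: list = field(default_factory=list)

    def relate(self, source: str, relation: RelationType, target: str) -> None:
        self.links.append((source, relation, target))

    def related(self, ci: str, relation: RelationType) -> list:
        """Return the CIs that `ci` points to via the given relation type."""
        return [t for s, r, t in self.links if s == ci and r == relation]


cmdb = CMDB()
cmdb.relate("workstation-01", RelationType.COMPOSITION, "monitor-01")
cmdb.relate("lan-a", RelationType.CONNECTION, "server-01")
cmdb.relate("payroll-app", RelationType.USAGE, "tax-module")
cmdb.relate("hr-app", RelationType.USAGE, "tax-module")

print(cmdb.related("workstation-01", RelationType.COMPOSITION))
```

Note that every relationship is one more stored record, which is why the number of linkages per CI has such an effect on database size.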
Successfully building and maintaining a CMDB depends on accurately identifying and labelling its configuration structures and CI versions and types, and their linkages with other CIs. This is termed as defining its scope. Defining scope identifies which items of hardware, software, peopleware and documentation are to be included. Part of this process involves identifying the number of 'configuration types', and what benefits their identification will bring.
Finally having identified and documented
exist, which, except for having monitors of different sizes, are exactly the same. This slight difference in specification wouldn't justify the specification of a new CI type. To help us accommodate these anomalies we can specify these as a 'CI variant'.
lifecycle of the Configuration Item, so, in addition to those items already in the live environment, items in development and awaiting release are also included. At the same time version numbers are assigned.
These numbers should be monitored carefully. If for example the development department assign their own version numbers, then it's important that this information is transferred to the CMDB at the point of handover.
During development we might want to capture information about CIs and their relationships, to reflect the position at a particular time. This is known as 'baselining'. This can be a very useful process, as baselining can provide a rollback point if things go wrong. It can provide a specification from which copies can be built, and can provide valuable review information
During the baselining process, we should include the relevant related items, including documentation, procedures, peopleware and so on. Baselines should be established at formally agreed points of time. For example, before making significant change to the infrastructure.
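The idea of a baseline as a rollback point can be sketched in a few lines. This is an illustrative sketch only: the `Configuration` class, its method names and the sample CI are assumptions made for the example, not part of ITIL or of any CMDB product.

```python
import copy


class Configuration:
    def __init__(self):
        self.items = {}       # CI name -> attributes
        self.baselines = {}   # baseline label -> frozen snapshot

    def take_baseline(self, label: str) -> None:
        # Deep copy, so later changes to live items do not alter the baseline.
        self.baselines[label] = copy.deepcopy(self.items)

    def roll_back(self, label: str) -> None:
        # Restore the configuration recorded at the agreed baseline point.
        self.items = copy.deepcopy(self.baselines[label])


config = Configuration()
config.items["server-01"] = {"os_version": "1.0", "status": "live"}
config.take_baseline("before-upgrade")             # agreed point in time

config.items["server-01"]["os_version"] = "2.0"    # the significant change
config.roll_back("before-upgrade")                 # things went wrong
print(config.items["server-01"]["os_version"])
```

The snapshot is copied rather than referenced, because a baseline that silently changes along with the live environment would be of no use as a rollback point.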
At any point, the current configuration consists of the most recent baseline plus any approved changes that have been implemented. It's very common to take baselines of standard
S3AP20 - Control
The third Configuration Management activity is Control. The control of configuration items consists of three sub processes. These are: Register, Update and Archive. An additional function of the control process is to protect the integrity of configurations.
CIs are registered as they fall into the remit of IT service management. If we receive new equipment from an external supplier, at the point of handover, we should establish that information received from the supplier is accurate. In many organisations this activity has a direct link with procurement.
There are many reasons for updating a configuration item's status. For example, a change in the CI's status from 'testing' to 'live', a change of financial asset value, a change of ownership, or changes brought about by incidents, problems or known errors. All these updates have to happen under the authority of the configuration management process.
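The three control sub-processes, Register, Update and Archive, can be pictured as operations on a simple store. The sketch below is hypothetical: the `ControlledCMDB` class, the `authorised` flag and the status names are assumptions chosen to illustrate the idea that updates only happen under configuration management authority.

```python
from dataclasses import dataclass, field


@dataclass
class ControlledCMDB:
    live: dict = field(default_factory=dict)      # CI name -> current status
    archive: dict = field(default_factory=dict)   # decommissioned CIs
    history: list = field(default_factory=list)   # (CI, old status, new status)

    def register(self, ci: str, status: str = "testing") -> None:
        if ci in self.live:
            raise ValueError(f"{ci} is already registered")
        self.live[ci] = status
        self.history.append((ci, None, status))

    def update(self, ci: str, new_status: str, authorised: bool) -> None:
        # Updates are refused unless configuration management has authorised them.
        if not authorised:
            raise PermissionError(f"unauthorised update to {ci}")
        old = self.live[ci]
        self.live[ci] = new_status
        self.history.append((ci, old, new_status))

    def archive_ci(self, ci: str) -> None:
        # The record is moved to the archive store, not destroyed.
        self.archive[ci] = self.live.pop(ci)
        self.history.append((ci, self.archive[ci], "archived"))


cmdb = ControlledCMDB()
cmdb.register("router-07")
cmdb.update("router-07", "live", authorised=True)
cmdb.archive_ci("router-07")
print(cmdb.archive)
```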
Archiving decommissioned CIs takes place when a component is no longer in use. The definition of what constitutes a redundant CI,
Archiving involves the removal of CIs from the CMDB and archiving onto secure storage, and not necessarily the destruction of the record.
S3AP21 - Control
An additional function of the control process is to protect the integrity of configurations. The protection process safeguards against illegal changes to CIs, and procedures are maintained so that the CMDB and the information it contains are secure. Protecting the integrity of configurations must include: security against theft; protection against unauthorised change or corruption; enforcing access control procedures; guarding against any environmental damage; protection against viruses; and making back-up copies of the CMDB information, and the secure storage of these back-ups.
Configuration control scope must extend to 'bought in' CIs, such as commercial 'off the shelf' software, sometimes known as 'COTS' packages. By definition this will involve software licence issues, and we will be
changes in a CI's status to be documented. For example the change from 'live' status to 'withdrawn'. It can also help us establish 'baselines'.
By declaring a status of 'trusted' we save all the configuration items and relationships as a baseline. If we encounter problems at a later date, we can then retreat to this 'baselined' point. Status accounting can also be used to monitor organisational procedures, for instance, that a request for change on a configuration item was properly authorised.
S3AP23 - Verification
exactly matches the real life environment. Configuration management offers little benefit
This verification and audit procedure should be carried out regularly but randomly. Deliberate avoidance of the change and configuration management processes is most likely to be revealed by this 'spot check' approach.
software. In addition to the regular 'spot checks', verification and audit would usually be carried out at the following times:
Error process will have links in the database to problem records, which in turn are linked back
items that
are the
And we would usually carry out an audit before the live implementation of a new Configuration
Management database.
Carrying out a manual verification and audit
assess the impact of the change. Change records will be stored, and their status updated as the change moves through its lifecycle. In this integrated environment we can see the fundamental role of the configuration management database, and configuration management as a whole.
The ultimate update authority always lies with the configuration management process, but this authority can be delegated in the case of incident and problem records. Configuration management also remains responsible for release processes, often acting on behalf of the change and release management processes.
software are being used, and whether this matches current configuration item records.
The benefits of and potential difficulties with Configuration Management are listed on page
26 of the little ITIL book and in section 7.4 of
configuration management is closely linked with the overall Service Support and Service Delivery processes, both supporting, and depending on these processes.
When an incident is identified it passes through these processes, and it's important to realise how the CMDB, and configuration management as a whole, support this.
Each of the service support processes reads and writes information in the CMDB throughout the incident's lifecycle. For example, when an incident occurs we will record it in the CMDB. At the same time we
We went on to examine the configuration management database or CMDB, its structure, and the type of information and records it should contain. We also looked at how the
management sub-processes; Planning, Identification, Control, Status Accounting and Verification, and we went on to look at the relationship between Service Support and Service Delivery and the CMDB. And finally we looked at the potential benefits and pitfalls when implementing configuration management.
Session 3B - Change Management
S3BP1 - Objectives
In this session we will be examining the second of the ITIL control processes, Change Management, which is described in Chapter 8 of the Service Support book of the IT
Infrastructure Library.
In this session we will:
highlight why the change management
Define what change is in ITIL terms, and the goal of Change Management.
Examine the relationships between Change Management and other ITIL processes.
Define a Request For Change or RFC, and examine some of its potential sources.
Look at the role of the Change Advisory Board, and the Change Advisory Board Emergency Committee.
The first of these is the ability to handle changes promptly and efficiently. When a need for simple and routine change occurs, Change Management should handle them in a streamlined and pre-planned manner. Where more significant and complex changes arise, they should be dealt with efficiently, but to an appropriate level of detail.
S3BP2 - Introduction
ITIL
So what is Change Management? Well let's start by more accurately defining the term change. It has many definitions, but possibly the simplest one is the most apt.
'Change is the process of moving from one
defined state to another'.
ITIL guidance addresses the potential impact of proposed changes by suggesting the use of fixed change slots in what's termed a 'forward schedule of change'. As a result users are informed about upcoming changes, what each change entails, and when it will take place. As a further safety net, change management carries out impact analysis on proposed changes, and produces a back out plan, giving the organisation a point to which they can retreat if a change proves unsatisfactory.
And finally, Change Management must balance the need for change against the risks to the IT infrastructure of implementing it.
S3BP4 - Activity
S3BP5 - Relationships
procedures are used for efficient and prompt handling of all changes, in order to minimise the impact of any related Incidents upon
service'.
Change Management can either be restricted to changes to the ICT infrastructure and 'live' ICT services, or it can be expanded to cover all changes, including those in development areas, or changes which are the result of
By exchanging information with Capacity, Availability and Configuration Management, Change Management is able to 'Assess the overall impact' of the change. Once assessed we should be able to state whether or not: the impact is manageable, the cost of change is reasonable, and business benefits are worthwhile. Assuming positive answers to all of these, Change Management 'authorises the change'.
In many cases this authorisation is with the help of other experts who form a body known as the Change Advisory Board, and in some cases, where the change is a simple one, Change Management can be devolved. In these cases it is common for the Change Management process to be devolved to Problem Management, or even to operational staff.
Agreements. However, it's important, where financially viable, to meet customers' requests.
Implementation of new or changed legislation might bring about an RFC. Particular examples include legislative changes relating to privacy, intellectual property rights, security and so on.
there is an ongoing update of information within the Configuration Management database. For example, a CI status can now be moved to 'under change' and so on.
And finally when a change is ready for release to the wider user community, be it affecting software, hardware, documentation or related infrastructure components, it falls to Release Management to manage the actual physical implementation. Remember however, that overall responsibility for any change remains in the hands of change management.
S3BP6 - Request for Change
The trigger for the Change Management process is the receipt of a Request For Change or RFC.
ITIL defines a number of sources from which an RFC can be received. The most common and well documented are those that form part of the incident resolution lifecycle. For example, where a user identifies an incident and reports it to the service desk staff, who in turn generate an RFC. Or from Problem Management, which generates an RFC after investigation of multiple incidents has led to a known error and a proposed 'structural' resolution.
Another source of RFCs is the introduction of new or upgraded CIs. For example, newly purchased workstations, their installation, addition to the network, recognition by the server, providing help and user documentation, will all generate RFCs.
There may be a 'New or changed business requirement for an IT service', often identified by the service level review process. Again this will generate a Request for Change, and be passed on to the Change Management Process.
A major change in business requirements may generate a significant Request for Change. Such a request may have already passed through a conventional investment appraisal process, and is entering the ITIL Service Management process for a second review. The role of Service Management is to ensure full impact analysis of the effects on existing services, and on the infrastructure as a whole.
Typically, a request for change will contain such information as the sponsor, the requested date for implementation, an initial list of configuration items affected, services affected, the reason for change and initial costing information. The exact content will vary depending on the origins of the RFC.
S3BP7 - CAB
in the light of the business need make recommendations as to whether they should be accepted and implemented, or rejected. It also ensures that any RFCs which don't merit detailed consideration by the CAB are recorded. The CAB will also advise on the grouping of changes into 'releases' to minimise disruption to the organisation and maximise benefits.
Typically a CAB is made up of a Change Manager, who will usually chair the meeting, plus representatives of the customer, users, developers, other experts, consultants, outside contractors, and of course IT Service Management staff.
CAB meetings may have different combinations of staff attending; however the
core members of the CAB should be the
user,
and
ITSM
may not have been reported via incident or problem management, and it might not be
outside the hours of current Service Level
In general a CAB is regarded as an advisory body, although in some organisations it is defined as an approval board. Its role is considered as advisory because the ultimate responsibility for change lies with the change management process and hence the change management staff.
When making decisions about a proposed change, the CAB should consider the business, financial, technical and risk implications.
at all.
It should also consider the
deciding whether or not to implement a change is its likely impact on IT continuity plans. Making changes to the IT infrastructure without making changes to any fall back sites can be very dangerous.
S3BP8 - Activity
S3BP9 - CAB/EC
could occur at any time. In such organisations it is usual to have a Change Advisory Board Emergency Committee in place. The CABEC are usually called in at short notice to analyse
The committee would usually consist of the Change Manager, who acts as Chairperson, a senior business representative, and a senior IT representative.
A word of caution here about CABEC activities. Often due to time and business pressures, comprehensive testing of changes isn't always possible. Nor are configuration items updated with status or change information. Ultimately the CAB is responsible, through the emergency committee, for ensuring that the change management and configuration management processes work together to update relevant records as soon as possible.
S3BP10 - Change Management Process
We established earlier in this session that the trigger for the Change Management Process is the receipt of a request for change.
To address these RFCs, ITIL defines a comprehensive change management process, and we will now look at this process in some detail.
It's usual for RFCs to be logged in the CMDB ahead of this filtering process. However, after filtering it's common for RFCs to change status and be redefined as change records.
If the change is accepted it moves to the next process, and the Change Manager allocates a
'standard change'. Whether changes are standard or urgent, the principles for processing them remain the same. However, urgent changes pass through a 'streamlined' version of the change management process, and we will be looking at this process later in this session.
In this example, the change is considered non-urgent, and so passes on to the 'Categorisation' process.
Change categorisation involves an initial assessment of the actions and resources required to make the change. There are four possible outcomes from this process. These are: Standard, Minor, Significant and Major.
A 'standard' categorisation is assigned when a frequently occurring change is identified. It can then be dealt with via a pre-existing set of processes and authorisations. These change types are usually considered low risk, and don't require consideration by the CAB. An example of this might be a hard disk replacement or upgrade on a user workstation.
S3BP11 - Change Management Process
The definition of minor, significant and major will differ between organisations, and will be dependent on the current status of the IT infrastructure, and the IT service management personnel's current feelings about risk.
A 'minor change' categorisation would usually be authorised by the Change Manager, who will report their actions to the CAB after completion of the change. The aim here is to reduce the number of RFCs forwarded to the
Manager to circulate RFCs to either the CAB, or in the case of a major change, to company Board or other senior management members.
As we saw earlier in this session, the CAB's role is to give advice, provide estimates on required resources and timescales, and put forward schedules for change based on priority and resource availability. The CAB will also perform detailed impact analysis, and this often requires input from ITSM specialists, for example the Capacity Manager. Eventually implementation dates and a schedule are decided upon. This information is contained in a 'forward schedule of change', which is passed to the relevant service management staff, and to the business as a whole. If changes are likely to cause disruption to the business, then this will be documented
The CAB activities of estimating and scheduling may well be iterative, and the process continues until an approved change status is reached, or the RFC is rejected, in which case it might re-enter the process at the beginning in a revised form. At the point of approval, the Configuration Manager updates the Change Management Database.
S3BP11 - Diagram
The change has now reached the Change Building sub-process. The Change Builder may actually consist of several groups of internal or external staff, who are involved in hardware, software, operating systems, documentation and so on. Change Builders are not normally permanent members of a Change Management Team, but are drawn from areas of technical expertise.
Note that a failure during the change building process is likely to result in the change returning to the CAB, possibly with a request to modify the scope of the change. It's important that all changes have a back out plan, so that if an error occurs during implementation, the change can be revised and the service restored.
Once the change is complete it moves to an Independent tester, where the change is tested and the quality checks are carried out. If at this point a failure occurs, the change is returned to the Change Builder. If the Change is tested successfully it moves on to the Change Manager, who coordinates the implementation of the change.
Remember that the Change Manager has overall responsibility for the change, but that Release Management normally has control at a detailed physical implementation level.
Note that throughout the cycle of building and testing, and during implementation, the Configuration Management process is updating the status of change records. Typical states might include: accepted, in build, under test and so on. A change record will typically contain details of the back out plan, when it was built, CAB recommendations and
It's important to accurately manage the change record system within the CMDB, so that we can carry out traceability tests. Change records are usually linked to impacted infrastructure configuration item records, and also to any related incident, problem or known error records.
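The change record described in this part of the session, with its status moving through the lifecycle and its links to CIs and to incident or problem records, can be sketched as follows. The transition table is an assumption made for illustration: the state names 'accepted', 'in build', 'under test' and 'closed' come from the text, while 'logged', 'approved', 'implemented' and 'backed out' and the permitted moves between states are hypothetical.

```python
from dataclasses import dataclass, field

# Assumed transition table; organisations will define their own states.
ALLOWED = {
    "logged": {"accepted", "rejected"},
    "accepted": {"approved", "rejected"},
    "approved": {"in build"},
    "in build": {"under test", "accepted"},     # build failure returns to the CAB
    "under test": {"implemented", "in build"},  # test failure returns to the builder
    "implemented": {"closed", "backed out"},
}


@dataclass
class ChangeRecord:
    rfc_id: str
    status: str = "logged"
    affected_cis: list = field(default_factory=list)
    related_records: list = field(default_factory=list)  # incidents, problems
    trail: list = field(default_factory=list)            # for traceability

    def move_to(self, new_status: str) -> None:
        if new_status not in ALLOWED.get(self.status, set()):
            raise ValueError(f"{self.status} -> {new_status} not allowed")
        self.trail.append((self.status, new_status))
        self.status = new_status


change = ChangeRecord("RFC-001", affected_cis=["server-01"],
                      related_records=["INC-042"])
for state in ["accepted", "approved", "in build", "under test",
              "implemented", "closed"]:
    change.move_to(state)
print(change.status)
```

Keeping the trail of status moves alongside the linked CI and incident references is what makes a later traceability test possible.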
If at the point of live implementation the change fails, then the Change Builder instigates the back out plans. If however, the change is implemented successfully, it's important that the Change Manager reviews the change.
The review process can provide valuable information about our change management process, and can also identify vulnerable areas
in the IT infrastructure. A successful review will trigger the 'closed' status, and the request for change or change record will be updated in the CMDB. Note the CAB itself might be involved in the review process. A failure at the review stage would identify shortcomings in the implemented change. This in turn would result
Let's take a few steps back, and look again at the process, assuming this time we have time to test the change. This time our built change passes from the Change Builder to the
The first action is for the Change Manager to call either a CAB meeting, or in an emergency situation, the CABEC. The aim of this meeting is to quickly evaluate the request for change, by assessing its impact, the resources required and its urgency. The meeting should establish whether its urgent status is justified. If the outcome suggests that the RFC status isn't urgent, then it will be rejected, and will be dealt with as a standard RFC.
confirmed as urgent, then it passes on to the next process and into the hands of the Change Building Team. The Change Building Team then builds the change and prepares a back out plan. When the change is complete, as much testing as possible should be carried out. Completely untested changes should not be implemented if at all avoidable. If this is the case, the Change Manager then coordinates the implementation of the change into the live environment.
If the implemented change fails, the Change Manager implements the back out plan. If the change is successful, then the Change Manager firstly ensures that records are brought up to date, carries out testing in the live environment, and at a later date, reviews
the Configuration Manager closes the RFC and updates the CMDB.
S3BP15 - Activity
S3BP16 Process
The Change Management process deals with Requests for Change from many areas of the organisation, and with different levels of authorisation. Where RFCs are frequent and repetitive, they can be dealt with via pre-existing and authorised processes. These processes are known as a 'standard model for change'.
Standard models needn't only apply to simple changes; often complex operations can have standard models. In general once an RFC is regularly repeated, we can create a standard model for that change.
We saw earlier in this session how the Change Manager examines RFCs and categorises them either as standard, using a standard change model, minor, significant or major. To assign one of these categories, the Change Manager examines the RFC, and considers the following:
S3BP17 Process
Impact. The impact the Request for Change will have on the business, considering such factors as the number of users affected.
with the RFC. An RFC with high impact and high novelty is certainly a higher risk.
Devolved Authorisation. Has the responsibility for change been devolved from the CAB to the Change Manager? Or further devolved to, say, the Service Desk?
Standard Model. Can the request for change be dealt with via a standard model, with a pre-established implementation process?
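One way to picture how these factors combine is a small decision function. This is a simplified sketch under stated assumptions, not an ITIL rule: the `categorise` function, the three-level impact scale and the treatment of "no standard model" as equivalent to high novelty are all hypothetical. It reproduces the outcomes of worked examples two, four, five and six in this session, assumes that the first example (low impact, well known change) is a standard change, and does not attempt to model the third example.

```python
def categorise(impact: str, standard_model: bool) -> str:
    """Return 'standard', 'minor', 'significant' or 'major'.

    `impact` is one of 'low', 'high' or 'very high' (an assumed scale).
    """
    if impact == "very high":
        return "major"          # authorised at a higher level than the CAB
    if impact == "high":
        return "significant"    # goes to the CAB, even if a standard model exists
    if impact == "low":
        # Authorisation is devolved; a model makes it a standard change.
        return "standard" if standard_model else "minor"
    raise ValueError(f"unknown impact level: {impact}")


print(categorise("low", standard_model=False))
print(categorise("high", standard_model=True))
```

A real organisation would tune both the scale and the thresholds, since the definitions of minor, significant and major differ between organisations.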
This RFC is regarded as low impact to the business, and is a well known change, so the
Measuring performance usually takes place over time to show, for example, that the number of urgent changes is reducing. So that the results can be clearly understood at all levels in the organisation, this data is usually represented in graphical form.
Column 2 is slightly different, again the RFC is regarded as low impact, but it hasn't been done before, so its novelty is high, and as a consequence, no standard model exists. Again authorisation is devolved, and it's categorised as a minor RFC. This type of RFC could act as a trigger to build a new standard model.
In our third example, the results are slightly different. Our RFC has a high degree of
devolved to the change manager. This RFC falls into the significant category.
Regular summaries of the change process should be provided to service, customer and user management. Different management levels are likely to require different levels of information, ranging from the Service Manager, who may require a detailed weekly report, to senior management committees who only require a quarterly management summary.
The RFC in our fourth example has a standard model, however, business impact is considered high, so devolution to the Change Manager won't take place, and it must be examined by the CAB before the standard model processes are implemented. Hence this is regarded as a significant RFC.
The number of changes implemented during the measured period.
Number of changes backed out by reason code.
Number of Staff Training records up to date.
Cost per change versus estimated cost.
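As a rough illustration, several of these measures can be derived directly from a set of change records. The record layout, field names and figures below are invented for the example and do not come from any ITIL tool.

```python
from collections import Counter

# Hypothetical change records for one measured period.
changes = [
    {"id": "C1", "backed_out": False, "reason": None,
     "estimate": 100.0, "actual": 120.0},
    {"id": "C2", "backed_out": True, "reason": "failed test",
     "estimate": 50.0, "actual": 50.0},
    {"id": "C3", "backed_out": True, "reason": "failed test",
     "estimate": 80.0, "actual": 60.0},
]

# Number of changes handled during the measured period.
total_changes = len(changes)

# Number of changes backed out, grouped by reason code.
backed_out_by_reason = Counter(c["reason"] for c in changes if c["backed_out"])

# Cost per change versus estimated cost (positive means over estimate).
cost_variance = {c["id"]: c["actual"] - c["estimate"] for c in changes}

print(total_changes)
print(dict(backed_out_by_reason))
print(cost_variance)
```

Figures like these are the raw material for the graphical summaries given to management, where the trend over several periods matters more than any single value.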
As both the impact and the novelty are high, the RFC in our fifth example must also be considered by the CAB. This is also a 'significant' RFC.
In example six, we are considering a change which has a very high business impact. For example, changing from an ISDN based telephony system to ADSL. Changes of this magnitude would normally be authorised at a higher level than the CAB. This is categorised as a major RFC.
Over time, we should expect the number of standard models, and the changes passing through them, to increase. This should result in a reduction of the number of changes forwarded to the CAB, and reduce the number
By auditing the change management process we can check for compliance to procedures. In general a change management audit should investigate:
All new software releases. Checking that they have been through a proper authorisation process.
Incident Records. Usually selected at random, and tracked through the change process.
Minutes of CAB meetings. Not only to check that CAB meetings have taken place, but also to see if identified action points have been followed through.
Forward schedule of change. To see if it has been accurately defined, and importantly, that it's been published to the user community, and is being adhered to.
And finally, that Change review records are in place for all changes.
We've seen in this session how Change Management improves the way in which an organisation implements changes. To clearly
S3BP21 - Activity
S3BP22 - Benefits & Problems
The benefits of and potential difficulties with Change Management are listed on page 33 of
the little ITIL book and in section 8.4 of the
Service Support manual. They are summarised here for your convenience.
S3BP23 - Summary
also
We studied in some detail the Change Management process for both a normal and
Session 3C - Release Management
S3CP1 - Objectives
In this final session on the ITIL control
Understand how the Release Management process operates, and its relationship with other IT and Service Management processes.
Describe what is meant by a Definitive Software Library (DSL), and a Definitive Hardware Store (DHS).
S3CP2 - Introduction
The third and final IT control process is Release Management. ITIL defines the goal of this process as: 'To take a holistic view of a change to an IT Service and ensure all aspects of a Release, both technical and non-technical, are considered together.' Release Management implements new software or hardware releases into the operational environment using the controlling processes of Configuration Management and Change Management.
These would usually contain large amounts of new functionality, some of which may make intervening fixes to Problems redundant. A major upgrade or release usually supersedes all preceding minor upgrades.
Minor Software Releases and Hardware upgrades. Usually containing small enhancements and fixes, some of which may have already been issued as emergency fixes. A minor upgrade or release usually supersedes all preceding emergency fixes.
And finally, Emergency software and hardware fixes, normally containing the corrections to a small number of known problems.
S3CP3 - Activity
S3CP4 - Roll Out
Release Management's holistic approach to IT service change ensures that the business as a whole and any relevant technical areas are ready to accept, implement and use a release. It is the responsibility of the Release
Management process to plan and oversee the 'roll out' of these changes. 'Roll out' includes distributing all the configuration items to wherever they are used. This could be done in a number of ways: via the internet, by email, or simply by posting CDs. In general, use whatever means best suits the business.
This all sounds very simple, however the process becomes much more complex when hundreds of servers need to be upgraded simultaneously throughout a large geographic
and cultural area. To ensure successful
required.
This process requires technical competence, and its sub-processes are often performed by technical staff under the overall authority of the Change Manager.
As part of the Roll Out activities, it is likely that you will need to provide scripts to help install the release, as well as passwords to activate
the release when needed. Release
Additionally Release Management ensures that we can trace where a particular version comes from, and the related changes it has undergone. This is especially important for "due diligence and governance". To make this possible, software needs to be kept secure
Each of these three stages should be verified as accurate. For example, before implementation, we should be absolutely certain that a build process has been achieved correctly.
Release Management also agrees the exact contents of any release and a detailed roll out plan with Change Management.
S3CP5 - Release Management Process
Note that ITIL refers to specific steps called 'Roll Out Management' and this may take place after independent testing to manage in more detail the actual implementation stages that follow. Roll out management usually comes into play when dealing with very large and complex implementations or 'roll outs'.
Throughout this process it is very important to update the CMDB. Information is held here on Release Records, and any status changes to
these records should be documented.
S3CP7 - DSL & DHS
The Release Management process encompasses three defined areas of the organisation.
The development area, its own area of preproduction, and finally the production area, or
live environment.
Release Management has responsibility for two critical repositories. These are the Definitive Software Library or DSL, and the
Definitive Hardware Store, or DHS.
Information related to the contents of the DSL
The migration from one area to the next is only permitted subject to satisfactory results from reviews, tests and other appropriate quality checks. Independent testing might include customer acceptance testing, operational acceptance tests and so on. It may well be that significant customer acceptance testing has already been
carried out.
and the DHS is held in the Configuration Management Database, and responsibility for keeping these records up to date belongs to Configuration Management. The DSL contains only trusted versions of software, for example software which has been developed from valid earlier versions via correct Change Management Processes.
However, operational acceptance tests are very important - they ensure that anything that goes into the live environment is supportable,
maintainable and robust.
Also worth noting is that any back out plans which have been prepared should also be
tested.
The DSL may consist of one disk containing all bought in and created software held in a single format. Commonly the DSL consists of separate disk volumes or servers containing
software for individual environments.
Note that Change Management will decide on the particular contents of the release. It's very important that the release management team are fully aware of decisions made by other
processes.
Software assets are particularly vulnerable to unintended loss or corruption, so it's important
to take very good care of the DSL. For
example, employing adequate security and access controls. Appropriate protection against
other threats, such as fire or flood should also
In the production environment we will have to deal with distribution, potential rebuild and
implementation of software and hardware
be in place. Backup copies of critical elements of the DSL are often kept at another location.
releases. There may be three separate stages, firstly to distribute software, secondly, to build
might include many applications, would have to be rebuilt. Consequently full releases are
expensive to build, distribute and install.
protected in a similar way, and should have specific protection against physical removal. The contents of the DHS should be updated as quickly as possible to reflect the live
environment.
However they do give confidence that all the elements of a service work together successfully. They are most appropriate for
useful if the organisation encounters significant problems with new configurations and software, then it's possible to revert back, by cloning these older versions. Remember, responsibility for maintaining the
contents of the DSL and the DHS is shared
release. Consequently this is a less expensive option. Delta releases are most appropriate for fixes and urgent or emergency changes, and as such form the most frequent form of
release.
between Release Management and Configuration Management.
S3CP8 - Release Unit & Release Type
To reduce the frequency of Delta and Full releases, and to provide longer periods of stability 'Package Releases' can be used. A 'Package Release' might consist of groups of
delta and/or full releases, which are held back
S3CP9 - Activity
S3CP10 - Release Identification
minor release which involves changes to some of its applications would generate a release ID
of V.1.1.
An emergency fix to a small element of a module within that system might have a
release ID of V.1.1.1. Remember there is no
absolute limit to the levels used.
management moves on to address the question of release type. Release types are divided into three categories: Full Release, Delta Release and Package Release.
A full release is where all components of the release unit are built, tested, distributed and
Definitions of release Type and Release units should be documented in a Release Policy.
This policy should also clarify roles and responsibilities, and information on Release frequency. The policy content is usually determined by the Release Manager, in conjunction with the Change Manager and the CAB.
released together. For example, if the release unit is at program level, then the whole program would have to be rebuilt.
Details on release identification and
Roll out planning, together with Release Management, decides on the type of rollout approach. This might be a 'big bang', phased or pilot approach.
A Big Bang approach involves all sites receiving all functionality simultaneously. The benefit of this approach is that it offers consistency of use across the organisation. However, achieving a simultaneous upgrade can be problematic.
numbering conventions.
A definition of major and minor releases, plus a policy on issuing emergency fixes.
Release Management is responsible for the detailed planning of releases. Amongst other things, release planning involves:
Gaining agreement on Release Content
In a phased approach all sites could receive some functionality at the same time, with more coming later. In a Pilot approach a single site receives all functionality ahead of other sites. Note however that combinations are possible, for example a 'phased pilot' approach.
S3AP13-Activity
necessary definitive software library and
Compliance with software licence agreements has become critical to businesses. Ensuring these obligations are met is the joint responsibility of Release and Configuration Management. For example, when moving software to the DSL, it is important to check that what has been purchased has arrived, that it has been virus checked, and that the licence agreement has
been checked.
In addition, the Release Planner develops a Release Quality Plan, to ensure all aspects of the release are quality managed, and produces a back-out plan. Where a release is going to be particularly complex it may require a specific planning phase. To facilitate this, the Release Plan is extended to Rollout planning. This expands the Release plan produced thus far, and adds details of the exact installation process developed and the agreed implementation plan. Roll out planning involves:
Producing a detailed timetable of events.
Remember penalties for breaching the laws on software theft are applicable to any responsible officer of the company, including those at the highest level.
There are many legal precedents for holders of software intellectual property rights arriving unannounced at premises and impounding any equipment which they believe contains unlicensed copies of their software.
S3CP15-Activity
S3CP16 - Benefits & Problems
Producing Release notes and
The benefits of and potential difficulties with Release Management are listed on page 39 of
the little ITIL book and in section 9.4 of the
also
S3CP17 - Summary
In this third and final session on the ITIL
We started the session by defining ITIL's goals for Release Management, and why Release Management is necessary.
We saw how a release can be divided into
Major, Minor and emergency releases, and discussed Release Management's holistic approach to IT service change, and how, as part of this approach it produces detailed release or rollout plans.
We examined the Release Management process, its critical linkages to
and the
concluded the session by identifying some of the benefits, and potential problems with the Release Management process.
Session 4A Availability Management
S4AP1 - Objectives
The topic for this session is Availability Management, which is described in Chapter 8
of the Service Delivery book.
point between the benefit given by extra availability and the cost of providing it in terms of more and more advanced techniques and
equipment.
Business of course is interested in the
Once you have completed this session you will be able to define Availability Management and describe how it relates to other ITSM components.
You will be able to recognise the main elements of the Availability lifecycle and understand the terms MTBF, MTTR and MTBSI.
directly concerned about the availability of any components that might be vital in making up
that service.
You will appreciate the main responsibilities of the Availability Management process and be able to recognise several techniques which are
of use in this area.
S4AP2 - Introduction
and the systems that it is based on, by the reliability of the items in the infrastructure, by both corrective and preventative maintenance procedures - and also by our incident, problem
and change management procedures.
It is important for all staff involved to
service is
It is a fact that the IT Infrastructure is becoming ever more reliable - and hence Availability
levels are generally better than they have ever been. However, Availability Management is nonetheless a critical support process for Service Level Management.
Availability is now regarded as one of the most important issues for IT service management because, even though reliability has increased, so has the dependence of businesses on their
IT services.
understand that if a business
Availability Management supports Service Level Management by actively managing the availability of services. For example it assists the Service Level Manager in negotiating and monitoring service levels.
The Service Delivery manual states that:
We will now explore the relationships that exist between Availability and the various elements of the support organisation, such as Service
Level Agreements,
customers.
A customer will negotiate a Service Level Agreement with IT Services, and within the
SLA there will be statements about service
availability.
The goal of the Availability Management process is to optimise the capability of the IT Infrastructure, services and supporting organisation to deliver a cost effective and sustained level of Availability that enables the business to satisfy its business objectives.
The critical words here are 'cost effective'.
These statements might say that we expect 99% availability from a service measured over a one month period, or they may say we
expect no more than one hour's lost service over a four weekly period.
The business can have almost any availability it likes provided it is prepared to pay for it. One only has to look at the expenditure
They may say we expect no more than three breaks of service totalling one hour over a monthly period.
The definition of availability and the way we phrase that will be subject to local discussions. The current best practice view is to make this statement as business focused as possible and to think in terms of unavailability rather than availability.
There will then be a period of time that it takes to repair the faulty component - this is usually
referred to as the Mean Time To Recover or
MTTR.
The generic definition of availability is: "The ability of an IT service or component to perform its required function at a stated instant or over a stated period of time."
Related terms, which are also defined in the same section of the Service Delivery manual
Be very careful here as the R in this acronym can have a number of alternate meanings. We
have defined it as "Recover" but it is also
commonly taken to mean "Respond", "Repair" or "Restore". Imagine, for example, that the
failure is a crashed hard disk.
There will be a period of time that it takes to "Respond" to the incident, to get an engineer on site. Then there will be a further period during which the disk is being repaired or more likely replaced. Typically, it will then take some time to "Restore" the data to the point where
normal business can be resumed.
In this course we will be using the term "Recover" to encompass all of this - and the Mean Time To Recover is the average length
of time that all of this takes to achieve.
There are broadly two types of underpinning support, one through operational level agreements with internal suppliers, the other through underpinning contracts with external providers. In the case of internal support, such as application support, hardware support and so
on, the OLA will contain statements on
Be aware though, that it may be useful to understand these other measures as they are often captured by service management organisations to check on various aspects of the availability management process.
Once normal service has been recovered there
availability, reliability and maintainability of the components that this group is responsible for.
will then be a hopefully long period of time before the component fails again at time X2.
When we are talking about underpinning contracts the word 'serviceability' is used in ITIL as a contractual term to cover availability, reliability and maintainability when applied to components supported by external suppliers.
You can review a definition of each of the
Hence it is easy to see that the sum of the MTTR and MTBF will give what is called the Mean Time Between System Incidents or
MTBSI.
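As a minimal sketch of this relationship, the Python below adds an MTBF figure to an MTTR figure to give the MTBSI. The function names and the sample values of 98 and 2 hours are illustrative assumptions, not figures from the course. The availability ratio (MTBF divided by MTBSI) is the commonly used steady-state relationship rather than something stated in this section.

```python
def mtbsi(mtbf_hours, mttr_hours):
    """Mean Time Between System Incidents: uptime plus recovery time."""
    return mtbf_hours + mttr_hours


def availability_ratio(mtbf_hours, mttr_hours):
    """Fraction of each incident-to-incident cycle spent in service."""
    return mtbf_hours / mtbsi(mtbf_hours, mttr_hours)


# Hypothetical component: runs for 98 hours on average, takes 2 hours to recover.
cycle = mtbsi(98, 2)
ratio = availability_ratio(98, 2)
print(f"MTBSI = {cycle} hours, availability = {ratio:.2%}")
```

Raising the MTBF or lowering the MTTR both push the ratio towards 1, which is why reliability and maintainability are treated together.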
terms "availability", "reliability", "maintainability", "serviceability" and "security" by clicking on each of the buttons here.
S4AP6 - The Availability Lifecycle
We can now consider the relationships that exist between each of these three parameters and the terms Availability, Reliability and Maintainability that we have already discussed.
It is obvious that a high Mean Time Between
So imagine that we have a timeline with time running from left to right.
higher availability of the first part of the service than the second part.
ITIL refers to such business-critical functions
as Vital Business Functions or VBFs.
management team so that there are no delays between an incident being detected and repair
work starting.
widely used in IT Service Continuity Management and Availability Management within ITIL and is a way of highlighting the
services to which the business must have
allows the Cost of Unavailability of a service to be measured and reported. Such costs may be
incurred through revenue loss, or overtime
payments and so on, as we discussed earlier.
It is important to report on trends and to agree on the measurement period, for example,
"Service was available for more than 98% of
Typically, if we want to increase the overall availability either of a service or of an assembly of components, then this can be
done either by increasing the reliability of each component or the resilience of the assembly or by improving the maintainability and the procedural aspects.
Increasing the MTBSI and MTBF figures and reducing the MTTR will all cost money. There will be a limit as to how much we can spend to achieve high reliability and high resilience, and there will be a limit to how much we can spend to achieve instantaneous reporting and repair.
As we said at the start of this session, the
the agreed service hours during the last month" may be very useful when we're reporting against service levels in Service Level Agreements, which are often expressed in the same way. Trends are very important in the whole of service management. Service improvement programmes, for example, set out to move
business can have almost whatever availability it wants - provided that it is prepared to pay for
it.
Section 8.7.7 of the Service Delivery Manual uses what it calls an IT Availability Metrics Model (ITAMM) as a framework for deciding on the sort of reporting that needs to be done. Because it covers such a wide range, from details of component availability right through to services, it is a basis for all reporting both
internal and external.
S4AP8-Activity
Page 64 of the Little ITIL Book gives a useful listing of the responsibilities of the Availability Management process.
The first of these, concerning the optimisation of availability is self evident and much of this session concerns that particular point.
The second point is about determining availability requirements in business terms. It is very important to work with the service level manager and the customer so that their requirements for availability can be expressed in terms with which they feel comfortable.
In many ways the Availability Plan is analogous to the Capacity Plan and should take account of current levels of availability against the service level requirements, trends in terms of availability, new technological options and knowledge of the way business is developing. There is no absolute guideline on how far ahead the plan should look, but following the capacity management analogy, it would be reasonable to think in terms of one year at a time with a review at least every three months. The fifth item on the list of responsibilities is all about the collection, analysis and maintenance of availability data. Monitoring the various availability parameters can generate a large amount of data and
because of this it is not unusual to find an
They are often much more comfortable with discussing costs of unavailability in terms of money and time rather than percentages and
fractions.
meaningful technical terms for discussion with suppliers of underpinning services, both
internal and external.
Availability Management Database being created.
Conversely, technical information about availability, MTBFs, MTBSIs and so on, may need to be turned back into meaningful
business terms for the customer.
This may be either as a separate entity or by adding extra information to the Configuration Management database. Item six is arguably one of the most important areas and defines the role of the availability
manager.
The third point, Predicting and Designing for expected levels of availability and security, implies that availability management staff are involved in the systems development process right from the very beginning.
It is an ITIL recommendation that Availability Management staff should be involved when the business case is being created for a new or extended service and that they remain involved all the way through the analysis and design process. The aim is to ensure that the needs of management, including availability, maintainability and reliability, are built in along with security elements. This implies availability management staff having some familiarity with system development processes.
This is all about monitoring service availability against the Service Level Agreements, for the benefit of the service level manager.
The performance of internal and external
suppliers against the serviceability requirements in any underpinning contracts and targets defined in the Operational Level Agreements must also be monitored as part of this process.
The final point refers to the need for the Availability Management process to be continually looking for improvements on a proactive basis.
Management
In other words, not waiting for targets to be threatened before taking action, but to be constantly reviewing current status and looking for cost effective ways of improving availability.
As with many other of the ITIL processes this proactive work is critical but may be the last part of the process to be implemented.
achieve service levels in the area of
S4AP13 Process
S4AP12 Process
Section 8.3 of the Service Delivery manual describes the Availability Management process
in some detail.
intended to help the development teams decide on how to achieve high availability.
Details of the Availability techniques that will be deployed to provide additional Infrastructure
consequences of loss of availability are fully understood. This will help in determining priorities when setting up the Availability
Management processes for the first time.
The
monitoring
to
requirements
for
IT
in
components
ensure that
deviations
are in conception.
Incident and Problem data will also need to be
examined. Part of the proactive work will be to investigate incidents and problems and to see which of those are caused by unavailable equipment and what the impact of these
And finally, an Availability Plan for the proactive improvement of the IT Infrastructure.
S4AP14-Security
It can be argued that the most valuable assets of IT services are the data and the ability to
process that data.
Configuration data will be very important since that will show the relationships between configuration items and the chain of configuration items that makes up a typical
service.
Make sure that access is denied to
- Make sure that the assets are trustworthy. That is, maintain Integrity.
And, make sure that assets are available to
This may lead to some conflict and possible trade-offs. For example, high availability is not necessarily good if it compromises confidentiality or integrity.
User Down Time would be equal to four hours of downtime x 1 user, giving a value of 4. Therefore the overall availability would be (400 - 4) divided by 400, all times 100, giving a weighted availability of 99 percent.
Contrast this with the value given by the simpler basic calculation, which would be only
90%.
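The weighted figures quoted here can be reproduced with a short Python sketch. The scenario is an assumption reconstructed from the numbers: an Agreed Service Time of 40 hours and 10 users give 400 hours of End User Processing Time, and one user loses four hours. The function and parameter names are illustrative only.

```python
def weighted_availability(agreed_service_hours, total_users,
                          downtime_hours, users_affected):
    """Availability weighted by how many users actually lost service."""
    # End User Processing Time: agreed service time multiplied by all users.
    end_user_processing_time = agreed_service_hours * total_users
    # End User Down Time: downtime multiplied by the users affected.
    end_user_down_time = downtime_hours * users_affected
    return 100 * (end_user_processing_time - end_user_down_time) / end_user_processing_time


# Assumed scenario: 40 service hours, 10 users, 1 user down for 4 hours.
weighted = weighted_availability(40, 10, 4, 1)

# Basic calculation for the same outage, ignoring how many users were hit.
basic = 100 * (40 - 4) / 40

print(f"weighted: {weighted:.0f}%, basic: {basic:.0f}%")
```

The gap between the two results shows why the choice of calculation has to be agreed with the users before it is reported against.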
Within ITIL, availability aspects are the responsibility of availability management while the confidentiality and integrity issues are shared responsibilities with security
management.
Within an organisation, it may well be that the whole responsibility for CIA is devolved to the availability management team. It is very important that such responsibilities are
clarified.
It's important to note that whichever way of calculating availability is chosen has to be agreed with the users before it can be used as the mechanism that we measure and report
on.
S4AP15 - Techniques, Calculating Availability %
S4AP16 - Techniques, Absolute Availability
Percentage availability may not always be the most useful measure from a business point of
view.
One of the most basic techniques used in Availability Management is the calculation of availability in terms of a percentage.
The basic calculation is straightforward: the availability of a service, of an individual component or of a grouping of components is given by the agreed service time minus the downtime, divided by the agreed service time, all times 100 to obtain a percentage value. Note that component availability is often expressed as a decimal value - always less than one - rather than as a percentage.
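The basic calculation can be sketched in a few lines of Python. The function name and the sample figures (40 agreed service hours, 4 hours of downtime) are assumptions chosen for illustration.

```python
def availability_percent(agreed_service_hours, downtime_hours):
    """(agreed service time - downtime) / agreed service time, times 100."""
    if agreed_service_hours <= 0:
        raise ValueError("agreed service time must be positive")
    return 100 * (agreed_service_hours - downtime_hours) / agreed_service_hours


# Hypothetical week: 40 agreed service hours with 4 hours of downtime.
percent = availability_percent(40, 4)
decimal = percent / 100   # the decimal form often used for components
print(f"{percent:.1f}% or {decimal:.2f}")
```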
In order to take account of the fact that one
Absolute figures of up-time and down-time over an agreed period might be more appropriate and may be more acceptable for
the business.
So for example we could say that there were four hours of downtime out of 400 potential service hours in the last week, and that may be a more useful measure than turning that into a percentage value. This is all about agreement and trust between customer and supplier and whichever figures
are chosen should be the ones most
user losing access to the system is significantly less serious than 100 users all losing access, a weighted calculation can sometimes be more meaningful.
meaningful to the business. It is very important to understand and be consistent in the use of reporting periods.
For example, an availability of 99% for a service to be achieved on each and every day
is much more demanding than the same percentage averaged over a year long reporting period.
End User Processing Time is defined as the Agreed Service Time multiplied by the total number of users (Nt).
It is possible to achieve 99% availability whilst losing service for perhaps two whole days in the year. In order to achieve 99% on a daily basis, the allowable downtime on any one day would have to be reduced to just a
few minutes.
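To see how strongly the reporting period affects the target, the sketch below converts an availability percentage into an allowable downtime budget. It assumes a 24-hour agreed service day and a 365-day year, which the course does not specify, and the function name is illustrative.

```python
def allowable_downtime_hours(target_percent, period_hours):
    """Downtime that can be lost in the period while still meeting the target."""
    return period_hours * (100 - target_percent) / 100


# Assumption: service is agreed for 24 hours a day, every day of the year.
per_day = allowable_downtime_hours(99, 24)
per_year = allowable_downtime_hours(99, 24 * 365)

print(f"99% measured daily:  {per_day * 60:.1f} minutes per day")
print(f"99% measured yearly: {per_year:.1f} hours per year")
```

Measured over the year, the whole budget could be spent in one outage lasting days, while the daily target caps any single day's loss at minutes.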
For example, does it include downtime for maintenance? Is that already factored in?
In most cases we would not want to be
Calculating End-to-End availability for items arranged in parallel is a little more complicated
- as shown.
downtime
for
So for the same two components now arranged in parallel - the resulting End-to-End availability will be 99%.
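A minimal Python sketch of the parallel case follows. It assumes the components fail independently and that the service stays up while at least one of them is working; the function name is illustrative.

```python
from math import prod


def parallel_availability(component_availabilities):
    """Availability of items in parallel, each given as a decimal below 1.

    The assembly is down only when every component is down at once,
    so we multiply the unavailabilities and subtract from 1.
    """
    return 1 - prod(1 - a for a in component_availabilities)


two_in_parallel = parallel_availability([0.9, 0.9])
three_in_parallel = parallel_availability([0.9, 0.9, 0.9])
print(f"{two_in_parallel:.1%} with two, {three_in_parallel:.1%} with three")
```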
In 24/7 systems however, where the requirement is for very high availability, the
figures often do include and are meant to include any time for maintenance, which will
need to be reduced to an absolute minimum.
Again it is easy to see that, unlike components arranged in series, the more CIs that are put in
parallel then the higher will be the overall
The pattern of downtime may also be critical and will need to be understood. For example, depending on business circumstances, 10
losses of service each of 10 minutes duration
may be more damaging than a single loss of service for 100 minutes for the same period of
time.
interest will nearly always be based around services and not around components. However, internal reporting for service improvement purposes and for supplier management mechanisms will often require reporting at the component level.
S4AP18-Activity
S4AP19 - Techniques, Multiple CIs
manufacturers' engineering specifications, other similar installations and your own experience gained during testing or development.
Using a combination of those three sources will tend to give realistic values for the availability of individual components.
The formula for calculating end-to-end availability for items arranged in series is fairly simple.
Once an initial base of figures has been established then monitoring of availability over a period of time using monitoring tools and
records from the service desk of incidents can
The overall availability AT is equal to the product of the availability of each of the individual components. So if we have two components, each of which is capable of delivering 90% availability - the End-to-End availability of the assembly will be
0.9 times 0.9 or 81%. In other words,
in
the
Techniques,
Analysis
Techniques
significantly less than each of the components making up the assembly. It is easy to see from this formula that the more items that are put in series, the lower will be the End-to-End availability figure.
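The series calculation can be sketched in Python as below. It assumes independent components, each expressed as a decimal availability; the function name is illustrative.

```python
from math import prod


def series_availability(component_availabilities):
    """End-to-end availability of items in series: the product of each one."""
    return prod(component_availabilities)


two_in_series = series_availability([0.9, 0.9])
three_in_series = series_availability([0.9, 0.9, 0.9])
print(f"{two_in_series:.1%} with two, {three_in_series:.1%} with three")
```

Adding a third 90% component lowers the result again, which is the effect described above.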
Finally, there are a range of techniques designed to aid understanding of why availability problems are occurring in particular parts of the infrastructure and to find corrective ways of working.
The first of these techniques that we will look at is Component Failure Impact Analysis or
CFIA.
This is represented normally in a matrix showing configuration items against the services supported.
Risk analysis can be done in a variety of ways. The way that's favoured in ITIL because it originally comes from the same development
source, is known as CRAMM, CCTA Risk
For example, here we can see that service 'B' is dependent on all four of the CIs 1 to 4 being available, whilst service 'D' only requires items
3 and 4.
It is important to realise that the CFIA matrix can be used by either reading down the columns or across the rows to give us different
information. If 'B' is a service that has vital business
S4AP23 - Techniques, Analysis Techniques
One of the key requirements of availability management is to be able to achieve an understanding of why a particular lack of availability is occurring and what to do about it. There are a couple of techniques that can help us here: System Outage Analysis, or SOA, and Technical Observation Posts, or T.O.P.
functions within it, then it becomes critical to understand, at a more detailed level, how those VBFs are dependent on the components.
As a first pass analysis of dependency and understanding of where single points of failure could be critical, CFIA is very useful.
So in the example shown, CI3 is a very good candidate for attention, such as replacement with a more reliable item or duplication by the addition of a parallel assembly as a replacement for the single component.
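A CFIA matrix can be held as a simple mapping of services to the CIs they depend on. In the sketch below the rows for services B and D follow the example in the text; the rows for services A and C are invented purely for illustration, as are the function names.

```python
# Service -> CIs it needs. B and D follow the course example;
# A and C are hypothetical rows added to fill out the matrix.
CFIA_MATRIX = {
    "A": {"CI1", "CI3"},                 # hypothetical
    "B": {"CI1", "CI2", "CI3", "CI4"},
    "C": {"CI3"},                        # hypothetical
    "D": {"CI3", "CI4"},
}


def services_hit_by(ci):
    """Read down a CI column: which services fail if this CI fails?"""
    return sorted(s for s, cis in CFIA_MATRIX.items() if ci in cis)


def impact_ranking():
    """CIs ordered by how many services depend on them, highest first."""
    all_cis = sorted(set().union(*CFIA_MATRIX.values()))
    return sorted(all_cis, key=lambda ci: -len(services_hit_by(ci)))


print(services_hit_by("CI3"))   # services exposed to a CI3 failure
print(impact_ranking())         # candidates for extra resilience, worst first
```

Reading across a row gives the CIs a service needs; reading down a column, as `services_hit_by` does, shows which services a single failure would take out.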
SOA involves a detailed analysis of service interruptions. It is really a post-mortem about some of the more major incidents that have occurred in the infrastructure and trying to find some common underlying theme or cause for the availability losses.
'B' to run, either component 3 or component 4 need to be there but not necessarily both.
This may require some extension to the notation - which is often home grown or company-specific and which is beyond the scope of this course.
S4AP22
tends to be managed as a small project with a particular budget and reporting period.
Setting up a Technical Observation Post or T.O.P. is an expensive process because it
involves bringing together a team of people to look at a service at a vulnerable period of its
life.
Techniques,
Analysis
If, for example, we know that on a monthly basis there are availability problems while
assembling data for end-of-month financial
work, then a Technical Observation Post might be set up to look at this particular process.
This is a diagrammatic technique drawn initially from the world of engineering, which identifies the chain of events leading to service
failure.
It is part of a family of techniques generally referred to as Failure Mode & Effect Analysis
or FMEA.
acceptance from the business that the only way of finding and resolving the issue is by allowing some availability losses to occur.
It is worth noting that in addition to the techniques that we have discussed in this section, the Availability Management process will support and work closely with proactive problem management. So many of the same
techniques used in Problem Management may also help with identifying the underlying
reasons for lost availability.
S4AP24 - Benefits & Problems
The benefits of and potential difficulties with Availability Management are listed on page 68 of
the little ITIL book and in Section 8.3.5 of the
S4AP25 - Summary
The main responsibilities of the Availability Management process have been defined and several techniques which are of use in this
area have been introduced.
Session 4B Capacity Management
S4BP1 - Objectives
technical and business capabilities. The day-to-day activities include dealing with technical specialists and service level managers.
In this session we will be examining Capacity Management, which is covered in Chapter 6 of the Service Delivery book in the IT Infrastructure Library.
It's not usual for the Capacity Manager to communicate with customers, or to be responsible for procurement of new
The Capacity Management Process can be regarded as something of a balancing act. The organisation must provide enough capacity to
meet justified business demands, balanced
In order that Service Level Agreements are met, it is critical that sufficient capacity is available at all times to meet the agreed business requirements.
Capacity Management ensures that IT processing and storage capacity provision matches the evolving demands of the business in a cost effective and timely manner. Of all the ITIL processes this can be regarded as one of the most proactive. ITIL defines Capacity Management's goal as:
'To understand the future business
and future capacity and performance aspects of the business requirements are provided cost effectively.'
There are two 'laws' associated with Capacity Management, which offer an insight into the demands placed on this process. The first is 'Moore's Law', which suggests that 'processing capacity doubles every 12 to 18 months'. The second is a variation on 'Parkinson's Law', which states that data expands to fit the space available for storage. This highlights a second 'capacity' problem, that of supply and demand. As greater capacity becomes available, users will make use of it.
There is continual pressure from the business and customers to increase capacity, but in doing so there are costs incurred to the business. Ultimately, a decision has to be made over whether the cost of capacity provision provides enough business benefit:
The Capacity Management process incorporates Performance Management, Capacity Planning, and monitoring and tuning activities. In a large organisation there may be many people working in a Capacity Management team under the leadership of a specialist. In smaller organisations it might be the role of a single individual who is supported by technical specialists from Networking, desktop and so on.
However, Capacity Management must justify the cost of any capacity increases. Broadly speaking the objective is to provide:
The right capacity, enough but not too much
At the right cost
And critically, at the right time
In theory, if Capacity Management processes are running well, providing the right level of capacity at the right time, then they should be invisible to the business, and to most aspects of Service Level Management.
S4BP4-Activity
S4BP5 - Scope
Capacity Management is also involved in the reduction of capacity or as it is sometimes known, 'managing shrinkage'. In any organisation the capacity of certain components may be being reduced whilst the capacity of others may be being increased.
breached because of capacity problems, and tries to improve scarce resource utilisation through the use of Demand Management.
also ensures that these resources, or
An example of this might be where a mainframe-based environment is gradually being replaced by a distributed service. The capacity requirements on the mainframe will be falling while the capacity requirements on the servers will be increasing rapidly.
As we mentioned earlier in this session,
providing capacity to the business at the right time is critical. If capacity upgrades are too late then the infrastructure could fail. Failures might already be occurring; for example, through incidents and complaints reported to the Service Desk, or internal monitoring tools might indicate that we are operating close to capacity.
Buying in extra capacity at short notice leaves little negotiating power with external suppliers and as such is likely to be very expensive. Conversely, upgrading the infrastructure to increase capacity, only to find that it is under-used, could in itself lead to financial problems.
S4BP6 - Sub-Processes
The Capacity Management process has a number of ongoing, iterative activities. These activities include: monitoring, analysis, tuning and implementation, and are carried out in Resource Capacity Management and Service Capacity Management. They are not normally used in Business Capacity Management, except during business reporting. The monitoring activity should include the monitoring of thresholds, and baselines or profiles of the normal operating levels.
Thresholds and baselines are set from the
analysis of previously recorded data; they are the 'yardstick' by which Capacity Management
can measure utilisation of IT infrastructure
Capacity Management consists of three inter-related sub-processes, each working at different levels in the organisational structure. The three sub-processes are Business Capacity Management, Service Capacity Management, and Resource Capacity Management.
configuration items.
All thresholds should be set below the level at
targets in an SLA. For example, a threshold might specify that the usage on any individual
CPU does not exceed 80% for a sustained
Business Capacity Management focuses on the future services required by the business and tries to predict future capacity. This process is responsible for the production of a Capacity Plan, which is intended to forecast the future requirements for resources to support IT Services that underpin the business
activities.
In addition to exception reports, monitoring will also produce trend reports on a daily, weekly
or monthly basis. Trend reports are intended to help predict future threshold breaches.
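As a rough illustration of how monitoring might raise an exception report, the sketch below flags any run of consecutive CPU readings above a threshold. The 80% figure is the one quoted in this session; the function name, the sample readings and the `min_run` length that stands in for 'sustained' are assumptions made purely for the example.

```python
from typing import List, Sequence


def sustained_breaches(samples: Sequence[float],
                       threshold: float = 80.0,
                       min_run: int = 3) -> List[int]:
    """Return the start index of every run of at least `min_run`
    consecutive samples that exceed `threshold`.

    `min_run` is a hypothetical stand-in for a 'sustained' period.
    """
    starts: List[int] = []
    run_start = None
    for i, value in enumerate(samples):
        if value > threshold:
            if run_start is None:
                run_start = i          # a new run above the threshold begins
        else:
            if run_start is not None and i - run_start >= min_run:
                starts.append(run_start)
            run_start = None           # the run (if any) has ended
    # A run that is still open at the end of the data also counts.
    if run_start is not None and len(samples) - run_start >= min_run:
        starts.append(run_start)
    return starts


cpu_percent = [62, 85, 78, 83, 88, 91, 79, 95]   # made-up readings
print(sustained_breaches(cpu_percent))
```

A single high reading is ignored here; only a run that lasts at least `min_run` samples would appear in the exception report.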
Monitoring leads on to the analysis activity, where the monitoring data is analysed to try to identify and classify problems. Analysis then leads on to reporting, and then tuning,
where the problems are addressed, and the
able to gather medium term plans and predictions about growth or shrinkage.
Service Capacity Management is concerned with the services currently in place to support
Once a tuning decision has been made it is implemented through the change management process. Finally the activity returns to
Ongoing, the day-to-day activities; Ad hoc, carried out as a result of a particular need; and Regular, which are carried out at fixed intervals.
monitoring, and the iteration begins again. Note that tuning is an optional activity. If no problems are identified in analysis, then tuning will be unnecessary. Tuning is an expensive
activity, as it involves high levels of skill.
Among the ongoing iterative activities, are those of Monitoring, Analysis, Tuning and
Implementing, which we looked at earlier in the
session.
Tuning can improve service delivery without incurring costs associated with equipment
purchase. However, using skilled resources will incur costs, particularly if they are sourced
from outside the business.
services don't clash at times of peak demand. Any excess demand can be controlled by Demand Management, or by sharing capacity. We will be looking at Demand Management in
more detail, later in this session.
Another 'on-going' Capacity Management activity is providing data to the Capacity Management Database or CDB. As you can see in the diagram, all of the other on-going
and ad hoc Capacity Management activities
provide information to the CDB.
Importantly, tuning should be carried out initially in a test environment. Only when we are confident that the change will be a benefit to the business, should it be implemented through the conventional change management
process.
data can be extremely useful for other ITIL processes, particularly IT Services Financial Management.
The CDB is the cornerstone of a successful
S4BP8-Activity
S4BP9 - Activities
In the next few pages we will look at all of the capacity management activities in more detail, and how they relate to each of the Capacity Management sub-processes of Business
Capacity Management process. Data in the CDB is stored and used by all the sub-processes of Capacity Management, because it is the repository that holds a number of different types of data, including business, service, technical, financial and utilisation data.
Capacity Capacity
Remember Business Capacity Management is concerned with future business requirements for IT services, its planning and timely implementation.
Service Capacity Management is responsible for ensuring that the performance of all services detailed in SLRs and SLA targets is monitored, measured, recorded, analysed and reported.
Resource Capacity Management monitors and measures the individual components in the IT
infrastructure.
Another on-going Capacity Management activity is Demand Management. The main objective of Demand Management is to influence the demand for computing resource
and the use of that resource.
This activity can be carried out as a short-term measure because there is insufficient capacity to support the current workload, or as a deliberate policy of IT management to limit the required IT capacity in the long term. Short-term demand management might be needed if there is a partial failure of a critical resource in the IT Infrastructure.
Service
The Capacity Management activities can be subdivided into three groups based on their frequency, and these are:
These modelling techniques vary in complexity and consequently cost, with Trend Analysis being the simplest and cheapest, and benchmarking being the most complex and expensive. Let's look briefly at each of these modelling types.
S4BP13 - Ad-Hoc Activities
Physical constraints might involve restricting the number of concurrent users to a specific
resource, a network router for example.
The Trend Analysis technique looks at various data over a period of time and attempts to draw a smooth curve through these figures, extrapolating the graph data forward into the future, as a way of predicting future trends.
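To make the idea of extrapolating a trend concrete, here is a minimal sketch that fits a straight line (the simplest possible 'curve') to equally spaced utilisation figures and estimates when a limit would be reached. The function names, the monthly disk figures and the 90% limit are all assumptions for illustration; a real trend analysis might well use a different curve.

```python
from typing import Optional, Sequence, Tuple


def fit_line(ys: Sequence[float]) -> Tuple[float, float]:
    """Least-squares slope and intercept for ys observed at x = 0, 1, 2, ..."""
    n = len(ys)
    if n < 2:
        raise ValueError("need at least two observations")
    mean_x = (n - 1) / 2
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in range(n))
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in enumerate(ys))
    slope = sxy / sxx
    return slope, mean_y - slope * mean_x


def periods_until(ys: Sequence[float], limit: float) -> Optional[float]:
    """Periods after the last observation at which the fitted line reaches
    `limit`, or None if the trend is flat or falling."""
    slope, intercept = fit_line(ys)
    if slope <= 0:
        return None
    x_hit = (limit - intercept) / slope
    return max(0.0, x_hit - (len(ys) - 1))


disk_used_percent = [50, 54, 58, 62, 66]   # hypothetical monthly figures
print(periods_until(disk_used_percent, limit=90))
```

The further forward the line is projected, the less it should be trusted, which is one reason this is regarded as the simplest and cheapest of the modelling techniques.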
Financial constraints might involve the use of differential charging, such as charging customers a premium to use network bandwidth during peak hours of demand.
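Differential charging can be pictured with a short sketch like the one below, which applies a premium to network usage that falls within peak hours. The base rate, the premium multiplier and the 9am to 5pm peak window are hypothetical values chosen only to show the mechanism.

```python
from typing import Mapping


def bandwidth_charge(usage_gb_by_hour: Mapping[int, float],
                     base_rate: float = 0.10,
                     peak_multiplier: float = 1.5,
                     peak_hours: range = range(9, 17)) -> float:
    """Charge for network usage, applying a premium during peak hours.

    All rates and the peak window are hypothetical example values.
    """
    total = 0.0
    for hour, gigabytes in usage_gb_by_hour.items():
        # Usage inside the peak window is billed at the premium rate.
        rate = base_rate * peak_multiplier if hour in peak_hours else base_rate
        total += gigabytes * rate
    return total


usage = {10: 100.0, 22: 100.0}   # hour of day -> GB transferred (made-up data)
print(round(bandwidth_charge(usage), 2))
```

The same volume of traffic costs more at 10am than at 10pm, which gives customers a financial reason to move non-urgent work away from the peak.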
Analytical Modelling uses mathematics to represent computer system behaviour. Typically a model is built using a software package, which can recreate a virtual version
of a computer system. When the software is executed, 'queuing theory' is used to calculate response times, and if virtual response times are sufficiently
close to those recorded in the 'real life' IT
Demand Management must be carried out sensitively, without causing damage to the business, customers, or the reputation of the IT organisation. It is essential that customers are kept informed of all the actions being
taken.
Modelling is an example of an ad hoc activity, which is used in all Capacity sub-processes. Modelling tries to predict the behaviour of components and services under a given volume of work, particularly at peak times. It then tries to understand the way in which
current service and resources are used, and
Although Analytical modelling requires less time and effort than other modelling types, typically the end results are less accurate. Simulation modelling involves the modelling of discrete events, in other words what actually happens millisecond by millisecond, as a transaction passes from local PC through the
local area network, to server and so on.
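The course does not say which queuing model an analytical modelling package would use. As one assumed example, the sketch below applies the textbook single-server (M/M/1) formula, in which the average response time is 1 / (service rate - arrival rate). The function name and the transaction rates are illustrative only.

```python
def mm1_response_time(arrival_rate: float, service_rate: float) -> float:
    """Average response time (queueing plus service) for an M/M/1 queue.

    Rates are in transactions per second. The M/M/1 model is an assumption
    made for this example, not something prescribed by the course.
    """
    if arrival_rate < 0 or service_rate <= 0:
        raise ValueError("rates must be positive")
    if arrival_rate >= service_rate:
        raise ValueError("queue is unstable: arrivals meet or exceed capacity")
    return 1.0 / (service_rate - arrival_rate)


# A hypothetical server that can complete 10 transactions per second.
for arrivals in (5, 8, 9):
    utilisation = arrivals / 10
    print(f"{utilisation:.0%} busy -> {mm1_response_time(arrivals, 10):.2f} s")
```

Response time climbs steeply as utilisation approaches 100%, which helps explain why thresholds are set well below full capacity.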
the impact of that usage on the IT infrastructure. It attempts to predict the future from our knowledge of the past. In order to do
this we establish a 'baseline' model.
This type of modelling can be very accurate in predicting the effect of changes, but it is time consuming, and therefore costly.
However, Simulation Modelling can be cost
The baseline model reflects accurately the performance that is being achieved. Once a baseline is created, predictive modelling can
be done.
We can ask the 'what if?' questions about planned changes to the IT infrastructure. If the baseline model is accurate then the results of the predicted changes should also be accurate.
Finally
building a replica of part of the IT infrastructure and measuring such things as its response to a reduced workload, and extrapolating these results, to see how it would perform under the
'real' workload.
Because Benchmarking involves the purchase of equipment, building software and simulating
significant workloads, this is the most
and Benchmarking
Another ad hoc Capacity Management activity is Application Sizing. The primary objective of
Application sizing is to estimate the resource requirements to support a modified or new application, and to ensure that it meets its required service levels.
Application sizing has a finite lifespan. It is initiated at the beginning of a new application, or when there is likely to be a major change to an existing one. Application sizing is complete when the completed application is accepted
into the operational environment.
This activity is performed together with colleagues in system and service development, to ensure that we are fully aware of the likely impact of services being developed, designed or purchased, before they are implemented.
This provides Capacity Management with important data on future resource requirements, which can be integrated into the Capacity Plan, as well as providing valuable information for purchasing and for the development team.
Finally, a 'regular' Capacity Management activity is the production of a Capacity Plan, which is typically created annually. Information gained from the activities of monitoring, demand management, modelling and application sizing will contribute to the production of a Capacity Plan.
We will be looking at the Capacity Plan in more
detail later in this session.
and cost effective service level quality clauses. Charging and costing recommendations are
also produced.
SCM and RCM will be suggesting 'proactive changes' and 'Service Improvements', to improve levels of capacity, or reduce costs, preferably both! Carrying out 'Effectiveness Reviews' and creating 'Audit Reports' form a basis for checking that business benefits are being achieved, and that the process users are following the 'rules'.
S4BP15-Activity
S4BP17 - The Capacity Management Database
S4BP16 - Inputs & Outputs
To fully appreciate the scope of Capacity Management, we need to consider the major inputs and outputs to the process, and how these relate to the sub-processes of Business, Service and Resource Capacity Management.
Although the Capacity Management Database is represented in the ITIL guidance as a single entity, it is unlikely to exist in this form in many organisations.
The main reason for this is that much of the data held in a CDB is common to that in a fully integrated Configuration Management Database; therefore, there is an argument for making the CDB part of the CMDB.
Inputs to the BCM sub-process include the external suppliers of new technology, existing service levels and current SLAs, along with proposed future services and related SLRs. Other important inputs to BCM include the Business Plans, and any strategic plans together with IS and ICT plans.
Software tools used by Capacity Management may have partial CDB functionality designed in to them. If this information is accessible by
are the key to its success. Input from the business includes the 'business strategy' and the business plan.
Service Management will provide information about SLAs and a full definition of the quality processes in place.
Data about manufacturers' specifications for existing and new technology will be provided by the technical teams. And finally, the IT Financial Management team will provide fiscal data. Additional financial information will be provided from the CMDB, in its role as a 'super' asset register.
S4BP18 - The Capacity Plan
A corporate Capacity Management process ensures that the entire organisation's capacity requirements are catered for. However, making the process work successfully depends on several critical factors. These include:
Accurate business forecasts.
Working closely with other Service Management processes, for example Problem and Change Management.
Effective financial management.
Links to Service Level Management - to ensure that any business commitments are realistic.
And finally, the ability to plan and implement the appropriate IT capacity to match business needs. This provides a longer-term proactive view.
S4BP20 - Benefits & Problems
A Resource Summary - which will show what has happened to particular components over the last year and since the last Capacity Plan. The Capacity Plan will also include suggestions for cost effective improvements.
A Cost Model will recommendations
The benefits of and potential difficulties with Capacity Management are listed on Page 57 of
the little ITIL book and in Section 6.4 of the
S4BP21 - Summary
In this session we have been looking at the ITIL process of Capacity Management.
three
Capacity
One final note. Remember that the Capacity Plan should be updated regularly, in line with any revised business plan, or unexpected
changes in the IT infrastructure.
We highlighted the major inputs and outputs of the Capacity Management process, and defined the contents of the Capacity Database and the Capacity Plan. We concluded the session by defining the critical factors for successful Capacity Management implementation.
Management
S5AP1 - Objectives
In this session we will be examining Service Level Management, which is covered in Chapter 4 of the Service Delivery book in the IT infrastructure library. When you have completed this session you
will be able to:
dependency on IT for successful business operation. Hence they feel an increased need
to formalise the contractual basis on which IT
Service Level Management can help. Often, Service Level Management is a driver for a Continuous Service Improvement Programme (CSIP), also known as a Service Improvement Programme (SIP).
It is the responsibility of Service Level Management to be aware of service
"To maintain and gradually improve business aligned IT service quality, through a constant cycle of agreeing, monitoring, reporting and reviewing IT service achievements and through instigating actions to eradicate unacceptable levels of service."
There are a number of ways that IT services can be provided - each having their merits and
draw-backs.
Service Level Management exists to ensure that service targets, such as availability of services, response times and so on, are agreed and documented in a way that the
business understands.
In the simplest scenario there is an external provider of the IT service and a customer organisation. Services are provided on the basis of a contract between these two parties.
Service Level Agreements, which are managed through the Service Level Management Process, provide specific targets
Whilst this has the benefit of simplicity, it's a risky strategy and one that generally leads to poor support for the users and poor value for money for the corporate customer. The next approach is often said to involve an "intelligent customer" role. That is, somebody who negotiates on behalf of the customer with suppliers for service delivery. That customer has a Service Level Agreement with the
Service Level Management process, and the service is underpinned by an 'Underpinning Contract' with the suppliers. In this situation, the internal IT department adds little or no value. Such arrangements are common where an 'off-the-shelf' package solution is being provided by the supplier.
of service requirements. So by producing SLAs at the Customer Group level the number required could be reduced to 500 - more manageable but still excessive. There are a number of ways in which this problem can be overcome - perhaps the most common one being the mapping of services
onto customer groups.
the Service Level
In order for that service to be provided, it is necessary for the Service Level Management team to establish 'Operational Level Agreements' with their own internal IT departments, who in turn may have an 'Underpinning Contract' with the external suppliers of the various components. Note that for any one service there may be several Operational Level Agreements and several Underpinning Contracts. Finally, although it is much less common, the whole process can be purely internal, and no external contracts are therefore required. So the Customer has a Service Level Agreement with Service Level Management and they have an OLA with the internal IT department - and
that's it.
So a particular service, say Service A, will be provided in a generalised format to Customer Groups 2 and 4. And in a similar way, Service D will be provided to Customer Groups 1 and
2.
This allows us to have just one SLA per service - so 50 in our previous example. The drawback of this approach is that it tends to make each SLA more complicated, since they may have to cater for the fact that not all groups covered by a service have exactly the same requirements. If there are geographical differences between the groups as well, then this will also add to the complexity. Despite this problem, this is the most common approach that you are likely to encounter.
S5AP8 - SLA Structure, Customer Based
This last arrangement is fairly unusual because most systems will depend on some external supply. It is, on the other hand, quite common for a total service to be provided on the basis of a combination of two or more of these strategies.
S5AP7 - SLA Structure, Service Based
An alternative approach is to turn the previous model on its head and map Customer Groups
onto Services.
One of the early decisions that has to be made is the structure of the SLA procedure - which is a major determinant of how many SLAs will end up being produced.
For example, if we had 1000 customers and 50 services we could theoretically produce 50,000
Here, for example, Customer Group 2 receives three services; however, they would have just one SLA, admittedly quite a complex one, detailing how they would receive Services A, C & D.
be dramatically reduced - in our previous example with 10 customer groups and 50 services, we would end up with only 10 SLAs.
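The arithmetic behind the different SLA structures can be checked with a few lines of code. This sketch uses the figures from the running example (1000 customers, 10 customer groups and 50 services); the function and dictionary key names are simply labels chosen for the illustration.

```python
from typing import Dict


def sla_counts(customers: int, customer_groups: int, services: int) -> Dict[str, int]:
    """Number of SLAs needed under each structure discussed in the text."""
    return {
        "one per customer per service": customers * services,
        "one per customer group per service": customer_groups * services,
        "service based (one per service)": services,
        "customer based (one per customer group)": customer_groups,
    }


for structure, count in sla_counts(1000, 10, 50).items():
    print(f"{structure}: {count}")
```

The fewer SLAs a structure produces, the more each individual agreement has to cover, which is the trade-off described for the service based and customer based approaches.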
Level Agreements would be authorised at the next management level down in each of these departments.
The general principle is that SLAs are authorised by paying customers on behalf of users in their part of the organisation.
S5AP10 - What is an SLA?
In their structure SLAs are rather like contracts, but they are not in themselves legal documents. However, they can be included in a legal contract, particularly when establishing SLAs directly with external suppliers. In such cases an SLA would be included in the contract as a schedule.
An SLA which is used internally between departments has no legal weight; it is simply a document that has a contractual structure to it.
Each of the SLAs produced at this level is a description of the services for a particular group of customers. So in our previous
example there would be 10 SLAs at this level.
At this level SLAs would contain everything that was common for that particular group of customers, but different from the generic services that appeared in the higher Corporate
level.
always be written in unambiguous business language, and shouldn't contain any technical references which make its intention unclear and leave the Business feeling uncomfortable authorising the agreement.
representing each service used by that customer, and relevant to that particular customer group. It only contains information which differs from the corporate customer level clauses.
So we have established what constitutes an SLA. What then is an OLA or Operational Level Agreement? Well, in simple terms OLAs are agreements that define the internal IT arrangements that support SLAs.
OLAs are also known as back-to-back
Consequently we would have a larger number of SLAs, but each would be relatively short. This in itself makes change management
easier.
is to define the relationship between Service Desk and internal support groups.
OLAs are required to ensure that the SLA targets agreed between customer and IT provider can be delivered in practice. They describe each of the separate components of the overall service delivered to the customer, often with one OLA for each support group and a contract for each supplier.
A further additional contract exists to ensure
until 7pm, to 9am until 9pm, then that change would only appear in the corporate level SLA.
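One way to picture the hierarchical structure is as a layered lookup, where a clause is taken from the most specific level that defines it. The sketch below uses Python's `collections.ChainMap` for this. Apart from the 9am to 9pm service hours mentioned above, every clause name and value is a hypothetical example.

```python
from collections import ChainMap

# Hypothetical clauses held at each level of the hierarchy.
corporate_level = {"service_hours": "09:00-21:00", "first_contact": "Service Desk"}
customer_level = {"reporting_cycle": "monthly"}          # one customer group
service_level = {"availability_target_percent": 99.5}    # one service for that group


def effective_sla(service: dict, customer: dict, corporate: dict) -> ChainMap:
    """Look clauses up at service level first, then customer, then corporate."""
    return ChainMap(service, customer, corporate)


sla = effective_sla(service_level, customer_level, corporate_level)
print(sla["service_hours"])                 # inherited from the corporate level
print(sla["availability_target_percent"])   # defined at the service level
```

Because the lower levels hold only what differs, a change to the corporate service hours is made in one place and is seen by every customer and service beneath it.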
It is important when using the hierarchical structure, that the correct level of authority is assigned to each level. For example, at Corporate level the document would be authorised at the highest management level liaising with IT. Customer level documents might be authorised by Department Heads, Finance, Planning, HR and so on. Individual Service
Underpinning contracts are put in place with external suppliers or vendors. It's important that all targets contained within both SLAs, and OLAs that rely on these external suppliers are
'underpinned' by the appropriate level of maintenance and support contracts. For example, an internal software development team might have in place an OLA between themselves and Service Level Management. This OLA offers, amongst other things, a guaranteed response time to serious problems.
published to potential customers, and the wider business as a whole, in a more 'glossy'
format.
In order to establish their exact requirements, the customer develops a Service Level Requirement document. When doing so, the customer should be realistic about potential
levels of service, and related costs. Remember this is not a wish list, and sensible
advice should be offered from the Service
In order to guarantee these service levels, the software development team might have an underpinning contract in place with their development software vendor, ensuring that problems can be resolved well within this guaranteed response time. A word of warning here: it's critical that any commitments made in an OLA are directly supported by the underpinning contract. For example, committing to a 4 hour fix time in an OLA would be useless if our underpinning contract only commits our supplier to a 6 hour fix time!
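The warning about fix times lends itself to a simple consistency check. In this sketch, the `Commitment` class, the function name and the party names are invented for illustration; the 4 hour and 6 hour figures are the ones used in the example above.

```python
from dataclasses import dataclass
from typing import List


@dataclass(frozen=True)
class Commitment:
    party: str
    fix_hours: float


def unsupported_by(ola: Commitment, contracts: List[Commitment]) -> List[Commitment]:
    """Underpinning contracts whose fix time is slower than the OLA promises."""
    return [uc for uc in contracts if uc.fix_hours > ola.fix_hours]


ola = Commitment("Software development team", fix_hours=4)
contracts = [
    Commitment("Development software vendor", fix_hours=6),   # too slow
    Commitment("Hardware maintainer", fix_hours=2),           # comfortably within
]
for uc in unsupported_by(ola, contracts):
    print(f"{uc.party}: {uc.fix_hours}h cannot support a {ola.fix_hours}h OLA")
```

A stricter check might also flag contracts that only just match the OLA, since the supplier's fix needs to land well within the time promised internally.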
Level Management team. There is no specific format for SLRs, and each organisation will document them in its own way. It is important to remember that these documents, along with SLAs, OLAs and UPCs, are all subject to the ITIL Change Management Process.
S5AP13-Activity
S5AP14 - Sub-Processes
In the last few pages we have been looking at those agreements and contracts that form an important part of Service Level Management.
But how do we establish which services are
In the next few pages we will look in some detail at the Service Level Management sub-processes. These sub-processes can be grouped into 4 stages as shown.
So let's look at these 4 stages individually, and see how they fit together to form a complete Service Level Management process. The first stage is Initial Generic. The first activity at this stage, assuming that a Service Level Management team is in place, is to build the initial Service Catalogue.
Well, there are two other important documents in Service Level Management, which can help
us with this decision, and these are 'A Service
A Service Catalogue contains a list of all services used by each customer group. A Service Catalogue could be used internally by the service provider; for example, the Service Desk might use it to help them identify those customers entitled to a higher level of service.
The second related sub-process is planning the SLA structure and establishing which SLAs we need to create. This activity involves prioritising the modification of pre-existing SLAs, in order to rework them into standard formats.
It can also be used externally as a marketing tool, providing a shop window, showing all the services on offer to the business. Commonly, organisations now make this available on their intranet.
Assuming we've built the Service Catalogue, agreed the SLA structure, and prioritised the work, we can move on to the second stage of 'Initial per-service', and its related sub-processes, where we address customer specific issues.
Service Catalogues exist in a number of forms. They are often created as an internal
The first point is to establish Service Level Requirements or SLRs. Find out what users would really like from that service, and what customers are prepared to pay for.
document, listing existing services when Service Level Management is initially established. At a later stage, it might be
The second sub-process uses those SLRs to review the underpinning contracts and OLAs
Monitoring OLAs and UPCs will help us to understand why SLA breaches are occurring, and also to identify future trends, and possible future SLA breaches. Remember you can't
control things that you can't monitor.
And when the draft SLA is available, agreement should be sought from customers and users that it represents an adequate specification of service.
External reporting should be written in a simple and clear way. An exception report is a typical
example of external reporting, and it should
Once the agreement is formally signed, the SLA must be implemented. This involves informing all parties constrained by the SLA,
that it is in place.
S5AP15 - Sub-Processes
A Service Level Agreement Monitoring Chart, or SLAM chart, is a popular mechanism for external reporting. The colour coding used here is quite common, hence this
type of chart is sometimes called a RAG, or
Red, Amber, Green chart. Such charts offer
The third stage in the SLM process includes the on-going per-service activities of monitoring, reporting, and review and modify.
The fourth and final Service Level
Management process stage is defined as ongoing generic. It involves sub processes, such as maintaining the Service Catalogue and updating it with new services.
A further activity is to review the Service Level
Trend graphs are another important monitoring tool. Businesses are very interested in consistency of service as well as quality. For example, trend graphs can show that, over a three month rolling period, the trend is for greater throughput of activity and for fewer breaks in service. In displaying these trends to customers, we can convince them
performance, we can also set Key
Level Management should look at all provided services and their associated quality requirements to see how we can improve
service levels without significant increases in
cost to the business.
So what does typical Service Level
Well, broadly speaking, its contents can be broken down into three sections. An
Levels, and
S5AP16 - Reporting
We briefly mentioned the activity of reporting earlier in this session. Reporting can be subdivided into either external or internal reporting.
Internal reporting involves monitoring service quality in SLAs and related OLAs and UPCs.
This detailed monitoring of service quality is normally set up by the Capacity and Availability Management processes. They will be interested in all activity which affects all service clauses, including breaks in service,
and it should be authorised at an appropriate level, by both parties. 'Agreed Service Levels' will define a number of measurable clauses, for example, normal hours of service, availability and reliability of
the service.
Clauses related to 'throughput' are also common, detailing the number of transactions the service is expected to support in a defined period. SLAs frequently contain clauses covering transaction response times. These are often broken down into several response types, including system responses, to a request via mouse click on a PC for example, or incident responses, detailing the maximum time allowable in responding to an incident report. There may be as many as 20 different measurable clauses in an SLA, against which customers will want us to report.
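Because each of these clauses is measurable, reporting against them can be partly automated. The sketch below records a few clauses with targets and lists the ones that were missed in a reporting period. The clause names, targets and measurements are all hypothetical; a real SLA would define its own.

```python
from dataclasses import dataclass
from typing import Dict, List


@dataclass(frozen=True)
class Clause:
    name: str
    target: float
    higher_is_better: bool   # True for availability, False for response times

    def met(self, measured: float) -> bool:
        if self.higher_is_better:
            return measured >= self.target
        return measured <= self.target


def breached_clauses(clauses: List[Clause], measured: Dict[str, float]) -> List[str]:
    """Names of clauses whose measured value misses the agreed target."""
    return [c.name for c in clauses
            if c.name in measured and not c.met(measured[c.name])]


sla_clauses = [
    Clause("availability_percent", 99.5, higher_is_better=True),
    Clause("transactions_per_hour", 10_000, higher_is_better=True),
    Clause("system_response_seconds", 2.0, higher_is_better=False),
    Clause("incident_response_minutes", 30.0, higher_is_better=False),
]
this_month = {
    "availability_percent": 99.7,
    "transactions_per_hour": 12_500,
    "system_response_seconds": 2.4,
    "incident_response_minutes": 25,
}
print(breached_clauses(sla_clauses, this_month))
```

A list like this could feed an exception report, with the full set of measurements kept for the regular service review.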
S5AP19 - SLA Contents
The third section in our SLA deals with the
In order to establish customers' perceptions of its service, Service Level Management should carry out regular service review meetings. Typically these meetings involve customers rather than users and consequently shouldn't be used as a substitute for user questionnaires
and so on.
Ahead of these meetings Service Level Management staff should review customer
related incident records from the service desk,
Review meetings can lead to suggestions for change. Remember, however, that they are not the place where changes are authorised.
additional statements, such as service charges and how they are structured. Mechanisms for change should also be outlined in this section. Remember however, that changes to SLA clauses should be handled via the Change Management process. If a request is received to amend an SLA clause it is important that the proposed change undergoes a thorough impact analysis. Changes in one SLA can impact on others, for example changing one SLA to allow more users on a network might have an adverse effect on other customers using the same
network.
The Service Level Management process can carry out its own internal review. This review should be carried out by the head of the Service Level Management team, or process
owner.
A key activity in the review process is to review KPIs. Some typical example KPIs might include Customer Perception ratings, the number of service reviews held, and how many are held at the right time. ITIL suggests that
these reviews are held on an annual basis,
although many organisations hold them more frequently.
S5AP21 - The Service Level Manager
The SLM Process must be 'owned' in order to be effective and successfully achieve the benefits of implementation. The Service Level Manager must be at an appropriate level to be able to negotiate with Customers on behalf of the organisation, and to initiate and follow through actions required to improve or maintain agreed service levels.
Statements on provision of service in case of a disaster are also important. It is the role of IT Service Continuity Management to create cost effective plans to deal with potential disasters, such as fire and flood. It is common to state in
Customer responsibilities. Customer statements might include defining the maximum number of Users at any one time, or a commitment to provide data to the IT supplier in the event of weekend working for example.
This requires adequate seniority within the organisation and/or clearly visible management support. It's important that the role acts as a conduit between IT specialists and the customer, translating technical language from the IT groups into
The benefits of and potential difficulties with Service Level Management are listed on page 45 of the little ITIL book and in section 4.2.1 of
We have seen how ITIL defines the goal of Service Level Management, how it's often driven by a Service Improvement Programme, and why it's regarded as essential to the ITIL structure as a whole.
We examined the relationships between the customer, the IT provider, and external suppliers, and went on to look at the structure of Service Level Agreements and the different ways in which we can tailor service provision to customer needs.
Level Agreements, and their relationships with Operational Level Agreements and Underpinning Contracts, and discussed how, by producing a Service Catalogue and a Service Level Requirement document, we can better satisfy customers' requirements. We examined the Service Level Management sub-processes in detail, including planning an SLA structure and the monitor, report, review and modify activities.
We listed the key characteristics of the Service Level Manager role, and highlighted some of the potential benefits and possible problems associated with implementing a Service Level Management process.