About ITIL
The Information Technology Infrastructure Library (ITIL) was developed in England in the 1980s by the Central Computer and Telecommunications Agency (CCTA), an agency of the British Government. By collecting best practices from top companies, the CCTA established a best practice process framework called ITIL. All the companies that contributed their best practices then adopted this guide. It is a public domain framework and can be used within any department of a company of any size. The best practice framework for service management describes how to organize service management within your organization. ITIL's quality approach to service management focuses on:
- Improved quality of service provision
- Cost-justifiable service quality
- Services that meet business, customer and user demands
- Integrated, centralized processes
- Clear roles and responsibilities
- A knowledge base approach
- Performance indicators
[Diagram: ITIL process levels. Tactical: Service Delivery. Operational: Service Support (Configuration Management, Incident Management, Problem Management, Change Management, Release Management).]
SERVICE SUPPORT
The Service Desk
To deliver high quality support to achieve business goals
In order to be successful, the Service Desk must give the customer the highest quality of support possible while staying within budget and delivering the business goals directed by management. This can be achieved through Service Level Management.
To aid in user retention and satisfaction
Good quality service will keep customers coming back for more. As mentioned earlier, positive perception is key to the success of the Service Desk. When a customer's needs are met, there is no need to look beyond the Service Desk. Satisfaction can be measured not only by the speed at which the phone is answered but also by how well the organization understands its customers' business and needs.
To improve the service but reduce the cost
This can be achieved by improving the response time to the customer. Time is money, and the faster you can move forward with a customer's concern the more cost efficient the resolution will be. Collecting the vital information needed to resolve the customer's request the first time (i.e. via 1st level support) will also reduce the amount of time the customer and 2nd level support have to wait. If wrong or insufficient information is collected at 1st level then 2nd level support has to go back to the customer or the Service Desk to collect it. The Service Desk is also responsible for ensuring that the right 2nd level support group is working on the customer's issue. Again, making efficient use of resources will reduce costs.
To highlight customer training and education needs
As the single point of contact, the Service Desk will understand the customer's needs. All requests, for both incidents and service, are filtered through the Service Desk. The Service Desk will identify trends that highlight a possible IT need, for example training. The Service Desk will also be aware of new technology that will be introduced to the business and will make recommendations on training needs based on historical data from previous software or hardware releases.
To close incidents and confirm customer satisfaction
Before closing records/tickets, the Service Desk will consult the customer, giving them the authority to confirm that their problem was resolved to their satisfaction. This is an excellent way of confirming satisfaction with the customer. This type of service is not very common and is very well received by customers.
Contributing to problem identification
The Service Desk staff will have a combination of superior technical and soft skills, which will enable them to be more involved in the problem identification process.
The Service Desk will use both experience and a knowledge base to collect data for both a work-around and resolution. Since it is the responsibility of the Problem Management Process to resolve problems, the Service Desk can identify previous solutions to Problem Management. Tools play a vital role in this area. A call tracking device or knowledge base can be considered a tool.
Roles and Responsibilities
The Service Desk is responsible for accepting and recording all calls, without exception. This information, no matter how insignificant, will be vital for statistics. All calls are important, even if they are wrong numbers. A large number of wrong numbers coming into the Service Desk may identify a problem with the lines or trunks coming into the Service Desk.
It is the directive of the Service Desk, which owns the Incident Management process, to resolve as many calls as possible at 1st level. World class 1st level resolution is set at 85%. This can be achieved by close co-operation with 2nd level support. As calls are forwarded to 2nd level support for problem resolution, the Service Desk has to ensure that information regarding the resolution is placed in the call-tracking tool. This information can be used for similar calls received in the future. If the skill set is adjusted to allow 1st level (Service Desk, Incident Management) to resolve the issue then the percentage of calls resolved at 1st level will go up. In any case, the fact that the information is being collected will reduce the customer's waiting time for resolution and the research and diagnosis effort for 2nd level support, resulting in a cost saving and efficient use of resources.
The Service Desk is responsible for monitoring and escalating according to all SLAs. It is a given that on all Service Level Agreements the Service Desk is highlighted as the point of escalation. Close monitoring of all tickets being escalated will ensure that 2nd level support does not miss its SLA resolution targets.
The Service Desk will keep users informed of the status and progress of their requests. Managing the customer's expectations and informing the customer of any status changes or progress with their requests is vital to customer satisfaction. If requests are closely monitored, the customer's expectations will be exceeded.
In addition to keeping customers informed, the Service Desk is responsible for closure and verification of every request.
Planned and short-term outages will be communicated when information is gathered from either the Change Advisory Board or Service Level Management. The Service Desk will broadcast these messages to the users using whatever means are available at the time, be they phone messages, email, broadcast messages, bulletin boards, the intranet, etc.
Co-ordination of 2nd level and 3rd party support for all of the customer's needs is another responsibility. The Service Desk owns all problems being escalated. It will ensure that the right 2nd level group is handling the customer's request. It will also ensure that the proper 3rd party vendor is involved and providing the service within the underpinning contract.
The Service Desk has the ability to inform management of all recommendations for service improvements. Being the single point of contact, the Service Desk will have insight into, or input from, the customer's needs.
Benefits:
- Improved user service, perception and satisfaction
- Increased user accessibility via a Single Point of Contact (SPOC)
- Improved quality and faster response to user requests
- Improved teamwork and communications
- Management of infrastructure and control
- Reduced cost (efficient use of resources)
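The 85% first-level resolution figure quoted above can be checked against call-tracking data. The following is a minimal sketch; the `Call` record and its `resolved_at_level` field are hypothetical and stand in for whatever the call-tracking tool actually stores.

```python
from dataclasses import dataclass

WORLD_CLASS_TARGET = 0.85  # the 1st level resolution figure quoted in the text


@dataclass
class Call:
    call_id: int
    resolved_at_level: int  # hypothetical field: 1 = Service Desk, 2 = 2nd level support


def first_level_resolution_rate(calls: list[Call]) -> float:
    """Share of calls resolved at 1st level; 0.0 when no calls were logged."""
    if not calls:
        return 0.0
    first_level = sum(1 for call in calls if call.resolved_at_level == 1)
    return first_level / len(calls)


# Illustrative data only: three of four calls resolved by the Service Desk.
calls = [Call(1, 1), Call(2, 1), Call(3, 2), Call(4, 1)]
rate = first_level_resolution_rate(calls)
meets_target = rate >= WORLD_CLASS_TARGET
```

With this sample data the rate is 0.75, so the target is not yet met.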
CONFIGURATION MANAGEMENT
Why Configuration Management?
The responsibility of businesses to deliver quality IT services economically, efficiently and effectively is what drives Configuration Management. Configuration Management addresses the need to control IT assets and services. Terms & Definitions:
CI - Configuration Item. Anything within IT that is decided to be within scope and can be changed should be considered a CI. This could be hardware, software, service level agreements, job descriptions, etc.
CMDB - Configuration Management Database. The CMDB holds all detail and relationship information of all CIs associated with the IT infrastructure.
SCOPE - The activities of configuration management include identification, control, status accounting and auditing. In order to be able to do this, the company has to decide what will be within the scope of configuration management. If the scope is too big then the CMDB might have integrity issues. If the scope is too small then the CMDB might be useless.
Configuration management, working with asset management, will ensure that all configuration items (software, hardware and documentation) are identified, recorded and tracked. All ITIL processes rely on information supplied by Configuration Management. The information that will be available is:
- CI history (from the point of being ordered to being retired)
- Information on all assets (type of equipment, attributes, location, etc.)
- Information on how many assets are available when a release is being planned
- The relationships between assets (PCs connected to servers, etc.)
- Call tracking data used in incident, problem and change management
- Information on suppliers that are related to either service or asset providers
- Information on lease management for all assets
- Information used for auditing company assets to verify asset management numbers
- Information for the IT Business Continuity Management process (baseline)
Planning Configuration Management
Planning the Configuration Management database will need a great deal of input from all IT departments. The size of the database will depend on the information that the IT departments need to provide and where it is coming from. The question that will come up is "How big or how small will this database be?" If the database is too small, the information contained in it won't be of much use. It will not be possible to track down certain CIs or do any trending. If it has too much information it will be too cumbersome to maintain and will soon become riddled with integrity issues. No one will want to maintain a large database. Also, the cost of managing a large database may outweigh its value. It all depends on the scope and the resources available to manage it. The pertinent information may be found in a number of different databases and needs to be assembled into one logical format within the incident control system. In the diagram below you will see that the service desk has access to information from 5 different databases.
Having access to this information depends on the incident control system tool you are using.
[Diagram: databases available to the Service Desk. Labels recovered: HR Database, Hardware Database, Knowledge Base, Software Database.]
Processes, procedures and activities have to be defined. Who will manage the database? There is going to be a need for processes and procedures that clearly define accountability and authority for managing the information. Only the Configuration Management Owner can authorize people such as the Service Desk Analyst to update or change the Configuration Management Database. The service desk would be the likely group to do this since they are in contact with between 5 and 10 percent of the employees of a company daily. Planning the relationship processes between Configuration Management, Change Management and Release Management, as well as 3rd party vendors, is important. Configuration Management contains information on all CIs within the infrastructure. This information can help with release rollouts, which are directed by Change Management.
For example, if your company decides to release Windows Millennium into your infrastructure there are certain elements that need to be considered: How many licenses do we need? Is the equipment on which the software will be loaded capable of handling the operating system? Will any training need to take place to make the transition smoother?
If your configuration management database is planned and built properly, this information is obtainable. A tool that can do this will need to be selected. All of the larger Help Desk / Call Center solutions companies are ITIL certified and will be able to supply a tool that can do this for you.
Identification
You now have a configuration management database! So what do you need to do now? The next step is to decide how you enter the information into your system. You have to decide on an identifying and labeling procedure. When you decide what CIs are in the scope of your process you either have to identify and label them yourself or have your 3rd party providers do that for you. It is very important that you take care in this step. The Service Desk will need to ask the user for this information when they call for support. If the information is not clear or understandable then the issue will take much longer to resolve. Identifying equipment can be very straightforward. For example, a laptop in company ABC located in building 10 on the 3rd floor can be identified as ABCLT1003:
ABC - Company
LT - Laptop
10 - Building 10
03 - 3rd Floor
This identification will now be associated with information related to that CI. The identification will also include information regarding who owns the equipment and what version of the operating system is on it, as well as what server and department it's associated with. All this information is vital to understanding the impact that each piece of equipment has on the infrastructure. The business will also have to decide what level of detail regarding each CI is appropriate: will having information on the PC, monitor and keyboard be enough, or should the detail be at a deeper level?
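The ABCLT1003 labeling example can be sketched in code. This sketch assumes fixed field widths (3-letter company, 2-letter type code, 2-digit building, 2-digit floor), which the text implies but does not state. The `CI_TYPES` table and the function names are hypothetical; only the `LT` code appears in the text.

```python
import re
from typing import NamedTuple


class CILabel(NamedTuple):
    company: str
    ci_type: str
    building: int
    floor: int


# Hypothetical type-code table; only "LT" (laptop) comes from the text.
CI_TYPES = {"LT": "Laptop"}

# Assumed layout: company (3 letters), type (2 letters), building (2 digits), floor (2 digits).
_LABEL_PATTERN = re.compile(r"^([A-Z]{3})([A-Z]{2})(\d{2})(\d{2})$")


def parse_label(label: str) -> CILabel:
    """Split a CI label into its parts, rejecting malformed or unknown labels."""
    match = _LABEL_PATTERN.match(label)
    if match is None:
        raise ValueError(f"malformed CI label: {label!r}")
    company, code, building, floor = match.groups()
    if code not in CI_TYPES:
        raise ValueError(f"unknown CI type code: {code!r}")
    return CILabel(company, CI_TYPES[code], int(building), int(floor))


def build_label(company: str, code: str, building: int, floor: int) -> str:
    """Compose a label from its parts, zero-padding building and floor."""
    return f"{company}{code}{building:02d}{floor:02d}"


parsed = parse_label("ABCLT1003")
```

Note that a label built this way cannot distinguish two laptops on the same floor, so a real scheme would probably need an additional sequence number or serial reference.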
Control
The person who owns the Configuration Management Database is called the Configuration Librarian. This person decides who has update authority. As mentioned earlier, the Service Desk would be very helpful in this area considering the amount of contact they have with the users. Only authorized CIs can be included. This means that if your standard desktop is an IBM machine then machines from other, unrecognized manufacturers should not be included in the database. From the moment that the purchase order for equipment is signed, an entry should be made in the database. This is very important if management is to understand what their infrastructure looks like at the earliest point. Upper management should also have access to this database, in order to find the answers to simple questions about CIs throughout the company (e.g. to determine how many computers are owned by the company). This information should include all recent purchases. It is also important for this database to track equipment to the point at which it is retired or disposed of. This information will be used by a number of different ITIL processes, including IT Financial Management.
Status Accounting
Status accounting takes into consideration the state of every CI within the company. If a computer is ordered then its status is "ordered". When the CI arrives it is in "received" status. When it's deployed, it's in "active" status, etc. Every CI will have historical data associated with it. This will be beneficial when deciding on a desktop that you want to use in your company. Historical status information will let you know the success or failure rate for all CIs. For example, if you determine from the data that the current desktops had a failure rate higher than 10% then you should consider other options for future acquisitions. Status accounting information can be used for audit purposes to determine the equipment currently in active use by the company.
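The status progression and the 10% failure-rate check described above can be sketched as follows. This is a minimal illustration that assumes a strictly linear lifecycle (ordered, received, active, retired); the text leaves room for further states with its "etc.", and the class and function names here are hypothetical.

```python
from enum import Enum


class Status(Enum):
    ORDERED = "ordered"
    RECEIVED = "received"
    ACTIVE = "active"
    RETIRED = "retired"


# Assumed linear lifecycle; a real CMDB may allow more states and transitions.
NEXT_STATUS = {
    Status.ORDERED: Status.RECEIVED,
    Status.RECEIVED: Status.ACTIVE,
    Status.ACTIVE: Status.RETIRED,
}


class ConfigurationItem:
    def __init__(self, label: str) -> None:
        self.label = label
        # The entry is created when the purchase order is signed.
        self.history = [Status.ORDERED]

    @property
    def status(self) -> Status:
        return self.history[-1]

    def advance(self) -> Status:
        """Move the CI to its next status and record it in the history."""
        if self.status not in NEXT_STATUS:
            raise ValueError(f"{self.label} is already {self.status.value}")
        self.history.append(NEXT_STATUS[self.status])
        return self.status


def exceeds_failure_threshold(failed: int, deployed: int, threshold: float = 0.10) -> bool:
    """True when the failure rate is higher than the threshold (10% in the text)."""
    if deployed <= 0:
        raise ValueError("deployed must be positive")
    return failed / deployed > threshold


ci = ConfigurationItem("ABCLT1003")
ci.advance()  # ordered -> received
ci.advance()  # received -> active
```

Keeping the full history rather than only the current status is what makes the trend and audit uses mentioned in the text possible.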
Verification and Audit
Verification and audits have to be done prior to any major changes and releases to CIs. This will give your company an idea of what equipment or technologies are actually in use, which can then be compared with the information within the database. On average, the Service Desk comes into contact with between 5 and 10% of company users daily. This means that by asking a few selected questions, they can verify the accuracy of the information in the database.
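The comparison at the heart of an audit can be sketched as a difference between what the CMDB records and what is actually observed. The function name and the attribute dictionaries below are hypothetical, and the sample data is illustrative only.

```python
def audit(recorded: dict[str, dict], observed: dict[str, dict]) -> dict[str, list[str]]:
    """Compare CMDB records with observed CIs, keyed by CI label."""
    recorded_labels = set(recorded)
    observed_labels = set(observed)
    return {
        # In the CMDB but not found during the audit.
        "missing": sorted(recorded_labels - observed_labels),
        # Found during the audit but absent from the CMDB.
        "unrecorded": sorted(observed_labels - recorded_labels),
        # Present in both, but the attributes disagree.
        "mismatched": sorted(
            label
            for label in recorded_labels & observed_labels
            if recorded[label] != observed[label]
        ),
    }


cmdb = {
    "ABCLT1003": {"os": "Windows 95"},
    "ABCLT1004": {"os": "Windows 95"},
}
found = {
    "ABCLT1003": {"os": "Windows ME"},
    "ABCLT1005": {"os": "Windows 95"},
}
report = audit(cmdb, found)
```

Each non-empty list in the report is a discrepancy that should be corrected before a major change or release proceeds.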
Other Definitions
CI Attributes (unique identifiers):
- Serial or copy number
- Model number
- Licence number
- Type
- Version
Relationships:
- Connected to
- Part of
- Copy of
[Diagram: example of a relationship ("is connected to") and an attribute ("version").]
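Relationships such as "connected to" and "part of" are what make impact analysis possible. The sketch below stores relationships as (source, kind, target) triples and walks them to find every CI that would be affected if a given CI failed. It assumes that the source of a "connected to" or "part of" relationship depends on the target; the text does not define that direction, and the CI names are made up.

```python
from collections import defaultdict, deque

Relationship = tuple[str, str, str]  # (source CI, kind, target CI)

# Assumption: these kinds mean "source depends on target".
DEPENDENCY_KINDS = {"connected to", "part of"}


def impacted_by(ci: str, relationships: list[Relationship]) -> set[str]:
    """Return every CI that directly or indirectly depends on the given CI."""
    dependents: dict[str, set[str]] = defaultdict(set)
    for source, kind, target in relationships:
        if kind in DEPENDENCY_KINDS:
            dependents[target].add(source)

    seen: set[str] = set()
    queue = deque([ci])
    while queue:
        current = queue.popleft()
        for dependent in dependents.get(current, ()):
            if dependent not in seen:
                seen.add(dependent)
                queue.append(dependent)
    seen.discard(ci)  # a cycle should not make a CI depend on itself
    return seen


relationships = [
    ("PC-1", "connected to", "SERVER-1"),
    ("PC-2", "connected to", "SERVER-1"),
    ("SERVER-1", "connected to", "SWITCH-1"),
    ("PC-3", "copy of", "PC-1"),  # not treated as a dependency here
]
affected = impacted_by("SWITCH-1", relationships)
```

Here a failure of SWITCH-1 affects SERVER-1 and, through it, PC-1 and PC-2.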
Benefits
- Configuration Management information that supports all other processes
- Information for impact and trend analysis for problems and changes
- Assists in adherence to legal and contractual obligations
- Reduces the risk of unauthorized software
- Helps with financial planning
INCIDENT MANAGEMENT
The Service Desk (function) owns the incident management (process). The goal for incident management is to restore service as soon as possible, minimizing the disruption to the business.
[Flowchart: incident handling. Decision "Service Request?" (Yes / No), followed by Investigation & Diagnosis and Incident closed.]
If it is an incident, it will then be determined whether the incident is routine (e.g. printer start), a known error (e.g. lack of space on a server) or a known problem (e.g. blue screen of death). Each possibility will have its own handling process. Part of classification is to determine the priority of the incident. This is done through an evaluation of impact, urgency and expected effort. Once the classification is determined, a priority is assigned.
Impact - The degree to which the business is affected by the incident
Urgency - The speed with which the incident needs to be resolved
Expected effort - The resources needed to rectify the incident
Part of initial support is to inform problem management of any new or unmatched incidents as quickly as possible. The service desk will either resolve or quickly find a resolution using the incident management processes.
Hierarchical - usually a manual escalation through authority; it can be done at any time.
When the right resolving group has accepted the incident and evaluated the type of work that is needed, both the service desk and the customer must be informed of the time to resolution.
Incident closure
The incident can only be closed with the user's permission. Resolution of the incident will be confirmed with the user, and details of the resolution must be placed in the Incident Control System.
[Diagram labels: Incident Tracking tool & Knowledge Base; No Solution.]
Relationship processes between Incident, Problem and Change management
The relationship between processes highlights the importance of ITIL within the IT environment. The Incident Management process, through the Service Desk function, collects information in the incident control system. Incident Management can resolve about 85% of incidents at 1st line. Anything the Incident Management process cannot resolve has to be escalated to Problem Management. Problem Management relies on Incident Management to provide enough information to create a work-around. The work-around allows incident management to provide an interim solution that satisfies the user. To permanently remove the incident, the known error has to be determined. After the known error is determined and the CI (configuration item) at fault is identified, a request for change (RFC) is presented to the change management process for handling.
[Diagram labels: Incident Matching; Change in Infrastructure.]
Benefits
Reduced business impact of incidents
By having proper incident management processes in place the duration of the incident will be reduced, and the customer's expectations for resolution can be managed.
Proactive identification of beneficial system enhancements and amendments
The Incident Control System can be used to analyze incidents, what is causing them and how they can be resolved (knowledge base).
Availability of business-focussed management information related to the SLA
The incident control system used by the service desk contains information on all incidents. There will be metrics indicating when each incident was logged, resolved and closed. This information will be compared with the SLA requirements to show whether the incidents were resolved within SLA targets.
Improved monitoring of SLA
The service desk, using the incident management process, will monitor all Incident Control Tickets being escalated to 2nd level support and above to ensure that SLA targets are met.
Improved management information
Incident management will capture all information related to incidents, both the symptoms and the resolution. This will give management better insight into the IT infrastructure, including how many incidents are created for each classification, what areas were affected and how long resolution took.
Better staff utilization and efficiency
Incident management is responsible for ensuring that the right resources are working on the right problems. By doing this, the resolving groups will be used more efficiently and at the correct times.
More accurate CMDB information
Incident management will ensure that all information going into the Incident Control System ticket is correct. The information that populates the ticket comes from the CMDB. The information will always be verified and any discrepancies will be corrected.
Improved user and customer satisfaction
The faster incidents are resolved correctly the first time, the more satisfied the user/customer. The incident management process ensures that speed and accuracy are built into incident resolution.
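The comparison of logged and resolved times against SLA targets, mentioned in the benefits above, can be sketched as follows. The `Ticket` record and the 4-hour target are hypothetical; actual targets come from the agreed SLA.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta


@dataclass
class Ticket:
    ticket_id: str
    logged: datetime
    resolved: datetime


def within_sla(ticket: Ticket, target: timedelta) -> bool:
    """True when the time from logging to resolution is within the target."""
    return ticket.resolved - ticket.logged <= target


def sla_compliance(tickets: list[Ticket], target: timedelta) -> float:
    """Fraction of tickets resolved within the target; 1.0 when there are none."""
    if not tickets:
        return 1.0
    return sum(within_sla(t, target) for t in tickets) / len(tickets)


# Hypothetical SLA target and illustrative tickets.
target = timedelta(hours=4)
tickets = [
    Ticket("INC-1", datetime(2024, 1, 8, 9, 0), datetime(2024, 1, 8, 11, 0)),   # 2 hours
    Ticket("INC-2", datetime(2024, 1, 8, 9, 30), datetime(2024, 1, 8, 14, 30)),  # 5 hours
]
compliance = sla_compliance(tickets, target)
```

This sketch measures elapsed clock time only; an SLA that counts business hours would need a different duration calculation.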
PROBLEM MANAGEMENT
The goal of problem management is to minimize the adverse effect on the business of incidents and problems caused by unknown errors in the infrastructure, and to prevent the recurrence of incidents related to those errors.
Error Control
Now that the problem has been identified, error control eliminates known errors by working with change management. Error control has to be aware of, monitor and eliminate known errors where possible in a cost-justifiable way.
Assistance with the handling of major incidents
Problem management will assist incident management with major incidents by alerting them to known errors and work-arounds.
Proactive Problem Management
Proactive prevention focuses on identifying and resolving problems before incidents start to occur. This can be done in 3 ways:
Trend Analysis
Using metrics and incoming data to identify patterns related to incidents or CIs that are indicative of a problem.
Targeting Preventive Action
Be aware of:
- Current incidents
- Volume of incidents
- The number of customers affected
- Duration and related costs of resolving those incidents
- The cost to the business of the outage
Providing information to the organization
Information on the IT infrastructure that is found within the CMDB can be very useful to the organization when it goes through the exercise of selecting new hardware.
Completing major problem reviews
Major problem reviews occur when the IT infrastructure has stabilized following a major problem. The review will contain a description of the known error and what caused it. For example, the known error might be a scheduled job that didn't run, caused by a change to the production schedule. The major problem review will identify the actions that were taken to resolve it and what will be done in future to prevent it from happening again.
Benefits
- Increased IT organizational service quality: more effective and efficient handling of incidents, problems and known errors will result in an increase in IT service quality
- Improved IT and business relationship
- Improved management information
- Reduction in the number of incidents and problems, which will improve user productivity
- Permanent solutions: solutions that are put in place by problem management will become part of the infrastructure and will be used by 1st level support to increase the likelihood of resolution at 1st call
CHANGE MANAGEMENT
Change management exists to ensure that all changes introduced into the IT infrastructure do not negatively affect service levels. The changes have to be done using standardized methods and procedures in an efficient and prompt manner to minimize impact.
RFC LOG - Request for change log. All requests for change are given a call tracking number so that each RFC can be monitored and referenced. The information contained in the log will be used at post change meetings and for any future requests showing similar requirements.
FSC - Forward schedule of change. Information obtained from change management about any scheduled upcoming changes. This information is vital to Projected Service Availability (PSA).
CAB - Change advisory board. A membership of people ranging from the Change Manager to support staff and selected Subject Matter Experts (SMEs). These people meet regularly to assess and approve change requests. Members can change from meeting to meeting depending on the type of RFC.
CAB/EC - Change advisory board emergency committee. Selected members of the CAB who will be contacted in extreme or urgent situations to assess and approve changes that are outside of the time lines agreed to for changes.
Board - Senior management, board level. Senior level management that makes strategic decisions resulting in possible major changes to the infrastructure. When approved, these changes will be passed to the CAB for authorization and scheduling.
CM - Change manager (a role that has defined responsibilities):
- Cradle to grave ownership of all RFCs
- Owner and chair of the CAB meetings, who selects the attendees that will be required for each meeting
- Issues FSCs via the Service Desk
PSA - Projected service availability. This report is a result of the FSC. It contains details of changes to availability in relation to SLAs. This information is also used in the development of new SLAs.
What is a Change?
A change is an action that alters the status of a CI that is found within the IT Infrastructure (CMDB).
Request For Change Content
- RFC number
- Who will be impacted by change (who)
- Description (what, where)
- Impact assessment
- Time recommended for change (when)
- Reason (why)
- Contact person info. (who)
- Backout plan
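The RFC content listed above lends itself to a standard record that every IT department fills in the same way. The sketch below is one possible shape, with a completeness check the change manager could run before accepting an RFC. The field names are paraphrased from the list, and treating every field as free text is an assumption.

```python
from dataclasses import dataclass, fields


@dataclass
class RequestForChange:
    rfc_number: str = ""
    impacted_parties: str = ""   # who will be impacted by the change
    description: str = ""        # what, where
    impact_assessment: str = ""
    recommended_time: str = ""   # when
    reason: str = ""             # why
    contact_person: str = ""     # who to contact
    backout_plan: str = ""


def missing_fields(rfc: RequestForChange) -> list[str]:
    """Names of fields that are still empty, in declaration order."""
    return [f.name for f in fields(rfc) if not getattr(rfc, f.name).strip()]


# Illustrative, partially completed RFC.
rfc = RequestForChange(
    rfc_number="RFC-0001",
    description="Upgrade workstation operating system in building 10",
    reason="Current version is no longer supported",
)
incomplete = missing_fields(rfc)
```

An RFC with a non-empty `missing_fields` result would be returned to the requester rather than put forward to the CAB.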
The following questions will have to be answered by the Change Manager:
- Has this type of request been submitted before? The change manager will check to see if an RFC has been submitted in the past.
- Was the previous submission rejected? Why?
- Was it successful? Why or why not?
The change manager needs to understand the direction of the proposed change. This can only be done through a standard RFC document that is used by all IT departments. If the RFC is complete and is accepted by the change manager it will be put forward to the CAB committee. However, acceptance does not equal authorization. If the change is small enough, the change manager can authorize it himself or herself. If the change manager authorizes a change and thinks it's necessary, the CAB committee will be informed of the changes that were authorized.
Allocating priority and classification
Now that the change manager has accepted the RFC it has to be classified and prioritized. Every company will establish its own change model scope that will identify classification. As in incident management, classification will be based on the impact and urgency of the change request. Changes can be classified on a range from minuscule to mega. For example, change requests are prioritized based on the impact of the problem and the urgency of the remedy:
Immediate - Mission critical system affected, or a large number of users affected. Immediate action required (CAB/EC)
High - Severely affecting critical business functions and must be given highest priority
Medium - No severe impact, but correction needs to be completed prior to the next release or upgrade
Low - Change is justified but urgency for completion is low
AUTHORIZING
- Change managers can authorize minor changes but should inform the CAB committee of changes that were authorized.
- The CAB committee will authorize medium or significant changes.
- Senior level management (the Board of Directors) will authorize major changes on a strategic level. After they authorize the major changes, authorization will then go to the CAB for review.
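The authorization rules above amount to a routing table from the size of a change to the body that authorizes it. The sketch below encodes that routing, together with the CAB/EC route for urgent changes described under emergency changes. The `ChangeSize` names and the function are hypothetical; each company defines its own classification.

```python
from enum import Enum


class ChangeSize(Enum):
    MINOR = "minor"
    MEDIUM = "medium"
    SIGNIFICANT = "significant"
    MAJOR = "major"


def authorization_path(size: ChangeSize, urgent: bool = False) -> list[str]:
    """Bodies that must authorize a change, in order."""
    if urgent:
        # Urgent or emergency changes are assessed by the CAB emergency committee.
        return ["CAB/EC"]
    if size is ChangeSize.MINOR:
        # The change manager authorizes and informs the CAB afterwards.
        return ["Change Manager"]
    if size in (ChangeSize.MEDIUM, ChangeSize.SIGNIFICANT):
        return ["CAB"]
    # Major changes are authorized at board level, then reviewed by the CAB.
    return ["Board", "CAB"]


path = authorization_path(ChangeSize.MAJOR)
```

Keeping the routing in one function makes it straightforward to adjust when the company revises its change model.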
[Diagram: authorization flow. Labels recovered: Minor, Change Manager, CAB Meeting, Authorization.]
The authorizing bodies will make an impact assessment on all change requests. The assessment will be based on the following criteria:
- Impact upon the business
- Impact upon SLA targets
- Impact upon other services
- Impact of not doing the change
- Resources and costs (time and people)
- Current FSC and PSA
- Release Management (what effect the change has on a release)
- Ongoing maintenance of resources
CHANGE CO-ORDINATION
Throughout the whole change co-ordination process the configuration management owner is kept up to date through updates to the RFC log. The change manager directs any updates to the RFC log to the configuration management process owner, who owns this RFC log.
BUILD
The change manager will ensure that the correct resources are assigned to build the change. This information will be contained within the RFC and will identify either certain individuals or groups.
TEST
Testing will be done by the requester after the change build is complete. The testing specifics should be contained within the RFC. The builder will only test emergency or urgent changes if time permits.
IMPLEMENT
After the testing has been completed to the satisfaction of the requester/builder, the change manager arranges the implementation of the change into production.
POST IMPLEMENTATION REVIEW
After the change has been put into production the change manager will do a post implementation review to see how the change process worked with regard to the implemented change. Some questions to be answered in the review are:
- What caused the need for the change? If the change was the result of a problem then problem management will get involved in the implementation review process.
- What can be done to avoid the problem that caused this change?
DEALING WITH URGENT OR EMERGENCY CHANGES
There will be times when a change will be considered urgent or an emergency. The change manager, when confronted with this type of change, will initiate the process in which the CAB/EC (Change Advisory Board Emergency Committee) convenes to assess the change. The CAB/EC will consist of selected members of the CAB who have the accountability and authority to make higher-level decisions. Clear guidelines, with an associated process, need to be established as to what is considered an urgent or emergency change. The CAB/EC will meet when alerted. They will quickly assess the situation, then authorize and co-ordinate the change. After the change has been completed and implemented the CAB/EC will do a post implementation review.
BENEFITS
Alignment - The change manager will align all IT services with business needs based on the changes being put forward. The change manager will need to understand the impact of every change on the business.
Increased productivity of both users and IT personnel:
- Users - Better quality changes with fewer disruptions
- Personnel - The change manager will ensure that the right support personnel are working on the changes, resulting in better use of resources
Risk - Since change management assesses and filters all RFCs, the risk of implementing bad changes is reduced.
Quality reports on information related to changes - Change management will create reports on all changes, which will include:
- The number of changes
- Category/type
- Successful/non-successful
Increased volume of changes - With a strong change management process, there will be the ability to absorb a high volume of changes.
RELEASE MANAGEMENT
The goal of release management is to ensure that all requests for change take into consideration both the technical and non-technical aspects. For example, when an RFC involves upgrading the operating system from Windows 95 to Windows ME, release management will ensure that every workstation receives the software (Windows ME), any hardware updates required (RAM or hard drive space) and documentation (Windows ME manual and training). All four of the change components need to be co-ordinated so that the user service levels aren't interrupted.
ACTIVITIES
Release policy and planning - This policy will cover the number and frequency of releases in certain business units. All releases must have a unique identifier so that configuration management can maintain information on them. At this point, people's roles and levels of authority and responsibility have to be clarified.
Designing, building and configuring a release - Planned and documented release procedures should be used for all software releases. If possible, reusing a previously implemented release procedure is recommended. Points to consider:
Release definition
Release plan
Fit-for-purpose testing and release acceptance - This testing should be done prior to any release. A test group could be used to test the actual release and to perform any stress testing that is needed.
Rollout planning - Time lines, identifying the CIs being affected, documentation, training and communication planning
Communication, preparation and training - Initiating the communication campaign, preparing departments and affected CIs for the release, and initiating the training program per the rollout plan
Distribution and installation - Initiating the phased distribution and installation of software into the infrastructure
RELEASE POLICY
Should include:
1. Clarification of roles and responsibilities for release management
2. Could be one document, or a number of documents, one for each supported system or IT service
Should be used for:
Large or official hardware roll-outs
Major software roll-outs
Bundling or batching related sets of changes
DSL CONSIDERATIONS
Media
Naming convention
Environments supported
Security arrangements
Scope
Retention period
Audit procedures
BENEFITS
Improved quality of services due to successful releases - Successful releases improve the quality of service by improving the hardware or software within the infrastructure. They also reduce the possibility of outages caused by releases that were not properly managed and had to be backed out.
Planned use of resources - With the ability to track all software and hardware via the CMDB, DSL and DHS, all resource usage is maximized. The correct amount of hardware and licenses is accounted for and is at the disposal of authorized change management requests.
Managed expectation levels - Using release management, the user will receive hardware, software, documentation and training at a scheduled time (of which they are informed).
Improved historical data concerning releases - Historical data will be kept on all releases as a knowledge base. When releases are requested in the future, there will be data to show how past releases went.
Improved control of software and hardware assets - Release management, via the DSL and DHS, will know exactly where all hardware and software CIs are. This applies to assets that are owned or leased.
Reduced risk of unauthorized or illegal software - Release management will only authorize licensed software, so pirated or illegal software copies will not be a concern.
[Figure: Release Management - Release Policy; Release Planning; Design & Develop or Order & Purchase the Software; Build and Configure Release; Fit-for-purpose Testing; Release Acceptance; Roll-out Planning; Communication, Preparation & Training; Distribution & Installation]
Periodic reviews
Reviews of all SLAs need to be scheduled with customers at regular intervals (e.g. monthly). All OLAs and UCs also need to be reviewed to ensure that they are still valid and support the SLA.
SLM process
[Figure: SLM process - Establishing the function (Planning, Implementation); Negotiate; Monitor; Report; Review; Review of SLAs, OLAs and UCs]
SLA CONTENTS
Introduction
Service hours
Availability
Reliability
Support
Throughput
Transaction response times
Batch turnaround times
Change
IT service continuity and security
Charging
Service reporting and reviewing
Performance incentives/penalties
Benefits
Both the customer and IT provider clearly understand their roles and responsibilities
Both parties agree to levels of service and information requirements from each side
Communications will be scheduled
Service level targets will be set
The agreed targets will be documented and measured
Service levels will be monitored and reviewed
All metrics regarding service levels will be supplied to both the service level manager and the customer. This information will be reviewed and a SIP will take effect if necessary.
SLRs will drive IT services
OLAs and UCs will be better aligned with the business requirements
CAPACITY MANAGEMENT
The goal of capacity management is to ensure that current and future capacity and performance needs for business requirements are provided in a cost-effective way. This can be done by:
Monitoring the performance and throughput of all IT services and the supporting infrastructure components
Tuning resources to make them more efficient
Understanding the present and future demands on IT resources
Using financial management to influence how customers use resources
Producing a capacity plan for IT service providers to support services that are required within the SLA
BUSINESS CAPACITY MANAGEMENT
Business Capacity Management is the process responsible for ensuring that the future business requirements for IT services are considered, planned and implemented. Existing data describing how current resources are being used by all departments will help in forecasting future requirements. Business capacity management must be:
Responsive to change and changing requirements
Aware of customers' SLRs
Involved in both Change Management and Project Management
SERVICE CAPACITY MANAGEMENT Service Capacity Management is the process responsible for managing the performance of operational IT services used by the customer. IT services are monitored and measured against the targets within service level agreements and requirements. These measurements are recorded and analyzed and if necessary action will be taken on IT resources to improve performance. The advice of specialists within resource capacity management and making staff accountable and knowledgeable in capacity management will help improve resource performance.
RESOURCE CAPACITY MANAGEMENT
Resource Capacity Management is the process that manages single CIs within the IT infrastructure. All resources with limited capacity are monitored and measured. The data collected is analyzed and, if necessary, action is taken to ensure that all business requirements are met.
Output from Capacity Management
Capacity plan
Capacity database
Baselines and profiles
Thresholds and alarms
Capacity reports (regular, ad-hoc and exception)
SLA and SLR recommendations
Costing and charging recommendations
Proactive changes and service improvements
Revised operational schedule
Effectiveness reviews
Audit reports
[Figure: Capacity management activities - Monitoring; Analysis; Implementation; SLM thresholds; CDB; Application Sizing; Production of the Capacity Plan, covering all aspects of BCM, SCM and RCM]
Modeling
Modeling helps in predicting the behavior of IT services under certain conditions.
Trend analysis (historical performance of resources)
Analytical modeling (through use of a mathematical software program)
Simulation modeling (through use of a program that simulates computer processing; it helps in accurately sizing new applications but is time consuming)
Baseline models (reflect the performance that is being achieved)
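Of the modeling techniques above, trend analysis is the simplest to illustrate. The sketch below fits a straight line to historical utilization samples and estimates how many periods remain before a threshold is reached. It assumes evenly spaced samples and a linear trend; the function names, the sample figures and the 80% threshold are hypothetical.

```python
def fit_trend(values):
    """Least-squares line through (0, v0), (1, v1), ...; returns (slope, intercept)."""
    n = len(values)
    if n < 2:
        raise ValueError("need at least two samples")
    mean_x = (n - 1) / 2
    mean_y = sum(values) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in enumerate(values))
    sxx = sum((x - mean_x) ** 2 for x in range(n))
    slope = sxy / sxx
    return slope, mean_y - slope * mean_x


def periods_until(values, threshold):
    """Periods after the last sample until the trend line reaches the threshold.

    Returns None when utilization is flat or falling, so no crossing is forecast.
    """
    slope, intercept = fit_trend(values)
    if slope <= 0:
        return None
    crossing = (threshold - intercept) / slope
    return max(0.0, crossing - (len(values) - 1))


# Hypothetical monthly disk utilization in percent.
disk_used_pct = [50.0, 55.0, 60.0, 65.0]
slope, intercept = fit_trend(disk_used_pct)
months_left = periods_until(disk_used_pct, threshold=80.0)
print(slope, intercept, months_left)
```

A forecast like this is only as good as the assumption that past growth continues, which is why the other modeling techniques exist for new or changing workloads.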
Application Sizing
Implemented at the project initiation and design stage for a new application. Application sizing takes into account what resources are needed for applications being built internally or purchased from a vendor. At the initiation stage, care has to be taken to ensure that enough room is allowed for growth (SLR).
Production of a Capacity Plan
The capacity plan documents the current levels of resource utilization and service performance. Forecasts of future resource requirements are made after the business strategy and plans have been analyzed. The plan should include:
Assumptions
Recommendations on resources required, cost, benefits and impacts
BENEFITS
Increased efficiency and cost savings, including:
Deferring costs of new equipment: If the current technology is used more efficiently within the infrastructure, the need to purchase new equipment can be deferred. This could either free up financial resources to purchase other equipment, or the funds could be saved for future purchases.
Economic provision of service: Matching capacity properly with business needs avoids maintaining unnecessary capacity, which results in a cost saving.
Planned buying: Planned buying reduces the risk of panic buying.
Reduced risks
A significant benefit of capacity management is reduced risk. When capacity is managed effectively, the risk of failure is reduced:
With current applications, the risk is reduced by managing the resources and service performance
With new applications, the risk is managed through application sizing
The change advisory board should include capacity management, and should assess the impact of changes
Effective capacity planning will reduce the number of emergency changes needed to increase capacity
More confident forecasts
Over time, capacity planning improves due to the data collected. For example, normal operating baselines and monitoring data will help in capacity planning. Using application sizing and modeling when introducing new services will help forecast capacity needs more accurately.
Value to the application lifecycle
Capacity needs identified early in the development of applications should be put into the capacity plan. This will reduce the risk of running into capacity issues when applications are in production.
AVAILABILITY MANAGEMENT
The goal of Availability Management is to:
Ensure IT services have the availability designed to meet the business requirements
Optimize availability to satisfy business objectives
Deliver to the business a level of availability in a cost-effective way
Offer sustained levels of availability, reduce outages and always strive for improvement
INDICATORS OF AVAILABILITY
Mean Time to Repair
Incident - The actual time the outage started
Detection - The point at which the user discovered the outage
Response time - The time in which the outage was escalated for support
Diagnosis - The time the outage was investigated
Repair - The time it took to repair or replace the CI at fault
Recovery time - The time it took to bring the machine back up to its original state
Restoration - The time the business resumed normal operations
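The indicators above can be read as milestones on the timeline of a single outage. As a rough sketch, the code below records a timestamp for the end of each stage and derives the minutes spent between consecutive milestones plus the total downtime. Treating every indicator as a timestamp is a simplifying assumption, and the milestone keys and sample times are hypothetical.

```python
from datetime import datetime

# Order of milestones, following the list of indicators above.
MILESTONES = ["incident", "detection", "response", "diagnosis",
              "repair", "recovery", "restoration"]


def phase_durations(timeline):
    """Minutes elapsed between each pair of consecutive milestones."""
    durations = {}
    for earlier, later in zip(MILESTONES, MILESTONES[1:]):
        delta = timeline[later] - timeline[earlier]
        if delta.total_seconds() < 0:
            raise ValueError(f"{later} is recorded before {earlier}")
        durations[f"{earlier} -> {later}"] = delta.total_seconds() / 60
    return durations


def total_downtime_minutes(timeline):
    """Minutes from the start of the outage to business restoration."""
    delta = timeline["restoration"] - timeline["incident"]
    return delta.total_seconds() / 60


# Hypothetical outage on a single day.
outage = {
    "incident": datetime(2001, 3, 5, 9, 0),
    "detection": datetime(2001, 3, 5, 9, 10),
    "response": datetime(2001, 3, 5, 9, 15),
    "diagnosis": datetime(2001, 3, 5, 9, 45),
    "repair": datetime(2001, 3, 5, 10, 30),
    "recovery": datetime(2001, 3, 5, 10, 50),
    "restoration": datetime(2001, 3, 5, 11, 0),
}
phases = phase_durations(outage)
downtime = total_downtime_minutes(outage)
print(phases, downtime)
```

Averaging these figures over many outages shows which stage contributes most to downtime.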
BENEFITS
Single point of accountability: A process owner is designated for Availability Management and will ensure availability benefits the business by:
Addressing business availability needs
Designing availability services that meet business needs cost-effectively
Ensuring the levels of availability are sustained and measured to support SLM
Addressing any issues with availability and taking corrective action
Ensuring outages due to lack of availability are reduced
Helping the mindset of IT move from reactive to proactive
Ensuring IT support adds value to the business
WHY ITSCM?
The main reason to have ITSCM in place is to ensure business survival by reducing the impact of a disaster or major failure. A plan will analyze risks and vulnerability and put in place measures to reduce them. ITSCM will produce an IT recovery plan, which will prevent the loss of customers and users due to lack of confidence.
SOURCES OF CRISIS
Companies may face a crisis (a disruption of service that exceeds the limits set out within an SLA) due to hardware or software malfunction, human error, natural disaster or viruses.
Hardware malfunction - Hardware malfunctions have been the leading source of crisis over the past 10 years. There are a number of reasons this may occur, ranging from the quality of the product to the technology being used to incorrect specifications.
Human error - Human error is a large cause of disasters. This can be attributed to lack of training, mistakes or malicious behavior.
Software malfunction - Software malfunctions can be caused by coding errors, improper versions being introduced into a production environment or changes being introduced without going through change management.
Virus - Software code specifically designed to hinder or destroy systems. Usually introduced via floppy diskette, CD-ROM or the Internet.
Natural disasters - Earthquakes, floods, tornadoes and volcanoes.
RESPONSIBILITIES
The business needs to understand the options of ITSCM, as the buy in from the business is vital to the success of ITSCM. All possible solutions should be clearly documented and explained, including costs and time lines. It is necessary to explain to the business what the consequences are of either delaying the implementation of a plan or not having a plan in place at all. The roles and responsibilities have to be documented and explained to management and individuals involved.
SOURCES OF RISKS
STAGES OF BCM
STAGE ONE:
Establish a policy - The owner of the BCM process will need to ensure all members of the BCM team are aware of their roles, responsibilities, intentions and objectives
Identify terms of reference and scope - Clearly define the roles of management and staff through a risk assessment and business impact analysis of their respective departments, and how the departments should be run following a disaster
Allocation of resources - Both financial (costs) and human resources (skill set) need to be considered to be able to do the analysis needed in stage 2
Definition of project organization and control structure - Both ITSCM and BCM projects are complex in nature and need to be structured using a software tool (e.g. MS Project) to be able to track time lines, accomplishments and tasks/responsibilities
Agreement of project and quality plans - These will need to be completed by all necessary parties
STAGE TWO:
[Figure: Analysis (Risks) and Management (Countermeasures)]
Business impact analysis
Identify critical business processes and what damage or loss to the organization would occur if each business process was disrupted
Identify the type of loss, e.g. financial, increased cost of running the business and/or intangible
Identify whether the outage would cause more outages if not corrected soon
Identify the minimum requirements for staff, technical ability and facilities needed to restore services
Identify time lines to recovery
Risk Assessment
Identify risks (assets that are a vital business function)
Assess threats and vulnerabilities, for example:
Security (from terrorism)
IT services being located below ground near a large water source
Proximity to potential dangers such as mountains
Levels of risk, based on vulnerability and threat, are defined by the BCM owner: Low, Medium or High
Business Continuity Strategy (countermeasures)
Using the information collected from the impact analysis and risk assessment, a strategy is put in place for risk reduction. Both a risk reduction strategy and a recovery strategy are required, and they should complement each other. A cost versus benefits analysis should also be completed: would it cost more to put a plan in place than the recovery itself?
Countermeasures
Manual Back-up - Usually help desks, call centres or order takers can use this, where paper alternatives can be used until the system is back up
Reciprocal Arrangements - A situation where 2 similar organizations share the cost of an off-site solution
Gradual Recovery (Cold Standby) - Greater than 72 hours
Intermediate Recovery (Warm Standby) - Between 24 and 72 hours
Immediate Recovery (Hot Standby) - Within 24 hours
Do Nothing - When the cost of having a recovery solution outweighs the risks
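The standby options and the cost versus benefits question above can be expressed as a small decision rule. This is a sketch under stated assumptions: the handling of exactly 24 and exactly 72 hours, the function names and the idea of comparing a single solution cost against a single expected loss figure are illustrative simplifications, not rules taken from ITIL.

```python
def standby_option(max_outage_hours):
    """Map the longest tolerable outage to one of the recovery options above.

    Assumption: exactly 24 hours counts as hot standby and exactly
    72 hours counts as warm standby.
    """
    if max_outage_hours < 0:
        raise ValueError("max_outage_hours must not be negative")
    if max_outage_hours <= 24:
        return "Immediate Recovery (Hot Standby)"
    if max_outage_hours <= 72:
        return "Intermediate Recovery (Warm Standby)"
    return "Gradual Recovery (Cold Standby)"


def choose_countermeasure(max_outage_hours, solution_cost, expected_loss):
    """Apply the cost versus benefits check before picking a standby option.

    solution_cost and expected_loss are hypothetical figures in the same
    currency and over the same period.
    """
    if solution_cost > expected_loss:
        return "Do Nothing"
    return standby_option(max_outage_hours)


print(choose_countermeasure(48, solution_cost=20_000, expected_loss=50_000))
print(choose_countermeasure(48, solution_cost=50_000, expected_loss=20_000))
```

A real strategy would weigh each vital business function separately and would also consider manual back-up and reciprocal arrangements.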
STAGE THREE:
Organization planning
Executive - Authority level (senior management) responsible for crisis management
Co-ordination - Level below executives, responsible for co-ordinating the recovery effort
Recovery - Teams within critical business functions responsible for executing the plans
Implementation planning
Co-ordination of the following plans: Emergency response, Damage assessment, Salvage, Vital records and Crisis management/Public relations
Implement:
Risk reduction measures - UPS, fault-tolerant systems for critical applications, off-site storage and archiving, disk mirroring and spare equipment
Stand-by arrangements - 3rd party recovery sites like Comdisco or SunGard, a fully equipped company-owned stand-by location, or installing a stand-by computer system
Develop: ITSCM plans
Administration (who, what, when and where) - The right people need to be identified as to who will be involved in the ITSCM plan
IT infrastructure (hardware, software and documentation to support the business) - The documentation needs to be written so that anyone who is not familiar with the system can use it for system recovery
Personnel (people related to staff and accommodation)
Security (instructions for fire, explosions, first aid, etc., and for retrieving backups)
Alternative location (information on the alternative location: people, address, facilities, etc.)
The plan will include actions to bring the IT infrastructure back to its operational state.
Procedures
Should include:
Hardware and network installations and testing
Reference points to guide the restoration of software and data
Procedures to consider time zones and multinational organizations
What is considered a business cut-off point
STAGE FOUR: Operational Management
Education and awareness - All staff within the IT organization should be aware of the ITSCM and BCM and of their roles
Training - In recovery procedures and responsibilities
Review - The ITSCM and BCM are reviewed to ensure they are current and valid
Testing - Testing should be done:
When the plan is complete
Once a year
After every major change to the infrastructure
Change control - All changes to the ITSCM or BCM must follow Change Management process rules. The ITSCM and BCM are considered CIs within the Configuration Management Database
BENEFITS OF ITSCM
Potential for lower insurance premiums due to proactively managing the business risks
Business relationships improve due to closer working relationships and increased understanding of risks and dependencies
Positive marketing of contingency capabilities results in higher levels of non-interrupted service
Organizational credibility improves with the addition of ITSCM and BCM
Competitive advantage due to the ability to reduce risk and incorporate business safeguards
IT ACCOUNTING
Making cost-effective decisions on service provision requires assessing the service being provided against its cost. For example, "Is it more effective to outsource a service or keep it internal?"
Putting financial accountability into the hands of managers ensures that every cost decision is justified with a business case, like any other business investment.
Provides the ability to measure under/over usage in financial terms, e.g. "We were $10,000 over budget on IT expenditures this year."
Provides an understanding of the cost of changes and the implications of not taking advantage of a business expenditure: "What is the cost if we do it? What is the cost if we don't do it?"
CHARGING
Charging is a means of recovering costs for services rendered. Since customers pay for services, they have the right to influence decisions on the service. Therefore, IT services should be operated as a business unit.
Cost - Price of services rendered is the cost
Cost-plus - Price of services rendered is the cost plus a % of profit
Going rate - Price compared to other internal departments or similar external organizations
Market rate - Price matched to that of external suppliers
Fixed price - A set price for a period of time based on anticipated usage
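Three of the pricing options above reduce to simple arithmetic, sketched below with Python's Decimal type to avoid rounding surprises with money. The function names, the 15% mark-up and the sample amounts are hypothetical. Going rate and market rate are omitted because they depend on comparison data from other departments or suppliers.

```python
from decimal import Decimal


def cost_price(cost):
    """Cost: the customer is charged exactly what the service costs."""
    return cost


def cost_plus_price(cost, markup_pct):
    """Cost-plus: the cost plus a percentage of profit."""
    return cost * (1 + markup_pct / Decimal(100))


def fixed_price(unit_price, anticipated_units):
    """Fixed price: a set price for the period based on anticipated usage."""
    return unit_price * anticipated_units


service_cost = Decimal("1000.00")   # hypothetical monthly cost of a service
print(cost_price(service_cost))
print(cost_plus_price(service_cost, Decimal("15")))
print(fixed_price(Decimal("2.50"), 400))
```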
ALL IT COSTS
External services - Outsourced Help Desk, DRP providers, contract companies
Software - Databases, operating systems, incident control systems
People - Payroll, benefits, fees associated with staff
Transfer - Internal charges from other cost centres: Help Desk, LAN Services and Operations
Hardware - Mainframes, servers, PCs
Accommodation - Buildings, offices, utilities, storage areas
COST ELEMENTS
Costs can be broken into two categories: capital or operational.
Capital - The outright purchase of a software package or piece of hardware would constitute a capital cost.
Operational - The day-to-day running of the software or hardware, including staffing, utilities, etc., would constitute operational costs.
Each cost is part of one of 3 elements:
[Figure: All IT costs (External Service, Software, People, Transfer, Hardware, Accommodation) are broken into cost elements and assigned to Service A, Service B, Service C and Service D, giving the total cost of IT services]
Direct Costs, Indirect Costs and Unabsorbed Indirect Costs
DIRECT COSTS
Traced to a cost centre or department. For example:
Servers
Applications used exclusively by a single cost centre
INDIRECT COSTS
Traced to a cost of providing a service. For example:
Operations staff
Network or technical services
UNABSORBED INDIRECT COSTS
Traced to a cost that can't be assigned to a customer. For example:
IT management
Building or facilities
DEVELOPING A CHARGING SYSTEM
Scope
Informing the business of the need for putting the system into production. This charging system should:
Determine the right policy for the organization
Recover fairly and accurately the agreed costs of services
Shape the behavior of customers to ensure the best return on IT investments
Charging policies will:
Force the business units to control their own user demands
Reduce overall costs and highlight areas that are not cost effective
Match internal services to justifiable business needs through direct funding
Recovery of costs
The recovery of costs should be simple, fair and realistic.
Simple - Less bureaucracy, with improved overall cost-effectiveness.
Fair - Ensure that the charge for services provided reflects market value.
Realistic - The charging policy must be designed to adjust the behavior of the business. If the cost of a service is perceived to be too low, the service will be exploited by the business. Therefore, realistic charges have to be implemented.
Customer behavior
Informing customers of costs can adjust customer/user behavior so that better use is made of IT resources. Informing customers/users of the costs associated with the services being rendered will also make them better corporate citizens. This will serve 2 purposes:
Reduce the inefficient use of IT resources.
Give better control of the peaks and valleys of IT usage by adjusting charges based on usage and times of usage.
COSTING, CHARGING AND BUDGETING CYCLE An IT operational plan will contain information from the Business IT requirements and Financial targets dictated by senior management. The outcome of this subprocess is fed into a cost model where the costs are analyzed to determine which charging policies should be administered. The proposed charges are then fed back to the business.
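A cost model of the kind mentioned above can be sketched with the three cost elements defined earlier. In this illustration, direct costs are assigned to their service, indirect costs are apportioned by a usage share, and unabsorbed indirect costs are recovered through a uniform percentage uplift. Both the apportionment by usage share and the uplift approach are assumptions made for the example, as are all names and figures.

```python
def service_costs(direct, indirect_total, usage_share, unabsorbed_total):
    """Return the full cost per service.

    direct: {service: direct cost}
    usage_share: {service: fraction of indirect costs}, expected to sum to 1
    """
    if abs(sum(usage_share.values()) - 1.0) > 1e-9:
        raise ValueError("usage shares must sum to 1")
    # Direct cost plus the apportioned slice of indirect costs.
    absorbed = {
        service: cost + indirect_total * usage_share[service]
        for service, cost in direct.items()
    }
    # Spread the unabsorbed costs as the same percentage uplift on every service.
    uplift = unabsorbed_total / sum(absorbed.values())
    return {service: cost * (1 + uplift) for service, cost in absorbed.items()}


# Hypothetical figures for two services.
costs = service_costs(
    direct={"Service A": 6000.0, "Service B": 2000.0},
    indirect_total=2000.0,
    usage_share={"Service A": 0.5, "Service B": 0.5},
    unabsorbed_total=1000.0,
)
print(costs)
```

The per-service totals add up to the total cost of IT services, which is the figure the proposed charges need to recover.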
BENEFITS
Reduced long-term costs through budgeting, monitoring and controlling expenditures
Increased confidence in setting and managing budgets. By using controls implemented by financial management, there will be improved accuracy and professionalism.
Turning IT into a business unit will ensure cost justification is put in place for all expenditures.
Accurate cost information makes business decisions on investments clearer. For example, "How much did it cost to run that software, hardware or help desk? Can it be done cheaper by using a different technology or service provider?"
Making the organization aware of the costs of doing business will result in better and more efficient use of IT resources. This will increase the professionalism of each IT business unit and will also make it easier to perform market comparisons with alternate service providers.