Incident Management Process

Loblaw
IT Service Management Processes
Incident Management Process
Document Name: Incident Management Process
Version History
Version: 1.00, 1.00, 1.1
Name: Ali Alaswad
Comment (the reason for the increment to the version): 1st draft; Final; Put more description and details on Incident review activity at 2nd line support
Date: July 3, 2008; Nov 7, 2008
Document Distribution Control
Recipient Name: Alex Foord, Patricia Tremblay, Bill Charters, Patrick Ma, Dorota Mac, Bobby Seebalack
Version: 1.0 (all recipients)
Date: July 4, 2008; July 4, 2008; July 4, 2008; July 3, 2008; July 4, 2008
Table of Contents
1. Process Goal
2. Process Scope
3. Process Benefits
4. Process Overview
5. Process Triggers
6. Process Interfaces with Other ITSM Processes
7. Incidents policy
8. Roles and Responsibilities
9. Roles Assignment Matrix
10. Priorities-High Level Definition
11. Impact-Urgency Matrix
12. Incidents Service Level Targets Definition
13. Process Deliverables
14. Process Measurement (Metrics) and Reporting
14.1. Metrics
15. Process Meetings
15.1. Daily Meeting
15.2. Monthly Meeting
16. Process RACI Chart
17. Process Detailed Description
18. Legend & Definitions
19. Attachments
1. Process Goal
The primary goal of the Incident Management process is to restore normal service operation as quickly as possible and minimize the adverse impact on business operations, thus ensuring that the best possible levels of service quality and availability are maintained.
2. Process Scope
Incident Management includes any event which disrupts, or which could disrupt, a service. This includes events which are communicated directly by users, either through the Service Desk or through an interface from Event Management to Incident Management tools. Incidents can also be reported and/or logged by technical staff. This does not mean, however, that all events are incidents. Many classes of events are not related to disruptions at all, but are indicators of normal operation or are simply informational.
Incidents and service requests are different. Service requests do not represent a disruption to agreed service, but are a way of meeting the customer's needs and may be addressing an agreed target in an SLA.
3. Process Benefits
For Business:
- Critical/high impact and critical/high urgency incidents are handled first
- Quicker resolution of incidents, leading to productivity gains
For IT Organization:
- Clear view of the status and priorities of the incidents
- Remove duplication of effort
- Higher user and customer satisfaction
For Customers/Users:
- Incidents are not lost or forgotten
- Quick restoration of service following an incident
- Up-to-date status of their incident provided
4. Process Overview
The Incident Management process is used to report, log, assess, categorize, restore, resolve, verify and close Incidents that occur within the IT services, applications, and infrastructure items. This process does not address Problem Management, Root Cause Analysis, Incident Trend Analysis, or the Service Request Process (Request Fulfillment). Incident Management and Problem Management are two separate activities that are part of the overall objective of improved reliability within an IT environment.
An Incident is any event that is not part of standard service operation and that causes, or has the potential to cause, interruption or degradation in business operations, or that will result in deviation from Service Level Agreements. Recurring Incidents or issues are resolved through the Problem Management Process. A separate Problem record is created in the problem management system to manage and track the resolution of the Problem.
Incident Management includes the following phases:
- Incident Detection & Recording
- Incident Prioritization, Categorization & Initial Support
- Investigation & Diagnosis
- Restoration/Recovery
- Incident Resolution
- Resolution Verification & Incident Closure
Figure - 1
5. Process Triggers
Incidents can be triggered in many ways:
- A user contacts the Servicedesk (the common route)
- A user completes a web-based incident-logging screen
- Event management tools raise an alert
- Technical staff may notice potential failures and raise an incident
- Suppliers may send some form of notification of a potential or actual failure
6. Process Interfaces with Other ITSM Processes
Incident Management interacts with the other ITSM processes shown in the diagram below.
- Problem Management: Unknown error, root cause analysis required; Known error, workarounds, quick fixes, permanent solution
- Configuration Management: Incident associated with CI in CMS (CMDB); Information on CI
- Change Management: Incident solved by change; Information on change planning and implementation, change failure
- Service Level Management: Service level breaches; Service Level Agreement, Service Catalogue
- Capacity Management: Trigger for performance monitoring; Reports on capacity related incidents
- Availability Management: Incident Management data to determine the availability of IT services and look at where the incident lifecycle can be improved; Reports on availability related incidents
Figure - 2
7. Incidents policy
Policy -1: Incident and Service Request (Request Fulfillment) Management Process are two separate processes.
Policy -2: Incident and Problem Management Process are two separate processes.
Policy -3: Single Point of Contact (NSC Servicedesk).
Policy -4: One centralized tool for incidents across the IT organization.
Policy -5: A Problem record can be created during an Incident Life Cycle.
Policy -6: A Request For Change can be developed during an Incident Life Cycle.
Policy -7: Incident Management escalates and notifies on the following:
a) When an incident reaches 75% of its service level targets
b) When an incident reaches 100% of its service level targets (SLA breached)
c) No solution is available internally or externally
d) Critical or High priority
Policy -8: End user means all parties or individuals benefiting from IT services.
Policy -9: Incident ownership is described as follows (Activity: Owner):
- Logging & Recording Incident: Service desk (if logged through Servicedesk); IT Staff (if logged by IT Staff)
- Incident Prioritization, Categorization & Initial Support: Service desk (if logged through Servicedesk); IT Staff (if logged by IT Staff)
- Investigation & Diagnosis: Service desk (Initial Support, 1st Line Support); IT Service Support specialist (2nd Line Support)
- Restoration/Recovery: IT Service Support specialist (2nd Line Support)
- Incident Resolution: IT Service Support specialist (2nd Line Support)
- Resolution Verification & Incident Closure: IT Service Support specialist (2nd Line Support)
- Incident Re-Open: Servicedesk
Policy -10: An incident record is closed automatically three days after resolution. It can be re-opened within the same period (3 days); otherwise it will be opened as a new incident record and linked to the original one.
Policy -11: An incident record should not bounce more than 3 times within the team or between incident resolving groups.
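The re-open window in Policy -10 and the bounce limit in Policy -11 can be expressed as two small checks. The following is only an illustrative sketch: the function names and return labels are hypothetical, and treating a request made exactly three days after resolution as still inside the window is an assumption the policy text does not settle.

```python
from datetime import datetime, timedelta

REOPEN_WINDOW = timedelta(days=3)  # Policy -10: three days after resolution
MAX_BOUNCES = 3                    # Policy -11: no more than 3 bounces


def reopen_action(resolved_at: datetime, requested_at: datetime) -> str:
    """Decide how to handle a call about an already resolved incident.

    Returns "reopen" inside the three-day window, otherwise
    "new_linked_record" (a new record linked to the original one).
    Assumption: the boundary (exactly three days) counts as inside.
    """
    if requested_at - resolved_at <= REOPEN_WINDOW:
        return "reopen"
    return "new_linked_record"


def bounce_limit_exceeded(bounce_count: int) -> bool:
    """True when a record has bounced more than the allowed 3 times."""
    return bounce_count > MAX_BOUNCES
```

For example, a requester who calls two days after resolution gets the original record re-opened, while a call on day four results in a new record linked to the original.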
8. Roles and Responsibilities

Process Owner
- Responsible for end-to-end success of incident management
- Perform qualitative management
- Manage and review problem activity
- Monitor the effectiveness and efficiency of the process and undertake management of continuous improvement
- Assure Incident Manager is on course and schedule
- Review, analyze and approve changes and modifications to the incident process
- Assist and advocate integration between teams and processes
- Coordinate group procedures to incident management tasks and activities

Process Manager/Incident Manager
- Driving the efficiency and effectiveness of the Incident Management process
- Producing management information
- Monitoring the effectiveness of Incident Management and making recommendations for improvement
- Managing critical and high Incidents
- Conducting and facilitating the Post Incident Review meetings

Servicedesk Analyst/1st Line Support
- Logging all relevant incident/service request details, allocating categorization and prioritization codes
- Providing first-line investigation and diagnosis
- Resolving those incidents/service requests they are able to resolve
- Escalating incidents/service requests that they cannot resolve within agreed timescales
- Keeping users informed of progress
- Conducting customer/user satisfaction callbacks/surveys as agreed
- Communication with users: keeping them informed of incident progress, notifying them of impending changes or agreed outages, etc.
- Updating the Configuration Management System under the direction and approval of Configuration Management, if so agreed

IT Service Support Specialist/2nd Line Support
- Diagnose and perform incident control for all issues in their specialty or area of expertise as assigned
- Restore service for incidents assigned to the service partner per SLAs, OLAs and incident prioritization
- Update, status, document, and resolve all assigned tickets
- Follow and support all tasks and procedures of the incident management process per OLAs
- Prepare and submit necessary requests for change
- Create problem record for further root cause analysis
- Verify resolution with the requester

Servicedesk Coordinator
- Monitor ticket from inception to closure per SLA and targets and escalate on SLA violation
- Escalate on critical and high incident priorities
- Follow up with technical support teams on accepting the ticket
- Follow up with technical support teams on restoring services and resolving the incident
- Assure status and updates as well as targets are met and appropriately documented
- Run scheduled and ad hoc incident management reporting

3rd Party Company
- Receive incident records as per the SLA
- Provide incident solution
- Update incident records
- Communicate with IT incident requester
- Follow LCL timelines and targets as per SLAs

Incident Requester
- Responsible for contacting the service desk and initiating an IT incident
- Articulate issue and appropriate entitlement information
- Common availability or designee for notification and contact about the restoration of service

Figure - 3
9. Roles Assignment Matrix
Columns: Role, Name of Resources, Location, Tel, Email, Time Zone

Process Owner: Alex Foord, Toronto, 905-861-2464, Alex.foord@loblaw.ca, EST
Process Manager: Patricia Tremblay, Montréal, 514-383-8851, patremb@provigo.ca, EST
Servicedesk (1st Line Support): TBD, Montréal, 1-866-672-7924 (1-866-NSC-7X24), 514-383-7019 (Montréal), EST
IT Service Support Specialist (2nd Line Support): TBD. Workgroups: Infrastructure-Network LAN, Infrastructure-Systems, Infrastructure-Security, Infrastructure-Network Wireless, Application-
3rd Party Company: IBM - TBD, Toronto
Figure - 4
10. Priorities-High Level Definition

Critical: Complete or partial outage of service(s) or component(s) that stops one or more of the Vital Business Functions, causing significant loss of revenue or of the ability to deliver important public services. Service(s) or component(s) supporting a critical business process are down or not functioning correctly, or one or several critical business processes are unavailable, affecting all users. There is no workaround.

High: Severely affecting some key users, or impacting a large number of users. Service(s) or component(s) are not down, but there is a serious problem affecting a great majority of the users and their productivity, or affecting an individual's ability to conduct business effectively. The workaround (if provided) is awkward and inefficient.

Medium: No severe impact. Service(s) or component(s) are not down, but there is a problem affecting a small number of users. Business critical work can be performed. An acceptable workaround is available.

Low: Service(s) or component(s) are not down and business critical work can be performed, but cosmetic work would be beneficial.
List of Critical Services
No. | Service name | Service Owner | Location | Service Hours
1 to 8 | | | |
11. Impact-Urgency Matrix (See Appendix A for the Modified Matrix)
Priorities by urgency (rows) and impact (columns):

Urgency | Impact High | Impact Medium | Impact Low
High    | 1           | 2             | 3
Medium  | 2           | 3             | 4
Low     | 3           | 4             | 5

Figure - 5
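As an illustration of how the matrix in Figure - 5 turns an urgency and an impact rating into a priority code, the lookup can be sketched as a small table. The names `PRIORITY_MATRIX` and `priority_code` are hypothetical, the values are the ones read from Figure - 5 (not the modified matrix in Appendix A), and the sketch makes no assumption about how codes 1 to 5 map to the named priorities in section 10.

```python
# Priority codes keyed by (urgency, impact), as read from Figure - 5.
PRIORITY_MATRIX = {
    ("high", "high"): 1, ("high", "medium"): 2, ("high", "low"): 3,
    ("medium", "high"): 2, ("medium", "medium"): 3, ("medium", "low"): 4,
    ("low", "high"): 3, ("low", "medium"): 4, ("low", "low"): 5,
}


def priority_code(urgency: str, impact: str) -> int:
    """Look up the priority code for an urgency/impact pair.

    Raises KeyError for ratings outside high/medium/low.
    """
    return PRIORITY_MATRIX[(urgency.lower(), impact.lower())]
```

Keeping the matrix as explicit data rather than a formula makes it straightforward to swap in a modified matrix without touching the lookup.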
12. Incidents Service Level Targets Definition (See Appendix B for the Modified Matrix)
Service Level Targets by priority:

Priority | Accept Incident Record | Restore/Recover Service(s) | Resolve Incident
Critical | 15 min                 | 1 hr                       | 8 hr
High     | 30 min                 | 1.5 hr                     | 12 hr
Medium   | 24 hr                  | 48 hr                      | 72 hr
Low      | 72 hr                  | 4 Days                     | 7 Days
Planning | Planned                |                            |

Figure - 6
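The 75% and 100% escalation points in Policy -7 can be checked against the resolve targets in Figure - 6 with a short calculation. This is a sketch under stated assumptions: the target values are one reading of the figure, "Days" are taken as 24-hour calendar days, `sla_alert` is a hypothetical name, and the elapsed time passed in is assumed to already exclude any period during which the SLA clock was stopped.

```python
from datetime import timedelta
from typing import Optional

# Resolve-incident targets per priority, as read from Figure - 6.
RESOLVE_TARGETS = {
    "critical": timedelta(hours=8),
    "high": timedelta(hours=12),
    "medium": timedelta(hours=72),
    "low": timedelta(days=7),  # assumption: calendar days
}


def sla_alert(priority: str, elapsed: timedelta) -> Optional[str]:
    """Return "100%" on breach, "75%" at the warning point, else None."""
    target = RESOLVE_TARGETS[priority.lower()]
    fraction = elapsed / target  # timedelta / timedelta gives a float
    if fraction >= 1.0:
        return "100%"
    if fraction >= 0.75:
        return "75%"
    return None
```

For a Critical incident, the 75% alert would fire six hours into the eight-hour resolve target.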
13. Process Deliverables

- Service Restore/Recovery
- Incident Resolution
- Post Incident Review Report (on critical & high incidents)
- Problem Record (if applicable)
- Change Request (if applicable)
14. Process Measurement (Metrics) and Reporting

Parties involved in the Incident Management Process can monitor, investigate and improve on the findings within reports, and can identify incident patterns and measure them against SLAs and expectations. Reports will be used in process meetings to improve the process and to evaluate teams' capabilities and performance.
14.1. Metrics
- Number and percentage of critical and high incidents
- Mean elapsed time to achieve incident resolution or circumvention (restoration), broken down by impact code
- Percentage of incidents handled within agreed response time
- Average cost per incident
- Number of incidents reopened, and as a percentage of the total
- Number and percentage of incidents incorrectly assigned (misrouted)
- Number and percentage of incidents incorrectly categorized
- Number and percentage of incidents closed by the Service Desk without reference to other levels of support
- Number and percentage of incidents resolved remotely, without the need for a visit
- Number of incidents handled by each team (workgroup)
- Breakdown of incidents by time of day, to help pinpoint peaks and ensure matching of resources
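A few of these metrics can be computed directly from a list of incident records. The sketch below is illustrative only: the record fields (`reopened`, `misrouted`, `closed_by_service_desk`, `hours_to_resolve`) and the function name are hypothetical and do not describe the schema of any particular incident management tool.

```python
from statistics import mean


def percentage(part: int, total: int) -> float:
    """Share of total as a percentage; 0.0 when there are no records."""
    return 100.0 * part / total if total else 0.0


def incident_metrics(incidents: list) -> dict:
    """Compute a subset of the section 14.1 metrics.

    Each incident is a dict with the hypothetical keys "reopened",
    "misrouted", "closed_by_service_desk" (booleans) and
    "hours_to_resolve" (float).
    """
    total = len(incidents)
    reopened = sum(1 for i in incidents if i["reopened"])
    misrouted = sum(1 for i in incidents if i["misrouted"])
    first_line = sum(1 for i in incidents if i["closed_by_service_desk"])
    hours = [i["hours_to_resolve"] for i in incidents]
    return {
        "total": total,
        "reopened_count": reopened,
        "reopened_pct": percentage(reopened, total),
        "misrouted_count": misrouted,
        "misrouted_pct": percentage(misrouted, total),
        "closed_by_service_desk_pct": percentage(first_line, total),
        "mean_hours_to_resolve": mean(hours) if hours else None,
    }
```

Breakdowns by impact code, workgroup, or time of day would follow the same pattern after grouping the records by the relevant field.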
15. Process Meetings

15.1. Daily Meeting
Title: Daily Operational Meeting
Purpose:
- To ensure normal operation of the infrastructure or a process
- To detect potential issues with the infrastructure or process concerned
- Review report on incidents created during the last 24 hours and outstanding incidents
- To ensure that corrective action has been taken and that it was effective
Frequency: Daily (preferably early morning)
Role Players:
- Incident Manager (facilitator; prepares the agenda and writes the minutes of the meeting)
- Servicedesk Coordinator
- IT Managers/Technology Managers
- Potential escalation to more senior managers if required
- Staff who execute the process, on demand, depending on the incident categories and the need for their presence in the meeting
Agenda Content:
- Comparison between required and actual performance
- Reports of outstanding incidents, missed targets and unexpected levels of performance
- Review the status of the actions assigned during previous meetings
- Develop action plan for the new outstanding issues
- Agenda will be submitted to the incident manager a minimum of two hours before the meeting
Method of Communication: Conference Call (Tel Number: 1-88...)
Tools:
- Incident Management System
- Repository for keeping meeting agenda and minutes
15.2. Monthly Meeting

Title: Monthly Process Governance Meeting
Purpose:
- Overall review on process performance
- Identify gaps and develop action plans to accommodate solutions
- Review report on incidents created during the last 24 hours and outstanding incidents
- To ensure that corrective action has been taken and that it was effective
Frequency: Monthly
Role Players:
- Incident Manager (facilitator; prepares the agenda and writes the minutes of the meeting)
- Process Owner
- IT Directors and Vice Presidents (Infrastructure & Applications)
- Business operation representative
Agenda Content:
- Comparison between required and actual performance
- Review business impacts and reports on total incidents cost
- Reports on overall SLA performance (breaches vs. exceeding the agreed service level targets)
- Review the status of the actions assigned during previous meetings
- Develop action plan for the new outstanding issues
- Agenda will be submitted to the incident manager a minimum of two days before the meeting
Method of Communication: Conference Call (Tel Number: 1-88...)
Tools:
- Incident Management System
- Repository for keeping meeting agenda and minutes
16. Process RACI Chart

Step
Activity
Incident Requester
NSCServicedesk (1st Line Support)
Incident Manager
IT Operation
3rd Party Company
1 2 3,4, 5 6 7 8,9
Incident Record created by IT Service Support Specialist (2nd Line Support) Incident record created in response to an event management tool alert Reporting Incident by phone, web access or email to service desk Receiving incident record and associated with CI Is it an Existing Incident? Link to the Existing Incident Record and Update the Requester Incident Prioritization Is it a Critical or High Priority Incident? Is it a service request? Execute service request process Incident categorization Provide initial support and diagnosis Incident can be Resolved by NSC? Or Workaround Available in KB? NSC provide solution Incident Resolved within SL Targets? Event Escalated to Service Desk Coordinator Verify incident resolution with Incident Requester Incident Resolution Accepted by Requester? Incident Updated and Closed Can NSC Provide Further Actions within SL Targets? Requires 2nd Line Support? Dispatched to IT Service Appropriate Workgroup-Queue Specialist
AR R I C C C C AR C C C I R AR AR AR AR AR CR AR AR AR AR AR AR AR R AR AR AR AR AR
Incident Requester NSCServicedesk (1st Line Support) IT Service Support Specialist (2nd Line Servicedesk Coordinator Incident Manager IT Operation
AR R
10 11 12 13 14 15 16 17 18 19 20
R AR
21 22 23 24 25
C C C
3rd Party Company
26 Step
Requires 3rd Party Company Involvement?
Activity
Support) 27 28 29 30 31 32 33 34 Solution Not Available Escalate to Incident Manager Categorize and Dispatch to the Appropriate Workgroup(S) Escalate to Incident Manager Escalate to service Desk Coordinator Incident Resolution Not Accepted Call received from Incident Requester Re-Open the Incident Record Review Miss Routed Records Create Incident record from Change Management Process Incident Record Reside Under the Appropriate Workgroup/Queue Is Incident Critical/High Priority? Incident Record Created by IT Service Specialist/IT Operation Team Member and Dispatched to the Appropriate Workgroup/Queue Call by Phone the Service Desk to Report the Incident Disruption Reach 75% of SL Targets? Record Accepted, Change Record Status to In-Progress Incident Record Reviewed Is it a Miss-Routed Record? Re-direct Record to Service Desk Queue and Inform Service Desk Analyst by Phone Further Information is Required? SLA Clock Stopped Is the Incident Generated by the Event management Tool? Contact Incident Requester (IT Service Specialist) Information Provided by Incident Solution Requester SLA Clock Restarted Investigate and Diagnosis
R C
AR AR AR AR AR AR AR
I C
C C
C I C AR AR AR R
35
36 37
38 39 40 41 42 43 44 45 46 47 48 49 50
C R C
Incident Requester
C
NSCServicedesk
AR AR AR AR AR AR AR AR AR AR AR AR AR
IT Service Support
C
Servicedesk Coordinator Incident Manager IT Operation
C
3rd Party Company
Step
Activity
(1st Line Support) 50-A 50-B 51 52 53 Is Priority Correct Call NSC to Change the Priority Solution Available? Requires 3rd Party Company Involvement Requires Another Loblaw Workgroup Involvement? Incident Record Bounce Exceed 3? Dispatch Incident Record to Another Workgroup for Assistance Solution Provided and Service Restored Change Status to Restored 57 58 Further Root cause Analysis Required? Stop SLA Change Status to Wait RCA 58 Problem Management Process Create Problem Record 59 Start SLA Change Status to In-Progress 60 61 Change Request Required? Stop SLA Change Status to Wait RFC Stop SLA Change Status to Wait RFC 63 Change Management Process Create RFC 64 Change Management Process Change Implemented Successfully 65 Start SLA Change Status to In-Progress 66 Provide Incident Resolution Change Status to Resolved
Specialist (2nd Line Support)
AR AR AR AR AR AR AR C I
54 55
56
AR AR AR AR AR AR AR AR AR AR AR AR
62
Step
Activity
Incident Requester
NSCServicedesk (1st Line Support)
Incident Manager
IT Operation
3rd Party Company
67 68
Update Incident Record Verify Incident has been Resolved Automatic Email Sent to Incident Requester or Service Owner
C C R I
AR AR AR AR AR AR AR AR AR
69 70 71 72 73
Is Incident Requester from Stores? Call Requester by Phone to Confirm Resolution Solution Accepted? Record Automatically Closed after 3 Days
I C I AR AR AR AR AR AR AR AR AR AR AR
Further Action Required 74 75 76 77 78 79 80 Escalate to Incident Manager Auto Notification sent to Servicedesk Coordinator Monitoring & Tracking Incident and Receiving Notifications Is Incident Critical/High Priority ? Is Incident Reached to 100% SL Alert ? Is Incident Reached to 75% SL Alert ? Receives 75% Alert from System on Incident Missing SL Targets (Automatic Notification) 81 Follow up with Incident Assignee/Shift Manager until Record Accepted/Restored/Resolved Incident Accepted/Restored/Resolved? End the Notification and Escalation Process and Continue Monitoring & Tracking Incidents Reach 100% Alert Timeframe Receives 100% Alert from System on Incident Missing SL Targets (Automatic Notification) 86 Send Email and Call the Service Delivery
C C
C C I
82 83
84 85
Manager/Service Owner 87 88 Step Notify Incident Manager and Service Desk by Phone Start Sending Notifications As Per the Incident Priority Notification Schedule
I I
Incident Requester NSCServicedesk (1st Line Support) IT Service Support Specialist (2nd Line Support)
AR AR
I I
Incident Manager IT Operation 3rd Party Company
Activity
89 90 91 92 93
Incident Restored/Resolved? Receive Notification on Critical/High Priority Incident Revise Incident Prioritization Is Incident Critical/High Priority? Contact the required Support Team Members, Support Team Manager, Sr. Managers, Service Desk Manager and Suppliers/Partners to Join Bridges. Open and Facilitate Operation Bridge Update and Assign Actions Open and Facilitate Management Bridge Update, Review Status and Decide on Next Step Service Restored? Close Operation & Management Bridge Conduct a Post Incident Review When Incident Resolved Problem Management Process Create Problem Record for All Critical/High Priority Incident
AR C C AR AR AR AR AR AR AR AR AR AR C AR AR AR AR AR AR AR AR AR
94 95
96 97 98
99
100
Contact Servicedesk Coordinator to Follow the Escalation Management Process for the Specific Priority Monitoring Event Management Tool Generated Alerts and Systems Performance Alert(s) Requires Actions? Is it Incident Alert? Is it a Critical/High Priority Incident? Call Servicedesk & Escalate to Incident Manager by Phone Create Incident Record Dispatch Incident Record to Appropriate Workgroup Problem Management Process
101
102 103 104 105 106 107 108
C C
Create Problem Record 109, 110, 111, 112 113 Step 3rd party Company Receives Request through Call, Web or Email Incident Record Number Exchanged
AR C
Incident Requester NSCServicedesk (1st Line Support)
C
IT Service Support Specialist (2nd Line Support) Servicedesk Coordinator Incident Manager IT Operation
AR
3rd Party Company
Activity
114 115 116 117
Follow 3rd Party Company internal Process LCL SL Targets Violated? Solution Provided Inform LCL Service Desk on Resolution and Verify Incident Resolution with Incident Requester 3rd Party Company Follow LCL Escalation Management Process and Notifications.
I I
AR AR AR AR AR
118
Legend
R: Responsible for the action but not necessarily an authority or approval
A: Accountable for the action, only one person
C: Consulted before or during the action
I: Informed
17. Process Detailed Description
Each entry gives the activity number and name, followed by its explanation.

1. IT Service Support Specialist Create Incident Record
IT staff detect an incident, create an incident record by directly accessing the system, and dispatch it to the appropriate workgroup (queue). GOTO activity 36.

2. Event Management Tool
Another source of incidents: the tool sends an alert due to a service disruption or degradation in business operations. GOTO activity 101.

3. Phone call
The end user calls the Servicedesk to report a service disruption. GOTO activity 6.

4. Email
The end user sends an email to the Servicedesk email address to report a service disruption (optional). GOTO activity 6.

5. Web Access
The end user accesses the system and creates an incident record using predefined templates in the web access. Some incident records will be dispatched automatically to the appropriate workgroup. GOTO activity 6.

6. Incident Resolution Request Received/Recorded
Incidents must be fully logged and date/time stamped, regardless of whether they are raised through a Service Desk telephone call, the web access or email, or are detected automatically via an event alert. See Attachment (Incident Record Template).

7. Is it an Existing Incident?
The Service Desk analyst looks into the subject of the incident. If it is another call on an already recorded incident, GOTO activity 8. If NOT, GOTO activity 10.

8. Link to the Existing Incident Record
The Service Desk analyst links the created incident record to the original one and gives the reference number of the original incident record to the caller for future reference.

9. Update Requester
The requester is updated with the reference number and incident status. GOTO activity 119 (END).
10. Incident Prioritization
Allocate an appropriate prioritization code, as this will determine how the incident is handled both by the support tool and by staff. Prioritization can normally be determined by taking into account both the urgency of the incident (how quickly the business needs a resolution) and the level of impact it is causing (see Figure - 5). An indication of impact is often (but not always) the number of users being affected. In some cases, and very importantly, the loss of service to a single user can have a major business impact. It all depends upon who is trying to do what, so numbers alone are not enough to evaluate overall priority.
There are four priorities (see section 10): Critical, High, Medium, Low.
The Service Desk analyst, or whoever creates the incident record, must assign the correct priority depending on the urgency and impact of the incident.
11. Is it a Critical or High Priority Incident?
If the incident is classified as Critical or High, GOTO activity 28. If NOT, continue with activity 12.

12. Is it a service request?
If the Service Desk analyst finds that the call is NOT about an incident and is a service request, continue with activity 13. If NOT, GOTO activity 14.

13. Execute service request process
Service request (Request Fulfillment) provides a channel for users to request and receive standard services for which a pre-defined approval and qualification process exists. The Service Desk advises the requester to follow the Service Request (Request Fulfillment) process for this purpose. In certain cases the Service Desk analyst will execute the Service Request (Request Fulfillment) process to fulfill the requester's need (these services need to be identified and announced to the public, such as changing a password). Otherwise, most services can be requested through the web by selecting and filling in the appropriate template, which will be dispatched automatically to the appropriate group for action.

14. Incident categorization
Allocate suitable incident categorization coding so that the exact type of the call is recorded. This will be important later when looking at incident types/frequencies to establish trends for use in Problem Management and other ITSM activities. There are multiple levels of categories; the service desk analyst will select the appropriate, lowest-level category, depending on the symptoms (user description) and the analyst's knowledge.
Examples of categorization:
- Software > Application > Finance Suite > Purchase Order System
- Hardware > Server > Memory Board > Card Failure

15. Provide initial support and diagnosis
The service desk analyst provides initial support and starts diagnosing the incident. This applies to Medium and Low priority incidents ONLY and MUST NOT take longer than 15 minutes.

16. Incident can be Resolved by NSC? Or Workaround Available in KB?
If the service desk analyst can provide a resolution to the incident within 15 minutes, depending on his/her technical expertise and/or the workarounds available in the knowledge base, continue with activity 17. If NOT, GOTO activity 24.

17. NSC provide solution
The service desk provides the resolution.

18. Incident Resolved within SL Targets?
If the incident is resolved within the service level target (15 minutes), GOTO activity 20.
If NOT, continue with activity 19.

19. Event Escalated to Service Desk Coordinator
When the service desk analyst exceeds the agreed service level of 15 minutes, an automatic notification is sent to the service desk coordinator to notify him/her of the violation. The reason is to prevent the service desk analyst from holding the incident record for more than 15 minutes, and to avoid any implications that would impact service desk performance. GOTO activity 81 and, in parallel, continue with activity 20.

20. Verify incident resolution with Incident Requester
The resolved incident is verified with the requester to ensure his/her acceptance of, and satisfaction with, the resolution.

21. Incident Resolution Accepted by Requester?
If the incident resolution is accepted by the requester, continue with activity 22. If NOT, GOTO activity 23.

22. Incident Updated and Closed
The service desk updates the incident with the resolution steps and closes the record.

23. Can NSC Provide Further Actions within SL Targets?
When the incident resolution is not accepted by the requester, the service desk analyst needs to determine whether there is still time to perform further actions without exceeding the 15-minute service level. If so, GOTO activity 15. If NOT, continue with activity 24.

24. Requires 2nd Line Support?
If 2nd line support is required to provide the solution, continue with activity 25. If NOT, GOTO activity 26.

25. Dispatched to IT Service Specialist Appropriate Workgroup-Queue
The service desk dispatches the incident record to the appropriate workgroup by selecting the appropriate queue in the tool. GOTO activity 35.

26. Requires 3rd Party Company Involvement?
If a 3rd party company is required, GOTO activity 109, 110 or 111. If NOT, continue with activity 27.

27. Solution Not Available, Escalate to Incident Manager
If the incident cannot be resolved by either the IT service support specialist or the 3rd party company because the incident is out of their scope of services, the service desk escalates to the incident manager for further action. GOTO activity 93.

28. Categorize and Dispatch to the Appropriate Workgroup(s)
The incident is critical or high; the Servicedesk needs to assign it the correct category and swiftly dispatch it to the correct workgroup (queue). GOTO activity 35 and, in parallel, continue with activities 29 and 30.

29. Escalate to Incident Manager
In parallel, the service desk analyst calls the incident manager by phone to inform her/him of the incident. GOTO activity 90 (Incident Manager starts the escalation activities). For the Servicedesk coordinator, continue in parallel with activity 30.
27
30
Escalate to service Desk Coordinator
In parallel activity the service desk analyst call the service desk coordinator by phone to inform her/him on the incident. GOTO activity 77 (Servicedesk coordinator starts the monitoring and following up activities).
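The first-line routing decisions in activities 11 to 27 can be summarized in a short sketch. This is only an illustration under stated assumptions: the `Call` record, its field names, and the function names are hypothetical and do not come from the incident management tool, and the categorization and verification steps (activities 14, 15 and 20 to 23) are collapsed into simple flags.

```python
from dataclasses import dataclass

FIRST_LINE_LIMIT_MIN = 15  # first-line time box stated in activity 15


@dataclass
class Call:
    # Hypothetical record; field names are illustrative only.
    priority: str  # "Critical", "High", "Medium" or "Low"
    is_service_request: bool = False
    resolvable_at_first_line: bool = False
    needs_second_line: bool = False
    needs_third_party: bool = False


def next_activity(call: Call) -> int:
    """Return the activity number the call is routed to."""
    if call.priority in ("Critical", "High"):
        return 28  # categorize, dispatch and escalate in parallel
    if call.is_service_request:
        return 13  # execute the service request process
    if call.resolvable_at_first_line:
        return 17  # service desk provides the solution
    if call.needs_second_line:
        return 25  # dispatch to a 2nd line workgroup queue
    if call.needs_third_party:
        return 109  # the text allows 109, 110 or 111; 109 is used here
    return 27  # no solution available, escalate to the incident manager


def first_line_breached(minutes_spent: float) -> bool:
    """True once the analyst has held the record longer than the time box."""
    return minutes_spent > FIRST_LINE_LIMIT_MIN
```

For example, `next_activity(Call("Medium", needs_second_line=True))` routes to activity 25, while any Critical or High call goes straight to activity 28 regardless of the other flags.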
Activity 31: Incident Resolution Not Accepted, Call Received from Incident Requester
The incident requester calls the service desk to inform them that the solution is not accepted, either fully or partially.

Activity 32: Re-Open the Incident Record
Despite all adequate care, there will be occasions when incidents recur even though they have been formally closed, or when the solution is not accepted and the requester calls within the 3 days given for feedback before the incident record is closed automatically. The service desk analyst reopens the existing incident record if the call is within the three days. If it is after the three-day period, the service desk opens a new incident record and links it to the original one. GOTO activity 10.

Activity 33: Review Miss Routed Records
The incident record is redirected to the service desk queue if it was dispatched by mistake to the wrong workgroup (queue). The service desk analyst reviews the misrouted incident record and follows with the categorization activity. GOTO activity 33.

Activity 34: Create Incident Record from Change Management Process
The incident record is created by the change implementer when a change implementation fails and the back-out plan fails too. The change implementer (IT service support specialist) logs in to the incident management system, creates an incident record, and dispatches it to the appropriate workgroup for action.

Activity 35: Incident Record Reside Under the Appropriate Workgroup/Queue
The incident record resides under the appropriate IT service support specialist workgroup. GOTO activity 39.

Activity 36: Is Incident Critical/High Priority?
If the incident detected by the IT service support specialist is critical or high, GOTO activity 38.

Activity 37: Incident Record Created by IT Service Specialist/IT Operation Team Member and Dispatched to the Appropriate Workgroup/Queue
If the incident is not critical or high, the IT service support specialist dispatches the incident record to the appropriate workgroup (queue) for action. GOTO activity 35.

Activity 38: Call by Phone the Service Desk and the Incident Manager to Report the Incident Disruption
The IT service support specialist calls the service desk and the incident manager by phone to report the disruption caused by the critical or high priority incident.
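The re-open rule in activity 32 is a simple date comparison. A minimal sketch follows, assuming the three days are calendar days counted from the resolution time and that a call exactly at the three-day mark still counts as inside the window; both points, and the function and return-value names, are assumptions made for illustration rather than documented behavior of the tool.

```python
from datetime import datetime, timedelta

FEEDBACK_WINDOW = timedelta(days=3)  # feedback period named in activity 32


def handle_rejected_resolution(resolved_at: datetime, called_at: datetime) -> str:
    """Return "reopen" inside the feedback window, otherwise "new_linked"."""
    if called_at - resolved_at <= FEEDBACK_WINDOW:
        return "reopen"  # reopen the existing incident record
    return "new_linked"  # open a new record and link it to the original
```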
Activity 39: Reach 75% of SL Targets?
If the incident record remains unattended in the queue and reaches 75% of the service level target for accepting the incident record, GOTO activity 80 and, in parallel, continue with activity 40.

Activity 40: Record Accepted, Change Record Status to In-Progress
The incident record is attended by an IT service support specialist and the record status is changed to In-Progress.

Activity 41: Incident Record Reviewed
The IT service support specialist reviews the incident description stated by the requester and the other information captured during incident logging.

Activity 42: Is it a Miss-Routed Record?
If it is a misrouted incident record, sent by mistake to the wrong workgroup, continue with activity 43. If not, GOTO activity 44.

Activity 43: Re-direct Record to Service Desk Queue and Inform Service Desk Analyst by Phone
The IT service support specialist redirects the incident record to the service desk queue for re-categorization and dispatch to the correct workgroup (queue). GOTO activity 33.

Activity 44: Further Information is Required?
If the information recorded in the incident record is incomplete and the IT service support specialist requires more information to diagnose the incident and provide the appropriate solution, continue with activity 45. If not, GOTO activity 50.

Activity 45: SLA Clock Stopped
The IT service support specialist stops the service level agreement clock by changing the incident status to Wait for Information (or any other pre-defined status with the same purpose). The SLA clock is stopped because the requester's response time is unknown; letting the clock run would be neither practical nor fair, as the requester might take minutes, hours or sometimes days to reply.

Activity 46: Is the Incident Generated by the Event Management Tool?
If the incident was detected by the event management tool, contact the Service Owner and, once the information is provided, GOTO activity 49 to restart the SLA clock. If the incident was created by a requester, continue with activity 47.

Activity 47: Contact Incident Requester (IT Service Specialist)
The IT service support specialist contacts the incident requester for more information on the incident.

Activity 48: Information Provided by Incident Solution Requester
The requester provides the missing or additional information needed by the IT service support specialist.

Activity 49: SLA Clock Restarted
Restart the SLA clock by changing the incident status back to In-Progress.

Activity 50: Investigate and Diagnosis
Each of the support groups involved in handling the incident investigates and diagnoses what has gone wrong. All such activities should be fully documented in the incident record so that a complete historical record of all activities is maintained at all times. Valuable time can often be lost if investigation and diagnostic actions are performed serially. Where possible, such activities should be performed in parallel to reduce overall timescales, and support tools should be designed and/or selected to allow this. However, care should be taken to coordinate activities, particularly resolution or recovery activities; otherwise the actions of different groups may conflict or further complicate a resolution.
This investigation is likely to include such actions as:
- Establishing exactly what has gone wrong or what is being sought by the user
- Understanding the chronological order of events
- Confirming the full impact of the incident, including the number and range of users affected
- Identifying any events that could have triggered the incident (e.g. a recent change or some user action)
- Knowledge searches for previous occurrences in previous Incident/Problem Records and/or Known Error Databases, or in manufacturers'/suppliers' Error Logs or Knowledge Databases

Activity 50-A: Is Priority Correct?
The IT support specialist reviews the priority setting. If a change is required, continue with activity 50-B. If a change is NOT required, GOTO activity 51.

Activity 50-B: Call Servicedesk to Change Priority
The privilege to change priority is limited to the NSC and the incident manager. IT teams that need to change the priority should call the NSC to do so. The purpose is to control re-prioritization and prevent improper settings.

Activity 51: Solution Available?
If a solution is available within the workgroup handling the incident, GOTO activity 56. If a solution is NOT available within the workgroup handling the incident, continue with activity 52.

Activity 52: Requires 3rd Party Company Involvement?
Does providing the solution require 3rd party company involvement? If YES, GOTO activity 109, 110 or 111. If NO, continue with activity 53.

Activity 53: Requires Another Loblaw Workgroup Involvement?
Does providing the solution require another Loblaw workgroup to get involved? If YES, continue with activity 54. If NO, GOTO activity 74.

Activity 54: Incident Record Bounce Exceed 3?
If the incident record has been transferred more than 3 times, outside the workgroup or within it, GOTO activity 75 and, in parallel, continue with activity 55. If not, continue with activity 55.

Activity 55: Dispatch Incident Record to Another Workgroup for Assistance
The incident ticket is dispatched to another workgroup to provide the solution or to assist in providing it. GOTO activity 35.

Activity 56: Solution Provided and Service Restored, Change Status to Restored
The IT service support group was able to provide the solution needed to restore/recover the service. The incident status is changed to Service Restored.

Activity 57: Further Root Cause Analysis Required?
If the workgroup requires further root cause analysis to provide the permanent solution, continue with activity 58. If not, GOTO activity 61.

Activity 58: Stop SLA, Change Status to Wait RCA
The IT service support specialist stops the service level agreement clock by changing the incident status to Wait RCA (or any other predefined status with the same purpose).

Activity 59: Problem Management Process, Create Problem Record
The IT service support specialist creates a problem record in the problem management system (which may be the same as the incident management system) and completes the information required to create it.

Activity 60: Start SLA, Change Status to In-Progress
After the root cause is identified and documented, the IT service support specialist restarts the SLA clock by changing the incident status to In-Progress.
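The stop/restart behavior of the SLA clock (activities 45 to 49 and 58 to 60, plus the Wait RFC status used in the change steps that follow) amounts to counting time only while the record is not in a waiting status. The sketch below shows one way to model it. The class and its methods are hypothetical, and the status strings are taken from the text, which notes that other pre-defined statuses may serve the same purpose.

```python
from datetime import datetime, timedelta

# Statuses the text names as stopping the clock; the real tool may use others.
PAUSED_STATUSES = {"Wait for Information", "Wait RCA", "Wait RFC"}


class SlaClock:
    """Accumulates elapsed time only while the status is not a waiting status."""

    def __init__(self, status: str, now: datetime) -> None:
        self.status = status
        self.elapsed = timedelta(0)
        self._since = now  # time of the last status change

    def change_status(self, new_status: str, now: datetime) -> None:
        # Bank the time spent in the old status if the clock was running.
        if self.status not in PAUSED_STATUSES:
            self.elapsed += now - self._since
        self.status = new_status
        self._since = now

    def elapsed_at(self, now: datetime) -> timedelta:
        # Add the current running stretch, if any, to the banked time.
        running = self.status not in PAUSED_STATUSES
        return self.elapsed + ((now - self._since) if running else timedelta(0))
```

With this model, a record that is In-Progress for 20 minutes, waits several hours for the requester, and then runs for another 10 minutes has consumed 30 minutes of its target.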
Activity 61: Change Request Required?
Is a change required to resolve the incident? If YES, continue with activity 62. If NO, GOTO activity 66.

Activity 62: Stop SLA, Change Status to Wait RFC
The IT service support specialist stops the SLA clock by changing the incident status to Wait RFC (or any other predefined status with the same purpose).

Activity 63: Change Management Process, Create RFC
The IT service support specialist logs in to the change management system and creates a request for change.

Activity 64: Change Management Process, Change Implemented Successfully
The change management process is completed and the changes are implemented successfully.

Activity 65: Start SLA, Change Status to In-Progress
The IT service support specialist restarts the SLA clock by changing the incident status to In-Progress.

Activity 66: Provide Incident Resolution, Change Status to Resolved
The incident is resolved and a complete resolution is provided. Even when a resolution has been found, sufficient testing must be performed to ensure that the recovery action is complete and that the service has been fully restored to the user(s). The incident status is changed to Resolved.

Activity 67: Update Incident Record
Regardless of the actions taken, or who performs them, the incident record must be updated with all relevant information and details so that a full history is maintained.

Activity 68: Verify Incident has been Resolved, Automatic Email Sent to Incident Requester or Service Owner
Once the IT service support specialist changes the status to Resolved, the incident system automatically generates an email and sends it to the requester for verification and acceptance.
Activity 69: Is Incident Requester from Stores?
If the incident was originally reported from one or more of the stores business units, continue with activity 70. If not, GOTO activity 71.

Activity 70: Call Requester by Phone to Confirm Resolution
The IT service support specialist calls the stores requester by phone to confirm the incident resolution.

Activity 71: Solution Accepted?
If YES, continue with activity 72. If NO, GOTO activity 73.

Activity 72: Record Automatically Closed after 3 Days
The incident record is closed automatically three days after resolution, unless the requester calls the service desk by phone and asks for further action. GOTO activity 119 (END).

Activity 73: Further Action Required
If YES, GOTO activity 31. If NO, END.

Activity 74: Escalate to Incident Manager
No solution is available. Escalate the issue to the incident manager for further action. The incident manager will make contact and may open a telephone bridge with technical staff and/or management to find a solution to the incident. GOTO activity 93.

Activity 75: Auto Notification Sent to Servicedesk Coordinator
When an incident record bounces more than 3 times, the system generates an automatic notification and sends it to the service desk coordinator to follow up and monitor.

Activity 76: Monitoring & Tracking Incident and Receiving Notifications
The service desk coordinator monitors incidents that have reached 75% or 100% of their service level targets, regardless of priority. The service desk coordinator uses the tool to view incident records and receives automatic notifications.

Activity 77: Is Incident Critical/High Priority?
If a critical or high incident record is created, the service desk coordinator calls the incident manager to notify him/her and ensure he/she is aware of the incident. GOTO activity 87. If the incident is NOT critical or high, continue with activity 78.

Activity 78: Is Incident Reached to 100% SL Target?
If the incident has reached 100% of its service level target, GOTO activity 85. If not, continue with activity 79.

Activity 79: Is Incident Reached to 75% SL Target?
If the incident has reached 75% of its service level target, GOTO activity 80. If not, continue with activity 76 (continue monitoring).

Activity 80: Receives 75% Alert from System on Incident Missing SL Targets (Automatic Notification)
The service desk coordinator receives an alert (automatic notification) that an incident has reached 75% of its service level target.

Activity 81: Follow up with Incident Assignee/Shift Manager until Record Accepted/Restored/Resolved
The service desk coordinator calls and follows up with the resolving group/individual to ensure the incident gets accepted, restored or resolved (depending on the incident phase).

Service desk coordinator activities:
- Receive and review automatic notifications (75% and 100% alerts).
- Call, notify and follow up with the resolving group/individual.
- Call, notify and follow up with the resolving group manager.
- Call the incident manager on critical or high incidents.
- Accountable for sending the multi-level automatic notification to IT staff on critical or high incidents.
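The 75% and 100% checks in activities 78 to 80 reduce to comparing elapsed SLA time against the target for the incident's priority and phase. A minimal sketch is shown below, assuming "reached" means greater than or equal to the threshold and that elapsed and target times are both given in minutes. The function names are hypothetical; the returned activity numbers follow the GOTO targets in the text.

```python
def sl_alert_level(elapsed_min: float, target_min: float) -> int:
    """Return 100, 75 or 0 depending on how much of the target is used."""
    if target_min <= 0:
        raise ValueError("target_min must be positive")
    used = elapsed_min / target_min
    if used >= 1.0:
        return 100
    if used >= 0.75:
        return 75
    return 0


def coordinator_next_activity(elapsed_min: float, target_min: float) -> int:
    """Map the alert level to the next activity for the coordinator."""
    level = sl_alert_level(elapsed_min, target_min)
    # 85: 100% alert received, 80: 75% alert received, 76: keep monitoring
    return {100: 85, 75: 80, 0: 76}[level]
```

For a 4-hour (240-minute) target, 200 elapsed minutes triggers the 75% path (activity 80) and 240 minutes triggers the 100% path (activity 85).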
Activity 82: Incident Accepted/Restored/Resolved?
If the resolving group/individual has accepted the record, restored the service, or resolved the incident (depending on the incident phase), continue with activity 83. If not, GOTO activity 84.

Activity 83: End the Notification and Escalation Process and Continue Monitoring & Tracking Incidents
The incident record has been accepted, restored or resolved (depending on the incident phase). The service desk coordinator ends the follow-up and escalation activities and continues monitoring incidents and reviewing notifications. GOTO activity 76.

Activity 84: Reach 100% Alert Timeframe
If the incident has reached 100% of its service level target, continue with activity 85. If not, GOTO activity 81.

Activity 85: Receives 100% Alert from System on Incident Missing SL Targets (Automatic Notification)
The service desk coordinator receives an alert (automatic notification) that an incident has reached 100% of its service level target.

Activity 86: Send Email and Call the Service Owner
The service desk coordinator calls the resolving group manager and sends an email to the service delivery manager and/or the service owner. GOTO activity 81.

Activity 87: Notify Incident Manager and Service Desk by Phone
Call the incident manager and the service desk by phone to inform them of the critical or high incident (in case they do not know about it).

Activity 88: Start Sending Notifications As Per the Incident Priority Notification Schedule
An automatic notification is sent to different levels of IT staff, depending on the timeframe and the priority of the incident. See the attachment "Notification List".

Activity 89: Service Restored?
If the service is restored, GOTO activity 83. If not, GOTO activity 88.

Activity 90: Receive Notification on Critical/High Priority Incident
The incident manager receives notification of a critical or high priority incident.
Methods of communication:
- Automatic notification by the system
- Phone call from the service desk coordinator
- Phone call from the service desk
- Phone call from the IT operation team
Activity 91: Revise Incident Prioritization
The incident manager has the privilege to revise and change the incident priority, in order to ensure the correct priority has been assigned to the incident before proceeding with escalation and further activities.

Activity 92: Is Incident Critical/High Priority?
If the incident is critical or high, continue with activity 93. If not, GOTO activity 100.

Activity 93: Contact the Required Support Team Members, Support Team Manager, Sr. Managers, Service Desk Manager and Suppliers/Partners to Join Bridges
The incident manager contacts all the required people who might have input or can assist in providing the solution, including but not limited to the following parties:
- Support team manager and team members (resolving group/individual, may be more than one group)
- Sr. manager(s)
- Service desk manager (or service desk representative)
- Suppliers/partners (if required)

Activity 94: Open and Facilitate Operation Bridge, Update and Assign Actions
Open an operation bridge.
Who will attend?
- Resolving groups and individuals (may be more than one group)
- Technical management (e.g. application management, middleware management)
- Technical people from suppliers/partners
Agenda:
- Review the incident and develop service restoration scenario(s)
- Develop and assign action plans
- Update on work taking place
Method of communication: Telephone (conference call)

Activity 95: Open and Facilitate Management Bridge, Update, Review Status and Decide on Next Step
Open a management bridge.
Who will attend?
- Service owners
- Business operation managers
- Directors
- Vice Presidents
Agenda:
- Review the incident
- Discuss the service restoration scenario
- Review and discuss risks and impacts
- Develop and assign action plans
- Get an update from the operation bridge
Method of communication: Telephone (conference call)

Activity 96: Service Restored?
If YES, continue with activity 97. If not, GOTO activity 94.
Activity 97: Close Operation & Management Bridge
The incident manager closes the two bridges, operation and management.

Activity 98: Conduct a Post Incident Review When Incident Resolved
The incident manager conducts the Post Incident Review meeting ONLY after the incident is resolved. The post incident review is conducted for every incident with critical or high priority, upon resolution of the incident.
Frequency: to be scheduled within three business days after each critical or high incident resolution.
Agenda:
- A review of the incident and what the root cause was
- A review of the incident's impact on the business
- Any potential process improvement
The Post Incident Review Report includes:
- A specific list of assigned tasks and timelines
- A problem record created to get to the root cause
The report is populated with details and sent to all parties who participated in the resolution and to IT management.

Activity 99: Problem Management Process, Create Problem Record for All Critical/High Priority Incidents
The incident manager creates a problem record using the problem management system and dispatches it to the problem manager for further root cause analysis. The problem management process is executed, and the outcome of the root cause analysis is added to the incident record and to the post incident review report by the problem resolving group/individual. GOTO activity 119 (END).
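The "within three business days" scheduling rule for the post incident review in activity 98 can be computed as shown in the sketch below. It assumes business days are Monday to Friday and ignores public holidays, neither of which is defined in this document; the function name is hypothetical.

```python
from datetime import date, timedelta


def pir_due_date(resolved_on: date, business_days: int = 3) -> date:
    """Latest date for the review, counting Monday to Friday only."""
    due = resolved_on
    remaining = business_days
    while remaining > 0:
        due += timedelta(days=1)
        if due.weekday() < 5:  # 0..4 are Monday..Friday
            remaining -= 1
    return due
```

Under these assumptions, an incident resolved on Friday, November 7, 2008 would need its review scheduled no later than Wednesday, November 12, 2008.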
Activity 100: Contact Servicedesk Coordinator to Follow the Escalation Management Process for the Specific Priority
The incident has been downgraded to lower than critical or high. The incident manager contacts the service desk coordinator by phone to follow up on the incident resolution. GOTO activity 77.

Activity 101: Monitoring Event Management Tool Generated Alerts and Systems Performance
The IT operation team monitors the alerts generated by the event management tool.

Activity 102: Alert(s) Requires Actions?
If the generated alert requires action (non-informational alert), continue with activity 103. If not (informational alert, no action required), GOTO activity 101.
Activity 103: Is it Incident Alert?
If YES, continue with activity 104. If not, GOTO activity 108.

Activity 104: Is it a Critical/High Priority Incident?
If YES, continue with activity 105. If not, GOTO activity 106.

Activity 105: Call Servicedesk & Escalate to Incident Manager by Phone
The IT team representative immediately calls the service desk and the incident manager by phone to notify them of the critical or high incident. GOTO activity 90 and GOTO activity 77. In parallel, continue with activity 106.

Activity 106: Create Incident Record
The IT operation representative creates an incident record in the incident management system.

Activity 107: Dispatch Incident Record to Appropriate Workgroup
The IT operation representative dispatches the incident record to the appropriate workgroup to provide the incident resolution. GOTO activity 37. In parallel, the operation team continues monitoring the alerts generated by the system. GOTO activity 101.

Activity 108: Problem Management Process, Create Problem Record
If the alert is not an incident (no service disruption or degradation in quality of service), it might be a potential incident that requires proactive action to prevent it from happening. The IT operation representative creates a problem record and dispatches it to the problem manager queue. In parallel, the operation team continues monitoring the alerts generated by the system. GOTO activity 101.
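The alert handling path in activities 102 to 108 is a three-way classification followed by a priority check. The sketch below illustrates it; the `AlertKind` values and the action strings are hypothetical labels chosen for this example and are not identifiers from the event management tool.

```python
from enum import Enum


class AlertKind(Enum):
    INFORMATIONAL = "informational"
    INCIDENT = "incident"
    POTENTIAL_INCIDENT = "potential incident"


def triage_alert(kind: AlertKind, priority: str = "Low") -> list[str]:
    """Return the ordered actions for one alert, per activities 102 to 108."""
    if kind is AlertKind.INFORMATIONAL:
        return []  # no action required, keep monitoring (activity 101)
    if kind is AlertKind.POTENTIAL_INCIDENT:
        return ["create problem record"]  # activity 108
    actions = []
    if priority in ("Critical", "High"):
        # activity 105: phone escalation happens alongside record creation
        actions.append("call service desk and incident manager")
    actions += ["create incident record", "dispatch to workgroup"]  # 106, 107
    return actions
```

In every case the operation team keeps monitoring alerts in parallel, so the returned list only covers the actions for the alert at hand.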
Activities 109, 110, 111, 112: 3rd Party Company Receives Request through Call, Web or Email
The 3rd party company receives a request from IT to provide, or assist in providing, the incident resolution.
Methods of communication:
- Phone call, made directly by the IT service desk or the IT service support specialist.
- Email, in addition to the phone call but never on its own.
- Web interface. The 3rd party company has an interface to Loblaw's incident management system, through which it can receive incident records and can also create and send them.

Activity 113: Incident Record Number Exchanged
When communicating, it is important to exchange the reference number of the incident record for future follow-up and history.

Activity 114: Follow 3rd Party Company Internal Process
The 3rd party company follows its internal process for as long as it is handling the incident.

Activity 115: LCL SL Targets Violated?
Although the 3rd party company follows its internal process in providing the incident resolution, it is important that it follows Loblaw's service level targets to ensure the same quality of service and the same recovery/resolution time as per the incident priority. If Loblaw's service level targets are violated by the 3rd party company, GOTO activity 118. If not, continue with activity 116.

Activity 116: Solution Provided
The 3rd party company provides the solution.

Activity 117: Inform LCL Service Desk on Resolution and Verify Incident Resolution with Incident Requester
The 3rd party company informs the incident requester of the resolution, which is verified by the requester before incident closure. GOTO activity 68.

Activity 118: 3rd Party Company Follow LCL Escalation Management Process and Notifications
The 3rd party company is part of Loblaw's escalation process, depending on the incident priority. It is monitored and called by the service desk coordinator, and invited to the communication bridges opened by the incident manager if the incident is critical or high. The 3rd party company is expected to cooperate and fulfill Loblaw's requirements in this regard. GOTO activity 116.

Activity 119: END of Process
18. Legend & Definitions

Legend: Explanation

RFC: Request For Change. A formal proposal for a Change to be made. An RFC includes details of the proposed Change, and may be recorded on paper or electronically.

RCA: Root Cause Analysis. An activity that identifies the Root Cause of an Incident or Problem. RCA typically concentrates on IT Infrastructure failures.

SL Target: Service Level Target. A commitment that is documented in a Service Level Agreement. Service Level Targets are based on Service Level Requirements, and are needed to ensure that the IT Service design is Fit for Purpose. Service Level Targets should be SMART, and are usually based on KPIs.

Service Request: A request from a User for information, or advice, or for a Standard Change, or for Access to an IT Service. For example, to reset a password, or to provide standard IT Services for a new User. Service Requests are usually handled by a Service Desk, and do not require an RFC to be submitted.

Incident Record: A Record containing the details of an Incident. Each Incident Record documents the Lifecycle of a single Incident.

Incident Management Process: The Process responsible for managing the Lifecycle of all Incidents. The primary objective of Incident Management is to return the IT Service to customers as quickly as possible.

Incident: An unplanned interruption to an IT Service or a reduction in the Quality of an IT Service. Failure of a Configuration Item that has not yet affected Service is also an Incident, for example failure of one disk from a mirror set.

Alert: A warning that a threshold has been reached, something has changed, or a Failure has occurred. Alerts are often created and managed by System Management tools.
19. Attachments
Incident Record Template
Notification List on Critical & High Priority Incidents.doc
Appendix A
Impact/Urgency Matrix

                        Urgency
Impact                  Critical    High        Medium    Low
Extensive/Widespread    Critical    Critical    High      Low
Significant/Large       Critical    High        Medium    Low
Moderate/Limited        High        High        Medium    Low
Minor/Localized         High        Medium      Medium    Low
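The Impact/Urgency matrix can be applied as a plain lookup. The sketch below assumes the flattened cells of Appendix A read row by row, with impact levels as rows and urgency levels (Critical, High, Medium, Low) as columns; that reading, and the names `MATRIX` and `priority_for`, are assumptions made for illustration.

```python
# Columns are urgency levels, in the order assumed from Appendix A.
URGENCY = ("Critical", "High", "Medium", "Low")

# Rows are impact levels; each tuple holds the priority per urgency column.
MATRIX = {
    "Extensive/Widespread": ("Critical", "Critical", "High", "Low"),
    "Significant/Large": ("Critical", "High", "Medium", "Low"),
    "Moderate/Limited": ("High", "High", "Medium", "Low"),
    "Minor/Localized": ("High", "Medium", "Medium", "Low"),
}


def priority_for(impact: str, urgency: str) -> str:
    """Look up the incident priority for an impact and urgency pair."""
    return MATRIX[impact][URGENCY.index(urgency)]
```

For example, `priority_for("Significant/Large", "High")` yields "High" under this reading of the matrix.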
Appendix B
Service Level Targets

Service Level Target (SLT)    Priority    Target Time    Clock Begins (Status)    Clock Stops (Status)    Clock Ends (Status)
Time to Respond               Critical    15 min         Assigned                 Pending                 In Progress
Time to Respond               High        30 min         Assigned                 Pending                 In Progress
Time to Resolve               Critical    4 hr           In Progress              Pending                 Resolved
Time to Resolve               High        8 hr           In Progress              Pending                 Resolved
