SOP Unplanned Outages/Major Incidents
Table of Contents
Definitions
Incident
Declaring
Incident Levels
Priority 1
Priority 2
Priority 3
Priority 4
Mission Critical Services
Authentication
Computer Labs
Email
Network
Wireless Network
Power Disruptions to Campus
Storage Area Network (SAN) Disruption
Roles and Responsibilities of Team for Priority 1 Outages/Major Incidents
CIO or Backup
Incident Commander
Technical Lead
User Support Lead
Communications Lead
Field Coordinator
Service Owners
Subject Matter Expert (SME)
Communications
Internal Communications Options
External Communications Options
Reporting
Meeting Location
Logistics
Execution of Work Steps for Priority 1 Major Incident
Incident commander duties
General duties
Incident commander timed checklist
Check List
Username: it
Password: Ic3
Definitions
Incident
An incident occurs whenever a user is not receiving the expected level of service from an IT service.
Expected levels of service are based on Service Level Agreements (SLA).
Major Incident/Outage
A major incident is defined as a significant event, which demands a response beyond the
routine, resulting from uncontrolled developments in the course of the operation of any establishment or
transient work activity.
Declaring
A major incident is declared when mission critical (university or internal) IT service(s) have not
performed at the expected level for a period of 30 minutes, unless defined differently in the SLA or
designated otherwise by this plan.
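The declaring rule above can be sketched as a small check. This is only an illustration: the function name, its parameters, and the idea of passing an SLA-specific threshold as an argument are assumptions, not part of this SOP. The 30-minute default comes from the rule itself.

```python
from datetime import timedelta
from typing import Optional

# Default threshold from the Declaring section: 30 minutes.
DEFAULT_THRESHOLD = timedelta(minutes=30)


def should_declare_major_incident(
    mission_critical: bool,
    degraded_for: timedelta,
    sla_threshold: Optional[timedelta] = None,
) -> bool:
    """Return True when the declaring rule is met (hypothetical helper).

    sla_threshold is an assumed way to model "unless defined differently
    in the SLA"; when it is None the 30-minute default applies.
    """
    if not mission_critical:
        return False
    threshold = sla_threshold if sla_threshold is not None else DEFAULT_THRESHOLD
    return degraded_for >= threshold
```

For example, a mission critical service degraded for 29 minutes would not yet be declared under the default, while one with a 5-minute SLA threshold would be.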
Incident Levels
Priority 1
Mission critical services are not performing for the University. All appropriate resources will be
dedicated to restore service(s).
Priority 2
Mission critical services are not performing for departments or computer labs. Service(s) are not
performing at a campus or enterprise level. Appropriate service owners will be dedicated to
restoring service(s).
Priority 3
Address the problem and escalate as necessary. These incidents do not require the dedicated
resources of Priority 1 or 2.
Priority 4
There is a known workaround for the issue. Does not require dedicated resources to resolve.
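As a rough illustration, the four incident levels could be expressed as a classification helper. Everything below is an assumption made for the sketch: the scope labels ("university", "department", "lab"), the function signature, and the order in which the conditions are checked are not defined by this SOP.

```python
from enum import IntEnum


class Priority(IntEnum):
    P1 = 1
    P2 = 2
    P3 = 3
    P4 = 4


def classify(mission_critical: bool, scope: str, has_workaround: bool) -> Priority:
    """Map an incident to a priority level (hypothetical helper).

    scope is an assumed label: "university", "department", or "lab".
    """
    # Priority 1: mission critical services not performing for the University.
    if mission_critical and scope == "university":
        return Priority.P1
    # Priority 2: mission critical services not performing for
    # departments or computer labs.
    if mission_critical and scope in ("department", "lab"):
        return Priority.P2
    # Priority 4: a known workaround exists (assumed to apply only when
    # the incident is not already Priority 1 or 2).
    if has_workaround:
        return Priority.P4
    # Priority 3: address the problem and escalate as necessary.
    return Priority.P3
```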
Email
Email messages not flowing in or out of the following systems:
Exchange on premise
Office 365 Cloud Solution
Note: If the service outage is determined to be exclusively a Microsoft issue
and UCCS IT personnel have no ability to participate in a resolution, then it
will not follow the full major incident procedures. Conceivably only the
communication plan will be followed.
Network
Disruptions to campus network systems to include:
Campus Firewalls
Campus Routing
Campus Switches
Connections in and out or within the El Pomar Data Center
Connections in and out or within the Columbine Data Center
Connections in and out or within Main Hall and Cragmor Hall
External internet connectivity
Wireless Network
Disruptions to the wireless system that prevent customers from using the network
Incident Commander
Coordinate plan; oversee response; lead meetings; organize meals; and provide funding.
See below for a detailed description.
Technical Lead
Examine situation; confirm major incident; attempt to identify root cause; work to find
technical options; present technical options to team; and participate in the plan where
needed.
Communications Lead
Create plan for messaging including frequency; provide messaging to campus; update
UCCS.info; act as point person for internal communication; and participate where needed.
Field Coordinator
Provide information from the field; deliver support from the field; and participate where
needed. Note: depending on the major incident, this role may not be needed.
Service Owners
Provide information on services affected; work with the technical lead to create options for
the plan of action.
Communications
Internal Communications Options
Communications should be sent out from the helpdesk@uccs.edu email address if possible.
outage@uccs.edu - hosted on lists.uccs.edu (Communigate server; local infrastructure must be working) (Texting and Email)
outage@uccs.info - hosted through Bluehost.com (Texting and Email)
uccshelpdesk@gmail.com - help desk communications sent when Exchange is not available
CenturyLink Conferencing Audio Conferencing
USA: 1-720-279-0026
USA /Canada (toll free): 1-877-820-7831
1. This will be the Major Incident main line:
Web / sharing desktops
GotoMeeting.com
UCCS leadership - Only CIO or backup communicates with leadership team
University Relations
Hutton, Tom. . . .719-255-3439
Executive Director
University Advancement - University Communications and Media Relations
MAIN 301A
thutton@uccs.edu
UCCS Twitter
Denman, Philip. . . .719-255-3732
Assistant Director
University Advancement - University Communications and Media Relations
MAIN 301
pdenman@uccs.edu
Website Alerts {Craig needing information for posting in Ingeniux or how this should be
handled}
Rave (must first check with Tim Stoecklein before posting a message with the system)
Stoecklein, Tim. . . .719-255-3106
Program Director of Emergency Management
Public Safety Department - Emergency Management
DPS 208
tstoeckl@uccs.edu
Phones
Help Desk ACD message
Sidecars if necessary
Media
University Relations will be the only organization allowed to speak to the media.
Reporting
When to report
Who to report to
UIS
Other CU campuses
Chancellor's office
President's office
Meeting Location
EPC 139, IT Conference Room
Location needs:
Phone
Laptop/Projector
White board
Table
Room and chairs for 10 people
Extra Ports
Power
Logistics
Review Mission Critical Services
Communications expectations plan
Communication templates
Define essential personnel and backups
Personnel expectations during major incident and after
Essential personnel are expected to participate in the major incident/outage response. If the
incident is after hours, essential personnel are expected to participate if available.
Working Time:
Maximum of 16 working hours or until 2 a.m.
At the start of hour 14, or at midnight as appropriate, the technical lead must start to
create a plan for providing rest to employees.
Discuss breaks/meals every 4 hours and coordinate food and drink.
After a major incident/outage is resolved, if work was conducted after normal business
hours, employees will be given hour-for-hour flex time. The employee is expected to take
the time, and it must be used within one month from when the work was performed.
The Incident Commander will work with the employee’s supervisor to coordinate flex time.
Equipment needs
Equipment needs shall be coordinated by the Incident Commander.
Funding
Will be coordinated by the Incident Commander.
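The working-time limits in this section can be sketched as a small calculation. This reads the limits as "whichever comes first" and treats midnight and 2 a.m. as the night following the day work started. Both readings are assumptions, since the SOP does not spell them out. The function name and return shape are hypothetical.

```python
from datetime import datetime, time, timedelta
from typing import Tuple


def working_time_limits(start: datetime) -> Tuple[datetime, datetime]:
    """Return (plan_rest_at, stop_at) for work that began at `start`.

    Assumptions (not stated in the SOP): the earlier of the two limits
    applies, and midnight / 2 a.m. fall on the day after `start`.
    `start` is expected to be a naive local datetime.
    """
    next_day = start.date() + timedelta(days=1)
    midnight = datetime.combine(next_day, time(0, 0))
    two_am = datetime.combine(next_day, time(2, 0))

    # Rest planning begins at the 14-hour mark or midnight.
    plan_rest_at = min(start + timedelta(hours=14), midnight)
    # Work stops at 16 working hours or 2 a.m.
    stop_at = min(start + timedelta(hours=16), two_am)
    return plan_rest_at, stop_at
```

Under these assumptions, work that starts at 8:00 a.m. would trigger rest planning at 10:00 p.m. and stop at midnight, while work that starts at 1:00 p.m. would trigger rest planning at midnight and stop at 2:00 a.m.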
Execution of Work Steps for Priority 1 Major Incident
Email Template:
Subject:
(ServiceName) is experiencing a service interruption.
- SYMPTOM1
- WORKAROUND1
Greg Williams
Cell: (719)237-6491
House:(719)481-1290
Email: gwillia5@uccs.edu or
If directors have not been reached, contact associate directors:
Rob Garvie
Cell: (719)439-1724
House: (719)266-8525
Email: rgarvie@uccs.edu
Mike Belding
Cell: (719)338-9776
House: (719)260-6794
Email: mbelding@uccs.edu
Text Template:
UCCS IT Alert: SERVICENAME is experiencing a
service interruption. IT is working to restore service and
will provide more information as it becomes available.
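Filling in the text template above could be sketched with Python's standard `string.Template`. The wording is copied from the template; the `$service` placeholder (standing in for SERVICENAME) and the function name are assumptions made for this example.

```python
from string import Template

# Wording taken from the Text Template; $service replaces SERVICENAME.
TEXT_TEMPLATE = Template(
    "UCCS IT Alert: $service is experiencing a service interruption. "
    "IT is working to restore service and will provide more information "
    "as it becomes available."
)


def render_text_alert(service_name: str) -> str:
    """Return the text alert for the given service (hypothetical helper)."""
    return TEXT_TEMPLATE.substitute(service=service_name)
```

Calling `render_text_alert("Email")` would produce the alert with "Email" in place of SERVICENAME. `substitute` raises `KeyError` if a placeholder is left unfilled, which helps avoid sending a message with a blank service name.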
Email Template:
Subject:
UCCS IT Alert: SERVICENAME is experiencing a service
interruption.
Body:
Current symptoms include:
- SYMPTOM1
- WORKAROUND1
Body:
Current symptoms include:
- SYMPTOM1
- WORKAROUND1
Email Template:
Subject:
UCCS IT Alert: SERVICENAME Service Now Available
Body:
UCCS IT has identified and addressed the root cause. As of
XX:XX a.m./p.m. service has been restored. Thank you for
your cooperation while we worked to resolve this issue.
Close Major Incident
1. Hold debrief meeting within three days
2. Prepare Major Incident Response report within five days
with help from those participating
3. {Rachel - needing report template}
4. Distribute report
Incident commander timed checklist
Every 30 minutes:
o Request status from teams working issues.
What current hypotheses are being investigated, and have any been eliminated or
verified?
What actions have been completed or are in progress?
o Provide a verbal update within the incident room and update the incident log.
Every hour:
o Check in with communications staff regarding next status update steps.
o If the list of hypotheses has been exhausted, initiate a new cycle of brainstorming,
documenting, assigning tasks, etc.
At 11:30am and 5:30pm:
o Request that business office (if available) order some food for those working the issue in
the incident room. Be sure to cover dietary needs (vegetarian, etc.).
o Encourage participants to use mealtime as an opportunity to leave the room for a little
while, allowing for coverage if needed.
At 9pm:
o Request that directors/managers begin their plans for staff rotation during the night if
ongoing work is required.
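The timed checklist above could be turned into a simple reminder function. This is only a sketch: the function name, the action wording, and the assumption that it is called once per minute with the incident start time and the current time are all hypothetical.

```python
from datetime import datetime, time
from typing import List


def due_actions(started: datetime, now: datetime) -> List[str]:
    """Return the checklist reminders due at `now` (hypothetical helper).

    Assumes it is polled once per minute with naive local datetimes.
    """
    actions: List[str] = []
    elapsed_min = int((now - started).total_seconds() // 60)

    # Every 30 minutes since the incident started.
    if elapsed_min > 0 and elapsed_min % 30 == 0:
        actions.append("Request status from teams; give verbal update; update incident log")
    # Every hour since the incident started.
    if elapsed_min > 0 and elapsed_min % 60 == 0:
        actions.append("Check in with communications staff; restart brainstorming if hypotheses are exhausted")

    clock = now.time().replace(second=0, microsecond=0)
    # Meal reminders at 11:30 a.m. and 5:30 p.m.
    if clock in (time(11, 30), time(17, 30)):
        actions.append("Request food order and encourage a mealtime break")
    # Staff rotation planning at 9 p.m.
    if clock == time(21, 0):
        actions.append("Request staff rotation plans for the night")
    return actions
```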
Check List
1. Notification of Priority 1 incident
2. Confirmation of major incident/outage
3. Priority 1 incident has been determined
4. If Level 1 priority 1 incident
a. Has incident crossed trigger points
i. No – continue to monitor situation
ii. Yes
1. Create problem in Cherwell
2. Determine which individuals are needed to evaluate the situation
3. Define roles for individuals participating
4. Tools
a. LastPass cloud service for password management
www.lastpass.com
b. Monitoring
c. Testing environment
5. Build action plan:
a. Define scope / timeframe
b. Develop technical plan
c. Define personnel needed
d. Determine return on investment
e. Assign tasks
6. Communication plan
a. How do we communicate with each other?
b. How and with whom do we communicate externally?
c. Recording communications
d. Confirming communications postings
e. How often do we need to communicate?
f. Communicating to UCCS Leadership (Role of CIO)
7. Document ongoing progress and issues; record in Cherwell
8. Resolved
a. Documenting issue, response and fix
9. Closing response
a. Hold debrief meeting within three days
b. Prepare Major Incident Response report within five days with
help from those participating
c. {Rachel - needing report template}
d. Distribute report