
Structured Approach to IT Business System Availability and Continuity Planning, Analysis and Design

Alan McSweeney
Objectives

• To provide details on a structured approach to analyse and
define availability and continuity requirements for IT
systems
• To provide background information on the changing
landscape of availability and continuity

February 18, 2010 2


Agenda

• Availability and Continuity Overview
• Availability Management
• Continuity Management
• Summary


Availability and Continuity

• Availability is the ability of a system or service to perform its
required function at a stated instant or over a stated period of time.
• Availability is expressed as the availability ratio
− The proportion of time that the service is actually available for use by the
customers within the agreed service hours
• Continuity is concerned with preparing to address unwanted
occurrences
− May relate to the recovery of IT systems or entire business processes.
• Continuity is concerned with ensuring that IT Services are recovered
within agreed time scale
• Availability is a superset of Continuity and encompasses the
continued operation of systems in the event of a disaster
• Continuity ensures availability in extreme circumstances
• Availability defines what is to be available in these extreme
circumstances
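
The availability ratio described in the bullets above can be illustrated with a short calculation. This is a minimal sketch rather than a formula taken from the slides: the function name `availability_ratio` and the use of minutes as the unit are assumptions, and any downtime outside the agreed service hours is assumed to have been excluded before the function is called.

```python
def availability_ratio(agreed_service_minutes: float, downtime_minutes: float) -> float:
    """Return the proportion of agreed service time in which the service was usable."""
    if agreed_service_minutes <= 0:
        raise ValueError("agreed service time must be positive")
    if not 0 <= downtime_minutes <= agreed_service_minutes:
        raise ValueError("downtime must lie between 0 and the agreed service time")
    return (agreed_service_minutes - downtime_minutes) / agreed_service_minutes


# Hypothetical month: 30 days of 24-hour agreed service with 36 minutes of downtime.
agreed = 30 * 24 * 60
ratio = availability_ratio(agreed, 36)
print(f"Availability ratio: {ratio:.4%}")
```

Measuring against agreed service hours, not calendar time, matches the definition above: an outage outside the agreed hours does not reduce the ratio.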



Availability and Continuity Relationship

• Availability provides availability criteria to Continuity
• Continuity provides business impact analysis to Availability


Availability and Continuity Relationships with Other IT Management Processes
• Finance Management: Puts a cost on lack of availability; controls expenditure on availability and continuity
• Security Management: Controls security that may impact continuity and availability
• Service Planning and Management: Ensures that continuity and availability are incorporated into service agreements and provisions
• Capacity Planning and Management: Defines the capacity required for continuity and availability
• IT Architecture Management: Ensures systems and infrastructure are designed to incorporate continuity and availability
• Change Management: Controls change that may impact availability or require continuity to be invoked
• Availability provides availability criteria to Continuity
• Continuity provides business impact analysis to Availability

Availability and Continuity

• Availability
− Defines availability of service during operating hours
• Under normal circumstances
• Under extraordinary circumstances

• Continuity
− Defines continued operations of critical services and their
availability
• Time until services are available and state of service after recovery
• Under extraordinary circumstances



Availability and Continuity
• Availability of Services During Normal Operations (Primary IT Facilities)
− Service 1: Component 1, Component 2, Component 3
− Service 2: Component 1, Component 4, Component 5
− Service 3: Component 1, Component 5, Component 6
− Service 4: Component 1, Component 2, Component 7
• Availability of Services After Continuity of Operations (Recovery IT Facilities)
− Service 1: Component 1, Component 2, Component 3
− Service 3: Component 1, Component 5, Component 6


Availability and Continuity
Full View of Availability
• Primary IT Facilities
− Service 1: Component 1, Component 2, Component 3
− Service 2: Component 1, Component 4, Component 5
− Service 3: Component 1, Component 5, Component 6
− Service 4: Component 1, Component 2, Component 7
• Recovery IT Facilities (after Continuity of Operations)
− Service 1: Component 1, Component 2, Component 3
− Service 3: Component 1, Component 5, Component 6


Availability and Continuity
[Figure: Business Continuity in relation to three areas]
• Continuous Operation: Non-disruptive system maintenance such as data backup combined with continuous availability of agreed business systems
• High Availability: Fault-tolerant, failure-resistant infrastructure supporting continuous availability of agreed business systems
• Disaster Recovery: Protection against unplanned outages such as disasters through reliable and predictable recovery and continuity of operations


Availability and Continuity
[Figure: scope of Availability and Continuity across the following situations]
• Availability During Normal Operations
• Availability During Housekeeping and Maintenance Operations
• Availability After Some Component Failures
• Availability After Complete Failure of Primary Facility

Availability and Continuity Heat Map

[Figure: heat map]
• Vertical axis: Recovery Point Objective (RPO) – Amount of Data Loss Tolerable After Recovery (Days, Hours, Minutes, Last Transaction)
• Horizontal axis: Recovery Time Objective (RTO) – Time to Recover Service/Time By Which Service Needs to be Recovered (Days, Hours, Minutes, Seconds, Instantly)
• Availability (and continuity) requirements increase as the tolerable data loss and the time to recover shrink


RTO and RPO

• Recovery Point Objective (RPO)
− Amount of Data Loss Tolerable After Recovery
• Either amount of data immediately available after recovery or amount of
data available for some time after recovery
• The two amounts can be different
• Provide some data for minimal operations initially
• Provide more/all data later

• Recovery Time Objective (RTO)
− Time to Recover Service/Time By Which Service Needs to be
Recovered
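
The two objectives can be treated as a pair of limits against which a recovery, or a rehearsal of one, is checked. The following is a minimal sketch under stated assumptions: the class `RecoveryObjectives`, the function `meets_objectives` and the example figures are hypothetical, and RPO is expressed as a time window of lost data rather than a data volume.

```python
from dataclasses import dataclass
from datetime import timedelta


@dataclass(frozen=True)
class RecoveryObjectives:
    rpo: timedelta  # maximum tolerable data loss, expressed as a time window
    rto: timedelta  # maximum tolerable time to recover the service


def meets_objectives(objectives: RecoveryObjectives,
                     data_loss: timedelta,
                     recovery_time: timedelta) -> bool:
    """Return True only if both the RPO and the RTO were satisfied."""
    return data_loss <= objectives.rpo and recovery_time <= objectives.rto


# Hypothetical objectives for an order entry service.
order_entry = RecoveryObjectives(rpo=timedelta(minutes=15), rto=timedelta(hours=4))

print(meets_objectives(order_entry, timedelta(minutes=10), timedelta(hours=3)))  # True
print(meets_objectives(order_entry, timedelta(hours=1), timedelta(hours=3)))     # False
```

The objectives are kept separate because a service can be recovered quickly yet still lose more data than is tolerable, or the reverse.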



RTO and RPO With Cost of Lack of Availability
[Figure: three-dimensional heat map]
• Axis: Recovery Point Objective (RPO) – Amount of Data Loss Tolerable After Recovery
• Axis: Recovery Time Objective (RTO) – Time to Recover Service/Time By Which Service Needs to be Recovered
• Axis: Cost of Lack of Availability of Service/Cost Benefit of Providing High Availability and High Continuity
• Highlighted region: Business Critical Services Requiring Immediate Access With Very Limited/No Data Loss and Requiring Continued Operation in the Event of a Disaster

• Add extra dimension to Availability and Continuity Heat Map to allow for explicit
identification of those systems that need to be continuously available
What is a Business Critical Application?

• Applications deemed business/mission critical
− 2006 – 16%
− 2007 – 36%
− 2008 – 56%
− 2009 – 60%
• Availability and continuity are merging as most
applications are being deemed mission critical



How Often Have You had to Invoke Continuity Plan
in Last Five Years?
• None: 73%
• Once: 14%
• Twice: 6%
• Three: 3%
• Four: 2%
• Five or More: 2%

• 27% of organisations have declared at least one disaster in
the last five years

What Were the Causes of Having to Invoke
Continuity Plans?
• Power Failure: 22.5%
• Hardware Failure: 16.6%
• Network Failure: 11.2%
• Software Failure: 8.9%
• Human Error: 8.4%
• Flood: 6.3%
• Other: 6.3%
• Hurricane: 5.6%
• Fire: 3.9%
• Winter Storm: 3.5%
• Terrorism: 1.9%
• Not Specified: 1.9%
• Earthquake: 1.5%
• Tornado: 1.1%
• Chemical Spill: 0.4%

Continuity Testing Seen as Disruptive

• 40% of organisations state that continuity testing impacts customers
• 32% of organisations state that continuity testing impacts sales
• Reasons for lack of testing
− Lack of time and resources
− Lack of technology
− Disruption to employees
− Budget
− Disruption to customers
− Disruption to sales
− Disruption to production systems
− Not seen as a priority


Business Impact of Lack of Availability and
Continuity Increases Exponentially Over Time
[Figure: line chart]
• Vertical axis: Financial Loss
• Horizontal axis: Duration of Loss of Continuity (Seconds, Minutes, Hour, Hours, Day, Days)
• Series: Revenue Loss, Staff Productivity Loss, Reputational Damage, Financial Performance


Availability Design and Management

• Availability design optimises the capability of the IT infrastructure,
services and supporting organisation to deliver a cost effective and
sustained level of availability that enables the business to satisfy its
business objectives
− Ensures IT systems and infrastructure are designed to deliver the levels of
availability required by the business
− Provides a range of availability reporting to ensure that agreed levels of
availability are continuously measured and monitored
− Optimises the availability of the IT infrastructure to deliver cost effective
improvements that deliver real benefits to the business
− Ensures shortfalls in availability are recognised and corrective actions are
identified and performed
− Reduces problems and incidents that impact availability
− Creates and maintains an Availability Plan aimed at improving the overall
availability of services and infrastructure components to ensure business
availability requirements can be satisfied


Continuity Design and Management

• Continuity design is concerned with responding to and
recovering business operations in the event of an outage
or disaster that has a significant impact on the organisation
− Supports the business by ensuring that the required IT facilities can
be recovered within required and agreed business timescales
− Provides the strategic and operational framework to review the
way the organisation continues to provide its services while
increasing its ability to recover from disruption, interruption or
loss
− Depends both on management and operations
− Requires management commitment


People, Process, Technology

• Start availability and continuity design with a business
impact analysis and risk assessment
• Technology exists to support availability and continuity
design, but technology does not constitute a plan
• Focus on prevention before investing in technology
• However, availability and continuity are seen as the preserve
of IT
− The business frequently does not have the required project focus
or experience
• Embed availability and continuity into IT architectures


Questions

• Do you have adequate control over prevention of business process or IT
infrastructure downtime?
• Do you have adequate IT capabilities to ensure continuous operations?
• Do you know the risks your business and its business systems face?
• What would the cost and impact of downtime be to your business?
• Is your current continuity plan sufficient to meet your RPO and RTO objectives?
• Do you know how much business continuity will cost?
• What business problems will implementing availability and continuity solve even if
you do not experience an unplanned IT outage?
• What is the overall business value of availability and continuity to the business?
• How should we define what level of business continuity we really need?


Availability Design and Management



Availability Design and Management Process
Availability Design and Management Consists of Two Parallel Sub-Processes
• Availability Process Quality Control
− 1. Availability Reporting
− 2. Availability Report Evaluation and Improvement
− 3. Management Escalations of Service Availability Violations
• Availability Process Design and Management
− 1. Availability Requirements Analysis
− 2. Document System and Application Architecture
− 3. Gap Analysis and Recommendations
− 4. Availability Review


Structured Approach to Availability Design and
Management
• Can be used for an individual system or application or a
service that consists of a number of systems or
applications or the entire IT landscape
• Scope is to define a plan to implement agreed availability



Scope of Availability Design and Management

• Planning for service availability
• Designing for service availability by anticipating
disruptions, estimating and measuring reliability and
maintainability
• Planning for availability within SLAs and reporting on it
• Ensuring cost effectiveness of availability solutions
• Reducing the duration of problems and incidents affecting
availability
• Ensuring that security requirements are defined and
incorporated within the overall availability design

Availability Design and Management Driven by
Requirements
• Availability requirements are based on the needs of the
business
• Requirements are gathered, defined, and validated by the
key users and business management
• Includes hours of uptime as well as planned and
unplanned downtime
• Includes ongoing support and procedures to address
service disruptions



Benefits of a Structured Approach to Availability
Design and Management
• Reduce Risks
− SLAs will incorporate availability design based on architecture
− Reduced risk of violating SLAs
• Cost Reduction
− A defined and agreed acceptable level of service prevents over-delivery
− Unnecessary expenditure on maintenance and resilience building is avoided
• Improved Service Agility
− Changing business availability requirements are addressed quickly
− Cost of changing to a different level of availability is defined or can be
assessed quickly
• Improved Service Quality
− Improvement in Service Quality results from reduced Incidents as well as a
reduced time to restore service


Structured Approach to Availability Design and
Management
Availability Analysis and Design
• 1. Availability Requirements Analysis
− 1.1 Understand Service Goals
− 1.2 Document Availability Requirements
− 1.3 Validate with Service Level Management Function
• 2. Document System and Application Architecture
− 2.1 Define Service Critical Components
− 2.2 Document Service Critical Components and Their Relationships
− 2.3 Document and Review Components Monitoring Capability
− 2.4 Document System and Application Architecture
• 3. Gap Analysis and Recommendations
− 3.1 Perform Gap and Risk Analysis
− 3.2 Identify Single Points of Failure
− 3.3 Evaluate Alternative Approaches and Costs
− 3.4 Produce Gap Closure Recommendation and Specification
− 3.5 Plan and Summarise Downtime
− 3.6 Create Statement of Work to Implement
• 4. Availability Review
− 4.1 Define Availability Measurement Model
− 4.2 Perform Trend Analysis
− 4.3 Analyse Expanded Incident Lifecycle
− 4.4 Investigate Major Outages
− 4.5 Analyse Availability Reports


Step 1 - Availability Requirements Analysis

• 1. Availability Requirements Analysis
− Scope: Determine availability requirements related to supporting the needs of the business; validate with other IT management processes; create draft service agreement and assess for feasibility from availability perspective
− Inputs: Request for new service or changes to existing service; request for change to availability
− Outputs: Documented and agreed availability requirements
• 1.1 Understand Service Goals
− Scope: Document business goals for the service
− Inputs: Service design specification
− Outputs: Documented and agreed business goals
• 1.2 Document Availability Requirements
− Scope: Produce draft availability requirements based on understanding of business goals
− Inputs: Draft service level agreement
− Outputs: Documented and agreed availability requirements
• 1.3 Validate with Service Level Management Function
− Scope: Validate draft availability requirements with service level agreements and overall service management plan
− Inputs: Overall service management plan
− Outputs: Validated availability requirements


Step 2 - Document System and Application
Architecture
• 2. Document System and Application Architecture
− Scope: Analyse operating environment of the individual components that comprise the service
− Inputs: Service design specification; configurations of individual components that comprise the service; service level agreement
− Outputs: Documented and agreed existing architecture for service delivery
• 2.1 Define Service Critical Components
− Scope: Define the configurations of individual components that comprise the service
− Inputs: Service design specification; configurations of individual components that comprise the service
− Outputs: Documented and agreed list of individual components that comprise the service
• 2.2 Document Service Critical Components and Their Relationships
− Scope: Document the structure of the service breakdown - individual components and relationships that deliver the service
− Inputs: Configurations of individual components, their attributes and their relationships
− Outputs: Representation of individual components, their attributes and relationships
• 2.3 Document and Review Components Monitoring Capability
− Scope: Review existing service monitoring facilities and update or replace if required
− Inputs: Existing service monitoring procedures
− Outputs: Defined service monitoring criteria
• 2.4 Document System and Application Architecture
− Scope: Complete architecture document that describes how the service is delivered according to the service level agreement
− Inputs: Representation of individual components, their attributes and relationships; defined service monitoring criteria
− Outputs: Architecture document


Step 3 - Gap Analysis and Recommendations
• 3. Gap Analysis and Recommendations
− Scope: Perform gap analysis and recommend suitable approach, create specifications and cost justification
− Inputs: Validated availability requirements; architecture document; service problem and incident history
− Outputs: Availability design
• 3.1 Perform Gap and Risk Analysis
− Scope: Based on knowledge derived from Incident and Problem data, identify gaps in current services
− Inputs: Problem and incident data; availability requirements; architecture document
− Outputs: Gaps analysed and risks identified and documented
• 3.2 Identify Single Points of Failure
− Scope: Identify individual components whose failure can cause service disruption
− Inputs: Components attributes and relationships
− Outputs: Identified points of failure
• 3.3 Evaluate Alternative Approaches and Costs
− Scope: Explore various options within the approved range and identify a suitable approach based on requirements and cost justification
− Inputs: IT strategy and architecture; gaps analysed and risks identified and documented
− Outputs: Approach for required availability
• 3.4 Produce Gap Closure Recommendation and Specification
− Scope: Decision on how the closure should be implemented based on financial and business reasons; develop specifications for the availability design and architecture
− Inputs: Approach for required availability; cost information
− Outputs: Decision on design and implementation; specifications for the availability design and architecture
• 3.5 Plan and Summarise Downtime
− Scope: Plan downtime for components and aggregate downtime across services
− Inputs: Decision on design and implementation
− Outputs: Planned downtime
• 3.6 Create Statement of Work to Implement
− Scope: Initiate project for implementing changes to address availability issues
− Inputs: Specifications for the availability design and architecture
− Outputs: Statement of work for project
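
Step 3.2 takes the component relationships documented in Step 2 and looks for components whose failure would disrupt a service. One way to sketch this, assuming each service is recorded as a set of functions together with the components able to perform each function, is shown below. The data layout, the function name `single_points_of_failure` and the component names are all hypothetical, not part of the approach described in the slides.

```python
from collections import defaultdict


def single_points_of_failure(services):
    """Map each non-redundant component to the services its failure would disrupt.

    `services` maps a service name to {function name: [components able to perform it]}.
    """
    spofs = defaultdict(set)
    for service, functions in services.items():
        for providers in functions.values():
            distinct = set(providers)
            if len(distinct) == 1:  # only one component can perform this function
                spofs[distinct.pop()].add(service)
    return {component: sorted(affected) for component, affected in spofs.items()}


# Hypothetical component relationships for two services.
services = {
    "Service 1": {"database": ["db-a", "db-b"], "network": ["switch-1"]},
    "Service 2": {"database": ["db-c"], "network": ["switch-1"]},
}
print(single_points_of_failure(services))
```

Listing the affected services per component also shows which single points of failure are shared, which may help when weighing alternative approaches and costs in Step 3.3.
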
Step 4 - Availability Review

• 4. Availability Review
− Scope: Assess, review and update availability design if required
− Inputs: Incident, problem, fault reports
− Outputs: Identified availability concerns and amended design if required
• 4.1 Define Availability Measurement Model
− Scope: Define availability measurement model
− Inputs: Documented and agreed availability requirements
− Outputs: Defined data sources for availability measurement
• 4.2 Perform Trend Analysis
− Scope: Analyse incident and problem data to arrive at a high level view of availability
− Inputs: Incident and problem trend reports
− Outputs: Identified availability concerns
• 4.3 Analyse Expanded Incident Lifecycle
− Scope: Analyse expanded incident lifecycle
− Inputs: Analyse breakdown of incident resolution to validate and update design considerations
− Outputs: Identified specific areas which need improvement
• 4.4 Investigate Major Outages
− Scope: Investigate large outages and update availability design if required
− Inputs: Detailed incident analysis for specific incidents, fault, problems and performance reports
− Outputs: Identified availability concerns
• 4.5 Analyse Availability Reports
− Scope: Review availability reports and update infrastructure if required
− Inputs: Availability reports
− Outputs: Identified availability concerns; statement of work for identified changes
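
Step 4.3 breaks incident resolution down to find the specific areas that need improvement. A minimal sketch of such a breakdown follows. The four phase names (detect, diagnose, repair, restore) are an assumption made for illustration, since the slides do not name the phases, and the incident figures are invented.

```python
from statistics import mean

# Assumed lifecycle phases; the slides do not list them.
PHASES = ["detect", "diagnose", "repair", "restore"]


def phase_breakdown(incidents):
    """Return the mean minutes spent in each phase across all incidents."""
    return {phase: mean(incident[phase] for incident in incidents) for phase in PHASES}


def slowest_phase(incidents):
    """Return the phase with the highest mean duration."""
    breakdown = phase_breakdown(incidents)
    return max(breakdown, key=breakdown.get)


# Hypothetical incident records: minutes spent in each phase.
incidents = [
    {"detect": 10, "diagnose": 30, "repair": 20, "restore": 5},
    {"detect": 20, "diagnose": 50, "repair": 10, "restore": 15},
]
print(phase_breakdown(incidents))
print(slowest_phase(incidents))  # diagnose
```

With these figures most of the time goes on diagnosis, which would point towards better monitoring or diagnostic tooling instead of faster repair.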


Core Principles

• Core principles ensure consistency of work and outputs
• Ensure processes will meet the requirements of the
business
• Work will be of a high quality
• Core principles should serve as a checklist against which all
work is assessed


Availability Design and Management Core Principles

1. Availability requirements are based on the agreed and defined needs of the
business
2. The IT function will determine the overall requirement of availability,
performance and recoverability of systems under the terms of a service
agreement with the business
3. Infrastructure needs to be designed to routinely incorporate availability
requirements
4. The availability design and management process must adhere to security policies
and procedures
5. An availability plan will be used to track and manage availability requirements
and information collected
6. Data on service reliability, maintainability, resiliency must be collected and
monitored
7. The IT function will use continuous process improvement to achieve and
maintain level of service availability
8. Planned downtime must be minimised for business-critical functions and
unplanned downtime is handled by service management processes including
Incident Management, Service Request Management, Continuity Management



Core Principle 1 - Availability Requirements Are Based On
The Agreed And Defined Needs Of The Business

• Elements
− Conditions for availability must be aligned with the needs of the business
− Relevant availability data must be gathered and analysed
− Input and validation of requirements must be solicited from the business
− Availability requirements must be documented and distributed for agreement and approval
• Benefits
− Expectations are clearly defined and accepted
− User satisfaction is increased
− Growth can be forecast more easily
− Problem areas can be identified


Core Principle 2 - The IT Function Determines The Overall
Requirement Of Availability, Performance And
Recoverability Of Systems
• Elements
− Requirements are met under defined and agreed service agreements
− Good working relationships need to exist with key suppliers and vendors
− Changes to environment must be reflected in service agreements
• Benefits
− There is a structure of supporting contracts in place from suppliers and vendors to meet business availability requirements


Core Principle 3 - Infrastructure Needs To Be Designed To
Routinely Incorporate Availability Requirements

• Elements
− Changes in infrastructure and business needs must be reflected in availability planning and design
− Availability and recovery requirements need to be explicitly incorporated at the design stage
• Benefits
− Availability requirements and expectations are clearly defined and accepted


Core Principle 4 - Availability Design And Management
Process Must Adhere To Security Policies And Procedures

• Elements
− Access to IT services must be provided in a secure environment
− Availability processes must be aligned with security policies
• Benefits
− Security measures will be followed
− There will be an ability to differentiate between security problems and availability problems


Core Principle 5 - Availability Plan Will Be Used To Track And
Manage Availability Requirements And Information
Collected
• Elements
− An availability plan must be developed and distributed
− Availability planning must be defined and outlined
− The availability plan must define the details about the data to be collected: what, how often, analysis, reporting, distribution, responses required
• Benefits
− Availability management goals are clearly defined and documented
− There will be a clearly communicated process for availability planning and reporting
− Data is provided for availability reporting, analysis and forecasting


Core Principle 6 - Data On Service Reliability,
Maintainability, Resiliency Must Be Collected And
Monitored
• Elements
− The data to be collected and monitored must be defined, documented and communicated
− A supporting procedure to collect and monitor data, including response to potential problems, must be defined
− Data needs to be reviewed on a regular and consistent basis
• Benefits
− Availability management will be proactive and responsive rather than reactive
− The expectations of the business can be set accurately
− There will be an ability to prepare for potentially increased future requirements
− Availability trends can be identified and addressed


Core Principle 7 - IT Function Will Use Continuous Process
Improvement To Achieve And Maintain Level Of Service
Availability
• Elements
− Collected availability data will be used to identify areas requiring improvement
− Implementation of any availability process improvement must be controlled by the change management process to control impact
• Benefits
− The business is enabled to make recommendations on availability improvements


Core Principle 8 - Planned Downtime Must Be Minimised
For Business-Critical Functions And Unplanned Downtime Is
Handled By Service Management Processes
• Elements
− Planned and unplanned downtime must be clearly notified to the business
− Acceptable versus unacceptable unplanned downtime for business-critical functions must be defined
− Escalation procedures will be developed and distributed
• Benefits
− Expectations are set with the business
− IT demonstrates commitment to supporting business-critical functions


Use Core Principles as Checklist for Independent
Verification of Availability Design and Processes
1 Availability requirements are based on the agreed and defined needs of the business [ ]
1.1 Conditions for availability must be aligned with the needs of the business [ ]
1.2 Relevant availability data must be gathered and analysed [ ]
1.3 Input and validation of requirements must be solicited from the business [ ]
1.4 Availability requirements must be documented and distributed for agreement and approval [ ]
2 The IT function will determine the overall requirement of availability, performance and recoverability of systems under the terms of a service agreement with the business [ ]
2.1 Requirements are met under defined and agreed service agreements [ ]
2.2 Good working relationships need to exist with key suppliers and vendors [ ]
2.3 Changes to environment must be reflected in service agreements [ ]
3 Infrastructure needs to be designed to routinely incorporate availability requirements [ ]
3.1 Changes in infrastructure and business needs must be reflected in availability planning and design [ ]
3.2 Availability and recovery requirements need to be explicitly incorporated at the design stage [ ]
4 Availability Design And Management Process Must Adhere To Security Policies And Procedures [ ]
4.1 Access to IT services must be provided in a secure environment [ ]
4.2 Availability processes must be aligned with security policies [ ]

Continuity Design and Management



Continuity Design and Management Process
Continuity Design and Management Consists of Two Parallel Sub-Processes
• Continuity Process Quality Control
− 1. Continuity Reporting
− 2. Continuity Report Evaluation and Improvement
− 3. Management Escalations of Service Continuity Violations
• Continuity Process Design and Management
− 1. Conduct Risk and Disaster Avoidance Assessment
− 2. Conduct Business Impact Analysis
− 3. Determine Data Backup and Recovery Options
− 4. Form Continuity and Disaster Recovery Team
− 5. Design and Develop Disaster Recovery Plan
− 6. Continuity Processing for Critical Service Components
− 7. Conduct Continuity and Disaster Recovery Rehearsal
− 8. Maintain Continuity and Disaster Recovery Plan


Structured Approach to Continuity Design and
Management
• Can be used for an individual system or application or a
service that consists of a number of systems or
applications or the entire IT landscape
• Scope is to define a plan to implement agreed continuity



Scope of Continuity Design and Management

• Conducting impact analyses on loss of business systems
• Designing for service continuity by anticipating disruptions,
estimating and measuring reliability and maintainability
• Supporting business critical functions
• Designing and developing a Disaster Recovery Plan
• Designing and developing Disaster Recovery Training
• Planning for and performing disaster mitigation and
avoidance
• Assessing and managing risk


Structured Approach to Continuity Design and
Management
Continuity Analysis and Design
• 1. Conduct Risk and Disaster Avoidance Assessment
− 1.1 Identify Potential Threats
− 1.2 Assess Probability of Threats
− 1.3 Evaluate Current Disaster Avoidance Measures
− 1.4 Assess Risk Controls to Mitigate Threats
− 1.5 Determine Impact of Reduced Controls
− 1.6 Determine Value of Additional Controls
• 2. Conduct Business Impact Analysis
− 2.1 Define Business Impact Analysis Methodology
− 2.2 Identify Business Functions to be Analysed
− 2.3 Define Business Function Criticality Categorisation
− 2.4 Design Questions and Conduct Interviews
− 2.5 Analyse Results of Interviews
− 2.6 Summarise and Present Results
• 3. Determine Backup and Recovery Options
− 3.1 Identify Backup and Recovery Options for Critical Functions
− 3.2 Evaluate Operation of Backup and Recovery Options
− 3.3 Determine Backup and Recovery Options for Critical Functions
− 3.4 Design Backup and Recovery Procedures
• 4. Form Continuity and Disaster Recovery Team
− 4.1 Define Recovery Team Structure
− 4.2 Define Recovery Team Functions
− 4.3 Define Team Leaders and Members
− 4.4 Define Team Charter
• 5. Design and Develop Disaster Recovery Plan (DRP)
− 5.1 Determine DRP Structure and Methodology
− 5.2 Define DRP Notification Schedule and Process
− 5.3 Define DRP Escalation Process
− 5.4 Define Key Recovery Objectives
− 5.5 Define Recovery Steps
− 5.6 Define Critical Function Restoration Process
• 6. Continuity Processing for Critical Service Components
− 6.1 Identify Critical Components for Continuity
− 6.2 Develop Options for Continuity
− 6.3 Develop Continuity Processing Steps
− 6.4 Develop Return from Continuity Process
• 7. Conduct Continuity and Disaster Recovery Rehearsal
− 7.1 Design Rehearsal Programme
− 7.2 Develop Rehearsal Scenarios
− 7.3 Plan and Schedule Rehearsals
− 7.4 Develop Rehearsal Evaluation Criteria
− 7.5 Conduct Rehearsals
− 7.6 Review and Analyse Rehearsals
• 8. Maintain Continuity and Disaster Recovery Plan
− 8.1 Assign Responsibility for DRP Maintenance
− 8.2 Establish DRP Review and Maintenance Procedures and Schedule
− 8.3 Integrate DRP Maintenance into Change Management
− 8.4 Agree and Maintain DRP Distribution List

Step 1 - Conduct Risk and Disaster Avoidance Assessment
Step | Scope | Inputs | Outputs
1. Conduct Risk and Disaster Avoidance Assessment | Identify and quantify risks and vulnerabilities to the organisation | Risks and threats, historical data, current environment, current processes and procedures | Risk assessment report with policies, recommendations for improvements
1.1 Identify Potential Threats | Identify potential threats, internal and external, including weaknesses in the organisation that will cause failure of IT systems | Agreement on scope of continuity recovery plan | Potential threats affecting IT systems are identified
1.2 Assess Probability of Threats | Assess the probability of the identified potential threats affecting IT systems | Potential threats affecting IT systems are identified | Assessment of probability of identified potential threats
1.3 Evaluate Current Disaster Avoidance Measures | Evaluate current disaster avoidance measures | Potential threats affecting IT systems are identified and their probability | Evaluation of current disaster avoidance measures
1.4 Assess Risk Controls to Mitigate Threats | Determine the effectiveness of controls in deterring threats | Current avoidance measures | Assessment of risk controls to reduce threats
1.5 Determine Impact of Reduced Controls | Determine how effective a control would be in deterring the threat, limiting the cost of the risk and minimising the impact threats have | Assessment of risk controls to reduce threats | Impact to organisation without adequate disaster recovery controls
1.6 Determine Value of Additional Controls | Determine which risks the organisation is willing to accept and those to be controlled | Assessment of risk controls to reduce threats, impact to organisation | Value to organisation of additional controls
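The slide does not prescribe a scoring formula for steps 1.2 to 1.6. One common way to make them concrete, assumed here purely for illustration, is to treat annual expected loss as probability multiplied by impact, and the value of an additional control as the expected loss it avoids minus its cost. The `Threat` class, the function names and every figure below are hypothetical.

```python
from dataclasses import dataclass


@dataclass
class Threat:
    name: str
    annual_probability: float  # step 1.2: chance of occurring in a year (0..1)
    impact: float              # cost to the organisation if the threat occurs


def expected_loss(threat: Threat) -> float:
    """Annual expected loss: probability multiplied by impact (assumed model)."""
    return threat.annual_probability * threat.impact


def control_value(threat: Threat, reduced_probability: float, control_cost: float) -> float:
    """Step 1.6 sketch: loss avoided by the control minus what the control costs."""
    with_control = Threat(threat.name, reduced_probability, threat.impact)
    return expected_loss(threat) - expected_loss(with_control) - control_cost


# Hypothetical figures for illustration only.
power_failure = Threat("Data centre power failure", 0.10, 500_000)
print(expected_loss(power_failure))
print(control_value(power_failure, reduced_probability=0.02, control_cost=15_000))
```

A positive result suggests the control is worth more than it costs under this simple model; a negative result marks a risk the organisation might choose to accept instead.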
Step 2 - Conduct Business Impact Analysis

Step | Scope | Inputs | Outputs
2. Conduct Business Impact Analysis | Conduct business impact analysis in order to know which functions are the most critical to the organisation for survival | Risk and disaster avoidance assessment | Critical function categorisation; List of recovery requirements for processing critical functions
2.1 Define Business Impact Analysis Methodology | Define methodology and process to be used in Business Impact Analysis based on the risk and disaster avoidance assessment | Business systems | Agreed methodologies and processes to be used in Business Impact Analysis
2.2 Identify Business Functions to be Analysed | Identify business functions to be analysed for risk and disasters | Agreed methodologies and processes to be used in Business Impact Analysis | Business functions identified for analysis
2.3 Define Business Function Criticality Categorisation | Define categorisation criteria for each business function | Identified business functions | Criteria for categorising business functions
2.4 Design Questions and Conduct Interviews | Design and validate questions and conduct interviews | Defined criteria for categories of business functions | Validation of business losses
2.5 Analyse Results of Interviews | Analyse the data and validate findings if necessary | Validation of business losses | Analysis of data
2.6 Summarise and Present Results | Develop conclusions and present final report regarding Business Impact Analysis | Analysis of data | Conclusions and final report of Business Impact Analysis
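The criticality categorisation in step 2.3 can be sketched as a simple rule applied to interview results. The slides do not state the criteria, so this example assumes a single criterion, the maximum downtime a function can tolerate, with hypothetical thresholds and category names.

```python
# Hypothetical thresholds: maximum tolerable downtime (hours) per category.
CATEGORY_THRESHOLDS = [
    (4, "critical"),
    (24, "essential"),
    (72, "necessary"),
]


def categorise(max_tolerable_downtime_hours: float) -> str:
    """Return the first category whose threshold covers the tolerable downtime."""
    for limit_hours, category in CATEGORY_THRESHOLDS:
        if max_tolerable_downtime_hours <= limit_hours:
            return category
    return "deferrable"


def recovery_order(functions: dict[str, float]) -> list[tuple[str, str]]:
    """Sort business functions so the least downtime-tolerant comes first."""
    ranked = sorted(functions.items(), key=lambda item: item[1])
    return [(name, categorise(hours)) for name, hours in ranked]


# Hypothetical interview results (step 2.4): function -> tolerable downtime in hours.
interview_results = {"Payroll": 48, "Order processing": 2, "Archive search": 168}
for name, category in recovery_order(interview_results):
    print(f"{name}: {category}")
```

In practice the criteria agreed in step 2.3 would likely combine several factors (financial loss, legal exposure, customer impact); a single threshold table is only the simplest possible form.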



Step 3 - Determine Data Backup and Recovery Options
Step | Scope | Inputs | Outputs
3. Determine Data Backup and Recovery Options | Determine data backup and recovery options based on the requirements for recovering critical functions and the type of disaster or interruption being catered for | Available time to back up and recover; Acceptable downtime; Recovery requirements | Recovery objectives; List of backup options; Supporting procedures
3.1 Identify Backup and Recovery Options for Critical Functions | Work with business units to identify possible backup options for critical business functions | Conclusions and final report of Business Impact Analysis | Backup options for critical functions
3.2 Evaluate Operation of Backup and Recovery Options | Evaluate previously identified backup options for various scenarios | Backup options for critical functions | Evaluated backup options for critical business functions
3.3 Determine Backup and Recovery Options for Critical Functions | Determine backup options for those critical business functions that currently do not have any backup options or where the options do not work correctly | Evaluated backup options for critical business functions | Backup options for all critical business functions
3.4 Design Backup and Recovery Procedures | Design backup procedures for all critical business functions | Backup options for critical business functions | Backup procedures for critical business functions
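As a rough sketch of the evaluation in steps 3.2 and 3.3, each backup option can be checked against the acceptable downtime and, as an added assumption not stated on the slide, the acceptable window of data loss. The `BackupOption` fields, the function names and all figures are hypothetical.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class BackupOption:
    name: str
    recovery_hours: float         # time needed to restore service from this option
    backup_interval_hours: float  # worst-case data loss window between backups
    annual_cost: float


def meets_requirements(option: BackupOption,
                       acceptable_downtime_hours: float,
                       acceptable_data_loss_hours: float) -> bool:
    """An option is acceptable only if it satisfies both limits."""
    return (option.recovery_hours <= acceptable_downtime_hours
            and option.backup_interval_hours <= acceptable_data_loss_hours)


def cheapest_acceptable(options: list[BackupOption],
                        acceptable_downtime_hours: float,
                        acceptable_data_loss_hours: float) -> Optional[BackupOption]:
    """Return the lowest-cost acceptable option, or None if there is none (step 3.3)."""
    acceptable = [o for o in options
                  if meets_requirements(o, acceptable_downtime_hours, acceptable_data_loss_hours)]
    return min(acceptable, key=lambda o: o.annual_cost) if acceptable else None


# Hypothetical options for one critical business function.
options = [
    BackupOption("Nightly tape", recovery_hours=36, backup_interval_hours=24, annual_cost=10_000),
    BackupOption("Disk snapshot", recovery_hours=4, backup_interval_hours=1, annual_cost=40_000),
    BackupOption("Replicated site", recovery_hours=0.5, backup_interval_hours=0.1, annual_cost=150_000),
]
choice = cheapest_acceptable(options, acceptable_downtime_hours=8, acceptable_data_loss_hours=2)
print(choice.name if choice else "No acceptable option: a new option must be determined")
```

A `None` result corresponds to the case in step 3.3 where a critical function has no working backup option and one must be determined.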



Step 4 - Form Continuity and Disaster Recovery Team
Step | Scope | Inputs | Outputs
4. Form Continuity and Disaster Recovery Team | Establish recovery teams and specify what each team is to do in the event of a broad range of possibilities | Business needs; Recovery requirements | Recovery team structure; Recovery team charter and members; Recovery procedures
4.1 Define Recovery Team Structure | Define structure of disaster recovery team | Decision to proceed | Structure of disaster recovery team
4.2 Define Recovery Team Functions | Define the function of each individual disaster recovery team for each business unit | Structure of disaster recovery team | Functions for recovery team
4.3 Define Team Leaders and Members | Define team leader, alternate leader and other team members for each type of disaster and business unit | Functions for recovery team | Recovery team leader, alternate team leader and members
4.4 Define Team Charter | Define charter for each team along with the defined roles and responsibilities. Define recovery procedures for each team relevant to their team role and charter | Recovery team leader, alternate team leader and members | Charter and recovery procedures along with roles and responsibilities for each recovery team



Step 5 - Design and Develop Disaster Recovery Plan

Step | Scope | Inputs | Outputs
5. Design and Develop Disaster Recovery Plan | Develop and validate processes and procedures to support the critical business functions | Recovery objectives; Scope of plan; Business function classification; Disaster definitions and classification; Recovery team organisation | Recovery Plan
5.1 Determine DRP Structure and Methodology | Determine the structure and methodology of how the plan will be developed | Structure of disaster recovery team | Structure and methodology of developing DRP
5.2 Define DRP Notification Schedule and Process | Define the notification schedule and process of recovery | Structure and methodology of developing DRP | Notification schedule and recovery process
5.3 Define DRP Escalation Process | Define the DRP escalation criteria and procedure | Notification schedule and recovery process | Escalation procedure
5.4 Define Key Recovery Objectives | Consider the organisation’s key recovery objectives and policies while designing DRP | Escalation procedure | Consideration of key recovery objectives and policies
5.5 Define Recovery Steps | Define the framework for disaster recovery to ensure it contains the required recovery steps | Consideration of key recovery objectives and policies | Disaster recovery steps
5.6 Define Critical Function Restoration Process | Discuss the DRP with business units to get acceptance, define the final restoration process and define training to be provided | Disaster recovery steps | Accepted restoration process
Step 6 - Alternate Processing for Critical Service Components
Step | Scope | Inputs | Outputs
6. Alternate Processing for Critical Service Components | Evaluate critical business function components to determine if alternate processing procedures are necessary and feasible for the period between a disaster and recovery and how recovery should be achieved | Critical business function components; Alternatives for processing critical components | Critical business function components timelines; Alternate procedures
6.1 Identify Critical Components for Continuity | Work with business units to identify critical components that need alternate processing | Accepted restoration process | Critical components identified
6.2 Develop Options for Continuity | Develop options for alternate processing for critical components in coordination with business units | Critical components identified | Options for alternate processing
6.3 Develop Continuity Processing Steps | Develop processing steps based on the options for alternate processing for critical components | Options for alternate processing | Alternate processing steps
6.4 Develop Return from Continuity Process | Develop procedure to return from alternate processing to normal processing | Alternate processing steps | Steps to return critical components to normal processing from alternate processing



Step 7 - Conduct Continuity and Disaster Recovery Rehearsal
Step | Scope | Inputs | Outputs
7. Conduct Continuity and Disaster Recovery Rehearsal | Conduct rehearsals to validate the organisation’s ability to respond to and recover from a disaster | Rehearsal plan; Recovery procedures; Alternate procedures; Rehearsal objectives | Lessons learned; Rehearsal report
7.1 Design Rehearsal | Design programmes for rehearsals | Disaster Recovery Plan | Programmes for rehearsals
7.2 Develop Rehearsal Scenarios | Develop rehearsal scenarios based on the design of rehearsals | Programmes for rehearsals | Rehearsal scenarios
7.3 Plan and Schedule Rehearsals | Plan and schedule rehearsals, both planned and unannounced | Rehearsal scenarios | Scheduled rehearsals
7.4 Develop Rehearsal Evaluation Criteria | Develop evaluation techniques and criteria for each rehearsal scenario | Scheduled rehearsals | Evaluation techniques and criteria
7.5 Conduct Rehearsals | Conduct rehearsals in coordination with all other members | Scheduled rehearsals | Conducted rehearsals
7.6 Review and Analyse Rehearsals | Document and distribute outcomes of the rehearsals to all the members along with lessons learned and review reports | Conducted rehearsals | Reports on conducted rehearsals



Step 8 - Maintain Continuity and Disaster Recovery Plan
Step | Scope | Inputs | Outputs
8. Maintain Continuity and Disaster Recovery Plan | Conduct scheduled reviews of the contents of the continuity plan. Update the plan as part of the change management process and with other related changes | Disaster recovery plan; Review schedule; List of reviewers; Review criteria and objectives | Recommendations for improvements or changes; Approval list from reviewer
8.1 Assign Responsibility for DRP Maintenance | Identify reviewers responsible for plan maintenance and assign responsibility | Rehearsal review reports; DRP; Review criteria and objectives | Assigned responsibilities for review and maintenance of DRP
8.2 Establish DRP Review and Maintenance Procedures and Schedule | Establish review and maintenance procedures and schedules | Assigned responsibilities for review and maintenance of DRP | Procedure for review and maintenance of DRP
8.3 Integrate DRP Maintenance into Change Management | Integrate maintenance process with change management processes to assess changes for their potential impact on the continuity plans | Review feedback and inputs | Updated DRP
8.4 Agree and Maintain DRP Distribution List | After updating the DRP, create a distribution list of those to whom the DRP has to be distributed | Updated DRP | Distribution list



Continuity Design and Management Core Principles

1. Scope of continuity plan must contain clear and realistic recovery
objectives and recovery timeframes
2. Risk management and disaster avoidance measures should be in
place and practiced
3. Continuity plan including disaster recovery should be designed and
developed to support recovery of agreed critical business functions
4. Continuity plan should be rehearsed regularly
5. Continuity and recovery strategies or plans should be integrated
into design and deployment of changes to infrastructure
6. Continuity and recovery processes or plans should be reviewed
and updated on a regular basis



Core Principle 1 - Scope Of Continuity Plan Must Contain
Clear And Realistic Recovery Objectives And Recovery
Timeframes
• Elements
− Recovery process must be aligned to support business objectives
− It must be ensured that business impact and recovery investments have a direct relationship
− Recovery time and objectives need to be communicated and validated
− The disasters that the continuity plan will and will not address must be defined
− Scope of planning efforts must be stated
• Benefits
− Clear objectives
− Defined scope of efforts
− Expectations are agreed and defined
− Coordinated recovery efforts



Core Principle 2 - Risk Management And Disaster Avoidance
Measures Should Be In Place And Practiced
• Elements
− Ensure that environment is constructed and operated to prevent potential disasters
− As infrastructure changes and business needs change, ensure risks and exposures are addressed
• Benefits
− Control of preventable, predictable disasters
− Minimising and deterring potential disasters



Core Principle 3 - Continuity Plan Including Disaster
Recovery Should Be Designed And Developed To Support
Recovery Of Agreed Critical Business Functions
• Elements
− Investment for adequate preventative, proactive, and recovery methods for critical business functions
− All business functions and their criticality must be defined and communicated to the organisation
− It must be ensured that key customers are reassured of the continuity management process
• Benefits
− Expectations are set and agreed upon
− Minimise significant losses to the organisation in terms of financial, legal, and operational issues



Core Principle 4 - Continuity Plan Should Be
Rehearsed Regularly
• Elements
− Regular rehearsals must be conducted, both planned and unannounced
− Partial and full rehearsals must be conducted
− A variety of rehearsal techniques must be used
− Rehearsal objectives and success criteria must be clearly defined
• Benefits
− Potential for successful recovery is high
− Reinforces learning and commitment
− Demonstrates value to organisation
− Identification of potential weaknesses in plan



Core Principle 5 - Continuity And Recovery Strategies Or
Plans Should Be Integrated Into Design And Deployment Of
Changes To Infrastructure
• Elements
− Must ensure the plans for changes to infrastructure are considered with continuity in mind
− Recovery procedures must be requested for new applications, systems, networks
• Benefits
− Continuity is critical component of operating environment
− Continuity strategies and plan have important role in design and deployment decisions and plans



Core Principle 6 - Continuity And Recovery Processes Or
Plans Should Be Reviewed And Updated On A Regular Basis
• Elements
− Regular reviews of continuity plans must be defined and scheduled
− Make sure reviewers are not involved in the development of the plan and are objective
− Integration into the change management process for plan updates must be ensured
− Revision, tracking, and distribution list must be defined and documented
• Benefits
− Keeps continuity plan as a living document
− Ensures the plan is kept current
− Reminder of continuing purpose of plan and its benefits to the organisation



Use Core Principles as Checklist for Independent
Verification of Continuity Design and Processes
1 Scope Of Continuity Plan Must Contain Clear And Realistic Recovery Objectives And Recovery Timeframes ☐
1.1 Recovery process must be aligned to support business objectives ☐
1.2 It must be ensured that business impact and recovery investments have a direct relationship ☐
1.3 Recovery time and objectives need to be communicated and validated ☐
1.4 The disasters that the continuity plan will and will not address must be defined ☐
2 Risk Management And Disaster Avoidance Measures Should Be In Place And Practiced ☐
2.1 Ensure that environment is constructed and operated to prevent potential disasters ☐
2.2 As infrastructure changes and business needs change, ensure risks and exposures are addressed ☐
3 Continuity Plan Including Disaster Recovery Should Be Designed And Developed To Support Recovery Of Agreed Critical Business Functions ☐
3.1 Investment for adequate preventative, proactive, and recovery methods for critical business functions ☐
3.2 All business functions and their criticality must be defined and communicated to the organisation ☐
3.3 It must be ensured that key customers are reassured of the continuity management process ☐
4 Continuity Plan Should Be Rehearsed Regularly ☐
4.1 Regular rehearsals must be conducted, both planned and unannounced ☐
4.2 Partial and full rehearsals must be conducted ☐
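One possible way to use the checklist during an independent verification is to record a result per item and list everything that failed or was never assessed. This is only a sketch: the `CHECKLIST` excerpt holds three of the items above, and the `open_items` function and the sample results are hypothetical.

```python
# Excerpt of the checklist above, keyed by item number.
CHECKLIST = {
    "1.1": "Recovery process must be aligned to support business objectives",
    "4.1": "Regular rehearsals must be conducted, both planned and unannounced",
    "4.2": "Partial and full rehearsals must be conducted",
}


def open_items(results: dict[str, bool]) -> list[str]:
    """Return the numbers of items that failed or were never assessed."""
    return [item_id for item_id in CHECKLIST if not results.get(item_id, False)]


# Hypothetical verification results: 4.2 has not been assessed at all.
results = {"1.1": True, "4.1": False}
for item_id in open_items(results):
    print(f"{item_id} {CHECKLIST[item_id]}")
```

Treating an unassessed item the same as a failed one keeps gaps in the verification visible instead of silently passing them.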
Process Quality Control



Common Process Quality Control Procedures for
Availability and Continuity

Continuity Process Quality Control
1. Continuity Reporting
2. Continuity Report Evaluation and Improvement
3. Management Escalations of Service Continuity Violations

Availability Process Quality Control
1. Availability Reporting
2. Availability Report Evaluation and Improvement
3. Management Escalations of Service Availability Violations



Structured Approach to Availability and Continuity
Process Quality Control

Availability and Continuity Process Quality Control

1. Generate Report Metrics and Reports
1.1 Develop Management Reports Based on Agreed Metrics
1.2 Schedule Report
1.3 Generate Reports
1.4 Distribute Reports
1.5 Review Report Schedule
1.6 Update Reporting Schedule

2. Evaluation and Improvement
2.1 Evaluate Process for Improvement
2.2 Develop Improvements and Implementation Plan
2.3 Create and Submit Improvement Implementation Plan
2.4 Implement Improvement Plan
2.5 Review Implementation
2.6 Update Process Improvement Plan

3. Management Escalations of Service Continuity Violations
Step 1 - Generate Report Metrics and Reports

Step | Scope | Inputs | Outputs
1. Generate Report Metrics and Reports | Generate report metrics and periodic and ad hoc reports as per requirement or plan | Report schedule; Request for ad hoc reports | Generated or distributed reports
1.1 Develop Management Reports Based on Agreed Metrics | Report to management the contributions made by this process to overall service management | Report requirements | Accepted reports, frequency and costs
1.2 Schedule Report | Update the report schedule | Report schedule | Updated report schedule
1.3 Generate Reports | Generate reports according to schedule or in response to ad hoc requirements | Collected metrics | Generated reports
1.4 Distribute Reports | Distribute the generated report to the target recipients | Generated reports | Distributed reports
1.5 Review Report Schedule | Review regularly the report requirements | Report schedule; Report details | Review results
1.6 Update Reporting Schedule | Update report schedule with the new reports | Report schedule | Updated report schedule
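A report metric that follows directly from the availability ratio defined earlier in the deck (the proportion of agreed service hours in which the service was actually available) can be sketched as below. The function names, the sample figures and the 99.5% target are hypothetical; services falling below the target would be candidates for the management escalation step.

```python
def availability_ratio(agreed_service_hours: float, downtime_hours: float) -> float:
    """Proportion of the agreed service hours in which the service was available."""
    if agreed_service_hours <= 0:
        raise ValueError("agreed_service_hours must be positive")
    return (agreed_service_hours - downtime_hours) / agreed_service_hours


def violations(report: dict[str, tuple[float, float]], target: float) -> list[str]:
    """Return the services whose availability ratio is below the agreed target."""
    return [service for service, (agreed, downtime) in report.items()
            if availability_ratio(agreed, downtime) < target]


# Hypothetical monthly figures: service -> (agreed service hours, downtime hours).
monthly_report = {
    "Order processing": (200.0, 0.5),
    "Payroll": (160.0, 4.0),
}
for service, (agreed, downtime) in monthly_report.items():
    print(f"{service}: {availability_ratio(agreed, downtime):.2%}")
print("Below target:", violations(monthly_report, target=0.995))
```

Only downtime that falls within the agreed service hours should be counted, since the ratio is defined against those hours and not against elapsed calendar time.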



Step 2 - Evaluation and Improvement
Step | Scope | Inputs | Outputs
2. Evaluation and Improvement | Perform periodic reviews for process performance improvement | Process metrics; Future directives; Service level expectations; Review schedule; Improvement plan | Implemented improvements; Reduced costs; Improved process efficiency and effectiveness
2.1 Evaluate Process for Improvement | Review the effectiveness and efficiency of the continuity management process regularly | Improvement plan | Gap analysis report
2.2 Develop Improvements and Implementation Plan | Develop and review proposed process improvements | Improvement plan; Gap analysis report; Revised business requirements | Improvement strategy
2.3 Create and Submit Improvement Implementation Plan | Create and submit improvement implementation plan | Improvement strategy | Submitted improvement implementation plan
2.4 Implement Improvement Plan | Manage and coordinate the implementation of the process improvement plan | Approved improvement implementation plan; Improvement strategy | Implemented improvements; Reduced costs; Improved process efficiency and effectiveness
2.5 Review Implementation | Monitor implementation to ensure that process is not disrupted and that the changes are working as intended | Implemented improvements | Closed improvement implementation plan; Review results
2.6 Update Process Improvement Plan | Update the process improvement plan with any changes | Process improvement plan; Review cycle | Updated process improvement plan
Summary

• Availability and continuity are merging into a single unbroken
requirement
• Availability and continuity can be a significant overhead to an
organisation so their cost should yield benefits elsewhere
• Most business systems and processes are defined as business critical
• Management commitment is needed to ensure availability and
continuity can receive the required attention and resources
• Use core principles for availability and continuity for independent
verification of processes and designs
• Availability and continuity should be embedded into system
architectures and designs rather than being an afterthought



More Information

Alan McSweeney
alan@alanmcsweeney.com

