
16 March 2019

OPTIMIZATION OF DIGITAL SERVICES
Master's in Information Technology Management

By
M.C. Luis Alberto Ovando Brito
1. Optimization programs in information technology

• 1.1 Optimization in organizations
• 1.2 Technology as a means of achieving objectives
• 1.3 Optimization to maximize value
• 1.4 Optimization and business objectives
The need for optimization
Is technology investment approaching a point of diminishing returns?

Where the money is spent:
1. IT infrastructure
2. Security
3. Training
4. Specialized labor
5. Maintenance
Spending on IT infrastructure covers:

Data center infrastructure
Networking
Switching and voice
End-user devices
Licenses
The high cost of IT asset proliferation: The growing number of servers and the growing amount of storage are among the biggest concerns of corporate chief financial officers (CFOs) and chief information officers (CIOs). As IT assets grow, so does the complexity of the IT infrastructure, which creates significant management problems for already overburdened IT administrative staff. In addition, data center energy consumption is soaring while energy prices continue to rise.
Data center infrastructure
Network infrastructure
Storage infrastructure
Switching and voice infrastructure
End-user devices
Software

Software
Office productivity
Operating systems
Development tools
Security software
CRM (customer relationship management)
ERP (enterprise resource planning)
SCM (supply chain management)
DSS (decision support system)
Software quality
What exactly is it?
At a somewhat pragmatic level, David Garvin of Harvard Business School suggests that "quality is a complex and multifaceted concept" that can be described from five different points of view:
• The transcendental view holds that quality is something you recognize immediately but cannot define explicitly.
• The user view sees quality in terms of the end user's specific goals. If a product meets them, it has quality.
• The manufacturer's view defines quality in terms of the product's original specification. If the product conforms to it, it has quality.
• The product view suggests that quality is tied to the inherent characteristics (functions and features) of a product.
• Finally, the value-based view measures quality by what a customer is willing to pay for a product.
In reality, quality encompasses all of these views and more.

QUALITY ATTRIBUTES AND CRITERIA

Reliability
Availability
Usability
Robustness
Performance
Security

Testing

Alpha and beta
Unit
Integration
Black box
White box
Performance and stress
2. Operating models

• 2.1 Cost efficiency
• 2.2 Business enablement
• 2.3 Strategic advantages
• 2.4 Technology modernization
Continuous improvement in IT
A continual service improvement (CSI) plan identifies processes in which incremental improvements can be generated. It also aligns and continually realigns IT services with changing business needs by identifying and implementing improvements to the services and to the processes that support them.
The 3 steps of continuous improvement

1. Continuous evaluation of services and processes, both the business processes and the IT processes that support them.

2. Continuous identification of services and processes that are candidates for improvement.

3. Adaptation of the services and processes, identifying the benefits of each change.

Requirements for continuous improvement

1. Analyze the organization's value chain.

2. Identify the company's strategic activities.

3. Establish a plan that continually supports the core value generators (strategic activities) through IT.

4. Organize and assign roles and responsibilities.

5. Set up tools for monitoring services and processes.

6. Hold periodic meetings to review the progress of each change and improvement.

The business value chain, or simply the value chain, is a theoretical model that describes how the activities of a business organization unfold and add value to the final product. It was described and popularized by Michael Porter in Competitive Advantage: Creating and Sustaining Superior Performance (1985).

Competitive advantages

Example of a possible value… ????

The value chain helps identify the activities, core business, or distinctive competencies that generate a competitive advantage. Having a market advantage means earning a higher relative return than rivals in the industry in which the company competes, and that return must be sustainable over time. Profitability is the margin between revenue and costs. Every activity the company performs should generate as much revenue as possible. If it cannot, it should cost as little as possible, so that the company obtains a higher margin than its rivals.
Technology modernization

• Data and voice infrastructure
• Processing (compute)
• Network and switching
• Software
• End-user devices
• Supporting infrastructure
• Training and certification
Storage design
https://d.docs.live.net/da28e6b541b074e4/MISDOCS/TRABAJO/UVM/MATERIAS/POSTGRADO/OPTIMIZACION%20DE%20SERVICIOS%20DIGITALES/PRESENTACIONES/OPTIMIZACION%20DE%20SERVICIOS%20DIGITALES%20-%20sesion%202.pptx
Case study: Performance optimization
What is TDP (Thermal Design Power)?
Intel defines TDP as follows: The upper point of the thermal profile consists of the Thermal Design Power (TDP) and the associated Tcase value. Thermal Design Power (TDP) should be used for processor thermal solution design targets. TDP is not the maximum power that the processor can dissipate. TDP is measured at maximum TCASE.

CPU Temperature
Also called "Tcase", this is the temperature shown in Intel's Thermal Specification. It is measured on the surface of the Integrated Heat Spreader (IHS) under tightly controlled laboratory conditions.
Is lower power better?
There are many press articles which assume a lower power server is
also more energy efficient. This is actually far from the truth. Power
by itself is not a measurement of overall server efficiency.
Performance of the server, in conjunction with the power consumed
is what defines energy efficiency. A system which is lower power, but
is also lower performance will take longer to perform a task, and
may ultimately consume more energy. The most efficient server is
one which has the best performance per watt. As an example,
consider these two servers with the following performance and
power specifications:
At first glance you might think Server B will be more efficient because
it has lower total processor TDP (70W vs. 180W). However, when you
look at the overall server performance per watt, you see a different
story. Server A has 218% higher performance, yet server power is
only 176% greater than Server B. Even though Server A does
consume more power, it is 23.6% more energy efficient as shown by
its better performance/watt.
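The argument above can be checked with a few lines of arithmetic. The sketch below is only an illustration: the `Server` class, its field names, and every number in it are made-up assumptions, not the specifications of the two servers on the slide. It shows that energy for a fixed task is power multiplied by time, so a slower low-power server can end up using more energy.

```python
from dataclasses import dataclass


@dataclass
class Server:
    name: str
    performance: float  # work units per second (hypothetical benchmark score)
    power_watts: float  # average power drawn while running the workload


def performance_per_watt(server: Server) -> float:
    """Efficiency metric: work done per second for each watt consumed."""
    return server.performance / server.power_watts


def energy_for_task(server: Server, work_units: float) -> float:
    """Energy in joules = power (W) x time (s), where time = work / performance."""
    seconds = work_units / server.performance
    return server.power_watts * seconds


# Hypothetical servers: B draws less power but is also much slower.
server_a = Server("A", performance=300.0, power_watts=400.0)
server_b = Server("B", performance=100.0, power_watts=200.0)

for s in (server_a, server_b):
    print(s.name, performance_per_watt(s), energy_for_task(s, work_units=36_000))
```

With these assumed figures, the higher-power server finishes the same task sooner and uses less total energy, which is the point the slide makes about performance per watt.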
[Figures: Core i7-8700]
PL1 is the effective long-term expected steady state power consumption
of a processor. For all intents and purposes, the PL1 is usually defined as
the TDP of a processor. So if the TDP is 45W, then PL1 is 45W.

PL2 is the short-term maximum power draw for a processor. This number
is higher than PL1, and the processor goes into this state when a workload
is applied, allowing the processor to use its turbo modes up to the
maximum PL2 value. This means that if Intel has defined a processor with a
series of turbo modes, they will only work when PL2 is the driving variable
for maximum power consumption. Turbo does not work in PL1 mode.

Example: the Intel Core i7-8750H has a PL2 of 78 W.


3. Solution architecture

3.1 Information technology portfolio

3.2 Technology-area solutions provided to the business

3.3 Customer or service management and delivery model

3.4 Domain portfolio


Information technology portfolio

1. Telecommunications

2. Information systems

3. Operating systems

4. Decision-support systems

5. Security
Telecommunications

1. Telepresence

2. VoIP

3. Software for collaboration and group communication
Connectivity options
Scalable networks: hierarchical model
Scalable networks: Metro Ethernet

A Metro Ethernet network is a technology architecture designed to deliver data connectivity services in a metropolitan area network (MAN) at Layer 2 of the OSI model, through Ethernet user-network interfaces (UNIs). These networks, described as "multiservice", support a wide range of services and applications. They include mechanisms that support real-time (RTP) traffic for applications such as IP telephony and IP video, even though this kind of traffic is especially sensitive to delay and jitter.
Security optimization
Information security

Confidentiality
Integrity
Availability
4. Optimization of information technologies
5. Performance evaluation

• Storage performance
• Business continuity
• Backup, replication, and archiving
• Availability
• ITIL, COBIT
Module 4: Intelligent Storage Systems (ISS)
Upon completion of this module, you should be able to:
• Describe the key components of an intelligent storage system
• Describe HDD and SSD components, addressing, and performance
• Describe RAID, its techniques, and its levels
• Discuss the types of intelligent storage systems



Third Platform Requirements for Storage
• Process massive amount of IOPS
• Elastic and non-disruptive horizontal scaling of resources
• Intelligent resource management
• Automated and policy driven configuration
• Support for multiple protocols for data access
• Supports APIs for software-defined and cloud integration
• Centralized management and chargeback in a multi-tenancy environment



Technology Solution
• Intelligent storage system
• Block-based storage system
• File-based storage system
• Object-based storage system
• Unified storage system
• Storage Virtualization
• Software-defined storage



What is an Intelligent Storage System?
Intelligent Storage System
A feature-rich RAID array that provides highly optimized I/O processing capabilities.

• Has a purpose-built operating environment
• Provides intelligent resource management capability
• Provides large amount of cache
• Provides multiple I/O paths

Features
• Supports combination of HDD and SSD
• Service massive amount of IOPS
• Scale-out architecture
• Deduplication, compression, and encryption
• Automated storage tiering
• Virtual storage provisioning
• Multi-tenancy
• Supports APIs to integrate with SDDC and cloud
• Data protection


Components of Intelligent Storage System
• Two key components of an ISS
  • Controller
    • Block-based
    • File-based
    • Object-based
    • Unified
  • Storage
    • All HDDs
    • All SSDs
    • Combination of both

[Figure: an intelligent storage system made up of controller(s) and storage]


Storage – Hard Disk Drives
Components of HDD
[Figure labels: controller board, head disk assembly (HDA), platter and read/write head, interface, power connectors]


Physical Disk Structure
[Figure labels: spindle, platter, track, sector, cylinder]


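Logical Block Addressing
[Figure: a physical address given as cylinder, head, and sector (for example Cylinder 1, Head 0, Sector 8, with heads on the upper and lower surfaces) compared with logical blocks numbered Block 0, Block 32, Block 64, Block 128]

Physical Address = CHS
Logical Block Address = Block #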
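The relationship between the two addressing schemes can be sketched with the conventional CHS-to-LBA conversion. The geometry used here (16 heads per cylinder, 63 sectors per track) and the function names are illustrative assumptions and are not taken from the slide's figure.

```python
def chs_to_lba(cylinder: int, head: int, sector: int,
               heads_per_cylinder: int, sectors_per_track: int) -> int:
    """Convert a cylinder/head/sector address to a logical block address.

    Cylinders and heads count from 0; sectors conventionally count from 1.
    """
    if not 0 <= head < heads_per_cylinder:
        raise ValueError("head out of range")
    if not 1 <= sector <= sectors_per_track:
        raise ValueError("sector out of range")
    return (cylinder * heads_per_cylinder + head) * sectors_per_track + (sector - 1)


def lba_to_chs(lba: int, heads_per_cylinder: int, sectors_per_track: int):
    """Inverse conversion: recover (cylinder, head, sector) from a block number."""
    cylinder, remainder = divmod(lba, heads_per_cylinder * sectors_per_track)
    head, sector_index = divmod(remainder, sectors_per_track)
    return cylinder, head, sector_index + 1


# Hypothetical geometry, chosen only for the example.
HEADS, SECTORS = 16, 63
lba = chs_to_lba(cylinder=1, head=0, sector=8,
                 heads_per_cylinder=HEADS, sectors_per_track=SECTORS)
print(lba, lba_to_chs(lba, HEADS, SECTORS))
```

The host only ever sees the single block number; the drive controller performs the translation to a physical location.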



HDD Performance
• Electromechanical device
• Impacts the overall performance of the storage system
• Disk service time
• Time taken by a disk to complete an I/O request, depends on:
• Seek time
• Rotational latency
• Data transfer rate

Disk service time = seek time + rotational latency + data transfer time



Seek Time
• Time taken to position the read/write head
• The lower the seek time, the faster the I/O operation
• Seek time specifications include
  • Full stroke
  • Average
  • Track-to-track
• The seek time of a disk is specified by the drive manufacturer


Rotational Latency
• The time taken by the platter to rotate and position the
data under the R/W head
• Depends on the rotation speed of the spindle
• Average rotational latency
• One-half of the time taken for a full rotation
• For ‘X’ rpm, drive latency is calculated in milliseconds as:
$$\text{Average rotational latency} = \frac{\tfrac{1}{2} \times 1000}{X / 60} = \frac{500}{X / 60} = \frac{30000}{X}$$
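As a quick check of the 30000/X rule, the sketch below evaluates it for a few common spindle speeds. The function name and the choice of speeds are illustrative assumptions.

```python
def average_rotational_latency_ms(rpm: float) -> float:
    """Half of one full rotation, in milliseconds: (0.5 * 1000) / (rpm / 60)."""
    if rpm <= 0:
        raise ValueError("rpm must be positive")
    return 30000.0 / rpm


for rpm in (5400, 7200, 10000, 15000):
    print(f"{rpm} rpm: {average_rotational_latency_ms(rpm):.2f} ms")
```

A 15,000 rpm drive, for instance, has an average rotational latency of 2 ms.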



Data Transfer Rate
• Average amount of data per unit time that the drive can deliver to the HBA
• Internal transfer rate: Speed at which data moves from a platter’s surface to the
internal buffer of the disk
• External transfer rate: Rate at which data move through the interface to the HBA

[Figure: HBA, interface, buffer, and head disk assembly of the disk drive; the external transfer rate is measured at the interface to the HBA, and the internal transfer rate is measured between the head disk assembly and the buffer]


I/O Controller Utilization Vs. Response Time
• Based on fundamental laws of disk drive performance:
$$\text{Avg. Response Time} = \frac{\text{Service Time}}{1 - \text{Utilization}}$$
• Service time is time taken by the controller to serve an I/O
• For performance-sensitive applications disks are commonly utilized below 70% of their
I/O serving capability
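The effect of utilization on response time is easier to see with numbers. The following sketch applies the formula above to an assumed service time of 5 ms; that value and the function name are illustrative, not from the slide.

```python
def average_response_time_ms(service_time_ms: float, utilization: float) -> float:
    """Average response time = service time / (1 - utilization)."""
    if not 0.0 <= utilization < 1.0:
        raise ValueError("utilization must be in the range [0, 1)")
    return service_time_ms / (1.0 - utilization)


SERVICE_TIME_MS = 5.0  # assumed controller service time, for illustration only
for utilization in (0.3, 0.5, 0.7, 0.9, 0.95):
    rt = average_response_time_ms(SERVICE_TIME_MS, utilization)
    print(f"utilization {utilization:.0%}: response time {rt:.1f} ms")
```

Response time grows slowly at low utilization and very steeply as utilization approaches 100%, which is why the 70% guideline leaves headroom.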



Storage Design Based on Application Requirements and Disk Drive Performance
• Disks required to meet an application’s capacity need ($D_C$):
$$D_C = \frac{\text{Total capacity required}}{\text{Capacity of a single disk}}$$
• Disks required to meet application’s performance need ($D_P$):
$$D_P = \frac{\text{IOPS generated by an application at peak workload}}{\text{IOPS serviced by a single disk}}$$
• IOPS serviced by a disk ($S$) depends upon disk service time ($T_S$):
$$T_S = \text{Seek time} + \frac{0.5}{\text{Disk rpm}/60} + \frac{\text{Data block size}}{\text{Data transfer rate}}$$
• $T_S$ is time taken for an I/O to complete, therefore IOPS serviced by a disk ($S$) is equal to $1/T_S$
• For performance sensitive application:
$$S = 0.7 \times \frac{1}{T_S}$$

Disks required for an application = Max($D_C$, $D_P$)
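The sizing procedure above can be worked through in code. All input figures below (1,460 GB of capacity, 9,000 peak IOPS, 4 KB I/Os, 146 GB 15,000 rpm disks with 5 ms average seek and 40 MB/s transfer rate) are assumed example values, and the function names are hypothetical.

```python
import math


def disk_service_time_s(seek_time_s: float, rpm: float,
                        block_size_bytes: float,
                        transfer_rate_bytes_per_s: float) -> float:
    """T_S = seek time + 0.5 / (rpm / 60) + block size / transfer rate."""
    rotational_latency_s = 0.5 / (rpm / 60.0)
    transfer_time_s = block_size_bytes / transfer_rate_bytes_per_s
    return seek_time_s + rotational_latency_s + transfer_time_s


def disks_required(total_capacity_gb: float, disk_capacity_gb: float,
                   peak_iops: float, service_time_s: float,
                   utilization: float = 0.7) -> int:
    """Return Max(D_C, D_P), rounding each disk count up to a whole disk."""
    disks_for_capacity = math.ceil(total_capacity_gb / disk_capacity_gb)
    iops_per_disk = utilization / service_time_s  # S = 0.7 x 1/T_S
    disks_for_performance = math.ceil(peak_iops / iops_per_disk)
    return max(disks_for_capacity, disks_for_performance)


# Assumed example workload and disk characteristics.
ts = disk_service_time_s(seek_time_s=0.005, rpm=15000,
                         block_size_bytes=4096,
                         transfer_rate_bytes_per_s=40e6)
print(f"T_S = {ts * 1000:.2f} ms")
print("disks required:", disks_required(1460, 146, 9000, ts))
```

In this example capacity alone would need 10 disks, but the performance requirement dominates, so the design is driven by IOPS rather than by capacity.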



Storage – Solid State Drives
Components of SSD
[Figure: I/O interface, controller (drive controller, RAM cache, non-volatile memory), and mass storage (multiple flash memory chips)]


SSD Addressing
• An 8KB write to the SSD is saved as two 4KB pages (LBA 0x2000 and LBA 0x3000)
• LBAs are logically mapped to pages (SSD metadata)
• 4KB page
• 128KB block (32 x 4KB pages)
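The page arithmetic on this slide can be reproduced in a few lines. This sketch treats the addresses 0x2000 and 0x3000 as byte offsets, which is an assumption made because they are exactly 4 KB apart; the function and constant names are hypothetical.

```python
PAGE_SIZE = 4 * 1024    # 4KB page
PAGES_PER_BLOCK = 32    # 32 pages make one 128KB block


def pages_for_write(start_address: int, length: int,
                    page_size: int = PAGE_SIZE) -> list:
    """Return the starting address of every page touched by a write."""
    if length <= 0:
        raise ValueError("length must be positive")
    first_page = (start_address // page_size) * page_size
    last_page = ((start_address + length - 1) // page_size) * page_size
    return list(range(first_page, last_page + page_size, page_size))


pages = pages_for_write(0x2000, 8 * 1024)
print([hex(p) for p in pages])
print("block size in KB:", PAGE_SIZE * PAGES_PER_BLOCK // 1024)
```

A real drive keeps this logical-to-page mapping in its metadata and may place the two pages anywhere in flash; the sketch only shows how one write spans pages.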



Page and Block States
[Figure: flash memory page states (erased, valid, invalid). A write turns an erased page into a valid page, a rewrite or delete invalidates the page, and an electrical erase returns it to the erased state.]
[Figure: block state diagram (new, used, erased). A write turns a new or erased block into a used block, and an electrical erase or delete returns it to the erased state.]


SSD Performance
• Access type
• SSD performs random reads the best
• SSDs use all internal I/O channels in parallel for multi-threaded large block I/Os
• Drive state
• New SSD or SSD with substantial unused capacity offers best performance
• Workload duration
• SSDs are best for workloads with short bursts of activity



Why RAID?
RAID
A technique that combines multiple disk drives into a logical unit (RAID set) and provides
protection, performance, or both.

• Provides data protection against drive failures


• Improves storage system performance by serving I/Os from multiple drives
simultaneously
• Two implementation methods
• Software RAID implementation
• Hardware RAID implementation



RAID Array Components
[Figure: a compute system connected to a RAID array; the array contains the RAID controller, the hard disks, and the logical arrays (RAID sets)]


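RAID Techniques
• Striping (Figure 1): the RAID controller splits data A into strips A1, A2, A3, A4 across the drives; strips at the same position on each drive form a stripe
• Mirroring (Figure 2): the RAID controller writes data A to two drives
• Parity (Figure 3): data is striped as A1, A2, A3 on drives D1, D2, D3, and the parity Ap is stored on drive P
• Rebuilding data of the failed D3 drive:
  D1 + D2 + ? = P
  D3 = P – D1 – D2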
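The rebuild shown for the parity technique can be sketched in code. The slide writes parity with "+" and "–"; the sketch below swaps in bitwise XOR, which is how parity is commonly computed in practice and is stated here as an assumption. The strip contents and the function name are made up for the example.

```python
from functools import reduce


def xor_strips(*strips: bytes) -> bytes:
    """XOR equal-length strips together, byte by byte."""
    if not strips:
        raise ValueError("at least one strip is required")
    if len({len(s) for s in strips}) != 1:
        raise ValueError("all strips must have the same length")
    return bytes(reduce(lambda a, b: a ^ b, column) for column in zip(*strips))


# Hypothetical strip contents on data drives D1, D2, D3.
d1 = bytes([0x01, 0x02])
d2 = bytes([0x10, 0x20])
d3 = bytes([0xAA, 0x55])

parity = xor_strips(d1, d2, d3)       # written to the parity drive P
rebuilt_d3 = xor_strips(parity, d1, d2)  # D3 fails: rebuild from the survivors
print(parity.hex(), rebuilt_d3.hex())
```

Because XOR is its own inverse, the same function both generates the parity and reconstructs the missing strip.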



RAID Levels
• Commonly used RAID levels are:
• RAID 0 – Striped set with no fault tolerance
• RAID 1 – Disk mirroring
• RAID 1 + 0 – Nested RAID
• RAID 3 – Striped set with parallel access and dedicated parity disk
• RAID 5 – Striped set with independent disk access and a distributed parity
• RAID 6 – Striped set with independent disk access and dual distributed parity



[Figures: RAID 0, RAID 1, Nested RAID – 1+0, RAID 3, RAID 5, RAID 6]


RAID Impacts on Performance

• In RAID 5, every write (update) to a disk manifests as four I/O operations (2 disk reads and 2 disk
writes)
• In RAID 6, every write (update) to a disk manifests as six I/O operations (3 disk reads and 3 disk
writes)
• In RAID 1, every write manifests as two I/O operations (2 disk writes)
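To see what these write penalties mean for sizing, the sketch below converts a host workload into the load placed on the disks. The workload figures (1,000 IOPS, 60% reads) and the function name are illustrative assumptions; the penalty values are the ones listed in this module.

```python
# Disk I/Os generated per host write, as listed in this module.
WRITE_PENALTY = {"RAID 1": 2, "RAID 1+0": 2, "RAID 5": 4, "RAID 6": 6}


def backend_iops(host_iops: float, read_percent: float, raid_level: str) -> float:
    """Disk load = host reads + (write penalty x host writes)."""
    reads = host_iops * read_percent / 100
    writes = host_iops - reads
    return reads + WRITE_PENALTY[raid_level] * writes


# Assumed workload: 1,000 host IOPS, 60% reads and 40% writes.
for level in WRITE_PENALTY:
    print(level, backend_iops(1000, 60, level))
```

The same host workload therefore needs noticeably more disk IOPS on RAID 5 or RAID 6 than on a mirrored layout, which matters most for write-heavy applications.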



RAID Comparison

RAID level | Min disks | Available storage capacity (%) | Write penalty | Protection
1          | 2         | 50                             | 2             | Mirror
1+0        | 4         | 50                             | 2             | Mirror
3          | 3         | [(n-1)/n]*100                  | 4             | Parity (supports single disk failure)
5          | 3         | [(n-1)/n]*100                  | 4             | Parity (supports single disk failure)
6          | 4         | [(n-2)/n]*100                  | 6             | Parity (supports two disk failures)
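The capacity column can be evaluated for a concrete number of disks. In this sketch n is the number of disks in the RAID set; the function name and the example disk counts are assumptions made for illustration.

```python
MIN_DISKS = {"1": 2, "1+0": 4, "3": 3, "5": 3, "6": 4}


def usable_capacity_percent(raid_level: str, n: int) -> float:
    """Available storage capacity (%) for a RAID set built from n disks."""
    if n < MIN_DISKS[raid_level]:
        raise ValueError(f"RAID {raid_level} needs at least {MIN_DISKS[raid_level]} disks")
    if raid_level in ("1", "1+0"):
        return 50.0                 # every block is stored twice
    if raid_level in ("3", "5"):
        return 100 * (n - 1) / n    # one disk's worth of parity
    return 100 * (n - 2) / n        # RAID 6: two disks' worth of parity


for level, n in (("1+0", 8), ("5", 5), ("6", 8)):
    print(f"RAID {level} with {n} disks: {usable_capacity_percent(level, n):.0f}% usable")
```

Larger parity sets waste proportionally less capacity, at the cost of the higher write penalty shown in the table.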



Dynamic Disk Sparing (Hot Sparing)



Data Access Methods
[Figure: three ways an application on a compute system reaches data on a storage system]
• Block-level access: the file system runs on the compute system and sends block-level requests over the storage network to the storage system
• File-level access: the application sends file-level requests through a file interface over the network; the file system runs on the storage system and performs the block I/O
• Object-level access: the file system's user component on the compute system sends object-level requests through the OSD interface over the network; the file system's storage component on the storage system performs the block I/O


Types of Intelligent Storage Systems
• Block-based storage systems
• File-based storage systems
• Object-based storage systems
• Unified storage systems



Scale-up Vs. Scale-out Architecture
[Figure: scale-up adds storage behind the existing controller(s); scale-out adds nodes (Node 1, Node 2, Node 3) to a cluster]


Introduction to Backup and Recovery
Backup
An additional copy of production data, created and retained for the sole purpose of recovering
lost or corrupted data.

• Typically both application data and server configurations are backed up to restore data and servers
in the event of outage
• Businesses also implement backup solutions in order to comply with regulatory requirements
• To implement a successful backup and recovery solution
• IT needs to evaluate the backup methods along with their recovery considerations and retention
requirements

Module 13: Backup and Archive


Primary Purposes of Backup
• Disaster recovery
  • Restores to the operational state following a disaster
• Operational restores
  • Enables recovery in case of data loss, logical corruption
• Long-term storage
  • Preserves records required for regulatory requirements


Backup Architecture
[Figure: backup clients, backup server (tracking information), storage node, backup device, and cloud, connected by backup data flows]

• Key backup components
  • Backup client
  • Backup server
  • Storage node
  • Backup device (backup target)


Backup Targets

Tape Library
• Tapes are portable and can be used for long term offsite storage
• Must be stored in locations with a controlled environment
• Not optimized to recognize duplicate content
• Data integrity and recoverability are major issues with tape-based backup media

Disk Library
• Enhanced backup and recovery performance
• No inherent off-site capability
• Disk-based backup appliance includes features such as deduplication, compression, encryption, and replication to support business objectives

Virtual Tape Library
• Disks are emulated and presented as tapes to backup software
• Does not require any additional modules or changes in the legacy backup software
• Provides better performance and reliability over physical tape
• Does not require the usual maintenance tasks associated with a physical tape drive, such as periodic cleaning and drive calibration


Backup Granularity
• Full backup
• Incremental backup
• Cumulative backup
• Synthetic backup
• Incremental forever backup



Key Backup/Recovery Considerations
• Backup requires integration between backup applications and management server of virtualized
environment
• Application awareness
• Backup solutions should integrate with different types of business applications including third
platform applications
• Backup and recovery operations need to be automated
• Policy-based protection
• Need to support deduplication and WAN optimization techniques
• To optimize backup infrastructure and reduce cost
• To provide extended retention of backup copies



Key Backup/Recovery Considerations (Cont’d)
• Backup requirements may differ from one service to another based on RTO and
RPO
• Requires well-defined backup strategies to meet the requirements
• Supports on-demand recovery of data at file and VM level
• Backup solution needs to support secure multi-tenancy
• Centralized management of backup and recovery environment
• Should provide single management interface
• Should provide chargeback or show-back reporting for backup data



Agent-based Backup Approach
• Agent is running inside the application servers (physical/virtual)
• Performs file-level backup
• Impacts performance of applications running on compute systems
• Performing backup on multiple VMs on a compute system may consume more resources and lead to resource
contention
[Figure: backup agents (A) running on the application servers send data to the backup server/storage node and the backup device]


Image-based Backup Approach
• Creates a copy (snapshot) of the entire virtual disk and configuration data
associated with a particular VM
• Backup is saved as a single entity called a VM image
• Enables quick restoration of a VM
• Supports recovery at VM-level and file-level
• No agent is required inside the VM to perform backup
• Backup processing is offloaded from VMs to a proxy server
[Figure: a snapshot of the VM's VMDK files is created on the application servers and mounted on the proxy server, which sends the backup data to the backup device]


Image-based Backup Approach (Cont’d)
• Changed block tracking for backup
• Identifies and tags any blocks that have changed since the last VM snapshot
• Enables the backup application to backup only the blocks that have changed, rather
than backing up every block
• Changed block tracking for restore
• Determines which blocks have changed since the last backup and restores only the
changed VM blocks
• Reduces RTO
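Changed block tracking can be illustrated with a toy in-memory model. The `TrackedDisk` class below is entirely hypothetical and does not represent any hypervisor's or backup product's API; it only shows the idea of tagging blocks on write and copying just the tagged blocks at backup time.

```python
class TrackedDisk:
    """Toy virtual disk that remembers which blocks changed since the last backup."""

    def __init__(self, num_blocks: int) -> None:
        self.blocks = [b""] * num_blocks
        self.changed = set()  # indexes of blocks written since the last backup

    def write(self, index: int, data: bytes) -> None:
        self.blocks[index] = data
        self.changed.add(index)  # tag the block as changed

    def incremental_backup(self) -> dict:
        """Copy only the changed blocks, then reset the tracking information."""
        delta = {i: self.blocks[i] for i in sorted(self.changed)}
        self.changed.clear()
        return delta


disk = TrackedDisk(num_blocks=8)
for i in range(8):
    disk.write(i, f"v1-block-{i}".encode())
first = disk.incremental_backup()   # first backup copies every written block

disk.write(3, b"v2-block-3")        # only one block changes afterwards
second = disk.incremental_backup()  # second backup copies just that block
print(len(first), second)
```

The restore case works the same way in reverse: only the blocks that differ from the chosen backup need to be written back, which is what shortens the RTO.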



Image-based Backup Approach (Cont’d)
Recovery-in-place
A term that refers to running a VM directly from the backup device, using a backed up copy of
the VM image instead of restoring that image file.

• Eliminates the need to transfer the image from the backup device to the
primary storage before it is restarted
• Provides an almost instant recovery of a failed VM
• Requires a random access device in order to work efficiently
• Disk-based backup target
• Reduces the RTO and network bandwidth to restore VM files



NDMP-based Backup Approach
NDMP
An open standard TCP/IP-based protocol specifically designed for a backup in a NAS
environment.

• Data can be backed up using NDMP regardless of the OS or platform


• Backup data is sent directly from NAS to the backup device
• No longer necessary to transport data through application servers
• Backs up and restores data while preserving security attributes of file system
(NFS and CIFS) and maintains data integrity



NDMP-based Backup Approach (Cont’d)
Key Components of NDMP
• NDMP client
  • It is an NDMP enabled backup software installed as add-on software on backup server
  • Instructs the NAS head to start the backup
• NDMP server
  • NAS head acts as an NDMP server which performs backup and sends the data to backup device
  • The NAS head uses its data server to read the data from the storage
  • The NAS head then uses its media server to send data read by the data server to a backup device
• Only backup metadata is transferred over production LAN
• Backup data is transferred either directly to backup device (NDMP 2-way) or via a private network (NDMP 3-way)

[Figure: application servers, NDMP server running on the NAS head, NDMP client on the backup server, NAS device, and backup device, with separate backup data and backup metadata flows]


NDMP-based Backup Approach (Cont’d)
[Figure: NDMP 2-way backup and NDMP 3-way backup. In both, the NDMP client on the backup server exchanges backup metadata over the LAN with the NDMP server running on the NAS head. In the 2-way case the NAS head sends backup data to the backup device over the storage network. In the 3-way case NAS head (A) sends backup data over a private backup LAN to NAS head (B), which is attached to the backup device.]


Direct Primary Storage Backup Approach
• Backs up data directly from primary storage system to a backup storage without
the need of additional backup software
• Eliminates the backup impact on application servers
• Improves the backup and recovery performance to meet SLAs
[Figure: application servers running an agent (A), connected over the storage network to the primary storage system, which sends backup data over the storage network to the backup device]


Drivers for Data Deduplication
• Limited budget: capacity requirements are growing year over year, which increases storage cost
• Limited backup window: shorter backup windows due to the need for 24x7 service availability
• Network bandwidth constraint: data is distributed across remote locations for DR purpose, which requires huge network bandwidth
• Longer retention period: regulatory requirements demand to keep data for longer periods


Introduction to Data Deduplication
Data Deduplication
The process of detecting and identifying the unique data segments within a given set of data to eliminate redundancy.

• Deduplication process
  • Chunk the data set
  • Identify duplicate chunk
  • Eliminate the redundant chunk
• Deduplication could be performed in backup as well as in production environment
• Effectiveness of deduplication is expressed as a deduplication ratio

[Figure: before deduplication, total segments = 39; after deduplication, unique segments = 3]
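The three-step process (chunk, identify duplicates, eliminate them) can be sketched as follows. The fixed chunk size, the use of SHA-256 fingerprints, and the test data are all assumptions chosen for illustration; the data is built so that it contains 39 segments of which 3 are unique, matching the counts in the figure.

```python
import hashlib


def deduplicate(data: bytes, chunk_size: int):
    """Split data into fixed-size chunks and keep one copy of each unique chunk.

    Returns (store, recipe): store maps fingerprint -> chunk, and recipe is the
    ordered list of fingerprints needed to rebuild the original data.
    """
    store = {}
    recipe = []
    for offset in range(0, len(data), chunk_size):
        chunk = data[offset:offset + chunk_size]
        fingerprint = hashlib.sha256(chunk).hexdigest()
        store.setdefault(fingerprint, chunk)  # store only the first copy
        recipe.append(fingerprint)            # later copies become references
    return store, recipe


# Illustrative data set: the segments A, B, C repeated 13 times.
data = (b"AAAA" + b"BBBB" + b"CCCC") * 13
store, recipe = deduplicate(data, chunk_size=4)

ratio = len(recipe) / len(store)
restored = b"".join(store[fp] for fp in recipe)
print(f"total segments = {len(recipe)}, unique segments = {len(store)}, ratio = {ratio:.0f}:1")
```

The deduplication ratio here is total segments divided by unique segments; real data sets rarely repeat this cleanly, so achieved ratios depend on the factors discussed in this module.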



Factors Affecting Deduplication Ratio

Factor | Description
Retention period | The longer the data retention period, the greater the chance that identical data exists in the backup
Frequency of full backup | The more frequently full backups are conducted, the greater the advantage of deduplication
Change rate | The fewer the changes to the content between backups, the greater the efficiency of deduplication
Data type | The more unique the data, the less intrinsic duplication exists
Deduplication method | The highest amount of deduplication across an organization is discovered using variable-length, sub-file deduplication


Deduplication Granularity Level
• File-level deduplication
• Detects and removes redundant copies of identical files
• Only one copy of the file is stored; the subsequent copies are replaced with a pointer
to the original file
• Does not address the problem of duplicate content inside the files

• Sub-file level deduplication


• Breaks files down to smaller segments
• Detects redundant data within and across files
• Two methods:
• Fixed-length block
• Variable-length block



Deduplication Methods
Source-based Deduplication
• Data is deduplicated at the source (backup client)
• Backup client sends only new, unique segments across the network
• Reduced storage capacity and network bandwidth requirements
• Recommended for ROBO environment for taking centralized backup
• Cloud service providers can also implement this method when performing backup from consumer’s location to their location

[Figure: deduplication at source; a deduplication agent (A) on the application server (backup client) sends data to the deduplication server and backup device]


Deduplication Methods (Cont’d)
Target-based Deduplication
• Data is deduplicated at the target
  • Inline
  • Post-process
• Offloads the backup client from deduplication process
• Requires sufficient network bandwidth
• In some implementations, part of the deduplication load is moved to the backup server
  • Reduces the burden on the target
  • Improves the overall backup performance

[Figure: deduplication at target; application server (backup client), deduplication server, and backup device (deduplication appliance)]


Global Deduplication
• Single hash index is shared across multiple appliances (nodes)
• Ensures the data is backed up only once across the backup environment
• Deduplication is more effective – provides better deduplication ratio
• Creates smaller storage footprints and reduces storage costs
• Best suited for environment with large amount of backup data across multiple
locations



Data Deduplication in Primary Storage
• Eliminates redundant data blocks in primary storage
• All incoming data writes are chunked into blocks
• Each block is fingerprinted (hash value) based on the data content
• Each fingerprinted block is compared to the existing blocks before it is written to the storage system
• If the block already exists, the data block is not written to disk
• Otherwise, this unique data block is written to the disk
• Reduces the primary storage requirement and TCO
• Improves the effective utilization of storage
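
The write path described above can be sketched as follows. `DedupVolume` is a hypothetical class for illustration only. SHA-256 as the fingerprint, the in-memory dictionary standing in for disk, and the default block size are assumptions.

```python
import hashlib

class DedupVolume:
    """Hypothetical primary-storage volume with inline block deduplication."""
    def __init__(self, block_size: int = 4096):
        self.block_size = block_size
        self.blocks = {}      # fingerprint -> block data (stands in for disk)
        self.block_map = []   # logical block order -> fingerprint

    def write(self, data: bytes) -> int:
        """Chunk, fingerprint, and store data. Returns blocks physically written."""
        written = 0
        for i in range(0, len(data), self.block_size):
            block = data[i:i + self.block_size]
            fp = hashlib.sha256(block).hexdigest()
            if fp not in self.blocks:      # unique block: write it to "disk"
                self.blocks[fp] = block
                written += 1
            self.block_map.append(fp)      # duplicates only add a reference
        return written

    def read(self) -> bytes:
        """Rebuild the logical data by following the block map."""
        return b"".join(self.blocks[fp] for fp in self.block_map)

vol = DedupVolume(block_size=4)
print(vol.write(b"AAAABBBBAAAA"))  # three logical blocks, two physical writes
```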



Drivers for Cloud-based Backup
• Large CAPEX to procure backup infrastructure for large volume of data
• Continuous investment to meet the changing technology and backup requirements
• Deployment of a new backup solution takes weeks of planning, justification,
procurement, and setup
• Difficulty in meeting service level and compliance requirements
• Complexity in managing backup environment
• Limited IT resources for managing backup



Cloud-based Backup (Backup as a Service)
• Enables consumers to procure backup services on demand through a self-service portal
• Provides the capability to perform backup and recovery any time, from anywhere
• Reduces the backup management overhead
• Transformation from CAPEX to OPEX
• Pay-per-use/subscription-based pricing
• Enables organizations to meet long-term retention requirements
• Backing up to cloud ensures regular and automated backup of data
• Gives consumers the flexibility to select a backup technology based on their current requirements

[Figure: Backup clients send backup data to cloud resources and restore data from the cloud.]



Backup Service Deployment Options
Managed Backup Service
• Suitable when a cloud service provider already hosts consumer applications and data
• Backup service is offered by the provider to protect consumer’s data
• Backup is managed by the service provider

Remote Backup Service
• Service provider receives data from consumers
• Backup is managed by the service provider
[Figure: An agent running on the backup client at the consumer’s location sends backup data to the cloud.]

Replicated Backup Service
• Service provider only manages data replication and IT infrastructure at disaster recovery site
• Local backups are managed by consumer organization
[Figure: Backup is performed in the consumer’s location, and backup data is sent to the cloud for DR purposes.]



Mobile Device Backup

• Organization’s critical data resides not only within its data center but also on mobile devices
• It is important to back up data from these devices to the data center or cloud
• Requires installing a backup client application on the mobile devices
• Deduplication, compression, encryption, and incremental backup can be implemented for backup
• Provides network and backup storage optimization and security

[Figure: Mobile backup clients send backup data over the WAN to the enterprise data center or the cloud.]



Mobile Device Backup (Cont’d)
• Backing up remote mobile devices can be challenging due to:
• Backup client must support mobile device’s OS
• Data is backed up only when the mobile device is online
• Devices are not always connected to the corporate network, so backup happens over the Internet, which may give rise to security threats
• Backup is impacted due to intermittent network connectivity
• To overcome these challenges, organizations must adopt new policies,
strategies, and techniques to protect the data residing on mobile devices



Introduction to Data Archiving
Data Archiving
The process of identifying and moving inactive data out of current production systems into a low-cost storage tier for long-term retention and future reference.

• Data archive is a repository where fixed content is stored


• Organizations set their own policies for qualifying data to archive
• Archiving enables organizations
• To reduce on-going primary storage acquisition costs
• To meet regulatory compliance
• To reduce backup challenges including backup window by moving static data out of the recurring backup
stream process
• To make use of these data for generating new revenue strategies



Key Requirements for Data Archiving Solutions
• To provide automated policy-driven archiving
• To provide scalability, authenticity, immutability, availability, and security
• To support single instance storage and a variety of online storage options (disk and cloud-based storage)
• To provide rapid retrieval of archived data when required
• To handle a variety of electronic documents, including e-mail, instant messages, and files
• To provide features for indexing, searching, and reporting
• To support eDiscovery to enable legal investigations and litigation holds



Data Archiving Solution Architecture
• Key components
• Archiving agent: runs on application servers and is responsible for scanning the data that can be archived
• Archiving server (policy engine): where the policy is configured
• Archiving storage device: archived data can be stored on tape, disk, or cloud

[Figure: Archiving agents (A) on the e-mail server and file server connect over the network to the archiving server (policy engine), primary storage, and archive storage; clients access the application servers.]
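
The split between agent and policy engine can be sketched as follows. All names here (`FileRecord`, `ArchivePolicy`, `scan`) are hypothetical, and the single rule "archive anything not accessed for N days" is an assumed example policy, since the slides leave the policy definition to each organization.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta

@dataclass
class FileRecord:
    path: str
    last_accessed: datetime

@dataclass
class ArchivePolicy:
    """Hypothetical policy configured on the archiving server."""
    inactive_days: int

    def qualifies(self, record: FileRecord, now: datetime) -> bool:
        # Data counts as inactive once it has gone unaccessed long enough.
        return now - record.last_accessed >= timedelta(days=self.inactive_days)

def scan(records: list[FileRecord], policy: ArchivePolicy, now: datetime) -> list[str]:
    """Archiving agent step: list the files the policy marks as archivable."""
    return [r.path for r in records if policy.qualifies(r, now)]

# Illustrative sample data only.
records = [
    FileRecord("/reports/2017-q4.pdf", datetime(2018, 1, 1)),
    FileRecord("/reports/current.xlsx", datetime(2019, 3, 10)),
]
print(scan(records, ArchivePolicy(inactive_days=90), now=datetime(2019, 3, 16)))
```

Keeping the rule in the policy object means the agent code does not change when the organization revises what qualifies for archiving.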



Content Addressed Storage (CAS) – An Archival Solution
• Content addressed storage (CAS) is a special type of object-based storage,
purposely built for storing fixed content
• Provides online accessibility to archived data
• Each object is assigned a globally unique identifier, known as content address
(CA)
• CA is derived from the binary representation of the data
• CAS device can be accessed via the CAS API running on the application server
• Enables organizations to meet the required SLAs
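
A minimal sketch of content addressing, assuming SHA-256 as the function that derives the content address. The slides say only that the CA is derived from the binary representation of the data. `ContentAddressedStore` is a hypothetical in-memory class and does not represent any vendor's CAS API.

```python
import hashlib

class ContentAddressedStore:
    """Hypothetical CAS: objects are stored and retrieved by content address."""
    def __init__(self):
        self._objects = {}  # content address -> object bytes

    def put(self, content: bytes) -> str:
        """Store fixed content and return its content address (CA)."""
        ca = hashlib.sha256(content).hexdigest()
        self._objects.setdefault(ca, content)  # identical content is kept once
        return ca

    def get(self, ca: str) -> bytes:
        """Return the object, checking that it still matches its address."""
        content = self._objects[ca]
        if hashlib.sha256(content).hexdigest() != ca:
            raise ValueError("stored content no longer matches its content address")
        return content

    def object_count(self) -> int:
        return len(self._objects)

store = ContentAddressedStore()
first = store.put(b"patient record 0001")
second = store.put(b"patient record 0001")
print(first == second, store.object_count())  # same address, one stored object
```

Because the address is computed from the content, the application never needs to know where the object physically resides, and re-hashing on read gives a simple integrity check.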



Key Features of CAS
Feature                   Description

Data integrity            Provides assurance that the stored content has not been altered
Content authenticity      Assures the genuineness of stored content
Single instance storage   Uses a unique content address to guarantee the storage of only a single instance of an object
Retention enforcement     Configurable retention settings ensure content is not erased prior to the expiration of its defined retention period
Scalability               Allows the addition of more nodes to the cluster to scale without any interruption
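
Retention enforcement from the table above can be sketched as follows. `RetainedStore` and `RetentionError` are hypothetical names, and expressing the retention period in days is an assumption made for the example.

```python
from datetime import datetime, timedelta

class RetentionError(Exception):
    """Raised when a delete is attempted before the retention period ends."""

class RetainedStore:
    """Hypothetical store that refuses early deletion and overwrites."""
    def __init__(self):
        self._objects = {}  # object id -> (content, expiry time)

    def put(self, object_id: str, content: bytes,
            stored_at: datetime, retention_days: int) -> None:
        if object_id in self._objects:
            raise ValueError("fixed content cannot be overwritten")
        expires_at = stored_at + timedelta(days=retention_days)
        self._objects[object_id] = (content, expires_at)

    def delete(self, object_id: str, now: datetime) -> None:
        _, expires_at = self._objects[object_id]
        if now < expires_at:
            raise RetentionError(f"{object_id} is retained until {expires_at:%Y-%m-%d}")
        del self._objects[object_id]

    def __contains__(self, object_id: str) -> bool:
        return object_id in self._objects
```

A delete request before the expiry date raises `RetentionError`. The same request after the retention period has passed succeeds.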



Key Features of CAS (Cont’d)
Feature                   Description

Location independence     Physical location of the stored data is irrelevant to the application that requests the data
Data protection           Ensures content stored on the CAS system is available even if a disk or a node fails
Performance               Provides faster access to the objects compared to tapes and optical discs
Self-healing              Automatically detects and repairs corrupted objects
Audit trails              Keeps track of management activities and any access or disposition of data



Role of CAS in Healthcare Domain
[Figure: At the hospital, patient records are stored locally for short-term use (60 days). After 60 days, the application server moves the data through the API to the CAS system.]

• CAS facilitates secure long-term storage of patient records in compliance with regulations
• Provides immediate access to a patient’s record when needed



Use Case: E-mail Archiving
• Moves the e-mails from primary to archive storage, based on policy
• Saves space on primary storage
• Enables e-mails to be retained in the archive for a longer period to meet regulatory requirements
• Gives end users virtually unlimited mailbox space



Cloud-based Archiving
• Organizations prefer hybrid cloud options
- Archived data that may require high-speed access is retained internally (private cloud) while lower-priority archive data is moved to low-cost, public cloud-based archive storage
• No CAPEX, pay-as-you-go, faster deployment
• Reduced management overhead of IT
• Supports massive data growth and retention requirements

[Figure: In the data center, the e-mail server/file server, primary storage, and archiving server (policy engine) send archive data over the network and WAN to the cloud.]



Key Considerations for Cloud-based Archiving
Consideration    Description

SLA              Must reflect cost, availability, performance, retention and disposition policies, search and data access, penalty, data privacy, ownership, and compensation for data loss as parameters of the agreement
Vendor lock-in   Refers to a situation where a consumer is locked to a service provider due to the complexity or restrictions imposed by the provider
Compliance       Organization should assess compliance requirements and convey them to the service provider. Organization’s compliance requirements may include internal policies and legal requirements.
Data security    Various mechanisms (secure multi-tenancy, encryption, shredding, access and identity management) should be deployed by the service provider to ensure security for data stored in the cloud archive
Pricing          Consumers should consider various factors and decide which pricing model is best suited to their needs
