
Data Domain System

Administration
Student Guide

Education Services
May 2013

Table of Contents
EMC Data Domain System Administration Course Introduction............................................................ 1
Module 1: Technology Overview ........................................................................................................ 15
Module 2: Basic Administration ......................................................................................................... 69
Module 3: Managing Network Interfaces ......................................................................................... 131
Module 4: CIFS and NFS ................................................................................................................... 169
Module 5: File System and Data Management ................................................................................. 193
Module 6: Data Replication and Recovery ........................................................................................ 257
Module 7: Tape Library and VTL Concepts ........................................................................................ 303
Module 8: DD Boost ......................................................................................................................... 347
Module 9: Data Security................................................................................................................... 379
Module 10: Sizing, Capacity and Throughput Planning and Tuning ................................................... 417

Slide 1

DATA DOMAIN
SYSTEM ADMINISTRATION

Support Contact: Education Services

Copyright 2013 EMC Corporation. All Rights Reserved.

Welcome to Data Domain System Administration.


Copyright 1996, 2000, 2001, 2002, 2003, 2004, 2005, 2006, 2007, 2008, 2009, 2010, 2011, 2012, 2013
EMC Corporation. All Rights Reserved. EMC believes the information in this publication is accurate as of
its publication date. The information is subject to change without notice.
THE INFORMATION IN THIS PUBLICATION IS PROVIDED AS IS. EMC CORPORATION MAKES NO
REPRESENTATIONS OR WARRANTIES OF ANY KIND WITH RESPECT TO THE INFORMATION IN THIS
PUBLICATION, AND SPECIFICALLY DISCLAIMS IMPLIED WARRANTIES OF MERCHANTABILITY OR FITNESS
FOR A PARTICULAR PURPOSE.
Use, copying, and distribution of any EMC software described in this publication requires an applicable
software license.

EMC2, EMC, Data Domain, RSA, EMC Centera, EMC ControlCenter, EMC LifeLine, EMC OnCourse, EMC
Proven, EMC Snap, EMC SourceOne, EMC Storage Administrator, Acartus, Access Logix, AdvantEdge,
AlphaStor, ApplicationXtender, ArchiveXtender, Atmos, Authentica, Authentic Problems, Automated
Resource Manager, AutoStart, AutoSwap, AVALONidm, Avamar, Captiva, Catalog Solution, C-Clip,
Celerra, Celerra Replicator, Centera, CenterStage, CentraStar, ClaimPack, ClaimsEditor, CLARiiON,
ClientPak, Codebook Correlation Technology, Common Information Model, Configuration Intelligence,
Configuresoft, Connectrix, CopyCross, CopyPoint, Dantz, DatabaseXtender, Direct Matrix Architecture,
DiskXtender, DiskXtender 2000, Document Sciences, Documentum, eInput, E-Lab, EmailXaminer,
EmailXtender, Enginuity, eRoom, Event Explorer, FarPoint, FirstPass, FLARE, FormWare, Geosynchrony,
Global File Virtualization, Graphic Visualization, Greenplum, HighRoad, HomeBase, InfoMover,
Infoscape, Infra, InputAccel, InputAccel Express, Invista, Ionix, ISIS, Max Retriever, MediaStor,
MirrorView, Navisphere, NetWorker, nLayers, OnAlert, OpenScale, PixTools, Powerlink, PowerPath,
PowerSnap, QuickScan, Rainfinity, RepliCare, RepliStor, ResourcePak, Retrospect, RSA, the RSA logo,
SafeLine, SAN Advisor, SAN Copy, SAN Manager, Smarts, SnapImage, SnapSure, SnapView, SRDF,
StorageScope, SupportMate, SymmAPI, SymmEnabler, Symmetrix, Symmetrix DMX, Symmetrix VMAX,
TimeFinder, UltraFlex, UltraPoint, UltraScale, Unisphere, VMAX, Vblock, Viewlets, Virtual Matrix, Virtual
Matrix Architecture, Virtual Provisioning, VisualSAN, VisualSRM, Voyence, VPLEX, VSAM-Assist,
WebXtender, xPression, xPresso, YottaYotta, the EMC logo, and where information lives, are registered
trademarks or trademarks of EMC Corporation in the United States and other countries.
All other trademarks used herein are the property of their respective owners.
Copyright 2013 EMC Corporation. All rights reserved. Published in the USA.
Revision Date: 04/23/2013
Revision Number: MR-1CP-DDSADMIN.5.2.1.0

Slide 2

Class Introductions

Name
Company
Region
Role
Data Domain system experience


Slide 3

Classroom Etiquette

Do not use the following during lectures:
Cell phones/PDAs (set to vibrate if possible)
Laptops (must be closed during lecture)
If your cell phone rings, answer it as you step out of the classroom.
Food and drink are permitted in the classroom, but not in the lab.
Inform your instructor, and lab partner if applicable, if you will be absent from any classroom sessions. Excessive absences result in non-attendance status, and you will not receive credit for the course.


Slide 4

Course Overview

Description: This EMC Education Services course provides the knowledge and skills needed to manage a Data Domain system. This course provides lectures and hands-on learning.
Audience: This course is for any person who presently manages or plans to manage Data Domain systems.
Prerequisites: Prior to attending this course, you should have attended the EMC Data Domain Systems and Technology Introduction course.


Slide 5

Course Objectives

Upon completion of this course, you should be able to:


Describe deduplication
Describe Data Domain technologies, including Data Domain deduplication
Monitor a Data Domain system
Perform a Data Domain system initial setup
Identify and configure Data Domain data paths
Configure and manage Data Domain network interfaces



Slide 6

Course Objectives (Continued)

Upon completion of this course, you should be able to:

Access and copy data to a Data Domain system
Customize and manage a Data Domain deduplication file system
Describe and perform data replication and recovery
Describe and configure a VTL
Describe DD Boost
Perform a DD Boost backup
Describe capacity and throughput planning



Slide 7

Course Flow
(Slide graphic: the course flow chart, mapping each topic through conceptual, configuration, application, and monitoring stages.)
Foundation: Data Domain Introduction; Basic Administration; Managing Network Interfaces
CIFS and NFS: configure CIFS and NFS; CIFS and NFS management; monitor CIFS/NFS performance
File System and Data Management: data management operations; file system management; monitoring MTrees, space usage and consumption
Data Replication and Recovery: replication concepts, types and topologies; replication operations
Tape Library and VTL Concepts: configure Data Domain as a VTL; backup and restore using VTL; monitor VTL performance
DD Boost: configure Data Domain to use DD Boost; backup and restore using DD Boost; monitor DD Boost performance
Data Security: file system security setup; retention lock, file system lock, data sanitization and encryption
Sizing, Capacity and Throughput Planning: throughput monitoring and tuning

Slide 8

Agenda
Day 1
Modules:
1. Technology Overview
2. Basic Administration
Labs:
Lab 1.1: VDC Introduction and Data Domain Administration Interfaces
Lab 2.1: Initial Setup and Hardware Verification
Lab 2.2: Managing System Access
Lab 2.3: Monitoring a Data Domain System
Lab 2.4: Licensed Features

Slide 9

Agenda (Continued)
Day 2
Modules:
3. Managing Network Interfaces
4. CIFS and NFS
5. File System and Data Management
Labs:
Lab 3.1: Configuring Network Interfaces
Lab 3.2: Configuring Link Aggregation
Lab 3.3: Configuring Link Failover
Lab 4.1: Configuring CIFS on a Data Domain System
Lab 4.2: Configuring NFS on a Data Domain System
Lab 5.1: Configuring MTrees and Quotas
Lab 5.2: Configuring Snapshots
Lab 5.3: Configuring Fast Copy
Lab 5.4: Configuring File System Cleaning

Slide 10

Agenda (Continued)
Day 3
Modules:
6. Data Replication and Recovery
7. Tape Library and VTL Concepts
Labs:
Lab 6.1: Managing Replication
Lab 7.1: Setting Up VTL on a Data Domain System

Slide 11

Agenda (Continued)
Day 4
Modules:
8. Data Security
9. DD Boost
10. Sizing, Capacity and Throughput Planning and Tuning
Labs:
Lab 8.1: Configuring Retention Lock Compliance
Lab 8.2: Configuring Data Sanitization
Lab 9.1: Configuring DD Boost with EMC NetWorker
Lab 9.2: Configuring DD Boost with NetBackup

Slide 12

Course Materials
Bring these materials with you to class each day:
Student Guide
Lab Guide


You can use your student guide to follow the lecture; space is provided for you to take notes.
Use the lab guide to get step-by-step instructions to complete the labs.
Bring these materials with you to class each day.


Slide 1

Module 1: Technology Overview

Upon completion of this module, you should be able to:


Describe features of the Data Domain OS
Describe DD storage integration
Describe deduplication on a Data Domain system
Describe SISL and DIA
List the protocols used by a Data Domain system
Describe Data Domain shared file systems and their purpose
Describe Data Domain data paths
Access the primary Data Domain administrative interfaces


This module focuses on Data Domain core technologies. It includes the following lessons:
Data Domain Overview
Deduplication Basics
EMC Data Domain Stream-Informed Segment Layout (SISL) Scaling Architecture Overview
EMC Data Domain Data Invulnerability Architecture (DIA) Overview
EMC Data Domain File Systems Introduction
EMC Data Domain Protocols Overview
EMC Data Domain Data Paths Overview
EMC Data Domain Administration Interfaces
This module also includes a lab, which will enable you to test your knowledge.


Slide 2

Module 1: Technology Overview

Lesson 1: Data Domain Overview


This lesson covers the following topics:
What is a Data Domain system?
Hardware overview
Software overview


This lesson is an introduction to the EMC Data Domain appliance. The first topic answers the question:
What is a Data Domain system? Also covered in this lesson is an overview of some Data Domain OS
software features and a current hardware model overview.


Slide 3

What is a Data Domain System?

A Data Domain system is a storage system used for backup and archiving workloads that:
Performs high-speed deduplication to maximize storage efficiency
Ensures recoverability of data through integrated data integrity intelligence
Can replicate data automatically for disaster recovery
Easily integrates via Ethernet and Fibre Channel into existing backup infrastructures
Is safe and reliable
Provides continuous recovery verification, fault detection, and healing for end-to-end data integrity


EMC Data Domain storage systems are traditionally used for disk backup, archiving, and disaster
recovery. An EMC Data Domain system can also be used for online storage with additional features and
benefits.
A Data Domain system can connect to your network via Ethernet or Fibre Channel connections.
Data Domain systems use low-cost Serial Advanced Technology Attachment (SATA) disk drives and
implement a redundant array of independent disks (RAID) 6 in the software. RAID 6 is block-level
striping with double distributed parity.
Most Data Domain systems have a controller and multiple storage units.


Slide 4

Hardware Overview

EMC has several hardware offerings to meet a variety of environments, including:
Small enterprise data centers and remote offices
Midsized enterprise data centers
Enterprise data centers
Large enterprise data centers
EMC Data Domain Expansion Shelves
Visit the Data Domain Hardware page on http://www.emc.com/ for specific models and specifications.


To find specific models and specifications: http://www.emc.com/ > Products and Solutions > Backup and Recovery > EMC Data Domain > Hardware


Slide 5

Software Overview
The latest Data Domain Operating System (DD OS):
Supports leading backup, file archiving, and email archiving applications
Allows simultaneous use of VTL, CIFS, NFS, NDMP, and EMC Data Domain Boost
Provides inline write/read verification, continuous fault detection, and healing
Meets IT governance and regulatory compliance standards for archived data


The latest Data Domain Operating System (DD OS) has several features and benefits, including:
Support for leading backup, file archiving, and email archiving applications
Simultaneous use of VTL, CIFS, NFS, NDMP, and EMC Data Domain Boost
Inline write/read verification, continuous fault detection, and healing
Conformance with IT governance and regulatory compliance standards for archived data


Slide 6

Module 1: Technology Overview

Lesson 2: Deduplication Basics


This lesson covers the following topics:
Deduplication fundamentals
Fingerprints
File-Based, Fixed-Length and Variable-Length Deduplication
Post-Process and Inline Deduplication
Target- and Source-Based Deduplication
Data Domain Global and Local Compression


This lesson covers deduplication, which is an important technology that improves data storage by
providing extremely efficient data backups and archiving. This lesson also covers the different types of
deduplication (inline, post-process, file-based, block-based, fixed-length, and variable-length) and the
advantages of each type. The last topic in this lesson covers Data Domain deduplication and its
advantages.


Slide 7

Deduplication Fundamentals
Deduplication has the following characteristics:
It is performed at the sub-file, whole file, or backup job level
Redundant data is stored only once
Multiple instances point to the same copy
Deduplication performance is dependent on several factors:
Amount of data
Bandwidth
CPU
Disk speed
Memory
(Slide graphic: new data is divided into segments; each unique instance is stored once, and duplicates are replaced with smaller references to the stored segment.)


Deduplication is similar to data compression, but it looks for redundancy of large sequences of bytes.
Sequences of bytes identical to those previously encountered and stored are replaced with references
to the previously encountered data.
This is all hidden from users and applications. When the data is read, the original data is provided to the
application or user.
Deduplication performance is dependent on the amount of data, bandwidth, disk speed, CPU, and memory of the hosts and devices performing the deduplication.
When processing data, deduplication recognizes data that is identical to previously stored data. When it
encounters such data, deduplication creates a reference to the previously stored data, thus avoiding
storing duplicate data.


Slide 8

Fingerprints
How deduplication compresses data:
Deduplication typically uses hashing algorithms
Hashing algorithms yield a unique value based on data content
The unique value is called a hash, fingerprint, or checksum
The fingerprint is much smaller than the original data
Fingerprints are used to determine if data is new or duplicate

(Slide graphic: segments of a data stream are hashed to short fingerprint values such as 42, 37, and 89.)


Deduplication typically uses hashing algorithms.


Hashing algorithms yield a unique value based on the content of the data being hashed. This value is
called the hash or fingerprint, and is much smaller in size than the original data.
Different data contents yield different hashes; each hash can be checked against previously stored
hashes.
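To make the fingerprinting idea concrete, the following minimal Python sketch (illustrative only, not Data Domain code; the tiny segments and the choice of SHA-1 are assumptions) fingerprints segments and uses the fingerprints to decide whether data is new or duplicate:

import hashlib

store = {}  # fingerprint -> stored segment

def fingerprint(segment: bytes) -> str:
    # A fingerprint is a short, content-derived value; SHA-1 is shown for illustration.
    return hashlib.sha1(segment).hexdigest()

def write_segment(segment: bytes) -> str:
    fp = fingerprint(segment)
    if fp not in store:
        store[fp] = segment  # new data: store it once
    return fp                # duplicate data: only the small reference is kept

refs = [write_segment(s) for s in (b"PLAQ", b"UAPL", b"PLAQ")]
print(refs[0] == refs[2], len(store))  # True 2: the repeated segment is stored once

Because the fingerprint is much smaller than the segment it identifies, comparing fingerprints is far cheaper than comparing the data itself.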


Slide 9

File-Based Deduplication

Pros
Only one copy of file content is stored
Identical copies are replaced with a reference to the original

Cons
Any change to the file results in the whole file being stored again
It uses more disk space than other deduplication methods
(Slide graphic: original data and deduplicated data; after deduplication, identical files reference a single stored copy.)


In file-based deduplication, only the original instance of a file is stored. Future identical copies of the file
use a small reference to point to the original file content. File-based deduplication is sometimes called
single-instance storage (SIS).
In this example, eight files are being deduplicated. The blue files are identical, but each has its own copy
of the file content. The grey files also have their own copy of identical content. After deduplication there
are still eight files. The blue files point to the same content, which is stored only once on disk. This is
similar for the grey files. If each file is 20 megabytes, the file-based deduplication has reduced the
storage required from 160 megabytes to 40.
File-based deduplication enables storage savings. It can be combined with compression (a way to
transmit the same amount of data in fewer bits) for additional storage savings. It is popular in desktop
backups. It can be more effective for data restores. It doesn't need to re-assemble files. It can be included in backup software, so an organization doesn't have to depend on a vendor disk.


File-based deduplication results are often not as great as with other types of deduplication (such as block- and segment-based deduplication). The most important disadvantage is that a modified file does not deduplicate against previously backed-up versions of that file.
File-based deduplication stores an original version of a file and creates a digital signature for it (such as
SHA1, a standard for digital signatures). Future exact copy iterations of the file are pointed to the digital
signature rather than being stored.


Slide 10

Fixed-Length Deduplication

(Slide graphic: the stream P L A Q U A P L P L A Q divided into fixed-length segments with fingerprints 42, 56, and 42; the repeated segment with fingerprint 42 is stored only once.)


Fixed-length segment deduplication (also called block-based deduplication or fixed-segment


deduplication) is a technology that reduces data storage requirements by comparing incoming data
segments (also called fixed data blocks or data chunks) with previously stored data segments. It divides
data into segments of a single, fixed length (for example, 4 KB, 8 KB, or 12 KB).
Fixed-length segment deduplication reads data and divides it into fixed-size segments. These segments
are compared to other segments already processed and stored. If the segment is identical to a previous
segment, a pointer is used to point to that previous segment.
In this example, the data stream is divided into a fixed length of four units. Small pointers to the
common content are assembled in the correct order to represent the original data. Each unique data
element is stored only once.
For data that is identical (does not change), fixed-length segment deduplication reduces storage
requirements.


Slide 11

Fixed-Length Deduplication

(Slide graphic: adding one byte to the front of the stream shifts every fixed-length segment boundary; the fingerprints change from 42, 56, 42 to 68, 87, 30, 11, so four new segments must be stored.)


When data is altered the segments shift, causing more segments to be stored. For example, when you
add a slide to a Microsoft PowerPoint deck, all subsequent blocks in the file are rewritten and are likely
to be considered as different from those in the original file, so the deduplication effect is less significant.
Smaller blocks get better deduplication than large ones, but it takes more resources to deduplicate.
In backup applications, the backup stream consists of many files. The backup streams are rarely entirely
identical even when they are successive backups of the same file system. A single addition, deletion, or
change of any file changes the number of bytes in the new backup stream. Even if no file has changed,
adding a new file to the backup stream shifts the rest of the backup stream. Fixed-size segment deduplication then stores large numbers of new segments because the boundaries between the segments have shifted.
Many hardware and software deduplication products use fixed-length segments for deduplication.
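The boundary-shift problem is easy to demonstrate. In this Python sketch (the segment size and data are illustrative assumptions), adding a single byte to the front of a stream makes every fixed-length segment look new:

def fixed_segments(data: bytes, size: int = 4):
    # Segment boundaries are determined purely by position.
    return [data[i:i + size] for i in range(0, len(data), size)]

old = b"PLAQUAPLPLAQ"
new = b"A" + old  # one byte added at the front
old_set = set(fixed_segments(old))
changed = [s for s in fixed_segments(new) if s not in old_set]
print(changed)  # every boundary shifted, so all four segments are "new"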


Slide 12

Variable-Length Deduplication

(Slide graphic: the same stream divided into variable-length segments based on content; repeated content yields the repeated fingerprint 21, and each unique segment is stored only once.)

Variable-length segment deduplication evaluates data by examining its contents to look for the
boundary from one segment to the next. Variable-length segments are any number of bytes within a
range determined by the particular algorithm implemented.
Unlike fixed-length segment deduplication, variable-length segment deduplication uses the content of
the stream to divide the backup or data stream into segments based on the contents of the data stream.


Slide 13

Variable-Length Deduplication
(Slide graphic: after one byte is added to the front of the stream, content-defined boundaries realign; only one new segment, with fingerprint 24, must be stored.)


When you apply variable-length segmentation to a data sequence, deduplication uses variable data
segments when it looks at the data sequence. In this example, byte A is added to the beginning of the
data. Only one new segment needs to be stored, since the data defining boundaries between the
remaining data were not altered.
Eventually, variable-length segment deduplication finds the segments that have not changed and backs up fewer segments than fixed-size segment deduplication. Even for storing individual files, variable-length segments have an advantage. Many files are very similar to, but not identical to, other versions of the same file. Variable-length segments will isolate the changes, find more identical segments, and store
fewer segments than fixed-length deduplication.
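A toy content-defined chunker makes the realignment visible. This sketch is an assumption-laden illustration: production systems typically use a rolling hash (such as a Rabin fingerprint) over a sliding window, not the single-byte test used here.

def variable_segments(data: bytes):
    # Toy boundary rule: cut after any byte whose value is divisible by 4.
    # Real systems use a rolling hash over a window of bytes.
    segs, start = [], 0
    for i, b in enumerate(data):
        if b % 4 == 0:
            segs.append(data[start:i + 1])
            start = i + 1
    if start < len(data):
        segs.append(data[start:])
    return segs

old = b"PLAQUAPLPLAQ"
new = b"A" + old  # one byte added at the front
old_set = set(variable_segments(old))
changed = [s for s in variable_segments(new) if s not in old_set]
print(changed)  # only the first segment differs; the rest realign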


Slide 14

Post-Process Deduplication
In contrast to inline deduplication, post-process deduplication:
Should not interfere with the incoming backup data speed
Requires more I/O
Writes files first to disk in their entirety, then scans and
deduplicates them
(Slide graphic: a backup server sends data to the appliance; all incoming data is written to disk first and deduplicated afterward.)


With post-process deduplication, files are written to disk first, and then they are scanned and
compressed.
Post-process deduplication should never interfere with the incoming backup data speed.
Post-process deduplication requires more I/O. It writes new data to disk and then reads the new data
before it checks for duplicates. It requires an additional write to delete the duplicate data and another
write to update the hash table. If it can't determine whether a data segment is duplicate or new, it
requires another write (this happens about 5% of the time). It requires more disk space to:
initially capture the data.
store multiple pools of data.
provide adequate performance by distributing the data over a large number of drives.
Post-process deduplication is run as a separate processing task and could lengthen the time needed to
fully complete the backup.


In post-process deduplication, files are first written to disk in their entirety (they are buffered to a large
cache). After the files are written, the hard drive is scanned for duplicates and compressed. In other
words, with post-process deduplication, deduplication happens after the files are written to disk.
With post-process deduplication, a data segment enters the appliance (as part of a larger stream of data
from a backup), and it is written to disk in its entirety. Then a separate process (running asynchronously
and possibly from another appliance accessing the same disk) reads the block of data to determine if it is
a duplicate. If it is a duplicate, it is deleted and replaced with a pointer. If it is new, it is stored.
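A minimal sketch of this write-first, deduplicate-later flow (illustrative only, not the implementation of any product) makes the extra pass over the data explicit:

import hashlib

disk = []  # pass 1: every incoming segment lands on disk in its entirety

def ingest(segment: bytes) -> None:
    disk.append(segment)  # no deduplication check at write time

def post_process():
    # pass 2 (asynchronous): re-read the disk and replace duplicates with references
    seen, result = {}, []
    for seg in disk:
        fp = hashlib.sha1(seg).hexdigest()
        if fp in seen:
            result.append(("ref", fp))   # duplicate: delete the copy, keep a pointer
        else:
            seen[fp] = seg
            result.append(("data", fp))  # new: keep the stored segment
    return result

for s in (b"PLAQ", b"UAPL", b"PLAQ"):
    ingest(s)
print(post_process())  # the duplicate is found only after it was written once

Every segment here is written before it is checked, which is exactly why post-process deduplication needs more I/O and more staging space than an inline approach.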


Slide 15

Data Domain Inline Deduplication


(Slide graphic: incoming segments are compared in RAM against previously stored data before anything is written to disk; only unique segments are written to the deduplication disk.)


With Data Domain inline deduplication, incoming data is examined as soon as it arrives to determine if a
segment (or block, or chunk) is new or unique or a duplicate of a segment previously stored. Inline
deduplication occurs in RAM before the data is written to disk. Around 99% of data segments are
analyzed in RAM without disk access. A very small amount of data is not identified immediately as either
unique or redundant. That data is stored to disk and examined again later against the previously stored
data.
In some cases, an inline deduplication process will temporarily store a small amount of data on disk
before it is analyzed.
The process is shown in this slide, as follows:
Inbound segments are analyzed in RAM.
If a segment is redundant, a reference to the stored segment is created.
If a segment is unique, it is compressed and stored.


Inline deduplication requires less disk space than post-process deduplication. There is less
administration for an inline deduplication process, as the administrator does not need to define and
monitor the staging space.
Inline deduplication analyzes the data in RAM, and reduces disk seek times to determine if the new data
must be stored.


Slide 16

Source-Based vs. Target-Based Deduplication

Source-based deduplication:
Occurs near where data is created
Uses a host-resident agent that reduces data at the server source and sends just changed data over the network
Reduces the data stream prior to transmission, thereby reducing bandwidth constraints
Target-based deduplication:
Occurs near where the data is stored
Is controlled by a storage system, rather than a host
Provides an excellent fit for a virtual tape library (VTL) without substantial disruption to existing backup software infrastructure and processes
Works best for higher change-rate environments


When the deduplication occurs close to where data is created, it is often referred to as source-based
deduplication, whereas when it occurs near where the data is stored, it is commonly called target-based
deduplication.


Slide 17

How Data Domain Stores Data Efficiently

Global compression = deduplication


Identifies previously stored segments
Cannot be turned off

Local compression
Compresses segments before writing them to disk
Uses common, industry-standard algorithms (lz, gz, and gzfast)
Is similar to zipping a file to reduce the file size
Can be turned off


EMC Data Domain Global Compression is the EMC Data Domain trademarked name for global
compression, local compression, and deduplication.
Global compression equals deduplication. It identifies previously stored segments and cannot be turned
off.
Local compression compresses segments before writing them to disk. It uses common, industry-standard algorithms (for example, lz, gz, and gzfast). The default compression algorithm used by Data
Domain systems is lz.
Local compression is similar to zipping a file to reduce the file size. Zip is a file format used for data
compression and archiving. A zip file contains one or more files that have been compressed, to reduce
file size, or stored as is. The zip file format permits a number of compression algorithms. Local
compression can be turned off.
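The effect of local compression can be approximated with Python's zlib module. This is an analogy only: lz, gz, and gzfast are Data Domain's algorithms, while zlib's compression levels merely illustrate the speed-versus-size trade-off.

import zlib

segment = b"ABCD" * 256                    # a highly redundant 1 KiB segment
fast = zlib.compress(segment, level=1)     # fast, lighter compression (a gzfast-like trade-off)
best = zlib.compress(segment, level=9)     # slower, tighter compression (a gz-like trade-off)
print(len(segment), len(fast), len(best))  # both compressed sizes are far below 1024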


Slide 18

Module 1: Technology Overview

Lesson 3: Stream-Informed Segment Layout (SISL) Overview


This lesson covers the following topics:
SISL overview and definition
How SISL works


This lesson covers EMC Data Domain SISL Scaling Architecture.


EMC Data Domain SISL Scaling Architecture is also called:
Stream-Informed Segment Layout (SISL) scaling architecture
SISL scaling architecture
SISL architecture
SISL technology
SISL architecture helps to speed up Data Domain systems.
In this lesson, you will learn more about SISL architecture, its advantages, and how it works.


Slide 19

SISL Overview and Definition

Used to implement EMC Data Domain inline deduplication
Uses fingerprints and RAM to identify segments already on disk
Avoids excessive disk reads to check if a segment is on disk
99% of segments are processed without disk reads to check fingerprints
Scales with Data Domain systems using newer and faster CPUs and RAM
Increases the new-data processing throughput rate


SISL architecture provides fast and efficient deduplication:


99% of duplicate data segments are identified inline in RAM before they are stored to disk.
System throughput increases directly as CPU performance increases.
Reduces the disk footprint by minimizing disk access.


Slide 20

How SISL Works


1. Segment: data is sliced into segments
2. Fingerprint: each segment is given a fingerprint ID (segment ID)
3. Filter: fingerprint IDs are compared to fingerprints in cache
If a fingerprint ID is new, continue
If a fingerprint ID is a duplicate, create a reference, then discard the redundant segment
4. Compress: groups of new segments are compressed using a common technique (lz, gz, gzfast)
5. Write: segments (including fingerprints, metadata, and logs) are written to containers, and containers are written to disk
(Slide graphic: incoming data flows through segmenting, fingerprinting, and filtering into a container, and containers are written to disk.)

SISL does the following:


1. Segments.
The data is broken into variable-length segments.
2. Fingerprints.
Each segment is given a fingerprint, or hash, for identification.
3. Filters.
The summary vector and segment locality techniques identify 99% of the duplicate segments in
RAM, inline, before storing to disk.
4. Compresses.
New segments are compressed using common algorithms (lz by default).
5. Writes.
Writes data to containers, and containers are written to disk.
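A compact Python sketch of this five-step pipeline follows (illustrative assumptions: fixed-size toy segmentation instead of variable-length, a Python set standing in for the in-RAM summary vector, and zlib standing in for lz):

import hashlib, zlib

fingerprint_cache = set()  # step 3: the in-RAM filter
container = []             # step 5: the container log

def ingest(data: bytes, seg_size: int = 4) -> None:
    for i in range(0, len(data), seg_size):         # 1. segment
        seg = data[i:i + seg_size]
        fp = hashlib.sha1(seg).digest()             # 2. fingerprint
        if fp in fingerprint_cache:                 # 3. filter: duplicate found in RAM,
            continue                                #    no disk read needed
        fingerprint_cache.add(fp)
        container.append((fp, zlib.compress(seg)))  # 4. compress and 5. write

ingest(b"PLAQUAPLPLAQ")
print(len(container))  # 2 unique segments reach the container log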


Slide 21

Module 1: Technology Overview

Lesson 4: Data Invulnerability Architecture (DIA) Overview


This lesson covers the following topics:
DIA overview and definition
End-to-End Verification
Fault Avoidance and Containment
Fault Detection and Healing
File System Recovery


This lesson covers EMC Data Domain Data Invulnerability Architecture (DIA), which is an important EMC
Data Domain technology that provides safe and reliable storage.


Slide 22

DIA Overview and Definition

Provides safe and reliable storage


Fights data loss in four ways:
1. End-to-end verification
2. Fault avoidance and containment
3. Continuous fault detection and healing
4. File system recovery


Data Invulnerability Architecture (DIA) is an important EMC Data Domain technology that provides safe
and reliable storage.
The EMC Data Domain operating system (DD OS) is built for data protection. Its elements comprise an
architectural design whose goal is data invulnerability. Four technologies within the DIA fight data loss:
1. End-to-end verification
2. Fault avoidance and containment
3. Continuous fault detection and healing
4. File system recoverability
DIA helps to provide data integrity and recoverability and extremely resilient and protective disk
storage. This keeps data safe.


Slide 23

End-to-End Verification
1. Writes request from backup software.
2. Analyzes data for redundancy.
3. Stores new data segments.
4. Stores fingerprints.
5. Verifies, after backup I/O, that the Data Domain OS (DD OS) can read the data from disk and through the Data Domain file system.
6. Verifies that the checksum that is read back matches the checksum written to disk.
(Slide graphic: a checksum is generated as data enters the file system, carried through global compression, local compression, and RAID, and verified on read-back; file system metadata integrity, user data integrity, and stripe integrity are all verified.)


The end-to-end verification check verifies all file system data and metadata. The end-to-end verification
flow:
1. Writes request from backup software.
2. Analyzes data for redundancy.
3. Stores new data segments.
4. Stores fingerprints.
5. Verifies, after backup I/O, that the Data Domain OS (DD OS) can read the data from disk and
through the Data Domain file system.
6. Verifies that the checksum that is read back matches the checksum written to disk.
If something goes wrong, it is corrected through self-healing and the system alerts to back up again.
Since every component of a storage system can introduce errors, an end-to-end test is the simplest way
to ensure data integrity. End-to-end verification means reading data after it is written and comparing it
to what was sent to disk, proving that it is reachable through the file system to disk, and proving that
data is not corrupted.


When the DD OS receives a write request from backup software, it computes a strong checksum over the
constituent data. After analyzing the data for redundancy, it stores the new data segments and all of the
checksums. After the backup I/O completes and all data is synced to disk, the DD OS verifies that it
can read the entire file from the disk platter and through the Data Domain file system, and that the
checksums of the data read back match the checksums of the written data.
This ensures that the data on the disks is readable and correct and that the file system metadata
structures used to find the data are also readable and correct. This confirms that the data is correct and
recoverable from every level of the system. If there are problems anywhere, for example if a bit flips on
a disk drive, it is caught. Mostly, a problem is corrected through self-healing. If a problem can't be
corrected, it is reported immediately, and a backup is repeated while the data is still valid on the primary
store.
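The verify-after-write idea can be sketched in a few lines of Python (illustrative only; the real system verifies checksums at every layer, down to the RAID stripes):

import hashlib, os, tempfile

def write_and_verify(path: str, data: bytes) -> None:
    checksum = hashlib.sha256(data).hexdigest()  # checksum computed on receipt
    with open(path, "wb") as f:
        f.write(data)
        f.flush()
        os.fsync(f.fileno())                     # sync to disk before verifying
    with open(path, "rb") as f:                  # read back through the file system
        if hashlib.sha256(f.read()).hexdigest() != checksum:
            raise IOError("read-back checksum mismatch: repeat the backup")

with tempfile.TemporaryDirectory() as d:
    write_and_verify(os.path.join(d, "segment"), b"backup data")
    print("write verified")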


Slide 24

Fault Avoidance and Containment


The Data Domain logging file system has these important benefits:
1. New data never overwrites existing data
2. There are fewer complex data structures
3. It includes non-volatile RAM (NVRAM) for fast, safe restarts

(Slide graphic: the data container log; new data is appended after old data, which is never overwritten.)


Data Domain systems are equipped with a specialized log-structured file system that has important
benefits.
1. New data never overwrites existing data. (The system never puts existing data at risk.)
Traditional file systems often overwrite blocks when data changes, and then use the old block
address. The Data Domain file system writes only to new blocks. This isolates any incorrect
overwrite (a software bug problem) to only the newest backup data. Older versions remain safe.
As shown in this slide, the container log never overwrites or updates existing data. New data is
written to new containers. Old containers and references remain in place and safe even when
software bugs or hardware faults occur when new backups are stored.
2. There are fewer complex data structures.
In a traditional file system, there are many data structures (for example, free block bit maps and
reference counts) that support fast block updates. In a backup application, the workload is
primarily sequential writes of new data. Because a Data Domain system is simpler, it requires
fewer data structures to support it. As long as the Data Domain system can keep track of the
head of the log, new writes never overwrite old data. This design simplicity greatly reduces the
chances of software errors that could lead to data corruption.


3. The system includes non-volatile RAM (NVRAM) for fast, safe restarts.
The system includes a non-volatile RAM (NVRAM) write buffer into which it puts all data not yet
safely on disk. The file system leverages the security of this write buffer to implement a fast,
safe restart capability.
The file system includes many internal logic and data structure integrity checks. If a problem is
found by one of these checks, the file system restarts. The checks and restarts provide early
detection and recovery from the kinds of bugs that can corrupt data. As it restarts, the Data
Domain file system verifies the integrity of the data in the NVRAM buffer before applying it to
the file system and thus ensures that no data is lost due to a power outage.
For example, in a power outage, the old data could be lost and a recovery attempt could fail. For
this reason, Data Domain systems never update just one block in a stripe. Following the no-overwrite policy, all new writes go to new RAID stripes, and those new RAID stripes are written
in their entirety. The verification-after-write ensures that the new stripe is consistent (there are
no partial stripe writes). New writes don't put existing backups at risk.
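An append-only container log is simple to model. In this toy sketch (not the DD OS implementation), writes go only to the head of the log, so no later operation can damage an earlier container:

class ContainerLog:
    # Toy append-only log: new data never overwrites existing containers.
    def __init__(self):
        self.containers = []  # once written, a container is immutable

    def append(self, data: bytes) -> int:
        self.containers.append(data)  # writes go only to the head of the log
        return len(self.containers) - 1

    def read(self, container_id: int) -> bytes:
        return self.containers[container_id]

log = ContainerLog()
first = log.append(b"monday backup")
log.append(b"tuesday backup")  # a faulty later write cannot touch container 0
print(log.read(first))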


Slide 25

Fault Detection and Healing


Scrubbing rechecks formatted data blocks and corrects errors on the fly
(Slide graphic: the file system, global compression, local compression, and RAID layers; stripe coherence is checked and faults are repaired.)


Continuous fault detection and healing provide an extra level of protection within the Data Domain
operating system. The DD OS detects faults and recovers from them continuously. Continuous fault
detection and healing ensures successful data restore operations.
Here is the flow for continuous fault detection and healing:
1. The Data Domain system periodically rechecks the integrity of the RAID stripes and container
logs.
2. The Data Domain system uses RAID system redundancy to heal faults. RAID 6 is the foundation
for Data Domain systems continuous fault detection and healing. Its dual-parity architecture
offers advantages over conventional architectures, including RAID 1 (mirroring), RAID 3, RAID 4
or RAID 5 single-parity approaches.
RAID 6:
Protects against two disk failures.
Protects against disk read errors during reconstruction.
Protects against the operator pulling the wrong disk.
Guarantees RAID stripe consistency even during power failure without reliance on NVRAM
or an uninterruptable power supply (UPS).


Verifies data integrity and stripe coherency after writes.


By comparison, after a single disk fails in other RAID architectures, any further simultaneous
disk errors cause data loss. A system whose focus is data protection must include the extra
level of protection that RAID 6 provides.

3. During every read, data integrity is re-verified.


4. Any errors are healed as they are encountered.
To ensure that all data returned to the user during a restore is correct, the Data Domain file
system stores all of its on-disk data structures in formatted data blocks. These are self-identifying and covered by a strong checksum. On every read from disk, the system first verifies
that the block read from disk is the block expected. It then uses the checksum to verify the
integrity of the data. If any issue is found, it asks RAID 6 to use its extra level of redundancy to
correct the data error. Because the RAID stripes are never partially updated, their consistency is
ensured and thus so is the ability to heal an error when it is discovered.
Continuous error detection works well for data being read, but it does not address issues with
data that may be unread for weeks or months before being needed for a recovery. For this
reason, Data Domain systems actively re-verify the integrity of all data every week in an ongoing
background process. This scrub process finds and repairs defects on the disk before they can
become a problem.


Slide 26

File System Recovery


The file system can be recreated by scanning the logs and using the metadata stored with the data
Data is written in a self-describing format
(Slide graphic: containers in the data container log hold metadata together with data, so the file system can be rebuilt from the log.)


The EMC Data Domain Data Invulnerability Architecture (DIA) file system recovery is a feature that
reconstructs lost or corrupted file system metadata. It includes file system check tools.
If a Data Domain system does have a problem, DIA file system recovery ensures that the system is
brought back online quickly.
This slide shows DIA file system recovery:
Data is written in a self-describing format.
The file system can be recreated by scanning the logs and rebuilding it from metadata stored
with the data.
In a traditional file system, consistency is not checked. Data Domain systems check through initial
verification after each backup to ensure consistency for all new writes. The usable size of a traditional
file system is often limited by the time it takes to recover the file system in the event of some sort of
corruption.


Imagine running fsck on a traditional file system with more than 80 TB of data. The reason the checking
process can take so long is the file system needs to sort out the locations of the free blocks so new
writes do not accidentally overwrite existing data. Typically, this entails checking all references to
rebuild free block maps and reference counts. The more data in the system, the longer this takes.
In contrast, since the Data Domain file system never overwrites existing data and doesn't have block
maps and reference counts to rebuild, it has to verify only the location of the head of the log to safely
bring the system back online and restore critical data.
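Because each container is self-describing, rebuilding the index is a single scan of the log, as in this sketch (an illustrative data layout, not the actual on-disk format):

def rebuild_index(container_log):
    # Each container carries its own metadata, so a lost index can be
    # reconstructed in one pass over the log; there are no free-block
    # maps or reference counts to repair.
    index = {}
    for container_id, (metadata, _data) in enumerate(container_log):
        index[metadata] = container_id
    return index

log = [("file-a/seg-0", b"..."), ("file-a/seg-1", b"..."), ("file-b/seg-0", b"...")]
print(rebuild_index(log))  # {'file-a/seg-0': 0, 'file-a/seg-1': 1, 'file-b/seg-0': 2}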


Slide 27

Module 1: Technology Overview

Lesson 5: Data Domain File System Introduction


This lesson covers the following topics:
ddvar (Administrative files)
MTrees (File storage)


This lesson covers the Data Domain file system. The Data Domain file system includes:
ddvar (Administrative files)
MTrees (File Storage)


Slide 28

ddvar

Consists of administrative files
Stores:
Core files
Log files
Support upload bundles
Compressed core files
.rpm upgrade packages
Cannot be renamed or deleted
Does not provide access to all ddvar sub-directories
(Slide graphic: the /ddvar directory tree, with /log, /releases, /snmp, and /support sub-directories.)


Data Domain system administrative files are stored in /ddvar. This directory stores system core and
log files, generated support upload bundles, compressed core files, and .rpm upgrade packages.

The NFS directory is /ddvar


The CIFS share is \ddvar

The ddvar file structure keeps administrative files separate from storage files.
You cannot rename or delete /ddvar, nor can you access all of its sub-directories, such as the core subdirectory.


Slide 29

MTree Introduction

Is the destination directory for deduplicated data
Is the root directory for deduplicated data
Lets you configure directory export levels to separate and organize backup files
Lets you manage each MTree directory separately (for example, different compression rates)
(Slide graphic: the /data/col1 directory tree, with /backup plus user-created MTrees such as /HR, /sales, and /support.)


The MTree (Managed Tree) file structure is the destination for deduplicated data. It is also the root
directory for deduplicated data. It comes pre-configured for NFS export as /backup. You configure
directory export levels to separate and organize backup files in the MTree file system.
The MTree file structure:
Uses compression.
Implements data integrity.
Reclaims storage space with file-system cleaning. You will learn more about file-system cleaning
later in this course.
MTrees provide more granular space management and reporting. This allows for finer management of
replication, snapshots, and retention locking. These operations can be performed on a specific MTree
rather than on the entire file system. For example, you can configure directory export levels to separate
and organize backup files.


Although a Data Domain system supports a maximum of 100 MTrees, system performance might
degrade rapidly if more than 14 MTrees are actively engaged in read or write streams. The degree of
degradation depends on overall I/O intensity and other file-system loads. For optimum performance,
you should limit the number of simultaneously active MTrees to a maximum of 14. Whenever
possible, it is best to aggregate operations on the same MTree into a single operation.
You can add subdirectories to MTree directories. You cannot add anything to the /data directory. You
can change only the col1 subdirectory. The backup MTree (/data/col1/backup) cannot be
deleted or renamed. If MTrees are added, they can be renamed and deleted. You can replicate
directories under /backup.


Slide 30

Module 1: Technology Overview

Lesson 6: Data Domain Protocols Introduction


This lesson covers the following topics:
NFS
CIFS
VTL
DD Boost
NDMP


This lesson covers Data Domain protocols, which include:


NFS
CIFS
VTL
DD Boost
NDMP


Slide 31

Data Domain System Protocols

NFS: This protocol allows Network File System (NFS) clients access to Data Domain system directories and MTrees
CIFS: This protocol allows Common Internet File System (CIFS) clients access to Data Domain system directories and MTrees
VTL: The virtual tape library (VTL) protocol enables backup applications to connect to and manage Data Domain system storage as if it were a tape library
DD Boost: The DD Boost protocol enables backup servers to communicate with storage systems without the need for Data Domain systems to emulate tape
NDMP: If the VTL communication between a backup server and a Data Domain system is through NDMP (Network Data Management Protocol), no Fibre Channel (FC) is required

Five protocols can be used to connect to a Data Domain appliance:


NFS
This protocol allows Network File System (NFS) clients access to Data Domain system directories
and MTrees.
CIFS
This protocol allows Common Internet File System (CIFS) clients access to Data Domain system
directories and MTrees.
VTL
The virtual tape library (VTL) protocol enables backup applications to connect to and manage
Data Domain system storage as if it were a tape library. All of the functionality generally
supported by a physical tape library is available with a Data Domain system configured as a VTL.
The movement of data from a system configured as a VTL to a physical tape library is managed
by backup software (not by the Data Domain system). The VTL protocol is used with Fibre
Channel (FC) networking.


DD Boost
The DD Boost protocol enables backup servers to communicate with storage systems without
the need for Data Domain systems to emulate tape. There are two components to DD Boost:
one component that runs on the backup server and another component that runs on a Data
Domain system.
NDMP
If the VTL communication between a backup server and a Data Domain system is through NDMP
(Network Data Management Protocol), no Fibre Channel (FC) is required. When you use NDMP,
all initiator and port functionality does not apply.


Slide 32

Module 1: Technology Overview

Lesson 7: Data Domain Data Paths Overview

This lesson covers the following topics:
Data Domain systems in typical backup environments
Data Path over Ethernet
Data Path over Fibre Channel VTL


This lesson covers Data Domain data paths, which include NFS, CIFS, DD Boost, NDMP, and VTL over
Ethernet or Fibre Channel.
This lesson also covers where a Data Domain system fits into a typical backup environment.


Slide 33

Typical Backup Environments


(Slide graphic: Solaris, Oracle, Linux, Windows, SQL, Exchange, and application servers on a gigabit Ethernet production LAN back up through a backup server to a Data Domain system, with copy to tape as required; WAN-based replication sends data to a second backup server and Data Domain system at an offsite disaster recovery location.)


Data Domain systems connect to backup servers as storage capacity to hold large collections of backup
data. This slide shows how a Data Domain system integrates non-intrusively into an existing storage
environment. Often a Data Domain system is connected directly to a backup server. The backup data
flow from the clients is simply redirected to the Data Domain device instead of to a tape library.
Data Domain systems integrate non-intrusively into typical backup environments and reduce the
amount of storage needed to back up large amounts of data by performing deduplication and
compression on data before writing it to disk. The data footprint is reduced, making it possible for tapes
to be partially or completely replaced.
Depending on an organization's policies, a tape library can be either removed or retained.
An organization can replicate and vault duplicate copies of data when two Data Domain systems have
the Data Domain Replicator software option enabled.


One option (not shown) is that data can be replicated locally with the copies stored onsite. The smaller
data footprint after deduplication also makes WAN vaulting feasible. As shown in the slide, replicas can
be sent over the WAN to an offsite disaster recovery (DR) location.
WAN vaulting can replace the process of rotating tapes from the library and sending the tapes to a vault
by truck.
If an organization's policies dictate that tape must still be made for long-term archival retention, data
can flow from the Data Domain system back to the server and then to a tape library.
Often the Data Domain system is connected directly to the backup server. The backup data flow is
redirected from the clients to the Data Domain system instead of to tape. If tape needs to be made for
long-term archival retention, data flows from the Data Domain system back to the server and then to
tape, completing the same flow that the backup server was doing initially. Tapes come out in the same
standard backup software formats as before and can go off-site for long-term retention. If a tape must
be retrieved, it goes back into the tape library, and the data flows back through the backup software to
the client that needs it.


Slide 34

Data Path over Ethernet


(Slide graphic: backup/archive media servers send data over Ethernet via DD Boost, NFS/CIFS, or FTP/NDMP using TCP(UDP)/IP to a Data Domain system, where deduplicated data is written to the file system; deduplicated replication flows over the WAN to a second Data Domain system.)


A data path is the path that data travels from the backup (or archive) servers to a Data Domain system.
Ethernet supports the NFS, CIFS, FTP, NDMP, and DD Boost protocols that a Data Domain system uses to
move data.
In the data path over Ethernet (a family of computer networking technologies), backup and archive
servers send data from clients to Data Domain systems on the network via the TCP(UDP)/IP (a set of
communication protocols for the internet and other networks).
You can also use a direct connection between a dedicated port on the backup or archive server and a
dedicated port on the Data Domain system. The connection between the backup (or archive) server and
the Data Domain system can be Ethernet or Fibre Channel, or both if needed. This slide shows the
Ethernet connection.


When Data Domain Replicator is licensed on two Data Domain systems, replication is enabled between
the two systems. The Data Domain systems can be either local, for local retention, or remote, for
disaster recovery. Data in flight over the WAN can be secured using VPN. Physical separation of the
replication traffic from backup traffic can be achieved by using two separate Ethernet interfaces on a
Data Domain system. This allows backups and replication to run simultaneously without network
conflicts. Since the Data Domain OS is based on Linux, it needs additional software to work with CIFS.
Samba software enables CIFS to work with the Data Domain OS.


Slide 35

Data Path over Fibre Channel VTL


(Slide graphic: backup/archive media servers address tape devices (/dev/rmt, \\.\Tape#) over a Fibre Channel SAN to a Data Domain system configured as a VTL, where deduplicated data is written to the file system; deduplicated replication flows over the WAN via TCP(UDP)/IP to a second Data Domain system.)


A data path is the path that data travels from the backup (or archive) servers to a Data Domain system.
Fibre Channel supports the VTL protocols that a Data Domain system uses to move data.
If the Data Domain virtual tape library (VTL) option is licensed, and a VTL FC HBA is installed on the Data
Domain system, the system can be connected to a Fibre Channel storage area network (SAN). The
backup or archive server sees the Data Domain system as one or multiple VTLs with up to 512 virtual
linear tape-open (LTO)-1, LTO-2, or LTO-3 tape drives and 20,000 virtual slots across up to 100,000
virtual cartridges.


Slide 36

Module 1: Technology Overview

Lesson 8: Data Domain Administration Interfaces


This lesson covers the following topics:
Enterprise Manager
Command Line Interface (CLI)


This lesson covers Data Domain administration interfaces, which include:


The Enterprise Manager, which is the graphical user interface (GUI)
The command line interface (CLI)


Slide 37

Enterprise Manager
https://<DDHostName>/ddem
You need the sysadmin password to add a Data Domain system
Enterprise Manager summary screen:
Cumulative information for monitored systems
Select a machine to view detailed information for that machine


With the Enterprise Manager, you can manage one or more Data Domain systems. You can monitor and
add systems from the Enterprise Manager. (To add a system you need the sysadmin password.) You can
also view cumulative information about the systems you're monitoring.
A Data Domain system should be added to, and managed by, only one Enterprise Manager.
You can access the Enterprise Manager from many browsers:
Microsoft Internet Explorer
Google Chrome
Mozilla Firefox
The Summary screen presents a status overview of, and cumulative information for, all managed
systems in the DD Network devices list and summarizes key operating information. The System Status,
Space Usage, and Systems panes provide key factors to help you recognize problems immediately and to
allow you to drill down to the system exhibiting the problem.


The tally of alerts and the charts of disk space that the Enterprise Manager presents enable you to quickly
spot problems.
Click the plus sign (+) next to the DD Network icon in the sidebar to expose the systems being managed
by the Enterprise Manager.
The Enterprise Manager includes tabs to help you navigate your way through administrative tasks. To
access the top- and sub-level tabs, shown in this slide, you must first select a system. In the lower pane
on the screen, you can view information about the system you selected. In this slide, a system has been
selected, and you can view details about it.


Slide 38

Command Line Interface (CLI)

Access the CLI via SSH, serial console, telnet, Serial Over LAN (SOL), or keyboard and monitor
1. Keyboard
2. Video Port
3. Serial Port
4. eth0a
5. eth0b


The EMC Data Domain command line interface (CLI) enables you to manage Data Domain systems.
You can do everything from the CLI that you can do from the Enterprise Manager.
After the initial configuration, use the SSH or Telnet (if enabled) utilities to access the system remotely
and open the CLI.
The DD OS 5.2 Command Reference Guide provides information for using the commands to accomplish
specific administration tasks. Each command also has an online help page that gives the complete
command syntax. Help pages are available at the CLI using the help command. Any Data Domain system
command that accepts a list (such as a list of IP addresses) accepts entries separated by commas, by
spaces, or both.
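For example, once SSH access is enabled, you can log in from an administration host and browse the built-in help pages (a minimal sketch; the hostname dd01 is hypothetical):
ssh sysadmin@dd01
# help system
Displays the help page, including the complete command syntax, for the system commands.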


Slide 39

Lab 1.1: Lab Environment Setup and Administration Interfaces


Slide 40

Module 1 Summary: What is a Data Domain System?

Not just a backup appliance and not just online storage
Ethernet and Fibre Channel connections
Simultaneous NDMP, VTL, CIFS, NFS, and DD Boost protocols
Safe and reliable: Data Invulnerability Architecture (DIA)
Deduplicating hardware system: inline deduplication with variable-length segments
Easy to integrate: qualified with leading enterprise backup and archiving applications; integrates easily into existing storage infrastructures

EMC Data Domain storage systems are traditionally used for disk backup, archiving, and disaster
recovery with high-speed, inline deduplication. An EMC Data Domain system can also be used for online
storage with additional features and benefits.
A Data Domain system can connect to your network via Ethernet or Fibre Channel connections. With an
Ethernet connection, the system can be accessed using the NDMP, DD Boost, CIFS and NFS protocols.
The Fibre Channel connection supports the VTL protocol.
EMC Data Domain implements deduplication in purpose-built hardware. Most Data Domain systems
have a controller and multiple storage units.
Data Domain systems use low-cost Serial Advanced Technology Attachment (SATA) disk drives and
implement a redundant array of independent disks (RAID) 6 in the software. RAID 6 is block-level
striping with double distributed parity. Data Domain systems use non-volatile random access memory
(NVRAM) to protect unwritten data. NVRAM is used to hold data not yet written to disk. Holding data
like this ensures that data is not lost in a power outage.


Slide 41

Module 1: Summary

Deduplication is a technology that improves data storage


EMC Data Domain deduplication is performed inline on bytes,
not files
SISL gives Data Domain systems speed
DIA provides safe and reliable storage
DIA fights data loss in four ways:
End-to-end verification
Fault avoidance and containment
Continuous fault detection and healing
File system recovery


Slide 1

Module 2: Basic Administration

Upon completion of this module, you should be able to:

Perform the initial setup of a Data Domain system


Create local users on a Data Domain system
Verify hardware on a Data Domain system
Find key log files using the Enterprise Manager
List the optional licensed features available on a Data Domain
system


This module covers basic administrative tasks on a Data Domain system. It includes the following
lessons:
Verifying Hardware
Managing System Access
Introduction to Monitoring a Data Domain System
Licensed Features
Upgrading a Data Domain System


Slide 2

Module 2: Basic Administration

Lesson 1: Verifying Hardware


This lesson covers the following topics:
Verifying System Information
Verifying Storage Status
Viewing Active Tier, Usable Enclosures, and
Failed/Foreign/Absent Disks
Viewing Chassis Status


As part of initially setting up your Data Domain system, you should verify that your hardware is installed
and configured correctly. This lesson covers verifying your hardware.


Slide 3

Launch Configuration Wizard


1. Click Maintenance
2. Click More Tasks
3. Select Launch Configuration Wizard
4. Follow all steps of the Configuration Wizard


The initial configuration of the Data Domain system will most likely be done using the Enterprise
Manager (EM) Configuration Wizard. The Enterprise Manager Configuration Wizard provides a graphical
user interface (GUI) that includes configuration options. After a network connection is configured (with
the CLI-based Configuration Wizard), you can use the Enterprise Manager Configuration Wizard to
modify or add configuration data. The Configuration Wizard performs an initial configuration: it does
not cover all configuration options, configuring only what is needed for the most basic system setup. After
the initial configuration, you can use the Enterprise Manager or CLI commands to change or update the
configuration.
The Configuration Wizard consists of these sections: Licenses, Network, File system, System, CIFS, and
NFS. You can configure or skip any section. After completing the Configuration Wizard, reboot the Data
Domain system. Note: The file system configuration is not described here. Default values are acceptable
to most sites.
The Configuration Wizard enables you to quickly step through basic configuration options without
having to use CLI commands.


To launch the Configuration Wizard:


1. From the Enterprise Manager, click Maintenance.
2. Click the More Tasks menu.
3. Double-click Launch Configuration Wizard.
4. Follow the Configuration Wizard prompts.
You must follow the configuration prompts; you can't select an item to configure from the
left navigation pane. You are prompted to submit your configuration changes as you move
through the wizard. You can also quit the wizard during your configuration.
You can also use the config setup command on a single node or in a GDA to change configuration
settings for the system, network, file system, CIFS, NFS, and licenses.
# config setup
Use this command on a single node or in a GDA to change configuration settings for the system,
network, file system, CIFS, NFS, and licenses. Press Enter to cycle through the selections. You will
be prompted to confirm any changes. Choices include Save, Cancel, and Retry.
Note: This command option is unavailable on systems using Retention Lock Compliance. Use
Enterprise Manager to change configuration settings.
Consult the DD OS 5.2 Command Reference Guide for more information on using the commands
referenced in this student guide.


Slide 4

Verifying System Information


1. Click Maintenance
2. Verify the model number, DD OS version, system uptime, and serial number


After your Data Domain system is installed, you should verify that you have the correct model number,
DD OS version, and serial number to ensure that they match what you ordered.
The System page in the Enterprise Manager gives you important system information without requiring
you to enter multiple commands.
To verify your model number, system uptime, and serial number in the Enterprise Manager:
1. Click the Maintenance tab.
2. Verify the model number, DD OS version, system uptime, and serial number.
You can also use the system show command using the command line interface (CLI) to view system
options.
# system show all
Show all system information.
# system show modelno
Display the hardware model number of a Data Domain system.


# system show serialno


Display the system serial number.
# system show uptime
Display the file system uptime, the time since the last reboot, the number of users, and the
average load.
# system show version
Display the Data Domain OS version and build identification number.
Consult the DD OS 5.2 Command Reference Guide for more information on using the commands
referenced in this student guide.
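The same information can be gathered remotely by passing a command to SSH (a minimal sketch, assuming SSH administrative access is enabled; the hostname dd01 is hypothetical):
ssh sysadmin@dd01 system show version
ssh sysadmin@dd01 system show serialno
Each invocation runs the single command, prints its output, and exits, which is convenient for inventory scripts.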


Slide 5

Verifying Storage Status


1. Click Hardware
2. Click Storage
3. Storage Status:
Green: All disks in the system are in good condition
Yellow: The system is operational, but there are problems that need to be corrected
Red: The system is not operational


After your Data Domain system is installed, you should verify that your storage is operational.
The Storage Status area of the page shows the current status of the storage (such as operational or non-operational) and any active alerts (these can be clicked to view alert details). No active alerts are
shown in this slide.
The status of a storage system can be:
Normal: System operational (green). All disks in the system are in good condition.
Warning: System operational (yellow). The system is operational, but there are problems that
need to be corrected. Warnings may result from a degraded RAID group, the presence of foreign
storage, or failed or absent disks.
Error: System non-operational (red). The system is not operational.


The Storage view provides a way of organizing the Data Domain system storage so disks can be viewed
by usage type (Active, Archive, Failed, and so on), operational status, and location. This includes internal
system storage and systems configured with external disk shelves. The status and inventory are shown
for all enclosures, disks, and RAID groups. The system is automatically scanned and inventoried so all
storage is shown in the Storage view.
1. Click the Hardware tab.
2. Click the Storage tab.
3. Verify the storage status.
From the command line, you can use the storage show command to display information about file
system storage.
# storage show {all | summary | tier {active | archive}}
Display information about file system storage. All users may run this command option.
Output includes the number of disk groups working normally and the number of degraded disk
groups. Details on disk groups undergoing, or queued for, reconstruction, are also shown when
applicable. The abbreviation N/A in the column Shelf Capacity License Needed indicates the
enclosure does not require a capacity license, or that part of the enclosure is within a tier and
the required capacity license for the entire enclosure has been accounted for.
Consult the DD OS 5.2 Command Reference Guide for more information on using the commands
referenced in this student guide.


Slide 6

Viewing Active Tier, Usable Disks, Failed/Foreign/Absent and System Disks
1. Click Hardware
2. Click Storage
3. Click Overview
4. Expand Active Tier, Usable Disks, Failed/Foreign/Absent or System Disks to view details


After your Data Domain system is installed, you should verify that your storage is operational and your
disk group status is normal. Also check the status of any disks not in use.
The Storage view provides a way of organizing the Data Domain system storage so disks can be viewed
by usage type (Active, Archive, Failed, and so on), operational status, and location. This includes internal
system storage and systems configured with external disk shelves. The status and inventory are shown
for all enclosures, disks, and RAID groups. The system is automatically scanned and inventoried so all
storage is shown in the Storage view.
To view information about the Active Tier, Usable Disks, or Failed/Foreign/Absent disks, do the
following:
1. Click the Hardware tab.
2. Click the Storage tab.
3. Click the Overview tab.
4. Click Active Tier, Usable Disks, Failed/Foreign/Absent or System Disks to view details.


You can also use the command line interface (CLI) to display state information about all disks in an
enclosure (a Data Domain system or an attached expansion shelf), or LUNs in a Data Domain gateway
system using storage area network (SAN) storage using the disk show state command.
# disk show state
Display state information about all disks in an enclosure (a Data Domain system or an attached
expansion shelf), or LUNs in a Data Domain gateway system using storage area network (SAN)
storage.
Columns in the output display the disk state for each slot number by enclosure ID, the total
number of disks by disk state, and the total number of disks.
If a RAID disk group reconstruction is underway, columns for the disk identifier, progress, and
time remaining are also shown.
Consult the DD OS 5.2 Command Reference Guide for more information on using the commands
referenced in this student guide.


Slide 7

Viewing the Active Tier


1. Click Hardware
2. Click Storage
3. Click Overview
4. Expand Active Tier


Disks in the active tier are currently marked as usable by the Data Domain file system.
Sections are organized by disks in use and disks not in use. If the optional archive feature is installed,
you can expand your view of the disk use in the active tier from the Storage Status Overview pane. You
can view both disks in use and disks not in use. In this example:
Disk Group: dg1
Status: Normal
Disk Reconstructing: N/A
Total Disks: 14
Disks: 3.1-3.14
You can also click the View Disks link to view individual disks.


Slide 8

Locating a Disk
1. Click Hardware
2. Click Storage
3. Click Disks
4. Select a disk and click Beacon to locate a disk
5. The Beaconing Disk dialog opens. Click Stop to close.


The Disks view lists all the system disks in a scrollable table with the following information.
Disk: The disk identifier. It can be:
The enclosure and disk number (in the form Enclosure.Slot).
A gateway disk (devn).
A LUN.
Status: The status of the disk (for example, In Use or Spare).
Manufacturer/Model: The manufacturer's model designation. The display may include a model
ID, RAID type, or other information, depending on the vendor string sent by the storage array.
Firmware: The firmware level used by the third-party physical disk storage controller.
Serial Number: The manufacturer's serial number for the disk.
The Disks tab enables you to see the status of all disks and details on individual disks.
Use the radio buttons to select how the disks are viewed: by all disks, or by tier, or by disk group.


To locate (beacon) a disk (for example, when a failed disk needs to be replaced):
1. Click Hardware > Storage > Disks. The Disks view appears.
2. Select a disk from the Disks table and click Beacon. The Beaconing Disk dialog window
appears, and the LED light on the disk begins flashing.
3. Click Stop to stop the LED from beaconing.
From the command line, you can use the disk show command to display list of serial numbers of failed
disks in the Data Domain system. The disk beacon command will cause the LED that signals normal
operation to flash on the target disk.
# disk show failure-history
Display a list of serial numbers of failed disks in the Data Domain system.
# disk beacon <enclosure-id>.<disk-id>
Cause the LED that signals normal operation to flash on the target disk. Press Ctrl-C to stop the
flash. To check all disks in an enclosure, use the enclosure beacon command option.
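For example, to flash the LED on the disk in slot 4 of enclosure 3 (the enclosure and slot numbers here are illustrative):
# disk beacon 3.4
Press Ctrl-C once the disk has been physically located.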
Consult the DD OS 5.2 Command Reference Guide for more information on using the commands
referenced in this student guide.


Slide 9

Viewing Usable Enclosures


1. Click Hardware
2. Click Storage
3. Click Overview
4. Expand Usable Enclosures


Usable enclosures are those that aren't incorporated into the file system yet.
The Usable Enclosures section enables you to view the usable disks within the expansion shelves on a
Data Domain system. You can also view the details of individual disks.
To view details about usable disks from the Enterprise Manager:
1. Select a system from the left navigation pane.
2. Click the Hardware > Storage > Overview tabs.
3. Expand Usable Enclosures and view the status, which includes the disk:
Name
Status
Size
Manufacturer/model
Firmware
Serial number


From the command line, the disk show hardware command will display disk hardware information.
# disk show hardware
Display disk hardware information.
Consult the DD OS 5.2 Command Reference Guide for more information on using the commands
referenced in this student guide.


Slide 10

Viewing Failed/Foreign/Absent Disks


1. Click Hardware
2. Click Storage
3. Click Overview
4. Expand Failed/Foreign/Absent Disks


If there are any unusable disks, whether failed, foreign, or absent, they are displayed in this section.
Failed: The number of failed disks.
Foreign: The number of foreign disks. The foreign state indicates that the disk contains valid
Data Domain file system data and alerts the administrator to the presence of this data to make
sure it is attended to properly. This commonly happens during chassis swaps, or when new shelves
are added to an active system.
Absent: The number of absent disks.
The Failed/Foreign/Absent Disks section enables you to view failed, foreign, and absent disks. You can
also view the details of individual disks.


To get the status of failed, foreign, and absent disks in the Enterprise Manager:
1. Select a system from the left navigation pane.
2. Click the Hardware > Storage > Overview tabs.
3. Expand the Failed/Foreign/Absent Disks panel.
4. View the following disk information:
Name
Status
Size
Manufacturer/model
Firmware
Serial number


Slide 11

Viewing Chassis Status


1. Click Hardware
2. Click Chassis


The Chassis view provides a block drawing of the chassis and its components: disks, fans, power
supplies, NVRAM, CPUs, memory, and so on. The components that appear depend on the Data Domain
system model.
The chassis view enables you to check the hardware status.
To view your chassis status in the Enterprise Manager:
1. Click the Hardware tab.
2. Click Chassis.


From here you can view the following by hovering your mouse over them:
NVRAM
PCI slots
SAS
Power supply
PS fan
Riser expansion
Temperature
Fans
Front and back chassis views
Using the command line interface (CLI), you can check system statistics for the time period since the last
reboot using the system show stats command. The system show hardware command will display
information about slots and vendors and other hardware in a Data Domain system. Consult the DD OS
5.2 Command Reference Guide for more information on using the commands referenced in this student
guide.
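For example, a quick hardware health check from the CLI using the two commands named above:
# system show stats
# system show hardware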


Slide 12

Lab 2.1: Initial Setup and Hardware Verification


Slide 13

Module 2: Basic Administration

Lesson 2: Manage System Access


This lesson covers the following topics:
Defining User Roles
Creating Users
Managing Administration Access Protocols


This lesson covers user privileges, administration access, and user administration.


Slide 14

Defining User Roles

Roles enable you to restrict system access to a set of privileges


Admin
User
Security
Backup-Operator
Data-Access

Only the sysadmin user can create the first security officer. After
the first security officer is created, only security officers can
create or modify other security officers.
Sysadmin is the default admin user and cannot be deleted or
modified.
The first security-officer account cannot be deleted


To enhance security, each user can be assigned a different role. Roles enable you to restrict system
access to a set of privileges. A Data Domain system supports the following roles:
Admin
Allows one to administer, that is, configure and monitor, the entire Data Domain system.
User
Allows one to monitor Data Domain systems and perform the fast copy operation.
Security
In addition to the user role privileges, allows one to set up security officer configurations and
manage other security officer operators.
Backup-operator
In addition to the user role privileges, allows one to create snapshots, import and export tapes
to a VTL library and move tapes within a VTL library.
Data-access
Intended for DD Boost authentication, an operator with this role cannot monitor or configure a
Data Domain system.


Note: The available roles display based on the user's role. Only the sysadmin user can create the first
security officer. After the first security officer is created, only security officers can create or modify other
security officers. Sysadmin is the default admin user and cannot be deleted or modified.


Slide 15

Managing Local Users


1. Click System Settings
2. Click Access Management
3. Select Local Users
4. Click Create


In the Access Management tab, you can create and manage users.
Managing users enables you to name the user, grant them privileges, make them active, disabled or
locked, and find out if and when they were disabled. You can also find out the users last login location
and time.
To create new users, follow these steps:
1. Click the System Settings > Access Management > Local Users tabs.
The Local Users view appears.
2. Click the Create button to create a new user.
The Create User dialog box appears.
3. Enter the following information in the General Tab:
User: The user ID or name.
Password: The user password. Set an initial password (the user can change it later).
Verify Password: The user password, again.
Role: The role assigned to the user.


4. Enter the following information in the Advanced Tab:


Minimum Days Between Change: The minimum number of days between password
changes that you allow a user. Default is 0.
Maximum Days Between Change: The maximum number of days between password
changes that you allow a user. Default is 99999.
Warn Days Before Expire: The number of days to warn the users before their password
expires. Default is 7.
Disable Days After Expire: The number of days after a password expires to disable the user
account. Default is Never.
Disable account on the following date: Check this box and enter a date (mm/dd/yyyy)
when you want to disable this account. Also, you can click the calendar to select a date.
5. Click OK.
To enable or disable users, follow these steps:
1. Click the System Settings > Access Management > Local Users tabs.
The Local Users view appears.
2. Click one or more user names from the list.
3. Click either the Enable or Disable button to enable or disable user accounts.
The Enable or Disable User dialog box appears.
4. Click OK and Close.
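Local users can also be created from the CLI; a minimal sketch (the username jsmith is hypothetical, and the exact syntax can vary by DD OS release, so verify it in the Command Reference Guide):
# user add jsmith role user
You are then prompted to set the initial password for the new account.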


Slide 16

Manage Administration Access Protocols


1. Click System Settings
2. Click Access Management
3. Select Administrator Access
4. Expand More Tasks
5. Select a protocol to configure


As an administrator, you need to view and configure services that provide administrator and user access
to a Data Domain system. The services include:
Telnet: Provides access to a Data Domain system through a Telnet connection.
FTP: Provides access to a Data Domain system through an FTP connection.
HTTP/HTTPS: Provides access to a Data Domain system through an HTTP connection, an HTTPS
connection, or both.
SSH: Provides access to a Data Domain system through an SSH connection.
Managing administration access protocols enables you to view and manage how other administrators
and users access a Data Domain system.


To provide access to a Data Domain system through a Telnet connection:


1. On the Access Management page, select Configure Telnet from the More Tasks menu.
The Configure Telnet Access dialog box appears.
2. To enable Telnet access, click the Allow Telnet Access checkbox.
3. Determine how the hosts connect:
To allow complete access, click the Allow all hosts to connect radio button.
To configure specific hosts, click the Limit Access to the following systems radio button, and
click the appropriate icon in the Allowed Hosts pane. A hostname can be a fully qualified
hostname or an IP address.
To add a host, click the plus button (+). Enter the hostname, and click OK.
To modify a hostname, click the checkbox of the hostname in the Hosts list, and click the
edit button (pencil). Change the hostname, and click OK.
To remove a hostname, click the checkbox of the hostname in the Hosts list, click the
minus button (-), and click OK.
4. Click OK.
To provide access to a Data Domain system through FTP:
1. On the Access Management page, select Configure FTP from the More Tasks menu.
The Configure FTP Access dialog box appears.
2. To enable FTP access, click the Allow FTP Access checkbox.
3. Determine how hosts connect:
To allow complete access, click the Allow all hosts to connect radio button.
To configure specific hosts, click the Limit Access to the following systems radio button, and
click the appropriate icon in the Allowed Hosts pane. A hostname can be a fully qualified
hostname or an IP address.
To add a host, click the plus button (+). Enter the hostname, and click OK.
To modify a hostname, click the checkbox of the hostname in the Hosts list, and click the
edit button (pencil). Change the hostname, and click OK.
To remove a hostname, click the checkbox of the hostname in the Hosts list, click the
minus button (-), and click OK.
4. Click OK.


To provide access to a Data Domain system through an HTTP, HTTPS, or both connection:
1. On the Access Management page, select Configure HTTP/HTTPS from the More Tasks menu.
The Configure HTTP/HTTPS Access dialog box appears.
2. To enable HTTP and/or HTTPS access, click the checkbox for Allow HTTP Access and/or the Allow
HTTPS Access.
3. Determine how hosts connect:
To allow complete access, click the Allow all hosts to connect radio button.
To configure specific hosts, click the Limit Access to the following systems radio button, and
click the appropriate icon in the Allowed Hosts pane. A hostname can be a fully qualified
hostname or an IP address.
To add a host, click the plus button (+). Enter the hostname, and click OK.
To modify a hostname, click the checkbox next to the hostname in the Hosts list, and
click the edit button (pencil). Change the hostname, and click OK.
To remove a hostname, click the checkbox of the hostname in the Hosts list, click the
minus button (-), and click OK.
4. To configure system ports and session timeout values, click the Advanced tab.
In the HTTP Port text entry box, enter the port for connection.
Port 80 is assigned by default.
In the HTTPS Port text entry box, enter the port for the connection.
Port 443 is assigned by default.
In the Session Timeout text entry box, enter the interval in seconds that must elapse before
the connection closes.
10800 seconds (3 hours) is assigned by default.
Note: Click Default to return the setting back to the default value.
5. Click OK.
To provide access to a Data Domain system through an SSH connection:
1. On the Access Management page, select Configure SSH from the More Tasks menu.
The Configure SSH Access dialog box appears.
2. To enable SSH access, click the Allow SSH Access checkbox.
3. Determine how hosts connect:
To allow complete access, click the Allow all hosts to connect radio button.
To configure specific hosts, click the Limit Access to the following systems radio button, and
click the appropriate icon in the Allowed Hosts pane. A hostname can be a fully qualified
hostname or an IP address.
To add a host, click the plus button (+). Enter the hostname, and click OK.
To modify a hostname, click the checkbox of the hostname in the Hosts list, and click the
edit button (pencil). Change the hostname, and click OK.
To remove a hostname, click the checkbox of the hostname in the Hosts list, click the
minus button (-), and click OK.
4. Click OK.
Using the command line interface (CLI), the adminaccess command can be used to allow remote
hosts to use the FTP, Telnet, HTTP, HTTPS, and SSH administrative protocols on the Data Domain system.
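For example, to enable SSH access and confirm which protocols are active (a minimal sketch; see the Command Reference Guide for the full adminaccess syntax):
# adminaccess enable ssh
# adminaccess show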


Slide 17

Lab 2.2: Managing System Access


Slide 18

Module 2: Basic Administration

Lesson 3: Introduction to Monitoring a Data Domain System


This lesson covers the following topics:
Log Files
Autosupports
Alerts
SNMP
Syslog (Remote Logging)


This lesson covers the basics of monitoring a Data Domain system, including log file locations, settings
and alerts.


Slide 19

Log Files

/ddvar/log: troubleshooting-related files (only relevant files and folders are listed)
  messages
  space.log
  /debug
    ddfs.info
    vtl.info
    perf.log
    messages.engineering
    /cifs: cifs.log, join_domain.log
    /ost: ost.log
    /platform: kern.info


The Data Domain system logs system status messages hourly. Log files can be bundled and sent to Data
Domain Support to provide the detailed system information that aids in troubleshooting any system
issues that may arise.
The Data Domain system log file entries contain messages from the alerts feature, autosupport reports,
and general system messages. The log directory is /ddvar/log.
Every Sunday at 3 a.m., the Data Domain system automatically opens new log files and renames the
previous files with an appended number of 1 through 9, such as messages.1. Each numbered file is
rolled to the next number each week. For example, at the second week, the file messages.1 is rolled
to messages.2. If a file messages.2 already existed, it rolls to messages.3. An existing
messages.9 is deleted when messages.8 rolls to messages.9.
The /ddvar/log folder includes files related to troubleshooting. Only relevant files or folders are
listed. The CLI command to view logs is log view [filename].

To view files under ddvar/log, use log view filename.


To view files under ddvar/log/debug, use log view debug/filename.


Use the Enterprise Manager to view the system log files in /ddvar/log.
1. Maintenance > Logs
2. Click the file you want to view.
The ddvar folder contains other log files that you cannot view through log commands or from the
Enterprise Manager.
To view all Data Domain system log files, create a ddvar share (CIFS) or mount the ddvar folder (NFS).
Contents of listed log files:
messages: Messages from the alerts, autosupport reports, and general system messages
space.log: Messages about disk space used by Data Domain system components and data
storage, and messages from the cleaning process
ddfs.info: Debugging information created by the file system processes
vtl.info: VTL information messages
perf.log: Performance statistics used by Data Domain support staff for system tuning
cifs.log: CIFs information messages
join_domain.log: Active directory information messages
ost.log: System information related to DD Boost
messages.engineering: Engineering-level messages related to the system
kern.info: Kernel information messages
You can also view log files from the command line using the following commands:
# log list
List top level or debug files in the log directory
# log view
View the system log or another log file
# log watch
Watch the system log or another log file in real time
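For example, using the paths described above:
# log view debug/ddfs.info
View the file system debugging log under /ddvar/log/debug.
# log watch
Follow the system log in real time.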


Slide 20

Autosupport Logs and Alert Messages

Report the system status and identify potential system problems
Provide daily notification of the system's condition
Send email notifications to specific recipients for quicker, targeted responses
Supply critical system data to aid support case triage and management


Autosupport logs and alert messages help solve and prevent potentially crippling Data Domain system
problems.
Autosupport alert files provide timely notification of significant issues. Autosupport sends system
administrators, as well as Data Domain Support (when configured), a daily report of system information
and consolidated status output from a number of Data Domain system commands and entries from
various log files. Included in the report are extensive and detailed internal statistics and log information
to aid Data Domain Support in identifying and debugging system problems.
Autosupport logs are sent by email as simple text. Autosupport log distribution can be scheduled, with
the default time being 6:00 a.m.
During normal operation, a Data Domain system may produce warnings or encounter failures whereby
administrators must be informed immediately. This communication is performed by means of an alert.


Alerts are sent out to designated individuals or groups so appropriate actions can be taken promptly.
Alerts are sent as email in two forms: one is an immediate email for an individual alert to subscribers set
via the notification settings. The other is sent as a cumulative Daily Alert Summary email that is logged
on the Current Alerts page. These summaries are sent daily at 8:00 a.m. Daily alert summaries update
any critical events that might be occurring to the system.
Autosupport logs and alert messages:
Report the system status and identify potential system problems
Provide daily notification of the systems condition
Send email notifications to specific recipients for quicker, targeted responses
Supply critical system data to aid support case triage and management


Slide 21

Autosupport System Overview

(Diagram: detailed autosupport reports and summary alert reports are sent via SMTP to autosupport@autosupport.datadomain.com; a daily alert summary is also distributed. Reports are stored locally in /ddvar/support, build a system history of reboots and warnings, and can feed integration with other systems.)


Each autosupport report can be rather large, depending on your system configuration (plain text
format).
The autosupport file contains a great deal of information on the system. The file includes general
information, such as the DD OS version, System ID, Model Number and Uptime, as well as information
found in many of the log files.
Autosupport logs are stored in the Data Domain system in /ddvar/support. Autosupport contents
include:
system ID
uptime information
system command outputs
runtime parameters
logs
system settings
status and performance data
debugging information


By default, the full autosupport report is emailed daily at 6:00 a.m. A second report, the autosupport
alert summary, is sent daily at 8:00 a.m.
A Data Domain system can send autosupport reports, if configured, to EMC Data Domain via SMTP to
the autosupport data warehouse within EMC. Data Domain captures the above files and stores them by
Data Domain serial number in the data warehouse for reference when needed for troubleshooting that
system. Autosupport reports are also a useful resource for Data Domain Technical Support to assist in
researching any cases opened against the system.
The autosupport function also sends alert messages to report anomalous behaviors, such as reboots,
serious warnings, a failed disk, a failed power supply, or a nearly full system. For more serious issues, such
as system reboots and failed hardware, these messages can be configured to be sent to Data Domain, and
to automatically create cases for Support to proactively take action on your behalf.
Autosupport requires SMTP service to be active on the Data Domain system pointing to a valid email
server over a connection path to the Internet.


Slide 22

Configure Autosupport
1. Click Maintenance
2. Click Support
3. Select Autosupport
4. Add or remove additional subscribers to the autosupport mailing list
5. Enable or disable notifications


In the Enterprise Manager, you can add, delete, or edit email subscribers by clicking Configure in the
Autosupport Mailing List Subscribers area of the Autosupport tab.
Autosupport subscribers receive daily detailed reports. Using SMTP, autosupports are sent to Data
Domain Technical Support daily at 6 a.m. local time. This is the default setting.
View any of the collection of Autosupport reports in the Autosupport Report file listing by clicking the
file name. You are then prompted to download the file locally. Open the file for reading in a standard
web browser for convenience.


You can also use the command line interface (CLI) to configure Autosupports. Consult the DD OS 5.2
Command Reference Guide for more information on using the commands referenced in this student
guide.
# autosupport disable support-notify
Disables the sending of the Daily Alert Summary and the Autosupport Report to Data Domain
Support.
# autosupport enable support-notify
Enables the sending of the Daily Alert Summary and the Autosupport Report to Data Domain
Support.
# autosupport add
Adds entries to the email list for the Daily Alert Summary or the Autosupport Report.
# autosupport del
Deletes entries to the email list for the Daily Alert Summary or the Autosupport Report.
# autosupport set schedule
Schedules the Daily Alert Summary or the Autosupport Report. For either report, the most
recently configured schedule overrides the previously configured schedule.
# autosupport show
Displays autosupport configuration.
# autosupport show schedule
Displays the schedules for the Daily Alert Summary and the Autosupport Report.
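A minimal sketch of adding a subscriber to the Daily Alert Summary and confirming the schedule (the email address is hypothetical; verify the exact argument names in the Command Reference Guide):
# autosupport add alert-summary emails jane@example.com
# autosupport show schedule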


Slide 23

Alerts
1. Click Status
2. Click Alerts
3. Select Notification
4. Click Add
5. Add a group name and set appropriate attributes


Alerts are notification messages generated by a Data Domain system if an undesirable event occurs.
A configured Data Domain system sends an alert immediately via email to any list of subscribers. Higher-level alerts can be sent automatically to EMC Data Domain Support for tracking.
If Data Domain Support receives a copy of the message then, depending on the nature of the event, a
support case is generated, and a Technical Support Engineer proactively tries to resolve the issue as
soon as possible.

Alerts contain a short description of the problem.


Alerts have a separate email distribution list.
On receipt of an alert, Data Domain creates a support case.


Alert notification groups allow flexibility in notifying the responsible parties who provide maintenance
to a Data Domain system. Individual subscribers can be targeted for specific types of alerts. Instead of
sending alerts to every subscriber for every type of problem, a sysadmin can configure groups of
contacts related to types of issues. For example, you can create an environment alert notification group
for team members who are responsible for data center facilities, and power to the system. When the
system creates a specific, environment-related alert, only those recipients for that class of alerts are
contacted.
System administrators can also set groups according to the seriousness of the alert.
Set alert notification groups in Status > Alerts > Notifications tab.
After a group is created, you can configure the Class Attributes pane to modify the types and severity of
the alerts this group should receive. In the Subscribers pane, you can modify a list of recipient email
addresses belonging to this group.
You can also use the command line interface (CLI) to configure alert notifications.
# alerts notify-list create
Creates a notification list and subscribes to events belonging to the specified list of classes and
severity levels.
# alerts notify-list add
Adds to a notification list and subscribes to events belonging to the specified list of classes and
severity levels.
# alerts notify-list del
Deletes members from a notification list, a list of classes, a list of email addresses.
# alerts notify-list destroy
Destroys a notification list
# alerts notify-list reset
Resets all notification lists to factory default
# alerts notify-list show
Shows notification lists configuration
# alerts notify-list test
Sends a test notification to alerts notify-list
Consult the DD OS 5.2 Command Reference Guide for more information on using the commands
referenced in this student guide.
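A minimal sketch of the environment-group example above (the group name and address are hypothetical; check the supported class and severity keywords in the Command Reference Guide):
# alerts notify-list create environment_team class environment
# alerts notify-list add environment_team emails facilities@example.com
# alerts notify-list show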


Slide 24

View Autosupports Within the Support Portal


Within the EMC Data Domain support portal, you can access and view autosupports, alert messages,
and alert summaries sent by a Data Domain system. Only systems sending autosupport information to
Data Domain are presented through the support portal.
When reviewing your systems, you see a list of systems and their maintenance status. The symbols used
on this web page reflect maintenance contract status and do not reflect the operational status of the
machine.
A maintenance alert is a red disk icon with a white X. It indicates that a maintenance contract has
expired. An amber triangle with a white exclamation point indicates that maintenance is nearing
expiration.
Select a line item in the list of available Data Domain systems, and you are presented with information
about your support contract, including its expiration date and a link to renew the contract.


Slide 25

View Autosupports Within the Support Portal


When you click View Space Plot, a graph appears where the space usage is shown. Cumulative
autosupport data is gathered in this graph. In the space plot page, there is a link to view detailed tabular
data.
Within the autosupport archive, you see autosupports, alerts, alert summaries, and reboot notifications
for a given system. Autosupports can be listed in the support portal, showing only the most recent of
each type of autosupport, or a list of all autosupports of a single type, or all autosupports of all types.


Slide 26

SNMP
(Diagram: the SNMP agent on the Data Domain system sends trap packets, identified by OIDs such as 1.3.6.1.4.1.19746.2.0.1, to an SNMP server/management console such as EMC NetWorker or Data Protection Advisor; snmpd on the console interprets the OID through the DATA-DOMAIN-MIB, for example as powerSupplyFailedAlarm. Access is authenticated by a community string (v2c) or an authenticated user with privacy (v3).)


The Simple Network Management Protocol (SNMP) is an open-standard protocol for exchanging
network management information, and is a part of the Transmission Control Protocol/Internet Protocol
(TCP/IP) protocol suite. SNMP provides a tool for network administrators to monitor and manage
network-attached devices, such as Data Domain systems, for conditions that warrant administrator
attention.
In typical SNMP uses, one or more administrative computers, called managers, have the task of
monitoring or managing a group of hosts or devices on a computer network. Each managed system
executes, at all times, a software component called an agent that reports information via SNMP to the
manager.
Essentially, SNMP agents expose management data on the managed systems through object IDs (OIDs).
The protocol also permits active management tasks, such as modifying and applying a new
configuration, through remote modification of these variables. In the case of Data Domain systems,
active management tasks are not supported. The data contained in the OIDs are called variables, and are
organized in hierarchies. These hierarchies, and other metadata (such as type and description of the
variable), are described by Management Information Bases (MIBs).


An SNMP agent residing on the Data Domain system transmits OID traps: messages from the system
indicating a change of system state in the form of a very basic OID code (for example,
1.3.6.1.4.1.19746.2.0.1). The management system, running the snmp daemon, interprets the OID
through the Data Domain MIB and generates the alert message on the SNMP management console (for
example, powerSupplyFailedAlarm).
DD OS supports two forms of SNMP authentication, each in a different SNMP version. In SNMP version 2
(v2), each SNMP management host and agent belongs to an SNMP community: a collection of hosts
grouped together for administrative purposes. Which computers should belong to the same
community is generally, but not always, determined by the physical proximity of the computers.
Communities are identified by the names you assign them. A community string can be thought of as a
password shared by SNMP management consoles and managed computers. Set hard-to-guess
community strings when you install the SNMP service. There is little security as none of the data is
encrypted.
SNMP version 3 (v3) offers individual users instead of communities with related authentication (MD5 or
SHA1) and AES or DES privacy.
When an SNMP agent receives a message from the Data Domain system, the community string or user
authentication information contained in the packet is verified against the agent's list of acceptable users
or community strings. After the name is determined to be acceptable, the request is evaluated against
the agent's list of access permissions for that community. Access can be set to read-only or read-write.
System status information can be captured and recorded for the system that the agent is monitoring.
You can integrate the Data Domain management information base into SNMP monitoring software, such
as EMC NetWorker or Data Protection Advisor. Refer to your SNMP monitoring software administration
guide for instructions on how to integrate the MIB into your monitoring software and for recommended
practices. SNMP management systems monitor the system by maintaining an event log of reported
traps.


Slide 27

SNMP
1. Click System Settings
2. Click General Configuration
3. Click SNMP


You can download the Management Information Base (MIB) file from the Enterprise Manager by
navigating to System Settings > General Configuration > SNMP and clicking the Download MIB file
button. You can also download the MIB files from the /ddvar/snmp directory.
Install the MIB file according to the instructions of your management server.
The default port that is open when SNMP is enabled is port 161. Traps are sent out through port 162.
Configure either SNMP V3 or V2C in the same window. Follow the instructions for your SNMP
management software to ensure proper set-up and communication between the management console
and the Data Domain system.
Consult the EMC Data Domain Operating System 5.2 Command Reference Guide for the full set of MIB
parameters included in the Data Domain MIB branch.
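SNMP can also be configured from the CLI; a minimal sketch (the community string and trap host are hypothetical, and option names may vary by release, so verify them in the Command Reference Guide):
# snmp add ro-community public
# snmp add trap-host nms.example.com
# snmp enable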


Slide 28

Syslog (Remote Logging)

DD OS uses syslog to publish log messages to remote systems
System messages are sent to a remote syslog server using UDP port 514
Syslog can be configured using only the command line interface (CLI)
(Diagram: system messages travel over the LAN to a syslog server that collects the logs.)


Some log messages can be sent from the Data Domain system to other systems. DD OS uses syslog to
publish log messages to remote systems.

In a Data Domain system, the remote logging feature uses UDP port 514.
You can configure a Data Domain system to send system messages to a remote syslog server.
A Data Domain system exports the following facility.priority selectors for log files. For
information on managing the selectors and receiving messages on a third-party system, see your
vendor-supplied documentation for the receiving system.
*.notice: Sends all messages at the notice priority and higher.
*.alert: Sends all messages at the alert priority and higher (alerts are included in *.notice).
kern.*: Sends all kernel messages (kern.info log files).
local7.*: Sends all messages from system startups (boot.log files).
The log host commands manage the process of sending log messages to another system.

Syslog can be configured using only the command line interface (CLI) with the Data Domain system.


Configure syslog by doing the following:


Obtain the IP address of the remote logging device receiving the Data Domain system log
information.
Use the log command to configure remote logging.
Ensure that UDP port 514 is open and available on the remote log device.
Enable remote logging with the log host enable command.
Add a syslog server using the log host add [serverIP] command.
Check the configuration using the log host show command.
If you need to disable the syslog for any reason, use the log host disable command.
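Putting those steps together at the CLI (a minimal sketch; the server address 192.168.1.50 is hypothetical):
# log host enable
# log host add 192.168.1.50
# log host show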


Slide 29

Lab 2.3: Monitoring a Data Domain System


Slide 30

Module 2: Basic Administration

Lesson 4: Licensed Features


This lesson covers the following topics:
Checking and installing optional licenses on a Data Domain
system
Removing optional licenses from a Data Domain system


This lesson covers the basics of adding licensed features to, and removing optional licenses from, a Data
Domain system.


Slide 31

Data Domain Licensed Features

DD Boost
Replication
Retention Lock Governance
Retention Lock Compliance
VTL (Virtual Tape Library)
Encryption of Data at Rest
Expansion Storage
Shelf Capacity
Gateway Expanded Storage Level 2
Gateway Expanded Storage Level 3
DD Extended Retention (formerly DD Archiver)
Global Deduplication Array (GDA)
Nearline

DD Boost
Allows a system to use the Boost interface on a Data Domain system.
Replication
Adds the Data Domain Replicator for replication of data from one Data Domain system to
another.
Retention Lock Governance
Protects selected files from modification and unscheduled deletion, that is, deletion before a
specified retention period has expired.
Retention Lock Compliance
Allows you to meet the strictest data retention requirements from regulatory standards such as
SEC17a-4.
VTL (Virtual Tape Library)
Allows backup software to see a Data Domain system as a tape library.
Encryption of Data at Rest
Allows data on system drives or external storage to be encrypted while being saved, and then
locked before moving to another location.


Expansion Storage
Allows the upgrade of capacity for the Data Domain system. Enables either the upgrade of a 9-disk DD510/DD530 to 15 disks, or the upgrade of a 7-disk DD610/DD630 to 12 disks.
Shelf Capacity
Allows ES30 and ES20 (purchased for use with DD OS 5.1) external shelves to be added to the
Data Domain system for additional capacity.
Gateway Expanded Storage Level 2
Enables gateway systems to support up to 71 TB of usable capacity.
Gateway Expanded Storage Level 3
Enables gateway systems to support up to 145 TB of usable capacity.
DD Extended Retention (formerly DD Archiver)
Provides long-term backup retention on the DD860 and DD990 platforms.
Global Deduplication Array (GDA)
Licenses the global deduplication array.
Nearline
Identifies systems deployed for archive and nearline workloads.


Slide 32

Managing Licenses
1. Click System Settings
2. Click Licenses
3. Click Add Licenses to add licenses
4. Select one or more licenses from the list, then click Delete Selected Licenses to remove licenses


You can check which licenses are enabled on your Data Domain system using the Enterprise Manager.
1. In the Navigational pane, expand the DD Network and select a system.
2. Click the System Settings > Licenses tabs.
The Feature Licenses pane appears, showing the list of license keys and features.
You can also use the command line interface (CLI) to check which licenses are enabled by using the
license show command. If the local argument is included in the option, output includes details on
local nodes only.
To add a feature license using the Enterprise Manager:
1. In the Feature Licenses pane, click Add Licenses.
The Add Licenses dialog box displays.
2. In the License Key text box, type or paste one or more license keys, each on its own line or
separated by a space or comma (and they will be automatically placed on a new line).
3. Click Add.


The added licenses display in the Added license list. If there are errors, they will be shown in the error
license list. Click a license with an error to edit the license, and click Retry Failed License(s) to retry the
key. Otherwise, click Done to ignore the errors and return to the Feature Licenses page.
You can also add one or more licenses for features and storage capacity using the command line
interface (CLI). Include dashes when entering the license codes. This command option may run on a
standalone Data Domain system or on the master controller of a Global Deduplication Array.
# license add <license-code> [<license-code> ...]
Example
# license add ABCD-DCBA-AABB-CCDD BBCC-DDAA-CCAB-AADD-DCCB-BDAC-E5
Added "ABCD-DCBA-AABB-CCDD" : REPLICATION feature
Added "BBCC-DDAA-CCAB-AADD-DCCB-BDAC-E5" : CAPACITY-ARCHIVE feature for 6TiB capacity ES20
To remove one or more feature licenses using the Enterprise Manager:
In the Feature Licenses pane, click a checkbox next to one or more licenses you wish to
remove and click Delete Selected Licenses.
In the Warning dialog box, verify the license(s) to delete and click OK.
The licenses are removed from the license list.
You can also use the command line interface (CLI) to delete one or more software option licenses. In a
GDA configuration, run this command on the master controller.
Security officer authorization is required to delete licenses from Retention Lock Compliance systems
only.
You can also use the license del command to remove licenses from the command line.
Example
# license del EEFF-GGHH-JJII-LLKK MMNN-OOQP-NMPQ-PMNM STXZ-ZDYS-GSSG-BBAA
License code "EEFF-GGHH-JJII-LLKK" deleted.
License code "MMNN-OOQP-NMPQ-PMNM" deleted.
License code "STXZ-ZDYS-GSSG-BBAA" deleted.
If you need to remove all licenses at once using the command line interface (CLI) you can use the
license reset command. This command option requires security officer authorization if removing
licenses from Retention Lock Compliance systems. Licenses cannot be reset on a Global Deduplication
Array.
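For example (this removes every installed license, so use it with care):
# license reset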

121

Slide 33

Lab 2.4: Managing Licensed Features

Module 2: Basic Administration

Copyright 2013 EMC Corporation. All Rights Reserved.

122

33

Slide 34

Module 2: Basic Administration

Lesson 5: Upgrading a Data Domain System


This lesson covers the following topics:
Preparing for a DD OS upgrade
Downloading the upgrade file
Using release notes to prepare for an upgrade
Performing the upgrade process

Module 2: Basic Administration

Copyright 2013 EMC Corporation. All Rights Reserved.

34

Upon completion of this module, you should be able to describe the upgrade process for a Data Domain
system.
This lesson covers the following topics:
Preparing for a DD OS upgrade
Downloading the upgrade file
Using release notes to prepare for an upgrade
Performing the upgrade process

123

Slide 35

DD OS Releases

Release Types
RA, IA, and GA

There is no down-grade path


Read all release notes before upgrading
When in doubt, contact Support before installing an upgrade

Module 2: Basic Administration

Copyright 2013 EMC Corporation. All Rights Reserved.

35

There are three basic release types:


Restricted Availability (RA)
An RA release has completed all internal testing, as well as testing at selected customer sites.
An RA release is provided to a limited number of receptive customers and is primarily used to
help customers who want to start looking at new features.

Restricted availability releases are not available to all Data Domain system owners as a general
download. They can be obtained only through the appropriate EMC Data Domain Sales or
Support team approvals.
Initial Availability (IA)
An IA release is available as a download on the Data Domain support website and is intended for
production use by customers who need any of the new features or bug fixes contained in the
release.
General Availability (GA)
A GA release is available as a download on the Data Domain Support website and is intended for
production use by all customers. Any customer running an earlier Data Domain operating
system release, GA release or non-GA release, should upgrade to the latest GA release.

124

To ensure consistency in how we introduce our software, all release types move through the RA, IA, and
GA progression in a similar fashion. This allows customers to evaluate the releases using similar
standards. Data Domain recommends that you track Data Domain OS releases deployed in your backup
environment. It is important that the backup environment run the most current, supported releases.
Minimize the number of different deployed release versions in the same environment. As a general rule,
you should upgrade to the latest GA release of a particular release family. This ensures you are running
the latest version that has achieved our highest reliability status.
When RA or IA status releases are made available for upgrade, carefully consider factors such as the
backup environment, the feature improvements that are made to the release, and the potential risks of
implementing releases with less customer run-time than a GA release. Depending on these factors, it
might make sense to wait until a release reaches GA status.
There is no down-grade path to a previous version of the Data Domain operating system (DD OS). The
only method to revert to a previous DD OS version is to destroy the file system and all the data
contained therein, and start with a fresh installation of your preferred DD OS.
Caution: REVERTING TO A PREVIOUS DD OS VERSION DESTROYS ALL DATA ON THE DATA DOMAIN
SYSTEM.
Before upgrading:
Read all pertinent information contained in the release notes for the given upgrade version.
If you have questions or need additional information about an upgrade, contact EMC Data
Domain Support before upgrading for the best advice on how to proceed.

125

Slide 36

Why Upgrade?

Data Domain is constantly improving its operating system to take


advantage of new system features and capabilities.

When changing to newer systems, an upgrade is often required


Systems paired in a replication configuration should all have the

same version of DD OS
Compatibility is ensured with your backup host software
Unexpected system behavior can be corrected

Module 2: Basic Administration

Copyright 2013 EMC Corporation. All Rights Reserved.

36

It is not always essential, but it is wise, to maintain a Data Domain system with the current version of
the OS. With the newest version of the Data Domain operating system, you can be sure that you have
access to all features and capabilities your system has to offer.

When you add newer Data Domain systems to your backup architecture, a newer version of DD
OS is typically required to support hardware changes such as remote-battery NVRAM, or when
adding the newer ES30 expansion shelf.
Data Domain Support recommends that systems paired in a replication configuration all have
the same version of DD OS.
Administrators upgrading or changing backup host software should always check the minimum
DD OS version recommended for a version of backup software in the Backup Compatibility
Guide. This guide is available in the EMC Data Domain support portal. Often, newer versions of
backup software are supported only with a newer version of DD OS. Always use the version of
the Data Domain operating system recommended by the backup software used in your backup
environment.
No software is free of flaws, and EMC Data Domain works continuously to improve the
functionality of the DD OS. Each version release has complete Release Notes that identify bug
fixes by number and what was fixed in the version.

126

Slide 37

Preparing for a DD OS Upgrade

Considerations
Are you upgrading more than two release families at a time?
4.7 to 4.9 is considered two families
4.9 to 5.2 is more than two families and requires two upgrades
Time required
Single upgrades can take 45 minutes or more
During the upgrade, the Data Domain file system is unavailable
Shutting down processes, rebooting after upgrade, and checking the
upgrade all take time
Replication
Do not disable replication on either system in the pair
Upgrade the destination (replica) before upgrading the source
(originator)
The system should be idle before beginning the upgrade

Module 2: Basic Administration

Copyright 2013 EMC Corporation. All Rights Reserved.

37

An upgrade to release 5.2 can be performed only from systems using release families 5.0 or 5.1.
Typically, when upgrading DD OS, you should upgrade only two release families at a time (4.7 to 4.9, or
4.8 to 5.0). In order to upgrade to release 5.2 from a release family earlier than 4.7, you must upgrade in
steps. If you are more than two release families behind, contact EMC Data Domain Support for advice on
the intermediate versions to use for your stepped upgrade.
Make sure you allocate appropriate system downtime to perform the upgrade. Set aside enough time to
shut down processes prior to the upgrade and for spot-checking the upgraded system after completing
the upgrade. The actual upgrade should take no longer than 45 minutes. Adding the time to shut down processes and to check the upgraded system afterward, the complete upgrade might take 90 minutes or more. Double this time if you are upgrading more than two release families.
For replication users: Do not disable replication on either side of the replication pair. After it is back
online, replication automatically resumes service.
You should upgrade the destination (replica) before you upgrade the source Data Domain system.
Be sure to stop any client connections before beginning the upgrade.

127

Slide 38

Upgrade: Download Software

Module 2: Basic Administration

Copyright 2013 EMC Corporation. All Rights Reserved.

To access and download current available versions of Data Domain OS software:


1. Log in to my.datadomain.com using your EMC Powerlink credentials.
2. Select Download Software from the toolbar on the left side of the support portal page.
3. Select the product (DD OS), and the platform (the Data Domain system model you are
upgrading), then click View.
4. Select the version of the upgrade you want to download from a list of available upgrade
packages by version and links (listed as Details and Download).
5. Be sure to download and read the Release Notes associated with the upgrade package you
downloaded before you upgrade.

128

38

Slide 39

Upgrade: Installing the Upgrade Package

Module 2: Basic Administration

Copyright 2013 EMC Corporation. All Rights Reserved.

39

When you have the new DD OS upgrade package downloaded locally, you can upload it to the Data
Domain system with the Data Domain Enterprise Manager:
1. Click Upload Upgrade Package and browse your local system until you find the upgrade package
you downloaded from the support portal.
2. Click OK.
The file transfers to the Data Domain system. The file is now in the list of available upgrade packages.
To perform a system upgrade:
1. Select the upgrade package you want to use from the list of available upgrade packages.
2. Click Perform System Upgrade.
The upgrade proceeds.
When the upgrade is complete, the system reboots automatically. You need to log in to the Data Domain Enterprise Manager again to resume administrative control of the Data Domain system.
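If you prefer the command line, DD OS also provides a system upgrade command that installs a package already present on the system; the package path and filename below are hypothetical:
# system upgrade /ddvar/releases/5.2.1.0-12345.rpm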

129

Slide 40

Module 2: Summary

The Configuration Wizard in the Enterprise Manager can be

used to perform the initial setup of a Data Domain system


Hardware can be verified in the Enterprise Manager as well as
with the command line interface (CLI)
Local users on a Data Domain system must be assigned one of
five roles: Admin, User, Security, Backup Operator, or Data
Access
Key log files can be viewed using the Enterprise Manager or the
command line interface (CLI)
There are several optional licensed features available on a Data
Domain system

Module 2: Basic Administration

Copyright 2013 EMC Corporation. All Rights Reserved.

130

40

Slide 1

Module 3: Managing Network Interfaces

Upon completion of this module, you should be able to:


Manage network interfaces, settings and routes
Describe and manage link aggregation interfaces
Describe and manage link failover interfaces
Describe and manage VLAN interfaces
Describe and manage IP aliases

Module 3: Managing Network Interfaces

Copyright 2013 EMC Corporation. All Rights Reserved.

This module focuses on managing network interfaces. It includes the following lessons:
Configuring Network Interfaces
Link Aggregation
Link Failover
VLAN and IP Alias Interfaces
This module also includes a lab, which will enable you to test your knowledge.

131

Slide 2

Module 3: Managing Network Interfaces

Lesson 1: Configuring Network Interfaces


This lesson covers the following topics:
Managing Network Interfaces
Configuring an Ethernet Interface
Managing Network Settings
Managing Network Routes

Module 3: Managing Network Interfaces

Copyright 2013 EMC Corporation. All Rights Reserved.

This lesson covers configuring network interfaces. To do this, you need to know how to manage network
settings and routes, and how to create and configure static routes.

132

Slide 3

Managing Network Interfaces



1. Click Hardware
2. Click Network
3. Click Interfaces

Module 3: Managing Network Interfaces

Copyright 2013 EMC Corporation. All Rights Reserved.

The Network view provides a means to:


Configure network interfaces so the Data Domain system is available for management and
backup activities over a network.
Configure network interfaces to maximize throughput and be highly available.
Name the Data Domain system in the network environment and resolve the names of other
systems in the environment.
Isolate backup and near-line traffic in shared network environments.
View all the network-related settings.
Troubleshoot and diagnose network issues.
Select the Hardware tab, then the Network tab, and finally the Interfaces tabs to view and configure
network settings.

133

The Interfaces table presents the following information:


Interface: Shows the name of each interface associated with the selected Data Domain system.
Physical interfaces names start with eth. Virtual interface names start with veth.
Enabled: Indicates whether or not the interface is enabled. Select Yes to enable the interface
and connect it to the network. Select No to disable the interface and disconnect it from the
network.
DHCP: Indicates if the interface is configured to use DHCP. Shows Yes, No, or N/A.
IP Address: Shows the IP address associated with the interface. The address is used by the
network to identify the interface. If the interface is configured through DHCP, an asterisk
appears after this value.
Netmask: Shows the netmask associated with the interface. Uses the standard IP network mask
format. If the interface is configured through DHCP, an asterisk appears after this value.
Link: Indicates whether or not the interface currently has a live Ethernet connection (set to
either Yes or No).
Additional Info: Lists additional settings for the interface, such as the bonding mode.
Intelligent Platform Management Interface (IPMI)
Yes/No: Indicates if IPMI health and management monitoring is configured for the interface.
View IPMI Interfaces: Links to the Maintenance > IPMI configuration tab.

134

You can also use the command line interface (CLI) to configure and manage physical and virtual
interfaces, DHCP, DNS, IP addresses, and display network information and status.
# net config <ifname> {[[<ipaddr>] [netmask <mask>] [dhcp {yes | no}]] | [<ipv6addr>]} [autoneg | duplex {full | half} speed {10|100|1000|10000}] [up | down] [mtu {<size> | default}]
Configure an Ethernet interface.
# net config <ifname> type {none | management | replication |
cluster}
Configure or set the type of Ethernet interface.
# net show all
Display all networking information, including IPv4 and IPv6 addresses.
# net show config [<ifname>]
Display the configuration for a specific Ethernet interface.
# net show {domainname | searchdomains}
Display the domain name or search domains used for email sent by a Data Domain system.
# net show dns
Display a list of DNS servers used by the Data Domain system. The final line in the output shows
if the servers were configured manually or by DHCP.
# net show hardware
Display Ethernet port hardware information.
# net show stats [ipversion {ipv4 | ipv6}] [all | interfaces |
listening | route | statistics]
Display network statistics.
Consult the EMC Data Domain Operating System 5.2 Command Reference Guide for details on using the
net command.

135

Slide 4

Configuring an Ethernet Interface

Module 3: Managing Network Interfaces

Copyright 2013 EMC Corporation. All Rights Reserved.

To configure an Ethernet interface using the Enterprise Manager:


1. From the Navigation pane, select the Data Domain system to configure.
2. Click the Hardware > Network > Interfaces tab.
3. Select an interface to configure.
4. Click Configure.
The Configure Interface dialog box appears.
5. Determine how the interface IP address is to be set:
Use DHCP to assign the IP address. In the IP Settings pane, click the Obtain using DHCP radio
button.
Specify the IP settings manually. In the IP Settings pane, click the Manually configure IP
Address radio button.
The IP Address and Netmask fields become active.
Enter an IP Address.
The Internet Protocol (IP) address is the numerical label assigned to the interface, for
example, 192.168.10.23.
Enter a Netmask address.
The netmask is the subnet portion of the IP address assigned to the interface.
The format is typically 255.255.255.###, where the ### are the values that identify the
interface.

136

6. Specify the speed and duplex settings.


The speed and duplex settings define the rate of data transfer through the interface. Select one
of these options:
Autonegotiate Speed/Duplex: Select this option to allow the network interface card to
autonegotiate the line speed and duplex setting for an interface.
Manually Configure Speed/Duplex: Select this option to manually set an interface data
transfer rate. Select the speed and duplex from the drop-down lists.
Duplex options are Unknown, half-duplex or full-duplex.
The speed options listed are limited to the capabilities of the hardware device. Options
are Unknown, 10Mb, 100Mb, 1000Mb, and 10Gb.
Half-duplex is available only for 10Mb and 100Mb speeds.
1000Mb and 10Gb line speeds require full-duplex.
Optical interfaces require the Autonegotiate option.
Copper interface default is 10Mb. If a copper interface is set to 1000Mb or 10Gb line
speed, duplex must be full-duplex.
7. Specify the maximum transfer unit (MTU) size for the physical (Ethernet) interface.
Supported values are from 350 to 9014. For 100 Base-T and gigabit networks, 1500 is the
standard default.
8. Click the Default button to return the setting to the default value.
9. Ensure that all of your network components support the size set with this option.
10. Optionally, select the Dynamic DNS Registration option.
Dynamic domain name system (DDNS) is the protocol that allows machines on a network to
communicate with, and register their IP address on, a DNS server.
The DDNS must be registered to enable this option. Refer to Registering a DDNS in the DD
OS 5.2 Administration Guide for additional information. This option disables DHCP for this
interface.
11. Click Next.
The Configure Interface Settings summary page appears. The values listed reflect the new
system and interface state, which are applied when you click Finish.
12. Click Finish.
13. Click OK.
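The same configuration can be applied with the net config syntax shown earlier; the interface name, address, netmask, and MTU values below are illustrative:
# net config eth0a 192.168.10.23 netmask 255.255.255.0
# net config eth0a mtu 1500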

137

Slide 5

Managing Network Settings


1. Click Hardware
2. Click Network
3. Click Settings

Module 3: Managing Network Interfaces

Copyright 2013 EMC Corporation. All Rights Reserved.

The Settings view enables you to manage Network settings in one place without having to execute
multiple commands.
To manage hardware settings, go to the Hardware tab, select the Network tab, then select the Settings
tab. From the Settings tab, you can view and edit the host settings, domain list, host mappings, and DNS
list.
The Network view presents status and configuration information about the system Ethernet interfaces.
It contains the Interfaces view, Settings view, and Routes view.
Use the Hardware > Network > Settings view to view and configure network settings. This includes
network parameters such as the hostname, domain name, search domains, host mapping, and the DNS
list.

138

Host Settings
Host Name: The hostname of the selected Data Domain system.
Domain Name: The fully-qualified domain name associated with the selected Data Domain
system.
Search Domain List
Search Domain: A list of search domains used by the Data Domain system. The Data Domain
system applies the search domain as a suffix to the hostname.
Hosts Mapping
IP Address: IP address of the host to resolve.
Host Name: Hostnames associated with the IP address.
DNS List
DNS IP Address: Current DNS IP addresses associated with the selected Data Domain
system. An asterisk (*) indicates the addresses were assigned through DHCP.
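From the CLI, the equivalent settings are managed with the net set and net show commands; the hostname below is illustrative:
# net set hostname dd01.yourcompany.com
# net show settings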

139

Slide 6

Managing Network Routes

Module 3: Managing Network Interfaces

Copyright 2013 EMC Corporation. All Rights Reserved.

Data Domain systems do not generate or respond to any of the network routing management protocols
(RIP, EGRP/EIGRP, and BGP) in any way. The only routing implemented on a Data Domain system is
based on the internal route table, where the administrator may define a specific network or subnet used
by a physical interface (or interface group).
Data Domain systems use source-based routing, which means outbound network packets that match
the subnet of multiple interfaces will be routed over only the physical interface from which they
originated.
In the Routes view, you can view and manage network routes without having to execute many
commands.

140

To set the default gateway:


1. Click the Hardware > Network > Routes tabs.
2. Click Edit in the Default Gateway area.
The Configure Default Gateway dialog box appears.
3. Choose how the gateway address is set. Either:
Select the Use DHCP value radio button for setting the gateway.
The Dynamic Host Configuration Protocol (DHCP) indicates if the gateway is configured using
the value from the DHCP server.
Or, select the Manually Configure radio button.
The gateway address box becomes available.
4. Enter the gateway address in the Gateway field.
5. Click OK.
The system processes the information and returns you to the Routes tab.
To create Static Routes:
1. From the Navigation pane, select the Data Domain system to configure.
2. Click the Hardware > Network > Routes tabs.
3. Click Create in the Static Routes area.
The Create Routes dialog box appears.
4. Select an interface to configure for the static route.
Click the checkboxes of the interface(s) whose route you are configuring.
Click Next.
5. Specify the Destination. Select either of the following.
The Network Address and Netmask.
Click the Network radio button.
Enter destination information, by providing the destination network address and netmask.
Note: This is not the IP of any interface. The interface is selected in the initial dialog, and it is
used for routing traffic.
The hostname or IP address of the host destination.
Click the Host radio button.
Enter the hostname or IP address of the destination host to use for the route.
6. Optionally, change the gateway for this route.
Click the checkbox, Specify different gateway for this route.
Enter a gateway address in the Gateway field.
7. Review changes, and click Next.
The Create Routes > Summary page appears. The values listed reflect the new configuration.
8. Complete the action, and click Finish.
Progress messages display. When changes are applied, the message indicates Completed. Click
OK to close the dialog.
The new route specification is listed in the Route Spec list.

141

Use the route command to manage routing between a Data Domain system and the backup hosts. An
added routing rule appears in the Kernel IP routing table and in the Data Domain system Route Config
list, a list of static routes that are reapplied at each system boot.
# route show config
Display the configured static routes in the Route Config list.
# route show table [ipversion {ipv4 | ipv6}]
Display all entries in the Kernel IP routing table.
Consult the EMC Data Domain Operating System 5.2 Command Reference Guide for details on using the
route command.
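As a sketch of adding a static route (the route specification format here is an assumption; confirm it against the Command Reference Guide):
# route add -net 192.168.2.0 netmask 255.255.255.0 gw 192.168.10.1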

142

Slide 7

Lab 3.1: Configuring Network Interfaces

Module 3: Managing Network Interfaces

Copyright 2013 EMC Corporation. All Rights Reserved.

143

Slide 8

Module 3: Managing Network Interfaces

Lesson 2: Link Aggregation


This lesson covers the following topics:
Understanding Link aggregation
Creating a virtual interface for link aggregation

Module 3: Managing Network Interfaces

Copyright 2013 EMC Corporation. All Rights Reserved.

This lesson covers link aggregation. First you will learn about link aggregation. Then, you will create a
virtual interface for link aggregation.

144

Slide 9

Understanding Link Aggregation


[Diagram: an application/media server with NIC 1 and NIC 2 connected through LAN ports 1-4 to Data Domain appliance interfaces eth0a and eth1a, forming link aggregation groups 1 and 2]

Link aggregation increases network throughput across a LAN


Link aggregation performance is impacted by:

Link and switch speed


The quantity of data the Data Domain system can process
Out-of-order packets
The number of clients
The number of streams (connections) per client
Module 3: Managing Network Interfaces

Copyright 2013 EMC Corporation. All Rights Reserved.

Using multiple Ethernet network cables, ports, and interfaces (links) in parallel, link aggregation increases network throughput across a LAN or LANs until the maximum speed the system can process is reached. Data processing can thus be faster than when data is sent over individual links. For example, you
can enable link aggregation on a virtual interface (veth1) to two physical interfaces (eth0a and eth0b) in
the link aggregation control protocol (LACP) mode and hash XOR-L2. Link aggregation evenly splits
network traffic across all links or ports in an aggregation group. It does this with minimal impact to the
splitting, assembling, and reordering of out-of-order packets.

145

Aggregation can occur between two directly attached systems (point-to-point and physical or virtual).
Normally, aggregation is between the local system and the connected network device or system. A Data
Domain system is usually connected to a switch or router. Aggregation is handled between the IP layer
(L3 and L4) and the mac layer (L2) network driver. Link aggregation performance is impacted by the
following:
Switch speed: Normally the switch can handle the speed of each connected link, but it may lose
some packets if all of the packets are coming from several ports that are concentrated on one
uplink running at maximum speed. In most cases, this means you can use only one switch for
port aggregation coming out of a Data Domain system. Some network topologies allow for link
aggregation across multiple switches.
The quantity of data the Data Domain system can process.
Out-of-order packets: A network program must put out-of-order packets back in their original
order. If the link aggregation mode allows the packets to be sent out of order, and the protocol
requires that they be put back to the original order, the added overhead may impact the
throughput speed enough that the link aggregation mode causing the out-of-order packets
should not be used.
The number of clients: In most cases, either the physical or OS resources cannot drive data at
multiple Gbps. Also, due to hashing limits, you need multiple clients to push data at multiple
Gbps.
The number of streams (connections) per client can significantly impact link utilization
depending on the hashing used.
A Data Domain system supports two aggregation methods: round robin and balance-xor (you set
it up manually on both sides).
Requirements
Links can be part of only one group.
Aggregation is only between two systems.
All links in a group must have the same speed.
All links in a group must be either half-duplex or full-duplex.
No changes to the network headers are allowed.
You must have a unique address across aggregation groups.
Frame distribution must be predictable and consistent.

146

Slide 10

Creating a Virtual Interface for Link Aggregation

Module 3: Managing Network Interfaces

Copyright 2013 EMC Corporation. All Rights Reserved.

To create a link aggregation virtual interface:


1. Make sure your switch supports aggregation.
2. Select the Hardware tab, then the Interfaces tab.
3. Disable the physical interface where you want to add the virtual interface by selecting the
interface and selecting No from the Enabled menu.
4. From the Create menu, select Virtual Interface.
The Create Virtual Interface dialog box appears.
5. Specify a virtual interface name in the veth text box.

147

10

6. Enter the name in the form vethx, where x is a unique ID (typically one or two digits).
A typical virtual interface name with VLAN and IP alias is veth56.3999.199. The maximum length
of the full name is 15 characters. Special characters are not allowed. Numbers must be between
0 and 9999.
From the General tab, specify the bonding mode by selecting type from the Bonding Type
list.
In this example, aggregate is selected. The registry setting can be different from the bonding
configuration. When you add interfaces to the virtual interface, the information is not sent
to the bonding module until the virtual interface is brought up. Until that time, the registry
and the bonding driver configuration are different. Specify a bonding mode compatible with
the system requirements to which the interfaces are directly attached. The available modes
are:
Round robin: Transmits packets in sequential order from the first available link through the
last in the aggregated group.
Balanced: Sends data over the interfaces as determined by the selected hash method. All
associated interfaces on the switch must be grouped into an EtherChannel (trunk).
LACP: Is similar to Balanced, except for the control protocol that communicates with the
other end and coordinates what links, within the bond, are available. It provides heartbeat
failover.
7. Select an interface to add to the aggregate configuration by clicking the checkbox corresponding
to the interface.
8. Click Next.
The Create Virtual Interface veth name dialog box appears.

148

Slide 11

Creating a Virtual Interface for Link Aggregation

Module 3: Managing Network Interfaces

Copyright 2013 EMC Corporation. All Rights Reserved.

11

To create a link aggregation virtual interface (Continued):


9. Enter an IP address.
10. Enter a netmask address.
The netmask is the subnet portion of the IP address assigned to the interface. The format is
usually 255.255.255.XXX, where XXX is the value that identifies the interface. If you do not
specify a netmask, the Data Domain system uses the netmask format as determined by the
TCP/IP address class (A, B, C) that you are using.
11. Specify the speed and duplex options by selecting either the Autonegotiate Speed/Duplex radio
button or the Manually Configure Speed/Duplex radio button.
The combination of the speed and duplex settings defines the rate of data transfer through the
interface.
12. Select the Autonegotiate Speed/Duplex radio button to allow a NIC to auto-negotiate the line
speed and duplex setting for an interface.

149

13. Select the Manually Configure Speed/Duplex radio button if you want to manually set an
interface data transfer rate.
Duplex options are half-duplex or full-duplex. Speed options are limited to the capabilities of the
hardware. Ensure that all of your network components support the size set with this option.
Optionally select Dynamic Registration (also called DDNS). The dynamic DNS (DDNS) protocol
enables machines on a network to communicate with and register IP addresses on a Data
Domain system DNS server. The DDNS must be registered to enable this option.
14. Click Next.
The Create Virtual Interface Settings summary appears.
15. Ensure that the values listed are correct.
16. Click Finish.
17. Click OK.
Several commands can be used from the command line interface (CLI) to set up and configure link
aggregation on a Data Domain system:
# net aggregate add
Enables aggregation on a virtual interface by specifying the physical interfaces and mode.
Choose the mode compatible with the requirements of the system to which the ports are
attached.
# net aggregate del
Deletes interfaces from the physical list of the aggregate virtual interfaces.
# net aggregate modify
Changes the aggregation configuration on a virtual interface by specifying the physical interfaces
and mode. Choose the mode compatible with the requirements of the system to which the
ports are directly attached.
# net aggregate reset
Removes all physical interfaces from an aggregate virtual interface.
# net aggregate show
Displays basic information on the aggregate setup.
Consult the EMC Data Domain Operating System 5.2 Command Reference Guide for details on using the
net aggregate commands.
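A hedged sketch of the earlier veth1 example (LACP mode with the XOR-L2 hash over eth0a and eth0b); verify the exact argument order in the Command Reference Guide:
# net aggregate add veth1 interfaces eth0a eth0b mode lacp hash xor-L2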

150

Slide 12

Lab 3.2: Configuring Link Aggregation

Module 3: Managing Network Interfaces

Copyright 2013 EMC Corporation. All Rights Reserved.

151

12

Slide 13

Module 3: Managing Network Interfaces

Lesson 3: Link Failover


This lesson covers the following topics:
Understanding Link Failover
Creating a Virtual Interface for Link Failover
Enabling or Disabling Link Failover Interfaces

Module 3: Managing Network Interfaces

Copyright 2013 EMC Corporation. All Rights Reserved.

13

This lesson covers link failover. First you will learn what link failover does and then you will learn how to
create a virtual interface for link failover on a Data Domain system.

152

Slide 14

Understanding Link Failover


[Diagram: an application/media server connected through a network Ethernet switch to a Data Domain appliance, which has an active interface and a standby interface]

Link failover improves network stability and performance by

keeping backups operational during network glitches.


The Data Domain system bonding driver checks the carrier signal
every 0.9 seconds.
If the carrier signal is lost, the active interface switches to a
standby interface.

Module 3: Managing Network Interfaces

Copyright 2013 EMC Corporation. All Rights Reserved.

14

A virtual interface may include both physical and virtual interfaces as members (called interface group
members).
Link failover improves network stability and performance by keeping backups operational during
network glitches.
Link failover is supported by a bonding driver on a Data Domain system. The bonding driver checks the
carrier signal on the active interface every 0.9 seconds. If the carrier signal is lost, the active interface is
changed to another standby interface. An address resolution protocol (ARP) is sent to indicate that the
data must flow to the new interface. The interface can be:
On the same switch
On a different switch
Directly connected

153

Specifications
Only one interface in a group can be active at a time.
Data flows over the active interface. Non-active interfaces can receive data.
You can specify a primary interface. If you do specify a primary interface, it is the active
interface if it is available.
Bonded interfaces can go to the same or different switches.
You do not have to configure a switch to make link failover work.
For a 1 GbE interface, you can put two or more interfaces in a link failover bonding group.
The bonding interfaces can be:
On the same card
Across cards
Between a card and an interface on the motherboard
Link failover is independent of the interface type. For example, copper and optical can be
failover links if the switches support the connections.
For a 10 GbE interface, you can put only two interfaces in a failover bonding group.

154

Slide 15

Creating a Virtual Interface for Link Failover

Module 3: Managing Network Interfaces

Copyright 2013 EMC Corporation. All Rights Reserved.

To create a virtual interface for Link Failover:


1. Go to Hardware > Network > Interfaces
2. Select the Create pull-down menu.
3. Choose Virtual Interface.
4. Enter the virtual interface id.
5. Select General
6. Enter the bonding information.
7. Select the interface(s) for bonding.

155

15

Slide 16

Creating a Virtual Interface for Link Failover

Module 3: Managing Network Interfaces

Copyright 2013 EMC Corporation. All Rights Reserved.

16

Continued from previous slide:


8. Click Next.
9. Enter the IP address and Netmask for the virtual interface.
10. Set the Speed/Duplex, and MTU settings.
11. Click Next.
12. Verify that the information in the settings dialog is correct.
13. Click Finish.
The command line interface (CLI) can also be used to create and modify link failover.
# net failover add
Adds network interfaces to a failover interface.
# net failover del
Deletes network interfaces from a failover interface. The physical interface remains disabled
after being removed from the virtual interface. Use commas, spaces, or both to separate list
entries.

156

# net failover modify


Modifies the primary network interface for a failover interface. A down interface must stay up for the specified amount of time before it is designated up; an up interface must stay down for the specified amount of time before it is designated down. A primary interface cannot be removed from failover. To remove a
primary, use the argument primary <physical-ifname> none.
# net failover reset
Resets a failover interface by removing the associated slave interfaces. Resetting a virtual
interface removes all associated physical interfaces from the virtual interface.
# net failover show
Displays all failover interfaces. This command shows what is configured at the bonding driver. To
see what is in the registry, use the net show settings command option. The registry settings may
be different from the bonding configuration. When interfaces are added to the virtual interface,
the information is not sent to the bonding module until the virtual interface is brought up. Until
that time the registry and the bonding driver configuration differ.
Consult the EMC Data Domain Operating System 5.2 Command Reference Guide for details on using the
net failover commands.
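For example, a sketch of creating the failover group described above and then designating eth0a as the primary interface (interface names are illustrative):
# net failover add veth2 interfaces eth0a eth0b
# net failover modify veth2 primary eth0a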

157

Slide 17

Enabling or Disabling Link Failover Interfaces


1. Click Hardware
2. Click Network
3. Click Interfaces
4. Select Yes or No from the Enabled menu for the appropriate interface

Module 3: Managing Network Interfaces

Copyright 2013 EMC Corporation. All Rights Reserved.

To enable or disable an interface:


1. Go to Hardware > Network > Interfaces
2. Select the Enabled pull-down menu.
3. Choose Yes or No.

158

17

Slide 18

Lab 3.3: Configuring Link Failover

Module 3: Managing Network Interfaces

Copyright 2013 EMC Corporation. All Rights Reserved.

159

18

Slide 19

Module 3: Managing Network Interfaces

Lesson 4: VLAN and IP Alias Interfaces


This lesson covers the following topics:
Introduction to VLAN and IP alias network interfaces
VLAN and IP alias differences
Creating VLAN and IP aliases

Module 3: Managing Network Interfaces

Copyright 2013 EMC Corporation. All Rights Reserved.

19

This lesson covers virtual local area network (VLAN) and internet protocol (IP) alias interfaces. First, you
will learn more about these interfaces and how they differ. Then, you will learn how to enable and
disable them using the Enterprise Manager.

160

Slide 20

Introduction to VLAN and IP Aliases

VLAN and IP aliases identify subnets on a network


VLAN and IP aliases enable LANs to bypass router boundaries
VLAN and IP alias network interfaces are used:
For network security
To segregate network traffic
To speed up network traffic
To organize a network

Module 3: Managing Network Interfaces

Copyright 2013 EMC Corporation. All Rights Reserved.

20

Virtual local area networks (VLANs) manage subnets on a network. VLANs enable a LAN to bypass router
boundaries. IP aliases do the same thing.
Virtual local area network (VLAN) and internet protocol (IP) network interfaces are used to:
Segregate network broadcasting
Provide network security
Segregate network traffic
Speed up network traffic
Organize a network

161

Slide 21

VLANs vs. IP Aliases

[Diagram: a corporate network split into an IT segment (VLAN 100, subnet 192.168.11.X) and an HR segment (VLAN 200, subnet 10.10.10.X), with IP aliases mapping multiple IP addresses onto each VLAN]

IP aliases are easy to implement and are less expensive than VLANs
You can combine VLANs and IP aliases
Module 3: Managing Network Interfaces

Copyright 2013 EMC Corporation. All Rights Reserved.

21

If you are not using VLANs, you can use IP aliases. IP aliases are easy to implement and are less
expensive than VLANs, but they are not true VLANs. For example, you must use one IP address for
management and another IP address to back up or archive data. You can combine VLANs and IP aliases.

162

Slide 22

Creating a VLAN Interface

Module 3: Managing Network Interfaces

Copyright 2013 EMC Corporation. All Rights Reserved.

22

A VLAN tag is the VLAN or IP alias ID.


VLAN tag insertion (VLAN tagging) enables you to create multiple VLAN segments.
You get VLAN tags from a network administrator. In a Data Domain system, you can have up to 4096
VLAN tags. You can create a new VLAN interface from either a physical interface or a virtual interface.
The recommended total number that can be created is 80, although it is possible to create up to 100
interfaces before the system is affected.
You may add your Data Domain system to a VLAN because the switch port it is connected to may be a
member of multiple VLANs, and you want the most direct path to the DD client (backup software) for
minimum latency.

163

To create a VLAN tag from the Enterprise Manager:


1. From the Navigation pane, select the Data Domain system to configure.
2. Click Hardware > Network > Interfaces.
3. Click Create, and select the VLAN Interface option.
The Create VLAN Interface dialog box appears.
4. Specify a VLAN ID by entering a number in the ID field.
The range of a VLAN ID is between 1 and 4095.
You get the VLAN tag from your system administrator.
5. Enter an IP address.
The Internet Protocol (IP) address is the numerical label assigned to the interface. For example,
192.168.10.23.
6. Enter a netmask address.
The netmask is the subnet portion of the IP address assigned to the interface. The format is
typically 255.255.255.###, where the ### are the values that identify the interface. If you do not
specify a netmask, the Data Domain system uses the netmask format as determined by the
TCP/IP address class (A,B,C) you are using.
7. Specify the MTU settings. Specifying the MTU settings sets the maximum transfer unit (MTU)
size for the physical ( or Ethernet) interface. Supported values are from 350 to 9014. For 100
Base-T and gigabit networks, 1500 is the standard default. Click the Default button to return this
setting to the default value. Ensure that all of your network components support the size set
with this option.
8. Specify the dynamic DNS Registration option.
Dynamic DNS (DDNS) is the protocol that allows machines on a network to communicate with,
and register their IP address on, a domain name system (DNS) server. The DDNS must be
registered to enable this option.
9. Click Next.
The Create VLAN Interface Settings summary page appears. The values listed reflect the new
system and interface state.
10. Click Finish.
11. Click OK.
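A comparable CLI sketch, assuming the net create interface option described in the Command Reference Guide (VLAN ID and addresses are illustrative):
# net create interface eth0a vlan 200
# net config eth0a.200 192.168.11.5 netmask 255.255.255.0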

164

Slide 23

Creating an IP Alias Interface

Module 3: Managing Network Interfaces

Copyright 2013 EMC Corporation. All Rights Reserved.

23

You can create a new IP Alias interface from a physical interface, a virtual interface, or a VLAN. When
you do this, you are telling the interface the IP Subnet(s) to which it belongs. This is done because the
switch/router may be connected to many networks, and you want the most direct path to the Data
Domain system.
The recommended total number of IP Aliases, VLAN, physical, and virtual interfaces that can exist on the
system is 80, although it is possible to have up to 100 interfaces.
1. From the Navigation pane, select the Data Domain system to configure.
2. Click the Hardware > Network > Interfaces tabs.
3. Click the Create menu and select the IP Alias option.
The Create IP Alias dialog box appears.
4. Specify an IP Alias ID by entering a number in the eth0a field.
Requirements are: 1 to 4094 inclusive.
5. Enter an IP Address.
The Internet Protocol (IP) Address is the numerical label assigned to the interface. For example,
192.168.10.23

165

6. Enter a Netmask address.


The Netmask is the subnet portion of the IP address assigned to the interface.
The format is typically 255.255.255.000. If you do not specify a netmask, the Data Domain
system uses the netmask format as determined by the TCP/IP address class (A,B,C) you are
using.
7. Specify Dynamic DNS Registration option.
Dynamic DNS (DDNS) is the protocol that allows machines on a network to communicate with,
and register their IP address on, a Domain Name System (DNS) server.
The DDNS must be registered to enable this option. Refer to Registering a DDNS in the DD OS
5.2 Administration Guide for additional information.
8. Click Next.
The Create IP Alias Interface Settings summary page appears. The values listed reflect the new
system and interface state.
9. Click Finish and OK.
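A comparable CLI sketch for an IP alias (alias ID and addresses are illustrative; alias interfaces take the form ifname:alias-id):
# net create interface eth0a alias 1
# net config eth0a:1 192.168.10.24 netmask 255.255.255.0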

166

Slide 24

Module 3: Summary

Link aggregation increases throughput across a network


Aggregation is across two or more network interfaces
Links can be part of only one group
All links in a group must have the same settings
Aggregation and failover use virtual interfaces
Link failover provides high availability by keeping backups operational
during network glitches
A Data Domain system bonding driver checks the carrier signal every
0.9 seconds
If a signal is lost, the active interface changes to a standby interface
For link failover, only one interface in a group can be active at a time
VLAN and IP aliases are used for better network speed, security, and
organization

Module 3: Managing Network Interfaces

Copyright 2013 EMC Corporation. All Rights Reserved.

167

24

168

Slide 1

Module 4: CIFS and NFS

Upon completion of this module, you should be able to:


Configure CIFS on a Data Domain System
Configure NFS on a Data Domain System

Module 4: CIFS and NFS

Copyright 2013 EMC Corporation. All Rights Reserved.

This module focuses on connecting to a Data Domain appliance using the CIFS and NFS protocols.

169

Slide 2

Module 4: CIFS and NFS

Lesson 1: CIFS
This lesson covers the following topics:
Data Access for CIFS
Enabling CIFS Services
CIFS Authentication
Creating a CIFS Share
Accessing a CIFS Share
Monitoring CIFS

Module 4: CIFS and NFS

Copyright 2013 EMC Corporation. All Rights Reserved.

In many cases, as part of the initial Data Domain system configuration, CIFS clients were configured to
access the ddvar and MTree directories. This module describes how to modify these settings and how to
manage data access using the Enterprise Manager and cifs command.
This lesson covers the following topics:
Data Access for CIFS
Enabling CIFS Services
Creating a CIFS Share
Accessing a CIFS Share
Monitoring CIFS

170

Slide 3

Data Access for CIFS

The Enterprise Manager Data Management > CIFS page allows


you to perform major CIFS operations such as:
Enabling and disabling CIFS
Setting authentication
Managing shares
Viewing configuration and share information

From the command line interface (CLI), the cifs command


contains all the options to facilitate CIFS transfers between
backup servers and Data Domain systems, and display CIFS
statistics and status.

Module 4: CIFS and NFS

Copyright 2013 EMC Corporation. All Rights Reserved.

Common Internet File System (CIFS) clients can have access to the system directories on the Data
Domain system. The /data/col1/backup directory is the default destination directory for
compressed backup server data. The /ddvar directory contains Data Domain system core and log files.
Clients, such as backup servers that perform backup and restore operations with a Data Domain System,
need, at a minimum, access to the /data/col1/backup directory. Clients that have administrative
access need to be able to access the /ddvar directory to retrieve core and log files.
The Common Internet File System (CIFS) operates as an application-layer network protocol. It is mainly
used for providing shared access to files, printers, serial ports, and miscellaneous communication
between nodes on a network.
When you configure CIFS, your Data Domain system is able to communicate with MS Windows.

171

To configure a CIFS share, you must:


1. Configure the workgroup mode, or configure the active directory mode.
2. Give a descriptive name for the share.
3. Enter the path to the target directory (for example, /data/col1/mtree1).
The cifs command enables and disables access to a Data Domain system from media servers and
other Windows clients that use the CIFS protocol. For complete information about the cifs command,
see the DD OS 5.2 Command Reference Guide.

172

Slide 4

Enable CIFS Services


1. Click Data Management
2. Click CIFS

Module 4: CIFS and NFS

Copyright 2013 EMC Corporation. All Rights Reserved.

After configuring client access, enable CIFS services, which allow the client to access the system using
the CIFS protocol.
1. For the Data Domain system selected in the Enterprise Manager Navigation pane, click Data
Management > CIFS.
2. In the CIFS Status area, click Enable.
The hostname for the Data Domain system that serves as the CIFS server was set during the system's initial configuration.
A Data Domain system's hostname should match the name assigned to its IP address, or addresses, in the DNS table. Otherwise, there might be problems when the system attempts to join a domain, and authentication failures can occur. If you need to change the Data Domain system's hostname, use the net set hostname command, and also modify the system's entry in the DNS table.
When the Data Domain system acts as a CIFS server, it takes the hostname of the system. For
compatibility, it also creates a NetBIOS name. The NetBIOS name is the first component of the hostname
in all uppercase letters. For example, the hostname jp9.oasis.local is truncated to the NetBIOS name JP9.
The CIFS server responds to both names.

173

From the command line, you can use the cifs enable command to enable CIFS services.
# cifs enable
Enable the CIFS service and allow CIFS clients to connect to the Data Domain system.
For complete information about the cifs enable command, see the DD OS 5.2 Command Reference
Guide.

174

Slide 5

CIFS Authentication

Module 4: CIFS and NFS

Copyright 2013 EMC Corporation. All Rights Reserved.

The Enterprise Manager Configure Authentication dialog box allows you to set the authentication
parameters that the Data Domain system uses for working with CIFS.
The Data Domain system can join the active directory (AD) domain or the NT4 domain, or be part of a
workgroup (the default). If you did not use the Enterprise Manager's Configuration Wizard to set the
join mode, use the procedures in this section to choose or change a mode.
The Data Domain system must meet all active-directory requirements, such as a clock time that differs
no more than five minutes from that of the domain controller.
The workgroup mode means that the Data Domain system authenticates CIFS clients using local user
accounts defined on the Data Domain system.

175

You can also set authentication for CIFS shares using the command line interface (CLI):
# cifs set authentication active-directory <realm> { [<dc1> [<dc2>
...]] | * }
Set authentication to the Active Directory. The realm must be a fully qualified name. Use
commas, spaces, or both to separate entries in the domain controller list. Security officer
authorization is required for systems with Retention Lock Compliance.
Note: Data Domain recommends using the asterisk to set all controllers instead of entering
them individually.
When prompted, enter a name for a user account. The type and format of the name depend on
whether the user is inside or outside the company domain.
For user Administrator inside the company domain, enter the name only: administrator.
For user JaneDoe in a non-local, trusted domain, enter the username and domain:
jane.doe@trusteddomain.com. The account in the trusted domain must have permission to
join the Data Domain system to your company domain.
If DDNS is enabled, the Data Domain system automatically adds a host entry to the DNS server.
It is not necessary to create the entry manually when DDNS is enabled.
If you set the NetBIOS hostname using the command cifs set nb-hostname, the entry is
created for the NetBIOS hostname only, not the system hostname. Otherwise, the system
hostname is used.
# cifs set authentication workgroup <workgroup>
Set the authentication mode to workgroup for the specified workgroup name.
For complete information about the cifs set authentication command, see the DD OS 5.2
Command Reference Guide.

176

Slide 6

Creating a CIFS Share

Module 4: CIFS and NFS

Copyright 2013 EMC Corporation. All Rights Reserved.

When creating shares, you must assign client access to each directory separately and remove access
from each directory separately. For example, a client can be removed from /ddvar and still have
access to /data/col1/backup.
Note: If Replication is to be implemented, a Data Domain system can receive backups from both CIFS
clients and NFS clients as long as separate directories are used for each. Do not mix CIFS and NFS data in
the same directory.
To share a folder using the CIFS protocol on a Data Domain system:
1. From the Navigational pane, select a Data Domain system to configure shares.
2. Click Data Management > CIFS tabs to navigate to the CIFS view.
3. Ensure authentication has been configured.
4. On the CIFS client, set shared directory permissions or security options.
5. On the CIFS view, click the Shares tab.
6. Click Create.
The Create Shares dialog box appears.

177

7. In the Create Shares dialog box, enter the following information:


Share Name: A descriptive name for the share.
Directory Path: The path to the target directory (for example, /data/col1/backup/dir1).
Comment: A descriptive comment about the share.
8. Add a client by clicking the plus sign (+) in the Clients area. The Client dialog box appears. Enter
the name of the client in the Client text box and click OK. No blanks or tabs (white space)
characters are allowed. Repeat this step for each client that you need to configure.
9. To modify a User or Group name, in the User/Group list, click the checkbox of the user or group
and click edit (pencil icon) or delete (X). To add a user or group, click (+), and in the User/Group
dialog box, select the Type radio button for User or Group, and enter the user or group name.
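The same share can be created from the CLI; the share name, path, and client list below are illustrative (see cifs share create in the Command Reference Guide):
# cifs share create share1 path /data/col1/mtree1 clients *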

178

Slide 7

Accessing a CIFS Share

Module 4: CIFS and NFS

Copyright 2013 EMC Corporation. All Rights Reserved.

From a Windows Client, you can access CIFS shares on a Data Domain system either from a Windows
Explorer window or at the DOS prompt (Run menu).
From a Windows Explorer window:
1. Select Map Network Drive
2. Select a Drive letter to assign the share
3. Enter the DD system to connect to and the share name (\\<DD_Sys>\<Share>), for
example, \\host1\backup
4. Check the box Connect using a different username, if necessary
5. Click Finish
If Connect using a different username was checked, you will be prompted for your Data Domain
username and password.
From the DOS Prompt or Run menu, enter:
> net use drive: \\<DD_Sys>\<Share> /USER:<DD_Username>
You will be prompted for the password to your Data Domain user account.

179

For example, enter:


> net use H: \\host1\backup /USER:dd02
This command maps the backup share from Data Domain system host1 to drive H on the Windows
system and gives the user named dd02 access to the \\<DD_System>\backup directory.

180

Slide 8

Monitoring CIFS

Module 4: CIFS and NFS

Copyright 2013 EMC Corporation. All Rights Reserved.

The CIFS tab of the Data Domain Enterprise Manager provides information about the configuration and
status of CIFS shares.
Easily viewable are the number of open connections, open files, connection limit and open files limit per
connection. Click the Connection Details link to view the details about active connections to the CIFS
shares.

181

You can also use the command line interface (CLI) to view details and statistics about CIFS shares.
# cifs show active
Display all active CIFS clients.
# cifs show clients
Display all allowed CIFS clients for the default /ddvar administrative share and the default
/backup data share.
# cifs show config
Display the CIFS configuration.
# cifs show detailed-stats
Display statistics for every individual type of SMB operation, display CIFS client statistics, and
print a list of operating systems with their client counts.
The list counts the number of different IP addresses connected from each operating system. In
some cases, the same client may use multiple IP addresses.
Output for CIFS Client Type shows Miscellaneous clients, where Yes means the displayed list of clients is incomplete and No means the list is complete, and Maximum connections, where the value is the maximum number of connections since the last reset.
# cifs show stats
Show CIFS statistics.
For complete information about the cifs show command, see the DD OS 5.2 Command Reference
Guide.

182

Slide 9

Lab 4.1: Configuring CIFS on a Data Domain System

Module 4: CIFS and NFS

Copyright 2013 EMC Corporation. All Rights Reserved.

183

Slide 10

Module 4: CIFS and NFS

Lesson 2: NFS
This lesson covers the following topics:
NFS Exports
Configuring NFS
Monitoring NFS

Module 4: CIFS and NFS

Copyright 2013 EMC Corporation. All Rights Reserved.

This lesson covers the configuration and monitoring of NFS exports on a Data Domain system.

184


Slide 11

NFS Exports

Network File System (NFS) clients can have access to the system directories or MTrees on the
Data Domain system.
 /backup is the default destination for non-MTree backup data.
 The /data/col1/backup path is the root destination when using MTrees for backup data.
 The /ddvar directory contains Data Domain System core and log files.
 Clients, such as backup servers that perform backup and restore operations with a Data
Domain System, need access to the /backup or /data/col1/backup areas.
 Clients that have administrative access need access to the /ddvar directory to retrieve core
and log files.


The Network File System (NFS) is a distributed file system protocol originally developed by Sun
Microsystems in 1984. It allows a user on a client computer to access files over a network in a manner
similar to how local storage is accessed. NFS, like many other protocols, builds on the Open Network
Computing Remote Procedure Call (ONC RPC) system. The Network File System is an open standard
defined in RFCs, allowing anyone to implement the protocol.
Network File System (NFS) clients can have access to the system directories or MTrees on the Data
Domain system.
/backup is the default destination for non-MTree compressed backup server data.
The /data/col1/backup path is the root destination when using MTrees for compressed
backup server data.
The /ddvar directory contains Data Domain System core and log files.
Clients, such as backup servers that perform backup and restore operations with a Data Domain System,
need access to the /backup or /data/col1/backup areas. Clients that have administrative access
need access to the /ddvar directory to retrieve core and log files.

185

Slide 12

Configuring NFS
1. Click Data Management
2. Click NFS
3. Click Create
4. Click + to add clients

To configure an NFS export:


1. Select a Data Domain system in the left navigation pane.
2. Go to Data Management > NFS > Exports.
3. Click Create.
4. Enter a path name for the export.
5. In the Clients area, select an existing client or click the plus (+) icon to create a client.
The Create NFS Exports dialog box appears.
6. Enter a server name in the text box:
Enter fully qualified domain names, hostnames, or IP addresses.
A single asterisk (*) as a wild card indicates that all backup servers are used as clients.
Clients given access to the /data/col1/backup directory have access to the entire
directory.
A client given access to a subdirectory of /data/col1/backup has access only to that
subdirectory.

186

A client can be a(n):


fully-qualified domain hostname
IP address
IP address with either a netmask or length
NIS netgroup name with the prefix @, or an asterisk (*) wildcard with a domain name,
such as *.yourcompany.com
7. Select the checkboxes of the NFS options for the client.
Read-only permission.
Default requires that requests originate on a port that is less than IPPORT_RESERVED (1024).
Map requests from UID or GID 0 to the anonymous UID or GID
Map all user requests to the anonymous UID or GID.
Use default anonymous UID or GID.
The nfs command enables you to add NFS clients and manage access to a Data Domain system. It also
enables you to display status information, such as verifying that the NFS system is active, and the time
required for specific NFS operations.
# nfs add <path > <client-list> [(<option-list>)]
Add NFS clients that can access the Data Domain system. A client can be a fully qualified domain
hostname, class-C IP addresses, IP addresses with netmasks or length, an NIS netgroup name
with the prefix @, or an asterisk wildcard for the domain name, such as *.yourcompany.com.
An asterisk by itself means no restrictions. A client added to a subdirectory under /backup has
access only to that subdirectory.
The <options-list> is comma or space separated, enclosed by parentheses. If no option is
specified, the default options are rw, root_squash, no_all_squash, and secure.
In GDA configurations, only /ddvar is exported. The export of /data shares is not supported.
# nfs disable
Disable all NFS clients.
# nfs enable
Allow all NFS-defined clients to access the Data Domain system.
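As an illustrative example (the hostname is hypothetical), the following exports the MTree-based backup area to a single backup server with the root_squash default overridden, then enables NFS:
# nfs add /data/col1/backup backupserver01.yourcompany.com (rw,no_root_squash)
# nfs enable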

187

Slide 13

Monitoring NFS
1. Click Data Management
2. Click the NFS tab
3. Click Active Clients

You can use the Data Domain Enterprise Manager to monitor NFS client status and NFS configuration:
1. Click Data Management
2. Click the NFS tab
The top pane shows the operational status of NFS, for example, NFS is currently active and
running.

188

You can also use the command line interface (CLI) to monitor NFS client status and statistics.
# nfs show active
List clients active in the past 15 minutes and the mount path for each. Allow all NFS-defined
clients to access the Data Domain system.
# nfs show clients
List NFS clients allowed to access the Data Domain system and the mount path and NFS options
for each.
# nfs show detailed-stats
Display NFS cache entries and status to facilitate troubleshooting.
# nfs show histogram
Display NFS operations in a histogram. Users with user role permissions may run this command.
# nfs show port
Display NFS port information. Users with user role permissions may run this command.
# nfs show stats
Display NFS statistics.
# nfs status
Enter this option to determine if the NFS system is operational. When the file system is active
and running, the output shows the total number of NFS requests since the file system started, or
since the last time the NFS statistics were reset.
For complete information about the nfs commands, see the DD OS 5.2 Command Reference Guide.
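As a quick end-to-end check, you might verify NFS on the Data Domain system and then mount the export from a client. The mount command below is a generic Linux example rather than a DD OS command, and the mount point is illustrative:
# nfs status
# nfs show clients
Then, on the Linux client:
# mount -t nfs host1:/data/col1/backup /mnt/ddbackup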

189

Slide 14

Lab 4.2: Configuring NFS on a Data Domain System


190


Slide 15

Module 4: Summary

 When you configure CIFS, your Data Domain system is able to communicate with MS Windows.
 When you configure NFS, your Data Domain system is able to communicate with Unix-based systems.
 The /data/col1/backup directory is the directory for backup data.
 The /ddvar directory contains Data Domain system core and log files.

191


192

Slide 1

Module 5: File System and Data Management

Upon completion of this module, you should be able to:


Describe and configure MTrees
Describe and perform snapshots
Describe and perform a fast copy
Describe and perform file system cleaning
Describe file system space usage


In this module, you will learn about managing data with a Data Domain system.
Describe and configure MTrees
Describe and perform snapshots
Describe and perform a fast copy
Describe and perform file system cleaning
Describe file system space usage

193

Slide 2

Module 5: File System and Data Management
Lesson 1: Configuring and Monitoring MTrees
This lesson covers the following topics:
MTree use and benefits
Soft and hard MTree quotas
Structured lab using MTrees


This lesson covers configuring and monitoring MTrees for storing backups within a Data Domain file
system. Topics include:
MTree use and benefits
Soft and hard MTree quotas
You will have a chance to configure MTrees, as well as set and monitor quotas on a Data Domain system
in a structured lab.

194

Slide 3

MTrees
[Diagram: a legacy /backup/ directory with /hr and /sales subdirectories, where all subdirectories are subject to the same permissions, policies, and reporting, contrasted with MTrees under /data/col1/ (/backup, /hr, /sales), each of which can be managed individually.]

MTrees (Management Trees) are used to provide more granular management of data so different types
of data, or data from different sources, can be managed and reported on, separately. Various backup
operations are directed to individual MTrees. For example, you can configure directory export levels and
quotas to separate and manage backup files by department.
Before MTrees were implemented, subdirectories under a single /backup directory were created to
keep different types of data separate. Data from different sources, departments, or locales were backed
up to separate subdirectories under /backup but all subdirectories were subject to the same
permissions, policies, and reporting.
With MTrees enabled, data can now be backed up to separately managed directory trees, MTrees. A
static MTree, /backup, is still created by the file system, but cannot be removed or renamed.
Additional MTrees can be configured by the system administrator under /data/col1/ (col stands for
collection). You can still create a subdirectory under any MTree, but it will be subject to the same
permissions, policies, and reporting as the MTree in which it resides.

195

Slide 4

Benefits of MTrees

 Space and deduplication rate reporting by MTree
 Independent replication scheduling (MTree replication)
 Independent snapshot schedules
 MTree-specific retention lock
 MTree-specific compression types
 Quotas to limit the logical space used by a specific MTree

MTrees provide more granular reporting of space and deduplication rates. In the case where different
departments or geographies back up to the same Data Domain system, each department or
geography could have its own independent storage location, each with different choices for
compression and replication.
The term, snapshot, is a common industry term denoting the ability to record the state of a storage
device or a portion of the data being stored on the device, at any given moment, and to preserve that
snapshot as a guide for restoring the storage device, or portion thereof. Snapshots are used extensively
as a part of the Data Domain data restoration process. With MTrees, snapshots can be managed at a
more granular level.
Retention lock is an optional feature used by Data Domain systems to securely retain saved data for a
given length of time, protecting it from accidental or malicious deletion. The retention lock feature can
now be applied at the MTree level.
Another major benefit is the ability to limit the logical (pre-comp) space used by a specific MTree through
quotas.

196

Slide 5

MTrees
 Data Domain systems support up to 100 MTrees.
 More than 14 simultaneous MTrees engaged in read or write streams will degrade performance.
 Nothing can be added to the /data/ directory.
 /data/, /data/col1/, and /data/col1/backup cannot be deleted or renamed.
 MTrees are created only under /data/col1/.
 Subdirectories can still be created under /data/col1/backup.
 Subdirectories can be created within user-created MTrees. Reporting is cumulative for the entire MTree.

Although a Data Domain system supports a maximum of 100 MTrees, system performance might
degrade rapidly if more than 14 MTrees are actively engaged in read or write streams. The degree of
degradation depends on overall I/O intensity and other file system loads. For optimum performance,
constrain the number of simultaneously active MTrees to a maximum of 14. Whenever possible,
aggregate operations on the same MTree into a single operation.
Regular subdirectories can be configured under /data/col1/backup as allowed in prior versions of
DDOS. Subdirectories can also be configured under any other configured MTree. Although you can
create additional directories under an MTree, the Data Domain system recognizes and reports on the
cumulative data contained within the entire MTree.
You cannot add data or directories directly to /data or /data/col1. You can add MTrees only under
/data/col1. /data, /data/col1, and /data/col1/backup cannot be deleted or renamed.

197

Slide 6

MTrees with CIFS and NFS

 NFS and CIFS can access:
/data/col1/<MTreeName>
/data/col1/<MTreeName>/arbitrary/subdirectory/path
 Other protocols have special storage requirements within the MTree structure and are discussed in their respective modules.

NFS and CIFS can access /data and all of the MTrees beneath /col1 by configuring normal CIFS
shares and NFS exports.
VTL and DD Boost have special storage requirements within the MTree structure and are discussed in
later modules.

198

Slide 7

MTree Quotas


MTree quotas allow you to set limits on the amount of logical, pre-comp space used by individual
MTrees. Quotas can be set for MTrees used by CIFS, NFS, VTL, or DD BOOST data.
There are two types of quotas:

Soft limit: When this limit is reached, an alert is generated through the system, but operations
continue as normal.

Hard limit: When this limit is reached, any backup in progress to this MTree fails. An
alert is also generated through the system, and an out of space error (EMOSP for VTL) is
reported to the backup app. In order to resume backup operations after data within an MTree
reaches a hard limit quota, you must either delete sufficient content in the MTree, increase the
hard limit quota, or disable quotas for the MTree.

You can set a soft limit, a hard limit, or both soft and hard limits. Quotas work using the amount of
logical space (pre-comp, not physical space) allocated to an individual MTree. The smallest quota that
can be set is 1 MiB.

199

An administrator can set the storage space restriction for an MTree to prevent it from consuming excess
space. The Data Management > Quota page shows the administrator how many MTrees have no soft or
hard quotas set, and for MTrees with quotas set, the percentage of pre-compressed soft and hard limits
used.
The entire quota function is enabled or disabled from the Quota Settings window.
Quotas for existing MTrees are set by selecting the Configure Quota button.

200

Slide 8

Creating MTrees in Enterprise Manager


To create an MTree in the Enterprise Manager:


1. Click Data Management > MTree > Create.
A Create MTree dialog will appear.
2. Type the name of the MTree you are creating in the MTree name field.
3. Click OK to complete the MTree creation.
Setting MTree Quotas
MTree quotas can be set at the same time that an MTree is created, or they can be set after creating the
MTree. Quotas can be set and managed using the Enterprise Manager or the CLI.
The advantage of MTree operations is that quotas can be applied to a specific MTree as opposed to the
entire file system.

201

Related CLI commands:


# mtree create
Creates an MTree
# mtree delete
Deletes an MTree
# mtree undelete
Undeletes an MTree
# mtree list
Lists the MTrees
# quota disable
Disables quota function
# quota enable
Enables quota function
# quota reset
Resets quota limits to none
# quota set
Sets quota limits
# quota show
Lists quotas for MTrees and storage-units
# quota status
Shows status for quota function
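As a brief sketch of how the MTree and quota commands fit together (the MTree name and limits are illustrative, and exact argument forms should be confirmed in the DD OS 5.2 Command Reference Guide):
# mtree create /data/col1/hr
# quota enable
# quota set mtrees /data/col1/hr soft-limit 750 MiB hard-limit 1000 MiB
# quota show mtrees /data/col1/hr
This creates the hr MTree, turns on the quota function, applies pre-comp limits, and verifies the result.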

202

Slide 9

Creating MTrees in Enterprise Manager (Continued)


When the MTree is created, it appears in the list of MTrees alphabetically by name.
As data fills the MTree, Data Domain Enterprise Manager will display graphically and by percentage the
quota hard limit. You can view this display at Data Management > MTree. The MTree display presents
the list of MTrees, quota hard limits, daily and weekly pre-comp and post-comp amounts and ratios.

203

Slide 10

Monitoring MTree Usage


Scroll further down the MTree tab and you see three additional tabs: Summary, Space Usage, and Daily
Written.
Selecting an MTree from the list will display a summary of that MTree. In the Summary tab you can also
rename the MTree, adjust the quotas, and create an NFS export.
The Space Usage tab displays a graph representing the amount of space used in the selected MTree over
the selected duration (7, 30, 60, or 120 days).
Click the Daily Written tab, and you see a graph depicting the amount of space written in the selected
MTree over a selected duration (7, 30, 60, or 120 days).
Note: You must have the most current version of Adobe Flash installed and enabled with your web
browser in order to view these reports.
The related pre-, post-, and total compression factors over the same time period are also reported.

204

Slide 11

Monitoring MTree Usage


MTree overview pane


Data Domain systems not only provide improved control over backups using MTrees, the system also
provides data monitoring at the MTree level.
Under Data Management > MTree is a summary tab that provides an at-a-glance view of all configured
MTrees, their quota hard limits (if set), pre- and post-comp usage, as well as compression ratios for the
last 24 hours, the last 7 days, and current weekly average compression.
Select an MTree, and the Summary pane presents current information about the selected MTree.
Note: The information on this summary page is delayed by at least 10 minutes.

205

Slide 12

Monitoring MTree Usage


MTree Quota Alerts


If a quota-enabled MTree fills with data, the system will generate soft and hard limit alerts when a soft
or hard limit in a specific MTree is reached.

Soft limit: When this limit is reached, an alert is generated through the system, but operations
continue as normal.

Hard limit: When this limit is reached, any backup in progress to this MTree fails. An
alert is also generated through the system, and an out of space error (EMOSP for VTL) is
reported to the backup app. In order to resume backup operations after data within an MTree
reaches a hard limit quota, you must delete sufficient content in the MTree, increase the hard
limit quota, or disable quotas for the MTree.

These alerts are reported in the Data Domain Enterprise Manager > Status > Summary > Alerts pane in
the file system alerts. Details are reported in the Status > Alerts > Current Alerts and Alerts History tabs.
When an alert is reported, you will see the status as posted. After the alert is resolved, you will see the
status as cleared.

206

Slide 13

Monitoring MTree Usage


MTree Summary Pane


A Data Domain system provides control through individual MTree organization. You can also monitor
system usage at the same MTree level.
Under Data Management > MTree you find a summary tab providing an at-a-glance view of all
configured MTrees, their quota hard limits (if set), pre- and post-comp usage, as well as compression
ratios for the last 24 hours, the last 7 days, and current weekly average compression.
Below the list of MTrees, the MTree Summary pane shows at-a-glance the settings associated with the
selected MTree. In this pane, you can also perform the following on the selected MTree:
Rename the MTree
Configure quotas, hard and soft
Create an NFS export

207

On the same display below the summary pane, you can also find panes that monitor MTree replication,
snapshots and retention lock for the selected MTree. This course covers the MTree replication pane and
the retention lock pane in a later module.
You can control the snapshot schedules associated with the selected MTree. You can also see at-a-glance the total number of snapshots collected, expired, and unexpired, as well as the oldest, newest,
and next scheduled snapshot.

208

Slide 14

Monitoring MTrees Using the Command Line


Show MTree list with pre-comp space and quotas
# quota show all
sysadmin@ddsystem-03# quota show all
MTree                      Pre-Comp (MiB)   Soft-Limit (MiB)   Hard-Limit (MiB)
------------------------   --------------   ----------------   ----------------
/data/col1/backup                    2465   none               none
/data/col1/dev                        605   1500               2000
/data/col1/engineering                  0   200                250
/data/col1/HR                         998   750                1000
/data/col1/sales                      714   1000               2000
/data/col1/support                   1924   2000               3000
------------------------   --------------   ----------------   ----------------


The reports shown in Data Management > MTree are delayed at least fifteen minutes. Real time
reporting is available only through the command line interface (CLI) using the quota show command.
As data transfers to any MTree, you can use quota show all to view a nearly instant update of the
pre-comp amount change.
In this example, /data/col1/HR has exceeded the soft-limit and nearly reached the hard-limit.

209

Slide 15

Monitoring MTrees Using the Command Line


Display the current alert messages
# alerts show current
sysadmin@ddsystem-03# alerts show current
Alert Id   Time                      Severity   Class        Object                Message
--------   -----------------------   --------   ----------   -------------------   -------------------------------
25         Thu Oct 4 09:48:52 2012   WARNING    Filesystem   MTree=/data/col1/HR   MTree Quota Soft limit reached.
--------   -----------------------   --------   ----------   -------------------   -------------------------------
There is 1 active alert.


After an MTree exceeds the value set as a soft-limit quota, the Data Domain system generates an alert
warning.
In this example, /data/col1/HR has exceeded the soft-limit and the system has generated the alert
warning.
From the command line, you can review current alerts by issuing the alerts show current command. In
this case, there is only one current system alert showing that /data/col1/HR has reached its quota
soft limit.
In the Data Domain Enterprise Manager, you can view alerts by clicking Status > Alerts > Current Alerts.
There are three ways to clear a quota limit alert: remove data stored in the MTree, increase the quota
limit, or turn quota limits off.

210

Related CLI commands:


# quota disable
Disables MTree quotas.
# quota enable
Enables MTree quotas.
# quota reset
Resets quota limits.
# quota set
Sets quota limits.
# quota status
Gets the current quota status.
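For instance, to clear the soft-limit alert from the example above by raising the HR MTree's limits (the new values here are illustrative only):
# quota set mtrees /data/col1/HR soft-limit 1200 MiB hard-limit 1500 MiB
Alternatively, delete data from the MTree, or turn quota limits off with quota disable.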

211

Slide 16

Lab 5.1: Configuring MTrees and Quotas


212


Slide 17

Module 5: File System and Data Management
Lesson 2: Snapshot Operations
This lesson covers the following topics:
Snapshot definition and benefits
Basic snapshot operations: creation, schedule, and expiration


This lesson covers snapshot operations and their use in a Data Domain file system. Topics include:
Snapshot definition, use, and benefits
Basic snapshot operations: creation, schedule, and expiration
You will have a chance to configure and create a snapshot on a Data Domain system in a structured lab.

213

Slide 18

What is a Snapshot?
[Diagram: two identical directory trees under /data/col1/ (/backup, /HR, /sales, /support), representing the live file system and its point-in-time snapshot copy.]

Snapshot is a common industry term denoting the ability to record the state of a storage device or a
portion of the data being stored on the device, at any given moment, and to preserve that snapshot as a
guide for restoring the storage device, or portion thereof. A snapshot primarily creates a point-in-time
copy of the data. Snapshot copy is done instantly and made available for use by other applications such
as data protection, data analysis and reporting, and data replication applications. The original copy of
the data continues to be available to the applications without interruption, while the snapshot copy is
used to perform other functions on the data.
Snapshots provide an excellent means of data protection. The trend towards using snapshot technology
comes from the benefits that snapshots deliver in addressing many of the issues that businesses face.
Snapshots enable better application availability, faster recovery, and easier back up management of
large volumes of data.

214

Snapshot benefits:
Snapshots initially do not use many system resources.
Note: Snapshots will continue to place a hold on all data they reference even when the backups
have expired.
Snapshots save a read-only copy of a designated MTree at a specific point in time.
Snapshots are useful for saving a copy of MTrees at specific points in time, for instance, before
a Data Domain OS upgrade, which can later be used as a restore point if files need to be
restored from that specific point in time. Use the snapshot command to take an image of an
MTree, to manage MTree snapshots and schedules, and to display information about the status
of existing snapshots.
You can create multiple snapshot schedules at the same time or create them individually as
you choose.
The maximum number of snapshots allowed to be stored on a Data Domain system is 750 per MTree.
You will receive a warning when the number of snapshots reaches 90% of the allowed number (675-749)
in a given MTree. An alert is generated when you reach the maximum snapshot count.

215

Slide 19

What is a Snapshot?

[Diagram: a production file in /HR and a snapshot of it taken at 22:24 GMT. The snapshot originally copies only the metadata pointers to the file data segments.]

A snapshot saves a read-only copy of the designated MTree at a specific point in time where it can later
be used as a restore point if files need to be restored from that specific point in time.
In a snapshot, only the pointers to the production data being copied are recorded at a specific point in
time. In this case, 22:24 GMT. The copy is extremely quick and places very little load on the
production systems to copy this data.

216

Slide 20

What is a Snapshot?
[Diagram: the production file in /HR after modification, alongside the snapshot of the unmodified file taken at 22:24 GMT. When changes are made to the file data, additional blocks are added and pointers to the changed data are maintained in the production data logs. The snapshot maintains pointers to the original, point-in-time data. No data is overwritten or deleted.]

When production data is changed, additional blocks are written, and pointers are changed to access the
changed data. The snapshot maintains pointers to the original, point-in-time data. All data remains on
the system as long as pointers reference the data.
Snapshots are a point-in-time view of a file system. They can be used to recover previous versions of
files, and also to recover from an accidental deletion of files.

217

Slide 21

Snapshot Operations Overview

[Diagram: an original copy and a snapshot copy of /HR, with the snapshot taken at 22:24 GMT. Snapshots of /data/col1/HR are stored under /data/col1/HR/.snapshot/snap001/[files], 002/[files], 003/[files], and so on; each directory in the MTree has a copy of its snapshot data. Similarly, snapshots of /data/col1/backup/[files] are stored under /data/col1/backup/.snapshot/snap001/[files].]

As an example, snapshots for the MTree named backup are created in the system directory
/data/col1/backup/.snapshot. Each directory under /data/col1/backup also has a
.snapshot directory with the name of each snapshot that includes the directory. Each MTree has the
same type of structure, so an MTree named HR would have a system directory
/data/col1/HR/.snapshot, and each subdirectory in /data/col1/HR would have a
.snapshot directory as well.
Use the snapshot feature to take an image of an MTree, to manage MTree snapshots and schedules, and
to display information about the status of existing snapshots.
Note: If only /data is mounted or shared, the .snapshot directory is not visible. The .snapshot directory
is visible when the MTree itself is mounted.

218

Slide 22

Snapshot Operations


To create a Snapshot:
1. Go to Data Management > Snapshots
Select the MTree from the dropdown list.
If snapshots are listed, you can search by using a search term in the Filter By Name or Year
field.
You can modify the expiration date, rename a snapshot or immediately expire any number
of selected snapshots from the Snapshots pane.
2. Click Create. A snapshot Create dialog appears.
3. Name the snapshot, and set an expiration date. If you do not set a date, the snapshot will not
release the data to which it is pointing until you manually remove the snapshot.
You can perform modify, rename, and delete actions using the same interface in the Snapshots
tab.

219

Related CLI commands:


# snapshot expire
Sets or resets the retention time of a snapshot. Expires a snapshot. If you want to expire the
snapshot immediately, use the snapshot expire operation with no options. An expired snapshot
remains available until the next file system clean operation.
# snapshot rename
Renames a snapshot
# snapshot list mtree <mtree-path>
Displays a list of snapshots of a specific MTree. The display shows the snapshot name, the
amount of pre-compression data, the creation date, the retention date, and the status. The
status may be blank or expired.
# snapshot create
Creates a snapshot.
# snapshot schedule create
Schedules when snapshots are taken.
# snapshot schedule del
Removes MTrees from a snapshot schedule.
# snapshot schedule destroy
Deletes a snapshot schedule.
# snapshot schedule modify
Modifies the existing snapshot schedule.
# snapshot schedule reset
Deletes all snapshot schedules.
# snapshot schedule show
Shows all snapshot schedules.
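As an illustrative example of taking and listing a snapshot from the CLI (the snapshot name, MTree, and retention period are placeholders; verify the exact argument forms in the DD OS 5.2 Command Reference Guide):
# snapshot create pre-upgrade mtree /data/col1/hr retention 30days
# snapshot list mtree /data/col1/hr
This takes a snapshot of the hr MTree that expires after 30 days, then lists that MTree's snapshots.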

220

Slide 23

Creating Snapshot Schedules


To create a schedule for a series of snapshots:


1. From the Schedules tab, click Create.
2. Follow the Snapshot Schedule Wizard to define a name, naming pattern, the schedule for
recurring snapshot events, and the retention period before the snapshots expire.
A summary window appears allowing you to approve the schedule.
3. Click Finish to confirm the schedule.
Snapshots occur as scheduled. Scheduled snapshots appear in the list below the Schedules tab.

221


Related CLI commands:


# snapshot schedule add
Adds multiple MTrees to a single snapshot schedule.
# snapshot schedule create
Creates a snapshot schedule for multiple MTrees. Command arguments determine the duration
of the schedule.
# snapshot schedule del
Removes a list of MTrees from a schedule.
# snapshot schedule destroy
Removes the name of a schedule.
# snapshot schedule modify
Modifies a snapshot schedule. Command arguments determine the duration of the schedule.
# snapshot schedule reset
Immediately deletes all snapshot schedules.
CAUTION!: This command deletes the previous schedule without prompting the user.
# snapshot schedule show
Shows schedules associated with a selected MTree. To show a list of schedules, enter the same
command with no options.
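A sketch of creating and verifying a schedule might look like the following (the schedule name, MTree, days, time, and retention period are placeholders, and the argument order may differ by DD OS release, so confirm against the Command Reference Guide):
# snapshot schedule create hr-nightly mtrees /data/col1/hr days mon,wed,fri time 0300 retention 2weeks
# snapshot schedule show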

222

Slide 24

Monitoring MTree Usage


MTree Snapshots Pane


Immediately below the MTree list, in the summary pane, you can view the Snapshot pane that monitors
snapshots for the selected MTree.
The Snapshots pane in the MTree summary page allows you to see at-a-glance, the total number of
snapshots collected, expired, and unexpired, as well as the oldest, newest, and next scheduled snapshot
within a given MTree.
You can associate configured snapshot schedules with the selected MTree name. Click Assign Snapshot
Schedules, select a schedule from the list of snapshot schedules, and click OK to assign it. You can
create additional snapshot schedules if needed.

223

Slide 25

Lab 5.2: Configuring Snapshots


224


Slide 26

Module 5: File System and Data Management
Lesson 3: Fast Copy
This lesson covers the following topics:
Fast copy overview
Configuring and performing fast copy operations


This lesson covers fast copy operations and their use in a Data Domain file system. Topics include:
Fast copy definition, use, and benefits.
Basic fast copy operations: creation, schedule, and expiration.
You will have a chance to configure and create a fast copy on a Data Domain system in a structured lab.

225

Slide 27

Fast Copy

[Diagram: the /hr MTree contains snapshots 10-31-2012 and 10-15-2012 under its /.snapshot directory. The 10-31-2012 snapshot is fast copied using source /data/col1/hr/.snapshot/10-31-2012 and destination /data/col1/backup/recovery.]

Fast copy is a function that makes an alternate copy of your backed up data on the same Data Domain
system. Fast copy is very efficient at making duplicate copies of pointers to data by using the DD OS
snapshot function, with only 1% to 2% overhead needed to write pointers to the original data.
Sometimes, access to production backup data is restricted. Fast copy makes all fast-copied data
readable and writable, making this operation handy for data recovery from backups.
The difference between snapshots and fast-copied data is that the fast copy duplicate is not a point-in-time
duplicate. Any changes that are made during the data copy, in either the source or the target
directories, will not be duplicated in the fast copy.
Note that a fast copy is a read/write copy of the source as it existed at the time the copy was made, while
a snapshot is read-only.

226

Fast copy makes a copy of the pointers to data segments and structure of a source to a target directory
on the same Data Domain system. You can use the fast copy operation to retrieve data stored in
snapshots. In this example, the /hr MTree contains two snapshots in the /.snapshot directory. One
of these snapshots, 10-31-2012, is fast copied to /backup/recovery. Only pointers to the actual
data are copied, adding a 1% to 2% increase in actual used data space. All of the referenced data is
readable and writable. If the /hr MTree or any of its contents is deleted, no data referenced in the fast
copy is deleted from the system.

227

Slide 28

Perform a Fast Copy


To perform a fast copy from the Enterprise Manager:


1. Navigate to Data Management > File System > More Tasks > Fast Copy
2. Enter the data source and the destination (target location).
3. Enter the pathname for the directory where the data to be copied resides.
If you want to copy a snapshot created in the hr MTree to a destination named dir1 in the
/backup MTree, use the path to the given snapshot as the source and the full path to the
directory, dir1, in the destination field.
Specifying a non-existent directory will create that directory. Be aware that the destination
directory must be empty or the fast copy operation will fail. You can choose to overwrite the
contents of the destination by checking that option in the Fast Copy dialog window.
Related CLI command:
# filesys fastcopy
Copies a file or directory tree from a Data Domain system source directory to a destination on
the Data Domain system.
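Using the example from the earlier Fast Copy slide, the 10-31-2012 snapshot in the hr MTree could be fast copied with:
# filesys fastcopy source /data/col1/hr/.snapshot/10-31-2012 destination /data/col1/backup/recovery
Because only pointers are copied, the operation completes quickly regardless of the amount of data referenced.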

228

Slide 29

Fast Copy Operations

 The Fast Copy operation can be used as part of a data recovery workflow using a snapshot for user-based search and recovery.
 Users with access to the Data Domain system can access a share or mount to the fast copy data for self-search and recovery (click and drag).
 Fast copy directories do not disturb actual production data.
 Note that Fast Copy makes a destination equal to its source, but not at a particular point in time. For instance, running a backup to the same directory that Fast Copy is attempting to copy may cause the source directory to become out of sync with what is copied.
 Fast Copy directories are not managed in the Data Domain Enterprise Manager or through the command line interface (CLI). They must be managed manually.

The fast copy operation can be used as part of a data recovery workflow using a snapshot. Snapshot
content is not viewable from a CIFS share or NFS mount, but a fast copy of the snapshot is fully
viewable. From a fast copy on a share or a mount, you can recover lost data without disturbing normal
backup operations and production files.
Fast copy makes a destination equal to the source, but not at a particular point in time. The source and
destination may not be equal if either is changed during the copy operation.
Fast copy data that is no longer needed must be manually identified and deleted to free up space. Then, space reclamation (file system
cleaning) must be run to regain the data space held by the fast copy. When backup data expires, a fast
copy directory will prevent the Data Domain system from recovering the space held by the expired data
because it is flagged by the fast copy directory as in-use.

229

Slide 30

Lab 5.3: Configuring Fast Copy


230


Slide 31

Module 5: File System and Data Management
Lesson 4: File System Cleaning
In this lesson, the following topics are covered:
File system cleaning purpose and use
Configuring and running file system cleaning


This lesson covers Data Domain file system cleaning, also called garbage collection.
Topics include:
The purpose and use of file system cleaning.
Scheduling, configuring, and running the file system cleaning operation.
You will have a chance to configure and run file system cleaning on a Data Domain system in a
structured lab at the end of this lesson.

231


Slide 32

File System Cleaning


[Diagram: an application host writes segments A, B, C, D, and E into container 1; when the backup application expires data, the affected segments in the container are marked as expired.]
When the backup application expires data, the related file segments are marked on the Data Domain system for deletion. No data is deleted until file system cleaning is run.

When your backup application (such as NetWorker or NetBackup) expires backups, the associated data
is marked by the Data Domain system for deletion. However, the expired data is not deleted
immediately by the Data Domain system; it is removed during the cleaning operation. While the data is
not immediately deleted, the path name is. This results in unclaimed segment space that is not
immediately available.
File system cleaning is the process by which storage space is reclaimed from stored data that is no
longer needed. For example, when retention periods on backup software expire, the backups are
removed from the backup catalog, but space on the Data Domain system is not recovered until file
system cleaning is completed.
Depending on the amount of space the file system must clean, file system cleaning can take from several
hours to several days to complete. During the cleaning operation, the file system is available for all
normal operations including backup (write) and restore (read).
Although cleaning uses a significant amount of system resources, cleaning is self-throttling and gives up
system resources in the presence of user traffic.

232

Slide 33

Cleaning Process

 Copies forward data into free containers
 Reclaims space
 Deletes duplicate segments if they exist

[Diagram: valid segments in container 1 are copied forward and reorganized into new containers, leaving unclaimed segments behind; reclaimed space is appended back onto available disk space in new, empty containers.]

Data invulnerability requires that data be written only into new, empty containers; data already written
in existing containers cannot be overwritten. This requirement also applies to file system cleaning.
During file system cleaning, the system reclaims space taken up by expired data so you can use it for
new data.
The example in this figure refers to dead and valid segments. Dead segments are segments in containers
no longer needed by the system, for example, segments claimed by a file that has been deleted and was the
only or final claim to that segment, or any other segment/container space deemed not needed by the
file system internally. Valid segments contain unexpired data used to store backup-related files. When
files in a backup are expired, pointers to the related file segments are removed. Dead segments are not
allowed to be overwritten with new data since this could put valid data at risk of corruption. Instead,
valid segments are copied forward into free containers to group the remaining valid segments together.
When the data is safe and reorganized, the original containers are appended back onto the available
disk space.
Since the Data Domain system uses a log structured file system, space that was deleted must be
reclaimed. The reclamation process runs automatically as a part of file system cleaning.

233

During the cleaning process, a Data Domain system is available for all normal operations, to include
accepting data from backup systems.
Cleaning does require a significant amount of system processing resources and might take several hours,
or under extreme circumstances days, to complete even when undisturbed. Cleaning applies a set
processing throttle of 50% when other operations are running, sharing the system resources with other
operations. The throttling percentage can be manually adjusted up or down by the system
administrator.
File system cleaning can be scheduled to meet the needs of your backup plan. The default time schedule
is set to run every Tuesday at 6 a.m. The default CPU throttle is 50%. This setting assigns half of the CPU
resources to the cleaning process and half to all of the other processes. Increasing the throttle
amount increases the resources dedicated to the cleaning process and decreases resources available to
other running processes.

234

Slide 34

Running File System Cleaning


Using the Data Domain Enterprise Manager, navigate to Data Management > File System > Start
Cleaning.
This action begins an immediate cleaning session.
A window displays an informational alert describing the possible performance impact during cleaning,
and a field to set the percentage of throttle for the cleaning session.

235

Slide 35

Schedule File System Cleaning


Schedule file system cleaning to start when the period of high activity ends, and the competition for
resources is minimal or non-existent.
To schedule file system cleaning using the Data Domain Enterprise Manager, navigate to Data
Management > File System > Configuration > Clean Schedule.
You see a window with three options for scheduling file system cleaning:
Default: Tuesday at 6 a.m. with 50% throttle.
Note: The throttle setting affects cleaning only when the system is servicing other user requests.
When there are no user requests, cleaning always runs at full throttle. For example, if throttle is
set to 70%, the system uses 100% of the system resources and throttles down to 70% of
resources when the system is handling other user requests.
No Schedule: The only cleaning that occurs would be manually initiated.
Custom Clean Schedule: Configurable with weekly-based or monthly-based settings.
Cleaning runs at the same time of day, either every day or on the selected days of the week.
Click OK to set the schedule you have selected.

236

Related CLI commands:


# filesys clean reset
Resets the clean schedule to the default of Tuesday at 6 a.m. (tue 0600), the default throttle of
50 percent, or both.
# filesys clean set schedule
Sets the schedule for the clean operation to run automatically. Default is Tuesday at 6 a.m.
# filesys clean set throttle
Sets the clean operations to use a lower level of system resources when the Data Domain
system is busy. At zero percent, cleaning runs slowly or not at all, depending on how busy the
system is.
# filesys clean show config
Displays settings for file system cleaning.
# filesys clean show schedule
Displays the current date and time for the clean schedule.
# filesys clean show throttle
Displays throttle setting for cleaning.
# filesys clean start
Starts the clean process manually.
# filesys clean status
Displays the status of the clean process.
# filesys clean stop
Stops the clean process.
# filesys clean watch
Monitors the filesys clean process.
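Tying these commands together, a typical CLI sequence might be (the day, time, and throttle values shown are the documented defaults and are illustrative):
# filesys clean set schedule tue 0600
# filesys clean set throttle 50
# filesys clean start
# filesys clean watch
This sets the weekly schedule and throttle, then starts an immediate cleaning session and monitors its progress.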

237

Slide 36

File System Cleaning: Considerations and practices


Schedule cleaning during low system traffic periods.
Raising the throttle higher than 50% can significantly slow other

running processes.
Taking the file system offline for any reason stops the cleaning process.
Cleaning does not automatically resume after the file system restarts
until the next cleaning cycle.
Encryption and gz compression increase cleaning process time.
All pointers to data, including snapshots and fast copies, and pending
replication must be removed before that data can be a candidate for
cleaning.
Overly frequent cleaning can cause poor deduplication and increased
file fragmentation.
Cleaning might cause replication to lag.
Run cleaning after the first full backup to increase the compression
factor.

Considerations and suggested practices:


You should schedule cleaning for times when system traffic is lowest.
Cleaning is a file system operation that will impact overall system performance while it runs.
Adjusting the cleaning throttle higher than 50% will consume more system resources during the
cleaning operation and can potentially slow down other system processes.
Any operation that shuts down the Data Domain file system or powers off the device (a system
power-off, reboot, or filesys disable command) stops the clean operation. File system cleaning
does not automatically continue when the Data Domain system or file system restarts.
Encryption and gz compression require much more time than normal to complete cleaning as
all existing data needs to be read, uncompressed, and compressed again.
Expiring files from your backup does not guarantee that space will be freed after cleaning. If
active pointers exist to any segments related to the data you expire, such as snapshots or fast
copies, those data segments are still considered valid and will remain on the system until all
references to those segments are removed.

238

Daily file system cleaning is not recommended as overly frequent cleaning can lead to increased
file fragmentation. File fragmentation can result in poor data locality and, among other things,
higher-than-normal disk utilization. If the retention period of your backups is short, you might
be able to run cleaning more often than once weekly. The more frequently the data expires, the
more frequently file system cleaning can operate. Work with EMC Data Domain Support to
determine the best cleaning frequency under unusual circumstances.
If your system is growing closer to full capacity, do not change the cleaning schedule to increase
cleaning cycles. A higher frequency of cleaning cycles might reduce the deduplication factor,
thus reducing the logical capacity on the Data Domain system and causing more space to be
used by the same data stored.
Instead, manually remove unneeded data or reduce the retention periods set by your backup
software to free additional space. Run cleaning per the schedule after data on the system has
been expired.
If you encounter a system full (100%) or near full (90%) alert, and you are unable to free up
space before the next backup, contact Support as soon as possible.
If cleaning is run during replication operations and replication lags in its process, cleaning may
not be able to complete operations. This condition requires either replication break and resync
after cleaning has completed or allowing replication to catch up (for example, increasing
network link speed or writing less new data to the source directory).

Note: It is good practice to run a cleaning operation after the first full backup to a Data Domain system.
The initial local compression on a full backup is generally a factor of 1.5 to 2.5. An immediate cleaning
operation gives additional compression by another factor of 1.15 to 1.2 and reclaims a corresponding
amount of disk space.

239

Slide 37

Lab 5.4: Configuring File System Cleaning


240


Slide 38

Module 5: File System and Data Management
Lesson 5: Monitoring File System Space Usage
This lesson covers the following topics:
Factors affecting the speed of space consumption
How to monitor space consumption and space usage


This lesson covers how to monitor Data Domain file system space usage.
Topics include:
The factors that affect the rate at which space is consumed on the system.
How to monitor the space used and rate of consumption on the system.
You will have a chance to review space usage, and data consumption reports on a Data Domain system
in a structured lab.

241

Slide 39

Monitoring File System Space Usage

Factors affecting how fast data on disk grows include:


Amount of data being written
Compressibility of data being written
How long the data is retained on the system


When a disk-based deduplication system such as a Data Domain system is used as the primary
destination storage device for backups, sizing must be done appropriately. Presuming the correctly sized
system is installed, it is important to monitor usage to ensure data growth does not exceed system
capacity.
The factors affecting how fast data on a disk grows on a Data Domain system include:
The size and number of data sets being backed up. An increase in the number of backups or an
increase in the amount of data being backed-up and retained will cause space usage to increase.
The compressibility of data being backed up. Pre-compressed data formats do not compress or
deduplicate as well as non-compressed files and thus increase the amount of space used on the
system.
The retention period specified in the backup software. The longer the retention period, the
larger the amount of space required.
If any of these factors increase above the original sizing plan, your backup system could easily overrun
its capacity.
There are several ways to monitor the space usage on a Data Domain system to help prevent system full
conditions.

242

Slide 40

Monitoring File System Space Usage

Ways to monitor data growth on the Data Domain system:


Space usage plots at my.datadomain.com
Graphic reports in the Data Domain Enterprise Manager
Capacity and quota alerts at the MTree level
Daily autosupport reports


Ways to monitor data growth on the Data Domain system:


Space usage plots at my.datadomain.com if autosupports are being sent to Data Domain
Support
Graphic reports in the Data Domain Enterprise Manager
Capacity and quota alerts
Daily autosupport reports

243


Slide 41

Space Usage Plots

[Screenshot: space usage plot for the example system testsystem.test.com (1FA1432305).]

If you have set your system to send autosupports to EMC Data Domain Support at
http://my.datadomain.com, you can log in to the site, click My Systems, select from a list of systems
registered for support, and view an up-to-the-day plot of your space usage over time. The plot usually
shows up to a year's worth of data at a time.
On the plot, you can see data reported by your system through daily autosupports. The plots will show
your pre-compressed and post-compressed data and the daily compression ratio. This is a valuable tool
to watch longer trends in data growth and compression. You can note when your system took on a
different backup plan and how it impacted the growth rate and compression ratio.
From this same page, you can also view the tabular data used to create the graph, or the autosupports
themselves for a more granular view.

244

Slide 42

File System Summary Tab


The File System Summary tab is under the Data Management tab in the Data Domain Enterprise
Manager.
The window displays an easy-to-read dashboard of current space usage and availability. It also provides
an up-to-the-minute indication of the compression factor.
The Space Usage section shows two panes:
The first pane shows the amount of disk space available and used by file system components, based on
the last cleaning.

245

/data:post-comp shows:
Size (GiB): The amount of total physical disk space available for data.
Used (GiB): The actual physical space used for compressed data. Warning messages go to the
system log, and an email alert is generated when the use reaches 90%, 95%, and 100%. At 100%,
the Data Domain system accepts no more data from backup hosts.
Available (GiB): The total amount of space available for data storage. This figure can change
because an internal index may expand as the Data Domain system fills with data. The index
expansion takes space from the Avail GiB amount.
Cleanable (GiB): The estimated amount of space that could be reclaimed if a cleaning operation
were run.
The /ddvar line is the space reserved for system operations such as log files and upgrade tar files. It is
not a part of the data storage total.
The second Space Usage pane shows the compression factors:
Currently Used: The amounts currently in use by the file system.
Written in Last 24 Hours: The compression activity over the last day.
For both of these areas, the following is shown:
Pre-Compression (GiB*): Data written before compression
Post-Compression (GiB*): Storage used after compression
Global-Comp Factor: Pre-Compression / (Size after global compression)
Local-Comp Factor: (Size after global compression) / Post- Compression
Total-Comp Factor: Pre-Compression / Post-Compression
Reduction %: [(Pre-Compression - Post-Compression) / Pre-Compression] * 100
*The gibibyte is a standards-based binary multiple (prefix gibi, symbol Gi) of the byte, a unit of digital
information storage. The gibibyte unit symbol is GiB. 1 gibibyte = 2^30 bytes = 1,073,741,824 bytes = 1024
mebibytes.
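As a quick worked example of these formulas, using figures that appear later in this lesson: if 16.9 GiB of pre-compressed data is written and about 0.63 GiB (643.5 MiB) is used after compression, the Total-Comp Factor is 16.9 / 0.63, or approximately 26.8x, and the Reduction % is [(16.9 - 0.63) / 16.9] * 100, or approximately 96%.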
Note: It is important to know how these compression statistics are calculated and what they are
reporting to ensure a complete understanding of what is being reported.
Related CLI commands:
# filesys show space
Display the space available to, and used by, file system resources.
# filesys show compression
Display the space used by, and compression achieved for, files and directories in the file system.

246

Slide 43

Space Usage View

 When the mouse rolls over a data point, the data is referenced both in a pop-out on the graph and in a section below the graph, for example: Pre-Comp Written, Sat Feb 04 2012 12:00 PM, 16.9 GiB.

The Space Usage view contains a graph that displays a visual representation of data usage for the
system.
This view is used to monitor and analyze daily activities on the Data Domain system.
Roll over a point on a graph line to display a box with data at that point (as shown in the slide).
Click Print (at the bottom of the graph) to open the standard Print dialog box.
Click Show in a new window to display the graph in a new browser window.
The lines of the graph denote measurements for:
Pre-comp Written: The total amount of data sent to the Data Domain system by backup
servers. Pre-compressed data on a Data Domain system is what a backup server sees as the total
uncompressed data held by a Data Domain system-as-storage-unit. Shown on the Space Used
(left) vertical axis of the graph.
Post-comp Used: The total amount of disk storage in use on the Data Domain system. Shown
on the Space Used (left) vertical axis of the graph.
Comp Factor: The amount of compression the Data Domain system has performed with the
data it received (the compression ratio). Shown on the Compression Factor (right) vertical axis of
the graph.

The bottom of the screen also displays all three measurements when a point is rolled over on the graph.
Note: In this example, 16.9 GiB was ingested while only 643.5 MiB was used to store the data for a total
compression factor of 26.8x.
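To check the arithmetic behind that note: 16.9 GiB of pre-comp data is 16.9 x 1,024 = 17,305.6 MiB, and 17,305.6 MiB / 643.5 MiB = 26.9, which matches the 26.8x factor shown on the graph once display rounding is taken into account.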
The view can be set to various durations between 7 and 120 days.
Related CLI command:
# filesys show compression
Display the space used by, and compression achieved for, files and directories in the file system.

Slide 44

Space Consumption View

[Screenshot: Space Consumption view with the Capacity option unchecked, scaling the axis to GiB for improved display; example pop-out: Post-Comp, Thu Mar 01 2012 12:00 PM, 2.1 GiB]


The Space Consumption view contains a graph that displays the space used over time, shown in relation
to total system capacity.
With the Capacity option unchecked (see circled on the slide), the scale is reduced from TiB to GiB in
order to present a clear view of space used. In this example, only 2.1 GiB post-comp has been stored
with a 7.5 TiB capacity. See the next slide to see the consumption view with the capacity indicator.
This view is useful for noting trends in space availability on the Data Domain system, such as changes in
available space and compression in relation to cleaning processes.
Roll over a point on a graph line to display a box with data at that point.
Click Print (at the bottom on the graph) to open the standard Print dialog box.
Click Show in a new window to display the graph in a new browser window.

The lines of the graph show measurements for:
Capacity (not shown): The total amount of disk storage available for data on the Data Domain
system. Shown on the Space Used (left) vertical axis of the graph. Clicking the Capacity checkbox
changes the view of space between GiB and TiB. The capacity on the example system is 7.5 TiB,
so the capacity line does not appear in this smaller GiB-scaled view.
Post-comp (the larger shaded area in the graph): The total amount of disk storage in use on the
Data Domain system. Shown on the Space Used (left) vertical axis of the graph.
Comp Factor (the single black line on the graph): The amount of compression the Data Domain
system has performed with the data it received (the compression ratio). Shown on the
Compression Factor (right) vertical axis of the graph.
Cleaning: A grey vertical line appears on the graph each time a file system cleaning operation
was started. Roll over a cleaning line to see the date and time cleaning was started and the
duration of the process.
Data Movement (not shown): The amount of disk space moved to the archiving storage area
(if the Archive license is enabled).
You can change the interval of time represented on the graph by clicking a different duration, up to 120
days. 30 days is the default duration.

Slide 45

Space Consumption View with Capacity Indicator

[Screenshot: Space Consumption view with the Capacity option checked, scaling the axis to TiB; example pop-out: Capacity, Sun Feb 05 2012 12:00 PM, 7.5 TiB]


When the capacity option is checked, the display scales to TiB, and a line at the maximum capacity of 7.5
TiB appears.
When you roll over the capacity line, an indicator will show the capacity details as shown in this
screenshot.
Notice that at this scale, the 666.0 MiB post-comp data mark on February 5 does not show on the
graph.

Slide 46

Daily Written View

[Screenshot: Daily Written view; example pop-out: Pre-Comp, Thu Feb 02 2012 12:00 PM, 13.7 GiB]


The Daily Written view contains a graph that displays a visual representation of the data written daily
to the system over a period of time, selectable from 7 to 120 days. The data amounts are shown over
time for pre- and post-compression amounts.
It is useful for seeing data ingestion and compression factor results over a selected duration. You should be
able to notice trends in compression factor and ingestion rates.
It also provides totals for global and local compression amounts, and for pre-compression and post-compression amounts.
Roll over a point on a graph line to display a box with data at that point.
Click Print (at the bottom of the graph) to open the standard Print dialog box.
Click Show in a new window to display the graph in a new browser window.

The lines of the graph show measurements for:
Pre-Comp: The total amount of data written to the Data Domain system by backup hosts.
Pre-compressed data on a Data Domain system is what a backup host sees as the total
uncompressed data held by a Data Domain system-as-storage-unit.
Post-Comp: The total amount of data written to the Data Domain system after compression has
been performed, as shown in GiB.
Total Comp: The total amount of compression the Data Domain system has performed with the
data it received (the compression ratio). Shown on the Total Compression Factor (right) vertical
axis of the graph.
You can change the interval of time represented on the graph by clicking a different duration, up to 120
days. 30 days is the default duration.
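The same per-day ingest and compression figures are available from the CLI. A minimal sketch; the daily keyword appears in DD OS 5.x CLI references, but confirm the syntax on your release:
# filesys show compression daily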

Slide 47

Module 5: Summary
Key points covered in this module include:
MTrees can be configured so that different types of data, or
data from different sources, can be managed and reported on
separately.
You can set limits on the amount of logical (pre-comp) space
used by individual MTrees using MTree hard and soft quotas.
Snapshots enable you to save a read-only copy of an MTree at
a specific point in time.
Fast copy gives read/write access to all fast-copied data,
making this operation handy for data recovery from snapshots.


Slide 48

Module 5: Summary
Key points covered in this module include (continued):
The default time scheduled for file system cleaning is every
Tuesday at 6 a.m.
Overly frequent cleaning can cause poor deduplication and
increased file fragmentation.
Use the Space Usage, Consumption, and Daily Written views in
the File System tab to monitor data ingestion and compression
rates over time.
The total compression factor is the pre-compression rate
divided by the post-compression rate.


Slide 1

Module 6: Data Replication and Recovery

Upon completion of this module, you should be able to:


Describe replication types and topologies supported by Data
Domain systems.
Describe how to configure replication.
Describe the process of recovering data from an off-site
replica.
Identify and read the reports used to monitor replication.


Replication of deduplicated, compressed data offers the most economical approach to the automated
movement of data copies to a safe site using minimum WAN bandwidth. This ensures fast recovery in
case of loss of the primary data, the primary site or the secondary store.

Slide 2

Module 6: Data Replication and Recovery

Lesson 1: Data Replication


This lesson covers the following topics:
An overview of replication
Replication types and topologies
Replication seeding


This lesson gives an overview of Data Domain replication types and topologies, and covers configuring and
seeding replication.

Slide 3

Overview of Data Replication


[Diagram: clients and a backup server with primary storage send backup data over Ethernet/SAN to a source Data Domain system, which replicates over the network to a destination Data Domain system (the replication pair).]

Backup data can be efficiently replicated for:


Disaster recovery
Remote office data protection
Multiple site consolidation


Data Domain systems are used to store backup data onsite for a short period such as 30, 60 or 90 days,
depending on local practices and capacity. Lost or corrupted files are recovered easily from the onsite
Data Domain system since it is disk-based, and files are easy to locate and read at any time.
In the case of a disaster that destroys the onsite data, the offsite replica is used to restore operations.
Data on the replica is immediately available for use by systems in the disaster recovery facility. When a
Data Domain system at the main site is repaired or replaced, the data can be recovered using a few
simple recovery configuration and initiation commands.
You can quickly move data offsite (with no delays in copying and moving tapes). You don't have to
complete replication for backups to occur. Replication occurs in real time.
Replication typically consists of a source Data Domain system (which receives data from a backup
system), and one or more destination Data Domain systems.
Replication duplicates backed-up data over a WAN after it has been deduplicated and compressed.
Replication creates a logical copy of the selected source data post-deduplication, and only sends any
segments that do not already exist on the destination. Network demands are reduced during replication
because only unique data segments are sent over the network.

Replication provides a secondary copy, usually replicated to an offsite location, for:


Disaster recovery
Remote office data protection
Multiple site tape consolidation
After you configure replication between a source and destination, only new data written to the source is
automatically replicated to the destination. Data is deduplicated at the source and at the destination. All
offsite replicated data is recoverable online, reducing the amount of time needed for recovering from
data loss.
The replication process is designed to deal with network interruptions common in the WAN and to
recover gracefully with very high data integrity and resilience. This ensures that the data on the replica is
in a state usable by applications, a critical component for optimizing the utility of the replica for data
recovery and archive access.
A Data Domain system is able to perform normal backup and restore operations and replication
simultaneously.
Replication is a software feature that requires an additional license. You need a Replicator license for
both the source and destination Data Domain systems.
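From the CLI, a license is installed with the documented license add command. The license code below is a fictitious placeholder, and the exact feature name shown by license show can vary by DD OS release:
# license add ABCD-EFGH-IJKL-MNOP
# license show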

Slide 4

Replication Pair Context


[Diagram: matching /data/col1/backup/subdir1 directory trees on system A (source) and system B (destination), paired by the context:
source: directory://[system A]/data/col1/backup/subdir1
destination: directory://[system B]/data/col1/backup/subdir1]


Defining a replication source and destination is called a pair. A source or a destination in the
replication pair is referred to as a context. The context is defined in both the source and destination
Data Domain systems paired for replication.
A replication context can also be termed a replication stream, and although the use case is quite
different, the stream resource utilization within a Data Domain system is roughly equivalent to a read
stream (for a source context) or a write stream (for a destination context).
The count of replication streams per system depends upon the processing power of the Data Domain
system on which they are created. Lesser systems can handle no more than 15 source and 20
destination streams, while the most powerful Data Domain system can handle over 200 streams.
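From the CLI, a directory context like the pair on the slide is created with the documented replication add command, using dir:// context URLs (note that the CLI form omits the /data/col1 prefix). The hostnames below are placeholders, and the same command is entered on both systems before the context is initialized:
# replication add source dir://systemA.example.com/backup/subdir1 destination dir://systemB.example.com/backup/subdir1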

Slide 5

Replication Topologies
[Diagram: the supported replication topologies, shown as arrows between source and destination systems: 1-to-1, bi-directional, 1-to-many, many-to-1, cascaded, and cascaded 1-to-many.]


Data Domain supports various replication topologies in which data flows from a source to a destination
directory over a LAN or WAN.

One-to-one replication
The simplest type of replication is from a Data Domain source system to a Data Domain
destination system, otherwise known as a one-to-one replication pair. This replication topology
can be configured with directory, MTree, or collection replication types.

Bi-directional replication
In a bi-directional replication pair, data from a directory or MTree on System A is replicated to
System B, and from another directory or MTree on System B to System A.

One-to-many replication
In one-to-many replication, data flows from a source directory or MTree on System A to several
destination systems. You could use this type of replication to create more than two copies for
increased data protection, or to distribute data for multi-site usage.

Many-to-one replication
In many-to-one replication, whether MTree or directory, data flows from
several source systems to a single destination system. This type of replication can be used to
provide data recovery protection for several branch offices at the corporate headquarters' IT
systems.

Cascaded replication
In a cascaded replication topology, a source directory or MTree is chained among three Data
Domain systems. The last hop in the chain can be configured as collection, MTree, or directory
replication, depending on whether the source is directory or MTree.
For example, the first DD system replicates one or more MTrees to a second DD system, which
then replicates those MTrees to a final DD system. The MTrees on the second DD system are
both a destination (from the first DD system) and a source (to the final DD system). Data
recovery can be performed from the non-degraded replication pair context.

Slide 6

Types of Data Domain Replication

Collection Replication: For whole system mirroring
Directory Replication: For partial site, single directory backup
MTree Replication: For partial site, point-in-time backup
Managed Replication: Used with Data Domain Boost


Data Domain Replicator software offers four replication types that leverage the different logical levels of
the system described in the previous slide for different effects.

Collection replication: This performs whole-system mirroring in a one-to-one topology,


continuously transferring changes in the underlying collection, including all of the logical
directories and files of the Data Domain file system. This type of replication is very simple and
requires fewer resources than other types; therefore it can provide higher throughput and
support more objects with less overhead.
Directory replication: A subdirectory under /backup/ and all files and directories below it on a
source system replicates to a destination directory on a different Data Domain system. This
transfers only the deduplicated changes of any file or subdirectory within the selected Data
Domain file system directory.

MTree replication: This is used to replicate MTrees between Data Domain systems. It uses the
same WAN deduplication mechanism as used by directory replication to avoid sending
redundant data across the network. The use of snapshots ensures that the data on the
destination is always a point-in-time copy of the source with file consistency, while reducing
replication churn, thus making WAN use more efficient. Replicating individual directories under
an MTree is not permitted with this type.
A fourth type, managed replication, belongs to Data Domain Boost operations and will be
discussed later in this course.

Slide 7

Collection Replication

[Diagram: source system A and destination system B, each with a collection log of containers (C1, C2, C3, C4); replication compares the heads of the two logs and ships the containers the destination does not yet have, one at a time.]
The entire /data/col1 area is replicated, making collection replication a mirror of the original.
Collection replication uses the collection log to track and update what the destination is missing.


Collection replication replicates the entire /data/col1 area from a source Data Domain system to a
destination Data Domain system. Collection replication uses the logging file system structure to track
replication. Transferring data in this way means simply comparing the heads of the source and
destination logs and catching up, one container at a time, as shown in this diagram. If collection
replication lags behind, it continues until it catches up.
The Data Domain system to be used as the collection replication destination must be empty before
configuring replication. Once replication is configured, the destination system is dedicated to receive
data only from the source system.
With collection replication, all user accounts and passwords are replicated from the source to the
destination. If the Data Domain system is a source for collection replication, snapshots are also
replicated.

Collection replication is the fastest and lightest type of replication offered by the DD OS. There is no ongoing negotiation between the systems regarding what to send. Collection replication is mostly unaware
of the boundaries between files. Replication operates on segment locality containers that are sent after
they are closed.
Because there is only one collection per Data Domain system, this is specifically an approach to system
mirroring. Collection replication is the only form of replication used for true disaster recovery. The
destination system cannot be shared for other roles. It is read-only and shows data only from one
source. After the data is on the destination, it is immediately visible for recovery.
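A collection pair uses col:// context URLs that name the whole system rather than a path. A minimal sketch with placeholder hostnames; as noted above, the destination file system must be empty before configuration, and replication initialize is entered on the source system naming the destination context:
# replication add source col://systemA.example.com destination col://systemB.example.com
# replication initialize col://systemB.example.com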

Slide 8

Collection Replication Points to Consider

The entire /data/col1/ directory is replicated.
Other than receiving data from the source, the destination is read-only. The context must be
broken using the replication break command to make it read/write.
Snapshots cannot be created on the destination of a collection replication because the
destination is read-only.
Retention Lock Compliance supports collection replication only.
The Encryption of Data at Rest feature can be used, and requires both the source and
destination to be configured identically.
Collection replication supports 1-to-1 and cascaded* replication topologies.
* Only the last system in a cascaded chain can be configured as collection replication.


Collection replication replicates the entire /data/col1 area from a source Data Domain system to a
destination Data Domain system. This is useful when all the contents being written to the DD system
need to be protected at a secondary site.
The Data Domain system to be used as the collection replication destination must be empty before
configuring replication. The destination immediately offers all backed up data, as a read-only mirror,
after it is replicated from the source.
Snapshots cannot be created on the destination of a collection replication because the destination is
read-only.
With collection replication, all user accounts and passwords are replicated from the source to the
destination.

Data Domain Replicator software can be used with the optional Encryption of Data at Rest feature,
enabling encrypted data to be replicated using collection replication. Collection replication requires the
source and target to have the exact same encryption configuration because the target is expected to be
an exact replica of the source data. In particular, the encryption feature must be turned on or off at both
source and target and if the feature is turned on, then the encryption algorithm and the system
passphrases must also match. The parameters are checked during the replication association phase.
During collection replication, the source system transmits the encrypted user data along with the
encrypted system encryption key. The data can be recovered at the target, because the target machine
has the same passphrase and the same system encryption key.
Collection replication topologies can be configured in the following ways:
One-to-One Replication: This topology can be used with collection replication, where the entire
/data/col1 area from a source Data Domain system is mirrored to a destination Data
Domain system. Other than receiving data from the source, the destination is a read-only
system.
Cascaded Replication: In a cascaded replication topology, directory replication is chained among
three or more Data Domain systems. The last system in the chain can be configured as collection
replication. Data recovery can be performed from the non-degraded replication pair context.

Slide 9

Directory Replication
[Diagram: /data/col1/backup/subdir1 on the source system paired with /data/col1/backup/subdir1 on the destination system.]
Directory replication only copies subdirectories within the /data/col1/backup path.
The replication process is initiated by a file closing on the source, or forced automatically if file closures are infrequent.


With directory replication, a replication context pairs a directory under /data/col1/backup/, and
all files and directories below it, on a source system with a destination directory on a different system.
During replication, deduplication is preserved since data segments that already reside on the destination
system are not resent across the network. The destination directory is read-only, and it can coexist
on the same system with other replication destination directories, replication source directories, and
other local directories, all of which share deduplication in that system's collection.
The directory replication process is triggered by a file closing on the source. In cases where file closures
are infrequent, Data Domain Replicator forces the data transfer periodically.
If the Data Domain system is a source for directory replication, snapshots within that directory are not
replicated. You must create and replicate snapshots separately.

Slide 10

Directory Replication Points to Consider


A destination Data Domain system must have available storage capacity that is at least the
expected maximum post-compressed size of the source directory.
After replication is initialized, ownership and permissions of the destination directory are always
identical to those of the source directory.
As long as the context exists, the destination directory is kept in a read-only state and can
receive data only from the source directory.
Due to differences in global compression, the source and destination directory can differ in size.
A directory can be set to receive either CIFS or NFS backups, but not both. Directory replication
can replicate directories receiving CIFS or NFS backups.
Directory replication supports 1-to-1, bi-directional, many-to-one, one-to-many, and cascaded topologies.


During directory replication, a Data Domain system can perform normal backup and restore operations.
A destination Data Domain system must have available storage capacity that is at least the expected
maximum post-compressed size of the source directory. In a directory replication pair,
the destination is always read-only. In order to write to the destination outside of replication, you must
first break replication.
When replication is initialized, a destination directory is created automatically if it does not already
exist. After replication is initialized, ownership and permissions of the destination directory are always
identical to those of the source directory.
Directory replication can receive backups from both CIFS and NFS clients, but cannot mix CIFS and
NFS data in the same directory.
Directory replication supports encryption and retention lock.

Directory replication can be configured in the following ways:
One-to-One Replication: The simplest type of replication is from a Data Domain source
system to a Data Domain destination system.
Bi-Directional Replication: In a bi-directional replication pair, data from a source directory on
the first system is replicated to a destination directory on the second system, and data from a
source directory on the second system is replicated to a destination directory on the first system.
Many-to-One Replication: In many-to-one replication, data flows from several source
directory contexts to a single destination system. This type of replication occurs, for
example, when several branch offices replicate their data to the corporate headquarters' IT
systems.
One-to-Many Replication: In one-to-many replication, multi-streamed optimization
maximizes replication throughput per context.
Cascaded Replication: In a cascaded replication topology, directory replication is chained
among three or more Data Domain systems. Data recovery can be performed from the
non-degraded replication pair context.

Slide 11

MTree Replication

[Diagram: source /data/col1 with MTrees /backup, /hr, and /sales; periodic snapshots (Snapshot 1, Snapshot 2) of the /sales MTree are replicated to /sales under /data/col1 on the destination.]
The destination of the replication pair is read-only.
The destination must have sufficient available storage.
CIFS and NFS clients cannot be used in the same MTree.
The destination MTree is created by the MTree replication operation.
MTree replication is usable with encryption and Retention Lock Compliance.


MTree replication enables the creation of disaster recovery copies of MTrees at a secondary location by
the /data/col1/mtree pathname. A Data Domain system can simultaneously be the source of
some replication contexts and the destination for other contexts. The Data Domain system can also
receive data from backup and archive applications while it is replicating data.
One fundamental difference between MTree replication and directory replication is the method used for
determining what needs to be replicated between the source and destination. MTree replication creates
periodic snapshots at the source and transmits the differences between two consecutive snapshots to
the destination. At the destination Data Domain system, the latest snapshot is not exposed until all of
the data for that snapshot is received. This ensures the destination is always a point-in-time image of
the source Data Domain system. In addition, files do not show out of order at the destination. This
provides file-level consistency, simplifying recovery procedures. It also reduces recovery time objectives
(RTOs). Users are also able to create a snapshot at the source Data Domain system for application
consistency (for example, after a completion of a backup), which is replicated on the destination where
the data can be used for disaster recovery.

MTree replication shares some common features with directory replication. It uses the same WAN
deduplication mechanism as used by directory replication to avoid sending redundant data across the
network. It also supports the same topologies that directory replication supports. Additionally, you can
have directory and MTree contexts on the same pair of systems.
The destination of the replication pair is read-only.
The destination must have sufficient available storage to avoid replication failures.
CIFS and NFS clients should not be used within the same MTree.
MTree replication duplicates data for an MTree specified by the /data/col1/mtree pathname; the
destination MTree is specified the same way.
Some replication command options with MTree replication may target a single replication pair (source
and destination directories) or may target all pairs that have a source or destination on the Data Domain
system.
MTree replication is usable with encryption and with Data Domain Retention Lock Compliance at the MTree level on the source, which is replicated to the destination.
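An MTree context uses mtree:// URLs that include the full /data/col1 pathname. A minimal sketch with placeholder hostnames and an illustrative MTree named sales:
# replication add source mtree://systemA.example.com/data/col1/sales destination mtree://systemB.example.com/data/col1/sales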

Slide 12

MTree Replication Points to Consider


A destination Data Domain system must have available storage capacity that is at least the
expected maximum post-compressed size of the source MTree.
A destination Data Domain system can receive backups from both CIFS clients and NFS clients
as long as they are in separate MTrees.
MTree replication can receive backups from both CIFS and NFS clients, each in their own
replication pair (but not in the same MTree).
When replication is initialized, a destination MTree is created automatically; it must not
already exist.
After replication is initialized, ownership and permissions of the destination MTree are always
identical to those of the source MTree.
At any time, due to differences in global compression, the source and destination MTree can
differ in size.
MTree replication supports 1-to-1, bi-directional, one-to-many, many-to-one, and cascaded
replication topologies.


A destination Data Domain system must have available storage capacity that is at least the
expected maximum post-compressed size of the source MTree.
A destination Data Domain system can receive backups from both CIFS clients and NFS clients as
long as they are in separate MTrees.
MTree replication can receive backups from both CIFS and NFS clients, each in their own
replication pair (but not in the same MTree).
When replication is initialized, a destination MTree is created automatically; it must not already
exist.
After replication is initialized, ownership and permissions of the destination MTree are always
identical to those of the source MTree.
At any time, due to differences in global compression, the source and destination MTree can
differ in size.
MTree replication supports 1-to-1, bi-directional, one-to-many, many-to-one, and cascaded
replication topologies.

Slide 13

An Example of Data Layout Using MTrees


[Diagram: two layouts side by side.
Directory-based layout: /data/col1/backup/ contains /Oracle/ and /SQL/, each with /prod and /dev subdirectories; these subdirectories are replicated as part of the /backup MTree.
MTree-based layout: /data/col1/ contains /prod/ and /dev/ MTrees, each with /Oracle and /SQL subdirectories; /prod/ and /dev/ can be configured to replicate individually or not at all.]


Replication is a major feature that takes advantage of the MTree structure on the Data Domain system.
The flexibility of the MTree structure provides greater control over which data is replicated. Careful planning
of your data layout allows the greatest flexibility when managing data under an MTree structure.
MTree replication works only at the MTree level. If you want to implement MTree replication, you must
move data from the existing directory structure within the /backup MTree to a new or existing MTree,
and create a replication pair using that MTree.
For example, suppose that a Data Domain system has shares mounted in locations under /backup/ as
shown in the directory-based layout.

If you want to use MTree replication for your production (prod) data, but are not interested in
replicating any of the development (dev) data, the data layout can be modified to create two MTrees:
/prod and /dev, with two directories within each of them. The old shares would then be deleted and
new shares created for each of the four new subdirectories under the two new MTrees. This would look
like the structure shown in the MTree-based layout.
The Data Domain system now has two new MTrees, and four shares as earlier. You can set up MTree
replication for the /prod MTree to replicate all of your production data and not set up replication for
the /dev MTree as you are not interested in replicating your development data.
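Building the MTree-based layout from this example would look roughly like the following from the CLI. This is a minimal sketch; the share name is invented for illustration, and the available cifs share create options vary by DD OS release:
# mtree create /data/col1/prod
# mtree create /data/col1/dev
# cifs share create prod_oracle path /data/col1/prod/Oracle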

Slide 14

Replication Seeding

[Diagram: source and destination Data Domain systems connected over a high-speed, low-latency link.]
Moves the destination and source to the same location
Provides faster initialization throughput
Improves performance 2-3x by using 10GbE links

If the source Data Domain system has a high volume of data prior to configuring replication, the initial
replication seeding can take some time over a slow link. To expedite the initial seeding, you can bring
the destination system to the same location as the source system to use a high-speed, low-latency link.
After data is initially seeded using the high-speed network, you then move the system back to its
intended location.
After data is initially seeded, only new data is sent from that point onwards.
All replication topologies are supported for this process, which is typically performed using collection
replication.

Slide 15

Module 6: Data Replication and Recovery

Lesson 2: Configuring Replication


This lesson covers the following topics:
Configuring replication using Data Domain Enterprise
Manager
Low Bandwidth Optimization (LBO)
Encrypted file replication
Using a non-default connection port
Replication throttle settings


This lesson shows how to configure replication using DD Enterprise Manager, including low-bandwidth
optimization (LBO), encryption over wire, using a non-default connection port, and setting replication
throttle.

Slide 16

Configuring Replication


To create a replication pair:


1. In the Data Domain Enterprise Manager, go to Replication > Create Pair > General > Replication
Type.
2. In Replication Type, select the type of replication you want to configure: Directory, Collection or
MTree.
3. Select the source system hostname from the Source System dropdown menu. Enter the
hostname of the source system, if it is not listed.
4. Select the destination system hostname from the Destination System menu. Enter the hostname
of the destination system, if it is not listed.
5. Enter the source path in the Source Path field.
6. Enter the destination path in the Destination Path field.
Notice that the source and destination paths change depending on the type of replication
chosen. Since directory replication is chosen here, the paths begin with /backup. If MTree
replication is chosen, the paths begin with /data/col1, and for collection replication, the
context simply identifies the entire system.
Related CLI commands:
# replication add
Creates a replication pair.
# replication break
Removes the source or destination Data Domain system from a replication pair.
# replication disable
Disables replication.
# replication enable
Enables replication.
# replication initialize
Initializes replication.
# replication modify
Modifies the connection host, hostname, encryption, and LBO settings.
# replication option reset
Resets replication options, such as bandwidth, to their defaults.
# replication option set
Sets variable rates such as bandwidth, delay, and listen-port.
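Putting these together, a directory pair is typically created and started as follows. This is a minimal sketch with placeholder hostnames; replication add is entered on both systems, replication initialize is entered on the source and names the destination context, and replication status (a documented DD OS command) reports progress:
# replication add source dir://ddsource.example.com/backup/eng destination dir://ddtarget.example.com/backup/eng
# replication initialize dir://ddtarget.example.com/backup/eng
# replication status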

Slide 17

Low Bandwidth Optimization (LBO)

Can optionally reduce WAN bandwidth utilization
Is useful if using a low-bandwidth network link
Provides additional compression
Is CPU intensive and should only be enabled in cases that can benefit from it
Should be applied only when available bandwidth is 6 Mb/s or less


Low bandwidth optimization (LBO) is an optional mode that enables remote sites with limited
bandwidth to replicate and protect more of their data over existing networks. LBO:
Can optionally reduce WAN bandwidth utilization.
Is useful if file replication is being performed over a low-bandwidth WAN link.
Provides additional compression during data transfer.
Is recommended only for file replication jobs that occur over WAN links with less than 6 Mb/s of
available bandwidth. Do not use this option if maximum file system write performance is
required.
LBO can be applied on a per-context basis to all file replication jobs on a system.
Additional tuning might be required to improve LBO functionality on your system. Use bandwidth and
network-delay settings together to calculate the proper TCP buffer size, and set replication bandwidth
for replication for greater compatibility with LBO.
LBO can be monitored and managed through the Data Domain Enterprise Manager Data Management >
DD Boost > Active File Replications view.

Slide 18

Low Bandwidth Optimization Using Delta Comparisons


[Diagram: the source builds a new segment list (S1, S2, S3, S4, S16, S7) and exchanges fingerprints with the destination over the WAN; the destination returns the list of segments it is missing (S16, S7); the source then sends the missing segments and deltas, transmitting S7 in full and S16 as a delta (+6) against the similar segment S1.]

Delta compression is a global compression algorithm that is applied after identity filtering. The algorithm
looks for previously stored similar segments using a sketch-like technique and sends only the difference between
the previous and new segments. In this example, segment S16 is similar to S1. The source can ask the
destination if it already has S1. If it does, the source needs to transfer only the delta (or difference) between S1 and
S16. If the destination doesn't have S1, the source sends the full segment data for S16 and the full data for the
missing segment S1.
Delta comparison reduces the amount of data to be replicated over low-bandwidth WANs by eliminating
the transfer of redundant data found with replicated, deduplicated data. This feature is typically
beneficial to remote sites with lower-performance Data Domain models.
Replication without deduplication can be expensive, requiring either physical transport of tapes or high
capacity WAN links. This often restricts it to being feasible for only a small percentage of data that is
identified as critical and high value.

Reductions through deduplication make it possible to replicate everything across a small WAN link. Only
new, unique segments need to be sent. This reduces WAN traffic down to a small percentage of what is
needed for replication without deduplication. These large factor reductions make it possible to replicate
over a less-expensive, slower WAN link or to replicate more than just the most critical data.
As a result, replication lag is kept as small as possible.

Slide 19

Configuring Low Bandwidth Optimization


LBO is enabled on a per-context basis. LBO must be enabled on both the source and destination Data
Domain systems. If the source and destination have incompatible LBO settings, LBO will be inactive for
that context. This feature is configurable in the Create Replication Pair settings in the Advanced Tab.
To enable LBO, click the checkbox, Use Low Bandwidth Optimization.
Key points of LBO:
Must be enabled on both the source and destination
Can be monitored through the Data Domain Enterprise Manager
Related CLI command:
# replication modify
Enables delta replication on a replication context.
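For example, to turn LBO on for an existing context (a sketch; the low-bw-optim keyword follows the DD OS replication modify syntax, but verify the spelling on your release, and remember the setting must match on both systems):
# replication modify dir://ddtarget.example.com/backup/eng low-bw-optim enabled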

Slide 20

Encrypted File Replication


Encryption over wire or live encryption is supported as an advanced feature to provide further security
during replication. This feature is configurable in the Create Replication Pair settings in the Advanced
tab.
To enable encrypted file replication, click the checkbox, Enable Encryption Over Wire.
It is important to note, when configuring encrypted file replication, that it must be enabled on both the
source and destination Data Domain systems. Encrypted replication uses the ADH-AES256-SHA cipher
suite and can be monitored through the Data Domain Enterprise Manager.
Related CLI command:
# replication modify
Modifies the destination hostname and sets the state of encryption.
Note: This command must be entered on both Data Domain systemsthe source and
destination (target) systems. Only an administrator can set this option.
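A sketch of the CLI equivalent, using the encryption keyword of replication modify against a placeholder context; as noted, it must be entered on both systems:
# replication modify dir://ddtarget.example.com/backup/eng encryption enabled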

Slide 21

Configuring a Non-Default Connection Port


The source system transmits data to a destination system listen port. As a source system can have
replication configured for many destination systems (each of which can have a different listen port),
each context on the source can configure the connection port to the corresponding listen port of the
destination.
1. Go to Replication > Summary > General.
2. Check the box for the configuration type.
3. Click the Advanced tab.
4. Click the checkbox for Use Non-default Connection Host.
5. Change the listen port to a new value.

Related CLI Command:


# replication option set listen-port
Sets the listen port for the Data Domain system.
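For example, to move the listen port off its default of 2051 (the port value below is illustrative):
# replication option set listen-port 3000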

Slide 22

Managing Throttle Settings


The Throttle Settings area shows the current settings for:


Temporary Override: If configured, shows the throttle rate or 0, which means all replication
traffic is stopped.
Permanent Schedule: Shows the times and days of the week on which scheduled throttling occurs.
To add throttle settings:
1. Click the Replication > Advanced Settings tabs, and click Add Throttle Setting.
The Add Throttle Setting dialog box appears.
2. Set the days of the week that throttling is active by clicking the checkboxes next to the days.
3. Set the time that throttling starts with the Start Time selectors for the hour, minute and
A.M./P.M.

In the Throttle Rate area:


1. Click the Unlimited radio button to set no limits.
2. Enter a number in the text entry box (for example, 20000) and select the rate from the drop-down menu (bps, Bps, Kibps, or KiBps).
3. Select the 0 Bps (Disabled) option to disable all replication traffic.
4. Click OK to set the schedule.
The new schedule is shown in the Throttle Settings Permanent Schedule area. Replication runs at the
given rate until the next scheduled change or until a new throttle setting forces a change.
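The CLI equivalent is the replication throttle command group. The argument order below (day, start time, then rate) is recalled from the DD OS CLI reference and should be treated as an assumption; confirm it with the built-in help before use:
# replication throttle add mon 0600 20000KiBps
# replication throttle set current 0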

Slide 23

Lab 6.1: Managing Replication


Slide 24

Module 6: Data Replication and Recovery

Lesson 3: Monitoring Replication


This lesson covers the following topics:
Replication summary report
Replication status report


This lesson covers the following topics:


The replication summary report
The replication status report


Slide 25

Replication Reports


Data Domain Enterprise Manager allows you to generate reports to track space usage on a Data Domain
system for a period of up to two years back. In addition, you can generate reports to help understand
replication progress. You can view reports on file systems daily and cumulatively, over a period of time.
Access the Reports view by selecting the Reports stack in the left-hand column of the Data Domain
Enterprise Manager beneath the listed Data Domain systems.

Slide 26

Replication Reports


The Reports view is divided into two sections. The upper section allows you to create various space
usage and replication reports. The lower section allows you to view and manage saved reports.
The reports display historical data, not real-time data. After the report is generated, the charts remain
static and do not update.
The replication status report includes the status of the current replication job running on the system.
This report provides a snapshot of what is happening for all replication contexts, to help you
understand the overall replication status on a Data Domain system.
The replication summary report includes network-in and network-out usage for all replication, in
addition to per-context levels on the system during the specified duration. This report is used to analyze
network utilization during the replication process, to help you understand overall replication
performance on a Data Domain system.

Slide 27

Replication Status Report Details


The replication status report generates a summary of all replication contexts on a given Data Domain
system with the following information:
ID: The context number or designation of a particular context. The context number is used for
identification; 0 is reserved for collection replication, and directory replication numbering begins
at 1.
Source > Destination: The path between both Data Domain systems in the context.
Type: The type of replication context: Directory, MTree, or Collection.
Status: Error or Normal.
Sync as of Time: Time and date stamp of the most recent sync.
Estimated Completion: The estimated time at which the current replication operation should be
complete.
Pre-Comp Remaining: The amount of storage remaining pre-compression (applies only to
collection contexts).
Post-Comp Remaining: The amount of storage remaining post-compression (applies to
directory, MTree, and collection contexts).

If an error exists in a reported context, a section called Replication Context Error Status is added to the
report. It includes the ID, source/destination, the type, the status, and a description of the error.
The last section of the report is the Replication Destination Space Availability, showing the destination
system name and the total amount of storage available in GiB.
Related CLI command:
# replication show performance
Displays current replication activity.

Slide 28

Module 6: Data Replication and Recovery

Lesson 4: Data Recovery


This lesson covers the following topics:
Recovering data from an off-site replica
Resyncing recovered data


This lesson covers the following topics:


Recovering data replicated off-site
Resyncing recovered data


Slide 29

Recovering Data

Data Domain systems are typically used to store backup data

onsite for short periods of fewer than 90 days.


Offsite Data Domain systems store backup replicas for disaster
recovery purposes.
If onsite backups are lost, the offsite replica can be used to
restore operations.
When the onsite Data Domain system is repaired or replaced,
the data can automatically be recovered from the offsite replica.


Onsite Data Domain systems are typically used to store backup data onsite for short periods such as 30,
60, or 90 days, depending on local practices and capacity. Lost or corrupted files are recovered easily
from the onsite Data Domain system since it is disk-based, and files are easy to locate and read at any
time.
In the case of a disaster destroying onsite data, the offsite replica is used to restore operations. Data on
the replica is immediately available for use by systems in the disaster recovery facility. When a Data
Domain system at the main site is repaired or replaced, the data can be recovered using a few simple
recovery configuration and initiation commands.
If something occurs that makes the source replication data inaccessible, the data can be recovered from
the offsite replica. Either collection- or directory-replicated data can be recovered to the source. For
collection replication, the destination context must be fully initialized for the recovery process to
succeed. For directory replication, recover the selected data set if it becomes necessary to recover one
or more replication pairs.
Note: If a recovery fails or must be terminated, the replication recovery can be aborted.

Slide 30

Recover Replication Pair Data


For directory replication:


1. Go to Replication > More Tasks > Start Recovery...
2. Select the replication type.
3. In the Recovery Details section, select the system to recover to.
4. In the Recovery Details section, select the system to recover from.
5. Select the appropriate context if more than one is listed.
6. Click OK.
Note: A replication recover cannot be performed on a source context whose path is the source path for
other contexts; the other contexts first need to be broken and then resynchronized after the recovery is
complete.
If a recovery fails or must be terminated, the replication recovery can be aborted. Recovery on the
source should then be restarted as soon as possible.
1. Click the More menu and select Abort Recover. The Abort Recover dialog box appears, showing
the contexts that are currently performing recovery.
2. Click the checkbox of one or more contexts to abort from the list.
3. Click OK.
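The CLI equivalent is the documented replication recover command, entered on the system being restored and naming the context to recover from; the hostname and path below are placeholders:
# replication recover dir://ddtarget.example.com/backup/eng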

Slide 31

Resynchronizing Recovered Data


Resynchronization is the process of bringing a source and destination replication pair back into
sync with each other.
Source and destination are resynchronized so both endpoints contain the same data.
Resynchronization can be used:
To convert a collection replication to directory replication.
To re-create a context that was lost or deleted.
When a replication destination runs out of space while the source system still has data to
replicate.
When a WAN connection is lost for an extended period of time.


Resynchronization is the process of recovering (or bringing back into sync) the data between a source
and destination replication pair after a manual break in replication. The replication pair is
resynchronized so both endpoints contain the same data.
Resynchronization can be used:
To convert a collection replication to directory replication. This is useful when the system is to
be a source directory for cascaded replication. A conversion is started with a replication
resynchronization that filters all data from the source Data Domain system to the destination
Data Domain system. This implies that seeding can be accomplished by first performing a
collection replication, then breaking collection replication, then performing a directory
replication resynchronization.
To re-create a context that was lost or deleted.
When a replication destination runs out of space and the source system still has data to
replicate.

Slide 32

Resynchronization Process

1. Pause replication between a pair by deleting the context.
2. Use Start Resync between the pair.
Resync adds the context to both systems and begins the resync process.
Depending on the amount of data, throughput rates, and load factors, the resync process can take
between several hours and several days.


To resynchronize a replication pair:


1. Break existing replication by selecting the source Data Domain system, and choosing
Replication. Select the context to break, and select Delete Pair and click OK.
2. From either the source or the destination replication system, click the More menu and select
Start Resync. The Start Resync dialog box appears.
3. Select the source system hostname from the Source System menu.
4. Select the destination system hostname from the Destination System menu.
5. Enter the directory path in the Source text box.
6. Enter the directory path in the Destination text box.
7. Click OK.
This process will add the context back to both the source and destination DDRs and start the resync
process. The resync process can take between several hours and several days, depending on the size of
the system and current load factors.
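From the CLI, the same flow uses the documented replication break, replication add, and replication resync commands. The sequence below is a sketch (break the pair, re-add the context, then resync instead of initialize); hostnames and paths are placeholders, and the exact required order should be confirmed against the DD OS documentation for your release:
# replication break dir://ddtarget.example.com/backup/eng
# replication add source dir://ddsource.example.com/backup/eng destination dir://ddtarget.example.com/backup/eng
# replication resync dir://ddtarget.example.com/backup/eng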

Slide 33

Module 6: Summary
Key points covered in this module:

Replication is a method for storing a real-time, offsite replica of backup data.
Replicated data is used to restore operations when backup data is lost.
Data replication types include collection, MTree, and directory.
A replication pair is also called a context.
Replication seeding is the term for copying initial source backup data to a remote destination.
You can resynchronize recovered data when:
You need to re-create a deleted context.
A destination system in a context runs out of space.
You want to convert collection replication to directory replication.


Slide 1

Module 7: Tape Library and VTL Concepts

Upon completion of this module, you should be able to:


Describe virtual tape library (VTL) topology using Data Domain
systems.
Identify requirements and best practices for configuring VTL
on a Data Domain system.
Identify steps to configure VTL on a Data Domain
system.


In this module, you will learn about things to consider when planning, configuring, and managing a
virtual tape library (VTL).

Slide 2

Module 7: Tape Library and VTL Concepts

Lesson 1: Data Domain VTL Overview


This lesson covers the following topics:
VTL configuration overview
Benefits of Data Domain VTL
Simple VTL terminology


In this lesson, you will become familiar with the virtual tape library (VTL) environment that is
configurable on a Data Domain system.

Slide 3

Overview of Data Domain Virtual Tape Library (VTL)


Its use is motivated by the need to leverage existing IT backup policies

using a strategy of physical tape libraries.


A typical configuration is an HBA-equipped host connecting through an FC SAN
to an HBA-equipped Data Domain system.
The host can be Windows, Linux, Unix, Solaris, IBM i, NetApp, VNX, or
any NAS supporting a Fibre Channel card.
VTLs emulate physical tape equipment and function.
Virtual tapes and pools can be replicated over a Data Domain
replication context and later archived to physical tape, if required.
The backup application manages all data movement to and from the
Data Domain system and all tape creation.
Data Domain replication operations manage virtual tape replication
and vaulting.
Enterprise Manager configures and manages tape emulation.


In some environments, the Data Domain system is configured as a virtual tape library (VTL). This practice
may be motivated by the need to leverage existing backup policies that were built using a strategy of
physical tape libraries. Using a VTL can be an intermediate step in a longer range migration plan toward
disk-based media for backup. It might also be driven by the need to minimize the effort to recertify a
system to meet compliance needs.
A Fibre Channel HBA-equipped host connecting to an FC SAN can ultimately connect to a Fibre Channel
HBA-equipped Data Domain system. When properly zoned, the host can send its backups via VTL
protocol directly to the Data Domain system as if the Data Domain system were an actual tape library
complete with drives, robot, and tapes.
This host can be a Windows, Linux, Unix, Solaris, IBM i, NetApp, VNX, or any NAS that supports having a
Fibre Channel card installed.
Virtual tape libraries emulate the physical tape equipment and function. Virtual tape drives are
accessible to backup software in the same way as physical tape drives. Once drives are created in the
VTL, they appear to the backup software as SCSI tape drives. A virtual tape library appears to the backup
software as a SCSI robotic device accessed through standard driver interfaces.

305

When disaster recovery is needed, pools and tapes can be replicated to a remote Data Domain system
using the Data Domain replication process and later archived to tape.
Data Domain systems support backups over the SAN via Fibre Channel HBA. The backup application on
the backup host manages all data movement to and from Data Domain systems. The backup application
also directs all tape creation. Data Domain replication operations manage virtual tape replication and
vaulting. The Data Domain Enterprise Manager is used to configure and manage tape emulations.
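To verify the VTL service from the CLI, a brief sketch (both commands belong to the vtl command set, but confirm the output details in the DD OS Command Reference):
# vtl status
Shows whether the VTL process is enabled and running.
# vtl show config
Lists the configured libraries and their drives, slots, and CAPs.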

306

Slide 4

VTL Using NDMP


(Diagram: clients and a NAS running NDMP client software send backup data over the LAN via TCP/IP
to a server configured with an Ethernet NIC and on to a Data Domain system configured with an NDMP
tape server, which receives the backup data and places it onto virtual tapes in the VTL.)

Module 7: Tape Library and VTL Concepts

Copyright 2013 EMC Corporation. All Rights Reserved.

NDMP (Network Data Management Protocol) is an open-standard protocol for enterprise-wide backup
of heterogeneous network-attached storage. NDMP was co-invented by Network Appliance and PDC
Software (acquired by Legato Systems, Inc., and now part of EMC).
Data Domain systems support backups using NDMP over TCP/IP via standard Ethernet as an alternate
method. This offers a VTL solution for remote office/back office use.
Data servers configured only with Ethernet can also back up to a Data Domain VTL when used with an
NDMP tape server on the Data Domain system. The backup host must also be running NDMP client
software to route the server data to the related tape server on the Data Domain system.
When a backup is initiated, the host tells the server to send its backup data to the Data Domain VTL tape
server. Data is sent via TCP/IP to the Data Domain system where it is captured to virtual tape and stored.
While this process can be slower than Fibre Channel speeds, a Data Domain system can function as an
NDMP tape server in an NDMP environment over IP.

307

Slide 5

Data Domain VTL Benefits

Easily integrates with an existing Fibre Channel or tape-based infrastructure.
Allows simultaneous use of VTL with NAS, NDMP, and DD Boost.
Eliminates the issues related to physical tape by storing backups on disk.
Simplifies and speeds up backups through the use of Data Domain deduplication technology.
Reduces RTO by eliminating the need for physical tape handling.

Module 7: Tape Library and VTL Concepts

Copyright 2013 EMC Corporation. All Rights Reserved.

A Data Domain virtual tape library (VTL) offers simple integration, leveraging existing backup policies in
a backup system currently using a strategy of physical tape libraries.
Any Data Domain system running VTL can also run other backup operations using NAS, NDMP, and DD
Boost simultaneously.
A Data Domain VTL eliminates the use of tape and the accompanying tape-related issues (large physical
storage requirement, off-site transport, high time to recovery, and tape shelf life) for the majority of
restores. Compared to normal tape technology, a Data Domain VTL provides resilience in storage
through the benefits of Data Invulnerability Architecture (DIA) (end-to-end verification, fault avoidance
and containment, continuous fault detection and healing, and file system recoverability).

308

Compared to physical tape libraries, Data Domain systems configured for VTL simplify and speed up
backups through the use of deduplication technology. Backups are also faster because a virtual tape
does not need to wind, rewind, or position to a particular spot. Robotic movement of tapes is likewise
eliminated, which speeds up the overall performance of the tape backup.
Disk-based network storage provides a shorter RTO by eliminating the need for handling, loading, and
accessing tapes from a remote location.

309

Slide 6

VTL Configuration Terms (slide 1 of 2)


Access
Group
Barcode
CAP

A collection (list) of initiator WWPNs or initiator names and the drives and
changers they are allowed to access. The equivalent of LUN masking.
A unique ID for a virtual tape that is assigned when the user creates the virtual
tape cartridge.
Cartridge access port. In a VTL, a CAP is the emulated tape enter/eject point for
moving tapes to or from a library.
Also called: mail slot

Changer

A device that handles the tape between a tape library and the tape drive. In the
virtual tape world, the system emulates a specific changer type.

Initiator

Any Data Domain Storage System client's HBA world-wide port name (WWPN).
An initiator name is an alias that maps to a client's WWPN.

Library

A collection of magnetic tape cartridges used for long term data backup. A
virtual tape library emulates a physical tape library with tape drives, changers,
CAPs, and slots (cartridge slots).
Also called: autoloader, tape silo, tape mount, tape jukebox

Pool

A collection of tapes that maps to a directory in the Data Domain system, used
to replicate tapes to a destination.

Module 7: Tape Library and VTL Concepts

Copyright 2013 EMC Corporation. All Rights Reserved.

Different tape library products may package some components in different ways, and the names of
some elements may differ among products, but the fundamental function is basically the same. The
Data Domain VTL configuration includes tape libraries, tapes, cartridge access ports, and barcodes.

Access Group (VTL Group)


A collection (list) of initiator worldwide port names (WWPNs) or initiator names and the drives
and changers they are allowed to access. It is the equivalent of LUN masking. For multiple hosts
to use the same devices, the Data Domain Storage System requires you to create different
access groups for each host. A group consists of exactly one host (initiator), one or more target
FC ports on the Data Domain Storage System, and one or more devices. The Data Domain
Storage System does not permit multiple hosts to access the same group.

Barcode
A unique ID for a virtual tape. Barcodes are assigned when the user creates the virtual tape
cartridge.

310

CAP
An abbreviation for cartridge access port. A CAP enables the user to deposit and
withdraw volumes in an autochanger without opening the door to the autochanger. In a VTL, a
CAP is the emulated tape enter/eject point for moving tapes to or from a library.
Also called: mail slot.

Changer (Tape Backup Medium Changer)


The device that handles the tape between a tape library and the tape drive. In the virtual tape
world, the system creates an emulation of a specific type of changer.
Although no tapes are physically moved within the Data Domain VTL system, the virtual tape
backup medium changer must emulate the messages your backup software expects to see when
tapes are moved to and from the drives. Selecting and using the incorrect changer model in your
VTL configuration causes the system to send incorrect messages to the backup software, which
can cause the VTL system to fail.

Initiator
Any Data Domain Storage System client's HBA WWPN. An initiator name is an alias that maps to
a client's WWPN.

Library
A collection of magnetic tape cartridges used for long-term data backup. A virtual tape library
emulates a physical tape library with tape drives, changer, CAPs, and slots (cartridge slots).
Also called: autoloader, tape silo, tape mount, tape jukebox, vault.

Pool
A collection of tapes that maps to a directory on a file system, used to replicate tapes to a
destination.
Note: Data Domain pools are not the same as backup software pools. Most backup software,
including EMC NetWorker, has its own pooling mechanism.

311

Slide 7

VTL Configuration Terms (slide 2 of 2)


Slot

A storage location within a library. For example, a tape library has one slot for
each tape that the library can hold.

Tape

A tape is a cartridge holding magnetic tape used to store data long term. Tapes
are virtually represented in a system as grouped data files. The user can
export/import from a vault to a library, move within a library across drives,
slots, and CAPs.
Also called: cartridge.

Tape Drive

The device that records backed-up data to a tape. In the virtual tape world, this
drive still uses the same Linear Tape-Open (LTO) technology standards.

Vault

A holding place for tapes not currently in any library.

Module 7: Tape Library and VTL Concepts

Copyright 2013 EMC Corporation. All Rights Reserved.

Slot
A storage location within a library. For example, a tape library has one slot for each tape that the
library can hold.

Tape
A cartridge holding magnetic tape used to store data long term. Tapes are virtually represented
in a system as grouped data files. The user can export/import from a vault to a library, and move
within a library across drives, slots, and CAPs.
Also called: cartridge.

Tape Drive
The device that records backed-up data to a tape cartridge. In the virtual tape world, this drive
still uses the same Linear Tape-Open (LTO) technology standards as physical drives with the
following capacities:
LTO-1: 100 GB per tape
LTO-2: 200 GB per tape
LTO-3: 400 GB per tape

312

There are additional generations of LTO, but only LTO-1, -2, and -3 are currently supported by
Data Domain. Each drive operates as a single data stream on your network.

Vault
A holding place for tapes not currently in any library. Tapes in the vault eventually have to be
inserted into the tape library before they can be used.

313

Slide 8

Module 7: Tape Library and VTL Concepts

Lesson 2: VTL Planning


This lesson covers the following topics:
Review of Data Domain configuration specifications
What to consider when planning a VTL environment
Tape size and count considerations

Module 7: Tape Library and VTL Concepts

Copyright 2013 EMC Corporation. All Rights Reserved.

In this lesson, you will become familiar with the evaluation process to determine the capacity and
throughput requirements of a Data Domain system.
Note: This lesson is intended to be a simplified overview of Data Domain VTL configuration planning.
Typically, any production Data Domain system running VTL has been assessed, planned, and configured
by Data Domain implementation experts prior to installation and production.

314

Slide 9

VTL Planning: Capacity and Scalability


Depending on the amount of memory, a Data Domain system can have

between 64 and 540 LTO-1, LTO-2, or LTO-3 tape drives per system:
DD990 has a 540 virtual drive capacity
DD890 has a 256 virtual drive capacity
DD6xx has a 64 virtual drive capacity
A single Data Domain system can support:
Up to 64 virtual libraries
Up to 32k slots per library and 64k slots per system
Up to 100 CAPs per library and 1000 CAPs per system
Up to 4000 GiB per tape.

Note: These are some of the maximum capacities for various features in a VTL
configuration for the larger Data Domain systems. Check the VTL Best Practices Guide for
recommendations for your system and configuration.

Module 7: Tape Library and VTL Concepts

Copyright 2013 EMC Corporation. All Rights Reserved.

In setting up a virtual tape library (VTL) on a Data Domain system, you configure parameters in the
environment to structure the number and size of elements within each library. The parameters you
choose are dictated by the tape technology and library you are emulating. Efficiencies are dictated by
the processing power and storage capacity of the Data Domain restorer being used as the VTL system.
Larger, faster systems allow more streams to write to a higher number of virtual tape drives, thus
providing faster virtual tape backups.
Libraries: All systems are currently limited to a maximum of 64 libraries (64 concurrently active VTL
instances on each Data Domain system).
Drives: Up to 540 tape drives are supported, depending on the Data Domain model. A DD6xx, model can
have a maximum of 64 drives. A DD890 model can have a maximum of 256 drives.
Note: Although a DD890 can be configured with up to 256 tape devices, the system is limited to a
maximum of 180 concurrent streams. Drives beyond 180 can still be configured for provisioning per
backup policies.
Initiators: A maximum of 92 initiator names or WWPNs can be added to a single access group.

315

Slots: The maximum numbers of slots in the library are:


32,000 slots per library
64,000 slots per system
The system automatically adds slots to keep the number of slots equal to or greater than the
number of drives.
CAPs: The maximum numbers of cartridge access ports (CAPs) are:
100 CAPs per library
2000 CAPs per system
Tapes: Can be configured to 4000 GiB per tape.
Note: The information presented on this slide indicates some of the maximum capacities for the various
features in a Data Domain VTL configuration. Your backup host may not support these capacities. Refer
to your backup host software support for correct sizing and capacity to fit your software.
Understand that the Data Domain VTL is scalable and should accommodate most configurations.
Standard practices suggest creating only as many tape cartridges as needed to satisfy backup
requirements, and enough slots to hold the number of tapes you create. Creating additional slots is not
a problem. The key to good capacity planning is not to provision excessively beyond the system's needs,
and to add capacity as needed.
For further information about the definitions and ranges of each parameter, consult the DD OS 5.2
System Administration Guide and the most current VTL Best Practices Guide. Both are available through
the Data Domain Support Portal.

316

Slide 10

Considerations When Planning VTL

VTL license
Fibre Channel hardware
Number of slots and drives
Space management considerations
Backup size
Data type
Retention periods and expired media
Replication

Working with your EMC implementation and support team

Module 7: Tape Library and VTL Concepts

Copyright 2013 EMC Corporation. All Rights Reserved.


As you plan your VTL configuration, be sure to give special consideration to the following:

VTL License
VTL is a licensed feature of the Data Domain system. Only one license is needed to back up to a
Data Domain system configured for VTL.

Fibre Channel Hardware Considerations


There are many 4 Gb/s and 8 Gb/s Fibre Channel port solutions for target mode Fibre Channel
attachment. All connections to these ports should be via a Fibre Channel switch or direct
attachment of a device. Check the DD OS 5.2 Backup Compatibility Guide found in the Data
Domain Support Portal to see if a specific Fibre Channel HBA card is supported. The DD OS 5.2
Backup Compatibility Guide indicates which driver and DD OS versions are required.

Fibre Channel Switch Compatibility


Data Domain systems can be connected to hosts through FC switches or directors. When adding
or changing a switch/director, consult the DD OS 5.2 Backup Compatibility Guide found in the
Data Domain Support Portal to determine compatibility and the firmware, DD OS version, and
type of support (VTL, IBM i, or gateway) it offers prior to installation and use.

317

When you establish fabric zones via FC switches, the best way to avoid problems with VTL configurations
is to include only one initiator and one target port in one zone. Avoid having any other targets or
initiators in any zones that contain a gateway target HBA port.

The following recommendations apply when connecting the Data Domain system to a backup
host via Fibre Channel:
Only initiators that need to communicate with a particular set of VTL target ports on a Data
Domain system should be zoned with that Data Domain system.
The host-side FC port must be dedicated to Data Domain VTL devices.
All host-side FC HBAs should be upgraded to the latest driver version for the OS being used.
If you are uncertain about compatibility with your FC HBAs installed in an application server
and operating as initiators for VTL, consult the DD OS 5.2 Backup Compatibility Guide,
available on the Support Portal or contact Support for assistance and advice.
When establishing fabric zones via FC switches, the best way to avoid problems with VTL
configurations is to include only one initiator and one target port in one zone.

The following recommendations apply to target HBAs:


Consider spreading the backup load across multiple FC ports on the Data Domain system in
order to avoid bottlenecks on a single port.
Verify the speed of each FC port on the switch to confirm that the port is configured for the
desired rate.
Set secondary ports to None unless explicitly necessary for your particular configuration.

Number of Slots and Drives for a Data Domain VTL Configuration


In a physical tape library setting, multiplexing (interleaving data from multiple clients onto a
single tape drive simultaneously) is a method to gain efficiency. Multiplexing was useful for
clients with slow throughput, since a single client could not send data fast enough to keep the
tape drive busy. With Data Domain VTL, multiplexing causes the same data to land on a Data
Domain system in a different order each time a backup is performed. This makes it nearly
impossible for the system to recognize repeated segments, thus ruining deduplication
efficiency. Do not enable multiplexing on your backup host software when writing to a Data
Domain system.
To increase throughput efficiency and maintain deduplication-friendly data, establish multiple
data streams from your client system to the Data Domain system. Each stream will require
writing to a separate virtual drive.

The number of slots and drives in a VTL is governed by the number of simultaneous backup
and restore streams that are expected to run. Drive counts are also constrained by the
configuration and overall performance limits of your particular Data Domain system. Slot counts
are typically based on the number of tapes used over a retention policy cycle.

318

Data Domain Space Management Considerations


It is important to note that the same considerations for capacity planning also apply when you
are planning a VTL environment. Space management considerations include:
The size of your backups: The larger the overall amount you need to back up, the more time
should be allotted to perform the backups. Using multiple drives and data streams should be
a consideration. The more powerful your Data Domain system, the greater number of
concurrent streams you can employ.
The source data type: How many files are you backing up? If you are backing up larger files,
perhaps you should consider using larger capacity tapes.
Retention periods and data space: How long do you need to hold on to your backups? You
cannot recover the data space used by a tape if the tape is still holding unexpired data. This
can be a problem if you are managing smaller file sets on large tapes. Smaller tapes give you
more flexibility when dealing with smaller data sets.
Expired media is not available for space reclamation (file system cleaning) until the volume
is also relabeled. Relabeling the expired tape volume places it in a state that allows the
space reclamation process to dereference and subsequently delete the unique blocks
associated with the backups on that volume.
You may want to use a script with backup software commands to force relabeling of volumes
as they expire. Some backup software will always use a blank tape in preference to one with
customer data; if there are many unnecessary tapes, space reclamation will be inefficient.
Replication: Replication and VTL operations require substantial resources and complete
faster when run separately, so it is good practice to schedule them at different times.

Work with Your EMC implementation and Support Team


Be sure to work closely with your EMC implementation team to properly size, configure, and test your
VTL system design before running it in a production backup scenario.

319

Slide 11

Tape Size Considerations

Check your specific backup application requirements.
Choose larger tapes if you are backing up large single data files.
Choose a strategy of smaller tapes across a larger number of drives to operate a greater
number of data streams for increased bandwidth.
Expired tapes are not deleted, and the space occupied by a tape is not reclaimed until it is
relabeled, overwritten, or deleted.
Smaller tapes are easier to manage and alleviate system full conditions.

Module 7: Tape Library and VTL Concepts

Copyright 2013 EMC Corporation. All Rights Reserved.


Choosing the optimal size of tapes for your needs depends on multiple factors, including the specific
backup application being used and the characteristics of the data being backed up. In general, it's better
to use a larger number of smaller capacity tapes than a smaller number of large capacity tapes, in order
to control disk usage and prevent system full conditions.
When choosing a tape size, you should also consider the backup application being used. For instance,
Hewlett Packard Data Protector supports only LTO-1/200 GB capacity tapes.
Data Domain systems support LTO-1, LTO-2, and LTO-3 formats:
LTO-1: 100 GB per tape
LTO-2: 200 GB per tape
LTO-3: 400 GB per tape
If the data you are backing up is large (over 200 GB, for example), you may want larger-sized tapes, since
some backup applications are not able to span across multiple tapes.

320

The strategy of using smaller tapes across many drives gives your system greater throughput by using
more data streams between the backup host and Data Domain system.
Larger capacity tapes pose a risk of system full conditions. It is more difficult to expire and reclaim the
space of data held on a larger tape than on smaller tapes. A larger tape can hold more backups, making
it potentially harder to expire because it might contain a current backup. Expired tapes are not deleted,
and the space occupied by a tape is not reclaimed until it is relabeled, overwritten, or deleted. Consider
a situation in which data is held on a 1 TB tape: even if half of that space (500 GB) holds expired
backups, you still cannot reclaim any of it while the tape holds unexpired data.

321

Slide 12

Tape Sizing
All backup images on a tape must expire, by policy or manually, before the space in the
cartridge can be relabeled and made available for reuse.
For this reason, smaller capacity tapes generally work better when backing up smaller
amounts of data to tape.
All data segments identified as part of the VTL tape are treated as a complete set of data.
File system cleaning cannot run on a tape until all data on the tape is expired.
(Diagram callouts: unexpired and active data pointers; expired backups still claiming disk
segments until all files on the tape expire.)

Module 7: Tape Library and VTL Concepts

Copyright 2013 EMC Corporation. All Rights Reserved.


All backups on a tape must expire, by policy or manually, before the space in the cartridge can be
relabeled and made available for reuse. If backups with different retention policies exist on a single
piece of media, the youngest image will prevent file system cleaning and reuse of the tape. You can
avoid this condition by initially creating and using smaller tape cartridges (in most cases, tapes in the
100 GB to 200 GB range).
Unless you are backing up larger-size files, backing up smaller files to larger-sized tapes will contribute to
this issue by taking longer to fill a cartridge with data. Using a larger number of smaller-sized tapes
reduces the chance of a few young files preventing the cleaning of older data on a larger tape.

322

Slide 13

Tape Count Guidelines

Create only as many tapes as you need to hold your backup data until retention expires and
space can be reclaimed.
The total capacity of a starting tape set should be less than 2x the available space on the
Data Domain system.
Creating too many virtual tapes might cause the Data Domain system to reach system full
conditions prematurely.
Optimal tape size depends on the size of the files being backed up and the backup application
used. A good rule of thumb: use small-sized tapes with small file types and larger-sized tapes
with larger file types.

Module 7: Tape Library and VTL Concepts

Copyright 2013 EMC Corporation. All Rights Reserved.


When deciding how many tapes to create for your VTL configuration, remember that creating more
tapes than you actually need might cause the system to fill up prematurely and cause unexpected
system full conditions. In most cases, backup software will use blank tapes before recycling tapes. It is a
good idea to start with a tape set whose total capacity is less than twice the available space on the Data
Domain system.

323

Slide 14

Tapes: VTL Barcode Definition

Tape barcodes are 8-character tape identifiers.
When creating tapes, you must provide a starting barcode.
A starting barcode specifies:
A 2 or 3 character unique identifier
A 3 or 4 digit number marking the beginning of the sequence of tapes
A 2 character designation identifying the default capacity for each tape
if not otherwise specified
(Slide annotates a sample barcode with its three fields: identifier, sequence, and tape capacity.
In the example, the first tape number is 100; the numbers increment serially to 999.)

Code  Capacity  Tape Type
----  --------  ---------
L1    100 GiB   LTO-1
L2    200 GiB   LTO-2
L3    400 GiB   LTO-3
LA    50 GiB
LB    30 GiB
LC    10 GiB
Module 7: Tape Library and VTL Concepts

Copyright 2013 EMC Corporation. All Rights Reserved.


When a tape is created, a logical, eight-character barcode is assigned as the unique identifier of the tape.
When creating tapes, the administrator must provide the starting barcode. The barcode must start with
six numeric or uppercase alphabetic characters (from the set {0-9, A-Z}). The barcode may end with a
two-character tag for the supported LTO-1, LTO-2, and LTO-3 tape types.
A good practice is to use either two or three of the first characters as the identifier of the group to which
the tapes belong. If you use two characters as the identifier, you can then use four numbers in sequence
to number up to 10,000 tapes. If you use three characters, you are able to sequence only 1000 tapes.
Note: If you specify the tape capacity when you create a tape through the Data Domain Enterprise
Manager, you override the two-character tag capacity specification.
324

Slide 15

Module 7: Tape Library and VTL Concepts

Lesson 3: Configure Data Domain as VTL


This lesson covers the following topics:
Creating a tape library
Creating tapes
Importing tapes
Configuring the physical resources used for VTL
NDMP tape server configuration
VTL support for IBM-I

Module 7: Tape Library and VTL Concepts

Copyright 2013 EMC Corporation. All Rights Reserved.

In this lesson, you will see the steps you would take to create a library and tapes, and set the logical
interaction between the host initiators and their related access groups.
Basic NDMP tape server configuration with a Data Domain VTL library and a brief overview of VTL
support for IBM i products are also presented.

325


Slide 16

Overview of Configuring a Virtual Tape Library

Verify or configure the VTL license


Enable and configure the VTL service
Create a tape library, including drives, slots, changer, and CAPs
Create an access group
Create tapes
Import tapes
Create pools
Add initiators
Add LUNs

Module 7: Tape Library and VTL Concepts

Copyright 2013 EMC Corporation. All Rights Reserved.


The Enterprise Manager Configuration Wizard walks you through the initial VTL configuration, using the
VTL configuration module. Typically, the Configuration Wizard is run initially by the EMC installation
team in your environment.
To open the Enterprise Manager Configuration Wizard, go to the Enterprise Manager, and select
Maintenance > More Tasks > Launch Configuration Wizard.
Navigate to the VTL configuration, and click No until you arrive at the VTL Protocol configuration section.
Select Yes to configure VTL.
The wizard steps you through library, tape, initiator, and access group configuration.
Manual configuration is also possible. Manually configuring the tape library and tapes, importing tapes,
configuring physical resources, setting initiators, and creating VTL access groups are covered in the
following slides.

326

Slide 17

Creating a Library

Module 7: Tape Library and VTL Concepts

Copyright 2013 EMC Corporation. All Rights Reserved.


Libraries identify the changer, the drives, the drives' associated slots and CAPs, and the tapes to be used
in a VTL configuration.

To create a library outside of the Configuration Wizard, go to Data Management > VTL.
Click the Virtual Tape Libraries stack > More Tasks menu > Library > Create.

Pictured here is the Create Library window in the Data Domain Enterprise Manager.
If the VTL is properly planned ahead of time, you should know the values to enter when creating a
library.

327

Keep in mind the capacities and scalability of the elements configured when creating a library (see the
earlier slide on capacity and scalability).
1. Check the backup software application documentation on the Data Domain support site for the
model name you should use with your application. Typically, Restorer-L180 is used only with
Symantec NetBackup and Backup Exec software. TS3500 is used with various backup applications
and OS versions. If you intend to use TS3500 as your changer emulator, check the DD OS
5.2 Backup Compatibility Guide to be sure TS3500 is supported with your selected OS version
and backup application.
2. Click OK.
The new library appears under the Libraries icon in the VTL Service stack. Options configured
above appear as icons under the library. Clicking the library displays the configuration details in
the informational pane.
Related CLI Commands:
# vtl add
Creates/adds a tape library.
# vtl enable
Enables VTL subsystem.
# vtl disable
Closes all libraries and shuts down the VTL process.
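For example, a hedged invocation (the library name, model, and counts are hypothetical; confirm argument names in the DD OS Command Reference):
# vtl add VTL1 model L180 slots 50 caps 2
Creates a library named VTL1 emulating a Restorer-L180 changer with 50 slots and 2 CAPs.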

328

Slide 18

Creating Tapes

Module 7: Tape Library and VTL Concepts

Copyright 2013 EMC Corporation. All Rights Reserved.


To create tapes:
1. Select the Virtual Tape Library stack, then click the library for which you want to create tapes. In
this case, the library titled VTL is selected.
2. From the More Tasks menu (not pictured), select Tapes > Create.
The Create Tapes pane appears as shown in this slide.
Refer to your implementation plan to find the number, capacity, and starting barcode for your tape set.
A VTL supports up to 100,000 tapes, and tape capacity can be up to 4000 GiB.
You can use the Enterprise Manager to create tapes.
You can create tapes from within a library, a vault, or a pool.
Related CLI commands:
# vtl tape add
Adds one or more virtual tapes and inserts them into the vault. Optionally, associates the tapes
with an existing pool for replication.

329

Slide 19

Importing Tapes

Select the tapes


to import

then click
Import from Vault

Module 7: Tape Library and VTL Concepts

Copyright 2013 EMC Corporation. All Rights Reserved.


When tapes are created, they are added into the vault. From the vault, tapes can be imported, exported,
moved, searched, and removed. Importing moves existing tapes from the vault to a library slot, drive, or
cartridge access port (CAP). The number of tapes you can import at one time is limited by the number of
empty slots in the library.
To import tapes:
1. Select Data Management > VTL > VTL Service > Libraries.
2. Select a library and view the list of tapes, or click More Tasks and select Tapes > Import
3. Enter the search criteria about the tapes you want to import and click Search.
4. Select the tapes to import from the search results.
or
1. Select Data Management > VTL > VTL Service > Libraries.
2. Select the tapes to import by clicking the checkbox next to a tape, or select all by clicking the
checkbox at the top of the column. Only tapes showing Vault in the Location column can be
imported.
3. Click Import from Vault.

330

Related CLI Commands


# vtl import
Moves existing tapes from the vault into a slot, drive, or cartridge access port (CAP).
# vtl export
Removes tapes from a slot, drive, or cartridge access port (CAP) and sends them to the vault.
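For example (the library name, barcode, and count are hypothetical; confirm the element keyword values in the DD OS Command Reference):
# vtl import VTL1 barcode A00000L1 count 10 element slot
Moves ten tapes, starting at barcode A00000L1, from the vault into empty slots in library VTL1.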

331

Slide 20

Overview of Configuring Physical Resources

Enable the HBA ports on the Data Domain system.


Check with your networking team that the SAN switch is properly zoned.
Locate the initiators in the Physical Resources stack of the DD
Enterprise Manager, and set the initiator aliases.
Configure the VTL access groups.

Module 7: Tape Library and VTL Concepts

Copyright 2013 EMC Corporation. All Rights Reserved.


There are four steps to configuring the physical resources used for VTL communication:
1. Enable the HBA ports to be used with your VTL configuration.
2. Work with networking resources to verify that the SAN switch is connected and zoned properly
between the host and the Data Domain system.
3. Locate and set the aliases of the initiators in the Physical Resources stack in the Data Domain
Enterprise Manager.
4. Configure the VTL access groups.

332

Slide 21

Enabling HBA Ports

Module 7: Tape Library and VTL Concepts

Copyright 2013 EMC Corporation. All Rights Reserved.


To enable HBA ports:


1. Select Data Management > VTL > Physical Resources > HBA Ports > More Tasks (not shown in
this slide) > Ports Enable.
The Enable Ports dialog box appears. Only the currently disabled ports are listed.
2. In the Enable Ports dialog box, click the checkboxes of the ports to enable.
3. Click Next to verify the configuration.
4. When the Enable Ports status dialog box displays Completed, click Close.
Related CLI commands:
# vtl port disable
Disables a single Fibre Channel port or all Fibre Channel ports in the list.
# vtl port enable
Enables a single Fibre Channel port or all Fibre Channel ports in the list.
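For example (the port name 5a is hypothetical; the port names on your system appear under Physical Resources > HBA Ports):
# vtl port enable 5a
Enables the single Fibre Channel port 5a.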

333

Slide 22

Setting an Initiator Alias

(Screenshot callouts: an initiator listed with no alias; an initiator listed with an alias; the assigned
access group for each initiator; and the world-wide node number and port number of the FC port
in the media server.)

Module 7: Tape Library and VTL Concepts

Copyright 2013 EMC Corporation. All Rights Reserved.


An initiator is any Data Domain Storage System client's HBA worldwide port name (WWPN) that belongs
to the backup host. An initiator name is an alias that maps to a client's WWPN. The Data Domain system
interfaces with the initiator for VTL activity. Initiator aliases are useful because it is easier to reference a
name than an eight-pair WWPN number when configuring access groups.
For instance, you might have a host server named HP-1 that you want to belong to a group named
HP-1. You can name the initiator coming from that host server HP-1, then create an access group
also named HP-1 and ensure that the associated initiator has the same name.
To set the alias of an initiator:
1. Click Data Management > VTL > Physical Resources > Initiators.
2. Select the initiator you want to alias.
3. Click More Tasks > Set Alias

334

Related CLI Commands:


# vtl initiator set alias
Adds an initiator alias.
# vtl initiator show
Shows configured initiators.
# vtl initiator reset alias
Removes an initiator alias.
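For example (the alias and WWPN are hypothetical; confirm the argument layout in the DD OS Command Reference):
# vtl initiator set alias HP-1 wwpn 21:00:00:e0:8b:9d:3b:e8
Maps the alias HP-1 to the backup host's WWPN for later use in access groups.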

335

Slide 23

VTL Access Group

Module 7: Tape Library and VTL Concepts

Copyright 2013 EMC Corporation. All Rights Reserved.


A VTL access group (or VTL group) is created to manage a collection of initiator WWPNs or aliases and
the drives and changers they are allowed to access. Access group configuration allows initiators in
backup applications to read and write data only to the devices included in the access group list. An
access group may contain multiple initiators (a maximum of 128), but an initiator can exist in only one
access group. A maximum of 512 initiators can be configured for a Data Domain system.
A default access group exists named TapeServer, to which you can add devices that support NDMP-based backup applications. Configuration for this group is discussed in the next slide.
Access groups are similar to LUN masking. They allow clients to access only selected LUNs (media
changers or virtual tape drives) on a system through assignment. A client set up for an access group can
access only those devices in the access group to which it is assigned.
Note: Avoid making access group changes on a Data Domain system during active backup or restore
jobs. A change may cause an active job to fail. The impact of changes during an active job depends on a
combination of backup software and host configurations.

336

To create an access group in the Data Domain Enterprise Manager:


1. Navigate to Data Management > VTL > Access Groups > Groups > More Tasks > Group > Create
2. In the Configuration window, name the access group, and select the initiators to add to it.
3. Click Next.
A window appears in which you can add devices by selecting the library, choosing from a list
of devices, and identifying the LUN number, as well as the primary and secondary (failover)
ports each device should use.
Related CLI Commands:
# vtl group add
Adds an initiator or a device to a group.
# vtl group create
Creates a group.
# vtl group del
Removes an initiator or device from a group.
# vtl group destroy
Destroys a group.
# vtl group modify
Modifies a device in a group.
# vtl group rename
Renames a group.
# vtl group show
Shows configured groups.
# vtl group use
Switches the ports in use in a group or library to the primary or secondary port list.
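A hedged sketch of building an access group from the CLI (the names, device list, and LUN numbering are hypothetical, and the exact argument layout is an assumption to confirm in the DD OS Command Reference):
# vtl group create HP-1
# vtl group add HP-1 initiator HP-1
# vtl group add HP-1 vtl VTL1 changer lun 0
# vtl group add HP-1 vtl VTL1 drive 1 lun 1
Creates the group, adds the initiator alias, and then grants the group access to the library's changer and first drive.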

337

Slide 24

VTL Access Group (continued)

Module 7: Tape Library and VTL Concepts

Copyright 2013 EMC Corporation. All Rights Reserved.


The Initiators tab of the Access Group shows the Initiator alias and its related WWPN that is grouped to
the LUNs listed in the LUNs tab.
It is showing the administrator that the host associated to this initiator can see the changers and drives
listed in the LUNs tab.

338

Slide 25

Introduction to Tape Server Configuration

TapeServer does not require a Fibre Channel HBA, and does not use
an HBA if one is installed.
Devices assigned to the access group TapeServer can be
accessed only by the NDMP TapeServer.
An NDMP user is associated with the configuration for
authentication purposes. DD OS users can be used, but the
password is sent as plain text over the network. Adding a user
through ndmpd enables password encryption for added security.
The top-level CLI command is ndmpd.

Module 7: Tape Library and VTL Concepts

Copyright 2013 EMC Corporation. All Rights Reserved.


When configuring an NDMP over TCP/IP configuration, a Data Domain system starts an NDMP tape
server.
NDMP tape servers are accessed via a standard NDMP protocol. For more details see http://ndmp.org.
The host server must have NDMP client software installed and running. This client software is used to
remotely access the Data Domain VTL.
Devices assigned to the access group TapeServer on the Data Domain system can be accessed only by
the NDMP tape server.
The NDMP tape server on the Data Domain system converts this data to tape I/O and writes it to the
Data Domain VTL.
An NDMP user is associated with the configuration for authentication purposes. DD OS users can be
used, but the password is sent as plain text over the network. Adding a user through ndmpd enables
password encryption for added security.
The top-level CLI command is ndmpd.

339

Slide 26

Tape Server Configuration


Enable the NDMP daemon
sysadmin@dddev-01# ndmpd enable
Starting NDMP daemon, please wait...
NDMP daemon is enabled

Make sure the NDMP daemon sees the devices in the TapeServer access group
sysadmin@dddev-01# ndmpd show devicenames
NDMP Device         Virtual Name     Vendor  Product      Serial #
------------------  ---------------  ------  -----------  ----------
/dev/dd_ch_c0t310   Mydd610 changer  STK     L180         3478270003
/dev/dd_ch_c0t410   Mydd610 drive 1  IBM     ULTRIUM-TD3  3478270004
/dev/dd_ch_c0t510   Mydd610 drive 2  IBM     ULTRIUM-TD3  3478270005
/dev/dd_ch_c0t910   Mydd610 drive 3  IBM     ULTRIUM-TD3  3478270006
/dev/dd_ch_c0t1310  Mydd610 drive 4  IBM     ULTRIUM-TD3  3478270007

Add and verify an NDMP user for the ndmpd service
sysadmin@dddev-01# ndmpd user add ndmp
Enter password:
Verify password:
sysadmin@dddev-01# ndmpd user show
ndmp

Module 7: Tape Library and VTL Concepts

Copyright 2013 EMC Corporation. All Rights Reserved.


The following steps configure an NDMP tape server on the Data Domain system.
1. Enable the NDMP daemon by typing the CLI command ndmpd enable.
2. Verify that the NDMP daemon sees the devices created in the TapeServer access group by
entering the command ndmpd show devicenames.
Note: You must first create a VTL per the instructions discussed earlier in this module, and then
assign the devices to the TapeServer access group, before performing this step.
The VTL device names will appear as a table as shown in this slide.
3. Add an NDMP user for the ndmpd service. Enter the command, ndmpd user add ndmp.
When prompted, enter and verify the password for this user. Verify the created user by entering
the command, ndmpd user show. The username appears below the command.

340

Slide 27

Tape Server Configuration (continued)

Check the options for the ndmpd daemon
sysadmin@dddev-01# ndmpd option show all
Name            Value
--------------  --------
authentication  text
debug           disabled
port            10000
preferred-ip
--------------  --------

Set the ndmpd service authentication to MD5
sysadmin@dddev-01# ndmpd option set authentication md5

Verify the service authentication was correctly set to MD5
sysadmin@dddev-01# ndmpd option show all
Name            Value
--------------  --------
authentication  md5
debug           disabled
port            10000
preferred-ip
--------------  --------

Module 7: Tape Library and VTL Concepts

Copyright 2013 EMC Corporation. All Rights Reserved.


(Continued from previous slide)


4. Check the options for the ndmpd daemon. Enter the command ndmpd option show all.
A table showing the names of the options appears as shown in this slide.
Note that the authentication value is set to text. That means your authentication to the ndmp
daemon is transmitted as plain text, which is a possible security risk.
5. Set the ndmpd service authentication to MD5. Enter the command, ndmpd option set
authentication md5.
6. Verify the service.

341

Slide 28

VTL Support for IBM i

Data Domain System

The TS3500 library type as VTL is used specifically

for IBM iSeries / AS400 support.


VTL is configured with IBM LTO-3 drives.
VTL support requires a special IBM i license.
The license must be active before configuration; the VTL must be configured after licensing.

Module 7: Tape Library and VTL Concepts

Copyright 2013 EMC Corporation. All Rights Reserved.


IBM Power Systems utilize a hardware abstraction layer between the operating system and the physical
hardware. All peripheral equipment must emulate IBM equipment, including IBM tape libraries and
devices, when presented to the operating system.
Additionally, the hardware drivers used by these systems are embedded in the LIC (Licensed Internal
Code) and IBM i operating system. LIC PTFs, or program temporary fixes, are IBM's method of updating
and activating the drivers. In most cases, hardware configuration settings cannot be manually
configured; because only IBM equipment, or equipment that emulates IBM equipment, is attached, only
fixed configuration settings are required.
Fibre Channel devices can be connected directly to host (direct attach) through FC-AL topology or
through a switched fabric (FC-SW) topology. Please note that the Data Domain VTL supports only
switched fabric for connectivity. The Fibre Channel host bus adapters or IOAs (input/output adapters)
can negotiate at speeds of 2 Gbps, 4 Gbps, and 8 Gbps in an FC-SW environment without any
configuration on the operating system other than plugging in the cable at the host. Fibre Channel IOPs
and IOAs are typically installed by an IBM business partner.

342

Virtual Libraries
Data Domain VTL supports one type of library configuration for IBM i use: an IBM TS3500
configured with IBM LTO-3 virtual tape drives. Virtual library management is done from the Virtual Tape
Libraries tab. From Virtual Tape Libraries > More Tasks > Library > Create, you can set the number of
virtual drives and the number of slots.
A special VTL license that supports IBM i use is required. This special license supports other VTL
configurations as well, but the standard VTL license does not directly support IBM i configurations.
IBM i virtual libraries are not managed any differently from those of other operating systems. Once the
library and tapes are created, they are managed either by BRMS (IBM's tape management on the i),
through other IBM i native command access, or by third-party tape management systems. The only
library supported on the IBM i is the TS3500 with LTO-3 drives, and it must be created after you add the
IBM i license to the Data Domain system to have the correct IBM i configuration.
Refer to the Virtual Tape Library for IBM System i Integration Guide, available in the Support Portal, for
current configuration instructions and best practices when using VTL in an IBM i environment.

343

Slide 29

Lab 7.1: Configuring VTL with EMC Networker

Module 7: Tape Library and VTL Concepts

Copyright 2013 EMC Corporation. All Rights Reserved.

344


Slide 30

Module 7: Summary

Key points covered in this module:


A virtual tape library (VTL) provides an interface between backup

software packages so they can work with a Data Domain system


as if they were working with a physical tape library.
Data Domain systems support backups over the SAN via Fibre
Channel HBA.
VTL backups are also supported using NDMP over TCP/IP.
Expired tapes are not automatically deleted. Space is not
reclaimed until tapes are manually relabeled, overwritten, or
deleted.
Always create more slots than you think you need.
To avoid performance issues, run system-intensive processes
only when active VTL backups are not running.

Module 7: Tape Library and VTL Concepts

Copyright 2013 EMC Corporation. All Rights Reserved.

345


346

Slide 1

Module 8: DD Boost

Upon completion of this module, you should be able to:


Describe DD Boost features and their functions
Identify how replication is enhanced with DD Boost
Describe how DD Boost is configured generally for operation

Module 8: DD Boost

Copyright 2013 EMC Corporation. All Rights Reserved.

This module discusses how DD Boost incorporates several features to significantly reduce backup time
and manage replicated data for easier access in data recovery operations.
By the end of this module, you should be able to:
Describe DD Boost features and their functions.
Identify how replication is enhanced with DD Boost.
Describe how DD Boost is configured for operation.

347

Slide 2

Module 8: DD Boost

Lesson 1: DD Boost Overview and Features


This lesson presents an overview of DD Boost features and
additional options.

Module 8: DD Boost

Copyright 2013 EMC Corporation. All Rights Reserved.

EMC Data Domain Boost extends the optimization capabilities of Data Domain systems for other EMC
environments, such as Avamar and NetWorker, as well as Greenplum, Quest vRanger, Oracle RMAN,
Symantec NetBackup, and Backup Exec.
In this lesson, you will get an overview of the DD Boost functionality and the features that make up this
licensed addition to the Data Domain operating system.

348

Slide 3

DD Boost Overview of Features

DD Boost is a private protocol that is more efficient for backup than CIFS/NFS.
The application host is aware of, and manages, replication of backups created with DD Boost.
This is called Managed File Replication.
DD Boost shares the work of deduplication by distributing some of the processing to the
application host. This feature is called distributed segment processing (DSP).

Module 8: DD Boost

Copyright 2013 EMC Corporation. All Rights Reserved.

There are three basic features to DD Boost:


1. A private protocol that is more efficient than CIFS or NFS. DD Boost has a private, efficient data
transfer protocol with options to increase efficiencies.
2. Distributed segment processing (DSP). This optional DD Boost feature shares portions of the
deduplication process with the application host, improving data throughput.
DSP distributes parts of the deduplication process to the NetWorker storage node using the
embedded DD Boost Library (or, for other backup applications, using the DD Boost plug-in),
moving some of the processing normally handled by the Data Domain system to the application
host. Using the library, the application host identifies the unique segments in the data to be
backed up and sends only those segments to the Data Domain system.
Benefits of DSP include:
Increased throughput
Reduced load on the Data Domain system
Reduced bandwidth utilization

349

Reduced load on the storage node/backup host.
3. DD Boost provides systems with centralized replication awareness and management. Using this
feature, known as Managed File Replication, backups written to one Data Domain system can be
replicated to a second Data Domain system under the management of the application host. The
application host catalogs and tracks the replica, making it immediately accessible for recovery
operations. Administrators can use their backup application to recover duplicate copies directly
from a replica Data Domain system.
Benefits of managed file replication include:
Faster disaster recovery.
Quicker access to recovery. All backups and clones are cataloged in your backup application
on your server.
Full administrative control of all backups and replicas through the backup software.

350

Slide 4

DD Boost Additional Options Overview

Advanced load balancing and link failover via interface groups


Virtual synthetics
Low bandwidth optimization
Encryption of managed file replication data

Module 8: DD Boost

Copyright 2013 EMC Corporation. All Rights Reserved.

Advanced load balancing and link failover via interface groups


To improve data transfer performance and increase reliability, you can create a group interface using
the advanced load balancing and link failover feature. Configuring an interface group creates a private
network within the Data Domain system, comprised of the IP addresses designated as a group. Clients
are assigned to a single group by specifying client name (client.emc.com) or wild card name (*.emc).
Benefits include:
Potentially simplified installation management
A system that remains operational through loss of individual interfaces
Potentially higher link utilization
In-flight jobs that fail over to healthy links, so jobs continue uninterrupted from the point of
view of the backup application.
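A minimal sketch of creating the group (assuming the DD OS 5.2 ddboost ifgroup command set; the IP addresses are hypothetical, and the syntax should be confirmed in the DD OS Command Reference):
# ddboost ifgroup add interface 192.168.50.21
# ddboost ifgroup add interface 192.168.50.22
# ddboost ifgroup enable
Adds two interfaces to the group and enables advanced load balancing and link failover across them.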
Virtual synthetics
DD Boost in DD OS 5.2 supports optimized synthetic backups when integrated with backup software.
Currently, EMC NetWorker and Symantec NetBackup are the only supported software applications using
this feature.

351

Optimized synthetic backups reduce processing overhead associated with traditional synthetic full
backups. Just like a traditional backup scenario, optimized synthetic backups start with an initial full
backup followed by incremental backups throughout the week. However, the subsequent full backup
requires no data movement between the application server and Data Domain system. The second full
backup is synthesized using pointers to existing segments on the Data Domain system. This optimization
reduces the frequency of full backups, thus improving recovery point objectives (RPO) and enabling
single step recovery to improve recovery time objectives (RTO). In addition, optimized synthetic backups
further reduce the load on the LAN and application host.
Benefits include:
Reduces the frequency of full backups
Improves RPO and RTO
Reduces load on the LAN and application host
Both low bandwidth optimization and encryption of managed file replication data are replication
optional features and are both supported with DD Boost enabled.
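For example, low bandwidth optimization is enabled as a replication option (a sketch; confirm in the DD OS Command Reference):
# replication option set low-bw-optim enabled
Enables low bandwidth optimization for replication contexts on this system.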

352

Slide 5

DD Boost Technology Interoperability


DD Boost works with the following applications:
EMC Avamar
EMC Greenplum Data Computing Appliance
EMC NetWorker
Oracle Recovery Manager (RMAN)
Quest vRanger Pro
Symantec Backup Exec
Symantec NetBackup

Module 8: DD Boost

Copyright 2013 EMC Corporation. All Rights Reserved.

As of DD OS version 5.2, DD Boost currently supports interoperability with the listed products on various
backup host platforms and operating systems. The interoperability matrix is both large and complex. To
be certain a specific platform and operating system is compatible with a version of DD Boost, consult the
EMC DD Boost Compatibility Guide found in the Support Portal at http://my.datadomain.com.

353

Slide 6

DD Boost Storage Units


/data/
  /col1/
    /backup
    /hr
    /sales
    /exchange_su
      /.ddboost

Module 8: DD Boost

Copyright 2013 EMC Corporation. All Rights Reserved.

To store backup data using DD Boost, the Data Domain system exposes user-created disk volumes called
storage units (SUs) to a DD Boost-enabled application host. In this example, an administrator created an
SU named exchange_su. As the system completes the SU creation, an MTree is created, and the file
/.ddboost is placed within the created MTree. Creating additional storage units creates additional
MTrees under /data/col1, each with its own /.ddboost file within. Access to an SU is OS
independent. Multiple application hosts, when configured with DD Boost, can use the same SU on a
Data Domain system as a storage server.
Storage units can be monitored and controlled just as any data managed within an MTree. You can set
hard and soft quota limits and receive reports about MTree content.
Note: Storage units cannot be used with anything but a DD Boost replication context.
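A minimal CLI sketch of creating and verifying an SU (the name exchange_su matches the example above):
# ddboost storage-unit create exchange_su
Creates the storage unit and its backing MTree.
# mtree list
Verifies that /data/col1/exchange_su now appears in the MTree list.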

354

Slide 7

DD Boost: Without Distributed Segment Processing

Backup host:
Sends all data to be backed up to the Data Domain system.
Data Domain system:
Segments the received data and creates fingerprints.
Filters fingerprints.
Compresses unique data.
Notes references to previously stored data and writes new data.
(Diagram: clients send data over the LAN to the backup server, which sends the full backup
stream to the Data Domain system running DD Boost.)

Module 8: DD Boost

Copyright 2013 EMC Corporation. All Rights Reserved.

If you recall, the deduplication on a Data Domain system is a five-step process where the system:
1. Segments data to be backed up
2. Creates fingerprints of segment data
3. Filters the fingerprints and notes references to previously stored data
4. Compresses unique, new data to be stored
5. Writes the new data to disk
In normal backup operations, the backup host has no part in the deduplication process. When backups
run, the backup host sends all backup data to allow the Data Domain system to perform the entire
deduplication process to all of the data.

355

Slide 8

DD Boost: With Distributed Segment Processing

Backup host (with the DD Boost Library):
Segments data to be backed up.
Creates fingerprints and sends them to the Data Domain system.
Compresses and sends only unique data segments to the Data Domain system.
Data Domain system:
Filters fingerprints and requests only unique data segments.
Notes references to previously stored data and writes new data.
(Diagram: clients send data over the LAN to the backup server, which runs the DD Boost Library
and sends only unique segments to the Data Domain system.)

Module 8: DD Boost

Copyright 2013 EMC Corporation. All Rights Reserved.

Distributed segment processing (DSP) shares deduplication duties with the backup host. With DSP
enabled the backup host:
1. Segments the data to be backed up
2. Creates fingerprints of segment data and sends them to the Data Domain system
3. Optionally compresses data to be backed up
4. Sends only the requested unique data segments to the Data Domain system
The Data Domain system:
1. Filters the fingerprints sent by the backup host and requests data not previously stored
2. Notes references to previously stored data and writes new data
The deduplication process is the same whether DSP is enabled or not. With DSP enabled, the backup
host splits the arriving data into 4 KB to 12 KB segments. A fingerprint (or segment ID) is created for each
segment. Each segment ID is sent over the network to the Data Domain system to filter. The filter
determines whether each segment ID is new or a duplicate by checking it against segment IDs already
on the Data Domain system. Segment IDs that match existing segment IDs are referenced and
discarded, while the Data Domain system tells the backup host which segment IDs are unmatched
(new).

356

Unmatched (new) segments are compressed using common compression techniques, such as LZ, GZ, or
Gzfast. This is also called local compression. The compressed segments are then sent to the Data Domain
system and written to disk with the associated fingerprints, metadata, and logs.
The main benefits of DSP are:
More efficient CPU utilization.
Improved utilization of network bandwidth. Less data needs to be sent over the network with each
backup.
Less time to restart failed backup jobs. If a job fails, the data already sent to the Data Domain
system does not need to be sent again, reducing the load on the network and improving the
overall throughput for the failed backups upon retry.
Distribution of the workload between the Data Domain system and the DD Boost-aware
application.
DD Boost can operate with distributed segment processing either enabled or disabled.
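A hedged sketch of toggling DSP from the CLI, using the option command listed later in this module (the
enabled/disabled keywords follow standard DD OS option syntax):
# ddboost option set distributed-segment-processing enabled
Moves segmenting, fingerprinting, and local compression duties to the backup host.
# ddboost option set distributed-segment-processing disabled
Returns the full five-step deduplication process to the Data Domain system.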

357

Slide 9

Considerations for Distributed Segment Processing


Network speed: DSP allows use of existing 1 GbE infrastructure to achieve higher effective throughput
than is physically possible over 1 GbE links.
Application host: Use DSP if your application host is underutilized and can accommodate the additional
processing assignment.


The network bandwidth requirements are significantly reduced because only unique data is sent over
the LAN to the Data Domain systems.
Consider DSP only if your application host can accommodate the additional processing required by its
share of the DSP workflow.

358

Slide 10

DD Boost: Managed File Replication


File replication done ad hoc at the request of DD Boost aware backup software.
Replication and recovery are centrally configured and monitored through backup software.
DD Boost file replication removes the media server from the data path when creating duplicates.
WAN-efficient replication is used between source and destination when making duplicate backups.
Reports the contents of replicated data.

(Diagram: a backup host with the DD Boost library directs a source Data Domain system to replicate over
the WAN to a destination Data Domain system; the two systems form a replication pair.)


DD Boost integration enables the backup application to manage file replication between two or more
Data Domain systems configured with DD Boost software. It is a simple process to schedule Data
Domain replication operations and keep track of backups for both local and remote sites. In turn,
recovery from backup copies at the central site is also simplified because all copies are tracked in the
backup software catalog.
The Data Domain system uses a wide area network (WAN)-efficient replication process for deduplicated
data. The process can be optimized for WANs, reducing the overall load on the WAN bandwidth
required for creating a duplicate copy.

359

Slide 11

Managed File Replication: A NetWorker Example


(Diagram: the NetWorker storage node sends the initial backup to the local Data Domain system [1],
which signals that the backup is complete [2]; the media database is updated with the initial backup
control data [3]; a clone request begins replication [4]; the local system replicates to the remote Data
Domain system [5]; replication completes [6]; and the media database is updated with the replication
copy control data [7].)


This example shows managed file replication with DD Boost. The example is specific to an EMC
NetWorker environment. Symantec and other backup applications using DD Boost will manage
replication in a similar manner.
In this environment, a backup server is sending backups to a local Data Domain system. A remote Data
Domain system is set up for replication and disaster recovery of the primary site.
1. The NetWorker storage node initiates the backup job and sends data to the Data Domain
system. Backup proceeds.
2. The Data Domain system signals that the backup is complete.
3. Information about the initial backup is updated in the NetWorker media database.
4. The NetWorker storage node initiates replication of the primary backup to the remote Data
Domain system through a clone request.
5. Replication between the local and remote Data Domain systems proceed.
6. When replication completes, the NetWorker storage node receives confirmation of the
completed replication action.
7. Information about the clone copy of the data set is updated in the NetWorker media database.
Replicated data is now immediately accessible for data recovery using the NetWorker media database.

360

Slide 12

Considerations for Managed File Replication

Standard MTree replication and managed file replication can operate on the same system.
Note: Managed file replication can be used only with DD Boost storage units, while MTree replication
can be used only with CIFS and NFS data.
Any combination of MTrees can be created but cannot exceed the limit of 100 MTrees total.
Remember to remain below the recommended limit of replication pairs for your Data Domain systems.


While it is acceptable for both standard MTree replication and managed file replication to operate on
the same system, be aware that managed file replication can be used only with MTrees established with
DD Boost storage units. MTree replication can be used only with CIFS and NFS data.
You also need to be mindful not to exceed the total number of 100 MTrees on a system. The 100 MTree
limit is a count of both standard MTrees and MTrees created as DD Boost storage units.
Also remember to remain below the maximum total number of replication pairs (contexts)
recommended for your particular Data Domain systems.
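To check a system against these limits, the standard listing commands can be used (a sketch; output
formats vary by DD OS release):
# mtree list
Lists every MTree, including DD Boost storage units, for counting against the 100-MTree limit.
# replication status
Shows the configured replication contexts, so the count can be compared against the recommended
maximum for your model.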

361

Slide 13

DD Boost Advanced Load Balancing and Link Failover


Application-layer aggregation of multiple 1 GbE/10 GbE physical ports on Data Domain systems that
enables:
Automatic load balancing and failover.
Improved performance on grouped 1 GbE physical ports.
DD Boost negotiates with Data Domain systems to obtain an interface to send the data.
Distributed segment processing is not affected by interface groups.

(Diagram: a load-balanced backup server group of backup hosts, each running an OST plug-in, connects
through application-layer aggregation to a 4-port interface group spanning two NICs on the Data Domain
system.)


For Data Domain systems that require multiple 1 GbE links to obtain full system performance, it is
necessary to set up multiple backup servers on the Data Domain systems (one per interface) and target
the backup policies to different servers to spread the load on the interfaces. Using the DD Boost
interface groups, you can improve performance on 1 Gb Ethernet ports.
The Advanced Load Balancing and Link Failover feature allows for combining multiple Ethernet links into
a group. Only one of the interfaces on the Data Domain system is registered with the backup
application. DD Boost software negotiates with the Data Domain system on the interface registered with
the backup application to obtain an interface to send the data. The load balancing provides higher
physical throughput to the Data Domain system compared to configuring the interfaces into a virtual
interface using Ethernet-level aggregation.
The links connecting the backup hosts and the switch that connects to the Data Domain system are
placed in an aggregated failover mode. A network-layer aggregation of multiple 1 GbE or 10 GbE links is
registered with the backup application and is controlled on the backup server.
This configuration provides network failover functionality from end-to-end in the configuration. Any of
the available aggregation technologies can be used between the backup servers and the switch.

362

An interface group is configured on the Data Domain system as a private network used for data transfer.
The IP address must be configured on the Data Domain system and its interface enabled. If an interface
(or a NIC that has multiple interfaces) fails, all of the in-flight jobs to that interface transparently fail
over to a healthy interface in the interface group (ifgroup). Any jobs started subsequent to the failure
are routed to the healthy interfaces. You can add public or private IP addresses for data transfer
connections.
Distributed segment processing (DSP) is not affected by DD Boost application-level groups.
With dynamic load balancing and failover, the DD Boost plug-in dynamically negotiates with the Data
Domain system on the interface registered with the backup application to obtain an interface to send
the data. The load balancing provides higher physical throughput to the Data Domain system compared
to configuring the interfaces into a virtual interface using Ethernet-level aggregation.
Note: Do not use 1GbE and 10GbE connections in the same interface group.
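A minimal configuration sketch using the ifgroup commands covered later in this module (the IP
addresses are placeholders for interfaces of the same speed):
# ddboost ifgroup add interface 192.168.1.21
# ddboost ifgroup add interface 192.168.1.22
Adds two data-transfer interfaces to the group; per the note above, both should be 1 GbE or both 10 GbE.
# ddboost ifgroup enable
Enables the interface group.
# ddboost ifgroup show config
Verifies the group membership.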

363

Slide 14

Virtual Synthetic Backups

Are full backups generated from one previous traditional or synthetic full backup and differential
backups or a cumulative incremental backup.
Can be used to restore files and directories just like a traditional backup.
Reduce network traffic and client processing by transferring backup data over the network only once.
Are a scalable solution for backing up remote offices with manageable data volumes and low levels of
daily change.


A synthetic full or synthetic cumulative incremental backup is a backup assembled from previous
backups. Synthetic backups are generated from one previous, traditional full or synthetic full backup,
and subsequent differential backups or a cumulative incremental backup. (A traditional full backup
means a non-synthesized, full backup.) A client can use the synthesized backup to restore files and
directories in the same way that a client restores from a traditional backup.
During a traditional full backup, all files are copied from the client to a media server and the resulting
image set is sent to the Data Domain system. The files are copied even though those files may not have
changed since the last incremental or differential backup. During a synthetic full backup, the previous
full backup and the subsequent incremental backups on the Data Domain system are combined to form
a new, full backup. The new, full synthetic backup is an accurate representation of the client's file
system at the time of the most recent full backup.
Because processing takes place on the Data Domain system under the direction of the media server
instead of the client, virtual synthetic backups help to reduce the network traffic and client processing.
Client files and backup image sets are transferred over the network only once. After the backup images
are combined into a synthetic backup, the previous incremental and/or differential images can be
expired.

364

The virtual synthetic full backup is a scalable solution for backing up remote offices with manageable
data volumes and low levels of daily change. If the clients experience a high rate of change daily, the
incremental or differential backups are too large. In this case, a virtual synthetic backup is no more
helpful than a traditional full backup. To ensure good restore performance, it is recommended that you
create a traditional full backup every two months, presuming a normal weekly full and daily incremental
backup policy.
The virtual synthetic full backup is the combination of the last full (synthetic or full) backup and all
subsequent incremental backups. It is time-stamped as occurring one second after the latest
incremental. It does NOT include any changes to the backup selection since the latest incremental.

365

Slide 15

Considerations for Synthetic Backups

The amount of change in the daily incremental backups.
The size and physical storage capacity of your Data Domain system.
How well your systems handle DSP.
How frequently you perform data restores from your backed-up data.
The type of data being backed up; that is, does it lend itself well to virtual synthetic backups?


Synthetic backups can reduce the load on an application server and the data traffic between an
application server and a media server. Synthetic backups can reduce the traffic between the media
server and the DD System by performing the Virtual Synthetic Backup assembly on the DD System.
You might want to consider using virtual synthetic backups when:
Your backups are small, and localized, so that daily incrementals are small (<10% of a normal,
full backup).
The Data Domain system you are using has a large number of disks (>10).
Data restores are infrequent.
Your intention is to reduce the amount of network traffic between the application server, the
media servers and the Data Domain system.
Your media servers are burdened and might not handle DSP well.

366

It might not be appropriate to use virtual synthetic backups when:


Daily incremental backups are high, or highly distributed (incrementals are > 15% of a full
backup).
You are backing up large, non-file system data (such as databases).
Data restores are frequent.
The Data Domain system is small or has few disks.
Your media server handles DSP well.
Restore performance from a synthetic backup will typically be worse than a standard full backup due to
poor data locality.

367

Slide 16

Module 8: DD Boost

Lesson 2: Configure Data Domain to Use DD Boost


This lesson covers how to integrate DD Boost in EMC NetWorker
and Symantec NetBackup environments


EMC Data Domain Boost integrates with many EMC, and a growing number of third-party, applications.
This lesson discusses how DD Boost integrates with EMC NetWorker and Symantec NetBackup.

368

Slide 17

Enabling DD Boost
(Diagram: a backup host with the DD Boost library writes to a source Data Domain system, which
replicates to a destination Data Domain system, both running DD Boost.)

DD Boost functionality is built into the DD OS. A license is required, but no installation is required.

When using DD Boost with Symantec NetBackup and other 3rd-party backup applications, you must
download and install the appropriate OST plug-in. With EMC NetWorker, EMC Avamar, Oracle RMAN,
and others, the DD Boost library is built into the application with no further installation required.

A separate DD Boost license is required for a destination Data Domain system if you implement the
managed file replication feature.


The DD Boost feature is built into the Data Domain operating system. Unlock the DD Boost feature on
each Data Domain system with separate license keys. If you do not plan to use Managed File
Replication, the destination Data Domain system does not require a DD Boost license.
Note: For EMC, Oracle, and Quest users, the Data Domain Boost library is already included in recent
versions of software. Before enabling DD Boost on Symantec Backup Exec and NetBackup, a special OST
plug-in must be downloaded and installed on the backup host. The plug-in contains the appropriate DD
Boost library for use with compatible Symantec product versions. Consult the most current DD Boost
Compatibility Guide to verify compatibility with your specific software and Data Domain operating
system versions. Both the compatibility guide and versions of OpenStorage (OST) plug-in software are
available through the Data Domain support portal at: http://my.datadomain.com.
A second, destination Data Domain system licensed with DD Boost is needed when implementing
centralized replication awareness and management.

369

Slide 18

DD Boost Configuration

On the backup host (DD Boost Library):
1. License as required.
2. Create devices and pools through the backup server management console and interface.
3. Configure the backup policies/groups to use Data Domain configured devices.
4. Configure the backup host to use Data Domain configured devices on desired Data Domain systems.

On the source Data Domain system (DD Boost):
1. License DD Boost.
2. Enable DD Boost.
3. Set a client and a Data Domain local user as a DD Boost user.
4. Create DD Boost storage units.
5. Enable or disable optional DD Boost features.

On the destination Data Domain system (DD Boost):
1. License DD Boost.
2. Enable DD Boost.
3. Set a Data Domain local user as a DD Boost user.
4. Create DD Boost storage units.

Network note: Enable the following ports:
UDP 2049 (enables NFS communication)
TCP 2051 (enables file replication communication)
TCP 111 (enables RPC portmapper services communication)


Data Domain Boost configuration is the same for all backup environments:
On each of the Data Domain systems:
1. License DD Boost on the Data Domain system(s): System Settings > Licenses > Add Licenses
2. Enable DD Boost on all Data Domain systems: Data Management > DD Boost > DD Boost Status >
Enable.
3. Set a backup host as a client by hostname (the configuration does not accept IP addresses in this
case). Define a Data Domain local user as the DD Boost User: Data Management > DD Boost >
DD Boost User > Modify
4. Create at least one storage unit. You must create one or more storage units for each Data
Domain system enabled for DD Boost: Data Management > DD Boost > Storage Units > Create
Storage Unit

370

The following are optional configuration parameters:


Configure distributed segment processing: DD Boost > Activities > Distributed Segment
Processing Status: > Enable (default)/Disable
Note: DSP is enabled by default.
Configure advanced load balancing and link failover: DD Boost > Activities > Interface Group
Status > Configure (then Enable).
Enable low-bandwidth optimization: DD Boost > Active File Replications > Low Bandwidth
Optimization status > Disable (default)/Enable.
Note: Low-bandwidth optimization is disabled by default.
Enable encrypted optimized duplication: DD Boost > Active File Replications > File Replication
Encryption status > Disable (default)/Enable.
Note: Encrypted optimized duplication is disabled by default.
For the backup host:
1. License the backup software for DD Boost as required by the software manufacturer.
2. Create devices and pools through the management console/interface.
3. Configure backup policies and groups to use the Data Domain system for backups with DD
Boost.
4. Configure clone or duplicate operations to use Data Domain managed replication between Data
Domain systems.
On the Network:
Open the following ports if you plan to use any of the related features through a network
firewall:
UDP 2049 (enables NFS communication)
TCP 2051 (enables file replication communication)
TCP 111 (enables RPC portmapper services communication)

371

Related CLI commands:


# ddboost ifgroup enable
Enables an interface group
# ddboost ifgroup show config
Shows the configuration of an interface group
# license add license_key
Adds a license key
# ddboost enable
Enables DD Boost.
# ddboost set user-name
Sets the DD Boost user name when DD Boost is enabled.
# ddboost storage-unit create
Creates and names a storage unit.
# ddboost option set distributed-segment-processing
Enables or disables the distributed segment processing feature.
# ddboost ifgroup add interface
Adds an IP address to a private network to enable data transfer.
# ddboost file-replication option set low-bw-optim
Enables or disables low-bandwidth optimization.
# ddboost file-replication option set encryption
Enables or disables file replication encryption.
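Putting these together, a hedged sketch of the Data Domain-side setup from the CLI (the license key,
user name, and storage-unit name are placeholders):
# license add ABCD-EFGH-IJKL-MNOP
Adds the DD Boost license key.
# ddboost enable
Enables DD Boost.
# ddboost set user-name ddboost
Assigns the DD Boost user.
# ddboost storage-unit create backup_su
Creates a storage unit for the backup application.
# ddboost option set distributed-segment-processing enabled
Leaves DSP at its default, enabled state.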

372

Slide 19

DD Boost Configuration: Enable DD Boost


Enable DD Boost by navigating in the Data Domain Enterprise Manager to Data Management > DD Boost
> Settings.
In the example on the slide, the current DD Boost Status is Enabled. Click the button circled in red to
either enable or disable DD Boost on a system.

373

Slide 20

DD Boost Configuration: Add DD Boost User and Client


To add or change a DD Boost user for the system, click the Modify button. In the Modify DD Boost
User window, select an existing user or add a new user, give the user a password, and assign a
role. In the case on this slide, we have added the user name ddboost and assigned it the role of
backup-operator.
In the Allowed Clients field, click the green plus button to add a new client that you are allowing to
access DD Boost on the system. Add the client name as a domain name, since IP addresses are not
allowed.

374

Slide 21

DD Boost Configuration: Create Storage Units


Create a storage unit by navigating to Data Management > DD Boost > Storage Units > Create.
Note: The Storage Unit Details section is new to DD OS 5.2. It provides a good summary of a storage
unit, including its file count, compression ratio, SU status, and quota function.
Name the storage unit and set any quota settings you wish. Be aware that these quota settings are not
enforced unless MTree quotas are enabled.

375

Slide 22

DD Boost Configuration: DD Boost Options


To enable or disable distributed segment processing, bandwidth optimization for file replication, and file
replication encryption, click More Tasks > Set Options.

376

Slide 23

Lab 8.1: Configuring DD Boost with EMC NetWorker 8

Lab 8.2: Configuring DD Boost with Symantec NetBackup 7


In this lab, you have the choice of configuring DD Boost using either EMC NetWorker or Symantec
NetBackup. If time allows, you may perform this lab twice, configuring with both backup applications.

377

Slide 24

Module 8: Summary
Key points covered in this module:
DD Boost uses distributed segment processing (DSP) to reduce
network bandwidth.
DD Boost features centralized replication management as a
single point for tracking all backups and duplicate copies.
DD Boost uses advanced load balancing and failover among
available ports, thereby keeping backups running efficiently and
fault tolerant.
With DSP, the deduplication process is distributed between the
backup host and a Data Domain system, increasing aggregate
throughput while decreasing data transferred over the network.

378

Slide 1

Module 9: Data Security

Upon completion of this module, you should be able to:


Describe purposes of and differences between retention lock
compliance and retention lock governance.
Configure and set retention lock compliance
Describe file system locking
Describe and perform data sanitization
Describe and perform encryption for data at rest


In this module, you will learn about security and protecting your data with a Data Domain system,
specifically how to:
Describe the purposes of, and differences between, retention lock compliance and retention
lock governance.
Configure and set retention lock compliance
Describe file system locking
Describe and perform data sanitization
Describe and perform encryption for data at rest

379

Slide 2

Module 9: Data Security

Lesson 1: Data Domain Retention Lock


In this lesson, the following topics are covered:
Data Domain Retention Lock features overview
An introduction to the Security Officer role within Data
Domain systems
Data Domain Retention Lock functional overview and
configuration


As data ages and becomes seldom used, EMC recommends moving this data to archive storage where it
can still be accessed, but no longer occupies valuable storage space.
Unlike backup data, which is a secondary copy of data for shorter-term recovery purposes, archive data
is a primary copy of data and is often retained for several years. In many environments, corporate
governance and/or compliance regulatory standards can mandate that some or all of this data be
retained as-is. In other words, the integrity of the archive data must be maintained for specific time
periods before it can be deleted.
The EMC Data Domain Retention Lock (DD Retention Lock) feature provides immutable file locking and
secure data retention capabilities to meet both governance and compliance standards of secure data
retention. DD Retention Lock ensures that archive data is retained for the length of the policy with data
integrity and security.
This lesson presents an overview of Data Domain Retention Lock, its configuration and use.

380

Slide 3

Overview of Data Domain Retention Lock

Protects against
User errors
Malicious activity

Protects locked files by making them


Non-writeable
Non-erasable

Fully integrated with Data Domain replication


Sets and enforces the lock set by user and software.
Comes in two editions:
Governance, where the system administrator manages the locks.
Compliance, where locks are managed by both the system administrator and a security officer.


EMC Data Domain Retention Lock is an optional, licensed software feature that allows storage
administrators and compliance officers to meet data retention requirements for archive data stored on
an EMC Data Domain system. For files committed to be retained, DD Retention Lock software works in
conjunction with the application's retention policy to prevent these files from being modified or deleted
during the application's defined retention period, for up to 70 years. It protects against data
management accidents, user errors, and any malicious activity that might compromise the integrity of
the retained data. The retention period of a retention-locked file can be extended, but not reduced.
After the retention period expires, files can be deleted, but cannot be modified. Files that are written to
an EMC Data Domain system, but not committed to be retained, can be modified or deleted at any time.

381

DD Retention Lock comes in two, separately licensed, editions:


DD Retention Lock Governance edition maintains the integrity of the archive data with the
assumption that the system administrator is generally trusted, and thus any actions taken by the
system administrator are valid as far as the data integrity of the archive data is concerned.
DD Retention Lock Compliance edition is designed to meet strict regulatory compliance
standards such as those of the United States Securities and Exchange Commission. When DD
Retention Lock Compliance is installed and deployed on an EMC Data Domain system, it requires
additional authorization by a Security Officer for system functions to safeguard against any
actions that could compromise data integrity.

382

Slide 4

DD Retention Lock Capabilities


Capability | Retention Lock Governance | Retention Lock Compliance
File level retention policies | Yes | Yes
Update minimum and maximum retention periods | Yes | Yes, with Security Officer authorization
Rename MTree | Yes | Yes, with Security Officer authorization
Extension of minimum and maximum retention periods | Yes | Yes, with Security Officer authorization
Replication modes supported | Collection, Directory, MTree | Collection
Secure Clock (disables ability to set and change date on the Data Domain system) | No | Yes
Audit Logging | No | Yes
CLI Support | Yes | Yes
DD Enterprise Manager (GUI) Configuration | Yes | No
Supported Protocols | CIFS, NFS, VTL | CIFS, NFS


The capabilities built into Data Domain Retention Lock are based on governance and compliance archive
data requirements.
Governance archive data requirements:
Governance standards are considered to be lenient in nature, allowing for flexible control of retention
policies, but not at the expense of maintaining the integrity of the data during the retention period.
These standards apply to environments where the system administrator is trusted with their
administrative actions.

383

The storage system has to securely retain archive data per corporate governance standards and must
meet the following requirements:
Allow archive files to be committed for a specific period of time during which the contents of the
secured file cannot be deleted or modified.
Allow for deletion of the retained data after the retention period expires.
Allow for ease of integration with existing archiving application infrastructure through CIFS and
NFS.
Provide flexible policies, such as extending the retention period of a secured file, reverting the
locked state of an archived file, etc.
Ability to replicate both the retained archive files and retention period attribute to a destination
site to meet the disaster recovery (DR) needs for archived data.
Compliance archive data requirements:
Securities and Exchange Commission (SEC) rules define compliance standards for archive storage to be
retained on electronic storage media, which must meet certain conditions:
Preserve the records exclusively in a non-writeable, non-erasable format.
Verify automatically the quality and accuracy of the storage media recording process.
Serialize the original, and any duplicate units of storage media, and the time-date for the
required retention period for information placed on the storage media.
Store, separately from the original, a duplicate copy of the record on an SEC-approved medium
for the time required.
Data Domain Retention Lock Governance edition maintains the integrity of the archive data with the
assumption that the system administrator is trusted, and that any actions they take are valid to maintain
the integrity of the archive data.
Data Domain Retention Lock Compliance edition is designed to meet regulatory compliance
standards such as those set by the SEC for records (SEC 17a-4(f)). Additional security
authorization is required to manage the manipulation of retention periods, as well as the renaming of
MTrees designated for retention lock.
Note: DD Retention Lock software cannot be used with EMC Data Domain GDA models or with the DD
Boost protocol. Attempts to apply retention lock to MTrees containing files created by DD Boost will fail.

384

Slide 5

Security Officer Role and Security Privilege

Security privilege is assigned to user accounts using either the

CLI or Enterprise Manager.


Security privilege is additional to user and admin privileges.
A user assigned the security privilege is called a security officer.
The security officer role can enable the runtime authorization
policy, which is used to manage encryption commands.


As discussed in the Basic Administration module, a security privilege can be assigned to user accounts:
In the Enterprise Manager when user accounts are created.
In the CLI when user accounts are added.
This security privilege is in addition to the user and admin privileges.
A user assigned the security privilege is called a security officer.
The security officer can run a command via the CLI called the runtime authorization policy.
Updating or extending retention periods, and renaming MTrees, requires the use of the runtime
authorization policy. When enabled, runtime authorization policy is invoked on the system for the
length of time the security officer is logged in to the current session.
Runtime authorization policy, when enabled, authorizes the security officer to provide credentials, as
part of a dual authorization with the admin role, to set-up and modify both retention lock compliance
features, and data encryption features as you will learn later in this module.
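As a sketch of this flow from the CLI, assuming the DD OS user and authorization commands for this
release (verify the exact syntax against your DD OS command reference):
# user add sec1 role security
Run as sysadmin: creates a security officer account named sec1 (a placeholder name).
# authorization policy set security-officer enabled
Run by the security officer: enables the runtime authorization policy for the current session.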

385

Slide 6

Data Domain Retention Lock Flow


1. Enable DD Retention Lock Governance, Compliance, or both on the Data Domain system. (You must
have a valid license for DD Retention Lock Governance and/or Compliance.)
2. Enable MTrees for governance or compliance retention locking using Enterprise Manager or CLI
commands.
3. Commit files to be retention locked on the Data Domain system using client-side commands issued
by an appropriately configured archiving or backup application, manually, or using scripts.
4. (Optional) Extend file retention times or delete files with expired retention periods using client-side
commands.


1. Enable DD Retention Lock Governance, Compliance, or both on the Data Domain system. (You
must have a valid license for DD Retention Lock Governance and/or Compliance.)
2. Enable MTrees for governance or compliance retention locking using Enterprise Manager or CLI
commands.
3. Commit files to be retention locked on the Data Domain system using client-side commands
issued by an appropriately configured archiving or backup application, manually, or using scripts.
4. (Optional) Extend file retention times or delete files with expired retention periods using
client-side commands.

386

Slide 7

File Locking Protocol

To lock a file that is migrated to an MTree with DD Retention Lock enabled, the user or software must
set the last access time (atime) of that file to communicate the retention period to the Data Domain
system.
atime must be set beyond the current configured minimum retention period.
Defaults:
Minimum retention period = 12 hours
Maximum retention period = 5 years

Locked files cannot be modified even after their retention period


expires. Archived data that remains on the Data Domain system
is not deleted automatically. Data must be deleted by an
archiving application or manually.


After an archive file has been migrated onto a Data Domain system, it is the responsibility of the
archiving application to set and communicate the retention period attribute to the Data Domain system.
The archiving application sends the retention period attribute over standard industry protocols.
The retention period attribute used by the archiving application is the last access time: the atime. DD
Retention Lock software allows granular management of retention periods on a file-by-file basis. As part
of the configuration and administrative setup process of the DD Retention Lock software, a minimum
and maximum time-based retention period for each MTree is established. This ensures that the atime
retention expiration date for an archive file is not set below the minimum, or above the maximum,
retention period.

387

The archiving application must set the atime value, and DD Retention Lock must enforce it, to prevent
any modification or deletion of files under retention on the Data Domain system. For example,
Symantec Enterprise Vault retains records for a user-specified amount of time. When Enterprise Vault
retention is in effect, these documents cannot be modified or deleted on the Data Domain system.
When that time expires, Enterprise Vault can be set to automatically dispose of those records.
Locked files cannot be modified on the Data Domain system even after the retention period for the file
expires. Files can be copied to another system and then be modified. Archive data retained on the Data
Domain system after the retention period expires is not deleted automatically. An archiving application
must delete the remaining files, or they must be removed manually.
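For example, a file can be committed manually from an NFS client by setting its atime to the desired
expiration date (a sketch; the mount point and date are hypothetical, and the atime must fall between
the MTree's minimum and maximum retention periods):
touch -a -t 202501010000 /mnt/ddr/hr/records.dat
Sets the file's last access time to January 1, 2025 (format [CC]YYMMDDhhmm), retention locking the file
until that date.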

388

Slide 8

Configuring Data Domain Retention Lock Governance


You can configure DD Retention Lock Governance using the Enterprise Manager or by using CLI
commands. Enterprise Manager provides the capability to modify the minimum and maximum retention
period for selected MTrees. In the example above, the Modify dialog is for the MTree /data/col1/hr.
To configure retention lock:
1. Select the system in the navigation pane.
2. Select Data Management > MTree.
3. Select the MTree you want to edit with DD Retention Lock.
4. Go to the Retention Lock pane at the bottom of the window.
5. Click Edit.
6. Check the box to enable retention lock.
7. Enter the retention period or select Default.
8. Click OK.

389

Related CLI commands:


# mtree retention-lock disable mtree
Disables the retention-lock feature for the specified MTree.
# mtree retention-lock enable mtree
Enables the retention-lock feature for the specified MTree.
Note: You cannot rename non-empty folders or directories within a retention-locked MTree;
however, you can rename empty folders or directories and create new ones.
# mtree retention-lock reset
Resets the minimum or maximum retention period for the specified MTree to its default value.
# mtree retention-lock revert
Reverts the retention lock for all files on a specified path.
# mtree retention-lock set
Sets the minimum or maximum retention period for the specified MTree.
# mtree retention-lock show
Shows the minimum or maximum retention period for the specified MTree.
# mtree retention-lock status mtree
Shows the retention-lock status for the specified MTree. Possible values are enabled, disabled,
and previously enabled.
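A hedged sketch tying these commands to the /data/col1/hr MTree from the earlier example (argument
order and period formats should be verified against the DD OS command reference for your release):
# mtree retention-lock enable mtree /data/col1/hr
Enables governance retention locking on the MTree.
# mtree retention-lock set min-retention-period 1day mtree /data/col1/hr
# mtree retention-lock set max-retention-period 2year mtree /data/col1/hr
Overrides the defaults (12 hours and 5 years) with explicit values.
# mtree retention-lock show min-retention-period mtree /data/col1/hr
Confirms the new setting.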

390

Slide 9

DD Retention Lock Compliance Edition

Retention lock compliance ensures files locked by an archiving

application or user software cannot be deleted or overwritten


under any circumstances.
Retention lock compliance uses multiple hardening procedures:
Secures the system clock from illegal updates
Requires dual sign-on for certain administrative actions
Disables various avenues of access where locked data or the state

of retention attributes might be compromised.

DD Retention Lock Compliance edition is supported by CIFS and

NFS protocols only.


Retention lock is not currently supported with DD Boost and VTL
Pool MTrees.
Retention lock compliance can only be removed from a Data
Domain system by a fresh installation of the OS using a USB key.

The DD Retention Lock Compliance edition meets the strict requirements of regulatory standards for
electronic records, such as SEC 17a-4(f), and other standards that are practiced worldwide.
DD Retention Lock Compliance, when enabled on an MTree, ensures that all files locked by an archiving
application, for a time-based retention period, cannot be deleted or overwritten under any
circumstances until the retention period expires. This is achieved using multiple hardening procedures:
Requiring dual sign-on for certain administrative actions. Before engaging DD Retention Lock
Compliance edition, the System Administrator must create a Security Officer role. The System
Administrator can create the first Security Officer, but only the Security Officer can create other
Security Officers on the system.
Some of the actions requiring dual sign-on are:
Extending the retention periods for an MTree.
Renaming the MTree.
Deleting the Retention Lock Compliance license from the Data Domain system.

391

Securing the system clock from illegal updates.
If the system clock is skewed by more than 15 minutes, or by more than 2 weeks in a year, the file
system will shut down and can be resumed only by providing Security Officer credentials.
Completely disallowing operations that could lead to a compromise in the state of locked and
retained archive data.
Removing Retention Lock Compliance requires a fresh installation of the DD OS using a USB key
installation. Contact Data Domain Support for assistance in performing this operation, as it is not
covered in this course.
Note: Retention lock is not currently supported with DD Boost and VTL Pool MTrees.

392

Slide 10

Lab 9.1: Configuring Retention Lock Compliance

393

Slide 11

Module 9: Data Security

Lesson 2: Data Sanitization


This lesson covers the following topics:
Overview of data sanitization
Running the system sanitize command


In this lesson, you will learn about the function of data sanitization and how to run a command from the
CLI to sanitize data on a Data Domain system.

394

Slide 12

Overview of Data Sanitization

Also called electronic shredding


Performs a filesys clean operation with the added step of 'sanitizing': overwriting free space,
metadata, references, etc.


Overwrites deleted files (no residual data remains)
Is often a government requirement
Used to resolve classified message incidents (CMIs)

Erases segments of deleted files not used by other files and all
unused capacity in the file system

Unused capacity is data space that has been used and cleaned
Unused capacity does not include space that has never been used

Accessible only through the CLI:
# system sanitize


Data sanitization is sometimes referred to as electronic shredding.


With the data sanitization function, deleted files are overwritten using a DoD/NIST-compliant algorithm
and procedures. No complex setup or system process disruption is required. Current, existing data is
available during the sanitization process, with limited disruption to daily operations. Sanitization is the
electronic equivalent of data shredding. Normal file deletion provides residual data that allows recovery.
Sanitization removes any trace of deleted files with no residual remains.
Sanitization supports organizations (typically government organizations) that:
Are required to delete data that is no longer needed.
Need to resolve (remove and destroy) classified message incidents. Classified message incident
(CMI) is a government term that describes an event where data of a certain classification is
inadvertently copied into another system that is not certified for data of that classification.

395

The system sanitize command erases content in the following locations:


Segments of deleted files not used by other files
Contaminated metadata
All unused storage space in the file system
All segments used by deleted files that cannot be globally erased, because some segments might
be used by other files
Sanitization can be run only by using the CLI.

396

Slide 13

System Sanitization Procedure

Use the command:


# system sanitize start

sysadmin# system sanitize start
System sanitization could take longer than a filesys clean operation.
Are you sure? (yes|no|?) [no]: yes
Sanitization started. Use 'system sanitize watch' to monitor progress.


When you issue the system sanitize start command, you are prompted to consider the length
of time required to perform this task. The system advises that it can take longer than the time it takes to
reclaim space holding expired data on the system (filesys clean). This can be several hours or
longer, if there is a high percentage of space to be sanitized.
During sanitization, the system runs through five phases: merge, analysis, enumeration, copy, and zero.
1. Merge: Performs an index merge to flush all index data to disk.
2. Analysis: Reviews all data to be sanitized. This includes all stored data.
3. Enumeration: Reviews all of the files in the logical space and remembers what data is active.
4. Copy: Copies live data forward and frees the space it used to occupy.
5. Zero: Writes zeroes to the disks in the system.
You can view the progress of these five phases by running the system sanitize watch command.

397

Related CLI commands:


# system sanitize abort
Aborts the sanitization process
# system sanitize start
Starts sanitization process immediately
# system sanitize status
Shows current sanitization status
# system sanitize watch
Monitors sanitization progress

398

Slide 14

Lab 9.2: Configuring Data Sanitization

399

Slide 15

Module 9: Data Security

Lesson 3: Encryption of Data at Rest


This lesson covers the following topics:
The purpose of encryption of data on a Data Domain system
How encryption works on a Data Domain system
How to configure encryption on a Data Domain system
The purpose of file system locking
How to configure file system locking


In this lesson, you will learn about the features, benefits, and function of the encryption of data at rest
feature.
You will also learn about the purpose of other security features, such as file system locking, and when
and how to use this feature.

400

Slide 16

Encryption of Data at Rest

Enables data on system drives or external storage to be

encrypted, while being saved and locked, before being moved to


another location
Is also called inline data encryption
Protects data on a Data Domain system from unauthorized
access or accidental exposure
Requires an encryption software license
Encrypts all ingested data
Does not automatically encrypt data that was in the system
before encryption was enabled. Such data can be encrypted by
enabling an option to encrypt existing data.


Data encryption protects user data if the Data Domain system is stolen or if the physical storage media is
lost during transit, and eliminates accidental exposure of a failed drive if it is replaced. In addition, if an
intruder ever gains access to encrypted data, the data is unreadable and unusable without the proper
cryptographic keys.
Encryption of data at rest:
Enables data on the Data Domain system to be encrypted, while being saved and locked, before
being moved to another location.
Is also called inline data encryption.
Protects data on a Data Domain system from unauthorized access or accidental exposure.
Requires an encryption software license.
Encrypts all ingested data.
Does not automatically encrypt data that was in the system before encryption was enabled.
Such data can be encrypted by enabling an option to encrypt existing data.
Furthermore, you can use all of the currently supported backup applications described in the Backup
Application Matrix on the Support Portal with the Encryption of Data at Rest feature.

401

Slide 17

Key Management
Two key management capabilities are available:
1. The Local Key Manager provides a single encryption key per
Data Domain system. This single internal Data Domain
encryption key is available on all Data Domain systems.
2. Optional RSA Data Protection Manager (DPM) Key Manager for
added capability. The RSA DPM Key Manager enables the use
of multiple, rotating keys on a Data Domain system.


There are two available key management options:
The Local Key Manager provides a single encryption key per Data Domain system. A single internal
Data Domain encryption key is available on all Data Domain systems.
As of DD OS 5.2, an optional external encryption key management capability has been added: the
RSA Data Protection Manager (DPM) Key Manager. The preexisting local encryption key
administration method is still in place. You can choose either method to manage the Data Domain
encryption key.

The first time Encryption of Data at Rest is enabled, the Data Domain system randomly generates an
internal system encryption key. After the key is generated, the system encryption key cannot be
changed and is not accessible to a user.

402


The encryption key is further protected by a passphrase, which is used to encrypt the encryption key
before it is stored in multiple locations on disk. The passphrase is user-generated and requires both an
administrator and a security officer to change it.

The RSA DPM Key Manager enables the use of multiple, rotating keys on a Data Domain system.
The RSA DPM Key Manager consists of a centralized RSA DPM Key Manager Server and the
embedded DPM client on each Data Domain system.
The RSA DPM Key Manager is in charge of the generation, distribution, and lifecycle
management of multiple encryption keys. Keys can be rotated on a regular basis, depending on
the policy. A maximum number of 254 keys is supported.
If the RSA DPM Key Manager is configured and enabled, the Data Domain systems uses keys
provided by the RSA DPM Key Manager Server.

Note: Only one encryption key can be active on a Data Domain system. The DPM Key Manager provides
the active key. If the same DPM Key Manager manages multiple Data Domain systems, all will have the
same active key if they are synced and the Data Domain file system has been restarted.
For additional information about RSA DPM Key Manager, refer to the DD OS 5.2 Administration Guide.

403

Slide 18

Inline Encryption

Configurable 128-bit or 256-bit advanced encryption standard


(AES) algorithm with either:

Confidentiality with cipher-block chaining (CBC) mode

or
Both confidentiality and message authenticity with Galois/Counter
(GCM) mode

Encryption and decryption to and from the disk is transparent to


all access protocols: DD Boost, NFS, CIFS, NDMP tape server, and
VTL (no administrative action is required for decryption).


With the encryption software option licensed and enabled, all incoming data is encrypted inline before it
is written to disk. This is a software-based approach, and it requires no additional hardware. It includes:
Configurable 128-bit or 256-bit advanced encryption standard (AES) algorithm with either:
Confidentiality with cipher-block chaining (CBC) mode.
or

Both confidentiality and message authenticity with Galois/Counter (GCM) mode


Encryption and decryption to and from the disk is transparent to all access protocols: DD Boost,
NFS, CIFS, NDMP tape server, and VTL (no administrative action is required for decryption).

404

When data is backed up, data enters via NFS, CIFS, VTL, DD Boost, and NDMP tape server protocols. It is
then:
1. Segmented
2. Fingerprinted
3. Deduplicated (or globally compressed)
4. Grouped
5. Locally compressed
6. Encrypted
Note: When enabled, the encryption at rest feature encrypts all data entering the Data Domain system.
You cannot enable encryption at a more granular level.

405

Slide 19

Authorization Workflow
To set encryption on a Data Domain system:
The security officer logs in via
CLI and issues the runtime
authorization policy.
The administrator role issues
the command to enable
encryption via the Enterprise
Manager.
The Enterprise Manager
prompts for security officer
credentials.
With system-accepted
security credentials,
encryption is enabled.


Procedures requiring authorization must be dual-authenticated by the security officer and the user in
the admin role.
For example, to set encryption, the admin enables the feature, and the security officer enables runtime
authorization.
A user in the administrator role interacts with the security officer to perform a command that requires
security officer sign off.
In a typical scenario, the admin issues the command, and the system displays a message that security
officer authorizations must be enabled. To proceed with the sign-off, the security officer must enter his
or her credentials on the same console at which the command option was run. If the system recognizes
the credentials, the procedure is authorized. If not, a Security alert is generated. The authorization log
records the details of each transaction.

406

Slide 20

Configuring Encryption


With encryption active in the Data Domain system, the Encryption tab within the File System section of
the Data Domain Enterprise Manager shows the current status of system encryption of data at rest.
The status indicates Enabled, Disabled, or Not configured. In the slide, the encryption status is Not
configured.
To configure encryption:
1. Click Configure
(Continued on the next slide)

407

Slide 21

Configuring Encryption (Continued)


You are prompted for a passphrase. The system generates an encryption key and uses the passphrase to
encrypt the key. One key is used to encrypt all data written to the system. After encryption is enabled,
the passphrase is used by system administrators only when locking or unlocking the file system, or when
disabling encryption. The maximum passphrase length for DD OS 5.2 is 256 characters.
CAUTION: Unless you can reenter the correct passphrase, you cannot unlock the file system and access
the data. The data will be irretrievably lost.
2. Click Next.
You are prompted to choose the encryption algorithm:
Configurable 128-bit or 256-bit Advanced Encryption Standard (AES) algorithm with either:
Confidentiality with Cipher Block Chaining (CBC) mode
Both confidentiality and message authenticity with Galois/Counter (GCM) mode
In this configuration window, you can optionally apply encryption to data that existed on the
system before encryption was enabled.

408

3. Click Restart the system now to enable encryption of data at rest once you have closed the
Configure Encryption window. If you do not click this, you need to disable and re-enable the file
system before encryption will begin.
4. Click OK to select the default AES 256-bit (CBC) algorithm, close the Configure Encryption
window, and continue.
Related CLI commands:
# filesys disable
Disables the file system
# filesys encryption enable
Enables encryption. Enter a passphrase when prompted
# filesys encryption algorithm set algorithm
Sets an alternative cryptographic algorithm (optional). Default algorithm is aes_256_cbc. Other
options are: aes_128_cbc, aes_128_gcm, or aes_256_gcm
# filesys enable
Enables the file system
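Taken together, a minimal sketch of enabling encryption from the CLI (the non-default algorithm choice
is illustrative):
# filesys disable
Stops file system services.
# filesys encryption enable
Prompts for a passphrase and enables encryption.
# filesys encryption algorithm set aes_128_gcm
Optionally selects an alternative algorithm; the default aes_256_cbc needs no action.
# filesys enable
Restarts the file system so that encryption takes effect.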

409

Slide 22

Changing the Encryption Passphrase


Only administrative users with security officer credentials can change the encryption passphrase.
To change the existing encryption passphrase:
1. Disable the file system by clicking the disable button on the State line of the File System section.
The slide shows the file system state as disabled and shut down after the disable button is clicked.
2. Click Change Passphrase.
3. Enter the security officer credentials to authorize the passphrase change.
4. Enter the current passphrase.
5. Enter the new passphrase twice.
6. Click Enable file system now if you want to reinstate services with the new passphrase;
otherwise the passphrase does not go into effect until the file system is re-enabled.
7. Click OK to proceed with the passphrase change.

410

Slide 23

Disabling Encryption


Only administrative users with security officer credentials can disable encryption.
To disable encryption on a Data Domain system:
1. Click Disable on the Encryption status line of the Encryption tab.
2. Enter the security officer credentials.
3. Click Restart file system now in order to stop any further encryption of data at rest.
Note: Restarting the file system will interrupt any processes currently running on the Data
Domain system.
4. Click OK to continue.

411


Related CLI commands:


# filesys encryption disable
Disables encryption. You are prompted for a security officer username and password in order to
disable encryption from the command line.
# filesys disable
Disables the file system.
# filesys enable
Enables the file system. The file system must be disabled and re-enabled to effect encryption
operations.

412

Slide 24

File System Locking

Requires two-user authentication.


Protects the data on the system from unauthorized data access.
Can be run only with file system encryption feature enabled to
encrypt all user data.
Prevents the retrieval of the encryption key.
Limits unlocking to only an administrator with the set
passphrase.


Use file system locking when an encryption-enabled Data Domain system and its external storage
devices (if any) are being transported. Without the encryption provided in file system locking, user data
could possibly be recovered by a thief with forensic tools (especially if local compression is turned off).
This action requires two-user authentication (a sysadmin and a security officer) to confirm the lockdown action.
File system locking:
Requires the user name and password of a security officer account to lock the file system.
Protects the Data Domain system from unauthorized data access.
Is run only with the file system encryption feature enabled. File system locking encrypts all user
data, and the data cannot be decrypted without the key.
A passphrase protects the encryption key, which is stored on disk, and is encrypted by the
passphrase. With the system locked, this passphrase cannot be retrieved.
Allows only an admin, who knows the set passphrase, to unlock an encrypted file system.

413

Slide 25

File System Locking and Unlocking

Module 9: Data Security

Copyright 2013 EMC Corporation. All Rights Reserved.

25

Note: Before you can lock the file system, the file system must be stopped, disabled, and shut down.
To lock the file system:
1. In the passphrase area, enter the current passphrase (if one existed before) followed by a new
passphrase that locks the file system for transport. Repeat the passphrase in the Confirm New
Passphrase field.
2. Click OK to continue.
After the new passphrase is entered, the system destroys the cached copy of the current
passphrase. Therefore, anyone who does not possess the new passphrase cannot decrypt the
data.
CAUTION: Safeguard the passphrase. If the passphrase is lost, you will never be able to unlock the
file system and access the data; there is no backdoor access to the file system. The data is
irretrievably lost.
3. Shut down the system using the system poweroff command from the command line
interface (CLI).
CAUTION: Do not use the chassis power switch to power off the system. Only the system
poweroff command shuts down the system in a way that invokes file system locking.

414

To unlock the file system:


1. Power on the Data Domain system.
2. Return to the Encryption view in the Data Domain Enterprise Manager and click the Unlock File
System button.
3. Enter the current lock file system passphrase. The file system re-enables itself.
Related CLI commands:
# filesys encryption lock
Locks the system by creating a new passphrase and destroying the cached copy of the current
passphrase. Before you run this command, you must run filesys disable and enter security
officer credentials.
# filesys encryption passphrase change
Changes the passphrase for system encryption keys. Before running this command, you must
run filesys disable and enter security officer credentials.
# filesys encryption show
Checks the status of the encryption feature.
# filesys encryption unlock
Prepares the encrypted file system for use after it has arrived at its destination.
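Taken together, a minimal sketch of the lock-for-transport sequence (prompts omitted; you are
asked for security officer credentials and the new passphrase):
# filesys disable
# filesys encryption lock
# system poweroff
And at the destination, after powering the system on:
# filesys encryption unlock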

415

Slide 26

Module 9: Summary

Retention lock prevents locked files from being deleted or modified for up to 70 years.
Retention lock compliance edition requires dual authorization to initiate, to rename MTrees, or to
extend retention periods.
File system sanitization overwrites deleted files using a DoD/NIST-compliant algorithm and
procedures.
File system sanitization is available only through the command line interface (CLI).
Encryption and decryption to and from the disk is transparent to all access protocols; no additional
administration is required.
Encryption of data at rest allows data on system drives or external storage to be encrypted.

Module 9: Data Security

Copyright 2013 EMC Corporation. All Rights Reserved.

416

26

Slide 1

Module 10: Sizing, Capacity and Throughput Planning and Tuning
Upon completion of this module, you should be able to:
Describe capacity planning and why it is important
Perform basic capacity-planning calculations
Describe throughput planning and why it is important
Perform basic throughput-planning calculations and analysis
Identify throughput tuning steps

Module 10: Sizing, Capacity and Throughput Planning and Tuning

Copyright 2013 EMC Corporation. All Rights Reserved.

In any backup environment, it is critical to plan capacity and throughput adequately. Planning ensures
your backups complete within the time required and are securely retained for the required retention
periods. Data growth in backups is also a reality as business needs change. Inadequate capacity and
bandwidth can cause backups to lag or fail to complete, and unplanned growth can fill a backup
device sooner than expected and choke backup processes.
The main goal in capacity planning is to design your system with a Data Domain model and configuration
that can hold the required data for the required retention periods, with enough space left over to
avoid system-full conditions.
For throughput planning, the goal is to ensure the link bandwidth is sufficient to perform daily and
weekly backups to the Data Domain system within the backup window allotted. Good throughput
planning takes into consideration network bandwidth sharing, along with adequate backup and system
housekeeping timeframes (windows).

417

Upon completion of this module, you should be able to:


Describe capacity planning and why it is important
Perform basic capacity-planning calculations
Describe throughput planning and why it is important
Perform basic throughput-planning calculations and analysis
Identify throughput-tuning steps

418

Slide 2

Module 10: Sizing, Capacity and Throughput Planning and Tuning
Lesson 1: Capacity Planning
This lesson covers the following topics:
Collecting information
Calculating capacity requirements

Module 10: Sizing, Capacity and Throughput Planning and Tuning

Copyright 2013 EMC Corporation. All Rights Reserved.

In this lesson, you will become familiar with the testing and evaluation process that helps to determine
the capacity requirements of a Data Domain system.
Collecting information
Determining and calculating capacity needs
Note: EMC Sales uses detailed software tools and formulas when working with its customers to identify
backup environment capacity and throughput needs. Such tools help systems architects recommend
systems with appropriate capacities and correct throughput to meet those needs. This lesson discusses
the most basic considerations for capacity and throughput planning.

419

Slide 3

Determining Capacity Needs

Longer Retention = Greater Data Reduction


How Much for How Long = Capacity Needs
How Much?
Data size
Data type
Full backup size
Data reduction rate (deduplication)
How Long?
Retention policy (duration)
Schedule

Module 10: Sizing, Capacity and Throughput Planning and Tuning

Copyright 2013 EMC Corporation. All Rights Reserved.

Using information collected about the backup system, you calculate capacity needs by understanding
the amount of data (data size) to be backed up, the types of data, the size of a full (complete) backup,
and the expected data reduction rates (deduplication).
Data Domain system internal indexes and other product components use additional, variable amounts
of storage, depending on the type of data and the sizes of files. If you send different data sets to
otherwise identical systems, one system may, over time, have room for more or less actual backup data
than another.
Data reduction factors depend on the type of data being backed up. Challenging
(deduplication-unfriendly) data types include:
pre-compressed data (multimedia formats such as .mp3, .zip, and .jpg)
pre-encrypted data
Second, retention policies greatly determine the amount of deduplication that can be realized on a
Data Domain system. The longer data is retained, the greater the data reduction that can be realized. A
backup schedule in which retained data is repeatedly replaced with new data yields very little data
reduction.

420

Slide 4

Typical Data Reduction Expectations over Time

5x
Incremental plus weekly full backup with 2 weeks retention
Daily full backup with 1 week retention
Online and archival-use data reduction tends to be capped here

10x
Incremental plus weekly full backup with 1 month of retention
Daily full backup with 2-3 weeks retention

20x
Incremental plus weekly full backup with 2-3 months retention
Daily full backup with 3-4 weeks retention

Module 10: Sizing, Capacity and Throughput Planning and Tuning

Copyright 2013 EMC Corporation. All Rights Reserved.

The reduction factors listed in this slide are examples of how changing retention rates can improve the
amount of data reduction over time.
The compression rates shown are approximate.
A daily full backup held only for one week on a Data Domain system may realize no more than a
compression factor of 5x, while holding weekly backups plus daily incrementals for up to 90 days may
result in 20x or higher compression.
Data reduction rates depend on a number of variables including data types, the amount of similar data,
and the length of storage. It is difficult to determine exactly what rates to expect from any given system.
The highest rates are usually achieved when many full backups are stored.
When calculating capacity planning, use average rates as a starting point for your calculations and refine
them after real data is available.

421

Slide 5

Calculating the Required Capacity

Total Space Required =
First Full Backup +
Incremental Backups (4-6 per week) +
(Weekly Cycle x Number of Weeks Retained)

1st full backup: 1 TB @ 5x = 200 GB
Incremental backup: 100 GB @ 10x = 10 GB
Full backup: 1 TB @ 25x = 40 GB

Base = 200 GB
1 Week = 80 GB
1 Retention Period = 640 GB

Module 10: Sizing, Capacity and Throughput Planning and Tuning

Copyright 2013 EMC Corporation. All Rights Reserved.

Calculate the required capacity by adding up the space required in this manner:
First full backup, plus
Incremental backups (the number of days incrementals are run, typically 4-6), plus
Weekly cycle (one weekly full and 4-6 incrementals) times the number of weeks data is retained.
For example, 1 TB of data is backed up, and a conservative compression rate is estimated at 5x (which
may have come from a test or is a reasonable assumption to start with). This gives 200 GB needed for
the initial backup. With a 10 percent change rate in the data each day, incremental backups are 100 GB
each, and with an estimated compression on these of 10x, the amount of space required for each
incremental backup is 10 GB.
As subsequent full backups run, it is likely that the backup yields a higher data reduction rate. 25x is
estimated for the data reduction rate on subsequent full backups. 1 TB of data compresses to 40 GB.

422

Four daily incremental backups require 10 GB each, and one weekly backup needing 40 GB yields a burn
rate of 80 GB per week. Running the 80 GB weekly burn rate out over the full 8-week retention period
means that an estimated 640 GB is needed to store the daily incremental backups and the weekly full
backups.
Adding this to the initial full backup gives a total of 840 GB needed. On a Data Domain system with 1 TB
of usable capacity, this means the unit operates at about 84% of capacity. This may be adequate for
current needs, but a system with a larger capacity, or one that can accept additional storage, might be
a better choice to allow for data growth.
Again, these calculations are for estimation purposes only. Before determining true capacity, use the
analysis of real data gathered from your system as a part of an EMC BRS sizing evaluation.
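As a rough illustration only, the arithmetic above can be expressed as a short Python sketch. The
rates and sizes are the example figures from this lesson, not measured values:

# Illustrative capacity estimate using the example figures from this lesson (Python 3)
first_full_gb  = 1000 / 5     # 1 TB first full at an estimated 5x reduction = 200 GB
incremental_gb = 100 / 10     # 100 GB daily incremental at 10x = 10 GB
weekly_full_gb = 1000 / 25    # subsequent 1 TB fulls at 25x = 40 GB

weekly_burn_gb  = 4 * incremental_gb + weekly_full_gb   # 4 incrementals + 1 full = 80 GB
retention_weeks = 8

total_gb = first_full_gb + retention_weeks * weekly_burn_gb
print(total_gb)   # 840.0 GB, about 84% of a system with 1 TB of usable capacity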

423

Slide 6

Module 10: Sizing, Capacity and Throughput Planning and Tuning
Lesson 2: Throughput Planning
This lesson covers the following topic:
Calculating throughput requirements

Module 10: Sizing, Capacity and Throughput Planning and Tuning

Copyright 2013 EMC Corporation. All Rights Reserved.

In this lesson, you will become familiar with the testing and evaluation process that helps to determine
the throughput requirements of a Data Domain system.
Note: EMC Sales uses detailed software tools and formulas when working with customers to identify
backup environment capacity and throughput needs. Such tools help systems architects recommend
systems with appropriate capacities and correct throughput to meet those needs. This lesson discusses
the most basic considerations for capacity and throughput planning.

424

Slide 7

Calculating Required Throughput

Required Throughput =
Largest Backup divided by
Backup Window Time
Backup Server

20 GB/hr

200 GB

10 Hours

Module 10: Sizing, Capacity and Throughput Planning and Tuning

Copyright 2013 EMC Corporation. All Rights Reserved.

While capacity is one part of the sizing calculation, it is important not to neglect the throughput of the
data during backups.
Assume that the greatest backup need is to process a full 200 GB backup within a 10-hour backup
window. Incremental backups require much less time to complete, so we can safely presume that they
would easily complete within the backup window.
Dividing 200 GB by 10 hours yields a raw processing requirement of at least 20 GB per hour.
Over an unfettered 1 Gb/s network with maximum bandwidth available (a theoretical 270 GB per
hour of throughput), this backup would take less than 1 hour to complete. If the network were sharing
throughput resources during the backup window, the time required to complete the backup would
increase considerably.
It is important to note the effective throughput of both the Data Domain system and the network on
which it runs. Both points in data transfer determine whether the required speeds are reliably feasible.
Feasibility can be assessed by running network testing software such as iperf.
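For example, iperf can measure the effective bandwidth between the backup server and a test host
on the same network segment as the Data Domain system. A minimal sketch (the host name is a
placeholder):
On the receiving test host:
$ iperf -s
On the backup server:
$ iperf -c testhost.example.com -t 30
This runs a 30-second transfer and reports the achieved throughput, which you can compare against
the required rate calculated above.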

425

Slide 8

Module 10: Sizing, Capacity and Throughput Planning and Tuning
Lesson 3: Model Capacity and Throughput Performance
This lesson covers the following topic:
Matching the appropriate Data Domain hardware to your
capacity and throughput needs

Module 10: Sizing, Capacity and Throughput Planning and Tuning

Copyright 2013 EMC Corporation. All Rights Reserved.

This lesson applies the formulae from the previous two lessons to selecting the best Data Domain
system to fit specific capacity and throughput requirements.

426

Slide 9

System Model Capacity and Throughput Performance

Maximum capacity is the amount of usable data storage space in a model
Maximum capacity is based on the maximum number of drives supported by a model
Maximum throughput is achieved using either the VTL interface and 8 Gbps Fibre Channel, or DD
Boost and 10 Gb Ethernet
Visit the Data Domain Hardware page on http://www.emc.com/ for the latest hardware offerings
and specifications

Module 10: Sizing, Capacity and Throughput Planning and Tuning

Copyright 2013 EMC Corporation. All Rights Reserved.

The system capacity numbers of a Data Domain system assume a mix of typical enterprise backup data
(such as file systems, databases, mail, and developer files). The low and high ends of the range are also
determined by how often data is backed up.
The maximum capacity for each Data Domain model assumes the maximum number of drives (either
internal or external) supported for that model.
Maximum throughput for each Data Domain model is dependent mostly on the number and speed
capability of the network interfaces being used to transfer data. Some Data Domain systems have more
and faster processors so they can process incoming data faster.
Note: Advertised capacity and throughput ratings for Data Domain products are best-case results, based
on tests conducted under laboratory conditions. Your throughput will vary depending on your network
conditions.
The number of network streams you may expect to use depends on your hardware model. Refer to the
system guide for your specific Data Domain model to learn its maximum supported stream counts.

427

Slide 10

Selecting a Model

Capacity percentage equals required capacity divided by maximum capacity:
Capacity % = Required Capacity / Maximum Capacity
Throughput percentage equals required throughput divided by maximum throughput:
Throughput % = Required Throughput / Maximum Throughput
Be conservative when determining which model to use:
Use no more than 80% of model capacity and throughput
Factor in a 20% buffer for capacity and throughput

Module 10: Sizing, Capacity and Throughput Planning and Tuning

Copyright 2013 EMC Corporation. All Rights Reserved.

10

Standard practices are to be conservative in calculating capacity and throughput required for the needs
of a specific backup environment; estimate the need for greater throughput and capacity rather than
less. Apply your requirements against conservative ratings (not the maximums) of the Data Domain
system needed to meet requirements. Allow for a minimum 20% buffer in both capacity and throughput
requirements.

Required capacity divided by maximum capacity of a particular model times 100 equals the
capacity percentage.
Required throughput divided by the maximum throughput of a particular model times 100
equals the throughput percentage.

If the capacity or throughput percentage for a particular model does not provide at least a 20% buffer,
then calculate the capacity and throughput percentages for a Data Domain model of the next higher
capacity. For example, if the capacity calculation for a DD620 yields a capacity percentage of 91%, only a
9% buffer is available, so you should look at the DD640 next to calculate its capacity.
Sometimes one model provides adequate capacity, but does not provide enough throughput, or vice
versa. The model selection must accommodate both throughput and capacity requirements with an
appropriate buffer.
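As a rough illustration, the buffer check can be expressed as a short Python sketch. The figures are
the hypothetical Model A ratings used in the following slides:

# Illustrative buffer check (Python 3); ratings are hypothetical example figures
def buffer_pct(required, maximum):
    # headroom remaining, as a percentage of the model's rated maximum
    return 100 - (required / maximum * 100)

print(buffer_pct(3248, 3350))   # ~3% capacity buffer: below the 20% minimum
print(buffer_pct(1200, 1334))   # ~10% throughput buffer: below the 20% minimum

A result below 20 on either check means you should evaluate the next larger model or an added shelf.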

428

Slide 11

Calculating Capacity Buffer for Selected Models

Required Capacity = 3,248 GB
Capacity % = Required Capacity / Maximum Capacity

Model A: 3,350 GB capacity
3,248/3,350 = 97% (3% buffer)

Model B: 7,216 GB capacity
3,248/7,216 = 45% (55% buffer)

Module 10: Sizing, Capacity and Throughput Planning and Tuning

Copyright 2013 EMC Corporation. All Rights Reserved.

11

In this example, the capacity requirement of 3248 GB fills Model A to 97% of capacity.
Model B has a capacity of 7.2 TB. The capacity percentage estimated for Model B is 45%, and the 55%
buffer is more than adequate.

429

Slide 12

Matching Required Capacity to Model Specifications

Required Capacity = 3,248 GB

Model A: 3,350 GB capacity
3,248/3,350 = 97% (3% buffer)

Model A with 1 additional shelf: 7,974 GB capacity
3,248/7,974 = 40% (60% buffer)

Model B: 7,216 GB capacity
3,248/7,216 = 45% (55% buffer)

Module 10: Sizing, Capacity and Throughput Planning and Tuning

Copyright 2013 EMC Corporation. All Rights Reserved.

In this example, 3,248 GB of capacity is needed.
By the capacity specifications, Model A does not meet this need: with only 3,350 GB of capacity, it
leaves only a 3% buffer.
Model A with an additional shelf offers 7,974 GB of capacity; a 60% buffer is clearly a better option.
Model B is also a viable option, with 7,216 GB of capacity and a 55% buffer.

430

12

Slide 13

Calculating Throughput Buffer for Selected Models

Required Throughput = 1,200 GB/hr
Throughput % = Required Throughput / Maximum Throughput

Model A: 1,334 GB/hr
1,200/1,334 = 89% (11% buffer)

Model B: 2,252 GB/hr
1,200/2,252 = 53% (47% buffer)

Module 10: Sizing, Capacity and Throughput Planning and Tuning

Copyright 2013 EMC Corporation. All Rights Reserved.

13

This calculation is similar to calculating the capacity buffer for selected models.
Select a model that meets throughput requirements at no more than 80% of the model's maximum
throughput capacity.
In this example, the throughput requirement of 1,200 GB per hour would load Model A to more than
89% of capacity, with a buffer of only 11%.
A better selection is a model with higher throughput capability, such as Model B, rated at 2,252 GB per
hour of throughput and offering a 47% buffer in estimated throughput.

431

Slide 14

Matching Required Performance to Model Specifications

Required Capacity = 3,248 GB
Required Throughput = 1,200 GB/hr

Model A: 3,350 GB capacity, 1,334 GB/hr throughput
3% capacity buffer, 11% throughput buffer

Model A with 1 additional shelf: 7,974 GB capacity, 1,334 GB/hr throughput
60% capacity buffer, 11% throughput buffer

Model B: 7,216 GB capacity, 2,252 GB/hr throughput
55% capacity buffer, 47% throughput buffer

Module 10: Sizing, Capacity and Throughput Planning and Tuning

Copyright 2013 EMC Corporation. All Rights Reserved.

14

In summary, Model A with an additional shelf might meet the capacity requirement; Model B is the
minimum model that would meet the throughput performance requirement.
While Model A meets the storage capacity requirement, Model B is the best choice based upon the need
for greater throughput.
Note: Another option is to consider implementing DD Boost with Model A to raise the throughput
rating.

432

Slide 15

Module 10: Sizing, Capacity and Throughput Planning and Tuning
Lesson 4: Throughput Monitoring and Tuning
This lesson covers the following topics:
Identifying bottlenecks
Displaying and understanding Data Domain system
performance metrics
Implementing Tuning Solutions

Module 10: Sizing, Capacity and Throughput Planning and Tuning

Copyright 2013 EMC Corporation. All Rights Reserved.

15

This lesson covers basic throughput monitoring and tuning on a Data Domain system.
There are three primary steps to throughput tuning:
Identifying potential bottlenecks that might reduce data transfer rates during backups and restores.
Displaying and understanding Data Domain system performance metrics.
Identifying and implementing viable solutions to resolve slower-than-expected throughput issues.

433

Slide 16

Throughput Bottlenecks

Where are possible throughput bottlenecks?
Clients -> Network -> Backup Server -> Network -> Data Domain system
The Data Domain system collects and reports performance metrics you can use to identify bottlenecks

Module 10: Sizing, Capacity and Throughput Planning and Tuning

Copyright 2013 EMC Corporation. All Rights Reserved.

16

Integrating Data Domain systems into an existing backup architecture can change the responsiveness of
the backup system. Bottlenecks can appear and restrict the flow of data being backed up.
Some possible bottlenecks are:
Clients: disk issues, configuration, connectivity
Network: wire speeds, switches and routers, routing protocols and firewalls
Backup Server: configuration, load, connectivity

434

Data Domain System: connectivity, configuration, log level set too high

As demand shifts among system resources (such as the backup host, client, network, and the Data
Domain system itself), the source of the bottlenecks can shift as well.
Eliminating bottlenecks where possible, or at least mitigating the cause of reduced performance through
system tuning, is essential to a productive backup system. Data Domain systems collect and report
performance metrics, through real-time reporting and in log files, to help identify potential bottlenecks
and their causes.

435

Slide 17

Data Domain System Performance Metrics: Network and Process Utilization

# system show performance

----------------Protocol----------------
ops/s    load    data (MB/s)     wait (ms/MB)
         --%--    --in/out--       --in/out--
    0    0.00%   0.00/ 0.00    221.02/ 80.53
    0    0.00%   0.00/ 0.00      0.00/  0.00
    0    0.00%   0.00/ 0.00      0.00/  0.00
    0    0.00%   0.00/ 0.00      0.00/  0.00
    0    0.00%   0.00/ 0.00      0.00/  0.00
    0    0.00%   0.00/ 0.00      0.00/  0.00
    0    0.00%   0.00/ 0.00    198.07/ 81.24
    0    0.00%   0.00/ 0.00      0.00/  0.00

Note: The above output has been simplified for this lesson to show only pertinent areas of # system
show performance output.

1. ops/s - Operations per second
2. load - Load percentage (pending ops/total RPC ops *100)
3. data (MB/s) - Protocol throughput. Amount of data the file system can read from and write to the
kernel socket buffer
4. wait (ms/MB) - Time taken to send and receive 1 MB of data from the file system to the kernel
socket buffer

Module 10: Sizing, Capacity and Throughput Planning and Tuning

Copyright 2013 EMC Corporation. All Rights Reserved.

17

If you notice backups running slower than expected, it is useful to review system performance metrics.
From the command line, use the system show performance command. The command syntax is:
# system show performance [ {hr | min | sec} [ {hr | min | sec} ]]
For example:
# system show performance 24 hr 10 min
This shows the system performance for the last 24 hours at 10-minute intervals. 1 minute is the
minimum interval.
Servicing a file system request consists of three steps: receiving the request over the network,
processing the request, and sending a reply to the request.

436

Utilization is measured in four states:

ops/s
Operations per second.
load
Load percentage (pending ops/total RPC ops *100).
data (MB/s in/out)
Protocol throughput. Amount of data the file system can read from and write to the kernel
socket buffer.
wait (ms/MB in/out)
Time taken to send and receive 1 MB of data from the file system to the kernel socket buffer.

437

Slide 18

Data Domain System Performance Metrics: CPU and Disk Utilization
# system show performance

  -State-       -----Utilization-----
'CDBVMSFI'        CPU           disk
               --avg/max--     --max--
 --------       0%/ 0%[0]      2%[01]
 --------       0%/ 0%[0]      2%[01]
 --------       0%/ 0%[0]      2%[02]
 --------       0%/ 0%[0]      2%[01]
 --------       0%/ 0%[0]      2%[01]
 --------       0%/ 0%[0]      2%[01]
 --------       0%/ 0%[0]      2%[01]
 --------       0%/ 0%[0]      2%[01]
 --------       0%/ 0%[0]      2%[01]
 --------       0%/ 0%[0]      2%[01]

Note: The above output has been simplified for this lesson to show only pertinent areas of # system
show performance output.

1. State: 'CDBVMSFI' status flags; for example, C = cleaning, D = disk reconstruction, V = verification
2. CPU avg/max: average and maximum CPU utilization; the CPU ID of the most-loaded CPU is shown
in the brackets
3. Disk max: maximum (highest) disk utilization over all disks; the disk ID of the most-loaded disk is
shown in the brackets

Module 10: Sizing, Capacity and Throughput Planning and Tuning

Copyright 2013 EMC Corporation. All Rights Reserved.

18

An important section of the system show performance output is the CPU and disk utilization.

CPU avg/max: The average and maximum CPU utilization; the CPU ID of the most-loaded CPU is
shown in the brackets.
Disk max: Maximum disk utilization over all disks; the disk ID of the most-loaded disk is shown in
the brackets.

If the CPU utilization shows 80% or greater, or if the disk utilization is 60% or greater for an extended
period of time, the Data Domain system is likely running at its disk or CPU processing maximum. Check
that there is no cleaning or disk reconstruction in progress; you can check for cleaning and disk
reconstruction in the State section of the system show performance report.

438

The following is a list of states and their meanings as indicated in the # system show performance output:
C - Cleaning
D - Disk reconstruction
B - GDA (also known as multinode cluster [MNC] balancing)
V - Verification (used in the deduplication process)
M - Fingerprint merge (used in the deduplication process)
F - Archive data movement (active to archive)
S - Summary vector checkpoint (used in the deduplication process)
I - Data integrity
Typically, the processes listed in the State section of the system show performance report impact the
amount of CPU utilization available for handling backup and replication activity.

439

Slide 19

Data Domain System Stats Metrics: Throughput


# system show stats interval 2
----------------------------------------------------------------------
 CPU |     Net     |         Disk         |    NVRAM    |    Repl
aggr | eth0a eth0a |                      | aggr   aggr |
busy |   in    out |  read   write   busy | read  write |   in   out
   % |  MB/s  MB/s | KiB/s   KiB/s      % | KiB/s KiB/s | KB/s  KB/s
---- ------ ------ ------- ------- ------ ------ ------ ----- ------
  11  17951    436       0    1989      1      0      0     0      0
  12  18735    455       4    3078      0      0      0     0      0
  10  18269    445       4      64      0      0      0     0      0
   9  17103    418       4     764      0      0      0     0      0
  10  16556    404       4     764      0      0      0     0      0
  10  18269    445       4      64      0      0      0     0      0
   9  17103    418       4     764      0      0      0     0      0
  10  16556    404       4     764      0      0      0     0      0
  10  18269    445       4      64      0      0      0     0      0
   9  17103    418       4     764      0      0      0     0      0

Note: The above output has been simplified for this lesson to show only pertinent areas of # system
show stats output.

Module 10: Sizing, Capacity and Throughput Planning and Tuning

Copyright 2013 EMC Corporation. All Rights Reserved.

19

In addition to watching disk utilization, you should monitor the rate at which data is being received and
processed. These throughput statistics are measured at several points in the system to assist with
analyzing the performance to identify bottlenecks.
If slow performance is happening in real-time, you can also run the following command:
# system show stats interval [interval in seconds]
Example:
# system show stats interval 2
Adding 2 produces a new line of data every 2 seconds.
The system show stats command reports CPU activity and disk read/write amounts.
In the example report shown, you can see a high and steady amount of data inbound on the network
interface, which indicates that the backup host is writing data to the Data Domain device. We know it is
backup traffic and not replication traffic as the Repl column is reporting no activity.

440

Low disk-write rates relative to steady inbound network activity are likely because many of the incoming
data segments are duplicates of segments already stored on disk. The Data Domain system identifies
the duplicates in real time as they arrive and writes only the new segments it detects.

441

Slide 20

Tuning Solutions

Reduce stream count
Don't clean during heavy input
Don't replicate during heavy input
Consider using link aggregation
Reduce hop count
Isolate network to reduce other network congestion
Consider implementing DD Boost

Module 10: Sizing, Capacity and Throughput Planning and Tuning

Copyright 2013 EMC Corporation. All Rights Reserved.

20

If you experience system performance concerns, for example, you are exceeding your backup window,
or if throughput appears to be slower than expected, consider the following:
Check the Streams columns of the system show performance command to make sure that the
system is not exceeding the recommended write and read stream count. Look specifically under
rd (active read streams) and wr (active write streams) to determine the stream count. Compare
this to the recommended number of streams allowed for your system. If you are unsure about
the recommended streams number, contact Data Domain Support for assistance.
Check that CPU utilization (1 process) is not unusually high. If you see CPU utilization at or
above 80%, it is possible that the CPU is under-powered for the load it is currently required to
process.
Check the State output of the system show performance command. Confirm that there
is no cleaning (C) or disk reconstruction (D) in progress.
Check the output of the replication show performance all command. Confirm
that there is no replication in progress. If there is no replication activity, the output reports
zeros. Press Ctrl + c to stop the command. If replication is occurring during data ingestion and
causing slower-than-expected performance, you might want to separate these two activities in
your backup schedule.

442

If CPU utilization (1 process) is unusually high for any extended length of time, and you are unable to
determine the cause, contact Data Domain Support for further assistance.
When you are identifying performance problems, it is important to note the actual time when
poor performance was observed to know where to look in the system show performance output
chronology.

An example of a network-related problem occurs when the client is trying to access the Data Domain
system over a 100 Mb/s network rather than a 1 Gb/s network.
Check network settings, and ensure the switch is running at 1 Gb/s to the Data Domain system and is
not set to 100 Mb/s.
If possible, consider implementing link aggregation.
Isolate the network between the backup server and the Data Domain system. Shared bandwidth
adversely impacts optimum network throughput.
Consider implementing DD Boost to improve overall transfer rates between backup hosts and
Data Domain systems.
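Taken together, a quick triage sequence from the CLI, using only the commands discussed in this
lesson, might look like this:
# system show performance 24 hr 10 min
Review stream counts, CPU utilization, and the State column for cleaning (C) or disk reconstruction (D).
# system show stats interval 2
Watch live CPU, network, disk, and Repl activity while the slowdown is occurring.
# replication show performance all
Confirm whether replication is competing with backup ingest; press Ctrl + c to stop the command.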

443

Slide 21

Module 10: Summary

The steps to planning capacity and throughput are:
Gather data collection and retention policies
Determine capacity requirements
Calculate throughput requirements
Match the appropriate Data Domain hardware model to your capacity and throughput needs
Tuning solutions include:
Avoid running replication or cleaning processes during high data ingestion
Implement link aggregation
Consider implementing DD Boost
Maximize Data Domain system storage capacity

Module 10: Sizing, Capacity and Throughput Planning and Tuning

Copyright 2013 EMC Corporation. All Rights Reserved.

444

21
