
Technical Report

Storage Efficiency Every Day: How to Achieve and Manage Best-in-Class Storage Use
Dr. Adolf Hohl, Georg Mey, NetApp, with support from the NetApp Field Centers for Innovation
October 2010 | RA-0007

ABSTRACT
In the face of exponential data growth, efficient management of data is crucial. NetApp provides a set of technologies to do more with less. These technologies allow for thin-provisioned storage: the ability to consolidate much more data on NetApp storage controllers than would fit on the physically attached disks. This document explains how to achieve best-in-class storage use and how to manage thin-provisioned storage to enable storage efficiency in daily life while meeting service-level agreements.

TABLE OF CONTENTS

1 EXECUTIVE SUMMARY
2 INTRODUCTION
  2.1 TERMINOLOGY
  2.2 GOAL OF THIS DOCUMENT
  2.3 AUDIENCE
  2.4 SCENARIO
  2.5 NETAPP TECHNOLOGIES FOR STORAGE EFFICIENCY AND FLEXIBILITY
3 PROVISIONING
  3.1 PROVISIONING FROM SCRATCH: FULL FAT TO ZERO FAT PROVISIONING
  3.2 PROVISIONING FROM TEMPLATES: VOLUME AND DEDUPE-CENTRIC LAYOUTS
  3.3 SETTLED/NOMAD PROVISIONING FOR NETAPP DATA MOTION
4 OPERATION
  4.1 PHASES AND TRANSITIONS
  4.2 MONITORING
  4.3 NOTIFICATION
  4.4 MITIGATE STORAGE USE
5 REAL-LIFE SETTINGS
  5.1 SAMPLE SETTING 1: REAL-LIFE SETTING
  5.2 SAMPLE SETTING 2: SETTLED/NOMAD
6 STORAGE EFFICIENCY COOKBOOK
7 REFERENCES
8 ACKNOWLEDGMENTS

LIST OF TABLES

Table 1) NetApp technologies for storage efficiency and flexibility.
Table 2) Full fat provisioning.
Table 3) Zero fat provisioning.
Table 4) Full fat provisioning.
Table 5) Low fat provisioning.
Table 6) Zero fat provisioning.
Table 7) Comparison of provisioning methods.
Table 8) Mitigation alternatives to control use within aggregates.
Table 9) Mitigation activities for resource tightness within volumes.
Table 10) Phase transitions with settled/nomad provisioning pattern and online migration mitigation alternative.


LIST OF FIGURES

Figure 1) Terminology in context of the storage objects of volumes and aggregates.
Figure 2) Storage consolidation and growing utilization using thin provisioning.
Figure 3) Controlled and slowed data growth within the operational sweet spot corridor in the aggregate.
Figure 4) Mitigate to prevent uncontrolled utilization.
Figure 5) Sample service levels ordered by service disruption and recovery time.
Figure 6) Questions regarding storage efficiency from an operational point of view.
Figure 7) Provisioning model for NAS storage from scratch. Technically, only two out of four combinations are possible.
Figure 8) Provisioning model for SAN storage from scratch.
Figure 9) Configuring full/zero fat provisioning policy using Provisioning Manager for NAS. Select checkboxes as outlined. Provisioning Manager deviates from zero/full fat by first growing volumes with autosize and then allowing Snapshot autodelete.
Figure 10) Configuring full/low/zero fat provisioning policy using Provisioning Manager for SAN storage. Select checkboxes as outlined. Provisioning Manager deviates by not turning on autosize for zero fat.
Figure 11) Full/low/zero fat provisioning policies for datasets and storage services.
Figure 12) Volume-centric storage provisioning. Application instances are aligned horizontally with their volumes.
Figure 13) Dedupe-centric storage provisioning. Application instances are aligned horizontally; volumes are aligned vertically.
Figure 14) Settled/nomad provisioning into an aggregate. In case of aggregate tightness, a nomad is migrated to a separate aggregate.
Figure 15) Alignment by technical impact (sorted by negative impact in descending order).
Figure 16) Alignment by business impact (sorted by negative impact in descending order).
Figure 17) Operations Manager screen to configure thresholds on operational metrics.
Figure 18) Trending of data growth and days-to-full prediction in Operations Manager.
Figure 19) Storage efficiency dashboard in Operations Manager.
Figure 20) Configuring an alarm based on the threshold "aggregate almost full".
Figure 21) Storage to enable organic data growth between planned downtime windows.
Figure 22) Transition of changes depending on the metrics "aggregate capacity used" and "aggregate committed space".
Figure 23) Narrower corridors due to the ability to perform mitigation alternatives in hours instead of months.
Figure 24) Visualization of phase transitions depending on the metric "aggregate capacity used".
Figure 25) Typical picture of aggregate capacity metrics while turning to zero fat configurations and dedupe.


1 EXECUTIVE SUMMARY
This document provides consolidated best practices for achieving and managing best-in-class storage use. We introduce intervals and metrics that trigger changes in behavior so that NetApp storage can be operated in a corridor of high utilization for as long as possible. Starting with provisioning models focused on high consolidation and operational agility, we describe the operational phases and their transitions. A list of mitigation alternatives describes the options available for controlling use in the face of data growth. Finally, this document presents real-life settings in which high data consolidation is achieved using NetApp storage efficiency technologies.


2 INTRODUCTION
Exponential data growth generates a serious challenge for IT managers. Gartner predicts that between 2008 and 2013, enterprises will purchase and install 20 times more terabytes of storage than they did in 2008 (www.gartner.com/technology/mediaproducts/newsletters/netapp/issue24/gartner3.html). Until recently, continuous improvements in cost for performance and storage space made it easy and affordable to solve storage concerns by adding more disks to existing storage systems. However, IT executives are discovering that there are limits to that solution: floor space, weight loads, rack space, network drops, power connections, cooling infrastructure, and even power itself are finite resources. Hitting any one of these limits significantly jeopardizes the ability of IT to meet business demands. NetApp's solution to rapid resource consumption is to remove storage controllers and disks from the resource equation by using storage more efficiently. Key benefits of this strategy are:
- Less management involvement
- Reduced complexity, support, and service costs
- Improved performance and network efficiency

NetApp storage efficiency technologies are key to achieving data consolidation and managing future data growth; they allow several times more data to be stored and managed on NetApp storage controllers than would fit on their physically attached disks, and they allow IT investments to be deferred. In this document, we describe techniques and guidelines you can use to find the operational sweet spot for NetApp storage efficiency technologies in your environment. By adhering to the best practices outlined, you can increase storage consolidation and agility as well as decrease operational risk. The document is organized as follows:
- Chapter 3 describes storage provisioning.
- Chapter 4 describes the monitoring process and supporting tools for daily operation.
- Chapter 5 describes concrete operational setups used in daily life.
- Chapter 6 concludes with a step-by-step cookbook to provision and manage storage efficiently and to adapt individual thresholds.

2.1 TERMINOLOGY

We use the following terminology to describe resource use, both at the level of exposing storage to applications and at the level of physical resource allocation within the aggregates in the storage controllers. Also refer to the NetApp Operations Manager Efficiency Dashboard Installation and User Guide for further explanation of this terminology.
- Logical storage refers to storage that is visible at the application layer. Logical storage does not necessarily require the allocation of usable capacity.
- Usable capacity refers to storage that is usable for the applications, provided by NetApp storage controllers.
- Used capacity is a value that represents the amount of physical capacity that holds application or user data. In Operations Manager terminology, this is represented by capacity used.
- Storage utilization refers to the ratio of used capacity to usable capacity, without accounting for efficiency returns.

NetApp Operations Manager is a central console that delivers comprehensive monitoring and management for NetApp storage with alerts, reports, performance, and configuration tools.


- Commitment rate is an Operations Manager term that refers to the percentage of aggregate space committed to volumes.
- Deduplication rate is an Operations Manager term that measures the efficiency of the deduplication functionality. This rate is measured at both the volume and the aggregate level and is expressed as a percentage.
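As a quick illustration of how these two capacity metrics relate, here is a minimal sketch. It is plain Python for illustration only; the function names and sample figures are ours, not Operations Manager terminology or API:

```python
def storage_utilization(used_tb: float, usable_tb: float) -> float:
    """Used capacity as a percentage of usable capacity."""
    return 100.0 * used_tb / usable_tb

def commitment_rate(committed_tb: float, usable_tb: float) -> float:
    """Aggregate space committed to volumes as a percentage of usable
    capacity; with thin provisioning this can legitimately exceed 100%."""
    return 100.0 * committed_tb / usable_tb

# A 100 TB aggregate holding 62 TB of data, with 180 TB committed to volumes:
print(storage_utilization(62, 100))  # 62.0
print(commitment_rate(180, 100))     # 180.0
```

The point of the example is that the two metrics move independently: thin provisioning drives the commitment rate far past 100% while utilization stays well below it.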

For the aggregate, we define different operational windows, each characterized by an interval of storage utilization:
- The operational sweet spot corridor (green) is the interval in which the aggregate should be operated for optimal utilization and service availability.
- The tolerance interval (yellow) is the interval in which actions are taken to get back into the operational sweet spot corridor.
- The no-go area (red) is the interval in which we do not intend to operate the aggregate. This area might act as a last buffer of time, or it can be considered an area where operational staff has less experience.
Figure 1 explains the terminology in the context of storage objects on a NetApp storage controller. The aggregate is a physically limited storage object. Aggregates are treated as fairly static containers and thus need proper size management.
Figure 1) Terminology in context of the storage objects of volumes and aggregates.
[Figure 1 depicts volumes with LUNs/NAS inside an aggregate: committed logical storage exceeds the usable capacity of the aggregate, while used capacity grows with data growth through the operational sweet spot corridor.]

In practice, commitment rates far above 100% are common in customer environments. This document describes how to manage this.
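The green/yellow/red windows described above amount to a simple classification over the utilization metric. The sketch below is illustrative; the threshold values are placeholders that each site must choose for its own environment:

```python
def corridor(utilization_pct: float,
             green_max: float = 85.0,
             yellow_max: float = 95.0) -> str:
    """Map aggregate storage utilization to an operational window."""
    if utilization_pct <= green_max:
        return "green"   # operational sweet spot corridor
    if utilization_pct <= yellow_max:
        return "yellow"  # tolerance interval: act to get back to green
    return "red"         # no-go area

print(corridor(72.0))  # green
print(corridor(91.5))  # yellow
print(corridor(97.0))  # red
```

In daily operation such a classification would be driven by the capacity-used metric reported by Operations Manager rather than hand-entered values.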

2.2 GOAL OF THIS DOCUMENT

The goal of this document is to achieve best-in-class storage efficiency and cost by consolidating the highest possible amount of application data while meeting the required service-level agreements. The idea is to enable thin provisioning while keeping the use of physical resources in the NetApp shared storage infrastructure within a desired corridor. NetApp storage efficiency technologies can save a significant portion of the IT budget. On the other hand, running at an uncontrolled use level can reduce flexibility and cause headaches in managing data growth and service-level fulfillment. The difference in managing thin-provisioned storage compared to traditional storage is that, due to dense consolidation of application data, accumulated application data growth rates might vary in a broader corridor than they would traditionally.


To control the level of physical resources, we outline methods for increasing storage utilization by provisioning storage with NetApp storage efficiency technologies. We also outline how to react to organic data growth and how to level storage use within a corridor of high efficiency by using NetApp technologies for data center flexibility. To summarize, this document introduces three phases to manage the storage on NetApp storage controllers: provisioning, organic growth, and mitigation of storage tightness. These phases play a vital role for aggregates, which are the coarsest storage abstraction of a NetApp storage controller.
- Provisioning phase. In this phase, storage is provisioned by the NetApp shared storage infrastructure, which increases the utilization of aggregates. The goal is to operate the aggregates with a high level of data consolidation in an efficient utilization corridor. Figure 2 visualizes this corridor.
- Organic growth phase. In this phase, no further storage is provisioned, which slows the growth of aggregate utilization. The goal is to operate in a corridor of high utilization but safely reach the next planned downtime or administration window of the served applications. Thus the organic growth phase is sized depending on the growth rate of capacity used and the length of the time frame to the planned downtime windows. Figure 3 visualizes this slowed growth.
- Mitigation of storage tightness phase. This phase prevents an uncontrolled level of utilization. Several mitigation alternatives are presented to relieve storage tightness and to shift aggregate utilization back into the desired operational corridor. Figure 4 visualizes this mitigation.
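The sizing logic of the organic growth phase can be sketched as a back-of-the-envelope calculation: given the current growth rate, can the aggregate reach the next planned downtime window without leaving the corridor? This is our own linear-growth sketch, not a NetApp formula; it is comparable in spirit to the days-to-full trending that Operations Manager provides:

```python
def days_of_headroom(used_tb: float, usable_tb: float,
                     daily_growth_tb: float,
                     corridor_max_pct: float = 85.0) -> float:
    """Days until utilization crosses the corridor's upper bound,
    assuming linear data growth."""
    headroom_tb = usable_tb * corridor_max_pct / 100.0 - used_tb
    return max(headroom_tb, 0.0) / daily_growth_tb

def reaches_next_window(used_tb: float, usable_tb: float,
                        daily_growth_tb: float,
                        days_to_window: float) -> bool:
    """True if the next planned downtime window is reachable
    without triggering the mitigation phase."""
    return days_of_headroom(used_tb, usable_tb, daily_growth_tb) >= days_to_window

# 100 TB aggregate, 70 TB used, growing 0.5 TB/day, next window in 21 days:
print(days_of_headroom(70, 100, 0.5))         # 30.0
print(reaches_next_window(70, 100, 0.5, 21))  # True
```

If the check returns False, the organic growth phase is too short and a mitigation alternative should be scheduled before the window.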

Figure 2) Storage consolidation and growing utilization using thin provisioning.


Figure 3) Controlled and slowed data growth within the operational sweet spot corridor in the aggregate.



Figure 4) Mitigate to prevent uncontrolled utilization.


This document describes best practices and tools to manage the NetApp storage infrastructure and to support decision making during the transitions between phases.

2.3 AUDIENCE

This document addresses two audiences:
- Decision makers. It provides decision makers with an understanding of how to align storage efficiency best practices and processes within their existing operations organization.
- Operational teams. It allows operational teams to understand monitoring and management of the storage infrastructure while mastering data growth. It allows the operational teams to implement a basic setting and to position their usage goals. We refer to existing NetApp best practices to increase the level of data consolidation and to achieve overcommitment for major applications.


2.4 SCENARIO

As a scenario, we consider a service provider who delivers a set of IT services. This service provider might serve internal or external customers at different service levels. The service levels provided are characterized by unplanned downtime, as exemplified by Figure 5. This characterization is useful for aligning service data with physical resources. In our example, the highest level of service availability is delivered for Platinum services. It is further assumed that the provided services have different lifetimes and dates of creation. The service provider's major goal is to deliver the services within the specified service level and to achieve maximum data consolidation through NetApp storage efficiency technologies. This translates directly into cost savings related to capital investments, floor space, cooling, maintenance, and operational expenses. However, storing data in a consolidated way using storage efficiency technologies must take into account the aggregated data growth rates of the applications. Predicting data growth rates depends on several parameters that are usually outside the service provider's control and knowledge, including usage characteristics, number of users, and functionality used. To compensate for the inability to precisely predict data growth over a specific time frame, we propose a scheme that the service provider can use to achieve operational flexibility and adaptability to handle unpredictable growth rates.
Figure 5) Sample service levels ordered by service disruption and recovery time.

[Figure 5 orders sample service levels along the axes of service disruption and recovery time, from lowest to best effort: Platinum (production, premium customers; lowest disruption and recovery time), Gold (production; low), Silver (production, low budget), and Bronze (best-effort services such as dev/test, cold/fill-up data, and dynamic/short-term data).]

In this document, the focus is on operational aspects of storage efficiency technologies to achieve data center consolidation and agility. Thus, we take a seat next to the operational staff of our sample service provider to understand their questions regarding the technologies that make up NetApp storage efficiency. We address the questions posed by the operational staff, such as: How do we set it up? How do we run this? How do we integrate necessary procedures in our daily life?


A set of questions pertaining to the lifetime of a service instance and its storage arises. It starts with provisioning storage in a NetApp shared storage infrastructure and continues with detecting and monitoring situations that endanger the service level, with the necessary response procedures, and with promoting continuous and smooth delivery of services. The questions are structured around a cycle that starts with provisioning storage and finishes with deprovisioning storage. Figure 6 shows important questions regarding storage efficiency from an operational point of view.
- Provisioning deals with the provisioning of storage. In this document, provisioning models are shown that achieve a high level of storage consolidation while preserving operational flexibility. For individual applications, NetApp provides a rich library of technical reports on how best to provision.
- Monitoring deals with defining the goal of the monitoring process and which metrics to use to decide when to stop (for example, the provisioning of storage). Relevant metrics provided by NetApp Operations Manager are described.
- Notifying deals with how to notify the people in charge when certain actions must be performed. The notification mechanisms within NetApp Operations Manager are described to deliver information in case of certain events.
- Mitigation deals with mechanisms to prevent uncontrolled utilization from affecting operational flexibility and service fulfillment.

Figure 6) Questions regarding storage efficiency from an operational point of view.

[Figure 6 arranges these questions around the provision-monitor-notify-mitigate cycle: Provision (How to provision best for storage efficiency? Provisioning models, NetApp Data Motion awareness; from scratch or template/clone? Where to provision to? Which SLA? What are the defaults?), Monitor (What is critical? When to stop provisioning or extending? When to relax tightness? How to detect? Which monitoring tools, and what to monitor?), Notification (Who is in charge to react? How to notify?), and Mitigate (Available options? Implications on SLAs? When to act?).]

Before discussing the details of this cycle, it is important to understand the NetApp technologies that achieve storage efficiency and flexibility and to understand their relevance in the provisioning and operational phases.


2.5 NETAPP TECHNOLOGIES FOR STORAGE EFFICIENCY AND FLEXIBILITY

The NetApp technologies provided in Table 1 are characterized by how they bring the most significant advantage and value. For example, FlexClone technology provides significant time and space advantages while provisioning, but the space advantage might be reduced over time. In contrast, deduplication technology can achieve space savings over the entire storage lifetime.
Table 1) NetApp technologies for storage efficiency and flexibility.

- FlexClone — Instantly creates thin-provisioned and space-efficient writable clones. (Most significant during provisioning.)
- FlexVol — Implements thin provisioning and consumes only the needed space rather than the requested space. (Most significant during provisioning.)
- Deduplication — Increases data consolidation by detecting and optimizing repeating patterns in primary and secondary storage. (Most significant during operation.)
- NetApp Data Motion — Provides flexibility for management and optimal load/capacity rebalancing in growing cloud environments without downtime. (Most significant during operation.)
- Aggregate Extensibility in Data ONTAP — Data ONTAP is the foundation for all features listed in this table and provides flexibility in handling physical resources. It allows extending physical aggregates during operation. (Most significant during operation.)

Furthermore, NetApp RAID-DP, SATA, and NetApp Flash Cache (formerly PAM II) are technologies that help to reduce the total cost of storage tremendously. It is assumed that these technologies are deployed according to the requirements of the use case.

NETAPP SHARED STORAGE INFRASTRUCTURE
To implement the practices outlined in this document, some prerequisites must be met. We assume a NetApp shared storage infrastructure implemented using large aggregates. This acts as a utility for delivering storage in a flexible manner to applications with different needs. It scales with demand and serves a variety of different service levels at the same time. NetApp Operations Manager monitors the NetApp shared storage infrastructure. This software acts as a central management station and consolidates information about the current status of all NetApp storage controllers. Based on this information, Operations Manager indicates the necessity to change phases and behavior in the data center. The NetApp shared storage infrastructure provides different ways for clients to consume its resources. It can provide a traditional view in which storage resources are located at a specific controller. Using NetApp Provisioning Manager, the infrastructure can also provide a service-oriented resource view that abstracts resource consumption and management from the physical controllers. The abstractions of a storage service catalog, resource pools, and datasets provide easy manageability in the face of massive scale. If multi-tenancy is not required, then this is the abstraction of choice. Supported by the NetApp MultiStore (vFiler) and NetApp Data Motion technologies, storage can be provided in a utility-like fashion, independent of physical hardware. This enables high operational flexibility in the data center and allows building virtualized environments for multiple tenants with competing interests.


3 PROVISIONING
Some features, such as data deduplication, can be turned on or off at any time. However, to achieve maximum consolidation and storage efficiency, more strategies must be considered during the data-provisioning phase. In addition, provisioning should take the flexibility of storage (for example, the ability to migrate data) into account. This allows administrators to easily move data off aggregates approaching capacity, without downtime. This is also an important aspect when planning to deliver services 24x7. Thus, the design goals for storage layouts are storage efficiency and operational flexibility. In the following sections, we discuss three orthogonal dimensions of provisioning storage. Two of them focus on achieving data consolidation; the third focuses on achieving operational flexibility. All dimensions can be combined independently. Note that the achievable level of consolidation depends on the applications and their data. TR-3827, If You Are Doing This, Then Your Storage Could Be Underutilized, provides further understanding of storage efficiency and operational flexibility.

3.1 PROVISIONING FROM SCRATCH: FULL FAT TO ZERO FAT PROVISIONING

This section deals with the way data is provisioned and the consequences for storage efficiency. We recommend applying the so-called zero fat configurations. We consider the storage setup for a single application instance. The presented configurations can be applied both when provisioning storage from scratch and to already provisioned storage. When the technical dimensions of storage provisioning are categorized into primary data space and its Snapshot copy space, there are four theoretical combinations for both network-attached storage (NAS) and storage area network (SAN) environments. In practice, only two variants are relevant to NAS, and three variants are relevant to SAN storage:
- Full fat
- Low fat
- Zero fat

According to NetApp best practices, we do not mix block and file data within a single volume, which allows us to consider NAS and SAN environments separately.

NAS
For NAS, two options are recommended:
- Full fat. The primary data and Snapshot copy space are preallocated.
- Zero fat. Primary data and its Snapshot copy space are allocated on demand. This variant achieves the best ratio of storage efficiency when provisioning applications from scratch.


Figure 7) Provisioning model for NAS storage from scratch. Technically, only two out of four combinations are possible.

                               Primary data (files & directories) space allocation
                               Fat                  Thin
Snapshot copy space    Fat     Full fat option      No option
allocation             Thin    No option            Zero fat option

Note: Full fat is characterized slightly differently in NAS and SAN due to their technical properties.

FULL FAT PROVISIONING
Full fat provisioning for NAS is the traditional (default) way to implement NFS/CIFS shares. Volumes in a full fat configuration are characterized as follows:
- Volumes are created with a space guarantee.
- The size of the volume follows the formula X + Δ, where X is the size of the primary data (the sum of all user data, files and directories, within the volume) and Δ is the amount of space needed to hold Snapshot data.
- Because the space used for Snapshot copies might grow unexpectedly, the autosize function can be used to make space available when a certain volume threshold is reached. This also happens when the space reserved for user data gets low.
- Space reservation for Snapshot copies is used to hide the capacity used for Snapshot copies from the consumers (NAS clients).
- For volumes with deduplication enabled, volume autogrow is a mandatory option.
- Using autodelete is normally not recommended in NAS environments, because keeping a certain number of Snapshot copies for file versioning/restores might be part of the SLAs defined for file services.
Note: Deleting Snapshot copies may be a reasonable approach when no other option for freeing up space is available, but this is a specific and individual decision.

Table 2) Full fat provisioning.

Volume Options
- guarantee: volume
- fractional_reserve: 100%. Leave at the default; mostly relevant for SAN environments. The default value up to Data ONTAP 7.3.3 is 100%; for later releases, 0% is the default.
- autosize: on. Turn autosize on. There is no artificially limited volume that needs to be monitored. Autosize makes sense to allow growth of user data beyond the guaranteed space limit.
- autosize options: -m X% -i Y%. The business model drives the maximum value (-m) for the autosize configuration because it offers additional disk space for the consumer under its specific conditions. A reasonable resizing increment (-i) depends on various factors, such as the data growth rate in the particular volume, the volume size itself, and so on.

Volume Snapshot Options
- reserve: yes. The value depends on the number of Snapshot copies and the change rate within the volume.
- schedule: switched on. Automatic Snapshot technology schedules.
- autodelete: off. Deleting Snapshot copies is not recommended in most NAS environments.
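The Table 2 settings map onto the Data ONTAP 7-Mode CLI roughly as follows. This is a minimal sketch; the volume name, aggregate, sizes, and Snapshot schedule are hypothetical placeholders:

```shell
# Full fat NAS volume: space guarantee plus Snapshot reserve (hypothetical names/sizes)
vol create vol_nas -s volume aggr1 500g     # X: guaranteed space for primary data
vol options vol_nas fractional_reserve 100  # leave at the default
vol autosize vol_nas -m 750g -i 25g on      # -m/-i values follow the business model
snap reserve vol_nas 20                     # Snapshot space hidden from NAS clients
snap sched vol_nas 0 2 6@8,12,16,20         # automatic Snapshot schedule
snap autodelete vol_nas off                 # keep Snapshot copies for file restores
```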

ZERO FAT PROVISIONING

The zero fat method is the most efficient way to provision NAS volumes:

- Volumes are created without a space guarantee.
- The size of the volume still follows the formula X + Δ. X is the size of the primary data, that is, the sum of all user data (files and directories) within the volume; Δ is the amount of space needed to hold Snapshot data. Sizing the volume defines a container with a virtual size for the consumers; NAS users are familiar with fixed-size file shares.
- Because space used for Snapshot copies can grow unexpectedly, you can use the autosize function to make space available when a certain volume threshold is reached. You can also use it when the space reserved for user data gets low.
- The space reserved for Snapshot copies is used to hide from the consumers (NAS clients) the capacity taken up by Snapshot copies.
- For volumes with deduplication enabled, volume autogrow is a mandatory option.
- Using autodelete is normally not recommended in NAS environments. Keeping a certain amount of space for Snapshot copies for file versioning/restores is part of the SLAs defined for file services.

Note: Deleting Snapshot copies may be a reasonable approach when no other option for freeing up space is available, but this is a specific and individual decision.

Table 3) Zero fat provisioning.

Volume Options
- guarantee: none
- fractional_reserve: 100%. Leave at the default; mostly relevant for SAN environments. The default value up to Data ONTAP 7.3.3 is 100%; for later releases, 0% is the default.
- autosize: on. Turn autosize on. There is no artificially limited volume that needs to be monitored. Autosize makes sense to allow growth of user data beyond the guaranteed space limit.
- autosize options: -m X% -i Y%. The business model drives the maximum value (-m) for the autosize configuration because it offers additional disk space for the consumer under its specific conditions. A reasonable resizing increment (-i) depends on various factors, such as the data growth rate in the particular volume, the volume size itself, and so on.
- try_first: volume_grow. Autodelete is not recommended in most environments.

Volume Snapshot Options
- reserve: yes/no. The value depends on the number of Snapshot copies and the change rate within the volume. Displaying only the committed usable space per the SLA is the preferred way to provision NAS storage; however, there might be situations in which the Snapshot reserve area is omitted (no).
- schedule: switched on. Automatic Snapshot technology schedules.
- autodelete: off. Deleting Snapshot copies is not recommended in most NAS environments.
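The corresponding 7-Mode CLI sketch for a zero fat NAS volume, again with hypothetical names, sizes, and schedule:

```shell
# Zero fat NAS volume: no space guarantee, growth on demand (hypothetical names/sizes)
vol create vol_nas -s none aggr1 500g       # virtual container size for the consumers
vol autosize vol_nas -m 750g -i 25g on
vol options vol_nas try_first volume_grow   # grow the volume before deleting Snapshot copies
snap reserve vol_nas 20                     # or 0 if the reserve area is omitted
snap sched vol_nas 0 2 6@8,12,16,20         # automatic Snapshot schedule
snap autodelete vol_nas off                 # keep Snapshot copies for file restores
```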

SAN

For SAN, we consider three options:

- Full fat. Both primary data and its Snapshot copy space are preallocated.
- Low fat. The primary data is preallocated; the Snapshot copy space is allocated on demand.
- Zero fat. Primary data and its Snapshot copy space are allocated on demand. This variant achieves the best storage efficiency ratio when provisioning applications from scratch.

Figure 8) Provisioning model for SAN storage from scratch.

                                  Snapshot Copy Space Allocation
  Primary Data (LUN)              Fat                  Thin
  Space Allocation
    Fat                           Full Fat Option      Low Fat Option
    Thin                          No Option            Zero Fat Option

FULL FAT PROVISIONING

This method can be treated as the historical way of provisioning block storage with Data ONTAP:

- Volumes are created with a space guarantee.
- A fractional (overwrite) reserve is used to guarantee that the primary data can be overwritten completely with Snapshot copies in place. If this space is not available, Snapshot copy creation will fail.
- The size of the volume follows the formula 2X + Δ. X is the size of the primary data, that is, the sum of all LUN capacities within the volume; Δ is the amount of space needed to hold Snapshot copy data.


The enhancements to the volume autosize capabilities (such as volume size-dependent thresholds) and the robustness of the Snapshot autodelete implementation have made full fat provisioning more or less obsolete. As of today, the default settings for creating volumes/LUNs in Data ONTAP still correspond to this method. See Provisioning from Scratch: Full Fat to Zero Fat Provisioning for a discussion of using tools such as Provisioning Manager.
Table 4) Full fat provisioning.

Volume Options
- guarantee: volume
- fractional_reserve: 100%. Even though technically possible, a fractional reserve below 100% incorporates a potential risk of running out of Snapshot copy overwrite space. This situation should be avoided.
- autosize: off. Autosize could be used as an option to create the free space needed for Snapshot copy creation.

Volume Snapshot Options
- reserve: 0
- schedule: switched off
- autodelete: off

LUN Options
- reservation: enable
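A minimal 7-Mode CLI sketch of these full fat SAN settings; volume, aggregate, LUN path, and sizes are hypothetical:

```shell
# Full fat SAN volume: guarantees for volume, overwrites, and LUN (hypothetical names/sizes)
vol create vol_san -s volume aggr1 1100g        # roughly 2X + delta for a 500g LUN
vol options vol_san fractional_reserve 100      # guarantee Snapshot overwrite space
vol autosize vol_san off
snap reserve vol_san 0
snap sched vol_san 0 0 0                        # no automatic Snapshot schedule
snap autodelete vol_san off
lun create -s 500g -t linux /vol/vol_san/lun0   # LUNs are space-reserved by default
```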

LOW FAT PROVISIONING

With low fat provisioning, we use a more space-efficient way to provision volumes:

- Volumes are created with a space guarantee.
- LUNs are created with a space guarantee as well. This setup does not benefit from unused blocks within a LUN. (During the lifetime of a LUN, the amount of free, unused blocks typically decreases. Without space reclamation techniques, allocated blocks on the storage system stay allocated.)
- The size of the volume follows the formula X + Δ. X is the size of the primary data, that is, the sum of all LUN capacities within the volume; Δ is the amount of space needed to hold Snapshot copy data.
- Because space used for Snapshot copies might grow unexpectedly, the autosize and autodelete policies are used to make space available when a preset volume threshold is reached.

Table 5) Low fat provisioning.

Volume Options
- guarantee: volume
- fractional_reserve: 0. Snapshot space is controlled by the autodelete and autosize options.
- autosize: on. Turn autosize on.
- autosize options: -m X% -i Y%. The business model drives the maximum value (-m) for the autosize configuration because it offers additional disk space for the consumer under its specific conditions. A reasonable resizing increment (-i) depends on various factors, such as the data growth rate in the particular volume, the volume size itself, and so on. Increasing the size of the volume does not destroy any data or information, so there is no reason not to increase it; the size can be reverted afterward if the volume free space increases again. There might be configurations where automatic volume growth is not desired.
- try_first: volume_grow

Volume Snapshot Options
- reserve: 0. For NAS volumes, setting a Snapshot copy reserve area and configuring Snapshot copy schedules is a common setup. For SAN volumes, this needs to be switched off according to NetApp best practices (see the Fibre Channel and iSCSI Configuration Guide).
- schedule: switched off
- autodelete: on. There might be Snapshot copies that are needed to fulfill certain SLAs, such as backup SLAs. Setting this policy needs to be negotiated with the business requirements; in the worst case, deleting Snapshot copies is not an option.
- autodelete options: volume, oldest_first. Defines the order in which Snapshot copies become candidates for deletion; oldest_first is the current default.

LUN Options
- reservation: enable. Reserves space for the LUN during creation.
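These low fat settings translate into the 7-Mode CLI roughly as follows; names and sizes are hypothetical placeholders:

```shell
# Low fat SAN volume: guaranteed primary data, Snapshot space on demand (hypothetical)
vol create vol_san -s volume aggr1 520g           # roughly X + delta for a 500g LUN
vol options vol_san fractional_reserve 0
vol autosize vol_san -m 800g -i 25g on
vol options vol_san try_first volume_grow         # grow before deleting Snapshot copies
snap reserve vol_san 0
snap sched vol_san 0 0 0                          # no automatic Snapshot schedule
snap autodelete vol_san on
snap autodelete vol_san trigger volume            # act on the volume threshold
snap autodelete vol_san delete_order oldest_first # current default order
lun create -s 500g -t linux /vol/vol_san/lun0     # space-reserved LUN
```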

ZERO FAT PROVISIONING

Full and low fat provisioning use fully allocated volumes and LUNs; by default, space allocation happens within the boundaries of the LUN and the volume. Zero fat follows a 100% allocate-on-demand concept:

- Volumes are created without a space guarantee.
- LUNs are created without a space guarantee.
- The size of the volume follows the formula X - N + Δ. X is the size of the primary data, that is, the sum of all LUN capacities within the volume; Δ is the amount of space needed to hold Snapshot copy data; N is the amount of unused blocks within a given LUN.
Table 6) Zero fat provisioning.

Volume Options
- guarantee: none. No space reservation for the volume at all.
- fractional_reserve: 0. With Data ONTAP 7.3.3, fractional_reserve can be modified even for volumes without a space guarantee of type volume. Prior to Data ONTAP 7.3.3, the value was fixed at 100%.
- autosize: on. Turn autosize on.
- autosize options: -m X% -i Y%. The business model drives the maximum value (-m) for the autosize configuration because it offers additional disk space for the consumer under specific conditions. A reasonable resizing increment (-i) depends on various factors, such as the data growth rate in the particular volume, the volume size itself, and so on.
- try_first: volume_grow

Volume Snapshot Options
- reserve: 0. For NAS volumes, setting a Snapshot copy reserve area and configuring Snapshot copy schedules is a common setup. For SAN volumes, this needs to be switched off according to NetApp best practices (see the Fibre Channel and iSCSI Configuration Guide).
- schedule: switched off
- autodelete: off. Deleting Snapshot copies might be an option when the volume can no longer be resized because the maximum configured size has been reached or when the aggregate's free space becomes low.

LUN Options
- reservation: disable. No preallocation of blocks for the LUN.
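The zero fat SAN settings as a 7-Mode CLI sketch; names and sizes are hypothetical placeholders:

```shell
# Zero fat SAN volume: everything allocated on demand (hypothetical names/sizes)
vol create vol_san -s none aggr1 520g
vol options vol_san fractional_reserve 0
vol autosize vol_san -m 800g -i 25g on
vol options vol_san try_first volume_grow
snap reserve vol_san 0
snap sched vol_san 0 0 0                                    # no automatic Snapshot schedule
snap autodelete vol_san off
lun create -s 500g -t linux -o noreserve /vol/vol_san/lun0  # no LUN space reservation
```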

SUMMARY OF PROVISIONING METHODS

There are good reasons for using any of the provisioning methods described; however, full fat for SAN environments should be avoided wherever possible because of its storage efficiency ratio. Even with a 100% block usage ratio on primary data, zero fat provisioning has many advantages and is the preferred method:

- The aggregate's free space is a global pool that can serve space to volumes. This gives more flexibility than volumes with their own dedicated free space.
- For SAN volumes, block consumption can be easily monitored.
- Deduplication savings go directly into the global pool of free space, which is the aggregate or the resource pool to which it belongs.
- Monitoring is needed only on the aggregate level. Volumes grow on demand.

Table 7) Comparison of provisioning methods.

Space consumption
- Full fat: 2X + Δ
- Low fat: X + Δ
- Zero fat: X - N + Δ (N is the traditional thin provisioning impact: the amount of blocks logically allocated but not used)

Space efficient
- Full fat: No
- Low fat: Partially, for Snapshot copies
- Zero fat: Yes

Monitoring
- Full fat: Optional
- Low fat: Required on volume and aggregate level
- Zero fat: Required on aggregate level

Notification/mitigation process required
- Full fat: No
- Low fat: Optional in most cases
- Zero fat: Yes

Pool benefiting from dedupe savings
- Full fat: Volume fractional reserve area
- Low fat: Volume free space area
- Zero fat: Aggregate free space area

Risk of an out-of-space condition on primary data
- Full fat: No
- Low fat: No, as long as autodelete is able to delete any Snapshot copies
- Zero fat: Yes, when monitoring and notification processes are missing

Typical use cases
- Full fat: Small installations; none or few storage management skills (no monitoring infrastructure)
- Low fat: Large database environments
- Zero fat: Shared storage infrastructure; test/dev environments; storage pools for virtualized servers
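To make the space consumption formulas concrete, a small arithmetic sketch with hypothetical example values (X = 500 GB of LUN capacity, Δ = 100 GB of Snapshot data, N = 200 GB of allocated-but-unused blocks):

```shell
# Space consumed per provisioning method, in GB (hypothetical example values)
X=500      # primary data: sum of LUN capacities
DELTA=100  # Snapshot copy data
N=200      # blocks logically allocated but not used
echo "full fat: $((2*X + DELTA)) GB"   # 1100 GB
echo "low fat:  $((X + DELTA)) GB"     # 600 GB
echo "zero fat: $((X - N + DELTA)) GB" # 400 GB
```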

FULL/LOW/ZERO FAT PROVISIONING WITH PROVISIONING MANAGER

NetApp Provisioning Manager focuses on fast and efficient provisioning of storage resources in the NetApp storage infrastructure. It speeds up provisioning workflows and boosts capacity usage by using policy-based automation for the entire NetApp NAS and SAN infrastructure. These processes are:

- Faster than manually provisioning storage
- Easier to maintain than scripts
- Instrumental in minimizing the risk of data loss resulting from misconfigured storage

Provisioning Manager applies user-defined policies to consistently select the appropriate resources for each provisioning activity. This frees administrators from the headache of searching for available space to provision and allows more time for strategic issues. The use of a centralized management console allows administrators to monitor the status of their provisioned storage resources. Provisioning Manager can help improve business agility and capacity use, shrink provisioning time, and improve administrator productivity. Provisioning Manager's thin provisioning and deduplication capabilities provide a high level of storage efficiency from your NetApp storage investment.

A GUI allows you to implement the full/low/zero fat provisioning models within Provisioning Manager. See Figure 9 and Figure 10 for configuring storage efficiency in a provisioning policy for NAS and SAN. Whenever storage is provisioned using such a provisioning policy, the settings apply automatically. For more information, refer to TR-3710: Operations Manager, Provisioning Manager, and Protection Manager Best Practices Guide.

Provisioning Manager encapsulates technical details when provisioning storage and supports easy integration with existing management tools and orchestration frameworks. Policies and their use in so-called datasets and storage services allow you to exploit NetApp storage efficiency technologies without exposing a high level of technical detail to a higher level of management software.

Note: Provisioning Manager up to version 4.0 does not allow you to specify autosize and autodelete individually; the policy template determines whether these features are used and which options are selected. In order to implement the provisioning methods outlined, a customized provisioning script needs to be provided to set the autosize and autodelete parameters according to the recommendations for the full/low/zero fat methods. Post-provisioning scripts are standard with Provisioning Manager. Use caution when Provisioning Manager runs conformance checks; these revert individual settings.
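A post-provisioning script of the kind described above could look roughly like the following. This is a hypothetical sketch only: the argument convention, controller access via ssh, and the percentage-style autosize values are assumptions, not a documented Provisioning Manager interface.

```shell
#!/bin/sh
# Hypothetical post-provisioning script: enforce zero fat autosize/autodelete
# settings on a freshly provisioned volume. Controller and volume name are
# assumed to be passed in by the provisioning workflow.
CONTROLLER="$1"
VOLUME="$2"
ssh "$CONTROLLER" "vol autosize $VOLUME -m 120% -i 5% on"
ssh "$CONTROLLER" "vol options $VOLUME try_first volume_grow"
ssh "$CONTROLLER" "snap autodelete $VOLUME off"
```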


Figure 9) Configuring full/zero fat provisioning policy using Provisioning Manager for NAS. Select checkboxes as outlined. Provisioning Manager deviates from zero/full fat by first growing volumes with autosize and then allowing snapshot autodelete.


Figure 10) Configuring full/low/zero fat provisioning policy using Provisioning Manager for SAN storage. Select checkboxes as outlined. Provisioning Manager deviates by not turning on autosize for zero fat.

FULL/LOW/ZERO FAT PROVISIONING FOR STORAGE SERVICES

Storage services are an easy abstraction to provision storage in a utility-like fashion. A storage service describes all characteristic attributes for storage needed in a certain scenario. A storage service catalog lists the available templates and allows you to provision storage with these attributes on demand. Technically, storage services or datasets consist of one or more resource pools, a protection policy, and a provisioning policy. Full/low/zero fat provisioning for storage services is configured in the configuration wizard of the provisioning policy. Figure 11 shows the provisioning policies closest to full/low/zero fat configurations.
Figure 11) Full/low/zero fat provisioning policies for datasets and storage services.

Because this wizard is able to configure the deduplication feature, two policies are configured for the zero fat configurations, one with deduplication and one without deduplication.


HOW SHOULD A VOLUME BE SIZED?

Because physical allocation of data within a zero fat-provisioned volume is done on demand, the volume size could theoretically be set to a very high value that can easily hold all application data and Snapshot copies. Because the unallocated space in the volume is not exclusively reserved for the volume itself, all other applications can benefit from the shared pool of unallocated storage. However, NetApp recommends that you size the volume to the expected size of its contained objects and use the autogrow option to let it grow on demand. The important advantage is that the commitment rate then acts as a metric for data consolidation.

Note: The commitment rate reflects the amount of logical data consolidation. This metric is suitable for deciding how much headroom should be left for organic growth.
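As an illustration of the commitment rate (hypothetical numbers): the sum of the provisioned virtual volume sizes relative to the aggregate capacity.

```shell
# Commitment rate of an aggregate (hypothetical example values, in GB)
AGGR_SIZE=1000
VOL_SIZES="400 300 600"   # provisioned (virtual) volume sizes
TOTAL=0
for s in $VOL_SIZES; do TOTAL=$((TOTAL + s)); done
echo "committed: $TOTAL GB"
echo "commitment rate: $((100 * TOTAL / AGGR_SIZE))%"   # 130%: overcommitted
```

A rate above 100% means the aggregate is overcommitted, which is exactly the consolidation effect that thin provisioning aims for, provided monitoring is in place.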

Additionally, the volume size limits when using deduplication should be taken into account, because the maximum sizes depend on the storage controllers.

APPLICATION RECOMMENDATIONS

Thin provisioning is most effective when applications use data that is committed to them step by step. When applications preformat data, the immediate effect of thin provisioning is lost, and only deduplication may reclaim sharable blocks. Because thin provisioning has no performance penalty, the general recommendation is to provision with the zero fat configuration. For SAN-attached storage, NetApp recommends that you use file systems supporting space reclamation technologies such as the SCSI UNMAP and SCSI WRITE SAME commands. This passes the information through the storage stack that a particular block is no longer used and allows unused space to be reclaimed. On Windows platforms, this can be configured in NetApp SnapDrive. For Oracle database best practices, refer to WP-7084: Storage Efficiency in an Oracle Environment.

3.2 PROVISIONING FROM TEMPLATES: VOLUME AND DEDUPE-CENTRIC LAYOUTS

This section deals with provisioning storage for similar applications from a golden template. A valid use case is a hosting provider who offers and serves predefined application services in mass quantities. Instead of provisioning each application from scratch, the data of the application instance is provisioned by creating a copy of a preconfigured template or golden copy that is customized using a postprocessing procedure.

When applications are provisioned this way, NetApp cloning technologies generate virtual copies of the template data instantly and with efficient use of space. This achieves a high degree of data consolidation and cost savings. The potential of NetApp cloning technologies also plays a central role in development and test environments as well as in software maintenance scenarios. Testing and updates can be performed very easily because these cloning capabilities work instantly and with almost no overhead for performance, CPU, and memory.

There are two ways to align application data to a NetApp shared storage infrastructure:

- Volume-centric storage layout
- Dedupe-centric storage layout

Depending on the data lifetime, suitability for deduplication, consistency, and tool constraints, one way of aligning application data is more appropriate than the other. In both variants, the storage of the application template can be provisioned as either full, low, or zero fat. The cloning procedure inherits the attributes of the parent volume. To create space-efficient clones, the space guarantee must be set to none.
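A template volume can be cloned with a single 7-Mode CLI operation. A minimal sketch; the volume names are hypothetical, and the space guarantee is set to none as recommended above:

```shell
# Clone an application instance from a golden template volume (hypothetical names)
vol clone create vol_inst1 -s none -b vol_template   # -s none: space-efficient clone
# After customization, the clone can be split from its parent if full independence
# is required (this consumes physical space again):
# vol clone split start vol_inst1
```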


CONSEQUENCES FOR MONITORING

When using one of the following layouts, very high data consolidation can be achieved. Because this effect depends on the usage characteristics of the corresponding applications, monitoring the aggregate is key. In case a low fat volume acts as a template that is cloned while preserving the original space guarantees, monitoring is necessary for the cloned volumes as well.

VOLUME-CENTRIC STORAGE LAYOUT

In a volume-centric storage layout, an application instance is organized into one or a few volumes to benefit from the Data ONTAP volume-centric management and maintenance operations, such as instant cloning and volume-consistent Snapshot copies. In addition to the convenient ways to manage volumes, volume-centric storage layouts have storage efficiency advantages in two dimensions:

- High instant storage efficiency savings. High instant savings when cloning data of an application instance with FlexClone; savings might deteriorate over time.
- Long-term storage efficiency savings. Medium long-term savings when deduplicating application data.

A volume-centric layout makes it easy to provision storage for another instance of an application by cloning a consistent volume representing the template of the intended application and attaching it to an instance where it is processed. This approach works for both NAS and SAN.

Figure 12 shows the data alignment of an application instance and its volume. An application instance organizes its data in one or more dedicated volumes. Note that the entire construct is created within one aggregate. Because deduplication is performed on the volume level, long-term savings depend on the block-sharing rate within one instance of an application.

Volume-centric layouts are preferred in the following cases:

- Simplicity of data management using volumes
- Individual control over the SLA of each application instance
- Application instances with a short duration
- No consideration of deduplication
- Management tools that require volume-centric layouts


Figure 12) Volume-centric storage provisioning. Application instances are aligned horizontally with their volumes. [The figure shows a template FlexVol volume and FlexVol volumes for instances 1 through n, each containing that instance's LUNs/qtrees. Deduplication block sharing works within each FlexVol volume, and FlexClone block sharing links the instance volumes to the template.]

Impact on commitment and storage utilization: The impact of using FlexClone to clone a volume-centric storage layout to implement template-based provisioning is visualized schematically. At clone creation, Data ONTAP creates metadata for the new instance of the data; it allocates space for storing changes to the cloned copy or new data on request. Thus, the overcommitment of the aggregate containing the cloned data increases when creating the clone. However, this does not affect the space used in the aggregate. When data in the clone is changed and new data is added by the application, the aggregate use will grow.

Best Practice
A volume-centric layout implicitly implements a consistency group. It is preferable to align within it all application data that should be recovered to a certain point in time. Cloning can achieve significant savings when a FlexClone volume is created to provision data for a new service instance. Client-side data realignment, such as disk defragmentation or database tablespace reorganization, has a counterproductive effect on the FlexClone savings and a temporarily counterproductive effect on the deduplication savings, requiring the deduplication process to run again. If possible, the following actions on client data should be avoided:

- Reorganizing data, for example, database reorganization of tablespaces or defragmentation of virtual disks provisioned through cloning
- Preformatting data


DEDUPE-CENTRIC STORAGE LAYOUT

In a dedupe-centric storage layout, the goal is to achieve high storage efficiency returns from the deduplication feature. In contrast to the volume-centric storage layout, data of different application instances is grouped to achieve storage efficiency returns across a set of application instances. Figure 13 shows a sample dedupe-centric storage layout. Data of application instances is organized horizontally; the individual data of each application is grouped vertically in a volume to implement deduplication.

This layout makes sense in virtualization scenarios where the images of the guest machines can be grouped easily. Grouping partitions containing boot images and commonly used programs is very effective because they share much of the same data. To implement template-based provisioning with such a layout, cloning template data must be performed with the file/LUN FlexClone operation. File/LUN FlexClone allows storage objects to be cloned within a volume, providing finer granularity.

This storage layout provides the following storage efficiency advantages from a short- and long-term perspective:

- Very high long-term storage efficiency savings. Long-term savings are achieved due to the deduplication-centric storage layout and deduplication returns.
- Short-term storage efficiency savings. Instant savings are provided when cloning an application instance, for example, template application data, through a file/LUN FlexClone operation.

In contrast to the volume-centric storage layout, application instances are bundled together in a matrix style because of their participation in a volume. This implies that the applications share major operational tasks and are managed as a bundle. From an SLA perspective, a diversification of service levels within the application instances cannot be implemented as easily as with a volume-centric layout. Achieving application-consistent Snapshot copies requires the iterative application of the file/LUN FlexClone functionality to all storage objects of the instance, which is slightly more difficult than cloning with a volume FlexClone operation. TR-3505: NetApp Deduplication for FAS and V-Series Deployment and Implementation Guide provides a deeper understanding of NetApp deduplication and its deployment.

Figure 13 illustrates dedupe-centric storage provisioning. Volumes are shared among several application instances to achieve cross-instance dedupe returns. Note that this construct is created within an aggregate; volumes can be assigned to different aggregates.
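Cloning individual storage objects within a shared volume uses the 7-Mode file/LUN clone command. A minimal sketch with hypothetical volume and file names:

```shell
# File/LUN-level cloning inside a shared, deduplicated volume (hypothetical paths)
# Requires a FlexClone license on the controller.
clone start /vol/vol_images/template.vmdk /vol/vol_images/inst1.vmdk
clone status vol_images   # monitor the running clone operation
```

To provision a complete instance, this operation is repeated for each storage object belonging to the template, which is the iterative procedure mentioned above.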


Figure 13) Dedupe-centric storage provisioning. Application instances are aligned horizontally; volumes are aligned vertically. [The figure shows several FlexVol volumes shared by the template and instances 1 through n; each FlexVol volume groups one LUN/qtree per instance, and deduplication block sharing works within each FlexVol volume.]

Impact on commitment and aggregate usage: When creating the FlexVol volumes for this layout, their individual sizes contribute to the commitment rate. The aggregate use grows with the provisioning and object use within the FlexVol volumes. Provisioning a new instance in this layout through a file/LUN FlexClone operation has no effect on the overcommitment rate; it affects the deduplication savings of the volume itself. Thus, NetApp recommends using the zero fat configuration for the volume and enabling autogrow.

Best Practice
This layout is very attractive for applications using multiple but similar storage objects among service instances (for example, virtual disks in virtual machine hypervisors). These usually use similar operating systems and applications in dedicated virtual disks; thus, grouping these storage objects leads to a very high degree of consolidation due to deduplication. Quickly changing data, such as pages and swap files, should not be considered for inclusion in deduplicated volumes on primary storage: deduplication savings are limited due to the high change rate and do not justify running the deduplication process. NetApp recommends that this type of data not be placed together with data that dedupes well in the same volume. We further recommend not performing client data realignments such as Windows disk defragmentation or database tablespace reorganizations. Because of the way that NetApp storage controllers write data, defragmentation of client data provides no performance benefit.
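Deduplication on such a shared volume is enabled and monitored with the 7-Mode sis commands. A minimal sketch with a hypothetical volume name:

```shell
# Enable deduplication on a dedupe-centric volume (hypothetical name)
sis on /vol/vol_bootimages
sis config -s auto /vol/vol_bootimages   # run when enough new data has been written
sis status /vol/vol_bootimages           # show deduplication progress/state
df -s vol_bootimages                     # report space saved by deduplication
```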


3.3 SETTLED/NOMAD PROVISIONING FOR NETAPP DATA MOTION

Settled/nomad provisioning is a provisioning pattern that helps increase the utilization of NetApp shared storage. When the online migration features of the storage are exploited, response times to mitigate data growth scenarios become independent of application-specific planned downtime windows. Storage is classified into potential migration candidates that can be migrated away from a tight aggregate on one storage controller to another while remaining accessible. Thus, it is an elegant technique to relax the utilization of an aggregate outside the planned downtime windows of the served applications. NetApp MultiStore technology implements this feature using the vFiler abstraction, which NetApp recommends you consider in the provisioning process. Secure multi-tenancy environments implemented using MultiStore technology harmonize well with this provisioning approach.

The settled/nomad provisioning pattern is a natural way to react to data growth in an aggregate. Figure 14 illustrates the concept of settled/nomad provisioning in the aggregates of the storage controllers and the migration of a nomad out of its aggregate. The settled part describes data that does not move during its lifetime. It might use vFiler units to simplify operation and hardware maintenance of the storage controller, but there is no direct need. The nomad parts are considered moving parts and thus must use vFiler units. The ratio between the sizes of the settled and nomad parts depends on the growth rate and lifetime of the data in the settled part. Assuming that the aggregate size is constant over this period, the aggregate is filled with settled and nomad data. Over the data lifetime, more and more nomads are migrated away; at the end of the lifetime, only the settled data is left. It is irrelevant whether the data growth happens in the settled or the nomad part: when a nomad is migrated away, the resource situation on the aggregate is relaxed.

It is preferable to provision several nomads of different sizes. This allows you to:

- React to different growth scenarios of the data
- Quickly migrate smaller nomads when time or the inter-storage-controller network is considered a limited resource
- Operate the aggregate in its operational sweet spot corridor over a long time frame; by slicing the migratable entities in the right way, you can be sure that the aggregate operates in a predefined use interval
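Migrating a nomad vFiler unit uses the 7-Mode vfiler migrate commands. A minimal sketch; the vFiler and controller names are hypothetical, and MultiStore and SnapMirror licenses are required on all participating controllers:

```shell
# Online migration of a nomad vFiler unit between controllers (hypothetical names)
# Run on the destination controller:
vfiler migrate start vfiler_nomad1@controller_a     # begin baseline data transfer
vfiler migrate status vfiler_nomad1@controller_a    # check transfer progress
vfiler migrate complete vfiler_nomad1@controller_a  # cut over to the destination
```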

Figure 14) Settled/nomad provisioning into an aggregate. In case of aggregate tightness, a nomad is migrated to a separate aggregate.


To summarize, the settled/nomad provisioning pattern is an elegant method to adjust the block use of an aggregate. The use of an aggregate can be controlled and kept in a desired corridor.


SLA-BASED ASSESSMENT FOR SETTLED/NOMAD

The goals of an SLA-based assessment are to optimize SLA fulfillment and to avoid or minimize penalty costs. The accessibility of the applications, described by their individual service levels, is used for an assessment into settled and nomad instances. We use the previously introduced SLA metric of service disruption and map it to the stickiness of the settled/nomad instances. The vFiler entities allow online migration of NFS- and iSCSI-attached nomad instances without any changes on the client side. Fibre Channel-attached storage cannot be migrated online at the time of writing. Refer to TR-3881 for an understanding of DataMotion in an Oracle database and Microsoft Exchange environment.

Alignment by technical impact: For data belonging to applications with SLAs that fit perfectly into what is provided, a direct assignment can be made. For example, application instances with the lowest acceptable service disruption should be the last candidates to be migrated (settled); applications with the highest acceptable service disruptions should be considered as nomads. However, there might be data of application instances that will likely be migrated during the application lifetime. You must take into account the business impact of migrating these instances.
Figure 15) Alignment by technical impact (sorted by negative impact in descending order).

(In the figure: instances are sorted by negative impact. Instances with high impact or outside the SLA, e.g., all FC-attached instances, are settled; instances with low impact inside the SLA are nomads.)

Alignment by business impact: An assessment of penalty costs is made for the data of the remaining applications. For vFiler migration, a very short negative impact on service-level performance during the migration must be taken into account. Thus, the application data with the highest negative impact is considered the stickiest.
Figure 16) Alignment by business impact (sorted by negative impact in descending order).

(In the figure: instances are sorted by penalty cost. The instances with the highest penalty costs are settled, those with medium costs are semi-settled, and the rest are nomads.)

PERFORMANCE AND THROUGHPUT IMPACT OF MIGRATION

Migration of a nomad might be triggered by heavy storage consumption in an aggregate. It might also be triggered by performance limitations of the corresponding storage controller. Because a migration in progress consumes additional resources on the network and the participating storage controllers,


this consumption must be taken into account to avoid further intensifying the situation. Refer to TR-3881 for a quantitative evaluation of DataMotion.

NetApp recommends adjusting the use of storage controllers in a high-availability configuration in such a way that the remaining controller can master the load in the case of a failover. Doing so should leave enough resources to perform migrations. Migrating vFiler entities relies mainly on SnapMirror and MultiStore technology and thus requires these licenses on all participating storage controllers. TR-3814: NetApp Data Motion provides a thorough presentation of migration using the NetApp Data Motion solution. It focuses on implementing and triggering the migration of vFiler entities using NetApp Provisioning Manager. Furthermore, vFiler units can be managed manually to allow for handy offline migration with a very short interruption of storage accessibility.

ENABLE SETTLED/NOMAD FOR ALREADY-PROVISIONED STORAGE

While NetApp recommends considering the settled/nomad setting initially, taking sizing and lifetime of storage into account, it is also possible to introduce it later in a planned downtime window. If NFS-attached storage should be migrated, existing volumes can be adopted by a vFiler entity. Because the vFiler entity has its own IP address, the clients attaching the storage need to remount it.

SETTLED/NOMAD-LIKE SETTING WITH SHORT/LONG-TERM DATA PAIRING

In the previous section, the settled/nomad pattern was described as a way to mitigate organic data growth. The same effect can be achieved when storage is identified as belonging to instances that will be deprovisioned at their end of life. Taking the expected lifetime of provisioned storage into account allows you to plan deprovisioning in advance. This relaxes the pressure on aggregates outside planned downtime windows, without the technical requirements of a settled/nomad setting.
ONLINE MIGRATION IN VIRTUALIZED ENVIRONMENTS

Online migration features in a virtualization hypervisor provide a further alternative for implementing a responsive scheme to react to data growth. For example, VMware Storage VMotion can transfer a virtual machine, including its storage, when it is attached using a datastore. Storage of virtual machines served by a NetApp datastore can be migrated to another NetApp-served datastore by migrating each virtual machine. In such cases, a nomad can simply be implemented by a NAS/SAN-attached datastore. In contrast to a data transfer based on SnapMirror directly between NetApp storage controllers, the migration traffic flows through the hypervisor. This might have consequences for the execution of the virtual machines. Also, the NetApp storage efficiency savings cannot be exploited during the transfer; deduplication savings are regained by executing the deduplication process on the destination storage controller.


4 OPERATION
This section focuses on the operation and management of overcommitted storage. The goal of management is to fulfill the SLAs of the stored data while achieving a high level of efficiency. It addresses how to detect situations that need manual assistance, how to raise the awareness of the operational staff, and how to resolve situations that arise. We first consider situations that put SLA fulfillment at risk. Then we focus on actions that can be taken to avoid further aggravation by making the situation evident and presenting mitigation alternatives.

Consider the operational process as a loop that monitors and evaluates the current situation and triggers the transition of a storage resource among phases:
- Provision storage.
- Leave room for organic growth. It might be desirable to still allow extending the storage of previously provisioned applications.
- Reduce storage use with mitigation alternatives such as deletion, data motion, and so on.

These transitions must occur within a specified time frame to optimize operational flexibility and to prevent endangering the SLAs. The point is to detect situations that will violate the SLAs in the future.

SITUATIONS PUTTING SLA FULFILLMENT AT RISK

Over time, more and more data is stored and processed by the provided applications. NetApp storage efficiency technologies compensate for this growth. To prevent running out of physical resources, usage must be managed within safe boundaries. This makes sure the operations team has enough time to react with the appropriate mitigation strategy. The following list summarizes situations that are critical for service delivery.
- Running out of time. Some mitigation alternatives must be triggered in advance, and a passage of time might be needed for their effect to become evident. This time determines the number of mitigation alternatives that can be considered at a given moment.
- Running out of mitigation alternatives. Several mitigation alternatives exist to control usage. However, some alternatives are one-time activities and some must be performed within a certain time frame. Depending on the situation, not all alternatives might be available.
- Running too tight on storage. Over time, applications use more and more of the blocks committed to them. This forces Data ONTAP to allocate from a pool of free blocks. Assuming data growth, the size of the free block pool directly translates into available time to react.
- Running out of storage completely. This must be prevented because it has a high impact on the availability of the service. Furthermore, data integrity can be at risk. Consider the following scenarios:
  - An application wants to write to committed storage but fails (NAS/SAN): For applications, this looks like a storage failure and implies service disruption. Data integrity can be at risk.
  - An application wants to allocate new storage but fails (NAS): The application is confronted with a "No space left on device" exception. Verify the application's behavior on this exception. Most applications can deal with this situation, and data integrity is not at risk.

Two cases need to be differentiated when mitigation is necessary to resolve a situation of tight storage. Storage for an object such as a LUN or a share can be tight because of:
- Insufficient free space within the volume in which the storage object is contained
- Insufficient free space within the aggregate in which the storage object and its volume are contained


The following sections focus on how to detect that a change is necessary and that a storage resource should transition to another phase. We describe the different phases, how monitoring can support their detection, and how this information can be communicated to operational groups.

4.1 PHASES AND TRANSITIONS

This section outlines the phases of a storage resource. Starting with an empty aggregate, storage is provisioned up to certain thresholds. After that, storage is left for organic growth. After further thresholds are exceeded, inspection or activities must be performed to mitigate storage tightness.
- Provision storage. When certain thresholds are within a defined range, storage is provisioned to the aggregates. Monitoring should support the decision to transition to the next phase.
- Leave storage for organic growth. When certain thresholds are exceeded, provisioned storage is left for organic growth. Depending on the environment, storage of existing applications might still be extended, and a second threshold might signal that extensions are no longer possible. Monitoring should support the decision to transition to the next or the prior phase.
- Mitigate storage use. When certain thresholds are exceeded, this phase must make sure that committed storage can be delivered to store application data. The effect of a mitigation activity should be to put the storage resource back in the preferred operational corridor. Monitoring should support the decision to transition back to the organic growth phase.

4.2 MONITORING

NetApp Operations Manager delivers comprehensive monitoring and management for NetApp shared storage. It provides alerts, reports, performance monitoring, and configuration tools to keep the NetApp storage infrastructure in line with business requirements for maximum availability and efficiency. NetApp Operations Manager provides a single human interface and an application programming interface (API) for integration with third-party management and orchestration software. Operations Manager monitors the NetApp shared storage infrastructure and can raise awareness of certain situations. Events can be set to trigger an action when operational parameters are within a certain range and indicate a relevant situation. When an event triggers an alarm, notification can be sent by e-mail, pager, Simple Network Management Protocol (SNMP), or customized scripts. To raise awareness of a certain situation, the event must be characterized using the metrics provided by Operations Manager. To communicate the event, an alarm must be set.

THRESHOLDS

Operations Manager monitors relevant parameters that indicate the presence of specified situations. Thresholds can be set to trigger actions, for example, to notify the operational team that an alarm situation exists. The thresholds can be set to notify in advance. Operations Manager also performs trending on operational parameters to express the urgency of a certain situation. This supports deciding how to react. Within your Operations Manager instance, the thresholds can be verified and set by navigating to the Default Thresholds page via Setup > Options > Default Thresholds or the link http://opsmgrserver:port/dfm/edit/options. Figure 17 shows a sample configuration page.


Figure 17) Operations Manager screen to configure thresholds on operational metrics.

For aggregates, Operations Manager provides the set of thresholds described in the following list. They represent absolute limits. Operations Manager alarms can be used to notify operational staff and raise awareness of a specific situation. Monitoring the aggregates is very important: they are the physical containers of preallocated and growable storage objects that host application data. If an aggregate of a storage controller runs at uncontrolled usage, there can be direct consequences for the applications it serves. The concrete settings for these thresholds depend on the time necessary to relax aggregate block usage. When no mitigation alternatives can or should be taken during the lifetime of the data, mitigation actions must be performed in scheduled downtime windows; threshold settings and actions then tend to be more conservative to avoid SLA-endangering situations.
- Aggregate full threshold. This threshold on the metric aggregate block use allows triggering an alarm that notifies a person in charge.
- Aggregate nearly full threshold. The counterpart of the aggregate full threshold, providing an earlier notification.
- Aggregate overcommitted threshold. This threshold on the metric of committed storage allows triggering an alarm that notifies a person in charge. The metric refers to the amount of storage assigned to applications. It represents the level of consolidation and also the width and increase of the block use corridor.
- Aggregate nearly overcommitted threshold. The counterpart of the aggregate overcommitted threshold, providing an earlier notification.

Operations Manager also provides thresholds that can be used to alert operational staff when volumes are in a certain state.
- Volume full threshold. This event notifies a person in charge that the preset threshold on the volume use metric has been reached.
- Volume almost full threshold. The counterpart of the volume full threshold, providing an earlier notification.
- Volume autosized. This event notifies a person in charge that a volume was extended using the autogrow functionality.
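The relationship between these thresholds and the underlying metrics can be sketched as follows. The default percentages are illustrative assumptions, not product defaults; configure the real values on the Default Thresholds page:

```python
# Sketch: which aggregate events would fire for given metric values.
# Percentages are illustrative; event names follow the Operations Manager
# naming used elsewhere in this report (e.g., aggregate-almost-full).

def aggregate_events(used_pct, committed_pct,
                     nearly_full=80, full=90,
                     nearly_overcommitted=95, overcommitted=100):
    """Return the list of events that would fire for one aggregate."""
    events = []
    if used_pct > full:
        events.append("aggregate-full")
    elif used_pct > nearly_full:
        events.append("aggregate-almost-full")
    if committed_pct > overcommitted:
        events.append("aggregate-overcommitted")
    elif committed_pct > nearly_overcommitted:
        events.append("aggregate-almost-overcommitted")
    return events

print(aggregate_events(used_pct=85, committed_pct=120))
# → ['aggregate-almost-full', 'aggregate-overcommitted']
```

The "nearly" thresholds exist purely to buy reaction time: they fire while mitigation alternatives are still cheap.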


TRENDING

Operations Manager 4.0 supports a variety of trending features for certain storage objects. This is an important feature for all storage objects with a fixed size: it allows you to estimate the time frame within which a certain situation needs to be mitigated. The trend is calculated as a linear regression over up to 90 days in the past. For aggregates, Operations Manager calculates a trend on the daily growth rate. In your Operations Manager instance, use the link http://opsmgrserver:port/dfm/report/view/aggregates-growth-rates for trending of aggregate growth rates and the estimated remaining time until the storage object is full. Each aggregate can be drilled down, and you can select trending based on an interval of one day, one week, one month, three months, or one year. To see the effect of recent data activities, set the interval of a trend calculation to enclose this activity. Investigate whether growth rates calculated over different intervals deviate significantly.
Figure 18) Trending of data growth and days-to-full prediction in Operations Manager.

Note: The calculation basis of time to full is the usable aggregate capacity. This value is not calculated based on the aggregate full threshold setting.

The trending on the volume level is analogous to the trending on the aggregate level. In your Operations Manager instance, access the link http://opsmgrserver:port/dfm/report/view/volumes-growth-rates for trending of volume growth rates. NetApp recommends ordering the view by growth rate descending or time to full ascending in order to focus on the relevant candidates. On the volume level, you can set an alarm to fire when volume growth is outside the usual boundary.
- Abnormal volume growth. This event notifies when the growth rate of a volume exceeds a preset limit. It is helpful for signaling unusual behavior concerning storage consumption and pointing the operational staff to the right storage object.
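The idea behind the days-to-full estimate can be illustrated with a minimal linear-regression sketch. Operations Manager performs this calculation internally; the code below only mirrors the math on illustrative daily samples:

```python
# Sketch: fit a least-squares line to daily used-capacity samples and
# estimate days until the aggregate is full (against 100% usable capacity,
# matching the note above). Assumes at least two samples, oldest first.

def days_to_full(samples_gb, usable_capacity_gb):
    n = len(samples_gb)
    xs = range(n)
    mean_x = sum(xs) / n
    mean_y = sum(samples_gb) / n
    # Least-squares slope = daily growth rate in GB/day.
    slope = (sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, samples_gb))
             / sum((x - mean_x) ** 2 for x in xs))
    if slope <= 0:
        return None  # flat or shrinking: no meaningful estimate
    return (usable_capacity_gb - samples_gb[-1]) / slope

# 10 days of samples growing 5 GB/day toward a 2000 GB aggregate.
samples = [1000 + 5 * d for d in range(10)]
print(round(days_to_full(samples, 2000)))  # → 191
```

Comparing the estimate over different sample windows reproduces the "deviating growth rates" check recommended above.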


INDIVIDUAL THRESHOLDS PER AGGREGATE OR VOLUME

For each aggregate or volume, the general default settings can be overridden with more specific ones. To do so, select the aggregate or volume of choice, for example, by using the links already provided in this technical report. A selected aggregate can be configured using the Edit Settings link and dialog; a selected volume can be adapted using the Edit Quota Settings link and dialog.

MONITORING STORAGE EFFICIENCY RETURNS

NetApp Operations Manager provides a dashboard to visualize storage efficiency returns in the NetApp shared storage infrastructure. The report lists important parameters drilled down by utilization, capacity, unused reserve capacity, storage efficiency, and efficiency return breakdown. It allows you to judge the effectiveness of the NetApp storage efficiency technologies. Figure 19 provides a sample screenshot of the storage efficiency dashboard in NetApp Operations Manager. Consult the NetApp Operations Manager Efficiency Dashboard Installation and User Guide for further information on this dashboard.
Figure 19) Storage efficiency dashboard in Operations Manager.


4.3 NOTIFICATION

Operational staff must be notified when situations occur that require a transition of phases, especially situations with negative consequences for SLA fulfillment. Operations Manager provides alarms for notification. Alarms are bound to the metrics and thresholds explained in section 4.2 and notify operational staff, storage administrators, or storage capacity planners. Alarms are the instrument used to keep the management effort of the NetApp storage infrastructure low. After being notified, the responsible person can evaluate the situation and decide which actions to take. Further, the trends on operational parameters provided by Operations Manager simplify the decision-making process. Depending on the organizational structure, the responsibilities to operate, plan, and administer the storage infrastructure can be separated into different groups, persons, or roles. Thus, we characterize the mitigation activities by required skill set and time to act, which allows an easy alignment to a given organizational structure. Operations Manager supports different notification methods, which can be used in combination; for example, a notification can be sent by both e-mail and SNMP.

NOTIFY BY E-MAIL

An alarm can be sent to multiple destinations by e-mail. Repeated notifications can be sent while the situation is not resolved. To set an alarm, access the alarm configuration page by following Setup > Alarms from the default Operations Manager dashboard. Clicking Advanced Version accesses an advanced version of this page; the direct link is http://opsmgrserver:port/dfm/edit/alarms-advanced. Figure 20 shows how to configure an alarm. Adjust the threshold as described in section 4.2. NetApp recommends using distribution lists or aliases with meaningful names rather than addresses of individual persons.
If you follow this recommendation, changing responsibilities and roles does not require corresponding changes in Operations Manager.

NOTIFY BY SNMP

Operations Manager supports notification of alarms using SNMP, a widely used standard supported by most orchestration frameworks and ticketing systems. Using SNMP, Operations Manager can be integrated into existing ticketing systems. Figure 20 shows setting up an alarm that fires based on the aggregate almost full threshold. The SNMP trap host is configured using a hostname or IP address and the port on which the SNMP agent is listening. The alarm can be saved and tested.


Figure 20) Configuring an alarm based on the threshold, aggregate almost full.

Note: The SNMP event must be routed to the responsible groups or persons in the ticketing system. Thus, mapping the detected situation to the responsible operational group must be implemented there.

NOTIFY BY SCRIPT

Operations Manager supports notifications in highly customized integration scenarios. A user-defined adapter can be executed that delivers the information to the infrastructure or system of choice. A script can implement such an adapter and act as the glue between Operations Manager and the customer infrastructure. To set an alarm on the event aggregate almost full that starts a script, instrument Operations Manager on the command line:

dfm alarm create -s script_to_execute -h aggregate-almost-full
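Such an adapter script might look like the following sketch, which turns an event into a generic ticket payload. The DFM_* environment variable names are hypothetical assumptions; consult the Operations Manager documentation for the fields your version actually passes to alarm scripts:

```python
#!/usr/bin/env python
# Sketch of a user-defined notification adapter executed by an alarm.
# The DFM_* environment variable names below are hypothetical.
import json
import os
import sys

def build_ticket(event_name, source, severity):
    """Map an Operations Manager event to a generic ticket payload."""
    return {
        "summary": "%s on %s" % (event_name, source),
        "severity": severity,
        "queue": "storage-operations",
    }

if __name__ == "__main__":
    ticket = build_ticket(
        os.environ.get("DFM_EVENT_NAME", "unknown-event"),    # hypothetical
        os.environ.get("DFM_SOURCE_NAME", "unknown-object"),  # hypothetical
        os.environ.get("DFM_EVENT_SEVERITY", "warning"),      # hypothetical
    )
    # Here the payload would be posted to the ticketing system's API.
    json.dump(ticket, sys.stdout)
```

Keeping the mapping from event to queue in one place mirrors the routing responsibility noted for the SNMP path above.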


4.4 MITIGATE STORAGE USE

Uncontrolled storage use can limit operational flexibility and might put SLA fulfillment at risk. This section focuses on mitigation activities that preserve flexibility by keeping use within its defined corridor; the effect of a mitigation activity should be to return usage to that corridor. Storage tightness might occur in aggregates or volumes, depending on their configuration. When all volumes in an aggregate are thin provisioned with the zero fat configuration, they use the shared pool of free blocks of the aggregate to deal with data growth; resolving such a situation requires a mitigation activity on the aggregate level. When storage objects in a fixed-size volume cannot grow to their committed space, a mitigation activity on the volume level is necessary.

MITIGATION ACTIVITIES FOR AGGREGATES

Aggregates are the coarsest storage objects within a NetApp storage controller. Finer-grained storage objects such as FlexVol volumes and their content are usually thin provisioned using the zero fat configuration. They might grow on demand; however, because they live within an aggregate of physically limited size, their growth is also limited. As described in the following list, providing usable space in the aggregate automatically allows contained storage objects to grow.
1. Increase the aggregate. You can add drives to aggregates during operation, and you can repeat this mitigation activity. The maximum aggregate size depends on the Data ONTAP version, the type of aggregate, and the type of storage controller. 64-bit aggregates, supported with Data ONTAP 8, have very high limits. Additional drives can be used immediately; however, their procurement time needs to be taken into account. Rebalancing data between existing and new drives results in uniformly distributed use of the drives.
2. Decrease the aggregate Snapshot copy reserve. This reserve is needed in MetroCluster and SyncMirror configurations.
In other configurations, you can decrease this reserve or set it to zero.
3. Shrink preallocated volumes. Volumes with preallocated space reserve available aggregate free space. When possible, these volumes can be shrunk, returning the freed space to the aggregate so that other storage objects can make use of it.
4. Enable deduplication and shrink the volume.
5. If available, migrate a nomad online to a different storage controller. Doing this on the NetApp storage controller level requires storage provisioning based on vFiler units and MultiStore and SnapMirror licenses. Adequate free space on the aggregates of the target storage controller is required. This mitigation activity is not limited in its applicability.
6. Migrate a volume from one aggregate to another aggregate within the same or another storage controller. SnapMirror replicates the data while it is still served. To switch over to the replicated data, the client needs to detach from the source and reattach to the replica. After completion, the replica is considered the new source. This operation implies client downtime. Typically, inter-data center bandwidth allows synchronizing the source and the replica within a few minutes.
7. If none of the listed activities can be used, the application must be stopped to achieve a consistent state before migrating.

The mitigation activities for aggregate tightness are summarized in Table 8. Note that Provisioning Manager performs mitigation alternatives 3 to 6 for secondary storage online.
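A back-of-the-envelope estimate of the space returned by activities 2 and 3 can be sketched as follows; the sizes are illustrative:

```python
# Sketch: estimate usable space returned to an aggregate by zeroing its
# Snapshot copy reserve (where the configuration allows it) and shrinking
# preallocated volumes down to their used size. Figures are illustrative.

def reclaimable_gb(aggregate_size_gb, snap_reserve_pct, preallocated_free_gb):
    """Rough upper bound on space freed by mitigation activities 2 and 3."""
    from_reserve = aggregate_size_gb * snap_reserve_pct / 100.0
    return from_reserve + preallocated_free_gb

# 10 TB aggregate, 5% Snapshot reserve, 300 GB unused preallocated space.
print(reclaimable_gb(10000, 5, 300))  # → 800.0
```

Comparing such an estimate with the daily growth rate indicates how many days each one-time activity buys before a repeatable alternative is needed.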


Table 8) Mitigation alternatives to control use within aggregates.

1. Increase aggregate capacity by adding disks
   - Repeatability: Data ONTAP 7.x: low limits; Data ONTAP 8: high limits
   - SLA impact: None
   - Preparation time: HW procurement
   - Time to show effect: Immediate (plus rebalancing)
2. Decrease the aggregate's Snapshot copy reserve area if possible
   - Repeatability: One time
   - SLA impact: None
   - Preparation time: None
   - Time to show effect: Immediate
3. Shrink other volumes in the aggregate if they have enough free space
   - Repeatability: One time
   - SLA impact: Low
   - Preparation time: None
   - Time to show effect: Immediate
4. Run deduplication and shrink volumes
   - Repeatability: Repeatable
   - SLA impact: Low
   - Preparation time: Time to execute deduplication
   - Time to show effect: Immediate
5. Migrate nomads (online)
   - Repeatability: Repeatable
   - SLA impact: Low
   - Preparation time: None
   - Time to show effect: Minutes (vFiler migration time)
6. Migrate volumes to a different aggregate (offline)
   - Repeatability: Repeatable
   - SLA impact: Medium to high
   - Preparation time: Next planned downtime window
   - Time to show effect: Minutes (volume switchover time)
7. Prevent application data loss and stop the application, then migrate (offline)
   - Repeatability: Repeatable
   - SLA impact: Low to high
   - Preparation time: Coordinate with application owner
   - Time to show effect: Minutes (migration time)
MITIGATION ACTIVITIES FOR VOLUME TIGHTNESS

Mitigation activities for volume tightness are relevant for volumes whose storage objects cannot grow to their committed size. When it is not possible to enable growth for storage objects contained in volumes, you need to perform an aggregate-level mitigation activity instead.

Note: Some of these mitigation alternatives depend on and affect used capacity in the aggregate.

Table 9) Mitigation activities for resource tightness within volumes.

1. Reduce the volume's Snapshot copy reserve (if configured and not used)
   - Repeatability: One time
   - SLA impact: Low
   - Preparation time: None
   - Time to show effect: Immediate
2. Increase the volume if there is free space in the aggregate (see Table 8)
   - Repeatability: One time
   - SLA impact: Low
   - Preparation time: None
   - Time to show effect: Immediate
3. Delete Snapshot copies that are not needed or were skipped by the autodelete function
   - Repeatability: Limited
   - SLA impact: Low
   - Preparation time: None
   - Time to show effect: Immediate
4. Activate FAS deduplication for the volume (requires proper space guarantees)
   - Repeatability: One time
   - SLA impact: Low (possible performance impact)
   - Preparation time: Wait for deduplication schedule
   - Time to show effect: Hours
5. If the volume contains more than a single LUN, migrate those objects to another volume or aggregate
   - Repeatability: Repeatable
   - SLA impact: High
   - Preparation time: Next planned downtime window
   - Time to show effect: Minutes (volume migration time)
6. Stop the application and migrate its data
   - Repeatability: Repeatable
   - SLA impact: High
   - Preparation time: Coordinate with application owner
   - Time to show effect: Minutes (migration time)


5 REAL-LIFE SETTINGS
This section summarizes two different operational settings. The first does not make use of online data migration or the settled/nomad provisioning pattern; the second implements the settled/nomad provisioning pattern to maintain the flexibility for online data migrations. The concrete threshold settings and approaches might be very customer and application specific. To exploit NetApp storage efficiency features in your own data center, NetApp recommends starting conservatively. After you are familiar with the process, work toward the customer-specific optimum.

5.1 SAMPLE SETTING 1: REAL-LIFE SETTING

This section describes a real-life setting that a customer started with. It makes use of a limited set of mitigation alternatives. This is especially beneficial when the installed storage capacity should stay constant over a long time frame or when physical systems are already fully equipped. A settled/nomad setting is not considered. Thus, the thresholds that signal a transition of phases are set lower and more conservatively for this customer. Because online data migration and aggregate extension are not available as mitigation alternatives, sufficient available space is required to safely reach the next planned downtime window, as shown in Figure 21. In practice, refer to the aggregate days-to-full trend value to get an idea of the available days to full based on past data growth.
- All storage is provisioned using the zero fat option with growable FlexVol volumes.
- Only aggregate monitoring is used.
- Aggregate extension is not a mitigation alternative.
- Online migration is not a mitigation alternative.

Figure 21) Storage to enable organic data growth between planned downtime windows.

Note: Several months might fall between planned downtime windows in which major mitigation alternatives can be performed.

The primary concern is preventing the critical situation in which aggregates reach a utilization level too high to enable organic growth during the period between agreed planned downtime windows. To prevent this situation, sufficient space must be reserved to enable data growth. Second, the level of data consolidation is monitored to manage accumulated growth rates safely. Provisioning new data is stopped when one or both thresholds on the first and second metrics are reached. The operational teams are notified using an alarm on the Operations Manager event aggregate


nearly full threshold (configured to fire when the metric exceeds 50%) and the event aggregate nearly overcommitted threshold (configured to fire when the metric exceeds 110%). These alarms stop the responsible entities from provisioning new storage; the aggregate is left for organic growth. An assessment of the storage situation might be performed, and depending on experience and knowledge of the application growth rates seen in the past, the thresholds may be adapted. After the upper threshold of the operational sweet spot corridor is exceeded, an alarm based on the aggregate full threshold (set initially to 65%) is sent to the storage administrators, who decide about migrating data in the next planned downtime window. In the meantime, organic growth can take place in the yellow-marked area shown in Figure 22. The metrics used are:
- First metric: aggregate capacity used
- Second metric: aggregate space committed

Because all storage is provisioned using the zero fat option, no artificially limited storage container exists. Thus, there is no need to consider a volume-based metric. Figure 22 shows the phase transitions depending on the metrics aggregate capacity used and aggregate committed space.
Figure 22) Transition of changes depending on the metrics, aggregate capacity used and aggregate committed space.
(In the figure: new storage is provisioned while aggregate capacity used is at most 50% and committed space is at most 110%; beyond those values the aggregate is left for organic growth, the situation is assessed, and thresholds are adapted; capacity used above 65% or committed space above 120% triggers mitigation.)
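The decision logic of this setting can be sketched as a small function. The percentages are the customer-specific values quoted above, not general defaults:

```python
# Sketch of the two-metric phase decision in sample setting 1. Thresholds:
# 50%/65% on capacity used, 110%/120% on committed space (customer-specific).

def phase(used_pct, committed_pct):
    if used_pct > 65 or committed_pct > 120:
        return "mitigate in next planned downtime window"
    if used_pct > 50 or committed_pct > 110:
        return "organic growth only; assess, adapt thresholds"
    return "provisioning new storage allowed"

print(phase(40, 90))   # → provisioning new storage allowed
print(phase(55, 100))  # → organic growth only; assess, adapt thresholds
print(phase(70, 100))  # → mitigate in next planned downtime window
```

Either metric alone can force the transition, which is why both are monitored even though only one may be close to its limit.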


5.2 SAMPLE SETTING 2: SETTLED/NOMAD

This section describes a setting that takes the settled/nomad provisioning pattern into account and allows migrating nomad data flexibly and in a timely manner thanks to vFiler technology. This setting requires storage space at alternative locations where nomads might be migrated. It is seen more often in larger environments with an emphasis on NFS-attached storage. It allows operating the NetApp storage infrastructure at very high use and in narrower operational sweet spot corridors. Figure 23 visualizes the effect of a mitigation alternative that can be performed online.
Figure 23) Narrower corridors due to the ability to perform mitigation alternatives in hours instead of months.

In this sample setting, as in sample setting 1, the critical situation to prevent is aggregates becoming too full. However, the flexibility gained with online data migration removes the need to take a further metric, such as storage overcommitment, into account.
- All storage is provisioned using the zero fat option with growable FlexVol volumes.
- Only aggregate monitoring is used.
- Storage is provisioned using the settled/nomad pattern with the ability to perform online migration.
- The days-to-full aggregate trend was more than 200 days on average. Note that this value depends on the individual situation and is calculated against 100% capacity.

The sole metric in this setting is aggregate capacity used. Table 10 contains the thresholds describing the transition of phases.
Table 10) Phase transitions with settled/nomad provisioning pattern and online migration mitigation alternative.

Detection Threshold   Notify               Mitigation
> 70%                 Storage operations   Stop provisioning of storage
> 85%                 Storage operations   Stop extending provisioned storage
> 90%                 Storage operations   Relax resource situation and migrate a nomad
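The escalation in Table 10 can be expressed as a small threshold lookup. A hedged sketch: the function name is hypothetical, while the thresholds and action descriptions are those from the table.

```python
# Sketch of the Table 10 escalation for the settled/nomad setting: a
# single metric, aggregate capacity used, selects the mitigation step.
# Thresholds and actions are taken from the table; names are assumptions.

THRESHOLDS = [
    (90, "relax resource situation and migrate a nomad"),
    (85, "stop extending provisioned storage"),
    (70, "stop provisioning of storage"),
]

def mitigation_for(capacity_used_pct):
    """Return the action to notify storage operations about, or None."""
    for threshold, action in THRESHOLDS:
        if capacity_used_pct > threshold:
            return action
    return None  # inside the operational sweet spot corridor

print(mitigation_for(65))   # None: no action needed
print(mitigation_for(88))   # stop extending provisioned storage
```

Checking the highest threshold first makes sure the most severe applicable mitigation is reported.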


Figure 24) Visualization of phase transitions depending on metric, aggregate capacity used.
[Figure: aggregate capacity used plotted against data growth of settled data, with an operational sweet spot corridor. At 0–70%, new storage may be provisioned and already provisioned storage extended; at 70–85%, provisioning stops; above 90%, utilization is relaxed by migrating a nomad with NetApp Data Motion.]
You can achieve very high data consolidation in this setting by using NetApp storage controllers; the amount of logical data served can exceed the physically usable capacity several times over.


6 STORAGE EFFICIENCY COOKBOOK


To increase consolidation, we propose the following steps to exploit the advantages of NetApp storage efficiency technologies.
Figure 25) Typical picture of aggregate capacity metrics while converting to zero fat configurations and deduplication.

[Figure: capacity used and committed capacity plotted over elapsed time (1 month, 3 months), with the overall trend and the trend of the last 3 months indicated.]

As a general rule, we don't introduce artificially limited container types. They increase monitoring effort and might prevent pooling unused space. For an existing landscape, proceed as follows:

1. Install and configure Operations Manager, the earlier the better. From day one, Operations Manager collects data; the more information it collects, the better the predictions and trending become. The diagrams provided by Operations Manager give a good idea of growth rates and their steadiness. Make sure all NetApp storage controllers are monitored, and wait for one month. Define which mitigation alternatives your operational team is comfortable with: check the boxes accompanying the provided list and identify the time your team needs to perform the actions. If you can perform online migrations for nomads, define the time to negotiate and approve the migration. For all other data, define the time to the next planned downtime window.

2. Change all volumes to zero fat configuration with the autogrow feature set to on. Because there is no artificial space limitation for the autogrow volume, monitoring is restricted to aggregate monitoring. When using deduplication, set the volume to autogrow. During this period, the capacity used diminishes, as shown in Figure 25; usually, each change in the volume configuration can be detected. So far, only metadata has changed, and unused space in the volumes is now available in a common shared pool. The aggregated free space is available to the same applications storing the data. We recommend monitoring for three months to understand the growth rate of your environment.

3. Derive the growth trend of the aggregates. Note that the overall trend might still be negative. Use Operations Manager to help determine the trend. Make sure that it excludes the time frame when the volume configuration was changed to zero fat and that it includes relevant operations of your applications, such as month- and year-end closing of business applications or regular software maintenance updates (for example, in virtualized environments).
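The trend derivation in step 3 can be illustrated with a minimal days-to-full estimate. This is not the Operations Manager implementation, just a hedged sketch assuming a list of daily capacity-used samples; the function name is made up.

```python
# Illustrative sketch (not the Operations Manager implementation) of a
# days-to-full estimate: fit a least-squares linear trend through daily
# capacity-used samples and extrapolate to 100%, as the trending
# described above does.

def days_to_full(daily_used_pct):
    """Estimate days until 100% from a list of daily used-% samples."""
    n = len(daily_used_pct)
    xs = range(n)
    mean_x = sum(xs) / n
    mean_y = sum(daily_used_pct) / n
    # least-squares slope: average daily growth in percentage points
    slope = sum((x - mean_x) * (y - mean_y)
                for x, y in zip(xs, daily_used_pct))
    slope /= sum((x - mean_x) ** 2 for x in xs)
    if slope <= 0:
        return None  # shrinking or flat: no meaningful forecast
    return (100 - daily_used_pct[-1]) / slope

# 0.1 percentage points of growth per day, currently at 61% used
samples = [60 + 0.1 * day for day in range(11)]
print(round(days_to_full(samples)))  # 390 days until full
```

A real trend derivation would also exclude the conversion time frame and smooth out periodic workload peaks, as noted above.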


Work backward to determine the thresholds of the phases:

a. Define an aggregate use level at which your operational team is comfortable. At first, do not exceed 80%. Add an attention area (yellow) depending on the mitigation alternatives and the time they need to show effect.
b. Determine the maximum distance between the planned downtimes, or the time needed to perform the intended mitigation alternatives.
c. Determine the growth rate. Operations Manager provides help in determining the trend of data growth.
d. Determine the minimum space required to comfortably allow organic growth in the period between the agreed planned downtimes of the services provided. Operations Manager helps you understand the growth rate of the past.

To provision storage, follow these steps:

1. Create big aggregates to enable shared storage in your data center. We recommend sizing them so that an aggregate can be extended once for eventual aggregate mitigation. Very few situations exist where a silo-centric approach with dedicated aggregates for applications makes sense: free space and performance in an aggregate can be shared, and a few big aggregates reduce the monitoring effort. Also, build aggregates in a limited number of standardized configurations and sizes.

2. Create volumes in zero fat configuration with the autogrow feature set to on. Because there is no artificial space limitation for the autogrow volume, monitoring is restricted to aggregate monitoring. When using deduplication, set the volume to autogrow. Whenever possible, use Provisioning Manager for convenience and for repeating configurations.

a. Classify your data and provision for flexibility. Give NFS a preference and make use of vFiler entities.
b. Turn on deduplication. Even in situations where deduplication rates are expected to be low, there is sometimes a big surprise. If you prefer to try deduplication on the storage controller first, create a clone of the intended volume and deduplicate it to estimate the effect. Use Performance Advisor to identify a repeating time frame of low activity to schedule the deduplication job, or use deduplication scheduled by change rate. Mind the maximum volume sizes, which depend on the storage controller.
c. Initially size volumes to the expected size of the data you are going to store. That way, the aggregate overcommitment metric in Operations Manager represents the data consolidation more precisely.

d. Trim existing volumes provisioned in full/low fat down to zero fat configuration. Use the following commands on the console of the storage controller to configure zero fat without Snapshot autodelete for NAS environments:

vol options <volume> guarantee none
vol options <volume> try_first volume_grow
vol autosize <volume> -m <maximum size> -i <increment size> on
snap autodelete <volume> off

Use the following command sequence to configure zero fat with Snapshot autodelete for NAS environments:

vol options <volume> guarantee none
vol options <volume> try_first volume_grow
vol autosize <volume> -m <maximum size> -i <increment size> on
snap autodelete <volume> trigger volume
snap autodelete <volume> delete_order oldest_first
snap autodelete <volume> on


Use the following commands to configure zero fat without Snapshot autodelete for SAN environments:

vol options <volume> guarantee none
vol options <volume> try_first volume_grow
vol autosize <volume> -m <maximum size> -i <increment size> on
snap reserve -V <volume> 0
snap autodelete <volume> off
lun set reservation <lun> disable

Use the following command sequence to configure zero fat for SAN environments with autodelete set to on:

vol options <volume> guarantee none
vol options <volume> try_first volume_grow
vol autosize <volume> -m <maximum size> -i <increment size> on
snap reserve -V <volume> 0
snap autodelete <volume> trigger volume
snap autodelete <volume> delete_order oldest_first
snap autodelete <volume> on
lun set reservation <lun> disable

e. Identify storage holding inactive data. Such storage is most often perfectly suited to act as a nomad candidate that could be migrated.
f. Identify storage that is close to deprovisioning. Deprovisioning storage relaxes utilization and can act as a mitigation alternative.

g. Turn already provisioned volumes into zero fat configuration.

3. Let Operations Manager monitor the landscape. Use the aggregate daily growth rates and days-to-full trending reported by Operations Manager to adapt the thresholds. Remember that days-to-full trending reports against 100% capacity used of the aggregate.
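The working-backward derivation of the attention threshold described above reduces to simple arithmetic: the comfortable ceiling minus the growth expected while a mitigation is carried out. A hypothetical sketch; the function name and the example figures (80% ceiling, 0.1 percentage points of growth per day, 30 days of mitigation lead time) are illustrative assumptions.

```python
# Hedged sketch of the "work backward" threshold derivation: the yellow
# (attention) phase must start early enough that the mitigation takes
# effect before the comfort ceiling is reached. Names and example
# numbers are assumptions, not values from the report.

def attention_threshold(ceiling_pct, growth_pct_per_day, mitigation_days):
    """Aggregate-used % at which the attention (yellow) phase starts."""
    return ceiling_pct - growth_pct_per_day * mitigation_days

# 80% ceiling, 0.1 pp/day growth, 30 days to the next downtime window
print(attention_threshold(80, 0.1, 30))  # 77.0
```

If the computed threshold comes out uncomfortably low, either a faster mitigation alternative (such as online migration of a nomad) or more headroom in the aggregate is needed.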


7 REFERENCES
TR-3505: NetApp Deduplication for FAS and V-Series Deployment and Implementation Guide
www.netapp.com/us/library/technical-reports/tr-3505.html

TR-3563: NetApp Thin Provisioning: Improving Storage Utilization and Reducing TCO
www.netapp.com/us/library/technical-reports/tr-3563.html

TR-3710: Operations Manager, Provisioning Manager, and Protection Manager Best Practices Guide
www.netapp.com/us/library/technical-reports/tr-3710.html

TR-3786: A Thorough Introduction to 64-Bit Aggregates
www.netapp.com/us/library/technical-reports/tr-3786.html

TR-3814: NetApp Data Motion
www.netapp.com/us/library/technical-reports/tr-3814.html

TR-3827: If You're Doing This, Then Your Storage Could Be Underutilized
www.netapp.com/us/library/technical-reports/tr-3827.html

TR-3881: DataMotion for Volumes for Enterprise Applications
www.netapp.com/us/library/technical-reports/tr-3881.html

NetApp Operations Manager Efficiency Dashboard Installation and User Guide
now.netapp.com/NOW/download/tools/omsed_plugin/InstallUserGuide.pdf


8 ACKNOWLEDGMENTS
This report was developed in concert with the Field Centers for Innovation and covers field best practices and product group expertise. It would not have been possible without the input of many experts. Significant contributions were made by Matthew Agoni, Carlos Alvarez, Jeff Berks, Manfred Buchmann, Hans Deuerlein, Erik Dybwad, Niels Reker, Oliver Dziuba, Larry Freeman, Gary Garcia, Pretoom Goswami, Naveen Harsani, George John, Nigel Maddock, Andreas Martinovsky, Holger Niermann, Cesar Orosco, Christian Ott, Shiva Raja, Michael Reusch, Maurice Skubski, John Tyrrell, Oliver Walsdorf, and Allen Wang.

NetApp provides no representations or warranties regarding the accuracy, reliability, or serviceability of any information or recommendations provided in this publication or with respect to any results that may be obtained by the use of the information or observance of any recommendations provided herein. The information in this document is distributed AS IS, and the use of this information or the implementation of any recommendations or techniques herein is a customer's responsibility and depends on the customer's ability to evaluate and integrate them into the customer's operational environment. This document and the information contained herein may be used solely in connection with the NetApp products discussed in this document.


Copyright 2010 NetApp, Inc. All rights reserved. No portions of this document may be reproduced without prior written consent of NetApp, Inc. Specifications are subject to change without notice. NetApp, the NetApp logo, Go further, faster, Data ONTAP, FlexClone, FlexVol, MultiStore, RAID-DP, SnapDrive, SnapMirror, Snapshot, SyncMirror, and vFiler are trademarks or registered trademarks of NetApp, Inc. in the United States and/or other countries. Windows is a registered trademark of Microsoft Corporation. Oracle is a registered trademark of Oracle Corporation. VMware is a registered trademark and VMotion is a trademark of VMware, Inc. All other brands or products are trademarks or registered trademarks of their respective holders and should be treated as such. RA-0007-1010