
Understanding Cold Storage:
Best Practices around Google Nearline & Amazon Glacier

Author: Bill Kleyman, Cloud, Virtualization and Data Center Architect

The CloudBerry Lab Whitepaper


Summary:
Today, the modern organization is tasked with managing more backup data, in more locations, which must ultimately be delivered to more users and services. Through it all, cloud computing has become a source for optimized data storage and information delivery. In fact, public clouds, hybrid clouds, and private clouds now dot the landscape of IT-based solutions. Because of that, the basic issues have moved from "what is cloud" to "how will cloud projects evolve." This evolution takes the form of next-generation storage methodologies. New solutions around cold storage are evolving how organizations store and deliver vast amounts of information.

So, here's the big question: how do you back up and control all of this data in the cloud? Most of all, how can new solutions around cold storage help you create better backup economics? With so many organizations looking at a Google Nearline or Amazon Glacier cold storage architecture, it's clear that businesses are looking at new ways to make their data centers a lot more agile. They're looking for ways to better control the flow of information between their data centers and their cloud providers. In this whitepaper, we look at the modern cloud, how data and backup controls have evolved, using cold storage as a data control mechanism, and best practices around Google Nearline and Amazon Glacier integration.

Introduction
Let's start here: findings from a recent Gartner report state that the use of cloud computing is growing, and by 2016 this growth will increase to become the bulk of new IT spend. 2016 will be a defining year for cloud as private cloud begins to give way to hybrid cloud, and nearly half of large enterprises will have hybrid cloud deployments by the end of 2017. Growth in cloud services is being driven by new IT computing scenarios being deployed using cloud models, as well as the migration of traditional IT services to cloud service alternatives.

This means that organizations see the clear benefit of extending their data centers and data backup practices into cloud environments like Amazon Glacier and Google Nearline by using intelligent APIs, like OpenStack. Furthermore, as an example, new technologies give you the direct ability to integrate with an Amazon Glacier backup architecture to enhance cold storage by using S3 backup APIs.
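As a minimal sketch of that kind of integration, assuming the AWS SDK for Python (boto3) and a hypothetical vault and file name, archiving a backup into Glacier can look like this:

```python
import boto3

# Hypothetical vault and file names; assumes AWS credentials are already configured.
glacier = boto3.client("glacier", region_name="us-east-1")

with open("weekly-backup.zip", "rb") as archive:
    response = glacier.upload_archive(
        vaultName="example-backup-vault",
        archiveDescription="Weekly cold-storage backup",
        body=archive,
    )

# Glacier returns an archive ID; keep it, since it is required to request retrieval later.
print(response["archiveId"])
```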
A big part of the conversation begins around understanding just how much data we're creating. Cisco reports that global data center traffic is firmly in the zettabyte era and will triple from 2013 to reach 8.6 zettabytes annually by 2018. A rapidly growing segment of data center traffic is cloud traffic, which will nearly quadruple over the forecast period and represent more than three-fourths of all data center traffic by 2018.


Today, there are a lot more conversations happening around cloud and, of course, data backup. Organizations across all verticals and user counts absolutely see how cloud storage can act as a direct benefit to their business. To dive a bit further into the discussion, let's begin by looking closer at cold storage, the impacts of cloud computing, how to properly deploy a cold storage and backup architecture, and how all of this impacts your business.

1 Diving into Storage and the Cloud Backup Industry


There are a number of different ways to control and manage your data. We're far beyond legacy means of data retention and are now talking about advanced controllers, cloud-based backups, hybrid solutions, and of course newer cold storage solutions. We begin by briefly defining various cloud storage options:

Traditional storage.

This is where it all began. Traditional storage environments revolve around onsite storage solutions built on data center storage arrays. This is where NetApp, EMC, IBM and other such solutions empower on-premises data centers to control, back up, and archive their data. You have complete control of both the physical and logical aspects of the storage environment and can apply a number of workloads to your storage arrays. New kinds of storage solutions offer even greater amounts of convergence and integration with underlying data center resources.

Use-cases and cautions:

Traditional storage methodologies will continue to have their presence in the data center. You have complete control of your disks and can assign workloads based on the requirements of your business. If your organization requires that data sit very close to an application and the user, an on-premises storage array may be the right move. Also, new converged infrastructure storage solutions can be great for specific virtual workloads like application and even desktop delivery. One aspect to be aware of is capabilities around scale and economics. Does it make sense to keep everything on premises? How much is it costing you today to back up and store your data? Does the cloud make sense? Which brings us to the next technology.



Cloud storage and backup.

The evolution of cloud services saw cloud storage as a great tool to offload on-premises storage economics into the cloud. Now, organizations can allow entire applications to live in the cloud, utilizing cloud storage resources. Capabilities around cloud storage and backup are evolving as well. Organizations can now access their cloud stores from any device, any time, and anywhere. This type of cloud storage model helps reduce on-site hardware footprints and creates even better storage and backup economics.

Use-cases and cautions:

Entire businesses and applications are being born in the cloud. Today's cloud backup and storage environment is a lot easier to use, more cost-effective, and easier to integrate with. Furthermore, storage services from AWS, Google and Azure allow you to dynamically provision and de-provision resources for users, applications, and large workloads. Something to keep in mind is the kind of workloads and backup repositories that you're placing into the cloud. Vast amounts of archival data might start to eat up storage volumes and get pricey. Furthermore, if your information is bound by compliance, you'll need to ensure that your cloud storage can be compliant as well. This means you might need to keep some of this data onsite.

Onsite-to-cloud storage and archiving.

Similar to traditional storage methodologies, onsite backup looks at long-term retention of data. In this scenario, an organization might be using older tape drives to house information. Although this may have worked in the past, today's reliance on resiliency means that archiving and onsite storage must evolve as well. Organizations are able to remove older tape-based methodologies and extend their existing backup strategies into a cloud model.

Use-cases and cautions:

Not only are you removing physical tapes, you're also creating a more resilient disaster recovery architecture. Now, archived backups are no longer taking up physical space within your data center. You're creating better cloud storage economics while building in data resiliency. Furthermore, you're able to control how and where your data is stored. One big caution here is to always create good data archival policies. In some cases, you would set alerts when you reach certain thresholds to notify you about storage capacities and access. Or, you can tag data points to be uploaded automatically after a set period of time. For example, an older file repository must now be removed from active, onsite systems and transferred into the cloud. The great part here is that this process can be automated via intelligent control platforms.



Hybrid storage and archival models.

Combining all of the previous topics around storage, archiving, and the cloud, we see that one of the most dominant forms of cloud architecture is the hybrid model. Organizations are now integrating into the cloud, configuring their storage architecture to talk with powerful APIs and even direct controller integration. This means that some data stays on premises while other data points can live in the cloud. This can be a mesh of application data, database information, and even archived data. Furthermore, load-balancing technologies allow you to create dynamic policies to control where data resides and how it's being delivered.

Use-cases and cautions:

This kind of scenario is perfect for growing organizations looking to deliver data to a variety of points. In this kind of model you can synchronize on-premises and cloud-based data to talk to applications, web services, and even orchestration tools. You have a lot of flexibility around data control, delivery policies, and backup architectures. Still, something to be aware of will be the specific policies around your data. This means understanding retention policies, archival procedures, and the economics of cloud storage. Basically, does it make sense to store an archival backup repository in a more expensive, live cloud backup architecture? This is where new solutions around cold storage can help.

2 The Evolution of Cold Storage


Cloud storage has certainly come a long way. As providers added new services, cold storage became a very popular means of storing vast amounts of information that is not frequently accessed. Legal data, tertiary copies of information, data that is required to be retained for longer periods of time due to compliance, and archival information set for infrequent access are great use-cases for cold storage deployments. With that in mind, what sets cold storage apart from more traditional storage options?

Let's start by defining cold storage.


Cold storage is defined as an operational mode, or data storage system, for inactive data. This kind of deployment will have explicit trade-offs when compared to other storage solutions. When deploying cold storage, expect data retrieval response times to be beyond what may be considered normally acceptable for online or production applications. This is done in order to achieve significant capital and operational savings.

Ultimately this means working with the right kind of cold storage backup solution that specifically fits your business and workload needs. Although there are a few cold storage platforms out there, two are currently leading the market in terms of price, product, and offerings. But here's the reality: not all cold storage architectures are built the same. Keeping this in mind, let's examine Google Nearline and Amazon Glacier.


Google Nearline:
Google announced its Nearline archival storage product earlier in 2015, and it was quickly seen as a disruptive solution in the market. Why? There was the direct promise of a very quick (only a few seconds) retrieval time. When compared to market leader AWS Glacier, this is pretty fast. According to Google, Nearline offers slightly lower availability and slightly higher latency than the company's standard storage product, but at a lower cost. Nearline's time to first byte is about 2-5 seconds. Which, when you look at other solutions, can be seen as a real game-changer. However, there are some issues.

Here's one of the challenges: Google Nearline limits data retrieval to 4MB/s for every TB stored. This throughput scales linearly with increased storage consumption. For example, storing 3TB of data would guarantee 12MB/s of throughput, while storing 100TB of data would provide users with 400MB/s of throughput. So, if a customer stores 1TB of data within Nearline, their download will start within 2-5 seconds, and then promptly take 73 hours to complete (assuming they are downloading 1TB at 4MB/s). This is a serious consideration point since a similar configuration with Amazon Glacier will actually see faster retrieval times than that of Nearline. Still, a just-released feature called On-Demand I/O allows you to increase your throughput in situations where you need to retrieve content from a Google Cloud Storage Nearline bucket faster than the default provisioned 4MB/s of read throughput per TB of data stored per location per project. Two things to keep in mind:

1. On-Demand I/O is turned off by default.
2. On-Demand I/O applies only to Nearline Storage and has no effect on Standard Storage or Durable Reduced Availability Storage I/O.

We'll take a look at pricing as well as a few other comparison points shortly.
Amazon Web Services (AWS) Glacier:
As one of the first and leading cold storage solutions, Glacier was built as a secure and extremely low-cost storage service for data archiving and online backup. Customers are allowed to store large or small amounts of data. According to Amazon, pricing can start at as little as $0.01 per gigabyte per month, a significant savings compared to on-premises solutions. To keep costs low, Amazon Glacier is optimized for infrequently accessed data where a retrieval time of several hours is suitable. So, when you look at the very same 1TB of data being stored with Nearline, Glacier would approach retrieval and delivery a bit differently. Glacier would have that storage object available to customers in approximately 3-5 hours. Four hours into their download, a Google Nearline customer would be 5% complete on downloading their 1TB of data, with approximately 69 hours to completion.
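To make that trade-off concrete, here is a back-of-the-envelope calculation of the Nearline figures above (a sketch using binary units, which is how the 73-hour number works out; real-world throughput will vary):

```python
TiB = 2 ** 40
MiB = 2 ** 20

stored = 1 * TiB                        # 1TB stored in Nearline
throughput = 4 * MiB * (stored // TiB)  # 4MB/s of read throughput per TB stored
download_hours = stored / throughput / 3600
print(f"Nearline: ~{download_hours:.0f} hours to download 1TB")  # ~73 hours

# Glacier, by contrast, makes the archive available after a 3-5 hour retrieval job,
# after which the download itself is not throttled per TB stored in the same way.
```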
Now let's examine these two platforms directly.

Comparing Google Nearline vs. Amazon Glacier

To really understand these technologies, we need to look at them side-by-side. A recent spec chart and base storage comparison chart done by StorageNewsletter outlines specific services, retrieval limits, and examples of base storage.


Is the new Nearline platform really faster and less expensive? Is the linear 4MB/s download speed fast enough? Let's examine the two comparison charts below, the first one with Glacier's and Nearline's principal specifications, and the second one being a speed and cost chart to store an example of 500TB, adding 1TB and retrieving 1TB weekly, and deleting or overwriting 2TB per month.

SPECIFICATIONS

| Specification | Amazon Glacier | Google Nearline |
|---|---|---|
| Retrieval limit* | No limit, or customizable | No limit |
| Retrieval speed | 4/8 or 28 hours | Linear 4MB/s for every terabyte stored |
| Time to begin retrieval | 3h-5h | 3s |
| Free retrieval allowance per day | 0.17% of the data | No free retrieval |
| Limit days for deletion | 90 | 30 |
| Storage cost | $0.01 per GB | $0.01 per GB |
| Retrieval cost | $0.01 per GB | $0.01 per GB |
| Early deletion cost | $0.01 to $0.03 per GB | $0.01 per GB per day |
| Transfer cost | 1GB free, then from $0.09 per GB to $0.02 per GB depending on the amount of GB and the transfer region | $0.08 per GB to $0.23 per GB depending on the transfer region |

*Note on Retrieval Limit: Amazon offers a customizable limit for retrieval or transfer, which means that you can limit the data transferred per month and avoid additional billing. Nearline does not offer this option. If not customized, there is no limit.
Based on the chart above, we quickly see that Google and Amazon offer the same price for storage and retrieval: $0.01 per GB. Even when we look at the example below, we see that the price per TB per month is very similar. This means that Google Nearline and Amazon Glacier are competing around services and performance, and not as much around pricing.



EXAMPLE WITH 500TB OF BASE STORAGE

| | Amazon Glacier | Google Nearline |
|---|---|---|
| Base storage | 500TB | 500TB |
| Weekly storage addition | 1TB | 1TB |
| Weekly retrieval | 1TB | 1TB |
| Early monthly data overwritten or deleted | 2TB | 2TB |
| Time to begin retrieval | 3h to 5h | 3s |
| Retrieval speed | 278MB/s | 2,000MB/s |
| Total download time | 4h | 8mins |
| Monthly base storage cost | $5,000 | $5,000 |
| Weekly additional 1TB cost | $10 | $10 |
| Weekly retrieval cost | $0 | $10 |
| Monthly transfer out cost (basic regions) | $360 | $480 |
| Monthly overwrite or deletion cost | $60 | $20 |
| Monthly cost | $5,460 | $5,580 |
| Final price/TB/month | $10.9 | $11.2 |
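The monthly totals in this chart can be reproduced with a simple cost model. The sketch below uses the example's rates and assumes four billing weeks per month; the $0.09 and $0.12 per-GB transfer rates are implied by the $360 and $480 figures, and Nearline's early deletion is simplified to a flat $0.01 per GB, as the chart does:

```python
GB_PER_TB = 1000  # decimal units, as cloud providers bill

def monthly_cost(base_tb, weekly_add_tb, weekly_retrieve_tb, early_deleted_tb,
                 storage_rate, retrieval_rate, transfer_rate, delete_rate,
                 retrieval_free=False):
    """All rates in $/GB; assumes four billing weeks per month."""
    storage   = base_tb * GB_PER_TB * storage_rate            # base capacity
    additions = 4 * weekly_add_tb * GB_PER_TB * storage_rate  # newly added data
    retrieval = (0 if retrieval_free
                 else 4 * weekly_retrieve_tb * GB_PER_TB * retrieval_rate)
    transfer  = 4 * weekly_retrieve_tb * GB_PER_TB * transfer_rate
    deletion  = early_deleted_tb * GB_PER_TB * delete_rate    # early-deletion penalty
    return storage + additions + retrieval + transfer + deletion

glacier  = monthly_cost(500, 1, 1, 2, 0.01, 0.01, 0.09, 0.03, retrieval_free=True)
nearline = monthly_cost(500, 1, 1, 2, 0.01, 0.01, 0.12, 0.01)
print(glacier, glacier / 500)    # 5460.0, ~$10.9 per TB per month
print(nearline, nearline / 500)  # 5580.0, ~$11.2 per TB per month
```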

Five key aspects to understand around Glacier and Nearline Integration and Best Practices

1. Nearline is still very new. So new that it's still in beta. Nevertheless, the technology promises great accessibility and speed capabilities. Most likely, we're seeing Google utilize new kinds of optical or magnetic media, making initial restore times a lot faster. This is compared to older LTO tapes, which will result in longer waits for data.

2. Make sure you understand the length of time to wait for a data restore. Using our example above with 1TB of storage, the download would take roughly 70 hours, 66 additional hours compared to Glacier's minimum retrieval time, and that's not counting the additional 3-5 hours after the request. To that point, fast restores can be very expensive. Which brings us to the next consideration.

3. Google sets a limit of 30 days before you can delete your data without penalty, with an early deletion cost of $0.01 per GB per day. Amazon sets a 90-day limit and charges three times that amount for deletion in the first month, two times that amount for deletion in the second month, and the same price as Nearline for deletion in the third month. This means that organizations looking to replace their data frequently (every month or so) should look at Nearline. Longer-term retention repositories can look at Glacier for less frequent access and better pricing.



4. Remember: Glacier costs about the same as Google's new service, but data is stored offline. That means it can take a couple of hours to get hold of a file. Furthermore, to help extend on-premises storage and archival services, Google partnered with Iron Mountain, NetApp, Geminare, and Symantec/Veritas to sell the new service into enterprises.

5. It's important to also consider data retrieval, transfer, and cost. Using the example above from the StorageNewsletter report, we see that Amazon offers 5% of the amount stored for free retrieval per month, and then charges $0.01 per additional gigabyte. In this example, 5% of the data stored would be 25TB, and we are only retrieving 4TB. For retrieving 25TB of data, Google would charge $250. Here's something else to be aware of: instead of creating one linear pricing strategy, Amazon actually has numerous different price points for all of their regions. Google has just one worldwide pricing schema.
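A quick sanity check on those numbers (a sketch in decimal GB, using the chart's $0.01/GB retrieval rate):

```python
free_allowance_tb = 0.05 * 500  # Amazon: ~5% of 500TB stored, i.e. 25TB/month free
retrieved_tb = 4                # our example retrieves 1TB per week

amazon_cost = max(0, retrieved_tb - free_allowance_tb) * 1000 * 0.01  # $0
google_cost = retrieved_tb * 1000 * 0.01                              # $40/month
google_25tb = 25 * 1000 * 0.01                                        # $250 for 25TB
print(amazon_cost, google_cost, google_25tb)
```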

Catch all of that? Google Nearline and Amazon Glacier are working to create a new kind of storage platform for cold storage archival. With more organizations looking to a cloud cold storage solution, one of the biggest consideration points after understanding the model itself is management. How do you control what sits in your cold storage environment? How do you label and create policies around specific repositories? What about data encryption, data purging, and even API integration? To get the most out of a cold storage architecture, organizations must also look at the management layer.

As a final note, let's examine how new management platforms are now allowing organizations to leverage even better service offerings from both Glacier and Nearline.

Final Thoughts: Integrating Cold Storage into Your Business and Storage Strategy
Organizations are finding key benefits in moving a piece of their storage and data retention into a cold storage architecture. Still, as powerful and affordable as Nearline and Glacier might be, end-to-end integration and management are still sometimes a challenge. New kinds of storage management technologies allow you to dynamically scale between your data center and cloud-based cold storage solutions. Tools like those from CloudBerry really begin to expand the capabilities of cloud storage management. These kinds of solutions offer new options for managing data.
If you're working with a management solution for either Glacier or Nearline, there are a few key technology points to look for. In the following table, you'll see some critical considerations which will help create better practices around cold storage management. With that in mind, let's talk about what you should be looking for when it comes to data archiving and disaster recovery with cold storage services like Amazon Glacier and Google Nearline:



COLD STORAGE CONSIDERATIONS

Automation and Integration: Look for solutions which offer scheduling, PowerShell, CLI and API integration. This allows you to integrate more backup points and automate the actual cold storage archival process.

Security: You'll want to look at two important security points: encryption on the client side, and encryption while the data is in transit and in the cloud.

Storage and Bandwidth Optimization: You're not just uploading archival data into a backup or cold storage solution; you're also optimizing this information. Look for capabilities to upload in multiple threads as well as options for incremental and block-level backup. From an optimization perspective, bandwidth throttling, compression, and deduplication are great ways to control and optimize data flow.

Initial Data Set Seeding: Look for direct capabilities to import and export data within a cold storage provider. This allows you to control how, when, and how much data is processed or uploaded at any one time. Similarly, you can export data much more quickly with tools that directly integrate with a cold storage provider like Nearline or Glacier.

Data Retention: New data control mechanisms allow you to manage versioning and lifecycle management. Furthermore, you can set granular purge rules around a variety of backup data points (see the sketch after this table).

Restore Capabilities: Simply restoring data isn't enough. When working with storage management solutions, look for capabilities around backup consistency checks and smart restore functionality.
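As one hedged illustration of what such lifecycle and purge rules can look like, here is an S3 lifecycle policy, via the AWS SDK for Python, that transitions backups into Glacier-class storage and later expires them (the bucket name, prefix, and day counts are all placeholders):

```python
import boto3

s3 = boto3.client("s3")

s3.put_bucket_lifecycle_configuration(
    Bucket="example-backup-bucket",  # placeholder bucket
    LifecycleConfiguration={
        "Rules": [{
            "ID": "archive-then-purge",
            "Status": "Enabled",
            "Filter": {"Prefix": "backups/"},
            # Move objects into Glacier-class cold storage after 30 days...
            "Transitions": [{"Days": 30, "StorageClass": "GLACIER"}],
            # ...and purge them entirely after a year.
            "Expiration": {"Days": 365},
        }]
    },
)
```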

As you build out your own cold storage architecture, make sure to create an environment based on integration best practices. This means understanding what kind of data you'll be storing, retention policies, pricing, and of course how quickly you'll need the information during a restore. There's little doubt that this will quickly become a consumer market for cold storage. Massive data sets are becoming more prevalent within organizations, and they will be looking for ways to archive and store this information.
Through it all, management capabilities around backup and storage will be critical. Let me give you a specific example: Glacier allows customers to set policies that only allow users to retrieve a certain amount of data per day. Furthermore, Glacier customers can also set a policy for retrieval that falls within the free tier. When compared to Google Nearline, the same sort of granularity seems to be missing.
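For instance, here is what those retrieval policies look like through the AWS SDK for Python (a sketch; the 10GB/hour cap is an arbitrary illustrative value):

```python
import boto3

glacier = boto3.client("glacier", region_name="us-east-1")

# Keep all retrievals inside the daily free-tier allowance.
glacier.set_data_retrieval_policy(
    accountId="-",  # "-" means the account that owns the credentials
    Policy={"Rules": [{"Strategy": "FreeTier"}]},
)

# Alternatively, cap retrieval throughput explicitly (here, 10GB per hour).
glacier.set_data_retrieval_policy(
    accountId="-",
    Policy={"Rules": [{"Strategy": "BytesPerHour", "BytesPerHour": 10 * 1024 ** 3}]},
)
```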
As you explore your cold storage architecture, technologies like CloudBerry are there to not only pick up the slack, but also optimize the deployment, integration, and the entire storage and backup control process.



As competition picks up even more, we'll see even better performance, lower pricing, and more integration with data center storage environments. Cold storage has become a way for organizations to extend their data storage solutions into cost-effective, long-term retention silos. To ensure optimal performance and effective cost controls, make sure to create your cold storage architecture around proper management best practices and, as a very important note, proper alignment with the goals of your organization.

CloudBerry Backup for Amazon Glacier and Google Nearline
CloudBerry Backup is a reliable, powerful and affordable cloud backup solution that supports Google Nearline and Amazon Glacier. In addition to offering real-time and/or scheduled regular backups, local disk image, or bare metal restore, CloudBerry employs block-level backup for maximum efficiency and provides best-in-class alerting features allowing you to track each backup and restore plan remotely. Data is protected via 256-bit AES encryption with keys controlled only by end-users, while confidentiality is enforced through transparent exchange whereby only end-users have access to the actual data residing in the cloud.

Those seeking to leverage low-cost cold storage providers without giving up reliability of service, security and ease of file restore now have easy access to a feature set designed specifically to safeguard the data and files backed up on Amazon Glacier and Google Nearline.

Download CloudBerry Backup free trial

CloudBerry Lab. Learn more at www.cloudberrylab.com
