
Technology Analysis Report

Cloud Storage Gateways

Kiran Srinivasan, ATG, CTO Office


Shwetha Krishnan, ATG, CTO Office
Monika Doshi, SPT, CTO Office
Chris Busick, V-series product group
Sonali Sahu, V-series product group
Kaladhar Voruganti, ATG, CTO Office



1 SUMMARY
In this section, we present the key observations and insights of the report; these are elaborated at
length in the rest of the report.

1.1 CLOUD GATEWAYS (Section 2)

A cloud storage gateway can be defined as follows:

A cloud storage gateway is a hardware- or software-based appliance located on your organization's premises. It enables applications located in your local datacenter to access data over a WAN from external cloud storage. The applications continue to use the iSCSI, CIFS and NFS protocols, while the cloud storage gateway accesses data over the WAN using APIs such as SOAP or REST.

Cloud gateways act as a bridge between enterprise datacenters and storage that is resident in an external service provider, supporting the trend towards hybrid clouds.
Why are gateways important for our customers?
1. Agile storage delivery
a. Provides enterprises access to elastic storage with simpler and more rapid provisioning.
2. Lower infrastructure costs:
a. Pay only for storage used (pay-as-you-go model).
b. Lower capital costs in the data center.
c. Reduced storage management, offloaded to the cloud provider.
d. No separate off-site disaster recovery solution needed.


Why are gateways important for NetApp?
1. Another storage tier offering, with different SLA properties, in our storage portfolio.
a. Cloud storage is viewed as low-SLA storage. A cloud gateway can enhance the value of
cloud storage for enterprises with features like security, deduplication and storage management.
2. An opportunity to offer MSEs a compelling alternative to dedicated backup appliances (e.g., Data
Domain).

1.2 RATIONALE FOR CLOUD GATEWAYS (Section 2.3)

Enterprise backups, archival data and tape data can leverage elastic cloud storage:
o Roughly three copies of primary data are created for backup and secondary purposes,
leading to provisioning issues.
o Cloud storage in a remote data center can be an equivalent for off-site tape (for DR).
o Storage is the fastest-growing cloud service (750 billion objects in S3 by late
2011), consisting mainly of archival data and online backups from the consumer space.
o The advent of cloud gateways can facilitate the movement of enterprise data.
All enterprise applications might not move to the cloud:
o Migration of compute and storage to the cloud is not cheap unless the application is
offered as SaaS (Software as a Service), e.g., Salesforce.com or Microsoft's Office365.
o Security, control and process concerns will force many larger enterprises to
keep at least some applications on-premises.
o On-premises enterprise applications can benefit from elasticity and other cloud storage
advantages via cloud gateways.


1.3 KEY CLOUD GATEWAY USE CASES (Sections 3, 4, 5)

Short term (1-2 yrs): Conduit for secondary storage - backup streams, archival data and tape
data.

Longer term (2-5 yrs): Conduit for Tier-2 application primary data - Microsoft Exchange,
Microsoft SharePoint, home directories.

1.4 OPPORTUNITIES AND THREATS FOR NETAPP (Section 7)

Threat: Tier-2 application primary data, especially in virtualized environments, forms the bulk of our
revenue. It can move to the cloud via cloud gateways, impacting our revenue significantly.
Amazon's AWS gateway suggests that their next version will aim at primary enterprise data.

Opportunity 1: Currently NetApp does not have a compelling solution against Data Domain's
backup appliances. A cloud gateway solution with inline deduplication and WAN latency
optimizations, integrated with NetApp data management features (like SnapVault and SyncMirror),
would allow us to compete with them in this $2.18B market.

Opportunity 2: Cloud gateways can enable easier migration of data to cloud service providers
who use NetApp storage. In addition, an integrated solution between NetApp-based cloud
storage and a NetApp cloud gateway can be efficient and compelling.

1.5 KEY COMPETITORS IN THIS SPACE (Section 6)

Startup vendors: Nasuni (primary focus), StorSimple (SharePoint integration), Panzura (global file
system), CTERA (consumer oriented). Enterprise readiness is a question with most of them;
only a couple have more than 50 customers.

Established vendors: Amazon AWS Gateway, Riverbed's Whitewater appliance, Microsoft Azure
appliance. Amazon's gateway, as well as Google's foray into cloud storage, highlights the fact that
established players are keen to move enterprise data to the cloud.

EMC has partnerships with almost all gateway vendors. EMC also has Atmos for cloud storage.

Mode of deployment: All have VSAs; some have both VSAs and physical appliances. Very few
have HA capabilities.

1.6 NETAPP ADVANTAGES/DISTINGUISHING FEATURES (Section 8)

NetApp's data management value: Expose NetApp's value-add in data management (like
snapshots, cloning, mirroring, SnapVault) on another storage tier - cloud storage.

SLO-based management: Enable migration of data between traditional storage tiers and cloud
storage via SLOs.

Leverage NetApp technologies: Cloud gateways require write-back caching for performance;
NetApp can leverage existing technologies to create an efficient write-back cache that is
protected by HA.

1.7 KEY ADDITIONAL IP FOR A CLOUD GATEWAY VIS-À-VIS NETAPP (Section 9)

Basic cloud gateway infrastructure (for both secondary and primary storage):
o File-to-object protocol conversion.
o Volume-to-objects (or groups of objects) data granularity mapping.
o Security of objects in the cloud (encryption).

Value-added features (for both secondary and primary storage):
o Compression.
o Deduplication.
o Application integration.
o Cloud-aware data management (e.g., auditing cloud costs).

Optimizations for a viable primary-storage solution:
o WAN latency optimization via read and write-back caching and prefetching.

Infrastructure for global collaboration on a primary storage repository:
o Global locking across a WAN.

1.8 RECOMMENDATIONS FOR NETAPP (Section 9, Section 10)

BUY (Near Term - 1 yr):
o Pros: Lower time to market; compelling and unique IP (as listed above in Section 1.7).
o Cons: Enterprise readiness of many startup vendors; integrating an acquired vendor's IP
with NetApp data management features requires time and effort.
o Recommendation: Buy only when the IP is hard to develop; chart out a pathway to integrate it.

PARTNER (Short Term - 3 to 6 months):
o Pros: Lower time to market; parity with EMC; integrated solutions that lower TCO.
o Cons: Limited gains; NetApp's data management value-add could be hidden.

BUILD (Long Term - 2 yrs):
o Pros: NetApp's distinguishing features can be fully leveraged; enterprise readiness.
o Cons: Building a cloud gateway in ONTAP would take beyond 2015 (LB+); building a
non-ONTAP solution might limit exposing our value-adds.

The overall recommendation is to partner immediately with cloud gateway vendors and pursue the buy
and build options in parallel. Specific projects in the build option are outlined below:

Enhance our V-series offering to have a cloud storage backend (already underway).

ATG projects:
o Understand the performance of primary, Tier-2 applications on a cloud gateway.
o Reliability and security aspects of cloud gateways.
o Unique data management functionality required for cloud gateways.
o A global file system using cloud gateways.


2 INTRODUCTION
The growth of cloud technologies, both public and private clouds, has been driven primarily by a perceived
reduction in IT costs. The commoditization of server hardware resources (especially CPU cycles, memory
capacity and disk capacity) has been the biggest enabler. In addition, the growth of hypervisor
technologies, which increase resource utilization and enable consolidation of application servers, has
contributed significantly to this trend. Also, analytics on large data repositories (big data) have assumed
significance for many organizations that derive revenue from web services, e.g., Google, Yahoo and Amazon.
The scale of data, and the need to compute analytics cost-effectively, have forced them to adopt a
cloud-based infrastructure along with novel computing paradigms like MapReduce and Hadoop.

2.1 PRIVATE, PUBLIC AND HYBRID CLOUDS

In the context of our discussion, we primarily deal with cloud storage as opposed to cloud compute.
Private cloud storage has been applicable in situations where enterprises feel insecure about certain
types of data leaving their controlled administrative domains, e.g., payroll, source code and corporate email. On
the other hand, public clouds are applicable where flexibility in terms of compute and/or storage,
as well as ease of management, trumps other administrative considerations.
Private clouds require both upfront costs (capital expenses) to create them and recurring operational
expenses for administration and management. In contrast, the inherent sharing of resources, economies of
scale, and multi-vendor competition attributed to public cloud vendors enable a pure operational-expense
model with an expected downward tendency in prices.
From an enterprise storage context, it is clear that there are always some types of data that can be
moved to a public cloud, e.g., backups and archival data. Therefore, for enterprises, a hybrid cloud model
(a combination of private and public clouds) is expected to emerge. However, it is speculated that all data
will eventually move to public clouds, provided the issues around security, control and service levels are
addressed adequately.

2.2 CLOUD GATEWAYS

For both hybrid clouds and enterprise datacenters to leverage public clouds, there is a
need for functionality that can bridge the two worlds and enable data migration between them.
We call such functionality the Cloud Gateway; it can reside in an appliance or in a virtual
machine. Typically, the raw public cloud storage is accessed via a simple, object-based (PUT/GET)
interface using a SOAP/REST-based API over HTTPS. The Cloud Gateway would employ the
local disks or flash storage associated with the appliance to cache cloud data. The local disks
could also be employed as the final resting place for certain types of data that need to be stored
permanently in the gateway, e.g., filesystem metadata.
The caching can be write-back or write-through. Typically, to reduce latency for write operations, the
cache would be a write-back cache. However, with write-back caching, we need to satisfy two
requirements: the dirty data in the write-back cache must have adequate protection against failures, and
these protection mechanisms must not debilitate the appliance's performance. The storage for the cache is
the only upfront storage investment needed to leverage the cloud gateway. This implies that the cost for
the customer is proportional to the working sets of their workloads as opposed to the entire data
generated by their workloads. Depending on the workloads, the former could be much smaller than the
latter. On the other hand, for enterprise storage vendors, their ability to sell actual storage (in bytes) reduces
drastically.
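To make the write path concrete, the following is a minimal sketch of the write-back flow just described, assuming a generic REST object store. The endpoint URL, bucket name and on-disk cache layout are hypothetical, and a real provider such as S3 would additionally require request signing.

```python
# Minimal sketch of a cloud gateway write-back path (illustrative only).
# Assumes a generic REST object store at OBJECT_STORE_URL; real providers
# (e.g., S3) additionally require request authentication/signing.
import os
import requests  # pip install requests

OBJECT_STORE_URL = "https://objectstore.example.com/mybucket"  # hypothetical
CACHE_DIR = "/var/cache/cloud-gateway"                          # hypothetical

def write(volume: str, block_id: int, data: bytes) -> None:
    """Handle a client write: persist locally and mark dirty (write-back)."""
    path = os.path.join(CACHE_DIR, f"{volume}.{block_id}.dirty")
    with open(path, "wb") as f:
        f.write(data)  # acknowledged as soon as it is locally durable

def flush() -> None:
    """Background task: push dirty blocks to the cloud as objects (PUT)."""
    for name in os.listdir(CACHE_DIR):
        if not name.endswith(".dirty"):
            continue
        path = os.path.join(CACHE_DIR, name)
        with open(path, "rb") as f:
            data = f.read()
        key = name.removesuffix(".dirty")
        resp = requests.put(f"{OBJECT_STORE_URL}/{key}", data=data)
        resp.raise_for_status()
        os.rename(path, path.removesuffix(".dirty"))  # now clean in cache
```

A production gateway would also mirror the dirty blocks (e.g., to a partner appliance) before acknowledging the write, per the protection requirement discussed above.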
For a viable cloud gateway to an external cloud storage provider, there are some very specific
requirements due to the nature of the raw cloud data and the API offered to access it. In Section 3.1, we
list these requirements and the rationale for having them. Figure 1 illustrates a cloud gateway
appliance in the context of an enterprise datacenter.

Figure 1: Cloud Storage Gateway Architecture

As can be seen, the cloud

gateway is an on-premise appliance or a VSA (virtual storage appliance) that talks via NAS (CIFS, NFS)
or SAN (iSCSI, FC) protocols with traditional clients in the datacenter. It interacts with the cloud storage
using an object-based interface and uses local disks for caching hot content or as a permanent store for
primary data. Typically, the gateway would be responsible for ensuring the security of the data before it leaves the
datacenter. In addition, the gateway might contain features (depending on its use) that enable
performance optimizations and latency reductions while accessing cloud storage over the WAN. Last but
not least, to enhance the reliability of data stored in the cloud, the gateway might simultaneously store
data on multiple cloud vendors to protect against cloud access outages and vendor lock-in, and to leverage
price changes. Overall, all of the functionality in the gateway is aimed at lowering TCO by enabling the
flexibility of cloud storage, i.e., simpler provisioning (pay-as-you-go), lower administration costs and virtually
unlimited scalable storage.
The current cloud storage gateway market is nascent and the offerings are not fully featured. Most of the
vendors are smaller startup companies that are new to the storage space and do not own a complete
storage portfolio like NetApp or EMC. As of now, only a few offer traditional enterprise capabilities like
high availability, and very few of them actually target enterprise storage. Currently, gateways have been
targeted primarily at backup streams, archival data and tape replacement (offsite disaster recovery).
These are primarily offline workloads, with limited performance requirements, that are typically
tolerant of variation in throughput and latency, such as in a WAN. Moreover, since most of the vendors
are startups, they would like to target workloads that are relatively easy to support from a performance
perspective, as opposed to primary workloads that have stringent requirements.

2.3 MOTIVATING FACTORS

In this section, we provide motivation for cloud gateways from the perspective of two different
enterprise workloads: secondary storage and Tier-2 application data.

2.3.1 Cloud gateway for secondary storage


In the enterprise datacenter, a rule of thumb is that for every byte of primary data, three bytes of
secondary data are stored. This includes backups within the datacenter and a copy on tape at a remote
site for disaster recovery (DR) purposes. With backups, the typical enterprise workflow entails a full backup
every week followed by daily incremental backups, leading to secondary data bloat. This bloat is the
primary reason deduplication technologies are adopted heavily in this realm. In spite of
deduplication technologies, we observe from certain case studies that efficiently provisioning storage to
accommodate secondary growth is very difficult.
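As a back-of-the-envelope illustration of this bloat, consider a weekly-full, daily-incremental policy; all input numbers below are hypothetical.

```python
# Back-of-the-envelope secondary-storage bloat for a weekly-full,
# daily-incremental backup policy. All inputs are hypothetical.
primary_tb = 10.0        # primary data
daily_change = 0.05      # 5% of primary changes per day
retention_weeks = 4      # how many weeks of backups are kept

fulls = retention_weeks * primary_tb                       # one full per week
incrementals = retention_weeks * 6 * (daily_change * primary_tb)
total = fulls + incrementals
print(f"secondary copies: {total:.1f} TB "
      f"({total / primary_tb:.1f}x primary)")   # 52.0 TB (5.2x primary)
```

Even with deduplication shrinking the repeated fulls, provisioning for this kind of growth is nontrivial, which is where elastic cloud capacity helps.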
The number of objects in Amazon's S3 now exceeds 700 billion [ref]. Backup and archival data constitute
nearly 55% of S3's objects. These backups are largely expected to be online backups of personal laptops
(the consumer space); the fraction of enterprise backups is not known. However, such a large percentage
raises the question of whether cloud storage can be an efficient option for enterprise backup data as well. Also, the
features expected of an offsite copy of enterprise data maintained on tape (for disaster recovery purposes) are
very similar to those offered by cloud storage. But cloud storage has other inherent advantages relative to
tape, like WAN-based global access and a variable cost structure (due to multiplexing of cloud resources
across clients and workloads). We observe that these advantages make the migration of enterprise
backups and off-site copies to cloud storage imminent.
The inherently elastic nature of cloud storage can help address the provisioning of secondary data.
Thus, the need is for a conduit to send enterprise backups to the cloud with the right level of security
(protection) and recovery semantics. We envision the cloud gateway acting as this conduit, enabling
existing backup applications to transparently leverage cloud storage. From another perspective, we
observe that within Amazon S3, an overwhelming percentage of data currently consists of online backups and
archives from individual users (consumers). Enabling enterprise backups to utilize cloud storage would
be a natural extension of this trend.
As mentioned before, a copy is typically stored in an off-site tape archive, primarily for DR. Similar to cloud
storage, the tape is maintained at a remote datacenter. Moreover, like cloud storage, the tape archive
could be managed by a different company and maintained as a repository across multiple customers.
Therefore, the functionality and requirements are almost identical. This implies that cloud storage can be
an effective and cheaper tape replacement, because the extra costs of copying data over to tapes and
transporting them do not apply. Compared to tape, cloud storage has one distinct advantage: data
can be accessed at any time or place, independent of the actual physical location. With the cloud
gateway as the bridge to the archive in the cloud, the archive can be kept online indefinitely. Moreover,
this online archive can be accessed using traditional protocols and recovered efficiently, with little
logistical overhead.

2.3.2 Cloud gateway for Tier-2 application data (primary storage)


The cloud market as a whole, both private and public, is growing rapidly. The cloud vendors classify their
services in many ways: Infrastructure as a Service (ITaaS, e.g., Amazon's S3), Platform as a Service
(PaaS, e.g., Joyent), Software as a Service (SaaS, e.g., Microsoft's Office365), etc. Within ITaaS, there is
further classification into Storage as a Service (StaaS, e.g., Amazon's S3) and Compute as a Service
(CaaS, e.g., Amazon's EC2, Microsoft's Azure cloud). Among these categories, StaaS is
experiencing the highest growth (a cumulative growth rate of 25% annually), but CaaS leads in terms of
revenue [1, 2, 11].
The key question remains whether other workloads will adopt cloud storage. A related question is
whether hybrid clouds will become mainstream, where some data resides in an enterprise datacenter or a
private cloud and the rest resides in a public cloud. An interesting perspective is provided by Intel's
whitepaper on the future of IT, datacenters and their evolution vis-à-vis cloud technologies [4]. Figure 2
shows an illustration from the whitepaper of the evolution of hybrid clouds and where the different
workloads will reside.

Figure 2: Intel's IT evolution - hybrid clouds (Source: Intel whitepaper [4])

It can be observed that in the mid term, only selective functions will move to public
clouds, like caching of content on ITaaS and sales support on SaaS (e.g., Salesforce.com). However, in the
longer term, they expect more workloads to go to public clouds: backups, storage, manageability and client
VM images to ITaaS, as well as CRM, collaboration and productivity tools to SaaS. As per this report, the
implications for NetApp are clear: a significant portion of enterprise storage data is moving to
public clouds. Also, in the figure, we can see that internal clients in the enterprise datacenter are
expected to make use of ITaaS services like cloud storage over the WAN. This observation points to the
importance of developing cloud storage gateway technologies in the near future to enable this
evolution.
Another aspect of the evolution of cloud workloads is the role of SaaS. SaaS provides the application
as a cloud service, typically accessible over a web-based interface. For business applications like CRM and
ERP, such cloud services are readily available and are being adopted zealously. A unique case in point is
Microsoft's Office365, which offers the entire Microsoft Office suite of applications as web services. Such
services are clearly cost-effective for enterprises. In addition to the advantages of a cloud service, i.e., low
management and administration costs and instant deployment, such services eliminate the extra servers (and
associated datacenter costs) required to run the application servers in the datacenter. However, the key
disadvantage is that there is very little control over the application data, i.e., the storage and security
policies applied to it. Thus, in adopting a SaaS service, we place considerable trust in the provider.
The SaaS security model might be suitable for MSEs but not completely for large enterprises. We expect
that large enterprises would still like the security and the control over resources and processes that standalone
application servers offer when they are on-premises. However, they would like to take advantage of the
elasticity and cost advantages of cloud storage if possible. Cloud gateways address exactly this
requirement, serving as a bridge that enables applications to reside in the data center while leveraging cloud
storage for their storage needs.
Beyond the SaaS use cases, we expect that with cloud gateways there are a significant number of
workloads that will use compute in the datacenter or private cloud but leverage storage in a public cloud. A
key question is whether this assumption is valid. A contrary opinion is that, for all enterprise applications
(standard as well as custom), both compute and storage would move to a public cloud and render the
gateway functionality useless. There are no clear answers to this question as of now; we need to wait for
the evolution to take place before we make our judgement. However, a hybrid cloud scenario, with a split
of compute in the private cloud (or enterprise datacenter) and storage in the public cloud, is plausible
given the characteristics of these applications and resources (compute and storage). We list a few of
them here:
1. Change in applications: For applications to move completely to a public cloud, a few changes
have to happen:
a. These applications need to be portable, i.e., encapsulated in a VM, before they can be
moved to a compute cloud service.
b. Typically these applications work with unencrypted data in the datacenter; when they run on a
public cloud compute infrastructure, data traffic into and out of the VM needs to be
encrypted. Moreover, data at rest created by the applications needs to be maintained in
encrypted form.
c. The application, or the layer below the application, needs to talk to cloud storage via a
different protocol. Enterprise applications or the layer(s) below them need an object
protocol to access cloud storage, as opposed to the typical enterprise storage protocols
(NAS/SAN). Either a translation from NAS/SAN protocols to object protocols needs to be
made, or new storage client software that natively talks object protocols needs to be
introduced into the application VM stack.
d. Also, along with translation, to reduce the cost of cloud storage, features like storage
efficiency need to be incorporated into the application's VM stack.
None of these changes is insignificant for many legacy enterprise applications.
2. Compute and storage are different kinds of resources: With increasing processor speeds,
compute as a resource is one of the cheapest in the datacenter and the most flexible in terms of
usage. Contrarily, the cost per CPU cycle, as seen with Amazon's EC2, is not low. This
observation has translated into CaaS generating more revenue than StaaS. Moreover, compute is
a renewable resource; the moment a CPU cycle is used up, it is available for use again. Storage
costs, in contrast, are going down, but the hidden costs of storage administration are still considerable.
In addition, storage is a consumable resource: once a byte of storage has been used, it has been
consumed and cannot be reused until the data is erased. These factors might make storage a
candidate to migrate to the public cloud, but not necessarily compute.
Given these observations and the growth of hybrid clouds, we feel that cloud gateways might be the
conduit for enterprise storage to move to public clouds, at least for some workloads. Burton Group's report
[1] on cloud gateways classifies workloads that have already moved and the ones that can move to the
public cloud via gateways. A key observation is that many Tier-2 applications where NetApp has enjoyed
significant market share and revenue growth are listed as ones that might move. The lure of a flexible,
pay-as-you-go, low-capital-expenditure model for enterprise storage is the key motivating factor
behind this prognosis.


3 CLOUD GATEWAY ARCHITECTURES


In this section, we first outline the mandatory capabilities that a cloud gateway should possess,
dictated mainly by design considerations and partly by first-movers' differentiation in this space.
Second, we provide three alternative models of usage for a cloud gateway. These models are not
mutually exclusive.

3.1 MANDATORY CAPABILITIES

Given the market space for cloud gateways, the following are the mandatory features expected of an
enterprise-class appliance/VSA:

Operations and protocols (NAS/SAN) that emulate conventional storage arrays and file servers:
To enable existing enterprise (storage) clients to access data.

Translate files and blocks to objects in cloud storage: Since the cloud storage API
offered is typically object-based, the gateway needs to translate appropriately.

Data leaving the enterprise datacenter needs to be secure: Enterprise data cannot leave the
premises in clear-text and cannot be stored in clear-text in the cloud. This requirement is usually
achieved by encryption before the data leaves the datacenter.

Perform smart caching of the data to avoid WAN latency: Typically, write-back caching and
efficient pre-fetching strategies are employed in this context. Also, having an effective cache
reduces the number of network requests to the cloud storage provider, enabling extra savings.

Minimize WAN bandwidth usage by deduplicating data: Most external cloud SSPs charge both for
the data stored and for network requests. To ensure minimal data is stored in the
cloud, deduplication is essential. Moreover, the customer then pays the SSP only for network requests
carrying unique data (see the sketch after this list).

Provide access to multiple public cloud storage vendors: Mainly to prevent a single point of
failure as well as single-vendor lock-in.

Export cloud storage semantics to the end admin: Cloud storage features such as on-demand
capacity and the pay-as-you-go pricing model need to be exposed to admins in a transparent
way.

Monitoring, reporting and other data management capabilities: Since the customer would be
paying the cloud SSP for the storage as well as network requests that originate from the cloud
gateway, it is essential to audit all the requests efficiently and present them to the customer on
demand.
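The sketch below illustrates three of these capabilities together: deduplication before encryption (necessary because encryption randomizes the ciphertext and would defeat dedupe), encryption before upload, and paying only for unique data. The chunk size, key handling and upload function are illustrative stand-ins.

```python
# Sketch of a "deduplicate, then encrypt, then upload only unique data"
# pipeline. Fixed-size chunking is used for simplicity; the key and the
# upload function are illustrative stand-ins.
import hashlib
from cryptography.fernet import Fernet  # pip install cryptography

CHUNK = 4096
key = Fernet.generate_key()   # in practice: managed on-premises, never in cloud
cipher = Fernet(key)
uploaded: set[str] = set()    # fingerprints of chunks already in the cloud

def upload_object(name: str, blob: bytes) -> None:
    """Placeholder for a PUT to the cloud provider."""
    print(f"PUT {name} ({len(blob)} bytes)")

def backup(data: bytes) -> list[str]:
    """Return the recipe (list of fingerprints) for reassembling the data."""
    recipe = []
    for off in range(0, len(data), CHUNK):
        chunk = data[off:off + CHUNK]
        fp = hashlib.sha256(chunk).hexdigest()  # fingerprint the plaintext
        recipe.append(fp)
        if fp not in uploaded:                  # pay only for unique chunks
            upload_object(fp, cipher.encrypt(chunk))
            uploaded.add(fp)
    return recipe
```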

3.2 DESIGN APPROACHES TO CLOUD GATEWAY

Different approaches or models have surfaced among the cloud gateway vendors to facilitate cloud
storage integration. These models also have a strong relationship to the typical datasets they support;
no single model satisfies the performance characteristics of all enterprise workloads. However,
the models are not exclusive of each other, and some appliances blend them. The models are:

Caching device model: The gateway provides advanced caching algorithms to mask cloud
performance limitations, i.e., WAN latency and bandwidth constraints. Typically, write-back
caching is done on local disk or SSD devices.


Tiered device model: The gateway enables the creation of an explicit enterprise storage tier
with specific performance and capacity characteristics. By definition, in this model, the gateway is
part of a larger ecosystem that provides the other storage tiers.

Copy device model: The gateway provides conventional on-premises storage with scheduled
replication services to the cloud to facilitate backup/recovery functionality as well as a disaster
recovery solution.

As gateway offerings increase, we expect them to be a combination of these models. The following
subsections detail each of these models.

3.3 CACHING DEVICE MODEL

Figure 3 illustrates a gateway modeled as a caching device. With this approach, the appliance presents a
cached copy, i.e., a virtual storage volume (filesystem or LUN), to the datacenter clients,
whereas the actual volume is in the cloud. The cached copy need not be in sync with the volume in the
cloud. Moreover, the cache type (write-through or write-back) dictates the invalidation and consistency
requirements.

Figure 3: Gateway Caching Model (Source: Gartner)
Typically, in order to mask WAN latencies, the caches are designed as write-back caches. This implies
that during steady state, some amount of dirty data (unflushed writes) will be present in the cache. Since
we are dealing with enterprise data, data loss is not acceptable. Therefore, we need to
ensure that the dirty data can survive the loss of the gateway appliance via appropriate reliability
mechanisms (e.g., mirroring to another appliance within the datacenter). Since this requirement is similar
to the reliability requirements for primary enterprise data, vendors must build such functionality into their
gateways to be viable. In addition, in the event of a datacenter disaster, the volume recovered from the
cloud needs to be in a consistent state. To enable this, the design requires well-defined cut-off points
in time (checkpoints/snapshots/consistency points) for synchronizing dirty data from the gateway to the
cloud, as sketched below.
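The following sketch shows one way such consistency points can work, assuming a generic object API; the manifest convention and key naming are hypothetical.

```python
# Sketch of consistency points for a write-back gateway cache: dirty data is
# flushed in checkpoint batches, and the cloud-side recovery point advances
# only after an entire batch (plus a manifest) has been uploaded.
dirty: dict[str, bytes] = {}      # block/file id -> latest unflushed data
checkpoint_no = 0

def put_object(key: str, blob: bytes) -> None:
    print(f"PUT {key}")           # placeholder for the real provider upload

def write(key: str, data: bytes) -> None:
    dirty[key] = data             # absorbed locally; must survive node loss
                                  # (e.g., mirrored to a partner appliance)

def take_consistency_point() -> None:
    """Freeze the dirty set and publish it atomically as checkpoint N."""
    global checkpoint_no, dirty
    frozen, dirty = dirty, {}     # new writes go to the next checkpoint
    checkpoint_no += 1
    for key, data in frozen.items():
        put_object(f"cp{checkpoint_no}/{key}", data)
    # The manifest is written last: a DR restore uses the newest complete
    # manifest, so a half-uploaded checkpoint is simply ignored.
    manifest = "\n".join(sorted(frozen)).encode()
    put_object(f"cp{checkpoint_no}/MANIFEST", manifest)
```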
A vendor-proprietary caching algorithm attempts to minimize data transfers between the gateway and
the cloud storage provider for both reads and writes. Cache reads can be served to the enterprise data
clients at speeds consistent with NAS or SAN systems. Anytime the cache experiences a read
"miss," the gateway must retrieve data from the cloud and incur both latency and bandwidth penalties
while the data moves from the cloud, through the cloud connectivity, and finally to the gateway. For
writes, the cache typically aggregates data and, at opportune times, compresses, deduplicates, and
encrypts it for transfer to the cloud, minimizing the cloud data footprint, improving performance, and
preserving data privacy.
Some vendors, like StorSimple, have taken a hybrid approach, where some data, like filesystem
metadata, always resides in the gateway, while the file system data is cached. This approach
enables fast access to metadata even in the event of a disconnection from the cloud.
The following key technical issues are relevant to the caching approach:

Cacheable workloads: To mask the WAN latency effectively, the workloads need to be cache-friendly, i.e., have temporal locality properties that lead to a relatively small working set. In
addition, the cache replacement policy has a big impact on performance and has to be designed
with due care. Lastly, effective prefetching strategies need to be devised in order to
minimize cold misses, which require synchronous retrieval from the cloud, leading to unexpected
and unacceptable WAN latencies (potentially three orders of magnitude higher than local disk).

Sizing: Since the storage in the gateway is a function of working-set size and not
actual data size, ideally we can serve an ever-increasing cloud storage footprint with the
same local storage on the gateway appliance. Even where this ideal does not hold, we
expect the growth of local storage (on the gateway) to depend on working-set growth,
which is tied more to application behavior/evolution than to application data growth. This
insight offers substantial cost advantages for datacenters, where storage capacity sizing and
provisioning concerns (usually for peak utilization) are largely mitigated. In addition, the lower
capacity requirements augur well for a pure flash-based cache (SSDs).

Coherency: In some cases, the actual cloud volume could be shared by multiple cloud
gateways in different geo-distributed datacenters; Panzura's Global File System is
an example. To provide a globally consistent view of a single file system, we need appropriate
coherency mechanisms across the WAN-distributed gateways, such as global locking and cache
invalidation schemes.

So far, analysts like Gartner have suggested that the caching model is best for minimal-footprint
installations (like branch offices), file sharing workloads, data archival and low-demand backup, primarily
due to latency concerns. It is still unclear whether the requirements of key primary workloads like business
applications can be satisfied by the caching model. Example vendors for this model are Nasuni,
Riverbed's Whitewater appliance and Panzura's Alto 6000 Series Cloud Controllers.

3.4 STORAGE TIER MODEL

Figure 4 shows a tiered gateway. With this model, in contrast to the caching model, storage volumes
may wholly exist in the datacenter on local storage and/or exist in the cloud. In this model, the gateway
offers cloud storage to enterprises as a specific tier in a multi-tier storage hierarchy, usually based on
the performance characteristics of each tier.
Today, in enterprise datacenters, such multi-tier hierarchies already exist, usually classified by
performance characteristics - starting from fast, expensive flash-based storage down to slow, inexpensive
tape. A cloud gateway would extend this hierarchy by offering another tier with flexible storage capacity
and disaster protection, but lower performance. Given these characteristics, this tier would fit datasets
described as archival/data warehouse and cold data, as well as traditional backups. Also, multi-tier storage
hierarchies enable features like automated data migration to reduce overall cost - the ability to move data
from one tier to another automatically via dataset policies.
Here are some key issues relevant to this model:

Figure 4: Gateway as a storage tier (Source: Gartner)

Traditional workload compatibility: A tiered gateway, viewed from a different perspective, could be
thought of as a conventional enterprise storage system with a cloud gateway feature. Therefore,
workloads that are appropriate for local storage systems are still applicable to tiered gateways.
However, a key advantage of such a gateway is that the cloud storage provides effectively infinite capacity,
albeit with performance, SLO and cost limitations. That said, current offerings have limited local
storage scalability (the StorSimple 7010 appliance's maximum is 20TB) and may fall short for IO-intensive
workloads.

Caching issue mitigation: As mentioned before, with a write-back caching gateway, to protect
against data loss due to failures (hardware or cloud connectivity), we need protection
mechanisms in place, like local mirroring and consistent checkpoints of the cloud data. In the
tiering model, due to the existence of a local storage tier, such issues are largely mitigated or
completely absent.

Highly competitive landscape: With the tiering model, we expect existing enterprise storage
players, e.g., EMC, HP, IBM and Hitachi Data Systems, to offer tiered gateways, as they are best
equipped to enhance their value proposition with multi-tier storage.

The tiering model is best suited for archival and data-warehousing types of datasets. Example products
are StorSimple and F5 Networks' ARX.

3.5 COPY MODEL

Figure 5: Gateway used for a remote copy (Source: Gartner)

Figure 5 illustrates a copy cloud gateway. In this model, the gateway is similar to a traditional local
NAS/SAN storage system, and customers are expected to use it that way. The performance and
management expectations of this appliance are also similar to those of a traditional on-premise storage system.
The unique value-add is the ability to connect to external cloud storage and perform
replication/copy services from local storage to the cloud storage. These copy services would be similar or
identical to the ones between enterprise storage systems. The main goal of such services is data
protection in the event of the loss of either the on-premise storage system or the datacenter itself. A secondary goal
would be asymmetric data sharing (i.e., read-only) across geographically distributed datacenters.
With this model, storage admins need to be able to perform cloud-storage-related configuration
and map traditional copy-service notions to the equivalents relevant for cloud storage. Also, the
gateway is expected to deduplicate and compress data before transferring it to the cloud as a large stream.
The copy approach is ideal for data protection use cases (like DR) that require routine snapshot
capabilities. This model has the lowest barrier to adoption and is the ideal first product offering across all
gateway models. However, the limited applicability might also mean limited cost savings. Also, since the
offering is geared towards disaster recovery, transferring large datasets to/from the cloud efficiently is a
key issue. Current cloud storage protocols are not well suited to streaming workloads; all of them export
RPC-like, object-based protocols.
Some key issues relevant to the copy model are:

Storage players' imminent entry: Enterprise storage vendors with current DR offerings would
jump onto the copy cloud gateway soon. The SLA/SLOs required of the cloud storage to support
this model are a key open question, and the role of cloud storage providers in facilitating this model is not
clear. Amazon's S3 is optimized for relatively small objects (on the order of 1MB) as opposed to the large
objects that might arise from a copy workload (on the order of TB). This implies more work on the gateway
to split the copy workload into many smaller objects, plus the associated bookkeeping (see the sketch
at the end of this section).

MSE suitability: Customers in the MSE space are the ideal candidates for such a gateway. With
cloud-based replication and DR services, they can skip the equivalent datacenter-to-datacenter
offerings of current vendors entirely, for cost reasons. We expect this advantage to be a key
driver for this model.

An additional use case is data sharing between datacenters with completely non-overlapping work
patterns (two datacenters on different sides of the globe), though we have not seen existing vendors
tout this specific advantage. CTERA is a prototypical vendor in this space.
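As a concrete illustration of the chunking and bookkeeping burden noted above, here is a sketch that splits a large replication stream into provider-friendly objects; the object size, key naming and catalog convention are assumptions, not any vendor's format.

```python
# Sketch of the gateway-side work of splitting a large replication stream
# into small objects (order of 1MB) and keeping the bookkeeping needed to
# reassemble it during a restore. Names and sizes are illustrative.
import io

OBJECT_SIZE = 1 << 20  # 1 MiB per object

def put_object(key: str, blob: bytes) -> None:
    print(f"PUT {key} ({len(blob)} bytes)")   # placeholder for provider PUT

def replicate(stream: io.BufferedReader, snapshot_id: str) -> list[str]:
    """Upload a snapshot stream as numbered objects; return the object list."""
    keys = []
    seq = 0
    while True:
        part = stream.read(OBJECT_SIZE)
        if not part:
            break
        key = f"{snapshot_id}/part-{seq:08d}"
        put_object(key, part)
        keys.append(key)
        seq += 1
    # Persisting this list (e.g., as a catalog object) is the bookkeeping
    # that lets a DR restore fetch and concatenate the parts in order.
    put_object(f"{snapshot_id}/catalog", "\n".join(keys).encode())
    return keys
```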

3.6 HYBRID (COMBINATION) MODELS

It is easy to see that these models are not mutually exclusive. A number of solutions from existing
vendors are combinations of these different models. Typically, most gateways include a write-back cache
for performance reasons, irrespective of the intended workload, i.e., backup or primary. An example is
Panzura, which can be used as primary storage as well as for archival purposes. Rather than relying on
different models, vendors prefer to differentiate by offering value-added services like closer application
integration; e.g., StorSimple offers MS SharePoint/Exchange server integration.


4 UNIQUE REQUIREMENTS/EXPECTATIONS OF GATEWAYS


Compared to traditional enterprise storage in datacenters, cloud gateways have unique requirements
to fulfill. At a high level, most of these are methods to integrate them into familiar storage
management notions. We look into them next.

4.1 CLOUD-STORAGE SERVICE AUDITING AND CONSOLIDATION

Changing a storage environment to incorporate a cloud provider's storage requires service-plan tracking to
manage user accounts, billing, access and usage. Consolidation of user accounts for volume pricing and
indirect billing helps reduce cost. Also, auditing all network IOs to the cloud provider is key to providing the
customer with the data necessary to validate the cloud provider's actual costs and to project future costs.

4.2 ANALYTICS BUILT INTO THE GATEWAY

To manage accounts and optimize savings, report generation for metrics such as caching efficacy,
bandwidth usage and object transfer sizes is required to validate the effectiveness of cloud storage versus
traditional storage. These analytics might also identify performance bottlenecks and help in provisioning
the right number of gateways and cloud storage volumes.
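A minimal sketch of such auditing and cost reporting follows; the tariff constants are invented for illustration, and real providers publish their own pricing.

```python
# Sketch of request auditing and cost projection in a gateway. The price
# constants are hypothetical; real providers publish their own tariffs.
from collections import Counter

PRICE_PER_GB_MONTH = 0.10       # hypothetical storage tariff ($/GB-month)
PRICE_PER_10K_REQS = 0.05       # hypothetical request tariff

requests_log: Counter = Counter()   # operation -> count
bytes_stored = 0

def audit(op: str, nbytes: int = 0) -> None:
    """Record every network IO the gateway issues to the provider."""
    global bytes_stored
    requests_log[op] += 1
    if op == "PUT":
        bytes_stored += nbytes

def monthly_report() -> str:
    """Project the monthly bill from the audited activity."""
    storage_cost = bytes_stored / 2**30 * PRICE_PER_GB_MONTH
    request_cost = sum(requests_log.values()) / 10_000 * PRICE_PER_10K_REQS
    return (f"ops={dict(requests_log)} "
            f"storage=${storage_cost:.2f}/mo requests=${request_cost:.2f}")
```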

4.3 PROVISIONING MANAGEMENT

As with traditional storage, administrators should expect a gateway to offer simplified provisioning.
For example, because cloud storage provides dynamic capacity expansion, creating a thin-provisioned
volume from a gateway should be a simple procedure. Expanding storage is straightforward, but when a
gateway releases storage that is unused (from the user's perspective), it may not release the equivalent amount
from the cloud storage provider. This can be true of gateways that translate block protocols to object-based
cloud storage without a matching block-to-object granularity. Thus, releasing capacity may require local server
agents to release unused data blocks from the gateway, and thus objects in the cloud, as sketched below.
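The sketch below illustrates why released blocks do not immediately translate into released cloud capacity when several blocks share one object; the blocks-per-object ratio and map structure are hypothetical.

```python
# Sketch of the space-reclamation problem: blocks map onto larger cloud
# objects, so an object can be deleted (and its cost reclaimed) only once
# every block it holds has been released. All names are illustrative.
BLOCKS_PER_OBJECT = 256

live_blocks: dict[int, set[int]] = {}   # object id -> block offsets still live

def place_block(block_id: int) -> int:
    """Record a written block against the cloud object that holds it."""
    obj = block_id // BLOCKS_PER_OBJECT
    live_blocks.setdefault(obj, set()).add(block_id % BLOCKS_PER_OBJECT)
    return obj

def release_block(block_id: int) -> None:
    """Called when a host-side agent reports the block is unused."""
    obj = block_id // BLOCKS_PER_OBJECT
    blocks = live_blocks.get(obj)
    if blocks is None:
        return
    blocks.discard(block_id % BLOCKS_PER_OBJECT)
    if not blocks:                       # whole object is now garbage
        del live_blocks[obj]
        print(f"DELETE object {obj}")    # placeholder for provider DELETE
```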

4.4 DR AND BACKUP INTEGRATION

Because DR and backup workloads are an important use case for a gateway, it is appropriate to
integrate the gateway with existing mechanisms for performing these operations. Relevant backup
applications and protocols include Microsoft's VSS (Volume Shadow Copy Service), NDMP (Network Data
Management Protocol) and Symantec's OST (Open Storage Technology).

4.5 FILE SYSTEM INTEGRITY PROTECTION MANAGEMENT

Cloud gateways provide the ability to share a global file system that is made available to geographically
distributed datacenters. Some vendors, like Panzura, have touted this as one of their major features, and it
definitely distinguishes gateways from other traditional storage appliances. However, maintaining
file system integrity in the face of sharing between clients in different datacenters requires
mechanisms like global lock management and global file synchronization.


5 SOLUTION DEPLOYMENT
In this section, we will compare and contrast different deployment options for the cloud gateway
functionality in an enterprise datacenter.

5.1 LOCATION IN DATACENTER

The cloud gateway needs the best possible access to WAN connectivity within the datacenter. To a large
degree, both the throughput and the latency experienced in accessing cloud storage are dictated by WAN
characteristics. Of the two, throughput can be kept close to the available bandwidth by associating
with the appropriate physical datacenter of the cloud storage provider's network and by sufficient
parallelism in software (multiple open connections to the cloud storage provider). For latency, each extra
network hop in the local datacenter before reaching the WAN connection is detrimental to the overall
latency. Therefore, the cloud gateway should be placed in the network topology at minimal distance from
the WAN connection of the datacenter.

5.2 MERITS/DEMERITS OF AN APPLIANCE DEPLOYMENT

Some cloud gateway vendors package their functionality in a dedicated physical appliance. This
approach has many benefits: dedicated physical resources, performance isolation, fault isolation,
typically better control of performance, a leaner data path, etc. Of these advantages, the ones that
influence performance are prominent. Since the cloud gateway needs a write-back cache in most
models, the cache can be made reliable with very little effect on performance by using
specialized hardware (such as NVRAM, or high-speed interconnects to mirror contents to another
node). This approach is typical of many primary storage systems. In addition, having dedicated memory
and CPU resources just for the gateway functionality enhances predictability of performance.
There are many disadvantages to this approach as well. First, a dedicated physical appliance typically
comes at a higher cost. Second, deploying a physical appliance is more time-consuming and expensive
for the admins of a datacenter. Third, the opex component of a physical appliance, including
rackspace, cooling/heating costs and power, is not insignificant. Last but not least, for the
vendors manufacturing the systems, there are more dimensions to handle (suppliers/inventory control,
qualification) and a longer product cycle, resulting in a slower ROI.

5.3 MERITS/DEMERITS OF VM DEPLOYMENT

Most cloud gateway vendors offer their solution as a VM. This has been influenced largely by the market
into which they are positioning their solution. The MSE/SMB market has been the focus of many startup
vendors, and the VM solution is ideal for such cost-constrained environments, where the higher performance
of a dedicated appliance is not as important. Constrained in a VM, the cloud gateway is forced to share
resources: CPU, memory and storage devices. Moreover, the management of the VM has to be
integrated with the hypervisor's management processes and tools.
A VM-based deployment model has its own advantages. It is possible to deploy many smaller virtual
appliances in each host, such that the combined resources are greater than or equal to those of a physical
appliance. In addition, being closer to the application VMs allows the caches in the gateway VMs to be
more effective. Apart from forgoing the advantages listed for a physical appliance deployment, the key
disadvantage is that performing any global optimization that entails cross-gateway communication is
expensive and is avoided. For example, the cloud gateways can only perform deduplication within the I/O
streams originating at their own hosts and cannot deduplicate across gateways. Therefore, some
duplicates will find their way to the cloud storage, resulting in extra costs for the admins. Also, with a
VM-based deployment, performance expectations need to be appropriately calibrated.

5.4 USE-CASES

The different architectural models enable one or more enterprise use-cases for the cloud
gateways, and a single architecture could support multiple use-cases. The following are some important
use-cases for cloud gateway deployments.

5.5 CASE 1: BACKUP/COLD DATA

This is by far the most common case where cloud gateways are employed today: the cloud gateway is
used to back up data to the cloud. Typically, the backup copies consume a lot of storage because of
traditional policies - a full backup every week, with incrementals every day. In datacenters today, dedicated
storage appliances like Data Domain's disk-based backup systems are prevalent. With this model, there
are significant overheads: raw storage costs (in spite of deduplication), storage administration costs for
the systems, datacenter costs and provisioning/planning for storage growth. In addition, backup and recovery
software from Symantec or CommVault needs to be employed. Backing up to the cloud is cost-effective
in the local datacenter: minimal or no storage admins are needed, and storage needs can be met
dynamically.

5.6 CASE 2: PRIMARY DATA {CIRTAS, STORSIMPLE, NASUNI}

A small number of cloud gateway vendors position their product as a one-stop solution for all storage needs.
They claim that they can cache the most performance-critical working sets on their appliances (virtual or
physical), enabling primary datasets to be stored on the cloud gateway. These appliances are expected
to understand the data access properties of the different data entities (files, blocks or objects) stored on
them and transfer only appropriate ones to the cloud. In addition, they typically perform effective
prefetching from the cloud to avoid WAN latencies.
These vendors are careful not to position their appliances for highly latency-sensitive Tier-1 applications
like OLTP. They target Tier-2 application content like Exchange or SharePoint databases, whose
performance requirements they feel they can satisfy through careful analysis of data access properties.
Moreover, compared to OLTP datasets, Tier-2 applications typically generate more data; their
storage growth trends can therefore make moving them to the cloud economically viable.

5.7 CASE 3: DISASTER RECOVERY AND COMPLIANCE COPY

With this approach, the traditional tape-based, off-site DR copy is replaced by a copy in the cloud.
The datacenter is expected to have a local disk-based backup appliance for operational recovery; data is
retrieved from the cloud only when access to the datacenter is completely lost. In this use case, data is
sent to the cloud continuously but hardly ever read back.
A similar use case is keeping a copy in the cloud for compliance purposes. Sarbanes-Oxley and HIPAA
regulations force the respective verticals to maintain fine-grained data for prolonged periods of time, with
the ability to recover it when needed. Maintaining an off-site datacenter just for compliance reasons is
expensive. A cloud copy kept with a provider with reasonable reliability and availability guarantees is a
good option to keep costs low. Amazon's S3 provides different levels of reliability and availability, with a
matching cost spectrum, to enable such use cases.


5.8 CASE 4: REDUNDANT DATA BLOAT {NETFLIX USE}

In the rich-content space, there is a need to keep multiple copies of the same content at different
resolutions. A case in point is Netflix: they need to maintain multiple copies of their online
movie content at different resolutions, each appropriate for a different device on which the movies can be
played. Coupled with the number of movies they have, this leads to a storage explosion, even though not
all resolutions are in use at the same time. A similar case can be made for online photo repositories and
sharing services. In such cases, to accommodate storage growth, it makes sense to put such content on
an external cloud storage service and stream the content directly from there.


6 COMPETITIVE LANDSCAPE
6.1 CLOUD STORAGE GATEWAY ECOSYSTEM

In Table 1 below, we list the key products in this space with some of their main attributes, along
with their differentiators.

Table 1: Cloud gateway vendors, features and differentiators

| Vendor/Product Name | Form Factor | Use-case Focus | Block/File | Supported Clouds | Key Differentiators |
| Arkeia | Hardware appliance | Data protection only | File | S3/Arkeia cloud | Integrated backup/DR, source-based dedupe |
| Axcient | Hardware appliance | Data protection only | File | Axcient cloud | Integrated data protection and business continuity |
| CTERA | Hardware appliance + backup agents | Data protection, file sharing | File | S3, EMC Atmos, Rackspace, Hitachi HCP, Mezeo, Scality, Nirvanix, IBM GPFS, Dell DX/Caringo | All-in-one |
| Egnyte | PC agent, hardware or virtual appliance | Cloud file server, file sharing, file backup | File | Egnyte cloud only | Ease of use; centrally managed cloud file server with local edit and offline access |
| Gladinet | Software | Cloud desktop, cloud server | File | Mezeo, S3, AT&T Synaptic Drive, Internap, Google, Box.net, OpenStack, Nirvanix, Rackspace CloudFiles, Azure | Wide choice of storage clouds, low cost |
| Hitachi Data Ingestor | Hardware appliance with HA | Data protection, archiving (private cloud) | File | HDS Hitachi Content Platform only | Centrally manage and control data at the edge |
| MS i365 | Software, hardware | Data protection | File | i365 cloud | Range of services, Microsoft DPM integration |
| Nasuni | Hardware or virtual appliance | Primary NAS, data protection | File | S3 | 100% uptime SLA |
| Nirvanix CloudNAS | Software feature | Cloud filer, sharing | File | Nirvanix only | Free of charge, ease of use |
| Panzura Alto Cloud Controller | Hardware appliance (with HA) or virtual appliance | Primary, collaboration, archiving | File | S3, Limelight CDN, Microsoft Azure, AT&T Synaptic Storage, Nirvanix | Global namespace, global data replication and locking, global deduplication |
| Riverbed Whitewater | Hardware or virtual appliance | Data protection only | File | S3, Nirvanix, AT&T Synaptic Storage | Experience in WAN bandwidth/latency optimization |
| Seven10 StorFirst EAS | Software | Archiving only | File | EMC Atmos, AT&T Synaptic Storage, Dell DX6000 + others | Multi-vendor, multi-platform and multimedia archiving |
| StorSimple | Hardware appliance (with HA) | Primary, secondary, data protection | Block | AT&T Synaptic Storage, S3, EMC Atmos, Microsoft Azure | Microsoft and VMware certification |
| TwinStrata CloudArray | Hardware or virtual appliance (with HA) | Secondary, data protection | Block | S3, EMC Atmos, Mezeo, Scality | DR anywhere, compute anywhere |
| EMC Atmos GeoDrive | Software feature | Cloud filer, sharing | File | Atmos | Ease of use, integration with Atmos |
As can be seen, the preferred use cases for most of these cloud gateway products are data protection
and secondary storage; very few explicitly target the primary space. Also, as of now, S3 seems to
be the cloud storage vendor of choice, while EMC is partnering with many of these vendors to fuel
Atmos cloud deployments. A notable feature is that, with a few exceptions, most of them handle data at file
granularity. For vendors that focus specifically on data protection, it is reasonable to handle data at the level of
files; however, for vendors offering cloud gateways for primary, collaboration and sharing workloads,
the rationale for file-level granularity needs to be established. In the rest of the section, we present
more details on a few select products, each representing one particular type of cloud gateway
architecture.

6.2 NASUNI

The Nasuni Filer is an on-premise storage device that serves as a cache to the cloud storage, where the
primary copy of the data resides. It is available both as a virtual machine (on VMware or Microsoft's
Hyper-V servers) and as a physical appliance. It supports NFS and CIFS, with full integration with Active
Directory, DFS and older Windows versions. Key differentiating features:

Performance with unique caching algorithms: Active data is stored in the cache for local storage
performance, while relatively inactive data, typically bulk file date, is stored off-site in the cloud
storage. For this offsite data, all metadata is cached. Upon a cache miss, the file requested is
returned in chunks so that users can access the data without waiting for the whole file to be brought
into the local cache. There are special algorithms to handle metadata versus data so that the system
is responsive when the user is scanning directory listings and browsing his folders. Being file-based

Cloud Storage Gateways

NetApp Confidential

6.3

and not block-based, the appliance prefetches the rest of the file upon the first access. It also does file-type-aware differentiation; for example, it exploits the fact that Word documents are much more likely to be updated and accessed than ZIP files. The cache size is configurable based on the user's working data set.
Synchronous snapshots with fast restores: Snapshots stored in the cloud capture the filesystem at user-defined points in time, providing versioning of data and eliminating the need for local backups. Because metadata is cached and restores happen in chunk units, users can access data immediately. Snapshots are deduplicated at file level and compressed.
Non-disruptive cloud-to-cloud migration: Downtime to move terabytes of primary storage data from one cloud storage provider to another is only on the order of a few minutes.
Global Multi-Site Access: Nasuni allows multiple of its storage controllers to have live access to the same volume of snapshots. It provides two-way synchronized read/write, so workers who move from office to office are ensured fast access to local data. Virtual and hardware forms of the appliance are interchangeable.
Most stringent SLAs: Nasuni guarantees 100 percent data availability and accessibility, with significant penalties if the services are unavailable even for just a few minutes.
Stress Tests to qualify Cloud Providers: Through rigorous and ongoing testing, Nasuni chooses only the highest-performing cloud providers. It tests for the performance, stability, availability and scalability that organizations need to take advantage of the cloud for primary storage, data protection and disaster recovery.
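The chunked read-and-prefetch behavior described under the caching feature can be made concrete with a small sketch. This is a minimal, hypothetical model, not Nasuni's implementation: cloud_fetch(), the 1 MiB chunk size and the cache structure are all assumptions made for illustration.

import threading

CHUNK_SIZE = 1 << 20  # assumed 1 MiB chunk size

def cloud_fetch(path, chunk_index):
    """Placeholder for a ranged GET against cloud object storage."""
    return b"\x00" * CHUNK_SIZE

class ChunkCache:
    def __init__(self):
        self.chunks = {}              # (path, index) -> chunk bytes
        self.lock = threading.Lock()

    def read_chunk(self, path, index, total_chunks):
        with self.lock:
            cached = self.chunks.get((path, index))
        if cached is not None:
            return cached             # cache hit: local-storage performance
        # Cache miss: fetch and return just the requested chunk so the user
        # does not wait for the whole file...
        data = cloud_fetch(path, index)
        with self.lock:
            self.chunks[(path, index)] = data
        # ...then prefetch the rest of the file in the background, since the
        # appliance is file-based and expects the rest to be read soon.
        t = threading.Thread(target=self._prefetch,
                             args=(path, index + 1, total_chunks), daemon=True)
        t.start()
        return data

    def _prefetch(self, path, start, total_chunks):
        for i in range(start, total_chunks):
            with self.lock:
                if (path, i) in self.chunks:
                    continue
            data = cloud_fetch(path, i)
            with self.lock:
                self.chunks[(path, i)] = data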

6.3

PANZURA

Panzura offers Application Cloud Controller (ACC) appliances that can serve as primary storage, collaboration and file sharing at branch offices, and for backup and archiving use cases as an alternative to off-site tape. It is available in virtual machine and hardware (1U/2U) form factors and currently supports only the NAS-based protocols NFS and CIFS. Key differentiating features:

Targeting Large Enterprises: Sells to the high end of the enterprise market.
Application Network Storage (ANS): Revisits traditional network-centric storage by focusing on the application and its data usage pattern. Includes deep packet inspection and acceleration, WAN optimization, deduplication, encryption and offline file access.
Global namespace and file system, global replication, global block-level deduplication and global dynamic lock management: A unified file system spans multiple physical sites. Metadata is separated from data, and this smaller metadata repository is quickly replicated to all of the ACC appliances, giving all nodes visibility for search/browse operations with response times like those of local file systems. Replication can be done either directly between appliances or through a cloud provider. When a user requests to open a file that isn't stored locally, its data moves to the top of the replication queue. Administrators can set replication policies to preload folders to specific locations. Global locking keeps track of file accesses and grants write access to the first user requesting a file, providing read-only access to subsequent requestors. This enables shared read/write access to data for users in different locations without accidentally creating file corruption. No administrator intervention is required to maintain write-order fidelity and atomic file synchronization. Upon a WAN failure, users get read/write access to files that were last modified locally and read-only access to files last modified in another location. The appliance can also write simultaneously to multiple third-party clouds.
Application Integration (with SharePoint): Uses EBS (External Blob Storage) to split BLOBs from the back-end SQL database. The local ACC appliance intercepts and serves requests sent to the SharePoint server. Also supports Symantec's NetBackup application.

High-Speed SSDs for Performance Acceleration: Up to 12 SSDs hold frequently used data (a front-end cache) to support multiple concurrent users. Admins can assign tiering policies so that, for example, .VMDK files get the performance of flash while ARCHIVE.PST files stay on disk.
High Availability with redundant components: RAID-5 or RAID-6 protection, hot-swappable drives, redundant power supplies and fans.

6.4

STORSIMPLE

StorSimple separates off the top two tiers of an enterprise storage array, the SSDs and fast SAS disk drives, and puts them in a 3U hardware appliance, with the rest of the array (the bulk data storage part) replaced by the cloud. It offers the Armada hybrid cloud storage appliance as a primary storage alternative to conventional block storage systems in midsized companies (~500 users) and departments within enterprises. The iSCSI-based appliance is positioned as all-in-one primary storage, archive, backup/recovery and DR in a single box.

Four Storage Tiers: SSD, linear (raw, tier 1); SSD, deduplicated (tier 2); SAS, deduplicated and compressed; cloud, deduplicated, compressed and encrypted.
Weighted Storage Layout (WSL) and BlockRank Algorithm: Figures out what data is relevant to an application over a period of time and makes sure these hotspots (the working set) stay in the StorSimple appliance while colder data goes out to the cloud. Transparently moves data across tiers of storage to optimize performance and cost; for example, crossing an 85% utilization threshold causes spilling downward to a lower-cost, lower-performance tier. WSL is automatic, dynamic and operates in real time. Data is carved into variable-length blocks. WSL works at block level and uses the "BlockRank" to order blocks in terms of their usage patterns, frequency of use, age, reference counts and the relationships segments have with each other, to find the right storage tier (a sketch of this style of tiering appears after this feature list). Spilling can be controlled by a per-volume priority setting (local-preferred, normal or cloud-preferred).
Application integration and application-specific optimization for Microsoft SharePoint and Exchange 2010: Has application-optimization plug-ins that maximize performance on a per-volume basis. Leverages Microsoft's EBS and RBS APIs with SharePoint, wherein the SQL Server database is always stored on SSD whereas the content, including BLOBs like audio, video and CAD drawings, can be spread over SSD, SAS drives or the cloud. With Exchange, leverages deduplicated primary storage and the cloud to support DAG, increased mailbox quotas and PST centralization. Can recover individual items or full mailboxes.
High Availability: Offers dual controllers for enterprise-grade HA, redundant power supplies and network connections, and no single point of failure. Also supports non-disruptive software upgrades. Certified by Microsoft and VMWare.
Others: Concurrent inline block-level dedupe using variable-length subblock segmentation. Cloud Snapshots for data protection and backup/recovery; Cloud Clones for off-site backup, geo-replication, DR and tape replacement; and Cloud Bursting for compute.
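To make the tiering idea concrete, the following is a hedged sketch of usage-based block scoring and threshold-driven spilling in the spirit of WSL/BlockRank. The scoring formula, tier list and spill policy here are illustrative assumptions; StorSimple's actual algorithm is proprietary.

TIERS = ["ssd_linear", "ssd_dedup", "sas", "cloud"]   # assumed tier order
SPILL_THRESHOLD = 0.85   # spill downward past 85% utilization

def block_score(frequency, age_seconds, refcount):
    # Hotter blocks score higher: used often, used recently, referenced a lot.
    recency = 1.0 / (1.0 + age_seconds)
    return frequency * recency * (1 + refcount)

def blocks_to_spill(tier_blocks, tier_capacity):
    """tier_blocks: list of (block_id, score). Returns the coldest blocks to
    demote to the next tier once utilization crosses the threshold."""
    limit = int(SPILL_THRESHOLD * tier_capacity)
    if len(tier_blocks) <= limit:
        return []
    coldest_first = sorted(tier_blocks, key=lambda b: b[1])
    return [block_id for block_id, _ in coldest_first[:len(tier_blocks) - limit]]

# Example: a 4-block tier at an 85% threshold spills its coldest block.
print(blocks_to_spill([("a", 9.0), ("b", 0.1), ("c", 4.2), ("d", 2.5)], 4))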

6.5

CTERA

CTERA offers a Cloud Attached Storage solution for SMBs and enterprise branch offices that combines secure cloud storage services with on-premises appliances and managed agents in an all-in-one solution for backup and recovery, shared storage and file-based collaboration. Its products are all in hardware appliance form, ranging from the consumer-centric CloudPlug to the enterprise-grade C800 8-bay appliance. It supports file-based protocols such as NFS and CIFS.

Tiny form factor offering: CloudPlug is a plug-top computing device that instantly transforms any external USB/eSATA drive into a NAS device with automatic secure cloud backup without the need
for any user intervention or PC client software, and allows remote access, file sharing and
synchronization.
Next3 File System for Thin-Provisioned Snapshots: Developed on top of ext3, this creates snapshots using dynamically allocated space, so there is no need to pre-allocate and waste valuable space on the disk, and unused space is automatically recovered for file system use. It works by creating a special, sparse file (that takes no space at the outset) to represent a snapshot of the filesystem. When a change is made to a block on disk, the filesystem first checks whether that block has already been saved in the most recent snapshot. If not, the affected block is moved over to the snapshot file, and a new block is allocated to replace it (a toy model of this mechanism appears after this feature list). Writes take a little longer due to the need to move the old block. Over time, this fragments the contiguous on-disk format that ext3 tries to create, affecting streaming read performance.
File and disk level backup: Backups can be stored both locally and in the cloud. Individual file backup/restore as well as incremental, disk-level (bare-metal) backup of live servers is possible for entire-system recovery. Supports built-in and custom-created "Backup Sets", where each represents a group of files of certain types and/or located in specific folders. Block-level as well as partial-file deduplication is done.
Full Remote Management: All aspects of CTERA's solution can be managed remotely, with no on-site presence or intervention. Only a Web browser is required to access and configure every feature, including firmware updates, real-time monitoring and event notifications.
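The Next3 snapshot mechanism described above can be modeled in a few lines. This is a toy illustration of the move-on-write idea only; block allocation, the on-disk layout and the sparse-file representation in real Next3 are far more involved.

class Next3Like:
    """Toy move-on-write snapshot: old blocks are moved into a sparse
    snapshot map the first time they change after a snapshot is taken."""
    def __init__(self, nblocks):
        self.blocks = [b""] * nblocks   # the live filesystem blocks
        self.snapshot = {}              # sparse: empty at snapshot creation

    def take_snapshot(self):
        self.snapshot = {}              # takes no space at the outset

    def write(self, index, data):
        if index not in self.snapshot:                 # not yet preserved?
            self.snapshot[index] = self.blocks[index]  # move old block over
        self.blocks[index] = data                      # new block replaces it

    def read_snapshot(self, index):
        # Unchanged blocks read through to the live filesystem.
        return self.snapshot.get(index, self.blocks[index])

fs = Next3Like(4)
fs.write(0, b"v1")
fs.take_snapshot()
fs.write(0, b"v2")
assert fs.read_snapshot(0) == b"v1" and fs.blocks[0] == b"v2"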

6.6

AMAZON AWS CLOUD GATEWAY

This beta offering from Amazon is a service connecting an on-premises software appliance with cloud-based storage to provide seamless and secure integration between on-site IT environments and AWS's cloud storage infrastructure. It is a virtual appliance running atop the VMWare ESXi hypervisor on a physical machine, with 7.5GB of RAM for the VM and 75GB of local disk storage (DAS or SAN). It exposes an iSCSI-compatible interface. It complements on-premises storage by preserving low-latency performance while asynchronously uploading the data to Amazon S3.

Versioned, compressed EBS snapshots: The gateway proactively buffers writes temporarily on on-premises disks before compressing and asynchronously uploading them to Amazon S3, where they are encrypted and stored as an Amazon EBS snapshot (a sketch of this buffer-and-upload flow appears after this feature list). Each snapshot has a unique identifier for point-in-time recovery: it is mounted as a new iSCSI volume on-premises, and the volume's data is loaded lazily in the background.
Backup, DR, Workload Migration Use Cases: Provides low-cost offsite backup using snapshots. Amazon S3 redundantly stores these snapshots on multiple devices across multiple facilities, quickly detecting and repairing any lost redundancy. If on-premises systems go down, users can launch Amazon EC2 compute instances, restore snapshots to new EBS volumes and get the DR environment up and running with no upfront server costs. To leverage Amazon EC2's on-demand compute capacity during peak periods, or as a more cost-effective way to run normal workloads, the gateway can be used to move compute to the cloud by mirroring on-premises data to Amazon EC2 instances. It can upload data to S3 as EBS snapshots, from which new EBS volumes can be created using the AWS Management Console or Amazon EC2's APIs, and attached to EC2 instances.
Monitoring Metrics via Amazon CloudWatch: Provides insight into on-premises applications' throughput, latency, and internet bandwidth to S3.
Bandwidth Throttling: Can restrict the bandwidth between the gateway and the AWS cloud based on a user-specified rate for inbound and outbound traffic.
Gateway-Cached Volumes: Future support, wherein only a cache of recently written and frequently accessed data will be stored locally on on-premises storage hardware, and the entire data set will be in the cloud. Fits the cloud-as-primary-storage case, with low access latency to active data only.
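The buffer-then-upload flow described in the first feature can be sketched as follows. This is a simplified model under stated assumptions: upload_snapshot() is a placeholder, and the real gateway's snapshot format, compression and scheduling are Amazon's own.

import queue
import threading
import zlib

write_buffer = queue.Queue()   # stands in for the on-premises disk buffer

def handle_write(volume, offset, data):
    write_buffer.put((volume, offset, data))   # acknowledged immediately

def upload_snapshot(volume, offset, blob):
    pass   # placeholder: the real gateway uploads to S3 as EBS snapshots

def uploader():
    while True:
        volume, offset, data = write_buffer.get()
        blob = zlib.compress(data)              # compress before upload
        upload_snapshot(volume, offset, blob)   # async, off the write path
        write_buffer.task_done()

threading.Thread(target=uploader, daemon=True).start()
handle_write("vol1", 0, b"application write" * 100)
write_buffer.join()   # in the real gateway, writes drain continuously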


7 MARKET ANALYSIS
In this section, we will provide an overview of the cloud storage market and its relation to the cloud market as a whole. Specifically, we will focus on the growth of cloud service providers that offer Storage as a Service (StaaS) in terms of both revenue and capacity. Within the context of StaaS, we will analyze the market for cloud gateways and their impact.

7.1

STAAS MARKET OVERVIEW

Among the StaaS providers, AWS S3 is an established leader but far from the only player. Today, most public cloud providers (telcos, managed service providers, hosting specialists, etc.) have some form of cloud storage offering, either on a stand-alone basis or as part of a broader Infrastructure as a Service (IaaS) or Platform as a Service (PaaS) capability. Therefore, cloud storage is starting to become a material market from a revenue perspective.
With S3, as shown in Figure 6, we can see that the number of objects is growing at a staggering rate, approaching 1 trillion objects. However, the distribution of the sizes of these objects is not available, and neither is the growth in paid object storage.

Figure 6: Growth in number of objects in S3 (Source: Amazon)

Figure 7: Growth of StaaS market (Source: The 451 Group)

Figure 8: Segmentation of StaaS capacity

From a revenue perspective, as per the 451 Group's Market Monitor Service, the StaaS market generated $388m in 2010 and will grow at a CAGR of 25% to reach $1.18bn in 2014 (see Figure 7). These figures are for business-centric cloud storage, not consumer cloud storage. A significant point is that, compared to all other cloud-based services, this sector has been experiencing the highest growth.
Probing a bit further, we break the StaaS market into three segments: stand-alone cloud storage, online backup and archiving. As Figure 8 shows, an overwhelming portion of cloud storage objects are backup streams (online backup) and archival datasets. We expect this trend to continue at least in the near future. In light of this observation, the role of dedicated backup appliances in the datacenter is expected to diminish. More backup objects will move to the cloud, provided the throughput requirements for the backup streams are met when the target is cloud storage.

Application Areas | July 2011 | April 2011
Email Systems | 39% | 35%
Customer Relationship Management (CRM) | 35% | 29%
Document and Enterprise Content Management (ECM) | 22% | 17%
Collaboration Tools | 22% | 22%
Business Intelligence / Reporting and Analytics (BI) | 21% | 14%
Disaster Recovery/Failover | 20% | 20%
B2B e-Commerce (Business to Business) | 17% | 16%
Enterprise Resource Planning (ERP) | 11% | 7%
Test and Development | 11% | 9%
Supply Chain Management (SCM) | 10% | 6%

Table 2: Current cloud usage (Source: ChangeWave, Corporate Cloud Computing Trends, Aug 2011)


In addition, to understand the relative adoption of public cloud storage services by different applications, we present in Table 2 the survey results published by ChangeWave Research in their Corporate Cloud Computing Trends report. The exact survey question was: "Which of the following areas does your company currently support applications that run on public cloud computing services? (check all that apply)". As seen in the table, respondents picked a wide range of uses for cloud computing. Although adoption is heavily skewed towards cloud email and CRM systems, which are compute-intensive with less of a storage element, other storage/data-centric applications have been cited, like disaster recovery, BI and ECM.
All of this data illustrates that stand-alone external cloud storage is yet to take off as a significant enterprise market in its own right. To encourage larger businesses to adopt cloud storage, some challenges need to be overcome. The key challenge is to address the performance limitations of accessing cloud storage over the WAN. Though latency is the key metric that would be affected, given the nature of the data stored (predominantly backup/archival streams), we notice that reduction in throughput is also cited as a big concern. Therefore, solutions that enable cloud storage access for enterprises have to focus on both throughput and latency.
In addition, another survey, by InfoPro, pointed out that the top inhibitor of cloud adoption is the change/learning involved, over and above expected concerns like security and cost. This indicates that solutions that enable near-seamless access of applications to the cloud stand to benefit. Admins do not want to change their enterprise applications to support a new cloud protocol (as opposed to standard storage protocols such as NFS/CIFS/iSCSI) in order to leverage the benefits of cloud storage. Also, cloud storage is viewed as restricting flexibility: once data is in the cloud, it is very difficult to migrate it out, a different form of vendor lock-in. In light of these concerns, an ideal cloud gateway solution that provides a pathway to move enterprise application data to the cloud should provide relatively good performance, seamless integration with existing apps, default access to multiple clouds and the ability to move in and out of clouds. We will probe deeper into the existing cloud gateway market next and examine existing vendors' solutions from the perspective of these requirements.

7.2

CLOUD GATEWAY MARKET

7.2.1

Workload Analysis

Before we examine the nascent cloud gateway market, it would be useful to assess cloud gateways from the perspective of the workloads they can support. In Figure 9, we show the different workloads that can be relevant for cloud gateways, as per Gartner. This picture splits the workloads into two categories: gateway reality and gateway potential. In addition, there is a third category, labeled Tier-1 workloads (such as OLTP and financial databases), that are latency-sensitive and expectedly kept out of both these categories.
With regard to the gateway reality, Tier-3 workloads such as backup/DR and file and email archives have been the focus of existing vendors. This picture from Gartner also includes home directories as well as file distribution workloads under Tier-3. From our experience at NetApp, these are considered primary workloads by many of our customers and a prime use case for NetApp's FAS systems. Moreover, we do not see the gateway being deployed for these workloads in our analysis. So, we do not believe that these two workloads fit in the gateway reality category; they should be classified under gateway potential. For backup/DR workloads, NetApp does not sell a competitive product in the dedicated backup appliance space (like EMC/Data Domain's BRS). Therefore, having a cloud gateway solution would present an opportunity to have a solution in this space, albeit with the storage residing in a public cloud.


Figure 9: Cloud Gateway workloads (Source: Gartner)

Under gateway potential, we see the typical Tier-2 applications, mainly primary workloads: email, collaboration, workgroup files and development/test. These workloads are characterized by applications that care about latency and throughput but can tolerate some variation in both. From NetApp's perspective, it would be useful to understand our share of this tier to assess whether cloud gateways represent a threat to an existing sales segment.

Figure 10: NetApp share based on workloads. (The figure breaks down workload segments, including DAS with server-attached storage, active and dark archives, enterprise content depots, file services/home directories, virtualized and non-virtualized SVI collaboration/app dev/Tier2-BP/IT and web infrastructure, big analytics, Tier1-BP and DSS/DW, by FY11 size, FY16 size, FY11-16 CAGR, FY11 NTAP share and FAS/E-Series revenue; overall, NetApp holds a 15.4% share of the open networked storage market.)

To understand the impact of cloud gateways on NetApp's revenue streams, Figure 10 shows the breakdown of our current revenue based on both products and workloads over the different financial years. It is clear that the bulk of the revenue is obtained from two broad segments: a) collaboration, app development, Tier-2 business processing applications and web infrastructure, with both the virtualized and non-virtualized segments contributing $3.6B towards total revenue; and b) file services and home directories, contributing $1.4B to the total revenue. As can be seen, a significant portion of these overlap with the workloads that can potentially be satisfied with a cloud gateway (as seen in Figure 9). This is the clear potential threat to NetApp's current revenue generation model from the emerging cloud gateway vendors.

7.2.2

Cloud gateway vendors' share

Since this market is very nascent, most cloud gateway vendors do not have more than a dozen customers. Some of the vendors, like Panzura, sell their solution exclusively to enterprises; in spite of the small number of customers, each customer might lead to bigger revenue. On the other side of the spectrum, there are vendors like CTERA who have a consumer focus and a larger number of customers. StorSimple and others like them fall in between these two extremes. In Figure 11, we show an approximate number of customers for many of these cloud gateway vendors, along with their focus and investors. These numbers indicate that this market is quite nascent. However, the emergence of Amazon's AWS gateway and other gateways from established enterprise storage system vendors might change the landscape significantly.

Figure 11: Cloud Gateways, customers and investors

From this figure, we can also see that more than $100m of VC funding has been infused into this space. Most vendors have an SMB or MSE focus, with some notable failures as well (Cirtas). There is a high-level consensus among different analysts, like Gartner, ESG and the 451 Group, in their studies of this market:
a. Current vendors are mainly startups focusing on SMBs/MSEs with backup/archival as the primary use case.
b. Most vendors have significant disadvantages compared to enterprise storage systems today: lack of standard high-availability/reliability features and enterprise readiness (untested file systems). Only a few of them have HA as a default option.

c. Established enterprise storage vendors (like EMC or NetApp) can develop solutions that augment their existing solution portfolio, i.e., make cloud storage an extra tier as well. In addition, they are not constrained by many of the disadvantages of the smaller competitors, especially enterprise readiness. Once one of the established storage vendors has a solution, the analysts feel that cloud gateways would be adopted faster in enterprise data centers.

7.3

CASE STUDIES

7.3.1

MedPlast

MedPlast provides thermoplastic and elastomer molding products and related services to the healthcare, pharmaceutical and certain consumer/industrial markets. The company has about 600 employees across five sites and is approaching $100 million in revenue. It fits clearly in the MSE space that many gateways are targeting.
MedPlast's IT department consists of just one person running all IT-related operations. Accordingly, it has a heavy emphasis on outsourcing IT processes and applications, retaining only critical systems under direct control. The company already uses SaaS and hosted applications (like Rackspace-hosted Exchange). However, it generates a lot of critical data internally, mainly manufacturing- and engineering-related, and used a tier-one Hitachi SAN for this data. MedPlast was using EMC's Data Domain target arrays for backup/recovery via Veeam application instances, with tape backups for off-site DR.
Growing data volumes and the corresponding Data Domain upgrade cycles led MedPlast to explore cloud storage for backup/recovery. This led them to cloud gateways, as gateways offer both primary storage and a means to back up into cloud storage, eliminating tape backup in the process.
MedPlast worked with Cirtas but replaced them with StorSimple. StorSimple offered them an iSCSI interface to their applications and enterprise-grade capabilities (specifically high availability), and was a certified VMWare and Microsoft partner (vendors MedPlast uses extensively). Regarding the cloud storage provider, they chose Amazon S3 because of its ability to geo-replicate for no extra charge.
The StorSimple deployment at MedPlast consists of an HA pair, with 10TB of data local and 4TB in the cloud. These storage systems are used by MedPlast as primary storage for mission-critical applications and other Tier-2 apps: ERP, SharePoint and file servers. Therefore, usage of Tier-1 storage has reduced considerably. StorSimple also enables simpler backup and DR operations and eliminated tape backups entirely. In addition, snapshots are available via StorSimple for local recoveries.
In summary, MedPlast was able to solve all of its storage requirements (primary, backup and DR copies) necessitated by growth via StorSimple.

7.3.2

NYU Langone Medical Center

NYU Langone Medical Center comprises the NYU School of Medicine and three hospitals, and has a threefold mission: patient care, biomedical research and medical education. The storage-engineering department at the center has faced numerous challenges in recent years, which led it to evaluate cloud-based options. The department had a four-tier storage strategy that it was looking to squeeze more efficiencies out of: tier-1 was a high-performance SAN running on EMC VMAX; tier-2 was IBM XIV; tier-3 was NAS (Windows-based file server clusters mapped to SAN); and tier-4 was off-site storage for archive/retention and setup.
Key challenges included data growth on the order of tens of terabytes, such that investing in individual storage systems was no longer feasible. Also, the center is planning to move its primary datacenter (currently housed in an IBM-hosted and managed facility, where it has about 70TB of tier-1 storage) to reduce operational costs for storage and data retention.


They evaluated systems from IBM and EMC, but neither could offer performance or cost advantages for tier-4 storage. They started looking at cloud storage providers and settled on Nirvanix for the off-site cloud component, primarily based on cost. Nirvanix was able to offer storage at $0.15/GB/month, including unlimited data movement into and out of the cloud, versus the $0.87/GB/month the center was paying for IBM storage. With Nirvanix, it needed a means to send data to the cloud that met its performance requirements; this effort led to investigating cloud gateway options. Key requirements included performance and scalability to hundreds of users as well as high availability, seamless/transparent data access (over CIFS and NFS) and efficiency (dedupe and compression). They settled on Panzura's Alto Cloud Controller after evaluating a number of options. They have recently finished a pilot deployment with two 20TB controllers, with plans to move into production by the end of 2011. Panzura's selection was based on its ability to have a large front-end cache with SSDs, which allowed it to scale to hundreds of concurrent users. Panzura's global namespace was a factor too: it enables their infrastructure to grow incrementally by adding gateways at remote locations with a single namespace and point of management.
The center is initially using Panzura for its research department, replacing low-end NAS units that individual researchers had purchased in the past. But it quickly realized that other potential workloads, including archival workloads (it currently archives to EMC Centera via Symantec's Enterprise Vault), can move to the gateway. It is also considering Panzura/Nirvanix to replace its tape-based backup system. According to the center, Panzura's snapshots and Nirvanix's replication functions effectively remove the need for backup. In all, 100TB of local data can move to the cloud.
The center is aware of Panzura's shortcomings in its essentials list: richer Active Directory integration; end-user snapshot-based recovery; and global read/write for NFS as well as CIFS. However, it believes the Panzura/Nirvanix combination could form an integral part of its next-generation datacenter buildout. Initially, Nirvanix was used only for tier-4 storage, but with a Nirvanix hNode deployed in the local datacenter, the center is contemplating using Panzura/Nirvanix for both tier-4 and tier-3. Panzura would be used to front-end the hNode appliance and export a global namespace.

7.3.3

Seneca Data

Seneca Data is an IT value-added distributor and custom systems manufacturer that focuses on building integrated systems and related services for resellers and OEMs, spanning servers, desktops/laptops and storage. Its business is split into three divisions: partnering services, engineering services and life-cycle management services.
The company has been experimenting with cloud-based technologies and services via its CLOUDeCITY (www.cloudecity.com) service, a marketplace for its 3000-4000 resellers to offer cloud services to SMBs and others looking for web-based tools and applications to enable sales, marketing and operational tasks. Seneca believes that CLOUDeCITY offers resellers a recurring revenue model without much investment in a complex and expensive infrastructure. The portal started in 2011, offering modest services: CRM (based on SugarCRM), website/email hosting and blogging tools. However, interest has grown, and there is demand for higher-functionality tools such as business productivity, finance and even specific, vertically oriented tools like healthcare tools.
As part of this expansion, Seneca formed a partnership with CTERA to add managed storage and data-protection services based on CTERA's cloud storage gateway. Although Seneca markets an online backup service called DataMend for users that require a more customized offering, it found value in CTERA because of its simplicity and ease of use (plug-and-play functionality).
The partnership initially focused on CTERA's CloudPlug, offering shared local storage, cloud backup, snapshots and browser-based file access aimed at consumers and small-business users. However, it has since added the CTERA C200 and C400 appliances, which offer 6TB and 12TB of local storage, respectively, as well as integration with the CTERA cloud for backup, remote access and collaboration. Seneca has plans for larger appliances from CTERA in the near future. Currently, CTERA charges a monthly fee for the CloudPlug, C200 and C400 appliances, which increases with more storage. In


addition, CTERA has workstation agents that can back up and recover Microsoft Exchange, SQL and Active Directory data. As of now, Seneca claims its customers are pushing around 30% of their locally backed-up data to the cloud.
7.3.4

Energy-industry customer

One of Panzura's early wins was with a large energy-industry customer that deployed Panzura's Alto Cloud Controllers to create a private cloud storage alternative to off-site tape repositories. Prior to the Panzura deployment, the customer used tapes to move older seismic data to an off-site repository, which created potential data-leakage vulnerabilities, since the tapes were not encrypted.
The customer creates 6PB of data per year, consisting of seismic trace files that are a few hundred terabytes in size. Prior to its Panzura hybrid cloud deployment, data-restoration jobs could potentially take weeks to accomplish using the off-site tape repository, a process that takes only a few hours using cloud storage.
Beyond backup, and more importantly, the customer is using the cloud to keep more of its data set online and could potentially use hybrid cloud storage to extend data access to its partners. The company uses BlueArc and NetApp NAS systems to hold live data while offloading older data sets to the off-site tape archive. Storage costs for the customer were reduced from $6m to $1.2m per year, with the hybrid appliance and private cloud storage resources eliminating the need to purchase additional NAS systems. The backup-replacement solution in the cloud costs them $0.5/GB versus $2/GB for data held on off-site tape. Also, to share their large data sets with partners, they currently ship the NAS systems to them; with Panzura gateways, the data can instead be shared remotely from the private cloud. They keep their data in a private cloud, as opposed to a public cloud, due to security concerns and do not plan on changing that anytime soon. Panzura's compression capability helps reduce the size of the nearly 6TB of data the customer needs to transfer nightly. Also, Panzura's security certificates and key management mechanisms were essential features expected by this company.

7.3.5

Psomas

Psomas is a 500-person civil-engineering firm based in Los Angeles, serving public and private clients in the transportation, water, site development, federal and energy markets. The company has 10 offices plus a datacenter spread across the western US, including locations in California, Arizona and Utah.
By 2010, Psomas was struggling to meet its recovery objectives through its existing tape-based backup infrastructure. Partly due to a long-standing policy to back up everything, it could not meet its backup windows. Moreover, it was running into tape-media upgrade and reliability issues, and was also finding that maintaining tape-based backup infrastructure at each remote office was increasingly difficult to justify.
The Psomas team started exploring disk-based backup alternatives, including EMC's Data Domain-based backup-to-disk with deduplication, plus options from Symantec's and Iron Mountain's online backup services. The latter two were too expensive to begin with, and Data Domain's solution had high up-front capex costs.
Meanwhile, Psomas was exploring a new storage project from an existing vendor, Riverbed. It was one of the early beta customers for Riverbed's Whitewater product before converting into a full production customer in early 2011. It is now running the appliance in all of its locations, almost exclusively as virtual appliances, and has transitioned away from tape-based backup completely. Psomas leverages the Whitewater appliance to back up to local disks for operational recovery and uses the cloud for DR purposes. As part of this move, Psomas selectively backs up only certain data: CAD files, office documents and databases. Also, Psomas moved to a hosted Exchange service, so it need not back up email anymore.
Psomas uses Amazon's S3 as its cloud storage backup target and currently has 12TB of backup data in the cloud (with dedupe ratios of 20:1). With this approach, Psomas does not worry about running out of


capacity or upgrading hardware. Amazon's very high reliability (eleven nines) and availability (four nines) are important assurance metrics for the firm.
Overall, Psomas has high confidence in its backup infrastructure, with very fast restores. Moving to Whitewater is expected to save the firm around $80,000 per year across capex and opex. In addition, it is exploring leveraging Amazon's EC2 to run its Autodesk CAD application to enable cloud bursting during busy periods.


8 NETAPP DIFFERENTIATION
In this section, we will discuss the various advantages that NetApp enjoys in comparison to other cloud gateway vendors, advantages which enable significant opportunities.

8.1

UNIQUE ADVANTAGES OF HAVING A NETAPP SOLUTION IN THE MARKET

NetApp exploits different types of media in its storage systems to deliver compelling value to its customers at a low TCO. This includes low-cost SATA drives that are made enterprise-ready by providing reliability features in software over and above the raw disks. Cloud storage can be perceived as one such medium with unique characteristics: effectively infinite capacity; low cost of maintenance/management; SLOs defined by cloud service providers (performance, reliability, etc.); poor performance, i.e., varying bandwidth and high latency; an object-based interface; and global access. Just as ordinary low-cost SATA disks were made useful in the enterprise context by applying value-added features in software, cloud storage can be made enterprise-class by providing significant value above raw cloud storage.
NetApp already enables different storage tiers in the datacenter, ranging from a performance-oriented tier at or closer to the host (like the flash-based Project Mercury), to the primary storage tier (FAS-based systems), to an archival tier for disk-to-disk backup based on SATA drives. Having a cloud gateway would make cloud storage an extra tier in this hierarchy. With SLO-based (or policy-based) data management, NetApp's data management software can identify and move data to the appropriate tier based on data or workload properties. Hot data will move closer to the higher-performance tier and cold data will move to the slowest tier. The cloud storage tier's performance characteristics are largely governed by the provider's policies and cost structure. For example, Amazon's price and performance vary significantly based on the regional data center picked. So, cloud storage can be an effective tier with flexible characteristics based on cost, and it enhances our SLO-based data management vision.
With most cloud storage providers, the cost of cloud storage is dominated by two factors: the network cost of access (both the number of requests and the number of bytes transferred) and the amount of cloud storage used. We can reduce both with effective caching and prefetching strategies for suitable workloads. Such techniques have been exploited by many vendors in this space. Given the latency, only write-back caching strategies are feasible for data exported by a cloud gateway. The amount of dirty data buffered depends on the cloud storage performance. Compared to LAN use cases, where the NVRAM is sufficient to buffer an adequate amount of new writes, WAN latencies demand more capacity, leading to disk-based buffering. With any write-back caching strategy, the buffered data needs to be protected against failures that could lead to permanent data loss. Typically, this is done by means of a high-availability solution that entails replicating the buffered data to another system. The replication needs to happen with very little overhead lest it affect latency. Building a high-performance HA solution requires significant effort and time, and most of the current gateway startups do not have such technology. NetApp has long had HA techniques based on specialized low-latency hardware (an InfiniBand interconnect for NVRAM replication), which can be fully exploited in this context to provide reliability for the buffered data. This is a significant and key advantage for NetApp over other enterprise storage system vendors. Also, gateways that are only virtual appliances need to use standard Ethernet-based schemes for replicating data between two instances. In such cases, clients will experience latencies comparable to two Ethernet network hops, making them infeasible for many latency-sensitive primary workloads.
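As a concrete illustration of the HA write-back scheme discussed above, the following minimal sketch acknowledges a write only after mirroring it to a partner, so a single node failure does not lose buffered data. The Partner class stands in for NetApp's low-latency NVRAM interconnect; all names here are illustrative, not an actual implementation.

class Partner:
    """Stand-in for the HA partner reached over a low-latency interconnect."""
    def __init__(self):
        self.mirror = {}
    def replicate(self, key, data):
        self.mirror[key] = data

class WriteBackGateway:
    def __init__(self, partner):
        self.partner = partner
        self.dirty = {}   # buffered writes awaiting destage to the cloud

    def write(self, key, data):
        self.partner.replicate(key, data)   # mirror first, so a node failure
        self.dirty[key] = data              # cannot lose the buffered write
        return "ack"                        # client sees LAN-class latency

    def destage(self, cloud_put):
        for key in list(self.dirty):
            cloud_put(key, self.dirty.pop(key))   # push dirty data to cloud

gw = WriteBackGateway(Partner())
gw.write("blk42", b"dirty data")
gw.destage(lambda k, v: None)
assert not gw.dirty and gw.partner.mirror["blk42"] == b"dirty data"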

8.2

SOLUTION GAPS: GATEWAY OPPORTUNITIES FOR NETAPP

Cloud gateways can also enable NetApp to fill gaps in its solutions portfolio. As of now, we do not have a compelling solution to rival EMC's Data Domain backup appliances. Enabling cloud-based backups by means of a cloud gateway can fill this gap. In addition, the key attributes of the Data Domain


appliances need to be matched: inline deduplication to reduce storage capacity, combined with a high ingest rate to enable timely backups.
Among these features, inline deduplication is essential in the cloud gateway context to reduce cloud storage capacity irrespective of the workload (backup, archival or primary). Since the gateway is positioned to reduce cost for a customer, storage efficiency has to be a default feature. Inline deduplication's utility for backup streams is especially critical, given the typical capacity savings seen (above 90% is not uncommon). For a high ingest rate, we need to ensure efficiency both in buffering data on disk reliably and in destaging it to the cloud. To buffer data on disk reliably and with high performance, our current techniques for NVRAM mirroring should suffice. With regard to destaging, we need to develop strategies that are cognizant of the cloud storage provider's characteristics. Most cloud storage services, like Amazon's S3, are most cost-effective and provide good throughput when writing out large objects via parallel network streams. This implies allocating appropriate disk capacities on the cloud gateway to buffer large objects, with destage activity at the right intervals (a sketch of this destaging pattern follows). Implementing such techniques can be accomplished with reasonable effort and can lead to a compelling NetApp solution to compete with EMC's Data Domain appliances.
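The following is a minimal sketch of the destaging strategy just described, under assumed values for object size and stream count: small dirty blocks are coalesced into large objects and uploaded over parallel streams. put_object() is a placeholder for a provider PUT, not a real API.

from concurrent.futures import ThreadPoolExecutor

OBJECT_SIZE = 64 << 20   # assumed ~64 MiB destage objects
PARALLEL_STREAMS = 8     # assumed number of concurrent uploads

def put_object(name, payload):
    pass   # placeholder for the provider's object PUT

def destage(dirty_blocks):
    # Pack many small dirty blocks into large objects: fewer, bigger requests
    # are cheaper and faster against services like S3.
    objects, current, size = [], [], 0
    for block in dirty_blocks:
        current.append(block)
        size += len(block)
        if size >= OBJECT_SIZE:
            objects.append(b"".join(current))
            current, size = [], 0
    if current:
        objects.append(b"".join(current))
    # Upload the packed objects over parallel network streams.
    with ThreadPoolExecutor(max_workers=PARALLEL_STREAMS) as pool:
        for i, obj in enumerate(objects):
            pool.submit(put_object, "destage/obj-%08d" % i, obj)

destage([b"\x00" * 4096] * 1000)   # 1000 dirty 4KB blocks -> one object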


9 RECOMMENDATIONS FOR NETAPP


In this section, we will discuss the different options for NetApp in the cloud gateway space and the rationale and timescale for pursuing those options. We will start with a look at the key IP we require to have a compelling and complete solution, followed by an assessment and comparison of organic product development within NetApp (Build), acquisition of existing vendors (Buy) and partnership (Partner) options.

9.1

KEY INTELLECTUAL PROPERTY (IP) NEEDED FOR CLOUD GATEWAYS

In a previous section, we outlined two reasonable outcomes for positioning a NetApp cloud gateway product: a backup/tape-replacement solution using cloud storage, and a primary storage appliance for specific Tier-2 applications with integrated, SLO-based data management features. For both of these, before evaluating the ways to bring a cloud storage gateway solution to market, the following are the key technical building blocks required:
1. Basic infrastructure (for both secondary and primary storage solutions):
File-to-object protocol conversion.
Volume-to-objects (or groups of objects) data granularity mapping.
Security of objects in the cloud (encryption).
2. Value-added features (for both secondary and primary storage):
Compression.
Deduplication.
Application integration.
Cloud-aware management (like auditing).
3. Optimizations for a viable primary-specific storage solution:
WAN latency optimization via read and write-back caching and prefetching.
4. Infrastructure for global collaboration on a primary storage repository:
Global locking across a WAN.
The more immediate need is to deliver a secondary storage solution, based on market demand. In terms of IP, NetApp has a working prototype of the primary and secondary storage infrastructure (1), but not production-ready functionality. NetApp has most of the IP for the value-added features for primary and secondary storage (2), except for cloud-aware data management, as these are available within Data ONTAP today. The optimizations for a primary-specific storage solution (3) and the infrastructure for global collaboration (4) are new areas where we need to acquire IP.
NetApp has three possible options for the building blocks of the cloud storage gateway solution:
a. Organic development within NetApp (Build)
i. Pros: Ability to provide this solution as a simple license key that enables customers to easily direct their data to external cloud storage. All NetApp value add (reliability, performance, data mgmt., SLO mgmt.) can be exposed. A prototype already exists for secondary data use cases.
ii. Cons: Slow time to market; no opening in the roadmap to deliver the (1) and (2) functionality until 2015. Resources required to deliver the solution. Primary storage use cases would need to be addressed after that timeline.
b. Partner or Buy & Rebrand (Partner)
i. Pros: Immediate time to market.
ii. Cons: Difficult to position existing lower-end FAS solutions alongside partner solutions. No differentiation versus other cloud gateway solutions. Potential overlap in positioning low-end FAS and the cloud gateway solution. A new solution for the customer to manage. Cost of purchase to NetApp in the case of buy & rebrand. No unique IP being purchased in the case of buy & rebrand.
c. Buy and integrate (Buy)
i. Pros: Ability to provide this solution as a simple license key that enables customers to easily direct their data to external cloud storage. All NetApp value add (reliability, performance, data mgmt., SLO mgmt.) can be exposed.
ii. Cons: Cost of purchase. Slow time to market, as integration requires data path work in ONTAP. Resources required to bring the IP into the NetApp stack. No unique IP being purchased.

Based on the assessment of the three options, the recommendation is two-fold:

1. Tactical: Drive partnerships with 1-3 companies to have an interim solution, for time-to-market reasons, to ensure stickiness with existing customers that wish to leverage external cloud storage for certain types of data. This also enables service provider customers to easily ingest data from the enterprise to their external cloud storage. It will require careful positioning versus existing low-end FAS.
2. Strategic: Change the roadmap priority to deliver a compelling cloud gateway solution for secondary data sooner than Longboard. In parallel, conduct due diligence on interesting startups with unique IP for potential acquisition to enable primary data use cases.


10 NETAPP PROJECTS
In this section, taking into account all the observations from the previous sections, we recommend a few projects to help understand cloud gateways as well as answer questions that will help NetApp decide on the nature and contours of future products in this space.

10.1 PROJECT 0: V-SERIES - SNAPVAULT TO THE CLOUD


This is a project that is already underway in the V-series product group and is headed by Chris Busick.
Project Goal: This project aims to provide the ability to copy snapshots from traditional NetApp storage systems to cloud storage.
Synopsis: The project aims to build a module below the disk subsystem that will perform the translation of blocks to cloud objects and store them on Amazon's S3. Coalescing a fixed number of blocks into a single Amazon object is being studied, to reduce network traffic to S3 as well as to improve throughput (an illustrative block-to-object mapping follows). Initial observations indicate that latencies to S3 vary widely. As part of this project, the basic infrastructure pieces needed to communicate with Amazon AWS (via the S3 protocol) would be built within ONTAP and can be leveraged by further ATG projects.
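The block-to-object coalescing being studied can be illustrated with a trivial mapping. The 4KB block size matches ONTAP convention, but the 256-blocks-per-object ratio and the key scheme below are assumptions made for illustration, not the project's actual design.

BLOCK_SIZE = 4096         # ONTAP-style 4KB blocks
BLOCKS_PER_OBJECT = 256   # assumed coalescing ratio: one 1 MiB object

def object_key(volume, block_number):
    """Consecutive blocks share one S3 object, cutting request counts."""
    return "%s/%010d" % (volume, block_number // BLOCKS_PER_OBJECT)

def object_offset(block_number):
    """Byte offset of the block inside its object."""
    return (block_number % BLOCKS_PER_OBJECT) * BLOCK_SIZE

# Block 300 of vol7 lives in object "vol7/0000000001" at byte offset 180224.
print(object_key("vol7", 300), object_offset(300))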

10.2 ATG PROJECT 1: UNDERSTAND CACHING BEHAVIOR AND PERFORMANCE FOR TIER-2 WORKLOADS (WAN LATENCY MITIGATION)

This will be the first foundation project; it will help us understand the performance characteristics of workloads that can leverage the cloud gateway, specifically workloads that currently use NetApp FAS systems for their storage.
Project Goals: The goals of the project are the following:
1. Assess performance boundary limits (using ONTAP): Understand the performance limitations of
cloud gateways with a realistic cloud storage provider (like Amazon S3) and assess the
applicability of a gateway for Tier-2 application workloads.
2. Efficient read and write-back caching and prefetching strategies: Assess different caching technologies in this context for read caching. Evaluate methods to perform write-back caching (buffering on disk) in this context and pick an efficient strategy. Also, examine prefetching strategies that reduce the cold misses that impact latency. Overall, look at different strategies to mitigate WAN latencies and their relationship to total cost.
3. Efficient inline deduplication: Since deduplication is vital to reducing cloud storage costs, it will be explored as part of this project (a sketch of an inline deduplication path follows this list). However, it introduces extra complexity in the code path, as it has to be done inline with requests. In addition, deduplication can reduce network traffic significantly, further reducing costs.
4. Efficient cloud interaction: Since the cloud gateway will be bridging file- or block-based protocols to cloud storage, we need to map blocks to objects. An efficient mapping should enable flexibility and less network traffic to the cloud.
5. Efficient metadata management: This is a side effect of many of the features mentioned above, like deduplication and block-to-object mapping. These features imply maintaining different types of metadata, i.e., lookup tables. These lookup tables might need to be kept on disk or flash due to their size. Keeping them efficient in terms of both size and access performance is important to keep the overall performance of the gateway high.


6. Cognizance of VSA deployment: The cloud gateway might need to be deployed as a VSA in multiple scenarios. VSAs have their own peculiarities in terms of performance and resources. While designing the gateway appliance, it is important to be cognizant of how features might change in the VSA context.
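As referenced in goal 3, the following is a hedged sketch of an inline deduplication path: each incoming chunk is fingerprinted, and only unseen chunks generate cloud traffic. The in-memory lookup table is a simplification; as goal 5 notes, such tables may need to live on disk or flash.

import hashlib

class InlineDedup:
    def __init__(self, cloud_put):
        self.index = {}           # fingerprint -> object name (lookup table)
        self.cloud_put = cloud_put

    def write_chunk(self, chunk):
        fp = hashlib.sha256(chunk).hexdigest()   # fingerprint inline
        if fp in self.index:
            return self.index[fp]   # duplicate: no storage, no network traffic
        name = "chunk/" + fp        # content-addressed object name
        self.cloud_put(name, chunk)
        self.index[fp] = name
        return name

stored = []
dedup = InlineDedup(lambda name, data: stored.append(name))
dedup.write_chunk(b"backup stream segment")
dedup.write_chunk(b"backup stream segment")   # deduplicated on second write
assert len(stored) == 1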
Comments: As mentioned in the goals, this project will be performance-centric while building a relatively realistic cloud gateway prototype with many key features included. We hope to build the prototype leveraging ONTAP features wherever applicable and to evaluate the prototype in a physical appliance. The outcome of the evaluation would be a detailed performance analysis of a cloud gateway appliance and its applicability to common Tier-2 applications. The cloud storage would likely be Amazon S3, and the Tier-2 applications we would like to evaluate in this context are Microsoft Exchange and Microsoft SharePoint. Since this is the first cloud gateway project, some essential features like security and reliability have been kept out of scope. In the VSA context, compared to the physical appliance, the cloud gateway might be handicapped in terms of features and performance, further reducing the list of Tier-2 workloads that can be supported. These workloads would be highlighted as part of our evaluation of the VSA as well.

10.3 ATG PROJECT 2: CLOUD SECURITY AND RELIABILITY ACROSS CLOUD PROVIDERS

This project is a potential follow-up to Project 1 above and can leverage the infrastructure built there. However, the scope of this project does not overlap with the previous ones, so it can be executed in parallel with the above projects.
Project Goals: The primary goals of the project are the following:
1. Security: With a cloud gateway, enterprise data leaves the datacenter over a publicly accessible WAN and is stored for prolonged periods of time at a cloud storage provider. Therefore, security concerns need to be addressed to prevent enterprise data from being compromised. A commonly applied technique is to encrypt the data before sending it to the cloud storage provider. The cost of such encryption and the corresponding decryption needs to be evaluated in this context. Moreover, efficient key management for the encryption is another area that needs to be addressed.
2. Reliability: To prevent data loss in a traditional enterprise data center, parity schemes such as RAID are employed. However, with cloud storage, traditional RAID algorithms might not be applicable given the scale of data stored, and alternative techniques need to be evaluated. Moreover, with cloud storage, it is essential to protect against data loss due to a network outage at a specific cloud storage provider. The data, along with its associated protection metadata (like parity blocks or erasure-coded blocks), needs to be spread across multiple cloud providers to enable recovery of the data when one or more of them fail or suffer an outage (a toy illustration follows this list).
3. Auditing: The actual cost of cloud storage depends on the number of network accesses made to the cloud service, the amount of data transferred, and the actual number of bytes stored in the cloud. In order to validate the bill presented by the cloud storage provider for the data accessed via the gateway, the gateway has to audit all the traffic to the cloud.
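As referenced in goal 2, the following toy illustration spreads data plus parity across three hypothetical providers so that any single provider outage is survivable. Simple XOR parity is used for brevity; a real design would use proper erasure codes.

def split_with_parity(data):
    """Split data into two shares plus an XOR parity share, one per cloud."""
    if len(data) % 2:
        data += b"\x00"                  # pad to an even length
    half = len(data) // 2
    a, b = data[:half], data[half:]
    parity = bytes(x ^ y for x, y in zip(a, b))
    return {"cloud_A": a, "cloud_B": b, "cloud_C": parity}

def recover(shares, lost):
    """Rebuild any single lost share by XOR-ing the surviving two."""
    survivors = [v for k, v in shares.items() if k != lost]
    return bytes(x ^ y for x, y in zip(*survivors))

shares = split_with_parity(b"enterprise data!")
assert recover(shares, "cloud_A") == shares["cloud_A"]   # outage survived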
Comments: These features are essential to a viable gateway product. Most vendors in this space solve these issues in different ways. The project would aim to look at all the options available and make the right recommendations.

10.4 ATG PROJECT 3: CLOUD-AWARE MANAGEMENT


This project will look at the data management aspects that are introduced by a cloud gateway in an enterprise datacenter. Some of the issues are extensions of known issues, while some are specific to the cloud.


The primary goals of the project are:


1. SLO-aware cloud tiering: With cloud storage being a tier in the storage hierarchy, we need to make this tier SLO-aware. The cloud provider would export a set of SLOs; the intelligence in the gateways would be able to enhance these SLOs or provide stricter bounds for them. Together, these yield a new set of SLOs for the cloud storage tier. The rest of the data management infrastructure in the enterprise datacenter needs to incorporate these new SLOs and make them available to the end clients.
2. Cloud storage billing support: As mentioned in the previous project, auditing support is an essential feature. In addition to auditing, we need management infrastructure in the gateway that allows admins to query the gateway to ascertain costs dynamically. For example, if the admin would like to know the cost of a specific backup or a set of backups, we need an interface that exposes the actual costs for a specific time period (a sketch follows this list).
3. Cloud storage supportability: With cloud being a tier, it introduces a new support tier as well. The problems, and their resolution, with cloud storage are vastly different from those of traditional enterprise storage. We need the right management infrastructure to assess and resolve cloud storage problems efficiently.
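As referenced in goal 2, a billing query interface could be sketched as follows: the gateway's audit records are folded into a cost estimate for a time window. The rates and record format here are illustrative assumptions, not any provider's actual price list.

RATES = {"put": 0.00001, "get": 0.000001, "gb_stored": 0.14, "gb_out": 0.12}

audit_log = [
    # (timestamp, operation, bytes) records captured by the gateway
    (1000, "put", 512 << 20),
    (1500, "get", 64 << 20),
]

def estimate_cost(start, end, stored_gb=0.0):
    """Fold audited operations in [start, end] into a cost estimate."""
    total = stored_gb * RATES["gb_stored"]           # capacity charge
    for ts, op, nbytes in audit_log:
        if start <= ts <= end:
            total += RATES[op]                       # per-request charge
            if op == "get":
                total += (nbytes / 2**30) * RATES["gb_out"]   # egress charge
    return round(total, 6)

print(estimate_cost(0, 2000, stored_gb=10.0))   # e.g. one backup window's cost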
Comments: Among these challenges, making the cloud storage tier SLO-aware is the biggest. Moreover, it would enable NetApp to extend our SLO-aware management portfolio and provide greater value to our customers.

10.5 ATG PROJECT 4: CLOUD-ENABLED GLOBAL FILE SYSTEM

This project envisions cloud gateways enabling a single, geo-distributed, global file system, where the actual data of the file system resides in the cloud and the different cloud gateways provide access to this file system. This project requires that some of the essential features outlined in the previous projects are complete and that we have a viable single-site cloud gateway.
Project Goals: The goals are the following:
1. Globally distributed clients access a single file system: All data and metadata corresponding to the file system are resident in the cloud. The gateways are deployed at different sites (or datacenters) that are globally distributed. All the gateways provide a view of a single file system (and namespace) and coordinate among themselves to enforce consistency and coherency.
2. Global data consistency management: Since files could be shared in the single file system across sites, there needs to be global data consistency management between the gateways by means of a global lock manager (a sketch of one lease-based approach follows this list). Making sure the manager works effectively across WAN distances is the key challenge. Also, understanding and articulating the consistency semantics in this scenario is another challenge.
3. Global delegation coordination: For performance reasons, some gateways might need complete/exclusive access to certain portions of the global file system for prolonged periods of time, i.e., delegation. Since the delegation managers are WAN-separated, new mechanisms to handle delegations might need to be devised.
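As referenced in goal 2, the sketch below shows one plausible lease-based approach: write leases expire rather than being held indefinitely, which keeps the system live when a WAN-separated gateway disappears. The single coordinator and the 30-second lease are simplifying assumptions, not a statement of how any vendor implements this.

import time

LEASE_SECONDS = 30   # assumed lease length, tuned for WAN round-trip times

class GlobalLockManager:
    def __init__(self):
        self.leases = {}   # path -> (owning_gateway, lease_expiry)

    def acquire_write(self, path, gateway, now=None):
        now = now if now is not None else time.time()
        holder = self.leases.get(path)
        if holder and holder[1] > now and holder[0] != gateway:
            return False   # another site holds a live lease: read-only access
        self.leases[path] = (gateway, now + LEASE_SECONDS)
        return True        # first requester gets write access

    def renew(self, path, gateway, now=None):
        return self.acquire_write(path, gateway, now)

mgr = GlobalLockManager()
assert mgr.acquire_write("/proj/spec.doc", "site-NY")        # writer
assert not mgr.acquire_write("/proj/spec.doc", "site-SF")    # read-only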
Comments: Panzura already has this feature, and it is a key enabler in many of its enterprise accounts. Given Panzura's nascent customer base, it is not clear how valuable this complex feature is for the cloud gateway market.


11 REFERENCES
1. Burton Group. Cloud-Storage Gateways: Bridge the Gap. January 2011.
2. The 451 Group. Cloud Storage On-Ramps. November 2011.
3. Nasuni. Nasuni Unveils New Storage Services Backed by a 100 Percent Uptime Service Level Agreement. July 18, 2011.
4. Intel Corporation. An Enterprise Private Cloud Architecture and Implementation Roadmap. June 2010.
5. Ian Howells (StorSimple). Cloud-as-a-Tier: Building an Enterprise Architecture for Secure High-Performance Hybrid Cloud Storage. May 2011.
6. Peer-1 Hosting. Peer-1 Hosting Rolls Out EMC Atmos. July 2011.
7. Dale Stara, Everest Group. Where Are Enterprises in the Public Cloud? Gaining Altitude in the Cloud. April 2011.
8. Frank Gillett, Forrester. The Age of Computing Diversity. September 2010.
9. Gene Ruth, Burton Group. Market Profile: Cloud-Storage Service Providers, 2011. December 2010.
10. Intechnica. How Fast Is the Cloud? June 2011.
11. Cloud.com. 2011 Cloud Computing Outlook: Survey Results. 2011.
12. The 451 Group. CTERA Grows Hybrid Cloud Storage Base, Partners with EMC. August 2011.
13. Intel Corporation. Taking Control of the Cloud for Your Enterprise. 2011.
14. Gartner. Magic Quadrant for WAN Optimization Controllers. December 2010.
15. Gartner. Hybrid Cloud Appliances Expand Cloud Storage Use Cases. January 2011.
16. Gartner. Cloud-Based Server Backup Services 1Q11 Update. January 2011.
17. Gladinet. Gladinet Blog (http://gladinet.blogspot.com). June 2011.
18. The 451 Group. (Mezeo) IaaS, PaaS and Enabling Technologies Market Monitor. January 2010.
19. Gartner. Cool Vendors in Storage Technologies. April 2011.
20. Gartner. Cloud IaaS: Adding Storage to Compute. October 2010.
21. Gartner. Hype Cycle for Business Continuity Management and IT Disaster Recovery Management. September 2010.
22. Gartner. Competitive Landscape: Cloud Storage Infrastructure as a Service, North America, 2010. June 2010.
23. Gartner. Cloud Storage: An Emerging Market. June 2009.
24. The 451 Group. Nirvanix Quadruples Cloud Storage Capacity, Eyes European Expansion. April 2010.
25. Panzura. The Panzura Global Cloud Storage Platform. 2011.
26. The 451 Group. Do Startups with On-Ramps Hold the Key to Unlocking Cloud Storage? June 2010.
27. Mezeo. Distributed Data Store and Async Geo Services with Single Namespace. June 2010.
28. Amazon. Amazon S3 Announces Server-Side Encryption Support. October 2011.
29. Network Computing (magazine). Quest and StorSimple Collaborate on Cloud-Based Backup. May 2011.

30. CipherCloud. Cloud Encryption Gateway. September 2011.
31. Information Week (magazine). HP Takes On Amazon With Enterprise Cloud Services. January 2011.
32. IDC. How Conigent Effectively Leverages Public Cloud IT Infrastructure with Egnyte. December 2010.
33. The 451 Group. Nasuni unveils filer, aims to be NetApp for cloud storage. February 2010.
34. ESG. Nasuni Builds a Bridge to the Cloud. April 2010.
35. The 451 Group. TwinStrata Emerges with Focus on Hybrid Cloud Storage Environment. May 2010.
36. ESG. Will Cloud Storage Come of Age in 2010? February 2010.
37. IDC. Cirtas Announces Cloud Storage Controller for the Enterprise. September 2010.
38. The 451 Group. Cirtas Goes Live with Cloud Storage Controller, Reveals Amazon as an Investor. September 2010.
39. The 451 Group. Panzura Pounces on Hybrid Cloud Storage Opportunity for Enterprises. September 2010.


NetApp provides no representations or warranties regarding the accuracy, reliability or serviceability of any
information or recommendations provided in this publication, or with respect to any results that may be
obtained by the use of the information or observance of any recommendations provided herein. The
information in this document is distributed AS IS, and the use of this information or the implementation of
any recommendations or techniques herein is a customer's responsibility and depends on the customer's
ability to evaluate and integrate them into the customer's operational environment. This document and
the information contained herein may be used solely in connection with the NetApp products discussed
in this document.

© 2015 NetApp, Inc. All rights reserved. No portions of this document may be reproduced without prior written consent of NetApp,
Inc. Specifications are subject to change without notice. NetApp, the NetApp logo, and Go further, faster are trademarks or
registered trademarks of NetApp, Inc. in the United States and/or other countries. All other brands or products are trademarks or
registered trademarks of their respective holders and should be treated as such.
NetApp Confidential
