ddn.com
2013 DataDirect Networks. All Rights Reserved.
Executive Summary
Object Storage is the new storage paradigm. There is a high level of interest from organizations, as this new
approach resolves the challenges of efficiently storing massive volumes of unstructured data - Big Unstructured
Data. This paper addresses the why, what and how of object storage.
Why should companies use Object Storage for unstructured data, and how is it different from NAS or SAN?
The biggest problem with traditional approaches is scalability. NAS lacks the ability to scale as a single system,
especially in petabyte environments. Today's SANs are already complex when deployed with a file system layer on
top; scaling out makes the problem a lot worse.
Object Storage is essentially just a different way
of storing, organizing and accessing data on disk.
An Object Storage platform provides a storage infrastructure to store files with rich metadata attached to them,
referred to as objects. The backend architecture of an object storage platform is designed to present all the
storage nodes as one single pool. With Object Storage, there is no file system hierarchy. The architecture of the
platform, and its new data protection schemes (vs. RAID, the de facto data protection scheme for SAN and NAS),
allow this pool to scale to a virtually unlimited size.

Figure 1: Object Storage Architecture
Users access object storage through applications that typically use a REST API (an internet protocol, optimized for
online applications). This makes object storage ideal for all online, cloud environments. When objects are stored, an
identifier is created to locate the object in the pool. Applications can very quickly retrieve the right data for the users
through the object identifier or by querying the metadata (information about the objects, like the name, when it
was created, by whom, etc.). This approach enables significantly faster access and much less overhead than locating a
file through a traditional file system.
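The identifier-based access pattern described above can be sketched in a few lines of Python. This is a toy illustration, not DDN's implementation; the class, method and field names are invented for this example, and the content-derived ID is just one possible scheme:

```python
import hashlib

class ObjectPool:
    """Toy flat object pool: no directory hierarchy, lookup by identifier."""
    def __init__(self):
        self._store = {}

    def put(self, data: bytes, metadata: dict) -> str:
        # Derive an identifier from the content; real platforms may generate
        # IDs differently (e.g. random UUIDs or location-aware IDs).
        oid = hashlib.sha256(data).hexdigest()
        self._store[oid] = (data, metadata)
        return oid

    def get(self, oid: str) -> bytes:
        # One flat lookup by ID -- no directory tree to walk.
        return self._store[oid][0]

pool = ObjectPool()
oid = pool.put(b"report.pdf contents", {"name": "report.pdf", "owner": "alice"})
assert pool.get(oid) == b"report.pdf contents"
```

The application only needs to remember the identifier; there is no path to resolve and no hierarchy to traverse, which is why lookup cost stays flat as the pool grows.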
DDN | WOS is a true object storage platform, designed to scale beyond petabytes as a single system, optimizing TCO
without compromising performance or durability. This makes WOS a perfect platform for a variety of storage cloud
solutions, including online collaboration, active archives, cloud backup and worldwide data distribution.
Table of Contents

Executive Summary
History of Object Storage
SAN vs NAS
Object Storage, The Third Paradigm
Cloud Storage, Storage Clouds, Object Storage
REST APIs
Object Storage Summary
Why Object Storage?
Massive Data Growth
Always Online
Power to the Applications
The Big Data Explosion
We All Use Object Storage Everyday
Use Cases
How Does Object Storage Work?
Issues with File Storage
Data Protection: Erasure Coding or Not?
WOS
True Object Storage Platform
Optimized for Small and Large Files
Choice of Data Protection Schemes
Self-healing Architecture
Single Storage Infrastructure
Widest Selection of Interfaces; Out of the Box Applications
Enterprise-grade Platform
WOS Benefits
Ecosystem
Resources
The current generation of object storage platforms is designed with this openness and flexibility in mind. Most
platforms support a subset of Amazon's REST API, and some platforms are designed to be independent of the
hardware platform. The industry has learned some tough lessons from using proprietary systems. One initiative
to prevent vendor lock-in is SNIA's Cloud Data Management Interface (CDMI). This is a set of pre-defined RESTful
HTTP operations for assessing the capabilities of the cloud storage system, allocating and accessing containers
and objects, managing users and groups, implementing access control, attaching metadata, billing, moving data
between cloud systems, exporting data, etc.¹

¹ For the sake of brevity, we will stick to the very basics. There are hundreds, if not thousands, of blog articles and papers about this topic available online.
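As an illustration of the "pre-defined RESTful HTTP operations" CDMI specifies, a capability-discovery request looks roughly like the following. The hostname is invented for this example; the header names and media type follow the SNIA CDMI specification:

```
# Hypothetical request: discover what this cloud storage system supports
GET /cdmi_capabilities/ HTTP/1.1
Host: cloud.example.com
Accept: application/cdmi-capability
X-CDMI-Specification-Version: 1.0.2
```

The server's JSON response enumerates the features (containers, metadata, export, etc.) that this particular cloud implements, so a client can adapt without vendor-specific code.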
SAN vs NAS
A SAN is a block storage device, not that different from an external USB disk drive, just bigger. Systems connect to a
SAN with a block interface; common protocols for block storage include iSCSI, Fibre Channel, Fibre Channel over
Ethernet (FCoE), etc. A device attaching to a SAN will see the storage presented as a disk drive. SANs allow multiple
servers to share a pool of storage that cannot be accessed by individual users; this prevents users from overwriting each
other's data. SANs are typically used by large applications, such as enterprise databases, that handle data locking
through the application. SAN storage can be presented as a file system (by putting a file system layer on top), which
is generally referred to as a clustered file system. As we will explain later in this document, SANs are complex systems
to manage, especially when used for file storage.
Figure 3 Simplified SAN infrastructure with Clustered file system and enterprise applications
A NAS is a file storage device. NAS exposes its storage as a network file system. Devices that attach to a NAS see a
mountable file system. Common protocols for file storage devices include NFS and SMB/CIFS. A NAS operates at
the file level and is accessible to users with proper access rights - so it needs to manage user privileges, file locking
and other security measures. A NAS environment is a much better fit than a SAN for storing files.
This is actually only partly true. The backend architecture of an object storage platform is designed so that all the
storage nodes are presented as one single pool. There is no file system hierarchy. The architecture of the platform,
and new data protection schemes (vs. RAID, the de facto data protection scheme for SAN and NAS), allow this pool
to scale to virtually unlimited capacities, while keeping the system simple to manage.

Users access object storage through applications that will typically use a REST API. They use a set of simple
commands: GET (read), PUT (save) and DELETE. REST is an internet protocol, optimized for online applications. This
makes object storage ideal for all online, cloud environments. When objects are stored, an identifier is created to
locate the object in the pool. Applications can very quickly retrieve the right data for the users through the object
identifier - or by querying the metadata (information about objects: name, when it was created, by whom, etc.). This
is much faster than attempting to locate a file through a traditional file system. Applications also handle user access
management. Each time a file (object) is changed, it is stored as a new object. This prevents corruption through
simultaneous changes.
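The write-as-a-new-object behavior in the last two sentences can be sketched as follows. This is an illustrative toy, not any vendor's implementation; the class name and ID scheme are invented here:

```python
class VersionedStore:
    """Sketch: updates never overwrite -- each write creates a new object."""
    def __init__(self):
        self._objects = {}   # object ID -> data
        self._next = 0

    def put(self, data: bytes) -> str:
        oid = f"obj-{self._next}"
        self._next += 1
        self._objects[oid] = data
        return oid

store = VersionedStore()
v1 = store.put(b"draft")
v2 = store.put(b"draft, revised")   # a "changed" file is simply a new object
assert v1 != v2
assert store._objects[v1] == b"draft"   # the earlier version is untouched
```

Because nothing is ever modified in place, two users "changing" the same file simultaneously simply produce two distinct objects; there is no shared block to corrupt and no lock to contend for.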
Figure 5 Scale out object storage with simple REST API for applications
REST APIs
REST stands for Representational State Transfer. It is a software architecture that is used for distributed application
environments, such as the internet. An API, short for Application Programming Interface, is an interface used for an
application (client) to talk to its environment (backend servers, storage, databases, etc.). With the success of cloud-style
computing (running applications in the cloud, rather than on the user's computer), REST APIs have become the
predominant interface for cloud applications to connect to the cloud. For storage-centric cloud applications, a REST
API is the interface between the application and the object storage platform.
The three most common commands in REST APIs for object storage environments are GET, PUT and DELETE, which
are the equivalents of reading a file, saving a file (technically "save as," because object storage does not allow you to
update an object in place), or deleting a file.
Since the early days of Cloud Computing, there's been a lot of discussion about standardizing on a specific REST API
to avoid vendor lock-in. The general idea behind this is: if all vendors (of applications, cloud infrastructures, object
storage platforms, etc.) use a standard API, users will never be locked in to a specific environment. Without having to
reprogram their applications, they would be able to freely move their data from one platform to another - or keep
it on more than one platform. Little progress has been made on the standardization front, however, and the result is
that object storage platforms will either support the Amazon S3 API, the OpenStack API or a native API (i.e. an API of
their own, typically a very easy-to-use, lightweight interface).
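To make the API split concrete, here is roughly what the same "save an object" operation looks like in the S3 style versus the OpenStack Swift style. Hostnames, bucket/container names and credentials below are invented placeholders; only the general request shapes follow the respective public APIs:

```
# Amazon S3 style: bucket and key in the path, signed Authorization header
PUT /my-bucket/reports/q3.pdf HTTP/1.1
Host: s3.amazonaws.com
Authorization: AWS <access-key>:<signature>

# OpenStack Swift style: account and container in the path, token header
PUT /v1/AUTH_myaccount/reports/q3.pdf HTTP/1.1
Host: swift.example.com
X-Auth-Token: <token>
```

The operations are semantically identical, but the URL layout and authentication differ - which is exactly why an application written against one API cannot talk to the other without a translation layer.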
Always Online
Much of that data growth is driven by the recent innovations in cloud and mobile computing. We already mentioned
Amazon S3, but there are also Google, Facebook, Apple and several smaller public storage cloud offerings that set
a new level of expectations, where all data needs to be available anywhere, at any time.
Use Cases
Object Storage is more than a smarter paradigm that allows you to store large volumes of unstructured data.
Features like massive scalability, REST APIs and geographic distribution enable a series of compelling use cases. An
interesting side effect is that solutions tend to overlap. Dropbox is not just file sharing; it's backup, collaboration,
archiving and mobile storage. Here are a few popular use cases:
Online Web Services: As we mentioned earlier, one of the drivers for object storage is the trend to use more
and more online cloud applications. Without Amazon's S3, much of this would not have been possible.
The more successful web services companies are now gradually making the move to in-house infrastructures.
Also, with corporate security policies, IP and compliance considerations, most enterprises prefer to run cloud
services on infrastructure they control.

File Sharing: Dropbox pioneered a file sharing service most of us did not know we needed. Today, service providers are
deploying similar services and enterprises are deploying private file sharing services, as people utilize a variety of
devices at home, at work and on the go. They collaborate with people across the office or around the world.
Cloud Backup is increasingly popular. There are dozens of online services for backup. For
enterprises, the idea of backing up to low-cost, highly scalable disk infrastructures - rather than tape, which can
be cumbersome for recovery - is also very compelling.
Cloud Archives: Data archiving decisions used to be very simple: data that was infrequently accessed was
moved off disk to tape. Very few arguments could beat the low TCO of tape. Disk archives were hard to justify
and reserved for those exceptional use cases where latency outweighed the huge cost of disk archives. With
object storage, it is now possible to deploy disk archives at an acquisition cost and TCO close to that of tape.
Many organizations are opting for hybrid environments - with a superfast hot disk tier and a very
cheap cold tape tier.
Worldwide Collaboration: Globally distributed teams have become standard practice. Think of researchers
from different institutions working on the same project. Think of a movie being shot in New Zealand and
produced in Los Angeles - or software being developed in California and then tested in India. Geographically
distributed storage pools enable teams to work in real-time on the same datasets.
When the system is instructed to read a file, the repository of file system metadata is consulted and the required
data blocks are retrieved from the storage device. Writing data into a file system has the additional complexity
of requiring that the file system metadata must be written or updated - potentially by several users or processes
simultaneously. Numerous techniques and designs exist that attempt to minimize the impact of dealing with file
system metadata, and the locking problem associated with simultaneous access. Unfortunately, as the number
of files in the system grows large, keeping the file system metadata correctly organized (so that the names and the
data blocks that make up files can be found) becomes increasingly complex. As the system grows to track
billions of files (which may be distributed across a number of network-connected computer systems),
the abstraction of the file system begins to break down. Moreover, the hierarchical structure of the file system is
insufficient to adequately categorize the data in the system.
File systems require at least three layers of software constructs to execute any file operation. As they allow files to
be amended by multiple users, they must maintain complex lock structures with OPEN and CLOSE semantics. These
lock structures must be distributed coherently to all of the servers used for access.
Also, as data is placed (based on random block availability), traditional file systems are always fragmented. This
is especially true in environments where the data is unstructured and it is not uncommon to write widely varied
file sizes. Using a traditional file system - designed for amendable data - to store immutable data constitutes an
inappropriate and wasteful use of bandwidth and compute resources. This highly inefficient approach requires
a great deal of additional hardware and network resources to achieve data distribution goals. These systems
become exponentially more complex as they are scaled out.
Object storage systems dispense with the overburdened concept of file system metadata. This approach allows the
system to separate the storage of data from the relationship that the individual data items have to each other. In an
object storage system, the physical storage blocks are organized into objects which are collections of data blocks
represented by an identifier. There is no hierarchy imposed on the data and no repository of the objects' metadata
to be consulted when reads or writes are requested. This approach allows an object storage system to scale with
both the requirements and size of the system, well beyond the technical and practical boundaries of traditional file
systems.
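One common way to avoid a central metadata repository is to compute an object's location directly from its identifier. The sketch below is a deliberately simplified placement function (node names and the modulo scheme are invented for this example; production systems typically use consistent hashing to survive node changes):

```python
import hashlib

NODES = ["node-a", "node-b", "node-c", "node-d"]   # hypothetical storage nodes

def locate(object_id: str) -> str:
    """Map an object ID straight to a node. No metadata repository is
    consulted, so the lookup cost stays flat as the pool grows."""
    h = int(hashlib.md5(object_id.encode()).hexdigest(), 16)
    return NODES[h % len(NODES)]

assert locate("obj-42") in NODES
assert locate("obj-42") == locate("obj-42")   # deterministic: every client agrees
```

Because the mapping is a pure function of the ID, any node or client can compute an object's location independently - there is no single lookup service to become a bottleneck or a point of failure.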
While Object Storage systems do not use file system metadata, they do employ object metadata (customizable
information about the objects). This information can later be used to query or analyze the information stored. Object
metadata for a photo could be the day it was taken, the last time it was modified, the type of camera that was used,
whether a flash was used, where it was taken, etc. Object metadata will play an increasingly important role as we
store more and more information, but it does not add complexity to the system like file system metadata does.
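The photo example above lends itself to a small sketch of metadata querying. The records and the `query` helper are invented for illustration; real platforms expose this through their API rather than in-process filtering:

```python
# Hypothetical object metadata records for three stored photos
photos = [
    {"oid": "a1", "camera": "X100", "flash": True,  "taken": "2013-05-01"},
    {"oid": "b2", "camera": "D800", "flash": False, "taken": "2013-05-02"},
    {"oid": "c3", "camera": "X100", "flash": False, "taken": "2013-06-11"},
]

def query(items, **criteria):
    """Return objects whose metadata matches every given key/value pair."""
    return [i for i in items if all(i.get(k) == v for k, v in criteria.items())]

hits = query(photos, camera="X100", flash=False)
assert [h["oid"] for h in hits] == ["c3"]
```

The query runs against metadata alone; the object data itself is never touched, which is what keeps this kind of lookup cheap even across billions of objects.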
At the highest level, storage servers are, like NAS and SAN, simply boxes with a lot of disks in them. Typically,
object storage vendors will use SATA disks in their systems, and may include SSDs for caching. Some
platforms opt for separate controllers, but in essence that does not make a difference, as the storage is
presented as one pool (namespace). When choosing an object storage platform, it's important to understand
the limitations of the namespace and how the system combines different pools or namespaces. Many
vendors claim infinite scalability, but there is no such thing. The important thing is to understand how
namespaces are combined, presented and managed. How many such namespaces can be combined? Are
they managed as one system? The system software manages most of that.

The actual software layer is where vendors can differentiate. The list of possible features is endless. A single
management interface is always great. Self-healing capabilities are a must for environments that will scale
into the hundreds of petabytes. The software layer also provides data protection mechanisms, which we will
cover in the next section.
The standard interface to access data in an object storage platform is a RESTful interface, or REST API. This is a
set of simple commands that application developers use in their code to let the application access the data.
The basic REST commands are LIST, GET, PUT and DELETE, which are used to list (a selection of) objects, read
an object, store an object or delete it. There is no standard for REST yet, but the so-called Amazon API is by
far the most popular amongst developers. Hence, most object storage providers will provide an Amazon-compatible
API, which is typically a subset of the commands that are supported by Amazon S3. As most
legacy applications were designed to interface with a file system, most object storage platforms will also
provide one or more file interfaces (a file system layer on top of the object storage pool, also called a file
system gateway), and often a selection of programming language-specific APIs will be provided as well.
DDN's WOS has the widest selection of interfaces on the market.
WOS
DDN's legacy is designing high-performance storage systems without making things more complex than they
need to be. WOS is the perfect example of achieving operational excellence through simplification - stripping
the architecture down to the very basics. The architecture of WOS consists of three components: WOS building
blocks, WOS Core software and a choice of simple interfaces.
Self-healing Architecture
Keeping traditional storage infrastructures healthy is management-intensive. Disks need to be replaced and
restored. Rebuild windows need to be kept to a minimum to avoid data loss and preserve application performance.
This is not the case with WOS. The built-in data protection algorithm, ObjectAssure, has unique self-healing
capabilities that further reduce the management effort. Also, in case of a failed disk, ObjectAssure only has to
reconstruct the actual data that was lost - as opposed to the entire disk. This dramatically reduces the rebuild
window.
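The idea of rebuilding only the lost data, rather than a whole disk, can be illustrated with the simplest possible redundancy scheme: XOR parity. This is a didactic sketch only; ObjectAssure's actual erasure coding is more sophisticated than a single parity fragment:

```python
def xor(a: bytes, b: bytes) -> bytes:
    """Bytewise XOR of two equal-length fragments."""
    return bytes(x ^ y for x, y in zip(a, b))

# Three data fragments plus one parity fragment (a toy 3+1 layout).
d = [b"AAAA", b"BBBB", b"CCCC"]
parity = xor(xor(d[0], d[1]), d[2])

# The disk holding d[1] fails: rebuild only that fragment from its peers,
# instead of resynchronizing an entire drive as RAID would.
rebuilt = xor(xor(d[0], d[2]), parity)
assert rebuilt == b"BBBB"
```

Since only the fragments of objects that actually lived on the failed disk must be recomputed, the rebuild window scales with the amount of lost data, not with the raw capacity of the drive.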
Enterprise-grade Platform
Most vendors recommend commodity hardware for their object storage platforms. In the short term, this could
mean initial CAPEX savings, but as such devices typically have shorter replacement cycles, this heavily impacts the
OPEX further down the road. This is especially so for multi-petabyte deployments. While WOS was designed to be
hardware agnostic, we designed the WOS 7000 hardware to reduce TCO. Unlike commodity hardware, the WOS 7000
has an ultra-dense form factor, so there are fewer systems to house, manage, power, cool and maintain. Leveraging
over 15 years of hardware design for the most demanding HPC environments, WOS 7000 was built to run for many more
years than cheaper commodity hardware.
WOS Benefits
Lowest Global Access Latency
WOS was designed with the intent of maximizing performance for storage of massive volumes of immutable data.
Scales with All Varieties of Applications
WOS scales virtually without limit, in clusters as large as 30PB. Those clusters can consist of any mix of small (kilobytes)
or large (terabytes) files.
Best Durability & System Availability
The WOS choice of data protection schemes allows the customer to deploy object storage that combines durability with
availability.
Lowest Administration Overhead, Lowest TCO
Through automated management, lower hardware costs, less power usage, a simple architecture, optimized disk
usage and reduced WAN bandwidth usage, WOS enables organizations to store more data at a much lower cost.
Simple Integration
Integrate WOS with your GRIDScaler GPFS storage or your EXAScaler Lustre platform. Use WOS as an archive for your
HScaler Big Data Storage, or build an Active Archive of WOS with a tape library for offline cold archiving.
Maximum Portability
WOS features the most complete set of interfaces to facilitate your application integration, including C++ and Java
APIs for direct application integration, REST for web applications (S3 or not) and file gateways to support file-based
workflows.
Best Data Center Density
Designed for massive HPC deployments, WOS 7000 provides the highest data center density possible.
Ecosystem
Object Storage is clearly the hot space in the storage industry, with offerings from both startups and established
storage solution providers. But, there is more than just object storage on the market: object storage has fostered a
wave of innovation that enables or leverages the paradigm.
The list of Tier 2 object storage players can be endless, especially when including the application providers. Here is
a short selection of popular gateways, WAN optimizers, collaboration platforms and other applications. This should
help to provide a better understanding of the object storage ecosystem and the opportunities and use cases.
CTERA leverages object storage to offer a range of solutions for SMBs, enterprise branch offices and remote
users, including: data backup and recovery, file-based collaboration and mobile access.
Mezeo also provides a number of storage solutions that leverage object storage, including: an AWS compatible
REST API and a number of file sync and share clients that give users access and collaboration capabilities from their
PC/Mac, smartphone, tablet or browser interface.
Panzura built a NAS gateway for storage clouds. The gateway enables enterprises to combine multiple storage
(cloud) resources and make them accessible to multiple locations, presented as a unified global file system.
Aspera both leverages and facilitates object storage. On the one hand, they have a number of applications for
collaboration, distribution etc., but the core of their technology is a protocol that optimizes how data is sent from
the object storage pool, over the WAN to a user application - or between sites, if an object storage infrastructure is
distributed over multiple locations.
Bitspeed and Silverpeak are active in the same space: WAN optimization, which enables faster, more reliable
and secure data transfer between storage sites - or between the object storage pool and the application. These
technologies are becoming increasingly important in the deployment of object storage based storage clouds.
Dropbox is probably the best-known object storage success case. This early AWS S3 customer launched a file-sharing
application when no one even knew they needed one. The power of Dropbox lies in their use of deduplication
(when multiple users store the same file in their Dropbox, only one copy is kept). This way, Dropbox saves a lot on
storage costs. Deduplication is not new, but Dropbox pioneered its use in an online, object storage based application.
This also allowed them to quickly gain a large user base through a freemium model, which would have been
unaffordable otherwise.
Box(.net) also started as an online file sharing application, but with some very important differences. Box runs
on their own (object storage) infrastructure, which gave them more control over security, data integrity, etc. (as
compared to using S3). This allowed them to bring their solution to the SMB and Enterprise markets. Today, Box.net
has grown into what can probably best be described as a storage-centric Platform as a Service, enabling organizations to
customize apps, integrate with their own applications, etc.
Netflix, which launched as a DVD-rental-by-mail service, is an early adopter of object storage: in 2007 it launched a movie
streaming service that would disrupt the market. Well before Apple added movies and TV shows to their store,
Netflix leveraged S3 to offer movies in an online format.
Apple, Google and Facebook also have massive object storage deployments, but little is known about their
architectures. Apple and Google are going after the S3 end users with document sharing and other storage-in-the-cloud
services. With this, they compete both with Amazon and with the applications that use S3, such as Dropbox and
Evernote.
Resources
http://knowledgelayer.softlayer.com/learning/introduction-object-storage
http://docs.openstack.org/trunk/openstack-object-storage/admin/content/ch_introduction-to-openstack-object-storage.html
http://cloudarchitect.att.com/Articles/Introduction-Object-Based-Storage
http://www.conres.com/hitachi-hds-object-storage-content-platform
http://www.cs.cmu.edu/~garth/RAIDpaper/Patterson88.pdf
http://www.zdnet.com/blog/storage/why-raid-5-stops-working-in-2009/162
https://en.wikipedia.org/wiki/Representational_state_transfer#RESTful_web_APIs
DDN | About Us
DataDirect Networks (DDN) is the world leader in massively scalable storage.
Our data storage and processing solutions and professional services enable content-rich and
high-growth IT environments to achieve the highest levels of systems scalability, efficiency
and simplicity. DDN enables enterprises to extract value and deliver business results from their
information. Our customers include the world's leading online content and social networking
providers, high performance cloud and grid computing, life sciences, media production, and
security and intelligence organizations. Deployed in thousands of mission-critical environments
worldwide, DDN's solutions have been designed, engineered and proven in the world's most
scalable data centers to ensure competitive business advantage for today's information-powered
enterprise.
For more information, go to www.ddn.com or call +1.800.837.2298
© 2013 DataDirect Networks, Inc. All Rights Reserved. DataDirect Networks, EXAScaler, GRIDScaler,
hScaler, ReACT, SFA12K, SFA, SFX, Storage Fusion Xceleration, Web Object Storage and WOS are trademarks of
DataDirect Networks. All other trademarks are the property of their respective owners.
Version-15/13