A few years ago, object-based storage made a huge splash on-premise with the
promise of meaningful data relationships, information accessibility and strong
compliance. It remains an important component for information management
based on compliance and single-tenant architectures. However, the evolution of
object-based storage has big implications for the cloud and unstructured data:
new approaches to active archiving, web/mobile application development and a changing model for
cloud storage service providers.
Object storage is optimal for the web. It has a very different architecture from file systems, which
are frankly overkill for most cloud storage. On-premise can be a different story; having data close to
hand under single-tenant access control is right for some data storage. But on-premise stored data
requires that the enterprise maintain a primary data center, a cold data center for DR, replication,
continuous data protection, and so on. Given the right set of needs this is a fine trade-off, of
course, and we certainly do not counsel people to get rid of their internal data centers and
redundant systems.
However, cloud-based object architecture offers big benefits for storing unstructured data for
active archiving, global access to data, fast application development and much lower cost compared
to the high computing and data protection costs of on-premise NAS. EMC has engineered Atmos to
provide these capabilities and many more as a massively scalable, distributed cloud-based system.
In this Technology in Brief we will examine the fast-changing world of archiving and development
on the web, and how object-based storage is the best way to go for these monumental tasks.
themselves). Most do not rate this architecture, and instead reside on poorly scalable systems. The
number of these systems grows as applications come online, making it even harder for IT and
application owners to administer them and for users to get the value they need from the
application. This already difficult scenario gets even worse when NAS storage is used for what is
essentially a cloud use case, such as extending existing assets over the cloud.
In contrast to hierarchical file system-based storage silos, object-based storage opens up a whole
new range of dynamic functionality. Object-based storage assigns unique object IDs to access data
across all federated locations. This goes a long way towards eliminating traditional, time-
consuming storage management tasks like LUN creation and RAID groups. Active archives and
applications needing fast global access particularly benefit from global namespaces and location
transparency. The flat, universal namespace allows global access to stored content from anywhere
the distributed application runs. Applications can also efficiently associate metadata with stored
objects without using a dedicated database. Sharing vast storage resources means application
administrators do not need to modify application files. Object-based storage usually has elements of
file systems in order to handle processes like file archiving, but it is not founded on that
architecture and its drawbacks.
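The flat namespace and inline metadata described above can be illustrated with a minimal in-memory sketch. It is not how any particular product is implemented: the class names, the use of random UUIDs as object IDs, and the metadata keys are all assumptions made for illustration.

```python
import uuid
from dataclasses import dataclass, field


@dataclass
class StoredObject:
    data: bytes
    metadata: dict = field(default_factory=dict)


class FlatObjectStore:
    """Toy in-memory store: one flat ID -> object map, no directory tree."""

    def __init__(self):
        self._objects = {}

    def put(self, data, **metadata):
        # The store picks the ID; no path or physical location is encoded in it.
        object_id = uuid.uuid4().hex
        self._objects[object_id] = StoredObject(data, dict(metadata))
        return object_id

    def get(self, object_id):
        return self._objects[object_id].data

    def find(self, **criteria):
        # Metadata is kept with the object, so lookups need no separate database.
        return [
            oid for oid, obj in self._objects.items()
            if all(obj.metadata.get(k) == v for k, v in criteria.items())
        ]


store = FlatObjectStore()
scan_id = store.put(b"<image bytes>", department="radiology", retention_years=7)
memo_id = store.put(b"<memo text>", department="finance")
```

An application holds on to `scan_id` and never needs to know where the bytes live, which is the property that location transparency relies on.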
Object-based storage originally developed as a type of specialized NAS storage where the
hierarchical system was replaced with an object-oriented system that made file storage far more
secure and scalable. One of its most popular incarnations is still going strong today: Content-
Addressable Storage (CAS). A subset of object-oriented storage, CAS ensures there is only one ID for
any object. When the CAS object is retrieved, it can be hashed again and checked against its ID to
verify identity. CAS de-dupes at the object level for copy control.
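The hash-and-verify behavior of CAS can be sketched in a few lines. This is an illustrative toy, not the mechanism of any specific CAS product: the choice of SHA-256 and the class and method names are assumptions.

```python
import hashlib


class ContentAddressableStore:
    """Toy CAS: the object's ID is the hash of its content."""

    def __init__(self):
        self._blobs = {}

    def put(self, data: bytes) -> str:
        content_id = hashlib.sha256(data).hexdigest()
        # Identical content hashes to the same ID, so only one copy is kept.
        self._blobs.setdefault(content_id, data)
        return content_id

    def get(self, content_id: str) -> bytes:
        data = self._blobs[content_id]
        # Re-hash on retrieval and compare against the ID to verify identity.
        if hashlib.sha256(data).hexdigest() != content_id:
            raise ValueError("stored content no longer matches its ID")
        return data

    def __len__(self):
        return len(self._blobs)


cas = ContentAddressableStore()
first = cas.put(b"contract v1")
second = cas.put(b"contract v1")  # duplicate write returns the same ID
```

Because the ID is derived from the bytes, writing the same content twice yields one stored object, and any silent change to the stored bytes is caught on the next read.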
Object-based storage, both on-premise and in the cloud, requires certain key capabilities. On-premise
object storage has great benefits for local file storage, including multiple application access, massive
scaling, high availability and, in some architectures, information governance as well.
Multiple application access. Applications simultaneously leverage the same centralized
object-based storage infrastructure. This enables local object-based storage to execute
application-specific archiving management attributes for a complete chain of information
custody.
Massive scaling. Massive scaling is problematic with file-based archive solutions. As the file
system reaches its maximum capacity, administrators must expand the entire system’s
operating system, file system and application in order to scale the archive. By contrast,
object-based storage can expand in an open fashion into multiple petabytes thanks to its flat
address space.
High availability. Object storage often archives data that has heavy retention and government
requirements. In this environment, five-nines (99.999%) availability or higher is a necessity.
Mirroring and parity help to protect availability; other beneficial features include self-healing,
detecting and fixing soft corruptions in the background, and addressing hardware failures
before they impact data availability.
Information governance. A subset of object-based storage, Content-Addressable Storage
(CAS) is purpose-built for long-term defensible retention of fixed files and data. As opposed to
other archival storage methods like tape or monolithic “tar” files that bundle data up and/or
move it offline, CAS stores data as objects that can be strictly and individually managed for
governance and compliance and yet remain actively accessible on-line.
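To put the 99.999% availability figure mentioned above in concrete terms, the short calculation below converts an availability percentage into the downtime it permits per year. It is simple arithmetic, assuming a 365-day year; the function name is ours.

```python
MINUTES_PER_YEAR = 365 * 24 * 60  # 525,600 minutes; leap years ignored


def max_downtime_minutes(availability_percent):
    """Minutes per year a service may be unavailable at the given availability."""
    return (1 - availability_percent / 100) * MINUTES_PER_YEAR


for pct in (99.9, 99.99, 99.999):
    print(f"{pct}% -> {max_downtime_minutes(pct):.2f} minutes/year")
```

At five nines the allowance works out to a little over five minutes a year, which is why background self-healing matters more than manual repair.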
MULTI-TENANCY
Secure multi-tenancy is a key requirement of cloud object storage, which should support two levels
of multi-tenancy: tenants and sub-tenants. Tenants are top-level entities that each have their own
access points, security controls and master storage policies. Tenants share nothing with other
tenants and are fully isolated. Every node gets assigned to a specific tenant; tenants do not share
nodes and therefore each tenant has its own dedicated access points and storage. Within a large
company, a tenant could be set up for independently managed divisions or subsidiaries. In a service
provider implementation, the tenant might be mapped to a broad storage service offering.
Sub-tenants are then created within each tenant with security controls and defined management
policies assigned by the tenant. Each sub-tenancy defines a distinct storage environment with
isolated management for its own users, object namespace, and defined shares. A sub-tenant within
a company might correspond to a department, while a storage provider's sub-tenant might track to
a specific client account.
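The two-level model can be pictured as a small data structure. The sketch below is only a conceptual illustration under our own assumptions (class names, tenant names and the in-memory layout are hypothetical); it shows nodes owned by exactly one tenant and sub-tenants with separate object namespaces.

```python
from dataclasses import dataclass, field


@dataclass
class SubTenant:
    name: str
    users: set = field(default_factory=set)
    objects: dict = field(default_factory=dict)  # private object namespace


@dataclass
class Tenant:
    name: str
    nodes: set = field(default_factory=set)
    subtenants: dict = field(default_factory=dict)

    def add_subtenant(self, name):
        self.subtenants[name] = SubTenant(name)
        return self.subtenants[name]


class Cluster:
    def __init__(self):
        self.tenants = {}
        self._node_owner = {}

    def add_tenant(self, name):
        self.tenants[name] = Tenant(name)
        return self.tenants[name]

    def assign_node(self, node, tenant_name):
        # A node belongs to exactly one tenant; tenants never share nodes.
        owner = self._node_owner.get(node)
        if owner is not None and owner != tenant_name:
            raise ValueError(f"node {node} already belongs to tenant {owner}")
        self._node_owner[node] = tenant_name
        self.tenants[tenant_name].nodes.add(node)


cluster = Cluster()
emea = cluster.add_tenant("emea-division")
apac = cluster.add_tenant("apac-division")
cluster.assign_node("node-01", "emea-division")

hr = emea.add_subtenant("hr")
legal = emea.add_subtenant("legal")
hr.objects["policy.pdf"] = b"hr copy"        # same name, separate namespaces
legal.objects["policy.pdf"] = b"legal copy"
```

Trying to assign `node-01` to `apac-division` as well raises an error, mirroring the rule that tenants are fully isolated at the node level.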
This highly functional multi-tenancy capability makes it easy to create private sandboxes or
implement a global content delivery scheme. With some planning, this scheme could enable large
corporations to facilitate aggregating “big data” distributed across the enterprise.
The policy mechanism should be highly flexible, targeting policies to any group of objects based on
both system and user defined metadata. Policies can be used to build service levels by defining the
amount of replication, implement archive rules for compliance, and optimize capacity and
performance as items age.
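As a rough illustration of metadata-driven policies, the sketch below picks a protection level for an object from its metadata and age. The policy fields, names and thresholds are hypothetical and chosen only to show the matching idea, not to reflect any vendor's policy language.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Policy:
    name: str
    match: dict          # metadata key/value pairs an object must carry
    min_age_days: int    # policy applies once the object is at least this old
    protection: str      # free-form label for the service level


def select_policy(policies, metadata, age_days):
    """Return the first policy whose selector and age threshold both match."""
    for policy in policies:
        selector_ok = all(metadata.get(k) == v for k, v in policy.match.items())
        if selector_ok and age_days >= policy.min_age_days:
            return policy
    return None


# Ordered from most to least specific; the empty selector acts as a catch-all.
POLICIES = [
    Policy("aged-records", {"class": "record"}, 365, "erasure-coded"),
    Policy("fresh-records", {"class": "record"}, 0, "2 replicas, synchronous"),
    Policy("default", {}, 0, "1 replica"),
]

chosen = select_policy(POLICIES, {"class": "record"}, age_days=400)
```

Ordering the list from specific to general keeps the selection rule simple: an object tagged as a record moves from the replicated tier to the erasure-coded tier once it passes the age threshold.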
Archive-as-a-service. The most agile and flexible way for IT to deliver archive services is with
the cloud model of self-service portals. This model manages and meters utilization and
bandwidth and supports third-party chargeback. Within an enterprise this flexibility and
instant storage relieves users of the temptation of using commercial cloud services simply
because they can get the storage they need fast – even though security might not be in place.
This approach also enables ISVs and MSPs to extend archive requirements and offerings.
Reduce manual tasks and provisioning across multiple archives. Cloud-based archives
must be easy to set up and, for reliability and consistency, must not require long or deep manual
configuration. They should also automate underlying complexities including security, audit,
retention, performance, and capacity growth. Atmos provides these features and more,
relieving the cloud administrator of enormous burdens. Distributed systems may be managed
as a single entity with policies to automate hundreds of management and data protection tasks.
Perhaps most important of all, object-based systems like Atmos offer massive
scalability of capacity and performance thanks to their unique architecture.
Centera Compliance Edition Plus captures and preserves original content, protecting data and
proving chain of custody for legal eDiscovery and litigation. Retention classes assign a logical
reference to each electronic record object; policies enforce data retention and safe disposition.
Centera Governance Edition enforces internal policies for data retention and disposition. Policies
may be organizational or application-specific, which improves corporate accountability, reduces the
cost of eDiscovery and compliance, and proves the integrity of governance controls.
Replication is controlled by automated policies which can mirror data objects at many points in an
object’s lifecycle both within and across multiple sites. Within a data center site, replication might
for example be set to happen synchronously upon ingestion, while replication between
sites might be set asynchronously and launched with an arbitrary delay to allow for data settling.
Replications can be targeted to specific locations, or abstractly sent to “other” sites as the system
decides.
For performance and availability, replicas are all active for read access (objects are inherently
immutable so there is no issue with having to manage distributed locking mechanisms). Because it
is “multi-site active/active”, any site can fulfill new object write requests when the local primary
site is unavailable.
In addition to full replication, EMC also provides an erasure coding option called GeoParity. Instead
of keeping two or more full 100% copies, “9/12” erasure coding enables storing an “expanded”
object containing only 33% additional encoded “redundant” data broken up into 12 segments. By
using erasure coding, the original data can be reconstructed dynamically from any 9 of the
segments. These segments are cleverly distributed so that the object can survive (and even be
accessed during) multiple failures. For greater protection there is also a “10/16” coding with a 60%
capacity overhead. Erasure coding does impact access performance, especially at ingestion, but
provides great fault tolerance with much lower capacity utilization. Of course, policies can be
written to convert replicated objects to erasure coded schemes as they age appropriately.
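The capacity figures quoted above follow directly from the segment counts, as the small calculation below shows. It assumes the same "any k of n" reconstruction property for the 10/16 scheme that the text states for 9/12; the function names are ours.

```python
def erasure_profile(data_segments, total_segments):
    """Capacity overhead and loss tolerance for a k-of-n erasure code."""
    overhead = total_segments / data_segments - 1   # extra capacity as a fraction
    tolerated = total_segments - data_segments      # segments that may be lost
    return overhead, tolerated


def replication_profile(copies):
    """The same two figures for plain full-copy replication."""
    return copies - 1, copies - 1


for label, (overhead, tolerated) in {
    "9/12 erasure coding": erasure_profile(9, 12),
    "10/16 erasure coding": erasure_profile(10, 16),
    "2 full copies": replication_profile(2),
}.items():
    print(f"{label}: {overhead:.0%} overhead, survives {tolerated} lost pieces")
```

Two full copies cost 100% extra capacity and survive one loss, while 9/12 coding costs about 33% extra and survives three lost segments, which is the trade-off the paragraph describes.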
With object stores there is generally no need for low-level RAID or disk level protection and Atmos
is no exception. Upon hardware failures, replications and/or GeoParity across nodes (RAIN)
combined with built-in node auto-healing features suffice to provide the full data protection as
determined by the service level “policies” implemented for each type of data object. Atmos can
withstand the loss of any disk, node, rack, or even site.
Use case: Medical Archiving
Challenge: Over 800 million medical imaging procedures a year require huge storage scalability;
collaboration and compliance increase complexity.
Atmos: Vendor Neutral Archive (VNA) on Atmos: integrates with EMR/EHR and improves PACS for
better patient care and collaboration, improves data lifecycle management, reduces IT costs, and
preserves HIPAA compliance.

Use case: File Archiving
Challenge: Corporate file sharing is popular with employees but syncing and sharing are hard to
manage. Employees will frequently share files anyway over mobile devices, leaving corporations
accountable for risky behavior.
Atmos: With EMC Sync & Share, users can securely share Atmos files across mobile devices, Linux
and Windows. GeoDrive creates a Dropbox-like service that is secure and manageable, powered by
Atmos’ fast performance. Atmos policies monitor changes to data and provide access control,
benefitting regulated verticals like finance.

Use case: Archive as a Service
Challenge: Both the enterprise and storage service providers struggle to provide IT services to
their respective customers. Provisioning, maintenance, and security are all difficult issues in
traditional storage offerings.
Atmos: The Atmos Cloud Delivery Platform enables corporations and service providers to meter
capacity, bandwidth, and usage across tenants. Provisioning is automated by tenant, and Atmos
allows tenants to safely self-manage and access their own storage.

Use case: Managed Service Providers
Challenge: Many MSPs suffer from narrow profit margins because of the expense of delivering
storage to customers. Managing multiple tenants, manual provisioning and maintaining service
level agreements all cut into revenue and make it too expensive to add new storage services.
Atmos: Atmos lets MSPs efficiently offer storage as a service and better monetize new service
offerings. MSPs can monitor capacity and usage for chargeback, reduce provisioning costs, and
replace multiple tenant management systems with a single system. Dynamic scaling, high
availability and security cost-effectively meet service level requirements.

Use case: Content-Rich Web Applications
Challenge: Traditional storage is a poor environment for Web application development, which
needs highly scalable capacity for multiple large data sets, a secure environment for test/dev and
application testing in real-time environments.
Atmos: Atmos provides location transparency for global applications and a highly mobile user
base. The single namespace means that application developers never need to recode pathnames
and locations, and do not need to code for limited storage environments. Self-management
options make it easy for customers to provision their own storage, and REST APIs reduce
application complexity.
When a company is dealing with geographic reach and large, growing volumes of rich content, it
should look to object-based storage in the cloud. We fully support EMC in its push to scale
capacity, performance, availability and management far beyond what traditional file systems are
capable of, and more massively than ever before.
NOTICE: The information and product recommendations made by Taneja Group are based upon public
information and sources and may also include personal opinions both of Taneja Group and others, all of which we
believe to be accurate and reliable. However, as market conditions change and are not within our control, the
information and recommendations are made without warranty of any kind. All product names used and
mentioned herein are the trademarks of their respective owners. Taneja Group, Inc. assumes no responsibility or
liability for any damages whatsoever (including incidental, consequential or otherwise), caused by your use of, or
reliance upon, the information and recommendations presented herein, nor for any inadvertent errors that may
appear in this document.