Typically, disk arrays are divided into the following categories:

• Network attached storage (NAS) arrays
• Storage area network (SAN) arrays:
  • Modular SAN arrays
  • Monolithic SAN arrays
  • Utility storage arrays

Modular vs. Monolithic Storage

Finding the right storage devices for your business or domestic needs can be problematic,
whether you’re struggling to balance cost with the amount of capacity you expect to need,
or unable to decide on the right data storage array. For some people, choosing between
modular and monolithic storage is incredibly difficult: each type offers its own advantages
and disadvantages, and will appeal to people for various reasons. Understanding the
difference between the two types can be quite a challenge, given the complexity of their
design and performance, but at eProvided, we’re here to help you make sense of all things
storage and data recovery – we’re going to explore the basics of modular and monolithic
storage, to help you make a more informed choice.

Modular Storage
Modular storage arrays are typically based on two controllers, which are kept apart from the
unit's disks – this ensures that, if one controller fails, the second automatically takes over.
The controllers are held on a shelf that runs on a power source separate from the disks, to
which they're connected via cables (usually copper or optical). One of the key advantages of
modular storage is that it's usually cheaper than monolithic storage, and can be added to over
time: you may want to start with a single controller shelf and one shelf housing disks, then
add more shelves as your needs dictate, until you reach the system's maximum capacity.

Generally, smaller businesses or households with low storage demands may begin with a
low-capacity modular storage device – which is more cost-effective – and then expand it as
the budget allows and your data needs demand.

Modular storage is usually up to 25% cheaper than monolithic devices, though exact figures
vary from system to system. Modular arrays also tend to offer interfaces designed for
user-friendliness, allowing easier capacity expansion and reducing the need to call for
technical assistance.
Monolithic Storage
Unlike modular storage, monolithic storage is based on disks fixed into the array frame and
linked to the various controllers via massive amounts of cache memory. These systems offer
helpful redundancy features, so that any component failure is immediately compensated for by
another component – thanks to this design, monolithic storage rarely fails.

With plenty of cache memory (shared among any linked devices), these offer greater
storage capacities than modular arrays, and are typically equipped for better data-
management. The more cache memory your storage device has, the faster and smoother it
will run. These typically boast the better service and support of the two, and their proven
disaster-recovery appeals to many businesses – you may be more willing to pay for long-
term reliability (if your budget allows).

Making the Choice


Having taken a brief look at modular and monolithic storage, which is right for you and your
needs? The choice, of course, is down to you – cost will certainly be a factor, and easy-
expansion may be a priority, too. Depending on the size of your business or department,
you may find that modular storage is the best all-round choice for its cost-effectiveness and
user-friendly design; on the other hand, the additional cache memory and reliability
monolithic options offer may just swing it for you. There is no right or wrong answer – just
weigh up the pros and cons of each and compare these to your business needs.

At eProvided, we offer data recovery across all devices, regardless of their construction. With
more than fifteen years in business and a team of experienced experts, we use the latest
techniques to retrieve your vital lost data – whether you fear your favourite images have
been overwritten or damaged for good, or your hard drive appears to have shut down or
broken, we're here to help. Just give us a call on 1-866-857-5950, or
email contact@eprovided.com.

========================

Monolithic vs. Modular Disk Storage Subsystems

Both types of storage subsystems can be used for most applications and environments.
Choosing which is best means assessing the benefits and risks of each.

What You Need to Know


Enterprises should use modular arrays wherever possible, because of their lower costs and
increased functionality. However, monolithic arrays will often be more suitable for larger
data centers and for high-availability applications employing dual locations and sophisticated
DR schemes. Enterprises should require vendors offering modular and monolithic storage to
bid both solutions. They should also be prepared to buy from nonleadership storage
vendors, provided they view storage acquisitions as tactical and the deployment of
proprietary value-adding features as strategic.

Analysis

Strategic Planning Assumptions


• Monolithic disk storage arrays will continue to dominate disaster recovery deployments based on remote-copy technology through 2004 (0.7 probability).
• Monolithic disk storage arrays will maintain functional advantages relative to modular arrays through 2003 (0.8 probability).
• Modular disk arrays will maintain at least a 25 percent price/performance advantage through 2003 (0.8 probability).

Change is now a constant in the storage marketplace. New partnerships are reshaping
distribution and support channels, and innovations in packaging and breakthrough
technologies will further segment the storage market. Disk array features that provided
product differentiation last year are now often prerequisites for inclusion in an enterprise’s
shortlist and are now offered by monolithic and modular disk arrays. Figure 1 illustrates the
architectural differences between the two approaches.

Figure 1

Monolithic Architecture vs. Modular Architecture

Source: Gartner Research

Today, enterprises can deploy either kind of array for most applications, and both are
suitable for networked storage environments. For monolithic and modular systems, point-in-
time (PIT) and remote-copy (RC) technologies have become almost ubiquitous, while FC
and storage area networks (SANs) have become standard interconnect technologies. As with
features and functions, the relevance of data availability, throughput and performance is
also evolving. Unplanned downtime is now such a rare event that enterprises often focus
more on planned downtime issues to identify product differentiation. In addition, many
users are now viewing throughput and performance as a scalability enabler and productivity
tool rather than as a source of significant product differentiation.

Choosing whether to implement SANs using modular or monolithic arrays, or a combination
of both, becomes a function of application requirements, anticipated growth rates, prior
investments, internal skills, costs and strategic business needs. Data availability, excluding
the impact of value-adding features, should be indistinguishable between monolithic and
modular arrays. Just which type of array is best depends on the factors we discuss below.

Monolithic Disk Arrays

Competition in the monolithic-disk storage market has greatly improved product
functionality, throughput and price/performance. Sometimes, size does matter. Monolithic
arrays, which can scale to more than 100 terabytes, reduce the number of boxes being
managed. With as many as 96 host FC connections, all of a monolithic system's throughput
and capacity can be used. Capacity upgrades are simple and nondisruptive. However, for
monolithic arrays to be financially attractive, enterprises must negotiate upgrade costs that
anticipate continuing price reductions.

Monolithic systems’ value-adding features, and lots of experience of using them to support
extreme 24x7 environments, make them a relatively low-risk solution compared with
modular arrays. Monolithic systems also offer better tuning tools than modular arrays. For
example, EMC and Hitachi Data Systems (HDS) provide the ability to identify "hot spots" in
redundant array of independent disks (RAID) sets and nondisruptively move them. Service
and support of monolithic arrays are also generally better than for modular arrays. Because
monolithic-array vendors are trying to move down-market while modular array vendors are
trying to move up-market, we encourage enterprises to seek bids from vendors that can
offer both solutions in open-system environments.

The risks of monolithic systems — limited deployment flexibility, higher costs and sensitivity
to cache hit ratios — derive from their cache-centric designs. In particular, small monolithic
configurations are very expensive because they are partially populated boxes that are
expensive to maintain. The only significant difference between a big and small configuration
is the number of disks in the box. There is also a greater risk of vendor lock-in, which
increases the acquisition cost of storage infrastructures.

The benefits and risks of monolithic systems are summarized in Figure 2.

Figure 2

Benefits and Risks of Monolithic Arrays


Source: Gartner Research

Modular Disk Arrays

FC host connections provide modular arrays with the connectivity needed to implement
storage consolidation projects, and put them into more direct competition with monolithic
arrays. Today's leading modular arrays have no single points of failure, and many are also
delivering nondisruptive maintenance and microcode updates. Modular arrays and
monolithic arrays should easily exceed most throughput requirements, although costs may
be very different. Modular array throughput is increased by adding additional controller
electronics and host connections, while monolithic performance is generally enhanced by
adding cache or additional host connections.

Modular arrays’ low cost and high availability are compelling advantages in small and
midsize enterprises not requiring sophisticated DR schemes or S/390 connectivity. Their low
cost, even in small configurations (less than 500GB), and consolidated management tools
make them the only solution when high-availability distributed storage is required.
Performance is very good in open-system environments, because modular arrays rely
primarily on bandwidth rather than cache hit ratios to deliver throughput. However,
identifying and resolving throughput problems can be difficult because of a lack of
instrumentation and tuning tools.

Where scalability and connectivity limitations are not an issue, the availability of PIT and RC
functionality makes modular arrays viable in mission-critical environments. Understanding
the details of PIT implementations is also critical because implementations from the same
vendor are not always functionally equivalent — for example, EMC’s SnapView is a logical
PIT copy, whereas TimeFinder is a physical PIT copy.

Risks focus on large applications that span multiple storage subsystems, and on service and
support. Many smaller vendors rely on third-party support and are still not experienced in
supporting high-availability environments. Modular array vendors also have less experience
with modular PIT and RC solutions than monolithic vendors, which means it is more
important to check references and the level of education and consulting services that are
available.

The benefits and risks of modular systems are summarized in Figure 3.


Figure 3

Benefits and Risks of Modular Arrays

Source: Gartner Research

Key Issues
How will storage systems evolve during the next five years?
What changes in technologies and vendor dynamics will shape the storage industry?

Acronym Key

DR Disaster recovery
ESCON Enterprise Systems Connection
FC Fibre Channel
HBA Host bus adapter
HDS Hitachi Data Systems
ISV Independent software vendor
PIT Point-in-time
RAID Redundant array of independent disks
RC Remote copy
SAN Storage area network

=============================
Monolithic and modular storage both take advantage of the movement toward network
storage through consolidation, scalability, performance, availability and a better return on
investment.

Comparison of modular & monolithic

There are strong benefits for each design - the hard part is choosing.

Key storage attributes

Availability and reliability
Modular: Implementations should have path failover/redundancy in the host I/O path, switch and storage components (dual drive bus, dual controllers, fans, power supplies, and hot spare drive(s) for an immediate RAID rebuild process in the event of any disk failure). Remote mirroring and snapshot/backup techniques are available. Validates cluster server testing.
Monolithic: Robust failover and availability. Initially led and still perceived to provide the premier remote mirroring solutions, although this gap is rapidly closing.

Connectivity
Modular: SCSI, FC and iSCSI attach; however, lacking mainframe attach. Variations between vendors for number of logical volumes, channels and operating systems.
Monolithic: ESCON, FICON, SCSI and FC connections for mainframe and open systems attach. Large server connectivity, but being matched by some modular arrays.

Interoperability
Modular: Getting much better, but not ubiquitous. Look for interoperability processes/certifications and standards (i.e., FCIA SANmark certification). Complete solutions available from various vendors. Some modular vendors lead the market in this area.
Monolithic: Since these systems have the largest installed base, in general, more interoperability testing is available for their systems. As volumes shift towards modular systems, this advantage will go away.

Manageability
Modular: Vendor-specific usability and capability. Check for dynamic, online scaling capabilities (i.e., adding disks, expanding volumes, changing RAID or configurations). Heterogeneous management products are coming onto the market. Some vendors provide all management from within a single application.
Monolithic: Probably requires a service call to expand storage and change configurations. May have an advantage in staying with a homogenous brand of storage. Each vendor has a future strategy to manage heterogeneous storage. Often requires multiple applications to manage in open systems.

Performance
Modular: For open systems environments, the Storage Performance Council has established the first industry-standard benchmark for measuring I/O transactions per second, and modular storage came out a leader in actual performance. Scalable back-end channels and effective read-ahead algorithms assist in anticipatory reads for the random nature of open systems.
Monolithic: Cache-centric architectures were developed to counteract the architectural limitations of mainframe environments. The cache architecture - beneficial for mainframe performance - can't keep pace with the modular open-systems design, which can scale controllers and back-end channels to the disks.

Scalability/flexibility
Modular: Scales controller modules and drive modules independently. Pay-as-you-grow approach. Designed for simple expansion. Industry-standard rack-mount cabinets allow flexible appliance integration. Vendor-unique capacity per storage system, with some approaching the same capacity as monolithic.
Monolithic: Can attach storage to mainframe and open systems, but not recommended to have both on the same subsystem. Typically requires a service call to add or partition storage. May accommodate more storage with fewer subsystems.

Service
Modular: Vendor and channel partner dependent. Professional services not required. Ease of use can be built into the software interface.
Monolithic: Complete service and professional services offered and typically required. This can be valuable, but costly.

TCO
Modular: Architectures, innovation and competition allow for much better pricing and scalability. Gartner estimates 25% savings on storage costs. Additional service and maintenance savings.
Monolithic: Higher costs for open-system attach (Windows, Unix) and less performance; however, management of homogenous storage may be of value.

But, here's where it gets interesting: According to analysts, monolithic systems generally
provide more robust failover, availability, interoperability testing and professional service.
On the other hand, the little guys - the modular systems - excel in scalability, performance
and manageability, and cost less.

Monolithic boxes usually come with the RAID controllers and disk drives in large, self-
contained, one-size-fits-all enclosures. The majority of today's storage area network (SAN)-
based storage is in large enterprises where mainframe class external RAID systems were
established with EMC's Symmetrix and IBM/StorageTek's monolithic storage systems.
Currently EMC's Symmetrix, IBM's Enterprise Storage Server (Shark), StorageTek's SVA,
and Hitachi Data Systems' Lightning Series provide this large form-factor storage system
capable of attaching to mainframes, as well as open systems server environments.
In contrast, modular external RAID controller-based storage systems are defined by the
distinct separation of the RAID controller module(s) from the disk drive module(s). Each
module is housed in industry standard racks - which may also hold other general-purpose
appliance servers - separating the scaling of performance from the scaling of capacity.
Modular storage systems are targeted specifically at open systems servers, and typically
don't carry the large cache and myriad of connections, Enterprise Systems Connection
(ESCON) and Fiber Connectivity (FICON), which are required for mainframe storage.
Examples of modular storage include Compaq's EVA, EMC's Clariion, Dell's PowerVault,
HP's VA7000 series, IBM's FAStT series, LSI Logic's E-Series, StorageTek's D-Series,
XIOtech's Magnitude, and others. The analysts at Gartner, Inc. in Stamford, CT, are
projecting that modular storage systems will exceed the revenue of monolithic storage
systems in 2003, with a continuing growth trend favoring modular.

Benefits of modular storage

Modular storage developed in much the same vein as stereo components did: to satisfy a customer's
desire for lower-cost gear and prevent vendor lock-in. For example, with modular components an
iSCSI router can easily attach to lower-end servers that don't require the performance of Fibre
Channel (FC).

A summary of modular storage strengths

• Cost. Gartner estimates 25% less than monolithic. The actual cost comparisons can vary drastically and must be evaluated for each configuration.

• Performance. Applications such as transactional databases, data warehousing, customer support and e-commerce place different, more complex demands on storage system performance than previous-generation applications. Coupled with higher-performing server technology, modular storage leads performance by supplying multiple back-end drive channels and sophisticated controllers with special read-ahead algorithms.

• Scalability and flexibility. Systems can dynamically scale capacity and performance independently. Systems can start small and grow cost effectively.

• Footprint. The latest disk drives and packaging allow for optimum capacity per square foot.

• Manageability and usability. User-friendly interfaces and management tools have been designed to allow administrators to expand capacity, create new volumes, map new servers and perform other tasks that often require service calls on monolithic storage systems.

Benefits of monolithic storage

The IBM 3XX mainframe systems of the late 70s and early 80s were limited in their performance
due to the availability and cost of memory technologies. Mainframe storage vendors - initially EMC
and IBM/StorageTek - developed cache-focused storage systems to counteract the architectural
limitations of the mainframe environments. In the mainframe environment, it proved to be more cost
and performance effective to place significant cache in the storage system relative to the mainframe
to accommodate its more predictable, read-oriented applications. These large cache storage systems
provided significant performance advantages in the mainframe environment.

Typically one or more storage systems were connected to each mainframe system. Initially, midrange
servers - running Unix, NT and NetWare - each had their separate physical storage, either internally
or externally attached. Eventually these large external RAID storage systems expanded from
mainframe to also attach to open systems servers using a networked storage model. The advent of
consolidated, networked storage provided the ability to logically divide storage, allowing for much
higher capacity utilization.

The monolithic storage systems now offered with EMC's Symmetrix, IBM's Shark, StorageTek's
SVA and Hitachi's Lightning series provide the following:

• Connection. Both mainframe and open systems servers can be attached, although it's not recommended to use both on the same physical system.

• Homogenous environment. With the large installed base of these systems, it may be advantageous to keep one brand of storage system.

• Capacity. Capacity in a single storage subsystem is greater than in most modular storage systems.

• Robust replication and disaster recovery. The largest installed base of replication products is on these systems, which are perceived to have the most robust mirroring solutions, although this gap is closing quickly.

• ISV integration. Many integrators are familiar with these established systems.

• Service and support. Extensive professional services and support teams are in place.

The choice is yours

The good news is that storage technology is accelerating at a tremendous pace, bringing increased
functionality and benefits. Most importantly there's more choice: There isn't a one-size-fits-all
solution. Monolithic systems provide external RAID storage for mainframes and also attach to open
systems servers. Modular storage systems have been designed to attach to open systems
environments and provide a flexible, lower-cost alternative in these environments. As noted earlier,
there are benefits to each design.

Monolithic storage systems are the only alternative for mainframe attach and may be viable for open
systems servers, primarily if keeping with a single brand of storage. The additional cost of these large
systems may be justified because monolithic systems provide a homogenous storage environment.
They offer the most proven disaster recovery and replication products and usually come with the
highest levels of service and support.

Modular systems, on the other hand, are more flexible and scalable and allow a pay-as-you-grow
price structure. The management interfaces are often designed for end users, and as a result it's easier
to expand capacity, manage volumes, provision storage and a variety of other tasks that would likely
require a service call in the monolithic world. Performance is also superior in the open systems
environments, and modular systems cost significantly less than monolithic systems. In addition,
today most storage innovations first appear in modular systems.

The choices that are available offer increasing value to the customer and - with a little investigation -
administrators can meet their storage needs today and have an infrastructure that accommodates their
future requirements.
=============================================================

storage array (disk array)


A storage array, also called a disk array, is a data storage system that is used for block-
based, file-based or object storage. The term is used to describe dedicated storage
hardware that contains spinning hard disk drives (HDDs) and/or solid-state disk drives.

Arrays were initially designed to separate storage from servers so systems could be built
into large, monolithic configurations for block- or file-based storage. They have
complicated redundancy features built into them such as high-performance RAID
controllers, and the storage may be configured with logical unit numbers (LUNs).

The storage array is the backbone of the modern business storage environment. Arrays
have evolved into different designs for enterprise, midrange and small business
environments, and offer a wide range of data protection and high-availability features. They
contain controllers -- the brains of the system -- that provide a level of abstraction between
the operating system and physical drives. A controller has the ability to access copies of
data across physical devices, and can take the form of a PCI or PCI Express card designed
to support a specific drive format such as Serial ATA (SATA) or serial-attached SCSI (SAS).

Storage arrays and flash storage

Storage arrays were initially built with HDDs for storage area networks (SANs) -- block-based
storage -- or network-attached storage (NAS) -- file-based storage -- but there are now systems
built for flash or solid-state drive (SSD) storage. Flash arrays contain flash memory drives designed
to overcome the performance and capacity limitations of mechanical, spinning disk drives. A flash
array can read data from SSDs much faster than from disk drives and is increasingly used to boost
application performance. Storage arrays can be all-flash, all-spinning disk or hybrids combining both
types of media.

An enterprise-level storage array is for configurations that contain hundreds of servers. It
has the processing power to handle huge numbers of data transactions per second. A
midsize or low-end storage array is a stripped-down version for environments with only a
few servers.

array-based replication
Array-based replication is an approach to data backup in which compatible storage arrays
use built-in software to automatically copy data from one storage array to another.

Array-based replication software runs on one or more storage controllers resident
in disk storage systems, synchronously or asynchronously replicating data between similar
storage array models at the logical unit number (LUN) or volume block level. The term can
refer to the creation of local copies of data within the same array as the source data, as well
as the creation of remote copies in an array situated off site.

remote replication
What is remote replication?

Remote replication is the process of copying production data to a device at a remote
location for data protection or disaster recovery purposes.

Remote replication may be either synchronous or asynchronous. Synchronous replication
writes data to the primary and secondary sites at the same time. With asynchronous
replication, there is a delay before the data gets written to the secondary site. Because
asynchronous replication is designed to work over longer distances and requires less
bandwidth, it is often a better option for disaster recovery. However, asynchronous
replication risks a loss of data during a system outage because data at the target device
isn't synchronized with the source data.
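To make the difference concrete, here is a minimal Python sketch of the two modes, not a real array feature: the primary and secondary dictionaries, the replication queue and the short sleep are hypothetical stand-ins for two storage sites, a replication link and WAN latency.

```python
import queue
import threading
import time

# Hypothetical in-memory "sites": two dicts stand in for a primary and a secondary array.
primary, secondary = {}, {}
replication_queue = queue.Queue()


def synchronous_write(key, value):
    """Synchronous replication: the write is acknowledged only after both sites hold the data."""
    primary[key] = value
    secondary[key] = value  # stands in for waiting on the remote site's acknowledgment
    return "ack"


def asynchronous_write(key, value):
    """Asynchronous replication: acknowledge immediately, ship the change to the secondary later."""
    primary[key] = value
    replication_queue.put((key, value))  # the secondary lags until the queue drains
    return "ack"


def replicator():
    """Background worker that drains queued writes to the secondary site."""
    while True:
        key, value = replication_queue.get()
        time.sleep(0.01)  # stands in for WAN latency
        secondary[key] = value
        replication_queue.task_done()


threading.Thread(target=replicator, daemon=True).start()

synchronous_write("block-41", b"payload")
print("block-41 on secondary:", "block-41" in secondary)   # True immediately

asynchronous_write("block-42", b"payload")
print("block-42 on secondary:", "block-42" in secondary)   # very likely False right after the ack
replication_queue.join()                                    # wait for the replication lag to clear
print("block-42 on secondary:", "block-42" in secondary)   # True once the queue has drained
```

The window between the acknowledgment and the drained queue is exactly the data that could be lost in an outage under asynchronous replication.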

Replication occurs in one of three places: in the storage array, at the host (server) or in the
network. Most enterprise data storage vendors include replication software on their high-
end and mid-range storage arrays. Host-based replication software runs on standard
servers, making it the cheapest and easiest type of replication to manage for many, but it
taxes server processing. Replication on the network requires an additional device, either an
intelligent switch or an appliance.

Data can also be replicated remotely to a cloud backup service.

host-based replication

Host-based replication is the practice of using servers to copy data from one site to another.

Host-based replication is conducted by software that resides on application servers and
forwards data changes to another device. The process is usually file-based
and asynchronous: The software traps write input/output (I/O) and then forwards the changes
to replication targets.

To enable efficient and secure data copying, host-based replication software products
include capabilities such as deduplication, compression, encryption and throttling. Host-
based replication can also provide server and application failover capability to aid in disaster
recovery.

The other two main types of replication are array-based and network-based. Array-based
replication takes place in the storage array and network-based replication requires adding
an appliance or intelligent switch. Host-based replication is less expensive than the other
two options and works with any type of storage. However, it can impact performance of the
server and can also be negatively affected by viruses and application crashes.

Vendors of host-based replication software include CA, Neverfail, Quest Software and SIOS Technology.

data deduplication
Data deduplication -- often called intelligent compression or single-instance storage -- is a
process that eliminates redundant copies of data and reduces storage overhead. Data
deduplication techniques ensure that only one unique instance of data is retained on
storage media, such as disk, flash or tape. Redundant data blocks are replaced with a
pointer to the unique data copy. In that way, data deduplication closely aligns
with incremental backup, which copies only the data that has changed since the
previous backup.

For example, a typical email system might contain 100 instances of the same 1 megabyte
(MB) file attachment. If the email platform is backed up or archived, all 100 instances are
saved, requiring 100 MB of storage space. With data deduplication, only one instance of the
attachment is stored; each subsequent instance is referenced back to the one saved copy.
In this example, a 100 MB storage demand drops to 1 MB.

Target vs. source deduplication

Data deduplication can occur at the source or target level.

Source-based dedupe removes redundant blocks before transmitting data to a backup target at the
client or server level. There is no additional hardware required. Deduplicating at the source reduces
bandwidth and storage use.
In target-based dedupe, backups are transmitted across a network to disk-based hardware in a remote
location. Using deduplication targets increases costs, although it generally provides a performance
advantage compared to source dedupe, particularly for petabyte-scale data sets.

Techniques to deduplicate data

There are two main methods used to deduplicate redundant data: inline and post-processing
deduplication. Your backup environment will dictate which method you use.

Inline deduplication analyzes data as it is ingested in a backup system. Redundancies are
removed as the data is written to backup storage. Inline dedupe requires less backup
storage, but can cause bottlenecks. Storage array vendors recommend that their inline data
deduplication tools be turned off for high-performance primary storage.

Post-processing dedupe is an asynchronous backup process that removes redundant data
after it is written to storage. Duplicate data is removed and replaced with a pointer to the
first iteration of the block. The post-processing approach gives users the flexibility to dedupe
specific workloads and to quickly recover the most recent backup without rehydration. The
tradeoff is a larger backup storage capacity than is required with inline deduplication.

File-level vs. block-level deduplication

Data deduplication generally operates at the file or block level. File deduplication eliminates
duplicate files, but it is less efficient than block-level deduplication.

File-level data deduplication compares a file to be backed up or archived with copies that
are already stored. This is done by checking its attributes against an index. If the file is
unique, it is stored and the index is updated; if not, only a pointer to the existing file is
stored. The result is that only one instance of the file is saved, and subsequent copies are
replaced with a stub that points to the original file.

Block-level deduplication looks within a file and saves unique iterations of each block. The
file is broken into chunks of the same fixed length, and each chunk of data is processed
using a hash algorithm, such as MD5 or SHA-1.

This process generates a unique number for each piece, which is then stored in an index. If
a file is updated, only the changed data is saved, even if only a few bytes of the document
or presentation have changed. The changes don't constitute an entirely new file. This
behavior makes block deduplication far more efficient. However, block deduplication takes
more processing power and uses a much larger index to track the individual pieces.
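A minimal sketch of fixed-length block deduplication in Python, assuming SHA-1 fingerprints and a simple in-memory index (real products use persistent, far more elaborate index structures); the dedupe and rehydrate helpers are illustrative names, not any vendor's API.

```python
import hashlib

BLOCK_SIZE = 4096  # fixed-length chunks; real products may use variable-length chunking


def dedupe(data: bytes, index: dict) -> list:
    """Split data into fixed-size blocks, keep one copy of each unique block in `index`,
    and return the list of block hashes (the "recipe") needed to rebuild the data."""
    recipe = []
    for i in range(0, len(data), BLOCK_SIZE):
        block = data[i:i + BLOCK_SIZE]
        digest = hashlib.sha1(block).hexdigest()  # SHA-1 fingerprint of the block
        if digest not in index:                   # store only the first occurrence
            index[digest] = block
        recipe.append(digest)                     # pointer to the unique copy
    return recipe


def rehydrate(recipe: list, index: dict) -> bytes:
    """Rebuild the original data by following the block pointers."""
    return b"".join(index[digest] for digest in recipe)


index = {}
original = b"A" * 8192 + b"B" * 4096 + b"A" * 4096   # 16 KB containing repeated blocks
recipe = dedupe(original, index)

stored = sum(len(block) for block in index.values())
print(len(original), "bytes reduced to", stored, "bytes of unique blocks")  # 16384 -> 8192
assert rehydrate(recipe, index) == original
```

The index of digests is the "much larger index" the text refers to: it grows with the number of unique blocks, which is the price paid for the finer-grained savings.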

Variable-length deduplication is an alternative that breaks a file system into chunks of
various sizes, allowing the deduplication effort to achieve better data reduction ratios than
fixed-length blocks. The downsides are that it also produces more metadata and tends to be
slower.

Hash collisions are a potential problem with deduplication. When a piece of data receives a
hash number, that number is then compared with the index of other existing hash numbers.
If that hash number is already in the index, the piece of data is considered a duplicate and
does not need to be stored again. Otherwise, the new hash number is added to the index
and the new data is stored. In rare cases, the hash algorithm may produce the same hash
number for two different chunks of data. When a hash collision occurs, the system won't
store the new data because it sees that its hash number already exists in the index. This is
called a false positive, and it can result in data loss. Some vendors combine hash
algorithms to reduce the possibility of a hash collision. Some vendors are also examining
metadata to identify data and prevent collisions.

Data deduplication vs. compression vs. thin provisioning


Another technique often associated with deduplication is compression. However, the two
techniques operate differently: data dedupe seeks out redundant chunks of data, while
compression uses an algorithm to reduce the number of bits needed to represent data.

Compression and delta differencing are often used with deduplication. Taken together,
these three data reduction techniques are designed to optimize storage capacity.

Thin provisioning optimizes how capacity is utilized in a storage area network.

Conversely, erasure coding is a method of data protection that breaks data into fragments
and encodes each fragment with redundant data pieces to help reconstruct corrupted data
sets.

Other benefits of deduplication include:

• A reduced data footprint;
• Lower bandwidth consumption when copying data associated with remote backups, replication and disaster recovery;
• Longer retention periods;
• Faster recovery time objectives; and
• Reduced tape backups.


Deduplication of primary data and the cloud

Data deduplication originated in backup and secondary storage, although it is possible to
dedupe primary data sets. It is particularly helpful to maximize flash storage capacity and
performance. Primary storage deduplication occurs as a function of the storage hardware or
operating system software.

Techniques for data dedupe hold promise for cloud services providers in the area of
rationalizing expenses. The ability to deduplicate what they store results in lower costs for
disk storage and bandwidth for off-site replication.
data compression
Data compression is a reduction in the number of bits needed to represent data.
Compressing data can save storage capacity, speed up file transfer, and decrease costs
for storage hardware and network bandwidth.

How compression works

Compression is performed by a program that uses a formula or algorithm to determine how
to shrink the size of the data. For instance, an algorithm may represent a string of bits -- or
0s and 1s -- with a smaller string of 0s and 1s by using a dictionary for the conversion
between them, or the formula may insert a reference or pointer to a string of 0s and 1s that
the program has already seen.

Text compression can be as simple as removing all unneeded characters, inserting a single
repeat character to indicate a string of repeated characters and substituting a smaller bit
string for a frequently occurring bit string. Data compression can often reduce a text file to
50% of its original size, or significantly less.
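As a toy illustration of the "repeat character" idea above, here is a minimal run-length encoder in Python; it sketches the principle rather than any production compressor.

```python
def rle_encode(text: str) -> str:
    """Tiny run-length encoder: 'aaaabbc' -> 'a4b2c1'."""
    if not text:
        return ""
    out = []
    run_char, run_len = text[0], 1
    for ch in text[1:]:
        if ch == run_char:
            run_len += 1
        else:
            out.append(f"{run_char}{run_len}")
            run_char, run_len = ch, 1
    out.append(f"{run_char}{run_len}")
    return "".join(out)


print(rle_encode("aaaaaaaabbbbcc"))  # 'a8b4c2': 14 characters stored as 6
```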

For data transmission, compression can be performed on the data content or on the entire
transmission unit, including header data. When information is sent or received via the
internet, larger files, either singly or with others as part of an archive file, may be transmitted
in a ZIP, GZIP or other compressed format.

Why is data compression important?

Data compression can dramatically decrease the amount of storage a file takes up. For
example, in a 2:1 compression ratio, a 20 megabyte (MB) file takes up 10 MB of space. As
a result of compression, administrators spend less money and less time on storage.
Compression optimizes backup storage performance and has recently shown up in primary storage
data reduction. Compression will be an important method of data reduction as data continues to grow
exponentially.

Virtually any type of file can be compressed, but it's important to follow best practices when
choosing which ones to compress. For example, some files may already come compressed, so
compressing those files would not have a significant impact.
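One hedged way to see both effects, the large savings on repetitive data and the negligible gain on data that is already compressed (random bytes behave the same way), is to measure sizes with Python's standard zlib module; exact numbers depend entirely on the input.

```python
import os
import zlib

text = b"the quick brown fox jumps over the lazy dog " * 500   # repetitive, compresses well
noise = os.urandom(len(text))                                   # behaves like already-compressed data

for label, payload in (("repetitive text", text), ("random bytes", noise)):
    packed = zlib.compress(payload, 6)
    print(f"{label}: {len(payload)} -> {len(packed)} bytes")
# The repetitive text shrinks dramatically; the random payload does not get smaller,
# which is also what happens when you try to re-compress an already compressed file.
```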

Data compression methods: lossless and lossy compression

Compressing data can be a lossless or lossy process. Lossless compression enables the restoration of
a file to its original state, without the loss of a single bit of data, when the file is uncompressed.
Lossless compression is the typical approach with executables, as well as text and spreadsheet files,
where the loss of words or numbers would change the information.

Lossy compression permanently eliminates bits of data that are redundant, unimportant or
imperceptible. Lossy compression is useful with graphics, audio, video and images, where the
removal of some data bits has little or no discernible effect on the representation of the content.

Graphics image compression can be lossy or lossless. Graphic image file formats are
typically designed to compress information since the files tend to be large. JPEG is an
image file format that supports lossy image compression. Formats such as GIF and PNG
use lossless compression.

Compression vs. data deduplication

Compression is often compared to data deduplication, but the two techniques operate
differently. Deduplication is a type of compression that looks for redundant chunks of data
across a storage or file system and then replaces each duplicate chunk with a pointer to the
original. Data compression algorithms, by contrast, reduce the size of bit strings within a
data stream; they work at a far smaller scope and generally remember no more than the
last megabyte or less of data.
File-level deduplication eliminates redundant files and replaces them with stubs pointing to the
original file. Block-level deduplication identifies duplicate data at the subfile level. The system saves
unique instances of each block, uses a hash algorithm to process them and generates a unique
identifier to store them in an index. Deduplication typically looks for larger chunks of duplicate data
than compression, and systems can deduplicate using a fixed or variable-sized chunk.

Deduplication is most effective in environments that have a high degree of redundant data, such
as virtual desktop infrastructure or storage backup systems. Data compression tends to be more
effective than deduplication in reducing the size of unique information, such as images, audio,
videos, databases and executable files. Many storage systems support both compression and
deduplication.

Data compression and backup

Compression is often used for data that's not accessed much, as the process can be intensive and slow
down systems. Administrators, though, can seamlessly integrate compression in their backup
systems.

Backup is a redundant type of workload, as the process captures the same files frequently. An
organization that performs full backups will often have close to the same data from backup to
backup.

There are major benefits to compressing data prior to backup:

• Data takes up less space, as a compression ratio can reach 100:1, but between 2:1 and 5:1 is common.

• If compression is done in a server prior to transmission, the time needed to transmit the data and the total network bandwidth are drastically reduced.

• On tape, the compressed, smaller file system image can be scanned faster to reach a particular file, reducing restore latency.

• Compression is supported by backup software and tape libraries, so there is a choice of data compression techniques.

Pros and cons of compression

The main advantages of compression are a reduction in storage hardware, data transmission time and
communication bandwidth -- and the resulting cost savings. A compressed file requires less storage
capacity than an uncompressed file, and the use of compression can lead to a significant decrease in
expenses for disk and/or solid-state drives. A compressed file also requires less time for transfer, and
it consumes less network bandwidth than an uncompressed file.

The main disadvantage of data compression is the performance impact resulting from the use
of CPU and memory resources to compress the data and perform decompression. Many vendors have
designed their systems to try to minimize the impact of the processor-intensive calculations
associated with compression. If the compression runs inline, before the data is written to disk, the
system may offload compression to preserve system resources. For instance, IBM uses a
separate hardware acceleration card to handle compression with some of its enterprise storage
systems.

If data is compressed after it is written to disk, or post-process, the compression may run in the
background to reduce the performance impact. Although post-process compression can reduce the
response time for each input/output (I/O), it still consumes memory and processor cycles and can
affect the overall number of I/Os a storage system can handle. Also, because data initially must be
written to disk or flash drives in an uncompressed form, the physical storage savings are not as great
as they are with inline compression.

Data compression techniques: File system compression

File system compression takes a fairly straightforward approach to reducing the storage footprint of
data by transparently compressing each file as it is written.

Many of the popular Linux file systems -- including Reiser4, ZFS and btrfs -- and Microsoft NTFS
have a compression option. The server compresses chunks of data in a file and then writes the
smaller fragments to storage.
Read-back involves a relatively small latency to expand each fragment, while writing adds
substantial load to the server, so compression is usually not recommended for data that is volatile.
File system compression can weaken performance, so it should be deployed selectively on files that
are not accessed frequently.

Historically, with the expensive hard drives of early computers, data compression software such as
DiskDoubler and SuperStor Pro was popular and helped establish mainstream file system
compression.

Storage administrators can also combine compression and deduplication for improved data reduction.

Technologies and products that use data compression

Compression is built into a wide range of technologies, including storage systems, databases,
operating systems and software applications used by businesses and enterprise organizations.
Compressing data is also common in consumer devices, such as laptops, PCs and mobile phones.

Many systems and devices perform compression transparently, but some give users the option to turn
compression on or off. It can be performed more than once on the same file or piece of data, but
subsequent compressions result in little to no additional compression and may even increase the size
of the file to a slight degree, depending on the data compression algorithms.

WinZip is a popular Windows program that compresses files when it packages them in an archive.
Archive file formats that support compression include ZIP and RAR. The BZIP2 and GZIP formats
see widespread use for compressing individual files.

Other vendors that offer compression include Dell EMC with its XtremIO all-flash array, Kaminario
with its K2 all-flash array and RainStor with its data compression software.

Data differencing
Data differencing is a general term for comparing the contents of two data objects. In the context of
compression, it involves repetitively searching through the target file to find similar blocks and
replacing them with a reference to a library object. This process repeats until it finds no additional
duplicate objects. Data differencing can result in many compressed files with just one element in the
library representing each duplicated object.

In virtual desktops, this technique can feature a compression ratio of as much as 100:1. The
process is often more closely aligned with deduplication, which looks for identical files or
objects, rather than within the content of each object.

Data differencing is sometimes referred to as deduplication.

encryption
In computing, encryption is the method by which plaintext or any other type of data is
converted from a readable form to an encoded version that can only be decoded by another
entity if they have access to a decryption key. Encryption is one of the most important
methods for providing data security, especially for end-to-end protection of data transmitted
across networks.

Encryption is widely used on the internet to protect user information being sent between a
browser and a server, including passwords, payment information and other personal
information that should be considered private. Organizations and individuals also commonly
use encryption to protect sensitive data stored on computers, servers and mobile devices
like phones or tablets.

How encryption works

Unencrypted data, often referred to as plaintext, is encrypted using an
encryption algorithm and an encryption key. This process generates ciphertext that can only
be viewed in its original form if decrypted with the correct key. Decryption is simply the
inverse of encryption, following the same steps but reversing the order in which the keys are
applied. Today's most widely used encryption algorithms fall into two categories: symmetric
and asymmetric.


Symmetric and asymmetric encryption

Symmetric-key ciphers, also referred to as "secret key," use a single key, sometimes
referred to as a shared secret because the system doing the encryption must share it with
any entity it intends to be able to decrypt the encrypted data. The most widely used
symmetric-key cipher is the Advanced Encryption Standard (AES), which was designed to
protect government classified information.
Symmetric-key encryption is usually much faster than asymmetric encryption, but the
sender must exchange the key used to encrypt the data with the recipient before the
recipient can perform decryption on the ciphertext. The need to securely distribute and
manage large numbers of keys means most cryptographic processes use a symmetric
algorithm to efficiently encrypt data, but use an asymmetric algorithm to securely exchange
the secret key.
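As a minimal sketch of shared-secret encryption, the snippet below uses the Fernet recipe from the widely used Python cryptography package, an authenticated symmetric scheme built on AES; it is illustrative only, and in practice the key itself would be exchanged securely, as described above.

```python
# Requires the third-party "cryptography" package (pip install cryptography).
from cryptography.fernet import Fernet

key = Fernet.generate_key()        # the shared secret; both parties must hold this
cipher = Fernet(key)               # Fernet is an authenticated, AES-based symmetric recipe

token = cipher.encrypt(b"Be at the gates at six")
print(token)                       # opaque ciphertext; useless without the key

print(Fernet(key).decrypt(token))  # the same key decrypts it: b'Be at the gates at six'
```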

Asymmetric cryptography, also known as public key cryptography, uses two different but
mathematically linked keys, one public and one private. The public key can be shared with
everyone, whereas the private key must be kept secret. The RSA encryption algorithm is
the most widely used public key algorithm, partly because both the public and the private
keys can encrypt a message; the opposite key from the one used to encrypt a message is
used to decrypt it. This attribute provides a method of assuring not only confidentiality, but
also the integrity, authenticity and nonrepudiation of electronic communications and data at
rest through the use of digital signatures.
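The sketch below shows the public/private split using RSA with OAEP padding from the same Python cryptography package; it encrypts a hypothetical session key with the recipient's public key so that only the private-key holder can recover it, mirroring the hybrid pattern described earlier, and is meant as an illustration rather than a hardened implementation.

```python
# Requires the third-party "cryptography" package (pip install cryptography).
from cryptography.hazmat.primitives import hashes
from cryptography.hazmat.primitives.asymmetric import rsa, padding

private_key = rsa.generate_private_key(public_exponent=65537, key_size=2048)
public_key = private_key.public_key()   # safe to publish; the private key stays secret

oaep = padding.OAEP(
    mgf=padding.MGF1(algorithm=hashes.SHA256()),
    algorithm=hashes.SHA256(),
    label=None,
)

# Anyone with the public key can encrypt, e.g. a session key for later symmetric use ...
ciphertext = public_key.encrypt(b"hypothetical session key", oaep)

# ... but only the holder of the matching private key can decrypt it.
assert private_key.decrypt(ciphertext, oaep) == b"hypothetical session key"
```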

Benefits of encryption

The primary purpose of encryption is to protect the confidentiality of digital data stored on
computer systems or transmitted via the internet or any other computer network. A number
of organizations and standards bodies either recommend or require sensitive data to be
encrypted in order to prevent unauthorized third parties or threat actors from accessing the
data. For example, the Payment Card Industry Data Security Standard requires merchants
to encrypt customers' payment card data when it is both stored at rest and transmitted
across public networks.

Modern encryption algorithms also play a vital role in the security assurance of IT systems and
communications as they can provide not only confidentiality, but also the following key elements of
security:

• Authentication: the origin of a message can be verified.
• Integrity: proof that the contents of a message have not been changed since it was sent.
• Nonrepudiation: the sender of a message cannot deny sending the message.

Types of encryption

Traditional public key cryptography depends on the properties of large prime numbers and the
computational difficulty of factoring the product of two large primes. Elliptic curve cryptography
(ECC) enables another kind of public key cryptography that depends on the properties of the elliptic
curve equation; the resulting cryptographic algorithms can be faster and more efficient and can
produce comparable levels of security with shorter cryptographic keys. As a result, ECC algorithms
are often implemented in internet of things devices and other products with limited computing
resources.

As development of quantum computing continues to approach practical application, quantum
cryptography will become more important. Quantum cryptography depends on the quantum
mechanical properties of particles to protect data. In particular, the Heisenberg uncertainty principle
posits that the two identifying properties of a particle -- its location and its momentum -- cannot be
measured without changing the values of those properties. As a result, quantum-encoded data cannot
be copied, because any attempt to access or copy the encoded data changes it, thus notifying the
authorized parties to the encryption that an attack has occurred.

Encryption is used to protect data stored on a system (encryption in place or encryption at rest); many
internet protocols define mechanisms for encrypting data moving from one system to another (data in
transit).

Some applications tout the use of end-to-end encryption (E2EE) to guarantee data being sent between
two parties cannot be viewed by an attacker that intercepts the communication channel. Use of an
encrypted communication circuit, as provided by Transport Layer Security (TLS) between web client
and web server software, is not always enough to ensure E2EE; typically, the actual content being
transmitted is encrypted by client software before being passed to a web client, and decrypted only
by the recipient.
Messaging apps that provide E2EE include Facebook's WhatsApp and Open Whisper Systems'
Signal. Facebook Messenger users may also get E2EE messaging with the "Secret Conversations"
option.

How encryption is used

Encryption was almost exclusively used only by governments and large enterprises until the late
1970s when the Diffie-Hellman key exchange and RSA algorithms were first published -- and the
first personal computers were introduced. By the mid-1990s, both public key and private key
encryption were being routinely deployed in web browsers and servers to protect sensitive data.

Encryption is now an important part of many products and services, used in the commercial and
consumer realms to protect data both while it is in transit and while it is stored, such as on a hard
drive, smartphone or flash drive (data at rest).

Devices like modems, set-top boxes, smartcards and SIM cards all use encryption or rely
on protocols like SSH, S/MIME, and SSL/TLS to encrypt sensitive data. Encryption is used to
protect data in transit sent from all sorts of devices across all sorts of networks, not just the internet;
every time someone uses an ATM or buys something online with a smartphone, makes a mobile
phone call or presses a key fob to unlock a car, encryption is used to protect the information being
relayed. Digital rights management systems, which prevent unauthorized use or reproduction of
copyrighted material, are yet another example of encryption protecting data.

Cryptographic hash functions

Encryption is usually a two-way function, meaning the same algorithm can be used to encrypt
plaintext and to decrypt ciphertext. A cryptographic hash function can be viewed as a type of one-
way function for encryption, meaning the function output cannot easily be reversed to recover the
original input. Hash functions are commonly used in many aspects of security to generate digital
signatures and data integrity checks. They take an electronic file, message or block of data and
generate a short digital fingerprint of the content called a message digest or hash value. The key
properties of a secure cryptographic hash function are:
• Output length is small compared to input
• Computation is fast and efficient for any input
• Any change to input affects lots of output bits
• One-way value -- the input cannot be determined from the output
• Strong collision resistance -- two different inputs can't create the same output
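A quick way to observe several of these properties, the fixed-size output and the "avalanche" of bit changes caused by a tiny change to the input, is with Python's built-in hashlib module:

```python
import hashlib

a = hashlib.sha256(b"storage array").hexdigest()
b = hashlib.sha256(b"storage arraY").hexdigest()   # input differs by one character

print(a, b, sep="\n")
print(len(a))   # always 64 hex characters (256 bits), regardless of input size

# Count how many of the 256 output bits differ between the two digests.
diff_bits = bin(int(a, 16) ^ int(b, 16)).count("1")
print(diff_bits)   # roughly half of the bits flip for a one-character change
```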

The ciphers in hash functions are optimized for hashing: They use large keys and blocks, can
efficiently change keys every block and have been designed and vetted for resistance to related-key
attacks. General-purpose ciphers used for encryption tend to have different design goals. For
example, the symmetric-key block cipher AES could also be used for generating hash values, but its
key and block sizes make it nontrivial and inefficient.

Contemporary encryption issues

For any cipher, the most basic method of attack is brute force: trying each key until the right one is
found. The length of the key determines the number of possible keys, and hence the feasibility of this
type of attack. Encryption strength is directly tied to key size, but as the key size increases so, too, do
the resources required to perform the computation.
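Some rough arithmetic shows why key length dominates the feasibility of brute force. Assuming a purely illustrative attacker who can test 10^9 keys per second:

```python
# Illustrative arithmetic only: assume an attacker testing 10**9 keys per second.
RATE = 10**9
SECONDS_PER_YEAR = 60 * 60 * 24 * 365

for bits in (56, 128, 256):
    keys = 2**bits
    years = keys / RATE / SECONDS_PER_YEAR
    print(f"{bits}-bit key: {keys:.2e} possible keys, ~{years:.2e} years to try them all")
# At this rate a 56-bit (old DES-sized) key space falls in a few years,
# while a 128-bit key space takes on the order of 10**22 years.
```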

Alternative methods of breaking a cipher include side-channel attacks, which don't attack the actual
cipher but the physical side effects of its implementation. An error in system design or execution can
allow such attacks to succeed.

Attackers may also attempt to break a targeted cipher through cryptanalysis, the process of
attempting to find a weakness in the cipher that can be exploited with a complexity less than a brute-
force attack. The challenge of successfully attacking a cipher is easier if the cipher itself is already
flawed. For example, there have been suspicions that interference from the National Security
Agency weakened the Data Encryption Standard algorithm, and following revelations from former
NSA analyst and contractor Edward Snowden, many believe the NSA has attempted to subvert other
cryptography standards and weaken encryption products.
More recently, law enforcement agencies such as the FBI have criticized technology companies that
offer end-to-end encryption, arguing that such encryption prevents law enforcement from accessing
data and communications even with a warrant. The FBI has referred to this issue as "Going Dark,"
while the U.S. Department of Justice has proclaimed the need for "responsible encryption" that can
be unlocked by technology companies under a court order.

History of encryption

The word encryption comes from the Greek word kryptos, meaning hidden or secret. The use of
encryption is nearly as old as the art of communication itself. As early as 1900 B.C., an Egyptian
scribe used nonstandard hieroglyphs to hide the meaning of an inscription. In a time when most
people couldn't read, simply writing a message was often enough, but encryption schemes soon
developed to convert messages into unreadable groups of figures to protect the message's secrecy
while it was carried from one place to another. The contents of a message were reordered
(transposition) or replaced (substitution) with other characters, symbols, numbers or pictures in order
to conceal its meaning.

In 700 B.C., the Spartans wrote sensitive messages on strips of leather wrapped around
sticks. When the strip was unwound, the characters became meaningless, but with a stick
of exactly the same diameter, the recipient could recreate (decipher) the message. Later,
the Romans used what's known as the Caesar Shift Cipher, a monoalphabetic cipher in
which each letter is shifted by an agreed number. So, for example, if the agreed number is
three, then the message, "Be at the gates at six" would become "eh dw wkh jdwhv dw vla".
At first glance this may look difficult to decipher, but juxtaposing the start of the alphabet
until the letters make sense doesn't take long. Also, the vowels and other commonly used
letters like T and S can be quickly deduced using frequency analysis, and that information,
in turn, can be used to decipher the rest of the message.
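The shift is easy to reproduce in a few lines of Python; this small sketch shifts only the letters, leaves spaces alone and reproduces the example ciphertext above:

```python
def caesar_shift(text: str, shift: int = 3) -> str:
    """Shift each letter by `shift` places in the alphabet; leave other characters alone."""
    out = []
    for ch in text.lower():
        if ch.isalpha():
            out.append(chr((ord(ch) - ord("a") + shift) % 26 + ord("a")))
        else:
            out.append(ch)
    return "".join(out)


print(caesar_shift("Be at the gates at six"))            # eh dw wkh jdwhv dw vla
print(caesar_shift("eh dw wkh jdwhv dw vla", shift=-3))  # be at the gates at six
```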

The Middle Ages saw the emergence of polyalphabetic substitution, which uses multiple
substitution alphabets to limit the use of frequency analysis to crack a cipher. This method
of encrypting messages remained popular despite many implementations that failed to
adequately conceal when the substitution changed, also known as key progression.
Possibly the most famous implementation of a polyalphabetic substitution cipher is the
Enigma electromechanical rotor cipher machine used by the Germans during World War II.

It was not until the mid-1970s that encryption took a major leap forward. Until this point, all
encryption schemes used the same secret for encrypting and decrypting a message: a
symmetric key. In 1976, Whitfield Diffie and Martin Hellman's paper "New Directions in
Cryptography" solved one of the fundamental problems of cryptography: namely, how to
securely distribute the encryption key to those who need it. This breakthrough was followed
shortly afterward by RSA, an implementation of public-key cryptography using asymmetric
algorithms, which ushered in a new era of encryption.
