Network-attached storage (NAS) arrays. Storage area network (SAN) arrays: modular SAN
arrays and monolithic SAN arrays.
Finding the right storage devices for your business or domestic needs can be problematic,
whether you’re struggling to balance cost with the amount of capacity you expect to need,
or unable to decide on the right data storage array. For some people, choosing between
modular and monolithic storage is incredibly difficult: each type offers its own advantages
and disadvantages, and will appeal to people for various reasons. Understanding the
difference between the two types can be quite a challenge, given the complexity of their
design and performance, but at eProvided, we’re here to help you make sense of all things
storage and data recovery – we’re going to explore the basics of modular and monolithic
storage, to help you make a more informed choice.
Modular Storage
Modular storage arrays are typically based on two controllers, which are kept apart from the
unit’s disks – this ensures that, if one controller experiences a failure, the second will take
over from the first automatically. These are held on a shelf which runs on a separate power
source to the disks, to which they’re connected via cables (usually copper or optical). One
of the key advantages of modular storage is that it’s usually cheaper than monolithic
storage, and can be added to over time: you may want to start with a single controller shelf
and one shelf housing disks, before adding more and more as your needs dictate, until you
reach its optimum capacity.
Generally, smaller businesses or households with low storage demands may begin with a
low-capacity modular storage device – which is more cost-effective – and then expand it as
the budget allows and your data needs demand.
In general, modular storage is usually up to 25% cheaper than monolithic devices, though
costs vary by configuration. Modular arrays also tend to offer interfaces designed for
maximum user-friendliness, allowing easier capacity expansion and reducing the need for
calls for technical assistance.
Monolithic Storage
Unlike modular storage, monolithic storage is based on disks fixed into the array frame and
linked to the various controllers via massive amounts of cache memory. These arrays offer
helpful redundancy features, so that any component failure is immediately compensated for
by a redundant counterpart. Thanks to this design, monolithic storage rarely fails.
With plenty of cache memory (shared among any linked devices), these offer greater
storage capacities than modular arrays, and are typically equipped for better data-
management. The more cache memory your storage device has, the faster and smoother it
will run. These typically boast the better service and support of the two, and their proven
disaster-recovery appeals to many businesses – you may be more willing to pay for long-
term reliability (if your budget allows).
At eProvided, we offer data recovery across all devices, regardless of their construction. With
more than fifteen years in business and a team of experienced specialists, we use the latest
techniques to retrieve your vital lost data. Whether you fear your favourite images have
been overwritten or damaged for good, or your hard drive appears to have shut down or
broken, we're here to help. Just give us a call on 1-866-857-5950, or
email contact@eprovided.com.
========================
Both types of storage subsystems can be used for most applications and environments.
Choosing which is best means assessing the benefits and risks of each.
Analysis
Change is now a constant in the storage marketplace. New partnerships are reshaping
distribution and support channels, and innovations in packaging and breakthrough
technologies will further segment the storage market. Disk array features that provided
product differentiation last year are now often prerequisites for inclusion in an enterprise’s
shortlist and are now offered by monolithic and modular disk arrays. Figure 1 illustrates the
architectural differences between the two approaches.
Figure 1
Today, enterprises can deploy either kind of array for most applications, and both are
suitable for networked storage environments. For monolithic and modular systems, point-in-
time (PIT) and remote-copy (RC) technologies have become almost ubiquitous, while FC
and storage area networks (SANs) have become standard interconnect technologies. As with
features and functions, the relevance of data availability, throughput and performance is
also evolving. Unplanned downtime is now such a rare event that enterprises often focus
more on planned downtime issues to identify product differentiation. In addition, many
users are now viewing throughput and performance as a scalability enabler and productivity
tool rather than as a source of significant product differentiation.
Monolithic systems' value-adding features, and the extensive industry experience of using
them to support extreme 24x7 environments, make them a relatively low-risk solution compared with
modular arrays. Monolithic systems also offer better tuning tools than modular arrays. For
example, EMC and Hitachi Data Systems (HDS) provide the ability to identify "hot spots" in
redundant array of independent disks (RAID) sets and nondisruptively move them. Service
and support of monolithic arrays are also generally better than for modular arrays. Because
monolithic-array vendors are trying to move down-market while modular array vendors are
trying to move up-market, we encourage enterprises to seek bids from vendors that can
offer both solutions in open-system environments.
The risks of monolithic systems — limited deployment flexibility, higher costs and sensitivity
to cache hit ratios — derive from their cache-centric designs. In particular, small monolithic
configurations are very expensive because they are partially populated boxes that are
expensive to maintain. The only significant difference between a big and small configuration
is the number of disks in the box. There is also a greater risk of vendor lock-in, which
increases the acquisition cost of storage infrastructures.
Figure 2
FC host connections provide modular arrays with the connectivity needed to implement
storage consolidation projects, and put them into more direct competition with monolithic
arrays. Today's leading modular arrays have no single points of failure, and many are also
delivering nondisruptive maintenance and microcode updates. Modular arrays and
monolithic arrays should easily exceed most throughput requirements, although costs may
be very different. Modular array throughput is increased by adding additional controller
electronics and host connections, while monolithic performance is generally enhanced by
adding cache or additional host connections.
Modular arrays’ low cost and high availability are compelling advantages in small and
midsize enterprises not requiring sophisticated DR schemes or S/390 connectivity. Their low
cost, even in small configurations (less than 500GB), and consolidated management tools
make them the only solution when high-availability distributed storage is required.
Performance is very good in open-system environments, because modular arrays rely
primarily on bandwidth rather than cache hit ratios to deliver throughput. However,
identifying and resolving throughput problems can be difficult because of a lack of
instrumentation and tuning tools.
Where scalability and connectivity limitations are not an issue, the availability of PIT and RC
functionality makes modular arrays viable in mission-critical environments. Understanding
the details of PIT implementations is also critical because implementations from the same
vendor are not always functionally equivalent — for example, EMC’s SnapView is a logical
PIT copy, whereas TimeFinder is a physical PIT copy.
Risks focus on large applications that span multiple storage subsystems, and on service and
support. Many smaller vendors rely on third-party support and are still not experienced in
supporting high-availability environments. Modular array vendors also have less experience
with modular PIT and RC solutions than monolithic vendors, which means it is more
important to check references and the level of education and consulting services that are
available.
Key Issues
How will storage systems evolve during the next five years?
What changes in technologies and vendor dynamics will shape the storage industry?
Acronym Key
DR Disaster recovery
ESCON Enterprise Systems Connection
FC Fibre Channel
HBA Host bus adapter
HDS Hitachi Data Systems
ISV Independent software vendor
PIT Point-in-time
RAID Redundant array of independent disks
RC Remote copy
SAN Storage area network
=============================
Monolithic and modular storage both take advantage of the movement toward network
storage through consolidation, scalability, performance, availability and a better return on
investment.
There are strong benefits for each design - the hard part is choosing.
Interoperability
- Modular: Getting much better, but not ubiquitous. Look for interoperability
  processes/certifications and standards (i.e., FCIA SANmark certification). Complete
  solutions available from various vendors. Some modular vendors lead the market in this
  area.
- Monolithic: Since these systems have the largest installed base, in general, more
  interoperability testing is available for their systems. As volumes shift towards modular
  systems, this advantage will go away.
But, here's where it gets interesting: According to analysts, monolithic systems generally
provide more robust failover, availability, interoperability testing and professional service.
On the other hand, modular systems excel in scalability, performance and management, and
cost less.
Monolithic boxes usually come with the RAID controllers and disk drives in large, self-
contained, one-size-fits-all enclosures. The majority of today's storage area network (SAN)-
based storage is in large enterprises where mainframe class external RAID systems were
established with EMC's Symmetrix and IBM/StorageTek's monolithic storage systems.
Currently EMC's Symmetrix, IBM's Enterprise Storage Server (Shark), StorageTek's SVA,
and Hitachi Data Systems' Lightning Series provide this large form-factor storage system
capable of attaching to mainframes, as well as open systems server environments.
In contrast, modular external RAID controller-based storage systems are defined by the
distinct separation of the RAID controller module(s) from the disk drive module(s). Each
module is housed in industry-standard racks - which may also hold other general-purpose
appliance servers - and separates the scaling of performance from the scaling of capacity.
Modular storage systems are targeted specifically at open systems servers, and typically
don't carry the large cache and myriad of connections, Enterprise Systems Connection
(ESCON) and Fiber Connectivity (FICON), which are required for mainframe storage.
Examples of modular storage include Compaq's EVA, EMC's Clariion, Dell's PowerVault,
HP's VA7000 series, IBM's FAStT series, LSI Logic's E-Series, StorageTek's D-Series,
XIOtech's Magnitude, and others. The analysts at Gartner, Inc. in Stamford, CT, are
projecting that modular storage systems will exceed the revenue of monolithic storage
systems in 2003, with a continuing growth trend favoring modular.
Modular storage developed in much the same vein as stereo components did: to satisfy a customer's
desire for lower-cost gear and prevent vendor lock-in. For example, with modular components an
iSCSI router can easily attach to lower-end servers that don't require the performance of Fibre
Channel (FC).
Cost. Gartner estimates 25% less than monolithic. The actual cost comparisons can vary
drastically and must be evaluated for each configuration.
Scalability and flexibility. Systems can dynamically scale capacity and performance
independently. Systems can start small and grow cost effectively.
Footprint. The latest disk drives and packaging allow for optimum capacity per square foot.
Manageability and usability. User-friendly interfaces and management tools have been
designed to allow administrators to expand capacity, create new volumes, map new servers and
perform other tasks that often require service calls in the monolithic storage systems.
The IBM 3XX mainframe systems of the late 70s and early 80s were limited in their performance
due to the availability and cost of memory technologies. Mainframe storage vendors - initially EMC
and IBM/StorageTek - developed cache-focused storage systems to counteract the architectural
limitations of the mainframe environments. In the mainframe environment, it proved to be more cost
and performance effective to place significant cache in the storage system relative to the mainframe
to accommodate its more predictable, read-oriented applications. These large cache storage systems
provided significant performance advantages in the mainframe environment.
Typically one or more storage systems were connected to each mainframe system. Initially, midrange
servers - running Unix, NT and NetWare - each had their separate physical storage, either internally
or externally attached. Eventually these large external RAID storage systems expanded from
mainframe to also attach to open systems servers using a networked storage model. The advent of
consolidated, networked storage provided the ability to logically divide storage, allowing for much
higher capacity utilization.
The monolithic storage systems now offered with EMC's Symmetrix, IBM's Shark, StorageTek's
SVA and Hitachi's Lightning series provide the following:
Connection. Both mainframe and open systems servers can be attached, although it's not
recommended to use both on the same physical system.
Homogenous environment. With the large installed base of these systems, it may be
advantageous to keep one brand of storage system.
Capacity. A single monolithic subsystem offers greater capacity than most modular storage systems.
Robust replication and disaster recovery. The largest installed base of replication products is
on these systems, which are perceived to have the most robust mirroring solutions, although this
gap is closing quickly.
ISV integration. Many integrators are familiar with these established systems.
Service and support. Extensive professional services and support teams are in place.
The good news is that storage technology is accelerating at a tremendous pace, bringing increased
functionality and benefits. Most importantly there's more choice: There isn't a one-size-fits-all
solution. Monolithic systems provide external RAID storage for mainframes and also attach to open
systems servers. Modular storage systems have been designed to attach to open systems
environments and provide a flexible, lower-cost alternative in these environments. As noted earlier,
there are benefits to each design.
Monolithic storage systems are the only alternative for mainframe attach and may be viable for open
systems servers, primarily if keeping with a single brand of storage. The additional cost of these large
systems may be justified because monolithic systems provide a homogenous storage environment.
They offer the most proven disaster recovery and replication products and usually come with the
highest levels of service and support.
Modular systems, on the other hand, are more flexible and scalable and allow a pay-as-you-grow
price structure. The management interfaces are often designed for end users, and as a result it's easier
to expand capacity, manage volumes, provision storage and a variety of other tasks that would likely
require a service call in the monolithic world. Performance is also superior in the open systems
environments, and modular systems cost significantly less than monolithic systems. In addition,
today most storage innovations first appear in modular systems.
The choices that are available offer increasing value to the customer and - with a little investigation -
administrators can meet their storage needs today and have an infrastructure that accommodates their
future requirements.
=============================================================
Arrays were initially designed to separate storage from servers so systems could be built
into large, monolithic configurations for block- or file-based storage. They have
complicated redundancy features built into them such as high-performance RAID
controllers, and the storage may be configured with logical unit numbers (LUNs).
The storage array is the backbone of the modern business storage environment. Arrays
have evolved into different designs for enterprise, midrange and small business
environments, and offer a wide range of data protection and high-availability features. They
contain controllers -- the brains of the system -- that provide a level of abstraction between
the operating system and physical drives. A controller has the ability to access copies of
data across physical devices, and can take the form of a PCI or PCI Express card designed
to support a specific drive format such as Serial ATA (SATA) or serial-attached SCSI (SAS).
Initially built for HDDs in storage area networks (SANs) -- block-based storage -- or network-
attached storage (NAS) -- file-based storage -- there are now systems built for flash or solid-state
drive (SSD) storage arrays. Flash arrays contain flash memory drives designed to overcome the
performance and capacity limitations of mechanical, spinning disk drives. A flash array can read data
from SSDs much faster than from disk drives and is increasingly used to boost application
performance. Storage arrays can be all-flash, all-spinning disk or hybrids combining both types of media.
array-based replication
Array-based replication is an approach to data backup in which compatible storage arrays
use built-in software to automatically copy data from one storage array to another.
remote replication
Replication occurs in one of three places: in the storage array, at the host (server) or in the
network. Most enterprise data storage vendors include replication software on their high-
end and mid-range storage arrays. Host-based replication software runs on standard
servers, making it the cheapest and easiest type of replication to manage for many, but it
taxes server processing. Replication on the network requires an additional device, either an
intelligent switch or an appliance.
host-based replication
Host-based replication is the practice of using servers to copy data from one site to another.
To enable efficient and secure data copying, host-based replication software products
include capabilities such as deduplication, compression, encryption, and throttling. Host-
based replication can also provide server and application failover capability to aid in disaster
recovery.
The other two main types of replication are array-based and network-based. Array-based
replication takes place in the storage array and network-based replication requires adding
an appliance or intelligent switch. Host-based replication is less expensive than the other
two options and works with any type of storage. However, it can impact performance of the
server and can also be negatively affected by viruses and application crashes.
data deduplication
Data deduplication -- often called intelligent compression or single-instance storage -- is a
process that eliminates redundant copies of data and reduces storage overhead. Data
deduplication techniques ensure that only one unique instance of data is retained on
storage media, such as disk, flash or tape. Redundant data blocks are replaced with a
pointer to the unique data copy. In that way, data deduplication closely aligns
with incremental backup, which copies only the data that has changed since the
previous backup.
For example, a typical email system might contain 100 instances of the same 1 megabyte
(MB) file attachment. If the email platform is backed up or archived, all 100 instances are
saved, requiring 100 MB of storage space. With data deduplication, only one instance of the
attachment is stored; each subsequent instance is referenced back to the one saved copy.
In this example, a 100 MB storage demand drops to 1 MB.
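The arithmetic above can be sketched in a few lines. This is a toy illustration only (the `DedupStore` class and its methods are hypothetical, not any vendor's API): identical payloads are kept once, and each file keeps just a pointer to the stored copy.

```python
import hashlib

class DedupStore:
    """Toy file-level deduplication store (hypothetical, for illustration)."""

    def __init__(self):
        self.payloads = {}  # content hash -> the single stored copy
        self.files = {}     # file name -> content hash (the "pointer")

    def save(self, name, payload):
        digest = hashlib.sha256(payload).hexdigest()
        # Store the payload only if this exact content has never been seen.
        self.payloads.setdefault(digest, payload)
        self.files[name] = digest

    def physical_bytes(self):
        return sum(len(p) for p in self.payloads.values())

# 100 messages, each carrying the same 1 MB attachment.
attachment = b"x" * 1_000_000
store = DedupStore()
for i in range(100):
    store.save(f"mail-{i}.eml", attachment)

print(store.physical_bytes())  # 1000000 -- one copy stored, not 100
```

All 100 logical files remain addressable through the pointer table, but only one physical copy consumes space.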
Source-based dedupe removes redundant blocks before transmitting data to a backup target at the
client or server level. There is no additional hardware required. Deduplicating at the source reduces
bandwidth and storage use.
In target-based dedupe, backups are transmitted across a network to disk-based hardware in a remote
location. Using deduplication targets increases costs, although it generally provides a performance
advantage compared to source dedupe, particularly for petabyte-scale data sets.
There are two main methods used to deduplicate redundant data: inline and post-processing
deduplication. Your backup environment will dictate which method you use.
Data deduplication generally operates at the file or block level. File-level deduplication
eliminates duplicate files, but it is less efficient than block-level deduplication.
File-level data deduplication compares a file to be backed up or archived with copies that
are already stored. This is done by checking its attributes against an index. If the file is
unique, it is stored and the index is updated; if not, only a pointer to the existing file is
stored. The result is that only one instance of the file is saved, and subsequent copies are
replaced with a stub that points to the original file.
Block-level deduplication looks within a file and saves unique iterations of each block. All
the blocks are broken into chunks with the same fixed length. Each chunk of data is
processed using a hash algorithm, such as MD5 or SHA-1.
This process generates a unique number for each piece, which is then stored in an index. If
a file is updated, only the changed data is saved, even if only a few bytes of the document
or presentation have changed. The changes don't constitute an entirely new file. This
behavior makes block deduplication far more efficient. However, block deduplication takes
more processing power and uses a much larger index to track the individual pieces.
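A minimal sketch of the fixed-block approach described above, assuming 4 KB chunks and SHA-1 hashes (the function names are illustrative, not from any product): each chunk's hash goes into an index, and only unseen chunks are stored.

```python
import hashlib

CHUNK = 4096  # fixed chunk length, as in fixed-block deduplication

def dedupe(data, index):
    """Split data into fixed-size chunks; store one copy per unique hash."""
    recipe = []  # ordered list of hashes needed to reassemble the data
    for off in range(0, len(data), CHUNK):
        chunk = data[off:off + CHUNK]
        h = hashlib.sha1(chunk).hexdigest()
        index.setdefault(h, chunk)  # only unseen chunks consume space
        recipe.append(h)
    return recipe

def reassemble(recipe, index):
    return b"".join(index[h] for h in recipe)

index = {}
data = b"A" * 8192 + b"B" * 4096 + b"A" * 4096  # three runs, two patterns
recipe = dedupe(data, index)
print(len(recipe), len(index))  # 4 chunks referenced, only 2 stored
```

The recipe preserves the original order, so the data can always be rebuilt from the index; the larger index here is exactly the overhead the text mentions.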
Hash collisions are a potential problem with deduplication. When a piece of data receives a
hash number, that number is then compared with the index of other existing hash numbers.
If that hash number is already in the index, the piece of data is considered a duplicate and
does not need to be stored again. Otherwise, the new hash number is added to the index
and the new data is stored. In rare cases, the hash algorithm may produce the same hash
number for two different chunks of data. When a hash collision occurs, the system won't
store the new data because it sees that its hash number already exists in the index. This is
called a false positive, and it can result in data loss. Some vendors combine hash
algorithms to reduce the possibility of a hash collision. Some vendors are also examining
metadata to identify data and prevent collisions.
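One hedge against the false positives described above is to compare the actual bytes whenever a hash matches, rather than trusting the hash alone. A rough sketch (illustrative only; real products add further safeguards):

```python
import hashlib

def check_duplicate(chunk, index):
    """Return True only for a verified duplicate; store new chunks."""
    h = hashlib.sha1(chunk).hexdigest()
    if h not in index:
        index[h] = chunk  # new hash: keep the data
        return False
    # The hash is known -- but compare the bytes before declaring a
    # duplicate, so a rare collision cannot silently discard real data.
    return index[h] == chunk

index = {}
print(check_duplicate(b"block-1", index))  # False: first occurrence, stored
print(check_duplicate(b"block-1", index))  # True: verified duplicate
```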
Compression and delta differencing are often used with deduplication. Taken together,
these three data reduction techniques are designed to optimize storage capacity.
Techniques for data dedupe hold promise for cloud services providers in the area of
rationalizing expenses. The ability to deduplicate what they store results in lower costs for
disk storage and bandwidth for off-site replication.
data compression
Data compression is a reduction in the number of bits needed to represent data.
Compressing data can save storage capacity, speed up file transfer, and decrease costs
for storage hardware and network bandwidth.
Text compression can be as simple as removing all unneeded characters, inserting a single
repeat character to indicate a string of repeated characters and substituting a smaller bit
string for a frequently occurring bit string. Data compression can often reduce a text file to
half of its original size, or even significantly less.
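The "single repeat character" idea is essentially run-length encoding. A minimal sketch:

```python
def rle_encode(text):
    """Collapse each run of a repeated character into <count><char>."""
    out = []
    i = 0
    while i < len(text):
        j = i
        while j < len(text) and text[j] == text[i]:
            j += 1  # extend to the end of the current run
        out.append(f"{j - i}{text[i]}")
        i = j
    return "".join(out)

print(rle_encode("aaaabbbcc"))  # 4a3b2c
```

The technique pays off only on data with long runs; on text without repetition the encoded form is larger than the input.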
For data transmission, compression can be performed on the data content or on the entire
transmission unit, including header data. When information is sent or received via the
internet, larger files, either singly or with others as part of an archive file, may be transmitted
in a ZIP, GZIP or other compressed format.
Data compression can dramatically decrease the amount of storage a file takes up. For
example, in a 2:1 compression ratio, a 20 megabyte (MB) file takes up 10 MB of space. As
a result of compression, administrators spend less money and less time on storage.
Compression optimizes backup storage performance and has recently shown up in primary storage
data reduction. Compression will be an important method of data reduction as data continues to grow
exponentially.
Virtually any type of file can be compressed, but it's important to follow best practices when
choosing which ones to compress. For example, some files may already come compressed, so
compressing those files would not have a significant impact.
Compressing data can be a lossless or lossy process. Lossless compression enables the restoration of
a file to its original state, without the loss of a single bit of data, when the file is uncompressed.
Lossless compression is the typical approach with executables, as well as text and spreadsheet files,
where the loss of words or numbers would change the information.
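The lossless round trip can be demonstrated with Python's standard `zlib` module: decompression restores the input bit for bit.

```python
import zlib

original = b"the quick brown fox jumps over the lazy dog. " * 500
compressed = zlib.compress(original)

# Lossless: decompression restores every bit of the original data.
assert zlib.decompress(compressed) == original
print(f"{len(original)} -> {len(compressed)} bytes")
```

Repetitive input like this compresses very well; unique or already-compressed data would shrink far less.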
Lossy compression permanently eliminates bits of data that are redundant, unimportant or
imperceptible. Lossy compression is useful with graphics, audio, video and images, where the
removal of some data bits has little or no discernible effect on the representation of the content.
Graphics image compression can be lossy or lossless. Graphic image file formats are
typically designed to compress information since the files tend to be large. JPEG is an
image file format that supports lossy image compression. Formats such as GIF and PNG
use lossless compression.
Compression is often compared to data deduplication, but the two techniques operate
differently. Deduplication is a type of compression that looks for redundant chunks of data
across a storage or file system and then replaces each duplicate chunk with a pointer to the
original. Data compression algorithms, by contrast, work at a much smaller scope, reducing
the size of the bit strings within a data stream, and generally remember no more than the
last megabyte or so of data.
File-level deduplication eliminates redundant files and replaces them with stubs pointing to the
original file. Block-level deduplication identifies duplicate data at the subfile level. The system saves
unique instances of each block, uses a hash algorithm to process them and generates a unique
identifier to store them in an index. Deduplication typically looks for larger chunks of duplicate data
than compression, and systems can deduplicate using a fixed or variable-sized chunk.
Deduplication is most effective in environments that have a high degree of redundant data, such
as virtual desktop infrastructure or storage backup systems. Data compression tends to be more
effective than deduplication in reducing the size of unique information, such as images, audio,
videos, databases and executable files. Many storage systems support both compression and
deduplication.
Compression is often used for data that's not accessed much, as the process can be intensive and slow
down systems. Administrators, though, can seamlessly integrate compression in their backup
systems.
Backup is a redundant type of workload, as the process captures the same files frequently. An
organization that performs full backups will often have close to the same data from backup to
backup.
- Data takes up less space, as a compression ratio can reach 100:1, but between 2:1 and 5:1
  is common.
- If compression is done in a server prior to transmission, the time needed to transmit the
  data and the total network bandwidth are drastically reduced.
- On tape, the compressed, smaller file system image can be scanned faster to reach a
  particular file, reducing restore latency.
- Compression is supported by backup software and tape libraries, so there is a choice of
  data compression techniques.
Pros and cons of compression
The main advantages of compression are a reduction in storage hardware, data transmission time and
communication bandwidth -- and the resulting cost savings. A compressed file requires less storage
capacity than an uncompressed file, and the use of compression can lead to a significant decrease in
expenses for disk and/or solid-state drives. A compressed file also requires less time for transfer, and
it consumes less network bandwidth than an uncompressed file.
The main disadvantage of data compression is the performance impact resulting from the use
of CPU and memory resources to compress the data and perform decompression. Many vendors have
designed their systems to try to minimize the impact of the processor-intensive calculations
associated with compression. If the compression runs inline, before the data is written to disk, the
system may offload compression to preserve system resources. For instance, IBM uses a
separate hardware acceleration card to handle compression with some of its enterprise storage
systems.
If data is compressed after it is written to disk, or post-process, the compression may run in the
background to reduce the performance impact. Although post-process compression can reduce the
response time for each input/output (I/O), it still consumes memory and processor cycles and can
affect the overall number of I/Os a storage system can handle. Also, because data initially must be
written to disk or flash drives in an uncompressed form, the physical storage savings are not as great
as they are with inline compression.
File system compression takes a fairly straightforward approach to reducing the storage footprint of
data by transparently compressing each file as it is written.
Many of the popular Linux file systems -- including Reiser4, ZFS and btrfs -- and Microsoft NTFS
have a compression option. The server compresses chunks of data in a file and then writes the
smaller fragments to storage.
Read-back involves a relatively small latency to expand each fragment, while writing adds
substantial load to the server, so compression is usually not recommended for data that is volatile.
File system compression can weaken performance, so it should be deployed selectively on files that
are not accessed frequently.
Historically, with the expensive hard drives of early computers, data compression programs such as
DiskDoubler and SuperStor Pro were popular and helped establish mainstream file system
compression.
Storage administrators can also combine compression and deduplication for improved data
reduction.
Compression is built into a wide range of technologies, including storage systems, databases,
operating systems and software applications used by businesses and enterprise organizations.
Compressing data is also common in consumer devices, such as laptops, PCs and mobile phones.
Many systems and devices perform compression transparently, but some give users the option to turn
compression on or off. It can be performed more than once on the same file or piece of data, but
subsequent compressions result in little to no additional compression and may even increase the size
of the file to a slight degree, depending on the data compression algorithms.
WinZip is a popular Windows program that compresses files when it packages them in an archive.
Archive file formats that support compression include ZIP and RAR. The BZIP2 and GZIP formats
see widespread use for compressing individual files.
Other vendors that offer compression include Dell EMC with its XtremIO all-flash array, Kaminario
with its K2 all-flash array and RainStor with its data compression software.
Data differencing
Data differencing is a general term for comparing the contents of two data objects. In the context of
compression, it involves repetitively searching through the target file to find similar blocks and
replacing them with a reference to a library object. This process repeats until it finds no additional
duplicate objects. Data differencing can result in many compressed files with just one element in the
library representing each duplicated object.
In virtual desktops, this technique can achieve a compression ratio of as much as 100:1. The
process is more closely aligned with deduplication, which looks for identical whole files or
objects rather than for repeated blocks within the content of each object.
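A minimal sketch of the library-and-reference idea in Python follows; the 64-byte block size and SHA-256 hashing are illustrative choices, not any vendor's implementation.

```python
import hashlib

def deduplicate(data: bytes, block_size: int = 64):
    """Replace repeated blocks with references into a library of unique blocks."""
    library = {}      # digest -> unique block content (one element per object)
    references = []   # ordered digests that reconstruct the original data
    for i in range(0, len(data), block_size):
        block = data[i:i + block_size]
        digest = hashlib.sha256(block).hexdigest()
        library.setdefault(digest, block)  # store each unique block only once
        references.append(digest)
    return library, references

def reconstruct(library, references):
    return b"".join(library[h] for h in references)

# A workload with heavy duplication: 100 blocks, only 2 distinct ones.
data = (b"A" * 64) * 99 + b"B" * 64
lib, refs = deduplicate(data)
assert reconstruct(lib, refs) == data
assert len(lib) == 2 and len(refs) == 100
```

Here 100 blocks collapse to two library entries, the kind of ratio highly duplicated workloads such as virtual desktops can approach.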
encryption
In computing, encryption is the method by which plaintext or any other type of data is
converted from a readable form to an encoded version that can only be decoded by another
entity if they have access to a decryption key. Encryption is one of the most important
methods for providing data security, especially for end-to-end protection of data transmitted
across networks.
Encryption is widely used on the internet to protect user information being sent between a
browser and a server, including passwords, payment information and other personal
information that should be considered private. Organizations and individuals also commonly
use encryption to protect sensitive data stored on computers, servers and mobile devices
like phones or tablets.
WhatIs.com
Contributor(s): Peter Loshin, Michael Cobb, Robert Bauchle, Fred Hazen, John Lund, Gabe Oakley, Frank Rundatz
Symmetric-key ciphers, also referred to as "secret key," use a single key, sometimes
referred to as a shared secret because the system doing the encryption must share it with
any entity it intends to be able to decrypt the encrypted data. The most widely used
symmetric-key cipher is the Advanced Encryption Standard (AES), which was designed to
protect government classified information.
Symmetric-key encryption is usually much faster than asymmetric encryption, but the
sender must exchange the key used to encrypt the data with the recipient before the
recipient can perform decryption on the ciphertext. The need to securely distribute and
manage large numbers of keys means most cryptographic processes use a symmetric
algorithm to efficiently encrypt data, but use an asymmetric algorithm to securely exchange
the secret key.
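As a toy illustration of the shared-secret idea, the sketch below uses an XOR one-time pad built from Python's standard library. This is not AES or any production cipher; it only shows that the same key, shared by both parties, performs both encryption and decryption.

```python
import secrets

def xor_cipher(data: bytes, key: bytes) -> bytes:
    """Toy symmetric cipher: XOR each byte with a key as long as the message.
    XOR is its own inverse, so one function both encrypts and decrypts.
    Real systems use a vetted cipher such as AES, not this sketch."""
    return bytes(b ^ k for b, k in zip(data, key))

plaintext = b"meet at noon"
shared_key = secrets.token_bytes(len(plaintext))  # the shared secret

ciphertext = xor_cipher(plaintext, shared_key)
# Anyone holding the same key recovers the plaintext.
assert xor_cipher(ciphertext, shared_key) == plaintext
```

The sketch also makes the distribution problem concrete: the key must somehow reach the recipient securely before any ciphertext is useful to them.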
Asymmetric cryptography, also known as public key cryptography, uses two different but
mathematically linked keys, one public and one private. The public key can be shared with
everyone, whereas the private key must be kept secret. The RSA encryption algorithm is
the most widely used public key algorithm, partly because both the public and the private
keys can encrypt a message; the opposite key from the one used to encrypt a message is
used to decrypt it. This attribute provides a method of assuring not only confidentiality, but
also the integrity, authenticity and nonrepudiation of electronic communications and data at
rest through the use of digital signatures.
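The two-key relationship can be illustrated with textbook RSA on deliberately tiny primes. This is insecure and purely for illustration; real keys use primes hundreds of digits long and additional padding.

```python
# Textbook RSA with tiny primes -- insecure, for illustration only.
p, q = 61, 53
n = p * q                  # public modulus, part of both keys
phi = (p - 1) * (q - 1)    # Euler's totient of n
e = 17                     # public exponent, coprime with phi
d = pow(e, -1, phi)        # private exponent: modular inverse of e mod phi

message = 65
ciphertext = pow(message, e, n)          # encrypt with the public key
assert pow(ciphertext, d, n) == message  # decrypt with the private key

# The keys also work in the opposite direction, which is the basis
# of digital signatures: sign with the private key, verify with the public.
signature = pow(message, d, n)
assert pow(signature, e, n) == message
```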
Benefits of encryption
The primary purpose of encryption is to protect the confidentiality of digital data stored on
computer systems or transmitted via the internet or any other computer network. A number
of organizations and standards bodies either recommend or require sensitive data to be
encrypted in order to prevent unauthorized third parties or threat actors from accessing the
data. For example, the Payment Card Industry Data Security Standard (PCI DSS) requires merchants
to encrypt customers' payment card data when it is both stored at rest and transmitted
across public networks.
Modern encryption algorithms also play a vital role in the security assurance of IT systems and
communications as they can provide not only confidentiality, but also the following key elements of
security:
Integrity: proof that the contents of a message have not been changed since it was sent.
Nonrepudiation: the sender of a message cannot deny sending the message.
Types of encryption
Traditional public key cryptography depends on the properties of large prime numbers and the
computational difficulty of factoring the product of two large primes. Elliptic curve cryptography (ECC) enables
another kind of public key cryptography that depends on the properties of the elliptic curve equation;
the resulting cryptographic algorithms can be faster and more efficient and can produce comparable
levels of security with shorter cryptographic keys. As a result, ECC algorithms are often
implemented in internet of things devices and other products with limited computing resources.
Encryption is used to protect data stored on a system (encryption in place or encryption at rest); many
internet protocols define mechanisms for encrypting data moving from one system to another (data in
transit).
Some applications tout the use of end-to-end encryption (E2EE) to guarantee data being sent between
two parties cannot be viewed by an attacker that intercepts the communication channel. Use of an
encrypted communication circuit, as provided by Transport Layer Security (TLS) between web client
and web server software, is not always enough to ensure E2EE; typically, the actual content being
transmitted is encrypted by client software before being passed to a web client, and decrypted only
by the recipient.
Messaging apps that provide E2EE include Facebook's WhatsApp and Open Whisper Systems'
Signal. Facebook Messenger users may also get E2EE messaging with the "Secret Conversations"
option.
Encryption was used almost exclusively by governments and large enterprises until the late
1970s, when the Diffie-Hellman key exchange and RSA algorithms were first published -- and the
first personal computers were introduced. By the mid-1990s, both public key and private key
encryption were being routinely deployed in web browsers and servers to protect sensitive data.
Encryption is now an important part of many products and services, used in the commercial and
consumer realms to protect data both while it is in transit and while it is stored, such as on a hard
drive, smartphone or flash drive (data at rest).
Devices like modems, set-top boxes, smartcards and SIM cards all use encryption or rely
on protocols like SSH, S/MIME, and SSL/TLS to encrypt sensitive data. Encryption is used to
protect data in transit sent from all sorts of devices across all sorts of networks, not just the internet;
every time someone uses an ATM or buys something online with a smartphone, makes a mobile
phone call or presses a key fob to unlock a car, encryption is used to protect the information being
relayed. Digital rights management systems, which prevent unauthorized use or reproduction of
copyrighted material, are yet another example of encryption protecting data.
Encryption is usually a two-way function, meaning the same algorithm can be used to encrypt
plaintext and to decrypt ciphertext. A cryptographic hash function can be viewed as a type of one-
way function for encryption, meaning the function output cannot easily be reversed to recover the
original input. Hash functions are commonly used in many aspects of security to generate digital
signatures and data integrity checks. They take an electronic file, message or block of data and
generate a short digital fingerprint of the content called a message digest or hash value. The key
properties of a secure cryptographic hash function are:
Output length is small compared to input
Strong collision resistance -- it is computationally infeasible to find two different inputs that produce the same output
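Both properties are easy to observe with Python's hashlib; SHA-256 is used here as a representative secure hash function.

```python
import hashlib

# Fixed-length digest regardless of input size (output small compared to input).
short = hashlib.sha256(b"hello").hexdigest()
long_ = hashlib.sha256(b"hello" * 100000).hexdigest()
assert len(short) == len(long_) == 64  # SHA-256: 256 bits = 64 hex characters

# Integrity check: even a one-character change yields a completely
# different digest, so any tampering with the content is detectable.
assert hashlib.sha256(b"hello!").hexdigest() != short
```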
The ciphers in hash functions are optimized for hashing: They use large keys and blocks, can
efficiently change keys every block and have been designed and vetted for resistance to related-key
attacks. General-purpose ciphers used for encryption tend to have different design goals. For
example, the symmetric-key block cipher AES could also be used for generating hash values, but its
key and block sizes make it nontrivial and inefficient.
For any cipher, the most basic method of attack is brute force: trying each key until the right one is
found. The length of the key determines the number of possible keys, hence the feasibility of this
type of attack. Encryption strength is directly tied to key size, but as the key size increases so, too, do
the resources required to perform the computation.
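A toy illustration of key length and brute force, using a deliberately weak made-up cipher (not any real algorithm): with a two-letter key there are only 26^2 = 676 possibilities, so exhaustive search is instant; each additional key character multiplies the work by 26.

```python
import itertools

def toy_encrypt(plaintext: bytes, key: bytes) -> bytes:
    """Made-up additive cipher with a tiny key space, for illustration only."""
    return bytes((b + key[i % len(key)]) % 256 for i, b in enumerate(plaintext))

def toy_decrypt(ciphertext: bytes, key: bytes) -> bytes:
    return bytes((b - key[i % len(key)]) % 256 for i, b in enumerate(ciphertext))

secret_key = b"qz"
ciphertext = toy_encrypt(b"attack at dawn", secret_key)

# Brute force: try every possible two-letter key. (A real attacker would
# recognize plausible plaintext; here we cheat and compare to the known one.)
for candidate in itertools.product(range(ord('a'), ord('z') + 1), repeat=2):
    key = bytes(candidate)
    if toy_decrypt(ciphertext, key) == b"attack at dawn":
        break
assert key == secret_key
```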
Alternative methods of breaking a cipher include side-channel attacks, which don't attack the actual
cipher but the physical side effects of its implementation. An error in system design or execution can
allow such attacks to succeed.
Attackers may also attempt to break a targeted cipher through cryptanalysis, the process of
attempting to find a weakness in the cipher that can be exploited with a complexity less than a brute-
force attack. The challenge of successfully attacking a cipher is easier if the cipher itself is already
flawed. For example, there have been suspicions that interference from the National Security
Agency weakened the Data Encryption Standard algorithm, and following revelations from former
NSA analyst and contractor Edward Snowden, many believe the NSA has attempted to subvert other
cryptography standards and weaken encryption products.
More recently, law enforcement agencies such as the FBI have criticized technology companies that
offer end-to-end encryption, arguing that such encryption prevents law enforcement from accessing
data and communications even with a warrant. The FBI has referred to this issue as "Going Dark,"
while the U.S. Department of Justice has proclaimed the need for "responsible encryption" that can
be unlocked by technology companies under a court order.
History of encryption
The word encryption comes from the Greek word kryptos, meaning hidden or secret. The use of
encryption is nearly as old as the art of communication itself. As early as 1900 B.C., an Egyptian
scribe used nonstandard hieroglyphs to hide the meaning of an inscription. In a time when most
people couldn't read, simply writing a message was often enough, but encryption schemes soon
developed to convert messages into unreadable groups of figures to protect the message's secrecy
while it was carried from one place to another. The contents of a message were reordered
(transposition) or replaced (substitution) with other characters, symbols, numbers or pictures in order
to conceal its meaning.
In 700 B.C., the Spartans wrote sensitive messages on strips of leather wrapped around
sticks. When the strip was unwound, the characters became meaningless, but with a stick
of exactly the same diameter, the recipient could recreate (decipher) the message. Later,
the Romans used what's known as the Caesar Shift Cipher, a monoalphabetic cipher in
which each letter is shifted by an agreed number. So, for example, if the agreed number is
three, then the message, "Be at the gates at six" would become "eh dw wkh jdwhv dw vla".
At first glance this may look difficult to decipher, but shifting the alphabet back until the
letters make sense doesn't take long. Also, the vowels and other commonly used
letters like T and S can be quickly deduced using frequency analysis, and that information,
in turn, can be used to decipher the rest of the message.
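The Caesar shift described above takes only a few lines of Python; this sketch reproduces the article's example ciphertext.

```python
def caesar(text: str, shift: int) -> str:
    """Shift each letter by a fixed amount, wrapping around the alphabet.
    Non-letters pass through unchanged; output is lowercase."""
    out = []
    for ch in text.lower():
        if ch.isalpha():
            out.append(chr((ord(ch) - ord('a') + shift) % 26 + ord('a')))
        else:
            out.append(ch)
    return "".join(out)

assert caesar("Be at the gates at six", 3) == "eh dw wkh jdwhv dw vla"
assert caesar("eh dw wkh jdwhv dw vla", -3) == "be at the gates at six"
```

Decryption is just the same shift applied in reverse, which is why guessing the shift by trial takes at most 25 attempts.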
The Middle Ages saw the emergence of polyalphabetic substitution, which uses multiple
substitution alphabets to limit the use of frequency analysis to crack a cipher. This method
of encrypting messages remained popular despite many implementations that failed to
adequately conceal when the substitution changed, also known as key progression.
Possibly the most famous implementation of a polyalphabetic substitution cipher is the
Enigma electromechanical rotor cipher machine used by the Germans during World War II.
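A minimal Vigenère-style sketch illustrates polyalphabetic substitution and key progression; the key word "lemon" is illustrative. Because consecutive letters use different shifts, simple frequency analysis of the ciphertext no longer works.

```python
def vigenere(text: str, key: str, decrypt: bool = False) -> str:
    """Polyalphabetic substitution: the key word selects a different shift
    for each letter, flattening the frequency patterns a single-alphabet
    cipher leaves behind."""
    sign = -1 if decrypt else 1
    out, k = [], 0
    for ch in text.lower():
        if ch.isalpha():
            shift = ord(key[k % len(key)]) - ord('a')
            out.append(chr((ord(ch) - ord('a') + sign * shift) % 26 + ord('a')))
            k += 1  # key progression: advance the key only on letters
        else:
            out.append(ch)
    return "".join(out)

ct = vigenere("be at the gates at six", "lemon")
assert vigenere(ct, "lemon", decrypt=True) == "be at the gates at six"
```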
It was not until the mid-1970s that encryption took a major leap forward. Until this point, all
encryption schemes used the same secret for encrypting and decrypting a message: a
symmetric key. In 1976, Whitfield Diffie and Martin Hellman's paper "New Directions in
Cryptography" solved one of the fundamental problems of cryptography: namely, how to
securely distribute the encryption key to those who need it. This breakthrough was followed
shortly afterward by RSA, an implementation of public-key cryptography using asymmetric
algorithms, which ushered in a new era of encryption.