RAID TECHNOLOGY

ABSTRACT
Data-intensive file systems, developed for Internet services and popular in cloud computing, provide high reliability and availability by replicating data, typically keeping three copies of everything. Alternatively, high-performance computing systems, which have comparable scale, and smaller-scale enterprise storage systems obtain similar tolerance for multiple failures from lower-overhead erasure coding, or RAID, organizations. DiskReduce is a modification of the Hadoop distributed file system (HDFS) enabling asynchronous compression of initially triplicated data down to RAID-class redundancy overheads. In addition to increasing a cluster's storage capacity as seen by its users by up to a factor of three, DiskReduce can delay encoding long enough to deliver the performance benefits of multiple data copies.

RAID, an acronym for Redundant Arrays of Inexpensive Disks, is a way to virtualize multiple, independent hard disk drives into one or more arrays to improve performance, capacity and reliability (availability). The total array capacity depends on the type of RAID array you build and the number and size of the disk drives, and is independent of whether you use software or hardware RAID. The following sections look at the different implementations, their strengths and weaknesses, and their impact on system performance and effectiveness in enhancing data availability.


INTRODUCTION
RAID, an acronym for redundant array of inexpensive disks or redundant array of independent disks, is a technology that provides high levels of storage reliability from low-cost and less reliable PC-class disk-drive components, via the technique of arranging the devices into arrays for redundancy. The concept was first defined by David A. Patterson, Garth A. Gibson, and Randy Katz at the University of California, Berkeley in 1987 as "redundant array of inexpensive disks". Marketers representing industry RAID manufacturers later reinterpreted the acronym as "redundant array of independent disks", as a means of dissociating a low-cost expectation from RAID technology.

RAID combines two or more physical hard disks into a single logical unit using special hardware or software. Hardware solutions are often designed to present themselves to the attached system as a single hard drive, so that the operating system is unaware of the technical workings. For example, if one were to configure a hardware-based RAID 5 volume using three 250 GB hard drives (two drives' worth of capacity for data, and one for parity), the operating system would be presented with a single 500 GB volume. Software solutions are typically implemented in the operating system and present the RAID volume as a single drive to applications running within the operating system.

There are three key concepts in RAID: mirroring, the writing of identical data to more than one disk; striping, the splitting of data across more than one disk; and error correction, where redundant parity data is stored to allow problems to be detected and possibly repaired (known as fault tolerance).
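To make the capacity arithmetic concrete, here is a minimal sketch (not taken from any RAID standard; the function name and figures are illustrative only) of how usable capacity depends on the RAID level:

# Sketch: usable capacity of an array, by RAID level.
# Assumes equally sized drives; all names here are illustrative.

def usable_capacity_gb(level, drives, size_gb):
    if level == 0:                        # striping, no redundancy
        return drives * size_gb
    if level == 1:                        # mirroring: one drive's worth of data
        return size_gb
    if level == 5:                        # one drive's capacity goes to parity
        return (drives - 1) * size_gb
    if level == 6:                        # two drives' capacity go to parity
        return (drives - 2) * size_gb
    raise ValueError("level not covered in this sketch")

# The RAID 5 example from the text: three 250 GB drives.
print(usable_capacity_gb(5, 3, 250))      # -> 500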


HISTORY

Norman Ken Ouchi at IBM was awarded U.S. patent 4,092,732, titled "System for recovering data stored in failed memory unit", in 1978. The claims of this patent describe what would later be termed RAID 5 with full stripe writes. The 1978 patent also mentions that disk mirroring or duplexing (what would later be termed RAID 1) and protection with dedicated parity (what would later be termed RAID 4) were prior art at that time. The term RAID was first defined by David A. Patterson, Garth A. Gibson and Randy Katz at the University of California, Berkeley, in 1987. They studied the possibility of using two or more drives to appear as a single device to the host system and published a paper, "A Case for Redundant Arrays of Inexpensive Disks (RAID)", at the SIGMOD conference in June 1988. One of the early uses of RAID 0 and 1 was the Crosfield Electronics Studio 9500 page layout system based on the Python workstation. The Python workstation was a Crosfield-managed international development using PERQ 3B electronics, benchMark Technology's Viper display system and Crosfield's own RAID and fibre-optic network controllers. RAID 0 was particularly important to these workstations, as it dramatically sped up image manipulation for the pre-press markets. Volume production started in Peterborough, England in early 1987.


SOFTWARE RAID
A simple way to describe software RAID is that the RAID task runs on the CPU of the host computer system. Some software RAID implementations include a piece of hardware, which might make the implementation seem like a hardware RAID implementation at first glance. It is therefore important to understand that software RAID code uses the host CPU's computing power: the code that provides the RAID features runs on the system CPU, sharing computing power with the operating system and all the associated applications.

Software RAID Implementations


Software RAID can be implemented in a variety of ways: 1) as a pure software solution, or 2) as a hybrid solution that includes some hardware designed to increase performance and reduce system CPU overhead. In a pure software RAID solution, the RAID implementation is an application running on the host without any additional hardware. This type of software RAID uses hard disk drives that are attached to the computer system via a built-in I/O interface or a processor-less host bus adapter (HBA). The RAID becomes active as soon as the operating system has loaded the RAID driver software. Such pure software RAID solutions often come integrated into the server OS and are usually free of additional cost for the user. Low cost is the primary advantage of this solution.

Software RAID implementations are now provided by many operating systems. Software RAID can be implemented as:

• a layer that abstracts multiple devices, thereby providing a single virtual device (e.g. Linux's md); a minimal sketch of this layering appears below
• a more generic logical volume manager (provided with most server-class operating systems, e.g. Veritas or LVM)
• a component of the file system
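To illustrate the first approach, the following sketch shows the kind of address translation such an abstraction layer performs for a striped (RAID 0) virtual device. It is illustrative only, not Linux md's actual code:

# Illustrative RAID 0 address translation: a logical block number on the
# virtual device maps to (disk index, block offset on that disk).

def raid0_map(logical_block, num_disks, chunk_blocks):
    stripe, within = divmod(logical_block, chunk_blocks)
    disk = stripe % num_disks            # chunks rotate across the disks
    offset = (stripe // num_disks) * chunk_blocks + within
    return disk, offset

# Logical block 10 on a 3-disk array with 4-block chunks:
print(raid0_map(10, 3, 4))               # -> (2, 2)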


HARDWARE RAID
A hardware RAID solution has its own processor and memory to run the RAID application. In this implementation, the RAID system is an independent small computer system dedicated to the RAID application, offloading this task from the host system. Hardware RAID can be found as an integral part of the solution (e.g. integrated on the motherboard) or as an add-in card. If the necessary hardware is already integrated in the system solution, then hardware RAID might become a software upgrade to an existing system. So, like software RAID, hardware RAID might not be identified as such at first glance. The simplest way to identify whether a solution is software or hardware RAID is to read the technical specification or data sheet of the RAID solution. If the solution includes a microprocessor (usually called an I/O processor, or sometimes a ROC, meaning RAID-on-Chip), then the solution is a hardware RAID solution. If there is no processor, it is a software RAID solution. This distinction matters for your selection because of the system impacts of a software RAID vs. a hardware RAID implementation. These impacts include:

• CPU utilization and performance when other applications are running
• Scalability of disk drives that can be added to a system
• Ease of recovery after a data loss
• Capability for advanced data management/monitoring
• Ability to manage disk drives consistently across different operating systems
• Ability to add a battery backup option that allows write caching to be enabled on the controller to enhance write performance of the system

Hardware RAID Implementations

Hardware RAID can be implemented in a variety of ways: 1) as a discrete RAID controller card, or 2) as integrated hardware based on RAID-on-Chip technology.

Almost all enterprise and high-performance computing storage systems protect data against disk failures using a variant of the erasure-protecting scheme known as Redundant Arrays of Inexpensive Disks. Presented originally as a single-disk-failure-tolerant scheme, RAID was soon enhanced by various double-disk-failure-tolerant encodings, collectively known as RAID 6, including two-dimensional parity, P+Q Reed-Solomon codes, the XOR-based EvenOdd code, and NetApp's variant, Row-Diagonal Parity. Lately, research has turned to greater reliability through codes that protect more, but not all, sets of larger-than-two disk failures, and to the careful evaluation of the tradeoffs between codes and their implementations.
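To illustrate the P+Q scheme named above: P is a plain XOR across the data blocks, while Q weights the block from disk i by g^i in the Galois field GF(2^8), the field used, for example, by Linux's RAID 6 implementation with the 0x11D polynomial. A byte-level sketch; the helper names are invented for illustration:

# Sketch of RAID 6 P and Q syndrome computation over GF(2^8).

def gf_mul(a, b):                         # multiply in GF(2^8), polynomial 0x11D
    p = 0
    for _ in range(8):
        if b & 1:
            p ^= a
        carry = a & 0x80
        a = (a << 1) & 0xFF
        if carry:
            a ^= 0x1D                     # reduce modulo x^8+x^4+x^3+x^2+1
        b >>= 1
    return p

def pq_syndromes(data_bytes):             # one byte per data disk
    p = q = 0
    g = 1                                 # g = 2^i for disk i
    for d in data_bytes:
        p ^= d                            # P: plain parity
        q ^= gf_mul(g, d)                 # Q: weighted parity
        g = gf_mul(g, 2)
    return p, q

print(pq_syndromes([0x11, 0x22, 0x33]))   # -> (0, 153)

With P and Q stored on two extra disks, any two lost data bytes can be re-derived by solving the resulting two equations in GF(2^8), which is what gives RAID 6 its double-failure tolerance.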
Networked RAID has also been explored, initially as a block storage scheme, then later for symmetric multi-server logs, Redundant Arrays of Independent Nodes, and peer-to-peer file systems, and it is in use today in the PanFS supercomputer storage clusters. The DiskReduce work explores similar techniques, specialized to the characteristics of large-scale data-intensive distributed file systems. Deferred encoding for compression, a technique used to recover capacity without losing the benefits of multiple copies for read bandwidth, is similar to two-level caching-and-compression in file systems, delayed parity updates in RAID systems, and alternative mirror or RAID 5 representation schemes. Finally, the basic approach of adding erasure coding to data-intensive distributed file systems has been introduced into the Google File System and, as a result of an early version of this work, into the Hadoop Distributed File System; the DiskReduce work studies the advantages of deferring the act of encoding.

The original Berkeley paper suggested a number of prototype RAID levels, or combinations of drives, each with theoretical advantages and disadvantages. Over the years, different implementations of the RAID concept have appeared. Most differ substantially from the original idealized RAID levels, but the numbered names have remained. This can be confusing, since one implementation of RAID 5, for example, can differ substantially from another. RAID 3 and RAID 4 are often confused and even used interchangeably.


ORGANIZATION
Organizing disks into a redundant array decreases the usable storage capacity. For instance, a 2-disk RAID 1 array loses half of the total capacity that would otherwise be available using both disks independently, and a RAID 5 array with several disks loses the capacity of one disk. Other types of RAID arrays are arranged, for example, so that they are faster to write to and read from than a single disk. There are various combinations of these approaches, giving different trade-offs of protection against data loss, capacity, and speed. RAID levels 0, 1, and 5 are the most commonly found, and cover most requirements.

RAID 0
RAID 0 (striped disks) distributes data across multiple disks in a way that gives improved speed at any given instant. If one disk fails, however, all of the data on the array is lost, as there is neither parity nor mirroring. In this regard, RAID 0 is somewhat of a misnomer, in that RAID 0 is non-redundant. A RAID 0 array requires a minimum of two drives. A RAID 0 configuration can be applied to a single drive provided that the RAID controller is hardware, not software (i.e. OS-based arrays), and allows for such a configuration. This allows a single drive to be added to a controller already containing another RAID configuration when the user does not wish to add the additional drive to the existing array. In this case, the controller would be set up as RAID only (as opposed to SCSI in non-RAID configuration), which requires that each individual drive be part of some sort of RAID array.

RAID 1
RAID 1 mirrors the contents of the disks, making a form of 1:1 real-time mirroring. The contents of each disk in the array are identical to those of every other disk in the array. A RAID 1 array requires a minimum of two drives.

RAID 3, RAID 4
RAID 3 or 4 (striped disks with dedicated parity) combines three or more disks in a way that protects data against the loss of any one disk. Fault tolerance is achieved by adding an extra disk to the array, dedicated to storing parity information; the overall capacity of the array is reduced by one disk. A RAID 3 or 4 array requires a minimum of three drives: two to hold striped data, and a third for parity. With the minimum of three drives, the storage efficiency is 66 percent; with six drives, it is 83 percent.

RAID 5
RAID 5 (striped set with distributed, or interleaved, parity) requires 3 or more disks. Distributed parity requires all drives but one to be present to operate; a failed drive requires replacement, but the array is not destroyed by a single drive failure. Upon drive failure, any subsequent reads can be calculated from the distributed parity such that the drive failure is masked from the end user. The array will suffer data loss in the event of a second drive failure and is vulnerable until the data that was on the failed drive is rebuilt onto a replacement drive. A single drive failure in the set results in reduced performance of the entire set until the failed drive has been replaced and rebuilt.
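The parity that RAID 3, 4, and 5 store is a simple XOR across the striped blocks, which is what makes masking a failed drive possible: XOR-ing the surviving blocks with the parity regenerates the missing block. A minimal sketch, with made-up block contents:

# Sketch: XOR parity as used (conceptually) by RAID 3/4/5.
from functools import reduce

def xor_blocks(blocks):
    return bytes(reduce(lambda a, b: a ^ b, col) for col in zip(*blocks))

d0, d1, d2 = b"ABCD", b"EFGH", b"IJKL"    # three data blocks (made up)
parity = xor_blocks([d0, d1, d2])

# The disk holding d1 fails; rebuild it from the survivors plus parity:
rebuilt = xor_blocks([d0, d2, parity])
assert rebuilt == d1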
STANDARD LEVELS

A number of standard schemes have evolved, which are referred to as levels. There were five RAID levels originally conceived, but many more variations have evolved, notably several nested levels and many non-standard levels (mostly proprietary). Following is a brief summary of the most commonly used RAID levels.[4] Space efficiency is given as a fraction of the total capacity of the n disks. For example, if an array holds n = 5 drives of 250 GB and the efficiency is 1 - 1/n, then the available space is 4 times 250 GB, or roughly 1 TB. In the failure-rate column, r denotes the failure probability of a single drive, assuming independent failures.

Level    Description                                        Min. disks  Space efficiency   Fault tolerance  Array failure rate  Read benefit  Write benefit
RAID 0   Block-level striping without parity or mirroring   2           1                  0 (none)         ~ nr                nX            nX
RAID 1   Mirroring without parity or striping               2           1/n                n-1 disks        r^n                 nX            1X
RAID 2   Bit-level striping with dedicated Hamming-code     3           1 - log2(n-1)/n    1 disk*          variable            variable      variable
         parity
RAID 3   Byte-level striping with dedicated parity          3           1 - 1/n            1 disk           n(n-1)r^2           (n-1)X        (n-1)X*
RAID 4   Block-level striping with dedicated parity         3           1 - 1/n            1 disk           n(n-1)r^2           (n-1)X        (n-1)X*
RAID 5   Block-level striping with distributed parity       3           1 - 1/n            1 disk           n(n-1)r^2           (n-1)X*       (n-1)X*
RAID 6   Block-level striping with double distributed       4           1 - 2/n            2 disks          n(n-1)(n-2)r^3      (n-2)X*       (n-2)X*
         parity

* RAID 2 can recover from one disk failure, or repair corrupt data or parity when a corrupted bit's corresponding data and parity are good.
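As a worked reading of the table, the following sketch plugs illustrative numbers (n = 5 disks and a per-drive failure probability r = 0.02, both assumed) into the approximate failure-rate formulas:

# Evaluate the table's approximate array failure rates for n disks,
# given a per-drive failure probability r (values are illustrative).

n, r = 5, 0.02

raid0 = 1 - (1 - r) ** n                  # any drive failing loses the array (~ nr)
raid1 = r ** n                            # all mirrors must fail
raid5 = n * (n - 1) * r ** 2              # ~ two overlapping failures
raid6 = n * (n - 1) * (n - 2) * r ** 3    # ~ three overlapping failures

for name, p in [("RAID 0", raid0), ("RAID 1", raid1),
                ("RAID 5", raid5), ("RAID 6", raid6)]:
    print(f"{name}: {p:.2e}")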


PROBLEMS WITH RAID


Correlated failures
The theory behind the error correction in RAID assumes that failures of drives are independent. Given this assumption, it is possible to calculate how often the drives can fail and to arrange the array to make data loss arbitrarily improbable. In practice, however, the drives are often the same age, with similar wear, and subject to the same environment. Since many drive failures are due to mechanical issues, which are more likely on older drives, this violates the independence assumption, and failures are in fact statistically correlated. The chances of a second failure before the first has been recovered are therefore not nearly as unlikely as might be supposed, and data loss can, in practice, occur at significant rates. A common misconception is that "server-grade" drives fail less frequently than consumer-grade drives. Two independent studies, one by Carnegie Mellon University and the other by Google, have shown that the "grade" of the drive does not relate to failure rates.
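A toy calculation shows why the independence assumption matters. Under independence, the chance of a second failure during a one-day rebuild is small; correlated failures can make the real figure several times larger. All numbers below are assumed for illustration:

# Toy model: chance that a second drive fails during the rebuild window,
# assuming independent failures (all numbers are illustrative).

annual_failure_rate = 0.03                # 3% AFR per drive (assumed)
rebuild_hours = 24
surviving_drives = 7                      # e.g. an 8-drive RAID 5 after one loss

per_hour = annual_failure_rate / (365 * 24)
p_second = 1 - (1 - per_hour) ** (rebuild_hours * surviving_drives)
print(f"{p_second:.4%}")                  # ~0.06% under the independence model

# Correlated failures (same batch, same age, same environment) can make
# the real probability several times larger than this estimate.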

Atomicity
This is a little-understood and rarely mentioned failure mode for redundant storage systems that do not utilize transactional features. Database researcher Jim Gray wrote "Update in Place is a Poison Apple"[28] during the early days of relational database commercialization. However, this warning largely went unheeded and fell by the wayside upon the advent of RAID, which many software engineers mistook as solving all data storage integrity and reliability problems. Many software programs update a storage object "in place"; that is, they write a new version of the object onto the same disk addresses as the old version of the object. While the software may also log some delta information elsewhere, it expects the storage to present "atomic write semantics", meaning that the write of the data either occurred in its entirety or did not occur at all.

However, very few storage systems provide support for atomic writes, and even fewer specify their rate of failure in providing this semantic. Note that during the act of writing an object, a RAID storage device will usually write all redundant copies of the object in parallel, although overlapped or staggered writes are more common when a single RAID processor is responsible for multiple drives. Hence an error that occurs during the process of writing may leave the redundant copies in different states, and furthermore may leave the copies in neither the old nor the new state. The little-known failure mode is that delta logging relies on the original data being either in the old or the new state so as to enable backing out the logical change, yet few storage systems provide atomic write semantics on a RAID disk. While a battery-backed write cache may partially solve the problem, it is applicable only to a power-failure scenario. Since transactional support is not universally present in hardware RAID, many operating systems include transactional support to protect against data loss during an interrupted write.
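At the application level, the usual way to obtain atomic update semantics is to never update in place: write the new version elsewhere, then switch to it in one step, the same idea WAFL and ZFS apply inside the file system. A minimal user-level sketch relying on the atomicity of POSIX rename (os.replace); the function name is invented for illustration:

# Sketch: atomic object update by writing out of place, then renaming.
import os, tempfile

def atomic_write(path, data: bytes):
    dir_name = os.path.dirname(path) or "."
    fd, tmp = tempfile.mkstemp(dir=dir_name)   # new version on new blocks
    try:
        with os.fdopen(fd, "wb") as f:
            f.write(data)
            f.flush()
            os.fsync(f.fileno())               # make data durable before the switch
        os.replace(tmp, path)                  # atomic: readers see old or new, never a mix
    except BaseException:
        os.unlink(tmp)
        raise

atomic_write("object.dat", b"new version")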
Novell NetWare, starting with version 3.x, included a transaction tracking system. Microsoft introduced transaction tracking via the journaling feature in NTFS. ext4 has journaling with checksums; ext3 has journaling without checksums but with an "append-only" option, or ext3cow (copy-on-write). If the journal itself in a filesystem is corrupted, though, this can be problematic. The journaling in NetApp's WAFL file system gives atomicity by never updating data in place, as does ZFS. An alternative method to journaling is soft updates, which are used in some BSD-derived systems' implementations of UFS.

Unrecoverable bit errors
A drive can develop a latent sector defect that goes unnoticed until the data is read; this presents as a sector read failure. Some RAID implementations protect against this failure mode by remapping the bad sector, using the redundant data to retrieve a good copy of the data, and rewriting that good data to the newly mapped replacement sector. The UBE (unrecoverable bit error) rate is typically specified as 1 bit in 10^15 for enterprise-class disk drives (SCSI, FC, SAS) and 1 bit in 10^14 for desktop-class disk drives (IDE/ATA/PATA, SATA). Increasing disk capacities and large RAID 5 redundancy groups have led to an increasing inability to successfully rebuild a RAID group after a disk failure, because an unrecoverable sector is found on the remaining drives. Double-protection schemes such as RAID 6 attempt to address this issue, but suffer from a very high write penalty.
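The quoted UBE rates translate directly into rebuild success probabilities. The sketch below estimates the chance of reading every surviving drive without hitting an unrecoverable bit error; the array geometry is assumed for illustration:

# Chance of completing a RAID 5 rebuild without an unrecoverable bit
# error, for the UBE rates quoted above (array geometry is made up).

def rebuild_success(drives_read, drive_bytes, ube_rate):
    bits = drives_read * drive_bytes * 8  # every surviving bit must be readable
    return (1 - ube_rate) ** bits

size = 2 * 10**12                         # 2 TB per drive (assumed)
print(rebuild_success(5, size, 1e-14))    # desktop-class:    ~0.45
print(rebuild_success(5, size, 1e-15))    # enterprise-class: ~0.92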

Write cache reliability


The disk system can acknowledge a write operation as soon as the data is in the cache, without waiting for the data to be physically written. This typically occurs in old, non-journaled systems such as FAT32, or if the Linux/Unix "writeback" option is chosen without protections like the "soft updates" option (to improve I/O speed while trading away data reliability). A power outage or system hang such as a BSOD can mean a significant loss of any data queued in such a cache. Often a battery protects the write cache, mostly solving the problem: if a write fails because of a power failure, the controller may complete the pending writes as soon as it is restarted. This solution still has potential failure cases: the battery may have worn out, the power may be off for too long, the disks could be moved to another controller, or the controller itself could fail. Some disk systems provide the capability of testing the battery periodically; however, this leaves the system without a fully charged battery for several hours. An additional concern about write cache reliability exists specifically regarding devices equipped with a write-back cache: a caching system which reports the data as written as soon as it is written to cache, as opposed to the non-volatile medium.[29] The safer cache technique is write-through, which reports transactions as written only when they are written to the non-volatile medium.
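The behavioral difference between the two policies lies only in when the write is acknowledged relative to reaching non-volatile storage, as this toy sketch shows (the class and its names are invented for illustration):

# Toy illustration of write-back vs. write-through acknowledgement.

class CachedDisk:
    def __init__(self, write_back: bool):
        self.write_back = write_back
        self.cache, self.medium = {}, {}  # volatile cache vs. non-volatile medium

    def write(self, block, data):
        self.cache[block] = data
        if not self.write_back:
            self.medium[block] = data     # write-through: persist before acking
        return "acknowledged"             # write-back acks before persisting

    def flush(self):                      # the step a power loss would skip
        self.medium.update(self.cache)

With write_back=True, an outage between write() and flush() silently discards data the host believes is stored; a battery-backed cache narrows that window but, as noted above, does not eliminate it.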

Equipment compatibility
The methods used to store data by various RAID controllers are not necessarily compatible, so that it may not be possible to read a RAID array on different hardware, with the exception of RAID 1, which is typically represented as plain identical copies of the original data on each disk.
Consequently, a non-disk hardware failure may require the use of identical hardware to recover the data, and furthermore an identical configuration has to be reassembled without triggering a rebuild and overwriting the data. Software RAID, however, such as that implemented in the Linux kernel, alleviates this concern, as the setup is not hardware dependent, but runs on ordinary disk controllers and allows the reassembly of an array. Additionally, individual RAID 1 disks (software, and most hardware, implementations) can be read like normal disks when removed from the array, so no RAID system is required to retrieve the data. Inexperienced data recovery firms typically have a difficult time recovering data from RAID drives, with the exception of RAID 1 drives with a conventional data structure.

Data recovery in the event of a failed array


With larger disk capacities, the odds of a disk failure during rebuild are not negligible. In that event, the difficulty of extracting data from a failed array must be considered. Only RAID 1 stores all data on each disk. Although it may depend on the controller, some RAID 1 disks can be read as a single conventional disk. This means a dropped RAID 1 disk, although damaged, can often be reasonably easily recovered using a software recovery program. If the damage is more severe, data can often be recovered by professional data recovery specialists. RAID 5 and other striped or distributed arrays present much more formidable obstacles to data recovery in the event the array fails.

Drive error recovery algorithms


Many modern drives have internal error recovery algorithms that can take upwards of a minute to recover and re-map data that the drive fails to read easily. Many RAID controllers, however, will drop a non-responsive drive in about 8 seconds. This can cause the array to drop a good drive because it has not been given enough time to complete its internal error recovery procedure, leaving the rest of the array vulnerable. So-called enterprise-class drives limit the error recovery time and prevent this problem, but desktop drives can be quite risky for this reason. A fix specific to Western Digital drives used to be known: a utility called WDTLER.exe could limit the error recovery time of a Western Digital desktop drive so that it would not be dropped from the array for this reason. The utility enabled TLER (time-limited error recovery), which limits the error recovery time to 7 seconds. As of October 2009, Western Digital has locked out this feature in its desktop drives, such as the Caviar Black.[30] Western Digital enterprise-class drives are shipped from the factory with TLER enabled to prevent their being dropped from RAID arrays. Similar technologies are used by Seagate, Samsung, and Hitachi. As of late 2010, support for ATA Error Recovery Control configuration has been added to the Smartmontools program, so it now allows configuring many desktop-class hard drives for use on a RAID controller.

Increasing recovery time

Drive capacity has grown at a much faster rate than transfer speed, and error rates have only fallen a little in comparison. Therefore, larger-capacity drives may take hours, if not days, to rebuild. Rebuild speed is also limited if the entire array remains in operation, at reduced capacity, during the rebuild. Given a RAID array with only one disk of redundancy (RAID 3, 4, or 5), a second failure during the rebuild would cause complete failure of the array. Even though individual drives' mean time between failures (MTBF) has increased over time, this increase has not kept pace with the increased storage capacity of the drives. The time to rebuild the array after a single disk failure, as well as the chance of a second failure during a rebuild, have increased over time.
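Simple arithmetic makes the growth mismatch concrete (the capacities and sustained rates below are assumed; real rebuilds are slower still if the array keeps serving I/O):

# Rebuild time grows with capacity, not with interface speed alone.

def rebuild_hours(capacity_tb, rate_mb_s):
    return capacity_tb * 10**6 / rate_mb_s / 3600

print(rebuild_hours(0.25, 50))    # 250 GB at 50 MB/s  -> ~1.4 hours
print(rebuild_hours(4.0, 120))    # 4 TB at 120 MB/s   -> ~9.3 hours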

Operator skills, correct operation


In order to provide the desired protection against physical drive failure, a RAID array must be properly set up and maintained by an operator with sufficient knowledge of the chosen RAID configuration, the array controller (hardware or software), and failure detection and recovery. Unskilled handling of the array at any stage may exacerbate the consequences of a failure and result in downtime and full or partial loss of data that might otherwise be recoverable. In particular, the array must be monitored, and any failures detected and dealt with promptly. Failure to do so results in the array continuing to run in a degraded state, vulnerable to further failures. Ultimately, more failures may occur until the entire array becomes inoperable, resulting in data loss and downtime; in this case, any protection the array provides merely delays the loss. The operator must know how to detect failures or verify the healthy state of the array, identify which drive failed, have replacement drives available, and know how to replace a drive and initiate a rebuild of the array.

Other problems
While RAID may protect against physical drive failure, the data is still exposed to destruction by operator error, software faults, hardware faults, and viruses. Many studies cite operator error as the most common source of malfunction, for example a server operator replacing the incorrect disk in a faulty RAID array and disabling the system (even temporarily) in the process. Most well-designed systems include separate backup systems that hold copies of the data but do not allow much interaction with it; most copy the data and remove the copy from the computer for safe storage.


CONCLUSION
Data-intensive file systems are part of the core of data-intensive computing paradigms like MapReduce. A large increase is envisaged in the use of large-scale parallel programming tools for science analysis applications applied to massive data sets, such as astronomy surveys, protein folding, public information data mining, and machine translation. But current data-intensive file systems protect data against disk and node failure with high-overhead triplication schemes, which are undesirable when data sets are massive and resources are shared among many users, each with their own massive datasets. The future has bigger disks, bigger systems, and more emphasis on failure recovery, but RAID can evolve to cope with all that. The future of RAID lies in more coding carefully targeted at specific failure cases, and in more parallelism and load balancing in the reconstruction of lost data. After more than 17 years, it appears as if RAID is here to stay, at least for the foreseeable future. "RAID is a term that people can latch on to," Schulz said. "They may not fully understand it, but a lot of people know about it. We'll see new things come about that are RAID-like, RAID 6 and enhanced RAID, but RAID will remain a basic building block." Indeed, RAID 6 and RAID-DP offer some hope for SATA drive users, with their ability to survive a second drive failure. Gibson also believes that RAID terminology will stick around for a while still. "The term will last well into a major change in the way it's done," Gibson said. "It implies a notion of reliability and tradeoff of reliability against performance. It's a checklist item for storage, so users don't want to see it go away."


REFERENCES

• http://en.wikipedia.org/wiki/RAID
• "A Case for Redundant Arrays of Inexpensive Disks (RAID)"; David A. Patterson, Garth Gibson, and Randy H. Katz; University of California, Berkeley.
• Storage Area Network Fundamentals; Meeta Gupta; Cisco Press.
