
RAID AN INDEPTH LOOK

WHITE PAPER BY

SIVANESSEN E PILLAI
ENTERPRISE SERVICES

Redundant array of independent disks


In computing, a redundant array of independent disks, also known as a redundant array of inexpensive disks
(commonly abbreviated RAID), is a system which uses multiple hard drives to share or replicate data among the drives.
Depending on the version (RAID level) chosen, the benefit of RAID is one or more of increased data integrity, fault
tolerance, throughput or capacity compared to single drives. In its original implementations (in which it was an
abbreviation for "redundant array of inexpensive disks"), its key advantage was the ability to combine multiple low-cost
devices using older technology into an array that offered greater capacity, reliability, speed, or a combination of these
things, than was affordably available in a single device using the newest technology.
At the very simplest level, RAID combines multiple hard drives into a single logical unit. Thus, instead of seeing
several different hard drives, the operating system sees only one. RAID is typically used on server computers, and is
usually (but not necessarily) implemented with identically sized disk drives.

Hardware vs. software


RAID can be implemented either in dedicated hardware or custom software running on standard hardware.
Additionally, there are hybrid RAIDs that are partly software- and partly hardware-based solutions.
With a software implementation, the operating system manages the disks of the array through the normal drive
controller (IDE/ATA, SCSI, Fibre Channel, etc.). With present CPU speeds, software RAID can be faster than hardware
RAID, though at the cost of using CPU power which might be best used for other tasks. One major exception is where
the hardware implementation of RAID incorporates a battery backed-up write back cache which can speed up an
application, such as an OLTP database server. In this case, the hardware RAID implementation flushes the write cache
to secure storage to preserve data at a known point if there is a crash. The hardware approach is faster than accessing
the disk drive directly; it is limited by RAM speed, the rate at which the cache can be mirrored to another controller,
the amount of cache, and how fast the cache can be flushed to disk. For this reason, battery-backed caching disk
controllers are often recommended for high-transaction-rate database servers. In the same situation, a software solution
is limited to no more flushes than the number of rotations or seeks per second of the drives. Another disadvantage of
pure software RAID is that, depending on which disk fails and the boot arrangements in use, the computer may not be
able to be rebooted until the array has been rebuilt.
A hardware implementation of RAID requires at a minimum a special-purpose RAID controller. On a desktop system,
this may be a PCI expansion card, or might be a capability built in to the motherboard. In larger RAIDs, the controller
and disks are usually housed in an external multi-bay enclosure. The disks may be IDE, ATA, SATA, SCSI, Fibre
Channel, or any combination thereof. The controller links to the host computer(s) with one or more high-speed SCSI,
Fibre Channel or iSCSI connections, either directly, or through a fabric, or is accessed as network attached storage.
This controller handles the management of the disks, and performs parity calculations (needed for many RAID levels).
This option tends to provide better performance, and makes operating system support easier. Hardware
implementations also typically support hot swapping, allowing failed drives to be replaced while the system is running.
Both hardware and software versions may support the use of a hot spare, a preinstalled drive which is used to
immediately (and almost always automatically) replace a failed drive. This reduces the mean time to repair, the period
during which a second drive failure in the same RAID redundancy group can result in loss of data.
Examples:
Software RAID: Veritas Volume Manager from Veritas, Sun Volume Manager / Solstice DiskSuite from Sun
Microsystems, Logical Volume Manager from HP.
Hardware RAID: Sun StorEdge 3510 FC, Hitachi Thunder 9500 series, EMC CLARiiON series, HP EVA, VA and MSA.
Note: Please refer to the vendor-specific documentation for the above examples for more details.

Standard RAID Levels:


We will study the different types of RAID levels prevailing in the industry and discuss the various advantages and
disadvantages of each. Since the layout of the RAID levels implemented on the storage subsystem has a significant
impact on the overall performance of the system and application, it is crucial to understand these RAID levels in detail.
RAID Levels:
RAID levels can be classified into the following categories:
1. Standard RAID levels
2. Nested RAID levels
3. Proprietary RAID levels

Standard RAID Levels:

RAID 0 :
A RAID 0 (also known as a striped set) splits data evenly across two or more disks with no parity information for
redundancy. It is important to note that RAID 0 is not redundant. RAID 0 is normally used to increase performance.
A RAID 0 can be created with disks of differing sizes, but the storage space added to the array by each disk is limited to
the size of the smallest disk. RAID 0 implementations with more than two disks are also possible; however, the
reliability of a given RAID 0 set is equal to the average reliability of each disk divided by the number of disks in the
set.
Since the file system at the operating environment level is distributed across all the disks, a single disk failure results in
file system corruption and loss of data. Hot swapping is not possible at this RAID level, since all the disks depend on
each other for the data.

While the block size can technically be as small as a byte, it is almost always a multiple of the hard disk sector size of
512 bytes. This lets each drive seek independently when randomly reading or writing data on the disk. If all the
accessed sectors are entirely on one disk then the apparent seek time is the same as for a single disk. If the accessed
sectors are spread evenly among the disks then the apparent seek time is reduced by half for two disks, by two-thirds
for three disks, and so on, assuming identical disks. For normal data access patterns the apparent seek time of the array
lies between these two extremes. The transfer speed of the array is the transfer speed of all the disks added together.
A RAID 0 setup is useful where the data is read-only, downtime is not an important factor for the business, and
performance during operation is mandatory. This RAID level is popular for gaming systems, where performance is
mandatory and the data is not important.

In the diagram depicted above, virtual blocks are divided into groups of four and written to successive disks. The
corresponding group of blocks (VB000 to VB011) is called a stripe. The number of consecutive virtual disk blocks
mapped to a single physical disk is called the stripe depth, and the stripe depth multiplied by the number of disks is
called the stripe size. The stripe depth is sometimes referred to as the stripe element or segment size, depending on the
storage vendor; hence an equivalent definition is that the stripe element multiplied by the number of disks gives the
stripe size.

In the figure above, a single file consisting of record 000 to record 009 is split across three disks.
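
The mapping from virtual blocks to physical disks can be expressed in a few lines. Below is a minimal Python sketch
(the function name and the three-disk, stripe-depth-four layout are illustrative assumptions, not taken from any
particular product) that follows the stripe-depth and stripe-size definitions above:

    def raid0_map(virtual_block, num_disks, stripe_depth):
        """Return (disk_index, block_on_disk) for a virtual block in a RAID 0 set."""
        stripe_size = stripe_depth * num_disks            # blocks per full stripe
        stripe_number = virtual_block // stripe_size      # which stripe the block falls in
        offset_in_stripe = virtual_block % stripe_size
        disk_index = offset_in_stripe // stripe_depth     # which disk within the stripe
        block_on_disk = stripe_number * stripe_depth + offset_in_stripe % stripe_depth
        return disk_index, block_on_disk

    # Three disks with a stripe depth of four blocks, as in the VB000-VB011 example:
    for vb in range(16):
        disk, blk = raid0_map(vb, num_disks=3, stripe_depth=4)
        print(f"VB{vb:03d} -> disk {disk}, block {blk}")

VB000 to VB003 land on the first disk, VB004 to VB007 on the second, and so on; after one full stripe the mapping
wraps back to the first disk at the next block offset.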

Concatenation (JBOD)
Although a concatenation of disks (also called JBOD, or "Just a Bunch of Disks") is not one of the numbered RAID
levels, it is a popular method for combining multiple physical disk drives into a single virtual one. As the name implies,
disks are merely concatenated together, end to beginning, so they appear to be a single large disk.
In that it consists of an Array of Independent Disks (no redundancy), it can be thought of as a distant relation to RAID.
JBOD is sometimes used to turn several odd-sized drives into one useful drive. For example, JBOD could combine a
3 GB, 15 GB, 5.5 GB, and 12 GB drive into a 35.5 GB logical drive, which is often more useful than the individual
drives separately.
Software RAID products such as Veritas Volume Manager and Logical Volume Manager can use JBOD effectively to
create a single large virtual disk, but you will not get the performance benefit you would with hardware RAID 0.
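
As a rough illustration of concatenation, the following Python sketch (drive sizes taken from the example above; the
function name is illustrative) maps a logical offset on the 35.5 GB virtual disk back to the underlying drive:

    DRIVE_SIZES_GB = [3, 15, 5.5, 12]          # the odd-sized drives from the example

    def jbod_map(logical_offset_gb, drive_sizes):
        """Return (drive_index, offset_within_drive) for a logical offset on the volume."""
        start = 0
        for index, size in enumerate(drive_sizes):
            if logical_offset_gb < start + size:
                return index, logical_offset_gb - start
            start += size
        raise ValueError("offset beyond the end of the concatenated volume")

    print(sum(DRIVE_SIZES_GB))              # 35.5 -- capacity of the virtual disk
    print(jbod_map(2.0, DRIVE_SIZES_GB))    # (0, 2.0): still on the 3 GB drive
    print(jbod_map(20.0, DRIVE_SIZES_GB))   # (2, 2.0): 3 + 15 GB are used up, so drive index 2

Because the drives are simply laid end to end, sequential access touches one drive at a time, which is why
concatenation does not give the performance benefit of striping.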

RAID 1
A RAID 1 creates an exact copy (or mirror) of a set of data on two or more disks. This is useful when read
performance is more important than minimizing the storage capacity used for redundancy. The array can only be as big
as the smallest member disk, however. A classic RAID 1 mirrored pair contains two disks, which increases reliability
by a factor of two over a single disk, but it is possible to have many more than two copies. Since each member can be
addressed independently if the other fails, reliability is a linear multiple of the number of members. To truly get the full
redundancy benefits of RAID 1, independent disk controllers are recommended, one for each disk. Some refer to this
practice as splitting or duplexing.
When reading, both disks can be accessed independently. As with RAID 0, the average seek time is reduced by half
when randomly reading; because each disk holds exactly the same data, the requested sectors can always be split
evenly between the disks and the seek time remains low.
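
A minimal Python sketch of the mirroring behaviour described above (the class and its round-robin read policy are
illustrative assumptions; real implementations choose which member to read from in various ways):

    class Raid1:
        """Toy RAID 1: every write goes to all members, reads alternate between them."""

        def __init__(self, num_members, num_blocks):
            self.members = [[None] * num_blocks for _ in range(num_members)]
            self._next_reader = 0

        def write(self, block, data):
            for member in self.members:          # identical copy on every member disk
                member[block] = data

        def read(self, block):
            disk = self._next_reader             # spread reads across the members
            self._next_reader = (self._next_reader + 1) % len(self.members)
            return self.members[disk][block]

    mirror = Raid1(num_members=2, num_blocks=8)
    mirror.write(0, b"record-000")
    print(mirror.read(0))   # served by the first member
    print(mirror.read(0))   # served by the second member -- same data, different spindle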

Mirroring: Simplest form of RAID.


RAID 1 has many administrative advantages. For instance, in some 365*24 environments, it is possible to "Split the
Mirror": declare one disk as inactive, do a backup of that disk, and then "rebuild" the mirror. This requires that the
application support recovery from the image of data on the disk at the point of the mirror split.

Also, one common practice is to create an extra mirror of a volume (known as a Business Continuance Volume or
BCV in EMC terminology, and as ShadowImage in Hitachi terminology) which is meant to be split from the source
RAID set and used independently. In some implementations, these extra mirrors can be split and then incrementally
re-established, instead of requiring a complete RAID set rebuild.

RAID 2
A RAID 2 stripes data at the bit (rather than block) level, and uses a Hamming code for error correction. The disks
are synchronized by the controller to run in perfect tandem. This is the only original level of RAID that is not currently
used. Extremely high data transfer rates are possible.

RAID 3
It uses byte-level striping with a dedicated parity disk. RAID 3 is very rare in practice. One of the side effects of RAID
3 is that it generally cannot service multiple requests simultaneously. This comes about because any single block of
data will by definition be spread across all members of the set and will reside in the same location, so any I/O operation
requires activity on every disk.
In our example below, a request for block "A1" would require all three data disks to seek to the beginning and reply
with their contents. A simultaneous request for block B1 would have to wait.
Traditional
RAID 3
A1 A2 A3 Ap(1-3)
A4 A5 A6 Ap(4-6)
A7 A8 A9 Ap(7-9)
B1 B2 B3 Bp(1-3)
Put simply, RAID 3 has its user data blocks distributed across all the disks, and the bottleneck is the disk which holds
the parity. If the parity disk is a lower-RPM, lower-performance disk, then the total RAID 3 performance is limited to
that of the parity disk.

RAID 3: Illustration
In the figure above, Disk A and Disk B hold the user data and Disk C holds the parity data. In the event of a Disk A
failure, the data would be regenerated from Disk B and Disk C. The parity check data is computed as the bit-by-bit
exclusive OR of all the user data on the disks. In a RAID 3 array you effectively give up one disk for check
data, so the overhead is lower from a cost perspective than in RAID 1, where the overhead is 50%.
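
The bit-by-bit exclusive OR described above is easy to demonstrate. In the Python sketch below (the byte strings
standing in for Disk A, Disk B and the parity disk are purely illustrative), the parity is the XOR of the user data, and
XOR-ing the survivors regenerates a failed disk:

    from functools import reduce

    def xor_parity(*chunks):
        """Bit-by-bit XOR of equally sized byte strings."""
        return bytes(reduce(lambda a, b: a ^ b, column) for column in zip(*chunks))

    disk_a = b"\x10\x20\x30\x40"            # user data on Disk A
    disk_b = b"\x0f\x0e\x0d\x0c"            # user data on Disk B
    disk_c = xor_parity(disk_a, disk_b)     # check data on the dedicated parity disk

    # Disk A fails: XOR the surviving disks to regenerate its contents.
    rebuilt_a = xor_parity(disk_b, disk_c)
    assert rebuilt_a == disk_a
    print(rebuilt_a.hex())                  # 10203040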

RAID 4
A RAID 4 uses block-level striping with a dedicated parity disk. RAID 4 looks similar to RAID 3 except that it stripes
at the block, rather than the byte level. This allows each member of the set to act independently when only a single
block is requested. If the disk controller allows it, a RAID 4 set can service multiple read requests simultaneously.
In our example below, a request for block "A1" would be serviced by disk 1. A simultaneous request for block B1
would have to wait, but a request for B2 could be serviced concurrently.

Traditional
RAID 4
A1 A2 A3 Ap
B1 B2 B3 Bp
C1 C2 C3 Cp
D1 D2 D3 Dp
In real-world scenarios, RAID 4 implementations can be seen in NetApp filers.

RAID 5
A RAID 5 uses block-level striping with parity data distributed across all member disks. RAID 5 has achieved
popularity due to its low cost of redundancy. Generally RAID 5 is implemented with hardware support for parity
calculations.

Sample RAID 5 Illustration.

Every time a block is written to a disk in a RAID 5, a parity block is generated within the same stripe. A block is often
composed of many consecutive sectors on a disk. A series of blocks (a block from each of the disks in an array) is
collectively called a "stripe". If another block, or some portion of a block, is written on that same stripe the parity block
(or some portion of the parity block) is recalculated and rewritten. For small writes, this requires reading the old parity,
reading the old data, writing the new parity, and writing the new data. The disk used for the parity block is staggered
from one stripe to the next, hence the term "distributed parity blocks". RAID 5 writes are expensive (the write penalty
is higher in RAID 5) in terms of disk operations and traffic between the disks and the controller; hence RAID 5 is not
recommended for write-intensive applications. However, when an application writes a full new stripe, the old data and
old parity do not need to be read, so only the new data and the new parity are written.
The parity blocks are not read on data reads, since this would be unnecessary overhead and would diminish
performance. The parity blocks are read, however, when a read of a data sector results in a cyclic redundancy check
(CRC) error. Distributing the parity across all the disks reduces the I/O overhead caused by the need to update the
parity.
Likewise, should a disk fail in the array, the parity blocks from the surviving disks are combined mathematically with
the data blocks from the surviving disks to reconstruct the data on the failed drive "on the fly".
RAID 5 can tolerate a single disk failure, but not a two-disk failure.
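
The small-write path and the rotating parity placement can be sketched as follows. This is a minimal Python
illustration; the rotation shown here places the parity on the last disk for the first stripe and moves it one disk to the
left on each subsequent stripe, but real controllers use various rotation schemes:

    def updated_parity(old_parity, old_data, new_data):
        """RAID 5 small write: new parity = old parity XOR old data XOR new data."""
        return bytes(p ^ od ^ nd for p, od, nd in zip(old_parity, old_data, new_data))

    old_data   = b"\x11\x11"
    new_data   = b"\x2a\x2a"
    old_parity = b"\x55\x55"
    # Two reads (old data, old parity) plus two writes (new data, new parity):
    print(updated_parity(old_parity, old_data, new_data).hex())   # 6e6e

    def parity_disk_for_stripe(stripe_number, num_disks):
        """Distributed parity: the parity block moves to a different disk each stripe."""
        return (num_disks - 1 - stripe_number) % num_disks

    print([parity_disk_for_stripe(s, num_disks=4) for s in range(4)])   # [3, 2, 1, 0]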
RAID 6
A RAID 6 extends RAID 5 by adding an additional parity block, thus it uses block-level striping with two parity blocks
distributed across all member disks.

Like RAID 5 the parity is distributed in stripes, with the parity blocks in a different place in each stripe.
Traditional
RAID 5
A1 A2 A3 Ap
B1 B2 Bp B3
C1 Cp C2 C3
Dp D1 D2 D3

Typical
RAID 6
A1 A2 A3 Ap Aq
B1 B2 Bp Bq B3
C1 Cp Cq C2 C3
Dp Dq D1 D2 D3

RAID 6 is inefficient when used with a small number of drives but as arrays become bigger and have more drives the
loss in storage capacity becomes less important and the probability of two disks failing at once becomes greater. RAID
6 provides protection against double disk failures and failures while a single disk is rebuilding. In the case where there
is only one array it makes more sense than having a "hot spare" disk.
RAID 6 does not have a performance penalty for read operations, but it does have a performance penalty on write
operations due to the overhead associated with the additional parity calculations.
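
To put a number on the efficiency argument above, the following Python sketch (disk counts chosen arbitrarily for
illustration) shows how the fraction of capacity lost to the two parity blocks shrinks as the array grows:

    def raid6_usable_fraction(num_disks):
        """Usable fraction of a RAID 6 array: two disks' worth of capacity go to parity."""
        if num_disks < 4:
            raise ValueError("RAID 6 needs at least four disks")
        return (num_disks - 2) / num_disks

    for n in (4, 6, 8, 12, 16):
        print(f"{n:2d} disks: {raid6_usable_fraction(n):.0%} usable, "
              f"{2 / n:.0%} lost to parity")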
Nested RAID:
RAID 0+1
A RAID 0+1 (also called RAID 01, though it should not be confused with RAID 10) is a RAID used for both
replicating and sharing data among disks. The difference between RAID 0+1 and RAID 1+0 is the location of each
RAID system: RAID 0+1 is a mirror of stripes. Consider an example of RAID 0+1 in which six 120 GB drives need to
be set up.
Below is an example where two 360 GB level 0 arrays are mirrored, creating 360 GB of total storage space:
                    RAID 1
        /--------------------------\
        |                          |
      RAID 0                     RAID 0
  /-----------------\       /-----------------\
  |        |        |       |        |        |
120 GB  120 GB  120 GB   120 GB  120 GB  120 GB
  A1      A2      A3       A1      A2      A3
  A4      A5      A6       A4      A5      A6
  B1      B2      B3       B1      B2      B3
  B4      B5      B6       B4      B5      B6
The maximum storage space here is 360 GB, spread across two arrays. The advantage is that when a hard drive fails in
one of the level 0 arrays, the missing data can be transferred from the other array. However, adding an extra hard drive
to one stripe requires you to add an additional hard drive to the other stripes to balance out storage among the arrays.
It is not as robust as RAID 10 and cannot tolerate two simultaneous disk failures unless they are in the same stripe.
That is, once a single disk fails, every disk in the other stripe becomes a single point of failure. Also, once the failed
disk is replaced, all the disks in the array must participate in the rebuild in order to restore its data.
RAID 10
A RAID 10, sometimes called RAID 1+0 or RAID 1&0, is similar to a RAID 0+1 with the exception that the RAID
levels used are reversed: RAID 10 is a stripe of mirrors. Below is an example where three collections of 120 GB level
1 arrays are striped together to add up to 360 GB of total storage space:
                        RAID 0
       /-----------------------------------\
       |                |                  |
     RAID 1           RAID 1             RAID 1
    /--------\       /--------\         /--------\
    |        |       |        |         |        |
 120 GB   120 GB  120 GB   120 GB    120 GB   120 GB
   A1       A1      A2       A2        A3       A3
   A4       A4      A5       A5        A6       A6
   B1       B1      B2       B2        B3       B3
   B4       B4      B5       B5        B6       B6

All but one drive from each RAID 1 set could fail without damaging the data. However, if the failed drive is not
replaced, the single working hard drive in the set then becomes a single point of failure for the entire array. If that
single hard drive then fails, all data stored in the entire array is lost.
Extra 120GB hard drives could be added to any one of the level 1 arrays to provide extra redundancy. Unlike RAID
0+1, all the "sub-arrays" do not have to be upgraded simultaneously.
RAID 10 is often the primary choice for high-load databases, because the lack of parity calculation gives it faster write
speeds. If one disk fails in a RAID 1 set, only the data within that RAID 1 set needs to be rebuilt, so the total rebuild
time is greatly reduced compared to RAID 0+1.
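
The difference in robustness between the two nested layouts can be checked exhaustively for the six-drive examples
above. The Python sketch below (the disk numbering is an illustrative assumption) tests every possible pair of
simultaneous disk failures against each layout:

    from itertools import combinations

    def raid01_survives(failed):
        """RAID 0+1: data survives as long as at least one whole stripe set is intact."""
        stripes = [{0, 1, 2}, {3, 4, 5}]          # two three-disk RAID 0 sets, mirrored
        return any(stripe.isdisjoint(failed) for stripe in stripes)

    def raid10_survives(failed):
        """RAID 10: data survives as long as no mirror pair has lost both members."""
        mirrors = [{0, 1}, {2, 3}, {4, 5}]        # three two-disk RAID 1 sets, striped
        return all(not mirror.issubset(failed) for mirror in mirrors)

    pairs = [set(p) for p in combinations(range(6), 2)]
    print(sum(raid01_survives(p) for p in pairs), "of", len(pairs), "double failures survived by RAID 0+1")
    print(sum(raid10_survives(p) for p in pairs), "of", len(pairs), "double failures survived by RAID 10")

With six drives, RAID 0+1 survives 6 of the 15 possible double failures (only those confined to one stripe), while
RAID 10 survives 12 of 15 (all except the loss of both members of one mirror).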
RAID 50 (RAID 5+0)
A RAID 50 combines the block-level striping with distributed parity of RAID 5, with the straight block-level striping
of RAID 0. This is a RAID 0 array striped across RAID 5 elements.
Below is an example where three collections of 120 GB RAID 5s are striped together to add up to 720 GB of total
storage space:
                                RAID 0
      /---------------------------------------------------------\
      |                           |                             |
    RAID 5                      RAID 5                        RAID 5
 /--------------\           /--------------\             /--------------\
 |      |       |           |      |       |             |      |       |
120 GB 120 GB 120 GB      120 GB 120 GB 120 GB        120 GB 120 GB 120 GB
  A1     A2     Ap          A3     A4     Ap            A5     A6     Ap
  B1     Bp     B2          B3     Bp     B4            B5     Bp     B6
  Cp     C1     C2          Cp     C3     C4            Cp     C5     C6
  D1     D2     Dp          D3     D4     Dp            D5     D6     Dp
One drive from each of the RAID 5 sets could fail without loss of data. However, if the failed drive is not replaced, the
remaining drives in that set then become a single point of failure for the entire array. If one of those drives fails, all data
stored in the entire array is lost. The time spent in recovery (detecting and responding to a drive failure, and the rebuild
process to the newly inserted drive) represents a period of vulnerability to the RAID set.
The configuration of the RAID sets impacts the overall fault tolerance. A construction of three seven-drive RAID 5
sets has higher capacity and storage efficiency, but can only tolerate a maximum of three drive failures (one per set). A
construction of seven three-drive RAID 5 sets can handle as many as seven drive failures, but has lower capacity and
storage efficiency.
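
The capacity versus fault-tolerance trade-off just described can be summarised with a small Python sketch (assuming
21 identical 120 GB drives; the numbers are purely illustrative):

    def raid50_summary(num_sets, drives_per_set, drive_gb):
        """Usable capacity and worst-case tolerable failures for a RAID 50 layout."""
        usable_gb = num_sets * (drives_per_set - 1) * drive_gb   # each RAID 5 set gives up one drive to parity
        max_failures = num_sets                                  # at most one failure per RAID 5 set
        return usable_gb, max_failures

    for num_sets, per_set in ((3, 7), (7, 3)):
        usable, failures = raid50_summary(num_sets, per_set, drive_gb=120)
        print(f"{num_sets} x {per_set}-drive RAID 5 sets: {usable} GB usable, "
              f"up to {failures} drive failures tolerated (one per set)")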
Proprietary RAID Levels:
RAID S or Parity RAID
RAID S is EMC Corporation's proprietary striped parity RAID system used in their Symmetrix storage systems.
Each volume exists on a single physical disk, and multiple volumes are arbitrarily combined for parity purposes. EMC
originally referred to this capability as RAID S, and then renamed it Parity RAID for the Symmetrix DMX platform.
EMC now offers standard striped RAID 5 on the Symmetrix DMX as well.
Traditional
RAID 5
A1 A2 A3 Ap
B1 B2 Bp B3
C1 Cp C2 C3
Dp D1 D2 D3

EMC
RAID S
A1 B1 C1 1p
A2 B2 C2 2p
A3 B3 C3 3p
A4 B4 C4 4p

IBM ServeRAID 1E
The IBM ServeRAID adapter series supports 2-way mirroring on an arbitrary number of drives. For example, mirroring
on five drives would look like this:
A1 A2 A3 A4 A5
A5 A1 A2 A3 A4
B1 B2 B3 B4 B5
B5 B1 B2 B3 B4

This configuration is tolerant of non-adjacent drives failing. Other storage systems including Sun's StorEdge T3
support this mode as well.
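
The layout in the table above can be generated programmatically: each data row is followed by a copy of itself rotated
by one drive, which is what allows 2-way mirroring on an odd number of drives. A minimal Python sketch (the function
name and row labels are illustrative):

    def raid1e_layout(num_drives, num_stripes):
        """Generate a RAID 1E style layout: data row, then the same row rotated by one drive."""
        rows = []
        for stripe in range(num_stripes):
            letter = chr(ord("A") + stripe)
            data_row = [f"{letter}{i + 1}" for i in range(num_drives)]
            mirror_row = data_row[-1:] + data_row[:-1]    # rotate right by one drive
            rows.append(data_row)
            rows.append(mirror_row)
        return rows

    for row in raid1e_layout(num_drives=5, num_stripes=2):
        print("  ".join(row))
    # A1  A2  A3  A4  A5
    # A5  A1  A2  A3  A4
    # B1  B2  B3  B4  B5
    # B5  B1  B2  B3  B4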

Comparison of all RAID Levels
