For any organization, whether it be small business or a data center, lost data means lost business. There are two
common practices for protecting that data: backups (protecting your data against total system failure, viruses,
corruption, etc.), and RAID (protecting your data against drive failure). Both are necessary to ensure your data is
secure.
This white paper discusses the various types of RAID configurations available, their uses, and how they should be
implemented into data servers.
NOTE: RAID is not a substitute for regularly scheduled backups. All organizations and users should always have a solid backup
strategy in place.
What is RAID?
RAID (Redundant Array of Inexpensive Disks) is a data storage structure that allows a system
administrator/designer/builder/user to combine two or more physical storage devices (HDDs, SSDs, or both) into a
logical unit (an array) that is seen by the attached system as a single drive.
1. Striping (RAID 0) writes some data to one drive and some data to another, minimizing read and write access
times and improving I/O performance.
2. Mirroring (RAID 1) replicates data on two drives, preventing loss of data in the event of a drive failure.
3. Parity (RAID 5 & 6) provides fault tolerance by examining the data on two drives and storing the results on a
third. When a failed drive is replaced, the lost data is rebuilt from the remaining drives.
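The three mechanisms above can be sketched in a few lines of Python. This is purely illustrative (the helper names are invented, and a real controller works on fixed-size stripe units, not Python lists):

```python
# Toy illustration of striping, mirroring, and parity.

def raid0_layout(blocks, drives):
    """Stripe blocks round-robin across drives (no redundancy)."""
    layout = [[] for _ in range(drives)]
    for i, block in enumerate(blocks):
        layout[i % drives].append(block)
    return layout

def raid1_layout(blocks):
    """Mirror: both drives hold identical copies of every block."""
    return [list(blocks), list(blocks)]

def raid5_parity(stripe):
    """Parity for one stripe is the XOR of its data blocks."""
    parity = 0
    for block in stripe:
        parity ^= block
    return parity

print(raid0_layout([10, 20, 30, 40], 2))  # [[10, 30], [20, 40]]
print(raid5_parity([1, 2, 4]))            # 7
```

If any one value in the stripe is lost, XOR-ing the parity with the surviving values regenerates it, which is exactly what a rebuild does.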
It is possible to configure these RAID levels into combination levels — called RAID 10, 50 and 60.
The RAID controller handles the combining of drives into these different configurations to maximize performance,
capacity, redundancy (safety) and cost to suit the user needs.
Software RAID runs entirely on the CPU of the host computer system.
In hardware RAID, a RAID controller has a processor, memory and multiple drive connectors that allow drives to be
attached either directly to the controller, or placed in hot-swap backplanes.
In both cases, the RAID system combines the individual drives into one logical disk. The OS treats the drive like any
other drive in the computer — it does not know the difference between a single drive connected to a motherboard or
a RAID array being presented by the RAID controller.
Given its performance benefits and flexibility, hardware RAID is better suited for the typical modern server system.
From a RAID perspective, HDDs and SSDs only differ in their performance and capacity capabilities. To the RAID
controller they are all drives, but it is important to take note of the performance characteristics of the RAID controller
to ensure it is capable of fully accommodating the performance capabilities of the SSD. Most modern RAID
controllers are fast enough to allow SSDs to run at their full potential, but a slow RAID controller could bottleneck data
and negatively impact system performance.
Hybrid RAID
Hybrid RAID is a redundant storage solution that combines high capacity, low-cost SATA or higher-performance SAS
HDDs with low latency, high IOPs SSDs and an SSD-aware RAID adapter card (Figure 1).
In Hybrid RAID, read operations are done from the faster SSD and write operations happen on both SSD and HDD
for redundancy purposes.
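The read/write policy just described can be sketched as follows. The `HybridMirror` class and its dict-backed "drives" are hypothetical, not a vendor API; real adapters implement this in firmware at the block level:

```python
# Sketch of the Hybrid RAID policy: every write is mirrored to both
# drives; reads prefer the low-latency SSD.

class HybridMirror:
    def __init__(self):
        self.ssd = {}   # fast, low-latency drive
        self.hdd = {}   # high-capacity drive

    def write(self, lba, data):
        # Write to both sides so either drive can survive alone.
        self.ssd[lba] = data
        self.hdd[lba] = data

    def read(self, lba):
        # Serve reads from the SSD; fall back to the HDD if it failed.
        if self.ssd is not None:
            return self.ssd[lba]
        return self.hdd[lba]

    def fail_ssd(self):
        self.ssd = None  # simulate losing the fast drive

array = HybridMirror()
array.write(0, b"journal")
array.fail_ssd()
print(array.read(0))  # still readable from the HDD copy
```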
Hybrid RAID arrays offer tremendous performance gains over standard HDD arrays at a much lower cost than SSD-
only RAID arrays. Compared to HDD-only RAID arrays, hybrid arrays accelerate IOPs and reduce latency, allowing
any server system to host more users and perform more transactions per second on each server, which reduces the
number of servers required to support any given workload.
Hybrid RAID's common use cases are not obvious at first glance: they range from simple mirrors in workstations
through to high-performance, read-intensive applications in the small to medium
business arena. Hybrid RAID is also used extensively in the data center to provide greater capacity in storage servers
while providing fast boot for those servers.
Who should use RAID?
Any server or high-end workstation, and any computer system where constant uptime is required, is a suitable
candidate for RAID.
At some point in the life of a server, at least one drive will fail. Without some form of RAID protection, a failed drive’s
data would have to be restored from backups, likely at the loss of some data and a considerable amount of time. With
a RAID controller in the system, a failed drive can simply be replaced and the RAID controller will automatically
rebuild the missing data from the rest of the drives onto the newly-inserted drive. This means that your system can
survive a drive failure without the complex and long-winded task of restoring data from backups.
The factors to consider when choosing the right RAID level include:
Capacity
Performance
Redundancy (reliability/safety)
Price
There is no one-size-fits-all approach to RAID because focus on one factor typically comes at the expense of another.
Some RAID levels designate drives to be used for redundancy, which means they can’t be used for capacity. Other
RAID levels focus on performance but not on redundancy. A large, fast, highly redundant array will be expensive.
Conversely, a small, average-speed redundant array won’t cost much, but will not be anywhere near as fast as the
previous expensive array.
With that in mind, here is a look at the different RAID levels and how they may meet your requirements.
RAID 0 (Striping)
In RAID 0, all drives are combined into one logical disk (Figure 2). This configuration offers low cost and maximum
performance, but no data protection — a single drive failure results in total data loss.
As such, RAID 0 is not recommended. As SSDs become more affordable and grow in capacity, RAID 0 has declined
in popularity. The benefits of fast read/write access are far outweighed by the threat of losing all data in the event of a
drive failure.
Usage: Suited only for situations where data isn’t mission critical, such as video/audio post-production, multimedia
imaging, CAD, data logging, etc. where it’s OK to lose a complete drive because the data can be quickly re-copied
from the source. Generally speaking, RAID 0 is not recommended.
Pros: » Fast and inexpensive.
» All drive capacity is usable.
» Quick to set up. Multiple HDDs sharing the data load make it the fastest of all arrays.
Cons: » RAID 0 provides no data protection at all.
» If one drive fails, all data will be lost with no chance of recovery.
RAID 1 (Mirroring)
RAID 1 maintains duplicate sets of all data on two separate drives while showing just one set of data as a logical disk
(Figure 3). RAID 1 is about protection, not performance or capacity.
Since each drive holds copies of the same data, the usable capacity is 50% of the available drives in the RAID set.
Usage: Generally only used in cases where there is not a large capacity requirement, but the user wants to make
sure the data is 100% recoverable in the case of a drive failure, such as accounting systems, video editing, gaming
etc.
RAID 1E (Striped Mirroring)
RAID 1E combines mirroring with striping across an odd number of drives (minimum three). As in RAID 1, usable drive capacity in RAID 1E is 50% of the total available capacity of all drives in the RAID set.
Usage: Small servers, high-end workstations, and other environments with no large capacity requirements, but where
the user wants to make sure the data is 100% recoverable in the case of a drive failure.
Pros: » Redundant with better performance and capacity than RAID 1. In effect, RAID 1E is a mirror of an odd number of drives.
Cons: » Cost is high because only half the capacity of the physical drives is available.
NOTE: RAID 1E is best suited for systems with three drives. For scenarios with four or more drives, RAID 10 is
recommended.
RAID 5 (Striping with Parity)
RAID 5 read performance is comparable to that of RAID 0, but there is a penalty for writes since the system must
write both the data block and the parity data before the operation is complete.
The RAID parity requires one drive capacity per RAID set, so usable capacity will always be one drive less than the
total number of drives in the configuration.
Usage: Often used in fileservers, general storage servers, backup servers, streaming data, and other environments
that call for good performance at the best value for the money. Not suited to database applications due to poor random
write performance.
Pros: » Good value and good all-around performance.
Cons: » One drive capacity is lost to parity.
» Can only survive a single drive failure at any one time.
» If two drives fail at once, all data is lost.
NOTE: It is strongly recommended to set up a hot spare with RAID 5 to reduce exposure to multiple drive failures.
NOTE: While SSDs are becoming cheaper, and their improved performance over HDDs makes it seem possible to use
them in RAID 5 arrays for database applications, the prevalence of small random writes in those workloads still means
that this RAID level should not be used in a system with a large number of small, random writes. A non-parity array
such as RAID 10 should be used instead.
RAID 6 (Striping with Dual Parity)
In RAID 6, data is striped across several drives and dual parity is used to store and recover data (Figure 6). It is
similar to RAID 5 in performance and capacity capabilities, but the second parity scheme is distributed across
different drives and therefore offers extremely high fault tolerance and the ability to withstand the simultaneous failure
of two drives in an array.
RAID 6 requires a minimum of 4 drives and a maximum of 32 drives to be implemented. Usable capacity is always
two less than the number of available drives in the RAID set.
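The usable-capacity rules quoted so far can be collected into a small calculator. This is a sketch that assumes n equal-sized drives of `size` TB each (the function name is invented for illustration):

```python
# Usable capacity by RAID level, per the rules in the text.

def usable_tb(level, n, size):
    if level == "RAID 0":
        return n * size          # all capacity usable, no protection
    if level in ("RAID 1", "RAID 10"):
        return n * size / 2      # half lost to mirroring
    if level == "RAID 5":
        return (n - 1) * size    # one drive's worth lost to parity
    if level == "RAID 6":
        return (n - 2) * size    # two drives' worth lost to dual parity
    raise ValueError(level)

print(usable_tb("RAID 5", 5, 3))   # 12
print(usable_tb("RAID 6", 6, 3))   # 12
```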
Usage: Similar to RAID 5, including fileservers, general storage servers, backup servers, etc. Poor random write
performance makes RAID 6 unsuitable for database applications.
RAID 10 (Striping and Mirroring)
RAID 10 requires a minimum of four drives, and usable capacity is 50% of available drives. It should be noted,
however, that RAID 10 can use more than four drives in multiples of two. Each mirror in RAID 10 is called a “leg” of
the array. A RAID 10 array using, say, eight drives (four “legs,” with four drives as capacity) will offer extreme
performance in both spinning media and SSD environments as there are many more drives splitting the reads and
writes into smaller chunks across each drive.
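The way consecutive logical blocks spread across the "legs" can be sketched as follows. The mapping function is illustrative only; real controllers stripe in fixed-size chunks, not single blocks:

```python
# Block i lands on leg i % legs, and both drives of that mirrored
# pair store a copy.

def raid10_target(lba, legs):
    """Return (leg index, offset within the leg) for a logical block."""
    return lba % legs, lba // legs

# Eight drives = four legs: consecutive blocks land on different
# legs, splitting the I/O load four ways.
for lba in range(6):
    leg, offset = raid10_target(lba, legs=4)
    print(f"block {lba} -> leg {leg}, offset {offset}")
```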
Usage: Ideal for database servers and any environment with many small random data writes.
Pros: » Fast and redundant.
Cons: » Expensive because it requires four drives to get the capacity of two.
» Not suited to large capacities due to cost restrictions.
» Not as fast as RAID 5 in most streaming environments.
RAID 50 (Striping with Parity)
RAID 50 (sometimes referred to as RAID 5+0) combines multiple RAID 5 sets (striping with parity) with RAID 0
(striping) (Figures 9 and 10). The benefits of RAID 5 are gained while the spanned RAID 0 allows the incorporation of
many more drives into a single logical disk. Up to one drive in each sub-array may fail without loss of data. Also,
rebuild times are substantially less than a single large RAID 5 array.
A RAID 50 configuration can accommodate 6 or more drives, but should only be used with configurations of more
than 16 drives. The usable capacity of RAID 50 is 67%-94%, depending on the number of data drives in the RAID
set.
It should be noted that you can have more than two legs in a RAID 50. For example, with 24 drives you could have a
RAID 50 of two legs of 12 drives each, or a RAID 50 of three legs of eight drives each. The first of these two arrays
would offer greater capacity as only two drives are lost to parity, but the second array would have greater
performance and much quicker rebuild times as only the drives in the leg with the failed drive are involved in the
rebuild function of the entire array.
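The 24-drive trade-off above can be worked through in drive counts. A minimal sketch, assuming equal-sized drives and one parity drive per RAID 5 leg:

```python
# Capacity (in data drives) of a RAID 50 with a given leg count.

def raid50_data_drives(total_drives, legs):
    per_leg = total_drives // legs
    return legs * (per_leg - 1)   # each leg gives up one drive to parity

print(raid50_data_drives(24, 2))  # 22 data drives: more capacity
print(raid50_data_drives(24, 3))  # 21 data drives: smaller, faster rebuilds
```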
Usage: Good configuration for cases where many drives need to be in a single array but capacity is too large for
RAID 10, such as in very large capacity servers.
RAID 60 (Striping with Dual Parity)
Like RAID 50, a RAID 60 configuration can accommodate 8 or more drives, but should only be used with
configurations of more than 16 drives. The usable capacity of RAID 60 is between 50%-88%, depending on the
number of data drives in the RAID set.
Note that all of the above multiple-leg configurations that are possible with RAID 10 and RAID 50 are also possible
with RAID 60. With 36 drives, for example, you can have a RAID 60 comprising two legs of 18 drives each, or a RAID
60 of three legs with 12 drives in each.
Usage: RAID 60 is similar to RAID 50 but offers more redundancy, making it good for very large capacity servers,
especially those that will not be backed up (i.e. video surveillance servers handling large numbers of cameras).
Pros: » Can sustain two drive failures per RAID 6 array within the set, so it is very safe.
» Very large capacity and reasonable value for money, considering this RAID level won’t be used unless there are a large number
of drives.
Cons: » Requires a lot of drives.
» Slightly more expensive than RAID 50 due to losing more drives to parity calculations.
When to use which RAID level
We can classify data into two basic types: random and streaming. As indicated previously, there are two general
types of RAID arrays: non-parity (RAID 1, 10) and parity (RAID 5, 6, 50, 60).
Random data is generally small in nature (i.e., small blocks), with a large number of small reads and writes making up
the data pattern. This is typified by database-type data.
Streaming data is large in nature, and is characterized by data types such as video, images, and general large files.
While it is not possible to accurately determine all of a server’s data usage, and servers often change their usage
patterns over time, the general rule of thumb is that random data is best suited to non-parity RAID, while streaming
data works best and is most cost-effective on parity RAID.
Note that it is possible to set up both RAID types on the same controller, and even possible to set up the same RAID
types on the same set of drives. So if, for example, you have eight 2TB drives, you can make a RAID 10 of 1TB for
your database-type data, and a RAID 5 of the capacity that is left on the drives for your general and/or streaming type
data (approximately 12TB). Having these two different arrays spanning the same drives will not impact performance,
but your data will benefit in performance from being situated on the right RAID level.
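The eight-drive carve-up described above works out as follows (a worked-arithmetic sketch; the variable names are illustrative):

```python
# A 1 TB RAID 10 carved across eight 2 TB drives, with a RAID 5
# on the remaining space.

drives, drive_tb = 8, 2.0

raid10_logical_tb = 1.0
raid10_per_drive = raid10_logical_tb * 2 / drives   # mirrored, so 2x raw

remainder_per_drive = drive_tb - raid10_per_drive
raid5_logical_tb = (drives - 1) * remainder_per_drive  # one drive to parity

print(raid10_per_drive)   # 0.25 TB taken from each drive
print(raid5_logical_tb)   # 12.25 TB, i.e. "approximately 12TB"
```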
Note that SSDs are often faster in larger capacities, so an 80GB SSD and an 800GB SSD from the same product
family will have quite different performance characteristics. This should be checked carefully with the product
specifications from the drive vendor to make sure you are getting the performance you think you are getting from your
drives.
With HDDs it is generally better to create an array with more, rather than fewer, drives. A RAID 5 of three 6TB HDDs
(12TB capacity) will not have the same performance as a RAID 5 array made from five 3TB HDDs (12TB capacity).
With SSDs, however, it is advisable to achieve the capacity required from as few drives as possible by using larger
capacity SSDs. These will have higher throughput than their smaller counterparts and will yield better system
performance.
During the creation process, you can set the array to less than its maximum size. The unused space on the drives
will be available for creating additional RAID arrays.
A good example of this would be when creating a large server and keeping the operating system and data on
separate RAID arrays. Typically you would make a RAID 10 of, say, 200GB for your OS installation spread across all
drives in the server. This would use a minimal amount of capacity from each drive. You can then create a RAID 5 for
your general data across the unused space on the drives.
This has an added benefit of getting around drive size limitations for boot arrays on non-UEFI servers as the OS will
believe it is only dealing with a 200GB drive when installing the operating system.
Array size also affects build and rebuild times. For example, a RAID 5 made of 32 6TB drives (186TB) will have very poor build and rebuild times due to the size,
speed and number of drives. In this scenario, it would be advisable to build a RAID 50 with two legs from those drives
(180TB capacity). When a drive fails and is replaced, only 16 of the drives (15 existing plus the new drive) will be
involved in the rebuild. This will improve rebuild performance and reduce system performance impact during the
rebuild process.
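Checking the 32-drive arithmetic above:

```python
# RAID 5 vs. two-leg RAID 50 capacity for 32 x 6 TB drives.

drive_tb, total_drives = 6, 32

raid5_tb = (total_drives - 1) * drive_tb            # single parity drive
raid50_tb = 2 * (total_drives // 2 - 1) * drive_tb  # two 16-drive RAID 5 legs

print(raid5_tb)    # 186: every one of the 32 drives joins a rebuild
print(raid50_tb)   # 180: only the 16 drives in the failed leg rebuild
```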
Note, however, that no matter what you do, when it comes to rebuilding arrays with 6TB+ SATA drives, rebuild times
will increase beyond 24 hours in an absolutely perfect environment (no load on server). In a real-world environment
with a heavily loaded system, the rebuild times will be even longer.
Of course, rebuild times on SSD arrays are dramatically quicker because the drives are smaller and the write
speeds of SSDs are much faster than those of their spinning-media counterparts.
Summary comparison of RAID levels:

RAID 0
» Minimum drives: 2. Data protection: none. Capacity utilization: 100%.
» Read performance: High (degraded: Low). Write performance (degraded): N/A.
» Typical usage: Non-mission-critical data, such as video/audio post-production, multimedia imaging, CAD, data logging, etc., where it’s OK to lose a complete drive because the data can be quickly re-copied from the source. Generally speaking, RAID 0 is not recommended.
» Pros: Fast and inexpensive. All drive capacity is usable. Quick to set up. Multiple HDDs sharing the data load make it the fastest of all arrays.
» Cons: RAID 0 provides no data protection at all. If one drive fails, all data will be lost with no chance of recovery.

RAID 1
» Minimum drives: 2. Data protection: single-drive failure. Capacity utilization: 50%.
» Read performance: Medium (degraded: High). Write performance (degraded): Medium.
» Typical usage: Cases where there is not a large capacity requirement, but the user wants to make sure the data is 100% recoverable in the case of a drive failure, such as accounting systems, video editing, gaming, etc.
» Pros: Highly redundant – each drive is a copy of the other. If one drive fails, the system continues as normal with no data loss.
» Cons: Capacity is limited to 50% of the available drives, and performance is not much better than a single drive.

RAID 1E
» Minimum drives: 3. Data protection: single-drive failure. Capacity utilization: 50%.
» Read performance: Medium (degraded: Medium). Write performance (degraded): Medium.
» Typical usage: Small servers, high-end workstations, and other environments with no large capacity requirements, but where the user wants to make sure the data is 100% recoverable in the case of a drive failure.
» Pros: Redundant with better performance and capacity than RAID 1. In effect, RAID 1E is a mirror of an odd number of drives.
» Cons: Cost is high because only half the capacity of the physical drives is available.

RAID 5
» Minimum drives: 3. Data protection: single-drive failure. Capacity utilization: 67% - 94%.
» Read performance: High (degraded: Medium). Write performance (degraded): Low.
» Typical usage: Fileservers, general storage servers, backup servers, streaming data, and other environments that call for good performance at the best value for the money. Not suited to database applications due to poor random write performance.
» Pros: Good value and good all-around performance.
» Cons: One drive capacity is lost to parity. Can only survive a single drive failure at any one time. If two drives fail at once, all data is lost.

RAID 6
» Minimum drives: 4. Data protection: two-drive failure. Capacity utilization: 50% - 88%.
» Read performance: High (degraded: Low). Write performance (degraded): Low.
» Typical usage: Similar to RAID 5, including fileservers, general storage servers, backup servers, etc. Poor random write performance makes RAID 6 unsuitable for database applications.
» Pros: Reasonable value for money with good all-round performance. Can survive two drives failing at the same time, or one drive failing and then a second drive failing during the data rebuild.
» Cons: More expensive than RAID 5 due to the loss of two drive capacities to parity. Slightly slower than RAID 5 in most applications.

RAID 10
» Minimum drives: 4. Data protection: up to one drive failure in each sub-array. Capacity utilization: 50%.
» Read performance: High (degraded: High). Write performance (degraded): High.
» Typical usage: Ideal for database servers and any environment with many small random data writes.
» Pros: Fast and redundant.
» Cons: Expensive as it requires four drives to get the capacity of two. Not suited to large capacities due to cost restrictions.

RAID 50
» Minimum drives: 6. Data protection: up to one drive failure in each sub-array. Capacity utilization: 67% - 94%.
» Read performance: High (degraded: Medium). Write performance (degraded): Medium.
» Typical usage: Good configuration for cases where many drives need to be in a single array but capacity is too large for RAID 10, such as in very large capacity servers.
» Pros: Reasonable value for the expense. Very good all-round performance, especially for streaming data, and very high capacity capabilities.
» Cons: Requires a lot of drives. Capacity of one drive in each RAID 5 set is lost to parity. Slightly more expensive than RAID 5 due to this lost capacity.

RAID 60
» Minimum drives: 8. Data protection: up to two drive failures in each sub-array. Capacity utilization: 50% - 88%.
» Read performance: High (degraded: Low). Write performance (degraded): Low.
» Typical usage: Similar to RAID 50 but offers more redundancy, making it good for very large capacity servers, especially those that will not be backed up (i.e. video surveillance servers handling large numbers of cameras).
» Pros: Can sustain two drive failures per RAID 6 array within the set, so it is very safe. Reasonable value for money, considering this RAID level won’t be used unless there are a large number of drives.
» Cons: Requires a lot of drives. Slightly more expensive than RAID 50 due to losing more drives to parity calculations.
Comparison of software, integrated, and hardware RAID:

Software RAID
» Description: Included in the OS, such as Windows® and Linux. All RAID functions are handled by the host CPU, which can severely tax its ability to perform other computations.
» Typical usage: Best used for large-block applications such as data warehousing or video streaming, and where servers have the available CPU cycles to manage the I/O-intensive operations certain RAID levels require.
» Pros: Lower cost due to lack of RAID-dedicated hardware.
» Cons: Lower RAID performance, as the CPU also powers the OS and applications.

Integrated (motherboard) RAID
» Description: Processor-intensive RAID operations are off-loaded from the host CPU to a RAID processor integrated into the motherboard.
» Typical usage: Inexpensive.
» Pros: Lower cost than adapter-based RAID.
» Cons: No ability to upgrade or replace the RAID processor in the event of hardware failure. May only support a few RAID levels.

Hardware (adapter) RAID
» Description: Processor-intensive RAID operations are off-loaded from the host CPU to an external PCIe adapter. Battery-backed write-back cache can dramatically increase performance without adding risk of data loss.
» Typical usage: Best used for small-block applications such as transaction-oriented databases and web servers.
» Pros: Offloads RAID tasks from the host system, yielding better performance than software RAID. Controller cards can be easily swapped out for replacement and upgrades. Data can be backed up to prevent loss in a power failure.
» Cons: More expensive than software and integrated RAID.
Types of RAID
Nonredundant Arrays (RAID 0)
An array with RAID 0 includes two or more disk drives and provides data striping,
where data is distributed evenly across the disk drives in equal-sized sections.
However, RAID 0 arrays do not maintain redundant data, so they offer no data
protection.
Drive segment size is limited to the size of the smallest disk drive in the array. For
instance, an array with two 250 GB disk drives and two 400 GB disk drives can create
a RAID 0 drive segment of 250 GB, for a total of 1000 GB for the volume, as shown
in this figure.
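The sizing rule above can be checked directly: every drive contributes a segment no larger than the smallest drive in the array (the helper name is illustrative):

```python
# RAID 0 volume size is the smallest drive times the drive count.

def raid0_volume_gb(drive_sizes_gb):
    return min(drive_sizes_gb) * len(drive_sizes_gb)

print(raid0_volume_gb([250, 250, 400, 400]))  # 1000
```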
RAID 1 Arrays
A RAID 1 array is built from two disk drives, where one disk drive is a mirror of the
other (the same data is stored on each disk drive). Compared to independent disk
drives, RAID 1 arrays provide improved performance, with twice the read rate of a
single disk and an equal write rate. However, capacity is only 50 percent of that of the
independent disk drives.
If the RAID 1 array is built from different-sized disk drives, drive segment size is the
size of the smaller disk drive, as shown in this figure.
In this figure, the large bold numbers represent the striped data, and the smaller, non-
bold numbers represent the mirrored data stripes.
RAID 10 Arrays
A RAID 10 array is built from two or more equal-sized RAID 1 arrays. Data in a
RAID 10 array is both striped and mirrored. Mirroring provides data protection, and
striping improves performance.
Drive segment size is limited to the size of the smallest disk drive in the array. For
instance, an array with two 250 GB disk drives and two 400 GB disk drives can create
two mirrored drive segments of 250 GB, for a total of 500 GB for the array, as shown
in this figure.
RAID 5 Arrays
A RAID 5 array is built from a minimum of three disk drives, and uses data striping
and parity data to provide redundancy. Parity data provides data protection, and
striping improves performance.
RAID 5EE Arrays
A RAID 5EE array is a RAID 5 array with a distributed spare. Unlike a hot-spare, a distributed spare is striped evenly across the disk drives with the
stored data and parity data, and can’t be shared with other logical disk drives. A
distributed spare improves the speed at which the array is rebuilt following a disk
drive failure.
A RAID 5EE array protects your data and increases read and write speeds. However,
capacity is reduced by two disk drives’ worth of space, which is for parity data and
spare data.
In this example, S represents the distributed spare, P represents the distributed parity
data.
RAID 50 Arrays
A RAID 50 array is built from at least six disk drives configured as two or more
RAID 5 arrays, and stripes stored data and parity data across all disk drives in both
RAID 5 arrays. (For more information, see RAID 5 Arrays.)
The parity data provides data protection, and striping improves performance. RAID
50 arrays also provide high data transfer speeds.
Drive segment size is limited to the size of the smallest disk drive in the array. For
example, three 250 GB disk drives and three 400 GB disk drives comprise two equal-
sized RAID 5 arrays with 500 GB of stored data and 250 GB of parity data. The
RAID 50 array can therefore contain 1000 GB (2 x 500 GB) of stored data and 500
GB of parity data.
RAID 6 Arrays
RAID 6 arrays provide extra protection for your data because they can recover from
two simultaneous disk drive failures. However, the extra parity calculation slows
performance (compared to RAID 5 arrays).
RAID 6 arrays must be built from at least four disk drives. Maximum stripe size
depends on the number of disk drives in the array.
RAID 60 Arrays
Two sets of parity data provide enhanced data protection, and striping improves
performance. RAID 60 arrays also provide high data transfer speeds.
RAID Level   Redundancy   Disk Drive Usage   Read Performance   Write Performance   Built-in Hot-Spare   Minimum Disk Drives
RAID 0       No           100%               www                www                 No                   2
RAID 1       Yes          50%                ww                 ww                  No                   2
RAID 1E      Yes          50%                ww                 ww                  No                   3
RAID 10      Yes          50%                ww                 ww                  No                   4
RAID 5       Yes          67 - 94%           www                w                   No                   3
RAID 5EE     Yes          50 - 88%           www                w                   Yes                  4
RAID 50      Yes          67 - 94%           www                w                   No                   6
RAID 6       Yes          50 - 88%           ww                 w                   No                   4
RAID 60      Yes          50 - 88%           ww                 w                   No                   8
(More w’s indicate better relative performance.)
Disk drive usage, read performance, and write performance depend on the number of
drives in the array. In general, the more drives, the better the performance.
Migrating RAID Levels
As your storage space changes, you can migrate existing RAID levels to new RAID
levels that better meet your storage needs. You can perform these migrations through
the Sun StorageTek RAID Manager software. For more information, see the Sun
StorageTek RAID Manager Software User’s Guide. TABLE F-2 lists the supported
RAID level migrations.
TABLE F-2  Supported RAID Level Migrations
Existing RAID Level    Supported Migrations
RAID 5                 RAID 0, RAID 5EE, RAID 6, RAID 10
RAID 6                 RAID 5
RAID 10                RAID 0, RAID 5
RAID 2
This uses bit-level striping, i.e., instead of striping the blocks across the disks, it
stripes the bits across the disks.
In the above diagram b1, b2, b3 are bits. E1, E2, E3 are error correction codes.
You need two groups of disks. One group of disks are used to write the data, another
group is used to write the error correction codes.
This uses Hamming error correction code (ECC), and stores this information in the
redundancy disks.
When data is written to the disks, it calculates the ECC code for the data on the fly,
and stripes the data bits to the data-disks, and writes the ECC code to the
redundancy disks.
When data is read from the disks, it also reads the corresponding ECC code from the
redundancy disks, and checks whether the data is consistent. If required, it makes
appropriate corrections on the fly.
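The ECC idea behind RAID 2 can be shown in miniature with a (7,4) Hamming code, which corrects any single flipped bit. Real RAID 2 applied this across drives at the bit level; the code below is only a sketch of the error-correction mechanism, with invented helper names:

```python
# (7,4) Hamming code: 4 data bits, 3 parity bits, corrects 1-bit errors.

def encode(d):
    d1, d2, d3, d4 = d
    p1 = d1 ^ d2 ^ d4
    p2 = d1 ^ d3 ^ d4
    p3 = d2 ^ d3 ^ d4
    # Codeword positions 1..7 hold: p1 p2 d1 p3 d2 d3 d4
    return [p1, p2, d1, p3, d2, d3, d4]

def correct(c):
    c = list(c)
    s1 = c[0] ^ c[2] ^ c[4] ^ c[6]   # parity check over positions 1,3,5,7
    s2 = c[1] ^ c[2] ^ c[5] ^ c[6]   # parity check over positions 2,3,6,7
    s3 = c[3] ^ c[4] ^ c[5] ^ c[6]   # parity check over positions 4,5,6,7
    pos = s1 + 2 * s2 + 4 * s3       # syndrome = 1-based error position
    if pos:
        c[pos - 1] ^= 1              # flip the bad bit back
    return [c[2], c[4], c[5], c[6]]  # extract d1..d4

word = encode([1, 0, 1, 1])
word[5] ^= 1                         # simulate a failed bit on one "disk"
assert correct(word) == [1, 0, 1, 1]
```

The syndrome points directly at the failed position, which is why RAID 2 needs multiple redundant disks to identify the failed disk but only one to recover the data.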
This uses a lot of disks and can be configured in different disk configurations. Some
valid configurations are: 1) 10 disks for data and 4 disks for ECC, or 2) 4 disks for data
and 3 disks for ECC.
This is not used anymore. It is expensive, implementing it in a RAID
controller is complex, and ECC is redundant nowadays, as hard disks
themselves can do this.
RAID 3
This uses byte-level striping, i.e., instead of striping the blocks across the disks, it
stripes the bytes across the disks.
In the above diagram B1, B2, B3 are bytes. p1, p2, p3 are parities.
Uses multiple data disks, and a dedicated disk to store parity.
The disks have to spin in sync to get to the data.
Sequential reads and writes have good performance.
Random reads and writes have the worst performance.
This is not commonly used.
RAID 4
This uses block-level striping.
In the above diagram B1, B2, B3 are blocks. p1, p2, p3 are parities.
Uses multiple data disks, and a dedicated disk to store parity.
Minimum of 3 disks (2 disks for data and 1 for parity)
Good random reads, as the data blocks are striped.
Bad random writes, as for every write, it has to write to the single parity disk.
It is somewhat similar to RAID 3 and 5, but a little different.
This is just like RAID 3 in having the dedicated parity disk, but this stripes blocks.
This is just like RAID 5 in striping the blocks across the data disks, but this has only
one parity disk.
This is not commonly used.
RAID 6
Just like RAID 5, this does block-level striping. However, it uses dual parity.
In the above diagram A, B, C are blocks, and p1, p2, p3 are parities.
Q: What is the definition of a "RAID 5"
volume?
A: "RAID 5" refers to a "Redundant Array of Inexpensive (or Independent) Disks" that have been
established in a Level 5, or striped with parity, volume set. A RAID 5 volume is a combination of hard drives
that are configured for data to be written across three (3) or more drives.
Q: What are the differences between "hardware" and "software" RAID 5 configurations?
A: With a software-based RAID 5 volume, the hard disk drives use a standard drive controller and a software
utility provides the management of the drives in the volume. A RAID 5 volume that relies on hardware for
management will have a physical controller (commonly built into the motherboard, but it can also be a
stand-alone expansion card) that provides for the reading and writing of data across the hard drives in the
volume.
Q: If multiple drives fail in a RAID volume all at once, is the data still recoverable?
A: In many cases, the answer is yes. It usually requires that data be recovered from each failed hard drive
individually before attempting to address the rest of the volume. The quality and integrity of the data
recovered will depend on the extent of the damage incurred to each failed storage device.
Non-Redundant (RAID Level 0)
A non-redundant disk array, or RAID level 0, has the lowest cost of any RAID
organization because it does not employ redundancy at all. This scheme offers the best
write performance since it never needs to update redundant information. Surprisingly, it
does not have the best read performance. Redundancy schemes that duplicate data, such as
mirroring, can perform better on reads by selectively scheduling requests on the disk
with the shortest expected seek and rotational delays. Without redundancy, any single
disk failure will result in data loss. Non-redundant disk arrays are widely used in
super-computing environments where performance and capacity, rather than
reliability, are the primary concerns.
Sequential blocks of data are written across multiple disks in stripes, as follows:
source: Reference 2
The size of a data block, which is known as the "stripe width", varies with the
implementation, but is always at least as large as a disk's sector size. When it comes
time to read back this sequential data, all disks can be read in parallel. In a multi-
tasking operating system, there is a high probability that even non-sequential disk
accesses will keep all of the disks working in parallel.
Mirrored (RAID Level 1)
The traditional solution, called mirroring or shadowing, uses twice as many disks as a
non-redundant disk array. Whenever data is written to a disk, the same data is also
written to a redundant disk, so that there are always two copies of the information.
When data is read, it can be retrieved from the disk with the shorter queuing, seek and
rotational delays. If a disk fails, the other copy is used to service requests. Mirroring is
frequently used in database applications where availability and transaction time are
more important than storage efficiency.
source: Reference 2
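The read-scheduling idea above can be sketched simply. As an assumption for illustration, queue depth stands in for the expected positioning delay of each replica:

```python
import random

def pick_replica(queue_depths: list[int]) -> int:
    """For a mirrored pair (or any set of replicas), send the read to the
    copy with the shortest request queue, a proxy for expected delay."""
    best = min(queue_depths)
    # Break ties randomly so read load spreads evenly across the mirrors.
    return random.choice([i for i, q in enumerate(queue_depths) if q == best])
```

A real controller would also weigh seek distance and rotational position, but the principle is the same: reads can go to either copy, writes must go to both.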
Memory-Style ECC (RAID Level 2)
Memory systems have provided recovery from failed components with much less cost
than mirroring by using Hamming codes. Hamming codes contain parity for distinct
overlapping subsets of components. In one version of this scheme, four data disks
require three redundant disks, one less than mirroring. Since the number of redundant disks is
proportional to the log of the total number of the disks on the system, storage
efficiency increases as the number of data disks increases.
If a single component fails, several of the parity components will have inconsistent
values, and the failed component is the one held in common by each incorrect subset.
The lost information is recovered by reading the other components in a subset,
including the parity component, and setting the missing bit to 0 or 1 to create proper
parity value for that subset. Thus, multiple redundant disks are needed to identify the
failed disk, but only one is needed to recover the lost information.
If you are unfamiliar with parity, you can think of the redundant disk as holding the sum of
all the data on the other disks. When a disk fails, you can subtract all the data on the good
disks from the parity disk; the remaining information must be the missing
information. Parity is simply this sum modulo 2.
A RAID 2 system would normally have as many data disks as the word size of the
computer, typically 32. In addition, RAID 2 requires the use of extra disks to store an
error-correcting code for redundancy. With 32 data disks, a RAID 2 system would
require 7 additional disks for a Hamming-code ECC. Such an array of 39 disks was
the subject of a U.S. patent granted to Unisys Corporation in 1988, but no commercial
product was ever released.
For a number of reasons, including the fact that modern disk drives contain their own
internal ECC, RAID 2 is not a practical disk array scheme.
source: Reference 2
Bit-Interleaved Parity (RAID Level 3)
One can improve upon memory-style ECC disk arrays by noting that, unlike memory
component failures, disk controllers can easily identify which disk has failed. Thus,
one can use a single parity disk rather than a set of parity disks to recover lost information.
In a bit-interleaved, parity disk array, data is conceptually interleaved bit-wise over
the data disks, and a single parity disk is added to tolerate any single disk failure. Each
read request accesses all data disks and each write request accesses all data disks and
the parity disk. Thus, only one request can be serviced at a time. Because the parity
disk contains only parity and no data, the parity disk cannot participate in reads,
resulting in slightly lower read performance than for redundancy schemes that
distribute the parity and data over all disks. Bit-interleaved, parity disk arrays are
frequently used in applications that require high bandwidth but not high I/O rates.
They are also simpler to implement than RAID levels 4, 5, and 6.
Here, the parity disk is written in the same way as the parity bit in normal Random
Access Memory (RAM), where it is the Exclusive Or of the 8, 16 or 32 data bits. In
RAM, parity is used to detect single-bit data errors, but it cannot correct them because
there is no information available to determine which bit is incorrect. With disk drives,
however, we rely on the disk controller to report a data read error. Knowing which
disk's data is missing, we can reconstruct it as the Exclusive Or (XOR) of all
remaining data disks plus the parity disk.
source: Reference 2
As a simple example, suppose we have four data disks and one parity disk. The sample
bits are:

    Disk 0   Disk 1   Disk 2   Disk 3
      1        1        1        0
The parity bit is the XOR of these four data bits, which can be calculated by adding
them up and writing a 0 if the sum is even and a 1 if it is odd. Here the sum of Disk 0
through Disk 3 is "3", so the parity is 1. Now if we attempt to read back this data, and
find that Disk 2 gives a read error, we can reconstruct Disk 2 as the XOR of all the
other disks, including the parity. In the example, the sum of Disk 0, 1, 3 and Parity is
"3", so the data on Disk 2 must be 1.
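The arithmetic in this example can be checked with a short script, using one set of bits consistent with the sums quoted above:

```python
# Four data disks holding one bit each; values consistent with the example.
data = {0: 1, 1: 1, 2: 1, 3: 0}

parity = 0
for bit in data.values():
    parity ^= bit          # XOR is addition modulo 2; the sum is 3 (odd), so parity is 1

# Disk 2 fails. XOR the surviving data disks with the parity disk to rebuild it.
rebuilt = parity
for disk, bit in data.items():
    if disk != 2:
        rebuilt ^= bit

assert rebuilt == data[2]  # the reconstructed bit matches what Disk 2 held
```

The same XOR works bytewise over whole blocks, which is how real parity arrays reconstruct a failed member.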
Block-Interleaved Parity (RAID Level 4)
The block-interleaved, parity disk array is similar to the bit-interleaved, parity disk
array except that data is interleaved across disks of arbitrary size rather than in bits.
The size of these blocks is called the striping unit. Read requests smaller than the
striping unit access only a single data disk. Write requests must update the requested
data blocks and must also compute and update the parity block. For large writes that
touch blocks on all disks, parity is easily computed by exclusive-or'ing the new data
for each disk. For small write requests that update only one data disk, parity is
computed by noting how the new data differs from the old data and applying those
differences to the parity block. Small write requests thus require four disk I/Os: one to
write the new data, two to read the old data and old parity for computing the new
parity, and one to write the new parity. This is referred to as a read-modify-write
procedure. Because a block-interleaved, parity disk array has only one parity disk,
which must be updated on all write operations, the parity disk can easily become a
bottleneck. Because of this limitation, the block-interleaved distributed parity disk
array is universally preferred over the block-interleaved, parity disk array.
source: Reference 2
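The read-modify-write shortcut described above follows from XOR being its own inverse: the new parity is the old parity with the old data's contribution removed and the new data's contribution added. A minimal bytewise sketch, assuming whole blocks are passed in as bytes:

```python
def updated_parity(old_parity: bytes, old_data: bytes, new_data: bytes) -> bytes:
    """Small-write parity update: new parity = old parity XOR old data XOR
    new data, computed bytewise across the block."""
    return bytes(p ^ o ^ n for p, o, n in zip(old_parity, old_data, new_data))

# Sanity check on a 3-data-disk stripe: recomputing parity from scratch after
# the write agrees with the read-modify-write shortcut.
d0, d1, d2 = b"\x0f", b"\x33", b"\x55"
parity = bytes(a ^ b ^ c for a, b, c in zip(d0, d1, d2))
new_d1 = b"\xaa"
full_recompute = bytes(a ^ b ^ c for a, b, c in zip(d0, new_d1, d2))
assert updated_parity(parity, d1, new_d1) == full_recompute
```

This is why a small write costs four I/Os: read old data, read old parity, write new data, write new parity; the other data disks are never touched.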
P+Q Redundancy (RAID Level 6)
One such scheme, called P+Q redundancy, uses Reed-Solomon codes to protect
against up to two disk failures using the bare minimum of two redundant disks.
The P+Q redundant disk arrays are structurally very similar to the block-interleaved
distributed-parity disk arrays and operate in much the same manner. In particular,
P+Q redundant disk arrays also perform small write operations using a read-modify-
write procedure, except that instead of four disk accesses per write request, P+Q
redundant disk arrays require six disk accesses due to the need to update both the `P'
and `Q' information.
Obviously, RAID 10 uses more disk space to provide redundant data than RAID 5.
However, it also provides a performance advantage by reading from all disks in
parallel while eliminating the write penalty of RAID 5. In addition, RAID 10 gives
better performance than RAID 5 while a failed drive remains unreplaced. Under
RAID 5, each attempted read of the failed drive can be performed only by reading all
of the other disks. On RAID 10, a failed disk can be recovered by a single read of its
mirrored pair.
source: Reference 2
Tool to calculate storage efficiency given the number of disks and the RAID
level (source: Reference 3)
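A rough version of such a calculator is easy to write. The efficiencies below follow the standard formulas (half for mirroring, one parity disk's worth for RAID 3/4/5, two for RAID 6), not any particular vendor's tool:

```python
def storage_efficiency(level: int, n_disks: int) -> float:
    """Usable fraction of raw capacity for a given RAID level and disk count."""
    if level == 0:
        return 1.0                           # striping only, no redundancy
    if level in (1, 10):
        return 0.5                           # every block is stored twice
    if level in (3, 4, 5):
        return (n_disks - 1) / n_disks       # one disk's worth of parity
    if level == 6:
        return (n_disks - 2) / n_disks       # two disks' worth (P and Q)
    raise ValueError(f"unsupported RAID level: {level}")

print(storage_efficiency(5, 5))   # 0.8: four of five disks hold user data
```

Multiply the result by the raw capacity of the array to get usable capacity.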
It is worth remembering an important point about RAID systems. Even when you use
a redundancy scheme like mirroring or RAID 5 or RAID 10, you must still do regular
tape backups of your system. There are several reasons for insisting on this, among
them:
RAID does not protect you from multiple disk failures. While one disk is off
line for any reason, your disk array is not fully redundant.
Regular tape backups allow you to recover from data loss that is not related to a
disk failure. This includes human errors, hardware errors, and software errors.
There are many different ways to measure these parameters; for example, performance could
be measured as I/Os per second per dollar, bytes per second, or response time. We
could also compare systems at the same cost, the same total user capacity, the same
performance or the same reliability. The method used largely depends on the
application and the reason to compare. For example, in transaction processing
applications the primary base for comparison would be I/Os per second per dollar
while in scientific applications we would be more interested in bytes per second per
dollar. In some heterogeneous systems like file servers both I/O per second and bytes
per second may be important. Sometimes it is important to consider reliability as the
base for comparison.
Taking a closer look at the RAID levels we observe that most of the levels are similar
to each other. RAID level 1 and RAID level 3 disk arrays can each be viewed as a subclass
of RAID level 5 disk arrays. Also RAID level 2 and RAID level 4 disk arrays are
generally found to be inferior to RAID level 5 disk arrays. Hence the problem of
selecting among RAID levels 1 through 5 is a subset of the more general problem of
choosing an appropriate parity group size and striping unit for RAID level 5 disk
arrays.
Some Comparisons
Given below is a table that compares the throughput of various redundancy schemes
for four types of I/O requests. The I/O requests are basically reads and writes which
are divided into small (reads & writes) and large ones. Remembering the fact that our
data has been spread over multiple disks (data striping), a small I/O request refers to a
request of one striping unit, while a large I/O request refers to a request of one full
stripe (one stripe unit from each disk in an error-correction group).
The table above tabulates the maximum throughput per dollar, relative to RAID level 0, for
RAID levels 0, 1, 3, 5, and 6. For practical purposes we consider RAID levels 2 and 4
inferior to RAID level 5 disk arrays, so we don't show the comparisons. The cost of a
system is directly proportional to the number of disks it uses in the disk array. Thus
the table shows us that given equivalent cost RAID level 0 and RAID level 1 systems,
the RAID level 1 system can sustain half the number of small writes per second that a
RAID level 0 system can sustain. Equivalently the cost of small writes is twice as
expensive in a RAID level 1 system as in a RAID level 0 system.
The table also shows the storage efficiency of each RAID level. The storage efficiency is
approximately the inverse of the cost of each unit of user capacity relative to a RAID level 0
system. The storage efficiency equals the performance/cost metric for large
writes.
source: Reference 1
The figures above graph the performance/cost metrics from the table above for RAID
levels 1, 3, 5 and 6 over a range of parity group sizes. The performance/cost of RAID
level 1 systems is equivalent to the performance/cost of RAID level 5 systems when
the parity group size is equal to 2. The performance/cost of RAID level 3 systems is
always less than or equal to the performance/cost of RAID level 5 systems. This is
expected given that a RAID level 3 system is a subclass of RAID level 5 systems
derived by restricting the striping unit size such that all requests access exactly a
parity stripe of data. Since the configuration of RAID level 5 systems is not subject to
such a restriction, the performance/cost of RAID level 5 systems can never be less
than that of an equivalent RAID level 3 system. Of course such generalizations are
specific to the models of disk arrays used in the above experiments. In reality, a
specific implementation of a RAID level 3 system can have better performance/cost
than a specific implementation of a RAID level 5 system.
The question of which RAID level to use is better expressed as more general
configuration questions concerning the size of the parity group and striping unit. For a
parity group size of 2, mirroring is desirable, while for a very small striping unit
RAID level 3 is well suited.
The figure below plots the performance/cost metrics from the table above for RAID
levels 3, 5 & 6.
Reliability of any I/O system has become as important as its performance and cost.
This part of the tutorial discusses the reliability of disk arrays and the factors that affect it.
Redundancy in disk arrays is motivated by the need to tolerate disk failures. Two key
factors, MTTF (Mean Time To Failure) and MTTR (Mean Time To Repair), are of
primary concern in estimating the reliability of any disk array. Following are some formulae
for the mean time to failure of a disk array:
RAID level 5:

               MTTF(disk)^2
    MTTF = ------------------------
           N * (G-1) * MTTR(disk)

Disk array with two redundant disks per parity group (e.g., P+Q redundancy):

                    MTTF(disk)^3
    MTTF = ----------------------------------
           N * (G-1) * (G-2) * MTTR(disk)^2
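Plugging numbers into these formulas is straightforward. A sketch, where the example figures are illustrative rather than measured values:

```python
def mttdl_raid5(mttf_disk: float, mttr_disk: float, n: int, g: int) -> float:
    """Mean time to data loss for RAID 5: a second disk in the same parity
    group must fail within the first disk's repair window."""
    return mttf_disk**2 / (n * (g - 1) * mttr_disk)

def mttdl_pq(mttf_disk: float, mttr_disk: float, n: int, g: int) -> float:
    """P+Q redundancy: three overlapping failures are needed to lose data."""
    return mttf_disk**3 / (n * (g - 1) * (g - 2) * mttr_disk**2)

# Illustrative figures: 100 disks in groups of 10, 200,000-hour disk MTTF,
# 24-hour repair time. Units of the result are hours.
print(mttdl_raid5(200_000.0, 24.0, 100, 10))
print(mttdl_pq(200_000.0, 24.0, 100, 10))
```

N is the total number of disks and G the parity group size, as in the formulas above.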
Three other factors also affect the reliability of disk arrays:
System crashes
Uncorrectable bit-errors
Correlated disk failures
System Crashes
System crash refers to any event such as a power failure, operator error, hardware
breakdown, or software crash that can interrupt an I/O operation to a disk array.
Such crashes can interrupt write operations, resulting in states where the data is
updated and the parity is not updated or vice versa. In either case, parity is
inconsistent and cannot be used in the event of a disk failure. Techniques such
as redundant hardware and power supplies can be applied to make such crashes less
frequent.
System crashes can cause parity inconsistencies in both bit-interleaved and block-
interleaved disk arrays, but the problem is of practical concern only in block-
interleaved disk arrays.
For reliability purposes, system crashes in block-interleaved disk arrays are similar
to disk failures in that they may result in the loss of the correct parity for stripes
that were modified during the crash.
Uncorrectable bit-errors
Most uncorrectable bit-errors are generated because data is incorrectly written or
gradually damaged as the magnetic media ages. These errors are detected only
when we attempt to read the data.
Our interpretation of uncorrectable bit error rates is that they represent the rate at
which errors are detected during reads from the disk during the normal
operation of the disk drive.
One approach that can be used with or without redundancy is to try to protect
against bit errors by predicting when a disk is about to fail. VAXsimPLUS, a
product from DEC, monitors the warnings issued by disks and notifies an operator
when it feels the disk is about to fail.
Correlated Disk Failures
Disk failures are not always independent. For example, an accident might sharply increase the failure rate for all disks in a disk
array for a short period of time. In general, power surges, power failures and simply
switching the disks on and off can place stress on the electrical components of all
affected disks. Disks also share common support hardware; when this hardware fails,
it can lead to multiple, simultaneous disk failures.
Disks are generally more likely to fail either very early or very late in their lifetimes.
Early failures are frequently caused by transient defects which may not have been
detected during the manufacturer's burn-in process.
Late failures occur when a disk wears out. Correlated disk failures greatly reduce the
reliability of disk arrays by making it much more likely that an initial disk failure will
be closely followed by additional disk failures before the failed disk can be
reconstructed.
Mean-Time-To-Data-Loss(MTTDL)
Following are some formulae to calculate the mean time to data loss (MTTDL). In a
block-interleaved parity-protected disk array, data loss is possible through the
following three common ways:
double disk failure
system crash followed by a disk failure
disk failure followed by an uncorrectable bit error during reconstruction
The above three failure modes are the hardest failure combinations, in that we
currently don't have any techniques to protect against them without sacrificing
performance.
RAID Level 5

                                         MTTF(disk)^2
    Double disk failure:          ------------------------
                                  N * (G-1) * MTTR(disk)

                                  MTTF(system) * MTTF(disk)
    System crash + disk failure:  -------------------------
                                       N * MTTR(system)

                                          MTTF(disk)
    Disk failure + bit error:     -------------------------
                                  N * (1 - p(disk)^(G-1))

    Software RAID:                harmonic sum of the three terms above
    Hardware RAID:                harmonic sum of the above, excluding
                                  system crash + disk failure

Failure Characteristics for RAID Level 5 Disk Arrays (source: Reference 1)

p(disk) = the probability of reading all sectors on a disk (derived from disk size,
sector size, and BER)
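Combining the modes is a harmonic sum because the rates of independent failure modes add, and MTTDL is the reciprocal of the total rate. A sketch, where parameter values would come from drive and system specifications:

```python
def harmonic_sum(mttdls):
    """Independent failure modes: rates (1/MTTDL) add, so the combined
    MTTDL is the harmonic sum of the individual MTTDLs."""
    return 1.0 / sum(1.0 / m for m in mttdls)

def mttdl_software_raid5(mttf_disk, mttr_disk, mttf_sys, mttr_sys,
                         p_disk, n, g):
    """Software RAID 5 MTTDL: all three failure modes contribute."""
    double_disk = mttf_disk**2 / (n * (g - 1) * mttr_disk)
    crash_then_disk = (mttf_sys * mttf_disk) / (n * mttr_sys)
    disk_then_bit_error = mttf_disk / (n * (1 - p_disk ** (g - 1)))
    return harmonic_sum([double_disk, crash_then_disk, disk_then_bit_error])

# Hardware RAID with NVRAM would drop crash_then_disk from the harmonic sum.
```

The combined MTTDL is always smaller than the MTTDL of any single mode, which is why each additional failure mode matters.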
In the event of a disk failure, disk access continues normally and the failure is
transparent to the host system.
Logical Drive
A logical drive is an array of independent physical drives. Increased availability,
capacity, and performance are achieved by creating logical drives. The logical drive
appears to the host the same as a local hard disk drive does.
FIGURE A-1 Logical Drive Including Multiple Physical Drives
Logical Volume
A logical volume is composed of two or more logical drives. The logical volume can
be divided into a maximum of 32 partitions for Fibre Channel. During operation, the
host sees a nonpartitioned logical volume or a partition of a logical volume as one
single physical drive.
Channels
You can connect up to 15 devices (excluding the controller itself) to a SCSI channel
when the Wide function is enabled (16-bit SCSI). You can connect up to 125 devices
to an FC channel in loop mode. Each device has a unique ID that identifies the device
on the SCSI bus or FC loop.
A logical drive consists of a group of SCSI drives, Fibre Channel drives, or SATA
drives. Physical drives in one logical drive do not have to come from the same SCSI
channel. Also, each logical drive can be configured for a different RAID level.
A drive can be assigned as the local spare drive to one specified logical drive, or as a
global spare drive. A spare is not available for logical drives that have no data
redundancy (RAID 0).
You can divide a logical drive or logical volume into several partitions or
use the entire logical drive as a single partition.
FIGURE A-3 Partitions in Logical Drive Configurations
Each partition is mapped to LUNs under host SCSI IDs or IDs on host channels. Each
SCSI ID/LUN acts as one individual hard drive to the host computer.
FIGURE A-4 Mapping Partitions to Host ID/LUNs
FIGURE A-5 Mapping Partitions to LUNs Under an ID
RAID Levels
There are several ways to implement a RAID array, using a combination of mirroring,
striping, duplexing, and parity technologies. These various techniques are referred to
as RAID levels. Each level offers a mix of performance, reliability, and cost. Each
level uses a distinct algorithm to implement fault tolerance.
There are several RAID level choices: RAID 0, 1, 3, 5, 1+0, 3+0 (30), and 5+0 (50).
RAID levels 1, 3, and 5 are the most commonly used.
For RAID 3+0 (30) and 5+0 (50), capacity refers to the total number of physical
drives (N) minus one physical drive for each logical drive in the volume. For
example, if the total number of disk drives in the logical drive is twenty 36-Gbyte
drives and the total number of logical drives is 2, the disk space available for storage
is equal to 18 disk drives: 18 x 36 Gbyte (648 Gbyte).
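The capacity rule in the example reduces to simple arithmetic. A sketch, with drive sizes in whatever unit the array uses:

```python
def raid_x0_capacity(total_drives: int, logical_drives: int,
                     drive_size: float) -> float:
    """Usable capacity of a RAID 3+0 or 5+0 volume: one drive's worth of
    parity is consumed by each underlying logical drive."""
    return (total_drives - logical_drives) * drive_size

print(raid_x0_capacity(20, 2, 36))   # 648, matching the example above
```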
RAID 0
RAID 0 implements block striping, where data is broken into logical blocks and is
striped across several drives. Unlike other RAID levels, there is no facility for
redundancy. In the event of a disk failure, data is lost.
In block striping, the total disk capacity is equivalent to the sum of the capacities of
all drives in the array. This combination of drives appears to the system as a single
logical drive.
RAID 1
RAID 1 implements disk mirroring, where a copy of the same data is recorded onto
two drives. By keeping two copies of data on separate disks, data is protected against
a disk failure. If, at any time, a disk in the RAID 1 array fails, the remaining good disk
(copy) can provide all of the data needed, thus preventing downtime.
In disk mirroring, the total usable capacity is equivalent to the capacity of one drive in
the RAID 1 array. Thus, combining two 1-Gbyte drives, for example, creates a single
logical drive with a total usable capacity of 1 Gbyte. This combination of drives
appears to the system as a single logical drive.
Note - RAID 1 does not allow expansion. RAID levels 3 and 5 permit expansion by
adding drives to an existing array.
FIGURE A-7 RAID 1 Configuration
In addition to the data protection that RAID 1 provides, this RAID level also improves
performance. In cases where multiple concurrent I/O is occurring, that I/O can be
distributed between disk copies, thus reducing total effective data access time.
RAID 1+0
RAID 1+0 combines RAID 0 and RAID 1 to offer mirroring and disk striping. RAID
1+0 is a time-saving feature that enables you to configure a large number of
disks for mirroring in one step. It is not a standard RAID level option that you can
select; it does not appear in the list of RAID level options supported by the controller.
If four or more disk drives are chosen for a RAID 1 logical drive, RAID 1+0 is
performed automatically.
FIGURE A-8 RAID 1+0 Configuration
RAID 3
RAID 3 implements block striping with dedicated parity. This RAID level breaks
data into logical blocks, the size of a disk block, and then stripes these blocks across
several drives. One drive is dedicated to parity. In the event that a disk fails, the
original data can be reconstructed using the parity information and the information on
the remaining disks.
In RAID 3, the total disk capacity is equivalent to the sum of the capacities of all
drives in the combination, excluding the parity drive. Thus, combining four 1-Gbyte
drives, for example, creates a single logical drive with a total usable capacity of 3
Gbyte. This combination appears to the system as a single logical drive.
RAID 3 provides increased data transfer rates when data is being read in small chunks
or sequentially. However, in write operations that do not span every drive,
performance is reduced because the information stored in the parity drive needs to be
recalculated and rewritten every time new data is written, limiting simultaneous I/O.
FIGURE A-9 RAID 3 Configuration
RAID 5
RAID 5 implements multiple-block striping with distributed parity. This RAID level
offers redundancy with the parity information distributed across all disks in the array.
Data and its parity are never stored on the same disk. In the event that a disk fails,
original data can be reconstructed using the parity information and the information on
the remaining disks.
FIGURE A-10 RAID 5 Configuration
RAID 5 offers increased data transfer rates when data is accessed in large chunks or
randomly, and reduced data access time during many simultaneous I/O cycles.
TABLE A-2 Advanced RAID Levels

RAID Level      Description
RAID 3+0 (30)   RAID 3 logical drives that have been joined together using the
                array’s built-in volume manager.
RAID 5+0 (50)   RAID 5 logical drives that have been joined together using the
                array’s volume manager.
The local spare drive always has higher priority than the global spare drive. Therefore,
if a drive fails and both types of spares of sufficient size are available at the same
time, the local spare is used.
If there is a failed drive in the RAID 5 logical drive, replace the failed drive with a
new drive to keep the logical drive working. To identify a failed drive, refer to the Sun
StorEdge 3000 Family RAID Firmware User’s Guide for your array.
Caution - If, when trying to remove a failed drive, you mistakenly remove the
wrong drive, you can no longer access the logical drive because you have
incorrectly failed another drive.
A local spare drive is a standby drive assigned to serve one specified logical drive.
When a member drive of this specified logical drive fails, the local spare drive
becomes a member drive and automatically starts to rebuild.
A local spare drive always has higher priority than a global spare drive; that is, if a
drive fails and there is a local spare and a global spare drive available, the local spare
drive is used.
A global spare drive is available for all logical drives rather than serving only one
logical drive (see FIGURE A-12). When a member drive from any of the logical
drives fails, the global spare drive joins that logical drive and automatically starts to
rebuild.
FIGURE A-12 Global Spare
In FIGURE A-13, it is not possible for the 4-Gbyte global spare drive to join logical
drive 0 because of its insufficient capacity. The 9-Gbyte local spare drive aids logical
drive 0 once a drive in this logical drive fails. If the failed drive is in logical drive 1 or
2, the 4-Gbyte global spare drive immediately aids the failed drive.
source: Reference 1