Beruflich Dokumente
Kultur Dokumente
2002-10-12
RAID Systems
Page 2 of 10
Introduction
Overall description
The purpo se o f this do cument is to give an over view of what RAID is, why we need it and
how it works.
The r esearch was per for med during the co urse “Systems Engineering” at Blekinge I nstitute of
Technology in Sweden.
History
1 Back in the middle eig hties SLED:s (Single Large Expensive Disk) were the mo st popular
media for storing data. At that time, the disk dr ives did not by far have the storing capacity or
the performance that disks have today. To be able to provide a large amo unt of data one had
to have lots of disk drives, which all had to be mounted in a single file tree 2 . This was an
extr emely messy and inconvenient way of handling data. Disk s these days were also very
expensive, hence the meaning of SLED. Another big pro blem was, and still is, lo ss of data
because of disk failur e. A so lution for this was mu ch needed.
3To take care o f these problems IBM co -sponsored Berkeley University of California to build
a disk arr ay subsystem to which IBM had received a patent in 1978 4 . In 1987 Randy Katz and
Dave Patterson, both working at Berkeley University of Califor nia, had succeeded. They
called the solution RAID 5 . RAID stand for “Redundant Array of Inexpensive Disk s”,
although so me peop le chose to change “… Inexpensive …”, which is the o riginal word, to
“… Independent …”. Randy and Dave clustered mu ltiple smaller and less expensive disks
into an array. By do ing this, all disks appear to the rest of the world as if there was just one
sing le large d isk. The result was compared to SLED:s according to cost ver sus performance.
It turned out to that RAID had the same o r super io r performance as SLED, but with a
theo retical Meantime Before Data Loss (MTBDL) that was reduced to an acceptable level.
6The search for a way o f decreasing the MTBDL now started. There was need for a way to
prevent single d isk dr ive failures from causing data lo ss within the array of disk s. The result
was the six RAID levels 0 through 5. The d ifferent RAID levels determine how data is
distributed across the drives in the ar ray and the level of redundancy. A R AID system does
not prevent hard dr ive failure. Har d drives always fail. RAI D level 1 through 5 offer
protectio n against the data lo ss caused by dr ive failure. RAID level 0 is a non-redundant
method which only increases transfer speed.
7IBM shipped RAID 1 in 1990, and it was not until then RAI D had its breakthrough. The year
after (1991) IBM launched RAID 3 and 1992 they released R AID 5.
1
De Re IT Corp, White Paper: RAID
2 Rockwood,
Systems. RAID Theory: An Overview, 2002.
3
De Re IT Corp, White Paper: RAID
4 Storage
Systems. search, RAID man ufacturers on STORAGE search
.com.IBM Hard disk drives | A history of IBM "firsts" in storage technology | 1973.
IBM,
5 Berkeley, RAID,
7 Systems.
Storage search, RAID man ufacturers on STORAGE search
IBM,
.com.IBM Hard disk drives | A history of IBM "firsts" in storage technology | Timelin e.
Page 3 of 10
Issues
8 So, basically, what RAID is all about, is to connect multiple disk drives and make them
appear as if there was only one d isk. Then better perfor mance, storage capability and
reliability are achieved, than can be do ne by u sing just a single disk drive.
9Now days there are three ways o f acco mplish a RAID system. It can be either hardware
driven, softwar e driven or a combinatio n of both. The performance o f hard ware driven RAID
systems are generally higher than softwar e driven o nes.
10One issue with RAID systems is that all d isk s mu st have the same sto ring capacity to work
properly. I f they have not, the storage capacity o f all disks will be restricted to have the same
capacity as the smallest disk drive. If for we for instance have 3 d isks with a capacity of 20
GB respectively, and one disk with a storing capability o f 10 GB, the total stor ing capacity
will be 40 GB (4x10 GB), since all d isks will be restricted to a maximum sto rage of 10 GB.
Problem description
Problem
• Why RAID?
• Ho w does RAID work?
• Are there any alternatives to RAID?
There are pro bably more questions to consider, but in this research these are the ones we have
chosen.
Systems.
Page 4 of 10
Research
Why RAID?
RAID has for a lo ng time been so mething that yo u only find in large ser ver systems, but lately
cheaper RAI D controller card have made it possible to get a RAID system even for small
servers and home computers. These will of course not have all the features, which the mor e
expensive ones have.
Different levels of RAID have different advantages and disadvantages. Ther efore o ne must
make an analysis of the workload before deciding what to bu y. The choice also much depend
on the quality attributes needed. Some examples of quality attributes one can get by using a
RAID system is data redundancy, fault tolerance, increased capacity and increased
performance11 .
RAID works in three different ways to provide the quality attributes mentio ned above. These
ways are mir roring, striping and par ity, of which each can be used either separately or mixed
with one or more of the others. This is why RAID is divided into different levels.
Mirroring Di sk On e Di sk T wo
This is called mirror ing and you normally get one MB for
every two MB of physical disk space. Yo u will always Da t a B Mi rror B
have the second d isk to read fro m if the other disk fails.
Dat a C Mir r or C
The d isadvantages of this method ar e waste of disk space
and that you will not get higher write perfor mance. Yo u
Dat a D Mir r or D
can ho wever get hig her read performance because r eads
can o ccur simu ltaneously on every dr ive12 .
Striping
Di sk On e Di sk T wo
11
RAID: An In-Depth Guide To RAID Technolog y, p
12 Adaptec,
8. RAID White Papers and Ben chmark Papers, What Is RAID?
Page 5 of 10
cache or other temporary stored data is recommended to store using only str iping.
RAID 0
Raid 0 is the simplest RAID level. It uses striping o nly to get higher perfo rmance. Therefore it
is not really a RAID system, because it has no data redundancy. Because of that, it is not
reco mmended to store important data with RAID 0. If one disk is lost, so will all data stored
on that disk be as well.
RAID 1
RAID 1 is the easiest way to get data redundancy. It uses mirroring to store two copies of all
data on two separate disks. Many computer systems need high availability without the
requirement of more perfo rmance. In that case RAID 1 can be a goo d solution. Neither RAI D
0 nor RAI D 1 uses much co mputing power. The RAID contro llers that handle these two
RAID levels are therefore cheaper than controllers for hig her RAID levels.
RAID 2
RAID 2 use striping and a special k ind of redundancy technique, which is not described
above. The technique used is bit level striping with Hamming code ECC. Separate disks are
used for data storage and ECC. The Hamming codes are calculated and written to the ECC
disk s at the same time as data is wr itten to its specific disk. The code is calculated again when
data is read fro m the d isks. This is do ne to check that it has no t been changed since the time it
was written. The co mplicated and expensive RAI D controller hardware needed fo r this level
of RAID, and the min imum number of hard drives requ ired, is the reason this level is no t used
today13 .
RAID 3
This level uses data striping with par ity. The parity is stored on ded icated disks. Parity makes
it po ssible to, for example, store data on two disks with striping and have a third disk to store
13 4RAID, What is a
RAID.
Page 6 of 10
the parity. The per fo rmance is good but when data is written, the parity data will slow down
the writing process.
RAID 4
RAID 4 is very similar to RAID 3. The difference is that it use larger blo cks o f data than
RAID 3 does. This makes it possible to change the data stripe size to su it the applications
needs. In the same way as with RAID 3, however, the extra parity disk will have a negative
impact on performance.
RAID 5
This level is similar to RAID 4 in such a way that it uses data striping with larger blocks of
data. The difference is that it tries to remove the bottleneck that RAID 4 has. That is do ne by
sk ipp ing the dedicated parity disks. The par ity is instead distributed amongst all disks. When
the bottleneck is r emoved still one problem remains. It still needs to calculate the parity, and
that is still more costly than for example mirroring. For fault tolerance and per fo rmance
reasons, the data and parity are never stored o n the same d isk. This means that if one disk
goes down, that data can always be reco vered by using data on other disks to calculate what
data disappeared. This level is one of the most used levels today, becau se many think it is the
best co mbination of the quality attributes such as performance and fault tolerance.
15At the Berkeley University o f Califor nia they perfo rm researches about alter native so lutio ns.
Such a solutio n is RADD, or Redundant Array of Distributed Disks. RADD: s can support
redundant copies of data across a computer network at the same space cost as RAID: s do for
local data. Such copies increase availability in the presence of both temporary and permanent
failures (disasters) of single site computer systems as well as disk failures. As such, RADD: s
shou ld be co nsider ed as a po ssible alternative to trad itional multiple cop y techniqu es.
14
Stonebraker, Schloss, DISTRIBUTED RAID -- A NEW MULTIPLE COPY ALGORITHM,
ftp.cs.berkeley.edu:8000/postgres/papers/ERL-M89-56.pdf.
http://s2k-
15
Stonebraker, DATA BASE RESEARCH AT BERKELEY, http://s2k-
papers/sigmodr90-ucb.pdf.
ftp.cs.berkeley.edu:8000/postgres/
Page 7 of 10
Furthermor e, RADD: s are also candidate alternatives to hig h availability schemes such as hot
standbys.
16This is a new techno logy, and it is still under co nstructio n. There are some issues that must
be handled to make RADD useful. A good thing is that all RAID algorithms wo rk directly in
a distributed environment; however distribution generates its own sig nificant problems, which
must be handled. Other things to consider are the fact that algorithms must be able to handle
unequal amount of storage at each site, it mu st be possible to mix different group sizes in the
same network and final, it must be po ssible to subtract disk storage from RADD witho ut the
need o f a global reconstruction.
Ther e are other technolo gies that may be alternatives to RAID. These are for instance: 2D-
17
16
Stonebraker, DATA BASE RESEARCH AT BERKELEY, http://s2k-
papers/sigmodr90-ucb.pdf.
ftp.cs.berkeley.edu:8000/postgres/
17
Stonebraker, DATA BASE RESEARCH AT BERKELEY, http://s2k-
papers/sigmodr90-ucb.pdf.
ftp.cs.berkeley.edu:8000/postgres/
Page 8 of 10
Conclusion
There are some disadvantages about RAID. They are ho wever few compared to the
advantages, like the high availability and per fo rmance. These are things that have become
more important, especially since companies to day use Internet for mak ing business o n a
global market. One must no t forget that techno logy and techniques are developed and fine
tuned every day. This means that alternative technolog ies, like R ADD, may be a co mplement
to RAID o r even rep lace it.
The idea behind RAID is rather simp le. It is important to understand it, at least a high level.
This is important knowledge have to be able to make a goo d decision abo ut what RAID level
to use. The choice of level may have great impact o f how well the system will work.
No wadays RAID contro llers have beco me cheaper. Therefore RAID has beco me available to
a bigger market. It is even possible to find it in ordinary workstations and small company
servers.
Page 9 of 10
Source
Internet
4RAID, What is a R AID, http://www.4raid.co m/raid levels. htm.
IBM, IBM Hard disk drives | A history of IBM "fir sts" in sto rage technolo gy | 1973,
http://www.storage.ibm. com/hdd/firsts/n1973. htm.
IBM, IBM Hard disk drives | A history of IBM "fir sts" in sto rage technolog y | Timeline,
http://www.storage.ibm. com/hdd/firsts/timeline.htm.
Rockwood, Ben, RAID Theory: An Overview, The Cuddletech Ver itas Volume Manager
Series CDLTK Cuddletech, http://www.cuddletech.com/veritas/raidtheory/x26.html, April
08th 2002.
Written
Stonebraker, Michael, DATA BASE RESEARCH AT BERKELEY, Department of Electrical
Engineering and Computer Science University of Califo rnia Berkeley, CA 94720, http://s2k-
ftp.cs.berkeley.edu:8000/postgres/ papers/sigmodr 90-ucb.pdf.
Page 10 of 10