
Seminar Report on RAID (Redundant Array of Independent Disks) by ADEEB KHAN


Department of Computer Science & Engineering. Institute Of Engineering & Rural Technology, Allahabad, INDIA March, 2011


Under the guidance of Mr. ARPIT


Abstract

RAID, an acronym for Redundant Array of Independent Disks (changed from the original term, Redundant Array of Inexpensive Disks), is a technology that provides increased storage capacity, performance and reliability through redundancy. This is achieved by combining multiple disk drives into a logical unit, where data is distributed across the drives in one of several ways called "RAID levels". The concept was first defined by David A. Patterson, Garth A. Gibson, and Randy Katz at the University of California, Berkeley in 1987 as Redundant Arrays of Inexpensive Disks.[1] Marketers representing RAID manufacturers later redefined the term as a redundant array of independent disks in order to dissociate the technology from an expectation of low cost. RAID is now used as an umbrella term for computer data storage schemes that can divide and replicate data among multiple physical disk drives. The physical disks are said to be in a RAID array,[3] which the operating system addresses as one single disk. The different schemes or architectures are named by the word RAID followed by a number (e.g., RAID 0, RAID 1). Each scheme provides a different balance between two key goals: increased data reliability and increased input/output performance.

INDEX

1.) INTRODUCTION

2.) OVERALL DESCRIPTION

3.) HISTORY

4.) ISSUES

5.) PROBLEM DESCRIPTION

6.) WHY RAID?

7.) HOW DOES RAID WORK?

8.) DIFFERENT LEVELS OF RAID

RAID 0
RAID 1
RAID 2
RAID 3
RAID 4
RAID 5

9.) COMBINATIONS OF DIFFERENT LEVELS

10.) ARE THERE ANY ALTERNATIVES TO RAID?

11.) CONCLUSION

Introduction

History

Back in the mid-eighties, SLEDs (Single Large Expensive Disks) were the most popular medium for storing data. At that time, disk drives had nowhere near the storage capacity or the performance that disks have today. To provide a large amount of data, one had to have many disk drives, all of which had to be mounted in a single file tree. This was an extremely messy and inconvenient way of handling data. Disks in those days were also very expensive, hence the name SLED. Another big problem was, and still is, loss of data because of disk failure. A solution was badly needed.

To address these problems, IBM co-sponsored the University of California, Berkeley to build a disk array subsystem, for which IBM had received a patent in 1978. In 1987 Randy Katz and Dave Patterson, both working at Berkeley, succeeded. They called the solution RAID. RAID stands for Redundant Array of Inexpensive Disks, although some people chose to change "Inexpensive", the original word, to "Independent". Katz and Patterson clustered multiple smaller and less expensive disks into an array, so that all the disks appear to the rest of the system as one single large disk.

The result was compared to SLEDs in terms of cost versus performance. It turned out that RAID had the same or better performance than a SLED, but with a theoretical mean time before data loss (MTBDL) that was reduced to an unacceptably low level, because an array of many disks is more likely to suffer a failure than a single disk. The search for a way of increasing the MTBDL now started. A way was needed to prevent single disk drive failures from causing data loss within the array. The result was the six RAID levels, 0 through 5. The RAID level determines how data is distributed across the drives in the array and the level of redundancy.

A RAID system does not prevent hard drive failure; every hard drive eventually fails. RAID levels 1 through 5 offer protection against the data loss caused by a drive failure. RAID level 0 is a non-redundant method which only increases transfer speed.

Why RAID?

RAID has for a long time been something found only in large server systems, but lately cheaper RAID controller cards have made it possible to get a RAID system even for small servers and home computers. These cards will of course not have all the features of the more expensive ones.

Different levels of RAID have different advantages and disadvantages. Therefore one must analyse the workload before deciding what to buy. The choice also depends heavily on the quality attributes needed. Some examples of quality attributes one can get by using a RAID system are data redundancy, fault tolerance, increased capacity and increased performance.

How does RAID work?

The main idea behind RAID is, as mentioned in the introduction, to take some inexpensive disks and group them together so that the system sees them as one single disk. This is done by a RAID controller card that handles all I/O to the disks and knows where the stored data can be found. RAID uses three techniques to provide the quality attributes mentioned above: mirroring, striping and parity. Each can be used on its own or combined with one or both of the others, which is why RAID is divided into different levels.

Disk Striping

Fundamental to RAID technology is striping, a method of combining multiple drives into one logical storage unit. Striping partitions the storage space of each drive into stripes, which can be as small as one sector (512 bytes) or as large as several megabytes. These stripes are then interleaved in a rotating sequence, so that the combined space is composed alternately of stripes from each drive. The type of operating environment determines whether large or small stripes should be used. Most operating systems today support concurrent disk I/O operations across multiple drives. However, to maximize throughput for the disk subsystem, the I/O load must be balanced across all the drives so that each drive is kept as busy as possible. In a multiple-drive system without striping, the disk I/O load is never perfectly balanced: some drives contain data files that are frequently accessed, while others are rarely accessed.
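The rotating interleave described above can be sketched as a small address calculation. This is only an illustration: the function name `locate_block` and the parameters `num_drives` and `blocks_per_stripe` are hypothetical, and a real controller may number stripes differently.

```python
def locate_block(logical_block, num_drives, blocks_per_stripe):
    """Map a logical block number to (drive index, block offset on that drive).

    Assumes stripes are handed out to drives in simple round-robin order.
    """
    stripe_index = logical_block // blocks_per_stripe   # which stripe, counted across the array
    offset_in_stripe = logical_block % blocks_per_stripe
    drive = stripe_index % num_drives                   # stripes rotate across the drives
    row = stripe_index // num_drives                    # full rotations completed before this stripe
    return drive, row * blocks_per_stripe + offset_in_stripe


# Three drives, four blocks per stripe: consecutive stripes land on consecutive drives.
for block in (0, 4, 8, 12, 13):
    print(block, locate_block(block, num_drives=3, blocks_per_stripe=4))
```

With small stripes, one long sequential request touches every drive; with large stripes, small independent requests tend to land on different drives and can be served concurrently.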

Mirroring

The easiest way to get both availability and fault tolerance is to keep a copy of all data on a second disk. This is called mirroring, and you normally get one MB of usable space for every two MB of physical disk space. If one disk fails, the other is always available to read from. The disadvantages of this method are the wasted disk space and the fact that write performance does not improve, since every write must go to both disks. Read performance can improve, however, because reads can occur simultaneously on every drive.

Parity

Mirroring and striping are fairly easy to understand. Parity is a bit more complicated. Like mirroring, it is used to improve availability, but without wasting as much space. If you have X data elements, they can be used to compute one parity element, giving X+1 elements in total. If any single element is lost, it can always be recovered from the remaining ones. The advantage of parity is that redundancy costs only one extra element rather than a full copy, and no single disk is a single point of failure. The price is the computation needed to calculate the parity on every write.
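A minimal sketch of the parity idea, assuming the parity element is the byte-wise XOR of the data elements (the operation the RAID level descriptions in this report use). The helper name `xor_blocks` and the sample data are made up for illustration.

```python
from functools import reduce


def xor_blocks(blocks):
    """Return the byte-wise XOR of a list of equally sized byte strings."""
    return bytes(reduce(lambda a, b: a ^ b, column) for column in zip(*blocks))


# X = 3 data elements plus one parity element gives X + 1 elements in total.
data = [b"RAID", b"disk", b"1987"]
parity = xor_blocks(data)

# Suppose the drive holding data[1] fails: XOR the survivors with the parity.
recovered = xor_blocks([data[0], data[2], parity])
print(recovered)
```

The recovery works because XORing a value with itself cancels out, so combining every surviving element with the parity leaves only the missing element.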

Different levels of RAID

Now that we know what RAID can do, we will look at how it is implemented in the different RAID levels. The ones described below are the standard RAID levels. Some companies, though, have developed their own levels.

RAID 0

RAID 0 is typically defined as a group of striped disk drives without parity or data redundancy. RAID 0 arrays can be configured with large stripes for multi-user environments or small stripes for single-user systems that access long sequential records. RAID 0 arrays deliver the best data storage efficiency and performance of any array type. The disadvantage is that if one drive in a RAID 0 array fails, the entire array fails.

Figure: RAID 0, non-redundant striped array. Writes and reads can occur simultaneously on every drive.

RAID 1

RAID 1, also known as disk mirroring, is simply a pair of disk drives that store duplicate data but appear to the computer as a single drive. Although striping is not used within a single mirrored pair, multiple RAID 1 arrays can be striped together to create a single large array consisting of pairs of mirrored drives. All writes must go to both drives of a mirrored pair so that the information on the drives is kept identical. However, each drive can perform simultaneous, independent read operations. Mirroring can thus double the read performance of a single non-mirrored drive, while the write performance is unchanged. RAID 1 delivers the best performance of any redundant array type. In addition, there is less performance degradation during drive failure than in RAID 5 arrays.

Figure: RAID 1, mirrored array. Duplicate data is written to pairs of drives.

RAID 2

RAID 2 uses striping together with a special kind of redundancy technique that is not described above: bit-level striping with Hamming code ECC. Separate disks are used for data storage and for ECC. The Hamming codes are calculated and written to the ECC disks at the same time as the data is written to its disk. The code is calculated again when data is read from the disks, to check that the data has not changed since it was written. The complicated and expensive RAID controller hardware needed for this level, and the minimum number of hard drives required, are the reasons this level is not used.

Figure: RAID 2, parallel array with ECC. Each write operation and each read operation spans all drives.

RAID 3

RAID 3, like RAID 2, stripes data finely across a group of drives, but one drive in the group is dedicated to storing parity information. RAID 3 relies on the embedded ECC in each sector for error detection. In the case of drive failure, data is recovered by calculating the exclusive OR (XOR) of the information recorded on the remaining drives. Records typically span all drives, which optimizes the disk transfer rate. Because each I/O request accesses every drive in the array, RAID 3 arrays can satisfy only one I/O request at a time. RAID 3 delivers the best performance for single-user, single-tasking environments with long records. Synchronized-spindle drives are required for RAID 3 arrays in order to avoid performance degradation with short records. Because RAID 5 arrays with small stripes can yield similar performance to RAID 3 arrays, RAID 3 is not supported by Adaptec RAID controllers.

Figure: RAID 3, parallel array with parity. Read and write operations span all drives. Parallel access decreases data transfer time for long sequential records.

RAID 4

RAID 4 is very similar to RAID 3. The difference is that it uses larger blocks of data than RAID 3 does, which makes it possible to change the stripe size to suit the application's needs. As with RAID 3, however, the dedicated parity disk has a negative impact on performance, because every write must update it.

Figure: RAID 4, parallel array with parity. Every write must update the dedicated parity drive. Reads can occur simultaneously on every data drive.

RAID 5

RAID 5 is similar to RAID 4 in that it uses data striping with larger blocks of data. The difference is that it removes the bottleneck that RAID 4 has by eliminating the dedicated parity disk. The parity is instead distributed among all the disks. With the bottleneck removed, one problem still remains: the parity must still be calculated, which is more costly than, for example, mirroring. For fault tolerance and performance reasons, the data and the parity of a given stripe are never stored on the same disk. This means that if one disk goes down, its data can always be recovered by using the data and parity on the other disks to calculate what was lost. RAID 5 is one of the most used levels today, because many consider it the best combination of quality attributes such as performance and fault tolerance.
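The distributed parity of RAID 5 can be pictured with a small placement function. This is a sketch under an assumed rotation rule (parity starts on the last drive and moves one drive to the left for each stripe); the report does not specify a layout, controllers differ, and the name `raid5_layout` is hypothetical.

```python
def raid5_layout(stripe, num_drives):
    """Return (parity drive, list of data drives) for one stripe.

    Assumed rule: parity sits on the last drive for stripe 0 and shifts
    one drive to the left for each following stripe.
    """
    parity_drive = (num_drives - 1) - (stripe % num_drives)
    data_drives = [d for d in range(num_drives) if d != parity_drive]
    return parity_drive, data_drives


# With four drives, each drive holds the parity for one stripe in every four,
# so no single drive has to absorb every parity update.
for stripe in range(4):
    parity_drive, data_drives = raid5_layout(stripe, num_drives=4)
    print(f"stripe {stripe}: parity on drive {parity_drive}, data on {data_drives}")
```

Because the parity drive is excluded from the data drives of its own stripe, a stripe's data and parity never share a disk, which is the property that makes single-disk recovery possible.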

In summary:

RAID 0 is the fastest and most efficient array type but offers no fault tolerance. RAID 0 requires a minimum of two drives.

RAID 1 is the best choice for performance-critical, fault-tolerant environments. RAID 1 is the only choice for fault tolerance if no more than two drives are used.

RAID 2 is seldom used today since ECC is embedded in all hard drives. RAID 2 is not supported by Adaptec RAID controllers.

RAID 3 can be used to speed up data transfer and provide fault tolerance in single-user environments that access long sequential records. However, RAID 3 does not allow overlapping of multiple I/O operations and requires synchronized-spindle drives to avoid performance degradation with short records. Because RAID 5 with a small stripe size offers similar performance, RAID 3 is not supported by Adaptec RAID controllers.

RAID 4 offers no advantages over RAID 5 and does not support multiple simultaneous write operations. RAID 4 is not supported by Adaptec RAID controllers.

RAID 5 combines efficient, fault-tolerant data storage with good performance characteristics. However, write performance and performance during drive failure are lower than with RAID 1. Rebuild operations also require more time than with RAID 1 because parity information is also reconstructed. At least three drives are required for RAID 5 arrays.

Dual-Level RAID

In addition to the standard RAID levels, Adaptec RAID controllers can combine multiple hardware RAID arrays into a single array group or parity group. In a dual-level RAID configuration, the controller firmware stripes two or more hardware arrays into a single array.

NOTE: The arrays being combined must all use the same RAID level.

Dual-level RAID achieves a balance between the increased data availability inherent in RAID 1 and RAID 5 and the increased read performance inherent in disk striping (RAID 0). These arrays are sometimes referred to as RAID 0+1 or RAID 10, and RAID 0+5 or RAID 50.

Creating Data Redundancy

RAID 5 offers better storage efficiency than RAID 1 because only the parity information is stored, rather than a complete redundant copy of all data. Three or more drives can be combined into a RAID 5 array, with the storage capacity of only one drive dedicated to parity information. However, this efficiency must be balanced against a corresponding loss in performance. The parity data for each stripe of a RAID 5 array is the XOR of all the data in that stripe, across all the drives in the array. When the data in a stripe is changed, the parity information must also be updated. There are two ways to accomplish this.

The first method is based on accessing all of the data in the modified stripe and regenerating parity from that data. For a write that changes all the data in a stripe, parity can be generated without having to read from the disk, because the data for the entire stripe will be in the cache. This is known as a full-stripe write. If only some of the data in a stripe is to change, the missing data (the data the host does not write) must be read from the disks to create the new parity. This is known as a partial-stripe write. The efficiency of this method for a particular write operation depends on the number of drives in the RAID 5 array and on what portion of the complete stripe is written.

The second method of updating parity is to determine which data bits were changed by the write operation and then change only the corresponding parity bits. This is done by first reading the old data that is to be overwritten. This data is then XORed with the new data that is to be written. The result is a bit mask which has a 1 in the position of every bit that has changed. This bit mask is then XORed with the old parity information from the array, which flips the corresponding bits in the parity. The updated parity is then written back to the array. In total this takes two reads, two writes and two XOR operations. This is known as read-modify-write.
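The two parity update methods can be compared in a short sketch. The helper names (`xor_bytes`, `regenerate_parity`, `read_modify_write`) and the sample stripe are hypothetical; the point is only that both methods arrive at the same parity.

```python
def xor_bytes(a, b):
    """Byte-wise XOR of two equally sized byte strings."""
    return bytes(x ^ y for x, y in zip(a, b))


def regenerate_parity(blocks):
    """First method: XOR together all data blocks of the stripe."""
    parity = bytes(len(blocks[0]))  # start from all-zero bytes
    for block in blocks:
        parity = xor_bytes(parity, block)
    return parity


def read_modify_write(old_data, new_data, old_parity):
    """Second method: flip only the parity bits whose data bits changed."""
    change_mask = xor_bytes(old_data, new_data)   # 1 wherever a data bit changed
    return xor_bytes(change_mask, old_parity)     # apply the same flips to the parity


stripe = [b"\x0f\xf0", b"\x33\x33", b"\x55\xaa"]
old_parity = regenerate_parity(stripe)

# Overwrite the second block only.
new_block = b"\xff\x00"
new_parity = read_modify_write(stripe[1], new_block, old_parity)

updated_stripe = [stripe[0], new_block, stripe[2]]
print(new_parity == regenerate_parity(updated_stripe))
```

Read-modify-write touches only the changed block and the parity block regardless of how many drives are in the array, whereas regenerating parity for a partial-stripe write has to read every unchanged block in the stripe.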

Combinations of different levels

As explained above, the different levels have different advantages. Then why not combine two levels and get the advantages of both? That was the idea behind combinations such as 0+1, 1+0, 0+3, 3+0, 0+5, 5+0, 1+5, and 5+1. The most common one is 1+0, because it gives both performance and good data redundancy without the complicated and expensive hardware that other combinations need. The extra waste of space may be cheaper than the more expensive RAID controller.
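The space trade-off between levels can be made concrete with a rough capacity calculation. This is a sketch based on the rules stated in this report (RAID 1 as a mirrored pair, one drive's worth of parity in RAID 5, RAID 1+0 as striped mirrored pairs); the function name `usable_capacity` and the drive sizes are illustrative assumptions, and all drives are assumed to be the same size.

```python
def usable_capacity(level, num_drives, drive_size):
    """Return usable capacity for a few RAID levels, assuming equal-size drives."""
    if level == "0":
        if num_drives < 2:
            raise ValueError("RAID 0 needs at least two drives")
        return num_drives * drive_size             # no redundancy at all
    if level == "1":
        if num_drives != 2:
            raise ValueError("RAID 1 is treated here as one mirrored pair")
        return drive_size                          # one drive's worth, the other is a copy
    if level == "5":
        if num_drives < 3:
            raise ValueError("RAID 5 needs at least three drives")
        return (num_drives - 1) * drive_size       # one drive's worth goes to parity
    if level == "10":
        if num_drives < 4 or num_drives % 2:
            raise ValueError("RAID 1+0 needs an even number of drives, at least four")
        return (num_drives // 2) * drive_size      # half of every mirrored pair
    raise ValueError(f"level {level!r} is not covered by this sketch")


# Four 500 GB drives under different levels.
for level in ("0", "5", "10"):
    print(level, usable_capacity(level, 4, 500), "GB")
```

With four drives, RAID 1+0 gives up half the raw space while RAID 5 gives up a quarter, which is the "extra waste of space" being weighed against controller cost.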

Are there any alternatives to RAID?

Now that we know what RAID is and how it works, one may wonder whether there are other techniques that do the same or a similar thing. As mentioned earlier in this document, RAID has obvious advantages over other techniques, for instance SLED (Single Large Expensive Disk). However, RAID has its weaknesses too. Since a RAID system is placed at one single geographical location, it is very vulnerable to disasters. If the room is flooded, all data on the disks may be lost. Moreover, if there is a temporary failure, because of a power outage, a hardware or software failure, etc., then the data on the RAID will be unavailable for the duration of the outage.

Conclusion

RAID has some disadvantages. They are, however, few compared to the advantages, such as high availability and performance. These qualities have become more important, especially since companies today use the Internet to do business on a global market. One must not forget that technologies and techniques are developed and fine-tuned every day. This means that alternative technologies, like RADD, may become a complement to RAID or even replace it. The idea behind RAID is rather simple. It is important to understand it, at least at a high level, in order to make a good decision about which RAID level to use. The choice of level may have a great impact on how well the system will work. Nowadays RAID controllers have become cheaper, so RAID has become available to a bigger market. It is even possible to find it in ordinary workstations and small company servers.

RAID Papers
(Berkeley FTP pointers updated, 95/5/11)

A nice collection of RAID papers was published in the Fall 1991 issue of _CMG Transactions_. A few more appeared in the December 1992 _CMG Proceedings_, and there are 3 RAID papers in the 1993 International Symposium on Computer Architecture (published as _Computer Architecture News 21_, #2, May 1993, by ACM SIGARCH).
(dwilmot@crl.com, Dick Wilmot, Editor, Independent RAID Report)

There is a short RAID FAQ at ftp.mcs.com (rdv, 96/2/21)

Try contacting the RAID project at the University of California, Berkeley. In the proceedings of the recent IEEE Mass Storage Symposium, Ann Drapeau and Randy Katz have a paper describing the results of some investigations into the use of tape arrays. I think you can find RAID papers, perhaps this one, on anon ftp at ftp.cs.berkeley.edu. Have no address for Ann Drapeau, but Randy Katz is randy@cs.berkeley.edu.

Some of the RAID papers are available via anon ftp from ftp.cs.berkeley.edu:pub/raid/papers. Ann Drapeau's email address is alc@cs.berkeley.edu.
(dm_devaney@pnl.gov, Mike DeVaney) (eklee@cs.berkeley.edu, Edward K. Lee)

You could get that lengthy RAID taxonomy research report from Storage Computer, as mentioned recently on these news groups, by emailing them at RAID7@World.std.com. Alternatively, their phone number is 603 880 3005. I do not know if their RAID research report is copyrighted or not.

I believe their executive in charge of RAID activities in Hong Kong would be John Taylor, the former Wang national accounts director. They also put on technical raid seminars which might be of interest to your PhD students, concentrating on performance enhancements over RAID 3/4/5 (somewhat less than an order of magnitude, but I have not reviewed their benchmark data.) The RAID theory discussed is rather interesting.

(MICHAEL.WILLETT@OFFICE.WANG.COM, Michael Willett)

---------

>> I am looking for papers or technical papers on RAID or other multiple disks
>> storage systems. Could somebody give me pointers for them?

Here are some papers that I either have read or am looking for: I don't have copies of this group:

Dishon, Yitzhak; Lui, T.S.; Disk Dual Copy Methods and Their Performance; FTCS-18: Eighteenth International Symposium on Fault-Tolerant Computing, Digest of Papers p 314-318

Gray, J.N. et al., Parity Striping of Disk Arrays: Low Cost Reliable Storage With Acceptable Throughput, 16th International Conference on VLDB (Australia, August 1990)

Katz, R.H.; Patterson, D.A.; Gibson, G.A.; Disk System Architectures for High Performance Computing; Proc. IEEE v 78 n 2 Feb 1990

Muntz, Richard R.; Lui, John C.S.; Performance Analysis of Disk Arrays Under Failure; Proceedings of the 16th International Conference on Very Large Data Bases (VLDB); Dennis McLeod, Ron Sacks-Davis, Hans Schek (Eds.), Morgan Kaufmann Publishers, Aug 1990 pp 162-173

Ng, Spencer; Some Design Issues of Disk Arrays; Compcon '89: Thirty-Fourth IEEE Computer Society International Conference p 137-142 DISK ARRAYS, STRIPING, SPINDLE SYNCHRONIZATION

Ng, Spencer W.; Improving Disk Performance via Latency Reduction; IEEE Transactions on Computers v 40 1 Jan 1991 p22-30 LATENCY REDUCTION, ROTATION LATENCY, DISK PERFORMANCE

Reddy, A.L. Narasimha; Banerjee, Prithviraj; Performance Evaluation of Multiple-Disk I/O Systems; Proceedings of the 1989 International Conference on Parallel Processing p 315-318

Here are some good papers on disk arrays with emphasis on RAID:

Chen, Peter M.; Gibson, Garth A.; Katz, Randy H.; Patterson, David A.; Evaluation of Redundant Arrays of Disks Using an Amdahl 5890; 1990 ACM SIGMETRICS Conference on Measurement & Modeling of Computer Systems p 74-85

Chen, Peter M.; Patterson, David A.; Maximizing Performance in a Striped Disk Array; Proceedings of the 17th IEEE Annual International Symposium on Computer Architecture p 322-331

Chen, Shenze; Don Towsley; Performance of a Mirrored Disk in a Real-Time Transaction System; 1991 ACM SIGMETRICS Conference on Measurement & Modeling of Computer Systems p 198-207

Chervenak, Ann L.; Katz, Randy H.; Performance of a Disk Array Prototype; ACM SIGMETRICS 1991 Conference Proceedings p 188-197

Menon, J.; Mattson, R.L. and Spencer, N.; Distributed Sparing for Improved Performance of Disk Arrays; IBM Research Report RJ 7943 (Jan. 1991)

Patterson, David A.; Chen, Peter; Gibson, Garth; Katz, Randy H.; Introduction to Redundant Arrays of Inexpensive Disks (RAID); Compcon 1989: Thirty-Fourth IEEE Computer Society International Conference p 112-117

Schulze, Martin; Gibson, Garth; Katz, Randy; Patterson, David A.; How Reliable is a RAID; Compcon '89: Thirty-Fourth IEEE Computer Society International Conference p 118-123

(danj@hub.parallan.com, Dan Jones)

--------

>> I am looking for papers or technical papers on RAID...

A good set of the Berkeley papers are available via anonymous FTP. If I remember, the machine was ftp.cs.berkeley.edu. Also, an archie search on "RAID" would probably turn up a nice on-line collection of information. (sorry, not at an Internet site to check this right now...)

(buck@siswat.hou.tx.us , Lester Buck)

Further Information:

%A Garth Gibson
%A Randy H. Katz
%T A case for redundant arrays of inexpensive disks (RAID)
%C Proc. SIGMOD.
%c Chicago, Illinois
%D 1--3 June 1988
%P 109--116
%k RAID, disk striping, reliability, availability, performance
%k disk arrays, SCSI, hardware failures, MTTR, MTBF
%k secondary storage
%L Jacobson has a copy
%x Increasing the performance of CPUs and memories will be
%x squandered if not matched by a similar performance increase in
%x I/O. While the capacity of Single Large Expensive Disks (SLED)
%x has grown rapidly, the performance improvement of SLED has been
%x modest. Redundant Arrays of Inexpensive Disks (RAID), based
%x on the magnetic disk technology developed for personal
%x computers, offers an attractive alternative to SLED, promising
%x improvements of an order of magnitude in performance,
%x reliability, power consumption, and scalability. This paper
%x introduces five levels of RAIDs, giving their relative
%x cost/performance, and compares RAID to an IBM 3380 and a
%x Fujitsu Super Eagle.

(tage@cs.utwente.nl)
