© 2003 Hewlett-Packard Development Company, L.P.
Microsoft, Windows, and Windows NT are US registered trademarks of Microsoft Corporation.
Hewlett-Packard Company shall not be liable for technical or editorial errors or omissions contained herein. The information in this document is provided as is without warranty of any kind and is subject to change without notice. The warranties for HP products are set forth in the express limited warranty statements accompanying such products. Nothing herein should be construed as constituting an additional warranty.
HP Smart Array 641/642 Controller User Guide
July 2003 (Second Edition)
Part Number 309311-002
Contents
Appendix B Electrostatic Discharge
Appendix C Controller Specifications
Appendix D Drive Arrays and Fault Tolerance
  What Is a Drive Array?
  Fault-Tolerance Methods
    Hardware-Based Fault-Tolerance Methods
    Alternative Fault-Tolerance Methods
This guide provides step-by-step instructions for installation, and reference information for troubleshooting, for the HP Smart Array 641 and 642 Controllers.
Audience Assumptions
This guide is for the person who installs, administers, and troubleshoots servers. HP assumes you are qualified in the servicing of computer equipment and trained in recognizing hazards in products with hazardous energy levels.
Symbols on Equipment
The following symbols may be placed on equipment to indicate the presence of potentially hazardous conditions:
WARNING: This symbol, in conjunction with any of the following symbols, indicates the presence of a potential hazard. The potential for injury exists if warnings are not observed. Consult your documentation for specific details.
This symbol indicates the presence of hazardous energy circuits or electric shock hazards. Refer all servicing to qualified personnel.
WARNING: To reduce the risk of injury from electric shock hazards, do not open this enclosure. Refer all maintenance, upgrades, and servicing to qualified personnel.
This symbol indicates the presence of electric shock hazards. The area contains no user or field serviceable parts. Do not open for any reason.
WARNING: To reduce the risk of injury from electric shock hazards, do not open this enclosure.
This symbol indicates the presence of a hot surface or hot component. If this surface is contacted, the potential for injury exists. WARNING: To reduce the risk of injury from a hot component, allow the surface to cool before touching it.
Symbols in Text
These symbols may be found in the text of this guide. They have the following meanings.
WARNING: Text set off in this manner indicates that failure to follow directions in the warning could result in bodily harm or loss of life.
CAUTION: Text set off in this manner indicates that failure to follow directions could result in damage to equipment or loss of information.
IMPORTANT: Text set off in this manner presents essential information to explain a concept or complete a task.
NOTE: Text set off in this manner presents additional information to emphasize or supplement important points of the main text.
Related Documents
For additional information on the topics covered in this guide, refer to the following documentation:
• HP Array Configuration Utility User Guide, available on the software CD that is provided with the server, or downloadable from the HP website.
• HP Servers Troubleshooting Guide, available on the Documentation CD that is provided with the server.
• HP ROM-Based Setup Utility User Guide, available on the Documentation CD that is provided with the server, or downloadable from the HP website.
Getting Help
If you have a problem and have exhausted the information in this guide, you can get further information and other help in the following locations.
Technical Support
In North America, call the HP Technical Support Phone Center at 1-800-652-6672. This service is available 24 hours a day, 7 days a week. For continuous quality improvement, calls may be recorded or monitored.
Outside North America, call the nearest HP Technical Support Phone Center. Telephone numbers for worldwide Technical Support Centers are listed on the HP website, http://www.hp.com.
Be sure to have the following information available before you call HP:
• Technical support registration number (if applicable)
• Product serial number
• Product model name and number
• Applicable error messages
• Add-on boards or hardware
• Third-party hardware or software
HP Website
The HP website has information on this product as well as the latest drivers and flash ROM images. You can access the HP website at http://www.hp.com.
Authorized Reseller
For the name of your nearest authorized reseller:
• In the United States, call 1-800-345-1518.
• In Canada, call 1-800-263-5868.
• Elsewhere, see the HP website for locations and telephone numbers.
Reader's Comments
HP welcomes your comments on this guide. Send your comments and suggestions to ServerDocumentation@hp.com.
Chapter 1 Installation Overview
The recommended procedure for installing the controller depends on whether the server has been configured, and whether it can self-configure during its first powerup. (To determine whether a server is autoconfigurable, refer to the server-specific setup and installation guide.) The flowcharts on the following pages summarize the recommended procedure for each situation.
[Flowcharts summarizing the installation procedure. The legible flowchart steps are:]
• Install physical drives if necessary. (The number of drives determines the RAID level that is autoconfigured.)
• Create and format additional logical drives if desired (Chapter 5).
• If the controller is to be the boot device, install the device driver for the operating system (Chapter 6). Otherwise, continue with step 4.
• If using the System Configuration Utility, update the system partition (Chapter 3), and then check the controller order (Chapter 4).
• If the controller is not to be the boot device, install the device driver for the operating system (Chapter 6).
• Update the Management Agents if new versions are available (Chapter 6).
Chapter 2 Installing the Hardware
Before beginning the installation procedure, visit the HP website, http://www.hp.com/support, to confirm that you have the latest version of each driver and utility file needed. Compare the version numbers of the files there with those of the same files on the software CD or DVD that is supplied in the controller kit.
3. Power down any peripheral devices that are attached to the server.
4. Unplug the AC power cord from the outlet, and then from the server.
5. Disconnect any peripheral devices from the server.
1. Remove or open the access panel.
2. Select an available 3.3-V PCI or PCI-X slot.
3. Remove the slot cover or open the hot-plug latch. Save the retaining screw, if one is present.
4. Slide the controller board along the slot alignment guide, and press the board firmly into the slot so that the contacts on the board edge are properly seated in the system board connector.
5. Secure the controller board in place with the hot-plug latch or retaining screw. If there is a guide latch on the rear of the board, close the latch.
6. To finish installing the hardware, connect the internal and external drives by following the instructions given in Connecting Storage Devices.
SCSI buses require termination on both ends to prevent signal degradation. In HP ProLiant servers, however, the controller, SCSI cable, and backplane already provide this termination.
NOTE: Drives that are to be grouped in the same array should have the same capacity.
For additional information about drive installation, refer to the appropriate section in this guide (Replacing, Moving, or Adding Hard Drives) and consult the documentation that accompanied the drives.
When you have finished installing drives, continue with the next step. If the drives are hot-pluggable, go to step 3. If the drives are not hot-pluggable, go to step 4.
3. Attach the internal point-to-point SCSI cable (provided with the server) from the internal connector of the controller to the hot-plug drive cage. Installation of the hot-pluggable drives is complete.
4. For each SCSI bus, manually set the SCSI ID on each drive to a unique value in the range of 0 to 15, except 7 (which is reserved for controller use). For detailed instructions, consult the documentation that is provided with the drive.
5. Attach a multi-device SCSI cable from the internal connector of the controller to the non-hot-pluggable hard drives. (The cable may have been provided with the server.)
6. Replace the access panel, and secure it with the thumbscrews if any are present.
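The SCSI ID rule for non-hot-pluggable drives (unique values from 0 to 15 on each bus, with 7 reserved for the controller) can be checked on paper before any jumpers are set. The following Python sketch is only an illustration of that rule; the function name and messages are hypothetical and are not part of any HP utility.

```python
# Illustrative planning aid only; not an HP tool.
CONTROLLER_ID = 7          # reserved for controller use, as stated in the procedure
VALID_IDS = range(0, 16)   # SCSI IDs 0 through 15

def check_scsi_ids(drive_ids):
    """Return a list of problems found in the planned SCSI IDs for one bus."""
    problems = []
    seen = set()
    for scsi_id in drive_ids:
        if scsi_id not in VALID_IDS:
            problems.append(f"ID {scsi_id} is outside the range 0 to 15")
        elif scsi_id == CONTROLLER_ID:
            problems.append(f"ID {scsi_id} is reserved for the controller")
        elif scsi_id in seen:
            problems.append(f"ID {scsi_id} is assigned to more than one drive")
        seen.add(scsi_id)
    return problems

if __name__ == "__main__":
    print(check_scsi_ids([0, 1, 2, 3]))   # a valid plan gives an empty list
    print(check_scsi_ids([0, 1, 1, 7]))   # a duplicate ID and the reserved ID
```

Run the check once per SCSI bus, because IDs only need to be unique within a bus.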
CAUTION: Do not operate the server for long periods without the access panel. Operating the server without the access panel results in improper airflow and improper cooling that can lead to thermal damage.
NOTE: If additional cables are required, order by the option kit number.
Chapter 3 Updating the Firmware
To update the firmware, you can use the Smart Components (also known as Online ROM Flash Components) that are available on the HP website, http://www.hp.com/support/proliantstorage.
1. Locate the Smart Components for the operating system and controller that the server is using.
2. Follow the instructions for installing the components. These instructions are given on the same Web page as the components.
3. Follow the additional instructions that describe how to use the components to flash the ROM. These instructions are provided with each component.
Alternatively, you can use the software CD that is provided in the controller kit. Printed instructions are provided with the CD. Because the Smart Components may be more recent than the firmware upgrade files on the CD, check the Smart Components on the website before using the updates on the CD.
IMPORTANT: If you are updating the firmware on a system that was configured using SCU, you must update the system partition immediately after you have finished updating the firmware. For more information, refer to Configuring the Server.
Chapter 4 Configuring the Server
After installing the controller hardware and updating the firmware, configure the server by using either RBSU or SCU. For more information, refer to the HP ROM-Based Setup Utility User Guide or the server-specific setup and installation guide.
Using RBSU
RBSU is a system configuration utility that is embedded in the system ROM. It is customized for the server on which it is installed.
CAUTION: Not all servers support RBSU. Do not flash an RBSU-ROM image onto a server that is already configured with SCU unless the update instructions specifically state that upgrading from SCU to RBSU is supported. If the upgrade is not supported, the consequences of upgrading are unpredictable, and you may lose data.
1. Power up the server.
2. Press the F9 key when prompted during system startup. The main RBSU screen is displayed.
3. Configure the system. (For detailed instructions, refer to the HP ROM-Based Setup Utility User Guide.)
4. Select Boot Controller Order on the main RBSU screen and follow the on-screen prompts to set the boot controller.
5. When you have finished using RBSU, press the Esc key, and then press the F10 key to confirm that you want to exit. The server reboots with the new configuration.
Using SCU
If you updated the firmware in a used system that was not configured using RBSU, you must use SCU immediately afterward to update the system partition.
1. Locate the page on the HP website (http://www.hp.com/support) that contains SCU, and then follow the on-screen instructions to create four SCU diskettes.
2. Insert SCU diskette #1 into the server diskette drive.
3. Restart the system.
4. Select System Configuration Utility from the menu or list of icons that is displayed.
5. Follow the on-screen instructions to update or create and populate a system partition.
6. Exit from SCU. If the server does not reboot or a CD error message is displayed, press the Ctrl+Alt+Del keys to reboot the server manually.
When you have finished using SCU to configure the system, use ORCA immediately afterward to confirm that the controller order is unchanged, as follows:
1. Reboot the server. The POST sequence begins, and an ORCA prompt message is briefly displayed.
2. Press the F8 key to start ORCA.
NOTE: The ORCA prompt is displayed for only a few seconds. If you do not press the F8 key during this time, you must restart the server to obtain the prompt again.
3. On the Main Menu screen, select Select as Boot Controller.
4. Follow the remaining prompts to set the currently selected controller as the boot controller for the system.
If you want to use ORCA to create logical drives, you do not need to exit the utility at this point. Continue as described in Chapter 5.
Chapter 5 Configuring an Array
HP provides two utilities for manually configuring an array on a Smart Array controller:
• Array Configuration Utility (ACU): A versatile, browser-based utility that provides maximum control over configuration parameters
• Option ROM Configuration for Arrays (ORCA): A simple ROM-based configuration utility that runs on all operating systems
NOTE: To copy a particular array configuration to several other servers on the same network, use the Array Configuration Replicator (ACR) or the scripting capability of ACU. ACR is provided in the SmartStart Scripting Toolkit, available at http://www.hp.com/servers/sstoolkit.
Whichever utility you use, the following limitations apply:
• For the most efficient use of drive space, do not mix drives of different capacities within the same array. The configuration utility treats all physical drives in an array as if they have the same capacity as the smallest drive in the array. The excess capacity of any larger drives is wasted because it is unavailable for data storage.
• The probability that an array will experience a hard drive failure increases with the number of hard drives in the array. If you configure a logical drive with RAID 5, keep the probability of failure low by using no more than 14 physical drives in the array.
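Both limitations can be put into numbers. The sketch below is a rough illustration, not a calculation taken from this guide: the drive sizes and the per-drive failure probability are hypothetical, and the failure estimate assumes that drives fail independently of one another.

```python
def wasted_capacity_gb(drive_sizes_gb):
    """Capacity lost because every drive is treated as the smallest drive."""
    smallest = min(drive_sizes_gb)
    return sum(size - smallest for size in drive_sizes_gb)

def any_drive_failure_probability(p_single, n_drives):
    """Chance that at least one of n drives fails (independent failures assumed)."""
    return 1 - (1 - p_single) ** n_drives

if __name__ == "__main__":
    # Hypothetical array: two 36 GB drives and one 72 GB drive.
    print(wasted_capacity_gb([36, 36, 72]), "GB unavailable for data storage")

    # Hypothetical 2% chance that a given drive fails in some period.
    for n in (3, 6, 14):
        print(n, "drives:", round(any_drive_failure_probability(0.02, n), 3))
```

The second function shows why the drive count matters: the chance of seeing at least one failure grows with every drive added to the array.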
[Table comparing the features supported by ACU and ORCA. Only the ORCA column label and the yes/no (y/n) entries are legible; the feature names are missing.]
For conceptual information about arrays, logical drives, and fault-tolerance methods, refer to Appendix D.
Using ACU
For detailed information about using ACU, refer to the HP Array Configuration Utility User Guide. This document is available on the Documentation CD that is provided in the controller kit.
Using ORCA
When a server is powered up, the Power-On Self-Test (POST) runs, and any array controllers that are in the system are initialized. If the array controller supports ORCA, POST temporarily halts, and an ORCA prompt message is displayed for approximately five seconds. (If ORCA is not supported, the prompt message is not displayed, and the system continues with the startup sequence.) While the prompt is displayed, press the F8 key to start ORCA. The ORCA main menu is displayed, allowing you to create, view, or delete a logical drive. (On a ProLiant system, you can also use ORCA to set the currently selected controller as the boot controller.)
Configuration Procedure
To create a logical drive using ORCA:
1. Select Create Logical Drive. The screen displays a list of all available (unconfigured) physical drives and the valid RAID options for the system.
2. Use the Arrow keys, Spacebar, and Tab key to navigate around the screen and set up the logical drive, including an online spare drive if one is required.
NOTE: You cannot use ORCA to configure one spare drive to be shared among several arrays. Only ACU enables you to configure shared spare drives.
While configuring the logical drive, one of the settings allows you to use either 4 GB or 8 GB as the maximum boot drive size. Selecting 8 GB allows a larger boot partition for operating systems such as Microsoft Windows NT 4.0 that use cylinders, heads, and sectors of a physical drive to determine the drive size. The larger boot drive size also lets you increase the size of the logical drive at some later time. However, logical drive performance is likely to decrease if the larger boot drive size is enabled.
3. Press the Enter key to accept the settings.
4. Press the F8 key to confirm the settings and save the new configuration. After several seconds, the Configuration Saved screen is displayed.
5. Press the Enter key to continue. You can now create another logical drive by repeating the previous steps.
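The 4 GB and 8 GB figures for the maximum boot drive size are consistent with the limits of cylinder/head/sector (CHS) addressing. The sketch below shows the arithmetic under assumptions that this guide does not state: a geometry of 1024 cylinders and 255 heads, 512-byte sectors, and 32 or 63 sectors per track for the smaller and larger settings respectively. Under those assumptions the two limits come to roughly 3.98 GiB and 7.84 GiB.

```python
SECTOR_BYTES = 512  # assumed sector size

def chs_capacity_bytes(cylinders, heads, sectors_per_track):
    """Largest size addressable with the given CHS geometry."""
    return cylinders * heads * sectors_per_track * SECTOR_BYTES

if __name__ == "__main__":
    # Assumed geometries; the sectors-per-track values are illustrative.
    for label, sectors in (("4 GB setting", 32), ("8 GB setting", 63)):
        size = chs_capacity_bytes(1024, 255, sectors)
        print(f"{label}: {size} bytes = {size / 2**30:.2f} GiB")
```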
NOTE: Newly created logical drives are invisible to the operating system. To make the new logical drives available for data storage, format them using the instructions given in the operating system documentation.
Chapter 6 Installing the Device Drivers and Management Agents
Device Drivers
The drivers for the controller are located on the Support Software CD or the SmartStart CD that is provided in the controller kit. Updates are posted to the HP website, http://www.hp.com/support.
Using the Support Software CD: Instructions for installing the drivers from the Support Software CD are given in the leaflet that is supplied with the CD. Note that the exact procedure depends on whether the server is new or already contains the operating system and user data.
Using the SmartStart CD: If you use the Assisted Installation path feature of SmartStart to install the operating system on a new server, the drivers are automatically installed at the same time. You can also use SmartStart to update the drivers manually on older systems. For further information, refer to the SmartStart documentation.
Management Agents
If you use the Assisted Installation path feature of SmartStart to install the operating system on a new server, the Management Agents are automatically installed at the same time.
You can update the Management Agents on older servers by using the latest versions of the agents from one of these sources:
• The Management CD, obtainable from your local HP reseller or authorized service provider
• The SmartStart CD
• The HP website, http://www.hp.com/servers/manage
For the procedure to update the agents, refer to the documentation on the Management CD or on the HP website. If the new agents do not function correctly, you may also need to update Insight Manager. The latest versions of Insight Manager are also available for download at the HP website.
Chapter 7 Upgrading or Replacing the Cache
WARNING: There is a risk of explosion, fire, or personal injury if the battery pack in the cache is not properly handled. Refer to the Battery Replacement Notice in Appendix A before installing or removing the cache.
To install a cache module:
1. Pull out the latches on the memory socket (1).
2. Insert the cache module straight into the socket (2) applying equal pressure on both ends of the module until the latches snap into place. If the latches do not snap into place, press them inward to secure the module in the socket.
3. With the lip on the retainer clip facing the cache module, insert one prong of the clip into the corresponding hole in the controller board just above the cache module (1).
4. Squeeze the prongs together slightly (2) and insert the other prong into the remaining hole (3).
Appendix A Regulatory Compliance Notices
Class A Equipment
This equipment has been tested and found to comply with the limits for a Class A digital device, pursuant to Part 15 of the FCC Rules. These limits are designed to provide reasonable protection against harmful interference when the equipment is operated in a commercial environment. This equipment generates, uses, and can radiate radio frequency energy and, if not installed and used in accordance with the instructions, may cause harmful interference to radio communications. Operation of this equipment in a residential area is likely to cause harmful interference, in which case the user will be required to correct the interference at personal expense.
Class B Equipment
This equipment has been tested and found to comply with the limits for a Class B digital device, pursuant to Part 15 of the FCC Rules. These limits are designed to provide reasonable protection against harmful interference in a residential installation. This equipment generates, uses, and can radiate radio frequency energy and, if not installed and used in accordance with the instructions, may cause harmful interference to radio communications. However, there is no guarantee that interference will not occur in a particular installation. If this equipment does cause harmful interference to radio or television reception, which can be determined by turning the equipment off and on, the user is encouraged to try to correct the interference by one or more of the following measures:
• Reorient or relocate the receiving antenna.
• Increase the separation between the equipment and receiver.
• Connect the equipment into an outlet on a circuit that is different from that to which the receiver is connected.
• Consult the dealer or an experienced radio or television technician for help.
Declaration of Conformity for Products Marked with the FCC Logo, United States Only
This device complies with Part 15 of the FCC Rules. Operation is subject to the following two conditions: (1) this device may not cause harmful interference, and (2) this device must accept any interference received, including interference that may cause undesired operation.
For questions regarding your product, contact us by mail or telephone:
• Hewlett-Packard Company, P. O. Box 692000, Mail Stop 530113, Houston, Texas 77269-2000
• 1-800-652-6672 (For continuous quality improvement, calls may be recorded or monitored.)
For questions regarding this FCC declaration, contact us by mail or telephone:
• Hewlett-Packard Company, P. O. Box 692000, Mail Stop 510101, Houston, Texas 77269-2000
• 1-281-514-3333
To identify this product, refer to the part, series, or model number found on the product.
Modifications
The FCC requires the user to be notified that any changes or modifications made to this device that are not expressly approved by Hewlett-Packard Company may void the user's authority to operate the equipment.
Cables
Connections to this device must be made with shielded cables with metallic RFI/EMI connector hoods in order to maintain compliance with FCC Rules and Regulations.
Class B Equipment
This Class B digital apparatus meets all requirements of the Canadian Interference-Causing Equipment Regulations.
Cet appareil numérique de la classe B respecte toutes les exigences du Règlement sur le matériel brouilleur du Canada.
BSMI Notice
Japanese Notice
Korean Notices
• Do not try to recharge the batteries if they are disconnected from the controller.
• Do not expose the battery pack to water, or to temperatures higher than 60°C.
• Do not abuse, disassemble, crush, or puncture the battery pack.
• Do not short the external contacts.
• Replace the battery pack only with the designated HP spare.
Battery disposal should comply with local regulations. Alternatively, use established parts return methods to return the battery pack to HP for disposal.
Batteries, battery packs, and accumulators should not be disposed of together with the general household waste. To forward them to recycling or proper disposal, use the public collection system or return them to HP, your authorized HP Partners, or their agents.
For more information about battery replacement or proper disposal, contact your HP authorized reseller or your authorized service provider.
Appendix B Electrostatic Discharge
To prevent damaging the system, be aware of the precautions you need to follow when setting up the system or handling parts. A discharge of static electricity from a finger or other conductor may damage system boards or other static-sensitive devices. This type of damage may reduce the life expectancy of the device.
To prevent electrostatic damage, observe the following precautions:
• Avoid hand contact by transporting and storing products in static-safe containers.
• Keep electrostatic-sensitive parts in their containers until they arrive at static-free workstations.
• Place parts on a grounded surface before removing them from their containers.
• Avoid touching pins, leads, or circuitry.
• Always be properly grounded when touching a static-sensitive component or assembly.
There are several methods for grounding. Use one or more of the following methods when handling or installing electrostatic-sensitive parts:
• Use a wrist strap connected by a ground cord to a grounded workstation or computer chassis. Wrist straps are flexible straps with a minimum of 1 megohm resistance in the ground cords. To provide proper ground, wear the strap snug against the skin.
• Use heel straps, toe straps, or boot straps at standing workstations. Wear the straps on both feet when standing on conductive floors or dissipating floor mats.
• Use conductive field service tools.
• Use a portable field service kit with a folding static-dissipating work mat.
If you do not have any of the suggested equipment for proper grounding, have an HP authorized reseller install the part.
NOTE: For more information on static electricity, or assistance with product installation, contact your HP authorized reseller.
Appendix C Controller Specifications
For more information about the controller features and specifications, refer to http://www.compaq.com/smartarray.
Appendix D Drive Arrays and Fault Tolerance
[Figure: three physical drives (P1, P2, P3), each with its own read/write (R/W) head]
With an array controller installed in the system, the capacity of several physical drives can be combined into one or more virtual units called logical drives (also called logical volumes and denoted by Ln in the figures in this section). Then, the read/write heads of all the constituent physical drives are active simultaneously, reducing the total time required for data transfer.
[Figure: logical drive L1 configured across physical drives P1, P2, and P3]
Because the read/write heads are active simultaneously, the same amount of data is written to each drive during any given time interval. Each unit of data is called a block (denoted by Bn in Figure D-3), and adjacent blocks form a set of data stripes (Sn) across all the physical drives that comprise the logical drive.
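[Figure D-3: data blocks B1 through B12 arranged in stripes S1 through S4 across three physical drives]
Stripe   Drive 1   Drive 2   Drive 3
S1       B1        B2        B3
S2       B4        B5        B6
S3       B7        B8        B9
S4       B10       B11       B12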
For data in the logical drive to be readable, the data block sequence must be the same in every stripe. This sequencing process is performed by the array controller, which sends the data blocks to the drive write heads in the correct order. A natural consequence of the striping process is that each physical drive in a given logical drive contains the same amount of data. If one physical drive has a larger capacity than other physical drives in the same logical drive, the extra capacity is wasted because it cannot be used by the logical drive. The group of physical drives containing the logical drive is called a drive array, or just array (denoted by An in Figure D-4). Since all the physical drives in an array are commonly configured into just one logical drive, the term array is also often used as a synonym for logical drive. However, an array can contain several logical drives, each of a different size.
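The block sequence described above can be expressed as a simple mapping from a block number to a stripe and a drive. The following sketch reproduces the three-drive layout of Figure D-3; the function name and the 1-based numbering are illustrative choices, not a description of the controller firmware.

```python
def locate_block(block_number, n_drives):
    """Map block Bn (1-based) to (stripe number, drive number), both 1-based."""
    index = block_number - 1
    stripe = index // n_drives + 1   # blocks fill one stripe before the next
    drive = index % n_drives + 1     # consecutive blocks go to consecutive drives
    return stripe, drive

if __name__ == "__main__":
    for block in range(1, 13):
        stripe, drive = locate_block(block, n_drives=3)
        print(f"B{block}: stripe S{stripe}, drive {drive}")
```

Because every stripe places exactly one block on each drive, each drive ends up holding the same amount of data, which is why extra capacity on a larger drive cannot be used.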
Figure D-4: Two arrays (A1, A2) containing five logical drives spread across five physical drives
Each logical drive in an array is distributed across all of the physical drives within the array. A logical drive can also extend across more than one port on the same controller, but it cannot extend across more than one controller. Drive failure, although rare, is potentially catastrophic. In Figure D-4, for example, failure of any physical drive causes all logical drives in the same array to fail, and all data on the drives is lost. To protect against data loss due to physical drive failure, logical drives are configured with fault tolerance. For more information, refer to Fault-Tolerance Methods. For any configuration except RAID 0, further protection against data loss can be achieved by assigning a drive as an online spare (or hot spare). This drive contains no data and is connected to the same controller as the array. When any other physical drive in the array fails, the controller automatically rebuilds information that was originally on the failed drive to the online spare. The system is thus restored to full RAID-level data protection, although it now no longer has an online spare. (However, in the unlikely event that another drive in the array fails while data is being rewritten to the spare, the logical drive will still fail.)
When you configure an online spare, it is automatically assigned to all logical drives in the same array. Additionally, you do not need to assign a separate online spare to each array. Instead, you can configure one hard drive to be the online spare for several arrays if the arrays are all on the same controller.
Fault-Tolerance Methods
Several fault-tolerance methods exist. Those that are most often used with Smart Array controllers are hardware-based RAID methods. Two alternative fault-tolerance methods that are sometimes used are described in the section Alternative Fault-Tolerance Methods. However, hardware-based RAID methods provide a much more robust and controlled fault-tolerance environment, so these alternative methods are seldom used.
RAID 0: No Fault Tolerance
A RAID 0 configuration (refer to Figure D-3 for an example) provides data striping, but there is no protection against data loss when a drive fails. However, it is useful for rapid storage of large amounts of non-critical data (for printing or image editing, for example), or when cost is the most important consideration.
Advantages
• Has the highest write performance of all RAID methods.
• Has the lowest cost per unit of stored data of all RAID methods.
• All drive capacity is used to store data (none is needed for fault tolerance).
Disadvantages
• All data on the logical drive is lost if a physical drive fails.
• Cannot use an online spare.
• Can only preserve data by backing it up to external drives.
RAID 1+0: Drive Mirroring
In a RAID 1+0 configuration, data is duplicated to a second drive.
[Figure: blocks B1 through B4 on physical drive P1 mirrored to physical drive P2]
When the array has more than two physical drives, drives are mirrored in pairs.
Figure D-6: Mirroring with more than two physical drives in the array
In each mirrored pair, the physical drive that is not busy answering other requests answers any read requests that are sent to the array. (This behavior is called load balancing.) If a physical drive fails, the remaining drive in the mirrored pair can still provide all the necessary data. Several drives in the array can fail without incurring data loss, as long as no two failed drives belong to the same mirrored pair. This fault-tolerance method is useful when high performance and data protection are more important than the cost of physical drives.
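The survival rule for a mirrored array (no data is lost as long as no two failed drives belong to the same mirrored pair) can be stated as a short check. In the sketch below the drive names and the way they are paired are hypothetical; the actual pairing is decided by the controller.

```python
def mirrored_array_survives(mirror_pairs, failed_drives):
    """True if no mirrored pair has lost both of its drives."""
    failed = set(failed_drives)
    return not any(a in failed and b in failed for a, b in mirror_pairs)

if __name__ == "__main__":
    # Hypothetical pairing of eight drives into four mirrored pairs.
    pairs = [("P1", "P2"), ("P3", "P4"), ("P5", "P6"), ("P7", "P8")]

    # Four failures, one per pair: data is still available.
    print(mirrored_array_survives(pairs, ["P1", "P3", "P5", "P7"]))

    # Two failures in the same pair: data is lost.
    print(mirrored_array_survives(pairs, ["P1", "P2"]))
```

The first case illustrates why up to half of the drives can fail without data loss, while the second shows that as few as two failures can be fatal if they hit the same pair.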
NOTE: When there are only two physical drives in the array, this fault-tolerance method is often referred to as RAID 1.
Advantages
• Has the highest read and write performance of any fault-tolerant configuration.
• No data is lost as long as no failed drive is mirrored to another failed drive (up to half of the physical drives in the array can fail).
Disadvantages
• This method is expensive (many drives are needed for fault tolerance).
• Only half of the total drive capacity is usable for data storage.
RAID 5: Distributed Data Guarding
In a RAID 5 configuration, data protection is provided by parity data (denoted by Px,y in Figure D-7). This parity data is calculated stripe by stripe from the user data that is written to all other blocks within that stripe. The blocks of parity data are distributed evenly over every physical drive within the logical drive.
[Figure D-7: data blocks B1 to B8 and parity blocks P1,2 to P7,8 distributed across the physical drives]
When a physical drive fails, data that was on the failed drive can be calculated from the remaining parity data and user data on the other drives in the array. This recovered data is usually written to an online spare in a process called a rebuild. This configuration is useful when cost, performance, and data availability are equally important.
Advantages
• Has high read performance.
• Data is not lost if only one physical drive fails.
• More drive capacity is usable than with RAID 1+0; parity information requires only the storage space equivalent to one physical drive.
Disadvantages
• Has relatively low write performance.
• Data is lost if a second drive fails before data from the first failed drive is rebuilt.
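This guide does not state how the controller calculates parity. As a minimal sketch, assuming the bytewise exclusive-OR (XOR) parity that is conventional for single-parity RAID, the following Python example computes the parity block for one stripe of a three-drive array and then recovers a lost data block from the surviving block and the parity. The block contents and the helper name xor_blocks are hypothetical.

```python
from functools import reduce


def xor_blocks(blocks):
    """Bytewise XOR of a list of equally sized blocks."""
    if len({len(b) for b in blocks}) != 1:
        raise ValueError("all blocks must have the same length")
    return bytes(reduce(lambda x, y: x ^ y, column) for column in zip(*blocks))


# One stripe on a three-drive array: two data blocks and one parity block.
b1 = b"\x01\x02"
b2 = b"\x10\x20"
parity = xor_blocks([b1, b2])

# Simulate losing the drive that held b2, then rebuild it from the survivors.
rebuilt = xor_blocks([b1, parity])
print(rebuilt == b2)
```

Under the XOR assumption, the same operation both generates the parity and regenerates a missing block, which is why the loss of any single drive in a stripe can be repaired.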
RAID ADG, like RAID 5, generates and stores parity information to protect against data loss caused by drive failure. With RAID ADG, however, two different sets of parity data are used (denoted by Px,y and Qx,y in Figure D-8), allowing data to still be preserved if two drives fail. Each set of parity data uses a capacity equivalent to that of one of the constituent drives.
[Figure D-8: data blocks B1 to B8 with two sets of parity blocks, P1,2 to P7,8 and Q1,2 to Q7,8, distributed across the physical drives]
This method is most useful when data loss is unacceptable, but cost is also an important factor. The probability that data loss will occur when an array is configured with RAID ADG is less than it would be if it were configured with RAID 5.
Advantages
• Has high read performance.
• Allows high data availability; any two drives can fail without loss of critical data.
• More drive capacity is usable than with RAID 1+0; parity information requires only the storage space equivalent to two physical drives.
Disadvantage
The main disadvantage of RAID ADG is a relatively low write performance (lower than RAID 5), because of the need for two sets of parity data.
Comparison of RAID Methods
Table D-1 summarizes the important features of the different kinds of RAID methods described here. The decision chart in Table D-2 may help you to determine which option is best for your situation.
Table D-1: Summary of RAID Methods
Feature | RAID 0 | RAID 1+0 | RAID 5 | RAID ADG*
Alternative name | Striping (no fault tolerance) | Mirroring | Distributed Data Guarding | Advanced Data Guarding
Usable drive space** | 100% | 50% | 67% to 93% | 50% to 96%
Usable drive space formula | n | n/2 | (n-1)/n | (n-2)/n
Minimum number of physical drives | 1 | 2 | 3 | 4
Tolerates failure of one physical drive? | No | Yes | Yes | Yes
Tolerates simultaneous failure of more than one physical drive? | No | Only if no two failed drives are in the same mirrored pair | No | Yes
*Not all controllers support RAID ADG. **Values for usable drive space are calculated with these assumptions: (1) all physical drives in the array have the same capacity; (2) online spares are not used; (3) no more than 14 physical drives are used per array for RAID 5; (4) no more than 56 drives are used with RAID ADG.
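The percentage ranges in Table D-1 follow from the formulas and from the drive-count limits in the footnote. The Python sketch below, with a hypothetical function name, reproduces them. It expresses every result as a fraction of the total capacity, so the RAID 0 and RAID 1+0 entries, which the table gives as n and n/2, become 1 and 1/2.

```python
def usable_fraction(raid_level, n):
    """Fraction of total capacity usable for data, per the Table D-1 formulas."""
    minimum = {"0": 1, "1+0": 2, "5": 3, "ADG": 4}
    if raid_level not in minimum:
        raise ValueError(f"unknown RAID level: {raid_level}")
    if n < minimum[raid_level]:
        raise ValueError(
            f"RAID {raid_level} needs at least {minimum[raid_level]} drives"
        )
    if raid_level == "0":
        return 1.0
    if raid_level == "1+0":
        return 0.5
    # RAID 5 gives up one drive's worth of space to parity, RAID ADG two.
    parity_drives = 1 if raid_level == "5" else 2
    return (n - parity_drives) / n


for level, n in [("5", 3), ("5", 14), ("ADG", 4), ("ADG", 56)]:
    print(f"RAID {level}, {n} drives: {usable_fraction(level, n):.0%}")
```

The four printed values are 67%, 93%, 50%, and 96%, which match the ends of the ranges listed for RAID 5 and RAID ADG.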
Controller Duplexing uses two identical controllers with independent, identical sets of drives containing identical data. In the unlikely event of a controller failure, the remaining controller and drives will service all requests.
Neither of these alternative fault-tolerance methods supports online spares or automatic data recovery, nor do they support auto-reliability monitoring or interim data recovery. If you decide to use one of these alternative fault-tolerance methods, configure your arrays with RAID 0 for maximum storage capacity and refer to your operating system documentation for further implementation details.
Appendix E Replacing, Moving, or Adding Hard Drives
For additional information about diagnosing hard drive problems, refer to the HP Servers Troubleshooting Guide.
CAUTION: Sometimes, a drive that has previously been failed by the controller may seem to be operational after the system is power-cycled or (for a hot-pluggable drive) after the drive has been removed and reinserted. However, continued use of such marginal drives may eventually result in data loss. Replace the marginal drive as soon as possible.
RAID 5 configurations can tolerate one drive failure. RAID ADG configurations can tolerate simultaneous failure of two drives.
After you have replaced the failed drives, the fault tolerance may again be compromised. If so, cycle the power again. If the 1779 POST message is displayed:
a. Press the F2 key to re-enable the logical drives.
b. Recreate the partitions.
c. Restore all data from backup.
To minimize the risk of data loss that is caused by compromised fault tolerance, make frequent backups of all logical volumes.
If you set the SCSI ID jumpers manually:
• Check the ID value of the removed drive to be sure that it corresponds to the ID of the drive marked as failed.
• Set the same ID value on the replacement drive to prevent SCSI ID conflicts.
Before replacing a degraded drive:
• Open Insight Manager and inspect the Error Counter window for each physical drive in the same array to confirm that no other drives have any errors. (For details, refer to the Insight Manager documentation on the Management CD.)
• Be sure that the array has a current, valid backup.
• Use replacement drives that have a capacity at least as great as that of the smallest drive in the array. The controller immediately fails drives that have insufficient capacity.
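The capacity rule for replacement drives amounts to a single comparison. A minimal sketch in Python, with a hypothetical function name and example capacities that are not taken from this guide:

```python
def replacement_is_acceptable(replacement_gb, array_drive_capacities_gb):
    """True if the replacement is at least as large as the smallest array drive."""
    if not array_drive_capacities_gb:
        raise ValueError("the array must contain at least one drive")
    return replacement_gb >= min(array_drive_capacities_gb)


array_gb = [36, 36, 72]  # hypothetical mixed-capacity array
print(replacement_is_acceptable(36, array_gb))  # same size as the smallest drive
print(replacement_is_acceptable(18, array_gb))  # smaller than every drive
```

The first call prints True and the second prints False, which corresponds to the case in which the controller would fail the replacement drive.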
To minimize the likelihood of fatal system errors, take these precautions when removing failed drives:
• Do not remove a degraded drive if any other drive in the array is offline (the Online LED is off). In this situation, no other drive in the array can be removed without data loss. These cases are the exceptions:
   - When RAID 1+0 is used, drives are mirrored in pairs. Several drives can be in a failed condition simultaneously (and they can all be replaced simultaneously) without data loss, as long as no two failed drives belong to the same mirrored pair.
   - When RAID ADG is used, two drives can fail simultaneously (and be replaced simultaneously) without data loss.
   - If the offline drive is a spare, the degraded drive can be replaced.
• Do not remove a second drive from an array until the first failed or missing drive has been replaced and the rebuild process is complete. (The rebuild is complete when the Online LED on the front of the drive stops blinking.) These cases are the exceptions:
   - In RAID ADG configurations, any two drives in the array can be replaced simultaneously.
   - In RAID 1+0 configurations, any drives that are not mirrored to other removed or failed drives can be simultaneously replaced offline without data loss.
Time Required for a Rebuild
The time required for a rebuild varies considerably, depending on several factors:
• The priority that the rebuild is given over normal I/O operations (you can change the priority setting by using ACU)
• The amount of I/O activity during the rebuild operation
• The rotational speed of the hard drives
• The availability of drive cache
• The brand, model, and age of the drives
• The amount of unused capacity on the drives
• The number of drives in the array (for RAID 5 and RAID ADG)
Allow approximately 15 minutes per gigabyte for the rebuild process to be completed. This figure is conservative, and newer drive models usually require less time to rebuild. System performance is affected during the rebuild, and the system is unprotected against further drive failure until the rebuild has finished. Therefore, replace drives during periods of low activity when possible.
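The 15-minute guideline gives a quick upper estimate for planning a maintenance window. A minimal sketch in Python, assuming that the figure applies to the capacity of the drive being rebuilt (the function name and the 72 GB example are hypothetical):

```python
def estimated_rebuild_hours(capacity_gb, minutes_per_gb=15.0):
    """Conservative rebuild estimate in hours, using the 15 min/GB guideline."""
    if capacity_gb < 0 or minutes_per_gb < 0:
        raise ValueError("capacity and rate must be non-negative")
    return capacity_gb * minutes_per_gb / 60.0


print(estimated_rebuild_hours(72))  # 18.0
```

Because the guideline is conservative, the result is best read as a ceiling for scheduling purposes rather than as a prediction.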
CAUTION: If the Online LED of the replacement drive stops blinking and the amber Fault LED glows, or if other drive LEDs in the array go out, the replacement drive has failed and is producing unrecoverable disk errors. Remove and replace the failed replacement drive.
When automatic data recovery has finished, the Online LED of the replacement drive stops blinking and begins to glow steadily.
Failure of Another Drive During Rebuild
If a non-correctable read error occurs on another physical drive in the array during the rebuild process, the Online LED of the replacement drive stops blinking and the rebuild abnormally terminates. If this situation occurs, reboot the server. The system may temporarily become operational for long enough to allow recovery of unsaved data. In any case, locate the faulty drive, replace it, and restore data from backup.
CAUTION: Because it can take up to 15 minutes per gigabyte to rebuild the data in the new configuration, the system is unprotected against drive failure for many hours while a given drive is upgraded. Perform drive capacity upgrades only during periods of minimal system activity.
To upgrade hard drive capacity:
1. Back up all data.
2. Replace any drive. The data on the new drive is recreated from redundant information on the remaining drives.
CAUTION: Do not replace any other drive until data rebuild on this drive is complete.
3. When data on the new drive has been rebuilt (the Activity LED turns off), repeat the previous step for the other drives in the array, one at a time.
When you have replaced all drives, you can use the extra capacity to either create new logical drives or extend existing logical drives. For more information about these procedures, refer to the HP Array Configuration Utility User Guide.
• The controller is not running capacity expansion, capacity extension, or RAID or stripe size migration.
• The controller is using the latest firmware version (recommended).
If you want to move an array to another controller, you must also consider the following additional limitations:
• All drives in the array must be moved at the same time.
• In most cases, a moved array (and the logical drives that it contains) can still undergo array capacity expansion, logical drive capacity extension, or migration of RAID level or stripe size. An exception occurs when the array meets all of these conditions:
   - It was originally created on a SMART-2/P, SMART-2DH, SA-3200, SA-3100ES, SA-4200, SA-4250ES, or SA-530x controller.
   - It is moved to a controller that does not have a battery-backed cache.
   - It has less than 4 MB of unused capacity.
• If a controller contains a RAID ADG logical volume, none of the arrays on the controller can be moved directly to a controller that does not support RAID ADG. (The arrays can be moved indirectly, as described by the instructions in this section.)
When all the conditions have been met:
1. Back up all data before removing any drives or changing configuration. This step is required if you are moving data-containing drives from a controller that does not have a battery-backed cache.
2. Power down the system.
3. If you are moving an array from a controller that contains a RAID ADG logical volume to a controller that does not support RAID ADG:
   a. Remove or disconnect the drives that contain the RAID ADG logical volume.
   b. Reboot the server.
   c. Open ACU and navigate to the controller that contained the RAID ADG volume. ACU displays the missing RAID ADG volume using a different icon to indicate that the volume is unavailable.
   d. Delete the RAID ADG volume.
   e. Accept the configuration change, and then close ACU.
   f. Power down the system.
4. Move the drives.
5. Power up the system. If a 1724 POST message is displayed, drive positions were changed successfully and the configuration was updated. If a 1785 POST message is displayed:
   a. Power down the system immediately to prevent data loss.
   b. Return the drives to their original locations.
   c. Restore the data from backup, if necessary.
6. Check the new drive configuration by running ORCA or ACU.
Adding Drives
You can add hard drives to a system at any time, as long as you do not exceed the maximum number of drives that the controller supports. You can then either build a new array from the added drives or use the extra storage capacity to expand the capacity of an existing array. To perform an array capacity expansion, use ACU. If the system is using hot-pluggable drives, you can expand array capacity without shutting down the operating system (that is, with the server online) if ACU is running in the same environment as the normal server applications. (For more information, refer to the HP Array Configuration Utility User Guide.)
The expansion process is illustrated in the following figure, in which the original array (containing data) is shown with a dashed border and the newly added drives (containing no data) are shown unshaded. The array controller adds the new drives to the array and redistributes the original logical drives over the enlarged array one logical drive at a time. This process liberates some storage capacity on each of the physical drives in the array. During this procedure, the logical drives each keep the same fault-tolerance method in the enlarged array that they had in the smaller array.
When the expansion process has finished, you can use the liberated storage capacity on the enlarged array to create new logical drives. Alternatively, you can enlarge one of the original logical drives. This latter process is called logical drive capacity extension and is also carried out using ACU.
Appendix F Probability of Logical Drive Failure
The probability that a logical drive will fail depends on the RAID level setting and on the number and type of physical drives in the array. If the logical drive does not have an online spare, the following results apply.
• A RAID 0 logical drive fails if only one physical drive fails.
• A RAID 1+0 logical drive fails if any two failed physical drives are mirrored to each other.
   - The maximum number of physical drives that can fail without causing failure of the logical drive is n/2, where n is the number of hard drives in the array. In practice, a logical drive usually fails before this maximum is reached. As the number of failed physical drives increases, it becomes increasingly likely that the newly failed drive is mirrored to a previously failed drive.
   - The minimum number of physical drive failures that can cause the logical drive to fail is two. This situation occurs when the two failed drives are mirrored to each other. As the total number of drives in the array increases, the probability that the only two failed drives in an array are mirrored to each other decreases.
• A RAID 5 logical drive fails if two physical drives fail.
• A RAID ADG logical drive fails when three physical drives fail.
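The effect of array size on a RAID 1+0 array with exactly two failed drives can be checked by counting. The Python sketch below assumes that every combination of two drives is equally likely to be the failed pair, a simplification that this guide does not state. The pair numbering (drives 2k and 2k+1 mirrored) and the function name are also hypothetical.

```python
from fractions import Fraction
from itertools import combinations


def prob_two_failures_share_pair(n):
    """Probability that two randomly chosen failed drives mirror each other."""
    if n < 2 or n % 2:
        raise ValueError("RAID 1+0 needs an even number of drives, at least 2")
    # Assumed numbering: drives 2k and 2k+1 form a mirrored pair.
    pairs = list(combinations(range(n), 2))
    fatal = sum(1 for a, b in pairs if a // 2 == b // 2)
    return Fraction(fatal, len(pairs))


for n in (2, 4, 8, 14):
    print(n, prob_two_failures_share_pair(n))
```

Under these assumptions the result is 1/(n-1): certain failure for a two-drive array, 1/3 for four drives, 1/7 for eight, and 1/13 for fourteen.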
At any given RAID level, the probability of logical drive failure increases as the number of physical drives in the logical drive increases. This is illustrated more quantitatively in Figure F-1. The data for this graph is calculated from the mean time between failures (MTBF) value for a typical physical drive, assuming that no online spares are present. If an online spare is added to any of the fault-tolerant RAID configurations, the probability of logical drive failure is further decreased.
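[Figure F-1: graph of the probability of logical drive failure against the number of physical drives in the logical drive, with curves for RAID 0, RAID 5, RAID 1+0, and RAID ADG]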
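The increase in failure probability with drive count can be illustrated with a simple binomial model. This is not the calculation behind Figure F-1, which uses MTBF data that this guide does not reproduce. The sketch assumes that each physical drive fails independently with the same probability p during some fixed period, that no failed drive is replaced or rebuilt during that period, and that no online spare is present. The value p = 0.03 and all function names are purely illustrative.

```python
from math import comb


def p_at_least(k, n, p):
    """Probability that at least k of n independent drives fail."""
    return 1.0 - sum(comb(n, i) * p**i * (1 - p) ** (n - i) for i in range(k))


def logical_drive_failure_probability(raid_level, n, p):
    """Illustrative model only; p is an assumed per-drive failure probability."""
    if raid_level == "0":
        return p_at_least(1, n, p)  # any single failure is fatal
    if raid_level == "5":
        return p_at_least(2, n, p)  # a second failure is fatal
    if raid_level == "ADG":
        return p_at_least(3, n, p)  # a third failure is fatal
    if raid_level == "1+0":
        if n % 2:
            raise ValueError("RAID 1+0 needs an even number of drives")
        # Fatal when both drives of at least one mirrored pair fail.
        return 1.0 - (1.0 - p * p) ** (n // 2)
    raise ValueError(f"unknown RAID level: {raid_level}")


P_DRIVE = 0.03  # hypothetical value, not taken from this guide
for n in (4, 8, 16):
    row = ", ".join(
        f"RAID {level}: {logical_drive_failure_probability(level, n, P_DRIVE):.5f}"
        for level in ("0", "5", "1+0", "ADG")
    )
    print(f"{n:2d} drives -> {row}")
```

In this model every RAID level shows a higher failure probability as drives are added, and RAID 0 is always the most exposed, followed by RAID 5 and then RAID ADG.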
Appendix G Troubleshooting
Several diagnostic tools provide feedback about problems with arrays. The most important are:
• ADU: This utility can be downloaded from the HP website, http://www.hp.com/support. The meanings of the various ADU error messages are provided in the HP Servers Troubleshooting Guide.
• POST Messages: Smart Array controllers produce diagnostic error messages at reboot. Many of these POST messages are self-explanatory and suggest corrective actions. For more information about POST messages, refer to the HP Servers Troubleshooting Guide.
• Server Diagnostics: To use Server Diagnostics:
   a. Insert the SmartStart CD into the server CD-ROM drive.
   b. Click Agree when the license agreement is displayed, and select the Maintenance tab.
   c. Click Server Diagnostics, and follow the on-screen prompts and instructions.
Index
automatic data recovery description of E-7 limitation of D-12
A
ACR (Array Configuration Replicator) 5-1 ACU (Array Configuration Utility) 5-1 ADG See RAID ADG ADU (Array Diagnostics Utility) G-1 advanced data guarding See RAID ADG alert, predictive failure E-2 array adding hard drives to E-11 defined D-3 manual configuration of, using ORCA 5-3 mixing drive capacities in 5-1 moving E-9 online spares in D-5 physical limitations of D-4 array capacity expansion E-11 Array Configuration Replicator (ACR) 5-1 Array Configuration Utility (ACU) 5-1 array controller configuration of 5-1 dimensions of C-1 driver installation for 6-1 duplexing of D-12 installation of 1-1, 2-1 power requirements of C-1 Array Diagnostics Utility (ADU) G-1 authorized reseller x
B
batteries, recycling A-6 block of data, defined D-2 boot controller, setting 4-1 boot straps, using B-1
C
cables FCC compliance statement for A-3 part numbers for 2-4 cabling instructions 2-2 capacity extension E-12 capacity upgrade of hard drives E-8 comparison of ACU with ORCA 5-2 of different RAID methods D-10 of hardware-based RAID with softwarebased RAID D-11 of logical drive failure risk for different RAID levels F-3 of RAID methods with other faulttolerance methods D-11
Index-1
Index
configuring array controller 5-1 SCSI ID settings 2-2 server 4-1 controller configuration of 5-1 dimensions of C-1 driver installation for 6-1 duplexing of D-12 installation of 1-1, 2-1 power requirement of C-1 controller installation flowcharts for 1-1 precautions during 2-2 controller order, setting 4-1
E
ESD (electrostatic discharge) B-1 expanding an array E-11 extending a logical drive E-12 external drives 2-4 external storage, powering up and down 2-1
F
fault tolerance See also RAID methods alternative methods of D-11 compromised E-4 controller duplexing as D-12 description of methods D-5 software-based RAID as D-11 FCC notices A-1 features of ACU 5-2 of controller C-1 of ORCA 5-2 of RAID methods D-10 Federal Communications Commission notices A-1 firmware, updating 3-1 flowcharts, controller installation 1-1
D
data block, defined D-2 data protection methods non-RAID D-11 RAID D-5 data rebuild time E-7 data recovery, automatic E-7 data stripes, defined D-2 data transfer rate C-1 Declaration of Conformity A-3 device drivers, installing 6-1 device priority, setting 2-2 diagnosing problems error messages in POST G-1 general G-1 hard drive E-3 dimensions of controller C-1 distributed data guarding (RAID 5) D-8 drive array See array drive failure POST notification of E-3 probability graph F-3 replacing drive after E-5 drive mirroring (RAID 1+0) D-6 drive status LEDs E-1 drivers, installing and updating 6-1
G
graph, drive failure probability F-3 grounding methods B-1
H
hard drive failure detection of E-3 fault tolerance and D-10 multiple, simultaneous D-10 protection against D-5 recognizing E-1 replacing drive after E-5
Index-2
Index
hard drive status LEDs, interpreting pattern of E-2 hard drives adding, to array E-11 capacity of, restrictions on 5-1 different capacity of, on array 5-1 failure of E-3 interpreting status LEDs on E-2 larger capacity, using, in array E-8 LEDs of E-1 minimum number of, for RAID D-10 moving E-9 replacing E-5 status lights on E-1 upgrading capacity of E-8 heel straps, using B-1 hot spare D-4 HP website x
logical drives compared to array D-3 creation of, with ORCA 5-3 defined D-2 enlarging (extending) E-12 failure of E-4 recovery of, options for E-4
M
manual configuration of array 5-3 maximum number of hard drives for RAID 5 D-10 for RAID ADG D-10 minimum number of hard drives for RAID D-10 mirroring of drives D-6 moving drives E-9 multiple hard drive failure D-10
I
Insight Manager 6-1 installing controller hardware 2-1 controller, flowcharts for 1-1 device drivers 6-1 interim data recovery, limitation of D-12 internal drives, connecting 2-3
N
no fault tolerance (RAID 0) D-5
O
online drive capacity upgrade E-8 online spare D-4 option kit part numbers for cables 2-4 Option ROM Configuration for Arrays See ORCA options ROM, updating 3-1 ORCA (Option ROM Configuration for Arrays) 5-1 overview of installation process 1-1
J
jumpers, setting 2-2
L
LEDs on hard drives E-1 load balancing, defined D-7 logical drive capacity extension E-12
P
parity data in RAID 5 D-8 in RAID ADG D-9 part numbers for cables 2-4 parts, handling and storing B-1 peripherals, SCSI ID of 2-2
Index-3
Index
physical drives See hard drives POST messages G-1 power requirements of controller C-1 powering system up and down, caution for 2-1 precautions against ESD B-1 for installing controller 2-2 for setting SCSI IDs 2-2 predictive failure alert E-2 protecting data alternative methods D-11 RAID methods D-5
ORCA 5-1 POST G-1 RBSU 4-1 ROM, updating 3-1 ROM-Based Setup Utility (RBSU) 4-1
S
SCSI bus termination requirement 2-3 SCSI bus, termination of C-1 SCSI IDs, setting 2-2 server configuration 4-1 SmartStart CD, updating firmware using 3-1 spare drives, defined D-4 static-safe containers B-1 status LEDs, on drives E-1 striping data, defined D-2 summary of RAID method features D-10 Support Software CD, updating firmware using 3-1 symbols in text viii system ROM, updating 3-1
R
RAID 0 (no fault tolerance) D-5 RAID 1+0 (drive mirroring) D-6 RAID 5 (distributed data guarding) D-8 RAID ADG (advanced data guarding) D-9 RAID methods See also fault tolerance comparison with alternative faulttolerance methods D-11 comparison with each other D-10 selection chart for D-11 software-based D-11 summary of features D-10 RBSU (ROM-Based Setup Utility) 4-1 rebuild description of E-7 time required for E-7 recovering data, general information for E-4 replacing hard drive E-5 reprogramming ROM 3-1 resources ACR (Array Configuration Replicator) 5-1 ACU 5-1 ADU (Array Diagnostics Utility) G-1 automatic data recovery E-7 Insight Manager 6-1 Management Agents 6-1
T
technical support ix telephone numbers ix, x termination of SCSI bus 2-3, C-1 time needed for data rebuild E-7 troubleshooting See also POST messages general G-1 hard drive problems E-1
U
updating device drivers 6-1 firmware 3-1 Management Agents 6-1 upgrading hard drive capacity E-8
Index-4
Index
utilities ACR (Array Configuration Replicator) 5-1 ACU 5-1 Array Diagnostics Utility G-1 Insight Manager 6-1 ORCA 5-1 POST G-1
RBSU 4-1
W
website, HP x wrist straps B-1
Index-5