Technology Brief

Maximizing Solaris Quality of Service: the Role of Fibre Channel HBAs


What to Look For in No-Reboot Solaris Drivers
Introduction

Quality of service (QoS) was once a metric that applied only to data networks, and referred to the fact that specific performance characteristics (such as latency, bandwidth and error rates) can be measured, improved and even guaranteed to users or applications. As servers and storage have become more critical to businesses of all types, the concept of quality of service has extended to the highly complex systems formed by enterprise servers connected to storage area networks (SANs). In this environment, QoS means the reliable, effective combined operation of every component, from the server hardware to the server operating system, applications and device drivers, to the disk arrays, switches and HBAs (the host bus adapters that link the server to the networked storage).

HBAs are a critical ingredient in QoS. The behavior of the HBA (such as its ability to establish low-level connections at boot time or during card insertions and removals, and to make judicious use of interrupts) affects server hardware initialization and management because the HBA resides on the server's PCI bus. It also affects the server software, since the HBA's drivers interact with the server operating system and the system management tools running on the server. HBAs also affect the SAN through their roles as protocol converters and I/O initiators, and as objects managed by SAN directories.

QoS encompasses reliability, availability, efficiency, usability and maintainability (the ability to manage changes and recover from errors). These issues are particularly important for mission-critical applications supported by enterprise-class systems such as high-end Sun Solaris servers. The HBA and its driver must support very specific capabilities to allow customers to discover, connect to, configure and manage dynamic changes to storage devices without expensive server downtime or reboots.

www.emulex.com

What is SAN Quality of Service?

In a networked storage environment, QoS is a combination of reliability, availability, efficiency, usability and maintainability.

Reliability: Do your mission-critical systems, and their associated storage networks, run when you need them?
Availability: Do your servers and associated storage remain available even as components are upgraded and replaced?
Efficiency: Can you make the most efficient, cost-effective use of your storage components and your storage management staff?
Usability: How easy is it for you to monitor system behavior, apply changes and verify the effect of these changes?
Maintainability: Can you find and fix problems quickly and easily?

Reliability is most often defined as the percent of the time a system is available compared to the time it is needed. One critical reliability factor, of course, is the mean time between failures for hardware such as the HBAs. Another factor that is equally important, but often overlooked, is the quality and depth of the interoperability testing done to ensure that the drivers and firmware provided by each hardware vendor work with the applications, firmware and hardware from other leading vendors. Subtle interoperability problems, in which a specific combination of system components fails to communicate, can cause productivity-sapping application outages. Such incompatibilities are also often harder to pinpoint than a basic failure, such as a driver that does not load
or cannot see its target storage devices. The best insurance against such vexing issues is to verify that the vendors of these components (storage arrays, switches, HBAs, servers, databases, backup applications, etc.) have tested their components to be sure they work with each other. Storage array and application vendors who certify complete solutions describe in detail a composite environment, including a specific version of the OS, patch levels, HBAs and driver versions, which they have tested and pledge to fully support. When a customer purchases such a complete solution, the vendor takes responsibility for fixing any interoperability problems rather than engaging in finger-pointing with other technology providers over whose fault the problem is. This results in faster and better service for customers, lower support costs and higher application uptime.

Availability means keeping the application running as necessary changes are made to software (such as security patches to Solaris or upgrading to a new version of a business application) or to hardware (such as adding or upgrading drive arrays to increase storage capacity). The storage environment also changes as applications are moved from test to production environments, or as business partners are given access to databases to facilitate e-commerce or joint development efforts.

Efficiency means getting the most use out of hardware and software at any given moment, as well as over the longer term, without having to retire hardware or software before its useful life has ended. At any given moment, efficiency can be measured as the percent of the available storage and I/O bandwidth being used, as well as the cost (in staff and other overhead) to manage each terabyte of data. Factors influencing such short-term efficiency include the ability to easily make use of new resources (such as additional storage arrays,
a capability supported by no-reboot drivers). Another critical factor is the throughput of storage network components such as HBAs and their associated drivers, as well as the ability to tune, or throttle, HBAs so they don't overload slower disks or disks that are temporarily doing multiple tasks (such as performing backup while providing data to a transaction processing application). In such a complex environment, there is no single optimal set of parameters or configuration variables that will make all systems perform efficiently, but rather a sweet spot that each system will reach based on the talent of its administrator and the tools available to that administrator. Among other capabilities, such tools should allow the administrator to dynamically tune key HBA-controlled parameters such as wait times and queue depth to optimize throughput for a specific application, server and storage configuration. They should also provide the ability to coalesce I/O (to adjust the rates at which data is streamed within the server) so as to make optimum use of server resources such as the CPU and the server bus.

Long-term efficiency means getting the most from the customer's investment in hardware and software over the life of those products. This means ease of migration and upgradability: the ability to use the same drivers with multiple generations of HBAs, as well as with multiple versions of operating systems. It also requires stable interfaces between hardware (such as a new generation of adapters) and the software drivers for the new adapters, which enable the customer to build on previous vendor cross-testing, the customer's own experience and their skill in finding the efficiency sweet spot for their storage systems.

Usability involves the ability to monitor the behavior of all system components, especially the HBA that controls access to storage resources, for real-time failure detection and to gather ongoing performance
and error rate information. Administrators should be able to use the same graphical environment to immediately view changes to the configuration, and to fine-tune the environment to reach the system's efficiency sweet spot.

Maintainability is the ability to effectively and cost-efficiently detect, diagnose and solve problems that affect the availability, reliability or efficiency of the storage network. This means discovering problems through alerts, and troubleshooting those problems by discovering and analyzing the status of components in the storage network in real time through dynamic logging.

No-Reboot Solaris Drivers

No-reboot storage drivers allow Solaris customers to change the physical or logical storage environment (or to fix problems) without time-consuming reboots of the server that relies on that storage. Rebooting an enterprise database server in a complex networked-storage environment can take 30 to 45 minutes or more, as the server finds and recognizes all components in the storage network such as HBAs, switches and storage arrays, as well as all the links and tables in the application, database, directories, messaging platforms, naming services and security managers. The boot sequence is a delicate time, in which a number of factors, such as components that are temporarily offline or file systems that fail to load because they have grown beyond their allocated limits, can abort the reboot, triggering in turn more troubleshooting and delays. Avoiding the delays and risks involved in any reboot is particularly crucial for large servers running mission-critical applications, such as high-end Sun Solaris servers.

Many vendors promise no-reboot capabilities, but not all no-reboot drivers are created equal. Customers should demand no-reboot storage drivers that have been tested with a wide variety of adapters, with multiple versions of Solaris,
and that support high-availability solutions such as server clustering. To prevent time-consuming server crashes and compatibility glitches, the driver must be tested for compatibility with mainstream storage management tools, as well as with leading application and database platforms.

The first no-reboot requirement is the ability to insert a new adapter, or to hot-swap an adapter, without bringing down the server. This in turn requires two capabilities. The first is that the new adapter can be recognized by the server. Solaris accomplishes this through Sun's Dynamic Reconfiguration, which sends a query from the server's PCI bus to the adapter firmware. The firmware responds with the identity of the HBA, enabling the server's firmware to report to the operating system the presence and identity of the HBA, after which Solaris associates the HBA with the proper driver. The second is that the driver can add the new adapter to the server's configuration, and initiate a SCSI query to find out what targets (such as disk arrays) may be connected to this adapter.

Customers should also choose no-reboot drivers that support as many common storage functions as possible. This includes the addition of new storage arrays and new HBAs to accommodate additional networked storage or new storage topologies. Even a no-reboot driver can slow performance if it must interrupt the I/O stream while it adjusts to a change in the server or storage environment. Customers should look for no-reboot drivers that can perform such tasks dynamically, without interrupting the flow of data. Among the tasks a no-reboot driver should handle dynamically are the discovery of new storage targets, such as disk arrays or tape libraries; the provisioning (creation) of new logical units (LUNs); the updating of firmware or drivers in switches or HBAs; and updates to Solaris and the storage management tools, databases or business applications running on it.
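To put the 30-to-45-minute reboot times quoted in this section in perspective, the arithmetic below estimates how much availability a server loses to reboots alone. It is a minimal sketch: the reboot count of twelve per year is a hypothetical figure chosen for illustration, not a number from this brief, and the function name is likewise an assumption.

```python
# Rough availability arithmetic. The reboot count used below is a
# hypothetical input; only the 30 and 45 minute durations come from the text.
MINUTES_PER_YEAR = 365 * 24 * 60  # 525,600 minutes in a non-leap year


def availability_percent(reboots_per_year: int, minutes_per_reboot: float) -> float:
    """Percent of the year the server is up if reboots are the only downtime."""
    downtime = reboots_per_year * minutes_per_reboot
    return 100.0 * (MINUTES_PER_YEAR - downtime) / MINUTES_PER_YEAR


if __name__ == "__main__":
    for minutes in (30, 45):  # reboot durations quoted in the text
        pct = availability_percent(reboots_per_year=12, minutes_per_reboot=minutes)
        print(f"12 reboots of {minutes} min each: {pct:.3f}% available")
```

Under these assumptions, twelve 45-minute reboots amount to nine hours of downtime a year, which by itself limits availability to roughly 99.9 percent before any other outage is counted.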

No-reboot drivers should also support the dynamic addition of storage resources. In most cases, customers will want to associate this new resource discovery with persistent binding, which assigns the same SCSI device identifier (SCSI ID) to a physical storage device each time the server reboots. Without persistent binding, Solaris might assign a different identifier to a storage device each time the server reboots, based on the order in which the operating system recognizes the devices. If and when a physical drive needs to be replaced, administrators should also be able to assign that SCSI ID to a new physical device without rebooting the server.

The driver's ability to support network changes is also crucial. It allows, for example, changes in the storage network topology to support new protocols or higher speeds, and tuning (or throttling) of the rate at which HBAs send data to target devices. An administrator might want to set different timeouts or different-sized data queues for different targets, depending on the characteristics of the storage hardware and the needs of the application accessing it.

Customers should also determine whether features such as dynamic LUN and target throttling are truly dynamic and automatic. Some drivers, for example, require that all I/O to a specific HBA be stopped before the throttle (the number of I/O requests sent to the array) can be reset. In addition, only some drivers can automatically reset the throttle to maximize throughput, while others require the administrator to reset the throttle manually.

Finally, maintainability demands that no-reboot drivers support dynamic event logging, which allows an administrator to begin recording the status and performance of networked storage components without reconfiguring the system, rebooting or even interrupting I/O. This is important because rebooting, or interrupting the flow of data, might disturb the problem condition the administrator
is trying to diagnose. The administrator should also be able to selectively specify the types of messages that will be recorded, such as initialization events, link discovery events, Fibre Channel Protocol (FCP) events or Internet Protocol (IP) events.

The Emulex Solaris No-Reboot Driver

The latest 6.00 generation of the Emulex Solaris No-Reboot Driver is uniquely suited to deliver the highest reliability, availability, efficiency and resiliency to mission-critical storage environments. First, it is tested with the widest variety of storage hardware and software, supporting all versions of Solaris, including 2.6, 7, 8 and 9, as well as the Sun Cluster 3.0 solution for clustering Solaris servers. Emulex adapters are extremely reliable, with a field-measured MTBF (Mean Time Between Failures) in excess of 4 million hours (or 456 years of nonstop operation). Emulex drivers and adapters fully support Sun's Dynamic Reconfiguration, enabling a customer to hot-plug or hot-swap adapters during operation as needed.

To provide the maximum availability, the Emulex No-Reboot Driver supports dynamic target discovery and configuration changes. This provides not only persistent binding but also Emulex-exclusive capabilities such as dynamic target and LUN throttling. It also supports I/O coalescing, which allows storage administrators to optimize the flow of data to make the most efficient use of server I/O resources. The 6.00 driver allows administrators to dynamically fine-tune 17 HBA parameters, giving storage customers an industry-leading level of fine-tuned control in selecting their operational sweet spot.

With the Emulex Solaris No-Reboot Driver, storage managers can dynamically load or unload drivers for testing or updating without either rebooting the server or interrupting I/O operations. Another compelling feature is the ability to update adapter firmware on the fly while the adapter is in service.
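The MTBF figure quoted above can be checked with simple arithmetic. The sketch below converts hours to years and, under the usual constant-failure-rate assumption, estimates how many failures a fleet of adapters would see per year. The fleet size of 100 adapters is a hypothetical input, and the function names are illustrative only. MTBF is a population statistic, so the result should not be read as the expected service life of any single adapter.

```python
HOURS_PER_YEAR = 24 * 365  # 8,760 hours, ignoring leap years


def mtbf_hours_to_years(mtbf_hours: float) -> float:
    """Express an MTBF given in hours as years of continuous operation."""
    return mtbf_hours / HOURS_PER_YEAR


def expected_failures_per_year(mtbf_hours: float, installed_units: int) -> float:
    """Expected failures per year across a fleet, assuming a constant failure rate."""
    return installed_units * HOURS_PER_YEAR / mtbf_hours


if __name__ == "__main__":
    mtbf = 4_000_000  # hours, the field-measured figure quoted in the text
    print(f"MTBF in years: {mtbf_hours_to_years(mtbf):.1f}")
    # Hypothetical fleet of 100 adapters running around the clock
    print(f"Expected failures per year: {expected_failures_per_year(mtbf, 100):.3f}")
```

The conversion gives about 456.6 years, consistent with the 456 years stated in the brief, and a hypothetical fleet of 100 adapters would expect roughly 0.2 failures per year.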

On-the-fly updates allow adapter firmware to be replaced without interrupting operation of the adapter, the server or the application, unlike competitive products that require one or, in some cases, two reboots to perform the same operation.

To aid troubleshooting and diagnostics without slowing performance, the driver also allows operators to create a log file to document and isolate issues in the SAN (not just in the HBA) without interrupting I/O, and provides a set of filters that allow the user to select which types of issues will be logged.

The Emulex No-Reboot Driver also provides industry-leading usability, thanks to the HBAnyware graphical management suite, which allows users to visualize their SAN-attached storage configuration, from each HBA port down to the individual array and LUN, and to monitor offline targets and controllers, I/O rates, queue depths and a variety of additional data that support tuning and troubleshooting decisions. It also provides point-and-click capabilities for all the parameter changes described above, creating an interactive optimization environment.

Like all Emulex drivers, the Solaris No-Reboot Driver is portable across all generations of Emulex HBAs, as well as among all versions of Solaris, to provide the maximum investment protection for customers. This backwards compatibility, which is unique to Emulex, allows customers to use the same driver on several generations of adapters, which minimizes test issues while maximizing the lifespan of the customer's HBAs. Finally, it is backed by the testing, quality assurance and industry partnerships offered by the world leader in HBAs, with over 1.2 million HBAs installed worldwide.
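The idea of selecting which message types are logged can be illustrated with a bitmask filter. The sketch below is purely conceptual: the class, function and event names are hypothetical, modeled on the message types named earlier in this brief (initialization, link discovery, FCP and IP events), and do not represent the actual interface or parameters of the Emulex driver.

```python
from enum import Flag, auto


class LogEvent(Flag):
    """Hypothetical event classes, named after the message types in the text."""
    INIT = auto()
    LINK_DISCOVERY = auto()
    FCP = auto()
    IP = auto()


def should_log(event: LogEvent, mask: LogEvent) -> bool:
    """Return True if the event's class is enabled in the filter mask."""
    return bool(event & mask)


def filter_events(events, mask):
    """Keep only the (kind, message) pairs whose kind is enabled in the mask."""
    return [(kind, msg) for kind, msg in events if should_log(kind, mask)]


if __name__ == "__main__":
    # Illustrative messages only; these are not real driver log lines.
    events = [
        (LogEvent.INIT, "adapter initialized"),
        (LogEvent.LINK_DISCOVERY, "link up"),
        (LogEvent.FCP, "command timeout on target"),
        (LogEvent.IP, "interface configured"),
    ]
    # Record only link discovery and FCP events while diagnosing a SAN issue
    mask = LogEvent.LINK_DISCOVERY | LogEvent.FCP
    for kind, msg in filter_events(events, mask):
        print(kind.name, msg)
```

Because the mask is just a value, it can be changed while the system runs, which is the property the brief asks for: logging can be narrowed or widened without reconfiguring the system or interrupting I/O.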

Conclusion

As networked storage becomes more mission critical, quality of service, a term once reserved for data networks, becomes more vital for storage networks. Quality of service in a storage network requires drivers that can provide not only no-reboot capabilities such as discovery of new hardware and LUNs (logical units), but also storage management functions that enhance the reliability, availability, usability and maintainability of the storage network.

System administrators interested in technical tips related to specific Solaris HBA driver operating instructions should see the document entitled: Emulex Solaris Drivers - No-Reboot Operation (http://www.emulex.com/products/white/index.html)

About Emulex

Emulex Corporation is the world leader in Fibre Channel HBAs and delivers a broad range of intelligent building blocks for next generation storage networking systems. Emulex ranked number 16 in the Deloitte 2004 Technology Fast 50. The world's leading server and storage providers rely on Emulex HBAs, embedded storage switching and I/O controller products to build reliable, scalable and high performance storage solutions. The Emulex award-winning product families, including its LightPulse HBAs and InSpeed embedded storage switching products, are based on internally developed ASIC, firmware and software technologies, and offer customers high performance, scalability, flexibility and reduced total cost of ownership. The company's products have been selected by the world's leading server and storage providers, including Dell, EMC, Fujitsu Ltd., Fujitsu Siemens, Bull, HP, Hitachi Data Systems, IBM, NEC, Network Appliance, Quantum Corp., StorageTek, Sun Microsystems, Unisys and Xyratex. In addition, Emulex includes industry leaders Brocade, Computer Associates, Intel, McDATA, Microsoft and VERITAS among its strategic partners. Corporate headquarters are located in Costa Mesa, California. News releases and other information about Emulex Corporation are available at http://www.emulex.com.

This report is the property of Emulex Corporation and may not be duplicated without permission from the Company. | October, 2004

This document refers to various companies and products by their trade names. In most, if not all cases, their respective companies claim these designations as trademarks or registered trademarks. This information is provided for reference only. Although this information is believed to be accurate and reliable at the time of publication, Emulex assumes no responsibility for errors or omissions. Emulex reserves the right to make changes or corrections without notice.

Corporate HQ 3333 Susan Street Costa Mesa CA 92626 714 662.5600 | Wokingham U.K. (44) 118 977 2929 | Paris France (33) 41 91 19 90 | Beijing China (86 10) 6849 9547
05-092 10/04