
Data Centers

Comparing Data Center & Computer Thermal Design


By Michael K. Patterson, Ph.D., P.E., Member ASHRAE; Robin Steinbrecher; and Steve Montgomery, Ph.D.

The design of cooling systems and thermal solutions for today's data centers and computers is handled by skilled mechanical engineers using advanced tools and methods. These engineers work in two different areas: those who are responsible for designing cooling for computers and servers, and those who design data center cooling. Unfortunately, a lack of understanding exists about each other's methods and design goals. This can lead to non-optimal designs and problems in creating a successful, reliable, energy-efficient data processing environment. This article works to bridge this gap and provide insight into the parameters each engineer works with and the optimizations they go through. A basic understanding of each role will help their counterpart in their designs, be it a data center or a server.
Server Design Focus

Thermal architects are given a range of information to begin designing the thermal solution. They know the thermal design power (TDP) and temperature specifications of each component (typically junction temperature, TJ, or case temperature, TC). Using a processor as an example, Figure 1 shows a typical component assembly. The processor is specified with a maximum case temperature, TC, which is used for design purposes. In this example, the design parameters are TDP = 103 W and TC = 72°C. Given an ambient temperature specification (TA) = 35°C, the required thermal resistance of this example would need to be equal to or lower than:

θCA, required = (TC − TA)/TDP = (72 − 35)/103 = 0.36°C/W (1)

Sometimes this value of θCA is not feasible. One option to relieve the demands of a thermal solution with a lower thermal resistance is a higher TC. Unfortunately, the trend for TC continues to decline. Reductions in TC result in higher performance, better reliability, and less power used. Those advantages are worth obtaining, making the thermal challenge greater.

One of the first parameters discussed by the data center designer is the temperature rise for the servers, but this value is a secondary consideration, at best, in the server design. As seen in Equation 1, no consideration is given to chassis temperature rise. The thermal design is driven by maintaining component temperatures within specifications, the primary parameters being TC, Tambient, and θCA, actual. The actual thermal resistance of the solution is driven by component selection, material, configuration, and airflow volumes. Usually, the only time that chassis TRISE is calculated is to ensure that exhaust temperatures stay within safety guidelines.
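To make Equation 1 concrete, here is a minimal sketch of the budget check in Python, using the TDP, TC, and TA values from the example above; the 0.30°C/W resistance of the candidate heat sink is an assumed figure for illustration only.

```python
# Sketch of the Equation 1 budget check. TDP, T_case_max, and T_ambient_max
# come from the example in the text; the candidate resistance is assumed.
TDP = 103.0           # thermal design power, W
T_case_max = 72.0     # maximum allowed case temperature, degrees C
T_ambient_max = 35.0  # ambient (inlet) specification, degrees C

# Equation 1: required case-to-ambient thermal resistance
theta_ca_required = (T_case_max - T_ambient_max) / TDP   # ~0.36 C/W

theta_ca_actual = 0.30   # assumed resistance of a candidate heat sink + airflow, C/W

if theta_ca_actual <= theta_ca_required:
    margin = (theta_ca_required - theta_ca_actual) * TDP
    print(f"Solution meets spec with {margin:.1f} C of case-temperature margin")
else:
    print("Solution fails: case temperature would exceed the specification")
```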
About the Authors
Michael K. Patterson, Ph.D., P.E., is a thermal research engineer, platform initiatives and pathfinding, at Intel's Digital Enterprise Group in Hillsboro, Ore. Robin Steinbrecher is staff thermal architect with Intel's Server Products Group in DuPont, Wash. Steve Montgomery, Ph.D., is senior thermal architect at Intel's Power and Thermal Technologies Lab, Digital Enterprise Group, DuPont, Wash.

In addition to TDP and TC, the engineer has several other targets, including:

Cost: Servers are sold into very competitive markets and cost is a critical consideration. Today's budget for thermal solutions in servers is typically in the range of $50 to $75, depending on the number of processors and features. It is desirable to minimize this cost.

Weight: Current aluminum and copper heat sinks continue to expand in size and surface area to augment heat transfer. The increased weight of the heat sinks is a serious issue as the processor package and motherboard must be made sufficiently robust to handle the resulting mechanical load.

Volumetric: The space inside a server is extremely valuable, especially as more computing power and capabilities are added. Using this space for heat sinks and fans is not adding value for the customer.

Power: The total power required for servers is increasing and driving changes to the data center infrastructure. The server fans can use up to 10% of the server power. Reducing all power is a design goal.

Many components to cool: Ideally, sizing air movers to cool the highest power component would be sufficient to cool the remainder of the system. Unfortunately, this is rarely the case, and additional fans, heat sinks, and ducting in the server often are required.

Reliability: Operational continuity is vital to the success of the data center, so server reliability receives significant focus. For the thermal solution, the items most likely to fail are air movers. These are typically redundant to provide for this increased reliability. Redundancy results in oversizing of air-mover capability for normal operation, leading to further inefficiencies.

Acoustics: The volume of air required to cool today's servers often creates a noise problem such that hearing protection may be required. The area of acoustics is important enough to describe further.

Figure 1: Thermal resistance of typical server thermal solution (processor package and socket, thermal interface material, and heat sink; labeled temperatures Tcase, Tsink, and Tambient, with θCA spanning case to ambient).

Server Thermal Acoustic Management

As mentioned previously, the thermal engineer designing the cooling and control system must counterbalance the need to cool all components in a system with the necessity of meeting acoustics requirements. To achieve this, the server management (SM) monitors combinations of temperature sensors and component use to take action to maintain the server within specifications. Required air-mover speeds are determined through calculations performed by a baseboard management controller (BMC). The SM then acts to change the air-mover speeds to ensure that the components stay within specification. Consequently, the SM normally is driving a server to be as quiet as possible while maximizing performance by keeping component temperatures within, but not over, their limits. In some instances, SM enables a customer to choose performance over acoustics. In these cases, air movers are driven to levels that achieve the highest thermal performance, prioritized over acoustics.

Acoustics specifications for computing equipment are specified at ambient temperatures, typically 23°C ± 2°C (73°F ± 4°F). Above this range, it is desirable, but not required, to have a quiet system. As a result, some systems attempt to maintain the quietest possible operation as a competitive advantage. Others sacrifice acoustics to reduce cost through the elimination of elaborate SM systems.

The data center designer must understand that, as a result of these SM schemes, the required airflow through a system is greatly reduced when room temperatures, or more specifically server inlet air temperatures, are held below 25°C (77°F). The temperature rise through a system may be relatively high as a result of that lower airflow. Typical systems are designed to deliver about 60% to 70% of their maximum flow in this lower inlet temperature environment. Monitoring of temperature sensors is accomplished via on-die thermal diodes or discrete thermal sensors mounted on the printed circuit boards (PCBs). Component utilization monitoring is accomplished through activity measurement (e.g., memory throughput measurement by the chipset) or power measurement of individual voltage regulators. Either of these methods results in calculation of component or subsystem power.
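As a rough illustration of the SM behavior described above, the following sketch shows a simplified fan-speed control iteration. The component limits, margin thresholds, duty-cycle bounds, and sensor readings are all assumed values; a real BMC implements this logic in firmware against actual thermal diodes and vendor specifications.

```python
# Minimal sketch of server-management (SM) fan-speed logic as described in the
# text: keep monitored component temperatures within spec at the lowest
# (quietest, lowest-power) air-mover speed. All numbers are illustrative.

COMPONENT_LIMITS_C = {"cpu0": 72.0, "cpu1": 72.0, "memory": 85.0}  # assumed specs
MIN_DUTY, MAX_DUTY, STEP = 0.30, 1.00, 0.05  # fan duty-cycle bounds and step

def update_fan_duty(duty, sensor_readings_c):
    """One control iteration: raise speed if any component nears its limit,
    otherwise slowly back off toward the quietest allowable operation."""
    worst_margin = min(COMPONENT_LIMITS_C[name] - temp
                       for name, temp in sensor_readings_c.items())
    if worst_margin < 2.0:            # too close to a limit: speed up
        duty = min(MAX_DUTY, duty + STEP)
    elif worst_margin > 8.0:          # comfortable margin: quiet down
        duty = max(MIN_DUTY, duty - STEP)
    return duty

# Example iteration with hypothetical on-die diode readings (degrees C)
duty = update_fan_duty(0.40, {"cpu0": 68.5, "cpu1": 66.0, "memory": 61.0})
print(f"new fan duty cycle: {duty:.2f}")
```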
Data Center Design Focus

The data center designer faces a similar list of criteria for the design of the center, starting with a set of requirements that drive the design. These include:

Cost: The owner will have a set budget and the designer must create a system within the cost limits. Capital dollars are the primary metric. However, good designs also consider the operational cost of running the system needed to cool the data center. Combined, these comprise the total cost of ownership (TCO) for the cooling systems.

Equipment list: The most detailed information would include a list of equipment in the space and how it will be racked together. This allows for a determination of the total cooling load in the space, and the airflow volume and distribution in the space. Caution must be taken if the equipment list is used to develop the cooling load by summing up the total connected load; this leads to over-design. The connected load, or maximum rating of the power supply, is always greater than the maximum heat dissipation possible by the sum of the components. Obtaining the thermal load generated by the equipment from the supplier is the only accurate way of determining the cooling requirements (see the sketch after this list). Unfortunately, the equipment list is not always available, and the designer will be given only a cooling load per unit area and will need to design the systems based upon this information. Sizing the cooling plant is straightforward when the total load is known, but the design of the air-handling system is not as simple.

Performance: The owner will define the ultimate performance of the space, generally given in terms of ambient temperature and relative humidity. Beaty and Davidson2 discuss typical values of the space conditions and how these relate to classes of data centers. Performance also includes values for airflow distribution, total cooling, and percent outdoor air.

Reliability: The cooling system's reliability level is defined and factored into equipment selection and layout of distribution systems. The reliability of the data center cooling system requires an economic evaluation comparing the cost of the reliability vs. the cost of the potential interruptions to center operations. The servers protect themselves in the event of cooling failure; the reliability of the cooling system should not be justified based upon equipment protection.
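A minimal sketch of the connected-load caution, assuming a hypothetical equipment list: summing nameplate ratings overstates the cooling load relative to supplier-reported heat release.

```python
# Illustration of the connected-load caution: nameplate (power supply) ratings
# overstate actual heat dissipation. All numbers are hypothetical examples.

equipment = [
    # (description, quantity, nameplate W, supplier-reported heat release W)
    ("1U server", 42, 650, 450),
    ("network switch", 2, 300, 180),
]

connected_load_kw = sum(qty * nameplate for _, qty, nameplate, _ in equipment) / 1000
thermal_load_kw = sum(qty * heat for _, qty, _, heat in equipment) / 1000

print(f"Connected (nameplate) load: {connected_load_kw:.1f} kW")
print(f"Supplier-reported heat load: {thermal_load_kw:.1f} kW")
print(f"Over-design if nameplate is used: "
      f"{100 * (connected_load_kw / thermal_load_kw - 1):.0f}%")
```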
Data Center Background

Experience in data center layout and configuration is helpful to the understanding of the design issues. Consider two cases at the limits of data center arrangement and cooling configuration:

1. A single rack in a room, and
2. A fully populated room, with racks side by side in multiple rows.

Case 2 assumes a hot-aisle/cold-aisle rack configuration, where the cold aisle is the server airflow inlet side containing the perforated tiles. The hot aisle is the back-to-back server outlets, discharging the warm air into the room. Hot aisle/cold aisle is the most prevalent configuration, as the arrangement prevents mixing of inlet cooling air and warm return air. The most common airflow configuration of individual servers is front-to-back, working directly with the hot-aisle/cold-aisle concept, but it is not the only configuration.

Consider the rack of servers in a data processing environment. Typically, these racks are 42U high, where 1U = 44.5 mm (1.75 in.). A U is a commonly used unit to define the height of electronics gear that can be rack mounted. The subject rack could hold 42 1U servers, or 10 4U servers, or other combinations of equipment, including power supplies, network hardware, and/or storage equipment.

To consider the two limits, first take the described rack and place it by itself in a reasonably sized space with some cooling in place. The other limit occurs when this rack of equipment is placed in a data center where the rack is one of many similar racks in an aisle. The data center would have multiple aisles, generally configured front-to-front and back-to-back.
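A small sketch of the rack arithmetic above; the 450 W per-server figure used for the rack-load estimate is an assumption for illustration.

```python
# Rack height arithmetic from the text: 1U = 44.5 mm (1.75 in.), 42U rack.
U_MM = 44.5
RACK_U = 42

def rack_capacity(server_u):
    """How many servers of a given height (in U) fit in a 42U rack."""
    return RACK_U // server_u

print(f"Rack interior height: {RACK_U * U_MM:.0f} mm")   # ~1869 mm
print(f"1U servers per rack: {rack_capacity(1)}")         # 42
print(f"4U servers per rack: {rack_capacity(4)}")         # 10
# Hypothetical rack heat load if each 1U server dissipates 450 W:
print(f"Example rack load: {rack_capacity(1) * 450 / 1000:.1f} kW")
```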
Common Misconceptions


A review of misconceptions illustrates the problems and challenges facing designers of data centers. During a recent design review of a data center cooling system, one of the engineers claimed that the servers were designed for a 20°C (36°F) TRISE, inlet to outlet air temperature. This is not the case. It is possible that there are servers that, when driven at a given airflow and dissipating their nominal amount of power, may generate a 20°C (36°F) ΔT, but none were ever designed with that in mind. Recall the parameters that were discussed in the section on server design. Reducing θCA can be accomplished by increasing airflow. However, this also has a negative effect. More powerful air movers increase cost, use more space, are louder, and consume more energy. Increasing airflow beyond the minimum required is not a desirable tactic. In fact, reducing the airflow as much as possible would be of benefit in the overall server design. However, nowhere in that optimization problem is the ΔT across the server considered.

Assuming a simple TRISE leads to another set of problems. It implies a fixed airflow rate. As discussed earlier, most servers monitor temperature at different locations in the system and modulate airflow to keep the components within desired temperature limits. For example, a server in a well designed data center, particularly if located low in the rack, will likely see a TA of 20°C (68°F) or less. However, the thermal solution in the server is normally designed to handle a TA of 35°C (95°F). If the inlet temperature is at the lower value, the case temperature will be lower. Then, much less airflow is required, and if variable flow capability is built into the server, it will run quieter and consume less power. The server airflow (and hence TRISE) will vary between the TA = 20°C (68°F) and 35°C (95°F) cases, a variation described in ASHRAE's Thermal Guidelines for Data Processing Environments. The publication provides a detailed discussion of what data should be reported by the server manufacturer and in which configuration.

Another misconception is that the airflow in the server exhaust must be maintained below the server ambient environmental specification. The outlet temperature of the server does not need to be below the allowed value for the environment (typically 35°C [95°F]).
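The reason TRISE is an outcome rather than a design constant follows from the sensible heat balance q = ρ · V · cp · ΔT, where V is the volumetric airflow. The sketch below, assuming a 450 W server and two illustrative airflow operating points, shows how the same heat load yields very different temperature rises as the air movers modulate flow.

```python
# Sensible heat balance: q = rho * V * cp * dT. The 450 W server power and the
# two airflow operating points are assumed for illustration.
RHO = 1.2        # air density, kg/m^3 (near sea level)
CP = 1005.0      # specific heat of air, J/(kg*K)
CFM_TO_M3S = 0.000471947

def delta_t(power_w, airflow_cfm):
    """Air temperature rise through the chassis for a given heat load and flow."""
    volumetric = airflow_cfm * CFM_TO_M3S            # m^3/s
    return power_w / (RHO * volumetric * CP)         # kelvin (= degrees C rise)

power = 450.0  # assumed server heat load, W
for cfm in (25.0, 40.0):   # e.g., low-inlet-temperature vs. 35 C peak-flow operation
    print(f"{cfm:.0f} cfm -> TRISE = {delta_t(power, cfm):.1f} C")
```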

Design Decisions

To understand the problems that can arise if the server design process is not fully understood, revisit the two cases introduced earlier. Consider the fully loaded rack in a space with no other equipment. If sufficient cooling is available in the room, the server thermal requirements likely will be satisfied. The servers will pull the required amount of air to cool them, primarily from the raised floor distribution but, if needed, from the sides and above the server as well. It is reasonable to assume the room is well mixed by the server and room distribution airflow. There likely will be some variation of inlet temperature from the bottom of the rack to the top, but if sufficient space exists around the servers it is most likely not a concern. In this situation, not having the detailed server thermal report, as described in Reference 3, may not be problematic.

At the other limit, a rack is placed in a space that is fully populated with other server racks in a row. Another row sits across the cold aisle facing this row, as well as another sitting back-to-back on the hot-aisle side. The space covered by the single rack unit and its associated cold-aisle and hot-aisle floor space often is called a work cell and generally covers a 1.5 m² (16 ft²) area: the 0.6 m × 0.6 m (2 ft × 2 ft) perforated tile in the front, the area covered by the rack (~0.6 m × 1.3 m [~2 ft × 4.25 ft]), and the remaining uncovered solid floor tile on the hot-aisle side.

Figure 2: The work cell is shown in orange.

Consider the airflow in and around the work cell. Each work cell needs to be able to exist as a stand-alone thermal zone. The airflow provided to the zone comes from the perforated tile, travels through the servers, and exhausts out the top-back of the work cell, where the hot aisle returns the warm air to the inlet of the room air handlers. The work cell cannot bring air into the front of the servers from the side, as this would be removing air from another work cell and shorting that zone. No air should come in from the top either, as that will bring air at a temperature well above the desired ambient and possibly above the specification value for TA (typically 35°C [95°F]). Based on this concept of the work cell, it is clear that designers must know the airflow through the servers or else they will not be able to adequately size the flow rate per floor tile. Conversely, if the airflow is not adequate, the server airflow will recirculate, causing problems for servers being fed the warmer air.

If the design basis of the data center includes the airflow rates of the servers, certain design decisions are needed. First, the design must provide enough total cooling capacity for the peak, matching the central plant to the load. Another question is at what temperature to deliver the supply air. Lowering this temperature can reduce the required fan size in the room cooling unit but also can be problematic, as the system, particularly in a high density data center, must provide the minimum (or nominal) airflow to all of the work cells. A variant of this strategy is that of increasing the ΔT. Doing this allows a lower airflow rate to give the same total cooling capability. This will yield lower capital costs, but if the airflow rate is too low, increasing the ΔT will cause recirculation. Also, if the temperature is too low, comfort and ergonomic issues could arise.

If the supplier has provided the right data, another decision must be made. Should the system provide enough for the peak airflow, or just the typical? The peak airflow rate will occur when TA = 35°C (95°F) and the typical when TA = 20 ~ 25°C (68°F ~ 77°F). Sizing the air-distribution equipment at the peak flow will result in a robust design with flexibility, but at a high cost. Another complication in sizing for the peak flow, particularly in dense data centers, is that it may prove difficult to move this airflow through the raised floor tiles, causing an imbalance or increased leakage elsewhere. Care must be taken to ensure the raised floor is of sufficient height and an appropriate design for the higher airflows.

If the nominal airflow rate is used as the design point, the design, installation, and operation (including floor tile selection for balancing the distribution) must be correct for the proper operation of the data center, but a cost savings potential exists. It is essential to perform some level of modeling to determine the right airflow. In this design, any time the servers ramp up to their peak airflow rate, the racks will be recirculating warm air from the hot aisle to feed some server inlets. This occurs because the work cell has to satisfy its own airflow needs (because its neighbors are also short of airflow) and, if the servers need more air, they will receive it by recirculating. Another way to visualize this is to consider the walls of symmetry around each work cell and recall that there is no flux across a symmetry boundary. The servers are designed to operate successfully at 35°C (95°F) inlet air temperatures, so if the prevalence of this recirculation is not too great, the design should be successful.

If the detailed equipment list is unknown when the data center is being designed, the airflow may be chosen based on historical airflows for similarly loaded racks in data centers of the same load and use patterns.
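A minimal sketch of the work-cell bookkeeping described above: compare the airflow the rack demands with what its perforated tile delivers, and flag the shortfall that must be made up by recirculation from the hot aisle. The tile delivery and rack demand figures are assumptions for illustration.

```python
# Work-cell airflow bookkeeping: a cell should be a stand-alone thermal zone,
# so server demand beyond what the perforated tile supplies is met by
# recirculated hot-aisle air. All flow values are illustrative assumptions.

def work_cell_check(server_demand_cfm, tile_supply_cfm):
    shortfall = max(0.0, server_demand_cfm - tile_supply_cfm)
    recirc_fraction = shortfall / server_demand_cfm
    return shortfall, recirc_fraction

tile_supply = 900.0   # cfm through one 2 ft x 2 ft perforated tile (assumed)

for label, demand in (("nominal (inlet ~20-25 C)", 800.0),
                      ("peak (inlet 35 C)", 1200.0)):
    shortfall, frac = work_cell_check(demand, tile_supply)
    print(f"{label}: demand {demand:.0f} cfm, "
          f"recirculated {shortfall:.0f} cfm ({frac:.0%})")
```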

It is important to ensure the owner is aware of the airflow assumptions made and any limits that the assumptions would place on equipment selection, particularly in light of the trend towards higher power density equipment. The airflow balancing and verification would then fall to a commissioning agent or the actual space owner. In either case, the airflow assumptions need to be made clear during the computer equipment installation and floor tile setup.

Discussions with a leading facility engineering company in Europe provide insight into an alternate design methodology when the equipment list is not available. A German engineering society standard on data center design requires a fixed value of 28°C at 1.8 m (82°F at 6 ft) above the raised floor. This includes the hot aisle and ensures that if sufficient airflow is provided to the room, all servers will be maintained below the upper temperature limits even if recirculation occurs.

Using this approach, it is reasonable to calculate the total airflow in a new design by assuming an inlet temperature of 20°C (68°F) (low end of Thermal Guidelines), a discharge temperature of 35°C (95°F) (maximum inlet temperature that should be fed to a server through recirculation), and the total cooling load of the room. A detailed design of the distribution still is required to ensure adequate airflow at all server cold aisles.

Figure 3: Rack recirculation problem (temperature contours for a full data center; temperature scale in °C).
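A minimal sketch of that sizing calculation, assuming a hypothetical 300 kW room load and the 20°C supply / 35°C maximum-inlet bounds just described.

```python
# Total room airflow from the heat balance q = rho * V * cp * dT, using the
# 20 C supply / 35 C maximum-inlet bounds discussed above. The 300 kW room
# load is an assumed example.
RHO, CP = 1.2, 1005.0          # kg/m^3, J/(kg*K)
M3S_TO_CFM = 2118.88

room_load_w = 300_000.0        # assumed total cooling load, W
t_supply, t_max_inlet = 20.0, 35.0

volumetric_m3s = room_load_w / (RHO * CP * (t_max_inlet - t_supply))
print(f"Minimum total airflow: {volumetric_m3s:.1f} m^3/s "
      f"({volumetric_m3s * M3S_TO_CFM:,.0f} cfm)")
```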
The Solution

The link for information and what is needed for successful design is well defined in Thermal Guidelines. Unfortunately, it is only now becoming part of server manufacturers' vocabulary. The data center designer needs average and peak heat loads and airflows from the equipment. The best option is to obtain the information from the supplier. While testing is possible, particularly if the owner already has a data center with similar equipment, this is not a straightforward process, as the server inlet temperatures and workload can affect the airflow rate. Thermal Guidelines provides information about airflow measurement techniques.

The methodology of the German standard also can be used, recognizing recirculation as a potential reality of the design and ensuring discharge temperatures are low enough to support continued computer operation. Finally, the worst but all-too-common way is to use a historical value for ΔT and calculate a cfm/kW based on the historical value. In any case, the total heat load of the room and the airflow need to be carefully considered to ensure a successful design.

Effecting Change

The use of Thermal Guidelines has not yet been adopted by all server manufacturers. The level of thermal information provided by the same manufacturer can even vary from product to product. During a recent specification review of several different servers, one company provided extensive airflow information, both nominal and peak, for their 1U server but gave no information on airflow for their 4U server in the same product line.

If data center operators and designers could convince their information technology sourcing managers to only buy servers that follow Thermal Guidelines (providing the needed information), the situation would rectify itself quickly. Obviously, that is not likely to happen, nor should it. On the other hand, those who own the problem of making the data center cooling work would help themselves by pointing out to the procurement decision-makers that they can have only a high degree of confidence in their data center designs for those servers that adhere to the new publication. As more customers ask for the information, more equipment suppliers will provide it.

Summary

The information discussed here is intended to assist data center designers in understanding the process by which the thermal solution in the server is developed. Conversely, the server thermal architect can benefit from an understanding of the challenges in building a high density data center. Over time, equipment manufacturers will continue to make better use of Thermal Guidelines, which ultimately will allow more servers to be used in data centers, with better use of this expensive and scarce space.

References
1. Processor Spec Finder, Intel Xeon Processors. http://processorfinder.intel.com/scripts/details.asp?sSpec=SL7PH&ProcFam=528&PkgType=ALL&SysBusSpd=ALL&CorSpd=ALL.
2. Beaty, D. and T. Davidson. 2003. New guideline for data center cooling. ASHRAE Journal 45(12):28-34.
3. TC 9.9. 2004. Thermal Guidelines for Data Processing Environments. ASHRAE Special Publications.
4. Koplin, E.C. 2003. Data center cooling. ASHRAE Journal 45(3):46-53.
5. Rouhana, H. 2004. Personal communication. Mechanical Engineer, M+W Zander Mission Critical Facilities, Stuttgart, Germany, November 30.
6. Verein Deutscher Ingenieure. 1994. VDI 2054, Raumlufttechnische Anlagen für Datenverarbeitung. September.
