Applied Math Modeling White Paper

Improving Data Center PUE Through Airflow Management

By Paul Bemis and Liz Marshall, Applied Math Modeling Inc., Concord, NH
February 2010

Introduction
As energy prices continue to rise and concerns about global warming due to carbon emissions continue to grow, there is a growing motive to lower the PUE (power usage effectiveness) of data centers worldwide. The PUE of a data center is defined as Total Facility Power / Total IT Power. The Total Facility Power comprises all the power delivered to the entire data center, and the Total IT Power is only that which is delivered to the IT equipment.

A careful look at this ratio (Figure 1) reveals that the Total Facility Power is dominated by the power to drive the data center cooling system (45%) and the power consumed by the IT equipment (30%). Another way to say this is that these two loads together account for 75% of the Total Facility Power. By focusing on the power to drive the cooling system and IT equipment as the dominant parameters, an alternative ratio can be defined: Total Cooling Power / Total IT Power, which is often referred to as the Cooling Load Factor (CLF). The Cooling Load Factor is the total power consumed by the chillers, CRACs, cooling towers, pumps and other cooling-related equipment, divided by the total IT equipment power.

Figure 1: The breakdown of power utilization in a typical data center

2010 Applied Math Modeling Inc.

WP103

To accurately determine the total annual cost of power to drive the cooling system for a given data center, one must take into account the kind of cooling unit (gas or liquid), the efficiency of the motors that drive the fans and compressors, and the specific geographic location of the data center. If power measurements of the equipment are not feasible, estimates must be made, and these often require detailed knowledge from the cooling manufacturer.

Rather than focus on the power required by the cooling equipment, one can instead use the cooling capacity of the equipment. In this sense, another modified ratio can be defined: Total Cooling Capacity / IT Power Consumed. This ratio can be called the Cooling Capacity to IT Load Ratio. By focusing on these two parameters and attempting to drive this ratio down as close as possible to 1.0, the PUE will also decline.

The cooling for a given data center consists of two primary components: the total capacity of the cooling system, typically measured in tons or kilowatts, and its related airflow, typically measured in cubic feet per minute (CFM). It is important to consider both of these parameters, since the reason for hot spots in many data centers is not the total cooling capacity (this is typically more than adequate) but rather the inability to get the cold air to where it is needed.
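As a rough illustration of how the PUE and the Cooling Load Factor relate, the sketch below evaluates both from the power shares quoted in this introduction (45% cooling, 30% IT). The function names are hypothetical, and the 1000 kW facility total is an arbitrary assumption chosen only to make the arithmetic concrete.

```python
def pue(total_facility_kw: float, it_kw: float) -> float:
    """Power usage effectiveness: Total Facility Power / Total IT Power."""
    return total_facility_kw / it_kw


def cooling_load_factor(cooling_kw: float, it_kw: float) -> float:
    """Cooling Load Factor: Total Cooling Power / Total IT Power."""
    return cooling_kw / it_kw


# Assumed facility total, used only for illustration.
total_kw = 1000.0
it_kw = 0.30 * total_kw       # IT equipment share quoted in the text
cooling_kw = 0.45 * total_kw  # cooling system share quoted in the text

print(f"PUE = {pue(total_kw, it_kw):.2f}")
print(f"CLF = {cooling_load_factor(cooling_kw, it_kw):.2f}")
```

Because both ratios share the IT power as their denominator, the result does not depend on the assumed facility total; only the percentage shares matter.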

Figure 2: A 2500 sq. ft. data center that could operate more efficiently

Baseline Case
To illustrate this point we will use computational fluid dynamics (CFD) to consider a hypothetical data center of 2500 square feet, as illustrated in Figure 2. For this data center, eight Liebert FH600C cooling units are deployed in a slightly staggered (asymmetric) pattern around the perimeter of the room, creating a total cooling capacity of 1724 kW. The thermal load consists of six rows of equipment racks, each row containing 20 racks, and each rack with a thermal load of 7 kW, for a total of 840 kW. This results in a Cooling Capacity to IT Load Ratio of about 2.0, a full 100% higher than should be required to cool the equipment. Notice, however, that the airflow supplied by each of the eight FH600C units is only 17,100 CFM, creating a total airflow capacity of 136,800 CFM. Each 7 kW rack requires 1091 CFM to keep the temperature rise across the rack to a 20F maximum, so with 120 racks in the room, the total rack demand is 130,920 CFM. The cooling units therefore supply less than 5% more airflow than the racks require. This will become a significant consideration when attempting to reduce the overall power consumption.

One way of improving the PUE for this data center is to reduce the Cooling Capacity to IT Load Ratio. The Liebert FH600C uses an 11 kW centrifugal blower to supply air to the data center. If we assume that the cost of electricity is \$0.10/kW-hr, the annual cost of operating just the blower for this unit would be on the order of \$10,000, and would be nearly twice that amount when including the work done by the compressor. Shutting down one of these units would reduce the PUE and save money. The question, however, is whether or not this can be done without causing excessive temperatures at any of the server inlets. While shutting down a CRAC unit looks like a viable option, only a CFD model can identify which CRAC is the best one to shut down and whether doing so will result in troublesome hot spots on any of the equipment.

Figure 3 illustrates the rack inlet temperatures in the data center with all CRACs operating normally. As can be seen, there are already hot spots located at the ends of the rack rows. In some cases, the rack inlet temperatures exceed the ASHRAE recommended maximum of 80.6F. The maximum rack inlet temperature for this case is 82F and the maximum temperature in the room is 91F.

Figure 3: Baseline model rack inlet temperature profiles

Turning off both the fan and coil on any of the 8 CRAC units would create a scenario where the total cooling capacity would be sufficient, but due to the lack of proper airflow to some servers, extreme temperatures may result. Using CFD, it is a straightforward matter to test this possibility and find out the consequences when each one of the CRACs is disabled. To compare scenarios, a CFD model was created using CoolSim that allowed a series of 8 simulations to be run concurrently, each with a different CRAC unit shut off in round-robin fashion. A summary of the simulation results is presented in Table 1. The best case (Simulation 6) corresponds to the elimination of CRAC F (lower corner on the left in Figure 3). It has the smallest impact on the maximum rack inlet temperature, and drives up the maximum temperature in the room by only 3 degrees, from 91F to 94F, according to the detailed CFD output reports.

The resulting Cooling Capacity to IT Load Ratio decreases by 1/8 or 12.5% when this CRAC is disabled, reducing the annual operating cost by thousands of dollars. But even in the best case, when CRAC F is shut off, the rack inlet temperatures still reach a peak of 85F in one of the racks, exceeding the ASHRAE recommended maximum for inlet temperature. Therefore the approach of simply turning off one or more CRAC units will not work for this data center without first making some adjustments to the room configuration to improve the thermal efficiency.
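The capacity and airflow figures used in the baseline case can be checked with a few lines of arithmetic. The sketch below is only an illustration: the function names are hypothetical, the per-rack airflow estimate uses the common standard-air sensible heat relation (BTU/hr = 1.08 x CFM x temperature rise in F), which gives a value close to, but not identical to, the 1091 CFM quoted above, and the blower cost treats the 11 kW rating as electrical input power, which is an assumption.

```python
BTU_PER_HR_PER_KW = 3412.14   # unit conversion, kW -> BTU/hr
SENSIBLE_HEAT_FACTOR = 1.08   # BTU/(hr * CFM * degF), standard-air assumption


def required_cfm(load_kw: float, delta_t_f: float) -> float:
    """Airflow needed to remove load_kw with a delta_t_f rise across the rack."""
    return load_kw * BTU_PER_HR_PER_KW / (SENSIBLE_HEAT_FACTOR * delta_t_f)


RACKS = 6 * 20        # six rows of 20 racks
RACK_CFM = 1091       # per-rack airflow quoted in the text
CRAC_CFM = 17_100     # airflow per cooling unit quoted in the text
COOLING_KW = 1724     # total cooling capacity with all eight units on
IT_KW = RACKS * 7     # 7 kW per rack

demand_cfm = RACKS * RACK_CFM


def airflow_margin(cracs_on: int) -> int:
    """Supplied airflow minus rack demand in CFM (negative means a shortfall)."""
    return cracs_on * CRAC_CFM - demand_cfm


def capacity_ratio(cracs_on: int, cracs_total: int = 8) -> float:
    """Cooling Capacity to IT Load Ratio, assuming eight identical units."""
    return COOLING_KW * cracs_on / cracs_total / IT_KW


# Assumed: 11 kW electrical input, continuous operation, $0.10 per kW-hr.
blower_cost = 11 * 8760 * 0.10

print(f"standard-air estimate per rack: {required_cfm(7, 20):.0f} CFM")
for n in (8, 7):
    print(f"{n} CRACs on: airflow margin {airflow_margin(n):+,} CFM, "
          f"capacity ratio {capacity_ratio(n):.2f}")
print(f"blower energy cost per unit: ${blower_cost:,.0f} per year")
```

With seven units running, the capacity ratio is still close to 1.8, yet the supplied airflow falls below the rack demand. This is the arithmetic behind the observation that capacity alone does not guarantee adequate cooling at every rack inlet.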

Improving Thermal Efficiency

There are two common methods for improving the thermal efficiency of data centers: hot aisle containment and cold aisle containment. To help understand which is more effective on this specific data center, the initial model can be quickly modified to consider each scenario so that the outcomes can be compared. There are several things to consider when trying to decide which approach is best for a given data center. For example, cold aisle containment is typically less expensive to implement because perforated tiles are often located near the rack inlets and therefore less duct work is required. But by fully containing the cold supply air, the rack exhaust drives the ambient room temperature up. Depending on the resulting room temperature, this approach may not be comfortable for service technicians or administration personnel working in the room. The opposite problem occurs with hot aisle containment, as the entire room becomes part of the cold supply, driving the ambient room temperature downward. In this scenario, however, there is additional heat contributed by other objects in the room such as walls, UPSs, lights, and other equipment. The additional heat tends to increase the ambient temperature in the room, but if the supply air is well directed towards the rack inlets, the additional heat will have less impact on the equipment.

Cost is also a primary decision factor as containment strategies of any kind require modifying the data center while in operation. Building virtual models of these two approaches can help ferret out which one is optimal for a given data center layout.

While complete cold aisle containment is possible in a data center with a room return, complete hot aisle containment is not, since it requires a ceiling return. Thus two partial containment strategies were considered in which impermeable walls are positioned at the ends of either the hot or cold aisles. The modified scenarios are shown in Figure 4.

Simulation Number   CRAC Unit Off   Max Rack Inlet Temp (F)   Max Ambient Temp (F)
0                   N/A             82                        91
1                   A               89                        96
2                   B               89                        95
3                   C               86                        95
4                   D               91                        96
5                   E               87                        93
6                   F               85                        94
7                   G               86                        95
8                   H               87                        93

Table 1: Comparison of maximum rack inlet and ambient room temperatures for 8 trials of the baseline model where one CRAC was shut off for each trial; Simulation 4 generated the worst results, Simulation 6 the best; Simulation 0 has all CRACs turned on
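The selection of the best and worst trials in Table 1 can be reproduced by sorting the results. The sketch below is a minimal illustration, assuming the ranking criterion is the maximum rack inlet temperature with the maximum ambient temperature as a tie-breaker; the function name and the tie-breaking rule are our assumptions, not part of the CFD tool's output.

```python
ASHRAE_RECOMMENDED_MAX_F = 80.6  # recommended maximum inlet temperature cited in the text

# (CRAC unit off, max rack inlet temp F, max ambient temp F), from Table 1
TABLE_1 = [
    ("A", 89, 96), ("B", 89, 95), ("C", 86, 95), ("D", 91, 96),
    ("E", 87, 93), ("F", 85, 94), ("G", 86, 95), ("H", 87, 93),
]


def rank_by_inlet(results):
    """Sort trials by max rack inlet temperature, breaking ties on ambient."""
    return sorted(results, key=lambda row: (row[1], row[2]))


ranked = rank_by_inlet(TABLE_1)
best, worst = ranked[0], ranked[-1]

print(f"best unit to shut off:  CRAC {best[0]} ({best[1]}F inlet, {best[2]}F ambient)")
print(f"worst unit to shut off: CRAC {worst[0]} ({worst[1]}F inlet, {worst[2]}F ambient)")
print("best case within recommended limit:", best[1] <= ASHRAE_RECOMMENDED_MAX_F)
```

Even the best-ranked trial stays above the recommended limit, which is why shutting off a unit is not acceptable for the baseline layout.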

Figure 4: a) Partial cold aisle containment and b) partial hot aisle containment, both achieved by placing impermeable walls at the ends of the aisles

Table 2 shows a comparison of the two containment approaches with all CRACs on, using the maximum rack inlet temperature and maximum room temperature as common metrics. In both cases, no other heat sources in the room were included. Both methods drop the maximum rack inlet temperature compared to the original case with no containment, but the partial cold aisle containment strategy is preferable. The difference between the strategies may be due to the fact that there are three containment regions for the cold aisle containment case compared to two for the hot aisle containment case. More contained space may lead to reduced mixing between the hot and cold air in the room. For the cold aisle containment strategy, the maximum inlet temperature drops by 4 degrees to 78F, compared to a drop of only 1 degree for the hot aisle containment case. Partial cold aisle containment leads to an 8 degree drop in the maximum ambient room temperature as well, from 91F to 83F.

Containment Method       Maximum Rack Inlet Temperature (F)   Maximum Ambient Room Temperature (F)
No Containment           82                                   91
Cold Aisle Containment   78                                   83
Hot Aisle Containment    81                                   89

Table 2: Comparison of maximum rack inlet and ambient room temperatures for cold aisle, hot aisle, and no containment strategies with all CRACs operating

Using partial cold aisle containment, the issue of reducing power consumption by the cooling system can be considered once again. In Table 3, the results of a CRAC failure analysis indicate that if the data center now operates with CRAC C turned off, the maximum rack inlet temperature is the same as it was in the baseline case with all CRACs on. The maximum rack inlet temperature is still above the ASHRAE recommended maximum value (80.6F), but it is well below the ASHRAE allowed maximum value (90F). This exercise is evidence of the importance of using flow simulation to assess modifications to a data center and determine which, if any, cooling units can be disabled to improve data center efficiency.

Simulation Number   CRAC Unit Off   Max Rack Inlet Temp (F)   Max Ambient Temp (F)
0                   N/A             78                        83
1                   A               85                        89
2                   B               87                        90
3                   C               82                        86
4                   D               91                        96
5                   E               87                        93
6                   F               86                        92
7                   G               83                        91
8                   H               88                        92

Table 3: Maximum rack inlet and room temperatures using partial cold aisle containment for 8 trials with one CRAC turned off for each trial; Simulation 3 yields the best results, Simulation 4 the worst; Simulation 0 has all CRACs on
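One way to read Table 3 is to classify each trial against the two ASHRAE limits cited in this paper (80.6F recommended, 90F allowed). The sketch below is a hypothetical illustration of that bookkeeping; the function name and the three category labels are our own, and the thresholds are simply the values quoted in the text.

```python
ASHRAE_RECOMMENDED_MAX_F = 80.6  # recommended maximum cited in the text
ASHRAE_ALLOWED_MAX_F = 90.0      # allowed maximum cited in the text

# CRAC unit off -> max rack inlet temperature (F), from Table 3
TABLE_3_INLET_F = {
    "N/A": 78, "A": 85, "B": 87, "C": 82, "D": 91,
    "E": 87, "F": 86, "G": 83, "H": 88,
}


def classify(inlet_f: float) -> str:
    """Place a max rack inlet temperature into one of three bands."""
    if inlet_f <= ASHRAE_RECOMMENDED_MAX_F:
        return "recommended"
    if inlet_f <= ASHRAE_ALLOWED_MAX_F:
        return "allowed"
    return "exceeds allowed"


bands = {unit: classify(temp) for unit, temp in TABLE_3_INLET_F.items()}
over_limit = sorted(u for u, band in bands.items() if band == "exceeds allowed")

for unit, band in bands.items():
    print(f"CRAC off: {unit:>3}  inlet {TABLE_3_INLET_F[unit]}F  -> {band}")
print("units whose loss exceeds the allowed maximum:", over_limit)
```

Only the all-on case falls in the recommended band, and only the loss of CRAC D pushes a rack inlet past the allowed maximum.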

Another benefit to using a containment strategy is the improvement in overall reliability of the facility. Without any containment, the CRAC failure analysis predicted worst case rack inlet temperatures as high as 91F, above the ASHRAE allowed maximum value. However, by adding a simplified partial cold aisle containment solution, the reliability of the data center has been increased. That is, while the data center can be run with all 8 CRACs on, the results show that if any unit except CRAC D fails or must be taken down for servicing, the maximum rack inlet temperatures will not exceed 90F.

In summary, this particular data center was used to illustrate how CFD can be used to compare some of the many techniques available to improve PUE. When striving to improve PUE, data center managers should focus on the Cooling Load Factor as a primary target, along with the purchase of Energy Star rated equipment. If the cooling power consumption values are not readily accessible, then focusing on the Cooling Capacity to IT Load Ratio is a reasonable alternative. To test whether reductions in cooling are feasible, CFD can be effectively used to compare and contrast alternative approaches. Of course, modeling is not meant to be a substitute for good engineering. CFD models are based on assumptions, so the results should be validated with measurements to ensure that the model represents real world phenomena. Yet modeling will always produce a relative comparison between one design approach and another, and is a helpful mechanism for supporting the decision making process.

The PUE metric is most heavily influenced by the power to drive the IT load and the cooling necessary to sustain the resulting thermal load. By focusing on how the cold air is delivered to the servers and the hot air is returned to the CRACs, the thermal efficiency of a data center can be improved significantly. Understanding the airflow patterns presents opportunities to reduce the existing cooling capacity and its related costs, improve the reliability of the data center, or add more IT equipment to an existing data center without the need to add more cooling capacity. Any of these outcomes will also reduce the overall data center PUE. By focusing on improving airflow, managers can get more output from existing cooling capacity without the expensive capital expenditures associated with adding or upgrading cooling units. With today's high density servers and increased rack thermal loads, traditional back of the envelope calculations are not sufficient without the aid of a CFD modeling tool.
