
PAPER #6151
FORECASTING + MODELING: A PARTNERSHIP TO PREDICT AND PREVENT CAPACITY BOTTLENECKS

Margaret Churchill, HyPerformix, Inc., 4301 Westbank Drive, Bldg A, Ste 300, Austin, TX 78746, mchurchill@hyperformix.com
Martha Hays, SAS Institute, 100 SAS Campus Drive, Cary, NC, Martha.hays@sas.com

Capacity management includes the monitoring of critical applications to understand when servers and service levels are close to or above performance thresholds. Adding forecasting and modeling to the capacity management process allows the IT organization to anticipate when a problem will occur in the future and prescribe the right solution, effectively preventing IT fires. This paper will compare and contrast various forecasting techniques for predicting when bottlenecks will occur. Once a prediction is made, modeling can be used to determine the impact of possible changes to fix a bottleneck and prescribe the right change.

Introduction

Critical IT systems are managed by monitoring day-to-day activities to ensure they are performing within stated utilization and performance thresholds. Responding to system alarms causes the operations team to react to the problem and implement an immediate change. These changes may cause unforeseen consequences in other parts of the environment, resulting in a ripple or cascade effect across the enterprise. A capacity management process instills a discipline of preparing for the future and planning for appropriate changes well before the system alarms go off. This allows the IT organization to predict when thresholds will be reached and prescribe the right changes. By doing so, better decisions can be made about server consolidation, hardware procurement, and service level management. Forecasts provide advance notice that an outage or other type of problem is likely to occur. Once the predictions have been made, what is the best way to prevent the problem from actually occurring? Modeling delivers this capability through what-if scenarios that identify the impact of planned or unplanned changes to the system in areas such as workload levels, workload patterns, server configurations, network infrastructure, and storage arrays.

IT fire-fighting is expensive, time-consuming, and distracts the organization from spending time on innovative solutions that add value back to the business [FOR2005]. However, IT fire prevention is seldom practiced. The combination of forecasting and modeling, as part of a capacity management process, provides the insight needed into future IT events and effectively prevents IT fires. The combination of these two methods also extends capacity management from a silo or server-centric view to a broader end-to-end view. This allows predictions to be made about utilization, response time, workload growth, workload changes, and infrastructure modifications. For example, today's systems may be running within acceptable thresholds, and a linear trend shows that there is enough capacity on the servers to sustain a 5% fixed workload growth over the next three months. However, this trend does not account for a spike in demand that is expected for the upcoming holiday season. By using a time-series forecast, we can predict that the servers will bottleneck at the beginning of the peak shopping season. A model is built from the forecasted data and is used to evaluate several configuration changes that could support the increased workload. This analysis allows us to plan for the changes needed to support the seasonal demand. With ample time before the peak season starts, we can properly procure and plan for the changes, avoiding emergency procedures and expensive procurement. We have maintained user service and utilization levels and have prevented an IT fire.

This paper describes the value of combining forecasting with modeling as part of a capacity management process. This includes the following steps:
- Capture data from the system monitors
- Gather information regarding planned business demands
- Analyze the data to determine the right baseline and pattern of the time series data
- Select an appropriate forecasting method and fit a statistical model
- Generate a forecast and include all business forecast events
- Select a modeling approach
- Build a model based on the historical and forecasted data
- Validate the model for accuracy
- Use the model for what-if analysis

If a correct forecast is used by itself, we will have an accurate prediction but may not recommend the right preventive measure. If a validated model is used by itself, we may not trust its accuracy. It is the combination of both that provides accurate predictions and prescriptions for the optimal outcomes. These are important pieces of the capacity management process and result in better overall planning and management of the IT environment.

2 Capacity Management Overview

According to the IT Infrastructure Library (ITIL), capacity management is responsible for ensuring that the capacity of the IT infrastructure matches the evolving demands of the business in the most cost-effective and timely manner [ITIL].

The ITIL processes that encompass capacity management are:
- Monitoring the performance and throughput of IT services and the supporting infrastructure components
- Undertaking tuning activities to make the most efficient use of existing resources
- Understanding the demands currently being made for IT resources and producing forecasts for future requirements
- Influencing the demand for resources, perhaps in conjunction with Financial Management
- Producing a Capacity Plan which enables the IT service provider to provide services of the quality defined in service level agreements (SLAs) [ITIL]

Figure 1 shows the relationship of monitoring and tuning existing systems versus planning for possible changes in the future. Once changes are made, either planned or unplanned, the cycle repeats with continued monitoring and then updating of the plans.

Figure 1: ITIL capacity management Cycle

An ideal capacity management process will predict the impact of change, and then monitor the result to validate that the change behaved as expected. Capacity management evolves once an organization has implemented a performance management system that focuses on monitoring current systems and reacting to alarms that result from exceeding utilization thresholds or service levels. Since there must be an actual historical basis for the predictions, the data captured from performance monitors is required for the capacity planning process. Even if systems monitoring and event management is taking place, the resulting data must be stored for analysis and reporting purposes. A great deal of capacity management information can be uncovered from basic analysis of the system and event data, including the following:
- Which systems are experiencing outages or exceeding utilization thresholds?
- Is there a pattern based on time, day, month, etc.?
- Are the events consistent with changes in workload?
- Is there correlation between multiple events?
- When an issue occurs, what is the impact on the other systems?
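As a minimal sketch of how such questions can be answered from the stored event data, the following assumes a pandas DataFrame of threshold-breach events; the file name and column names are hypothetical:

```python
import pandas as pd

# Hypothetical event log: one row per threshold breach captured by the monitors.
events = pd.read_csv("breach_events.csv", parse_dates=["timestamp"])
# columns assumed: timestamp, server, metric, value

# Which systems are exceeding thresholds most often?
print(events["server"].value_counts())

# Is there a pattern by hour of day or day of week?
print(events.groupby(events["timestamp"].dt.hour).size())
print(events.groupby(events["timestamp"].dt.day_name()).size())

# Do breaches on different servers coincide (event correlation)?
hourly = pd.crosstab(events["timestamp"].dt.floor("h"), events["server"])
print(hourly.corr())
```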

For further analysis, data can be captured from the system monitors and stored in a centralized performance data repository. Since information from heterogeneous systems may have been collected at different time intervals, and in different time zones, an extraction, transformation, and load (ETL) process is used to homogenize the data. Once done, reports can be created that provide detailed historical analysis. These reports are useful to show a correlation of past events and past system responses to those events. Once the data is captured, forecasting and modeling can be done. ITIL, however, does not differentiate between the two, as depicted in Figure 2.

Figure 2: ITIL capacity management Processes

In practice, the two methods are different even though they use some of the same terminology. Forecasting is making a prediction based on analysis of historical data. Trending is a simple form of forecasting, based on linear regression of three or more points. Advanced time-series forecasting methods include seasonality, event correlation, and additional input variables to support the impact of business events or application changes. In the forecasting domain, a model is a statistical representation of the pattern of the data. This pattern is used by the statistical algorithm to forecast future values.

Modeling is representing system behavior to make predictions based on workload, hardware configurations, software components, network devices, utilizations, and/or other model parameters. Modeling methods include simulation and analytical/queuing theory. These types of models can be used to perform what-if analysis of changes to the system, workloads, or response times. Models are built by:
- Defining workloads as user-initiated business functions or as an aggregated set of processes
- Defining the system resources consumed by each workload; these can be measured or estimated
- Defining the transaction flows for each workload; this information provides a system roadmap of the message traffic, visit counts, and message sizes
- Defining the configuration details of the servers and network components

After the models are built, they are validated for accuracy and then used for what-if analysis. The forecasted values, such as growth rates, are entered into the model to determine the impact on capacity. If the model shows that capacity will be exceeded or that response time will be negatively impacted, the model can be used to prescribe which change (to the servers, network, etc.) is the most efficient. Figure 3 shows the relationship of forecasting and modeling for capacity planning. Forecasting includes simple linear trending, time-series analysis, event correlation, and statistical models. Modeling includes the creation of a software representation of an actual system that is used to forecast future values and perform what-if analysis.

Figure 3: Relationship of Forecasting and Modeling to support a complete capacity planning process

3 Forecasting

Forecasting is the ability to predict the future through analysis of existing data. Linear trending can be accomplished using simple spreadsheet extrapolation or more sophisticated regression analysis. The most accurate forecasting for IT data is time series forecasting.

3.1 Trends

A common report generated from historical data is a linear trend. This is the quickest and simplest report and is supported by spreadsheet software. Trend lines are based on an average of three or more historical data points extended to some future point. Trend lines are appropriate when future system behavior is expected to continue at the same rate as the historical data, with no seasonality. In cases of seasonality or expected workload changes, trends will provide erroneous results. This is depicted in Figure 4: the forecast would indicate that there is sufficient capacity to support the system. However, when a new website comes online, the web server is unable to handle the traffic. The trend provided an incorrect result because it was not able to predict the impact of this workload change.

Figure 4: Linear Trend

Business decisions based on this trend will result in a false confidence that the system will continue to operate properly. As a result, basic linear trending is not recommended for making predictions about complex systems. For systems that experience pattern changes due to seasonality or planned business promotions, robust forecasting such as time series with seasonality, trend, and event correlation is recommended. Another problem with the linear trend is that it cannot be used for systems that are close to experiencing a bottleneck. These systems will start queuing and/or consuming additional system overhead. This will cause the utilizations to skew, which will not be identified through a linear trend. Notice how a prediction of seasonal behavior from the chart above would have resulted in underestimating the capacity needed.
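A minimal sketch of the spreadsheet-style linear trend described above, using numpy; the monthly utilization values are invented for illustration:

```python
import numpy as np

# Twelve months of CPU utilization (%); values invented for illustration.
util = np.array([42, 44, 43, 46, 47, 49, 48, 51, 52, 54, 53, 56])
months = np.arange(len(util))

# Fit a straight line (least squares) and extrapolate six months ahead.
slope, intercept = np.polyfit(months, util, 1)
future = np.arange(len(util), len(util) + 6)
trend = slope * future + intercept
print(trend.round(1))  # steady growth, no seasonality

# The trend is blind to a seasonal spike or a new workload: if December
# demand doubles, the line above still predicts modest growth -- exactly
# the false confidence described in the text.
```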

3.2 Time Series Forecasting

Time series forecasting analyzes time series variables and forecasts future values by extrapolating trends and patterns in the past values of the series, or by extrapolating the effect of other variables on the series. With the use of sophisticated statistical software, a forecasting model can be developed and customized to best predict your time series. In the IT environment these time series are easily obtained from your performance measurement data, which has been stored and summarized in a capacity database (CDB). By choosing the proper time intervals (day, week, or month) and variables as input, your historical data can lead you to a justifiable forecast of capacity requirements. Forecasting is accomplished through the following steps:
- Storing and summarizing the data
- Analyzing the data to determine seasonality and other patterns
- Selecting an appropriate forecasting method
- Generating a forecast which includes expected business projections and events

3.2.1 Data Analysis

Data preparation is a vital step in time series forecasting. Time-series data must be equally spaced and have no missing values. Select a statistical software product that can transform unequally spaced time series into equally spaced series and can estimate the missing values. To make sure the data is valid, and to formulate forecast models for the series, a time plot of the data is produced. From this series plot, seen in Figure 5, you can readily identify features and patterns of the data.

Figure 5: Resource usage on a monthly basis
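As a minimal sketch of this preparation step, assuming pandas and a hypothetical CSV extract of raw samples:

```python
import pandas as pd

# Hypothetical raw samples: unequally spaced, with gaps.
raw = pd.read_csv("cpu_samples.csv", parse_dates=["timestamp"],
                  index_col="timestamp")["cpu_pct"]

# Force an equally spaced hourly series; gaps become NaN.
hourly = raw.resample("1h").mean()

# Fill the missing values by time-weighted interpolation so the
# series is ready for a forecasting model.
hourly = hourly.interpolate(method="time")

assert not hourly.isna().any()
```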

3.2.2 Forecasting Methods

To select a forecasting method, it is important to identify any trend function, the seasonal component of the series, and any irregular component. After including these components, multiple methods with different parameters can be used to fit a model to the time series. The models are compared using an accuracy statistic: the model with the best accuracy statistic is said to be the best-fit model of the data and will be used for the forecast. Some popular forecasting methods include several kinds of exponential smoothing models (example in Figure 6), Winters' method, ARIMA (Box-Jenkins) models (example in Figure 7), and dynamic or transfer function models.

Each of these methods also has multiple parameters, so the selection of the best-fit model is not a simple undertaking and requires some understanding of statistics. However, some high-performance forecasting environments provide automatic model selection for you.

Figure 6: Seasonal Exponential Smoothing model

Figure 7: An ARIMA model

You can also produce forecasts by combining the forecasts from several models, which would support multiple workloads on the same server or the process of combining workloads for server consolidation. Figure 8 shows two forecasted workloads being combined onto one server.

Figure 8: Forecasting combined workloads
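A minimal sketch of this fit-compare-combine idea using statsmodels (one way to express it in Python; the paper's examples use SAS). The file names, candidate models, and orders are illustrative, and the series are assumed to be long enough for a 12-period seasonal fit:

```python
import pandas as pd
from statsmodels.tsa.holtwinters import ExponentialSmoothing
from statsmodels.tsa.arima.model import ARIMA

# Two workload series (e.g. monthly CPU-seconds), assumed extracted from a CDB.
w1 = pd.read_csv("workload1.csv", index_col=0, parse_dates=True).squeeze()
w2 = pd.read_csv("workload2.csv", index_col=0, parse_dates=True).squeeze()

def best_fit(series):
    """Fit candidate models and keep the one with the best accuracy statistic (AIC)."""
    candidates = [
        ExponentialSmoothing(series, trend="add",
                             seasonal="add", seasonal_periods=12).fit(),
        ARIMA(series, order=(1, 1, 1)).fit(),
    ]
    return min(candidates, key=lambda m: m.aic)

# Forecast each workload separately, then combine them to represent both
# workloads running on one consolidated server (as in Figure 8).
horizon = 12
combined = best_fit(w1).forecast(horizon) + best_fit(w2).forecast(horizon)
print(combined)
```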

3.2.3 Generate Forecasts

Once the best-fit model is found, the forecast can be generated. At this point the inclusion of outside data in the forecasting process enables future business events to be added to the forecast. For example, business growth forecasts and campaigns, new functions added to the business applications, new applications, and any expected outside factors that may influence the future utilization of a system may also be included in the forecast. The next two figures show the addition of a sales campaign and a new application to the forecast [SAS2006].
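One way to fold a planned business event into the statistical forecast is as an exogenous 0/1 regressor; the sketch below uses statsmodels' SARIMAX, with all dates, file names, and model orders purely illustrative:

```python
import pandas as pd
from statsmodels.tsa.statespace.sarimax import SARIMAX

# Monthly utilization series, assumed loaded from the CDB.
y = pd.read_csv("app_util.csv", index_col=0, parse_dates=True).squeeze()

# 0/1 indicator for months with a sales campaign: known for the past,
# planned for the future (dates here are invented).
campaign = pd.Series(0, index=y.index)
campaign.loc["2006-11":"2006-12"] = 1

model = SARIMAX(y, exog=campaign, order=(1, 1, 1),
                seasonal_order=(1, 0, 0, 12)).fit(disp=False)

# Forecast 6 months ahead, telling the model a campaign is planned in month 3.
future_campaign = pd.Series(
    [0, 0, 1, 0, 0, 0],
    index=pd.date_range(y.index[-1], periods=7, freq="MS")[1:])
print(model.forecast(steps=6, exog=future_campaign))
```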

Figure 9: Additional growth due to a planned campaign

Figure 10: Adding the effects of a new application

4 Modeling

Modeling is used to build a replica of a real system for making predictions and doing what-if analysis. This provides a virtual environment for understanding the impact of changes to workloads, servers, networks, or other infrastructure components. Since workloads are often hard to predict from historical data, the model can be used to predict both known and hypothetical situations. Key to the model's usefulness is confidence in its accuracy. Therefore, the forecasts and models work together for initial and ongoing validation.

4.1 Modeling Approaches

The primary types of models are analytical (queuing-theory based) models and simulation models. As with forecasting, the right approach is related to the amount of detail needed to answer the business questions, balanced with the amount of time that can be devoted to the modeling effort. Analytical models are typically faster and simpler to build and use than their simulation counterparts. Analytical models are well suited for capacity management studies, whereas simulation models are best suited for performance studies. Analytical modeling software can be learned in a couple of days, whereas simulation modeling software requires a longer learning curve and a deeper prerequisite skill set. Analytical models are designed for producing quick answers and can be built and run within a matter of hours. They are best suited for predicting the overall system utilization of CPU, disk, and network bandwidth based on broad workload definitions. For more precise predictions of individual business functions, interdependent system components, and detailed response time, a simulation model is the appropriate choice.

4.1.1 Analytical Modeling

Analytical models predict system behavior based on mathematical calculations. The most prevalent form is a queuing theory / network model. This divides the system into two main components:
- Service centers, which represent the actual devices such as CPU, disk, or network devices. Devices can consume a fixed amount of resource or a variable amount of resource depending on the load.
- Workloads, which represent the demand placed on the service centers.
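A minimal sketch of the calculation behind such a model, using the standard M/M/1 open-network formulas (utilization rho = lambda * S, residence time R = S / (1 - rho)); the service demands are invented for illustration:

```python
# Open queuing network: one M/M/1 service center per device.
# demand = seconds of service one transaction needs at that device.
demands = {"cpu": 0.020, "disk": 0.012, "net": 0.005}  # illustrative values
arrival_rate = 30.0  # transactions per second

response_time = 0.0
for device, demand in demands.items():
    utilization = arrival_rate * demand          # rho = lambda * S
    if utilization >= 1.0:
        raise ValueError(f"{device} is saturated (rho = {utilization:.2f})")
    residence = demand / (1.0 - utilization)     # R = S / (1 - rho)
    response_time += residence
    print(f"{device}: {utilization:.0%} busy, {residence * 1000:.1f} ms")

print(f"predicted response time: {response_time * 1000:.1f} ms")
```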

4.1.2 Simulation Modeling

Simulation models are computer representations of real systems. They mimic actual system behavior and produce highly accurate predictions. These models allow for a detailed understanding of the impact of change. As a result, simulation models are an ideal change management tool, allowing users to apply a change to the simulated system and see the result before waiting to see what would happen in the actual live system. A model provides a virtual test environment, a risk-free way to assess changes, which reduces unexpected downtime and missed service levels. Although there are several simulation approaches available, discrete-event simulation (DES) models are recommended for IT systems since they can represent multiple system behaviors. Due to the highly flexible nature of simulation models, they can be used for a variety of levels of detail, from basic utilization models to system scalability models to advanced optimization models [SPEL2002]. Models are built from information collected about the actual system. As a result, the accuracy of the model is dependent on the accuracy of the inputs. Figure 11 differentiates the two modeling approaches based on the parameters used in model construction.

Figure 11: Analytical vs. Simulation Model Parameters

Model Element | Analytical | Simulation
Workload type | Based on interarrival rate | Based on interarrival rate or user think time
Application | Business functions are represented as a whole | Transaction flows are represented for each business function individually or for the business process as a whole
Infrastructure | Server details and basic network bandwidth | Server and network configuration details
Resource consumption | Utilization at the system or process level | Utilization at the system, process, or business function level
Network consumption | N/A | Network traffic by business function
Response time predictions | Relative | Detailed
Modeling engine | Uses queuing theory to calculate system utilizations and relative changes to response time | Simulates system events to determine system utilizations and expected (actual) response times
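To make the discrete-event idea concrete, here is a hand-rolled minimal sketch of one simulated service center (M/M/1, FIFO), not any particular product's engine; its output can be checked against the analytic formulas above:

```python
import random

def mm1_sim(arrival_rate, service_time, n_jobs=50_000, seed=1):
    """Minimal discrete-event simulation of one service center (M/M/1, FIFO)."""
    random.seed(seed)
    clock = server_free_at = busy = total_resp = 0.0
    for _ in range(n_jobs):
        clock += random.expovariate(arrival_rate)        # next arrival event
        service = random.expovariate(1.0 / service_time)
        start = max(clock, server_free_at)               # queue if server busy
        server_free_at = start + service
        busy += service
        total_resp += server_free_at - clock             # wait + service
    # utilization estimate = busy time / total simulated time
    return busy / server_free_at, total_resp / n_jobs

util, resp = mm1_sim(arrival_rate=30.0, service_time=0.020)
print(f"simulated: {util:.0%} busy, {resp * 1000:.1f} ms")
# Analytic check for the same parameters: 0.020 / (1 - 0.6) = 50 ms.
```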

For capacity management purposes, analytic or basic simulation models are appropriate. Analytic models tend to be preferred for capacity planning since they are faster to build and easier to use. Simulation models are used when the utilization models will be extended to more detailed performance models for system scalability, software scalability, or application optimization studies.

4.2 Building a Utilization Model

The steps for building a utilization model are:
- Capture historical data
- Define the elements of the model: devices and workloads
- Choose a baseline interval and set the forecast period
- Validate the baseline configuration
- Perform what-if analysis
- Compare results and make business recommendations

4.2.1 Capture Historical Data

The basis for all models is a valid set of input data. It is recommended that the data include a minimum of twelve samples so that the results are statistically significant. Typically, 5- or 15-minute intervals are used, for a total baseline data requirement of one to three hours. For developing an enterprise-wide capacity plan, it is common for multiple system monitors to be used. As a result, the data extraction, transformation, and load (ETL) process should support data from a variety of sources and at the system, application, or process level. Once the data is normalized and stored, it provides a centralized performance data repository for modeling.
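A minimal sketch of such an ETL step, assuming pandas and two hypothetical monitor extracts with different sample intervals and time zones:

```python
import pandas as pd

# Hypothetical extracts from two monitors: one logs per minute in UTC,
# the other every 5 minutes in US/Central.
a = pd.read_csv("monitor_a.csv", parse_dates=["ts"], index_col="ts")
b = pd.read_csv("monitor_b.csv", parse_dates=["ts"], index_col="ts")

# Transform: put both on a common clock...
a.index = a.index.tz_localize("UTC")
b.index = b.index.tz_localize("US/Central").tz_convert("UTC")

# ...and a common 15-minute grid (columns are assumed to be distinct metrics).
merged = pd.concat(
    [a.resample("15min").mean(), b.resample("15min").mean()], axis=1)

# Load: store in the centralized performance data repository.
merged.to_csv("perf_repository.csv")
```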

Figure 12 shows the data after it has been prepared [HYPR2006].

Figure 12: CPU Utilization from Multiple Servers

4.2.2 Define the Elements of the Model: Devices and Workloads

The infrastructure is built using a component library of pre-defined server, disk, and network devices (see Figure 13). Workloads are then defined by mapping the system- or process-level data. Together, the workloads, the infrastructure, and the relationships between them comprise the virtual data center (the model). Thresholds are defined for each element: utilization limits for the servers and response time limits for the workloads. Growth rates are also specified, which provide a fixed or variable increase in the workload going through the model during runtime. Forecasts are used to determine the right growth rate for the model. Figure 14 shows the seasonal growth rate (as determined from the forecast) entered in the model [HYPRCM2006].
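As a minimal sketch of what these model elements amount to as data structures (purely illustrative field names, not the schema of any particular modeling product):

```python
from dataclasses import dataclass, field

@dataclass
class Device:
    name: str
    kind: str                 # "server", "disk", or "network"
    utilization_limit: float  # threshold, e.g. 0.75

@dataclass
class Workload:
    name: str
    demands: dict             # device name -> service demand (seconds/txn)
    response_limit: float     # response time threshold, in seconds
    growth: list = field(default_factory=list)  # per-period growth factors

app_server = Device("app01", "server", utilization_limit=0.75)
orders = Workload(
    name="order-entry",
    demands={"app01": 0.020},
    response_limit=0.5,
    growth=[1.03, 1.03, 1.10, 1.25],  # seasonal factors taken from the forecast
)
```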

Figure 13: Modeled infrastructure of server and network devices

Figure 14: Entering growth rates into the model (growth rates for seasonality are determined by the forecast)

4.2.3 Choose a Baseline Interval and Set Forecast Period

The mathematical computations in the analytic model use the baseline data to generate the predictions. Therefore, it is important that analysis be done to select an appropriate baseline interval instead of choosing all the data just because it is available. For example, Figure 15 shows a number of peaks and valleys in the historical data. If the entire interval were selected, it would average across all the data and skew the results. Rather, we want to select a subset of the data that represents a fair steady state. The baseline can be any given number of minutes, hours, or days. Baselines that are extremely high (representing peaks) or low (representing maintenance windows) are not recommended. The forecast interval is then set to define the period of time the model will look forward.

Figure 15: Baseline Interval (a steady-state period in the historical data is chosen for the model's baseline interval)

4.2.4 Validate the Baseline Configuration

When each of these steps has been completed, the first model is generated. This is called the base configuration. Before the model is used to make business decisions, it must first be verified and validated for accuracy. Verification determines that the model does what it is supposed to do. Validation occurs when the model makes proper predictions under many different configurations. Figure 16 shows a validation report which compares modeled output with measured input. This report shows a close match, which gives us confidence in the accuracy of the model.

Figure 16: Validation Report: Measured vs. Modeled

Utilization models can also be validated on an ongoing basis. If the base configuration demonstrates that server utilization will grow from 18% to 25% as a result of 3% fixed workload growth over the next month, the operations staff should schedule time to measure the process utilization a month from now. If that measurement shows 25% utilization, the original model accurately projected this performance. If the measured value is significantly above or below this value, the initial model may have been fed incomplete data, or recent changes to the infrastructure may have occurred during the month which directly impacted this process. The model should then be recalibrated to reflect the changes in baseline data, infrastructure components, and/or workload characterization.
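A minimal sketch of such an ongoing validation check; the tolerance is illustrative and would be set by local policy:

```python
def check_model(predicted_util, measured_util, tolerance=0.05):
    """Flag the model for recalibration if prediction and measurement diverge."""
    error = abs(predicted_util - measured_util)
    if error > tolerance:
        return f"recalibrate: off by {error:.1%}"
    return f"model holds: off by {error:.1%}"

# The 18% -> 25% example from the text: a month later we take a measurement.
print(check_model(predicted_util=0.25, measured_util=0.31))  # recalibrate
print(check_model(predicted_util=0.25, measured_util=0.26))  # model holds
```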

4.2.5 Perform What-If Analysis

After the initial model has been created and validated, it can be used for what-if scenarios (Figure 17). The following changes can be made to the model:
- Servers: number of servers, CPU speed, number of CPUs, operating system, disk speed, server consolidation
- Networks: bandwidth, latency
- Workloads: growth rates (variable or static), response time thresholds, workload reassignment

Figure 17: What-if scenario of a change to an individual server. The model's component library is used to answer the question: what if we use a different server? In this example, the model shows that the replacement server will provide enough capacity for the next 7 months.

The results of multiple what-if analyses can be compared using scenario analysis. This makes it easy to evaluate multiple changes at once. As changes are made to the model, the scenarios track and differentiate the results (see Figure 18). Although it may be possible to predict the result of the first change, it becomes increasingly difficult to predict the results of the third, fourth, and fifth successive changes.
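A minimal sketch of such a scenario comparison, reusing the analytic queuing idea from section 4.1.1 to ask how many months of headroom each scenario buys; all demands, rates, and growth figures are invented, not the values in the figures:

```python
def months_of_headroom(demand, rate, monthly_growth, limit=0.75, horizon=24):
    """Months until utilization (rate * demand, compounded) exceeds the limit."""
    for month in range(1, horizon + 1):
        if rate * (monthly_growth ** month) * demand > limit:
            return month - 1
    return horizon

scenarios = {
    "base (current server)":   {"demand": 0.020, "rate": 30.0},
    "upgrade CPU (2x faster)": {"demand": 0.010, "rate": 30.0},
    "consolidate workloads":   {"demand": 0.020, "rate": 45.0},
}
for name, s in scenarios.items():
    m = months_of_headroom(s["demand"], s["rate"], monthly_growth=1.10)
    print(f"{name}: capacity exhausted after {m} months")
```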

Figure 18: A scenario analysis report shows a quick side-by-side comparison of various what-if analyses. In this example, the base configuration and 3 scenarios are shown; the model shows that a fixed growth rate of 10% will run out of capacity in 6 months, and upgrading the app server will buy another 2 months of capacity.

The model is used to predict the outcome of individual changes as well as interdependencies across the complex IT environment. These cascade effects occur when one bottleneck is relieved and another bottleneck appears in an unexpected place. Modeling allows multiple scenarios to be defined and run simultaneously, so that these cause-and-effect results can be identified, and the right changes prescribed, before a threshold is exceeded in the actual live system.

CONCLUSION

The optimum capacity planning methodology includes both forecasting and modeling to predict capacity needs and prescribe the right solutions. The capacity management process starts with monitoring current systems and reacting to issues in the live production environment. Forecasting extends the capacity planning process by analyzing the monitored data and making predictions about future resource requirements or issues. This information is fed into models, which are used for what-if scenarios to determine the future resource requirements or the impact of other changes.

Forecasting uses historical data to predict future values of workloads, rates (I/O, network, application transaction, etc.), and utilization (CPU, disk space, bandwidth, etc.). Through the use of time series forecasting, these critical drivers of IT resources can be statistically modeled and forecasted into the future. With all known changes and events added to the statistical forecast, the predicted workloads, rates, and utilizations can be used as drivers for modeling the supporting IT infrastructure and applications.

Modeling is used to represent an interactive system in order to understand the utilization of all the components and the performance of the application as experienced by the end user. The model can be used to evaluate changes to workloads, hardware, software, and network bandwidth, and determine the performance impact of each. The two primary modeling approaches are analytic/queuing theory and simulation. Analytical models tend to be faster but less detailed, whereas simulation models tend to be more granular and precise.

The partnership of forecasting and modeling for capacity planning allows the IT organization to predict when a problem will occur in the future and prescribe the right solution. This adds a proactive component to capacity management by preventing capacity shortages, system failures, and poor response times before they occur and before they impact the end user's experience.

REFERENCES

[FOR2005] L. Orlov and A. Bartels, "Memo to CEOs and CIOs: IT Innovation Capacity, Not IT Spend, Is What Matters," Forrester Research, 2005.
[ITIL] Service Delivery, IT Infrastructure Library Series, Office of Government Commerce, 2001.
[SPEL2002] A. Spellmann and R. Gimarc, "Stepwise Refinement: A Pragmatic Approach for Modeling Web Applications," CMG 2002 International Conference.
[SAS2006] SAS IT Intelligence, http://www.sas.com/solutions/itsysmgmt/index.html, Cary, NC, 2006.
[HYPR2006] IPS Capacity Manager 3.1 and Performance Optimizer 3.4, www.hyperformix.com, Austin, TX, 2006.
