
Automatically Classifying Database Workloads

Said Elnaffar
School of Computing, Queen's University, Kingston, ON K7L 3N6, Canada
+1 (613) 547-3556
elnaffar@cs.queensu.ca

Pat Martin
School of Computing, Queen's University, Kingston, ON K7L 3N6, Canada
+1 (613) 533-6513
martin@cs.queensu.ca

Randy Horman
IBM Toronto Lab, 8200 Warden Ave., Markham, ON L6G 1C7, Canada
+1 (905) 413-2432
horman@ca.ibm.com

ABSTRACT
The type of the workload on a database management system (DBMS) is a key consideration in tuning the system. Allocations for resources such as main memory can be very different depending on whether the workload type is Online Transaction Processing (OLTP) or Decision Support System (DSS). In this paper, we present an approach to automatically identifying a DBMS workload as either OLTP or DSS. We build a classification model based on the most significant workload characteristics that differentiate OLTP from DSS, and then use the model to identify any change in the workload type. We construct a workload classifier from the Browsing and Ordering profiles of the TPC-W benchmark. Experiments with an industry-supplied workload show that our classifier accurately identifies the mix of OLTP and DSS work within an application workload.

Categories and Subject Descriptors


C.4 [Computer Systems Organization]: Performance of Systems – measurement techniques, modeling techniques, performance attributes.

General Terms
Management, Measurement, Performance, Experimentation.

Keywords
workload characterization, classification, autonomic databases, data mining, OLTP, DSS, self-managed DBMSs.

1. INTRODUCTION
Database administrators (DBAs) tune a database management system (DBMS) based on their knowledge of the system and its workload. The type of the workload, specifically whether it is Online Transaction Processing (OLTP) or Decision Support System (DSS), is a key criterion for tuning [4][6]. In addition, a DBMS experiences changes in the type of workload it handles during its normal processing cycle. For example, a bank may experience an OLTP-like workload while executing the traditional daily transactions for most of the month, whereas in the last few days of the month the workload becomes more DSS-like because of the tendency to issue financial reports and run long executive queries that produce summaries. DBAs must therefore also recognize significant shifts in the workload and reconfigure the system in order to maintain acceptable levels of performance.

The goal of our research is a technology by which a DBMS can automatically identify the type of its workload. This is an important step towards autonomic DBMSs, which know themselves and the context surrounding their activities, and can automatically tune themselves to efficiently process the workloads put on them [5]. Our solution treats workload type identification as a data mining classification problem, in which DSS and OLTP are the class labels and the data objects classified are database performance snapshots. We first construct a workload model, or workload classifier, by training the classification algorithm on sample OLTP and DSS workloads. We then use the workload classifier to identify snapshot samples drawn from unknown workload mixes. The classifier scores the snapshots by tagging each with one of the class labels, DSS or OLTP. The number of DSS- and OLTP-tagged snapshots reflects the concentration (relative proportions) of each type in the mix. We validate our approach experimentally with workloads generated from the Transaction Processing Performance Council (TPC) Web Commerce benchmark, TPC-W [8] (see Footnote 1), and with real workloads provided by a major global banking firm. These workloads are run on DB2 Universal Database Version 7.2 [4].

To the best of our knowledge, there is no previous published work examining the problem of automatically identifying the type of a DBMS workload. There are, however, numerous studies characterizing database workloads based on different properties that can be exploited in tuning DBMSs [1].

The rest of this paper is organized as follows. Section 2 describes our approach to the problem. Section 3 explains our methodology and how we compose the snapshots. Section 4 describes our experiments and discusses the results obtained with the benchmark and the industry-supplied workload. Section 5 presents our conclusions and guidelines for future work.

Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. CIKM '02, November 4-9, 2002, McLean, Virginia, USA. Copyright 2002 ACM 1-58113-492-4/02/0011 $5.00.

Footnote 1: Since our TPC benchmark setups have not been audited per TPC specifications, when the term TPC-W is used it should be taken to mean TPC-W-like.

Table 1. Snapshot attribute classes

  System-Dependence   Snapshot Attribute        Weight
  Low                 Queries Ratio             1.0
  Low                 Pages Read                1.0
  Low                 Rows Selected             1.0
  Low                 Pages Scanned             1.0
  Low                 Logging                   1.0
  Medium              Number of Sorts           0.75
  Medium              Ratio of Using Indexes    0.75
  High                Sort Time                 0.3
  High                Number of Locks Held      0.3

Table 2. Settings for the SPRINT classification algorithm (settings used in DB2 Intelligent Miner)

  Maximum Tree Depth                  No Limit Imposed
  Maximum Purity of Internal Node     100
  Minimum Records Per Internal Node   5
  Attribute Weights                   See Table 1
  Error Matrix                        None
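For concreteness, Tables 1 and 2 can be written down as a small configuration structure, as in the sketch below. The attribute and setting names are our own informal shorthand, not DB2 Intelligent Miner or SPRINT identifiers.

```python
# Illustrative encoding of Tables 1 and 2; names are informal shorthand.

# Table 1: snapshot attributes grouped by system-dependence, with weights.
ATTRIBUTE_WEIGHTS = {
    # Low system-dependence (weight 1.0)
    "queries_ratio": 1.0,
    "pages_read": 1.0,
    "rows_selected": 1.0,
    "pages_scanned": 1.0,
    "logging": 1.0,
    # Medium system-dependence (weight 0.75)
    "number_of_sorts": 0.75,
    "ratio_of_using_indexes": 0.75,
    # High system-dependence (weight 0.3)
    "sort_time": 0.3,
    "number_of_locks_held": 0.3,
}

# Table 2: settings used for the SPRINT run in DB2 Intelligent Miner.
SPRINT_SETTINGS = {
    "max_tree_depth": None,               # no limit imposed
    "max_purity_internal_node": 100,
    "min_records_per_internal_node": 5,
    "attribute_weights": ATTRIBUTE_WEIGHTS,
    "error_matrix": None,
}
```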

2. APPROACH
We view the problem of classifying DBMS workloads as a machine-learning problem in which the DBMS must learn how to recognize the type of the workload mix. Classification is a two-step process. In the first step, we build a model, or classifier, to describe a predetermined set of data classes. The model is constructed by analyzing a training set of data objects. Each object is described by attributes, including a class label attribute that identifies the class of the object. The learned model is represented in the form of a decision tree embodying the rules that can be used to categorize future data objects. In the second step, we use the model for classification. First, the predictive accuracy of the classifier is estimated using a test data set. If the accuracy is considered acceptable (for example, 80% or more of the tested snapshots are classified as DSS or OLTP when we attempt to identify a known DSS or OLTP workload), the model can be used to classify other sets of data objects for which the class label is unknown.

For our problem, we define the DSS and OLTP workload types to be the two predefined class labels. The data objects needed to build the classifier are performance snapshots taken during the execution of a training database workload. Each snapshot reflects the workload behavior (or characteristics) at some point during the execution, and is labeled as either OLTP or DSS. We use SPRINT [7], a fast, scalable, decision-tree-based algorithm implemented in DB2 Intelligent Miner Version 6.1 [3], to build our classifier.

The attributes collected in each snapshot are shown in Table 1. We group the attributes into three classes based on their degrees of system-dependence, and assign different weights to each class of attributes to reflect their significance to the classification process. We arbitrarily assign weights of 1.0, 0.75, and 0.3 to low-, medium-, and high-dependence attributes, respectively (see Footnote 2). A full description of these attributes and a discussion of their selection criteria are given elsewhere [2].
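To make the two-step process concrete, the following sketch, which is ours rather than part of the original study, uses scikit-learn's DecisionTreeClassifier as a rough stand-in for SPRINT in DB2 Intelligent Miner. It approximates the Table 1 attribute weights by scaling the feature columns (the maximum-purity setting has no direct equivalent here) and checks the 80% accuracy criterion on a held-out set. All function and variable names are illustrative.

```python
import numpy as np
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeClassifier

def build_workload_classifier(snapshots, labels, attribute_weights):
    """Train and validate a decision-tree workload classifier (sketch).

    snapshots:         2-D array-like, one row per normalized snapshot,
                       columns in the same order as attribute_weights.
    labels:            "DSS" or "OLTP" for each row.
    attribute_weights: mapping of attribute name to its Table 1 weight.
    """
    # Approximate SPRINT-style attribute weighting by scaling feature columns.
    weights = np.array(list(attribute_weights.values()), dtype=float)
    X = np.asarray(snapshots, dtype=float) * weights
    X_train, X_test, y_train, y_test = train_test_split(
        X, labels, test_size=0.25, random_state=0)

    # Step 1: build the model from the labeled training snapshots.
    tree = DecisionTreeClassifier(
        max_depth=None,        # "No Limit Imposed" in Table 2
        min_samples_split=5)   # rough analogue of "Minimum Records Per Internal Node"
    tree.fit(X_train, y_train)

    # Step 2: estimate predictive accuracy on held-out snapshots; the paper
    # treats roughly 80% or better on a known workload as acceptable.
    accuracy = tree.score(X_test, y_test)
    if accuracy < 0.80:
        raise ValueError(f"accuracy {accuracy:.2f} is below the 80% criterion")
    return tree
```

In this sketch, the DSS-labeled and OLTP-labeled training snapshot sets would simply be concatenated to form snapshots and labels before the function is called.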

3. METHODOLOGY
We construct a workload classifier as follows. We run sample DSS and OLTP workloads and collect sets of snapshots for each one. We label the snapshots as OLTP or DSS and then use them as training sets to build the classifier. We need to choose a snapshot interval such that there are sufficient training objects to build a classifier and the interval is large enough to contain at least one completed SQL statement. With a snapshot interval of one second, we observed that many SQL statements complete within an interval of that size in an OLTP workload. This is not the case, however, for DSS workloads, which contain complex queries that take too long to complete within one second. We therefore dynamically resize the snapshots by coalescing consecutive one-second raw snapshots until we encompass at least one statement completion. We then normalize the consolidated snapshots with respect to the number of SQL statements executed within a snapshot. Consequently, each normalized snapshot describes the characteristics of a single SQL statement. During this training phase, we run each workload type for about 20 minutes, producing 2400 one-second raw snapshots to process.

After training, we use the generated classifier to identify the OLTP-DSS mix of a given workload. We run the workload for about 10 minutes (600 raw snapshots) and produce a set of consolidated snapshots as described above. We then feed these snapshots to the classifier, which identifies each one as either DSS or OLTP and supports each decision with a confidence value between 0.0 and 1.0 that indicates the probability that the class of the snapshot is predicted correctly. Only snapshots with high confidence values, greater than 0.9, are considered; on average, we observed that over 90% of the snapshots examined satisfy this condition. Finally, we compute the workload type concentration (see Footnote 3) in the mix.
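The consolidation and scoring steps described above can be sketched as follows. This is a simplified illustration that assumes each raw one-second snapshot is a dictionary of attribute values that can be summed (ratios are summed here for simplicity), including a hypothetical statements_completed count, and that the classifier returns a (label, confidence) pair; none of these field names come from the DB2 monitor interface.

```python
def consolidate(raw_snapshots):
    """Coalesce one-second snapshots until at least one SQL statement has
    completed, then normalize each consolidated snapshot per statement."""
    consolidated, pending = [], None
    for snap in raw_snapshots:
        if pending is None:
            pending = dict(snap)
        else:
            for key, value in snap.items():
                pending[key] += value
        if pending["statements_completed"] >= 1:
            n = pending.pop("statements_completed")
            # Each consolidated snapshot now describes a single SQL statement.
            consolidated.append({k: v / n for k, v in pending.items()})
            pending = None
    return consolidated

def dssness(classify, snapshots, min_confidence=0.9):
    """Percentage of confidently classified snapshots tagged as DSS."""
    dss = oltp = 0
    for snap in snapshots:
        label, confidence = classify(snap)   # e.g. ("DSS", 0.97)
        if confidence <= min_confidence:
            continue                         # keep only high-confidence snapshots
        if label == "DSS":
            dss += 1
        else:
            oltp += 1
    total = dss + oltp
    return 100.0 * dss / total if total else 0.0
```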

4. EXPERIMENTS
We constructed a classifier for our experiments using the TPC-W Browsing and Ordering profiles as the DSS and OLTP training workloads, respectively. We ran each training workload for approximately 20 minutes and collected the values of the snapshot attributes every second. The most important properties of the experimental setup for these runs are summarized in Table 2.
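For illustration only, per-second collection of this kind could be driven by a simple polling loop such as the one below; get_snapshot is a hypothetical callback standing in for whatever snapshot-monitor call is actually used, and the loop does not reflect the DB2 interface employed in the paper.

```python
import time

def collect_raw_snapshots(get_snapshot, duration_seconds=1200, interval=1.0):
    """Poll a hypothetical monitoring callback once per second for the length
    of a run (about 20 minutes yields roughly 1200 raw snapshots)."""
    raw = []
    deadline = time.monotonic() + duration_seconds
    while time.monotonic() < deadline:
        started = time.monotonic()
        raw.append(get_snapshot())
        # Sleep off the remainder of the one-second interval.
        time.sleep(max(0.0, interval - (time.monotonic() - started)))
    return raw
```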

Footnote 2: These weights are independent of any product or system settings we are using. Any other reasonable numbers that serve to rank the attribute classes are acceptable.

Footnote 3: In the remainder of the paper, we express this concentration exclusively in terms of the DSSness, which is the percentage of DSS-classified snapshots in the mix. The OLTPness is the complement of the DSSness, that is, 100 - DSSness.

[Figure 1. The classifier on the Shopping profile: DSSness of 91.6% for Browsing, 75.2% for Shopping, and 6.2% for Ordering.]

[Figure 2. Performance of the workload classifier: the Browsing profile is classified as 91.6% DSS and 8.4% OLTP, and the Ordering profile as 6.2% DSS and 93.8% OLTP.]

The first experiment evaluates the prediction accuracy of the classifier. Figure 2 shows the results of testing the classifier against test samples drawn from the Browsing and Ordering profiles. The classifier reports that approximately 91.6% of the snapshots in the Browsing workload are DSS and the remaining 8.4% are OLTP, whereas approximately 6.2% of the snapshots in the Ordering workload are DSS and the remaining 93.8% are OLTP.

In the second experiment, we use the Shopping profile, a third mix available in TPC-W, to evaluate the ability of the classifier to detect variation in the type intensity of a workload. As seen in Figure 1, the classifier reports that 75.2% of the Shopping profile is DSS, which means that Shopping is closer to Browsing than to Ordering. This finding matches the TPC-W specification and leads us to believe that the classifier has effectively learned the characteristics of the TPC-W workload and can accurately sense variation in the workload type intensity.

The third experiment examines the ability of the classifier to identify a completely different workload. A major global investments and banking firm provided several workload samples from an online decision support system that helps investors and shareholders obtain the most recent information about the state of the market so that they can balance their portfolios and make knowledgeable financial decisions. The classifier reported that 90.96% of these industrial workload samples are DSS, which is very close to the percentage reported for the Browsing profile (91.6%) and meets our expectations.

5. CONCLUSIONS
In this paper, we present a methodology by which a DBMS can learn how to distinguish between workload types. We demonstrate the methodology by creating and evaluating a workload classifier, which was tested against benchmark workloads and a real industrial workload. This research is in progress, and we are currently experimenting with different benchmark and industrial workloads. A number of related issues require further study. First, we will investigate the possibility of constructing a generic classifier that can recognize a wide range of workloads. Second, we will study the utility of establishing a feedback mechanism between the workload classifier and the DBAs to help them understand their system conditions and, consequently, develop better performance-tuning strategies. Third, we plan to develop a method to anticipate when a change in the workload type may occur. This should eliminate the need for continuous, online monitoring that inevitably imposes additional overhead on the system.

6. ACKNOWLEDGEMENTS
We thank IBM Canada, the Natural Sciences and Engineering Research Council of Canada (NSERC) and Communications and Information Technology Ontario (CITO) for their support. We also thank many people at the IBM Toronto Lab, especially Berni Schiefer, Sam Lightstone, Robin Van Boeschoten and Kenton DeLathouwer, for their time and cooperation.

7. REFERENCES
[1] Elnaffar, S., Martin, P. Characterizing Computer Systems Workloads. Submitted to ACM Computing Surveys.
[2] Elnaffar, S. A Methodology for Auto-recognizing DBMS Workloads. To appear in Proceedings of CASCON '02 (November 2002).
[3] IBM. DB2 Intelligent Miner for Data. http://www4.ibm.com/software/data/iminer/fordata/about.html, IBM (1999).
[4] IBM. DB2 Universal Database Version 7 Administration Guide: Performance. IBM Corporation (2000).
[5] IBM. Autonomic Computing: IBM's Perspective on the State of Information Technology. http://www.research.ibm.com/autonomic/manifesto/ (June 2002).
[6] Oracle9i Database Performance Guide and Reference, Release 1 (9.0.1), Part# A87503-02. Oracle Corp. (2001).
[7] Shafer, J.C., Agrawal, R., Mehta, M. SPRINT: A Scalable Parallel Classifier for Data Mining. Proceedings of the 22nd International Conference on Very Large Databases, Mumbai (Bombay), India (September 1996).
[8] TPC Benchmark W (Web Commerce) Standard Specification, Revision 1.7. Transaction Processing Performance Council (October 2001).

Trademarks
The following terms are trademarks or registered trademarks of International Business Machines Corporation in the United States, other countries, or both: DB2, IBM, Intelligent Miner, Universal Database. Other company, product or service names may be trademarks or service marks of others.
