
HOT METHOD PREDICTION USING SUPPORT VECTOR MACHINES

Sandra Johnson, Dr. S. Valli


Department of Computer Science and Engineering, Anna University, Chennai – 600 025, India.
sandra_johnk@yahoo.com , valli@annauniv.edu

ABSTRACT
Runtime hot method detection, an important dynamic compiler optimization
parameter, has challenged researchers to explore and refine techniques that
address the expensive profiling overhead incurred during the process. Although
the recent trend has been toward applying machine learning heuristics in
compiler optimization, their role in identifying and predicting hot methods has
been ignored. The aim of this work is to develop a model, using the Support
Vector Machine (SVM) machine learning algorithm, that identifies and predicts
hot methods in a given program, to which the best set of optimizations can then
be applied. When trained with ten static program features, the derived model
predicts hot methods with 62.57% accuracy.

Keywords: Machine Learning, Support Vector Machines, Hot Methods, Virtual
Machines.

1 INTRODUCTION

Optimizers depend on profile information to identify hot methods of program
segments. The major inadequacy associated with the dynamic optimization
technique is the high cost of accurate data profiling via program
instrumentation. The major challenge is how to minimize the overhead that
includes profile collection, optimization strategy selection and
re-optimization.

While there is a significant amount of work relating to cost effective and
performance efficient machine learning (ML) techniques to tune individual
optimization heuristics, relatively little work has been done on the
identification and prediction of frequently executed program hot spots using
machine learning algorithms so as to target the best set of optimizations. In
this study it is proposed to develop a machine learning based predictive model
using the Support Vector Machine (SVM) classifier. Ten features have been
derived from the chosen domain knowledge for training and testing the
classifier. The training data set is collected from the SPEC CPU2000 INT and
UTDSP benchmark programs. The SVM classifier is trained offline with the
training data set and is then used to predict the hot methods of a program that
was not part of the training set. The system is evaluated for its hot method
prediction accuracy.

This paper is structured as follows. Section 2 discusses related work. Section 3
gives a brief overview of Support Vector Machines. Section 4 describes the
approach and Section 5 the evaluation methodology. Section 6 presents the
results of the evaluation. Section 7 proposes future work and concludes the
paper.

2 RELATED WORK

Machine learning techniques are currently used to automate the construction of
well-designed individual optimization heuristics. In addition, the search is on
for automatic detection of a program segment for targeted optimization. While no
previous work, to the best of our knowledge, has used ML for predicting program
hot spots, this section reviews the research papers which use ML for compiler
optimization heuristics.

In a recent review of research on the challenges confronting dynamic compiler
optimizers, Arnold et al. [1] give a detailed review of adaptive optimizations
used in the virtual machine environment. They conclude that feedback-directed
optimization techniques are not well used in production systems.

Shun Long et al. [3] have used the instance-based learning algorithm to identify
the best transformations for each program. For each optimized program, a
database stores the transformations selected, the program features and the
resulting speedup. The aim is to apply appropriate transformations when a new
program is encountered.

Cavazos et al. [4] have applied an offline ML technique to decide whether to
inline a method or not. The adaptive system uses online profile data to identify
"hot methods", and method calls in the hot methods are inlined using the ML
heuristics.

Ubiquitous Computing and Communication Journal 1


Cavazos et al. [5, 12] have also used supervised learning to decide which
optimization algorithm to use for register allocation: either graph coloring or
linear scan. They have used three categories of method level features for ML
heuristics, i.e., features of edges of a control flow graph, features related to
live intervals and, finally, statistical features about the size of a method.

Cavazos et al. [11] report that the best set of compiler optimizations is method
dependent rather than program dependent. Their paper describes how a logistic
regression-based machine learning technique, trained using only static features
of a method, is used to automatically derive a simple predictive model that
selects the best set of optimizations for individual methods within a dynamic
compiler. They take into consideration the structures of a particular method
within a program to develop a sequence of optimization phases. The automatically
constructed regression model is shown to outperform hand-tuned models.

To identify basic blocks for instruction scheduling, Cavazos et al. [20] have
used supervised learning. Monsifrot et al. [2] have used a decision tree
learning algorithm to identify loops for unrolling. Most of the work [4, 5, 11,
12, 20] is implemented and evaluated using Jikes RVM.

The authors of [8, 19] have used genetic programming to choose an effective
priority function which prioritizes the various compiler options available. They
have chosen hyper-block formation, register allocation and data pre-fetching for
evaluating their optimizations.

Agakov et al. [9] have applied machine learning to speed up search-based
iterative optimization. The statistical technique of principal component
analysis (PCA) is used in their work for appropriate program feature selection.
The program features collected offline from a set of training programs are used
for learning by the nearest neighbor algorithm. Features are then extracted for
a new program and are processed by the PCA before they are classified using the
nearest neighbor algorithm. This reduces the search space to a few good
transformations for the new program from the various available source-level
transformations. However, this model can be applied only to whole programs.

The authors of [10] present a machine learning-based model to predict the
performance of a modified program using static source code features and features
like execution frequencies of basic blocks, which are extracted from the profile
data collected. As proposed in [9], the authors have used the PCA to reduce the
feature set. A linear regression model and an artificial neural network model
are used for building the prediction model, which is shown to work better than
the non-feature-based predictors.

In their work, Fursin et al. [14] have used machine learning to identify the
best procedure clone for the current run of the program. M. Stephenson et al.
[18] have used two machine learning algorithms, the nearest neighbor (NN) and
Support Vector Machines (SVMs), to predict the loop unroll factor. None of these
approaches aims at prediction at the method level. However, machine learning has
been widely used in work on branch prediction [21, 22, 23, 24].

3 SUPPORT VECTOR MACHINES

SVM classification [15, 16] maps training data (xi, yi), i = 1, …, n, where each
instance consists of a set of feature values xi ∈ R^n and a class label
yi ∈ {+1, -1}, into a higher-dimensional feature space φ(x) and defines a
separating hyperplane there. The SVM is a binary classifier, so only two classes
of data can be separated. Fig. 1 shows a linear SVM hyperplane separating two
classes.

The linear separation in the feature space is done using the dot product
φ(x)·φ(y). Positive definite kernel functions k(x, y) correspond to feature
space dot products and are therefore used in the training algorithm instead of
the dot product, as in Eq. (1):

$$k(x, y) = \phi(x) \cdot \phi(y) \qquad (1)$$

The decision function of the SVM is given in Eq. (2):

$$f(x) = \sum_{i=1}^{n} v_i \, k(x, x_i) + b \qquad (2)$$

where b is a bias parameter, xi are the training examples and vi are the
coefficients obtained as the solution to a quadratic optimization problem.
Solving this quadratic optimization problem yields the hyperplane with the
largest margin of separation.

Figure 1: Optimal hyperplane and margin of separation (in the feature space)

4 HOT METHOD PREDICTION

This section briefly describes how machine

learning could be used in developing a model to predict hot methods within a
program. A discussion of the approach is followed by the scheme of the SVM-based
strategy adopted in this study.

Figure 2: System architecture of the SVM-based hot method predictive model

4.1 The approach

Static features of each method in a program are collected by offline program
analysis. The method level features of each method form a feature vector, which
is labeled either hot or cold based on classification by a prior execution of
the program. The training data set thus generated is used to train the SVM-based
predictive model. Next, the test data set is created by offline program analysis
of a newly encountered program. The trained model is used to predict whether a
method is hot or cold for the new program. An offline analysis on the Low Level
Virtual Machine's (LLVM) [6] bytecode representation of the programs provides
the training as well as the test data set. The system architecture for the
SVM-based hot method predictive model is shown in Fig. 2; it closely resembles
the architecture proposed by C.-L. Huang et al. [26]. Fig. 3 outlines the
strategies for building a predictive model.

1. Create training data set.
   a. Collect method level features.
      i. Calculate the specified features for every method in an LLVM
         bytecode module.
      ii. Store the feature set in a vector.
   b. Label each method.
      i. Instrument each method in the program with a counter variable [25].
      ii. Execute the program and collect the execution frequency of each
          method.
      iii. Using the profile information, label each method as either hot or
           cold.
      iv. Write the label and its corresponding feature vector for every
          method to a file.
   c. Repeat steps (a) and (b) for as many programs as are required for
      training.
2. Train the predictive model.
   a. The feature data set is used to train the SVM-based model.
   b. The predictive model is generated as output.
3. Create test data set.
   a. Collect method level features.
      i. Calculate the specified features for every method in a new program.
      ii. Store the feature set in a vector.
      iii. Assign the label '0' to each feature vector in a file.
4. Predict the label as either hot or cold for the test data generated in
   step 3 using the predictive model derived in step 2.

Figure 3: System outline

4.2 Extracting program features

The C programs used for training are converted into LLVM bytecode using the LLVM
frontend. Every bytecode file is organized into a single module. Each module
contains methods which are either user-defined or pre-defined. Only static
features of the user-defined methods are extracted from the bytecode module, for
the simple reason that they can be easily collected by an offline program
analysis. Table 1 lists the 10 static features that are used to train the
classifier. Each feature value of a method is calculated in relation to the
identical feature value extracted from the entire bytecode module. The
collection of all the feature values for a method constitutes the feature vector
xi. This feature vector xi is stored for subsequent labeling. Each feature
vector xi is then labeled yi and classified as either hot

(+1) or cold (-1) based on an arbitrary threshold scheme described in the next
section.

Table 1: Static features for identifying hot methods.

1. Number of loops in a method.
2. Average loop depth of all the loops in the method.
3. Number of top level loops in a method.
4. Number of bytecode level instructions in the method.
5. Number of Call instructions in a method.
6. Number of Load instructions in a method.
7. Number of Store instructions in a method.
8. Number of Branch instructions in the method.
9. Number of Basic Blocks in the method.
10. Number of call sites for each method.

4.3 Extracting method execution frequencies

Hot methods are frequently executed program segments. To identify hot and cold
methods within a training program, profile information is gathered during
execution. The training bytecode modules are instrumented with a counter
variable in each user-defined method. This instrumented bytecode module is then
executed and the execution frequency of each method is collected. Using this
profile information, the top 'N' most frequently executed methods are classified
as hot. This system refers to the value 'N' as the "hot method threshold". In
this scheme of classification, each feature vector xi is labeled yi = +1 for a
hot method and yi = -1 for a cold method. The feature vector xi along with its
label yi is then written into a training data set file. Similarly, the training
data sets of the different training programs are accumulated in the same file.
This file is used as input to train the predictive model.

+1 1:1 2:1 3:1 4:0.880046 5:2.51046 6:0.875912 7:0.634249 8:1.23119 9:1.59314 10:29
-1 1:0 2:0 3:0 4:1.16702 5:1.25523 6:1.0219 7:3.38266 8:1.50479 9:1.83824 10:2
+1 1:2 2:2 3:2 4:1.47312 5:0.83682 6:1.89781 7:1.47992 8:2.59918 9:2.81863 10:3

Figure 4: Sample training data set

The general format of a feature vector is

yi 1:xi1 2:xi2 3:xi3 … j:xij

where the labels 1, 2, 3, …, j are the feature numbers and xi1, xi2, …, xij are
their corresponding feature values. Fig. 4 shows a sample of three feature
vectors from the training data set collected for the user-defined methods found
in a SPEC benchmark program. The first feature vector in Fig. 4 is a hot method
and is labeled +1. The values of the ten features are listed serially; for
example, '1' is the value of feature 1 and '29' is the value of feature 10. The
value '1' of feature 1 indicates the percent of loops found in the method. With
the "hot method threshold" set to 50%, the 4 most frequently executed methods
out of 8 in a program are designated as hot methods. The first element in each
vector is the label yi (+1 or -1). Each subsequent element gives the feature
number followed by its feature value.

4.4 Creating test data set

When a new program is encountered, the test data set is collected in a way
similar to the training data set, except that the label is specified as zero.

0 1:1 2:1 3:1 4:1.13098 5:2.91262 6:2.05479 7:1.09091 8:1.34875 9:1.55172 10:34
0 1:0 2:0 3:0 4:0.552341 5:0.970874 6:1.14155 7:0.363636 8:0.385356 9:0.862069 10:4
0 1:1 2:1 3:1 4:1.26249 5:0 6:2.51142 7:2.90909 8:1.15607 9:1.2069 10:40

Figure 5: Sample test data set

4.5 Training and prediction using SVM

Using the training data set file as input, the SVM is trained with default
parameters (C-SVM, C = 1, radial basis function kernel). Once trained, the
predictive model is generated as output. The derived model is used to predict
the label for each feature vector in the test data set file. The training and
prediction are done offline. Subsequently, the new program used for creating the
test data set is instrumented. Executing this instrumented program provides the
most frequently executed methods. The prediction accuracy of the system is
evaluated by comparing the predicted output with the actual profile values.

5 EVALUATION

5.1 Method

Prediction accuracy is defined as the ratio of events correctly predicted to all
the events encountered. This prediction accuracy is of two types: hot method
prediction accuracy and total prediction accuracy. Hot method prediction
accuracy is the ratio of correct hot method predictions to the actual number of
hot methods in a program, whereas total prediction accuracy is the ratio of
correct predictions (either hot or cold) to the total number of methods in a
program. Hot method prediction accuracy is evaluated at three hot method
threshold levels: 50%, 40% and 30%.

The leave-one-out cross-validation method is used in evaluating this system.
This is a standard machine learning technique where 'n' benchmark programs are
used iteratively for evaluation. One out of the 'n' programs is used for testing
and the remaining 'n-1' programs are used for training the model. This is
repeated for all the 'n' programs in the benchmark suite.
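
As an illustration only, the threshold-based labeling, the two accuracy measures
and the leave-one-out protocol of this section can be sketched in Python. The
function names, the floor rounding of the hot method cut-off and the
tie-breaking rule are assumptions made for this sketch; they are not taken from
the implementation evaluated in this paper.

```python
from typing import Dict, Iterator, List, Tuple

HOT, COLD = +1, -1


def label_methods(counts: Dict[str, int], threshold: float) -> Dict[str, int]:
    """Label the most frequently executed methods of one program as hot.

    Assumption: the number of hot methods is floor(threshold * #methods) and
    ties are broken by method name; the paper fixes neither detail.
    """
    n_hot = int(threshold * len(counts))
    ranked = sorted(counts, key=lambda m: (-counts[m], m))
    hot = set(ranked[:n_hot])
    return {m: (HOT if m in hot else COLD) for m in counts}


def hot_method_accuracy(actual: Dict[str, int], predicted: Dict[str, int]) -> float:
    """Correct hot predictions divided by the actual number of hot methods."""
    hot = [m for m, y in actual.items() if y == HOT]
    if not hot:
        return 0.0
    return sum(predicted[m] == HOT for m in hot) / len(hot)


def total_accuracy(actual: Dict[str, int], predicted: Dict[str, int]) -> float:
    """Correct predictions (hot or cold) divided by the number of methods."""
    return sum(predicted[m] == y for m, y in actual.items()) / len(actual)


def leave_one_out(programs: List[str]) -> Iterator[Tuple[List[str], str]]:
    """Yield (training programs, held-out test program) for every program."""
    for held_out in programs:
        yield [p for p in programs if p != held_out], held_out


if __name__ == "__main__":
    # Hypothetical execution counts for a program with 8 methods.
    counts = {"a": 90, "b": 70, "c": 50, "d": 40, "e": 9, "f": 5, "g": 2, "h": 1}
    actual = label_methods(counts, 0.50)  # top 4 of 8 methods become hot
    # Hypothetical model output for the same methods.
    predicted = {"a": HOT, "b": HOT, "c": COLD, "d": COLD,
                 "e": HOT, "f": COLD, "g": COLD, "h": COLD}
    print(hot_method_accuracy(actual, predicted))
    print(total_accuracy(actual, predicted))
    for train, test in leave_one_out(["prog1", "prog2", "prog3"]):
        print(train, "->", test)
```

In this sketch the model is trained on the programs in the first element of each
pair produced by leave_one_out and evaluated on the second; the two accuracy
values for the held-out program are then averaged over all programs.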
5.2 Benchmarks

Two benchmark suites, SPEC CPU2000 INT [17] and UTDSP [13], have been used for
training and prediction. UTDSP is a C benchmark suite and SPEC CPU2000 INT has C
and C++ benchmarks. Evaluation of the system is based on only the C programs of
both benchmark suites. The model trained from the 'n-1' benchmark programs in
the suite is used to predict the hot methods in the held-out benchmark program.

5.3 Tools and platform

The system is implemented in the Low Level Virtual Machine (LLVM) version 1.6
[6]. LLVM is an open source compiler infrastructure that supports compile time,
link time, run time and idle time optimizations. The results are evaluated on an
Intel(R) Pentium(R) D at 2.80 GHz with 480 MB of RAM running Fedora Core 4. This
system uses the libSVM tool [7]. It is a simple library for Support Vector
Machines written in C.

Figure 6: Hot method prediction accuracy on the SPEC CPU2000 INT benchmark
[Bar chart: prediction accuracy (%) per benchmark program and on average, at hot
method thresholds of 50%, 40% and 30%.]

6 RESULTS

Fig. 6 shows the prediction accuracy of the trained model on the SPEC CPU2000
INT benchmark programs at three different hot method thresholds: 50%, 40% and
30%. The hot method prediction accuracy for all C programs on the benchmark is
found to vary from 0% to 100%, with an average of 57.86%, 51.43% and 39.14% for
the three hot method thresholds respectively. This averages to 49.48% on the
SPEC CPU2000 INT benchmark suite. Similarly, on the UTDSP benchmark suite, in a
0% to 100% range, the hot method prediction accuracy averages for the three
thresholds are 84%, 81% and 62% respectively. This averages to 76% on the UTDSP
benchmark suite. Overall, this new system obtains 62.57% hot method prediction
accuracy.

Figure 7: Total prediction accuracy on the SPEC CPU2000 INT benchmark
[Bar chart: total method prediction accuracy (%) per benchmark program and on
average, at hot method thresholds of 50%, 40% and 30%.]

The total method prediction accuracy on the SPEC CPU2000 INT and UTDSP benchmark
suites is shown in Fig. 7 and Fig. 9. The total method prediction accuracy for
all C programs on SPEC CPU2000 INT varies from 36% to 100%, with an average of
68.43%, 71.14% and 71.14% for the three hot method thresholds respectively. This
averages to 70.24%. The average prediction accuracies obtained on the UTDSP
benchmark suite are 69%, 71% and 58% respectively for the 50%, 40% and 30% hot
method thresholds. This averages to 66%. Overall, the system predicts both hot
and cold methods in a program with 68.15% accuracy.

7 CONCLUSION AND FUTURE WORK

Optimizers depend on profile information to identify the hot methods of program
segments. The major inadequacy associated with the dynamic optimization
technique is the high cost of accurate data profiling via program
instrumentation. In this work, a method is worked out to identify hot methods in
a program using the SVM machine learning algorithm. According to our study, with
a set of ten static features used in training the system, the derived model
classifies all methods (hot and cold) within a program with 68.15% accuracy and
hot methods with 62.57% accuracy. However, hot method prediction is of greater
value because optimizations will be more effective in these methods.

Future work in this area is aimed at improving the prediction accuracy of the
system by identifying more effective static and dynamic features of a program.
Further research can extend this system into a dynamic hot method prediction
system which can be used by dynamic optimizers. Applying this approach, the
prediction accuracy of other machine learning algorithms can be evaluated to
build additional models.

Figure 8: Hot method prediction accuracy on the UTDSP benchmark.
[Bar chart: prediction accuracy (%) per UTDSP benchmark program and on average,
at hot method thresholds of 50%, 40% and 30%.]

Figure 9: Total prediction accuracy on the UTDSP benchmark.
[Bar chart: total prediction accuracy (%) per UTDSP benchmark program and on
average, at hot method thresholds of 50%, 40% and 30%.]

8 REFERENCES

[1] Matthew Arnold, Stephen Fink, David Grove, Michael Hind, and Peter F.
    Sweeney: A Survey of Adaptive Optimization in Virtual Machines, Proceedings
    of the IEEE, pp. 449-466, February 2005.
[2] A. Monsifrot, F. Bodin, and R. Quiniou: A machine learning approach to
    automatic production of compiler heuristics, In Proceedings of the
    International Conference on Artificial Intelligence: Methodology, Systems,
    Applications, LNCS 2443, pp. 41-50, 2002.
[3] S. Long and M. O'Boyle: Adaptive java optimization using instance-based
    learning, In ACM International Conference on Supercomputing (ICS'04),
    pp. 237-246, June 2004.
[4] John Cavazos and Michael F.P. O'Boyle: Automatic Tuning of Inlining
    Heuristics, 11th International Workshop on Compilers for Parallel Computers
    (CPC 2006), January 2006.
[5] John Cavazos, J. Eliot B. Moss, and Michael F.P. O'Boyle: Hybrid
    Optimizations: Which Optimization Algorithm to Use?, 15th International
    Conference on Compiler Construction (CC 2006), 2006.
[6] C. Lattner and V. Adve: LLVM: A compilation framework for lifelong program
    analysis & transformation, In Proceedings of the 2004 International
    Symposium on Code Generation and Optimization (CGO'04), March 2004.
[7] Chih-Chung Chang and Chih-Jen Lin: LIBSVM: a library for support vector
    machines, 2001. Software available at
    http://www.csie.ntu.edu.tw/~cjlin/libsvm.
[8] M. Stephenson, S. Amarasinghe, M. Martin, and U. M. O'Reilly: Meta
    optimization: Improving compiler heuristics with machine learning, In
    Proceedings of the ACM SIGPLAN Conference on Programming Language Design
    and Implementation (PLDI'03), pp. 77-90, June 2003.
[9] F Agakov, E Bonilla, J Cavazos, G Fursin, B Franke, M.F.P. O'Boyle,
    M Toussant, J Thomson, C Williams: Using machine learning to focus
    iterative optimization, In Proceedings of the International Symposium on
    Code Generation and Optimization (CGO), pp. 295-305, 2006.
[10] Christophe Dubach, John Cavazos, Björn Franke, Grigori Fursin, Michael
     O'Boyle and Oliver Temam: Fast compiler optimization evaluation via
     code-features based performance predictor, In Proceedings of the ACM
     International Conference on Computing Frontiers, May 2007.
[11] John Cavazos, Michael O'Boyle: Method-Specific Dynamic Compilation using
     Logistic Regression, ACM Conference on Object-Oriented Programming,
     Systems, Languages, and Applications (OOPSLA), Portland, Oregon,
     October 22-26, 2006.
[12] John Cavazos: Automatically Constructing Compiler Optimization Heuristics
     using Supervised Learning, Ph.D thesis, Dept. of Computer Science,
     University of Massachusetts, 2004.
[13] C. Lee: UTDSP benchmark suite. In
     http://www.eecg.toronto.edu/~corinna/DSP/infrastructure/UTDSP.html, 1998.
[14] G. Fursin, C. Miranda, S. Pop, A. Cohen, and O. Temam: Practical run-time
     adaptation with procedure cloning to enable continuous collective
     compilation, In Proceedings of the 5th GCC Developer's Summit, Ottawa,
     Canada, July 2007.
[15] Vapnik, V.N.: The support vector method of function estimation, In
     Generalization in Neural Network and Machine Learning, Springer-Verlag,
     pp. 239-268, 1999.
[16] S. Kotsiantis: Supervised Machine Learning: A Review of Classification
     Techniques, Informatica Journal 31, pp. 249-268, 2007.
[17] The Standard Performance Evaluation Corporation. http://www.specbench.org.
[18] M. Stephenson and S.P. Amarasinghe: Predicting unroll factors using
     supervised classification, In Proceedings of International Symposium on
     Code Generation and Optimization (CGO), pp. 123-134, 2005.
[19] M. W. Stephenson: Automating the Construction of Compiler Heuristics Using
     Machine Learning, PhD thesis, MIT, USA, 2006.
     www.cag.csail.mit.edu/~mstephen/stephenson_phdthesis.pdf
[20] J. Cavazos and J. Moss: Inducing heuristics to decide whether to schedule,
     In Proceedings of the ACM SIGPLAN Conference on Programming Language
     Design and Implementation (PLDI), 2004.
[21] B. Calder, D. Grunwald, Michael Jones, D. Lindsay, J. Martin, M. Mozer,
     and B. Zorn: Evidence-Based Static Branch Prediction Using Machine
     Learning, In ACM Transactions on Programming Languages and Systems
     (ToPLaS-19), Vol. 19, 1997.
[22] Daniel A. Jiménez, Calvin Lin: Neural methods for dynamic branch
     prediction, ACM Transactions on Computer Systems (TOCS), Vol. 20 n.4,
     pp. 369-397, November 2002.
[23] Jeremy Singer, Gavin Brown and Ian Watson: Branch Prediction with Bayesian
     Networks, In Proceedings of the First Workshop on Statistical and Machine
     learning approaches applied to Architectures and compilation, pp. 96-112,
     Jan 2007.
[24] Culpepper B., Gondre M.: SVMs for Improved Branch Prediction, University
     of California, UCDavis, USA, ECS201A Technical Report, 2005.
[25] Youfeng Wu, Larus J. R.: Static branch frequency and program profile
     analysis, MICRO-27, Proceedings of the 27th Annual International Symposium
     on Microarchitecture, pp. 1-11, 1994.
[26] C.-L. Huang and C.-J. Wang: A GA-based feature selection and parameters
     optimization for support vector machines, Expert Systems with
     Applications, Vol. 31, Issue 2, pp. 231-240, 2006.
