Contents
Introduction
Part 1 Introducing profile-directed feedback
Part 2 Profile-directed feedback
Part 3 Optimizing C code using PDF compiler options
Part 4 Optimizing C code using PDF environment variables
Part 5 Application of profile-directed feedback
Summary
Introduction
This is the third in a series of tutorials that introduces the optimization
features of the XL C/C++ compiler on AIX and z/OS. These tutorials are
intended to be short and concise to give you a quick lead into how to use
these optimization features. Each part gives a brief introduction to certain
compiler features. For more specific information, you can use the summary
at the end of the document for quick reference or refer to listed references
in each part for more detailed information.
This tutorial introduces optimizing C code using profile-directed feedback
and using the XL C/C++ compiler. You will learn about profile-directed
feedback compiler options you can use that will help you optimize your
code based on an analysis of how often branches are taken and blocks of
code are executed. You will also learn how to make code changes that will
address the common code problems uncovered at this level of
optimization. Finally, and this is the focus of this tutorial, you will work
through an example of applying profile-directed feedback to tune the
performance of an application.
Learning objectives
Describe what profile-directed feedback does and why you would want to
use it
Optimize C code using profile-directed feedback
Time required
This tutorial should take approximately 30 minutes to finish. If you
explore other concepts related to this tutorial, it could take longer to
complete.
Skill level
Beginner
Audience
C application programmers
System requirements
Note that the use of XL C for AIX, V11.1 or XL C/C++ for AIX, V11.1 is
important because it specifies the default values for compiler optimization.
These default values are different based on the version of the compiler.
You first compile the program with the -qpdf1 option (with a minimum
optimization level of -O2). This generates an instrumented executable
that collects profile data while the program executes. Then, execute
the instrumented executable one or more times with sample inputs that
cover the common usage of the program. As it runs, the instrumented
program updates a data file with the collected profile data. Finally,
recompile the program with the -qpdf2 option, which reads the profile
data and optimizes the program based on it.
You can shorten the full PDF process by avoiding recompilation during the
PDF2 phase. Just reuse the objects generated by the PDF1 step and relink
them using the -qpdf2 option.
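To make the idea of collected profile data more concrete, the following is a minimal hand-written sketch of the kind of counting an instrumented executable performs on a branch. It is an illustration only: with -qpdf1 the compiler inserts its own instrumentation, and the names used here (branch_profile, count_above) are hypothetical and not part of the compiler.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical record of how often one branch went each way. */
struct branch_profile {
    unsigned long taken;
    unsigned long not_taken;
};

/* Counts the values greater than limit and records, in *profile,
   how many times the comparison was true and how many times false. */
size_t count_above(const int *values, size_t n, int limit,
                   struct branch_profile *profile)
{
    size_t above = 0;
    assert(profile != NULL);
    assert(values != NULL || n == 0);
    for (size_t i = 0; i < n; ++i) {
        if (values[i] > limit) {
            ++profile->taken;       /* branch taken */
            ++above;
        } else {
            ++profile->not_taken;   /* branch not taken */
        }
    }
    return above;
}
```

Counts like these are what let an optimizer decide which side of a branch is the common case for the training input.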
You can find more information about these options in the IBM XL C/C++
for AIX, V11.1 Optimization and Programming Guide, SC27-2482-00 or
the z/OS V1R12.0 XL C/C++ User's Guide, SC09-4767-09.
Example:
As an illustration, the following source code can be compiled with
profile-directed feedback at optimization level 2 (use
xlc -O2 -qpdf1 -o pdf1 foo.c then use xlc -O2 -qpdf2 -o pdf2 foo.c).
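#include <stdlib.h>
int main(int argc, char **argv){
    long size;
    /* Initialized to 0; unsigned so that the sum wraps around
       instead of overflowing a signed int. */
    unsigned int x = 0;
    if (argc<=1) return 0;
    size = atoi(argv[1]);
    if (size>65536) return 0;
    for(int i=0; i<size; ++i){
        x = x + (unsigned int)i * (unsigned int)i;
    }
    /* Return only the low 8 bits as the exit status. */
    return (int)(x % 256u);
}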
When you compile the program at optimization level 2 using -qpdf1, the
compiler produces an instrumented executable, and running that executable
generates profile data. When you compile the same program again using
-qpdf2, the compiler optimizes the program based on the profile data. For
a more complete and detailed list of code changes performed during PDF,
see the next section. You can get additional information added to your
listing file, such as loop iteration counts, block and call counts, and
cache misses, to help tune your program. To do so, add the -qreport
option to the compile command above and check the PDF Report section in
the listing, in this case called foo.lst.
-qpdf1=pdfname1 or -qpdf2=pdfname2
You can generate the name of the PDF file based on what you specify with
the -o option. For example, you can use -qpdf1=exename -o foo foo.c to
generate a PDF file called .foo_pdf.
pdfname=file_path
You can specify the path to the file that will hold the profile data. By
default, the file name is ._pdf, and is placed in the current working
directory or in the directory named by the PDFDIR environment variable.
You can use the pdfname suboption to capture simultaneous runs of
multiple executables using the same PDF directory.
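The default location rule described above can be sketched in C as follows. This is only an illustration of the rule as stated in the text, assuming that PDFDIR, when set to a non-empty value, takes precedence over the current working directory. The function name default_pdf_path is hypothetical, and the sketch does not model the pdfname or exename suboptions.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/* Writes the default profile file location into out.
   pdfdir is the value of the PDFDIR environment variable, or NULL if unset.
   Returns 0 on success, or -1 if out is too small. */
int default_pdf_path(const char *pdfdir, char *out, size_t out_size)
{
    int written;
    assert(out != NULL && out_size > 0);
    if (pdfdir == NULL || strlen(pdfdir) == 0) {
        /* No PDFDIR: the file is relative to the current directory. */
        written = snprintf(out, out_size, "%s", "._pdf");
    } else {
        /* Assumption: a non-empty PDFDIR overrides the current directory. */
        written = snprintf(out, out_size, "%s/%s", pdfdir, "._pdf");
    }
    return (written >= 0 && (size_t)written < out_size) ? 0 : -1;
}
```

A caller would typically pass getenv("PDFDIR") from stdlib.h as the first argument.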
Compiler listings
When you use the -qreport compiler option together with the -qpdf2
option, the compiler provides the following information in the PDF report
section:
Loop iteration count
The most frequent loop iteration count and the average iteration count, for
a given set of input data, are calculated for most loops in a program. This
information is only available when the program is compiled at
optimization level -O5.
Block and call count
This section of the report covers the Call Structure of the program and the
respective execution count for each called function. It also includes Block
information for each function. For non-user defined functions, only
execution count is given. The Total Block and Call Coverage, and a list of
the user functions ordered by decreasing execution count, are printed at
the end of this report section. In addition, the Block count information is
printed at the beginning of each block of the pseudo-code in the listing
files.
Cache miss
This section of the report is printed in a single table. It reports the number
of Cache Misses for certain functions, with additional information about
the functions such as: Cache Level, Cache Miss Ratio, Line Number, File
Name, and Memory Reference.
Note: You must use the option -qpdf1=level=2 to get this report. You can
also select the level of cache to profile using the environment variable
PDF_PM_EVENT during run time.
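As an illustration of the two figures named under "Loop iteration count" above, the following sketch computes the most frequent and the average iteration count from a list of observed counts for one loop. The struct and function names are hypothetical, and this is not how the compiler computes its report; it only shows what the two numbers mean.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical summary of one loop's observed iteration counts. */
struct trip_stats {
    unsigned long most_frequent;  /* value that occurs most often */
    double average;               /* arithmetic mean of all values */
};

/* trips[i] is the iteration count seen on the i-th entry to the loop.
   If several values tie for most frequent, the first one seen wins. */
struct trip_stats summarize_trips(const unsigned long *trips, size_t n)
{
    struct trip_stats s = {0, 0.0};
    size_t best_count = 0;
    double sum = 0.0;
    assert(trips != NULL && n > 0);
    for (size_t i = 0; i < n; ++i) {
        size_t count = 0;
        for (size_t j = 0; j < n; ++j) {
            if (trips[j] == trips[i]) {
                ++count;
            }
        }
        if (count > best_count) {
            best_count = count;
            s.most_frequent = trips[i];
        }
        sum += (double)trips[i];
    }
    s.average = sum / (double)n;
    return s;
}
```

The quadratic scan keeps the sketch short; it is meant for small illustrative inputs.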
Checkpoint:
Self-test questions:
[Link objects]
xlc -O2 -qpdf1 -o myapp file1_pdf1.o file2_pdf1.o file3_pdf1.o
Applying PDF
At this point, you will have an instrumented executable called "myapp".
An instrumented executable generally runs slower than an executable
compiled without the -qpdf1 option because of the extra profiling code. To
generate useful information for the compiler to consume, you need to run
the executable with training data that best represents the typical
behavior of your application. "Training" refers to the execution of an
instrumented executable with a set of input data.
For example, you may have three sets of input data called dat1.in, dat2.in,
and dat3.in. Ideally, each data set would exercise different parts of your
code but remain a good representation of typical usage. To train "myapp",
you would need to run the application for each set of data as follows:
./myapp < dat1.in
./myapp < dat2.in
./myapp < dat3.in
At the end of execution, you will notice that a file called "._pdf" has
been generated in the current directory. Note that the file name and the
directory where it is created can be changed through compiler options. The
data generated from every execution of the instrumented application is
stored in the same PDF file. In other words, every instrumented
application generally has one PDF file associated with it.
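The accumulation of data from several training runs into one file can be pictured with the following sketch. It assumes a made-up file layout (a raw array of counters), because the real PDF file format is not described in this tutorial; the function name accumulate_run and the MAX_COUNTERS limit are hypothetical.

```c
#include <assert.h>
#include <stdio.h>
#include <stddef.h>

#define MAX_COUNTERS 64

/* Adds the counters of one run to the totals stored in the file at path,
   creating the file if it does not exist yet.
   Returns 0 on success, or -1 on an I/O error or layout mismatch. */
int accumulate_run(const char *path, const unsigned long *run, size_t n)
{
    unsigned long total[MAX_COUNTERS] = {0};
    FILE *f;
    assert(path != NULL && run != NULL);
    assert(n > 0 && n <= MAX_COUNTERS);

    f = fopen(path, "rb");
    if (f != NULL) {
        /* Earlier runs exist: load their totals first. */
        size_t got = fread(total, sizeof total[0], n, f);
        fclose(f);
        if (got != n) {
            return -1;
        }
    }
    for (size_t i = 0; i < n; ++i) {
        total[i] += run[i];
    }
    f = fopen(path, "wb");
    if (f == NULL) {
        return -1;
    }
    if (fwrite(total, sizeof total[0], n, f) != n) {
        fclose(f);
        return -1;
    }
    return fclose(f) == 0 ? 0 : -1;
}
```

Calling the function once per training run leaves the sum of all runs in the file, which mirrors the behavior described above for the three runs of "myapp".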
The PDF data can now be fed back into the compiler to further guide
optimizations based on the training data you provided. Simply relink the
objects you created with the -qpdf1 option, but specify the -qpdf2 option
at the link step instead. It is not necessary to recompile your source
files to object files with the -qpdf2 option, because PDF optimizations
are done during object linking. That being said, recompiling the source
files with the -qpdf2 option and linking with -qpdf2 will produce an
optimized executable with the same behavior.
xlc -O2 -qpdf2 -o myapp file1_pdf1.o file2_pdf1.o file3_pdf1.o
----------------------------------
main(63): 1 (foo.c)
Block Counters:
4-8 | 1
8 | 1
8-10 | 1000000
8 | 1
8-14 | 1
14 | 1
14-15 | 1000000
14 | 1
14-17 | 1
18 |
Block coverage = 100% ( 9/9 )
----------------------------------
Total Call coverage = 0% ( 0/0 )
Total Block coverage = 28% ( 9/32 )
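The coverage percentages in the report above follow from the counts shown in parentheses. The sketch below reproduces them, assuming that fractions are truncated and that an empty total is reported as 0%; both assumptions match the three figures shown, but the report's actual rounding rule is not stated in this tutorial. The function name coverage_percent is hypothetical.

```c
#include <assert.h>

/* Returns covered/total as a whole percentage, truncating any fraction.
   Returns 0 when total is 0, as in "Total Call coverage = 0% ( 0/0 )".
   Intended for small counts; very large values could overflow. */
unsigned coverage_percent(unsigned long covered, unsigned long total)
{
    assert(covered <= total);
    if (total == 0) {
        return 0;
    }
    return (unsigned)(covered * 100UL / total);
}
```

For example, 9 covered blocks out of 32 give 900 / 32 = 28 after truncation, which matches the Total Block coverage line.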
Example
[Table: execution counts of foo(), bar(), and xyz() under Workload A,
Workload B, and Workload A + B. Only these cell values survive: executed
10 times, executed 1000010 times, executed 1 times, executed 200 times.]
Judging from the combined data, xyz() appears to be infrequently executed
and foo() is the hottest function. The application will therefore be
optimized so that the code in foo() is efficient and has the greatest
speed-up, while xyz() is assumed to be cold (that is, to have a low
execution frequency) and could therefore be optimized less aggressively.
The optimized application will likely perform well with Workload B, but
Workload A will suffer. In such a case, the mergepdf tool can be used to
balance the data generated by the two workloads by adding "weights" to
each.
Example
In the above example, a scaling ratio of 10000:1 was used. PDF data for
Workload A was weighted as 10000 while Workload B was weighted as 1.
This way both foo() and xyz() end up with the same execution frequency
magnitude (order of 1000000), and the compiler will treat both as hot
functions and optimize accordingly. To use the merged PDF file, which in
this case was named ._pdf_total, the PDF file name has to be
specified using -qpdf2=pdfname=<filename> as follows:
xlc -O2 -qpdf2=pdfname=._pdf_total -o myapp file1_pdf1.o file2_pdf1.o file3_pdf1.o
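The effect of the weights described above can be illustrated with a small sketch that scales two sets of execution counts and adds them. This assumes simple linear scaling; the tutorial does not describe how mergepdf combines data internally, and the function name merge_weighted is hypothetical.

```c
#include <assert.h>
#include <stddef.h>

/* Combines two count arrays: out[i] = weight_a * a[i] + weight_b * b[i].
   a and b hold per-function execution counts from two workloads. */
void merge_weighted(const unsigned long long *a, unsigned long long weight_a,
                    const unsigned long long *b, unsigned long long weight_b,
                    unsigned long long *out, size_t n)
{
    assert(a != NULL && b != NULL && out != NULL);
    for (size_t i = 0; i < n; ++i) {
        out[i] = weight_a * a[i] + weight_b * b[i];
    }
}
```

With hypothetical counts of 10 calls under one workload and 1000000 under the other, weights of 10000 and 1 give 10 * 10000 + 1000000 * 1 = 1100000. A function called 100 times under each workload ends up at 100 * 10000 + 100 * 1 = 1000100, so both land in the same order of magnitude.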
Checkpoint
Summary:
Feeding the PDF data back into the compiler further guides optimizations
based on the training data you provided.
Several PDF tools can help you work with the results of PDF optimization.
Self-test questions:
Summary
Over the course of this tutorial, you learned about profile-directed
feedback and how to use it to improve your program's performance.
Takeaway points
PDF should be used after other debugging and tuning is finished, as one of
the last steps before putting the application into production.
Compiling the program with the -qpdf1 option produces an instrumented
executable that generates profile data when it runs. Compiling the program
again with the -qpdf2 option optimizes the program based on the profile
data.
Several options allow you to see additional information about the profiled
data.
Several utility programs allow you to manage profiling data.
To get maximum results from PDF optimization, you need to run your
program with training data that best represents the typical behavior of
your application.
Additional resources
If you would like to learn more about all the different optimization
options, consult the IBM XL C/C++ for AIX, V11.1 Compiler Reference,
SC27-2479-00 or the z/OS V1R12.0 XL C/C++ User's Guide, SC09-4767-09.
Start with the options that we have talked about in this tutorial and
continue learning about optimization levels 3, 4 and 5.
Guidelines on writing code that is best suited for optimization can be found
in IBM XL C/C++ for AIX, V11.1 Optimization and Programming Guide,
SC27-2482-00 or the z/OS V1R12.0 XL C/C++ Programming Guide,
SC09-4765-11. Here you will find more information about efficient I/O
methods, use of built-in functions, as well as additional notes on how to
improve performance with compiler options.
Be it optimization options or code changes, there is abundant input from
knowledgeable professionals in the Rational C/C++ Café. Simply type
"optimization" in the search bar and you will be pointed to a number of
useful documents and threads with further discussions on the subject.
Contacting IBM
IBM welcomes your comments. You can send them to
compinfo@ca.ibm.com
March 2011
References in this document to IBM products, programs, or services do not imply that
IBM intends to make these available in all countries in which IBM operates. Any
reference to an IBM program product in this publication is not intended to state or imply
that only IBM's program product may be used. Any functionally equivalent program may
be used instead.
IBM, the IBM logo, and ibm.com are trademarks or registered trademarks of
International Business Machines Corporation in the United States, other countries, or
both. If these and other IBM trademarked terms are marked on their first occurrence in
this information with a trademark symbol (® or ™), these symbols indicate U.S.
registered or common law trademarks owned by IBM at the time this information was
published. Such trademarks may also be registered or common law trademarks in other
countries. A current list of IBM trademarks is available on the Web at "Copyright and
trademark information" at www.ibm.com/legal/copytrade.shtml
© Copyright International Business Machines Corporation 2011. US Government
Users Restricted Rights - Use, duplication or disclosure restricted by GSA ADP Schedule
Contract with IBM Corp.