
Applying XL C/C++ Compiler Optimization on AIX and on z/OS

Profile-Directed Feedback

Contents
Introduction
Part 1 - Introducing profile-directed feedback
Part 2 - Profile-directed feedback
Part 3 - Optimizing C/C++ code using PDF compiler options
Part 4 - Optimizing C/C++ code using PDF environment variables
Part 5 - Application of profile-directed feedback
Summary

Introduction
This is the third in a series of tutorials that introduces the optimization
features of the XL C/C++ compiler on AIX and z/OS. These tutorials are
intended to be short and concise to give you a quick lead into how to use
these optimization features. Each part gives a brief introduction to certain
compiler features. For more specific information, you can use the summary
at the end of the document for quick reference or refer to listed references
in each part for more detailed information.
This tutorial introduces optimizing C code with profile-directed feedback
using the XL C/C++ compiler. You will learn about the profile-directed
feedback compiler options that help you optimize your code based on an
analysis of how often branches are taken and blocks of code are executed.
You will also learn how to make code changes that address the common
code problems uncovered at this level of optimization. Finally, and as the
focus of this tutorial, an example of applying profile-directed feedback to
tune the performance of an application is presented.
Learning objectives

Describe what profile-directed feedback does and why you would want to
use it
Optimize C code using profile-directed feedback

Time required
This tutorial should take approximately 30 minutes to finish. If you
explore other concepts related to this tutorial, it could take longer to
complete.
Skill level
Beginner
Audience
C application programmers
System requirements

XL C for AIX, V11.1 or XL C/C++ for AIX, V11.1


AIX V5.3 TL 5300-06, AIX V6.1 or IBM i V6.1 PASE

Note that the use of XL C for AIX, V11.1 or XL C/C++ for AIX, V11.1 is
important because the default values for compiler optimization differ
between compiler versions.

XL C/C++ for z/OS, V1R12


IBM System z10
Note that XL C/C++ for z/OS, V1R12 is important because the default
values for compiler optimization differ between compiler versions.

Part 1 - Introducing profile-directed feedback

You can use profile-directed feedback (PDF) to improve the performance
of your application. This mechanism allows the compilers to learn the
common behavior of the application and tailor the code to optimize for that
behavior. The PDF process is intended to be used after other debugging
and tuning is finished, as one of the last steps before putting the application
into production.
The following diagram illustrates the PDF process.

You first compile the program with the -qpdf1 option (with a minimum
optimization level of -O2), which generates an instrumented executable
that collects profile data while the program executes. Then, execute the
instrumented executable one or more times with sample inputs that cover
the common usage of the program. As it runs, the instrumented program
updates a data file with the collected profile data. Finally, recompile the
program with the -qpdf2 option, which reads the profile data and
optimizes the program based on it.
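In outline, a complete PDF cycle looks like the following sketch (the file
and input data names here are placeholders, not taken from the original
example):
xlc -O2 -qpdf1 -o myapp myapp.c        # step 1: build the instrumented executable
./myapp < typical_input.dat            # step 2: train it on representative input
xlc -O2 -qpdf2 -o myapp myapp.c        # step 3: rebuild, optimizing with the profile data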

Part 2 - Profile-directed feedback

First, compile and link a program with the -qpdf1 option (with a minimum
optimization level of -O2) to generate the instrumented executable. You
can invoke this step with the -O2 and -qpdf1 compiler options:
xlc -O2 -qpdf1 -o pdf1 source.c
Run the resulting executable using data that is representative of the data
that is used during a normal run of your finished program. The program
will record profiling data information when it finishes. You can run the
program multiple times with different data sets, and the profiling
information will be accumulated to provide a count of how often branches
are taken and blocks of code are executed, based on the input data used.
This program will run more slowly than the regular optimized binary as
the instrumentation will introduce some runtime overhead. Note that the
PDF process requires the final link step to be done using the compiler
invocation, as opposed to using the linker directly. It is possible to mix
and match objects compiled with and without -qpdf1 during the link step.
Only objects compiled with -qpdf1/-qpdf2 will be subject to
PDF optimization.
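As a sketch of such a mixed link (the file names are hypothetical), only
hot.c is profiled while cold.c is compiled normally, and the final link still
goes through the compiler driver:
xlc -O2 -qpdf1 -c hot.c                # instrumented object, subject to PDF
xlc -O2 -c cold.c                      # ordinary object, not profiled
xlc -O2 -qpdf1 -o myapp hot.o cold.o   # final link via the compiler invocation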
By default, the profile data will be collected in a PDF file in the current
working directory or in the directory specified by the PDFDIR environment
variable. The default name for the profile data file is ._pdf. To
override these defaults, you can use the options -qpdf1=pdfname or
-qpdf2=pdfname.
Then recompile the program, using the same compiler options as before,
but changing -qpdf1 to -qpdf2. In this second compilation, the
accumulated profiling information is used to fine-tune the optimizations.
The resulting program contains no profiling overhead and runs at full
speed.
For best results, it is important to use the same program sources and the
exact same compilation options for both the PDF1 and PDF2 steps. The PDF
process can tolerate some minor changes to the program sources
between these two steps, but the optimization of the resulting binary will
be reduced.
xlc -O2 -qpdf2 -o pdf2 source.c
The PDF instrumentation and optimization on the XL compilers are
performed completely during the linking step of the code generation
process. This makes it possible to reduce the time it takes to perform the
full PDF process by avoiding recompilation during the PDF2 phase. Just
reuse any objects generated by the PDF1 step and relink them using the
-qpdf2 option.
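For example, a relink-only PDF2 step might look like the following sketch
(assuming source.o was produced by the PDF1 compile of source.c, and
sample.dat is a placeholder input file):
xlc -O2 -qpdf1 -c source.c             # PDF1 compile produces source.o
xlc -O2 -qpdf1 -o pdf1 source.o        # link the instrumented executable
./pdf1 < sample.dat                    # training run
xlc -O2 -qpdf2 -o pdf2 source.o        # PDF2 step: relink the same object, no recompile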
You can find more information about these options in the IBM XL C/C++
for AIX, V11.1 Optimization and Programming Guide, SC27-2482-00 or
the z/OS V1R12.0 XL C/C++ User's Guide, SC09-4767-09.

Example:
As an illustration, the following source code can be compiled with
profile-directed feedback at optimization level 2 (use xlc -O2 -qpdf1 -o
pdf1 foo.c, then use xlc -O2 -qpdf2 -o pdf2 foo.c).
#include <stdlib.h>

int main(int argc, char **argv) {
    long size;
    int x = 0;
    if (argc <= 1) return 0;
    size = atoi(argv[1]);
    if (size > 65536) return 0;
    for (int i = 0; i < size; ++i) {
        x = x + i * i;
    }
    return x;
}

With optimization level 2, compiling the program with -qpdf1 produces an
instrumented executable that generates profile data when it is run. When
you compile the same program again with -qpdf2, the compiler optimizes
the program based on the profile data. For a more complete and detailed
list of code changes performed during PDF, see the next section. You can
get additional information added to your listing file, such as loop
iteration counts, block and call counts, and cache misses, to help tune
your program by adding the -qreport option to the compile command above
and checking the PDF Report section in the listing, in this case called
foo.lst.
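For instance, the PDF2 compile with the report enabled might look like
this (a sketch based on the commands above):
xlc -O2 -qpdf2 -qreport -o pdf2 foo.c  # the PDF Report section appears in foo.lst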

Part 3 - Optimizing C/C++ code using PDF compiler options

In this part you will learn about using various compiler options to optimize
your C or C++ code using PDF with minimal coding effort. You can
instruct the compiler to use a specific directory to save the profile
information, gather cache miss profiling information, include a PDF report
section in a compiler listing, and insert additional profiling information
into a compiled application. The following are options that can be used in
conjunction with -qpdf1 or -qpdf2.

-qshowpdf (AIX only)


When used with -qpdf1 and a minimum optimization level of -O2 at the
compile and link steps, this option inserts additional profiling
information into the compiled application to collect call and block counts
for all procedures in the application.
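A minimal sketch of a PDF1 build that also enables -qshowpdf (file names
are placeholders):
xlc -O2 -qpdf1 -qshowpdf -o myapp myapp.c
After the training runs, the collected call and block counts can be
displayed with the showpdf utility described later in this part.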

-qpdf1=exename or -qpdf2=exename
You can generate the name of the PDF file based on what you specify with
the -o option. For example, you can use -qpdf1=exename -o foo foo.c to
generate a PDF file called .foo_pdf.

-qpdf1=level=0|1|2 (AIX only)


You can compile your application with -qpdf1=level=0|1|2 to generate
profiling data at different levels of detail. The different levels
support multiple-pass profiling, cache-miss, block-counter, call-counter and
extended value profiling. Note that -qpdf1=level=0 and -qpdf1=level=1
support single-pass profiling, whereas -qpdf1=level=2 supports
multiple-pass profiling.
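For example, to build an instrumented executable that supports cache-miss
profiling, a sketch might be (hypothetical file names):
xlc -O2 -qpdf1=level=2 -o myapp myapp.c
The cache level to profile is then selected with the PDF_PM_EVENT
environment variable described in Part 4.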

pdfname=file_path
You can specify the path to the file that will hold the profile data. By
default, the file name is ._pdf, and is placed in the current working
directory or in the directory named by the PDFDIR environment variable.
You can use the pdfname suboption to capture simultaneous runs of
multiple executables using the same PDF directory.
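As a sketch, two executables sharing one PDF directory can keep their
profiles separate by using distinct pdfname values (the paths and file
names below are placeholders):
xlc -O2 -qpdf1=pdfname=/tmp/app_a.pdf -o app_a app_a.c
xlc -O2 -qpdf1=pdfname=/tmp/app_b.pdf -o app_b app_b.c
xlc -O2 -qpdf2=pdfname=/tmp/app_a.pdf -o app_a app_a.c   # PDF2 step names the same file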

Utility programs (AIX only)


The following utility programs found in /usr/vacpp/bin help you manage
your profile data.

Cleanpdf (AIX only)


cleanpdf directory_path
The cleanpdf utility program removes all profiling information from the
directory specified by directory_path; or, if directory_path is not
specified, from the directory set by the PDFDIR environment variable; or,
if PDFDIR is not set, from the current directory. Removing profiling information
reduces runtime overhead if you change the program and then go through
the PDF process again. Run cleanpdf only when you are finished with the
PDF process for a particular application. Otherwise, if you want to resume
using PDF with that application, you will need to recompile all of the files
again with -qpdf1.
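For example, assuming PDFDIR points to /home/user/pdfdata (a placeholder
path), the profile data there could be removed with:
cleanpdf /home/user/pdfdata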

Mergepdf (AIX only)


mergepdf [-r scaling] input -o output [-n] [-v]
The mergepdf utility program merges two or more PDF records into a
single PDF output record.
input specifies the name of a PDF input record file, or a directory that
contains PDF record files.
-o output specifies the name of the PDF output record file, or a directory
to which the merged output will be written.
-r scaling specifies the scaling ratio for the PDF record file. This value
must be greater than zero and can be either an integer or a floating-point
value. If not specified, a ratio of 1.0 is assumed.
-n specifies that PDF record files are not normalized. If not specified,
mergepdf normalizes records based on an internally calculated ratio
before applying any user-defined scaling factor.
-v specifies verbose mode, and causes internal and user-specified scaling
ratios to be displayed to standard output.
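As a brief sketch (the input file names are placeholders; a fuller example
appears in Part 5), two profiles can be merged with different weights as
follows:
mergepdf -r 2 ._pdf_run1 -r 1 ._pdf_run2 -o ._pdf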

Resetpdf (AIX only)


resetpdf directory_path
Same as cleanpdf, described above.

Showpdf (AIX only)

showpdf directory_path -f file_path


The showpdf utility program displays the function call and block counts
written to the profile file, specified by the -f option, during a program run. To
use this command, you must first compile your application specifying both
-qpdf1 and -qshowpdf compiler options on the command line.
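For example, assuming the instrumented program has already been run and
has written its profile to the default ._pdf file, the counts could be
displayed with:
showpdf -f ._pdf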

Compiler listings
You can instruct the compiler by using the -qreport compiler option with the
-qpdf2 option to provide the following information in the PDF report
section:
Loop iteration count
The most frequent loop iteration count and the average iteration count, for
a given set of input data, are calculated for most loops in a program. This
information is only available when the program is compiled at
optimization level -O5.
Block and call count
This section of the report covers the Call Structure of the program and the
respective execution count for each called function. It also includes Block
information for each function. For non-user-defined functions, only the
execution count is given. The Total Block and Call Coverage, and a list of
the user functions ordered by decreasing execution count, are printed at
the end of this report section. In addition, the Block count information is
printed at the beginning of each block of the pseudo-code in the listing
files.
Cache miss
This section of the report is printed in a single table. It reports the number
of Cache Misses for certain functions, with additional information about
the functions such as: Cache Level, Cache Miss Ratio, Line Number, File
Name, and Memory Reference.
Note: You must use the option -qpdf1=level=2 to get this report. You can
also select the level of cache to profile by setting the environment
variable PDF_PM_EVENT at run time.
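Putting these pieces together, a cache-miss profiling cycle might look
like the following sketch (file and input names are placeholders):
xlc -O2 -qpdf1=level=2 -o myapp myapp.c    # instrument with level 2 profiling
export PDF_PM_EVENT=L2MISS                 # profile level 2 cache misses
./myapp < typical_input.dat                # training run collects the cache-miss data
xlc -O2 -qpdf2 -qreport -o myapp myapp.c   # PDF report, including cache misses, in myapp.lst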

Checkpoint:

PDF is one of the last steps in tuning the performance of an application


Compiler listings can include a PDF report section
The cleanpdf, mergepdf, resetpdf and showpdf utilities can be used to aid
profile data management

Self-test questions:

Why might PDF be helpful to tune the performance of an application?


What information is included in the PDF report section?
How do the utility programs help you manage your profile data?


Part 4 - Optimizing C/C++ code using PDF environment variables

In this part you will learn about using various environment variables to optimize
your C or C++ code using PDF with minimal coding effort. The following are the
environment variables that can be used in conjunction with -qpdf1 or -qpdf2.
PDFDIR
Optionally specifies the directory in which profiling information is saved
when you run an application that you have compiled with the -qpdf1 option.
The default value is unset, and the compiler places the profile data file in the
current working directory. When you recompile or relink your application
with -qpdf2, the compiler uses the data saved in this directory to optimize
the application. It is recommended that you set this variable to an absolute
path if you will be using profile-directed feedback.
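For example, on AIX you might set PDFDIR before building and training the
instrumented executable (the path and file names are placeholders):
export PDFDIR=/home/user/pdfdata
xlc -O2 -qpdf1 -o myapp myapp.c
./myapp < typical_input.dat                # profile data is written to $PDFDIR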

PDF_PM_EVENT (AIX only)


When running the instrumented executable generated with -qpdf1=level=2,
you can set the environment variable PDF_PM_EVENT to L1MISS, L2MISS or
L3MISS (if applicable) to gather cache-miss profiling information at the
specified cache level.
PDF_BIND_PROCESSOR (AIX only)
If you want to bind your process to a particular processor, you can set
PDF_BIND_PROCESSOR to bind the process tree of the executable to a
different processor. Processor 0 is used by default.
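A short sketch of a training run using these environment variables (the
processor number and input file are placeholders):
export PDF_PM_EVENT=L2MISS                 # gather level 2 cache-miss data
export PDF_BIND_PROCESSOR=2                # bind the process tree to processor 2
./myapp < typical_input.dat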


Part 5 - Application of profile-directed feedback


This part provides you with an example of how to use profile-directed
feedback for an application.
Introduction
The examples in this section use the following source files: file1.c, file2.c
and file3.c.
If you have a large number of source files, it's a good idea to separate
your compile and link steps, as shown:
[Compile source]
xlc -O2 -qpdf1 -o file1_pdf1.o -c file1.c
xlc -O2 -qpdf1 -o file2_pdf1.o -c file2.c
xlc -O2 -qpdf1 -o file3_pdf1.o -c file3.c

[Link objects]
xlc -O2 -qpdf1 -o myapp file1_pdf1.o file2_pdf1.o
file3_pdf1.o

Applying PDF
At this point, you will have an instrumented executable called "myapp".
An instrumented executable will generally run slower than an executable
compiled without the -qpdf1 option, due to the extra profiling code. In
order to generate useful information for the compiler to consume, you will
need to run the executable with training data that best represents the
typical behavior of your application. "Training" refers to the execution
of an instrumented executable with a set of input data.
For example, you may have three sets of input data called dat1.in, dat2.in,
and dat3.in. Ideally, each data set would exercise different parts of your
code but remain a good representation of typical usage. To train "myapp",
you would need to run the application for each set of data as follows:
./myapp < dat1.in
./myapp < dat2.in
./myapp < dat3.in


At the end of execution, you will notice that a file called "._pdf" has
been generated in the current directory. Note that the file name and the
directory where it is created can be changed through compiler options.
The data generated from every execution of the instrumented application
is stored in the same PDF file. In other words, every instrumented
application generally has one PDF file associated with it.
The PDF data can now be fed back into the compiler to further guide
optimizations based on the training data you provided. Simply relink the
objects you created with the -qpdf1 option, but use the -qpdf2 option
at the link step instead. It is not necessary to recompile your source
files to object files with the -qpdf2 option, because PDF optimizations
are done during object linking. That being said, recompiling the source
files with the -qpdf2 option and linking with -qpdf2 will produce an
optimized executable with the same behavior.
xlc -O2 -qpdf2 -o myapp file1_pdf1.o file2_pdf1.o
file3_pdf1.o

The compiled executable "myapp" is now optimized with PDF information!
Applying PDF tools (AIX only)
There are several PDF tools that can be used during compilation with
PDF; here we will describe the two most commonly used:
showpdf, which is a tool that can dump PDF data in human-readable
form
mergepdf, which allows users to combine, with weights, different PDF files
generated by the same instrumented executable
The showpdf tool allows you to examine the contents of the PDF file that
is generated after running the instrumented executable with typical input
data. For example, a user may want to determine the code coverage for
each or all of their input data sets used during training. The tool displays
the basic block counters (how many times each basic block of code was
executed), the block coverage (how many basic blocks were executed one
or more times versus the total number of basic blocks in the procedure),
and the call coverage (how many function calls were executed one or more
times versus the total number of function calls in the procedure).
Example
PDF info file is: ._pdf
Version = 10 Size = 176
Time stamp: Fri Feb 4 15:51:08 2011
-----------------------------------
main(63): 1 (foo.c)
Block Counters:
4-8   | 1
8     | 1
8-10  | 1000000
8     | 1
8-14  | 1
14    | 1
14-15 | 1000000
14    | 1
14-17 | 1
18    |
Block coverage = 100% ( 9/9 )
-----------------------------------
Total Call coverage = 0% ( 0/0 )
Total Block coverage = 28% ( 9/32 )

Mergepdf is useful for applications that use workloads as input data,
typically where execution never completes. An example of this is a server
daemon that runs continuously and is stopped by external intervention.
Workloads can have execution times that vary widely, which means the
overall execution frequency of one workload may be many times that of
another. This can cause PDF data from smaller workloads with shorter
runtimes to be completely engulfed by larger workloads with much longer
runtimes.

Example
Function    Workload A               Workload B                Workload A + B
foo()       executed 10 times        executed 1000000 times    executed 1000010 times
bar()       executed 1 time          executed 10000 times      executed 10001 times
xyz()       executed 200 times       executed 1 time           executed 201 times

In Workload A, it seems that xyz() is a hot (i.e., high execution frequency)
function, so the compiler should aggressively optimize it. But in
Workload B, foo() is the hottest function and xyz() is insignificant by
comparison. For every training run, PDF data is accumulated in the PDF
file, which means that after runs with Workload A and Workload B, the
total execution counts are added together as illustrated in the
Workload A + B column. When the compiler examines the PDF file, it will
find that xyz() is in fact infrequently executed and foo() is the hottest
function, and therefore the application will be optimized in such a way
that the code in foo() is efficient and gets the greatest speed-up, while
xyz() is assumed to be cold (i.e., low execution frequency) and could
therefore be optimized less aggressively. The optimized application will
likely perform well with Workload B, but conversely Workload A will
suffer. In such a case, the mergepdf tool can be used to balance the data
generated by the two workloads by assigning "weights" to each.

Example

./myapp < workload_A
mv ._pdf ._pdf_wkld_A
./myapp < workload_B
mv ._pdf ._pdf_wkld_B
mergepdf -r 10000 ._pdf_wkld_A -r 1 ._pdf_wkld_B -o ._pdf_total

In the above example, a scaling ratio of 10000:1 was used: PDF data for
Workload A was weighted by 10000, while Workload B was weighted by 1.
This way both foo() and xyz() end up with the same execution frequency
magnitude (order of 1000000) and the compiler will treat both as hot
functions and optimize accordingly. To use the merged PDF file, which in
this case was named ._pdf_total, the PDF filename would have to be
specified using -qpdf2=pdfname=<filename> as follows:
xlc -O2 -qpdf2=pdfname=._pdf_total -o myapp file1_pdf1.o
file2_pdf1.o file3_pdf1.o

The resulting executable, myapp, is now optimized for both workloads A
and B.

Checkpoint
Summary:

An instrumented executable will generally run slower than an executable
compiled without the -qpdf1 option, due to the extra profiling code.
Generating useful information for the compiler to consume requires running
the executable with training data that best represents the typical
behavior of your application.

Feeding the PDF data back into the compiler further guides optimizations
based on the training data you provided.
Several PDF tools can help you work with the results of PDF optimization

Self-test questions:

What does training refer to in the context of PDF optimization?


When would you use the -qpdf2 option?
What tools are available to help you work with the results of PDF
optimization?


Summary
Over the course of this tutorial you learned about profile-directed feedback
and how to use it to improve your program's performance.
Takeaway points

PDF should be used after other debugging and tuning is finished, as one of
the last steps before putting the application into production.
Compiling the program with the -qpdf1 option generates profile data for
the compiled program. Compiling the program again with the -qpdf2
option optimizes the program based on the profile data.
Several options allow you to see additional information about the profile
data.
Several utility programs allow you to manage profiling data.
To get maximum results from PDF optimization, you need to run your
program with training data that best represents the typical behavior of
your application.

Additional resources
If you would like to learn more about all the different optimization options,
consult the IBM XL C/C++ for AIX, V11.1 Compiler Reference, SC27-2479-00,
or the z/OS V1R12.0 XL C/C++ User's Guide, SC09-4767-09.
Start with options that we have talked about in this tutorial and continue
learning about optimization levels 3, 4 and 5.
Guidelines on writing code that is best suited for optimization can be found
in IBM XL C/C++ for AIX, V11.1 Optimization and Programming Guide,
SC27-2482-00 or the z/OS V1R12.0 XL C/C++ Programming Guide,
SC09-4765-11. Here you will find more information about efficient I/O
methods, use of built-in functions, as well as additional notes on how to
improve performance with compiler options.
Be it optimization options or code changes, there is abundant input from
knowledgeable professionals in the Rational C/C++ Café. Simply type
optimization in the search bar and you will be pointed to a number of
useful documents and threads with further discussions on the subject.
Contacting IBM
IBM welcomes your comments. You can send them to
compinfo@ca.ibm.com


March 2011

References in this document to IBM products, programs, or services do not imply that
IBM intends to make these available in all countries in which IBM operates. Any
reference to an IBM program product in this publication is not intended to state or imply
that only IBM's program product may be used. Any functionally equivalent program may
be used instead.
IBM, the IBM logo, and ibm.com are trademarks or registered trademarks of
International Business Machines Corporation in the United States, other countries, or
both. If these and other IBM trademarked terms are marked on their first occurrence in
this information with a trademark symbol (® or ™), these symbols indicate U.S.
registered or common law trademarks owned by IBM at the time this information was
published. Such trademarks may also be registered or common law trademarks in other
countries. A current list of IBM trademarks is available on the Web at Copyright and
trademark information at www.ibm.com/legal/copytrade.shtml
© Copyright International Business Machines Corporation 2011. US Government
Users Restricted Rights - Use, duplication or disclosure restricted by GSA ADP Schedule
Contract with IBM Corp.

