
Java Performance Tuning

by Fabian Skivée


Profiling methodology
Profiling tools
Case study


There is a general perception that Java programs are slow.

In early versions of Java, you had to struggle hard and compromise a lot to make a Java application run quickly.

The VM technology and Java development tools have progressed to the point where a Java application is not particularly handicapped.

Why is it slow?

The virtual machine layer that abstracts Java away from the underlying hardware increases the overhead.

This overhead can cause a Java application to run slower than an equivalent application written in a lower-level language.

Java's advantages – platform independence, memory management, powerful exception checking, built-in multi-threading, dynamic resource loading and security checks – add costs.

The tuning game

Performance tuning is similar to playing a strategy game.

Your target is to get a better score than the last score after each attempt.

You are playing with, not against, the computer, the programmer, the design, the compiler.

Techniques include switching compilers, turning on optimizations, using a different VM, and finding 2 or 3 bottlenecks in the code that have simple fixes.

System limitations

Three resources limit all applications:

CPU speed and availability
System memory
Disk (and network) input/output

The first step in tuning is to determine which of these is causing your application to run slowly.

When you fix a bottleneck, it is normal for the next bottleneck to move to another of these limitations.

A tuning strategy

1. Identify the main bottlenecks (look for about the top five bottlenecks).

2. Choose the quickest and easiest one to fix, and address it.

3. Repeat from Step 1.

Advantage:

- once a bottleneck has been eliminated, the characteristics of the application change, and the topmost bottleneck may no longer need to be addressed.

Identifying bottlenecks

1. Measure the performance by using profilers and benchmark suites.

2. Identify the location of any bottlenecks.

3. Think of a hypothesis for the cause of the bottleneck.

4. Consider any factors that may refute your hypothesis.

5. Create a test to isolate the factor identified by the hypothesis.

6. Test the hypothesis.

7. Alter the application to reduce the bottleneck.

8. Test that the alteration improves performance, and measure the improvement.

9. Repeat from Step 1.

Perceived Performance

Users have a particular view of performance that allows you to cut some corners.

Example: a browser that gives a running countdown of the amount left to be downloaded from a server is seen as faster than one that just sits there until all the data is downloaded.

Rules:

If the application is unresponsive for more than 2 seconds, it is seen as slow.

Users are not aware of response-time improvements of less than 20%.

How to appear quicker?

Threading: ensure that your application remains responsive to the user, even while it is executing some other function.

Streaming: display a partial result of the activity while continuing to compile more results in the background (very useful in distributed systems).

Caching: caching techniques speed up data access. For example, the read-ahead algorithm used in disk hardware is fast when you read forward through a file.
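
The threading point above can be illustrated with a minimal sketch. It assumes a modern JDK (java.util.concurrent and lambdas, which postdate the JVM versions discussed in these slides); the class ResponsiveTask and the method slowSum() are hypothetical names standing in for real application work.

```java
import java.util.concurrent.Callable;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;

final class ResponsiveTask {

    /** Hypothetical slow computation standing in for real work. */
    static long slowSum(int n) {
        long sum = 0;
        for (int i = 1; i <= n; i++) {
            sum += i;
        }
        return sum;
    }

    public static void main(String[] args) throws Exception {
        ExecutorService worker = Executors.newSingleThreadExecutor();
        try {
            // Run the slow work on a background thread.
            Callable<Long> job = () -> slowSum(50_000_000);
            Future<Long> result = worker.submit(job);

            // The main thread stays free to report progress to the user.
            while (!result.isDone()) {
                System.out.println("Still working...");
                Thread.sleep(100);
            }
            System.out.println("Result: " + result.get());
        } finally {
            worker.shutdown();
        }
    }
}
```

The total work is unchanged; only the perceived performance improves, because the user keeps receiving feedback while the computation runs.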

Starting to tune

User agreements: you should agree with your users what the performance of the application is expected to be: response times, systemwide throughput, maximum number of users, data,

Setting benchmarks: these are precise specifications stating what part of the code needs to run in what amount of time.

How much faster, in which parts, and for how much effort?

Without clear performance objectives, tuning will never be completed.

Taking Measurements

Each run of your benchmarks needs to be under conditions that are as identical as possible.

The benchmark should be run multiple times, and the full list of results retained, not just the average and deviation.

Run an initial benchmark to specify how far you need to go and to highlight how much you have achieved when you finish tuning.

Make your benchmark long enough (over 5 sec)

What to measure?

Main: the wall-clock time (System.currentTimeMillis())

CPU time: time allocated on the CPU for a particular procedure

Memory size

Disk throughput

Network traffic, throughput, and latency

Apart from wall-clock time, Java doesn't provide mechanisms for measuring these values directly.
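
The measurement advice above (several runs, full list of results retained, wall-clock time from System.currentTimeMillis()) can be sketched as follows. WallClockBenchmark and its time() method are hypothetical helpers, not part of the JDK, and the workload is a placeholder; the lambda syntax assumes Java 8 or later.

```java
import java.util.ArrayList;
import java.util.List;

final class WallClockBenchmark {

    /** Runs the task the given number of times and returns every elapsed time in ms. */
    static List<Long> time(Runnable task, int runs) {
        List<Long> results = new ArrayList<>();
        for (int i = 0; i < runs; i++) {
            long start = System.currentTimeMillis();
            task.run();
            results.add(System.currentTimeMillis() - start);
        }
        return results; // the full list, not just an average
    }

    public static void main(String[] args) {
        // Placeholder workload; substitute the code being tuned.
        Runnable workload = () -> {
            StringBuilder sb = new StringBuilder();
            for (int i = 0; i < 1_000_000; i++) {
                sb.append(i % 10);
            }
            if (sb.length() != 1_000_000) {
                throw new IllegalStateException("unexpected length");
            }
        };
        List<Long> timings = time(workload, 5);
        System.out.println("Elapsed ms per run: " + timings);
    }
}
```

A real benchmark should run long enough (over 5 seconds, as recommended above) for the timer resolution and VM start-up effects not to dominate the result.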

Profiling Tools

Measurements and timings
Garbage collection
Method calls
Object-creation profiling
Monitoring gross memory usage

« If you only have a hammer, you tend to see every problem as a nail. »

Abraham Maslow

Measurements and Timings

Any profiler slows down the application it is profiling.

Timing with System.currentTimeMillis() is the only reliable way to measure.

The OS interferes with the results by allocating different priorities to the process.

On certain OSes, foreground processes are given maximum priority.

Cache effects can also lead to wrong results.

Garbage Collection

Some of the commercial profilers provide statistics showing what the garbage collector is doing.

Or use the -verbose:gc option with the VM.

With VM 1.4: java -Xloggc:<file>

The printout includes explicit synchronous calls to the garbage collector and asynchronous executions of the garbage collector when free memory available gets low.

Garbage Collection

The important items in any -verbose:gc output are:

the size of the heap after garbage collection

the time taken to run the garbage collection

the number of bytes reclaimed by the garbage collection.

Interesting values:

Cost of GC to your application (percentage)

Cost of the GC in the application's processing time
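
As a worked example of the first value, the GC cost can be taken as the sum of the reported collection times divided by the total elapsed time. This is a sketch under that assumption; GcCost is a hypothetical helper and the pause times are made-up illustration values, not measurements.

```java
final class GcCost {

    /** Percentage of the elapsed time that was spent in garbage collection. */
    static double gcCostPercent(double[] gcPauseSeconds, double elapsedSeconds) {
        double totalGc = 0.0;
        for (double pause : gcPauseSeconds) {
            totalGc += pause;
        }
        return 100.0 * totalGc / elapsedSeconds;
    }

    public static void main(String[] args) {
        // Made-up pause times (in seconds) for illustration only.
        double[] pauses = {0.25, 0.5, 0.25};
        // 1.0 s of GC in a 20 s run
        System.out.println(gcCostPercent(pauses, 20.0) + " %"); // prints "5.0 %"
    }
}
```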

GC Viewer

Supported verbose:gc formats are:

Sun JDK 1.3.1/1.4 with the option -verbose:gc
Sun JDK 1.4 with the option -Xloggc:<file> (preferred)
IBM JDK 1.3.0/1.2.2 with the option -verbose:gc

GCViewer shows a number of lines:

Full GC Lines: Black vertical line at every Full GC
Inc GC Lines: Cyan vertical line at every Incremental GC
GC Times Line: Green line that shows the length of all GCs
Total Heap: Red line that shows heap size
Used Heap: Blue line that shows used heap size

GC Viewer

GCViewer also provides some metrics:

Acc Pauses: Sum of all pauses due to GC

Avg Pause: Average length of a GC pause

Min Pause: Shortest GC pause

Max Pause: Longest GC pause

Total Time: Time data was collected for (only Sun 1.4 and IBM


Footprint: Maximal amount of memory allocated

Throughput: Percentage of time the application was NOT busy with GC

Freed Memory: Total amount of memory that has been freed

Freed Mem/Min: Amount of memory that has been freed per minute

Method Calls

Method profilers show where the bottlenecks in your code are and help you decide where to target your efforts.

Most method profilers work by sampling the call stack at regular intervals and recording the methods on the stack.

The JDK comes with a minimal profiler, obtained by using the -Xrunhprof option (depending on the JDK). This option produces a profile data file (java.hprof.txt).

Rolf's Profile Viewer

For each method

a count of the number of times the method is invoked

a short form of the class and method name itself

the time spent in that method (in seconds)

a bargraph of the time.

All the methods which call the current method are listed in the caller pane

All the methods that the current method itself invokes are listed in the callee pane.

Object creation

Determine object numbers

Identifying where particular objects are created in the code.

The JDK provides very rudimentary object-creation statistics.

Use a commercial tool in place of the SDK.

Monitoring Gross Memory Usage

The JDK provides two methods for monitoring the amount of memory used by the runtime system: freeMemory() and totalMemory() in the java.lang.Runtime class.

totalMemory() returns a long, which is the number of bytes currently allocated to the runtime system for this particular VM process.

freeMemory() returns a long, which is the number of bytes available to the VM to create objects from the section of memory it controls.
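
A minimal sketch of how these two methods combine: the memory in use is the difference between totalMemory() and freeMemory(). MemoryUsage and usedBytes() are hypothetical names; the readings are approximate, since the garbage collector can run at any time between two calls.

```java
final class MemoryUsage {

    /** Bytes currently in use: memory allocated to the VM minus the free part of it. */
    static long usedBytes() {
        Runtime rt = Runtime.getRuntime();
        return rt.totalMemory() - rt.freeMemory();
    }

    public static void main(String[] args) {
        long before = usedBytes();

        // Allocate about 16 MB so the difference is visible.
        byte[][] blocks = new byte[16][];
        for (int i = 0; i < blocks.length; i++) {
            blocks[i] = new byte[1024 * 1024];
        }

        long after = usedBytes();
        System.out.println("Used before: " + before + " bytes");
        System.out.println("Used after:  " + after + " bytes");
        System.out.println("Blocks held: " + blocks.length); // keeps the arrays reachable
    }
}
```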


(commercial) Optimizeit from Borland
(commercial) JProbe from Quest Software
(commercial) JProfiler from ej-technologies
(commercial) WebSphere Studio from IBM
(free) HPjmeter from Hewlett-Packard
(free) HPjtune

Case study :

Tuning IO performance

The example consists of reading lines from a large file.

We compare different methods on 2 files:

small file with long lines
long file with short lines

We test our methods with four JVM configurations:

JVM 1.2.2
JVM 1.3.1
JVM 1.4.1
JVM 1.4.1 -server

Method 1 : Unbuffered input stream

Use the deprecated readLine() method of DataInputStream.

DataInputStream in = new DataInputStream(new FileInputStream(file));

while ((line = in.readLine()) != null) {
    // process the line here
}




Method 2 : Buffered input stream

Use a BufferedInputStream to wrap the FileInputStream.

DataInputStream in = new DataInputStream(new BufferedInputStream(new FileInputStream(file)));

while ((line = in.readLine()) != null) {
    // process the line here
}




Method 3 : 8K buffered input stream

Set the size of the buffer to 8192 bytes.

DataInputStream in = new DataInputStream(new BufferedInputStream(new FileInputStream(file), 8192));

while ((line = in.readLine()) != null) {
    // process the line here
}




Method 4 : Buffered reader

Use Readers instead of InputStreams, as the Javadoc recommends, for full portability.

BufferedReader in = new BufferedReader(new FileReader(file));

while ((line = in.readLine()) != null) {
    // process the line here
}




Method 5 : Custom-built reader

Let's get down to some real tuning.

You know from general tuning practices that creating objects is overhead.

Up until now, we have used the readLine() method, which returns a string.

Suppose we avoid the String creation.

Better still, why not work directly on the underlying char array?

Method 5 : Custom-built reader

We need to implement the readLine() functionality with our own buffer, while passing the buffer to the method that does the string processing.

Our implementation uses its own char array buffer.

It reads in characters to fill the buffer, then runs through the buffer looking for ends of lines.

Method 5 : Custom-built reader

Each time the end of a line is found, the buffer, together with the start and end index of the line in that buffer, is passed to the doSomething() method.

This implementation avoids both String-creation overhead and the subsequent String-processing overhead.
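
The approach described above can be sketched as follows. This is a minimal illustration, not the code used for the measurements: the class CharBufferLineReader, the LineHandler interface, the 8192-char starting size and the buffer-growing strategy for over-long lines are all assumptions; only the name doSomething() comes from the slides.

```java
import java.io.IOException;
import java.io.Reader;
import java.io.StringReader;
import java.util.Arrays;

final class CharBufferLineReader {

    /** Hypothetical callback playing the role of doSomething(); end is exclusive. */
    interface LineHandler {
        void doSomething(char[] buf, int start, int end);
    }

    static void readLines(Reader in, LineHandler handler) throws IOException {
        char[] buf = new char[8192];
        int len = 0; // number of valid chars in buf
        int n;
        while ((n = in.read(buf, len, buf.length - len)) != -1) {
            int scanFrom = len; // chars before this index were already scanned
            len += n;
            int lineStart = 0;
            for (int i = scanFrom; i < len; i++) {
                if (buf[i] == '\n') {
                    // Exclude a preceding '\r' so "\r\n" line endings also work.
                    int end = (i > lineStart && buf[i - 1] == '\r') ? i - 1 : i;
                    handler.doSomething(buf, lineStart, end);
                    lineStart = i + 1;
                }
            }
            // Move the unfinished tail of the buffer to the front.
            len -= lineStart;
            System.arraycopy(buf, lineStart, buf, 0, len);
            if (len == buf.length) {
                // A line longer than the buffer: grow it so read() has room.
                buf = Arrays.copyOf(buf, buf.length * 2);
            }
        }
        if (len > 0) {
            handler.doSomething(buf, 0, len); // last line without a trailing newline
        }
    }

    public static void main(String[] args) throws IOException {
        Reader in = new StringReader("alpha\nbeta\r\ngamma");
        // Print each line's length without ever creating a String for the line.
        readLines(in, (buf, start, end) -> System.out.println(end - start));
    }
}
```

The handler must finish with the buffer contents before it returns, because the same array is reused for the next read.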

Method 6 : Custom reader and converter

Better still, perform the byte-to-char conversion ourselves.

Change the FileReader to a FileInputStream and add a byte array buffer of the same size as the char array buffer.

Create a convert() method that converts the byte buffer to the char buffer.
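
A possible shape for convert() is sketched below. It assumes the file uses a single-byte encoding such as ISO-8859-1, where each byte maps directly to one char; the slides do not state which encoding was used, and a multi-byte encoding such as UTF-8 would need a real decoder. The class name ByteToCharConverter is hypothetical.

```java
final class ByteToCharConverter {

    /** Converts the first length bytes to chars, assuming one byte per char (ISO-8859-1). */
    static void convert(byte[] bytes, char[] chars, int length) {
        for (int i = 0; i < length; i++) {
            // Mask with 0xFF so bytes above 127 are not sign-extended.
            chars[i] = (char) (bytes[i] & 0xFF);
        }
    }

    public static void main(String[] args) {
        byte[] in = {72, 105, (byte) 0xE9};
        char[] out = new char[in.length];
        convert(in, out, in.length);
        System.out.println((int) out[2]); // prints 233
    }
}
```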

Results with small file

The file contains 10000 lines of 100 characters (977 KB).

[Results table: the six methods (unbuffered input stream, buffered input stream, 8K buffered input stream, buffered reader, custom-built reader, custom reader and converter) under each JVM configuration (JDK 1.2.2, JDK 1.3.1, JDK 1.4.1, JDK 1.4.1 -server).]

Results with long file

The file contains 35000 lines of 50 characters (1.7 MB).

[Results table: the six methods (unbuffered input stream, buffered input stream, 8K buffered input stream, buffered reader, custom-built reader, custom reader and converter) under each JVM configuration (JDK 1.2.2, JDK 1.3.1, JDK 1.4.1, JDK 1.4.1 -server).]