A widespread assumption is that Java is far slower than C or C++. While this was true in Java's early days, the introduction of the HotSpot Virtual Machine has radically changed the situation. HotSpot ships with a Just-In-Time (JIT) compiler that optimizes programs at runtime: it compiles bytecode into native code on the fly, after the program has started. The resulting compiled code runs faster than the interpreted bytecode, reducing the overall execution time.

So you might be wondering why I'm talking about the JIT in an article about micro-benchmarking code sections in Java. The reason is that we cannot trust a measured time in an environment where so many things happen behind the scenes and can alter the results. Micro-benchmarks are very misleading, and a lot can interfere during the execution: JIT optimization and garbage collection are examples, along with I/O and caching effects.

Of course, it depends highly on what kind of tests you want to do. Maybe you wish to include all of these JVM-specific aspects in your benchmarks. But because micro-benchmarks focus on very small portions of code, these effects can add a non-negligible amount of time to the final results.

There are plenty of articles on the subject, but here is what I have learned so far about micro-benchmarking in Java, with some useful resources at the end of this post.

System.currentTimeMillis or System.nanoTime?

Always prefer System.nanoTime over System.currentTimeMillis. Indeed, nanoTime can never give worse resolution than currentTimeMillis and generally offers a much better approximation, since it is designed for measuring elapsed time rather than reading the wall clock (which has millisecond granularity at best and can be adjusted while the benchmark runs).
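
As a minimal sketch, here is how the two calls compare when timing the same code (the doWork method is an illustrative placeholder, not from the discussion above):

public class TimerComparison {

    public static void main(String[] args) {
        long startNs = System.nanoTime();
        long startMs = System.currentTimeMillis();

        doWork();

        // nanoTime is meant for elapsed-time measurements and has
        // nanosecond granularity (actual precision depends on the OS)
        long elapsedNs = System.nanoTime() - startNs;
        // currentTimeMillis reads the wall clock: millisecond granularity,
        // and it can jump if the system clock is adjusted mid-run
        long elapsedMs = System.currentTimeMillis() - startMs;

        System.out.println(elapsedNs + " ns vs " + elapsedMs + " ms");
    }

    private static void doWork() {
        // Placeholder for the code section to measure
        for (int i = 0; i < 1000000; i++) {
            Integer.toString(i);
        }
    }
}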

Warming up the JVM

Running a benchmark at the very beginning of your application's startup is not recommended because of the mechanisms explained above (JVM warm-up and JIT compilation). If you do so, the result may not reflect what you wanted to measure, and you will probably notice during your tests that the measured time varies from startup until the execution reaches a steady state.
To circumvent this effect, a simple trick is to run a warm-up phase and wait for the JIT to complete its first pass. This warm-up phase is generally a plain loop with thousands of iterations, and it needs to run for a few seconds.

import java.util.ArrayList;
import java.util.List;

public class WarmupExample {

    private static final int WARMUP_SIZE = 4000000;

    static List<String> testList = new ArrayList<String>(WARMUP_SIZE);

    // Static initializer executed at class load: serves as the warm-up phase
    static {
        for (int i = 0; i < WARMUP_SIZE; i++) {
            // Exercise the same kind of code that will be measured in main,
            // so the JIT has already compiled it when measurement starts
            testList.add(Integer.toString(i));
        }
    }

    public static void main(String[] args) {
        long start = System.nanoTime();

        // Small portion of code to be measured goes here

        long elapsed = System.nanoTime() - start;
        System.out.println("Elapsed time (ns): " + elapsed);
    }
}

This is just a general example; you can achieve the same thing in many ways. You can also simply discard the first sets of results from your micro-benchmark until it stabilizes. Also execute the benchmark multiple times and take a median value, which lowers the impact of inaccurate runs.
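
A minimal sketch of that second approach (the run counts and the benchmarkOnce helper are illustrative choices, not from the example above):

import java.util.Arrays;

public class MedianBenchmark {

    private static final int RUNS = 30;
    private static final int DISCARDED = 10; // drop the warm-up-tainted runs

    public static void main(String[] args) {
        long[] times = new long[RUNS];
        for (int run = 0; run < RUNS; run++) {
            long start = System.nanoTime();
            benchmarkOnce();
            times[run] = System.nanoTime() - start;
        }

        // Keep only the stabilized runs, then take the median
        long[] stable = Arrays.copyOfRange(times, DISCARDED, RUNS);
        Arrays.sort(stable);
        System.out.println("Median time (ns): " + stable[stable.length / 2]);
    }

    private static void benchmarkOnce() {
        // Placeholder for the portion of code to measure
        StringBuilder sb = new StringBuilder();
        for (int i = 0; i < 100000; i++) {
            sb.append(i);
        }
    }
}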

Class Loading and Empty Loops

Be careful with the portion to be measured: the JIT optimizes away empty loops and predictable portions of code. It is smart enough to see when a loop is actually useless, in which case the loop is not executed at all. So we need to make sure that the JIT cannot predict the measured portion's behavior, or that our code modifies some observable state during the execution. If the printed results are extremely low or near zero, it is likely that the JIT has eliminated the code because it considered it useless.
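
A common way to guard against this, sketched below, is to accumulate a value inside the loop and use it afterwards, so the JIT cannot prove the loop body is dead (the sink variable and compute method are illustrative devices):

public class DeadCodeGuard {

    public static void main(String[] args) {
        long sink = 0; // accumulator that keeps the loop body "live"

        long start = System.nanoTime();
        for (int i = 0; i < 1000000; i++) {
            sink += compute(i);
        }
        long elapsed = System.nanoTime() - start;

        // Printing the accumulator forces the JIT to keep the computation
        System.out.println("Elapsed (ns): " + elapsed + ", sink=" + sink);
    }

    private static long compute(int i) {
        return (long) i * i;
    }
}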

In the code portion to be measured, do not call methods that were not already executed during the warm-up phase, as the first call could trigger a class to be loaded. Loading a class generally incurs I/O that will alter the results.
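
If you suspect this is happening, one way to check is to run the JVM with the standard -verbose:class flag and confirm that no class-loading lines appear while the measured section runs. The sketch below shows the safe pattern, with an illustrative format method: the warm-up loop calls the exact method that will later be timed, so every class it needs is already loaded.

public class ClassLoadingSafeBenchmark {

    public static void main(String[] args) {
        // Warm-up: call the same method that will be measured, so its class
        // and everything it references are loaded before timing starts
        for (int i = 0; i < 100000; i++) {
            format(i);
        }

        long start = System.nanoTime();
        String result = format(42); // no class loading can happen here
        long elapsed = System.nanoTime() - start;

        System.out.println(result + " took " + elapsed + " ns");
    }

    private static String format(int value) {
        return String.format("value=%d", value);
    }
}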

OSR (On Stack Replacement)

On Stack Replacement allows the JVM to switch to an optimized (compiled) version of a method while that method is executing. Take the example of a method containing a loop that runs 10K iterations: the HotSpot JVM will compile this method, and at the next call the compiled version will run instead of the interpreted one. But if the loop's iteration count exceeds a further threshold (generally around 14K according to the articles linked at the end), OSR kicks in and replaces the running interpreted version with the compiled one, in the middle of the loop.

If OSR is used during the execution, there is a chance that we are not measuring the method's top performance. The OSR-compiled version may miss some classical loop optimizations such as loop hoisting (moving lines outside of the loop without affecting its semantics), loop unrolling (rewriting the loop with repeated sequences to avoid the end-of-loop branching overhead), and array-bounds check elimination (removing the per-iteration checks that protect arrays against out-of-bounds accesses).

Conclusion: run fewer loop iterations per call (and call the method more often instead) to avoid triggering the On Stack Replacement mechanism, so that you measure the method's top performance while running a micro-benchmark.
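
A minimal sketch of that pattern, with an illustrative work method (the iteration and call counts are arbitrary choices, below the thresholds mentioned above):

public class OsrAvoidance {

    public static void main(String[] args) {
        // Many calls with a short loop inside: the method gets compiled
        // through the normal invocation path rather than through OSR
        for (int call = 0; call < 20000; call++) {
            work(1000);
        }

        long start = System.nanoTime();
        long result = work(1000); // measured call runs fully compiled code
        long elapsed = System.nanoTime() - start;

        System.out.println("Elapsed (ns): " + elapsed + ", result=" + result);
    }

    private static long work(int iterations) {
        long sum = 0;
        for (int i = 0; i < iterations; i++) {
            sum += i;
        }
        return sum;
    }
}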

Running too many threads

This may seem obvious, but I have seen micro-benchmarks running a lot of threads concurrently, even many more threads than the processor has cores. The time the operating system spends on context switching is non-zero and damages the results. Just run a simple test with two threads and the same test with fifty to observe the difference, as in the sketch below.
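
A rough sketch of such a comparison, assuming a fixed-size thread pool and an illustrative work task (the task size and counts are arbitrary; the same total work runs in both configurations):

import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.TimeUnit;

public class ThreadCountComparison {

    public static void main(String[] args) throws InterruptedException {
        System.out.println("2 threads:  " + run(2) + " ns");
        System.out.println("50 threads: " + run(50) + " ns");
    }

    // Runs the same 100 tasks regardless of the pool size, so any time
    // difference comes from scheduling and context switching
    private static long run(int threads) throws InterruptedException {
        ExecutorService pool = Executors.newFixedThreadPool(threads);
        long start = System.nanoTime();
        for (int task = 0; task < 100; task++) {
            pool.submit(() -> work());
        }
        pool.shutdown();
        pool.awaitTermination(1, TimeUnit.MINUTES);
        return System.nanoTime() - start;
    }

    private static void work() {
        long sum = 0;
        for (int i = 0; i < 10000000; i++) {
            sum += i;
        }
        if (sum == 42) System.out.println("unreachable"); // keep the loop live
    }
}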

Using specific tools like Caliper

Caliper is a great tool for micro-benchmarking Java applications. It has a lot of features that remove the usual micro-benchmarking headaches, and it lets you view the results as dedicated charts instead of digging through console prints. It even detects when a result may be invalid and tells you where the problem lies.
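
As a rough sketch, here is what a benchmark looks like with the old Caliper 0.5-style API (the class and method names are illustrative, and the API has changed across versions, so check the current Caliper documentation):

import com.google.caliper.Runner;
import com.google.caliper.SimpleBenchmark;

public class ConcatBenchmark extends SimpleBenchmark {

    // Caliper times each timeXxx method, choosing the rep count itself
    // and handling the warm-up for you
    public void timeStringBuilder(int reps) {
        for (int i = 0; i < reps; i++) {
            StringBuilder sb = new StringBuilder();
            sb.append("value=").append(i);
            // Use the result so the JIT cannot eliminate the loop body
            if (sb.length() == 0) throw new AssertionError();
        }
    }

    public static void main(String[] args) {
        Runner.main(ConcatBenchmark.class, args);
    }
}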

To finish, here are some general guidelines for effectively measuring elapsed time in a Java application:

Microbenchmark Guidelines

And a great article focusing on Java micro-benchmarks, with everything you need to know about the subject, plus another one on the JVM's internal clock and timer mechanisms:

Robust Java benchmarking

Inside the HotSpot VM clocks