Java Performance Notes

Several remarkable articles on Java performance have appeared over the last couple of months. Links like these tend to get lost on Twitter, so I wanted to compile them here with brief overviews. All articles are based on OpenJDK, which also underlies Oracle’s distributions.

Java and SIMD

Piotr Nowojski tests automatic loop vectorization, i.e. the JIT compiler generating SIMD instructions that apply one arithmetic operation to several elements of a number array at once. This transformation is available in the Server VM and enabled by default via -XX:+UseSuperWord. In his tests it yields a 4x speedup when it actually occurs, but Nowojski also identifies a case where Java 8 needs the code refactored before it recognizes the opportunity for vectorization. Happily, the current Java 9 preview requires no help there.
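To make the loop shape concrete, here is a minimal sketch of an element-wise array addition, the kind of simple counted loop that automatic vectorization targets. The class and method names are my own, not taken from Nowojski's article, and whether SIMD instructions are actually emitted depends on the JVM version, the CPU and the exact loop.

```java
// Hypothetical example, not code from the article.
class VectorAddSketch {

    // A plain counted loop over primitive arrays with no cross-iteration
    // dependency: each out[i] depends only on a[i] and b[i].
    static void add(float[] a, float[] b, float[] out) {
        for (int i = 0; i < out.length; i++) {
            out[i] = a[i] + b[i];
        }
    }

    public static void main(String[] args) {
        int n = 1 << 16;
        float[] a = new float[n];
        float[] b = new float[n];
        float[] out = new float[n];
        for (int i = 0; i < n; i++) {
            a[i] = i;
            b[i] = 2 * i;
        }
        // Call the method often enough that the JIT compiler gets a chance
        // to compile it; the iteration count is an arbitrary choice.
        for (int run = 0; run < 20_000; run++) {
            add(a, b, out);
        }
        System.out.println(out[n - 1]);
    }
}
```

Running the same program once with -XX:+UseSuperWord and once with -XX:-UseSuperWord is a rough way to see whether the transformation makes a difference on your machine; a proper benchmark harness would give more reliable numbers.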

How does the default hashCode() work?

Galo Navarro analyzes the default implementation of Object.hashCode. OpenJDK calls a native method which turns out to run a surprisingly complex algorithm, and which implicitly disables fast “biased locking” for the object it’s invoked on. As always, if you intend to use objects as hash keys, it’s a good idea to override the default implementation (together with equals) and provide your own.
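As a reminder of what providing your own implementation looks like, here is a minimal sketch with a hypothetical Point class (not from Navarro's article). Because hashCode is overridden, calls on a Point compute the hash from its fields and never reach the default identity hash.

```java
// Hypothetical value class used only for illustration.
final class Point {
    private final int x;
    private final int y;

    Point(int x, int y) {
        this.x = x;
        this.y = y;
    }

    // equals and hashCode must agree: equal points, equal hash codes.
    @Override
    public boolean equals(Object other) {
        if (this == other) {
            return true;
        }
        if (!(other instanceof Point)) {
            return false;
        }
        Point p = (Point) other;
        return x == p.x && y == p.y;
    }

    // Computed from the fields, so the native default is not involved.
    @Override
    public int hashCode() {
        return 31 * x + y;
    }
}
```

The multiplier 31 is just the conventional choice; any scheme that stays consistent with equals would do.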

JVM Anatomy Park #10: String.intern()

Aleksey Shipilёv examines String.intern and finds it essentially useless for large string pools. The call relies on a small native JVM hashtable whose size is fixed at startup. Worse, filling that hashtable increases GC pauses significantly because it’s part of the GC root set. Using your own HashMap or ConcurrentHashMap is considerably faster for large string pools. Shipilёv concludes:

In almost every project we were taking care of, removing String.intern() from the hotpaths, or optionally replacing it with a handrolled deduplicator, was the very profitable performance optimization. Do not use String.intern() without thinking very hard about it, okay?
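A handrolled deduplicator of the kind mentioned in the quote could look roughly like the following sketch. It is my own minimal version built on ConcurrentHashMap, not Shipilёv's code, and the class and method names are made up for illustration.

```java
import java.util.concurrent.ConcurrentHashMap;

// Hypothetical deduplicator: returns one canonical instance per distinct
// string value, backed by an ordinary Java map instead of the JVM's
// native string table.
final class StringDeduplicator {
    private final ConcurrentHashMap<String, String> pool = new ConcurrentHashMap<>();

    String dedup(String s) {
        // putIfAbsent returns the previously stored instance, or null if
        // this string was the first with its value.
        String existing = pool.putIfAbsent(s, s);
        return existing == null ? s : existing;
    }

    int size() {
        return pool.size();
    }
}
```

Note that this sketch never evicts entries, so the pool keeps every distinct string alive for as long as the deduplicator itself is reachable. Whether that is acceptable depends on the application.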

The slow currentTimeMillis()

Pavel Zemtsov compares the Windows and Linux implementations of System.currentTimeMillis. Both are native calls, but whereas Windows is very fast (4 ns), Linux is either slower (36 ns) or much slower (640 ns), depending on the available time source. If you’re on Linux, the article’s extensive discussion of time functions on that system should be interesting.
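If you want a quick impression of the call's cost on your own system, a crude timing loop like the sketch below is enough to tell the fast and slow cases apart. It is my own rough approximation with made-up names, not Zemtsov's benchmark; a harness such as JMH would be needed for trustworthy figures.

```java
// Hypothetical, deliberately naive measurement.
class ClockCostSketch {

    // Returns the average wall time per currentTimeMillis call in nanoseconds.
    static double averageNanosPerCall(int calls) {
        long sink = 0;
        long start = System.nanoTime();
        for (int i = 0; i < calls; i++) {
            sink += System.currentTimeMillis();
        }
        long elapsed = System.nanoTime() - start;
        // Use the accumulated value so the loop cannot be discarded as dead code.
        if (sink == 42) {
            System.out.println("unlikely");
        }
        return (double) elapsed / calls;
    }

    public static void main(String[] args) {
        averageNanosPerCall(1_000_000); // warm-up round, result ignored
        System.out.println(averageNanosPerCall(10_000_000) + " ns per call");
    }
}
```

The number includes loop overhead, so treat it as an upper bound on the call cost rather than a precise measurement.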
