I recently delved into the java.lang.management package to measure thread activity in my Potential RPG server application. Tools like VisualVM offer real-time probing, but I also like my applications to produce their own statistics. In this case, I'm logging CPU time (and a bunch of other stats) from my server application, and I use those logs to produce performance charts (using the JFreeChart library).
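In java.lang.management, OperatingSystemMXBean reports the CPU time of your application process. (On Oracle and OpenJDK JVMs, the getProcessCpuTime() method is actually part of the com.sun.management.OperatingSystemMXBean extension, so the bean has to be cast to that type.) The ThreadMXBean interface reports the CPU time of each live thread.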
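As a minimal sketch of reading both values (the class name CpuTimeSnapshot and its helper methods are made up for illustration, and the cast assumes a JVM that ships the com.sun.management extension), something like this should work:

```java
import java.lang.management.ManagementFactory;
import java.lang.management.OperatingSystemMXBean;
import java.lang.management.ThreadMXBean;

// Hypothetical helper class, not part of any library.
final class CpuTimeSnapshot {

    /** Process CPU time in nanoseconds, or -1 if this JVM does not expose it. */
    static long processCpuNanos() {
        OperatingSystemMXBean os = ManagementFactory.getOperatingSystemMXBean();
        // getProcessCpuTime() is declared on the com.sun.management extension,
        // not on the standard java.lang.management interface.
        if (os instanceof com.sun.management.OperatingSystemMXBean) {
            return ((com.sun.management.OperatingSystemMXBean) os).getProcessCpuTime();
        }
        return -1L;
    }

    /** CPU time in nanoseconds for one thread, or -1 if unavailable. */
    static long threadCpuNanos(long threadId) {
        ThreadMXBean threads = ManagementFactory.getThreadMXBean();
        if (!threads.isThreadCpuTimeSupported() || !threads.isThreadCpuTimeEnabled()) {
            return -1L;
        }
        // Also returns -1 when the thread is no longer alive.
        return threads.getThreadCpuTime(threadId);
    }

    public static void main(String[] args) {
        ThreadMXBean threads = ManagementFactory.getThreadMXBean();
        System.out.println("process CPU ns: " + processCpuNanos());
        for (long id : threads.getAllThreadIds()) {
            System.out.println("thread " + id + " CPU ns: " + threadCpuNanos(id));
        }
    }
}
```

Both values are reported in nanoseconds, although the actual resolution depends on the platform.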
CPU time measures the time your code actually spends executing on the CPU. Contrast this with wall-clock time, which also includes time when your process is swapped out for other work. In addition, you can query user time, which is the portion of CPU time spent executing your application's own code, as opposed to system time, which is spent in the operating system on your application's behalf (for example, performing disk I/O).
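To make the distinction concrete, here is a rough sketch that reads CPU and user time for the current thread and derives system time as the difference. The class UserSystemTime is a made-up name, and the subtraction is an approximation, since the two readings are taken at slightly different moments:

```java
import java.lang.management.ManagementFactory;
import java.lang.management.ThreadMXBean;

// Hypothetical helper class for illustration only.
final class UserSystemTime {

    /** Returns {cpu, user, system} in nanoseconds, or {-1, -1, -1} if unsupported. */
    static long[] currentThreadTimes() {
        ThreadMXBean threads = ManagementFactory.getThreadMXBean();
        if (!threads.isCurrentThreadCpuTimeSupported()) {
            return new long[] { -1L, -1L, -1L };
        }
        long cpu = threads.getCurrentThreadCpuTime();   // user + system
        long user = threads.getCurrentThreadUserTime(); // user mode only
        // Clamp at zero: the two counters may differ in granularity.
        long system = Math.max(0L, cpu - user);
        return new long[] { cpu, user, system };
    }

    public static void main(String[] args) {
        long wallStart = System.nanoTime();
        long sum = 0;
        for (int i = 0; i < 50_000_000; i++) {
            sum += i; // busy work so there is some CPU time to report
        }
        long wall = System.nanoTime() - wallStart;
        long[] t = currentThreadTimes();
        System.out.println("wall ns: " + wall + " (checksum " + sum + ")");
        System.out.println("cpu ns: " + t[0] + ", user ns: " + t[1] + ", system ns: " + t[2]);
    }
}
```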
"Java tip: How to get CPU, system, and user time for benchmarking" is an excellent article on using these classes.
In particular, notice that ThreadMXBean does not report on threads that have died. To avoid missing thread activity, you have to poll it at regular intervals. I'm trying a 500 ms polling interval, but my initial results suggest that the thread doing the polling is itself using a significant percentage of the overall CPU time.
Any thread activity missed between polls can be estimated by subtracting the sum of the sampled thread times from the total process CPU time. By increasing the polling rate, you can catch more short-lived threads, at the expense of more overall CPU time spent polling.
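The bookkeeping described above can be sketched roughly as follows. ThreadCpuPoller is a hypothetical class, not my actual server code: it remembers the last CPU time seen for every thread ID, so threads that have since died still count toward the sampled total. The remainder may also include JVM-internal work that never shows up as a Java thread.

```java
import java.lang.management.ManagementFactory;
import java.lang.management.OperatingSystemMXBean;
import java.lang.management.ThreadMXBean;
import java.util.HashMap;
import java.util.Map;

// Hypothetical poller for illustration only.
final class ThreadCpuPoller {
    private final ThreadMXBean threads = ManagementFactory.getThreadMXBean();
    // Last CPU time observed per thread ID. Assumes IDs are not reused while
    // we are sampling; the Thread documentation has not always guaranteed this.
    private final Map<Long, Long> lastSeenNanos = new HashMap<>();

    /** Records the current CPU time of every live thread. */
    void sample() {
        if (!threads.isThreadCpuTimeSupported()) {
            return;
        }
        for (long id : threads.getAllThreadIds()) {
            long cpu = threads.getThreadCpuTime(id);
            if (cpu >= 0) { // -1 means the thread died or timing is disabled
                lastSeenNanos.put(id, cpu);
            }
        }
    }

    /** Sum of the last observed CPU time of every thread seen so far. */
    long sampledTotalNanos() {
        long total = 0;
        for (long nanos : lastSeenNanos.values()) {
            total += nanos;
        }
        return total;
    }

    /** CPU time the process used that no sample accounts for. */
    long unaccountedNanos(long processCpuNanos) {
        return processCpuNanos - sampledTotalNanos();
    }

    public static void main(String[] args) throws InterruptedException {
        ThreadCpuPoller poller = new ThreadCpuPoller();
        for (int i = 0; i < 3; i++) {
            poller.sample();
            Thread.sleep(500); // 500 ms polling interval
        }
        poller.sample();
        OperatingSystemMXBean os = ManagementFactory.getOperatingSystemMXBean();
        if (os instanceof com.sun.management.OperatingSystemMXBean) {
            long process = ((com.sun.management.OperatingSystemMXBean) os).getProcessCpuTime();
            System.out.println("unaccounted ns: " + poller.unaccountedNanos(process));
        }
    }
}
```

A real server would more likely drive sample() from a scheduled executor than from a sleep loop; the loop just keeps the sketch short.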
In some cases, I'm using a thread pool (Executors.newCachedThreadPool()). By default, each thread gets a unique name, but this is not a requirement. By passing in a custom thread factory, I can give the same name to all threads that perform a similar service. In my thread polling routine, I sum the CPU time of threads that share a name. This way, even if several threads are actively performing some task, the CPU time is reported for the group as a whole.
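Here is one way the shared-name approach might look. The names NamedPoolCpu, sharedName, and cpuNanosByName are invented for this sketch, and the "worker" label is just an example:

```java
import java.lang.management.ManagementFactory;
import java.lang.management.ThreadInfo;
import java.lang.management.ThreadMXBean;
import java.util.Map;
import java.util.TreeMap;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.ThreadFactory;
import java.util.concurrent.TimeUnit;

// Hypothetical helper class for illustration only.
final class NamedPoolCpu {

    /** Factory that gives every thread it creates the same name. */
    static ThreadFactory sharedName(String name) {
        return runnable -> new Thread(runnable, name);
    }

    /** Sums the CPU time (ns) of all live threads, grouped by thread name. */
    static Map<String, Long> cpuNanosByName() {
        ThreadMXBean threads = ManagementFactory.getThreadMXBean();
        Map<String, Long> totals = new TreeMap<>();
        if (!threads.isThreadCpuTimeSupported()) {
            return totals;
        }
        for (long id : threads.getAllThreadIds()) {
            ThreadInfo info = threads.getThreadInfo(id); // null if the thread died
            long cpu = threads.getThreadCpuTime(id);     // -1 if unavailable
            if (info == null || cpu < 0) {
                continue;
            }
            totals.merge(info.getThreadName(), cpu, Long::sum);
        }
        return totals;
    }

    public static void main(String[] args) throws InterruptedException {
        ExecutorService pool = Executors.newCachedThreadPool(sharedName("worker"));
        for (int i = 0; i < 4; i++) {
            pool.execute(() -> {
                // Spin for about 200 ms so each worker accumulates CPU time.
                long end = System.nanoTime() + TimeUnit.MILLISECONDS.toNanos(200);
                while (System.nanoTime() < end) { }
            });
        }
        Thread.sleep(100); // sample while the workers are still running
        System.out.println(cpuNanosByName());
        pool.shutdown();
    }
}
```

All four pool threads show up under the single "worker" key, which is the grouping effect I'm after.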
By monitoring my thread activity, I've found which parts of my application are working the hardest. I've also identified a few rogue threads (probably quick-and-dirty cases in which I extend Thread directly), which need to be wrangled. These might account for the CPU time not being captured by my polling. Overall, these thread monitoring results give me a good idea of where to improve performance.