Getting alerts when Java processes crash

This article was posted by Matty on 2008-01-29 22:52:00 -0400

When bugs occur in the Java runtime environment, most administrators want to get notified so they can take corrective action. These actions can range from restarting a Java process, collecting postmortem data or calling in application support personnel to debug the situation further. The Java runtime has a number of useful options that can be used for this purpose. The first option is “-XX:OnOutOfMemoryError”, which allows a command to be run when the runtime environment incurs an out of memory condition. When this option is combined with the logger command line utility:

$ java -XX:OnOutOfMemoryError="logger Java process %p encountered an OOM condition" ...

Syslog entries similar to the following will be generated each time an OOM event occurs:

Jan 21 19:59:17 nevadadev root: [ID 702911 daemon.notice] Java process 19001 encountered an OOM condition

Another super useful option is “-XX:OnError”, which allows a command to be run when the runtime environment incurs a fatal error (i.e., a hard crash). When this option is combined with the logger utility:

$ java -XX:OnError="logger Java process %p encountered a fatal condition" ...

Syslog entries similar to the following will be generated when a fatal event occurs:

Jan 21 19:52:17 nevadadev root: [ID 702911 daemon.notice] Java process 19004 encountered a fatal condition

The options above allow you to run one or more commands when these errors are encountered, so you could chain together a postmortem debugging tool, a utility (logger or mail) to generate alerts, and a restarter script to start a new Java process (this assumes you aren’t using SMF). Nice!
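Here is a minimal sketch of the chaining idea. HotSpot accepts several commands in a single option value when they are separated by semicolons, and expands %p to the process id in each one. The helper name build_on_error and the restart script path are made up for illustration, and gcore is just one possible postmortem tool:

```sh
# Join handler commands with ';' into a single -XX:OnError option.
# build_on_error is a hypothetical helper, not part of the JDK.
build_on_error() {
    opt=""
    for cmd in "$@"; do
        if [ -z "$opt" ]; then
            opt="$cmd"
        else
            opt="$opt;$cmd"
        fi
    done
    printf '%s\n' "-XX:OnError=$opt"
}

# Example: dump core, log an alert, then call an (assumed) restart script.
build_on_error "gcore %p" \
    "logger Java process %p encountered a fatal condition" \
    "/usr/local/bin/restart-app.sh"
```

The resulting string can be passed to the runtime as a single quoted argument, e.g. java "$(build_on_error ...)" followed by the usual options.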

Monitoring Java garbage collection with jstat

This article was posted by Matty on 2008-01-16 21:18:00 -0400

Java memory management revolves around the garbage collector, which is the entity responsible for traversing the heap and freeing space that is being taken up by unreferenced objects. Garbage collection makes life easier for Java programmers, since it frees them from having to explicitly manage memory resources (this isn’t 100% true, but close enough). In the Java runtime environment, there are two types of collections that can occur. The first type is referred to as a minor collection. Minor collections are responsible for locating live objects in the young generation (eden), copying these objects to the inactive survivor space, and moving tenured objects from the active survivor space to the old (tenured) generation (this assumes that a generational collector is being used). The second type is the major collection. This type of collection frees unreferenced objects in the tenured generation, and optionally compacts the heap to reduce fragmentation.

When debugging performance problems, it is extremely useful to be able to monitor object allocations and frees in the new and old generations. The Java development kit comes with the jstat utility, which provides a ton of visibility into what the garbage collector is doing, as well as a slew of information on how each generation is being utilized. To display garbage collection statistics for the new, old and permanent generations, invoke jstat with the “-gc” (print garbage collection heap statistics) option, the “-t” (print the number of seconds the JVM has been up) option, the process id to retrieve statistics from, and an optional interval in milliseconds to control how often statistics are printed:

$ jstat -gc -t $(pgrep java) 5000

Timestamp S0C S1C S0U S1U EC EU OC OU PC PU YGC YGCT FGC FGCT GCT
98772.0 1600.0 1600.0 0.0 1599.8 13184.0 5561.6 245760.0 201671.9 16384.0 6443.0 166683 2402.690 32411 110.564 2513.255
98777.0 1600.0 1600.0 1599.4 0.0 13184.0 9533.7 245760.0 156797.1 16384.0 6443.0 166690 2402.785 32414 110.573 2513.359
98782.0 1600.0 1600.0 1599.7 0.0 13184.0 10328.6 245760.0 166402.2 16384.0 6443.0 166698 2402.889 32416 110.580 2513.469
98787.0 1600.0 1600.0 0.0 1599.9 13184.0 2383.5 245760.0 195366.0 16384.0 6443.0 166707 2403.016 32416 110.580 2513.595

The output above contains the capacity of each survivor space in kilobytes (S0C && S1C), the utilization of each survivor space (S0U && S1U), the capacity of eden (EC), the utilization of eden (EU), the capacity of the old generation (OC), the utilization of the old generation (OU), the permanent generation capacity (PC), the permanent generation utilization (PU), the total number of young generation garbage collection events (YGC), the total amount of time in seconds spent collecting objects in the new generation (YGCT), the total number of old generation garbage collection events that have occurred (FGC), the total amount of time spent collecting objects in the old generation (FGCT), and the total time spent performing garbage collection (GCT).
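Since the counters are cumulative, a few derived numbers can be easier to read than the raw columns. The following is a rough sketch, assuming the exact 16-column “-gc -t” layout shown above (newer JDKs print different columns, so the field positions would need adjusting). The function name gc_summary is made up for this example:

```sh
# Summarize "jstat -gc -t" output read from standard input.
# Field positions assume the column layout shown in this article:
#   $8 = OC, $9 = OU, $12 = YGC, $13 = YGCT, $14 = FGC, $15 = FGCT
gc_summary() {
    awk '$1 != "Timestamp" && NF >= 16 {
        # Old generation utilization as a percentage of its capacity
        old_pct = ($8 > 0) ? 100 * $9 / $8 : 0
        # Average time per collection event, converted to milliseconds
        ygc_ms = ($12 > 0) ? 1000 * $13 / $12 : 0
        fgc_ms = ($14 > 0) ? 1000 * $15 / $14 : 0
        printf "%s old=%.1f%% avg_ygc=%.1fms avg_fgc=%.1fms\n", $1, old_pct, ygc_ms, fgc_ms
    }'
}
```

To try it out, pipe the jstat output through the function: jstat -gc -t $(pgrep java) 5000 | gc_summary. Header lines are skipped, so this also works when headers are repeated with the “-h” option.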

If you prefer to view garbage collection events as percentages, you can use the “-gcutil” option:

$ jstat -gcutil -t -h5 $(pgrep java) 5000

Timestamp S0 S1 E O P YGC YGCT FGC FGCT GCT
99814.1 0.00 99.99 18.08 63.77 39.32 168551 2427.512 32800 111.800 2539.313
99819.1 99.96 0.00 66.29 78.18 39.32 168562 2427.649 32800 111.800 2539.449
99824.1 100.00 0.00 94.40 62.46 39.32 168572 2427.795 32803 111.815 2539.610
99829.2 100.00 0.00 60.25 65.08 39.32 168580 2427.888 32806 111.824 2539.713

The output above contains the utilization of each survivor space as a percentage of the total survivor space capacity (S0 && S1), the utilization of eden as a percentage of the total eden capacity (E), the utilization of the tenured generation as a percentage of the total tenured generation capacity (O), the utilization of the permanent generation as a percentage of the total permanent generation capacity (P), the total number of young generation garbage collection events (YGC), the total time spent collecting objects in the young generation (YGCT), the total number of old generation garbage collection events (FGC), the total amount of time spent collecting objects in the old generation (FGCT), and the total garbage collection time (GCT).
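The percentage view lends itself to simple threshold checks. Here is a small sketch that prints a line whenever the tenured generation utilization (the O column) exceeds a limit. It assumes the 11-column “-gcutil -t” layout shown above, and the function name old_gen_alert and the threshold value are arbitrary choices for illustration:

```sh
# Print a message for each "jstat -gcutil -t" sample whose old generation
# utilization ($5, the O column in the layout above) exceeds the limit.
old_gen_alert() {
    threshold="$1"
    awk -v limit="$threshold" '
        $1 != "Timestamp" && NF >= 11 && $5 + 0 > limit + 0 {
            printf "old generation at %s%% (timestamp %s)\n", $5, $1
        }'
}
```

A possible invocation is jstat -gcutil -t $(pgrep java) 5000 | old_gen_alert 75, with the output handed to logger or mail to generate an alert.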

To get the time spent in garbage collection along with the reason the collection occurred, jstat can be run with the “-gccause” option:

$ jstat -gccause -t $(pgrep java) 1000

Timestamp S0 S1 E O P YGC YGCT FGC FGCT GCT LGCC GCC
100157.3 99.96 0.00 66.27 63.82 39.32 169160 2435.394 32925 112.202 2547.595 CMS Initial Mark No GC
100158.3 0.00 99.99 32.14 67.72 39.32 169163 2435.430 32925 112.202 2547.631 unknown GCCause No GC
100159.3 0.00 99.97 50.22 65.10 39.32 169165 2435.454 32927 112.208 2547.662 CMS Initial Mark No GC
100160.3 99.97 0.00 6.02 62.46 39.32 169168 2435.493 32928 112.211 2547.704 unknown GCCause No GC
100161.3 99.97 0.00 32.14 62.46 39.32 169168 2435.493 32928 112.211 2547.704 unknown GCCause No GC

There are also options to print class loader activity and hotspot compiler statistics, and to break down utilization by generation (this is extremely useful when you're trying to profile a specific memory pool). There are a number of incredibly useful open source tools for visualizing garbage collection data, and I hope to talk about these in the near future.

Building 32-bit openssl libraries with the Sun C compiler

This article was posted by Matty on 2007-12-30 15:53:00 -0400

This week I needed to install OpenSSL 0.9.8g on one of my servers. When I went to configure and build the libraries with the Sun C compiler, I noticed that 64-bit libraries were produced by default. It turns out that this is the default behavior if you try to build OpenSSL on a 64-bit platform. To build 32-bit shared libraries, I ran Configure with the “shared” and “solaris-x86-cc” options:

$ cd openssl-0.9.8g

$ ./Configure shared --prefix=/usr/local solaris-x86-cc

$ make

$ make install

There may be other ways to do this, but this method appears to work ok.

Monitoring the IPMI system event log

This article was posted by Matty on 2007-12-30 15:42:00 -0400

If you have a relatively recent server, your machine most likely supports IPMI. One technology that makes IPMI extremely useful is the baseboard management controller (BMC), which is an out-of-band controller that monitors the health of your server platform. Health monitoring is accomplished by distributing sensors throughout the server, and feeding the data these sensors collect back to the BMC. If the BMC detects a fault condition, it can log an error to the system event log.

The event log can be monitored by a number of IPMI software packages. One such package is ipmitool, which provides the ipmievd daemon just for this purpose. If you are running a recent version of Solaris 10*, you probably already have the IPMI software and the ipmievd daemon installed. You can use the following command to check:

$ pkginfo | grep ipmi
system SUNWipmi ipmitool, (usr)
system SUNWipmir ipmitool, (root)

If the software is installed, you can use the svcadm utility to enable the ipmievd daemon:

$ svcadm enable svc:/network/ipmievd:default

Once the ipmievd service is enabled, you can use the ps and svcs commands to verify that the daemon is running:

$ svcs -a | grep ipmi
online 0:25:52 svc:/network/ipmievd:default

$ ps -ef | grep ipmi
root 328 1 0 00:25:53 ? 0:01 /usr/lib/ipmievd sel

If the daemon starts up, it will periodically poll the BMC system event log. If ipmievd detects an error condition, it will log a message to syslog. This message will contain details on the fault, which can be used to help determine that a server is sick. Since FMA currently doesn’t do platform health monitoring (the sensor project will fix this), ipmievd is able to step in and fill this role for the time being. Nice!

* This blog post assumes you are running Solaris 10 update 4 with patch 119765-06.

Finding bugs in Java programs

This article was posted by Matty on 2007-12-16 22:49:00 -0400

A while back I came across findbugs, which is a static analysis tool that can be used to locate bugs in Java programs. Findbugs is able to identify a number of bug patterns, which range from bad practices to performance and multithreaded programming bugs. Findbugs can be invoked through a graphical utility, or by running the findbugs command line utility. The command line option has the advantage that it can be easily incorporated into existing build processes (there are options readily available to integrate findbugs with maven and ant), allowing code to be tested when new builds are created.

To use the command line interface, you can run the findbugs executable with the “-textui” option and one or more options to control how findbugs goes about locating bugs. The following example uses the “-effort” option to tell findbugs to put the maximum amount of effort into finding bugs, requests that all bugs that are considered medium to high in priority be displayed, allocates 1GB of memory to findbugs, and sets the output format to HTML:

$ findbugs -textui -effort:max -maxHeap 1024 -html -medium test.jar

Once findbugs completes its analysis, an HTML report similar to the ones on the findbugs website will be written to standard output. Since findbugs is free and can be easily integrated with several build tools, there is little to no reason that Java developers shouldn’t use it to analyze their code for bugs. If you're interested in learning more about findbugs or the bug patterns it detects, you should check out the findbugs website and the talk Bill Pugh gave at Google!