I recently spent some of my spare time assisting a friend with debugging some Java performance problems his company was experiencing. When I originally looked into the performance problem several weeks back, I used the mpstat and jstat utilities to observe CPU utilization and object allocations, and based on some jstat anomalies, I used the techniques described in my object allocation post to get a better understanding of how their Java application was allocating objects. After a bit of analysis and a a couple of email exchanges with my friend and one of the developers he worked with, we were able to locate two application problems that the developer has since fixed.

But even with these changes (which resulted in some significant speedups!!), my friend noticed that request times would periodically shoot up to unacceptable levels. After a bit more analysis with jstat, I noticed that the rate of object allocation in the new generation was still relatively high, and started to wonder if the current size of the new generation was limiting throughput. To see if this was the case, I had my friend add the “PrintGCApplicationConcurrentTime” and “PrintGCApplicationStoppedTime” options to the Java command lline:

$ java -XX:+PrintGCApplicationConcurrentTime -XX:+PrintGCApplicationStoppedTime …

These options will instruct the Java process to print the time an application is actually running, as well as the time the application was stopped due to GC events. Here are a few sample entries that were produced:

$ egrep ‘(Application|Total time)’ gc.log |more
Application time: 3.3319318 seconds
Total time for which application threads were stopped: 0.7876304 seconds
Application time: 2.1039898 seconds
Total time for which application threads were stopped: 0.4100732 seconds
…..

To get a better idea of how much time the application was running vs. stopped, I created a script (gctime) to summarize the log data. Gctime takes a GC log as input, and prints the total execution time, the time the application ran, the time the application was stopped, as well as the percentage of time the application spent in the running and stopped states. Here is some sample output from a short run:

$ gctimes gc.log

Total execution time               : 66.30secs
Time application ran               : 55.47secs
Time application was stopped       : 10.84secs
% of time application ran          : 83.65%
% of time application was stopped  : 16.35%



Based on the results above, the fact that objects were “spilling” into the old generation, as well as an observation that the tenuring threshold for most objects were extremely low, it appeared that increasing the size (they were using the default size) of the new generation would help decrease the time the application was paused. I asked my friend to double the size (the size for each Java generation should be chosen carefully based on the results of empirical testing methods) of the “NewSize” and “MaxNewSize” runtime options, and that appears to have fixed their latency problem. As I research the area of Java performance more and more, I am starting to realize that a myriad of factors can lead to poor performance. I hope to share some additional Java performance monitoring tools I have written in future posts.

Posted by matty, filed under Java. Date: February 18, 2008, 7:37 pm | 1 Comment »

The Java SDK comes with a number of tools and JVM options that can be used to analyze the performance of the Java runtime. One extremely useful tool is the heap profiler agent, which provides facilities to profile memory usage, CPU utilization and lock contention. To load the profiler agent to profile CPU utilization, you can add the “-agentlib:hprof=cpu=times” option to your java command line:

$ java -Xms256m -Xmx256m -verbose:gc -agentlib:hprof=cpu=times App

Once loaded, the agent will use byte code injection (BCI) to instrument each method’s entry and return points. This allows the agent to measure the number of times each method was called, the time spent in each method, and the call chain that led to the method being invoked. To utilize the agent to it’s full potential, you will need to exercise the application while the agent is active. Once the runtime has been exercised, you can hit cntrl+C to stop the process. This will cause the agent to write the data it has collected to the file java.hprof.txt, which can be viewed with your favorite pager or editor:

$ more java.hprof.txt

CPU TIME (ms) BEGIN (total = 40712194) Mon Jan 21 19:23:12 2008
rank   self  accum   count trace method
   1 38.52% 38.52% 1036273 301143 java.util.Random.next
   2 19.57% 58.09%  518136 301144 java.util.Random.nextDouble
   3 11.87% 69.96%  518136 301145 java.lang.Math.random
   4 11.05% 81.01% 1036274 301141 java.util.concurrent.atomic.AtomicLong.get
   5 10.53% 91.54% 1036273 301142 java.util.concurrent.atomic.AtomicLong.compareAndSet
   6  8.14% 99.68%  259068 301146 TestMain.foo
   7  0.05% 99.73%       1 300969 java.security.Permissions.add
   8  0.04% 99.77%       1 301106 java.lang.Class.privateGetDeclaredFields
   9  0.03% 99.79%       1 301138 java.util.Random.
  10  0.02% 99.81%       2 300908 java.io.FilePermission$1.run
  11  0.01% 99.82%       1 301283 java.lang.ThreadGroup.remove
  12  0.01% 99.83%       1 300820 sun.net.www.protocol.file.Handler.createFileURLConnection
……



The java.hprof.txt file contains the number of times each method was invoked, as well as the amount of CPU time (as a percentage) that was spent in each method. To see how a method was called, you can search the profiler output for the trace identifier that is listed along side the profiling data. This will produce a Java stack trace similar to the following that shows how a given method was invoked:

TRACE 301143:
        java.util.Random.next(Random.java:Unknown line)
        java.util.Random.nextDouble(Random.java:Unknown line)
        java.lang.Math.random(Math.java:Unknown line)
        TestMain.foo(TestMain.java:Unknown line)



The BCI approach introduces a fair amount of overhead to the Java runtime. In cases were the overhead is hindering testing, you can use the agent’s “cpu=samples” option instead. This will cause the agent to sample the runtime environment at periodic intervals to see which methods are executing. While this approach is not as accurate as the BCI approach, it provides a good set of results with less runtime overhead. The java profiling agent is incredibly useful, and just one of the vast number of tools that are available to profile Java applications.

Posted by matty, filed under Java. Date: February 2, 2008, 3:05 pm | No Comments »

When bugs occur in the Java runtime environment, most administrators want to get notified so they can take corrective action. These actions can range from restarting a Java process, collecting postmortem data or calling in application support personnel to debug the situation further. The Java runtime has a number of useful options that can be used for this purpose. The first option is “-XX:OnOutOfMemoryError”, which allows a command to be run when the runtime environment incurs an out of memory condition. When this option is combined with the logger command line utility:

$ java -XX:OnOutOfMemoryError=”logger Java process %p encountered an OOM condition” …

Syslog entries similar to the following will be generated each time an OOM event occurs:

Jan 21 19:59:17 nevadadev root: [ID 702911 daemon.notice] Java process 19001 encountered an OOM condition

Another super useful option is “-XX:OnError”, which allows a command to be run when the runtime environment incurs a fatal error (i.e., a hard crash). When this option is combined with the logger utility:

$ java -XX:OnError=”logger -p Java process %p encountered a fatal condition” …

Syslog entries similar to the following will be generated when a fatal event occurs:

Jan 21 19:52:17 nevadadev root: [ID 702911 daemon.notice] Java process 19004 encountered a fatal condition

The options above allow you to run one or more commands when these errors are encountered, so you could chain together a postmortem debugging tool, a utility (logger or mail) to generate alerts, and a restarter script to start a new Java process (this assumes you aren’t using SMF). Nice!

Posted by matty, filed under Java. Date: January 29, 2008, 10:52 pm | 2 Comments »

Java memory management revolves around the garbage collector, which is the entity responsible for traversing the heap and freeing space that is being taken up by unreferenced objects. Garbage collection makes life easier for Java programmers, since it frees them from having to explicitly manage memory resources (this isn’t 100% true, but close enough). In the Java runtime environment, there are two types of collections that can occur. The first type of collection is referred to as minor collection. Minor collections are responsible for locating live objects in the young generation (eden), copying these objects to the inactive survivor space, and moving tenured objects from the active survivor space to the old (tenured) generation (this assumes that a generational collector is being used). The second form of collection is the major collection. This type of collection frees unreferenced objects in in the tenured generation, and optionally compacts the heap to reduce fragmentation.

When debugging performance problems, it is extremely useful to be able to monitor object allocations and frees in the new and old generations. The Java development kit comes with the jstat utility, which provides a ton of visibility into what the garbage collector is doing, as well as a slew of information on how each generation is being utilized. To use jstat to display garbage collection statistics for the new, old and permanent generations, jstat can be invoked with the “-gc” (print garbage collection heap statistics) option, the “-t” (print the total number of seconds the JVM has been up) option, the process id to retrieve statistics from, and an optional interval to control how often statistics are printed:

$ jstat -gc -t `pgrep java` 5000

Timestamp        S0C    S1C    S0U    S1U      EC       EU        OC         OU       PC     PU    YGC     YGCT    FGC    FGCT     GCT
        98772.0 1600.0 1600.0  0.0   1599.8 13184.0   5561.6   245760.0   201671.9  16384.0 6443.0 166683 2402.690 32411  110.564 2513.255
        98777.0 1600.0 1600.0 1599.4  0.0   13184.0   9533.7   245760.0   156797.1  16384.0 6443.0 166690 2402.785 32414  110.573 2513.359
        98782.0 1600.0 1600.0 1599.7  0.0   13184.0  10328.6   245760.0   166402.2  16384.0 6443.0 166698 2402.889 32416  110.580 2513.469
        98787.0 1600.0 1600.0  0.0   1599.9 13184.0   2383.5   245760.0   195366.0  16384.0 6443.0 166707 2403.016 32416  110.580 2513.595


The output above contains the size of each survivor space (S0C && S1C), the utilization of each survivor space (S0U && S1U), the capacity of eden (EC), the utilization of eden (EU), the capacity of the old generation (OC), the utilization of the old generation (OU), the permanent generation capacity (PC), the permanent generation utilization (PU), the total number of young generation garbage collection events (YGC), the total amount of time spent collecting objects in the new generation (YGCT), the total number of old generation garbage collection events that have occurred (FGC), the total amount of time spent collecting objects in the old generation (FGCT), and the total time spent performing garbage collection.

If you prefer to view garbage collection events as percentages, you can use the “-gcutil” option:

$ jstat -gcutil -t -h5 `pgrep java` 5000

Timestamp         S0     S1     E      O      P     YGC     YGCT    FGC    FGCT     GCT
        99814.1   0.00  99.99  18.08  63.77  39.32 168551 2427.512 32800  111.800 2539.313
        99819.1  99.96   0.00  66.29  78.18  39.32 168562 2427.649 32800  111.800 2539.449
        99824.1 100.00   0.00  94.40  62.46  39.32 168572 2427.795 32803  111.815 2539.610
        99829.2 100.00   0.00  60.25  65.08  39.32 168580 2427.888 32806  111.824 2539.713



The output above contains the utilization of each survivor space as a percentage of the total survivor space capacity (S0 && S1), the utilization of eden as a percentage of the total eden capacity (E), the utilization of the tenured generation as a percentage of the total tenured generation capacity (O), the utilization of the permanent generation as a percentage of the total permanent generation capacity (P), the total number of young generation garbage collection events (YGC), the total time spent collection objects in the young generation (YGCT), the total number of of old generation garbage collection events (FGC), the total amount of time spent collecting objects in the old generation (FGCT), and the total garbage collection time.

To get the time spent in garbage collection along with the reason the collection occurred, jstat can be run with the “-gccause” option:

$ jstat -gccause -t `pgrep java` 1000

Timestamp         S0     S1     E      O      P     YGC     YGCT    FGC    FGCT     GCT    LGCC                 GCC
       100157.3  99.96   0.00  66.27  63.82  39.32 169160 2435.394 32925  112.202 2547.595 CMS Initial Mark     No GC
       100158.3   0.00  99.99  32.14  67.72  39.32 169163 2435.430 32925  112.202 2547.631 unknown GCCause      No GC
       100159.3   0.00  99.97  50.22  65.10  39.32 169165 2435.454 32927  112.208 2547.662 CMS Initial Mark     No GC
       100160.3  99.97   0.00   6.02  62.46  39.32 169168 2435.493 32928  112.211 2547.704 unknown GCCause      No GC
       100161.3  99.97   0.00  32.14  62.46  39.32 169168 2435.493 32928  112.211 2547.704 unknown GCCause      No GC



There are also options to print class loader activity and hotspot compiler statistics, and to break down utilization by generation (this is extremely useful when your trying to profile a specific memory pool). There are a number of incredibly useful opensource tools for visualizing garbage collection data, and I hope to talk about these in the near future.

Posted by matty, filed under Java. Date: January 16, 2008, 9:18 pm | 1 Comment »

A while back I came across findbugs, which is a static analysis tool that can be used to locate bugs in Java programs. Findbugs is able to identify a number of bug patterns, which range from bad practices to performance and multithreaded programming bugs. Findbugs can be invoked through a graphical utility, or by running the findbugs command line utility. The command line option has the advantage that it can be easily incorporated into existing build processes (there are options readily available to integrate findbugs with maven and ant), allowing code to be tested when new builds are created.

To use the command line interface, you can run the findbugs executable with the “-textui” option and one or more options to control how findbugs goes about locating bugs. The following example uses the “-effort” option to tell findbugs to but the maximum amount of effort into finding bugs, requests that all bugs that are considered medium to high in priority be displayed, allocates 1GB of memory to findbugs, and sets the default output format to HTML:

$ findbugs -textui -effort:max -maxHeap 1024 -html -medium test.jar

Once findbugs completes its analysis, an HTML report similar to the ones on the findbugs website will be written to standard output. Since findbugs is free and can be easily integrated with several build tools, there is little to no reason that java developers shouldn’t use it to analyze their code for bugs. If your interested in learning more about findbugs or the bug patterns it detects, you should check out the findbugs website and the talk Bill Pugh gave at Google!

Posted by matty, filed under Java. Date: December 16, 2007, 10:49 pm | 1 Comment »

I have been reading through the Java virtual machine specification, which covers all the details needed to implement a JVM. Once thing the specification talks about in detail is Java bytecode, which is the machine independent code that is executed by the virtual machine implementation. At various places in the specification I wondered what the byte code would look like for a chunk of Java code. Fortunately for me, the Java SDK comes with the javap utility, which can be run with the “-c” option to translate a class file into byte code:

$ javap -c -s -verbose MyEnv

{
MyEnv();
  Signature: ()V
  Code:
   Stack=1, Locals=1, Args_size=1
   0:   aload_0
   1:   invokespecial   #1; //Method java/lang/Object."“:()V
   4:   return
  LineNumberTable:
   line 3: 0

public static void main(java.lang.String[]);
  Signature: ([Ljava/lang/String;)V
  Code:
   Stack=2, Locals=2, Args_size=1
   0:   new     #2; //class MyEnv
   3:   dup
   4:   invokespecial   #3; //Method ““:()V
   7:   astore_1
   8:   aload_1
   9:   invokespecial   #4; //Method allocateObjects:()V
   12:  return
  LineNumberTable:
   line 7: 0
   line 8: 8
   line 9: 12
}

In the output above, you can see the methods and constants the MyEnv class utilizes, and the byte code that makes up the main method in the MyEnv class. Javap is amazingly cool, and I am starting to realize that understanding byte code can a super useful tool for debugging problems.

Posted by matty, filed under Java. Date: December 16, 2007, 6:31 pm | 1 Comment »

The Sun JDK has included an SNMP agent since version 5, and the agent has a number of objects in its MIB that provide runtime utilization data. The current set of objects include thread utilization counters, garbage collection events, time spent performing garbage collection, heap usage, class loader activity, compiler metrics, etc. Graphing these values can help with performance analysis, since you can view activity over time. This data is also useful for capacity planningl, since the graphs can be used to size hardware for an application, and for forecasting future hardware upgrades.

Numerous tools are available to graph utilization data, but my personal favorite is Orca. Orca can graph arbitrary data, and is a perfect fit for generating JVM performance graphs. Configuring Orca to generate JVM performance graphs is a snap. To begin, you will first need to enable the Java SNMP agent. To enable the agent, you can add the “-Dcom.sun.management.snmp.interface” (interface to bind to), “-Dcom.sun.management.snmp.port” (use a different SNMP port other that the default, port 161), “-Dcom.sun.management.snmp.acl” (use ACLs to limit who can read and write to the server) and ” -Dcom.sun.management.snmp.acl.file” (ACL file to use) options to the java command line:

$ java -Dcom.sun.management.snmp.interface=192.168.1.10 \
        -Dcom.sun.management.snmp.port=8161 \
        -Dcom.sun.management.snmp.acl=true \
        -Dcom.sun.management.snmp.acl.file=/usr/java/jre/lib/management/snmp.acl …

The $JAVA_HOME/jre/lib/management/ directory contains a sample SNMP ACL file (aptly named snmp.acl.template) that can be tailored to fit your site’s security policy (if you use the default locations, you do not need to specify the last two parameters). After the SNMP agent is enabled, you can run jvmstats.pl from cron to collect statistics at specific intervals. Each time Jvmstats.pl runs, it will write the current values of the Java agent’s performance counters (e.g., number of active threads, time spent performing garbage collection, etc.) to the file pased to the “-f” option, and will add a header to the file if it doesn’t exist.

The example below show how jvmstats.pl can be used to connect to server foo.prefetch.net on UDP port 8161, collect the current set of JVM statistics, and write these statistics to a file (the file name should match up with the regex used in the Orca find_files configuration stanza):

$ jvmstats.pl -s foo.prefetch.net -p 8161 -c public
        -f /home/matty/java/jvmstats/server1_myappjvm/javallator-2007-11-05

$ cat /home/matty/java/jvmstats/server1_myappjvm/javallator-2007-11-05

TIMESTAMP JVMUPTIME JITTIME THREADSTOTAL THREADSACTIVE CLASSLOADS CLASSUNLOADS \
NGGCEVENTS OLDGCEVENTS NGGCTIME OLDGCTIME OBJECTFINALIZATION HEAPCOMMITTED \
HEAPUSED NONHEAPCOMMITTED NONHEAPUSED
1194308951 218176652 735 308464 10 1348 0 380934 0 5309065 0 0 \
266797056 165481912 19136512 7156344

After a few samples are taken, the orca utility can be run with the javallator.cfg configuration file to graph the data collected above (if you haven’t used Orca before, you can peruse my monitoring LDAP article to see how to configure it). This will result in a set of graphs that look similar to the samples posted to my website. I really dig Java, and having historical performance graphs handy sure makes performance analysis a whole lot easier!

Posted by matty, filed under Java. Date: November 5, 2007, 10:21 pm | 2 Comments »

In my previous post, I discussed how jmap can be used to view the contents of the Java heap. Jmap is a wonderful utility for viewing heap utilization at a specific point in time, but it’s not the best utiltiy for answer questions like “which call stack is allocating objects of type foo?”, or “how many objects are being allocated per second?”, or “what is the average size of allocated objects?”. To answer these types of questions, we need to look to a more thorough instrumentation framework.

One such framework is the DTrace hotspot provider which was introduced in Java 6, and provides a number of probes to dynamically instrument Java applications. These probes allow you to observe thread behavior, garbage collection, method calls, class loader activity, monitor contention and object allocation. The probe to observe object allocation is rightfully named “object-alloc,” and contains several arguments (since the probe arguments are documented extremely well in the Java community, I won’t describe them here) that can be used to understand how the heap is being used.

To use the object-alloc probe, you will need to start the JVM with the “-XX:+DTraceAllocProbes” option, or run the jinfo utility to enable it at runtime (enabling it at runtime is preferred):

$ jinfo -XX:+DTraceAllocProbes `pgrep java`

Once the object-alloc probe is enabled, you can begin prodding your application to see what it’s doing. If you are seeing lots of garbage collection events in jconsole, jstat or your garbage collection logs, the object allocation probe can be used to display object allocations by object type, the number of objects being allocated, and the size of these allocations. The newobjects.d DTrace script provides the necessary probe definitions to display this information:

$ newobjects.d

Class                           Objects created  Bytes Allocated
[I                              112              16064
Wasteful                        17943            287088
java/lang/StringBuilder         17943            287088
java/lang/String                17943            430632
[C                              107658           52393560  

Class                           Objects created  Bytes Allocated
[I                              121              18520
Wasteful                        19541            312656
java/lang/StringBuilder         19541            312656
java/lang/String                19542            469008
[C                              117252           57062640

If you see tons of object allocations, you can use the DTrace quantize() action to create a histogram with object counts or object sizes. The objectsize.d DTrace script shows how to create a distribution that is keyed on object size:

$ objectsize.d

         value  ------------- Distribution ------------- count
             8 |                                         0
            16 |@@@@@@@@@@@@@                            95897
            32 |@@@@                                     31965
            64 |                                         1
           128 |@@@@                                     32157
           256 |@@@@                                     31965
           512 |@@@@@@@@@                                63932
          1024 |@@@@                                     31967
          2048 |                                         0

Once we know how many objects are being allocated, it would useful to see which Java call stacks are responsible for allocating these objects. To generate a list of Java calls stacks with the number of objects allocated per call stack, we can run the whoallocatecount.d DTrace script:

$ whoallocatecount.d

     < ..... >

     libjvm.so`__1cNSharedRuntimeTdtrace_object_alloc6FpnHoopDesc__i_+0×4f
     
     Wasteful.main([Ljava/lang/String;)V
     StubRoutines (1)
     libjvm.so`_pnGThread__v_+0×1a3
     libjvm.so`jni_CallStaticVoidMethod+0×15d
     java`JavaMain+0xd30
     libc.so.1`_thr_setup+0×52
     libc.so.1`_lwp_start
          5678

In the output above (which was cleaned up to make it easy to view), we can see that 5678 objects where allocated by the main() method in the Wasteful class. If you would prefer to see call stacks by bytes allocated, you can run the whoallocatebybytes.d script:

$ whoallocatebytes.d

     < ..... >

     libjvm.so`__1cNSharedRuntimeTdtrace_object_alloc6FpnHoopDesc__i_+0×4f
     
     Wasteful.main([Ljava/lang/String;)V
     StubRoutines (1)
     libjvm.so`_pnGThread__v_+0×1a3
     libjvm.so`jni_CallStaticVoidMethod+0×15d
     java`JavaMain+0xd30
     libc.so.1`_thr_setup+0×52
     libc.so.1`_lwp_start
          817632

The hotspot provider is an incredible debugging tool, and it’s pretty amazing what you can do with just one probe!! If you decide to use these probes in production, be somewhat cautious. Selectively enable probes, and do so only for the duration of your troubleshooting session. Be safe, be careful, and most of all, have fun! Niiiiiiice!

Posted by matty, filed under Java, Solaris DTrace. Date: October 31, 2007, 10:48 pm | No Comments »

« Previous Entries