Using Solaris source code + DTrace

Brian Leonard wrote an excellent blog post about how to use DTrace, how to show details about specific probes, and then using the source code in conjunction with his findings to understand exactly what was happening within that probe. ┬áif you’re starting to deep dive into DTrace, this is an excellent article for learning these crucial steps!

Instrumenting Java code with JSDT probes

A few months back, I got wind from the world famous Jarod Jenson that Keith McGuigan was working on adding JSDT probes to Java. JSDT probes struck me as an extremely useful feature for realtime problem analysis, so I decided to instrument some sample code to see how they worked. After a couple of e-mail exchanges with Keith, I had a working Java build that supported JSDT. After reviewing a number of Java projects, I chose to instrument the PostgreSQL Java type IV driver, since I wanted to learn PostgreSQL along with JDBC. After reading through a fair amount of the PostgreSQL type IV driver code and Keith’s blog, I decided to add the following probes to the postgreSQL driver:

$ dtrace -l -m java_tracing

   ID   PROVIDER            MODULE                          FUNCTION NAME
13199 postgresqljdbc1619      java_tracing                       unspecified transactioncommit
13200 postgresqljdbc1619      java_tracing                       unspecified transactionrollback
13201 postgresqljdbc1619      java_tracing                       unspecified executesqlstatementstart
13202 postgresqljdbc1619      java_tracing                       unspecified getconnectionfrompoolstop
13203 postgresqljdbc1619      java_tracing                       unspecified executepreparessqlstatementstart
13204 postgresqljdbc1619      java_tracing                       unspecified transactionautomcommit
13205 postgresqljdbc1619      java_tracing                       unspecified releaseconnectionstop
13206 postgresqljdbc1619      java_tracing                       unspecified getconnectionfrompoolstart
13207 postgresqljdbc1619      java_tracing                       unspecified executepreparedsqlstatementstop
13208 postgresqljdbc1619      java_tracing                       unspecified returnconnectiontopoolstop
13209 postgresqljdbc1619      java_tracing                       unspecified transactionsavepoint
13210 postgresqljdbc1619      java_tracing                       unspecified acquireconnectionstart
13211 postgresqljdbc1619      java_tracing                       unspecified returnconnectiontopoolstart
13212 postgresqljdbc1619      java_tracing                       unspecified releaseconnectionstart
13213 postgresqljdbc1619      java_tracing                       unspecified acquireconnectionstop
13214 postgresqljdbc1619      java_tracing                       unspecified executesqlstatementstop



DTrace uses providers to group probes, so the first thing I did was create a “postgresqljdbc ” provider that would be visible to DTrace through the JSDT framework. This was achieved by defining an interface that extended the com.sun.tracing.Provider class:

package org.postgresql;

public interface postgresqljdbc extends com.sun.tracing.Provider {

    // Probes that fire when a simple SQL statement begins / ends execution
    void executesqlstatementstart(String query);
    void executesqlstatementstop(String query);

    // Probes that fire when a prepared statement begins / ends execution
    void executepreparessqlstatementstart(String query);
    void executepreparedsqlstatementstop(String query);

    // Probe that fires when a new connection is established / released
    void acquireconnectionstart(String host, int port, String database, String user);
    void acquireconnectionstop(String host, int port, String database, String user);

    void releaseconnectionstart(String host, int port, String user, String passwd);
    void releaseconnectionstop(String host, int port, String user, String passwd);

    // Probes that fire when a DB connection pool is accessed / released
    void getconnectionfrompoolstart(String host, int port, String user, String passwd);
    void getconnectionfrompoolstop(String host, int port, String user, String passwd);

    void returnconnectiontopoolstart(String host, int port, String user, String passwd);
    void returnconnectiontopoolstop(String host, int port, String user, String passwd);

    // Probe that fires when a transaction starts, saves, commits or rolls back
    void transactionautomcommit(boolean value);
    void transactionsavepoint(String value);
    void transactioncommit();
    void transactionrollback();
}



Now that the provider was defined, I needed to import the Provider and ProviderFactory classes that are part of the JSDT framework, and add the code to instantiate a new postgresqljdbc object:

import com.sun.tracing.Provider;
import com.sun.tracing.ProviderFactory;

public class Driver implements java.sql.Driver
{
   .....

    public static ProviderFactory factory = ProviderFactory.getDefaultFactory();
    public static postgresqljdbc provider = factory.createProvider(postgresqljdbc.class);



Next, I started adding probe points to the driver. This was super easy, since probe points are defined by a call to one of the provider methods that are defined in the interface that extends the Provider class. Here are the probes I added to capture SQL statement execution:

    public boolean executeWithFlags(String p_sql, int flags) throws SQLException
    {
        org.postgresql.Driver.provider.executesqlstatementstart(p_sql);
        checkClosed();
        p_sql = replaceProcessing(p_sql);
        Query simpleQuery = connection.getQueryExecutor().createSimpleQuery(p_sql);
        execute(simpleQuery, null, QueryExecutor.QUERY_ONESHOT | flags);
        this.lastSimpleQuery = simpleQuery;

        org.postgresql.Driver.provider.executesqlstatementstop(p_sql);

        return (result != null && result.getResultSet() != null);
    }



In this specific code block, I added the “org.postgresql.Driver.provider.executesqlstatementstart” probe which fires when the SQL statement begins execution, and a “executesqlstatementstop” probe that fires after the SQL statement has executed. In both cases, the argument available to DTrace scripts is the SQL statement (represented as a String) that is being executed.

Next up, I ran ant to build a new postgreSQL driver, and executed a test program to verify everything worked:

$ export CLASSPATH=/home/matty/java/postgrestest:/home/matty/java/postgrestest/postgresql.jar

$ export JAVA_HOME=/usr/java/

$ java -Djdbc.drivers=org.postgresql.Driver -XX:+ExtendedDTraceProbes Test

The Test program does nothing more than create SQL connections, execute SQL statements and print the results to STDOUT. To allow me to see the difference between prepared and unprepared statements, I decided to add unique probes for each statement type.

After debugging a few issues, I was able to run the Test program and a few DTrace scripts I had created. The first script, sqlconnections.d, listed new connections to the database and the time it took to create these connections:

$ ./sqlconnections.d

TIMESTAMP             HOST                  PORT        DATABASE    USER        TIME      
2008 Oct 10 15:59:47  localhost             5432        postgres    matty       17048632  
2008 Oct 10 15:59:51  localhost             5432        postgres    matty       17597898  



The next script, sqltimes.d, listed all of the SQL statements that were executed along with the total time they had run:

$ sqltimes.d

Tracing started (CNTRL+C to end)
^C
SQL statements:

SQL QUERY                                                          COUNT        TIME
update matty set col1='one'                                            2      722359
select * from matty                                                    4     1456311


Prepared statements:

SQL QUERY                                                          COUNT        TIME
update matty set col1='one'                                            2      548765
select * from matty                                                    4     1345987



The third script, sqltrace.d, displayed each SQL statement executed along with the time it had executed:

$ ./sqltrace.d

TIMESTAMP             SQL QUERY                                                     TIME      
1970 Mar  1 16:59:06  select * from matty                                           314021    
1970 Mar  1 16:59:06  select * from matty                                           251793    
1970 Mar  1 16:59:06  update matty set col1='one'                                   308901    



I also had fun playing with transactions and JDBC connection pool probes, but that data wasn’t nearly as interesting as the SQL statement execution times listed above. If you are running openjdk on Solaris, and you want to get better visibility into your applications, JSDT may well be worth a look! I’m still not 100% sure what kind of performance impact these probes will have on an application, but will wait for the probes to be integrated into a mainsteam Java build prior to doing any real performance testing. Thanks Keith for answering my e-emails, and to team DTrace for creating the awesomeness that is DTrace!

Using the DTrace hotspot provider to observe java object allocations

In my previous post, I discussed how jmap can be used to view the contents of the Java heap. Jmap is a wonderful utility for viewing heap utilization at a specific point in time, but it’s not the best utiltiy for answer questions like “which call stack is allocating objects of type foo?”, or “how many objects are being allocated per second?”, or “what is the average size of allocated objects?”. To answer these types of questions, we need to look to a more thorough instrumentation framework.

One such framework is the DTrace hotspot provider which was introduced in Java 6, and provides a number of probes to dynamically instrument Java applications. These probes allow you to observe thread behavior, garbage collection, method calls, class loader activity, monitor contention and object allocation. The probe to observe object allocation is rightfully named “object-alloc,” and contains several arguments (since the probe arguments are documented extremely well in the Java community, I won’t describe them here) that can be used to understand how the heap is being used.

To use the object-alloc probe, you will need to start the JVM with the “-XX:+DTraceAllocProbes” option, or run the jinfo utility to enable it at runtime (enabling it at runtime is preferred):

$ jinfo -XX:+DTraceAllocProbes `pgrep java`

Once the object-alloc probe is enabled, you can begin prodding your application to see what it’s doing. If you are seeing lots of garbage collection events in jconsole, jstat or your garbage collection logs, the object allocation probe can be used to display object allocations by object type, the number of objects being allocated, and the size of these allocations. The newobjects.d DTrace script provides the necessary probe definitions to display this information:

$ newobjects.d

Class                           Objects created  Bytes Allocated
[I                              112              16064     
Wasteful                        17943            287088    
java/lang/StringBuilder         17943            287088    
java/lang/String                17943            430632    
[C                              107658           52393560  

Class                           Objects created  Bytes Allocated
[I                              121              18520     
Wasteful                        19541            312656    
java/lang/StringBuilder         19541            312656    
java/lang/String                19542            469008    
[C                              117252           57062640  

If you see tons of object allocations, you can use the DTrace quantize() action to create a histogram with object counts or object sizes. The objectsize.d DTrace script shows how to create a distribution that is keyed on object size:

$ objectsize.d

         value  ------------- Distribution ------------- count    
             8 |                                         0        
            16 |@@@@@@@@@@@@@                            95897    
            32 |@@@@                                     31965    
            64 |                                         1        
           128 |@@@@                                     32157    
           256 |@@@@                                     31965    
           512 |@@@@@@@@@                                63932    
          1024 |@@@@                                     31967    
          2048 |                                         0        

Once we know how many objects are being allocated, it would useful to see which Java call stacks are responsible for allocating these objects. To generate a list of Java calls stacks with the number of objects allocated per call stack, we can run the whoallocatecount.d DTrace script:

$ whoallocatecount.d

     < ..... >

     libjvm.so`__1cNSharedRuntimeTdtrace_object_alloc6FpnHoopDesc__i_+0x4f
     
     Wasteful.main([Ljava/lang/String;)V
     StubRoutines (1)
     libjvm.so`_pnGThread__v_+0x1a3
     libjvm.so`jni_CallStaticVoidMethod+0x15d
     java`JavaMain+0xd30
     libc.so.1`_thr_setup+0x52
     libc.so.1`_lwp_start
          5678

In the output above (which was cleaned up to make it easy to view), we can see that 5678 objects where allocated by the main() method in the Wasteful class. If you would prefer to see call stacks by bytes allocated, you can run the whoallocatebybytes.d script:

$ whoallocatebytes.d

     < ..... >

     libjvm.so`__1cNSharedRuntimeTdtrace_object_alloc6FpnHoopDesc__i_+0x4f
     
     Wasteful.main([Ljava/lang/String;)V
     StubRoutines (1)
     libjvm.so`_pnGThread__v_+0x1a3
     libjvm.so`jni_CallStaticVoidMethod+0x15d
     java`JavaMain+0xd30
     libc.so.1`_thr_setup+0x52
     libc.so.1`_lwp_start
          817632

The hotspot provider is an incredible debugging tool, and it’s pretty amazing what you can do with just one probe!! If you decide to use these probes in production, be somewhat cautious. Selectively enable probes, and do so only for the duration of your troubleshooting session. Be safe, be careful, and most of all, have fun! Niiiiiiice!

Enabling the DTrace hotspot provider after the JVM starts

While debugging a JVM performance issue a while back, I encountered the following error when I enabled the DTrace hotspot provider:

$ jinfo -flag +ExtendedDTraceProbes `pgrep java`
590: Unable to open door: target process not responding or HotSpot VM not loaded

After a bit of debugging, I figured out that the jinfo command needs to be run by the user the JVM runs as. Hopefully this will help others who encounter this annoying problem.

Measuring system call time with procsystime

When debugging application performance problems related to high system time, I typically start my analysis by watching the system calls the application is issuing, and measuring how much time is spent in each system call. Gathering this information is simple with DTrace syscall provider, and the DTraceToolkit comes with the procsystime script to allow admins to easily analyze system call behavior. To use procsystime to measure how much time the sshd proceses are spending in each system call, we can run procsystime with the “-T” option to get the total time spent in all system calls, the “-n” option, and the process name to analyze (in the example below, using the string “sshd” will cause procsystime to analyze the system call behavior for all processes named sshd):

$ procsystime -Tn sshd

Hit Ctrl-C to stop sampling...
^C

Elapsed Times for processes sshd,

         SYSCALL          TIME (ns)
           umask               9804
         setpgrp              12111
             nfs              12194
        pathconf              12973
           chdir              13656
        setregid              20676
        setreuid              22364
      getdents64              27036
       getgroups              28605
        lwp_self              29808
      getsockopt              30959
          setgid              31365
           alarm              31507
            zone              31691
          setuid              33861
      setsockopt              39311
         setegid              39660
      setcontext              40061
         seteuid              40149
           lseek              41430
             dup              43978
         c2audit              44604
     getsockname              51978
         privsys              52974
         waitsys              56247
          getgid              57050
          accept              59959
            fsat              76925
       setgroups              79791
         tasksys              81609
      systeminfo              91019
       sysconfig             112749
        recvfrom             127964
          access             138435
            pipe             144371
     getpeername             157330
          fxstat             175541
        schedctl             182224
           vfork             217932
          putmsg             241482
         connect             301860
       sigaction             317411
        shutdown             328316
             brk             356044
       so_socket             409152
          getpid             484735
           fcntl             494322
           gtime             526637
          stat64             539840
          llseek             624945
     resolvepath             678157
          open64             715602
          getuid             950119
         fstat64             964132
           ioctl            1171727
           xstat            1278052
         memcntl            1394735
            send            1846685
           close            2325223
            open            2685141
            mmap            5289087
          munmap            5379678
     lwp_sigmask            6178493
           exece           11787526
          doorfs           28604988
           write           46083911
           fork1           57233817
            read           96877372
         pollsys        25533333727
          TOTAL:        25811904817

The output will contain the name of the system call in the left hand column, and the time spent in that system call in the right hand column. There are additional options to display the number of calls to each system call, and you can also filter by process id if you want to measure a specific process. If you are running Solaris 10 and haven’t downloaded the DTraceToolkit, I highly recommend doing so!!!