Breaking down system time usage in the Solaris kernel

I am frequently asked (or paged) to review system performance issues on our Solaris 10 hosts. I use the typical set of Solaris performance tools to observe what my systems are doing, and start drilling down once I know if the problem is with userland applications or in the kernel itself. When I observe issues with the Solaris kernel (these are typically represented by high system time values in vmstat), the first thing I typically do is fire up lockstat to see where the kernel is spending its time:

$ lockstat -kIW /bin/sleep 5

Profiling interrupt: 31424 events in 5.059 seconds (6212 events/sec)

Count indv cuml rcnt     nsec Hottest CPU+PIL        Caller
-------------------------------------------------------------------------------
28962  92%  92% 0.00     3765 cpu[61]                cpu_halt
 1238   4%  96% 0.00     3747 cpu[22]                lzjb_compress
  165   1%  97% 0.00     2655 cpu[37]                copy_pattern
  124   0%  97% 0.00     2849 cpu[55]                copyout
   89   0%  97% 0.00     3682 cpu[63]                fletcher_4_native
   49   0%  97% 0.00     2565 cpu[37]                copyin
   45   0%  98% 0.00     3079 cpu[0]                 tcp_rput_data
   39   0%  98% 0.00     3597 cpu[0]                 mutex_enter
   30   0%  98% 0.00     3692 cpu[0]                 nxge_start
   28   0%  98% 0.00     3701 cpu[59]+6              nxge_receive_packet
   28   0%  98% 0.00     2935 cpu[0]                 disp_getwork
   25   0%  98% 0.00     3110 cpu[0]                 bcopy

If I see a function that stands out from the rest, I will use the lockstat ‘-f” option to drill down by the kernel function with that name, and use the “-s” option to print the call stack leading up this function:

$ lockstat -kIW -f lzjb_compress -s5 /bin/sleep 5

Profiling interrupt: 703 events in 2.058 seconds (342 events/sec)

-------------------------------------------------------------------------------
Count indv cuml rcnt     nsec Hottest CPU+PIL        Caller
  130  18%  18% 0.00     3625 cpu[28]                lzjb_compress           

      nsec ------ Time Distribution ------ count     Stack
      2048 |                               2
      4096 |@@@@@@@@@@@@@@@@@@@@@@@@       107
      8192 |@@@@                           21
-------------------------------------------------------------------------------
Count indv cuml rcnt     nsec Hottest CPU+PIL        Caller
   20   3%  21% 0.00     3529 cpu[37]                lzjb_compress           

      nsec ------ Time Distribution ------ count     Stack
      4096 |@@@@@@@@@@@@@@@@@@@@@@@@@@@    18        0x74
      8192 |@@@                            2         zio_compress_data
                                                     zio_write_bp_init
                                                     zio_execute
-------------------------------------------------------------------------------
Count indv cuml rcnt     nsec Hottest CPU+PIL        Caller
   19   3%  24% 0.00     3696 cpu[28]                lzjb_compress           

      nsec ------ Time Distribution ------ count     Stack
      4096 |@@@@@@@@@@@@@@@@@@@@@@         14        0x50
      8192 |@@@@@@@                        5         zio_compress_data
                                                     zio_write_bp_init
                                                     zio_execute
-------------------------------------------------------------------------------

I use lockstat quite a bit to observe what the kernel is doing, and to help me figure out where I should look for answers in the opensolaris source code. It’s also useful for determining if you are encountering a kernel bug, since you can compare the backtrace returned from lockstat with the OpenSolaris bug databases.

Leave a Comment