Breaking down system time usage in the Solaris kernel

I am frequently asked (or paged) to review system performance issues on our Solaris 10 hosts. I use the typical set of Solaris performance tools to observe what my systems are doing, and start drilling down once I know whether the problem is in userland applications or in the kernel itself. Issues with the Solaris kernel typically show up as high system time values in vmstat.
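Before reaching for anything heavier, a quick look at vmstat and mpstat (standard Solaris tools; the five-second interval is just my habit) confirms where the CPU time is going:

$ vmstat 5     # system-wide view: a high "sy" value under the cpu heading points at the kernel
$ mpstat 5     # per-CPU view: shows which CPUs are accumulating "sys" time

Once I know the kernel is eating the CPU, the first thing I typically do is fire up lockstat to see where the kernel is spending its time: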

$ lockstat -kIW /bin/sleep 5

Profiling interrupt: 31424 events in 5.059 seconds (6212 events/sec)

Count indv cuml rcnt     nsec Hottest CPU+PIL        Caller                  
-------------------------------------------------------------------------------
28962  92%  92% 0.00     3765 cpu[61]                cpu_halt                
 1238   4%  96% 0.00     3747 cpu[22]                lzjb_compress           
  165   1%  97% 0.00     2655 cpu[37]                copy_pattern            
  124   0%  97% 0.00     2849 cpu[55]                copyout                 
   89   0%  97% 0.00     3682 cpu[63]                fletcher_4_native       
   49   0%  97% 0.00     2565 cpu[37]                copyin                  
   45   0%  98% 0.00     3079 cpu[0]                 tcp_rput_data           
   39   0%  98% 0.00     3597 cpu[0]                 mutex_enter             
   30   0%  98% 0.00     3692 cpu[0]                 nxge_start              
   28   0%  98% 0.00     3701 cpu[59]+6              nxge_receive_packet     
   28   0%  98% 0.00     2935 cpu[0]                 disp_getwork            
   25   0%  98% 0.00     3110 cpu[0]                 bcopy                   
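In the output above, cpu_halt is just the idle loop (92% of the samples landed on halted CPUs), so the function that actually stands out is lzjb_compress, the ZFS LZJB compression routine. When I want a second opinion on a profile like this, I sometimes cross-check it with the DTrace profile provider; this one-liner (my own sketch, not part of the run above) samples kernel PCs at 997 Hz for five seconds and aggregates them by function:

$ dtrace -n 'profile-997 /arg0/ { @[func(arg0)] = count(); } tick-5sec { exit(0); }'

The /arg0/ predicate keeps only samples taken in the kernel (arg0 is the kernel program counter, and is zero when the CPU was in userland), so the resulting counts line up nicely with lockstat's Count column.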

If I see a function that stands out from the rest, I will use the lockstat "-f" option to drill down on the kernel function with that name, and the "-s" option to print the call stacks leading up to that function:

$ lockstat -kIW -f lzjb_compress -s5 /bin/sleep 5

Profiling interrupt: 703 events in 2.058 seconds (342 events/sec)

-------------------------------------------------------------------------------
Count indv cuml rcnt     nsec Hottest CPU+PIL        Caller                  
  130  18%  18% 0.00     3625 cpu[28]                lzjb_compress           

      nsec ------ Time Distribution ------ count     Stack                   
      2048 |                               2         
      4096 |@@@@@@@@@@@@@@@@@@@@@@@@       107       
      8192 |@@@@                           21        
-------------------------------------------------------------------------------
Count indv cuml rcnt     nsec Hottest CPU+PIL        Caller                  
   20   3%  21% 0.00     3529 cpu[37]                lzjb_compress           

      nsec ------ Time Distribution ------ count     Stack                   
      4096 |@@@@@@@@@@@@@@@@@@@@@@@@@@@    18        0x74                    
      8192 |@@@                            2         zio_compress_data       
                                                     zio_write_bp_init       
                                                     zio_execute             
-------------------------------------------------------------------------------
Count indv cuml rcnt     nsec Hottest CPU+PIL        Caller                  
   19   3%  24% 0.00     3696 cpu[28]                lzjb_compress           

      nsec ------ Time Distribution ------ count     Stack                   
      4096 |@@@@@@@@@@@@@@@@@@@@@@         14        0x50                    
      8192 |@@@@@@@                        5         zio_compress_data       
                                                     zio_write_bp_init       
                                                     zio_execute             
-------------------------------------------------------------------------------
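The zio_compress_data -> zio_write_bp_init -> zio_execute frames tell me these cycles are being burned in the ZFS write pipeline compressing blocks with LZJB. If I wanted to see which datasets have compression enabled (and could therefore be generating this load), I could check the compression property; the pool name "tank" below is just a placeholder:

$ zfs get -r compression tank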

I use lockstat quite a bit to observe what the kernel is doing, and to help me figure out where I should look for answers in the OpenSolaris source code. It’s also useful for determining whether you are hitting a known kernel bug, since you can compare the backtraces returned by lockstat against the OpenSolaris bug databases.
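When the five frames from "-s5" aren't enough context for a source or bug-database search, a DTrace one-liner built on the fbt provider (again my own sketch, not from the runs above) will capture the full kernel stack each time the function of interest is entered:

$ dtrace -n 'fbt::lzjb_compress:entry { @[stack()] = count(); } tick-5sec { exit(0); }'

The resulting stacks can be matched against the source and bug reports the same way.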
