Using DTrace To Understand mpstat And vmstat Output


The Solaris Operating System provides the mpstat and vmstat utilties to view CPU and Virtual memory performance statistics. These utilities provide numerous useful performance counters, but don’t provide any details on why each performance counter contains a given value. This article will show how DTrace can be used to determine which application or kernel subsystem is responsible for the values printed during each sampling period. The DTrace output can be a valuable starting point for conducting additional tests, and in some cases, the output can spot misbehaving applications and kernel subsystems.

MPSTAT

The mpstat utility can be used to view an assortment of scheduling, locking and CPU utilization metrics. Mpstat is usually invoked with an integer argument. This argument controls the frequency in which data is displayed, and each line of mpstat output contains the number to events that occurred during that sampling period:

$ mpstat 5

CPU minf mjf xcal  intr ithr  csw icsw migr smtx  srw syscl  usr sys  wt idl
  0   11  13  558   760  212  265    1    3    1    0   525    9   1   0  90
  1    9  11  479   467    0  251    1    3    1    0   474    9   1   0  89
  2    7   4  226   884  383  147    0    4    2    0   192    4   1   0  96
  3    7   4  234   495    0  146    0    3    0    0   215    5   1   0  95

The following sections will provide the mpstat(1m) manual page description for each column, along with the DTrace or system utility that can be used to identify the process or kernel subsystem responsible for the values.

Minor faults (minf)

minf indicates the number of minor faults that occurred. To view the executables responsible for the minor faults, the vminfo provider’s as_fault probe can be used:

$ dtrace -n 'vminfo:::as_fault

{
    @execs[execname] = count() 
}'

dtrace: description 'vminfo:::as_fault ' matched 1 probe

sac                                                               1
mpstat                                                           10
uname                                                            36
dtrace                                                           47
ostname                                                          72
sh                                                              142
procmail                                                        165
fetchmail                                                       224
cron                                                            278
spamassassin                                                   2964

Major faults (majf)

majf indicates the number of major faults that occurred. To view the executables responsible for the major faults, the vminfo provider’s maj_fault probe can be used:

$ dtrace -n 'vminfo:::maj_fault

{
    @execs[execname] = count() 
}'

dtrace: description 'vminfo:::maj_fault ' matched 1 probe

awk                                                               1
sshd                                                              1
bash                                                             11

Cross calls (xcal)

xcal indicates the number of cross calls that occured. To view the executables that are responsible for the cross-call activity, the sysinfo provider’s xcal probe can be used:

$ dtrace -n 'sysinfo:::xcalls

{
    @execs[execname] = count(); 
}'

dtrace: description 'sysinfo:::xcalls' matched 2 probes

bash                                                              3
cron                                                              9
uname                                                            24
sched                                                           219
dtrace                                                        50415

Interrupts (intr and ithr)

intr and ithr indicate the total number of interrupts, and the number of interrupts handled as kernel threads. To see which devices are issuing interrupts, the DTraceToolkit intrtime script and the Solaris intrstat command can be used:

$ intrtime 60


 Interrupt         Time(ns)   %Time
      uata          2869846    0.00
       qfe         46331270    0.08 
       glm       1913715146    3.19
TOTAL(int)       1962916262    3.27 
TOTAL(dur)      60008698021  100.00

$ /usr/sbin/intrstat 5

device |      cpu0  %tim 
-------------+---------------
glm#0  |       953  2.6 
qfe#0  |       202  1.5 
uata#0 |        91  0.2

Context switches (csw)

csw indicates the total number of context switches that occurred. To view the number of context switches per process, the sysinfo provider’s pswitch probes can be used:

$ dtrace -n 'sysinfo:::pswitch

{
    @execs[execname] = count(); 
}'

dtrace: description 'sysinfo:::pswitch ' matched 3 probes

sshd                                                              2
fmd                                                               3
fsflush                                                           3
sendmail                                                          3
nscd                                                              5
svc.startd                                                        6
dtrace                                                            7
httpd                                                             8
orca                                                           1702
sched                                                          4189

Involuntary context switches (icsw)

icsw indicates the number of times an involuntary context switch occurred. To see which processes time quantums expired prior to the process finishing execution, the sysinfo provider’s inv_swtch probe can be used:

$ dtrace -n 'sysinfo::preempt:inv_swtch

{
    @execs[execname] = count(); 
}'

dtrace: description 'sysinfo::preempt:inv_swtch ' matched 1 probe

dtrace                                                            1
orca                                                             57

Thread migrations (migr)

migr indicates the number of times a thread was migrated to a new CPU. To see which threads were migrated, the sched provider’s on-cpu and off-cpu probes can be used:

$ dtrace -n ' sched:::off-cpu

{
    self->cpu = cpu; 
}

sched:::on-cpu /self->cpu != cpu/ 
{
    printf("%s migrated from cpu %d to cpu %d\n",execname,self->cpu,cpu);
    self->cpu = 0; 
}'

To see the list of threads that were moved to an idle CPU dispatch queue, the Solaris /usr/demo/dtrace/whosteal.d script can be executed:

$ dtrace -s /usr/demo/dtrace/whosteal.d


sched stolen from CPU 3 by:

value  ------------- Distribution ------------- count
   < 0 |                                         0
     0 |@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@ 1
     1 |                                         0


sendmail stolen from CPU 2 by:

value  ------------- Distribution ------------- count
   < 0 |                                         0
     0 |@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@ 1
     1 |                                         0

Spin on mutexes (smtx)

smtx indicates the number of times a thread spinned on a mutex. To locate the executable responsible for the non-zero values in the smtx column, the the lockstat provider’s adaptive-spin and adaptive-block probes can be used:

$ dtrace -n 'lockstat:::adaptive-spin, lockstat:::adaptive-block

{
    @execs[execname,probename] = count(); 
}'

dtrace: description 'lockstat:::adaptive-spin, lockstat:::adaptive-block' matched 2 probes

bash            adaptive-spin                             44
orca            adaptive-spin                            144
sched           adaptive-spin                            527

The plockstat command (a DTrace consumer) can also be used to detect userland lock contention (plockstat uses DTrace to gather lock contention events, and the DTrace used to find lock events can be viewed with the “-V” option):

$ plockstat -s 4 -p 336


Mutex block

-------------------------------------------------------------------------------
Count     nsec Lock                         Caller
   12 24816298 libc.so.1`libc_malloc_lock   0

      nsec ---- Time Distribution --- count Stack
   4194304 |@@                      |     1 libc.so.1`lmutex_lock+0xe0
   8388608 |@@@@@@                  |     3 libc.so.1`malloc+0x44
  16777216 |@@@@@@@@                |     4 libc.so.1`__regcomp_C+0x24
  33554432 |@@@@                    |     2 
  67108864 |@@@@                    |     2 

   Mutex unsuccessful spin

   -------------------------------------------------------------------------------
   Count     nsec Lock                         Caller
      16    32768 libc.so.1`libc_malloc_lock   0

    nsec ---- Time Distribution --- count Stack
   32768 |@@@@@@@@@@@@@@@@@@@@@@@@|    16 libc.so.1`lmutex_lock+0xe0
                                          libc.so.1`malloc+0x44
                                          libc.so.1`__regcomp_C+0x24

Spin on read/writer lock (srw)

srw indicates the number of times a thread blocked on a reader/writer lock. To locate the executables that are blocking on a reader/write lock, the lockstat provider’s rw-block probe can be used:

$ dtrace -n 'lockstat:::rw-block

{
    @execs[execname] = count(); 
}'

dtrace: description 'lockstat:::rw-block ' matched 1 probe

collect_server       221 orca                 254

The plockstat command (a DTrace consumer) can also be used to detect userland lock contention.

System calls (syscl)

syscl indicates the number of system calls that have been issued. To get a breakdown of system calls by process, the syscall provider can be used:

$ dtrace -n 'syscall:::entry

{
    @num[execname] = count(); 
}'

dtrace: description 'syscall:::entry' matched 225 probes

utmpd                                                             2
sac                                                               6
fmd                                                               7
svc.configd                                                       7
ttymon                                                            9
sshd                                                             17
svc.startd                                                       78
sendmail                                                         92
nscd                                                            149
httpd                                                           213
dtrace                                                         2018
orca                                                          70710

The syscall provider can also be used to display the system call along with the executable that issued the call:

$ dtrace -n 'syscall:::entry

{
    @num[execname,probefunc] = count(); 
}'

dtrace: description 'syscall:::entry ' matched 225 probes

orca                          stat64                501
orca                          llseek                610
orca                            read                662
orca                           lseek                666
orca                           gtime               1506
orca                         fstat64               1524

CPU utilization (usr/sys/wt/idl)

usr, sys, wt and idl indicate the amount of time spent running user applications, kernel functions, waiting for I/O (this is a useless metric), and running the kernel idle thread. To get a breakdown of CPU usage by process, the DTraceToolkit cputimes script can be used:

$ cputimes -a 5

2006 Jan 11 13:49:25,
THREADS        TIME (ns)
cron            47842 
nscd            211868
fmd             262522 
svc.configd     278960
svc.startd      282133
httpd           754139 
fsflush         9870970
mysqld          10047752 
pageout         11670536
KERNEL          32145457 
IDLE            4908549182

The DTrace pid provider can be used to profile individual applications, and the DTraceToolkit modcalls.d script can be used to determine which kernel subsystems are busy:

$ modcalls.d


dtrace: script './modcalls.d' matched 18578 probes

ldterm                                                           19
pcipsy                                                           27
ip                                                               29
ipf                                                              46
uata                                                             85
specfs                                                           88
TS                                                              145
sha1                                                            543
SUNW,UltraSPARC-IIi                                            2111
genunix                                                       20225
unix                                                         110134

VMSTAT

The vmstat utility can be used to view an assortment of virtual memory statistics, and provides the “-p” option to breakdown paging activity by page type (e.g., executable, anonymous, file system). vmstat is typically invoked with an integer argument. This argument controls the frequency in which data is displayed, and each line of vmstat output contains the number to events that occurred during that sampling period:

$ vmstat -p 5


     memory           page          executable      anonymous      filesystem 
  swap  free  re  mf  fr  de  sr  epi  epo  epf  api  apo  apf  fpi  fpo  fpf
1986360 5368   0  21 17703 0 46954  0    0 4491    0 4943 4943    0    0 8269
1986224 15944 15  81 12706 0 4659 165    0 2886    0 4643 4643   63  133 5176
1986040 11696  6  37 8293  0 16890 246   0 1560    0 3715 3715    0   16 3018
1985848 7368   0  19 5321  0 38852 53    0    0   15 5313 5313    0    8    8
1985768 4432   3  29 4994  0 52122 171   0   16    0 4971 4971    0    0    8
1985680 5936   6  37 6496  0 49302 192   0  592    0 5888 5888    0    0   16
1985648 7736   2  30 6415  0 53248 116   0    7   15 6400 6400    7    7    7
1985600 5776  12  31 6265  0 53264 317   0  143   16 6083 6083    0    0   40
1985568 8264   0  20 6083  0 58411 222   0  127   24 5956 5956   16    0    0

The following sections will provide the vmstat(1m) manual page description for each column, along with the DTrace or system utility that can be used to correlate paging activity to individual processes.

Page reclaims (re)

re indicates the number of pages that have been reclaimed during this sampling period. To view the executables that were able to be reclaim pages in the Solaris page cache, the vminfo provider’s page_reclaims probe can be used:

$ dtrace -n 'vminfo::page_reclaim:pgrec

{ 
    @execs[execname] = count(); 
}'

dtrace: description 'vminfo::page_reclaim:pgrec ' matched 1 probes

nscd                                                              6
bash                                                             92
metastat                                                        966

Pages freed (fr)

fr indicates the number of pages that have been freed by the page scanner or the file system. To view the executables that had pages freed, the vminfo provider’s free probes can be used:

$ dtrace -n 'vminfo:::*free

{ 
    @execs[execname,probename] = count(); 
}'

dtrace: description 'vminfo:::*free ' matched 8 probes

spamassassin           anonfree                                       114
spamassassin           execfree                                       114
spamassassin           fsfree                                         114
spamassassin           dfree                                          114
sched                  fsfree                                         770
sched                  dfree                                          770
sched                  execfree                                       770
sched                  anonfree                                       770

Executable pages paged in (epi)

epi indicates the number of executable pages that have been paged in. To see which executables are causing executable pages to be faulted in, the vminfo provider’s execpgin probe can be used:

$ dtrace -n'vminfo:::execpgin

{ 
    @execs[execname] = count(); 
}'

dtrace: description 'vminfo:::execpgin ' matched 1 probe

metastat                                                         69
kcfd                                                             70
vmstat                                                           77
bash                                                            207
spamassassin                                                    242
sshd                                                            358

Anonymous pages paged in (api)

api indicates the number of anonymous pages that have been paged in. To see which executables are causing anonymous memory pages to be paged in from the backing store (typically swap), the vminfo provider’s anonpgin probe can be used:

$ dtrace -n'vminfo:::anonpgin

{ 
    @execs[execname] = count(); 
}'

dtrace: description 'vminfo:::anonpgin ' matched 1 probe

nscd                                                              7
bash                                                              8
cron                                                             14

Anonymous pages paged out (apo)

apo indicates the number of anonymous pages that have been paged out. To see which executables are causing anonymous memory pages to be paged out to the the backing store (typically swap), the vminfo provider’s anonpgout probe can be used:

$ dtrace -n 'vminfo:::anonpgout

{ 
    @execs[execname] = count(); 
}'

dtrace: description 'vminfo:::anonpgout ' matched 1 probe

vi                                                                4
java                                                             22
sched                                                           693

File system pages paged in (fpi)

fpi indicates the numer of file system pages that have been paged in. To see which executables are responsible for the file system paging activity, the vminfo provider’s fspgin probe can be used:

$ dtrace -n 'vminfo:::fspgin

{ 
    @execs[execname] = count();
}'

dtrace: description 'vminfo:::fspgin ' matched 1 probe

metastat                                                          1
which                                                             1
ypcat                                                             1
bart                                                              2
bash                                                             27

File system pages paged out (fpo)

fpo indicates the number of file system pages that have been paged out. To see which executables are responsible for the file system paging activity, the vminfo provider’s fspgout probe can be used:

$ dtrace -n 'vminfo:::fspgout

{ 
    @execs[execname] = count();
}'

dtrace: description 'vminfo:::fspgout ' matched 1 probe

sshd                                                              1
mailx                                                            10
sched                                                          1003

Conclusion

DTrace is a very powerful tool for understanding system and application behavior, and greatly simplifies the process of correlating system events to applications that may be misbehaving. As with any software tool, it is important to read the documentation (the reference section includes links to numerous pieces of useful documentation) prior to beginning full
scale system or application analysis.

References

The following references were used while writing this article: