The Linux kernel uses a slab-based allocator to allocate kernel memory. Inside each slab is a collection of objects that have been allocated by one or more kernel subsystems. To monitor slab utilization in real time, most modern Linux distributions ship with the slabtop utility. When slabtop is run without any arguments, it displays a nice slab usage summary and shows how the various slab caches are being used:
$ slabtop
Active / Total Objects (% used) : 220410 / 234629 (93.9%)
Active / Total Slabs (% used) : 4728 / 4728 (100.0%)
Active / Total Caches (% used) : 91 / 139 (65.5%)
Active / Total Size (% used) : 16609.59K / 18017.14K (92.2%)
Minimum / Average / Maximum Object : 0.01K / 0.08K / 128.00K
OBJS ACTIVE USE OBJ SIZE SLABS OBJ/SLAB CACHE SIZE NAME
80591 80494 99% 0.02K 397 203 1588K avtab_node
53562 53562 100% 0.03K 474 113 1896K size-32
35108 35099 99% 0.05K 524 67 2096K buffer_head
7378 7377 99% 0.27K 527 14 2108K radix_tree_node
7182 7182 100% 0.14K 266 27 1064K dentry_cache
6499 6497 99% 0.05K 97 67 388K selinux_inode_security
5460 5398 98% 0.04K 65 84 260K sysfs_dir_cache
3953 3425 86% 0.06K 67 59 268K size-64
3663 3644 99% 0.41K 407 9 1628K inode_cache
3654 479 13% 0.02K 18 203 72K biovec-1
2904 1882 64% 0.09K 66 44 264K vm_area_struct
2580 2442 94% 0.12K 86 30 344K size-128
2080 924 44% 0.19K 104 20 416K filp
2070 744 35% 0.12K 69 30 276K bio
1944 660 33% 0.05K 27 72 108K journal_head
1896 1892 99% 0.59K 316 6 1264K ext3_inode_cache
1595 525 32% 0.02K 11 145 44K anon_vma
1288 1209 93% 0.04K 14 92 56K Acpi-Operand
845 730 86% 0.02K 5 169 20K Acpi-Namespace
648 511 78% 0.05K 9 72 36K avc_node
540 471 87% 0.43K 60 9 240K proc_inode_cache
This is a sweet utility, and another one of those tools that should be in every Linux administrator's tool belt.
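By default slabtop refreshes its display every few seconds. If you just want a one-shot snapshot sorted by cache size (handy for pasting into a ticket), the “-o” (print once) and “-s” (sort criteria) options should do the trick; slabtop pulls its data from /proc/slabinfo, which you can also read directly:
$ slabtop -o -s c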
Solaris has shipped with the coreadm utility for quite some time, and this nifty little utility allows you to control every facet of core file generation. This includes the ability to control where core files are written, the name of core files, which portions of the process's address space will be written to the core file, and my favorite option, whether or not to generate a syslog entry indicating that a core file was generated.
To begin using coreadm, you will first need to run it with the “-g” option to specify where core files should be stored, and the pattern that should be used when creating the core file:
$ coreadm -g /var/core/core.%f.%p
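The %f and %p tokens in the pattern expand to the executable name and the process ID. coreadm supports several other expansion tokens as well (coreadm(1M) has the complete list), so a pattern that also embeds the host name and a timestamp might look something like this:
$ coreadm -g /var/core/core.%n.%f.%p.%t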
Once a directory and file pattern are specified, you can optionally adjust which portions of the process's address space (e.g., text segment, heap, ISM, etc.) will be written to the core file. To ease debugging, I like to configure coreadm to dump everything with the “-G all” option:
$ coreadm -G all
Since core files are typically created at odd hours, I also like to configure coreadm to log a message to syslog indicating that a core file was created. This can be done with the coreadm “-e log” option:
$ coreadm -e log
After these settings are adjusted, the coreadm “-e global” option can be used to enable global core file generation, and the coreadm utility can be run without any arguments to view the settings (which are stored in /etc/coreadm.conf):
$ coreadm -e global
$ coreadm
     global core file pattern: /var/core/core.%f.%p
     global core file content: all
       init core file pattern: core
       init core file content: default
            global core dumps: enabled
       per-process core dumps: enabled
      global setid core dumps: disabled
 per-process setid core dumps: disabled
     global core dump logging: enabled
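If you are curious about a specific process, you can also pass coreadm a process ID to view that process's per-process settings (your shell's PID in this example):
$ coreadm $$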
Once global core file support is enabled, each time a process receives a fatal signal (e.g., SIGSEGV or SIGBUS):
$ kill -SIGSEGV 4652
A core file will be written to /var/core:
$ ls -al /var/core/core.inetd.4652
-rw------- 1 root root 4163953 Mar 9 11:51 /var/core/core.inetd.4652
And a message similar to the following will appear in the system log:
Mar 9 11:51:48 fubar genunix: [ID 603404 kern.notice] NOTICE: core_log: inetd[4652] core dumped: /var/core/core.inetd.4652
This is an amazingly useful feature, and it can greatly simplify root-causing software problems.
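When a core file does show up, I usually point the Solaris proc tools at it to get a quick first look. For example, pstack will print the stack backtrace saved in the core file from the example above (output omitted here):
$ pstack /var/core/core.inetd.4652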
When debugging application performance problems related to high system time, I typically start my analysis by watching the system calls the application is issuing, and measuring how much time is spent in each system call. Gathering this information is simple with the DTrace syscall provider, and the DTraceToolkit comes with the procsystime script to allow admins to easily analyze system call behavior. To use procsystime to measure how much time the sshd processes are spending in each system call, we can run it with the “-T” option to get the total time spent in all system calls, the “-n” option, and the process name to analyze (in the example below, the string “sshd” causes procsystime to analyze the system call behavior of all processes named sshd):
$ procsystime -Tn sshd
Hit Ctrl-C to stop sampling...
^C
Elapsed Times for processes sshd,
SYSCALL TIME (ns)
umask 9804
setpgrp 12111
nfs 12194
pathconf 12973
chdir 13656
setregid 20676
setreuid 22364
getdents64 27036
getgroups 28605
lwp_self 29808
getsockopt 30959
setgid 31365
alarm 31507
zone 31691
setuid 33861
setsockopt 39311
setegid 39660
setcontext 40061
seteuid 40149
lseek 41430
dup 43978
c2audit 44604
getsockname 51978
privsys 52974
waitsys 56247
getgid 57050
accept 59959
fsat 76925
setgroups 79791
tasksys 81609
systeminfo 91019
sysconfig 112749
recvfrom 127964
access 138435
pipe 144371
getpeername 157330
fxstat 175541
schedctl 182224
vfork 217932
putmsg 241482
connect 301860
sigaction 317411
shutdown 328316
brk 356044
so_socket 409152
getpid 484735
fcntl 494322
gtime 526637
stat64 539840
llseek 624945
resolvepath 678157
open64 715602
getuid 950119
fstat64 964132
ioctl 1171727
xstat 1278052
memcntl 1394735
send 1846685
close 2325223
open 2685141
mmap 5289087
munmap 5379678
lwp_sigmask 6178493
exece 11787526
doorfs 28604988
write 46083911
fork1 57233817
read 96877372
pollsys 25533333727
TOTAL: 25811904817
The output contains the name of each system call in the left-hand column, and the time spent in that system call in the right-hand column. There are additional options to display the number of calls to each system call, and you can also filter by process ID if you want to measure a specific process. If you are running Solaris 10 and haven’t downloaded the DTraceToolkit, I highly recommend doing so!
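If memory serves, the “-c” option reports system call counts and the “-p” option takes a process ID, so counting the system calls issued by a single process would look something like this (the PID below is just a placeholder):
$ procsystime -cp 1234
And since procsystime is built on the DTrace syscall provider, a rough one-liner to count system calls by name for all sshd processes might look like:
$ dtrace -n 'syscall:::entry /execname == "sshd"/ { @[probefunc] = count(); }'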
I replaced a disk in one of our A5200s last week, and noticed that vxdisk was displaying two entries for the device after I replaced it with vxdiskadm:
$ vxdisk list
DEVICE       TYPE      DISK      GROUP     STATUS
c7t21d0s2    sliced    disk01    oradg     online
c7t22d0s2    sliced    disk02    oradg     error
c7t22d0s2    sliced    -         -         error
c7t23d0s2    sliced    disk03    oradg     online
To fix this annoyance, I first removed disk02 from the oradg disk group:
$ vxdg -g oradg rmdisk disk02
Once the disk was removed, I ran the vxdisk “rm” command twice to remove both disk access records:
$ vxdisk rm c7t22d0s2
$ vxdisk rm c7t22d0s2
After both device access records were removed, I executed ‘devfsadm -C’ to clean up the Solaris device tree, and then ran ‘vxdctl enable’ to have Veritas update the list of devices it knows about:
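$ devfsadm -C
$ vxdctl enable
After these operations completed, the device showed up only once in the vxdisk output: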
$ vxdisk list
DEVICE       TYPE      DISK      GROUP     STATUS
c7t21d0s2    sliced    disk01    oradg     online
c7t22d0s2    sliced    disk02    oradg     online
c7t23d0s2    sliced    disk03    oradg     online
I have seen cases where the Solaris device tree holds on to old entries, which unfortunately requires a reboot to fix. Luckily for me, this wasn’t the case with my system. Shibby!
I manage about a dozen Sun A5200 storage arrays, and periodically need to replace failed disk drives in these arrays. To ensure that I replace the correct device, I first use the format utility to locate the physical device path to the faulted drive:
$ format
< ….. >
      43. c7t22d0
          /sbus@3,0/SUNW,socal@0,0/sf@0,0/ssd@w22000004cf995f6c,0
Once I know which device to replace, I use the luxadm “remove_device” option to remove the drive for replacement, and then run luxadm with the “led_blink” option to blink the amber LED next to the faulted drive:
$ luxadm led_blink /devices/sbus@3,0/SUNW,socal@0,0/sf@0,0/ssd@w22000004cf995f6c,0:a,raw
Once I enable the led_blink option, I wander down to the data center, locate the drive with the blinking light, and swap out the failed disk for a new one. Even though A5200s are extremely old storage arrays, I still thoroughly enjoy managing them.
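If memory serves, the complementary “led_off” subcommand turns the LED back off once the replacement is complete, using the same device path as above:
$ luxadm led_off /devices/sbus@3,0/SUNW,socal@0,0/sf@0,0/ssd@w22000004cf995f6c,0:a,raw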