One super useful utility that ships with CentOS 4.4 is watch. Watch runs a command repeatedly at a fixed interval and displays its output, which is especially useful for monitoring array rebuilds. To use watch, you run it with the command to monitor and an optional interval that controls how often the output is refreshed:
$ watch --interval=10 cat /proc/mdstat
Every 10.0s: cat /proc/mdstat Mon Mar 5 22:30:58 2007
Personalities : [raid1] [raid6] [raid5] [raid4]
md1 : active raid1 sdb2[1] sda2[0]
8385856 blocks [2/2] [UU]
md2 : active raid5 sdg1[5] sdf1[3] sde1[2] sdd1[1] sdc1[0]
976751616 blocks level 5, 64k chunk, algorithm 2 [5/4] [UUUU_]
[=>...................] recovery = 9.8% (24068292/244187904) finish=161.1min speed=22764K/sec
md0 : active raid1 sdb1[1] sda1[0]
235793920 blocks [2/2] [UU]
unused devices: <none>
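If you want the changes between refreshes to jump out at you while an array is rebuilding, watch can also highlight the differences between updates. A minimal example, assuming the procps version of watch that ships with CentOS (which supports a --differences flag):
$ watch --interval=10 --differences cat /proc/mdstat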
I am running CentOS 4.4 on some old servers, and each of these servers has multiple internal disk drives. Since system availability concerns me more than the amount of storage available, I decided to add a hot spare to the md device that stores my data (md2). To add the hot spare, I ran the mdadm utility with the “--add” option, the md device to add the spare to, and the spare device to use:
$ /sbin/mdadm --add /dev/md2 /dev/sdh1
mdadm: added /dev/sdh1
After the spare was added, the device showed up in the /proc/mdstat output with the “(S)” string to indicate that it’s a hot spare:
$ cat /proc/mdstat
Personalities : [raid1] [raid6] [raid5] [raid4]
md1 : active raid1 sdb2[1] sda2[0]
8385856 blocks [2/2] [UU]
bitmap: 0/128 pages [0KB], 32KB chunk
md2 : active raid5 sdh1[5](S) sdg1[4] sdf1[3] sde1[2] sdd1[1] sdc1[0]
976751616 blocks level 5, 64k chunk, algorithm 2 [5/5] [UUUUU]
bitmap: 3/233 pages [12KB], 512KB chunk
md0 : active raid1 sdb1[1] sda1[0]
235793920 blocks [2/2] [UU]
bitmap: 7/225 pages [28KB], 512KB chunk
unused devices: <none>
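As an extra sanity check, I like to confirm that the kernel really views the new device as a spare. The mdadm detail output should show the spare count and flag the device as a spare; here is a quick sketch using the array and disk names from above:
$ /sbin/mdadm --detail /dev/md2 | egrep -i 'spare|sdh1'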
While performing a live upgrade from Nevada build 54 to Nevada build 57, I bumped into the following error:
$ lucreate -n Nevada_B57 -m /:/dev/dsk/c1d0s0:ufs -m /var:/dev/dsk/c1d0s3:ufs -m -:/dev/dsk/c1d0s1:swap
Discovering physical storage devices
Discovering logical storage devices
Cross referencing storage devices with boot environment configurations
Determining types of file systems supported
Validating file system requests
Preparing logical storage devices
Preparing physical storage devices
Configuring physical storage devices
Configuring logical storage devices
Analyzing system configuration.
Comparing source boot environment file systems with the file system(s) you specified for the new boot environment.
Determining which file systems should be in the new boot environment.
Updating boot environment description database on all BEs.
Searching /dev for possible boot environment filesystem devices
Template entry /var:/dev/dsk/c1d0s3:ufs skipped.
luconfig: ERROR: Template filesystem definition failed for /var, all devices are not applicable..
ERROR: Configuration of boot environment failed.
The error message provided little information about the actual problem, and when I removed “-m /var:/dev/dsk/c1d0s3:ufs” from the lucreate command line, everything worked as expected. Being extremely baffled by this problem, I started reading through the opensolaris.org installation forum, and eventually came across a post from Nils Nieuwejaar. Nils mentioned that he had debugged an issue where the partition flags weren’t set to “wm”, and this had caused his live upgrade to fail. Using Nils’ feedback, I went into format and changed the partition flags for the new “/var” slice to “wm”. Once I saved my changes and ran lucreate again, everything worked as expected:
$ lucreate -n Nevada_B57 -m /:/dev/dsk/c1d0s0:ufs -m /var:/dev/dsk/c1d0s3:ufs -m -:/dev/dsk/c1d0s1:swap
Discovering physical storage devices
Discovering logical storage devices
Cross referencing storage devices with boot environment configurations
Determining types of file systems supported
Validating file system requests
Preparing logical storage devices
Preparing physical storage devices
Configuring physical storage devices
Configuring logical storage devices
Analyzing system configuration.
Comparing source boot environment file systems with the file system(s) you specified for the new boot environment.
Determining which file systems should be in the new boot environment.
Updating boot environment description database on all BEs.
Searching /dev for possible boot environment filesystem devices
Updating system configuration files.
The device is not a root device for any boot environment; cannot get BE ID.
Creating configuration for boot environment .
Source boot environment is .
Creating boot environment .
Checking for GRUB menu on boot environment .
The boot environment does not contain the GRUB menu.
Creating file systems on boot environment .
Creating file system for </> in zone on .
Creating file system for in zone on .
Mounting file systems for boot environment .
Calculating required sizes of file systems for boot environment .
Populating file systems on boot environment .
Checking selection integrity.
Integrity check OK.
Populating contents of mount point </>.
Populating contents of mount point .
< ….. >
Now to convince the live upgrade developers to clean up their error messages. :)
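If you want to check the slice flags without stepping through the format menus, I believe prtvtoc prints a flags column for each slice (a value of 00 should correspond to “wm”). A quick sketch, using the disk from the lucreate example above:
$ prtvtoc /dev/rdsk/c1d0s2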
While catching up with various opensolaris.org mailing lists, I came across a post that described the whocalls utility. This nifty little utility can be used to view the stack frames leading up to a call to a specific function, which can be super useful for debugging. To view all of the code paths that lead to the printf function being called, whocalls can be run with the name of the function to look for and the command to execute:
$ whocalls printf /usr/bin/ls
printf(0x80541b0, 0x8067800, 0x80653a8)
        /usr/bin/ls:pentry+0x593
        /usr/bin/ls:pem+0xb1
        /usr/bin/ls:pdirectory+0x266
        /usr/bin/ls:main+0x70e
        /usr/bin/ls:_start+0x7a
printf(0x80541b0, 0x8067a48, 0x80653a8)
        /usr/bin/ls:pentry+0x593
        /usr/bin/ls:pem+0xb1
        /usr/bin/ls:pdirectory+0x266
        /usr/bin/ls:main+0x70e
        /usr/bin/ls:_start+0x7a
< ….. >
Now to do some research on the runtime linker’s auditing facilities in /usr/lib/link_audit/*!
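On a related note, I believe the sotruss utility is layered on top of the same link-auditing objects, and it can be used to watch the library calls an executable makes while it runs. A rough sketch (double-check the sotruss man page for the exact options):
$ sotruss /usr/bin/ls /tmp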
Periodically I want to see the busiest application and system code paths on a system. Prior to Solaris 10, this was a difficult question to answer without custom instrumentation. Now that we have DTrace, we can use the DTrace profile provider and an aggregation to view the busiest code paths in the kernel:
$ dtrace -n 'profile-1001 {@[stack(20)]=count()} END{trunc(@,2)}'
dtrace: description 'profile-1001 ' matched 2 probes
^C
CPU ID FUNCTION:NAME
0 2 :END
unix`atomic_cas_32+0x10
ufs`ufs_scan_inodes+0xf4
ufs`ufs_update+0x1e9
ufs`ufs_sync+0x213
genunix`fsop_sync_by_kind+0x36
genunix`fsflush+0x3e2
unix`thread_start+0x8
11
unix`cpu_halt+0x100
unix`idle+0x3f
unix`thread_start+0x8
3978
To see the busiest userland code paths, we can aggregate on the executable name and the userland stack frames, but only when we are in user context (for profile probes, arg1 holds the user-mode program counter, so it is non-zero only when the probe fires in user context):
$ dtrace -n 'profile-1001 /arg1/ {@[execname,ustack()]=count()} END{trunc(@,2)}'
dtrace: description 'profile-1001 ' matched 2 probes
^C
CPU ID FUNCTION:NAME
0 2 :END
kcfd
libmd.so.1`SHA1Transform+0x2ef
libmd.so.1`SHA1Update+0xb6
libelfsign.so.1`soft_digest_update+0x48
libelfsign.so.1`C_DigestUpdate+0xd7
libelfsign.so.1`_C01A7C0D+0x2d
libelfsign.so.1`elfsign_hash_common+0x175
libelfsign.so.1`_C01A7A0C+0x16
libelfsign.so.1`_C01A7A0D+0x23d
kcfd`kcfd_process_request+0x17e
libc.so.1`__door_return+0x60
12
sshd
libc.so.1`memcpy+0x6b
libcrypto.so.0.9.8`HMAC_Init_ex+0x1b2
libcrypto.so.0.9.8`HMAC_Init+0x3d
sshd`mac_compute+0x34
sshd`packet_send2+0x23c
sshd`packet_send+0x1b
sshd`channel_output_poll+0x1d2
sshd`server_loop2+0xe6
sshd`do_authenticated2+0xe
sshd`do_authenticated+0x3b
sshd`main+0x1081
sshd`_start+0x7a
52
This is huge: with two DTrace one-liners, you can dynamically view the busiest code paths in kernel and user context. Nice!
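If you just want a quick view of which processes are chewing up CPU before drilling into their stacks, the same profile provider can aggregate on execname alone. A minimal sketch along the lines of the one-liners above (the /arg1/ predicate limits the samples to user context):
$ dtrace -n 'profile-1001 /arg1/ {@[execname] = count()}'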