Blog O' Matty


Monitoring md device rebuilds

This article was posted by Matty on 2007-03-11 12:26:00 -0400

One super useful utility that ships with CentOS 4.4 is the watch utility. Watch allows you to monitor the output from a command at a specific interval, which is especially useful for monitoring array rebuilds. To use watch, you need to run it with a command to watch, and an optional interval to control how often the output from that command is displayed:

$ watch --interval=10 cat /proc/mdstat

Every 10.0s: cat /proc/mdstat                                Mon Mar  5 22:30:58 2007

Personalities : [raid1] [raid6] [raid5] [raid4]
md1 : active raid1 sdb2[1] sda2[0]
8385856 blocks [2/2] [UU]

md2 : active raid5 sdg1[5] sdf1[3] sde1[2] sdd1[1] sdc1[0]
976751616 blocks level 5, 64k chunk, algorithm 2 [5/4] [UUUU_]
[=>...................] recovery = 9.8% (24068292/244187904) finish=161.1min speed=22764K/sec

md0 : active raid1 sdb1[1] sda1[0]
235793920 blocks [2/2] [UU]

unused devices: &lt;none&gt;
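
If you land on a box that doesn't have watch installed, a dumb little shell loop gets you most of the way there. This is just a sketch; it assumes the rebuilding array is md2 and polls every 10 seconds:

#!/bin/sh
# Poor man's watch: print a timestamp and the md2 status lines
# from /proc/mdstat every 10 seconds (md2 is an assumption here;
# adjust the device name and interval to taste).
while true
do
    date
    grep -A 2 '^md2' /proc/mdstat
    sleep 10
done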

Adding a hot spare to an md device

This article was posted by Matty on 2007-03-11 12:14:00 -0400

I am running CentOS 4.4 on some old servers, and each of these servers has multiple internal disk drives. Since system availability concerns me more than the amount of storage that is available, I decided to add a hot spare to the md device that stores my data (md2). To add the hot spare, I ran the mdadm utility with the “--add” option, the md device to add the spare to, and the spare device to use:

$ /sbin/mdadm --add /dev/md2 /dev/sdh1
mdadm: added /dev/sdh1

After the spare was added, the device showed up in the /proc/mdstat output with the “(S)” string to indicate that it’s a hot spare:

$ cat /proc/mdstat

Personalities : [raid1] [raid6] [raid5] [raid4]
md1 : active raid1 sdb2[1] sda2[0]
8385856 blocks [2/2] [UU]
bitmap: 0/128 pages [0KB], 32KB chunk

md2 : active raid5 sdh1[5](S) sdg1[4] sdf1[3] sde1[2] sdd1[1] sdc1[0]
976751616 blocks level 5, 64k chunk, algorithm 2 [5/5] [UUUUU]
bitmap: 3/233 pages [12KB], 512KB chunk

md0 : active raid1 sdb1[1] sda1[0]
235793920 blocks [2/2] [UU]
bitmap: 7/225 pages [28KB], 512KB chunk

unused devices: &lt;none&gt;
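
If you want a second opinion that the new disk really is sitting there as a spare, mdadm can report on the array directly. A minimal check (the exact output wording varies a bit between mdadm versions):

# List the member devices and the spare count for md2; the newly
# added device should show up with a "spare" state in the device table.
$ /sbin/mdadm --detail /dev/md2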

Getting live upgrade to work with a separate /var

This article was posted by Matty on 2007-03-04 12:08:00 -0400

While performing a live upgrade from Nevada build 54 to Nevada build 57, I bumped into the following error:

$ lucreate -n Nevada_B57 -m /:/dev/dsk/c1d0s0:ufs \
  -m /var:/dev/dsk/c1d0s3:ufs -m -:/dev/dsk/c1d0s1:swap

Discovering physical storage devices
Discovering logical storage devices
Cross referencing storage devices with boot environment configurations
Determining types of file systems supported
Validating file system requests
Preparing logical storage devices
Preparing physical storage devices
Configuring physical storage devices
Configuring logical storage devices
Analyzing system configuration.
Comparing source boot environment file systems with the file system(s) you specified for the new boot environment.
Determining which file systems should be in the new boot environment.
Updating boot environment description database on all BEs.
Searching /dev for possible boot environment filesystem devices

Template entry /var:/dev/dsk/c1d0s3:ufs skipped.

luconfig: ERROR: Template filesystem definition failed for /var, all devices are not applicable..
ERROR: Configuration of boot environment failed.

The error message provided little information on what the actual problem was, and when I removed “-m /var:/dev/dsk/c1d0s3:ufs” from the lucreate command line, the error went away. Being extremely baffled by this problem, I started reading through the opensolaris.org installation forum, and eventually came across a post from Nils Nieuwejaar. Nils mentioned that he had debugged an issue where the partition flags weren’t set to “wm”, and this had caused his live upgrade to fail. Following Nils’ lead, I went into format and changed the partition flags on the slice backing the new “/var” file system to “wm” (the quick check I do now before running lucreate is shown after the output below). Once I saved my changes and ran lucreate again, everything worked as expected:

$ lucreate -n Nevada_B57 -m /:/dev/dsk/c1d0s0:ufs \
  -m /var:/dev/dsk/c1d0s3:ufs -m -:/dev/dsk/c1d0s1:swap

Discovering physical storage devices
Discovering logical storage devices
Cross referencing storage devices with boot environment configurations
Determining types of file systems supported
Validating file system requests
Preparing logical storage devices
Preparing physical storage devices
Configuring physical storage devices
Configuring logical storage devices
Analyzing system configuration.
Comparing source boot environment file systems with the file system(s) you specified for the new boot environment.
Determining which file systems should be in the new boot environment.
Updating boot environment description database on all BEs.
Searching /dev for possible boot environment filesystem devices

Updating system configuration files.
The device is not a root device for any boot environment; cannot get BE ID.
Creating configuration for boot environment .
Source boot environment is .
Creating boot environment .
Checking for GRUB menu on boot environment .
The boot environment does not contain the GRUB menu.
Creating file systems on boot environment .
Creating file system for </> in zone on .
Creating file system for in zone on .
Mounting file systems for boot environment .
Calculating required sizes of file systems for boot environment .
Populating file systems on boot environment .
Checking selection integrity.
Integrity check OK.
Populating contents of mount point </>.
Populating contents of mount point .
< ….. >
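
For reference, this is roughly the check I run in format before kicking off lucreate these days. Treat it as a sketch: the disk name (c1d0) matches my layout, and the thing to eyeball is the Flag column for the /var slice:

# Verify the slice flags on c1d0 before running lucreate. In format's
# partition table, the /var slice should show "wm" (read-write,
# mountable) in the Flag column.
$ format c1d0
format> partition
partition> print
partition> quit
format> quit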

Now to convince the live upgrade developers to clean up their error messages. :)

Viewing function calls with whocalls

This article was posted by Matty on 2007-03-03 11:31:00 -0400

While catching up with various opensolaris.org mailing lists, I came across a post that described the whocalls utility. This nifty little utility can be used to view the stack frames leading up to a call to a specific function, which can be super useful for debugging. To view all of the code paths leading up to the printf function being called, whocalls can be run with the name of the function to look for, and the executable that we want to analyze for calls to that function:

$ whocalls printf /bin/ls

printf(0x80541b0, 0x8067800, 0x80653a8)
        /usr/bin/ls:pentry+0x593
        /usr/bin/ls:pem+0xb1
        /usr/bin/ls:pdirectory+0x266
        /usr/bin/ls:main+0x70e
        /usr/bin/ls:_start+0x7a
printf(0x80541b0, 0x8067a48, 0x80653a8)
        /usr/bin/ls:pentry+0x593
        /usr/bin/ls:pem+0xb1
        /usr/bin/ls:pdirectory+0x266
        /usr/bin/ls:main+0x70e
        /usr/bin/ls:_start+0x7a
< ….. >

Now to do some research on the runtime linker’s auditing facilities in /usr/lib/link_audit/*!
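
In the meantime, sotruss (another consumer of the runtime linker's auditing interface) is an easy one to play with. A minimal example, just to watch the calls a command makes into its shared objects as it runs:

# Trace the procedure calls /bin/ls makes into its shared libraries.
$ sotruss /bin/ls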

Viewing busy code paths with DTrace

This article was posted by Matty on 2007-03-03 11:14:00 -0400

Periodically I want to see the busiest application and system code paths on a system. Prior to Solaris 10, this was a difficult question to answer without custom instrumentation. Now that we have DTrace, we can use the DTrace profile provider and an aggregation to view the busiest code paths in the kernel:

$ dtrace -n 'profile-1001 {@[stack(20)]=count()} END{trunc(@,2)}'

dtrace: description 'profile-1001 ' matched 2 probes
^C
CPU ID FUNCTION:NAME
0 2 :END


unix`atomic_cas_32+0x10
ufs`ufs_scan_inodes+0xf4
ufs`ufs_update+0x1e9
ufs`ufs_sync+0x213
genunix`fsop_sync_by_kind+0x36
genunix`fsflush+0x3e2
unix`thread_start+0x8
11

unix`cpu_halt+0x100
unix`idle+0x3f
unix`thread_start+0x8
3978
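
If full stacks are more detail than you need, the same probe can aggregate on just the kernel function that was interrupted. This is a minimal variant; arg0 holds the kernel program counter when the sample lands in kernel context, and func() turns it into a module`function name:

$ dtrace -n 'profile-1001 /arg0/ {@[func(arg0)] = count()} END{trunc(@, 5)}'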

To see the busiest userland code paths, we can aggregate on the executable name and userland stack frames, but only when the sample was taken in user context (arg1 holds the userland program counter, so it is non-zero only when the probe fired in user context):

$ dtrace -n 'profile-1001 /arg1/ {@[execname,ustack()]=count()} END{trunc(@,2)}'

dtrace: description 'profile-1001 ' matched 2 probes
^C
CPU ID FUNCTION:NAME
0 2 :END

kcfd
libmd.so.1`SHA1Transform+0x2ef
libmd.so.1`SHA1Update+0xb6
libelfsign.so.1`soft_digest_update+0x48
libelfsign.so.1`C_DigestUpdate+0xd7
libelfsign.so.1`_C01A7C0D+0x2d
libelfsign.so.1`elfsign_hash_common+0x175
libelfsign.so.1`_C01A7A0C+0x16
libelfsign.so.1`_C01A7A0D+0x23d
kcfd`kcfd_process_request+0x17e
libc.so.1`__door_return+0x60
12
sshd
libc.so.1`memcpy+0x6b
libcrypto.so.0.9.8`HMAC_Init_ex+0x1b2
libcrypto.so.0.9.8`HMAC_Init+0x3d
sshd`mac_compute+0x34
sshd`packet_send2+0x23c
sshd`packet_send+0x1b
sshd`channel_output_poll+0x1d2
sshd`server_loop2+0xe6
sshd`do_authenticated2+0xe
sshd`do_authenticated+0x3b
sshd`main+0x1081
sshd`_start+0x7a
52

This is huge, and with two lines of DTrace, you can dynamically view the busiest code paths in kernel and user context. Nice!
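
One small refinement I sometimes use: rather than sitting on Ctrl-C, a tick probe can stop the sampling after a fixed window. A sketch that samples user stacks for 30 seconds and then prints the top two:

$ dtrace -n 'profile-1001 /arg1/ {@[execname, ustack(20)] = count()} tick-30s {trunc(@, 2); exit(0)}'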