Blog O' Matty


Problems growing RAID6 MD devices on RHEL5 systems

This article was posted by Matty on 2010-03-13 11:24:00 -0400

I attempted to grow an existing RAID6 MD device this week, and ran into the following error when I performed the grow operation:

$ mdadm --grow --raid-devices=5 --backup-file=/tmp/mdadmgrow.tmp /dev/md0

mdadm: Need to backup 384K of critical section..
mdadm: Cannot set device size/shape for /dev/md0: Invalid argument

It appears the ability to grow a RAID6 device was added in the Linux 2.6.21 kernel, and this feature has yet to be backported to RHEL5 (the mdadm manual page implies that it should work, so I reckon there is a documentation mismatch). If you are encountering this error, you will need to switch to a newer kernel in order to grow RAID6 devices on RHEL5 systems. 2.6.33 worked like a champ, and I hope this issue is addressed when RHEL6 ships.
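If you do boot a newer kernel, the reshape itself is straightforward. Here is a minimal sketch of the procedure, assuming the array is /dev/md0 and the new disk is /dev/sde1 (adjust the device names for your environment). After verifying the kernel version with uname, you add the new disk as a spare, kick off the grow, and then watch the reshape progress in /proc/mdstat:

$ uname -r
$ mdadm --add /dev/md0 /dev/sde1
$ mdadm --grow --raid-devices=5 --backup-file=/tmp/mdadmgrow.tmp /dev/md0
$ cat /proc/mdstat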

Debugging syslog-ng problems

This article was posted by Matty on 2010-03-09 16:07:00 -0400

While debugging the syslog-ng issue I mentioned previously, I needed to observe the syslog-ng pattern matches as they occurred. The syslog-ng daemon has a couple of useful options to assist with this. The first is the “-e” option, which causes the daemon to log to stdout. The second is the “-F” option, which stops the daemon from forking. When you combine these options with the “-d” (debug) and “-v” (verbose) options, syslog-ng will print each log message it receives along with the rule processing logic that is applied to that message:

$ /opt/syslog-ng/sbin/syslog-ng -e -F -d -v > /tmp/syslog-ng.out 2>&1

Incoming log entry; line='<85>sshd2[382]: Public key /root/.ssh/id_rsa_1024.pub used.\x0a'
Filter rule evaluation begins; filter_rule='f_web_hosts'
Filter node evaluation result; filter_result='match', filter_type='level'
Filter node evaluation result; filter_result='not-match'
Filter node evaluation result; filter_result='not-match'
Filter node evaluation result; filter_result='not-match', filter_type='OR'
Filter node evaluation result; filter_result='not-match', filter_type='AND'
Filter rule evaluation result; filter_result='not-match', filter_rule='f_web_hosts'
Filter rule evaluation begins; filter_rule='f_app_hosts'

When a syslog message matches a given rule, you will see the filter_result string change from not-match to match:

Filter rule evaluation result; filter_result='match', filter_rule='f_db_hosts'
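For reference, a filter like f_db_hosts is just a named boolean expression in syslog-ng.conf. The snippet below is a hypothetical definition (the host names, source, and destination are made up for illustration), wired into a log path so the rule actually gets evaluated:

filter f_db_hosts {
    host("db01") or host("db02");
};

log {
    source(s_net);
    filter(f_db_hosts);
    destination(d_db_hosts);
};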

Syslog-ng is pretty sweet, and you can check out my centralized logging presentation if you are interested in learning more about how this awesome piece of software works!

Breaking down system time usage in the Solaris kernel

This article was posted by Matty on 2010-03-08 19:55:00 -0400

I am frequently asked (or paged) to review system performance issues on our Solaris 10 hosts. I use the typical set of Solaris performance tools to observe what my systems are doing, and start drilling down once I know if the problem is with userland applications or in the kernel itself. When I observe issues with the Solaris kernel (these typically show up as high system time values in vmstat), the first thing I do is fire up lockstat to see where the kernel is spending its time:

$ lockstat -kIW /bin/sleep 5


Profiling interrupt: 31424 events in 5.059 seconds (6212 events/sec)

Count indv cuml rcnt nsec Hottest CPU+PIL Caller
-------------------------------------------------------------------------------
28962 92% 92% 0.00 3765 cpu[61] cpu_halt
1238 4% 96% 0.00 3747 cpu[22] lzjb_compress
165 1% 97% 0.00 2655 cpu[37] copy_pattern
124 0% 97% 0.00 2849 cpu[55] copyout
89 0% 97% 0.00 3682 cpu[63] fletcher_4_native
49 0% 97% 0.00 2565 cpu[37] copyin
45 0% 98% 0.00 3079 cpu[0] tcp_rput_data
39 0% 98% 0.00 3597 cpu[0] mutex_enter
30 0% 98% 0.00 3692 cpu[0] nxge_start
28 0% 98% 0.00 3701 cpu[59]+6 nxge_receive_packet
28 0% 98% 0.00 2935 cpu[0] disp_getwork
25 0% 98% 0.00 3110 cpu[0] bcopy

If I see a function that stands out from the rest, I will use the lockstat “-f” option to drill down into the kernel function with that name, and use the “-s” option to print the call stack leading up to that function:

$ lockstat -kIW -f lzjb_compress -s5 /bin/sleep 5


Profiling interrupt: 703 events in 2.058 seconds (342 events/sec)

-------------------------------------------------------------------------------
Count indv cuml rcnt nsec Hottest CPU+PIL Caller
130 18% 18% 0.00 3625 cpu[28] lzjb_compress

nsec ------ Time Distribution ------ count Stack
2048 | 2
4096 |@@@@@@@@@@@@@@@@@@@@@@@@ 107
8192 |@@@@ 21
-------------------------------------------------------------------------------
Count indv cuml rcnt nsec Hottest CPU+PIL Caller
20 3% 21% 0.00 3529 cpu[37] lzjb_compress

nsec ------ Time Distribution ------ count Stack
4096 |@@@@@@@@@@@@@@@@@@@@@@@@@@@ 18 0x74
8192 |@@@ 2 zio_compress_data
zio_write_bp_init
zio_execute
-------------------------------------------------------------------------------
Count indv cuml rcnt nsec Hottest CPU+PIL Caller
19 3% 24% 0.00 3696 cpu[28] lzjb_compress

nsec ------ Time Distribution ------ count Stack
4096 |@@@@@@@@@@@@@@@@@@@@@@ 14 0x50
8192 |@@@@@@@ 5 zio_compress_data
zio_write_bp_init
zio_execute
-------------------------------------------------------------------------------

I use lockstat quite a bit to observe what the kernel is doing, and to help me figure out where I should look for answers in the OpenSolaris source code. It’s also useful for determining if you are encountering a kernel bug, since you can compare the backtrace returned from lockstat with the OpenSolaris bug databases.
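If DTrace is available on the host, the profile provider is a nice way to cross-check what lockstat reports. The one-liner below (the 997Hz sampling rate is an arbitrary choice) samples the kernel program counter and counts the functions that were on CPU during a five second window:

$ dtrace -n 'profile-997 /arg0/ { @[func(arg0)] = count(); }' -c 'sleep 5'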

Great write-up on AMD's RVI (Rapid Virtualization Indexing) hardware assisted virtualization feature

This article was posted by Matty on 2010-03-07 11:57:00 -0400

I came across an awesome Q&A where Tim Mueting from AMD described the hardware virtualization features in AMD Opteron CPUs. The following excerpt from the interview was especially interesting:

Prior to the introduction of RVI, software solutions used something called shadow paging to translate a virtual machine “guest” physical address to the system’s physical address. Because the original page table architecture wasn’t designed with virtualization in mind, a mirror of the page tables had to be created in software, called shadow page tables, to keep information about the physical location of “guest” memory. With shadow paging, the hypervisor must keep the shadow page tables “in sync” with the page tables in hardware. Every time the guest OS modifies its page mapping, the hypervisor must adjust the shadow page tables to reflect the modification. The constant updating of the shadow page tables takes a lot of CPU cycles. As you might expect, for memory intensive applications, this process can make up the largest part of the performance overhead for virtualization.

With Rapid Virtualization Indexing, the virtual memory (guest OS) to physical memory (guest OS) and the physical memory (guest OS) to real physical memory translations are cached in the TLB. As described earlier, we also added a new identifier to the TLB called an Address Space Identifier (ASID) which assigns each entry to a specific VM. With this tag, the TLB entries do not need to be flushed each time execution switches from one VM to another. This simplifies the work that the hypervisor needs to do and removes the need for the hypervisor to update shadow page tables. We can now rely on the hardware to determine the physical location of the guest memory.
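If you are curious whether a given Linux host exposes these features, the CPU flags are listed in /proc/cpuinfo on reasonably recent kernels (flag names can vary between kernel versions, so treat this as a rough check). The svm flag indicates AMD-V support, and npt indicates the nested page table support that RVI is built on:

$ egrep -o 'svm|npt' /proc/cpuinfo | sort -u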

I just ordered a second AMD Opteron 1354 for my lab, and am looking forward to testing out the VMware fault tolerance feature once I receive my new CPU. Viva la virtualization!

Viewing the scripts that run when you install a Linux RPM

This article was posted by Matty on 2010-03-07 10:22:00 -0400

RPM packages have the ability to run scripts before or after a package is added or removed. These scripts can perform actions like adding or removing users, cleaning up temporary files, or checking to make sure a software component that is contained within a package isn’t running. To view the contents of the scripts that will be run, you can use the rpm “--scripts” option:

$ rpm -q --scripts -p VirtualBox-3.1-3.1.4_57640_fedora11-1.x86_64.rpm | more

preinstall scriptlet (using /bin/sh):
# defaults
[ -r /etc/default/virtualbox ] && . /etc/default/virtualbox

# check for active VMs
if pidof VBoxSVC > /dev/null 2>&1; then
echo "A copy of VirtualBox is currently running. Please close it and try again. Please not
e"
echo "that it can take up to ten seconds for VirtualBox (in particular the VBoxSVC daemon)
to"
echo "finish running."
exit 1
fi

RPM provides four types of pre and post installation scripts that can be run:

%pre - runs before a package is installed
%post - runs after a package is installed
%preun - runs before a package is removed
%postun - runs after a package is removed
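If you are building your own packages, here is a minimal hypothetical spec file snippet showing where these scriptlets live (the package name, user, and service names are made up for illustration):

%pre
getent passwd myapp >/dev/null || useradd -r -s /sbin/nologin myapp

%post
/sbin/chkconfig --add myapp

%preun
# only stop the service on a full erase, not an upgrade
if [ $1 -eq 0 ]; then
  /sbin/service myapp stop >/dev/null 2>&1
  /sbin/chkconfig --del myapp
fi

%postun
exit 0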

There are some awesome RPM options buried in the documentation, and you will definitely want to read through the various RPM resources prior to creating RPMs.