Using kdump to get core files on Fedora and CentOS hosts

One of the things I love about Solaris is its ability to generate a core file when a system panics. The core files are an invaluable resource for figuring out what caused a host to panic, and are often the first thing OS vendor support organizations will request when you open a support case. Linux provides the kdump, diskdump and netdump tools to collect core files when a system panics, and although not quite as seamless as their Solaris counterparts, they work relatively well.

I’m not a huge fan of diskdump and netdump, since they have special prerequisites (i.e., operational networking, a supported storage controller, etc.) that need to be met to ensure a core file is captured. Kdump does not. Kdump works by reserving a chunk of memory for a crash kernel, and then rebooting into this kernel when a box panics. Since the crash kernel runs out of memory that is unused and reserved for this specific purpose, it won’t taint the memory image of the kernel that panicked. This approach also provides full access to the previous kernel’s memory, which can be read and written off to disk or a network-accessible location.

To configure kdump to write core files to the /var/crash directory on local disk, you will first need to install the kexec-tools package:

$ yum install kexec-tools

Once the package is installed, you will need to add a crashkernel parameter to the kernel boot arguments. This parameter specifies the amount of memory to reserve for the crash kernel, and should look similar to the following (you may need to increase the amount of memory depending on the platform you are using):

title CentOS (2.6.18-128.el5)
        root (hd0,0)
        kernel /boot/vmlinuz-2.6.18-128.el5 ro root=LABEL=/ console=ttyS0 \
                   crashkernel=128M@16M
        initrd /boot/initrd-2.6.18-128.el5.img
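
Where the core file is written is controlled by /etc/kdump.conf. The defaults already write to /var/crash on the local file system, but as a rough sketch, the file might contain something along these lines (the core_collector line and the -d 31 dump level are illustrative; check the kdump.conf man page on your distro for the supported directives):

# /etc/kdump.conf -- write compressed dumps to /var/crash
path /var/crash
core_collector makedumpfile -c -d 31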

To allow you to get a core file if a box hangs, you can enable sysrq magic key sequences by setting “kernel.sysrq” to 1 in /etc/sysctl.conf (you can also use the sysctl “-w” option to enable this feature on an active host):

kernel.sysrq = 1
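
For example, to enable the feature on a running host with the sysctl “-w” option mentioned above:

$ sysctl -w kernel.sysrq=1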

Once these settings are in place, you can enable the kdump service with the chkconfig and service commands:

$ chkconfig kdump on

$ service kdump start

If you want to verify that kdump is working, you can type “alt + sysrq + c” on the console, or echo a “c” character to the sysrq-trigger proc entry:

$ echo "c" > /proc/sysrq-trigger

SysRq : Trigger a crashdump
Linux version 2.6.18-128.el5 (mockbuild@builder10.centos.org)
….

This will force a panic, which should result in a core file being generated in the /var/crash directory:

$ pwd
/var/crash/2009-07-05-18:31

$ ls -la
total 317240
drwxr-xr-x 2 root root 4096 Jul 5 18:32 .
drwxr-xr-x 3 root root 4096 Jul 5 18:31 ..
-r-------- 1 root root 944203448 Jul 5 18:32 vmcore
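
Once you have a vmcore, you can poke around in it with the crash utility. This is just a sketch: it assumes you have installed the crash package and a kernel-debuginfo package matching the kernel that panicked, and the vmlinux path below is the typical CentOS 5 debuginfo location, which may differ on your host:

$ yum install crash kernel-debuginfo

$ crash /usr/lib/debug/lib/modules/2.6.18-128.el5/vmlinux \
        /var/crash/2009-07-05-18:31/vmcore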

If you are like me and prefer to be notified when a box panics, you can configure your log monitoring solution to look for the string “kdump: saved a vmcore” in /var/log/messages:

Jul 5 18:32:08 kvmnode1 kdump: saved a vmcore to /var/crash/2009-07-05-18:31
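
If you don’t have a full-blown log monitoring package handy, even a simple grep along these lines (run from cron or your tool of choice) will surface the event:

$ grep "kdump: saved a vmcore" /var/log/messages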

Kdump is pretty sweet, and it’s definitely one of those technologies that every RAS-savvy engineer should be configuring on each server he or she deploys.

Adjusting how often the Linux kernel checks for MCEs

I wrote about the Linux mcelog utility a few weeks back, and described how it can be used to monitor the /dev/mcelog device for machine check exceptions (MCEs). By default, the Linux kernel will check for MCEs every five minutes. The polling interval is defined in the sysfs check_interval entry, which you can view with cat (note that the value is reported in hexadecimal):

$ cat /sys/devices/system/machinecheck/machinecheck0/check_interval
12c

$ python
>>> print "%d" % 0x12c
300

To configure the host to use a shorter check interval, you can echo the desired value (in seconds) to the sysfs entry for processor 0, and the change will be reflected in the entries for the other processors as well:

$ echo 60 > /sys/devices/system/machinecheck/machinecheck0/check_interval

$ cat /sys/devices/system/machinecheck/machinecheck0/check_interval
3c

$ cat /sys/devices/system/machinecheck/machinecheck1/check_interval
3c
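
If you want to eyeball the interval on every processor at once, a quick shell loop (nothing fancy, just a sketch) does the job:

$ for f in /sys/devices/system/machinecheck/machinecheck*/check_interval; do echo "$f: $(cat $f)"; done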

If you want additional information on check_interval, check out the machinecheck text file in the kernel documentation directory. If you are curious how the code actually detects an MCE, you can look through the source code in <KERNEL_SOURCE_ROOT>/arch/x86/kernel/cpu/mcheck.

Understanding the Linux /boot directory

When I first began using Linux quite some time ago, I remember thinking to myself, WTF is all this stuff in /boot? There were files related to grub, a file called vmlinuz, and several ASCII text files with cool-sounding names. After reading through the Linux kernel HOWTO, the /boot directory layout all came together, and understanding the purpose of each file has helped me better understand how things work and allowed me to solve numerous issues in a more expedient manner. Given a typical CentOS or Fedora host, you will probably see something similar to the following in /boot:

$ cd /boot

$ tree

.
|-- System.map-2.6.29.5-191.fc11.x86_64
|-- System.map-2.6.30
|-- config-2.6.29.5-191.fc11.x86_64
|-- config-2.6.30
|-- efi
|   `-- EFI
|       `-- redhat
|           `-- grub.efi
|-- grub
|   |-- device.map
|   |-- e2fs_stage1_5
|   |-- fat_stage1_5
|   |-- ffs_stage1_5
|   |-- grub.conf
|   |-- iso9660_stage1_5
|   |-- jfs_stage1_5
|   |-- menu.lst -> ./grub.conf
|   |-- minix_stage1_5
|   |-- reiserfs_stage1_5
|   |-- splash.xpm.gz
|   |-- stage1
|   |-- stage2
|   |-- ufs2_stage1_5
|   |-- vstafs_stage1_5
|   `-- xfs_stage1_5
|-- initrd-2.6.29.5-191.fc11.x86_64.img
|-- initrd-2.6.30.img
|-- vmlinuz-2.6.29.5-191.fc11.x86_64
`-- vmlinuz-2.6.30

For each kernel release, you will typically see a vmlinuz, System.map, initrd and config file. The vmlinuz file contains the actual Linux kernel, which is loaded and executed by grub. The System.map file contains a list of kernel symbols and the addresses where these symbols are located. The initrd file is the initial ramdisk used to preload modules, and contains the drivers and supporting infrastructure (keyboard mappings, etc.) needed to manage your keyboard, serial devices and block storage early in the boot process. The config file contains the kernel configuration options the kernel was built with, which is useful for understanding which features were compiled into the kernel and which features were built as modules. I am going to type up a separate post with my notes on grub, especially those related to solving boot-related issues.
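
If you want to poke at these files yourself, the config file can be grepped directly (for example, to see how the KVM bits were built), and on these releases the initrd is a gzip-compressed cpio archive that can be listed with zcat and cpio. The kernel version below is just the one from the listing above; substitute your own:

$ grep CONFIG_KVM /boot/config-2.6.30

$ zcat /boot/initrd-2.6.30.img | cpio -itv | head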

Using yum to install the latest kernels on Fedora hosts

As you may surmise from several of my recent posts, I have been doing a ton of Linux virtualization (Xen, KVM, OpenVZ) testing. In the case of KVM, numerous bug fixes are integrated into each kernel release, so it’s often beneficial to stick to bleeding-edge kernels (though great for testing, I wouldn’t feel comfortable running these in production). Fedora provides the latest and greatest kernels and virtualization packages (libvirt, qemu, etc.) in the rawhide repository, which you can access by setting the enabled flag to 1 in /etc/yum.repos.d/fedora-rawhide.repo:

[rawhide]
name=Fedora - Rawhide - Developmental packages for the next Fedora release
failovermethod=priority
#baseurl=http://download.fedoraproject.org/pub/fedora/linux/development/$basearch/os/
mirrorlist=https://mirrors.fedoraproject.org/metalink?repo=rawhide&arch=$basearch
enabled=1
gpgcheck=0
gpgkey=file:///etc/pki/rpm-gpg/RPM-GPG-KEY-fedora-$basearch

[rawhide-debuginfo]
name=Fedora - Rawhide - Debug
failovermethod=priority
#baseurl=http://download.fedoraproject.org/pub/fedora/linux/development/$basearch/debug/
mirrorlist=https://mirrors.fedoraproject.org/metalink?repo=rawhide-debug&arch=$basearch
enabled=1

Once you have the repository enabled, you can use yum to install the latest kernel:

$ yum update kernel

This will install the latest kernel, and allow you to take advantage of the latest Linux kernel features.
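
If you would rather not leave rawhide enabled all the time (a full update against it can pull in far more than just the kernel), you can keep enabled set to 0 in the repo file and turn the repository on just for kernel updates:

$ yum --enablerepo=rawhide update kernel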

Adding Machine Check Exception Logging support to the Linux kernel

In my previous post, I mentioned how the mcelog utility can be used to detect hardware problems. Mcelog relies on the /dev/mcelog device being present, which requires the kernel to be built with the following options:

CONFIG_X86_MCE=y
CONFIG_X86_MCE_INTEL=y
CONFIG_X86_MCE_AMD=y
CONFIG_X86_MCE_THRESHOLD=y

To enable these, you can select the following options once you run ‘make menuconfig’:

         [*] Machine Check Exception
         [*]   Intel MCE features
         [*]   AMD MCE features
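
If you just want to verify whether the kernel you are already running was built with MCE support, the distribution config file in /boot (discussed earlier) saves you a trip into menuconfig; the output should include the options listed above:

$ grep CONFIG_X86_MCE /boot/config-$(uname -r)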


Built-in memory testing in the Linux 2.6.26 kernel

I have been using memtest86 and a custom-built hardware testing image based on OpenSolaris, FMA and Sun VTS for quite some time, and have had fantastic success with them. I just learned that the Linux kernel developers added built-in memory testing support to the 2.6.26 kernel:

“Memtest is a commonly used tool for checking your memory. In 2.6.26 Linux is including his own in-kernel memory tester. The goal is not to replace memtest, in fact this tester is much simpler and less capable than memtest, but it’s handy to have a built-in memory tester on every kernel. It’s enabled easily with the “memtest” boot parameter.”
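
To try it out, you would append the memtest parameter to the kernel line in grub.conf, much like the crashkernel example earlier. The stanza below is purely illustrative (it reuses the 2.6.30 kernel from the /boot listing above), and it assumes the kernel was built with CONFIG_MEMTEST; the numeric value controls how many test patterns are run:

title Fedora (2.6.30)
        root (hd0,0)
        kernel /boot/vmlinuz-2.6.30 ro root=LABEL=/ memtest=4
        initrd /boot/initrd-2.6.30.img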

This is super handy, especially when you don’t have memtest86 and company readily available. Nice!