How the Linux OOM killer works

Most admins have probably experienced failures due to applications leaking memory, or worse yet, consuming all of the virtual memory (physical memory + swap) on a host. The Linux kernel has an interesting way of dealing with memory exhaustion: the OOM (out-of-memory) killer. When invoked, the OOM killer begins terminating processes in order to free up enough memory to keep the system operational. I was curious how the OOM killer picks its victims, so I decided to spend some time reading through mm/oom_kill.c in the Linux kernel source to see what it actually does.

The OOM killer uses a point system to pick which processes to kill. The points are assigned by the badness() function, which contains the following block comment:

/**
 * badness - calculate a numeric value for how bad this task has been
 * @p: task struct of which task we should calculate
 * @uptime: current uptime in seconds
 *
 * The formula used is relatively simple and documented inline in the
 * function. The main rationale is that we want to select a good task
 * to kill when we run out of memory.
 *
 * Good in this context means that:
 * 1) we lose the minimum amount of work done
 * 2) we recover a large amount of memory
 * 3) we don't kill anything innocent of eating tons of memory
 * 4) we want to kill the minimum amount of processes (one)
 * 5) we try to kill the process the user expects us to kill, this
 *    algorithm has been meticulously tuned to meet the principle
 *    of least surprise ... (be careful when you change it)
 */

The actual code in this function does the following:

- Processes that have the PF_SWAPOFF flag set will be killed first

- Processes which fork a lot of child processes are next in line

- Niced processes get a higher score, since they are typically less important

- Superuser processes get a lower score, since they are usually more important

The code also takes into account the length of time the process has been running, which may or may not be a good thing. It’s interesting to see how technologies we take for granted actually work, and this experience really helped me understand what all of the fields in the task_struct structure are used for. Now to dig into mm_struct. :)
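
One last practical note: you don’t have to read oom_kill.c to see these scores. On kernels from this era, the badness value for each process is exported as /proc/<pid>/oom_score, and /proc/<pid>/oom_adj lets you bias it (writing -17, the OOM_DISABLE value, exempts a process entirely). Here is a quick sketch; the PID at the end is just a placeholder:

# List each process's current badness score, highest first
for pid in /proc/[0-9]*; do
    score=$(cat $pid/oom_score 2>/dev/null) || continue
    echo "$score $pid"
done | sort -rn | head

# Exempt a critical process from the OOM killer (1234 is a placeholder PID)
echo -17 > /proc/1234/oom_adj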

Viewing the status of NetworkManager managed links

As I mentioned in a previous post, I spent some time trying to get NetworkManager to respect my custom DNS settings. While I was looking into this issue, I learned about the nm-tool utility. This nifty tool will print the status of each NetworkManager-managed interface, as well as the connection state:

$ nm-tool

NetworkManager Tool

State: connected

- Device: eth0  [System eth0] --------------------------------------------------
  Type:              Wired
  Driver:            tg3
  State:             connected
  Default:           yes
  HW Address:        00:19:B9:3A:26:BC

  Capabilities:
    Carrier Detect:  yes
    Speed:           100 Mb/s

  Wired Properties
    Carrier:         on

  IPv4 Settings:
    Address:         192.168.1.91
    Prefix:          24 (255.255.255.0)
    Gateway:         192.168.1.254

    DNS:             192.168.1.1
    DNS:             192.168.1.2

I found the IPv4 settings section to be rather useful while I was debugging a network connectivity problem (nm-tool and ethtool make it SUPER easy to debug link problems), and will definitely be using this tool in the future!
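
If you also want the low-level link details, something along these lines pulls the basics out of ethtool (eth0 being whatever interface you care about):

$ ethtool eth0 | egrep 'Speed|Duplex|Link detected'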

Scanning SCSI controllers for new LUNs on CentOS and Fedora Linux hosts

While building out a new ESX guest, I had to scan for a new SCSI device I added. To scan a SCSI controller for new LUNs, you can echo the "- - -" string to the SCSI controller’s scan sysfs node:

$ echo "- - -" > /sys/class/scsi_host/host0/scan

Now you may be asking yourself, what do those three dashes mean? Well, here is the answer from the Linux 2.6.31 kernel source (I had to look this up to recall):

static int scsi_scan(struct Scsi_Host *shost, const char *str)
{
        char s1[15], s2[15], s3[15], junk;
        unsigned int channel, id, lun;
        int res;

        res = sscanf(str, "%10s %10s %10s %c", s1, s2, s3, &junk);
        if (res != 3)
                return -EINVAL;
        if (check_set(&channel, s1))
                return -EINVAL;
        if (check_set(&id, s2))
                return -EINVAL;
        if (check_set(&lun, s3))
                return -EINVAL;
        if (shost->transportt->user_scan)
                res = shost->transportt->user_scan(shost, channel, id, lun);
        else
                res = scsi_scan_host_selected(shost, channel, id, lun, 1);
        return res;
}

As you can see above, the three values written to the scan file are the channel, id and lun you want to scan. A "-" acts as a wildcard, which causes all of the channels, ids and luns to be scanned. The more I dig into the Linux kernel source code, the more I realize just how cool the Linux kernel is. I think it’s about time to write a device driver. :)
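
Based on the code above, you aren’t limited to the all-wildcard form either. Something like the following should also work (the host and target numbers below are just examples):

# Rescan every SCSI host on the system, all channels, ids and luns
for scan in /sys/class/scsi_host/host*/scan; do
    echo "- - -" > $scan
done

# Only scan channel 0, target 1 (all LUNs) on host0
echo "0 1 -" > /sys/class/scsi_host/host0/scan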

Getting the Linux NetworkManager process to respect custom DNS server settings

I recently switched my work desktop from Ubuntu to Fedora 11, and noticed that there are some new configuration options now that network interfaces are managed by the NetworkManager process. Two useful options are the ability to specify the DNS servers and search domains in the network-scripts files, and have those applied when a DHCP lease is acquired (this assumes you want to override the values provided by your DHCP server). To override the DNS servers and search domains, you can set the DNS1, DNS2 and DOMAIN variables in your favorite ifcfg-eth[0-9]+ script:

$ egrep '(DNS1|DNS2|DOMAIN)' /etc/sysconfig/network-scripts/ifcfg-eth0
DNS1=192.168.1.1
DNS2=192.168.1.2
DOMAIN="prefetch.net ops.prefetch.net"
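
If you want to verify the overrides actually took, you can bounce NetworkManager and check /etc/resolv.conf, which NetworkManager rewrites when it manages the interface (on Fedora 11 it is still a plain SysV service):

$ service NetworkManager restart
$ egrep '(nameserver|search)' /etc/resolv.conf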

Hopefully the NetworkManager is all it’s cracked up to be. Only time will tell of course. :)

Listing packages that were added or updated after an initial Fedora or CentOS installation

I was reviewing the configuration of a system last week, and needed to find out which packages were added after the initial installation. The rpm utility supports a slew of query tags (you can view the full list by running `rpm --querytags | more`), including the extremely handy INSTALLTIME tag. Using this tag along with my pkgdiff script, I was able to generate a list of packages that were installed (or updated) after the initial install:

$ pkgdiff
lsscsi-0.22-2.fc11.x86_64 was most likely added after the initial install
xmms-1.2.11-5.20071117cvs.fc11.x86_64 was most likely added after the initial install
gtk+-1.2.10-68.fc11.x86_64 was most likely added after the initial install
rlog-1.4-5.fc11.x86_64 was most likely added after the initial install
nx-3.3.0-35.fc11.x86_64 was most likely added after the initial install
xmms-libs-1.2.11-5.20071117cvs.fc11.x86_64 was most likely added after the initial install
tcl-8.5.6-6.fc11.x86_64 was most likely added after the initial install
glib-1.2.10-32.fc11.x86_64 was most likely added after the initial install
freenx-server-0.7.3-15.fc11.x86_64 was most likely added after the initial install
xorg-x11-apps-7.3-8.fc11.x86_64 was most likely added after the initial install
libmikmod-3.2.0-5.beta2.fc11.x86_64 was most likely added after the initial install
fuse-encfs-1.5-6.fc11.x86_64 was most likely added after the initial install
xorg-x11-fonts-misc-7.2-8.fc11.noarch was most likely added after the initial install
expect-5.43.0-17.fc11.x86_64 was most likely added after the initial install

Now this doesn’t take into account package updates, but it should be pretty easy to identify which items were added vs. updated with a couple more lines of shell script (you could cross reference the package list above with /root/install.log if you need to get super specific).
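
If you don’t have pkgdiff handy, a few lines of shell will get you most of the way there. This is just a rough sketch, not the actual pkgdiff logic: it treats the INSTALLTIME of the basesystem package as the initial install time and lists everything newer than that:

#!/bin/sh
# Treat the basesystem package's install time as the initial install time,
# then print every package with a newer INSTALLTIME.
BASE=$(rpm -q --queryformat '%{INSTALLTIME}\n' basesystem)

rpm -qa --queryformat '%{INSTALLTIME} %{NAME}-%{VERSION}-%{RELEASE}.%{ARCH}\n' |
  awk -v base="$BASE" '$1 + 0 > base + 0 {
      print $2, "was most likely added after the initial install"
  }'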