Most admins have probably experienced failures due to applications leaking memory, or worse yet consuming all of the virtual memory (physical memory + swap) on a host. The Linux kernel has an interesting way of dealing with memory exhaustion, and it comes in the form of the Linux OOM killer. When invoked, the OOM killer will begin terminating processes in order to free up enough memory to keep the system operational. I was curious how the OOM killer worked, so I decided to spend some time reading through the linux/mm/oom_kill.c source file to see what the OOM killer does.
The OOM killer uses a point system to pick which processes to kill. The points are assigned by the badness() function, which contains the following block comment:
/**
 * badness - calculate a numeric value for how bad this task has been
 * @p: task struct of which task we should calculate
 * @uptime: current uptime in seconds
 *
 * The formula used is relatively simple and documented inline in the
 * function. The main rationale is that we want to select a good task
 * to kill when we run out of memory.
 *
 * Good in this context means that:
 * 1) we lose the minimum amount of work done
 * 2) we recover a large amount of memory
 * 3) we don't kill anything innocent of eating tons of memory
 * 4) we want to kill the minimum amount of processes (one)
 * 5) we try to kill the process the user expects us to kill, this
 *    algorithm has been meticulously tuned to meet the principle
 *    of least surprise ... (be careful when you change it)
 */
The actual code in this function assigns each candidate process a score based largely on the amount of virtual memory it is using; the process with the highest score is the one that gets killed.
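Here is a rough, standalone approximation of the scoring logic. The structure and function names (fake_task, badness_approx) are mine, the time scaling is simplified, and it runs in userspace, so treat it as a sketch of the 2.6-era badness() rather than the real kernel code:

#include <math.h>
#include <stdio.h>

/*
 * Userspace approximation of the 2.6-era badness() calculation. The
 * real function pulls these values out of task_struct and mm_struct.
 */
struct fake_task {
    unsigned long total_vm;    /* virtual memory size, in pages */
    unsigned long children_vm; /* combined vm of children with their own mm */
    unsigned long cpu_seconds; /* user + system CPU time consumed */
    unsigned long run_seconds; /* wall clock time since the task started */
    int nice;                  /* nice value */
    int superuser;             /* has CAP_SYS_ADMIN / CAP_SYS_RESOURCE */
    int rawio;                 /* has CAP_SYS_RAWIO (direct hardware access) */
    int oom_adj;               /* tunable from /proc/<pid>/oom_adj */
};

static unsigned long badness_approx(const struct fake_task *p)
{
    /* The memory footprint of the process is the basis for the score. */
    unsigned long points = p->total_vm;

    /* Tasks that forked a pile of memory hungry children look worse. */
    points += p->children_vm / 2;

    /* Tasks that have been running a long time, or that have burned a lot
     * of CPU, are assumed to be more valuable, so the score is divided
     * down by (roots of) their CPU and run time. */
    unsigned long s = (unsigned long)sqrt((double)p->cpu_seconds);
    if (s)
        points /= s;
    s = (unsigned long)sqrt(sqrt((double)p->run_seconds));
    if (s)
        points /= s;

    /* Niced (lower priority) tasks are assumed to be less important. */
    if (p->nice > 0)
        points *= 2;

    /* Privileged tasks and tasks doing raw I/O get a break. */
    if (p->superuser)
        points /= 4;
    if (p->rawio)
        points /= 4;

    /* Finally, oom_adj shifts the score up or down. */
    if (p->oom_adj > 0)
        points <<= p->oom_adj;
    else if (p->oom_adj < 0)
        points >>= -p->oom_adj;

    return points;
}

int main(void)
{
    /* A short lived, unprivileged process using ~1GB of 4k pages. */
    struct fake_task hog = { 262144, 0, 10, 300, 0, 0, 0, 0 };
    printf("badness: %lu\n", badness_approx(&hog));
    return 0;
}

Compile it with gcc -lm and tweak the fields to see how each factor pushes the score up or down.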
As the sketch shows, the code also takes into account the length of time the process has been running, which may or may not be a good thing. It’s interesting to see how technologies we take for granted actually work, and this experience really helped me understand what all the fields in the task_struct structure are used for. Now to dig into mm_struct. :)
As I mentioned in a previous post, I spent some time trying to get the NetworkManager to respect my custom DNS settings. While looking into this issue, I learned about the nm-tool utility. This nifty tool will print the status of each NetworkManager-managed interface, as well as the connection state:
$ nm-tool
NetworkManager Tool

State: connected

- Device: eth0 [System eth0] --------------------------------------------------
  Type:              Wired
  Driver:            tg3
  State:             connected
  Default:           yes
  HW Address:        00:19:B9:3A:26:BC

  Capabilities:
    Carrier Detect:  yes
    Speed:           100 Mb/s

  Wired Properties
    Carrier:         on

  IPv4 Settings:
    Address:         192.168.1.91
    Prefix:          24 (255.255.255.0)
    Gateway:         192.168.1.254

    DNS:             192.168.1.1
    DNS:             192.168.1.2
I found the IPv4 settings section to be rather useful while I was debugging a network connectivity problem (nm-tool and ethtool make it SUPER easy to debug link problems), and will definitely be using this tool in the future!
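If all you need is a quick link check, ethtool prints a link line you can grep for; the output should look something like this (substitute your own interface name):

$ ethtool eth0 | grep -i "link detected"
    Link detected: yes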
While building out a new ESX guest, I had to scan for a new SCSI device I added. To scan a SCSI controller for new LUNs, you can echo the “- - -” string to the SCSI controller’s scan sysfs node:
$ echo "- - -" > /sys/class/scsi_host/host0/scan
Now you may be asking yourself, what do those three dashes mean? Well, here is the answer, straight from the Linux 2.6.31 kernel source (I had to look it up to remember):
static int scsi_scan(struct Scsi_Host *shost, const char *str)
{
        char s1[15], s2[15], s3[15], junk;
        unsigned int channel, id, lun;
        int res;

        res = sscanf(str, "%10s %10s %10s %c", s1, s2, s3, &junk);
        if (res != 3)
                return -EINVAL;
        if (check_set(&channel, s1))
                return -EINVAL;
        if (check_set(&id, s2))
                return -EINVAL;
        if (check_set(&lun, s3))
                return -EINVAL;
        if (shost->transportt->user_scan)
                res = shost->transportt->user_scan(shost, channel, id, lun);
        else
                res = scsi_scan_host_selected(shost, channel, id, lun, 1);
        return res;
}
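Each of those tokens gets run through the check_set() helper before the scan kicks off, and that helper is where the dash handling lives. The following is my paraphrase of it (simplified from drivers/scsi/scsi_scan.c, so treat it as a sketch rather than a verbatim copy):

/*
 * Paraphrase of check_set(): turn one token from the scan string into
 * either a specific number or the SCAN_WILD_CARD sentinel.
 */
static int check_set(unsigned int *val, char *src)
{
        char *last;

        if (strncmp(src, "-", 20) == 0) {
                /* A lone dash means "match everything" for this field. */
                *val = SCAN_WILD_CARD;
        } else {
                /* Otherwise parse the token as an unsigned integer. */
                *val = simple_strtoul(src, &last, 0);
                if (*last != '\0')
                        return 1;
        }
        return 0;
}

This also means you can narrow a rescan down to a single target by swapping a dash for a real number, e.g. echo "0 0 1" > /sys/class/scsi_host/host0/scan probes channel 0, id 0, lun 1 only.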
As you can see above, the three values written to the scan node are the channel, id and lun you want to scan. The “-” acts as a wild card, which causes all of the channels, ids and luns to be scanned. The more I dig into the Linux kernel source code, the more I realize just how cool the Linux kernel is. I think it’s about time to write a device driver. :)
I gave a talk on the Linux udev device management framework tonight, and posted my slides to the presentation section of my website. Thanks to everyone who came out! I had a blast presenting, and enjoyed meeting some new folks!
I recently switched my work desktop from Ubuntu to Fedora 11, and noticed that there are some new configuration options now that network interfaces are managed by the NetworkManager process. Two useful options are the ability to specify the DNS servers and search domains in the network-scripts files, and have those applied when a DHCP lease is acquired (these override the values provided by your DHCP server). To override the DNS servers and search domains, you can set the DNS1, DNS2 and DOMAIN variables in your favorite ifcfg-eth[0-9]+ script:
$ egrep '(DNS1|DNS2|DOMAIN)' /etc/sysconfig/network-scripts/ifcfg-eth0
DNS1=192.168.1.1
DNS2=192.168.1.2
DOMAIN="prefetch.net ops.prefetch.net"
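Once the lease comes up and NetworkManager regenerates the resolver configuration, /etc/resolv.conf should end up looking something like this (using the values above):

$ cat /etc/resolv.conf
# Generated by NetworkManager
search prefetch.net ops.prefetch.net
nameserver 192.168.1.1
nameserver 192.168.1.2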
Hopefully the NetworkManager is all it’s cracked up to be. Only time will tell of course. :)