How the Linux OOM killer works

Most admins have probably experienced failures due to applications leaking memory, or worse yet, consuming all of the virtual memory (physical memory + swap) on a host. The Linux kernel has an interesting way of dealing with memory exhaustion: the OOM (out-of-memory) killer. When invoked, the OOM killer begins terminating processes in order to free up enough memory to keep the system operational. I was curious how the OOM killer worked, so I decided to spend some time reading through the kernel source file linux/mm/oom_kill.c to see what it does.

The OOM killer uses a point system to pick which processes to kill. Points are assigned by the badness() function, which contains the following block comment:

 * badness - calculate a numeric value for how bad this task has been
 * @p: task struct of which task we should calculate
 * @uptime: current uptime in seconds
 * The formula used is relatively simple and documented inline in the
 * function. The main rationale is that we want to select a good task
 * to kill when we run out of memory.
 * Good in this context means that:
 * 1) we lose the minimum amount of work done
 * 2) we recover a large amount of memory
 * 3) we don't kill anything innocent of eating tons of memory
 * 4) we want to kill the minimum amount of processes (one)
 * 5) we try to kill the process the user expects us to kill, this
 *    algorithm has been meticulously tuned to meet the principle
 *    of least surprise ... (be careful when you change it)

The actual code in this function does the following:

– Processes that have the PF_SWAPOFF flag set are killed first

– Processes that fork a lot of child processes are next in line

– Niced processes are killed off next, since they are typically less important

– Superuser processes are usually more important, so the code tries to avoid killing those

The code also takes into account the length of time the process has been running, which may or may not be a good thing. It’s interesting to see how technologies we take for granted actually work, and this experience really helped me understand what all the fields in the task_struct structure are used for. Now to dig into mm_struct. :)
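To make those heuristics concrete, here is a rough userspace sketch of how such a point system could combine the factors above. This is an illustration, not the kernel's actual badness() code: the struct fields, scaling constants, and the 600-second runtime threshold are all simplified assumptions.

```c
#include <assert.h>

/* Simplified stand-in for the fields badness() consults; not the real task_struct. */
struct task {
    unsigned long total_vm;    /* memory footprint, in pages */
    int nice;                  /* niceness: > 0 means lower priority */
    int is_superuser;          /* root-owned process? */
    int swapoff;               /* stand-in for the PF_SWAPOFF flag */
    unsigned long run_seconds; /* how long the task has been running */
};

/* Sketch of the scoring scheme described above: start from memory use,
 * inflate the score for niced tasks, shrink it for superuser tasks,
 * discount long-running tasks, and make PF_SWAPOFF tasks the top pick. */
unsigned long score(const struct task *t)
{
    if (t->swapoff)
        return (unsigned long)-1;  /* killed first, like ULONG_MAX in the kernel */

    unsigned long points = t->total_vm;

    if (t->nice > 0)
        points *= 2;               /* niced tasks are more expendable */
    if (t->is_superuser)
        points /= 4;               /* root tasks are usually more important */
    if (t->run_seconds > 600)
        points /= 2;               /* long-running tasks represent more lost work */

    return points;
}
```

The process with the highest score is the one selected for termination; the runtime discount is what makes the kernel prefer killing young memory hogs over long-lived daemons.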

9 thoughts on “How the Linux OOM killer works”

  1. The OOM killer exists in Linux mostly to work around problems with memory overcommitting. Linux is slowly moving in the direction of getting rid of the memory-overcommitting approach (by tweaking /proc you have been able to disable it to some extent for some time now). The truth is that memory overcommitting + the OOM killer is a bad thing – killing applications semi-randomly because the system allowed them to allocate more virtual memory than it has is just plain stupid in most environments. But as I said – Linux is slowly catching up and getting rid of that unpleasant feature.

    btw: in most other Unixes, like Solaris, an OOM killer is not needed, as the system generally won’t allow memory overcommitment.

    See also –

  2. We’ve run into the OOM Killer running some critical billing apps in our environment (the Vendor won’t support anything besides RHEL4 on HP DL580s). Don’t ask me why…we’ve beaten ourselves silly over this.

    In any case, their application runs single-threaded processes bound to specific CPUs and has extensive memory leaks. As a result, the OOM killer kills processes semi-randomly and has caused significant damage by panicking the system(s).

  3. I’m really surprised that Linux contains such a weird feature. I wonder what state of mind a coder must be in to think up a feature that more or less randomly kills processes, when the only sane reaction would be returning ENOMEM to the offending malloc and letting the offending application handle the error itself.
    This really sounds like a basis for entertaining debug sessions… especially as root-owned processes only seem to be avoided rather than fully exempted. How long until this feature decides to kill an important daemon like nfsd or some other critical infrastructure process and sends the whole box to go boom?

  4. Woo – the problem with Linux is that by default it overcommits memory. Basically, when you call malloc on Linux, it doesn’t reserve any swap area (memory + swap disk) and always returns success. Then, if you have a couple of programs that all actually want to use that memory, the problems start: the system is running out of memory, but there is basically no interface for telling the applications, as it already told all of them that there was enough of it… so it starts killing applications in order to avoid a complete lockup. In Solaris, every time you malloc() some memory, the system reserves the required space by default, so you won’t end up in such a situation.
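    As a couple of commenters note, this overcommit behaviour is tunable via /proc. A quick sketch of the relevant knobs (paths are as documented for 2.6-era kernels; the exact values and defaults may differ on your distribution, and writing them requires root):

    ```shell
    # 0 = heuristic overcommit (default), 1 = always overcommit,
    # 2 = strict accounting: allocations beyond the commit limit fail with ENOMEM
    cat /proc/sys/vm/overcommit_memory

    # Disable overcommit so malloc() can actually fail instead of invoking the OOM killer:
    echo 2 > /proc/sys/vm/overcommit_memory
    echo 80 > /proc/sys/vm/overcommit_ratio   # commit limit = swap + 80% of RAM
    ```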

  5. The OOM killer killed our (production) DB2 databases yesterday. You should have seen my boss’s face when I tried to explain why we needed to restore a DB from backup in the middle of the day. FUN.

  6. Those of you running billing apps or DB2 might want to look into the fact that you can “immunize” processes against the oom-killer.

    This fellow has a nice writeup:

    The key line being: “Any particular process leader may be immunized against the oom killer if the value of its /proc/<pid>/oom_adj is set to the constant OOM_DISABLE (currently defined as -17)”

    I suppose it won’t really help with the memory leaking billing processes — without oom-killer, they would crash on their own after leaking a bit more memory, so it’s not really the operating system’s fault that they’re crashing; it’s just killing them slightly before they would have completely f’d themselves, and maybe preventing the whole OS from going down with them.

    DB2, on the other hand, should know oom-killer is out there, and guard against it… :)
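    For the record, the immunization trick from the quote above looks roughly like this in practice (pid 1234 stands in for your DB2 or billing process; -17 is the OOM_DISABLE value on 2.6-era kernels, and writing it requires root):

    ```shell
    # Exempt pid 1234 from the OOM killer:
    echo -17 > /proc/1234/oom_adj

    # Sanity check: the badness score the OOM killer would use for this pid.
    cat /proc/1234/oom_score
    ```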
