Setting up the OpenBSD watchdog daemon (watchdogd)

The watchdog daemon (watchdogd) was introduced in OpenBSD 3.8, and can be used to help machines automatically recover from system hangs. When the daemon is enabled, it periodically resets the hardware watchdog timer built into the system. If the timer is not reset within its timeout period, the hardware resets the machine. The watchdog daemon is not enabled by default, and can be enabled (assuming OpenBSD can find a watchdog timer in your system) by setting the watchdogd_flags variable in /etc/rc.conf to a pair of empty quotes:

$ grep watchdog /etc/rc.conf
watchdogd_flags="" # for normal use: ""

The timeout period (how long the timer may go without being reset before the hardware reboots the machine) is controlled through the kern.watchdog.period variable, which can be set in /etc/sysctl.conf, and viewed with the sysctl(8) command:

$ sysctl -a | grep watchdog
kern.watchdog.period=30
kern.watchdog.auto=0
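
As a rough sketch of working with these variables from a script, the helpers below read the timeout out of sysctl-style output and change it on a running system. The function names watchdog_period and set_watchdog_period are made up for this example, and any value passed to the second one is purely illustrative:

```shell
#!/bin/sh
# Hypothetical helpers; the function names are illustrative only.

# Print the timeout from "name=value" lines on stdin,
# e.g. the output of `sysctl -a`.
watchdog_period() {
    sed -n 's/^kern\.watchdog\.period=//p'
}

# Change the timeout on the running system. This needs root and does
# not survive a reboot unless the same name=value line is also added
# to /etc/sysctl.conf.
set_watchdog_period() {
    sysctl kern.watchdog.period="$1"
}
```

With these defined, `sysctl -a | watchdog_period` prints just the number of seconds, which is handy in a monitoring script.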

Using the hardware watchdog can be useful when you are running routers and access points in remote locations, and don’t want to spend time driving out to reboot a hung system. I always add an rc script to the servers I support to E-mail me when the system boots. If I get an E-mail while I am performing planned maintenance, I can toss it in the trash can. If I get an E-mail because the machine rebooted due to faulty hardware or a kernel bug, I will know that the system reset, and can begin investigating the source of the problem. There are definitely times (e.g., clustered nodes) when it’s better to leave the hardware watchdog disabled, and have a monitoring station alert you to a hung system.
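
The boot notification mentioned above could look something like the following. This is only a sketch, not the exact script I run: the function names, the subject format, and the default recipient (the local root mailbox) are all assumptions to adapt to your site.

```shell
#!/bin/sh
# Hypothetical boot notifier, meant to be sourced or pasted into
# /etc/rc.local. ADMIN is an assumed recipient; override as needed.
ADMIN="${ADMIN:-root}"

# Build the subject line for a given host name.
boot_subject() {
    echo "[boot] $1 has restarted"
}

# Mail a short notice using mail(1). MAILER can be overridden
# if a different mail command is preferred.
send_boot_notice() {
    host=$(hostname)
    printf 'Host: %s\nBooted: %s\n' "$host" "$(date)" |
        ${MAILER:-mail} -s "$(boot_subject "$host")" "$ADMIN"
}
```

Calling send_boot_notice at the end of /etc/rc.local sends one message per boot, whether the reboot was planned or triggered by the watchdog.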

2 thoughts on “Setting up the OpenBSD watchdog daemon (watchdogd)”

  1. Hello Matty,

    as the watchdog(4) manual states:

    “In situations where the machine provides vital services which are not
    handled completely in kernel space, e.g. mail exchange, it may be
    desirable to reboot the machine if process scheduling fails. This is done by
    setting kern.watchdog.auto to zero and running a process which repeatedly
    sets kern.watchdog.period to the desired timeout value. Then, if process
    scheduling fails, the process resetting the timer will not be run,
    leading to the machine being rebooted.”

    Note the “running a process” statement. I tried your config, with

    kern.watchdog.auto=0,

    and to my surprise saw my machine rebooting during execution of the startup
    scripts. Perhaps you should be more clear about this: unless you are running an
    external program that periodically resets the watchdog timer, the value should
    be ‘1’, which presumably makes the kernel reset the timer. I find the manuals a
    bit ambiguous about this too. At least they could be a bit more explicit about the
    kernel handling of the watchdog timer.

    Bill

  2. I believe the userland process in question is watchdogd, set in rc.conf.local:

    watchdogd_flags="" # for normal use: ""

    H.
