Deploying Highly Available Virtual Interfaces With Keepalived


Linux is a powerhouse when it comes to networking, and provides a full-featured, high-performance network stack. When combined with a web front-end or your favorite application server, Linux is a killer platform for hosting web applications. Keeping these applications up and operational can sometimes be a challenge, especially in this age of horizontally scaled infrastructure and commodity hardware. But don’t fret, since there are a number of technologies that can assist with making your applications and network infrastructure fault tolerant.

One of these technologies, keepalived, provides interface failover and the ability to perform application-layer health checks. When these capabilities are combined with the Linux Virtual Server (LVS) project, a fault in an application will be detected by keepalived, and the virtual interfaces that are accessed by clients can be migrated to another available node. This article will provide an introduction to keepalived, and will show how to configure interface failover between two or more nodes. Additionally, the article will show how to debug problems with keepalived and VRRP.

What Is Keepalived?

The keepalived project provides a keepalive facility for Linux servers. This keepalive facility consists of a VRRP implementation to manage virtual routers (aka virtual interfaces), and a health check facility to determine if a service (web server, Samba server, etc.) is up and operational. If a service fails a configurable number of health checks, keepalived will fail a virtual router over to a secondary node. While useful in its own right, keepalived really shines when combined with the Linux Virtual Server project. This article will focus on keepalived, and a future article will show how to integrate the two to create a fault-tolerant load balancer.

Installing Keepalived From Source Code

Before we dive into configuring keepalived, we need to install it. Keepalived is distributed as source code, and is also available in several package repositories. To install from source code, you can execute wget or curl to retrieve the source, and then run “configure”, “make” and “make install” to compile and install the software:

$ wget http://www.keepalived.org/software/keepalived-1.1.17.tar.gz

$ tar xfvz keepalived-1.1.17.tar.gz

$ cd keepalived-1.1.17

$ ./configure --prefix=/usr/local

$ make && make install

In the example above, the keepalived daemon will be compiled and installed as /usr/local/sbin/keepalived.

Configuring Keepalived

The keepalived daemon is configured through a text configuration file, typically named keepalived.conf. This file contains one or more configuration stanzas, which control notification settings, the virtual interfaces to manage, and the health checks to use to test the services that rely on the virtual interfaces. Here is a sample annotated configuration that defines two virtual IP addresses to manage, and the individuals to contact when a state transition or fault occurs:

# Define global configuration directives
global_defs {

   # Send an e-mail to each of the following 
   # addresses when a failure occurs
   notification_email {
       matty@prefetch.net
       operations@prefetch.net
   }
   # The address to use in the From: header
   notification_email_from root@VRRP-director1.prefetch.net

   # The SMTP server to route mail through
   smtp_server mail.prefetch.net

   # How long to wait for the mail server to respond
   smtp_connect_timeout 30

   # A descriptive name describing the router
   router_id VRRP-director1
}

# Create a VRRP instance 
vrrp_instance VRRP_ROUTER1 {

    # The initial state to transition to. This option isn't
    # really all that valuable, since an election will occur
    # and the host with the highest priority will become
    # the master. The priority is controlled with the priority
    # configuration directive.
    state MASTER

    # The interface keepalived will manage
    interface br0

    # The virtual router id number to assign the routers to
    virtual_router_id 100

    # The priority to assign to this device. This controls
    # who will become the MASTER and BACKUP for a given
    # VRRP instance.
    priority 100

    # How many seconds to wait until a gratuitous arp is sent
    garp_master_delay 2

    # How often to send out VRRP advertisements
    advert_int 1

    # Execute a notification script when a host transitions to
    # MASTER or BACKUP, or when a fault occurs. The arguments
    # passed to the script are:
    #  $1 = "GROUP"|"INSTANCE"
    #  $2 = name of group or instance
    #  $3 = target state of transition ("MASTER"|"BACKUP"|"FAULT")
    # Sample: VRRP-notification.sh INSTANCE VRRP_ROUTER1 BACKUP
    notify "/usr/local/bin/VRRP-notification.sh"

    # Send an SMTP alert during a state transition
    smtp_alert

    # Authenticate the remote endpoints with a simple shared
    # password. Only the first eight characters of auth_pass
    # are used.
    authentication {
        auth_type PASS
        auth_pass 192837465
    }
    # The virtual IP addresses to float between nodes. The
    # label statement can be used to bring an interface 
    # online to represent the virtual IP.
    virtual_ipaddress {
        192.168.1.100 label br0:100
        192.168.1.101 label br0:101
    }
}
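
The sample above describes a single node. A second node needs nearly the same file, with a lower priority so that it loses the election while the first node is healthy. As a rough sketch (the function name make_backup_conf and the priority of 90 are assumptions chosen for illustration), the peer's configuration could be derived from the first node's file with sed:

```shell
#!/bin/sh
# make_backup_conf [PRIORITY]
# Reads a keepalived.conf on stdin and writes a copy to stdout with
# "state MASTER" changed to "state BACKUP" and the priority lowered.
# The default priority of 90 is an assumption; any value below the
# first node's priority would do.
make_backup_conf() {
    backup_priority="${1:-90}"
    # Only directive lines are matched; comment lines start with '#'
    # and are left untouched.
    sed -e 's/^\([[:space:]]*state\)[[:space:]]\{1,\}MASTER/\1 BACKUP/' \
        -e "s/^\([[:space:]]*priority\)[[:space:]]\{1,\}[0-9]\{1,\}/\1 ${backup_priority}/"
}
```

Running something like "make_backup_conf 90 < keepalived.conf > keepalived-backup.conf" would produce the peer's file. The router_id and notification_email_from values would normally be adjusted by hand as well, while virtual_router_id, the authentication block and the virtual IP addresses must stay identical on both nodes.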

The configuration file listed above is largely self-explanatory, so I won’t go over each directive in detail.
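
The notify directive is easier to follow with an example. The script below is a minimal sketch of what /usr/local/bin/VRRP-notification.sh might contain. The function name describe_transition, the message wording and the vrrp-notify syslog tag are assumptions made for illustration, not part of keepalived:

```shell
#!/bin/sh
# Sketch of a keepalived notify handler.
# Arguments supplied by keepalived:
#   $1 = "GROUP" or "INSTANCE"
#   $2 = name of the group or instance
#   $3 = target state ("MASTER", "BACKUP" or "FAULT")

# Turn the three arguments into a one-line message.
# Returns non-zero for a state it does not recognize.
describe_transition() {
    kind="$1"
    name="$2"
    state="$3"
    case "$state" in
        MASTER) echo "$kind $name: now MASTER, virtual IPs are local" ;;
        BACKUP) echo "$kind $name: now BACKUP, virtual IPs released" ;;
        FAULT)  echo "$kind $name: entered FAULT state" ;;
        *)      echo "$kind $name: unknown state '$state'"; return 1 ;;
    esac
}

# When keepalived invokes the script it passes at least three
# arguments; send the message to syslog in that case.
if [ "$#" -ge 3 ]; then
    describe_transition "$@" | logger -t vrrp-notify
fi
```

A real handler would typically do more than log, for example restarting a dependent service on the MASTER transition, but the argument handling would look much the same.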

Starting Keepalived

Keepalived can be executed from an RC script, or started from the command line. The following example will start keepalived using the configuration file /usr/local/etc/keepalived.conf:

$ keepalived -f /usr/local/etc/keepalived.conf

If you need to debug keepalived issues, you can run the daemon with the “--dont-fork”, “--log-console” and “--log-detail” options:

$ keepalived -f /usr/local/etc/keepalived.conf --dont-fork --log-console --log-detail

These options will stop keepalived from forking into the background, and will provide additional logging data. Using these options is especially useful when you are testing out new configuration directives, or debugging an issue with an existing configuration file.

Locating The Router That Is Managing A Virtual IP

To see which director is currently the master for a given virtual interface, you can check the output from the ip utility:

dir1$ ip addr list br0

5: br0: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc noqueue state UNKNOWN 
    link/ether 00:24:8c:4e:07:f6 brd ff:ff:ff:ff:ff:ff
    inet 192.168.1.6/24 brd 192.168.1.255 scope global br0
    inet 192.168.1.100/32 scope global br0:100
    inet 192.168.1.101/32 scope global br0:101
    inet6 fe80::224:8cff:fe4e:7f6/64 scope link 
       valid_lft forever preferred_lft forever

dir2$ ip addr list br0

5: br0: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc noqueue state UNKNOWN 
    link/ether 00:24:8c:4e:07:f6 brd ff:ff:ff:ff:ff:ff
    inet 192.168.1.7/24 brd 192.168.1.255 scope global br0
    inet6 fe80::224:8cff:fe4e:7f6/64 scope link 
       valid_lft forever preferred_lft forever

In the output above, we can see that the virtual interfaces 192.168.1.100 and 192.168.1.101 are currently active on VRRP-director1.
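
The same check can be scripted. The helper below is a small sketch (the name has_vip is an assumption) that reads "ip addr" output on standard input and succeeds only if the given virtual IP is configured:

```shell
#!/bin/sh
# has_vip VIP
# Reads `ip addr` output on stdin. Exits 0 if VIP is configured on the
# interface, non-zero otherwise. The trailing "/" in the pattern keeps
# 192.168.1.10 from matching 192.168.1.100.
has_vip() {
    grep -qF "inet $1/"
}
```

For example, "ip addr list br0 | has_vip 192.168.1.100 && echo master" would print "master" on dir1 and nothing on dir2 in the output shown above.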

Troubleshooting Keepalived And VRRP

The keepalived daemon will log to syslog by default. Log entries will range from entries that show when the keepalived daemon started, to entries that show state transitions. Here are a few sample entries that show keepalived starting up, and the node transitioning a VRRP instance to the MASTER state:

Jul  3 16:29:56 disarm Keepalived: Starting Keepalived v1.1.17 (07/03,2009)
Jul  3 16:29:56 disarm Keepalived: Starting VRRP child process, pid=1889
Jul  3 16:29:56 disarm Keepalived_VRRP: Using MII-BMSR NIC polling thread...
Jul  3 16:29:56 disarm Keepalived_VRRP: Registering Kernel netlink reflector
Jul  3 16:29:56 disarm Keepalived_VRRP: Registering Kernel netlink command channel
Jul  3 16:29:56 disarm Keepalived_VRRP: Registering gratutious ARP shared channel
Jul  3 16:29:56 disarm Keepalived_VRRP: Opening file '/usr/local/etc/keepalived.conf'.
Jul  3 16:29:56 disarm Keepalived_VRRP: Configuration is using : 62990 Bytes
Jul  3 16:29:57 disarm Keepalived_VRRP: VRRP_Instance(VRRP_ROUTER1) Transition to MASTER STATE
Jul  3 16:29:58 disarm Keepalived_VRRP: VRRP_Instance(VRRP_ROUTER1) Entering MASTER STATE
Jul  3 16:29:58 disarm Keepalived_VRRP: Netlink: skipping nl_cmd msg...
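
On a busy host the state transitions can be buried among other messages. As a sketch, the filter below keeps only the transition entries. The function name vrrp_transitions is an assumption, and the pattern is based solely on the wording of the sample entries above, so it may need adjusting for other keepalived versions:

```shell
#!/bin/sh
# vrrp_transitions
# Reads syslog lines on stdin and prints only the keepalived entries
# that report a VRRP state change. Matching is case-insensitive.
vrrp_transitions() {
    grep -iE 'VRRP_Instance\([^)]+\) (Transition to|Entering) [A-Z]+ STATE'
}
```

It could be used as "vrrp_transitions < /var/log/messages", where the log file path is an assumption that depends on how syslog is configured on the host.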

If you are unable to determine the source of a problem with the system logs, you can use tcpdump to display the VRRP advertisements that are sent on the local network. Advertisements are sent to a reserved VRRP multicast address (224.0.0.18), so the following filter can be used to display all VRRP traffic that is visible on the interface passed to the “-i” option:

$ tcpdump -vvv -n -i br0 host 224.0.0.18

tcpdump: verbose output suppressed, use -v or -vv for full protocol decode
listening on br0, link-type EN10MB (Ethernet), capture size 96 bytes

10:18:23.621512 IP (tos 0x0, ttl 255, id 102, offset 0, flags [none], proto VRRP (112), length 40) \
                192.168.1.6 > 224.0.0.18: VRRPv2, Advertisement, vrid 100, prio 100, authtype simple, 
                intvl 1s, length 20, addrs: 192.168.1.100 auth "19283746"

10:18:25.621977 IP (tos 0x0, ttl 255, id 103, offset 0, flags [none], proto VRRP (112), length 40) \
                192.168.1.6 > 224.0.0.18: VRRPv2, Advertisement, vrid 100, prio 100, authtype simple, 
                intvl 1s, length 20, addrs: 192.168.1.100 auth "19283746"
                         .........

The output contains several pieces of data that can be useful for debugging problems:

authtype - the type of authentication in use (auth_type configuration directive)
vrid - the virtual router id (virtual_router_id configuration directive)
prio - the priority of the device (priority configuration directive)
intvl - how often advertisements are sent, in seconds (advert_int configuration directive)
auth - the authentication token sent (auth_pass configuration directive; only the first eight characters are used)
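
When comparing nodes it can help to reduce each advertisement to the sender, vrid and priority. The function below is a rough sketch that assumes the tcpdump output layout shown above; the name vrrp_advert_summary is hypothetical:

```shell
#!/bin/sh
# vrrp_advert_summary
# Reads verbose tcpdump output on stdin and prints one line per VRRP
# advertisement in the form "src=ADDRESS vrid=ID prio=PRIORITY".
# Lines that do not carry an advertisement are dropped.
vrrp_advert_summary() {
    sed -n 's/^[[:space:]]*\([0-9.]\{1,\}\) > 224\.0\.0\.18: .*vrid \([0-9]\{1,\}\), prio \([0-9]\{1,\}\).*/src=\1 vrid=\2 prio=\3/p'
}
```

Piping a capture through "vrrp_advert_summary | sort | uniq -c" gives a count per sender. Normally only the current master advertises, so seeing two different source addresses for the same vrid over a sustained period suggests that both nodes believe they are the master, which usually points to a connectivity, authentication or virtual_router_id mismatch between them.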

Conclusion

In this article I described how to set up a host to use the keepalived daemon, and provided a sample configuration file that can be used to fail over virtual interfaces between servers. Keepalived has a slew of options not covered here, and I will refer you to the keepalived source code and documentation for additional details.