
Using Keepalived for managing simple failover in clusters

This introductory article on Linux high-availability clusters walks you through the basic protocol that underpins Keepalived, a software implementation of VRRP on Linux.

When you hear the term "high availability," you might think of large, complex environments with arcane technologies that are beyond the reach of the average sysadmin. But basic HA doesn't have to be complicated: in this series, you will learn about implementing basic, highly available services using Keepalived. I will take you through simple failover situations, as well as a more complex configuration used to respond to external events and trigger failovers. First, we will start with the fundamentals of Keepalived and the Virtual Router Redundancy Protocol (VRRP).

This article is the first in a series of three articles covering everything from basic setup to advanced Linux HA concepts.

Keepalived and VRRP Basics

[Figure: network topology with server1 and server2]

Network symbols in the diagrams available via VRT Network Equipment Extension, CC BY-SA 3.0.

If you've read some of the Enable Sysadmin networking articles, then you know that all sysadmins can benefit from a firm understanding of network fundamentals, and Keepalived is no exception. The protocol that underpins its HA failover is the Virtual Router Redundancy Protocol (VRRP), and Keepalived implements both version 2 and version 3 of this protocol.

It might sound strange that we're using a protocol built for routers on our servers. It turns out that the same networking technology used for providing redundancy to network equipment can also provide redundancy in server environments. Routers are often deployed in pairs, where one router is active and another is standby, ready to go in case the active router fails. These same concepts may be applied to servers.

VRRP uses the concept of a virtual IP address (VIP). One or more hosts (routers, servers, etc.) participate in an election to determine the host that will control that VIP. Only one host (the master) controls the VIP at a time. If the master fails, VRRP provides mechanisms for detecting that failure and quickly failing over to a standby host. In the above topology, server1 is the master and is responsible for the 192.168.122.200 IP address. If server1 fails, then server2 takes over this IP.

It's also worth knowing that Keepalived provides more than just a VRRP implementation: it can also configure the Linux IP Virtual Server (IPVS) for load balancing. Configuring IPVS is outside the scope of this series, but it's good to know that you can use Keepalived to build an all-in-one redundant load balancer for your environment.

VRRP Protocol Operation

VRRP's behavior is specified by RFC 3768 (version 2) and RFC 5798 (version 3). I will be using version 2 in this series of articles. While reviewing the RFC is the best way to fully understand the protocol's behavior, you don't have to be an expert to begin using Keepalived's implementation in your environment. However, basic knowledge of VRRP's behavior will better position you to operate and troubleshoot it.

The first step in VRRP's operation is the election of a master, which determines the server (a "router," in the protocol specification) that will hold the shared IP address. VRRP servers are configured with a priority value, which can be thought of as a weight: the server with the highest priority becomes the owner of a VRRP address. The specification reserves a priority of 255 for the "IP address owner," a router whose real interface address is the virtual IP, and backup routers use values from 1 to 254. In practice, a priority of 255 isn't necessary, because the protocol selects the server with the highest priority, whatever its value.
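The outcome of an election can be sketched in a few lines of Python. This is a simplified model, not how Keepalived is implemented: real VRRP reaches this result through advertisements and timers rather than a single comparison. The `Router` class, `elect_master()` function, server addresses, and priorities are all hypothetical. The tie-break (equal priorities go to the higher primary IP address) follows RFC 3768.

```python
"""Sketch of the VRRP election outcome: highest priority wins.

Router and elect_master() are hypothetical names for illustration only.
"""
from dataclasses import dataclass
from ipaddress import IPv4Address


@dataclass(frozen=True)
class Router:
    name: str
    priority: int            # 1-255; 0 is reserved for a master giving up the role
    primary_ip: IPv4Address  # the server's own address, not the VIP


def elect_master(routers: list[Router]) -> Router:
    """Return the router that should own the VIP.

    The highest priority wins; on a tie, RFC 3768 prefers the router
    whose primary IP address is numerically greater.
    """
    eligible = [r for r in routers if 1 <= r.priority <= 255]
    if not eligible:
        raise ValueError("no eligible routers")
    return max(eligible, key=lambda r: (r.priority, r.primary_ip))


# Hypothetical values: server addresses and priorities are made up.
servers = [
    Router("server1", 150, IPv4Address("192.168.122.101")),
    Router("server2", 100, IPv4Address("192.168.122.102")),
]
print(elect_master(servers).name)  # server1

# Equal priorities: the higher primary IP address wins.
tied = [
    Router("server1", 100, IPv4Address("192.168.122.101")),
    Router("server2", 100, IPv4Address("192.168.122.102")),
]
print(elect_master(tied).name)  # server2
```

Note that neither server uses 255 here, which mirrors the common practice described above.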

Once a master is elected, it sends out advertisements at a regular interval to indicate that it is still alive, and all other servers listen for them. As long as the master is alive, it services traffic for the VIP and sends advertisements. If the master goes offline and the backups stop receiving its advertisements for roughly three advertisement intervals, the backup server with the highest priority takes over. Additionally, a feature called preemption allows a server with a higher priority than the current master to take over automatically when it comes online.
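RFC 3768 explains why the highest-priority backup wins the race: each backup waits `Master_Down_Interval = 3 * Advertisement_Interval + Skew_Time` with no advertisements, where `Skew_Time = (256 - Priority) / 256` seconds, so a higher priority means a shorter wait. The sketch below encodes those formulas and the backup's reaction to an incoming advertisement, including preemption. The function names are hypothetical and Keepalived's internals differ; treat this as a model of the specification, not of the daemon.

```python
"""Sketch of VRRPv2 backup timers and preemption (RFC 3768, section 6).

Function names are hypothetical; Keepalived's internals differ.
"""
from typing import Optional

DEFAULT_ADVERT_INT = 1.0  # seconds; the RFC 3768 default Advertisement_Interval


def skew_time(priority: int) -> float:
    """Skew_Time = (256 - Priority) / 256 seconds."""
    return (256 - priority) / 256


def master_down_interval(priority: int, advert_int: float = DEFAULT_ADVERT_INT) -> float:
    """How long a backup waits without advertisements before taking over."""
    return 3 * advert_int + skew_time(priority)


def on_advertisement(local_priority: int, advert_priority: int, preempt: bool,
                     advert_int: float = DEFAULT_ADVERT_INT) -> Optional[float]:
    """Return the backup's new Master_Down_Timer, or None if the advert is discarded."""
    if advert_priority == 0:
        # The master is stepping down: take over after only Skew_Time.
        return skew_time(local_priority)
    if not preempt or advert_priority >= local_priority:
        # Accept the current master and restart the full countdown.
        return master_down_interval(local_priority, advert_int)
    # Preemption: ignore the lower-priority master and let the timer expire.
    return None


for prio in (254, 150, 100):
    print(prio, master_down_interval(prio))
# 254 3.0078125
# 150 3.4140625
# 100 3.609375
```

With the default one-second interval, a backup at priority 150 declares the master dead about 0.2 seconds before a backup at priority 100, so it claims the VIP first.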

When a master first comes online and takes over an IP address, it broadcasts a gratuitous ARP. This message informs other servers on the network of the MAC address associated with the VIP so that they can address their traffic correctly at Layer 2. It also makes VIP failover faster: hosts don't have to wait for their ARP timers to expire and can simply update their ARP tables with the correct MAC address for the host that owns the VIP.
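To make the gratuitous ARP concrete, the following sketch builds (but does not send) such a frame: an ARP request broadcast to every host, in which the sender and target IP are both the VIP. Sending it would require a raw `AF_PACKET` socket and root privileges, which is left out. The helper names and VRID 51 are hypothetical. The MAC shown is the RFC 3768 virtual router MAC, `00-00-5E-00-01-{VRID}`; as far as I know, Keepalived uses the interface's own MAC by default and only switches to the virtual MAC when its `use_vmac` option is set.

```python
"""Sketch: build (not send) a gratuitous ARP frame announcing a VIP.

gratuitous_arp() and vrrp_virtual_mac() are hypothetical helpers.
"""
import struct
from ipaddress import IPv4Address

ETH_BROADCAST = b"\xff" * 6
ETHERTYPE_ARP = 0x0806
ARP_REQUEST = 1


def vrrp_virtual_mac(vrid: int) -> str:
    """RFC 3768 virtual router MAC: 00-00-5E-00-01-{VRID}."""
    if not 1 <= vrid <= 255:
        raise ValueError("VRID must be 1-255")
    return f"00:00:5e:00:01:{vrid:02x}"


def gratuitous_arp(vip: str, mac: str) -> bytes:
    """Return an Ethernet frame in which `mac` announces ownership of `vip`."""
    src_mac = bytes.fromhex(mac.replace(":", ""))
    ip = IPv4Address(vip).packed
    ethernet = ETH_BROADCAST + src_mac + struct.pack("!H", ETHERTYPE_ARP)
    arp = struct.pack(
        "!HHBBH6s4s6s4s",
        1,            # hardware type: Ethernet
        0x0800,       # protocol type: IPv4
        6, 4,         # MAC and IPv4 address lengths
        ARP_REQUEST,  # gratuitous ARP may also be sent as a reply
        src_mac, ip,  # sender: the MAC that now owns the VIP
        b"\x00" * 6,  # target MAC: unknown
        ip,           # target IP equals sender IP, which makes it gratuitous
    )
    # 14-byte Ethernet header + 28-byte ARP payload; no padding added.
    return ethernet + arp


frame = gratuitous_arp("192.168.122.200", vrrp_virtual_mac(51))  # VRID 51 is hypothetical
print(len(frame))  # 42
```

Every host that receives this frame can overwrite its cached entry for 192.168.122.200 immediately, which is what makes failover fast.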

Packet format

Digging into the theoretical aspects of a protocol's operation can be a bit dull, but it's critical for understanding how a technology operates (and for troubleshooting it when it breaks). If you take a look at the packet structure of a VRRP advertisement using Wireshark, a few things become clearer.

[Figure: Wireshark capture of a VRRP advertisement]

First, you will notice that both the Ethernet and IP destination addresses are multicast addresses; 224.0.0.18 is the IPv4 multicast group reserved for VRRP. Multicast traffic, as the name implies, is sent to multiple hosts on a network that are "listening" to that multicast address. Most networks avoid complex multicast configuration, so switches flood VRRP's multicast traffic to every port on the local network segment, just as they would broadcast traffic.
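The Ethernet destination in the capture is derived from the IP destination. IPv4 multicast addresses map to MAC addresses by placing the low 23 bits of the group address after the fixed prefix `01:00:5e`. The sketch below applies that standard mapping to VRRP's group; the function name is hypothetical.

```python
"""Sketch of the IPv4-multicast-to-Ethernet MAC mapping (hypothetical helper)."""
from ipaddress import IPv4Address

VRRP_GROUP = IPv4Address("224.0.0.18")


def multicast_mac(group: IPv4Address) -> str:
    """Return the Ethernet MAC that carries traffic for an IPv4 multicast group."""
    if not group.is_multicast:
        raise ValueError(f"{group} is not a multicast address")
    low23 = int(group) & 0x7FFFFF  # only the low 23 bits survive the mapping
    octets = bytes([0x01, 0x00, 0x5E]) + low23.to_bytes(3, "big")
    return ":".join(f"{b:02x}" for b in octets)


print(multicast_mac(VRRP_GROUP))  # 01:00:5e:00:00:12
```

If your capture shows 01:00:5e:00:00:12 as the Ethernet destination, you are looking at VRRP's multicast group.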

You can also see that VRRP is neither TCP nor UDP. VRRP uses IP protocol number 112 for its operation. Knowing this protocol number can be important, because you may need to configure your host firewall to permit this traffic from the VRRP servers in your environment.
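Because VRRP sits directly on IP, a filter has to match on the IPv4 header's protocol field rather than a port. The sketch below checks the same fields a firewall rule would, plus two RFC 3768 requirements: advertisements go to 224.0.0.18 and carry a TTL of 255 (receivers discard anything else). `build_ipv4_header()` and `is_vrrp()` are hypothetical helpers, and the header builder skips the IP checksum for brevity.

```python
"""Sketch of the header checks that identify VRRPv2 traffic.

is_vrrp() and build_ipv4_header() are hypothetical helpers.
"""
import struct
from ipaddress import IPv4Address

IPPROTO_VRRP = 112
VRRP_GROUP = IPv4Address("224.0.0.18")


def build_ipv4_header(src: str, dst: str, proto: int, ttl: int = 255) -> bytes:
    """Minimal 20-byte IPv4 header (checksum left at zero for brevity)."""
    return struct.pack(
        "!BBHHHBBH4s4s",
        0x45,        # version 4, header length 5 x 32-bit words
        0,           # DSCP/ECN
        20,          # total length (header only in this sketch)
        0, 0,        # identification, flags/fragment offset
        ttl,
        proto,
        0,           # header checksum (not computed here)
        IPv4Address(src).packed,
        IPv4Address(dst).packed,
    )


def is_vrrp(header: bytes) -> bool:
    """True for IPv4 packets that look like valid VRRP advertisements."""
    if len(header) < 20 or header[0] >> 4 != 4:
        return False
    ttl, proto = header[8], header[9]
    dst = IPv4Address(header[16:20])
    # Protocol 112, sent to 224.0.0.18, with TTL 255 as RFC 3768 requires.
    return proto == IPPROTO_VRRP and ttl == 255 and dst == VRRP_GROUP


print(is_vrrp(build_ipv4_header("192.168.122.101", "224.0.0.18", IPPROTO_VRRP)))  # True
print(is_vrrp(build_ipv4_header("192.168.122.101", "224.0.0.18", 17)))  # False (UDP)
```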

Once you start looking at the VRRP section of the packet, you will notice that it contains all of the information needed to elect a master and inform other servers of the current master:

  • Virtual Router ID (VRID) is a unique identifier, from 1 to 255, for a VRRP instance and its IP addresses (there can be more than one) on a network. You should avoid reusing VRIDs on the same LAN, but they can safely be reused on different Layer 2 networks.
  • Priority is the priority of the host sending the advertisement. Once a master is elected, this is the master's configured priority. The specification reserves 255 for the IP address owner, but many configurations, where no server actually owns the VIP, use a lower value.
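  • Auth Type identifies the authentication method, and Authentication String carries a simple plain-text password that members of the VRRP group use to authenticate each other. Because the password travels in clear text, it guards against misconfiguration rather than deliberate attacks.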
  • Advertisement Interval indicates how often advertisements will be sent out by the master. In this case, the master will send an advertisement every second.
  • IP Address contains one or more IP addresses for which the master is responsible. While this series will only cover failover of a single IP address, it is possible to have VRRP manage multiple IPs.
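The fields above map directly onto the VRRPv2 wire format from RFC 3768: one byte each for version/type, VRID, priority, address count, auth type, and advertisement interval, then a 16-bit checksum, the IPv4 addresses, and 8 bytes of authentication data. The sketch below packs and parses that layout. The `Advertisement` class and helper names are hypothetical, and the VRID, priority, and password values are made up; auth type 1 is the simple text password from RFC 2338, which RFC 3768 marks as reserved.

```python
"""Sketch of packing and parsing a VRRPv2 advertisement (RFC 3768, section 5).

Advertisement and internet_checksum() are hypothetical names.
"""
import struct
from dataclasses import dataclass
from ipaddress import IPv4Address

VRRP_VERSION = 2
VRRP_TYPE_ADVERTISEMENT = 1
AUTH_NONE, AUTH_SIMPLE_TEXT = 0, 1  # type 1: RFC 2338 simple text password


def internet_checksum(data: bytes) -> int:
    """16-bit one's complement of the one's complement sum (RFC 1071)."""
    if len(data) % 2:
        data += b"\x00"
    total = sum(struct.unpack(f"!{len(data) // 2}H", data))
    while total >> 16:
        total = (total & 0xFFFF) + (total >> 16)
    return ~total & 0xFFFF


@dataclass
class Advertisement:
    vrid: int
    priority: int
    advert_int: int  # seconds
    addresses: list[IPv4Address]
    auth_type: int = AUTH_NONE
    auth_data: bytes = b""

    def pack(self) -> bytes:
        header = struct.pack(
            "!BBBBBBH",
            (VRRP_VERSION << 4) | VRRP_TYPE_ADVERTISEMENT,
            self.vrid, self.priority, len(self.addresses),
            self.auth_type, self.advert_int,
            0,  # checksum placeholder
        )
        addrs = b"".join(a.packed for a in self.addresses)
        auth = self.auth_data[:8].ljust(8, b"\x00")  # always 8 bytes
        msg = header + addrs + auth
        checksum = internet_checksum(msg)
        return msg[:6] + struct.pack("!H", checksum) + msg[8:]

    @classmethod
    def unpack(cls, msg: bytes) -> "Advertisement":
        ver_type, vrid, prio, count, auth_type, adv_int, _ = struct.unpack("!BBBBBBH", msg[:8])
        if ver_type >> 4 != VRRP_VERSION or ver_type & 0x0F != VRRP_TYPE_ADVERTISEMENT:
            raise ValueError("not a VRRPv2 advertisement")
        if internet_checksum(msg) != 0:  # a valid message sums to zero
            raise ValueError("bad checksum")
        end = 8 + 4 * count
        addrs = [IPv4Address(msg[i:i + 4]) for i in range(8, end, 4)]
        auth = msg[end:end + 8].rstrip(b"\x00")
        return cls(vrid, prio, adv_int, addrs, auth_type, auth)


adv = Advertisement(vrid=51, priority=100, advert_int=1,
                    addresses=[IPv4Address("192.168.122.200")],
                    auth_type=AUTH_SIMPLE_TEXT, auth_data=b"secret")
packet = adv.pack()
print(len(packet), Advertisement.unpack(packet) == adv)  # 20 True
```

With a single VIP the whole VRRP payload is only 20 bytes, which is why advertisements every second add negligible load to a network.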

Conclusion

This article walked you through the basic protocol that underpins Keepalived, a software implementation of VRRP on Linux. While reviewing protocol specifics may seem dull, it's crucial to understand the network protocols operating in your environment so that you can effectively configure and troubleshoot them. In the next article, you will learn how to install and configure Keepalived.


Anthony Critelli

Anthony Critelli is a Linux systems engineer with interests in automation, containerization, tracing, and performance. He started his professional career as a network engineer and eventually made the switch to the Linux systems side of IT. He holds a B.S. and an M.S.
