Creating highly available zookeeper clusters


Apache Kafka has become an extremely popular streams processing framework. Kafka uses zoopkeeper to store information about the Kafka brokers. Given the importance of zookeeper it’s critical to ensure your cluster are built to withstand failure. This document will summarize the tasks required to create a 3-node highly available zookeeper cluster.

To get started with zookeeper you will first need to install a supported version of Java on each node that will become part of the zookeeper ensemble. For Fedora-based systems this can be accomplished with the dnf utility:

$ dnf -y install java-1.8.0-openjdk

Once Java is installed you can create a non-privileged user to run the zookeeper java process as:

$ useradd -r zookeeper

Installing zookeeper is pretty straight forward. The following steps can be used to extract and store the latest zookeeper image in /opt:

$ cd /opt

$ wget http://ftp.wayne.edu/apache/zookeeper/zookeeper-3.4.11/zookeeper-3.4.11.tar.gz

$ tar xfvz zookeeper-3.4.11.tar.gz

$ chown -R zookeeper:zookeeper /opt/zookeeper-3.4.11

$ ln -s /opt/zookeeper-3.4.11 /opt/zookeeper

Once the bits are installed you will need to create a zookeeper configuration file:

cd /opt/zookeeper/conf && cat > zoo.cfg << EOF
# Configuration reference: https://zookeeper.apache.org/doc/r3.3.2/zookeeperAdmin.html#sc_configuration

#The length of a single tick, which is the basic time unit used by ZooKeeper, 
# as measured in milliseconds. It is used to regulate heartbeats, and timeouts. 
# For example, the minimum session timeout will be two ticks.
tickTime=2000

# Amount of time, in ticks (see tickTime), to allow followers to connect and sync 
# to a leader. Increased this value as needed, if the amount of data managed by 
# ZooKeeper is large.
initLimit=10

# Amount of time, in ticks (see tickTime), to allow followers to sync with ZooKeeper. 
# If followers fall too far behind a leader, they will be dropped.
syncLimit=5

# Where to store the in memory database snapshots
dataDir=/opt/zookeeper/data

# Location to store the transaction logs
dataLogDir/opt/zookeeper/flash

# The port at which the clients will connect
clientPort=2181

# the maximum number of client connections.
# increase this if you need to handle more clients
#maxClientCnxns=60

# The number of snapshots to retain in dataDir
autopurge.snapRetainCount=3

# Server list
server.1=zoo1:2888:3888
server.2=zoo2:2888:3888
server.3=zoo3:2888:3888
EOF

Zookeeper uses hostnames to represent the nodes in an ensemble so you will need to make sure these are present in DNS (or /etc/hosts). After the configuration file is in place you will need to assign a unique id to each node. This id needs to be placed in the /opt/zookeeper/data/myid file:

node1$ echo 1> /opt/zookeeper/data/myid

node2$ echo 2> /opt/zookeeper/data/myid

node3$ echo 3> /opt/zookeeper/data/myid

If everything completed successfully you should be able to start zookeeper on each host:

node1$ cd /opt/zookeeper && bin/zkServer.sh start

ZooKeeper JMX enabled by default
Using config: /opt/zookeeper/bin/../conf/zoo.cfg
Starting zookeeper ... STARTED

node2$ cd /opt/zookeeper && bin/zkServer.sh start

ZooKeeper JMX enabled by default
Using config: /opt/zookeeper/bin/../conf/zoo.cfg
Starting zookeeper ... STARTED

node3$ cd /opt/zookeeper && bin/zkServer.sh start

ZooKeeper JMX enabled by default
Using config: /opt/zookeeper/bin/../conf/zoo.cfg
Starting zookeeper ... STARTED

To see if everything came up cleanly you can use the init script status option:

node1$ cd /opt/zookeeper && bin/zkServer.sh status

ZooKeeper JMX enabled by default
Using config: /opt/zookeeper/bin/../conf/zoo.cfg
Mode: follower

node2$ cd /opt/zookeeper && bin/zkServer.sh status

ZooKeeper JMX enabled by default
Using config: /opt/zookeeper/bin/../conf/zoo.cfg
Mode: follower

node3$ cd /opt/zookeeper && bin/zkServer.sh status

ZooKeeper JMX enabled by default
Using config: /opt/zookeeper/bin/../conf/zoo.cfg
Mode: leader

The ensemble should consist of one leader and two followers. To get additional information about the cluster you can connect to the client port with netcat and send one more more four letter commands:

node1$ nc -v localhost 2181

Ncat: Version 7.60 ( https://nmap.org/ncat )
Ncat: Connected to ::1:2181.
ruok
imok

node1$ nc -v localhost 2181

Ncat: Version 7.60 ( https://nmap.org/ncat )
Ncat: Connected to ::1:2181.
srvr
Zookeeper version: 3.4.11-37e277162d567b55a07d1755f0b31c32e93c01a0, built on 11/01/2017 18:06 GMT
Latency min/avg/max: 0/0/0
Received: 5
Sent: 4
Connections: 1
Outstanding: 0
Zxid: 0x200000000
Mode: leader
Node count: 4

node1$ nc -v localhost 2181

Ncat: Version 7.60 ( https://nmap.org/ncat )
Ncat: Connected to ::1:2181.
stat
Zookeeper version: 3.4.11-37e277162d567b55a07d1755f0b31c32e93c01a0, built on 11/01/2017 18:06 GMT
Clients:
 /0:0:0:0:0:0:0:1:32988[0](queued=0,recved=1,sent=0)

Latency min/avg/max: 0/0/0
Received: 1
Sent: 0
Connections: 1
Outstanding: 0
Zxid: 0x200000000
Mode: follower
Node count: 4

node1$ nc -v localhost 2181

Ncat: Version 7.60 ( https://nmap.org/ncat )
Ncat: Connected to ::1:2181.
mntr
zk_version	3.4.11-37e277162d567b55a07d1755f0b31c32e93c01a0, built on 11/01/2017 18:06 GMT
zk_avg_latency	0
zk_max_latency	46
zk_min_latency	0
zk_packets_received	776
zk_packets_sent	775
zk_num_alive_connections	1
zk_outstanding_requests	0
zk_server_state	leader
zk_znode_count	26
zk_watch_count	0
zk_ephemerals_count	2
zk_approximate_data_size	692
zk_open_file_descriptor_count	34
zk_max_file_descriptor_count	4096
zk_followers	2
zk_synced_followers	2
zk_pending_syncs	0

If everything came up cleanly you will need to create your SASL and ACL configurations to restrict who can access the cluster. This guide provides the basic steps needed to create a new zookeeper cluster but is far from complete. There is no one size fits all approach to building clusters and some careful thought needs to go into ensuring your cluster will scale to meet your needs. You will want to smoke test your cluster with something like zk-smoketest and ensure you have proper monitoring in place. I’m currently using the prometheus zookeeper exporter and JMX exporters to monitor my clusters.