Presentation overview
- Tonight I am going to give an overview of CentOS cluster server, and describe
what is needed to build a basic HA cluster
- This presentation assumes a basic understanding of network and clustering
technology, so make sure to ask questions if you aren’t sure about something
What is CentOS cluster server?
- CentOS cluster server is a suite of packages that can be used to deploy highly
available services on CentOS Linux-based servers
- Based on Red Hat Cluster Suite
- Provides three main features:
- Cluster management and service failover
- Network load-balancing (LVS)
- Global read-write file system (GFS)
What is required to run a cluster?
- Two or more servers that are on the hardware compatibility list (HCL)
- Two or more bonded NICs to send cluster heartbeat messages over (this is
optional, but highly recommended!)
- Two or more bonded NICs dedicated to public network traffic
- Supported fencing solution
- Shared storage
What does a cluster consist of?
- An HA cluster typically consists of the following items:
- Two or more nodes
- One or more fence devices
- Shared storage
- Public and private network interfaces
- One or more resources
- One or more services
- Quorum devices
- Failover Domains
Quorum devices
- Quorum is used to ensure that a majority of nodes are available in the
cluster
- Needed to avoid split-brain conditions
- Works by assigning one or more votes to each server and quorum device in
the cluster
- To ensure quorum, a cluster needs more than half of the available votes to
form or continue running an operational cluster
- Most common type of quorum device is a disk device that supports SCSI
persistent reservations
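
The vote arithmetic can be sketched in a few lines of Python. This is only an illustration of the majority rule described above, not code from the cluster suite; the function names and the two-node-plus-quorum-disk layout are assumptions made for the example.

```python
def quorum_threshold(total_votes: int) -> int:
    """Smallest vote count that is a strict majority of total_votes."""
    return total_votes // 2 + 1


def has_quorum(votes_present: int, total_votes: int) -> bool:
    """True when the votes that can still be seen form a strict majority."""
    return votes_present >= quorum_threshold(total_votes)


if __name__ == "__main__":
    # Hypothetical layout: two nodes with one vote each, plus a one-vote quorum disk.
    total = 1 + 1 + 1
    print(quorum_threshold(total))  # votes needed to stay operational
    print(has_quorum(2, total))     # one node plus the quorum disk
    print(has_quorum(1, total))     # an isolated node on its own
```

With these numbers a node that can still see the quorum disk keeps running, while a node that has lost both its peer and the disk does not.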
Fencing devices
- Fencing devices provide a way for the cluster to remove an unresponsive server
from the cluster
- Removing unresponsive nodes ensures that the cluster doesn’t enter a split
brain configuration
- Several supported ways to fence nodes:
- IPMI
- Power Fencing
- SAN fencing
- VMware VirtualCenter fencing
- Vendor specific methods (HP iLO, Dell DRAC, etc.)
Cluster resources
- Cluster resources are the basic unit of configuration in a cluster
- Several types of resources exist by default:
- Apache
- GFS
- MySQL
- Oracle
- Samba
- NFS
- Tomcat
- Virtual machines
Cluster services
- Services are collections of resources that serve a specific purpose
- An example of this would be an HA MySQL service that contains three resources:
- An IP address resource that is tied to the MySQL database instance
- File system resources that contain the data and indexes needed by the
database
- A MySQL resource that starts, stops and verifies that mysql is running
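
One way to picture a service is as a named, ordered group of resources. The sketch below models the MySQL example above in Python; the class names, resource names and the "stop in reverse of start" rule are assumptions chosen for illustration, and this is not how the resource manager is implemented.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class Resource:
    kind: str  # e.g. "fs", "ip" or "mysql"
    name: str  # hypothetical label for the resource


@dataclass
class Service:
    name: str
    resources: List[Resource] = field(default_factory=list)

    def start_order(self) -> List[str]:
        """Resources in the order they were listed."""
        return [f"{r.kind}:{r.name}" for r in self.resources]

    def stop_order(self) -> List[str]:
        """Reverse of the start order, so dependencies go away last."""
        return list(reversed(self.start_order()))


if __name__ == "__main__":
    svc = Service("ha-mysql", [
        Resource("fs", "mysql-data"),
        Resource("ip", "192.168.1.200"),
        Resource("mysql", "mysqld"),
    ])
    print(svc.start_order())
    print(svc.stop_order())
```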
Failover domains
- Failover domains allow you to define which nodes a service can move to when
it faults and has to be migrated to another node
- Each failover domain can have a unique list of nodes, and each node can be
assigned a priority to tell the cluster it is a better candidate to run the
service
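
The selection rule can be sketched as follows. This is a simplified model, assuming that a lower priority number marks a more preferred node and that the service may only run on members of the domain; `pick_node` and the node names are made up for the example.

```python
from typing import Dict, Optional, Set


def pick_node(domain: Dict[str, int], online: Set[str]) -> Optional[str]:
    """Return the most preferred online member of a failover domain.

    domain maps node name -> priority (assumption: lower number wins).
    Returns None when no member of the domain is online.
    """
    candidates = [node for node in domain if node in online]
    if not candidates:
        return None
    # Break priority ties by name so the result is deterministic.
    return min(candidates, key=lambda node: (domain[node], node))


if __name__ == "__main__":
    domain = {"node1.example.com": 1, "node2.example.com": 2}
    print(pick_node(domain, {"node1.example.com", "node2.example.com"}))
    print(pick_node(domain, {"node2.example.com"}))
    print(pick_node(domain, set()))
```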
How do I install CCS?
- Verify your hardware meets the hardware guidelines in the CCS manuals
- Install CentOS on each node
- Install the clustering software on each node
- Create the cluster
- Add fence devices
- Add quorum devices if needed
- Create resources, services and failover domains
- Test, test and test some more!!
Installing the cluster software
- To install CentOS cluster server you can run yum groupinstall on each node in
the cluster:
yum groupinstall "Cluster Storage" "Clustering"
- If the software isn’t already installed on a node, Conga can install the
required packages when you add the node to the cluster
Creating a cluster
- You can create the cluster in one of three ways
- Create /etc/cluster/cluster.conf by hand
- Run system-config-cluster
- Use the conga web interface
- Once the cluster has been created, you can add fence devices, resources,
services and failover domains using one of the methods listed above
Cluster configuration
- The cluster configuration is stored in /etc/cluster/cluster.conf on each node
- Each tag in the cluster.conf file describes a configuration entity, such as
the name of a node in the cluster, the fence device to use for each node, and a
list of resources, services and failover domains
Example cluster.conf
<?xml version="1.0"?>
<cluster name="mycluster" config_version="1">
  <clusternodes>
    <clusternode name="node1.example.com" nodeid="1">
      <fence>
        <method name="1">
          <device name="ipmi-node1"/>
        </method>
      </fence>
    </clusternode>
    <clusternode name="node2.example.com" nodeid="2">
      <fence>
        <method name="1">
          <device name="ipmi-node2"/>
        </method>
      </fence>
    </clusternode>
  </clusternodes>
  <fencedevices>
    <fencedevice agent="fence_ipmilan" name="ipmi-node1"
                 ipaddr="192.168.1.101" login="admin" passwd="secret"/>
    <fencedevice agent="fence_ipmilan" name="ipmi-node2"
                 ipaddr="192.168.1.102" login="admin" passwd="secret"/>
  </fencedevices>
  <rm>
    <failoverdomains>
      <failoverdomain name="domain-1" ordered="1" restricted="1">
        <failoverdomainnode name="node1.example.com" priority="1"/>
        <failoverdomainnode name="node2.example.com" priority="2"/>
      </failoverdomain>
    </failoverdomains>
    <resources>
      <ip address="192.168.1.200" monitor_link="1"/>
    </resources>
    <service name="my-service" domain="domain-1" autostart="1">
      <ip ref="192.168.1.200"/>
    </service>
  </rm>
</cluster>
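
Because the configuration is plain XML, it is easy to sanity-check before pushing it to the nodes. The Python sketch below uses only the standard library to confirm that every fence device a node refers to is actually defined; the function name and the trimmed-down `SAMPLE` string are assumptions for the example, and the check is far from a full validation of the file.

```python
import xml.etree.ElementTree as ET
from typing import Dict, List

# Trimmed copy of the example configuration above, kept short for the sketch.
SAMPLE = """<cluster name="mycluster" config_version="1">
  <clusternodes>
    <clusternode name="node1.example.com" nodeid="1">
      <fence><method name="1"><device name="ipmi-node1"/></method></fence>
    </clusternode>
    <clusternode name="node2.example.com" nodeid="2">
      <fence><method name="1"><device name="ipmi-node2"/></method></fence>
    </clusternode>
  </clusternodes>
  <fencedevices>
    <fencedevice agent="fence_ipmilan" name="ipmi-node1"/>
    <fencedevice agent="fence_ipmilan" name="ipmi-node2"/>
  </fencedevices>
</cluster>"""


def undefined_fence_devices(conf_xml: str) -> Dict[str, List[str]]:
    """Map each node name to the fence devices it uses that are not defined."""
    root = ET.fromstring(conf_xml)
    defined = {d.get("name") for d in root.findall("./fencedevices/fencedevice")}
    missing: Dict[str, List[str]] = {}
    for node in root.findall("./clusternodes/clusternode"):
        used = {d.get("name") for d in node.findall("./fence/method/device")}
        unknown = sorted(used - defined)
        if unknown:
            missing[node.get("name")] = unknown
    return missing


if __name__ == "__main__":
    # An empty dict means every referenced fence device is defined.
    print(undefined_fence_devices(SAMPLE))
```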
Cluster utilities
- There are a number of utilities that can be used to manage a cluster:
- clustat – displays cluster status
- clusvcadm – controls cluster services
- ccs_tool – manages the cluster configuration
- cman_tool – manages the cluster members
- fence_tool – manages fencing operations
- mkqdisk – manages quorum disks
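
If you script around these utilities, it helps to build the command lines in one place. The sketch below assembles clusvcadm argument lists for enabling (-e), disabling (-d) and relocating (-r) a service, with -m naming the target member; the helper name `clusvcadm_args` is made up, and the code only builds the list rather than running anything.

```python
from typing import List, Optional

# Mapping from a readable action name to the clusvcadm flag.
ACTION_FLAGS = {"enable": "-e", "disable": "-d", "relocate": "-r"}


def clusvcadm_args(action: str, service: str,
                   member: Optional[str] = None) -> List[str]:
    """Build the argument list for one clusvcadm call."""
    if action not in ACTION_FLAGS:
        raise ValueError(f"unsupported action: {action}")
    if member is not None and action == "disable":
        raise ValueError("a target member makes no sense when disabling")
    args = ["clusvcadm", ACTION_FLAGS[action], service]
    if member is not None:
        args += ["-m", member]
    return args


if __name__ == "__main__":
    print(clusvcadm_args("relocate", "my-service", "node2.example.com"))
    print(clusvcadm_args("disable", "my-service"))
```

On a cluster node the resulting list could be handed to subprocess.run; that step is left out here so the sketch runs anywhere.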
Cluster processes
- There are a number of processes that make up the cluster suite:
- cman – controls overall cluster operation
- fenced – manages fencing operations
- clurgmgrd – controls resources
- gfs and dlm kernel threads
- The processes (e.g., httpd) that run your application
Debugging cluster problems
- If your cluster is acting up, you will want to review the default logging data
in /var/log/* to see what is going on
- Debug stanzas can be added to each cluster facility to get additional
debugging data:
<logger debug="on" ident="CMAN" to_stderr="yes"/>
- The Red Hat Bugzilla archives are a great resource for finding solutions to
problems, and for troubleshooting sporadic issues
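
When the logs are busy, a small filter can pull out just the cluster daemons. The Python sketch below keeps syslog-style lines written by fenced or clurgmgrd; the daemon tuple, the regular expression and the sample lines are all assumptions for illustration (the sample lines are made up, not real log output), so extend the list to match what your nodes actually log.

```python
import re
from typing import Iterable, List

# Daemon names taken from the process list earlier in the talk; extend as needed.
DAEMONS = ("fenced", "clurgmgrd")
PATTERN = re.compile(r"\b(" + "|".join(DAEMONS) + r")(\[\d+\])?:")

# Made-up lines in a typical syslog layout, used only to exercise the filter.
SAMPLE_LINES = [
    "Jan  1 12:00:00 node1 sshd[99]: session opened",
    "Jan  1 12:00:01 node1 fenced[1234]: example fencing message",
    "Jan  1 12:00:02 node1 clurgmgrd[2345]: example service message",
]


def cluster_lines(lines: Iterable[str]) -> List[str]:
    """Return only the lines logged by one of the cluster daemons."""
    return [line.rstrip("\n") for line in lines if PATTERN.search(line)]


if __name__ == "__main__":
    for line in cluster_lines(SAMPLE_LINES):
        print(line)
```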
Conclusion
- CentOS cluster server has a number of cool features, and won’t cost you a dime
to deploy (you don’t get support though)
- If you decide to use CCS, make SURE you have approved hardware and fencing
devices. If you don’t, you are asking for trouble (and data loss)!