Deploying highly available zones with Solaris Cluster 3.2


I discussed my first impressions of Solaris Cluster 3.2 a while back, and have been using it in various capacities ever since. One thing that I really like about Solaris Cluster is its ability to manage resources running in zones, and fail these resources over to other zones running on the same host, or to a zone running on a secondary host. Additionally, Solaris Cluster allows you to migrate zones between nodes, which can be quite handy when resources are tied to a zone and can’t be managed as a scalable service. Configuring zone failover is a piece of cake, and I will describe how to do it in this blog post.

To get failover working, you will first need to download and install Solaris Cluster 3.2. You can grab the binaries from sun.com, and you can install them using the installer script that comes in the zip file:

$ unzip suncluster_3_2u2-ga-solaris-x86.zip

$ cd Solaris_x86

$ ./installer

Once you run through the installer, the binaries should be placed in /usr/cluster, and you should be ready to configure the cluster. Prior to doing so, you should add something similar to the following to /etc/profile to make life easier for cluster administrators:

PATH=/bin:/usr/bin:/sbin:/usr/sbin:/usr/sfw/bin:/usr/cluster/bin
export PATH

TERM=vt100
export TERM

Also, if you chose the option to disable remote services during the Solaris install, you should reconfigure the rpc/bind service to allow connections from other cluster members (Solaris Cluster uses RPC extensively for cluster communications). This can be accomplished with the svccfg utility:

$ svccfg

svc:> select rpc/bind
svc:/network/rpc/bind> setprop config/local_only=false

Once the properties are adjusted, you can refresh the rpc/bind service to get these properties to go into effect:

$ svcadm refresh rpc/bind
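
To double check that the change took effect, you can query the property with svcprop; it should now report false:

$ svcprop -p config/local_only rpc/bind
false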

Now that the environment is set up, you can run scinstall on each node to configure the cluster. I personally like to configure the first node and then add the second node to the cluster (this requires you to run scinstall on node one, then again on node two), but you can configure everything in one scinstall run if you prefer. Once scinstall completes and the nodes reboot, you should be able to run the cluster command to see if the cluster is operational:

$ cluster status

=== Cluster Nodes ===

--- Node Status ---

Node Name                                Status
---------                                ------
snode1                                   Online
snode2                                   Online


=== Cluster Transport Paths ===

Endpoint1               Endpoint2               Status
---------               ---------               ------
snode1:e1000g2          snode2:e1000g2          Path online
snode1:e1000g1          snode2:e1000g1          Path online


=== Cluster Quorum ===

--- Quorum Votes Summary ---

            Needed   Present   Possible
            ------   -------   --------
            1        1         1


--- Quorum Votes by Node ---

Node Name       Present       Possible       Status
---------       -------       --------       ------
snode1          1             1              Online
snode2          0             0              Online


=== Cluster Device Groups ===

--- Device Group Status ---

Device Group Name     Primary     Secondary     Status
-----------------     -------     ---------     ------


--- Spare, Inactive, and In Transition Nodes ---

Device Group Name   Spare Nodes   Inactive Nodes   In Transition Nodes
-----------------   -----------   --------------   -------------------


--- Multi-owner Device Group Status ---

Device Group Name           Node Name           Status
-----------------           ---------           ------

=== Cluster Resource Groups ===

Group Name       Node Name       Suspended       State
----------       ---------       ---------       -----

=== Cluster Resources ===

Resource Name       Node Name       State       Status Message
-------------       ---------       -----       --------------

=== Cluster DID Devices ===

Device Instance              Node         Status
---------------              ----         ------
/dev/did/rdsk/d1             snode1       Ok
                             snode2       Ok

/dev/did/rdsk/d3             snode1       Ok

/dev/did/rdsk/d5             snode2       Ok


=== Zone Clusters ===

--- Zone Cluster Status ---

Name   Node Name   Zone HostName   Status   Zone Status
----   ---------   -------------   ------   -----------

In the cluster status output above, we can see that we have a 2-node cluster that contains the hosts named snode1 and snode2. If there are no errors in the status output, you can register the HAStoragePlus resource type (this manages disk storage, and allows volumes and pools to fail over between nodes) with the cluster. This can be accomplished with the clresourcetype command:

$ clresourcetype register SUNW.HAStoragePlus
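
To confirm the registration, you can list the registered resource types and make sure SUNW.HAStoragePlus shows up in the output:

$ clresourcetype list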

Next you will need to create a resource group, which will contain all of the zone resources:

$ clresourcegroup create hazone-rg
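
By default the resource group is allowed to run on every node in the cluster. If you want to limit it to specific nodes, you can pass a node list when you create the group (the command below just lists the two nodes from this cluster explicitly):

$ clresourcegroup create -n snode1,snode2 hazone-rg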

Once the resource group is created, you will need to add a HAStoragePlus resource to manage the UFS file system(s) or ZFS pool your zone lives on. In the example below, a HAStoragePlus resource named hazone-zpool is added to manage the ZFS pool named hazonepool:

$ clresource create -t SUNW.HAStoragePlus -g hazone-rg -p Zpools=hazonepool -p AffinityOn=True hazone-zpool
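
The Zpools property assumes the hazonepool pool already exists on storage that both nodes can access. If you haven't created it yet, you can build it from one node (c1t2d0 is just a placeholder; substitute a shared disk from your environment). HAStoragePlus takes care of exporting and importing the pool as the resource group moves between nodes:

$ zpool create hazonepool c1t2d0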

After the storage is configured, you will need to update DNS or /etc/hosts with the name and IP address that you plan to assign to the highly available zone (this is the hostname that clients will use to access services in the highly available zone). For simplicity, I added an entry to /etc/hosts on each node:

# Add a hazone-lh entry to DNS or /etc/hosts
192.168.1.23 hazone-lh

Next you will need to create a logical hostname resource. This resource type is used to manage interface failover, which allows one or more IP addresses to float between cluster nodes. To create a logical hostname resource, you can use the clreslogicalhostname utility:

$ clreslogicalhostname create -g hazone-rg -h hazone-lh hazone-lh
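
At this point both resources exist but will remain offline until the resource group is brought online. You can verify their state with the clresource status command:

$ clresource status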

Now that the storage and logical hostname resources are configured, you can bring the resource group that contains these resources online:

$ clresourcegroup online -M hazone-rg

If the cluster, clresourcegroup and clresource status commands list everything in an online state, we can create a zone with the zonecfg and zoneadm commands. The zone needs to be installed on each node so that it is placed in the installed state, and the Sun documentation recommends removing the installed zone from the first node prior to installing it on the second node. This will work, though I think you can play with the index file to simplify this process (this is unsupported though). Once the zones are installed, you should fail the shared storage over to each node in the cluster and boot the zone on each one to be extra certain.
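
Here is a rough sketch of the zonecfg and zoneadm steps I used on the first node (the same configuration gets repeated on the second node once the shared storage has been switched over). The zonepath has to live on the shared pool, which is why it sits under /hazonepool, and autoboot is left disabled since the cluster, not the operating system, is responsible for booting the zone. Adjust the settings for your environment:

$ zonecfg -z hazone
zonecfg:hazone> create
zonecfg:hazone> set zonepath=/hazonepool/zones/hazone
zonecfg:hazone> set autoboot=false
zonecfg:hazone> commit
zonecfg:hazone> exit

$ zoneadm -z hazone install
$ zoneadm -z hazone boot

If the zone boots cleanly on each node you test it on, you are ready to register the SUNW.gds resource type: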

$ clresourcetype register SUNW.gds

The SUNW.gds resource type provides the cluster hooks to bring the zone online, and will optionally start one or more services in a zone. To configure the resource type, you will need to create a configuration file that describes the resources used by the zone, the resource group the resources are part of, and the logical hostname to use with the zone. Here is an example configuration file I used to create my highly available zone:

$ cat /etc/zones/sczbt_config

# The resource group that contains the resources the zones depend on
RG=hazone-rg
# The name of the zone resource to create
RS=hazone-zone
# The directory where this configuration file is stored
PARAMETERDIR=/etc/zones
SC_NETWORK=true
# Name of the logical hostname resource
SC_LH=hazone-lh
# Name of the zone you passed to zonecfg -z
Zonename=hazone
Zonebrand=native
Zonebootopt=""
Milestone="multi-user-server"
FAILOVER=true
# ZFS pool that contains the zone
HAS_RS=hazone-zpool

The Solaris Cluster data service documentation for Solaris Containers describes each of these parameters, so I won’t go into detail on the individual options. To tell the cluster framework that it will be managing the zone, you can execute the sczbt_register command, passing it the configuration file you created as an argument:

$ cd /opt/SUNWsczone/sczbt/util

$ ./sczbt_register -f /etc/zones/sczbt_config
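
The register script creates a SUNW.gds resource named after the RS value in the configuration file (hazone-zone in this case) and places it in the hazone-rg resource group. You can confirm that it was created by listing the resources in the group:

$ clresource list -g hazone-rg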

Once the zone is tied into the cluster framework, you can bring the zone resource group (and the zone) online with the clresourcegroup command:

$ clresourcegroup online -n snode2 hazone-rg

If the zone came online (which it should have if all of the steps above completed successfully), you should see output similar to the following:

$ clresourcegroup status

=== Cluster Resource Groups ===

Group Name       Node Name       Suspended      Status
----------       ---------       ---------      ------
hazone-rg        snode1          No             Offline
                 snode2          No             Online

$ zoneadm list -vc

  ID NAME      STATUS     PATH                       BRAND    IP
   0 global    running    /                          native   shared
   1 hazone    running    /hazonepool/zones/hazone   native   shared

$ zlogin hazone zonename
hazone
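
To make sure failover actually works end to end, you can switch the resource group (and with it the zone) over to the other node and back again. The zone will be halted on snode2, the ZFS pool and logical hostname will move, and the zone will boot on snode1:

$ clresourcegroup switch -n snode1 hazone-rg

Running clresourcegroup status and zoneadm list -vc after the switch should show the zone running on snode1.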

If you have services that you want to start and stop when you bring your zone online, you can use SMF or the ServiceStartCommand, ServiceStopCommand and ServiceProbeCommand SUNW.gds configuration options. Here are a few sample entries that could be added to the configuration file listed above:

ServiceStartCommand="/usr/local/bin/start-customapp"
ServiceStopCommand="/usr/local/bin/stop-customapp"
ServiceProbeCommand="/usr/local/bin/probe-customapp"

As the names indicate, ServiceStartCommand contains the command to run to start the service, ServiceStopCommand contains the command to run to stop the service, and ServiceProbeCommand contains the command to run to verify the service is up and operational. This is super useful stuff, and it’s awesome that zones will now fail over to a secondary node when a server fails, or when a critical error occurs and a zone is unable to run.
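
If you go the ServiceProbeCommand route, the probe just needs to be an executable that exits zero when the service is healthy and non-zero when it isn't. Here is a hypothetical example (the customapp daemon name and script path are made up) that checks whether the application's process is running:

#!/bin/sh
# /usr/local/bin/probe-customapp -- hypothetical probe script for a
# made-up "customappd" daemon. An exit status of 0 tells SUNW.gds the
# service is healthy; a non-zero exit status reports a failure.

if /usr/bin/pgrep -x customappd > /dev/null 2>&1; then
    exit 0
fi

# 100 is commonly used to signal a complete failure to the GDS probe logic
exit 100

The start and stop scripts follow the same convention, exiting 0 when they complete successfully.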

This article was posted by Matty on 2009-04-10 11:17:00 -0400