Deploying highly available zones with Solaris Cluster 3.2

I discussed my first impressions of Solaris Cluster 3.2 a while back, and have been using it in various capacities ever since. One thing that I really like about Solaris Cluster is its ability to manage resources running in zones, and to fail those resources over to other zones running on the same host, or to a zone running on a secondary host. Additionally, Solaris Cluster allows you to migrate zones between nodes, which can be quite handy when resources are tied to a zone and can't be managed as a scalable service. Configuring zone failover is a piece of cake, and I will describe how to do it in this blog post.

To get failover working, you will first need to download and install Solaris Cluster 3.2. You can grab the binaries from sun.com, and you can install them using the installer script that comes in the zip file:

$ unzip suncluster_3_2u2-ga-solaris-x86.zip


$ cd Solaris_x86


$ ./installer

Once you run through the installer, the binaries should be placed in /usr/cluster, and you should be ready to configure the cluster. Prior to doing so, you should add something similar to the following to /etc/profile to make life easier for cluster administrators:

PATH=/bin:/usr/bin:/sbin:/usr/sbin:/usr/sfw/bin:/usr/cluster/bin
export PATH

TERM=vt100
export TERM

Also, if you chose to disable remote services during the Solaris install, you will need to reconfigure the rpc/bind service to allow connections from other cluster members (Solaris Cluster uses RPC extensively for cluster communications). This can be accomplished with the svccfg utility:

$ svccfg
svc:> select rpc/bind
svc:/network/rpc/bind> setprop config/local_only=false


Once the property is adjusted, you can refresh the rpc/bind service to put the change into effect:

$ svcadm refresh rpc/bind
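If you want to make sure the change stuck, svcprop can print the property back (it should now report false):

$ svcprop -p config/local_only rpc/bind
false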


Now that the environment is set up, you can run scinstall on each node to configure the cluster. I personally like to configure the first node and then add the second node to the cluster (this requires you to run scinstall on node one, then again on node two), but you can configure everything in one scinstall run if you prefer.
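There isn't much to show for this step, since scinstall drives everything through an interactive menu. On each node, you simply invoke it from the cluster bin directory and follow the prompts (create a new cluster on node one, then add node two to the existing cluster):

$ /usr/cluster/bin/scinstall

Once scinstall completes and the nodes reboot, you should be able to run the cluster command to see if the cluster is operational: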

$ cluster status

=== Cluster Nodes ===

--- Node Status ---

Node Name                                       Status
---------                                       ------
snode1                                          Online
snode2                                          Online


=== Cluster Transport Paths ===

Endpoint1               Endpoint2               Status
---------               ---------               ------
snode1:e1000g2          snode2:e1000g2          Path online
snode1:e1000g1          snode2:e1000g1          Path online


=== Cluster Quorum ===

--- Quorum Votes Summary ---

            Needed   Present   Possible
            ------   -------   --------
            1        1         1


--- Quorum Votes by Node ---

Node Name       Present       Possible       Status
---------       -------       --------       ------
snode1          1             1              Online
snode2          0             0              Online


=== Cluster Device Groups ===

--- Device Group Status ---

Device Group Name     Primary     Secondary     Status
-----------------     -------     ---------     ------


--- Spare, Inactive, and In Transition Nodes ---

Device Group Name   Spare Nodes   Inactive Nodes   In Transition Nodes
-----------------   -----------   --------------   --------------------


--- Multi-owner Device Group Status ---

Device Group Name           Node Name           Status
-----------------           ---------           ------

=== Cluster Resource Groups ===

Group Name       Node Name       Suspended      State
----------       ---------       ---------      -----

=== Cluster Resources ===

Resource Name       Node Name       State       Status Message
-------------       ---------       -----       --------------

=== Cluster DID Devices ===

Device Instance              Node               Status
---------------              ----               ------
/dev/did/rdsk/d1             snode1             Ok
                             snode2             Ok

/dev/did/rdsk/d3             snode1             Ok

/dev/did/rdsk/d5             snode2             Ok


=== Zone Clusters ===

--- Zone Cluster Status ---

Name    Node Name    Zone HostName    Status    Zone Status
----    ---------    -------------    ------    -----------



In the cluster status output above, we can see a two-node cluster containing the hosts snode1 and snode2. If there are no errors in the status output, you can register the HAStoragePlus resource type (this manages disk storage, and allows volumes and pools to fail over between nodes) with the cluster. This can be accomplished with the clresourcetype command:

$ clresourcetype register SUNW.HAStoragePlus

Next you will need to create a resource group, which will contain all of the zone resources:

$ clresourcegroup create hazone-rg

Once the resource group is created, you will need to add an HAStoragePlus resource to manage the UFS file system(s) or ZFS pool your zone lives on. In the example below, a resource named hazone-zpool is created to manage the ZFS pool named hazonepool:

$ clresource create -t SUNW.HAStoragePlus -g hazone-rg -p Zpools=hazonepool -p AffinityOn=True hazone-zpool
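If you haven't already created the pool, it needs to live on shared storage that both nodes can see. Something along these lines would work (the disk name below is just a placeholder for one of your shared LUNs):

$ zpool create hazonepool c2t1d0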

After the storage is configured, you will need to update DNS or /etc/hosts with the name and IP address that you plan to assign to the highly available zone (this is the hostname that clients will use to access services in the highly available zone). For simplicity, I added an entry to /etc/hosts on each node:

# Add a hazone-lh entry to DNS or /etc/hosts
192.168.1.23 hazone-lh

Next you will need to create a logical hostname resource. This resource type is used to manage interface failover, which allows one or more IP addresses to float between cluster nodes. To create a logical hostname resource, you can use the clreslogicalhostname utility:

$ clreslogicalhostname create -g hazone-rg -h hazone-lh hazone-lh

Now that the storage and logical hostname resources are configured, you can bring the resource group that contains these resources online:

$ clresourcegroup online -M hazone-rg
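It's worth verifying that the storage and logical hostname resources actually came online before moving on; the clresourcegroup and clresource status commands will show the state of each:

$ clresourcegroup status hazone-rg

$ clresource status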

If the cluster, clresourcegroup and clresource status commands list everything in an online state, we can create a zone with the zonecfg and zoneadm commands (a minimal sketch of these steps follows the register command below). The zone needs to be installed on each node so that it is placed in the installed state, and the Sun documentation recommends removing the installed zone on the first node prior to installing it on the second node. This works, though I think you can play with the zone index file to simplify the process (that is unsupported, though). Once the zones are installed, you should fail the shared storage over to each node in the cluster and boot the zone there to be extra certain everything works. If it does, you are ready to register the SUNW.gds resource type:

$ clresourcetype register SUNW.gds
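For completeness, here is a minimal sketch of the zonecfg and zoneadm steps mentioned above, assuming the zonepath lives on the hazonepool pool (which matches the zoneadm output shown later). Autoboot is left disabled, since the cluster decides where the zone runs:

$ zonecfg -z hazone
zonecfg:hazone> create
zonecfg:hazone> set zonepath=/hazonepool/zones/hazone
zonecfg:hazone> set autoboot=false
zonecfg:hazone> commit
zonecfg:hazone> exit

$ zoneadm -z hazone install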


The SUNW.gds resource type provides the cluster hooks to bring the zone online, and will optionally start one or more services in a zone. To configure the resource type, you will need to create a configuration file that describes the resources used by the zone, the resource group the resources are part of, and the logical hostname to use with the zone. Here is an example configuration file I used to create my highly available zone:

$ cat /etc/zones/sczbt_config
# The resource group that contains the resources the zones depend on
RG=hazone-rg
# The name of the zone resource to create
RS=hazone-zone
# The directory where this configuration file is stored
PARAMETERDIR=/etc/zones
SC_NETWORK=true
# Name of the logical hostname resource
SC_LH=hazone-lh
# Name of the zone you passed to zonecfg -z
Zonename=hazone
Zonebrand=native
Zonebootopt=""
Milestone="multi-user-server"
FAILOVER=true
# ZFS pool that contains the zone
HAS_RS=hazone-zpool

The Solaris Cluster guide for highly available containers describes each of these parameters, so I won't go into detail on the individual options. To tell the cluster framework that it will be managing the zone, you can execute the sczbt_register command, passing it the configuration file you created as an argument:

$ cd /opt/SUNWsczone/sczbt/util

$ ./sczbt_register -f /etc/zones/sczbt_config


Once the zone is tied into the cluster framework, you can bring the zone resource group (and the zone) online with the clresourcegroup command:

$ clresourcegroup online -n snode2 hazone-rg


If the zone came online (which it should if the steps above completed without error), you should see something similar to the following:

$ clresourcegroup status

=== Cluster Resource Groups ===

Group Name       Node Name       Suspended      Status
----------       ---------       ---------      ------
hazone-rg        snode1          No             Offline
                 snode2          No             Online



$ zoneadm list -vc

  ID NAME             STATUS     PATH                           BRAND    IP    
   0 global           running    /                              native   shared
   1 hazone           running    /hazonepool/zones/hazone       native   shared



$ zlogin hazone zonename
hazone
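If you want to exercise failover by hand, you can switch the resource group to the other node and verify that the zone follows it (using my node names here):

$ clresourcegroup switch -n snode1 hazone-rg

$ clresourcegroup status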



If you have services that you want to start and stop when you bring your zone online, you can use SMF or the ServiceStartCommand, ServiceStopCommand and ServiceProbeCommand SUNW.gds configuration options. Here are a few sample entries that could be added to the configuration file listed above:

ServiceStartCommand="/usr/local/bin/start-customapp"
ServiceStopCommand="/usr/local/bin/stop-customapp"
ServiceProbeCommand="/usr/local/bin/probe-customapp"

As the names indicate, ServiceStartCommand contains the command used to start the service, ServiceStopCommand contains the command used to stop the service, and ServiceProbeCommand contains the command used to verify the service is up and operational.
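To illustrate, here is a trivial sketch of what a probe script might look like. The customapp name is made up, and you should double-check the return codes that SUNW.gds expects (from memory, zero means healthy):

#!/bin/sh
# Hypothetical probe for a made-up customapp daemon: exit 0 if the
# process is running, non-zero otherwise so the cluster knows the
# service is unhealthy (see the SUNW.gds docs for the exact semantics).
if pgrep -x customapp > /dev/null 2>&1; then
    exit 0
fi
exit 100

This is super useful stuff, and it's awesome that zones will now fail over to a secondary node when a server fails, or when a critical error occurs and a zone is unable to run.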

9 thoughts on “Deploying highly available zones with Solaris Cluster 3.2”

  1. Besides setting up a failover filesystem how would you setup a global file system to be mounted on all nodes in the cluster?
    So for example you have node1, node2
    each node is running zones…
    now each zone I would want to mount the same SAN lun… (granted only one zone will be running application at a time so should hopefully avoid clobbering writes)

    Thanks

  2. Unfortunately patching a cluster with failover zones installed is not a piece of cake…

  3. In the command: clresource create -t SUNW.HAStoragePlus -g hazone-rg -p Zpools=hazonepool -p AffinityOn=True hazone-zpool

    is Zpools a predefined/standard property?
    Is it on shared storage? Is hazonepool local to cluster node 1?

    You mention: “Once the zones are installed, you should failover the shared storage to each node in the cluster, and boot the zones to be extra certain”
    How do I failover the shared storage to each node – where have you configured shared storage?

  4. Okay, I made it this far…..
    you suggest: “The zone needs to be installed on each node so the zone gets put into the installed state, and the Sun documentation recommends removing the installed zone on the first node prior to installing it on the second node. ”
    so how do I set the zonepath for the zone, I have one shared zpool that is where the zonepath will be, how do I create two zones on one zonepath?

    Please help

  5. still trying to figure out how to install zones. Also I could not get the zone to boot because, it said address already in use by global zone…..

  6. Okay, I completed all the steps, but on one node the zone is in an installed state and it is in running state on the other node, I tested failover but the zone does not failover in running state – ie if the node it is running on fails the zone fails as well……

  7. It works! Thanks for the excellent blog entry. However one thing still does not work – the global zone controls the logical interface so I was not able to plumb the zones by it, how do I do that?

  8. Does not work for me. Without cluster hooks, the zones start on either node without issue. My issue is that when I put in the zone resource using the /opt/SUNWsczone/sczbt/util/sczbt_register script, the zone stays in the OFFLINE state on both nodes; clresource enable brings it online after a while. The zone then comes online but does not fail over.
