First thoughts on Sun Cluster 3.2

Over the past few weeks, I have been heads-down studying for the Sun Cluster 3.2 beta exam. I finally took the certification test this week, and am hopeful that I passed (I am pretty sure I did). Before studying for this exam, my last experience with Sun’s clustering technology was Sun Cluster 2.2. I was not a big fan of it, since it caused several outages at my previous employer and lacked a number of features that were available in Veritas Cluster Server.

When I learned about the Sun Cluster 3.2 exam a few weeks ago, I thought I would give Sun’s clustering technology a second chance (I was very hesitant to spend time with it, but the folks on the Sun Cluster Oasis psyched me up to work with it). I have only worked with Sun Cluster 3.2 for three weeks, but my view of Sun’s clustering technology has completely changed. Sun Cluster 3.2 is an incredible product with some amazingly cool features (and it’s free if you don’t need support!!). Here are a few of my favorites:

– You can create scalable services that run on one or more nodes at the same time. This allows you to turn a pool of web servers (which can run in global or non-global zones) into a large mesh of load-balanced servers.

– Sun Cluster 3.2 comes with data service agents (the entities responsible for starting, stopping and monitoring a given application) for a number of commercial (e.g., Oracle, Oracle RAC) and open source (e.g., BIND, NFS, Apache, Samba) applications.

– Failover of ZFS pools and non-global zones is integrated natively into the product, so you can deploy highly available zones, or use a zpool with one of your HA services.

– Sun Cluster 3.2 uses a single global device namespace to represent devices, so every node refers to a shared device by the same name even though the underlying device names can be different on each host.

– Global file systems are supported, which allows you to mount a file system on one node, and access it from any node in the cluster through the cluster interconnects.

– Sun Cluster 3.2 comes with a new, full-featured command set that actually makes sense (this is the third time Sun has changed the Sun Cluster command set, which has annoyed more than one administrator. I think they finally got it right, so hopefully it won’t change again!).

– There is a thorough set of documents and manual pages that describe the cluster agent API, and how to use it to easily (and I do mean easily) create agents for applications that don’t have a bundled agent.

– Resource and resource group dependencies can be created, and affinities can be used to control where in the cluster a resource group is brought online.

– Sun Cluster Manager (the web-based administrative portal for SC 3.2) has a really nice layout, and allows you to view and manage pretty much every facet of the cluster over an HTTPS connection.

– The Sun Cluster Oasis is run by the folks who developed the code that went into Sun Cluster 3.2. Not only have the developers and architects posted numerous useful examples, they also answer comments left by cluster administrators.
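Resource dependencies are what let the framework bring things online in a safe order, for example storage and a logical hostname before the database that uses them. As an illustration only (this is not how Sun Cluster implements it), that ordering is a topological sort over the dependency graph. The minimal Python 3.9+ sketch below uses the standard `graphlib` module; the resource names are hypothetical, and affinities, which decide *where* a resource group runs rather than in what order, are not modeled.

```python
# Illustrative sketch: derive start/stop order from resource dependencies.
# The resource names are hypothetical, not from a real cluster configuration.
from graphlib import TopologicalSorter

# Each resource maps to the set of resources that must be online before it.
dependencies = {
    "oracle-server-rs": {"hasp-rs", "lh-rs"},  # needs storage + logical host
    "oracle-listener-rs": {"lh-rs"},           # needs the logical host
    "hasp-rs": set(),                          # storage, no dependencies
    "lh-rs": set(),                            # logical hostname, no dependencies
}

def start_order(deps):
    """Return an order where every resource comes after its dependencies.

    Raises graphlib.CycleError (a ValueError subclass) on circular dependencies.
    """
    return list(TopologicalSorter(deps).static_order())

def stop_order(deps):
    """Stopping walks the start order backwards: dependents go down first."""
    return list(reversed(start_order(deps)))

if __name__ == "__main__":
    print("start:", start_order(dependencies))
    print("stop: ", stop_order(dependencies))
```

Treating dependencies as a graph also makes a misconfiguration such as two resources depending on each other fail loudly instead of hanging at startup.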

While Sun cluster 3.2 has some cool features, there are still a few downsides (at least I think they are):

– To take a node out of cluster mode, you have to reboot the server with the “-x” option. This is a royal pain for machines that take a looooong time to boot, and leaves your cluster at risk for longer periods of time (there really needs to be a cluster -n NODENAME stop-cluster option added to stop the cluster framework on one or more nodes without a reboot).

– Log files are scattered throughout the file system rather than centrally located. If you need to look at cluster framework messages, you need to look through /var/adm/messages. If, on the other hand, you want to monitor the Apache or Oracle data service logs, you need to wander to another location. If you need to review the commandlog, there is yet another place to check. Maybe Sun could investigate using a single location for log files (like VCS does).

– The Sun Cluster 3.2 documentation set is riddled with typographical errors, contains a number of examples that don’t match what is being described, has documents that seem to contradict each other, and repeats information in dozens and dozens of places. There is also the issue of docs.sun.com taking days to load documents (why can’t Sun build a scalable site for documentation?).

– While the global device namespace is nifty, it seems silly that you have to allocate a 512MB+ slice (or slices, if you need to mirror the slice to make it highly available) on each node to contain the file system used for global devices (i.e., /globaldevices).

– Sun cluster only supports Solaris, which makes it hard for shops to standardize on a single clustering package.
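Until the logs live in one place, one workaround is to interleave them yourself. The rough Python sketch below is not a Sun Cluster tool. It assumes every input file is already in time order and uses the classic syslog prefix (`Jan  5 10:00:00`), as /var/adm/messages does, and it tags each line with the file it came from. Lines without a timestamp inherit the previous line's time, and year rollover is ignored.

```python
# Rough sketch: merge several syslog-style log files into one time-ordered
# stream. ASSUMPTIONS: each file is already chronological and uses the
# "Mon DD HH:MM:SS" syslog prefix; year rollover is not handled.
import heapq
from typing import Iterable, Iterator, Optional, Tuple

MONTHS = {m: i for i, m in enumerate(
    ["Jan", "Feb", "Mar", "Apr", "May", "Jun",
     "Jul", "Aug", "Sep", "Oct", "Nov", "Dec"], start=1)}

Key = Tuple[int, int, str]  # (month, day, "HH:MM:SS")

def parse_key(line: str) -> Optional[Key]:
    """Return a sortable timestamp key for a syslog-style line, else None."""
    parts = line.split(None, 3)
    if len(parts) < 3 or parts[0] not in MONTHS:
        return None
    try:
        day = int(parts[1])
    except ValueError:
        return None
    clock = parts[2]
    if len(clock) != 8 or clock.count(":") != 2:
        return None
    return (MONTHS[parts[0]], day, clock)

def keyed_lines(source: str, lines: Iterable[str]) -> Iterator[Tuple[Key, str, str]]:
    """Yield (key, source, line); lines without a timestamp reuse the last key."""
    last: Key = (0, 0, "")
    for line in lines:
        key = parse_key(line)
        if key is not None:
            last = key
        yield (last, source, line.rstrip("\n"))

def merge_logs(paths):
    """Lazily merge already time-ordered log files, prefixing each line with its file."""
    handles = [open(p, encoding="utf-8", errors="replace") for p in paths]
    try:
        streams = [keyed_lines(p, h) for p, h in zip(paths, handles)]
        for _key, source, line in heapq.merge(*streams):
            yield f"{source}: {line}"
    finally:
        for h in handles:
            h.close()
```

For example, `merge_logs(["/var/adm/messages", "/path/to/other.log"])` (the second path is a placeholder). A file whose timestamps are not in syslog format, which may include the commandlog, would need its own `parse_key`.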

Since Sun Cluster 3.2 is still relatively new, there isn’t a whole lot of data out there to gauge how reliable and stable it is. VCS is a great clustering framework, but if SC 3.2 is as stable as the folks on the Sun Cluster Oasis claim it is, I think it will definitely give VCS a run for its money on Solaris hosts (hopefully the Sun folks will investigate porting Sun Cluster to Linux).

9 Comments

dean ross-smith  on April 30th, 2007

as far as logging goes, how ’bout setting up a centralized logging server through syslog.conf? I get all of my cluster messages on a host that’s external to the cluster in one /var/adm/messages file …

A couple of things that seem to be cool for creating failover type services… I want to try this but haven’t yet… going off of what I’ve read…
1. Any service that’s set up through the Service Management Facility (SMF, which uses XML to control starting and stopping a daemon instead of the rc and init.d directories) can be set up as a failover service, with no cluster agent necessary. That’s cool.
2. Because zones are supported as failover resources, we should be able to put any non-cluster-aware application that has init scripts for starting and killing it inside a zone, and fail that zone around the cluster. That seems useful to me…

matty  on May 2nd, 2007

Hi Dean,

A centralized syslog solution would work for the cluster engine logs, but couldn’t be used for the agent and command logs. Regarding point number 2, the sczsmf and sczsh resources can greatly assist with this. I am currently working on a project that is deploying highly available zones, and we are looking at using the Sun Cluster sczsh or sczsmf components to start, stop and monitor the applications running inside these zones.

– Ryan
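Whether the start, stop and probe logic comes from a bundled agent or from sczsh/sczsmf scripts inside a zone, something still has to decide what happens when a probe fails: restart the application in place, or give up and move it to another node. The Python sketch below shows one such restart-then-fail-over policy with a sliding time window. It is loosely modeled on the idea behind the Retry_count and Retry_interval resource properties, but the class and method names are hypothetical and this is not the Sun Cluster agent API.

```python
# Illustrative sketch of a restart-then-fail-over decision for a fault monitor.
# HYPOTHETICAL names; loosely modeled on the Retry_count / Retry_interval idea,
# not on the actual Sun Cluster agent API.
from collections import deque
from dataclasses import dataclass, field

@dataclass
class RestartPolicy:
    retry_count: int       # local restarts allowed inside the window
    retry_interval: float  # window length in seconds
    _restarts: deque = field(default_factory=deque, init=False, repr=False)

    def on_probe_failure(self, now: float) -> str:
        """Decide what to do after a failed probe at time `now` (in seconds)."""
        # Forget restarts that have slid out of the window.
        while self._restarts and now - self._restarts[0] > self.retry_interval:
            self._restarts.popleft()
        if len(self._restarts) < self.retry_count:
            self._restarts.append(now)
            return "restart"   # stop + start the application on this node
        self._restarts.clear()
        return "failover"      # move the resource group to another node
```

Keeping the history in a sliding window means an application that crashes once a day keeps getting restarted in place, while one that crash-loops gets moved off the node.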

Joe  on January 23rd, 2008

Matty, I like the blog.
I’ve got a question: does the GlobalDevices slice have to be on each of the local nodes, or can it be on global shared storage like the quorum slice in a two-node config?
I’m just now starting to install Cluster 3.2 for the first time, and the docs don’t really answer this question.

-Joe

Paul  on January 28th, 2008

Hi Matty:

Great blog! Any idea how to get a 2-node Sun Cluster to provide a round-robin Virtual IP address similar to what a load balancer would do? (i.e. when both nodes are running fine, load is balanced but when one is down all traffic goes to the other?)

Thanks!

Paul

Konstantin  on February 5th, 2008

To Joe:
/globaldevices has to be on EVERY cluster node, and yes, it’s in the docs.

To Paul:
The Resource (sun cluster term) is SharedAddress.

– Konstantin
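The behaviour Paul asks about, spreading requests across both nodes while both are up and sending everything to the survivor when one goes down, can be modeled in a few lines. This Python sketch is a toy model of that distribution only. The class and node names are hypothetical, and it says nothing about how the SharedAddress resource or the cluster networking layer actually balances traffic.

```python
# Toy model of round-robin distribution with failover.
# HYPOTHETICAL names; this is not how SharedAddress is implemented.
from itertools import count

class RoundRobin:
    def __init__(self, nodes):
        self.nodes = list(nodes)   # fixed rotation order
        self.up = set(self.nodes)  # nodes currently healthy
        self._turn = count()       # ever-increasing request counter

    def mark_down(self, node):
        self.up.discard(node)

    def mark_up(self, node):
        if node in self.nodes:
            self.up.add(node)

    def pick(self):
        """Return the node that should take the next request."""
        healthy = [n for n in self.nodes if n in self.up]
        if not healthy:
            raise RuntimeError("no healthy nodes")
        return healthy[next(self._turn) % len(healthy)]
```

With two nodes up, `pick()` alternates between them; after `mark_down("node2")`, every request goes to `node1` until `node2` is marked up again.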

JD  on July 1st, 2008

Has anyone had bad VCS experiences on non-Sun systems? I’m currently evaluating solutions.

Tirthankar  on February 1st, 2009

Solaris Cluster 3.2 U2 has been released and comes with many more useful features, like the ability to create zone clusters and run Oracle RAC inside a zone.

Refer to the video blog about SC3.2U2
http://blogs.sun.com/SC/en_US/entry/sc32u2_now_available

Also refer to the SC success story at EMBARQ:
http://www.sun.com/customers/software/embarq.xml

Dr. Kenneth Noisewater  on May 11th, 2009

BTW, I just took the 3.2 class; very neat stuff. However, I was wondering if anyone had any hints of a timeframe for if/when ZFS will be globalized? Or are they thinking of making it clusterable (say, a clustered zpool on shared storage that arbitrates at the zpool level and presents ZFS file systems that are global)?

‘cuz that is probably the one thing that keeps VCS from getting the boot in my org: CFS…

jodi  on June 14th, 2010

Hi, can anyone help me with how to configure an Oracle database as a scalable data service?
