Deploying highly available Oracle databases with Sun Cluster 3.2

While preparing for my Sun cluster 3.2 exam, I got a chance to play with a number of the Sun Cluster 3.2 data services. One of my favorite data services was the Oracle HA data service, which allows Sun cluster to monitor and failover databases in response to system and application failures.

Configuring the Oracle HA data service is amazingly easy, and it took me all of about 5 minutes (plus two hours reading through the Oracle data service documentation, installing Oracle and creating a database). Here are the steps I used to configure Sun Cluster 3.2 to failover an Oracle 10G database between two nodes:

Step 1. Run sccheck

The sccheck utility will verify the cluster configuration, and report errors it finds. This is useful for detecting storage and network misconfigurations early on in the configuration process. To use sccheck, you can run it without any options:

$ sccheck

If sccheck reports errors, make sure to correct them prior to moving on.

Step 2. Create a resource group for the Oracle resources

The Oracle resource group is a container for individual resources, which include virtual IP addresses, disk storage, and resources to manage the Oracle database and listener. To create a new resource group, you can run the clresourcegroup utility with the “create” option and the name of the resource group to create:

$ clresourcegroup create oracleha-rg

After the resource group is created, you will also need to put the resource group into a managed state with the clresourcegroup manage option (you can also set the resource group to the managed state when you create it):

$ clresourcegroup manage oracleha-rg

Once the resource group is in a managed state, you can bring it online with the clresourcegroup “online” option:

$ clrg online oracleha-rg

To verify that the resource group is online, you can run the clresourcegroup command with the “status” option and the name of the resource group to check (if you don’t pass a resource group name as an argument, the status of all resource groups is displayed):

$ clrg status oracleha-rg

=== Cluster Resource Groups ===

Group Name        Node Name      Suspended      Status
----------        ---------      ---------      ------
oracleha-rg       snode1         No             Offline
                      snode2         No             Online

Step 3. Register resource types with the cluster framework

Sun cluster comes with a number of resource types, which provide the brains needed to start, stop and monitor a specific type of resource (e.g., storage, virtual IP addresses, Oracle listeners, zones, etc.). To make these resource types available to the cluster framework, you need to use the clresourcetype utility to register them. The following commands will register the HAStoragePlus, oracle_server and oracle_listener resources with the cluster:

$ clresourcetype register SUNW.oracle_HAStoragePlus

$ clresourcetype register SUNW.oracle_listener

$ clresourcetype register SUNW.oracle_server

To check if a resource type is registered with the cluster framework, you can run the clresourcetype utility with the “list” option:

$ clresourcetype list

SUNW.LogicalHostname:2
SUNW.SharedAddress:2
SUNW.derby
SUNW.sctelemetry
SUNW.HAStoragePlus:4
SUNW.oracle_listener:5
SUNW.oracle_server:6

Step 4. Configure HAStoragePlus resource

The Oracle database will store datafiles and controlfiles on some type of shared storage. If the node that is running the database fails, the cluster framework will migrate the shared storage to another cluster member when the database is brought online on an alternate cluster node. To facilitate the storage migration, an HAStoragePlus resource needs to be configured with the shared storage resources. In addition to mounting and unmounting file systems, the HAStoragePlus resource will also monitor the health of the storage.

To create an HAStoragePlus resource that controls the ZFS pool named oracle, the clresource command can be executed with the “create” command, the “-t” option and the type of resource to create, the “-g” option and the name of the resource group to assign the resource to, the “Zpools” option and the name of the ZFS pool to manage, an affinity value to ensure that the storage will be online on the same node as the other resources in the resource group and the name of the resource to create:

$ clresource create -t SUNW.HAStoragePlus \
-g oracleha-rg \
-p Zpools=oracle \
-p AffinityOn=True oracledisk-res

To verify that the ZFS pool is imported and the file systems are online, you can run df to view the file systems, and the clresource utility with the “status” option to view the state of the HAStoragePlus resources:

$ df -h | egrep ‘(File|oracle)’

Filesystem             size   used  avail capacity  Mounted on
oracle                  35G    24K    35G     1%    /u01
oracle/u02              35G    24K    35G     1%    /u02
oracle/u03              35G    24K    35G     1%    /u03

$ clresource status oracledisk-res

=== Cluster Resources ===

Resource Name  Node Name   State       Status Message
-------------       ---------        -----        --------------
oracledisk-res    snode1         Offline      Offline
                       snode2         Online      Online

If the file systems don’t mount for some reason, you can check /var/adm/messages to see why.

Step 5. Configure a logical hostname resource

In order for clients to automatically adapt to database failures, they will need to be smart enough to reconnect to the database, and will also need to be configured to connect to a virtual IP address that is configured on the active database node. Virtual IP addresses are managed by the Sun cluster LogicalHostname resource type (you can also configure Scalable addresses that are mastered on more than one node, but that is a topic for a subsequent blog entry). To configure a LogicalHostname resource, you will first need to add the virtual IP address and the name associated with that address to /etc/hosts on each cluster node. After the entry is added, the clreslogicalhostname utility can be run with the “create” option, the “-g” option and the resource group to assign the logical hostname to, the “-h” option and the hostname associated with the virtual IP and a name to assign to the resource:

$ clreslogicalhostname create -g oracleha-rg -h oracle-lh oraclelh-res

To verify that the LogicalHostname resource is online, you can run ifconfig to see if the virtual IP address is configured on one of the interfaces:

$ grep oracle-lh /etc/hosts
192.168.1.32 oracle-lh

$ ifconfig -a | grep 192.168.1.32
inet 192.168.1.32 netmask ffffff00 broadcast 192.168.1.255

Step 6. Configure Oracle monitoring user

For the cluster framework to detect failures with Oracle (e.g., an Oracle internal error), an Oracle user needs to be created. This user is used by the Oracle data service to login to the database and check the operational status of the database. There are two ways the oracle data service checks to see if the Oracle database is working correctly. The first method used is a SQL select statement against the v$archive_dest and the v$sysstat views. If the values the agent retrieves from these views don’t change between consecutive runs, the agent will attempt to create a table to force the database to perform some work. For the Oracle data service to function correctly, an Oracle user needs to be created and assigned privileges to perform these operations.

To create a user named monitor that can create a table in the users tablespace and query the two views listed above, the following SQL can be used:

$ sqlplus “/ as sysdba”
sql> grant connect, resource to monitor identified by COMPLEXPASSWORD;
sql> alter user monitor default tablespace users quota 1m on users;
sql> grant select on v_$sysstat to monitor;
sql> grant select on v_$archive_dest to monitor;
sql> grant create session to monitor;
sql> grant create table to monitor;

To verify that the monitoring user works, you can connect to the database as the monitoring user:

$ sqlplus “monitor/COMPLEXPASSWORD@proddb”

Once connected, you can query both system views, and then attempt to create a table in the user’s default tablespace (in this case, the user is configured to use the users tablespace):

sql> select * from v$sysstat;

sql> select * from v$archive_dest;

sql> create table foo (a number);

sql> drop table foo;

Step 7. Create Oracle server and listener resources

After the file systems and IP addresses are online and the monitoring user is created, oracle_server and oracle_listener resources need to be created. The oracle_server resource is used to stop, start and monitor an Oracle database, and the oracle_listener resource is used to stop, start and monitor the Oracle listener.

To create a new oracle_server resource, the clresource utility can be run with the “create” option and a list of properties (e.g., where the alert log is located, the name of a user to monitor the database, the ORACLE_SID, the ORACLE_HOME, etc) that are needed to start and stop the database:

$ clresource create -g oracleha-rg \
-t SUNW.oracle_server \
-p Connect_string=monitor/COMPLEXPASSWORD \
-p ORACLE_SID=proddb \
-p ORACLE_HOME=/opt/oracle/product/10.2.0.1.0 \
-p Alert_log_file=/opt/oracle/admin/proddb/bdump/alert_proddb.log \
-p resource_dependencies=oracledisk-res oracledb-res

To create an oracle_listener resource, the clresource utility can be run with the “create” option and a list of properties (e.g., listener name, where the ORACLE_HOME resides, etc.) that are needed to start and stop the listener:

$ clresource create -g oracleha-rg \
-t SUNW.oracle_listener \
-p LISTENER_NAME=LISTENER_PRODDB \
-p ORACLE_HOME=/opt/oracle/product/10.2.0.1.0 \
-p resource_dependencies=oracledisk-res,oraclelh-res oraclelistener-res

There is one important thing to note in the commands listed above. Each resource contains a resource dependency, which the cluster framework uses to bring resource online in a specific order. The oracle_server needs the oraceldisk-res to be online prior to starting, since the database won’t be able to start unless the file systems are mounted. The oracle_listener also has a dependency on the oracle-disk-res resource, as well as a dependency on the logical host resource (this ensures that the virtual IP is up and operational prior to starting the listener). To verify that the resources are online and working, the clresource command can be run with the “status” option and the name of a resource:

$ clresource status oracledb-res

=== Cluster Resources ===

Resource Name       Node Name      State        Status Message
-------------       ---------      -----        --------------
oracledb-res        snode1         Offline      Offline
                    snode2         Online       Online

$ clrs status oraclelistener-res

=== Cluster Resources ===

Resource Name          Node Name     State      Status Message
-------------          ---------     -----      --------------
oraclelistener-res     snode1        Offline    Offline
                       snode2        Online     Online

Step 8: Verify the cluster configuration

Once all of the resources are created and the dependencies configured, you should verify that the Oracle database can successfully fail over to each node that is configured in the resource group’s node list. This can be accomplished by running the clresourcegroup utility with the “switch” option, the “-n” option and the name of the node to fail the resource group to, and the name of the resource group to failover:

$ clresourcegroup switch -n snode1 oracleha-rg

If you are able to successfully migrate the database to each node in the cluster, you now have a highly available Oracle database. While I touched on the basics of deploying a highly available Oracle database, I didn’t touch on correctly configuring the cluster framework, or the steps required to achieve maximum availability. I highly recommend reading through the Sun cluster concepts and Oracle data service guides prior to setting up your highly available databases. While the steps listed above work flawlessly in the authors environment, you should verify everything in a test environment prior to deploying anything in production.

15 thoughts on “Deploying highly available Oracle databases with Sun Cluster 3.2”

  1. Hi:

    I’m interested into running several oracle databases into a cluster. I
    guess I must create 1 resource group (ip, directory, etc) per
    database? I’m thinking about 10-30 oracles (15 per node) and 30 in one
    node if one node fails.

    Do you know how much memory I need to run every oracle? (database file
    size goes from 3 Gb to 30 Gb).

    Of course, I can’t use only 1 database, I must use all those databases
    in HA

    Thank you.

  2. Thanks for this amazing article. I was wondering if it’s possible to setup the exact same thing for an apache web server? Basically, I’m trying to setup one master apache web server and a hot standby apache web server.

    your help is greatly appreciated!!!!!

    Thanks

    -Hicham

  3. Great outline of the steps. I have done many active/passive oracle clusters but am doing my first active/active oracle setup. Since both resource groups require a oracle listener, do the port numbers have to be different to avoid conflict when they both run on the same node? Could both resource groups use the same listener? I would not think that would be best since the listener would stop/start on a failover and all the db’s would be impacted.

    Thx

  4. Great post

    why in oracleha-rg resource group alert log file in local storage it means that each database instance using the two alert log files it difficult to track the alert log file

    advance thanks for replay

  5. Apache can be configured the exact same way (active-passive) OR can be configured to be active-active->active (on all nodes).

    Oracle cannot be active-active without using Oracle’s RAC.

    Just FYi stuff. Thanks for the article!

  6. It is possible to have two oracle instances configured in the same Sun Cluster?

    Can i specify another SID in the oracle resource and another listener in the listener resource?

    how it works?

    Thnak you in advance

  7. Need to make one change….

    clresourcetype register SUNW.oracle_HAStoragePlus

    Needs to read

    clresourcetype register SUNW.HAStoragePlus

    Sun changed the resource in the bundle.
    Just got it straight from the horses mouth.
    -Corry

    P.S. Awesome writeup … thank you!

  8. This is an excellent doc explaning clear step by step commands. Would you also document the steps for installing & configuring sun cluster 3.2. The BigAdmin doc is very lenghty to understand for the beginners.

    Thanks

  9. I can’t afford to take a training class and the docs are heady stuff, your blog would be helpful if it included how to set up the database – does it need to be setup on both nodes etc.

    This blog would be, in my opinion a one stop to learn oracle HA on Sun Cluster if it contained how to create the database and how to install Oracle in a cluster environment.

    Online docs show how to install it in a non clustered environment but not to how to install it in a clustered environment.

  10. Hi,

    I would like add an oracle instance to an existent resource with some Oracle instances. How do I add an instance to an existing resource?. I need to add another resource?. Thanks in advance

Leave a Reply

Your email address will not be published. Required fields are marked *