# ****************************************************
# The information below comes from the Sun Cluster 3.2
# concepts and administration guides. It is summarized
# here for easy access.
# ****************************************************

#
# Sun Cluster daemons
#

rpc.pmfd - The process monitoring facility. It is a general mechanism for
    initiating restarts and failure action scripts for some cluster
    framework daemons.
scqsd - The quorum server daemon.
failfastd - The failfast daemon allows the kernel to panic if certain
    essential daemons have failed.
scprivipd - Provisions IP addresses on the clprivnet0 interface on behalf
    of zones.
qd_userd - Serves as a proxy whenever quorum device activity requires
    execution of a command in userland (for example, a NAS quorum device).
clexecd - Used by cluster kernel threads to execute userland commands (such
    as the run_reserve and dofsck commands). It is also used to run cluster
    commands remotely (such as the cluster shutdown command).
pnmd - The public network management daemon. It manages network status
    information received from the local IPMP daemon running on each node
    and facilitates application failovers caused by complete public network
    failures on nodes.
cl_eventd - Registers and forwards cluster events.
cl_eventlogd - Logs cluster events into a binary log file.
sc_delegated_restarter - Restarts cluster applications that are written as
    SMF services and then placed under control of the cluster using the
    Sun Cluster 3.2 SMF proxy feature.
rpc.fed - The fork-and-exec daemon, which handles requests from rgmd to
    spawn methods for specific data services.
rgmd - The resource group manager, which manages the state of all
    cluster-unaware applications.
scdpmd - Monitors the status of disk paths so that they can be reported in
    the output of the cldev status command.
sc_zonesd - Monitors the state of Solaris 10 non-global zones so that
    applications designed to fail over between zones can react
    appropriately to zone booting and failure.
cl_ccrad - Provides access from userland management applications to the CCR.
cluster - A system process (created by the kernel) that encapsulates the
    kernel threads that make up the core kernel range of operations.

#
# Sun Cluster supports four main topologies
#

- Clustered pair - Two or more pairs of nodes that operate under a single
  cluster framework.
- Pair+N - A pair of nodes is connected to shared disk, and the remaining
  N nodes access the resources through the cluster interconnect.
- N+1 (star) - Contains some number of primary nodes and one secondary
  node. The secondary node has access to all of the shared disk and can
  take over for any of the N nodes should they fail.
- N*N (scalable) - Enables every shared storage device in the cluster to
  connect to every node in the cluster. Since all of the N nodes can see
  the disk, failover can occur to any node.

#
# Cluster membership
#

- Cluster membership is handled through the Cluster Membership Monitor
  (CMM), which performs the following actions:
    - Enforcing a consistent membership view on all nodes (quorum)
    - Driving synchronized reconfiguration in response to membership
      changes
    - Handling cluster partitioning
    - Ensuring full connectivity among all cluster members by leaving
      unhealthy nodes out of the cluster until they are repaired

#
# Cluster configuration repository
#

- The Cluster Configuration Repository (CCR) is a private, cluster-wide,
  distributed database for storing information that pertains to the
  configuration and state of the cluster.
- The CCR structures contain the following types of information:
    - Cluster and node names
    - Cluster transport configuration
    - The names of Solaris Volume Manager disk sets or VERITAS disk groups
    - A list of nodes that can master each disk group
    - Operational parameter values for data services
    - Paths to data service callback methods
    - DID device configuration
    - Current cluster status

#
# Quorum devices / data integrity issues
#

- Quorum devices acquire quorum vote counts that are based on the number
  of node connections to the device. When you set up a quorum device, it
  acquires a maximum vote count of N-1, where N is the number of votes
  connected to the quorum device (that is, the connected nodes with
  nonzero vote counts). For example, a quorum device that is connected to
  two nodes with nonzero vote counts has a quorum count of one (two minus
  one).

- Split brain occurs when the cluster interconnect between nodes is lost
  and the cluster becomes partitioned into subclusters, each of which
  believes that it is the only partition.

- Amnesia occurs if all the nodes leave the cluster in staggered groups.
  An example is a two-node cluster with nodes A and B. If node A goes
  down, the configuration data in the CCR is updated on node B only, and
  not on node A. If node B goes down at a later time and node A is then
  rebooted, node A will be running with old contents of the CCR. This
  state is called amnesia and might lead to running a cluster with stale
  configuration information.

- Failure fencing limits node access to multihost disks by preventing
  access to the disks. When a node leaves the cluster (it either fails or
  becomes partitioned), failure fencing ensures that the node can no
  longer access the disks. Only current member nodes have access to the
  disks, which ensures data integrity. The Sun Cluster system implements
  failure fencing with SCSI disk reservations, which fence failed nodes
  away from the multihost disks.
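
The N-1 vote rule for quorum devices can be checked with a little shell
arithmetic. This is only a sketch: the function names qd_votes and
votes_needed are made up for illustration, each node is assumed to carry
one vote, and the "strict majority of all configured votes" rule is an
assumption added here, since it is not stated in these notes.

```shell
#!/bin/sh
# Votes contributed by a quorum device connected to N voting nodes (N - 1).
qd_votes() {
    echo $(( $1 - 1 ))
}

# ASSUMPTION: the cluster needs a strict majority of all configured votes.
votes_needed() {
    echo $(( $1 / 2 + 1 ))
}

nodes=2                            # ASSUMPTION: two nodes, one vote each
device=$(qd_votes "$nodes")        # quorum device votes: 2 - 1
total=$(( nodes + device ))        # node votes plus device votes
needed=$(votes_needed "$total")    # majority of the total
echo "device=$device total=$total needed=$needed"
```

Under these assumptions, in the two-node case a single surviving node that
holds the quorum device has two of the three votes and can keep running.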

#
# Sun Cluster device layout
#

/global          - contains the actual global devices that are accessed
/dev/global      - contains the global device names
/dev/did         - contains device identifiers for each local device
/dev/md//dsk     - contains the metadevices for the disk set
/dev/vx/dsk/     - contains volumes for the disk group
/globaldevices   - 512MB+ partition used by the cluster

#
# Resource group affinities come in five flavors
#

+    weak positive affinity
++   strong positive affinity
+++  strong positive affinity with failover delegation
-    weak negative affinity
--   strong negative affinity

#
# Resource dependencies come in four flavors
#

resource_dependencies - Hard dependency on another resource.
resource_dependencies_weak - Allows a resource to come online even if the
    resource it depends on isn't available.
resource_dependencies_restart - Restarts a resource if the resource it
    depends on is restarted.
resource_dependencies_offline_restart - Takes a dependent resource offline
    when the resource it depends on fails, and brings it back online when
    that resource is brought back online.

#
# Recovering from amnesia
#

(use at your own risk)

Scenario: Two-node cluster (nodes A and B) with one quorum device (QD).
nodeA has gone bad, and amnesia protection is preventing nodeB from
booting up.

- Boot nodeB in non-cluster mode (boot -x).
- Edit nodeB's file /etc/cluster/ccr/infrastructure as follows:
    - Change the value of "cluster.properties.installmode" from "disabled"
      to "enabled".
    - Change the number of votes for nodeA from "1" to "0" in the
      following property line: "cluster.nodes..properties.quorum_vote".
    - Delete all lines with "cluster.quorum_devices" to remove knowledge
      of the quorum device.
- Run command:
  /usr/cluster/lib/sc/ccradm -i /etc/cluster/ccr/infrastructure -o
- Reboot nodeB in cluster mode.

#
# Misc
#

- The cluster manager runs on TCP port 6789 (over SSL).
- Booting with the "-x" option boots the node into non-cluster mode.
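
The three edits to the infrastructure file in the amnesia recovery
procedure can be rehearsed on a scratch copy before touching the real
file. The sketch below is an illustration only: patch_infrastructure is a
made-up helper, the file is assumed to hold one "key<whitespace>value"
pair per line, and the node IDs and sample keys are invented test data,
since the notes do not give the real node ID. It writes the edited text to
stdout and does not run ccradm.

```shell
#!/bin/sh
# ASSUMPTION: each line of the file is "<key><whitespace><value>".
# ASSUMPTION: the failed node's ID is supplied by the caller.
# Usage: patch_infrastructure FILE NODE_ID   (edited text goes to stdout)
patch_infrastructure() {
    awk -v id="$2" '
        # Drop every line that mentions a quorum device.
        index($0, "cluster.quorum_devices") { next }
        # Turn install mode on.
        $1 == "cluster.properties.installmode" { print $1 "\t" "enabled"; next }
        # Zero the vote count of the failed node.
        $1 == ("cluster.nodes." id ".properties.quorum_vote") { print $1 "\t" "0"; next }
        # Pass everything else through unchanged.
        { print }
    ' "$1"
}

# Demo on invented sample data, never on the live file.
sample=$(mktemp)
printf '%s\t%s\n' \
    cluster.properties.installmode disabled \
    cluster.nodes.1.properties.quorum_vote 1 \
    cluster.nodes.2.properties.quorum_vote 1 \
    cluster.quorum_devices.1.name d4 > "$sample"
patch_infrastructure "$sample" 1
rm -f "$sample"
```

Comparing the output against the original with diff before copying
anything into place keeps the change reviewable.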

#
# Command reference
#

Command                             Use
--------------------------------    -----------------------------------
claccess deny-all                   Stop new nodes from joining cluster
clnode set -p \                     Reboot node if all disk paths fail.
  Reboot_on_path_failure=enabled +  ("+" applies it to all nodes)

#
# Configure a scalable Apache resource in two zones
#

$ clrg create -n snode1:zone1,snode2:zone1 sharedaddr-rg

$ clressharedaddress create -g sharedaddr-rg -h sharedaddr sharedaddr-res

$ clrg create -S -n snode1:zone1,snode2:zone1 apache-rg

$ clresource create -g apache-rg -t SUNW.apache \
    -p resource_dependencies=sharedaddr-res \
    -p Port_list=80/tcp \
    -p Scalable=true \
    -p bin_dir=/usr/apache2/bin \
    apache-res
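
The commands above create the resource groups but do not bring them
online. A possible follow-up is sketched below, wrapped in a dry-run
helper so it only prints the commands by default. The run and
bring_apache_online functions and the DRY_RUN variable are made up for
illustration. Whether SUNW.apache still needs to be registered, and
whether the -M (manage) option fits your setup, are assumptions to check
against the clresourcetype and clresourcegroup man pages.

```shell
#!/bin/sh
# Dry-run wrapper: prints each command unless DRY_RUN=0 is set.
run() {
    if [ "${DRY_RUN:-1}" = "1" ]; then
        echo "+ $*"
    else
        "$@"
    fi
}

bring_apache_online() {
    # ASSUMPTION: the type is not registered yet (normally done before
    # the resource is created).
    run clresourcetype register SUNW.apache
    # Manage the groups and bring them online, shared address first.
    run clresourcegroup online -M sharedaddr-rg
    run clresourcegroup online -M apache-rg
    # Show where the groups ended up.
    run clresourcegroup status
}

bring_apache_online
```

Setting DRY_RUN=0 on a cluster node would run the commands for real.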