I manage several V40Zs running Solaris 10, and these servers utilize the built-in hardware RAID controller. Since the physical spindles are masked off from the operating system, using a tool like smartmontools to check disk health is not an option. Luckily Solaris ships with the raidctl utility, which provides insight into the status of both the controller and the disks that sit behind that controller:
$ raidctl
RAID        Volume  RAID            RAID            Disk
Volume      Type    Status          Disk            Status
------------------------------------------------------
c1t0d0      IM      OK              c1t0d0          OK
                                    c1t1d0          OK
Since raidctl will display a disk fault when a drive fails, I run a shell wrapper from cron every fifteen minutes to check the RAID controller status. If the script detects a problem, it will send an email and generate a syslog entry to let folks know a problem exists. Viva la hardware RAID!
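For the curious, the wrapper boils down to something along the lines of the sketch below (the recipient address, script path and crontab schedule are placeholders rather than a copy of the script I actually run):

#!/bin/sh
# check_raid.sh -- cron wrapper that checks raidctl for unhealthy volumes or disks.
PATH=/usr/sbin:/usr/bin; export PATH

STATUS=`raidctl 2>&1`

# Complain if raidctl reports a volume or member disk as DEGRADED or FAILED.
if echo "${STATUS}" | egrep "DEGRADED|FAILED" > /dev/null 2>&1; then
    logger -p daemon.err "raidctl reports a RAID fault on `hostname`"
    echo "${STATUS}" | mailx -s "RAID fault on `hostname`" admin@example.com
fi

The script gets kicked off every fifteen minutes with a crontab entry similar to this:

0,15,30,45 * * * * /usr/local/bin/check_raid.sh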
I had an application go nuts a week or two ago, and it filled up /tmp on one of my Solaris 10 hosts. Since /tmp is an in-memory file system, you can only imagine the chaos this caused. :( To ensure that this never happens again, I modified the tmpfs entry in /etc/vfstab to limit tmpfs to 1GB in size:
$ grep ^swap /etc/vfstab
swap - /tmp tmpfs - yes size=1024m
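The new cap takes effect the next time /tmp is mounted (a reboot in my case), and a quick df confirms the limit is in place (the usage numbers below are illustrative):

$ df -h /tmp
Filesystem             size   used  avail capacity  Mounted on
swap                   1.0G    40K   1.0G     1%    /tmp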
That will teach that pesky application. :)
While reading through the VxFS administrator's guide last week, I came across a cool mount option that can be used to zero out file system blocks prior to use:
“In environments where performance is more important than absolute data integrity, the preceding situation is not of great concern. However, for environments where data integrity is critical, the VxFS file system provides a mount -o blkclear option that guarantees that uninitialized data does not appear in a file.”
This is pretty cool, and a useful feature for environments that are super concerned about data integrity.
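For example, block clearing could be enabled at mount time with something like the following (the disk group, volume and mount point are placeholders):

$ mount -F vxfs -o blkclear /dev/vx/dsk/oradg/oravol01 /u01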
One cool feature that is built into VxFS is the ability to preallocate files sequentially on disk. This capability can benefit sequential workloads, and will typically result in higher throughput since disk seek times are minimized (LBA addressing, disk drive defect management and storage array abstractions can sometimes obscure this, so the benefit may not always be realized).
To use the VxFS preallocation features, a file first needs to be created:
$ dd if=/dev/zero of=oradata01.dbf count=2097152
2097152+0 records in
2097152+0 records out
In this example, I created a 1GB file (2097152 blocks at 512 bytes per block gives us 1GB) named oradata01.dbf, and double-checked that it was 1GB by running ls with the “-h” option:
$ ls -lh
total 3.1G
-rw-r--r--   1 root root 1.0G Aug 25 09:06 oradata01.dbf
After a file of the correct size has been allocated, the setext utility can be used to reserve blocks for that file, and to create an extent that matches the number of blocks allocated to the file:
$ setext -r 2097152 -e 2097152 oradata01.dbf
To verify the settings that were assigned to the file, the getext utility can be used:
$ getext oradata01.dbf
oradata01.dbf: Bsize 1024 Reserve 2097152 Extent Size 2097152
This is an awesome feature, and yet another reason why VxFS is one of the best file systems available today!
Veritas Cluster Server stores custom agents and its configuration data as a series of files in the /etc, /etc/VRTSvcs/conf/config and /opt/VRTSvcs/bin directories. Since these files are the lifeblood of the cluster engine, it is important to back them up to ensure the cluster can be recovered should disaster hit. VCS comes with the hasnap utility to simplify cluster configuration backups, and when run with the “-backup,” “-n,” “-f <filename>,” and “-m <description>” options, a point-in-time snapshot of the cluster configuration will be written to the file passed to the “-f” option:
$ hasnap -backup -f clusterbackup.zip -n -m "Backup from March 25th 2007"
Starting Configuration Backup for Cluster foo
Dumping the configuration...
Registering snapshot "foo-2006.08.25-1156511358610"
Contacting host lnode1...
Error connecting to the remote host "lnode1"
Starting backup of files on host lnode2
"/etc/VRTSvcs/conf/config/types.cf" ----> 1.0
"/etc/VRTSvcs/conf/config/main.cf" ----> 1.0
"/etc/VRTSvcs/conf/config/vcsApacheTypes.cf" ----> 1.0
"/etc/llthosts" ----> 1.0
"/etc/gabtab" ----> 1.0
"/etc/llttab" ----> 1.0
"/opt/VRTSvcs/bin/vcsenv" ----> 1.0
"/opt/VRTSvcs/bin/LVMVolumeGroup/monitor" ----> 1.0
"/opt/VRTSvcs/bin/LVMVolumeGroup/offline" ----> 1.0
"/opt/VRTSvcs/bin/LVMVolumeGroup/online" ----> 1.0
"/opt/VRTSvcs/bin/LVMVolumeGroup/clean" ----> 1.0
"/opt/VRTSvcs/bin/ScriptAgent" ----> 1.0
"/opt/VRTSvcs/bin/LVMVolumeGroup/LVMVolumeGroup.xml" ----> 1.0
"/opt/VRTSvcs/bin/RVGSnapshot/fdsched" ----> 1.0
"/opt/VRTSvcs/bin/RVGSnapshot/monitor" ----> 1.0
"/opt/VRTSvcs/bin/RVGSnapshot/fdsetup.vxg" ----> 1.0
"/opt/VRTSvcs/bin/RVGSnapshot/open" ----> 1.0
"/opt/VRTSvcs/bin/ScriptAgent" ----> 1.0
"/opt/VRTSvcs/bin/RVGSnapshot/RVGSnapshotAgent.pm" ----> 1.0
"/opt/VRTSvcs/bin/RVGSnapshot/RVGSnapshot.xml" ----> 1.0
"/opt/VRTSvcs/bin/RVGSnapshot/offline" ----> 1.0
"/opt/VRTSvcs/bin/RVGSnapshot/online" ----> 1.0
"/opt/VRTSvcs/bin/RVGSnapshot/attr_changed" ----> 1.0
"/opt/VRTSvcs/bin/RVGSnapshot/clean" ----> 1.0
"/opt/VRTSvcs/bin/RVGPrimary/monitor" ----> 1.0
"/opt/VRTSvcs/bin/RVGPrimary/open" ----> 1.0
"/opt/VRTSvcs/bin/RVGPrimary/RVGPrimary.xml" ----> 1.0
"/opt/VRTSvcs/bin/RVGPrimary/offline" ----> 1.0
"/opt/VRTSvcs/bin/RVGPrimary/online" ----> 1.0
"/opt/VRTSvcs/bin/RVGPrimary/clean" ----> 1.0
"/opt/VRTSvcs/bin/ScriptAgent" ----> 1.0
"/opt/VRTSvcs/bin/RVGPrimary/actions/fbsync" ----> 1.0
"/opt/VRTSvcs/bin/triggers/violation" ----> 1.0
"/opt/VRTSvcs/bin/CampusCluster/monitor" ----> 1.0
"/opt/VRTSvcs/bin/CampusCluster/close" ----> 1.0
"/opt/VRTSvcs/bin/ScriptAgent" ----> 1.0
"/opt/VRTSvcs/bin/CampusCluster/open" ----> 1.0
"/opt/VRTSvcs/bin/CampusCluster/CampusCluster.xml" ----> 1.0
"/opt/VRTSvcs/bin/RVG/monitor" ----> 1.0
"/opt/VRTSvcs/bin/RVG/info" ----> 1.0
"/opt/VRTSvcs/bin/ScriptAgent" ----> 1.0
"/opt/VRTSvcs/bin/RVG/RVG.xml" ----> 1.0
"/opt/VRTSvcs/bin/RVG/offline" ----> 1.0
"/opt/VRTSvcs/bin/RVG/online" ----> 1.0
"/opt/VRTSvcs/bin/RVG/clean" ----> 1.0
"/opt/VRTSvcs/bin/internal_triggers/cpuusage" ----> 1.0
Backup of files on host lnode2 complete
Backup succeeded partially
To check the contents of the snapshot, the unzip utility can be run with the “-t” option:
$ unzip -t clusterbackup.zip |more
Archive: clusterbackup.zip
testing: /cat_vcs.zip OK
testing: /categorylist.xml.zip OK
testing: _repository__data/vcs/foo/lnode2/etc/VRTSvcs/conf/config/types.cf.zip OK
testing: _repository__data/vcs/foo/lnode2/etc/VRTSvcs/conf/config/main.cf.zip OK
testing: _repository__data/vcs/foo/lnode2/etc/VRTSvcs/conf/config/vcsApacheTypes.cf.zip OK
testing: _repository__data/vcs/foo/lnode2/etc/llthosts.zip OK
testing: _repository__data/vcs/foo/lnode2/etc/gabtab.zip OK
testing: _repository__data/vcs/foo/lnode2/etc/llttab.zip OK
testing: _repository__data/vcs/foo/lnode2/opt/VRTSvcs/bin/vcsenv.zip OK
testing: _repository__data/vcs/foo/lnode2/opt/VRTSvcs/bin/LVMVolumeGroup/monitor.zip OK
testing: _repository__data/vcs/foo/lnode2/opt/VRTSvcs/bin/LVMVolumeGroup/offline.zip OK
testing: _repository__data/vcs/foo/lnode2/opt/VRTSvcs/bin/LVMVolumeGroup/online.zip OK
testing: _repository__data/vcs/foo/lnode2/opt/VRTSvcs/bin/LVMVolumeGroup/clean.zip OK
testing: _repository__data/vcs/foo/lnode2/opt/VRTSvcs/bin/LVMVolumeGroup/LVMVolumeGroupAgent.zip OK
testing: _repository__data/vcs/foo/lnode2/opt/VRTSvcs/bin/LVMVolumeGroup/LVMVolumeGroup.xml.zip OK
testing: _repository__data/vcs/foo/lnode2/opt/VRTSvcs/bin/RVGSnapshot/fdsched.zip OK
testing: _repository__data/vcs/foo/lnode2/opt/VRTSvcs/bin/RVGSnapshot/monitor.zip OK
......
Since parts of the cluster configuration can reside in memory and not on disk, it is a good idea to run “haconf -dump -makero” prior to running hasnap. This will ensure that the current configuration is being backed up, and will allow hasnap “-restore” to restore the correct configuration if disaster hits.
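Putting the two together, a nightly backup could be as simple as the following pair of commands (the snapshot file name and description are just examples):

$ haconf -dump -makero
$ hasnap -backup -f clusterbackup.zip -n -m "Nightly VCS configuration backup"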