Archive
Posts in Storage
Backing up the Veritas Cluster Server configuration
Veritas Cluster Server stores custom agents and its configuration data as a series of files in the /etc, /etc/VRTSvcs/conf/config and /opt/VRTSvcs/bin/ directories. Since these files are the lifeblood of the cluster engine, it is important to back them up to ensure cluster recovery should disaster hit. VCS comes with the hasnap utility to simplify cluster configuration backups. When hasnap is run with the "-backup," "-n," "-f" and "-m" options, the cluster configuration is written to the file passed to the "-f" option. To check the contents of the snapshot, the unzip utility can be run with the "-t" option. Since parts of the cluster configuration can reside in memory and not on disk, it is a good idea to run "haconf -dump -makero" prior to running hasnap. This ensures that the current configuration is being backed up, and allows hasnap "-restore" to restore the correct configuration if disaster hits.
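Since the snapshot can be tested with unzip, the same integrity check can probably be scripted. The following is a minimal sketch, assuming the file written by hasnap is an ordinary zip archive; the function name check_snapshot is made up for illustration and is not part of VCS.

```python
import zipfile


def check_snapshot(path):
    """Return (ok, names) for a zip snapshot.

    ok is False if any archive member fails its CRC check, which is
    roughly the check that "unzip -t" performs.
    """
    with zipfile.ZipFile(path) as snap:
        bad_member = snap.testzip()  # first corrupt member name, or None
        return bad_member is None, snap.namelist()
```

Calling check_snapshot with the path that was passed to "-f" returns a flag plus the list of files captured in the snapshot, which makes it easy to run the check from a nightly backup job.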
Removing duplicate devices from vxdisk list
I replaced a disk in one of our A5200s last week, and noticed that vxdisk was displaying two entries for the device after I replaced it with vxdiskadm:

    DEVICE       TYPE      DISK         GROUP        STATUS
    c7t21d0s2    sliced    disk01       oradg        online
    c7t22d0s2    sliced    disk02       oradg        error
    c7t22d0s2    sliced    -            -            error
    c7t23d0s2    sliced    disk03       oradg        online

To fix this annoyance, I first removed the disk disk02 from the oradg disk group. Once the disk was removed, I ran vxdisk "remove" two times to remove both disk access records. After both disk access records were removed, I executed 'devfsadm -C' to clean the Solaris device tree, and then ran 'vxdctl enable' to have Veritas update the list of devices it knows about. After these operations completed, the device showed up once in the vxdisk output:

    DEVICE       TYPE      DISK         GROUP        STATUS
    c7t21d0s2    sliced    disk01       oradg        online
    c7t22d0s2    sliced    disk02       oradg        online
    c7t23d0s2    sliced    disk03       oradg        online

I have seen times where the Solaris device tree will hold on to old entries, which unfortunately requires a reboot to fix. Luckily for me, this wasn't the case with my system. Shibby!
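Duplicate entries like the one above are easy to miss on a host with many disks. Here is a small sketch that scans vxdisk listing text for device names that appear more than once. It assumes the five-column layout shown in this post, and the helper name duplicate_devices is invented for the example.

```python
from collections import Counter

# Listing copied from the output shown in this post.
SAMPLE = """\
DEVICE       TYPE      DISK         GROUP        STATUS
c7t21d0s2    sliced    disk01       oradg        online
c7t22d0s2    sliced    disk02       oradg        error
c7t22d0s2    sliced    -            -            error
c7t23d0s2    sliced    disk03       oradg        online
"""


def duplicate_devices(vxdisk_output):
    """Return device names listed more than once, in first-seen order."""
    lines = vxdisk_output.strip().splitlines()
    # Skip the header row and take the first column of each non-empty row.
    devices = [line.split()[0] for line in lines[1:] if line.split()]
    counts = Counter(devices)
    return [dev for dev in dict.fromkeys(devices) if counts[dev] > 1]


print(duplicate_devices(SAMPLE))  # ['c7t22d0s2']
```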
Locating disk drives in a sea of A5200s
I manage about a dozen Sun A5200 storage arrays, and periodically need to replace failed disk drives in these arrays. To ensure that I replace the correct device, I first use the format utility to locate the physical device path to the faulted drive:

    < ..... >
    43. c7t22d0
        /sbus@3,0/SUNW,socal@0,0/sf@0,0/ssd@w22000004cf995f6c,0

Once I know which device to replace, I use the luxadm "remove_device" option to remove the drive for replacement, and then run luxadm with the "led_blink" option to blink an amber LED next to the faulted drive. Once I enable the led_blink option, I wander down to the data center, locate the drive with the blinking light, and swap out the failed disk with a new disk…
Monitoring md device rebuilds
One super useful utility that ships with CentOS 4.4 is the watch utility. Watch allows you to monitor the output from a command at a specific interval, which is especially useful for monitoring array rebuilds. To use watch, you need to run it with a command to watch, and an optional interval to control how often the output from that command is displayed.
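When the rebuild status needs to go into a script or an alert instead of a terminal, the progress line in /proc/mdstat can be parsed directly. The sketch below is only an illustration: the sample text is a made-up rebuild of md2 written in the usual mdstat layout, and rebuild_progress is a hypothetical helper name.

```python
import re

# Hypothetical /proc/mdstat excerpt for an array that is rebuilding.
SAMPLE = """\
Personalities : [raid1]
md2 : active raid1 sdb1[2] sda1[0]
      292945152 blocks [2/1] [U_]
      [==>..................]  recovery = 12.6% (37043392/292945152) finish=127.5min speed=33440K/sec
"""

PROGRESS = re.compile(r"(recovery|resync)\s*=\s*([\d.]+)%.*?finish=([\d.]+)min")


def rebuild_progress(mdstat_text):
    """Map each rebuilding md device to (action, percent_done, minutes_left)."""
    result = {}
    current = None
    for line in mdstat_text.splitlines():
        if line.startswith("md"):
            current = line.split()[0]  # remember which array we are inside
        match = PROGRESS.search(line)
        if match and current:
            result[current] = (
                match.group(1),
                float(match.group(2)),
                float(match.group(3)),
            )
    return result


print(rebuild_progress(SAMPLE))  # {'md2': ('recovery', 12.6, 127.5)}
```

On a live system the text would come from reading /proc/mdstat rather than from SAMPLE.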
Adding a hot spare to an md device
I am running CentOS 4.4 on some old servers, and each of these servers has multiple internal disk drives. Since system availability concerns me more than the amount of storage that is available, I decided to add a hot spare to the md device that stores my data (md2). To add the hot spare, I ran the mdadm utility with the "--add" option, the md device to add the spare to, and the spare device to use. After the spare was added, the device showed up in the /proc/mdstat output with the "(S)" string to indicate that it's a hot spare.
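To confirm from a script that the spare is in place, the "(S)" marker can be picked out of the mdstat text. This is a rough sketch: the sample below is a hypothetical md2 with a spare named sdc1 (the post does not say which device was used), and hot_spares is a name made up for the example.

```python
import re

# Hypothetical /proc/mdstat excerpt; sdc1 is an assumed spare device name.
SAMPLE = """\
Personalities : [raid1]
md2 : active raid1 sdc1[2](S) sdb1[1] sda1[0]
      292945152 blocks [2/2] [UU]
"""

# Matches member fields such as "sda1[0]" or "sdc1[2](S)".
MEMBER = re.compile(r"(\w+)\[\d+\](\([A-Z]\))?")


def hot_spares(mdstat_text):
    """Map each md device to the members flagged "(S)" in mdstat text."""
    spares = {}
    for line in mdstat_text.splitlines():
        if not line.startswith("md"):
            continue
        fields = line.split()
        found = []
        for field in fields[1:]:
            match = MEMBER.fullmatch(field)
            if match and match.group(2) == "(S)":
                found.append(match.group(1))
        if found:
            spares[fields[0]] = found
    return spares


print(hot_spares(SAMPLE))  # {'md2': ['sdc1']}
```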