Replacing failed disk devices with the Solaris Volume Manager

While reviewing a post I left on blogs.sun.com, I noticed that Jerry Jelinek replied to my post on replacing disks managed by the Solaris Volume Manager. I had run into the exact issue he covered in his BLOG, and was glad to see that the disk replacement annoyance was addressed in the latest Solaris Express.

The following example (per Jerrry’s feedback and limited testing in my sandbox) will show how to replace a disk named c0t0d0 using the cfgadm(1m) and metareplace(1m) utilities. The first step is to remove (if they exist) any meta state databases on the disk that needs to be replaced. To locate the locations of all meta state databases, the metadb(1m) command can be run with the “-i” option:

$ metadb -i

If meta devices exist, you can run metadb(1m) with the “-d” option to remove the databases. The following example deletes all meta state databases on slice 7 of the disk (c0t0d0) that we are going to replace:

$ metdb -d c0t0d0s7

Once the meta state databases are removed, you can use cfgadm(1m)’s “-c unconfigure” option to remove an occupant (an entity that lives in a receptacle) from Solaris:

$ cfgadm -c unconfigure c0::dsk/c0t0d0

Once Solaris unconfigures the device, you can physically replace the disk. Once the drive is replaced, you can run cfgadm(1m) with the “-c configure” option to let Solaris know the occupant is available for use:

$ cfgadm -c configure c0::dsk/c0t0d0

Once Solaris know the drive is available, you will need to VTOC the new drive with fmthard(1m) or format(1m). This will add a disk label to the drive, which defines the partions types and sizes. Once a valid VTOC is installed, you can invoke the trusty old metareplace(1m) utility to replace the faulted meta devices. The following example will replace the device associated with meta device d10, and cause the meta device to start synchronizing data from the other half of the mirror ( if RAID level 1 is used):

$ metareplace -e d10 c0t0d0s0

I dig the terminology cfgadm(1m) uses to describe the connections between components. “Receptable,” “occupant” and “atachment point” should be used a bit more universally in the IT industry! :) I recommend testing all examples in a sandbox prior to use in production (i.e., I am not responsible for breaking your stuff)!