Using the ZFS RAIDZ2 and hot spare features on old storage arrays


I support some antiquated Sun disk arrays (D1000s, T3s, A5200s, etc.), and due to the age of the hardware, it is somewhat prone to failure. FMA helps prevent outages due to CPU and memory problems, but it doesn’t yet support diagnosing disk drive errors. Since the disk drives fail with some regularity, I have started creating RAIDZ2 (dual-parity RAIDZ) pools with multiple hot spares to protect our data. RAIDZ2 and hot spares are available in the 11/06 release of Solaris 10, and both are super easy to configure.

To create a RAIDZ2 pool, you can run the zpool utility with the “create” option, the name of the pool to create, the “raidz2” keyword, and the disks to add to the pool:

$ zpool create rz2pool raidz2 c1t9d0 c1t10d0 c1t12d0 c2t1d0 c2t2d0

Once the pool is created, the layout can be viewed with the zpool utility:

$ zpool status

  pool: rz2pool
 state: ONLINE
 scrub: none requested
config:

        NAME         STATE     READ WRITE CKSUM
        rz2pool      ONLINE       0     0     0
          raidz2     ONLINE       0     0     0
            c1t9d0   ONLINE       0     0     0
            c1t10d0  ONLINE       0     0     0
            c1t12d0  ONLINE       0     0     0
            c2t1d0   ONLINE       0     0     0
            c2t2d0   ONLINE       0     0     0

errors: No known data errors
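
The “scrub: none requested” line in the output above is also worth a look. On hardware this old, it can be useful to scrub the pool periodically (from cron, for example) so that latent checksum errors are found and repaired from parity before a second drive gives out. The pool name below matches the example above, and something along these lines should do the trick:

$ zpool scrub rz2pool

The scrub runs in the background, and its progress is reported in the “scrub:” line of the zpool status output.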

RAIDZ2 allows any two disks in a RAIDZ2 vdev to fail without data loss, which is ideal for sites that are more concerned with data integrity than performance. On my ancient storage subsystems, I like to combine RAIDZ2 with several hot spares so the pool can automatically recover each time a disk bites the dust. To add one or more hot spares to a pool, you can run the zpool utility with the “add” option, the “spare” keyword, and the device(s) to turn into spares:

$ zpool add rz2pool spare c2t3d0

Once the spare has been added, it shows up in a new “spares” section of the zpool status output:

$ zpool status

  pool: rz2pool
 state: ONLINE
 scrub: none requested
config:

        NAME         STATE     READ WRITE CKSUM
        rz2pool      ONLINE       0     0     0
          raidz2     ONLINE       0     0     0
            c1t9d0   ONLINE       0     0     0
            c1t10d0  ONLINE       0     0     0
            c1t12d0  ONLINE       0     0     0
            c2t1d0   ONLINE       0     0     0
            c2t2d0   ONLINE       0     0     0
        spares
          c2t3d0     AVAIL

errors: No known data errors
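
Since the whole point of the hot spare is automatic recovery, it is worth knowing what the pool looks like after a failure. The example below is illustrative rather than captured output, and assumes that c2t2d0 (one of the data disks from the pool above) has died: the failed disk and the spare get paired up under a “spare” entry in the zpool status config section, and the spares section lists c2t3d0 as INUSE while the pool resilvers:

        NAME           STATE     READ WRITE CKSUM
        rz2pool        DEGRADED     0     0     0
          raidz2       DEGRADED     0     0     0
            c1t9d0     ONLINE       0     0     0
            c1t10d0    ONLINE       0     0     0
            c1t12d0    ONLINE       0     0     0
            c2t1d0     ONLINE       0     0     0
            spare      DEGRADED     0     0     0
              c2t2d0   UNAVAIL      0     0     0
              c2t3d0   ONLINE       0     0     0
        spares
          c2t3d0       INUSE     currently in use

Once the dead drive has been physically replaced, “zpool replace rz2pool c2t2d0” resilvers the replacement into the raidz2 vdev, and if the spare isn’t released automatically when the resilver finishes, “zpool detach rz2pool c2t3d0” returns it to the AVAIL state.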

Luckily, most of the storage listed above is used in development, QE, and testing environments, so performance isn’t super critical (downtime is much more costly).

This article was posted by Matty on 2007-02-09 21:58:00 -0400