Replacing failed disk drives in a ZFS pool

I had a disk drive fail in one of my ZFS pools over the weekend and needed to swap it out to restore the pool to an optimal state. To begin, I ran the zpool utility to see which disk drive was faulty:

$ zpool status -v

  pool: rz2pool
 state: DEGRADED
status: One or more devices could not be opened.  Sufficient replicas exist for
        the pool to continue functioning in a degraded state.
action: Attach the missing device and online it using 'zpool online'.
   see: http://www.sun.com/msg/ZFS-8000-D3
 scrub: resilver completed with 0 errors on Tue Feb 13 14:12:37 2007
config:

        NAME          STATE     READ WRITE CKSUM
        rz2pool       DEGRADED     0     0     0
          raidz2      DEGRADED     0     0     0
            c1t9d0    ONLINE       0     0     0
            c1t10d0   ONLINE       0     0     0
            c1t12d0   ONLINE       0     0     0
            c2t1d0    ONLINE       0     0     0
            spare     DEGRADED     0     0     0
              c2t2d0  UNAVAIL      0     0     0  cannot open
              c2t3d0  ONLINE       0     0     0
        spares
          c2t3d0      INUSE     currently in use

Once I located the faulty device, I used cfgadm to unconfigure the failed drive and configure its replacement (the cfgadm steps are sketched below), and then ran zpool with the "replace" option to replace the failed drive in my pool:

$ zpool replace rz2pool c2t2d0 c2t2d0
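
For reference, the cfgadm portion of the swap looked roughly like the following. The attachment point name c2::dsk/c2t2d0 is an assumption based on my controller layout; running cfgadm -al will list the actual attachment points on a given host:

$ cfgadm -c unconfigure c2::dsk/c2t2d0   # take the failed drive offline before pulling it
$ cfgadm -c configure c2::dsk/c2t2d0     # bring the new drive online after seating it

Since the new drive went into the same slot as the old one, the two device names passed to "zpool replace" above are identical. Had the replacement landed in a different slot, the command would name both devices, e.g. "zpool replace rz2pool c2t2d0 c3t4d0" (c3t4d0 being a hypothetical name for the new drive).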

Once the replace operation kicked off the resilver, I used zpool to monitor the progress of the replacement drive:

$ zpool status -v

  pool: rz2pool
 state: DEGRADED
status: One or more devices is currently being resilvered.  The pool will
        continue to function, possibly in a degraded state.
action: Wait for the resilver to complete.
 scrub: resilver in progress, 0.10% done, 0h31m to go
config:

        NAME                STATE     READ WRITE CKSUM
        rz2pool             DEGRADED     0     0     0
          raidz2            DEGRADED     0     0     0
            c1t9d0          ONLINE       0     0     0
            c1t10d0         ONLINE       0     0     0
            c1t12d0         ONLINE       0     0     0
            c2t1d0          ONLINE       0     0     0
            spare           DEGRADED     0     0     0
              replacing     DEGRADED     0     0     0
                c2t2d0s0/o  UNAVAIL      0     0     0  cannot open
                c2t2d0      ONLINE       0     0     0
              c2t3d0        ONLINE       0     0     0
        spares
          c2t3d0            INUSE     currently in use

errors: No known data errors
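
To keep an eye on the resilver without rerunning zpool status by hand, a quick shell loop does the job; the 60-second interval is an arbitrary choice:

$ while true; do zpool status rz2pool | grep scrub; sleep 60; done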

All of this was done online, with minimal interruption to the applications running on the host.
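
One cleanup item worth mentioning: once the resilver completes, the hot spare c2t3d0 should detach on its own and return to the AVAIL state in the spares list. If it lingers in the INUSE state, it can be detached manually:

$ zpool detach rz2pool c2t3d0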

1 thought on “Replacing failed disk drives in a ZFS pool”

  1. Hi, I had a similar situation, but I’m having some trouble.
    In my case, I had no spare, just 4 disks in a raidz1 config.
    I had no failure signals from zpool, but heavy slowdowns, so I looked in /var/adm/messages and found hardware errors. A technician went to the customer site and added a new disk. I asked him to run the replace command, but by mistake he did something different and created a spare using both the failing disk and the new disk.
    The failing disk is now offline and the new spare is using the new disk, but the system is still slow, and zpool reports a degraded state.
    I would like to return to my original situation, with no spare, because as things stand I cannot remove the failing disk from the zpool (only a spare can be removed…).
    Do I risk anything if I remove the spare, leaving just the 3 disks running for a moment?
    And what will happen when I then add the new disk to the zpool? Will it start a new sync? I mean, it should already be synced, since it was already running as the spare… will the system accept it as another disk even though it was previously added as a spare?

    Thanks for any help.
    Gabriele Bulfon.
