I had a disk drive fail in one of my ZFS pools over the weekend, and needed to replace it to return the pool to a healthy state. To begin, I used the zpool utility to see which disk drive was faulty:
$ zpool status -v
  pool: rz2pool
 state: DEGRADED
status: One or more devices could not be opened. Sufficient replicas exist for
        the pool to continue functioning in a degraded state.
action: Attach the missing device and online it using 'zpool online'.
   see: http://www.sun.com/msg/ZFS-8000-D3
 scrub: resilver completed with 0 errors on Tue Feb 13 14:12:37 2007
config:
        NAME          STATE     READ WRITE CKSUM
        rz2pool       DEGRADED     0     0     0
          raidz2      DEGRADED     0     0     0
            c1t9d0    ONLINE       0     0     0
            c1t10d0   ONLINE       0     0     0
            c1t12d0   ONLINE       0     0     0
            c2t1d0    ONLINE       0     0     0
            spare     DEGRADED     0     0     0
              c2t2d0  UNAVAIL      0     0     0  cannot open
              c2t3d0  ONLINE       0     0     0
        spares
          c2t3d0      INUSE     currently in use
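On a pool with many devices, the faulty one can also be picked out of the status output mechanically. The following is only a minimal sketch and was not part of the original procedure: faulty_devices is a hypothetical helper name, and it assumes Solaris-style cNtNdN device names and the column layout shown above.

```shell
#!/bin/sh
# Hypothetical helper: list devices in `zpool status` output whose state
# is not ONLINE. Reads the status text on stdin, so it can also be tried
# against saved output.
faulty_devices() {
    awk '
        # The device table starts after the NAME/STATE header line.
        $1 == "NAME" && $2 == "STATE" { in_config = 1; next }
        # The spares list and the errors footer end the device table.
        $1 == "spares" || $1 == "errors:" { in_config = 0 }
        # Report leaf devices (cNtNdN names) that are not ONLINE.
        in_config && NF >= 5 && $2 != "ONLINE" && $1 ~ /^c[0-9]+t[0-9]+d[0-9]+/ {
            print $1, $2
        }
    '
}

# Usage on a live system (assumes the pool name from this post):
#   zpool status -v rz2pool | faulty_devices
```

Fed the output above, this would report c2t2d0 as UNAVAIL and skip the container rows (rz2pool, raidz2, spare), since those do not look like device names.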
Once I located the faulty device, I used cfgadm to remove the old disk drive from the system and add the new one, and then ran zpool with the “replace” option to replace the failed drive in my pool:
$ zpool replace rz2pool c2t2d0 c2t2d0
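The exact cfgadm invocations are not shown above, so the following is a hedged sketch of what the full swap sequence might look like. It only prints the commands for review and runs nothing. print_swap_steps is a hypothetical helper, and the attachment point ID (c2::dsk/c2t2d0 in the usage comment) is an assumption; the real ID depends on the controller and should be taken from the output of cfgadm -al.

```shell
#!/bin/sh
# Hypothetical helper: print the hot-swap command sequence for one disk so it
# can be reviewed before being typed in by hand. Nothing is executed.
#   $1 = pool name, $2 = disk name, $3 = cfgadm attachment point ID (assumed)
print_swap_steps() {
    pool=$1
    disk=$2
    ap_id=$3
    printf '%s\n' \
        "cfgadm -c unconfigure $ap_id" \
        "# physically swap the drive" \
        "cfgadm -c configure $ap_id" \
        "zpool replace $pool $disk $disk"
}

# Usage (the attachment point ID is an assumption, check `cfgadm -al`):
#   print_swap_steps rz2pool c2t2d0 c2::dsk/c2t2d0
```

Printing rather than executing is deliberate here: the drive has to be physically swapped between the unconfigure and configure steps, so the sequence cannot sensibly run unattended.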
Once the replace command returned, I used zpool to monitor the resilvering of the replacement drive:
$ zpool status -v
  pool: rz2pool
 state: DEGRADED
status: One or more devices is currently being resilvered. The pool will
        continue to function, possibly in a degraded state.
action: Wait for the resilver to complete.
 scrub: resilver in progress, 0.10% done, 0h31m to go
config:
        NAME                STATE     READ WRITE CKSUM
        rz2pool             DEGRADED     0     0     0
          raidz2            DEGRADED     0     0     0
            c1t9d0          ONLINE       0     0     0
            c1t10d0         ONLINE       0     0     0
            c1t12d0         ONLINE       0     0     0
            c2t1d0          ONLINE       0     0     0
            spare           DEGRADED     0     0     0
              replacing     DEGRADED     0     0     0
                c2t2d0s0/o  UNAVAIL      0     0     0  cannot open
                c2t2d0      ONLINE       0     0     0
              c2t3d0        ONLINE       0     0     0
        spares
          c2t3d0            INUSE     currently in use
errors: No known data errors
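Rather than rerunning the status command by hand, the progress figure can be pulled out of the scrub line. This is a minimal sketch that assumes the line is worded exactly as in the output above; other ZFS releases word this line differently, so the pattern would need adjusting there. resilver_progress is a hypothetical helper name.

```shell
#!/bin/sh
# Hypothetical helper: print the resilver percentage from `zpool status`
# output read on stdin. Prints nothing when no resilver is in progress.
resilver_progress() {
    # Match e.g. "scrub: resilver in progress, 0.10% done, 0h31m to go"
    sed -n 's/^[[:space:]]*scrub: resilver in progress, \([0-9.]*\)% done.*/\1/p'
}

# Usage on a live system (assumes the pool name from this post):
#   zpool status rz2pool | resilver_progress
```

An empty result means the scrub line no longer says a resilver is in progress, which is the cue to check the full status output for the final state.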
All of this was done online, and with minimal interruption to the applications running on the host.
April 17th, 2008 at 4:45 am
Hi, I had a similar situation, but I’m having some trouble.
In my case, I had no spare, just 4 disks in a raidz1 config.
zpool reported no failures, but there were heavy slowdowns, so I looked further and found hardware errors in /var/adm/messages. A technician went to the customer site and added a new disk. I asked him to run the replace command, but by mistake he did something different and created a spare containing both the failing disk and the new disk.
The failing disk is now offline and the spare is using the new disk, but the system is still slow, and zpool reports a degraded state.
I would like to get back to my original layout, with no spare, because as things stand I cannot remove the failing disk from the pool (only spares can be removed…).
Do I risk anything if I remove the spare, leaving just the 3 disks running for a moment?
What will happen when I then add the new disk to the pool? Will it start a new sync? I mean… it should already be in sync, since it was already running as part of the spare… Will the system accept it as a regular disk even though it was added as a spare before?
Thanks for any help.
Gabriele Bulfon.