Configuring ZFS to gracefully deal with pool failures

If you are running ZFS in production, you may have experienced a situation where your server panicked and rebooted when a ZFS file system was corrupted. With George Wilson’s recent putback of CR #6322646, this is no longer the case. George’s putback allows the file system administrator to set the “failmode” property to control what happens when a pool incurs a fault. Here is a description of the new property from the zpool(1m) manual page:

failmode=wait | continue | panic

    Controls the system behavior  in  the  event  of  catas-
    trophic  pool  failure.  This  condition  is typically a
    result of a  loss  of  connectivity  to  the  underlying
    storage device(s) or a failure of all devices within the
    pool. The behavior of such an  event  is  determined  as
    follows:

    wait        Blocks all I/O access until the device  con-
                nectivity  is  recovered  and the errors are
                cleared. This is the default behavior.

    continue    Returns EIO to any new  write  I/O  requests
                but  allows  reads  to  any of the remaining
                healthy devices.  Any  write  requests  that
                have  yet  to  be committed to disk would be
                blocked.

    panic       Prints out a message to the console and gen-
                erates a system crash dump.

To see just how well this feature worked, I decided to test out the new failmode property. To begin my tests, I created a new ZFS pool from two files:

$ cd / && mkfile 1g file1 file2

$ zpool create p1 /file1 /file2

$ zpool status

  pool: p1
 state: ONLINE
 scrub: none requested

        NAME        STATE     READ WRITE CKSUM
        p1          ONLINE       0     0     0
          /file1    ONLINE       0     0     0
          /file2    ONLINE       0     0     0

After the pool was created, I checked the failmode property:

$ zpool get failmode p1

p1    failmode  wait      default

I then began writing garbage to one of the files to see what would happen:

$ dd if=/dev/zero of=/file1 bs=512 count=1024

$ zpool scrub p1

I was overjoyed to find that the box was still running, even though the pool showed up as faulted:

$ zpool status

  pool: p1
 state: FAULTED
status: One or more devices could not be used because the label is missing or
        invalid.  Sufficient replicas exist for the pool to continue
        functioning in a degraded state.
action: Replace the device using 'zpool replace'.
 scrub: scrub completed after 0h0m with 0 errors on Tue Feb 19 13:57:41 2008

        NAME        STATE     READ WRITE CKSUM
        p1          FAULTED      0     0     0  insufficient replicas
          /file1    UNAVAIL      0     0     0  corrupted data
          /file2    ONLINE       0     0     0

errors: No known data errors
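The “action” line above suggests the intended recovery path: replace the damaged vdev with a healthy one. Had the box stayed up, the repair would have looked roughly like this (the replacement file /file3 is hypothetical):

$ mkfile 1g /file3

$ zpool replace p1 /file1 /file3

Once the replace completes, ZFS resilvers the data from the surviving device onto the new one, and the pool should return to the ONLINE state.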

But my joy didn’t last long: the box became unresponsive after a few minutes, and panicked with the following string:

Feb 19 13:57:47 nevadadev genunix: [ID 603766 kern.notice] assertion failed: vdev_config_sync(rvd->vdev_child, rvd->vdev_children, txg) == 0 (0x5 == 0x0), file: ../../common/fs/zfs/spa.c, line: 4130
Feb 19 13:57:47 nevadadev unix: [ID 100000 kern.notice] 
Feb 19 13:57:47 nevadadev genunix: [ID 655072 kern.notice] ffffff0001feab30 genunix:assfail3+b9 ()
Feb 19 13:57:47 nevadadev genunix: [ID 655072 kern.notice] ffffff0001feabd0 zfs:spa_sync+5d2 ()
Feb 19 13:57:47 nevadadev genunix: [ID 655072 kern.notice] ffffff0001feac60 zfs:txg_sync_thread+19a ()
Feb 19 13:57:47 nevadadev genunix: [ID 655072 kern.notice] ffffff0001feac70 unix:thread_start+8 ()

Since the manual page states that the failmode property “controls the system behavior in the event of catastrophic pool failure,” it appears the box should have stayed up and operational when the pool became unusable. I filed a bug on the opensolaris website, so hopefully the ZFS team will get this issue addressed in the future.

5 thoughts on “Configuring ZFS to gracefully deal with pool failures”

  1. We have just had to pull 2 ZFS pools from a production environment using a couple of Clariion LUNs because of consistent panics. This is something that should have been a feature long ago.

  2. hrmm – this is still occurring.

    S10U6 :(

    Not very good in my opinion. We just moved to ZFS on our production server and have come across this problem when setting up a test failover lab environment.

    Sun is normally known for its robustness, but ZFS is really ruining that reputation at the moment.

    Also zpool/zfs shouldn’t “hang” if there is a zfs operation in progress, they should display “what they can” as well as the fact that an operation is in-progress.

    Oh well – ZFS is still a baby :)

  3. I agree with David — there’s no good reason for “zpool status” to wedge unkillably. When the pool’s having a problem, it’s the first thing that’s natural to type, and it should return *something* rather than wedging.

    In my case, on two separate occasions/hosts I’ve had the following behavior:

    o One half of a mirror dies. The system grabs a hot spare to replace it. I yank the dead disk and RMA it with the vendor
    o 1-3 days later, any access to the zpool hangs, including “zpool status”. After a forcible reboot, nothing has changed — the zpool is dead when touched. If I boot failsafe, I can import the zpool and fix up the mirrors, but then “zpool status” lists a number of files corrupted. This is exactly what mirroring is supposed to avoid! When I imported the pool from the failsafe boot several disks were erroneously marked as bad so it ran out of spares trying to replace them.

    Hosts are running u4 and u6, default failmode.
