Configuring ZFS to gracefully deal with pool failures -- Prefetch Technologies

If you are running ZFS in production, you may have experienced a situation where your server paniced and reboot when a ZFS file system was corrupted. With George Wilson’s recent putback of CR #6322646, this is no longer the case. George’s putback allows the file system administrator to set the “failmode” property to control that happens when a pool incurs a fault. Here is a description of the new property from the zpool(1m) manual page:

failmode=wait | continue | panic

Controls the system behavior in the event of catas- trophic pool failure. This condition is typically a result of a loss of connectivity to the underlying storage device(s) or a failure of all devices within the pool. The behavior of such an event is determined as follows:

wait Blocks all I/O access until the device con- nectivity is recovered and the errors are cleared. This is the default behavior.

continue Returns EIO to any new write I/O requests but allows reads to any of the remaining healthy devices. Any write requests that have yet to be committed to disk would be blocked.

panic Prints out a message to the console and gen- erates a system crash dump.

To see just how well this feature worked, I decided to test out the new failmode property. To begin my tests, I created a new ZFS pool from two files:

$ cd / && mkfile 1g file1 file2

$ zpool create p1 /file1 /file2

$ zpool status

pool: p1
state: ONLINE
scrub: none requested
config:

NAME STATE READ WRITE CKSUM
p1 ONLINE 0 0 0
/file1 ONLINE 0 0 0
/file2 ONLINE 0 0 0

After the pool was created, I checked the failmode property:

$ zpool get failmode p1

NAME PROPERTY VALUE SOURCE
p1 failmode wait default

And then then began writing garbage to one of the files to see what would happen:

$ dd if=/dev/zero of=/file1 bs=512 count=1024

$ zpool scrub p1

I was overjoyed to find that the box was still running, even though the pool showed up as faulted:

$ zpool status

pool: p1
state: FAULTED
status: One or more devices could not be used because the label is missing or
invalid. Sufficient replicas exist for the pool to continue
functioning in a degraded state.
action: Replace the device using 'zpool replace'.
see: http://www.sun.com/msg/ZFS-8000-4J
scrub: scrub completed after 0h0m with 0 errors on Tue Feb 19 13:57:41 2008
config:

NAME STATE READ WRITE CKSUM
p1 FAULTED 0 0 0 insufficient replicas
/file1 UNAVAIL 0 0 0 corrupted data
/file2 ONLINE 0 0 0

errors: No known data errors

But my joy didn’t last long, since the box became unresponsive after a few minutes, and paniced with the following string:

Feb 19 13:57:47 nevadadev genunix: [ID 603766 kern.notice] assertion failed: vdev_config_sync(rvd->vdev_child, rvd->vdev_children, txg) == 0 (0x5 == 0x0), file: ../../common/fs/zfs/spa.c, line: 4130
Feb 19 13:57:47 nevadadev unix: [ID 100000 kern.notice]
Feb 19 13:57:47 nevadadev genunix: [ID 655072 kern.notice] ffffff0001feab30 genunix:assfail3+b9 ()
Feb 19 13:57:47 nevadadev genunix: [ID 655072 kern.notice] ffffff0001feabd0 zfs:spa_sync+5d2 ()
Feb 19 13:57:47 nevadadev genunix: [ID 655072 kern.notice] ffffff0001feac60 zfs:txg_sync_thread+19a ()
Feb 19 13:57:47 nevadadev genunix: [ID 655072 kern.notice] ffffff0001feac70 unix:thread_start+8 ()

Since the manual page states that the failmode property “controls the system behavior in the event of catas-trophic pool failure,” it appears the box should have stayed up and operational when the pool became unusable. I filed a bug on the opensolaris website, so hopefully the ZFS team will get this issue addressed in the future.

This article was posted by Matty on 2008-03-01 12:05:00 -0400 -0400