A great introduction to ZFS de-duplication

I’ve been looking into deploying ZFS de-duplication, and I have one application in particular (backup staging) that would greatly benefit from it. George Wilson recorded an awesome introductory video on ZFS de-duplication, and it’s a great place to get started. I’m planning to start testing de-duplication as soon as my SSDs are ordered, and hopefully I will have some positive results to report!
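Once my test pool is built, turning de-duplication on should be as simple as flipping the dedup property on a dataset. Here’s a rough sketch of what I expect to run (tank/backupstage is just a placeholder name for my backup staging dataset):

$ zfs set dedup=on tank/backupstage

$ zfs get dedup tank/backupstage

Once data starts landing in the dataset, the DEDUP column in zpool list output should show how much space de-duplication is actually saving.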

An interesting way of looking at file system versioning (ZFS feature flags)

I got a chance to catch up with a bunch of stuff in my “need to read” / “need to watch” folder this past weekend. One of the videos I watched talked about ZFS feature flags, and how they will be used by the Illumos community to add new features to ZFS. ZFS feature flags make a lot of sense, and will definitely be invaluable once ZFS is extended by more than one organization. Cool video, and well worth the ten minutes it takes to watch it.
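If you’re running an illumos-derived release that already has feature flags, you can poke at them with the standard zpool tooling. This is just a sketch (tank is a placeholder pool name):

$ zpool upgrade -v

$ zpool get all tank | grep feature@

The first command lists the features your zpool binary knows about, and the second shows which feature flags are enabled or active on a given pool.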

Improved ZFS scrub statistics in Solaris 10 update 9

I talked about the ZFS scrub feature a few months back. In the latest Solaris 10 update the developers added additional scrub statistics, which are quite handy for figuring out throughput and estimated completion times:

$ zpool scrub rpool

$ zpool status -v

  pool: rpool
 state: ONLINE
 scan: scrub in progress since Tue Dec  6 07:45:31 2011
    1005M scanned out of 81.0G at 29.5M/s, 0h46m to go
    0 repaired, 1.21% done
config:

        NAME        STATE     READ WRITE CKSUM
        rpool       ONLINE       0     0     0
          c1t0d0s0  ONLINE       0     0     0

errors: No known data errors

This sure beats the previous output! Nice job team Solaris.

Using the ZFS scrub feature to verify the integrity of your storage

There have been a number of articles written over the past few years that talk about how silent data corruption can occur due to faulty hardware, solar flares, and software defects. I’ve seen some oddities in the past that would probably fall into these categories, but without sufficient time to dig in deep, it’s impossible to know for sure.

With ZFS this is no longer the case. ZFS checksums every block of data that is written to disk, and verifies the checksum when the data is read back into memory. If the checksums don’t match, we know the data was changed by something other than ZFS (assuming a ZFS bug isn’t the culprit), and if we are using ZFS to RAID protect the storage, the issue will be automatically fixed for us.

But what if you have a lot of data on disk that isn’t read often? Well, there is a solution. ZFS provides a scrub option to read back all of the data in the pool and validate that it still matches the computed checksums. This feature can be accessed by running the zpool utility with the “scrub” option and the name of the pool to scrub:

$ zpool scrub rpool

To view the status of the scrub you can run the zpool utility with the “status” option:

$ zpool status

  pool: rpool
 state: ONLINE
 scrub: scrub in progress for 0h0m, 3.81% done, 0h18m to go
config:

        NAME        STATE     READ WRITE CKSUM
        rpool       ONLINE       0     0     0
          c1t0d0s0  ONLINE       0     0     0

errors: No known data errors

The scrub operation will consume any and all I/O resources on the system (there are supposed to be throttles in place, but I’ve yet to see them work effectively), so you definitely want to run it when your system isn’t busy servicing your customers. If you kick off a scrub and determine that it needs to be halted, you can add a “-s” option (stop scrubbing) to the zpool scrub command line:

$ zpool scrub -s rpool

You can confirm the scrub stopped by running zpool status again:

$ zpool status

  pool: rpool
 state: ONLINE
 scrub: scrub stopped after 0h0m with 0 errors on Sat Oct 15 08:28:36 2011
config:

        NAME        STATE     READ WRITE CKSUM
        rpool       ONLINE       0     0     0
          c1t0d0s0  ONLINE       0     0     0

errors: No known data errors

This is pretty darn useful, and something I wish every file system had. fsck sucks, and being able to periodically check the consistency of your file system while it’s online is rad (for some reason I always want to watch Point Break after saying rad).
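If you want to make scrubs part of a routine (I do), a cron entry is the easy button. Here’s a minimal sketch that kicks off a scrub of rpool at 2am on the first of every month; adjust the pool name and schedule to fit your environment:

0 2 1 * * /usr/sbin/zpool scrub rpool

Since scrubs are I/O hungry (see above), pick a window when the system is idle.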

Better ZFS pool fault handling coming to an opensolaris release near you!

I just saw the following ARC case fly by, and this will be a welcome addition to the ZFS file system:

OVERVIEW:

       Uncooperative or deceptive hardware, combined with power
       failures or sudden lack of access to devices, can result in
       zpools without redundancy being non-importable.  ZFS'
       copy-on-write and Merkle tree properties will sometimes allow
       us to recover from these problems. Only ad-hoc means currently
       exist to take advantage of this recoverability. This proposal
       aims to rectify that short-coming.

PROPOSED SOLUTION:

       This fast-track proposes two new command line flags each for
       the 'zpool clear' and 'zpool import' sub-commands.

       Both sub-commands will now accept a '-F' recovery mode flag.
       When specified, a determination is made if discarding the last
       few transactions performed in an unopenable or non-importable
       pool will return the pool to an usable state.  If so, the
       transactions are irreversibly discarded, and the pool
       imported.  If the pool is usable or already imported and this
       flag is specified, the flag is ignored and no transactions are
       discarded.

       Both sub-commands will now also accept a '-n' flag.  This flag
       is only meaningful in conjunction with the '-F' flag.  When
       specified, an attempt is made to see if discarding transactions
       will return the pool to a usable state, but no transactions are
       actually discarded.

I have encountered errors where this feature would have been handy, and I’ll be stoked when it’s available in Solaris 10 / Solaris next.
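For reference, here’s roughly what the recovery workflow should look like once the feature ships (tank is a made up pool name). The “-n” flag dry runs the recovery so you can see whether discarding the last few transactions would make the pool importable, and “-F” actually does it:

$ zpool import -F -n tank

$ zpool import -F tank

Per the proposal above, zpool clear accepts the same “-F” flag for pools that are already in the system but won’t open.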

Triple parity RAIDZ (RAIDZ3) support in ZFS

I just saw the following putback notice come over the wire:

Author: Adam Leventhal
Repository: /hg/onnv/onnv-gate
Latest revision: 17811c723fb4f9fce50616cb740a92c8f6f97651
Total changesets: 1
Log message:
6854612 triple-parity RAID-Z

This is pretty sweet, and with the introduction of 2TB+ drives, using multiple parity drives will become essential to ensuring that your data stays safe when drive failures occur.
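Once this hits a build, creating a triple-parity pool should look just like raidz and raidz2 do today, only with a raidz3 keyword. A quick sketch with made up device names:

$ zpool create tank raidz3 c0t1d0 c0t2d0 c0t3d0 c0t4d0 c0t5d0 c0t6d0

With this layout any three of the six drives can fail without losing data, at the cost of three drives worth of capacity going to parity.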