I’ve been looking into deploying ZFS de-duplication, and I have one application in particular (backup staging) that would greatly benefit from it. George Wilson did an awesome introduction to ZFS de-duplication video, and it’s a great place to get started. I’m planning to start testing out de-duplication as soon as my SSDs are ordered, and hopefully I will have some positive results to report!
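For anyone who wants to experiment before committing, deduplication is enabled per dataset with the zfs utility, and zdb can estimate the potential savings up front. A minimal sketch (the pool and dataset names "tank" and "tank/backup" are placeholders, not from my environment):

```shell
# Simulate deduplication on an existing pool to estimate the dedup
# ratio before turning the feature on.
zdb -S tank

# Enable deduplication on the backup staging dataset.
zfs set dedup=on tank/backup

# The DEDUP column reports the pool-wide deduplication ratio.
zpool list tank
```

Dedup tables need to stay in memory (or on fast L2ARC devices) to perform well, which is why I'm waiting on the SSDs before testing.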
I got a chance to catch up with a bunch of stuff in my “need to read” / “need to watch” folder this past weekend. One of the videos I watched talked about ZFS feature flags, and how they will be used by the Illumos community to add new features to ZFS. ZFS feature flags make a lot of sense, and will definitely be invaluable once ZFS is extended by more than one organization. Cool video, and well worth the ten minutes it takes to watch it.
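On an illumos-derived system that supports feature flags, each feature surfaces as a pool property, so you can see what's enabled with zpool get. A quick sketch (the pool name "tank" is a placeholder):

```shell
# Each feature flag appears as a feature@ pool property with a state
# of disabled, enabled, or active.
zpool get all tank | grep feature@

# List the pool versions / features the installed software supports.
zpool upgrade -v
```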
I talked about the ZFS scrub feature a few months back. In the latest Solaris 10 update the developers added additional scrub statistics, which are quite handy for figuring out throughput and estimated completion times:
$ zpool scrub rpool
$ zpool status -v
  pool: rpool
 state: ONLINE
  scan: scrub in progress since Tue Dec  6 07:45:31 2011
    1005M scanned out of 81.0G at 29.5M/s, 0h46m to go
    0 repaired, 1.21% done
config:

        NAME        STATE     READ WRITE CKSUM
        rpool       ONLINE       0     0     0
          c1t0d0s0  ONLINE       0     0     0

errors: No known data errors
This sure beats the previous output! Nice job team Solaris.
There have been a number of articles written over the past few years that talk about how silent data corruption can occur due to faulty hardware, solar flares as well as software defects. I’ve seen some oddities in the past that would probably fall into these categories, but without sufficient time to dig deep it’s impossible to know for sure.
With ZFS this is no longer the case. ZFS checksums every block of data that is written to disk, and compares this checksum when the data is read back into memory. If the checksums don’t match, we know the data was changed by something other than ZFS (assuming a ZFS bug isn’t the culprit), and if we are using ZFS to RAID protect the storage, the issue will be automatically fixed for us.
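Checksumming is on by default, and the algorithm is a per-dataset property you can inspect and tune. A short sketch (the dataset name "tank/data" is a placeholder):

```shell
# Show which checksum algorithm the dataset is using (the default is
# fletcher4 on current releases).
zfs get checksum tank/data

# A stronger algorithm such as sha256 can be selected, at some CPU
# cost; newly written blocks pick up the new setting.
zfs set checksum=sha256 tank/data
```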
But what if you have a lot of data on disk that isn’t read often? Well, there is a solution. ZFS provides a scrub option to read back all of the data in the file system and validate that the data still matches the computed checksum. This feature can be accessed by running the zpool utility with the “scrub” option and the name of the pool to scrub:
$ zpool scrub rpool
To view the status of the scrub you can run the zpool utility with the “status” option:
$ zpool status
  pool: rpool
 state: ONLINE
 scrub: scrub in progress for 0h0m, 3.81% done, 0h18m to go
config:

        NAME        STATE     READ WRITE CKSUM
        rpool       ONLINE       0     0     0
          c1t0d0s0  ONLINE       0     0     0

errors: No known data errors
The scrub operation will consume any and all I/O resources on the system (there are supposed to be throttles in place, but I’ve yet to see them work effectively), so you definitely want to run it when your system isn’t busy servicing your customers. If you kick off a scrub and determine that it needs to be halted, you can add a “-s” option (stop scrubbing) to the zpool scrub command line:
$ zpool scrub -s rpool
You can confirm the scrub stopped by running zpool again:
$ zpool status
  pool: rpool
 state: ONLINE
 scrub: scrub stopped after 0h0m with 0 errors on Sat Oct 15 08:28:36 2011
config:

        NAME        STATE     READ WRITE CKSUM
        rpool       ONLINE       0     0     0
          c1t0d0s0  ONLINE       0     0     0

errors: No known data errors
This is pretty darn useful, and something I wish every file system had. fsck sucks, and being able to periodically check the consistency of your file system while it’s online is rad (for some reason I always want to watch Point Break after saying rad).
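Since scrubs are only useful if they actually get run, I like to schedule them from cron during off hours. A minimal sketch (the weekly 2am schedule is just an example, pick a window that suits your workload):

```shell
# root crontab entry (add with "crontab -e"): scrub rpool every
# Sunday at 2am, when the system is hopefully idle.
0 2 * * 0 /usr/sbin/zpool scrub rpool
```

Checking `zpool status` periodically (or from a monitoring script) will tell you whether the last scrub found and repaired anything.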
I just saw the following ARC case fly by, and this will be a welcome addition to the ZFS file system!:
OVERVIEW:

Uncooperative or deceptive hardware, combined with power failures or sudden lack of access to devices, can result in zpools without redundancy being non-importable. ZFS' copy-on-write and Merkle tree properties will sometimes allow us to recover from these problems. Only ad-hoc means currently exist to take advantage of this recoverability. This proposal aims to rectify that short-coming.

PROPOSED SOLUTION:

This fast-track proposes two new command line flags each for the 'zpool clear' and 'zpool import' sub-commands. Both sub-commands will now accept a '-F' recovery mode flag. When specified, a determination is made if discarding the last few transactions performed in an unopenable or non-importable pool will return the pool to a usable state. If so, the transactions are irreversibly discarded, and the pool imported. If the pool is usable or already imported and this flag is specified, the flag is ignored and no transactions are discarded. Both sub-commands will now also accept a '-n' flag. This flag is only meaningful in conjunction with the '-F' flag. When specified, an attempt is made to see if discarding transactions will return the pool to a usable state, but no transactions are actually discarded.
I have encountered errors where this feature would have been handy, and will be stoked when this feature is available in Solaris 10 / Solaris next.
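Based on the flags described in the proposal, recovering a damaged pool would look something like this (the pool name "tank" is a placeholder):

```shell
# Dry run: check whether discarding the last few transactions would
# make the pool importable. With -n nothing is actually discarded.
zpool import -nF tank

# If the dry run looks good, irreversibly roll back those
# transactions and import the pool.
zpool import -F tank

# The same recovery flag applies to 'zpool clear' for a pool that is
# present but unopenable.
zpool clear -F tank
```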
I just saw the following putback notice come over the wire:
Author: Adam Leventhal
Latest revision: 17811c723fb4f9fce50616cb740a92c8f6f97651
Total changesets: 1
6854612 triple-parity RAID-Z
This is pretty sweet, and with the introduction of 2TB+ drives, using multiple parity drives will become essential to ensuring that your data is safe when drive failures occur.
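Creating a triple-parity pool uses the new raidz3 vdev type. A quick sketch (the pool name and the eight device names are examples):

```shell
# Build a raidz3 pool from eight disks; the pool can survive any
# three simultaneous disk failures, at the cost of three disks'
# worth of capacity going to parity.
zpool create tank raidz3 c0t0d0 c0t1d0 c0t2d0 c0t3d0 \
                         c0t4d0 c0t5d0 c0t6d0 c0t7d0
```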