Checking ext3 file system consistency on production systems

As an admin, there is nothing worse than the feeling you get when you determine you are dealing with file system corruption. Whether it’s a lost inode or a corrupted superblock, I always get a big knot in my stomach when I figure out that corruption exists. With modern file systems like ZFS it’s trivial to check file system consistency while the server is online. But with older file systems (ext3, ext4, etc.) you typically need to unmount the file system, run fsck and wait (sometimes for hours!) to thoroughly check its consistency.

I recently came across an ingenious idea from Theodore Ts’o on the Red Hat ext3 users mailing list. Assuming you are using LVM, you can create a snapshot of your volume and then run fsck against the snapshot while the server is online. Nice! Because the snapshot is a frozen point-in-time copy of the volume, fsck can walk it without the live file system changing underneath it. Ted posted a sample script to the list, and I’m currently testing this out on some large QA database machines. This may be a good solution to use while we wait for btrfs to stabilize and release a file system check tool (btrfsck). I’ll post my thoughts on online fsck once I get this working reliably on a few production systems.
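To make the snapshot-then-fsck idea concrete, here is a minimal Python sketch of the sequence. This is not Ted's script; it is my own rough outline. The volume group and volume names (vg0, data), the snapshot name and the 1G snapshot size are all made-up placeholders, and I'm assuming a read-only check (e2fsck -f -n) is what you want against the snapshot. It defaults to a dry run that only prints the commands.

```python
import subprocess


def snapshot_fsck_commands(vg, lv, snap_name="fsck-snap", snap_size="1G"):
    """Build the three commands for checking /dev/<vg>/<lv> via an LVM snapshot.

    All names and the snapshot size are illustrative placeholders.
    """
    origin = f"/dev/{vg}/{lv}"
    snapshot = f"/dev/{vg}/{snap_name}"
    return [
        # Create a copy-on-write snapshot of the live volume.
        ["lvcreate", "--snapshot", "--size", snap_size, "--name", snap_name, origin],
        # Force a full check; -n opens the snapshot read-only and answers "no"
        # to every repair prompt, so nothing is modified.
        ["e2fsck", "-f", "-n", snapshot],
        # Drop the snapshot once the check is done.
        ["lvremove", "-f", snapshot],
    ]


def check_online(vg, lv, dry_run=True, **kwargs):
    """Run (or, by default, just print) the snapshot check.

    Returns the e2fsck exit status, or None on a dry run.
    """
    create, check, remove = snapshot_fsck_commands(vg, lv, **kwargs)
    if dry_run:
        for cmd in (create, check, remove):
            print(" ".join(cmd))
        return None
    subprocess.run(create, check=True)
    try:
        # e2fsck exits 0 when the file system is clean; do not raise on
        # non-zero so the caller can inspect the status.
        return subprocess.run(check).returncode
    finally:
        # Always remove the snapshot, even if the check fails.
        subprocess.run(remove, check=True)


if __name__ == "__main__":
    check_online("vg0", "data")
```

The volume group needs enough free extents to hold the snapshot, and the snapshot has to be big enough to absorb the writes that hit the origin volume while the check runs. If the check turns up problems, you still need to schedule downtime to repair the real file system; the snapshot check only tells you whether that downtime is necessary.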

This article was posted by Matty on 2011-10-15 09:02:00 -0400 EDT