Why the ext3 fsck’s after X days or Y mounts?

Reading through my RSS feeds, I came across the following blog post describing a Linux administrator using tune2fs to disable the “please run fsck on this file system after X days or Y mounts” check.
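For reference, here’s roughly what that post does with tune2fs. This is a minimal sketch; /dev/sda1 is just a placeholder for whatever ext2/3 partition you’re tuning:

    # Disable the mount-count trigger (-c 0) and the time-interval
    # trigger (-i 0) stored in the ext2/3 superblock.
    tune2fs -c 0 /dev/sda1
    tune2fs -i 0 /dev/sda1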

I’ve got to admit, this is kind of annoying. I’ve taken production-critical Linux boxes down for some maintenance, only to have the downtime extended +15-30 minutes because the file system was configured to run a fsck. Google searching this topic even shows other administrators trying all sorts of stupid tactics to avoid the fsck on reboot.
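If you’d rather not disable the checks outright, you can at least see whether a forced fsck is coming before you reboot. A quick sketch (again, the device name is a placeholder):

    # Compare "Mount count" against "Maximum mount count", and look at
    # "Next check after", to predict whether the next boot will fsck.
    tune2fs -l /dev/sda1 | grep -i -E 'mount count|check'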

Is there really any value in having fsck run after some period of time? On Unix-based systems (and even in Windows), fsck (or chkdsk) only runs when the kernel notices that a file system is in some sort of inconsistent state. So then I ask, why did the Linux community decide to run fsck on file systems in a consistent state? ZFS has a “scrub” operation that can be run against a dataset, but even that is comparing block-level checksums. Ext2/3, ReiserFS, and XFS don’t perform block-level checksums (btrfs does), so why the need to run fsck after some period of time? Does running fsck give folks the warm n’ fuzzies that their data is clean, or is there some deeper technical reason why this is scheduled? If you have any answers / historical data, please feel free to share. =)
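For comparison, here’s what kicking off and watching a ZFS scrub looks like; the pool name “tank” is just the usual example name:

    # Walk every allocated block in the pool and verify its checksum.
    zpool scrub tank
    # Report scrub progress and any errors found (or repaired) so far.
    zpool status tank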

8 thoughts on “Why the ext3 fsck’s after X days or Y mounts?”

  1. I would imagine that silent data corruption is one reason why people still won’t turn off fsck checks on boot. The only file system I know of that’s resistant to this is ZFS.

  2. I always thought that it was mostly to perform a regular defrag, but right now I can’t find an official link.
    I am going to keep searching…

  3. Sorry, I was wrong, fsck can’t perform any defrag.

    So I guess the standard setting only makes sense when the file system is mounted without journaling (journaling is enabled by default on most distros).

    I personally use journaling and deactivate any boot scan.

  4. I’ve always assumed it was an artifact of pushing Linux to desktop users, who wouldn’t know how to tell if their filesystem needed checking, let alone how to do it.

  5. Or even simpler…it’s a sensible default that works well for most situations and doesn’t cause undue harm.

    Once upon a time in 2k10…we performed maintenance on 14 hosts at the same time…and they all kicked off a fsck of their SAN-attached LUNs simultaneously :(

  6. tune2fs(8) – see below – talks about possible data corruption due to faulty disks, cables, and/or memory, plus kernel bugs, any of which may corrupt the FS without the kernel noticing it, so it would never become visibly dirty to the kernel.

    Though you could still ask yourself whether it matters that a fsck is forced on the n’th mount/reboot when your silent data corruption possibly happened a long time ago…

    You should strongly consider the consequences of disabling mount-count-dependent checking entirely. Bad disk drives, cables, memory, and kernel bugs could all corrupt a filesystem without marking the filesystem dirty or in error. If you are using journaling on your filesystem, your filesystem will never be marked dirty, so it will not normally be checked. A filesystem error detected by the kernel will still force an fsck on the next reboot, but it may already be too late to prevent data loss at that point.

  7. +15-30 minutes?
    Man, you’re either lucky or have small disks. I have several terabytes on my company fileserver, and I’ll be very happy if the thing is up and running in 2-3 hours.
