I received a call from one of my users today, and he mentioned that the /var file system utilization reported by df did not match the output from du. I logged into the box and ran the df and du commands to see how much space was actually being used:
$ df -h /var
Filesystem size used avail capacity Mounted on
/dev/md/dsk/d3 3.9G 2.0G 1.8G 53% /var
$ cd /var && du -sk .
302898
Once I saw this information, I realized that a file had most likely been unlinked from the file system, but was still open by one or more processes. du only counts files it can reach through the directory tree, while df reports the blocks the file system has allocated, so space held by an unlinked-but-open file shows up in df but not in du. To see which process was responsible for this annoyance, I used the lsof "+L1" option to list open files with a link count of zero:
$ lsof +L1
COMMAND PID USER FD TYPE DEVICE SIZE/OFF NLINK NODE NAME
evhandsd 1424 root 3w VREG 85,3 897032 0 7404 /var (/dev/md/dsk/d3)
syslogd 1818 root 14w VREG 85,3 1884238513 0 6803 /var (/dev/md/dsk/d3)
[ ... ]
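If you have never run into this before, it is easy to reproduce on a test box. Here is a rough sketch (the file name and size are made up, and mkfile is Solaris-specific; dd from /dev/zero will do the same thing elsewhere) that creates a large file, holds it open with a long-running process, and then unlinks it:

$ mkfile 512m /var/tmp/bigfile
$ sleep 600 < /var/tmp/bigfile &
$ rm /var/tmp/bigfile
$ lsof +L1

Until the sleep process exits (or you kill it), df will keep counting those 512MB against /var, du will not, and lsof +L1 will show the sleep process holding the unlinked file with an NLINK of 0.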
Errrr ... based on this information, it looks like syslogd has a 1.8GB logfile open with a link count of zero (I wish I had process accounting running so I could see which process unlinked this file out from under syslogd). To fix this issue and synchronize the df and du output, I restarted syslogd:
$ /etc/init.d/syslogd stop && /etc/init.d/syslogd start
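As an aside, if this had been a Solaris 10 host running under SMF, the equivalent restart would go through svcadm (assuming the stock system-log service):

$ svcadm restart svc:/system/system-log:default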
Restarting the daemon closed the stale file descriptor, which allowed the file to go away and the df and du output to match:
$ df -h /var
Filesystem size used avail capacity Mounted on
/dev/md/dsk/d3 3.9G 302M 3.6G 8% /var
$ du -sk /var
300668 /var
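One lesson for next time: when a large logfile needs to be cleared out from under a running daemon, truncating it in place avoids this situation entirely, since the directory entry and inode stick around. Something along these lines, with a made-up logfile name:

$ cp /dev/null /var/log/somelog

The usual caveat applies: if the daemon did not open the file in append mode, it will keep writing at its old offset and the file becomes sparse, but nothing gets unlinked and no space gets stuck.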
This little exercise reminded me how awesome lsof is.