ZFS user and group quotas

ZFS allows a quota to be defined for each file system, but it currently lacks the ability to define user and group quotas inside a file system (you can create one file system per user to work around this). This issue is being addressed, and user and group quotas will soon be part of OpenSolaris. Here is a blurb from the ARC case that was submitted to address this issue:

“A. SUMMARY

This case adds support to ZFS for user/group quotas & per-uid/gid space
tracking.

B. PROBLEM

Enterprise customers often want to know who is using space, based on
what uid and gid owns each file.

Education customers often want to apply per-user quotas to hundreds of
thousands of users. In these situations, the number of users and/or
existing infrastructure prohibits using one filesystem per user and
setting filesystem-wide quotas.

1. Overview

Each filesystem keeps track of how much space inside it is owned by each
user (uid) and group (gid). This is the amount of space “referenced”,
so relationships between filesystems, descendents, clones, and snapshots
are ignored, and each tracks their “user used” and “group used”
independently. This is the same policy behind the “referenced”,
“refquota”, and “refreservation” properties. The amount of space
charged is the amount of space reported by struct stat’s st_blocks and
du(1).

Both POSIX ids (uid & gid) and untranslated SIDs are supported (eg, when
sharing filesystems over SMB without a name service translation set up).

ZFS will now enforce quotas on the amount of space referenced by files
owned by particular users and groups. Enforcement may be delayed by
several seconds. In other words, users may go a bit over their quota
before the system notices that they are over quota and begins to refuse
additional writes with EDQUOT. This decision was made to get the
feature to market in a reasonable time, with a minimum of engineering
resources expended. The design and implementation do not preclude
implementing strict enforcement at a later date.”

This will be pretty sweet, and universities and other institutions that support lots of users will be super happy when this feature is integrated!
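
In the meantime, the workaround is one file system per user with a filesystem-wide quota. Based on the ARC case, per-user and per-group quotas are expected to look roughly like the last three commands below (the pool, user, and group names are made up for illustration, and the final syntax could still change before integration):

# today: one file system per user, each with its own quota
$ zfs create tank/home/alice
$ zfs set quota=10G tank/home/alice

# expected once the ARC case integrates: per-user/per-group quotas on a shared file system
$ zfs set userquota@alice=10G tank/home
$ zfs set groupquota@staff=100G tank/home
$ zfs get userused@alice tank/home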

Recovering deleted files with ext3grep

While perusing packages on my Debian 5 host, I came across the ext3grep utility. Ext3grep allows you to poke and prod ext file system metadata structures (superblocks, inode bitmaps, block details, etc.), and has the ability to recover deleted files. To see firsthand how file recovery works (this is a useful thing to have in your bag of tricks), I created an ext3 file system to test it out:

$ dd if=/dev/zero of=/fstmp bs=512 count=4192

4192+0 records in
4192+0 records out
2146304 bytes (2.1 MB) copied, 0.0218335 s, 98.3 MB/s


$ mkfs.ext3 /fstmp

mke2fs 1.41.3 (12-Oct-2008)
/fstmp is not a block special device.
Proceed anyway? (y,n) y
Filesystem label=
OS type: Linux
Block size=1024 (log=0)
Fragment size=1024 (log=0)
264 inodes, 2096 blocks
104 blocks (4.96%) reserved for the super user
First data block=1
Maximum filesystem blocks=2359296
1 block group
8192 blocks per group, 8192 fragments per group
264 inodes per group

Writing inode tables: done                            
Creating journal (1024 blocks): done
Writing superblocks and filesystem accounting information: done

This filesystem will be automatically checked every 32 mounts or
180 days, whichever comes first.  Use tune2fs -c or -i to override.


$ mount -o loop /fstmp /mnt


Once the file system was mounted, I copied over and immediately removed a file:

$ cd /mnt


$ cp /etc/services .


$ ls -la

total 37
drwxr-xr-x  3 root root  1024 2009-03-28 12:43 .
drwxr-xr-x 22 root root  4096 2009-03-28 12:41 ..
drwx------  2 root root 12288 2009-03-28 12:41 lost+found
-rw-r--r--  1 root root 18480 2009-03-28 12:43 services



$ rm -f services


$ cd /


$ umount /mnt


After the file was removed, I used the ext3grep utility’s “--dump-name” option to display a list of file names:

$ ext3grep --dump-name /fstmp

Running ext3grep version 0.8.0
WARNING: I don't know what EXT3_FEATURE_COMPAT_EXT_ATTR is.
Number of groups: 1
Minimum / maximum journal block: 60 / 1089
Loading journal descriptors... sorting... done
The oldest inode block that is still in the journal, appears to be from 1238258603 = Sat Mar 28 12:43:23 2009
Number of descriptors in journal: 16; min / max sequence numbers: 2 / 3
Finding all blocks that might be directories.
D: block containing directory start, d: block containing more directory entries.
Each plus represents a directory start that references the same inode as a directory start that we found previously.

Searching group 0: DD+
Writing analysis so far to 'fstmp.ext3grep.stage1'. Delete that file if you want to do this stage again.
Result of stage one:
  2 inodes are referenced by one or more directory blocks, 2 of those inodes are still allocated.
  1 inodes are referenced by more than one directory block, 1 of those inodes is still allocated.
  0 blocks contain an extended directory.
Result of stage two:
  2 of those inodes could be resolved because they are still allocated.
All directory inodes are accounted for!

Writing analysis so far to 'fstmp.ext3grep.stage2'. Delete that file if you want to do this stage again.
lost+found
services



In the output above, we can see that the services file I previously removed is listed. To recover deleted files, you can run ext3grep with the “--restore-file” option to restore individual files, or with the “--restore-all” option to restore all deleted files:

$ ext3grep --restore-all /fstmp

Running ext3grep version 0.8.0
WARNING: I don't know what EXT3_FEATURE_COMPAT_EXT_ATTR is.
Number of groups: 1
Minimum / maximum journal block: 60 / 1089
Loading journal descriptors... sorting... done
The oldest inode block that is still in the journal, appears to be from 1238260343 = Sat Mar 28 13:12:23 2009
Number of descriptors in journal: 23; min / max sequence numbers: 21 / 25
Loading fstmp.ext3grep.stage2... done
Restoring services



After the restore operation completes, you can change to the RESTORED_FILES directory to view the recovered files:

$ cd RESTORED_FILES/


$ ls -la

total 32
drwxr-xr-x 3 root root  4096 2009-03-28 13:15 .
drwxr-xr-x 6 root root  4096 2009-03-28 13:15 ..
drwx------ 2 root root  4096 2009-03-28 12:41 lost+found
-rw-r--r-- 1 root root 18480 2009-03-28 13:12 services



$ openssl md5 services /etc/services

MD5(services)= b92ef61d6606c6589df9c2c406632457
MD5(/etc/services)= b92ef61d6606c6589df9c2c406632457
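
If you only need to pull back a single file, the “--restore-file” option takes the file name relative to the root of the file system (a sketch based on the ext3grep documentation; only “--restore-all” was run above):

$ ext3grep --restore-file services /fstmp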



This is pretty fricking sweet, though there are a few caveats I should mention. It is possible that blocks will get overwritten between the file removal and recovery steps, which can hinder recovery. It is also possible that an inode may be recycled, which would make recovery a bit more difficult. Additionally, any recovery operation should work off a copy of the original data to avoid tainting the original file system. Those things said, ext3grep is an incredible tool, and Carlo’s write-up is a must read for those wanting to understand EXT file system structures and what it takes to recover files. Great work Carlo!
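
On a real system, working off a copy usually means imaging the device with dd and pointing ext3grep at the image (the device and paths below are just placeholders):

$ dd if=/dev/sda5 of=/recovery/sda5.img bs=1M
$ ext3grep --restore-all /recovery/sda5.img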

Disabling SELinux on CentOS Linux hosts

I spent a bunch of time a while back learning how SELinux works, and it definitely has some useful applications (especially with the tools that were recently added to assist with policy setup). On some of the hosts I support it is overkill, so I disable it in one of my kickstart postinstall actions. To disable SELinux, you can change the SELINUX variable to disabled in /etc/selinux/config:

$ more /etc/selinux/config

# This file controls the state of SELinux on the system.
# SELINUX= can take one of these three values:
#	enforcing - SELinux security policy is enforced.
#	permissive - SELinux prints warnings instead of enforcing.
#	disabled - SELinux is fully disabled.
SELINUX=disabled
# SELINUXTYPE= type of policy in use. Possible values are:
#	targeted - Only targeted network daemons are protected.
#	strict - Full SELinux protection.
SELINUXTYPE=targeted



If you are performing this action on a running host, you can save a reboot by using the setenforce utility to disable SELinux:

$ setenforce 0
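
For reference, the kickstart postinstall action boils down to something like this (a minimal sketch; it assumes the stock /etc/selinux/config layout):

%post
# flip SELinux to disabled in the shipped configuration file
sed -i 's/^SELINUX=.*/SELINUX=disabled/' /etc/selinux/config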

If I get some time in the next few weeks, I will clean up my SELinux notes and put them on the main prefetch.net home page.

Boot tracing in the Linux 2.6.28 kernel

While perusing the Linux change list, I noticed that boot tracing was introduced in the 2.6.28 kernel:

“The purpose of this tracer is to helps developers to optimize boot times: it records the timings of the initcalls. Its aim is to be parsed by the scripts/bootgraph.pl tool to produce graphics about boot inefficiencies, giving a visual representation of the delays during initcalls. Users need to enable CONFIG_BOOT_TRACER, boot with the “initcall_debug” and “printk.time=1” parameters, and run “dmesg | perl scripts/bootgraph.pl > output.svg” to generate the final data.”
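
In practice that boils down to something like the following. The config option, boot parameters, and bootgraph.pl invocation come straight from the quote above; the GRUB kernel line is just an example:

# GRUB kernel line with the tracing parameters appended (example)
kernel /vmlinuz-2.6.28 ro root=/dev/sda1 initcall_debug printk.time=1

# after the box boots, from the top of the kernel source tree
$ dmesg | perl scripts/bootgraph.pl > boot.svg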

This is pretty sweet, and I will definitely have to test this out once I finish up my Red Hat Cluster Suite + GFS testing!

Changing the zone path of a pre-existing zone

So, the hostname of one of our zones changed, and we create the ZFS file systems for our zones using <path>/<zonename>.

Easy enough of a fix…

root@db@blah-global:~#zoneadm list -cv
ID NAME             STATUS     PATH                           BRAND    IP
0 global           running    /                              native   shared
1 blah         running    /local2/data/zones/blah    native   shared
5 blah1          running    /local/data/zones/blah1      native   shared

root@db@blah-global:~#zonecfg -z blah1
zonecfg:blah1> set zonename=blah2
blah1: Zone state is invalid for the requested operation

So let's shut down the zone and change the zone name.

root@db@blah-global:~#zoneadm -z blah1 halt
root@db@blah-global:~#zonecfg -z blah1 info
zonename: blah1
zonepath: /local/data/zones/blah1

root@db@blah-global:~#zonecfg -z blah1
zonecfg:blah1> set zonename=blah2
zonecfg:blah2> verify
zonecfg:blah2> commit
zonecfg:blah2> exit

So now that we’ve changed the zone name, I also wanted to update the ZFS file system to reflect the new hostname…

root@db@blah-global:~#zfs rename pool0/local/data/zones/blah1 pool0/local/data/zones/blah2

Easy enough.  Let's boot the zone.

root@db@blah-global:~#zoneadm list -cv
ID NAME             STATUS     PATH                           BRAND    IP
0 global           running    /                              native   shared
1 blah         running    /local2/data/zones/blah    native   shared
- blah2    installed  /local/data/zones/blah1      native   shared

root@db@blah-global:~#zoneadm -z blah2 boot
zoneadm: /local/data/zones/blah1: No such file or directory
could not verify zonepath /local/data/zones/blah1 because of the above errors.
zoneadm: zone blah1 failed to verify

DOH.  Forgot to change the zone path. So let's go fix this…

root@db@blah-global:~#zonecfg -z blah2
zonecfg:blah2> set zonepath=/local/data/zones/blah2
Zone blah2 already installed; set zonepath not allowed.

WHAT?!?!  Come on now zonecfg, this shouldn't be brain surgery…

So, let's go poke at the source of truth: /etc/zones.

root@db@blah-global:/etc/zones#grep blah1 *
index:blah2:installed:/local/data/zones/blah1:6b9891a3-7029-ef67-9581-aa01475c9b6e
blah2.xml:<zone name="blah2" zonepath="/local/data/zones/blah1" autoboot="true">

So, edit blah2.xml to change the zone path, and make the same change to index.  (Please make backup copies of these files before doing so.)
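
Roughly, the edits look like this (Solaris sed has no -i option, so work from the backup copies; the paths are the ones from this example):

root@db@blah-global:/etc/zones#cp index index.bak
root@db@blah-global:/etc/zones#cp blah2.xml blah2.xml.bak
root@db@blah-global:/etc/zones#sed 's|/local/data/zones/blah1|/local/data/zones/blah2|' index.bak > index
root@db@blah-global:/etc/zones#sed 's|/local/data/zones/blah1|/local/data/zones/blah2|' blah2.xml.bak > blah2.xml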

Once the modifications were done, I was able to boot up the zone with the newly changed zonepath.

root@db@blah-global:/etc/zones#zoneadm list -cv
ID NAME             STATUS     PATH                           BRAND    IP
0 global           running    /                              native   shared
1 blah         running    /local2/data/zones/blah    native   shared
- blah2    installed  /local/data/zones/blah2 native   shared

root@db@blah-global:/etc/zones#zoneadm -z blah2 boot
root@db@blah-global:/etc/zones#zoneadm list -cv
ID NAME             STATUS     PATH                           BRAND    IP
0 global           running    /                              native   shared
1 blah         running    /local2/data/zones/blah    native   shared
6 blah2    running    /local/data/zones/blah2 native   shared
root@db@blah-global:/etc/zones#zlogin blah2
[Connected to zone 'blah2' pts/1]
Last login: Fri Mar 27 17:14:03 from blah-global
Sun Microsystems Inc.    SunOS 5.10    Generic    January 2005
#

Sun, could we please extend zonecfg to do this for us?  The header of /etc/zones/index is scary.

root@db@blah-global:~#cat /etc/zones/index
# Copyright 2004 Sun Microsystems, Inc.  All rights reserved.
# Use is subject to license terms.
#
# ident "@(#)zones-index        1.2     04/04/01 SMI"
#
# DO NOT EDIT: this file is automatically generated by zoneadm(1M)
# and zonecfg(1M).  Any manual changes will be lost.
#