Configuring JumpStart to install Solaris on a ZFS root

I was playing around with ZFS root a week or two back, and wanted to be able to create the ZFS root pool and its associated datasets (dump device, swap, /var) through JumpStart. To install to a ZFS root pool, you can add the “pool” directive to your client profile:

pool rpool auto 4g 4g rootdisk.s0

The entry above breaks down as follows:

pool <root pool name> <pool size> <swap size> <dump device size> <device list>

The device list can contain a single device for non-mirrored configurations, or multiple devices for mirrored configurations. If you specify a mirrored configuration, you will need to include the “mirror” keyword in your profile:

pool rpool auto 4g 4g mirror c0t0d0s0 c0t1d0s0

If you are using Live Upgrade, you can also name the boot environment with the “bootenv” keyword. This is pretty cool stuff, and it’s nice having the various ZFS features (checksums, snapshots, compression, etc.) available in the root pool!
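For reference, here is a minimal sketch of what a full ZFS root profile might look like. The cluster and boot environment name are made up for illustration, so check the JumpStart documentation for your release for the exact bootenv syntax:

install_type  initial_install
cluster       SUNWCXall
pool          rpool auto 4g 4g mirror c0t0d0s0 c0t1d0s0
bootenv       installbe bename zfsBE

With a profile along these lines, the installer should create the mirrored root pool, carve out the swap and dump volumes, and name the new boot environment zfsBE.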

Automating ZFS snapshots with the SMF auto-snapshot service

One of the nice features of ZFS is the ability to take file system snapshots, which you can then use to recover previously deleted data. In recent OpenSolaris and Nevada builds, there are several auto-snapshot services that can be used to schedule frequent, hourly, daily, weekly and monthly snapshots:

$ svcs -a | grep snap
disabled 9:28:28 svc:/system/filesystem/zfs/auto-snapshot:frequent
online 9:28:53 svc:/system/filesystem/zfssnap-roleadd:default
online 12:55:54 svc:/system/filesystem/zfs/auto-snapshot:daily
online 12:56:02 svc:/system/filesystem/zfs/auto-snapshot:weekly
online 12:56:11 svc:/system/filesystem/zfs/auto-snapshot:monthly
online 12:58:37 svc:/system/filesystem/zfs/auto-snapshot:hourly

To enable scheduled snapshots (these services are disabled by default), you can enable one or more of these services with svcadm. Once enabled, each service adds a cron entry to the zfssnap user's crontab.
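For example, to turn on the daily and hourly schedules (the service FMRIs come straight from the svcs output above):

$ svcadm enable svc:/system/filesystem/zfs/auto-snapshot:daily
$ svcadm enable svc:/system/filesystem/zfs/auto-snapshot:hourly

Once the services are online, the zfssnap crontab contains the corresponding entries: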

$ cat /var/spool/cron/crontabs/zfssnap
0 1,2,3,4,5,6,7,8,9,10,11,12,13,14,15,16,17,18,19,20,21,22,23,24,25,26,27,28,29,30,31 * * /lib/svc/method/zfs-auto-snapshot svc:/system/filesystem/zfs/auto-snapshot:daily
0 0 1,8,15,22,29 * * /lib/svc/method/zfs-auto-snapshot svc:/system/filesystem/zfs/auto-snapshot:weekly
0 0 1 1,2,3,4,5,6,7,8,9,10,11,12 * /lib/svc/method/zfs-auto-snapshot svc:/system/filesystem/zfs/auto-snapshot:monthly
0 0,1,2,3,4,5,6,7,8,9,10,11,12,13,14,15,16,17,18,19,20,21,22,23 * * * /lib/svc/method/zfs-auto-snapshot svc:/system/filesystem/zfs/auto-snapshot:hourly

The cron jobs are used to schedule the automated snapshots, and are added and removed when one of the services is enabled or disabled. I’m not entirely clear why the auto-snapshot author didn’t use “*” in the daily and hourly entries, but hopefully there is a good reason.

To view the list of snapshots on a host, you can use the zfs utility:

$ zfs list -r -t snapshot | head -5

NAME                                                       USED  AVAIL  REFER  MOUNTPOINT
bits@zfs-auto-snap:daily-2009-06-18-12:55                     0      -  34.4K  -
bits@zfs-auto-snap:weekly-2009-06-18-12:56                    0      -  34.4K  -
bits@zfs-auto-snap:monthly-2009-06-18-12:56                   0      -  34.4K  -
bits@zfs-auto-snap:daily-2009-06-19-00:00                     0      -  34.4K  -

If you need to recover a file that was previously deleted, you can cd into the “.zfs/snapshot” directory in the file system that contained the deleted data:

$ zfs list bits/home

NAME        USED  AVAIL  REFER  MOUNTPOINT
bits/home   181K  1.17T   149K  /home

$ cd /home/.zfs/snapshot

Locate the correct snapshot to recover the file from with ls:

$ ls -l | tail -5
drwxr-xr-x 3 root root 3 May 19 22:43 zfs-auto-snap:hourly-2009-06-20-09:00
drwxr-xr-x 3 root root 3 May 19 22:43 zfs-auto-snap:hourly-2009-06-20-10:00
drwxr-xr-x 3 root root 3 May 19 22:43 zfs-auto-snap:hourly-2009-06-20-11:00
drwxr-xr-x 3 root root 3 May 19 22:43 zfs-auto-snap:monthly-2009-06-18-12:56
drwxr-xr-x 3 root root 3 May 19 22:43 zfs-auto-snap:weekly-2009-06-18-12:56

And then recover the file with cp (or a restore program):

$ find . -name importantfile
./matty/importantfile

$ cp matty/importantfile /tmp

$ ls -la /tmp/importantfile
-rw-r--r-- 1 root root 101186 Jun 20 11:30 /tmp/importantfile

This is pretty sweet, and being able to enable automated snapshots with a couple of svcadm invocations is super convenient!

ZFS user and group quotas

ZFS allows quotas to be defined for each file system, but it currently lacks the ability to define user and group quotas inside a file system (you can create one file system per user to work around this). This issue is being addressed, and user and group quotas will soon be part of OpenSolaris. Here is a blurb from the ARC case that was submitted to address this issue:

“A. SUMMARY

This case adds support to ZFS for user/group quotas & per-uid/gid space
tracking.

B. PROBLEM

Enterprise customers often want to know who is using space, based on
what uid and gid owns each file.

Education customers often want to apply per-user quotas to hundreds of
thousands of users. In these situations, the number of users and/or
existing infrastructure prohibits using one filesystem per user and
setting filesystem-wide quotas.

1. Overview

Each filesystem keeps track of how much space inside it is owned by each
user (uid) and group (gid). This is the amount of space “referenced”,
so relationships between filesystems, descendents, clones, and snapshots
are ignored, and each tracks their “user used” and “group used”
independently. This is the same policy behind the “referenced”,
“refquota”, and “refreservation” properties. The amount of space
charged is the amount of space reported by struct stat’s st_blocks and
du(1).

Both POSIX ids (uid & gid) and untranslated SIDs are supported (eg, when
sharing filesystems over SMB without a name service translation set up).

ZFS will now enforce quotas on the amount of space referenced by files
owned by particular users and groups. Enforcement may be delayed by
several seconds. In other words, users may go a bit over their quota
before the system notices that they are over quota and begins to refuse
additional writes with EDQUOT. This decision was made to get the
feature to market in a reasonable time, with a minimum of engineering
resources expended. The design and implementation do not preclude
implementing strict enforcement at a later date.”

This will be pretty sweet, and universities and other institutions that support lots of users will be super happy when this feature is integrated!
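To put the change in perspective, here is a rough sketch of the one-file-system-per-user workaround alongside the per-user interface described in the ARC case (the pool, file system, user, and group names are made up, and the new syntax is my reading of the proposal rather than something I have tested):

# Today: one file system per user, each with its own file system quota
$ zfs create tank/home/joe
$ zfs set quota=10g tank/home/joe

# With the new feature: user and group quotas inside a single file system
$ zfs set userquota@joe=10g tank/home
$ zfs set groupquota@staff=100g tank/home

# Per-user space accounting
$ zfs get userused@joe tank/home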

ZFS in the trenches

Ben Rockwood is awesome. If you haven’t had a chance to check out his blog, it’s a must-read for any Solaris admin.

He gave a presentation about ZFS at the Open Storage Summit (video here). It’s worth reading or viewing for some in-depth ZFS concepts.

Joerg assisted in turning some of this into plain English so the rest of us can understand.

Kudos to Ben for giving props to Matty in the PDF!

ZFS dataset and volume properties

A few chapters of the upcoming OpenSolaris Bible have been released. Looking through chapter 8 on ZFS, I came across this handy list of properties and their descriptions.

File System Properties
aclinherit – Inheritance of ACL entries
aclmode – Modification of ACLs in a chmod(2) operation
atime – Whether access times of files are updated when read
available – Space available to the file system
canmount – Whether the file system is mountable
casesensitivity – Case sensitivity of filename matching
checksum – Checksum algorithm for data integrity
compression – Compression algorithm
compressratio – Compression ratio achieved
copies – Number of data copies stored
creation – Time the file system was created
devices – Whether device nodes can be opened
exec – Whether processes can be executed
mounted – Whether the file system is mounted
mountpoint – Mount point for the file system
nbmand – Use of nonblocking mandatory locks with CIFS
normalization – Use Unicode-normalized filenames in name comparisons
origin – Snapshot on which a clone is based
primarycache – Controls whether ZFS data and metadata are cached in the primary cache
quota – Limit on space that the file system can consume
readonly – Whether the file system can be modified
recordsize – Suggested block size for files
referenced – Amount of data accessible within the file system
refquota – Space limit for this file system
refreservation – Minimum space guaranteed to the file system
reservation – Minimum space guaranteed to the file system and descendants
secondarycache – Controls whether ZFS data and metadata are cached in the secondary cache
setuid – Allow setuid file execution
shareiscsi – Export volumes within the file system as iSCSI targets
sharenfs – Share the file system via NFS
sharesmb – Share the file system via CIFS
snapdir – Whether the .zfs directory is visible
type – Type of dataset
used – Space consumed by the file system and descendants
usedbychildren – Space freed if children of the file system were destroyed
usedbydataset – Space freed if snapshots and refreservation were destroyed, and contents of the file system were deleted
usedbyrefreservation – Space freed if the refreservation was removed
usedbysnapshots – Space freed if all snapshots of the file system were destroyed
utf8only – Use only UTF-8 character set for filenames
version – On-disk version of the file system
vscan – Whether to scan regular files for viruses
xattr – Whether extended attributes are enabled
zoned – Whether the file system is managed from a nonglobal zone

Volume Properties
available – Space available to the volume
checksum – Checksum algorithm for data integrity
compression – Compression algorithm
compressratio – Compression ratio achieved
copies – Number of data copies stored
creation – Time the volume was created
origin – Snapshot on which the clone is based
primarycache – Controls whether ZFS data and metadata are cached in the primary cache
readonly – Whether the volume can be modified
referenced – Amount of data accessible within the volume
refreservation – Minimum space guaranteed to the volume
reservation – Minimum space guaranteed to the volume and descendants
secondarycache – Controls whether ZFS data and metadata are cached in the secondary cache
shareiscsi – Export the volume as an iSCSI target
type – Type of dataset
used – Space consumed by the volume and descendants
usedbychildren – Space freed if children of the volume were destroyed
usedbydataset – Space freed if snapshots and refreservation were destroyed, and contents of the volume were deleted
usedbyrefreservation – Space freed if the refreservation was removed
usedbysnapshots – Space freed if all snapshots of the volume were destroyed
volblocksize – Block size of the volume
volsize – Logical size of the volume

Some of these properties were new to me, as they probably only exist in later versions of ZFS in OpenSolaris. Specifically, vscan, which scans regular files for viruses, is interesting; I’m wondering where the virus definitions are stored and updated. This is actually a pretty nifty feature if you plan on using Solaris’ new in-kernel SMB server to share data with Microsoft Windows-based clients.

UPDATE: Richard provided via a comment an awesome link that shows how to administer the new CIFS server within OpenSolaris, as well as how to run virus scans using the vscanadm utility. Take a look at the slides at that link for an in-depth administrative tour of these features.

I’d like to learn more about the primarycache and secondarycache settings, and exactly what gets tuned when fiddling around with them.
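From what I understand, primarycache controls what a dataset is allowed to cache in the in-memory ARC, and secondarycache does the same for L2ARC cache devices, with the possible values all, metadata, and none. A quick sketch, using a made-up dataset name:

# Cache only metadata in the ARC for this dataset
$ zfs set primarycache=metadata tank/db

# Skip L2ARC cache devices for this dataset entirely
$ zfs set secondarycache=none tank/db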

There is also a property called “copies”, which allows you to specify how many copies of the data should be kept on disk. I’m not sure exactly why you would want to increase the number of copies of data instead of using raidz, raidz2, mirroring, hot spares, etc., but it’s neat that the option is there.
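Most of these properties are set and queried with zfs set and zfs get. A quick sketch (the pool and file system names are made up):

# Enable compression and keep two copies of each data block
$ zfs set compression=on tank/home
$ zfs set copies=2 tank/home

# Check the resulting values
$ zfs get compression,compressratio,copies tank/home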

Figuring out if a dedicated ZFS intent log will help

ZFS uses the ZFS intent log (also referred to as the ZIL) to store synchronous writes. This has the advantage that a full transaction group doesn’t need to be written out when a synchronous write occurs, and it maximizes the use of I/O bandwidth. For some applications (databases come to mind), placing the ZIL on a dedicated device can be extremely helpful. But how can you tell whether placing the intent log on a separate device would be useful? To answer this question, you can run Richard Elling’s super handy zilstat DTrace script. As SSDs become cheaper and cheaper, this script will be a useful tool for folks who want to improve application and database performance.
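As a rough sketch of the workflow (the pool and device names are made up, and the zilstat script's exact options may differ by version): run zilstat for a while to see how much synchronous write traffic the ZIL is handling, and if the numbers are significant, add a dedicated (ideally SSD-backed) log device to the pool:

# Watch ZIL activity; the arguments here are an interval and a count
$ ./zilstat 1 10

# Add a dedicated log device to the pool
$ zpool add tank log c3t0d0

# Verify the log device shows up in the pool configuration
$ zpool status tank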