Using the ZFS scrub feature to verify the integrity of your storage

There have been a number of articles written over the past few years that talk about how silent data corruption can occur due to faulty hardware, solar flares, and software defects. I’ve seen some oddities in the past that would probably fall into these categories, but without sufficient time to dig deep it was impossible to know for sure.

With ZFS this is no longer the case. ZFS checksums every block of data that is written to disk, and verifies that checksum when the data is read back into memory. If the checksums don’t match, we know the data was changed by something other than ZFS (assuming a ZFS bug isn’t the culprit), and if the pool has redundancy (a mirror or RAID-Z), the bad block will be automatically repaired for us from a good copy.
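To make the verify-on-read idea concrete, here is a toy Python model of a two-way mirror. It is only a sketch of the concept described above, not how ZFS is implemented: the ToyMirror class, its method names, and the choice of SHA-256 are all assumptions made for illustration.

```python
import hashlib


def checksum(block: bytes) -> str:
    """Checksum used by this toy model (SHA-256 chosen for illustration)."""
    return hashlib.sha256(block).hexdigest()


class ToyMirror:
    """Hypothetical two-way mirror that records one checksum per block."""

    def __init__(self):
        self.disks = [{}, {}]   # two copies of every block, keyed by address
        self.sums = {}          # checksum recorded at write time
        self.repairs = 0        # number of bad copies rewritten so far

    def write(self, addr: int, data: bytes) -> None:
        for disk in self.disks:
            disk[addr] = data
        self.sums[addr] = checksum(data)

    def read(self, addr: int) -> bytes:
        expected = self.sums[addr]
        # Find any copy whose checksum still matches what was recorded.
        good = next((d[addr] for d in self.disks
                     if checksum(d[addr]) == expected), None)
        if good is None:
            raise IOError(f"block {addr}: no copy matches its checksum")
        # Self-heal: overwrite every copy that no longer matches.
        for disk in self.disks:
            if checksum(disk[addr]) != expected:
                disk[addr] = good
                self.repairs += 1
        return good
```

If one copy of a block is altered behind the model’s back, the next read() still returns the original data and rewrites the damaged copy. If both copies are damaged, the read fails rather than returning bad data.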

But what if you have a lot of data on disk that isn’t read often? Well, there is a solution. ZFS provides a scrub operation that reads back all of the data in the pool and validates that it still matches the stored checksums. This feature can be accessed by running the zpool utility with the “scrub” option and the name of the pool to scrub:

$ zpool scrub rpool

To view the status of the scrub you can run the zpool utility with the “status” option:

$ zpool status

  pool: rpool
 state: ONLINE
 scrub: scrub in progress for 0h0m, 3.81% done, 0h18m to go
config:

        NAME        STATE     READ WRITE CKSUM
        rpool       ONLINE       0     0     0
          c1t0d0s0  ONLINE       0     0     0

errors: No known data errors
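If you want to watch scrub progress from a script, the “scrub:” line is easy to pick apart. The following Python sketch assumes the exact line format shown in the output above (other ZFS releases print this line differently), and the scrub_progress function name is made up for this example.

```python
import re

# Matches the in-progress line as printed in the sample output, e.g.
#   scrub: scrub in progress for 0h0m, 3.81% done, 0h18m to go
SCRUB_RE = re.compile(
    r"^\s*scrub:\s+scrub in progress for (\d+)h(\d+)m, "
    r"([\d.]+)% done, (\d+)h(\d+)m to go"
)


def scrub_progress(status_text: str):
    """Return progress details from `zpool status` text, or None if no
    scrub is currently running (or the line format is not recognized)."""
    for line in status_text.splitlines():
        m = SCRUB_RE.match(line)
        if m:
            e_h, e_m, pct, r_h, r_m = m.groups()
            return {
                "elapsed_min": int(e_h) * 60 + int(e_m),
                "percent_done": float(pct),
                "remaining_min": int(r_h) * 60 + int(r_m),
            }
    return None
```

Feeding it the status text above yields 3.81 percent done with 18 minutes remaining. Any other “scrub:” line, such as one for a stopped or completed scrub, returns None.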

The scrub operation will consume any and all I/O resources on the system (there are supposed to be throttles in place, but I’ve yet to see them work effectively), so you definitely want to run it when your system isn’t busy servicing your customers. If you kick off a scrub and determine that it needs to be halted, you can add the “-s” option (stop scrubbing) to the zpool scrub command line:

$ zpool scrub -s rpool

You can confirm the scrub stopped by running zpool status again:

$ zpool status

  pool: rpool
 state: ONLINE
 scrub: scrub stopped after 0h0m with 0 errors on Sat Oct 15 08:28:36 2011
config:

        NAME        STATE     READ WRITE CKSUM
        rpool       ONLINE       0     0     0
          c1t0d0s0  ONLINE       0     0     0

errors: No known data errors
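Since a scrub competes with production I/O, one way to act on the “run it when the system isn’t busy” advice is a small wrapper that starts the scrub inside a quiet window and stops it outside of it. This is a rough Python sketch, not a tested tool: the 01:00 to 05:00 window and the function names are assumptions, and the only zpool invocations it builds are the two shown above.

```python
import subprocess
from datetime import datetime, time


def in_quiet_window(now: datetime,
                    start: time = time(1, 0),
                    end: time = time(5, 0)) -> bool:
    """True if `now` falls inside the quiet window [start, end)."""
    t = now.time()
    if start <= end:
        return start <= t < end
    return t >= start or t < end   # window wraps past midnight


def scrub_command(pool: str, now: datetime,
                  start: time = time(1, 0),
                  end: time = time(5, 0)) -> list:
    """Build the zpool command to run: start a scrub inside the window,
    stop any running scrub outside of it."""
    if in_quiet_window(now, start, end):
        return ["zpool", "scrub", pool]
    return ["zpool", "scrub", "-s", pool]


def run(cmd: list) -> int:
    """Run the command and return its exit status (needs ZFS privileges)."""
    return subprocess.run(cmd).returncode

# Example (not executed here): run(scrub_command("rpool", datetime.now()))
```

The intent is to call this once when the window opens and once when it closes (from cron, for example), not in a tight loop, since re-issuing a scrub while one is already running is not something you want to do repeatedly.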

This is pretty darn useful, and something I wish every file system had. fsck sucks, and being able to periodically check the consistency of your file system while it’s online is rad (for some reason I always want to watch Point Break after saying rad).

Good write-up on Linux consistent network device naming

In RHEL 6.1 the default names assigned to Dell server network interfaces changed from ethX to emX and pXpX. The new names describe where a network interface physically resides in the system, and will have the following format:

emX – the Xth (first, second, etc.) onboard interface
pXpY – PCI device X port Y
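For provisioning scripts, the two name formats above are simple to decode. Here is a minimal Python sketch that handles only the emX and pXpY forms listed in this post; the describe_interface function name and the wording of its return strings are made up for illustration.

```python
import re

EMBEDDED_RE = re.compile(r"^em(\d+)$")        # emX  -> onboard interface X
PCI_RE = re.compile(r"^p(\d+)p(\d+)$")        # pXpY -> PCI device X, port Y


def describe_interface(name: str) -> str:
    """Translate an emX / pXpY interface name into a physical location."""
    m = EMBEDDED_RE.match(name)
    if m:
        return f"onboard interface {int(m.group(1))}"
    m = PCI_RE.match(name)
    if m:
        return f"PCI device {int(m.group(1))}, port {int(m.group(2))}"
    return "not an emX or pXpY name"
```

For example, describe_interface("p3p2") reports PCI device 3, port 2, while an old-style name such as eth0 falls through to the last case.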

Dell wrote a really good white paper on this, and the following text from the document summarizes how the pieces fit together:

“A naming mechanism that can impart meaning to the network interface’s name based on the physical location of a network port in concordance to the intended system design is necessary. To achieve that, the system firmware has the ability to communicate the intended order for network devices on the mother board to the Operating System via standard mechanisms such as SMBIOS and ACPI.

The new naming scheme uses ‘biosdevname’ udev helper utility, developed by Dell and released under GPL, suggests new names based on the location of the network adapters on the system as suggested by system BIOS.”

I like the new format, and this will definitely be a nice addition to hardware provisioning systems. Hopefully in the near future we won’t have to poke around with lspci to see which interface is which. :)

Using collectl on Linux to view system performance

I recently needed to figure out what process was generating a bunch of I/O requests on a Linux system. On Solaris, there are a ton of tools available in the DTraceToolkit that can pin down I/O consumers.

I really miss DTrace when working on Linux. I know there’s SystemTap, but I personally haven’t had much exposure to it or seen very many customers using it in the wild.

Using iostat in Linux, I can see that the drives are spinning like mad, but I really want to know which process on the machine is driving the disk. Using collectl, we can view this. The collectl utility has a “top”-like function, which can sort processes by counters from different subsystems.

# collectl --showtopopts
The following is a list of --top's sort types which apply to either
process or slab data. In some cases you may be allowed to sort
by a field that is not part of the display if you so desire

TOP PROCESS SORT FIELDS

Memory
vsz virtual memory
rss resident (physical) memory

Time
syst system time
usrt user time
time total time

I/O
rkb KB read
wkb KB written
iokb total I/O KB

rkbc KB read from pagecache
wkbc KB written to pagecache
iokbc total pagecacge I/O
ioall total I/O KB (iokb+iokbc)

rsys read system calls
wsys write system calls
iosys total system calls

iocncl Cancelled write bytes

Page Faults
majf major page faults
minf minor page faults
flt total page faults

Miscellaneous (best when used with --procfilt)
cpu cpu number
pid process pid
thread total process threads (not counting main)

TOP SLAB SORT FIELDS

numobj total number of slab objects
actobj active slab objects
objsize sizes of slab objects
numslab number of slabs
objslab number of objects in a slab
totsize total memory sizes taken by slabs
totchg change in memory sizes
totpct percent change in memory sizes
name slab names

So, I’m really interested in total I/O KB per process.

# collectl --top iokb
waiting for 1 second sample…

# TOP PROCESSES sorted by iokb (counters are /sec) 15:52:17
# PID  User     PR  PPID THRD S   VSZ   RSS CP  SysT  UsrT Pct  AccuTime  RKB  WKB MajF MinF Command
3751  mysql    15  3698   25 S    2G    2G  6  0.00  0.02   2  88:09.79   96   12    0    3 /usr/libexec/mysqld

Cool, so MySQL is driving the I/O subsystem.

This is a pretty cool utility, and there is even more documentation on the project’s website.
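If you capture the collectl top output shown above to a file, a few lines of Python can total the read and write KB per process. This is a sketch under some assumptions: it relies on the header line starting with “# PID” and on RKB and WKB being plain integers as in the sample, and the parse_top and io_kb function names are made up for this example.

```python
def parse_top(text: str) -> list:
    """Parse collectl top-style process output into a list of dicts
    keyed by the column names found on the '# PID ...' header line."""
    header = None
    rows = []
    for line in text.splitlines():
        line = line.strip()
        if line.startswith("# PID"):
            header = line.lstrip("# ").split()
        elif header and line and line[0].isdigit():
            # Split into at most len(header) fields so that a command
            # containing spaces stays together in the last column.
            fields = line.split(None, len(header) - 1)
            rows.append(dict(zip(header, fields)))
    return rows


def io_kb(row: dict) -> int:
    """Total I/O for one process: KB read plus KB written."""
    return int(row["RKB"]) + int(row["WKB"])
```

Sorting the parsed rows with sorted(rows, key=io_kb, reverse=True) puts the heaviest I/O consumer first. For the mysqld line in the sample, io_kb works out to 96 + 12 = 108.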

A real world approach to learning new Operating Systems

New Operating Systems don’t pop into existence every day, but there are a slew of them out there. This includes various versions of Windows, BSD OSs, a number of Linux distributions, Solaris, AIX, Plan 9 as well as several others. As a technology geek I’m always looking to learn something new, and I recently got the opportunity to expand my Operating System knowledge. I’m now spending a good bit of my time learning everything there is to know about IBM’s AIX.

When I’ve had to learn a new OS in the past I’ve typically started off by finding a decent book (see my previous post for tips on finding cheap used books) on the OS and a machine that is capable of running the Operating System. The book allows me to pick up the basics in a short period of time, and the server allows me to run through various scenarios to see how to install, configure, secure, troubleshoot and fix issues with the Operating System.

In addition to reading and “tinkering” around, I also like to make a list of things I need to get hands-on experience with. This typically breaks down into something like this:

– How do you correctly install the operating system?

– How do I add new storage or expand existing storage?

– How do I apply and remove patches to the system?

– What steps do I need to go through to secure the system?

– What tools are available to monitor performance?

– How does the OS provide highly available services?

– What tools are available to view hardware and software problems?

– How does the logical volume manager work?

– How do I recover from a disk failure?

– How do I recover from a corrupted root file system?

– How do I recover the system from single user mode?

– How do I repair a broken package?

– How do I back up and restore a system (I focus on bare metal restores)?

– What bonding modes are supported and how do I configure them?

– Which virtualization technologies are available?

– How do I keep up to date with security and reliability updates?

And the list goes on, and on … When my list is relatively complete I always find a way to simulate each scenario with my test machine. Breaking machines, fixing them and then documenting what you did is one of the best ways to nail down the basics. Real world experience is obviously better, but I like to have a firm grasp of the basics before I start making changes to systems that could have potentially negative effects (broken patches that hose systems, updates that cause unintended issues, etc.).

In addition to getting some hands-on skills, the documents I produce while I’m learning are quite handy to have on standby in case I need to perform these tasks down the road. I have learned first hand the importance of familiarizing oneself with the basics of recovering a system from various disaster scenarios, because at 2am when your company’s site is down you don’t have time to read through manuals or deal with 8 lines of support engineers. You need to get things back up, and if you learn how to deal with disaster situations ahead of time you will be calm, cool and collected at 2am (this assumes the disaster is something you are able to recover from though).

Once I have the basics mastered, the next thing I focus on is getting certified. While I don’t place a ton of credibility in IT certifications, I definitely feel they are a great way to expand your knowledge and learn things you might not have otherwise known. If I’m fortunate enough to have the luxury of vendor OS support, I love to open a few support cases to learn how the support organization for the OS works. I’ve yet to find two companies who operate the same way, and it’s nice to learn the system before you truly need it.

I just started reading my AIX book this past weekend, and plan to start playing around with a couple of IBM p550s I have access to. I’m also going to take my AIX certification test in a couple of weeks, so I’ll definitely be crazy busy for the next month learning as much as I can. Luckily for me I love learning new things and experimenting with technology. If you’ve had to learn a new OS in the past few years feel free to chime in. I would love to get others’ thoughts and feedback on how they learn new stuff!