Displaying CPU temperatures on Linux hosts

Intel and AMD keep coming out with bigger and faster CPUs. Each time I upgrade to a newer CPU (I’m currently eyeing one of these), it seems like the heat sinks and cooling fans have tripled in size (I ran into this firsthand when I purchased a Zalman CPU cooler last year). If you use Linux and a relatively recent motherboard, there should be a set of sensors on the motherboard that you can retrieve the current temperatures from. To access these sensors you will first need to install the lm_sensors package:

$ yum install lm_sensors
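
The configuration piece is handled by sensors-detect, which ships with lm_sensors; it probes the motherboard and tells you which kernel modules need to be loaded. The prompts vary from board to board, so treat the run below as a rough outline rather than an exact transcript:

$ sensors-detect

Answer the prompts, load the modules it recommends, and you should be ready to go.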

Once the software is installed and configured for your hardware you can run the sensors tool to display the current temperatures:

$ sensors

k8temp-pci-00c3
Adapter: PCI adapter
Core0 Temp:  +14°C
Core1 Temp:  +14°C

This is useful information, especially if you are encountering unexplained reboots. Elevated temperatures can lead to all sorts of issues, and lm_sensors is a great tool for helping to isolate these types of problems. Now back to drooling over the latest generation of processors from Intel and AMD. :)

Recovering the MBR from a Windows machine

I was chatting with a friend the other day about recovering the MBR on one of my Windows systems. He is a seasoned admin and recommended the following:

1. Boot from the Windows XP CD (the UBCD may also work) and select the recovery console.

2. Once the recovery console comes up you can run fixmbr to fix up the master boot record.

3. Reboot.
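
For the record, step 2 amounts to a single command at the recovery console prompt (the prompt shown here is just an example; fixmbr writes a new master boot record to the boot device unless you point it at a different one):

C:\WINDOWS> fixmbr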

This is a pretty simple fix and one I want to note for future use. I don’t always use Windows, but when I do I like to have fixes for blue screens and MBR troubles. :)

Fun times with the bash read function and subshells

There are a few shellisms that have bitten me over the years. One issue that has bitten me more than once is the way variable assignments behave when a pipe is used to pass data to a subshell. This annoyance can be easily illustrated with an example:

$ cat test

#!/bin/bash

grep MemTotal /proc/meminfo | read stat total size
echo $total

$ ./test

At first glance you would think that the echo statement would display the total amount of memory in the system. But alas, it produces nothing. This occurs because the grep output is piped to read, which runs in a subshell. read assigns the values passed from grep to the variables, but once read finishes, the subshell it is running in exits and the contents of stat, total and size are lost.

To work around this we can implement one of the solutions proposed in the bash FAQ. Here is my favorite for cases similar to this:

$ cat test

#!/bin/bash

foo=$(grep MemTotal /proc/meminfo)
set -- $foo
total=$2
echo $total

$ ./test
8128692

This works because the output is stored in the variable foo and processed inside the current shell. No subshells are created, so there is no way for the variables to get nuked. If you haven’t read the bash FAQ, the gotchas pages or Chris Johnson’s book you are definitely missing out. I still encounter goofy shell-related issues, but I’m now able to immediately identify most of them since I’ve flooded myself with shell-related information. :) So what is your biggest annoyance with the various shells?
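
Another workaround I’m fond of (assuming bash, since here-strings are a bash feature) is to feed the command output straight to read with a here-string, which keeps read in the current shell:

#!/bin/bash

# read runs in the current shell here, so stat, total and size survive
read stat total size <<< "$(grep MemTotal /proc/meminfo)"
echo $total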

Getting the number of bytes read and written by your Linux NFS kernel threads (nfsd)

The Linux NFS server exports a number of statistics through the /proc file system. The nfsstat utility can parse these statistics and display various performance counters, and the data that is displayed comes from the /proc/net/rpc/nfsd proc entry:

$ cat /proc/net/rpc/nfsd

rc 0 2585 290
fh 0 0 0 0 0
io 1069882 10485760
th 8 0 0.000 0.000 0.000 0.000 0.000 0.000 0.000 0.000 0.000 0.000
ra 32 16 0 0 0 0 0 0 0 0 0 1
net 2880 0 2865 14
rpc 2875 0 0 0 0
proc2 18 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
proc3 22 6 50 1 16 37 0 0 2560 13 1 0 0 10 0 0 0 0 16 3 6 3 10
proc4 2 3 124
proc4ops 59 0 0 0 12 2 0 0 0 0 88 17 0 0 0 0 12 0 0 4 0 2 0 104 0 7 17 1 0 0 0 11 2 4 0 0 2 2 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0

So what do these values mean? To figure this out we can surf over to fs/nfsd/stats.c in the kernel source:

 * Format:
 *      rc <hits> <misses> <nocache>
 *                      Statistsics for the reply cache
 *      fh <stale> <total-lookups> <anonlookups> <dir-not-in-dcache> <nondir-not-in-dcache>
 *                      statistics for filehandle lookup
 *      io <bytes-read> <bytes-written>
 *                      statistics for IO throughput
 *      th <threads> <fullcnt> <10%-20%> <20%-30%> ... <90%-100%> <100%>
 *                      time (seconds) when nfsd thread usage above thresholds
 *                      and number of times that all threads were in use
 *      ra cache-size  <10%  <20%  <30% ... <100% not-found
 *                      number of times that read-ahead entry was found that deep in
 *                      the cache.
 *      plus generic RPC stats (see net/sunrpc/stats.c)

and then read through the nfsstat manual page. While debugging an NFS issue a few weeks back I noticed that nfsstat doesn't have an option to print the number of bytes read and written (there is an io line in the output above, but for some reason nfsstat doesn't process it). The io line, along with the number of reads and writes, is useful for understanding how many VFS read and write operations are performed and the average size of these operations. To help me understand the workload pattern of the nfsd threads on one of my systems I developed the nfsrwstat.sh script:

$ nfsrwstat.sh

Date       Bytes_Read Bytes_Written
19:55:02   2139764    25187130  
19:55:07   0          3145728   
19:55:12   0          3358720   
19:55:18   0          3497984   
19:55:23   0          3629056   
19:55:28   0          3145728   

This script will help you better understand how much I/O your NFS server threads are performing, and you should be able to figure out if you have a mixed workload or one that is read- or write-intensive. When you compare this data alongside the output from iostat you can start to see how the block I/O layer and I/O scheduler are handling the data as it's pushed down the stack. If you haven't read the NFS RFC or the Linux NFS FAQ I would highly recommend you do so. There is a plethora of information in those two documents, and they will make managing NFS a breeze!
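
If you want to roll your own, here is a rough sketch of how a script like this can be put together; it samples the cumulative io counters every five seconds and prints the deltas. This is my reconstruction based on the counters described above, not necessarily how nfsrwstat.sh is implemented (note the process substitution, which sidesteps the read-in-a-subshell gotcha discussed earlier):

#!/bin/bash

# Sample the cumulative "io" counters in /proc/net/rpc/nfsd and print
# the number of bytes read and written during each interval.

INTERVAL=5

printf "%-10s %-10s %s\n" "Date" "Bytes_Read" "Bytes_Written"

read -r junk last_read last_write < <(grep '^io' /proc/net/rpc/nfsd)

while true; do
    sleep $INTERVAL
    read -r junk cur_read cur_write < <(grep '^io' /proc/net/rpc/nfsd)
    printf "%-10s %-10s %-10s\n" "$(date +%H:%M:%S)" \
           $((cur_read - last_read)) $((cur_write - last_write))
    last_read=$cur_read
    last_write=$cur_write
done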

To iPad or not to iPad, that is the question for my readers

Even though I’m your typical IT geek, I’m not one to jump on technology just because it’s new. I like to wait until technology stabilizes, prices drop and the lines at your favorite geek store decrease in size. I’m fortunate to have a MacBook Pro, and use it for just about everything I do. While I love my laptop, its size and weight have always been a drawback for me. This week I got to visit the Apple store for the first time in I can’t recall how long, and I fell in love with the iPad2.

I’m seriously thinking about getting one, though a few questions came to mind:

– How much storage do I actually need? Should I max it out or will 16GB get me by?

– Will the lack of flash be a problem?

– How will I like using the Parallels or Fusion viewer vs. firing up a whole VM?

– Is the iPad a suitable platform for development and managing systems?

– Will I actually use the TV integration features offered by the iPad2 / Apple TV combo?

– Can I just grab my iPad at 3am and deal with system issues?

– How well do the various SSL and IPSEC VPN solutions work with the iPad2?

I’m sure a number of my readers are using iPads, and I would love to get your thoughts on these questions. I’m leaning towards this model, though I’m not really sure I need 64GB of SSD storage. I like to be practical when it comes to things like this, though sometimes more is actually better. :)

Checking ext3 file system consistency on production systems

As an admin, there is nothing worse than the feeling you get when you determine you are dealing with file system corruption. Whether it’s a lost inode or a corrupted superblock, I always get a big knot in my stomach when I figure out that corruption exists. With modern file systems like ZFS it’s trivial to check file system consistency while the server is online. But with older file systems (ext3, ext4, etc.) you typically need to unmount the file system, run fsck and wait (sometimes for hours!) to thoroughly check the consistency of the file system.

I recently came across an ingenious idea from Theodore Ts’o on the Red Hat ext3-users mailing list. Assuming you are using LVM, you can create a snapshot of your volume and then run fsck against the snapshot while the server is online. Nice! Ted posted a sample script to the list, and I’m currently testing this out on some large QA database machines. This may be a good solution to use while we wait for btrfs to stabilize and release a file system check tool (btrfsck). I’ll post my thoughts on online fsck once I get this working reliably on a few production systems.
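
Here is the basic idea in command form. The volume group and logical volume names below are made up, and Ted’s actual script does quite a bit more (error handling, e-mailing results, cleaning up the snapshot), so treat this as a sketch:

$ lvcreate -s -L 1G -n dbsnap /dev/vg00/dblv     # snapshot the live volume
$ e2fsck -fn /dev/vg00/dbsnap                    # read-only consistency check of the snapshot
$ lvremove -f /dev/vg00/dbsnap                   # remove the snapshot when the check is done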