Why “partition X does not end on cylinder boundary” warnings don’t matter

While reviewing the partition layout on one of my hard drives, I noticed a number of “Partition X does not end on cylinder boundary” messages in the fdisk output:

$ fdisk /dev/sda

The number of cylinders for this disk is set to 9726.
There is nothing wrong with that, but this is larger than 1024,
and could in certain setups cause problems with:
1) software that runs at boot time (e.g., old versions of LILO)
2) booting and partitioning software from other OSs
   (e.g., DOS FDISK, OS/2 FDISK)

Command (m for help): p

Disk /dev/sda: 80.0 GB, 80000000000 bytes
255 heads, 63 sectors/track, 9726 cylinders
Units = cylinders of 16065 * 512 = 8225280 bytes
Disk identifier: 0xac42ac42

   Device Boot      Start         End      Blocks   Id  System
/dev/sda1   *           1          26      204800   83  Linux
Partition 1 does not end on cylinder boundary.
/dev/sda2              26         287     2097152   83  Linux
Partition 2 does not end on cylinder boundary.
/dev/sda3             287        9726    75822111+  8e  Linux LVM

This was a bit disconcerting at first, but after a few minutes of thinking it dawned on me that modern systems use LBA (Logical Block Addressing) instead of CHS (Cylinder/Head/Sector) to address disk drives. If we view the partition table using sectors instead of cylinders:

$ sfdisk -uS -l /dev/sda

Disk /dev/sda: 9726 cylinders, 255 heads, 63 sectors/track
Units = sectors of 512 bytes, counting from 0

   Device Boot    Start       End   #sectors  Id  System
/dev/sda1   *        63    409662     409600  83  Linux
/dev/sda2        409663   4603966    4194304  83  Linux
/dev/sda3       4603967 156248189  151644223  8e  Linux LVM
/dev/sda4             0         -          0   0  Empty

Viewed in sectors, everything lines up: each partition ends at a specific sector number, and the next partition starts at that number plus one. The cylinder boundary warnings are just an artifact of fdisk translating these LBA offsets back into a legacy CHS geometry that modern drives no longer use, so they can be safely ignored. I must say that I have grown quite fond of sfdisk and parted, and they sure make digging through DOS and GPT labels super easy.
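
To double-check, a little shell arithmetic shows which end sectors actually fall on a cylinder boundary. This is a sketch using the geometry fdisk reported above (255 heads * 63 sectors/track = 16065 sectors per cylinder); a partition ends on a boundary when the sector immediately after its last sector is a multiple of the cylinder size:

```shell
# Cylinder size from the fdisk output above: 255 heads * 63 sectors/track
SECTORS_PER_CYL=$(( 255 * 63 ))   # 16065

# End sectors of sda1, sda2 and sda3 from the sfdisk listing
for END in 409662 4603966 156248189; do
    if [ $(( (END + 1) % SECTORS_PER_CYL )) -eq 0 ]; then
        echo "end sector $END: ends on a cylinder boundary"
    else
        echo "end sector $END: does NOT end on a cylinder boundary"
    fi
done
```

Only sda3's end sector passes the test, which matches the fdisk output above: partitions 1 and 2 get the warning, partition 3 does not.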

Better ZFS pool fault handling coming to an opensolaris release near you!

I just saw the following ARC case fly by, and this will be a welcome addition to the ZFS file system:

OVERVIEW:

       Uncooperative or deceptive hardware, combined with power
       failures or sudden lack of access to devices, can result in
       zpools without redundancy being non-importable.  ZFS'
       copy-on-write and Merkle tree properties will sometimes allow
       us to recover from these problems. Only ad-hoc means currently
       exist to take advantage of this recoverability. This proposal
       aims to rectify that short-coming.

PROPOSED SOLUTION:

       This fast-track proposes two new command line flags each for
       the 'zpool clear' and 'zpool import' sub-commands.

       Both sub-commands will now accept a '-F' recovery mode flag.
       When specified, a determination is made if discarding the last
       few transactions performed in an unopenable or non-importable
       pool will return the pool to a usable state.  If so, the
       transactions are irreversibly discarded, and the pool
       imported.  If the pool is usable or already imported and this
       flag is specified, the flag is ignored and no transactions are
       discarded.

       Both sub-commands will now also accept a '-n' flag.  This flag
       is only meaningful in conjunction with the '-F' flag.  When
       specified, an attempt is made to see if discarding transactions
       will return the pool to a usable state, but no transactions are
       actually discarded.

I have encountered errors where this feature would have been handy, and will be stoked when this feature is available in Solaris 10 / Solaris next.

Dealing with cron bad user messages on Solaris hosts

While reviewing the cron logs on one of my Solaris hosts, I noticed a number of entries similar to the following:

CMD: /opt/software/bin/arrecord -backup
> arr 20359 c Thu Sep 3 23:45:00 2009
! bad user (arr) Thu Sep 3 23:45:00 2009
< arr 20359 c Thu Sep 3 23:45:00 2009 rc=1

These errors are typically generated when the account a job runs as doesn't exist, or when the user's shadow entry is locked (locked accounts have a *LK* in the /etc/shadow password field). In this specific case neither a password nor an NP entry (NP means the account doesn't have a password, and direct logins are denied) had been assigned to the arr user, so the account was still listed in the locked state. Setting a strong password fixed the issue, and everything is working swimmingly!
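
A quick way to spot accounts cron will reject is to check the second (password) field of their shadow entry. The following is a minimal sketch; the arr entry below is a made-up example of a locked account:

```shell
# Shadow entries are colon-separated; field 2 is the password field.
# *LK* means the account is locked, and cron will log "bad user"
# for any jobs owned by it.
ENTRY='arr:*LK*:14490::::::'
PASSWD_FIELD=$(echo "$ENTRY" | cut -d: -f2)

case "$PASSWD_FIELD" in
    \*LK\**) echo "locked account: cron jobs will fail with 'bad user'" ;;
    NP)      echo "no password, direct logins denied, but not locked" ;;
    *)       echo "account has a password set" ;;
esac
```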

Getting tape drive throughput and performance statistics on Linux hosts

I manage a number of Linux Netbackup media servers, and just recently learned that Linux doesn't provide a tool to view tape statistics (it appears there are no /proc interfaces to retrieve SCSI tape drive performance data). Fortunately the SystemTap developers saw this glaring deficiency and created the iostat-scsi.stp script, which displays statistics for each SCSI tape and disk device in a server. To use SystemTap on a Redhat, CentOS or Fedora Linux host, you will first need to install the kernel debuginfo files. Here are the commands I used to install the debuginfo RPMs on a Redhat Enterprise Linux machine (you can download the RHEL debuginfo files from the Redhat FTP server, and you can get the debuginfo files for CentOS and Fedora from one of the various mirrors):

$ ls -l
total 179240
-rw-r--r-- 1 matty matty 155787274 Sep 2 10:39 kernel-debuginfo-2.6.18-128.el5.x86_64.rpm
-rw-r--r-- 1 matty matty 27557888 Sep 2 10:39 kernel-debuginfo-common-2.6.18-128.el5.x86_64.rpm

$ rpm -ivh kernel*
warning: kernel-debuginfo-2.6.18-128.el5.x86_64.rpm: Header V3 DSA signature: NOKEY, key ID 37017186
Preparing... ########################################### [100%]
1:kernel-debuginfo-common########################################### [ 50%]
2:kernel-debuginfo ########################################### [100%]

Once the debuginfo files are installed, you can download the iostat-scsi.stp script from the systemtap website. To use the script to monitor just tape devices, you can use the following command line (the script will print statistics for all block devices by default):

$ stap iostat-scsi.stp 5 | egrep '(Device|st)'

  Device:       tps blk_read/s blk_wrtn/s  blk_read  blk_wrtn
      st1    199.20      0.00 407961.60         0   2039808
      st0    103.60      0.00 212172.80         0   1060864
      st0    141.00      0.00 288768.00         0   1443840
      st1    221.00      0.00 452608.00         0   2263040
      st0    162.80      0.00 333414.40         0   1667072
      st1    182.00      0.00 372736.00         0   1863680
      st1    197.60      0.00 404684.80         0   2023424

This will print the tape drive instance (st0 -> SCSI tape instance 0, st1 -> SCSI tape instance 1, etc.), the number of transactions per second, the blocks read and written per second, as well as the total number of blocks read and written. Systemtap is pretty cool, and I hope to publish a few scripts I wrote in the near future.
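
Since the blk_read/s and blk_wrtn/s columns are counted in blocks, converting them to more familiar units only takes a little arithmetic. Here is a sketch using the first st1 sample from the output above, assuming the classic iostat convention of 512-byte blocks:

```shell
# blk_wrtn/s for st1 in the first sample line, rounded to a whole number
BLKS_PER_SEC=407962

# 512 bytes per block; divide by 1024 twice to get MB/s
echo "st1 is writing roughly $(( BLKS_PER_SEC * 512 / 1024 / 1024 )) MB/s"
```

That works out to roughly 199 MB/s of sustained write throughput to the st1 drive.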