Blog O' Matty


Generating Netbackup throughput data reports

This article was posted by Matty on 2009-12-12 10:04:00 -0400

If you support NetBackup at your site, I’m sure you’ve had to look into issues with slow clients and failed backups. The nbstatus script I mentioned in a previous post is useful for identifying connection problems, but it doesn’t help you understand how well your clients are performing. To help me understand how much data my clients are pushing, I wrote the nbthroughput shell script:

$ nbthroughput


Top 5 hosts by data written

Policy               Schedule             Storage Unit    Bytes      Bytes/s
-------------------- -------------------- --------------- ---------- -------
ZeusFileSystem       Full-Quarterly       med01-hcart3-   9199683296   39376
VMWareMBackups       Full-Weekly          med01-hcart3-   1756762304   84219
VMWareBackups        Cumulative-Increment med01-hcart3-   1035153514   34155
VMWareBackups        Cumulative-Increment med01-hcart3-    879121280   68009
ApolloFileSystem     Full-Weekly          med01-disk       771900576   19919

Fastest 5 clients (processed 10MB+)

Policy               Schedule             Storage Unit    Bytes      Bytes/s
-------------------- -------------------- --------------- ---------- -------
Oracle01FileSystem   Differential-Increme med01-hcart3-      3609248  128365
App01FileSystem      Differential-Increme med01-hcart3-      3569984  128250
Web01FileSystem      Differential-Increme med01-hcart3-      3550592  126423
VMWareBackups        Cumulative-Increment med01-disk       335576832  100559
ZeusFileSystem       Default-Application- med01-disk       104857632   93847

Slowest 5 clients (processed 10MB+)

Policy               Schedule             Storage Unit    Bytes      Bytes/s
-------------------- -------------------- --------------- ---------- -------
W2k3-1FileSystem     Differential-Increme med01-disk         1298912     333
W2k3-2FileSystem     Differential-Increme med01-disk         1482752    2000
W2k3-3FileSystem     Differential-Increme med01-disk         1095936    2083
W2k3-4FileSystem     Differential-Increme med01-disk         4114880    2425
W2k3-5FileSystem     Differential-Increme med01-disk         3496576    2483

The script will display the fastest clients, the slowest clients, and how much data your clients are pushing to your media servers. I find it useful, so I thought I would post it here for others to use.
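
If you want a feel for how a report like this can be generated, bpdbjobs can provide the raw per-job counters. Below is a rough sketch of the idea, not the original script; the comma-delimited output of -most_columns and the field positions used in the awk block are assumptions, so verify them against your NetBackup release before trusting the numbers.

#!/bin/sh
# Rough sketch only -- not the original nbthroughput script. Pull the per-job
# data and elapsed-time counters out of bpdbjobs and rank jobs by the amount
# of data written. Field positions are assumptions; check your release.
BPDBJOBS=/usr/openv/netbackup/bin/admincmd/bpdbjobs

echo "Top 5 jobs by data written"
$BPDBJOBS -report -most_columns | awk -F, '
    # Assumed fields: $5 policy, $6 schedule, $10 elapsed seconds, $15 kbytes
    $10 > 0 && $15 > 0 {
        bytes = $15 * 1024
        printf("%-20.20s %-20.20s %12d %8d\n", $5, $6, bytes, bytes / $10)
    }' | sort -rn -k 3 | head -5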

Fixing Solaris hosts that boot to a grub> prompt

This article was posted by Matty on 2009-12-12 09:38:00 -0400

I applied the latest recommended patch bundle this week to two X4140 servers running Solaris 10. When I rebooted, I was greeted with a grub> prompt instead of the grub menu:

grub>

This wasn’t so good; for some reason the stage1 / stage2 loaders weren’t installed correctly (or the zpool upgrade caused some issues). To fix this, I booted to single-user mode by inserting a Solaris 10 update 8 CD and adding “console=ttya -s” to the end of the boot line. Once my box booted, I ran ‘zpool status’ to verify that my pool was available:

$ zpool status

  pool: rpool
 state: ONLINE
 scrub: none requested
config:

        NAME        STATE     READ WRITE CKSUM
        rpool       ONLINE       0     0     0
          c1t0d0s0  ONLINE       0     0     0

errors: No known data errors
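
One note before moving on: the bootadm step below updates the boot archive for the boot environment mounted at /a. If the root dataset isn’t already mounted there, something along these lines should take care of it. This is a sketch, and the dataset name rpool/ROOT/s10u8 is an assumption; ‘zfs list -r rpool/ROOT’ will show the real boot environment name.

# Sketch: mount the ZFS root boot environment under /a so bootadm can update
# its boot archive. The dataset name below is an assumption.
$ mkdir -p /a
$ mount -F zfs rpool/ROOT/s10u8 /a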

To re-install the grub stage1 and stage2 loaders, I ran installgrub (you can get the device to use from ‘zpool status’):

$ /sbin/installgrub /boot/grub/stage1 /boot/grub/stage2 /dev/rdsk/c1t0d0s0

stage1 written to partition 0 sector 0 (abs 16065)
stage2 written to partition 0, 272 sectors starting at 50 (abs 16115)

To ensure that the boot archive was up to date, I ran ‘bootadm update-archive’:

$ bootadm update-archive -f -R /a

Creating boot_archive for /a
updating /a/platform/i86pc/boot_archive

Once these changes were made, I init 6’ed the system and it booted successfully. I’ve created quite a grub cheat sheet over the years (it made this recovery a snap), and will post it here once I get it cleaned up.

Watching process creation on Linux hosts

This article was posted by Matty on 2009-12-10 00:06:00 -0400

I have been debugging a problem with Red Hat Cluster, and was curious whether a specific process was getting executed. On my Solaris 10 hosts I can run execsnoop to observe system-wide process creation, but there isn’t anything comparable in the stock toolset on my Linux hosts. The best I’ve found is systemtap, which provides the kprocess.exec probe for monitoring exec() calls. To use this probe, you can stash the following in a file of your choosing (exec.stp in the example below):

probe kprocess.exec {
    printf("%s (pid: %d) is exec'ing %s\n", execname(), pid(), filename)
}

Once the file is created, you can execute the stap program to enable the exec probe:

$ stap -v exec.stp

This will produce output similar to the following:

modclusterd (pid: 5125) is exec'ing /usr/sbin/clustat
clurgmgrd (pid: 5129) is exec'ing /usr/share/cluster/clusterfs.sh
clusterfs.sh (pid: 5130) is exec'ing /usr/bin/dirname
clusterfs.sh (pid: 5131) is exec'ing /usr/bin/dirname
clusterfs.sh (pid: 5132) is exec'ing /usr/bin/dirname
clusterfs.sh (pid: 5134) is exec'ing /bin/basename
clusterfs.sh (pid: 5135) is exec'ing /sbin/consoletype
clusterfs.sh (pid: 5138) is exec'ing /usr/bin/readlink
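
If you are curious which context variables your version of systemtap exposes for this probe, stap can list them, and for quick experiments the probe body can be passed directly on the command line with -e. A small sketch (ppid() comes from the standard task tapset):

# List the kprocess.exec probe point and the variables it provides
$ stap -L 'kprocess.exec'

# Same idea as exec.stp, passed inline with -e and also printing the parent pid
$ stap -e 'probe kprocess.exec {
    printf("%s (pid: %d, ppid: %d) execs %s\n",
           execname(), pid(), ppid(), filename)
}'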

While systemtap is missing various features that are available in DTrace, it’s still a super useful tool!

Cleaning up space used by the OpenSolaris pkg utility

This article was posted by Matty on 2009-12-06 10:40:00 -0400

I’ve been experimenting with the new OpenSolaris package manager (pkg), and ran into an odd issue last weekend. The flash drive I was running image-update on filled up, and after poking around I noticed that /var/pkg had some large directories:

$ cd /var/pkg

$ du -sh *

4.6M catalog
1.5K cfg_cache
892M download
1.5K file
4.5K history
19M index
1.5K lost+found
146M pkg
318K state

In this specific case, pkg downloaded close to 900MB to the download directory, but failed to remove the downloaded files once the image was updated. :( The pkg tool currently doesn’t have a purge option to remove old stuff in this directory, so I had to go in manually and remove everything in the download directory. It appears bug #2266 is open to address this, and removing the contents from the download directory is safe (at least according to a post I read on the pkg mailing list). I think I prefer yum to the pkg tool, but hopefully I can be swayed once pkg matures a bit more!
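
For reference, the manual cleanup described above amounts to something like the following. Make sure no pkg or image-update operation is running when you remove the files:

# Reclaim the space used by leftover package downloads. Per the pkg mailing
# list post mentioned above, the contents of /var/pkg/download are safe to
# remove as long as no pkg operation is in flight.
$ du -sh /var/pkg/download
$ rm -rf /var/pkg/download/*
$ du -sh /var/pkg/download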

Scanning Linux hosts for newly added ESX storage devices

This article was posted by Matty on 2009-12-06 10:30:00 -0400

I currently support a number of Linux hosts that run inside VMware vSphere. Periodically I need to add new storage devices to these hosts, which requires me to log in to the vSphere client and add the device through the “edit settings” selection. The cool thing about vSphere is that the LUNs are dynamically added to the guest, and the guest will see the new devices once the SCSI bus has been rescanned. There are several ways to scan for storage devices, but the simplest way I’ve found is to use the rescan-scsi-bus.sh shell script that comes with the sg3_utils package.

To use rescan-scsi-bus.sh, you will first need to install the sg3_utils package:

$ yum install sg3_utils

Once installed, you can run fdisk or lsscsi to view the devices on your system:

$ lsscsi

[0:0:0:0] disk VMware Virtual disk 1.0 /dev/sda
[1:0:0:0] disk VMware Virtual disk 1.0 /dev/sdb

$ fdisk -l

Disk /dev/sda: 9663 MB, 9663676416 bytes
255 heads, 63 sectors/track, 1174 cylinders

Device Boot Start End Blocks Id System
/dev/sda2 652 1173 4192965 82 Linux swap / Solaris

Disk /dev/sdb: 19.3 GB, 19327352832 bytes
255 heads, 63 sectors/track, 2349 cylinders

Device Boot Start End Blocks Id System
/dev/sdb1 1 2349 18868311 83 Linux

As you can see above, we currently have two disk devices connected to the system. To scan for the new device I just added through the vSphere client, we can run rescan-scsi-bus.sh from the host:

$ /usr/bin/rescan-scsi-bus.sh -r

Host adapter 0 (mptspi) found.
Host adapter 1 (mptspi) found.
Scanning SCSI subsystem for new devices
and remove devices that have disappeared
Scanning host 0 for SCSI target IDs 0 1 2 3 4 5 6 7, all LUNs
Scanning for device 0 0 0 0 ...
OLD: Host: scsi0 Channel: 00 Id: 00 Lun: 00
Vendor: VMware Model: Virtual disk Rev: 1.0
Type: Direct-Access ANSI SCSI revision: 02
Scanning host 1 channels 0 for SCSI target IDs 0 1 2 3 4 5 6 7, all LUNs
Scanning for device 1 0 0 0 ...
OLD: Host: scsi1 Channel: 00 Id: 00 Lun: 00
Vendor: VMware Model: Virtual disk Rev: 1.0
Type: Direct-Access ANSI SCSI revision: 02
Scanning for device 1 0 1 0 ...
NEW: Host: scsi1 Channel: 00 Id: 01 Lun: 00
Vendor: VMware Model: Virtual disk Rev: 1.0
Type: Direct-Access ANSI SCSI revision: 02
Scanning for device 1 0 1 0 ...
OLD: Host: scsi1 Channel: 00 Id: 01 Lun: 00
Vendor: VMware Model: Virtual disk Rev: 1.0
Type: Direct-Access ANSI SCSI revision: 02
0 new device(s) found.
0 device(s) removed.

The scan output lists each device it finds, and as you can see above, it located three “Virtual disk” drives, one of which is flagged as NEW. To verify the machine sees the drives, we can run lsscsi and fdisk again:

$ lsscsi

[0:0:0:0] disk VMware Virtual disk 1.0 /dev/sda
[1:0:0:0] disk VMware Virtual disk 1.0 /dev/sdb
[1:0:1:0] disk VMware Virtual disk 1.0 /dev/sdc

$ fdisk -l

Disk /dev/sda: 9663 MB, 9663676416 bytes
255 heads, 63 sectors/track, 1174 cylinders

Device Boot Start End Blocks Id System
/dev/sda2 652 1173 4192965 82 Linux swap / Solaris

Disk /dev/sdb: 19.3 GB, 19327352832 bytes
255 heads, 63 sectors/track, 2349 cylinders

Device Boot Start End Blocks Id System
/dev/sdb1 1 2349 18868311 83 Linux

Disk /dev/sdc: 1073 MB, 1073741824 bytes
255 heads, 63 sectors/track, 130 cylinders

Disk /dev/sdc doesn't contain a valid partition table

There are various ways to check for new SCSI devices (Emulex has a tool, QLogic has a tool, you can scan for devices by hand, etc.), but the rescan script has proven to be the easiest solution for me. Since I didn’t write the rescan script, use this information at your own risk. It should work flawlessly, but you’re on your own when you run it!
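
For completeness, the by-hand approach mentioned above boils down to echoing a wildcard scan request into sysfs. A minimal sketch follows; host1 is an assumption, so list /sys/class/scsi_host to see which adapters are present on your system:

# Ask the kernel to rescan every channel, target, and LUN on a given adapter.
# The host1 name is an assumption -- check /sys/class/scsi_host first.
$ ls /sys/class/scsi_host
$ echo "- - -" > /sys/class/scsi_host/host1/scan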