If you support NetBackup at your site, I’m sure you’ve had to look into issues with slow clients and failed backups. The nbstatus script I mentioned in a previous post is useful for identifying connection problems, but it doesn’t help you understand how well your clients are performing. To get a better picture of how much data my clients are pushing, I wrote the nbthroughput shell script:
$ nbthroughput
Top 5 hosts by data written
Policy               Schedule             Storage Unit    Bytes      Bytes/s
-------------------- -------------------- --------------- ---------- -------
ZeusFileSystem       Full-Quarterly       med01-hcart3-   9199683296 39376
VMWareMBackups       Full-Weekly          med01-hcart3-   1756762304 84219
VMWareBackups        Cumulative-Increment med01-hcart3-   1035153514 34155
VMWareBackups        Cumulative-Increment med01-hcart3-   879121280  68009
ApolloFileSystem     Full-Weekly          med01-disk      771900576  19919

Fastest 5 clients (processed 10MB+)
Policy               Schedule             Storage Unit    Bytes      Bytes/s
-------------------- -------------------- --------------- ---------- -------
Oracle01FileSystem   Differential-Increme med01-hcart3-   3609248    128365
App01FileSystem      Differential-Increme med01-hcart3-   3569984    128250
Web01FileSystem      Differential-Increme med01-hcart3-   3550592    126423
VMWareBackups        Cumulative-Increment med01-disk      335576832  100559
ZeusFileSystem       Default-Application- med01-disk      104857632  93847

Slowest 5 clients (processed 10MB+)
Policy               Schedule             Storage Unit    Bytes      Bytes/s
-------------------- -------------------- --------------- ---------- -------
W2k3-1FileSystem     Differential-Increme med01-disk      1298912    333
W2k3-2FileSystem     Differential-Increme med01-disk      1482752    2000
W2k3-3FileSystem     Differential-Increme med01-disk      1095936    2083
W2k3-4FileSystem     Differential-Increme med01-disk      4114880    2425
W2k3-5FileSystem     Differential-Increme med01-disk      3496576    2483
The script will display the fastest clients, the slowest clients, and how much data your clients are pushing to your media servers. I find it useful, so I thought I would post it here for others to use.
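If you want to roll your own version, the heavy lifting is just post-processing bpdbjobs output. Here is a stripped-down sketch of the idea; the awk field numbers are assumptions (they vary by NetBackup version), so verify them against your own ‘bpdbjobs -report -all_columns’ output before trusting the numbers:

#!/bin/sh
# Rough sketch: report the 5 fastest jobs by throughput from bpdbjobs output.
# NOTE: the field positions below (policy, elapsed seconds, kilobytes written)
# are assumptions -- check them against your NetBackup version and adjust.
bpdbjobs -report -all_columns | awk -F',' '
{
    policy  = $5            # assumed: policy name
    elapsed = $11           # assumed: elapsed seconds
    kbytes  = $15           # assumed: kilobytes written
    if (elapsed > 0)
        printf("%-20s %12d %10d\n", policy, kbytes * 1024, (kbytes * 1024) / elapsed)
}' | sort -rn -k3 | head -5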
I applied the latest recommended patch bundle this week to two X4140 servers running Solaris 10. When I rebooted, I was greeted with a grub> prompt instead of the grub menu:
grub>
This wasn’t good: for some reason the stage1/stage2 loaders were no longer installed correctly (or the zpool upgrade caused issues). To fix this, I booted into single user mode by inserting a Solaris 10 update 8 CD and adding “console=ttya -s” to the end of the boot line. Once the box booted, I ran ‘zpool status’ to verify my pool was available:
$ zpool status
pool: rpool
state: ONLINE
scrub: none requested
config:
NAME STATE READ WRITE CKSUM
rpool ONLINE 0 0 0
c1t0d0s0 ONLINE 0 0 0
errors: No known data errors
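One note if you’re following along: depending on how you get to single user (failsafe boot vs. install media), the root pool may not be imported and mounted for you. If it isn’t, something along these lines will get it under /a (the boot environment name is a placeholder, so check ‘zfs list -r rpool/ROOT’ for yours):

# Import the root pool with an alternate root of /a and mount the boot
# environment dataset (the dataset name below is a placeholder).
$ zpool import -f -R /a rpool
$ zfs mount rpool/ROOT/s10x_u8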
To re-install the grub stage1 and stage2 loaders, I ran installgrub (you can get the device to use from ‘zpool status’):
$ /sbin/installgrub /boot/grub/stage1 /boot/grub/stage2 /dev/rdsk/c1t0d0s0
stage1 written to partition 0 sector 0 (abs 16065)
stage2 written to partition 0, 272 sectors starting at 50 (abs 16115)
To ensure that the boot archive was up to date, I ran ‘bootadm update-archive’:
$ bootadm update-archive -f -R /a
Creating boot_archive for /a
updating /a/platform/i86pc/boot_archive
Once these changes were made, I init 6’ed the system and it booted successfully. I’ve created quite a grub cheat sheet over the years (this made recovery a snap), and will post it here once I get it cleaned up.
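In the meantime, if you get stuck at a bare grub> prompt on a ZFS-root Solaris 10 box, it’s usually possible to boot by hand from the prompt. The bootsign, boot environment name, and paths below are examples (they vary by install), so treat this as a rough template rather than something to paste in verbatim:

grub> findroot (pool_rpool,0,a)
grub> bootfs rpool/ROOT/s10x_u8
grub> kernel$ /platform/i86pc/multiboot -B $ZFS-BOOTFS
grub> module$ /platform/i86pc/boot_archive
grub> boot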
I have been debugging a problem with Red Hat cluster, and was curious whether a specific process was getting executed. On my Solaris 10 hosts I can run execsnoop to observe system-wide process creation, but there isn’t anything comparable on my Linux hosts. The best I’ve found is systemtap, which provides the kprocess.exec probe to monitor exec() calls. To use this probe, you can stash the following in a file of your choosing:
probe kprocess.exec {
    printf("%s (pid: %d) is exec'ing %s\n", execname(), pid(), filename)
}
Once the file is created (I called mine exec.stp), you can execute the stap program to enable the exec probe:
$ stap -v exec.stp
This will produce output similar to the following:
modclusterd (pid: 5125) is exec'ing /usr/sbin/clustat
clurgmgrd (pid: 5129) is exec'ing /usr/share/cluster/clusterfs.sh
clusterfs.sh (pid: 5130) is exec'ing /usr/bin/dirname
clusterfs.sh (pid: 5131) is exec'ing /usr/bin/dirname
clusterfs.sh (pid: 5132) is exec'ing /usr/bin/dirname
clusterfs.sh (pid: 5134) is exec'ing /bin/basename
clusterfs.sh (pid: 5135) is exec'ing /sbin/consoletype
clusterfs.sh (pid: 5138) is exec'ing /usr/bin/readlink
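If you’re only interested in a specific command (which is what started this exercise for me), you can also filter inside the probe itself. A quick sketch using stap’s -e option, with clustat standing in for whatever you’re hunting:

$ stap -e 'probe kprocess.exec {
    if (isinstr(filename, "clustat"))
        printf("%s (pid: %d) -> %s\n", execname(), pid(), filename)
}'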
While systemtap is missing various features that are available in DTrace, it’s still a super useful tool!
I’ve been experimenting with the new OpenSolaris package manager (pkg), and ran into an odd issue last weekend. The flash drive I was running image-update on filled up, and after poking around I noticed that /var/pkg had some large directories:
$ cd /var/pkg
$ du -sh *
4.6M catalog
1.5K cfg_cache
892M download
1.5K file
4.5K history
19M index
1.5K lost+found
146M pkg
318K state
In this specific case, pkg downloaded close to 900MB to the download directory, but failed to remove the downloaded files once the image was updated. :( The pkg tool currently doesn’t have a purge option to remove old stuff in this directory, so I had to go in manually and remove everything in the download directory. It appears bug #2266 is open to address this, and removing the contents from the download directory is safe (at least according to a post I read on the pkg mailing list). I think I prefer yum to the pkg tool, but hopefully I can be swayed once pkg matures a bit more!
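For reference, the cleanup itself was nothing fancy. Assuming (per that mailing list post) the download cache really is disposable, something like this reclaims the space:

# Clear out the cached downloads, then double check the space came back.
$ rm -rf /var/pkg/download/*
$ du -sh /var/pkg/download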
I currently support a number of Linux hosts that run inside VMware vSphere. Periodically I need to add new storage devices to these hosts, which requires me to log in to the vSphere client and add the device through the “edit settings” selection. The cool thing about vSphere is that the LUNs are dynamically added to the guest, and the guest will see the devices once the SCSI bus has been scanned. There are several ways to scan for storage devices, but the simplest way I’ve found is the rescan-scsi-bus.sh shell script that comes with the sg3_utils package.
To use rescan-scsi-bus.sh, you will first need to install the sg3_utils package:
$ yum install sg3_utils
Once installed, you can run fdisk or lsscsi to view the devices on your system:
$ lsscsi
[0:0:0:0] disk VMware Virtual disk 1.0 /dev/sda
[1:0:0:0] disk VMware Virtual disk 1.0 /dev/sdb
$ fdisk -l
Disk /dev/sda: 9663 MB, 9663676416 bytes
255 heads, 63 sectors/track, 1174 cylinders
Device Boot Start End Blocks Id System
/dev/sda2 652 1173 4192965 82 Linux swap / Solaris
Disk /dev/sdb: 19.3 GB, 19327352832 bytes
255 heads, 63 sectors/track, 2349 cylinders
Device Boot Start End Blocks Id System
/dev/sdb1 1 2349 18868311 83 Linux
As you can see above, the system currently sees two disks. To pick up the new device I just added through the vSphere client, we can run rescan-scsi-bus.sh from the host:
$ /usr/bin/rescan-scsi-bus.sh -r
Host adapter 0 (mptspi) found.
Host adapter 1 (mptspi) found.
Scanning SCSI subsystem for new devices
and remove devices that have disappeared
Scanning host 0 for SCSI target IDs 0 1 2 3 4 5 6 7, all LUNs
Scanning for device 0 0 0 0 ...
OLD: Host: scsi0 Channel: 00 Id: 00 Lun: 00
Vendor: VMware Model: Virtual disk Rev: 1.0
Type: Direct-Access ANSI SCSI revision: 02
Scanning host 1 channels 0 for SCSI target IDs 0 1 2 3 4 5 6 7, all LUNs
Scanning for device 1 0 0 0 ...
OLD: Host: scsi1 Channel: 00 Id: 00 Lun: 00
Vendor: VMware Model: Virtual disk Rev: 1.0
Type: Direct-Access ANSI SCSI revision: 02
Scanning for device 1 0 1 0 ...
NEW: Host: scsi1 Channel: 00 Id: 01 Lun: 00
Vendor: VMware Model: Virtual disk Rev: 1.0
Type: Direct-Access ANSI SCSI revision: 02
Scanning for device 1 0 1 0 ...
OLD: Host: scsi1 Channel: 00 Id: 01 Lun: 00
Vendor: VMware Model: Virtual disk Rev: 1.0
Type: Direct-Access ANSI SCSI revision: 02
0 new device(s) found.
0 device(s) removed.
The scan output shows each device it finds; as you can see above, it located three “Virtual disk” drives, including the newly added one (flagged NEW). To verify the OS now sees the new drive, we can run lsscsi and fdisk again:
$ lsscsi
[0:0:0:0] disk VMware Virtual disk 1.0 /dev/sda
[1:0:0:0] disk VMware Virtual disk 1.0 /dev/sdb
[1:0:1:0] disk VMware Virtual disk 1.0 /dev/sdc
$ fdisk -l
Disk /dev/sda: 9663 MB, 9663676416 bytes
255 heads, 63 sectors/track, 1174 cylinders
Device Boot Start End Blocks Id System
/dev/sda2 652 1173 4192965 82 Linux swap / Solaris
Disk /dev/sdb: 19.3 GB, 19327352832 bytes
255 heads, 63 sectors/track, 2349 cylinders
Device Boot Start End Blocks Id System
/dev/sdb1 1 2349 18868311 83 Linux
Disk /dev/sdc: 1073 MB, 1073741824 bytes
255 heads, 63 sectors/track, 130 cylinders
Disk /dev/sdc doesn't contain a valid partition table
There are various ways to check for new SCSI devices (Emulex has a tool, QLogic has a tool, you can scan for devices by hand through sysfs as shown below, etc.), but the rescan script has proven to be the easiest solution for me. Since I didn’t write the rescan script, use this information at your own risk. It should work flawlessly, but you’re on your own when you run it!
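For completeness, the “scan by hand” method mentioned above is just a couple of sysfs writes; a minimal sketch that asks every host adapter to rescan all channels, targets, and LUNs (run as root, no extra packages needed):

$ for host in /sys/class/scsi_host/host*; do
      echo "- - -" > "$host/scan"
  done
$ lsscsi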