The /devices and /dev directories on one of my Solaris 9 hosts got majorly borked a few weeks back, and the trusty old `devfsadm -Cv’ command wasn’t able to fix our problem. To clean up the device tree, I booted from CDROM into single user mode and manually cleaned up the device hierarchy. Here is what I did to fix my problems (WARNING: This fixed my problem, but there is no guarantee that this will work for you. Please test changes similar to this on non-production systems prior to adjusting production systems.):
Step 1: Boot from CDROM into single user mode
Step 2: Mount the “/” partition to your favorite place (if your boot devices are mirrored, you will need to perform the following operations on each half of the mirror):
$ mount /dev/dsk/c0t0d0s0 /a
Step 3: Move the existing path_to_inst aside:
$ mv /a/etc/path_to_inst /a/etc/08012007.path_to_inst.orig
Step 4: Clean out the /devices and /dev directories:
$ rm -rf /a/devices/
$ rm -rf /a/dev/
Step 5: Replicate the /devices and /dev directories that were created during boot:
$ cd /devices; find . | cpio -pmd /a/devices
$ cd /dev; find . | cpio -pmd /a/dev
Step 6: Adjust the vfstab to reflect any device changes
Step 7: Boot with the “-a”, “-s” and “-r” options to create a new path_to_inst (you can optionally use `devfsadm -C -r /a -p /a/etc/path_to_inst -v’ to create the path_to_inst from single user mode), and to add device entries that weren’t found while booted from single user mode
Step 8: Grab a soda and enjoy the fruits of your labor! :)
I had a mirrored ZFS pool fill up on me this week, which required me to add additional storage to ensure that my application kept functioning correctly. Since expanding storage is a trivial process with ZFS, I decided to increase the available pool storage by replacing the 36GB disks in the pool with 72GB disks. Here is the original configuration:
$ df -h netbackup
Filesystem             size   used  avail capacity  Mounted on
netbackup               33G    32G     1G    96%    /opt/openv
$ zpool status -v netbackup
  pool: netbackup
 state: ONLINE
 scrub: none requested
config:

        NAME          STATE     READ WRITE CKSUM
        netbackup     ONLINE       0     0     0
          mirror      ONLINE       0     0     0
            c1t2d0    ONLINE       0     0     0
            c1t3d0    ONLINE       0     0     0
errors: No known data errors
To expand the available storage, I replaced the disk c1t2d0 with a 72GB disk, and then used the zpool “replace” option to replace the old disk with the new one:
$ zpool replace netbackup c1t2d0
Once the pool finished resilvering (you can run `zpool status -v’ to monitor the progress), I replaced the disk c1t3d0 with a 72GB disk, and used the zpool “replace” option to replace the old disk with the new one:
$ zpool replace netbackup c1t3d0
Once the pool finished resilvering, I had an extra 36GB of disk space available:
$ df -h netbackup
Filesystem             size   used  avail capacity  Mounted on
netbackup               67G    32G    35G    47%    /opt/openv
This is pretty powerful, and it’s nice not to have to run another utility to extend volumes and file systems once new storage is available. There is also the added benefit that ZFS resilvers at the object level, not at the block level. Giddy up!
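Since a resilver can take a while, the pool state is worth checking from a script rather than by eyeballing the output. A sketch that pulls the `state:` field out of a saved copy of the `zpool status` output shown above (on a live system you would pipe `zpool status -v netbackup` straight into awk):

```shell
# Save a copy of the zpool status header lines shown above.
cat > /tmp/zpool-status.txt <<'EOF'
  pool: netbackup
 state: ONLINE
 scrub: none requested
errors: No known data errors
EOF

# Print the value of the "state:" line.
awk '$1 == "state:" { print $2 }' /tmp/zpool-status.txt   # prints ONLINE
```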
I installed the SE Toolkit on several Solaris 10 hosts this week, and noticed that the se process was SEGFAULT’ing during startup:
$ /etc/rc3.d/S99orcallator start
Writing data into /opt/orca/nbm01/
Starting logging
Sending output to nohup.out
$ tail -1 /var/adm/messages
Aug 10 23:09:27 nbm01 genunix: [ID 603404 kern.notice] NOTICE: core_log: se.sparcv9[17571] core dumped: /var/core/core.se.sparcv9.17571
After filing a bug report on the SE Toolkit website, it dawned on me that the issue wasn’t with the se program, but with the orcallator.se script that accompanied the SE Toolkit. Based on a hunch, I installed the latest version of orcallator.se from the Orca website, and that fixed my issue (and provided a number of additional useful performance graphs!). Hopefully this will help others who bump into this issue.
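The core_log notice in /var/adm/messages names both the program and the core file, which makes spotting repeat offenders easy to script. A sketch run against a saved copy of the log line shown above:

```shell
# Save the core_log notice shown above.
cat > /tmp/messages-sample.txt <<'EOF'
Aug 10 23:09:27 nbm01 genunix: [ID 603404 kern.notice] NOTICE: core_log: se.sparcv9[17571] core dumped: /var/core/core.se.sparcv9.17571
EOF

# Print the program name and core file path from each notice.
sed -n 's/.*core_log: \([^[]*\)\[.*core dumped: \(.*\)/\1 \2/p' /tmp/messages-sample.txt
# prints: se.sparcv9 /var/core/core.se.sparcv9.17571
```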
I have written several times about the Solaris Leadville storage stack. While debugging a problem on a Solaris 9 host that wasn’t using the Leadville stack, I would wait and wait and wait for various commands to complete. Here is one such example:
$ timex /usr/openv/volmgr/bin/scan >/dev/null
real     2:25.09
user        0.10
sys         0.45
After migrating the same host to Solaris 10 and the Leadville stack, I no longer had to wait for my commands to complete. Here is an example:
$ timex /usr/openv/volmgr/bin/scan >/dev/null
real        0.11
user        0.05
sys         0.03
After doing a bit of debugging, I noticed that the Solaris 9 fibre channel stack (which was using a vendor supplied HBA driver) was enumerating each path to see if devices were present. In the Solaris 10 case (which was using a driver in the Leadville stack), the requests were satisfied from device nodes that were cached in the device tree. I wish I could prove this without a doubt, but unfortunately I can’t run Chris’ awesome scsi.d DTrace script on my Solaris 9 host. If one of the Sun storage engineers happens to come across this blog entry, please leave me a comment with your thoughts.
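When comparing runs like the two above, it helps to normalize the `real` figure timex prints to seconds, since it switches between mm:ss.ss and plain seconds. A small sketch (assumes the value never includes an hours field):

```shell
# Convert a timex "real" value to seconds.
# Handles both the mm:ss.ss form (2:25.09) and plain seconds (0.11).
to_seconds() {
    echo "$1" | awk -F: '{ if (NF == 2) print $1 * 60 + $2; else print $1 }'
}

to_seconds 2:25.09   # prints 145.09
to_seconds 0.11      # prints 0.11
```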
I have been repairing our backup environment for the past few weeks, and have encountered several nifty tools in the NetBackup volume management bin directory. One of these tools is the scan utility, which displays the robots and tape devices visible to a system:
$ /usr/openv/volmgr/bin/scan |more
------------------------------------------------------------
Device Name : "/dev/rmt/0cbn"
Passthru Name: "/dev/sg/c0tw500104f0005f027cl0"
Volume Header: ""
Port: -1; Bus: -1; Target: -1; LUN: -1
Inquiry : "STK T9940B 1.35"
Vendor ID : "STK "
Product ID : "T9940B "
Product Rev: "1.35"
Serial Number: "479000037011"
WWN : ""
WWN Id Type : 0
Device Identifier: ""
Device Type : SDT_TAPE
NetBackup Drive Type: 10
Removable : Yes
Device Supports: SCSI-3
Flags : 0x4
Reason: 0x0
<.....>
This utility is extremely useful for getting the device paths for a specific tape device, and for viewing the information returned from a SCSI INQUIRY command. Viva la scan!
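Because every field in the `scan` output is a labeled, quoted value, extracting one is a one-liner. A sketch that pulls the serial number out of a saved copy of the record shown above:

```shell
# Save a fragment of the scan record shown above.
cat > /tmp/scan-sample.txt <<'EOF'
Device Name  : "/dev/rmt/0cbn"
Serial Number: "479000037011"
EOF

# Split on double quotes and print the quoted value.
awk -F'"' '/^Serial Number/ { print $2 }' /tmp/scan-sample.txt   # prints 479000037011
```

The same pattern works for any of the quoted fields (Device Name, Inquiry, Product ID, and so on) by changing the match expression.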