Updating SVM/DiskSuite device relocation information

While booting my Solaris 10 box today, I noticed the following error in /var/adm/messages:

Jul 23 11:15:53 tigger metadevadm: [ID 209699 daemon.error] Invalid device relocation 
information detected in Solaris Volume Manager
Jul 23 11:15:53 tigger metadevadm: [ID 912841 daemon.error] Please check the status of 
the following disk(s):
Jul 23 11:15:53 tigger metadevadm: [ID 702911 daemon.error]     c1t6d0

I recently replaced c1t6d0, but forgot to update the device relocation information in the meta state database. To fix this issue, I ran metadevadm(1m) with the “-u” (update diskid in the meta state database) option:

$ metadevadm -u c1t6d0

Updating Solaris Volume Manager device relocation information for c1t6d0
Old device reloc information:
        id1,sd@SSEAGATE_SX318203LC______LR834657____1024048T
New device reloc information:
        id1,sd@SSEAGATE_SX318203LC______LR22875200001004H76G

Now metadevadm doesn’t complain when the box boots! :)

Monitoring interrupts with Solaris

The intrstat(1m) utility, introduced in Solaris 10, allows interrupt activity to be monitored on a system:

$ intrstat 5

      device |      cpu0 %tim
-------------+---------------
       glm#0 |       953  2.6
       qfe#0 |       202  1.5
      uata#0 |        91  0.2

      device |      cpu0 %tim
-------------+---------------
       glm#0 |       879  2.6
       qfe#0 |       198  1.5
      uata#0 |        89  0.2

This provides a snapshot of the number of interrupts generated during each interval (five seconds in the example above). To get cumulative interrupt activity over a specific period of time, Brendan Gregg’s intrtime DTrace script can be used:

$ intrtime 60

   Interrupt         Time(ns)   %Time
        uata          2869846    0.00
         qfe         46331270    0.08
         glm       1913715146    3.19
  TOTAL(int)       1962916262    3.27
  TOTAL(dur)      60008698021  100.00
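The %Time column is simply each device’s interrupt time divided by the total sampled duration. A quick awk sanity check using the glm numbers above (my own arithmetic, not part of intrstat or intrtime):

```shell
# %Time = time spent in the interrupt handler / total sample duration * 100
awk 'BEGIN {
    glm = 1913715146       # glm interrupt time in ns, from the output above
    dur = 60008698021      # TOTAL(dur) in ns
    printf "glm %%Time = %.2f\n", glm / dur * 100
}'
```

This prints 3.19, matching the %Time column in the intrtime output.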

With these two utilities, you can easily see which devices are busy generating interrupts. This information can also be used to ask questions like “which process is causing the activity in the qfe driver,” or “what SCSI devices are busy in the system,” or “HEY! The SCSI disk drives off the Ultra Wide SCSI controller shouldn’t be in use! Who is accessing them?!?”
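For the disk questions, one way to start answering them (a sketch; io:::start is the standard DTrace io provider probe, and the right probes for the qfe question would depend on what the driver is doing — e.g. the fbt provider on the qfe module) is a one-liner that counts I/O requests by process name:

```
$ dtrace -n 'io:::start { @[execname] = count(); }'
```

Let it run for a bit, hit Ctrl-C, and the aggregation shows which processes are driving the disks.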

DTrace is da bomb yizo!

Fixing Safari RSS Feeds

I noticed this week that Safari was no longer updating the RSS links in my bookmarks bar, and started to wonder if 10.4.2 introduced some additional bugs. After digging through the Apple discussion forums, I came across a fix:

The RSS ‘cache’ is maintained in a separate location. To clear it, try this..

1. Quit Safari
2. Open Terminal (in ‘/Applications/Utilities’) and enter killall SyndicationAgent
3. Trash ‘~/Library/Syndication/’

After removing the Syndication directory and restarting Safari, things are back to normal.

Veritas disk group configuration records

Veritas uses disk group configuration records to store subdisk, plex, volume, and device configuration data. The configuration records get written to the private region of specific devices in each disk group, and are described in the vxinfo(1m) manual page:

A disk group configuration is a small database
that contains all volume, plex, subdisk, and disk
media records. These configurations are repli-
cated onto some or all disks in the disk group,
usually with one copy on each disk. Because these
databases are stored within disk groups, record
associations cannot span disk groups. Thus, a
subdisk defined on a disk in one disk group cannot
be associated with a volume in another disk group.

If multiple devices are present in a disk group, Veritas will replicate the configuration records to multiple devices for redundancy. You can see which devices contain disk configuration records by invoking vxdg(1m) with the “list” option:

$ vxdg list oof

Group:     oof
dgid:      1120604922.22.tigger
import-id: 1024.10
flags:     cds
version:   120
alignment: 8192 (bytes)
ssb:            on
detach-policy: global
dg-fail-policy: dgdisable
copies:    nconfig=2 nlog=default
config:    seqno=0.17098 permlen=1280 free=1223 templen=27 loglen=192
config disk c1t1d0s2 copy 1 len=1280 state=clean online
config disk c1t2d0s2 copy 1 len=1280 state=clean online
config disk c1t3d0s2 copy 1 len=1280 disabled
config disk c1t4d0s2 copy 1 len=1280 disabled
config disk c1t5d0s2 copy 1 len=1280 disabled
config disk c1t6d0s2 copy 1 len=1280 disabled
log disk c1t1d0s2 copy 1 len=192
log disk c1t2d0s2 copy 1 len=192
log disk c1t3d0s2 copy 1 len=192
log disk c1t4d0s2 copy 1 len=192
log disk c1t5d0s2 copy 1 len=192

This example shows that targets 1 and 2 contain configuration records. If you are a paranoid person, you will probably want to replicate the configuration records to several devices in the disk group. This can be accomplished with the vxedit(1m) utility:

$ vxedit -g oof set nconfig=6 oof

$ vxdg list oof | grep ^config

config:    seqno=0.17137 permlen=1280 free=1265 templen=27 loglen=192
config disk c1t1d0s2 copy 1 len=1280 state=clean online
config disk c1t2d0s2 copy 1 len=1280 state=clean online
config disk c1t3d0s2 copy 1 len=1280 state=clean online
config disk c1t4d0s2 copy 1 len=1280 state=clean online
config disk c1t5d0s2 copy 1 len=1280 state=clean online
config disk c1t6d0s2 copy 1 len=1280 state=clean online

Since the configuration records need to be updated periodically, it is poor practice to replicate them to every device in a large disk group. Veritas will use a sensible number of configuration copies by default, so creating additional configuration records is seldom required. For further details on configuration records, take a look at the Veritas Volume Manager administrator’s guide and the vxintro(1m) manual page.
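The “vxdg list” output shown above is easy to check mechanically. A small awk sketch (the parsing is my own; the sample lines are pasted from the output above) that counts how many configuration copies are actually online:

```shell
# count "config disk ... online" lines in `vxdg list <dg>` output
count_online_configs() {
    awk '$1 == "config" && $2 == "disk" && $NF == "online" { n++ } END { print n + 0 }'
}

# sample lines from the vxdg output shown earlier
printf '%s\n' \
    'config disk c1t1d0s2 copy 1 len=1280 state=clean online' \
    'config disk c1t2d0s2 copy 1 len=1280 state=clean online' \
    'config disk c1t3d0s2 copy 1 len=1280 disabled' |
    count_online_configs
```

This prints 2. On a live system you would feed it real output, e.g. `vxdg list oof | count_online_configs`, and compare the count against nconfig.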

Finding out how a file system was created

Part of being an SA involves creating new file systems as databases and applications expand. The creation process usually requires a bit of detective work, since block sizes, inodes, journal size, and a variety of other file system attributes can boost or hamper performance. Whenever I take over a new server, I like to run mkfs with the “-m” (show how a file system was created) option against all of the existing file systems:

$ /usr/sbin/mkfs -F vxfs -m /dev/vx/rdsk/oradg/oravol01
mkfs -F vxfs -o bsize=8192,version=6,inosize=256,logsize=2048,largefiles /dev/vx/rdsk/oradg/oravol01 20971584

$ /usr/sbin/mkfs -F ufs -m /dev/dsk/c0t0d0s0
mkfs -F ufs -o nsect=255,ntrack=16,bsize=8192,fragsize=1024,cgsize=26,free=1,rps=90,nbpi=8154,opt=t,apc=0,gap=0,nrpos=8,maxcontig=16,mtb=n /dev/dsk/c0t0d0s0 193245120

The “-m” option will print the options passed to mkfs at file system creation time. This information can be invaluable for reverse engineering why something was created (or changed) with a specific option. I am not sure if this option is available on other operating systems, but Solaris definitely supports it.
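Since the interesting part is the comma-separated “-o” string, it helps to split it up when comparing two file systems. A throwaway sketch (the option string is pasted from the vxfs output above):

```shell
# one option per line makes the -o string easy to diff between file systems
echo 'bsize=8192,version=6,inosize=256,logsize=2048,largefiles' | tr ',' '\n'
```

Capture the “mkfs -m” output for two volumes, split each -o string this way, and diff(1) the results to spot the attribute that differs.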

Manually synchronizing Solaris meta devices

While performing routine maintenance today, I discovered that one of my hot spare drives had kicked in to replace a faulted disk drive. Since the synchronization process had only recently started, I decided to shut down the box and replace the faulted drive. Once I booted the box back up, I noticed that the synchronization process didn’t start automatically:

$ metastat d5

d5: RAID
    State: Resyncing    
    Hot spare pool: hsp001
    Interlace: 128 blocks
    Size: 106085968 blocks (50 GB)
Original device:
    Size: 106086528 blocks (50 GB)
        Device     Start Block  Dbase        State Reloc  Hot Spare
        c1t1d0s0       6002        No         Okay   Yes 
        c1t2d0s0       4926        No    Resyncing   Yes c1t6d0s0
        c1t3d0s0       4926        No         Okay   Yes 
        c1t4d0s0       4926        No         Okay   Yes 

Under normal operation, a “Resync in progress” line would be listed. To manually start the synchronization process, I ran the metasync(1m) command by hand:

$ metasync -r 2048

Once the command was executed, the synchronization process started:

$ metastat d5

d5: RAID
    State: Resyncing    
    Resync in progress:  0.5% done
    Hot spare pool: hsp001
    Interlace: 128 blocks
    Size: 106085968 blocks (50 GB)
Original device:
    Size: 106086528 blocks (50 GB)
        Device     Start Block  Dbase        State Reloc  Hot Spare
        c1t1d0s0       6002        No         Okay   Yes 
        c1t2d0s0       4926        No    Resyncing   Yes c1t6d0s0
        c1t3d0s0       4926        No         Okay   Yes 
        c1t4d0s0       4926        No         Okay   Yes 

Since this is a software RAID5 meta device, the synchronization process (read data and parity, calculate parity, write data, write parity) will take a looooooong time to complete.
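The reason the resync is so expensive is the parity math: RAID5 parity is the XOR of the data blocks in each stripe, so every stripe has to be read and recomputed. A tiny illustration with made-up byte values (nothing SVM-specific):

```shell
# RAID5 stripe parity is the XOR of the data blocks
d1=$(( 0xA5 )); d2=$(( 0x3C )); d3=$(( 0x0F ))
parity=$(( d1 ^ d2 ^ d3 ))
printf 'parity     = 0x%02X\n' "$parity"

# a lost block is rebuilt by XOR-ing the parity with the surviving blocks
printf 'rebuilt d2 = 0x%02X\n' $(( parity ^ d1 ^ d3 ))
```

This prints parity 0x96 and rebuilds d2 as 0x3C — the same XOR pass the resync has to run over every stripe in the 50GB device.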