Blog O' Matty


mpstat's wt (wait on I/O) statistic

This article was posted by Matty on 2005-06-12 21:56:00 -0400

I have used mpstat over the years to check system performance metrics, and always wanted to know what went on behind the scenes. With the recent release of opensolaris, I can finally find out how mpstat gathers the statistics it displays (it uses the kstat interfaces). One statistic that has always been confusing is “wt.” According to the mpstat man page, the wt column means:

wt percent wait time

This definition is rather vague, so I read Jim Mauro’s article to get additional information on what wt meant. After reading Jim’s article, I started to think that wt was worthless, but decided to verify my hypothesis with the experts in the opensolaris performance forums. Two minutes after posting my question, I received the following reply:

From: johansen To: perf-discuss@opensolaris.org Subject: [perf-discuss] Re: Wait I/O Calculations

%wio is a meaningless statistic. On Solaris 9, each thread that was blocked in biowait(), waiting for I/O, would lead to a counter being incremented. When the I/O completed, the counter was decremented. This counter was what was used to generate %wio. The problem is that we attempted to normalize the I/O-wait as a portion of CPU time, when it is really the count of threads sleeping for block-I/O. Attempting to decompose the number of threads blocked for I/O into a percentage of CPU-time-used turns out to be both confusing and not particularly useful.

Because %wio has led to so much confusion, and didn’t represent any real quantity of CPU time, we’ve removed the statistic in Solaris 10. %wio will now always return zero. If you’re interested in more useful I/O statistics, check out iostat(1M), which will correctly show you the percentage of time each I/O transaction spends waiting to be serviced.
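For the curious, the counters involved live in the cpu_stat kstats that mpstat reads. Assuming you have the kstat(1M) utility handy (field names can vary by release), you can inspect the raw per-CPU values directly:

$ kstat -p cpu_stat:0:cpu_stat0:wait_io

The “-p” flag prints the statistic in parseable module:instance:name:statistic form, which makes it easy to watch from a script.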

This reaffirmed my beliefs, and it comes from the people who wrote the code that runs your servers. :) The opensolaris project is rad!

Veritas hostid debugging

This article was posted by Matty on 2005-06-11 21:59:00 -0400

I was approached by a colleague last week to investigate a problem with Veritas Volume Manager importing a disk group on two different nodes simultaneously (this is fine if you’re using CVM, but we weren’t). As you can imagine, this is a BAD thing, and can lead to chaos and data corruption (depending on what gets modified). I started my investigation by running “vxdisk list DISKID” to review the disk configuration records on each node:

$ vxdisk list c3t22d0s2

Device: c3t22d0s2
devicetag: c3t22d0
type: sliced
hostid: dmp
disk: name=c3t22d0s2 id=1117829023.1341.dmp
group: name=DG id=1030035763.1312.node1
flags: online ready private autoconfig autoimport imported
pubpaths: block=/dev/vx/dmp/c3t22d0s4 char=/dev/vx/rdmp/c3t22d0s4
privpaths: block=/dev/vx/dmp/c3t22d0s3 char=/dev/vx/rdmp/c3t22d0s3
version: 2.2
iosize: min=512 (bytes) max=256 (blocks)
public: slice=4 offset=0 len=104847360
private: slice=3 offset=1 len=2559
update: time=1117841684 seqno=0.9
headers: 0 248
configs: count=1 len=1865
logs: count=1 len=282
Defined regions:
config priv 000017-000247[000231]: copy=01 offset=000000 enabled
config priv 000249-001882[001634]: copy=01 offset=000231 enabled
log priv 001883-002164[000282]: copy=01 offset=000000 enabled
Multipathing information:
numpaths: 4
c3t22d0s2 state=enabled type=secondary
c3t21d0s2 state=enabled type=primary
c4t20d0s2 state=enabled type=secondary
c4t21d0s2 state=enabled type=primary

$ vxdisk list c2t0d45s2

Device: c2t0d45s2
devicetag: c2t0d45
type: sliced
hostid: dmp
disk: name=c2t0d45s2 id=1030485832.1770.dmp
group: name=DG id=1030485575.1756.node2
flags: online ready private autoconfig autoimport imported
pubpaths: block=/dev/vx/dmp/c2t0d45s4 char=/dev/vx/rdmp/c2t0d45s4
privpaths: block=/dev/vx/dmp/c2t0d45s3 char=/dev/vx/rdmp/c2t0d45s3
version: 2.2
iosize: min=512 (bytes) max=256 (blocks)
public: slice=4 offset=0 len=28445760
private: slice=3 offset=1 len=2879
update: time=1113472254 seqno=0.49
headers: 0 248
configs: count=1 len=2104
logs: count=1 len=318
Defined regions:
config priv 000017-000247[000231]: copy=01 offset=000000 disabled
config priv 000249-002121[001873]: copy=01 offset=000231 disabled
log priv 002122-002439[000318]: copy=01 offset=000000 disabled
Multipathing information:
numpaths: 2
c2t0d45s2 state=enabled
c3t1d45s2 state=enabled

Upon reviewing the output, a mental alarm immediately sounded when I saw that the “hostid” was the same on both nodes. If you are new to Veritas, VxVM uses the hostid field to indicate that a disk group (and the devices in it) is imported by a specific node, which prevents nodes with non-matching hostids from automatically importing disk groups when the “autoimport” flag is set. The hostid is stored in the file /etc/vx/volboot:

$ cat /etc/vx/volboot

volboot 3.1 0.7 30
hostid dmp
end
[ ... ]

After reviewing the SAN configuration that had been in place before I started, I noticed that both servers were able to see the LUNs in the DG disk group. Since the autoimport flag was set, and both nodes thought they were hostid dmp, both boxes happily imported the disk group.

To resolve the problem, we deported the DG disk group from node1, and changed the hostid with the following command:

$ vxdctl hostid SOMETHING_OR_ANOTHER
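The whole recovery, run on node1, looked roughly like this sketch (the hostid argument is just a unique name of your choosing):

$ vxdg deport DG
$ vxdctl hostid node1
$ vxdctl enable

The final “vxdctl enable” asks vxconfigd to rescan the disks, which makes for a handy sanity check that the new hostid took effect.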

Upon reviewing the vxdg man page, it looks like no data corruption would have occurred, since node1 didn’t access the volumes in the DG disk group. My work life has been filled with all kinds of interesting problems the past few weeks!

Printing PICL information

This article was posted by Matty on 2005-06-09 22:02:00 -0400

The Solaris operating system comes with the picld daemon, which allows hardware-specific data to be presented in a general way. When clients read data from a PICL implementation, they can be assured that “temperature” (or any counter value, for that matter) means the same thing across different hardware implementations. To print the PICL tree, you can use the prtpicl(1M) utility:

$ prtpicl -v |more

/ (picl, 3800000001)
:PICLVersion 1.100000
:_class picl
:name /
platform (upa, 3800000005)
:PlatformGroup sun4u
:OS-Version Generic_118558-06
:OS-Release 5.9
:HostName winnie
:OS-Name SunOS
:device_type upa
:banner-name Sun Ultra 5/10 UPA/PCI (UltraSPARC-IIi 333MHz)
:clock-frequency 0x4f64b50
:model SUNW,375-0066
:energystar-v2
:#size-cells 0x2
:breakpoint-trap 0x7f
:#address-cells 0x2
:#interrupt-cells 0x1
:interrupt-map-mask 00 00 00 3e 00 00 00 00 00 00 00 00
:interrupt-map
00 00 00 3c 00 00 00 00 00 00 00 00 f0 05 fa 24 80 00 00 33
:idprom
01 80 08 00 20 c0 f6 e9 00 00 00 00 c0 f6 e9 a9 00 00 00 00
00 00 00 00 00 00 00 00 00 00 00 00
:reset-reason S-POR
:boot-retained-page
:PAGESIZE 8192
:MMU_PAGESIZE 8192
:MMU_PAGEOFFSET 8191
:relative-addressing 1
:pm-hardware-state no-suspend-resume
:devfs-path /
:driver-name rootnex
:binding-name SUNW,Ultra-5_10
:bus-addr
:instance -1
:PlatformName SUNW,Ultra-5_10
:_class upa
:name platform
memory (memory, 38000000c4)
:Size 0x40000000
:UnitAddress 0,0
:available
[ ... ]

I find myself using this all the time to find a wide variety of information. Nifty utility!
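If you only care about one class of node, prtpicl can also filter by class with its “-c” option (the class name below is just an example; the classes available vary by platform):

$ prtpicl -v -c temperature-sensor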

Making Solaris DHCP server IP addresses usable

This article was posted by Matty on 2005-06-02 13:20:00 -0400

While working through a few issues with our x86 jumpstart setup, I saw the following error messages when one of the clients was attempting to PXE boot:

429f62d0: Datagram received on network device: ge0(limited broadcast)
429f62d0: (01000D609C8888,1.2.3.4) currently marked as unusable.

To make the IP address usable again, I used the Solaris pntadm utility:

$ pntadm -M 1.2.3.4 -f PERMANENT 1.2.3.0

The “-M” option instructs pntadm to modify the entry, and “-f PERMANENT” replaces the entry’s flags (clearing the UNUSABLE flag that was keeping the client from booting). The final argument is the network table that contains the IP address. To verify that the command worked, we can use pntadm’s “-P” (print) option:

$ pntadm -P 1.2.3.0

Client ID Flags Client IP Server IP Lease Expiration Macro Comment
01000D609C8888 01 1.2.3.4 1.2.3.200 Forever 1.2.3.0 me

Once I adjusted the flags, everything started working. Jeah!
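If you aren’t sure which network tables are configured on a DHCP server, pntadm can list them (the output will obviously vary from site to site):

$ pntadm -L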

PF's skip on interface directive

This article was posted by Matty on 2005-05-23 22:04:00 -0400

The OpenBSD packet filter (PF) received several enhancements in OpenBSD 3.7. One of the coolest things is the ability to tell PF not to filter traffic on specific interfaces, such as the loopback interface. This behavior is defined in the pf.conf configuration file with the “set skip on” statement:

set skip on lo0

Prior to the “set skip on” option, you had to explicitly allow traffic to flow with the following pass statements:

pass in quick on lo0 all
pass out quick on lo0 all

This of course assumes a default policy of “block everything not explicitly allowed,” which all firewalls SHOULD be using.
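Putting the pieces together, a minimal default-deny pf.conf using the new directive might look like this (the interface name fxp0 and the single allowed service are just examples):

set skip on lo0
block all
pass out on fxp0 keep state
pass in on fxp0 proto tcp from any to any port 22 keep state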