Veritas hostid debugging


A colleague asked me last week to investigate a problem in which Veritas Volume Manager had imported the same disk group on two different nodes at once (this is fine if you're using CVM, but we weren’t). As you can imagine, this is a BAD thing, and can lead to chaos and data corruption (depending on what gets modified). I started my investigation by running “vxdisk list DISKID” to review the disk configuration records on each node:

$ vxdisk list c3t22d0s2

Device: c3t22d0s2
devicetag: c3t22d0
type: sliced
hostid: dmp
disk: name=c3t22d0s2 id=1117829023.1341.dmp
group: name=DG id=1030035763.1312.node1
flags: online ready private autoconfig autoimport imported
pubpaths: block=/dev/vx/dmp/c3t22d0s4 char=/dev/vx/rdmp/c3t22d0s4
privpaths: block=/dev/vx/dmp/c3t22d0s3 char=/dev/vx/rdmp/c3t22d0s3
version: 2.2
iosize: min=512 (bytes) max=256 (blocks)
public: slice=4 offset=0 len=104847360
private: slice=3 offset=1 len=2559
update: time=1117841684 seqno=0.9
headers: 0 248
configs: count=1 len=1865
logs: count=1 len=282
Defined regions:
config priv 000017-000247[000231]: copy=01 offset=000000 enabled
config priv 000249-001882[001634]: copy=01 offset=000231 enabled
log priv 001883-002164[000282]: copy=01 offset=000000 enabled
Multipathing information:
numpaths: 4
c3t22d0s2 state=enabled type=secondary
c3t21d0s2 state=enabled type=primary
c4t20d0s2 state=enabled type=secondary
c4t21d0s2 state=enabled type=primary

$ vxdisk list c2t0d45s2

Device: c2t0d45s2
devicetag: c2t0d45
type: sliced
hostid: dmp
disk: name=c2t0d45s2 id=1030485832.1770.dmp
group: name=DG id=1030485575.1756.node2
flags: online ready private autoconfig autoimport imported
pubpaths: block=/dev/vx/dmp/c2t0d45s4 char=/dev/vx/rdmp/c2t0d45s4
privpaths: block=/dev/vx/dmp/c2t0d45s3 char=/dev/vx/rdmp/c2t0d45s3
version: 2.2
iosize: min=512 (bytes) max=256 (blocks)
public: slice=4 offset=0 len=28445760
private: slice=3 offset=1 len=2879
update: time=1113472254 seqno=0.49
headers: 0 248
configs: count=1 len=2104
logs: count=1 len=318
Defined regions:
config priv 000017-000247[000231]: copy=01 offset=000000 disabled
config priv 000249-002121[001873]: copy=01 offset=000231 disabled
log priv 002122-002439[000318]: copy=01 offset=000000 disabled
Multipathing information:
numpaths: 2
c2t0d45s2 state=enabled
c3t1d45s2 state=enabled

Upon reviewing the output, a mental alarm sounded immediately when I saw that the “hostid” was the same on both nodes. If you are new to Veritas, VxVM uses the hostid field to record which node has a disk group (and the devices in it) imported, which prevents nodes with non-matching hostids from automatically importing the disk group when the “autoimport” flag is set. The hostid is stored in the file /etc/vx/volboot:

$ cat /etc/vx/volboot

volboot 3.1 0.7 30
hostid dmp
end
###############################################################
###############################################################
###############################################################
###############################################################
###############################################################
###############################################################
###############################################################
#############################
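
If you want to grab just the hostid for scripting, a small helper along these lines should do it. This is a minimal sketch: the function name volboot_hostid is made up for this example, and it assumes volboot contains a simple "hostid NAME" line like the one shown above.

```shell
# Hypothetical helper: print the hostid recorded in a volboot file.
# Assumes a line of the form "hostid NAME", as in the output above.
# The optional first argument is the file to read; it defaults to
# /etc/vx/volboot.
volboot_hostid() {
    awk '$1 == "hostid" { print $2; exit }' "${1:-/etc/vx/volboot}"
}
```

Running volboot_hostid on each node and comparing the results is a quick sanity check. The "vxdctl list" command also reports the hostid, and is probably the better choice if vxconfigd is up.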

After reviewing the SAN configuration that had been put in place before I started, I noticed that both servers were able to see the LUNs in the DG disk group. Since the autoimport flag was set, and both nodes thought they were hostid dmp, both boxes happily imported the disk group.
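
A duplicate hostid like this is easy to catch ahead of time if you collect the hostid from every server that shares storage. The following is a rough sketch, not something we had in place: the function name dup_hostids and the "NODE HOSTID" input format are assumptions made for this example.

```shell
# Hypothetical helper: read "NODE HOSTID" pairs on stdin (one per line)
# and report every hostid that is claimed by more than one node.
dup_hostids() {
    awk '
        { count[$2]++; nodes[$2] = nodes[$2] " " $1 }
        END {
            for (id in count)
                if (count[id] > 1)
                    print "duplicate hostid " id ":" nodes[id]
        }'
}
```

Fed the lines "node1 dmp" and "node2 dmp", it prints "duplicate hostid dmp: node1 node2", and it prints nothing when every hostid is unique. The pairs could be gathered by reading the hostid line from /etc/vx/volboot on each server.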

To resolve the problem, we deported the DG disk group from node1, and changed the hostid with the following command:

$ vxdctl hostid SOMETHING_OR_ANOTHER
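
The two steps can be wrapped up so they can be previewed before touching anything. This is only a sketch of the sequence described above: the run and fix_hostid names and the DRY_RUN variable are invented for this example, and the disk group name and new hostid are whatever you pass in.

```shell
# Hypothetical wrapper. With DRY_RUN=1 (the default here) it only prints
# each command; set DRY_RUN=0 to actually execute them.
run() {
    if [ "${DRY_RUN:-1}" = "1" ]; then
        echo "+ $*"
    else
        "$@"
    fi
}

# Deport the disk group from this node, then give the node a new hostid.
# $1 = disk group name, $2 = new hostid (both supplied by the caller).
fix_hostid() {
    run vxdg deport "$1" &&
    run vxdctl hostid "$2"
}
```

Calling "fix_hostid DG node1" in dry-run mode prints the vxdg deport and vxdctl hostid commands it would run. After a real run, "vxdctl list" should show the new hostid.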

After reviewing the vxdg man page, I believe no data corruption occurred, since node1 didn’t access the volumes in the DG disk group. My work life has been filled with all kinds of interesting problems the past few weeks!

This article was posted by Matty on 2005-06-11 21:59:00 -0400