Veritas Volume Manager (VxVM) Hot Spares


When a disk that is part of a redundant volume (e.g., RAID 1 or RAID 5) fails, the volume can continue handling I/O requests, but it becomes susceptible to data loss if additional devices fail. A RAID 5 volume will also operate in a degraded state, since parity calculations are required to recreate the missing data.

To limit the impact of device failures, Veritas Volume Manager (VxVM) starts vxrelocd(1m), the failure event detection and subdisk relocation daemon, at system boot time. This daemon monitors the vxnotify(1m) output and, upon detecting a failure, attempts to relocate the affected data to a working device.
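A quick way to confirm the daemon is present is to look for it in the process table. The sketch below assumes a System V style `ps -ef`; the helper name `relocd_running` is made up for illustration and simply reads a process listing on standard input:

```shell
# relocd_running: succeeds if a vxrelocd process appears in the process
# listing supplied on stdin. The helper name is hypothetical, not part of VxVM.
relocd_running() {
    # The [v] bracket keeps the pattern from matching grep's own
    # command line if it shows up in the listing.
    grep -q '[v]xrelocd'
}

# Example (on a host with VxVM installed):
#   ps -ef | relocd_running && echo "hot relocation daemon is running"
```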

When relocating data, VxVM first attempts to use a device marked as a spare. If no suitable spare is available, it attempts to relocate data to a device that has adequate free space and doesn’t have the “nohotuse” flag set. To see whether a device has the nohotuse or spare flag set, invoke the vxdisk(1m) utility with the list option and the name of the device:

$ vxdisk list c1t1d0

Device: c1t1d0s2
devicetag: c1t1d0
type: auto
hostid: pooh
disk: name=c1t1d0 id=1123602295.10.pooh
group: name=oradg id=1123603158.13.pooh
info: format=cdsdisk,privoffset=256,pubslice=2,privslice=2
flags: online ready private autoconfig spare autoimport imported
pubpaths: block=/dev/vx/dmp/c1t1d0s2 char=/dev/vx/rdmp/c1t1d0s2
version: 3.1
iosize: min=512 (bytes) max=2048 (blocks)
public: slice=2 offset=2304 len=35365968 disk_offset=0
private: slice=2 offset=256 len=2048 disk_offset=0
update: time=1123603160 seqno=0.6
ssb: actual_seqno=0.0
headers: 0 240
configs: count=1 len=1280
logs: count=1 len=192
Defined regions:
config priv 000048-000239[000192]: copy=01 offset=000000 enabled
config priv 000256-001343[001088]: copy=01 offset=000192 enabled
log priv 001344-001535[000192]: copy=01 offset=000000 enabled
lockrgn priv 001536-001679[000144]: part=00 offset=000000
Multipathing information:
numpaths: 1
c1t1d0s2 state=enabled
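
When checking more than a handful of disks, it can help to reduce that output to a single word. The following is a minimal sketch that assumes the "flags:" line looks like the one above; the function name `reloc_role` and the labels it prints are illustrative and not part of VxVM:

```shell
# reloc_role: reads `vxdisk list <disk>` output on stdin and prints how the
# disk would be treated during relocation, based on its "flags:" line:
#   spare     - marked as a hot spare (tried first)
#   excluded  - nohotuse is set (skipped)
#   fallback  - neither flag is set (free space may be used if no spare fits)
#   unknown   - no "flags:" line was found in the input
reloc_role() {
    awk '
        $1 == "flags:" {
            seen = 1
            for (i = 2; i <= NF; i++) {
                if ($i == "spare")    { spare = 1 }
                if ($i == "nohotuse") { nohotuse = 1 }
            }
        }
        END {
            if (!seen)         { print "unknown" }
            else if (spare)    { print "spare" }
            else if (nohotuse) { print "excluded" }
            else               { print "fallback" }
        }
    '
}

# Example (on a host with VxVM installed):
#   vxdisk list c1t1d0 | reloc_role
```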

To mark a device as a hot spare, the vxedit(1m) utility can be used:

$ vxedit set spare=on c1t6d0

$ vxdisk list c1t6d0

Device: c1t6d0s2
devicetag: c1t6d0
type: auto
hostid: winnie
disk: name=c1t6d0 id=1127240120.14.winnie
group: name=oradg id=1127240283.19.winnie
info: format=cdsdisk,privoffset=256,pubslice=2,privslice=2
flags: online ready private autoconfig spare autoimport imported
pubpaths: block=/dev/vx/dmp/c1t6d0s2 char=/dev/vx/rdmp/c1t6d0s2
version: 3.1
iosize: min=512 (bytes) max=2048 (blocks)
public: slice=2 offset=2304 len=35521408 disk_offset=0
private: slice=2 offset=256 len=2048 disk_offset=0
update: time=1127961735 seqno=0.28
ssb: actual_seqno=0.0
headers: 0 240
configs: count=1 len=1280
logs: count=1 len=192
Defined regions:
config priv 000048-000239[000192]: copy=01 offset=000000 disabled
config priv 000256-001343[001088]: copy=01 offset=000192 disabled
log priv 001344-001535[000192]: copy=01 offset=000000 disabled
lockrgn priv 001536-001679[000144]: part=00 offset=000000
Multipathing information:
numpaths: 1
c1t6d0s2 state=enabled
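
On hosts with more than one disk group, it is usually safer to name the group explicitly with vxedit's -g option. Note that vxedit operates on the disk media name, which in this example happens to match the device name (see the "disk: name=" line above). A small wrapper, sketched here under the hypothetical name `set_spare`, can also guard against typos in the on/off value:

```shell
# set_spare DISKGROUP DISK on|off
# Wraps the vxedit call shown above, adding an explicit disk group (-g)
# and validating the on/off argument. The wrapper name is hypothetical.
set_spare() {
    dg=$1
    disk=$2
    state=$3
    case "$state" in
        on|off) ;;
        *) echo "usage: set_spare diskgroup disk on|off" >&2
           return 2 ;;
    esac
    vxedit -g "$dg" set spare="$state" "$disk"
}

# Example (on a host with VxVM installed):
#   set_spare oradg c1t6d0 on
```

Calling the wrapper with off instead of on would remove the spare designation from the disk again.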

To request that a device not be used for relocation, the “nohotuse” flag can be set. This causes vxrelocd(1m) to skip the device when making relocation decisions, which ensures that data doesn’t get relocated to free space on busy disks. To set the “nohotuse” flag, the vxedit(1m) utility can be used:

$ vxedit set nohotuse=on c1t6d0

$ vxdisk list c1t6d0

Device: c1t6d0s2
devicetag: c1t6d0
type: auto
hostid: winnie
disk: name=c1t6d0 id=1127240120.14.winnie
group: name=oradg id=1127240283.19.winnie
info: format=cdsdisk,privoffset=256,pubslice=2,privslice=2
flags: online ready private autoconfig nohotuse autoimport imported
pubpaths: block=/dev/vx/dmp/c1t6d0s2 char=/dev/vx/rdmp/c1t6d0s2
version: 3.1
iosize: min=512 (bytes) max=2048 (blocks)
public: slice=2 offset=2304 len=35521408 disk_offset=0
private: slice=2 offset=256 len=2048 disk_offset=0
update: time=1127961735 seqno=0.28
ssb: actual_seqno=0.0
headers: 0 240
configs: count=1 len=1280
logs: count=1 len=192
Defined regions:
config priv 000048-000239[000192]: copy=01 offset=000000 disabled
config priv 000256-001343[001088]: copy=01 offset=000192 disabled
log priv 001344-001535[000192]: copy=01 offset=000000 disabled
lockrgn priv 001536-001679[000144]: part=00 offset=000000
Multipathing information:
numpaths: 1
c1t6d0s2 state=enabled

The relocation process can consume considerable I/O and CPU resources, so it’s often beneficial to pick the hot spares explicitly by hand. This ensures that when failures occur, data is not relocated to a chunk of free space that resides on the same spindles as your production data.

This article was posted by Matty on 2005-09-28 23:54:00 -0400