Adding mirrors to Veritas Volume Manager volumes

One of the cool features of Veritas Volume Manager (VxVM) is its ability to change the layout of a volume on the fly with vxassist(1m). This has helped me numerous times, especially when I needed to mirror volumes that weren’t mirrored. Given the following unmirrored striped volume:

$ vxprint -hft

Disk group: oradg

DG NAME         NCONFIG      NLOG     MINORS   GROUP-ID
ST NAME         STATE        DM_CNT   SPARE_CNT         APPVOL_CNT
DM NAME         DEVICE       TYPE     PRIVLEN  PUBLEN   STATE
RV NAME         RLINK_CNT    KSTATE   STATE    PRIMARY  DATAVOLS  SRL
RL NAME         RVG          KSTATE   STATE    REM_HOST REM_DG    REM_RLNK
CO NAME         CACHEVOL     KSTATE   STATE
VT NAME         NVOLUME      KSTATE   STATE
V  NAME         RVG/VSET/CO  KSTATE   STATE    LENGTH   READPOL   PREFPLEX UTYPE
PL NAME         VOLUME       KSTATE   STATE    LENGTH   LAYOUT    NCOL/WID MODE
SD NAME         PLEX         DISK     DISKOFFS LENGTH   [COL/]OFF DEVICE   MODE
SV NAME         PLEX         VOLNAME  NVOLLAYR LENGTH   [COL/]OFF AM/NM    MODE
SC NAME         PLEX         CACHE    DISKOFFS LENGTH   [COL/]OFF DEVICE   MODE
DC NAME         PARENTVOL    LOGVOL
SP NAME         SNAPVOL      DCO

dg oradg        default      default  10000    1127240283.19.winnie

dm c1t1d0       c1t1d0s2     auto     2048     35521408 -
dm c1t2d0       c1t2d0s2     auto     2048     35521408 -
dm c1t3d0       c1t3d0s2     auto     2048     35521408 -
dm c1t4d0       c1t4d0s2     auto     2048     35365968 -
dm c1t5d0       c1t5d0s2     auto     2048     35521408 -
dm c1t6d0       c1t6d0s2     auto     2048     35521408 -

v  oravol01     -            ENABLED  ACTIVE   20971520 SELECT    oravol01-01 fsgen
pl oravol01-01  oravol01     ENABLED  ACTIVE   20971776 STRIPE    3/128    RW
sd c1t1d0-01    oravol01-01  c1t1d0   0        6990592  0/0       c1t1d0   ENA
sd c1t2d0-01    oravol01-01  c1t2d0   0        6990592  1/0       c1t2d0   ENA
sd c1t3d0-01    oravol01-01  c1t3d0   0        6990592  2/0       c1t3d0   ENA

We can easily add a mirror by invoking vxassist(1m) with the “mirror” option:

$ vxassist mirror oravol01 layout=stripe ncol=3 &

The mirror option accepts a layout option and several keywords to control the layout of the new mirror. In this example we used a 3-column striped plex to match the layout of the existing plex. After the mirror operation completes, the volume will contain a second plex (the mirror) that matches the original:

$ vxprint -hft

Disk group: oradg

DG NAME         NCONFIG      NLOG     MINORS   GROUP-ID
ST NAME         STATE        DM_CNT   SPARE_CNT         APPVOL_CNT
DM NAME         DEVICE       TYPE     PRIVLEN  PUBLEN   STATE
RV NAME         RLINK_CNT    KSTATE   STATE    PRIMARY  DATAVOLS  SRL
RL NAME         RVG          KSTATE   STATE    REM_HOST REM_DG    REM_RLNK
CO NAME         CACHEVOL     KSTATE   STATE
VT NAME         NVOLUME      KSTATE   STATE
V  NAME         RVG/VSET/CO  KSTATE   STATE    LENGTH   READPOL   PREFPLEX UTYPE
PL NAME         VOLUME       KSTATE   STATE    LENGTH   LAYOUT    NCOL/WID MODE
SD NAME         PLEX         DISK     DISKOFFS LENGTH   [COL/]OFF DEVICE   MODE
SV NAME         PLEX         VOLNAME  NVOLLAYR LENGTH   [COL/]OFF AM/NM    MODE
SC NAME         PLEX         CACHE    DISKOFFS LENGTH   [COL/]OFF DEVICE   MODE
DC NAME         PARENTVOL    LOGVOL
SP NAME         SNAPVOL      DCO

dg oradg        default      default  10000    1127240283.19.winnie

dm c1t1d0       c1t1d0s2     auto     2048     35521408 -
dm c1t2d0       c1t2d0s2     auto     2048     35521408 -
dm c1t3d0       c1t3d0s2     auto     2048     35521408 -
dm c1t4d0       c1t4d0s2     auto     2048     35365968 -
dm c1t5d0       c1t5d0s2     auto     2048     35521408 -
dm c1t6d0       c1t6d0s2     auto     2048     35521408 -

v  oravol01     -            ENABLED  ACTIVE   20971520 SELECT    -        fsgen
pl oravol01-01  oravol01     ENABLED  ACTIVE   20971776 STRIPE    3/128    RW
sd c1t1d0-01    oravol01-01  c1t1d0   0        6990592  0/0       c1t1d0   ENA
sd c1t2d0-01    oravol01-01  c1t2d0   0        6990592  1/0       c1t2d0   ENA
sd c1t3d0-01    oravol01-01  c1t3d0   0        6990592  2/0       c1t3d0   ENA
pl oravol01-02  oravol01     ENABLED  ACTIVE   20971776 STRIPE    3/128    RW
sd c1t4d0-01    oravol01-02  c1t4d0   0        6990592  0/0       c1t4d0   ENA
sd c1t5d0-01    oravol01-02  c1t5d0   0        6990592  1/0       c1t5d0   ENA
sd c1t6d0-01    oravol01-02  c1t6d0   0        6990592  2/0       c1t6d0   ENA
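
To double-check that a mirror really got attached, the plex records in the vxprint(1m) output can be counted per volume. The following is a minimal sketch, assuming the "pl" record layout shown above (plex name in the second field, volume name in the third); count_plexes is a hypothetical helper name, not a VxVM command:

```shell
#!/bin/sh
# Hypothetical helper: count the plexes attached to each volume by reading
# "vxprint -hft" output on stdin. Only "pl" records are examined; their
# third field is the owning volume.
count_plexes() {
    awk '$1 == "pl" { n[$3]++ } END { for (v in n) print v, n[v] }'
}

# Sample records copied from the vxprint output above.
count_plexes <<'EOF'
v  oravol01     -            ENABLED  ACTIVE   20971520 SELECT    -        fsgen
pl oravol01-01  oravol01     ENABLED  ACTIVE   20971776 STRIPE    3/128    RW
sd c1t1d0-01    oravol01-01  c1t1d0   0        6990592  0/0       c1t1d0   ENA
pl oravol01-02  oravol01     ENABLED  ACTIVE   20971776 STRIPE    3/128    RW
sd c1t4d0-01    oravol01-02  c1t4d0   0        6990592  0/0       c1t4d0   ENA
EOF
```

With the sample records this prints "oravol01 2". On a live system the function would be fed from the real command, e.g. "vxprint -hft | count_plexes".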

Veritas makes managing storage a snap!

Getting failure notifications with Veritas Volume Manager

One of the cool and often overlooked features in Veritas Volume Manager is the failure notification mechanism. This facility provides automated notifications when problems are detected with Veritas-managed disks, plexes, subdisks and volumes. These notifications are active by default, and will generate an e-mail to the root user each time a failure is detected. These e-mail notifications take the following form:

To: root@tigger
Subject: Volume Manager failures on host tigger
Content-Length: 240

Failures have been detected by the VERITAS Volume Manager:

failed disks:
 c1t6d0

failed plexes:
raid5vol-P08
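
If these messages get fed into a ticketing or paging system, it can help to flatten the body into one line per failed object. The following is a minimal sketch, assuming the body always follows the "failed <type>:" layout shown above; failed_objects is a hypothetical helper name:

```shell
#!/bin/sh
# Hypothetical helper: turn the body of a failure notification (read from
# stdin) into "<type> <object>" lines, one per failed object.
failed_objects() {
    awk '
        /^failed [a-z]+:$/ { kind = $2; sub(/:$/, "", kind); next }  # section header
        NF == 0            { kind = ""; next }                       # blank line ends a section
        kind != ""         { print kind, $1 }                        # object inside a section
    '
}

# Sample body copied from the notification above.
failed_objects <<'EOF'
Failures have been detected by the VERITAS Volume Manager:

failed disks:
 c1t6d0

failed plexes:
raid5vol-P08
EOF
```

With the sample message this prints "disks c1t6d0" followed by "plexes raid5vol-P08".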

Since e-mails will be sent to the root user by default, it is often beneficial to create a root alias in the /etc/aliases file:

$ grep "^root" /etc/aliases
root: admins@prefetch.net

If you would like messages to be sent to users in addition to root, you can add them to the line that starts vxrelocd in the vxvm-recover init script:

$ grep vxrelocd /etc/init.d/S95vxvm-recover
vxrelocd root alerts &

I have been using this facility for years to get advance notifications, which has helped me avoid disaster on more than one occasion.

Veritas Volume Manager (VxVM) Hot Spares

When a disk that is part of a redundant volume (e.g., RAID 1, RAID 5) fails, the volume is able to continue handling I/O requests, but becomes susceptible to data loss if additional devices fail. In the case of RAID 5 volumes, the volume will also operate in a degraded state, since parity calculations are required to recreate data.

To remedy the potential impacts associated with device failures, Veritas Volume Manager (VxVM) starts the vxrelocd(1m) failure event detection and subdisk relocation daemon at system boot time. This daemon periodically scans the vxnotify(1m) output, and upon detecting a failure, attempts to relocate data to a working device.

When relocating data, Veritas will first attempt to use a device marked as a spare. If Veritas is unable to find a device marked as a spare, Veritas will attempt to relocate data to a device that contains adequate space and doesn’t have the “nohotuse” flag set. To see if a device contains the nohotuse or spare flag, the vxdisk(1m) utility can be invoked with the list option, and the device to list:

$ vxdisk list c1t1d0

Device:    c1t1d0s2
devicetag: c1t1d0
type:      auto
hostid:    pooh
disk:      name=c1t1d0 id=1123602295.10.pooh
group:     name=oradg id=1123603158.13.pooh
info:      format=cdsdisk,privoffset=256,pubslice=2,privslice=2
flags:     online ready private autoconfig spare autoimport imported
pubpaths:  block=/dev/vx/dmp/c1t1d0s2 char=/dev/vx/rdmp/c1t1d0s2
version:   3.1
iosize:    min=512 (bytes) max=2048 (blocks)
public:    slice=2 offset=2304 len=35365968 disk_offset=0
private:   slice=2 offset=256 len=2048 disk_offset=0
update:    time=1123603160 seqno=0.6
ssb:       actual_seqno=0.0
headers:   0 240
configs:   count=1 len=1280
logs:      count=1 len=192
Defined regions:
 config   priv 000048-000239[000192]: copy=01 offset=000000 enabled
 config   priv 000256-001343[001088]: copy=01 offset=000192 enabled
 log      priv 001344-001535[000192]: copy=01 offset=000000 enabled
 lockrgn  priv 001536-001679[000144]: part=00 offset=000000
Multipathing information:
numpaths:   1
c1t1d0s2        state=enabled
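
When there are a lot of disks to check, reading the flags line by eye gets old quickly. The following is a minimal sketch, assuming the "flags:" line format shown above; relocation_role and the labels it prints (spare, excluded, eligible) are hypothetical names chosen for this example:

```shell
#!/bin/sh
# Hypothetical helper: report how relocation would treat a disk, based on
# the "flags:" line of "vxdisk list <disk>" output read from stdin.
relocation_role() {
    awk '$1 == "flags:" {
        role = "eligible"                        # neither flag present
        for (i = 2; i <= NF; i++) {
            if ($i == "spare")    role = "spare"     # marked as a hot spare
            if ($i == "nohotuse") role = "excluded"  # skipped during relocation
        }
        print role
    }'
}

# Sample lines copied from the vxdisk output above.
relocation_role <<'EOF'
Device:    c1t1d0s2
flags:     online ready private autoconfig spare autoimport imported
EOF
```

With the sample lines this prints "spare". On a live system the function would read the real output, e.g. "vxdisk list c1t1d0 | relocation_role".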

To mark a device as a hot spare, the vxedit(1m) utility can be used:

$ vxedit set spare=on c1t6d0

$ vxdisk list c1t6d0

Device:    c1t6d0s2
devicetag: c1t6d0
type:      auto
hostid:    winnie
disk:      name=c1t6d0 id=1127240120.14.winnie
group:     name=oradg id=1127240283.19.winnie
info:      format=cdsdisk,privoffset=256,pubslice=2,privslice=2
flags:     online ready private autoconfig spare autoimport imported
pubpaths:  block=/dev/vx/dmp/c1t6d0s2 char=/dev/vx/rdmp/c1t6d0s2
version:   3.1
iosize:    min=512 (bytes) max=2048 (blocks)
public:    slice=2 offset=2304 len=35521408 disk_offset=0
private:   slice=2 offset=256 len=2048 disk_offset=0
update:    time=1127961735 seqno=0.28
ssb:       actual_seqno=0.0
headers:   0 240
configs:   count=1 len=1280
logs:      count=1 len=192
Defined regions:
 config   priv 000048-000239[000192]: copy=01 offset=000000 disabled
 config   priv 000256-001343[001088]: copy=01 offset=000192 disabled
 log      priv 001344-001535[000192]: copy=01 offset=000000 disabled
 lockrgn  priv 001536-001679[000144]: part=00 offset=000000
Multipathing information:
numpaths:   1
c1t6d0s2        state=enabled

To request that a device not be used for relocation, the “nohotuse” flag can be set. This will cause vxrelocd(1m) to skip the device when making relocation decisions, which ensures that data doesn’t get relocated to free space on busy disks. To set the “nohotuse” flag, the vxedit(1m) utility can be used:

$ vxedit set nohotuse=on c1t6d0

$ vxdisk list c1t6d0

Device:    c1t6d0s2
devicetag: c1t6d0
type:      auto
hostid:    winnie
disk:      name=c1t6d0 id=1127240120.14.winnie
group:     name=oradg id=1127240283.19.winnie
info:      format=cdsdisk,privoffset=256,pubslice=2,privslice=2
flags:     online ready private autoconfig nohotuse autoimport imported
pubpaths:  block=/dev/vx/dmp/c1t6d0s2 char=/dev/vx/rdmp/c1t6d0s2
version:   3.1
iosize:    min=512 (bytes) max=2048 (blocks)
public:    slice=2 offset=2304 len=35521408 disk_offset=0
private:   slice=2 offset=256 len=2048 disk_offset=0
update:    time=1127961735 seqno=0.28
ssb:       actual_seqno=0.0
headers:   0 240
configs:   count=1 len=1280
logs:      count=1 len=192
Defined regions:
 config   priv 000048-000239[000192]: copy=01 offset=000000 disabled
 config   priv 000256-001343[001088]: copy=01 offset=000192 disabled
 log      priv 001344-001535[000192]: copy=01 offset=000000 disabled
 lockrgn  priv 001536-001679[000144]: part=00 offset=000000
Multipathing information:
numpaths:   1
c1t6d0s2        state=enabled

The relocation process can consume considerable amounts of I/O and CPU resources, so it’s often beneficial to explicitly pick the hot spares by hand. This will ensure that when failures occur, data is not relocated to a chunk of free space that resides on the same spindles as your production data.

Reattaching to failed devices with Veritas Volume Manager

When Veritas loses contact with an active device (e.g., if a fiber cable between a server and a storage array is removed), Veritas will place the device in the failed state, which will be reported as “failed was:cXtXdXs2” in the vxdisk(1m) output:

$ vxdisk list

DEVICE       TYPE            DISK         GROUP        STATUS
c0t0d0s2     auto:none       -            -            online invalid
c0t1d0s2     auto:none       -            -            online invalid
c1t1d0s2     auto:cdsdisk    c1t1d0       oradg        online
c1t2d0s2     auto:cdsdisk    -            -            online
c1t3d0s2     auto:cdsdisk    -            -            online
c1t4d0s2     auto:cdsdisk    -            -            online
c1t5d0s2     auto:cdsdisk    -            -            online
c1t6d0s2     auto:cdsdisk    -            -            online
-            -         c1t2d0       oradg        failed was:c1t2d0s2
-            -         c1t3d0       oradg        failed was:c1t3d0s2
-            -         c1t4d0       oradg        failed was:c1t4d0s2
-            -         c1t5d0       oradg        failed was:c1t5d0s2
-            -         c1t6d0       oradg        failed was:c1t6d0s2
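
The failed records are easy to pick out of this listing with a bit of awk. The following is a minimal sketch, assuming the five-column layout shown above, where failed records have "-" in the DEVICE column and "failed" at the start of the STATUS column; failed_disks is a hypothetical helper name:

```shell
#!/bin/sh
# Hypothetical helper: print "<disk> <group> <old device>" for every record
# that "vxdisk list" (read from stdin) reports as failed.
failed_disks() {
    awk '$1 == "-" && $5 == "failed" {
        dev = $NF
        sub(/^was:/, "", dev)    # strip the "was:" prefix from the old device name
        print $3, $4, dev
    }'
}

# Sample records copied from the vxdisk output above.
failed_disks <<'EOF'
DEVICE       TYPE            DISK         GROUP        STATUS
c1t1d0s2     auto:cdsdisk    c1t1d0       oradg        online
-            -         c1t2d0       oradg        failed was:c1t2d0s2
-            -         c1t3d0       oradg        failed was:c1t3d0s2
EOF
```

With the sample records this prints "c1t2d0 oradg c1t2d0s2" and "c1t3d0 oradg c1t3d0s2". On a live system the function would read "vxdisk list | failed_disks".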

When situations like this arise, the vxreattach(1m) utility can be used to reconnect Veritas to lost devices:

$ vxreattach

$ vxdisk list

DEVICE       TYPE            DISK         GROUP        STATUS
c0t0d0s2     auto:none       -            -            online invalid
c0t1d0s2     auto:none       -            -            online invalid
c1t1d0s2     auto:cdsdisk    c1t1d0       oradg        online
c1t2d0s2     auto:cdsdisk    c1t2d0       oradg        online
c1t3d0s2     auto:cdsdisk    c1t3d0       oradg        online
c1t4d0s2     auto:cdsdisk    c1t4d0       oradg        online
c1t5d0s2     auto:cdsdisk    c1t5d0       oradg        online
c1t6d0s2     auto:cdsdisk    c1t6d0       oradg        online

Prior to running vxreattach(1m), the problem that caused Veritas to lose contact with the devices should be corrected (e.g., the fiber cable should be reconnected to the server). If you aren’t sure whether the devices are reachable again, vxreattach(1m) can be invoked with the “-c” option to check if a reattach is possible. vxreattach(1m) is a cool utility, and should go into every storage administrator’s utility belt.

Displaying Veritas controller information

When utilizing Veritas’s DMP utilities, a controller, enclosure or DMP nodename is required when performing most actions. To list the controllers on a server, the vxdmpadm(1m) utility can be invoked with the “listctlr” option:

$ vxdmpadm listctlr all

CTLR-NAME       ENCLR-TYPE      STATE      ENCLR-NAME
=====================================================
c3              EMC             ENABLED      EMC0
c2              EMC             ENABLED      EMC0
c0              Disk            ENABLED      Disk

The vxdmpadm(1m) utility also has a “getctlr” option to display the physical device path associated with a controller:

$ vxdmpadm getctlr c2

LNAME     PNAME
===============
c2        /pci@80,2000/lpfc@1
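
The two options combine nicely when you want the physical path for every controller. The following is a minimal sketch, assuming the listctlr layout shown above (controller name in the first column, state in the third); enabled_controllers is a hypothetical helper name:

```shell
#!/bin/sh
# Hypothetical helper: print the name of each ENABLED controller found in
# "vxdmpadm listctlr all" output read from stdin. Header and separator
# lines are skipped because their first field does not look like "cN".
enabled_controllers() {
    awk '$1 ~ /^c[0-9]+$/ && $3 == "ENABLED" { print $1 }'
}

# Sample output copied from the listctlr example above.
enabled_controllers <<'EOF'
CTLR-NAME       ENCLR-TYPE      STATE      ENCLR-NAME
=====================================================
c3              EMC             ENABLED      EMC0
c2              EMC             ENABLED      EMC0
c0              Disk            ENABLED      Disk
EOF
```

With the sample output this prints c3, c2 and c0, one per line. On a live system the names could be passed straight to getctlr, e.g. "vxdmpadm listctlr all | enabled_controllers | while read c; do vxdmpadm getctlr $c; done".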

Monitoring VxVM Usage

While poking around /var/adm/vx this week, I noticed that VxVM (Veritas Volume Manager) 4.x logs all commands that have been executed to /var/adm/vx/cmdlog:

$ tail -6 /var/adm/vx/cmdlog
# 7601, 24143, Mon Sep 12 09:21:04 2005
/usr/sbin/vxdg list -o alldgs

# 32045, 24257, Mon Sep 12 09:21:47 2005
/usr/sbin/vxdisk -o alldgs list

# 13637, 24444, Mon Sep 12 09:22:03 2005
/usr/sbin/vxprint -G -q -n -A
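
Since each entry spans two lines, the log is a bit awkward to grep. The following is a minimal sketch that folds each entry onto a single line, assuming every entry follows the layout shown above (a "#" header line with two numbers and a timestamp, then the command); summarize_cmdlog is a hypothetical helper name:

```shell
#!/bin/sh
# Hypothetical helper: fold each cmdlog entry (read from stdin) into a
# single "<timestamp> | <command>" line.
summarize_cmdlog() {
    awk '
        /^# /  { sub(/^# [0-9]+, [0-9]+, /, ""); stamp = $0; next }  # keep only the timestamp
        NF > 0 { print stamp " | " $0 }                              # command line
    '
}

# Sample entries copied from the cmdlog output above.
summarize_cmdlog <<'EOF'
# 7601, 24143, Mon Sep 12 09:21:04 2005
/usr/sbin/vxdg list -o alldgs

# 32045, 24257, Mon Sep 12 09:21:47 2005
/usr/sbin/vxdisk -o alldgs list
EOF
```

With the sample entries the first line printed is "Mon Sep 12 09:21:04 2005 | /usr/sbin/vxdg list -o alldgs". On a live system the function would read the log directly, e.g. "summarize_cmdlog < /var/adm/vx/cmdlog".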

This is super cool, and will be extremely valuable for troubleshooting storage-related problems.