Bizarre SVM Issue

I had a disk drive fail in one of my machines this week, and used the typical drive replacement procedure (cfgadm / metadevadm / devfsadm) to replace the physical disk; a rough sketch of that sequence follows. Once the drive was replaced, I ran metareplace to re-synchronize the two sub-mirrors, as shown after the sketch:
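
For reference, the replacement sequence looks roughly like this. It is a sketch rather than a transcript: the cfgadm attachment point name (c0::dsk/c0t1d0) is an assumption based on the device names below, so check cfgadm -al on your own system before running anything.

$ cfgadm -c unconfigure c0::dsk/c0t1d0    # take the failed disk offline
  (physically swap the drive)
$ cfgadm -c configure c0::dsk/c0t1d0      # bring the replacement online
$ devfsadm                                # rebuild the /dev and /devices links
$ metadevadm -u c0t1d0                    # update the device ID SVM has recorded for this disk
$ prtvtoc /dev/rdsk/c0t0d0s2 | fmthard -s - /dev/rdsk/c0t1d0s2   # copy the partition table from the healthy disk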

$ metareplace -e d0 c0t1d0s0
d0: device c0t1d0s0 is enabled

$ metastat d0

d0: Mirror
    Submirror 0: d10
      State: Okay
    Submirror 1: d20
      State: Unavailable
    Pass: 1
    Read option: roundrobin (default)
    Write option: parallel (default)
    Size: 8391880 blocks (4.0 GB)

d10: Submirror of d0
    State: Okay
    Size: 8391880 blocks (4.0 GB)
    Stripe 0:
        Device     Start Block  Dbase        State Reloc Hot Spare
        c0t0d0s0          0     No            Okay   Yes


d20: Submirror of d0
    State: Unavailable
    Size: 8391880 blocks (4.0 GB)
    Stripe 0:
        Device     Start Block  Dbase        State Reloc Hot Spare
        c0t1d0s0          0     No               -   Yes

Eh? For some reason d20 refused to re-synchronize and enter the Okay state, and repeated attempts to use metareplace led to the same behavior. This seemed odd, so I decided to detach d20 and re-attach it with metadetach and metattach:

$ metadetach d0 d20
d0: submirror d20 is detached

$ metattach d0 d20
d0: submirror d20 is attached
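
metattach kicks off a full resync of the re-attached submirror, and metastat reports its progress while it runs. The output looks something like this (the percentage here is purely illustrative):

$ metastat d0 | grep -i resync
    Resync in progress: 62 % done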

Both operations completed successfully, and once the re-synchronization finished, the sub-mirror entered the “Okay” state:

$ metastat d0

d0: Mirror
    Submirror 0: d10
      State: Okay
    Submirror 1: d20
      State: Okay
    Pass: 1
    Read option: roundrobin (default)
    Write option: parallel (default)
    Size: 8391880 blocks (4.0 GB)

d10: Submirror of d0
    State: Okay
    Size: 8391880 blocks (4.0 GB)
    Stripe 0:
        Device     Start Block  Dbase        State Reloc Hot Spare
        c0t0d0s0          0     No            Okay   Yes


d20: Submirror of d0
    State: Okay
    Size: 8391880 blocks (4.0 GB)
    Stripe 0:
        Device     Start Block  Dbase        State Reloc Hot Spare
        c0t1d0s0          0     No            Okay   Yes

I’m starting to suspect that this is a bug in metareplace, but I wasn’t able to pinpoint anything specific on SunSolve. The moral of the story: use metattach/metadetach if you don’t want to waste lots of time. :)

8 thoughts on “Bizarre SVM Issue”

  1. I just experienced the exact same problem with:
    uname -a
    SunOS unknown 5.9 Generic_112233-12 sun4u sparc SUNW,Ultra-4

    So thank you for publishing this; I’ve been trying to solve this stupid problem for 3 days now!

  2. It happens to me too … it looks like Solaris 9 SVM still has some bugs …

    SunOS [hostname] 5.9 Generic_117171-17 sun4u sparc SUNW,Ultra-Enterprise

    Maybe the problem is a missing SVM patch; does anyone know whether, on Solaris 9, the relevant fixes ship with the Recommended patch cluster?

    Thanks

  3. I faced the same issue and used “metastat -i” to refresh the status of the metadevices and clear the Unavailable state.

  4. Same error; metastat -i fixed the Unavailable condition in metastat (I had tried detach/attach, etc., as well).
    SunOS akcux29 5.9 Generic_122300-03 sun4u sparc SUNW,Ultra-60

        r/s    w/s   kr/s   kw/s  wait  actv   wsvc_t  asvc_t  %w  %b device
        0.0    1.0    0.0    8.2  12.0   0.0  11999.9     7.6 100   1 c0t1d0

    Unsure why iostat -zxn 5 gives me this horrendous wait time on the new disk. Write/read speed seems OK.

  5. Same error,
    SunOS dppg1 5.9 Generic_112233-12 sun4u sparc SUNW,Sun-Fire-V240
    metastat -i would fix it for me, but after detaching the submirror I wasn’t able to attach it again:
    # metattach d20 d21
    metattach: dppg1: /dev/md/rdsk/d21: No such device or address

    # metaclear -r d21
    # metainit d21 1 1 c1t0d0s1
    # metattach d20 d21
    d20: submirror d21 is attached
    and now it’s syncing.

  6. Similar weirdness happened to me. I’m posting my solution here in the hope that it helps someone out there:

    ****************************************************

    Procedure to recover from a faulted root disk (dinar, 20080910)

    Solaris 9

    metastat and metadb yield these problems:

    metastat:

    d2: Mirror
        Submirror 0: d12
          State: Okay
        Submirror 1: d22
          State: Needs maintenance
        Pass: 1
        Read option: roundrobin (default)
        Write option: parallel (default)
        Size: 28404648 blocks (13 GB)

    d12: Submirror of d2
        State: Okay
        Size: 28404648 blocks (13 GB)
        Stripe 0:
            Device     Start Block  Dbase        State Reloc Hot Spare
            c1t0d0s3          0     No            Okay   Yes

    d22: Submirror of d2
        State: Unavailable
        Size: 28404648 blocks (13 GB)
        Stripe 0:
            Device     Start Block  Dbase        State Reloc Hot Spare
            c1t1d0s3          0     No               -   Yes

    d1: Mirror
        Submirror 0: d11
          State: Okay
        Submirror 1: d21
          State: Needs maintenance
        Pass: 1
        Read option: roundrobin (default)
        Write option: parallel (default)
        Size: 4194828 blocks (2.0 GB)

    d11: Submirror of d1
        State: Okay
        Size: 4194828 blocks (2.0 GB)
        Stripe 0:
            Device     Start Block  Dbase        State Reloc Hot Spare
            c1t0d0s1          0     No            Okay   Yes

    d21: Submirror of d1
        State: Unavailable
        Size: 4194828 blocks (2.0 GB)
        Stripe 0:
            Device     Start Block  Dbase        State Reloc Hot Spare
            c1t1d0s1          0     No               -   Yes

    d0: Mirror
        Submirror 0: d10
          State: Okay
        Submirror 1: d20
          State: Needs maintenance
        Pass: 1
        Read option: roundrobin (default)
        Write option: parallel (default)
        Size: 25166079 blocks (12 GB)

    d10: Submirror of d0
        State: Okay
        Size: 25166079 blocks (12 GB)
        Stripe 0:
            Device     Start Block  Dbase        State Reloc Hot Spare
            c1t0d0s0          0     No            Okay   Yes

    d20: Submirror of d0
        State: Unavailable
        Size: 25166079 blocks (12 GB)
        Stripe 0:
            Device     Start Block  Dbase        State Reloc Hot Spare
            c1t1d0s0          0     No               -   Yes

    metadb:

    metadb -i
            flags           first blk       block count
         a m  p  luo        16              8192            /dev/dsk/c1t0d0s6
         a    p  luo        16              8192            /dev/dsk/c1t0d0s7
         W    p  l          16              8192            /dev/dsk/c1t1d0s6
         W    p  l          16              8192            /dev/dsk/c1t1d0s7
         a    p  luo        16              8192            /dev/dsk/c5t0d0s6
         a    p  luo        16              8192            /dev/dsk/c5t0d0s7
         a    p  luo        16              8192            /dev/dsk/c6t1d0s6
         a    p  luo        16              8192            /dev/dsk/c6t1d0s7

    The problem is that one of the two internal disks has faulted. It houses the mirror halves of three SVM mirrors (everything but Informix),
    as well as a 2 GB partition (unknown to me at this point), and two SVM state database replicas.

    PROCEDURE TO RECOVER:

    1. metadb -d c1t1d0s6; metadb -d c1t1d0s7, to delete the bad state database replicas from the bad disk
    (re-adding them afterwards is sketched after this procedure).
    This leaves 6 out of 8 replicas alive. In theory, half + 1 must be available for a reboot to function.

    2. Replace the disk physically.

    3. Start up the server, become root, and format and partition the new disk exactly like the other mirror half.

    4. Run metareplace -e for each mirror and the corresponding cXtXdXsX slice of the new disk, for example metareplace -e d2 c1t1d0s3.
    This should trigger a resync, after which the mirror should function normally again.

    5. metareplace did not quite work: after the resync reached 99% it still left the submirror Unavailable; quite confusing.
    Did a metadetach d1 d21, then metaclear d21, then metainit d21 1 1 c1t1d0s1, then metattach d1 d21, and
    this worked. Did the same for d0/d20 and d2/d22. Last, di
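
    Note: the two state database replicas deleted in step 1 still need to be re-created on the new disk once it has been partitioned; otherwise the system keeps running on 6 replicas instead of 8. Roughly, using the slice names from the metadb output above:

    metadb -a c1t1d0s6
    metadb -a c1t1d0s7
    metadb -i

    metadb -i should then list all eight replicas again.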
