Bizarre SVM Issue

I had a disk drive fail in one of my machines this week, and used the typical drive replacement procedure (cfgadm / metadevadm / devfsadm) to swap in a new physical disk.
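For reference, the replacement steps went roughly like the following (a sketch rather than a verbatim transcript; the cfgadm attachment point c0::dsk/c0t1d0 is an assumption for this box):

# take the failed disk offline so it can be pulled
cfgadm -c unconfigure c0::dsk/c0t1d0
# (swap the physical drive)
# bring the new disk online and rebuild the /dev links
cfgadm -c configure c0::dsk/c0t1d0
devfsadm
# record the new disk's device ID in the metadevice state database
metadevadm -u c0t1d0

Once the drive was replaced, I attempted to run metareplace to re-synchronize the two sub-mirrors: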

$ metareplace -e d0 c0t1d0s0
d0: device c0t1d0s0 is enabled

$ metastat d0

d0: Mirror
    Submirror 0: d10
      State: Okay
    Submirror 1: d20
      State: Unavailable
    Pass: 1
    Read option: roundrobin (default)
    Write option: parallel (default)
    Size: 8391880 blocks (4.0 GB)

d10: Submirror of d0
    State: Okay
    Size: 8391880 blocks (4.0 GB)
    Stripe 0:
        Device     Start Block  Dbase        State Reloc Hot Spare
        c0t0d0s0          0     No            Okay   Yes


d20: Submirror of d0
    State: Unavailable
    Size: 8391880 blocks (4.0 GB)
    Stripe 0:
        Device     Start Block  Dbase        State Reloc Hot Spare
        c0t1d0s0          0     No               -   Yes

Eh? For some reason d20 refused to re-synchronize and enter the Okay state, and repeated attempts to use metareplace led to the same behavior. This seemed odd, so I decided to detach d20 and re-attach it with metadetach and metattach:

$ metadetach d0 d20
d0: submirror d20 is detached

$ metattach d0 d20
d0: submirror d20 is attached

These operations completed successfully, and once the re-synchronization completed, the sub-mirror entered the “Okay” state:

$ metastat d0

d0: Mirror
    Submirror 0: d10
      State: Okay
    Submirror 1: d20
      State: Okay
    Pass: 1
    Read option: roundrobin (default)
    Write option: parallel (default)
    Size: 8391880 blocks (4.0 GB)

d10: Submirror of d0
    State: Okay
    Size: 8391880 blocks (4.0 GB)
    Stripe 0:
        Device     Start Block  Dbase        State Reloc Hot Spare
        c0t0d0s0          0     No            Okay   Yes


d20: Submirror of d0
    State: Okay
    Size: 8391880 blocks (4.0 GB)
    Stripe 0:
        Device     Start Block  Dbase        State Reloc Hot Spare
        c0t1d0s0          0     No            Okay   Yes

I'm starting to suspect that this is a bug in metareplace, but I wasn't able to pinpoint anything specific on SunSolve. The moral of the story: use metattach/metadetach if you don't want to waste a lot of time. :)

8 Comments

John  on January 20th, 2006

I just experienced the exact same problem with:
uname -a
SunOS unknown 5.9 Generic_112233-12 sun4u sparc SUNW,Ultra-4

so thank you for publishing this, I’ve been trying to solve this stupid problem for 3 days now!

Sakiko  on February 15th, 2006

Too great!!! Thank you sooo much!!!
It seems this does not happen on Solaris 8…

Eduardo Malaquias  on December 13th, 2006

It happens to me too… looks like Solaris 9 SVM still has some bugs…

SunOS [hostname] 5.9 Generic_117171-17 sun4u sparc SUNW,Ultra-Enterprise

Maybe the problem is a missing SVM patch; does anyone know whether the Solaris 9 Recommended patch cluster includes them?

thanks

rno  on May 10th, 2007

I faced the same issue and used “metastat -i” to update the status of the metadevices and clear the unavailable status.
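For anyone else hitting this, that check would look roughly like the following (a sketch; d0 is assumed to be the affected mirror, and the -i option requires a patch level that provides it):

$ metastat -i d0
$ metastat d0

The first command probes the underlying devices and refreshes their recorded state; the second should then report the submirror as Okay rather than Unavailable.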

threeta  on August 4th, 2008

Same error; metastat -i fixed the Unavailable condition after I had tried detach/attach etc.
SunOS akcux29 5.9 Generic_122300-03 sun4u sparc SUNW,Ultra-60

r/s w/s kr/s kw/s wait actv wsvc_t asvc_t %w %b device
0.0 1.0 0.0 8.2 12.0 0.0 11999.9 7.6 100 1 c0t1d0

Unsure why iostat -zxn 5 gives me this horrendous wait time on the new disk. Write/read speed seems OK.

Morten  on August 12th, 2008

Same error,
SunOS dppg1 5.9 Generic_112233-12 sun4u sparc SUNW,Sun-Fire-V240
metastat -i would properly fix it for me, but after detaching the disk it would not attach again:
# metattach d20 d21
metattach: dppg1: /dev/md/rdsk/d21: No such device or address

# metaclear -r d21
# metainit d21 1 1 c1t0d0s1
# metattach d20 d21
d20: submirror d21 is attached
And now it's syncing.

mats  on November 28th, 2008

Similar weirdness happened to me. I'm posting my solution here in the hope that it helps someone out there:

****************************************************

Procedure to recover from a faulted root disk (dinar, 2008-09-10)

Solaris 9

metastat and metadb show these problems:

metastat:

d2: Mirror
    Submirror 0: d12
      State: Okay
    Submirror 1: d22
      State: Needs maintenance
    Pass: 1
    Read option: roundrobin (default)
    Write option: parallel (default)
    Size: 28404648 blocks (13 GB)

d12: Submirror of d2
    State: Okay
    Size: 28404648 blocks (13 GB)
    Stripe 0:
        Device     Start Block  Dbase        State Reloc Hot Spare
        c1t0d0s3          0     No            Okay   Yes

d22: Submirror of d2
    State: Unavailable
    Size: 28404648 blocks (13 GB)
    Stripe 0:
        Device     Start Block  Dbase        State Reloc Hot Spare
        c1t1d0s3          0     No               -   Yes

d1: Mirror
    Submirror 0: d11
      State: Okay
    Submirror 1: d21
      State: Needs maintenance
    Pass: 1
    Read option: roundrobin (default)
    Write option: parallel (default)
    Size: 4194828 blocks (2.0 GB)

d11: Submirror of d1
    State: Okay
    Size: 4194828 blocks (2.0 GB)
    Stripe 0:
        Device     Start Block  Dbase        State Reloc Hot Spare
        c1t0d0s1          0     No            Okay   Yes

d21: Submirror of d1
    State: Unavailable
    Size: 4194828 blocks (2.0 GB)
    Stripe 0:
        Device     Start Block  Dbase        State Reloc Hot Spare
        c1t1d0s1          0     No               -   Yes

d0: Mirror
    Submirror 0: d10
      State: Okay
    Submirror 1: d20
      State: Needs maintenance
    Pass: 1
    Read option: roundrobin (default)
    Write option: parallel (default)
    Size: 25166079 blocks (12 GB)

d10: Submirror of d0
    State: Okay
    Size: 25166079 blocks (12 GB)
    Stripe 0:
        Device     Start Block  Dbase        State Reloc Hot Spare
        c1t0d0s0          0     No            Okay   Yes

d20: Submirror of d0
    State: Unavailable
    Size: 25166079 blocks (12 GB)
    Stripe 0:
        Device     Start Block  Dbase        State Reloc Hot Spare
        c1t1d0s0          0     No               -   Yes

metadb:

metadb -i
        flags           first blk       block count
     a m  p  luo        16              8192            /dev/dsk/c1t0d0s6
     a    p  luo        16              8192            /dev/dsk/c1t0d0s7
      W   p  l          16              8192            /dev/dsk/c1t1d0s6
      W   p  l          16              8192            /dev/dsk/c1t1d0s7
     a    p  luo        16              8192            /dev/dsk/c5t0d0s6
     a    p  luo        16              8192            /dev/dsk/c5t0d0s7
     a    p  luo        16              8192            /dev/dsk/c6t1d0s6
     a    p  luo        16              8192            /dev/dsk/c6t1d0s7

The problem is that one of the two internal disks has faulted. It houses mirror halves of three SVM mirrors (all but informix), as well as a 2 GB partition (whose purpose is unknown to me now) and two SVM metadevice state database replicas.

PROCEDURE TO RECOVER:

1. Run metadb -d c1t1d0s6 and metadb -d c1t1d0s7 to delete the bad metadevice state database replicas on the failed disk.
This leaves 6 of the 8 replicas alive; in theory, half plus one are needed for a reboot to work.

2. Replace the disk physically.

3. Start up the server, become root, and format and partition the new disk exactly like the other mirror half (see the sketch after this list).

4. Run metareplace -e <mirror> <cxtxdxsx> for each slice of the new disk, for example metareplace -e d2 c1t1d0s3.
This should trigger a resync, after which the mirror should function normally again.

5. metareplace appeared to work, but after resyncing to 99% the submirror still did not reach the Okay state; quite confusing.
I did a metadetach d1 d21, then metaclear d21, then metainit d21 1 1 c1t1d0s1, then metattach d1 d21, and this worked. I did the same for d0/d20 and d2/d22. Last, di
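Pulled together for one mirror pair, steps 3 through 5 look roughly like this (a sketch using the slice names from the metastat output above; adjust the devices for your own layout):

# step 3: copy the partition table from the surviving disk to the replacement
prtvtoc /dev/rdsk/c1t0d0s2 | fmthard -s - /dev/rdsk/c1t1d0s2
# step 4: attempt an in-place resync of the submirror
metareplace -e d1 c1t1d0s1
# step 5: if the submirror never reaches Okay, rebuild and reattach it
metadetach d1 d21
metaclear d21
metainit d21 1 1 c1t1d0s1
metattach d1 d21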

surajraina  on August 14th, 2009

Thank you for publishing this. It is really helpful; this is a stupid bug in SVM.
