The importance of keeping your storage array firmware up to date


A couple of weeks back I attempted to migrate a pair of clustered Solaris 10 servers to a new disk storage array. After rebooting into single-user mode to pick up the new devices, I went to add the new quorum disk with clquorum. Both nodes promptly panicked with the following panic string:

panic[cpu3]/thread=fffffe800125bc60: Reservation Conflict
Disk: /scsi_vhci/disk@g6000d310002c6700000000000000003e

fffffe800125ba40 fffffffff7959e39 ()
fffffe800125ba70 sd:sd_pkt_status_reservation_conflict+c8 ()
fffffe800125bab0 sd:sdintr+431 ()
fffffe800125bb50 scsi_vhci:vhci_intr+3da ()
fffffe800125bb70 fcp:ssfcp_post_callback+4a ()
fffffe800125bba0 fcp:ssfcp_cmd_callback+4c ()
fffffe800125bc00 qlc:ql_task_thread+756 ()
fffffe800125bc40 qlc:ql_task_daemon+94 ()
fffffe800125bc50 unix:thread_start+8 ()

At first I thought I was doing something wrong, but after a lot of research I found that the version of the storage array firmware we were running had a couple of Solaris-related bugs. One of them was triggering the panic above, and once the array was patched everything worked as expected. Keeping up to date with firmware is just as important as keeping up to date with OS patches. It’s amazing how many firmware bugs there are, and they bite you in the oddest ways.
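
When searching vendor release notes or bug databases for a panic like the one above, the useful search terms are usually the panic message, the device path, and the module:function names in the stack. Here is a minimal sketch of pulling those out of the panic text. It assumes the text has been saved into a string in the same layout as shown above; the function name summarize_panic and the regular expressions are my own illustration, not part of any Solaris tool.

```python
import re

# Panic text copied from the trace above.
PANIC_TEXT = """\
panic[cpu3]/thread=fffffe800125bc60: Reservation Conflict
Disk: /scsi_vhci/disk@g6000d310002c6700000000000000003e

fffffe800125ba40 fffffffff7959e39 ()
fffffe800125ba70 sd:sd_pkt_status_reservation_conflict+c8 ()
fffffe800125bab0 sd:sdintr+431 ()
fffffe800125bb50 scsi_vhci:vhci_intr+3da ()
fffffe800125bb70 fcp:ssfcp_post_callback+4a ()
fffffe800125bba0 fcp:ssfcp_cmd_callback+4c ()
fffffe800125bc00 qlc:ql_task_thread+756 ()
fffffe800125bc40 qlc:ql_task_daemon+94 ()
fffffe800125bc50 unix:thread_start+8 ()
"""


def summarize_panic(text):
    """Extract the panic message, disk path, and (module, function) frames.

    Hypothetical helper: the patterns only cover the layout shown above.
    """
    message = re.search(r"^panic\[cpu\d+\]/thread=[0-9a-f]+: (.+)$", text, re.M)
    disk = re.search(r"^Disk: (\S+)$", text, re.M)
    # Frames without a module:function symbol (raw addresses) are skipped.
    frames = re.findall(r"^[0-9a-f]+ (\w+):(\w+)\+[0-9a-f]+ \(\)$", text, re.M)
    return {
        "message": message.group(1).strip() if message else None,
        "disk": disk.group(1) if disk else None,
        "frames": frames,
    }


if __name__ == "__main__":
    info = summarize_panic(PANIC_TEXT)
    print("message:", info["message"])
    print("disk:   ", info["disk"])
    for module, function in info["frames"]:
        print("frame:  ", module, function)
```

The first symbolic frame (sd:sd_pkt_status_reservation_conflict) together with the panic message is what I would search for first, since it names the driver routine that hit the reservation conflict.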

This article was posted by Matty on 2012-01-24 09:02:00 -0400