Disable Hardware on SPARC Platforms from the OBP


You can disable hardware directly from the OBP (OpenBoot PROM) with the “asr” (Automatic System Recovery) commands.  If a production-critical machine won’t boot because of a failed component, you can disable that component from the OBP and bring the machine back up (although crippled) to minimize your production downtime.

Rebooting with command: boot
Boot device: /pci@1e,600000/pci@0/pci@2/scsi@0/disk@0,0  File and args: -rsv
Loading ufs-file-system package 1.4 04 Aug 1995 13:02:54.
FCode UFS Reader 1.12 00/07/17 15:48:16.
Loading: /platform/SUNW,Sun-Fire-V445/ufsboot
Loading: /platform/sun4u/ufsboot
ERROR: Last Trap: Corrected ECC Error

{3} ok

YIKES!@#$!  We have a memory failure.

The OBP keyword “sifting” will search through all of the commands the OBP knows for a particular string.  So to search for all of the commands that contain asr:

{3} ok sifting asr
In vocabulary  srassembler
(f001d858) rdasr        (f001d550) wrasr        (f001d53c) rdasr
In vocabulary  forth
(f008ee08) asr-list-keys        (f008ed2c) asr-enable
(f008ebd8) asr-disable          (f008d22c) .asr         (f008cb50) asr-clear
(f0052240) asr-policies

So the main commands here are asr-list-keys (show what we can disable), .asr (show what we already have disabled), asr-enable, asr-disable, and asr-clear.

{3} ok asr-list-keys

key = net2&3                /pci@1f,700000/pci@0/pci@2/pci@0/@4
key = net0&1                /pci@1e,600000/pci@0/pci@1/pci@0/@4
key = ide                   /pci@1f,700000/pci@0/pci@1/pci@0/@1f
key = usb                   /pci@1f,700000/pci@0/pci@1/pci@0/@1c
key = pci7                  /pci@1f,700000/pci@0/@9
key = pci6                  /pci@1e,600000/pci@0/@9
key = pci5                  /pci@1f,700000/pci@0/pci@2/pci@0/@8
key = pci4                  /pci@1f,700000/pci@0/pci@2/pci@0/@8
key = pci3                  /pci@1e,600000/pci@0/pci@1/pci@0/@8
key = pci2                  /pci@1e,600000/pci@0/pci@1/pci@0/@8
key = pci1                  /pci@1f,700000/pci@0/@8
key = pci0                  /pci@1e,600000/pci@0/@8
key = cpu3-bank3
key = cpu3-bank2
key = cpu3-bank1
key = cpu3-bank0
key = cpu2-bank3
key = cpu2-bank2
key = cpu2-bank1
key = cpu2-bank0
key = cpu1-bank3
key = cpu1-bank2
key = cpu1-bank1
key = cpu1-bank0
key = cpu0-bank3
key = cpu0-bank2
key = cpu0-bank1
key = cpu0-bank0
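If you capture the console session to a file, a few lines of Python can pull the memory bank keys out of the listing.  This is only a rough sketch: parse_asr_keys and memory_bank_keys are made-up helper names, and it assumes the captured text has one "key = ..." entry per line and that memory banks are the keys named like cpuN-bankM, as in the listing above.

```python
# Hypothetical helpers for a captured `asr-list-keys` transcript.
# Assumes one "key = NAME [DEVICE-PATH]" entry per line.

def parse_asr_keys(text):
    """Return a dict mapping each key name to its device path (or None)."""
    keys = {}
    for line in text.splitlines():
        line = line.strip()
        if not line.startswith("key ="):
            continue  # skip prompts, blank lines, anything else
        fields = line[len("key ="):].split()
        if not fields:
            continue
        name = fields[0]
        # Memory bank keys have no device path after the name.
        path = fields[1] if len(fields) > 1 else None
        keys[name] = path
    return keys


def memory_bank_keys(keys):
    """Pick out the keys that look like memory banks (cpuN-bankM)."""
    return sorted(k for k in keys if k.startswith("cpu") and "-bank" in k)


sample = """\
key = net0&1                /pci@1e,600000/pci@0/pci@1/pci@0/@4
key = pci0                  /pci@1e,600000/pci@0/@8
key = cpu3-bank1
key = cpu3-bank0
"""

parsed = parse_asr_keys(sample)
print(memory_bank_keys(parsed))
```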

Since this is an ECC memory error, the fault is in one of the memory banks listed above (the cpuN-bankM keys).  By disabling the banks belonging to one CPU at a time and retrying the boot, we can find the failed memory by trial and error.
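To keep the trial and error organized, it can help to write out the command list for each attempt ahead of time.  The sketch below is a hypothetical planning aid, not something the OBP provides: trial_plan is a made-up name, it assumes four banks per CPU (as on this V445), and it assumes that asr-enable with the same key is how a failed attempt gets undone before trying the next CPU.

```python
# Hypothetical planner: builds the OBP commands to type for each trial.
# Key names follow the cpuN-bankM pattern shown by asr-list-keys.

def bank_keys(cpu, banks_per_cpu=4):
    """ASR key names for every memory bank on one CPU."""
    return [f"cpu{cpu}-bank{bank}" for bank in range(banks_per_cpu)]


def trial_plan(cpus, banks_per_cpu=4):
    """One entry per CPU: commands to disable its banks, and to undo that."""
    plan = []
    for cpu in cpus:
        keys = bank_keys(cpu, banks_per_cpu)
        plan.append({
            "cpu": cpu,
            "disable": [f"asr-disable {key}" for key in keys],
            # Assumption: asr-enable KEY reverses asr-disable KEY.
            "undo": [f"asr-enable {key}" for key in keys],
        })
    return plan


for step in trial_plan(range(4)):
    print(f"Trial for CPU {step['cpu']}:")
    for command in step["disable"]:
        print(f"  {command}")
```

Each trial is then: type the disable commands, check with .asr, and boot.  If the ECC error comes back, run the undo commands and move on to the next CPU.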

{3} ok .asr
There are no devices disabled by ASR.

Disabling the banks on CPU0 through CPU2 kept hitting the ECC memory error.  Let’s disable the banks on CPU3.

{3} ok asr-disable cpu3-bank0
{3} ok asr-disable cpu3-bank1
{3} ok asr-disable cpu3-bank2
{3} ok asr-disable cpu3-bank3

{3} ok .asr
cpu3-bank3              Disabled by USER No reason given
cpu3-bank2              Disabled by USER No reason given
cpu3-bank1              Disabled by USER No reason given
cpu3-bank0              Disabled by USER No reason given
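Before booting, it is worth double-checking that .asr lists exactly the keys you meant to disable.  Here is a small sketch of that check against a captured transcript; disabled_keys is a made-up helper, and it assumes each disabled device appears on its own line in the "KEY Disabled by ..." form shown above.

```python
# Hypothetical check of a captured `.asr` transcript.
# Assumes one "KEY  Disabled by WHO REASON" entry per line.

def disabled_keys(asr_text):
    """Return the key names that .asr reports as disabled."""
    keys = []
    for line in asr_text.splitlines():
        parts = line.split()
        # A device line has the key first, then the word "Disabled".
        if len(parts) >= 2 and parts[1] == "Disabled":
            keys.append(parts[0])
    return keys


sample = """\
{3} ok .asr
cpu3-bank3              Disabled by USER No reason given
cpu3-bank2              Disabled by USER No reason given
cpu3-bank1              Disabled by USER No reason given
cpu3-bank0              Disabled by USER No reason given
"""

wanted = {f"cpu3-bank{bank}" for bank in range(4)}
found = set(disabled_keys(sample))
print("missing:", sorted(wanted - found))
print("unexpected:", sorted(found - wanted))
```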

And let’s boot the machine:

Sun Fire V445, No Keyboard
Copyright 2006 Sun Microsystems, Inc.  All rights reserved.
OpenBoot 4.22.19, 24576 MB memory installed, Serial xxxxxxxxx
Ethernet address 0:14:4f:xx:xx:xx, Host ID: xxxxxxx

NOTICE: CPU 3 has 8192/8192 MB of memory disabled

ERROR: The following devices are disabled:
cpu3-bank3
cpu3-bank2
cpu3-bank1
cpu3-bank0

Thanks for telling me!

Rebooting with command: boot -rsv
Boot device: /pci@1e,600000/pci@0/pci@2/scsi@0/disk@0,0  File and args: -rsv
Loading ufs-file-system package 1.4 04 Aug 1995 13:02:54.
FCode UFS Reader 1.12 00/07/17 15:48:16.
Loading: /platform/SUNW,Sun-Fire-V445/ufsboot
Loading: /platform/sun4u/ufsboot
module /platform/sun4u/kernel/sparcv9/unix: text at [0x1000000, 0x107a767] data at 0x1800000
module misc/sparcv9/krtld: text at [0x107a768, 0x10933af] data at 0x184c760
module /platform/sun4u/kernel/sparcv9/genunix: text at [0x10933b0, 0x11f0f17] data at 0x1852040
module /platform/SUNW,Sun-Fire-V445/kernel/misc/sparcv9/platmod: text at [0x11f0f18, 0x11f1817] data at 0x18a45e0
module /platform/sun4u/kernel/cpu/sparcv9/SUNW,UltraSPARC-IIIi: text at [0x11f1880, 0x120278f] data at 0x18a4e80
SunOS Release 5.10 Version Generic_118833-33 64-bit
Copyright 1983-2006 Sun Microsystems, Inc.  All rights reserved.
Use is subject to license terms.
Ethernet address = 0:14:4f:2b:ea:aa
mem = 25165824K (0x600000000)
avail mem = 25226371072
root nexus = Sun Fire V445

YAY!  Our gimpy machine is going back into production minus 8 GB of memory.  There will be a performance impact from running on fewer system resources, but something is better than nothing, right?

This article was posted by Matty on 2008-07-25 09:12:00 -0400