Disable Hardware on SPARC Platforms from the OBP

You can disable hardware directly from the OBP (OpenBoot PROM) with the “asr” (Automatic System Recovery) commands.  If a production-critical machine won’t boot because of a failed component, you can disable that component from the OBP and bring the machine back up (although crippled) to minimize production downtime.

Rebooting with command: boot
Boot device: /pci@1e,600000/pci@0/pci@2/scsi@0/disk@0,0  File and args: -rsv
Loading ufs-file-system package 1.4 04 Aug 1995 13:02:54.
FCode UFS Reader 1.12 00/07/17 15:48:16.
Loading: /platform/SUNW,Sun-Fire-V445/ufsboot
Loading: /platform/sun4u/ufsboot
ERROR: Last Trap: Corrected ECC Error

{3} ok

YIKES!@#$!  We have a memory failure.

The OBP keyword “sifting” will search through all of the commands the OBP knows for a particular string.  So to search for all of the commands that contain asr:

{3} ok sifting asr
In vocabulary  srassembler
(f001d858) rdasr        (f001d550) wrasr        (f001d53c) rdasr
In vocabulary  forth
(f008ee08) asr-list-keys        (f008ed2c) asr-enable
(f008ebd8) asr-disable          (f008d22c) .asr         (f008cb50) asr-clear
(f0052240) asr-policies

So the main commands here are asr-list-keys (show what we can disable), .asr (show what we already have disabled), asr-enable, asr-disable, and asr-clear.

{3} ok asr-list-keys

key = net2&3                /pci@1f,700000/pci@0/pci@2/pci@0/@4
key = net0&1                /pci@1e,600000/pci@0/pci@1/pci@0/@4
key = ide                   /pci@1f,700000/pci@0/pci@1/pci@0/@1f
key = usb                   /pci@1f,700000/pci@0/pci@1/pci@0/@1c
key = pci7                  /pci@1f,700000/pci@0/@9
key = pci6                  /pci@1e,600000/pci@0/@9
key = pci5                  /pci@1f,700000/pci@0/pci@2/pci@0/@8
key = pci4                  /pci@1f,700000/pci@0/pci@2/pci@0/@8
key = pci3                  /pci@1e,600000/pci@0/pci@1/pci@0/@8
key = pci2                  /pci@1e,600000/pci@0/pci@1/pci@0/@8
key = pci1                  /pci@1f,700000/pci@0/@8
key = pci0                  /pci@1e,600000/pci@0/@8
key = cpu3-bank3
key = cpu3-bank2
key = cpu3-bank1
key = cpu3-bank0
key = cpu2-bank3
key = cpu2-bank2
key = cpu2-bank1
key = cpu2-bank0
key = cpu1-bank3
key = cpu1-bank2
key = cpu1-bank1
key = cpu1-bank0
key = cpu0-bank3
key = cpu0-bank2
key = cpu0-bank1
key = cpu0-bank0

Since we have an ECC memory error, we know the fault is in one of the memory banks listed above.  By disabling the memory banks one CPU at a time and retrying the boot, we can find the failed memory by trial and error.
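
The per-CPU pass can be sketched as a small helper that groups the memory-bank keys by CPU and prints the asr-disable lines to type at the ok prompt. This is only an illustration in Python, meant to be run on some other machine; the function name `disable_commands` and the hard-coded key list are assumptions for the example, not anything the OBP provides.

```python
import re
from collections import defaultdict

# Memory-bank keys as they appear in the asr-list-keys output above
# (cpu0-bank0 through cpu3-bank3).
KEYS = [f"cpu{c}-bank{b}" for c in range(4) for b in range(4)]

def disable_commands(keys, cpu):
    """Return the asr-disable lines for every memory bank on one CPU."""
    banks = defaultdict(list)
    for key in keys:
        m = re.fullmatch(r"cpu(\d+)-bank(\d+)", key)
        if m:  # ignore non-memory keys such as net0&1 or pci3
            banks[int(m.group(1))].append(key)
    return [f"asr-disable {key}" for key in sorted(banks[cpu])]

# Print the commands for CPU3, one per line.
for line in disable_commands(KEYS, 3):
    print(line)
```

Changing the second argument gives the lines for another CPU, which is handy when repeating the trial for each one in turn.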

{3} ok .asr
There are no devices disabled by ASR.

Disabling the banks on CPU0 through CPU2 still hit the ECC memory error.  Let’s disable the banks on CPU3.

{3} ok asr-disable cpu3-bank0
{3} ok asr-disable cpu3-bank1
{3} ok asr-disable cpu3-bank2
{3} ok asr-disable cpu3-bank3

{3} ok .asr
cpu3-bank3              Disabled by USER
No reason given
cpu3-bank2              Disabled by USER
No reason given
cpu3-bank1              Disabled by USER
No reason given
cpu3-bank0              Disabled by USER
No reason given
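
If the console session is being captured to a log, a listing like the one above is easy to check mechanically. The sketch below is a hypothetical Python helper (the name `disabled_keys` is made up for this example) that assumes the text has the same shape as the .asr listing shown here: a key followed by "Disabled by" and who disabled it, then a reason line.

```python
def disabled_keys(asr_output):
    """Return (key, who) pairs from text shaped like the .asr listing."""
    pairs = []
    for line in asr_output.splitlines():
        if "Disabled by" in line:
            key, who = line.split("Disabled by")
            pairs.append((key.strip(), who.strip()))
    return pairs

# Two entries copied from the listing above.
SAMPLE = """\
cpu3-bank3              Disabled by USER
No reason given
cpu3-bank2              Disabled by USER
No reason given
"""

print(disabled_keys(SAMPLE))
```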

And let’s boot the machine:

Sun Fire V445, No Keyboard
Copyright 2006 Sun Microsystems, Inc.  All rights reserved.
OpenBoot 4.22.19, 24576 MB memory installed, Serial xxxxxxxxx
Ethernet address 0:14:4f:xx:xx:xx, Host ID: xxxxxxx

NOTICE: CPU 3 has 8192/8192 MB of memory disabled

ERROR: The following devices are disabled:

Thanks for telling me!

Rebooting with command: boot -rsv
Boot device: /pci@1e,600000/pci@0/pci@2/scsi@0/disk@0,0  File and args: -rsv
Loading ufs-file-system package 1.4 04 Aug 1995 13:02:54.
FCode UFS Reader 1.12 00/07/17 15:48:16.
Loading: /platform/SUNW,Sun-Fire-V445/ufsboot
Loading: /platform/sun4u/ufsboot
module /platform/sun4u/kernel/sparcv9/unix: text at [0x1000000, 0x107a767] data at 0x1800000
module misc/sparcv9/krtld: text at [0x107a768, 0x10933af] data at 0x184c760
module /platform/sun4u/kernel/sparcv9/genunix: text at [0x10933b0, 0x11f0f17] data at 0x1852040
module /platform/SUNW,Sun-Fire-V445/kernel/misc/sparcv9/platmod: text at [0x11f0f18, 0x11f1817] data at 0x18a45e0
module /platform/sun4u/kernel/cpu/sparcv9/SUNW,UltraSPARC-IIIi: text at [0x11f1880, 0x120278f] data at 0x18a4e80
SunOS Release 5.10 Version Generic_118833-33 64-bit
Copyright 1983-2006 Sun Microsystems, Inc.  All rights reserved.
Use is subject to license terms.
Ethernet address = 0:14:4f:2b:ea:aa
mem = 25165824K (0x600000000)
avail mem = 25226371072
root nexus = Sun Fire V445

YAY!  Our gimpy machine is going back into production minus 8 GB of memory.  There will be a performance impact from running on fewer system resources, but something is better than nothing, right?
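
The memory figures in the two banners can be cross-checked with a little arithmetic. The Python sketch below assumes the "K" in the Solaris mem line means 1024 bytes, which the hex value printed beside it bears out; the variable names are just for illustration.

```python
KIB = 1024

mem_kib = 25165824        # "mem = 25165824K" from the Solaris boot messages
mem_bytes = 0x600000000   # the hex value printed beside it
disabled_mb = 8192        # "CPU 3 has 8192/8192 MB of memory disabled"

mem_mb = mem_kib // KIB
print(mem_mb)                      # 24576, same as the OpenBoot banner
print(mem_bytes == mem_kib * KIB)  # True
print(disabled_mb // KIB)          # 8 (GB disabled on CPU 3)
```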

3 thoughts on “Disable Hardware on SPARC Platforms from the OBP”

  1. Hi, I’m an engineer from China. I am very excited; your blog is very helpful for me.
    I really want to make friends with you. Maybe my English is poor, but I don’t think it can get in the way of our communication.
    Could you help me?
    This is my MSN: xidian2002@hotmail.com

  2. From your post, I realise you know the Solaris OS pretty well. I am having trouble installing Linux on a SPARC Ultra 10: when I try to boot from a floppy or the secondary slave disk, it does not boot, with the error: bad magic number in label disk. I found it can be corrected using the Solaris format tool. I also have the Solaris 8 installation CD; is there a way to boot from that CD and run the format tool? I appreciate your help. Thanks, beautiful blog.

  3. I love your site! It comes up often when I’m having a bad day and need to find quick and dirty fixes, fast!

    One thing I found out today about the ASR commands is that if some hardware is causing faults, the disable commands aren’t truly turning off a PCI slot; they just prevent the OS from loading drivers for it. I had a system with a critical hardware fault and this didn’t fix it: it helped prevent the errors in Solaris, but the fault would still cause kernel panics.

    Keep up the great work!
