Disable Hardware on SPARC Platforms from the OBP

You can disable hardware directly from the OBP (OpenBoot PROM) with the “asr” commands.  If a production-critical machine won’t boot because of a failed component, you can disable that component from the OBP and bring the machine back up, crippled but running, to minimize production downtime.

Rebooting with command: boot
Boot device: /pci@1e,600000/pci@0/pci@2/scsi@0/disk@0,0  File and args: -rsv
Loading ufs-file-system package 1.4 04 Aug 1995 13:02:54.
FCode UFS Reader 1.12 00/07/17 15:48:16.
Loading: /platform/SUNW,Sun-Fire-V445/ufsboot
Loading: /platform/sun4u/ufsboot
ERROR: Last Trap: Corrected ECC Error

{3} ok

YIKES!@#$!  We have a memory failure.

The OBP keyword “sifting” will search through all of the commands the OBP knows for a particular string.  So to search for all of the commands that contain asr:

{3} ok sifting asr
In vocabulary  srassembler
(f001d858) rdasr        (f001d550) wrasr        (f001d53c) rdasr
In vocabulary  forth
(f008ee08) asr-list-keys        (f008ed2c) asr-enable
(f008ebd8) asr-disable          (f008d22c) .asr         (f008cb50) asr-clear
(f0052240) asr-policies

So the main commands here are asr-list-keys (shows what we can disable), .asr (shows what is already disabled), asr-enable, asr-disable, and asr-clear.

{3} ok asr-list-keys

key = net2&3                /pci@1f,700000/pci@0/pci@2/pci@0/@4
key = net0&1                /pci@1e,600000/pci@0/pci@1/pci@0/@4
key = ide                   /pci@1f,700000/pci@0/pci@1/pci@0/@1f
key = usb                   /pci@1f,700000/pci@0/pci@1/pci@0/@1c
key = pci7                  /pci@1f,700000/pci@0/@9
key = pci6                  /pci@1e,600000/pci@0/@9
key = pci5                  /pci@1f,700000/pci@0/pci@2/pci@0/@8
key = pci4                  /pci@1f,700000/pci@0/pci@2/pci@0/@8
key = pci3                  /pci@1e,600000/pci@0/pci@1/pci@0/@8
key = pci2                  /pci@1e,600000/pci@0/pci@1/pci@0/@8
key = pci1                  /pci@1f,700000/pci@0/@8
key = pci0                  /pci@1e,600000/pci@0/@8
key = cpu3-bank3
key = cpu3-bank2
key = cpu3-bank1
key = cpu3-bank0
key = cpu2-bank3
key = cpu2-bank2
key = cpu2-bank1
key = cpu2-bank0
key = cpu1-bank3
key = cpu1-bank2
key = cpu1-bank1
key = cpu1-bank0
key = cpu0-bank3
key = cpu0-bank2
key = cpu0-bank1
key = cpu0-bank0

Since we have an ECC memory error, we know the fault is in one of the memory banks listed above.  By disabling the banks one CPU at a time, we can find the failed memory by trial and error.

{3} ok .asr
There are no devices disabled by ASR.

Disabling the banks on cpu0 through cpu2 still hit the ECC memory error.  Let’s disable the banks on CPU3.

{3} ok asr-disable cpu3-bank0
{3} ok asr-disable cpu3-bank1
{3} ok asr-disable cpu3-bank2
{3} ok asr-disable cpu3-bank3

{3} ok .asr
cpu3-bank3              Disabled by USER
No reason given
cpu3-bank2              Disabled by USER
No reason given
cpu3-bank1              Disabled by USER
No reason given
cpu3-bank0              Disabled by USER
No reason given
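
The asr-disable commands above follow one pattern per bank, so when you are working through several CPUs it can help to generate the lines and paste them into the console.  Here is a minimal bash sketch to run on any workstation.  asr_bank_cmds is a hypothetical helper name, and the default of four banks per CPU is an assumption taken from the asr-list-keys output above.  It only prints text and never talks to the OBP.

```bash
#!/usr/bin/env bash
# asr_bank_cmds (hypothetical helper): print one OBP command per memory bank.
#   $1 = OBP verb (asr-disable or asr-enable)
#   $2 = CPU number
#   $3 = banks per CPU (assumed to be 4, as in the asr-list-keys output)
asr_bank_cmds() {
    local verb=$1 cpu=$2 banks=${3:-4}
    local bank
    for ((bank = 0; bank < banks; bank++)); do
        printf '%s cpu%d-bank%d\n' "$verb" "$cpu" "$bank"
    done
}

# Print the commands for every bank on CPU 3.
asr_bank_cmds asr-disable 3
```

Passing asr-enable instead of asr-disable prints the commands that put the banks back once the failed memory has been replaced.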

And let’s boot the machine.

Sun Fire V445, No Keyboard
Copyright 2006 Sun Microsystems, Inc.  All rights reserved.
OpenBoot 4.22.19, 24576 MB memory installed, Serial xxxxxxxxx
Ethernet address 0:14:4f:xx:xx:xx, Host ID: xxxxxxx

NOTICE: CPU 3 has 8192/8192 MB of memory disabled

ERROR: The following devices are disabled:
cpu3-bank3
cpu3-bank2
cpu3-bank1
cpu3-bank0

Thanks for telling me!

Rebooting with command: boot -rsv
Boot device: /pci@1e,600000/pci@0/pci@2/scsi@0/disk@0,0  File and args: -rsv
Loading ufs-file-system package 1.4 04 Aug 1995 13:02:54.
FCode UFS Reader 1.12 00/07/17 15:48:16.
Loading: /platform/SUNW,Sun-Fire-V445/ufsboot
Loading: /platform/sun4u/ufsboot
module /platform/sun4u/kernel/sparcv9/unix: text at [0x1000000, 0x107a767] data at 0x1800000
module misc/sparcv9/krtld: text at [0x107a768, 0x10933af] data at 0x184c760
module /platform/sun4u/kernel/sparcv9/genunix: text at [0x10933b0, 0x11f0f17] data at 0x1852040
module /platform/SUNW,Sun-Fire-V445/kernel/misc/sparcv9/platmod: text at [0x11f0f18, 0x11f1817] data at 0x18a45e0
module /platform/sun4u/kernel/cpu/sparcv9/SUNW,UltraSPARC-IIIi: text at [0x11f1880, 0x120278f] data at 0x18a4e80
SunOS Release 5.10 Version Generic_118833-33 64-bit
Copyright 1983-2006 Sun Microsystems, Inc.  All rights reserved.
Use is subject to license terms.
Ethernet address = 0:14:4f:2b:ea:aa
mem = 25165824K (0x600000000)
avail mem = 25226371072
root nexus = Sun Fire V445

YAY!  Our gimpy machine is going back into production minus 8 GB of memory.  There will be a performance impact from running on fewer system resources, but something is better than nothing, right?

x86 / linux boot process

There is quite a bit of documentation around the internet on the Linux boot process, but I think Gustavo Duarte did an excellent job of describing it clearly and concisely.  He links to the Linux kernel source code and describes what happens step by step, from the bootstrap phase all the way to the execution of /sbin/init.

His first entry lays the foundation: the x86 Intel chipset, the memory map, and the logical motherboard layout.  This provides a basic understanding of traditional motherboard hardware implementations.

Next, he describes BIOS initialization, and loading of the MBR.  This briefly touches on the boot loader which starts the Linux bootstrap phase.

Finally, the kernel boot process is detailed with links to C and Assembly source code, with a brief narrative of exactly what is happening.

This was an awesome description of the early startup and initialization phases of the hardware and the bootstrapping of the OS.  Gustavo provides a great description of real-mode and protected-mode CPU states.

Thanks Gustavo!

Viewing the changes that have occurred to an RPM package

I recently encountered a bug in one of the Linux utilities I was using, and upgrading to the latest version of the utility appeared to fix the issue. Being the curious guy I am, I started poking around the web and various release notes to see when the issue was fixed. While digging through this information, I came across the SUPER handy yum changelog plugin. This nifty plugin will display the changes that have occurred to a package, along with the version those changes were incorporated into. To use the changelog plugin, you first need to install it:

$ yum install yum-changelog

After the plugin is installed, you can add the “--changelog” argument to the yum command line to view the changelog for that package:

$ yum update kernel --changelog
Loading "installonlyn" plugin
Loading "changelog" plugin
Setting up Update Process
Setting up repositories
other.xml.gz 100% |=========================| 1.1 MB 00:08
################################################## 361/361
other.xml.gz 100% |=========================| 5.3 MB 00:42
################################################## 462/462
other.xml.gz 100% |=========================| 7.1 MB 00:15
################################################## 2400/2400
other.xml.gz 100% |=========================| 145 B 00:00
Reading repository metadata in from local files
Resolving Dependencies
--> Populating transaction set with selected packages. Please wait.
---> Downloading header for kernel to pack into transaction set.
kernel-2.6.18-53.1.14.el5 100% |=========================| 258 kB 00:00
---> Package kernel.i686 0:2.6.18-53.1.14.el5 set to be installed
--> Running transaction check

Changes in packages about to be updated:

kernel - 2.6.18-53.1.14.el5.i686
* Wed Mar 5 17:00:00 2008 Karanbir Singh
- Change gpg key to CentOS

* Tue Feb 19 17:00:00 2008 Anton Arapov [2.6.18-53.1.14.el5]
- merge from 2.6.18-53.1.13 to 2.6.18-53.1.12
- [nfs] potential file corruption issue when writing (Jeff Layton ) [432078]
- [ppc] chrp: fix possible strncmp NULL pointer usage (Vitaly Mayatskikh ) [396821]
- [isdn] i4l: fix memory overruns (Vitaly Mayatskikh ) [425171]
- [isdn] fix possible isdn_net buffer overflows (Aristeu Rozanski ) [392151] {CVE-2007-6063}
- [mm] hugepages: leak due to pagetable page sharing (Larry Woodman ) [431522]
- [net] NULL dereference in iwl driver (Vitaly Mayatskikh ) [401421] {CVE-2007-5938}
- [misc] Denial of service with wedged processes (Jerome Marchand ) [221403]
- [xen] ia64: hvm guest memory range checking (Jarod Wilson ) [408701]

….

This is an incredibly useful feature, especially if you are trying to track down when a specific bug was fixed by a given Linux distribution. Rock on!
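
To go from a changelog to the answer “which release fixed bug X”, you can filter the text for the bug or CVE number and print the release header it sits under.  Here is a rough sketch.  find_fix is a hypothetical helper name, and the sample text is copied from the output above.  It assumes the usual RPM changelog layout, in which each release starts with a line beginning with an asterisk and a space.

```bash
#!/usr/bin/env bash
# find_fix (hypothetical helper): read an RPM changelog on stdin and print
# every entry that mentions the given bug/CVE id, preceded by its release header.
find_fix() {
    awk -v id="$1" '
        /^\* /        { header = $0; next }       # remember the current release header
        index($0, id) { print header; print $0 }  # entry mentions the id
    '
}

# Sample input taken from the yum output shown earlier.
find_fix CVE-2007-6063 <<'EOF'
* Wed Mar 5 17:00:00 2008 Karanbir Singh
- Change gpg key to CentOS

* Tue Feb 19 17:00:00 2008 Anton Arapov [2.6.18-53.1.14.el5]
- [isdn] i4l: fix memory overruns (Vitaly Mayatskikh ) [425171]
- [isdn] fix possible isdn_net buffer overflows (Aristeu Rozanski ) [392151] {CVE-2007-6063}
EOF
```

On a live system the same filter can be fed from rpm, for example “rpm -q --changelog kernel | find_fix 432078”.  Note that rpm shows the changelog of the installed package, whereas the yum plugin shows the changes in the package about to be installed.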

SCSI Enclosure Services

Eric Schrock has done some really cool work integrating disk (SMART) and platform (IPMI) monitoring information into OpenSolaris.  Just recently, he extended FMA with support for SES (SCSI Enclosure Services) in build 93 of OpenSolaris.

This looks like some really cool stuff.  The following examples, taken directly from his blog, use the new fmtopo utility to map out an external storage array.

# /usr/lib/fm/fmd/fmtopo
...

hc://:product-id=SUN-Storage-J4400:chassis-id=2029QTF0809QCK012:serial=2029QTF0000000002:part=Storage-J4400:revision=3R13/ses-enclosure=0

hc://:product-id=SUN-Storage-J4400:chassis-id=2029QTF0809QCK012:server-id=:part=123-4567-01/ses-enclosure=0/psu=0

hc://:product-id=SUN-Storage-J4400:chassis-id=2029QTF0809QCK012:server-id=:part=123-4567-01/ses-enclosure=0/psu=1

hc://:product-id=SUN-Storage-J4400:chassis-id=2029QTF0809QCK012:server-id=/ses-enclosure=0/fan=0

hc://:product-id=SUN-Storage-J4400:chassis-id=2029QTF0809QCK012:server-id=/ses-enclosure=0/fan=1

hc://:product-id=SUN-Storage-J4400:chassis-id=2029QTF0809QCK012:server-id=/ses-enclosure=0/fan=2

hc://:product-id=SUN-Storage-J4400:chassis-id=2029QTF0809QCK012:server-id=/ses-enclosure=0/fan=3

hc://:product-id=SUN-Storage-J4400:chassis-id=2029QTF0809QCK012:server-id=:serial=2029QTF0811RM0386:part=375-3584-01/ses-enclosure=0/controller=0

hc://:product-id=SUN-Storage-J4400:chassis-id=2029QTF0809QCK012:server-id=:serial=2029QTF0811RM0074:part=375-3584-01/ses-enclosure=0/controller=1

hc://:product-id=SUN-Storage-J4400:chassis-id=2029QTF0809QCK012:server-id=/ses-enclosure=0/bay=0

hc://:product-id=SUN-Storage-J4400:chassis-id=2029QTF0809QCK012:server-id=:serial=5QD0PC3X:part=SEAGATE-ST37500NSSUN750G-0720A0PC3X:revision=3.AZK/ses-enclosure=0/bay=0/disk=0

hc://:product-id=SUN-Storage-J4400:chassis-id=2029QTF0809QCK012:server-id=/ses-enclosure=0/bay=1

...

# fmtopo -V '*/ses-enclosure=0/bay=0/disk=0'
TIME                 UUID
Jul 14 03:54:23 3e95d95f-ce49-4a1b-a8be-b8d94a805ec8

hc://:product-id=SUN-Storage-J4400:chassis-id=2029QTF0809QCK012:server-id=:serial=5QD0PC3X:part=SEAGATE-ST37500NSSUN750G-0720A0PC3X:revision=3.AZK/ses-enclosure=0/bay=0/disk=0
  group: protocol                       version: 1   stability: Private/Private
    resource          fmri      hc://:product-id=SUN-Storage-J4400:chassis-id=2029QTF0809QCK012:server-id=:serial=5QD0PC3X:part=SEAGATE-ST37500NSSUN750G-0720A0PC3X:revision=3.AZK/ses-enclosure=0/bay=0/disk=0
    ASRU              fmri      dev:///:devid=id1,sd@TATA_____SEAGATE_ST37500NSSUN750G_0720A0PC3X_____5QD0PC3X____________//scsi_vhci/disk@gATASEAGATEST37500NSSUN750G0720A0PC3X5QD0PC3X
    label             string    SCSI Device  0
    FRU               fmri      hc://:product-id=SUN-Storage-J4400:chassis-id=2029QTF0809QCK012:server-id=:serial=5QD0PC3X:part=SEAGATE-ST37500NSSUN750G-0720A0PC3X:revision=3.AZK/ses-enclosure=0/bay=0/disk=0
  group: authority                      version: 1   stability: Private/Private
    product-id        string    SUN-Storage-J4400
    chassis-id        string    2029QTF0809QCK012
    server-id         string
  group: io                             version: 1   stability: Private/Private
    devfs-path        string    /scsi_vhci/disk@gATASEAGATEST37500NSSUN750G0720A0PC3X5QD0PC3X
    devid             string    id1,sd@TATA_____SEAGATE_ST37500NSSUN750G_0720A0PC3X_____5QD0PC3X____________
    phys-path         string[]  [ /pci@0,0/pci10de,377@a/pci1000,3150@0/disk@1c,0 /pci@0,0/pci10de,375@f/pci1000,3150@0/disk@1c,0 ]
  group: storage                        version: 1   stability: Private/Private
    logical-disk      string    c0tATASEAGATEST37500NSSUN750G0720A0PC3X5QD0PC3Xd0
    manufacturer      string    SEAGATE
    model             string    ST37500NSSUN750G 0720A0PC3X
    serial-number     string    5QD0PC3X
    firmware-revision string       3.AZK
    capacity-in-bytes string    750156374016

Dennis’ experience with OpenSolaris 2008.05

Dennis Clarke blogged an introduction to OpenSolaris 2008.05 and IPS, and how using ZFS (and beadm) for your root file system helps with system upgrades and multiple root file systems.
Take a look at his blog post here if you haven’t yet seen IPS on OpenSolaris.  A lot of people are really glad to see the Solaris package / patch system being revamped, as it has needed attention for some time.

Speaking of Dennis and opensolaris, if you haven’t ever performed a complete build, he has another post here showing the entire build process of opensolaris.

Thanks Dennis! Your excitement around opensolaris rocks.  And thanks for blastwave.  =)

Bash’s built in commands

If you’re a frequent user of the bash shell, I would suggest taking a peek at the GNU reference guide next time you have a chance.  Bash has a lot of neat built-in commands.  To get an idea of what they are, run the “enable” built-in with no arguments:

$ ps
PID TTY          TIME CMD
15997 pts/2    00:00:00 bash
23625 pts/2    00:00:00 ps

$ which enable
/usr/bin/which: no enable in (/bin:/usr/bin:/usr/sbin:/sbin)

$ enable
enable .
enable :
enable [
enable alias
enable bg
enable bind
enable break
enable builtin
enable caller
enable cd
enable command
enable compgen
enable complete
enable continue
enable declare
enable dirs
enable disown
enable echo
enable enable
enable eval
enable exec
enable exit
enable export
enable false
enable fc
enable fg
enable getopts
enable hash
enable help
enable history
enable jobs
enable kill
enable let
enable local
enable logout
enable popd
enable printf
enable pushd
enable pwd
enable read
enable readonly
enable return
enable set
enable shift
enable shopt
enable source
enable suspend
enable test
enable times
enable trap
enable true
enable type
enable typeset
enable ulimit
enable umask
enable unalias
enable unset
enable wait

Some of these also have binaries under /bin or /usr/bin, while others do not.  When both exist, bash’s built-in version is what actually gets executed unless you call the binary by its full path…

$ which cd
/usr/bin/which: no cd in (/bin:/usr/bin:/usr/sbin:/sbin)
$ which echo
/bin/echo
$ which eval
/usr/bin/which: no eval in (/bin:/usr/bin:/usr/sbin:/sbin)
$ which exit
/usr/bin/which: no exit in (/bin:/usr/bin:/usr/sbin:/sbin)
$ which kill
/bin/kill
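
Because which only searches PATH, it cannot tell you what bash will actually run.  The type built-in can.  Here is a small sketch, with resolve as a hypothetical helper name: “type -t” reports how bash resolves a name, and “type -P” forces a PATH lookup.

```bash
#!/usr/bin/env bash
# resolve (hypothetical helper): show how bash resolves a command name
# next to the file, if any, that a PATH search would find.
resolve() {
    local cmd=$1 kind file
    kind=$(type -t "$cmd")                              # alias, keyword, function, builtin or file
    file=$(type -P "$cmd") || file='(nothing on PATH)'  # type -P fails when no file is found
    printf '%-6s %-8s %s\n' "$cmd" "$kind" "$file"
}

for cmd in cd echo eval exit kill; do
    resolve "$cmd"
done
```

To run the binary instead of the built-in, call it by its full path (/bin/echo), or disable the built-in for the current shell with “enable -n echo” and turn it back on with “enable echo”.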

For some of these it is pretty intuitive why they should be built-in commands (cd, exit, etc.): they have to change the state of the shell itself.  The GNU reference guide describes what all of these do.  There is also a built-in called “help” which describes them without a trip to the GNU reference guide.

$ which help
/usr/bin/which: no help in (/bin:/usr/bin:/usr/sbin:/sbin)
$ help cd
cd: cd [-L|-P] [dir]
Change the current directory to DIR.  The variable $HOME is the
default DIR.  The variable CDPATH defines the search path for
the directory containing DIR.  Alternative directory names in CDPATH
are separated by a colon (:).  A null directory name is the same as
the current directory, i.e. `.'.  If DIR begins with a slash (/),
then CDPATH is not used.  If the directory is not found, and the
shell option `cdable_vars' is set, then try the word as a variable
name.  If that variable has a value, then cd to the value of that
variable.  The -P option says to use the physical directory structure
instead of following symbolic links; the -L option forces symbolic links
to be followed.

I found the “bind” built-in to be super useful.  Bash has shortcuts that let you move the cursor around, search history, etc., and I always find myself forgetting the keyboard shortcut / sequence.  When this happens, I just use the “bind” (not DNS!) built-in to find what I’m looking for…

$ bind -p

This displays all “bindings” of keyboard shortcuts to Readline functions.   Here, I want to see what keyboard shortcut I can use to search through my command history to find a command…

$ bind -p | grep reverse-search-history
"\C-r": reverse-search-history

Ah, so it’s Ctrl-r.  Sure enough, hitting Ctrl-r at a command prompt brings up the reverse search prompt, and I can look through my history for any command…  Searching for “cat” finds the time I looked through /etc/passwd.

$

(reverse-i-search)`cat': cat /etc/passwd

Hitting Ctrl-r again once a command is found continues the search backwards through the history.

(reverse-i-search)`cat': cat /proc/kallsyms | more

Reverse history search is pretty useful, but there are so many other neat features of bash that I find invaluable.  What if I have created some super long command with multiple pipes and I want to move the cursor to the beginning of the line?

$ bind -p | grep beginning-of-line
"\C-a": beginning-of-line
"\eOH": beginning-of-line
"\e[1~": beginning-of-line
"\e[H": beginning-of-line

Well, it looks like there are a few defined shortcuts, but ctrl-a seems to be a winner.  Sure enough..

$ cat /etc/passwd | awk -F: '{print $7}' | sed 's,bin,mike,'[]

$ [c]at /etc/passwd | awk -F: '{print $7}' | sed 's,bin,mike,'
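
The grep used above can be wrapped in a small function that prints only the key sequences.  Here is a sketch, with keys_for as a hypothetical helper name.  It reads “bind -p” style text on stdin so it can be tried on the sample lines shown earlier.  In an interactive shell you would pipe the real thing into it: bind -p | keys_for beginning-of-line.

```bash
#!/usr/bin/env bash
# keys_for (hypothetical helper): read `bind -p` output on stdin and print
# the key sequences bound to the named Readline function, one per line.
keys_for() {
    awk -v fn="$1" '
        /^#/ { next }                      # skip comment lines such as "# name (not bound)"
        {
            sep = index($0, "\": ")        # position of the closing quote of the key sequence
            if (sep == 0) next
            if (substr($0, sep + 3) == fn)
                print substr($0, 2, sep - 2)
        }'
}

# Sample lines copied from the bind -p output above.
keys_for beginning-of-line <<'EOF'
"\C-r": reverse-search-history
"\C-a": beginning-of-line
"\eOH": beginning-of-line
EOF
```

The -q option listed in the bind help text asks Readline the same question directly: bind -q beginning-of-line.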

Take a peek!  You might find yourself saving some keystrokes.

$ help bind
bind: bind [-lpvsPVS] [-m keymap] [-f filename] [-q name] [-u name] [-r keyseq] [-x keyseq:shell-command] [keyseq:readline-function or readline-command]
Bind a key sequence to a Readline function or a macro, or set
a Readline variable.  The non-option argument syntax is equivalent
to that found in ~/.inputrc, but must be passed as a single argument:
bind '"\C-x\C-r": re-read-init-file'.
bind accepts the following options:
-m  keymap         Use `keymap' as the keymap for the duration of this
command.  Acceptable keymap names are emacs,
emacs-standard, emacs-meta, emacs-ctlx, vi, vi-move,
vi-command, and vi-insert.
-l                 List names of functions.
-P                 List function names and bindings.
-p                 List functions and bindings in a form that can be
reused as input.
-r  keyseq         Remove the binding for KEYSEQ.
-x  keyseq:shell-command  Cause SHELL-COMMAND to be executed when
KEYSEQ is entered.
-f  filename       Read key bindings from FILENAME.
-q  function-name  Query about which keys invoke the named function.
-u  function-name  Unbind all keys which are bound to the named function.
-V                 List variable names and values
-v                 List variable names and values in a form that can
be reused as input.
-S                 List key sequences that invoke macros and their values
-s                 List key sequences that invoke macros and their values
in a form that can be reused as input.

Want to see whether your current shell has a particular feature enabled?  The “shopt” built-in lists the shell options and their state, and the GNU Bash Reference guide has details on what each option does, checkwinsize for example.

$ shopt
cdable_vars     off
cdspell         off
checkhash       off
checkwinsize    on
cmdhist         on
dotglob         off
execfail        off
expand_aliases  on
extdebug        off
extglob         off
extquote        on
failglob        off
force_fignore   on
gnu_errfmt      off
histappend      off
histreedit      off
histverify      off
hostcomplete    on
huponexit       off
interactive_comments    on
lithist         off
login_shell     on
mailwarn        off
no_empty_cmd_completion off
nocaseglob      off
nocasematch     off
nullglob        off
progcomp        on
promptvars      on
restricted_shell        off
shift_verbose   off
sourcepath      on
xpg_echo        off
$ help shopt
shopt: shopt [-pqsu] [-o long-option] optname [optname...]
Toggle the values of variables controlling optional behavior.
The -s flag means to enable (set) each OPTNAME; the -u flag
unsets each OPTNAME.  The -q flag suppresses output; the exit
status indicates whether each OPTNAME is set or unset.  The -o
option restricts the OPTNAMEs to those defined for use with
`set -o'.  With no options, or with the -p option, a list of all
settable options is displayed, with an indication of whether or
not each is set.

checkwinsize
If set, Bash checks the window size after each command and, if
necessary, updates the values of LINES and COLUMNS.

Ah nice.  With huponexit off, my shell won’t send a SIGHUP to jobs when I log out of an interactive login shell.  =)
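
In a script, “shopt -q” is handier than reading the table: it prints nothing and reports the state through its exit status, as the help text above says.  Here is a minimal sketch, with opt_state as a hypothetical helper name:

```bash
#!/usr/bin/env bash
# opt_state (hypothetical helper): print "on" or "off" for one shopt option.
# shopt -q is silent for valid option names and returns 0 only when the option is set.
opt_state() {
    if shopt -q "$1"; then
        echo on
    else
        echo off
    fi
}

shopt -s huponexit      # set the option for this shell
opt_state huponexit
shopt -u huponexit      # unset it again
opt_state huponexit
```

If you do want background jobs to receive SIGHUP when an interactive login shell exits, put “shopt -s huponexit” in your ~/.bash_profile.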