I went to update the Nevada image on my jumpstart server today, and noticed that my loopback mount was performing terribly:
extended device statistics
r/s w/s Mr/s Mw/s wait actv wsvc_t asvc_t %w %b device
11.5 0.0 0.0 0.0 0.0 1.0 0.0 86.5 0 99 lofi1
Ouch! 11 reads per second causes the loopback device to become 100% busy. It turns out this is bug #6806627, which is fixed in Nevada build 112. I got around the issue by using a Linux host to copy the CD contents, and then running setup_install_server from that location. Now I should be able to live upgrade the host to something a bit more current.
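If you want to catch this kind of saturation without eyeballing the output, a small filter can flag any device whose %b column crosses a threshold. This is a minimal sketch, assuming the 11-column layout shown above (%b in the tenth field, device name last) and a POSIX awk (nawk on Solaris). The busy_devices name and the threshold argument are my own invention, not part of iostat:

```shell
#!/bin/sh
# busy_devices THRESHOLD
# Reads extended iostat output on stdin and prints "device %b" for every
# device that is at least THRESHOLD percent busy.
# Assumption: %b is field 10 and the device name is field 11, as in the
# sample output above. Header lines are skipped because "%b" is not numeric.
busy_devices() {
    awk -v limit="$1" '
        NF == 11 && $10 ~ /^[0-9]+$/ && $10 + 0 >= limit + 0 { print $11, $10 }
    '
}
```

Something like `iostat -xnM 5 | busy_devices 90` should then print only the devices that are pegged, though the exact iostat flags needed to get this column layout may differ between releases.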
I attempted to kickstart a server with multiple network interfaces this week, and received an Anaconda error stating that it couldn’t find the kickstart configuration file. After a bit of debugging, I noticed that the host kickstarted from one of the interfaces on PCI bus 2, but Anaconda was attempting to use the interface on PCI bus 0. This made sense, since I PXE booted from the one NIC that had link, but Anaconda discovered and tried to use all of the installed interfaces. To get around this issue, you can set the ksdevice= variable in a couple of ways:
Due to a bug, option #3 didn’t work. Since only one of the three NICs connected to the server had link, I ended up using option #2 to install my servers. This worked pretty well, and with a bit of custom logic in my kickstart %post section, unattended installs are now working great!
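Since the whole problem hinged on which NIC had link, it can help to check link state from a shell on the installed host. The following is a rough sketch, assuming a Linux host that exposes /sys/class/net/<interface>/carrier. The nics_with_link function is a hypothetical helper, not the logic from my %post section:

```shell
#!/bin/sh
# nics_with_link [SYSDIR]
# Prints the name of every network interface whose carrier file reads "1".
# SYSDIR defaults to /sys/class/net; it is a parameter only so the function
# can be pointed at a test directory.
nics_with_link() {
    sysdir=${1:-/sys/class/net}
    for dev in "$sysdir"/*; do
        [ -r "$dev/carrier" ] || continue
        # Reading carrier can fail on an interface that is administratively
        # down, so treat a read error as "no link".
        state=$(cat "$dev/carrier" 2>/dev/null) || continue
        if [ "$state" = "1" ]; then
            echo "${dev##*/}"
        fi
    done
    return 0
}
```

Running `nics_with_link` with no arguments lists the interfaces that currently report a carrier. The loopback interface may show up as well, depending on the kernel.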
My friend Mike sent me a link to a Linux predictive failure post, which describes using the mcelog utility to check for machine check exceptions (these are hardware faults reported by the CPU). mcelog is pretty sweet, and provides a portion of the capabilities that are currently available in the Solaris Fault Management Architecture (FMA). The utility ships with several distributions, and can also be installed from various network repositories:
$ yum install mcelog
$ rpm -q -a | grep mcelog
mcelog-0.7-1.22.fc6
The mcelog package adds an hourly cron job to /etc/cron.hourly to check for new MCEs. If mcelog locates an MCE, an entry similar to the following will be written to /var/log/mcelog:
$ less /var/log/mcelog
MCE 0
HARDWARE ERROR. This is *NOT* a software problem!
Please contact your hardware vendor
CPU 2 4 northbridge TSC 1157b0af355f7d
MISC c008064f00000000 ADDR 40db12ae0
Northbridge Chipkill ECC error
Chipkill ECC syndrome = 7273
bit46 = corrected ecc error
bit59 = misc error valid
bus error 'local node response, request didn't time out
generic read mem transaction
memory access, level generic'
STATUS 9c39c00072080a13 MCGSTATUS 0
If you would prefer to route fault messages to a central location for processing, you can add the “--syslog” option to the mcelog cron job. This is an awesome utility, and should simplify locating hardware errors (especially if this gets combined with memtest86+) on my various Linux hosts.
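For a quick health check across a bunch of hosts, it is handy to know how many MCE records a log contains. Here is a minimal sketch, assuming every record starts with a line of the form "MCE <number>" as in the excerpt above. The count_mces function is a hypothetical helper, not something the mcelog package provides:

```shell
#!/bin/sh
# count_mces [LOGFILE]
# Prints the number of MCE records in LOGFILE (default /var/log/mcelog).
# Assumption: each record begins with a line such as "MCE 0".
count_mces() {
    # grep -c prints the count of matching lines; it exits non-zero when
    # the count is 0, but still prints "0".
    grep -c '^MCE [0-9]' "${1:-/var/log/mcelog}"
}
```

A result greater than zero is a hint to go read the log, not a diagnosis. Corrected ECC errors like the one above still deserve a look if they keep recurring.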
I have been doing a lot of work with KVM, and bumped into an odd issue when PXE booting a Fedora 10 KVM guest. The installer (anaconda) would progress to the point where it would try to locate and configure the network interface, but would spit out the following error and wait for operator input:
+--------+ Network Error +---------+
|                                  |
| There was an error configuring   |
| your network interface.          |
|                                  |
|            +-------+             |
|            | Retry |             |
|            +-------+             |
|                                  |
|                                  |
+----------------------------------+
After doing a bit of profiling, I noticed that the guest was having issues initializing the Realtek interface (this is the default interface type presented to KVM guests by QEMU). To see if a different interface type cured my problems, I changed the NIC model to an e1000 by adding the “model” element to the interface definition in the KVM guest XML file:
<interface type='bridge'>
  <mac address='54:52:00:53:20:02'/>
  <source bridge='br0'/>
  <target dev='vnet2'/>
  <model type='e1000'/>
</interface>
Once I made this simple change, the PXE install worked flawlessly. This was a fun problem to debug, and I learned a TON about QEMU, Anaconda, libvirt and several Linux profiling tools in the process. Viva la problem resolution!
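To confirm which NIC model a guest is actually defined with, you can pull the model out of the domain XML. This is a rough sketch, assuming the XML uses single-quoted attributes as in the snippet above. The nic_models function is a hypothetical helper, and it only looks inside interface blocks, since other devices can carry a model element too:

```shell
#!/bin/sh
# nic_models XMLFILE
# Prints the model type of each <interface> definition in a guest XML file.
# Interfaces without a <model> element print nothing, which means the
# hypervisor default is in effect.
nic_models() {
    # The awk range keeps only the lines between <interface ...> and
    # </interface>; sed then extracts the value of the type attribute.
    awk '/<interface[ >]/,/<\/interface>/' "$1" |
        sed -n "s/.*<model type='\([^']*\)'.*/\1/p"
}
```

You could feed it the output of `virsh dumpxml` saved to a file. The guest name and the file path are up to you.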
I’ve previously talked about creating Brocade aliases and zones, and wanted to discuss zone configurations in this post. Brocade zone configurations allow you to group one or more zones into an administrative unit, which you can then apply to a switch. Brocade has a number of commands that can be used to manage configurations, and they start with the string “cfg”:
cfgadd - Add a member to the configuration
cfgcopy - Copy a zone configuration
cfgcreate - Create a zone configuration
cfgdelete - Delete a zone configuration
cfgremove - Remove a member from a zone configuration
cfgrename - Rename a zone configuration
cfgshow - Print zone configuration
To create a new configuration, you can run the cfgcreate command with the name of the configuration to create, and an initial zone to place in the configuration:
Fabric1Switch1:admin> **cfgcreate "SANFabricOne", "CentOSNode1Zone1"**
Once the configuration is created, you can add additional zones using the cfgadd command:
Fabric1Switch1:admin> **cfgadd "SANFabricOne", "CentOSNode1Zone2"**
To ensure that your changes persist across switch reboots, you can run cfgsave to write the configuration to flash memory:
Fabric1Switch1:admin> **cfgsave**
Starting the Commit operation...
0x102572c0 (tRcs): May 8 08:51:37
INFO ZONE-MSGSAVE, 4, cfgSave completes successfully.
cfgSave successfully completed
To view a configuration, you can run the cfgshow command:
Fabric1Switch1:admin> **cfgshow**
Defined configuration:
cfg: SANFabricOne
CentOSNode1Zone1; CentOSNode1Zone2; CentOSNode2Zone1;
CentOSNode2Zone2
zone: CentOSNode1Zone1
CentOSNode1Port1; NevadaPort1
zone: CentOSNode1Zone2
CentOSNode1Port2; NevadaPort2
zone: CentOSNode2Zone1
NevadaPort1; CentosNode2Port1
zone: CentOSNode2Zone2
NevadaPort2; CentosNode2Port2
alias: CentOSNode1Port1
21:00:00:1b:32:04:86:c3
alias: CentOSNode1Port2
21:01:00:1b:32:24:86:c3
alias: CentosNode2Port1
21:00:00:e0:8b:1d:f9:03
alias: CentosNode2Port2
21:01:00:e0:8b:3d:f9:03
alias: NevadaPort1
10:00:00:00:c9:3e:4c:eb
alias: NevadaPort2
10:00:00:00:c9:3e:4c:ea
Effective configuration:
cfg: SANFabricOne
zone: CentOSNode1Zone1
21:00:00:1b:32:04:86:c3
10:00:00:00:c9:3e:4c:eb
zone: CentOSNode1Zone2
21:01:00:1b:32:24:86:c3
10:00:00:00:c9:3e:4c:ea
zone: CentOSNode2Zone1
10:00:00:00:c9:3e:4c:eb
21:00:00:e0:8b:1d:f9:03
zone: CentOSNode2Zone2
10:00:00:00:c9:3e:4c:ea
21:01:00:e0:8b:3d:f9:03
You may notice that the output contains both a defined and an effective configuration. The effective configuration is the one that is currently running on the switch, and the defined configuration is the one that is saved in flash. To make the configuration in flash effective, the cfgenable command needs to be run (this should be run after you make alias/zone/configuration changes and issue a cfgsave):
Fabric1Switch1:admin> **cfgenable "SANFabricOne"**
Starting the Commit operation...
0x1024f980 (tRcs): Apr 29 20:44:39
INFO ZONE-MSGSAVE, 4, cfgSave completes successfully.
cfgEnable successfully completed
Once cfgenable runs, the effective configuration will be updated to match the configuration you have defined and saved. This completes this part of the Brocade series, and the final installment will cover switch backups and putting all the pieces together.
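The whole sequence (cfgcreate, cfgadd, cfgsave, cfgenable) is easy to get out of order when you have a pile of zones, so here is a minimal sketch that prints the commands for review. The brocade_cfg_cmds function is a hypothetical helper that runs on your workstation, not on the switch. It only echoes the commands shown in this post, and you paste them into the switch session yourself:

```shell
#!/bin/sh
# brocade_cfg_cmds CFGNAME ZONE [ZONE...]
# Prints the Brocade commands that create CFGNAME with the first zone,
# add the remaining zones, save the configuration, and enable it.
brocade_cfg_cmds() {
    if [ $# -lt 2 ]; then
        echo "usage: brocade_cfg_cmds CFGNAME ZONE [ZONE...]" >&2
        return 1
    fi
    cfg=$1
    shift
    # cfgcreate needs an initial zone to place in the configuration.
    printf 'cfgcreate "%s", "%s"\n' "$cfg" "$1"
    shift
    # Every remaining zone is added with cfgadd.
    for zone in "$@"; do
        printf 'cfgadd "%s", "%s"\n' "$cfg" "$zone"
    done
    printf 'cfgsave\n'
    printf 'cfgenable "%s"\n' "$cfg"
}
```

For example, `brocade_cfg_cmds SANFabricOne CentOSNode1Zone1 CentOSNode1Zone2` prints one cfgcreate, one cfgadd, a cfgsave and a cfgenable. Printing instead of executing is deliberate, since enabling a bad zone configuration can cut hosts off from their storage.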