Monitoring the IPMI system event log

If you have a relatively recent server, your machine most likely supports IPMI. One technology that makes IPMI extremely useful is the baseboard management controller (BMC), which is an out-of-band controller that monitors the health of your server platform. Health monitoring is accomplished by distributing sensors throughout the server, and feeding the data these sensors collect back to the BMC. If the BMC detects a fault condition, it can log an error to the system event log.

The event log can be monitored by a number of IPMI software packages. One such package is ipmitool, which provides the ipmievd daemon just for this purpose. If you are running a recent version of Solaris 10*, you probably already have the IPMI software and the ipmievd daemon installed. You can use the following command to check:

$ pkginfo | grep ipmi
system      SUNWipmi      ipmitool, (usr)
system      SUNWipmir     ipmitool, (root)
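
If you want to script this check, here is a minimal sketch. It assumes the two package names shown in the pkginfo output and relies on pkginfo's -q option, which prints nothing and reports the result through its exit status. The function name ipmi_pkgs_installed is made up for this example.

```shell
# Hypothetical helper: succeed only if both ipmitool packages are installed.
# The package names are the ones shown in the pkginfo output.
ipmi_pkgs_installed() {
    for pkg in SUNWipmi SUNWipmir; do
        # pkginfo -q is silent; the exit status says whether the package exists
        pkginfo -q "$pkg" || {
            echo "missing package: $pkg" >&2
            return 1
        }
    done
    return 0
}
```

Calling `ipmi_pkgs_installed && echo "ipmitool packages present"` would then print the message only when both packages are found.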

If the software is installed, you can use the svcadm utility to enable the ipmievd daemon:

$ svcadm enable svc:/network/ipmievd:default

Once the ipmievd service is enabled, you can use the ps and svcs commands to verify that the daemon is running:

$ svcs -a | grep ipmi
online 0:25:52 svc:/network/ipmievd:default

$ ps -ef | grep ipmi
root 328 1 0 00:25:53 ? 0:01 /usr/lib/ipmievd sel
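
For a health-check script, grepping through svcs output is a bit clumsy. A rough sketch of an alternative is below. It assumes the same service FMRI as above and uses svcs with -H (no header) and -o state (print only the state column). The function name ipmievd_is_online is my own invention.

```shell
# Hypothetical helper: succeed only if SMF reports the ipmievd service as online.
# The FMRI is the one enabled with svcadm earlier in this post.
ipmievd_is_online() {
    # -H drops the header line, -o state prints just the state column
    state=`svcs -H -o state svc:/network/ipmievd:default 2>/dev/null`
    [ "$state" = "online" ]
}
```

Something like `ipmievd_is_online || echo "ipmievd is not online"` could then go into a cron job or a monitoring check.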

Once the daemon is running, it periodically polls the BMC system event log. If ipmievd detects an error condition, it logs a message to syslog. This message contains details on the fault, which can help you determine that a server is sick. Since FMA currently doesn’t do platform health monitoring (the sensor project will fix this), ipmievd is able to step in and fill this role for the time being. Nice!
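
To pull the ipmievd messages back out of syslog, a simple filter is usually enough. The sketch below assumes that your syslog.conf routes the ipmievd messages into /var/adm/messages (the usual Solaris messages file) and that the lines carry the string "ipmievd". Both are assumptions, so check your own syslog setup. The function name ipmievd_events is hypothetical.

```shell
# Hypothetical helper: print the syslog lines that mention ipmievd.
# The default log path is an assumption; pass another file as the first argument.
ipmievd_events() {
    logfile="${1:-/var/adm/messages}"
    grep ipmievd "$logfile"
}
```

Running `ipmievd_events` with no arguments searches the default file, and `ipmievd_events /path/to/other.log` searches a different one.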

* This blog post assumes you are running Solaris 10 update 4 with patch 119765-06.

This article was posted by Matty on 2007-12-30 15:42:00 -0400