I recently debugged an issue where a host panicked with the following message:

Apr 3 04:41:44 pluto.prefetch.com genunix: [ID 663943 kern.notice] Unrecoverable Machine-Check Exception

These errors are typically caused by CPU or memory faults, but on this specific machine nothing showed up when I checked the fault and error logs. Upon closer inspection, [...]
Archive for 'Solaris Fault Management'
Eric Schrock has done some really cool work integrating disk (SMART) and platform (IPMI) monitoring information into OpenSolaris. Just recently, he extended FMA with support for SES (SCSI Enclosure Services), and that work went into build 93 of OpenSolaris. This looks like some really cool stuff. The following was taken directly from his blog on the [...]
If you have a relatively recent server, your machine most likely supports IPMI. One technology that makes IPMI extremely useful is the baseboard management controller (BMC), an out-of-band controller that monitors the health of your server platform. Health monitoring works by placing sensors throughout the server and feeding the data these sensors [...]
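As a rough illustration of consuming BMC sensor data from the host, the sketch below parses text in the pipe-separated `name | reading | status` layout that `ipmitool sdr` typically prints, then flags any sensor whose status is not `ok`. The sample readings are hypothetical, and the exact column layout is an assumption that may vary between ipmitool versions.

```python
# Minimal sketch: parse sensor lines in the pipe-separated layout that
# `ipmitool sdr` typically prints ("name | reading | status") and flag any
# sensor whose status is not "ok". The sample lines below are hypothetical.

from typing import List, Tuple

SAMPLE_SDR = """\
CPU 0 Temp       | 45 degrees C      | ok
Fan 1            | 0 RPM             | cr
PS 0 Status      | 0x01              | ok
"""


def parse_sdr(text: str) -> List[Tuple[str, str, str]]:
    """Split each well-formed line into (name, reading, status)."""
    sensors = []
    for line in text.splitlines():
        parts = [field.strip() for field in line.split("|")]
        if len(parts) != 3:
            continue  # skip blank or malformed lines
        sensors.append((parts[0], parts[1], parts[2]))
    return sensors


def unhealthy(sensors: List[Tuple[str, str, str]]) -> List[str]:
    """Return the names of sensors whose status is anything other than 'ok'."""
    return [name for name, _reading, status in sensors if status != "ok"]


if __name__ == "__main__":
    for name in unhealthy(parse_sdr(SAMPLE_SDR)):
        print(f"sensor needs attention: {name}")
```

Treating every non-`ok` status as suspicious keeps the sketch simple; a real monitor would probably distinguish critical from non-critical states.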
With the introduction of Solaris 10, the Solaris kernel was modified and userland tools were added to detect and report on hardware faults. Fault analysis is handled by the Solaris fault manager, and when faulty hardware is detected, the kernel can respond by retiring memory pages, CPUs, and so on. The fault manager currently detects and responds to failures in AMD [...]
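The diagnose-then-respond flow can be illustrated with a deliberately simplified, hypothetical sketch: a rate threshold in the spirit of the fault manager's SERD (Soft Error Rate Discrimination) engines declares a resource faulty once `n` errors arrive within `t` seconds, and a stand-in response function plays the role of retiring the resource. The class and function names are made up for illustration, and the real fmd behavior is considerably richer.

```python
# Hypothetical, simplified sketch of a rate-based diagnosis in the spirit of
# FMA's SERD engines: treat a resource as faulty once `n` error reports
# arrive within a sliding window of `t` seconds. Real fmd semantics may differ.

from collections import deque


class RateDiagnosis:
    def __init__(self, n: int, t: float):
        self.n = n            # number of errors that trips the diagnosis
        self.t = t            # window length in seconds
        self.times = deque()  # timestamps of recent errors (non-decreasing)

    def record(self, timestamp: float) -> bool:
        """Record one error; return True if the threshold is reached."""
        self.times.append(timestamp)
        # Drop errors that fell out of the window ending at `timestamp`.
        while self.times and timestamp - self.times[0] > self.t:
            self.times.popleft()
        return len(self.times) >= self.n


def respond(resource: str) -> str:
    """Stand-in for a response agent, e.g. one that retires a memory page."""
    return f"retire {resource}"


if __name__ == "__main__":
    engine = RateDiagnosis(n=3, t=60.0)
    for ts in (0.0, 10.0, 20.0):
        if engine.record(ts):
            print(respond("hypothetical memory page 0x1234"))
```

The point of a rate threshold is that a single correctable error is tolerated, while a burst of errors from the same resource suggests a genuine hardware fault worth acting on.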
A while back I wrote a blog entry about the lack of SMART support in Solaris. Just recently, Eric Schrock added an FMA disk-transport diagnosis engine, which provides generic SMART monitoring as part of the base operating system. The disk-transport diagnosis engine currently supports only SATA disk drives, but SCSI support is right around the [...]
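For readers unfamiliar with how SMART flags a disk, here is a minimal sketch of the commonly used rule: an attribute is considered failing when its normalized value falls to or below its vendor-defined threshold. The attribute values are hypothetical, and this is not the disk-transport engine's actual code.

```python
# Minimal sketch of the classic SMART health rule: an attribute is treated as
# failing when its normalized value drops to or below the vendor threshold.
# Values below are hypothetical; this is not the fmd disk-transport code.

from dataclasses import dataclass
from typing import List


@dataclass
class SmartAttribute:
    attr_id: int
    name: str
    value: int      # current normalized value (higher is healthier)
    threshold: int  # vendor-defined failure threshold


def failing(attrs: List[SmartAttribute]) -> List[str]:
    """Return names of attributes at or below a non-zero threshold.

    A threshold of 0 is commonly treated as informational only, so such
    attributes are never reported as failing here.
    """
    return [
        a.name
        for a in attrs
        if a.threshold > 0 and a.value <= a.threshold
    ]


if __name__ == "__main__":
    disk = [
        SmartAttribute(5, "Reallocated_Sector_Ct", 30, 36),
        SmartAttribute(9, "Power_On_Hours", 95, 0),
        SmartAttribute(194, "Temperature_Celsius", 36, 0),
    ]
    print(failing(disk))
```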
Gavin Maltby has an awesome blog entry about the FMA support that is presently in Nevada and soon to be in Solaris 10 update 2:

http://blogs.sun.com/roller/page/gavinm/20060315

I have written about FMA before, and I still think it’s my favorite Solaris 10 feature.