Hardware redundancy

While reading through Mike Shapiro’s FMA (Fault Management Architecture) presentation today, I came across two cool new hardware technologies. The first is FBDIMM (fully buffered DIMM), which Micron describes as:

Address/command soft errors can disrupt server performance and reliability. To help lessen their occurrence, Micron’s FBDIMMs incorporate an enhanced cyclic redundancy check (CRC) that provides greater data and address/command protection than traditional server modules.

Designers can also configure it to suit their particular applications. Providing an even greater defense, the bit lane fail-over correction feature identifies bad data paths and removes them from the operation. Together, these error detection methods dramatically reduce address/command soft errors.
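
To make the CRC idea concrete for myself, here is a minimal Python sketch of how a checksum catches a single flipped bit in a command frame. It is only an illustration: it uses the general-purpose CRC-32 from Python's zlib module as a stand-in, not whatever CRC FBDIMM channels actually use, and the frame contents and helper names (send_frame, flip_bit, frame_is_intact) are made up for the example.

```python
import zlib
from typing import Tuple


def send_frame(payload: bytes) -> Tuple[bytes, int]:
    """Return the payload together with its CRC-32 checksum.

    CRC-32 is a stand-in here; it is NOT the CRC used on a real FBDIMM channel.
    """
    return payload, zlib.crc32(payload)


def flip_bit(payload: bytes, bit_index: int) -> bytes:
    """Simulate a soft error by flipping exactly one bit of the payload."""
    corrupted = bytearray(payload)
    corrupted[bit_index // 8] ^= 1 << (bit_index % 8)
    return bytes(corrupted)


def frame_is_intact(payload: bytes, checksum: int) -> bool:
    """Recompute the CRC on the receiving side and compare it to the sent one."""
    return zlib.crc32(payload) == checksum


# Hypothetical command frame; the text format is invented for illustration.
frame, crc = send_frame(b"ACTIVATE bank=2 row=0x1f3")
damaged = flip_bit(frame, 13)

print(frame_is_intact(frame, crc))    # True
print(frame_is_intact(damaged, crc))  # False
```

A CRC only detects the corruption. Recovering from it (retrying the transfer, or failing over to a spare bit lane as the Micron text describes) is a separate step that this sketch does not model.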

The second technology is Chipkill, a memory protection technology, which Findany ISP describes as:

CHIPKILL – A technology developed by IBM for servers and other systems that demand high availability. It allows a computer motherboard and BIOS to detect problems with the computer’s memory and selectively disable problematic parts of the memory. Depending on the technology used, this technology may or may not require specialized memory chips.

Hopefully Fujitsu and Sun will integrate these technologies into their next-generation APL server line.