While performing some routine checks on one of the servers I support, I noticed numerous input errors on Gigabit Ethernet interface zero:
$ netstat -i
Name Mtu Net/Dest Address Ipkts Ierrs Opkts Oerrs Collis Queue
lo0 8232 loopback localhost 959 0 959 0 0 0
ge0 1500 server1 server1 713548208 155599 680686711 0 0 0
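To put those counters in perspective, a quick awk one-liner shows the input error rate; the numbers are hard-coded from the Ierrs and Ipkts columns above, purely for illustration:

```shell
# Compute the input error rate from the ge0 counters shown above
# (155599 input errors out of 713548208 input packets).
awk 'BEGIN { printf "%.4f%% of input packets errored\n", 155599 / 713548208 * 100 }'
# → 0.0218% of input packets errored
```

A small percentage overall, but 150k+ dropped packets is still worth chasing down.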
Since this was a Sun server running Solaris 9, I fired up the kstat(1m) utility to find the cause of these errors:
$ kstat -m ge -i 0
module: ge instance: 0
name: ge0 class: net
After reviewing the kstat(1m) output I noticed that the rx_overflow value was well in excess of 150k. Since the word “overflow” is never a good sign, I started to research this issue by reading the manual page for gld(7D). This page contains descriptions for the generic LAN driver (gld) kstat values, but for some reason didn’t include a description for rx_overflow (the name is self-evident, but I wanted a definitive answer). After a quick Google search I came across the following information in the Sun Maximizing Performance of a Gigabit Ethernet NIC Interface blueprint:
“When rx_overflow is incrementing, packet processing is not keeping up with the packet arrival rate. If it is incrementing and no_free_rx_desc is not, this indicates that the PCI bus or SBus bus is presenting an issue to the flow of packets through the device. This could be because the ge card is plugged into a slower I/O bus. You can confirm the bus speed by looking at the pci_bus_speed statistic. An SBus bus speed of 40 MHz or a PCI bus speed of 33 MHz might not be sufficient to sustain full bidirectional one-gigabit Ethernet traffic. Another scenario that can lead to rx_overflow incrementing on its own is sharing the I/O bus with another device that has similar bandwidth requirements to those of the ge card.”
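The blueprint’s decision rule is simple enough to script. Here is a rough sketch that applies it to `kstat -p` style output (`module:instance:name:statistic<TAB>value`); the sample values piped in below are stand-ins for illustration, not actual output from this server:

```shell
# Sketch of the blueprint's decision rule, fed with made-up sample values
# in `kstat -p` output format (module:instance:name:statistic<TAB>value).
printf 'ge:0:ge0:rx_overflow\t155599\nge:0:ge0:no_free_rx_desc\t0\nge:0:ge0:pci_bus_speed\t33\n' |
awk -F'\t' '
  { split($1, k, ":"); stat[k[4]] = $2 }   # index each statistic by name
  END {
    if (stat["rx_overflow"] > 0 && stat["no_free_rx_desc"] == 0)
      print "rx_overflow rising alone: suspect the I/O bus (" stat["pci_bus_speed"] " MHz)"
  }'
# → rx_overflow rising alone: suspect the I/O bus (33 MHz)
```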
After reading through the blueprint, I followed its advice and checked the no_free_rx_desc value. Since no_free_rx_desc was zero, I again followed the blueprint’s advice and checked the hardware configuration. I first reviewed the prtdiag(1m) output to get the server identification string, and then turned to the Sunsolve FE handbook. The handbook indicated that the PCI bus ran at a clock rate of 33 MHz, and the prtdiag(1m) output indicated that the disk controller and GE adaptor shared the PCI bus. To ensure that disk I/O bandwidth wasn’t a problem, I fired up iostat(1m) and monitored the number of bytes written per second. There was little I/O traffic, so it didn’t seem to be a bus congestion problem. Next I reviewed the recommended solutions in the blueprint:
1. Use DMA infinite burst capability mode by setting ge_dmaburst_mode in /etc/system. Since the machine uses an UltraSPARC IIi CPU, and DMA infinite burst mode is only applicable to UltraSPARC III or better CPUs, this solution won’t help us. Bummer!
2. Move the Gigabit Ethernet adaptor to a 66 MHz PCI slot. Since all of our slots are 33 MHz, this won’t help us either. Strike 2!
3. Move the Gigabit Ethernet adaptor to its own PCI bus. Since the machine we are using has a single PCI bus, we couldn’t use this option either. Youch!
Since these options weren’t applicable to our system, I started digging through our ORCA graphs to find the exact days and times when these errors occurred. After analyzing the graph for all of about 60 seconds, I realized the errors were occurring at the exact same time each week (Monday afternoons). This was the time our weekly backups had been configured to run, and backups would definitely saturate all of the available bandwidth. Since the backups were being performed during a busy time of the day, I speculated that the CPU and PCI bus weren’t sufficient to push all of the backup and production traffic. Since the system is not super critical, I plan to fire up busstat(1m) next Monday to prove my theory. I also plan to do some reading to see why layer-2 flow control isn’t implemented. That should theoretically be the “right fix” for this problem.
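As a sanity check on that theory, a bit of back-of-the-envelope arithmetic shows how little headroom a 33 MHz PCI bus leaves for full-duplex gigabit traffic (this assumes a 32-bit bus, which is a guess for this box):

```shell
# Peak theoretical bandwidth of a 32-bit, 33 MHz PCI bus, before arbitration
# overhead: 33e6 transfers/s * 4 bytes/transfer = 132 MB/s = 1056 Mbit/s.
# Full-duplex gigabit Ethernet can demand up to 2000 Mbit/s on its own,
# and the disk controller shares the same bus.
awk 'BEGIN {
  pci_mbit = 33e6 * 4 * 8 / 1e6
  printf "PCI peak: %d Mbit/s vs. full-duplex GigE demand: %d Mbit/s\n", pci_mbit, 2000
}'
# → PCI peak: 1056 Mbit/s vs. full-duplex GigE demand: 2000 Mbit/s
```

So even with zero disk traffic, the bus can’t sustain full bidirectional gigabit line rate, which fits the backup-window pattern nicely.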
More to come …