Solaris reporting multiple devices sharing IRQ assignments


One of my co-workers was fighting a disk failure on a Solaris 10 x86 host this week. While checking /var/adm/messages, I came across something interesting.

Apr 11 03:29:21 sinatra.fatkitty.com nge: [ID 801725 kern.info] NOTICE: nge1: Using FIXED interrupt type
Apr 11 03:29:21 sinatra.fatkitty.com unix: [ID 954099 kern.info] NOTICE: IRQ20 is being shared by drivers with different interrupt levels.
Apr 11 03:29:21 sinatra.fatkitty.com This may result in reduced system performance.
Apr 11 03:29:21 sinatra.fatkitty.com mac: [ID 469746 kern.info] NOTICE: nge1 registered

Weird. So, on x86 hardware each device gets an IRQ assignment, and this usually happens at the BIOS level. Anyways, I was curious which devices were sharing IRQ20. We can invoke the kernel debugger and run a dcmd to find out.

$ mdb -k

Loading modules: [ unix krtld genunix specfs dtrace cpu.generic uppc pcplusmp ufs ip hook neti sctp arp usba fcp fctl lofs zfs random nfs md cpc fcip crypto logindmux ptm ]
> ::interrupts -d
IRQ  Vector IPL Bus   Type  CPU Share APIC/INT# Driver Name(s)
3    0xb0   12  ISA   Fixed 3   1     0x0/0x3   asy#1
4    0xb1   12  ISA   Fixed 3   1     0x0/0x4   asy#0
9    0x81   9   PCI   Fixed 1   1     0x0/0x9   acpi_wrapper_isr
16   0x60   6   PCI   Fixed 7   1     0x0/0x10  bge#0
17   0x62   6   PCI   Fixed 7   1     0x0/0x11  bge#1
20   0x63   6   PCI   Fixed 2   2     0x0/0x14  nge#1, nv_sata#0
21   0x20   1   PCI   Fixed 4   1     0x0/0x15  ehci#0
22   0x21   1   PCI   Fixed 5   1     0x0/0x16  ohci#0
23   0x61   6   PCI   Fixed 0   1     0x0/0x17  nge#0
24   0x82   7         MSI   6   1     -         pcie_pci#3
25   0x83   7         MSI   6   1     -         pcie_pci#3
160  0xa0   0         IPI   ALL 0     -         poke_cpu
192  0xc0   13        IPI   ALL 1     -         xc_serv
208  0xd0   14        IPI   ALL 1     -         kcpc_hw_overflow_intr
209  0xd1   14        IPI   ALL 1     -         cbe_fire
210  0xd3   14        IPI   ALL 1     -         cbe_fire
240  0xe0   15        IPI   ALL 1     -         xc_serv
241  0xe1   15        IPI   ALL 1     -         apic_error_intr

Bleh. So, the SATA controller (nv_sata#0) and the second nVidia gigabit NIC (nge#1) are both sharing IRQ20. We're using nge1 on this host for IPMP. Usually we just stick with the Broadcom NICs, but this is one of those one-off cases: we wanted to run IPMP across the two different chipsets to maximize redundancy on the host.
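On a box with a longer interrupt table, eyeballing the Share column gets old fast. Here is a minimal Python sketch that picks out the shared IRQs from pasted `::interrupts -d` output. The function name `shared_irqs` and the `INTR_TYPES` set are my own inventions for illustration, and the parser assumes the column layout shown above (including the blank Bus column on MSI and IPI rows), so treat it as a starting point rather than a robust tool.

```python
# Rows copied from the ::interrupts -d output above (trimmed for brevity).
SAMPLE = """\
IRQ  Vector IPL Bus   Type  CPU Share APIC/INT# Driver Name(s)
16   0x60   6   PCI   Fixed 7   1     0x0/0x10  bge#0
20   0x63   6   PCI   Fixed 2   2     0x0/0x14  nge#1, nv_sata#0
23   0x61   6   PCI   Fixed 0   1     0x0/0x17  nge#0
24   0x82   7         MSI   6   1     -         pcie_pci#3
160  0xa0   0         IPI   ALL 0     -         poke_cpu
"""

# Assumption: only the interrupt types that appear in this particular output.
INTR_TYPES = {"Fixed", "MSI", "IPI"}


def shared_irqs(text):
    """Return {irq: (cpu, [drivers])} for rows whose Share count is above 1."""
    shared = {}
    for line in text.splitlines():
        fields = line.split()
        if len(fields) < 7 or not fields[0].isdigit():
            continue  # header or blank line
        # The Bus column is empty on MSI and IPI rows, so Type is field 3 or 4.
        type_idx = 3 if fields[3] in INTR_TYPES else 4
        if fields[type_idx] not in INTR_TYPES or len(fields) < type_idx + 5:
            continue  # row layout we do not recognize
        cpu, share = fields[type_idx + 1], fields[type_idx + 2]
        # Everything after the APIC/INT# column is the driver list.
        drivers = " ".join(fields[type_idx + 4:]).split(", ")
        if share.isdigit() and int(share) > 1:
            shared[int(fields[0])] = (cpu, drivers)
    return shared


if __name__ == "__main__":
    for irq, (cpu, drivers) in shared_irqs(SAMPLE).items():
        print(f"IRQ{irq} on CPU {cpu} is shared by: {', '.join(drivers)}")
```

Fed the rows above, it flags only IRQ20, along with the CPU it is bound to and the two drivers hanging off it.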

Anywho, the mdb output shows IRQ20 bound to CPU2, so we can poke at it with intrstat to see the interrupt activity happening on that processor. Here, I invoke intrstat to report only CPU2 at 10-second intervals.

$ intrstat -c 2 10

      device |      cpu2 %tim
-------------+---------------
       asy#1 |         0  0.0
       bge#0 |         0  0.0
      ehci#0 |         0  0.0
       nge#0 |         0  0.0
       nge#1 |       101  1.0
   nv_sata#0 |       101  0.0
      device |      cpu2 %tim
-------------+---------------
       asy#1 |         0  0.0
       bge#0 |         0  0.0
      ehci#0 |         0  0.0
       nge#0 |         0  0.0
       nge#1 |       158  1.5
   nv_sata#0 |       158  0.0
      ohci#0 |         0  0.0
      device |      cpu2 %tim
-------------+---------------
       asy#1 |         0  0.0
       bge#0 |         0  0.0
      ehci#0 |         0  0.0
       nge#0 |         0  0.0
       nge#1 |        99  1.0
   nv_sata#0 |        99  0.0
      ohci#0 |         0  0.0
      device |      cpu2 %tim
-------------+---------------
       asy#1 |         0  0.0
       bge#0 |         0  0.0
      ehci#0 |         0  0.0
       nge#0 |         0  0.0
       nge#1 |       108  1.0
   nv_sata#0 |       108  0.0
      ohci#0 |         0  0.0
      device |      cpu2 %tim
-------------+---------------
       asy#1 |         0  0.0
       bge#0 |         0  0.0
      ehci#0 |         0  0.0
       nge#0 |         0  0.0
       nge#1 |       160  1.5
   nv_sata#0 |       160  0.0
      ohci#0 |         0  0.0
^C

So, it's really minimal interrupt activity going on here. The application on this host runs entirely from ramdisk, and the primary IPMP NIC is bge0, so the two devices aren't really competing for resources. But if nge1 and the disk were both super busy, this could become a bottleneck. Anyways, I thought it was interesting. I learned a new mdb dcmd and had one of those rare use cases for poking at intrstat.
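If you capture more than a handful of samples, averaging them by hand is tedious. The sketch below averages the count and %tim columns per device across repeated intrstat samples for a single CPU. The function name `summarize` is hypothetical, and the parser assumes the single-CPU layout from the run above (one count column and one %tim column after the `|`), so a multi-CPU report would need more work.

```python
# Two samples trimmed from the intrstat -c 2 10 run above.
SAMPLE = """\
      device |      cpu2 %tim
-------------+---------------
       nge#0 |         0  0.0
       nge#1 |       101  1.0
   nv_sata#0 |       101  0.0
      device |      cpu2 %tim
-------------+---------------
       nge#0 |         0  0.0
       nge#1 |       158  1.5
   nv_sata#0 |       158  0.0
"""


def summarize(text):
    """Return {device: (mean_count, mean_pct_time)} across all samples."""
    totals = {}  # device -> [count_sum, pct_sum, samples_seen]
    for line in text.splitlines():
        name, sep, rest = line.partition("|")
        name = name.strip()
        cols = rest.split()
        # Skip separator lines (no "|"), header lines, and unexpected layouts.
        if not sep or name == "device" or len(cols) != 2:
            continue
        try:
            count, pct = int(cols[0]), float(cols[1])
        except ValueError:
            continue
        entry = totals.setdefault(name, [0, 0.0, 0])
        entry[0] += count
        entry[1] += pct
        entry[2] += 1
    return {dev: (c / n, p / n) for dev, (c, p, n) in totals.items()}


if __name__ == "__main__":
    for dev, (count, pct) in summarize(SAMPLE).items():
        print(f"{dev:>12}  avg count {count:8.1f}  avg %tim {pct:4.2f}")
```

One thing the numbers make obvious: nge#1 and nv_sata#0 report identical counts in every sample, which is what I'd expect if both handlers get called for each interrupt on the shared line, though I haven't confirmed that in the source.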

This article was posted by Mike on 2010-04-14 12:22:00 -0400