One of my co-workers this week was fighting a disk failure on a Solaris 10 x86 host. I was checking /var/adm/messages and came across something interesting.
Apr 11 03:29:21 sinatra.fatkitty.com nge: [ID 801725 kern.info] NOTICE:
nge1: Using FIXED interrupt type
Apr 11 03:29:21 sinatra.fatkitty.com unix: [ID 954099 kern.info] NOTICE:
IRQ20 is being shared by drivers with different interrupt levels.
Apr 11 03:29:21 sinatra.fatkitty.com This may result in reduced system
performance.
Apr 11 03:29:21 sinatra.fatkitty.com mac: [ID 469746 kern.info] NOTICE:
nge1 registered
Weird. So, x86 hardware assigns IRQs to devices, and this usually happens at the BIOS level. Anyways, I was curious which devices were sharing IRQ20. We can invoke the kernel debugger and run a dcmd to find out.
$ mdb -k
Loading modules: [ unix krtld genunix specfs dtrace cpu.generic uppc
pcplusmp ufs ip hook neti sctp arp usba fcp fctl lofs zfs random nfs md
cpc fcip crypto logindmux ptm ]
> ::interrupts -d
IRQ Vector IPL Bus Type CPU Share APIC/INT# Driver Name(s)
3 0xb0 12 ISA Fixed 3 1 0x0/0x3 asy#1
4 0xb1 12 ISA Fixed 3 1 0x0/0x4 asy#0
9 0x81 9 PCI Fixed 1 1 0x0/0x9 acpi_wrapper_isr
16 0x60 6 PCI Fixed 7 1 0x0/0x10 bge#0
17 0x62 6 PCI Fixed 7 1 0x0/0x11 bge#1
20 0x63 6 PCI Fixed 2 2 0x0/0x14 nge#1, nv_sata#0
21 0x20 1 PCI Fixed 4 1 0x0/0x15 ehci#0
22 0x21 1 PCI Fixed 5 1 0x0/0x16 ohci#0
23 0x61 6 PCI Fixed 0 1 0x0/0x17 nge#0
24 0x82 7 MSI 6 1 - pcie_pci#3
25 0x83 7 MSI 6 1 - pcie_pci#3
160 0xa0 0 IPI ALL 0 - poke_cpu
192 0xc0 13 IPI ALL 1 - xc_serv
208 0xd0 14 IPI ALL 1 - kcpc_hw_overflow_intr
209 0xd1 14 IPI ALL 1 - cbe_fire
210 0xd3 14 IPI ALL 1 - cbe_fire
240 0xe0 15 IPI ALL 1 - xc_serv
241 0xe1 15 IPI ALL 1 - apic_error_intr
Bleh. So, the SATA controller and the nVidia gigabit NIC in port 1 are both sharing IRQ20. We're using nge1 on this host for IPMP. Usually we just stick with the Broadcom NICs, but this is one of those one-off cases: we wanted IPMP across the different chipsets to maximize redundancy on the host.
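The Share column makes shared lines easy to spot without eyeballing the whole table. As a rough sketch (it assumes the whitespace-separated column layout in the fixed-interrupt rows above, and the sample data is copied from that output), a few lines of Python can pull out any IRQ with more than one registered driver:

```python
# Sketch: find shared IRQs in `::interrupts -d` output.
# Assumes the column order shown above (Share is the 7th field);
# MSI/IPI rows print "-" there and are skipped.
SAMPLE = """\
IRQ Vector IPL Bus Type CPU Share APIC/INT# Driver Name(s)
16 0x60 6 PCI Fixed 7 1 0x0/0x10 bge#0
20 0x63 6 PCI Fixed 2 2 0x0/0x14 nge#1, nv_sata#0
23 0x61 6 PCI Fixed 0 1 0x0/0x17 nge#0
"""

def shared_irqs(text):
    """Return (irq, drivers) for rows whose Share count exceeds 1."""
    shared = []
    for line in text.splitlines()[1:]:      # skip the header row
        fields = line.split()
        irq, share = fields[0], fields[6]
        if share.isdigit() and int(share) > 1:
            drivers = " ".join(fields[8:])  # driver list is the tail
            shared.append((int(irq), drivers))
    return shared

print(shared_irqs(SAMPLE))  # -> [(20, 'nge#1, nv_sata#0')]
```

On a real box you'd feed it `echo "::interrupts -d" | mdb -k` instead of the pasted sample.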
Anywho, the output above shows IRQ20 bound to CPU 2, so we can poke at it with intrstat to see the interrupt activity happening on that processor. Here, I invoke intrstat to report only CPU 2 at 10-second intervals.
$ intrstat -c 2 10
device | cpu2 %tim
-------------+---------------
asy#1 | 0 0.0
bge#0 | 0 0.0
ehci#0 | 0 0.0
nge#0 | 0 0.0
nge#1 | 101 1.0
nv_sata#0 | 101 0.0
device | cpu2 %tim
-------------+---------------
asy#1 | 0 0.0
bge#0 | 0 0.0
ehci#0 | 0 0.0
nge#0 | 0 0.0
nge#1 | 158 1.5
nv_sata#0 | 158 0.0
ohci#0 | 0 0.0
device | cpu2 %tim
-------------+---------------
asy#1 | 0 0.0
bge#0 | 0 0.0
ehci#0 | 0 0.0
nge#0 | 0 0.0
nge#1 | 99 1.0
nv_sata#0 | 99 0.0
ohci#0 | 0 0.0
device | cpu2 %tim
-------------+---------------
asy#1 | 0 0.0
bge#0 | 0 0.0
ehci#0 | 0 0.0
nge#0 | 0 0.0
nge#1 | 108 1.0
nv_sata#0 | 108 0.0
ohci#0 | 0 0.0
device | cpu2 %tim
-------------+---------------
asy#1 | 0 0.0
bge#0 | 0 0.0
ehci#0 | 0 0.0
nge#0 | 0 0.0
nge#1 | 160 1.5
nv_sata#0 | 160 0.0
ohci#0 | 0 0.0
^C
So, it's really minimal interrupt activity going on here. Note that nge#1 and nv_sata#0 report identical counts in each interval, which makes sense for a shared line: both handlers get called when the interrupt fires. The application on this host runs entirely out of ramdisk, and the primary IPMP NIC is bge0, so the two devices really aren't competing for resources. But if nge1 and the disk were both busy, this would be a bottleneck. Anyways, I thought it was interesting. I learned a new mdb dcmd and had one of those rare use cases for poking at intrstat.
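To put a number on "minimal": each intrstat sample above covers 10 seconds, so dividing the per-interval counts gives an interrupts-per-second rate. A small sketch, with the counts copied from the five intervals shown (the device names and interval length are taken from the run above):

```python
# Sketch: average the per-interval interrupt counts from the
# intrstat run above (rate = count / interval length).
SAMPLES = {              # device -> counts from the five 10s intervals
    "nge#1":     [101, 158, 99, 108, 160],
    "nv_sata#0": [101, 158, 99, 108, 160],
    "bge#0":     [0, 0, 0, 0, 0],
}

INTERVAL = 10  # seconds per sample, as invoked (intrstat -c 2 10)

for dev, counts in SAMPLES.items():
    rate = sum(counts) / (len(counts) * INTERVAL)
    print(f"{dev:<10} ~{rate:.1f} interrupts/sec")
```

About 12.5 interrupts a second on nge1 is nothing; a busy gigabit NIC without interrupt coalescing could be generating orders of magnitude more.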