Debugging Solaris in.rarpd issues


While building a system via jumpstart this week, one of my friends kept getting a steady stream of timeouts from the client:

{0} ok boot net - install
Boot device: /pci@83,4000/network@1,1 File and args: - install
Timeout waiting for ARP/RARP packet
Timeout waiting for ARP/RARP packet
Timeout waiting for ARP/RARP packet
Timeout waiting for ARP/RARP packet
Timeout waiting for ARP/RARP packet
Timeout waiting for ARP/RARP packet
[ ..... ]

He had used the jumpstart server earlier in the week to build a system, and was uncertain why the client he was now jumpstarting couldn’t get an IP address. He asked me if I could take a look at the problem, so I fired up snoop on the jumpstart server to see if the RARP requests were reaching it:

$ snoop -d hme0 ether 0:e0:0:c4:d8:e3

OLD-BROADCAST -> (broadcast) RARP C Who is 0:e0:0:c4:d8:e3 ?
OLD-BROADCAST -> (broadcast) RARP C Who is 0:e0:0:c4:d8:e3 ?
OLD-BROADCAST -> (broadcast) RARP C Who is 0:e0:0:c4:d8:e3 ?
OLD-BROADCAST -> (broadcast) RARP C Who is 0:e0:0:c4:d8:e3 ?
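
snoop prints MAC addresses without leading zeros (0:e0:0:c4:d8:e3), while an ethers file may contain the same address zero-padded or in upper case, so a plain text search can miss a valid entry. The following is a minimal Python sketch of a format-insensitive lookup, not anything in.rarpd itself does. The function names and the "newclient" hostname are hypothetical, and the sample text merely stands in for /etc/ethers:

```python
def normalize_mac(mac: str) -> str:
    """Return the MAC as six zero-padded, lowercase hex octets."""
    parts = mac.strip().lower().replace("-", ":").split(":")
    if len(parts) != 6:
        raise ValueError(f"expected 6 octets, got {len(parts)}: {mac!r}")
    octets = [int(p, 16) for p in parts]  # ValueError on non-hex input
    if any(o > 0xFF for o in octets):
        raise ValueError(f"octet out of range in {mac!r}")
    return ":".join(f"{o:02x}" for o in octets)


def find_host(ethers_text: str, mac: str):
    """Return the hostname mapped to mac in ethers-style text, or None."""
    wanted = normalize_mac(mac)
    for line in ethers_text.splitlines():
        fields = line.split("#", 1)[0].split()  # drop comments, then split
        if len(fields) < 2:
            continue
        try:
            if normalize_mac(fields[0]) == wanted:
                return fields[1]
        except ValueError:
            continue  # skip malformed entries
    return None


if __name__ == "__main__":
    # Hypothetical sample entry; on a real server, read /etc/ethers instead.
    sample = "# jumpstart clients\n00:E0:00:C4:D8:E3 newclient\n"
    print(find_host(sample, "0:e0:0:c4:d8:e3"))
```

Normalizing both sides before comparing avoids a false "entry missing" conclusion when the file and the snoop output simply format the address differently.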

The requests were indeed getting to the server, but for some reason the server wasn’t sending anything back to the client. I double checked all the configuration files (e.g., /etc/bootparams, /etc/ethers, /etc/hosts), and then double checked that the host had a valid entry in /tftpboot. Everything appeared to be correct, so I fired up truss to watch what in.rarpd was doing:

$ truss -leaf -x all -p 8192

8192/2: psargs: /usr/sbin/in.rarpd -a
8192/4: getmsg(5, 0xFEEFBF94, 0xFEEFB788, 0xFEEFAF84) (sleeping...)
8192/3: getmsg(4, 0xFEFFBF94, 0xFEFFB788, 0xFEFFAF84) (sleeping...)
8192/2: lwp_park(0, 0x00000000, 0) (sleeping...)
8192/4: getmsg(5, 0xFEEFBF94, 0xFEEFB788, 0xFEEFAF84) = 0
8192/4: open(0xFEDD49AC, 0) = 7
8192/4: 0xFEDD49AC: "/etc/ethers"
8192/4: fstat64(7, 0xFEEFA258) = 0
8192/4: fstat64(7, 0xFEEFA100) = 0
8192/4: ioctl(7, 0x00005401, 0xFEEFA1E4) Err#25 ENOTTY
8192/4: read(7, 0x0002697C, 8192) = 744
8192/4: 0x0002697C: " 8 : 0 : 2 0 : c 7 : 9 e"..
8192/4: llseek(7, 0xFFFFFFFFFFFFFFE5, 1) = 717
8192/4: close(7) = 0
8192/4: door(6, 0xFEEF83B0) = 0
8192/4: door(6, 0xFEEF8398) = 0
8192/4: getmsg(5, 0xFEEFBF94, 0xFEEFB788, 0xFEEFAF84) (sleeping...)
8192/4: getmsg(5, 0xFEEFBF94, 0xFEEFB788, 0xFEEFAF84) = 0
8192/4: open(0xFEDD49AC, 0) = 7
8192/4: 0xFEDD49AC: "/etc/ethers"
8192/4: fstat64(7, 0xFEEFA258) = 0
8192/4: fstat64(7, 0xFEEFA100) = 0
8192/4: ioctl(7, 0x00005401, 0xFEEFA1E4) Err#25 ENOTTY
8192/4: read(7, 0x0002697C, 8192) = 744
8192/4: 0x0002697C: " 8 : 0 : 2 0 : c 7 : 9 e"..

The truss output showed that in.rarpd was processing some RARP requests, but the MAC address of the system my friend was attempting to jumpstart never showed up. This led me to believe that a cable was unplugged, or that something had happened to the jumpstart interface. I checked the ifconfig(1M) output and several files in /etc, and immediately noticed that the IP address assigned to ge0 had changed. A quick check of the in.rarpd source code revealed that each physical interface is initialized when in.rarpd starts, so when the IP address changed, in.rarpd knew nothing about the new network and therefore ignored RARP requests for that network.

Once I knew exactly what caused the problem, I restarted in.rarpd with the “-a” (process requests on all interfaces) option, and the client was able to receive an IP address. I thoroughly enjoy debugging problems like this, and I had a fun time learning how in.rarpd works! Wh00t wh00t!!
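
To illustrate why a startup-time snapshot of the interface configuration goes stale, here is a small Python sketch. It is not in.rarpd's actual logic, only a model of the behavior described above: a daemon that records each interface's network once will not match a client address on a network that was configured later. The function name and all addresses are made up for the example:

```python
import ipaddress


def serving_interface(startup_interfaces, client_ip):
    """Return the interface whose recorded network contains client_ip.

    startup_interfaces maps an interface name to the address/prefix it had
    when the (hypothetical) daemon started. Returns None if nothing matches.
    """
    client = ipaddress.ip_address(client_ip)
    for name, cidr in startup_interfaces.items():
        if client in ipaddress.ip_interface(cidr).network:
            return name
    return None


if __name__ == "__main__":
    # Snapshot taken at daemon startup (hypothetical addresses).
    stale = {"ge0": "192.168.1.1/24"}
    # ge0 was later moved to a new network; a restart refreshes the snapshot.
    fresh = {"ge0": "192.168.2.1/24"}
    client = "192.168.2.50"
    print("before restart:", serving_interface(stale, client))
    print("after restart: ", serving_interface(fresh, client))
```

With the stale snapshot the client's address matches no known network, which corresponds to the request being silently ignored. Rebuilding the snapshot, as a daemon restart does, makes the lookup succeed.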

I will blog about my in.rarpd source code dissection in a later post.

This article was posted by Matty on 2005-10-24 12:00:00 -0400