I have been knee deep this week debugging a rather complex DNS issue. I’ll do a full write up on that next week. While I was debugging the issue I needed to fire up tcpdump to watch the DNS queries from one of my authoritative servers to various servers on the Internet. What I noticed when I fed the data into wireshark were periods of time with no data, and I wasn’t quite sure why at first.
Based on what I could find on the tcpdump / BPF sites when tcpdump is busy processing existing data and is not able to take packets captured by BPF out of the queue fast enough, the kernel will drop them. If this occurs you will see the tcpdump message “ packets dropped by kernel” become non-zero:
tcpdump -w dns-capture.cap -s 1520 -ttt -vvv -i bond0 port 53
...... 9559 packets captured 12533 packets received by filter 2974 packets dropped by kernel
I started to do some digging to see why tcpdump couldn’t keep up, and after a bit of profiling I noticed that the program was spending an excessive amount of time resolving IPs to names. This processing was stalling the program from reading more data from the queue, and resulted in packets being dropped. Once I ran tcpdump with the “-n” (do not resolve IPs to names) option I no longer experienced this issue:
tcpdump -w dns-capture.cap -s 1520 -ttt -vvv -n -i bond0 port 53
..... 9339 packets captured 9339 packets received by filter 0 packets dropped by kernel
This stopped gaps from occurring in my wireshark display, and since wireshark can resolve IPs to names all was well. It’s really crazy how you can start debugging one issue and wind up debugging 3 - 4 more prior to solving the original problem. Debugging issues is definitely fun, and I guess it gives me plenty to write about. :) This past week I’ve learned more about the DNS protocol and the various implementations than I have in the past 10 years. It’s amazing how many cool nuggets of information are buried in the various DNS RFCs!