Using dnscap to debug DNS problems on Linux hosts

DNS can often make a sysadmin's life difficult, since a misconfigured entry or a broken authoritative DNS server will cause things to fail in odd ways. If you are fortunate enough to use Linux on your servers and desktops, you have a slew of utilities available to look at problems. I’ve discussed a few of my favourite DNS debugging utilities in past posts, and recently added the dnscap utility to this list.

Dnscap is a command line utility that allows you to view ALL of the DNS requests sent over an interface in a dig-like or binary format. While tcpdump and company display traffic to UDP and TCP port 53, dnscap will actually decode the entries and give you everything you need to debug an issue in one place.

To use this tool, run it with the “-i” option and the interface to monitor, along with either the “-g” option (dump the output in dig format) or the “-b” option (dump the output in binary):

$ dnscap -i eth0 -g

;@ 2011-01-26 16:33:21.892326 - 56 octets via eth0 (msg #0)
;: [192.168.144.91]:56239 -> [192.168.86.2]:53
;; ->>HEADER<<- opcode: QUERY, status: NOERROR, id: 62131
;; flags: rd; QUERY: 1, ANSWER: 0, AUTHORITY: 0, ADDITIONAL: 0
;;	google.com, type = A, class = IN
--
;@ 2011-01-26 16:33:21.896426 - 240 octets via eth0 (msg #1)
;: [192.168.86.2]:53 -> [192.168.144.91]:56239
;; ->>HEADER<<- opcode: QUERY, status: NOERROR, id: 62131
;; flags: qr rd ra; QUERY: 1, ANSWER: 3, AUTHORITY: 4, ADDITIONAL: 4
;;	google.com, type = A, class = IN
google.com.		1m31s IN A	74.125.157.99
google.com.		1m31s IN A	74.125.157.104
google.com.		1m31s IN A	74.125.157.147
google.com.		1d6h57m32s IN NS  ns2.google.com.
google.com.		1d6h57m32s IN NS  ns3.google.com.
google.com.		1d6h57m32s IN NS  ns4.google.com.
google.com.		1d6h57m32s IN NS  ns1.google.com.
ns1.google.com.		1d6h51m10s IN A  216.239.32.10
ns2.google.com.		1d6h51m10s IN A  216.239.34.10

The utility will then display all of the DNS requests on your console, and you can review the detailed request and authority data along with the record information. This is extremely handy for debugging problems, and I'm glad I came across this awesome little utility!

Configuring a caching only DNS server on Solaris hosts

While investigating a performance issue a few weeks back, I noticed that a couple of our Solaris hosts were sending tens of thousands of DNS requests to our authoritative DNS servers. Since the application was broken and unable to cache DNS records, I decided to configure a local caching only DNS server to reduce load on our DNS servers.

Creating a caching only name server on a Solaris host is a piece of cake. To begin, you will need to create a directory to store the bind zone files:

$ mkdir -p /var/named/conf

After this directory is created, you will need to place the 127.0.0.1, localhost and root.hints files in the conf directory. You can grab the 127.0.0.1 and localhost files from my site, and the root.hints file can be generated with the dig utility:

$ dig @a.root-servers.net . ns > /var/named/conf/root.hints

Next you will need to create a BIND configuration file (a sample bind configuration file is also available on my site). The BIND packages that ship with Solaris check for this file in /etc/named.conf by default, so it’s easiest to create it there (you can also hack the SMF start script, but that can get overwritten in the future and wipe out your changes). To start the caching only DNS server, you can enable the dns/server SMF service:

$ svcadm enable dns/server

If things started up properly, you should see log entries similar to the following in /var/adm/messages:

Jun 18 10:26:57 server named[7819]: [ID 873579 daemon.notice] starting BIND 9.6.1-P3
Jun 18 10:26:57 server named[7819]: [ID 873579 daemon.notice] built with --prefix=/usr --with-libtool --bindir=/usr/sbin --sbindir=/usr/sbin --libdir=/usr/lib/dns --sysconfdir=/etc --localstatedir=/var --with-openssl=/usr/sfw --enable-threads=yes --enable-devpoll=yes --enable-fixed-rrset --disable-openssl-version-check -DNS_RUN_PID_DIR=0

To test the caching only DNS server, you can use our trusty friend dig:

$ dig @127.0.0.1 a cnn.com

If that returns the correct A record, it’s a safe bet that the caching only name server is doing its job! To configure the host to send its queries to the local caching server, you will need to replace the nameserver entries in /etc/resolv.conf with the following:

nameserver 127.0.0.1

This will force resolution to the DNS server bound to localhost, and allow the local machine to cache query responses. DNS caching is good stuff, and setting this up on a Solaris machine is a piece of cake!
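
If you need to make the resolv.conf change on more than one host, the edit is simple enough to script. Here is a minimal Python sketch, assuming standard resolv.conf syntax; the function name is hypothetical, and it only transforms text, so writing the result back to /etc/resolv.conf is left to you.

```python
def point_resolver_at_localhost(resolv_conf_text, local_addr="127.0.0.1"):
    """Return resolv.conf text with all nameserver lines replaced by a
    single entry for local_addr.

    Other directives (search, domain, options) and comments are kept in
    their original order.
    """
    out = []
    inserted = False
    for line in resolv_conf_text.splitlines():
        fields = line.split()
        if fields and fields[0] == "nameserver":
            if not inserted:
                # Put the local entry where the first nameserver line was.
                out.append(f"nameserver {local_addr}")
                inserted = True
            continue  # drop the remaining nameserver entries
        out.append(line)
    if not inserted:
        # No nameserver line existed, so append one.
        out.append(f"nameserver {local_addr}")
    return "\n".join(out) + "\n"
```

For example, passing in a file containing a search line followed by two nameserver lines returns the search line followed by a single "nameserver 127.0.0.1" entry. Keep in mind that with only the local server listed, lookups on the host depend on named staying up.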

Getting DNS ping (aka nsping) to compile on Linux hosts

While debugging a DNS issue this week, I wanted to run my trusty old friend nsping on my Linux desktop. I grabbed the source from the FreeBSD source site, checked to make sure the bits were legit, then proceeded to compile it:

$ make
cc -g -c -o nsping.o nsping.c
In file included from nsping.c:13:
nsping.h:45: error: conflicting types for ‘dprintf’
/usr/include/stdio.h:399: note: previous declaration of ‘dprintf’ was here
nsping.c:613: error: conflicting types for ‘dprintf’
/usr/include/stdio.h:399: note: previous declaration of ‘dprintf’ was here
make: *** [nsping.o] Error 1

Erf! The source archive I downloaded didn’t compile, and from the error message it appears that the dprintf declaration in nsping.h conflicts with the dprintf that libc already declares in stdio.h. Instead of mucking around with map files, I changed all occurrences of dprintf to ddprintf. When I ran make again I got a new error:

$ make
cc -g -c -o nsping.o nsping.c
cc -g -c -o dns-lib.o dns-lib.c
dns-lib.c: In function ‘dns_query’:
dns-lib.c:22: warning: incompatible implicit declaration of built-in function ‘memset’
cc -g -c -o dns-rr.o dns-rr.c
dns-rr.c: In function ‘dns_rr_query’:
dns-rr.c:26: warning: return makes integer from pointer without a cast
cc -g -o nsping nsping.o dns-lib.o dns-rr.o
dns-rr.o: In function `dns_string':
/home/matty/Download/nsping-0.8/dns-rr.c:63: undefined reference to `__dn_comp'
collect2: ld returned 1 exit status
make: *** [nsping] Error 1

This error message indicates that the __dn_comp symbol couldn’t be resolved. This function typically resides in the resolver library, so working around this was as easy as adding “-lresolv” to the LIBS variable in the nsping Makefile:

LIBS = -lresolv

Once this change was made, everything compiled and ran flawlessly:

$ make
cc -g -o nsping nsping.o dns-lib.o dns-rr.o -lresolv

$ ./nsping -h
./nsping: option requires an argument -- 'h'
nsping [ -z | -h ] -p -t
-a -P
-T <-r | -R, recurse?>

Debugging this issue was a bunch of fun, and reading through the nsping source code was extremely educational. Not only did I learn more about how libresolv works, but I found out sys5 sucks:

#ifdef sys5
#warning "YOUR OPERATING SYSTEM SUCKS."

You gotta love developers who are straight and to the point! ;)

Preventing domain expiration article

I just came across Rick Moen’s Preventing Domain Expiration article. Rick did a great job with the article, and it’s cool to see that they took my domain-check shell script and implemented it in Perl. The Perl version supports more TLDs, and contains a bit more functionality than the bash implementation. If I get some time in the next few months, I will have to update the domain-check bash script to support the same TLDs as the Perl implementation. Great job Rick and Ben!!

Logfile format for BIND queries

While perusing my BIND query logs, I came across the following entry:

Nov 21 12:34:41 dns named[780]: [ID 866145 local0.info] client 1.2.3.4#32773: query: yikes.com IN MX -E

All of the text up to the record type (MX in this case) made sense, but I had no idea what the “-E” meant. Being the curious person I am, I dug through the BIND source code to locate the logging code. After a couple of find statements, I was able to locate the logging code in query.c:

ns_client_log(client, NS_LOGCATEGORY_QUERIES, NS_LOGMODULE_QUERY,
                     level, "query: %s %s %s %s%s%s", namebuf, classname,
                     typename, WANTRECURSION(client) ? "+" : "-",
                     (client->signer != NULL) ? "S": "",
                     (client->opt != NULL) ? "E" : "");

So a “+” in a query log entry indicates that the client requested recursion, while a “-” indicates that it did not. Per the code above, an “S” means that the query was signed, and the “E” means that the query used EDNS0. I would like to thank Knobee for his feedback on this post.
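
To make that concrete, here is a minimal Python sketch that decodes the tail of a query log line into named fields. It assumes the line follows the "query: %s %s %s %s%s%s" format string shown above; the function name and dictionary keys are hypothetical, and other BIND releases may log additional flag characters that this pattern does not handle.

```python
import re

# Matches "client ADDR#PORT: query: NAME CLASS TYPE FLAGS" at the end of a
# line, where FLAGS is "+" or "-" followed by an optional "S" and "E".
QUERY_RE = re.compile(
    r"client (?P<addr>[^#\s]+)#(?P<port>\d+): "
    r"query: (?P<name>\S+) (?P<cls>\S+) (?P<type>\S+) "
    r"(?P<flags>[+-]S?E?)\s*$"
)


def parse_query_log(line):
    """Return a dict describing one query log line, or None if the line
    does not match the expected format."""
    m = QUERY_RE.search(line)
    if m is None:
        return None
    flags = m.group("flags")
    return {
        "client": m.group("addr"),
        "port": int(m.group("port")),
        "name": m.group("name"),
        "class": m.group("cls"),
        "type": m.group("type"),
        "recursion_desired": flags[0] == "+",  # "+" set, "-" not set
        "signed": "S" in flags,                # client->signer != NULL
        "edns": "E" in flags,                  # client->opt != NULL
    }
```

Run against the yikes.com entry above, this reports an MX query from 1.2.3.4 port 32773 with recursion not requested, no signature, and EDNS in use.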

Measuring DNS latency with nsping

While debugging a DNS problem a few weeks back, I needed a way to measure the time it took a name server to respond to a DNS request. After poking around the OpenBSD ports collection, I came across the nsping utility. Nsping queries a DNS server passed on the command line, and reports the time it took the server to resolve a name. The following example shows how to use nsping to measure the time it takes to resolve the name prefetch.net on the name server ns2.dreamhost.com:

$ nsping -t 5 -h prefetch.net ns2.dreamhost.com

NSPING ns2.dreamhost.com (66.201.54.66): Hostname = "prefetch.net", Type = "IN A"
+ [   0 ]    46 bytes from 66.201.54.66:   76.224 ms [    0.000 san-avg ]
+ [   1 ]    46 bytes from 66.201.54.66:   79.862 ms [   78.043 san-avg ]
+ [   3 ]    46 bytes from 66.201.54.66:   79.902 ms [   78.663 san-avg ]
+ [   4 ]    46 bytes from 66.201.54.66:   79.912 ms [   78.975 san-avg ]
+ [   6 ]    46 bytes from 66.201.54.66:   79.920 ms [   79.164 san-avg ]
^C
Total Sent: [   7 ] Total Received: [   5 ] Missed: [   2 ] Lagged [   0 ]
Ave/Max/Min:   79.164 /   79.920 /   76.224

Each line contains a sequence number, the size of the response, and the time it took to complete the request. The summary line contains the number of requests that were sent to the server, the number of responses that were received and missed, and the average, maximum and minimum response times. If you want to use a resource record type other than “A” (the default resource record type), you can invoke nsping with the “-T” option and the resource record type to use:

$ nsping -t 5 -h prefetch.net -T mx ns1.dreamhost.com

NSPING ns1.dreamhost.com (66.33.206.206): Hostname = "prefetch.net", Type = "IN MX"
+ [   0 ]   136 bytes from 66.33.206.206:   73.875 ms [    0.000 san-avg ]
+ [   1 ]   136 bytes from 66.33.206.206:   79.905 ms [   76.890 san-avg ]
+ [   2 ]   136 bytes from 66.33.206.206:   80.476 ms [   78.085 san-avg ]
+ [   3 ]   136 bytes from 66.33.206.206:   80.030 ms [   78.572 san-avg ]
+ [   6 ]   136 bytes from 66.33.206.206:   80.004 ms [   78.858 san-avg ]
^C
Total Sent: [   7 ] Total Received: [   5 ] Missed: [   2 ] Lagged [   0 ]
Ave/Max/Min:   78.858 /   80.476 /   73.875
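
If nsping isn't available, the same kind of measurement can be approximated with a few lines of Python. This is a rough sketch of the general idea and not a description of how nsping itself is implemented: it builds a single-question query in the standard DNS wire format, sends it over UDP, and times the reply. The function names and the small QTYPES table are assumptions made for illustration.

```python
import random
import socket
import struct
import time

QTYPES = {"A": 1, "NS": 2, "MX": 15}  # small subset of RR type codes


def build_query(name, qtype="A", query_id=None, recurse=True):
    """Return (query_id, packet) for a one-question DNS query."""
    if query_id is None:
        query_id = random.randint(0, 0xFFFF)
    flags = 0x0100 if recurse else 0x0000  # RD (recursion desired) bit
    # Header: id, flags, QDCOUNT=1, ANCOUNT=0, NSCOUNT=0, ARCOUNT=0
    header = struct.pack("!HHHHHH", query_id, flags, 1, 0, 0, 0)
    # QNAME: length-prefixed labels terminated by a zero byte
    qname = b"".join(
        bytes([len(label)]) + label.encode("ascii")
        for label in name.split(".") if label
    ) + b"\x00"
    question = qname + struct.pack("!HH", QTYPES[qtype.upper()], 1)  # class IN
    return query_id, header + question


def time_query(server, name, qtype="A", timeout=5.0, port=53):
    """Send one query over UDP and return the round-trip time in ms, or
    None if no reply with a matching id arrives before a receive times out."""
    query_id, packet = build_query(name, qtype)
    with socket.socket(socket.AF_INET, socket.SOCK_DGRAM) as sock:
        sock.settimeout(timeout)  # applies to each recvfrom() call
        start = time.perf_counter()
        sock.sendto(packet, (server, port))
        try:
            while True:
                data, _ = sock.recvfrom(4096)
                # Ignore stray packets whose id does not match our query.
                if len(data) >= 2 and struct.unpack("!H", data[:2])[0] == query_id:
                    return (time.perf_counter() - start) * 1000.0
        except socket.timeout:
            return None
```

Calling time_query("ns2.dreamhost.com", "prefetch.net") in a loop and counting the None results would give a rough equivalent of the sent, received and missed counters in the nsping summary.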

Now to figure out why DNS responses are missing!