Creating Bind query log statistics with dnsrecon

A month or two back I was investigating a production issue and wanted to visualize our Bind query logs. The Bind statistics channel looked useful but there wasn’t enough data to help me troubleshoot my issue. In the spirit of software re-use I looked at a few opensource query log parsing utilities. The programs I found used MySQL and once again they didn’t have enough data to fit my needs. My needs were pretty simple:

– Summarize queries by record type
– Show the top # records requested
– Show the top # of clients querying the server
– Print DNS query histograms by minute
– Print DNS query histograms by hour
– Have an extended feature to list all of the clients querying a record
– Allow the records to be filtered by time periods

Instead of mucking around with these solutions I wrote dnsrecon. Dnsrecon takes one or more logs as an argument and produces a compact DNS query log report which can be viewed in a terminal window:

$ dnsrecon.py logs/* --histogram

Processing logfile ../logs/named.queries
Processing logfile ../logs/named.queries.0
Processing logfile ../logs/named.queries.1
Processing logfile ../logs/named.queries.2
Processing logfile ../logs/named.queries.3
Processing logfile ../logs/named.queries.4
Processing logfile ../logs/named.queries.5
Processing logfile ../logs/named.queries.6
Processing logfile ../logs/named.queries.7
Processing logfile ../logs/named.queries.8
Processing logfile ../logs/named.queries.9

Summary for 05-Nov-2016 10:31:36.230 - 08-Nov-2016 14:15:51.426

Total DNS_QUERIES processed : 9937837
  PTR    records requested : 6374013
  A      records requested : 3082344
  AAAA   records requested : 372332
  MX     records requested : 32593
  TXT    records requested : 23508
  SRV    records requested : 19815
  SOA    records requested : 19506
  NS     records requested : 6661
  DNSKEY records requested : 2286

Top  100  DNS names requested:
  prefetch.net : 81379
  sheldon.prefetch.net : 75244
  penny.prefetch.net : 54637
  ..... 

Top  100  DNS clients:
  blip.prefetch.net :  103680
  fmep.prefetch.net :  92486
  blurp.prefetch.net : 32456
  gorp.prefetch.net : 12324
  .....

Queries per minute:
  00: ******************* (149807)
  01: ******************* (149894)
  02: ******************************* (239495)
  03: *********************************************** (356239)
  04: ********************************************** (351916)
  05: ********************************************* (346121)
  06: ************************************************ (362635)
  07: ************************************************** (377293)
  08: ********************************************* (343376)
  .....

Queries per hour:
  00: ********* (325710)
  01: ********** (363579)
  02: ******** (304630)
  03: ******** (302274)
  04: ******** (296872)
  05: ******** (295430)
  .....

Over the course of my IT career I can’t recall how many times I’ve been asked IF a record is in use and WHO is using it. To help answer that question you can add the “–matrix” option to print domain names along with the names / IPs of the clients requesting those records. This produces a list similar to this:

prefetch.net
  |-- leonard.prefetch.net 87656
  |-- howard.prefetch.net 23456
  |-- bernadette.prefetch.net 3425

The top entry is the domain being requested and the entries below it are the clients asking questions about it. I’m looking to add the record type requested to the resolution matrix as well as –start and –end arguments to allow data to be summarized during a specific time period. Shoot me a pull request if you enhance the script or see a better way to do something.

Installing NVidia binary drivers on an Ubuntu 16.10 desktop

I recently upgraded my Ubuntu 16.10 desktop with an NVidia GeForce 730 graphics adapter. My experiences with the nouveau opensource driver haven’t been good so I decided to install the binary drivers from NVidia. Ubuntu makes this process INCREDIBLY easy. To install the latest NVidia drivers you can click the “Search your Computer” icon in the Unity menu, type in “Drivers” and then click additional drivers. That should present you with a screen similar to the following”

NVidia Binary Driver Installation Menu

Clicking “Using NDIVIA Binary Driver” and then “Apply Changes” should load the correct driver for your card. Once the driver loads and your X11 configs are updated you should be able to reboot into a desktop environment that is powered by the NVidia closed source graphics driver. So far it’s working well!

Getting more out of your Linux servers with moreutils

I accidentally came across the moreutils package a few years back and the day I did my life changed forever. This package contains some killer utilities and fill some nice gaps in the *NIX tool chain. Here is a list of the binaries in this package (descriptions taken from the man page of each utility):

chronic - runs a command quietly unless it fails
combine - combine sets of lines from two files using boolean operations
errno - look up errno names and descriptions
ifdata - get network interface info without parsing ifconfig output
ifne - Run command if the standard input is not empty
isutf8 - check whether files are valid UTF-8
lckdo - run a program with a lock held
mispipe - pipe two commands, returning the exit status of the first
pee - tee standard input to pipes
sponge - soak up standard input and write to a file
ts - timestamp input
vidir - edit directory
vipe - edit pipe
zrun - automatically uncompress arguments to command

I’m especially fond of errno, chronic and pee. But my favorite utilities have to be ifne and ts. Ifne is useful if you need to run a command if output is present. One such use is e-mailing someone if a monitoring program spits out an error:

$ hardware_monitor | mail -s “Problem detected with the hardare on `/bin/hostname` admins

The ts utility is just as useful. Say you have a program that randomly spits out lines of gik and you want to know when the lines of gik occurred. To get a timestamp you can pipe the programs output to ts:

$ gik_generator | ts
Nov 02 09:55:11 The world needs more cow bell!
Nov 02 09:55:12 The world needs more cow bell!
Nov 02 09:55:13 The world needs more cow bell!

I love coming across tools that make troubleshooting and shell scripting more enjoyable. Now we just need more cow bell!

Using Linux auditing to observe program executions and file accesses

Linux systems have a rather nice auditing framework built into the kernel. This framework consists of a kernel component that inspects and filters events and a userland daemon (auditd) that takes the events and archives them to persistent storage. These audit log events can then be searched and summarized with the ausearch and aureport utilities. The following ausearch command will dump all audit records for UID 7005:

$ ausearch --uid-all 7005 | head -4
time->Thu Oct 20 15:01:49 2016
type=PROCTITLE msg=audit(1476990109.442:153031): proctitle="-bash"
type=SYSCALL msg=audit(1476990109.442:153031): arch=c000003e syscall=1 success=yes exit=1024 a0=1 a1=55b32ff2fe40 a2=400 a3=6553726567616e61 items=0 ppid=18901 pid=19071 auid=7005 uid=7005 gid=7005 euid=7005 suid=7005 fsuid=7005 egid=7005 sgid=7005 fsgid=7005 tty=pts9 ses=6031 comm="ps" exe="/usr/bin/ps" key=(null)

Each audit event contains the type of event (system call executed, file read, etc.) as well as a bunch of user and system context. To control the types of events that get audited you can create an audit.rules file in /etc/audit/rules.d. This file can contain control rules, file system rules and system call rules. Control rules are used to control the behavior of the audit subsystem (e.g., what to do when the disk fills up, how many audit buffers to allocate, etc.). File system rules allow you to audit when files or directories are accessed and system call rules allow you to audit system calls (full list in /usr/include/asm/unistd_64.h).

To illustrate how useful auditing is lets say you are tasked with figuring out what a black box application does. Furthermore, lets assume this application runs as the user fmep with the UID 7005. To log EVERY system call generated by processes with UID 7005 we can add the following audit rule (**WARNING** DO NOT ENABLE THIS ON PRODUCTION SYSTEMS WITHOUT PROPER TESTING ***END WARNING***):

$ auditctl -a always,exit -S all -F auid=7005

The “-a” option tells auditd to append the rule to the chain, “always,exit” audits system call events at exit, “-S all” audits all system calls and the “-F auid=7005” filters events where the loginuid is set to 7005 (more info on AUID is available here). To verify the rule was added we can run auditctl with the “-l” option:

$ auditctl -l
-a always,exit -S all -F auid=7005

Once the rules are activated you will start seeing events similar to the following in /var/log/audit/audit.log:

type=SYSCALL msg=audit(1476990167.749:160759): arch=c000003e syscall=14 success=yes exit=0 a0=2 a1=7ffe53ea8fc0 a2=0 a3=8 items=0 ppid=18884 pid=18900 auid=7005 uid=7005 gid=7005 euid=7005 suid=7005 fsuid=7005 egid=7005 sgid=7005 fsgid=7005 tty=(none) ses=6031 comm=”sshd” exe=”/usr/sbin/sshd” key=(null)

Each system call audit event contains the system call number as well as a good deal of context (pid, uid, ppid, etc.) to describe the event. To better understand the list of commands and files being accessed by a given UID I wrote auditloginfo. This script parses the audit log and produces a report listing the commands executed, files accessed and a summary of the system calls made:

$  sudo auditloginfo.py --uid 7005
Audit report for AUID 7005

System calls made:
   read 207
   stat 179
   close 147
   open 142
   rt_sigprocmask 141
   clock_gettime 120
   write 88
   
Commands executed:
   clear  2
   /bin/basename bash  2
   uname -m  1
   ps -ef  1
   ls --color=auto -la  1

Files opened:
   /etc/ld.so.cache 17
   /lib64/libc.so.6 15
   /etc/passwd 14
   /usr/lib/locale/locale-archive 12
   /lib64/libpthread.so.0 8

I’ve found auditing rather useful and it’s amazing what you can discover when you enable it.

Tracking Linux UIDs across process creation

When you support mission critical systems it is critically important to know WHAT is running on your systems and WHO is running it. In the past it was sometimes a chore to tell which UID ran a program. This was especially true in the early days when programs fork()’ed and execve()’ed after escalating privileges.

The which UID ran a program mystery has been mostly solved (I say mostly because user namespaces throw a few kinks into the equation) by auditing and the PAM loginuid module. PAM loginuid helps solve this problem by maintaining a /proc/PID/loginuid entry with the UID of the user that interacted with a service (e.g., ssh) that is configured to use pam_loginuid. You can get a list of PAM loginuid’ed services with grep:

$ grep --files-with-matches loginuid /etc/pamd/* | paste - - - - -
atd	crond	gdm-autologin	gdm-fingerprint	gdm-password
gdm-pin	gdm-smartcard	lightdm	lightdm-autologin	login
lxdm	pluto	remote	sshd	systemd-user

Once the loginuid is set the original UID will follow process creation:

$ id
uid=1000(matty) gid=1000(matty) .....

$ cat /proc/$$/loginuid 
1000

$ sudo -i

$ id
uid=0(root) gid=0(root) groups=0(root)

# cat /proc/$$/loginuid 
1000

# bash

# cat /proc/$$/loginuid 
1000

Now this gets especially interesting when you enable auditing. If auditing is enabled and privileges are escalated each event will contain an AUID with the value of the actual user that ran a privileged command:

type=SYSCALL msg=audit(1477500276.323:92205): arch=c000003e syscall=231 a0=1 a1=3c a2=1 a3=3 items=0 ppid=30493 pid=30544 auid=7005 uid=7005 gid=7005 euid=7005 suid=7005 fsuid=7005 egid=7005 sgid=7005 fsgid=7005 tty=pts1 ses=1196 comm=”dmidecode” exe=”/usr/sbin/dmidecode” key=(null)

No more seeing uid=0 and having to track down the underlying user or application by digging through logs. This is also incredibly useful when conducting RCAs and doing postmordems. To learn more about this feature check out pam_loginuid(8), the pam_loginuid source and the following link from the samhein team.

Displaying device multi-path statistics on Linux hosts

Linux provides a number of useful utilities for viewing I/O statistics. Atop, iostat, iotop, etc. are extremely useful utilities but they don’t allow you to easily see I/O statistics for each path to a mapper device. To illustrate this lets say you have a mapper device named mpatha with four paths (represented by sdcl, sdnf, sdeq and sdpk in the output below):

$ multipath -ll mpatha
mpatha (12345456789) dm-95 MYARRAY, FastBizzle
size=4.0T features='1 queue_if_no_path' hwhandler='0' wp=rw
`-+- policy='round-robin 0' prio=1 status=active
  |- 1:0:1:25  sdcl 69:144  active ready running
  |- 3:0:2:25  sdnf 71:272  active ready running
  |- 1:0:2:25  sdeq 129:32  active ready running
  `- 3:0:3:25  sdpk 130:416 active ready running

To visualize the amount of I/O traversing down each path to a mapper device we need to combine iostat and grep:

$ iostat -xN 1 | egrep '(mpatha|sdcl|sdnf|sdeq|sdpk)'
sdcl              0.00     0.00    0.20    0.04     2.06     7.35    39.19     0.00    0.88    0.56    2.57   0.86   0.02
sdnf              0.00     0.00    0.20    0.04     2.43     7.09    39.66     0.00    0.81    0.45    2.68   0.79   0.02
sdeq              0.00     0.00    0.20    0.04     1.99     7.06    37.68     0.00    0.90    0.59    2.57   0.88   0.02
sdpk              0.00     0.00    0.20    0.04     1.93     7.03    37.33     0.00    0.79    0.45    2.63   0.77   0.02
mpatha     0.06     0.00    0.52    0.15     5.96    28.53    51.28     0.00    0.98    0.49    2.67   0.86   0.06

This solution requires you to look up the device you want to monitor and the default output is terrible! I wanted a cleaner solution so I wrote mpathstat. Mpathstat parses ‘multipath -ll’ to get the mapper devices on a system as well as the block devices that represent the paths to this device. The script then iterates over an iostat sample and combines everything in one location:

$ mpathstat.py 1
Device Name                                  Reads     Writes    KBytesR/S  KBytesW/S  Await   
mpatha                                       0.00      44.00     0.00       912.00     0.82    
|- sdcj                                      0.00      11.00     0.00       256.00     0.73    
|- sdpi                                      0.00      11.00     0.00       224.00     0.91    
|- sdeo                                      0.00      11.00     0.00       192.00     0.73    
|- sdnd                                      0.00      11.00     0.00       240.00     0.91    
mpathb                                       0.00      3.00      0.00       384.00     1.00    
|- sdmv                                      0.00      1.00      0.00       128.00     1.00    
|- sdcb                                      0.00      0.00      0.00       0.00       0.00    
|- sdpa                                      0.00      1.00      0.00       128.00     1.00    
|- sdeg                                      0.00      1.00      0.00       128.00     1.00    

This has helped me track down a couple of issues with one of our Enterprise storage arrays. If you make any enhancements to the script please send me a pull request on github!

« Older Entries