Using Linux auditing to observe program executions and file accesses

Linux systems have a rather nice auditing framework built into the kernel. This framework consists of a kernel component that inspects and filters events and a userland daemon (auditd) that takes the events and archives them to persistent storage. These audit log events can then be searched and summarized with the ausearch and aureport utilities. The following ausearch command will dump all audit records for UID 7005:

$ ausearch --uid-all 7005 | head -4
time->Thu Oct 20 15:01:49 2016
type=PROCTITLE msg=audit(1476990109.442:153031): proctitle="-bash"
type=SYSCALL msg=audit(1476990109.442:153031): arch=c000003e syscall=1 success=yes exit=1024 a0=1 a1=55b32ff2fe40 a2=400 a3=6553726567616e61 items=0 ppid=18901 pid=19071 auid=7005 uid=7005 gid=7005 euid=7005 suid=7005 fsuid=7005 egid=7005 sgid=7005 fsgid=7005 tty=pts9 ses=6031 comm="ps" exe="/usr/bin/ps" key=(null)

Each audit event contains the type of event (system call executed, file read, etc.) as well as a bunch of user and system context. To control the types of events that get audited you can create an audit.rules file in /etc/audit/rules.d. This file can contain control rules, file system rules and system call rules. Control rules are used to control the behavior of the audit subsystem (e.g., what to do when the disk fills up, how many audit buffers to allocate, etc.). File system rules allow you to audit when files or directories are accessed and system call rules allow you to audit system calls (full list in /usr/include/asm/unistd_64.h).

To illustrate how useful auditing is lets say you are tasked with figuring out what a black box application does. Furthermore, lets assume this application runs as the user fmep with the UID 7005. To log EVERY system call generated by processes with UID 7005 we can add the following audit rule (**WARNING** DO NOT ENABLE THIS ON PRODUCTION SYSTEMS WITHOUT PROPER TESTING ***END WARNING***):

$ auditctl -a always,exit -S all -F auid=7005

The “-a” option tells auditd to append the rule to the chain, “always,exit” audits system call events at exit, “-S all” audits all system calls and the “-F auid=7005” filters events where the loginuid is set to 7005 (more info on AUID is available here). To verify the rule was added we can run auditctl with the “-l” option:

$ auditctl -l
-a always,exit -S all -F auid=7005

Once the rules are activated you will start seeing events similar to the following in /var/log/audit/audit.log:

type=SYSCALL msg=audit(1476990167.749:160759): arch=c000003e syscall=14 success=yes exit=0 a0=2 a1=7ffe53ea8fc0 a2=0 a3=8 items=0 ppid=18884 pid=18900 auid=7005 uid=7005 gid=7005 euid=7005 suid=7005 fsuid=7005 egid=7005 sgid=7005 fsgid=7005 tty=(none) ses=6031 comm=”sshd” exe=”/usr/sbin/sshd” key=(null)

Each system call audit event contains the system call number as well as a good deal of context (pid, uid, ppid, etc.) to describe the event. To better understand the list of commands and files being accessed by a given UID I wrote auditloginfo. This script parses the audit log and produces a report listing the commands executed, files accessed and a summary of the system calls made:

$  sudo --uid 7005
Audit report for AUID 7005

System calls made:
   read 207
   stat 179
   close 147
   open 142
   rt_sigprocmask 141
   clock_gettime 120
   write 88
Commands executed:
   clear  2
   /bin/basename bash  2
   uname -m  1
   ps -ef  1
   ls --color=auto -la  1

Files opened:
   /etc/ 17
   /lib64/ 15
   /etc/passwd 14
   /usr/lib/locale/locale-archive 12
   /lib64/ 8

I’ve found auditing rather useful and it’s amazing what you can discover when you enable it.

Tracking Linux UIDs across process creation

When you support mission critical systems it is critically important to know WHAT is running on your systems and WHO is running it. In the past it was sometimes a chore to tell which UID ran a program. This was especially true in the early days when programs fork()’ed and execve()’ed after escalating privileges.

The which UID ran a program mystery has been mostly solved (I say mostly because user namespaces throw a few kinks into the equation) by auditing and the PAM loginuid module. PAM loginuid helps solve this problem by maintaining a /proc/PID/loginuid entry with the UID of the user that interacted with a service (e.g., ssh) that is configured to use pam_loginuid. You can get a list of PAM loginuid’ed services with grep:

$ grep --files-with-matches loginuid /etc/pamd/* | paste - - - - -
atd	crond	gdm-autologin	gdm-fingerprint	gdm-password
gdm-pin	gdm-smartcard	lightdm	lightdm-autologin	login
lxdm	pluto	remote	sshd	systemd-user

Once the loginuid is set the original UID will follow process creation:

$ id
uid=1000(matty) gid=1000(matty) .....

$ cat /proc/$$/loginuid 

$ sudo -i

$ id
uid=0(root) gid=0(root) groups=0(root)

# cat /proc/$$/loginuid 

# bash

# cat /proc/$$/loginuid 

Now this gets especially interesting when you enable auditing. If auditing is enabled and privileges are escalated each event will contain an AUID with the value of the actual user that ran a privileged command:

type=SYSCALL msg=audit(1477500276.323:92205): arch=c000003e syscall=231 a0=1 a1=3c a2=1 a3=3 items=0 ppid=30493 pid=30544 auid=7005 uid=7005 gid=7005 euid=7005 suid=7005 fsuid=7005 egid=7005 sgid=7005 fsgid=7005 tty=pts1 ses=1196 comm=”dmidecode” exe=”/usr/sbin/dmidecode” key=(null)

No more seeing uid=0 and having to track down the underlying user or application by digging through logs. This is also incredibly useful when conducting RCAs and doing postmordems. To learn more about this feature check out pam_loginuid(8), the pam_loginuid source and the following link from the samhein team.

Displaying device multi-path statistics on Linux hosts

Linux provides a number of useful utilities for viewing I/O statistics. Atop, iostat, iotop, etc. are extremely useful utilities but they don’t allow you to easily see I/O statistics for each path to a mapper device. To illustrate this lets say you have a mapper device named mpatha with four paths (represented by sdcl, sdnf, sdeq and sdpk in the output below):

$ multipath -ll mpatha
mpatha (12345456789) dm-95 MYARRAY, FastBizzle
size=4.0T features='1 queue_if_no_path' hwhandler='0' wp=rw
`-+- policy='round-robin 0' prio=1 status=active
  |- 1:0:1:25  sdcl 69:144  active ready running
  |- 3:0:2:25  sdnf 71:272  active ready running
  |- 1:0:2:25  sdeq 129:32  active ready running
  `- 3:0:3:25  sdpk 130:416 active ready running

To visualize the amount of I/O traversing down each path to a mapper device we need to combine iostat and grep:

$ iostat -xN 1 | egrep '(mpatha|sdcl|sdnf|sdeq|sdpk)'
sdcl              0.00     0.00    0.20    0.04     2.06     7.35    39.19     0.00    0.88    0.56    2.57   0.86   0.02
sdnf              0.00     0.00    0.20    0.04     2.43     7.09    39.66     0.00    0.81    0.45    2.68   0.79   0.02
sdeq              0.00     0.00    0.20    0.04     1.99     7.06    37.68     0.00    0.90    0.59    2.57   0.88   0.02
sdpk              0.00     0.00    0.20    0.04     1.93     7.03    37.33     0.00    0.79    0.45    2.63   0.77   0.02
mpatha     0.06     0.00    0.52    0.15     5.96    28.53    51.28     0.00    0.98    0.49    2.67   0.86   0.06

This solution requires you to look up the device you want to monitor and the default output is terrible! I wanted a cleaner solution so I wrote mpathstat. Mpathstat parses ‘multipath -ll’ to get the mapper devices on a system as well as the block devices that represent the paths to this device. The script then iterates over an iostat sample and combines everything in one location:

$ 1
Device Name                                  Reads     Writes    KBytesR/S  KBytesW/S  Await   
mpatha                                       0.00      44.00     0.00       912.00     0.82    
|- sdcj                                      0.00      11.00     0.00       256.00     0.73    
|- sdpi                                      0.00      11.00     0.00       224.00     0.91    
|- sdeo                                      0.00      11.00     0.00       192.00     0.73    
|- sdnd                                      0.00      11.00     0.00       240.00     0.91    
mpathb                                       0.00      3.00      0.00       384.00     1.00    
|- sdmv                                      0.00      1.00      0.00       128.00     1.00    
|- sdcb                                      0.00      0.00      0.00       0.00       0.00    
|- sdpa                                      0.00      1.00      0.00       128.00     1.00    
|- sdeg                                      0.00      1.00      0.00       128.00     1.00    

This has helped me track down a couple of issues with one of our Enterprise storage arrays. If you make any enhancements to the script please send me a pull request on github!

Making Python code more readable with list comprehensions

This post is added as a reference for myself.

As I mentioned previously I’ve been spending a good deal of time learning how to write efficient and readable Python code. Jeff Knupp’s Idiomatic Python videos are absolutely incredible and have definitely helped me become a better programmer. In his first video he turned me on to list comprehensions and how they can make code more readable. Prior to watching this video I had a tendency to use multi line iterators similar to this:

def lower_a_list(list):
    ml = []
    for word in list:
    return ml

After watching Jeff do some crazy awesome python sorcery on I’ve started to use list comprehensions. Here is a refactored lower_a_list() that uses list comprehensions:

def lower_a_list(list):
    return [word.lower() for word in list]

I find the second much more readable and you can get the gist of what’s going on by scanning from right to left. Huge thanks to Jeff for putting out this amazing series!

Making sense of docker storage drivers

Docker has a pluggable storage architecture which currently contains 6 drivers.

AUFS - Original docker storage driver.
OverlayFS Driver built on top of overlayfs.
Btrfs Driver built on top of brtfs.
Device Mapper Driver built on top of the device mapper.
ZFS Driver built on top of the ZFS file system.
VFS A VFS-layer driver that isn't considered suitable for production. 

If you have docker installed you can run ‘docker info’ to see which driver you are using:

$ docker info | grep “Storage Driver:”
Storage Driver: devicemapper

Picking the right driver isn’t straightforward due to how fast docker and the storage drivers are evolving. The docker documentation has some excellent suggestions and you can’t go wrong using the most widely used drivers. I have hit a couple of bugs with the overlayfs driver and I have never bothered with the devicemapper driver with loopback files (vs. the device mapper driver w/ direct LVM) because of Jason’s post.

My biggest storage lesson learned (i.e., I do this because I hit bugs) from the past year is to give docker a chunk of dedicated storage. This space can reside in your root volume group, a dedicated volume group or in a partition. To use a dedicated volume group you can add “VG=VOLUME_GROUP” to /etc/sysconfig/docker-storage-setup:

$ cat /etc/sysconfig/docker-storage-setup

To use a dedicate disk you can add “DEV=BLOCK_DEVICE” to /etc/sysconfig/docker-storage-setup:

$ cat /etc/sysconfig/docker-storage-setup

If either of these variables are set docker-storage-setup will create an LVM thin pool which docker will use to layer images. This layering is the foundation that docker containers are built on top of.

If you change VG or DEVS and docker is operational you will need to backup up your images, clean up /var/lib/docker and then run docker-storage-setup to apply the changes. The following shows what happens if docker-storage-setup is run w/o any options set:

$ docker-storage-setup
  Rounding up size to full physical extent 412.00 MiB
  Logical volume "docker-poolmeta" created.
  Logical volume "docker-pool" created.
  Converted docker/docker-pool to thin pool.
  Logical volume docker/docker-pool changed.

This create the data and metadata volumes in the root volume group and updates the docker configuration. If anyone is using the brtfs or zfs storage drivers shoot me a note to let me know what your experience has been.

Viewing Linux tape drive statistics with tapestat

A while back I wrote a blog entry showing how to get tape drives statistics with systemtap. This script wasn’t very reliable and I would frequently see it crash after collecting just a few samples. Due to the work of some amazing Linux kernel engineers I no longer have to touch systemtap. Recent Linux kernels now expose a number of incredibly useful statistics through the /sys file system:

$ pwd

$ ls -l
total 0
-r--r--r-- 1 root root 4096 Oct 10 16:15 in_flight
-r--r--r-- 1 root root 4096 Oct 10 16:15 io_ns
-r--r--r-- 1 root root 4096 Oct 10 16:15 other_cnt
-r--r--r-- 1 root root 4096 Oct 10 15:30 read_byte_cnt
-r--r--r-- 1 root root 4096 Oct 10 15:30 read_cnt
-r--r--r-- 1 root root 4096 Oct 10 16:15 read_ns
-r--r--r-- 1 root root 4096 Oct 10 16:15 resid_cnt
-r--r--r-- 1 root root 4096 Oct 10 15:30 write_byte_cnt
-r--r--r-- 1 root root 4096 Oct 10 15:30 write_cnt
-r--r--r-- 1 root root 4096 Oct 10 16:15 write_ns

There is also a tapestats utility in the syststat package that can be used to summarize these statistics:

$ tapestat -z 1

Linux 2.6.32-642.1.1.el6.x86_64 (wolfie)        10/10/2016      _x86_64_        (24 CPU)

Tape:     r/s     w/s   kB_read/s   kB_wrtn/s %Rd %Wr %Oa    Rs/s    Ot/s
st0         0     370           0       94899   0  22  22       0       0
st1         0     367           0       93971   0  18  19       0       0
st2         0     315           0       80885   0  19  19       0       0
st3         0      27           0        6979   0   1   1       0       0

Tape:     r/s     w/s   kB_read/s   kB_wrtn/s %Rd %Wr %Oa    Rs/s    Ot/s
st0         0     648           0      165888   0  30  30       0       0
st2         0     362           0       92928   0  17  17       0       0

This is a useful addition and I no longer have to worry about systemtap croaking when I’m tracking down issues.