I accidentally came across the moreutils package a few years back and the day I did my life changed forever. This package contains some killer utilities and fill some nice gaps in the *NIX tool chain. Here is a list of the binaries in this package (descriptions taken from the man page of each utility):
chronic - runs a command quietly unless it fails
combine - combine sets of lines from two files using boolean operations
errno - look up errno names and descriptions
ifdata - get network interface info without parsing ifconfig output
ifne - Run command if the standard input is not empty
isutf8 - check whether files are valid UTF-8
lckdo - run a program with a lock held
mispipe - pipe two commands, returning the exit status of the first
pee - tee standard input to pipes
sponge - soak up standard input and write to a file
ts - timestamp input
vidir - edit directory
vipe - edit pipe
zrun - automatically uncompress arguments to command
I’m especially fond of errno, chronic and pee. But my favorite utilities have to be ifne and ts. Ifne is useful if you need to run a command if output is present. One such use is e-mailing someone if a monitoring program spits out an error:
$ hardware_monitor | mail -s "Problem detected with the hardare on $(/bin/hostname)" admins
The ts utility is just as useful. Say you have a program that randomly spits out lines of gik and you want to know when the lines of gik occurred. To get a timestamp you can pipe the programs output to ts:
$ gik_generator | ts
Nov 02 09:55:11 The world needs more cow bell!
Nov 02 09:55:12 The world needs more cow bell!
Nov 02 09:55:13 The world needs more cow bell!
I love coming across tools that make troubleshooting and shell scripting more enjoyable. Now we just need more cow bell!
Linux systems have a rather nice auditing framework built into the kernel. This framework consists of a kernel component that inspects and filters events and a userland daemon (auditd) that takes the events and archives them to persistent storage. These audit log events can then be searched and summarized with the ausearch and aureport utilities. The following ausearch command will dump all audit records for UID 7005:
$ ausearch --uid-all 7005 | head -4
time->Thu Oct 20 15:01:49 2016
type=PROCTITLE msg=audit(1476990109.442:153031): proctitle="-bash"
type=SYSCALL msg=audit(1476990109.442:153031): arch=c000003e syscall=1 success=yes exit=1024 a0=1 a1=55b32ff2fe40 a2=400 a3=6553726567616e61 items=0 ppid=18901 pid=19071 auid=7005 uid=7005 gid=7005 euid=7005 suid=7005 fsuid=7005 egid=7005 sgid=7005 fsgid=7005 tty=pts9 ses=6031 comm="ps" exe="/usr/bin/ps" key=(null)
Each audit event contains the type of event (system call executed, file read, etc.) as well as a bunch of user and system context. To control the types of events that get audited you can create an audit.rules file in /etc/audit/rules.d. This file can contain control rules, file system rules and system call rules. Control rules are used to control the behavior of the audit subsystem (e.g., what to do when the disk fills up, how many audit buffers to allocate, etc.). File system rules allow you to audit when files or directories are accessed and system call rules allow you to audit system calls (full list in /usr/include/asm/unistd_64.h).
To illustrate how useful auditing is lets say you are tasked with figuring out what a black box application does. Furthermore, lets assume this application runs as the user fmep with the UID 7005. To log EVERY system call generated by processes with UID 7005 we can add the following audit rule (WARNING: DO NOT ENABLE THIS ON PRODUCTION SYSTEMS WITHOUT PROPER TESTING :END WARNING):
$ auditctl -a always,exit -S all -F auid=7005
The “-a” option tells auditd to append the rule to the chain, “always,exit” audits system call events at exit, “-S all” audits all system calls and the “-F auid=7005” filters events where the loginuid is set to 7005 (more info on AUID is available here). To verify the rule was added we can run auditctl with the “-l” option:
$ auditctl -l
-a always,exit -S all -F auid=7005
Once the rules are activated you will start seeing events similar to the following in /var/log/audit/audit.log:
*type=SYSCALL msg=audit(1476990167.749:160759): arch=c000003e syscall=14
success=yes exit=0 a0=2 a1=7ffe53ea8fc0 a2=0 a3=8 items=0 ppid=18884
pid=18900 auid=7005 uid=7005 gid=7005 euid=7005 suid=7005 fsuid=7005
egid=7005 sgid=7005 fsgid=7005 tty=(none) ses=6031 comm="sshd"
exe="/usr/sbin/sshd" key=(null)*
Each system call audit event contains the system call number as well as a good deal of context (pid, uid, ppid, etc.) to describe the event. To better understand the list of commands and files being accessed by a given UID I wrote auditloginfo. This script parses the audit log and produces a report listing the commands executed, files accessed and a summary of the system calls made:
$ sudo auditloginfo.py --uid 7005
Audit report for AUID 7005
System calls made:
read 207
stat 179
close 147
open 142
rt_sigprocmask 141
clock_gettime 120
write 88
Commands executed:
clear 2
/bin/basename bash 2
uname -m 1
ps -ef 1
ls --color=auto -la 1
Files opened:
/etc/ld.so.cache 17
/lib64/libc.so.6 15
/etc/passwd 14
/usr/lib/locale/locale-archive 12
/lib64/libpthread.so.0 8
I’ve found auditing rather useful and it’s amazing what you can discover when you enable it.
When you support mission critical systems it is critically important to know WHAT is running on your systems and WHO is running it. In the past it was sometimes a chore to tell which UID ran a program. This was especially true in the early days when programs fork()‘ed and execve()‘ed after escalating privileges.
The which UID ran a program mystery has been mostly solved (I say mostly because user namespaces throw a few kinks into the equation) by auditing and the PAM loginuid module. PAM loginuid helps solve this problem by maintaining a /proc/PID/loginuid entry with the UID of the user that interacted with a service (e.g., ssh) that is configured to use pam_loginuid. You can get a list of PAM loginuid’ed services with grep:
$ grep --files-with-matches loginuid /etc/pamd/ | paste - - - - -
atd crond gdm-autologin gdm-fingerprint gdm-password
gdm-pin gdm-smartcard lightdm lightdm-autologin login
lxdm pluto remote sshd systemd-user
Once the loginuid is set the original UID will follow process creation:
$ id
uid=1000(matty) gid=1000(matty) .....
$ cat /proc//loginuid
1000
$ sudo -i
$ id
uid=0(root) gid=0(root) groups=0(root)
$ cat /proc/$$/loginuid
1000
$ bash
$ cat /proc/$$/loginuid
1000
Now this gets especially interesting when you enable auditing. If auditing is enabled and privileges are escalated each event will contain an AUID with the value of the actual user that ran a privileged command:
type=SYSCALL msg=audit(1477500276.323:92205): arch=c000003e syscall=231
a0=1 a1=3c a2=1 a3=3 items=0 ppid=30493 pid=30544 **auid=7005** uid=7005
gid=7005 euid=7005 suid=7005 fsuid=7005 egid=7005 sgid=7005 fsgid=7005
tty=pts1 ses=1196 comm="dmidecode" exe="/usr/sbin/dmidecode" key=(null)
No more seeing uid=0 and having to track down the underlying user or application by digging through logs. This is also incredibly useful when conducting RCAs and doing postmordems. To learn more about this feature check out pam_loginuid(8), the pam_loginuid source and the following link from the samhein team.
Linux provides a number of useful utilities for viewing I/O statistics. Atop, iostat, iotop, etc. are extremely useful utilities but they don’t allow you to easily see I/O statistics for each path to a mapper device. To illustrate this lets say you have a mapper device named mpatha with four paths (represented by sdcl, sdnf, sdeq and sdpk in the output below):
$ multipath -ll mpatha
mpatha (12345456789) dm-95 MYARRAY, FastBizzle
size=4.0T features='1 queue_if_no_path' hwhandler='0' wp=rw
`-+- policy='round-robin 0' prio=1 status=active
|- 1:0:1:25 sdcl 69:144 active ready running
|- 3:0:2:25 sdnf 71:272 active ready running
|- 1:0:2:25 sdeq 129:32 active ready running
`- 3:0:3:25 sdpk 130:416 active ready running
To visualize the amount of I/O traversing down each path to a mapper device we need to combine iostat and grep:
$ iostat -xN 1 | egrep '(mpatha|sdcl|sdnf|sdeq|sdpk)'
sdcl 0.00 0.00 0.20 0.04 2.06 7.35 39.19 0.00 0.88 0.56 2.57 0.86 0.02
sdnf 0.00 0.00 0.20 0.04 2.43 7.09 39.66 0.00 0.81 0.45 2.68 0.79 0.02
sdeq 0.00 0.00 0.20 0.04 1.99 7.06 37.68 0.00 0.90 0.59 2.57 0.88 0.02
sdpk 0.00 0.00 0.20 0.04 1.93 7.03 37.33 0.00 0.79 0.45 2.63 0.77 0.02
mpatha 0.06 0.00 0.52 0.15 5.96 28.53 51.28 0.00 0.98 0.49 2.67 0.86 0.06
This solution requires you to look up the device you want to monitor and the default output is terrible! I wanted a cleaner solution so I wrote mpathstat. Mpathstat parses ‘multipath -ll’ to get the mapper devices on a system as well as the block devices that represent the paths to this device. The script then iterates over an iostat sample and combines everything in one location:
$ mpathstat.py 1
Device Name Reads Writes KBytesR/S KBytesW/S Await
mpatha 0.00 44.00 0.00 912.00 0.82
|- sdcj 0.00 11.00 0.00 256.00 0.73
|- sdpi 0.00 11.00 0.00 224.00 0.91
|- sdeo 0.00 11.00 0.00 192.00 0.73
|- sdnd 0.00 11.00 0.00 240.00 0.91
mpathb 0.00 3.00 0.00 384.00 1.00
|- sdmv 0.00 1.00 0.00 128.00 1.00
|- sdcb 0.00 0.00 0.00 0.00 0.00
|- sdpa 0.00 1.00 0.00 128.00 1.00
|- sdeg 0.00 1.00 0.00 128.00 1.00
This has helped me track down a couple of issues with one of our Enterprise storage arrays. If you make any enhancements to the script please send me a pull request on github!
This post is added as a reference for myself.
As I mentioned previously I’ve been spending a good deal of time learning how to write efficient and readable Python code. Jeff Knupp’s Idiomatic Python videos are absolutely incredible and have definitely helped me become a better programmer. In his first video he turned me on to list comprehensions and how they can make code more readable. Prior to watching this video I had a tendency to use multi line iterators similar to this:
def lower_a_list(list):
ml = []
for word in list:
ml.append(word.lower())
return ml
After watching Jeff do some crazy awesome python sorcery on bad.py I’ve started to use list comprehensions. Here is a refactored lower_a_list() that uses list comprehensions:
def lower_a_list(list):
return [word.lower() for word in list]
I find the second much more readable and you can get the gist of what’s going on by scanning from right to left. Huge thanks to Jeff for putting out this amazing series!