Riemann and the case of the unhappy system call

This past weekend I spent a good deal of time playing with riemann. Riemann is a powerful stream processor and I’m hoping to use it to correlate and analyze metrics from disparate data sources. After downloading and installing it I received the following error:

$ ./riemann
ERROR [2016-10-10 12:21:36,614] main - riemann.bin - Couldn't start
java.util.concurrent.ExecutionException: java.lang.UnsatisfiedLinkError: /tmp/libnetty-transport-native-epoll3719688563306389605.so: /tmp/libnetty-transport-native-epoll3719688563306389605.so: failed to map segment from shared object

Not exactly what I was expecting on our first date, but I guess riemann plays hard to get. :) To make a little more sense out of this error I fired up strace to retrieve the ERRNO value and to see which system call was failing:

$ strace -f ./riemann
[pid 9826] open("/tmp/libnetty-transport-native-epoll27354233456383270.so", O_RDONLY|O_CLOEXEC) = 45
[pid 9826] read(45, "\177ELF\2\1\1\0\0\0\0\0\0\0\0\0\3\0>\0\1\0\0\0\240>\0\0\0\0\0\0"..., 832) = 832
[pid 9826] fstat(45, {st_mode=S_IFREG|0644, st_size=63749, ...}) = 0
[pid 9826] mmap(NULL, 2146168, PROT_READ|PROT_EXEC, MAP_PRIVATE|MAP_DENYWRITE, 45, 0) = -1 EPERM (Operation not permitted)

Now that hits the spot! mmap() is getting an EPERM which the mmap(2) manual page describes as:

“EPERM The prot argument asks for PROT_EXEC but the mapped area belongs to a file on a filesystem that was mounted no-exec.”

One problem solved! The /tmp file system is mounted with the noexec flag, which prohibits the mmap() from running with PROT_EXEC set. Here’s how /tmp is mounted:

$ mount | grep "/tmp"
tmpfs on /tmp type tmpfs (rw,noexec)

I’m not one to disable security measures so I went looking for a workaround. After reading through the netty source code for 20 minutes I came across this nugget:

            f = toDirectory(SystemPropertyUtil.get("java.io.tmpdir"));
            if (f != null) {
                logger.debug("-Dio.netty.tmpdir: " + f + " (java.io.tmpdir)");
                return f;
            }

Netty uses the java.io.tmpdir property to craft the temporary file location. Further digging showed that riemann passes properties to the Java runtime through the EXTRA_JAVA_OPTS variable:

exec java $EXTRA_JAVA_OPTS $OPTS -cp "$JAR" riemann.bin "$COMMAND" "$CONFIG"

Furthermore, you can pass properties to the Java runtime via the “-D” option. So to make riemann happy I set the EXTRA_JAVA_OPTS environment variable to “-Djava.io.tmpdir=/path/to/new/tmpdir” and fired off riemann:

$ cd $RIEMANN_HOME && EXTRA_JAVA_OPTS=-Djava.io.tmpdir=/sfw/riemann/latest/tmp ./riemann

Bingo! Riemann started and I was able to start working with it. Clojure takes me back to my Lisp programming days in college. Glad vim has () matching built in! :)

Ganglia never met a quote it didn’t like. Wait it did …

This week I encountered a weird issue while developing a new Ganglia plug-in. After moving my Ganglia processes to a Docker container I noticed that the grid overview images weren’t displaying. This was a new Ganglia installation, so I figured I typoed something in the gmetad.conf configuration file. I reviewed the file in my git repo and everything looked perfectly fine. Any Apache errors? Gak!:

$ tail -6 /var/log/httpd/error_log

sh: -c: line 0: unexpected EOF while looking for matching `''
sh: -c: line 1: syntax error: unexpected end of file
sh: -c: line 0: unexpected EOF while looking for matching `''
sh: -c: line 1: syntax error: unexpected end of file
sh: -c: line 0: unexpected EOF while looking for matching `''
sh: -c: line 1: syntax error: unexpected end of file

Well, those messages are super helpful. Something is being run through /bin/sh and it’s missing a matching single quote, but what command or script? Luckily I follow the world-famous performance expert Brendan Gregg’s blog, so I was aware of the Linux eBPF and bcc project. If you aren’t familiar with this amazing project, here is a short blurb from their website:

“BCC is a toolkit for creating efficient kernel tracing and manipulation programs, and includes several useful tools and examples. It makes use of extended BPF (Berkeley Packet Filters), formally known as eBPF, a new feature that was first added to Linux 3.15. Much of what BCC uses requires Linux 4.1 and above.”

I knew Brendan ported a large number of the DTraceToolkit scripts to bcc, so I went searching for execsnoop to see what was being executed:

$ ./execsnoop

PCOMM            PID    PPID   RET ARGS
sh               18748  18350    0   /usr/bin/rrdtool
rrdtool          18748  18350    0 /usr/bin/rrdtool
sh               18749  18350    0   /usr/bin/rrdtool graph /dev/null  --start -3600s --end now DEF:avg='/ganglia/rrds// __SummaryInfo__/cpu_num.rrd':'sum':AVERAGE PR

Bingo! Now I had a lead, but it appears execsnoop limits the arguments displayed (note to self: figure out why). So let’s go to the Ganglia source code to see what’s up:

$ cd /usr/share/ganglia

$ grep rrdtool *.php
*** lots of output ***

In the output above I noticed several exec() statements in graph.php so I decided to begin there. Starting from the main code block and reading down I came across the following:

if ($debug) {
  error_log("Final rrdtool command:  $command");
}

Digging further I noticed that debug output (including the full string that is being passed to the shell) could be set through a query string variable:

$debug = isset($_GET['debug']) ? clean_number(sanitize($_GET["debug"])) : 0;

Sweet! Adding “&debug=1” to the end of the ganglia URL got me a useful log entry in my error_log:

[Fri Oct 07 13:18:01.429310 2016] [:error] [pid 19884] [client] Final rrdtool command:  /usr/bin/rrdtool graph -   --start '-3600s' --end now --width 650 --height 300 --title 'Matty\\'s Bodacious Grid Load last hour' --vertical-label 'Loads/Procs' --lower-limit 0 --slope-mode   DEF:'a0'='/ganglia/rrds//__SummaryInfo__/load_one.rrd':'sum':AVERAGE  DEF:'a1'='/ganglia/rrds//__SummaryInfo__/cpu_num.rrd':'num':AVERAGE  DEF:'a2'='/ganglia/rrds//__SummaryInfo__/cpu_num.rrd':'sum':AVERAGE  DEF:'a3'='/ganglia/rrds//__SummaryInfo__/proc_run.rrd':'sum':AVERAGE   AREA:'a0'#BBBBBB:'1-min' VDEF:a0_last=a0,LAST VDEF:a0_min=a0,MINIMUM VDEF:a0_avg=a0,AVERAGE VDEF:a0_max=a0,MAXIMUM GPRINT:'a0_last':'Now\\:%5.1lf%s' GPRINT:'a0_min':'Min\\:%5.1lf%s' GPRINT:'a0_avg':'Avg\\:%5.1lf%s' GPRINT:'a0_max':'Max\\:%5.1lf%s\\l' LINE2:'a1'#00FF00:'Nodes' VDEF:a1_last=a1,LAST VDEF:a1_min=a1,MINIMUM VDEF:a1_avg=a1,AVERAGE VDEF:a1_max=a1,MAXIMUM GPRINT:'a1_last':'Now\\:%5.1lf%s' GPRINT:'a1_min':'Min\\:%5.1lf%s' GPRINT:'a1_avg':'Avg\\:%5.1lf%s' GPRINT:'a1_max':'Max\\:%5.1lf%s\\l' LINE2:'a2'#FF0000:'CPUs ' VDEF:a2_last=a2,LAST VDEF:a2_min=a2,MINIMUM VDEF:a2_avg=a2,AVERAGE VDEF:a2_max=a2,MAXIMUM GPRINT:'a2_last':'Now\\:%5.1lf%s' GPRINT:'a2_min':'Min\\:%5.1lf%s' GPRINT:'a2_avg':'Avg\\:%5.1lf%s' GPRINT:'a2_max':'Max\\:%5.1lf%s\\l' LINE2:'a3'#2030F4:'Procs' VDEF:a3_last=a3,LAST VDEF:a3_min=a3,MINIMUM VDEF:a3_avg=a3,AVERAGE VDEF:a3_max=a3,MAXIMUM GPRINT:'a3_last':'Now\\:%5.1lf%s' GPRINT:'a3_min':'Min\\:%5.1lf%s' GPRINT:'a3_avg':'Avg\\:%5.1lf%s' GPRINT:'a3_max':'Max\\:%5.1lf%s\\l' 

Running that manually from a shell prompt generated the same error as the one I originally found! Excellent, now I know exactly what is causing the error. I opened the output above in vim and searched for a single quote. That highlighted all single quotes and my eyes immediately picked up the following:

--title 'Matty\'s Bodacious Grid Load last hour'

Oh snap, I found a gem here. A LOOOONG time ago in a shell scripting book (possibly in UNIX shell programming?) far, far away I remembered reading that you can never have a single quote inside a pair of single quotes. Googling this turned up the following POSIX standard which states:

“A single-quote cannot occur within single-quotes.”

When I removed the single quote after Matty and re-ran rrdtool it ran without error and spit out a nice PNG. So could that be the issue? I removed the single quote from the gridname directive in gmetad.conf, bounced gmetad and, lo and behold, all of my graphs started showing up. Now to finish up my plug-in.

Summarizing system call activity on Linux systems

Linux has a gazillion debugging utilities available. One of my favorite tools for debugging problems is strace, which allows you to observe the system calls a process is making in real time. Strace also has a “-c” option to summarize system call activity:

$ strace -c -p 28009

Process 28009 attached
Process 28009 detached
% time     seconds  usecs/call     calls    errors syscall
------ ----------- ----------- --------- --------- ----------------
 78.17    0.001278           2       643           select
 14.80    0.000242           1       322           read
  7.03    0.000115           0       322           write
------ ----------- ----------- --------- --------- ----------------
100.00    0.001635                  1287           total

The output contains the percentage of time spent in each system call, the total time in seconds, the microseconds per call, the total number of calls, a count of the number of errors and the type of system call that was made. This output has numerous uses, and is a great way to observe how a process is interacting with the kernel. Good stuff!

Using the automated bug-reporting tool (abrt) to generate core dumps when a Linux process fails

Software fails, and it often occurs at the wrong time. When failures occur I want to understand why, and will usually start putting together the events that lead up to the issue. Some application issues can be root caused by reviewing logs, but catastrophic crashes will often require the admin to sit down with gdb and review a core file if it exists.

Solaris has always led the charge when it comes to reliably creating core files during crashes. System crashes will cause core files to be dumped to /var/crash, and the coreadm utility can be used to save application core files. Linux has been playing catch up in this realm, and just in the past couple of years started providing diskdump and netdump to generate kernel crash dumps when an OOPS occurs (in my experience these tools aren’t as reliable as their Solaris counterparts though).

In the latest releases of Fedora, the automated bug-reporting tool (abrt) infrastructure was added to generate core files when processes crashed. The abrt website provides the following description for their automated crash dump collection infrastructure:

“abrt is a daemon that watches for application crashes. When a crash occurs, it collects the crash data (core file, application’s command line etc.) and takes action according to the type of application that crashed and according to the configuration in the abrt.conf configuration file. There are plugins for various actions: for example to report the crash to Bugzilla, to mail the report, to transfer the report via FTP or SCP, or to run a specified application.”

This is a welcome addition to the Linux family, and it works pretty well from my initial testing. To see abrt in action I created a script and sent it a SIGSEGV:

$ cat loop


while :
do
   sleep 1
done

$ ./loop &
[1] 22790

$ kill -SIGSEGV 22790
[1]+ Segmentation fault (core dumped) ./loop

Once the fault occurred I tailed /var/log/messages to get the crash dump location:

$ tail -20 /var/log/messages

Jan 17 09:52:08 theshack abrt[20136]: Saved core dump of pid 20126 (/bin/bash) to /var/spool/abrt/ccpp-2012-01-17-09:52:08-20126 (487424 bytes)
Jan 17 09:52:08 theshack abrtd: Directory 'ccpp-2012-01-17-09:52:08-20126' creation detected
Jan 17 09:52:10 theshack abrtd: New problem directory /var/spool/abrt/ccpp-2012-01-17-09:52:08-20126, processing

If I change into the directory referenced above I will see a wealth of debugging data, including a core file from the application (bash) that crashed:

$ cd /var/spool/abrt/ccpp-2012-01-17-09:52:08-20126

$ ls -la

total 360
drwxr-x---. 2 abrt root   4096 Jan 17 09:52 .
drwxr-xr-x. 4 abrt abrt   4096 Jan 17 09:52 ..
-rw-r-----. 1 abrt root      5 Jan 17 09:52 abrt_version
-rw-r-----. 1 abrt root      4 Jan 17 09:52 analyzer
-rw-r-----. 1 abrt root      6 Jan 17 09:52 architecture
-rw-r-----. 1 abrt root     16 Jan 17 09:52 cmdline
-rw-r-----. 1 abrt root      4 Jan 17 09:52 component
-rw-r-----. 1 abrt root 487424 Jan 17 09:52 coredump
-rw-r-----. 1 abrt root      1 Jan 17 09:52 count
-rw-r-----. 1 abrt root    649 Jan 17 09:52 dso_list
-rw-r-----. 1 abrt root   2110 Jan 17 09:52 environ
-rw-r-----. 1 abrt root      9 Jan 17 09:52 executable
-rw-r-----. 1 abrt root      8 Jan 17 09:52 hostname
-rw-r-----. 1 abrt root     19 Jan 17 09:52 kernel
-rw-r-----. 1 abrt root   2914 Jan 17 09:52 maps
-rw-r-----. 1 abrt root     25 Jan 17 09:52 os_release
-rw-r-----. 1 abrt root     18 Jan 17 09:52 package
-rw-r-----. 1 abrt root      5 Jan 17 09:52 pid
-rw-r-----. 1 abrt root      4 Jan 17 09:52 pwd
-rw-r-----. 1 abrt root     51 Jan 17 09:52 reason
-rw-r-----. 1 abrt root     10 Jan 17 09:52 time
-rw-r-----. 1 abrt root      1 Jan 17 09:52 uid
-rw-r-----. 1 abrt root      5 Jan 17 09:52 username
-rw-r-----. 1 abrt root     40 Jan 17 09:52 uuid
-rw-r-----. 1 abrt root    620 Jan 17 09:52 var_log_messages

If we were debugging the crash we could poke around the saved environment files and then fire up gdb with the core dump to see where it crashed:

$ gdb /bin/bash coredump

GNU gdb (GDB) Fedora (
Copyright (C) 2011 Free Software Foundation, Inc.
License GPLv3+: GNU GPL version 3 or later 
This is free software: you are free to change and redistribute it.
There is NO WARRANTY, to the extent permitted by law.  Type "show copying"
and "show warranty" for details.
This GDB was configured as "x86_64-redhat-linux-gnu".
For bug reporting instructions, please see:
Reading symbols from /bin/bash...(no debugging symbols found)...done.

warning: core file may not match specified executable file.
[New LWP 20126]
Core was generated by `/bin/bash ./loop'.
Program terminated with signal 11, Segmentation fault.
#0  0x000000344aabb83e in waitpid () from /lib64/libc.so.6
Missing separate debuginfos, use: debuginfo-install bash-4.2.20-1.fc16.x86_64
(gdb) backtrace
#0  0x000000344aabb83e in waitpid () from /lib64/libc.so.6
#1  0x0000000000440679 in ?? ()
#2  0x00000000004417cf in wait_for ()
#3  0x0000000000432736 in execute_command_internal ()
#4  0x0000000000434dae in execute_command ()
#5  0x00000000004351c5 in ?? ()
#6  0x000000000043150b in execute_command_internal ()
#7  0x0000000000434dae in execute_command ()
#8  0x000000000041e0b1 in reader_loop ()
#9  0x000000000041c8ef in main ()

From there you could navigate through the saved memory image to see what caused the program to die an unexpected death. Now you may be asking yourself: how exactly does abrt work? After digging through the source code I figured out that it installs a custom hook using the core_pattern kernel entry point:

$ cat /proc/sys/kernel/core_pattern
|/usr/libexec/abrt-hook-ccpp %s %c %p %u %g %t e

Each time a crash occurs the kernel will invoke the hook above, passing a number of arguments to the program listed in the core_pattern proc entry. From what I have derived from the source so far there are currently hooks for C/C++ and Python applications, and work is in progress to add support for Java. This is really cool stuff!

Dealing with xauth “error in locking authority file” errors

I recently logged into one of my servers and received the following error:

$ ssh foo
matty@foo's password:
Last login: Tue Nov 1 13:42:52 2011 from
/usr/bin/xauth: error in locking authority file /home/matty/.Xauthority

I hadn’t seen this one before, but based on similar locking issues I’ve encountered in the past I ran strace against xauth to see if an old lock file existed:

$ strace xauth

In the output I saw the following which confirmed my suspicion:

open("/home/matty/.Xauthority-c", O_WRONLY|O_CREAT|O_EXCL, 0600) = -1 EEXIST (File exists)

A quick check with ls revealed the same thing:

$ ls -l /home/lnx/matty/.Xauthority*

-rw------- 2 matty SysAdmins 0 Nov  1 13:42 /home/lnx/matty/.Xauthority-c
-rw------- 2 matty SysAdmins 0 Nov  1 13:42 /home/lnx/matty/.Xauthority-l

I removed both of the lock files and haven’t seen the error again. Thought I would pass this on in case someone else encountered this issue.

One way to avoid tcpdump “packets dropped by kernel” messages

I have been knee-deep this week debugging a rather complex DNS issue. I’ll do a full write-up on that next week. While I was debugging the issue I needed to fire up tcpdump to watch the DNS queries from one of my authoritative servers to various servers on the Internet. What I noticed when I fed the data into wireshark were periods of time with no data, and I wasn’t quite sure why at first.

Based on what I could find on the tcpdump / BPF sites, when tcpdump is busy processing existing data and is not able to take packets captured by BPF out of the queue fast enough, the kernel will drop them. If this occurs you will see the “packets dropped by kernel” counter in the tcpdump summary become non-zero:

$ tcpdump -w dns-capture.cap -s 1520 -ttt -vvv -i bond0 port 53
9559 packets captured
12533 packets received by filter
2974 packets dropped by kernel

I started to do some digging to see why tcpdump couldn’t keep up, and after a bit of profiling I noticed that the program was spending an excessive amount of time resolving IPs to names. This processing was stalling the program from reading more data from the queue, and resulted in packets being dropped. Once I ran tcpdump with the “-n” (do not resolve IPs to names) option I no longer experienced this issue:

$ tcpdump -w dns-capture.cap -s 1520 -ttt -vvv -n -i bond0 port 53
9339 packets captured
9339 packets received by filter
0 packets dropped by kernel

This stopped gaps from occurring in my wireshark display, and since wireshark can resolve IPs to names all was well. It’s really crazy how you can start debugging one issue and wind up debugging three or four more prior to solving the original problem. Debugging issues is definitely fun, and I guess it gives me plenty to write about. :) This past week I’ve learned more about the DNS protocol and the various implementations than I have in the past 10 years. It’s amazing how many cool nuggets of information are buried in the various DNS RFCs!