Ganglia never met a quote it didn’t like. Wait it did …

This week I encountered a weird issue while developing a new Ganglia plug-in. After moving my ganglia processes to a docker container I noticed that the grid overview images weren’t displaying. This was a new ganglia installation so I figured I typo’ed something in the gmetad.conf configuration file. I reviewed the file in my git repo and everything looked perfectly fine. Any Apache errors? Gak!:

$ tail -6 /var/log/httpd/error_log

sh: -c: line 0: unexpected EOF while looking for matching `''
sh: -c: line 1: syntax error: unexpected end of file
sh: -c: line 0: unexpected EOF while looking for matching `''
sh: -c: line 1: syntax error: unexpected end of file
sh: -c: line 0: unexpected EOF while looking for matching `''
sh: -c: line 1: syntax error: unexpected end of file

Well those messages are super helpful lol. Something is being run through /bin/sh and it’s missing a matching single quote. But what command or script? Luckily I follow the world famous performance expert Brendan Gregg’s blog so I was aware of the Linux eBPF and bcc project. If you aren’t familiar with this amazing project here is a short blurb from their website:

“BCC is a toolkit for creating efficient kernel tracing and manipulation programs, and includes several useful tools and examples. It makes use of extended BPF (Berkeley Packet Filters), formally known as eBPF, a new feature that was first added to Linux 3.15. Much of what BCC uses requires Linux 4.1 and above.”

I knew Brendan ported a large number of the DTraceToolkit scripts to bcc so I went searching for execsnoop to see what is being executed:

$ ./execsnoop

PCOMM            PID    PPID   RET ARGS
sh               18748  18350    0   /usr/bin/rrdtool
rrdtool          18748  18350    0 /usr/bin/rrdtool
sh               18749  18350    0   /usr/bin/rrdtool graph /dev/null  --start -3600s --end now DEF:avg='/ganglia/rrds// __SummaryInfo__/cpu_num.rrd':'sum':AVERAGE PR

Bingo! Now I have a lead, but it appears execsnoop limit’s the arguments displayed (NTB: need to figure out why). So lets go to the Ganglia source code to see whats up:

$ cd /usr/share/ganglia

$ grep rrdtool *.php
*** lots of output ***

In the output above I noticed several exec() statements in graph.php so I decided to begin there. Starting from the main code block and reading down I came across the following:

if ($debug) {
  error_log("Final rrdtool command:  $command");

Digging further I noticed that debug output (including the full string that is being passed to the shell) could be set through a query string variable:

$debug = isset($_GET['debug']) ? clean_number(sanitize($_GET["debug"])) : 0;

Sweet! Adding “&debug=1” to the end of the ganglia URL got me a useful log entry in my error_log:

[Fri Oct 07 13:18:01.429310 2016] [:error] [pid 19884] [client] Final rrdtool command:  /usr/bin/rrdtool graph -   --start '-3600s' --end now --width 650 --height 300 --title 'Matty\\'s Bodacious Grid Load last hour' --vertical-label 'Loads/Procs' --lower-limit 0 --slope-mode   DEF:'a0'='/ganglia/rrds//__SummaryInfo__/load_one.rrd':'sum':AVERAGE  DEF:'a1'='/ganglia/rrds//__SummaryInfo__/cpu_num.rrd':'num':AVERAGE  DEF:'a2'='/ganglia/rrds//__SummaryInfo__/cpu_num.rrd':'sum':AVERAGE  DEF:'a3'='/ganglia/rrds//__SummaryInfo__/proc_run.rrd':'sum':AVERAGE   AREA:'a0'#BBBBBB:'1-min' VDEF:a0_last=a0,LAST VDEF:a0_min=a0,MINIMUM VDEF:a0_avg=a0,AVERAGE VDEF:a0_max=a0,MAXIMUM GPRINT:'a0_last':'Now\\:%5.1lf%s' GPRINT:'a0_min':'Min\\:%5.1lf%s' GPRINT:'a0_avg':'Avg\\:%5.1lf%s' GPRINT:'a0_max':'Max\\:%5.1lf%s\\l' LINE2:'a1'#00FF00:'Nodes' VDEF:a1_last=a1,LAST VDEF:a1_min=a1,MINIMUM VDEF:a1_avg=a1,AVERAGE VDEF:a1_max=a1,MAXIMUM GPRINT:'a1_last':'Now\\:%5.1lf%s' GPRINT:'a1_min':'Min\\:%5.1lf%s' GPRINT:'a1_avg':'Avg\\:%5.1lf%s' GPRINT:'a1_max':'Max\\:%5.1lf%s\\l' LINE2:'a2'#FF0000:'CPUs ' VDEF:a2_last=a2,LAST VDEF:a2_min=a2,MINIMUM VDEF:a2_avg=a2,AVERAGE VDEF:a2_max=a2,MAXIMUM GPRINT:'a2_last':'Now\\:%5.1lf%s' GPRINT:'a2_min':'Min\\:%5.1lf%s' GPRINT:'a2_avg':'Avg\\:%5.1lf%s' GPRINT:'a2_max':'Max\\:%5.1lf%s\\l' LINE2:'a3'#2030F4:'Procs' VDEF:a3_last=a3,LAST VDEF:a3_min=a3,MINIMUM VDEF:a3_avg=a3,AVERAGE VDEF:a3_max=a3,MAXIMUM GPRINT:'a3_last':'Now\\:%5.1lf%s' GPRINT:'a3_min':'Min\\:%5.1lf%s' GPRINT:'a3_avg':'Avg\\:%5.1lf%s' GPRINT:'a3_max':'Max\\:%5.1lf%s\\l' 

Running that manually from a shell prompt generated the same error as the one I originally found! Excellent, now I know exactly what is causing the error. I opened the output above in vim and searched for a single quote. That highlighted all single quotes and my eyes immediately picked up the following:

–title ‘Matty\’s Bodacious Grid Load last hour’

Oh snap, I found a gem here. A LOOOONG time ago in a shell scripting book (possibly in UNIX shell programming?) far, far away I remembered reading that you can never have a single quote inside a pair of single quotes. Googling this turned up the following POSIX standard which states:

“A single-quote cannot occur within single-quotes.”

When I removed the single quote after Matty and re-ran rrdtool it ran w/o error and spit out a nice png. So could that be the issue? I removed the single quote from the gridname directive in gmetad.conf, bounced gmetad and low and behold all of my graphs started showing up. Now to finish up my plug-in.