When binaries (errr scripts) attack!

I was recently asked to debug a performance problem with a message-passing application. Once I had an overview of how the application worked, I started digging through the system data. To see where the application was spending its time, I ran the following DTrace program, which totals the time the yap process spends in each system call:

$ cat syscalltime.d

#pragma D option quiet

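/* Record when each system call made by "yap" starts. */
syscall:::entry
/ execname == "yap" /
{
    self->ts = timestamp;
}

/* Add the elapsed time (in nanoseconds) to a per-syscall total. */
syscall:::return
/ execname == "yap" && self->ts /
{
    @systime[probefunc] = sum(timestamp - self->ts);
    self->ts = 0;    /* release the thread-local variable */
}

/* When tracing stops (e.g. Ctrl-C), print the totals in ascending order. */
dtrace:::END
{
    printa("%25s  %@d\n", @systime);
    printf("\nEnded at %Y\n", walltimestamp);
}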

$ dtrace -s syscalltime.d

               setcontext  10000
                      brk  33400
                 lwp_self  83600
                 schedctl  84700
                    ioctl  153100
              lwp_sigmask  201800
                    alarm  321100
                     pipe  464600
                   access  881400
                    fcntl  988200
                   getpid  1268100
                    uname  1439600
               setsockopt  2684500
                   getcwd  4710800
                   sendto  4900800
                     open  6460800
                    gtime  9309400
                so_socket  11209500
                   doorfs  19206100
                  waitsys  24783800
                    pread  29320300
                     stat  45851200
                    close  53674000
                    lseek  63071800
                   pwrite  66676300
               getsockopt  94663800
                  connect  111886700
                    write  237189600
                     read  462985100
                    pause  2998067200
                    fork1  3568343234
                  pollsys  170526307900

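The totals are in nanoseconds, because DTrace's timestamp variable counts nanoseconds. That means fork1 alone accounted for roughly 3.6 seconds. The application was only supposed to broker connections and initialize itself when the system booted, so my first task was figuring out what was causing all those fork()s. Each fork() would most likely be followed by an exec(), so I fired up the execsnoop script from the DTraceToolkit: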

$ execsnoop

  UID    PID   PPID ARGS
911116    901  21542 /bin/sh -c hostname
911116    902    901 /usr/bin/sh /bin/hostname
911116    903    902 /bin/uname -n
911116    905    904 /usr/bin/sh /bin/hostname
911116    904  21617 /bin/sh -c hostname
911116    906    905 /bin/uname -n
911116    907  21600 /bin/sh -c hostname
911116    908    907 /usr/bin/sh /bin/hostname
911116    909    908 /bin/uname -n
911116    910  21583 /bin/sh -c hostname
911116    912    911 /bin/uname -n
911116    911    910 /usr/bin/sh /bin/hostname
911116    914    913 /usr/bin/sh /bin/hostname
911116    913  21707 /bin/sh -c hostname
911116    915    914 /bin/uname -n

Errrr, now why are we invoking a shell and executing /bin/hostname over and over and over again? To find out, I wandered off to read the source code. After an hour or so, I came across a block that checked for the HOSTNAME environment variable. If the variable was set, the program used its value. If it was not set, the program ran the hostname command to get the name of the host. This is definitely not the best way to structure the program, but the hostname command should never run anyway, since HOSTNAME is set by the shell. Or is it? To answer this question, I ran a quick test:

$ ssh foo@blorp

$ env
HOME=/export/home/foo
LC_COLLATE=en_US.ISO8859-1
LC_CTYPE=en_US.ISO8859-1
LC_MESSAGES=C
LC_MONETARY=en_US.ISO8859-1
LC_NUMERIC=en_US.ISO8859-1
LOGNAME=foo
MAIL=/var/mail//foo
PATH=/bin:/usr/bin:/sbin:/usr/sbin:/usr/local/bin:/usr/sfw/bin
SHELL=/bin/sh
SSH_CLIENT=192.168.1.10 57398 22
SSH_CONNECTION=192.168.1.10 57398 192.168.1.5 22
SSH_TTY=/dev/pts/3
TERM=xterm-color
TZ=US/Eastern
USER=foo

$ echo $SHELL
/bin/sh

$ echo $HOSTNAME

$ /bin/bash

$ echo $HOSTNAME
blorp

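It looks like /bin/sh doesn't set the HOSTNAME variable (bash does), which is why hostname kept getting executed. Even bash only sets HOSTNAME as a shell variable and does not export it by default, so a child process sees it in its environment only if a startup file exports it. But why were all those shells being created? It turns out that /bin/hostname is actually a shell script that invokes /bin/uname: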

$ file /bin/hostname
/bin/hostname: executable /usr/bin/sh script

$ cat /bin/hostname

#!/usr/bin/sh
#       Copyright (c) 1984, 1986, 1987, 1988, 1989 AT&T
#         All Rights Reserved
#
#       THIS IS UNPUBLISHED PROPRIETARY SOURCE CODE OF AT&T
#       The copyright notice above does not evidence any
#       actual or intended publication of such source code.
#
# Copyright (c) 1988, 2001 by Sun Microsystems, Inc.
# All rights reserved.
#
# ident "@(#)hostname.sh        1.5     01/08/15 SMI"
#

TEXTDOMAIN=SUNW_OST_OSCMD export TEXTDOMAIN

if [ $# -eq 0 ]; then
        /bin/uname -n
elif [ $# -eq 1 ]; then
        /bin/uname -S $1
else
        echo `/bin/gettext "Usage: hostname [name]"`
        exit 1
fi

The application called /bin/hostname once per connection whenever HOSTNAME wasn't set, and /bin/hostname is a shell script. As a result, every connection forked and exec'ed three new processes: /bin/sh to run the command, a second shell (/usr/bin/sh) to interpret the /bin/hostname script, and finally /bin/uname. That is a lot of overhead just to learn the name of the host, and it was paid on every single connection. I submitted a patch to fix this, and the application now performs much better.
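The application's source isn't shown here, and the post doesn't say what language it is written in, so the following is only a sketch in Python under that assumption. It is not the patch that was submitted. `hostname_via_shell` approximates the lookup described above, and `hostname_direct` shows one way to avoid the extra processes by asking the kernel through uname(2). Both function names are hypothetical.

```python
import os
import subprocess


def hostname_via_shell():
    """Hypothetical sketch of the slow lookup (not the real yap code).

    Uses $HOSTNAME if present; otherwise runs `hostname` through a shell.
    shell=True makes Python exec `/bin/sh -c hostname`, the same pattern seen
    in the execsnoop output. On the Solaris system above, /bin/hostname is
    itself a shell script that execs /bin/uname -n: three processes per call.
    """
    name = os.environ.get("HOSTNAME")
    if name:
        return name
    result = subprocess.run("hostname", shell=True, capture_output=True,
                            text=True, check=True)
    return result.stdout.strip()


def hostname_direct():
    """Hypothetical fork-free alternative: fall back to uname(2).

    os.uname() wraps the uname(2) system call, whose nodename field is what
    `uname -n` prints, so no shell or helper program is started.
    """
    name = os.environ.get("HOSTNAME")
    if name:
        return name
    return os.uname().nodename


if __name__ == "__main__":
    print("direct:", hostname_direct())
```

Because the host name rarely changes, another option is to look it up once at startup and reuse it for every connection, rather than repeating the lookup per connection.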

The moral of this story: scripts can and do attack!

3 Comments

dean  on April 13th, 2006

We had a script running here where the “which” command caused continual core dumps because it was called from within another script. Turns out that “which” is also a shell script…

Dan Price  on April 14th, 2006

Matty,

We should probably fold the ‘hostname’ command into the implementation of uname. Plus, the hostname shell script induces yet another fork(), needlessly….

‘which’ is harder to fix.

Dan Price  on April 14th, 2006

I filed:

6413595 hostname(1) is needlessly slow
