Solaris nscd keep-hot-count issue

While reviewing the DNS logfiles on one of my name servers, I noticed that one of my Solaris hosts was querying getupdates.sun.com once per hour:

Sep 30 02:28:05 dns named[1097]: [ID 866145 local0.info] client 1.2.3.4#47644: view internal: query: getupdates.sun.com IN A +
Sep 30 03:27:35 dns named[1097]: [ID 866145 local0.info] client 1.2.3.4#47738: view internal: query: getupdates.sun.com IN A +
Sep 30 04:27:05 dns named[1097]: [ID 866145 local0.info] client 1.2.3.4#47841: view internal: query: getupdates.sun.com IN A +
[ ..... ]

Since I had recently applied the latest connection manager patches to the server, I started to wonder if Sun had installed a cron job (or started a daemon) to periodically poll the server that hosted getupdates.sun.com. To see what process was querying getupdates.sun.com, I first reviewed the cron jobs on the system, and then I perused the list of daemons that were running. Everything checked out fine, so I fired up DTrace to see which process was actually responsible for the DNS requests. It turns out these DNS queries were coming from the nscd daemon. After reading through the nscd.conf manual page, I came across a description of the “keep-hot-count” attribute:

“This attribute allows the administrator to set the number of entries nscd(1M) is to keep current in the specified cache. value is an integer number which should approximate the number of entries frequently used during the day.”

A quick peak at /etc/nscd.conf revealed that the value of keep-hot-count is set to 20 by default:

$ egrep “keep-hot.*host” /etc/nscd.conf
keep-hot-count hosts 20

It turns out nscd was refreshing the name service information entry for this host once per hour. To disable this behavior, I set the value of keep-hot-count to 0, and the DNS queries for getupdates.sun.com ceased. Numerous admins kill nscd on systems that don’t handle lots of login sessions, and I reckon I will need to investigate this option.

Mapping pfiles output to files

While I was analyzing the performance characteristics of one of my Solaris 9 Oracle database servers, I needed to map a file descriptor listed in the pfiles output to the actual file name on the file system (Solaris 10 provides this information in the pfiles output, which is yet another reason to run Solaris 10). When you run pfiles on a process or core file, it gives you the type (e.g., socket, fifo, regular file, etc) of file that the file descriptor references, and the inode number associated with the file:

$ pfiles 12345

  [ ..... ]

 269: S_IFREG mode:0660 dev:228,119007 ino:61 uid:102 gid:315 size:10729046016
      O_RDWR|O_DSYNC|O_LARGEFILE FD_CLOEXEC
  [ ..... ]

To map the inode listed after the “ino:” field to the file name, I first used the “dev:” identifier to locate the device the file was located on:

$ cd /dev/vx/dsk/foodg

$ ls -la

total 4
drwxr-xr-x   2 root     root         512 Sep 18  2005 .
drwxr-xr-x   5 root     root         512 Sep 18  2005 ..
brw-------   1 root     root     228,119000 Sep 18  2005 u16
brw-------   1 root     root     228,119001 Sep 18  2005 u17
brw-------   1 root     root     228,119002 Sep 18  2005 u18
brw-------   1 root     root     228,119003 Sep 18  2005 u19
brw-------   1 root     root     228,119004 Sep 18  2005 u20
brw-------   1 root     root     228,119005 Sep 18  2005 u57
brw-------   1 root     root     228,119006 Sep 18  2005 u58
brw-------   1 root     root     228,119007 Sep 18  2005 u59
brw-------   1 root     root     228,119008 Sep 18  2005 u60
brw-------   1 root     root     228,119009 Sep 18  2005 u61
brw-------   1 root     root     228,119010 Sep 18  2005 u62
brw-------   1 root     root     228,119011 Sep 18  2005 u63
brw-------   1 root     root     228,119012 Sep 18  2005 u64
brw-------   1 root     root     228,119013 Sep 18  2005 u65
brw-------   1 root     root     228,119014 Sep 18  2005 u66

Once I had the device name and associated mount point, I used the trusty old find utility to locate the file by it’s inode number:

$ find /u59 -inum 61 -ls

   61 10477592 -rw-rw----   1 oracle   dba      10729046016 Sep 18 13:15 /u59/oradata/blatch/data01.dbf

Some folks might be wondering why I didn’t use lsof. Well, unfortunately the version of lsof available in various package repositories isn’t able to interpret the VxFS metadata, and I would have spent more time building the new package than analayzing the problem I was trying to solve. Viva la Solaris!

When binaries (errr scripts) attack!

I was recently asked to debug a performance problem with a message passing applications. Once I received an overview of how the application worked, I started digging through the system data. To see where the application was spending it’s time, I ran the following DTrace program:

$ cat syscalltime.d

#pragma D option quiet

syscall:::entry
/ execname == "yap" /
{
    self->ts = timestamp;
}

syscall:::return
/ execname == "yap" && self->ts /
{
    @systime[probefunc] = sum(timestamp - self->ts);
}

dtrace:::END
{
    printa("%25s  %@d\n",@systime);
    printf("\nEnded at %Y",walltimestamp);
}

$ dtrace -s syscalltime.d

               setcontext  10000
                      brk  33400
                 lwp_self  83600
                 schedctl  84700
                    ioctl  153100
              lwp_sigmask  201800
                    alarm  321100
                     pipe  464600
                   access  881400
                    fcntl  988200
                   getpid  1268100
                    uname  1439600
               setsockopt  2684500
                   getcwd  4710800
                   sendto  4900800
                     open  6460800
                    gtime  9309400
                so_socket  11209500
                   doorfs  19206100
                  waitsys  24783800
                    pread  29320300
                     stat  45851200
                    close  53674000
                    lseek  63071800
                   pwrite  66676300
               getsockopt  94663800
                  connect  111886700
                    write  237189600
                     read  462985100
                    pause  2998067200
                    fork1  3568343234
                  pollsys  170526307900

Since the application was supposed to broker connections and initialize itself when the system booted, my first task was determining what was causing all those fork()s. Since an exec would most likely follow each fork(), I fired up the execsnoop script in the DTraceToolkit:

$ execsnoop

  UID    PID   PPID ARGS
911116    901  21542 /bin/sh -c hostname
911116    902    901 /usr/bin/sh /bin/hostname
911116    903    902 /bin/uname -n
911116    905    904 /usr/bin/sh /bin/hostname
911116    904  21617 /bin/sh -c hostname
911116    906    905 /bin/uname -n
911116    907  21600 /bin/sh -c hostname
911116    908    907 /usr/bin/sh /bin/hostname
911116    909    908 /bin/uname -n
911116    910  21583 /bin/sh -c hostname
911116    912    911 /bin/uname -n
911116    911    910 /usr/bin/sh /bin/hostname
911116    914    913 /usr/bin/sh /bin/hostname
911116    913  21707 /bin/sh -c hostname
911116    915    914 /bin/uname -n

Errrr — now why are we invoking a shell and executing /bin/hostname over and over and over again? To see what the application was doing, I wandered off to read the source code. After an hour or so of analyzing the code, I came to a chunk of code that checked for the HOSTNAME environment variable. If the variable was set, the program used the value of the environment variable. If the variable was not set, the program invoked the hostname command to get the name of the host. This is definitely not the best way to structure the program, but the hostname command should never be executed since HOSTNAME is set by the shell. Or is it? To answer this question, I decided to run a quick test:

$ ssh foo@blorp

$ env
HOME=/export/home/foo
LC_COLLATE=en_US.ISO8859-1
LC_CTYPE=en_US.ISO8859-1
LC_MESSAGES=C
LC_MONETARY=en_US.ISO8859-1
LC_NUMERIC=en_US.ISO8859-1
LOGNAME=foo
MAIL=/var/mail//foo
PATH=/bin:/usr/bin:/sbin:/usr/sbin:/usr/local/bin:/usr/sfw/bin
SHELL=/bin/sh
SSH_CLIENT=192.168.1.10 57398 22
SSH_CONNECTION=192.168.1.10 57398 192.168.1.5 22
SSH_TTY=/dev/pts/3
TERM=xterm-color
TZ=US/Eastern
USER=foo

$ echo $SHELL
/bin/sh

$ echo $HOSTNAME

$ /bin/bash

$ echo $HOSTNAME
blorp

It looks like /bin/sh fails to set the HOSTNAME variable, which is why hostname keeps getting executed. But what about all those shells that are created? It turns out that /bin/hostname is actually a shell script that invokes /bin/uname:

$ file /bin/hostname
/bin/hostname: executable /usr/bin/sh script

$ cat /bin/hostname

#!/usr/bin/sh
#       Copyright (c) 1984, 1986, 1987, 1988, 1989 AT&T
#         All Rights Reserved
#
#       THIS IS UNPUBLISHED PROPRIETARY SOURCE CODE OF AT&T
#       The copyright notice above does not evidence any
#       actual or intended publication of such source code.
#
# Copyright (c) 1988, 2001 by Sun Microsystems, Inc.
# All rights reserved.
#
# ident "@(#)hostname.sh        1.5     01/08/15 SMI"
#

TEXTDOMAIN=SUNW_OST_OSCMD export TEXTDOMAIN

if [ $# -eq 0 ]; then
        /bin/uname -n
elif [ $# -eq 1 ]; then
        /bin/uname -S $1
else
        echo `/bin/gettext "Usage: hostname [name]"`
        exit 1
fi

Since the application called /bin/hostname once per connection if HOSTNAME wasn’t set, and /bin/hostname was a shell script, each connection caused a new process to be forked()’ed, a shell started, and /bin/uname executed. This is an inefficient use of resources, especially since this happens once per connection. I submitted a patch to fix this issue, and the solution is now working much better.

The moral of this story: scripts can and do attack!

Debugging problems with Solaris in.dhcpd vendor options

While attempting to jumpstart my Sun Ultra 10 this week, I encountered the following error:

ok boot net:dhcp – install
Boot device: /pci@1f,0/pci@1,1/network@1,1:dhcp File and args: – install
38800 panic – boot: Could not mount filesystem.
Program terminated

The machine was getting a kernel, but for some reason was unable to mount the Solaris miniroot. Since DHCP was used to jumpstart the client, the location of the miniroot should have been provided as a DHCP option. To begin debugging the problem, I fired up snoop on the jumpstart server to see which options (options are used to pass custom items such as the boot server to the client) were sent to the client:

$ snoop -vv -d hme0 ether 0:3:ba:9:62:f8 | grep DHCP


DHCP: Message type = DHCPDISCOVER
DHCP: Client Class Identifier = "SUNW.Ultra-5_10"
DHCP: Requested Options:
DHCP:    1 (Subnet Mask)
DHCP:    3 (Router)
DHCP:   12 (Client Hostname)
DHCP:   43 (Vendor Specific Options)
DHCP: Maximum DHCP Message Size = 1472 bytes

[ ..... ]

DHCP: Message type = DHCPOFFER
DHCP: DHCP Server Identifier = 192.168.1.3
DHCP: IP Address Lease Time = -1 seconds
DHCP: Subnet Mask = 255.255.255.0
DHCP: Boot File Name = SUNW.sun4u

Well this is odd, the server is returning just the basic options (i.e. boot file name, IP address, netmask, etc.). Since the jumpstart-specific options weren’t returned, I ran the dhtadm utility with the “-P” (print the contents of DHCP tables) option to print the symbols and macros in the DHCP tables:

$ dhtadm -P

Name                    Type            Value
==================================================
Solaris11SPARC          Macro           :SinstNM="tigger":SinstIP4=192.168.1.3:SinstPTH="/media/solaris/11_sparc/current":SrootNM="tigger":SrootIP4=192.168.1.3:SrootPTH="/media/solaris/11_sparc/current/Solaris_11/Tools/Boot":BootFile="SUNW.sun4u":SsysidCF="192.168.1.3:/media/solaris/11_sparc/":SjumpsCF="192.168.1.3:/media/solaris/":BootSrvA=192.168.1.3:
Solaris10x86            Macro           :SinstNM="tigger":SinstIP4=192.168.1.3:SinstPTH="/media/solaris/10_x86/current":SrootNM="tigger":SrootIP4=192.168.1.3:SrootPTH="/media/solaris/10_x86/current/Solaris_10/Tools/Boot":BootFile="SUNW.sun4u":SsysidCF="192.168.1.3:/media/solaris/10_x86/":SjumpsCF="192.168.1.3:/media/solaris/":BootSrvA=192.168.1.3:
Solaris10SPARC          Macro           :SinstNM="tigger":SinstIP4=192.168.1.3:SinstPTH="/media/solaris/10_sparc/current":SrootNM="tigger":SrootIP4=192.168.1.3:SrootPTH="/media/solaris/10_sparc/current/Solaris_10/Tools/Boot":BootFile="SUNW.sun4u":SsysidCF="192.168.1.3:/media/solaris/10_sparc/":SjumpsCF="192.168.1.3:/media/solaris/":BootSrvA=192.168.1.3:
FedoraCore5             Macro           :BootFile="fedora5/pxelinux.0":BootSrvA=192.168.1.3:
PXEClient:Arch:00000:UNDI:002001        Macro           :BootSrvA=192.168.1.3:
tigger                  Macro           :Include=Locale:Timeserv=192.168.1.3:LeaseTim=86400:LeaseNeg:DNSdmain="foo.com":DNSserv=192.168.1.1:
Locale                  Macro           :UTCoffst=-18000:
Sterm                   Symbol          Vendor=SUNW.Ultra-10 SUNW.i86pc,15,ASCII,1,0
SjumpsCF                Symbol          Vendor=SUNW.Ultra-10 SUNW.i86pc,14,ASCII,1,0
SsysidCF                Symbol          Vendor=SUNW.Ultra-10 SUNW.i86pc,13,ASCII,1,0
SinstPTH                Symbol          Vendor=SUNW.Ultra-10 SUNW.i86pc,12,ASCII,1,0
SinstNM                 Symbol          Vendor=SUNW.Ultra-10 SUNW.i86pc,11,ASCII,1,0
SinstIP4                Symbol          Vendor=SUNW.Ultra-10 SUNW.i86pc,10,IP,1,1
SbootRS                 Symbol          Vendor=SUNW.Ultra-10 SUNW.i86pc,9,NUMBER,2,1
Stz                     Symbol          Vendor=SUNW.Ultra-10 SUNW.i86pc,8,ASCII,1,0   
SbootFIL                Symbol          Vendor=SUNW.Ultra-10 SUNW.i86pc,7,ASCII,1,0
SswapPTH                Symbol          Vendor=SUNW.Ultra-10 SUNW.i86pc,6,ASCII,1,0
SswapIP4                Symbol          Vendor=SUNW.Ultra-10 SUNW.i86pc,5,IP,1,0
SrootPTH                Symbol          Vendor=SUNW.Ultra-10 SUNW.i86pc,4,ASCII,1,0
SrootNM                 Symbol          Vendor=SUNW.Ultra-10 SUNW.i86pc,3,ASCII,1,0
SrootIP4                Symbol          Vendor=SUNW.Ultra-10 SUNW.i86pc,2,IP,1,1
SrootOpt                Symbol          Vendor=SUNW.Ultra-10 SUNW.i86pc,1,ASCII,1,0

At first glance, the DHCP vendor options looked fine, so I started to think through how a DHCP jumpstart installation works. After pondering this for a few minutes, it dawned on me that in.dhcpd uses the Client Class Identifier (e.g., SUNW.Ultra-30) to identify the client and return the options specific to that client. To see what my Ultra 10 was reporting itself as, I fired up snoop:

$ snoop -vv -d hme0 ether 0:3:ba:9:62:f8 | grep DHCP

[ ….. ]

DHCP: Client Class Identifier = “SUNW.Ultra-5_10”

Bingo! The client reported itself as a SUNW.Ultra-5_10, but I had specified SUNW.Ultra-10 when creating the symbols. To fix the issue, I updated the pertinent symbols, restarted in.dhcpd, and low and behold the jumpstart worked flawlessly! The moral of this story: there is none. :)

Locating pipe endpoints

While debugging a performance problem this week, I noticed that one of our applications was issuing 1000s of read and write system calls each minute to file descriptor 19. To get a better idea of what file descriptor 19 was used for, I used the trusty Solaris pfiles utility:

$ pfiles 8988

[ ..... ]
19: S_IFIFO mode:0000 dev:291,0 ino:382227 uid:911116 gid:315 size:0
O_RDWR

Well this is interesting, the application is reading and writing to a pipe. But what resides on the remote side of the pipe? To answer this question, I passed the “ino” numeric argument to find’s “-inum” option:

$ find /proc -inum 382227 2>/dev/null

/proc/8996/fd/0
/proc/8995/fd/0
/proc/8988/fd/20

Once I located the process identifiers, I used the ps command to locate the pipe endpoints:

$ ps -ef | egrep ‘(8995|8996|8988)’

app 8996  8995  0 06:20:31 ?        0:00 /opt/VENDOR/logging.binary
app 8995  8988  0 06:20:31 ?        0:00 /bin/sh -c /opt/VENDOR/logging.binary
app 8988     1  1 06:20:31 ?        5:48 /opt/VENDOR/application

This definitely explains why they are issuing lots of writes, but why reads? Only the vendor can tell. :)

Perl module library madness!

While reading up on website performance monitoring applications last week, I came across the cricket HTTP-performance module. HTTP-performance allows you to graph the time it takes to connect to a website and to render a page. This sounded interesting, so I decided to download and install cricket. After reading through the beginners guide, I installed all of the required Perl modules, and proceeded to run the cricket compile script:

$ ./compile

Can't load '/usr/perl5/5.8.4/lib/i86pc-solaris-64int/auto/DB_File/DB_File.so' for module DB_File: ld.so.1: /usr/perl5/5.8.4/bin/perl: fatal: libdb-4.2.so: open failed: No such file or
directory at /usr/perl5/5.8.4/lib/i86pc-solaris-64int/XSLoader.pm line 68.
 at /usr/perl5/5.8.4/lib/i86pc-solaris-64int/DB_File.pm line 251
Compilation failed in require at /home/rmatteso/cricket-1.0.5/lib/ConfigTree/Node.pm line 25.
BEGIN failed--compilation aborted at /home/rmatteso/cricket-1.0.5/lib/ConfigTree/Node.pm line 25.
Compilation failed in require at ./compile line 39.
BEGIN failed--compilation aborted at ./compile line 39.I 

Ack!!!! — it looks like DB_File.so can’t find the BerkelyDB shared library:

$ ldd /usr/perl5/5.8.4/lib/i86pc-solaris-64int/auto/DB_File/DB_File.so

        libdb-4.2.so =>  (file not found)
        libc.so.1 =>     /lib/libc.so.1
        libm.so.2 =>     /lib/libm.so.2

Since I hate using LD_LIBRARY_PATH, I decided to adjust the link line to include a hard coded runtime path. To determine which variable to edit, I searched the Makefile until I came to the following line:

$ less Makefile

$(LD) $(LDDLFLAGS) $(LDFROM) $(OTHERLDFLAGS) -o $@ $(MYEXTLIB) $(PERL_ARCHIVE) \ $(LDLOADLIBS) $(PERL_ARCHIVE_AFTER) $(EXPORT_LIST) $(INST_DYNAMIC_FIX)

After locating this line, I added “-R/usr/sfw/lib:/usr/local/BerkeleyDB/lib” to the LDDLFLAGS variable and rebuilt the Perl module:

$ make

cc -c  -I/usr/local/BerkeleyDB/include  -D_LARGEFILE_SOURCE -D_FILE_OFFSET_BITS=64 -D_TS_ERR
NO     -DVERSION="1.814"  -DXS_VERSION="1.814"  "-I/usr/perl5/5.8.4/lib/i86pc-solaris-64
int/CORE"  -D_NOT_CORE  -DmDB_Prefix_t=size_t -DmDB_Hash_t=u_int32_t version.c
cc -c  -I/usr/local/BerkeleyDB/include  -D_LARGEFILE_SOURCE -D_FILE_OFFSET_BITS=64 -D_TS_ERR
NO     -DVERSION="1.814"  -DXS_VERSION="1.814"  "-I/usr/perl5/5.8.4/lib/i86pc-solaris-64
int/CORE"  -D_NOT_CORE  -DmDB_Prefix_t=size_t -DmDB_Hash_t=u_int32_t DB_File.c
Running Mkbootstrap for DB_File ()
chmod 644 DB_File.bs
rm -f blib/arch/auto/DB_File/DB_File.so
LD_RUN_PATH="/usr/local/BerkeleyDB/lib" 
cc  -G -R/usr/sfw/lib:/usr/local/BerkeleyDB/lib version.o DB_File.o  -o blib/arch/auto/DB_File/DB_File.so   -L/usr/local/BerkeleyDB/lib -ldb
chmod 755 blib/arch/auto/DB_File/DB_File.so
cp DB_File.bs blib/arch/auto/DB_File/DB_File.bs
chmod 644 blib/arch/auto/DB_File/DB_File.bs
Manifying blib/man3/DB_File.3

$ make install

Installing /usr/perl5/5.8.4/lib/i86pc-solaris-64int/auto/DB_File/DB_File.so
Files found in blib/arch: installing files in blib/lib into architecture dependent library tree
Installing /usr/perl5/5.8.4/man/man3/DB_File.3
Writing /usr/perl5/5.8.4/lib/i86pc-solaris-64int/auto/DB_File/.packlist
Appending installation info to /usr/perl5/5.8.4/lib/i86pc-solaris-64int/perllocal.pod

Everything seemed to work, but did the runpath get set correctly? A quick check revealed that it did:

$ dump -Lv /usr/perl5/5.8.4/lib/i86pc-solaris-64int/auto/DB_File/DB_File.so

/usr/perl5/5.8.4/lib/i86pc-solaris-64int/auto/DB_File/DB_File.so:

  **** DYNAMIC SECTION INFORMATION ****
.dynamic:
[INDEX] Tag         Value
[1]     NEEDED          libdb-4.2.so
[2]     NEEDED          libc.so.1
[3]     INIT            0xe030
[4]     FINI            0xe050
[5]     RUNPATH         /usr/sfw/lib:/usr/local/BerkeleyDB/lib
[6]     RPATH           /usr/sfw/lib:/usr/local/BerkeleyDB/lib

This was a fun experience, since it finally forced me to sit down and read through the ELF standard (great read if you are into that kind of stuff).