Summarizing system call activity on Solaris hosts

I previously described how to use strace to summarize system call activity on Linux hosts. Solaris provides similar data through the truss “-c” option:

$ truss -c -p 26396
syscall               seconds   calls  errors
read                     .000       3
write                   7.671   25038
time                     .000      21
stat                     .000      15
lseek                    .460   24944
getpid                   .000      15
kill                     .162    7664
sigaction                .004     237
writev                   .000       3
lwp_sigmask              .897   49887
pollsys                  .476    7667
                     --------  ------   ----
sys totals:             9.674  115494      0
usr time:               2.250
elapsed:              180.940

The output contains the total elapsed time, a breakdown of user and system time, the number of errors that occurred, the number of times each system call was invoked, and the total time accrued by each system call. This has numerous uses, and makes it easy to see how a process is interacting with the kernel. Sweet!
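truss can also summarize a command from start to finish instead of attaching to an existing process. A quick sketch (ls is just a stand-in for whatever you want to profile; the summary is written to stderr when the command exits):

$ truss -c ls /etc > /dev/null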

Configuring NSCD to cache DNS host lookups

I haven’t really spent that much time configuring nscd, so I thought I would take a crack at it this morning while sipping my cup of joe.

Looking at one of my production hosts, I queried the “hosts” cache statistics. This is the nscd cache that holds DNS lookups. With the nscd daemon running, you can query the size and performance of the caches with the -g flag:


$ nscd -g   
CACHE: hosts

         CONFIG:
         enabled: yes
         per user cache: no
         avoid name service: no
         check file: yes
         check file interval: 0
         positive ttl: 0
         negative ttl: 0
         keep hot count: 20
         hint size: 2048
         max entries: 0 (unlimited)

         STATISTICS:
         positive hits: 0
         negative hits: 0
         positive misses: 0
         negative misses: 0
         total entries: 0
         queries queued: 0
         queries dropped: 0
         cache invalidations: 0
         cache hit rate:        0.0

Ugh! No bueno! So, out of the box, nscd isn’t configured to cache host lookups at all. This means that every lookup this machine performs hits a DNS server listed in /etc/resolv.conf. That adds load to our DNS servers, and increases the time the applications running on this box have to wait before they can do something useful. Looking at the configuration options for the “hosts” cache…


$ grep hosts /etc/nscd.conf 
        enable-cache            hosts           yes
        positive-time-to-live   hosts           0
        negative-time-to-live   hosts           0
        keep-hot-count          hosts           20
        check-files             hosts           yes

Hm. So positive-time-to-live is set to zero. Looking at the man page for /etc/nscd.conf…

positive-time-to-live cachename value
Sets the time-to-live for positive entries (successful
queries) in the specified cache. value is in integer
seconds. Larger values increase cache hit rates and
reduce mean response times, but increase problems with
cache coherence. Note that sites that push (update) NIS
maps nightly can set the value to be the equivalent of
12 hours or more with very good performance implications.

OK, so let’s set the positive TTL to 60 seconds. It seems like a decent starting value…
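Applying the change is just a matter of editing /etc/nscd.conf and bouncing the daemon. Here is a minimal sketch, assuming nscd is managed by the SMF name-service-cache service on this release (older releases use the /etc/init.d/nscd script instead):

$ grep positive-time-to-live /etc/nscd.conf
        positive-time-to-live   hosts           60

$ svcadm restart svc:/system/name-service-cache:default

After making this change and letting the daemon run for a bit, here are the performance statistics of the hosts cache: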


CACHE: hosts

         CONFIG:
         enabled: yes
         per user cache: no
         avoid name service: no
         check file: yes
         check file interval: 0
         positive ttl: 60
         negative ttl: 0
         keep hot count: 20
         hint size: 2048
         max entries: 0 (unlimited)

        STATISTICS:
         positive hits: 143
         negative hits: 1
         positive misses: 20
         negative misses: 41
         total entries: 20
         queries queued: 0
         queries dropped: 0
         cache invalidations: 0
         cache hit rate:       70.2

Crazy. With just a 60 second cache, we are now sending roughly 70% fewer lookups to the DNS servers. That’s a significant performance win. By default, keep-hot-count is set to 20, which is the number of entries nscd keeps current in the “hosts” cache. Looking at the man page for nscd.conf…


keep-hot-count cachename value

This attribute allows the administrator to set the
number of entries nscd(1M) is to keep current in the
specified cache. value is an integer number which should
approximate the number of entries frequently used during
the day.

So, raising positive-time-to-live to, say, 5 minutes won’t have much value unless keep-hot-count is also raised. Both the cache age and the number of entries kept in the cache need to be increased. Doing so will help keep your DNS servers idle, and your applications happy.
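If you go that route, the relevant /etc/nscd.conf entries would look something like the following. The values below are hypothetical starting points rather than recommendations, so tune them for your environment and restart nscd afterwards:

        positive-time-to-live   hosts           300
        keep-hot-count          hosts           100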

ipmitool + LOM = rad amounts of hardware data collection.

There are so many different hardware sensors on machines now. If you have a machine with an iLOM (like Sun’s line of x86 hardware), you can probe all of this information over the network by pointing ipmitool at the iLOM.

Usage for ipmitool over the network:


$ ipmitool -I lan -H (ip address of your lom) -U (username on lom) (command)

Typically Sun iLOMs use either "root" or "admin" as the username. Adjust to your hardware / environment. Note that the lan interface sends traffic across the wire in clear text; it isn't encrypting the password, so keep that in mind.
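As a concrete example, a good first sanity check is chassis status, which confirms you can reach and authenticate to the LOM (the address below is made up):

$ ipmitool -I lan -H 192.168.1.50 -U root chassis status
Password: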

Reporting whether there are any active fault LEDs on the box (sunoem is, of course, just for Sun gear):


$ ipmitool -I lan -H lomipddy -U root sunoem sbled get
Password: 
LOCATE           | OFF
OK               | ON
SERVICE          | ON
FAN_FAULT        | OFF
TEMP_FAULT       | OFF
PS_FAULT         | ON
MB/P0/D0/SVC     | OFF
MB/P0/D1/SVC     | OFF
MB/P0/D2/SVC     | OFF
MB/P0/D3/SVC     | OFF
....
..

Reporting what components make up the machine (FRU = field replaceable unit):


$ ipmitool -I lan -H ilomipaddy -U root fru
Password: 
FRU Device Description : Builtin FRU Device (ID 0)
 Product Manufacturer  : ASPEED
 Product Name          : BMC CONTROLLER

FRU Device Description : /SYS (ID 27)
 Chassis Type          : Rack Mount Chassis
 Chassis Part Number   : 541-1982-06
 Board Product         : ASSY,MOTHERBOARD,DORADO/TUCANA  
 Board Part Number     : 511-1394-02
 Board Extra           : 50
 Board Extra           : DT_MB
 Product Manufacturer  : SUN MICROSYSTEMS
 Product Name          : SUN FIRE X4140  
 Product Part Number   : 4534567-4

FRU Device Description : MB/P0/D4 (ID 12)
 Product Manufacturer  : Micron Technology
 Product Name          : 4096MB DDR-II 666
 Product Part Number   : 36HTF51272PY-667E1
 Product Version       : 0100
 Product Serial        : D93D8C07

FRU Device Description : MB/P1 (ID 7)
 Product Manufacturer  : AMD
 Product Name          : Six-Core AMD Opteron(tm) Processor 2435
 Product Part Number   : 1008
 Product Version       : 00
...
......

Viewing the iLOM event log to see what has happened to the machine in the past:


$ ipmitool -I lan -H ilomipaddy -U root sel elist
Password: 
   1 | 12/24/2009 | 02:49:24 | Power Supply PS1/PWROK | State Deasserted
   2 | 06/05/2010 | 07:49:00 | System ACPI Power State ACPI | S0/G0: working | Asserted
   3 | 06/05/2010 | 07:49:03 | Power Supply PS1/PWROK | State Deasserted
   4 | 06/05/2010 | 07:49:04 | Power Supply PS1/VINOK | State Deasserted
   5 | 06/05/2010 | 07:55:14 | Power Supply PS1/PWROK | State Asserted
   6 | 06/05/2010 | 07:55:16 | Power Supply PS1/VINOK | State Asserted
   7 | 06/05/2010 | 08:09:04 | Power Supply PS0/PWROK | State Deasserted
   8 | 06/05/2010 | 08:09:06 | Power Supply PS0/VINOK | State Deasserted
   9 | 06/05/2010 | 08:09:10 | Power Supply PS0/PWROK | State Asserted
   a | 06/05/2010 | 12:23:46 | System ACPI Power State ACPI | S0/G0: working | Asserted
   b | 06/05/2010 | 12:23:51 | Power Supply PS0/VINOK | State Deasserted
....
...

And of course, current voltages, temperatures, fan RPMs, etc.:

$ ipmitool -I lan -H ipaddyoflom -U root sdr elist
Password: 
ACPI             | EAh | lnc |  7.0 | 0 unspecified
INTSW            | EBh | ok  | 23.0 | 
MB/P0/PRSNT      | 01h | ok  |  3.0 | Device Present
MB/P1/PRSNT      | 02h | ok  |  3.1 | Device Present
MB/P0/T_CORE     | 09h | ok  |  3.0 | 7 degrees C
MB/P1/T_CORE     | 0Ah | ok  |  3.1 | 18 degrees C
MB/P0/V_VDDCORE  | 05h | lnr |  3.0 | 0 Volts
MB/P1/V_VDDCORE  | 06h | ok  |  3.1 | 1.03 Volts
MB/P0/V_+0V9     | 0Dh | ok  |  3.0 | 0.90 Volts
MB/P1/V_+0V9     | 0Eh | ok  |  3.1 | 0.90 Volts
MB/P0/V_+1V8     | 11h | ok  |  3.0 | 1.80 Volts
MB/P1/V_+1V8     | 12h | ok  |  3.1 | 1.79 Volts
MB/P0/V_VDDNB    | 15h | lnr |  3.0 | 0 Volts
MB/P1/V_VDDNB    | 16h | ok  |  3.1 | 1.31 Volts
MB/P0/PROCHOT    | 19h | lnc |  3.0 | 0 unspecified
MB/P1/PROCHOT    | 1Ah | lnc |  3.1 | 0 unspecified
MB/T_AMB         | 32h | ok  |  7.0 | 29 degrees C
MB/V_+12V        | 1Eh | ok  |  7.0 | 12.10 Volts
MB/V_+1V2HT      | 26h | ok  |  7.0 | 1.22 Volts
MB/V_+1V5        | 20h | ok  | 10.0 | 1.50 Volts
...
...

Note that since we are hitting an iLOM to query this information, we're not interacting with the host operating system at all. In all of the examples above, the actual server was powered off. Some cool stuff!

Also, all of these ipmitool commands can be run locally from the O/S. Instead of using -I lan to communicate over the network, you'll want to use -I bmc (if you leave it out on Solaris, it defaults to this) so the O/S knows to talk to its own LOM.
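For example, running this on the host itself (no -H, -U, or password needed) should return the same sensor listing shown above:

$ ipmitool -I bmc sdr elist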

Sweet! The only thing left is to gather this data and graph it in Zenoss, or throw alerts in Nagios based on bad values. =)
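As a rough sketch of the Nagios side, something like the following could scrape the ambient temperature out of the sdr output shown above and exit with Nagios-style return codes. The sensor name, thresholds, and password handling are all assumptions, so treat it as a starting point rather than a finished check:

#!/bin/sh
# check_lom_temp.sh -- hypothetical Nagios-style check built on ipmitool sdr output.
# Assumes the LOM password is exported in the IPMI_PASSWORD environment variable (-E)
# and that the ambient temperature sensor is named MB/T_AMB, as on the X4140 above.
LOM=ipaddyoflom
TEMP=`ipmitool -I lan -H $LOM -U root -E sdr elist | awk -F'|' '/T_AMB/ {print $5}' | awk '{print $1}'`

if [ -z "$TEMP" ]; then
    echo "UNKNOWN - could not read MB/T_AMB from $LOM"; exit 3
elif [ "$TEMP" -ge 40 ]; then
    echo "CRITICAL - ambient temperature is ${TEMP}C"; exit 2
elif [ "$TEMP" -ge 35 ]; then
    echo "WARNING - ambient temperature is ${TEMP}C"; exit 1
fi
echo "OK - ambient temperature is ${TEMP}C"; exit 0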

Note that the data ipmitool gathers is only as useful as the hardware sensors behind it. Various SPARC / x64 machines may return different values, and Dell / HP gear may report on things differently, so check your vendor's LOM for its level of ipmitool support. You can download the ipmitool source from the project's site.

Ridding your Solaris host of zombie processes

We encountered a nasty bug in our backup software this week. When this bug is triggered, each job (one process is created per job) that completes turns into a zombie. After a few days we will have hundreds or even thousands of zombie processes, which, if left unchecked, will eventually fill up the system-wide process table. Solaris comes with a nifty tool to help deal with zombies (no, they don’t ship you a shotgun with your media kit), and it goes by the name preap. To use preap, pass it the PID of the zombie process you want to reap:

$ ps -ef | grep defunct

    root   646   426   0        - ?           0:00 <defunct>
    root  1489 12335   0 09:32:54 pts/1       0:00 grep defunct

$ preap 646
646: exited with status 0

This causes the zombie to be reaped, and the kernel can then free the process table slot and other resources it was holding. On a related note, if you haven’t seen the movie Zombieland you are missing out!!!! That movie is hilarious!
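Zombieland aside, if you have hundreds or thousands of zombies to clean up, running preap by hand gets old quickly. A rough one-liner like this (sketched against the Solaris ps -o keywords) reaps everything currently in the Z state:

$ ps -eo pid,s | awk '$2 == "Z" {print $1}' | xargs -n 1 preap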

Managing 100s of Linux and Solaris machines with clusterit

I use numerous tools to perform my SysAdmin duties. One of my favorites is clusterit, a suite of programs that lets you run commands across one or more machines in parallel. To begin using the awesomeness that is clusterit, you will first need to download and install the software. This is as easy as:

$ wget http://prdownloads.sourceforge.net/clusterit/clusterit-2.5.tar.gz

$ tar xfvz clusterit*.gz

$ cd clusterit* && ./configure --prefix=/usr/local/clusterit && make && make install

Once the software is installed, you should have a set of binaries and manual pages in /usr/local/clusterit. To use the various tools in the clusterit/bin directory, you will first need to create one or more cluster files. Each cluster file contains a list of hosts you want to manage as a group, and each host is separated by a newline. Here is an example:

$ cat servers
foo1
foo2
foo3
foo4
foo5

The cluster file listed above contains 5 servers named foo1 through foo5. To tell clusterit you want to use this list of hosts, you will need to export the file's location via the CLUSTER environment variable:

$ export CLUSTER=/home/matty/clusters/servers

Once you specify the list of hosts you want to use in the $CLUSTER variable, you can start using the various tools. One of the handiest tools is dsh, which allows you to run commands across the hosts in parallel:

$ dsh uptime

foo1  :   2:17pm  up 8 day(s), 23:37,  1 user,  load average: 0.06, 0.06, 0.06
foo2  :   2:17pm  up 8 day(s), 23:56,  0 users,  load average: 0.03, 0.03, 0.02
foo3  :   2:17pm  up 7 day(s), 23:32,  1 user,  load average: 0.27, 2.04, 3.21
foo4  :   2:17pm  up 7 day(s), 23:33,  1 user,  load average: 3.98, 2.07, 0.96
foo5  :   2:17pm  up  5:06,  0 users,  load average: 0.08, 0.09, 0.09

In the example above, I ran the uptime command across all of the servers listed in the file referenced by the CLUSTER variable! You can also run more complex pipelines through dsh:

$ dsh 'if uname -a | grep SunOS >/dev/null; then echo Solaris; fi'
foo1 : Solaris
foo2 : Solaris
foo3 : Solaris
foo4 : Solaris
foo5 : Solaris

This example uses dsh to run uname across a batch of servers, and prints the string Solaris if the keyword “SunOS” is found in the uname output. Clusterit also comes with a distributed scp command called pcp, which you can use to copy a file to a number of hosts in parallel:

$ pcp /etc/services /tmp

services                   100%  616KB 616.2KB/s   00:00    
services                   100%  616KB 616.2KB/s   00:00    
services                   100%  616KB 616.2KB/s   00:00    
services                   100%  616KB 616.2KB/s   00:00    
services                   100%  616KB 616.2KB/s   00:00    

$ openssl md5 /etc/services
MD5(/etc/services)= 14801984e8caa4ea3efb44358de3bb91

$ dsh openssl md5 /tmp/services
foo1 : MD5(/tmp/services)= 14801984e8caa4ea3efb44358de3bb91
foo2 : MD5(/tmp/services)= 14801984e8caa4ea3efb44358de3bb91
foo3 : MD5(/tmp/services)= 14801984e8caa4ea3efb44358de3bb91
foo4 : MD5(/tmp/services)= 14801984e8caa4ea3efb44358de3bb91
foo5 : MD5(/tmp/services)= 14801984e8caa4ea3efb44358de3bb91

In this example I am using pcp to copy the file /etc/services to each host, and then using dsh to checksum the file that was copied. Clusterit also comes with a distributed top (dtop), a distributed df (pdf), and a number of job control tools! If you are currently performing management operations with the old for loop:

for host in `cat hosts`
do
    ssh $host 'run_some_command'
done

You really owe it to yourself to set up clusterit. You will be glad you did!

Configuring the Solaris FTP server to log extended data

I periodically use the stock Solaris FTP server on some of my servers, especially when I need to move tons of data around. Enabling the ftp service in Solaris is a snap:

$ svcadm enable network/ftp

The default ftp configuration leaves a lot to be desired, especially when you consider that nothing is logged. To configure the FTP daemon to log logins, transferred files and the commands sent to the server, you can enter the svccfg shell and add some additional options to the in.ftpd command line:

$ svccfg

svc:> select ftp

svc:/network/ftp> setprop inetd_start/exec="/usr/sbin/in.ftpd -a -l -L -X -w"

svc:/network/ftp> listprop

The “-a” option will enable the use of the ftpaccess file, the “-l” option will log each FTP session, the “-L” option will log all commands sent to the server, the “-X” option will cause all file accesses to be logged to syslog, and the “-w” option will record the logins to the wtmpx file. Since most of this information is logged using the daemon facility and info log level, you will need to add a daemon.info entry to /etc/syslog.conf if you want the data to be logged to a file (or to a remote log server); an example entry is shown after the restart commands below. To force the changes listed above to take effect, you will need to restart the inetd, system-log and ftp services:

$ svcadm restart inetd

$ svcadm restart network/ftp

$ svcadm restart system-log
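The /etc/syslog.conf entry might look something like the following (the log file name is arbitrary, the whitespace between the selector and the action must be a tab, and the file should exist before system-log is restarted):

$ touch /var/log/ftpd.log

$ grep daemon.info /etc/syslog.conf
daemon.info                                     /var/log/ftpd.log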

Now each time an FTP transfer occurs, you will get entries similar to the following in the system log:

Nov 24 17:46:32 prefetch01 ftpd[9304]: [ID 716067 daemon.info] AUTH GSSAPI
Nov 24 17:46:32 prefetch01 ftpd[9304]: [ID 716067 daemon.info] AUTH KERBEROS_V4
Nov 24 17:46:32 prefetch01 ftpd[9304]: [ID 165209 daemon.info] USER prefetch
Nov 24 17:46:32 prefetch01 ftpd[9304]: [ID 125383 daemon.info] PASS password
Nov 24 17:46:32 prefetch01 ftpd[9304]: [ID 124999 daemon.info] FTP LOGIN FROM 1.2.3.4 [1.2.3.4], backup
Nov 24 17:46:32 prefetch01 ftpd[9304]: [ID 470890 daemon.info] SYST
Nov 24 17:48:42 prefetch01 ftpd[9304]: [ID 225560 daemon.info] QUIT
Nov 24 17:48:42 prefetch01 ftpd[9304]: [ID 528697 daemon.info] FTP session closed

While FTP isn’t something we rely on for 99% of what we do, it definitely still has its place.