Finding BIND failures in OpenLDAP logfiles

When OpenLDAP is configured to log connection information, a RESULT entry is written with the status (e.g., success or failure) of the last BIND:

$ grep RESULT openldap.log | head -1
Dec 28 21:05:01 winnie slapd[7101]: [ID 217296 local4.debug] conn=25 op=0 RESULT tag=97 err=0 text=

The “err=” string contains zero if the BIND was successful, and an error code if the BIND didn’t complete successfully. The error codes are defined in “LDAPResult.h,” which is included with the OpenLDAP source code. When a BIND fails because the credentials were invalid, the error string will contain the value 49:

$ grep “err=49” openldap.log | head -1

Dec 29 12:22:00 winnie slapd[7101]: [ID 217296 local4.debug] conn=27 op=0 RESULT tag=97 err=49 text=

To get the DN that tried to BIND, you can grep the connection number (the value after conn=) out of the OpenLDAP logfile:

$ grep “conn=27.*BIND” openldap.log
Dec 29 12:22:00 winnie slapd[7101]: [ID 347666 local4.debug] conn=27 op=0 BIND dn=”cn=mail,dc=synackfin,dc=com” method=128

You can get the IP address of the host that initiated the BIND by grepping the connection id, along with the ACCEPT keyword, from the OpenLDAP logfile:

$ grep “conn=27.*ACCEPT” openldap.log
Dec 29 12:22:00 winnie slapd[7101]: [ID 848112 local4.debug] conn=27 fd=11 ACCEPT from IP=192.168.1.2:39749 (IP=192.168.1.2:636)

This is useful for tracking down folks who are poking around your directory server.

Measuring Website Page Load Times

I was curious to see how long it would take to load “index.html” from my web server, so I created http-ping.pl to measure the time it took to retrieve a webpage from my web server:
$ http-ping.pl -s prefetch.net -p 80 -d 2

Querying HTTP server prefetch.net:80 every 2 seconds (Ctrl-C to stop):
Mon Nov 29 18:00:21 2004: TCP Connection Time=0.034s HTTP GET Time=0.052s [Normal Delay]
Mon Nov 29 18:00:23 2004: TCP Connection Time=0.035s HTTP GET Time=0.052s [Normal Delay]
Mon Nov 29 18:00:25 2004: TCP Connection Time=0.035s HTTP GET Time=0.055s [Normal Delay]
Mon Nov 29 18:00:27 2004: TCP Connection Time=0.035s HTTP GET Time=0.053s [Normal Delay]
Mon Nov 29 18:00:30 2004: TCP Connection Time=0.035s HTTP GET Time=0.051s [Normal Delay]

This will display the time it takes to perform the TCP three way handshake ( Syn, Syn/Ack, Ack), and GET the index page from a web server. This was a quick and dirty hack, and a full rewrite in C (w/ SSL support and timings!) is forthcoming. If you are interested in using http-ping.pl, you can grab the code from prefetch.net.

Selectively tracing Solaris system calls

I was debugging an I/O problem last week, and needed to see how an application was using a UFS file system on a Solaris 9 server. Solaris comes with the “truss” utility, which can be used to trace library and system calls. When truss is invoked with the “-t” option, only the system calls passed to “-t” will be traced:

$ truss -leaf -t open,close,write,lseek -p 7044
7044/1: psargs: /usr/local/bin/blah -ubpid 17595 -web -wtbhostname w4
7044/1: open(“log/httpd_log”, O_WRONLY) = 25
7044/1: lseek(25, 0, SEEK_END) = 0x04B33A05
7044/1: write(25, ” SOME_ TEXT “.., 133) = 133
7044/1: close(25) = 0

This example is taken from an application I manage ( I didn’t write it ). The application opens a log file, seeks to the end of the file, writes 133-bytes of data, and then closes the log file. The application does this 100s of times per minute, which is extremely inefficient. To fix this problem, the applications needs to be modified to use a single open() and close(). This will also allow us to cluster writes (if the appropriate tunables and open() options are are set correctly) to the log file.