Substituting text in the HTTP response body with mod_substitute

While doing a bit of research tonight I came across a reference to mod_substitute. This nifty module is an output filter that lets you substitute text in the HTTP response body, which provides an easy way to do things similar to the following:

<Location /private>
    AddOutputFilterByType SUBSTITUTE text/html
    Substitute s/SECRET/XXXXX/ni
</Location>
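To make the flags concrete, here is a rough Python sketch of what I understand that rule to do to each line of the response: the “n” flag treats the pattern as a fixed string rather than a regex, and the “i” flag makes the match case-insensitive. The function name and the sample lines are made up for illustration, and this only models the behavior; it is not how the module is implemented.

```python
import re


def substitute_fixed(line: str, pattern: str, replacement: str,
                     ignore_case: bool = True) -> str:
    """Replace every occurrence of a fixed string in one line of text."""
    flags = re.IGNORECASE if ignore_case else 0
    # re.escape() makes the pattern literal, like the "n" flag.
    # The lambda keeps backslashes in the replacement from being interpreted.
    return re.sub(re.escape(pattern), lambda _match: replacement, line, flags=flags)


# Hypothetical response body, processed one line at a time.
body = ["<p>The secret code is SECRET.</p>", "<p>Nothing to hide here.</p>"]
filtered = [substitute_fixed(line, "SECRET", "XXXXX") for line in body]
```

Because of the “i” flag, the lowercase “secret” in the first line gets masked as well, which is worth keeping in mind when picking a pattern.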

I digs me some Apache!

Simplifying Apache chroot creation with mod_chroot

Building and maintaining Apache chroot environments can be a royal pain. Creating a chroot environment for Apache requires you to first identify all the libraries and applications that are required to run the httpd processes. Once you identify the dependencies, you need to create a chroot environment that contains these files. After you successfully create the chroot environment, you need to update it whenever security and reliability updates are released. This can be a time-consuming process, and even though several tools (e.g., mock, makejail, etc.) exist to ease it, there is still a fair amount of work required to get things running properly.
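To give a feel for the dependency-hunting step, here is a small Python sketch that runs ldd against a binary and pulls out the library paths it reports. It assumes ldd is available and prints lines in the usual “name => /path (address)” form; the function names are my own, and it will not catch anything loaded later with dlopen().

```python
import subprocess


def parse_ldd_output(text: str) -> list[str]:
    """Return the absolute library paths listed in ldd output."""
    paths = []
    for raw in text.splitlines():
        line = raw.strip()
        if "=>" in line:
            # "libm.so.6 => /lib/libm.so.6 (0x...)": keep the resolved path.
            line = line.split("=>", 1)[1].strip()
        # Drop the trailing load address, if there is one.
        path = line.split(" (", 1)[0].strip()
        if path.startswith("/") and path not in paths:
            paths.append(path)
    return paths


def library_dependencies(binary: str) -> list[str]:
    """Run ldd on a binary and return the libraries it links against."""
    result = subprocess.run(["ldd", binary], capture_output=True,
                            text=True, check=True)
    return parse_ldd_output(result.stdout)
```

Calling library_dependencies() with the path to your httpd binary gives you a starting list of files to copy into the jail, which is exactly the chore the next section tries to avoid.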

One way to get around the hassles of creating chroot environments is to use mod_chroot. Mod_chroot will issue the chroot() system call after the runtime linker loads dependent libraries, and Apache processes its configuration file and opens the access and error logs. Delaying the chroot() system call until after Apache is initialized can greatly reduce the amount of work required to configure the chroot environment, since libraries don’t need to be copied* into the jail, and logs and configuration files can live outside of the chroot environment.

Installing and configuring mod_chroot is a snap. To compile and install mod_chroot from source, you can use the apxs utility from the Apache installation you want to run in the chroot environment:

$ tar xfvz mod_chroot-0.5.tar.gz

$ apxs -cai mod_chroot-0.5/src/apache20/mod_chroot.c

This will compile mod_chroot and install it in the Apache loadable modules directory. To configure mod_chroot, you will first need to add a “LoadModule” directive to your httpd.conf to load mod_chroot:

LoadModule chroot_module modules/mod_chroot.so

Next you will need to add a “ChrootDir” directive with the directory you want to chroot Apache to:

ChrootDir /var/chroot/apache

The chroot directory should contain the content your web server serves, and any dependencies that can’t be resolved prior to the web server starting. Configuration is extremely simple, though there are a few caveats to watch out for. The web server cannot be gracefully restarted unless the web server configuration file is moved into the chroot, and programs that lazily load shared libraries will fail. Mod_chroot is an incredible module, and can definitely make managing chroot environments a whole lot easier! Nice!

* If a program uses dlopen() to load a library, you will need to copy the library into the chroot environment, or use the Apache “LoadFile” directive to load it at initialization time.

Customizing PHP builds

A few weeks back I helped a friend build PHP on a server with a non-standard directory structure. Changing the structure to use common defaults wasn’t an option, so we needed to adjust the PHP configure script to point to the pertinent places. Here is what we came up with:

$ export CPPFLAGS="-I/home/apps/include -I/home/apps/include/mysql"

$ export LDFLAGS="-L/home/apps/lib -L/home/apps/lib/mysql"

$ ./configure \
     --prefix=/home/apps/sfw/php-5.1.4 \
     --with-apxs2=/home/apps/httpd/bin/apxs \
     --with-libxml=/home/apps \
     --with-libxml-dir=/home/apps \
     --with-mysql=/home/apps \
     --with-zlib=/home/apps

This will build PHP using the apxs utility that resides in /home/apps/httpd/bin/apxs, and will look for the MySQL, libxml and zlib libraries and headers under /home/apps/lib and /home/apps/include.

Using wildcards in Apache server aliases

Apache allows you to create hundreds of virtual host containers. Each container is required to have a ServerName directive, which contains the domain name associated with the virtual host. In addition to a server name, one or more aliases can be associated with the virtual host with the ServerAlias directive. Aliases can contain a domain name, or a wildcard pattern (using * and ?) that matches a whole family of host names. This is super useful, and allows you to do things like this:

NameVirtualHost 192.168.1.18:8080

<VirtualHost 192.168.1.18:8080>
     ServerName foo.com
     ServerAlias *.foo.com
</VirtualHost>

<VirtualHost 192.168.1.18:8080>
     ServerName bar.com
     ServerAlias *.bar.com
</VirtualHost>

This opens a whole slew of cool and interesting possibilities for virtual hosting. Niiiiice!
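To show how I think about the matching, here is a tiny Python approximation: a request's Host header selects a virtual host if it equals the ServerName or matches one of the ServerAlias wildcards, ignoring case. The function name is made up, and fnmatch is only a stand-in for Apache's own matcher (it also understands [...] character classes, which I am assuming you won't use in an alias).

```python
from fnmatch import fnmatchcase


def vhost_matches(host: str, server_name: str, aliases: list[str]) -> bool:
    """Return True if the host name would select this virtual host."""
    host = host.lower()
    if host == server_name.lower():
        return True
    # "*" matches any run of characters, dots included; "?" matches one.
    return any(fnmatchcase(host, alias.lower()) for alias in aliases)


print(vhost_matches("www.foo.com", "foo.com", ["*.foo.com"]))
print(vhost_matches("www.bar.com", "foo.com", ["*.foo.com"]))
```

Note that “*.foo.com” does not match the bare “foo.com” on its own, which is why the ServerName line still matters.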

Measuring Apache request processing time

I support a fair number of Apache web server instances, and periodically need to measure the time it takes Apache (and its various modules) to process a request. On Solaris 10 hosts, I can use DTrace to retrieve this information on the fly. Since Solaris 9, CentOS and Red Hat Linux don’t come with DTrace, I use a different approach on these platforms.

To get the time when each request was received by Apache, I use the mod_headers “Header” directive with the “%t” option (the time the request was received, measured in microseconds since the epoch) to add a response header:

Header set X-Request-Received: %t

To get the total time Apache spent processing a request, I use the mod_headers “Header” directive with the “%D” option (microseconds spent processing the request) to add a second response header:

Header set X-Request-Processing-Time: %D

Since I don’t always need the headers to be present, I like to be able to enable and disable them from the command line. The easiest way to do this is by enclosing the directives in a conditional block similar to the following:

<IfDefine RequestTime>
     Header set X-Request-Received: %t
     Header set X-Request-Processing-Time: %D
</IfDefine>

And using the httpd “-D” option to enable them:

$ httpd -k start -DRequestTime

After the headers are enabled, you will see entries similar to the following in each HTTP response:

$ curl -v http://192.168.1.13:8080/

* About to connect() to 192.168.1.13 port 8080

< ..... >

< HTTP/1.1 200 OK
< Date: Tue, 02 Jan 2007 17:59:42 GMT
< Server: Apache/2.2.3 (Unix) DAV/2
< Last-Modified: Sat, 20 Nov 2004 20:16:24 GMT
< ETag: "34d37-2c-4c23b600"
< Accept-Ranges: bytes
< Content-Length: 44
< X-Request-Received: t=1167760782452525
< X-Request-Processing-Time: D=3513
< Content-Type: text/html
* Connection #0 to host 192.168.1.13 left intact
* Closing connection #0
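Those raw numbers are not very friendly, so here is a short Python sketch that turns the two header values into something readable. mod_headers documents both values in microseconds, with “t=” and “D=” prefixes; the function name is my own, and the inputs are the values from the response above.

```python
from datetime import datetime, timedelta, timezone

EPOCH = datetime(1970, 1, 1, tzinfo=timezone.utc)


def parse_timing_headers(received: str, processing: str) -> tuple[datetime, timedelta]:
    """Convert the "t=..." and "D=..." header values to a datetime and a timedelta."""
    received_us = int(received.removeprefix("t="))
    processing_us = int(processing.removeprefix("D="))
    # Integer microseconds are added to the epoch to avoid float rounding.
    return (EPOCH + timedelta(microseconds=received_us),
            timedelta(microseconds=processing_us))


received_at, elapsed = parse_timing_headers("t=1167760782452525", "D=3513")
print(received_at.isoformat(), elapsed.total_seconds())
```

The decoded timestamp lands on 17:59:42 GMT on 2 January 2007, which lines up with the Date header in the same response, and the processing time works out to about 3.5 milliseconds.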

Similar capabilities are available for measuring request processing time on the client. Total time is helpful, but knowing how much of that time was consumed by Apache is invaluable!

Buffering Apache logfiles with the BufferedLogs directive

On busy web servers, the process of writing to the access_log can sometimes overwhelm the spindles in a server. In Apache 2.0.41, the developers added the experimental “BufferedLogs” directive to buffer access_log entries in memory, and write them out as a single group. The documentation indicates that setting “BufferedLogs” to “On” enables buffered logging, but I couldn’t find anything that described how to configure the size of the buffer. After a bit of poking around in mod_log_config.c, I noticed that the size of the buffer was controlled by the LOG_BUFSIZE macro:

if (len + buf->outcnt > LOG_BUFSIZE) {      
      flush_log(buf);
}
if (len >= LOG_BUFSIZE) {
      apr_size_t w;

      str = apr_palloc(r->pool, len + 1);
      for (i = 0, s = str; i < nelts; ++i) {
          memcpy(s, strs[i], strl[i]);
          s += strl[i];
      }
      w = len;
      rv = apr_file_write(buf->handle, str, &w);
}
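To convince myself I understood the logic, I modelled it in a few lines of Python. This is a simplified sketch, not Apache's code: the class and method names are made up, bufsize plays the role of LOG_BUFSIZE, and the final else branch (appending a small entry to the buffer) is my assumption about what follows the excerpt above.

```python
class BufferedLog:
    """Collect log entries in memory and write them out in batches."""

    def __init__(self, write, bufsize: int = 4096):
        self.write = write         # callable that takes bytes, e.g. file.write
        self.bufsize = bufsize     # plays the role of LOG_BUFSIZE
        self.buffer = bytearray()  # its length plays the role of buf->outcnt

    def flush(self) -> None:
        """Write out whatever is buffered, if anything."""
        if self.buffer:
            self.write(bytes(self.buffer))
            self.buffer.clear()

    def log(self, entry: bytes) -> None:
        # Flush first if the new entry would not fit in the remaining space.
        if len(entry) + len(self.buffer) > self.bufsize:
            self.flush()
        if len(entry) >= self.bufsize:
            # Oversized entries bypass the buffer and are written directly.
            self.write(entry)
        else:
            # Assumed branch: small entries accumulate until the next flush.
            self.buffer += entry


writes = []
log = BufferedLog(writes.append, bufsize=16)
log.log(b"12345678")
log.log(b"abcdefgh")  # buffer is now exactly full, nothing written yet
log.log(b"!")         # does not fit, so the first 16 bytes are flushed
```

The takeaway is that entries only reach the disk when the buffer fills (or at shutdown), so a crash can lose up to one buffer's worth of log lines.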

To see what LOG_BUFSIZE was set to, I searched for the value in mod_log_config.c. This is what my search turned up:

/* POSIX.1 defines PIPE_BUF as the maximum number of bytes that is
 * guaranteed to be atomic when writing a pipe.  And PIPE_BUF >= 512
 * is guaranteed.  So we'll just guess 512 in the event the system
 * doesn't have this.  Now, for file writes there is actually no limit,
 * the entire write is atomic.  Whether all systems implement this
 * correctly is another question entirely ... so we'll just use PIPE_BUF
 * because it's probably a good guess as to what is implemented correctly
 * everywhere.
 */
#ifdef PIPE_BUF
#define LOG_BUFSIZE     PIPE_BUF
#else
#define LOG_BUFSIZE     (512)
#endif

Nifty! So if PIPE_BUF is defined, that will be used as the size. Now to see what the value of PIPE_BUF is set to on my CentOS 4.4 server:

$ cd /usr/include && find . | xargs grep PIPE_BUF
./linux/limits.h:#define PIPE_BUF 4096 /* # bytes in atomic write to a pipe */

So PIPE_BUF is 4096 bytes (4k) on my CentOS 4.4 server, and since it is defined, that is the buffer size BufferedLogs will use. I am curious to see if there are any downsides to using extremely large buffers (hosting providers might be interested in using something larger than 4k). More testing is needed …
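As a cross-check that doesn't involve grepping header files, you can also ask the running system for the value. Here is a minimal Python sketch using os.pathconf(), which is available on Unix platforms; the function name is my own, and the value reported is the one the OS applies to pipes under the given path rather than anything Apache-specific.

```python
import os


def pipe_buf_size(path: str = "/") -> int:
    """Ask the OS for PIPE_BUF as it applies to the given path."""
    return os.pathconf(path, "PC_PIPE_BUF")


print(pipe_buf_size())
```

POSIX guarantees the value is at least 512, and I would expect it to agree with the limits.h definition on the same host.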