Over the past ten years, companies have increasingly turned to the World Wide Web to sell and market products and to provide around-the-clock support for their customers. The infrastructure that supports these systems is typically split into vertical tiers based on the type of infrastructure deployed (e.g., load-balancers, web proxies, web servers, application servers, database servers). This tiered architecture allows companies to deploy multiple systems at each tier to ensure website resiliency. It also eases upgrades, since each tier can be scaled as demand for the company’s services increases.

Tiered web-based infrastructure has numerous performance and availability benefits, but it also has drawbacks. The biggest is the complexity that multiple systems add: in a large deployment, pinpointing the faulty system becomes more difficult. This article introduces three open source tools that can assist with debugging web applications and pinpointing problems, and presents a case study showing how these tools were used to solve a real-world problem.

Debugging with Curl

One tool that is invaluable for debugging web-based applications is curl. Curl is a full-featured command-line client that can be used to retrieve files, download web content, and view the application-layer headers exchanged between clients and servers. To get started with curl(1), run the curl binary with the “-h” (print help menu) option to list the available options and the values they accept.

curl retrieves the resource passed as an argument and prints its contents to standard output, or writes them to the file passed to the “-o” option. The resource can be an http://, https://, or ftp:// style URL. The following example shows how curl can be used to retrieve the curl source code:

$ curl -o curl-7.15.0.tar.gz http://curl.haxx.se/download/curl-7.15.0.tar.gz

% Total    % Received % Xferd  Average Speed   Time    Time     Time  Current
                                 Dload  Upload   Total   Spent    Left  Speed
 23 1709k   23  396k    0     0  40671      0  0:00:43  0:00:09  0:00:34 23822
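When curl runs from a script rather than interactively, it is usually run quietly and its result checked afterward. The following is a minimal sketch, not part of the original article: the function name fetch_tarball is hypothetical, and it assumes the downloaded file is a gzip-compressed archive such as the curl source tarball above. It relies on curl’s “-f” (fail on HTTP errors), “-s” (silent), and “-S” (show errors) options, and on gzip’s “-t” (test integrity) option.

```shell
#!/bin/sh
# Hypothetical helper (not from the original article): download a
# gzip-compressed tarball and confirm that it arrived intact.
#   $1 = URL to fetch
#   $2 = local file name to write
fetch_tarball() {
    # -f: return an error on HTTP failures instead of saving the error page
    # -s: suppress the progress meter; -S: still print an error if one occurs
    curl -f -s -S -o "$2" "$1" || return 1

    # gzip -t checks the compressed data without extracting it
    gzip -t "$2"
}
```

For example, “fetch_tarball http://curl.haxx.se/download/curl-7.15.0.tar.gz curl-7.15.0.tar.gz” returns a non-zero status if either the transfer or the integrity check fails.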

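Curl also provides numerous advanced debugging options, which can be used to test HTTP features, display verbose output, and retrieve protocol headers. The following example uses curl’s “-v” (verbose output) option to display the HTTP request and response headers for a connection to the mail.prefetch.net web server. The “-k” (insecure) option tells curl to proceed even if the server’s certificate cannot be verified, which is why the transfer continues after the certificate verification error shown in the output: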

$ curl -k -v https://mail.prefetch.net

* About to connect() to mail.prefetch.net port 443
*   Trying 206.222.17.179... * connected
* Connected to mail.prefetch.net (206.222.17.179) port 443
* successfully set certificate verify locations:
*   CAfile: /usr/share/curl/curl-ca-bundle.crt
  CApath: none
* SSL connection using DHE-RSA-AES256-SHA
* Server certificate:
*        subject: /C=US/O=mail.prefetch.net/OU=https://services.choicepoint.net/get.jsp?1605445126 \
         /OU=See www.rapidssl.com/cps (c)04/OU=Domain Control Validated - StarterSSL(TM)/CN=mail.prefetch.net
*        start date: 2005-05-20 22:09:48 GMT
*        expire date: 2006-06-20 22:09:48 GMT
*        common name: mail.prefetch.net (matched)
*        issuer: /C=US/O=Equifax Secure Inc./CN=Equifax Secure eBusiness CA-1
* SSL certificate verify result: error number 1 (20), continuing anyway.
> GET / HTTP/1.1
User-Agent: curl/7.13.1 (powerpc-apple-darwin8.0) libcurl/7.13.1 OpenSSL/0.9.7g zlib/1.2.3
Host: mail.prefetch.net
Pragma: no-cache
Accept: */*

< HTTP/1.1 302 Found
< Date: Thu, 20 Oct 2005 18:49:20 GMT
< Server: Apache
< Set-Cookie: Horde=7ebcae69e30287045dd3d3d1fe1dd31f; path=/; domain=mail.prefetch.net; secure
< Expires: Thu, 19 Nov 1981 08:52:00 GMT
< Cache-Control: no-store, no-cache, must-revalidate, post-check=0, pre-check=0
< Pragma: no-cache
< Location: https://mail.prefetch.net/login.php?Horde=7ebcae69e30287045dd3d3d1fe1dd31f
< Content-Length: 0
< Content-Type: text/html; charset=ISO-8859-1
* Connection #0 to host mail.prefetch.net left intact
* Closing connection #0
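When only the server’s side of the exchange matters, the verbose output can be reduced to the lines curl prefixes with “<”. The sketch below is a hypothetical helper (the name response_headers is illustrative); it assumes curl writes its verbose trace to standard error with response headers marked “< ”, as in the output above, and it strips the carriage return that ends each HTTP header line.

```shell
# Hypothetical filter: print only the response headers from "curl -v" output.
response_headers() {
    # Keep lines that curl marks with "< ", remove that marker,
    # then drop the carriage return that ends each HTTP header line.
    sed -n 's/^< //p' | tr -d '\r'
}

# Example use: the verbose trace goes to stderr, so route stderr into the
# pipe and discard the page body on stdout.
#   curl -k -v https://mail.prefetch.net 2>&1 >/dev/null | response_headers
```

Curl’s “-I” (fetch headers only) option is a simpler alternative, but it sends a HEAD request, which some applications handle differently from a GET.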

If reverse proxies or load-balancers are used to distribute content across multiple web servers, it can be difficult to determine which web server responded to a client’s request. This is especially true when the client’s source IP address is NAT’ed, or when the web server is handling hundreds of requests simultaneously. When these issues arise, you can use curl’s “--user-agent” (use a custom user agent) and “-H” (send a custom request header) options to send a unique user-agent string and request header to the server:

$ curl -v --user-agent "CURL DEBUG (`date`)" -H "X-foo: yikes" --show-error prefetch.net
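Generating the tag once and keeping a copy makes it easier to search for later. The function below is a hypothetical convenience wrapper (tagged_get is not a curl feature); it sends the same “--user-agent” and “-H” options as the command above, discards the response body, and prints the tag it used.

```shell
# Hypothetical wrapper: send one tagged request and print the tag that was used.
#   $1 = URL to request
tagged_get() {
    # Build the tag once so the value sent and the value printed are identical
    tag="CURL DEBUG ($(date))"

    # -o /dev/null discards the body; -s -S hides the progress meter but keeps errors
    curl -s -S -o /dev/null --user-agent "$tag" -H "X-foo: yikes" "$1" || return 1
    printf '%s\n' "$tag"
}

# Example use:
#   TAG=$(tagged_get http://prefetch.net/)
#   echo "Search the access logs for: $TAG"
```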

If the web server is configured to log the User-Agent header, the string “CURL DEBUG” will be logged along with the timestamp:

$ tail -1 access_log

10.10.10.10 - - [26/Oct/2005:19:11:24 -0400] "GET / HTTP/1.1" 301 252 "-" "CURL DEBUG (Wed Oct 26 13:11:24 EDT 2005)"

When a problem is detected with the content returned from a server, this information can be used to find the server that returned the errant content. The case study “Debugging inconsistent website behavior” shows how this feature was used to debug a problem with an Apache web server.
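Because the tag contains parentheses, which egrep treats as regular-expression grouping, an exact string comparison is safer when searching for it. The sketch below is hypothetical (the function name tagged_status is illustrative). It assumes Apache’s “combined” log format, where splitting a line on double quotes puts the request line in field 2, the status and byte count in field 3, and the User-Agent in field 6. It prints the status code and request line for each entry carrying the tag.

```shell
# Hypothetical helper: show what the server returned to a tagged request.
# Assumes Apache's "combined" log format.
#   $1 = access log file
#   $2 = exact tag string, e.g. "CURL DEBUG (Wed Oct 26 13:11:24 EDT 2005)"
tagged_status() {
    # Compare the User-Agent field as a plain string so "(" and ")" are literal
    awk -F'"' -v tag="$2" '$6 == tag { split($3, f, " "); print f[1], $2 }' "$1"
}
```

Applied to the access_log entry shown above, it would print “301 GET / HTTP/1.1”.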

Debugging with Chaosreader

When debugging complex web applications, the ability to view the complete client-server interaction and drill down to specific requests and responses can be invaluable. This capability is available with the freeware ethereal and chaosreader utilities. Since ethereal is covered thoroughly in numerous books and online articles, we will focus on the capabilities of chaosreader in this section.

Chaosreader is written in Perl and produces reports from tcpdump or snoop capture files. These reports include information on TCP, UDP, ICMP, and IP traffic, and contain the application-layer data from several well-known protocols. To get started with chaosreader, download the Perl script from the SourceForge website.

Once the script has been downloaded to a local file system, it can be executed with the “--help” option to display the available options and several practical examples.

You can use tcpdump or snoop to collect network traffic for chaosreader to analyze. The following example uses tcpdump to write all packets with a source or destination port of 80 to a file named chaosreader.dump:

$ tcpdump -i en0 -s 1518 -w chaosreader.dump port 80

This example uses a snap length of 1518 bytes to ensure that all protocol headers and application data are captured. To get the most benefit from chaosreader, capture packets while the problem is occurring on the web or application server. To analyze the capture, pass the file containing the saved packets as an argument to the chaosreader script:

$ chaosreader.pl -D html chaosreader.dump

  0003  192.168.1.8:55510,209.249.116.195:80           http
  0008  192.168.1.8:55515,209.249.116.197:80           http
  0004  192.168.1.8:55511,209.249.116.195:80           http
  0002  192.168.1.8:55509,209.249.116.195:80           http
  0005  192.168.1.8:55512,209.249.116.195:80           http
  0011  192.168.1.8:55518,209.249.116.197:80           http
  0001  192.168.1.8:55508,209.249.116.195:80           http
  0009  192.168.1.8:55516,216.52.17.116:80             http
  0006  192.168.1.8:55513,209.249.116.197:80           http
  0007  192.168.1.8:55514,216.52.17.116:80             http
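Behind a load-balancer, it can help to see how many sessions went to each server. The filter below is a hypothetical sketch (sessions_per_server is not a chaosreader option). It assumes the session lines look like the sample output above, a numeric session ID followed by “client:port,server:port” and a service name, and it ignores any other lines chaosreader prints.

```shell
# Hypothetical filter: count chaosreader sessions per server endpoint.
# Assumed session line layout (as in the sample output):
#   <session number>  <client ip:port>,<server ip:port>  <service>
sessions_per_server() {
    awk '$1 ~ /^[0-9]+$/ && NF >= 3 {
             split($2, ends, ",")   # ends[2] is the server side of the session
             count[ends[2]]++
         }
         END { for (s in count) print count[s], s }' | sort -rn
}

# Example use:
#   chaosreader.pl -D html chaosreader.dump | sessions_per_server
```

For the ten sessions listed above, this would report five for 209.249.116.195:80, three for 209.249.116.197:80, and two for 216.52.17.116:80.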

Once chaosreader finishes processing the capture file, you can view the results by changing to the directory passed to the “-D” (output all files to this directory) option and opening the file named index.html in a web browser. This page lists all of the detected connections in chronological order.

Each connection entry shows a unique connection descriptor, the date the request was issued, the number of bytes sent between the two end-points, and the source and destination IP addresses and port numbers. Each entry also contains a table with hyperlinks to the individual objects (e.g., images, HTML) transmitted between the client and server. To view the protocol headers along with the results of the requests, use the “as_html” link. This link is a great tool for debugging web applications, since the requests and their results are displayed in chronological order.
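The tcpdump command shown earlier runs until it is interrupted. When captures need to be repeated, possibly on a Solaris host where snoop is the native tool, a small wrapper can bound the capture to a fixed number of packets. This is a hypothetical helper (capture_http is an illustrative name). It assumes root privileges, which both sniffers normally require, and uses tcpdump’s and snoop’s “-c” (packet count) options along with the same 1518-byte snap length.

```shell
# Hypothetical helper: capture port 80 traffic with whichever sniffer exists.
#   $1 = interface (e.g. en0), $2 = output file, $3 = packet count (default 1000)
capture_http() {
    iface=$1
    out=$2
    count=${3:-1000}

    if command -v tcpdump >/dev/null 2>&1; then
        tcpdump -i "$iface" -s 1518 -c "$count" -w "$out" port 80
    elif command -v snoop >/dev/null 2>&1; then
        # snoop: -d device, -s snap length, -c packet count, -o output file
        snoop -d "$iface" -s 1518 -c "$count" -o "$out" port 80
    else
        echo "capture_http: neither tcpdump nor snoop was found" >&2
        return 1
    fi
}

# Example use (as root):
#   capture_http en0 chaosreader.dump 5000
#   chaosreader.pl -D html chaosreader.dump
```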

Viewing content and headers with Live HTTP Headers

The curl and chaosreader utilities are great tools for debugging web applications, but they require a UNIX shell and a Perl interpreter to use their full capabilities. This is not ideal for all users, since some administrators cannot get shell access to a UNIX system or install a Perl interpreter on their desktop. If you fall into this category, you can use the Firefox Live HTTP Headers plug-in to debug web-based applications. The plug-in displays the request and response headers as a page is loaded in Firefox, and provides numerous options to filter results and control which data is collected.

To get started with the Live HTTP Headers plug-in, point your Firefox browser to the main Live HTTP Headers website. Once the page renders, click the “Installation” tab, and then click the version that matches your version of Firefox. Firefox will then install the plug-in from a remote location and add a new item titled “Live HTTP Headers” to the Tools drop-down menu.

If you would like more control over the installation process, you can download the plug-in by clicking the “download it” link on the Installation page, or by right-clicking the file and using the Save As option. Once the plug-in has been downloaded to the local drive, use Firefox’s “File -> Open File” menu to open the file and begin the installation. When the installation completes, a new item titled “Live HTTP Headers” will be available under the Tools drop-down menu.

To open the Live HTTP Headers plug-in, you can click “Tools -> Live HTTP Headers.” This will open a new window with four tabs and a large text box. Each time a website is visited in the main Firefox window, the HTTP request and response headers will be displayed in the large text box. If you would like to display the headers from specific pages, you can click on the Config tab, and add a regular expression to the “Filter URLs with regexp” option.

Case Study: Debugging inconsistent website behavior

While I was working at my desk on a Friday afternoon, one of my colleagues stopped by to discuss a problem he was experiencing. When he visited a specific website, his browser periodically displayed a message stating that the “maximum number of redirects had been reached.” My colleague asked if I could recreate the problem, so I placed my current task on hold and started debugging the issue.

I started my analysis by connecting to the site with Firefox and refreshing the page repeatedly. After 10 to 20 refreshes, I received the error message mentioned above. Since this appeared to be an issue with redirects on one of the many web servers behind a load-balancer, I needed a way to pinpoint which server or servers were sending the faulty redirects. I also needed a way to capture the redirect target, which is carried in the “Location” header of the HTTP response. After some careful thought, I decided to use a Bourne shell loop and curl’s “--user-agent” option to address both issues. The loop would let me send repeated requests to the server with curl and extract the Location header from each response. Curl’s “--user-agent” option would let me set a unique identifier string that could be located in the web server access_log once I detected a failure. The following loop is the result:

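while :
do
   # Capture the time once so the log entry and the User-Agent tag match
   DATE=`/bin/date`

   echo "** Processing request at ${DATE} **" >> badserver.txt
   # curl -v writes the headers to stderr; merge stderr into the pipe
   # and keep only the Location header
   curl -v --user-agent "CURL DEBUG (${DATE})" http://mysite.com 2>&1 | egrep "Location" >> badserver.txt
   sleep 5
done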

This loop sends one HTTP GET request to mysite.com every five seconds and logs the time of each request and the Location header of each response to the file badserver.txt. I let the loop run for 10 to 20 iterations, which seemed to be the number of requests required to trigger the problem. After I exited the loop, badserver.txt contained the following entries:

** Processing request at Wed Oct 26 19:11:18 EDT 2005 **
< Location: https://mysite.com/

** Processing request at Wed Oct 26 19:11:24 EDT 2005 **
< Location: https://mysite.com/

** Processing request at Wed Oct 26 19:11:29 EDT 2005 **
< Location: http://mysite.com/

** Processing request at Wed Oct 26 19:11:34 EDT 2005 **
< Location: https://mysite.com/
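With only a handful of entries, the odd one out is easy to spot by eye. Over a longer run, tallying the Location values makes a minority redirect stand out. The helper below is a hypothetical sketch (summarize_locations is an illustrative name) that assumes the badserver.txt layout shown above.

```shell
# Hypothetical helper: tally the Location headers collected by the loop,
# most frequent first.
#   $1 = file written by the loop (e.g. badserver.txt)
summarize_locations() {
    grep '^< Location:' "$1" | tr -d '\r' | sort | uniq -c | sort -rn
}

# Example use:
#   summarize_locations badserver.txt
```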

The Location headers showed that all but one of the servers were redirecting non-secure requests to the secure site. The one misbehaving server was sending clients a non-secure redirect. When the client followed that redirect to the non-secure website, the web server replied with another non-secure redirect, creating a redirect loop. The browser detected the loop and stopped after a set number of redirects. With this information in hand, I used the string “CURL DEBUG” along with the date to identify the server that was sending the broken redirect. I did this by tailing the access_log on each server and searching for the string “CURL DEBUG (Wed Oct 26 19:11:29 EDT 2005)”:

$ tail -10000 access_log | grep -F "CURL DEBUG (Wed Oct 26 19:11:29 EDT 2005)"

10.10.10.10 - - [26/Oct/2005:19:11:29 -0400] "GET / HTTP/1.1" 301 252 "-" "CURL DEBUG Wed Oct 26 13:11:29 EDT 2005"
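With many servers, the same search can be run in one pass over copies of each server’s access log. The sketch below is hypothetical: the function name logs_with_tag and the idea of collecting one log per server into a single directory are assumptions, not part of the original procedure. It uses grep’s “-F” (fixed string) option so the parentheses in the tag are matched literally, and “-l” to print only the names of matching files.

```shell
# Hypothetical helper: report which access logs contain a given tag.
#   $1 = exact tag string; remaining arguments = log files (one per server)
logs_with_tag() {
    tag=$1
    shift
    # -F: match the tag literally; -l: print only the names of matching files
    grep -F -l -- "$tag" "$@"
}

# Example use, assuming logs were copied as web1_access_log, web2_access_log, ...:
#   logs_with_tag "CURL DEBUG (Wed Oct 26 19:11:29 EDT 2005)" web*_access_log
```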

Once I found the web server that was returning the incorrect redirect information, I fired up vi to review the Apache httpd.conf configuration file. A quick search for the string “Redirect” revealed the following configuration:

<VirtualHost *:80>
        Redirect permanent / http://mysite.com/
</VirtualHost>
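The manual search for “Redirect” can be narrowed to flag only redirects whose target is a plain-HTTP URL. The following is a hypothetical check (insecure_redirects is an illustrative name) that looks only at lines beginning with the Redirect directive. Its matches are candidates for review rather than certain errors, since a plain-HTTP redirect can be intentional elsewhere in a configuration.

```shell
# Hypothetical check: list Redirect directives that point at a plain-HTTP URL.
#   $1 = Apache configuration file (e.g. httpd.conf)
insecure_redirects() {
    # -n prints line numbers; -i because Apache directive names are case-insensitive
    grep -n -i '^[[:space:]]*Redirect[[:space:]].*http://' "$1"
}

# Example use on each server:
#   insecure_redirects /path/to/httpd.conf
```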

The problem turned out to be a typographical error in the httpd.conf configuration file, and was easily fixed by changing the Redirect string to the following:

<VirtualHost *:80>
        Redirect permanent / https://mysite.com/
</VirtualHost>

Once this change was made, all of the servers began working as expected. This problem could have been debugged from a number of angles, with numerous software utilities and load-balancer features.
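The loop used in this case study ran until it was interrupted, and badserver.txt had to be read by hand. A bounded variant that flags plain-HTTP redirects as they occur might look like the following sketch. The function names insecure_location and probe are hypothetical, and the defaults of 20 requests five seconds apart simply mirror the numbers used above.

```shell
# Hypothetical filter: print any Location header in "curl -v" output that
# points at a plain http:// URL (HTTP header names are case-insensitive).
insecure_location() {
    tr -d '\r' | grep -i '^< Location: http://'
}

# Hypothetical bounded version of the case-study loop.
#   $1 = URL, $2 = number of requests (default 20), $3 = pause in seconds (default 5)
probe() {
    url=$1
    count=${2:-20}
    pause=${3:-5}
    i=0
    while [ "$i" -lt "$count" ]; do
        DATE=$(date)
        # -s hides the progress meter; -v still writes the headers to stderr,
        # which is routed into the pipe while the body goes to /dev/null
        if curl -s -v --user-agent "CURL DEBUG (${DATE})" "$url" 2>&1 >/dev/null |
            insecure_location >/dev/null; then
            echo "Non-HTTPS redirect at ${DATE}; search the logs for: CURL DEBUG (${DATE})"
        fi
        i=$((i + 1))
        sleep "$pause"
    done
}

# Example use:
#   probe http://mysite.com 20 5
```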

Conclusion

Web applications are being used by more and more companies and organizations to sell services and interact with customers. When web-based applications break or become unresponsive, it is essential to have a set of software tools to troubleshoot and pinpoint problems. This article presented a brief overview of three tools that can be used to debug problems with web-based applications. For additional information on each utility discussed in this article, and for references to additional utilities, please see the References section of this article.

References

The following references were used while writing this article:

Acknowledgments

Ryan would like to thank the developers of chaosreader, curl, ethereal, Firefox, siege, and wget!

Originally published in the March ‘06 issue of SysAdmin Magazine