Verifying web server content with checksums


While setting up monit to monitor several services I support, I decided to look for an in-depth HTTP monitoring solution to complement monit’s own checks. Specifically, I wanted something that would validate the content returned by a web server. Several monitoring solutions (including monit) will issue a GET request to a web server and check that the server replied with a 200 OK status code. This works for most situations, but it doesn’t detect content deployment snafus or server misconfigurations (the ones that don’t generate 500 status codes). I couldn’t find an open source software package that provided this level of in-depth monitoring, so I decided to write content-check.

Content-check is written in Bourne shell, and provides in-depth HTTP monitoring by comparing a saved SHA1 hash with a SHA1 hash generated from the content returned by a web server. If the two hashes don’t match, content-check will generate a syslog entry (which can be picked up by monit) with the logger utility, and E-mail the website administrator to let them know that the content did not hash to a known value.
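As a rough illustration of the comparison described above (a sketch only, not content-check's actual source), the core check could look something like this in POSIX shell. The function names `sha1_of_stdin` and `content_matches` are hypothetical, and the sketch assumes `sha1sum` or `openssl` is installed:

```shell
#!/bin/sh
# Hypothetical helpers for illustration; not taken from content-check itself.

# Print the SHA1 hex digest of whatever arrives on stdin.
sha1_of_stdin() {
    if command -v sha1sum >/dev/null 2>&1; then
        # sha1sum prints "<digest>  -"; keep only the digest.
        sha1sum | awk '{print $1}'
    else
        # Fallback: openssl prints the digest as the last field.
        openssl dgst -sha1 | awk '{print $NF}'
    fi
}

# Return 0 if the content on stdin hashes to the expected value ($1),
# and a non-zero status otherwise.
content_matches() {
    expected=$1
    actual=$(sha1_of_stdin)
    [ "$actual" = "$expected" ]
}

# Example use (needs network access; curl is an assumed HTTP client):
#   curl -fsS http://prefetch.net/index.html | content_matches "$saved_hash" \
#       || echo "content did not hash to the saved value"
```

Reading the content from stdin keeps the hashing step independent of whichever HTTP client fetches the page.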

To configure content-check, you first need to generate a hash for the webpage you want to monitor. This can be accomplished by passing an absolute URL to content-check’s “-g” (generate hash) option:

$ content-check -g http://prefetch.net/index.html

da39a3ee5e6b4b0d3255bfef95601890afd80709

After you generate the hash, place the absolute URL and its hash on a single line of a text file, separated by whitespace. The file can contain multiple URL / hash pairs, but only one pair is allowed per line. Once the file is populated with one or more sites to monitor, content-check can be invoked with the “-f” option and the name of that file:

$ cat sites

http://prefetch.net/articles/yum.html da39a3ee5e6b4b0d3255bfef95601890afd80709
http://prefetch.net/index.html da39a3ee5e6b4b0d3255bfef95601890afd80709

$ content-check -f sites
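
To make the file format concrete, here is a minimal sketch of how a file of URL / hash pairs could be processed. It is an illustration under stated assumptions rather than content-check's real implementation: `fetch_url` and `check_sites` are hypothetical names, `curl` is assumed as the HTTP client, and `sha1sum` is assumed to be available:

```shell
#!/bin/sh
# Hypothetical sketch; the real content-check may work differently.

# Fetch a URL and write the body to stdout (assumes curl is installed).
fetch_url() {
    curl -fsS "$1"
}

# Read "URL HASH" pairs from the file named in $1, one pair per line.
# Print "MISMATCH <url>" for every page whose SHA1 differs from the
# saved hash, and return non-zero if any mismatch was found.
check_sites() {
    sites_file=$1
    failures=0
    while read -r url expected; do
        # Skip blank lines.
        [ -n "$url" ] || continue
        # </dev/null stops the fetch from consuming the rest of the file.
        actual=$(fetch_url "$url" </dev/null | sha1sum | awk '{print $1}')
        if [ "$actual" != "$expected" ]; then
            echo "MISMATCH $url"
            failures=$((failures + 1))
        fi
    done < "$sites_file"
    [ "$failures" -eq 0 ]
}

# Example use (needs network access):
#   check_sites sites || echo "at least one site failed verification"
```

A failed fetch produces empty output, which hashes to a different value than the saved one, so an unreachable page is reported as a mismatch too.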

If one of the sites listed in the file doesn’t hash to the value stored in the file, an E-mail is sent to the address passed to the “-e” option (or to root), and a syslog message similar to the following is generated:

Jul 21 16:27:01 neutron matty: [ID 702911 daemon.notice] Content from
http://prefetch.net/index.html did not hash to
da39a3ee5e6b4b0d3255bfef95601890afd80709
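
The reporting step can be sketched with the standard `logger` and `mailx` utilities. This is only an approximation of what content-check might do: the function names `alert_message` and `report_mismatch` are hypothetical, the `daemon.notice` priority is inferred from the sample log line above, and the use of `mailx` and the subject line are assumptions:

```shell
#!/bin/sh
# Hypothetical reporting helpers; the real content-check may differ.

# Build the alert text, mirroring the syslog message shown above.
# $1 is the URL, $2 is the saved hash that the content failed to match.
alert_message() {
    printf 'Content from %s did not hash to %s\n' "$1" "$2"
}

# Log the mismatch to syslog and mail it to an administrator.
# $3 is the recipient address; "root" is used when it is omitted.
report_mismatch() {
    msg=$(alert_message "$1" "$2")
    # -p sets the syslog facility.level for the entry.
    logger -p daemon.notice "$msg"
    # mailx reads the message body from stdin; -s sets the subject.
    printf '%s\n' "$msg" | mailx -s "content-check: hash mismatch for $1" "${3:-root}"
}

# Example use:
#   report_mismatch http://prefetch.net/index.html "$saved_hash" admin@example.com
```

Because the entry lands in syslog at a known priority, a log-watching tool can match on the "did not hash to" text.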

Since web servers can break in ways that still allow them to serve content, validating the content they return is the most reliable way to confirm that your site is serving what you expect.

This article was posted by Matty on 2006-07-21 23:30:00 -0400