Cleaning up HTML files with tidy


I have read a number of documents on correctly using CSS and XHTML over the past month, and have learned about a number of common mistakes people make when creating content that uses these technologies. Most of the articles discussed ways to structure web content to avoid these pitfalls, which got me wondering if anyone had taken these recommendations and created a tool to analyze content for errors. After a bit of googling, I came across the W3C content validation site, as well as the tidy utility.

The W3C website is super easy to use, and it provides extremely useful feedback that you can use to improve your content. The tidy utility provides similar capabilities, but it can also correct the errors it finds in the files it analyzes. Tidy can be downloaded from SourceForge, or installed with your favorite package utility (the CentOS repositories contain tidy, so it’s a yum install away). Once tidy is installed, you can pass the names of one or more files to analyze as arguments:

$ tidy -indent index.html

line 8 column 1 - Warning: <link> isn't allowed in <html> elements
line 3 column 1 - Info: <html> previously mentioned
line 74 column 28 - Warning: unescaped & which should be written as &amp;
line 74 column 29 - Warning: unescaped & which should be written as &amp;
line 191 column 15 - Warning: discarding unexpected </h2>
line 181 column 9 - Warning: <a> escaping malformed URI reference
Info: Doctype given is "-//W3C//DTD XHTML 1.0 Strict//EN"
Info: Document content looks like XHTML 1.0 Transitional
5 warnings, 0 errors were found!

<HTML FILE CONTENTS WITH FIXES APPLIED>

The tidy output contains the list of problems it detected, followed by the corrected HTML code. This is amazingly cool, and it has tipped me off to a few issues with some of the XHTML files that I am using to support my website. Tidy and the W3C validation site are both incredibly useful, and using them will hopefully improve the experience for people who visit sites with validated content.
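Since the report and the corrected markup arrive together, it can be handy to split them apart. The following is a minimal sketch of one way to do that, not a definitive recipe: the file names (sample.html, fixed.html, tidy-errors.txt) are made up for illustration, and the script assumes a tidy binary on the PATH that accepts the usual -errors, -output and -file options.

```shell
#!/bin/sh
# Sketch only: sample.html, fixed.html and tidy-errors.txt are hypothetical names.

# Create a small test page containing an unescaped ampersand.
cat > sample.html <<'EOF'
<html>
<head><title>Test page</title></head>
<body><p>Tom & Jerry</p></body>
</html>
EOF

if command -v tidy >/dev/null 2>&1; then
    # -errors: print only the report, not the corrected markup.
    tidy -errors sample.html

    # -output writes the corrected markup to a file, and -file sends the
    # report to a file instead of the terminal. The original is untouched.
    tidy -indent -output fixed.html -file tidy-errors.txt sample.html

    # tidy exits with 0 when the file is clean, 1 for warnings, 2 for errors.
    status=$?
    echo "tidy exit status: $status"
else
    echo "tidy is not installed; skipping the checks" >&2
fi
```

Writing the fixes to a separate file makes it easy to diff the result against the original before replacing it. Tidy also has a -modify option that rewrites the input file in place, which is best used on files that are under version control or backed up.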

This article was posted by Matty on 2008-02-16 16:23:00 -0400