Having the Butler Tidy up your HTML

Guess what? We've got a session on this topic proposed for DrupalCon London! This post is just the beginning...want more? Vote for us!

Continuous Integration tools have become incredibly popular for helping to maintain and deploy Drupal sites. Often, CI is only talked about in terms of helping sysadmins and developers move code throughout environments and run utility scripts like cron. Continuous Integration can be a great boon for QA teams and Front-End developers, too.

Brian McMurray, Software Architect
#Development | Posted

At Treehouse Agency, we love to use Jenkins to manage our continuous integration jobs. One of the great things about Jenkins is that a scheduled job can execute shell scripts. With some creative bash scripting, we can create utility jobs to do just about anything.

Tidying Up

A little while back I was researching ways to validate webpages from the command line and I came across the Tidy command line tool ( http://tidy.sourceforge.net/ ). This simple utility lets you pass in an HTML document and validate it. Like the online versions ( http://infohound.net/tidy/ ), the command-line version of tidy can re-format an HTML page to clean up indentation and some other cool stuff. Most importantly for the purposes of having Jenkins tell me if my HTML becomes invalid, it can generate a validation report and let me know if there are problems with a webpage.
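To get a feel for what Tidy reports before wiring it into anything, here is a minimal sketch that feeds it HTML on standard input. The `check_html` function name and the sample fragment are made up for illustration; it assumes `tidy` is on your PATH and relies on Tidy's exit status (0 for no issues, 1 for warnings, 2 for errors).

```shell
#!/bin/bash
# Hypothetical helper: validate HTML read from stdin and print Tidy's report.
# Returns Tidy's exit status: 0 = no issues, 1 = warnings, 2 = errors.
check_html() {
  local report status
  # -e: show only errors and warnings, -q: suppress nonessential output
  report=$(tidy -eq 2>&1)
  status=$?
  printf '%s\n' "$report"
  return "$status"
}

# Try it on a deliberately sloppy fragment (only if tidy is installed).
if command -v tidy >/dev/null 2>&1; then
  status=0
  printf '<p>Hello, world' | check_html || status=$?
  echo "tidy exit status: $status"
fi
```

Swapping the `printf` for `curl -s` and a URL gives you the same check against a live page.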

Using Tidy and Curl, I whipped together this simple little shell script that takes a list of URLs as arguments, runs Tidy on each of them, and prints a report of any validation warnings for each URL.

Let's take a look at the script:

tidier.sh

[sh]#!/bin/bash
# Require at least one URL argument.
[[ $1 ]] || {
  echo "Usage: tidier.sh url [url]"
  exit 1
}

global_ec=0
for url in "$@" ; do
  echo '---- Validating' "$url" '----'
  tidy_err=`curl -s "$url" | tidy -eiq 2>&1`
  # Save Tidy's exit code (0 = clean, 1 = warnings, 2 = errors)
  ec=$?
  case $ec in
    0)
      vs='--SUCCESS--';;
    1)
      vs='--WARNING--';;
    2)
      vs='--ERROR--'
      global_ec=2;;
  esac
  echo 'Validation Status:' $vs
  [[ "$ec" == "0" ]] || {
    # Convert each error to be on its own line.
    # $tidy_err is left unquoted on purpose so the report collapses to one
    # line, which sed then re-splits before every " line".
    echo '------------------'
    echo " "$tidy_err | sed 's/ line/\
line/g'
    echo '------------------'
  }
done

exit $global_ec[/sh]

The script keeps track of the exit codes from Tidy and exits with code 2 if Tidy reports an error (as opposed to a warning) for any of the pages.
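The exit-code bookkeeping can be tried on its own, without Tidy or a network connection. The following is only a sketch: `status_label` is a hypothetical helper, the `--UNKNOWN--` label is an addition that the script above does not have, and the exit codes in the loop are sample values standing in for three Tidy runs.

```shell
#!/bin/bash
# Hypothetical helper: map a Tidy exit status to the label the script prints.
# 0 = clean, 1 = warnings, 2 = errors; anything else gets --UNKNOWN--,
# which is an addition not found in the original script.
status_label() {
  case $1 in
    0) echo '--SUCCESS--' ;;
    1) echo '--WARNING--' ;;
    2) echo '--ERROR--' ;;
    *) echo '--UNKNOWN--' ;;
  esac
}

overall=0
for ec in 0 1 0; do   # sample exit codes, not real Tidy output
  echo "Validation Status: $(status_label "$ec")"
  # Same rule as the script: only errors change the overall exit code.
  if (( ec == 2 )); then
    overall=2
  fi
done
echo "overall exit code: $overall"
```

Note that a run with only warnings still leaves the overall exit code at 0, so anything that looks only at the exit code will treat it as a pass.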

Assigning our job to the Butler

With some very simple modifications, we can take this script and use it as a configurable build job in Jenkins. Jenkins jobs can take build parameters, and since our script is designed to take a configurable set of urls to validate, this is a great opportunity to make that a build parameter for Jenkins.

In Jenkins, let's add a new Free-Style Software Project and call it 'tidy_up'.

Now let's configure the new job. First, we'll add a description and configure this job to keep some of its build history so we have a historical record. Then, we'll add a string build parameter and call it `IN_URLS`. This build parameter lets us change which URLs we test each time we run the build; separate multiple URLs with spaces.
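Because the script loops over the parameter with plain shell word splitting, a value typed with commas between the URLs would be treated as one long URL. If you would rather accept either style, something like the sketch below could normalize the value first. `split_urls` is a hypothetical helper, not part of the original job, and the example.com addresses are placeholders.

```shell
#!/bin/bash
# Hypothetical helper (not part of the original script): split a raw
# parameter value on commas and whitespace, printing one URL per line.
split_urls() {
  local raw=$1 item
  local -a items
  # Turn commas into spaces, then let read split on whitespace.
  read -r -a items <<< "${raw//,/ }"
  for item in "${items[@]}"; do
    printf '%s\n' "$item"
  done
}

# Sample value such as might be typed into the IN_URLS field.
split_urls "http://example.com/, http://example.com/about http://example.com/contact"
```

In the job itself, the loop could then read `for url in $(split_urls "$IN_URLS"); do`.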

At this point, you could set up your Jenkins job to run automatically by polling for changes from a source code repository or by setting it to run on a regular time interval.

Now we need to add a build step to the job. Since we've written a shell script, we'll add an "Execute Shell" build step and slightly modify our script above to use the new IN_URLS parameter:

[sh]#!/bin/bash
[[ $IN_URLS ]] || {
  echo "ERROR: Please set the IN_URLS parameter to one or more URLs."
  exit 1
}

global_ec=0
# $IN_URLS is left unquoted on purpose so it splits into separate URLs.
for url in $IN_URLS ; do
  echo
  echo '---- Validating' "$url" '----'
  echo
  tidy_err=`curl -s "$url" | tidy -eiq 2>&1`
  # Save Tidy's exit code (0 = clean, 1 = warnings, 2 = errors)
  ec=$?
  case $ec in
    0)
      vs='--SUCCESS--';;
    1)
      vs='--WARNING--';;
    2)
      vs='--ERROR--'
      global_ec=2;;
  esac
  echo 'Validation Status:' $vs
  [[ "$ec" == "0" ]] || {
    # Convert each error to be on its own line.
    echo '------------------'
    echo " "$tidy_err | sed 's/ line/\
line/g'
    echo '------------------'
  }
done

exit $global_ec[/sh]

Save the job and we have a working Jenkins job to run Tidy on a configurable set of URLs.

If you run our new Jenkins job through its paces, you'll notice that the build is reported as a success even when Tidy returns validation warnings, because the script only exits with a non-zero code on errors. To address this we'll add a simple Jenkins plug-in called Text Finder, which lets us run a regular expression against the build report and look for our validation warning message.

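Get the Text Finder plugin here: https://github.com/jenkinsci/text-finder-plugin and install it, then go back to your 'tidy_up' configuration screen. In the "Post-build Actions" section, enable "Hudson Text Finder" and configure it to search the console output for our validation warning message (the `--WARNING--` status the script prints), marking the build unstable when it is found.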

Save your changes and we've now configured Jenkins to run a simple HTML validator on any number of URLs and mark the build unstable if there are validation errors that need to be addressed.
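If you want to sanity-check a pattern before trusting it in a build, a rough local test might look like the sketch below. The sample log is modelled on what the script echoes, the pattern is just one candidate, and `log_has_warning` is a hypothetical helper. Keep in mind that `grep -E` uses POSIX extended regular expressions while Text Finder uses Java's, so this is only a fair stand-in for simple literal patterns like this one.

```shell
#!/bin/bash
# Candidate pattern (an assumption; adjust it to match your own output).
pattern='Validation Status: --WARNING--'

# Hypothetical helper: succeed if the log on stdin contains the pattern.
log_has_warning() {
  grep -Eq -- "$pattern"
}

# Sample console output modelled on what the script echoes.
sample_log='---- Validating http://example.com/ ----
Validation Status: --WARNING--
---- Validating http://example.com/about ----
Validation Status: --SUCCESS--'

if printf '%s\n' "$sample_log" | log_has_warning; then
  echo 'pattern matched: build would be marked unstable'
else
  echo 'no match: build status unchanged'
fi
```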

Interested in learning more about putting Jenkins to use for assisting your QA team and Front-End Developers? Vote for and plan to attend our session at DrupalCon London 2011!
