Sane Cookie Processing with Varnish

Jed Prentice

A common refrain heard around the web these days when talking about Drupal web sites is, “Just put Varnish in front of it.” Unfortunately, it is not that simple. I will show you a technique that will help you get the most out of Varnish and improve performance.

The main prerequisite is Pressflow: you’ll need it for best results. Pressflow makes anonymous traffic much easier to cache because it sets the Drupal session cookie only after a user has logged in. With regular Drupal you can still cache static assets and particular paths, but you will not be able to take full advantage of Varnish; at least, not easily.
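On regular Drupal, static assets are usually the easiest place to start. Below is a minimal sketch for vcl_recv. It assumes two things: your static files are served directly by the web server, and those responses do not carry a Set-Cookie header (by default Varnish will not cache a response that sets a cookie). The extension list is illustrative, not exhaustive.

```vcl
  // Static files do not depend on the Drupal session, so drop all cookies
  // for them. Without a Cookie header, Varnish's default vcl_recv logic
  // no longer passes these requests to the backend, so they can be cached.
  // The extension list is an assumption; adjust it for your site.
  // The optional "?..." covers query strings appended to CSS/JS URLs.
  if (req.url ~ "\.(png|gif|jpe?g|ico|css|js|swf)(\?.*)?$") {
    unset req.http.Cookie;
  }
```

A rule like this only helps with files. Drupal pages themselves still carry the session cookie on stock Drupal, which is why Pressflow matters.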

For the rest of this discussion, I will assume you know the basics of Varnish and Pressflow. If not, read the Varnish documentation.

A well-known fact about Varnish is that, by default, it will not cache requests that carry cookies. Most people get around this by stripping common cookies one at a time: Google Analytics, Chartbeat, various ad cookies, and so on. The problem is that each time a new ad or other cookie shows up on your site, it will bust the cache unless you add a rule that strips it specifically.

So I began thinking: what about stripping all the cookies except the one I care about, the Drupal session cookie? Then our customers can add advertising and other cookies until the cows come home without busting the cache and causing performance to grind to a halt. With a little regexp magic, this is possible. Try this in the vcl_recv subroutine of your VCL:

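  // Remove all cookies except the Drupal session cookie
  if (req.http.Cookie) {
    // Prepend ";" so every cookie, including the first, starts with one
    set req.http.Cookie = ";" req.http.Cookie;
    // Remove the spaces that follow each semicolon
    set req.http.Cookie = regsuball(req.http.Cookie, "; +", ";");
    // Mark the session cookie (SESS + 32 hex chars) with a space after its ";"
    // (\1 is the backreference that puts the cookie name back)
    set req.http.Cookie = regsuball(req.http.Cookie,
                          ";(SESS[0-9a-f]{32})=", "; \1=");
    // Delete every ";name=value" pair that is not marked with a space
    set req.http.Cookie = regsuball(req.http.Cookie, ";[^ ][^;]*", "");
    // Trim leftover semicolons and spaces from both ends
    set req.http.Cookie = regsuball(req.http.Cookie, "^[; ]+|[; ]+$", "");

    // Nothing left: drop the header entirely
    if (req.http.Cookie == "") {
      unset req.http.Cookie;
    }
  }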

This does the following:

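  • Puts a semicolon on the front so that every cookie, including the first, is preceded by one
  • Strips the spaces that follow each semicolon
  • Marks the Drupal session cookie (SESS followed by 32 hex characters) by putting a single space back after its semicolon
  • Deletes every cookie whose semicolon is not followed by a space, which leaves only the marked session cookie
  • Trims any remaining semicolons and spaces from the start and end of the string
  • If the cookie string is empty after processing, removes the Cookie header entirely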
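The marking trick is easiest to see with a hand trace. The trace below follows a made-up Cookie header through each regsuball in the snippet. The header holds a Google Analytics cookie, Drupal’s has_js cookie, and a session cookie; the session ID and values are invented for illustration. The trace assumes the replacement string is written "; \1=", so the backreference restores the cookie name.

```
incoming:           __utma=1.2.3.4; SESS0123456789abcdef0123456789abcdef=abc123; has_js=1
prepend ";":        ;__utma=1.2.3.4; SESS0123456789abcdef0123456789abcdef=abc123; has_js=1
strip spaces:       ;__utma=1.2.3.4;SESS0123456789abcdef0123456789abcdef=abc123;has_js=1
mark session:       ;__utma=1.2.3.4; SESS0123456789abcdef0123456789abcdef=abc123;has_js=1
delete unmarked:    ; SESS0123456789abcdef0123456789abcdef=abc123
trim ends:          SESS0123456789abcdef0123456789abcdef=abc123
```

Now take an anonymous visitor sending only `__utma=1.2.3.4; has_js=1`. The delete step removes both cookies and the trim step leaves an empty string. The final check then removes the header, so the request becomes eligible for caching. A logged-in user keeps the session cookie, so Varnish’s default refusal to cache requests with cookies still applies to them.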

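It is important to understand that this processing happens only inside Varnish. It changes the copy of the request that Varnish uses for its cache lookup and forwards to the backend (typically Apache). The client, typically a user’s browser, still has all of its cookies and keeps sending them with every request; nothing happens to the cookies stored on the client.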

So now you (or your customers) no longer have to worry about the name of every cookie that might end up on requests to your site.
