Digging In To Apache's Rewrite Module And Drupal

Chris Johnson, VP of Engineering
#Drupal | Posted

Apache's mod_rewrite module is the powerful tool behind Drupal's clean URLs, but it can lead to some head-scratching moments when you try to implement custom behavior. In this post, we'll walk through a few examples and dig into some corner cases of Apache's Rewrite module and Drupal.

For our first scenario, we're redirecting all GET requests to SSL except those for robots.txt. This is pretty straightforward:

  RewriteCond %{HTTP_HOST} example.com
  RewriteCond %{REQUEST_URI} !^/robots\.txt
  RewriteCond %{HTTPS} !on
  RewriteCond %{REQUEST_METHOD} GET
  RewriteRule ^ https://example.com%{REQUEST_URI} [L,QSA,R=301]

Where it becomes interesting is if we also want to allow requests handled by Drupal to avoid being redirected. For example, assume that any RSS feeds are handled by Drupal but there is no need to force those over SSL. You might think the following would be sufficient:

  RewriteCond %{HTTP_HOST} example.com
  RewriteCond %{REQUEST_URI} !^/robots\.txt
  RewriteCond %{REQUEST_URI} !\.rss$
  RewriteCond %{HTTPS} !on
  RewriteCond %{REQUEST_METHOD} GET
  RewriteRule ^ https://example.com%{REQUEST_URI} [L,QSA,R=301]

Using only the above, however, results in a redirect to https://example.com/index.php. Although the redirect is bypassed on the first pass, Drupal's own rewrite rule then hands the request to index.php, which triggers another pass through the rewrite rules. On that pass the request URI is /index.php, so the .rss condition no longer excludes it and the redirect fires. This second pass does not appear to be treated as a sub-request, so adding the NS flag to the RewriteRule does not stop the rule from being applied. SetEnvIf, however, provides an option.
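To watch the second pass happen, it can help to turn on mod_rewrite's logging. The following is a minimal sketch, not part of the original configuration: the log path is a hypothetical example, and these directives belong in the server or virtual host configuration rather than in .htaccess.

```apache
# Apache 2.4: rewrite tracing is written to the error log via a per-module LogLevel.
LogLevel alert rewrite:trace3

# Apache 2.2 equivalent (these directives were removed in 2.4).
# The log path below is an assumed example.
# RewriteLog "/var/log/apache2/rewrite.log"
# RewriteLogLevel 3
```

With tracing on, a request for an .rss path should show the rules being evaluated twice: once for the original URI and once for /index.php, with the second set of entries typically marked as an internal redirect rather than a sub-request.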

  SetEnvIf Request_URI \.rss$ RSS_REQUEST=TRUE
  RewriteCond %{HTTP_HOST} example.com
  RewriteCond %{REQUEST_URI} !^/robots\.txt
  RewriteCond %{ENV:REDIRECT_RSS_REQUEST} !=TRUE
  RewriteCond %{ENV:RSS_REQUEST} !=TRUE
  RewriteCond %{HTTPS} !on
  RewriteCond %{REQUEST_METHOD} GET
  RewriteRule ^ https://example.com%{REQUEST_URI} [L,QSA,R=301]

What is important to notice is that while the SetEnvIf line only sets the variable RSS_REQUEST, we actually test against both RSS_REQUEST and REDIRECT_RSS_REQUEST. On the first pass the RSS_REQUEST variable has our desired value, but on the subsequent pass Apache renames it by adding a REDIRECT_ prefix, so it becomes REDIRECT_RSS_REQUEST.
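One way to see the renaming for yourself is to echo both variables back as response headers. This is a debugging sketch only, assuming mod_headers is available; the X-Debug-* header names are made up for illustration, and the lines should be removed once you have confirmed the behavior.

```apache
<IfModule mod_headers.c>
    # %{NAME}e expands to the value of the environment variable NAME.
    Header set X-Debug-Rss-Request "%{RSS_REQUEST}e"
    Header set X-Debug-Redirect-Rss-Request "%{REDIRECT_RSS_REQUEST}e"
</IfModule>
```

Requesting an .rss path that Drupal handles (for example with curl -I) should then show the value under the REDIRECT_-prefixed header, since the response is produced on the second pass.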

The SetEnvIf technique can be very useful, especially in places where you may want conditional authentication provided by the web server. This is commonly done for development or staging sites to prevent them from being accidentally crawled by search engines and to provide some security for the sites while they are in development. The conditional authentication can be handy when there are aspects of the site that can't interoperate with the authentication requirement but you don't want to make the entire site available. Here is an example of using SetEnvIf to allow bypassing HTTP Basic authentication for a few different paths on a site. As you can see, what we learned about the REDIRECT_* variable renaming from the Rewrite examples is important.

  AuthType Basic
  AuthName "Access Restricted"
  AuthUserFile /path/to/the/username/password/file/.htpasswd
  Require valid-user
  SetEnvIf Request_URI ^/openid/ AUTH_WHITELIST
  SetEnvIf Request_URI ^/user/.*/identity AUTH_WHITELIST
  SetEnvIf Request_URI ^/sites/.*/files/ AUTH_WHITELIST
  Deny from All
  Allow from env=AUTH_WHITELIST
  Allow from env=REDIRECT_AUTH_WHITELIST
  Satisfy Any
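The Deny, Allow, and Satisfy directives above are the Apache 2.2 access control syntax. On Apache 2.4 the same idea can be expressed with mod_authz_core's Require directives. The following is a sketch under that assumption, not a tested drop-in replacement:

```apache
AuthType Basic
AuthName "Access Restricted"
AuthUserFile /path/to/the/username/password/file/.htpasswd

# Flag the paths that should bypass authentication.
SetEnvIf Request_URI ^/openid/ AUTH_WHITELIST
SetEnvIf Request_URI ^/user/.*/identity AUTH_WHITELIST
SetEnvIf Request_URI ^/sites/.*/files/ AUTH_WHITELIST

# Grant access if any one of these is satisfied: the flag is set on the
# first pass, the renamed flag is set after the internal redirect, or
# the visitor has logged in.
<RequireAny>
    Require env AUTH_WHITELIST
    Require env REDIRECT_AUTH_WHITELIST
    Require valid-user
</RequireAny>
```

The RequireAny block plays the role of Satisfy Any here, and both the plain and the REDIRECT_-prefixed variable names are still checked for the same reason as before.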

If you would like to dig in more on Apache's rewrite module check out the following resources
