Module Caches – When, Where and How

In the Drupal community, you see caching discussions related to pages, blocks, reverse proxies, opcodes, and everything in between. These are often tied to optimizing render- and database-intensive operations to decrease the load on a server and increase throughput. However, there is another form of caching that can have a huge impact on your site’s performance: module-level data caching. This article explores the Drupal 7 core caching mechanisms that modules can take advantage of.

Robert Bates, Senior Developer

When?

Not all modules require data caching, and in some cases “real-time” requirements rule it out entirely. However, here are some questions to ask yourself to determine whether module-level data caching can help you out:

  • Does the module make queries to an external data provider (e.g. web service API) that returns large datasets?
  • If the module pulls data from an external source, is it a slow or unreliable connection?
  • If calling a web service, are there limits to the number of calls the module can make (hourly, daily, monthly, etc.)? Also, if it is a pay service, is it a variable cost based on number of calls?
  • Does the hosting provider have penalties for large amounts of inbound data?
  • Does the data the module handles require significant processing (e.g. heavy XML parsing)?
  • Is the data the module loads from an external source relatively stable rather than rapidly changing?

If you answered “yes” to more than a third of the questions above, module-level data caching can probably improve your module’s performance. It can:

  • Decrease external bandwidth
  • Decrease page load times
  • Reduce load on the site’s server
  • Provide reliable data services

Where?

OK, so you’ve decided your module could probably benefit from some form of module-level data caching. The next thing to determine is where to store it. You can always use some form of file-based caching, but to implement that with the proper abstractions to run on a variety of servers requires calls through the Drupal core File APIs, which can be a bit convoluted at times. File-based caching mechanisms also cannot take advantage of scalable performance solutions like memcache or multiple database server configurations that might be changed at any time.

Luckily, Drupal core provides a cache mechanism available to any module using the cache_get and cache_set functions, fully documented on http://api.drupal.org:

 

    <?php
    cache_get($cid, $bin = 'cache')
    cache_set($cid, $data, $bin = 'cache', $expire = CACHE_PERMANENT)

 

By default, these functions work with the core cache bin called simply “cache.” This is Drupal core’s main dumping ground for data that can persist beyond a single page request and is not tied to a session. However, many modules define their own cache bins so they can provide their own cache management processes. A few of the bins defined by core modules are:

  • cache_block
  • cache_field
  • cache_filter
  • cache_form
  • cache_menu
  • cache_page

Seeing as how several core Drupal modules implement their own cache bins, the next questions for your new module are:

  • Does the module need to manage its cache in a manner that is not consistent with the main cache bin?
  • Will its cache need to be flushed independently of the main cache at any time, or have some other expiration logic assigned to it that falls outside of the core cron cache clear calls?

If the answer to either of these questions is, “yes,” then a dedicated cache bin is probably a wise idea.

Cache bin management is abstracted in the Drupal system via classes implementing DrupalCacheInterface. The core codebase provides a default database-driven cache mechanism, DrupalDatabaseCache. It is used for any cache bin that has not been overridden with a custom class (see the documentation on DrupalCacheInterface for details on how to do that), and it expects a database table with the same name as the bin. This table must conform to the same schema as the core cache tables. For reference, this is the core cache table schema in MySQL that we will use as the base for our module’s cache bin:

    +------------+--------------+------+-----+---------+-------+
    | Field      | Type         | Null | Key | Default | Extra |
    +------------+--------------+------+-----+---------+-------+
    | cid        | varchar(255) | NO   | PRI |         |       |
    | data       | longblob     | YES  |     | NULL    |       |
    | expire     | int(11)      | NO   | MUL | 0       |       |
    | created    | int(11)      | NO   |     | 0       |       |
    | serialized | smallint(6)  | NO   |     | 0       |       |
    +------------+--------------+------+-----+---------+-------+
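
To illustrate the class override mentioned above: Drupal 7 looks up the backend class for a bin in the cache_class_<bin> variable and falls back to cache_default_class. The following is a minimal sketch of a settings.php fragment, assuming the contributed Memcache module is installed. The file path and the MemCacheDrupal class name are assumptions about that module, not something this article sets up.

```php
<?php
// settings.php fragment (sketch). $conf is the settings array that
// Drupal 7 defines in settings.php; it is initialised here only so the
// fragment stands alone.
$conf = isset($conf) ? $conf : array();

// Assumption: the Memcache module lives at this path and provides the
// MemCacheDrupal class.
$conf['cache_backends'][] = 'sites/all/modules/memcache/memcache.inc';

// Route only our bin to the alternate backend; bins without an override
// keep using the default class (DrupalDatabaseCache).
$conf['cache_class_cache_cachemod'] = 'MemCacheDrupal';
```

Because the override is keyed by bin name, a dedicated bin can be moved to a different backend without touching the module's code.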

How?

For the sake of simplicity, we will assume that our module is fine with using the default cache mechanism and database schema. As an exercise, we will also assume that we meet the criteria for defining our own cache bin so we can explore all the hooks required to implement a complete custom bin leveraging the default cache implementation. The sample module is called cachemod, and the cache bin name is cache_cachemod.

Define the cache bin schema

To add a table with the correct schema to the system, we borrow some code from the block module that copies the schema from the core cache table, and add it to our install hooks in cachemod.install:

 

    <?php
    /**
     * Implements hook_schema().
     */
    function cachemod_schema() {
      // Create new cache table using core cache schema.
      $schema['cache_cachemod'] = drupal_get_schema_unprocessed('system', 'cache');
      $schema['cache_cachemod']['description'] = 'Cache bin for the cachemod module';
      return $schema;
    }

 

Now that we have defined a table for our cache bin that replicates the schema of the core cache table, we can make basic set and get calls using the following:

 

    <?php
    cache_get($cid, 'cache_cachemod');
    cache_set($cid, $data, 'cache_cachemod');

 

Using our new cache bin

Notice the CID (cache ID) parameter. It must be unique to the data being stored, so in the case of something like a web service, the CID might be built from the arguments passed to the service, with the service’s response stored as the data. One way to get consistent CID values for calls to cache_get and cache_set is to build a helper function. This sample assumes our service call takes an array of key-value pairs:

 

    <?php
    /**
     * Util function to generate cid from service call args.
     */
    function _cachemod_cid($args) {
      // Make sure we have a valid set of args.
      if (empty($args)) {
        return NULL;
      }

      // Make sure we are consistently operating on an array.
      if (!is_array($args)) {
        $args = array($args);
      }

      // Sort the array by key, serialize it, and calc the hash.
      ksort($args);
      $cid = md5(serialize($args));
      return $cid;
    }
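
A quick way to see why the helper sorts before hashing: two argument arrays holding the same pairs in a different order should map to the same cache entry. The sketch below repeats the hashing step in a stand-alone function and compares the results. The name example_cid and the sample arguments are hypothetical, used only so the snippet runs outside Drupal.

```php
<?php
// Stand-alone copy of the hashing step; example_cid() is a hypothetical name.
function example_cid(array $args) {
  // Sort by key so argument order does not change the serialized form.
  ksort($args);
  return md5(serialize($args));
}

$first = example_cid(array('city' => 'Boston', 'units' => 'metric'));
$second = example_cid(array('units' => 'metric', 'city' => 'Boston'));

// Same pairs, different order: prints bool(true).
var_dump($first === $second);

// Different values: prints bool(false).
var_dump($first === example_cid(array('city' => 'Denver', 'units' => 'metric')));
```

Note that ksort only orders the top level of the array, so nested argument arrays would need a recursive sort to get the same guarantee.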

 

Now we can implement a basic public web service function leveraging our cache like this:

 

    <?php
    /**
     * Public function to execute web service call.
     */
    function cachemod_call($args) {
      // Create our cid from args.
      $cid = _cachemod_cid($args);

      // See if we have cached data already. cache_get() returns an object
      // whose data property holds the stored value, or FALSE on a miss.
      $cache = cache_get($cid, 'cache_cachemod');
      if ($cache) {
        return $cache->data;
      }

      // No such luck, go try to pull it from the web service.
      $data = _cachemod_call_service($args);
      if ($data) {
        // Great, we have data! Store it off in the cache.
        cache_set($cid, $data, 'cache_cachemod');
      }

      return $data;
    }

 

Note that there are several values for the optional expire parameter to the cache_set call that are fully documented in the API docs.
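
As a rough illustration of those expire values: in Drupal 7, CACHE_PERMANENT is 0 and CACHE_TEMPORARY is -1, and any other value is treated as a Unix timestamp. With the default database backend, an item whose timestamp has passed may still be returned by cache_get until the next cleanup removes it, so some modules check freshness themselves. The helper below is a hypothetical sketch of such a check. The name _cachemod_is_fresh and the fallback constant definitions are assumptions added so that it runs outside Drupal.

```php
<?php
// Fallbacks so the sketch runs outside Drupal; inside Drupal 7 these
// constants are already defined with the same values.
if (!defined('CACHE_PERMANENT')) {
  define('CACHE_PERMANENT', 0);
}
if (!defined('CACHE_TEMPORARY')) {
  define('CACHE_TEMPORARY', -1);
}

/**
 * Hypothetical helper: decides whether a cache_get() result is still usable.
 *
 * $cache is the value returned by cache_get() (an object, or FALSE on a
 * miss). $now is the current Unix timestamp (REQUEST_TIME inside Drupal).
 */
function _cachemod_is_fresh($cache, $now) {
  if (!$cache || !isset($cache->expire)) {
    return FALSE;
  }
  // Permanent and temporary items carry no timestamp to compare against.
  if ($cache->expire == CACHE_PERMANENT || $cache->expire == CACHE_TEMPORARY) {
    return TRUE;
  }
  return $cache->expire > $now;
}

// Usage with a hand-built object shaped like a cache_get() result.
$item = (object) array('data' => array('temp' => 21), 'expire' => 1000);
var_dump(_cachemod_is_fresh($item, 900));   // bool(true): not yet expired
var_dump(_cachemod_is_fresh($item, 1100));  // bool(false): timestamp has passed
```

A check like this could be applied to the result of cache_get before the cached data is returned, falling through to the web service call when the item is stale.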

Hooking into the core cache management system

If you want your module’s cache bin to clear out when Drupal executes a cache wipe during cron runs or a general cache_clear_all, set the expire parameter in your cache_set call above to either CACHE_TEMPORARY or a Unix timestamp to expire after, and add the following hook to your module:

 

    <?php
    /**
     * Implements hook_flush_caches().
     */
    function cachemod_flush_caches() {
      $bins = array('cache_cachemod');
      return $bins;
    }

 

This will add your cache bin to the list of bins that Drupal’s cron task cleans up, removing temporary and expired items.

Additionally, if you would like to add your cache bin to the list of caches that drush can selectively clear, add the following to your module in a file named cachemod.drush.inc:

 

    <?php
    /**
     * Implements hook_drush_cache_clear().
     */
    function cachemod_drush_cache_clear(&$types) {
      $types['cachemod'] = '_cachemod_cache_clear';
    }

    /**
     * Util function to clear the cachemod bin.
     */
    function _cachemod_cache_clear() {
      cache_clear_all('*', 'cache_cachemod', TRUE);
    }

 

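Note that if you set the expiration of the cache item to CACHE_PERMANENT (the default), the cron cleanup will leave it in place. Only an explicit call to cache_clear_all, either with the item’s CID or with a wildcard as in _cachemod_cache_clear above, will remove it from the cache.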

Conclusion

Sometimes it makes sense to have a module cache data for its own use, possibly in its own cache bin when it needs finer-grained control over the data and its cache management than core provides. Using the cache abstraction built into Drupal 7 core, together with custom classes, hooks, and drush callbacks, gives your module a range of options for reducing data calls, processing overhead, and bandwidth consumption. For more detailed information, check out the API pages at http://api.drupal.org for the functions, classes and hooks mentioned above.
