Adding Entities to ApacheSolr Search

Apache Solr is one of the great solutions for providing search functionality to one's site and there are many modules for Drupal that provide the functionality. The Search API and ApacheSolr modules are two great examples of such. I became a huge fan of the options that Search API provided until I ran into one major roadblock... the site(s) I was implementing search for, was hosted on Acquia and using Acquia's Solr service, which only supports the ApacheSolr module.

Apache Solr is one of the great solutions for providing search functionality to one's site and there are many modules for Drupal that provide the functionality. The Search API and ApacheSolr modules are two great examples of such. I became a huge fan of the options that Search API provided until I ran into one major roadblock... the site(s) I was implementing search for, was hosted on Acquia and using Acquia's Solr service, which only supports the ApacheSolr module.

So, what's the drawback? Well, out of the box ApacheSolr only supports nodes for indexing. Taxonomy and entities are ignored, which for many of the requirements for my sites in progress was necessary. Was it the end of the world? Not at all. After a little research I found that taxonomy and other entities could be added for indexing by the simple implementation of hook_apachesolr_entity_info_alter.

Though I'm not directly demonstrating it, taxonomy can be included as well as Drupal (for the most part.) The taxonomy is treated like an entity.

Here's an example of using the hook to add some custom photo and text entities for indexing:

1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
/**
 * Implements hook_apachesolr_entity_info_alter().
 *
 * Adds Text and Photo items to the search index.
 */
function tr_search_apachesolr_entity_info_alter(&$entity_info) {
  $entity_info['text_item']['indexable'] = TRUE;
  $entity_info['text_item']['status callback'] = 'tr_search_status_callback';
  $entity_info['text_item']['document callback'][] = 'tr_search_solr_document';
  $entity_info['text_item']['reindex callback'] = 'tr_search_solr_reindex_text_item';
  $entity_info['text_item']['index_table'] = 'apachesolr_index_entities_text_item';
 
  $entity_info['photo_item']['indexable'] = TRUE;
  $entity_info['photo_item']['status callback'] = 'tr_search_status_callback';
  $entity_info['photo_item']['document callback'][] = 'tr_search_solr_document';
  $entity_info['photo_item']['reindex callback'] = 'tr_search_solr_reindex_photo_item';
  $entity_info['photo_item']['index_table'] = 'apachesolr_index_entities_photo_item';  
}

Within that hook we are saying that we wish the two new entities to be included for indexing.

The structure of the $entity_info array is a series of sub-arrays keyed by the entity bundle name followed by a series values for the appropriate callbacks for each of the new entities.

For those elements, let's step through what each means with an example of a callback function.

  • "Indexable" - indicates to ApacheSolr that the entity can be indexed and exposes the entity within the ApacheSolr configuration pages in Drupal's admin.
  • "status callback" - indicates that the entity is enabled and available in the system
    Example:
    1
    2
    3
    4
    5
    6
    
    /**
     * Callback for the search status. Since all text and photo items are 'published', this is always true.
     */
    function tr_search_status_callback($term, $type) {
      return TRUE;
    }

    In this case, we are just returning TRUE, but more advanced logic could be implemented to indicate the status pending advanced needs.

  • "document callback" - callback to provide the content/values that will be sent to Solr server for indexing.
    1
    2
    3
    4
    5
    6
    7
    8
    9
    10
    11
    12
    13
    14
    15
    16
    17
    18
    19
    20
    21
    22
    23
    24
    25
    26
    27
    28
    29
    30
    31
    32
    33
    
    /**
     * Builds the information for a Solr document.
     *
     * @param ApacheSolrDocument $document
     *   The Solr document we are building up.
     * @param stdClass $entity
     *   The entity we are indexing.
     * @param string $entity_type
     *   The type of entity we're dealing with.
     */
    function tr_search_solr_document(ApacheSolrDocument $document, $entity, $entity_type) {
      // Headline
      $document->label = apachesolr_clean_text((!empty($entity->field_tr_headline[LANGUAGE_NONE][0])) ? $entity->field_tr_headline[LANGUAGE_NONE][0]['value'] : '');
     
      // Credit
      $document->ss_credit = (!empty($entity->field_tr_credit[LANGUAGE_NONE][0])) ? $entity->field_tr_credit[LANGUAGE_NONE][0]['value'] : '';
     
      // Slug
      $document->ss_slug = (!empty($entity->field_tr_slug[LANGUAGE_NONE][0])) ? $entity->field_tr_slug[LANGUAGE_NONE][0]['value'] : '';
      $document->tus_slug = (!empty($entity->field_tr_slug[LANGUAGE_NONE][0])) ? $entity->field_tr_slug[LANGUAGE_NONE][0]['value'] : '';
     
      // Content
      if ($entity_type == 'photo_item') {
        $document->content = apachesolr_clean_text((!empty($entity->field_tr_caption[LANGUAGE_NONE][0])) ? $entity->field_tr_caption[LANGUAGE_NONE][0]['value'] : '');
      }
      else {
        $document->content = apachesolr_clean_text((!empty($entity->field_tr_text_body[LANGUAGE_NONE][0])) ? $entity->field_tr_text_body[LANGUAGE_NONE][0]['value'] : '');
      }
     
      $documents = array();
      $documents[] = $document;
      return $documents;
    }

    Pretty much checking the entity fields that we want to send we capture their values as properties of the solr document object and return

  • "reindex callback" - callback to provide instructions for when the content is being re-indexed in the system
    1
    2
    3
    4
    5
    6
    7
    8
    9
    10
    11
    12
    13
    14
    15
    16
    17
    18
    19
    20
    21
    22
    23
    24
    25
    26
    27
    28
    29
    
    /**
     * Reindexing callback for ApacheSolr, for text items
     */
    function tr_search_solr_reindex_text_item() {
      $indexer_table = apachesolr_get_indexer_table('text_item');
      $transaction = db_transaction();
      $env_id = apachesolr_default_environment();
      try {
        db_delete($indexer_table)
          ->condition('entity_type', 'text_item')
          ->execute();
         
        $select = db_select('text_item', 't');
        $select->addField('t', 'eid', 'entity_id');    
        $select->addExpression(REQUEST_TIME, 'changed');
     
        $insert = db_insert($indexer_table)
          ->fields(array('entity_id', 'changed'))
          ->from($select)
          ->execute();
      }
      catch (Exception $e) {
        $transaction->rollback();
        watchdog_exception('Apache Solr', $e);
        return FALSE;
      }
     
      return TRUE;
    }
  • "index_table" - the name of the database table that ApacheSolr will locally store references for.
    This table is one that you define in a *.install file using drupal's schema api. The table structure should mimic the same structure that ApacheSolr already defines for nodes.
    1
    2
    3
    4
    5
    6
    7
    8
    9
    10
    11
    12
    13
    14
    15
    16
    17
    18
    19
    20
    21
    22
    23
    24
    25
    26
    27
    28
    29
    30
    31
    32
    33
    34
    35
    36
    37
    38
    39
    40
    41
    42
    43
    44
    45
    46
    47
    48
    49
    50
    51
    52
    53
    
    /**
     * Implements hook_schema().
     */
    function tr_search_schema() {
      $types = array(
          'text_item' => 'apachesolr_index_entities_text_item',
          'photo_item' => 'apachesolr_index_entities_photo_item',
      );
      foreach ($types as $key => $type) {
        $schema[$type] = array(
          'description' => t('Stores a record of when an entity changed to determine if it needs indexing by Solr.'),
          'fields' => array(
            'entity_type' => array(
              'description' => t('The type of entity.'),
              'type' => 'varchar',
              'length' => 128,
              'not null' => TRUE,
              'default' => $key,          
            ),
            'entity_id' => array(
              'description' => t('The primary identifier for an entity.'),
              'type' => 'int',
              'unsigned' => TRUE,
              'not null' => TRUE,
            ),
            'bundle' => array(
              'description' => t('The bundle to which this entity belongs.'),
              'type' => 'varchar',
              'length' => 128,
              'not null' => TRUE,
              'default' => $key,
            ),
            'status' => array(
              'description' => t('Boolean indicating whether the entity is visible to non-administrators (eg, published for nodes).'),
              'type' => 'int',
              'not null' => TRUE,
              'default' => 1,
            ),
            'changed' => array(
              'description' => t('The Unix timestamp when an entity was changed.'),
              'type' => 'int',
              'not null' => TRUE,
              'default' => 0,
            ),
          ),
          'indexes' => array(
            'changed' => array('changed', 'status'),
          ),
          'primary key' => array('entity_id'),
        );
      }
      return $schema;
    }

Using this base set of code you can be able to add any drupal entity to ApacheSolr for indexing. Not everything demonstrated may conform to ones exact needs but, the premise remains the same in allowing entities not supported out of the box to be added to search indexes.

(A hand to Brad Blake for contribution in the code as well)

Geoff Maxey