Using Search API Attachments with Remote Solr Extraction
August 28, 2012

Search API Attachments is very similar to Apachesolr Attachments: it extracts text from attachments using Apache Tika and makes that text indexable, so documents on the site can be searched along with nodes and other entities. However, while Apachesolr Attachments lets you choose between a local copy of Tika and Tika running on a remote Solr server, Search API Attachments only supports local Tika extraction. For large-scale sites this is a problem, because the resource-intensive extraction work takes resources away from the web server.

There is a way to enable remote Solr extraction in Search API Attachments with just two patches.

First, make sure you are using the 7.x-1.2 release of the Search API Attachments module. If you are not, upgrade to that version.

Next, apply the http://drupal.org/files/search_api_attachments-allow_external_extraction_and_cache_extraction-1289222-8.patch patch to your Search API Attachments module. This patch adds an option to the Search API Attachments configuration screen for choosing between remote Solr extraction and local Tika extraction, and it contains the code needed to make remote extraction work. It also adds a table that stores the extracted text, so you don't need to send the files to the server every time you re-index your site. Don't forget to run the database updates after applying this patch.

Last, apply the http://drupal.org/files/search_api_solr-allow_abitrary_query-1580118-1.patch patch to your Search API module (not Search API Attachments). The previous patch requires it in order to send the query to the remote Solr server.

If you are already using local Tika extraction, re-index your site after making these changes.
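To make the idea behind remote extraction and cached text more concrete, here is a minimal Python sketch. It is not the patch's code (the patch is PHP inside the Drupal module). It assumes the Solr server has its extracting request handler enabled at `/update/extract`, and the names `build_extract_url`, `post_to_solr` and `ExtractionCache` are made up for illustration. The in-memory dictionary stands in for the database table the patch adds, and keying it by a hash of the file contents is a choice made for this sketch, not necessarily what the patch does.

```python
import hashlib
import urllib.parse
import urllib.request


def build_extract_url(solr_url):
    """Build a URL for Solr's extracting request handler in extract-only mode."""
    params = urllib.parse.urlencode({
        "extractOnly": "true",    # return the extracted text, do not index it
        "extractFormat": "text",  # plain text rather than XHTML
        "wt": "json",
    })
    return solr_url.rstrip("/") + "/update/extract?" + params


def post_to_solr(data, solr_url):
    """POST raw file bytes to Solr and return the response body as a string.

    Pulling the extracted text out of the body is left to the caller.
    """
    request = urllib.request.Request(
        build_extract_url(solr_url),
        data=data,
        headers={"Content-Type": "application/octet-stream"},
        method="POST",
    )
    with urllib.request.urlopen(request) as response:
        return response.read().decode("utf-8")


class ExtractionCache:
    """Remember extracted text per file so re-indexing does not resend files."""

    def __init__(self, extract_fn):
        self._extract_fn = extract_fn  # callable: bytes -> str
        self._store = {}               # stand-in for a database table

    def get_text(self, data):
        # Key by content hash (an assumption of this sketch).
        key = hashlib.sha256(data).hexdigest()
        if key not in self._store:
            self._store[key] = self._extract_fn(data)
        return self._store[key]
```

A caller could wire the pieces together with something like `ExtractionCache(lambda data: post_to_solr(data, "http://solr.example.com:8983/solr/drupal"))`, where the URL is a placeholder. The first `get_text` call for a file would go to the Solr server, and later calls for the same bytes would be answered from the cache.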

Development

Brad has a wide range of experience architecting large-scale platforms for clients in a variety of industries such as healthcare, sports, government, education and publishing. His breadth of skills includes creating dynamic administrative interfaces, migrating content and creating scalable, stable and secure code. He is equally adept at creating responsive front-end or complex back-end systems, and enjoys coming up with creative solutions to problems and leading our teams to success.
 
