Calais - The Why And The How

Over the last several weeks, Phase2 has blogged about developing and releasing a module that integrates the Drupal CMS with the Calais Web Service. Why did we spend time and money on it? Here's some background.

The Wisdom of Crowds

In any diverse group of people, individuals carry a large amount of scattered, tacit knowledge relevant to their areas of learning. Finding effective ways to convert that individual knowledge into explicit, codified knowledge boosts innovation. Such knowledge, when properly aggregated and analyzed, allows the exploration of new dimensions in the "thought space" and the discovery of new perspectives on "old" facts. This phenomenon has long been known as Collective Intelligence and, more recently, was popularized in The Wisdom of Crowds, by James Surowiecki. For collective intelligence to produce useful information, the opinions of the participants must be independent, diverse and decentralized, and they must be effectively aggregated to produce specific conclusions. Clearly, the Internet has solved the problem of bringing together large numbers of independent, diverse and decentralized individuals. But what about aggregation of information?

Missing Pieces

RDF and RSS, its simpler alternative for less complex tasks, have played a monumental role in facilitating the aggregation of information in the realm of the Semantic Web. These technologies provide an important delivery channel, but once the information is pulled into one place, it needs to be categorized and analyzed; otherwise it will never be transformed into its useful form: Knowledge. The Calais Web Service from Thomson Reuters fills the void. It allows users to submit free-format text and returns a list of intelligent keywords (terms) categorized into different bins of "entities" and "events". For instance, when you submit a recent news item about the 2008 US elections, it will return a list of the persons mentioned in the article. It will probably also return United States under Countries. And so on. What's the benefit of having content tagged? Tagging provides logical categorization and allows you to link different content items to each other. For instance, you can quickly grab a list of content items that all talk about George Bush. Using visualization tools like Tag Clouds, you can see which keywords (terms) the incoming feeds talk about most. The list of possibilities is really only limited by your own creativity and content.
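To make the benefit of tagging concrete, here is a minimal Python sketch of the two uses mentioned above: finding all items that share a term, and counting how often each term appears (the raw numbers behind a tag cloud). It does not call Calais. The ITEMS data, the category names, and the functions build_tag_index and tag_frequencies are hypothetical, written by hand for illustration only.

```python
from collections import Counter, defaultdict

# Hypothetical, hand-written tags standing in for whatever a tagging
# service might return: item id -> category -> list of terms.
ITEMS = {
    "item-1": {"Person": ["George Bush"], "Country": ["United States"]},
    "item-2": {"Person": ["George Bush", "John McCain"],
               "Country": ["United States"]},
    "item-3": {"Country": ["France"]},
}


def build_tag_index(items):
    """Map each (category, term) pair to the set of item ids that carry it."""
    index = defaultdict(set)
    for item_id, categories in items.items():
        for category, terms in categories.items():
            for term in terms:
                index[(category, term)].add(item_id)
    return index


def tag_frequencies(index):
    """Count how many items mention each tag (input for a tag cloud)."""
    return Counter({tag: len(item_ids) for tag, item_ids in index.items()})


index = build_tag_index(ITEMS)

# All items that talk about the same person.
print(sorted(index[("Person", "George Bush")]))

# Tags ordered from most to least frequent.
for tag, count in tag_frequencies(index).most_common():
    print(tag, count)
```

An inverted index like this is one simple way to link related items; a real site would more likely lean on the taxonomy features of its CMS.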

Where Does Drupal Fit In Open Calais?

Drupal is one of the leading content-management systems right now. Intelligent tagging provided by Calais, combined with Drupal, opens up a wealth of possibilities for the community. Drupal already has several very capable feed aggregators (e.g. FeedAPI) that can be plugged into it as extension modules. Adding the Calais module to the mix and letting it auto-tag incoming feed items will produce a top-notch business analysis tool in less time than it takes to finish a cup of coffee. The module has flexible administration capabilities: you can edit a wealth of settings at the "global" level or for each content type individually. You can have feed items tagged automatically as they are harvested, while setting Calais tagging of "news articles" or "editorials" to "manual". In manual mode, Calais suggests possible tags to an editor, and it is up to the editor to choose the relevant terms. Bringing together Calais and Drupal is a powerful combination and provides versatile ground for further innovation. We hope to see many creative solutions, and we are excited to be part of the Semantic revolution.
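For readers who want to picture how a global setting and per-content-type overrides might interact, here is a small Python sketch. It is not the module's code: the setting names, the content type keys, and the functions tagging_mode and process_suggestions are all assumptions made for illustration.

```python
# Hypothetical settings: one global default plus per-content-type overrides.
GLOBAL_SETTINGS = {"mode": "automatic"}
CONTENT_TYPE_SETTINGS = {
    "news_article": {"mode": "manual"},
    "editorial": {"mode": "manual"},
}


def tagging_mode(content_type):
    """Return the override for this content type, else the global default."""
    override = CONTENT_TYPE_SETTINGS.get(content_type, {})
    return override.get("mode", GLOBAL_SETTINGS["mode"])


def process_suggestions(content_type, suggested_tags):
    """Apply suggested tags directly, or hold them for an editor to review."""
    if tagging_mode(content_type) == "automatic":
        return {"applied": list(suggested_tags), "pending_review": []}
    return {"applied": [], "pending_review": list(suggested_tags)}


# A feed item falls back to the global default and is tagged automatically.
print(process_suggestions("feed_item", ["United States"]))

# A news article is set to manual, so its tags wait for an editor.
print(process_suggestions("news_article", ["George Bush"]))
```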

* * *

Irakli Nadareishvili