Slick Data Sharding: Slides from DrupalCon London

On Wednesday at DrupalCon London, I gave a session called "Slick Data Sharding: How to Develop Scalable Data Applications With Drupal". As the title suggests, the session dealt with different ways you can use sharding to scale your site and make certain parts of it more manageable.

Tobby Hagler, Director of Engineering
#Drupal | Posted

If you weren't able to attend DrupalCon London, or you were there and just missed the session, slides and a video of the presentation are available here (PDF).

So what is sharding, you ask? In its simplest form, sharding is breaking up a large "something" into smaller pieces, or shards. As with the shards of a broken plate, you have to be careful -- the edges are sharp, and it's sometimes difficult to piece them back together.

One of the things I tried to convey was the concept of horizontal and vertical sharding. Horizontal sharding, or "partitioning", is a complicated way of scaling a database by splitting the rows of a single table out to multiple databases. This has the advantage of leaving fewer rows in any single database (and therefore reducing index size), but the downside is that the data is very difficult to reassemble with Drupal. Vertical sharding, or "federation", is a way to shard conceptually logical data (like splitting customer data based on geographical location) into separate databases. The advantage of federation is that your overall datasets are smaller, but the downside is that this is still vertical scaling, and you can still potentially reach the same database ceiling (resources such as connections, memory, or CPU) that you were trying to avoid by sharding in the first place.
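To make the two approaches a little more concrete, here is a minimal sketch of how a record might be routed to a database, either by hashing its key (the horizontal case) or by a logical attribute such as region (the federation case). It is written in Python rather than Drupal's PHP, and the shard names, the `REGION_DATABASES` map, and both function names are hypothetical; none of this is taken from the session itself.

```python
import zlib

# Hypothetical shard names; a real site would map these to database connections.
HORIZONTAL_SHARDS = ["users_shard_0", "users_shard_1", "users_shard_2"]

# Hypothetical mapping of a logical attribute (region) to a dedicated database.
REGION_DATABASES = {"eu": "customers_eu", "us": "customers_us"}


def horizontal_shard_for(user_id: int) -> str:
    """Pick a shard for one row of a table by hashing its primary key."""
    # crc32 gives the same value in every process, so routing stays stable.
    digest = zlib.crc32(str(user_id).encode("utf-8"))
    return HORIZONTAL_SHARDS[digest % len(HORIZONTAL_SHARDS)]


def federated_database_for(region: str) -> str:
    """Pick a database by a logical grouping of the data (here, region)."""
    try:
        return REGION_DATABASES[region]
    except KeyError:
        raise ValueError(f"no database configured for region {region!r}")
```

Routing a single record is the easy half. A query that spans many users would have to visit every horizontal shard and merge the rows in application code, which is where the reassembly pain comes from.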

You may want to shard for performance, manageability, or security. Sharding helps performance simply by reducing the overall database size, which not only speeds up queries but also helps reduce replication lag. It helps you manage large data sites by keeping unrelated data in separate data sets, making it easier to provide access to only the data a particular user or application requires. It also helps secure your data by allowing you to keep sensitive data out of the database that runs your primary public-facing website.
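The security point can be sketched the same way. In the Python example below, two in-memory SQLite databases stand in for two separate database servers; the table names, column names, and helper functions are all made up for illustration. The public-facing code path only ever touches the public database, while a back-office path combines both data sets in application code.

```python
import sqlite3

# Two separate in-memory databases stand in for two database servers.
public_db = sqlite3.connect(":memory:")
private_db = sqlite3.connect(":memory:")

# Hypothetical schemas: public profile data and sensitive billing data.
public_db.execute("CREATE TABLE profiles (uid INTEGER PRIMARY KEY, display_name TEXT)")
private_db.execute("CREATE TABLE billing (uid INTEGER PRIMARY KEY, card_last4 TEXT)")

public_db.execute("INSERT INTO profiles VALUES (1, 'alice')")
private_db.execute("INSERT INTO billing VALUES (1, '4242')")


def public_profile(uid):
    """Used by the public site, which only queries public_db."""
    row = public_db.execute(
        "SELECT display_name FROM profiles WHERE uid = ?", (uid,)
    ).fetchone()
    return row[0] if row else None


def billing_view(uid):
    """Used by a back-office tool; joins the two data sets in application code."""
    row = private_db.execute(
        "SELECT card_last4 FROM billing WHERE uid = ?", (uid,)
    ).fetchone()
    return {
        "display_name": public_profile(uid),
        "card_last4": row[0] if row else None,
    }
```

Because the two data sets live in different databases, there is no SQL join between them; the combination happens in `billing_view`, which is the trade-off you accept for keeping the sensitive table away from the public site.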

For those of you who attended, thank you for stopping by. For those of you who weren't able to attend, hopefully these slides and the presentation video will still be of some help to you.
