Better, Stronger, Faster! A New Open Atrium Installer.

Installing Drupal from scratch is relatively painless for most people. However, installing a large Drupal distribution such as Open Atrium 2 has the potential to be painful. Core Drupal only contains a handful of modules that need to be installed, but Open Atrium 2 is a very feature-rich distribution with nearly 200 modules to install. An installation of Open Atrium typically takes 15-20 minutes and consumes significant server resources during that period. In this article, I will show how we reduced that installation time to only TWO minutes!

Overview of Drupal Installation

Have you ever looked at the Drupal core installer code, or traced through it? It’s actually a fairly robust and flexible installer framework. The installation consists of several different “steps”. A step can be a form asking for information, such as the database settings, site information, or theme selection. Or a step can simply be a function that performs processing, such as the step that checks whether Drupal’s system requirements are met. Any step can optionally be processed via the Batch API; the step that installs each required module is one example.

The Drupal installer can be run interactively or non-interactively (for example, with the “drush site-install” command). During interactive installs, a separate HTTP request is used for each step of the process and for each “batch” of work done by the batch steps. Batch processing helps avoid timeout errors, but a single operation that takes too long can still cause a timeout. The installer tries to group batch work into requests of about one second, but that only lets several fast operations share a single request. An operation that takes longer than one second merely ensures that the next operation starts in a new request; the slow operation itself still runs inside one request and can still time out.
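To make that last point concrete, here is a simplified model of the one-second grouping, written in Python purely for illustration. It is not Drupal’s batch.inc code. The function names plan_requests and timed_out, the millisecond durations, and the 30-second limit (chosen to match the PHP default) are all assumptions made for this sketch.

```python
from typing import List, Sequence, Tuple


def plan_requests(durations_ms: Sequence[int], budget_ms: int = 1000) -> List[Tuple[List[int], int]]:
    """Group batch operations into simulated HTTP requests.

    Simplified model (not Drupal's code): operations run in order, and a
    request is closed as soon as its elapsed time reaches the budget.
    Returns (operation indexes, elapsed ms) for each request.
    """
    requests: List[Tuple[List[int], int]] = []
    current: List[int] = []
    elapsed = 0
    for index, duration in enumerate(durations_ms):
        # An operation always runs to completion inside the current request.
        current.append(index)
        elapsed += duration
        if elapsed >= budget_ms:
            requests.append((current, elapsed))
            current, elapsed = [], 0
    if current:
        requests.append((current, elapsed))
    return requests


def timed_out(requests: List[Tuple[List[int], int]], limit_ms: int = 30_000) -> List[int]:
    """Return the positions of requests whose elapsed time exceeds the limit."""
    return [i for i, (_, elapsed) in enumerate(requests) if elapsed > limit_ms]


# Five fast operations, one 45-second module install, then three fast ones.
plan = plan_requests([200] * 5 + [45_000] + [300] * 3)
print(plan)             # [([0, 1, 2, 3, 4], 1000), ([5], 45000), ([6, 7, 8], 900)]
print(timed_out(plan))  # [1]
```

The five fast operations are bundled into one request. The 45-second operation still gets a request to itself and exceeds the limit, which is the failure mode described above.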

Caching during the installation process is also very complex. In general, the installer minimizes cache clearing to speed up the install. However, some modules depend on the existence of other modules, and without some cache clearing, modules installed during the same batch request might not see each other. This becomes even more difficult when a module is actually a Features export. Features has its own caches and sometimes needs to rebuild a feature to put configuration into the database that other modules might need.

Installing Open Atrium 2

What appears to be a straightforward installation process for Drupal core becomes far more complicated when installing a distribution that uses hundreds of modules and Features, such as Open Atrium. An interactive installation of Atrium begins fine and quickly enables the first 30 modules. Soon it becomes slower and slower: enabling the last module takes nearly ten times as long as enabling the first. Each Feature module that is installed causes the info files of all previously enabled modules to be reloaded into cache, so the more modules are installed, the slower Features becomes. There are many issues in the Features queue that try to address these performance problems, but it is a very complex process.
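Because each Feature reloads the info files of every module enabled before it, the total work grows roughly with the square of the module count. The Python sketch below counts info-file loads under a simplified assumption: a Features export reloads all previously enabled modules, and other modules trigger no reload. The function info_reloads and the module mixes in the example are hypothetical and illustrative, not measurements from Open Atrium.

```python
from typing import Sequence


def info_reloads(is_feature: Sequence[bool]) -> int:
    """Count .info file loads during an install (simplified model).

    is_feature[k] is True when the k-th module enabled is a Features export.
    Installing a Feature at position k reloads the info files of the k
    modules already enabled before it; other modules trigger no reload.
    """
    return sum(position for position, feature in enumerate(is_feature) if feature)


# Worst case: all 200 modules are Features exports -> 200 * 199 / 2 loads.
print(info_reloads([True] * 200))                # 19900
# 30 plain modules first, then 10 Features: 30 + 31 + ... + 39 loads.
print(info_reloads([False] * 30 + [True] * 10))  # 345
```

In this model, doubling the number of modules roughly quadruples the reloading work. That fits the installer slowing down as it goes, although the real per-module cost also includes cache rebuilds that this count ignores.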

Because modules take longer and longer to enable, some Open Atrium installs run into PHP timeout problems and need to raise the limit from the 30-second default to 60 seconds. This is especially frustrating because these timeouts often occur near the end of the installer, after 15 minutes have already passed. In general, a successful Open Atrium 2 installation takes around 15 to 20 minutes, which is far too long for most people trying to quickly evaluate Open Atrium’s functionality.

There Must be a Better Way!

During the recent Drupal NYC Camp, we held an Open Atrium 2 training session.  During this training, we had 20 people all installing Open Atrium at the same time.  We all collectively waited 20 minutes just to get 20 identical copies of Open Atrium 2 installed on our computers.  Wouldn’t it have been nice to simply install Open Atrium once and then clone that onto the other computers?  That’s when I realized there was a better way to install Open Atrium…by cloning a previous installation!

[Screenshot: Installation Method options in the Open Atrium 2 installer]

The new 2.18 version of Open Atrium contains this new installer option.  After entering your database information, you are prompted for the Installation Method.  The Standard Drupal installation method is still available (if you really want to wait 20 minutes!).  The new Quick installation option is the default.  Selecting Quick installation will direct the installer to import from an existing database dump that is saved within the Open Atrium code repository.

Instead of a batch step that installs and enables each module, the Quick install step uses a batch process to import the database tables from the SQL dump. This only works with MySQL. If you are using a different database engine, you won’t be offered the Quick install option; the installer simply runs the standard Drupal installation.

After only TWO minutes, the installer should finish importing the database tables.  You will then be prompted to fill in your Site information as normal.

Technical Details

Importing an existing database from the Drupal installer turned out to be trickier than originally anticipated.  The Drupal installer isn’t happy when certain database tables or variables stored in the database are ripped out from under it.  Another complexity was the fact that the new database might have a different database “prefix” for table names.  The new installer code actually parses the MySQL database dump line by line to split the SQL statements into batches for each table and to replace table names with the new database prefix.
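The following Python sketch shows the general idea of splitting a dump by table and rewriting the table prefix. It is not the code in install_from_db, which is PHP. It assumes mysqldump’s default layout, where every statement ends with a semicolon at the end of a line and DROP/CREATE/INSERT statements name their table in backticks right after the keyword. The function split_dump and the oa_ prefix are hypothetical. Several simplifications apply: LOCK/UNLOCK TABLES, SET directives and comments are dropped, and table names inside other clauses are not rewritten.

```python
import re
from typing import Dict, Iterable, List

# mysqldump names the table in backticks right after these keywords.
TABLE_STATEMENT = re.compile(r"^(DROP TABLE IF EXISTS|CREATE TABLE|INSERT INTO)\s+`([^`]+)`")


def split_dump(lines: Iterable[str], prefix: str = "") -> Dict[str, List[str]]:
    """Group dump statements into one batch per table, adding a table prefix."""
    batches: Dict[str, List[str]] = {}
    statement: List[str] = []
    for line in lines:
        stripped = line.strip()
        # Between statements, skip blank lines, "--" comments and /*!...*/ directives.
        if not statement and (not stripped or stripped.startswith(("--", "/*"))):
            continue
        statement.append(line.rstrip("\n"))
        if not stripped.endswith(";"):
            continue  # Multi-line statement (e.g. CREATE TABLE) continues.
        sql = "\n".join(statement)
        statement = []
        match = TABLE_STATEMENT.match(sql)
        if match is None:
            continue  # LOCK/UNLOCK TABLES, SET, etc. are not kept in this sketch.
        keyword, table = match.groups()
        # Rewrite only the leading table name; the rest of the statement is kept.
        sql = f"{keyword} `{prefix}{table}`" + sql[match.end():]
        batches.setdefault(table, []).append(sql)
    return batches


dump = """-- MySQL dump
/*!40101 SET NAMES utf8 */;
DROP TABLE IF EXISTS `node`;
CREATE TABLE `node` (
  `nid` int(10) unsigned NOT NULL,
  PRIMARY KEY (`nid`)
) ENGINE=InnoDB;
LOCK TABLES `node` WRITE;
INSERT INTO `node` VALUES (1),(2);
UNLOCK TABLES;
DROP TABLE IF EXISTS `users`;
""".splitlines()

for table, statements in split_dump(dump, prefix="oa_").items():
    print(table, len(statements), statements[0])
# node 3 DROP TABLE IF EXISTS `oa_node`;
# users 1 DROP TABLE IF EXISTS `oa_users`;
```

Each entry in the result could then become one batch operation, so that each request imports a single table. That keeps individual requests short, in the spirit of the per-table batching described here.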

To visualize the difference between the standard Drupal installer and the new Quick install, I ran both and sent the statistics to Graphite:

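[Graph: Graphite statistics (CPU usage and Apache resident memory) for a Standard install followed by a Quick install of Open Atrium 2]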

The first 20 minutes of this graph show the standard Drupal installation of Open Atrium 2. The CPU is almost fully utilized during the entire install process. The resident memory used by Apache climbs as the installer adds more and more modules to the site. At around 08:27 the first Features export module is enabled, causing a sharp increase in memory usage. The drop in CPU at 08:30.5 (when the green CPU line goes to zero) happens while the installer waits for the Site Information form to be filled in. After that step, the Drupal installer clears all caches, runs cron, and does other cleanup, which increases the resident Apache memory even further.

The second set of data, starting at 08:36, is for the new Quick install option. The actual install time is around 2-3 minutes. Once again, the drop in CPU at 08:38.5 is where the installer prompted for Site information, so the data after that represents the same cache clearing and install cleanup as in the full installer. Ultimately, the same amount of memory is needed to fully clear the cache and clean up the site, because the same modules have been installed and enabled at that point.

In fact, the Quick installation process itself consists of importing over 300 database tables, followed by its own cache clear. The memory increase at around 08:37.5 is caused by the cache clear that the Quick installer executes after all the database tables have been imported. During the actual database import, Apache’s memory usage is flat.

Conclusion

Any other Drupal distribution that requires a large number of modules should be able to leverage the code used in Open Atrium.  The code is all within the “install_from_db” subdirectory of the profile, which can be found in the Open Atrium project on drupal.org.  The huge reduction in installation time should help retain new clients that are evaluating Open Atrium and Drupal for their organizations and generally improve first impressions of Drupal.

  • http://www.alexweber.com.br/ Alex Weber

    Nice! Feature-based distributions (are there other kinds?) are notoriously difficult to maintain because of these kinds of issues…

    I’ve personally lost track of just how much time I’ve spent installing, clearing databases, looking at logs, tweaking a line of code and re-installing over and over again because the installer just stopped working all of a sudden. It’s also almost always related to a new feature export that’s causing interdependencies and changing up module installation orders and things get out of whack.

    A lot of times I wonder whether it’s worth it and somewhat miss the D6 days where every project started from a base build and no node ever had id 1 anymore :)

    Whereas it might seem like a step back, I actually kinda like the idea of starting from a dump, not only to solve this particular issue, but in general, for distributions. Wouldn’t it be nice to install, get demo content, and jump right into it without having to jump through the same hoops over and over? Just boom, instant Drupal! Especially as far as in-house distros go… maybe I’m just infatuated with this idea, but either way, this is very cool, thanks for sharing!

  • Othermachines

    Works like a charm.

  • Tim Loudon

    that’s super cool!

    do you know offhand if this is something that would easily work w/ any distro? briefly looking @ the install_from_db.profile code, it doesn’t look like it’s oa specific.

    btw, i’m wondering if there are further optimizations to be had:

    [tloudon@addy install_from_db]$ time mysql tester < ../db/openatrium.sql
    real 0m21.732s
    user 0m0.257s
    sys 0m0.123s

    the actual db importing part only takes ~22s. could you split openatrium.sql into two files, one for "constants" across installations and another for the installation-specific transformations? i'm guessing the ~100-160s is largely spent parsing and regexing the 2.6mb dumpfile; but how much actually needs to be changed? idk, just a thought. again, super cool. thanks for sharing!

    • Mike Potter

      It shouldn’t be distribution-specific, but since installers are a bit complex, your mileage may vary. The bulk of the new install time is actually the clearing of the drupal cache after the databases are imported. The parsing and regexing goes quickly. But after importing the tables you need to clear the drupal cache. Then after entering site information the Drupal installer clears the cache again.

      I didn’t want to get fancy with splitting the sql file or anything. My release process is to add a “drush sql-dump” command to my Jenkins test script that does a “drush site-install” for each release to verify it can be installed. So unless it can easily be automated like this, the work involved in splitting the db dump and keeping it updated wouldn’t be worth the small time savings.

      • Donovan Dilon

        Mike, thank you very much for posting this! The OA2 install approach is clever and very useful. I’m attempting to adapt the install script and install_from_db.profile code for use with a custom distribution. Almost everything works .. however the install_from_db code triggers the following error: “An AJAX HTTP request terminated abnormally. .. StatusText: ResponseText: ReadyState: 4” … which is identical to this OA2 error reported on Drupal.org: https://www.drupal.org/node/2276509. The case suggests a memory issue, but my distribution is not as complex as OA2, and the site PHP memory limit is 256MB — so I thought it might be worth seeking feedback from you.

        The php-error log shows the following errors:

        [23-Aug-2014 23:16:00 UTC] PHP Fatal error: Call to undefined function field_info_instances() in /srv/bindings/…/…/profiles/profile-name/install_from_db/install_from_db.profile on line 182
        [23-Aug-2014 23:20:54 UTC] PHP Fatal error: Call to undefined function field_attach_load() in /srv/bindings/…/…/includes/entity.inc on line 316

        I apologize for seeking assistance in a blog post comment; but I believe this might be helpful to others who are impressed with the OA2 approach and having difficulty adapting it.

        Any suggestions are appreciated.

        Thanks!

        • Mike Potter

          You’ll want to look in your MySQL error log. When this fails, you’ll get all sorts of weird PHP errors because Drupal can’t bootstrap itself when some of the DB tables are missing.

          The only issues I have run into are 1) if your DB user doesn’t have the LOCK TABLES privilege in MySQL, it will fail to clone the tables, and/or 2) if max_allowed_packet in MySQL isn’t set to a larger value such as 32M, some of the DB imports will fail.

          • Donovan Dilon

            Thanks Mike, I’ll look at the MySQL logs. Your reply and responsiveness are much appreciated.

  • RyeSeronie

    ok so this seems to be the most recent thing I can find about Atrium. I cannot find anything explaining how to get atrium up and running. Do I install drupal first or is atrium its own installation; if so do I just copy the files over or do I install it using the drupal module installer… for something so great I can’t seem to get an answer from anyone!

    • Mike Potter

      Open Atrium includes Drupal. So just go to http://drupal.org/project/openatrium and click on the 7.x-2.18 tar.gz or zip link at the bottom of the page to download it. Unpack it to a directory in the root of your web server. Then in your browser enter your hostname/install.php to run the installer. More information on installing and updating is found here: https://drupal.org/node/2169701

  • Jake Schlachter

    Mike, this is a great idea. I’m using Barracuda / Aegir to manage my OA platforms. Any thoughts on how this will / will not interact with the Aegir automated install process?

  • moshe

    I’ve been longing for the old ‘import from database’ days as I wait eternally for Drupal 8 to install. I agree that this makes sense sometimes.

  • Aslan Kanzas

    Mike, this is really great! Thank you very much for your ongoing contributions. Alex’ idea of demo content would really ice this cake – making it really easy to communicate the value of OA to new users. Demo data would also be great for new users to get a feel of how to build their own system, without getting lost in the myriad of awesome options that may be used to do so.

    Do you have any plans on adding demo data e.g. a demo University OA setup? Let me know if I can help.

  • http://www.waqasnasir.com/ Waqas Nasir

    I tried both options: the normal installation and the recommended Quick one. I am on Bluehost with PHP 5.4. With both options, at the last step where we define the admin account, admin email, and time zone, the page ended up blank and white. I tried to refresh the page but got the same result… Anyway, when going to the base URL of the site, the site seems to be functional and I can log in as admin as well… I am not sure if I only missed the “thank you for installing” success message, or something else as well…

  • http://brightsolutions.de ManuelBS

    Thanks for this amazing idea! I had thought about this way to install distributions much more quickly but hadn’t tried it yet. Now that you have confirmed it works, I will definitely implement this for ERPAL for Service Providers (http://drupal.org/project/erpal).
    As we develop ERPAL Platform (http://drupal.org/project/erpal_platform) as a more lightweight Drupal business distribution, we decided not to use Features because of the missing flexibility and the heavy installation load. The installation process is much quicker. So to give an answer to Alex Weber: yes, there are “non-Features-based” distributions. You can just code the configuration during the installation process using the Drupal API. But, of course, this takes much more time to develop and maintain the distro.