Managing Your Drupal 8 Migration

Joshua Turton, Senior Developer
#Drupal 8 | Posted

In this post, we’ll begin to talk about the development considerations of actual website code migration and other technological details. In these exercises, we’re assuming that you’re moving from Drupal 6 or 7 to Drupal 8. In a later post, I will examine ways to move other source formats into Drupal 8 - including CSV files, non-Drupal content management systems, or database dumps from weird or proprietary frameworks.

Migration: A Primer

Before we get too deep into the actual tech here, we should probably take a minute to define some terms and explain what’s actually happening under the hood when we run a migration, or the rest of this won’t make much sense.

When we run a migration, the web server loads the content from the old site, converts it to a Drupal 8 format, and saves it in the new site. Sounds simple, right?

Actually, it pretty much is that simple. At least, conceptually. So, try to keep those three steps in mind as we go through the hard stuff later. Everything we do is designed to make one of those three steps work.

Key Phrases

  • Migration: The process of moving content from one site to another. ‘A migration’ typically refers to all the content of a single content or entity type (in other words, one node type, one taxonomy, and so on).

  • Migration Group: A collection of Migrations with common traits

  • Source: The Drupal 6 or 7 database from which you’re drawing your content (or other weird source of data, if applicable)

  • Process: The stuff that Drupal code does to the data after it’s been loaded, in order to digest it into a format that Drupal 8 can work with

  • Destination: The Drupal 8 site

Interestingly, each of those key phrases above corresponds directly to a code file that’s required for migration. Each Migration has a configuration (.yml) file, and each is individually tailored for the content of that entity. As config files, each of these is pretty independent and not reusable. However, we can also assign them to Migration Groups. Groups are also configuration (.yml) files. They allow us to declare common configurations once, and reuse them in each migration that belongs to that group.

The Source Plugin code is responsible for doing queries to the Source database, retrieving the data, and formatting it into PHP objects that can be worked on. The Process Plugin takes that data, does stuff to it, and passes it to the next step. The Destination Plugin then saves it in Drupal 8 format.  Rinse, repeat.

On a Drupal-to-Drupal migration, around 75% of your time will be spent working in the Migration or Migration Group config, declaring the different Process Plugins to use. You may wind up writing one or more Process Plugins as part of your migration development, but a lot of really useful ones are included in Drupal core migration code and are documented here. A few more are included with Migrate Plus.

Drupal 8 core has Source Plugins for all standard Drupal 6 and Drupal 7 entity types (node, taxonomy, user, etc.). The only time you’ll ever need to write a Source plugin is for a migration from a source other than Drupal 6 or 7, and many of these are already available as Contrib modules.

Also included in Drupal core are Destination Plugins for all of the core entity types. Unless you’re using a custom entity in Drupal 8, and migrating data into that entity, you’ll probably never write a Destination Plugin.

Development Foundations

There are a few key requirements you need to have in place before you can begin development.  First, and probably foremost, you need to have both your Drupal 6/7 and Drupal 8 sites - the former full of all your valuable content, and the latter empty of everything but structure.

An important note: though the completed migration will be run on your production server, you should be using development environments for this work. At Phase2, we use Outrigger to simplify and standardize our dev and production environments.

For migration purposes, we only actually need the Drupal 7 site’s database itself, in a place that’s accessible to the destination site.  I usually take an SQL dump from production, and install it as an additional database on the same server as the destination, to avoid network latency and complicated authentication requirements. Obviously, unless you freeze content for the duration of the migration development, you’ll have to repeat this process for final content migration on production.

I’d like to reiterate some advice from my last post: I strongly recommend sanitizing user accounts and email addresses on your development databases.  Use drush sql-sanitize and avoid any possibly embarrassing and unprofessional gaffes.

On your Drupal 8 site, you should already have completed the creation of the new content types, based on information you discovered and documented in your first steps.  This should also encompass the creation of taxonomy vocabularies, and any fields on your user entities.

In your Drupal 8 settings.php file, add a second database config array pointed at the Drupal 7 source database.

sites/default/settings.php

  // Connection to the Drupal 7 source database, used only by the migration.
  $databases['migration_source_db']['default'] = array(
    'database' => 'example_source',
    'username' => 'username',
    'password' => 'password',
    'prefix' => '',
    'host' => 'db',
    'port' => '',
    'namespace' => 'Drupal\Core\Database\Driver\mysql',
    'driver' => 'mysql',
  );

Finally, you’ll need to add the migration module suite to your site.  The baseline for migrations is migrate, migrate_drupal, migrate_plus, and migrate_tools.  The Migrate and Migrate Drupal modules are core code. Migrate provides the basic functionality required to take content and put it into Drupal 8.  Migrate Drupal provides code that understands the structure of Drupal 6 and 7 content, and makes it much more straightforward to move content forward within the Drupal ecosystem.

Both Migrate Plus and Migrate Tools are contributed modules available at drupal.org. Migrate Plus, as the name implies, adds some new features, most importantly migration groups. Migrate Tools provides the drush integration we will use to run and roll back migrations.

Drupal 8 core code also provides migrate_drupal_ui, but I recommend against using it. By using Migrate Tools, we can make use of drush, which is more efficient, can be incorporated into shell scripts, and has clearer error messages.

Framing the House

We’ve done the planning and laid the foundations, so now it’s time to start building this house!

We start with a new, custom module.  This can be pretty bare-bones, to start with.

example_migrate/example_migrate.info.yml

  type: module
  name: 'Example Migrate'
  description: 'Example custom migrations'
  package: 'Example Migrate'
  core: '8.x'
  dependencies:
    # Core modules live in the drupal project.
    - drupal:migrate
    - drupal:migrate_drupal
    # Contributed modules are namespaced by their own project name.
    - migrate_plus:migrate_plus
    - migrate_tools:migrate_tools

Within our module folder, we need a config/install directory. This is where all our config files will go.

Migration Groups

The first thing we should make is a general migration group. While it’s possible to put all the configuration into each and every migration you write, I’m a strong believer in DRY programming (Don’t Repeat Yourself).  Migrate Plus gives us the ability to put common configuration into a single file and use it for multiple migrations, so let’s take advantage of that power!

Note the filename we’re using here. This naming convention gives Migrate Plus the ability to find and parse this configuration, and marks it as a migration group.

example_migrate/config/install/migrate_plus.migration_group.example_general.yml

  # The machine name of the group, by which it is referenced in individual migrations.
  id: example_general

  # A human-friendly label for the group.
  label: General Imports

  # More information about the group.
  description: Common configuration for simple migrations.

  # Short description of the type of source, e.g. "Drupal 6" or "WordPress".
  source_type: Drupal 7 Site

  # Here we add any default configuration settings to be shared among all
  # migrations in the group.
  shared_configuration:
    source:
      key: migration_source_db

  # We add dependencies just to make sure everything we need will be available
  dependencies:
    enforced:
      module:
        - example_migrate
        - migrate_drupal
        - migrate_tools

This is a very simple group that we will use for migrations of simple content. Most of the stuff in here is self-descriptive.  However, source is a critical config - it uses the key of the database configuration we added earlier, to give migrate access to that database.  We’ll examine a more complicated migration group another time.
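To make the sharing concrete: as I understand Migrate Plus, it merges the group's shared_configuration into each member migration when the migrations are built, and values set in the migration itself win over the group's. Under that assumption, a migration in this group that declares only a d7_user source plugin behaves as if its source section had been written like the sketch below. This illustrates the merged result only; it is not a file you need to create.

```yaml
# Effective source configuration for a migration in the example_general group.
source:
  # Inherited from the group's shared_configuration.
  key: migration_source_db
  # Declared in the individual migration.
  plugin: d7_user
```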

User Migration

In Drupal, users pretty much have their fingers in every pie.  They are listed as authors on content, they are creators of files… you get the picture.  That’s why the user migration is usually the first one to run.

Note again the filename convention here, which allows Migrate Plus to find it, and marks it as a migration (as opposed to a group).

example_migrate/config/install/migrate_plus.migration.example_user.yml

  # Migration for user accounts.
  id: example_user
  label: User Migration
  migration_group: example_general
  source:
    plugin: d7_user
  destination:
    plugin: entity:user
  process:
    mail:
      plugin: get
      source: mail
    status: status

    name:
      -
        plugin: get
        source: name
      -
        plugin: dedupe_entity
        entity_type: user
        field: name

    roles:
      plugin: static_map
      source: roles
      map:
        2: authenticated
        3: administrator
        4: author
        5: guest_author
        6: content_approver

    created: created
    changed: changed

  migration_dependencies:
    required: { }

  dependencies:
    enforced:
      module:
        - example_migrate

Wow! There’s lots of stuff going on here.  Let’s try and break it down a bit.

  id: example_user
  label: User Migration
  migration_group: example_general

The id designation is a standard machine name for this migration.  We will call this with drush to run the migration. Label is a standard human-readable name.  The migration_group should be obvious - it connects this migration to the group we designed above, which means we are now importing all the config in there.  Notably, that connects us to the D7 database.

  source:
    plugin: d7_user
  destination:
    plugin: entity:user

Here are two key items.  The source plugin defines where we are getting our data, and what format it’s going to come in.  In this case, we are using Drupal core’s d7_user plugin.

The destination plugin defines what we’re making out of that data, and the format it ends up in.  In this case, we’re using Drupal core’s entity:user plugin.

  process:
    mail:
      plugin: get
      source: mail

    status: status

    name:
      -
        plugin: get
        source: name
      -
        plugin: dedupe_entity
        entity_type: user
        field: name

    roles:
      plugin: static_map
      source: roles
      map:
        2: authenticated
        3: administrator
        4: author
        5: guest_author
        6: content_approver

    created: created
    changed: changed

Now we get into the real meat of a migration - the Process section. Each field you’re going to migrate has to be defined here. They are keyed by their field machine name in Drupal 8.  

Each field assigns a plugin parameter, which defines the Process Plugin to use on the data. Each of these process plugins will take a source parameter, and then possibly others.  The source parameter defines the field in the data array provided by the source plugin.  (Yeah, like I’ve said before, naming things clearly isn’t Drupal’s strong suit).

Our first example is mail. Here we are assigning it the get process plugin. This is the easiest process to understand, as it literally takes the data from the old site and gives it to the new site without transforming it in any way. Since email addresses don’t have any formatting changes or necessary transformations, we just move them.

In fact, the get process plugin is Drupal’s default, and our next example shows a shortcut to use it. The status field is getting its data from the old status field. Since get is our default, we don’t even need to actually specify the plugin, and the source is simply implied. See the documentation on drupal.org for more detail.
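As a quick side-by-side, here is a sketch of the two spellings. The shortcut line is taken from the migration above. The long form is shown on the created field purely for illustration, since the real migration uses the shortcut there too.

```yaml
process:
  # Shortcut: a bare source field name implies the get plugin.
  status: status

  # Long form of the same kind of mapping, written out in full.
  created:
    plugin: get
    source: created
```

Both entries copy a source value straight into the destination field. The shortcut just saves two lines.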

Name is a slightly more complicated matter.  While usernames don’t change much in their format, we want to make absolutely sure that they are unique.  This leads us to Plugin Chaining, an interesting option that allows us to pass data from one plugin to another, before saving it. The YAML list syntax, as demonstrated above, allows us to define more than one plugin for a single field.

We start off by defining the get plugin, which just gets the data from a source field. (You can’t use the default shortcut when you’re chaining, incidentally.)

We then pass it off to the next plugin in the chain, dedupe_entity. This plugin ensures that each record is absolutely certain to be unique.  It has the additional parameters entity_type and field. These define the entity type to check against for uniqueness, and the field in which to look on that entity. See the documentation for more detail.
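The plugin also takes a couple of optional settings. The sketch below assumes, going by my reading of the plugin's documentation, that postfix is a string placed before the numeric counter and that length limits how long the resulting value may be. The value 60 is my own assumption, chosen to match Drupal's usual username limit, so check it against your site. Also note that from Drupal 8.4 core documents this plugin as make_unique_entity_field and marks dedupe_entity as deprecated, so use whichever name your core version expects.

```yaml
process:
  name:
    -
      plugin: get
      source: name
    -
      plugin: dedupe_entity
      entity_type: user
      field: name
      # Assumed options: a duplicate "jsmith" would become "jsmith_1".
      postfix: _
      length: 60
```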

Note that this usage of dedupe_entity does not specify a source parameter.  That’s because plugin chaining hands off the data from the first plugin in line to the next, becoming, in effect, the source.  It’s very similar to method chaining in jQuery or OOP PHP.  You can chain together as many process plugins as you need, though if you start getting up above four it might be time to re-evaluate what you’re doing, and possibly write a custom processor.
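To show the hand-off with one more link in the chain, here is a sketch of the name pipeline with an extra step in the middle. It assumes core's callback process plugin, which passes the value through a PHP callable (trim here). The trim step is my illustration and is not part of the migration above.

```yaml
process:
  name:
    -
      # Step 1: read the raw value from the source row.
      plugin: get
      source: name
    -
      # Step 2: receives step 1's output and strips stray whitespace.
      plugin: callback
      callable: trim
    -
      # Step 3: receives step 2's output and makes it unique among users.
      plugin: dedupe_entity
      entity_type: user
      field: name
```

Only the first step names a source. Every later step works on whatever the previous one returned.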

Our final example to examine is roles. User roles in Drupal 7 were keyed numerically, but in Drupal 8 they are based on machine names.  The static_map plugin takes the old numbers, and assigns them to a machine name, which becomes the new value.
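One thing to plan for is a role ID that isn't in your map. As I understand the plugin, an unmapped value causes the row to be skipped unless you set either default_value or bypass. The sketch below uses default_value. Falling back to authenticated is my assumption for illustration, so pick whatever fallback suits your site.

```yaml
process:
  roles:
    plugin: static_map
    source: roles
    map:
      2: authenticated
      3: administrator
    # Assumed fallback for any role ID not listed in the map above.
    default_value: authenticated
```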

The last two process items are changed and created. Like status, they are using the get process plugin, and being designated in the shortcut default syntax.

  migration_dependencies:
    required: { }

  dependencies:
    enforced:
      module:
        - example_migrate

The last two configs are pretty straightforward.  Migration Dependencies are used when a migration requires data from other migrations (we’ll get into that more another time). Dependencies are used when a migration requires a specific additional module to be enabled. In my opinion it’s pretty redundant with the dependencies declared in the module itself, so I don’t use it much.
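For a taste of what a non-empty migration dependency looks like, here is a sketch for a hypothetical node migration. The name example_article is made up for this illustration. It declares that the user migration has to run first, so authors already exist when the content arrives.

```yaml
# Fragment of a hypothetical migrate_plus.migration.example_article.yml
id: example_article
migration_group: example_general
migration_dependencies:
  required:
    # Users must be migrated before this migration runs.
    - example_user
```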

In the next post, we’ll cover taxonomy migrations and simple node migrations. We’ll also share a really useful tool for migration development.  Thanks for reading!
