Migration is a complex and interesting topic. We’ve already covered some important migration information on our blog:
- Drupal 8 Content Migration: A Guide For Marketers - What content should we migrate, and how do we organize and plan a migration?
- Estimating Drupal 8 Migration Scope - How long will all this take?
- Managing Your Drupal 8 Migration - Key concepts, setting up the tools, and starting with a user migration.
In this post, we’ll delve into node and taxonomy migrations, which is where the bulk of the content in most sites is stored. If you haven’t read through the previous installments in this series, I highly recommend you do so. We’ll be building on some of those concepts here.
Categories and Tags
Categorization and tagging in Drupal are managed through taxonomies. Taxonomy terms are a ‘fieldable entity’ type, much like users and nodes. As we did with user migrations in our previous post, we will assign all the taxonomy migrations to our general migration group.
In this case, Category is a taxonomy vocabulary that has two fields: name (the taxonomy term itself) and description. This example is based on the migration template file from the core taxonomy module.
example_migrate/config/install/migrate_plus.migration.example_category.yml
[php]
# Migration for category taxonomy
id: example_category
label: Category Taxonomy terms
migration_group: example_general
deriver: Drupal\taxonomy\Plugin\migrate\D7TaxonomyTermDeriver
source:
  plugin: d7_taxonomy_term
  bundle: example_category
destination:
  plugin: entity:taxonomy_term
process:
  tid: tid
  vid:
    plugin: default_value
    default_value: example_category
  name: name
  weight: weight
  'description/value': description
  'description/format': format
  # Only attempt to stub real (non-zero) parents.
  parent_id:
    -
      plugin: skip_on_empty
      method: process
      source: parent
    -
      plugin: migration_lookup
      migration: example_category
  parent:
    plugin: default_value
    default_value: 0
    source: '@parent_id'
  changed: timestamp
migration_dependencies: { }
[/php]
Wow again! As is often the case, there are lots of things going on here. Let’s break down the new stuff.
[php]deriver: Drupal\taxonomy\Plugin\migrate\D7TaxonomyTermDeriver[/php]
This is a component within the Drupal 8 core taxonomy module. When the migration loads a taxonomy term, the deriver goes and loads all the field values from the source database and attaches them to the term. This saves us the heavy lifting of writing SQL to load all those field values ourselves.
[php]
source:
  plugin: d7_taxonomy_term
  bundle: example_category
destination:
  plugin: entity:taxonomy_term
[/php]
Here are a few key items. The source plugin defines where we are getting our data, and what format it’s going to come in. In this case, we are using Drupal core’s d7_taxonomy_term plugin.
Bundle tells us what type of term we’re loading; in this case, example_category.
The destination plugin defines what we’re making out of that data, and the format it ends up in. In this case, we’re using Drupal’s core entity handler for taxonomy_term.
[php]
vid:
  plugin: default_value
  default_value: example_category
[/php]
There are a number of ways to deal with a taxonomy term’s vid. Because we are migrating a single vocabulary, we simply set it to a constant value. Were we migrating several vocabularies at once, the static_map plugin would probably be how we’d handle this. Note that vid is a machine name in D8 and an integer in D7, so you will have to translate it somehow.
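As a rough sketch of that multi-vocabulary case, the static_map plugin can translate the D7 integer vid into a D8 machine name. The numeric IDs and the example_tags vocabulary below are hypothetical; check the taxonomy_vocabulary table in your own source database for the real values.

[php]
# Sketch only: the IDs and vocabulary names are assumptions for illustration.
vid:
  plugin: static_map
  source: vid
  map:
    1: example_category
    2: example_tags
  # Fallback used when the source vid is not listed in the map.
  default_value: example_category
[/php]

Without default_value (or bypass: true), a source value missing from the map causes the row to be skipped, which may or may not be what you want.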
An additional note here about vocabularies: As noted in my previous posts, it’s also possible to migrate the vocabularies themselves. This would allow us to use the migration_lookup plugin here.
However, I recommend against migrating content and vocabulary types. Unless your content model is extremely simple, the changes to a content type’s fields are usually pretty significant. You’re better off putting in some labor up front to manually create them rather than trying to clean up a computer’s mess later.
[php]
'description/value': description
'description/format': format
[/php]
So, this is an interesting example. Because it’s a rich-text field, description has both a value (the content itself) and a format. The ‘/’ syntax allows us to assign values to each. In this case, each of them is using the default get plugin, but we could use any process plugin we needed.
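For instance, if the text format names differ between the two sites, one of those sub-properties could run through a process plugin instead of a plain copy. This is a sketch under assumptions: the format names here (filtered_html, basic_html, and so on) are the ones commonly found on standard installs and may not match yours.

[php]
# Sketch only: the format machine names are assumptions about both sites.
'description/value': description
'description/format':
  plugin: static_map
  source: format
  map:
    filtered_html: basic_html
    full_html: full_html
    plain_text: plain_text
  # Any format not listed above falls back to basic_html.
  default_value: basic_html
[/php]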
[php]
# Only attempt to stub real (non-zero) parents.
parent_id:
  -
    plugin: skip_on_empty
    method: process
    source: parent
  -
    plugin: migration_lookup
    migration: example_category
parent:
  plugin: default_value
  default_value: 0
  source: '@parent_id'
[/php]
OK - this one is complicated. Because taxonomies can be hierarchical, they have a ‘parent’ parameter. Because that parent is a term that’s being migrated, we have to translate the source id into the new id by way of the migration lookup process plugin. It gets complicated, though, because we’re looking up values in the very same migration that’s currently running!
We start off by creating a ‘pseudo-field’: parent_id. This isn’t an actual field in the D8 site, but we can use it to hold a value temporarily. The skip_on_empty plugin will skip this field if there’s no value in the source data. Then we use the migration_lookup plugin to find the new value in the already migrated content.
“But wait!” some of you might be saying. “Migration is an iterative process - what happens if a term has a parent value in the source data that hasn’t been migrated yet?”
A great question and the answer is, “Magic!” Well, no, not really, but it does kind of seem that way. The migration_lookup plugin has a handy feature called stubbing. If this plugin finds a reference to a value that doesn’t exist, it creates it. However, since it doesn’t know any actual data beyond the ID value(s), it fills it with nonsense data. Later, when that ID comes up to be migrated, it will clean up after itself and replace the nonsense with actual values.
If, for some reason, you don’t want to stub out values, you can add no_stub: true to the configuration of the migration_lookup plugin.
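Applied to the parent lookup from our category migration, that would look something like the sketch below. The trade-off is that a term whose parent has not been migrated yet gets no parent at all on the first pass, so you may need to re-run the migration in update mode afterwards.

[php]
# Sketch: the same parent_id pipeline as above, with stubbing disabled.
parent_id:
  -
    plugin: skip_on_empty
    method: process
    source: parent
  -
    plugin: migration_lookup
    migration: example_category
    no_stub: true
[/php]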
Once the parent_id pseudo-field is populated, we use it to fill in the value of parent. The default_value plugin can take an additional parameter, source, which will override the default_value itself. Using ‘@parent_id’ as the value of that allows us to substitute in the value we assigned to the pseudo-field (that’s what the @ does). If there isn’t a value in that, then default_value will use 0, meaning no parent is assigned to this taxonomy term.
Nodes
Finally, we get to migrate some actual content! Nodes are the heart of Drupal’s content management.
example_migrate/config/install/migrate_plus.migration_group.example_nodes_group.yml
[php]
# Migration Group for nodes
id: example_nodes_group
label: Nodes Group
description: Common config for node migrations
# Here we add any default configuration settings to be shared
# among all migrations in the group.
shared_configuration:
  source:
    key: migration_source_db
  destination:
    plugin: entity:node
  process:
    nid: nid
    type: type
    title: title
    uid:
      plugin: migration_lookup
      source: node_uid
      migration: example_user
    status: status
    created: created
    changed: changed
    comment: comment
    promote: promote
    sticky: sticky
    field_body: field_body
    'field_body/format':
      plugin: default_value
      default_value: 'filtered_html'
    field_category:
      plugin: migration_lookup
      source: field_category
      migration: example_category
      no_stub: true
  migration_dependencies:
    required:
      - example_category
      - example_user
dependencies: { }
[/php]
By now, you should be able to read pretty much all of this. It’s important to note, though, that this is a migration group. So why are we setting field process configuration here? Because most of these are fields that are common to all Drupal nodes - nid, type, title, uid, etc. The rest of them are fields that are common to all the nodes in this particular site. Either way, we can use the migration group to set these common parameters and avoid having to do it in every node migration.
[php]
migration_dependencies:
  required:
    - example_category
    - example_user
[/php]
This is really the only new thing we’re adding here - dependencies on other migrations. Because there are entity reference fields in this content type, we need to ensure that the user and category migrations have been run before this one.
example_migrate/config/install/migrate_plus.migration.example_node.yml
[php]
# Migration for Example Nodes.
id: example_node
label: Example Nodes
migration_group: example_nodes_group
source:
  plugin: d7_node
  node_type: example_node
destination:
  plugin: entity:node
process:
  type:
    plugin: default_value
    default_value: example_node
  field_related_nodes:
    plugin: sub_process
    source: field_related_nodes
    process:
      target_id:
        plugin: migration_lookup
        source: target_id
        migration: example_blog_post
migration_dependencies:
  required:
    - example_blog_post
[/php]
At this point, you should be able to read almost all of this without any further introduction. The only new thing here is one of the most useful plugins, sub_process. Formerly known as iterator, this plugin is pretty much a foreach loop. It assumes the source value is an array and runs whatever is assigned to its process key once for each row of that array. Each plugin in that process receives one row at a time and uses its own source parameter as the key into that row. In this case, it loops over an array of references to blog posts, looks up the new NID for each, and assigns it to target_id. See the sub_process plugin’s API documentation for more details.
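To make the loop a little more concrete, here is the same mapping annotated with the shape of the data passing through it. The node IDs in the comments are made up for illustration, and the assumption is that field_related_nodes is a D7 entity reference field.

[php]
# Sketch only: the IDs in these comments are hypothetical.
#
# Source value for one node:
#   - { target_id: 12 }
#   - { target_id: 34 }
#
# If example_blog_post mapped 12 to 112 and 34 to 134,
# the destination field would receive:
#   - { target_id: 112 }
#   - { target_id: 134 }
field_related_nodes:
  plugin: sub_process
  source: field_related_nodes
  process:
    target_id:
      plugin: migration_lookup
      source: target_id
      migration: example_blog_post
[/php]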
Handy-dandy Drush Command
One thing about Drupal 8’s new config management is, once you’ve enabled a module it doesn’t generally read the config files from that module again. Drush provides the cim command to route around this problem. Run this command whenever you make a change to a migration .yml file:
[php]drush cim -y --partial --source=modules/custom/example_module/config/install/[/php]
Obviously, replace ‘example_module’ with the name of your module.
That’s it for today, folks! Next time, we’ll cover migrations from sources other than Drupal.