Drupal 8 Migrations: Taxonomy and Nodes

Joshua Turton, Senior Developer

Migration is a complex and interesting topic, and we’ve already covered some important migration information in earlier posts on our blog.

In this post, we’ll delve into node and taxonomy migrations, since nodes and taxonomy terms are where the bulk of the content in most sites is stored. If you haven’t read through the previous installments in this series, I highly recommend you do so. We’ll be building on some of those concepts here.

Categories and Tags

Categorization and tagging in Drupal are managed through taxonomies. Taxonomy terms are a ‘fieldable entity’ type, much like users and nodes. As we did with user migrations in our previous post, we will assign all the taxonomy migrations to our general migration group.

In this case, Category is a taxonomy vocabulary that has two fields: name (the taxonomy term itself) and description. This example is based on the migration template file from the core taxonomy module.

example_migrate/config/install/migrate_plus.migration.example_category.yml

  # Migration for category taxonomy
  id: example_category
  label: Category Taxonomy terms
  migration_group: example_general
  deriver: Drupal\taxonomy\Plugin\migrate\D7TaxonomyTermDeriver

  source:
    plugin: d7_taxonomy_term
    bundle: example_category

  destination:
    plugin: entity:taxonomy_term

  process:
    tid: tid
    vid:
      plugin: default_value
      default_value: example_category

    name: name
    weight: weight

    'description/value': description
    'description/format': format

    # Only attempt to stub real (non-zero) parents.
    parent_id:
      -
        plugin: skip_on_empty
        method: process
        source: parent
      -
        plugin: migration_lookup
        migration: example_category
    parent:
      plugin: default_value
      default_value: 0
      source: '@parent_id'

    changed: timestamp

  migration_dependencies: { }

Wow again! As is often the case, there are lots of things going on here. Let’s break down the new stuff.

deriver: Drupal\taxonomy\Plugin\migrate\D7TaxonomyTermDeriver

This is a component within the Drupal 8 core taxonomy module. When the migration loads a taxonomy term, the deriver goes and loads all the field values from the source database and attaches them to the term. This saves us the heavy lifting of writing SQL to load all those field values ourselves.

  source:
    plugin: d7_taxonomy_term
    bundle: example_category

  destination:
    plugin: entity:taxonomy_term

Here are a few key items.  The source plugin defines where we are getting our data, and what format it’s going to come in.  In this case, we are using Drupal core’s d7_taxonomy_term plugin.

The bundle setting tells the source plugin which vocabulary’s terms to load; in this case, example_category.

The destination plugin defines what we’re making out of that data, and the format it ends up in.  In this case, we’re using Drupal’s core entity handler for taxonomy_term.

  vid:
    plugin: default_value
    default_value: example_category

There are a number of ways to deal with a taxonomy term’s vid. Because we are migrating a single vocabulary, we are simply setting it to a constant value. Were we migrating several vocabularies at once, the static_map plugin would probably be how we’d handle this. Note that vid is a machine name in D8 and an integer in D7, so you will have to translate it somehow.
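For the multi-vocabulary case, a minimal sketch of that translation might look like the following. The numeric D7 vocabulary IDs and the example_tags vocabulary are assumptions made up for illustration; check the taxonomy_vocabulary table in your own source database for the real values.

```yaml
process:
  vid:
    plugin: static_map
    # 'vid' is the integer vocabulary ID on the D7 source row.
    source: vid
    map:
      # Hypothetical D7 vid => D8 vocabulary machine name.
      1: example_category
      2: example_tags
```

By default, static_map skips any row whose source value is missing from the map, so a vocabulary you forget to list will not be migrated silently into the wrong place.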

An additional note here about vocabularies: As noted in my previous posts, it’s also possible to migrate the vocabularies themselves.  This would allow us to use the migration_lookup plugin here.
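If you did go that route, the vid mapping could be sketched roughly like this. The example_vocabulary migration is hypothetical and is not defined anywhere in this series.

```yaml
process:
  vid:
    plugin: migration_lookup
    # Hypothetical vocabulary migration; it would have to run first.
    migration: example_vocabulary
    # Integer vocabulary ID from the D7 source row.
    source: vid
```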

However, I recommend against migrating content and vocabulary types. Unless your content model is extremely simple, the changes to a content type’s fields are usually pretty significant. You’re better off putting in some labor up front to manually create them rather than trying to clean up a computer’s mess later.

  'description/value': description
  'description/format': format

So, this is an interesting example. Because it’s a rich-text field, description has both a value (the content itself) and a format.  The ‘/’ syntax allows us to assign values to each.  In this case, each of them is using the default get plugin, but we could use any process plugin we needed.
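As a sketch of swapping in a different process plugin for one of these sub-values, the format could be translated with static_map instead of copied straight across. The format names below assume a D7 site using filtered_html and full_html and a D8 site using the standard profile’s basic_html and full_html; your sites may differ.

```yaml
process:
  'description/value': description
  'description/format':
    plugin: static_map
    source: format
    map:
      # Assumed D7 format => assumed D8 format.
      filtered_html: basic_html
      full_html: full_html
    # Fall back to basic_html for any format not listed above.
    default_value: basic_html
```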

  # Only attempt to stub real (non-zero) parents.
  parent_id:
    -
      plugin: skip_on_empty
      method: process
      source: parent
    -
      plugin: migration_lookup
      migration: example_category

  parent:
    plugin: default_value
    default_value: 0
    source: '@parent_id'

OK - this one is complicated.  Because taxonomies can be hierarchical, they have a ‘parent’ parameter. Because that parent is a term that’s being migrated, we have to translate the source id into the new id by way of the migration lookup process plugin. It gets complicated, though, because we’re looking up values in the very same migration that’s currently running!

We start off by creating a ‘pseudo-field’: parent_id.  This isn’t an actual field in the D8 site, but we can use it to hold a value temporarily.  The skip_on_empty plugin will skip this field if there’s no value in the source data. Then we use the migration_lookup plugin to find the new value in the already migrated content.

“But wait!” some of you might be saying. “Migration is an iterative process - what happens if a term has a parent value in the source data that hasn’t been migrated yet?”

A great question, and the answer is, “Magic!” Well, no, not really, but it does kind of seem that way. The migration_lookup plugin has a handy feature called stubbing. If the plugin is asked for an item that hasn’t been migrated yet, it creates a placeholder (a stub) for it. Since it doesn’t know any actual data beyond the ID value(s), it fills the stub with nonsense data. Later, when that ID comes up to be migrated, the migration replaces the nonsense with the actual values.

If, for some reason, you don’t want to stub out values, you can add no_stub: true to the configuration of the migration_lookup plugin.
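Applied to the parent lookup from this migration, that would look something like the sketch below. Only the no_stub line is new; everything else is the parent_id pipeline shown earlier.

```yaml
process:
  parent_id:
    -
      plugin: skip_on_empty
      method: process
      source: parent
    -
      plugin: migration_lookup
      migration: example_category
      # Do not create placeholder terms for parents that are not migrated yet.
      no_stub: true
```

Keep in mind that with stubbing turned off, a term whose parent has not been migrated yet gets no value from the lookup, so it would fall back to the default parent of 0.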

Once the parent_id pseudo-field is populated, we use it to fill in the value of parent. The default_value plugin can take an additional parameter, source, which will override the default_value itself.  Using ‘@parent_id’ as the value of that allows us to substitute in the value we assigned to the pseudo-field (that’s what the @ does).  If there isn’t a value in that, then default_value will use 0, meaning no parent is assigned to this taxonomy term.

Nodes

Finally, we get to migrate some actual content! Nodes are the heart of Drupal’s content management.

example_migrate/config/install/migrate_plus.migration_group.example_nodes_group.yml

  # Migration Group for nodes
  id: example_nodes_group
  label: Nodes Group
  description: Common config for node migrations

  # Here we add any default configuration settings to be shared
  # among all migrations in the group.
  shared_configuration:
    source:
      key: migration_source_db

    destination:
      plugin: entity:node

    process:
      nid: nid
      type: type
      title: title
      uid:
        plugin: migration_lookup
        source: node_uid
        migration: example_user

      status: status
      created: created
      changed: changed
      comment: comment
      promote: promote
      sticky: sticky

      field_body: field_body
      'field_body/format':
        plugin: default_value
        default_value: 'filtered_html'

      field_category:
        plugin: migration_lookup
        source: field_category
        migration: example_category
        no_stub: true

    migration_dependencies:
      required:
        - example_category
        - example_user

  dependencies: { }

By now, you should be able to read pretty much all of this.  It’s important to note, though, that this is a migration group. So why are we setting field process configuration here?  Because most of these are fields that are common to all Drupal nodes - nid, type, title, uid, etc.  The rest of them are fields that are common to all the nodes in this particular site. Either way, we can use the migration group to set these common parameters and avoid having to do it in every node migration.
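To illustrate how a migration in this group might lean on the shared settings, here is a minimal sketch. The example_page migration and its page node type are hypothetical, and the sketch assumes that a value set in an individual migration takes precedence over the same key in the group’s shared_configuration.

```yaml
# Hypothetical migration that inherits everything from the group
# and overrides a single shared process mapping.
id: example_page
label: Example Pages
migration_group: example_nodes_group

source:
  plugin: d7_node
  node_type: page

process:
  # Overrides the group's 'filtered_html' default for this node type only.
  'field_body/format':
    plugin: default_value
    default_value: 'full_html'
```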

  migration_dependencies:
    required:
      - example_category
      - example_user

This is really the only new thing we’re adding here - dependencies on other migrations.  Because there are entity reference fields in this content type, we need to ensure that the user and category migrations have been run before this one.

example_migrate/config/install/migrate_plus.migration.example_node.yml

  # Migration for Example Nodes.
  id: example_node
  label: Example Nodes
  # Must match the id of the migration group defined above.
  migration_group: example_nodes_group

  source:
    plugin: d7_node
    node_type: example_node

  destination:
    plugin: entity:node

  process:
    type:
      plugin: default_value
      default_value: example_node

    field_related_nodes:
      plugin: sub_process
      source: field_related_nodes
      process:
        target_id:
          plugin: migration_lookup
          source: target_id
          migration: example_blog_post

  migration_dependencies:
    required:
      - example_blog_post

At this point, you should be able to read almost all of this without any further introduction.  The only new thing being introduced here is one of the most useful plugins, sub_process. Formerly known as iterator, this plugin is pretty much a foreach loop.  It runs whatever is assigned to its process in a loop, assuming that the source value is an array.  When the plugin specified in the process is run, it is given each of the array rows one at a time. It then uses its own source parameter as the key in that array.  In this case, it is looking up an array of blog posts and finding the new NIDs, then assigning them to target_id.  You can see the API page linked above for more details.
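To make the loop concrete, here is a sketch of the kind of source value sub_process expects. The IDs are made up, and the target_id key assumes the D7 field is an entity reference field; other reference field types may use a different key.

```yaml
# Hypothetical source value for one node: a list of reference items.
field_related_nodes:
  - target_id: 12
  - target_id: 34

# sub_process runs its 'process' once per item, so the destination
# receives a list of the same shape, with each target_id replaced by
# whatever migration_lookup returns for it in example_blog_post.
```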

Handy-dandy Drush Command

One thing about Drupal 8’s new config management is, once you’ve enabled a module it doesn’t generally read the config files from that module again.  Drush provides the cim command to route around this problem.  Run this command whenever you make a change to a migration .yml file:

drush cim -y --partial --source=modules/custom/example_module/config/install/

Obviously, replace ‘example_module’ with the name of your module.

That’s it for today, folks!  Next time, we’ll cover migrations from sources other than Drupal.
