Drupal 8 Migrations: Taxonomy and Nodes

Joshua Turton | Senior Developer

May 4, 2018

Migration is a complex and interesting topic. We’ve already covered some important migration information on our blog:

Drupal 8 Content Migration: A Guide For Marketers - What content should we migrate, and how do we organize and plan a migration?
Estimating Drupal 8 Migration Scope - How long will all this take?
Managing Your Drupal 8 Migration - Key concepts, setting up the tools, and starting with a user migration.

In this post, we’ll delve into node and taxonomy migrations, which is where the bulk of the content in most sites is stored. If you haven’t read through the previous installments in this series, I highly recommend you do so. We’ll be building on some of those concepts here.

Categories and Tags

Categorization and tagging in Drupal is managed through the use of taxonomies. Taxonomy terms are a ‘fieldable entity’ type, much like users and nodes. As we did with user migrations in our previous post, we will assign all the taxonomy migrations to our general migration group.

In this case, Category is a taxonomy vocabulary that has two fields: name (the taxonomy term itself) and description. This example is based on the migration template file from the core taxonomy module.

example_migrate/config/install/migrate_plus.migration.example_category.yml

# Migration for category taxonomy
id: example_category
label: Category Taxonomy terms
migration_group: example_general
deriver: Drupal\taxonomy\Plugin\migrate\D7TaxonomyTermDeriver
source:
  plugin: d7_taxonomy_term
  bundle: example_category
destination:
  plugin: entity:taxonomy_term
process:
  tid: tid
  vid:
    plugin: default_value
    default_value: example_category
  
  name: name
  weight: weight
  'description/value': description
  'description/format': format
  # Only attempt to stub real (non-zero) parents.
  parent_id:
    -
      plugin: skip_on_empty
      method: process
      source: parent
    -
      plugin: migration_lookup
      migration: example_category
  parent:
    plugin: default_value
    default_value: 0
    source: '@parent_id'
  changed: timestamp
migration_dependencies: { }

Wow again! As is often the case, there are lots of things going on here. Let’s break down the new stuff.

deriver: Drupal\taxonomy\Plugin\migrate\D7TaxonomyTermDeriver

This is a component of the Drupal 8 core taxonomy module. When the migration loads a taxonomy term, the deriver loads all of that term’s field values from the source database and attaches them to the term. This saves us the heavy lifting of writing SQL to load all those field values ourselves.

source:
  plugin: d7_taxonomy_term
  bundle: example_category
destination:
  plugin: entity:taxonomy_term

Here are a few key items. The source plugin defines where we are getting our data, and what format it’s going to come in. In this case, we are using Drupal core’s d7_taxonomy_term plugin.

Bundle tells us which vocabulary’s terms we’re loading; in this case, example_category.

The destination plugin defines what we’re making out of that data, and the format it ends up in. In this case, we’re using Drupal’s core entity handler for taxonomy_term.

vid:
  plugin: default_value
  default_value: example_category

There are a number of ways to deal with a taxonomy term’s vid. Because we are migrating a single vocabulary, we simply set it to a constant value. Were we migrating several vocabularies at once, the static_map process plugin would probably be how we’d handle this. Note that vid is a machine name in D8 but an integer in D7, so you will have to translate it somehow.
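For illustration, a static_map version of vid might look like the sketch below. It assumes the D7 source row exposes the integer vid; the vocabulary IDs (1, 2) and the example_tags machine name are hypothetical, and you would also drop the bundle filter from the source so that both vocabularies come through.

```yaml
# Hypothetical: translate integer D7 vocabulary IDs into D8 machine names.
vid:
  plugin: static_map
  source: vid
  map:
    1: example_category
    2: example_tags   # hypothetical second vocabulary
```

Any vid not listed in the map causes the row to be skipped, unless you also give the plugin a default_value or set bypass: true.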

An additional note here about vocabularies: As noted in my previous posts, it’s also possible to migrate the vocabularies themselves. This would allow us to use the migration_lookup plugin here.

However, I recommend against migrating content types and vocabularies. Unless your content model is extremely simple, the changes to a content type’s fields between versions are usually significant. You’re better off putting in some labor up front to create them manually than trying to clean up a computer’s mess later.

'description/value': description
'description/format': format

So, this is an interesting example. Because it’s a rich-text field, description has both a value (the content itself) and a format. The ‘/’ syntax allows us to assign values to each. In this case, each of them is using the default get plugin, but we could use any process plugin we needed.
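For example, if your D8 site doesn’t have a filtered_html text format (the standard install profile ships basic_html, restricted_html, and full_html instead), you could swap the default get for static_map on the format half only. A sketch, assuming those standard-profile format names; adjust the map to the formats you actually created:

```yaml
'description/value': description
'description/format':
  plugin: static_map
  source: format
  map:
    filtered_html: basic_html   # assumed D8 format machine name
    full_html: full_html
  default_value: basic_html     # fallback for any format not listed above
```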

# Only attempt to stub real (non-zero) parents.
parent_id:
  -
    plugin: skip_on_empty
    method: process
    source: parent
  -
    plugin: migration_lookup
    migration: example_category
parent:
  plugin: default_value
  default_value: 0
  source: '@parent_id'

OK, this one is complicated. Because taxonomy terms can be hierarchical, each term has a ‘parent’ property. Because that parent is itself a term being migrated, we have to translate the source ID into the new ID by way of the migration_lookup process plugin. It gets complicated, though, because we’re looking up values in the very same migration that’s currently running!

We start off by creating a ‘pseudo-field’: parent_id. This isn’t an actual field in the D8 site, but we can use it to hold a value temporarily. The skip_on_empty plugin, with method: process, stops processing this pseudo-field when the source value is empty, so no lookup is attempted for top-level terms (whose D7 parent is 0). Then we use the migration_lookup plugin to find the new value in the already migrated content.

“But wait!” some of you might be saying. “Migration is an iterative process - what happens if a term has a parent value in the source data that hasn’t been migrated yet?”

A great question, and the answer is, “Magic!” Well, no, not really, but it does kind of seem that way. The migration_lookup plugin has a handy feature called stubbing. If the plugin finds a reference to an ID that hasn’t been migrated yet, it creates a placeholder (a “stub”) for it. However, since it doesn’t know any actual data beyond the ID value(s), it fills the stub with nonsense data. Later, when that ID comes up to be migrated, the migration cleans up after itself and replaces the nonsense with actual values.

If, for some reason, you don’t want to stub out values, you can add no_stub: true to the configuration of the migration_lookup plugin.
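Applied to the parent lookup above, that option would look like the following sketch. With stubbing disabled, a parent that hasn’t been migrated yet comes back empty, so parent falls back to 0 and the hierarchy link is lost unless the affected terms are re-imported after their parents exist.

```yaml
parent_id:
  -
    plugin: skip_on_empty
    method: process
    source: parent
  -
    plugin: migration_lookup
    migration: example_category
    no_stub: true   # return nothing instead of creating a placeholder term
```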

Once the parent_id pseudo-field is populated, we use it to fill in the value of parent. Like any process plugin, default_value accepts a source key; when the source value is non-empty it is passed through unchanged, and the default_value is used only when it is empty. Using ‘@parent_id’ as the source lets us substitute in the value we assigned to the pseudo-field (that’s what the @ does). If the pseudo-field is empty, default_value will use 0, meaning no parent is assigned to this taxonomy term.

Nodes

Finally, we get to migrate some actual content! Nodes are the heart of Drupal’s content management.

example_migrate/config/install/migrate_plus.migration_group.example_nodes_group.yml

# Migration Group for nodes
id: example_nodes_group
label: Nodes Group
description: Common config for node migrations
# Here we add any default configuration settings to be shared
# among all migrations in the group.
shared_configuration:
  source:
    key: migration_source_db
  destination:
    plugin: entity:node
  process:
    nid: nid
    type: type
    title: title
    uid:
      plugin: migration_lookup
      source: node_uid
      migration: example_user
    status: status
    created: created
    changed: changed
    comment: comment
    promote: promote
    sticky: sticky
    field_body: field_body
    'field_body/format':
      plugin: default_value
      default_value: 'filtered_html'
    field_category:
      plugin: migration_lookup
      source: field_category
      migration: example_category
      no_stub: true
  migration_dependencies:
    required:
      - example_category
      - example_user
  dependencies: { }

By now, you should be able to read pretty much all of this. It’s important to note, though, that this is a migration group. So why are we setting field process configuration here? Because most of these are fields that are common to all Drupal nodes - nid, type, title, uid, etc. The rest of them are fields that are common to all the nodes in this particular site. Either way, we can use the migration group to set these common parameters and avoid having to do it in every node migration.
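To see the payoff, here is a sketch of a second node migration in the same group. The example_page ID and the page bundle are hypothetical; because nid, title, uid, status, the destination, and the source database key are inherited from shared_configuration, the migration only has to declare what differs.

```yaml
# Hypothetical second node migration relying on the group's shared config.
id: example_page
label: Example Pages
migration_group: example_nodes_group
source:
  plugin: d7_node
  node_type: page
process:
  # Everything else (nid, title, uid, status, ...) comes from the group.
  type:
    plugin: default_value
    default_value: page
```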

migration_dependencies:
    required:
      - example_category
      - example_user

This is really the only new thing we’re adding here - dependencies on other migrations. Because there are entity reference fields in this content type, we need to ensure that the user and category migrations have been run before this one.
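Migrate also recognizes an optional list alongside required. Optional dependencies only influence the order in which migrations run; they never block a migration from running. A sketch, assuming a hypothetical example_tags migration:

```yaml
migration_dependencies:
  required:
    - example_category
    - example_user
  optional:
    - example_tags   # hypothetical; affects run order only
```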

example_migrate/config/install/migrate_plus.migration.example_node.yml

# Migration for Example Nodes.
id: example_node
label: Example Nodes
migration_group: example_nodes_group
source:
  plugin: d7_node
  node_type: example_node
destination:
  plugin: entity:node
process:
  type:
    plugin: default_value
    default_value: example_node
  field_related_nodes:
    source: field_related_nodes
    plugin: sub_process
    process:
      target_id:
        plugin: migration_lookup
        source: target_id
        migration: example_blog_post
migration_dependencies:
  required:
    - example_blog_post

At this point, you should be able to read almost all of this without any further introduction. The only new thing here is one of the most useful plugins, sub_process. Formerly known as iterator, this plugin is essentially a foreach loop: it expects the source value to be an array and runs whatever is assigned to its process key once for each element. Each element is treated as a row of its own, so the source parameter of a nested plugin refers to a key within that element. In this case, it iterates over an array of related blog posts, looks up the new NID for each one, and assigns it to target_id. See the sub_process API documentation for more details.
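The same pattern applies to multi-value term references. The group configuration above passes field_category straight to migration_lookup; if that field can hold several terms, a sub_process wrapper like the following sketch may be needed. It assumes each D7 value arrives as an array with a tid key, which is the column D7 term reference fields use.

```yaml
field_category:
  plugin: sub_process
  source: field_category
  process:
    # Each element looks like ['tid' => 123]; translate it to the new term ID.
    target_id:
      plugin: migration_lookup
      migration: example_category
      source: tid
      no_stub: true
```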

Handy-dandy Drush Command

One thing to know about Drupal 8’s config management: once you’ve enabled a module, Drupal doesn’t generally read that module’s config files again. Drush provides the cim (config-import) command to work around this problem. Run this command whenever you make a change to a migration .yml file:

drush cim -y --partial --source=modules/custom/example_module/config/install/

Replace ‘example_module’ with the name of your module (example_migrate in this post), and adjust the path if your module doesn’t live in modules/custom.
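An alternative some developers prefer is to declare an enforced dependency on the module in each migration’s config file. The configuration is then removed when the module is uninstalled and read again when it is re-enabled. A sketch, using the example_migrate module name from this post:

```yaml
# Add to each migrate_plus.migration.*.yml file.
dependencies:
  enforced:
    module:
      - example_migrate
```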

That’s it for today, folks! Next time, we’ll cover migrations from sources other than Drupal.
