
Migration: Making It All Work

Joshua Turton, Senior Developer

We’ve written a lot about content migration on our blog here—it’s something we have more than a passing interest in, because we do it a lot! The posts below cover the project management, estimation, and basics of content migration from Drupal to Drupal, and other sources too.

If you’ve been following along with this series, you will have a lot of good information at your fingertips. (If you haven’t, I highly recommend reading those posts now. We’re building on their foundation here.)

If you’ve tried to implement the code samples, you might even have a functional migration!

But what if you don’t? What if you’re getting errors, or unexpected results, or just… nothing at all? Well, that’s what this post is all about. What do you do, when what you did isn’t doing what you thought it was going to do?

Here are some tips, tricks, and starters for figuring out what went wrong.

Migrate Message / Map Tables

Drupal 8’s migrate system is very responsible. When you create and first run a migration, it adds two new tables to your database: migrate_map_[migration_id] and migrate_message_[migration_id].

The migrate_map_ tables are comparison tables. They store three items of interest:

  • Source ID (sourceid1)

    • the ID of the content item in the old system (as defined in your migration’s source: ids section)

  • Destination ID (destid1)

    • the ID of the content item in the new site, as determined by Drupal’s entity system.

  • Status (source_row_status)

    • An integer reflecting the status of that migration line. This allows you to see if it was imported, ignored, or failed.

The purpose of storing this information is two-fold. First, it allows the MigrationLookup plugin to do its thing, associating old references to migrated content at the new ID. Second, and more relevant for this post, the Status field is a handy reference for you to quickly determine if a given item of content successfully made the transition to the new system. Imported is 0, Needs Update is 1, Ignored is 2, and Failed is 3.
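A quick way to read those statuses at a glance is to count the rows in a map table by status. Here's a minimal sketch, assuming a migration ID of example_xml_articles, no database table prefix, and Drush available on the site. The status_summary_sql helper is just an illustrative name, not part of Drupal or Drush.

```shell
#!/usr/bin/env bash
# Hypothetical helper: print a query that counts map rows per status
# for one migration (0 = imported, 2 = ignored, 3 = failed).
# Assumption: tables are not prefixed; adjust the table name if yours are.
status_summary_sql() {
  local id="$1"
  printf 'SELECT source_row_status, COUNT(*) AS total FROM migrate_map_%s GROUP BY source_row_status;\n' "$id"
}

# Usage, from the Drupal root (sqlq is Drush's alias for its SQL query command):
#   drush sqlq "$(status_summary_sql example_xml_articles)"
```

A big pile of 3s means it's time to look at the message table; a big pile of 2s usually just means your skip plugins are doing their job.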

What’s the difference between ignored and failed? Ignored means that something in your migration told Drupal to skip that row. Usually, that’s the SkipOnEmpty or SkipRowIfNotSet plugins; some other process plugins will call SkipOnEmpty as part of their own work. This is usually intentional and not a reason to worry.

Failed means that Drupal tried, and it didn’t work. This is often because a necessary field isn’t set, or because there was a PHP error or other problems along the way.

Failures are where the migrate_message_ tables come in. In most cases, when Drupal records a failed migration row, it will also provide a message giving you a clue as to why that happened. The two tables are cross-referenced by source_ids_hash. In the event of a failed migration, migrate_message is the first stop in your diagnosis.

For example, a common one when dealing with files is

File 'public://example_file.pdf' does not exist

Obviously, this is a simple fix—either replace the file, or fix the file path in the source data. Look at what Drupal’s telling you here, and see if it’s something you can easily fix. If nothing jumps out at you, keep reading.
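Before moving on, it can help to see every failed row next to its message. The sketch below builds a query that joins the two tables on source_ids_hash. It assumes a migration ID of example_xml_articles, a single-column source ID (sourceid1), and no table prefix; failed_rows_sql is a made-up helper name.

```shell
#!/usr/bin/env bash
# Hypothetical helper: print a query joining the map and message tables
# on source_ids_hash, limited to rows that failed (status 3).
# Assumption: tables are not prefixed and the source has one ID column.
failed_rows_sql() {
  local id="$1"
  cat <<SQL
SELECT m.sourceid1, m.source_row_status, msg.message
FROM migrate_map_${id} AS m
LEFT JOIN migrate_message_${id} AS msg
  ON msg.source_ids_hash = m.source_ids_hash
WHERE m.source_row_status = 3;
SQL
}

# Usage, from the Drupal root:
#   drush sqlq "$(failed_rows_sql example_xml_articles)"
```

The LEFT JOIN is deliberate: a failed row with no recorded message still shows up, just with an empty message column. If you have the migrate_tools module installed, drush mmsg example_xml_articles should give you the messages without writing any SQL.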

Power Cycle Script

Why isn’t the change you made in your migration.yml file working? Well, Drupal probably doesn’t know about it. By default, Drupal imports a module’s configuration files only when the module is initially enabled. The ‘active’ configuration lives in the database, so changes you make to your migration module’s YML files are not registered by Drupal.

To overcome this, you have to run a configuration import in Drush. It looks like this:

drush cim --partial --source=modules/custom/example_migration/config/install/

Of course, typing that out every time you make a small change in your config is a hassle, so script it! A good migration script will stop the migration, reset it, reimport the configuration, roll the data import back, and run it again. Having a script like that will save you the hassle of typing the same five drush commands over and over, and possibly forgetting to import config or reset your migration. You can find ours in our D8 Examples repository. Run it from the command line and relax, secure in the knowledge that your migrations are using the most current configuration. [1]
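In case that repository isn't handy, here's a rough sketch of what such a script can look like. It is not a copy of ours. It assumes the migrate_tools module is installed and supplies the mst, mrs, mr, and mim Drush aliases (stop, reset status, rollback, import), and the config path and migration ID are placeholders for your own.

```shell
#!/usr/bin/env bash
# Sketch of a "power cycle" helper for one migration.
# Assumptions: migrate_tools provides the mst/mrs/mr/mim aliases, and
# CONFIG_DIR points at your migration module's config/install folder.
DRUSH="${DRUSH:-drush}"
CONFIG_DIR="${CONFIG_DIR:-modules/custom/example_migration/config/install/}"

power_cycle() {
  local id="$1"
  # Stop and reset first, so a stuck status can't block the rerun.
  # These are allowed to fail: an idle migration has nothing to stop.
  "$DRUSH" mst "$id" || true
  "$DRUSH" mrs "$id" || true
  # Re-read the YML files; bail out if the import itself fails.
  "$DRUSH" cim -y --partial --source="$CONFIG_DIR" || return 1
  # Roll back previously imported content, then import from scratch.
  "$DRUSH" mr "$id" || return 1
  "$DRUSH" mim "$id"
}

# Only run when a migration ID is passed on the command line.
if [ "$#" -gt 0 ]; then
  power_cycle "$1"
fi
```

Saved as, say, power-cycle.sh, you would run it as ./power-cycle.sh example_xml_articles from the Drupal root. The two || return 1 guards are a judgment call: there's not much point rolling back and reimporting if the config import just blew up.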

Adding Line Numbers in YML Files

This tip is a little obscure, but useful. When you are migrating from an XML source, there are a whole bunch of things you have to specify in your source config. Notably, you have to specify the fields you will be using from the XML. The details of how this works are spelled out in a previous blog post, Migrating to Drupal From Alternate Sources, linked at the start of this article.

example_xml_migrate/config/install/migrate_plus.migration.example_xml_articles.yml

id: example_xml_articles
label: 'Import articles'
status: true
source:
  plugin: url
  data_fetcher_plugin: http
  urls: 'https://www.phase2technology.com/ideas/rss.xml'
  data_parser_plugin: simple_xml
  item_selector: /rss/channel/item
  fields:
    -
      name: guid
      label: GUID
      selector: guid
    -
      name: title
      label: Title
      selector: title
    -
      name: pub_date
      label: 'Publication date'
      selector: pubDate

The migrate_plus module gives us the ability to create migration groups, which allow us to consolidate configuration. That’s covered in detail in our blog post Drupal 8 Migrations: Taxonomy and Nodes, also linked above.

These two ideas combine pretty nicely: you can in fact call out common fields in your XML by putting them in a migration group, and then use those fields in multiple migrations.

example_xml_migrate/config/install/migrate_plus.migration_group.example_xml_group.yml

id: example_xml_group
label: General Content Imports
description: Common configuration for node migrations from XML.
source_type: XML File
shared_configuration:
  source:
    plugin: url
    data_fetcher_plugin: http
    urls: 'https://www.phase2technology.com/ideas/rss.xml'
    data_parser_plugin: simple_xml
    item_selector: /rss/channel/item
    fields:
      1:
        name: guid
        label: GUID
        selector: guid
      2:
        name: title
        label: Title
        selector: title
      3:
        name: pub_date
        label: 'Publication date'
        selector: pubDate

However—there’s a catch. Note that, unlike any other YML file shown in this series so far, the fields here have numerical array keys. This is because, if you specify a fields section in your individual migrations as well, and both arrays are keyed with the normal -, the migration fields section will completely override the group fields section. Keying them by number in both YML files allows them to be additive. Just make sure that your group and migration keys don’t collide.

Weirdly, the process section of migration config seems to be additive already; you can specify process plugins in both the group and the migration. The migration will only override duplicates on the field level, not the whole shebang.

XDebug and Figuring Out Failures/Error Messages

This is the big one, the lynchpin of debugging a migration. I’m not going to tell you how to set up XDebug with your IDE and dev environment. Let’s face it, there are a berjillion different dev environments, and every one of them is set up differently. Fortunately, there seem to be two berjillion tutorials on making XDebug work with whatever your dev setup is. So, go figure out how to make that part of things work; then you can start looking for issues in the code. [2]

OK, got that part? Good.

A good IDE will allow you to set “breakpoints”. Wikipedia defines them thus:

...a breakpoint is an intentional stopping or pausing place in a program, put in place for debugging purposes.

When XDebug is enabled, and the IDE is ‘listening’ to your server, the execution of the code will halt at the breakpoint(s), and you should get some tools to examine the state of variables in that moment. In this case, we’ll be setting breakpoints in a few key files in the migration process.

Migrate Executable

First up, the main executable file in migrate module: core/modules/migrate/src/MigrateExecutable.php. This class file has the massively important import() method.

The import() method is the spider in the center of the web of a migration. It calls tons of other methods as it checks requirements, gets the source data, gets the destination configuration, and loops through the data to create & save new content in the target environment. [3]

  • Line 184 calls getSource(). This method attempts to retrieve the data you’re planning on migrating. Setting a breakpoint here will allow you to dig into the retrieval process.

    • The getSource() method invokes the source plugin you’ve specified in your migration and migration_group YML file. See below for more detail on Source Plugins.

    • If Drupal gets through to line 197 without throwing an exception, and $source has a value, then Drupal is (probably) successfully retrieving the data. You can use your IDE to examine the validity of $source, just to be sure.

  • Line 198 is the start of where you’re most likely to encounter errors. The while loop defined here cycles through all the data from the source, and runs processRow() on each $row of data (Line 203). This method calls all the process plugins defined in your migration and migration groups.

    • When you use your IDE to step through processRow(), you will quickly find yourself in the individual process plugins (Line 368). These are the code files that do actual data manipulation. There’s a bunch defined in core, more in migrate_plus, and you can also create your own. If you determine that an issue is happening in one of those specifically, you should probably just put a breakpoint there; it’ll save you a lot of clicking.

  • Finally, line 226 calls $destination->import(), which is where the data is actually saved to the destination environment. Usually, if you’ve gotten to this point, saving is smooth sailing, but if the problems aren’t occurring in the process section, this is a likely next bet.

    • This line of code will lead you to the Destination Plugin; see below for more details.
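Since those line numbers only hold for one version of core, it can be handy to look the calls up by name instead. Here's a small sketch that does that with grep. It assumes you run it from the Drupal root of a standard core layout; locate_breakpoints is a made-up helper name.

```shell
#!/usr/bin/env bash
# Hypothetical helper: print the line numbers of the calls discussed above,
# so breakpoints can be placed regardless of the core version.
# Assumption: run from the Drupal root of a standard core checkout.
locate_breakpoints() {
  local file="${1:-core/modules/migrate/src/MigrateExecutable.php}"
  # -n prints line numbers; -E allows alternation between the three calls.
  grep -nE -e 'getSource\(\)|processRow\(|destination->import\(' "$file"
}

# Usage:
#   locate_breakpoints
```

Expect a few extra hits, because the pattern also matches the places where getSource() and processRow() are defined, not just where import() calls them. Those definitions are reasonable breakpoint spots too.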

MigrateExecutable.php also defines a bunch of exception error messages. This is the source of many of the messages seen in migrate_message_ tables, as well as command line errors when running migrations with Drush. The messages can also be a good way to figure out where to set breakpoints: track down the error message, then backtrack to the try statement that’s associated with that exception’s catch.

When you’re searching through the code, bear in mind that Drupal does a lot of string substitution in error messages, like so: 'Migration @id did not meet the requirements. @message @requirements'. Make sure you edit your search terms to exclude things that are specific to your situation, like the migration @id.
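As a sketch of that kind of search, the helper below greps the migrate module for just the fixed part of a message, leaving out the @placeholders and anything specific to your migration. The default path assumes you're in the Drupal root, and find_message_source is an illustrative name.

```shell
#!/usr/bin/env bash
# Hypothetical helper: search PHP files for the fixed part of an error message.
# Assumption: run from the Drupal root; pass another directory (for example
# a contrib module's folder) as the second argument to search there instead.
find_message_source() {
  local fragment="$1"
  local dir="${2:-core/modules/migrate/src}"
  # -r recurses, -n shows line numbers, -F treats the fragment literally
  # so characters like "." and "@" are not read as regex syntax.
  grep -rnF --include='*.php' -- "$fragment" "$dir"
}

# Usage:
#   find_message_source 'did not meet the requirements'
```

Each hit comes back as file:line:text, which is exactly what you need to go set a breakpoint.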

Source Plugins

Drupal core defines a lot of source plugins—one for pretty much every entity type present in Drupal 6 and 7, in fact, plus revisions and translations. And, confusingly, these files are not stored in the migrate or migrate_drupal modules. Source plugins are stored in the folder of the module that defines the entity type. For example, the source plugins for node entities are at drupal/core/modules/node/src/Plugin/migrate/source.
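If you'd rather not guess which module owns the plugin you're after, you can list every source plugin folder in one go. This is a minimal sketch, assuming you run it from the Drupal root (so core lives at core/modules); list_source_plugin_dirs is just a name picked for the example.

```shell
#!/usr/bin/env bash
# Hypothetical helper: list each module folder that holds migrate source plugins.
# Assumption: run from the Drupal root; pass modules/contrib (or similar)
# as the first argument to look outside of core.
list_source_plugin_dirs() {
  local root="${1:-core/modules}"
  # Match only the "source" directory itself, not subfolders beneath it.
  find "$root" -type d -path '*/src/Plugin/migrate/source' | sort
}

# Usage:
#   list_source_plugin_dirs
```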

Fortunately, Drupal 8 is object oriented, which means that each of these plugins will have the same base set of methods in them. The most important is prepareRow().

  • The prepareRow() method is responsible for loading and preparing each row of data for the migration. For example, the D7 nodes plugin gets the baseline node values and then adds in any Field API data associated with that node.

  • Every Source plugin will have this method. In the end, they are all responsible for returning an array of objects in a uniform format that the process plugins will understand. How they do this will, of course, vary based on the type of source data, but their output at the end should be effectively identical.

  • Generally speaking, if you are having issues with your source data being weird, the problem probably isn’t in the source plugin. It’s more likely that you are somehow specifying things incorrectly in your migration or migration_group YML file.

The migrate_plus module also provides a URL source plugin, which is used for XML, JSON, and RSS imports. It’s substantially more abstracted than the core DB-based entity plugins. In addition to the URL source, it makes use of data fetchers, which grab the data from either a file or an http request, and data parsers, which are responsible for reading and understanding the format of the data. They do not directly invoke the prepareRow() method; instead, your debugging will likely need to poke into the data parsers.

Destination Plugins

The code that does the work of formatting and saving entities is substantially more abstracted than the source plugins, because all entity types in Drupal 8 are structurally the same. This particular functionality is pretty battle-hardened, so it’s unlikely that your issues will be here. That said, if you do have need of debugging it, start with drupal/core/modules/migrate/src/Plugin/migrate/destination/Entity.php.

Still Not Working? Have Additional Ideas?

Migration’s a pretty involved process, with a lot of moving parts. If you’ve tried all of this, and nothing is making it any better, well, it might be time to seek help. The #migration channel on Drupal Slack is a great place to start. I can be found there as @srjosh.

If you’ve discovered a legitimate bug in the core migrate code, you’ll want to peruse the issue queue. The migrate_plus module has its own issue queue, as well.

If you’re interested in the latest on the efforts to stabilize and improve migrate in Drupal, the Migration Initiative is the group responsible. They can be found at @MigrateDrupal.

If you have a tip, trick, or snippet that just plain makes your migration life easier, please drop it in the comments below. Happy migrating!

 

  1. It is also possible to store the migration yml files in module/migrations, instead of module/config/install. Additionally, the naming convention is simplified: it's just migration_id.yml, instead of migrate_plus.migration.migration_id.yml. This allows migration configurations to be reimported with only a cache clear, instead of running a config import. However, migrate groups from the migrate_plus module have not caught up with this, meaning that you still have to put them in module/config/install, and import them with a config import. Which you choose is your call, but there’s a lot of value in having a consistent workflow for both migrations and migration groups.
  2. Here at Phase2, we’ve standardized on using Docksal for our dev setups, and a lot of us use PHPStorm. The Docksal docs for integrating the two are great.
  3. Please note that all line numbers reference Drupal version 8.6.x; line numbers from other versions can and will vary.