Rewriting History with Git

Brian McMurray, Software Architect
#Development | Posted

Recently, while working on a client project, I needed to split some code out of a subdirectory in one Git repository into its own separate repository.

In particular, we had built a Drupal site with some custom modules included in the project's repository. We now wanted to separate those modules into their own repositories so that other sites within the organization could use the same code.

Just copying and pasting would have been a quick way to accomplish the task, but I really wanted to be able to preserve the commit history of the code even after it was separated.

Luckily there are some pretty powerful features within Git that allow you to literally rewrite history when it comes to performing tasks like this.

In this example, let's say the project's repository was structured something like this:

  .
    src/
      modules/
        my_custom_module/
          my_custom_module.module
        another_module/

All that we want are the contents of src/modules/my_custom_module/ in our new repository because we're going to use Drush Make to do a git checkout to include the module. We want our final repository structure to look like this:

  .
    my_custom_module.module

Git has a great command called filter-branch which allows us to pull this off pretty easily!

Here's what I did:

First, I prepared a new repository for the module code:

[sh]mkdir my_custom_module
cd my_custom_module
git init[/sh]

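Next, I went to my project's repository, where the code I wanted to separate was located, and ran the commands below. Because filter-branch rewrites the current branch in place, it is safest to run them in a fresh clone of the project rather than in your main working copy.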

[sh]# Use filter-branch's subdirectory filter to reduce the history to only
# commits that affected this directory path.
# This also takes care of rewriting the history so that
# files within this subdirectory path will now appear at the
# repository root.

git filter-branch --subdirectory-filter src/modules/my_custom_module/

# Create a new temporary branch in the project's repository to
# push to our new module-specific repository, and let's go ahead
# and use Drupal branch conventions for the name.

git branch 7.x-1.x

# Add our new module-specific repository as a remote to our project repository.

git remote add my_custom_module /Path/To/my_custom_module/repository

# Push the filtered history and code to the new repository.

git push my_custom_module 7.x-1.x[/sh]

So at this point, we have filtered the code we wanted out of the project's repository, made a branch of just that filtered code, and pushed it to our new module-specific repository. You can verify your code arrived by going to the module-specific repository, running git checkout 7.x-1.x (the branch you pushed to that repository), and checking the git log.
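If you would like to rehearse the split before touching a real project, the steps above can be sketched in a throwaway sandbox. Everything here is an assumption made for illustration: the temporary directory, the demo identity, the file contents, and the three sample commits are hypothetical, and the filtering is done in a scratch clone as a precaution rather than in the project repository itself.

```shell
#!/bin/sh
# Sandbox walk-through of the split; every path and name here is hypothetical.
set -eu

# Newer Git prints a warning and pauses before filter-branch runs; this skips it.
export FILTER_BRANCH_SQUELCH_WARNING=1
# Throwaway identity so the demo commits work without any global Git config.
export GIT_AUTHOR_NAME="Demo" GIT_AUTHOR_EMAIL="demo@example.com"
export GIT_COMMITTER_NAME="Demo" GIT_COMMITTER_EMAIL="demo@example.com"

sandbox=$(mktemp -d)

# 1. A stand-in "project" repository with two modules and three commits.
git init -q "$sandbox/project"
cd "$sandbox/project"
mkdir -p src/modules/my_custom_module src/modules/another_module
echo "version 1" > src/modules/my_custom_module/my_custom_module.module
git add .
git commit -q -m "Add my_custom_module"
echo "other" > src/modules/another_module/another_module.module
git add .
git commit -q -m "Add another_module"
echo "version 2" > src/modules/my_custom_module/my_custom_module.module
git commit -q -a -m "Update my_custom_module"

# 2. The empty repository that will receive the module.
git init -q "$sandbox/my_custom_module"

# 3. Filter a scratch clone so the stand-in project keeps its full history.
git clone -q "$sandbox/project" "$sandbox/scratch"
cd "$sandbox/scratch"
git filter-branch --subdirectory-filter src/modules/my_custom_module/ >/dev/null 2>&1
git branch 7.x-1.x
git remote add my_custom_module "$sandbox/my_custom_module"
git push -q my_custom_module 7.x-1.x

# 4. Inspect the result from inside the new repository.
cd "$sandbox/my_custom_module"
git checkout -q 7.x-1.x
split_commits=$(git rev-list --count HEAD)
split_files=$(git ls-files)
echo "commits in new repository: $split_commits"
echo "files at repository root: $split_files"
git log --oneline
```

In this sandbox the new repository should end up with only the two commits that touched my_custom_module, and my_custom_module.module should sit at the repository root. The commit that only added another_module is dropped by the subdirectory filter.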

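Now that you have separated this code into its own repository, let's say you want to remove every trace of it from your project's repository. You can use git filter-branch to do that, too. Make sure you run it against the project's original, unfiltered history, not against the branch you just reduced to the module's directory.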

[sh]# Use git filter-branch's index-filter to rewrite our commit history and
# remove our now-separated module's former directory from every commit.

git filter-branch -f --index-filter 'git rm -r --cached --ignore-unmatch src/modules/my_custom_module' HEAD[/sh]

And there we have it: the commit history of our separated module still exists in its own module-specific repository, but we've rewritten history in our project repository so that it looks as if that module never existed there.
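The removal step can be rehearsed the same way. This is a minimal sketch under stated assumptions: the sandbox directory, the demo identity, and the two sample commits are hypothetical, and the --prune-empty flag is an addition that is not part of the command above. It drops commits that become empty once the module's files are gone; without it, those commits stay in the log with no changes in them.

```shell
#!/bin/sh
# Sandbox demo of removing a directory from history; all names are hypothetical.
set -eu

# Newer Git prints a warning and pauses before filter-branch runs; this skips it.
export FILTER_BRANCH_SQUELCH_WARNING=1
# Throwaway identity so the demo commits work without any global Git config.
export GIT_AUTHOR_NAME="Demo" GIT_AUTHOR_EMAIL="demo@example.com"
export GIT_COMMITTER_NAME="Demo" GIT_COMMITTER_EMAIL="demo@example.com"

sandbox=$(mktemp -d)
git init -q "$sandbox/project"
cd "$sandbox/project"

# Commit 1 adds both modules; commit 2 touches only my_custom_module.
mkdir -p src/modules/my_custom_module src/modules/another_module
echo "version 1" > src/modules/my_custom_module/my_custom_module.module
echo "other" > src/modules/another_module/another_module.module
git add .
git commit -q -m "Add both modules"
echo "version 2" > src/modules/my_custom_module/my_custom_module.module
git commit -q -a -m "Update my_custom_module"

# Rewrite every commit reachable from HEAD without the module's directory.
# --prune-empty (an addition to the original command) discards commits that
# no longer change anything after the removal.
git filter-branch -f --prune-empty --index-filter \
  'git rm -r --cached --ignore-unmatch src/modules/my_custom_module' HEAD \
  >/dev/null 2>&1

remaining_files=$(git ls-files)
remaining_commits=$(git rev-list --count HEAD)
commits_touching_module=$(git log --oneline -- src/modules/my_custom_module | wc -l | tr -d ' ')

echo "files left: $remaining_files"
echo "commits left: $remaining_commits"
echo "commits still touching the module: $commits_touching_module"
```

Note that filter-branch keeps a backup of the pre-rewrite branch under refs/original/, so the old commits are still present in the repository until that backup is deleted and the objects are garbage-collected.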

Some important notes:

You're rewriting history when you're doing these operations and that can be dangerous and messy.

It can be particularly challenging because of the distributed nature of Git: your remotes will have to be force-updated, and you will have to contact anyone who might have cloned your repository to alert them to the changes. Consider it just as hard as convincing multiple real-life historians, all keeping their own copies of a timeline, to suddenly update their copies because you've revised yours. It's tricky. Check out Chris Johnson's blog about digging into more Git!
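To make the force-update and the follow-up for collaborators a little more concrete, here is a hedged sketch using a local bare repository as the shared remote. The repository names, the demo identity, and the single sample commit are hypothetical. To keep the example short, git commit --amend stands in for the filter-branch rewrite, and --force-with-lease is one possible choice for the forced push: it refuses to overwrite the remote if someone else has pushed since your last fetch.

```shell
#!/bin/sh
# Sandbox demo of publishing rewritten history; all names are hypothetical.
set -eu

# Throwaway identity so the demo commits work without any global Git config.
export GIT_AUTHOR_NAME="Demo" GIT_AUTHOR_EMAIL="demo@example.com"
export GIT_COMMITTER_NAME="Demo" GIT_COMMITTER_EMAIL="demo@example.com"

sandbox=$(mktemp -d)
git init -q --bare "$sandbox/origin.git"

# The author's clone publishes one commit.
git clone -q "$sandbox/origin.git" "$sandbox/author" 2>/dev/null
cd "$sandbox/author"
echo "one" > file.txt
git add file.txt
git commit -q -m "First commit"
branch=$(git symbolic-ref --short HEAD)
git push -q origin "$branch"

# A teammate clones before the rewrite happens.
git clone -q "$sandbox/origin.git" "$sandbox/teammate"

# The author rewrites history (amend stands in for filter-branch here).
cd "$sandbox/author"
git commit -q --amend -m "First commit (rewritten)"

# A plain push is refused because the remote branch is not an ancestor.
if git push -q origin "$branch" 2>/dev/null; then
  plain_push="accepted"
else
  plain_push="rejected"
fi
echo "plain push: $plain_push"

# The forced push replaces the remote branch with the rewritten history.
git push -q --force-with-lease origin "$branch" 2>/dev/null

# The teammate discards the old history and adopts the rewritten branch.
# reset --hard throws away local commits and changes, so this is only safe
# when there is nothing local worth keeping.
cd "$sandbox/teammate"
git fetch -q origin
git reset -q --hard "origin/$branch"

author_head=$(git -C "$sandbox/author" rev-parse HEAD)
teammate_head=$(git rev-parse HEAD)
echo "author:   $author_head"
echo "teammate: $teammate_head"
```

Collaborators with unpublished work of their own would need to carry it over onto the rewritten branch instead of resetting, which is exactly the coordination cost described above.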
