Digging into the internals of Git

Chris Johnson, VP of Engineering
#Development

While giving an introductory session on Git to some new coworkers, I was asked the following question: what happens to a tag if you delete the branch it came from? The context was deleting a release branch that has one or more tags on it, after merging its changes into a mainline branch to start the next release development cycle. I am a longtime user of version control systems but a relatively new user of Git, and this question exposed a gap in my knowledge of the underlying structures Git uses to do its work.

To answer this question, it's important to understand how Git tracks changes. Let's initialize a repository:

[sh]> mkdir test-repo
> cd test-repo
> git init .[/sh]

After doing so, poking around in the .git directory shows the following structure:

[sh]
.git/
|-- HEAD
|-- branches
|-- config
|-- description
|-- hooks
|   |-- applypatch-msg.sample
|   |-- commit-msg.sample
|   |-- post-commit.sample
|   |-- post-receive.sample
|   |-- post-update.sample
|   |-- pre-applypatch.sample
|   |-- pre-commit.sample
|   |-- pre-rebase.sample
|   |-- prepare-commit-msg.sample
|   |-- update.sample
|-- info
|   |-- exclude
|-- objects
|   |-- info
|   |-- pack
|-- refs
    |-- heads
    |-- tags
[/sh]
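Two entries in this listing matter for the rest of the post. HEAD is a one-line text file naming the branch you are on, and objects/ holds nothing yet apart from the empty info and pack subdirectories. A minimal sketch for checking both in a throwaway repository, assuming a POSIX shell and Git's default file-based ref storage. The branch name shown in HEAD depends on your Git version and its init.defaultBranch setting:

```shell
set -e
# Scratch directory so no real repository is touched
cd "$(mktemp -d)"
git init -q .

# HEAD is a symbolic reference: it names a branch, not a commit
cat .git/HEAD

# No object files exist yet, only the empty info/ and pack/ directories
find .git/objects
echo "object files: $(find .git/objects -type f | wc -l | tr -d ' ')"
```

The branch that HEAD names has no file under refs/heads yet. Git creates that file when the first commit is made.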

From there we add a test file and commit it.

[sh]
> echo "This is a test" > test.txt
> git add test.txt
> git commit -m "Initial commit"
[/sh]

Our .git directory now shows the following:

[sh]
.git/
|-- COMMIT_EDITMSG
|-- HEAD
|-- branches
|-- config
|-- description
|-- hooks
|   |-- applypatch-msg.sample
|   |-- commit-msg.sample
|   |-- post-commit.sample
|   |-- post-receive.sample
|   |-- post-update.sample
|   |-- pre-applypatch.sample
|   |-- pre-commit.sample
|   |-- pre-rebase.sample
|   |-- prepare-commit-msg.sample
|   |-- update.sample
|-- index
|-- info
|   |-- exclude
|-- logs
|   |-- HEAD
|   |-- refs
|       |-- heads
|           |-- master
|-- objects
|   |-- 05
|   |   |-- 27e6bd2d76b45e2933183f1b506c7ac49f5872
|   |-- 10
|   |   |-- 7d016f8fbd0ea7a6475b56e276ad313ee0f073
|   |-- c2
|   |   |-- 69d751b8e2fd0be0d0dc7a6437a4dce4ec0200
|   |-- info
|   |-- pack
|-- refs
    |-- heads
    |   |-- master
    |-- tags
[/sh]

That may be a surprising amount of change for one commit. At this point it's important to understand everything Git has stored, because it's fundamental to understanding how Git's branching works and what really happens when a branch is created and deleted.

There are now three files in our .git/objects tree, and git show lets us see what they represent: a blob holding the contents of test.txt, the commit we just made, and a tree object, which maps file names to blobs and can be thought of as a directory in Git's internal filesystem.

[sh]
> git show 0527e6
This is a test

> git show 107d01
commit 107d016f8fbd0ea7a6475b56e276ad313ee0f073
Author: Chris Johnson <cjohnson@.....>
Date: Fri Feb 3 05:03:46 2012 -0500

Initial commit

diff --git a/test.txt b/test.txt
new file mode 100644
index 0000000..0527e6b
--- /dev/null
+++ b/test.txt
@@ -0,0 +1 @@
+This is a test

> git show c269d7
tree c269d7

test.txt
[/sh]

Show isn't the best command for examining tree objects; ls-tree is much more informative.

[sh]
> git ls-tree c269d7
100644 blob 0527e6bd2d76b45e2933183f1b506c7ac49f5872 test.txt
[/sh]
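git show and git ls-tree are porcelain commands. The plumbing command git cat-file reports an object's type directly with -t. The object names themselves are not arbitrary. Each one is the SHA-1 hash of a short header (the object type, a space, the content length in bytes, and a NUL byte) followed by the content, and Git uses the first two hex digits as the directory name under .git/objects. Below is a minimal sketch that recreates the first commit in a throwaway repository and checks both points. It assumes a POSIX shell, sha1sum from GNU coreutils (shasum is the usual macOS equivalent), Git's default SHA-1 object format, and a placeholder user name and email:

```shell
set -e
cd "$(mktemp -d)"
git init -q .
# Placeholder identity (hypothetical) so the commit works in a clean environment
git config user.name "Example User"
git config user.email "user@example.com"

echo "This is a test" > test.txt
git add test.txt
git commit -q -m "Initial commit"

# Resolve the three objects the commit created
commit_id=$(git rev-parse HEAD)
tree_id=$(git rev-parse "HEAD^{tree}")
blob_id=$(git rev-parse HEAD:test.txt)

# cat-file -t prints each object's type
for id in "$commit_id" "$tree_id" "$blob_id"; do
    echo "$id $(git cat-file -t "$id")"
done

# Recompute the blob ID by hand: SHA-1 of "blob <size>\0<content>"
size=$(wc -c < test.txt | tr -d ' ')
manual_id=$( { printf 'blob %s\0' "$size"; cat test.txt; } | sha1sum | cut -d' ' -f1 )
echo "manual blob id: $manual_id"

# The blob is stored, zlib-compressed, as objects/<2 hex digits>/<38 hex digits>
blob_dir=$(echo "$blob_id" | cut -c1-2)
blob_file=$(echo "$blob_id" | cut -c3-)
ls -l ".git/objects/$blob_dir/$blob_file"
```

The commit ID cannot be reproduced the same way across machines, because the commit object includes the author, committer, and timestamps. The blob ID depends only on the file's contents.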

The default git show output also leaves out some information stored in our commit object that is useful to see:

[sh]
> git show --pretty=raw 107d01
commit 107d016f8fbd0ea7a6475b56e276ad313ee0f073
tree c269d751b8e2fd0be0d0dc7a6437a4dce4ec0200
author Chris Johnson <cjohnson@.....> 1328263426 -0500
committer Chris Johnson <cjohnson@.....> 1328263426 -0500

Initial commit

diff --git a/test.txt b/test.txt
new file mode 100644
index 0000000..0527e6b
--- /dev/null
+++ b/test.txt
@@ -0,0 +1 @@
+This is a test
[/sh]
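Even with --pretty=raw, git show appends a diff that Git computes on the fly by comparing trees. git cat-file -p prints the commit object exactly as it is stored: a tree line, the author and committer lines, a blank line, and the message. A root commit like this one has no parent line. A minimal sketch in a throwaway repository with a placeholder identity:

```shell
set -e
cd "$(mktemp -d)"
git init -q .
# Placeholder identity (hypothetical) so the commit works in a clean environment
git config user.name "Example User"
git config user.email "user@example.com"

echo "This is a test" > test.txt
git add test.txt
git commit -q -m "Initial commit"

# The stored commit object: no diff, just headers and the message
git cat-file -p HEAD

# The first header line names the tree object that snapshots the files
first_line=$(git cat-file -p HEAD | head -n 1)
echo "$first_line"
git rev-parse "HEAD^{tree}"

# The first commit in a repository records no parent
parents=$(git cat-file -p HEAD | grep -c '^parent ' || true)
echo "parent lines: $parents"
```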

After a second commit, examining the raw commit object starts to shed light on what a branch in Git really is:

[sh]
> echo "An update" >> test.txt
> git commit -m "Update to test.txt" test.txt
> git show --pretty=raw
commit b12a5158c941187daf127bf1f777d592f642fe47
tree f2b51d0ec8c6efb9430dc4183e7f782b2697fa44
parent 107d016f8fbd0ea7a6475b56e276ad313ee0f073
author Chris Johnson <cjohnson@.....> 1328265034 -0500
committer Chris Johnson <cjohnson@.....> 1328265034 -0500

Update to test.txt

diff --git a/test.txt b/test.txt
index 0527e6b..c81e6a6 100644
--- a/test.txt
+++ b/test.txt
@@ -1 +1,2 @@
 This is a test
+An update

> cat .git/refs/heads/master
b12a5158c941187daf127bf1f777d592f642fe47
[/sh]

So the branch is simply a file containing the ID of a particular commit object. That commit records the state of the filesystem through its pointer to a tree object, and it records its predecessor through its parent pointer. A branch, then, is a specific commit plus all of the commits reachable from it by following parent references. When you create a new branch, the only thing Git initially creates is a new reference to the same commit as the branch it was created from. Commits made on the new branch trace their ancestry back, through their parent pointers, to the chain of commits on that original branch.
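The claim that creating a branch only creates a reference is easy to check. Count the object files before and after git branch, then compare the new ref file with the commit HEAD points to. A minimal sketch, assuming a throwaway repository with a placeholder identity, a hypothetical branch name release, and Git's default file-based refs. After git gc or git pack-refs, refs can move into .git/packed-refs, so git rev-parse is the more reliable way to resolve them in general:

```shell
set -e
cd "$(mktemp -d)"
git init -q .
# Placeholder identity (hypothetical) so commits work in a clean environment
git config user.name "Example User"
git config user.email "user@example.com"

echo "This is a test" > test.txt
git add test.txt
git commit -q -m "Initial commit"
echo "An update" >> test.txt
git commit -q -m "Update to test.txt" test.txt

# Creating a branch adds no objects, only a ref file
before=$(find .git/objects -type f | wc -l | tr -d ' ')
git branch release
after=$(find .git/objects -type f | wc -l | tr -d ' ')
echo "object files before: $before, after: $after"

# The ref file holds the ID of the commit HEAD currently points to
cat .git/refs/heads/release
git rev-parse HEAD

# Following parent pointers from the branch tip lists its whole history
git rev-list release
```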

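The important question now becomes: what happens to the commit objects when a branch is deleted? The answer is that nothing happens to them until Git's garbage collection (git gc) runs, at which point orphaned commits are deleted. An orphaned commit is one that isn't reachable from any of the references Git has. Even then, gc applies grace periods. Commits still recorded in a reflog stay reachable until those entries expire (gc.reflogExpireUnreachable, 30 days by default), and unreachable loose objects newer than gc.pruneExpire (two weeks by default) are kept. What this means for the original question is illustrated by creating and examining a tag.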

[sh]
> git tag -a 2012-02-03 -m "Example tag"
> git show --pretty=raw 2012-02-03
tag 2012-02-03
Tagger: Chris Johnson <cjohnson@.....>

Example tag

commit b12a5158c941187daf127bf1f777d592f642fe47
tree f2b51d0ec8c6efb9430dc4183e7f782b2697fa44
parent 107d016f8fbd0ea7a6475b56e276ad313ee0f073
author Chris Johnson <cjohnson@.....> 1328265034 -0500
committer Chris Johnson <cjohnson@.....> 1328265034 -0500

Update to test.txt

diff --git a/test.txt b/test.txt
index 0527e6b..c81e6a6 100644
--- a/test.txt
+++ b/test.txt
@@ -1 +1,2 @@
 This is a test
+An update
[/sh]

Creating an annotated tag creates a tag object that points to the underlying commit, along with a reference under .git/refs/tags that points to the tag object. So even if the branch is deleted, Git still has a reference to the commit and, through its parent pointers, to all of the commits that led up to it. Nothing that had been on the branch is considered orphaned, and you can delete the branch with impunity if you like. In the original scenario, merging the release branch into the mainline also gives Git a path back to those commits, so a tag isn't necessary to preserve the commits of interest when deleting the branch. A merge commit lists the release branch's tip as one of its parents, and a fast-forward merge simply moves the mainline reference to that tip.
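The argument can be checked end to end. The sketch below creates two hypothetical branches off the mainline and tags the tip of one of them with an annotated tag like the one above. It then deletes both branches without merging them and forces garbage collection to run immediately, by expiring every reflog entry and pruning with --prune=now. Both steps skip the safety net gc normally keeps, so they belong only in a throwaway repository like this one. It assumes a POSIX shell and a placeholder identity:

```shell
set -e
cd "$(mktemp -d)"
git init -q .
# Placeholder identity (hypothetical) so commits work in a clean environment
git config user.name "Example User"
git config user.email "user@example.com"

echo "This is a test" > test.txt
git add test.txt
git commit -q -m "Initial commit"
mainline=$(git symbolic-ref --short HEAD)   # master or main, depending on config

# Hypothetical branch whose tip gets tagged before the branch is deleted
git checkout -q -b release
echo "An update" >> test.txt
git commit -q -am "Update to test.txt"
git tag -a 2012-02-03 -m "Example tag"
tagged=$(git rev-parse HEAD)

# Hypothetical branch that nothing else will refer to
git checkout -q "$mainline"
git checkout -q -b experiment
echo "Experimental change" >> test.txt
git commit -q -am "Experimental commit"
untagged=$(git rev-parse HEAD)

# Delete both branches without merging them
git checkout -q "$mainline"
git branch -q -D release experiment

# Deleting a branch removes only the ref; both commits still exist
git cat-file -t "$tagged"
git cat-file -t "$untagged"

# Force gc to act now: drop all reflog entries, then prune unreachable objects
git reflog expire --expire=now --all
git gc --quiet --prune=now

# The tag object still points at the release commit, keeping it reachable
git cat-file -t 2012-02-03
git rev-parse "2012-02-03^{commit}"
git cat-file -t "$tagged"

# The untagged commit was orphaned, so gc deleted it
git cat-file -e "$untagged" 2>/dev/null || echo "$untagged was pruned"
```

Merging release into the mainline before deleting it, instead of tagging it, would keep its commits reachable in the same way.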

For more information, you may want to check out the following resources:
http://book.git-scm.com/1_the_git_object_model.html
http://gitready.com/advanced/2009/01/17/restoring-lost-commits.html
