11 Jul 2011

Reset Demystified

One of the topics that I didn't cover in depth in the Pro Git book is the reset command. Most of the reason for this, honestly, is that I never strongly understood the command beyond the handful of specific use cases that I needed it for. I knew what the command did, but not really how it was designed to work.

Since then I have become more comfortable with the command, largely thanks to Mark Dominus's article re-phrasing the content of the man-page, which I always found very difficult to follow. After reading that explanation of the command, I now personally feel more comfortable using reset and enjoy trying to help others feel the same way.

This post assumes some basic understanding of how Git branching works. If you don't really know what HEAD and the Index are on a basic level, you might want to read chapters 2 and 3 of this book before reading this post.

The Three Trees of Git


The way I now like to think about reset and checkout is through the mental frame of Git being a content manager of three different trees. By 'tree' here I really mean "collection of files", not specifically the data structure. (Some Git developers will get a bit mad at me here, because there are a few cases where the Index doesn't exactly act like a tree, but for our purposes it is easier - forgive me).

Git as a system manages and manipulates three trees in its normal operation. Each of these is covered in the book, but let's review them.

Tree                     Role
The HEAD                 last commit snapshot, next parent
The Index                proposed next commit snapshot
The Working Directory    sandbox

The HEAD (last commit snapshot, next parent)

The HEAD in Git is the pointer to the current branch reference, which is in turn a pointer to the last commit you made or the last commit that was checked out into your working directory. That also means it will be the parent of the next commit you do. It's generally simplest to think of HEAD as the snapshot of your last commit.

In fact, it's pretty easy to see what the snapshot of your HEAD looks like. Here is an example of getting the actual directory listing and SHA checksums for each file in the HEAD snapshot:

$ cat .git/HEAD 
ref: refs/heads/master

$ cat .git/refs/heads/master 
e9a570524b63d2a2b3a7c3325acf5b89bbeb131e

$ git cat-file -p e9a570524b63d2a2b3a7c3325acf5b89bbeb131e
tree cfda3bf379e4f8dba8717dee55aab78aef7f4daf
author Scott Chacon  1301511835 -0700
committer Scott Chacon  1301511835 -0700

initial commit

$ git ls-tree -r cfda3bf379e4f8dba8717dee55aab78aef7f4daf
100644 blob a906cb2a4a904a152...   README
100644 blob 8f94139338f9404f2...   Rakefile
040000 tree 99f1a6d12cb4b6f19...   lib

The Index (next proposed commit snapshot)

The Index is your proposed next commit. Git populates it with a list of all the file contents that were last checked out into your working directory and what they looked like when they were originally checked out. It's not technically a tree structure but a flattened manifest, which for our purposes is close enough. When you run git commit, that command only looks at your Index by default, not at anything in your working directory. So it's simplest to think of the Index as the snapshot of your next commit.

$ git ls-files -s
100644 a906cb2a4a904a152e80877d4088654daad0c859 0	README
100644 8f94139338f9404f26296befa88755fc2598c289 0	Rakefile
100644 47c6340d6459e05787f644c2447d2595f5d3a54b 0	lib/simplegit.rb
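
To see that the two listings really line up, here is a minimal sketch that builds a throwaway repository and prints both right after a commit. The file names mirror the example above; the temporary directory, file contents and identity settings are placeholders made up for illustration.

```shell
# Throwaway repository; paths, contents and identity are illustrative placeholders.
repo=$(mktemp -d) && cd "$repo"
git init -q
git config user.name "Example User"
git config user.email "user@example.com"

mkdir lib
echo "readme" > README
echo "tasks" > Rakefile
echo "code" > lib/simplegit.rb
git add .
git commit -q -m "initial commit"

git ls-tree -r HEAD   # HEAD snapshot: mode, type, blob SHA, path
git ls-files -s       # Index: mode, blob SHA, stage number, path
```

Right after a commit the blob SHAs in both listings should be identical, which is just another way of saying that HEAD and the Index hold the same snapshot.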

The Working Directory (sandbox, scratch area)

Finally, you have your working directory. This is where the content of files is placed into actual files on your filesystem so they're easily edited by you. The Working Directory is your scratch space, used to easily modify file content.

$ tree
.
├── README
├── Rakefile
└── lib
    └── simplegit.rb

1 directory, 3 files

The Workflow

So, Git is all about recording snapshots of your project in successively better states by manipulating these three trees, or collections of contents of files.


Let's visualize this process. Say you go into a new directory with a single file in it. We'll call this V1 of the file and we'll indicate it in blue. Now we run git init, which will create a Git repository with a HEAD reference that points to an unborn branch (aka, nothing).


At this point, only the Working Directory tree has any content.

Now we want to commit this file, so we use git add to take content in our Working Directory and populate our Index with the updated content.


Then we run git commit to take what the Index looks like now and save it as a permanent snapshot pointed to by a commit, which HEAD is then updated to point at.


At this point, all three of the trees are the same. If we run git status now, we'll see no changes because they're all the same.

Now we want to make a change to that file and commit it. We will go through the same process. First we change the file in our working directory.


If we run git status right now we'll see the file in red as "changed but not updated" because that entry differs between our Index and our Working Directory. Next we run git add on it to stage it into our Index.


At this point if we run git status we will see the file in green under 'Changes to be Committed' because the Index and HEAD differ - that is, our proposed next commit is now different from our last commit. Those are the entries we will see as 'to be Committed'. Finally, we run git commit to finalize the commit.


Now git status will give us no output because all three trees are the same.
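
The whole cycle can be replayed in a scratch repository. This is only a sketch: the file name, contents and identity are assumptions of mine, and git status --short is used instead of the colored long form so the states are easy to read in plain text.

```shell
# Throwaway repository; file name, contents and identity are illustrative placeholders.
repo=$(mktemp -d) && cd "$repo"
git init -q
git config user.name "Example User"
git config user.email "user@example.com"

echo "v1" > file.txt
git add file.txt         # Working Directory -> Index
git commit -q -m "v1"    # Index -> new commit; the branch HEAD points to moves to it
git status --short       # prints nothing: all three trees match

echo "v2" > file.txt     # change only the Working Directory
git status --short       # " M file.txt": Index and Working Directory differ

git add file.txt         # stage the change
git status --short       # "M  file.txt": HEAD and Index differ

git commit -q -m "v2"
git status --short       # prints nothing again
```

In the short format the first column compares HEAD with the Index and the second column compares the Index with the Working Directory, which maps directly onto the three trees.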

Switching branches or cloning goes through a similar process. When you check out a branch, it changes HEAD to point to the new commit, populates your Index with the snapshot of that commit, then checks out the contents of the files in your Index into your Working Directory.

The Role of Reset

So the reset command makes more sense when viewed in this context. It directly manipulates these three trees in a simple and predictable way. It does up to three basic operations.

Step 1: Moving HEAD (killing me --softly)

The first thing reset will do is move what HEAD points to. Unlike checkout, it does not change which branch HEAD points to; it directly changes the SHA of the branch reference itself. This means if HEAD is pointing to the 'master' branch, running git reset 9e5e64a will first of all make 'master' point to 9e5e64a before it does anything else.


No matter what form of reset with a commit you invoke, this is the first thing it will always try to do. If you add the flag --soft, this is the only thing it will do. With --soft, reset will simply stop there.

Now take a second to look at that diagram and realize what it did. It essentially undid the last commit you made. When you run git commit, Git will create a new commit and move the branch that HEAD points to up to it. When you reset back to HEAD~ (the parent of HEAD), you are moving the branch back to where it was without changing the Index (staging area) or Working Directory. You could now do a bit more work and commit again to accomplish basically what git commit --amend would have done.
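
Here is a small sketch of that first step in isolation, run in a scratch repository. The file name, commit messages and identity are placeholders I picked for illustration.

```shell
# Throwaway repository; names and identity are illustrative placeholders.
repo=$(mktemp -d) && cd "$repo"
git init -q
git config user.name "Example User"
git config user.email "user@example.com"
echo "v1" > file.txt && git add file.txt && git commit -q -m "v1"
echo "v2" > file.txt && git add file.txt && git commit -q -m "v2"

before=$(git rev-parse HEAD~)   # the commit we are about to move back to
git reset --soft HEAD~          # step 1 only: move the branch HEAD points to

git rev-parse HEAD              # now the same SHA as $before
git status --short              # "M  file.txt": the Index still holds v2, so it shows as staged
cat file.txt                    # "v2": the Working Directory is untouched
```

Running git commit at this point would record v2 again on top of v1, which is the "undo and redo the last commit" effect described above.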

Step 2: Updating the Index (having --mixed feelings)

Note that if you run git status now you'll see in green the difference between the Index and what the new HEAD is.

The next thing reset will do is to update the Index with the contents of whatever tree HEAD now points to so they're the same.


If you specify the --mixed option, reset will stop at this point. This is also the default, so if you specify no option at all, this is where the command will stop.

Now take another second to look at THAT diagram and realize what it did. It still undid your last commit, but also unstaged everything. You rolled back to before you ran all your git adds AND git commit.
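
The same scratch setup shows where the default (--mixed) form stops. As before, the file name and identity are assumptions for the sake of the example.

```shell
# Throwaway repository; names and identity are illustrative placeholders.
repo=$(mktemp -d) && cd "$repo"
git init -q
git config user.name "Example User"
git config user.email "user@example.com"
echo "v1" > file.txt && git add file.txt && git commit -q -m "v1"
echo "v2" > file.txt && git add file.txt && git commit -q -m "v2"

git reset -q HEAD~     # no option given, so this is --mixed: steps 1 and 2

git show :file.txt     # "v1": the Index now matches the new HEAD
git status --short     # " M file.txt": the change is back to being unstaged
cat file.txt           # "v2": the Working Directory is still untouched
```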

Step 3: Updating the Working Directory (math is --hard, let's go shopping)

The third thing that reset will do is to then make the Working Directory look like the Index. If you use the --hard option, it will continue to this stage.


Finally, take yet a third second to look at that diagram and think about what happened. You undid your last commit, all the git adds, and all the work you did in your working directory.

It's important to note at this point that this is the only way to make the reset command dangerous (ie: not working directory safe). Any other invocation of reset can be pretty easily undone, but the --hard option cannot, since it overwrites (without checking) any files in the Working Directory. In this particular case, we still have the v3 version of our file in a commit in our Git DB that we could get back by looking at our reflog, but if we had not committed it, Git still would have overwritten the file.
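
A sketch of both halves of that warning, assuming a scratch repository with two commits (file name and identity are placeholders): the hard reset rewrites the file on disk, and the committed version can be pulled back through the reflog.

```shell
# Throwaway repository; names and identity are illustrative placeholders.
repo=$(mktemp -d) && cd "$repo"
git init -q
git config user.name "Example User"
git config user.email "user@example.com"
echo "v1" > file.txt && git add file.txt && git commit -q -m "v1"
echo "v2" > file.txt && git add file.txt && git commit -q -m "v2"

lost=$(git rev-parse HEAD)      # remember the commit we are about to walk away from
git reset -q --hard HEAD~       # all three steps: branch, Index and Working Directory
cat file.txt                    # "v1": the file on disk was overwritten

git reflog                      # the v2 commit is still listed as HEAD@{1}
git reset -q --hard "HEAD@{1}"  # jump back to where HEAD was before the reset
cat file.txt                    # "v2"
```

This only works because v2 had been committed. An edit that existed solely in the Working Directory would not appear in the reflog and would be gone.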

Overview

That is basically it. The reset command overwrites these three trees in a specific order, stopping when you tell it to.

  • #1) Move whatever branch HEAD points to (stop if --soft)
  • #2) THEN, make the Index look like that (stop here unless --hard)
  • #3) THEN, make the Working Directory look like that

There are also --merge and --keep options, but I would rather keep things simpler for now - that will be for another article.

Boom. You are now a reset master.

Reset with a Path

Well, I lied. That's not actually all. If you specify a path, reset will skip the first step and just do the other ones but limited to a specific file or set of files. This actually sort of makes sense - if the first step is to move a pointer to a different commit, you can't make it point to part of a commit, so it simply doesn't do that part. However, you can use reset to update part of the Index or the Working Directory with previously committed content this way.

So, assume we run git reset file.txt. Since you did not specify a commit SHA or a branch that points to one, and you provided no reset option, Git treats this as shorthand for git reset --mixed HEAD file.txt, which will:

  • #1) (skipped) Move whatever branch HEAD points to
  • #2) Make the Index look like HEAD for that file, and stop there

So it essentially just takes whatever file.txt looks like in HEAD and puts that in the Index.


So what does that do in a practical sense? Well, it unstages the file. If we look at the diagram for that command vs what git add does, we can see that it is simply the opposite. This is why the output of the git status command suggests that you run this to unstage a file.
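
As a quick sketch of add and reset undoing each other, assuming a scratch repository with one committed file (the names are placeholders):

```shell
# Throwaway repository; names and identity are illustrative placeholders.
repo=$(mktemp -d) && cd "$repo"
git init -q
git config user.name "Example User"
git config user.email "user@example.com"
echo "v1" > file.txt && git add file.txt && git commit -q -m "v1"

echo "v2" > file.txt
git add file.txt         # Working Directory -> Index
git status --short       # "M  file.txt": staged

git reset -q file.txt    # shorthand for: git reset --mixed HEAD file.txt
git status --short       # " M file.txt": unstaged again
cat file.txt             # "v2": the edit itself is still on disk
```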


We could just as easily not let Git assume we meant "pull the data from HEAD" by specifying a specific commit to pull that file version from to populate our Index by running something like git reset eb43bf file.txt.


So what does that mean? That functionally does the same thing as if we had reverted the content of the file to v1, run git add on it, then reverted it back to v3 again. If we run git commit, it will record a change that reverts that file back to v1, even though we never actually had it in our Working Directory again.
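
Here is a sketch of that situation in a scratch repository with three versions of one file. The $first variable stands in for a SHA like eb43bf above; the file name and identity are placeholders.

```shell
# Throwaway repository; names and identity are illustrative placeholders.
repo=$(mktemp -d) && cd "$repo"
git init -q
git config user.name "Example User"
git config user.email "user@example.com"
echo "v1" > file.txt && git add file.txt && git commit -q -m "v1"
first=$(git rev-parse HEAD)            # plays the role of the v1 commit SHA
echo "v2" > file.txt && git add file.txt && git commit -q -m "v2"
echo "v3" > file.txt && git add file.txt && git commit -q -m "v3"

git reset -q "$first" -- file.txt      # copy file.txt from the v1 commit into the Index only

git show :file.txt                     # "v1": what the next commit would record
cat file.txt                           # "v3": the Working Directory never changed
git status --short                     # "MM file.txt": staged change to v1, unstaged change back to v3
```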

It's also pretty interesting to note that like git add --patch, the reset command will accept a --patch option to unstage content on a hunk-by-hunk basis. So you can selectively unstage or revert content.

A fun example

I may use the term "fun" here a bit loosely, but if this doesn't sound like fun to you, you may drink while doing it. Let's look at how to do something interesting with this newfound power - squashing commits.

If you have this history and you're about to push and you want to squash down the last N commits you've done into one awesome commit that makes you look really smart (vs a bunch of commits with messages like "oops.", "WIP" and "forgot this file") you can use reset to quickly and easily do that (as opposed to using git rebase -i).

So, let's take a slightly more complex example. Let's say you have a project where the first commit has one file, the second commit added a new file and changed the first, and the third commit changed the first file again. The second commit was a work in progress and you want to squash it down.


You can run git reset --soft HEAD~2 to move the HEAD branch back to an older commit (the first commit you want to keep):


And then simply run git commit again:


Now you can see that your reachable history, the history you would push, looks like you had one commit with the one file, then a second that both added the new file and modified the first to its final state.
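
The squash can be tried end to end in a scratch repository. This is a sketch under my own assumptions about file names and commit messages; the shape of the history follows the example above.

```shell
# Throwaway repository; names, messages and identity are illustrative placeholders.
repo=$(mktemp -d) && cd "$repo"
git init -q
git config user.name "Example User"
git config user.email "user@example.com"
echo "v1" > file-a.txt && git add . && git commit -q -m "first commit"
echo "v2" > file-a.txt && echo "new" > file-b.txt && git add . && git commit -q -m "WIP"
echo "v3" > file-a.txt && git add . && git commit -q -m "forgot this file"
tip_tree=$(git rev-parse 'HEAD^{tree}')   # snapshot we want to end up with

git reset --soft HEAD~2                   # move the branch back to the first commit
git commit -q -m "add file-b and update file-a"

git log --oneline                         # two commits instead of three
git rev-parse 'HEAD^{tree}'               # same tree SHA as $tip_tree
```

Because --soft leaves the Index alone, the new commit records exactly the snapshot the third commit had, just with a shorter history behind it.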

Check it out

Finally, some of you may wonder what the difference between checkout and reset is. Well, like reset, checkout manipulates the three trees and it is a bit different depending on whether you give the command a file path or not. So, let's look at both examples separately.

git checkout [branch]

Running git checkout [branch] is pretty similar to running git reset --hard [branch] in that it updates all three trees for you to look like [branch], but there are two important differences.

First, unlike reset --hard, checkout is working directory safe in this invocation. It will check to make sure it's not blowing away files that have changes to them. Actually, this is a subtle difference, because it will update all of the working directory except the files you've modified if it can - it will do a trivial merge between what you're checking out and what's already there. In this case, reset --hard will simply replace everything across the board without checking.

The second important difference is how it updates HEAD. Where reset will move the branch that HEAD points to, checkout will move HEAD itself to point to another branch.

For instance, if we have two branches, 'master' and 'develop' pointing at different commits, and we're currently on 'develop' (so HEAD points to it) and we run git reset master, 'develop' itself will now point to the same commit that 'master' does.

On the other hand, if we instead run git checkout master, 'develop' will not move, HEAD itself will. HEAD will now point to 'master'. So, in both cases we're moving HEAD to point to commit A, but how we do so is very different. reset will move the branch HEAD points to, checkout moves HEAD itself to point to another branch.
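
A sketch of the two behaviors side by side, assuming a scratch repository where 'master' and 'develop' have diverged (file names and identity are placeholders):

```shell
# Throwaway repository; names and identity are illustrative placeholders.
repo=$(mktemp -d) && cd "$repo"
git init -q
git symbolic-ref HEAD refs/heads/master   # pin the first branch name regardless of local defaults
git config user.name "Example User"
git config user.email "user@example.com"
echo "base" > file.txt && git add . && git commit -q -m "base"
git branch develop
echo "m" > file.txt && git add . && git commit -q -m "master work"
git checkout -q develop
echo "d" > other.txt && git add . && git commit -q -m "develop work"
dev_before=$(git rev-parse develop)

git checkout -q master           # checkout moves HEAD itself
git symbolic-ref --short HEAD    # "master"
git rev-parse develop            # unchanged, still $dev_before

git checkout -q develop
git reset -q master              # reset moves the branch HEAD points to
git symbolic-ref --short HEAD    # still "develop"
git rev-parse develop            # now the same commit as master
```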


git checkout [branch] file

The other way to run checkout is with a file path, which, like reset, does not move HEAD. It is just like git reset [branch] file in that it updates the index with that file at that commit, but it also overwrites the file in the working directory. Think of it like git reset --hard [branch] file - it would be exactly the same thing: it is also not working directory safe and it also does not move HEAD. The only difference is that reset with a file name will not accept --hard, so you can't actually run that.
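
To make the "not working directory safe" part concrete, here is a sketch in a scratch repository with an uncommitted edit that gets overwritten. File name, contents and identity are assumptions for the example.

```shell
# Throwaway repository; names and identity are illustrative placeholders.
repo=$(mktemp -d) && cd "$repo"
git init -q
git config user.name "Example User"
git config user.email "user@example.com"
echo "v1" > file.txt && git add file.txt && git commit -q -m "v1"
first=$(git rev-parse HEAD)
echo "v2" > file.txt && git add file.txt && git commit -q -m "v2"
head_before=$(git rev-parse HEAD)

echo "scratch edit" > file.txt          # uncommitted work in the Working Directory
git checkout -q "$first" -- file.txt    # Index and Working Directory both get v1, no questions asked

cat file.txt                            # "v1": the scratch edit is gone
git show :file.txt                      # "v1"
git rev-parse HEAD                      # unchanged, still $head_before
git status --short                      # "M  file.txt": v1 is staged on top of the v2 commit
```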

Also, like git reset and git add, checkout will accept a --patch option to allow you to selectively revert file contents on a hunk-by-hunk basis.

Cheaters Gonna Cheat

Hopefully now you understand and feel more comfortable with the reset command, but are probably still a little confused about how exactly it differs from checkout and could not possibly remember all the rules of the different invocations.

So to help you out, I've created something that I pretty much hate, which is a table. However, if you've followed the article at all, it may be a useful cheat sheet or reminder. The table shows each class of the reset and checkout commands and which of the three trees it updates.

Pay especial attention to the 'WD Safe?' column - if it says NO, really think about it before you run that command.

                            HEAD   Index   Work Dir   WD Safe?
Commit Level
reset --soft [commit]       REF    NO      NO         YES
reset [commit]              REF    YES     NO         YES
reset --hard [commit]       REF    YES     YES        NO
checkout [commit]           HEAD   YES     YES        YES
File Level
reset (commit) [file]       NO     YES     NO         YES
checkout (commit) [file]    NO     YES     YES        NO

Good night, and good luck.

25 Aug 2010

Note to Self

One of the cool things about Git is that it has strong cryptographic integrity. If you change any bit in the commit data or any of the files it keeps, all the checksums change, including the commit SHA and every commit SHA since that one. However, that means that in order to amend the commit in any way, for instance to add some comments on something or even sign off on a commit, you have to change the SHA of the commit itself.

Wouldn’t it be nice if you could add data to a commit without changing its SHA? If only there existed an external mechanism to attach data to a commit without modifying the commit message itself. Happy day! It turns out there exists just such a feature in newer versions of Git! As we can see from the Git 1.6.6 release notes where this new functionality was first introduced:

* "git notes" command to annotate existing commits.

Need any more be said? Well, maybe. How do you use it? What does it do? How can it be useful? I’m not sure I can answer all of these questions, but let’s give it a try. First of all, how does one use it?

Well, to add a note to a specific commit, you only need to run git notes add [commit], like this:

$ git notes add HEAD

This will open up your editor to write your note. You can also use the -m option to provide the note right on the command line:

$ git notes add -m 'I approve - Scott' master~1

That will add a note to the first parent of the last commit on the master branch. Now, how to view these notes? The easiest way is with the git log command.

$ git log master
commit 0385bcc3bc66d1b1ec07346c237061574335c3b8
Author: Ryan Tomayko <rtomayko@gmail.com>
Date:   Tue Jun 22 20:09:32 2010 -0700

  yield to run block right before accepting connections

commit 06ca03a20bb01203e2d6b8996e365f46cb6d59bd
Author: Ryan Tomayko <rtomayko@gmail.com>
Date:   Wed May 12 06:47:15 2010 -0700

  no need to delete these header names now

Notes:
  I approve - Scott

You can see the notes appended automatically in the log output. You can only have one note per commit in a namespace though (I will explain namespaces in the next section), so if you want to add more to the note on that commit, you have to instead edit the existing one. You can do this by running:

$ git notes edit master~1

Which will open a text editor with the existing note so you can edit it:

I approve - Scott

#
# Write/edit the notes for the following object:
#
# commit 06ca03a20bb01203e2d6b8996e365f46cb6d59bd
# Author: Ryan Tomayko <rtomayko@gmail.com>
# Date:   Wed May 12 06:47:15 2010 -0700
# 
#     no need to delete these header names now
# 
#  kidgloves.rb |    2 --
#  1 files changed, 0 insertions(+), 2 deletions(-)
~                                                                               
~                                                                               
~                                                                               
".git/NOTES_EDITMSG" 13L, 338C

Sort of weird, but it works. If you just want to add something to the end of an existing note, you can run git notes append SHA, but only in newer versions of Git (I think 1.7.1 and above).
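
If you would rather try this without opening an editor, here is a sketch in a scratch repository using -m for both add and append. The commit contents, the second note text and the identity are placeholders of mine.

```shell
# Throwaway repository; names, note text and identity are illustrative placeholders.
repo=$(mktemp -d) && cd "$repo"
git init -q
git config user.name "Example User"
git config user.email "user@example.com"
echo "one" > file.txt && git add file.txt && git commit -q -m "first"
echo "two" > file.txt && git add file.txt && git commit -q -m "second"

git notes add -m 'I approve - Scott' HEAD~1
git notes show HEAD~1                            # prints the note text

git notes append -m 'Seconded - Example User' HEAD~1
git notes show HEAD~1                            # prints both pieces of text
git rev-parse HEAD~1                             # the commit SHA is the same as before the notes
```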

Notes Namespaces

Since you can only have one note per commit, Git allows you to have multiple namespaces for your notes. The default namespace is called ‘commits’, but you can change that. Let’s say we’re using the ‘commits’ notes namespace to store general comments but we want to also store bugzilla information for our commits. We can also have a ‘bugzilla’ namespace. Here is how we would add a bug number to a commit under the bugzilla namespace:

$ git notes --ref=bugzilla add -m 'bug #15' 0385bcc3

However, now you have to tell Git to specifically look in that namespace:

$ git log --show-notes=bugzilla
commit 0385bcc3bc66d1b1ec07346c237061574335c3b8
Author: Ryan Tomayko <rtomayko@gmail.com>
Date:   Tue Jun 22 20:09:32 2010 -0700

  yield to run block right before accepting connections

Notes (bugzilla):
  bug #15

commit 06ca03a20bb01203e2d6b8996e365f46cb6d59bd
Author: Ryan Tomayko <rtomayko@gmail.com>
Date:   Wed May 12 06:47:15 2010 -0700

  no need to delete these header names now

Notes:
  I approve - Scott

Notice that it also will show your normal notes. You can actually have it show notes from all your namespaces by running git log --show-notes=* - if you have a lot of them, you may want to just alias that. Here is what your log output might look like if you have a number of notes namespaces:

$ git log -1 --show-notes=*
commit 0385bcc3bc66d1b1ec07346c237061574335c3b8
Author: Ryan Tomayko <rtomayko@gmail.com>
Date:   Tue Jun 22 20:09:32 2010 -0700

    yield to run block right before accepting connections

Notes:
    I approve of this, too - Scott

Notes (bugzilla):
    bug #15

Notes (build):
    build successful (8/13/10)

You can also switch the current namespace you’re using so that the default for writing and showing notes is not ‘commits’ but, say, ‘bugzilla’ instead. If you export the variable GIT_NOTES_REF to point to something different, then the --ref and --show-notes options are not necessary. For example:

$ export GIT_NOTES_REF=refs/notes/bugzilla

That will set your default to ‘bugzilla’ instead. The value has to start with ‘refs/notes/’ though.
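
A sketch of the two ways to pick a namespace, run in a scratch repository. The note texts and identity are placeholders; the 'bugzilla' name comes from the example above.

```shell
# Throwaway repository; names, note text and identity are illustrative placeholders.
repo=$(mktemp -d) && cd "$repo"
git init -q
git config user.name "Example User"
git config user.email "user@example.com"
echo "one" > file.txt && git add file.txt && git commit -q -m "first"

git notes add -m 'general comment' HEAD           # goes to the default 'commits' namespace
git notes --ref=bugzilla add -m 'bug #15' HEAD    # goes to refs/notes/bugzilla

git notes show HEAD                               # "general comment"
git notes --ref=bugzilla show HEAD                # "bug #15"

export GIT_NOTES_REF=refs/notes/bugzilla          # make 'bugzilla' the default for this shell
git notes show HEAD                               # "bug #15"
unset GIT_NOTES_REF                               # back to the 'commits' default
```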

Sharing Notes

Now, here is where the general usability of this really breaks down. I am hoping that this will be improved in the future and I put off writing this post because of my concern with this phase of the process, but I figured it has interesting enough functionality as-is that someone might want to play with it.

So, the notes (as you may have noticed in the previous section) are stored as references, just like branches and tags. This means you can push them to a server. However, Git has a bit of magic built in to expand a branch name like ‘master’ to what it really is, which is ‘refs/heads/master’. Unfortunately, Git has no such magic built in for notes. So, to push your notes to a server, you cannot simply run something like git push origin bugzilla. Git will do this:

$ git push origin bugzilla
error: src refspec bugzilla does not match any.
error: failed to push some refs to 'git@github.com:schacon/kidgloves.git'

However, you can push anything under ‘refs/’ to a server, you just need to be more explicit about it. If you run this it will work fine:

$ git push origin refs/notes/bugzilla
Counting objects: 3, done.
Delta compression using up to 2 threads.
Compressing objects: 100% (2/2), done.
Writing objects: 100% (3/3), 263 bytes, done.
Total 3 (delta 0), reused 0 (delta 0)
To git@github.com:schacon/kidgloves.git
 * [new branch]      refs/notes/bugzilla -> refs/notes/bugzilla

In fact, you may want to just make that git push origin refs/notes/* which will push all your notes. This is what Git does normally for something like tags. When you run git push origin --tags it basically expands to git push origin refs/tags/*.
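
You can rehearse the push without touching a real server. In this sketch a local bare repository stands in for the remote; all of the paths, the README content and the identity are assumptions for illustration.

```shell
# Scratch "server" and working repository; paths and identity are illustrative placeholders.
work=$(mktemp -d)
git init -q --bare "$work/server.git"
git init -q "$work/local" && cd "$work/local"
git config user.name "Example User"
git config user.email "user@example.com"
echo "hello" > README && git add README && git commit -q -m "first"
git remote add origin "$work/server.git"
git push -q origin HEAD:refs/heads/master

git notes --ref=bugzilla add -m 'bug #15' HEAD
git push -q origin refs/notes/bugzilla    # the full ref name is required
git push -q origin 'refs/notes/*'         # or push every notes ref (quoted so the shell leaves the * alone)

git --git-dir="$work/server.git" rev-parse refs/notes/bugzilla   # the ref now exists on the "server"
```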

Getting Notes

Unfortunately, getting notes is even more difficult. Not only is there no git fetch --notes or something, you have to specify both sides of the refspec (as far as I can tell).

$ git fetch origin refs/notes/*:refs/notes/*
remote: Counting objects: 12, done.
remote: Compressing objects: 100% (8/8), done.
remote: Total 12 (delta 0), reused 0 (delta 0)
Unpacking objects: 100% (12/12), done.
From github.com:schacon/kidgloves
 * [new branch]      refs/notes/bugzilla -> refs/notes/bugzilla

That is basically the only way to get them into your repository from the server. Yay. If you want to, you can set up your Git config file to automatically pull them down though. If you look at your .git/config file you should have a section that looks like this:

[remote "origin"]
  fetch = +refs/heads/*:refs/remotes/origin/*
  url = git@github.com:schacon/kidgloves.git

The ‘fetch’ line is the refspec of what Git will try to do if you run just git fetch origin. It contains the magic formula of what Git will fetch and store local references to. For instance, in this case it will take every branch on the server and give you a local branch under ‘remotes/origin/’ so you can reference the ‘master’ branch on the server as ‘remotes/origin/master’ or just ‘origin/master’ (it will look under ‘remotes’ when it’s trying to figure out what you’re doing). If you change that line to fetch = +refs/heads/*:refs/remotes/manamana/* then even though your remote is named ‘origin’, the master branch from your ‘origin’ server will be under ‘manamana/master’.

Anyhow, you can use this to make your notes fetching easier. If you add multiple fetch lines, it will do them all. So in addition to the current fetch line, you can add a line that looks like this:

  fetch = +refs/notes/*:refs/notes/*

This says to also get all the notes references on the server and store them as though they were local notes. Or you can namespace them if you want, but that can cause issues when you try to push them back again.
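
Instead of editing .git/config by hand, the extra fetch line can be added with git config --add. The sketch below uses a local repository as a stand-in server; the paths, note text and identity are placeholders.

```shell
# Scratch "server" and clone; paths and identity are illustrative placeholders.
work=$(mktemp -d)
git init -q "$work/server" && cd "$work/server"
git config user.name "Example User"
git config user.email "user@example.com"
echo "hello" > README && git add README && git commit -q -m "first"
git notes --ref=bugzilla add -m 'bug #15' HEAD

git clone -q "$work/server" "$work/clone" && cd "$work/clone"
git config --add remote.origin.fetch '+refs/notes/*:refs/notes/*'   # second fetch line for origin
git fetch -q origin                                                 # now brings the notes refs along

git notes --ref=bugzilla show HEAD                                  # "bug #15"
```

A plain clone does not bring notes with it, so the note only becomes visible after the extra refspec is in place and a fetch has run.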

Collaborating on Notes

Now, this is where the main problem is. Merging notes is super difficult. This means that if you pull down someone’s notes, you edit any note in a namespace locally and the other developer edits any note in that same namespace, you’re going to have a hard time getting them back in sync. When the second person tries to push their notes it will look like a non-fast-forward just like a normal branch update, but unlike a normal branch you can’t just run git pull and then try again. You have to check out your notes ref as if it were a normal branch, which will look ridiculously confusing, then do the merge and then switch back. It is do-able, but probably not something you really want to do.

Because of this, it’s probably best to namespace your notes or better just have an automated process create them (like build statuses or bugzilla artifacts). If only one entity is updating your notes, you won’t have merge issues. However, if you want to use them to comment on commits within a team, it is going to be a bit painful.

So far, I’ve heard of people using them to have their ticketing system attach metadata automatically or have a system attach associated mailing list emails to commits they concern. Other people just use them entirely locally without pushing them anywhere to store reminders for themselves and whatnot.
Probably a good start, but the ambitious among you may come up with something else interesting to do. Let me know!

09 Jun 2010

Pro Git, Simplified Chinese Edition

The amazing Chunzi, in addition to translating and helping to coordinate the translation of the Chinese version of Pro Git, has just sent me a draft epub of the book in Chinese.

As he says:

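“Download link: http://cl.ly/da7a450319adfac01108

The translation of this book is basically complete, but more than half of it has not yet been reviewed. On a whim today I made an epub version, which can be read on the iPad. The code that generates it is at http://github.com/chunzi/progit/tree/master/epub-zh.

Since this is an open source book, its priority is fairly low. I will make a new version when there are further updates.”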

I have also uploaded the file to S3 in addition to his cl.ly link. If you want Pro Git in Chinese on your iPad, you can download it here.

Good luck!

06 Jun 2010

Kindle Version Available

When Pro Git was first released, I asked about being able to get it on my Kindle. In fact, one of the very first people to read the book was @adelcambre with a mobi file I generated myself. It was horrible looking, because I didn’t do it very well, but it did work. My editor at Apress wanted to get a professional Kindle version produced, but wasn’t sure if it was going to get done anytime soon.

Well, I just randomly found out that it did in fact finally get done. About 9 months after it was first published, I saw on my Amazon referral account that I had sold a Kindle version, which confused me. However, I went to Amazon and there it was:

I assume it looks much, much better than the version I originally did for Andy. (I should probably get myself a copy). So, if you feel like reading it on the Kindle, have at it. There have also been a number of people who have asked me how they can support the book without getting a dead-tree version, so here’s your chance. You can even get a Kindle reader for your computer, so you can search through it and whatnot. Enjoy!

