Version Control

Whenever you are working on something that changes over time, it’s useful to be able to track those changes. This can be for a number of reasons: it gives you a record of what changed, how to undo it, who changed it, and possibly even why. Version control systems (VCS) give you that ability. They let you commit changes to a set of files, along with a message describing the change, as well as look at and undo changes you’ve made in the past.

Most VCSes support sharing the commit history between multiple users. This allows for convenient collaboration: you can see the changes I’ve made, and I can see the changes you’ve made. And since the VCS tracks changes, it can often (though not always) figure out how to combine our changes as long as they touch relatively disjoint things.

There are a lot of VCSes out there, and they differ a lot in what they support, how they function, and how you interact with them. Here, we’ll focus on git, one of the more commonly used ones, but I recommend you also take a look at Mercurial.

With that all said – to the cliffnotes!

Is git dark magic?

not quite... but you do need to understand the data model. we’re going to skip over some of the details, but roughly speaking, the core “thing” in git is a commit.

initially, the repository (roughly: the folder that git manages) has no content, and no commits. let’s set that up:

$ git init hackers
$ cd hackers
$ git status

the output here actually gives us a good starting point. let’s dig in and make sure we understand it all.

first, “On branch master”.

let’s skip over “No commits yet” because that’s all there is to it.

then, “nothing to commit”.

while you’re playing with the above, try to run git status to see what git thinks you’re doing – it’s surprisingly helpful!
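as a small sketch of how the output of git status changes while you work: the script below sets up a throwaway repository and checks the status at each step. the temporary directory and the file name hello.txt are assumptions made for illustration, and the git symbolic-ref line is only there so the branch is called master regardless of your local defaults.

```shell
set -e                                   # stop at the first failing command
repo="$(mktemp -d)/hackers"              # throwaway location (an assumption)
git init -q "$repo"
cd "$repo"
git symbolic-ref HEAD refs/heads/master  # make sure the branch is called master

git status                 # fresh repository: branch name, no commits, nothing to commit
echo "hello" > hello.txt   # hypothetical file, just for illustration
git status --short         # "??" marks hello.txt as untracked
git add hello.txt
git status --short         # "A " marks hello.txt as staged for the first commit
```

the --short flag gives a compact two-column view; plain git status prints the longer, more explanatory form.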

A commit you say…

okay, we have a commit, now what?
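if you want a commit to poke at, here is a minimal sketch of making one and then inspecting it. the file name, commit message, and user identity are all made up for illustration; the identity is set only in this throwaway repository so the script does not depend on your global configuration.

```shell
set -e
repo="$(mktemp -d)/hackers"
git init -q "$repo"
cd "$repo"
git symbolic-ref HEAD refs/heads/master    # make sure the branch is called master
git config user.name "Example User"        # hypothetical identity, local to this repo
git config user.email "user@example.com"

echo "hello world" > hello.txt
git add hello.txt                  # stage the file for the next commit
git commit -q -m "Add hello.txt"   # record the staged snapshot with a message

git log --oneline                  # one line per commit: short hash and message
git show --stat HEAD               # author, date, message, and the files touched
```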

What’s in a name?

clearly, names are important in git. and they’re the key to understanding a lot of what goes on in git. so far, we’ve talked about commit hashes, master, and HEAD. but there’s more!
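one way to see that these are all just names for commits is to ask git what each of them resolves to. this is a sketch in a throwaway repository; the branch name feature and the tag v0.1 are hypothetical, and tags are included only as one more example of a name.

```shell
set -e
repo="$(mktemp -d)/hackers"
git init -q "$repo"
cd "$repo"
git symbolic-ref HEAD refs/heads/master
git config user.name "Example User"        # hypothetical identity
git config user.email "user@example.com"
echo "hello" > hello.txt
git add hello.txt
git commit -q -m "Add hello.txt"

git rev-parse HEAD       # the full commit hash that HEAD resolves to
git rev-parse master     # the same hash: master points at that commit
git branch feature       # a new name pointing at the same commit
git tag v0.1             # a tag is also a name, one that does not move on commit
git rev-parse feature    # again the same hash
git symbolic-ref HEAD    # HEAD itself is a name for the current branch
```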

Clean up your mess

your commit history will very often end up as:

that’s fine as far as git is concerned, but is not very helpful to your future self, or to other people who are curious about what has changed. git lets you clean up these things:
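two ways of tidying up are sketched below, in a throwaway repository with made-up file names and messages: git commit --amend folds new changes into the most recent commit, and git reset --soft followed by a fresh commit squashes several commits into one. git rebase -i does similar things interactively, but it is left out here because it needs an editor. only do this to commits you have not shared yet, since it rewrites history.

```shell
set -e
repo="$(mktemp -d)/hackers"
git init -q "$repo"
cd "$repo"
git symbolic-ref HEAD refs/heads/master
git config user.name "Example User"        # hypothetical identity
git config user.email "user@example.com"

echo "one" > notes.txt
git add notes.txt
git commit -q -m "Add notes"
echo "two" >> notes.txt
git commit -q -a -m "wip"
echo "three" >> notes.txt
git commit -q -a -m "oops"

# fold a forgotten change into the last commit instead of adding yet another one
echo "four" >> notes.txt
git commit -q -a --amend -m "oops, really"

# squash the last two commits: move master back two commits but keep the
# changes staged, then record them as a single commit
git reset --soft HEAD~2
git commit -q -m "Extend notes"

git log --oneline    # two commits remain: "Add notes" and "Extend notes"
```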

Playing with others

a common use-case for version control is to allow multiple people to make changes to a set of files without stepping on each other’s toes. or rather, to make sure that if they step on each other’s toes, they won’t just silently overwrite each other’s changes.

git is a distributed VCS: everyone has a local copy of the entire repository (well, of everything others have chosen to publish). some VCSes are centralized (e.g., subversion): a server has all the commits, clients only have the files they have “checked out”. basically, they only have the current files, and need to ask the server if they want anything else.

every copy of a git repository can be listed as a “remote”. you can copy an existing git repository using git clone ADDRESS (instead of git init). this creates a remote called origin that points to ADDRESS. you can fetch names and the commits they point to from a remote with git fetch REMOTE. all names at a remote are available to you as REMOTE/NAME, and you can use them just like local names.
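a remote does not have to be on another machine. in the sketch below, ADDRESS is simply a local path: the directory names upstream and mine, the file, and the identity are all assumptions made for illustration.

```shell
set -e
work="$(mktemp -d)"
cd "$work"

# a stand-in for a repository that lives "somewhere else"
git init -q upstream
cd upstream
git symbolic-ref HEAD refs/heads/master
git config user.name "Example User"        # hypothetical identity
git config user.email "user@example.com"
echo "v1" > file.txt
git add file.txt
git commit -q -m "First commit"
cd ..

git clone -q upstream mine    # instead of git init; ADDRESS is a path here
cd mine
git remote -v                 # a remote called origin, pointing at ADDRESS

# meanwhile, someone commits to the other repository
(cd ../upstream && echo "v2" >> file.txt && git commit -q -a -m "Second commit")

git fetch -q origin               # fetch names and the commits they point to
git log --oneline origin/master   # both commits
git log --oneline master          # still only the first: fetch does not move local names
```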

if you have write access to a remote, you can change names at the remote to point to commits you’ve made using git push. for example, let’s make the master name (branch) at the remote origin point to the commit that our master branch currently points to:
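a sketch of that push, using a local bare repository as a stand-in for a remote you can write to. the directory names remote.git and mine, the file, and the identity are hypothetical. the argument master:master reads as LOCAL:REMOTE, that is, "make master at the remote point to what my master points to".

```shell
set -e
work="$(mktemp -d)"
cd "$work"

git init -q --bare remote.git   # bare: only git's data, no checked-out files
git --git-dir=remote.git symbolic-ref HEAD refs/heads/master
git clone -q remote.git mine    # git warns that the repository is empty; that is fine
cd mine
git symbolic-ref HEAD refs/heads/master
git config user.name "Example User"        # hypothetical identity
git config user.email "user@example.com"

echo "hello" > hello.txt
git add hello.txt
git commit -q -m "Add hello.txt"

git push -q origin master:master   # LOCAL:REMOTE

git rev-parse master origin/master # both names now resolve to the same commit
```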

often you’ll use GitHub, GitLab, BitBucket, or something else as your remote. there’s nothing “special” about that as far as git is concerned. it’s all just names and commits. if someone makes a change to master and updates github/master to point to their commit (we’ll get back to that in a second), then when you git fetch github, you’ll be able to see their changes with git log github/master.

Working with others

so far, branches seem pretty useless: you can create them, do work on them, but then what? eventually, you’ll just make master point to them anyway, right?

inevitably, you will have to merge changes in one branch with changes in another, whether those changes are made by you or someone else. git lets you do this with, unsurprisingly, git merge NAME. merge will:

once your big feature has been finished, you can merge its branch into master, and git will ensure that you don’t lose any changes from either branch!
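a minimal sketch of such a merge, with work happening on both branches so that git has to create a merge commit. the branch name feature and the files are made up for illustration.

```shell
set -e
repo="$(mktemp -d)/hackers"
git init -q "$repo"
cd "$repo"
git symbolic-ref HEAD refs/heads/master
git config user.name "Example User"        # hypothetical identity
git config user.email "user@example.com"
echo "base" > base.txt
git add base.txt
git commit -q -m "Base commit"

git checkout -q -b feature    # create the branch and switch to it
echo "feature work" > feature.txt
git add feature.txt
git commit -q -m "Add big feature"

git checkout -q master        # meanwhile, master moves on too
echo "fix" > fix.txt
git add fix.txt
git commit -q -m "Fix something on master"

git merge -q --no-edit feature    # merge feature into the current branch (master)
git log --oneline --graph         # the merge commit has two parents
ls                                # base.txt, feature.txt, and fix.txt are all here
```

--no-edit accepts the default merge message instead of opening an editor, which keeps the script non-interactive.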

if you’ve used git in the past, you may recognize merge by a different name: pull. when you do git pull REMOTE BRANCH, that is:
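with default settings, a pull behaves like a git fetch followed by a git merge of the fetched name (configuration such as pull.rebase can change that). the sketch below spells out the two steps for git pull origin master; the directory names upstream and mine and the file are hypothetical.

```shell
set -e
work="$(mktemp -d)"
cd "$work"

git init -q upstream
cd upstream
git symbolic-ref HEAD refs/heads/master
git config user.name "Example User"        # hypothetical identity
git config user.email "user@example.com"
echo "v1" > file.txt
git add file.txt
git commit -q -m "First commit"
cd ..

git clone -q upstream mine
(cd upstream && echo "v2" >> file.txt && git commit -q -a -m "Second commit")

cd mine
# the two steps that "git pull origin master" rolls into one:
git fetch -q origin            # 1. update origin/master
git merge -q origin/master     # 2. merge it into the current branch

git log --oneline              # master now contains both commits
```

since the local master had no commits of its own, the merge here is a fast-forward: master is simply moved to the fetched commit.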

this usually works great, as long as the changes to the branches being merged are disjoint. if they are not, you get a merge conflict. sounds scary…
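here is a sketch of provoking a conflict and resolving it, in a throwaway repository. the file greeting.txt, the branch name feature, and the way the conflict is resolved (overwriting the file with a combined line) are all assumptions made for illustration; normally you would edit the file by hand.

```shell
set -e
repo="$(mktemp -d)/hackers"
git init -q "$repo"
cd "$repo"
git symbolic-ref HEAD refs/heads/master
git config user.name "Example User"        # hypothetical identity
git config user.email "user@example.com"
echo "hello" > greeting.txt
git add greeting.txt
git commit -q -m "Add greeting"

git checkout -q -b feature
echo "hello from feature" > greeting.txt
git commit -q -a -m "Change greeting on feature"

git checkout -q master
echo "hello from master" > greeting.txt
git commit -q -a -m "Change greeting on master"

git merge feature || true   # fails: both branches changed the same line
cat greeting.txt            # both versions, separated by <<<<<<<, =======, >>>>>>>
git status --short          # "UU" marks greeting.txt as unmerged

# resolve: make the file look the way it should, then tell git you are done
echo "hello from master and feature" > greeting.txt
git add greeting.txt
git commit -q --no-edit     # concludes the merge with the default message
```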

you’ve just resolved your first git merge conflict! \o/ now you can publish your finished changes with git push

When worlds collide

when you push, git checks that no-one else’s work is lost if you update the remote name you’re pushing to. it does this by checking that the current commit of the remote name is an ancestor of the commit you are pushing. if it is, git can safely just update the name; this is called fast-forwarding. if it is not, git will refuse to update the remote name, and tell you there have been changes.

if your push is rejected, what do you do?
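one common answer is: fetch, merge what the other person published, then push again. the sketch below plays this out with two clones of a local bare repository. the names alice, bob, and remote.git are hypothetical, as are the files. git merge-base --is-ancestor performs the same kind of ancestor check described above.

```shell
set -e
work="$(mktemp -d)"
cd "$work"
git init -q --bare remote.git
git --git-dir=remote.git symbolic-ref HEAD refs/heads/master

# alice publishes a first commit
git clone -q remote.git alice
cd alice
git symbolic-ref HEAD refs/heads/master
git config user.name "Alice"
git config user.email "alice@example.com"
echo "base" > base.txt
git add base.txt
git commit -q -m "Base commit"
git push -q origin master:master
cd ..

# bob clones, then both of them commit on top of the same base
git clone -q remote.git bob
git -C bob config user.name "Bob"
git -C bob config user.email "bob@example.com"

cd alice
echo "alice" > alice.txt
git add alice.txt
git commit -q -m "Alice's change"
git push -q origin master:master   # alice gets there first

cd ../bob
echo "bob" > bob.txt
git add bob.txt
git commit -q -m "Bob's change"
# the remote master is not an ancestor of bob's master, so git refuses
if git push -q origin master:master; then rejected=no; else rejected=yes; fi
echo "push rejected: $rejected"

git fetch -q origin                    # get alice's commit
git merge -q --no-edit origin/master   # combine it with bob's work
git merge-base --is-ancestor origin/master master   # succeeds now
git push -q origin master:master       # a fast-forward, so it is accepted
```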

Further reading

XKCD on git

Exercises

  1. In a repo, try modifying an existing file. What happens when you do git stash? What do you see when running git log --all --oneline? Run git stash pop to undo what you did with git stash. In what scenario might this be useful?

  2. One common mistake when learning git is to commit large files that should not be managed by git, or to add sensitive information. Try adding a file to a repository, making some commits, and then deleting that file from history (you may want to look at this). Also, if you do want git to manage large files for you, look into Git-LFS.

  3. Git is really convenient for undoing changes, but one has to be familiar with even the most unlikely changes.
    1. If a file is mistakenly modified in some commit, it can be reverted with git revert. However, if a commit involves several changes, revert might not be the best option. How can we use git checkout to recover a file version from a specific commit?
    2. Create a branch, make a commit in said branch, and then delete it. Can you still recover said commit? Try looking into git reflog. (Note: recover dangling things quickly; git will periodically clean up commits that nothing points to.)
    3. If one is too trigger-happy with git reset --hard instead of git reset, changes can easily be lost. However, since the changes were staged, we can recover them. (Look into git fsck --lost-found and .git/lost-found.)
  4. In any git repo, look under the folder .git/hooks: you will find a bunch of scripts that end with .sample. If you rename them without the .sample, they will run based on their name. For instance, pre-commit will execute before doing a commit. Experiment with them.

  5. Like many command line tools, git provides a configuration file (or dotfile) called ~/.gitconfig. Create an alias using ~/.gitconfig so that when you run git graph you get the output of git log --oneline --decorate --all --graph (this is a good command to quickly visualize the commit graph).

  6. Git also lets you define global ignore patterns under ~/.gitignore_global. This is useful to prevent common errors like adding RSA keys. Create a ~/.gitignore_global file and add the pattern *rsa, then test that it works in a repo.

  7. Once you start to get more familiar with git, you will find yourself running into common tasks, such as editing your .gitignore. git extras provides a bunch of little utilities that integrate with git. For example git ignore PATTERN will add the specified pattern to the .gitignore file in your repo and git ignore-io LANGUAGE will fetch the common ignore patterns for that language from gitignore.io. Install git extras and try using some tools like git alias or git ignore.

  8. Git GUI programs can be a great resource sometimes. Try running gitk in a git repo and explore the different parts of the interface. Then run gitk --all. What are the differences?

  9. Once you get used to command line applications, GUI tools can feel cumbersome or bloated. A nice compromise between the two is ncurses-based tools, which can be navigated from the command line and still provide an interactive interface. Git has tig; try installing it and running it in a repo. You can find some usage examples here.

Licensed under CC BY-NC-SA.