Digital humanities


Maintained by: David J. Birnbaum (djbpitt@gmail.com) [Creative Commons BY-NC-SA 3.0 Unported License] Last modified: 2022-03-06T17:32:58+0000


Core Git commands

Overview

This tutorial describes five core Git commands: status, add (also rm), commit, push, pull. It does not cover all features of these commands; it concentrates on the ways you are most likely to use them in your course projects.

Because you’ll be working on the command line, this tutorial also assumes that you already know about command-line navigation. If you are new to the command line, or uncertain about your knowledge, we recommend first working through parts 1–3 of the Software Carpentry tutorial about the Unix shell.

Before you start

How Git thinks about your files

Your local Git repo has three types of spaces: the working area (sometimes called the working tree), the staging area (sometimes also called the index), and the local branch. You won’t understand how Git works without understanding the relationship among these areas, so take a moment now to read about them at Lucas Maurer’s Git Gud: The Working Tree, Staging Area, and Local Repo. The descriptions below assume that you have read this introduction.

There is also a fourth type of space in your local repo, a remote-tracking branch (called the local tracking branch below, because it is a copy that lives on your machine), with which you interact explicitly primarily when you use git fetch. We won’t practice fetching here, but we’ll refer below to tracking branches where appropriate.

In our course you must do all work on the command line inside your local Git repo. Specifically:

Users often find Git disorienting at first because it is not like other file-sharing systems we may know, such as Dropbox or Google Docs. Whatever the virtues of those cloud storage systems, Git is better for collaborative code development because that is what it was designed for. The commands below are enough to get you started, and you will become comfortable with them quickly as you use them regularly to develop your course projects. Don’t be shy about asking any of the instructors if you get stuck or confused!

Branches

The branches we think about

Branches are different versions of a project that coexist, and Git allows you to work on one, then switch and work on another, and then merge branches to bring all of your work together. Every GitHub repo has a default branch, which is usually called main. A common use of branches might be that the main branch holds the latest stable version of your project, and you create separate branches to develop different features. Once each feature is stable, you merge its branch into the main branch and delete the feature branch. The reason for this workflow is that it keeps the main branch stable; you add new content to the main branch only after you have verified in your feature branch that it is working properly.

Some projects, especially small ones created by small teams, are developed entirely in the main branch, without creating feature branches to develop individual features. This simplifies your repo in obvious ways, at the expense of having different things going on in the main branch at the same time, some of which may be broken because you are still working on them. Whether your project team will use feature branches or work entirely in main is up to each team. We do not discuss here how to create and select feature branches; if you decide to use them, consult with your project mentor for more information.

The branches we may not think about as branches

When we use Git, we typically synchronize our local repo with a repo of the same name on GitHub, and our teammates do the same. GitHub thus serves as the middle man, so that we share new content with teammates by pushing our changes to GitHub, after which our teammates pull them from there, even as we pull our teammates’ changes after they push them. It might appear, then, as if each member of the team is working with two versions of the content: the local repo on the local file system (that is, the files and other information on your own computer) and the version on GitHub (which you can see by visiting GitHub in a web browser). In fact, though, we are working with three versions: 1) the version on GitHub (the remote branch), 2) what we know locally about what’s on GitHub (the local tracking branch), and 3) our local files (the local branch, including our working tree and our staging area). To keep these in sync we need to push new local work to GitHub, and we need to pull new work by our teammates from GitHub.

In Git terms the remote branch, the local tracking branch, and the local branch are all branches, which may be confusing because they may all appear to have the same name. For example, if you have a local branch called main, there is typically a branch called main on GitHub as well as a copy of that GitHub branch on your local machine called remotes/origin/main (your local tracking branch, which you update every time you fetch or pull from GitHub). Additionally, your local workspace may contain changes that are not part of your local branch or any other. Changes become part of your local branch only after you commit them, and before that they are located in either the working tree or the staging area. They are in your Git repo directory, but they aren’t part of any branch.
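The path a change takes from the working tree through the staging area into a branch can be sketched in the shell. This is a minimal illustration that assumes Git is installed; the throwaway directory, the file name notes.txt, and the demo identity are placeholders invented for the example, not part of any course repo.

```shell
set -e
# Throwaway repo in a temporary directory (all names here are illustrative).
demo=$(mktemp -d)
cd "$demo"
git init -q .
git symbolic-ref HEAD refs/heads/main      # make sure the first branch is called main
git config user.name "Demo User"           # placeholder identity for this repo only
git config user.email "demo@example.com"

echo "first draft" > notes.txt
in_working_tree=$(git status --porcelain)  # "?? notes.txt": exists only in the working tree

git add notes.txt
in_staging=$(git status --porcelain)       # "A  notes.txt": staged, but not on any branch yet

git commit -q -m "Add first draft of notes"
after_commit=$(git status --porcelain)     # empty: the change is now part of main

printf 'working tree: %s\n' "$in_working_tree"
printf 'staging area: %s\n' "$in_staging"
printf 'after commit: [%s]\n' "$after_commit"
```

The --porcelain switch asks git status for a terse, script-friendly report; the ordinary git status output carries the same information in a longer form.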

If you run git branch in your local repo, it will list all local branches on your machine, with an asterisk next to the one you are working in. In most cases in this course you won’t have any feature branches, so your only local branch will be called main and your output will look like:

* main

If you run git branch -a (the -a switch stands for ‘all’), you see all branches that your local repo knows about, which means your local branches plus any local tracking branches, that is, the state of branches with the same name on GitHub as of the last time you fetched or pulled. (This command also shows information about the location of your HEAD, which we won’t discuss here.) The output of git branch -a might look like:

* main
  remotes/origin/HEAD -> origin/main
  remotes/origin/main

Ignoring the line about HEAD for the moment, the line that reads simply main with a leading asterisk is your local branch called main, and remotes/origin/main is your local tracking branch, which records the state of the main branch on GitHub as of the last time you fetched or pulled. If your teammates have pushed new content since your last fetch or pull, those recent changes will not be reflected in remotes/origin/main until you fetch or pull again.

In the discussion below we will need to distinguish your local branch, your local tracking branch (what your machine knows about what’s in the corresponding branch on GitHub), and your remote branch (what’s actually in the corresponding branch on GitHub).
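The difference between the remote branch and the local tracking branch can be observed without a GitHub account. The sketch below assumes Git is installed and uses a bare repo on the local disk (remote.git) as a stand-in for GitHub; the directory names mine and teammate, the file name, and the identities are all hypothetical.

```shell
set -e
base=$(mktemp -d)
cd "$base"

# A seed repo with one commit, then a bare copy that plays the role of GitHub.
git init -q seed
git -C seed symbolic-ref HEAD refs/heads/main
git -C seed config user.name "Demo User"
git -C seed config user.email "demo@example.com"
echo "hello" > seed/readme.txt
git -C seed add readme.txt
git -C seed commit -q -m "Add readme"
git clone -q --bare seed remote.git

# Two clones of the stand-in remote: yours and a teammate's.
git clone -q remote.git mine
git clone -q remote.git teammate
git -C teammate config user.name "Teammate"
git -C teammate config user.email "teammate@example.com"

branches=$(git -C mine branch -a)   # local main plus the remotes/origin/... entries
printf '%s\n' "$branches"

# The teammate pushes new work to the remote.
echo "more" >> teammate/readme.txt
git -C teammate commit -q -a -m "Expand readme"
git -C teammate push -q origin main

on_remote=$(git -C remote.git rev-parse main)      # what is actually on the remote
before_fetch=$(git -C mine rev-parse origin/main)  # your tracking branch: still the old commit
git -C mine fetch -q origin
after_fetch=$(git -C mine rev-parse origin/main)   # now matches the remote
printf 'remote: %s\nbefore: %s\nafter:  %s\n' "$on_remote" "$before_fetch" "$after_fetch"
```

Until the fetch, the tracking branch in mine still points at the older commit, which is the sense in which it is only what your machine knows about the remote.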

git status

You can run this command at any time to get more information about the state of your repo. git status provides information about the current state of your local repo, including:

For example, when I run git status in one of my local repos, I see:

On branch verb-xproc-for-each
Your branch is ahead of 'origin/verb-xproc-for-each' by 1 commit.
  (use "git push" to publish your local commits)

Changes to be committed:
  (use "git restore --staged <file>..." to unstage)
	modified:   pos/verb/paradigm-values.md

Changes not staged for commit:
  (use "git add <file>..." to update what will be committed)
  (use "git restore <file>..." to discard changes in working directory)
	modified:   pos/verb/verb-generate.xspec
	modified:   pos/verb/verb-lib.xspec

Untracked files:
  (use "git add <file>..." to include in what will be committed)
	pos/verb/modules/verb-2b.xsl

Reading from the top down, this status message reports the following:

Run git status as often as you’d like to keep track of the state of your work. We recommend running it before adding, committing, pushing, or pulling, and looking at the results to ensure that you are about to do what you think you are about to do. Note that running git status prompts you about things you might want to do next.
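The three groups in the status report (staged, not staged, untracked) can be reproduced in a scratch repo. This is a sketch under the assumption that Git is installed; the file names and identity are invented. It also shows git status --short, a compact form of the same report.

```shell
set -e
demo=$(mktemp -d)
cd "$demo"
git init -q .
git symbolic-ref HEAD refs/heads/main
git config user.name "Demo User"          # placeholder identity
git config user.email "demo@example.com"

# Start from two committed (tracked) files.
printf 'one\n' > staged.md
printf 'one\n' > unstaged.md
git add -A
git commit -q -m "Add two tracked files"

printf 'two\n' >> staged.md
git add staged.md                         # modified and staged
printf 'two\n' >> unstaged.md             # modified but not staged
printf 'new\n' > untracked.md             # never added

git status                                # long report with the three headings
short=$(git status --short)
printf '%s\n' "$short"
# M  staged.md      first column: change is in the staging area
#  M unstaged.md    second column: change is only in the working tree
# ?? untracked.md   file that Git is not tracking yet
```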

git add

git add, followed by a directory or file name, moves the specified files from the working tree into the staging area. git add is the only command discussed here that can refer to individual files and directories, so if you want to commit or push only some new or changed files and not others, you have to specify that at the moment when you add the files.

We commonly use git add in the following ways:

Using git add -A will ensure that you haven’t forgotten to add changes you want to add, but you don’t always want to add everything that you’ve changed at once (see below, under the discussion of commits, for details). Don’t get in the habit of using git add -A automatically; when you are about to use it, run git status first and verify that you want to stage (add) and commit all of your changes at once. If you accidentally add something you wanted to add only later, or not at all, you can remove it from the staging area before you commit; git status will tell you how.
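As an illustration of recovering from an overeager git add -A, the sketch below stages everything and then takes one file back out of the staging area with git restore --staged, the command that git status suggests. It assumes Git 2.23 or later (where git restore is available); the file names are hypothetical.

```shell
set -e
demo=$(mktemp -d)
cd "$demo"
git init -q .
git symbolic-ref HEAD refs/heads/main
git config user.name "Demo User"          # placeholder identity
git config user.email "demo@example.com"
echo "project" > readme.txt
git add readme.txt
git commit -q -m "Add readme"             # git restore --staged needs an existing commit

echo "a" > chapter1.txt
echo "b" > chapter2.txt
echo "c" > scratch.txt                    # not ready to share yet

git add -A                                # stages all three new files
git restore --staged scratch.txt          # take scratch.txt back out of the staging area

staged=$(git diff --cached --name-only)   # what the next commit would contain
printf 'staged:\n%s\n' "$staged"
git status --short                        # scratch.txt shows as ?? (untracked) again
```

Unstaging does not touch the file itself; scratch.txt is still in the working tree and can be added later.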

Note that:

git commit

We recommend committing by running git commit -m "commit message goes here", that is, always specifying the commit message as part of the command. We explain why below.

git commit moves all files in your staging area (you put them there with git add) into your local branch, that is, it makes them part of the history of the project. All commits must be accompanied by a commit message, typically a single line that describes what the changes do, possibly also with more detailed information. The quality of your commit message matters because Git keeps a history of all commits, and if you later want to undo a change, the commit message will help you find it. Stop now to read Chris Beams’s How to write a Git commit message.

When you run git commit you cannot specify that you want to commit some files and not others; git commit always commits everything in the staging area at once. The way you control which files to commit is with the add instruction; you should add only the files you want to commit together, with a shared commit message, and then run git commit to commit all of them. This is why you don’t want to run git add -A carelessly. If, for example, you work on two different features at once, you should add only the files from one feature, commit them with a meaningful message, and then add the files from the other and commit them separately, with a separate meaningful commit message.
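The two-feature situation described above might look like the following sketch, which assumes Git is installed and uses invented file names (search.xsl, search-notes.md, timeline.xsl) and commit messages.

```shell
set -e
demo=$(mktemp -d)
cd "$demo"
git init -q .
git symbolic-ref HEAD refs/heads/main
git config user.name "Demo User"          # placeholder identity
git config user.email "demo@example.com"

# Work on two unrelated features ends up in the working tree at the same time.
echo "search" > search.xsl
echo "notes"  > search-notes.md
echo "time"   > timeline.xsl

# Feature 1: add only its files, then commit them together.
git add search.xsl search-notes.md
git commit -q -m "Add search stylesheet and notes"

# Feature 2: a separate add and a separate commit message.
git add timeline.xsl
git commit -q -m "Add timeline stylesheet"

git log --format=%s                       # one subject line per commit, newest first
first_files=$(git diff-tree --root --no-commit-id --name-only -r HEAD~1)
second_files=$(git diff-tree --no-commit-id --name-only -r HEAD)
printf 'first commit:\n%s\nsecond commit:\n%s\n' "$first_files" "$second_files"
```

Because each commit holds one feature, either one can later be found by its message and undone without disturbing the other.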

If you run git commit with the -m switch, followed by text in quotation marks, that text becomes your commit message. This is the easiest way to add the message. If you run just git commit by itself, Git will open an editor so that you can compose your commit message separately. On most systems Git defaults to an editor called Vim, which can be confusing to new users, and that’s the editor that will open unless you’ve configured your instance of Git to use a different editor (if you can’t find instructions online that make sense about how to do this, ask your project mentor for help). If you accidentally forget the -m switch, then, you’ll find yourself in the Vim editing environment, and possibly confused about how to get out. To get out of Vim, hit the Escape key followed by a colon (your cursor should move to the bottom line when you do that), then a lower-case q, and then the Enter key. This should deposit you back at the command line and display a message that the commit has been aborted. You can then try again with the -m switch.

The commit command also has an -a switch, which can be combined with the -m switch. The -a switch adds all changed files in the repo and commits them as a single operation, so if you use it, you don’t need to use git add first. git commit -a differs, though, from git add -A followed by git commit because git commit -a adds all changed files, but not any new files, that is, files that have never been committed (the technical term for these is untracked files, and you’ll see them referred to that way when you run git status). If you are tempted to use git commit -a, you should first run git status to verify that it will add all the files you want to add, and no others.
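A small sketch of that difference, assuming Git is installed; the file names tracked.txt and brand-new.txt are made up for the example.

```shell
set -e
demo=$(mktemp -d)
cd "$demo"
git init -q .
git symbolic-ref HEAD refs/heads/main
git config user.name "Demo User"          # placeholder identity
git config user.email "demo@example.com"
echo "v1" > tracked.txt
git add tracked.txt
git commit -q -m "Add tracked file"

echo "v2" >> tracked.txt                  # change to a file Git already tracks
echo "draft" > brand-new.txt              # new file, never added

git commit -q -a -m "Update tracked file" # picks up tracked.txt only

leftover=$(git status --porcelain)        # "?? brand-new.txt": still untracked
committed=$(git diff-tree --no-commit-id --name-only -r HEAD)
printf 'committed: %s\nleft behind: %s\n' "$committed" "$leftover"
```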

You can see a history of commits with git log. There are ways to customize the output of this command, and we won’t go into detail here, but if you need to revert a commit (that is, undo a change you committed), you can use git log to find the commit you want to revert by its commit message (see why meaningful commit messages matter?!) and then use the appropriate Git commands to revert it.
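One possible shape of that workflow is sketched below, using git log --grep to find a commit by its message and git revert to undo it. This is only one of several ways to undo work in Git; the file names and commit messages are invented, and the example assumes Git is installed.

```shell
set -e
demo=$(mktemp -d)
cd "$demo"
git init -q .
git symbolic-ref HEAD refs/heads/main
git config user.name "Demo User"          # placeholder identity
git config user.email "demo@example.com"

echo "chapter one" > a.txt
git add a.txt
git commit -q -m "Add chapter one"
echo "untested idea" > b.txt
git add b.txt
git commit -q -m "Add broken experiment"
echo "chapter one, revised" > a.txt
git commit -q -a -m "Revise chapter one"

git log --oneline                         # short hash and subject for each commit

# Find the unwanted commit by its message, then create a new commit that undoes it.
bad=$(git log --format=%H --grep="broken experiment")
git revert --no-edit "$bad" > /dev/null

git log --format=%s                       # history now ends with a Revert commit
```

git revert does not erase history: it adds a new commit that reverses the old one, so the later revision of a.txt is kept and b.txt disappears from the working tree.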

git push

git push uploads to GitHub the changes you have committed to your local branch. This is how you share your work with your teammates, who then fetch or pull those changes down to their local machines. git push pushes all commits at the same time. That is, just as git commit commits all changes you have moved to the staging area with git add, and you can’t pick and choose among files you want to commit at the commit stage, git push pushes all commits you have made, so you can’t pick and choose among commits at the push stage. In other words, if you want to share only some of your work at a time, you have to start thinking about that when you add.

If your teammates have pushed content to GitHub since your last pull, Git will refuse to let you push until you first pull that new content (fetching alone is not enough, because the new content must also be merged into your local branch). If you get a message to that effect, run git pull and then try again to push.
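The rejected-push situation can be rehearsed locally. In this sketch a bare repo on disk (remote.git) stands in for GitHub, and the clone names, file names, and identities are hypothetical. The --no-rebase and --no-edit switches are an assumption about setup: recent Git versions ask how to reconcile diverged branches, and these switches request an ordinary merge without opening an editor.

```shell
set -e
base=$(mktemp -d)
cd "$base"

# Seed repo, bare stand-in for GitHub, and two clones.
git init -q seed
git -C seed symbolic-ref HEAD refs/heads/main
git -C seed config user.name "Demo User"
git -C seed config user.email "demo@example.com"
echo "hello" > seed/readme.txt
git -C seed add readme.txt
git -C seed commit -q -m "Add readme"
git clone -q --bare seed remote.git
for who in mine teammate; do
  git clone -q remote.git "$who"
  git -C "$who" config user.name "$who"
  git -C "$who" config user.email "$who@example.com"
done

# The teammate commits and pushes first.
echo "theirs" > teammate/theirs.txt
git -C teammate add theirs.txt
git -C teammate commit -q -m "Add teammate file"
git -C teammate push -q origin main

# You commit locally without having pulled.
echo "mine" > mine/mine.txt
git -C mine add mine.txt
git -C mine commit -q -m "Add my file"

# Your push is refused because the remote has work you do not have.
if git -C mine push -q origin main 2> /dev/null; then
  first_push=accepted
else
  first_push=rejected
fi
echo "first push: $first_push"

# Pull (fetch and merge), then push again.
git -C mine pull -q --no-rebase --no-edit origin main
git -C mine push -q origin main
```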

git pull

git pull does two things at once: it fetches changes from GitHub to your local version of the corresponding GitHub branch (your local tracking branch, i.e., remotes/origin/main) and it merges your local tracking branch into your local branch. This is how you get your teammates’ work into your local branch. You can fetch and merge separately, and you can execute other types of merges (that is, not only between branches that share a name), but most commonly in course projects git pull will do what you want.
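To see the two halves of git pull separately, the sketch below runs git fetch and git merge as distinct steps. As in any local rehearsal, remote.git is a bare repo standing in for GitHub, and the clone names, file name, and identities are placeholders.

```shell
set -e
base=$(mktemp -d)
cd "$base"

# Seed repo, bare stand-in for GitHub, and two clones.
git init -q seed
git -C seed symbolic-ref HEAD refs/heads/main
git -C seed config user.name "Demo User"
git -C seed config user.email "demo@example.com"
echo "hello" > seed/readme.txt
git -C seed add readme.txt
git -C seed commit -q -m "Add readme"
git clone -q --bare seed remote.git
git clone -q remote.git mine
git clone -q remote.git teammate
git -C teammate config user.name "Teammate"
git -C teammate config user.email "teammate@example.com"

# The teammate shares a new file.
echo "shared" > teammate/notes.txt
git -C teammate add notes.txt
git -C teammate commit -q -m "Add shared notes"
git -C teammate push -q origin main

# Step 1: fetch updates only the tracking branch (remotes/origin/main).
git -C mine fetch -q origin
if [ -e mine/notes.txt ]; then after_fetch=present; else after_fetch=absent; fi

# Step 2: merge brings the tracking branch into your local branch.
git -C mine merge -q origin/main
if [ -e mine/notes.txt ]; then after_merge=present; else after_merge=absent; fi

echo "notes.txt after fetch: $after_fetch, after merge: $after_merge"
```

The teammate's file reaches your working tree only at the merge step, which is why a plain fetch is safe to run at any time.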

When you pull you may get an error report about a merge conflict. Merge conflicts happen when the version of a file that you pull from GitHub differs from your local version and Git doesn’t know which version you want to keep as the result of the merge. Git can merge versions of the same file with different changes as long as the changes are on different lines, but because Git tracks changes at the level of lines, you’ll raise a merge conflict if the two versions have different changes on the same line. Here’s how to think about merge conflicts: