Core Git commands

Maintained by: David J. Birnbaum (djbpitt@gmail.com)

Last modified: 2022-03-06T17:32:58+0000

Before you start

How Git thinks about your files

Your local Git repo has three types of spaces: the working area (sometimes called the working tree), the staging area (sometimes also called the index), and the local branch. You won’t understand how Git works without understanding the relationship among these areas, so take a moment now to read about them at Lucas Maurer’s Git Gud: The Working Tree, Staging Area, and Local Repo. The descriptions below assume that you have read this introduction.

There is also a fourth type of space in your local repo, a remote tracking branch (called the local tracking branch below, because the copy lives on your own machine), with which you interact explicitly mainly when you use git fetch. We won’t practice fetching here, but we’ll refer below to tracking branches where appropriate.
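The movement of a file through the three areas can be watched with git status. What follows is a minimal sketch, not part of the course materials: the throwaway directory, the file name notes.txt, and the demo identity are all assumptions made for illustration, and the --short switch simply asks git status for a compact, one-line-per-file report.

```shell
# Work in a throwaway repo so that nothing real is touched.
demo_dir=$(mktemp -d)
cd "$demo_dir"
git init -q
git config user.name "Demo User"          # hypothetical identity, needed to commit
git config user.email "demo@example.com"

echo "first draft" > notes.txt
in_working_tree=$(git status --short)     # "?? notes.txt": only in the working tree

git add notes.txt
in_staging_area=$(git status --short)     # "A  notes.txt": now in the staging area

git commit -q -m "Add notes"
in_local_branch=$(git status --short)     # empty: the file is part of the local branch

printf '%s\n' "$in_working_tree" "$in_staging_area" "[$in_local_branch]"
```

The empty brackets on the last line of output show that, once the commit is made, nothing is left waiting in the working tree or the staging area.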

In our course you must do all work on the command line inside your local Git repo. Specifically:

Do not use a Git graphical client (GUI; graphical user interface). Graphical clients are easier for beginners, but you won’t be a beginner for very long, and once you become comfortable with the basic commands, the command line will be both faster and more capable.
Do not use the browser to download individual files from GitHub or to upload them to GitHub. After you clone a repo, the only way you should transfer information between your local machine and GitHub is by pushing (to upload) and pulling (to download).
Do not edit any files directly on GitHub, that is, while inside your browser looking at a file on GitHub. The only editing you should do is in the files inside your local repo on your own machine.
Do not work elsewhere on your system and then copy the files into your repo when you are ready to share them. Do all of your work inside your local repo.

Users often find Git disorienting at first because it is not like other systems we may know for sharing files, like Dropbox or Google Docs, but whatever the virtues of those cloud storage systems, Git is better for collaborative code development because that is what it was designed for. The commands below are enough to get you started, and you will become comfortable with them quickly as you use them regularly to develop your course projects. Don’t be shy about asking any of the instructors if you get stuck or confused!

Branches

The branches we think about

Branches are different versions of a project that coexist, and Git allows you to work on one, then switch and work on another, and then merge branches to bring all of your work together. Every GitHub repo has a default branch, normally called main (or, in older repos, master). A common use of branches might be that the main branch holds the latest stable version of your project, and you create separate branches to develop different features. Once each feature is stable, you merge its branch into the main branch and delete the feature branch. The reason for this workflow is that it keeps the main branch stable; you add new content to the main branch only after you have verified in your feature branch that it is working properly.

Some projects, especially small ones created by small teams, are developed entirely in the main branch, without creating feature branches to develop individual features. This simplifies your repo in obvious ways, at the expense of having different things going on in the main branch at the same time, some of which may be broken because you are still working on them. Whether your project team will use feature branches or work entirely in main is up to each team. We do not discuss here how to create and select feature branches; if you decide to use them, consult with your project mentor for more information.

The branches we may not think about as branches

When we use Git, we typically synchronize our local repo with a repo of the same name on GitHub, and our teammates do the same. GitHub thus serves as the middle man, so that we share new content with teammates by pushing our changes to GitHub, after which our teammates pull them from there, even as we pull our teammates’ changes after they push them. It might appear, then, as if each member of the team is working with two versions of the content: the local repo on the local file system (that is, the files and other information on your own computer) and the version on GitHub (which you can see by visiting GitHub in a web browser). In fact, though, we are working with three versions: 1) the version on GitHub (the remote branch), 2) what we know locally about what’s on GitHub (the local tracking branch), and 3) our local files (the local branch, including our working tree and our staging area). To keep these in sync we need to push new local work to GitHub, and we need to pull new work by our teammates from GitHub.

In Git terms the remote branch, the local tracking branch, and the local branch are all branches, which may be confusing because they may all appear to have the same name. For example, if you have a local branch called main, there is typically a branch called main on GitHub as well as a copy of that GitHub branch on your local machine called remotes/origin/main (your local tracking branch, which you update every time you fetch or pull from GitHub). Additionally, your local workspace may contain changes that are not part of your local branch or of any other branch. Changes become part of your local branch only after you commit them, and before that they are located in either the working tree or the staging area. They are in your Git repo directory, but they aren’t part of any branch.

If you run git branch in your local repo, it will list all local branches on your machine, with an asterisk next to the one you are working in. In most cases in this course you won’t have any feature branches, so your only local branch will be called main and your output will look like:

* main

If you run git branch -a (the -a switch stands for ‘all’), you see all branches that your machine knows about, which means your local branches plus any local tracking branches, that is, the state of branches with the same name on GitHub as of the last time you fetched or pulled. (This command also shows information about the location of your HEAD, which we won’t discuss here.) The output of git branch -a might look like:

* main
  remotes/origin/HEAD -> origin/main
  remotes/origin/main

Ignoring the line about HEAD for the moment, the line that reads simply main with a leading asterisk is your local branch called main, and remotes/origin/main is your local tracking branch for the main branch on GitHub the last time you fetched or pulled. If your teammates have pushed new content since your last fetch or pull, those recent changes will not be reflected in remotes/origin/main until you fetch or pull again.
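To see that remotes/origin/main changes only when you fetch or pull, you can simulate the situation entirely on your own machine. The sketch below is an illustration under stated assumptions: a local bare repo called hub.git stands in for GitHub, the directories seed, mine, and teammate are hypothetical names, and the demo identity exists only so that commits succeed.

```shell
# Hypothetical identity so that commits work in a fresh environment.
export GIT_AUTHOR_NAME="Demo User" GIT_AUTHOR_EMAIL="demo@example.com"
export GIT_COMMITTER_NAME="Demo User" GIT_COMMITTER_EMAIL="demo@example.com"

sandbox=$(mktemp -d)
cd "$sandbox"

# Build a stand-in for GitHub: a bare repo with one commit on main.
git init -q seed
(cd seed && echo "v1" > file.txt && git add file.txt &&
  git commit -q -m "Initial commit" && git branch -M main)
git clone -q --bare seed hub.git

# Two clones: mine and a teammate's.
git clone -q hub.git mine
git clone -q hub.git teammate

# The teammate commits and pushes new work.
(cd teammate && echo "v2" >> file.txt &&
  git commit -q -a -m "Teammate edit" && git push -q origin main)

cd mine
before_fetch=$(git rev-parse origin/main)   # what my machine knows before fetching
git fetch -q origin
after_fetch=$(git rev-parse origin/main)    # the local tracking branch has moved
my_main=$(git rev-parse main)               # my local branch has not moved

git branch -a
echo "origin/main before fetch: $before_fetch"
echo "origin/main after fetch:  $after_fetch"
echo "local main:               $my_main"
```

The commit identifier for origin/main changes only after git fetch runs, while the local main branch stays where it was until the fetched work is merged.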

In the discussion below we will need to distinguish your local branch, your local tracking branch (what your machine knows about what’s in the corresponding branch on GitHub), and your remote branch (what’s actually in the corresponding branch on GitHub).

git status

You can run this command at any time to get more information about the state of your repo. git status provides information about the current state of your local repo, including:

The branch you are on. This will usually be main (or, in older repos, master) unless you have created additional branches. In the discussion below, we’ll assume that you are working in the main branch, and that it is your only local branch.
Whether your current local branch, main, is up to date with the corresponding local tracking branch, remotes/origin/main, which is what your local machine knows about what’s on GitHub. Your local branch may be behind the version you last fetched or pulled from GitHub (in which case you can tell it to catch up with git pull or git merge), ahead of the version you last fetched or pulled from GitHub (in which case you can run git push to add your new information to the repository on GitHub and update your local tracking branch), or up to date with the local tracking branch, that is, the most recent version you fetched or pulled from GitHub. Your local branch may also, simultaneously, be both behind your local tracking branch (there is new work from GitHub that you have fetched but not yet merged into your local branch) and ahead (you have new work in your local branch that you have not yet pushed to GitHub).

Importantly, up to date does not mean up to date with what is on GitHub because git status doesn’t look at GitHub; it compares the work in your local branch (e.g., main) with your local tracking branch (e.g., remotes/origin/main). If your teammates have pushed anything to GitHub since your last fetch or pull, that newest information is not yet in your local tracking branch, which means that git status won’t know about it. For this reason, it is helpful to pull often.
If you have changes that are not in GitHub, git status will tell you which changed files:
- Are ready to be pushed to GitHub, so that GitHub will catch up on any changes that you have committed locally with git commit but not yet pushed.
- Are ready to be committed, that is, have been moved onto your staging area with git add but have not yet been committed.
- Have changes not staged for commit, that is, the files have been committed previously, after which you changed them, but you haven’t yet run git add to move your changes to those files into your staging area.
- Are untracked, that is, are new files that have never been added to the staging area or committed.

For example, when I run git status in one of my local repos, I see:

On branch verb-xproc-for-each
Your branch is ahead of 'origin/verb-xproc-for-each' by 1 commit.
  (use "git push" to publish your local commits)

Changes to be committed:
  (use "git restore --staged <file>..." to unstage)
	modified:   pos/verb/paradigm-values.md

Changes not staged for commit:
  (use "git add <file>..." to update what will be committed)
  (use "git restore <file>..." to discard changes in working directory)
	modified:   pos/verb/verb-generate.xspec
	modified:   pos/verb/verb-lib.xspec

Untracked files:
  (use "git add <file>..." to include in what will be committed)
	pos/verb/modules/verb-2b.xsl

Reading from the top down, this status message reports the following:

I am working in a local feature branch called verb-xproc-for-each. If you have not created additional branches, you will normally be on a branch called main.
My verb-xproc-for-each branch is ahead of what my machine knows about what’s on GitHub, that is, ahead of my remotes/origin/verb-xproc-for-each local tracking branch. This means that I have made one local commit that I have not yet pushed to GitHub. If my teammates have pushed new content to GitHub since my last fetch or pull, my machine doesn’t know about that yet.
I have staged changes in one file that I have not yet committed. This means that I have edited this file and run git add to move it into my staging area, but I have not yet run git commit to save it into my local repository.
I have unstaged changes (changes not staged for commit) in two files, which means that the files were committed previously and I then edited them in my working tree, but I have not yet moved the edited versions into my staging area with git add.
I have one untracked file, which is a new file that I created in my working tree but have never added to the staging area.

Run git status as often as you’d like to keep track of the state of your work. We recommend running it before adding, committing, pushing, or pulling, and looking at the results to ensure that you are about to do what you think you are about to do. Note that running git status prompts you about things you might want to do next.
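The three local states described above (staged, not staged, and untracked) can be reproduced in a scratch repo. This is a minimal sketch under stated assumptions: the file names and the demo identity are hypothetical, and the --short switch, which is not otherwise used in this tutorial, prints one compact line per file instead of the full report.

```shell
demo_dir=$(mktemp -d)
cd "$demo_dir"
git init -q
git config user.name "Demo User"          # hypothetical identity, needed to commit
git config user.email "demo@example.com"

# Start from a commit that contains two files.
echo "a" > staged.txt
echo "b" > unstaged.txt
git add staged.txt unstaged.txt
git commit -q -m "Initial commit"

echo "edit" >> staged.txt
git add staged.txt                 # changed and added: ready to be committed
echo "edit" >> unstaged.txt        # changed but not added: not staged for commit
echo "new" > untracked.txt         # never added: untracked

summary=$(git status --short)
echo "$summary"
# "M  staged.txt"    (M in the first column: staged)
# " M unstaged.txt"  (M in the second column: not staged)
# "?? untracked.txt" (untracked)
```

The full git status report gives the same information with the helpful prompts about what to do next, so the compact form is mostly useful once you are comfortable reading the long one.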

git add

git add, followed by a directory or file name, moves the specified files from the working tree into the staging area. Of the add, commit, and push steps, git add is the only one where you name individual files and directories, so if you want to commit or push only some new or changed files and not others, you have to specify that at the moment when you add the files.

We commonly use git add in the following ways:

git add plus a filename adds a file to the staging area, so that it is ready to be committed.
git add plus a directory name adds all new content in the directory (and everything below it, if there are subdirectories) to the staging area. git add . (the dot means ‘current directory’) adds all new content in the current directory and below; if you are in the top-level directory of your repo, it adds all new content in the repo.
git add -A adds all new content (new files and changed files) anywhere in the local repo, regardless of where you are located. This differs from the preceding command, which only adds new content in the current working directory and below.

Using git add -A will ensure that you haven’t forgotten to add changes you want to add, but you don’t always want to add everything that you’ve changed at once (see below, under the discussion of commits, for details). Don’t get in the habit of using git add -A automatically; when you are about to use it, run git status first and verify that you want to stage (add) and commit all of your changes at once. If you accidentally add something you wanted to add only later, or not at all, you can remove it from the staging area before you commit; git status will tell you how.
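The difference between git add . and git add -A, and the way to unstage something added by mistake, can be tried out as follows. This is a sketch under assumptions: the directory and file names are hypothetical, git diff --cached --name-only is used here only as a compact way to list what is currently staged, and git restore --staged is the unstaging command that git status suggests in the example output earlier in this tutorial.

```shell
demo_dir=$(mktemp -d)
cd "$demo_dir"
git init -q
git config user.name "Demo User"          # hypothetical identity, needed to commit
git config user.email "demo@example.com"
echo "project" > README.txt
git add README.txt
git commit -q -m "Initial commit"

# New work in three places.
mkdir docs src
echo "readme" > docs/readme.txt
echo "code" > src/main.txt
echo "scratch" > scratch.txt

cd docs
git add .                                       # current directory and below only
after_dot=$(git diff --cached --name-only)      # docs/readme.txt

git add -A                                      # everything, wherever you are
after_all=$(git diff --cached --name-only)      # all three new files

cd ..
git restore --staged scratch.txt                # unstage a file added by mistake
after_restore=$(git diff --cached --name-only)  # docs/readme.txt and src/main.txt

echo "$after_restore"
```

After the last step scratch.txt is still in the working tree, unchanged; it has only been taken back out of the staging area.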

Note that:

You should create a .gitignore file in the main directory of your repo (it must have exactly this name, with the leading dot and no filename extension) and commit and push it. See How to use a .gitignore file. Your .gitignore should list at least .DS_Store (note the leading dot); this is an annoying macOS housekeeping file that can confuse Git about whether your files have changed, so you want to tell Git to ignore it.
To remove files or directories from your repo, precede the regular shell deletion command with git, so: git rm with the filename for a single file, and git rm -r with the directory name for a directory and everything in it (be careful with this last one!). There is no git rmdir command; Git does not track empty directories, so you can remove an empty directory with the plain shell rmdir command. If you forget to precede your command with the word git, you’ll remove the content from your local filesystem, but you won’t have told Git that you’ve removed it, much as when you create a new file you haven’t told Git about it until you run git add. This means that this type of deletion will be an unstaged change until you run git rm on the deleted file, so you might as well do that from the start to delete the file and notify Git with a single command.

Removing content from a repo in this way does not remove it from the Git history, and that means that you (or anyone who has cloned your repo) can restore it later. This is a feature; Git is designed to provide version control, and you want it to maintain a full history of every change you ever make to your repo. Should you need to remove something permanently and without a trace (for example, if you accidentally upload confidential information that should not have been shared), you need to rewrite your Git history, and not just delete the file, to make it irretrievable (ask your instructors for advice if you need to do this).
Because of the way Git tracks changes, you cannot create and commit empty directories. If you want to add a directory that will eventually hold content, but that doesn’t contain anything yet, create a small placeholder file and remove it later (with git rm) once you’ve added real content.

You can call a placeholder file whatever you want, and it can have any content, since you’re just putting it there to enable Git to see the directory, and you’re going to remove it later. The convention, though, is to create an empty (zero-byte) file called .gitkeep (with a leading dot). You can do that by running the command touch .gitkeep in the directory where you want to create it.
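The three notes above (ignoring .DS_Store, keeping an empty directory with .gitkeep, and removing a file with git rm) can be combined in one short session. This is a sketch under assumptions: the directory name images, the file name obsolete.txt, and the demo identity are hypothetical, and touch .DS_Store merely simulates the file that macOS would create.

```shell
demo_dir=$(mktemp -d)
cd "$demo_dir"
git init -q
git config user.name "Demo User"          # hypothetical identity, needed to commit
git config user.email "demo@example.com"

echo ".DS_Store" > .gitignore       # tell Git to ignore the macOS housekeeping file
touch .DS_Store                     # simulate the file that macOS creates

mkdir images
touch images/.gitkeep               # placeholder so that Git can see the directory

echo "old" > obsolete.txt
git add -A
git commit -q -m "Add .gitignore, images placeholder, and obsolete file"
tracked=$(git ls-files)             # .DS_Store is not listed: it was ignored
echo "$tracked"

git rm -q obsolete.txt              # delete the file and stage the deletion at once
git commit -q -m "Remove obsolete file"
tracked_after=$(git ls-files)
echo "$tracked_after"
```

Even though git add -A was used, .DS_Store never reaches the staging area, and the removed file remains recoverable from the history of the repo.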

git commit

We recommend committing by running git commit -m "commit message goes here", that is, always specifying the commit message as part of the command. We explain why below.

git commit moves all files in your staging area (you put them there with git add) into your local branch, that is, it makes them part of the history of the project. All commits must be accompanied by a commit message, typically a single line that describes what the changes do, possibly also with more detailed information. The quality of your commit message matters because Git keeps a history of all commits, and if you later want to undo a change, the commit message will help you find it. Stop now to read Chris Beams’s How to write a Git commit message.

When you run git commit you cannot specify that you want to commit some files and not others; git commit always commits everything in the staging area at once. The way you control which files to commit is with the add instruction; you should add only the files you want to commit together, with a shared commit message, and then run git commit to commit all of them. This is why you don’t want to run git add -A carelessly. If, for example, you work on two different features at once, you should add only the files from one feature, commit them with a meaningful message, and then add the files from the other and commit them separately, with a separate meaningful commit message.
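As an illustration of committing two features separately, the sketch below stages and commits one file at a time. The file names, commit messages, and demo identity are hypothetical, and git log --format=%s is used only to list the commit messages, newest first.

```shell
demo_dir=$(mktemp -d)
cd "$demo_dir"
git init -q
git config user.name "Demo User"          # hypothetical identity, needed to commit
git config user.email "demo@example.com"
echo "base" > index.txt
git add index.txt
git commit -q -m "Initial commit"

# Work on two unrelated features in the same session.
echo "search feature" > search.txt
echo "login feature" > login.txt

git add search.txt                     # stage only the first feature
git commit -q -m "Add search feature"

git add login.txt                      # then stage and commit the second
git commit -q -m "Add login feature"

history=$(git log --format=%s)
echo "$history"
```

Each feature now has its own entry in the history, so either one can later be found, and if necessary undone, without touching the other.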

If you run git commit with the -m switch, followed by text in quotation marks, that text becomes your commit message. This is the easiest way to add the message. If you run just git commit by itself, Git will open an editor so that you can compose your commit message separately. By default Git uses an editor called Vim, which can be confusing to new users, and that’s the editor that will open unless you’ve configured your instance of Git to use a different editor (if you can’t find instructions online that make sense about how to do this, ask your project mentor for help). If you accidentally forget the -m switch, then, you’ll find yourself in the Vim editing environment, and possibly confused about how to get out. To get out of Vim, hit the Escape key followed by a colon (your cursor should move to the bottom line when you do that), then a lower-case q, and then the Enter key (if Vim complains that you have unsaved changes, type q! instead of q). This should deposit you back at the command line and display a message that the commit has been aborted. You can then try again with the -m switch.
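One common way to change the editor is the core.editor setting. The sketch below assumes that you would rather use nano and that nano is installed on your machine; both are assumptions, and any editor you can launch from the command line could take its place. The --global switch applies the setting to all of your repos.

```shell
# Use nano (an assumed choice) whenever Git needs to open an editor.
git config --global core.editor "nano"

# Check what Git will now use.
configured_editor=$(git config --global --get core.editor)
echo "$configured_editor"
```

With this in place, forgetting the -m switch opens the editor you chose instead of Vim.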

The commit command also has an -a switch, which can be combined with the -m switch. The -a switch adds all changed files in the repo and commits them as a single operation, so if you use it, you don’t need to use git add first. git commit -a differs, though, from git add -A followed by git commit because git commit -a adds all changed files, but not any new files, that is, files that have never been committed (the technical term for these is untracked files, and you’ll see them referred to that way when you run git status). If you are tempted to use git commit -a, you should first run git status to verify that it will add all the files you want to add, and no others.
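The following sketch shows git commit -a leaving a new file behind. The file names and demo identity are hypothetical, and git status --short is used only to print a compact report.

```shell
demo_dir=$(mktemp -d)
cd "$demo_dir"
git init -q
git config user.name "Demo User"          # hypothetical identity, needed to commit
git config user.email "demo@example.com"
echo "v1" > tracked.txt
git add tracked.txt
git commit -q -m "Initial commit"

echo "v2" >> tracked.txt              # change to a file Git already tracks
echo "new" > brand-new.txt            # new file that has never been added

git commit -q -a -m "Update tracked file"

left_over=$(git status --short)       # "?? brand-new.txt": still untracked
echo "$left_over"
```

The change to tracked.txt is committed, but brand-new.txt stays untracked until you add it explicitly with git add.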

You can see a history of commits with git log. There are ways to customize the output of this command, and we won’t go into detail here, but if you need to revert a commit (that is, undo a change you committed), you can use git log to find the commit you want to revert by its commit message (see why meaningful commit messages matter?!) and then use the appropriate Git commands to revert it.
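One way to undo a committed change is git revert, which adds a new commit that reverses an earlier one while leaving the history intact. The sketch below is an illustration under assumptions, not the only approach: the file name, commit messages, and demo identity are hypothetical, --grep searches commit messages, and --no-edit accepts the default message so that no editor opens.

```shell
demo_dir=$(mktemp -d)
cd "$demo_dir"
git init -q
git config user.name "Demo User"          # hypothetical identity, needed to commit
git config user.email "demo@example.com"

echo "good" > data.txt
git add data.txt
git commit -q -m "Add data file"

echo "mistake" >> data.txt
git commit -q -a -m "Add experimental line"

git log --oneline                      # one line per commit, newest first

# Find the commit by its message, then revert it.
bad_commit=$(git log --format=%h --grep="experimental" -n 1)
git revert --no-edit "$bad_commit" > /dev/null

content=$(cat data.txt)                # "good": the experimental line is gone
latest_message=$(git log --format=%s -n 1)
echo "$latest_message"
```

The history now contains three commits, the last of which records the reversal, so a meaningful original message makes both the search and the resulting history easier to read.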

git pull

git pull does two things at once: it fetches changes from GitHub to your local version of the corresponding GitHub branch (your local tracking branch, i.e., remotes/origin/main) and it merges your local tracking branch into your local branch. This is how you get your teammates’ work into your local branch. You can fetch and merge separately, and you can execute other types of merges (that is, not only between branches that share a name), but most commonly in course projects git pull will do what you want.
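The two steps that git pull combines can be run separately to see what each one does. As before, this is a sketch under assumptions: a local bare repo called hub.git stands in for GitHub, and the directory names, file contents, and demo identity are hypothetical.

```shell
# Hypothetical identity so that commits work in a fresh environment.
export GIT_AUTHOR_NAME="Demo User" GIT_AUTHOR_EMAIL="demo@example.com"
export GIT_COMMITTER_NAME="Demo User" GIT_COMMITTER_EMAIL="demo@example.com"

sandbox=$(mktemp -d)
cd "$sandbox"
git init -q seed
(cd seed && echo "v1" > file.txt && git add file.txt &&
  git commit -q -m "Initial commit" && git branch -M main)
git clone -q --bare seed hub.git       # stand-in for GitHub
git clone -q hub.git mine
git clone -q hub.git teammate

# The teammate pushes a first change.
(cd teammate && echo "change 1" >> file.txt &&
  git commit -q -a -m "Teammate change 1" && git push -q origin main)

cd mine
git fetch -q origin                    # step 1: update remotes/origin/main
git merge -q origin/main               # step 2: merge it into my local main
after_two_steps=$(cat file.txt)

# The teammate pushes a second change.
(cd ../teammate && echo "change 2" >> file.txt &&
  git commit -q -a -m "Teammate change 2" && git push -q origin main)

git pull -q                            # both steps at once
after_pull=$(cat file.txt)
echo "$after_pull"
```

In both cases the local main branch ends up with the teammate’s work; git pull simply saves you from typing the fetch and the merge separately.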

When you pull you may get an error report about a merge conflict. Merge conflicts happen when the version of a file that you pull from GitHub differs from your local version and Git doesn’t know which version you want to keep as the result of the merge. Git can merge versions of the same file with different changes as long as the changes are on different lines, but because Git tracks changes at the level of lines, you’ll raise a merge conflict if the two versions have different changes on the same line. Here’s how to think about merge conflicts:

Most merge conflicts arise because of poor Git hygiene, that is, from failing to pull before every working session and push at the end of every working session. The more time that elapses between the pull and push activities that synchronize your local files with those on GitHub, the greater the opportunity for you and your teammates to edit the same lines in the same file and cause a merge conflict.
Even with proper Git hygiene, merge conflicts may arise through accidental bad timing. Should you encounter a merge conflict, don’t panic, but do ask your project mentor for guidance in resolving it. All developers encounter merge conflicts occasionally, and they are not difficult to recover from, but they are a nuisance, so you should be mindful about pushing and pulling regularly in order to avoid them as much as possible.
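It may help to see what a merge conflict looks like before you meet one in a real project. The sketch below deliberately creates one in a sandbox and then backs out of it with git merge --abort, which returns the local branch to its state before the merge. The bare repo hub.git (a stand-in for GitHub), the directory names, the file contents, and the demo identity are all assumptions made for illustration; resolving a real conflict is a separate topic, for which your project mentor can provide guidance.

```shell
# Hypothetical identity so that commits work in a fresh environment.
export GIT_AUTHOR_NAME="Demo User" GIT_AUTHOR_EMAIL="demo@example.com"
export GIT_COMMITTER_NAME="Demo User" GIT_COMMITTER_EMAIL="demo@example.com"

sandbox=$(mktemp -d)
cd "$sandbox"
git init -q seed
(cd seed && echo "original line" > file.txt && git add file.txt &&
  git commit -q -m "Initial commit" && git branch -M main)
git clone -q --bare seed hub.git       # stand-in for GitHub
git clone -q hub.git mine
git clone -q hub.git teammate

# Both of us change the same line; the teammate pushes first.
(cd teammate && echo "teammate version" > file.txt &&
  git commit -q -a -m "Teammate rewrites the line" && git push -q origin main)
cd mine
echo "my version" > file.txt
git commit -q -a -m "I rewrite the same line"

# Fetch and merge (the two steps that git pull combines).
git fetch -q origin
git merge origin/main > /dev/null || echo "Merge stopped: conflict in file.txt"

conflict_state=$(git status --short)   # "UU file.txt": changed on both sides
conflicted_text=$(cat file.txt)        # both versions, separated by marker lines
echo "$conflicted_text"

# Back out of the merge; my own commit is untouched.
git merge --abort
restored_text=$(cat file.txt)          # "my version"
```

While the conflict is unresolved, Git writes both versions into file.txt between lines that begin with <<<<<<<, =======, and >>>>>>>, which is how you can recognize a conflicted file.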
