HOW TO: GitHub for Version Control

[Written by Graham Cooper on 2019/01/15]

Table of contents

  1. Version Control in GitHub
  2. Set up Git and GitHub
  3. Git Commands
  4. Links
  5. Literature

Version Control in GitHub

Reproducibility means that anyone (including you in six months’ time!) can take your data and obtain the same results and tables that you originally generated. Problems with reproducibility, and ways to overcome them, have been widely discussed in recent years, particularly in the behavioral sciences. One major obstacle to reproducible research is the lack of good data curation, i.e. storing everything in a sensible place and keeping track of the changes that various collaborators make to files. This is a particular challenge for us as researchers: we have (generally) been taught to collect and analyze data to a high level of skill, but have received next to no training in how to curate that data once it exists.

Git is a popular version control system that enables this kind of curation, and GitHub is an online service for hosting Git repositories. Both are widely used in software engineering, where data management skills are taught to a gold standard. Using Git and GitHub, you have one space where all your relevant documents can be stored. Every committed version of your documents is saved, so you can easily go back and check previous versions without searching through endless files named “manuscript_draft_version_16_FINAL_draft_ACTUAL_FINAL_DRAFT.doc”, for example! It also makes collaboration easier, as everyone can work on the same document at once and merge their versions while keeping track of who did what. At the end of the project it also facilitates sharing of data and code, as you can make a private repository public and share a link that everyone can access.

Currently GitHub allows users to have unlimited private repositories!

See Vuorre & Curley (2018) for a more detailed tutorial on Git and GitHub, including how to set them up. A link to the online PDF is available in the Links section.

Git Commands

Below is a list of common commands for Git and what they are used for.

git init

Initializes an empty Git repository in the current directory

git status

Shows which files have been changed, staged, or are untracked since the last “commit”

git add <file>

Stages the changes in the named file so that they are included in the next commit

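git commit -m "added file"

Commits all staged changes (i.e. records them as a new version in the repository's history). The text after the -m flag is the commit message that describes the changes you made. Use plain straight quotes around the message, as curly quotes are not treated as quotes by the shell.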
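
To see how these commands fit together, here is a minimal sketch of a first commit, run in a terminal. The temporary directory, the file name data.csv, and the name and email address are placeholders chosen for illustration and are not part of the original tutorial; substitute your own.

```shell
# Work in a throwaway directory so that no existing files are touched
# (illustrative; in practice you would cd into your project folder).
repo_dir="$(mktemp -d)"
cd "$repo_dir"

# Create the repository.
git init

# Tell Git who you are, for this repository only (placeholder values).
git config user.name "Example Researcher"
git config user.email "researcher@example.com"

# Create a file (hypothetical name), then inspect the state of the repository.
echo "participant,score" > data.csv
git status    # data.csv is reported as untracked

# Stage the file, then record it with a commit message.
git add data.csv
git commit -m "added file"

# List the recorded commits.
git log --oneline
```

After the last command, git log should list a single commit with the message "added file". Running git status again at that point should report that there is nothing left to commit.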

Links

Literature

  1. Vuorre, M. & Curley, J. P. (2018) Curating Research Assets: A Tutorial on the Git Version Control System. Advances in Methods and Practices in Psychological Science, 1(2).