Version Control with Git

SciencesPo Intro To Programming 2024

Florian Oswald and The Software Carpentry

11 October, 2024

Version What?

Question

  • What is Version Control and Why Should I Care?

Objectives

  • Understand the benefits of an automated version control system.
  • Understand the basics of how automated version control systems work.

Final.doc

“Piled Higher and Deeper” by Jorge Cham, http://www.phdcomics.com

Undo

  • The latest version is often best for text documents.
  • However, sometimes our view of best evolves. Then, we want to undo.
  • Undo means going back in history.
  • MS Word etc have track changes features.
  • Once you accepted the proposed changes of a collaborator, can you go back?
  • What about Dropbox-like solutions? (What is dropbox actually?)

Which Version: 20210611_draft.tex

Research team 👇 orders files by YYYYMMDD.

  • Hey, fixed that thing last week.
  • In 20220629-paper.tex?
  • Erm. Yes. No. I think 20211203-paper.tex - messed up the file name.
  • Ok, can you copy it into the latest version?
  • Sure. Damn, can’t find it anymore. I’ll just write it again. All in my head. 🤯

“True Story” by Florian Oswald

Which Version 2: Why is the sample size so small suddenly?

  • We had 800 observations, now 733. Why?
  • Erm…😱 No clue!
  • Well you must have changed the code.
  • Yes, I improved the code in several parts.
  • Well you have to find out what happened.
  • But that was weeks ago - I don’t remember! 😢

Hard Bugs

  • The hard bugs 🐛 are the ones you see only after a while.
  • See result today, error was introduced long ago.
  • You can rewind dropbox 30 days. What if… ?
  • Also, throw away 30 days of work?
  • 😱 😱 😱 😱

Setting Up Git

  • We all installed git.
  • Let’s setup our name
$ git config --global user.name "Your Name"
$ git config --global user.email "your@mail.com"
  • Line Endings on Windows:
git config --global core.autocrlf false

Creating a Git Repository

Question

  • Where does Git store information?

Objectives

  • Create a local repository
  • Describe purpose of .git directory

House Prices Project

  • Let’s create a project folder in our home to look at the house prices from last week.
$ cd    # going to home dir 
$ mkdir houseprices  # create directory
$ cd houseprices 
$ git init
  • Now the directory ~/houseprices is endowed with git version control.
  • What does that look like?

Where is Git?

  • Remember hidden files and folders?
$ ls -a 
./    ../   .git/
  • Git for this repository resides in .git

Danger Zone

  • If you delete that folder, the entire version control is GONE.
  • Be very careful that you really want to do that.

Tracking Changes with Git

Question

  • How do I record changes in Git?
  • How do I check the status of my version control repository?
  • How do I record notes about what changes I made and why?

Objectives

  • Understand the benefits of an automated version control system.
  • Understand the basics of how automated version control systems work.

Adding Code and Text

Note

  • Notice: The code we produce is text.
  • Remember what we learned about file endings.
  • Let’s add a shell script where we add our pipeline from last week.
  1. run to get the raw data again:
wget https://static.data.gouv.fr/resources/demandes-de-valeurs-foncieres/20240408-125738/valeursfoncieres-2023.txt

Adding Code and Text

  1. create a script
nano maketable.sh  # open nano
# type this:
cd ~/houseprices   # make sure we are in the right place
cut -d '|' -f 18 valeursfoncieres-2023.txt | sort | uniq -c | sort -r | head -n 10
# save and exit 
  1. (Does it work?)
ls .  # check the new file is there
./maketable.sh   # run it!
  1. No, it doesn’t. 😖
chmod +x ./maketable.sh  # add executable mode
ls -a
./maketable.sh

Viewing Changes

  • Ok, now let’s see what git makes of our additions to this directory.
floswald@PTL11077 ~/houseprices (main)> git status
On branch main

No commits yet

Untracked files:
  (use "git add <file>..." to include in what will be committed)
        valeursfoncieres-2023.txt
        maketable.sh
  • It is actually helpful not to use bash as a shell…
  • Customizing your shell is an extremely effective procrastination device.
  • You must know what shaving a Yak means before you walk out of my class.

Seeing the Difference

  • the command git diff shows you what changed between versions.
  • lets see what it shows now:
$ git diff
  • It shows nothing, i.e. an empty diff, because there are no commits yet to compare with.
  • Ok, let’s change that.

Modify-Add-Commit 1

  • git reports about untracked files. We need to decide what to track.
  1. Move files to staging area:
git add maketable.sh 
git status
  • Notice that I did not want to track the csv file.
On branch main

No commits yet

Changes to be committed:
  (use "git rm --cached <file>..." to unstage)
        new file:   maketable.sh

Untracked files:
  (use "git add <file>..." to include in what will be committed)
        valeursfoncieres-2023.txt

Modify-Add-Commit 2

  • Now, let’s record what is in the staging area.
$ git commit -m 'added the maketable script'

[main (root-commit) 9956506] added the maketable script
 1 file changed, 2 insertions(+)
 create mode 100644 maketable.sh
  • check status:
$ git status

On branch main
Untracked files:
  (use "git add <file>..." to include in what will be committed)
        valeursfoncieres-2023.txt

nothing added to commit but untracked files present (use "git add" to track)
  gasprices git:(main)  

Modify-Add-Commit 3

  • Let’s check what’s in the log.
$ git log 

commit 9956506dc3159403b87aea3b04654c293e82c680 (HEAD -> main)
Author: Florian Oswald <florian.oswald@gmail.com>
Date:   Tue Feb 7 10:50:51 2023 +0100

    added the maketable script

Modify-Add-Commit 4

  • Now let’s modify the script finally.
$ nano maketable.sh

# add this line on top 
echo hello user, will make a contigency table now.
# save and exit
  • now - what’s the difference in the repo?

Diffing

  • there are still the same files here:
$ ls 
valeursfoncieres-2023.txt  maketable.sh
  • But we can now compare versions:
$ git diff 

diff --git a/maketable.sh b/maketable.sh
index 7e01058..3b7007e 100644
--- a/maketable.sh
+++ b/maketable.sh
@@ -1,2 +1,3 @@
+echo hello user, will make a contigency table now.
 cd ~/valeursfoncieres-2023.txt   # make sure we are in the right place
 cut -d ';' -f 5 valeursfoncieres-2023.txt | tr [:lower:] [:upper:] | sort | uniq -c | sort

Commiting Changes Again

  • let’s first check everything runs
$ ./maketable.sh
  • good. commit!
$ git add maketable.sh 
$ git commit -m 'added message to user'

Adding a README

  • Good. Now let’s add a README file.
  • It’s customary to write this in markdown
$ nano README.md 

write this in nano and save when done.

# Gas Prices

This repo contains code to analyse gas prices at French gas stations.
  • add to staging area, so we can take a snapshot
$ git add README.md 
$ git commit -m 'added readme'

What is this Staging Area?

  • git is like a fotographic camera.
  • before you take a picture of your friends, you need to arrange them somehow, so that all fit, and so that all 😁.
  • You put them on stage. Same for files in your repo.

figure from software carpentry

What is this Staging Area?

I took that picture at CDG airport

Looking at History

Question

  • How can I identify old versions of files?
  • How do I review my changes?
  • How can I recover old versions of files?

Objectives

  • Explain what the HEAD of a repository is and how to use it.
  • Identify and use Git commit numbers.
  • Compare various versions of tracked files.
  • Restore old versions of files.

The most recent version: HEAD

  • Let’s change the maketable.sh script again:
$ nano maketable.sh
echo program run successfully
# save exit

$ git add maketable.sh 
  • The most recent version of our repo is called HEAD.
$ git diff # compares entire repo to HEAD 
$ git diff HEAD maketable.sh  

Whoops, typo

  • Oh no, we wrote program run successfully. That should be ran not run.
  • What now?
  • we have not committed this yet!
  • we can just get back the version in HEAD, and edit again:
$ git restore maketable.sh
$ git checkout maketable.sh # also works
  • edit the script, add and commit.

How to get a specific version

  • What if you want something else than HEAD?
  • like, the first version of maketable.sh?
  • look at history:
$ git log --oneline --graph

* a6f023b (HEAD -> main) added readme
* 9956506 added the maketable script
  • The 9956506 is the unique identifier of that version.
  • We can go back to that version:
$ git checkout 9956506 maketable.sh 

Key Points

  • git diff displays differences between commits.
  • git checkout recovers old versions of files.

So, how does this thing work?

software carpentry image.

Version Control with VScode

  • Download Visual Studio Code
  • Start
  • Open folder ~/gasprices
  • check version control tab on the left.

Version Control with RStudio

  • top right click on new project
  • Select existing directory
  • Select ~/gasprices
  • checkout out the git tab in Rstudio!

Collaborating with Git on GitHub

  • Create repo
  • copy ssh remote URL
  • connect local to remote repo

SSH connections

  • Secure Shell Protocol
  • Private-Public key pair. It’s like a lock, and you have the only key.
  • Let’s check if you have one already!
ls -la ~/.ssh 

if error, create one:

ssh-keygen -t ed25519 -C "your@email.com"

press enter (no passphrase)

check

ls -la ~/.ssh 

Communicate with GitHub Remote

  • Let’s ping the remote server at GitHub now.
ssh -T git@github.com
  • right, of course Github doesn’t have our public key yet (the lock for our key!)

  • copy from your terminal

cat ~/.ssh/id_ed25519.pub  # or your *.pub 
  • Go to github.com, click top right corner, settings, SSH keys.

Adding a Remote to your local Repo

  • Now that we can talk to Github.com, let’s add the remote to our local repo.
  • We add a remote by getting the SSH url from the repository (green button) online.
$ git remote add origin git@github.com:YOUR_USER/YOUR_REPO.git
  • origin is the name of the remote server. your choice, but origin is common.
  • this should set that remote both for sending and retrieving stuff from the repo. pull and push, in git language:
$ git remote --v

Pushing It

  • Now we can push our local repository to the remote repo.
  • There will be a full copy of what is in .git (i.e., the entire history of the repo) on that remote machine.
  • You will be able to use it like a central backup location for your work.
$ git push -u origin main
  • the -u flag sets the main branch as default upstream branch to track.

Branching It

  • Next to different versions of a file/directory over time, we can have versions evolving in parallel.
  • Imagine development history branching off into 2 separate directions at one point.
  • They may converge at some point again, but maybe one of them will turn out a failure and we drop it.
  • Branches are hugely useful to organize team work.
$ git checkout -b testing # checkout repo on new branch `testing`
Switched to a new branch 'testing'
  • Now can develop stuff on the testing branch.
  • Later on, we can merge it back into main if we like it.

The Full Picture(s)

picture from @MarkLodato - click for more!

Pushing Branches to GitHub

  • Once you created a local branch you can of course copy (push) it to your remote to share with others.
  • you would amend the push command:
# make sure you are on the desired branch 
$ git branch 
  main
* testing

$ git push origin testing