Git version control for collaborating
Setup
Software
We will use Git inside a command-line shell called Bash.
Installation instructions are available on this page.
Material
Git??
If you need to collaborate on a project, a script, some code or a document, there are a few ways to operate. Sending a file back and forth and taking turns is not efficient; a cloud-based office suite requires a connection to the Internet and doesn't usually keep a clean record of contributions.
Version control allows users to:
- record a clean history of changes;
- keep track of who did what;
- go back to previous versions;
- work offline; and
- resolve potential conflicts.
Programmers use version control systems to collaboratively write code all the time, but it isn’t just for software: books, papers, small data sets, and any text-based file that changes over time or needs to be shared can be stored in a version control system.
A version control system is a tool that keeps track of changes for us, effectively creating different versions of our files. It allows us to decide which changes will be made to the next version (each record of these changes is called a commit), and keeps useful metadata about them. The complete history of commits for a particular project and their metadata make up a repository. Repositories can be kept in sync across different computers, facilitating collaboration among different people.
Configuring Git
On a command line, Git commands are written as git verb, where verb is what we actually want to do.
Before we use Git, we need to configure it with some defaults, like our credentials and our favourite text editor. For example:
git config --global user.name "Vlad Dracula"
git config --global user.email "vlad@tran.sylvan.ia"
This user name and email will be associated with your subsequent Git activity, which means that any changes pushed to gerrit, GitLab, GitHub, BitBucket or another Git repository host in the future will include this information. To facilitate collaboration, use the same email address as you use on Bugzilla and gerrit.
git config --global core.editor "nano -w"
git config --list
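To read back a single setting rather than the whole list, git config --get can be used. The following is a minimal sketch, not part of the original workshop: it runs in a scratch repository (the temporary directory is an assumption for illustration) and sets the values without --global, so they apply to that repository only and your global settings stay untouched.

```shell
# Scratch repository so that global settings are not modified
scratch=$(mktemp -d)
git init -q "$scratch"
cd "$scratch"

# Without --global, the setting is stored for this repository only
git config user.name "Vlad Dracula"
git config user.email "vlad@tran.sylvan.ia"

# Read back single values instead of listing everything
name=$(git config --get user.name)
email=$(git config --get user.email)
echo "$name <$email>"
```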
You can always find help about git with the --help flag:
git --help
git config --help
Creating a repository
First, let's make sure we're in the right directory. We can check the directory using the pwd command, and then change directory using the cd command. On Windows we can change to our default home directory like so:
cd /c/Users/<yourusername>
We can use ls to get a list of everything that is in our current directory.
Now, let's create a directory for our project and move into it:
mkdir planets
cd planets
This is the same as creating a new folder.
Then we tell Git to make planets a repository, a place where Git can store versions of our files:
git init
Using the ls command won't show anything new, but adding the -a flag will show the hidden files and directories too:
ls -a
Git created a hidden .git directory to store information about the project (i.e. everything inside the directory where the repository was initiated).
Now that we've initialised the git repository, we can start using commands to manage versions. We can now check the status of our project with:
git status
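As a quick sanity check, the steps above can be replayed in a throwaway location. This is only a sketch: the temporary parent directory is an assumption for illustration, and git rev-parse --is-inside-work-tree is an extra command, not used elsewhere in this workshop, that reports whether the current directory belongs to a repository.

```shell
# Create the project directory under a temporary parent (illustrative location)
project=$(mktemp -d)/planets
mkdir "$project"
cd "$project"

git init -q

# The hidden .git directory is what makes this directory a repository
ls -a

# Ask Git directly whether we are inside a repository
inside=$(git rev-parse --is-inside-work-tree)
echo "inside a work tree: $inside"
```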
Tracking changes
How do we record changes and make notes about them?
You should still be in the planets directory, which you can check with the pwd command.
Let's create a new text file that contains some notes about the Red Planet’s suitability as a base. We'll use the nano text editor:
nano mars.txt
Type the following text into it:
Cold and dry, but everything is my favorite colour
Write out with Ctrl+O and exit nano with Ctrl+X.
We can now use ls to check that the file has been created.
You can also check the contents of your new file with the cat command:
cat mars.txt
Now, check the status of our project:
git status
Git noticed there is a new file. The “Untracked files” message means that there’s a file in the directory that Git isn’t keeping track of. We can tell Git to track a file using git add:
git add mars.txt
You may get a note saying warning: LF will be replaced by CRLF in mars.txt.
This highlights the difference in how Linux/Unix-based systems and Windows mark the end of a line (LF versus CRLF), which can otherwise be recorded as a change when a file is edited on different operating systems. We can safely ignore this warning and let Git handle this automatically.
Now we can use git status again to see what happened:
git status
Git now knows that it's supposed to keep track of mars.txt, but it hasn’t recorded these changes as a commit yet. To get it to do that, we need to run one more command:
git commit -m "Start notes on Mars as a base"
When we run git commit, Git takes everything we have told it to save by using git add and stores a copy permanently inside the special .git directory. This permanent copy is called a commit (or revision) and it is given an identifier (that can be shortened to a few characters).
We use the -m flag (for "message") to record a short descriptive comment that will help us remember what was done and why.
If we run git status now:
git status
... we can see that the working tree is clean.
To see the recent history, we can use git log:
git log
git log lists all commits made to a repository in reverse chronological order. The listing for each commit includes the commit’s full identifier (which starts with the same characters as the short identifier printed by the git commit command earlier), the commit’s author, when it was created, and the log message Git was given when the commit was created.
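The whole cycle so far (create a file, add it, commit it, read the log) can be sketched as one script. It runs in a scratch repository, which is an assumption for illustration, and uses the more compact --short and --oneline output variants instead of the full listings described above.

```shell
# Scratch repository with a local identity (illustrative setup)
repo=$(mktemp -d)
cd "$repo"
git init -q
git config user.name "Vlad Dracula"
git config user.email "vlad@tran.sylvan.ia"

echo "Cold and dry, but everything is my favorite colour" > mars.txt
git status --short     # "?? mars.txt": the file is untracked

git add mars.txt
git status --short     # "A  mars.txt": the file is staged

git commit -q -m "Start notes on Mars as a base"
git log --oneline      # one line per commit: short identifier and message

# Count the commits reachable from the current one
commits=$(git rev-list --count HEAD)
echo "commits so far: $commits"
```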
Now, let's add a line to our text file:
nano mars.txt
After writing out and saving, let's check the status:
git status
We have changed this file, but we haven’t told Git we will want to save those changes (which we do with git add) nor have we saved them (which we do with git commit). So let’s do that now. It is good practice to always review our changes before saving them. We do this using git diff. This shows us the differences between the current state of the file and the most recently saved version, which is also useful if we can't remember what we changed in the file since the last commit:
git diff
There is quite a bit of cryptic-looking information in there: it contains the command used to compare the files, the names and identifiers of the files, and finally the actual differences. The + sign indicates which line was added.
It is now time to commit it:
git commit -m "<your comment>"
That didn't work, because we forgot to use git add first. Let's fix that:
git add mars.txt
git commit -m "<your comment>"
Using git add allows us to select which changes are going to make it into a commit, and which ones won't. It sends them to what is called the staging area. In a way, git add specifies what will go in a snapshot (putting things in the staging area), and git commit then actually takes the snapshot.
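To make the role of the staging area concrete, here is a small sketch in a scratch repository (the directory, file contents and commit messages are assumptions for illustration). It shows a commit being refused while nothing is staged, and uses git diff --staged, an option not covered above, to inspect what is waiting in the staging area.

```shell
# Scratch repository with a local identity (illustrative setup)
repo=$(mktemp -d)
cd "$repo"
git init -q
git config user.name "Vlad Dracula"
git config user.email "vlad@tran.sylvan.ia"

echo "line one" > mars.txt
git add mars.txt
git commit -q -m "First commit"

echo "line two" >> mars.txt

# Nothing is staged yet, so Git refuses to create a commit
git commit -q -m "Second commit" || echo "nothing staged, commit refused"

git diff --stat            # the change is visible as an unstaged difference
git add mars.txt
git diff --staged --stat   # the same change, now waiting in the staging area

git commit -q -m "Second commit"
commits=$(git rev-list --count HEAD)
echo "commits so far: $commits"
```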
Challenge 1
The staging area can hold changes from any number of files that you want to commit as a single snapshot.
- Add some text to mars.txt noting your decision to consider Venus as a base
- Create a new file venus.txt with your initial thoughts about Venus as a base for you and your friends
- Add changes from both files to the staging area, and commit those changes as one single commit.
Adding and committing multiple files:
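One possible solution is sketched below. The scratch repository, the example sentences and the commit messages are assumptions for illustration; in the workshop you would run only the last few commands inside your existing planets directory. git diff-tree is used here purely to list the files recorded in the newest commit.

```shell
# Scratch repository standing in for the planets project (illustrative setup)
repo=$(mktemp -d)
cd "$repo"
git init -q
git config user.name "Vlad Dracula"
git config user.email "vlad@tran.sylvan.ia"
echo "Cold and dry, but everything is my favorite colour" > mars.txt
git add mars.txt
git commit -q -m "Start notes on Mars as a base"

# The challenge itself: change one file, create another
echo "Maybe I should start with a base on Venus." >> mars.txt
echo "Venus is too hot to be suitable as a base" > venus.txt

# Stage both files, then record them as one single commit
git add mars.txt venus.txt
git commit -q -m "Write plans to start a base on Venus"

# List the files touched by the latest commit
files=$(git diff-tree --no-commit-id --name-only -r HEAD)
echo "$files"
```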
Exploring history
How can we identify old versions of files, review changes and recover old versions?
As we saw in the previous lesson, we can refer to commits by their identifiers. You can refer to the most recent commit of the working directory by using the identifier HEAD.
Let's add a line to our file:
nano mars.txt
We can now check the difference with the head:
git diff HEAD mars.txt
With nothing in the staging area, this shows the same result as git diff mars.txt. What is useful is that we can refer to previous commits, for example the commit before HEAD:
git diff HEAD~1 mars.txt
Similarly, git show can help us find out what was changed in a specific commit:
git show HEAD~2 mars.txt
We can also use the unique 7-character identifiers that were attributed to each commit:
git diff XXXXXXX mars.txt
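The relationship between HEAD, HEAD~1, HEAD~2 and the short identifiers can be explored with a small sketch. The scratch repository and the three "note" commits are assumptions for illustration; git rev-parse --short is used to look up the identifier that would take the place of XXXXXXX.

```shell
# Scratch repository with three commits (illustrative setup)
repo=$(mktemp -d)
cd "$repo"
git init -q
git config user.name "Vlad Dracula"
git config user.email "vlad@tran.sylvan.ia"
for n in 1 2 3; do
    echo "note $n" >> mars.txt
    git add mars.txt
    git commit -q -m "Add note $n"
done

git log --oneline

# HEAD~1 is the commit before HEAD: the diff shows the line added last
git diff HEAD~1 mars.txt

# Look up the short identifier of the commit two steps back and use it directly
first=$(git rev-parse --short HEAD~2)
git diff "$first" mars.txt

# The message of that commit confirms which one HEAD~2 points to
subject=$(git log -1 --format=%s HEAD~2)
echo "HEAD~2 is: $subject"
```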
How do we restore older versions of our file?
Overwrite your whole text with one single new line:
nano mars.txt
git diff
We can put things back the way they were by using git checkout:
git checkout HEAD mars.txt
cat mars.txt
git checkout checks out (i.e., restores) an old version of a file. In this case, we’re telling Git that we want to recover the version of the file recorded in HEAD, which is the last saved commit. If we want to go back even further, we can use a commit identifier instead:
git log -3
git checkout XXXXXXX mars.txt
cat mars.txt
git status
Notice that the changes are in the staging area. Again, we can put things back the way they were by using git checkout:
git checkout HEAD mars.txt
cat mars.txt
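The restore step can be checked end to end with a short sketch. The scratch repository and the "oops" content are assumptions for illustration.

```shell
# Scratch repository with one committed file (illustrative setup)
repo=$(mktemp -d)
cd "$repo"
git init -q
git config user.name "Vlad Dracula"
git config user.email "vlad@tran.sylvan.ia"
echo "Cold and dry, but everything is my favorite colour" > mars.txt
git add mars.txt
git commit -q -m "Start notes on Mars as a base"

# Accidentally overwrite the whole file
echo "oops" > mars.txt
git diff --stat

# Recover the version recorded in the last commit
git checkout HEAD mars.txt
restored=$(cat mars.txt)
echo "$restored"
```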
Challenge 2
Jennifer has made changes to the Python script that she has been working on for weeks, and the modifications she made this morning “broke” the script and it no longer runs. She has spent more than an hour trying to fix it, with no luck…
Luckily, she has been keeping track of her project’s versions using Git! Which commands below will let her recover the last committed version of her Python script called data_cruncher.py?
1. git checkout HEAD
2. git checkout HEAD data_cruncher.py
3. git checkout HEAD~1 data_cruncher.py
4. git checkout <unique ID of last commit> data_cruncher.py
5. Both 2 and 4
Recap
- git config: configure git
- git init: initialise a git repository here
- git status: see information about current state of the repository
- git add: add a change from a file (or several) to the staging area
- git commit -m "...": commit a change (or several) to our history with a description
- git log: see history
- git show: show changes in one commit for one file
- git checkout: roll back to previous version
- git diff: difference between file on disk and commit in repository
Ignoring things
How can I tell git to ignore things?
Sometimes, we don't want git to track files like automatic backup files, intermediate files created during an analysis, or a scratchpad file. If we don't tell Git to ignore them, it will keep pestering us about them being "untracked".
Say you create a bunch of .dat files like so:
touch a.dat b.dat c.dat
git status
If you don't want to track them, create a .gitignore file:
nano .gitignore
... and add the following line to it:
*.dat
That will make sure no file finishing with .dat will be tracked by git.
git status
git add .gitignore
git commit -m "Ignore data files"
git status
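The effect of the ignore rule can be verified with a sketch like the following, run in a scratch repository (an assumption for illustration). It writes the .gitignore file with echo instead of nano, and uses git check-ignore, a command not covered above, to show which rule matches a given file.

```shell
# Scratch repository (illustrative setup)
repo=$(mktemp -d)
cd "$repo"
git init -q
git config user.name "Vlad Dracula"
git config user.email "vlad@tran.sylvan.ia"

touch a.dat b.dat c.dat
echo "*.dat" > .gitignore

git status --short          # only .gitignore is listed as untracked
git check-ignore -v a.dat   # reports the rule that causes a.dat to be ignored

git add .gitignore
git commit -q -m "Ignore data files"

# Nothing is left to report, even though the .dat files still exist
remaining=$(git status --porcelain)
echo "remaining changes: '$remaining'"
```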
Branches
Just like a tree, your git repository can have branches. In software development, it makes sense to work on different branches that diverge to some extent, for example a stable branch vs an experimental branch, or a branch entirely dedicated to developing a new feature. In the latter case, once the feature is considered good enough to be integrated in the main branch, it can be merged into it.
Let's see how we can create a new branch and navigate between branches.
First, create a new branch called "back-to-earth":
git branch back-to-earth
This creates a new branch, but does not move you to that other branch. You can list existing branches with:
git branch
To start working on that new branch, use checkout:
git checkout back-to-earth
Now, changes committed to the history will pertain to that specific branch:
touch maybe-we-should-stay.md
git add maybe-we-should-stay.md
git commit -m "consider making do with what we have"
One way to visualise the repository's structure with a colourful graph is:
git log --all --oneline --decorate --graph
... which is more interesting once the branches diverge.
If the repository is very complex, like the LibreOffice core repository, you can omit individual commits to focus on the branch points:
git log --all --decorate --oneline --graph --simplify-by-decoration
To quickly create a new branch, switch to it and make sure it tracks the right remote, you can use:
git checkout -b <a_local_branch_name_of_your_choice> origin/master
For LibreOffice development, we recommend working on patches in this way, so it is then trivial to go back to a vanilla main branch or work on another, independent patch at the same time. One branch per patch, which can be kept as long as it is being reviewed on gerrit.
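The branch workflow described in this section, including the final merge back into the main branch, can be sketched as follows. This is an illustration under assumptions: it runs in a scratch repository, it asks Git for the name of the default branch (which may be master or main depending on the setup), and it uses git merge, which this workshop mentions only in passing.

```shell
# Scratch repository with one commit on the default branch (illustrative setup)
repo=$(mktemp -d)
cd "$repo"
git init -q
git config user.name "Vlad Dracula"
git config user.email "vlad@tran.sylvan.ia"
default_branch=$(git symbolic-ref --short HEAD)   # "master" or "main"
echo "Cold and dry" > mars.txt
git add mars.txt
git commit -q -m "Start notes on Mars as a base"

# Create the branch and switch to it in one step
git checkout -q -b back-to-earth
touch maybe-we-should-stay.md
git add maybe-we-should-stay.md
git commit -q -m "consider making do with what we have"

# Back on the default branch, the new file is not there yet
git checkout -q "$default_branch"
ls

# Integrate the work from the other branch
git merge -q back-to-earth
git log --all --oneline --decorate --graph
current=$(git rev-parse --abbrev-ref HEAD)
```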
Collaborating with remotes
How do I share my changes with others on the web? How do we collaborate on the same codebase?
Version control really becomes extra useful when we begin to collaborate with other people. We already have most of the machinery we need to do this; the only thing missing is to copy changes from one repository to another.
Remotes are other repositories hosted elsewhere, for example the central repository for the core of LibreOffice.
There are many places where git repositories are hosted. GitHub, owned by Microsoft, is the most popular one currently. However, there are many alternatives such as GitLab, BitBucket, Gitea, and GitBucket. LibreOffice repositories are hosted on git.libreoffice.org.
For this workshop's purpose, and to experiment with a remote, we'll clone locally the repository where this workshop's material is hosted, on Codeberg: https://codeberg.org/stragu/libocon-2023-workshop
Cloning
How do we make a local copy of a repository hosted elsewhere?
Go to https://codeberg.org/stragu/libocon-2023-workshop and copy the URL next to the HTTPS button. (Or feel free to use whatever repository you own elsewhere!)
Now, in your terminal, first make sure you are not located inside a git repository. Git repositories are not matryoshka dolls!
For example, move out of the current directory:
cd ..
Then, clone the remote repository:
git clone https://codeberg.org/stragu/libocon-2023-workshop.git
You now have a local copy of the repository in the libocon-2023-workshop directory.
You can now commit changes to the repository locally, like you did with our first repository.
Pulling
What if the remote repository has new commits since you cloned it? You need to pull those changes from the remote:
git pull
If you collaborate on a remote repository, remember to pull before working!
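Cloning and pulling can be practised without a network connection by letting a bare repository on your own disk play the role of the remote. The sketch below rests on that assumption; the directory names (seed, remote.git, first, second) and file contents are made up for illustration, and the identity is provided through environment variables so that it applies to every repository in the script.

```shell
# Identity used for every commit in this sketch
export GIT_AUTHOR_NAME="Vlad Dracula" GIT_AUTHOR_EMAIL="vlad@tran.sylvan.ia"
export GIT_COMMITTER_NAME="Vlad Dracula" GIT_COMMITTER_EMAIL="vlad@tran.sylvan.ia"

work=$(mktemp -d)

# Build a small repository and publish it as a bare "remote"
git init -q "$work/seed"
cd "$work/seed"
echo "# Workshop notes" > README.md
git add README.md
git commit -q -m "initial commit"
git clone -q --bare "$work/seed" "$work/remote.git"

# Two collaborators clone the remote
git clone -q "$work/remote.git" "$work/first"
git clone -q "$work/remote.git" "$work/second"

# The first collaborator commits and pushes
cd "$work/first"
echo "personal notes" > course-notes.md
git add course-notes.md
git commit -q -m "added personal course notes"
git push -q

# The second collaborator pulls before starting to work
cd "$work/second"
git pull -q
ls
```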
Pushing
If you have edit rights on the remote (which can be managed by the owner of the remote, directly on the website), you can then add, commit, and push your change to the remote:
nano course-notes.md
git add course-notes.md
git commit -m "added personal course notes"
git push
We didn't have to create a remote called origin, or set the default upstream: that was done by default by Git when cloning the repository.
However, if you do it the other way around and start with a local repository to then connect it to a newly created remote, you would have to set that up. We do this by making the remote repository a remote for the local repository. Copy the remote's URL, and in your local repository, run the following command:
git remote add origin https://domain.xyz/<your_username>/remote-repo.git
The name origin is a local nickname for your remote repository. We could use something else if we wanted to, but origin is by far the most common choice.
Depending on where a repository was created, and with what version of Git, the default branch might be called "master" or "main". You might note that your local default branch is called master, and the remote's is main. If that's the case, you can change the branch name to main with the next line of code:
git branch -M main
Now, we can push our changes from our local repository to the remote. Try this:
git push
Git does not know where it should push by default. See the suggested command in the error message? We can set the default remote with a shorter version of that:
git push -u origin main
We only need to do that once: from now on, Git will know that the default is the origin remote and the main branch.
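The "local repository first, remote second" route can be rehearsed offline. In this sketch, a bare repository in a temporary directory stands in for the hosted remote (an assumption: with a real host you would use its URL instead of the local path), and the last command asks Git which upstream the main branch now tracks.

```shell
work=$(mktemp -d)

# Stand-in for the newly created, empty remote repository
git init -q --bare "$work/remote-repo.git"

# Local repository with one commit (illustrative content)
git init -q "$work/planets"
cd "$work/planets"
git config user.name "Vlad Dracula"
git config user.email "vlad@tran.sylvan.ia"
echo "Cold and dry, but everything is my favorite colour" > mars.txt
git add mars.txt
git commit -q -m "Start notes on Mars as a base"

# Make sure the local branch is called main, then connect and push
git branch -M main
git remote add origin "$work/remote-repo.git"
git push -q -u origin main

git remote -v

# The default upstream is now recorded, so a plain "git push" will work
upstream=$(git rev-parse --abbrev-ref 'main@{upstream}')
echo "main tracks: $upstream"
```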
Note on GitHub authentication
GitHub, contrary to other platforms, might now require you to create a Personal Access Token. Here's how:
- Click on your avatar in the top right of GitHub.com
- Click settings
- Scroll to the bottom and, on the left, click Developer settings.
- Click Personal access tokens (either type is fine)
- Click Generate new token (either is fine, classic is simpler)
- You may need to authenticate yourself using TFA (you can also choose to use your password)
- You can select what you need to be able to edit. If you've chosen classic, you can skip this and scroll to the bottom to Generate token.
- Make sure you copy this token immediately and save it somewhere so you can reuse it.
You can now see on GitHub that your changes were pushed to the remote repository.
Pushing for LibreOffice
In this workshop, we don't practice pushing to the LibreOffice core repository, as we use an extra tool to review patches: gerrit. You will learn about gerrit and how to push a patch to LibreOffice in the next session.
However, you can already explore the available repositories. For the core repo:
git clone https://git.libreoffice.org/core
But there are many more. For example, to contribute to documentation:
git clone https://git.libreoffice.org/help
Now explore the full list of repositories.
Conflicts
What do I do when changes conflict with someone else's?
As soon as people can work in parallel, they’ll likely step on each other’s toes. This will even happen with a single person: if we are working on a piece of software on both our personal laptop and a work computer, we could make different changes to each copy, and forget to pull changes before working. Version control helps us manage these conflicts by giving us tools to resolve overlapping changes.
To see how we can resolve conflicts, make a change to the title of the git.md file, and commit that change.
nano git.md
git add git.md
git commit -m "use better title"
Another contributor made a change to the repository at the same time, in the same file, at the same location! How unlucky.
Try to pull the changes from the remote and see what Git tells you:
git pull
You should see something like:
Auto-merging git.md
CONFLICT (content): Merge conflict in git.md
Automatic merge failed; fix conflicts and then commit the result.
If you use git status, you will see more information:
Your branch and 'origin/main' have diverged,
and have 1 and 1 different commits each, respectively.
(use "git pull" to merge the remote branch into yours)
You have unmerged paths.
(fix conflicts and run "git commit")
(use "git merge --abort" to abort the merge)
Unmerged paths:
(use "git add <file>..." to mark resolution)
both modified: git.md
We now have to resolve the conflict as directed. Git merges some overlapping work automatically when the changes don't affect the same part of the file, but other conflicts have to be fixed by hand, like here: the output mentions "Automatic merge failed".
It is now up to us to fix the conflict:
nano git.md
Our change is preceded by <<<<<<< HEAD. Git has then inserted ======= as a separator between the conflicting changes and marked the end of the content downloaded from the remote with >>>>>>>. (The string of letters and digits after that marker identifies the commit we’ve just downloaded.)
We can now fix the conflict, add and commit the merge to the local repo:
git add git.md
git commit -m "merge conflicting edits to title"
(and push if we have the necessary rights!)
See how the log lists both commits and the merge commit:
git log
The other way around, Git can reject a push because it detects that the remote repository has new updates that have not been incorporated into the local branch. What we have to do is:
- pull the changes from GitHub,
- merge them into the copy we’re currently working in, and then
- push it all.
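The conflict scenario in this section can be reproduced offline, which is handy for practising the resolution steps. The sketch below makes several assumptions: a local bare repository plays the remote, the directories ours and theirs play the two contributors, the titles are invented, and the pull uses --no-rebase to ask explicitly for a merge, because recent Git versions may otherwise refuse to pull diverged branches until a strategy is configured.

```shell
# Identity used for every commit in this sketch
export GIT_AUTHOR_NAME="Vlad Dracula" GIT_AUTHOR_EMAIL="vlad@tran.sylvan.ia"
export GIT_COMMITTER_NAME="Vlad Dracula" GIT_COMMITTER_EMAIL="vlad@tran.sylvan.ia"

work=$(mktemp -d)
git init -q "$work/seed"
cd "$work/seed"
echo "# Git" > git.md
git add git.md
git commit -q -m "add title"
git clone -q --bare "$work/seed" "$work/remote.git"
git clone -q "$work/remote.git" "$work/ours"
git clone -q "$work/remote.git" "$work/theirs"

# The other contributor changes the title and pushes first
cd "$work/theirs"
echo "# Git for teams" > git.md
git add git.md
git commit -q -m "retitle for teams"
git push -q

# We change the same line, commit, then pull: Git reports a conflict
cd "$work/ours"
echo "# Git version control" > git.md
git add git.md
git commit -q -m "use better title"
git pull -q --no-rebase || echo "conflict reported, as expected"
cat git.md    # shows both versions between the conflict markers

# Resolve by hand (here: overwrite with an agreed title), then add and commit
echo "# Git version control for teams" > git.md
git add git.md
git commit -q -m "merge conflicting edits to title"
git log --oneline --graph
```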
Amending
When writing and submitting patches to LibreOffice, it is common to have to fix issues with our implementation after other contributors gave feedback. With Git, you can amend a commit so you don't end up with numerous commits all pertaining to one single change.
If you want to change the latest commit, add the --amend flag when you commit:
git commit --amend
Now confirm that the changes are stored as one single commit:
git log
git show HEAD
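A minimal sketch of the amend workflow, under assumptions: it runs in a scratch repository, patch.txt is a hypothetical file name, and --no-edit is added to keep the existing commit message without opening the editor. Note that the amended commit gets a new identifier, since its content has changed.

```shell
# Scratch repository (illustrative setup)
repo=$(mktemp -d)
cd "$repo"
git init -q
git config user.name "Vlad Dracula"
git config user.email "vlad@tran.sylvan.ia"

echo "first draft" > patch.txt
git add patch.txt
git commit -q -m "Implement feature"
before=$(git rev-parse HEAD)

# Feedback arrives: fix the file and fold the fix into the same commit
echo "revised after review" > patch.txt
git add patch.txt
git commit -q --amend --no-edit
after=$(git rev-parse HEAD)

# Still one single commit, but with a new identifier
commits=$(git rev-list --count HEAD)
git log --oneline
git show --stat HEAD
```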
Further resources
Git is a huge tool with many options to get lost in. You will learn about them as they become useful to you. An online search will often lead you to StackOverflow or the Git documentation.
You can also learn more about useful git commands on our wiki:
And here are some general resources to learn more about Git:
- Learn Git branching interactively
- Kate Hudson's "Flight Rules for Git"
- Shaun Mangelsdorf's "Git is a Directed Acyclic Graph"
- The full Version Control with Git lesson from The Carpentries
Legal
This short course is based on the UQ Library Git course which is itself based on the longer course Version Control with Git developed by the non-profit organisation The Carpentries. The original material is licensed under a Creative Commons Attribution license (CC-BY 4.0), and this modified version uses the same license. You are therefore free to:
- Share — copy and redistribute the material in any medium or format
- Adapt — remix, transform, and build upon the material
... as long as you give attribution, i.e. you give appropriate credit to the original author, and link to the license.