Git collaboration

Written by: Omar Siam
Published on: November 5, 2023
Tagged with: Data management and Git

Learning outcomes

work collaboratively with git
create, clone and change a remote repository
understand the commands git fetch, git push and git pull
create a personal access token to use in a code editor

Using Git for collaborative team work

In this section we’ll look at using Git to manage files in a project that is being worked on by more than one person as this is the most likely reason why you might be asked to use Git.

This involves synchronizing changes across computer networks, and resolving conflicts between document versions that can arise when two people have changed the same document in different ways.

Many “drive” services in the cloud like https://www.dropbox.com, https://onedrive.live.com/ or https://drive.google.com” provide such mechanisms in their Web interface as well. Git however offers more fine grained tracking of changes, as well as more control over the handling of changes, versions and conflicts.

It is also easier for applications to interact with Git services than with “drive” services. That’s why software developers often ask you to use Git instead of any “drive” service.

If you get into the Git workflow you may want to use it also locally for any kind of documents (remember Git is best at text documents, but not limited to them).

Most importantly however, everything we have learned in the first part is still valid: in Git everybody always works on a local copy of a project, and Git helps to keep local project copies in sync. This is why Git is sometimes called a distributed version control system.

Remote repositories

Remote repositories are versions of your project that are hosted “elsewhere”, usually somewhere on the internet. There can be multiple such remote versions. Remote repositories are the principle mechanism that allow to synchronize changes between collaborators in a team. While it would be in principle possible to set up the local repository of a colleague as a remote repository and synchronize directly between individual colleagues, this would become complicated very quickly with teams of more than two members. Therefore almost invariably a dedicated Git hosting provider like GitHub or GitLab is used to store a primary copy of our project, with which all team members can synchronize their local project copies.

In the following, we will learn a few new git commands needed to work with remote repositories.

Creating a remote repository

If no remote repository for a project has been established yet, you need to create a new one.

First, let’s create a new empty remote repository called my-test-project on GitHub’s servers:

gh repo create my-test-project

To connect a local Git project with the newly created remote repository, type:

git remote add origin https://github.com/my-git-username/my-test-project

By convention, the remote repository which holds the primary copy of a project is called “origin”.

Cloning a remote repository

In a collaborative setup, there is a good chance that a remote repository has already been set up and we just need to “connect” to it, that is, create a local copy, or clone a remote repository, which will take care of initializing the project locally, connecting it with the remote, and synchronizing the local project with the current state of the remote.

To clone a local repository from a remote repository, we just need to specify its URL and create a local copy on our machine in a directory of our choosing, using the command git clone <url>.

git clone https://github.com/my-git-username/my-test-project

The above will create a local directory for the git repository that has the same name as the remote repo, “my-test-project”. You are free to choose the local directory’s name to your liking:

git clone https://github.com/my-git-username/my-test-project the_new_project

Now, our local git repo sits in a directory called the_new_project.

Alternatively to the https:// protocol, you can also use the SSH transfer protocol. Check the git documentation for further information.

Keeping repositories in sync: fetch, pull, push

Working collaboratively also means keeping in sync with changes that have been created as well as bringing your own work back into the remote repository.

These three commands are essential to keep your local version in sync with the remote repository:

git fetch

git fetch fetches any changes that have been made to the remote repository while you were working on your own local repository. The command will show changes that are not in your local version yet. They need to be integrated before you can push your version back into the remote repository. You can specify which branch you want to fetch by adding its name. Using git log in the terminal will give you further information on the changes. Use git commit and git merge to integrate the changes.

git pull

git pull is similar to git fetch. However, when you use this command, the change in the remote repository or remote branch are merged into your local copy straight away. While git fetch gives you more control, git pull is faster when you want to perform multiple actions.

git push

git push is the opposite action of git fetch and git pull. You push back your local branch or repository into the remote repository. If you have already an upstream remote for your local branch, git push can push to that branch as a default. In case you don’t want to use the default branch, you can specify the branch every time you push: git push <remote> <branch-name>.

Working collaboratively in code editors

Alternatively to working on the command line, some hosting providers like GitHub offer a desktop client or a browser based interface. In addition, there are various code editors available that provide a graphical user interface that support working with your team fast, diligently and easily. Some of the more well-known ones are VS Codium, Visual Studio Code, and Git Kraken.

In order to work with your team but use a code editor, you need to link the code editor and your respective GitHub, GitLab, Bitbucket etc account. This way, the code editor can access the remote repository and you can set up your local version and/or bring locally created files into the remote repo. Follow the installation instructions for your favourite code editor and come back for information on linking the editor with your hosting provider.

Getting a throw away password for Git: Creating an access token

Security experts will tell you that you should never save a password on a computer. Write it down on a peace of paper and put it into a locked drawer maybe but never save it to a computer in a readable way. This of course can mean that you need to type a password again and again and again when working with some data and that is tiresome.

So the solution are throw away passwords no one needs to remember and that have a scope, that is they can only be used to do certain things with a service.

Besides gitlab.oeaw.ac.at allows you to use a number of other services (like Google, GitHub, etc) you already might have an account for to log into gitlab.oeaw.ac.at. If you use that, and that is in general a good idea, gitlab.oeaw.ac.at has no idea what your password is. It just trusts the other services to check your password and tell it that you are who you claim to be.

So to get our data from gitlab.oeaw.ac.at we need to create a password for exactly that purpose. This is called an “access token”.

Go to gitlab.oeaw.ac.at and sign in.

In the top right corner you see your user settings menu. Open it and choose preferences.

On the left side select “Access Tokens”

gitlab.oeaw.ac.at preferences Access Tokens

We will create a new access token that has all available rights or scopes so Git and Visual Studio Code can interact with gitlab.oeaw.ac.at. We enter a descriptive Token Name like Visual Studio Code or Git and add a name for the computer we will use this on. This is a hint for us for later.

gitlab.oeaw.ac.at create all access token

We could only grant a particular set of rights here, limit the scopes the token is valid for. We can also set an expiration date if we know we won’t use the token for very long, for example when trying some new software.

After we click on Create personal access token! we see this new generated password in our browser. As the text clearly states we won’t be able to see it again. If we need access for another tool we just create a new access token.

For the purpose of this workshop we should leave the browser tab open or copy the access token we just created somewhere. We might need it later for enabling an add-on.

If we ever forget what the purpose of a token was or if we don’t use it anymore we find the token by name and Revoke it, we delete it.

Similar to GitLab, an access token can be created in GitHub too:

Creating a personal access token

Quiz

Training task

Open your terminal / command line / git shell. Clone the repository “https://github.com/acdh-oeaw/howto_trainingmaterials” to your local work station.

Inspect the repository by checking its status.

Create another file named abc.txt with the following text:

A, B, C,
Granny caught a flea
She salted it and peppered it
And put it in her tea
The flea died and Granny cried.
A, B, C.

Add the file to the repo, and commit the file with commit message.

Then, check the local repo’s status.

Before you push your repo, perfom git fetch (if you think others are working in the same remote repo), or git pull (if you think you are the only person working on it).

Perform git push.