Let’s have a committed relationship … with git

Git is the most widely used version control system in the world. I have to use it at work, and as it is not difficult to use (as long as you don’t have conflicts), I didn’t think much about it until recently. However, once I started to read more about it, I quickly realised that its elegant way of dealing with versions and integrity control is worth more than one article! We have probably all taken this amazing tool for granted from time to time, underestimating all the trouble it saves us from. So let’s appreciate the power and elegance of git, and let’s finally get into a committed relationship with it!

Creation of git

Some fun facts about git. First of all, did you know that git was created by Linus Torvalds? And yes, that Linus Torvalds, the one who created Linux. Well, I guess that is a career to wish for! The Linux kernel project used a proprietary version control tool called BitKeeper, but Mr. Torvalds believes in and promotes open-source solutions, so in 2005 he created a free, open-source and, arguably, even better version control system.

And as a second fun fact, do you know why git is called git? It could stand for something like “Global Information Tracker”, and I found some sources suggesting this. But Mr. Torvalds stated the following:

“I’m an egotistical bastard, and I name all my projects after myself. First Linux, now git.”

I guess the irony in the statement is more amusing once we quote Urban Dictionary’s definition of git (for non-native English speakers):

“Total and utter tosser who is incapable of doing anything other than annoying people, and not in a way that is funny to others.”

Git was also affectionately referred to as “the information manager from hell” by its creator, and I think everyone who uses git regularly faces conflicts that simply drive them crazy… But let’s go back in time for a moment and imagine a world without it.

Why do we need git?

Firstly, we don’t code projects alone nowadays. At least I don’t. I work in teams, and often more than one person changes the code at the same time. So we need a system that allows for distributed development: work must be able to proceed in parallel and independently in private repos, without constantly synchronising with a central repository.

Git was also created to allow a very large number of developers to work on the same project while still ensuring a reliable integration of their work. Git also ensures data integrity: data must not be altered on its way from one repository to the next, or from one programmer to the next. To do this, git uses a cryptographic hash function, SHA-1 (Secure Hash Algorithm 1), to uniquely identify each item. Git also enforces a change log that is updated at each commit. This means that we can always trace back who changed which files and what exactly they changed. To ensure this accountability and integrity, git uses immutable data objects. Even the history stored in the version control system is immutable. Because objects are identified by their hashes, comparing different versions is very quick. Git uses atomic transactions and allows cloning entire repositories. On top of all this, git is free!
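To get a feeling for why a hash-based history is hard to alter unnoticed, here is a small toy model in Python. It is only a sketch: the function names and the payload format are my own assumptions and not git's real commit format. The idea it illustrates is that each id depends on the id of its parent, so touching an old snapshot changes every id after it.

```python
import hashlib

def toy_commit_id(parent_id: str, message: str, content: str) -> str:
    """Toy commit id: a SHA-1 over the parent id, the message and the content.

    This is NOT git's real commit format, just an illustration of chaining.
    """
    payload = f"parent {parent_id}\n{message}\n{content}".encode("utf-8")
    return hashlib.sha1(payload).hexdigest()

def build_history(snapshots):
    """Return the list of ids for a chain of (message, content) snapshots."""
    ids = []
    parent = ""  # the first commit has no parent
    for message, content in snapshots:
        parent = toy_commit_id(parent, message, content)
        ids.append(parent)
    return ids

original = build_history([("first", "a"), ("second", "b"), ("third", "c")])
tampered = build_history([("first", "A"), ("second", "b"), ("third", "c")])

# Only the oldest snapshot was changed, yet every id in the chain differs.
history_changed = all(o != t for o, t in zip(original, tampered))
print(history_changed)
```

So anyone holding the latest id can notice if something earlier in the history was silently modified.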

Git concepts

The git repository is a database that contains all the information needed to retain and manage the history and revisions of the project. Git maintains two data structures in the hidden directory .git/ : the object store and the index. The object store contains all the original data files, identified by their secure hash values, as well as the log messages, information on the authors, dates and so on. Git has four types of objects in the object store:

  • Blobs, that is, binary large objects. Each version of a file is saved as a blob.
  • Trees, each of which represents one level of directory information.
  • Commits, which contain the metadata for each change introduced into the repository.
  • Tags, which assign a human-readable name to a specific object, usually a commit.

Over time, the object store grows as we add files and commit changes. Git therefore compresses older objects into pack files to save space. The index, on the other hand, is a temporary and dynamic binary file that describes the directory structure of the repository at a given time. The index records staged changes until they are committed.

Git is also a content tracking system. This essentially means that the object store is based on the contents of the objects, and not on their names. If, for instance, two files contain exactly the same content but are stored at different locations in the directory hierarchy, git stores only one copy of this content. I’ll come back to this point later.
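The “one copy for identical content” idea can be sketched with a dictionary keyed by a hash of the content. The class ToyObjectStore and the file locations in the comments are hypothetical names of mine; the real object store lives on disk and does much more, so take this as an illustration only.

```python
import hashlib

class ToyObjectStore:
    """A toy content-addressed store: objects are keyed by a hash of their content."""

    def __init__(self):
        self.objects = {}

    def put(self, content: bytes) -> str:
        # Git hashes a short header ("blob <size>\0") followed by the content.
        header = b"blob " + str(len(content)).encode("ascii") + b"\0"
        object_id = hashlib.sha1(header + content).hexdigest()
        self.objects[object_id] = content  # same content, same key: no duplicate
        return object_id

store = ToyObjectStore()
id_a = store.put(b"Let's get committed to git\n")  # imagine docs/git.html
id_b = store.put(b"Let's get committed to git\n")  # imagine backup/git.html
id_c = store.put(b"Something else\n")

print(id_a == id_b)        # True: identical content gives the identical id
print(len(store.objects))  # 2: three files, but only two distinct contents
```

The file names never enter the hash, which is why the location of a file does not matter to the store.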

Getting started with git

After installing git, we are ready to use it. We can check that git is good to go with the following command:

$ git --version

To see git in action, let’s set up a directory with one HTML file, which will become our first repository:

$ mkdir ~/my_dir
$ cd ~/my_dir
$ echo "Let's get committed to git" > git.html

To initialise a git repository, we need to use the following command:

$ git init 
Initialized empty Git repository in ~/my_dir/.git/

This command creates a hidden directory, .git/ . From now on, all revision information about our repo will be saved in this directory. Git also treats our directory as the working directory from now on.

Now we can check what we mentioned in the previous section: we can list everything git has created, including of course the object store:

$ find . 

As we can see, .git/ already contains a lot of stuff, even though we have only created an empty git repository. To add files to it, we need to select them and tell git to add them. This allows us to keep scratch files only in our directory without committing them (or later pushing them) to the git repository.

$ git add git.html

A useful option is “-A”, which stages all changes in your directory at once. Another useful tool to know is the .gitignore file: untracked files matching the patterns listed in it are ignored and not added when you use the git add command. Now we can see the status of the repository with the command:

$ git status
On branch master
No commits yet
Changes to be committed:
    (use "git rm --cached <file>..." to unstage)

This says that we are on the master branch, and that we have not committed anything yet but have added the git.html file to the index. git add does not yet mean that our file “git.html” is in the git repository. Git remembers that we want to add this file, but allows “adds” to be bunched together. This means that git does not require us to update the repository at each modification. Instead, we can stage each modification as we move forward with our code, and update the repository when we think that our changes form a stable update. To do this step, we use the command:

$ git commit -m "Initial commit" 
 [master (root-commit) 2dcfe4d] Initial commit 
  1 file changed, 1 insertion(+)
  create mode 100644 git.html

If we rerun the status command, we now see that no files need to be committed:

$ git status 
On branch master
nothing to commit, working tree clean

Now let’s see what is in git’s objects after the first commit.

$ find .git/objects

What are these objects? We need to remember that git does not store a file in its object store according to its file name. So here, git does not care that the file is named “git.html”. It cares about the content of the file, the text “Let’s get committed to git”. Git first prepends a short header to the content, calculates the SHA-1 hash of the result, and stores the compressed blob in the object store under the hexadecimal representation of that hash. In the example above, this is 6c09f284282b627e4bec8d20984b7f64aeb0315d. Let’s see what this gives back:

$ git cat-file -p 6c09f284282b627e4bec8d20984b7f64aeb0315d 
Let's get committed to git
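If you are curious how such an id comes about, the following Python sketch mimics what git does for a blob in a default (SHA-1) repository: it hashes the header “blob <size>\0” followed by the content, and stores the zlib-compressed result as a loose object under a path derived from the id. The function names are my own; only hashlib and zlib from the standard library are used, and I assume the file contains exactly the echoed line plus a trailing newline.

```python
import hashlib
import zlib

def git_blob_id(content: bytes) -> str:
    """Return the SHA-1 id git gives to a blob with this content."""
    header = b"blob " + str(len(content)).encode("ascii") + b"\0"
    return hashlib.sha1(header + content).hexdigest()

def loose_object_path(object_id: str) -> str:
    """Loose objects live in .git/objects/<first 2 hex chars>/<remaining 38>."""
    return f".git/objects/{object_id[:2]}/{object_id[2:]}"

def loose_object_bytes(content: bytes) -> bytes:
    """Bytes written to disk for a loose blob: header plus content, zlib-compressed."""
    header = b"blob " + str(len(content)).encode("ascii") + b"\0"
    return zlib.compress(header + content)

content = b"Let's get committed to git\n"  # assumed content of git.html after `echo`
blob_id = git_blob_id(content)
print(blob_id)
print(loose_object_path(blob_id))

# The well-known id of the empty blob is a handy sanity check.
print(git_blob_id(b"") == "e69de29bb2d1d6434b8b29ae775ad8c2e48c5391")
```

If your file really holds the same bytes, the printed id should be the one that find .git/objects lists for it; a single different character (or a missing newline) gives a completely different id.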

Similarly, we can see what the other two objects are: the tree and the commit. The tree maps the file name git.html to its blob and so stores the directory structure (here, that the file is in the working directory), while the commit points to the tree and records the author, committer and message. Pretty cool, isn’t it?

$ git cat-file -p 6425db77eb1f6ee467a3bccd7aa08e4b934c1798
100644 blob 6c09f284282b627e4bec8d20984b7f64aeb0315d git.html

$ git cat-file -p 71025e579c1401f754db91ee91f5caae7c91f338
tree 6425db77eb1f6ee467a3bccd7aa08e4b934c1798
author fannihalmai <Fanni.halmai@tsm-education.fr> 1585240035 +0100
committer fannihalmai <Fanni.halmai@tsm-education.fr> 1585240035 +0100
Added file
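The tree can be rebuilt by hand in the same spirit. In the sketch below (function names are mine, and sorting entries by plain name is a simplification that is only safe for flat directories like ours), each tree entry is serialised as “<mode> <name>\0” followed by the 20 raw bytes of the object id, and the whole body is hashed with a “tree <size>\0” header. The blob id is copied from the cat-file output above.

```python
import hashlib

def git_object_id(kind: str, body: bytes) -> str:
    """SHA-1 over '<kind> <size>' plus a NUL byte, followed by the object body."""
    header = kind.encode("ascii") + b" " + str(len(body)).encode("ascii") + b"\0"
    return hashlib.sha1(header + body).hexdigest()

def tree_body(entries):
    """Serialise (mode, name, object_id_hex) entries the way a tree object stores them."""
    body = b""
    # Simplification: real git orders directory names as if they ended in "/".
    for mode, name, object_id in sorted(entries, key=lambda e: e[1]):
        body += mode.encode("ascii") + b" " + name.encode("utf-8") + b"\0"
        body += bytes.fromhex(object_id)  # ids are stored as 20 raw bytes, not as hex text
    return body

# The single entry printed by `git cat-file -p` for the tree above.
entries = [("100644", "git.html", "6c09f284282b627e4bec8d20984b7f64aeb0315d")]
tree_id = git_object_id("tree", tree_body(entries))
print(tree_id)

# Sanity check against the well-known id of the empty tree.
print(git_object_id("tree", b"") == "4b825dc642cb6eb9a060e54bf8d69288fbee4904")
```

A commit object is hashed the same way, with the kind “commit” and the text shown by cat-file as its body, which is how a commit id ends up depending on the tree, and the tree on the blobs.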

Let’s now change the git.html file, add it to the git repository again and commit it:

$ git add -A 

$ git commit -m "Changing git.html" 
 [master 567e061] Changing git.html 
 1 file changed, 5 insertions(+), 1 deletion(-)

And now let’s see the logs of the commits by the following command:

$ git log
commit 567e061691156c31ba989ab60e6ba053b4a05040 (HEAD -> master)
Author: fanni <fanni@machinelearnit.com>
Date:   Thu Mar 26 14:35:50 2020 +0100

    Changing git.html

commit 2dcfe4d95cd63a796f62476c5a146e31a403fdc8
Author: fanni <fanni@machinelearnit.com>
Date:   Thu Mar 26 14:26:46 2020 +0100

    Initial commit

To inspect one commit, we can use the “show” command with the commit hash:

$ git show 567e061691156c31ba989ab60e6ba053b4a05040
commit 567e061691156c31ba989ab60e6ba053b4a05040 (HEAD -> master)
Author: fanni <fanni@machinelearnit.com>
Date:   Thu Mar 26 14:35:50 2020 +0100

    Changing git.html

diff --git a/git.html b/git.html
index 6c09f28..354a522 100644
--- a/git.html
+++ b/git.html
@@ -1 +1,5 @@
-Let's get committed to git
+Hi, let's get committed to git

This shows the last commit I made. Now let’s see the difference between the two commits. Note that the newer commit is given first, so the diff reads backwards, from the second commit to the first:

$ git diff 567e061691156c31ba989ab60e6ba053b4a05040 2dcfe4d95cd63a796f62476c5a146e31a403fdc8
diff --git a/git.html b/git.html
index 354a522..6c09f28 100644
--- a/git.html
+++ b/git.html
@@ -1,5 +1 @@
-Hi, let's get committed to git
+Let's get committed to git
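The order of the two arguments decides which lines get the minus and which get the plus. This can be reproduced outside git with Python's difflib; it is only an analogy, since git has its own diff machinery, and I use just the one changed line visible above as the file content, which is an assumption.

```python
import difflib

first = ["Let's get committed to git\n"]       # content at the first commit
second = ["Hi, let's get committed to git\n"]  # assumed: only the changed line

def unified(old, new):
    """Return the lines of a unified diff that turns `old` into `new`."""
    return list(difflib.unified_diff(old, new, fromfile="a/git.html", tofile="b/git.html"))

forward = unified(first, second)   # like `git diff <first> <second>`
backward = unified(second, first)  # like `git diff <second> <first>`

print("".join(forward))
print("".join(backward))
```

In the forward diff the original line is removed and the “Hi, …” line is added; in the backward diff the signs are swapped, just like in the output above.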

Now suppose we want to remove the file git.html from the git repository:

$ git rm git.html
rm 'git.html'
$ git commit -m "Removed file"
 [master 28df16f] Removed file
  1 file changed, 5 deletions(-)
  delete mode 100644 git.html

Git configuration

We can set a username and an email with the following two commands (leave the --global option out if you only want to set these configs for the present git repo):

$ git config --global user.name "fanni"
$ git config --global user.email "fanni@machinelearnit.com"

and with the following command we display the configs:

$ git config -l 

To finish…

I plan to write more on git later on, in particular about its graph structure, which I find very interesting! But if I added all of that to this article, it would probably become too long and nobody would read it, so it will be the topic of a new one. I hope you liked this article, and thanks for reading!


For a more in-depth view, I used the following excellent book: Jon Loeliger and Matthew McCullough, Version Control with Git, 2nd Edition.
