Byte Pair Encoding

In information theory, byte pair encoding (BPE) or digram coding is a simple form of data compression in which the most common pair of consecutive bytes of data is replaced with a byte that does not occur within that data. Look up Wikipedia for a good example of using BPE on a single string.
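The replacement step described above can be sketched in a few lines of Python. This is only a minimal illustration working on characters rather than raw bytes; the function names and the pool of spare symbols (`"ZYXWV"`) are assumptions made for the example, not part of any library.

```python
from collections import Counter


def most_common_pair(data):
    """Return the most frequent pair of adjacent symbols and its count."""
    pairs = Counter(zip(data, data[1:]))
    return pairs.most_common(1)[0] if pairs else (None, 0)


def replace_pair(data, pair, new_symbol):
    """Replace non-overlapping occurrences of `pair`, scanning left to right."""
    out = []
    i = 0
    while i < len(data):
        if i + 1 < len(data) and (data[i], data[i + 1]) == pair:
            out.append(new_symbol)
            i += 2
        else:
            out.append(data[i])
            i += 1
    return out


def bpe_compress(text, spare_symbols="ZYXWV"):
    """Repeatedly replace the most common pair with a symbol unused in the data.

    `spare_symbols` is a hypothetical pool of replacement symbols.
    """
    data = list(text)
    table = {}
    for symbol in spare_symbols:
        if symbol in data:
            continue  # the replacement must not occur within the data
        pair, count = most_common_pair(data)
        if count < 2:
            break  # no pair repeats, so nothing is gained by replacing
        data = replace_pair(data, pair, symbol)
        table[symbol] = pair
    return "".join(data), table


def bpe_decompress(compressed, table):
    """Undo the replacements in reverse order of creation."""
    data = compressed
    for symbol, pair in reversed(list(table.items())):
        data = data.replace(symbol, "".join(pair))
    return data


compressed, table = bpe_compress("aaabdaaabac")
print(compressed, table)
```

The replacement table has to be stored alongside the compressed string, since decompression simply expands the symbols again, newest first.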

This technique is also employed in natural language processing models, such as GPT-2, to tokenize word sequences.

Continue reading “Byte Pair Encoding”

Transformer… Transformer…

Neural Machine Translation (NMT) is a recently proposed approach to machine translation that builds and trains a single, large neural network that reads a sentence and outputs a correct translation. Previous state-of-the-art methods use Recurrent Neural Networks and LSTM architectures to model long sequences. However, the recurrent nature of these methods prevents parallelization within training examples, which in turn leads to longer training times. Vaswani et al. (2017) propose a novel technique, the Transformer, that relies entirely on the attention mechanism to model long sequences, so it can be parallelized and trained more quickly.
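To give a feeling for the attention mechanism mentioned above, here is a minimal sketch of scaled dot-product attention in plain Python. It assumes small lists of floats for queries, keys and values and leaves out everything else in the Transformer (multiple heads, masking, learned projections); the function names are chosen for this example only.

```python
import math


def softmax(scores):
    """Turn raw scores into weights that sum to 1."""
    peak = max(scores)  # subtract the maximum for numerical stability
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def attention(queries, keys, values):
    """Scaled dot-product attention: softmax(Q K^T / sqrt(d_k)) V."""
    d_k = len(keys[0])
    output = []
    for q in queries:
        # Similarity of this query with every key, scaled by sqrt(d_k).
        scores = [
            sum(qi * ki for qi, ki in zip(q, k)) / math.sqrt(d_k)
            for k in keys
        ]
        weights = softmax(scores)
        # Weighted average of the value vectors.
        output.append([
            sum(w * v[j] for w, v in zip(weights, values))
            for j in range(len(values[0]))
        ])
    return output
```

Each query is handled independently of the others, with no state carried from one position to the next, which is what makes the computation easy to parallelize.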

Continue reading “Transformer… Transformer…”

WordPiece Tokenisation

With the high performance of Google’s BERT model, we hear more and more about WordPiece tokenisation. There is even a multilingual BERT model, trained on 104 different languages. But how is it possible to apply the same model to 104 languages? The idea of using a shared vocabulary for more than 100 languages intrigued me, so I dove into it!

Continue reading “WordPiece Tokenisation”

Hello from the Virtual Machines’ World

This article explains a trick, almost like a magic trick. The aim of this trick? To deceive our computer into thinking that it is inside another computer, equipped maybe with another operating system. Why do we do that? For numerous reasons: to run multiple applications on one server, to run multiple operating systems simultaneously on one computer, to run multiple sessions of a single operating system, or just to host applications that are incompatible with our host operating system. The reasons are numerous, the solution is elegant. Let’s see then what virtualisation is and how it is achieved!

Continue reading “Hello from the Virtual Machines’ World”

Principal Component Analysis through the Happiness Index example

What determines happiness? Why are some countries more (or less) happy than others? In 2017, Norway topped the global happiness ranking, an annual publication of the United Nations Sustainable Development Solutions Network. In this article, we use their data to show correlations between the variables used in this Index, and we analyse the countries with the help of the Principal Component Analysis technique.
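As a rough idea of what Principal Component Analysis computes, the sketch below finds the first principal component of a small table using power iteration on the covariance matrix. The numbers are made up for illustration and are not Happiness Index values; the function names are assumptions for this example, and a real analysis would more likely rely on a numerical library.

```python
import math


def center(rows):
    """Subtract the column mean from every value."""
    n = len(rows)
    means = [sum(col) / n for col in zip(*rows)]
    return [[x - m for x, m in zip(row, means)] for row in rows]


def covariance(centered):
    """Sample covariance matrix of already-centered rows."""
    n = len(centered)
    d = len(centered[0])
    return [
        [sum(r[i] * r[j] for r in centered) / (n - 1) for j in range(d)]
        for i in range(d)
    ]


def first_component(cov, iterations=200):
    """Leading eigenvector of `cov`, approximated by power iteration."""
    d = len(cov)
    v = [1.0 / math.sqrt(d)] * d
    for _ in range(iterations):
        w = [sum(cov[i][j] * v[j] for j in range(d)) for i in range(d)]
        norm = math.sqrt(sum(x * x for x in w))
        if norm == 0:
            break  # no variance at all in the data
        v = [x / norm for x in w]
    return v


# Made-up toy table: two variables measured on five observations.
toy = [[1.0, 2.1], [2.0, 3.9], [3.0, 6.2], [4.0, 7.8], [5.0, 10.1]]
print(first_component(covariance(center(toy))))
```

The returned vector is the direction along which the centered data varies the most; projecting each row onto it gives that row's coordinate on the first principal axis.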

Keep on reading!

Multiple correspondence analysis, Clustering and Tandem Analysis through a basic income analysis example

Okay… So several basic income experiments were launched in 2017. Finland started a two-year experiment by giving 2,000 unemployed citizens approximately $600 a month. In Silicon Valley, Y Combinator announced in mid-2016 that it would begin paying out monthly salaries of between $1,000 and $2,000 to 100 families in Oakland, while in Utrecht, the Netherlands, 250 Dutch citizens will receive about $1,100 per month. These are just three of the experiments already launched, and their aim is to measure how basic income could provide a new structure for social security and to see how people’s productivity levels change when they receive a guaranteed salary.

But what do people think about basic income? Are we supportive of it, or do we fear it? Who is most likely to vote for it? Do education or job status distinguish those who are for the idea from those who are against it? This study aims to answer these questions by using a semi-supervised approach, clustering and a tandem analysis to classify people according to their characteristics and their opinion of basic income.

Keep on reading!

Web Scraping


This article shows a simple program written in Python to do some basic web scraping. As an exercise, we get the titles of YouTube videos and their number of views, then we store this information in a Pandas DataFrame.
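The general idea can be sketched with the standard library alone. The HTML below is a made-up sample and the class names `video`, `video-title` and `view-count` are hypothetical; real YouTube pages are structured differently, so this only illustrates the pattern of parsing markup and collecting rows.

```python
from html.parser import HTMLParser

# Made-up markup for illustration; not what YouTube actually serves.
SAMPLE_HTML = """
<div class="video"><span class="video-title">First clip</span><span class="view-count">1200</span></div>
<div class="video"><span class="video-title">Second clip</span><span class="view-count">345</span></div>
"""


class VideoParser(HTMLParser):
    """Collect one dict per video with its title and view count."""

    def __init__(self):
        super().__init__()
        self.rows = []
        self._field = None  # which field the next text node belongs to

    def handle_starttag(self, tag, attrs):
        css_class = dict(attrs).get("class")
        if css_class == "video":
            self.rows.append({})
        elif css_class in ("video-title", "view-count"):
            self._field = css_class

    def handle_data(self, data):
        if self._field and self.rows:
            if self._field == "video-title":
                self.rows[-1]["title"] = data.strip()
            else:
                self.rows[-1]["views"] = int(data)
            self._field = None

    def handle_endtag(self, tag):
        self._field = None


def scrape(html):
    """Parse `html` and return the collected rows as a list of dicts."""
    parser = VideoParser()
    parser.feed(html)
    return parser.rows


print(scrape(SAMPLE_HTML))
```

A list of dicts like this can be passed directly to `pandas.DataFrame(rows)` to get one row per video with `title` and `views` columns.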

Keep on reading!
