XLNet: Generalized Autoregressive Pretraining for Language Understanding by Yang et al. was published in June 2019. The article claims that it overcomes shortcomings of BERT and achieves SOTA results in many NLP tasks.

In this article I explain XLNet and show the code of a binary classification example on the IMDB dataset. I compare the two model as I did the same classification with BERT (see here). For the complete code, see my github (here).

Continue reading “XLNet”

BERT: Bidirectional Transformers for Language Understanding

One of the major advances in deep learning in 2018 has been the development of effective NLP transfer learning methods, such as ULMFiT, ELMo and BERT. The Transformer Bidirectional Encoder Representations aka BERT has shown strong empirical performance therefore BERT will certainly continue to be a core method in NLP for years to come.

Continue reading “BERT: Bidirectional Transformers for Language Understanding”

Equity codes prediction using Naive Bayesian Classifier with scikit-learn

The aim of this article is to have an introduction to Naive baysian classification using scikit-learn. The naive Bayesian classification is a simple Bayesian type of probabilistic classification based on Bayes’ theorem with strong (so-called naive) independence of hypotheses. In this article, we will use it to build a basic text prediction system. We will predict Equity codes in a search form fashion (i.e prediction starts when user starts typing).

Continue reading “Equity codes prediction using Naive Bayesian Classifier with scikit-learn”

Byte Pair Encoding

In information theory, byte pair encoding (BPE) or digram coding is a simple form of data compression in which the most common pair of consecutive bytes of data is replaced with a byte that does not occur within that data. Look up Wikipedia for a good example of using BPE on a single string.

This technique is also employed in natural language processing models, such as the GPT-2, to tokenize word sequences. Continue reading “Byte Pair Encoding”

Transformer… Transformer…

Neural Machine Translation [NMT] is a recently proposed task of machine learning that builds and trains a single, large neural network that reads a sentence and outputs a correct translation. Previous state of the art methods [here] use Recurrent Neural Networks and LSTM architectures to model long sequences, however, the recurrent nature of these methods prevents parallelization within training examples and this in turn leads to longer training time. Vaswani et al. 2017 proposes a novel technique, the Transformer, that relies entirely on the Attention Mechanism to model long sequences, thus can be parallelized and can be trained quicker.

Continue reading “Transformer… Transformer…”

WordPiece Tokenisation

With the high performance of Google’s BERT model, we can hear more and more about the Wordpiece tokenisation. There is even a multilingual BERT model, as it was trained on 104 different languages. But how is it possible to apply the same model for 104 languages? The idea of using a shared vocabulary for above 100 languages intrigued me so I drove into it!

Continue reading “WordPiece Tokenisation”

Principal Component Analysis through the Happiness Index exemple

What determines happiness? Why countries are more (or less) happy than other ones? In 2017, Norway tops the global happiness ranking, made as an annual publication of the United Nations Sustainable Development Solutions Network. In this article, we use their data to show correlations of the variables used in this Index, furthermore we analyse the countries with the help of the Principal Component Analysis technic.

Keep on reading!

Create a website or blog at WordPress.com

Up ↑