XLNet: Generalized Autoregressive Pretraining for Language Understanding by Yang et al. was published in June 2019. The paper claims that XLNet overcomes shortcomings of BERT and achieves state-of-the-art results on many NLP tasks.

In this article I explain XLNet and show the code for a binary classification example on the IMDB dataset. Since I ran the same classification with BERT (see here), I also compare the two models. For the complete code, see my GitHub (here).


You Do Cker

Some innovations in IT are followed and talked about by everyone. Others are quieter and humbler: they hide behind a wall, and once you understand their power and what you can do with them, you are truly astonished.


BERT: Bidirectional Transformers for Language Understanding

One of the major advances in deep learning in 2018 was the development of effective NLP transfer learning methods such as ULMFiT, ELMo and BERT. Bidirectional Encoder Representations from Transformers, aka BERT, has shown strong empirical performance, so it will likely remain a core method in NLP for years to come.


Transformer… Transformer…

Neural Machine Translation [NMT] is a recently proposed approach to machine translation that builds and trains a single, large neural network that reads a sentence and outputs a correct translation. Previous state-of-the-art methods [here] use Recurrent Neural Networks and LSTM architectures to model long sequences. However, the recurrent nature of these methods prevents parallelization within training examples, which in turn leads to longer training times. Vaswani et al. (2017) propose a novel architecture, the Transformer, that relies entirely on the attention mechanism to model long sequences, so it can be parallelized and trained faster.


Maximum Likelihood Estimation (MLE)

Maximum likelihood estimation (MLE) is a method for estimating the parameters of a probabilistic model. It works by finding the parameters of a probability distribution that maximise the likelihood function of the observed data. In other words, the idea is to find the probability density function under which the observed data is most probable. This post gives a brief MLE overview.
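
As a minimal sketch of the idea, assume the observed data are i.i.d. samples from a Gaussian distribution. In that case the likelihood is maximised by the sample mean and the (biased) sample standard deviation. The function names, the synthetic data and the true parameters (2.0 and 1.5) below are illustrative assumptions, not taken from the post itself.

```python
import numpy as np


def gaussian_log_likelihood(data, mu, sigma):
    """Log-likelihood of i.i.d. data under a Gaussian N(mu, sigma^2)."""
    n = data.size
    return (-0.5 * n * np.log(2.0 * np.pi * sigma ** 2)
            - np.sum((data - mu) ** 2) / (2.0 * sigma ** 2))


def gaussian_mle(data):
    """Closed-form Gaussian MLE: sample mean and biased sample std."""
    mu_hat = data.mean()
    sigma_hat = data.std()  # ddof=0 (the default) is the maximum likelihood estimate
    return mu_hat, sigma_hat


# Synthetic data; the "true" parameters are assumptions for illustration only.
rng = np.random.default_rng(0)
data = rng.normal(loc=2.0, scale=1.5, size=1000)

mu_hat, sigma_hat = gaussian_mle(data)
best = gaussian_log_likelihood(data, mu_hat, sigma_hat)

# Moving the parameters away from the estimate lowers the log-likelihood.
worse_mu = gaussian_log_likelihood(data, mu_hat + 0.5, sigma_hat)
worse_sigma = gaussian_log_likelihood(data, mu_hat, sigma_hat * 1.5)

print("mu_hat:", mu_hat, "sigma_hat:", sigma_hat)
print("log-likelihood at the MLE:", best)
print("log-likelihood at perturbed parameters:", worse_mu, worse_sigma)
```

The estimates should land close to the parameters used to generate the data, and any perturbation of them gives a lower log-likelihood, which is exactly what "maximum likelihood" means.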

Linear Algebra with NumPy

Linear algebra is essential to understanding ML for three main reasons. First, when you read a book or an article about ML, models are very often explained with linear algebra. This is largely a matter of mathematical convenience, as explained below. Second, many models are built on linear algebra methods. Third, deep learning makes extensive use of vectors. Either way, if ML interests you, you need to get through linear algebra. This article covers its most important notions with NumPy examples.

