## Expectation Maximization for MAP estimation

“Expectation is the root of a heartache.” –

William Shakespeare

Expectation–maximization (EM) is an iterative method for finding maximum likelihood or maximum a posteriori (MAP) estimates of a parameter θ in probabilistic models that involve latent variables. Each iteration alternates between computing a conditional expectation (the E-step) and solving a maximization problem (the M-step), hence the name.
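To make the alternation concrete, here is a minimal sketch of EM for a one-dimensional mixture of two Gaussians, where the latent variable is the component each point came from. The toy data, the function names and the initialisation are assumptions made for illustration. The sketch computes maximum likelihood estimates; a MAP variant would add the log-prior of the parameters to the objective maximised in the M-step.

```python
import math


def normal_pdf(x, mean, var):
    """Density of a univariate normal distribution."""
    return math.exp(-((x - mean) ** 2) / (2 * var)) / math.sqrt(2 * math.pi * var)


def em_two_gaussians(data, n_iter=50):
    """Fit a two-component 1D Gaussian mixture by EM (maximum likelihood)."""
    # Crude initialisation (an assumption): start the means at the extremes.
    means = [min(data), max(data)]
    variances = [1.0, 1.0]
    weights = [0.5, 0.5]

    for _ in range(n_iter):
        # E-step: posterior probability (responsibility) of each component
        # for each data point, given the current parameters.
        resp = []
        for x in data:
            p = [weights[k] * normal_pdf(x, means[k], variances[k]) for k in range(2)]
            total = sum(p)
            resp.append([pk / total for pk in p])

        # M-step: re-estimate the parameters from the weighted data.
        for k in range(2):
            nk = sum(r[k] for r in resp)
            weights[k] = nk / len(data)
            means[k] = sum(r[k] * x for r, x in zip(resp, data)) / nk
            var = sum(r[k] * (x - means[k]) ** 2 for r, x in zip(resp, data)) / nk
            variances[k] = max(var, 1e-6)  # guard against a collapsing variance

    return weights, means, variances


# Hypothetical toy data: two well-separated clusters.
data = [-2.1, -1.9, -2.0, -1.8, -2.2, 3.0, 3.2, 2.9, 3.1, 2.8]
weights, means, variances = em_two_gaussians(data)
print(weights, means, variances)
```

On well-separated data like this the responsibilities quickly become close to 0 or 1, so the estimated means settle near the two cluster averages.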

## Introduction to Graph Models

“Graphical models are a marriage between probability theory and graph theory.”

– Michael Jordan, 1998.

Probability plays a central role in modern pattern recognition. Such problems can be tackled by formulating and solving complex probabilistic models directly; however, a graphical representation of these models is often highly advantageous, for the following reasons:

1) Visualising a model makes it easier to understand and handle. A graph can also help us design new models or reveal similarities between existing model structures that we had not anticipated.

## Equity codes prediction using Naive Bayesian Classifier with scikit-learn

## Introduction

The aim of this article is to introduce naive Bayesian classification using scikit-learn.

Naive Bayesian classification is a simple probabilistic classification method based on Bayes’ theorem with a strong (so-called naive) assumption of independence between features. In this article, we use it to build a basic text prediction system: we predict equity codes in the manner of a search form (i.e. prediction starts as soon as the user starts typing).
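As a rough sketch of the idea, the snippet below trains a multinomial naive Bayes model on character n-grams of name prefixes, so that a partially typed query can already be mapped to a code. The company names, the codes, the `prefixes` helper and the n-gram settings are all illustrative assumptions, not the article's actual data or configuration.

```python
from sklearn.feature_extraction.text import CountVectorizer
from sklearn.naive_bayes import MultinomialNB
from sklearn.pipeline import make_pipeline

# Hypothetical training data: company names and their equity codes.
names = ["apple inc", "microsoft corporation", "alphabet inc", "amazon com inc"]
codes = ["AAPL", "MSFT", "GOOGL", "AMZN"]


def prefixes(text, min_len=2):
    """All prefixes of `text`, mimicking what a user has typed so far."""
    return [text[:i] for i in range(min_len, len(text) + 1)]


# Expand every name into its prefixes so the model sees partial inputs.
X, y = [], []
for name, code in zip(names, codes):
    for prefix in prefixes(name):
        X.append(prefix)
        y.append(code)

# Character n-grams as features, multinomial naive Bayes as classifier.
model = make_pipeline(
    CountVectorizer(analyzer="char_wb", ngram_range=(1, 3)),
    MultinomialNB(),
)
model.fit(X, y)

pred = model.predict(["micro"])[0]
print(pred)
```

Training on prefixes rather than full names is one simple way to approximate search-as-you-type behaviour; `model.predict_proba` could be used instead of `predict` to rank several candidate codes.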

Keep on reading!

## Forecasting recessions with economic indicators

Abstract

This study compares three economic indicators often used to forecast recessions: the Yield Spread, the Chicago Index and the Leading Index. We find that the latter two predict recessions well one and two quarters ahead but fail over longer horizons. In contrast, the Yield Spread performs better when forecasting recessions four and six quarters ahead.

## Sentiment Analysis: Supervised Learning with SVM and Apache Spark

“Humans aren’t as good as we should be in our capacity to empathize with feelings and thoughts of others, be they humans or other animals on Earth. So maybe part of our formal education should be training in empathy. Imagine how different the world would be if, in fact, that were ‘reading, writing, arithmetic, empathy.’” –

Neil deGrasse Tyson

**Abstract**

The objective is two-class discrimination (positive or negative opinion) of movie reviews, using data from the IMDB database (50,000 reviews).

## Terrorism around the World – Study with R

“Everybody’s worried about stopping terrorism. Well, there is a really easy way: stop participating in it.” –

Noam Chomsky

**Abstract**

According to Wikipedia, the English word “*terror*”, just like the French “*terreur*”, derives from the Latin word “*terrere*”, meaning to frighten, alarm, anguish, fear or panic. Indeed, we all fear terrorism, which is increasingly part of our lives. But do we understand the global picture? Who attacks whom, where and why? Why do we see more and more suicide attacks? This study addresses these questions through an Exploratory Data Analysis, semi-supervised learning and a supervised Logit model.

## The impact of individual characteristics on the length of life in India – Oaxaca-Blinder decomposition, Logit model

**Abstract**

This study estimates the impact of social status, education and average living standards on the length of life by analyzing India’s 2009 mortality statistics for two states, Uttarakhand and Bihar. Using several estimation methods, namely OLS, GLM and Logit regressions as well as the Oaxaca-Blinder decomposition, we find that education, electricity and access to a toilet significantly raise the length of life. We also find that members of the scheduled tribes live shorter lives, and that this difference cannot be explained by differences between the average values of the two groups’ characteristics.
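For reference, the standard two-fold Oaxaca-Blinder decomposition for a linear model can be written as below. The group labels $A$ and $B$ and the use of group $B$'s coefficients as the reference are assumptions made here for illustration, not necessarily the specification used in the study. With $\bar{Y}_g$ the mean outcome, $\bar{X}_g$ the vector of mean characteristics and $\hat{\beta}_g$ the coefficients estimated separately for group $g$:

$$
\bar{Y}_A - \bar{Y}_B
= \underbrace{(\bar{X}_A - \bar{X}_B)^{\top} \hat{\beta}_B}_{\text{explained by characteristics}}
+ \underbrace{\bar{X}_A^{\top} (\hat{\beta}_A - \hat{\beta}_B)}_{\text{unexplained}}
$$

A gap that "cannot be explained by differences in characteristics" corresponds to a large second term: the two groups would still differ even if their average characteristics were the same.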

## Multiple correspondence analysis, Clustering and Tandem Analysis through a basic income analysis example

**Abstract**

Okay… so several basic income experiments were launched in 2017. Finland started a two-year experiment giving 2,000 unemployed citizens approximately $600 a month. In Silicon Valley, Y Combinator announced in mid-2016 that it would begin paying monthly salaries of between $1,000 and $2,000 to 100 families in Oakland, while in Utrecht, the Netherlands, 250 Dutch citizens will receive about $1,100 per month. These are just three of the experiments already launched; their aim is to measure how basic income could provide a new structure for social security and to see how people’s productivity changes when they receive a guaranteed salary.

But what do people think about basic income? Are we supportive of it, or do we fear it? Who is most likely to vote for it? Do attitudes for or against the idea differ according to education or job status? This study aims to answer these questions by using a semi-supervised approach, Clustering and a Tandem analysis to classify people according to their characteristics and their opinion of basic income.

## Principal Component Analysis through the Happiness Index example

**Abstract**

What determines happiness? Why are some countries more (or less) happy than others? In 2017, Norway topped the global happiness ranking published annually by the United Nations Sustainable Development Solutions Network. In this article, we use their data to show the correlations between the variables used in this Index, and we analyse the countries with the help of Principal Component Analysis.
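As a minimal sketch of the technique, the snippet below runs a PCA on standardised variables using NumPy's singular value decomposition. The small data matrix is made up for illustration (rows stand for countries, columns for three index variables) and is not the actual Happiness Index data.

```python
import numpy as np

# Hypothetical toy data: 6 countries x 3 variables
# (e.g. GDP per capita, social support, life expectancy scores).
X = np.array([
    [1.6, 1.5, 0.8],
    [1.4, 1.5, 0.9],
    [1.0, 1.2, 0.6],
    [0.4, 0.9, 0.3],
    [0.1, 0.5, 0.2],
    [0.9, 1.4, 0.7],
])

# Standardise each variable so the PCA works on the correlation matrix.
Z = (X - X.mean(axis=0)) / X.std(axis=0, ddof=1)

# SVD of the standardised data: rows of Vt are the principal axes.
U, S, Vt = np.linalg.svd(Z, full_matrices=False)

# Variance carried by each component and its share of the total.
explained_variance = S**2 / (len(Z) - 1)
explained_ratio = explained_variance / explained_variance.sum()

# Coordinates of each country on the principal components.
scores = Z @ Vt.T

print(explained_ratio)
```

Because the variables are standardised, the component variances sum to the number of variables, and `explained_ratio` gives the share of total variance each component captures.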

## Web Scraping

**Abstract**

This article shows a simple Python program for basic web scraping. As an exercise, we get the titles of YouTube videos and their numbers of views, then store this information in a Pandas DataFrame.
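The general pattern can be sketched as follows. To keep the example self-contained, it parses an inline HTML string with the standard library instead of fetching a live page; the markup, the class names `video`, `title` and `views`, and the `VideoParser` class are hypothetical and do not reflect YouTube's real page structure.

```python
from html.parser import HTMLParser

import pandas as pd

# Hypothetical markup standing in for a downloaded page.
SAMPLE_HTML = """
<ul>
  <li class="video"><span class="title">Intro to EM</span><span class="views">1,234 views</span></li>
  <li class="video"><span class="title">PCA explained</span><span class="views">56,789 views</span></li>
</ul>
"""


class VideoParser(HTMLParser):
    """Collect the title and view count of each video entry."""

    def __init__(self):
        super().__init__()
        self.rows = []
        self._field = None  # name of the field currently being read, if any

    def handle_starttag(self, tag, attrs):
        cls = dict(attrs).get("class")
        if tag == "li" and cls == "video":
            self.rows.append({})  # start a new record
        elif tag == "span" and cls in ("title", "views"):
            self._field = cls

    def handle_data(self, data):
        if self._field and self.rows:
            self.rows[-1][self._field] = data.strip()

    def handle_endtag(self, tag):
        if tag == "span":
            self._field = None


parser = VideoParser()
parser.feed(SAMPLE_HTML)

# Store the records in a DataFrame and turn "1,234 views" into an integer.
df = pd.DataFrame(parser.rows)
df["views"] = (
    df["views"]
    .str.replace(" views", "", regex=False)
    .str.replace(",", "", regex=False)
    .astype(int)
)
print(df)
```

For a real page, the HTML string would come from an HTTP request and the tag and class names would need to match that page's actual markup.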