Introduction The aim of this article is to introduce naive Bayesian classification using scikit-learn. Naive Bayesian classification is a simple type of probabilistic classification based on Bayes’ theorem with strong (so-called naive) independence assumptions between the features. In this article, we will use it to build a basic text prediction system. We will predict equity codes in a search-form fashion (i.e., prediction starts as soon as the user starts typing).
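The idea of predicting a code from a partially typed query can be sketched in a few lines. The example below is not the article's scikit-learn code: it is a minimal from-scratch multinomial naive Bayes over character n-grams, written with the standard library only so that the mechanics stay visible. The class name `NaiveBayesPrefixClassifier`, the helper `char_ngrams`, and the four-company training set are all assumptions made for illustration. In scikit-learn, a comparable pipeline would presumably combine `CountVectorizer(analyzer="char")` with `MultinomialNB`.

```python
import math
from collections import Counter, defaultdict


def char_ngrams(text, n_max=3):
    """Return all character n-grams of length 1..n_max of the lowercased text."""
    text = text.lower()
    return [text[i:i + n]
            for n in range(1, n_max + 1)
            for i in range(len(text) - n + 1)]


class NaiveBayesPrefixClassifier:
    """Hypothetical multinomial naive Bayes with Laplace (add-alpha) smoothing."""

    def __init__(self, alpha=1.0):
        self.alpha = alpha

    def fit(self, texts, labels):
        self.class_counts = Counter(labels)          # documents per class
        self.feature_counts = defaultdict(Counter)   # n-gram counts per class
        self.vocabulary = set()
        for text, label in zip(texts, labels):
            grams = char_ngrams(text)
            self.feature_counts[label].update(grams)
            self.vocabulary.update(grams)
        return self

    def log_scores(self, text):
        """Return log P(class) + sum of log P(n-gram | class) for each class."""
        total_docs = sum(self.class_counts.values())
        vocab_size = len(self.vocabulary)
        scores = {}
        for label, n_docs in self.class_counts.items():
            counts = self.feature_counts[label]
            denom = sum(counts.values()) + self.alpha * vocab_size
            score = math.log(n_docs / total_docs)
            for gram in char_ngrams(text):
                if gram in self.vocabulary:  # skip n-grams never seen in training
                    score += math.log((counts[gram] + self.alpha) / denom)
            scores[label] = score
        return scores

    def predict(self, text):
        scores = self.log_scores(text)
        return max(scores, key=scores.get)


# Illustrative training set (an assumption, not the article's data).
names = ["Apple", "Microsoft", "Amazon", "Alphabet"]
codes = ["AAPL", "MSFT", "AMZN", "GOOGL"]
model = NaiveBayesPrefixClassifier().fit(names, codes)

# Simulate a user typing a query one character at a time.
query = "micro"
for end in range(1, len(query) + 1):
    prefix = query[:end]
    print(prefix, "->", model.predict(prefix))
```

The "naive" part is the sum inside `log_scores`: each n-gram contributes to the score independently of the others, which is exactly the independence assumption described above.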
“Humans aren’t as good as we should be in our capacity to empathize with feelings and thoughts of others, be they humans or other animals on Earth. So maybe part of our formal education should be training in empathy. Imagine how different the world would be if, in fact, that were ‘reading, writing, arithmetic, empathy.’” – Neil deGrasse Tyson Abstract The objective is two-class classification (positive or negative opinion) of movie reviews, using data from the IMDB database (50,000 reviews).
“Everybody’s worried about stopping terrorism. Well, there is a really easy way: stop participating in it.” – Noam Chomsky Abstract According to Wikipedia, the English word “terror”, just like the French “terreur”, derives from the Latin word “terrere” and means fright, alarm, anguish, fear, panic. Indeed, we all fear terrorism, which is more and more a part of our lives. But do we understand the global picture? Who attacks whom, where and why? Why do we see more and more suicide attacks? This study focuses on answering these questions…
The impact of individual characteristics on the length of life in India – Oaxaca-Blinder decomposition, Logit model
Abstract This study estimates the impact of social status, education and average living standards on the length of life by analyzing India’s mortality statistics in 2009 for two states, Uttarakhand and Bihar. Using several estimation methods such as OLS, GLM and Logit regressions, as well as the Oaxaca-Blinder decomposition, we find that education, electricity and access to a toilet significantly raise the length of life. We also find that members of the scheduled tribes live shorter lives, and that this difference cannot be explained by differences between the average values of the two groups’ characteristics.
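For readers unfamiliar with the method, the standard twofold Oaxaca-Blinder decomposition can be written as below. The notation is an assumption made here for illustration (groups $A$ and $B$, mean outcomes $\bar{Y}$, mean characteristics $\bar{X}$, and group-specific regression coefficients $\hat{\beta}$); the study's exact specification may differ.

```latex
\bar{Y}_A - \bar{Y}_B
  = \underbrace{(\bar{X}_A - \bar{X}_B)^{\top} \hat{\beta}_A}_{\text{explained by characteristics}}
  + \underbrace{\bar{X}_B^{\top} (\hat{\beta}_A - \hat{\beta}_B)}_{\text{unexplained}}
```

The first term is the part of the gap attributable to differences in average characteristics; the second term is the part that remains even if both groups had the same characteristics, which is the sense in which a difference "cannot be explained" above.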
Multiple correspondence analysis, Clustering and Tandem Analysis through a basic income analysis example
Abstract Okay… So there were several basic income experiments launched in 2017. Finland started a two-year experiment by giving 2,000 unemployed citizens approximately $600 a month. In Silicon Valley, Y Combinator announced in mid-2016 that it would begin paying out monthly salaries between $1,000 and $2,000 a month to 100 families in Oakland, while in Utrecht, Netherlands, 250 Dutch citizens will receive about $1,100 per month. These are just three of the already launched experiments, and their aim is to measure how basic income could provide a new structure for social security and to see how people’s…
Abstract What determines happiness? Why are some countries more (or less) happy than others? In 2017, Norway tops the global happiness ranking, published annually by the United Nations Sustainable Development Solutions Network. In this article, we use their data to show the correlations between the variables used in this index, and we analyse the countries with the help of the Principal Component Analysis technique.
Abstract This article shows a simple program written in Python to do basic web scraping. As an exercise, we get the titles of YouTube videos and their number of views, then we store this information in a pandas DataFrame.
Abstract A well-made graph gives us valuable insight that helps us better understand and analyse data. This post gives a very simple introduction to graphs in R with ggplot2, through examples with code.