Important probability distributions

This article summarises some discrete probability distributions (the Bernoulli, binomial and Poisson distributions) and some continuous ones (the normal, student-t and exponential distributions), and graphs them in Python.

Discrete distributions

1) Bernoulli distribution

Let’s start with the easiest case, a binary variable that records whether you win the lottery. Let x be 0 if you don’t win and 1 if you do. The Bernoulli distribution describes the probability distribution of such a binary variable, and it is characterised by a single parameter μ, the probability of x = 1:

p(x = 1 | μ) = μ

(In the lottery example, μ is about 1 in 14 million.) Since probabilities always lie between 0 and 1, μ is greater than or equal to 0 and less than or equal to 1. Since the probabilities of all possible outcomes add up to 1, the probability of x = 0 is:

p(x = 0 | μ) = 1 − μ

Therefore, the Bernoulli distribution can be written as:

Bern(x | μ) = μ^x (1 − μ)^(1 − x)

Now imagine that you play the lottery each week, and you want to know the joint probability of winning. We can assume that each lottery outcome is a random variable that is independent of previous and future games and that each game has the same distribution (we say that the outcomes are i.i.d., that is, independent and identically distributed). In this case the joint probability of N weekly outcomes x_1, …, x_N can be computed as the product of the individual probabilities, that is:

p(x_1, …, x_N | μ) = ∏_{n=1}^{N} μ^(x_n) (1 − μ)^(1 − x_n)

Spoiler alert: it stays quite close to zero, unfortunately. The following code plots the Bernoulli distribution with μ = 0.25. (Unfortunately I could not plot the Bernoulli distribution of winning the lottery because it gave an empty plot. 😦)

from scipy.stats import bernoulli
import numpy as np
import matplotlib.pyplot as plt

# Bernoulli
mu = 1/4
x_values = np.array([0, 1])  # the two possible outcomes
pmf_values = bernoulli.pmf(x_values, mu)
# one marker and one vertical line per outcome
plt.plot(x_values, pmf_values, 'bo', ms=8, label='bernoulli pmf')
plt.vlines(x_values, 0, pmf_values, colors="#6a79f7", lw=2, alpha=0.5)
plt.title(f"Bernoulli with mu = {mu}")
plt.legend()
plt.show()
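As a rough numerical check of the weekly lottery example, the sketch below multiplies the individual Bernoulli probabilities of an i.i.d. sequence. The 1-in-14-million figure comes from the text; the 50-year horizon, the all-losses sequence and the variable names are assumptions made purely for illustration.

```python
import numpy as np
from scipy.stats import bernoulli

mu_lottery = 1 / 14_000_000  # winning probability quoted in the text
n_weeks = 52 * 50            # assumption: one game a week for 50 years

# Joint probability of one particular i.i.d. sequence = product of the pmf values.
outcomes = np.zeros(n_weeks, dtype=int)  # hypothetical sequence with no win at all
p_no_win = np.prod(bernoulli.pmf(outcomes, mu_lottery))

# The same quantity in closed form, and its complement.
p_no_win_closed = (1 - mu_lottery) ** n_weeks
p_at_least_one_win = 1 - p_no_win_closed

print(f"P(no win in {n_weeks} games) = {p_no_win:.6f}")
print(f"P(at least one win)        = {p_at_least_one_win:.6f}")
```

Even over thousands of games, the probability of at least one win remains far below one percent under these assumptions.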

Now let’s pose another question: given that we play the lottery n times, what is the probability that we win exactly m times (with m ≤ n)? This can be modelled by the binomial distribution.

2) Binomial distribution

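The binomial distribution with parameters n and μ is a discrete probability distribution of the number of successes m in a sequence of n independent experiments, each of which succeeds with probability μ:

Bin(m | n, μ) = C(n, m) μ^m (1 − μ)^(n − m), where C(n, m) = n! / (m! (n − m)!)

Its expectation is n μ, while its variance is n μ (1 − μ).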

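# Binomial
samples = np.random.binomial(n=10, p=mu, size=10000)
# one bin per integer outcome 0..10, centred on the integer
plt.hist(samples, bins=np.arange(12) - 0.5, rwidth=0.5, color="#00035b")
plt.title(f"Binomial with mu={mu} and n=10")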
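A small sanity check, assuming the standard binomial formula C(n, m) μ^m (1 − μ)^(n − m): the sketch below compares a hand-computed probability with scipy's `binom.pmf` and reads off the mean and variance. The choice m = 3 is arbitrary and only for illustration.

```python
from math import comb
from scipy.stats import binom

n, mu = 10, 0.25  # same parameters as the histogram above
m = 3             # illustrative number of successes

# Probability of exactly m successes, computed by hand and by scipy.
manual = comb(n, m) * mu**m * (1 - mu) ** (n - m)
library = binom.pmf(m, n, mu)

# Mean and variance as reported by scipy.
mean, var = binom.stats(n, mu, moments="mv")

print(f"manual pmf = {manual:.6f}, scipy pmf = {library:.6f}")
print(f"mean = {float(mean)}, variance = {float(var)}")
```

The reported mean and variance should agree with n μ and n μ (1 − μ) respectively.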

3) Poisson distribution

The Poisson distribution is a discrete probability distribution that gives the probability of a given number of independent events occurring in a fixed interval of time or space, assuming that these events occur with a known constant mean rate (and independently of the time since the last occurrence). For instance, the number of new arrivals at a petrol station or the number of patients with a rare disease in a population may follow a Poisson distribution.

The distribution is characterised by λ, the expected value of the distribution. The probability of an event occurring k times in an interval can be written as:

p(k | λ) = λ^k e^(−λ) / k!

Both its expectation and its variance are equal to λ. The code and graphs for different values of lambda are shown below:

from scipy.stats import poisson

# Poisson
lambda1, lambda2, lambda3 = 1, 4, 6
dist1, dist2, dist3 = poisson(lambda1), poisson(lambda2), poisson(lambda3)

x_axis_values = list(range(0, 10))
# PMF-s (a discrete distribution has a probability mass function, not a density)
probabilities1 = [dist1.pmf(value) for value in x_axis_values]
probabilities2 = [dist2.pmf(value) for value in x_axis_values]
probabilities3 = [dist3.pmf(value) for value in x_axis_values]
# CDF-s
cdf1 = [dist1.cdf(value) for value in x_axis_values]
cdf2 = [dist2.cdf(value) for value in x_axis_values]
cdf3 = [dist3.cdf(value) for value in x_axis_values]
# Plot the PMF-s
plt.figure()
plt.plot(x_axis_values, probabilities1, "o--", color="#000080", label=f"PMF- lambda={lambda1}")
plt.plot(x_axis_values, probabilities2, "o--", color="#5DADE2", label=f"PMF- lambda={lambda2}")
plt.plot(x_axis_values, probabilities3, "o--", color="#E74C3C", label=f"PMF- lambda={lambda3}")
plt.title("Poisson PMF with different values of lambda")
plt.legend()
# Plot the CDF-s on a separate figure so the two titles do not overwrite each other
plt.figure()
plt.plot(x_axis_values, cdf1, "o--", color="#000080", label=f"CDF- lambda={lambda1}")
plt.plot(x_axis_values, cdf2, "o--", color="#5DADE2", label=f"CDF- lambda={lambda2}")
plt.plot(x_axis_values, cdf3, "o--", color="#E74C3C", label=f"CDF- lambda={lambda3}")
plt.title("Poisson CDF with different values of lambda")
plt.legend()
plt.show()
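To connect the plots to the formula, the sketch below evaluates the standard Poisson probability λ^k e^(−λ) / k! by hand and compares it with `poisson.pmf`. It also queries scipy for the mean and variance. The values λ = 4 and k = 2 are arbitrary choices for illustration.

```python
from math import exp, factorial
from scipy.stats import poisson

lam, k = 4, 2  # illustrative rate and event count

# Probability of exactly k events, by hand and by scipy.
manual = lam**k * exp(-lam) / factorial(k)
library = poisson.pmf(k, lam)

# For a Poisson distribution the mean and the variance coincide.
mean, var = poisson.stats(lam, moments="mv")

print(f"manual pmf = {manual:.6f}, scipy pmf = {library:.6f}")
print(f"mean = {float(mean)}, variance = {float(var)}")
```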

The difference between the binomial and the Poisson distribution is quite subtle: the binomial distribution counts successes in a fixed number of discrete trials (the probability of winning the lottery m times in n attempts), whereas the Poisson distribution counts events over a continuous interval with no fixed number of trials (for instance the number of lottery wins in a fixed time period).

For very large n and near-zero μ, the binomial distribution is nearly identical to the Poisson distribution with λ = n μ.
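The closeness of the two distributions in this regime can be checked numerically. The sketch below uses illustrative values (n = 10 000 trials with success probability 0.0003, chosen here and not taken from the article) and compares the binomial probabilities with those of a Poisson distribution whose rate is n times the success probability.

```python
import numpy as np
from scipy.stats import binom, poisson

n, mu = 10_000, 0.0003  # assumption: many trials, tiny success probability
lam = n * mu            # matching Poisson rate

k = np.arange(0, 15)    # event counts to compare
binom_probs = binom.pmf(k, n, mu)
poisson_probs = poisson.pmf(k, lam)

# Largest pointwise gap between the two probability mass functions.
max_abs_diff = np.max(np.abs(binom_probs - poisson_probs))
print(f"lambda = {lam}, largest difference = {max_abs_diff:.2e}")
```

Shrinking n or increasing the success probability makes the gap grow, which is one way to see when the approximation stops being reasonable.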

Continuous distributions

1) Normal distribution

The normal distribution (also called Gaussian) is probably the probability distribution that occurs most often in nature. For instance, people’s heights, test results, babies’ weights and IQ scores approximately follow normal distributions.

Another reason why the normal distribution is so important is the behaviour of sums of random variables. The Central Limit Theorem states that, under mild conditions (for example, independent and identically distributed terms with finite variance), the suitably normalised sum of random variables (itself a random variable) becomes increasingly Gaussian as the number of terms in the sum increases.
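The Central Limit Theorem can be illustrated with a quick simulation. The sketch below is only one possible setup: it sums 50 uniform random numbers (an arbitrary choice of distribution and sample size), standardises the sums, and checks how often they fall within one and two standard deviations of the mean. For a normal distribution these fractions are roughly 0.683 and 0.954.

```python
import numpy as np

rng = np.random.default_rng(0)       # fixed seed so the run is reproducible
n_terms, n_samples = 50, 100_000     # assumption: 50 terms per sum

# Each row holds n_terms i.i.d. Uniform(0, 1) draws; sum along the row.
sums = rng.random((n_samples, n_terms)).sum(axis=1)

# Standardise using the exact mean n/2 and variance n/12 of the sum.
z = (sums - n_terms / 2) / np.sqrt(n_terms / 12)

frac_within_1sd = np.mean(np.abs(z) < 1)
frac_within_2sd = np.mean(np.abs(z) < 2)
print(f"within 1 sd: {frac_within_1sd:.3f}, within 2 sd: {frac_within_2sd:.3f}")
```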

For a single real-valued variable x, the normal distribution has the following form:

N(x | μ, σ²) = 1 / √(2πσ²) · exp(−(x − μ)² / (2σ²))

We see immediately that the distribution of x is determined by two parameters:

  • its mean (μ), which is its expectation and also the value with the highest probability density
  • and its variance (σ²), which measures its spread around the mean

The value at which the PDF reaches its maximum is called the mode, which equals the mean in the case of the normal distribution.

The standard normal distribution has a mean of 0 and a variance of 1. The corresponding PDF and CDF are shown below.

from scipy.stats import norm
mu = 0
sigma = 1
# create distribution
dist = norm(mu, sigma)
# evaluate the pdf and cdf on a grid from -10 to 9.9
x_axis_values = [0.1 * x for x in range(-100, 100)]
probabilities = [dist.pdf(value) for value in x_axis_values]
cdfp = [dist.cdf(value) for value in x_axis_values]

plt.plot(x_axis_values, probabilities, color="#00035b", label="PDF")
plt.plot(x_axis_values, cdfp, color="#6a79f7", label="CDF")
plt.title("Normal distribution", color="#00035b")
plt.legend()
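As a cross-check, the sketch below codes the usual univariate normal density, 1 / √(2πσ²) · exp(−(x − μ)² / (2σ²)), by hand and compares it with `norm.pdf`. It also locates the mode on a grid. The helper name `normal_pdf` and the parameter values are hypothetical, chosen only for this illustration.

```python
import numpy as np
from scipy.stats import norm

def normal_pdf(x, mu, sigma):
    """Hand-written normal density (illustrative helper, not a scipy function)."""
    return np.exp(-((x - mu) ** 2) / (2 * sigma**2)) / np.sqrt(2 * np.pi * sigma**2)

mu, sigma = 1.0, 2.0            # assumption: arbitrary mean and standard deviation
x = np.linspace(-5, 5, 1001)    # grid with step 0.01

manual = normal_pdf(x, mu, sigma)
library = norm.pdf(x, loc=mu, scale=sigma)

# The grid point with the highest density approximates the mode.
mode = x[np.argmax(library)]
print(f"max difference = {np.max(np.abs(manual - library)):.2e}, mode = {mode:.2f}")
```

Note that scipy's `scale` argument is the standard deviation σ, not the variance σ².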


Now if you have a D-dimensional vector, x, then the normal distribution becomes a function of the D-dimensional mean vector and a D × D covariance matrix.

The multivariate Gaussian with D=2 is shown below.

from scipy.stats import multivariate_normal

# Parameters to set
mu = [0, 0]
cov = [[10, 0], [0, 10]]  # covariance matrix

# Grid and multivariate normal distribution
x = np.linspace(-10, 10, 500)
y = np.linspace(-10, 10, 500)
X, Y = np.meshgrid(x, y)
pos = np.dstack((X, Y))  # shape (500, 500, 2): one (x, y) pair per grid point
rv = multivariate_normal(mu, cov)

# Make a 3D plot
fig = plt.figure()
# add_subplot replaces fig.gca(projection='3d'), which recent matplotlib versions reject
ax = fig.add_subplot(projection='3d')
ax.set_title(f"Multivariate normal with mu = {mu} and covariance = {cov}")
ax.plot_surface(X, Y, rv.pdf(pos), cmap='viridis', linewidth=0)
ax.set_xlabel('X axis')
ax.set_ylabel('Y axis')
ax.set_zlabel('Z axis')
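The covariance matrix used in the plot is diagonal, which means the two coordinates are uncorrelated. In that special case the joint density should factorise into the product of two univariate normal densities. The sketch below checks this at one arbitrarily chosen point; the point and the variable names are assumptions for illustration.

```python
import numpy as np
from scipy.stats import multivariate_normal, norm

mu = [0, 0]
cov = [[10, 0], [0, 10]]       # diagonal covariance: variance 10 on each axis
rv = multivariate_normal(mu, cov)

point = np.array([1.5, -2.0])  # hypothetical evaluation point

# Joint density from the bivariate distribution.
joint = rv.pdf(point)

# Product of the two marginal densities (standard deviation = sqrt(variance)).
sd = np.sqrt(10)
product = norm.pdf(point[0], 0, sd) * norm.pdf(point[1], 0, sd)

print(f"joint = {joint:.6e}, product of marginals = {product:.6e}")
```

With non-zero off-diagonal entries the two numbers would no longer agree.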

2) Student-t distribution

The student-t distribution is very similar to the normal distribution, but its peak is slightly lower and its tails are fatter. When conducting statistical tests on small sets of data, the t-distribution is preferred to the normal distribution, but the larger the sample size is, the more closely the student-t distribution converges towards the normal distribution.

Suppose that we draw a sample of size n from an underlying normal distribution with mean μ and variance σ². Let x̄ be the sample mean and s the sample standard deviation. Then the t-statistic, defined as:

t = (x̄ − μ) / (s / √n)

has a student-t distribution with n − 1 degrees of freedom (dof). For each sample size, there is a different student-t distribution. The bigger the sample size gets, the closer the student-t distribution gets to the normal distribution. But student-t distributions are generally quite similar to normal distributions: they are bell-shaped and centred around zero. The following code shows some student-t PDF-s and CDF-s for different dof-s, and compares them to the normal distribution. We can see that as the sample size (and so the dof) increases, the student-t distribution becomes almost identical to the normal distribution.

from scipy.stats import t, norm
import matplotlib.pyplot as plt
dof1, dof2, dof3 = 1, 5, 30
dist1, dist2, dist3 = t(dof1), t(dof2), t(dof3)
# normal distribution
mu = 0
sigma = 1
# create distribution
normal_dist = norm(mu, sigma)

x_axis_values = [0.1 * x for x in range(-100, 100)]
# PDF-s
probabilities1 = [dist1.pdf(value) for value in x_axis_values]
probabilities2 = [dist2.pdf(value) for value in x_axis_values]
probabilities3 = [dist3.pdf(value) for value in x_axis_values]
normal_distr_proba = [normal_dist.pdf(value) for value in x_axis_values]
# CDF-s
cdf1 = [dist1.cdf(value) for value in x_axis_values]
cdf2 = [dist2.cdf(value) for value in x_axis_values]
cdf3 = [dist3.cdf(value) for value in x_axis_values]
normal_cdf = [normal_dist.cdf(value) for value in x_axis_values]
# Plot the PDF-s
plt.figure()
plt.plot(x_axis_values, probabilities1, "--", color="#E74C3C", label=f"Student-t- dof={dof1}")
plt.plot(x_axis_values, probabilities2, "--", color="#3498DB", label=f"Student-t- dof={dof2}")
plt.plot(x_axis_values, probabilities3, "--", color="#27AE60", label=f"Student-t- dof={dof3}")
plt.plot(x_axis_values, normal_distr_proba, "--", color="#17202A", label="Standard normal")
plt.title("Student-t PDF with different values of dof vs Standard Normal")
plt.legend()
# Plot the CDF-s on a separate figure so the two titles do not overwrite each other
plt.figure()
plt.plot(x_axis_values, cdf1, "--", color="#E74C3C", label=f"Student-t- dof={dof1}")
plt.plot(x_axis_values, cdf2, "--", color="#3498DB", label=f"Student-t- dof={dof2}")
plt.plot(x_axis_values, cdf3, "--", color="#27AE60", label=f"Student-t- dof={dof3}")
plt.plot(x_axis_values, normal_cdf, "--", color="#17202A", label="Standard normal")
plt.title("Student-t CDF with different values of dof vs Standard Normal")
plt.legend()
plt.show()
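The claim that the t-statistic of a small normal sample follows a student-t distribution with n − 1 degrees of freedom can be probed by simulation. The sketch below is one way to do it, with assumed values (samples of size 5, mean 10, standard deviation 2, threshold 2) that do not come from the article. It computes the statistic (x̄ − μ) / (s / √n) for many samples and compares the fraction of large values with the tail probabilities of the t and normal distributions.

```python
import numpy as np
from scipy.stats import t, norm

rng = np.random.default_rng(0)   # fixed seed so the run is reproducible
n, mu, sigma = 5, 10.0, 2.0      # assumption: small samples from N(10, 2^2)

# Draw many samples of size n and compute the t-statistic of each.
samples = rng.normal(mu, sigma, size=(200_000, n))
x_bar = samples.mean(axis=1)
s = samples.std(axis=1, ddof=1)  # sample standard deviation
t_stats = (x_bar - mu) / (s / np.sqrt(n))

threshold = 2.0
empirical_tail = np.mean(np.abs(t_stats) > threshold)
t_tail = 2 * t.sf(threshold, df=n - 1)  # two-sided tail of student-t with n-1 dof
normal_tail = 2 * norm.sf(threshold)    # two-sided tail of the standard normal

print(f"empirical: {empirical_tail:.4f}, student-t: {t_tail:.4f}, normal: {normal_tail:.4f}")
```

The empirical fraction should sit close to the student-t value and well above the normal one, which is the "fatter tails" effect in numbers.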

3) Exponential distribution

The exponential distribution is a continuous probability distribution defined by one parameter, λ, the rate of the distribution. It describes processes in which small values are the most likely and larger values become rapidly less likely. It can also be seen as the distribution of the time between consecutive events whose counts follow the Poisson distribution.

For instance, the time between two earthquakes or how long a car’s battery lasts can be described by the exponential distribution.
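The link with the Poisson distribution can be explored with a simulation. The sketch below is a rough illustration with assumed values (a rate of 4 events per unit of time, 50 000 unit intervals): it draws exponential waiting times, turns them into arrival times, and counts the arrivals in each unit interval. If the link holds, the counts should have mean and variance both close to the rate.

```python
import numpy as np

rng = np.random.default_rng(0)  # fixed seed so the run is reproducible
lam = 4.0                       # assumption: 4 events per unit of time
n_intervals = 50_000            # number of unit-length intervals to fill

# Draw 20% more waiting times than needed on average, so the timeline is covered.
gaps = rng.exponential(scale=1 / lam, size=int(lam * n_intervals * 1.2))
arrival_times = np.cumsum(gaps)
arrival_times = arrival_times[arrival_times < n_intervals]

# Count how many arrivals fall in each interval [0, 1), [1, 2), ...
counts = np.bincount(arrival_times.astype(int), minlength=n_intervals)

print(f"mean count = {counts.mean():.3f}, variance = {counts.var():.3f}")
```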

The PDF of the exponential distribution is:

f(x | λ) = λ e^(−λx) for x ≥ 0

while its CDF is:

F(x | λ) = 1 − e^(−λx) for x ≥ 0

Its mean is 1/λ and its variance is 1/λ².
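These properties can be verified against scipy, keeping in mind that `expon` is parametrised by `scale` = 1/λ instead of by the rate itself. The sketch below uses λ = 1.5 as an arbitrary example and assumes the standard density λ e^(−λx) and distribution function 1 − e^(−λx).

```python
import numpy as np
from scipy.stats import expon

lam = 1.5                      # illustrative rate
dist = expon(scale=1 / lam)    # scipy uses scale = 1 / lambda

x = np.linspace(0, 5, 101)     # evaluation grid on x >= 0

# Closed-form density and distribution function.
manual_pdf = lam * np.exp(-lam * x)
manual_cdf = 1 - np.exp(-lam * x)

print(f"pdf matches: {np.allclose(manual_pdf, dist.pdf(x))}")
print(f"cdf matches: {np.allclose(manual_cdf, dist.cdf(x))}")
print(f"mean = {dist.mean():.4f}, variance = {dist.var():.4f}")
```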

The following code graphs the PDF and CDF of the exponential distribution for different values of lambda:

from scipy.stats import expon
lambda1, lambda2, lambda3 = 0.5, 1.5, 2.5
# scipy parametrises the exponential distribution by scale = 1 / lambda
dist1, dist2, dist3 = expon(scale=1/lambda1), expon(scale=1/lambda2), expon(scale=1/lambda3)
# fine grid from 0 to 9.95 so the curves look smooth
x_axis_values = [0.05 * x for x in range(0, 200)]
# PDF-s
probabilities1 = [dist1.pdf(value) for value in x_axis_values]
probabilities2 = [dist2.pdf(value) for value in x_axis_values]
probabilities3 = [dist3.pdf(value) for value in x_axis_values]
# CDF-s
cdf_lambda1 = [dist1.cdf(value) for value in x_axis_values]
cdf_lambda2 = [dist2.cdf(value) for value in x_axis_values]
cdf_lambda3 = [dist3.cdf(value) for value in x_axis_values]
# Plot PDF for different values of lambda
plt.figure()
plt.plot(x_axis_values, probabilities1, color="#000080", label=f"PDF- lambda={lambda1}")
plt.plot(x_axis_values, probabilities2, color="#5DADE2", label=f"PDF- lambda={lambda2}")
plt.plot(x_axis_values, probabilities3, color="#1B4F72", label=f"PDF- lambda={lambda3}")
plt.title("Exponential PDF for different values of lambda")
plt.legend()

# Plot CDF for different values of lambda on a separate figure
plt.figure()
plt.plot(x_axis_values, cdf_lambda1, color="#000080", label=f"CDF- lambda={lambda1}")
plt.plot(x_axis_values, cdf_lambda2, color="#5DADE2", label=f"CDF- lambda={lambda2}")
plt.plot(x_axis_values, cdf_lambda3, color="#1B4F72", label=f"CDF- lambda={lambda3}")
plt.title("Exponential CDF for different values of lambda")
plt.legend()
plt.show()

Summary

This blog post shows the main properties of some discrete and continuous probability distributions. This list is by no means exhaustive; check the references if you want to dig deeper! Thanks for reading!


References

Bishop, Christopher M. Pattern recognition and machine learning. Springer, 2006.

Murphy, Kevin P. Machine learning: a probabilistic perspective. MIT Press, 2012.

Bertsekas, Dimitri P., and John N. Tsitsiklis. Introduction to probability, Vol. 1. 2002.
