# Expectation Maximization for MAP estimation

“Expectation is the root of all heartache.” – attributed to William Shakespeare

Expectation–maximization (EM) is an iterative method for computing maximum likelihood (or maximum a posteriori) estimates of the parameters θ of a probabilistic model involving latent variables. The algorithm alternates between computing a conditional expectation and solving a maximization problem, hence its name.

### Intro of the model

Suppose we have some observed data y with a parametric density p(y|θ), and some complete data x that we wish we had, with parametric density p(x|θ).

Given that we only observe y, the goal here is to find the maximum likelihood estimate (MLE) of θ, that is:

θ_{MLE} = arg max_θ p(y|θ)

where p(y|θ) is obtained from the complete-data density by summing (or integrating) p(x|θ) over all x consistent with y, which is what makes the direct maximization hard.

The algorithm can be described in the following steps:

Step 1. Initialize θ with an initial guess θ(0).

Step 2. Given the observed data y, and pretending for the moment that our current guess θ(m) is correct, we compute the conditional distribution p(x|y, θ(m)).

Step 3. Using the conditional distribution p(x|y, θ(m)) calculated in Step 2, we construct the conditional expected log-likelihood, called the Q-function:

Q(θ|θ(m)) = E_{X|y,θ(m)}[log p(X|θ)]

Step 4. We find the θ that maximizes the Q-function. The result is our new estimate θ(m+1).

Step 5. We go back to Step 2 and iterate until convergence: we define a tolerance ε, and if the estimates change by less than ε between iterations, we stop. A generic sketch of this loop is shown below.
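Before specializing to a particular model, here is a minimal generic sketch of this loop, assuming hypothetical `e_step` and `m_step` functions supplied by the specific model (neither is defined in this post) and an array-valued θ:

```python
import numpy as np

def em(theta0, y, e_step, m_step, eps=1e-6, max_iters=100):
    """Generic EM loop: e_step(y, theta) returns the quantities needed to build
    the Q-function; m_step(y, stats) returns arg max_theta Q(theta | theta_m)."""
    theta = theta0
    for _ in range(max_iters):
        stats = e_step(y, theta)       # E-step: condition on y with the current theta
        theta_new = m_step(y, stats)   # M-step: maximize the Q-function
        if np.max(np.abs(theta_new - theta)) < eps:  # stop when estimates barely move
            return theta_new
        theta = theta_new
    return theta
```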

So the Expectation Maximization algorithm can be described as two iterative steps, the E (expectation) step and the M (maximization) step. It is most often used for mixture models, but in what follows I will use it for MAP (maximum a posteriori) estimation with a non-negative matrix factorization.

## Implementation

This section is based on (but implemented a bit differently from) the following article: https://www.hindawi.com/journals/cin/2009/785152/. Check it out, it is really interesting!

Consider the following probabilistic non-negative matrix factorization (NMF) model, where W and H are the matrices whose product approximately recovers V (for f = 1, . . . , F, n = 1, . . . , N, k = 1, . . . , K):

w_{fk} ∼ G(w_{fk}; α_w, β_w)

h_{kn} ∼ G(h_{kn}; α_h, β_h)

v_{fn} ∼ PO(v_{fn}; Σ_k w_{fk} h_{kn})

where G and PO denote the gamma and the Poisson distributions, respectively.

Suppose also that we have a latent variable S such that

s_{fkn} ∼ PO(s_{fkn}; w_{fk} h_{kn})

where PO(s; λ) denotes the Poisson distribution of the random variable s, with non-negative intensity λ:

PO(s; λ) = λ^s e^{−λ} / s!

Therefore, by the superposition property of independent Poisson random variables:

v_{fn} = Σ_k s_{fkn} ∼ PO(v_{fn}; Σ_k w_{fk} h_{kn})

Since the conditional distribution of the sources given the observations is multinomial, S_{f,1:K,n} | v_{fn}, W, H ∼ M(S_{f,1:K,n}; v_{fn}, p_{f,1:K,n}) with cell probabilities p_{fkn} = w_{fk} h_{kn} / Σ_j w_{fj} h_{jn}, we can say that E[s_{fkn} | v_{fn}, W, H] = v_{fn} p_{fkn}.

Once we have defined the latent variable S and its conditional distribution given W, H, V, we can proceed with the first step of the Expectation Maximization algorithm.

#### E-step

We define the E-step as follows: given the current estimates W(m) and H(m), we compute the posterior expectations of the latent sources,

E[s_{fkn} | V, W(m), H(m)] = v_{fn} w_{fk} h_{kn} / Σ_j w_{fj} h_{jn}

which are exactly the multinomial means derived above, and which are the sufficient statistics the M-step needs.
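To make the expectation concrete, here is a minimal sketch on a tiny toy problem (the sizes F, K, N and the random data are made up for illustration; the full F × K × N tensor is never materialized in the real implementation below, which only needs its sums over f or n):

```python
import numpy as np

F, K, N = 4, 2, 3
rng = np.random.default_rng(0)
W = rng.gamma(1.0, 1.0, size=(F, K))   # toy factors drawn from the gamma priors
H = rng.gamma(1.0, 1.0, size=(K, N))
V = rng.poisson(W @ H)                 # toy observations from the Poisson model

den = W @ H                                           # [WH]_{fn} = sum_j w_fj h_jn
p = np.einsum('fk,kn->fkn', W, H) / den[:, None, :]   # multinomial cell probabilities p_{fkn}
E_S = V[:, None, :] * p                               # E[s_{fkn} | v] = v_{fn} p_{fkn}

# Sanity check: the expected sources sum back to the observations over k
assert np.allclose(E_S.sum(axis=1), V)
```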

#### M-step

The M-step maximizes the expected complete-data log-likelihood plus the log-priors of W and H (call this objective L). Setting the partial derivative of L w.r.t. W to zero yields the following update:

w_{fk} ← (α_w − 1 + w_{fk} Σ_n (v_{fn} / [WH]_{fn}) h_{kn}) / (β_w + Σ_n h_{kn})

Similarly, taking the partial derivative of L w.r.t. H, we obtain by symmetry:

h_{kn} ← (α_h − 1 + h_{kn} Σ_f w_{fk} (v_{fn} / [WH]_{fn})) / (β_h + Σ_f w_{fk})
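To connect these formulas to code, here is one W update applied to the toy V, W, H from the E-step sketch above; a sketch under the assumption α_w ≥ 1, which keeps the numerator (and hence the update) non-negative:

```python
alpha_w, beta_w = 2.0, 1.0
den = W @ H                                      # [WH]_{fn}
num = (alpha_w - 1) + W * ((V / den) @ H.T)      # alpha_w - 1 + w_fk * sum_n (v_fn / [WH]_fn) h_kn
W_upd = num / (beta_w + np.ones(V.shape) @ H.T)  # beta_w + sum_n h_kn
assert (W_upd >= 0).all() and W_upd.shape == W.shape
```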

We can therefore illustrate the directed graphical model as follows:

```python
import networkx as nx
import numpy as np
import matplotlib.pyplot as plt

# W and H generate the latent sources S, which in turn sum to the observations V
G = nx.DiGraph()
G.add_edges_from([("W", "S"), ("H", "S"), ("S", "V")])
nx.draw(G, with_labels=True, font_size=18, node_size=2000, node_color='#CDC1C5')
plt.title("Directed graph model of the MAP estimation with latent variable S", fontsize=20)
plt.show()
```
I will use the AT&T Database of Faces, which contains face images of 40 people, with 10 images per person. In total there are 400 images in the dataset; each image is 92 × 112 pixels, with 256 gray levels per pixel.
```python
# First open the data and save the np.array of pixels as V
import scipy.io as sio
data = sio.loadmat("att_faces.mat")  # assumed path: a .mat export of the AT&T faces dataset
V = data["V"]
```
```python
# Plot the different images in a first and simple way:
# reorganize the data to show each image next to another
def all_photo_visualisation(input_matrix):
    """
    Input: the input np.array

    Reshapes each column of the input np.array to a (92, 112) np.array,
    concatenates them and plots all images in a single figure.
    """
    V_images = input_matrix[:, 0].reshape((92, 112))

    for i in range(1, input_matrix.shape[1]):
        v_act = input_matrix[:, i].reshape((92, 112))
        V_images = np.concatenate((V_images, v_act), axis=0)

    # Cut the long strip of stacked images into 20 columns of 20 images each
    list_i = [i * 2 for i in range(1, 20)]

    v_act = V_images[range(92 * 20), :]
    for i in list_i:
        v_act2 = V_images[range(92 * i * 10, 92 * (i + 2) * 10), :]
        v_act = np.concatenate((v_act, v_act2), axis=1)

    fig = plt.figure()
    fig.set_size_inches(26, 30)
    plt.imshow(v_act.T, interpolation='none', cmap='gray')

all_photo_visualisation(V)
```

#### EM algorithm implementation

```python
# First I define a class for the model
class NMF_EM(object):
    """
    V - the (f, n) matrix that we try to factorize
    W, H - the non-negative matrix factorization of V, of dimensions (f, K) and (K, n) respectively
    K - the number of features
    alpha_w, alpha_h, beta_w, beta_h - the shape and scale parameters of the gamma priors on W and H respectively
    """
    def __init__(self, V, W, H, K, alpha_w, alpha_h, beta_w, beta_h):
        self.V = V
        self.W = W
        self.H = H
        self.K = K
        self.alpha_h = alpha_h
        self.alpha_w = alpha_w
        self.beta_w = beta_w
        self.beta_h = beta_h
        self.n_rows, self.n_cols = V.shape
```

```python
# The EM algorithm
def em_algo(Nmf_Class, max_iters):
    """
    Inputs: an NMF_EM instance with parameters:
    V - the (f, n) matrix that we try to factorize
    W, H - the non-negative matrix factorization of V, of dimensions (f, K) and (K, n) respectively
    K - the number of features
    max_iters - the number of EM iterations to run

    Returns: the updated estimates of W and H after max_iters iterations
    """
    # Denominator initialization: [WH]_{fn}
    den = Nmf_Class.W.dot(Nmf_Class.H)

    for i in range(max_iters):
        # M-step update for W: (alpha_w - 1 + W * ((V / WH) H^T)) / (beta_w + 1 H^T)
        W_new = (Nmf_Class.alpha_w - 1) * np.ones(Nmf_Class.W.shape) + np.multiply(Nmf_Class.W, (Nmf_Class.V / den).dot(Nmf_Class.H.T))
        temp_den_w = Nmf_Class.beta_w * np.ones(Nmf_Class.W.shape) + np.ones(Nmf_Class.V.shape).dot(Nmf_Class.H.T)
        W_new = W_new / temp_den_w

        # M-step update for H: (alpha_h - 1 + H * (W^T (V / WH))) / (beta_h + W^T 1)
        H_new = (Nmf_Class.alpha_h - 1) * np.ones(Nmf_Class.H.shape) + np.multiply(Nmf_Class.H, Nmf_Class.W.T.dot(Nmf_Class.V / den))
        temp_den_h = Nmf_Class.beta_h * np.ones(Nmf_Class.H.shape) + Nmf_Class.W.T.dot(np.ones(Nmf_Class.V.shape))
        H_new = H_new / temp_den_h

        Nmf_Class.W = W_new
        Nmf_Class.H = H_new

        # Recompute the denominator with the updated factors for the next iteration
        den = W_new.dot(H_new)

    return Nmf_Class.W, Nmf_Class.H
```
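As written, `em_algo` runs for a fixed number of iterations. To implement the ε-based stopping rule of Step 5, one could monitor the generalized Kullback–Leibler divergence between V and WH, the quantity this Poisson model effectively minimizes. A minimal sketch, assuming the `NMF_EM` class and `em_algo` above (`kl_divergence` and `em_algo_with_tol` are hypothetical helpers, not part of the original post):

```python
def kl_divergence(V, W, H, eps=1e-12):
    """Generalized KL divergence D(V || WH), up to terms constant in W and H."""
    WH = W.dot(H)
    return np.sum(V * np.log((V + eps) / (WH + eps)) - V + WH)

def em_algo_with_tol(Nmf_Class, max_iters, tol=1e-4):
    """Run em_algo one iteration at a time and stop once the divergence stalls."""
    prev = kl_divergence(Nmf_Class.V, Nmf_Class.W, Nmf_Class.H)
    for _ in range(max_iters):
        em_algo(Nmf_Class, 1)  # a single EM iteration (mutates the class in place)
        cur = kl_divergence(Nmf_Class.V, Nmf_Class.W, Nmf_Class.H)
        if abs(prev - cur) < tol * abs(prev):  # relative change below tolerance
            break
        prev = cur
    return Nmf_Class.W, Nmf_Class.H
```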

```python
# Now define a function that generates W and H randomly
def generate_data(K=25, f=92*112, n=400, alpha_w=1, beta_w=1, alpha_h=1, beta_h=1):
    """
    Inputs:
    K - number of features
    alpha_w, alpha_h - shape parameters of the gamma distributions of W and H
    beta_w, beta_h - scale parameters of the gamma distributions of W and H
    """
    # Set prior parameters
    Alpha_w = alpha_w * np.ones([f, K])  # shape
    Beta_w = beta_w * np.ones([f, K])    # scale
    Alpha_h = alpha_h * np.ones([K, n])  # shape
    Beta_h = beta_h * np.ones([K, n])    # scale

    # Draw random W and H matrices from the gamma priors
    W = np.random.gamma(Alpha_w, Beta_w)
    H = np.random.gamma(Alpha_h, Beta_h)

    return W, H
```
```python
# We initialize W_0 and H_0 randomly, following a gamma distribution
W_0, H_0 = generate_data(K=25, f=92*112, n=400, alpha_w=1, beta_w=1, alpha_h=1, beta_h=1)

# Initialize a class instance.
# We set here the values of alpha and beta for W and H that we will use in the EM algorithm
First_model = NMF_EM(V, W_0, H_0, K=25, alpha_w=1, alpha_h=1, beta_w=1, beta_h=1)

# We run the model on the initialized class with alpha_w = alpha_h = beta_w = beta_h = 1
W_new, H_new = em_algo(First_model, 100)
```
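Before visualizing the learned features, a quick sanity check one can run (a minimal sketch, not in the original post, assuming the V, W_new, H_new defined above) is to compare an original face with its rank-K reconstruction:

```python
# Compare an original face with its low-rank reconstruction from the estimated factors
V_hat = W_new.dot(H_new)  # the model's approximation of V

fig, axes = plt.subplots(1, 2, figsize=(6, 4))
axes[0].imshow(V[:, 0].reshape((92, 112)).T, cmap='gray')
axes[0].set_title("Original")
axes[1].imshow(V_hat[:, 0].reshape((92, 112)).T, cmap='gray')
axes[1].set_title("Reconstruction (K=25)")
plt.show()
```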
```python
# Reorganize the data of the W matrix to show each feature (we reuse the originally written
# all_photo_visualisation function, but we change the number of photos per line and column.)
def all_feature_visualisation(input_matrice, nb_col, nb_row):
    """
    Input: np.array (the updated W matrix)

    Reshapes each column of W to a (92, 112) np.array,
    concatenates them and plots an nb_col x nb_row grid of features.
    """
    V_images = input_matrice[:, 0].reshape((92, 112))

    for i in range(1, input_matrice.shape[1]):
        v_act = input_matrice[:, i].reshape((92, 112))
        V_images = np.concatenate((V_images, v_act), axis=0)

    # Cut the strip of stacked features into nb_row columns of nb_col features each
    list_i = [i for i in range(1, nb_row)]

    v_act = V_images[range(92 * nb_col), :]
    for i in list_i:
        v_act2 = V_images[range(92 * i * nb_col, 92 * (i + 1) * nb_col), :]
        v_act = np.concatenate((v_act, v_act2), axis=1)

    fig = plt.figure()
    plt.imshow(v_act.T, interpolation='none', cmap='gray')

all_feature_visualisation(W_new, 5, 5)
```