Introduction to Graph Models


“Graphical models are a marriage between probability theory and graph theory.” – Michael Jordan, 1998.

Probability is very important in modern pattern recognition problems. These problems could be assessed by formulating and  solving difficult probabilistic models, however, using a graphic representation of these probabilistic problems is often highly advantageous for the following reasons:

1) The visualisation of models makes the models themselves easier to understand and handle. They can also help us to distinguish new models or to point out similarities between already existing model structures, that we have not assumed.

2) They help us to have greater isights into the models structures. They can for instance carry out information whether there is a conditional independence between variables by following simples rules. These rules are based, however, on complex underlying mathematical manipulation. To avoid the troublesome computations, we can easily determine the inference and conditional dependencies by looking at the graphs.

A graph has nodes that are connected by edges. In probabilistic models, each node stands for a random variable whereas edges show us the probabilistic relationships between these variables. This article will give an introduction to a short probability review (where only necessary probability theory is presented), followed by an introduction to:

1) Directed graphical models (DAG-s), also known as Bayesian networks

2) Undirected graphs (UD), also known as Markov random fields and

3) Factor graphs

1) Directed a-cyclic graphical models (DAG-s)

Suppose we have three random variables, a, b and c. The probability that all three variables happens, (that is for instance, they take a value) can be expressed as : p(a, b, c).

The DAG-s can help us to model the relationships between these random variables. Suppose that our random variables are as following:

-a – it will rain (treating it as a random variable)

-b – I will be wet (treating it as a random variable that depends on variable a)

-c – I will be grumpy (treating it as a random variable that depends on the variables a and b – indeed, I am very moody)

We see that the probability p(a, b, c)  represent the probability that it rains so I will get wet in the rain and I will be grumpy. Therefore, the probability that all these three occurs, can be written as:

p(a, b, c) = p(c | a, b) p(b, a)

We have used the product rule here. Next, we use again the product rule to break up the  joint probability of a and b (p(b, a)):

p(a, b, c) = p(a) p(b | a) p(c | a, b)

As the probability that I get wet in the rain depends on the whether it rains, whereas the probability that  I will be grumpy depends on the rain and whether I got wet in the rain. Indeed, the force of DAG-s is that they can easily represent such relationships between random variables.  To represent this model as a DAG, first, we need to represent each variable as a node, then we need to show the conditional probabilities in forms of edges. Here, a is not conditional on b and c, however, b is conditional on a. To show this, we direct an arrow from a to b. The variable ‘c’ depends on ‘a’ and ‘b’, so we direct an arrow from ‘a’ and from ‘b’ to ‘c’. Finally, we have created our first directed model, shown on figure below:

In [2]:
import networkx as nx
import matplotlib.pyplot as plt

G.add_nodes_from(['a', 'b','c'])
nx.draw(G, with_labels=True, font_size=18, node_size=2000, node_color='#CDC1C5')
plt.title("First directed graph model", fontsize=20)

Now imagine for a moment that we have three random variable: x, y and z and I precise nothing about them. Then, following the approach shown above, we could rewrite the joint probability of x, y and z as follows :

p(x, y, z) = p(x | y, z) p(y, z)

And by repeatedly using the product rule:

p(x, y, z) = p(x | y, z) p(y | z) p(z)

The associated graph is shown in the figure below.

In [5]:
G.add_nodes_from(['x', 'y','z'])
nx.draw(G, with_labels=True, font_size=18, node_size=2000, node_color='#CDC1C5')
plt.title("Second directed graph model", fontsize=20)

Could have we chosen a different ordering? Could we represent the data differently? Look at the following product rules:

p(x, y, z) = p(y | x, z) p(x, z) $$$ p(x, y, z) = p(y | x, z) p(z | x)  p(x) $

And the associated graph is shown in figure X.

In [7]:
G.add_nodes_from(['x', 'y','z'])
nx.draw(G, with_labels=True, font_size=18, node_size=2000, node_color='#CDC1C5')
plt.title("Third directed graph model", fontsize=20)

Therefore by writing a probabilistic relationship between the three variables, we see that the although the right side of the equations (p(x, y, z)) is symmetrical to the three random variables, the left side is not, it assumes an already existing ordering (namely that x depends on y and z or y depends on x and y, etc) so would have we chosen a different ordering, we would have had a different decomposition and therefore a different graphical representation.

Now we extend our model to the general n-element model:

p( x1, x2, … xn) = p( x1 | x2, … xn) p( x2 | x3, … xn) … p( xn-1 | xn) p(xn)

The graph of this model would have n nodes, and would be fully connected meaning that between each two nodes, there is a directed edge. This model is not very informative and is difficult to work with. Our aim is to try to deduct important relationships between the random variables and disregard those that do not concern us. Therefore we can make an interesting point: not only the  presence but also the absence of edges between two variables convey an important fact about our models.

Next, the question arises how could we represent the joint distribution of random variables given a DAG model? Until now, we have expressed the probability distribution of random variables that we have translated into a graph, how do we translate a graph into a probabilistic distribution?

We can now deduce the general method to translate a given DAG model into the probabilistic distribution of the random variables. The joint distribution defined by the graph is given by the product, over all nodes of the graph, of a conditional distribution of each node conditioned on the corresponding parent variables of the node. We can write :

p(x) = \Pi_{k=1}^{K} P(x_k|parents(x) )

This above is the factorisation property.

Explaining away or the Conditional Independence Idea

Now we consider three variables ‘a’, ‘b’, and ‘c’, and we suppose that the conditional distribution of ‘a’, given ‘b’ and ‘c’ is equal to the conditional distribution of ‘a’ given ‘b’, that is:

p(a|b, c) = p(a|b)

Therefore we say here that’a’ is conditionally independent of ‘c’ given ‘b’.

We will use the notion of the conditional probability in the following application of DAGs:

Hidden Markov Models

Dynamic Bayesian Networks (DBNs) are directed graphical models of stochastic processes. The idea is to represent the hidden (and observed) state in terms of state variables, which can have complex interdependencies.


The simplest kind of DBN is a Hidden Markov Model (HMM), which has one discrete hidden node and one observed node for each step. The above graph shows a HMM with 10 nodes.

In [10]:
import numpy as np
import networkx as nx
import matplotlib.pyplot as plt
import random
import plotly.plotly as py
import plotly.graph_objs as go
from IPython.display import display, Math, Latex
import pyparsing
import matplotlib.pyplot as plt
import seaborn as sns
np.set_printoptions(precision=5, suppress=True)

%matplotlib inline

def draw_graph_2(A, B):

    G = np.zeros((20, 20))

    positions = [(0, 0), (0, -1), (1, 0), (1, -1), (2, 0), (2, -1), (3, 0), (3, -1), (4, 0), (4, -1), (5, 0),
           (5, -1), (6, 0), (6, -1), (7, 0), (7, -1), (8, 0), (8, -1), (9, 0), (9, -1)]

    labels = {0: '$x_{1}$', 1: '$y_{1}$', 2: '$x_{2}$', 3: '$y_{2}$', 4: '$x_{3}$',
              5: '$y_{3}$', 6: '$x_{4}$', 7: '$y_{4}$', 8: '$x_{5}$', 9: '$y_{5}$',
              10: '$x_{6}$', 11: '$y_{6}$', 12: '$x_{7}$', 13: '$y_{7}$', 14: '$x_{8}$',
              15: '$y_{8}$', 16: '$x_{9}$', 17: '$y_{9}$', 18: '$x_{10}$', 19: '$y_{10}$'}

    G[0, 1] = 1

    for i in range(1, 10):

        for e in A:
            a, b = e
            a = a + 2 * i
            b = b + 2 * i
            G[a, b] = 1

        for e in B:
            a, b = e
            a = a + 2 * (i - 1)
            b = b + 2 * i
            G[a, b] = 1

    return G, positions, labels

A = [(0, 1)]
B = [(0, 0)]

A, positions, labels = draw_graph_2(A, B)
G = nx.DiGraph(A)
nx.draw(G, positions, font_size=18, node_size=2000, labels=labels, arrows=True, node_color="#CDC1C5")

Screenshot 2018-11-27 at 21.48.34

More later!!!!

Leave a Reply

Fill in your details below or click an icon to log in: Logo

You are commenting using your account. Log Out /  Change )

Google photo

You are commenting using your Google account. Log Out /  Change )

Twitter picture

You are commenting using your Twitter account. Log Out /  Change )

Facebook photo

You are commenting using your Facebook account. Log Out /  Change )

Connecting to %s

%d bloggers like this: