One of the reasons we are interested in probabilities is that they let us compute the average value of a function. The average of a function $f(x)$ under the probability distribution $p(x)$ is called the expectation of $f$, denoted $\mathbb{E}[f]$. It is the average of the values of $f(x)$, each weighted by the probability of the corresponding $x$.

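If $x$ is discrete, the expectation is computed by:

$$\mathbb{E}[f] = \sum_x p(x)\, f(x)$$

while for a continuous $x$:

$$\mathbb{E}[f] = \int p(x)\, f(x)\, \mathrm{d}x$$

The expectation of a function can be estimated by the sample mean. Suppose we have $N$ observations $x_1, \dots, x_N$ drawn from $p(x)$; the expectation of $f(x)$ is then approximated by the mean of $f$ evaluated at those observations:

$$\mathbb{E}[f] \simeq \frac{1}{N} \sum_{n=1}^{N} f(x_n)$$

This estimate converges to the true value of the expectation as $N \to \infty$.

The variance of a function, written $\operatorname{var}[f]$, measures how much $f(x)$ spreads around its expectation. It is calculated as:

$$\operatorname{var}[f] = \mathbb{E}\big[(f(x) - \mathbb{E}[f(x)])^2\big]$$

Of course, we can also talk about the variance of $x$ itself, often written $\sigma^2$. In this case the variance shows the spread around the mean (expected value) of $x$:

$$\operatorname{var}[x] = \mathbb{E}\big[(x - \mathbb{E}[x])^2\big] = \mathbb{E}[x^2] - \mathbb{E}[x]^2$$

The standard deviation is the square root of the variance, and it is denoted $\operatorname{sd}(x)$ or $\sigma(x)$.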
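To make the definitions of expectation and variance concrete, here is a minimal sketch in Python. The fair six-sided die, the function `f(x) = x**2`, and the helper names `expectation` and `sample_mean` are all assumptions chosen for illustration; they do not come from the reference. The sketch compares the exact weighted average with the sample-mean estimate for a few values of N.

```python
import random

# Hypothetical example: a fair six-sided die, so p(x) = 1/6 for x in 1..6.
values = [1, 2, 3, 4, 5, 6]
p = {x: 1 / 6 for x in values}


def f(x):
    """Illustrative function whose expectation we want."""
    return x ** 2


def expectation(func, dist):
    """Exact discrete expectation: sum over x of p(x) * func(x)."""
    return sum(prob * func(x) for x, prob in dist.items())


def sample_mean(func, samples):
    """Estimate of the expectation: mean of func over the observations."""
    return sum(func(x) for x in samples) / len(samples)


exact_E_f = expectation(f, p)
mean_x = expectation(lambda x: x, p)

# Variance of x from the definition and from the E[x^2] - E[x]^2 identity.
var_x_definition = expectation(lambda x: (x - mean_x) ** 2, p)
var_x_identity = expectation(lambda x: x ** 2, p) - mean_x ** 2

rng = random.Random(0)  # fixed seed so the run is repeatable
estimates = {}
for n in (100, 10_000, 100_000):
    samples = [rng.choice(values) for _ in range(n)]
    estimates[n] = sample_mean(f, samples)
    print(f"N={n:>7}: estimate={estimates[n]:.4f}  exact={exact_E_f:.4f}")

print(f"var[x] by definition: {var_x_definition:.4f}")
print(f"var[x] by identity:   {var_x_identity:.4f}")
```

For any single run the estimate is not guaranteed to improve at every step, but its typical error shrinks as N grows.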

Finally, when considering two variables, $x$ and $y$, their covariance measures the extent to which they vary together. It is defined by:

$$\operatorname{cov}[x, y] = \mathbb{E}_{x,y}\big[(x - \mathbb{E}[x])(y - \mathbb{E}[y])\big] = \mathbb{E}_{x,y}[xy] - \mathbb{E}[x]\,\mathbb{E}[y]$$

If the covariance is positive, the two variables tend to move together, while a negative covariance indicates that as one variable increases, the other tends to decrease. The sign of the covariance therefore shows the direction of the linear relationship between the two variables. Its magnitude, however, is not easily interpreted, because it depends on the scale of the variables.

The normalised version of the covariance is the correlation: the covariance of the two variables divided by the product of their standard deviations:

$$\operatorname{corr}[x, y] = \frac{\operatorname{cov}[x, y]}{\sigma(x)\,\sigma(y)}$$

This gives a statistic between -1 and 1. A value of -1 indicates a perfect negative linear relationship between the two variables, a value of 1 a perfect positive linear relationship, and values close to either end a strong linear relationship. If the two variables are independent, their covariance (and therefore their correlation) is zero.

As a final note, the covariance and the correlation capture only linear relationships. They do not pick up on any other relationship between the two variables, even a strong non-linear one, so a correlation of zero does not by itself imply independence.
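The point about linear versus non-linear relationships can be checked numerically. The following is a minimal sketch, assuming a symmetric grid of x values and three hand-picked relationships. The helpers `covariance` and `correlation` are illustrative names written for this example (using the population form, dividing by N), not a library API.

```python
import math


def mean(values):
    return sum(values) / len(values)


def covariance(xs, ys):
    """Population covariance: mean of (x - E[x]) * (y - E[y])."""
    mx, my = mean(xs), mean(ys)
    return sum((x - mx) * (y - my) for x, y in zip(xs, ys)) / len(xs)


def correlation(xs, ys):
    """Covariance divided by the product of the standard deviations."""
    sd_x = math.sqrt(covariance(xs, xs))  # cov[x, x] is var[x]
    sd_y = math.sqrt(covariance(ys, ys))
    return covariance(xs, ys) / (sd_x * sd_y)


# Assumed data: x on a grid that is symmetric around zero.
xs = [i / 100 for i in range(-100, 101)]

y_positive = [2 * x + 1 for x in xs]   # perfect positive linear relationship
y_negative = [-3 * x for x in xs]      # perfect negative linear relationship
y_quadratic = [x ** 2 for x in xs]     # y fully determined by x, but not linearly

corr_positive = correlation(xs, y_positive)
corr_negative = correlation(xs, y_negative)
corr_quadratic = correlation(xs, y_quadratic)

print(f"y = 2x + 1: corr = {corr_positive:.3f}")
print(f"y = -3x:    corr = {corr_negative:.3f}")
print(f"y = x^2:    corr = {corr_quadratic:.3f}")
```

In the quadratic case y is completely determined by x, yet the correlation comes out at essentially zero, because the positive and negative halves of the grid cancel in the covariance sum.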

# References

Bishop, Christopher M. Pattern Recognition and Machine Learning. Springer, 2006.
