Some ML techniques work fine when you have only a few dimensions (d), but they break down as the dimensionality grows. The phenomena that arise when analysing data in high-dimensional spaces, and that do not occur in low-dimensional settings, are referred to as the curse of dimensionality. This short post explains what the curse is and why it occurs.
Two examples are often given to illustrate the curse of dimensionality. First, let’s consider the combinatorics of binary variables: if you have D binary variables, the number of possible outcomes is $2^D$. The number of possible outcomes is exponential in the number of dimensions!
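As a quick illustration, here is a minimal Python sketch (my own, not from the original post) that prints how fast the number of outcomes grows and, for a small D, enumerates them explicitly:

```python
from itertools import product

# The number of possible outcomes of D binary variables grows as 2**D.
for D in (1, 2, 5, 10, 20):
    print(f"D = {D:2d} -> {2 ** D:>9,} possible outcomes")

# For a small D we can enumerate the outcomes explicitly and check the count.
D = 3
outcomes = list(product([0, 1], repeat=D))
print(outcomes)        # [(0, 0, 0), (0, 0, 1), ..., (1, 1, 1)]
print(len(outcomes))   # 8 == 2**3
```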
Let’s see the curse of dimensionality through another example. The K-Nearest Neighbour (KNN) classifier receives a new point x, finds the K points in the training set D that are closest to x, and looks at their classes. It then returns the class-membership probability p(c | x, D, K) as the fraction of those K neighbours that belong to class c:

$$p(c \mid x, D, K) = \frac{1}{K} \sum_{i \in N_K(x, D)} \mathbb{I}(y_i = c),$$

where $N_K(x, D)$ denotes the K nearest neighbours of x and $\mathbb{I}(\cdot)$ is the indicator function. A commonly used distance metric for KNN is the Euclidean distance.
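To make the procedure concrete, here is a minimal NumPy sketch of such a KNN classifier with the Euclidean distance; the function name knn_predict_proba and the toy data are my own choices for illustration, not taken from the referenced books:

```python
import numpy as np

def knn_predict_proba(x, X_train, y_train, K, n_classes):
    """Estimate p(c | x, D, K) as the fraction of the K nearest
    neighbours of x (Euclidean distance) that belong to each class."""
    dists = np.linalg.norm(X_train - x, axis=1)   # Euclidean distance to every training point
    nearest = np.argsort(dists)[:K]               # indices of the K closest points
    counts = np.bincount(y_train[nearest], minlength=n_classes)
    return counts / K                             # class-membership ratios

# Toy usage: two classes in a 2-dimensional unit square.
rng = np.random.default_rng(0)
X_train = rng.uniform(size=(100, 2))
y_train = (X_train[:, 0] > 0.5).astype(int)       # label by which half of the square a point falls in
x_new = np.array([0.8, 0.3])
print(knn_predict_proba(x_new, X_train, y_train, K=5, n_classes=2))
```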
After reviewing the KNN classifier, let’s apply it in two settings: a lower- and a higher-dimensional space. First, we place 100 points uniformly at random in a unit square. We pick one of the 100 points at random and grow a D-dimensional cube around it until it contains a chosen fraction of the points, say f. How long does the edge of that cube have to be compared to the edge of the total space? Since the points are uniform, a cube with edge length e contains roughly a fraction $e^D$ of them, so the expected edge length is

$$e_D(f) = f^{1/D},$$

where D is the number of dimensions and f is the fraction of the points inside the D-dimensional cube.
So, returning to our example with 2 dimensions (D = 2) and a fraction of 10% of the points (f = 0.1), we find

$$e_2(0.1) = 0.1^{1/2} \approx 0.316,$$

which means that the cube (here, a square) containing 10% of the uniformly placed points has an edge roughly 31% as long as the edge of the total two-dimensional space!
Now let’s increase the number of dimensions and imagine placing the 100 points uniformly in a 10-dimensional space (D = 10). The D-dimensional cube drawn around a point now has an edge almost 80% as long as the edge of the total space that holds the 100 points:

$$e_{10}(0.1) = 0.1^{1/10} \approx 0.794.$$
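A small Python sketch (again my own, following the setup above) reproduces these numbers directly from the formula $e_D(f) = f^{1/D}$:

```python
# Expected edge length of the D-dimensional cube capturing a fraction f of
# points distributed uniformly in the unit hypercube: e_D(f) = f ** (1 / D).
f = 0.1
for D in (1, 2, 3, 5, 10, 100):
    edge = f ** (1 / D)
    print(f"D = {D:3d} -> edge needed to capture {f:.0%} of the points: {edge:.3f}")
# D =   2 -> 0.316  (about 31% of the unit edge)
# D =  10 -> 0.794  (almost 80% of the unit edge)
```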
But then looking so far away from our original point may no longer give a good local approximation. In summary, the curse of dimensionality is that as we increase the dimensionality of the space, our data become more and more sparse, so keeping the same density after adding one more dimension requires an exponential increase in the number of sample points.
Thanks for reading!
References
Bishop, Christopher M. *Pattern Recognition and Machine Learning*. Springer, 2006.
Murphy, Kevin P. *Machine Learning: A Probabilistic Perspective*. MIT Press, 2012.