This post explains what happens when you get underflow and overflow error. I think it is important to keep in mind when you look at different models in order to understand the difference between the theory and the implementation. I guess it is also an easy question to answer at interviews!
Why do we have underflow and overflow?
We try to model continuous mathematics on a computer with limited memory. This means that we try to represent an infinite number of values on a fixed set of bits. This is not possible so we almost always use some approximation to represent a number – often, this approximation is rounding error. This can have some unwanted consequences though.
When we try to represent very little numbers around zero, and they are rounded to zero, we may encounter underflow. Therefore underflow occurs when a mathematical operation yields a number that is smaller that the computer is capable of storing. Applications handle the underflow error differently, but many will round the number, so very little values around zero will be rounded to zero. The problem is that many function behaves differently for the value of zero. Take the division as a simple example: we can divide any number by 0.0000000000001 but not by 0! The computer program crashes and gives an underflow error.
Underflow occurs most often when working with decimal numbers. Working with so small numbers is actually often the case as probabilities (especially joint probabilities) computed over and over again can be extremely small. Although underflow is a relatively harmless approximation error compared to overflow (since using zero as the approximation of very small numbers around zero is actually quite logic), it may lead to the unfortunate situation that your algorithm should work fine in theory, but you can’t make it work in practice.
Another type of rounding error is overflow. It occurs when your computer receives a number that is bigger than your device is able to store. When very large numbers (too large for your computer’s memory) are computed, they are approximated to infinity or minus infinity. This can be considered as a more serious approximation error than underflow, since it is a less good approximation. This may cause your algorithm to crash as well, even if it worked fine in theory.
When you use the ML libraries for your models, they are often coded in a robust way to try to avoid underflow and overflow. In what follows I give you an example for such a robustness.
Let’s say that you want to code and use your own softmax function. Nothing difficult in it, you can simply code as:
import numpy as np def my_softmax(x): """Compute softmax for each element in x.""" return np.exp(x)/ np.sum(np.exp(x), axis=0)
Now your algo might work just fine for some values:
y = np.array([1, 1, 1, 1]) my_softmax(y)
array([0.25, 0.25, 0.25, 0.25])
But not for every values, when you use this with positive and very large values, overflow is encountered:
z = np.array([10000000000000000, 90000000000000000]) my_softmax(z)
/RuntimeWarning: overflow encountered in exp """ RuntimeWarning: invalid value encountered in true_divide """
And when you use it with negative and large values, underflow is encountered. This is because exp(x) gives zero, and the denominator (sum of exp(x) values becomes zero).
w = np.array([-10000000000000000, -90000000000000000]) my_softmax(w), np.exp(np.array([-90000000000000000])) /RuntimeWarning: invalid value encountered in true_divide """ (array([nan, nan]), array([0.]))
def robust_softmax(x): """Compute softmax for each element in x in a robust way to avoid underflow and overflow. """ z = x - np.max(x) return np.exp(z)/ np.sum(np.exp(z), axis=0) # Now all values give back the correct softmax values as our # softmax function is numerically more stable robust_softmax(y), robust_softmax(z), robust_softmax(w) (array([0.25, 0.25, 0.25, 0.25]), array([0., 1.]), array([1., 0.]))
Now our softmax function is more stable. Let’s show that the robust function is equivalent to the softmax function in theory, only the numerical stability is reinforced:
Since exp(max(x)) is just a contsant (and thus does not depend on the sum), we can simplify teh denominator by it and we find the softmax function:
ML algos are often coded in a robust way in order to avoid underflows and overflows, however, even with this precaution, we often we encounter these problems. At least now you know why you encounter it and you understand the difference between the theory and the practical implementation! Thanks for reading!