Underfitting refers to the case where a learning algorithm has not learned "well enough", in the sense that the predictions of the learned model are too far from the true labels, even on the training data.
Overfitting refers to the opposite situation, where the model fits the training data too well: it resembles a model that learns the training set by heart and therefore cannot generalize satisfactorily. In such cases, the model is typically too complex for the task at hand, since fitting the training data that closely requires a certain level of flexibility.
Underfitting and overfitting are difficult to avoid because their definitions involve rather vague terms such as "too far from the true labels" or "too complex", without precise thresholds at which a model becomes too complex or its predictions too far from the truth. This is where model selection techniques come into play: they try to estimate the best model, one that neither underfits nor overfits the training data. Model selection techniques are often based on a trade-off between model complexity and accuracy on the training data.
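One common way to make this trade-off concrete, written here in generic notation that is only meant as an illustration rather than a specific criterion used on this page, is to select the model minimizing a complexity-penalized empirical risk:
$$\hat{f} \;=\; \operatorname*{arg\,min}_{f \in \mathcal{F}} \; \hat{R}(f) + \lambda\, C(f),$$
where $\hat{R}(f)$ denotes the empirical risk of $f$ on the training set, $C(f)$ measures the complexity of $f$ (for a polynomial model, its degree is a natural proxy), and $\lambda > 0$ controls the strength of the penalty.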
When approximating the target function with $N = 20$ noisy data points in the training set, the quality of the polynomial model depends heavily on the degree of the polynomial, which can be seen as an indication of the model complexity.
Exercise: observe how the empirical risk converges to 0 as the degree increases, and how the true risk decreases together with the empirical risk until it starts growing very large for high degrees.
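For readers without access to the interactive demo, the sketch below reproduces this behavior numerically. The target function (a sine curve), the noise level and the list of degrees are assumptions made for the sake of the example, not necessarily those used in the demo; with such a setup one should see the empirical risk shrink towards 0 as the degree grows, while the estimated true risk eventually blows up.

```python
# Minimal sketch (assumed setup, not the demo's actual code): fit polynomials
# of increasing degree to N = 20 noisy samples of an assumed target function
# and compare the empirical risk (training MSE) with an estimate of the true
# risk (MSE on a large, independently drawn test set).
import numpy as np

rng = np.random.default_rng(0)

def target(x):
    return np.sin(2 * np.pi * x)  # assumed target function

N = 20
sigma = 0.2  # assumed noise standard deviation
x_train = rng.uniform(0, 1, N)
y_train = target(x_train) + sigma * rng.normal(size=N)

# Large independent sample used to approximate the true risk.
x_test = rng.uniform(0, 1, 10_000)
y_test = target(x_test) + sigma * rng.normal(size=10_000)

for degree in [1, 3, 5, 9, 15, 19]:
    # Least-squares polynomial fit; very high degrees may raise a RankWarning,
    # which is precisely the ill-conditioned regime where overfitting occurs.
    coeffs = np.polyfit(x_train, y_train, degree)
    emp_risk = np.mean((np.polyval(coeffs, x_train) - y_train) ** 2)
    true_risk = np.mean((np.polyval(coeffs, x_test) - y_test) ** 2)
    print(f"degree {degree:2d}: empirical risk = {emp_risk:.4f}, "
          f"true risk = {true_risk:.4f}")
```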