The risk and the loss function

In words...

The risk of a learning machine or model is the amount of prediction error that can be expected when using that model. The main goal in supervised learning is to minimize this risk in order to obtain the most accurate model.

In the sentence above, the "amount of error" is measured via a loss function while the term "that can be expected" refers to the mathematical expectation of this loss function, i.e., the value of the loss obtained on average when testing the model on randomly drawn inputs.

The loss function directly encodes our knowledge about a given problem and the goal we set for the learning machine. The definition of the loss function thus depends on whether we are in a classification or regression setting. In all cases, a loss function must obey a few basic rules: i) it should compare a label with a predicted label and output a positive real number, ii) it must output zero when the two labels are equal.

The minimum of the risk over all possible models is called the Bayes' risk, which is what we would ideally obtain with a learning machine having access to the probability distribution of the data or to an infinite number of examples.

The risk is not a quantity that can be computed exactly, since it measures the quality of predictions of unknown phenomena, and in practice must be estimated. The most simple estimate of the risk is the test error. Cross-validation is also a widely used technique.

In maths...

In supervised learning, a loss function $\ell : \Y \times \Y \rightarrow \R^+$ computes a pointwise measure $\ell(y,\hat{y})$ of the error between a label $y$ and its estimate $\hat{y}$. For all $(y,\hat{y}) \in \Y^2$, a loss function satisfies $$\ell(y,\hat{y}) \geq 0$$ and $$\hat{y} = y \quad \Rightarrow \quad \ell(y,\hat{y}) = 0 .$$

Given a loss function $\ell$, the risk of a model $f$ is defined as the expectation of the loss function applied to the output of the model: $$R(f) = \E_{X,Y} [ \ell(f(X), Y) ].$$ Using the law of total expectation, another convenient formulation involves the conditional expectation of $Y$ given $X$: $$R(f) = \E_{X} \E_{Y|X=x} [ \ell(f(X), Y)\ |\ X=x ].$$