Regularization

In words...

Regularization is a versatile approach to the problem of overfitting. It can be seen as a modification of empirical risk minimization that explicitly controls model complexity: we minimize a trade-off between the training error and a regularization term that penalizes complex models.

This trade-off, however, must itself be tuned, and doing so is the focus of many model selection methods.
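
For concreteness, here is a minimal sketch of the simplest such strategy: fit the model for several values of the trade-off parameter $\lambda$ and keep the one with the lowest error on a held-out validation set. The synthetic data, the grid of $\lambda$ values, and the choice of ridge regression as the regularized model are assumptions made only for illustration.

```python
import numpy as np

# Placeholder synthetic data: 100 points, 5 features, linear ground truth plus noise.
rng = np.random.default_rng(0)
X = rng.normal(size=(100, 5))
w_true = rng.normal(size=5)
y = X @ w_true + 0.5 * rng.normal(size=100)

# Split into a training set and a validation set.
X_tr, y_tr = X[:70], y[:70]
X_val, y_val = X[70:], y[70:]

def ridge_fit(X, y, lam):
    """Regularized least squares: squared loss + lam * ||w||^2 (closed-form solution)."""
    d = X.shape[1]
    return np.linalg.solve(X.T @ X + lam * np.eye(d), X.T @ y)

# Try a grid of trade-off values and keep the one with the lowest validation error.
best_lam, best_err = None, np.inf
for lam in [0.01, 0.1, 1.0, 10.0, 100.0]:
    w = ridge_fit(X_tr, y_tr, lam)
    err = np.mean((X_val @ w - y_val) ** 2)
    if err < best_err:
        best_lam, best_err = lam, err

print(f"selected lambda = {best_lam}, validation MSE = {best_err:.3f}")
```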

In pictures...

In maths...

Regularized learning in a function class $\F$ can be posed as the functional optimization problem $$ \min_{f\in\F} \ R_{emp}(f) + \lambda \Omega(f) , $$ where $R_{emp}(f)$ is the empirical risk of $f$, $$ R_{emp}(f) = \sum_{i=1}^N \ell(y_i, f(\g x_i)), $$ computed with a loss function $\ell$, $\Omega : \F\rightarrow \R^+$ is a regularizer that penalizes overly complex functions $f$, and $\lambda >0$ is a hyperparameter that tunes the trade-off between the fit to the training data and the regularization.
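
For instance, taking $\F$ to be the class of linear functions $f(\g x) = \g w^T \g x$, the squared loss $\ell(y, \hat y) = (y - \hat y)^2$ and the regularizer $\Omega(f) = \|\g w\|_2^2$ recovers ridge regression, $$ \min_{\g w} \ \sum_{i=1}^N (y_i - \g w^T \g x_i)^2 + \lambda \|\g w\|_2^2 , $$ whose solution is available in closed form as $\g w = (\g X^T \g X + \lambda \g I)^{-1} \g X^T \g y$, where $\g X$ stacks the $\g x_i^T$ as rows and $\g y$ collects the labels $y_i$.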

A common regularizer for functions $f\in \F\subseteq\H$ from a Hilbert space $\H$ is the squared norm: $$ \Omega(f) = \|f\|_{\H}^2 = \inner{f}{f}_{\H}. $$ If $\H$ is a reproducing kernel Hilbert space, then the solution to the learning problem with this regularizer can be computed efficiently thanks to the representer theorem.
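
Concretely, the representer theorem guarantees a minimizer of the form $f(\g x) = \sum_{i=1}^N \alpha_i k(\g x_i, \g x)$, where $k$ is the reproducing kernel of $\H$. With the squared loss, the coefficients $\g\alpha$ solve the linear system $(\g K + \lambda \g I)\g\alpha = \g y$ with Gram matrix entries $K_{ij} = k(\g x_i, \g x_j)$, which is kernel ridge regression. The sketch below implements this with a Gaussian kernel; the toy data and the kernel bandwidth are placeholders chosen only for illustration.

```python
import numpy as np

def rbf_kernel(A, B, gamma=1.0):
    """Gaussian kernel k(a, b) = exp(-gamma * ||a - b||^2) between the rows of A and B."""
    sq_dists = np.sum(A**2, axis=1)[:, None] + np.sum(B**2, axis=1)[None, :] - 2 * A @ B.T
    return np.exp(-gamma * sq_dists)

def kernel_ridge_fit(X, y, lam, gamma=1.0):
    """Solve (K + lam * I) alpha = y, the form given by the representer theorem with squared loss."""
    K = rbf_kernel(X, X, gamma)
    return np.linalg.solve(K + lam * np.eye(len(X)), y)

def kernel_ridge_predict(X_train, alpha, X_new, gamma=1.0):
    """Predict f(x) = sum_i alpha_i k(x_i, x) for each new point."""
    return rbf_kernel(X_new, X_train, gamma) @ alpha

# Toy usage with placeholder data: fit a noisy sine curve.
rng = np.random.default_rng(0)
X = rng.uniform(-3, 3, size=(50, 1))
y = np.sin(X[:, 0]) + 0.1 * rng.normal(size=50)
alpha = kernel_ridge_fit(X, y, lam=0.1, gamma=0.5)
y_hat = kernel_ridge_predict(X, alpha, X, gamma=0.5)
print("training MSE:", np.mean((y_hat - y) ** 2))
```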