Learning algorithms

In words...

A learning algorithm is a method that aims at building a predictive model from a training set in the context of supervised learning.

If the predictive model is restrained to be in some model space which imposes a particular structure on the model, then the goal of the learning algorithm is to determine the values of the model parameters involved in this structure.

A learning algorithm can take different forms, from a fully detailed algorithm that can be directly implemented on a computer to a higher-level formulation, for instance as an optimization problem for which precise algorithms are well-known.

When a learning algorithm involves parameters that have to be tuned by the user, these are called hyperparameters to make a clear distinction with the model parameters that are determined automatically by the learning algorithm.

Finding the optimal value for a hyperparameter is a difficult task, known as model selection and which typically relies on an estimation of the risk, for instance with an additional data sample or a cross-validation procedure. Note that when using an additional sample to tune hyperparameters, this sample is called a validation sample in order to distinguish it from the test sample that is used to estimate the risk of the model at the end of the learning procedure.

In pictures...

Abstract view of a learning algorithm

In maths...

In a supervised learning problem with input space $\X$ and label space $\Y$, a learning algorithm $\mathcal{A}$ is a mapping of the form $$ \mathcal{A} : ( \F , \{(\g x_i, y_i)\}_{i=1}^N ) \mapsto f \in \F $$ where $\F$ is a set of possible models, $\{(\g x_i, y_i)\}_{i=1}^N \in (\X\times \Y)^N$ is a training set and $f$ is the returned predictive model.

Hyperparameters

Some learning algorithms have one or more parameters that have to be tuned by the user. To differentiate these from the model parameters that are learned when applying the algorithm, we call the parameters of an algorithm its hyperparameters. In this case, the mapping takes the form $$ \mathcal{A} : ( \F , \{(\g x_i, y_i)\}_{i=1}^N, \gamma ) \mapsto f \in \F $$ for a generic (and possibly multidimensional) hyperparameter $\gamma$.