Overall learning procedure

In words...

Solving a practical supervised learning problem with machine learning techniques involves the following steps.

  1. Identify the type of problem at hand (Is it a classification or a regression problem? What are the inputs and the labels?) and choose a loss function accordingly.
  2. Collect data: pairs of inputs with their corresponding labels.
  3. Depending on the amount of data collected, decide on a validation method to estimate the risk (a held-out test set or cross-validation?).
  4. Choose a restricted model class.
  5. Choose a learning algorithm suitable for this type of problem and this model class (which inherently implies a choice of learning paradigm, e.g., between empirical risk minimization and maximum likelihood inference).
  6. If the algorithm (or the model class) has hyperparameters, tune them by model selection.
  7. Apply the learning algorithm to the training data (with the hyperparameter values found at the previous step).
  8. Apply the validation procedure chosen at step 3 to estimate the risk of the resulting model.
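
The eight steps can be sketched end to end in plain Python. This is only a minimal illustration under assumptions that do not come from the text: a synthetic regression problem with the squared loss, affine predictors $f(x) = wx + b$ as the restricted model class, ridge regression (regularized empirical risk minimization) as the learning algorithm, its regularization weight `lam` as the single hyperparameter chosen by 5-fold cross-validation, and a held-out test set for the final risk estimate. All function names are hypothetical.

```python
import random


# Step 1 (assumed): a regression problem with the squared loss.
def squared_loss(y_true, y_pred):
    return (y_true - y_pred) ** 2


def risk(predict, data):
    """Empirical risk: average loss of `predict` over (x, y) pairs."""
    return sum(squared_loss(y, predict(x)) for x, y in data) / len(data)


# Steps 4-5 (assumed): model class f(x) = w*x + b, learned by ridge regression.
def ridge_fit(data, lam):
    """Minimize sum of squared errors + lam * w**2 (the intercept b is not penalized)."""
    n = len(data)
    x_mean = sum(x for x, _ in data) / n
    y_mean = sum(y for _, y in data) / n
    sxx = sum((x - x_mean) ** 2 for x, _ in data)
    sxy = sum((x - x_mean) * (y - y_mean) for x, y in data)
    w = sxy / (sxx + lam)
    b = y_mean - w * x_mean
    return w, b


def make_predictor(w, b):
    return lambda x: w * x + b


def cv_risk(data, lam, k=5):
    """k-fold cross-validation estimate of the risk of ridge_fit with a given lam."""
    folds = [data[i::k] for i in range(k)]
    total = 0.0
    for i in range(k):
        train_part = [p for j, fold in enumerate(folds) if j != i for p in fold]
        w, b = ridge_fit(train_part, lam)
        total += risk(make_predictor(w, b), folds[i])
    return total / k


random.seed(0)
# Step 2 (synthetic, hypothetical data): y = 2x + 1 + Gaussian noise.
data = []
for _ in range(200):
    x = random.uniform(-1.0, 1.0)
    data.append((x, 2.0 * x + 1.0 + random.gauss(0.0, 0.1)))

# Step 3: with 200 examples, hold out 50 of them as a test set.
train, test = data[:150], data[150:]

# Step 6: select lam by cross-validation on the training set only.
grid = [0.0, 0.1, 1.0, 10.0]
best_lam = min(grid, key=lambda lam: cv_risk(train, lam))

# Step 7: retrain on the whole training set with the selected lam.
w, b = ridge_fit(train, best_lam)

# Step 8: estimate the risk of the final predictor on the untouched test set.
test_risk = risk(make_predictor(w, b), test)
print(f"lam={best_lam}, w={w:.3f}, b={b:.3f}, test risk={test_risk:.4f}")
```

The test set is used only once, in step 8; using it for model selection as well would bias the risk estimate downward.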

Note that some of these steps may not be relevant in every situation, and the order in which the model class and the algorithm are chosen may change. For instance, if one chooses the $K$-nearest neighbors algorithm, then one does not have to choose a restricted model class.
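
The $K$-nearest neighbors case can be made concrete. The method has no restricted model class to choose and its "training" merely stores the data, yet $K$ remains a hyperparameter for model selection. The sketch below rests on assumptions not taken from the text: Euclidean distance, majority vote, leave-one-out cross-validation to select $K$, and a made-up toy data set. The function names are hypothetical.

```python
import math
from collections import Counter


def knn_predict(train, x, k):
    """Majority label among the k training points closest to x (Euclidean distance)."""
    neighbors = sorted(train, key=lambda pair: math.dist(pair[0], x))[:k]
    votes = Counter(label for _, label in neighbors)
    return votes.most_common(1)[0][0]


def loo_error(data, k):
    """Leave-one-out estimate of the misclassification rate for a given k."""
    errors = 0
    for i, (x, y) in enumerate(data):
        rest = data[:i] + data[i + 1:]  # "training" = storing the other points
        if knn_predict(rest, x, k) != y:
            errors += 1
    return errors / len(data)


# Toy data (made up): two well-separated clusters with labels 0 and 1.
data = [((0.0, 0.0), 0), ((0.1, 0.2), 0), ((0.2, 0.1), 0),
        ((1.0, 1.0), 1), ((0.9, 1.1), 1), ((1.1, 0.9), 1)]

# Model selection still applies: choose K (ties go to the smallest K).
best_k = min([1, 3, 5], key=lambda k: loo_error(data, k))
print(best_k, knn_predict(data, (0.15, 0.1), best_k))
```

On this toy set, $K = 5$ misclassifies every left-out point, because only two points of its own class remain among its five neighbors.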

In maths...