In words...

A regression problem is a supervised learning problem in which the quantity to be predicted (the label) is a real number and thus has an infinite number of possible values.

The most common loss function for regression is the squared loss, which strongly penalizes large prediction errors. With this choice, the optimal model, i.e., the one minimizing the risk (defined as the expectation of the squared loss), is known as the regression function. However, the regression function cannot be computed in practice, because it depends on the unknown distribution of the data and a training set contains only a limited amount of information about it. The famous least squares method is a learning algorithm that tries to mimic the regression function by minimizing the squared loss over the training set.
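As a minimal sketch of the least squares idea, the snippet below fits a straight line to a tiny training set by minimizing the mean squared loss in closed form. The data values, the function names `fit_least_squares` and `mean_squared_error`, and the restriction to a one-dimensional linear model are assumptions made for illustration only.

```python
def fit_least_squares(xs, ys):
    """Fit y ~ a*x + b by minimizing the mean squared loss over the training set."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    # Closed-form solution of the one-dimensional least squares problem.
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    a = sxy / sxx
    b = mean_y - a * mean_x
    return a, b


def mean_squared_error(model, xs, ys):
    """Average squared loss of `model` over the pairs (x, y)."""
    return sum((y - model(x)) ** 2 for x, y in zip(xs, ys)) / len(xs)


# Hypothetical training set: apple diameters (cm) and weights (g).
diameters = [6.0, 6.5, 7.0, 7.5, 8.0]
weights = [110.0, 135.0, 150.0, 178.0, 200.0]

a, b = fit_least_squares(diameters, weights)
model = lambda x: a * x + b
print("slope:", a, "intercept:", b)
print("training MSE:", mean_squared_error(model, diameters, weights))
```

The training error reported here is only an empirical estimate computed on five points; it says nothing by itself about the risk, which involves the unknown data distribution.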

As with classification, we can distinguish between linear and nonlinear regression problems and methods. Another class of problems is switching regression, in which the data are assumed to be generated by a collection of models; it is closely related to piecewise smooth regression.

Although it belongs to the family of nonlinear regression methods, the $K$-nearest neighbors algorithm provides a rather straightforward and easy-to-understand approach to regression.
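To illustrate how simple the idea is, here is a rough sketch of $K$-nearest neighbors regression for a one-dimensional input: the prediction is the average label of the $K$ training points closest to the query. The function name `knn_regress`, the data values, and the plain averaging of neighbor labels are assumptions for this example; other variants (e.g., distance-weighted averages) exist.

```python
def knn_regress(x_query, xs, ys, k=3):
    """Predict the label at x_query as the mean label of its k nearest training points."""
    # Sort training indices by distance to the query point.
    order = sorted(range(len(xs)), key=lambda i: abs(xs[i] - x_query))
    nearest = order[:k]
    return sum(ys[i] for i in nearest) / len(nearest)


# Hypothetical training set: apple diameters (cm) and weights (g).
diameters = [6.0, 6.5, 7.0, 7.5, 8.0]
weights = [110.0, 135.0, 150.0, 178.0, 200.0]

# Predicted weight of an apple of diameter 7.1 cm from its 3 nearest neighbors.
prediction = knn_regress(7.1, diameters, weights, k=3)
print("predicted weight:", prediction)
```

The resulting predictor is piecewise constant in the input, which is why the method counts as nonlinear even though each prediction is a simple average.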

In pictures...

Example: predicting the weight of an apple based on its diameter

Another example: predicting the weight of an apple based on a picture


In maths...

A regression problem is characterized by the infinite number of possible values for the label, with typically $\Y = \R$ or $\Y \subset \R$.

The most common loss function for regression is the squared loss $$ \ell(\hat{y}, y) = (y - \hat{y})^2 , $$ which strongly penalizes large deviations from the target value $y$. Using this loss function, the risk of a model $f$ is known as the Mean Squared Error (MSE), $$ R(f) = \E_{X,Y} [ ( Y - f(X))^2 ] = \int_{\X\times \Y} (y-f(x))^2\ p_{X,Y}(x,y)\, dx\, dy , $$ and its minimizer is known as the regression function.
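
The following short derivation sketches why the minimizer of the MSE is the conditional expectation of the label given the input. The symbol $f_{reg}$ is introduced here only for illustration, and we assume $\E[Y^2] < \infty$ and restrict attention to models $f$ with finite risk. Let $$ f_{reg}(x) = \E[ Y \mid X = x ] . $$ Adding and subtracting $f_{reg}(X)$ inside the square gives $$ R(f) = \E_{X,Y} [ ( Y - f_{reg}(X))^2 ] + 2\, \E_{X,Y} [ ( Y - f_{reg}(X)) ( f_{reg}(X) - f(X) ) ] + \E_{X} [ ( f_{reg}(X) - f(X))^2 ] . $$ The cross term vanishes, since conditioning on $X$ yields $\E[ Y - f_{reg}(X) \mid X ] = 0$. Hence $$ R(f) = R(f_{reg}) + \E_{X} [ ( f_{reg}(X) - f(X))^2 ] \geq R(f_{reg}) , $$ with equality when $f$ coincides with $f_{reg}$ almost everywhere with respect to the distribution of $X$.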

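Another typical loss function is the $\ell_1$-loss (or absolute loss) $$ \ell(\hat{y}, y) = |y - \hat{y}| , $$ which grows only linearly with the deviation and thus penalizes large errors less severely than the squared loss.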
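
One way to see the difference between the two losses is to look for the best constant prediction on a small sample: the average squared loss is minimized by the mean of the labels, whereas the average absolute loss is minimized by their median. The sketch below checks this on made-up labels containing one outlier; the data and the helper name `average_loss` are assumptions for illustration.

```python
from statistics import mean, median


def squared_loss(y_hat, y):
    return (y - y_hat) ** 2


def absolute_loss(y_hat, y):
    return abs(y - y_hat)


def average_loss(loss, prediction, targets):
    """Average of `loss` over the targets for a constant prediction."""
    return sum(loss(prediction, y) for y in targets) / len(targets)


# Hypothetical labels with one large outlier.
targets = [1.0, 2.0, 3.0, 4.0, 100.0]

best_constant_squared = mean(targets)      # minimizes the average squared loss
best_constant_absolute = median(targets)   # minimizes the average absolute loss

print("best constant (squared loss):", best_constant_squared)
print("best constant (absolute loss):", best_constant_absolute)
print("squared loss at the median:",
      average_loss(squared_loss, best_constant_absolute, targets))
print("squared loss at the mean:",
      average_loss(squared_loss, best_constant_squared, targets))
```

The outlier pulls the squared-loss solution far away from the bulk of the labels, while the absolute-loss solution stays among them, which reflects how strongly the squared loss penalizes large deviations.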