# The regression function

## In words...

In a supervised learning problem characterized by a joint probability distribution over the random variables corresponding to the input and the output label to be predicted, the regression function is defined as the conditional expectation of the output given the input.

In a regression setting with the squared loss function, the regression function is the optimal target model, i.e., the one yielding the minimal risk. However, it cannot be computed without access to the true distribution of the data, which is assumed to be unknown. Its interest is therefore mostly theoretical, but some learning algorithms, such as the $K$-nearest neighbors algorithm, can be seen as estimators of this function.
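As an illustration, the conditional expectation $\E_{Y|X}[Y\ |\ X=x]$ can be approximated from data by averaging the outputs of the $K$ nearest inputs. The sketch below is a minimal hypothetical example, assuming NumPy and synthetic data where the true regression function is known to be $\eta(x) = \sin(2\pi x)$:

```python
import numpy as np

rng = np.random.default_rng(0)

# Synthetic data: Y = sin(2*pi*X) + Gaussian noise,
# so the true regression function is eta(x) = sin(2*pi*x).
n = 500
X = rng.uniform(0.0, 1.0, size=n)
Y = np.sin(2 * np.pi * X) + 0.1 * rng.normal(size=n)

def knn_regress(x, X, Y, k=25):
    """Estimate eta(x) = E[Y | X = x] by averaging the k nearest outputs."""
    idx = np.argsort(np.abs(X - x))[:k]
    return Y[idx].mean()

# The k-NN estimate at a test point should be close to the true eta.
x0 = 0.25
print(knn_regress(x0, X, Y), np.sin(2 * np.pi * x0))
```

With enough data and a suitable $k$, the local average converges to the conditional expectation, which is the sense in which $K$-nearest neighbors estimates the regression function.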

## In picture...

...

## In maths...

The regression function is defined as
$$
\forall x\in\X,\quad \eta(x) = \E_{Y|X} [ Y\ |\ X=x ] .
$$

In a regression setting with the squared loss function, the regression function is the optimal target model minimizing the risk
$$
R(f) = \E_{X,Y} [ ( Y - f(X))^2 ] = \int_{\X\times \Y} (y-f(x))^2\ p_{X,Y}(x,y) dx dy .
$$
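To make the risk concrete, $R(f)$ can be approximated by a Monte Carlo average over samples of $(X,Y)$. In the hypothetical setting sketched below (a linear $\eta$ with additive Gaussian noise, values chosen for illustration), the regression function attains a smaller empirical risk than a perturbed model, and its risk approaches the noise variance:

```python
import numpy as np

rng = np.random.default_rng(1)

# Data model: Y = eta(X) + noise, with eta(x) = 2x + 1 and noise std 0.5,
# so E[Y | X = x] = 2x + 1 and the minimal risk is the noise variance 0.25.
n = 200_000
X = rng.uniform(-1.0, 1.0, size=n)
Y = 2 * X + 1 + 0.5 * rng.normal(size=n)

def risk(f):
    """Monte Carlo estimate of R(f) = E[(Y - f(X))^2]."""
    return np.mean((Y - f(X)) ** 2)

eta = lambda x: 2 * x + 1          # the regression function
other = lambda x: 2.5 * x + 0.8    # some other candidate model

print(risk(eta), risk(other))
```

Here `risk(eta)` is close to $0.25$ and strictly smaller than `risk(other)`, consistent with $\eta$ being the minimizer of $R$.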
Indeed,
\begin{align}
f^* &=\arg\min_f\ R(f) \\
&= \arg\min_f \ \E_{X,Y} [ ( Y - f(X))^2 ] \\
&= \arg\min_f \ \E_{X} \E_{Y|X} [ ( Y - f(X))^2 \ |\ X ] & \mbox{(by the } \href{totalprobability.html}{\mbox{law of total expectation}})\\
&= \arg\min_f \ \E_{X} \int_{\Y} (y-f(X))^2 p_{Y|X}(y|X) dy\\
&= \arg\min_f \ \E_{X} \int_{\Y} (y^2 + f(X)^2 - 2y f(X) )p_{Y|X}(y|X) dy \\
&= \arg\min_f \ \E_{X} \left[ f(X)^2 -2 f(X) \int_{\Y} y\ p_{Y|X}(y|X) dy \right] & \mbox{(the } y^2 \mbox{ term does not depend on } f)\\
&= \arg\min_f \ \E_{X} \left[ f(X)^2 -2 f(X) \E_{Y|X} [Y\ |\ X] \right]\\
\end{align}
The expectation over $X$ can be seen as a (weighted) sum over all possible values $x$, so its minimizer is the function obtained by minimizing the argument of the expectation pointwise:
\begin{align}
\forall x\in \X,\quad f^*(x) &= \arg\min_u \ u^2 -2 u \E_{Y|X} [Y\ |\ X=x] \\
&= \E_{Y|X} [Y\ |\ X=x] \\
&= \eta(x),
\end{align}
where the solution of the minimization with respect to $u$ is found by setting the derivative of the cost function to zero: $2u -2\E_{Y|X} [Y\ |\ X=x] = 0 \ \Rightarrow\ u = \E_{Y|X} [Y\ |\ X=x]$.
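The pointwise problem can also be checked numerically: for a fixed conditional mean $m = \E_{Y|X}[Y\ |\ X=x]$, a grid search over $u$ on $u^2 - 2um$ recovers $u = m$ (a small sketch, with $m$ chosen arbitrarily for illustration):

```python
import numpy as np

m = 1.7  # hypothetical conditional mean E[Y | X = x]

# Minimize g(u) = u^2 - 2*u*m over a fine grid of candidate values u.
u = np.linspace(-5.0, 5.0, 100_001)
g = u ** 2 - 2 * u * m
u_star = u[np.argmin(g)]

print(u_star)  # close to 1.7, i.e., the conditional mean
```

The grid minimizer matches the analytic solution $u = m$ obtained from the zero-derivative condition above.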