Reproducing kernel Hilbert space (RKHS)

In words...

A reproducing kernel Hilbert space (RKHS) is a Hilbert space of functions associated with a reproducing kernel at the basis of the definition of the inner product and the evaluation functional in this space.

A real-valued positive definite function is what we call in machine learning a kernel function. Common kernel functions are the linear kernel, the polynomial kernel and the Gaussian kernel. The Moore-Aronszajn theorem states that for any kernel function, there is one and only one RKHS with this function as reproducing kernel.

RKHSs are particularly well-suited function classes for regularized learning thanks to the representer theorem.

In pictures...

In maths...

Definitions

We call kernel function a function that is a real-valued positive definite function according to the following definition.

A real-valued function $K$ on $\X^2$ is called a positive definite function if it is symmetric and \begin{align*} \forall n\in\mathbb{N},\ \forall \{\g x_i\}_{i=1}^n \in \X^n,\ \forall \{a_i\}_{i=1}^n \in \R^n, \quad \sum_{i=1}^n\sum_{j=1}^n a_i a_j K(\g x_i, \g x_j) \geq 0 . \end{align*}

Let $\left(\H, \inner{\cdot}{\cdot}_{\H}\right)$ be a Hilbert space of real-valued functions on $\X$. A real-valued function $K$ on $\X^2$ is a reproducing kernel of $\H$ if and only if
  1. $\forall \g x \in \X$, $K(\g x, \cdot) \in\H$;
  2. $\forall \g x\in\X,\ \forall f\in\H,\ \inner{f}{K(\g x,\cdot)}_{\H} = f(\g x)\quad$ (reproducing property).
A Hilbert space of real-valued functions which possesses a reproducing kernel is called a reproducing kernel Hilbert space (RKHS).

Note that the reproducing property of $K$ implies in particular that $$ \inner{K(\g x, \cdot)}{K(\g x^\prime, \cdot)}_{\H} = K(\g x, \g x^\prime) . $$

The Moore-Aronszajn theorem states that for all kernel function $K$, there is one and only one RKHS with $K$ as reproducing kernel.

PROOF ???

Properties of kernel functions

Beside the reproducing property, kernel functions can be regarded as generalized inner products and in particular satisfy the Cauchy-Schwartz inequality $$ K(\g x, \g x^\prime)^2 \leq K(\g x, \g x) K(\g x^\prime, \g x^\prime) $$

Proof: By the definition of a positive definite kernel $K$, the Gram matrix for the set of two points $\{\g x, \g x^\prime\}$, $$ \g K = \begin{bmatrix} K(\g x, \g x) & K(\g x, \g x^\prime) \\ K(\g x^\prime, \g x) & K(\g x^\prime, \g x^\prime) \end{bmatrix}, $$ is positive-semidefinite. As such, it has two nonnegative eigenvalues $\lambda_1$ and $\lambda_2$ and we have $$ 0\leq \lambda_1\lambda_2 = det(\g K ) = K(\g x, \g x)K(\g x^\prime, \g x^\prime) - K(\g x^\prime, \g x) K(\g x, \g x^\prime) = K(\g x, \g x)K(\g x^\prime, \g x^\prime) - K(\g x, \g x^\prime) ^2. $$

Function class, inner product and norm

Let $K$ be a kernel function and $\left(\H, \inner{\cdot}{\cdot}_{\H}\right)$ the associated RKHS, i.e., $K$ is the reproducing kernel of $\H$. Then, the class of functions $\H$ can be written as $$ \H = \left\{ f\in\R^{\X} \ :\ f = \sum_{i=1}^\infty \beta_i K(\g x_i, \cdot) ,\ \beta_i \in\R,\ \g x_i\in\X,\ \|f\|_{\H} <+\infty \right\} . $$ The RKHS norm $\|\cdot\|_{\H}$ appearing in this definition is the one induced as $$ \left\|f \right\|_{\mathcal{H}} = \sqrt{\inner{f}{f}_{\H}} $$ by the inner product defined for two functions, $f = \sum_{i=1}^{m} \alpha_i K(\g x_i, \cdot)$ and $g = \sum_{i=1}^{m^\prime} \beta_i K(\g x_i^\prime, \cdot)$, as $$ \inner{f}{g}_{\H} = \sum_{i=1}^{m}\sum_{j=1}^{m^\prime} \alpha_i \beta_j K(\g x_i, \g x_j^\prime) . $$

Proof: The function $\inner{\cdot}{\cdot}_{\H}$ defined above satisfies the basic requirements of an inner product as follows.
  1. Symmetry is inherited from the symmetry of the kernel function $K$ $$ \inner{f}{g}_{\H} = \sum_{i=1}^{m}\sum_{j=1}^{m^\prime} \alpha_i \beta_j K(\g x_i, \g x_j^\prime) =\sum_{j=1}^{m^\prime} \sum_{i=1}^{m} \beta_j\alpha_i K(\g x_j^\prime, \g x_i) = \inner{g}{f}_{\H}. $$
  2. Linearity in its first argument (1): $$ \forall \lambda\in\R,\quad \inner{\lambda f}{g}_{\H} = \lambda\sum_{i=1}^{m}\sum_{j=1}^{m^\prime} \alpha_i \beta_j K(\g x_i, \g x_j^\prime) = \lambda \inner{ f}{g}_{\H} , $$
  3. Linearity in its first argument (2): $f+g = \sum_{i=1}^{m} \alpha_i K(\g x_i, \cdot) + \sum_{j=1}^{m^\prime} \beta_j K(\g x_j^\prime , \cdot) = \sum_{i=1}^{m + m^\prime} \alpha_i K(\g x_i, \cdot)$ with $\alpha_i = \beta_{i-m}$ and $\g x_i=\g x_{i-m}^\prime$, for $i=m+1,\dots,m+m^\prime$, such that \begin{align} \forall h = \sum_{i=1}^{m^{\prime\prime}} \gamma_i K(\g x_i^{\prime\prime}, \cdot) \in\H,\quad \inner{f+g}{h}_{\H} &= \sum_{i=1}^{m+m^\prime} \sum_{j=1}^{m^{\prime\prime}} \alpha_i \gamma_i K(\g x_i, \g x_i^{\prime\prime} ) \\ &= \sum_{i=1}^{m} \sum_{j=1}^{m^{\prime\prime}} \alpha_i \gamma_i K(\g x_i, \g x_i^{\prime\prime} ) + \sum_{i=m+1}^{m+m^\prime} \sum_{j=1}^{m^{\prime\prime}} \alpha_i \gamma_i K(\g x_i, \g x_i^{\prime\prime} ) \\ &= \sum_{i=1}^{m} \sum_{j=1}^{m^{\prime\prime}} \alpha_i \gamma_i K(\g x_i, \g x_i^{\prime\prime} ) + \sum_{i=1}^{m^\prime} \sum_{j=1}^{m^{\prime\prime}} \beta_i \gamma_i K(\g x_i, \g x_i^{\prime\prime} ) \\ &= \inner{f}{h}_{\H} + \inner{g}{h}_{\H} \end{align}
  4. Positive-semidefiniteness is a direct consequence of the definition of a positive definite kernel function $K$: $$ \inner{f}{f}_{\H} = \sum_{i=1}^{m}\sum_{j=1}^{m} \alpha_i \alpha_j K(\g x_i, \g x_j) \geq 0 $$
  5. Positive-definiteness ($\inner{f}{f}_{\H} = 0 \Rightarrow f \overset{?}{=} 0$ ): First, we have $$ \forall \g x\in\X,\quad \inner{f}{K(\g x, \cdot)}_{\H} = \sum_{i=1}^{m}\alpha_i K(\g x_i, \g x) = f(\g x) $$ which shows that $K$ is a reproducing kernel of $\H$, and that $$ f(\g x)^2 = \inner{f}{K(\g x, \cdot)}_{\H}^2 $$ Then, note that by the linearity of $\inner{\cdot}{\cdot}_{\H}$ and its positive semidefiniteness proved just above, $\inner{\cdot}{\cdot}_{\H}$ is a positive definite kernel function defined over $\H^2$: \begin{align} \forall n\in\mathbb{N},\ \forall \{f_i\}_{i=1}^n \in \H^n,\ \forall \{a_i\}_{i=1}^n \in \R^n, \quad \sum_{i=1}^n\sum_{j=1}^n a_i a_j \inner{f_i}{f_j}_{\H} &= \inner{\sum_{i=1}^n a_i f_i}{\sum_{j=1}^n a_j f_j}_{\H} \\ &= \inner{f}{f}_{\H} \\ &\geq 0. \end{align} Therefore, it satisfies the Cauchy-Schwartz inequality, $\inner{f}{g}_{\H}^2 \leq \inner{f}{f}_{\H}\inner{g}{g}_{\H}$, and we obtain $$ f(\g x)^2 = \inner{f}{K(\g x, \cdot)}_{\H}^2 \leq \inner{K(\g x, \cdot)}{K(\g x, \cdot)}_{\H}\inner{f}{f}_{\H} = K(\g x, \g x)\inner{f}{f}_{\H} $$ Thus, $$ \inner{f}{f}_{\H} = 0\ \Rightarrow \ f(\g x)^2 = 0 \ \Rightarrow \ f(\g x) = 0 \quad \forall \g x\in\X, $$ which implies that $f$ is the zero function on $\X$: $f=0$.

Notes

The definition of positive definite kernel functions given here might be confusing since it is based on a property of positive-semidefiniteness of the resulting Gram matrices rather than positive-definiteness. However, this terminology is the most common in the machine learning literature.