Constructing kernel functions

In words...

A kernel function is a function of two arguments that implicitly computes an inner product between images of its arguments in some feature space. There are two major techniques to construct valid kernel functions: either from an explicit feature map or from other valid kernel functions, such as these common kernels.

For instance, the sum or product of two kernel functions is a kernel function. A kernel function multiplied by a scalar remains a kernel function. The composition of two kernel functions is also a kernel function.

In pictures...

In maths...

Kernels with explicit feature maps

Given an explicit feature map \begin{align*} \phi : & \X \rightarrow \X^\phi \\ & \g x \mapsto \phi(\g x) , \end{align*} a valid kernel function can be defined as $$ K(\g x, \g x^\prime) = \inner{\phi(\g x)}{\phi(\g x^\prime)}_{\X^\phi} . $$

Notably, for any function $f : \X \rightarrow \R$, the function $$ K(\g x, \g x^\prime) = f(\g x) f(\g x^\prime) $$ is a valid kernel function.

Proof: Using the feature map $\phi : \g x\mapsto f(\g x)$, we have $$ K(\g x, \g x^\prime) = f(\g x) f(\g x^\prime) = \inner{\phi(\g x)}{\phi(\g x^\prime)}_{\X^\phi} . $$

Constructing kernels from kernels

A function is a valid kernel function if it is a real-valued positive definite function, whose definition is recalled below.

A real-valued function $K$ on $\X^2$ is called a positive definite function if it is symmetric and \begin{align*} \forall n\in\mathbb{N}^*,\ \forall \{\g x_i\}_{i=1}^n \in \X^n,\ \forall \{a_i\}_{i=1}^n \in \R^n, \quad \sum_{i=1}^n\sum_{j=1}^n a_i a_j K(\g x_i, \g x_j) \geq 0 . \end{align*}

This condition can be written more compactly in terms of the positive semi-definiteness of the kernel matrix (or Gram matrix), which is defined for a data set $\{\g x_i\}_{i=1}^n\in\X^n$ as $$ \g K = \begin{bmatrix} K(\g x_1, \g x_1) & \dots & K(\g x_1,\g x_n) \\ \vdots\\ K(\g x_n, \g x_1) & \dots & K(\g x_n,\g x_n) \\ \end{bmatrix} . $$ With these notations a function $K$ is a valid kernel if, for any set of points $\{\g x_i\}_{i=1}^n\in\X^n$, it gives rise to a kernel matrix $\g K$ that is positive semi-definite, i.e., if $$ \forall \g a \in \R^n, \quad \g a^T \g K \g a \geq 0 . $$

In all the following, $K_1$ and $K_2$ are assumed to be valid kernel functions and $\g K_1$ and $\g K_2$ their respective Gram matrices.

Scalar multiplication

The validity of a kernel is conserved after multiplication by a positive scalar, i.e., for any $\alpha\geq 0$, the function $$ K(\g x, \g x^\prime) = \alpha K_1 (\g x, \g x^\prime) $$ is a valid kernel.

Proof: By construction, the Gram matrix is given by $$ \g K = \alpha \g K_1 $$ which implies that $$ \forall \g a \in \R^n, \quad \g a^T \g K \g a = \alpha \g a^T \g K_1 \g a \geq 0 $$ due to the positivity of $\alpha$ and the validity of $K_1$.

Adding a positive constant

For any positive constant $\alpha \geq 0$, the function $$ K(\g x, \g x^\prime) = K_1 (\g x, \g x^\prime) + \alpha $$ is a valid kernel function.

Proof: Let $\phi_1$ denote a feature map of $K_1$. Then, using the feature map $\phi : \g x\mapsto [\phi_1(\g x),\ \sqrt{\alpha}]^T$, we have $$ \inner{\phi(\g x)}{\phi(\g x^\prime)} = \inner{\phi_1(\g x)}{\phi_1(\g x^\prime)} + \alpha = K_1(\g x, \g x^\prime) + \alpha = K(\g x, \g x^\prime). $$

Linear combination (includes the sum of two kernels as a special case)

A linear combination of kernel functions involving only positive weights, i.e., $$ K(\g x, \g x^\prime) = \sum_{j=1}^m \alpha_j K_j (\g x, \g x^\prime),\qquad \mbox{with } \alpha_j \geq 0, $$ is a valid kernel function.

Proof: By construction, the Gram matrix is given by $$ \g K = \sum_{j=1}^m\alpha_j \g K_j $$ which implies that $$ \forall \g a \in \R^n, \quad \g a^T \g K \g a = \sum_{j=1}^m\alpha_j \g a^T \g K_j \g a \geq 0 $$ due to the positivity of the $\alpha_j$ and the validity of the kernels $K_j$.

Product

The product of two kernel functions, i.e., $$ K(\g x, \g x^\prime) = K_1 (\g x, \g x^\prime) K_2 (\g x, \g x^\prime), $$ is a valid kernel function.

Proof: By construction, the Gram matrix is given by $$ \g K = \g K_1 \odot \g K_2 , $$ where $\odot$ denotes the Hadamard (entrywise) product. Given that $\g K_1$ and $\g K_2$ are symmetric positive semi-definite matrices, their eigendecompositions $$ \g K_1 = \sum_{i=1}^n \lambda_i \g u_i \g u_i^T ,\quad \mbox{ and } \quad \g K_2 = \sum_{j=1}^n \mu_j \g v_j \g v_j^T , $$ have positive eigenvalues $\lambda_i \geq 0$ and $\mu_i\geq 0$. This leads to \begin{align} \g K &= \sum_{i=1}^n \sum_{j=1}^n \lambda_i \mu_j (\g u_i \g u_i^T) \odot ( \g v_j \g v_j^T ) \\ &= \sum_{i=1}^n \sum_{j=1}^n \lambda_i \mu_j (\g u_i \odot \g v_j ) (\g u_i \odot \g v_j )^T \\ &= \sum_{k=1}^{n^2} \gamma_k \g w_k \g w_k^T, \end{align} with $\gamma_k = \lambda_{\lfloor k/n \rfloor} \mu_{k\ mod\ n} \geq 0$ and $\g w_k = \g u_{\lfloor k/n \rfloor} \odot \g v_{k\ mod\ n}$. Thus, $$ \forall \g a \in \R^n, \quad \g a^T \g K \g a = \sum_{k=1}^{n^2} \gamma_k \g a^T \g w_k \g w_k^T \g a = \sum_{k=1}^{n^2} \gamma_k (\g w_k^T \g a )^2 \geq 0 $$

Polynomial functions of a kernel output

Given a polynomial $P : \R \rightarrow \R$ with positive coefficients, the function $$ K(\g x, \g x^\prime) = P\left ( K_1 (\g x, \g x^\prime) \right) $$ is a valid kernel.

Proof: The polynomial $P$ is a linear combination of powers of the kernel $K_1$ with positive coefficients. Since the powers of $K_1$ are products of $K_1$ by itself and thus valid kernels, their linear combination is also a valid kernel.

Exponential function of a kernel output

The function $$ K(\g x, \g x^\prime) = \exp\left ( K_1 (\g x, \g x^\prime) \right) $$ is a valid kernel.

Proof: take Taylor series, limit of polynomial case...