Common kernel functions

In words...

Kernel functions are at the basis of the kernel trick which allows many learning algorithms based on linear models to build nonlinear models easily. A kernel function takes two input vectors as arguments and returns a real value corresponding to the inner product of the images of these vectors in some feature space without having to actually map the points to the feature space.

Techincally, a kernel function is a positive definite function associated to a reproducing kernel Hilbert space (RKHS) which must fulfill the basic requirement of corresponding to the inner product of the images of its arguments in some feature space.

Common kernel functions include the linear kernel, the polynomial kernel and the Gaussian kernel. Other kernel functions can be constructed from these basic kernels.

In pictures...

Polynomial kernel

The plots below show the input and feature spaces of an inhomogeneous polynomial kernel of degree 3 defined on $\X = \R$.

Mapped to feature space =>

You can click in the plot on the left to add points
and see where they are mapped to in feature space.





In maths...

Linear kernel

The linear kernel is simply defined by $$ \forall (\g x, \g x^\prime)\in\X^2,\quad K(\g x, \g x^\prime) = \inner{\g x}{\g x^\prime}_{\X} $$ and obviously satisfies the requirements regarding its equivalence with some inner product in a feature space, here, simply $\X$.

Polynomial kernel

The polynomial kernel of degree $\gamma$ can be defined either in a homogeneous form $$ \forall (\g x, \g x^\prime)\in\X^2,\quad K(\g x, \g x^\prime) = \left(\inner{\g x}{\g x^\prime}_{\X}\right)^\gamma $$ or in an inhomogeneous form $$ \forall (\g x, \g x^\prime)\in\X^2,\quad K(\g x, \g x^\prime) = \left(\inner{\g x}{\g x^\prime}_{\X} + 1\right)^\gamma . $$

The polynomial kernel is a valid kernel.

Proof: The polynommial kernel can be written with scalar multiplications, sums and products of kernels, which, according to the rules for constructing kernels is a valid kernel. More precisely, let $K_1$ be the linear kernel, then for the homogeneous polynomial kernel, we have $$ K(\g x, \g x^\prime) = \left(K_1(\g x, \g x^\prime) \right)^\gamma $$ while the inhomogeneous kernel can be written as $$ K(\g x, \g x^\prime) = \left(K_1(\g x, \g x^\prime) + 1\right)^\gamma = \left(K_2(\g x, \g x^\prime) \right)^\gamma . $$ The fact that $K_2 = K_1 + 1$ is a valid kernel can be seen by defining the feature map $\phi_2 = [\phi_1,\ 1]^T$ with $\phi_1$ the feature map associated to $K_1$. We have $$ \inner{\phi_2(\g x)}{\phi_2(\g x^\prime)} = \inner{\phi_1(\g x)}{\phi_1(\g x^\prime)} + 1 = K_1(\g x, \g x^\prime) + 1 = K_2(\g x, \g x^\prime) , $$ and thus $K_2$ corresponds to an inner product in some feature space and is a valid kernel.

Gaussian kernel

The Gaussian kernel of bandwidth $\sigma>0$ defined by $$ \forall (\g x, \g x^\prime)\in\X^2,\quad K(\g x, \g x^\prime) = \exp\left(- \frac{\|\g x - \g x^\prime\|_2^2}{2\sigma^2}\right) $$

is a valid kernel.

Proof: Let define $$ \phi_1(\g x) = \exp\left(\frac{-1}{2\sigma^2} \|\g x\|_2^2\right) $$ and the valid kernel functions $$ K_1(\g x, \g x^\prime) = \inner{\phi_1(\g x)}{\phi_1(\g x^\prime)} = \exp\left(\frac{-1}{2\sigma^2} \|\g x\|_2^2\right) \exp\left(\frac{-1}{2\sigma^2} \|\g x^\prime \|_2^2\right) = \exp\left(\frac{-1}{2\sigma^2} \left[\|\g x\|_2^2 + \|\g x^\prime\|_2^2 \right] \right) $$ and $$ K_2(\g x, \g x^\prime) = \exp \left( \frac{1}{\sigma^2} \inner{\g x}{\g x^\prime} \right) . $$ Note that $K_2$ is valid since it is the exponential of the linear kernel multiplied by a positive scalar.
Then, we can rewrite $K$ as \begin{align} K(\g x, \g x^\prime)& = \exp\left(- \frac{\|\g x - \g x^\prime\|_2^2}{2\sigma^2}\right) = \exp\left(\frac{-1}{2\sigma^2}(\g x - \g x^\prime)^T(\g x - \g x^\prime)\right) \\ &= \exp\left(\frac{-1}{2\sigma^2}\left[ \|\g x\|_2^2 + \|\g x^\prime\|_2^2 - 2\inner{\g x}{\g x^\prime}\right] \right) \\ &= \exp\left(\frac{-1}{2\sigma^2}\left[ \|\g x\|_2^2 + \|\g x^\prime\|_2^2 \right]\right) \exp\left( \frac{1}{\sigma^2}\inner{\g x}{\g x^\prime} \right) \\ &= K_1(\g x, \g x^\prime) K_2(\g x, \g x^\prime) . \end{align} Thus, $K$ is a product of two valid kernels and a valid kernel itself.