Kernel functions are at the basis of the kernel trick which allows many learning algorithms based on linear models to build nonlinear models easily. A kernel function takes two input vectors as arguments and returns a real value corresponding to the inner product of the images of these vectors in some feature space without having to actually map the points to the feature space.

Technically, a kernel function is a positive definite function associated with a reproducing kernel Hilbert space (RKHS); it must fulfill the basic requirement of corresponding to the inner product of the images of its arguments in some feature space.

Common kernel functions include the linear kernel, the polynomial kernel and the Gaussian kernel. Other kernel functions can be constructed from these basic kernels.

You can click in the plot on the left to add points and see where they are mapped to in feature space.

The linear kernel is simply defined by $$ \forall (\g x, \g x^\prime)\in\X^2,\quad K(\g x, \g x^\prime) = \inner{\g x}{\g x^\prime}_{\X} $$ and obviously satisfies the requirements regarding its equivalence with some inner product in a feature space, here, simply $\X$.
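As a minimal sketch (assuming NumPy; the function name `linear_kernel` is illustrative), the linear kernel is just the plain inner product in input space:

```python
import numpy as np

def linear_kernel(x, xp):
    """Linear kernel: the inner product of the inputs themselves."""
    return float(np.dot(x, xp))

x = np.array([1.0, 2.0])
xp = np.array([3.0, -1.0])
print(linear_kernel(x, xp))  # 1*3 + 2*(-1) = 1.0
```

Here the feature map is the identity, so the kernel trick gains nothing, but the linear kernel serves as the building block for the kernels below.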

The polynomial kernel of degree $\gamma$ can be defined either in a homogeneous form $$ \forall (\g x, \g x^\prime)\in\X^2,\quad K(\g x, \g x^\prime) = \left(\inner{\g x}{\g x^\prime}_{\X}\right)^\gamma $$ or in an inhomogeneous form $$ \forall (\g x, \g x^\prime)\in\X^2,\quad K(\g x, \g x^\prime) = \left(\inner{\g x}{\g x^\prime}_{\X} + 1\right)^\gamma . $$
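The following sketch (assuming NumPy; `poly_kernel` and `phi` are illustrative names) checks, for the homogeneous kernel of degree 2 in $\R^2$, that the kernel value equals the inner product of the images under the explicit feature map $\phi(\g x) = [x_1^2,\ \sqrt{2}\,x_1 x_2,\ x_2^2]^T$:

```python
import numpy as np

def poly_kernel(x, xp, gamma, homogeneous=True):
    """Polynomial kernel of degree gamma, homogeneous or inhomogeneous."""
    c = 0.0 if homogeneous else 1.0
    return float((np.dot(x, xp) + c) ** gamma)

def phi(x):
    """Explicit feature map of the homogeneous degree-2 kernel in R^2."""
    return np.array([x[0]**2, np.sqrt(2) * x[0] * x[1], x[1]**2])

x, xp = np.array([1.0, 2.0]), np.array([3.0, -1.0])
# the kernel computes the inner product in feature space without mapping the points
print(np.isclose(poly_kernel(x, xp, 2), np.dot(phi(x), phi(xp))))  # True
```

Note that the kernel evaluates in $O(d)$ operations for $d$-dimensional inputs, while the explicit feature space grows combinatorially with the degree, which is the point of the kernel trick.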

The polynomial kernel is a valid kernel.
**Proof:**
The polynomial kernel can be written in terms of scalar multiplications, sums, and products of kernels, which, according to the rules for constructing kernels, yields a valid kernel. More precisely, let $K_1$ be the linear kernel; then for the homogeneous polynomial kernel we have
$$
K(\g x, \g x^\prime) = \left(K_1(\g x, \g x^\prime) \right)^\gamma
$$
while the inhomogeneous kernel can be written as
$$
K(\g x, \g x^\prime) = \left(K_1(\g x, \g x^\prime) + 1\right)^\gamma = \left(K_2(\g x, \g x^\prime) \right)^\gamma .
$$
The fact that $K_2 = K_1 + 1$ is a valid kernel can be seen by defining the feature map $\phi_2 = [\phi_1,\ 1]^T$, with $\phi_1$ the feature map associated with $K_1$. We have
$$
\inner{\phi_2(\g x)}{\phi_2(\g x^\prime)} = \inner{\phi_1(\g x)}{\phi_1(\g x^\prime)} + 1 = K_1(\g x, \g x^\prime) + 1 = K_2(\g x, \g x^\prime) ,
$$
and thus $K_2$ corresponds to an inner product in some feature space and is a valid kernel.
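This construction can be checked numerically (a sketch assuming NumPy; `K1` and `phi2` are illustrative names): with the linear kernel, $\phi_1$ is the identity, so $\phi_2$ just appends a constant 1 to the input vector.

```python
import numpy as np

def K1(x, xp):
    """Linear kernel."""
    return float(np.dot(x, xp))

def phi2(x):
    """Feature map phi_2 = [phi_1, 1]^T, with phi_1 the identity here."""
    return np.append(x, 1.0)

x, xp = np.array([1.0, 2.0]), np.array([3.0, -1.0])
# <phi_2(x), phi_2(x')> = K_1(x, x') + 1 = K_2(x, x')
assert np.isclose(np.dot(phi2(x), phi2(xp)), K1(x, xp) + 1.0)
```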

The Gaussian kernel of bandwidth $\sigma>0$ defined by $$ \forall (\g x, \g x^\prime)\in\X^2,\quad K(\g x, \g x^\prime) = \exp\left(- \frac{\|\g x - \g x^\prime\|_2^2}{2\sigma^2}\right) $$

is a valid kernel.
**Proof:**
Let us define
$$
\phi_1(\g x) = \exp\left(\frac{-1}{2\sigma^2} \|\g x\|_2^2\right)
$$
and the valid kernel functions
$$
K_1(\g x, \g x^\prime) = \inner{\phi_1(\g x)}{\phi_1(\g x^\prime)} = \exp\left(\frac{-1}{2\sigma^2} \|\g x\|_2^2\right) \exp\left(\frac{-1}{2\sigma^2} \|\g x^\prime \|_2^2\right) = \exp\left(\frac{-1}{2\sigma^2} \left[\|\g x\|_2^2 + \|\g x^\prime\|_2^2 \right] \right)
$$
and
$$
K_2(\g x, \g x^\prime) = \exp \left( \frac{1}{\sigma^2} \inner{\g x}{\g x^\prime} \right) .
$$
Note that $K_2$ is valid since it is the exponential of the linear kernel scaled by a positive constant, and the exponential of a valid kernel is itself a valid kernel.

Then, we can rewrite $K$ as
\begin{align}
K(\g x, \g x^\prime)& = \exp\left(- \frac{\|\g x - \g x^\prime\|_2^2}{2\sigma^2}\right)
= \exp\left(\frac{-1}{2\sigma^2}(\g x - \g x^\prime)^T(\g x - \g x^\prime)\right) \\
&= \exp\left(\frac{-1}{2\sigma^2}\left[ \|\g x\|_2^2 + \|\g x^\prime\|_2^2 - 2\inner{\g x}{\g x^\prime}\right] \right) \\
&= \exp\left(\frac{-1}{2\sigma^2}\left[ \|\g x\|_2^2 + \|\g x^\prime\|_2^2 \right]\right) \exp\left( \frac{1}{\sigma^2}\inner{\g x}{\g x^\prime} \right) \\
&= K_1(\g x, \g x^\prime) K_2(\g x, \g x^\prime) .
\end{align}
Thus, $K$ is a product of two valid kernels and a valid kernel itself.
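The factorization $K = K_1 K_2$ can be verified numerically, and so can the positive semidefiniteness of the resulting Gram matrix (a sketch assuming NumPy; the function names are illustrative):

```python
import numpy as np

def gaussian(x, xp, sigma):
    """Gaussian kernel of bandwidth sigma."""
    return float(np.exp(-np.sum((x - xp) ** 2) / (2 * sigma**2)))

def K1(x, xp, sigma):
    """K_1(x, x') = exp(-(||x||^2 + ||x'||^2) / (2 sigma^2))."""
    return float(np.exp(-(np.dot(x, x) + np.dot(xp, xp)) / (2 * sigma**2)))

def K2(x, xp, sigma):
    """K_2(x, x') = exp(<x, x'> / sigma^2)."""
    return float(np.exp(np.dot(x, xp) / sigma**2))

rng = np.random.default_rng(0)
x, xp, sigma = rng.normal(size=3), rng.normal(size=3), 1.5

# the factorization used in the proof: K = K_1 * K_2
assert np.isclose(gaussian(x, xp, sigma), K1(x, xp, sigma) * K2(x, xp, sigma))

# the Gram matrix of a valid kernel is positive semidefinite:
# all eigenvalues are nonnegative (up to numerical precision)
X = rng.normal(size=(20, 3))
G = np.array([[gaussian(a, b, sigma) for b in X] for a in X])
assert np.linalg.eigvalsh(G).min() > -1e-10
```

The eigenvalue check illustrates the defining property of a positive definite function on a finite sample, though of course it does not replace the proof above.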