A kernel function is a function of two arguments that implicitly computes an inner product between images of its arguments in some feature space. There are two major techniques to construct valid kernel functions: either from an explicit feature map or from other valid kernel functions, such as these common kernels.

For instance, the sum or product of two kernel functions is a kernel function. A kernel function multiplied by a scalar remains a kernel function. The composition of two kernel functions is also a kernel function.

Given an explicit feature map \begin{align*} \phi : & \X \rightarrow \X^\phi \\ & \g x \mapsto \phi(\g x) , \end{align*} a valid kernel function can be defined as $$ K(\g x, \g x^\prime) = \inner{\phi(\g x)}{\phi(\g x^\prime)}_{\X^\phi} . $$

Notably, for any function $f : \X \rightarrow \R$, the function $$ K(\g x, \g x^\prime) = f(\g x) f(\g x^\prime) $$ is a valid kernel function.

**Proof:**
Using the feature map $\phi : \g x\mapsto f(\g x)$, we have
$$
K(\g x, \g x^\prime) = f(\g x) f(\g x^\prime) = \inner{\phi(\g x)}{\phi(\g x^\prime)}_{\X^\phi} .
$$

A function is a valid **kernel function** if it is a real-valued positive definite function, whose definition is recalled below.

A real-valued function $K$ on $\X^2$ is called a positive definite function if it is symmetric and \begin{align*} \forall n\in\mathbb{N}^*,\ \forall \{\g x_i\}_{i=1}^n \in \X^n,\ \forall \{a_i\}_{i=1}^n \in \R^n, \quad \sum_{i=1}^n\sum_{j=1}^n a_i a_j K(\g x_i, \g x_j) \geq 0 . \end{align*}

This condition can be written more compactly in terms of the positive semi-definiteness of the kernel matrix (or Gram matrix), which is defined for a data set $\{\g x_i\}_{i=1}^n\in\X^n$ as $$ \g K = \begin{bmatrix} K(\g x_1, \g x_1) & \dots & K(\g x_1,\g x_n) \\ \vdots\\ K(\g x_n, \g x_1) & \dots & K(\g x_n,\g x_n) \\ \end{bmatrix} . $$ With these notations a function $K$ is a valid kernel if, for any set of points $\{\g x_i\}_{i=1}^n\in\X^n$, it gives rise to a kernel matrix $\g K$ that is positive semi-definite, i.e., if $$ \forall \g a \in \R^n, \quad \g a^T \g K \g a \geq 0 . $$

In all the following, $K_1$ and $K_2$ are assumed to be valid kernel functions and $\g K_1$ and $\g K_2$ their respective Gram matrices.

The validity of a kernel is conserved after multiplication by a positive scalar, i.e., for any $\alpha\geq 0$, the function $$ K(\g x, \g x^\prime) = \alpha K_1 (\g x, \g x^\prime) $$ is a valid kernel.

**Proof:**
By construction, the Gram matrix is given by
$$
\g K = \alpha \g K_1
$$
which implies that
$$
\forall \g a \in \R^n, \quad \g a^T \g K \g a = \alpha \g a^T \g K_1 \g a \geq 0
$$
due to the positivity of $\alpha$ and the validity of $K_1$.

For any positive constant $\alpha \geq 0$, the function $$ K(\g x, \g x^\prime) = K_1 (\g x, \g x^\prime) + \alpha $$ is a valid kernel function.

**Proof:**
Let $\phi_1$ denote a feature map of $K_1$. Then, using the feature map $\phi : \g x\mapsto [\phi_1(\g x),\ \sqrt{\alpha}]^T$, we have
$$
\inner{\phi(\g x)}{\phi(\g x^\prime)} = \inner{\phi_1(\g x)}{\phi_1(\g x^\prime)} + \alpha = K_1(\g x, \g x^\prime) + \alpha = K(\g x, \g x^\prime).
$$

A linear combination of kernel functions involving only positive weights, i.e., $$ K(\g x, \g x^\prime) = \sum_{j=1}^m \alpha_j K_j (\g x, \g x^\prime),\qquad \mbox{with } \alpha_j \geq 0, $$ is a valid kernel function.

**Proof:**
By construction, the Gram matrix is given by
$$
\g K = \sum_{j=1}^m\alpha_j \g K_j
$$
which implies that
$$
\forall \g a \in \R^n, \quad \g a^T \g K \g a = \sum_{j=1}^m\alpha_j \g a^T \g K_j \g a \geq 0
$$
due to the positivity of the $\alpha_j$ and the validity of the kernels $K_j$.

The product of two kernel functions, i.e., $$ K(\g x, \g x^\prime) = K_1 (\g x, \g x^\prime) K_2 (\g x, \g x^\prime), $$ is a valid kernel function.

**Proof:**
By construction, the Gram matrix is given by
$$
\g K = \g K_1 \odot \g K_2 ,
$$
where $\odot$ denotes the Hadamard (entrywise) product. Given that $\g K_1$ and $\g K_2$ are symmetric positive semi-definite matrices, their eigendecompositions
$$
\g K_1 = \sum_{i=1}^n \lambda_i \g u_i \g u_i^T ,\quad \mbox{ and } \quad \g K_2 = \sum_{j=1}^n \mu_j \g v_j \g v_j^T ,
$$
have positive eigenvalues $\lambda_i \geq 0$ and $\mu_i\geq 0$. This leads to
\begin{align}
\g K &= \sum_{i=1}^n \sum_{j=1}^n \lambda_i \mu_j (\g u_i \g u_i^T) \odot ( \g v_j \g v_j^T ) \\
&= \sum_{i=1}^n \sum_{j=1}^n \lambda_i \mu_j (\g u_i \odot \g v_j ) (\g u_i \odot \g v_j )^T \\
&= \sum_{k=1}^{n^2} \gamma_k \g w_k \g w_k^T,
\end{align}
with $\gamma_k = \lambda_{\lfloor k/n \rfloor} \mu_{k\ mod\ n} \geq 0$ and $\g w_k = \g u_{\lfloor k/n \rfloor} \odot \g v_{k\ mod\ n}$. Thus,
$$
\forall \g a \in \R^n, \quad \g a^T \g K \g a = \sum_{k=1}^{n^2} \gamma_k \g a^T \g w_k \g w_k^T \g a = \sum_{k=1}^{n^2} \gamma_k (\g w_k^T \g a )^2 \geq 0
$$

**Proof:**
The polynomial $P$ is a linear combination of powers of the kernel $K_1$ with positive coefficients. Since the powers of $K_1$ are products of $K_1$ by itself and thus valid kernels, their linear combination is also a valid kernel.

**Proof:**
take Taylor series, limit of polynomial case...