Spectral clustering is an efficient and versatile method for difficult or complex clustering problems. It differs from center-based methods in that it does not build local models of the groups. As a result, the clustering is not based on a predefined model structure, but rather on an assumption about the connectivity of the points: informally, there should be a path between any two points of a group that jumps from point to point with only small jumps, while no such path exists between points of different groups.
An important feature of spectral clustering is that it can easily separate two arbitrarily shaped groups of points as soon as these are well enough separated, meaning that the smallest distance between points of different groups is significantly larger than the largest distance between any two neighboring points of a group.
Spectral clustering computations are based on a symmetric affinity matrix containing a measure of similarity for every pair of points. While the default choice is a similarity measure based on the Euclidean distance, this makes the method very versatile: it suffices to define a new similarity measure suited to a given problem to adapt the method to that problem and its data distribution.
Spectral clustering can be interpreted as a relaxation of the normalized-cut problem, a particularly difficult instance of graph cut. As such, it offers a computational advantage as it relies exclusively on standard linear algebra operations.
Given a data set $\{\g x_i\}_{i=1}^N$ to cluster into $n$ groups, spectral clustering applies the following procedure: first, compute the affinity matrix $\g A$ of pairwise similarities; second, compute the graph Laplacian $\g L$; third, compute the $n$ eigenvectors of $\g L$ associated with its $n$ smallest eigenvalues; and finally, cluster the rows of the matrix formed by these eigenvectors with K-means.
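For concreteness, here is a minimal sketch of this procedure in Python/NumPy, assuming a Gaussian RBF similarity (introduced at the end of this section) and scikit-learn's KMeans for the final step; the function name, the default bandwidth, and the choice of similarity are illustrative, not part of the method's definition.

```python
import numpy as np
from sklearn.cluster import KMeans

def spectral_clustering(X, n_clusters, sigma=1.0):
    """Sketch of the procedure: affinity matrix, graph Laplacian,
    eigenvectors, and K-means on the rows of the eigenvector matrix."""
    # Step 1: affinity matrix from a Gaussian RBF similarity (see below)
    sq_dists = np.sum((X[:, None, :] - X[None, :, :]) ** 2, axis=-1)
    A = np.exp(-sq_dists / (2 * sigma ** 2))
    np.fill_diagonal(A, 0.0)   # self-similarities cancel out in L = D - A anyway

    # Step 2: graph Laplacian L = D - A, with D = diag(A 1) the degree matrix
    D = np.diag(A.sum(axis=1))
    L = D - A

    # Step 3: eigenvectors associated with the n smallest eigenvalues
    # (eigh returns eigenvalues of a symmetric matrix in ascending order)
    _, eigvecs = np.linalg.eigh(L)
    U = eigvecs[:, :n_clusters]

    # Step 4: cluster the rows of U with K-means
    return KMeans(n_clusters=n_clusters, n_init=10).fit_predict(U)
```

Calling spectral_clustering(X, n_clusters=2, sigma=0.5) on an $N\times d$ data matrix would then return a vector of $N$ cluster labels.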
The affinity matrix $\g A$ can be interpreted as the adjacency matrix of the so-called similarity graph $G=(V,E)$, an undirected weighted graph of vertices $V = \{\g x_i\}_{i=1}^N$ and edges $E$. Points with 0 similarity appear disconnected in this graph, while positive similarities weight the edges between vertices representing the data points.
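To make this interpretation concrete, the sketch below builds such a graph from a small affinity matrix with the networkx library (an arbitrary choice for illustration, assuming a recent version of the library): zero similarities yield no edge, while positive similarities become edge weights.

```python
import numpy as np
import networkx as nx   # any graph library would do; networkx is only an example

# A small symmetric affinity matrix for 3 points
A = np.array([[0.0, 0.8, 0.0],
              [0.8, 0.0, 0.3],
              [0.0, 0.3, 0.0]])

# The similarity graph: zero similarities give no edge,
# positive similarities become edge weights
G = nx.from_numpy_array(A)
print(G.edges(data=True))   # two weighted edges: (0, 1) and (1, 2)
```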
The second step of the algorithm computes the graph Laplacian $$ \g L = \g D - \g A , $$ where $\g D$ is the degree matrix, i.e., a diagonal matrix of entries $D_{ii}$ corresponding to the sum of the weights of the edges connected to the vertex $V_i$: $\g D = diag(\g A\g 1)$. The graph Laplacian has a number of important properties, including the following.
Property 1: For any $\g v \in \R^N$, $\g v^T \g L \g v = \frac{1}{2}\sum_{i=1}^N \sum_{j=1}^N A_{ij} \left( v_i - v_j \right)^2$.
Proof: \begin{align*} \g v^T \g L \g v & = \g v^T \g D \g v - \g v^T \g A \g v \\ & = \sum_{i=1}^N D_{ii} v_i^2 - \sum_{i=1}^N \sum_{j=1}^N A_{ij} v_i v_j \\ & = \frac{1}{2}\sum_{i=1}^N D_{ii} v_i^2 + \frac{1}{2} \sum_{j=1}^N D_{jj} v_j^2 - \sum_{i=1}^N \sum_{j=1}^N A_{ij} v_i v_j\\ & = \frac{1}{2}\left(\sum_{i=1}^N \sum_{j=1}^N A_{ij} v_i^2 - 2\sum_{i=1}^N \sum_{j=1}^N A_{ij} v_i v_j + \sum_{j=1}^N \sum_{i=1}^N A_{ij} v_j^2 \right) & (\mbox{due to } D_{ii} = \sum_{j=1}^N A_{ij} ) \\ & = \frac{1}{2}\sum_{i=1}^N \sum_{j=1}^N A_{ij} \left( v_i^2 - 2 v_i v_j + v_j^2 \right) \\ & = \frac{1}{2}\sum_{i=1}^N \sum_{j=1}^N A_{ij} \left( v_i - v_j \right)^2 \end{align*}
Property 2: The graph Laplacian $\g L$ is positive semi-definite: $\g v^T \g L \g v \geq 0$ for all $\g v \in \R^N$.
Proof: By definition of the affinity matrix, $A_{ij} \geq 0$ for $i=1,\dots,N$, $j=1,\dots,N$. Thus, by Property 1 above, $\g v^T \g L \g v \geq 0$ for all $\g v\in\R^N$.
Property 3: The smallest eigenvalue of $\g L$ is $\lambda_1 = 0$, with the constant vector $\g u_1 = \g 1$ as associated eigenvector.
Proof: We can check that $\g u_1$ is an eigenvector with eigenvalue $\lambda_1$: $$\g L \g u_1 = \g L \g 1 = \g D \g 1 - \g A \g 1 = \g 0 = \lambda_1 \g u_1,$$ since $\g D \g 1 = \g A \g 1$ by definition of the degree matrix. In addition, by Property 2, there cannot be negative eigenvalues, and thus $\lambda_1$ is the smallest one.
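These three properties can be checked numerically on a random affinity matrix; the sketch below is only an illustration, with an arbitrary matrix size and random seed.

```python
import numpy as np

rng = np.random.default_rng(0)

# Random symmetric affinity matrix with non-negative entries and zero diagonal
A = rng.random((6, 6))
A = (A + A.T) / 2
np.fill_diagonal(A, 0.0)

D = np.diag(A.sum(axis=1))   # degree matrix D = diag(A 1)
L = D - A                    # graph Laplacian

# Property 1: v^T L v = 1/2 * sum_ij A_ij (v_i - v_j)^2
v = rng.standard_normal(6)
assert np.isclose(v @ L @ v, 0.5 * np.sum(A * (v[:, None] - v[None, :]) ** 2))

# Properties 2 and 3: no negative eigenvalue, and the smallest one is 0
# with the constant vector as eigenvector
eigvals, _ = np.linalg.eigh(L)
assert eigvals.min() > -1e-10 and np.isclose(eigvals[0], 0.0)
assert np.allclose(L @ np.ones(6), 0.0)
```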
The following result relates the spectrum of the graph Laplacian to the connected components of the graph. Let $G$ be an undirected weighted graph with non-negative weights, $A_{ij} \geq 0$. Then, the multiplicity $n$ of the eigenvalue 0 of its graph Laplacian $\g L$ equals the number of connected components $G_1 \subseteq V, \dots, G_n\subseteq V$ of the graph. In addition, the eigenspace of eigenvalue 0 is spanned by the indicator vectors $\g 1_{G_1}, \dots, \g 1_{G_n}$ of these components, where $(\g 1_S)_i = 1$ if $i\in S$ and 0 otherwise.
Proof:
Consider first the case $n=1$, i.e., when $G$ forms a single connected component. Then, if $\g v\in \R^N$ is an eigenvector of $\g L$ with eigenvalue 0, $\g L \g v = \g 0$ and $\g v^T \g L \g v = 0$. Thus, by the properties above,
$$
\frac{1}{2}\sum_{i=1}^N \sum_{j=1}^N A_{ij} ( v_i - v_j)^2 = 0.
$$
For this to hold with non-negative weights $A_{ij}$, all the terms of the sum must be 0, and thus $v_i = v_j$ for all pairs $(i,j)$ such that $A_{ij} \neq 0$. Since all pairs of vertices are connected by a path in the graph, this implies that $\g v$ is a constant vector. Thus, $\g v$ is proportional to $\g 1$ and the eigenvalue 0 has multiplicity 1, as there is no other linearly independent eigenvector associated with this value.
For the case $n\geq 2$, assume without loss of generality that the vertices are ordered according to the connected component they belong to. Then, the adjacency matrix is block-diagonal since, for all $i$, $A_{ij} = 0$ for any $j$ such that $V_j$ is not connected to $V_i$. As a direct consequence, the Laplacian is also block-diagonal:
$$
\g L = \begin{bmatrix} \g L_1 & & & \\ & \g L_2 & & \\ & & \ddots & \\ & & & \g L_n
\end{bmatrix}
$$
with one block for each connected component. As for any block-diagonal matrix, its eigenvalues are the eigenvalues of its blocks $\g L_i$, and its eigenvectors are the eigenvectors of the $\g L_i$ padded with zeros on the entries corresponding to the other blocks. In addition, each block $\g L_i$ is itself a proper graph Laplacian for the subgraph corresponding to the $i$th connected component $G_i$, and as such has the constant vector $\g 1$ as an eigenvector with eigenvalue 0.
Therefore, $\g L$ has eigenvalue 0 with multiplicity $n$ and the corresponding eigenvectors are indicator vectors of the connected components.
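The result can be illustrated numerically with a small affinity matrix made of two connected components; the particular matrix below is only an example.

```python
import numpy as np

# Affinity matrix with two connected components: {0, 1, 2} and {3, 4}
A = np.array([[0, 1, 1, 0, 0],
              [1, 0, 1, 0, 0],
              [1, 1, 0, 0, 0],
              [0, 0, 0, 0, 1],
              [0, 0, 0, 1, 0]], dtype=float)

L = np.diag(A.sum(axis=1)) - A
eigvals, eigvecs = np.linalg.eigh(L)

print(np.round(eigvals, 6))        # eigenvalue 0 appears with multiplicity 2
print(np.round(eigvecs[:, :2], 3)) # each of these eigenvectors is constant over
                                   # {0, 1, 2} and over {3, 4}, i.e., the pair
                                   # spans the two indicator vectors
```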
These properties give us access to the indicator vectors of the connected components of a graph simply by computing a few eigenvectors of its graph Laplacian. This means that, if we can build the graph such that the connected components represent the desired clusters, we can efficiently recover the desired clustering of the data points.
Therefore, the most difficult part in applying spectral clustering consists in crafting a suitable similarity measure in order to build a relevant similarity graph with a clustered structure, or, equivalently, a block-diagonal affinity matrix. In most practical cases, the affinity matrix does not have this ideal block-diagonal structure, but it can nonetheless be close to it, in the sense that it is the sum of a block-diagonal matrix and a matrix with small entries. In this case, K-means should be able to recover the correct clustering in eigenspace even if the eigenvectors are not exactly piecewise constant over the clusters.
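The following sketch illustrates this robustness on a toy example: an ideal block-diagonal affinity matrix is perturbed by small positive entries, and K-means applied to the rows of the first eigenvectors still recovers the two groups. The sizes, noise level, and random seed are arbitrary choices for illustration.

```python
import numpy as np
from sklearn.cluster import KMeans

rng = np.random.default_rng(1)

# Ideal block-diagonal affinity matrix: two clusters of 5 points each
A = np.zeros((10, 10))
A[:5, :5] = 1.0
A[5:, 5:] = 1.0
np.fill_diagonal(A, 0.0)

# Perturbation by small positive similarities between all pairs of points
E = 0.01 * rng.random((10, 10))
A_noisy = A + (E + E.T) / 2

# Eigenvectors of the 2 smallest eigenvalues of the perturbed Laplacian
L = np.diag(A_noisy.sum(axis=1)) - A_noisy
_, eigvecs = np.linalg.eigh(L)
U = eigvecs[:, :2]

# K-means in eigenspace still separates {0,...,4} from {5,...,9}
print(KMeans(n_clusters=2, n_init=10).fit_predict(U))
```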
A $K$-nearest neighbors graph is constructed by choosing $$ s(\g x_i, \g x_j) = \begin{cases} s^\prime (\g x_i, \g x_j), & \mbox{if } \g x_j \in \mathcal{N}_K(\g x_i) \mbox{ or } \g x_i \in \mathcal{N}_K(\g x_j) \\ 0, &\mbox{otherwise,} \end{cases} $$ where $\mathcal{N}_K(\g x)$ is the set of $K$-nearest neighbors of $\g x$ in the data set and $s^\prime (\g x_i, \g x_j)$ is a continuous similarity function such as the one below.
Another possibility based on $K$-nearest neighbors is to use $$ s(\g x_i, \g x_j) = \begin{cases} s^\prime (\g x_i, \g x_j), & \mbox{if } \g x_j \in \mathcal{N}_K(\g x_i) \mbox{ and } \g x_i \in \mathcal{N}_K(\g x_j) \\ 0, &\mbox{otherwise,} \end{cases} $$ which results in a mutual $K$-nearest neighbors graph.
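As an illustration, both $K$-nearest neighbors variants can be built from pairwise Euclidean distances as sketched below; the function name and its arguments are hypothetical, and s_prime stands for any continuous similarity function such as the Gaussian RBF defined next.

```python
import numpy as np

def knn_affinity(X, K, s_prime, mutual=False):
    """K-nearest neighbors affinity matrix built from a base similarity s_prime.

    With mutual=False, s_prime(x_i, x_j) is kept if x_j is among the K nearest
    neighbors of x_i OR x_i is among those of x_j; with mutual=True, both
    conditions are required (mutual K-nearest neighbors graph)."""
    N = X.shape[0]
    dists = np.linalg.norm(X[:, None, :] - X[None, :, :], axis=-1)
    np.fill_diagonal(dists, np.inf)   # a point is not its own neighbor

    # neighbor[i, j] is True if x_j is among the K nearest neighbors of x_i
    knn_idx = np.argsort(dists, axis=1)[:, :K]
    neighbor = np.zeros((N, N), dtype=bool)
    np.put_along_axis(neighbor, knn_idx, True, axis=1)

    mask = (neighbor & neighbor.T) if mutual else (neighbor | neighbor.T)
    S = np.array([[s_prime(xi, xj) for xj in X] for xi in X])
    return np.where(mask, S, 0.0)
```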
A fully connected graph is obtained with the Gaussian RBF similarity function (or the Gaussian RBF kernel) $$ s(\g x_i, \g x_j) = \exp\left(\frac{-\|\g x_i - \g x_j\|^2}{2\sigma^2} \right), $$ where $\sigma$ is the parameter tuning the bandwidth of the Gaussian, i.e., the width of the neighborhood within which the points are considered as similar.
Despite resulting in a fully connected graph, this similarity measure is often used in practice, as most entries in the affinity matrix are close to 0 and thus play a role similar to that of nonexistent edges.
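As an illustration of this behavior, the sketch below computes a Gaussian RBF affinity matrix for two well-separated groups and measures the fraction of near-zero entries; the data, bandwidth, and threshold are arbitrary choices.

```python
import numpy as np

def rbf_affinity(X, sigma):
    """Gaussian RBF affinity: A_ij = exp(-||x_i - x_j||^2 / (2 sigma^2))."""
    sq_dists = np.sum((X[:, None, :] - X[None, :, :]) ** 2, axis=-1)
    return np.exp(-sq_dists / (2 * sigma ** 2))

# Two well-separated groups of 20 points each in the plane
rng = np.random.default_rng(0)
X = np.vstack([rng.normal(0.0, 0.3, size=(20, 2)),
               rng.normal(5.0, 0.3, size=(20, 2))])

A = rbf_affinity(X, sigma=0.5)
# Roughly half of the entries (all the cross-group ones) are nearly 0
print(np.mean(A < 1e-3))
```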