Function classes with infinite VC-dimension

In words...

In binary classification, the VC-dimension is a measure of the capacity of a function class that can be used to derive error bounds even for classes containing infinitely many functions.

The VC-dimension of the set of classifiers that output the sign of a sine wave parametrized by a single parameter (the angular frequency of the sine wave) is infinite.

Another example of a function class with infinite VC-dimension is the class of functions implemented by the nearest neighbor algorithm (the $K$-nearest neighbors algorithm with $K=1$).

In pictures...

The sine slalom

For any number of points and any labeling of these points, we can fix their locations such that there is a sine wave that slaloms around these points by passing below the blue points and above the red ones.

[Interactive plot: the classifier $f(x) = \sign(\sin(\omega x))$ realizing the possible labelings of the points, for an adjustable parameter $\omega$.]
In maths...

We will show the following result on the VC-dimension of a set of classifiers with a single parameter.

The set of classifiers $$ \F = \{f\ :\ f(x) = \sign\left(\sin(\omega x)\right),\ \omega \geq 0\}, $$ operating in one dimension with input space $\X = [0,2\pi]$, has infinite VC-dimension, i.e., $$ \VC(\F) = +\infty. $$

Proof: Consider the labeled data set $\{(x_i, y_i)\}_{i=1}^n$ with points $x_i = 2\pi 10^{-i}$ and arbitrary labels $y_i \in \{-1,+1\}$, and choose, for any such labeling, the parameter $$ \omega = \frac{1}{2}\left( 1 + \sum_{i=1}^n \frac{1-y_i}{2} 10^i \right). $$
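This construction of $\omega$ from a labeling can be sketched in Python (a minimal sketch; the function name `omega` is ours, not part of the proof):

```python
def omega(labels):
    # Parameter omega for points x_i = 2*pi*10^{-i}, i = 1..n:
    # each negative label y_i = -1 contributes the decimal digit 10^i.
    return 0.5 * (1 + sum(((1 - y) // 2) * 10**i
                          for i, y in enumerate(labels, start=1)))

# example: labels (-1, +1, -1) give omega = (1 + 10 + 1000) / 2
print(omega([-1, 1, -1]))  # 505.5
```

The negative labels are thus encoded as digits of the integer $2\omega - 1$, which is what the proof below exploits.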

Step 1: correctly predicting negative labels
The parameter can be rewritten as $$ \omega = \frac{1}{2}\left(1 + \sum_{\{i\ :\ y_i=-1\}} 10^i \right), $$ where we observe that, for any point $x_j = 2\pi 10^{-j}$ in the data set such that $y_j=-1$, the term $10^{j}$ appears in the sum. This leads to \begin{align*} \omega x_j &= \pi 10^{-j}\left(1 + \sum_{\{i\ :\ y_i=-1\}} 10^i \right)\\ & = \pi 10^{-j}\left(1 + 10^j + \sum_{\{i\ :\ y_i=-1,\ i\neq j\}} 10^i \right)\\ &= \pi \left( 10^{-j} + 1 + \sum_{\{i\ :\ y_i=-1,\ i\neq j\}} 10^{i-j} \right)\\ &= \pi \left( 10^{-j} + 1 + \sum_{\{i \ :\ y_i=-1,\ i > j\}} 10^{i-j} + \sum_{\{i\ :\ y_i=-1,\ i < j\}} 10^{i-j} \right). \end{align*}

For all $i>j$, the terms $10^{i-j}$ are positive powers of $10$ and thus even numbers that can be written as $2k_i$ for some $k_i \in \N$. Therefore, we have $$ \sum_{\{i \ :\ y_i=-1,\ i > j\}} 10^{i-j} = \sum_{\{i \ :\ y_i=-1,\ i > j\}} 2 k_i = 2 k $$ for some $k\in \N$, which gives $$ \omega x_j = \pi \left( 10^{-j} + 1 + \sum_{\{i \ :\ y_i=-1,\ i < j\}} 10^{i-j} \right) + 2k\pi . $$

Regarding the remaining sum, since $i - j \leq -1$ for all its terms, we have $$ \sum_{\{i\ :\ y_i=-1,\ i < j\}} 10^{i-j} < \sum_{i=1}^{+\infty} 10^{-i} = \sum_{i=0}^{+\infty} 10^{-i} - 1. $$ The geometric series of first term $1$ and common ratio $0.1$ is known to converge towards $$ \sum_{i=0}^{+\infty} 10^{-i} = \frac{1}{1-0.1}, $$ which gives $$ \sum_{\{i\ :\ y_i=-1,\ i < j\}} 10^{i-j} < \frac{1}{1-0.1} -1 = \frac{1}{9}. $$

Let us define $$ \epsilon = 10^{-j} + \sum_{\{i\ :\ y_i=-1,\ i < j\}} 10^{i-j} $$ and rewrite $\omega x_j$ as $$ \omega x_j = \pi \left( 1 + \epsilon \right) + 2k\pi . $$ Given that $10^{-j} \leq 1/10$ and the bound above, we have $0< \epsilon < 1/10 + 1/9 < 1$, and thus $$ \pi < \pi(1+\epsilon) < 2\pi \quad \Rightarrow\quad \sin(\omega x_j) < 0. $$ Hence, the classifier correctly predicts all negative labels: $f(x_j) = \sign(\sin(\omega x_j)) = -1 = y_j$.
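Step 1 can be checked numerically (a quick sketch; the labeling below is arbitrary and the helper `phase` is ours):

```python
import math

def phase(labels, j):
    # Phase of omega * x_j modulo 2*pi, with x_j = 2*pi*10^{-j}
    # and omega built from the labels as in the proof.
    w = 0.5 * (1 + sum(10**i for i, y in enumerate(labels, 1) if y == -1))
    return math.fmod(w * 2 * math.pi * 10**(-j), 2 * math.pi)

# for every negatively labeled point, the phase lies in (pi, 2*pi),
# hence sin(omega * x_j) < 0
labels = [-1, 1, -1, -1, 1]
for j, y in enumerate(labels, 1):
    if y == -1:
        assert math.pi < phase(labels, j) < 2 * math.pi
```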

Step 2: correctly predicting positive labels
The same steps can be reproduced for positive labels $y_j=+1$, with the difference that the term $10^j$ does not appear in the sum defining $\omega$. This leads to \begin{align*} \omega x_j &= \pi 10^{-j} \left( 1 + \sum_{\{i \ :\ y_i=-1,\ i \neq j\}} 10^{i} \right)\\ &= \pi \left( 10^{-j} + \sum_{\{i \ :\ y_i=-1,\ i > j\}} 10^{i-j} + \sum_{\{i \ :\ y_i=-1,\ i < j\}} 10^{i-j} \right)\\ & = \pi\epsilon + 2k\pi \end{align*} with $$ 0 < \pi \epsilon < \pi\quad \Rightarrow\quad \sin(\omega x_j) > 0. $$ Thus, all positively labeled points are also correctly classified by $f$ with this particular choice of $\omega$.

Since the steps above hold for any labeling of the points, we proved that $\F$ shatters the set of points. In addition, the proof is valid for any number of points $n$, which shows that $\F$ can shatter sets of points of any size and thus has infinite VC-dimension.
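Putting both steps together, the shattering can be verified exhaustively for small $n$ (a sketch; for large $n$ floating-point precision would eventually fail, but the mathematical argument holds for all $n$):

```python
import math
from itertools import product

def shatters(n):
    # Points x_i = 2*pi*10^{-i}; for every labeling, build the omega
    # from the proof and check that sign(sin(omega * x)) recovers it.
    xs = [2 * math.pi * 10**(-i) for i in range(1, n + 1)]
    for labels in product([-1, 1], repeat=n):
        w = 0.5 * (1 + sum(10**i for i, y in enumerate(labels, 1) if y == -1))
        preds = [1 if math.sin(w * x) > 0 else -1 for x in xs]
        if list(labels) != preds:
            return False
    return True

assert shatters(5)  # all 2^5 = 32 labelings are realized
```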

1-nearest neighbor

We will show the following result on the VC-dimension of the $K$-nearest neighbors algorithm with $K=1$.

Let $\F$ denote the set of classifiers that are implemented by the $K$-nearest neighbors algorithm using $K=1$. Then, $$ \VC(\F) = +\infty. $$

To see this, simply observe that, by definition, the nearest neighbor algorithm assigns every point of the training set to its own label: $f(x_i) = y_i$, whatever the particular value of $y_i$ is. Since this remains true for any training set size, the algorithm can shatter a set of any size, simply by taking this set as the training set. Therefore, its VC-dimension cannot be bounded and is infinite.
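This argument can be checked directly with a minimal 1-nearest-neighbor implementation (a sketch for one-dimensional inputs; the function name `nearest_neighbor` is ours):

```python
from itertools import product

def nearest_neighbor(train_x, train_y):
    # 1-NN classifier: predict the label of the closest training point
    def f(x):
        j = min(range(len(train_x)), key=lambda i: abs(x - train_x[i]))
        return train_y[j]
    return f

# any labeling of distinct points is reproduced exactly on the training
# set itself, so 1-NN classifiers shatter sets of any size
xs = [0.0, 1.0, 2.5, 4.0]
for ys in product([-1, 1], repeat=len(xs)):
    f = nearest_neighbor(xs, list(ys))
    assert all(f(x) == y for x, y in zip(xs, ys))
```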