In binary classification, the Vapnik-Chervonenkis dimension (VC-dimension) is a measure of the capacity of a function class that can be used to derive error bounds for infinite function classes.
The VC-dimension is based on the notion of shattering. We say that a set of points is shattered by a function class, if for all possible labeling of these points, there is a classifier in the function class that can produce this labeling.
The VC-dimension of a function class is defined as the maximal number of points that can be shattered by this function class. In other words, it is the largest number of points for which the growth function reaches its combinatorial upper bound. If the bound is reached for all numbers of points, then the VC-dimension is infinite.
The VC-dimension of linear classifiers is proportional to the data dimension, and thus the number of parameters for the classifiers. However, there are examples of function classes with a single parameter and infinite VC-dimension.
Here, we illustrate how the class of linear classifiers shatters a set of 3 points in $\R^2$.
$y_1$ | $y_2$ | $y_3$ |
+1 | +1 | +1 |
+1 | +1 | -1 |
+1 | -1 | +1 |
+1 | -1 | -1 |
-1 | +1 | +1 |
-1 | +1 | -1 |
-1 | -1 | +1 |
-1 | -1 | -1 |
Hover over the different labelings to see a linear classifier that produces this classification.
Since there is no set of 4 points shattered by the class of linear classifiers in $\R^2$, the VC-dimension of these classifiers is 3 (see a detailed proof in arbitrary dimension).
In the plane, $\R^2$, the VC-dimension of the set of linear classifiers is 3, because there are sets of 3 points that can be shattered by linear classifiers (see above). Yet, this does not mean that any set of 3 points can be shattered, as illustrated by the plot on the left:
no linear classifier can produce this classification (no line can separate the blue points from the red one).
In a classification setting with labels in $\Y$, a set of $n$ points $S=\{x_1,\dots,x_n\}$ is said to be shattered by a function class $\F$ if $$ \forall \g y \in \Y^n,\quad \exists f\in\F, \mbox{ such that } f(x_i) = y_i,\ i=1,\dots,n. $$
The VC-dimension of a function class $\F$ (of binary classifiers), denoted $\VC(\F)$, is the size of the largest set of points that can be shattered by $\F$.
Another equivalent definition based on the growth function, $\Pi_{\F}(n) $, of $\F$ defines the VC-dimension as the largest $n$ such that $$ \Pi_{\F}(n) = 2^n . $$ If this equality holds for all $n>0$, then the VC-dimension is infinite.
Note that it is sufficient to find a particular set of points $S=\{x_1,\dots,x_n\}$ that can be shattered by $\F$ to conclude that $\VC(\F) \geq n$. But on the other hand, having $\VC(\F)=n$ does not imply that $\F$ can shatter all the sets of $n$ points.