The analysis of infinite sets of classifiers relies on the projection of the function class onto a set of points. This projection yields the set of all possible outcomes in terms of function values computed on these points. For instance, if we project a function class with two different classifiers that produce the same classification of a few points on these points, we obtain a single projection. Thus, we can reduce the number of elements to analyze from the number of classifiers to the number of projections. Given that the outputs of classifiers take their values in a finite set of labels, we can reduce the analysis of infinite function classes to the analysis of finite projected sets.
In this analysis, an important quantity for a function class is the growth function which gives the number of possible labelings of a set of points as a function of the number of points. The growth function allows us to measure the "meaningful size" of a function class with respect to a given problem and data set size. This measure of the capacity of a function class can be used to derive an error bound via the symmetrization Lemma.
For a binary classification problem, the output of a classifier has only two possible values and the growth function is bounded by the number of possible combinations of these two labels. This bound is at the basis of the definition of the VC-dimension.
The projection of $\F$ onto $S$ is the set of all possible classifications of the three points by one of the 3 classifiers: \begin{align*} \F_S = \{ & ( f_1(x_1) ,\ f_1(x_2),\ f_1(x_3) ),\\ & ( f_2(x_1) ,\ f_2(x_2),\ f_2(x_3) ),\\ & ( f_3(x_1) ,\ f_3(x_2),\ f_3(x_3) ) \} \end{align*} It has a maximal size corresponding to the size of the set of classifiers $\F$, i.e., 3.
Here are the labels computed by the 3 classifiers (hover over the rows in the table to see the classification):$x_1$ | $x_2$ | $x_3$ | |
$f_1(x)$ | +1 | -1 | -1 |
$f_2(x)$ | +1 | +1 | -1 |
$f_3(x)$ | +1 | -1 | -1 |
Since the labels computed by $f_1$ and $f_3$ are the same, the projection reduces to 2 possible labelings: \begin{align*} \F_S = \{ & ( +1,-1,-1 ),\\ & (+1,+1,-1) \} \end{align*}
$y_1$ | $y_2$ | $y_3$ |
+1 | +1 | +1 |
+1 | +1 | -1 |
+1 | -1 | +1 |
+1 | -1 | -1 |
-1 | +1 | +1 |
-1 | +1 | -1 |
-1 | -1 | +1 |
-1 | -1 | -1 |
Exercise: For each labeling, try to find a linear classifier (i.e., a line) that produces this classification.
(hover over the rows to see the solution)
Since there is a linear classifier for every possible combination of labels, the value of the growth function for 3 points (at $n=3$) is the number of combinations: 8.
Now, click in the plot to add a point and compute the growth function value at $n=4$...
The projection of a function class $\F \subseteq \Y^{\X}$ on a set of points $S=\{x_1,\dots,x_n\}\subset \X^n$ is the set of labelings (label vectors) $$ \F_S = \left\{ (f(x_1), \dots, f(x_n) )\ :\ f\in\F \right\} \subseteq \Y^n. $$ Note that for all finite sets of categories $\Y$, this projection is a finite set, even if $\F$ is an infinite function class.
The growth function, $\Pi_{\F}(n)$, of a function class $\F$ is the maximal number of possible labelings of a set of points $S$ of size $n$ by a function in $\F$: $$ \Pi_{\F}(n) = \sup_{S\in \X^n } |\F_S|, $$ where $|\F_S|$ is the cardinality of a projection.
In binary classification, we have $\Y= \{-1,+1\}$ and $|\Y| = 2$. Thus, any projection $\F_S \subseteq \Y^{|S|}$ has cardinality bounded by $2^{|S|}$ (the size of the set $\Y^{|S|}$). As a result, the growth function is upper bounded by $$ \Pi_{\F}(n) \leq 2^n, $$ on which is based the definition of the VC-dimension.