Uniform error bounds for finite function classes

In words...

To derive error bounds that hold for the particular classifier returned by an algorithm after seeing the data, we must resort to a worst-case analysis. Indeed, the classifier chosen by the algorithm depends on the data, and since our probabilistic results are stated in terms of random draws of the data, this classifier is itself a random object. A worst-case analysis provides bounds that hold simultaneously for all classifiers in a predefined set, and thus guarantees that the result also holds for the random one chosen by the algorithm.

Technically, such a worst-case analysis leads us to consider uniform deviations of the risk, i.e., to bound the supremum over all classifiers of the deviation between the empirical and true risks.

The simplest bound of this type applies to finite sets of classifiers and is obtained with the union bound.

Since the resulting bound grows with the number of classifiers in the set, it becomes useless for sets containing infinitely many classifiers. In that case, we must resort to more advanced techniques.

In pictures...

In maths...

We are now interested in uniform bounds of the form $$ \sup_{f\in \F} (R(f)- R_{emp}(f)) \leq \epsilon , $$ where $\F$ is the set of all classifiers possibly returned by the considered algorithm. To obtain such a bound, we must control the deviation between the empirical and true risks simultaneously for all $f\in\F$.
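To make the quantity on the left-hand side concrete, here is a minimal numerical sketch (my own illustration, not part of the derivation) that computes the uniform deviation for a small finite class of threshold classifiers on synthetic data; the distribution, the thresholds and all constants are assumptions made purely for this example.

```python
# A toy computation of sup_{f in F} (R(f) - R_emp(f)) for a small finite class F.
# Everything below (distribution, class, constants) is an illustrative assumption.
import numpy as np

rng = np.random.default_rng(0)
N = 200                                    # training-set size
thresholds = np.linspace(0.1, 0.9, 5)      # F = {f_t : f_t(x) = 1{x > t}}, |F| = 5

def sample(m):
    """X ~ Uniform(0,1), Y = 1{X > 0.5} with 20% label noise."""
    x = rng.uniform(0, 1, m)
    y = (x > 0.5).astype(int)
    flip = rng.uniform(0, 1, m) < 0.2
    return x, np.where(flip, 1 - y, y)

# True risks R(f), approximated once on a very large sample.
x_big, y_big = sample(2_000_000)
R_true = np.array([np.mean((x_big > t) != y_big) for t in thresholds])

# Empirical risks R_emp(f) on one random training set, and the uniform deviation.
x, y = sample(N)
R_emp = np.array([np.mean((x > t) != y) for t in thresholds])
print("sup_f (R(f) - R_emp(f)) =", np.max(R_true - R_emp))
```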

Here, we consider a finite function class $\F$ containing $|\F| = n$ classifiers. Let us first focus on the case where the algorithm has to choose between only $n=2$ classifiers, say $f_1$ and $f_2$. Denote by $A_i$ the event that the training set leads to a large deviation between the empirical and true risks of $f_i$. Then, $$ P(A_i) = P^N \left\{ R(f_i) - R_{emp}(f_i) > \epsilon \right\} \leq \exp\left(-2N \epsilon^2\right) , $$ where the bound on the right-hand side comes from my first (and trivial) error bound.
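As a quick numerical illustration (with arbitrarily chosen values of $N$ and $\epsilon$), this bound on $P(A_i)$ decays exponentially fast with the number of training samples:

```python
# Evaluate the single-classifier Hoeffding bound exp(-2*N*eps^2) for a fixed tolerance eps.
# The values of eps and N are arbitrary choices for the illustration.
import math

eps = 0.05
for N in (100, 500, 1000, 5000):
    print(f"N = {N:5d}   bound on P(A_i) = {math.exp(-2 * N * eps**2):.3g}")
```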
The probability of drawing a training set leading to a bad estimation of the risk of either one of the two classifiers is $P(A_1 \cup A_2)$, which the union bound allows us to bound as $$ P(A_1 \cup A_2) \leq P(A_1) + P(A_2) \leq 2\exp\left(-2N\epsilon^2\right) . $$ Extending this reasoning to the case $n>2$ with the general form of the union bound, we can bound the probability that the risk is badly estimated for at least one of the $n$ classifiers as $$ P(A_1 \cup \dots \cup A_n) \leq \sum_{i=1}^n P(A_i) . $$ Since the event $\left\{\exists f\in \F\ :\ R(f) - R_{emp}(f) > \epsilon\right\}$ is precisely the union $A_1 \cup \dots \cup A_n$, this leads to \begin{align*} P^N\left\{ \exists f\in \F\ :\ R(f) - R_{emp}(f) > \epsilon\right\} & \leq \sum_{i=1}^n \exp(-2 N \epsilon^2)\\ & = n \exp(-2N\epsilon^2) . \end{align*}
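The following sketch (again my own illustration, with an assumed toy distribution and an assumed class of $n$ threshold classifiers) estimates the probability on the left-hand side by Monte Carlo and compares it with the union bound $n\exp(-2N\epsilon^2)$:

```python
# Monte Carlo estimate of P{exists f in F : R(f) - R_emp(f) > eps} for a finite class of
# n threshold classifiers, compared with the union bound n * exp(-2*N*eps^2).
# All distributions and constants below are assumptions made for the example.
import numpy as np

rng = np.random.default_rng(1)
N, eps, trials, n = 200, 0.1, 20000, 10
thresholds = np.linspace(0.1, 0.9, n)           # the finite class F, fixed before the data

def sample(m):
    """X ~ Uniform(0,1), Y = 1{X > 0.5} with 20% label noise."""
    x = rng.uniform(0, 1, m)
    y = (x > 0.5).astype(int)
    flip = rng.uniform(0, 1, m) < 0.2
    return x, np.where(flip, 1 - y, y)

# True risks R(f_i), approximated once on a very large sample.
x_big, y_big = sample(2_000_000)
R_true = np.array([np.mean((x_big > t) != y_big) for t in thresholds])

# Frequency of the event {exists f in F : R(f) - R_emp(f) > eps} over many training sets.
bad = 0
for _ in range(trials):
    x, y = sample(N)
    R_emp = np.array([np.mean((x > t) != y) for t in thresholds])
    bad += np.any(R_true - R_emp > eps)

print("Monte Carlo estimate:", bad / trials)
print("union bound         :", n * np.exp(-2 * N * eps**2))
```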

If we denote by $\delta = n \exp(-2N\epsilon^2)$ the bound on this probability and solve for $\epsilon$ as a function of $\delta$, we obtain $$ \epsilon = \sqrt{\frac{\log n + \log \frac{1}{\delta}}{2N}}. $$ Finally, by noting that the supremum exceeds $\epsilon$ if and only if the deviation of at least one classifier does, i.e., $$ P^N\left\{ \sup_{f\in \F} (R(f)- R_{emp}(f)) > \epsilon \right\} = P^N\left\{ \exists f\in \F\ :\ R(f) - R_{emp}(f) > \epsilon\right\} , $$ we obtain the following uniform bound.

Given a function class $\F$ of finite cardinality $|\F|=n$, the bound $$ \forall f\in \F,\quad R(f) \leq R_{emp}(f) + \sqrt{\frac{\log n + \log \frac{1}{\delta}}{2N}} $$ holds with probability at least $1-\delta$.
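As a practical illustration (my own sketch; the function names and all numerical values are assumptions chosen for the example), the helper below evaluates the right-hand side of the bound and also checks that the value of $\epsilon$ obtained above indeed recovers $\delta$ when plugged back into $n\exp(-2N\epsilon^2)$:

```python
# A minimal sketch of the finite-class bound: with probability at least 1 - delta,
# R(f) <= R_emp(f) + sqrt((log(n) + log(1/delta)) / (2N)) for all f in F.
import math

def epsilon(n, N, delta):
    """Uniform deviation obtained by inverting delta = n * exp(-2*N*eps^2)."""
    return math.sqrt((math.log(n) + math.log(1 / delta)) / (2 * N))

def finite_class_bound(r_emp, n, N, delta):
    """Upper bound on the true risk R(f), valid simultaneously for all n classifiers."""
    return r_emp + epsilon(n, N, delta)

# Example with assumed values: among n = 1000 candidate classifiers, the algorithm picked
# one with empirical risk 0.12 on N = 5000 training points; at confidence level 95%:
n, N, delta = 1000, 5000, 0.05
eps = epsilon(n, N, delta)
print(n * math.exp(-2 * N * eps**2))                          # recovers delta = 0.05
print(finite_class_bound(r_emp=0.12, n=n, N=N, delta=delta))  # approx. 0.151
```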

Compared with my first error bound, this bound holds for any classifier in a given function class, and in particular for the one returned by my favorite algorithm. The main difference lies in the additional $\log n$ term, which is the price to pay for the fact that the bound actually encodes $n$ simultaneous bounds, one for each classifier in the class.