# My first error bound (and why it is useless)

## In words...

A simple error bound can be obtained by a straightforward application of Hoeffding's inequality.

However, such a bound is useless since it requires the knowledge of the classifier computed by the algorithm *before* drawing the training data. Nonetheless, useful bounds are often based on this one.

## In pictures...

## In maths...

We know that the 0-1 loss function for classification is bounded as
$$
\ell(Y,f(X)) = \I{f(X)\neq Y} \in \{0,1\} \subset [0,1].
$$
Thus, we can apply (the second) Hoeffding's inequality to the random variables $L_i = \ell(Y_i,f(X_i))$ and obtain
$$
P^N \left\{ R(f) - R_{emp}(f) > \epsilon \right\} \leq \exp\left(-2N\epsilon^2\right),
$$
with $R(f) = \E [L_i]$ and $R_{emp}(f) = \frac{1}{N}\sum_{i=1}^N L_i$.

In order to rewrite this as an explicit error bound, let denote the right-hand side by
$$
\delta = \exp\left(-2N\epsilon^2\right).
$$
Then, we can express $\epsilon$ as a function of $\delta$:
$$
\epsilon = \sqrt{\frac{\log\frac{1}{\delta}}{2N}},
$$
which leads to
$$
P^N \left\{ R(f) - R_{emp}(f) > \sqrt{\frac{\log\frac{1}{\delta}}{2N}} \right\} \leq \delta.
$$
Finally, applying the probability inversion principle, we obtain that

For all $f$ and all $\delta>0$, the bound
$$
R(f) \leq R_{emp}(f) + \sqrt{\frac{\log\frac{1}{\delta}}{2N}}
$$
holds with probability at least $1-\delta$.

Though maybe attractive at first sight, this bound is only true for a function $f$ fixed a priori. In particular, the bound states that for a given classifier $f$, there exists some training set for which $R(f) \leq R_{emp}(f) + \sqrt{\frac{\log\frac{1}{\delta}}{2N}}$ and that the probability of drawing this data set is at least $1-\delta$.

However, the particular data set leading to this bound can be different for each classifier $f$. This means that in a practical learning situation where one has a fixed data set, the bound is only valid for some functions $f$ without any guarantee for the one $f$ returned by the algorithm. Therefore, the bound above cannot be used to quantify the quality of an algorithm.