Margin-based error bound with the fat-shattering dimension

In words...

The bound previously obtained can be difficult to apply due to the dependence of the covering number on the training sample size. However, the bound can be reformulated in terms of the fat-shattering dimension, which is independent of the sample size.

Much like Sauer's Lemma was used to bound the growth function with the VC-dimension, a generalized version of the lemma is used to bound the covering number involved in the previous bound by a function of the fat-shattering dimension.

Another additional step can be taken by making the bound hold for all value of margin.

In pictures...

Packing number of quantized class...

In maths...

Consider a binary classification problem and a set of large-margin classifiers based on a real-valued function $g$: $$ \F=\{f \in \{-1,1\}^{\X} : \forall x\in\X,\ f( x) = \sign ( g(x)), \ g\in\G\subset \R^{\X}\}. $$ For these classifiers we can define a margin-based error rate as $$ R_{emp}^{\gamma} ( f) = \frac{1}{N} \sum_{i=1}^N \I{Y_i g(X_i) < \gamma} $$ on a training sample $D=\{(X_i,Y_i)\}_{i=1}^N$ of size $N$. Then, the following bound based on the fat-shattering dimension of the real-valued function class $\G$ can be derived.

For all $\delta > 0$, the bound $$ \forall f\in\F,\quad R(f) \leq R_{emp}^{\gamma}(f) + 2\sqrt{2\frac{\fat_{\gamma/16}(\G)\log \frac{34 eN}{\fat_{\gamma/16}(\G)} \log_2 (578N) + \log\frac{4}{\delta}}{N}} $$ holds with probability at least $1-\delta$.

Proof: We start from the previous bound involving a covering number that we will bound by the fat-shattering dimension: $$ \forall f\in\F,\quad R(f) \leq R_{emp}^{\gamma}(f) + 2\sqrt{2\frac{\log \mathcal{N}_{\infty}(\frac{\gamma}{2}, \pi_{\gamma} (\G), 2N ) + \log\frac{2}{\delta}}{N}} $$ holds with probability at least $1-\delta$ for all $\delta>0$.

Define the quantization function $Q: \R\rightarrow \R$ as $$ Q_{\alpha} ( u) = \left\lceil \frac{u - \frac{\alpha}{2}}{\alpha} \right\rceil \alpha $$ and the set of quantized functions $\mathcal{Q} = \{ q\in\R^{\X} : q(x) = Q_{\alpha}( \pi_{\gamma}( g(x) ) ),\ g\in\G \}$. We have the following relationship between the packing numbers of the function classes $\pi_{\gamma}(\mathcal{G})$ and $\mathcal{Q}$ with respect to the $\ell_{\infty}$-pseudo metric: $$ \mathcal{M}_{\infty}(\gamma/2, \pi_{\gamma}(\mathcal{G}), 2N ) \leq\mathcal{M}_{\infty}(\gamma/2, \mathcal{Q}, 2N) . $$

We make use of the following property of the quantization function: $$ \forall (a,b)\in \R^2, \forall k\in\mathbb{N},\quad |a-b| \geq k \alpha \quad \Rightarrow\quad |Q_{\alpha}(a) - Q_{\alpha}(b) | \geq k\alpha $$ This can be shown as follows. Without loss of generality, assume $a\geq b$. Then, \begin{align} |a-b| \geq \alpha k \ & \Rightarrow \ (a-\frac{\alpha}{2}) - (b - \frac{\alpha}{2}) \geq \alpha k \\ & \Rightarrow\ a-\frac{\alpha}{2} \geq b - \frac{\alpha}{2} + \alpha k \\ & \Rightarrow \ \frac{a-\frac{\alpha}{2}}{\alpha} \geq \frac{b - \frac{\alpha}{2}}{\alpha} + k \\ & \Rightarrow \ \frac{a-\frac{\alpha}{2}}{\alpha} > \left\lceil \frac{b - \frac{\alpha}{2}}{\alpha} \right\rceil -1 + k\\ & \Rightarrow \ \left\lceil \frac{a-\frac{\alpha}{2}}{\alpha}\right\rceil > \left\lceil \frac{b - \frac{\alpha}{2}}{\alpha} \right\rceil -1 + k \\ & \Rightarrow \ \left\lceil \frac{a-\frac{\alpha}{2}}{\alpha}\right\rceil \geq \left\lceil \frac{b - \frac{\alpha}{2}}{\alpha} \right\rceil + k\\ &\Rightarrow Q_{\alpha}(a) - Q_{\alpha}(b) \geq k\alpha \end{align} This property transfers to all pairs of real-valued functions $(g,g^\prime)$ with the $\ell_{\infty}$-pseudo metric as $$ \forall \g x_{2N}\in\X^{2N},\ \forall k\in \mathbb{N},\quad d_{\ell_\infty(\g x_{2N})}(g, g^\prime) \geq k \alpha \quad \Rightarrow\quad d_{\ell_\infty(\g x_{2N})}(Q_{\alpha}(g), Q_{\alpha}(g^\prime) ) \geq k\alpha $$ since $d_{\ell_\infty(\g x_{2N})}(g, g^\prime) \geq k\alpha$ implies that there is an $x_i$ for which $|g(x_i) - g^\prime(x_i)|\geq k\alpha$, in which case the above yields $$ d_{\ell_\infty(\g x_{2N})}(Q_{\alpha}(g), Q_{\alpha}(g^\prime) ) \geq |Q_{\alpha}(g(x_i)) - Q_{\alpha}(g^\prime(x_i))|\geq k\alpha . $$ Therefore, any $k\alpha$-packing of $\pi_{\gamma}( \G )$ yields a $k\alpha$-packing of $Q_{\alpha}(\pi_{\gamma}( \G ))$ and by choosing $k = 4$ and $\alpha=\gamma/8$ this yields $\mathcal{M}(\gamma/2, \pi_{\gamma}(\mathcal{G}), d_{\ell_\infty(\g x_{2N})} ) \leq \mathcal{M}(\gamma/2, \mathcal{Q}, d_{\ell_\infty(\g x_{2N})} )$. Since this holds for any $\g x_{2N}$, it also holds for $\hat{\g x}_{2N} \in \arg\max_{\g x_{2N}\in\X^{2N}} \mathcal{M}(\gamma/2, \pi_{\gamma}(\mathcal{G}), d_{\ell_\infty(\g x_{2N})} )$ and thus, for the uniform packing numbers, we also have $$ \mathcal{M}_{\infty}(\gamma/2, \pi_{\gamma}(\mathcal{G}), 2N ) = \mathcal{M}(\gamma/2, \pi_{\gamma}(\mathcal{G}), d_{\ell_\infty(\hat{\g x}_{2N})} ) \leq \mathcal{M}(\gamma/2, \mathcal{Q}, d_{\ell_\infty(\hat{\g x}_{2N})} ) \leq \mathcal{M}_{\infty}(\gamma/2, \mathcal{Q}, 2N). $$

Using this with the classical relationship between covering and packing numbers, $$ \mathcal{N}_{\infty}(\gamma/2, \pi_{\gamma}(\mathcal{G}), 2N ) \leq \mathcal{M}_{\infty}(\gamma/2, \pi_{\gamma}(\mathcal{G}), 2N) $$ we obtain that $$ \mathcal{N}_{\infty}(\gamma/2, \pi_{\gamma}(\mathcal{G}), 2N ) \leq \mathcal{M}_{\infty}(\gamma/2, \mathcal{Q}, 2N) $$ and, using the fact that an $\epsilon$-packing number is a decreasing function of $\epsilon$: $$ \mathcal{N}_{\infty}(\gamma/2, \pi_{\gamma}(\mathcal{G}), 2N ) \leq \mathcal{M}_{\infty}(\gamma/4, \mathcal{Q}, 2N) . $$ Consider now the function class $\overline{\mathcal{Q}} = (9+\frac{8}{\gamma}\mathcal{Q})$ of integer-valued functions with output in $\{1,\dots,17\}$. We have $$ \mathcal{M}_{\infty}(\gamma/4, \mathcal{Q}, n) \leq \mathcal{M}_{\infty}(2,\overline{\mathcal{Q}},n). $$

For any $\g x_n\in\X^n$, let $P$ be a $\gamma/4$-packing of $\mathcal{Q}$, then $$ \forall (q_1,q_2) \in P^2, q_1\neq q_2,\ \exists x_i\in\g x_n,\quad \mbox{such that } |q_1(x_i) - q_2(x_i)| \geq \frac{\gamma}{4}, $$ which implies that $\forall (q_1,q_2) \in P^2, q_1\neq q_2, \exists x_i\in\g x_n$, such that $$ \left|\frac{8}{\gamma}q_1(x_i) - \frac{8}{\gamma}q_2(x_i)\right|\geq \frac{8}{\gamma}\frac{\gamma}{4} = 2 \quad \Rightarrow\quad \left|9 + \frac{8}{\gamma}q_1(x_i) - 9 - \frac{8}{\gamma}q_2(x_i)\right| = |\overline{q}_1(x_i) - \overline{q}_2(x_i)|\geq 2 $$ and thus that $\overline{P}= (9+\frac{8}{\gamma}P)$ is a $2$-packing of $\overline{\mathcal{Q}}$. Therefore, the number of $2$-packings of $\overline{\mathcal{Q}}$ cannot be less than the number of $\gamma/4$-packings of $\mathcal{Q}$ and $\mathcal{M}(\frac{\gamma}{4},\mathcal{Q},d_{\ell_\infty(\g x_{n})}) \leq \mathcal{M}(2,\overline{\mathcal{Q}},d_{\ell_\infty(\g x_{n})})$. Since this holds for any $\g x_n$, we have, for $\hat{\g x}_{n} \in \arg\max_{\g x_{n}\in\X^{n}}\mathcal{M}(\frac{\gamma}{4},\mathcal{Q},d_{\ell_\infty(\g x_{n})})$, $$ \mathcal{M}_{\infty}(\gamma/4, \mathcal{Q}, n) = \mathcal{M}(\frac{\gamma}{4},\mathcal{Q},d_{\ell_\infty(\hat{\g x}_{n})}) \leq \mathcal{M}(2,\overline{\mathcal{Q}},d_{\ell_\infty(\hat{\g x}_{n})}) \leq \mathcal{M}_{\infty}(2,\overline{\mathcal{Q}},n) . $$

We will now apply a generalized Sauer's lemma to bound $\mathcal{M}_{\infty}(2,\overline{\mathcal{Q}},n)$. For this lemma to apply with $d=\fat_{\gamma/8}(\mathcal{Q})$, we need $\fat_{1}(\overline{\mathcal{Q}}) \leq \fat_{\gamma/8}(\mathcal{Q})$.
To see this, note that $\fat_{1}(\frac{1}{\alpha}\mathcal{Q}) = \fat_{\alpha}(\mathcal{Q})$, since $y_i(q(x_i) - r_i)\geq \alpha \Leftrightarrow y_i(\frac{1}{\alpha} q(x_i) - \frac{1}{\alpha} r_i) \geq 1$. In addition, the fat-shattering dimension is invariant to translation of the function class: $\fat_{1}(\frac{1}{\alpha}\mathcal{Q}) = \fat_{1}(\beta + \frac{1}{\alpha}\mathcal{Q})$ (it suffices to choose a witness $r_i^\prime= r_i+\beta$). Thus, choosing $\beta = 9$ and $\alpha = \gamma/8$ yields $\fat_{1}(\overline{\mathcal{Q}}) = \fat_{\gamma/8}(\mathcal{Q})$ and the lemma can be applied with $n=2N$ and $m=17$ to give $$ \mathcal{M}_{\infty}(2,\overline{\mathcal{Q}},2N) \leq 2(2N m^2)^p , \quad \mbox{with } p = \fat_{\gamma/8}(\mathcal{Q}) \log_2 \left(\frac{2Nm}{\fat_{\gamma/8}(\mathcal{Q})} \right) , $$ in which we can replace $\fat_{\gamma/8}(\mathcal{Q})$ by the fat-shattering dimension of the class $\pi_{\gamma}(\G)$ since $$ \fat_{\gamma/16} ( \pi_{\gamma}( \G ) )\geq \fat_{\gamma/8} ( \mathcal{Q} ). $$

With $n= \fat_{\gamma/8} ( \mathcal{Q} )$, there is a set of points and a vector $\g r$ such that $$ \forall \g y \in \{-1,+1\}^n,\quad \exists q\in\mathcal{Q}, \mbox{ such that } y_i( q(x_i) - r_i) \geq \frac{\gamma}{8},\quad i=1,\dots,n, $$ Since the quantization function introduces an approximation error that is bounded by $$ |Q_{\alpha}(u) - u | \leq \frac{\alpha}{2} , $$ this implies for $\alpha=\gamma/8$ that $$ \forall \g y \in \{-1,+1\}^n,\quad \exists g\in\G, \mbox{ such that } y_i( \pi_{\gamma}( g(x_i) ) - r_i) \geq \frac{\gamma}{8} - \frac{\gamma}{16} \geq \frac{\gamma}{16},\quad i=1,\dots,n, $$ and thus that $\fat_{\gamma/16} ( \pi_{\gamma}( \G ) )\geq \fat_{\gamma/8} ( \mathcal{Q} )$.

The last step amounts to bounding the fat-shattering dimension of $\pi_{\gamma}( \G )$ with the one of $\G$ as $$ \fat_{\gamma/16} ( \pi_{\gamma}( \G ) )\leq \fat_{\gamma/16} ( \G ), $$ which holds by the fact that any set of points $\frac{\gamma}{16}$-shattered by $\pi_{\gamma}( \G )$ is also $\frac{\gamma}{16}$-shattered by $\G$.
Finally, we obtain $$ \mathcal{N}_{\infty}(\gamma/2, \pi_{\gamma}(\mathcal{G}), 2N ) \leq 2(578N)^p , \quad \mbox{with } p = \fat_{\gamma/16} ( \G ) \log_2 \left(\frac{34eN}{ \fat_{\gamma/16} ( \G )} \right) $$ and \begin{align} \log \mathcal{N}_{\infty}(\gamma/2, \pi_{\gamma}(\mathcal{G}), 2N ) \leq \log 2 + \fat_{\gamma/16} ( \G ) \log_2 \left(\frac{34eN}{ \fat_{\gamma/16} ( \G )} \right) \log(578N) \\ = \log 2 + \fat_{\gamma/16} ( \G ) \log \left(\frac{34eN}{ \fat_{\gamma/16} ( \G )} \right) \log_2(578N) \end{align} where the last equality is due to the definition of logarithms as $\log_2 (a) = \frac{\log a}{\log 2}$ and $\log b = \log_2 b \log 2$. Plugging this in the error bound concludes the proof.

Notes

The results described in this page were published in .