In binary classification, the fat-shattering dimension is a capacity measure that generalizes the Vapnik-Chervonenkis (VC) dimension to classes of real-valued functions, such as the score functions used by classifiers, and it can be used to derive margin-based error bounds.
The fat-shattering dimension of the set of linear functions used by linear classifiers can be bounded by the squared ratio of the radius of the ball enclosing the data to the scale at which the dimension is computed.
Interestingly, this bound on the fat-shattering dimension of linear classifiers does not involve the dimension of the input space. Thus, in contrast with the VC dimension of linear classifiers, the fat-shattering dimension of linear functions remains finite in infinite-dimensional Hilbert spaces.
Consider the set of real-valued functions at the core of linear classifiers, restricted to parameter vectors of bounded norm: $$ \G = \{ g \in \R^{\X} : g(\g x) = \inner{\g w}{\g x},\quad \|\g w\| \leq R_{\g w} \} . $$ Then, its fat-shattering dimension is bounded by $$ \fat_{\gamma}(\G) \leq \frac{ R_{\g w}^2 R_{\g x}^2 }{\gamma^2} $$ where $R_{\g x}$ is the radius of the data-enclosing ball ($\|\g x\| \leq R_{\g x}$ for all $\g x\in \X$) and $\gamma > 0$ is the scale at which the fat-shattering dimension is computed.
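For a concrete sense of the numbers involved, consider the illustrative values $R_{\g w} = 1$, $R_{\g x} = 10$ and $\gamma = 1$ (chosen here purely as an example). The bound then gives
$$
\fat_{1}(\G) \leq \frac{1^2 \times 10^2}{1^2} = 100 ,
$$
irrespective of the dimension of the input space, which may even be infinite.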
Proof:
If $\{\g x_i\}_{i=1}^n$ is $\gamma$-shattered by $\G$, say with witness $\{r_i\}_{i=1}^n$, then for every labeling $\g y \in \{-1,+1\}^{n}$, there is a $\g w$ with $\|\g w \| \leq R_{\g w}$ such that
$$
y_i( \inner{\g w}{\g x_i} - r_i) \geq \gamma,\ i=1,\dots,n.
$$
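As a quick illustration of this condition (not needed for the proof), a single nonzero point $\g x_1$ is $\gamma$-shattered with witness $r_1 = 0$ by choosing $\g w = y_1 \gamma \g x_1 / \|\g x_1\|^2$ for the label $y_1 \in \{-1,+1\}$, since then $y_1 \inner{\g w}{\g x_1} = \gamma$; the norm constraint $\|\g w\| = \gamma / \|\g x_1\| \leq R_{\g w}$ holds whenever $\|\g x_1\| \geq \gamma / R_{\g w}$, consistently with the bound above, which allows $\fat_{\gamma}(\G) \geq 1$ only when $R_{\g w} R_{\g x} \geq \gamma$.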
Let $I = \{1,\dots,n\}$ and first assume that the subset $I_0 \subseteq I$ is such that
$$
\sum_{i\in I_0} r_i \geq \sum_{i\in I\setminus I_0} r_i
$$
and consider the labeling with $y_i = 1$ if $i \in I_0$ and $y_i= -1$ otherwise. Then, we have
$$
\begin{cases}
\inner{\g w}{\g x_i} \geq r_i + \gamma,& \mbox{if } i \in I_0, \\
\inner{\g w}{\g x_i} \leq r_i - \gamma,& \mbox{if } i \in I\setminus I_0,
\end{cases}
$$
which leads to
$$
\inner{\g w}{\sum_{i\in I_0} \g x_i} = \sum_{i\in I_0} \inner{\g w}{ \g x_i} \geq \sum_{i\in I_0} r_i + |I_0|\gamma
$$
and
$$
\inner{\g w}{\sum_{i\in I\setminus I_0} \g x_i} = \sum_{i\in I\setminus I_0} \inner{\g w}{ \g x_i} \leq \sum_{i\in I\setminus I_0} r_i - (n-|I_0|) \gamma .
$$
Thus, by using the assumption above,
$$
\inner{\g w}{\sum_{i\in I_0} \g x_i - \sum_{i\in I\setminus I_0} \g x_i} \geq \sum_{i\in I_0} r_i -\sum_{i\in I\setminus I_0} r_i + |I_0|\gamma + (n-|I_0|) \gamma \geq n\gamma .
$$
By the Cauchy-Schwarz inequality,
$$
\inner{\g w}{\sum_{i\in I_0} \g x_i - \sum_{i\in I\setminus I_0} \g x_i} \leq \|\g w\| \left\|\sum_{i\in I_0} \g x_i - \sum_{i\in I\setminus I_0} \g x_i \right\| \leq R_{\g w} \left\|\sum_{i\in I_0} \g x_i - \sum_{i\in I\setminus I_0} \g x_i\right\| ,
$$
so that $\left\|\sum_{i\in I_0} \g x_i - \sum_{i\in I\setminus I_0} \g x_i\right\|$ must be at least $n\gamma / R_{\g w}$ for the above to hold.
Since the case $\sum_{i\in I_0} r_i < \sum_{i\in I\setminus I_0} r_i$ yields the same result via the opposite labeling, in which $y_i=1$ if and only if $i\in I\setminus I_0$, the lower bound $\left\|\sum_{i\in I_0} \g x_i - \sum_{i\in I\setminus I_0} \g x_i\right\| \geq n\gamma / R_{\g w}$ holds for every subset $I_0 \subseteq I$.
It remains to show that there exists a subset $I_0$ for which this difference norm is small. To this end, let the labels $y_i$ be independent and uniform random variables taking values in $\{-1,+1\}$ and consider the random subset $I_0=\{ i\in I : y_i = 1\}$. Then, we compute the following expectation:
\begin{align}
\E_{\g y} \left\|\sum_{ i\in I_0} \g x_i - \sum_{i\in I\setminus I_0} \g x_i \right\|^2 &= \E_{\g y} \left\| \sum_{i\in I} y_i \g x_i \right\|^2 \\
&= \E_{\g y} \inner{\sum_{i\in I} y_i \g x_i}{\sum_{j\in I} y_j \g x_j} \\
&= \sum_{i\in I} \sum_{j\in I} \E_{y_i,y_j} \inner{ y_i \g x_i}{ y_j \g x_j} \\
&= \sum_{i\in I} \left( \sum_{j\neq i} \E_{y_i,y_j}\, y_i y_j \inner{\g x_i}{ \g x_j} + \E_{y_i}\| y_i \g x_i\|^2 \right) \\
&= \sum_{i\in I} \| \g x_i\|^2 & (\mbox{since } y_i, y_j \mbox{ are independent and centered, and } y_i^2 = 1)\\
&\leq n R_{\g x}^2 .
\end{align}
Since the expectation over all labelings is at most $n R_{\g x}^2$, there must be at least one labeling (and thus one set $I_0$) for which $\left\|\sum_{ i\in I_0} \g x_i - \sum_{i\in I\setminus I_0} \g x_i \right\|^2 \leq n R_{\g x}^2$.
For this particular $I_0$, the lower bound of the first part yields $n\gamma / R_{\g w} \leq \left\|\sum_{ i\in I_0} \g x_i - \sum_{i\in I\setminus I_0} \g x_i \right\| \leq \sqrt{n}\, R_{\g x}$, and thus $n \leq R_{\g w}^2 R_{\g x}^2 / \gamma^2$. Since this holds for any set of $n$ points that is $\gamma$-shattered by $\G$, the claimed bound on $\fat_{\gamma}(\G)$ follows, which completes the proof.
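As a numerical sanity check of the key identity $\E_{\g y} \left\| \sum_{i\in I} y_i \g x_i \right\|^2 = \sum_{i\in I} \|\g x_i\|^2$ used above, the following sketch estimates the expectation by Monte Carlo with NumPy and compares it with the exact value (the data, dimensions and sample size are arbitrary choices for illustration):

```python
import numpy as np

rng = np.random.default_rng(0)

# Arbitrary illustrative data: n points in dimension d.
n, d = 8, 5
X = rng.normal(size=(n, d))

# Exact value: for independent uniform labels y_i in {-1, +1},
# the cross terms cancel in expectation, leaving sum_i ||x_i||^2.
exact = np.sum(X**2)

# Monte Carlo estimate of E || sum_i y_i x_i ||^2 over random labelings.
n_samples = 200_000
Y = rng.choice([-1.0, 1.0], size=(n_samples, n))  # one labeling per row
norms_sq = np.sum((Y @ X)**2, axis=1)             # || sum_i y_i x_i ||^2 per labeling
estimate = norms_sq.mean()

print(f"exact: {exact:.4f}  Monte Carlo estimate: {estimate:.4f}")
```

The two printed values should agree up to Monte Carlo error, and since $\|\g x_i\|^2 \leq R_{\g x}^2$ for each $i$, the expectation is indeed at most $n R_{\g x}^2$.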