Sauer's Lemma provides a polynomial bound on the growth function of a class of binary classifiers based on its VC-dimension.
In pictures...
Below is a plot of Sauer's bound on the growth function. The plot is in log-scale (a linear curve denotes an exponential growth).
The plot shows that the growth function grows exponentially with the number of points as $2^n$ only until this number reaches the VC-dimension, since it is bounded by a polynomial function when the number of points is larger than the VC-dimension.
Given a function class $\F$ of VC-dimension $\VC(\F)$, its growth function is bounded for all $n > \VC(\F)$ by
$$
\Pi_{\F}(n) \leq \sum_{k=0}^{\VC(\F)} \begin{pmatrix}n\\k\end{pmatrix} \leq \left( \frac{e n}{\VC(\F)}\right)^{\VC(\F)} .
$$
Note: $e=\exp(1)$ in the above.
Proof:
The proof works by induction simultaneously on both the VC-dimension and the set size $n$, i.e., at a constant sum $n+\VC(\F) = k$. We use the extended definition of the binomial coefficients:
$$
\begin{pmatrix}n\\k\end{pmatrix} = \begin{cases}
\frac{n!}{k! (n-k)!}, &\mbox{if } 0\leq k \leq n \\
\\
0, &\mbox{otherwise}.
\end{cases}
$$
Base cases: At $n=0$, for any $\VC(\F)$, we have $\Pi_{\F}(0) = 0$ while
$$
\sum_{k=0}^{\VC(\F)} \begin{pmatrix}n\\k\end{pmatrix} = \begin{pmatrix}0\\0\end{pmatrix} + \sum_{k=1}^{\VC(\F)} \begin{pmatrix}0\\k\end{pmatrix} = 1 + 0 = 0
$$
For $\VC(\F)=0$, there is no point that can be shattered, which means that the predicted label is fixed at all points of $\X$ whatever the classifier $f\in\F$. In other words, there is only one possible labeling of any set of points and thus, for any $n$, we have $\Pi_{\F}(n) = 1$, which satisfies the bound since
$$
\sum_{k=0}^{0} \begin{pmatrix}n\\k\end{pmatrix} = \begin{pmatrix}n\\0\end{pmatrix} = 1
$$
Another way for Base cases: For all $n\leq \VC(\F)$, we have
$$
\Pi_{\F}(n) \leq 2^n = \sum_{k=0}^{n} \begin{pmatrix}n\\k\end{pmatrix} = \sum_{k=0}^{\VC(\F)} \begin{pmatrix}n\\k\end{pmatrix}
$$
Inductive step: choose a set of points $S=\{x_1,\dots,x_n\}$ such that each of the $\Pi_{\F}(n)$ labelings of these points corresponds to a different classifier $f\in\F$. Let $F$ denote the set of these classifiers. Let $S^\prime = S \setminus \{x_1\}$ and $F^{\prime}\subseteq F$ be the smallest subset of $F$ that can label $S^\prime$ in the maximum number of ways.
Note that $F^\prime$ cannot contain 2 classifiers that produce the same labeling of $S^\prime$ (since it is the "smallest subset" of $F$). Thus, if 2 classifiers of $F$ produce the same labeling of $S^\prime$, one is in $F^\prime$ and the other is in $F\setminus F^\prime$. This results in a partition of $F$: $F= F^\prime \cup (F\setminus F^\prime)$ with the relation $|F|= |F^\prime| + |F\setminus F^\prime|$.
Note that since we chose $F$ with one function for each labeling of $S$, $|F^\prime|$ and $|F\setminus F^\prime|$ also represent the number of labelings of $S^\prime$ produced by $F^\prime$ and $F\setminus F^\prime$, respectively. Thus:
$$
|F^\prime|\leq \Pi_{F^\prime}(n-1) ,\quad \mbox{ and }\quad |F\setminus F^\prime| \leq \Pi_{F\setminus F^\prime}(n-1)
$$
We will now bound the VC-dimensions of these sets. Since $F^\prime \subseteq F\subseteq \F$, we have $\VC(F^\prime) \leq \VC(F) \leq \VC(\F)$.
If there is a set $T\subseteq S^\prime$ shattered by $F\setminus F^\prime$, then $F^\prime$ also shatters $T$ since $F^\prime$ can produce the maximal number of labelings on $S^\prime$ and thus on all its subsets. This means that for all labelings of $T$, there is a function $f\in F\setminus F^\prime$ and another $f^\prime\in F^\prime$ that produce this labeling. Since $F$ contains a single function for each labeling of $S$, this means that $f$ and $f^\prime$ must disagree on $S\setminus T$. Whatever the output of $f$ on $S^\prime$ is, there is an $f^\prime\in F^\prime$ that can produce this labeling. Thus $f$ and $f^\prime$ must actually disagree on $S\setminus S^\prime = \{x_1\}$: $f(x_1) \neq f^\prime(x_1)$. Since all these pairs of functions are in $F$, we have that $F$ shatters $T\cup \{x_1\}$ (by alternatively considering a function from $F^\prime$ and one from $F\setminus F^\prime$).
If there is a set $T\subseteq S^\prime$ shattered by $F\setminus F^\prime$, then $\VC(F\setminus F^\prime) \geq |T|$.
But we also have that $F$ shatters $T\cup \{ x_1\}$. Thus $\VC(F) > \VC(F\setminus F^\prime)$ and $\VC(F\setminus F^\prime) \leq \VC(\F)-1$.
Inductive step: Assume the Lemma to hold for a number of points $m = n-1$. This implies the following bounds for $F^\prime$ and $F\setminus F^\prime$:
$$
\Pi_{F^\prime}(n-1) \leq \sum_{k=0}^{\small\VC(F^\prime)} \begin{pmatrix}n-1\\k\end{pmatrix}
\leq \sum_{k=0}^{\small\VC(\F)} \begin{pmatrix}n-1\\k\end{pmatrix}
$$
and
$$
\Pi_{F\setminus F^\prime}(n-1) \leq \sum_{k=0}^{\small\VC(F\setminus F^\prime)} \begin{pmatrix}n-1\\k\end{pmatrix}
\leq \sum_{k=0}^{\small\VC(\F)-1} \begin{pmatrix}n-1\\k\end{pmatrix}
$$
Finally, we have
\begin{align*}
\Pi_{\F}(n) = |F| &= |F^\prime| + |F\setminus F^\prime| \\
&\leq \Pi_{F^\prime}(n-1) + \Pi_{F\setminus F^\prime}(n-1) \\
&\leq \sum_{k=0}^{\small\VC(\F)} \begin{pmatrix}n-1\\k\end{pmatrix} + \sum_{k=0}^{\small\VC(\F)-1} \begin{pmatrix}n-1\\k\end{pmatrix} \\
&\leq \sum_{k=0}^{\small\VC(\F)} \begin{pmatrix}n-1\\k\end{pmatrix} + \sum_{k=1}^{\small\VC(\F)} \begin{pmatrix}n-1\\k-1\end{pmatrix}
= \sum_{k=0}^{\small\VC(\F)} \begin{pmatrix}n\\k\end{pmatrix}
\end{align*}
where we used the identity
$$
\begin{pmatrix}n\\k\end{pmatrix} = \begin{pmatrix}n-1\\k\end{pmatrix} + \begin{pmatrix}n-1\\k-1\end{pmatrix} .
$$
This identity characterizes the number of ways we can choose $k$ items from $n$ by computing it in two steps. In the first step, we leave the first item aside and count the number of ways we can choose $k$ items from $n-1$. In the second step, we start by choosing the first item and count the number of ways we can choose $k-1$ items from the $n-1$ remaining ones. The number of ways we can choose $k$ items from $n$ is the sum of the numbers obtained in the two steps.
Polynomial bound:
For any $n > \VC(\F)$, the combinatorial result below leads to
$$
\sum_{k=0}^{\small\VC(\F)} \begin{pmatrix}n\\k\end{pmatrix} \leq \left( \frac{en}{\VC(\F)} \right)^{\small\VC(\F)} .
$$
Combinatorial result
A useful bound for sums of binomial coefficients is the following.