# Covering numbers, epsilon-nets and packing numbers

## In words...

A covering number is the minimal number of balls of a given size required to entirely cover a given set.

More formally, in a space equipped with a (possibly pseudo) metric, we consider covering a given subset of the space with open balls of a given radius, say $\epsilon$. Then, any set of balls entirely convering the considered subset is called an $\epsilon$-cover of the subset, while the set of their centers is an $\epsilon$-net.

The cardinality of the smallest such $\epsilon$-net is a covering number of the subset for the chosen value of $\epsilon$.

A subset is said to be $\epsilon$-separated if the distance between all pairs of elements from the subset is at least $\epsilon$. An $\epsilon$-packing number is the maximal number of elements that can be $\epsilon$-separated.

Covering and packing numbers are particularly useful to measure the capacity of real-valued function classes, as for instance in obtaining margin-based error bounds.

## In maths...

### Covering numbers

Let $(E,\rho)$ be a (possibly pseudo) metric space and consider a set $F\subset E$ and a scale parameter $\epsilon\in (0,+\infty)$. An $\epsilon$-cover of $F$ is a coverage of $F$ with open balls of radius $\epsilon$ and centers in $E$. These centers form an $\epsilon$-net of $F$. If $F$ has an $\epsilon$-net of finite cardinality, then its covering number is the smallest cardinality of its $\epsilon$-nets: $$\mathcal{N}(\epsilon, F, \rho) = \min_{C \subset E} |C|,\quad s.t. \ \forall f\in F,\ \rho(f,C) < \epsilon ,$$ where the dependence of the covering number $\mathcal{N}(\epsilon, F, \rho)$ on $\epsilon$ and the metric $\rho$ appears explicitly in the notation and where the distance from an element $f$ of $F$ to the set of centers $C$ is defined as $$\rho(f, C) = \min_{c\in C} \rho( f,c).$$ If there is no finite $\epsilon$-net, then the covering number is defined to be infinite.

### Packing numbers

With the notations above, a finite set $P\subset F$ is said to be $\epsilon$-separated if all pairs of distinct elements of $P$ are $\epsilon$-separated, i.e., if $$\forall p\in P, \forall p^\prime \in P\setminus\{p\}, \quad \rho(p,p^\prime) \geq \epsilon.$$ The maximal size of $\epsilon$-separated sets in $F$ is the packing number of $F$ at scale $\epsilon$: $$\mathcal{M}(\epsilon, F, \rho) = \max |P|,\ \mbox{s.t. } P \mbox{ is } \epsilon\mbox{-separated}.$$

### Uniform covering and packing numbers

For a function class $F \subseteq \R^{\X}$ and a sequence of points $\g x_n = \{x_i\}_{i=1}^n \in \X^n$, we define the $\ell_{\infty}$-pseudo metric as $$\forall (f,f^\prime) \in F^2,\quad d_{\ell_{\infty}(\g x_n)}(f,f^\prime) = \max_{i\in\{1,\dots,n\}} |f(x_i) - f^\prime(x_i) |.$$ Such a metric depends on the sample $\g x_n$ and leads to sample-dependent covering and packing numbers. In this case, we also define the sample-independent uniform covering numbers $$\mathcal{N}_{\infty}(\epsilon, F, n) = \max_{\g x_n \in \X^n}\ \mathcal{N}(\epsilon, F, d_{\ell_{\infty}(\g x_n)})$$ which upper bound the covering numbers with respect to the $\ell_{\infty}$ pseudo-metric induced by any sample $\g x_n$, and, similarly, the uniform packing numbers $$\mathcal{M}_{\infty}(\epsilon, F, n) = \max_{\g x_n \in \X^n}\ \mathcal{M}(\epsilon, F, d_{\ell_{\infty}(\g x_n)}).$$ Similar definitions can be introduced for other $\ell_p$-pseudo metrics.

### Relationship between covering and packing numbers

We have the following relationship: $$\mathcal{M}(2 \epsilon, F, \rho) \leq \mathcal{N}(\epsilon, F, \rho) \leq \mathcal{M}(\epsilon, F, \rho) .$$

We prove that $\mathcal{M}(2 \epsilon, F, \rho) \leq \mathcal{N}(\epsilon, F, \rho)$ as follows. For $\mathcal{M}(2 \epsilon, F, \rho)=1$ this is obvious. Otherwise, let $P$ be the $2\epsilon$-separated set of maximal cardinality $\mathcal{M}(2 \epsilon, F, \rho)\geq 2$ and $\mathcal{C}$ be the $\epsilon$-net with minimal cardinality $\mathcal{N}(\epsilon, F, \rho)$. Note that any $\epsilon$-cover of $F$ is also an $\epsilon$-cover of $P\subset F$ and thus, given a point $p\in P$, there is a center $c\in\mathcal{C}$ for which $$\rho(p,c) < \epsilon.$$ On the other hand, the triangle inequality gives \begin{align} &\rho(p,p^\prime) \leq \rho(p, c) + \rho(c, p^\prime) \\ \Rightarrow\ & \rho(c, p^\prime) \geq \rho(p,p^\prime) - \rho(p, c) \end{align} Since $P$ is $2\epsilon$-separated, for any $p\in P$ and $p^\prime \in P$ such that $p^\prime \neq p$, this yields $$\rho(c, p^\prime) \geq 2\epsilon - \rho(p, c) > \epsilon.$$ Therefore, any $\epsilon$-net of $P$ must contain at least one center per point in $P$, hence $|\mathcal{C}| \geq |P|$ and we obtain the desired result.

Let $P\subseteq F$ be an $\epsilon$-separated set. If $P$ is not an $\epsilon$-net for $F$, then there is some $p \in F$ for which $\forall p\in P$, $\rho(p,p^\prime) \geq \epsilon$. Thus, $P\cup\{p^\prime\}$ is also an $\epsilon$-separated set and we conclude that any $\epsilon$-separated set of maximal cardinality $\mathcal{M}(\epsilon, F, \rho)$ must also be an $\epsilon$-net for $F$. This implies that the minimal cardinality $\mathcal{N}(\epsilon, F, \rho)$ of an $\epsilon$-net of $F$ is at least $\mathcal{M}(\epsilon, F, \rho)$.

The relationship also holds for the uniform covering and packing numbers: for an $\ell_p$ pseudo-metric $d_{\ell_p(\g x_n)}$, we have $$\mathcal{M}_p(2 \epsilon, F, n) \leq \mathcal{N}_p(\epsilon, F, n) \leq \mathcal{M}_p(\epsilon, F, n) .$$

Let $\hat{\g x}_n \in \arg\max_{\g x_n \in \X^n } \mathcal{N}(\epsilon, F, d_{\ell_{p}(\g x_n)})$. Then, the classical relationship for $\rho = d_{\ell_{p}(\hat{\g x}_n)}$ leads to $$\mathcal{N}_{p}(\epsilon, F, n) = \mathcal{N}(\epsilon, F, d_{\ell_{p}(\hat{\g x}_n)}) \leq \mathcal{M}(\epsilon, F, d_{\ell_{p}(\hat{\g x}_n)}) \leq \mathcal{M}_p(\epsilon, F, n)$$ Similarly, with $\hat{\g x}_n \in \arg\max_{\g x_n \in \X^n } \mathcal{M}(2\epsilon, F, d_{\ell_{p}(\g x_n)})$, we have $$\mathcal{M}_p(2\epsilon, F, n) = \mathcal{M}(2\epsilon, F, d_{\ell_{p}(\hat{\g x}_n)}) \leq \mathcal{N}(\epsilon, F, d_{\ell_{p}(\hat{\g x}_n)}) \leq \mathcal{N}_{p}(\epsilon, F, n).$$