The curse of dimensionality

In words...

The curse of dimensionality refers to the fact that the number of points needed to sample a space up to a given precision grows exponentially with the dimension of that space.

For instance, suppose we want to find, up to a given precision, the global minimizer of a nonconvex objective function by sampling, i.e., simply by computing the objective function value at many points and retaining the point with the minimal value. To ensure that we obtain the desired precision, we have to sample points such that the distance between neighboring points is less than this precision. In this case, the curse of dimensionality says that the required number of points increases quickly with the dimension, and thus that our strategy is intractable in large dimensions (where large actually means no more than 5 or 6...).
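Here is a minimal Python sketch of this sampling strategy, assuming a hand-picked nonconvex test function and a coarse precision of 0.25; both are hypothetical choices made only to keep the run short.

```python
# Sketch: minimizing a nonconvex function on [0,1]^d by exhaustive lattice sampling.
# The test function and the precision eps are assumptions chosen for this example.
import itertools
import numpy as np

def objective(x):
    # A simple nonconvex test function (assumption): quadratic plus a cosine term.
    return np.sum(x**2) + np.sum(np.cos(5 * x))

def grid_minimize(d, eps=0.25):
    """Sample [0,1]^d on a lattice with spacing eps and return the best point found."""
    axis = np.arange(0.0, 1.0 + 1e-12, eps)           # points along one dimension
    best_x, best_val = None, np.inf
    n_evals = 0
    for point in itertools.product(axis, repeat=d):   # len(axis)**d lattice points
        x = np.array(point)
        val = objective(x)
        n_evals += 1
        if val < best_val:
            best_x, best_val = x, val
    return best_x, best_val, n_evals

for d in range(1, 7):
    _, _, n_evals = grid_minimize(d)
    print(f"d = {d}: {n_evals} objective evaluations")
# The number of evaluations grows exponentially with d, even for this coarse precision.
```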

The curse of dimensionality typically occurs in machine learning when we learn nonlinear models as linear models in an extended feature space, as in polynomial regression for instance. In this case, we need a training sample of overwhelming size to sample the feature space sufficiently well.
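As a rough illustration, the snippet below counts the number of polynomial features (monomials of degree at most $k$) generated from $d$ input variables; the degree used here is an assumed value for the example.

```python
# Number of monomials of degree at most k in d variables: C(d + k, k).
from math import comb

degree = 3  # assumed polynomial degree
for d in (2, 5, 10, 20, 50, 100):
    n_features = comb(d + degree, degree)
    print(f"{d} input variables -> {n_features} polynomial features of degree <= {degree}")
# Sampling this extended feature space densely would require a training set whose
# size grows accordingly.
```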

In pictures...

Getting a visual feeling about the curse of dimensionality




Say we want to cover an entire line segment (a 1D-shape) with a set of points such that there is a distance of at most a given fraction of the segment length between two consecutive points (thus making sure that no big holes are left unsampled).

What is the minimal number of points we would need?

Now, we want to cover a square (a 2D-shape) instead of a line, again such that there is a distance of at most the same fraction of the square width between two neighboring points.

What is the minimal number of points we would need?

Finally, let's increase the dimension once more and do this for a cube (a 3D-shape) instead of a square, with a distance of at most the same fraction of the cube width between two points.

What is the minimal number of points we would need?

Number of points versus the dimension

In dimension $d$, we need to cover a $d$-dimensional hypercube with a distance of at most a given fraction of its width between two points.

Hover over the plot to see how fast $N$ increases.

In maths...

Let $\epsilon \in(0,1)$ be the precision with which we want to sample the unit hypercube $[0,1]^d$ in dimension $d$. Then, we need a lattice with a number of points equal to $$ N = \left(\frac{1}{\epsilon}\right)^d , $$ which grows exponentially with $d$.
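A quick numerical check of this formula, for an assumed precision $\epsilon = 0.1$:

```python
# N = (1/eps)**d for an assumed precision eps = 0.1.
eps = 0.1
for d in (1, 2, 3, 5, 10, 20):
    N = (1 / eps) ** d
    print(f"d = {d:2d}: N = {N:.3e} lattice points")
# Already for d = 20 and a modest precision of 0.1, N = 1e20 points.
```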

Things are even worse than that...

Let's reformulate the problem as finding a sampling $\{\g x_i\}_{i=1}^N$ of the unit hypercube such that, for any point in the hypercube, the distance to the closest sample point is less than $\epsilon$: $$ \forall \g x\in[0,1]^d,\quad \min_{i=1,\dots,N}\ \|\g x - \g x_i\|_2 \leq \epsilon $$ (such a sampling is known as an $\epsilon$-cover of the hypercube).

In 1D, the maximum distance to a sample point of our lattice is $\epsilon/2$, attained at the midpoint of two consecutive points. So everything looks fine, just as for the 2D-square, where the maximum is attained at the center of a small square delimited by 4 sample points, at a distance of $$ \epsilon \frac{\sqrt{2}}{2} < \epsilon . $$ This already gives us a feeling about the influence of the dimension. In fact, for a $d$-dimensional unit hypercube, the distance from the center of a small hypercube of width $\epsilon$ to its nearest sample points (its corners) is $$ \epsilon \frac{\sqrt{d}}{2} , $$ which is larger than $\epsilon$ for any $d> 4$.
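A short computation of this worst-case distance $\epsilon \sqrt{d}/2$ for a few dimensions, again with an assumed precision $\epsilon = 0.1$:

```python
# Distance from the center of a cell of width eps to its nearest corner: eps * sqrt(d) / 2.
from math import sqrt

eps = 0.1
for d in (1, 2, 4, 5, 10, 100):
    worst_case = eps * sqrt(d) / 2
    print(f"d = {d:3d}: worst-case distance = {worst_case:.4f} "
          f"({'<= eps' if worst_case <= eps else '> eps'})")
# Beyond d = 4, the center of a cell is farther than eps from every lattice point.
```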

But we can recover the desired property with a lattice of spacing $\epsilon_d = 2 \epsilon / \sqrt{d}$, leading to $$ N = \left(\frac{1}{\epsilon_d}\right)^d = \left(\frac{\sqrt{d}}{2\epsilon}\right)^d . $$
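A small sketch comparing the naive count $(1/\epsilon)^d$ with this corrected count, for an assumed $\epsilon = 0.1$:

```python
# Naive lattice count (spacing eps) versus the count that guarantees an eps-cover
# (spacing eps_d = 2 * eps / sqrt(d)), for an assumed precision eps = 0.1.
from math import sqrt

eps = 0.1
for d in (1, 2, 5, 10, 20):
    naive = (1 / eps) ** d                    # spacing eps
    corrected = (sqrt(d) / (2 * eps)) ** d    # spacing eps_d = 2 * eps / sqrt(d)
    print(f"d = {d:2d}: naive N = {naive:.3e}, eps-cover N = {corrected:.3e}")
# The corrected count grows even faster, since sqrt(d)/(2*eps) > 1/eps as soon as d > 4.
```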