Quadratic programming

In words...

Quadratic programming problems are optimization problems that can be seen as an extension of linear programming in which the objective function is quadratic while the constraints remain linear.

In pictures...

In maths...

General form of a quadratic program

A quadratic program is an optimization problem of the form \begin{align*} \min_{\g x\in\R^d} \ & \frac{1}{2} \g x^T \g Q \g x + \g c^T \g x \\ \mbox{s.t. } & \g A\g x \leq \g b \\ & \g A_{eq} \g x = \g b_{eq} \\ &\g l \leq \g x\leq \g u \end{align*} If the matrix $\g Q$ is positive semi-definite, then this problem is convex.
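As a concrete illustration, here is a minimal numerical sketch of this general form. It assumes Python with numpy (the page itself does not prescribe any language) and uses a small two-dimensional instance with made-up data; the convexity condition on $\g Q$ is checked through its eigenvalues.

```python
import numpy as np

# A small quadratic program in the general form above:
#   min_x 0.5 x^T Q x + c^T x   s.t.  A x <= b,  l <= x <= u
# (no equality constraints in this toy instance)
Q = np.array([[2.0, 0.5],
              [0.5, 1.0]])      # symmetric matrix of the quadratic term
c = np.array([-1.0, -1.0])
A = np.array([[1.0, 1.0]])      # single inequality: x1 + x2 <= 1
b = np.array([1.0])
l = np.zeros(2)                 # lower bounds
u = np.ones(2)                  # upper bounds

def J(x):
    """Objective value 0.5 x^T Q x + c^T x."""
    return 0.5 * x @ Q @ x + c @ x

# The problem is convex iff Q is positive semi-definite,
# i.e., all eigenvalues of the symmetric matrix Q are >= 0.
print("convex:", np.all(np.linalg.eigvalsh(Q) >= 0))
```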

Solving a quadratic program: the Frank-Wolfe algorithm

We describe here one of the many algorithms for solving a quadratic program.

The Frank-Wolfe algorithm is an iterative first-order method, i.e., it uses only information about the first-order derivatives (the gradient). It heavily relies on linear programming to find a suitable direction of descent at each iteration.

The basic idea of the Frank-Wolfe algorithm is to minimize, at the $k$th iteration, a linear approximation of the objective function given by its first-order Taylor expansion computed at $\g x^k$: $$ J(\g x) = \frac{1}{2} \g x^T \g Q \g x + \g c^T \g x \approx J(\g x^k) + \nabla J(\g x^k)^T (\g x - \g x^k) , $$ with $$ \nabla J(\g x^k) = \g Q \g x^k + \g c . $$
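The gradient formula is easy to verify numerically. The sketch below (again assuming numpy and reusing the small made-up instance from above) compares $\nabla J(\g x^k) = \g Q \g x^k + \g c$ with central finite differences.

```python
import numpy as np

Q = np.array([[2.0, 0.5],
              [0.5, 1.0]])
c = np.array([-1.0, -1.0])

def J(x):
    return 0.5 * x @ Q @ x + c @ x

def grad_J(x):
    # Gradient of the quadratic objective: Q x + c (Q symmetric)
    return Q @ x + c

# Finite-difference check of the gradient at some point x^k
xk = np.array([0.3, 0.2])
eps = 1e-6
fd = np.array([(J(xk + eps * e) - J(xk - eps * e)) / (2 * eps)
               for e in np.eye(2)])
print(np.allclose(fd, grad_J(xk)))   # True up to numerical error
```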

Step 1. Solve the linear program \begin{align*} \min_{\g y\in\R^d} \ & J(\g x^k) + \nabla J(\g x^k)^T (\g y - \g x^k) \\ \mbox{s.t. } & \g A\g y \leq \g b \\ & \g A_{eq} \g y = \g b_{eq} \\ &\g l \leq \g y\leq \g u \end{align*} which, after removing the constant terms, is equivalent to \begin{align*} \g y^* = \arg\min_{\g y\in\R^d} \ & \nabla J(\g x^k)^T \g y \\ \mbox{s.t. } & \g A\g y \leq \g b \\ & \g A_{eq} \g y = \g b_{eq} \\ &\g l \leq \g y\leq \g u \end{align*} Since we used the original constraints, which define a convex set $\mathcal{C}$, we have $\g x^k\in\mathcal{C}$ and $\g y^*\in\mathcal{C}$. By the convexity of $\mathcal{C}$, any convex combination of these two points also belongs to $\mathcal{C}$: for $\gamma \in[0,1]$, $$ (1-\gamma)\g x^k + \gamma \g y^* = \g x^k + \gamma (\g y^* - \g x^k) \in \mathcal{C} , $$ which implies that $(\g y^* - \g x^k)$ is a feasible direction.
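Assuming scipy is available, this linear subproblem can be handed to a generic LP solver such as scipy.optimize.linprog, as in the sketch below; the data is the same made-up toy instance as above and the starting point $\g x^k = \g 0$ is an arbitrary feasible choice.

```python
import numpy as np
from scipy.optimize import linprog

# Toy data in the same form as above; no equality constraints here.
Q = np.array([[2.0, 0.5],
              [0.5, 1.0]])
c = np.array([-1.0, -1.0])
A = np.array([[1.0, 1.0]])
b = np.array([1.0])
bounds = [(0.0, 1.0), (0.0, 1.0)]   # l <= y <= u, componentwise

xk = np.array([0.0, 0.0])           # current (feasible) iterate
grad = Q @ xk + c                    # nabla J(x^k)

# Step 1: minimize the linearized objective grad^T y
# over the original feasible set -- a plain linear program.
res = linprog(grad, A_ub=A, b_ub=b, bounds=bounds, method="highs")
y_star = res.x
print("y* =", y_star)
```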

Step 2. Compute the optimal step length $\gamma^*$ as $$ \gamma^* = \arg\min_{\gamma \in[0,1]} \ J\left(\g x^k + \gamma (\g y^* - \g x^k)\right) . $$ To do this, we compute the derivative of the objective function with respect to $\gamma$: $$ \drond{ J\left(\g x^k + \gamma (\g y^* - \g x^k)\right)}{\gamma} = (\g y^* - \g x^k)^T \nabla J\left(\g x^k + \gamma (\g y^* - \g x^k)\right) = (\g y^* - \g x^k)^T \left[ \g Q (\g x^k + \gamma (\g y^* - \g x^k)) + \g c\right] $$ which vanishes at the optimum, therefore leading to $$ \gamma^*(\g y^* - \g x^k)^T \g Q (\g y^* - \g x^k) = -(\g y^* - \g x^k)^T \left[ \g Q \g x^k + \g c \right] = - (\g y^* - \g x^k)^T \nabla J(\g x^k) $$ and thus to $$ \gamma^* = \frac{- (\g y^* - \g x^k)^T \nabla J(\g x^k)}{(\g y^* - \g x^k)^T \g Q (\g y^* - \g x^k)} . $$ Note that we left the constraint $\gamma\leq 1$ aside while computing the step length $\gamma^*$. Thus, we should make sure not to overshoot outside of the feasible set $\mathcal{C}$ by using the final step length $$ \overline{\gamma} = \min \{1, \gamma^*\} . $$
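A sketch of this line-search computation follows, with the clip to $[0,1]$ and a guard against a zero denominator (which occurs when $\g y^* = \g x^k$, i.e., at a solution). The value of y_star below is a placeholder standing in for the output of Step 1; numpy is again an assumption.

```python
import numpy as np

Q = np.array([[2.0, 0.5],
              [0.5, 1.0]])
c = np.array([-1.0, -1.0])

xk = np.array([0.0, 0.0])          # current iterate
y_star = np.array([1.0, 0.0])      # LP solution from Step 1 (placeholder)

d = y_star - xk                    # feasible direction
grad = Q @ xk + c                  # nabla J(x^k)
denom = d @ Q @ d

# Exact line search for the quadratic objective, then clip to [0, 1]
# (fall back to a full step if the denominator vanishes).
gamma_star = -(d @ grad) / denom if denom > 0 else 1.0
gamma_bar = min(1.0, gamma_star)
print("step length:", gamma_bar)
```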

Step 3. Take the step with $$ \g x^{k+1} = \g x^k + \overline{\gamma} (\g y^* - \g x^k) $$ and repeat from Step 1 until convergence.
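Putting the three steps together gives the sketch below of the full iteration. It assumes numpy and scipy.optimize.linprog; the function name frank_wolfe_qp, the tolerance, and the stopping test based on the gap $-\nabla J(\g x^k)^T(\g y^* - \g x^k)$ (which is nonnegative and vanishes at a solution) are choices made here for illustration, not prescribed by the text above.

```python
import numpy as np
from scipy.optimize import linprog

def frank_wolfe_qp(Q, c, A=None, b=None, A_eq=None, b_eq=None,
                   bounds=None, x0=None, max_iter=100, tol=1e-8):
    """Sketch of the Frank-Wolfe iteration for the quadratic program above."""
    x = np.array(x0, dtype=float)
    for _ in range(max_iter):
        grad = Q @ x + c
        # Step 1: linear subproblem over the original feasible set.
        res = linprog(grad, A_ub=A, b_ub=b, A_eq=A_eq, b_eq=b_eq,
                      bounds=bounds, method="highs")
        y = res.x
        d = y - x
        # Stopping test: the gap -grad^T d is >= 0 and vanishes at a solution.
        if -grad @ d < tol:
            break
        # Step 2: exact line search, clipped to [0, 1].
        denom = d @ Q @ d
        gamma = min(1.0, -(d @ grad) / denom) if denom > 0 else 1.0
        # Step 3: take the step.
        x = x + gamma * d
    return x

# Toy instance: min 0.5 x^T Q x + c^T x  s.t.  x1 + x2 <= 1, 0 <= x <= 1
Q = np.array([[2.0, 0.5], [0.5, 1.0]])
c = np.array([-1.0, -1.0])
A = np.array([[1.0, 1.0]])
b = np.array([1.0])
bounds = [(0.0, 1.0), (0.0, 1.0)]
print(frank_wolfe_qp(Q, c, A=A, b=b, bounds=bounds, x0=[0.0, 0.0]))
```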