## In words...

Quadratic programming problems are optimization problems that can be seen as an extension of linear programming in which the objective function is quadratic while the constraints remain linear.

## In maths...

### General form of a quadratic program

A quadratic program is an optimization problem of the form \begin{align*} \min_{\g x\in\R^d} \ & \frac{1}{2} \g x^T \g Q \g x + \g c^T \g x \\ \mbox{s.t. } & \g A\g x \leq \g b \\ & \g A_{eq} \g x = \g b_{eq} \\ &\g l \leq \g x\leq \g u \end{align*} If the matrix $\g Q$ is positive semi-definite, then this problem is convex.
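In practice, such problems can be handed to an off-the-shelf QP solver. Here is a minimal sketch using the cvxopt package (an assumption: it must be installed; the problem data is arbitrary). Since cvxopt's `solvers.qp` has no separate bounds argument, the box constraints $\g l \leq \g x \leq \g u$ are folded into the inequality constraints:

```python
import numpy as np
from cvxopt import matrix, solvers

# Arbitrary example data:
#   min 1/2 x'Qx + c'x   s.t.   x1 + x2 <= 1,   0 <= x <= 1
Q = np.array([[2.0, 0.5],
              [0.5, 1.0]])    # positive definite, so the QP is convex
c = np.array([-1.0, -1.0])

# Fold the box constraints 0 <= x <= 1 into G x <= h
G = np.vstack([[1.0, 1.0], -np.eye(2), np.eye(2)])
h = np.hstack([1.0, np.zeros(2), np.ones(2)])

solvers.options["show_progress"] = False
sol = solvers.qp(matrix(Q), matrix(c), matrix(G), matrix(h))
print(np.array(sol["x"]).ravel())
```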

### Solving a quadratic program: the Frank-Wolfe algorithm

Here, we describe one of the many algorithms for solving a quadratic program.

The Frank-Wolfe algorithm is an iterative first-order method, i.e., it uses only information about the first-order derivatives (the gradient). It heavily relies on linear programming to find a suitable direction of descent at each iteration.

The basic idea of the Frank-Wolfe algorithm is to minimize, at the $k$th iteration, a linear approximation of the objective function given by its first-order Taylor expansion computed at $\g x^k$: $$J(\g x) = \frac{1}{2} \g x^T \g Q \g x + \g c^T \g x \approx J(\g x^k) + \nabla J(\g x^k)^T (\g x - \g x^k) ,$$ with $$\nabla J(\g x^k) = \g Q \g x^k + \g c .$$
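In code, the objective and its gradient could be written as follows (a minimal sketch in Python with numpy; the function names are our own):

```python
import numpy as np

def J(x, Q, c):
    """Objective J(x) = 1/2 x'Qx + c'x."""
    return 0.5 * x @ Q @ x + c @ x

def grad_J(x, Q, c):
    """Gradient of J at x: Qx + c."""
    return Q @ x + c
```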

Step 1. Solve the linear program \begin{align*} \min_{\g y\in\R^d} \ & J(\g x^k) + \nabla J(\g x^k)^T (\g y - \g x^k) \\ \mbox{s.t. } & \g A\g y \leq \g b \\ & \g A_{eq} \g y = \g b_{eq} \\ &\g l \leq \g y\leq \g u \end{align*} which, after removing the constant terms, is equivalent to \begin{align*} \g y^* = \arg\min_{\g y\in\R^d} \ & \nabla J(\g x^k)^T \g y \\ \mbox{s.t. } & \g A\g y \leq \g b \\ & \g A_{eq} \g y = \g b_{eq} \\ &\g l \leq \g y\leq \g u \end{align*} Since we used the original constraints, which define a convex set $\mathcal{C}$, we have $\g x^k\in\mathcal{C}$ and $\g y^*\in\mathcal{C}$. By the convexity of $\mathcal{C}$, their convex combinations also belong to $\mathcal{C}$: for all $\gamma \in[0,1]$, $$(1-\gamma)\g x^k + \gamma \g y^* = \g x^k + \gamma (\g y^* - \g x^k) \in \mathcal{C} ,$$ which implies that $(\g y^* - \g x^k)$ is a feasible direction (in the following, we drop the star and simply write $\g y$).
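This step could be implemented as in the sketch below, which assumes `scipy.optimize.linprog` as the LP solver (any other LP solver would do; `linprog` handles the box constraints $\g l \leq \g y \leq \g u$ natively through its `bounds` argument):

```python
from scipy.optimize import linprog

def fw_direction(x_k, Q, c, A, b, A_eq, b_eq, bounds):
    """Step 1: minimize the linearized objective over the feasible set C.

    bounds is a list of (l_i, u_i) pairs; returns the minimizer y*."""
    grad = Q @ x_k + c        # gradient of J at x^k
    res = linprog(grad, A_ub=A, b_ub=b, A_eq=A_eq, b_eq=b_eq,
                  bounds=bounds, method="highs")
    return res.x
```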

Step 2. Compute the optimal step length $\gamma^*$ as $$\gamma^* = \arg\min_{\gamma \in[0,1]} \ J\left(\g x^k + \gamma (\g y - \g x^k)\right) .$$ To do this, we compute the derivative of the objective function with respect to $\gamma$: $$\drond{ J\left(\g x^k + \gamma (\g y - \g x^k)\right)}{\gamma} = (\g y - \g x^k)^T \nabla J\left(\g x^k + \gamma (\g y - \g x^k)\right) = (\g y - \g x^k)^T \left[ \g Q (\g x^k + \gamma (\g y - \g x^k)) + \g c\right] ,$$ which vanishes at the optimum, therefore leading to $$\gamma^*(\g y - \g x^k)^T \g Q (\g y - \g x^k) = -(\g y - \g x^k)^T \left[ \g Q \g x^k + \g c \right] = - (\g y - \g x^k)^T \nabla J(\g x^k)$$ and thus to $$\gamma^* = \frac{- (\g y - \g x^k)^T \nabla J(\g x^k)}{(\g y - \g x^k)^T \g Q (\g y - \g x^k)} .$$ Note that we left the constraint $\gamma\leq 1$ aside while computing the step length $\gamma^*$ (the constraint $\gamma\geq 0$ is satisfied automatically: the numerator is nonnegative since $\g y$ minimizes the linearized objective, and the denominator is nonnegative since $\g Q$ is positive semi-definite). Thus, we should make sure not to overshoot outside of the feasible set $\mathcal{C}$ by using the final step length $$\overline{\gamma} = \min \{1, \gamma^*\} .$$
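In code, this step could look as follows (a sketch; the degenerate case $(\g y - \g x^k)^T \g Q (\g y - \g x^k) = 0$, where the objective is linear along the direction, is handled by taking a full step):

```python
def fw_step_length(x_k, y, Q, c):
    """Step 2: exact line search along d = y - x^k, clipped to [0, 1]."""
    d = y - x_k
    denom = d @ Q @ d
    if denom <= 0.0:
        # J is linear (or degenerate) along d: the full step is optimal
        return 1.0
    gamma = -(d @ (Q @ x_k + c)) / denom
    return min(1.0, max(0.0, gamma))   # the max is only a numerical safeguard
```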

Step 3. Take the step with $$\g x^{k+1} = \g x^k + \overline{\gamma} (\g y - \g x^k)$$ and repeat from Step 1 until convergence, e.g., until the gap $- (\g y - \g x^k)^T \nabla J(\g x^k)$ becomes sufficiently small.
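Putting the three steps together, a self-contained sketch of the complete algorithm could look as follows (the stopping criterion, default parameters, and the small usage example at the end are our own choices, reusing the instance from the cvxopt sketch above):

```python
import numpy as np
from scipy.optimize import linprog

def frank_wolfe(Q, c, A, b, A_eq, b_eq, bounds, x0, max_iter=1000, tol=1e-6):
    """Frank-Wolfe algorithm for the QP in general form; x0 must be feasible."""
    x = np.asarray(x0, dtype=float)
    for _ in range(max_iter):
        grad = Q @ x + c
        # Step 1: linear subproblem over the original feasible set C
        res = linprog(grad, A_ub=A, b_ub=b, A_eq=A_eq, b_eq=b_eq,
                      bounds=bounds, method="highs")
        d = res.x - x
        gap = -(d @ grad)                 # Frank-Wolfe gap, always >= 0
        if gap < tol:                     # x is (nearly) optimal
            break
        # Step 2: exact line search, clipped to [0, 1]
        denom = d @ Q @ d
        gamma = 1.0 if denom <= 0.0 else min(1.0, gap / denom)
        # Step 3: take the step
        x = x + gamma * d
    return x

# Usage: min 1/2 x'Qx + c'x  s.t.  x1 + x2 <= 1 and 0 <= x <= 1,
# starting from the feasible point x0 = 0.
Q = np.array([[2.0, 0.5], [0.5, 1.0]])
c = np.array([-1.0, -1.0])
A = np.array([[1.0, 1.0]])
b = np.array([1.0])
print(frank_wolfe(Q, c, A, b, None, None, [(0.0, 1.0)] * 2, np.zeros(2)))
```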