One-versus-one decomposition method

In words...

The one-versus-one decomposition method transforms a multi-class classification problem into a set of binary classification problems.

In the one-versus-one strategy, one builds a binary classifier for each pair of classes which it aims at discriminating. Thus, the number of binary problems in the decomposition is larger than in the one-versus-all method, but each problem involves less training data as only the examples of two classes are considered for each binary problem.

Any binary classification algorithm can be used to train the binary classifiers.

In the prediction phase, the predicted multi-class label of a new input pattern is determined using a majority voting scheme, where each binary classifier votes for one class among two.

The one-versus-one approach requires a number of binary classifiers that grows quadratically with the number of classes. When this number is large, an alternative method based on a directed acyclic graph can be used to gain speed during the prediction phase,

In pictures...

Step-by-step illustration of the one-versus-one approach to multi-class problems.

In maths...

The one-versus-all decomposition method transforms a multi-class classification problem with $Q=|\Y|$ categories into a set of $Q(Q-1)/2$ binary classification problems corresponding to all pairs of classes.

Let $f_{jk}$ be the binary classifier discriminating between class $j$ and class $k$. These classifiers are trained from training sets $D_{jk}$ built as $$ D_{jk} = \{(\g x_i, \I{y_i = j} - \I{y_i = k} )\}_{i : y_i = j \vee y_i = k} $$ from the original training set $\{(\g x_i, y_i ) \in\X\times \Y\}_{i=1}^N$.

Then, predictions $f(\g x)$ are computed by $$ f(\g x) = \arg\max_{j\in\{1,\dots,Q\}}\sum_{k = 1}^{j-1} \I{f_{kj}(\g x) = -1} + \sum_{k = j+1}^Q \I{f_{jk}(\g x) = 1}. $$

An alternative (more compact) formulation considers binary classifiers $f_{jk}$ with outputs in $\{j,k\}$, thus trained on $$ D_{jk}^\prime = \{(\g x_i, y_i )\}_{i : y_i = j \vee y_i = k}, $$ and resulting in multi-class predictions $$ f(\g x) = \arg\max_{y\in\{1,\dots,Q\}} \sum_{1\leq j < k \leq Q } \I{f_{jk}(\g x) = y}. $$