One-versus-all decomposition method

In words...

The one-versus-all decomposition method transforms a multi-class classification problem into a set of binary classification problems.

In the one-versus-all strategy, one builds a binary classifier for each class which it aims at separating from all the others. Thus, the number of binary problems in the decomposition equals the number of categories in the multi-class problem.

The binary classifiers are trained from modified training sets, where the labels are reduced to two different values: one for the class to which the binary classifier is assigned and another one for all other classes.

Any binary classification algorithm can be used to train the binary classifiers, with however the requirement that the algorithm should also output a score (a real value) when making a prediction. Classifiers that fulfill this requirement are for instance the perceptron or support vector machines...

In the prediction phase, the predicted multi-class label of a new input pattern is the one corresponding to the class of the binary classifier with maximal score.

In pictures...

Step-by-step illustration of the one-versus-all approach to multi-class problems.

In maths...

The one-versus-all decomposition method transforms a multi-class classification problem with $Q=|\Y|$ categories into a set of $Q$ binary classification problems.

In this method, the $k$th binary classifier is trained from a training set built as $$ \{(\g x_i, 2\I{y_i = k} -1 )\}_{i=1}^N $$ from the original training set $\{(\g x_i, y_i ) \in\X\times \Y\}_{i=1}^N$, i.e., with binary labels set to $+1$ for points from the $k$th category and $-1$ for the others.

Then, predictions $f(\g x)$ are computed based on scores $g_k(\g x) \in \R$ returned by the binary classifiers $f_k(\g x) = \sign(g_k (\g x) )$, $k=1,\dots,Q$: $$ f(\g x) = \arg\max_{k\in\{1,\dots,Q\}} g_k(\g x). $$ This implies that binary classification algorithms eligible for this approach must return a real-valued score and not only a class label.

Pairwise class boundaries

The $Q(Q-1)/2$ pairwise classifiers $f_{jk}$, $1\leq j < k \leq Q$, discriminating between two classes $j$ (labeled +1) and $k$ (labeled -1), can be expressed using the $Q$ score functions $g_k, k=1\dots, Q$, as $$ f_{jk}( \g x ) = \sign( g_j(\g x) - g_k(\g x) ) . $$ This also yields the pairwise class boundaries as the surfaces $S_{jk}$ defined by $$ S_{jk} = \{\g x\ :\ g_j(\g x) - g_k(\g x) = 0 \} . $$