Separating hyperplane

In words...

A separating hyperplane is a flat surface that divides the space into two half-spaces. It can therefore serve as the boundary between two classes in a binary classification problem. In such problems, a hyperplane corresponds to a linear classifier, and every linear classifier can be associated with a hyperplane that yields the same classification.

In pictures...

Nice figure with projection of dot product...

In maths...

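A hyperplane is a surface (a set of points) defined as $$ H=\{ \g x \in \X \ :\ \inner{\g w}{\g x} + b = 0 \} $$ where $\g w \neq \g 0$ is a vector normal (orthogonal) to $H$ and the real number $b$ is the offset.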
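As a minimal sketch of how this definition yields a linear classifier, the code below labels a point by the sign of $\inner{\g w}{\g x} + b$. The function names `dot` and `classify`, the $\pm 1$ label convention, and the example values of `w` and `b` are assumptions made for illustration only.

```python
def dot(u, v):
    """Standard inner product of two equal-length sequences."""
    return sum(ui * vi for ui, vi in zip(u, v))


def classify(w, b, x):
    """Return +1 or -1 according to the side of H on which x lies, 0 if x is on H.

    The sign convention is an assumption of this sketch.
    """
    s = dot(w, x) + b
    return (s > 0) - (s < 0)


# Hypothetical hyperplane in the plane: x_1 + x_2 - 1 = 0
w, b = [1.0, 1.0], -1.0
points = [[2.0, 2.0], [0.0, 0.0], [0.5, 0.5]]
labels = [classify(w, b, x) for x in points]
```

With these values, the first point falls in the positive half-space, the second in the negative one, and the third lies exactly on $H$.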

The distance to the origin (the offset)

Consider the point $\g x_0$ as the (orthogonal) projection of the origin $\g 0$ onto $H$, such that $dist(\g 0, H) = \|\g x_0\|$. Then, the vector $(\g x_0 - \g 0) = \g x_0$ is orthogonal to $H$ and thus collinear with $\g w$. This implies that $$ \inner{\g w}{\g x_0} = \pm \|\g w\| \|\g x_0\|. $$ In addition, the equation of the hyperplane implies, for all points on the hyperplane and in particular for $\g x_0$, $$ \inner{\g w}{\g x_0} = -b \quad \Rightarrow |\inner{\g w}{\g x_0}| = |b| . $$ Therefore, $$ dist(\g 0, H) = \|\g x_0\| = \frac{|\inner{\g w}{\g x_0}|}{\|\g w\|} = \frac{|b|}{\|\g w\|} $$
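The result can be checked numerically. The sketch below assumes the closed form $\g x_0 = -b\, \g w / \|\g w\|^2$ for the projection of the origin, which follows from the two conditions used above ($\g x_0$ collinear with $\g w$ and $\inner{\g w}{\g x_0} = -b$); the function names and example values are hypothetical.

```python
import math


def dot(u, v):
    """Standard inner product of two equal-length sequences."""
    return sum(ui * vi for ui, vi in zip(u, v))


def norm(u):
    """Euclidean norm."""
    return math.sqrt(dot(u, u))


def origin_projection(w, b):
    """Projection x_0 of the origin onto H = {x : <w, x> + b = 0}."""
    scale = -b / dot(w, w)
    return [scale * wi for wi in w]


def origin_distance(w, b):
    """Distance from the origin to H, i.e. |b| / ||w||."""
    return abs(b) / norm(w)


# Hypothetical example: 3 x_1 + 4 x_2 - 10 = 0, so ||w|| = 5
w, b = [3.0, 4.0], -10.0
x0 = origin_projection(w, b)
d0 = origin_distance(w, b)
```

Here `x0` lies on $H$ and its norm matches `d0`, up to floating-point rounding.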

The distance from a point to the hyperplane

The same reasoning yields the distance from any $\g x\notin H$ to the hyperplane.
Let $\g x_P$ be the (orthogonal) projection of $\g x$ onto $H$, so that $dist(\g x, H) = \|\g x-\g x_P\|$ and $(\g x-\g x_P)$ is collinear with $\g w$. Thus, $$ |\inner{\g w}{\g x-\g x_P}| = \|\g w\|\|\g x-\g x_P\| $$ while $\g x_P$ belongs to $H$ and thus satisfies $\inner{\g w}{\g x_P} = -b$. Therefore, $$ dist(\g x, H) = \|\g x-\g x_P\| = \frac{|\inner{\g w}{\g x-\g x_P}|}{\|\g w\|} = \frac{|\inner{\g w}{\g x}-\inner{\g w}{\g x_P}|}{\|\g w\|} = \frac{|\inner{\g w}{\g x} + b|}{\|\g w\|}. $$ In addition, one can easily check that the formula also holds for $\g x\in H$, in which case $\inner{\g w}{\g x} + b=0$ and the distance is zero.
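A short sketch of the distance formula, together with a numerical check against the projection itself. The expression $\g x_P = \g x - \frac{\inner{\g w}{\g x} + b}{\|\g w\|^2}\, \g w$ used for the projection is an assumption of this sketch, derived from the collinearity of $(\g x - \g x_P)$ with $\g w$ and from $\inner{\g w}{\g x_P} = -b$; the function names and example values are hypothetical.

```python
import math


def dot(u, v):
    """Standard inner product of two equal-length sequences."""
    return sum(ui * vi for ui, vi in zip(u, v))


def distance_to_hyperplane(w, b, x):
    """Distance from x to H = {x : <w, x> + b = 0}, i.e. |<w, x> + b| / ||w||."""
    return abs(dot(w, x) + b) / math.sqrt(dot(w, w))


def project_onto_hyperplane(w, b, x):
    """Orthogonal projection x_P of x onto H."""
    t = (dot(w, x) + b) / dot(w, w)
    return [xi - t * wi for xi, wi in zip(x, w)]


# Hypothetical example: 3 x_1 + 4 x_2 - 10 = 0 and the point x = (3, 4)
w, b = [3.0, 4.0], -10.0
x = [3.0, 4.0]
d = distance_to_hyperplane(w, b, x)
x_p = project_onto_hyperplane(w, b, x)

# Length of (x - x_P), which should agree with d
gap = math.sqrt(sum((xi - pi) ** 2 for xi, pi in zip(x, x_p)))
```

Up to rounding, `gap` equals `d`, and `x_p` satisfies the hyperplane equation.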