Separating hyperplane
In words...
A separating hyperplane is a flat surface that divides the space into two half-spaces. It can therefore serve as the boundary between two classes in a binary classification problem. For these problems, a hyperplane corresponds to a linear classifier, and every linear classifier can be associated with a hyperplane yielding the same classification.
In pictures...
Nice figure with projection of dot product...
In maths...
A hyperplane is a surface (a set of points) defined, for a nonzero normal vector $\g w$ and a real offset $b$, as
$$
H=\{ \g x \in \X \ :\ \inner{\g w}{\g x} + b = 0 \}
$$
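As a minimal sketch of how this definition relates to binary classification, the sign of $\inner{\g w}{\g x} + b$ tells on which side of $H$ a point lies. The function names `inner` and `side` and the numerical values below are hypothetical, chosen only for illustration, and the standard dot product is assumed for the inner product.

```python
def inner(w, x):
    """Standard dot product of two equal-length sequences (assumed inner product)."""
    return sum(wi * xi for wi, xi in zip(w, x))


def side(w, b, x):
    """Return +1 or -1 for the half-space containing x, and 0 if x lies on H."""
    value = inner(w, x) + b
    return (value > 0) - (value < 0)


# Hypothetical example: w = (1, 2), b = -4 in the plane.
w, b = [1.0, 2.0], -4.0
print(side(w, b, [3.0, 2.0]))  # 1*3 + 2*2 - 4 = 3 > 0  -> prints 1
print(side(w, b, [0.0, 0.0]))  # -4 < 0                 -> prints -1
print(side(w, b, [2.0, 1.0]))  # 1*2 + 2*1 - 4 = 0      -> prints 0 (on H)
```

A classifier would typically map the value 0 to one of the two classes; which one is a convention and is left open here.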
The distance to the origin (the offset)
Let $\g x_0$ be the (orthogonal) projection of the origin $\g 0$ onto $H$, so that $dist(\g 0, H) = \|\g x_0\|$. Then, the vector $(\g x_0 - \g 0) = \g x_0$ is orthogonal to $H$ and thus collinear with $\g w$. This implies that
$$
\inner{\g w}{\g x_0} = \pm \|\g w\| \|\g x_0\|.
$$
In addition, the equation of the hyperplane holds for all points on the hyperplane and in particular for $\g x_0$, which gives
$$
\inner{\g w}{\g x_0} = -b \quad \Rightarrow |\inner{\g w}{\g x_0}| = |b| .
$$
Therefore,
$$
dist(\g 0, H) = \|\g x_0\| = \frac{|\inner{\g w}{\g x_0}|}{\|\g w\|} = \frac{|b|}{\|\g w\|}.
$$
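The result can be checked numerically with a small sketch. Since $\g x_0$ is collinear with $\g w$ and satisfies $\inner{\g w}{\g x_0} = -b$, it can be written as $\g x_0 = -b\,\g w / \|\g w\|^2$; this closed form is a consequence of the two relations above rather than something stated in the text, and the values of `w` and `b` are hypothetical. The standard dot product is assumed.

```python
import math


def inner(w, x):
    """Standard dot product of two equal-length sequences (assumed inner product)."""
    return sum(wi * xi for wi, xi in zip(w, x))


def norm(x):
    """Euclidean norm induced by the dot product."""
    return math.sqrt(inner(x, x))


# Hypothetical example: w = (3, 4), b = -10, so ||w|| = 5.
w, b = [3.0, 4.0], -10.0

# Distance from the origin to H using the formula |b| / ||w||.
distance = abs(b) / norm(w)
print(distance)  # prints 2.0

# Projection of the origin onto H: x0 = -b * w / ||w||^2.
x0 = [-b * wi / inner(w, w) for wi in w]

# x0 lies on H and its norm matches the distance formula.
print(math.isclose(inner(w, x0) + b, 0.0, abs_tol=1e-12))  # prints True
print(math.isclose(norm(x0), distance))                    # prints True
```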
The distance from a point to the hyperplane
The same reasoning yields the distance from any $\g x\notin H$ to the hyperplane.
Let $\g x_P$ be the (orthogonal) projection of $\g x$ onto $H$. Then $dist(\g x, H) = \|\g x-\g x_P\|$ and $(\g x-\g x_P)$ is collinear with $\g w$. Thus,
$$
|\inner{\g w}{\g x-\g x_P}| = \|\g w\|\|\g x-\g x_P\|
$$
while $\g x_P$ belongs to $H$ and thus satisfies $\inner{\g w}{\g x_P} = -b$. Therefore,
$$
dist(\g x, H) = \|\g x-\g x_P\| = \frac{|\inner{\g w}{\g x-\g x_P}|}{\|\g w\|} = \frac{|\inner{\g w}{\g x}-\inner{\g w}{\g x_P}|}{\|\g w\|} = \frac{|\inner{\g w}{\g x} + b|}{\|\g w\|}.
$$
In addition, one can easily check that the formula also holds for $\g x\in H$, in which case $\inner{\g w}{\g x} + b=0$ and the distance is zero.
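The distance formula can be illustrated with a short sketch that also computes the projection $\g x_P$. Because $(\g x-\g x_P)$ is collinear with $\g w$ and $\inner{\g w}{\g x_P} = -b$, the projection can be written as $\g x_P = \g x - \frac{\inner{\g w}{\g x} + b}{\|\g w\|^2}\,\g w$; this expression is derived from those two relations and is not given in the text. The function names and numerical values are hypothetical, and the standard dot product is assumed.

```python
import math


def inner(w, x):
    """Standard dot product of two equal-length sequences (assumed inner product)."""
    return sum(wi * xi for wi, xi in zip(w, x))


def dist_to_hyperplane(w, b, x):
    """Distance from x to H = {x : <w, x> + b = 0}, i.e. |<w, x> + b| / ||w||."""
    return abs(inner(w, x) + b) / math.sqrt(inner(w, w))


def project(w, b, x):
    """Orthogonal projection of x onto H: x - ((<w, x> + b) / ||w||^2) * w."""
    t = (inner(w, x) + b) / inner(w, w)
    return [xi - t * wi for xi, wi in zip(x, w)]


# Hypothetical example: w = (3, 4), b = -10, x = (4, 7).
w, b, x = [3.0, 4.0], -10.0, [4.0, 7.0]

d = dist_to_hyperplane(w, b, x)
print(d)  # |12 + 28 - 10| / 5 -> prints 6.0

x_p = project(w, b, x)
gap = [xi - pi for xi, pi in zip(x, x_p)]

# The projection lies on H, and ||x - x_P|| agrees with the formula.
print(math.isclose(inner(w, x_p) + b, 0.0, abs_tol=1e-12))  # prints True
print(math.isclose(math.sqrt(inner(gap, gap)), d))          # prints True

# For a point already on H, the distance is zero.
print(dist_to_hyperplane(w, b, [2.0, 1.0]))  # 6 + 4 - 10 = 0 -> prints 0.0
```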