Support Vector Machine

Margin-maximizing classifier with kernel trick for non-linear boundaries

  • Used for both classification and regression tasks.
  • Core idea: Find optimal hyperplane that maximizes margin between classes.

Geometric Interpretation

  • There are many hyperplanes that separate positive and negative points.
  • We want to find the margin-maximizing hyperplane.
  • The margin is the perpendicular distance between the two parallel hyperplanes $H^{+}$ and $H^{-}$.
  • Points through which $H^{+}$ or $H^{-}$ pass are called support vectors.

Derivation

  • $H$: $w^T x + b = 0$
  • $H^{+}$: $w^T x + b = 1$ (positive class boundary)
  • $H^{-}$: $w^T x + b = -1$ (negative class boundary)
  • Margin: $d = \frac{2}{\|w\|}$
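As a quick numeric check of the margin formula $d = 2/\|w\|$ (the weight vector below is a hypothetical value chosen only to illustrate the arithmetic):

```python
import numpy as np

# Hypothetical weight vector, chosen so the norm is easy to verify by hand
w = np.array([3.0, 4.0])

# Margin between H+ and H- is 2 / ||w||
margin = 2 / np.linalg.norm(w)
print(margin)  # 2 / 5 = 0.4
```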

For linearly separable data:

\[w^{\star}, b^{\star} = \arg\min_{w,b} \frac{\|w\|^2}{2} \quad \text{s.t. } y_i(w^T x_i + b) \ge 1 \; \forall i\]

For non-linearly separable data (soft margin):

\[w^{\star}, b^{\star} = \arg\min_{w,b} \frac{\|w\|^2}{2} + C \sum_{i=1}^N \xi_i \quad \text{s.t. } y_i(w^T x_i + b) \ge 1 - \xi_i, \; \xi_i \ge 0 \; \forall i\]

where $\xi_i \ge 0$ are slack variables and $C$ is the regularization parameter.
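The slack variables have a direct closed form for a fixed separator: $\xi_i = \max(0,\, 1 - y_i(w^T x_i + b))$. A minimal sketch, using a hypothetical separator and three hand-picked points:

```python
import numpy as np

# Hypothetical separator and a few labeled points (illustrative values)
w, b = np.array([1.0, 0.0]), 0.0
X = np.array([[2.0, 0.0],    # well outside the margin
              [0.5, 0.0],    # inside the margin (violation)
              [-1.0, 0.0]])  # exactly on the negative boundary
y = np.array([1, 1, -1])

# Slack: how far each point falls short of the margin requirement
xi = np.maximum(0.0, 1 - y * (X @ w + b))
print(xi)  # [0.  0.5 0. ]
```

Only the second point incurs slack; points at or beyond their boundary hyperplane contribute nothing to the penalty term.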

Hinge Loss

$L_{hinge}(y_i, f(x_i))=\max(0, 1-y_i f(x_i))$ where $f(x_i) = w^T x_i + b$

Complete objective: $\mathcal{L} = \min_{w,b} \frac{1}{N}\sum_{i=1}^N \max(0, 1 - y_i(w^T x_i + b)) + \lambda \|w\|^2$
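The complete objective is straightforward to evaluate with NumPy. A minimal sketch (the data and parameter values below are illustrative, not from a fitted model):

```python
import numpy as np

def svm_objective(w, b, X, y, lam):
    """Regularized hinge loss: mean hinge term + lam * ||w||^2."""
    scores = y * (X @ w + b)                 # y_i * f(x_i)
    hinge = np.maximum(0.0, 1 - scores)      # per-sample hinge loss
    return hinge.mean() + lam * np.dot(w, w)

# Two points sitting exactly on their margin boundaries: hinge term is 0,
# so the objective reduces to the regularizer lam * ||w||^2 = 0.05
X = np.array([[1.0, 1.0], [-1.0, -1.0]])
y = np.array([1, -1])
w, b = np.array([0.5, 0.5]), 0.0
print(svm_objective(w, b, X, y, lam=0.1))  # 0.05
```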

Gradient Updates

  1. For points outside the margin ($y_i(w^T x_i + b) \geq 1$), only the regularizer contributes:
    • $w = w - \eta \lambda w$
  2. For margin violations ($y_i(w^T x_i + b) < 1$):
    • $w = w - \eta(\lambda w - y_i x_i)$
    • $b = b + \eta y_i$
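The two update rules above can be sketched as a plain SGD loop over the hinge-loss objective. This is a minimal didactic implementation, assuming hypothetical hyperparameters (`lam`, `eta`, `epochs`) and a tiny hand-made separable dataset:

```python
import numpy as np

def train_linear_svm(X, y, lam=0.01, eta=0.1, epochs=200, seed=0):
    """SGD on the regularized hinge loss, applying the two update rules."""
    rng = np.random.default_rng(seed)
    w = np.zeros(X.shape[1])
    b = 0.0
    for _ in range(epochs):
        for i in rng.permutation(len(y)):
            if y[i] * (X[i] @ w + b) >= 1:
                # Outside the margin: only the regularizer shrinks w
                w -= eta * lam * w
            else:
                # Margin violation: hinge subgradient plus regularizer
                w -= eta * (lam * w - y[i] * X[i])
                b += eta * y[i]
    return w, b

# Toy linearly separable data (hypothetical)
X = np.array([[2.0, 2.0], [3.0, 3.0], [-2.0, -2.0], [-3.0, -1.0]])
y = np.array([1, 1, -1, -1])
w, b = train_linear_svm(X, y)
preds = np.sign(X @ w + b)
print((preds == y).all())  # expect True on this separable toy set
```

Note that $b$ is only updated on violations, matching the rules above: points outside the margin have zero hinge subgradient, so only the regularizer (which does not involve $b$) contributes.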

Dual Form

\[\max_{\alpha} \sum_{i=1}^N \alpha_i - \frac{1}{2}\sum_{i=1}^N \sum_{j=1}^N \alpha_i \alpha_j y_i y_j x_i^T x_j\]

subject to: $\sum_{i=1}^N \alpha_i y_i = 0$ and $0 \le \alpha_i \le C$

Kernel SVM

Kernel trick: $K(x,y) = \phi(x)^T\phi(y)$

Common Kernels:

  • Linear: $K(x,y) = x^T y$
  • Polynomial: $K(x,y) = (\gamma x^Ty + r)^d$
  • RBF (Gaussian): $K(x,y) = \exp(-\gamma\|x-y\|^2)$

Prediction: $f(x) = \text{sign}(\sum_{i \in SV} \alpha_i y_i K(x_i,x) + b)$
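The kernels and the prediction rule can be sketched directly in NumPy. The support vectors, multipliers, and kernel parameters below are hypothetical placeholders (not output from a real solver), chosen only to show how the decision function combines them:

```python
import numpy as np

def linear_kernel(x, z):
    return x @ z

def poly_kernel(x, z, gamma=1.0, r=1.0, d=3):
    return (gamma * (x @ z) + r) ** d

def rbf_kernel(x, z, gamma=0.5):
    return np.exp(-gamma * np.sum((x - z) ** 2))

def predict(x, sv_X, sv_y, alphas, b, kernel):
    """Decision rule: sign(sum_{i in SV} alpha_i y_i K(x_i, x) + b)."""
    s = sum(a * yi * kernel(xi, x) for a, yi, xi in zip(alphas, sv_y, sv_X))
    return np.sign(s + b)

# Hypothetical support vectors and multipliers (illustrative only)
sv_X = np.array([[1.0, 1.0], [-1.0, -1.0]])
sv_y = np.array([1, -1])
alphas = np.array([0.5, 0.5])
b = 0.0

# A query near the positive support vector dominates the kernel sum
print(predict(np.array([2.0, 2.0]), sv_X, sv_y, alphas, b, rbf_kernel))  # 1.0
```

Only the support vectors (points with $\alpha_i > 0$) enter the sum, which is why kernel SVM prediction cost scales with the number of support vectors rather than the training set size.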