- Used for both classification and regression tasks.
- Core idea: Find optimal hyperplane that maximizes margin between classes.
Geometric Interpretation
- There are many hyperplanes that separate positive and negative points.
- We want to find the margin-maximizing hyperplane.
- Margin is the perpendicular distance between two hyperplanes $H^{+}$ and $H^{-}$.
- Points through which $H^{+}$ or $H^{-}$ pass are called support vectors.
Derivation
- $H$: $w^T x + b = 0$
- $H^{+}$: $w^T x + b = 1$ (positive class boundary)
- $H^{-}$: $w^T x + b = -1$ (negative class boundary)
- Margin: $d = \frac{2}{\|w\|}$
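The margin formula can be checked numerically. A minimal sketch in NumPy; the weight vector and bias below are made-up values for illustration:

```python
import numpy as np

# Illustrative weight vector and bias for a 2-D hyperplane w^T x + b = 0
# (values are made up for the example).
w = np.array([3.0, 4.0])
b = 1.0

# Margin between H+ (w^T x + b = 1) and H- (w^T x + b = -1): d = 2 / ||w||.
margin = 2.0 / np.linalg.norm(w)
print(margin)  # 0.4, since ||w|| = 5

# Cross-check: the signed distance from a point x to H is (w^T x + b) / ||w||.
# Any point on H+ has w^T x + b = 1, so it lies 1/||w|| from H; symmetrically
# for H-, giving a total separation of 2/||w||.
x_on_Hplus = np.array([0.0, 0.0])  # here w^T x + b = 1 already holds (b = 1)
dist = (w @ x_on_Hplus + b) / np.linalg.norm(w)
print(dist)  # 0.2 = 1/||w||
```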
For linearly separable data:
\[w^{\star}, b^{\star} = \arg\min_{w,b} \frac{\|w\|^2}{2} \quad \text{s.t. } y_i(w^T x_i + b) \ge 1 \; \forall i\]
For non-linearly separable data (soft margin):
\[w^{\star}, b^{\star} = \arg\min_{w,b} \frac{\|w\|^2}{2} + C \sum_{i=1}^N \xi_i \quad \text{s.t. } y_i(w^T x_i + b) \ge 1 - \xi_i, \; \xi_i \ge 0 \; \forall i\]
where $\xi_i$ are slack variables measuring how far each point violates the margin, and $C$ is the regularization parameter trading off margin width against violations.
Hinge Loss
$L_{\text{hinge}}(y_i, f(x_i)) = \max(0, 1 - y_i f(x_i))$, where $f(x_i) = w^T x_i + b$.
Complete objective: $\mathcal{L} = \min_{w,b} \frac{1}{N}\sum_{i=1}^N \max(0, 1 - y_i(w^T x_i + b)) + \lambda \|w\|^2$
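The hinge loss can be evaluated directly; the labels and scores below are illustrative:

```python
import numpy as np

# Illustrative data: labels in {-1, +1} and raw scores f(x_i) = w^T x_i + b.
y = np.array([+1, +1, -1, -1])
scores = np.array([2.0, 0.5, -0.5, 1.5])  # last point is misclassified

# Hinge loss per point: max(0, 1 - y_i * f(x_i)).
losses = np.maximum(0.0, 1.0 - y * scores)
print(losses.tolist())  # [0.0, 0.5, 0.5, 2.5]
# - Correctly classified outside the margin (y*f >= 1): zero loss.
# - Inside the margin (0 < y*f < 1): small positive loss.
# - Misclassified (y*f < 0): loss grows linearly with the violation.
```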
Gradient Updates
- For correctly classified points outside the margin ($y_i(w^T x_i + b) \geq 1$), only the regularizer contributes (constant factors from differentiating $\lambda \|w\|^2$ absorbed into $\lambda$):
    - $w = w - \eta \lambda w$
    - $b$ unchanged
- For margin violations ($y_i(w^T x_i + b) < 1$):
    - $w = w - \eta(\lambda w - y_i x_i)$
    - $b = b + \eta y_i$
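The update rules above can be sketched as a stochastic-subgradient training loop. A minimal NumPy sketch; the function name, hyperparameter defaults, and toy data are illustrative, not from any particular library:

```python
import numpy as np

def train_linear_svm(X, y, lam=0.01, eta=0.1, epochs=200, seed=0):
    """SGD on the regularized hinge objective, following the per-point
    update rules above. Illustrative sketch, not a production solver."""
    rng = np.random.default_rng(seed)
    n, d = X.shape
    w = np.zeros(d)
    b = 0.0
    for _ in range(epochs):
        for i in rng.permutation(n):
            if y[i] * (X[i] @ w + b) >= 1:        # outside margin
                w -= eta * lam * w                # only the regularizer pulls
            else:                                  # margin violation
                w -= eta * (lam * w - y[i] * X[i])
                b += eta * y[i]
    return w, b

# Tiny linearly separable toy set.
X = np.array([[2.0, 2.0], [3.0, 3.0], [-2.0, -2.0], [-3.0, -1.0]])
y = np.array([+1, +1, -1, -1])
w, b = train_linear_svm(X, y)
preds = np.sign(X @ w + b)
print(preds.tolist())  # should match the labels y
```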
Dual Form
\[\max_{\alpha} \sum_{i=1}^N \alpha_i - \frac{1}{2}\sum_{i=1}^N \sum_{j=1}^N \alpha_i \alpha_j y_i y_j x_i^T x_j\]
subject to: $\sum_{i=1}^N \alpha_i y_i = 0$ and $0 \le \alpha_i \le C$
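The dual objective, and the recovery of the primal weights as $w = \sum_i \alpha_i y_i x_i$, can be illustrated on a two-point toy problem. The $\alpha$ values below are hand-picked for the example, not the output of an actual QP solve:

```python
import numpy as np

# Toy problem: one point per class.
X = np.array([[1.0, 1.0], [-1.0, -1.0]])
y = np.array([+1.0, -1.0])
alpha = np.array([0.25, 0.25])  # satisfies sum(alpha * y) == 0 and 0 <= alpha_i <= C

# Dual objective: sum_i alpha_i - 1/2 sum_ij alpha_i alpha_j y_i y_j <x_i, x_j>.
G = (y[:, None] * X) @ (y[:, None] * X).T   # label-scaled Gram matrix
dual = alpha.sum() - 0.5 * alpha @ G @ alpha

# Primal weights recovered from the dual variables: w = sum_i alpha_i y_i x_i.
w = (alpha * y) @ X
print(float(dual), w.tolist())  # 0.25 [0.5, 0.5]
```

Note that $w = [0.5, 0.5]$ gives margin $2/\|w\| = 2\sqrt{2}$, exactly the distance between the two points, so these $\alpha$ happen to be optimal for this toy set.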
Kernel SVM
Kernel trick: $K(x,y) = \phi(x)^T\phi(y)$
Common Kernels:
- Linear: $K(x,y) = x^T y$
- Polynomial: $K(x,y) = (\gamma x^Ty + r)^d$
- RBF (Gaussian): $K(x,y) = \exp(-\gamma\|x-y\|^2)$
Prediction: $f(x) = \text{sign}(\sum_{i \in SV} \alpha_i y_i K(x_i,x) + b)$
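A minimal sketch of the kernelized decision rule, assuming a trained model's support vectors, $\alpha_i$, and $b$ are already given; the support set and coefficients below are illustrative, not from a real fit:

```python
import numpy as np

def rbf_kernel(x, z, gamma=0.5):
    # K(x, z) = exp(-gamma * ||x - z||^2)
    return np.exp(-gamma * np.sum((x - z) ** 2))

def predict(x, support_X, support_y, alpha, b, kernel):
    """Kernel SVM decision rule: sign(sum_i alpha_i y_i K(x_i, x) + b).
    alpha and b would normally come from solving the dual; the values
    used below are illustrative only."""
    s = sum(a * yi * kernel(xi, x)
            for a, yi, xi in zip(alpha, support_y, support_X))
    return np.sign(s + b)

# Illustrative support set: one support vector per class.
support_X = np.array([[1.0, 1.0], [-1.0, -1.0]])
support_y = np.array([+1.0, -1.0])
alpha = np.array([0.4, 0.4])
b = 0.0

# A query near the positive support vector gets a positive label.
pred = predict(np.array([0.9, 1.2]), support_X, support_y, alpha, b, rbf_kernel)
print(pred)  # 1.0
```

Only the support vectors (points with $\alpha_i > 0$) enter the sum, which is why kernel SVM prediction scales with the number of support vectors rather than the training-set size.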