Let $f(x)$ be the true model and $\hat{f}(x)$ be our estimate of it.
Bias
- Measures the difference between the model’s average prediction and the true value
- $\text{Bias}(\hat{f}(x)) = E[\hat{f}(x)] - f(x)$
- Simple models tend to have high bias; complex models tend to have low bias
Variance
- Measures the model’s sensitivity to fluctuations in the training set
- $\text{Variance}(\hat{f}(x)) = E[(\hat{f}(x) - E[\hat{f}(x)])^2]$
- Simple models tend to have low variance; complex models tend to have high variance
Summary:
- Simple Model: high bias, low variance, underfitting
- Complex Model: low bias, high variance, overfitting
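The summary above can be checked numerically. The sketch below (an illustration, assuming NumPy and taking a sine curve as a hypothetical true model $f$) refits a simple and a complex polynomial on many resampled training sets, then estimates each model's bias² and variance at fixed test points using the definitions given above:

```python
import numpy as np

rng = np.random.default_rng(0)

def true_f(x):
    # Hypothetical true model f(x), chosen only for illustration.
    return np.sin(2 * np.pi * x)

def bias_variance(degree, n_trials=200, n_samples=30, noise=0.3):
    """Monte Carlo estimate of bias^2 and variance for a polynomial fit."""
    x_test = np.linspace(0.05, 0.95, 50)
    preds = np.empty((n_trials, x_test.size))
    for t in range(n_trials):
        # Fresh training set each trial -> spread of fitted models.
        x = rng.uniform(0, 1, n_samples)
        y = true_f(x) + rng.normal(0, noise, n_samples)
        coef = np.polyfit(x, y, degree)
        preds[t] = np.polyval(coef, x_test)
    mean_pred = preds.mean(axis=0)                       # E[f_hat(x)]
    bias2 = np.mean((mean_pred - true_f(x_test)) ** 2)   # (E[f_hat(x)] - f(x))^2
    var = np.mean(preds.var(axis=0))                     # E[(f_hat(x) - E[f_hat(x)])^2]
    return bias2, var

bias2_simple, var_simple = bias_variance(degree=1)    # simple model
bias2_complex, var_complex = bias_variance(degree=9)  # complex model
```

With these settings the simple (linear) model shows the larger bias² and the complex (degree-9) model the larger variance, matching the table above.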
Trade-off
\[E[(y - \hat{f}(x))^2] = \text{Bias}(\hat{f}(x))^2 + \text{Variance}(\hat{f}(x)) + \sigma^2\]
where $\sigma^2$ is the irreducible error (noise inherent in the data).
Underfitting
- High loss on both the training and test sets
- Fix: More complex models, reduce regularization, increase training time
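The "more complex model" fix can be seen directly in the training loss. A minimal NumPy sketch (sine data is an assumed ground truth, used only for illustration): a linear fit underfits, while a higher-degree polynomial drives training error down.

```python
import numpy as np

# Noisy samples from an assumed ground-truth function (illustrative only).
rng = np.random.default_rng(1)
x = rng.uniform(0, 1, 100)
y = np.sin(2 * np.pi * x) + rng.normal(0, 0.1, 100)

def train_mse(degree):
    """Training MSE of a least-squares polynomial fit of the given degree."""
    coef = np.polyfit(x, y, degree)
    return np.mean((np.polyval(coef, x) - y) ** 2)

mse_simple = train_mse(1)   # underfits: a line cannot represent the sine shape
mse_complex = train_mse(7)  # more capacity: training loss drops
```

Since the polynomial families are nested, training MSE can only decrease as the degree grows; the drop here is large because the simple model's bias dominates.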
Overfitting
- Low training loss but high test loss
- Fix: Regularization, more training data, data augmentation, K-fold cross-validation, reduce features
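Of these fixes, K-fold cross-validation is easy to sketch from scratch. A minimal NumPy version (the sine data and polynomial models are illustrative assumptions) that uses 5-fold CV to choose a model complexity instead of trusting training loss:

```python
import numpy as np

def kfold_indices(n, k, seed=0):
    """Yield (train_idx, val_idx) pairs for k-fold cross-validation."""
    rng = np.random.default_rng(seed)
    idx = rng.permutation(n)
    folds = np.array_split(idx, k)
    for i in range(k):
        val = folds[i]
        train = np.concatenate([folds[j] for j in range(k) if j != i])
        yield train, val

# Noisy samples from an assumed ground-truth function (illustrative only).
rng = np.random.default_rng(0)
x = rng.uniform(0, 1, 60)
y = np.sin(2 * np.pi * x) + rng.normal(0, 0.2, 60)

def cv_mse(degree, k=5):
    """Mean validation MSE of a degree-d polynomial across k folds."""
    errs = []
    for tr, va in kfold_indices(len(x), k):
        coef = np.polyfit(x[tr], y[tr], degree)
        errs.append(np.mean((np.polyval(coef, x[va]) - y[va]) ** 2))
    return np.mean(errs)

scores = {d: cv_mse(d) for d in range(1, 10)}
best = min(scores, key=scores.get)  # degree with lowest cross-validated error
```

The cross-validated score penalizes both extremes: very low degrees score poorly from bias, very high degrees from variance, so the selected degree lands in between.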