Bias-Variance Tradeoff

Understanding underfitting, overfitting, and the bias-variance decomposition

Let $f(x)$ be the true function and $\hat{f}(x)$ our model's estimate of it.

Bias

  • Measures the difference between the model’s average prediction and the true value
  • $\text{Bias}(\hat{f}(x)) = E[\hat{f}(x)] - f(x)$
  • Simple models tend to have high bias; complex models tend to have low bias

Variance

  • Measures the model’s sensitivity to fluctuations in the training set
  • $\text{Variance}(\hat{f}(x)) = E[(\hat{f}(x) - E[\hat{f}(x)])^2]$
  • Simple models tend to have low variance; complex models tend to have high variance

Summary:

  • Simple Model: high bias, low variance, underfitting
  • Complex Model: low bias, high variance, overfitting
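The summary above can be checked empirically: refit the same model class on many resampled training sets and measure, at a fixed test point, how far the average prediction is from the truth (bias) and how much predictions scatter (variance). A minimal NumPy sketch, assuming a made-up true function $f(x) = \sin(2\pi x)$ and polynomial models of varying degree:

```python
import numpy as np

rng = np.random.default_rng(0)

def f(x):
    # assumed true function, chosen only for illustration
    return np.sin(2 * np.pi * x)

def bias_variance(degree, n_datasets=200, n_points=30, noise=0.3):
    """Estimate bias^2 and variance of a polynomial fit at a fixed
    test point by refitting on many resampled training sets."""
    x_test = 0.25
    preds = np.empty(n_datasets)
    for i in range(n_datasets):
        x = rng.uniform(0, 1, n_points)
        y = f(x) + rng.normal(0, noise, n_points)
        coeffs = np.polyfit(x, y, degree)
        preds[i] = np.polyval(coeffs, x_test)
    bias_sq = (preds.mean() - f(x_test)) ** 2
    variance = preds.var()
    return bias_sq, variance

for d in (1, 3, 9):
    b2, v = bias_variance(d)
    print(f"degree {d}: bias^2 = {b2:.4f}, variance = {v:.4f}")
```

The degree-1 model shows large bias² and small variance, while the degree-9 model shows the reverse, matching the simple-vs-complex summary.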

Trade-off

Assuming $y = f(x) + \varepsilon$ with $E[\varepsilon] = 0$ and $\text{Var}(\varepsilon) = \sigma^2$, the expected squared error at a point decomposes as:

\[E[(y - \hat{f}(x))^2] = \text{Bias}(\hat{f}(x))^2 + \text{Variance}(\hat{f}(x)) + \sigma^2\]

where $\sigma^2$ is the irreducible error: no model can remove the noise in $y$ itself.
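The decomposition can be verified numerically: estimate each term by Monte Carlo and check that they sum to the measured expected squared error. A sketch under the same illustrative assumptions as above ($f(x) = \sin(2\pi x)$, a degree-1 polynomial model, Gaussian noise):

```python
import numpy as np

rng = np.random.default_rng(1)

def f(x):
    # assumed true function for illustration
    return np.sin(2 * np.pi * x)

sigma = 0.3              # noise std dev, so sigma^2 is the irreducible error
x_test = 0.25
n_trials, n_points = 5000, 30

preds = np.empty(n_trials)
sq_errors = np.empty(n_trials)
for i in range(n_trials):
    # fresh training set each trial
    x = rng.uniform(0, 1, n_points)
    y = f(x) + rng.normal(0, sigma, n_points)
    coeffs = np.polyfit(x, y, 1)          # deliberately simple (biased) model
    preds[i] = np.polyval(coeffs, x_test)
    # fresh noisy test observation at the same point
    y_test = f(x_test) + rng.normal(0, sigma)
    sq_errors[i] = (y_test - preds[i]) ** 2

mse = sq_errors.mean()
bias_sq = (preds.mean() - f(x_test)) ** 2
variance = preds.var()
print(f"MSE = {mse:.3f}  vs  bias^2 + variance + sigma^2 = "
      f"{bias_sq + variance + sigma**2:.3f}")
```

The two printed numbers agree up to Monte Carlo error, confirming the decomposition term by term.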

Underfitting

  • High loss on both the training and test sets
  • Fix: use a more complex model, reduce regularization, or increase training time
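The "high loss for training and test" symptom, and the fix of increasing model complexity, can be seen in a small experiment. A sketch assuming the same illustrative true function $f(x) = \sin(2\pi x)$: a degree-1 polynomial underfits (both losses high), and raising the degree lowers both:

```python
import numpy as np

rng = np.random.default_rng(3)

def f(x):
    # assumed true function for illustration
    return np.sin(2 * np.pi * x)

x_train = rng.uniform(0, 1, 100)
y_train = f(x_train) + rng.normal(0, 0.2, 100)
x_test = rng.uniform(0, 1, 100)
y_test = f(x_test) + rng.normal(0, 0.2, 100)

def fit_mse(degree):
    """Fit a polynomial of the given degree; return (train MSE, test MSE)."""
    coeffs = np.polyfit(x_train, y_train, degree)
    train_mse = np.mean((np.polyval(coeffs, x_train) - y_train) ** 2)
    test_mse = np.mean((np.polyval(coeffs, x_test) - y_test) ** 2)
    return train_mse, test_mse

for degree in (1, 5):
    tr, te = fit_mse(degree)
    print(f"degree {degree}: train MSE = {tr:.3f}, test MSE = {te:.3f}")
```

The degree-1 model has high train *and* test loss (the underfitting signature); the degree-5 model lowers both.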

Overfitting

  • Low training loss but high test loss: the model fits noise in the training set rather than generalizing
  • Fix: add regularization, gather more training data, use data augmentation, select hyperparameters with K-fold cross-validation, or reduce the number of features
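Of these fixes, regularization is the easiest to sketch in code. A minimal example, assuming a made-up true function $f(x) = \sin(2\pi x)$, a small training set, and degree-9 polynomial features (a setup prone to overfitting), using the closed-form ridge solution:

```python
import numpy as np

rng = np.random.default_rng(2)

def f(x):
    # assumed true function for illustration
    return np.sin(2 * np.pi * x)

# Few points + a flexible model: a classic overfitting setup
x_train = rng.uniform(0, 1, 15)
y_train = f(x_train) + rng.normal(0, 0.3, 15)

X = np.vander(x_train, 10)   # degree-9 polynomial features

def ridge_fit(X, y, lam):
    # Closed-form ridge solution: w = (X^T X + lam * I)^{-1} X^T y
    return np.linalg.solve(X.T @ X + lam * np.eye(X.shape[1]), X.T @ y)

x_grid = np.linspace(0, 1, 200)
for lam in (1e-6, 1e-2, 1.0):
    w = ridge_fit(X, y_train, lam)
    test_mse = np.mean((np.vander(x_grid, 10) @ w - f(x_grid)) ** 2)
    print(f"lambda = {lam}: coefficient norm = {np.linalg.norm(w):.2f}, "
          f"test MSE vs truth = {test_mse:.3f}")
```

Increasing the penalty $\lambda$ shrinks the coefficient norm, trading a little extra bias for a reduction in variance; the best $\lambda$ would in practice be chosen by cross-validation, as the notes suggest.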