Let $f(x)$ be the true model and $\hat{f}(x)$ be our estimate of it.
Bias
- Measures the difference between the model’s average prediction and the true value
- $\text{Bias}(\hat{f}(x)) = E[\hat{f}(x)] - f(x)$
- Simple models tend to have high bias; complex models tend to have low bias
Variance
- Measures the model’s sensitivity to fluctuations in the training set
- $\text{Variance}(\hat{f}(x)) = E[(\hat{f}(x) - E[\hat{f}(x)])^2]$
- Simple models tend to have low variance; complex models tend to have high variance
Summary:
- Simple Model: high bias, low variance, underfitting
- Complex Model: low bias, high variance, overfitting
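The summary above can be checked numerically. The sketch below (an illustration, assuming NumPy and taking a sine curve as a hypothetical true model $f$) refits a simple and a complex polynomial on many resampled training sets, then estimates each model's bias² and variance at fixed test points using the definitions given above:

```python
import numpy as np

rng = np.random.default_rng(0)

def true_f(x):
    # Hypothetical true model f(x), chosen only for illustration.
    return np.sin(2 * np.pi * x)

def bias_variance(degree, n_trials=200, n_samples=30, noise=0.3):
    """Monte Carlo estimate of bias^2 and variance for a polynomial fit."""
    x_test = np.linspace(0.05, 0.95, 50)
    preds = np.empty((n_trials, x_test.size))
    for t in range(n_trials):
        # Fresh training set each trial -> spread of fitted models.
        x = rng.uniform(0, 1, n_samples)
        y = true_f(x) + rng.normal(0, noise, n_samples)
        coef = np.polyfit(x, y, degree)
        preds[t] = np.polyval(coef, x_test)
    mean_pred = preds.mean(axis=0)                       # E[f_hat(x)]
    bias2 = np.mean((mean_pred - true_f(x_test)) ** 2)   # (E[f_hat(x)] - f(x))^2
    var = np.mean(preds.var(axis=0))                     # E[(f_hat(x) - E[f_hat(x)])^2]
    return bias2, var

bias2_simple, var_simple = bias_variance(degree=1)    # simple model
bias2_complex, var_complex = bias_variance(degree=9)  # complex model
```

With these settings the simple (linear) model shows the larger bias² and the complex (degree-9) model the larger variance, matching the table above.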
Trade-off
\[E[(y - \hat{f}(x))^2] = \text{Bias}(\hat{f}(x))^2 + \text{Variance}(\hat{f}(x)) + \sigma^2\]
where $\sigma^2$ is the irreducible error (noise inherent in the data).
Underfitting
- High loss on both the training and test sets
- Fix: More complex models, reduce regularization, increase training time
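The "more complex model" fix can be seen directly in the training loss. A minimal NumPy sketch (sine data is an assumed ground truth, used only for illustration): a linear fit underfits, while a higher-degree polynomial drives training error down.

```python
import numpy as np

# Noisy samples from an assumed ground-truth function (illustrative only).
rng = np.random.default_rng(1)
x = rng.uniform(0, 1, 100)
y = np.sin(2 * np.pi * x) + rng.normal(0, 0.1, 100)

def train_mse(degree):
    """Training MSE of a least-squares polynomial fit of the given degree."""
    coef = np.polyfit(x, y, degree)
    return np.mean((np.polyval(coef, x) - y) ** 2)

mse_simple = train_mse(1)   # underfits: a line cannot represent the sine shape
mse_complex = train_mse(7)  # more capacity: training loss drops
```

Since the polynomial families are nested, training MSE can only decrease as the degree grows; the drop here is large because the simple model's bias dominates.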
Overfitting
- Low training loss but high test loss
- Fix: Regularization, more training data, data augmentation, K-fold cross-validation, reduce features
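Of these fixes, K-fold cross-validation is easy to sketch from scratch. A minimal NumPy version (the sine data and polynomial models are illustrative assumptions) that uses 5-fold CV to choose a model complexity instead of trusting training loss:

```python
import numpy as np

def kfold_indices(n, k, seed=0):
    """Yield (train_idx, val_idx) pairs for k-fold cross-validation."""
    rng = np.random.default_rng(seed)
    idx = rng.permutation(n)
    folds = np.array_split(idx, k)
    for i in range(k):
        val = folds[i]
        train = np.concatenate([folds[j] for j in range(k) if j != i])
        yield train, val

# Noisy samples from an assumed ground-truth function (illustrative only).
rng = np.random.default_rng(0)
x = rng.uniform(0, 1, 60)
y = np.sin(2 * np.pi * x) + rng.normal(0, 0.2, 60)

def cv_mse(degree, k=5):
    """Mean validation MSE of a degree-d polynomial across k folds."""
    errs = []
    for tr, va in kfold_indices(len(x), k):
        coef = np.polyfit(x[tr], y[tr], degree)
        errs.append(np.mean((np.polyval(coef, x[va]) - y[va]) ** 2))
    return np.mean(errs)

scores = {d: cv_mse(d) for d in range(1, 10)}
best = min(scores, key=scores.get)  # degree with lowest cross-validated error
```

The cross-validated score penalizes both extremes: very low degrees score poorly from bias, very high degrees from variance, so the selected degree lands in between.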