Loss Functions

MSE, Cross-Entropy, Focal, Triplet, Contrastive, KL Divergence, and more

Mean Squared Error (MSE) Loss

  • Use: Regression
  • $L_{MSE} = \frac{1}{n}\sum_i (y_i - \hat{y}_i)^2$
  • Sensitive to outliers
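
A minimal NumPy sketch of the formula above (the function name `mse_loss` is just for this example):

```python
import numpy as np

def mse_loss(y_true, y_pred):
    # Mean of squared residuals; squaring makes large errors
    # dominate, which is why MSE is sensitive to outliers.
    return np.mean((y_true - y_pred) ** 2)
```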

Binary Cross Entropy Loss

  • Use: Binary Classification
  • $L_{BCE} = -\sum_i \left[ y_i \log(\hat{y}_i) + (1 - y_i) \log(1 - \hat{y}_i) \right]$
  • Penalizes confident wrong predictions heavily
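
A NumPy sketch, averaged over samples (as most frameworks do) rather than summed; predictions are clipped to avoid `log(0)`:

```python
import numpy as np

def bce_loss(y_true, y_pred, eps=1e-12):
    # Clip predicted probabilities away from 0 and 1 to avoid log(0).
    p = np.clip(y_pred, eps, 1 - eps)
    return -np.mean(y_true * np.log(p) + (1 - y_true) * np.log(1 - p))
```

A confident wrong prediction (e.g. predicting 0.01 when the label is 1) yields a loss of about 4.6, versus about 0.1 for a mild 0.9 on a correct label.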

Cross Entropy Loss

  • Use: Multi-class Classification
  • $L_{CE} = -\sum_i y_i \log(\hat{y}_i)$
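
A NumPy sketch assuming one-hot labels and softmax outputs (shapes and names are illustrative):

```python
import numpy as np

def cross_entropy(y_onehot, probs, eps=1e-12):
    # y_onehot: (n, k) one-hot labels; probs: (n, k) softmax outputs.
    # Only the log-probability of the true class contributes per sample.
    p = np.clip(probs, eps, 1.0)
    return -np.mean(np.sum(y_onehot * np.log(p), axis=1))
```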

Hinge Loss

  • Use: SVM Classification
  • $L_{hinge}=\max(0, 1-\hat{y} \cdot y)$, with labels $y \in \{-1, +1\}$
  • Robust to outliers
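
A NumPy sketch; `score` is the raw (unbounded) classifier output, e.g. $w \cdot x + b$:

```python
import numpy as np

def hinge_loss(y, score):
    # y in {-1, +1}. Loss is zero once a sample is correctly
    # classified with margin >= 1, so outliers on the correct
    # side contribute nothing.
    return np.mean(np.maximum(0.0, 1.0 - y * score))
```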

Focal Loss

  • Use: Object Detection / Imbalanced Classification
  • $L_{focal} = - (1 - p_t)^{\gamma} \log(p_t)$
  • $\gamma$ controls focus on hard examples
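
A NumPy sketch for the binary case; with $\gamma = 0$ it reduces to plain BCE, and larger $\gamma$ shrinks the contribution of easy, well-classified examples:

```python
import numpy as np

def focal_loss(y_true, p, gamma=2.0, eps=1e-12):
    p = np.clip(p, eps, 1 - eps)
    # p_t is the probability assigned to the true class.
    p_t = np.where(y_true == 1, p, 1 - p)
    # (1 - p_t)^gamma down-weights easy examples (p_t near 1).
    return -np.mean((1 - p_t) ** gamma * np.log(p_t))
```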

Triplet Loss

  • Use: Similarity Learning / Embedding Learning
  • $L_{triplet} = \sum_{i=1}^N \left[ \|f(x_i^a) - f(x_i^p)\|_2^2 - \|f(x_i^a) - f(x_i^n)\|_2^2 + \alpha \right]_{+}$
  • Requires triplets (anchor, positive, negative). Used in face recognition.
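
A NumPy sketch operating on precomputed embedding rows (one triplet per row); the margin $\alpha$ appears as the `margin` argument:

```python
import numpy as np

def triplet_loss(anchor, positive, negative, margin=0.2):
    # Squared Euclidean distances between embedding rows.
    d_pos = np.sum((anchor - positive) ** 2, axis=1)
    d_neg = np.sum((anchor - negative) ** 2, axis=1)
    # Hinge at zero: loss vanishes once the negative is farther
    # from the anchor than the positive by at least `margin`.
    return np.mean(np.maximum(0.0, d_pos - d_neg + margin))
```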

KL Divergence Loss

  • Use: Distribution Learning
  • $KL(P \,\|\, Q)=\sum_x P(x) \log \frac{P(x)}{Q(x)}$
  • Not symmetric. Used in VAE.
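
A NumPy sketch for discrete distributions over the same support; note that swapping the arguments gives a different value:

```python
import numpy as np

def kl_divergence(p, q, eps=1e-12):
    # p, q: discrete probability vectors (each summing to 1).
    # Clipping avoids log(0) / division by zero.
    p = np.clip(p, eps, 1.0)
    q = np.clip(q, eps, 1.0)
    return np.sum(p * np.log(p / q))
```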

Contrastive Loss

  • Use: Self-supervised / Multi-modal Learning
  • $\ell_i^{(u \rightarrow v)} = - \log \frac{\exp(\mathrm{sim}(u_i,v_i)/\tau)}{\sum_{k=1}^N \exp(\mathrm{sim}(u_i, v_k)/\tau)}$
  • Pulls similar pairs together, pushes dissimilar apart. Used in CLIP, SimCLR.
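
A NumPy sketch of the one-directional ($u \rightarrow v$) term above, using cosine similarity for $\mathrm{sim}$; row $i$'s positive is column $i$, and every other column in the batch serves as a negative:

```python
import numpy as np

def info_nce(u, v, tau=0.07):
    # L2-normalize so the dot product equals cosine similarity.
    u = u / np.linalg.norm(u, axis=1, keepdims=True)
    v = v / np.linalg.norm(v, axis=1, keepdims=True)
    logits = (u @ v.T) / tau                     # (N, N) similarity matrix
    # Log-softmax over each row; the diagonal holds the positive pairs.
    log_prob = logits - np.log(np.exp(logits).sum(axis=1, keepdims=True))
    return -np.mean(np.diag(log_prob))
```

Matched embeddings drive the loss toward zero; mismatched positives make it large, which is what pulls paired representations together during training.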