Mean Squared Error (MSE) Loss
- Use: Regression
- $L_{MSE} = \frac{1}{n}\sum_i (y_i - \hat{y}_i)^2$
- Sensitive to outliers
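A minimal NumPy sketch of the formula above (the function name and the example values are illustrative, not from the notes):

```python
import numpy as np

def mse_loss(y_true, y_pred):
    """Mean squared error: average of squared residuals."""
    y_true, y_pred = np.asarray(y_true, float), np.asarray(y_pred, float)
    return np.mean((y_true - y_pred) ** 2)

# One large residual dominates the loss (outlier sensitivity):
loss = mse_loss([1.0, 2.0, 3.0], [1.0, 2.0, 9.0])  # (0 + 0 + 36) / 3 = 12.0
```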
Binary Cross Entropy Loss
- Use: Binary Classification
- $L_{BCE} = -\sum_i \left[ y_i \log(\hat{y}_i) + (1 - y_i) \log(1 - \hat{y}_i) \right]$
- Penalizes confident wrong predictions heavily
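A NumPy sketch, here averaged over samples; the `eps` clip that guards `log(0)` is my addition:

```python
import numpy as np

def bce_loss(y_true, p_pred, eps=1e-12):
    """Binary cross entropy over predicted probabilities in (0, 1)."""
    y = np.asarray(y_true, float)
    p = np.clip(np.asarray(p_pred, float), eps, 1 - eps)  # avoid log(0)
    return -np.mean(y * np.log(p) + (1 - y) * np.log(1 - p))
```

A confident wrong prediction (e.g. `p = 0.01` for a positive label) yields a far larger loss than a mildly wrong one, which is the heavy penalty noted above.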
Cross Entropy Loss
- Use: Multi-class Classification
- $L_{CE} = -\sum_i y_i \log(\hat{y}_i)$
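A NumPy sketch assuming one-hot targets and a probability vector per sample (the averaging over samples is my addition):

```python
import numpy as np

def ce_loss(y_onehot, p_pred, eps=1e-12):
    """Cross entropy: -sum_i y_i log(p_i) per sample, averaged over samples."""
    p = np.clip(np.asarray(p_pred, float), eps, 1.0)  # avoid log(0)
    y = np.asarray(y_onehot, float)
    return -np.mean(np.sum(y * np.log(p), axis=-1))
```

With a one-hot target, only the log-probability of the true class contributes to the sum.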
Hinge Loss
- Use: SVM Classification
- $L_{hinge}=\max(0, 1-\hat{y} \cdot y)$
- Robust to outliers
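A NumPy sketch assuming labels in $\{-1, +1\}$ and raw decision scores (the batch averaging is my addition):

```python
import numpy as np

def hinge_loss(y_true, scores):
    """Hinge loss for labels in {-1, +1}: zero once the margin y*s >= 1."""
    y = np.asarray(y_true, float)
    s = np.asarray(scores, float)
    return np.mean(np.maximum(0.0, 1.0 - y * s))
```

Correctly classified points beyond the margin contribute exactly zero, so far-away outliers on the right side never inflate the loss.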
Focal Loss
- Use: Object Detection / Imbalanced Classification
- $L_{focal} = - (1 - p_t)^{\gamma} \log(p_t)$
- $\gamma$ controls focus on hard examples
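A NumPy sketch of the binary case, where $p_t$ is the probability assigned to the true class; the `eps` clip is my addition:

```python
import numpy as np

def focal_loss(y_true, p_pred, gamma=2.0, eps=1e-12):
    """Binary focal loss: (1 - p_t)^gamma down-weights easy examples."""
    y = np.asarray(y_true, float)
    p = np.clip(np.asarray(p_pred, float), eps, 1 - eps)
    p_t = np.where(y == 1, p, 1 - p)  # probability of the true class
    return np.mean(-((1.0 - p_t) ** gamma) * np.log(p_t))
```

With $\gamma = 0$ the modulating factor is 1 and the loss reduces to plain cross entropy; larger $\gamma$ shrinks the contribution of well-classified (easy) examples.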
Triplet Loss
- Use: Similarity Learning / Embedding Learning
- $L_{triplet} = \sum_i^N \left[ \|f(x_i^a) - f(x_i^p)\|_2^2 - \|f(x_i^a) - f(x_i^n)\|_2^2 + \alpha \right]_+$
- Requires triplets (anchor, positive, negative). Used in face recognition.
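A NumPy sketch that takes pre-computed embeddings rather than the network $f$ itself; the default margin value is illustrative:

```python
import numpy as np

def triplet_loss(anchor, positive, negative, alpha=0.2):
    """Squared-L2 triplet loss with margin alpha, hinged at zero."""
    a, p, n = (np.asarray(x, float) for x in (anchor, positive, negative))
    d_ap = np.sum((a - p) ** 2, axis=-1)  # anchor-positive distance
    d_an = np.sum((a - n) ** 2, axis=-1)  # anchor-negative distance
    return np.sum(np.maximum(0.0, d_ap - d_an + alpha))
```

The loss is zero whenever the negative is already farther from the anchor than the positive by at least the margin $\alpha$.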
KL Divergence Loss
- Use: Distribution Learning
- $KL(P \,\|\, Q) = \sum_x P(x) \log \frac{P(x)}{Q(x)}$
- Not symmetric: $KL(P \,\|\, Q) \neq KL(Q \,\|\, P)$ in general. Used in VAEs.
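A NumPy sketch for discrete distributions; the `eps` clip and the $0 \log 0 = 0$ convention are my additions:

```python
import numpy as np

def kl_divergence(p, q, eps=1e-12):
    """KL(P || Q) for discrete distributions given as probability vectors."""
    p = np.asarray(p, float)
    q = np.clip(np.asarray(q, float), eps, 1.0)
    # p * log(p/q); terms with p = 0 contribute 0 by convention
    return np.sum(p * np.log(np.clip(p, eps, 1.0) / q))
```

Swapping the arguments generally changes the value, which is the asymmetry noted above.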
Contrastive Loss (InfoNCE)
- Use: Self-supervised / Multi-modal Learning
- $l_i^{(u \rightarrow v)} = - \log \frac{\exp(sim(u_i,v_i)/\tau)}{\sum_{k=1}^N \exp(sim(u_i, v_k)/ \tau)}$
- Pulls similar pairs together, pushes dissimilar apart. Used in CLIP, SimCLR.
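A NumPy sketch of the $u \rightarrow v$ direction, assuming cosine similarity for $sim(\cdot,\cdot)$ and a default temperature of 0.07 (both illustrative choices):

```python
import numpy as np

def info_nce(u, v, tau=0.07):
    """InfoNCE (u -> v): row i of u should match row i of v among N candidates."""
    u = np.asarray(u, float)
    v = np.asarray(v, float)
    u = u / np.linalg.norm(u, axis=1, keepdims=True)   # cosine similarity
    v = v / np.linalg.norm(v, axis=1, keepdims=True)   # via unit vectors
    logits = (u @ v.T) / tau                           # sim(u_i, v_k) / tau
    logits -= logits.max(axis=1, keepdims=True)        # numerical stability
    log_softmax = logits - np.log(np.exp(logits).sum(axis=1, keepdims=True))
    return -np.mean(np.diag(log_softmax))              # matched pair i == k
```

Each matched pair $(u_i, v_i)$ is scored against all $N$ candidates $v_k$ in the batch, so the other pairs act as in-batch negatives.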