Sigmoid
- $f(x) = \frac{1}{1 + e^{-x}} = \sigma(x)$
- $f'(x) = \sigma(x)(1 - \sigma(x))$
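A minimal NumPy sketch of the sigmoid and its derivative (the function names `sigmoid` / `sigmoid_grad` are illustrative, not from a particular library):

```python
import numpy as np

def sigmoid(x):
    # f(x) = 1 / (1 + e^{-x})
    return 1.0 / (1.0 + np.exp(-x))

def sigmoid_grad(x):
    # f'(x) = sigma(x) * (1 - sigma(x))
    s = sigmoid(x)
    return s * (1.0 - s)
```

At $x = 0$ the output is $0.5$ and the gradient peaks at $0.25$; for large $|x|$ the gradient vanishes, which is the classic saturation problem.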
Tanh
- $f(x) = \tanh(x) = \frac{e^x - e^{-x}}{e^x + e^{-x}}$
- $f'(x) = 1 - \tanh^2(x)$
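The tanh derivative can be sketched directly from the identity above (using NumPy's built-in `np.tanh`; `tanh_grad` is an illustrative name):

```python
import numpy as np

def tanh_grad(x):
    # f'(x) = 1 - tanh^2(x)
    return 1.0 - np.tanh(x) ** 2
```

Like sigmoid, tanh saturates for large $|x|$, but it is zero-centered, with a maximum gradient of $1$ at the origin.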
ReLU
- $f(x) = \max(0, x)$
- $f'(x) = \begin{cases} 1 & x > 0 \\ 0 & x \leq 0 \end{cases}$
Leaky ReLU
- $f(x) = \begin{cases} x & x > 0 \\ \alpha x & x \leq 0 \end{cases}$
- $f'(x) = \begin{cases} 1 & x > 0 \\ \alpha & x \leq 0 \end{cases}$
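A NumPy sketch with `np.where`; the default $\alpha = 0.01$ is a common choice, not a fixed constant:

```python
import numpy as np

def leaky_relu(x, alpha=0.01):
    # x for x > 0, alpha * x otherwise
    return np.where(np.asarray(x) > 0, x, alpha * np.asarray(x))

def leaky_relu_grad(x, alpha=0.01):
    # 1 for x > 0, alpha otherwise
    return np.where(np.asarray(x) > 0, 1.0, alpha)
```

The small negative slope keeps the gradient nonzero for $x \leq 0$, avoiding "dead" units.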
ELU (Exponential Linear Unit)
- $f(x) = \begin{cases} x & x > 0 \\ \alpha(e^x - 1) & x \leq 0 \end{cases}$
- $f'(x) = \begin{cases} 1 & x > 0 \\ \alpha e^x & x \leq 0 \end{cases}$
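A NumPy sketch with the usual default $\alpha = 1$ (an assumption, not fixed by the definition):

```python
import numpy as np

def elu(x, alpha=1.0):
    x = np.asarray(x, dtype=float)
    return np.where(x > 0, x, alpha * (np.exp(x) - 1.0))

def elu_grad(x, alpha=1.0):
    x = np.asarray(x, dtype=float)
    return np.where(x > 0, 1.0, alpha * np.exp(x))
```

With $\alpha = 1$ the derivative is continuous at $0$ (both branches give $1$), unlike ReLU's kink.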
Swish
- $f(x) = x \cdot \sigma(x)$
- $f'(x) = f(x) + \sigma(x)(1 - f(x))$
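A NumPy sketch of Swish and the derivative identity above, which reuses the already-computed $f(x)$ (function names are illustrative):

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def swish(x):
    # f(x) = x * sigma(x)
    return x * sigmoid(x)

def swish_grad(x):
    # f'(x) = f(x) + sigma(x) * (1 - f(x))
    f = swish(x)
    return f + sigmoid(x) * (1.0 - f)
```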
PReLU (Parametric ReLU)
- Same as Leaky ReLU but $\alpha$ is a learnable parameter
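Since $\alpha$ is learned, training also needs the gradient of the output with respect to $\alpha$, which is $x$ on the negative branch and $0$ elsewhere. A minimal sketch (names illustrative; real frameworks handle this via autograd):

```python
import numpy as np

def prelu(x, alpha):
    x = np.asarray(x, dtype=float)
    return np.where(x > 0, x, alpha * x)

def prelu_grad_x(x, alpha):
    # same as Leaky ReLU: 1 for x > 0, alpha otherwise
    return np.where(np.asarray(x) > 0, 1.0, alpha)

def prelu_grad_alpha(x):
    # gradient w.r.t. the learnable alpha: x where x <= 0, else 0
    x = np.asarray(x, dtype=float)
    return np.where(x > 0, 0.0, x)
```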
GeLU
- $f(x) = x \cdot \Phi(x)$
- $\Phi(x) = \frac{1}{2}\left[1 + \text{erf}\left(\frac{x}{\sqrt{2}}\right)\right]$
- Approximation: $\Phi(x) \approx 0.5 \times (1 + \tanh(\sqrt{\frac{2}{\pi}} \times (x + 0.044715 \times x^3)))$
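The exact form and the tanh approximation above can be compared directly; the exact Gaussian CDF is available through `math.erf`:

```python
import math

def gelu_exact(x):
    # f(x) = x * Phi(x), with Phi the standard normal CDF via erf
    return 0.5 * x * (1.0 + math.erf(x / math.sqrt(2.0)))

def gelu_tanh(x):
    # the tanh approximation of Phi(x) quoted above
    c = math.sqrt(2.0 / math.pi)
    return 0.5 * x * (1.0 + math.tanh(c * (x + 0.044715 * x ** 3)))
```

The two agree to roughly three decimal places over typical activation ranges, which is why the cheaper tanh form is widely used in practice.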
Linear
- $f(x) = x$
- $f'(x) = 1$
Comparison
- Sigmoid / Tanh: smooth and bounded, but saturate for large $|x|$ (vanishing gradients); tanh is zero-centered, sigmoid is not
- ReLU: cheap and non-saturating for $x > 0$, but units with $x \leq 0$ get zero gradient ("dying ReLU")
- Leaky ReLU / PReLU / ELU: keep a nonzero gradient for $x \leq 0$ to avoid dead units; ELU is smooth near $0$
- Swish / GeLU: smooth, non-monotonic ReLU-like curves; GeLU is the default in many transformer architectures
- Linear: no nonlinearity, so stacking linear layers adds no expressive power; used mainly for output layers