Activation Functions

Sigmoid, Tanh, ReLU, GeLU, Swish, and other activation functions, with their derivatives

Sigmoid

  • $f(x) = \frac{1}{1 + e^{-x}} = \sigma(x)$
  • $f'(x) = \sigma(x)(1 - \sigma(x))$
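
A minimal NumPy sketch of the function and its derivative (names are illustrative):

```python
import numpy as np

def sigmoid(x):
    # 1 / (1 + e^-x)
    return 1.0 / (1.0 + np.exp(-x))

def sigmoid_grad(x):
    # sigma(x) * (1 - sigma(x))
    s = sigmoid(x)
    return s * (1.0 - s)
```

At $x = 0$ this gives $\sigma(0) = 0.5$ and a maximum derivative of $0.25$, which is one reason deep sigmoid networks suffer from vanishing gradients.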

Tanh

  • $f(x) = \tanh(x) = \frac{e^x - e^{-x}}{e^x + e^{-x}}$
  • $f'(x) = 1 - \tanh^2(x)$
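
A short NumPy sketch, leaning on the built-in `np.tanh` for numerical stability:

```python
import numpy as np

def tanh(x):
    # (e^x - e^-x) / (e^x + e^-x)
    return np.tanh(x)

def tanh_grad(x):
    # 1 - tanh^2(x)
    return 1.0 - np.tanh(x) ** 2
```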

ReLU

  • $f(x) = \max(0, x)$
  • $f'(x) = \begin{cases} 1 & x > 0 \\ 0 & x \leq 0 \end{cases}$ (undefined at $x = 0$; $0$ is the usual convention)
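
A minimal sketch in NumPy, using the convention that the (sub)gradient at $x = 0$ is $0$:

```python
import numpy as np

def relu(x):
    # max(0, x), elementwise
    return np.maximum(0.0, x)

def relu_grad(x):
    # 1 for x > 0, 0 otherwise (subgradient convention at x = 0)
    return np.where(x > 0, 1.0, 0.0)
```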

Leaky ReLU

  • $f(x) = \begin{cases} x & x > 0 \\ \alpha x & x \leq 0 \end{cases}$
  • $f'(x) = \begin{cases} 1 & x > 0 \\ \alpha & x \leq 0 \end{cases}$
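
A NumPy sketch with $\alpha$ as a fixed hyperparameter (the default `alpha=0.01` here is a common but illustrative choice):

```python
import numpy as np

def leaky_relu(x, alpha=0.01):
    # x for x > 0, alpha * x otherwise
    return np.where(x > 0, x, alpha * x)

def leaky_relu_grad(x, alpha=0.01):
    # 1 for x > 0, alpha otherwise
    return np.where(x > 0, 1.0, alpha)
```

The small negative slope keeps a nonzero gradient for $x \leq 0$, avoiding "dead" ReLU units.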

ELU (Exponential Linear Unit)

  • $f(x) = \begin{cases} x & x > 0 \\ \alpha(e^x - 1) & x \leq 0 \end{cases}$
  • $f'(x) = \begin{cases} 1 & x > 0 \\ \alpha e^x & x \leq 0 \end{cases}$

Swish

  • $f(x) = x \cdot \sigma(x)$
  • $f'(x) = f(x) + \sigma(x)(1 - f(x))$
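
A NumPy sketch implementing both formulas above; the derivative identity can be spot-checked against a finite difference:

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def swish(x):
    # x * sigma(x)
    return x * sigmoid(x)

def swish_grad(x):
    # f'(x) = f(x) + sigma(x) * (1 - f(x))
    f = swish(x)
    return f + sigmoid(x) * (1.0 - f)
```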

PReLU (Parametric ReLU)

  • Same as Leaky ReLU but $\alpha$ is a learnable parameter
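
Since $\alpha$ is learned, training also needs the gradient of the output with respect to $\alpha$; a minimal sketch (function names are illustrative):

```python
import numpy as np

def prelu(x, alpha):
    # same form as Leaky ReLU, but alpha is a trainable parameter
    return np.where(x > 0, x, alpha * x)

def prelu_grad_alpha(x, alpha):
    # d f / d alpha = x for x <= 0, 0 otherwise
    return np.where(x > 0, 0.0, x)
```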

GeLU

  • $f(x) = x \cdot \Phi(x)$
  • $\Phi(x) = \frac{1}{2}\left[1 + \text{erf}\left(\frac{x}{\sqrt{2}}\right)\right]$
  • Approximation: $\Phi(x) \approx 0.5 \times (1 + \tanh(\sqrt{\frac{2}{\pi}} \times (x + 0.044715 \times x^3)))$
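
A sketch of both forms: the exact version uses the Gaussian CDF via `math.erf` (scalar), and the tanh approximation follows the formula above:

```python
import math
import numpy as np

def gelu_exact(x):
    # x * Phi(x), with Phi the standard normal CDF via erf
    return x * 0.5 * (1.0 + math.erf(x / math.sqrt(2.0)))

def gelu_tanh(x):
    # tanh approximation: 0.5 * x * (1 + tanh(sqrt(2/pi) * (x + 0.044715 * x^3)))
    return 0.5 * x * (1.0 + np.tanh(np.sqrt(2.0 / np.pi) * (x + 0.044715 * x**3)))
```

The tanh form is what many frameworks use in practice, since it avoids `erf` while staying close to the exact value.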

Linear

  • $f(x) = x$
  • $f'(x) = 1$

Comparison
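
One quick way to compare the functions above is to evaluate each on the same grid of inputs; a hypothetical sketch (the dictionary and point choices are illustrative):

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

# a few of the activations defined in this section
activations = {
    "sigmoid": sigmoid,
    "tanh": np.tanh,
    "relu": lambda x: np.maximum(0.0, x),
    "leaky_relu": lambda x: np.where(x > 0, x, 0.01 * x),
    "swish": lambda x: x * sigmoid(x),
    "linear": lambda x: x,
}

xs = np.array([-2.0, -0.5, 0.0, 0.5, 2.0])
for name, f in activations.items():
    print(f"{name:>10}: {np.round(f(xs), 3)}")
```

The printout makes the qualitative differences visible: sigmoid saturates toward 0/1, tanh toward -1/1, while the ReLU family and Swish stay unbounded for large positive inputs.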