Home | Rohit Kumar | rohit.vision

Flash Attention: Making Transformers Scale

Dec 18, 2024 9 min

In my previous post on Transformer Attention, we explored the mathematical foundations of attention. The key limitation? Memory that grows quadratically, $O(n^2)$, with the sequence length $n$...

transformers attention optimization flash-attention

Transformer Attention: A Mathematical Deep Dive

Dec 17, 2024 13 min

The attention mechanism is the core innovation behind transformers. Let’s break it down mathematically and implement it from scratch.

transformers attention deep-learning tutorial