Process
- Choose the number of clusters k
- Randomly choose k data points as the initial centroids
- Repeat until the centroids stop changing (or a maximum number of iterations is reached):
  - Assign each data point to its nearest centroid
  - Recompute each centroid as the mean of the points assigned to it
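The loop above can be sketched in plain Python. This is a minimal illustration rather than a reference implementation: the function names, the `seed` and `max_iter` parameters, the toy data, and the choice to leave an empty cluster's centroid where it was are all assumptions made for the example.

```python
import random

def squared_distance(p, q):
    """Squared Euclidean distance between two equal-length points."""
    return sum((a - b) ** 2 for a, b in zip(p, q))

def mean_point(points):
    """Coordinate-wise mean of a non-empty list of points."""
    n = len(points)
    return tuple(sum(coords) / n for coords in zip(*points))

def kmeans(points, k, max_iter=100, seed=0):
    """Return (centroids, labels) after running the assign/update loop."""
    rng = random.Random(seed)
    # Randomly choose k of the data points as the initial centroids.
    centroids = [tuple(p) for p in rng.sample(points, k)]
    labels = []
    for _ in range(max_iter):
        # Assignment step: index of the nearest centroid for each point.
        labels = [
            min(range(k), key=lambda j: squared_distance(p, centroids[j]))
            for p in points
        ]
        # Update step: each centroid becomes the mean of its cluster.
        new_centroids = []
        for j in range(k):
            members = [p for p, label in zip(points, labels) if label == j]
            # Assumption: an empty cluster keeps its previous centroid.
            new_centroids.append(mean_point(members) if members else centroids[j])
        if new_centroids == centroids:
            break  # converged: the centroids stopped changing
        centroids = new_centroids
    return centroids, labels

# Toy data (hypothetical): two well-separated groups of 2-D points.
points = [(1.0, 1.0), (1.5, 2.0), (1.0, 1.5), (8.0, 8.0), (9.0, 8.5), (8.5, 9.0)]
centroids, labels = kmeans(points, k=2)
print(centroids, labels)
```

Which cluster receives index 0 depends on the random initialization, so only the grouping of the labels is meaningful, not their numeric values.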
Optimal K (Elbow Method)
- Run k-means for a range of k values and compute the Within-Cluster Sum of Squares (WSS) for each
- Plot WSS vs k
- Choose k at the “elbow” point, where adding more clusters no longer reduces WSS substantially
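As a rough sketch of the elbow procedure, the code below computes WSS for several values of k on a small toy dataset. It bundles its own compact k-means so that it runs on its own. The helper names, the fixed `seed`, and the data are assumptions for illustration, and the plotting step is left out: the printed values are what would go on the WSS vs k plot.

```python
import random

def sq_dist(p, q):
    """Squared Euclidean distance between two points."""
    return sum((a - b) ** 2 for a, b in zip(p, q))

def kmeans_centroids(points, k, max_iter=100, seed=0):
    """Compact k-means that returns only the final centroids."""
    rng = random.Random(seed)
    centroids = [tuple(p) for p in rng.sample(points, k)]
    for _ in range(max_iter):
        labels = [min(range(k), key=lambda j: sq_dist(p, centroids[j])) for p in points]
        updated = []
        for j in range(k):
            members = [p for p, lab in zip(points, labels) if lab == j]
            if members:
                updated.append(tuple(sum(c) / len(members) for c in zip(*members)))
            else:
                updated.append(centroids[j])  # assumption: keep an empty cluster's centroid
        if updated == centroids:
            break
        centroids = updated
    return centroids

def wss(points, centroids):
    """Sum over all points of the squared distance to the nearest centroid."""
    return sum(min(sq_dist(p, c) for c in centroids) for p in points)

# Toy data (hypothetical): three tight groups, so the elbow should appear near k = 3.
points = [
    (1.0, 1.0), (1.2, 0.8), (0.8, 1.1),
    (5.0, 5.0), (5.2, 4.9), (4.8, 5.1),
    (9.0, 1.0), (9.1, 1.2), (8.9, 0.9),
]
wss_by_k = {k: wss(points, kmeans_centroids(points, k)) for k in range(1, 7)}
for k, value in wss_by_k.items():
    print(f"k={k}  WSS={value:.3f}")
```

Because a single random initialization can land in a poor local optimum, the WSS value for a given k is not guaranteed to be the best achievable one; running several seeds per k and keeping the lowest WSS is a common way to smooth the curve.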
Advantages
- Simple and intuitive
- Fast: $O(nkd)$ time per iteration, for n points, k clusters, and d dimensions
- Memory efficient: beyond the data itself, only the centroids and assignments are stored
Drawbacks
- Sensitive to initialization (may get stuck in local optima)
- Assumes spherical, similarly-sized clusters
- Requires k to be specified in advance
- Sensitive to outliers
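The outlier sensitivity follows from each centroid being a mean: one distant point can drag it far from the bulk of its cluster. A small arithmetic illustration, with made-up points and a hypothetical `centroid` helper:

```python
def centroid(points):
    """Coordinate-wise mean of a non-empty list of points."""
    n = len(points)
    return tuple(sum(coords) / n for coords in zip(*points))

cluster = [(1.0, 1.0), (2.0, 2.0), (3.0, 3.0)]
with_outlier = cluster + [(100.0, 100.0)]  # one far-away point added

clean_centroid = centroid(cluster)          # (2.0, 2.0)
shifted_centroid = centroid(with_outlier)   # (26.5, 26.5)
print(clean_centroid, shifted_centroid)
```

With the outlier included, the centroid sits far from all three original points, which in a full k-means run can also change which points are assigned to this cluster.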