Inference & Model Compression

Latency, throughput, quantization, pruning, and distillation

Latency

Bandwidth

Throughput

Model Compression

Quantization

Pruning

Distillation

TODO: Add content