CS224R: Deep Reinforcement Learning | Rohit Kumar | AI Research Blog

Lec 01 - Introduction to RL

Core concepts, MDP vs POMDP, policy and value functions, RL algorithm types

Lec 02 - Imitation Learning

Behavioral cloning, DAgger, HG-DAgger, and addressing compounding errors

Lec 03 - Policy Gradients

Policy gradient derivation, REINFORCE, variance reduction, off-policy methods

Lec 04 - Actor-Critic

Value functions, advantage estimation, actor-critic algorithm, bootstrapping

References

Chelsea Finn, Sergey Levine. CS224R Lectures, Stanford University (2025).