Welcome to the Course! - Course Introduction
Course Introduction
Meet your instructors!
Reinforcement Learning Textbook
Read Me: Pre-requisites and Learning Objectives
Monte Carlo Methods for Prediction & Control - Introduction to Monte Carlo Methods
Module 1 Learning Objectives
Weekly Reading
What is Monte Carlo?
Using Monte Carlo for Prediction
Monte Carlo Methods for Prediction & Control - Monte Carlo for Control
Using Monte Carlo for Action Values
Using Monte Carlo methods for generalized policy iteration
Solving the Blackjack Example
Monte Carlo Methods for Prediction & Control - Exploration Methods for Monte Carlo
Epsilon-soft policies
Monte Carlo Methods for Prediction & Control - Off-policy Learning for Prediction
Why does off-policy learning matter?
Importance Sampling
Off-Policy Monte Carlo Prediction
Emma Brunskill: Batch Reinforcement Learning
Week 1 Summary
Chapter Summary
Temporal Difference Learning Methods for Prediction - Introduction to Temporal Difference Learning
Module 2 Learning Objectives
Weekly Reading
What is Temporal Difference (TD) learning?
Rich Sutton: The Importance of TD Learning
Temporal Difference Learning Methods for Prediction - Advantages of TD
The advantages of temporal difference learning
Comparing TD and Monte Carlo
Andy Barto and Rich Sutton: More on the History of RL
Week 2 Summary
Temporal Difference Learning Methods for Control - TD for Control
Module 3 Learning Objectives
Weekly Reading
Sarsa: GPI with TD
Sarsa in the Windy Grid World
Temporal Difference Learning Methods for Control - Off-policy TD Control: Q-learning
What is Q-learning?
Q-learning in the Windy Grid World
How is Q-learning off-policy?
Temporal Difference Learning Methods for Control - Expected Sarsa
Expected Sarsa
Expected Sarsa in the Cliff World
Generality of Expected Sarsa
Week 3 Summary
Chapter Summary
Planning, Learning & Acting - What is a Model?
Module 4 Learning Objectives
Weekly Reading
What is a Model?
Comparing Sample and Distribution Models
Planning, Learning & Acting - Planning
Random Tabular Q-planning
Planning, Learning & Acting - Dyna as a formalism for planning
The Dyna Architecture
The Dyna Algorithm
Dyna & Q-learning in a Simple Maze
Planning, Learning & Acting - Dealing with inaccurate models
What if the model is inaccurate?
In-depth with changing environments
Drew Bagnell: Self-Driving, Robotics, and Model-Based RL
Week 4 Summary
Chapter Summary
Textbook Part 1 Summary
Planning, Learning & Acting - Course Wrap-up
Congratulations!