Welcome to the Course! - Course Introduction
Specialization Introduction
Course Introduction
Meet your instructors!
Your Specialization Roadmap
Reinforcement Learning Textbook
Read Me: Pre-requisites and Learning Objectives
An Introduction to Sequential Decision-Making - The K-Armed Bandit Problem
Module 1 Learning Objectives
Weekly Reading
Sequential Decision Making with Evaluative Feedback
An Introduction to Sequential Decision-Making - What to Learn? Estimating Action Values
Learning Action Values
Estimating Action Values Incrementally
An Introduction to Sequential Decision-Making - Exploration vs. Exploitation Tradeoff
What is the trade-off?
Optimistic Initial Values
Upper-Confidence Bound (UCB) Action Selection
Jonathan Langford: Contextual Bandits for Real World Reinforcement Learning
Week 1 Summary
Chapter Summary
Markov Decision Processes - Introduction to Markov Decision Processes
Module 2 Learning Objectives
Weekly Reading
Markov Decision Processes
Examples of MDPs
Markov Decision Processes - Goal of Reinforcement Learning
The Goal of Reinforcement Learning
Michael Littman: The Reward Hypothesis
Markov Decision Processes - Continuing Tasks
Continuing Tasks
Examples of Episodic and Continuing Tasks
Week 2 Summary
Value Functions & Bellman Equations - Policies and Value Functions
Module 3 Learning Objectives
Weekly Reading
Specifying Policies
Value Functions
Rich Sutton and Andy Barto: A Brief History of RL
Value Functions & Bellman Equations - Bellman Equations
Bellman Equation Derivation
Why Bellman Equations?
Value Functions & Bellman Equations - Optimality (Optimal Policies & Value Functions)
Optimal Policies
Optimal Value Functions
Using Optimal Value Functions to Get Optimal Policies
Week 3 Summary
Chapter Summary
Dynamic Programming - Policy Evaluation (Prediction)
Module 4 Learning Objectives
Weekly Reading
Policy Evaluation vs. Control
Iterative Policy Evaluation
Dynamic Programming - Policy Iteration (Control)
Policy Improvement
Policy Iteration
Dynamic Programming - Generalized Policy Iteration
Flexibility of the Policy Iteration Framework
Efficiency of Dynamic Programming
Warren Powell: Approximate Dynamic Programming for Fleet Management (Short)
Warren Powell: Approximate Dynamic Programming for Fleet Management (Long)
Week 4 Summary
Chapter Summary
Dynamic Programming - Course Wrap-up
Congratulations!