Welcome to the Course!-Course Introduction
Course 3 Introduction
Meet your instructors!
Read Me: Pre-requisites and Learning Objectives
Reinforcement Learning Textbook
On-policy Prediction with Approximation-Estimating value functions with supervised learning
Module 1 Learning Objectives
Weekly Reading: On-policy Prediction with Approximation
Moving to Parameterized Functions
Generalization and Discrimination
Framing Value Estimation as Supervised Learning
On-policy Prediction with Approximation-The Objective for On-policy Prediction
The Value Error Objective
Introducing Gradient Descent
Gradient Monte Carlo for Policy Evaluation
State Aggregation with Monte Carlo
On-policy Prediction with Approximation-The Objective for TD
Semi-Gradient TD for Policy Evaluation
Comparing TD and Monte Carlo with State Aggregation
Doina Precup: Building Knowledge for AI Agents with Reinforcement Learning
On-policy Prediction with Approximation-Linear TD
The Linear TD Update
The True Objective for TD
Week 1 Summary
Constructing Features for Prediction-Feature Construction for Linear Methods
Module 2 Learning Objectives
Weekly Reading: On-policy Prediction with Approximation II
Coarse Coding
Generalization Properties of Coarse Coding
Tile Coding
Using Tile Coding in TD
Constructing Features for Prediction-Neural Networks
What is a Neural Network?
Non-linear Approximation with Neural Networks
Deep Neural Networks
Constructing Features for Prediction-Training Neural Networks
Gradient Descent for Training Neural Networks
Optimization Strategies for NNs
David Silver on Deep Learning + RL = AI?
Week 2 Review
Control with Approximation-Episodic Sarsa with Function Approximation
Module 3 Learning Objectives
Weekly Reading: On-policy Control with Approximation
Episodic Sarsa with Function Approximation
Episodic Sarsa in Mountain Car
Expected Sarsa with Function Approximation
Control with Approximation-Exploration under Function Approximation
Exploration under Function Approximation
Control with Approximation-Average Reward
Average Reward: A New Way of Formulating Control Problems
Satinder Singh on Intrinsic Rewards
Week 3 Review
Policy Gradient-Learning Parameterized Policies
Module 4 Learning Objectives
Weekly Reading: Policy Gradient Methods
Learning Policies Directly
Advantages of Policy Parameterization
Policy Gradient-Policy Gradient for Continuing Tasks
The Objective for Learning Policies
The Policy Gradient Theorem
Policy Gradient-Actor-Critic for Continuing Tasks
Estimating the Policy Gradient
Actor-Critic Algorithm
Policy Gradient-Policy Parameterizations
Actor-Critic with Softmax Policies
Demonstration with Actor-Critic
Gaussian Policies for Continuous Actions
Week 4 Summary
Policy Gradient-Course Wrap-up
Congratulations! Course 4 Preview