Welcome!-Welcome to the course
Important Update regarding the Machine Learning Specialization
Slides presented in this module
Welcome to the classification course, a part of the Machine Learning Specialization
What is this course about?
Impact of classification
Get help and meet other learners. Join your Community!
Welcome!-Course overview and details
Course overview
Outline of first half of course
Outline of second half of course
Assumed background
Let's get started!
Reading: Software tools you'll need
Linear Classifiers & Logistic Regression-Linear classifiers
Slides presented in this module
Linear classifiers: A motivating example
Intuition behind linear classifiers
Decision boundaries
Linear classifier model
Effect of coefficient values on decision boundary
Using features of the inputs
Linear Classifiers & Logistic Regression-Class probabilities
Predicting class probabilities
Review of basics of probabilities
Review of basics of conditional probabilities
Using probabilities in classification
Linear Classifiers & Logistic Regression-Logistic regression
Predicting class probabilities with (generalized) linear models
The sigmoid (or logistic) link function
Logistic regression model
Effect of coefficient values on predicted probabilities
Overview of learning logistic regression models
Linear Classifiers & Logistic Regression-Practical issues for classification
Encoding categorical inputs
Multiclass classification with one-versus-all
Linear Classifiers & Logistic Regression-Summarizing linear classifiers & logistic regression
Recap of logistic regression classifier
Linear Classifiers & Logistic Regression-Programming Assignment
Predicting sentiment from product reviews
Learning Linear Classifiers-Maximum likelihood estimation
Slides presented in this module
Goal: Learning parameters of logistic regression
Intuition behind maximum likelihood estimation
Data likelihood
Finding best linear classifier with gradient ascent
Learning Linear Classifiers-Gradient ascent algorithm for learning logistic regression classifier
Review of gradient ascent
Learning algorithm for logistic regression
Example of computing derivative for logistic regression
Interpreting derivative for logistic regression
Summary of gradient ascent for logistic regression
Learning Linear Classifiers-Choosing step size for gradient ascent/descent
Choosing step size
Careful with step sizes that are too large
Rule of thumb for choosing step size
Learning Linear Classifiers-(VERY OPTIONAL LESSON) Deriving gradient of logistic regression
(VERY OPTIONAL) Deriving gradient of logistic regression: Log trick
(VERY OPTIONAL) Expressing the log-likelihood
(VERY OPTIONAL) Deriving probability y=-1 given x
(VERY OPTIONAL) Rewriting the log likelihood into a simpler form
(VERY OPTIONAL) Deriving gradient of log likelihood
Learning Linear Classifiers-Summarizing learning linear classifiers
Recap of learning logistic regression classifiers
Learning Linear Classifiers-Programming Assignment
Implementing logistic regression from scratch
Overfitting & Regularization in Logistic Regression-Overfitting in classification
Slides presented in this module
Evaluating a classifier
Review of overfitting in regression
Overfitting in classification
Visualizing overfitting with high-degree polynomial features
Overfitting & Regularization in Logistic Regression-Overconfident predictions due to overfitting
Overfitting in classifiers leads to overconfident predictions
Visualizing overconfident predictions
(OPTIONAL) Another perspective on overfitting in logistic regression
Overfitting & Regularization in Logistic Regression-L2 regularized logistic regression
Penalizing large coefficients to mitigate overfitting
L2 regularized logistic regression
Visualizing effect of L2 regularization in logistic regression
Learning L2 regularized logistic regression with gradient ascent
Overfitting & Regularization in Logistic Regression-Sparse logistic regression
Sparse logistic regression with L1 regularization
Overfitting & Regularization in Logistic Regression-Summarizing overfitting & regularization in logistic regression
Recap of overfitting & regularization in logistic regression
Overfitting & Regularization in Logistic Regression-Programming Assignment
Logistic Regression with L2 regularization
Decision Trees-Intuition behind decision trees
Slides presented in this module
Predicting loan defaults with decision trees
Intuition behind decision trees
Task of learning decision trees from data
Decision Trees-Learning decision trees
Recursive greedy algorithm
Learning a decision stump
Selecting best feature to split on
When to stop recursing
Decision Trees-Using the learned decision tree
Making predictions with decision trees
Multiclass classification with decision trees
Decision Trees-Learning decision trees with continuous inputs
Threshold splits for continuous inputs
(OPTIONAL) Picking the best threshold to split on
Visualizing decision boundaries
Decision Trees-Summarizing decision trees
Recap of decision trees
Decision Trees-Programming Assignment 1
Identifying safe loans with decision trees
Decision Trees-Programming Assignment 2
Implementing binary decision trees
Preventing Overfitting in Decision Trees-Overfitting in decision trees
Slides presented in this module
A review of overfitting
Overfitting in decision trees
Preventing Overfitting in Decision Trees-Early stopping to avoid overfitting
Principle of Occam's razor: Learning simpler decision trees
Early stopping in learning decision trees
Preventing Overfitting in Decision Trees-(OPTIONAL LESSON) Pruning decision trees
(OPTIONAL) Motivating pruning
(OPTIONAL) Pruning decision trees to avoid overfitting
(OPTIONAL) Tree pruning algorithm
Preventing Overfitting in Decision Trees-Summarizing preventing overfitting in decision trees
Recap of overfitting and regularization in decision trees
Preventing Overfitting in Decision Trees-Programming Assignment
Decision Trees in Practice
Handling Missing Data-Basic strategies for handling missing data
Slides presented in this module
Challenge of missing data
Strategy 1: Purification by skipping missing data
Strategy 2: Purification by imputing missing data
Handling Missing Data-Strategy 3: Modify learning algorithm to explicitly handle missing data
Modifying decision trees to handle missing data
Feature split selection with missing data
Handling Missing Data-Summarizing handling missing data
Recap of handling missing data
Boosting-The amazing idea of boosting a classifier
Slides presented in this module
The boosting question
Ensemble classifiers
Boosting
Boosting-AdaBoost
AdaBoost overview
Weighted error
Computing coefficient of each ensemble component
Reweighting data to focus on mistakes
Normalizing weights
Boosting-Applying AdaBoost
Example of AdaBoost in action
Learning boosted decision stumps with AdaBoost
Boosting-Programming Assignment 1
Exploring Ensemble Methods
Boosting-Convergence and overfitting in boosting
The Boosting Theorem
Overfitting in boosting
Boosting-Summarizing boosting
Ensemble methods, impact of boosting & quick recap
Boosting-Programming Assignment 2
Boosting a decision stump
Precision-Recall-Why use precision & recall as quality metrics
Slides presented in this module
Case study where accuracy is not the best metric for classification
What is good performance for a classifier?
Precision-Recall-Precision & recall explained
Precision: Fraction of positive predictions that are actually positive
Recall: Fraction of positive data predicted to be positive
Precision-Recall-The precision-recall tradeoff
Precision-recall extremes
Trading off precision and recall
Precision-recall curve
Precision-Recall-Summarizing precision-recall
Recap of precision-recall
Precision-Recall-Programming Assignment
Exploring precision and recall
Scaling to Huge Datasets & Online Learning-Scaling ML to huge datasets
Slides presented in this module
Gradient ascent won't scale to today's huge datasets
Timeline of scalable machine learning & stochastic gradient
Scaling to Huge Datasets & Online Learning-Scaling ML with stochastic gradient
Why gradient ascent won't scale
Stochastic gradient: Learning one data point at a time
Comparing gradient to stochastic gradient
Scaling to Huge Datasets & Online Learning-Understanding why stochastic gradient works
Why would stochastic gradient ever work?
Convergence paths
Scaling to Huge Datasets & Online Learning-Stochastic gradient: Practical tricks
Shuffle data before running stochastic gradient
Choosing step size
Don't trust last coefficients
(OPTIONAL) Learning from batches of data
(OPTIONAL) Measuring convergence
(OPTIONAL) Adding regularization
Scaling to Huge Datasets & Online Learning-Online learning: Fitting models from streaming data
The online learning task
Using stochastic gradient for online learning
Scaling to Huge Datasets & Online Learning-Summarizing scaling to huge datasets & online learning
Scaling to huge datasets through parallelization & module recap
Scaling to Huge Datasets & Online Learning-Programming Assignment
Training Logistic Regression via Stochastic Gradient Ascent