Introduction
Data is messy
What you need to know
1. Missing Data
Types of missing data
Missing values
Missing rows
Aggregations and missing values
2. Duplicated Data
Duplicated rows and values
Aggregations in the data set
3. Formatting Data
Converting dates
Unit conversions
Numbers stored as text
Text improperly converted to numbers
Inconsistent spellings
4. Outliers
Screening for outliers
Handling outliers
Outliers use case
Outliers in subgroups
Detecting illogical values
5. Tidy Data
What is tidy data?
Variables, observations, and values
Common data problems
Wide vs. long data sets
Making wide data sets long
Making long data sets wide
6. Red Flags
Suspicious values
Suspicious multiples
Conclusion
What's next?
Ex_Files_Cleaning_Bad_Data_R.zip (41.7 MB)