Orientation to Data in Clusters and Cloud Storage-Introduction
Welcome to the Course
Review and Preparation
Instructions for Downloading and Installing the Exercise Environment
Troubleshooting the VM
Orientation to Data in Clusters and Cloud Storage-Browsing Tables in the Metastore
Browsing Tables with Hue
Browsing Tables with SQL Utility Statements
Orientation to Data in Clusters and Cloud Storage-Browsing Files in HDFS
Browsing HDFS with the Hue File Browser
Browsing HDFS from the Command Line
Orientation to Data in Clusters and Cloud Storage-Browsing Files in S3
Understanding S3 and Other Cloud Storage Platforms
Browsing S3 Buckets from the Command Line
Defining Databases, Tables, and Columns-Introduction
Week 2 Introduction
Defining Databases, Tables, and Columns-Creating Databases and Tables
Creating Databases and Tables with Hue
Creating Databases and Tables with SQL
Permissions to Create Databases and Tables
Defining Databases, Tables, and Columns-The CREATE TABLE Statement
Introduction to the CREATE TABLE Statement
The ROW FORMAT Clause
The STORED AS Clause
The LOCATION Clause
CREATE TABLE Shortcuts
Using Different Schemas on the Same Data
Defining Databases, Tables, and Columns-Advanced CREATE TABLE Techniques
Specifying TBLPROPERTIES
Using Hive SerDes
Working with Unstructured and Semi-Structured Data
Defining Databases, Tables, and Columns-Managing Existing Tables
Examining, Modifying, and Removing Tables
Examining Table Structure
Dropping Databases and Tables
Modifying Existing Tables
Defining Databases, Tables, and Columns-Apache Hive and Apache Impala Interoperability
Hive and Impala Interoperability
Impala Metadata Refresh
Data Types and File Types-Introduction
Week 3 Introduction
Data Types and File Types-Data Types
Overview of Data Types
Integer Data Types
Decimal Data Types
Character String Data Types
Other Data Types
Data Types and File Types-Working with Data Types
Choosing the Right Data Types
Examining Data Types
Out-of-Range Values
Data Types and File Types-File Types
Overview of File Types
Text Files
Avro Files
Parquet Files
ORC Files
Other File Types
Data Types and File Types-Working with File Types
Choosing the Right File Types
Creating Tables with Avro and Parquet Files
Managing Datasets in Clusters and Cloud Storage-Introduction
Week 4 Introduction
Refresh Impala's Metadata Cache after Loading Data
Managing Datasets in Clusters and Cloud Storage-Loading Files into HDFS
Loading Files into HDFS with Hue's Table Browser
Loading Files into HDFS with Hue's File Browser
Loading Files into HDFS from the Command Line
More about HDFS Shell Commands
Chaining and Scripting with HDFS Commands
HDFS Permissions
Managing Datasets in Clusters and Cloud Storage-Loading Data into Cloud Storage
Loading Files into S3 from the Command Line
Other Ways to Load Files into S3
S3 Permissions
Missing Values
Character Sets
Managing Datasets in Clusters and Cloud Storage-Using Sqoop to Load Data from Relational Databases
Using Sqoop to Import Data
More Sqoop Import Options
Using Sqoop to Export Data
Managing Datasets in Clusters and Cloud Storage-Using Hive and Impala to Load Data into Tables
Using Hive and Impala to Load Data into Tables
SQL LOAD DATA Statements
SQL INSERT Statements
SQL INSERT ... SELECT and CTAS Statements
Managing Datasets in Clusters and Cloud Storage-Quizzes, Peer-Graded Assignment, and Conclusion
Conclusion
Optimizing Hive and Impala (Honors)-Introduction
Week 5 Introduction
Optimizing Hive and Impala (Honors)-Simplifying Queries with Views
What to Do When Queries Are Too Complex
Creating and Querying Views
Modifying and Removing Views
Materialized and Non-Materialized Views
The ORDER BY Clause in Views
Optimizing Hive and Impala (Honors)-Improving Query Performance
What to Do When Queries Take Too Long
Choosing Which Query Engine to Use
Understanding Map Tasks and Reduce Tasks
Hive Query Performance Patterns
Understanding Execution Plans
Table and Column Statistics
Other Strategies for Query Optimization
Optimizing Hive and Impala (Honors)-Table Partitioning
When to Use Table Partitioning
Creating Partitioned Tables
Loading Data with Dynamic Partitioning
Loading Data with Static Partitioning
Risks of Using Partitioning
Optimizing Hive and Impala (Honors)-Complex Data and Denormalization
When to Use Complex Columns
Complex Data Types
Creating Tables with Complex Data
Querying Complex Data with Hive
Querying Complex Data with Impala
Complex Data in Practice
Optimizing Hive and Impala (Honors)-Storage Engines
File Systems versus Storage Engines
Overview of Apache Kudu