Introduction to the course and grading environment - Introduction to the course
Course Overview and a warm welcome
Intro to Apache Spark
Overview of technology used within the course
IMPORTANT: How to submit your programming assignments
Tools that support Big Data solutions - Apache Spark Basics
Data storage solutions
Parallel data processing strategies of Apache Spark
Programming language options on Apache Spark
Functional programming basics
Apache Parquet (optional)
Introduction to Cloudant
Resilient Distributed Datasets and DataFrames - Apache Spark SQL
OPTIONAL: Test Data Generator (the data is already provided for you)
Create the data on your own (optional)
Scaling Math for Statistics on Apache Spark
Overview of the week
Averages
Standard deviation
Skewness
Kurtosis
Covariance, covariance matrices, correlation
Multidimensional vector spaces
Exercise 2
Data Visualization of Big Data
Overview of the week
Plotting with Apache Spark and Python's matplotlib
Exercise on Plotting
Dimensionality reduction
PCA
Exercise on PCA
Data Visualization of Big Data - (Optional) Watson Studio
Assignment and Exercise Environment Setup
(Optional) Week 1: Set up Apache Spark and the Jupyter notebook in Watson Studio for Assignment 1_1
(Optional) Week 1: Set up Programming Assignment 1_2 in Watson Studio
(Optional) Week 2: Set up the Programming Assignment in Watson Studio
(Optional) Week 3: Set up the Programming Assignment in Watson Studio
(Optional) Week 4: Set up the Programming Assignment in Watson Studio