Getting Started + Spark Basics-Getting Started
Working on Assignments
Tools Setup (Please read)
Scala 3 REPL and Worksheets
Cheat Sheet
SBT tutorial and Submission of Assignments (Please read)
Learning Resources
Scala Style Guide
Getting Started + Spark Basics-From Parallel to Distributed
Introduction, Logistics, What You'll Learn
Data-Parallel to Distributed Data-Parallel
Latency
Getting Started + Spark Basics-Basics of Spark's RDDs
RDDs, Spark's Distributed Collection
RDDs: Transformations and Actions
Evaluation in Spark: Unlike Scala Collections!
Cluster Topology Matters!
Reduction Operations & Distributed Key-Value Pairs
Reduction Operations
Pair RDDs
Transformations and Actions on Pair RDDs
Joins
Partitioning and Shuffling
Shuffling: What it is and why it's important
Partitioning
Optimizing with Partitioners
Wide vs Narrow Dependencies
Structured data: SQL, DataFrames, and Datasets
Structured vs Unstructured Data
Spark SQL
DataFrames (1)
DataFrames (2)
Datasets