Hadoop Basics-Lesson 1: Big Data Hadoop Stack
Hadoop Stack Basics
The Apache Framework: Basic Modules
Hadoop Distributed File System (HDFS)
The Hadoop "Zoo"
Hadoop Ecosystem Major Components
Apache Hadoop Ecosystem
Lesson 1 Slides (PDF)
Hadoop Basics-Lesson 2: Hands-On Exploration of the Cloudera VM
Hardware & Software Requirements
Exploring the Cloudera VM: Hands-On Part 1
Exploring the Cloudera VM: Hands-On Part 2
Lesson 2 Slides - Cloudera VM Tour
Introduction to the Hadoop Stack-Lesson 1: Overview of the Hadoop Stack
Overview of the Hadoop Stack
The Hadoop Distributed File System (HDFS) and HDFS2
MapReduce Framework and YARN
Hadoop Basics - Lesson 1 Slides
Introduction to the Hadoop Stack-Lesson 2: The Hadoop Execution Environment
The Hadoop Execution Environment
YARN, Tez, and Spark
Hadoop Resource Scheduling
Lesson 2: Hadoop Execution Environment - Slides
Introduction to the Hadoop Stack-Lesson 3: Overview of Hadoop-Based Applications and Services
Hadoop-Based Applications
Introduction to Apache Pig
Introduction to Apache Hive
Introduction to Apache HBase
Lesson 3: Hadoop-Based Applications Overview - All Slides
Command list for Applications Slides
Tips to handle service connection errors
References for Applications
Introduction to Hadoop Distributed File System (HDFS)-Lesson 1: HDFS Architecture and Configuration
Overview of HDFS Architecture
The HDFS Performance Envelope
Read/Write Processes in HDFS
Lesson 1: Introduction to HDFS - Slides
HDFS references
Introduction to Hadoop Distributed File System (HDFS)-Lesson 2: HDFS Performance and Tuning
HDFS Tuning Parameters
HDFS Performance and Robustness
Lesson 2: HDFS Performance and Tuning - Slides
Introduction to Hadoop Distributed File System (HDFS)-Lesson 3: HDFS Access, Commands, APIs, and Applications
Overview of HDFS Access, APIs, and Applications
HDFS Commands
Native Java API for HDFS
REST API for HDFS
HDFS Access, APIs
Lesson 3: HDFS Access, APIs, Applications - Slides
Introduction to Map/Reduce-Lesson 1: Introduction to Map/Reduce
Introduction to Map/Reduce
The Map/Reduce Framework
A MapReduce Example: Wordcount in detail
Lesson 1: Introduction to MapReduce - Slides
A Note on Debugging Map/Reduce Programs
Introduction to Map/Reduce-Lesson 2: Map/Reduce Examples and Principles
MapReduce: Intro to Examples and Principles
MapReduce Example: Trending Wordcount
MapReduce Example: Joining Data
MapReduce Example: Vector Multiplication
Computational Costs of Vector Multiplication
MapReduce Summary
Lesson 2: MapReduce Examples and Principles - Slides
Spark-Lesson 1: Introduction to Apache Spark
Introduction to Apache Spark
Architecture of Spark
Set Up PySpark on the Cloudera VM
Lesson 1: Intro to Apache Spark - Slides
Spark-Lesson 2: Resilient Distributed Datasets and Transformations
Resilient Distributed Datasets
Spark Transformations
Wide Transformations
Lesson 2: RDD and Transformations - Slides
Spark-Lesson 3: Job Scheduling, Actions, Caching, and Shared Variables
Directed Acyclic Graph (DAG) Scheduler
Actions in Spark
Memory Caching in Spark
Broadcast Variables
Accumulators
Lesson 3: Scheduling, Actions, Caching - Slides