Data Science Context and Concepts-Lesson 1: Examples and the Diversity of Data Science
Appetite Whetting: Politics
Appetite Whetting: Extreme Weather
Appetite Whetting: Digital Humanities
Appetite Whetting: Bibliometrics
Appetite Whetting: Food, Music, Public Health
Appetite Whetting: Public Health cont'd, Earthquakes, Legal
Data Science Context and Concepts-Lesson 2: Working Definitions of Data Science
Characterizing Data Science
Characterizing Data Science, cont'd
Distinguishing Data Science from Related Topics
Four Dimensions of Data Science
Data Science Context and Concepts-Lesson 3: Characterizing this Course
Tools vs. Abstractions
Desktop Scale vs. Cloud Scale
Hackers vs. Analysts
Structs vs. Stats
Structs vs. Stats cont'd
Data Science Context and Concepts-Lesson 4: Related Topics
A Fourth Paradigm of Science
Data-Intensive Science Examples
Big Data and the 3 Vs
Big Data Definitions
Big Data Sources
Data Science Context and Concepts-Lesson 5: Course Logistics
Supplementary: Three-Course Reading List
Supplementary: Resources for Learning Python
Course Logistics
Data Science Context and Concepts-Assignment 1: Twitter Sentiment Analysis
Supplementary: Class Virtual Machine
Supplementary: GitHub Instructions
Twitter Assignment: Getting Started
Relational Databases and the Relational Algebra-Lesson 6: Principles of Data Manipulation and Management
Data Models, Terminology
From Data Models to Databases
Pre-Relational Databases
Motivating Relational Databases
Relational Databases: Key Ideas
Relational Databases and the Relational Algebra-Lesson 7: Relational Algebra
Algebraic Optimization Overview
Relational Algebra Overview
Relational Algebra Operators: Union, Difference, Selection
Relational Algebra Operators: Projection, Cross Product
Relational Algebra Operators: Cross Product cont'd, Join
Relational Algebra Operators: Outer Join
Relational Algebra Operators: Theta-Join
Relational Databases and the Relational Algebra-Lesson 8: SQL for Data Science
From SQL to RA
Thinking in RA: Logical Query Plans
Practical SQL: Binning Timeseries
Practical SQL: Genomic Intervals
User-Defined Functions
Support for User-Defined Functions
Relational Databases and the Relational Algebra-Lesson 9: Key Principles of Relational Databases
Optimization: Physical Query Plans
Optimization: Choosing Physical Plans
Declarative Languages
Declarative Languages: More Examples
Views: Logical Data Independence
Indexes
MapReduce and Parallel Dataflow Programming-Lesson 10: Reasoning about Scale
What Does Scalable Mean?
A Sketch of Algorithmic Complexity
A Sketch of Data-Parallel Algorithms
"Pleasingly Parallel" Algorithms
More General Distributed Algorithms
MapReduce and Parallel Dataflow Programming-Lesson 11: The MapReduce Programming Model
MapReduce Abstraction
MapReduce Data Model
Map and Reduce Functions
MapReduce Simple Example
MapReduce Simple Example cont'd
MapReduce Example: Word Length Histogram
MapReduce Examples: Inverted Index, Join
MapReduce and Parallel Dataflow Programming-Lesson 12: Algorithms in MapReduce
Relational Join: Map Phase
Relational Join: Reduce Phase
Simple Social Network Analysis: Counting Friends
Matrix Multiply Overview
Matrix Multiply Illustrated
Shared Nothing Computing
MapReduce Implementation
MapReduce Phases
MapReduce and Parallel Dataflow Programming-Lesson 13: Parallel Databases vs. MapReduce
A Design Space for Large-Scale Data Systems
Parallel and Distributed Query Processing
Teradata Example, MR Extensions
RDBMS vs. MapReduce: Features
RDBMS vs. Hadoop: Grep
RDBMS vs. Hadoop: Select, Aggregate, Join
NoSQL: Systems and Concepts-Lesson 14: What problems do NoSQL systems aim to solve?
NoSQL Context and Roadmap
NoSQL Roundup
Relaxing Consistency Guarantees
Two-Phase Commit and Consensus Protocols
Eventual Consistency
CAP Theorem
NoSQL: Systems and Concepts-Lesson 15: Early key-value systems and key concepts
Types of NoSQL Systems
ACID, Major Impact Systems
Memcached: Consistent Hashing
Consistent Hashing, cont'd
DynamoDB: Vector Clocks
Vector Clocks, cont'd
NoSQL: Systems and Concepts-Lesson 16: Document Stores and Extensible Record Stores
CouchDB Overview
CouchDB Views
BigTable Overview
BigTable Implementation
NoSQL: Systems and Concepts-Lesson 17: Extended NoSQL Systems
HBase, Megastore
Spanner
Spanner cont'd, Google Systems
MapReduce-based Systems
Bringing Back Joins
NoSQL Rebuttal
NoSQL: Systems and Concepts-Lesson 18: Pig: Programming with Relational Algebra
Almost SQL: Pig
Pig Architecture and Performance
Data Model
Load, Filter, Group
Group, Distinct, Foreach, Flatten
NoSQL: Systems and Concepts-Lesson 19: Pig Analytics
CoGroup, Join
Join Algorithms
Skew
Other Commands
Evaluation Walkthrough
Review
NoSQL: Systems and Concepts-Lesson 20: Spark
Context
Spark Examples
RDDs, Benefits
Graph Analytics-Lesson 21: Structural Tasks
Graph Overview
Structural Analysis
Degree Histograms, Structure of the Web
Connectivity and Centrality
Graph Analytics-Lesson 22: Traversal Tasks
PageRank
PageRank in more Detail
Traversal Tasks: Spanning Trees and Circuits
Traversal Tasks: Maximum Flow
Graph Analytics-Lesson 23: Pattern Matching Tasks and Graph Query
Pattern Matching
Querying Edge Tables
Relational Algebra and Datalog for Graphs
Querying Hybrid Graph/Relational Data
Graph Query Example: NSA
Graph Analytics-Lesson 24: Recursive Queries
Graph Query Example: Recursion
Evaluation of Recursive Programs
Recursive Queries in MapReduce
The End-Game Problem
Graph Analytics-Lesson 25: Representations and Algorithms
Representation: Edge Table, Adjacency List
Representation: Adjacency Matrix
PageRank in MapReduce
PageRank in Pregel