Introduction
What goes into a data pipeline?
Data science modules covered
1. GCP Data Pipeline Products
GCP data pipeline options
Cloud Dataproc
Cloud Dataflow
Cloud Pub/Sub
2. Apache Beam
What is Apache Beam?
Beam pipelines
PCollections
Transforms
Pipeline I/O
Runners
3. Setting Up Dataflow
Setting up GCP for Dataflow
Setting up Python
Creating a simple pipeline
Executing in Dataflow
4. Data Processing with Beam and Dataflow
Reading text files
ParDo
GroupBy
Map
Combine
Writing data to text files
Other capabilities
5. Cloud Pub/Sub
What is Pub/Sub?
Topics and messages
Publishers
Subscribers
Create a topic
Create a subscription
Publish and receive
Python SDK
6. Streaming with Dataflow
Streaming with Dataflow
Windowing with Dataflow
Streaming and windowing example
Ex_Files_Data_Science_Google_Cloud_Data.zip (1.0 MB)