Clustering for Dataset Exploration
Unsupervised Learning
How many clusters?
Clustering 2D points
Inspect your clustering
Evaluating a clustering
How many clusters of grain?
Evaluating the grain clustering
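The clustering and evaluation exercises above can be sketched roughly as follows. This assumes scikit-learn and pandas, since the outline does not name a library, and uses synthetic points from `make_blobs` as a stand-in for the course's 2D points and grain samples. The sketch records inertia for k = 1 to 6, because an "elbow" in that curve is one common way to choose the number of clusters. It then cross-tabulates the cluster labels against the known groups.

```python
import pandas as pd
from sklearn.cluster import KMeans
from sklearn.datasets import make_blobs

# Synthetic stand-in for the course data (assumption): 300 points in 3 known groups
samples, varieties = make_blobs(n_samples=300, centers=3, random_state=42)

# Inertia = sum of squared distances from each sample to its nearest centroid
ks = range(1, 7)
inertias = []
for k in ks:
    model = KMeans(n_clusters=k, n_init=10, random_state=42)
    model.fit(samples)
    inertias.append(model.inertia_)

# Inertia generally falls as k grows; look for the "elbow" where it levels off
for k, inertia in zip(ks, inertias):
    print(f"k={k}: inertia={inertia:.1f}")

# Fit the chosen model and compare its labels with the known groups
model = KMeans(n_clusters=3, n_init=10, random_state=42)
labels = model.fit_predict(samples)
df = pd.DataFrame({"labels": labels, "varieties": varieties})
ct = pd.crosstab(df["labels"], df["varieties"])
print(ct)
```

A good clustering puts most of each column's count in a single row of the cross-tabulation.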
Transforming features for better clusterings
Scaling fish data for clustering
Clustering the fish data
Clustering stocks using KMeans
Which stocks move together?
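A minimal sketch of the feature-transformation idea in this lesson, assuming scikit-learn. The random arrays are hypothetical stand-ins for the fish measurements and stock price movements. `StandardScaler` rescales each feature (column) to mean 0 and variance 1. `Normalizer`, by contrast, rescales each sample (row) to unit length, which emphasizes the shape of a stock's movements over their size.

```python
import numpy as np
from sklearn.cluster import KMeans
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import Normalizer, StandardScaler

rng = np.random.default_rng(0)
# Hypothetical measurements whose columns have very different scales
samples = np.column_stack([
    rng.normal(500, 100, 60),   # a large-valued column
    rng.normal(30, 5, 60),      # a mid-sized column
    rng.normal(0.4, 0.05, 60),  # a small-valued column
])

# Standardize every feature before KMeans measures distances
pipeline = make_pipeline(StandardScaler(), KMeans(n_clusters=4, n_init=10, random_state=0))
labels = pipeline.fit_predict(samples)

# Hypothetical daily price movements: 20 stocks x 50 days
movements = rng.normal(0, 1, size=(20, 50))
normalized = Normalizer().fit_transform(movements)  # each row now has length 1
stock_pipeline = make_pipeline(Normalizer(), KMeans(n_clusters=5, n_init=10, random_state=0))
stock_labels = stock_pipeline.fit_predict(movements)
```

Without scaling, the large-valued column would dominate the Euclidean distances KMeans relies on.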
Visualization with Hierarchical Clustering and t-SNE
Visualizing hierarchies
Hierarchical clustering of the grain data
Hierarchies of stocks
Cluster labels in hierarchical clustering
Different linkage, different hierarchical clustering!
Intermediate clusterings
Extracting the cluster labels
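The hierarchical-clustering lessons above can be sketched with SciPy, which is an assumption since the outline names no library. Two hypothetical tight groups of points are merged agglomeratively with `linkage`, and `fcluster` then cuts the hierarchy at a chosen height to extract flat cluster labels. Switching `method` between `"complete"` and `"single"` changes how the distance between clusters is measured, which is why different linkages can give different hierarchies.

```python
import numpy as np
from scipy.cluster.hierarchy import fcluster, linkage

rng = np.random.default_rng(1)
# Hypothetical data: two tight groups of 5 points each
samples = np.vstack([rng.normal(0, 0.1, (5, 2)), rng.normal(5, 0.1, (5, 2))])

# Each row of the linkage matrix records one merge:
# (cluster index a, cluster index b, merge distance, size of the new cluster)
merges = linkage(samples, method="complete")  # distance between farthest points
single = linkage(samples, method="single")    # distance between closest points

# Cut at height 2: samples whose merge (cophenetic) distance is at most 2 share a label
labels = fcluster(merges, t=2, criterion="distance")
print(labels)
# scipy.cluster.hierarchy.dendrogram(merges) would draw the tree (requires matplotlib)
```

`fcluster` numbers its clusters from 1, unlike scikit-learn's labels, which start at 0.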
t-SNE for 2-dimensional maps
t-SNE visualization of grain dataset
A t-SNE map of the stock market
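A minimal t-SNE sketch, assuming scikit-learn. Synthetic 7-feature blobs stand in for the grain or stock data, and the `learning_rate` and `perplexity` values are illustrative assumptions that usually need tuning. t-SNE maps the samples to 2D so they can be scatter-plotted. It only offers `fit_transform`, so new samples cannot be added to an existing map.

```python
from sklearn.datasets import make_blobs
from sklearn.manifold import TSNE

# Synthetic stand-in data (assumption): 150 samples, 7 features, 3 groups
samples, groups = make_blobs(n_samples=150, n_features=7, centers=3, random_state=0)

# perplexity must be smaller than the number of samples
model = TSNE(n_components=2, learning_rate=200, perplexity=30, random_state=0)
xy = model.fit_transform(samples)  # one (x, y) point per sample

# The axes of a t-SNE map have no direct meaning; only relative positions matter
xs, ys = xy[:, 0], xy[:, 1]
```

Plotting `xs` against `ys` coloured by `groups` would show whether the groups stay separated in the map; different runs can produce differently rotated maps.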
Decorrelating Your Data and Dimension Reduction
Visualizing the PCA transformation
Correlated data in nature
Decorrelating the grain measurements with PCA
Principal components
Intrinsic dimension
The first principal component
Variance of the PCA features
Intrinsic dimension of the fish data
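A minimal PCA sketch, assuming scikit-learn. Hypothetical, strongly correlated length and width measurements stand in for the grain data. After `fit_transform`, the PCA features are uncorrelated. The rows of `components_` are the principal components, the first pointing in the direction of greatest variance. `explained_variance_` gives the variance of each PCA feature, and the count of features with significant variance hints at the intrinsic dimension.

```python
import numpy as np
from sklearn.decomposition import PCA

rng = np.random.default_rng(2)
# Hypothetical correlated measurements: width roughly proportional to length
length = rng.normal(5.6, 0.4, 200)
width = 0.6 * length + rng.normal(0, 0.05, 200)
samples = np.column_stack([length, width])

model = PCA()
pca_features = model.fit_transform(samples)

# PCA decorrelates the data: the correlation of the new features is ~0
corr = np.corrcoef(pca_features[:, 0], pca_features[:, 1])[0, 1]
print(f"correlation after PCA: {corr:.3f}")

# First principal component = direction of greatest variance
first_pc = model.components_[0]
# Variance of each PCA feature; here one feature carries almost all of it
print(model.explained_variance_)
```

Because almost all the variance lies along one component here, these 2D samples have an intrinsic dimension of about 1.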
Dimension reduction with PCA
Dimension reduction of the fish measurements
A tf-idf word-frequency array
Clustering Wikipedia part I
Clustering Wikipedia part II
Discovering Interpretable Features
Non-negative matrix factorization (NMF)
NMF applied to Wikipedia articles
NMF features of the Wikipedia articles
NMF reconstructs samples
NMF learns interpretable parts
NMF learns topics of documents
Explore the LED digits dataset
NMF learns the parts of images
PCA doesn't learn parts
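A minimal NMF sketch, assuming scikit-learn and hypothetical random non-negative data in place of the word frequencies or digit images. NMF learns non-negative components and non-negative features, so every sample is approximately reconstructed as a sum of components weighted by its features. This additive, parts-based structure makes the components interpretable. PCA components carry no sign constraint, which is one reason PCA does not learn parts.

```python
import numpy as np
from sklearn.decomposition import NMF

rng = np.random.default_rng(3)
# Hypothetical non-negative data, e.g. word frequencies or pixel intensities
samples = rng.random((20, 8))

model = NMF(n_components=3, init="nndsvda", max_iter=1000, random_state=0)
nmf_features = model.fit_transform(samples)  # shape (20, 3), all entries >= 0
components = model.components_               # shape (3, 8), all entries >= 0

# Each sample is approximately its features times the components
reconstruction = nmf_features @ components
error = np.linalg.norm(samples - reconstruction)
print(f"reconstruction error: {error:.3f}")
```

The reconstruction is only approximate; with more components the error would typically shrink.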
Building recommender systems using NMF
Which articles are similar to 'Cristiano Ronaldo'?
Recommend musical artists part I
Recommend musical artists part II
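The recommender exercises can be sketched as follows, assuming scikit-learn and pandas. The article titles and data matrix are hypothetical. After NMF, each row of features is L2-normalized, so the dot product of two rows equals their cosine similarity. Ranking those similarities suggests the most similar articles, or artists.

```python
import numpy as np
import pandas as pd
from sklearn.decomposition import NMF
from sklearn.preprocessing import normalize

rng = np.random.default_rng(4)
# Hypothetical non-negative article-by-word matrix and article titles
titles = ["Article A", "Article B", "Article C", "Article D", "Article E"]
X = rng.random((5, 12))

nmf = NMF(n_components=2, init="nndsvda", max_iter=1000, random_state=0)
nmf_features = nmf.fit_transform(X)

# Unit-length rows: dot product of two rows = cosine similarity
norm_features = normalize(nmf_features)
df = pd.DataFrame(norm_features, index=titles)

# Cosine similarity of every article to 'Article A'
article = df.loc["Article A"]
similarities = df.dot(article)
print(similarities.nlargest(3))
```

The top entry is the query article itself (similarity 1), so the real recommendations start from the second entry.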
Final thoughts
Datasets
company-stock-movements-2010-2015-incl.csv (1.1 MB)
eurovision-2016.csv (40 KB)
fish.csv (3 KB)
Grains.zip (4 KB)
lcd-digits.csv (40 KB)
Musical artists.zip (13 KB)
Wikipedia articles.zip (462 KB)
wine.csv (12 KB)