Structure Discovery

Advanced Clustering Algorithms

Today…

  • Multiple advanced algorithms for clustering

Gaussian Mixture Models

  • Often called EM-based clustering


  • Kind of a misnomer in my opinion

    • What distinguishes this algorithm is the kind of clusters it finds

    • Other patterns can be fit using the Expectation Maximization algorithm 


  • I’ll use the terminology Andrew Moore uses, but note that it’s frequently called EM these days

Gaussian Mixture Models

A centroid and a radius

  • Fit with the same approach as k-means
    (some subtleties in the process for selecting the radius)
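
A minimal sketch of fitting such a model, assuming scikit-learn and NumPy are available: covariance_type="spherical" gives each cluster a centroid plus a single variance, which plays the role of the radius, and the fit uses EM, which alternates assignment and update steps much like k-means.

```python
# Minimal sketch (assumes scikit-learn and NumPy).
# covariance_type="spherical" gives each cluster a centroid plus a single
# variance, i.e. roughly "a centroid and a radius".
import numpy as np
from sklearn.mixture import GaussianMixture

rng = np.random.default_rng(0)
X = np.vstack([
    rng.normal(loc=(0, 0), scale=0.5, size=(100, 2)),  # tight cluster
    rng.normal(loc=(4, 4), scale=1.5, size=(100, 2)),  # wider cluster
])

gmm = GaussianMixture(n_components=2, covariance_type="spherical", random_state=0)
gmm.fit(X)  # EM: alternate soft assignment (E-step) and re-estimation (M-step)

print(gmm.means_)        # the centroids
print(gmm.covariances_)  # one variance ("radius") per cluster
```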

Gaussian Mixture Models

  • Can do fun things like

    • Overlapping clusters

    • Explicitly treating points as outliers
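
A hedged sketch of both behaviors with scikit-learn (the 1st-percentile cutoff is an arbitrary choice for illustration): predict_proba gives soft, overlapping memberships, and score_samples gives each point's log density, which can be thresholded to flag outliers.

```python
# Hedged sketch: soft (overlapping) memberships and a simple outlier flag.
import numpy as np
from sklearn.mixture import GaussianMixture

rng = np.random.default_rng(1)
X = np.vstack([
    rng.normal((0, 0), 1.0, size=(150, 2)),
    rng.normal((3, 3), 1.0, size=(150, 2)),  # overlaps with the first cluster
    [[10.0, -10.0]],                         # an obvious outlier
])

gmm = GaussianMixture(n_components=2, random_state=0).fit(X)

memberships = gmm.predict_proba(X)  # each row sums to 1: soft, overlapping membership
log_density = gmm.score_samples(X)  # log p(x | model) for each point

threshold = np.percentile(log_density, 1)  # arbitrary cutoff, for illustration only
print(X[log_density < threshold])          # points treated as outliers
```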

Example

Nifty Subtlety

  • GMM still assigns every point to a cluster, but has a threshold on what’s really considered “in the cluster”


  • Used during model calculation

Example

Assessment

  • Can assess with same approaches as before

    • Distortion

    • BIC


  • Plus likelihood

Likelihood

  • (more commonly, log likelihood)


  • The probability of the data occurring, given the model


  • Assesses each point’s probability, given the set of clusters, and multiplies them all together (equivalently, sums the log probabilities), as in the sketch below
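
As a hedged sketch with scikit-learn: score_samples returns each point's log probability under the fitted mixture, so summing gives the log likelihood, and bic() gives the BIC.

```python
# Hedged sketch: log likelihood and BIC for a fitted mixture model.
import numpy as np
from sklearn.mixture import GaussianMixture

rng = np.random.default_rng(2)
X = rng.normal(size=(200, 2))

gmm = GaussianMixture(n_components=3, random_state=0).fit(X)

log_likelihood = gmm.score_samples(X).sum()  # sum of per-point log probabilities
print(log_likelihood)
print(gmm.score(X) * len(X))  # same quantity: score() is the per-point mean
print(gmm.bic(X))             # BIC also penalizes model complexity
```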

For Instance….

Disadvantages of GMMs

  • Much slower to create than k-means

  • Can be overkill for many problems

Spectral Clustering

  • Conducts dimensionality reduction and clustering simultaneously

    • Like support vector machines

    • Mathematically equivalent to K-means clustering on a non-linear dimension-reduced space
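
A minimal sketch with scikit-learn's SpectralClustering, assuming that library is available: it builds an affinity matrix, embeds the points using eigenvectors of the graph Laplacian, and runs k-means in that reduced space, so it can recover non-convex shapes such as concentric rings.

```python
# Hedged sketch: spectral clustering on a shape plain k-means handles poorly.
from sklearn.cluster import SpectralClustering, KMeans
from sklearn.datasets import make_circles

X, _ = make_circles(n_samples=300, factor=0.4, noise=0.05, random_state=0)

spectral = SpectralClustering(n_clusters=2, affinity="nearest_neighbors", random_state=0)
labels_spectral = spectral.fit_predict(X)  # embed via the graph Laplacian, then k-means

labels_kmeans = KMeans(n_clusters=2, n_init=10, random_state=0).fit_predict(X)

print(labels_spectral[:10])  # separates inner ring from outer ring
print(labels_kmeans[:10])    # splits the data with a straight boundary instead
```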

Hierarchical Clustering

  • Clusters can contain sub-clusters

Example

Dendrogram

Hierarchical Agglomerative Clustering (HAC)

  • Each data point starts as its own cluster

  • Two clusters are combined if the resulting fit is better

  • Continue until no more clusters can be combined
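
A minimal sketch with SciPy, assuming that library is available; note that this standard routine keeps merging the closest pair of clusters until everything is joined, and you then choose where to cut the tree, which differs slightly from the fit-based stopping rule described above.

```python
# Hedged sketch: hierarchical agglomerative clustering with SciPy.
import numpy as np
from scipy.cluster.hierarchy import linkage, fcluster

rng = np.random.default_rng(3)
X = np.vstack([
    rng.normal((0, 0), 0.3, size=(20, 2)),
    rng.normal((3, 3), 0.3, size=(20, 2)),
])

Z = linkage(X, method="ward")  # each point starts alone; closest clusters merge first
labels = fcluster(Z, t=2, criterion="maxclust")  # cut the tree into 2 flat clusters
print(labels)
# scipy.cluster.hierarchy.dendrogram(Z) would draw the tree from the Dendrogram slide
```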

DP-Means clustering

  • Starts with a single point and a single cluster


  • Repeatedly adds new points

    • If a point does not fit well with the existing clusters, it creates a new cluster (see the sketch below)
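
A hedged sketch of the core idea; the threshold lambda_ is a parameter introduced here for illustration, and the full algorithm also re-estimates centroids and iterates to convergence rather than making a single pass.

```python
# Hedged sketch of one DP-means pass: a new cluster appears whenever a point
# is too far (beyond lambda_) from every existing centroid.
import numpy as np

def dp_means_single_pass(X, lambda_):
    centroids = [X[0]]  # start with a single point and a single cluster
    assignments = [0]
    for x in X[1:]:     # repeatedly add new points
        dists = [np.sum((x - c) ** 2) for c in centroids]
        if min(dists) > lambda_:  # poor fit with all existing clusters...
            centroids.append(x)   # ...so create a new cluster
            assignments.append(len(centroids) - 1)
        else:
            assignments.append(int(np.argmin(dists)))
    return np.array(centroids), np.array(assignments)

rng = np.random.default_rng(4)
X = np.vstack([rng.normal((0, 0), 0.3, (30, 2)), rng.normal((5, 5), 0.3, (30, 2))])
centroids, labels = dp_means_single_pass(X, lambda_=4.0)
print(len(centroids))  # roughly 2 clusters for this choice of lambda_
```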

Sequence clustering

  • Clusters sequences rather than individual data points


  • A variety of algorithms exist to do this

Dynamic Time Warping

  • A method for calculating the difference between sequences when their lengths are not the same

    • e.g. what if two students have the same overall pattern of behavior but don’t take the same time to demonstrate each behavior?


  • Tries to match each event in one sequence to an event in the other sequence, checking that the order is consistent
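
A hedged sketch of the classic dynamic-programming version of DTW, assuming numeric event values and a simple absolute-difference cost.

```python
# Hedged sketch: DTW distance between two numeric sequences of different lengths.
import numpy as np

def dtw_distance(a, b):
    n, m = len(a), len(b)
    D = np.full((n + 1, m + 1), np.inf)
    D[0, 0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            cost = abs(a[i - 1] - b[j - 1])
            # Each cell extends a match, or repeats an event in a or in b,
            # so order is preserved while timing is allowed to stretch.
            D[i, j] = cost + min(D[i - 1, j - 1], D[i - 1, j], D[i, j - 1])
    return D[n, m]

slow = [1, 1, 2, 2, 3, 3, 4]  # same pattern, spread over more steps
fast = [1, 2, 3, 4]
print(dtw_distance(slow, fast))  # 0.0: the warped sequences line up exactly
```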

Latent Class Analysis*

  • * Not technically clustering

Latent Class Analysis

  • Type of structural equation modeling that groups data points into a latent variable (conceptually similar to a cluster)


  • Attempts to find data points that are statistically independent of each other, when controlling for group – such that the relationship between the data points is wholly explained by the grouping 
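
There is no single standard Python routine for this, so the following is only a hedged sketch of the usual formulation for binary indicator variables, fit with EM (the function and variable names are made up for illustration): within a latent class the indicators are modeled as independent Bernoullis, so any association between them is explained by class membership.

```python
# Hedged sketch: latent class analysis for binary indicators, fit with EM.
import numpy as np

def lca_em(X, n_classes, n_iter=200, seed=0):
    rng = np.random.default_rng(seed)
    n, d = X.shape
    pi = np.full(n_classes, 1.0 / n_classes)          # class sizes
    p = rng.uniform(0.25, 0.75, size=(n_classes, d))  # P(indicator = 1 | class)
    for _ in range(n_iter):
        # E-step: responsibility of each class for each row, computed in log space
        log_resp = X @ np.log(p).T + (1 - X) @ np.log(1 - p).T + np.log(pi)
        log_resp -= log_resp.max(axis=1, keepdims=True)
        resp = np.exp(log_resp)
        resp /= resp.sum(axis=1, keepdims=True)
        # M-step: re-estimate class sizes and within-class indicator probabilities
        pi = resp.mean(axis=0)
        p = np.clip(resp.T @ X / resp.sum(axis=0)[:, None], 1e-6, 1 - 1e-6)
    return pi, p, resp

rng = np.random.default_rng(5)
true_p = np.array([[0.9, 0.9, 0.1], [0.1, 0.1, 0.9]])
classes = rng.integers(0, 2, size=500)
X = (rng.random((500, 3)) < true_p[classes]).astype(float)
pi, p, resp = lca_em(X, n_classes=2)
print(pi)  # roughly [0.5, 0.5]
print(p)   # roughly recovers true_p, up to swapping the two classes
```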

Latent Class Analysis

  • Can conduct statistical analysis of latent classes

  • Can be used to model changes in membership over time


  • Requires tons and tons of data

  • Very slow

  • Tends to find a smaller number of latent classes than cluster analysis

Many types of clustering

  • Which one you choose depends on what the data look like

  • And what kind of patterns you want to find

Next lecture