Structure Discovery

Advanced Clustering Algorithms

Today…

  • Multiple advanced algorithms for clustering

Gaussian Mixture Models

  • Often called EM-based clustering


  • Kind of a misnomer in my opinion

    • What distinguishes this algorithm is the kind of clusters it finds

    • Other patterns can be fit using the Expectation Maximization algorithm 


  • I’ll use the terminology Andrew Moore uses, but note that it’s frequently called EM these days

Gaussian Mixture Models

A centroid and a radius

  • Fit with the same approach as k-means
    (some subtleties in the process for selecting the radius)
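
A minimal sketch of fitting such a model, assuming scikit-learn and NumPy are available: covariance_type="spherical" gives each cluster a centroid plus a single variance, which plays the role of the radius, and the fit uses EM, which alternates assignment and update steps much like k-means.

```python
# Minimal sketch (assumes scikit-learn and NumPy).
# covariance_type="spherical" gives each cluster a centroid plus a single
# variance, i.e. roughly "a centroid and a radius".
import numpy as np
from sklearn.mixture import GaussianMixture

rng = np.random.default_rng(0)
X = np.vstack([
    rng.normal(loc=(0, 0), scale=0.5, size=(100, 2)),  # tight cluster
    rng.normal(loc=(4, 4), scale=1.5, size=(100, 2)),  # wider cluster
])

gmm = GaussianMixture(n_components=2, covariance_type="spherical", random_state=0)
gmm.fit(X)  # EM: alternate soft assignment (E-step) and re-estimation (M-step)

print(gmm.means_)        # the centroids
print(gmm.covariances_)  # one variance ("radius") per cluster
```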

Gaussian Mixture Models

  • Can do fun things like

    • Overlapping clusters

    • Explicitly treating points as outliers
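
A hedged sketch of both behaviors with scikit-learn (the 1st-percentile cutoff is an arbitrary choice for illustration): predict_proba gives soft, overlapping memberships, and score_samples gives each point's log density, which can be thresholded to flag outliers.

```python
# Hedged sketch: soft (overlapping) memberships and a simple outlier flag.
import numpy as np
from sklearn.mixture import GaussianMixture

rng = np.random.default_rng(1)
X = np.vstack([
    rng.normal((0, 0), 1.0, size=(150, 2)),
    rng.normal((3, 3), 1.0, size=(150, 2)),  # overlaps with the first cluster
    [[10.0, -10.0]],                         # an obvious outlier
])

gmm = GaussianMixture(n_components=2, random_state=0).fit(X)

memberships = gmm.predict_proba(X)  # each row sums to 1: soft, overlapping membership
log_density = gmm.score_samples(X)  # log p(x | model) for each point

threshold = np.percentile(log_density, 1)  # arbitrary cutoff, for illustration only
print(X[log_density < threshold])          # points treated as outliers
```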

Example

Nifty Subtlety

  • GMM still assigns every point to a cluster, but has a threshold on what’s really considered “in the cluster”


  • Used during model calculation

Example

Assessment

  • Can assess with same approaches as before

    • Distortion

    • BIC


  • Plus likelihood

Likelihood

  • (more commonly, log likelihood)


  • The probability of the data occurring, given the model


  • Assesses each point’s probability, given the set of clusters, and multiplies them all together (equivalently, sums the log probabilities), as in the sketch below
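
As a hedged sketch with scikit-learn: score_samples returns each point's log probability under the fitted mixture, so summing gives the log likelihood, and bic() gives the BIC.

```python
# Hedged sketch: log likelihood and BIC for a fitted mixture model.
import numpy as np
from sklearn.mixture import GaussianMixture

rng = np.random.default_rng(2)
X = rng.normal(size=(200, 2))

gmm = GaussianMixture(n_components=3, random_state=0).fit(X)

log_likelihood = gmm.score_samples(X).sum()  # sum of per-point log probabilities
print(log_likelihood)
print(gmm.score(X) * len(X))  # same quantity: score() is the per-point mean
print(gmm.bic(X))             # BIC also penalizes model complexity
```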

For Instance….

Disadvantages of GMMs

  • Much slower to create than k-means

  • Can be overkill for many problems

Spectral Clustering

  • Conducts dimensionality reduction and clustering simultaneously

    • Like support vector machines

    • Mathematically equivalent to K-means clustering on a non-linear dimension-reduced space
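
A minimal sketch with scikit-learn's SpectralClustering, assuming that library is available: it builds an affinity matrix, embeds the points using eigenvectors of the graph Laplacian, and runs k-means in that reduced space, so it can recover non-convex shapes such as concentric rings.

```python
# Hedged sketch: spectral clustering on a shape plain k-means handles poorly.
from sklearn.cluster import SpectralClustering, KMeans
from sklearn.datasets import make_circles

X, _ = make_circles(n_samples=300, factor=0.4, noise=0.05, random_state=0)

spectral = SpectralClustering(n_clusters=2, affinity="nearest_neighbors", random_state=0)
labels_spectral = spectral.fit_predict(X)  # embed via the graph Laplacian, then k-means

labels_kmeans = KMeans(n_clusters=2, n_init=10, random_state=0).fit_predict(X)

print(labels_spectral[:10])  # separates inner ring from outer ring
print(labels_kmeans[:10])    # splits the data with a straight boundary instead
```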

Hierarchical Clustering

  • Clusters can contain sub-clusters

Example

Dendrogram

Hierarchical Agglomerative Clustering (HAC)

  • Each data point starts as its own cluster

  • Two clusters are combined if the resulting fit is better

  • Continue until no more clusters can be combined
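
A minimal sketch with SciPy, assuming that library is available; note that this standard routine keeps merging the closest pair of clusters until everything is joined, and you then choose where to cut the tree, which differs slightly from the fit-based stopping rule described above.

```python
# Hedged sketch: hierarchical agglomerative clustering with SciPy.
import numpy as np
from scipy.cluster.hierarchy import linkage, fcluster

rng = np.random.default_rng(3)
X = np.vstack([
    rng.normal((0, 0), 0.3, size=(20, 2)),
    rng.normal((3, 3), 0.3, size=(20, 2)),
])

Z = linkage(X, method="ward")  # each point starts alone; closest clusters merge first
labels = fcluster(Z, t=2, criterion="maxclust")  # cut the tree into 2 flat clusters
print(labels)
# scipy.cluster.hierarchy.dendrogram(Z) would draw the tree from the Dendrogram slide
```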

DP-Means clustering

  • Starts with a single point and a single cluster


  • Repeatedly adds new points

    • If a point does not fit well with the existing clusters, it creates a new cluster (see the sketch below)
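
A hedged sketch of the core idea; the threshold lambda_ is a parameter introduced here for illustration, and the full algorithm also re-estimates centroids and iterates to convergence rather than making a single pass.

```python
# Hedged sketch of one DP-means pass: a new cluster appears whenever a point
# is too far (beyond lambda_) from every existing centroid.
import numpy as np

def dp_means_single_pass(X, lambda_):
    centroids = [X[0]]  # start with a single point and a single cluster
    assignments = [0]
    for x in X[1:]:     # repeatedly add new points
        dists = [np.sum((x - c) ** 2) for c in centroids]
        if min(dists) > lambda_:  # poor fit with all existing clusters...
            centroids.append(x)   # ...so create a new cluster
            assignments.append(len(centroids) - 1)
        else:
            assignments.append(int(np.argmin(dists)))
    return np.array(centroids), np.array(assignments)

rng = np.random.default_rng(4)
X = np.vstack([rng.normal((0, 0), 0.3, (30, 2)), rng.normal((5, 5), 0.3, (30, 2))])
centroids, labels = dp_means_single_pass(X, lambda_=4.0)
print(len(centroids))  # roughly 2 clusters for this choice of lambda_
```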

Sequence clustering

  • Clusters sequences rather than individual data points


  • A variety of algorithms exist to do this

Dynamic Time Warping

  • A method for calculating the difference between sequences when their lengths are not the same

    • e.g. what if two students have the same overall pattern of behavior but don’t take the same time to demonstrate each behavior?


  • Tries to match each event in one sequence to an event in the other sequence, checking that the order is consistent
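
A hedged sketch of the classic dynamic-programming version of DTW, assuming numeric event values and a simple absolute-difference cost.

```python
# Hedged sketch: DTW distance between two numeric sequences of different lengths.
import numpy as np

def dtw_distance(a, b):
    n, m = len(a), len(b)
    D = np.full((n + 1, m + 1), np.inf)
    D[0, 0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            cost = abs(a[i - 1] - b[j - 1])
            # Each cell extends a match, or repeats an event in a or in b,
            # so order is preserved while timing is allowed to stretch.
            D[i, j] = cost + min(D[i - 1, j - 1], D[i - 1, j], D[i, j - 1])
    return D[n, m]

slow = [1, 1, 2, 2, 3, 3, 4]  # same pattern, spread over more steps
fast = [1, 2, 3, 4]
print(dtw_distance(slow, fast))  # 0.0: the warped sequences line up exactly
```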

Latent Class Analysis*

  • * Not technically clustering

Latent Class Analysis

  • Type of structural equation modeling that groups data points into a latent variable (conceptually similar to a cluster)


  • Attempts to find data points that are statistically independent of each other, when controlling for group – such that the relationship between the data points is wholly explained by the grouping 
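
There is no single standard Python routine for this, so the following is only a hedged sketch of the usual formulation for binary indicator variables, fit with EM (the function and variable names are made up for illustration): within a latent class the indicators are modeled as independent Bernoullis, so any association between them is explained by class membership.

```python
# Hedged sketch: latent class analysis for binary indicators, fit with EM.
import numpy as np

def lca_em(X, n_classes, n_iter=200, seed=0):
    rng = np.random.default_rng(seed)
    n, d = X.shape
    pi = np.full(n_classes, 1.0 / n_classes)          # class sizes
    p = rng.uniform(0.25, 0.75, size=(n_classes, d))  # P(indicator = 1 | class)
    for _ in range(n_iter):
        # E-step: responsibility of each class for each row, computed in log space
        log_resp = X @ np.log(p).T + (1 - X) @ np.log(1 - p).T + np.log(pi)
        log_resp -= log_resp.max(axis=1, keepdims=True)
        resp = np.exp(log_resp)
        resp /= resp.sum(axis=1, keepdims=True)
        # M-step: re-estimate class sizes and within-class indicator probabilities
        pi = resp.mean(axis=0)
        p = np.clip(resp.T @ X / resp.sum(axis=0)[:, None], 1e-6, 1 - 1e-6)
    return pi, p, resp

rng = np.random.default_rng(5)
true_p = np.array([[0.9, 0.9, 0.1], [0.1, 0.1, 0.9]])
classes = rng.integers(0, 2, size=500)
X = (rng.random((500, 3)) < true_p[classes]).astype(float)
pi, p, resp = lca_em(X, n_classes=2)
print(pi)  # roughly [0.5, 0.5]
print(p)   # roughly recovers true_p, up to swapping the two classes
```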

Latent Class Analysis

  • Can conduct statistical analysis of latent classes

  • Can be used to model changes in membership over time


  • Requires tons and tons of data

  • Very slow

  • Tends to find a smaller number of latent classes than cluster analysis

Many types of clustering

  • Which one you choose depends on what the data look like

  • And what kind of patterns you want to find

Next lecture