Using Expectation Maximization for Soft-Clustering in Python

  • In this article, Expectation Maximization will be used for soft clustering of a dataset in k clusters. Given n data points in an m dimensional space Data = (Data_1, … , Data_n), let’s first represent their soft assignments to k clusters as a k x n dimensional column-stochastic matrix HiddenMatrix = (HiddenVector_1, … , HiddenVector_n), where each HiddenVector_j itself is a dimensional vector representing the probabilities of assigning the Data_j to each of the k clusters, where $j ∈ {1,..,k}. Then the k centers can be represented as k points in m dimensional space, Parameters = (θ1, …, θk) (Reference: Stepic, Rosalind and Coursera Course on Bioinformatics Algorithms by UCSD).
  • The expectation maximization algorithm starts with a random choice of Parameters. It then alternates between the Estep, in which a responsibility matrix HiddenMatrix for Data given Parameters is computed and the Mstep, in which the Parameters are re-estimated (with MLE) using the HiddenMatrix, till convergence. The iterative algorithm outline is shown below.


  • For this assignment, a 2 dimensional dataset with ~1000 data points will be used and EM algorithm will be run to cluster the dataset into 6 segments, starting with 6 random centers. The following figures / animations show the EM algorithm steps and the responsibility matrix and the clusters obtained with different values of the stiffness parameter β.test00test24
  • As can be seen, the soft assignments come closer to hard assignments as the β parameter value is increased.
  • For β = 1.1
  • For  β = 10
  • For  β = 0.1



Leave a Reply

Fill in your details below or click an icon to log in: Logo

You are commenting using your account. Log Out / Change )

Twitter picture

You are commenting using your Twitter account. Log Out / Change )

Facebook photo

You are commenting using your Facebook account. Log Out / Change )

Google+ photo

You are commenting using your Google+ account. Log Out / Change )

Connecting to %s