# Using Expectation Maximization for Soft-Clustering in Python

• In this article, Expectation Maximization will be used for soft clustering of a dataset in k clusters. Given n data points in an m dimensional space Data = (Data_1, … , Data_n), let’s first represent their soft assignments to k clusters as a k x n dimensional column-stochastic matrix HiddenMatrix = (HiddenVector_1, … , HiddenVector_n), where each HiddenVector_j itself is a dimensional vector representing the probabilities of assigning the Data_j to each of the k clusters, where \$j ∈ {1,..,k}. Then the k centers can be represented as k points in m dimensional space, Parameters = (θ1, …, θk) (Reference: Stepic, Rosalind and Coursera Course on Bioinformatics Algorithms by UCSD).
• The expectation maximization algorithm starts with a random choice of Parameters. It then alternates between the Estep, in which a responsibility matrix HiddenMatrix for Data given Parameters is computed and the Mstep, in which the Parameters are re-estimated (with MLE) using the HiddenMatrix, till convergence. The iterative algorithm outline is shown below.

• For this assignment, a 2 dimensional dataset with ~1000 data points will be used and EM algorithm will be run to cluster the dataset into 6 segments, starting with 6 random centers. The following figures / animations show the EM algorithm steps and the responsibility matrix and the clusters obtained with different values of the stiffness parameter β.
• As can be seen, the soft assignments come closer to hard assignments as the β parameter value is increased.
• For β = 1.1
• For  β = 10
• For  β = 0.1