The following problems appeared as a project in the *edX course ColumbiaX: CSMM.102x Machine Learning*.

In this assignment the following clustering algorithms will be implemented:

- Hard clustering with **K-means**
- Soft clustering with

  a. **Weighted K-means**

  b. **Gaussian mixture models** with **Expectation Maximization**

Several datasets with *n* data points {x_1,…,x_n}, where each x_i ∈ R^d, will be used to test the algorithms.

### Hard Clustering

- Each point is assigned to one and only one cluster (hard assignment).
- With **K-means** we try to find *K* centroids {μ_1,…,μ_K} and the corresponding assignments {c_1,…,c_n} of the data points, where each c_i ∈ {1,…,K} indicates which of the *K* clusters the observation x_i belongs to. The objective function that we seek to minimize can be written as

  L(μ, c) = ∑_{i=1}^{n} ∑_{k=1}^{K} 1{c_i = k} ||x_i − μ_k||²,

  i.e., the total squared Euclidean distance from each point to its assigned centroid.
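A minimal sketch of the two alternating K-means steps (hard assignment, then centroid update), assuming NumPy and squared Euclidean distance; the function name and defaults are illustrative, not part of the assignment:

```python
import numpy as np

def kmeans(X, K, n_iter=10, seed=0):
    """Plain K-means: alternate hard assignments and centroid updates."""
    rng = np.random.default_rng(seed)
    # Initialize centroids with K distinct data points chosen at random.
    mu = X[rng.choice(len(X), size=K, replace=False)].astype(float)
    for _ in range(n_iter):
        # Assignment step: c_i = argmin_k ||x_i - mu_k||^2.
        d2 = ((X[:, None, :] - mu[None, :, :]) ** 2).sum(axis=2)
        c = d2.argmin(axis=1)
        # Update step: mu_k = mean of the points assigned to cluster k.
        for k in range(K):
            if np.any(c == k):  # keep the old centroid if a cluster goes empty
                mu[k] = X[c == k].mean(axis=0)
    return mu, c
```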

### Soft Clustering

(1) Each point is assigned to all the clusters with different weights or probabilities (soft assignment).

(2) With **Weighted K-means** we try to compute the weights ϕ_i(k) of each data point *i* for each cluster *k* by minimizing the following objective (written here in a standard entropy-regularized form, where *β* is the stiffness parameter):

L(μ, ϕ) = ∑_{i=1}^{n} ∑_{k=1}^{K} ϕ_i(k) ||x_i − μ_k||² + (1/β) ∑_{i=1}^{n} ∑_{k=1}^{K} ϕ_i(k) ln ϕ_i(k),   subject to ∑_k ϕ_i(k) = 1 for every *i*.

Minimizing over the weights gives the softmax solution ϕ_i(k) ∝ exp(−β ||x_i − μ_k||²), so larger *β* pushes the soft assignments toward hard K-means.
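A minimal sketch of the resulting alternating updates, assuming the softmax weights above and NumPy; the function name and defaults are illustrative:

```python
import numpy as np

def weighted_kmeans(X, K, beta=10.0, n_iter=10, seed=0):
    """Soft K-means: softmax responsibilities with stiffness beta."""
    rng = np.random.default_rng(seed)
    mu = X[rng.choice(len(X), size=K, replace=False)].astype(float)
    for _ in range(n_iter):
        # Weight step: phi_i(k) ∝ exp(-beta * ||x_i - mu_k||^2).
        d2 = ((X[:, None, :] - mu[None, :, :]) ** 2).sum(axis=2)
        logits = -beta * d2
        logits -= logits.max(axis=1, keepdims=True)  # numerical stability
        phi = np.exp(logits)
        phi /= phi.sum(axis=1, keepdims=True)
        # Centroid step: each mu_k is the phi-weighted mean of all points.
        mu = (phi.T @ X) / phi.sum(axis=0)[:, None]
    return mu, phi
```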

(3) With **GMM-EM** we can do soft clustering too. The **EM** algorithm can be used to learn the parameters of a Gaussian mixture model. For this model, we assume the following generative process for the data:

c_i ~ Discrete(π),   x_i | c_i ~ Normal(μ_{c_i}, Σ_{c_i}),   i = 1,…,n.

In other words, the *i*-th observation is first assigned to one of *K* clusters according to the probabilities in the vector *π*, and the value of observation x_i is then generated from one of *K* multivariate Gaussian distributions, using the mean and covariance indexed by c_i. The EM algorithm seeks to maximize the marginal likelihood

p(x_1,…,x_n | π, μ_1,…,μ_K, Σ_1,…,Σ_K) = ∏_{i=1}^{n} ∑_{k=1}^{K} π_k N(x_i | μ_k, Σ_k)

over all parameters *π, μ_1,…,μ_K, Σ_1,…,Σ_K*, using the cluster assignments *c_1,…,c_n* as the hidden data.
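A minimal sketch of the E- and M-steps for this model, assuming NumPy and SciPy for the Gaussian density; the initialization, iteration count, and small covariance regularizer are illustrative choices, not prescribed by the assignment:

```python
import numpy as np
from scipy.stats import multivariate_normal

def gmm_em(X, K, n_iter=50, seed=0):
    """EM for a Gaussian mixture; returns parameters and soft assignments."""
    n, d = X.shape
    rng = np.random.default_rng(seed)
    pi = np.full(K, 1.0 / K)                               # mixing weights
    mu = X[rng.choice(n, size=K, replace=False)].astype(float)
    sigma = np.stack([np.eye(d) for _ in range(K)])        # unit covariances
    for _ in range(n_iter):
        # E-step: phi_i(k) ∝ pi_k * N(x_i | mu_k, Sigma_k).
        dens = np.column_stack([
            pi[k] * multivariate_normal.pdf(X, mean=mu[k], cov=sigma[k])
            for k in range(K)
        ])
        phi = dens / dens.sum(axis=1, keepdims=True)
        # M-step: re-estimate pi, mu, Sigma from the weighted statistics.
        nk = phi.sum(axis=0)
        pi = nk / n
        mu = (phi.T @ X) / nk[:, None]
        for k in range(K):
            diff = X - mu[k]
            sigma[k] = (phi[:, k, None] * diff).T @ diff / nk[k]
            sigma[k] += 1e-6 * np.eye(d)                   # keep covariances PSD
    return pi, mu, sigma, phi
```

Hard labels, if needed, are `phi.argmax(axis=1)`; tracking `np.log(dens.sum(axis=1)).sum()` across iterations gives the monotonically increasing log-likelihood curve shown further below.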

The following figures show the algorithms that are going to be implemented for clustering.

The following animations show the output of the clustering algorithms (and how they converge over the iterations) on a few datasets (with *k=3* clusters); weighted K-means is run with the stiffness parameter *beta=10*.

**Dataset1**

**Dataset2**

**Dataset3**

**Dataset4**

**Dataset5**

**Dataset6**

**Dataset7**

**Dataset8**

The next animation shows the results of **weighted K-means** with stiffness parameter **beta=10**.

The next animation shows the results of **weighted K-means** with stiffness parameter **beta=1**.

Here is how the **log-likelihood** for **EM** increases monotonically over the iterations for a particular dataset (EM guarantees that the likelihood never decreases):

The following animations show **GMM EM convergence** on a few more datasets: