In this article, the *R / Python* implementations of *KMeans Clustering* and *Kernel KMeans Clustering* algorithms will be used to cluster a few datasets. Then the *cluster assignments* are compared with the *ground truths* w.r.t. (R implementations of) the following *supervised clustering evaluation metrics*: *purity* and *NMI*.

## KMeans and Kernel KMeans Clustering

- The following two animations show the partitions induced in by
*KMeans clustering*in the following two datasets. On the first dataset,*KMeans*does a good job, but on the second dataset it performs poorly. - The next animation shows that the
*Kernel KMeans Clustering*(implemented in R with*Gaussian Kernel*) performs considerably better on the second dataset, starting with the*Gram*matrix ϕ, with ϕ(i,j)=exp(−(Xi−Xj)^2/(2σ^2)), with σ=5. - The next couple of animations show the clusters obtained at diffetent iterations with the algorithms
*KMean*and*Kernel KMeans*clustering (with σ=4) respectively, on the same dataset, but this time implemented in python.

4. The next couple of animations show the clusters obtained at diffetent iterations with the algorithms *KMeans* and *Kernel KMeans* clustering (with σ=4) respectively, on another dataset.

5. The next couple of animations show the clusters obtained at diffetent iterations with the algorithms *KMean* and *Kernel KMeans* clustering (with σ=4) respectively, on yet another dataset.

## KMeans++ Smart Initialization

- Instead of random initialization for the KMeans clustering to start with the initial centroids,
*KMeans++*uses a smart initialization to probabilisitically assure that the centroids are far from each other. The following figure describes the smart initialization. - The following figures / animations show the comparative results in between KMeans with and without smart
*KMeans++*initialization, in terms of initialization time taken, the iterative convergence of the algorithm to a local optimum and cluster quality (in terms of*heterogeneity*) on a dataset with*k=7*clusters. As can be seen, in general*KMeans++*smart initialization takes more time, but generally tends to converge faster and to better quality clusters.

- The next figures / animations show the comparative results in between KMeans with and without smart
*KMeans++*initialization, in terms of initialization time taken, the iterative convergence of the algorithm to a local optimum and cluster quality (in terms of*heterogenity*) on another dataset with*k=5*clusters.

## Cluster Evaluations

- The following figure shows the mathematical formulations of two
*supervised clustering evaluation metrics*,*purity*and*NMI*that will be used to evaluate the clusters obtained. - As can be seen from the following evaluation results by running the algorithms on the following dataset,
*Kernel KMeans*with*K=2*performs the best (keeping σσ of the*kernel*fixed at*4*) in terms of the*purity*and*NMI*, when compared to the*ground truth*shown next. - Again, as can be seen from the following evaluation results by running the algorithms on another dataset,
*Kernel KMeans*with*K=2*performs the best (keeping σσ of the*kernel*fixed at*4*) in terms of the*purity*and*NMI*, when compared to the*ground truth*shown next. - Now, keeping the number of clusters fixed at
*K=2*, the σσ for the*kernel*is varied for both the datasets. As shown in the next figures / animations and results, for the*kernel*higher σσ values,*Kernel KMeans*performs better in terms of the*purity*and*NMI*.