Kernel K-Means, K-Means++ and Cluster Evaluation

In this article, the R / Python implementations of KMeans Clustering and Kernel KMeans Clustering algorithms will be used to cluster a few datasets. Then the cluster assignments are compared with the ground truths w.r.t. (R implementations of) the following supervised clustering evaluation metrics: purity and NMI.

KMeans and Kernel KMeans Clustering

  1. The following two animations show the partitions induced in by KMeans clustering in the following two datasets. On the first dataset, KMeans does a good job, but on the second dataset it performs poorly.k3kB
  2. The next animation shows that the Kernel KMeans Clustering (implemented in R with Gaussian Kernel) performs considerably better on the second dataset, starting with the Gram matrix ϕ, with ϕ(i,j)=exp((XiXj)^2/(2σ^2)), with σ=5.
  3. The next couple of animations show the clusters obtained at diffetent iterations with the algorithms KMean and Kernel KMeans clustering (with σ=4) respectively, on the same dataset, but this time implemented in python.

kkmeansBkmeansB

4. The next couple of animations show the clusters obtained at diffetent iterations with the algorithms KMeans and Kernel KMeans clustering (with σ=4) respectively, on another dataset.

kmeansC

kkmeansC

5. The next couple of animations show the clusters obtained at diffetent iterations with the algorithms KMean and Kernel KMeans clustering (with σ=4) respectively, on yet another dataset.

kmeansSeedkkmeansSeed

KMeans++ Smart Initialization

  1. Instead of random initialization for the KMeans clustering to start with the initial centroids, KMeans++ uses a smart initialization to probabilisitically assure that the centroids are far from each other. The following figure describes the smart initialization.algo.png
  2. The following figures / animations show the comparative results in between KMeans with and without smart KMeans++ initialization, in terms of initialization time taken, the iterative convergence of the algorithm to a local optimum and cluster quality (in terms of heterogeneity) on a dataset with k=7 clusters. As can be seen, in general KMeans++ smart initialization takes more time, but generally tends to converge faster and to better quality clusters.kmeans++1
    kmeans++init1
    kmeans1
    kmeans++time1
    heter1ptchg1
  3. The next figures / animations show the comparative results in between KMeans with and without smart KMeans++ initialization, in terms of initialization time taken, the iterative convergence of the algorithm to a local optimum and cluster quality (in terms of heterogenity) on another dataset with k=5 clusters.kmeanskmeans++kmeans++initkmeans++timeptchg
    heter

Cluster Evaluations

  1. The following figure shows the mathematical formulations of two supervised clustering evaluation metrics, purity and NMI that will be used to evaluate the clusters obtained.math.png
  2. As can be seen from the following evaluation results by running the algorithms on the following dataset, Kernel KMeans with K=2performs the best (keeping σσ of the kernel fixed at 4) in terms of the purity and NMI, when compared to the ground truth shown next.gBkkevalBkkevalB
  3. Again, as can be seen from the following evaluation results by running the algorithms on another dataset, Kernel KMeans with K=2 performs the best (keeping σσ of the kernel fixed at 4) in terms of the purity and NMI, when compared to the ground truth shown next.gCkevalCkkevalCkkevalC
  4. Now, keeping the number of clusters fixed at K=2, the σσ for the kernel is varied for both the datasets. As shown in the next figures / animations and results, for the kernel higher σσ values, Kernel KMeans performs better in terms of the purity and NMI.kkBsigkkCsigkksigBC
Advertisements

Leave a Reply

Fill in your details below or click an icon to log in:

WordPress.com Logo

You are commenting using your WordPress.com account. Log Out / Change )

Twitter picture

You are commenting using your Twitter account. Log Out / Change )

Facebook photo

You are commenting using your Facebook account. Log Out / Change )

Google+ photo

You are commenting using your Google+ account. Log Out / Change )

Connecting to %s