Using Linear PCA, Kernel PCA (with Gaussian Kernel) and Isomap for dimensionality reduction in Python

In this article, linear PCA, kernel PCA, and the Isomap algorithm will be applied to a dataset of face images to show to what extent each method preserves the higher-dimensional structure of the data in the lower-dimensional embedding. The scikit-learn implementations will be used.

For Isomap, the original dataset from Joshua Tenenbaum, the primary creator of the isometric feature mapping algorithm, will be used (as given in one of the assignments of the edX course Microsoft: DAT210x Programming with Python for Data Science, replicating his canonical dimensionality-reduction research experiment for visual perception). It consists of 698 samples of 4096-dimensional vectors: the coded brightness values of 64×64-pixel images of heads rendered facing various directions and lit from many angles.

For this assignment, Dr. Tenenbaum’s experiment will be replicated by:

  1. Applying both PCA and Isomap to the 698 raw images to derive 2D principal components and a 2D embedding of the data’s intrinsic geometric structure.
  2. Projecting both onto a 2D scatter plot, with a few superimposed face images on the associated samples.
  3. Increasing n_components to three and plotting.

The following figure shows the first 100 images from the dataset.

heads.png
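Before applying the algorithms, the images need to be loaded into a data matrix. A minimal sketch, assuming the dataset is available locally as the face_data.mat file from the course assignment, with the images stored under the 'images' key (both the file name and the key are assumptions; adjust them to your copy of the data):

```python
import numpy as np
from scipy.io import loadmat

# Load the face dataset: 698 samples, each a 4096-dimensional vector of
# brightness values (one flattened 64x64 image per sample).
mat = loadmat('face_data.mat')          # assumed file name
X = np.asarray(mat['images']).T         # assumed key; shape (698, 4096)

# A single row can be reshaped back into a 64x64 image for display.
first_face = X[0].reshape(64, 64)
print(X.shape)                          # (698, 4096)
```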

Applying Principal Component Analysis (PCA)

Let’s first apply PCA to the dataset and keep only the first 3 principal components. Projecting the data onto these components yields the following plots. Notice that with only 3 principal components, the projected images approximate the original images quite well.
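A minimal sketch using scikit-learn’s PCA, assuming X is the 698×4096 data matrix from the loading step above:

```python
from sklearn.decomposition import PCA

# Reduce the 4096-dimensional images to their first 3 principal components.
pca = PCA(n_components=3)
X_pca = pca.fit_transform(X)            # shape: (698, 3)

# Fraction of the total variance captured by each component.
print(pca.explained_variance_ratio_)
```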

heads_pca_1 heads_pca_2 heads_pca_3
heads_pca
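The scatter plots above, with a few face thumbnails superimposed on their samples (step 2 of the experiment), can be produced with matplotlib’s offsetbox utilities. A minimal sketch; the helper name plot_embedding and the number of thumbnails are illustrative choices:

```python
import matplotlib.pyplot as plt
from matplotlib.offsetbox import AnnotationBbox, OffsetImage

def plot_embedding(X2d, images, n_thumbs=40):
    """Scatter a 2D embedding and superimpose a few face thumbnails."""
    fig, ax = plt.subplots(figsize=(8, 8))
    ax.scatter(X2d[:, 0], X2d[:, 1], s=5, alpha=0.5)
    step = max(1, len(X2d) // n_thumbs)
    for i in range(0, len(X2d), step):
        thumb = OffsetImage(images[i].reshape(64, 64), cmap='gray', zoom=0.5)
        ax.add_artist(AnnotationBbox(thumb, X2d[i], frameon=False))
    plt.show()

# First two principal components with superimposed faces.
plot_embedding(X_pca[:, :2], X)
```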

Applying Kernel Principal Component Analysis (KPCA)

Now let’s apply KPCA to the dataset (with the Gaussian kernel) and keep only the first 3 principal components. Projecting the data onto these components yields the following plots. Notice that with only 3 principal components, the projected images again approximate the original images quite well.
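A minimal sketch using scikit-learn’s KernelPCA; the Gaussian kernel is selected with kernel='rbf', and the gamma value below is only an illustrative assumption (by default scikit-learn uses 1/n_features):

```python
from sklearn.decomposition import KernelPCA

# Kernel PCA with the Gaussian (RBF) kernel, keeping 3 components.
# gamma controls the kernel width; the value here is an assumption.
kpca = KernelPCA(n_components=3, kernel='rbf', gamma=1e-4)
X_kpca = kpca.fit_transform(X)          # shape: (698, 3)

# Reuse the plotting helper from the PCA section.
plot_embedding(X_kpca[:, :2], X)
```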

heads_kpca_1 heads_kpca_2 heads_kpca_3
heads_kpca

Applying Isomap

Finally, let’s apply Isomap to the same dataset with 8 nearest neighbors and keep only the first 3 components of the embedding. Projecting the data onto these components yields the following plots. Notice that with only a 3-dimensional embedding, the projected images approximate the original images quite well.
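A minimal sketch using scikit-learn’s Isomap with 8 nearest neighbors and a 3-dimensional embedding:

```python
from sklearn.manifold import Isomap

# Isomap: build an 8-nearest-neighbor graph, then embed the data into
# 3 dimensions that preserve geodesic distances along the manifold.
iso = Isomap(n_neighbors=8, n_components=3)
X_iso = iso.fit_transform(X)            # shape: (698, 3)

# Reuse the plotting helper from the PCA section.
plot_embedding(X_iso[:, :2], X)
```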

heads_iso_1 heads_iso_2 heads_iso_3

heads_iso
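For step 3 of the experiment, the 3-component embeddings can also be inspected in a 3D scatter plot. A minimal matplotlib sketch for the Isomap result (the same code applies to X_pca and X_kpca):

```python
import matplotlib.pyplot as plt
from mpl_toolkits.mplot3d import Axes3D  # registers the 3D projection on older matplotlib

# 3D scatter of the Isomap embedding.
fig = plt.figure(figsize=(8, 8))
ax = fig.add_subplot(111, projection='3d')
ax.scatter(X_iso[:, 0], X_iso[:, 1], X_iso[:, 2], s=5)
ax.set_xlabel('component 1')
ax.set_ylabel('component 2')
ax.set_zlabel('component 3')
plt.show()
```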

As can be seen from the above results, among linear PCA, kernel PCA, and the non-linear Isomap, the Isomap algorithm best captures the true structure of the faces dataset when it is reduced to two components.

Each coordinate axis of the 3D embedding correlates strongly with one degree of freedom of the original, underlying data (e.g., the left-right and up-down head pose).
