In this article, linear PCA, kernel PCA and Isomap will be applied to a few datasets to show whether each method preserves the structure of the high-dimensional data in the lower-dimensional representation. The scikit-learn implementations will be used.
For Isomap, the original dataset from Joshua Tenenbaum, the primary creator of the isometric feature mapping algorithm, will be used. It is provided in one of the assignments of the edX course Microsoft: DAT210x Programming with Python for Data Science, which replicates his canonical dimensionality-reduction experiment on visual perception. The dataset consists of 698 samples of 4096-dimensional vectors. These vectors are the coded brightness values of 64×64-pixel images of heads that have been rendered facing various directions and lit from many angles.
For this assignment, Dr. Tenenbaum’s experiment will be replicated by:
- Applying both PCA and Isomap to the 698 raw images to derive 2D principal components and a 2D embedding of the data’s intrinsic geometric structure.
- Projecting both onto a 2D scatter plot, with a few superimposed face images on the associated samples.
- Increasing n_components to three and plotting.
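The steps above can be sketched with scikit-learn as follows. The face images themselves are not bundled with this article, so the sketch builds a hypothetical stand-in with the same shape (698 samples of 64×64 pixels, each a bright blob whose position is controlled by two hidden factors). The helper `make_toy_images` and the blob width are assumptions for illustration only; with the real data, `X` would be the 698×4096 matrix of brightness values.

```python
import numpy as np
from sklearn.decomposition import PCA
from sklearn.manifold import Isomap


def make_toy_images(n_samples=698, side=64, seed=0):
    """Hypothetical stand-in for the face data: one Gaussian blob per image,
    whose (x, y) centre plays the role of two hidden degrees of freedom."""
    rng = np.random.default_rng(seed)
    centers = rng.uniform(side * 0.25, side * 0.75, size=(n_samples, 2))
    yy, xx = np.mgrid[0:side, 0:side]
    dx = xx[None, :, :] - centers[:, 0, None, None]
    dy = yy[None, :, :] - centers[:, 1, None, None]
    images = np.exp(-(dx ** 2 + dy ** 2) / (2 * 6.0 ** 2))
    # Flatten each 64x64 image into a 4096-dimensional row vector.
    return images.reshape(n_samples, side * side), centers


X, latent = make_toy_images()

# Reduce to 2 and then 3 dimensions with both methods.
embeddings = {"pca": {}, "isomap": {}}
for n_components in (2, 3):
    embeddings["pca"][n_components] = PCA(
        n_components=n_components
    ).fit_transform(X)
    embeddings["isomap"][n_components] = Isomap(
        n_neighbors=8, n_components=n_components
    ).fit_transform(X)
```

Each entry of `embeddings` is an array with one row per image, which can be passed directly to a 2D or 3D scatter plot.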
The following figure shows the first 100 images from the dataset.
Applying Principal Component Analysis (PCA)
Applying Kernel Principal Component Analysis (KPCA)
Now let’s apply KPCA to the dataset (with a Gaussian kernel) and keep only the first 3 principal components. Projecting the data onto these components gives the following plots. Notice that with only 3 principal components chosen, the projected images approximate the original images quite well.
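A minimal sketch of this step, assuming scikit-learn's `KernelPCA` with `kernel="rbf"` (the Gaussian kernel). The data here is a hypothetical toy set of blob images, not the face dataset, and the `gamma` value is an untuned assumption; `fit_inverse_transform=True` is what makes it possible to map the 3 components back to approximate images.

```python
import numpy as np
from sklearn.decomposition import KernelPCA

# Hypothetical stand-in data: 698 images of 64x64 pixels, one blob per image.
rng = np.random.default_rng(0)
centers = rng.uniform(16, 48, size=(698, 2))
yy, xx = np.mgrid[0:64, 0:64]
dx = xx[None, :, :] - centers[:, 0, None, None]
dy = yy[None, :, :] - centers[:, 1, None, None]
X = np.exp(-(dx ** 2 + dy ** 2) / (2 * 6.0 ** 2)).reshape(698, 64 * 64)

kpca = KernelPCA(
    n_components=3,
    kernel="rbf",                # Gaussian kernel
    gamma=1.0 / X.shape[1],      # assumed kernel width, not tuned
    fit_inverse_transform=True,  # learn an approximate map back to pixels
)
Z = kpca.fit_transform(X)          # 3 kernel principal components per image
X_back = kpca.inverse_transform(Z) # approximate reconstructions

# Mean squared pixel error between originals and reconstructions.
mse = float(np.mean((X - X_back) ** 2))
```

Reshaping a row of `X_back` to 64×64 gives an image that can be compared visually with the corresponding original.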
Finally, let’s apply Isomap to the same dataset with 8 nearest neighbors and keep only the first 3 embedding dimensions. Projecting the data onto these dimensions gives the following plots. Notice that with only 3 dimensions chosen, the projected images approximate the original images quite well.
As the above results show, of linear PCA, kernel PCA and the non-linear Isomap, the Isomap algorithm is best able to capture the true nature of the faces dataset when reduced to two components.
Each coordinate axis of the 3D manifold correlates highly with one degree of freedom from the original, underlying data (e.g., Left, Right, Down, Up Head Positions).
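One way to check this kind of claim numerically is to correlate each embedding axis with known underlying factors. The pose labels of the face images are not part of this article, so the sketch below uses hypothetical toy images whose two hidden factors (the blob centre coordinates) are known by construction. Isomap axes are only defined up to rotation and reflection, so on other data the correlation may be spread across several axes.

```python
import numpy as np
from sklearn.manifold import Isomap

# Hypothetical toy images with two known hidden factors (blob centre x, y).
rng = np.random.default_rng(0)
latent = rng.uniform(16, 48, size=(698, 2))
yy, xx = np.mgrid[0:64, 0:64]
dx = xx[None, :, :] - latent[:, 0, None, None]
dy = yy[None, :, :] - latent[:, 1, None, None]
X = np.exp(-(dx ** 2 + dy ** 2) / (2 * 6.0 ** 2)).reshape(698, 64 * 64)

# Same settings as in the text: 8 nearest neighbors, 3 embedding dimensions.
emb = Isomap(n_neighbors=8, n_components=3).fit_transform(X)

# Absolute Pearson correlation between each embedding axis (rows)
# and each hidden factor (columns).
full = np.corrcoef(emb.T, latent.T)   # 5x5 matrix over all variables
corr = np.abs(full[:3, 3:])           # keep the 3x2 cross block

for axis, row in enumerate(corr):
    print(f"embedding axis {axis}: |corr| with factors = {np.round(row, 2)}")
```

A row of `corr` with one value close to 1 indicates an embedding axis that tracks a single hidden factor.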