Applying Linear PCA vs. Kernel PCA (with Gaussian Kernel) for dimensionality reduction on a few datasets in R

In this article, both linear PCA and kernel PCA will be applied to a few shape datasets, to show whether, for each method, the structure of the data (in terms of its different clusters) in the higher-dimensional space is preserved in the lower-dimensional projection.

  1. For linear PCA, as usual, the dataset is first z-score normalized and then the eigen-analysis of the covariance matrix is performed. To reduce the dimension, the dataset is projected onto the first few principal components (the dominant eigenvectors of the covariance matrix); a short R sketch of this is given after the figure below.
  2. For kernel PCA, a Gaussian kernel is used to compute similarities between the datapoints, and the kernel matrix is computed (with the kernel trick) and then centered. Next, the eigen-analysis of the kernel matrix is performed. To reduce the dimension, the first few dominant eigenvectors of the kernel matrix are chosen; these implicitly represent the data already projected onto the principal components of the (possibly infinite-dimensional) feature space. The next figure shows the algorithms for the two methods; an R sketch of this procedure also follows below.

[Figure algo.png: the algorithms for linear PCA and kernel PCA]
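For readers who want to follow along, a minimal base-R sketch of the linear PCA steps could look like the following. The helper name linear_pca and the use of the built-in iris dataset for the demo are illustrative choices, not part of the algorithm itself.

```r
# Minimal sketch of the linear PCA steps: z-score normalize, eigen-analyze
# the covariance matrix, project onto the dominant eigenvectors.
linear_pca <- function(X, k = 2) {
  X <- scale(X)                        # z-score normalize each column
  C <- cov(X)                          # covariance matrix
  eig <- eigen(C, symmetric = TRUE)    # eigenvalues come back in decreasing order
  V <- eig$vectors[, 1:k, drop = FALSE]  # first k principal components
  X %*% V                              # project the data onto them
}

# Usage example: reduce the 4-dimensional iris measurements to 2 dimensions
proj <- linear_pca(iris[, 1:4], k = 2)
plot(proj, col = iris$Species, pch = 19, xlab = "PC1", ylab = "PC2")
```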
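Similarly, here is a minimal sketch of the kernel PCA steps with the Gaussian kernel, again in base R. The kernel width sigma is a hand-picked illustrative value and generally needs to be tuned per dataset; the function name kernel_pca is likewise just for illustration.

```r
# Minimal sketch of kernel PCA with a Gaussian (RBF) kernel:
# K[i, j] = exp(-||xi - xj||^2 / (2 * sigma^2))
kernel_pca <- function(X, k = 2, sigma = 1) {
  X <- as.matrix(X)
  n <- nrow(X)
  D2 <- as.matrix(dist(X))^2           # squared pairwise distances
  K  <- exp(-D2 / (2 * sigma^2))       # Gaussian kernel matrix (kernel trick)
  # Center the kernel matrix in feature space: Kc = K - JK - KJ + JKJ
  J  <- matrix(1 / n, n, n)
  Kc <- K - J %*% K - K %*% J + J %*% K %*% J
  eig <- eigen(Kc, symmetric = TRUE)   # eigen-analysis of the centered kernel
  alpha  <- eig$vectors[, 1:k, drop = FALSE]
  lambda <- eig$values[1:k]
  # Scaling each eigenvector by sqrt(lambda) yields the coordinates of the
  # training points projected onto the feature-space principal components.
  sweep(alpha, 2, sqrt(lambda), "*")
}

# Usage example, with a hand-picked sigma
proj <- kernel_pca(iris[, 1:4], k = 2, sigma = 2)
plot(proj, col = iris$Species, pch = 19, xlab = "KPC1", ylab = "KPC2")
```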

[Figures: linear PCA vs. kernel PCA projections for the flame, spiral, circles, Aggregate, moons, iris, Compound, R15, swiss, and globe datasets]
