# Using PCA to Detect Outliers in Images

In this article, *Principal Component Analysis* (PCA) will be used to find the *outliers* in images. PCA can be interpreted in the following ways:

- The principal components found by PCA capture the directions of highest variance in the data (*maximize the variance* of the projection along each component).
- The principal components minimize the *reconstruction error* (i.e., the squared distance between the original data and its estimate, obtained by projecting the data onto the first few principal components).
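
The two interpretations describe the same optimum. As a short sketch, assume the data points $x_i$ (row vectors) have been centered and $V$ is a matrix whose $k$ columns are orthonormal (both are assumptions of this note, not statements from the results that follow). Since $V V^T$ is an orthogonal projection, the Pythagorean theorem gives

$$
\sum_i \lVert x_i \rVert^2 \;=\; \sum_i \lVert x_i V \rVert^2 \;+\; \sum_i \lVert x_i - x_i V V^T \rVert^2 .
$$

The left-hand side does not depend on $V$, so choosing $V$ to maximize the first term on the right (the variance of the projection) is the same as choosing it to minimize the second term (the reconstruction error).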

Since the first few principal components usually explain almost all of the variance in the data, the above interpretations lead to the intuition that the data points that are not explained well by the first few principal components are probably the noisy ones.

- Color images can be viewed as a collection of three-dimensional data points (e.g., in *RGB* space), where each data point (pixel) has 3 dimensions.
- PCA will be used on an image dataset $I$ and, depending on how much variance the first two principal components explain (using a *scree plot*), only the first or the first two PC vector(s) are chosen to represent the image data in lower dimension.
- The projection is computed in the usual manner: $\hat{I} = I \, V_{1:k} V_{1:k}^T$, where $V_{1:k}$ are the first $k$ *orthonormal* principal component vectors, with $k \in \{1,2\}$.
- The reconstruction error $\lVert I - \hat{I} \rVert_2$ will be used to score each of the data points.
- The ones with the higher scores (e.g., above the 90th percentile) will be marked as the outliers.
- The pixmap library is used to extract the color channels from an image.
- As can be seen from the following results, for each of the images the data points (pixels) with scores above the 90% quantile of the scores are marked as outliers.
- The outliers are marked with *black* patches.
- The 85%, 90% and 95% quantiles are shown with red dashed lines.
- In the first image the orange is detected as an outlier among the apples, in the second image the magenta patterns are detected as outliers in the space, and in the third image some features of my face (eyebrows, hair, nostrils) are detected as outliers.
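
The steps listed above can be sketched in a few lines. The original results appear to come from R with the pixmap library; the block below is only an illustrative Python/NumPy version, not the author's code. The function name `pca_outlier_mask`, the synthetic test image, and the choice to center the pixels before projecting are all assumptions made for this sketch.

```python
import numpy as np


def pca_outlier_mask(image, k=1, q=0.90):
    """Flag pixels whose PCA reconstruction error exceeds the q-quantile.

    image: float array of shape (height, width, channels), e.g. RGB in [0, 1].
    Returns (scores, mask), both of shape (height, width).
    """
    h, w, c = image.shape
    X = image.reshape(-1, c).astype(float)      # one row per pixel
    Xc = X - X.mean(axis=0)                     # centering: an assumption of this sketch

    # Eigen-decomposition of the (channels x channels) covariance matrix.
    eigvals, eigvecs = np.linalg.eigh(np.cov(Xc, rowvar=False))
    order = np.argsort(eigvals)[::-1]           # sort components by decreasing variance
    V = eigvecs[:, order[:k]]                   # first k orthonormal PC vectors

    X_hat = Xc @ V @ V.T                        # projection onto the first k PCs
    scores = np.linalg.norm(Xc - X_hat, axis=1) # per-pixel reconstruction error
    mask = scores > np.quantile(scores, q)      # pixels above the q-quantile
    return scores.reshape(h, w), mask.reshape(h, w)


# Synthetic stand-in for a real image: pixels that vary along one direction
# in RGB space, plus a small green patch that does not follow that direction.
rng = np.random.default_rng(0)
t = rng.uniform(0.2, 0.8, size=(20, 20, 1))
img = t * np.array([1.0, 0.9, 0.8]) + rng.normal(0.0, 0.005, size=(20, 20, 3))
img[:2, :2] = [0.1, 0.9, 0.1]

scores, mask = pca_outlier_mask(img, k=1, q=0.95)

marked = img.copy()
marked[mask] = 0.0                              # paint detected outliers black
```

In this toy image the green patch lies far from the dominant color direction, so its pixels receive the largest scores and are among those painted black. With a real image, the array would be loaded from disk and `k` chosen from the variance explained by each component.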

```
## [1] "------------------------------------------"
## [1] "% Variance explained by upto k (1,2,3) PCs"
## [1] "------------------------------------------"
## [1] 94.79 98.85 100.00
```

```
## [1] "----------------------------------------------"
## [1] "85%, 90% and 95% quantile values of the scores"
## [1] "----------------------------------------------"
##       85%       90%       95%
## 0.1311154 0.1864878 0.2558950
```

```
## [1] "------------------------------------------"
## [1] "% Variance explained by upto k (1,2,3) PCs"
## [1] "------------------------------------------"
## [1] 99.47 99.99 100.00
```

```
## [1] "----------------------------------------------"
## [1] "85%, 90% and 95% quantile values of the scores"
## [1] "----------------------------------------------"
##         85%         90%         95%
## 0.006053905 0.010789753 0.017195377
```

```
## [1] "------------------------------------------"
## [1] "% Variance explained by upto k (1,2,3) PCs"
## [1] "------------------------------------------"
## [1] 99.48 99.90 100.00
```

```
## [1] "----------------------------------------------"
## [1] "85%, 90% and 95% quantile values of the scores"
## [1] "----------------------------------------------"
##        85%        90%        95%
## 0.03542360 0.04108505 0.05129503
```
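
Summaries of this kind (cumulative percentage of variance explained by the first 1, 2 and 3 PCs, and the 85%, 90% and 95% quantiles of the scores) could be computed roughly as follows. This is a hedged Python/NumPy sketch on made-up data, not the R code that produced the numbers above; the `pixels` array and the choice of $k = 1$ are hypothetical.

```python
import numpy as np

# Hypothetical stand-in for the (n_pixels x 3) matrix of RGB values.
rng = np.random.default_rng(1)
pixels = rng.normal(size=(1000, 3)) * np.array([1.0, 0.2, 0.05])

centered = pixels - pixels.mean(axis=0)
eigvals, eigvecs = np.linalg.eigh(np.cov(centered, rowvar=False))
order = np.argsort(eigvals)[::-1]               # decreasing variance
eigvals, eigvecs = eigvals[order], eigvecs[:, order]

# Cumulative % of variance explained by the first 1, 2, 3 components.
cum_pct = 100.0 * np.cumsum(eigvals) / eigvals.sum()

# Reconstruction-error scores using only the first component (k = 1).
V = eigvecs[:, :1]
scores = np.linalg.norm(centered - centered @ V @ V.T, axis=1)

# Quantiles that would serve as candidate outlier thresholds.
q85, q90, q95 = np.quantile(scores, [0.85, 0.90, 0.95])

print("% variance explained by up to k PCs:", np.round(cum_pct, 2))
print("85%, 90%, 95% quantiles of the scores:", q85, q90, q95)
```

The cumulative percentages play the role of a scree plot: if the first value is already close to 100, a single component is enough to represent the pixels.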

- The following animation shows the pixels detected as outliers with different threshold outlier scores.
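
One way such frames could be produced is to recompute the outlier mask for a sequence of thresholds. The sketch below is an assumption about the procedure, not the code behind the animation; `outlier_masks` is a hypothetical helper, and the random `scores` array stands in for real per-pixel reconstruction errors.

```python
import numpy as np


def outlier_masks(scores, quantiles):
    """Return one boolean mask per quantile threshold (True = outlier)."""
    thresholds = np.quantile(scores, quantiles)
    return [scores > t for t in thresholds]


# Hypothetical per-pixel scores for a 10 x 10 image.
rng = np.random.default_rng(2)
scores = rng.random((10, 10))

quantiles = [0.85, 0.90, 0.95]
masks = outlier_masks(scores, quantiles)
counts = [int(m.sum()) for m in masks]          # outlier pixels per frame
```

Raising the threshold can only shrink the set of flagged pixels, so each frame's mask is contained in the previous one.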

- Yet another animation of outlier detection with the PCA-based score.
