In this article, we'll discuss a few problems related to face reconstruction and rudimentary face detection using eigenfaces (we are not going to discuss more sophisticated face detection algorithms such as Viola-Jones or DeepFace).
This problem appeared as an assignment in the edX course Analytics for Computing (by Georgia Tech). The following description of the problem is taken straightaway from the assignment.
This is an application of principal components analysis (PCA) and k-means to the analysis of “familiar faces.” We’ll also create a simple interactive visualization for exploring this dataset (using bokeh).
Solving the PCA problem
The following figure shows the basic algorithm to compute a PCA, the interactive visual demo of which appears here.
The dataset: Some familiar faces
The dataset consists of a bunch of images of people's faces taken from the MIT Faces Recognition Project database. There are 3991 images in total. A randomly selected subset of 500 images is loaded, and all the following experiments are done on these images. They are shown below.
Preprocessing the images
To apply PCA, we’ll want to preprocess the images in various ways.
To begin with, it's possible that the images come in all shapes and sizes. The following code figures out the largest height and width that are within the bounds of all the images.
```python
import sys
import numpy as np

min_rows, min_cols = sys.maxsize, sys.maxsize
max_rows, max_cols = 0, 0
for (i, image) in enumerate(original_images):
    r, c = image.shape
    min_rows = min(min_rows, r)
    max_rows = max(max_rows, r)
    min_cols = min(min_cols, c)
    max_cols = max(max_cols, c)
print("\n==> Least common image size:", min_rows, "x", min_cols, "pixels")
```
Suppose the least common image size is `min_rows` x `min_cols` pixels, i.e., the smallest height and width over all images. Let's crop each image so that it is `min_rows` x `min_cols` in size: if an image has more than `min_rows` rows, crop out any extra rows on the bottom of the image; and if it has more than `min_cols` columns, center the columns of the image. Let's store the output images in a 3-D NumPy array called `images[:, :, :]`, where `images[k, :, :]` is the k-th image. The following code does exactly that.
```python
def recenter(image, min_rows, min_cols):
    r, c = image.shape
    top, bot = 0, min_rows          # crop any extra rows off the bottom
    left = (c - min_cols) // 2      # center the columns
    return image[top:bot, left:left + min_cols]

image0_recentered = recenter(image0, min_rows, min_cols)
fig, axs = plt.subplots(1, 2, figsize=(10, 5))
imshow_gray(image0, ax=axs[0])
imshow_gray(image0_recentered, ax=axs[1])
```
Let’s compute an “average” image, taken as the elementwise (pixelwise) mean over the images and display this “average face”.
Recall that PCA requires centered points. Let's do that by subtracting the mean image from every image. The next figure shows a couple of images and the ones obtained after mean subtraction.
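The centering step above can be sketched as follows (a minimal example on a toy stand-in for the 3-D `images` array; `mean_face` is a name introduced here for illustration):

```python
import numpy as np

# Toy stand-in for the 3-D image array: 4 "images" of size 3 x 3
images = np.arange(36, dtype=float).reshape(4, 3, 3)

# Elementwise (pixelwise) mean over the images -- the "average face"
mean_face = images.mean(axis=0)

# Center every image by subtracting the mean image
centered = images - mean_face
```

After centering, the pixelwise mean of the dataset is (numerically) zero, which is exactly what PCA expects.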
From image set to a data matrix and back again
For PCA, we need a data matrix. Here is some code to convert our 3-D array of images into a 2-D data matrix, where we "flatten" each image into a 1-D vector by a simple reshape:
```python
# Create m x d data matrix
m = len(images)
d = min_rows * min_cols
X = np.reshape(images, (m, d))

# To get back to an image, just reshape it again
imshow_gray(np.reshape(X[int(len(X)/2), :], (min_rows, min_cols)))
```
Let's compute the SVD of `X`, i.e., X = U Σ V^T. Store the result in three arrays: `U` holds U, `Sigma` holds just the diagonal entries of Σ, and `VT` holds V^T.
```python
U, Sigma, VT = np.linalg.svd(X, full_matrices=False)

# Sanity check on dimensions
print("X:", X.shape)
print("U:", U.shape)
print("Sigma:", Sigma.shape)
print("V^T:", VT.shape)
```

```
X: (500, 16384)
U: (500, 500)
Sigma: (500,)
V^T: (500, 16384)
```
The following table and plot inspect the singular values, i.e., the entries of `Sigma`. The plot shows the i-th singular value σ_i as a dot at position i. To give a rough idea of how quickly the singular values decay, the plot includes a solid line showing a reference curve proportional to 1/i.
Does the spectrum of these data decay quickly or slowly? How should that affect our choice of k if we consider a k-truncated SVD?

The question is ill-defined and the answer is relative. In this case, a reasonable argument is that the spectrum actually decays somewhat slowly. Why? If we try fitting the singular values to functions of i, we'll find that a curve proportional to 1/i is actually a pretty good fit. That is considered fairly slow decay; there would be significant compressibility if the curve dropped off exponentially (or at least superlinearly) instead.
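One way to check such a fit is ordinary least squares in log-log space, where a power law σ_i ≈ C · i^α becomes a straight line with slope α. A minimal sketch (on a synthetic spectrum; with the real data you would pass `Sigma` from the SVD above):

```python
import numpy as np

# Synthetic stand-in spectrum that decays roughly like 1/i, with mild noise
i = np.arange(1, 501, dtype=float)
rng = np.random.default_rng(0)
Sigma = 500.0 * i ** -1.0 * np.exp(rng.normal(scale=0.05, size=i.size))

# Fit sigma_i ~ C * i^alpha by least squares on (log i, log sigma_i)
alpha, logC = np.polyfit(np.log(i), np.log(Sigma), 1)
# alpha close to -1 indicates roughly 1/i decay, i.e., a slow spectrum
```

A slope near -1 (rather than, say, a fast exponential drop) is the "slow decay" case discussed above.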
Next, let’s plot the first few principal components. From what we computed above, each right singular vector has the same number of entries as there are pixels in an image. So, we could plot them as images by reshaping them. What do they appear to capture?
For example, the first (and also the third) principal component captures a male face, whereas the second (and fourth) seems to capture a female face; the fifth captures a face with long hair.
Next, let's compute `Y`, which is the original data matrix projected onto the first `num_components` principal components:

```python
num_components = 5  # Number of principal components
Y = np.matmul(X, VT[:num_components, :].T)
```
The next plot visualizes the face data points projected onto the first two principal components with bokeh, using the face images as thumbnails.
Next, let's run k-means on `Y[:m, :num_components]` to try to identify a few clusters of faces.
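The clustering step can be sketched with a plain NumPy Lloyd's iteration (a minimal stand-in for whatever k-means routine the assignment provides; the data here is synthetic):

```python
import numpy as np

def kmeans(Y, k, iters=50, seed=0):
    """Plain Lloyd's algorithm: alternate assignment and center updates."""
    rng = np.random.default_rng(seed)
    centers = Y[rng.choice(len(Y), size=k, replace=False)].copy()
    for _ in range(iters):
        # Assign each point to its nearest center
        dists = ((Y[:, None, :] - centers[None, :, :]) ** 2).sum(axis=2)
        labels = dists.argmin(axis=1)
        # Move each center to the mean of its assigned points
        for j in range(k):
            if np.any(labels == j):
                centers[j] = Y[labels == j].mean(axis=0)
    return labels, centers

# Stand-in for the projected data matrix Y (500 faces x 5 components)
Y = np.random.default_rng(0).normal(size=(500, 5))
labels, centers = kmeans(Y, k=3)
```

Each face then gets a cluster label, which is what the colored groups in the plots below correspond to.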
The next animation shows the interactive bokeh plot of the face data points projected onto the first two principal components, with the face images as thumbnails.
For the 500 randomly selected faces, the green cluster seems to contain mostly the brighter faces, the red cluster mostly the darker faces, and the blue cluster mostly the younger faces.
2. Face Reconstruction and A Simple Face Detector
This problem appeared as projects in this CMU machine learning for signal processing course and also in this UW Computer Vision course. The description of the problem is taken straightaway from the course projects.
We are given a corpus of facial images [here] from the LFWCrop database. Each image in this corpus is 64 x 64 and grayscale. We need to learn a typical (i.e. Eigen) face from them. There are 1071 (.pgm) images in this dataset. A few of them are shown below.
Computing the EigenFaces
As described in the original paper from MIT and this case study from JHU, the following figure shows the theory / algorithm for how to efficiently compute the eigenfaces from the (training) images and then choose the top k eigenfaces (eigenvectors).
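The efficiency trick in that algorithm is that, with far fewer images than pixels, one can eigendecompose the small N x N Gram matrix XX^T instead of the huge d x d pixel-space covariance, then map the eigenvectors back to pixel space. A sketch under that assumption (all data here is synthetic, and names are illustrative):

```python
import numpy as np

rng = np.random.default_rng(1)
# Stand-in training set: 20 unrolled 64x64 "faces", one per row
X = rng.normal(size=(20, 64 * 64))
X = X - X.mean(axis=0)                       # mean-subtract, as required

# Eigendecompose the small 20 x 20 Gram matrix instead of 4096 x 4096
gram = X @ X.T
evals, evecs = np.linalg.eigh(gram)          # ascending eigenvalue order
order = np.argsort(evals)[::-1]
evals, evecs = evals[order], evecs[:, order]

# Map the top-k eigenvectors back to pixel space and normalize:
# each eigenface is X^T v, up to scaling
k = 5
top = X.T @ evecs[:, :k]
top /= np.linalg.norm(top, axis=0)
eigenfaces_k = top.T                         # k x d, one eigenface per row
```

Because the eigenvectors of the Gram matrix are orthonormal, the resulting eigenfaces come out mutually orthogonal as well.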
The next 4 figures show the mean face, the percentage of variance captured by the eigenfaces, and the top 100 and top 400 eigenfaces computed from the images, respectively.
Projection on the EigenFaces and Reconstruction
Now, let’s project a face onto the face space to generate a vector of k coefficients, one for each of the k eigenfaces (for different values of k). Then let’s reconstruct the same face from the vector of coefficients computed.
The following figures and animations show how a given image can be reconstructed by projecting it onto the first few dominant eigenfaces, and how the reconstruction error (residual) decreases as more and more eigenfaces are used for projection. The reconstruction error drops quite fast once the first few eigenfaces are included.
Original Face Image
Reconstructed Face Image by projecting on the truncated eigenfaces space
The Reconstruction Error when first k eigenfaces were used
The same is shown for yet another image.
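The projection and reconstruction shown above can be sketched as follows (with synthetic orthonormal eigenfaces standing in for the real ones; `mean_face` and `reconstruct` are names introduced here):

```python
import numpy as np

rng = np.random.default_rng(2)
d, n = 64 * 64, 10
# Assumed inputs: the mean face and n orthonormal eigenfaces (one per row)
mean_face = rng.normal(size=d)
Q, _ = np.linalg.qr(rng.normal(size=(d, n)))
eigenfaces = Q.T
face = rng.normal(size=d)

def reconstruct(face, k):
    # Project the mean-subtracted face onto the top-k eigenfaces ...
    coeffs = eigenfaces[:k] @ (face - mean_face)
    # ... then map the k coefficients back to pixel space
    return mean_face + eigenfaces[:k].T @ coeffs

# The residual shrinks monotonically as more eigenfaces are used
errors = [np.linalg.norm(face - reconstruct(face, k)) for k in range(1, n + 1)]
```

Because the eigenfaces are orthonormal, each additional component can only remove energy from the residual, which is why the error curve is nonincreasing in k.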
The next figure shows the images projected on and reconstructed from the first two eigenfaces. The similar images stay in close proximity whereas the dissimilar ones (w.r.t. projection on the first 2 dominant eigenvectors) are far apart in the 2D space.
Projection of a Human vs. a Non-Human-Face (e.g. Cat) on the EigenFaces Space
Now let's reconstruct a non-human object (a cat) in the same space spanned by the top-k eigenfaces, for different values of k. As we can see from the figures below, the reconstruction error is much higher for the cat, as expected.
The following figure taken from the same JHU vision lab case study shows the threshold-based face detection algorithm with eigenfaces.
Now, we are also given four group photographs with multiple faces [here]. We need to use the Eigen face to detect the faces in these photos. One such group photo is shown below: the group photo of the rockstars from The Beatles.
The faces in the group photographs may have different sizes. We also need to account for these variations.
To detect faces in the image, we need to scan the group photo and identify all regions in it that "match" the pattern of the Eigen face best. To "scan" the image for matches against an Eigen face, we must match every region of the photo against the Eigen face.

The "match" between any region of an image and an Eigen face is given by the normalized dot product between the Eigen face and the region of the image being evaluated: (E . p) / (|E| |p|), where E is the vector (unrolled) representation of the Eigen face, and p is the unrolled vector form of the patch.
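The normalized dot product above can be written directly in NumPy (a minimal sketch; `match_score` is a helper name introduced here):

```python
import numpy as np

def match_score(eigenface, patch):
    """Normalized dot product (E . p) / (|E| |p|) between an unrolled
    eigenface E and an unrolled image patch p."""
    E = np.ravel(eigenface).astype(float)
    p = np.ravel(patch).astype(float)
    return float(E @ p / (np.linalg.norm(E) * np.linalg.norm(p)))

E = np.random.default_rng(3).normal(size=(64, 64))
score = match_score(E, 2.0 * E)   # invariant to patch brightness scaling
```

Note the normalization makes the score invariant to overall patch brightness, which is exactly why it is preferred over a raw dot product when scanning a photo with uneven lighting.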
Scaling and Rotation
The Eigen face is fixed in size and can only be used to detect faces of approximately the same size as the Eigen face itself. On the other hand, faces in the group photos are of different sizes — they get smaller as the subject gets farther away from the camera.
The solution to this is to make many copies of the eigen face and match them all.
In order to make our detection system robust, we resize the Eigen face from 64×64 pixels to 32×32, 48×48, 96×96, and 128×128 pixels. Once we've scaled the Eigen face, we will have a total of five "typical" faces, one at each level of scaling. We need to scan the group pictures with all five eigen faces. Each of them will give us a "match" score for each position on the image. If we simply locate the peaks in each of them, we may find all the faces. Sometimes multiple peaks will occur at the same position, or within a few pixels of one another. In these cases, we need to merge all of these (non-maximum suppression); they probably all represent the same face.
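The peak-merging (non-maximum suppression) step can be sketched greedily: keep the strongest remaining peak and discard any weaker peak that lands too close to one already kept. All names and the pixel radius below are illustrative:

```python
def merge_peaks(peaks, radius=4):
    """Greedy non-maximum suppression over (score, row, col) peaks:
    keep the strongest peak, drop weaker peaks within `radius` pixels."""
    kept = []
    for score, r, c in sorted(peaks, reverse=True):   # strongest first
        if all(abs(r - kr) > radius or abs(c - kc) > radius
               for _, kr, kc in kept):
            kept.append((score, r, c))
    return kept

# Two detections one pixel apart plus one far away -> two merged faces
peaks = [(0.9, 10, 10), (0.85, 11, 10), (0.7, 50, 60)]
merged = merge_peaks(peaks)   # -> [(0.9, 10, 10), (0.7, 50, 60)]
```

In a full detector this would run over the pooled peaks from all five scaled eigen faces, so that the same face detected at two scales also collapses to one detection.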
Additional heuristics may also be required (appropriate setting of thresholds, comparison of peak values from different scaling factors, additional scaling, filtering by human body color thresholds, etc.). These are for us to investigate.
The faces detected using the window scanning and the threshold on the distance from the eigenfaces plane are shown below for the Beatles group image.
Implement morphing. Given two images, compute an animation that shows one being transformed continuously into the other, using the eigenfaces representation. Also, try transforming an image “away” from another one, to exaggerate differences between two faces. Make a video from your morphed faces.
Let’s try to morph the first face to the second one.
Here is the algorithm that we are going to use:
- For the first face, first create a reconstruction using only a few (k) dominant eigenfaces.
- Iteratively reconstruct the first face using fewer and fewer eigenfaces, and animate.
- Stop when we reach the reconstruction of the first face with only k eigenfaces.
- Now compute the coefficients of the second face using only those k eigenfaces.
- Next, start using the second face's coefficients instead of the first face's coefficients.
- Now iteratively increase the number of eigenfaces used to reconstruct the second face, until all the eigenfaces are used.
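The frame schedule above can be sketched as follows (synthetic orthonormal eigenfaces stand in for the real ones; `morph_frames` is a name introduced here):

```python
import numpy as np

def morph_frames(c1, c2, eigenfaces, mean_face, k):
    """Morph schedule: shrink face 1's reconstruction down to k eigenfaces,
    swap in face 2's coefficients, then grow back to the full set."""
    n = len(eigenfaces)
    frames = []
    # Phase 1: reconstruct face 1 with fewer and fewer eigenfaces (n -> k)
    for j in range(n, k - 1, -1):
        frames.append(mean_face + eigenfaces[:j].T @ c1[:j])
    # Phase 2: switch to face 2's coefficients, add eigenfaces back (k -> n)
    for j in range(k, n + 1):
        frames.append(mean_face + eigenfaces[:j].T @ c2[:j])
    return frames

rng = np.random.default_rng(4)
d, n = 256, 10
Q, _ = np.linalg.qr(rng.normal(size=(d, n)))
eigenfaces = Q.T                        # n x d, one eigenface per row
mean_face = rng.normal(size=d)
face1, face2 = rng.normal(size=d), rng.normal(size=d)
c1 = eigenfaces @ (face1 - mean_face)   # coefficients of face 1
c2 = eigenfaces @ (face2 - mean_face)   # coefficients of face 2
frames = morph_frames(c1, c2, eigenfaces, mean_face, k=3)
```

Stringing the frames together (e.g., with matplotlib's animation tools) yields the morph shown next; running phase 2 backwards instead would push the image "away" from the second face to exaggerate differences.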
The following animation shows such morphing: