(Sandipan Dey 9 August 2016)
In this article, an R implementation of locality sensitive hashing will be used for fast approximate nearest neighbor search in images. The idea of document retrieval using LSH appears as one assignment in the Coursera Course Machine Learning Clustering and Retrieval. As kd-tree based implementation of ANN search does not scale well with high dimensional data (such as text data), LSH provides an alternative implementation.
Image Search with LSH
- The digits and the faces dataset containing 798 images faces and 5k images of digits respectively will be used to retrieve the approximate nearest neighbors with LSH query. The faces dataset is shown in the next figure that contains 798 images of faces with different poses, angles and illuminations.
- A face image is chosen from the face dataset as shown below and LSH query is used to retrieve the top 25 nearest neighbors (images) that are most similar to the query image.The query image face is oriented to the left so the nearest neighbors of this image is expected to be oriented to the left as well.
- The next figure shows the partitions induced by the LSH training process on the face images, images in the same partition (bin) has the same color, with a particular run of LSH the following 22 bins were obtained.
- The next figure shows the nearest neighbor graph induced by the LSH training, nodes that are neighbors share an edge in between them. Nodes in the graph correspond to bins, for a few bins the images assigned to the bins are also shown.
- The following figure shows the results obtained with the LSH query with maximum search radius till 2, along with the true nearest neighbors obtained with brute force search.
- As can be seen from the above figure, that the top 25 query results become more relevant as the search radius grows.
- As we increase the search radius, we find more neighbors that are more similar to the query image.
- With increased search radius comes a greater number images that have to be searched. Query time is higher as a consequence.
- With sufficiently high search radius, the results of LSH begin to resemble the results of brute-force search.
- As shown in the next figure, the impact of search radius on the quality metrics for the neighbors retrieved are examined.
- Similar experiments were done on the digits images dataset. The following figure shows the results obtained when queried with an image of the digit 5, with the LSH query with maximum search radius till 2, along with the true nearest neighbors obtained with brute force search.
- The next figure shows the results obtained when queried with an image of the digit 6, with the LSH query with maximum search radius till 2, along with the true nearest neighbors obtained with brute force search.
- As can be seen again from the above figures, the top 25 similar images obtained using LSH query starts to resemeble the true nearest neighbors of the query image, as the search radius increases.
- The next animation shows how the nearest neighbors change with LSH search radius, when queried with an image of the digit 0.