Some more Image Processing: Otsu’s Method, Hough Transform and Motion-based Segmentation with Python

Some of the following problems appeared in the lectures and exercises of the Coursera course Image Processing (by Northwestern University). Some of the problem descriptions are taken from the exercises' descriptions.

1. Otsu’s method for automatic thresholding to get binary images

We need to find a threshold to binarize an image, separating the background from the foreground. Using Otsu's method we can automatically find the globally optimal threshold by maximizing the between-class variance. The following figure shows the outline of the technique.

im10.png
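The search outlined above can be sketched in a few lines of NumPy (a minimal sketch; the function and variable names are illustrative, not the course's code):

```python
import numpy as np

def otsu_threshold(image):
    """Find the global threshold that maximizes the between-class variance."""
    hist = np.bincount(image.ravel(), minlength=256).astype(float)
    p = hist / hist.sum()                  # gray-level probabilities
    omega = np.cumsum(p)                   # class-0 probability at each threshold
    mu = np.cumsum(p * np.arange(256))     # first-order cumulative moment
    mu_total = mu[-1]
    # between-class variance sigma_B^2(k); guard against empty classes
    denom = omega * (1.0 - omega)
    denom[denom == 0] = np.nan
    sigma_b2 = (mu_total * omega - mu) ** 2 / denom
    return int(np.nanargmax(sigma_b2))

# usage: binarize a toy bimodal image
img = np.array([[10, 12, 11], [200, 205, 198], [12, 201, 10]], dtype=np.uint8)
t = otsu_threshold(img)
binary = img > t
```

For a bimodal image like the toy example, the maximizer separates the two intensity clusters, and thresholding with it yields the foreground mask.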

The following figures show the thresholds computed and the output binary images obtained by using Otsu’s method on a few gray-scale low-contrast images:

o1.png

The following figures show the output binary images obtained by using Otsu’s method on a few gray-scale high-contrast images obtained using histogram equalization:

o2.png

The following figures show the distribution of the between-class variance for different images and the threshold chosen by Otsu’s method:

 ostu.png

2. Hough Transform to find lines in images

The Hough transform can be used to find lines in an image by taking votes from each of the pixels for all possible orientations of lines, for different values of the parameters in a straight line’s equation. The following figure describes the method.

im14.png
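The voting scheme described above can be sketched as follows (pure NumPy; the (rho, theta) discretization is an illustrative choice, not the course's exact parameters):

```python
import numpy as np

def hough_lines(binary, n_theta=180):
    """Accumulate votes in (rho, theta) space for each foreground pixel."""
    h, w = binary.shape
    diag = int(np.ceil(np.hypot(h, w)))
    thetas = np.deg2rad(np.arange(n_theta) - 90)   # theta in [-90, 90) degrees
    accumulator = np.zeros((2 * diag + 1, n_theta), dtype=int)
    ys, xs = np.nonzero(binary)
    for x, y in zip(xs, ys):
        # rho = x*cos(theta) + y*sin(theta); shift so indices are non-negative
        rhos = np.round(x * np.cos(thetas) + y * np.sin(thetas)).astype(int) + diag
        accumulator[rhos, np.arange(n_theta)] += 1
    return accumulator, thetas, diag

# a 45-degree diagonal line produces one dominant peak in the accumulator
img = np.eye(20, dtype=bool)
acc, thetas, diag = hough_lines(img)
rho_idx, theta_idx = np.unravel_index(acc.argmax(), acc.shape)
```

Thresholding the accumulator at a value just below the peak keeps only the most prominent line, exactly as in the figures below.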

The following figures show a couple of binary images, the votes obtained from the image pixels in the parametric space of straight lines, and how thresholding the votes finds the most prominent line(s) in the image (marked in green). The parameter values marked with arrow(s) represent the most prominent line(s) in the image.

hog2.png

The experiment on the following image is inspired by the YouTube video How Hough Transform works (by Thales Sehn Korting), which can be found here. Again, by thresholding the votes cast by the pixels for the different values of the straight-line parameters, we can find the most prominent line(s) in the image (marked in green). The parameter values marked with arrow(s) represent the most prominent line(s) in the image.

hog.png

 

The next figure shows all the steps in how the Hough transform can be used on any (gray-level) image to find lines (edges). As expected, the higher the voting threshold, the fewer lines / edges are detected (only the more prominent ones survive; they are colored red).

 

tiger_hough.png

 

 

umbc_hough.png

3. Motion-based Segmentation using Accumulative Difference Images (ADI)

In this problem we want to separate the moving objects in consecutive frames of a video from the non-moving objects. The following figures show the problem statement that appeared in an assignment in the same course, along with the theory to be used:

im14_15.png

im15.png

The following animation shows the motion of a moving rectangle.

motion

The next figure shows how motion-based segmentation using ADI-based techniques can be applied to separate the moving rectangle from the static background.

adi.png
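The absolute-ADI idea can be sketched as follows (pure NumPy; the threshold value and the toy moving square are illustrative assumptions, not the assignment's data):

```python
import numpy as np

def absolute_adi(frames, reference=None, threshold=20):
    """Absolute accumulative difference image: count, per pixel, how often a
    frame differs from the reference frame by more than `threshold`."""
    frames = [f.astype(int) for f in frames]
    ref = frames[0] if reference is None else reference.astype(int)
    adi = np.zeros_like(ref)
    for frame in frames[1:]:
        adi += (np.abs(frame - ref) > threshold).astype(int)
    return adi

# a bright 2x2 square moving one pixel right per frame over a dark background
frames = []
for t in range(4):
    f = np.zeros((6, 10), dtype=np.uint8)
    f[2:4, 2 + t:4 + t] = 255
    frames.append(f)
adi = absolute_adi(frames)
moving_mask = adi > 0   # pixels touched by motion
```

Pixels that differ from the reference in many frames accumulate high counts, so thresholding the ADI separates the moving object from the static background.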

Again, the next animations show how the moving objects can be segmented from the non-moving ones in consecutive frames of a video.

in.gif

First the frames of the video are binarized using Otsu's method, as shown below.

motion_bin.gif

Then absolute ADI is applied to separate out the moving objects (here the students), as shown below:

motion_adi.gif

The next video is from a past cricket match with Sachin Tendulkar batting (taken from YouTube), and the following one is the motion-segmented video obtained with ADI:

sachin.gif

sachin_adi.gif

The next video was captured at a live performance of a dance-drama written by Tagore, and the following one is the motion-segmented video obtained with ADI:

kalmrigaya.gif

kalmrigaya_adi

The markdown file can be found here.


Some Image Processing, Information and Coding Theory with Python

Some of the following problems appeared in the exercises in the coursera course Image Processing (by Northwestern University). The following descriptions of the problems are taken directly from the assignment’s description.

1. Some Information and Coding Theory

Computing the Entropy of an Image

The next figure shows the problem statement. Although it was originally implemented in MATLAB, in this article a Python implementation is going to be described.

im11.png
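The entropy computation can be sketched in a few lines of NumPy (names are illustrative):

```python
import numpy as np

def image_entropy(image):
    """Shannon entropy (bits/pixel) of the image's gray-level distribution."""
    hist = np.bincount(image.ravel(), minlength=256)
    p = hist[hist > 0] / image.size        # probabilities of occurring levels
    return float(-np.sum(p * np.log2(p)))

# a two-level image with equal counts has entropy exactly 1 bit/pixel
img = np.array([[0, 255], [255, 0]], dtype=np.uint8)
```

A constant image has entropy 0, and the entropy grows toward 8 bits/pixel as the histogram approaches uniform over the 256 levels.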

Histogram Equalization and Entropy

Histogram equalization is a well-known image transformation technique to improve the contrast of an image. The following figure shows the theory behind the technique: each pixel is transformed by the CDF of the image, and, as can be proved assuming a continuous pixel density, the output image is expected to follow a uniform distribution over the pixel intensities (and thereby have the highest entropy). But since the pixel values are discrete (integers), the result obtained is only near-uniform.

im9.png
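The CDF-based transformation can be sketched as follows (pure NumPy; the scaling by 255 is the usual convention for 8-bit images):

```python
import numpy as np

def equalize_histogram(image):
    """Map each pixel through the (scaled) empirical CDF of the image."""
    hist = np.bincount(image.ravel(), minlength=256)
    cdf = np.cumsum(hist) / image.size
    lut = np.round(255 * cdf).astype(np.uint8)   # transfer function (lookup table)
    return lut[image]

img = np.array([[50, 50], [100, 150]], dtype=np.uint8)
eq = equalize_histogram(img)
```

Each distinct level is pushed toward the position its cumulative frequency dictates, which spreads the histogram over the full intensity range.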

The following figures show a few images, the corresponding equalized images, and how the PMF and CDF change.

beans.png
eq.png

Image Data Compression with Huffman and Lempel Ziv (LZ78) Coding

Now let’s implement the following couple of compression techniques:

  1. Huffman Coding
  2. Lempel-Ziv Coding (LZ78)

and compress a few images and their histogram equalized versions and compare the entropies.

The following figure shows the theory and the algorithms to be implemented for these two source coding techniques:

im12.png

Let’s now implement the Huffman Coding algorithm to compress the data for a few gray-scale images.
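A minimal sketch of the tree construction (standard-library heapq; the helper names are illustrative, not the course's reference implementation):

```python
import heapq
from collections import Counter

def huffman_codes(data):
    """Build a Huffman tree over the symbols in `data` and return a
    dict mapping symbol -> binary code string."""
    freq = Counter(data)
    if len(freq) == 1:                      # degenerate single-symbol case
        return {next(iter(freq)): '0'}
    # heap entries: (frequency, tiebreak, subtree); a subtree is either a
    # leaf symbol (int) or a (left, right) pair
    heap = [(f, i, s) for i, (s, f) in enumerate(freq.items())]
    heapq.heapify(heap)
    count = len(heap)
    while len(heap) > 1:                    # repeatedly merge two rarest trees
        f1, _, left = heapq.heappop(heap)
        f2, _, right = heapq.heappop(heap)
        heapq.heappush(heap, (f1 + f2, count, (left, right)))
        count += 1
    codes = {}
    def walk(node, prefix):
        if isinstance(node, tuple):
            walk(node[0], prefix + '0')
            walk(node[1], prefix + '1')
        else:
            codes[node] = prefix
    walk(heap[0][2], '')
    return codes

codes = huffman_codes([10, 10, 10, 10, 20, 20, 30])
```

The most frequent symbol receives the shortest code, and the resulting code set is prefix-free by construction.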

The following figures show how the tree is constructed by the Huffman coding algorithm (the starting, final and a few intermediate steps) for the following low-contrast image beans. Although the alphabet is the set {0,1,…,255}, only the pixel values present in the image are used. It takes 44 steps to construct the tree.

beans

htree_1_4l.png

htree_14.png
htree_40
htree_41
htree_42
htree_43
htree_44

The following table shows the Huffman Codes for different pixel values for the above low-contrast image beans.

 

 index pixel code
15 108 0
3 115 1
20 109 10
1 112 11
32 107 100
44 110 101
18 96 110
11 95 111
21 120 1000
35 91 1001
6 116 1010
24 123 1011
2 98 1100
13 100 1101
17 94 1111
12 119 10000
4 114 10001
42 103 10011
16 106 10100
37 132 10101
41 130 10111
5 117 11000
26 125 11010
40 99 11011
38 118 11100
0 113 11101
22 121 11111
39 131 101011
27 127 101111
31 102 110011
25 124 110101
23 122 110111
33 104 111011
34 105 111111
29 129 1001111
30 93 1010101
7 137 1010111
14 255 1101011
28 128 1101111
36 133 11010111
10 134 11101011
9 135 101010111
43 90 1001010111
8 136 10001010111
19 89 100001010111
Let’s now repeat the Huffman-tree construction for the following histogram-equalized image beans. The goal is:
  1. First construct the tree for the equalized image.
  2. Use the tree to encode the image data and then compare the compression ratio with the one obtained using the same algorithm with the low-contrast image.
  3. Determine whether histogram equalization increases or reduces the compression ratio, or equivalently the entropy, of the image.

The following figures show how the tree is constructed by the Huffman coding algorithm (the starting, final and a few intermediate steps) for the equalized image beans. Again, although the alphabet is the set {0,1,…,255}, only the pixel values present in the image are used. It takes 40 steps to construct the tree; as can be seen from the following figures, the tree constructed is structurally different from the one constructed for the low-contrast version of the same image.

equalized_beans.png
htree_1_4.png
htree_14
htree_36
htree_37
htree_38
htree_39
htree_40

The following table shows the Huffman Codes for different pixel values for the above high-contrast image beans.

 index pixel code
13 119 0
11 250 1
5 133 10
0 150 11
8 113 100
32 166 101
36 60 110
40 30 111
25 208 1000
38 16 1001
21 181 1010
33 224 1011
9 67 1100
37 80 1101
27 244 1111
23 201 10000
17 174 10001
6 89 10011
34 106 10100
24 141 10101
28 246 10111
22 188 11000
10 235 11010
30 71 11011
2 195 11100
3 158 11101
1 214 11111
31 228 100001
15 84 101011
12 251 101111
39 18 110011
4 219 110111
14 93 111011
26 99 111111
19 253 1000001
7 238 1001111
18 21 1010111
29 241 1101111
35 248 1110011
20 0 10101111
16 255 110101111

The following figure shows the compression ratios for different images using Huffman and LZ78 codes, on both the low-contrast and the high-contrast images (obtained using histogram equalization). The following observations can be drawn from the comparative results shown in the following figures (here H represents Huffman and L represents LZ78):

  1. The entropy of an image stays almost the same after histogram equalization.
  2. The compression ratio with Huffman / LZ78 also stays almost the same for an image with and without histogram equalization.
  3. The Huffman codes achieve higher compression than LZ78 in some cases.

hist.png

 

The following shows the first few symbol/code pairs from the dictionary obtained with the LZ78 algorithm, with the alphabet set {0,1,..,255}, for the low-contrast beans image (there are around 12k codes in the dictionary for this image):

index symbol (pixels) code
7081 0 0
10325 1 1
1144 2 10
7689 4 100
8286 8 1000
7401 16 10000
8695 32 100000
10664 64 1000000
5926 128 10000000
7047 128,128 1000000010000000
1608 128,128,128 100000001000000010000000
12399 128,128,128,128 10000000100000001000000010000000
3224 128,128,128,128,128 1000000010000000100000001000000010000000
3225 128,128,128,128,129 1000000010000000100000001000000010000001
3231 128,128,128,128,127 100000001000000010000000100000001111111
12398 128,128,128,129 10000000100000001000000010000001
12401 128,128,128,123 1000000010000000100000001111011
12404 128,128,128,125 1000000010000000100000001111101
12403 128,128,128,127 1000000010000000100000001111111
1609 128,128,129 100000001000000010000001
8780 128,128,129,128 10000000100000001000000110000000
7620 128,128,130 100000001000000010000010
2960 128,128,130,125 1000000010000000100000101111101
2961 128,128,130,127 1000000010000000100000101111111
8063 128,128,118 10000000100000001110110
1605 128,128,121 10000000100000001111001
6678 128,128,121,123 100000001000000011110011111011
1606 128,128,122 10000000100000001111010
1601 128,128,124 10000000100000001111100
1602 128,128,125 10000000100000001111101
4680 127,127,129 1111111111111110000001
2655 127,127,130 1111111111111110000010
1254 127,127,130,128 111111111111111000001010000000
7277 127,127,130,129 111111111111111000001010000001
9087 127,127,120 111111111111111111000
9088 127,127,122 111111111111111111010
595 127,127,122,121 1111111111111111110101111001
9089 127,127,123 111111111111111111011
9090 127,127,124 111111111111111111100
507 127,127,124,120 1111111111111111111001111000
508 127,127,124,125 1111111111111111111001111101
9091 127,127,125 111111111111111111101
9293 127,127,125,129 11111111111111111110110000001
2838 127,127,125,132 11111111111111111110110000100
1873 127,127,125,116 1111111111111111111011110100
9285 127,127,125,120 1111111111111111111011111000
9284 127,127,125,125 1111111111111111111011111101
10394 127,127,125,125,131 111111111111111111101111110110000011
4357 127,127,125,125,123 11111111111111111110111111011111011
4358 127,127,125,125,125 11111111111111111110111111011111101
6615 127,127,125,125,125,124 111111111111111111101111110111111011111100
9283 127,127,125,127 1111111111111111111011111111
9092 127,127,127 111111111111111111111
1003 127,127,127,129 11111111111111111111110000001
7031 127,127,127,130 11111111111111111111110000010
1008 127,127,127,121 1111111111111111111111111001
1009 127,127,127,122 1111111111111111111111111010
1005 127,127,127,125 1111111111111111111111111101
2536 127,127,127,125,125 11111111111111111111111111011111101
5859 127,127,127,127 1111111111111111111111111111

12454 rows × 2 columns
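The dictionary growth shown in the tables can be sketched as follows: a minimal LZ78 encoder in the textbook (index, symbol) formulation, which differs cosmetically from the code strings shown above; names are illustrative:

```python
def lz78_encode(sequence):
    """LZ78: emit (dictionary-index, symbol) pairs while growing the phrase
    dictionary; index 0 stands for the empty prefix."""
    dictionary = {}          # phrase (tuple of symbols) -> dictionary index
    output = []
    phrase = ()
    for symbol in sequence:
        candidate = phrase + (symbol,)
        if candidate in dictionary:
            phrase = candidate              # keep extending the match
        else:
            output.append((dictionary.get(phrase, 0), symbol))
            dictionary[candidate] = len(dictionary) + 1
            phrase = ()
    if phrase:                              # flush any pending match
        output.append((dictionary[phrase], None))
    return output, dictionary

pairs, dct = lz78_encode([1, 1, 1, 2, 1, 1, 2])
```

Repeated runs (like the long 128,128,… phrases in the table above) get absorbed into ever-longer dictionary entries, which is where the compression comes from.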

The next table shows the dictionary obtained for the high-contrast beans image using LZ78 algorithm:

index symbol (pixels) code
7003 0 0
11341 0,0 00
5300 0,0,16 0010000
5174 0,0,16,16 001000010000
10221 0,0,16,16,16 00100001000010000
10350 0,16 010000
9145 0,16,0 0100000
1748 0,16,0,0 01000000
12021 0,16,16 01000010000
7706 0,16,16,16 0100001000010000
6343 0,16,16,16,16 010000100001000010000
12355 0,16,16,16,16,18 01000010000100001000010010
6346 0,16,16,16,18 010000100001000010010
12019 0,16,18 01000010010
8496 0,16,18,18 0100001001010010
3459 0,67 01000011
8761 0,67,119 010000111110111
10345 0,18 010010
12295 0,18,80 0100101010000
5934 0,255 011111111
1356 0,255,214 01111111111010110
10272 1 1
1108 2 10
7611 4 100
8187 8 1000
7323 16 10000
9988 16,0 100000
8625 32 100000
10572 64 1000000
2806 16,0,0 1000000
7284 255,224 1111111111100000
10857 255,113 111111111110001
7286 255,228 1111111111100100
9053 255,228,228 111111111110010011100100
7734 255,235 1111111111101011
10853 255,119 111111111110111
1315 255,238 1111111111101110
7105 255,238,238 111111111110111011101110
10212 255,238,238,238 11111111111011101110111011101110
11699 255,30 1111111111110
6120 255,60 11111111111100
6846 255,241 1111111111110001
10205 255,241,241 111111111111000111110001
11894 255,241,241,251 11111111111100011111000111111011
9428 255,60,60 11111111111100111100
7768 255,60,60,60 11111111111100111100111100
111 255,60,60,60,67 111111111111001111001111001000011
5613 255,60,60,60,30 1111111111110011110011110011110
8713 255,60,60,60,30,30 111111111111001111001111001111011110
11729 255,60,60,60,30,30,30 11111111111100111100111100111101111011110
6848 255,244 1111111111110100
6850 255,246 1111111111110110
6845 255,248 1111111111111000
9401 255,248,248 111111111111100011111000
880 255,250 1111111111111010
881 255,251 1111111111111011
3725 255,251,251 111111111111101111111011
11316 255,251,251,235 11111111111110111111101111101011
879 255,253 1111111111111101
7215 255,253,253 111111111111110111111101

12394 rows × 2 columns

 

2. Spatial Domain Watermarking: Embedding images into images

The following figure describes the basics of spatial domain digital watermarking:

im13.png

The next figures show the watermark image and the grayscale host image to be used for embedding the watermark image inside the host image.

wm.png

The next animations show the embedding of the most significant bit of each pixel of the watermark image into bit 0,1,…,7 (respectively) of each pixel of the gray-scale host image, at its upper-left corner. As expected, when embedded into a more significant bit of the host image, the watermark becomes more prominent than when embedded into a less significant bit.

watermark.gif
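The bit-plane embedding can be sketched as follows (pure NumPy; the function name and the 2x2 toy watermark are illustrative):

```python
import numpy as np

def embed_bit(host, watermark_msb, bit_plane):
    """Embed the watermark's MSB plane into bit `bit_plane` of the host
    (plane 0 = least significant, 7 = most significant)."""
    out = host.copy()
    h, w = watermark_msb.shape
    region = out[:h, :w]                           # upper-left corner of the host
    region &= np.uint8(~(1 << bit_plane) & 0xFF)   # clear the target bit
    region |= (watermark_msb.astype(np.uint8) << bit_plane)
    return out

host = np.full((4, 4), 200, dtype=np.uint8)
wm = np.array([[1, 0], [0, 1]], dtype=np.uint8)    # MSBs of a 2x2 watermark
stamped = embed_bit(host, wm, bit_plane=7)
```

Embedding into plane 7 changes pixel values by up to 128 (highly visible), while plane 0 changes them by at most 1 (essentially invisible), which is exactly the behavior the animations show.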

The next figure shows how the distribution of the pixel intensities changes when embedding into different bits.

dist.png

Some Image and Video Processing: Motion Estimation with Block-Matching in Videos, Noisy and Motion-blurred Image Restoration with Inverse Filter in Python and OpenCV

The following problems appeared in the exercises in the coursera course Image Processing (by Northwestern University). The following descriptions of the problems are taken directly from the exercises’ descriptions.

1. Analysis of image quality after applying an n×n Low Pass Filter (LPF) for different n

The next figure shows the problem statement. Although it was originally implemented in MATLAB, in this article a Python implementation is going to be described.

ex1.png

The following figure shows how the images get more and more blurred after the application of the nxn LPF as n increases.

im1

The following figure shows how the quality of the transformed image, measured in terms of PSNR with respect to the original image, degrades as n (the LPF kernel width) increases.

im2.png
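The PSNR quality measure used above can be sketched as follows (pure NumPy; the toy images are illustrative):

```python
import numpy as np

def psnr(original, degraded, peak=255.0):
    """Peak signal-to-noise ratio in dB between two same-sized images."""
    mse = np.mean((original.astype(float) - degraded.astype(float)) ** 2)
    if mse == 0:
        return float('inf')                # identical images
    return float(10 * np.log10(peak ** 2 / mse))

a = np.zeros((8, 8), dtype=np.uint8)
b = a.copy(); b[0, 0] = 16                 # a single corrupted pixel
```

Here the single corrupted pixel gives an MSE of 4 and hence a PSNR of about 42 dB; stronger degradations push the PSNR down.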

2. Changing Resolution of an Image with Down/Up-Sampling

The following figure describes the problem:

ex2.png

The following steps need to be followed:

  1. Smooth the original image with a 3×3 LPF (box1) kernel.
  2. Downsample (choose the pixels in every other row and column).
  3. Upsample the image (double the width and height).
  4. Convolve the upsampled image with the same kernel to obtain the final image.

Although the original implementation used MATLAB, the following results come from a Python implementation.

sampling3.png

As the kernel size n increases, the quality of the final image obtained by down/up-sampling the original image decreases, as shown in the following figure.

im3.png

3. Motion Estimation in Videos using Block matching between consecutive video frames

The following figure describes the problem:

ex3.png

For example, we are provided with an input image with a known location of an object (a face, marked with a rectangle), as shown below.

 digital-images-week4_quizzes-frame_1_with_block.jpg

Now we are provided with another image, the next frame extracted from the same video, but with the face unmarked, as shown below. The problem is to locate the face in this next frame and mark it using a simple block-matching technique (thereby estimating the motion).

digital-images-week4_quizzes-frame_2.jpg

As shown below, using just simple block matching, we can mark the face in the very next frame.

digital-images-week4_quizzes-frame_2_with_block.jpg
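A minimal exhaustive block-matching sketch (pure NumPy; the search radius and the cost function, mean absolute difference, are illustrative choices):

```python
import numpy as np

def block_match(block, frame, search=8, top_left=(0, 0)):
    """Exhaustively search a window around `top_left` in the next frame for
    the position minimizing the mean absolute difference with `block`."""
    bh, bw = block.shape
    y0, x0 = top_left
    best, best_pos = np.inf, top_left
    for dy in range(-search, search + 1):
        for dx in range(-search, search + 1):
            y, x = y0 + dy, x0 + dx
            if y < 0 or x < 0 or y + bh > frame.shape[0] or x + bw > frame.shape[1]:
                continue                    # candidate block falls off the frame
            cost = np.mean(np.abs(frame[y:y + bh, x:x + bw].astype(int)
                                  - block.astype(int)))
            if cost < best:
                best, best_pos = cost, (y, x)
    return best_pos

# a textured patch shifted by (2, 3) between frames is recovered exactly
rng = np.random.default_rng(0)
frame1 = rng.integers(0, 256, (32, 32), dtype=np.uint8)
frame2 = np.roll(np.roll(frame1, 2, axis=0), 3, axis=1)
patch = frame1[8:16, 8:16]
pos = block_match(patch, frame2, search=5, top_left=(8, 8))
```

The recovered displacement (the difference between `pos` and `top_left`) is the estimated motion vector of the block between the two frames.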

Now let's play with the following two videos. The first one is a video of some students working in a university corridor (obtained from YouTube). We extract some consecutive frames, mark a face in one frame, and then use that frame to mark the face in each of the remaining consecutive frames, thereby marking the entire video and estimating the motion using the simple block-matching technique alone.

 in.gif

The following figure shows the frame with the face marked; we then use this frame and the block-matching technique to estimate the motion of the student in the video, by marking his face in all the consecutive frames and reconstructing the video, as shown below.

block_001.png

out.gif
The second video is of the Google CEO Mr. Pichai talking (obtained from YouTube). Again we extract some consecutive frames, mark his face in one frame, and use that frame to mark his face in each of the remaining consecutive frames, thereby marking the entire video and estimating the motion using the simple block-matching technique alone.

in.gif

The following figure shows the frame with the face marked; we then use this frame and the block-matching technique to estimate the motion of the Google CEO in the video, by marking his face in all the consecutive frames and reconstructing the video, as shown below.

block_001.png

out.gif
The most interesting thing above is that we have not used any sophisticated image features such as HOG / SIFT, yet simple block matching did a pretty decent job.

4. Using a Median Filter to remove salt-and-pepper noise from an image

The following figure describes the problem:

ex4.png

The following figure shows the original image, the noisy image and images obtained after applying the median filter of different sizes (nxn, for different values of n):

im4.png
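A median filter can be sketched as follows (pure NumPy; the kernel size and the toy image are illustrative):

```python
import numpy as np

def median_filter(image, k=3):
    """k x k median filter with edge replication; a non-linear filter that
    is effective against salt-and-pepper noise."""
    pad = k // 2
    padded = np.pad(image, pad, mode='edge')
    # collect the k*k shifted views of the image and take the per-pixel median
    stack = [padded[dy:dy + image.shape[0], dx:dx + image.shape[1]]
             for dy in range(k) for dx in range(k)]
    return np.median(np.stack(stack), axis=0).astype(image.dtype)

img = np.full((5, 5), 100, dtype=np.uint8)
img[2, 2] = 255                            # a 'salt' pixel
clean = median_filter(img, k=3)
```

Because the median ignores extreme values, an isolated salt (or pepper) pixel is replaced by its neighborhood's typical value, while edges are preserved far better than with a mean filter.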

As can be seen from the following figure, the optimal median filter size is 5×5, which generates the highest quality output, when compared to the original image.

im5.png

Using Inverse Filter to Restore noisy images with Motion Blurs

The following figure shows the description of the problem:

ex5.png

The following figure shows the theory behind the inverse filters in the (frequency) spectral domain.

im6.png

The following are the steps for restoring an image with motion blur and noise corruption using the inverse filter:

  1. Generate the restoration filter in the frequency domain (with the Fast Fourier Transform) from the frequency response of the motion blur, using the threshold T.
  2. Get the spectrum of the blurred and noise-corrupted image (the input to restoration).
  3. Compute the spectrum of the restored image by multiplying the restoration filter with the spectrum of the blurred noisy image (multiplication in the frequency domain corresponds to convolution in the spatial domain).
  4. Generate the restored image from its spectrum (with the inverse Fourier Transform).
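The steps above can be sketched as follows (a pure-NumPy pseudo-inverse filter; the simulated circular motion blur and the threshold value are illustrative assumptions, not the assignment's data):

```python
import numpy as np

def inverse_filter(blurred, h, T=0.1):
    """Pseudo-inverse restoration: invert the blur's frequency response only
    where its magnitude exceeds the threshold T; zero it elsewhere."""
    H = np.fft.fft2(h, s=blurred.shape)
    H_inv = np.zeros_like(H)
    mask = np.abs(H) > T
    H_inv[mask] = 1.0 / H[mask]
    restored_spectrum = np.fft.fft2(blurred) * H_inv
    return np.real(np.fft.ifft2(restored_spectrum))

# simulate a (circular) 3-pixel horizontal motion blur, then restore it
rng = np.random.default_rng(1)
img = rng.random((16, 16))
h = np.zeros((16, 16)); h[0, :3] = 1.0 / 3     # motion-blur kernel
blurred = np.real(np.fft.ifft2(np.fft.fft2(img) * np.fft.fft2(h)))
restored = inverse_filter(blurred, h, T=0.05)
```

The threshold T prevents division by near-zero frequency responses, which would otherwise amplify noise enormously; raising T suppresses more frequencies and trades restoration accuracy for noise robustness.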

The following figures show the inverse filter applied with different values of the threshold T (from 0.1 to 0.9, in that order) to restore the noisy / blurred image.

inverse_filter_0.1.png
inverse_filter_0.2.png
inverse_filter_0.3.png
inverse_filter_0.4.png
inverse_filter_0.5.png
inverse_filter_0.6.png
inverse_filter_0.7.png
inverse_filter_0.8.png
inverse_filter_0.9.png

The following figure shows how the restored PSNR and the ISNR vary with different values of the threshold T.

im8.png

Some more Computational Photography: Creating Video Textures in Python and OpenCV

The following problem appeared as an assignment in the Coursera Course Computational Photography (by Georgia Tech, 2013). The following description of the problem is taken directly from the assignment’s description.

Introduction

In this article, we shall be applying our computational photography magic to video, with the purpose of creating video textures, or infinitely looping pieces of video. The input and output for the assignment are provided as folders of images. The reasoning behind this, and suggestions for moving between different formats of video, are given in the video appendix in section 4.

Video Volume and SSD

Let's get more familiar with the 4D coordinate system of a video volume (time x row x column x channel). To compute SSD, we need an image distance between every pair of frames in the video. SSD stands for sum of squared distances, which, as the name suggests, requires taking the pixelwise difference of two frames, squaring it, and summing over all pixels.
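The pairwise SSD computation can be sketched as follows (pure NumPy; names are illustrative):

```python
import numpy as np

def ssd_matrix(frames):
    """Pairwise sum-of-squared-differences between all frames of a video
    volume (time x row x column x channel)."""
    n = len(frames)
    flat = frames.reshape(n, -1).astype(float)   # one row per frame
    diff1 = np.zeros((n, n))
    for i in range(n):
        diff1[i] = np.sum((flat - flat[i]) ** 2, axis=1)
    return diff1

# three constant frames: the first and last are identical
video = np.stack([np.full((4, 4, 3), v, dtype=np.uint8) for v in (0, 10, 0)])
d = ssd_matrix(video)
```

The matrix is symmetric with a zero diagonal; small off-diagonal entries mark visually similar frame pairs, which are the candidate transition points.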

Diff2

This function takes in a difference matrix created by ssd and updates it with dynamic information. The intuition is as follows: when considering the transition cost from frame i to j, we should not only look at the two frames themselves but also at the preceding and following frames. So we have:

im1.png

Or, in other words, we take a weighting function and sum across the ssd outputs of the preceding and following frames in order to update the transition costs with dynamic information. We are going to use a binomial filter for this purpose. The following figure shows the basics of a binomial filter:

im2.png
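The binomial weighting of neighboring transitions can be sketched as follows (a minimal interpretation of the scheme described above; border entries, which lack full context, are simply left at infinity):

```python
import numpy as np

def binomial_weights(radius=2):
    """Binomial (approximately Gaussian) weights of length 2*radius + 1,
    built by repeated convolution of [1, 1] (Pascal's triangle)."""
    row = np.array([1.0])
    for _ in range(2 * radius):
        row = np.convolve(row, [1.0, 1.0])
    return row / row.sum()

def diff2(diff1, radius=2):
    """Re-weight each transition cost diff1[i, j] by a binomially weighted
    sum over the costs of the surrounding frame pairs (i+k, j+k)."""
    w = binomial_weights(radius)
    n = diff1.shape[0]
    out = np.full_like(diff1, np.inf)      # borders lack a full window
    for i in range(radius, n - radius):
        for j in range(radius, n - radius):
            window = diff1[i - radius:i + radius + 1, j - radius:j + radius + 1]
            out[i, j] = float(np.sum(w * np.diagonal(window)))
    return out
```

A transition is now only cheap if the few frames before and after it are also similar, which is what makes the resulting jumps visually smooth.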

Find the biggest loop and synthesize loop

find the biggest loop

Now that we have the costs of transitioning from one frame to another, we should find a suitable loop for our video texture. Simply taking the smallest transition distance here might not be desirable: what if the resulting loop is only one frame long? In order to state within the code that loop size matters, we will use a trick that is frequent in engineering disciplines: we find the transition which is optimal under the metric score(s, f) = α * (f − s) − diff2[f, s].

Note the two terms of the metric. The first is the difference between the final and starting frames of our loop; this term is large when the loop is long. The second term is the output of our diff2 function, which tells us the cost of transitioning from finish to start. Subtracting this term turns it into a 'smoothness' parameter: it is larger when the transition is less noticeable.

The last bit of wizardry is the alpha parameter. Because the size of the loop and the transition cost are likely to be in very different units, we introduce a new parameter to make them comparable. We can manipulate alpha to control the tradeoff between loop size and smoothness. Large alphas prefer large loop sizes, and small alphas prefer smoother transitions. The find biggest loop function has to compute this score for every choice of s and f, and return the s and f that correspond to the largest score.
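The score maximization can be sketched as a brute-force search (the toy diff2 matrix here is illustrative):

```python
import numpy as np

def find_biggest_loop(diff2_mat, alpha):
    """Return (start, finish) maximizing score(s, f) = alpha*(f - s) - diff2[f, s]."""
    n = diff2_mat.shape[0]
    best_score, best = -np.inf, (0, 0)
    for s in range(n):
        for f in range(s + 1, n):
            score = alpha * (f - s) - diff2_mat[f, s]
            if score > best_score:
                best_score, best = score, (s, f)
    return best

# a toy 3-frame cost matrix: the long loop has an expensive transition
d = np.array([[0., 1., 8.],
              [1., 0., 1.],
              [8., 1., 0.]])
start, finish = find_biggest_loop(d, alpha=2.0)
```

With a small alpha the cheap short loop wins; increasing alpha makes the longer (but costlier) loop preferable, illustrating the size/smoothness tradeoff.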

synthesize loop

The finishing step is to take our video volume and turn it back into a series of images, now cropping it to contain only the loop we found. Let's implement a function that does just that. It is pretty much the inverse of the video volume function implemented earlier, except that this time we start with a full video volume and return a list of only the image frames between start and finish (inclusive).

The following figure explains how we are going to find the video texture from an input candle video with 90 frames.

im0.png

Input Video

Let's use the following candle video as the input and extract 100 frames from it. As can be seen, there are some jumps from the end to the beginning of the video, and our aim is to remove the jump and create a smooth video texture that can be played infinitely many times without any visible jump.

anim_source.gif

The following figure shows the 100×100 distance matrix (diff1) computed in between any two video frames:

im3.png

The following figure shows the distance matrix (diff2) computed after taking into account the transition costs (weights) and the distances of the adjacent frames in a window when computing the distance between any two frames.

im4.png

Finally, the following figure shows the distance matrix (diff3) computed after taking into account the tradeoff between the length of the video texture and the smoothness of the transition between the start and end frames (with the controlling parameter alpha=90).

im5.png

Output Video Texture

The following video shows the output video texture produced (by choosing the start and end frames that give the best score according to the above matrix); as can be seen, the video now contains no jump and is pretty smooth.

anim_out.gif

The markdown file can be found here.

Some more Computational Photography: Merging and Blending Images using Gaussian and Laplacian Pyramids in Python

The following problem appeared as an assignment in the coursera course Computational Photography (by Georgia Tech, 2013). The following description of the problem is taken directly from the assignment’s description.

Introduction

In this article, we shall be putting together a pyramid blending pipeline that will allow us to blend any two images with a mask. The blending procedure takes the two images and the mask, and splits them into their red, green and blue channels. It then blends each channel separately.

Then a laplacian pyramid will be constructed for each of the two images. The mask image will then be scaled to the range [0,1] and a gaussian pyramid constructed for it. Finally, the two laplacian pyramids will be blended according to the mask pyramid and collapsed down to the output image.

Pixel values of 255 in the mask image are scaled to the value 1 in the mask pyramid and assign the strongest weight to the image labeled 'white' during blending. Pixel values of 0 in the mask image are scaled to the value 0 in the mask pyramid and assign the strongest weight to the image labeled 'black' during blending.

The following figures describe the pyramid blending process with a couple of sample images.

im1.png
im2.png

Reduce

This function takes an image and subsamples it down to a quarter of the size (dividing the height and width by two). Before we subsample the image, however, we need to first smooth out the image.

The following 5×5 generating kernel will be used for convolution, for the reduce and later for the expand function.

kern.png

Expand

This function takes an image and supersamples it to four times the size (multiplying the height and width by two). After increasing the size, we have to interpolate the missing values by running over it with a smoothing filter.
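The reduce and expand operations described above can be sketched as follows (pure NumPy; the 5-tap generating-kernel weights and the helper names are assumptions in the spirit of the assignment, not its exact code):

```python
import numpy as np

def generating_kernel(a=0.4):
    """5x5 generating kernel as an outer product; a = 0.4 gives the
    Gaussian-like weights [0.05, 0.25, 0.4, 0.25, 0.05]."""
    w = np.array([0.25 - a / 2.0, 0.25, a, 0.25, 0.25 - a / 2.0])
    return np.outer(w, w)

def convolve2d_same(image, kernel):
    """Naive 'same'-size 2D convolution with edge replication."""
    kh, kw = kernel.shape
    padded = np.pad(image.astype(float), ((kh // 2,), (kw // 2,)), mode='edge')
    out = np.zeros(image.shape, dtype=float)
    for dy in range(kh):
        for dx in range(kw):
            out += kernel[kh - 1 - dy, kw - 1 - dx] * \
                   padded[dy:dy + image.shape[0], dx:dx + image.shape[1]]
    return out

def reduce_layer(image, kernel):
    """Smooth, then keep every other row and column."""
    return convolve2d_same(image, kernel)[::2, ::2]

def expand_layer(image, kernel):
    """Insert zeros between samples, smooth, and rescale by 4 to keep brightness."""
    up = np.zeros((2 * image.shape[0], 2 * image.shape[1]))
    up[::2, ::2] = image
    return 4 * convolve2d_same(up, kernel)
```

The factor of 4 in expand compensates for the three zeros inserted around every sample, so a constant image stays constant (away from the borders) through a reduce/expand round trip.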

Gaussian Pyramid

This function takes an image and builds a pyramid out of it. The first layer of this pyramid is the original image, and each subsequent layer of the pyramid is the reduced form of the previous layer. Within the code, these pyramids are represented as lists of arrays, so pyramid = [layer0, layer1, layer2, …]. The reduce function implemented in the previous part needs to be used in order to implement this function.

Laplacian Pyramid

This function takes a gaussian pyramid constructed by the previous function, and turns it into a laplacian pyramid. Like with gaussian pyramids, laplacian pyramids are represented as lists and the expand function implemented in the previous part needs to be used in order to implement this function.

Blend

In this part, the pipeline will be set up by implementing the actual blend function, and then implementing a collapse function that will allow us to disassemble our laplacian pyramid into an output image.

The blend function takes three pyramids:

  • white – a laplacian pyramid of an image
  • black – a laplacian pyramid of another image
  • mask – a gaussian pyramid of a mask image

It should perform an alpha blend of the two laplacian pyramids according to the mask pyramid: we blend each pair of layers together using the mask of that layer as the weight. Pixels where the mask is 1 should be taken from the white image, pixels where the mask is 0 from the black image, and pixels with a mask value of 0.5 should be an equal blend of the white and black images, etc.

Collapse

This function is given a laplacian pyramid, and is expected to ‘flatten’ it to an image. We need to take the top layer, expand it, and then add it to the next layer. This results in a pyramid of one level less than we started with. We continue this process until we are left with only a single layer.
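The blend and collapse steps can be sketched as follows (a toy nearest-neighbour upsampler stands in for the expand function here; all names are illustrative):

```python
import numpy as np

def blend_pyramids(white_pyr, black_pyr, mask_pyr):
    """Alpha-blend two laplacian pyramids layer by layer, weighting the
    'white' pyramid by the gaussian mask pyramid."""
    return [m * w + (1.0 - m) * b
            for w, b, m in zip(white_pyr, black_pyr, mask_pyr)]

def collapse(pyramid, expand):
    """Flatten a laplacian pyramid: repeatedly expand the smallest layer,
    crop to the next layer's shape, and add, until one image remains."""
    image = pyramid[-1]
    for layer in reversed(pyramid[:-1]):
        up = expand(image)
        image = up[:layer.shape[0], :layer.shape[1]] + layer
    return image

# toy two-level pyramids with a half-and-half mask
white = [np.full((4, 4), 2.0), np.full((2, 2), 2.0)]
black = [np.zeros((4, 4)), np.zeros((2, 2))]
mask = [np.full((4, 4), 0.5), np.full((2, 2), 0.5)]
blended = blend_pyramids(white, black, mask)
nn_expand = lambda im: np.repeat(np.repeat(im, 2, axis=0), 2, axis=1)
result = collapse(blended, nn_expand)
```

Cropping after each expand handles layers whose dimensions were odd before reduction; in the real pipeline the expand function from the pyramid section would be passed in instead of the toy upsampler.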

Results (some images are taken from the same Computational Photography Course)

The following figures show a few pairs of input images, the masks, the blended output images, along with the gaussian and laplacian pyramids.

1. Input Images

i1.png

Gaussian Pyramids

 sample_gauss_pyr.jpg

Laplacian Pyramids

sample_laplace_pyr.jpg

Blended Laplacian Pyramids

sample_outpyr.jpg

Blended Output Image

o1.png

2. Input Images

i2.png

Gaussian Pyramids

 sample1_gauss_pyr.jpg

Laplacian Pyramids

 sample1_laplace_pyr.jpg

Blended Laplacian Pyramids

sample1_outpyr.png

Blended Output Image

o2.png

3. Input Images

i3.png

Gaussian Pyramids

sample2_gauss_pyr.png

Laplacian Pyramids

sample2_laplace_pyr.png

Blended Laplacian Pyramids

sample2_outpyr.png

Blended Output Image

o3.png

4. Input Images

i4.png

Gaussian Pyramids

sample3_gauss_pyr.png

Laplacian Pyramids

sample3_laplace_pyr.png

Blended Laplacian Pyramids

sample3_outpyr.png

Blended Output Image

o4.png
The next example shows how the pyramid pipeline can be used to edit an image background.

5. Input Images

i5.png

Gaussian Pyramids

sample4_gauss_pyr.png

Laplacian Pyramids

sample4_laplace_pyr.png

Blended Laplacian Pyramids

sample4_outpyr.png

Blended Output Image

o5.png

As can be seen from the output image above, the pyramid blending pipeline did a good job of blending the two images.

6. Input Images

i6.png

Gaussian Pyramids

sample6_gauss_pyr.png

Laplacian Pyramids

sample6_laplace_pyr.png

Blended Laplacian Pyramids

sample6_outpyr.png

Blended Output Image

 o6.png

The markdown file can be found here.

Some Image Processing and Computational Photography: Convolution, Filtering and Edge Detection with Python

The following problems appeared as an assignment in the coursera course Computational Photography (by Georgia Institute of Technology). The following descriptions of the problems are taken directly from the assignment’s descriptions.

Introduction

In this article, we shall be playing around with images, filters, and convolution. We will begin by building a function that performs convolution. We will then experiment with constructing and applying a variety of filters. Finally, we will see one example of how these ideas can be used to create interesting effects in our photos, by finding and coloring edges in our images.

Filtering

In this section, let’s apply a few filters to some images. Filtering basically means replacing each pixel of an image by the linear combination of its neighbors. We need to understand the following concepts in this context:

  1. Kernel (mask) for a filter: defines which neighbors are to be considered and what weights they are given.
  2. Cross-Correlation vs. Convolution: determines how the kernel is applied to the neighboring pixels to compute the linear combination.

Convolution

The following figure describes the basic concepts of cross-correlation and convolution. Basically convolution flips the kernel before applying it to an image.

im1.png
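As a concrete sketch, both operations can be implemented directly in NumPy; this is an illustrative brute-force version with zero padding at the borders, not the assignment's reference code:

```python
import numpy as np

def cross_correlate2d(image, kernel):
    """Slide the kernel over the image (no flip) and sum elementwise
    products; zero padding keeps the output the same size as the input."""
    kh, kw = kernel.shape
    ph, pw = kh // 2, kw // 2
    padded = np.pad(image, ((ph, ph), (pw, pw)), mode='constant')
    out = np.zeros_like(image, dtype=float)
    for i in range(image.shape[0]):
        for j in range(image.shape[1]):
            out[i, j] = np.sum(padded[i:i + kh, j:j + kw] * kernel)
    return out

def convolve2d(image, kernel):
    """Convolution = cross-correlation with the kernel flipped in both axes."""
    return cross_correlate2d(image, kernel[::-1, ::-1])
```

Convolving an impulse image with an asymmetric kernel reproduces the kernel itself, while cross-correlation reproduces it flipped, which is exactly the behavior illustrated in the figures below.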

Cross-Correlation vs. Convolution

The following figures show how a custom kernel applied to an impulse response image changes the image, both with cross-correlation and with convolution.

The impulse response image is shown below:

impulse_response

The figure below shows a 3×3 custom kernel to be applied to the above impulse response image.

kernel_custom.png
The following figure shows the output images after applying cross-correlation and convolution with the above kernel on the above impulse response image. As can be seen, convolution produces the desired output.
cc.png

The following figures show the application of the same kernel on some grayscale images and the output images after convolution.

all_conv_kernel_custom.png

Now let’s apply the following 5×5 flat box2 kernel shown below.

kernel_box2.png

The following figures show the application of the above box kernel on some grayscale images and the output images after convolution. Notice how the output images are blurred.

all_conv_kernel_box2.png

The following figures show the application of the box kernels of different size on the grayscale image lena and the output images after convolution. As expected, the blur effect increases as the size of the box kernel increases.

lena_conv_kernel_box.png

Gaussian Filter

The following figure shows an 11×11 Gaussian kernel generated by taking the outer product of the densities of two 1D i.i.d. Gaussians with mean 0 and s.d. 3.

kernel_gaussian_5_3.png
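Such a kernel can be built in a couple of lines; the function below is a sketch which also normalizes the kernel so its weights sum to 1 (a common convention, though the course figures may use a different scaling):

```python
import numpy as np

def gaussian_kernel(size=11, sigma=3.0):
    """Build a 2D Gaussian kernel as the outer product of two identical
    1D Gaussian density vectors, normalized so the weights sum to 1."""
    ax = np.arange(size) - (size - 1) / 2.0        # symmetric coordinates
    g1d = np.exp(-ax**2 / (2.0 * sigma**2))        # 1D Gaussian density
    kernel = np.outer(g1d, g1d)                    # separable 2D kernel
    return kernel / kernel.sum()
```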

Here is how the impulse response image (enlarged) looks after the application of the above Gaussian filter.

Impulse_gaussian_3.png

The next figure shows the effect of Gaussian filtering / smoothing (blur) on some images.

all_conv_kernel_gaussian3.png

The following figure shows the 11×11 Gaussian Kernels generated with 1D i.i.d. Gaussians with different bandwidths (s.d. values).

gaussian_kernels.png
The following figures show the application of the Gaussian kernels of different bandwidth on the following grayscale image and the output images after filtering. As expected, the blur effect increases as the bandwidth of the Gaussian kernel increases.
a_gaussian_kernel.png

Sharpen Filter

The following figure shows an 11×11 Sharpen kernel generated by subtracting a Gaussian kernel (with s.d. 3) from a scaled impulse response kernel with 2 at the center.

sharpen_kernel.png
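A sketch of this construction, assuming the same normalized outer-product Gaussian as before:

```python
import numpy as np

def gaussian_kernel(size, sigma):
    ax = np.arange(size) - (size - 1) / 2.0
    g = np.exp(-ax**2 / (2.0 * sigma**2))
    k = np.outer(g, g)
    return k / k.sum()

def sharpen_kernel(size=11, sigma=3.0):
    """Scaled impulse (2 at the center) minus a Gaussian: boosts the pixel
    itself while subtracting a blurred version of its neighborhood."""
    impulse = np.zeros((size, size))
    impulse[size // 2, size // 2] = 2.0
    return impulse - gaussian_kernel(size, sigma)
```

Since the Gaussian sums to 1, the sharpen kernel sums to 2 − 1 = 1, so flat regions of the image pass through unchanged while edges get amplified.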

The next figure shows the effect of Sharpen filtering on some images.

all_conv_kernel_sharpen3.png
The following figure shows the 11×11 Sharpen Kernels with different bandwidths (s.d. values).
sharpen_kernels.png
The following figures show the application of the Sharpen kernels of different bandwidths on the following grayscale image and the output images after filtering. As expected, the sharpening effect increases as the bandwidth of the underlying Gaussian kernel increases.
mri_sharpen_kernel.png

Median filter

One last thing we shall do is get a feel for nonlinear filtering. So far, we have been doing everything by multiplying the input image pixels by various coefficients and summing the results together. A median filter works in a very different way: it simply chooses a single value from the surrounding patch in the image.
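A brute-force sketch of such a median filter (the edge-replicating padding is my choice):

```python
import numpy as np

def median_filter(image, size=11):
    """Replace each pixel by the median of its size×size neighborhood
    (edge-padded) -- a nonlinear filter, not a weighted sum."""
    p = size // 2
    padded = np.pad(image, p, mode='edge')
    out = np.empty_like(image, dtype=float)
    for i in range(image.shape[0]):
        for j in range(image.shape[1]):
            out[i, j] = np.median(padded[i:i + size, j:j + size])
    return out
```

A nice property of the median is that it removes isolated outlier pixels (salt-and-pepper noise) entirely, since a single extreme value never becomes the median of its neighborhood.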

The next figure shows the effect of Median filtering on some images. As expected, with an 11×11 mask, some of the images get quite blurred.

all_median.png

Drawing with edges

At a high level, we will be finding the intensity and orientation of the edges present in the image using convolutional filters, and then visualizing this information in an aesthetically pleasing way.

Image gradients

The first step is to find the gradient of the image. A gradient is a fancy way of saying “rate of change”. The following figure shows the basic concepts of image gradients.

im6.png

Edge Detection

In order to find the edges in our image, we are going to look for places where pixels are rapidly changing in intensity. The following figure shows the concept:

im7.png

The Sobel Filter

There are a variety of ways for doing this, and one of the most standard is through the use of Sobel filters, which have the following kernels:

im2.png

 If we think about the x sobel filter being placed on a strong vertical edge:

im3

Considering the yellow location, the values on the right side of the kernel get mapped to brighter, and thus larger, values. The values on the left side of the kernel get mapped to darker pixels which are close to zero. The response in this position will be large and positive.

Compare this with the application of the kernel to a relatively flat area, like the blue location. The values on both sides are about equal, so we end up with a response that is close to zero. Thus, the x-direction sobel filter gives a strong response for vertical edges. Similarly, the y-direction sobel filter gives a strong response for horizontal edges.

The steps for edge detection:

  1. Convert the image to grayscale.
  2. Blur the image with a gaussian kernel. The purpose of this is to remove noise from the image, so that we find responses only to significant edges, instead of small local changes that might be caused by our sensor or other factors.
  3. Apply the two sobel filters to the image.
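The steps above can be sketched as follows, assuming SciPy is available (using scipy.ndimage.gaussian_filter and scipy.ndimage.convolve) and that the image has already been converted to grayscale:

```python
import numpy as np
from scipy import ndimage

SOBEL_X = np.array([[-1, 0, 1],
                    [-2, 0, 2],
                    [-1, 0, 1]], dtype=float)   # responds to vertical edges
SOBEL_Y = SOBEL_X.T                             # responds to horizontal edges

def sobel_gradients(gray, sigma=1.0):
    """Blur with a Gaussian to suppress noise, then apply the two Sobel
    kernels to get the horizontal and vertical rates of change."""
    smoothed = ndimage.gaussian_filter(gray.astype(float), sigma)
    gx = ndimage.convolve(smoothed, SOBEL_X)
    gy = ndimage.convolve(smoothed, SOBEL_Y)
    return gx, gy
```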

Edge orientations and magnitudes

Now we have the rate at which the image is changing in the x and y directions, but it makes more sense to talk about images in terms of edges, and their orientations and intensities. As we will see, we can use some trigonometry to transform between these two representations.
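A minimal sketch of that trigonometric conversion:

```python
import numpy as np

def gradient_polar(gx, gy):
    """Convert x/y gradient responses into edge magnitude and orientation.
    Magnitude is the Euclidean norm; the angle (degrees, folded to 0-180
    since an edge has no sign) comes from arctan2."""
    magnitude = np.hypot(gx, gy)
    angle = np.degrees(np.arctan2(gy, gx)) % 180.0
    return magnitude, angle
```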

First, let’s see how our sobel filters respond to edges at different orientations from the following figures:

im4.png

The red arrow on the right shows the relative intensities of the response from the x and y sobel filter. We see that as we rotate the edge, the intensity of the response slowly shifts from the x to the y direction. We can also consider what would happen if the edge got less intense:

im5.png

Let’s apply the sobel filters on the following butterfly image (taken from the slides of the same computational photography course).

butterfly.png

The following figures show the result of applying the Sobel filters on the above butterfly image. Notice that the angle of the edges shows the direction (in red) for a few edge vectors for which the magnitude of the edge vectors was high (F is the image).

butterfly_sobel.png

Again, let’s apply the sobel filters on the following tiger image (taken from the slides of the same computational photography course).
tiger.png

The following figures show the result of applying the Sobel filters on the above tiger image. Notice that the angle of the edges shows the direction (in red) for a few edge vectors for which the magnitude of the edge vectors was high.

tiger_sobel.png

Next, let’s apply the sobel filters on the following zebra image (taken from the slides of the same computational photography course).

zebra.png

The following figures show the result of applying the Sobel filters on the above zebra image. Notice that the angle of the edges shows the direction (in red) for a few edge vectors for which the magnitude of the edge vectors was high.

zebra_sobel.png

Finally, let’s apply the sobel filters on the following image of mine.

me.png

The following figures show the result of applying the Sobel filters on my image. Notice that the angle of the edges shows the direction (in red) for a few edge vectors for which the magnitude of the edge vectors was high.

me_sobel.png

The following figure shows the horizontal (H_x) and vertical (H_y) kernels for a few more filters such as Prewitt and Roberts along with Sobel.

im8

The Prewitt Filter

The following figures show the result of applying the Prewitt filters on the butterfly image. Notice that the angle of the edges shows the direction (in red) for a few edge vectors for which the magnitude of the edge vectors was high (F is the image).

butterfly_prewitt.png

The Roberts Filter

The following figures show the result of applying the Roberts filters on the butterfly image. Notice that the angle of the edges shows the direction (in red) for a few edge vectors for which the magnitude of the edge vectors was high (F is the image).

butterfly_roberts.png

The LOG Filter

The following figure shows the Gaussian kernels with different bandwidths and a Sharpen kernel, along with the LOG (Laplacian of Gaussian) kernel, a very useful kernel for edge detection in images, and the DOG (Difference of Gaussians) kernel.


gaussian_kernels_cont.png

The following figures show the results of applying the LOG filter on the above images.

all_LOG.png

The DOG Filter


The Difference of Gaussians (DOG) kernel can also be used to find edges. The following figure shows the results of applying a DOG kernel (with s.d. 2 and 5) on the above images:

all_og_2_5.png

The Canny Edge Detector

The following figure shows the Canny Edge Detection algorithm:

im9.png

The following figures show the results of applying the Canny Edge Detection algorithm on the above images (with the intensity thresholds as 50 and 100 respectively).

all_canny.png

Mapping to color

We now have the magnitude and orientation of the edge present at every pixel. However, this is still not really in a form that we can examine visually. The angle varies from 0 to 180, and we don’t know the scale of the magnitude. What we have to do is figure out a way to transform this information into color values to place in each pixel.

The edge orientations can be pooled into four bins: edges that are roughly horizontal, edges that are roughly vertical, and edges at 45 degrees to the left and to the right. Each of these bins is assigned a color (vertical is yellow, etc.). Then the magnitude is used to dictate the intensity of the edge. For example, a roughly vertical edge of moderate intensity would be set to a medium yellow color, or the value (0, 100, 100).
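A sketch of this binning, assuming BGR colors (so yellow is (0, 255, 255) scaled by strength), that the angle is the folded gradient orientation in degrees, and with the three non-yellow bin colors as illustrative choices rather than the course's exact palette:

```python
import numpy as np

# one BGR color per orientation bin; gradient angle near 0 means a roughly
# vertical edge, which the text maps to yellow
BIN_COLORS = np.array([(0, 255, 255),    # roughly vertical edge
                       (255, 0, 255),    # 45 degrees one way
                       (255, 255, 0),    # roughly horizontal edge
                       (0, 255, 0)])     # 45 degrees the other way

def map_edges_to_color(angle, magnitude):
    """Quantize each pixel's orientation (0-180 degrees) into one of four
    bins and scale that bin's color by the normalized magnitude."""
    bins = (((angle + 22.5) % 180.0) // 45.0).astype(int)
    strength = magnitude / max(magnitude.max(), 1e-8)
    return (BIN_COLORS[bins] * strength[..., None]).astype(np.uint8)
```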

The following figures show some of the edges extracted from some of the images previously used as inputs, color-mapped using the Sobel filter:

edges.png

edges_sobel.png

The markdown file can be found here: https://github.com/sandipan/Blogs/blob/master/comp_photo.md

Some BioInformatics: Suffix Tree Construction and the Longest Repeated Substring Problem in Python

The following problem appeared as an assignment in the coursera course BioInformatics Algorithms II (by UCSD). The following description of the problem is taken from the course assignment, Rosalind (http://rosalind.info/) and Stepik (https://stepik.org/).

The following problems define a couple of data structures for fast pattern matching: a collection of (smaller) strings Patterns that we wish to match against a (larger) genome string Text.

Creating a Trie from Patterns

We would like to organize Patterns into a data structure to prevent multiple passes down Text and to reduce the runtime. To this end, we will consolidate Patterns into a directed tree called a trie (pronounced “try”), which is written Trie(Patterns) and has the following properties (example shown in the next figure).

  1. The trie has a single node with indegree 0, called the root.
  2. Each edge of Trie(Patterns) is labeled with a letter of the alphabet.
  3. Edges leading out of a given node have distinct labels.
  4. Every string in Patterns is spelled out by concatenating the letters along some path from the root downward.
  5. Every path from the root to a leaf, or node with outdegree 0, spells a string from Patterns.

im00.png

im01.png

Creating a Suffix Trie

A suffix trie, denoted SuffixTrie(Text), is the trie formed from all suffixes of Text (see figure below). A dollar sign (“$”) is appended to Text in order to mark the end of Text. We will also label each leaf of the resulting trie by the starting position of the suffix whose path through the trie ends at this leaf (using 0 based indexing). This way, when we arrive at a leaf, we will immediately know where this suffix came from in Text.

The following pseudocode constructs the suffix trie of a string Text by traversing the suffixes of Text from longest to shortest. Given a suffix, it attempts to spell the suffix downward by following edge labels as far as possible until it can go no further. At that point, it adds the rest of the suffix to the trie in the form of a path to a leaf, along with the position of each symbol in the suffix.
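The pseudocode can be paraphrased in Python using nested dictionaries as trie nodes; this is an illustrative sketch, not the course's reference implementation (the 'leaf' key used for position labels is my own convention):

```python
def suffix_trie(text):
    """Build SuffixTrie(text + '$') by inserting suffixes longest-first.
    Nodes are dicts mapping a symbol to a child node; each leaf stores
    the 0-based starting position of its suffix under the key 'leaf'."""
    text = text + '$'
    root = {}
    for start in range(len(text)):
        node = root
        for symbol in text[start:]:        # spell the suffix downward,
            node = node.setdefault(symbol, {})  # extending where needed
        node['leaf'] = start               # label the leaf with the position
    return root
```

Because '$' terminates every suffix, no suffix is a prefix of another, so every suffix ends at its own leaf.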

There are |Text| suffixes of Text, ranging in length from 1 to |Text| and having total length O(|Text|²). Thus, we need to reduce both the construction time and memory requirements of suffix tries to make them practical. We can reduce the number of edges in SuffixTrie(Text) by combining the edges on any nonbranching path into a single edge. We then label this edge with the concatenation of symbols on the consolidated edges, as shown in the figure below. The resulting data structure is called a suffix tree, written SuffixTree(Text).

Prove: SuffixTree(Text) has exactly |Text| leaves (since each leaf corresponds to a suffix of Text) and at most |Text| other nodes (in the worst case, Text can be a repetition of the same symbol, in which case the suffix tree will be linear and will have |Text| other nodes).

Creating a Suffix Tree

Given a string s having length L, recall that its suffix tree T is defined by the following properties:

  1. T is a rooted tree having exactly L+1 leaves.
  2. Every edge of T is labeled with a substring of s$, where s$ is the string formed by adding a placeholder symbol $ to the end of s.
  3. Every internal node of T other than the root has at least two children; i.e., it has degree at least 3.
  4. The substring labels for the edges leading down from a node to its children must begin with different symbols.
  5. By concatenating the substrings along edges, each path from the root to a leaf corresponds to a unique suffix of s.

The following figures show example Suffix Trie and Suffix Tree created from a Text and also the algorithm pseudocodes.

im0.png

im1.png

The following figures show how the suffix tree is constructed for the Text panamabananas$, as all the suffixes are inserted into the suffix tree from the longest to the shortest.

Inserting suffix panamabananas$ to the empty tree
tree0
Inserting suffix anamabananas$ to the tree
tree1
Inserting suffix namabananas$ to the tree
tree2
Inserting suffix amabananas$ to the tree
tree3
Inserting suffix mabananas$ to the tree
tree4
Inserting suffix abananas$ to the tree
tree5
Inserting suffix bananas$ to the tree
tree6
Inserting suffix ananas$ to the tree
tree7
Inserting suffix nanas$ to the tree
tree8
Inserting suffix anas$ to the tree
tree9
Inserting suffix nas$ to the tree
tree10
Inserting suffix as$ to the tree
tree11
Inserting suffix s$ to the tree

tree12

Inserting suffix $ to the tree
tree13

The next figures show how the suffix tree is constructed for the genome Text ATAAATG$.

tree0
tree1
tree2
tree3
tree4
tree5
tree6
tree7

The next figures show how the suffix tree is constructed for the genome Text TCTGAGCCCTACTGTCGAGAAATATGTATCTCGCCCCCGCAGCTT$.

tree0 – tree45

A more efficient implementation of suffix tree construction can be done using Ukkonen’s suffix tree algorithm.

The Longest Repeated Substring Problem

im2

As we can see from the following figure, in order to find any repeated pattern inside the Text:

  1. Start at the root of the suffix tree and traverse towards the internal nodes (concatenating the edge labels from the root to any internal node spells a pattern repeated inside the text).
  2. Choose the longest of all such patterns.
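The steps above can be sketched with a suffix trie instead of a compressed suffix tree; this keeps the code tiny at the cost of O(|Text|²) space, so it is only for illustration (Ukkonen's algorithm or a proper suffix tree would scale much better):

```python
def longest_repeated_substring(text):
    """Longest substring occurring at least twice: the path from the root
    to the deepest branching node of the suffix trie."""
    text = text + '$'
    root = {}
    for start in range(len(text)):                 # build the suffix trie
        node = root
        for symbol in text[start:]:
            node = node.setdefault(symbol, {})
    # DFS: a node with >= 2 children is shared by >= 2 suffixes, so the
    # prefix spelling the path to it is a repeated pattern
    best = ''
    stack = [(root, '')]
    while stack:
        node, prefix = stack.pop()
        if len(node) >= 2 and len(prefix) > len(best):
            best = prefix
        for symbol, child in node.items():
            stack.append((child, prefix + symbol))
    return best
```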

In the next figure, the paths colored red, blue and green, each starting from the root and ending in some internal node of the suffix tree, represent repeated patterns.

lptree.png

Analyzing the connectivity of a computer network undergoing a cyber-attack in Python

The following problem appeared as a project in the coursera course Algorithmic Thinking (by RICE university), a part of Fundamentals of Computing specialization. The following description of the problem is taken directly from the project description.

Graph exploration (that is, “visiting” the nodes and edges of a graph) is a powerful and necessary tool to elucidate properties of graphs and quantify statistics on them. For example, by exploring a graph, we can compute its degree distribution, pairwise distances among nodes, its connected components, and centrality measures of its nodes and edges. Both DFS and BFS can be used to compute the connected components of a graph.

Now, the task is to analyze the connectivity of a computer network as it undergoes a cyber-attack. In particular, we shall simulate an attack on this network in which an increasing number of servers are disabled. In computational terms, we will model the network by an undirected graph and repeatedly delete nodes from this graph. We will then measure the resilience of the graph in terms of the size of the largest remaining connected component as a function of the number of nodes deleted.

Example graphs

Let’s compute the resilience of the following three types of undirected graphs:

  1. An example computer network – an undirected graph (with 1347 nodes and 3112 edges). The next figure shows a small part of the undirected network graph.
  2. ER random graphs.
  3. UPA graphs – the undirected version of the DPA graphs. Note that since the degree of a newly added node is no longer zero, its chances of being selected in subsequent trials increase.

Part of a computer network graph

network_graph.png

To begin analysis, let’s examine the resilience of the computer network under an attack in which servers are chosen at random. Then let’s compare the resilience of the network to the resilience of ER and UPA graphs of similar size.

To begin, let’s determine the probability such that the ER graph computed using this edge probability has approximately the same number of edges as the computer network. Likewise, let’s compute an integer such that the number of edges in the UPA graph is close to the number of edges in the computer network. Remember that all three graphs being analyzed should have the same number of nodes and approximately the same number of edges.

Then for each of the three graphs (computer network, ER, UPA), let’s compute a random attack order and use this attack order to compute the resilience of the graph.

The following figure shows the algorithms to be used to find the largest component size in a graph.

im29.png
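A minimal sketch of the BFS-based approach follows; compute_resilience is the project's own name, while largest_cc_size and the adjacency-dict graph representation are my assumptions:

```python
from collections import deque

def largest_cc_size(ugraph):
    """Size of the largest connected component of an undirected graph
    given as {node: set(neighbors)}, via repeated BFS."""
    visited, best = set(), 0
    for source in ugraph:
        if source in visited:
            continue
        queue, component = deque([source]), {source}
        while queue:                       # standard BFS from source
            node = queue.popleft()
            for neighbor in ugraph[node]:
                if neighbor not in component:
                    component.add(neighbor)
                    queue.append(neighbor)
        visited |= component
        best = max(best, len(component))
    return best

def compute_resilience(ugraph, attack_order):
    """Largest-component size before the attack and after removing each
    node in attack_order (with its incident edges) in turn."""
    graph = {node: set(neighbors) for node, neighbors in ugraph.items()}
    resilience = [largest_cc_size(graph)]
    for node in attack_order:
        for neighbor in graph.pop(node):
            graph[neighbor].discard(node)
        resilience.append(largest_cc_size(graph))
    return resilience
```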

The following figures show how BFS finds a component starting from a source node and visiting all the nodes reachable from the source.

network_graph_0_20network_graph_0_48network_graph_0_64network_graph_0_80network_graph_0_100network_graph_0_150network_graph_0_200

Once the resilience for all three graphs is computed, let’s plot the results as three curves combined in a single plot. The horizontal axis for the plot is the number of nodes removed (ranging from zero to the number of nodes in the graph), while the vertical axis is the size of the largest connected component in the graphs resulting from the node removal.

im_25_26_1.png

Consider removing a significant fraction of the nodes in each graph in random order. We will say that a graph is resilient under this type of attack if the size of its largest connected component is roughly (within ~25%) equal to the number of nodes remaining at all times during the attack.

As can be seen from the above figure, both the ER and UPA graphs are resilient under this attack, but the network graph is not.

20% of nodes ~ 270 nodes. If the first 270 nodes in the attack order are removed from the total of 1347 nodes:

For the network graph, the initial size of the largest component = 1239 and the final size after removal = 889, a ~28% reduction in size (> 25%).

For the ER graph, the initial size of the largest component = 1337 and the final size after removal = 1047, a ~21.69% reduction in size (< 25%).

For the UPA graph, the initial size of the largest component = 1347 and the final size after removal = 1077, a ~20% reduction in size (< 25%).

In the next three problems, let’s consider attack orders in which the nodes being removed are chosen based on the structure of the graph. A simple rule for these targeted attacks is to always remove a node of maximum (highest) degree from the graph. The function targeted_order takes an undirected graph ugraph and iteratively does the following:

  1. Computes a node of the maximum degree in ugraph . If multiple nodes have the maximum degree, it chooses any of them (arbitrarily).
  2. Removes that node (and its incident edges) from ugraph.

Observe that targeted_order continuously updates ugraph and always computes a node of maximum degree with respect to this updated graph. The output of targeted_order is a sequence of nodes that can be used as input to compute_resilience.
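A direct (quadratic) sketch of targeted_order on an adjacency-dict graph; faster bookkeeping is exactly what fast_targeted_order below improves on:

```python
def targeted_order(ugraph):
    """Repeatedly remove a node of maximum degree (with its incident
    edges) from a copy of ugraph and record the removal order."""
    graph = {node: set(neighbors) for node, neighbors in ugraph.items()}
    order = []
    while graph:
        # pick any node of maximum current degree (ties broken arbitrarily)
        max_node = max(graph, key=lambda node: len(graph[node]))
        for neighbor in graph[max_node]:
            graph[neighbor].discard(max_node)   # drop incident edges
        del graph[max_node]
        order.append(max_node)
    return order
```

Scanning all nodes for the maximum on every iteration makes this O(n²); fast_targeted_order avoids the rescan by bucketing nodes by degree.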

As we examine the algorithm targeted_order, it can be seen that it is not as efficient as possible. In particular, much work is repeated during the location of nodes with the maximum degree. In this question, we will consider an alternative method (which we will refer to as fast_targeted_order) for computing the same targeted attack order. Here is a pseudo-code description of the method:

im_25_26_3.png

im30.png

To continue our analysis of the computer network, we will examine its resilience under an attack in which servers are chosen based on their connectivity. We will again compare the resilience of the network to the resilience of ER and UPA graphs of similar size.

Using targeted_order (or fast_targeted_order), the task is to compute a targeted attack order for each of the three graphs (computer network, ER, UPA) from previous question. Then, for each of these three graphs, let’s compute the resilience of the graph using compute_resilience . Finally, let’s plot the computed resiliences as three curves (line plots) in a single plot.

im26.png

Now, let’s consider removing a significant fraction of the nodes in each graph using targeted_order. Let’s examine the shape of the three curves from the previous plot in Question 4. Which of the three graphs are resilient under targeted attacks as the first 20% of their nodes are removed?

The ER graph is resilient, but the UPA and the network graphs are not resilient under this targeted attack.

20% of nodes ~ 270 nodes. If the first 270 nodes in the targeted attack order are removed from the total of 1347 nodes:

For the network graph, the initial size of the largest component = 1239 and the final size after removal = 3, a ~99.76% reduction in size (>> 25%).

For the ER graph, the initial size of the largest component = 1333 and the final size after removal = 990, a ~25.73% reduction in size (~ 25%).

For the UPA graph, the initial size of the largest component = 1347 and the final size after removal = 19, a ~98.59% reduction in size (>> 25%).

Some NLP: Spelling Correction with Noisy Channel Model in Python

The following problem appeared as an assignment in the coursera course Natural Language Processing (by Stanford University). The following description of the problem is taken from the assignment description and the course slides.

The following figure shows the basic concepts of spelling correction using the noisy channel model. Given the misspelled word, the most probable correct word can be computed by

  1. generating the possible correct words (from the vocabulary) with some edit-distance model.
  2. estimating the prior and the likelihood probabilities using some language model.
  3. computing the posterior probabilities and the MAP estimate using the Bayes Theorem.

im0.png

The following figure shows how the candidate words (possible correct words) can be generated (mostly the words within edit distance 1) and then a suitable language model can be used to estimate (with MLE) the prior and the likelihood (insertion / deletion etc.) probabilities from the training corpus.

im0_1.png
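The edit-distance-1 candidate generation can be sketched with the classic splits-based enumeration (the lowercase alphabet is an assumption; in the assignment the candidates would then be filtered against the vocabulary):

```python
def edits1(word, alphabet='abcdefghijklmnopqrstuvwxyz'):
    """All strings within edit distance 1 of word: deletions,
    transpositions, substitutions and insertions."""
    splits = [(word[:i], word[i:]) for i in range(len(word) + 1)]
    deletes = [left + right[1:] for left, right in splits if right]
    transposes = [left + right[1] + right[0] + right[2:]
                  for left, right in splits if len(right) > 1]
    substitutes = [left + c + right[1:]
                   for left, right in splits if right for c in alphabet]
    inserts = [left + c + right for left, right in splits for c in alphabet]
    return set(deletes + transposes + substitutes + inserts)
```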
The following figure shows real-world spelling correction using the noisy channel model; we are going to use these techniques to solve the spelling-correction problem. Given a k-word sentence W = w1, w2, …, wk, we are going to apply the following steps to compute the most probable sentence with the possibly-corrected misspelled words:

  1. For each word w_i, generate the candidate set (e.g., the words that are within 1-edit distance).
  2. Use a language model to compute the probability of a sequence P(W) and then find the most probable word-sequence W.

im0_2.png

 

The following figure shows the different language models that are going to be used to compute the MLE (without / with smoothing) probabilities and also how they can be used to compute the probability of a sentence (i.e. a sequence of words).

im1.png

 

  1. Given a training corpus, the parameters for the language models are first estimated from the corpus.
  2. Then these probabilities are used to compute the most probable (possibly spell-corrected) sequence of candidate words for each sentence in the dev corpus.
  3. The accuracy of spell-correction on the dev corpus is computed using the ground truth for each language model and reported.

Results

The following results show the accuracy of spell-correction using the different language models on the dev corpus:

  1. Uniform Language Model: correct: 31 total: 471 accuracy: 0.065817
  2. Unigram Language Model: correct: 29 total: 471 accuracy: 0.061571
  3. Laplace Unigram Language Model: correct: 52 total: 471 accuracy: 0.110403
  4. Laplace Bigram Language Model: correct: 64 total: 471 accuracy: 0.135881
  5. Stupid Backoff Language Model: correct: 85 total: 471 accuracy: 0.180467
  6. Custom Language Model (Simple Linear Interpolation λ1=0.17,λ2=0.39,λ3=0.44): correct: 91 total: 471 accuracy: 0.193206

im5.png

The following table shows some sentences from the dev corpus with misspelled word(s) along with the ground truths and the noisy channel model-corrected outputs with different language models.
im4.png

The next figure shows the simple linear interpolation language model accuracy on dev corpus with different values of λ1 and λ2.

im2.png

Implementing LASSO Regression with Coordinate Descent, Sub-Gradient of the L1 Penalty and Soft Thresholding in Python

This problem appeared as an assignment in the coursera course Machine Learning – Regression, part of Machine Learning specialization by the University of Washington. The following description of the problem is taken directly from the assignment.

The goal of this assignment is to implement a LASSO solver using coordinate descent. The following are the sub-tasks:

  • First the features are needed to be normalized.
  • The coordinate descent for LASSO needs to be implemented (with the subgradient of the L1 penalty).
  • The effects of L1 penalty are going to be explored.

The following dataset (a few rows and columns are shown in the table below) is from house sales in King County, the region where the city of Seattle, WA is located. This is going to be our input dataset for training the LASSO model, with the response variable price.

im0.png

Normalize features

In the house dataset, features vary wildly in their relative magnitude: sqft_living is very large overall compared to bedrooms, for instance. As a result, the weight for sqft_living would be much smaller than the weight for bedrooms. This is problematic because “small” weights are dropped first as l1_penalty goes up.

To give equal considerations for all features, we need to normalize features by dividing each feature by its 2-norm so that the transformed feature has norm 1.
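A minimal sketch of this normalization; returning the norms as well matters because the learned weights must later be divided by them to apply the model to unnormalized data:

```python
import numpy as np

def normalize_features(feature_matrix):
    """Divide each column by its 2-norm; return the normalized matrix
    and the per-column norms."""
    norms = np.linalg.norm(feature_matrix, axis=0)
    return feature_matrix / norms, norms
```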

Implementing Coordinate Descent with normalized features

 The following theory is going to be used to implement the coordinate descent (for normalized feature_j, we shall have z_j=1):
im1.png
im2.png
im2_3.png

Effect of L1 penalty

Let us consider a simple model with 2 features:

simple_features = ['sqft_living', 'bedrooms']

Single Coordinate Descent Step

Using the formula above, let’s implement coordinate descent that minimizes the cost function over a single feature i. Note that the intercept (weight 0) is not regularized. The function should accept feature matrix, output, current weights, l1 penalty, and index of feature to optimize over. The function should return new weight for feature i.

Cyclical coordinate descent

Now that we have a function that optimizes the cost function over a single coordinate, let us implement cyclical coordinate descent where we optimize coordinates 0, 1, …, (k-1) in order and repeat.

When do we know to stop? Each time we scan all the coordinates (features) once, we measure the change in weight for each coordinate. If no coordinate changes by more than a specified threshold, we stop.

For each iteration:

  1. As we loop over features in order and perform coordinate descent, measure how much each coordinate changes.
  2. After the loop, if the maximum change across all coordinates falls below the tolerance, stop. Otherwise, go back to step 1.

Return weights

IMPORTANT: when computing a new weight for coordinate i, make sure to incorporate the new weights for coordinates 0, 1, …, i-1. One good way is to update the weights variable in-place. 
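Under the assumptions above (normalized features, so z_j = 1, and the first column serving as the unregularized intercept), the single-coordinate update with soft thresholding and the cyclical loop can be sketched as:

```python
import numpy as np

def lasso_coordinate_descent_step(i, feature_matrix, output, weights,
                                  l1_penalty):
    """One coordinate-descent update for normalized feature i via the
    soft-thresholding operator; i == 0 (intercept) is not regularized."""
    prediction = feature_matrix @ weights
    # partial residual: add back feature i's own contribution
    ro_i = feature_matrix[:, i] @ (output - prediction
                                   + weights[i] * feature_matrix[:, i])
    if i == 0:                          # intercept: plain least-squares step
        return ro_i
    if ro_i < -l1_penalty / 2.0:        # soft threshold at +/- lambda/2
        return ro_i + l1_penalty / 2.0
    if ro_i > l1_penalty / 2.0:
        return ro_i - l1_penalty / 2.0
    return 0.0                          # coordinate driven exactly to zero

def lasso_cyclical_coordinate_descent(feature_matrix, output,
                                      initial_weights, l1_penalty, tolerance):
    """Sweep coordinates 0..k-1 in order, updating weights in place, until
    the largest single-coordinate change falls below tolerance."""
    weights = np.array(initial_weights, dtype=float)
    while True:
        max_change = 0.0
        for i in range(len(weights)):
            old = weights[i]
            weights[i] = lasso_coordinate_descent_step(
                i, feature_matrix, output, weights, l1_penalty)
            max_change = max(max_change, abs(weights[i] - old))
        if max_change < tolerance:
            return weights
```

Updating `weights` in place means each coordinate's update already sees the new values of coordinates 0, …, i-1, exactly as the note above requires.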

The following figure shows the coefficient path for LASSO for different values of the L1-penalty using the two simple features:

im3

Evaluating LASSO fit with more features

Let us consider the following set of features.

all_features = ['bedrooms',
                'bathrooms',
                'sqft_living',
                'sqft_lot',
                'floors',
                'waterfront', 
                'view', 
                'condition', 
                'grade',
                'sqft_above',
                'sqft_basement',
                'yr_built',
                'yr_renovated']

The following figure shows the coefficient path for LASSO for different values of the L1-penalty using all the above features:

im4