# Solving Some Image Processing and Computer Vision Problems with Python libraries

In this article, a few image processing / computer vision problems and their solutions  with python libraries (scikit-image, cv2) will be discussed. Some of the problems are from the exercises from this book (available on amazon). This blog will be continued here.

## Removing Gaussian Noise from images by computing mean and median images with scikit-image

2. Create n (e.g, n=100) noisy images by adding i.i.d. Gaussian noise (with zero mean) to the original image, with scikit-image.
3. Compute the mean (median) of the noisy images.
4. Compare PSNR with the original image.
5. Vary n and compare the results.
from skimage import img_as_float
from skimage.util import random_noise
from skimage.measure import compare_psnr
from skimage.io import imread
import matplotlib.pylab as plt
import numpy as np

im = img_as_float(imread('../new images/parrot.jpg')) # original image
np.random.seed(0)
# generate n noisy images from the original image by adding Gaussian noise
n = 25
images = np.zeros((n, im.shape[0], im.shape[1], im.shape[2]))
sigma = 0.2
for i in range(n):
images[i,...] = random_noise(im, var=sigma**2)

im_mean = images.mean(axis=0)
im_median = np.median(images, axis=0)
plt.figure(figsize=(20,16))
plt.subplots_adjust(left=.02, right=.98, bottom=.001, top=.96, wspace=.05, hspace=.01)
plt.subplot(221), plt.imshow(im), plt.axis('off'), plt.title('Original image', size=20)
plt.subplot(222), plt.imshow(images[0]), plt.axis('off'), plt.title('Noisy PSNR: ' + str(compare_psnr(im, images[0])), size=20)
plt.subplot(223), plt.imshow(im_mean), plt.axis('off'), plt.title('Mean PSNR: ' + str(compare_psnr(im, im_mean)), size=20)
plt.subplot(224), plt.imshow(im_median), plt.axis('off'), plt.title('Median PSNR: ' + str(compare_psnr(im, im_median)), size=20)
plt.show()



The next figure shows the original image, a noisy image generated from it by adding Gaussian noise (with 0 mean) to it and the images obtained by taking mean / median over all the n noisy images generated. As can be seen, the Gaussian noise in the images gets cancelled out by taking mean / median.

with n = 25

with n=100


plt.hist(images[:,100,100,0], color='red', alpha=0.2, label='red')
plt.hist(images[:,100,100,1], color='green', alpha=0.2, label='green')
plt.hist(images[:,100,100,2], color='blue', alpha=0.2, label='blue')
plt.legend()
plt.grid()
plt.show()



The next figure shows how a pixel value (that can be considered a random variable) for a particular location in different noisy images follows approximately a Gaussian distribution.

Distribution of a pixel value at location (100,100) in the noisy images


ns = [25, 50, 100, 200]
# mean_psnrs contain the PSNR values for different n
plt.plot(ns, mean_psnrs, '.--', label='PSNR (mean)')
plt.plot(ns, median_psnrs, '.--', label='PSNR (median)')
plt.legend()
plt.xlabel('n'),  plt.ylabel('PSNR')
plt.show()



The following figure shows that the PSNR improves with large n (since by SLLN / WLLN, the sample mean converges to population mean 0 of the Gaussian noise). Also, for median the improvement in the image quality is higher for larger values of n.

## Tracking Pedestrians with HOG-SVM with OpenCV / scikit-image

1. Start with a video with pedestrians.
2. Capture the video / extract frames from the video.
3. For each frame
1. Create HOG scale pyramid of the frame image.
2. At each scale, use a sliding window to extract the corresponding block from the frame, compute the HOG descriptor features.
3. Use cv2‘s HOGDescriptor_getDefaultPeopleDetector() – a pre-trained SVM classifier on the HOG descriptor to classify whether the corresponding block contains a pedestrian or not.
4. Run non-max-suppression to get rid of multiple detection of the same person.
5. Use cv2‘s  detectMultiScale() function to implement steps 3-4.

The code is adapted from the code here and here.


# HOG descriptor using default people (pedestrian) detector
hog = cv2.HOGDescriptor()
hog.setSVMDetector(cv2.HOGDescriptor_getDefaultPeopleDetector())

# run detection, using a spatial stride of 4 pixels,
# a scale stride of 1.02, and zero grouping of rectangles
# (to demonstrate that HOG will detect at potentially
# multiple places in the scale pyramid)
(foundBoundingBoxes, weights) = hog.detectMultiScale(frame, winStride=(4, 4), padding=(8, 8), scale=1.02, finalThreshold=0, useMeanshiftGrouping=False)

# convert bounding boxes from format (x1, y1, w, h) to (x1, y1, x2, y2)
rects = np.array([[x, y, x + w, y + h] for (x, y, w, h) in foundBoundingBoxes])

# run non-max suppression on the boxes based on an overlay of 65%
nmsBoundingBoxes = non_max_suppression(rects, probs=None, overlapThresh=0.65)



cv2 functions are used to extract HOG descriptor features and pedestrian detection with SVM,  whereas scikit-image functions are used to visualize the HOG features. The animations below display the original video, what HOG sees and  the detected pedestrians after non-max suppression. Notice there are a few false positive detection.

Original Video

HOG-descriptor features video (what HOG sees)Original Video with detected Pedestrians

## Face Detection with HaarCascade pre-trained AdaBoost classifiers with OpenCV

1. Capture video with webcam with cv2.VideoCapture().
2. For each frame, use the pre-trained Adaboost Cascade classifiers (the haarcascade_frontalface_default classifier for face detection and haarcascade_eye_tree_eyeglasses classifier for better detection of the eyes with glasses, from the corresponding xml files that come with cv2’s installation) using Haar-like features with cv2.CascadeClassifier().
3. First detect the face(s) with the detectMultiScale() function and draw a bounding box. Then detect the eyes inside a detected face with the same function.
4. The following python code snippet shows how to detect faces and eyes with cv2. The code is adapted from here.


# read the cascade classifiers from the xml files from the correct path into face_cascade  # and eye_cascade
gray = cv2.cvtColor(frame, cv2.COLOR_BGR2GRAY)
frame = cv2.cvtColor(frame, cv2.COLOR_BGR2RGB)
# return bounding box of the face(s) if one is detected
faces = face_cascade.detectMultiScale(gray, 1.03, 5)
for (x,y,w,h) in faces:
frame = cv2.rectangle(frame,(x,y),(x+w,y+h),(255,0,0),2)
roi_gray = gray[y:y+h, x:x+w]
roi_color = frame[y:y+h, x:x+w]
for (ex,ey,ew,eh) in eyes:
cv2.rectangle(roi_color,(ex,ey),(ex+ew,ey+eh),(0,255,0),2)



The next animation shows the results of face detection when scalefactor 1.03 was used to create the scale pyramid.  As can be seen, the eyes with the glasses on and some small faces from the photos are not detected at this scale.

The next animation shows the results of face detection when scalefactor 1.3 was used to create the scale pyramid.  As can be seen, the eyes with/without the glasses on as well as most of the small faces from the photos are detected at this scale most of the time.

## Semantic Segmentation with ENet / DeepLab (Deep Learning  model)

Input video and the segmented Output video

Input video and the segmented Output video

# Solving Some Image Processing Problems with Python libraries

In this article a few popular image processing problems along with their solutions are going to be discussed. Python image processing libraries are going to be used to solve these problems. Some of the problems are from the exercises from this book  (available on amazon).

## Image Transformations and Warping

### 0. Display RGB image color channels in 3D

1. A gray-scale image can be thought of a 2-D function f(x,y) of the pixel locations (x,y), that maps each pixel into its corresponding gray level (an integer in [0,255], e.g.,).
2. For an RGB image there are 3 such functions, f_R(x,y), f_G(x.y), f_B(x.y).
3. matplotlib’s 3-D plot functions can be used to plot each function.

The following python code shows how to plot the RGB channels separately in 3D:

import matplotlib.pylab as plt
from mpl_toolkits.mplot3d import Axes3D
def plot_3d(X, Y, Z, title, cmap):
# implement this function to plot the channel pixel values in 3D
plt.show()

Y = np.arange(im.shape[0])
X = np.arange(im.shape[1])
Z1 = im[...,0]
Z2 = im[...,1]
Z3 = im[...,2]
plot_3d(X, Y, Z1, cmap='Reds', title='3D plot for the Red Channel')
plot_3d(X, Y, Z2, cmap='Greens', title='3D plot for the Green Channel')
plot_3d(X, Y, Z3, cmap='Blues', title='3D plot for the Blue Channel')



The RGB image

### 1. Wave Transform

1. Use scikit-image’s warp() function to implement the wave transform.
2. Note that wave transform can be expressed with the following equations:

We shall use the madrill image to implement the wave transform. The next python code fragment shows how to do it:


def wave(xy):
xy[:, 1] += 20*np.sin(2*np.pi*xy[:, 0]/64)
return xy

from skimage.io import imread
from skimage.transform import warp
import matplotlib.pylab as plt
im = warp(im, wave)
plt.imshow(im)
plt.show()



The next figure shows the original mandrill input image and the output image obtained after applying the wave transform.

### 2. Swirl Transform

1. Use scikit-image’s warp() function to implement the swirl transform.
2. Note that swirl transform can be expressed with the following equations

We shall use the madrill image to implement the wave transform. The next python code fragment shows how to do it:


def swirl(xy, x0, y0, R):
r = np.sqrt((xy[:,1]-x0)**2 + (xy[:,0]-y0)**2)
a = np.pi * r / R
xy[:, 1] = (xy[:, 1]-x0)*np.cos(a) + (xy[:, 0]-y0)*np.sin(a) + x0
xy[:, 0] = -(xy[:, 1]-x0)*np.sin(a) + (xy[:, 0]-y0)*np.cos(a) + y0
return xy

im = warp(im, swirl, map_args={'x0':112, 'y0':112, 'R':512})
plt.imshow(im)
plt.axis('off')
plt.show()



The next figure shows the original mandrill input image and the output image obtained after applying the swirl transform.

Compare this with the output of the scikit-image swirl() function.

### 3. Very simple Face morphing with α-blending

1. Start from one face image (e.g., let image1 be the face of Messi) and end into another image (let image2 be the face of Ronaldo) iteratively, creating some intermediate images in between.
2. At each iteration create an image by using a linear combination of the two image numpy ndarrays given by

3. Iteratively increase α from 0 to 1.

The following code block shows how to implement it using matplotlib’s image and pylab modules.

im1 = mpimg.imread("../images/messi.jpg") / 255 # scale RGB values in [0,1]
im2 = mpimg.imread("../images/ronaldo.jpg") / 255
i = 1
plt.figure(figsize=(18,15))
for alpha in np.linspace(0,1,20):
plt.subplot(4,5,i)
plt.imshow((1-alpha)*im1 + alpha*im2)
plt.axis('off')
i += 1
plt.show()


The next animation shows the simple face morphing:

There are more sophisticated techniques to improve the quality of morphing, but this is the simplest one.

### 4. Creating Instagram-like Gotham Filter

#### The Gotham filter

The Gotham filter is computed as follows (the steps taken from here), applying the following operations on an image, the corresponding python code, input and output images are shown along with the operations (with the following input image):

1. A mid-tone red contrast boost
from PIL import Image
import numpy as np
import matplotlib.pylab as plt
im = Image.open('../images/city.jpg') # pixel values in [0,255]
r, g, b = im.split()
red_levels = [0., 12.75, 25.5, 51., 76.5, 127.5, 178.5, 204., 229.5, 242.25, 255.]
r1 = Image.fromarray((np.reshape(np.interp(np.array(r).ravel(), np.linspace(0,255,len(red_levels)), red_levels), (im.height, im.width))).astype(np.uint8), mode='L')
plt.figure(figsize=(20,15))
plt.subplot(221)
plt.imshow(im)
plt.title('original', size=20)
plt.axis('off')
plt.subplot(222)
im1 = Image.merge('RGB', (r1, g, b))
plt.imshow(im1)
plt.axis('off')
plt.title('with red channel interpolation', size=20)
plt.subplot(223)
plt.hist(np.array(r).ravel(), normed=True)
plt.subplot(224)
plt.hist(np.array(r1).ravel(), normed=True)
plt.show()


2. Make the blacks a little bluer
plt.figure(figsize=(20,10))
plt.subplot(121)
plt.imshow(im1)
plt.title('last image', size=20)
plt.axis('off')
b1 = Image.fromarray(np.clip(np.array(b) + 7.65, 0, 255).astype(np.uint8))
im1 = Image.merge('RGB', (r1, g, b1))
plt.subplot(122)
plt.imshow(im1)
plt.axis('off')
plt.title('with transformation', size=20)
plt.tight_layout()
plt.show()


3. A small sharpening
from PIL.ImageEnhance import Sharpness
plt.figure(figsize=(20,10))
plt.subplot(121)
plt.imshow(im1)
plt.title('last image', size=20)
plt.axis('off')
im2 = Sharpness(im1).enhance(3.0)
plt.subplot(122)
plt.imshow(im2)
plt.axis('off')
plt.title('with transformation', size=20)
plt.tight_layout()
plt.show()


4. A boost in blue channel for lower mid-tones
5. A decrease in blue channel for upper mid-tones
blue_levels = [0., 11.985, 30.09, 64.005, 81.09, 99.96, 107.1, 111.945, 121.125, 143.055, 147.9, 159.885, 171.105, 186.915, 215.985, 235.875, 255.]
b2 = Image.fromarray((np.reshape(np.interp(np.array(b1).ravel(), np.linspace(0,255,len(blue_levels)), blue_levels), (im.height, im.width))).astype(np.uint8), mode='L')
plt.figure(figsize=(20,15))
plt.subplot(221)
plt.imshow(im2)
plt.title('last image', size=20)
plt.axis('off')
plt.subplot(222)
im3 = Image.merge('RGB', (r1, g, b2))
plt.imshow(im3)
plt.axis('off')
plt.title('with blue channel interpolation', size=20)
plt.subplot(223)
plt.hist(np.array(b1).ravel(), normed=True)
plt.subplot(224)
plt.hist(np.array(b2).ravel(), normed=True)
plt.show()


The output image obtained after applying the Gotham filter is shown below:

## Down-sampling with anti-aliasing using Gaussian Filter

1. Start with a large gray-scale image and reduce the image size 16 times, by reducing both height and width by 4 times.
2. Select every 4th pixel in the x and the y direction from the original image to compute the values of the pixels in the smaller image.
3. Before down-sampling apply a Gaussian filter (to smooth the image) for anti-aliasing.
4. Compare the quality of the output image obtained by down-sampling without a Gaussian filter (with aliasing).

The next code block performs the above steps. Since the Gaussian blur is a low-pass filter, it removes the high frequencies from the original input image, hence it’s possible to achieve sampling rate above the Nyquist rate (by sampling theorem) to avoid aliasing.


from scipy.ndimage import gaussian_filter
print(im.shape)
plt.figure(figsize=(20,20))
plt.imshow(im)
plt.show()
plt.figure(figsize=(20,20))
im_blurred = gaussian_filter(im, sigma=2.5) #(5,5,1)
plt.imshow(im_blurred)
plt.show()
n = 4 # create and image 16 times smaller in size
w, h = im.shape[0] // n, im.shape[1] // n
im_small = np.zeros((w,h))
for i in range(w):
for j in range(h):
im_small[i,j] = im[n*i, n*j]
plt.figure(figsize=(20,20))
plt.imshow(im_small)
plt.show()
im_small = np.zeros((w,h))
for i in range(w):
for j in range(h):
im_small[i,j] = im_blurred[n*i, n*j]
plt.figure(figsize=(20,20))
plt.imshow(im_small)
plt.show()



Original Image

Image blurred with Gaussian Filter LPF

Down-sampled Image from the original image (with aliasing)

Down-sampled Image from the blurred image (with anti-aliasing)

## Some Applications of DFT

### 0. Fourier Transform of a Gaussian Kernel is another Gaussian Kernel

Also, the spread in the frequency domain  inversely proportional to the spread in the spatial domain (known as Heisenberg’s inequality). Here is the proof:

The following animation shows an example visualizing the Gaussian contours in spatial and corresponding frequency domains:

### 1. Using DFT to up-sample an image

1. Let’s use the lena gray-scale image.
2. First double the size of the by padding zero rows/columns at every alternate positions.
3. Use FFT followed by an LPF.
4. Finally use IFFT to get the output image.

The following code block shows the python code for implementing the steps listed above:


import numpy as np
import numpy.fft as fp
import matplotlib.pyplot as plt

im = np.mean(imread('images/lena.jpg'), axis=2)
im1 = np.zeros((2*im.shape[0], 2*im.shape[1]))
print(im.shape, im1.shape)
for i in range(im.shape[0]):
for j in range(im.shape[1]):
im1[2*i,2*j] = im[i,j]

return vector

# the LPF kernel
kernel = [[0.25, 0.5, 0.25], [0.5, 1, 0.5], [0.25, 0.5, 0.25]]
# enlarge the kernel to the shape of the image

plt.figure(figsize=(15,10))
plt.gray() # show the filtered result in grayscale

freq = fp.fft2(im1)
freq_kernel = fp.fft2(fp.ifftshift(kernel))
freq_LPF = freq*freq_kernel # by the Convolution theorem
im2 = fp.ifft2(freq_LPF)
freq_im2 = fp.fft2(im2)

plt.subplot(2,3,1)
plt.imshow(im)
plt.title('Original Image', size=20)
plt.subplot(2,3,2)
plt.imshow(im1)
plt.subplot(2,3,3)
plt.imshow( (20*np.log10( 0.1 + fp.fftshift(freq))).astype(int), cmap='jet')
plt.title('Original Image Spectrum', size=20)
plt.subplot(2,3,4)
plt.imshow( (20*np.log10( 0.1 + fp.fftshift(freq_kernel))).astype(int), cmap='jet')
plt.title('Image Spectrum of the LPF', size=20)
plt.subplot(2,3,5)
plt.imshow( (20*np.log10( 0.1 + fp.fftshift(freq_im2))).astype(int), cmap='jet')
plt.title('Image Spectrum after LPF', size=20)
plt.subplot(2,3,6)
plt.imshow(im2.astype(np.uint8)) # the imaginary part is an artifact
plt.title('Output Image', size=20)



The next figure shows the output. As can be seen from the next figure, the LPF removed the high frequency components from the Fourier spectrum of the padded image and with a subsequent inverse Fourier transform  we get a decent enlarged image.

### 2. Frequency Domain Gaussian Filter

1. Use an input image and use DFT to create the frequency 2D-array.
2. Create a small Gaussian 2D Kernel (to be used as an LPF) in the spatial domain and pad it to enlarge it to the image dimensions.
3. Use DFT to obtain the Gaussian Kernel in the frequency domain.
4. Use the Convolution theorem to convolve the LPF with the input image in the frequency domain.
5. Use IDFT to obtain the output image.
6. Plot the frequency spectrum of the image, the gaussian kernel and the image obtained after convolution in the frequency domain, in 3D.

The following code block shows the python code:


import matplotlib.pyplot as plt
from matplotlib import cm
from skimage.color import rgb2gray
from skimage.io import imread
import scipy.fftpack as fp

kernel = np.outer(signal.gaussian(im.shape[0], 10), signal.gaussian(im.shape[1], 10))
freq = fp.fft2(im)
assert(freq.shape == kernel.shape)
freq_kernel = fp.fft2(fp.ifftshift(kernel))
convolved = freq*freq_kernel # by the Convolution theorem
im_blur = fp.ifft2(convolved).real
im_blur = 255 * im_blur / np.max(im_blur)

# center the frequency response
plt.imshow( (20*np.log10( 0.01 + fp.fftshift(freq_kernel))).real.astype(int), cmap='coolwarm')
plt.colorbar()
plt.show()

plt.figure(figsize=(20,20))
plt.imshow(im, cmap='gray')
plt.show()

from mpl_toolkits.mplot3d import Axes3D
from matplotlib.ticker import LinearLocator, FormatStrFormatter
# ... code for 3D visualization of the spectrums


#### The original color temple image (time / spatial domain)

The temple image (frequency domain)

#### The smoothed temple image with the LPF (frequency domain)

If we set the standard deviation of the LPF Gaussian kernel to be 10 we get the following output as shown in the next figures. As can be seen, the frequency response value drops much quicker from the center.

#### The smoothed temple image with the LPF with higher s.d. (frequency domain)

The output image after convolution (spatial / time domain)

3. Using the inverse filter to restore a motion-blurred image

1. First create a motion blur kernel of a given shape.
2. Convolve the kernel with an input image in the frequency domain.
3. Get the motion-blurred image in the spatial domain with IDFT.
4. Compute the inverse filter kernel and convolve with the blurred image in the frequency domain.
5. Get the convolved image back in the spatial domain.
6. Plot all the images and kernels in the frequency domain.

The following code block shows the python code:


# create the motion blur kernel
size = 21
kernel = np.zeros((size, size))
kernel[int((size-1)/2), :] = np.ones(size)
kernel = kernel / size

freq = fp.fft2(im)
freq_kernel = fp.fft2(fp.ifftshift(kernel))
convolved1 = freq1*freq_kernel1
im_blur = fp.ifft2(convolved1).real
im_blur = im_blur / np.max(im_blur)

epsilon = 10**-6

freq = fp.fft2(im_blur)
freq_kernel = 1 / (epsilon + freq_kernel1)

convolved = freq*freq_kernel
im_restored = fp.ifft2(convolved).real
im_restored = im_restored / np.max(im_restored)

plt.figure(figsize=(18,12))
plt.subplot(221)
plt.imshow(im)
plt.title('Original image', size=20)
plt.axis('off')
plt.subplot(222)
plt.imshow(im_blur)
plt.title('Blurred image with motion blur kernel', size=20)
plt.axis('off')
plt.subplot(223)
plt.imshow(im_restored)
plt.title('Restored image with inverse filter', size=20)
plt.axis('off')
plt.subplot(224)
plt.imshow(im_restored - im)
plt.title('Diff restored &amp;amp;amp;amp;amp;amp; original image', size=20)
plt.axis('off')
plt.show()

# Plot the surface of the frequency responses here



Frequency response of the input image

(log) Frequency response of the motion blur kernel (LPF)

Input image convolved with the motion blur kernel (frequency domain)

(log) Frequency response of the inverse frequency filter kernel (HPF)

Motion-blurred image convolved with the inverse frequency filter kernel (frequency domain)

4. Impact of noise on the inverse filter

1. Add some random noise to the Lena image.
2. Blur the image with a Gaussian kernel.
3. Restore the image using inverse filter.

With the original image

Let’s first blur and apply the inverse filter on the noiseless blurred image. The following figures show the outputs:

(log) Frequency response of the input image

(log) Frequency response of the Gaussian blur kernel (LPF)

(log) Frequency response of the blurred image

(log) Frequency response of the inverse kernel (HPF)

Frequency response of the output image

Adding noise to the original image

The following python code can be used to add Gaussian noise to an image:


from skimage.util import random_noise
im = random_noise(im, var=0.1)



The next figures show the noisy lena image, the blurred image with a Gaussian Kernel and the restored image with the inverse filter. As can be seen, being a high-pass filter, the inverse filter enhances the noise, typically corresponding to high frequencies.

#### 5. Use a notch filter to remove periodic noise from the following half-toned car image.

1. Use DFT to obtain the frequency spectrum of the image.
2. Block the high frequency components that are most likely responsible fro noise.
3. Use IDFT to come back to the spatial domain.


from scipy import fftpack
F1 = fftpack.fft2((im).astype(float))
F2 = fftpack.fftshift(F1)
for i in range(60, w, 135):
for j in range(100, h, 200):
if not (i == 330 and j == 500):
F2[i-10:i+10, j-10:j+10] = 0
for i in range(0, w, 135):
for j in range(200, h, 200):
if not (i == 330 and j == 500):
F2[max(0,i-15):min(w,i+15), max(0,j-15):min(h,j+15)] = 0
plt.figure(figsize=(6.66,10))
plt.imshow( (20*np.log10( 0.1 + F2)).astype(int), cmap=plt.cm.gray)
plt.show()
im1 = fp.ifft2(fftpack.ifftshift(F2)).real
plt.figure(figsize=(10,10))
plt.imshow(im1, cmap='gray')
plt.axis('off')
plt.show()



Frequency response of the input image

Frequency response of the input image with blocked frequencies with notch

Output image

With a low-pass-filter (LPF):

Frequency response of the input image with blocked frequencies with LPF

Output image

## Histogram Matching with color images

As described here, here is the algorithm:

1. The cumulative histogram is computed for each image dataset, see the figure below.
2. For any particular value (xi) in the input image data to be adjusted has a cumulative histogram value given by G(xi).
3. This in turn is the cumulative distribution value in the reference (template) image  dataset, namely H(xj). The input data value xi is replaced by xj.

im = imread('images/lena.jpg')
im1 = np.zeros(im.shape).astype(np.uint8)
plt.figure(figsize=(20,10))
for i in range(3):
c = cdf(im[...,i])
c_t = cdf(im_t[...,i])
im1[...,i] = hist_matching(c, c_t, im[...,i]) # implement this function with the above algorithm
c1 = cdf(im1[...,i])
col = 'r' if i == 0 else ('g' if i == 1 else 'b')
plt.plot(np.arange(256), c, col + ':', label='input ' + col.upper(), linewidth=5)
plt.plot(np.arange(256), c_t, col + '--', label='template ' + col.upper(), linewidth=5)
plt.plot(np.arange(256), c1, col + '-', label='output ' + col.upper(), linewidth=2)
plt.title('CDF', size=20)
plt.legend(prop={'size': 15})
plt.show()

plt.figure(figsize=(10,10))
plt.imshow(im1[...,:3])
plt.axis('off')
plt.show()


Input image

Template Image

Output Image

The following figure shows how the histogram of the input image is matched with the histogram of the template image.

Another example:

Input image

Template Image

Output Image

## Mathematical Morphology

### 1. Automatically cropping an image

1. Let’s use the following image. The image has unnecessary white background  outside the molecule of the organic compound.
2. First convert the image to a binary image and compute the convex hull of the molecule object.
3. Use the convex hull image to find the bounding box for cropping.
4. Crop the original image with the bounding box.

The next python code shows how to implement the above steps:


from PIL import Image
from skimage.io import imread
from skimage.morphology import convex_hull_image
plt.imshow(im)
plt.title('input image')
plt.show()
im1 = 1 - rgb2gray(im)
threshold = 0.5
im1[im1  threshold] = 1
chull = convex_hull_image(im1)
plt.imshow(chull)
plt.title('convex hull in the binary image')
plt.show()
imageBox = Image.fromarray((chull*255).astype(np.uint8)).getbbox()
cropped = Image.fromarray(im).crop(imageBox)
cropped.save('L_2d_cropped.jpg')
plt.imshow(cropped)
plt.title('cropped image')
plt.show()



This can also be found here.

### 2. Opening and Closing are Dual operations in mathematical morphology

1. Start with a binary image and apply opening operation with some structuring element (e.g., a disk) on it to obtain an output image.
2. Invert the image (to change the foreground to background and vice versa) and apply closing operation on it with the same structuring element to obtain another output image.
3. Invert the second output image obtained and observe that it’s same as the first output image.
4. Thus applying opening operation to the foreground of a binary image is equivalent to applying closing operation to the background of the same image with the same structuring element.

The next python code shows the implementation of the above steps.


from skimage.morphology import binary_opening, binary_closing, disk
from skimage.util import invert
im = rgb2gray(imread('../new images/circles.jpg'))
im[im  0.5] = 1
plt.gray()
plt.figure(figsize=(20,10))
plt.subplot(131)
plt.imshow(im)
plt.title('original', size=20)
plt.axis('off')
plt.subplot(1,3,2)
im1 = binary_opening(im, disk(12))
plt.imshow(im1)
plt.title('opening with disk size ' + str(12), size=20)
plt.axis('off')
plt.subplot(1,3,3)
im1 = invert(binary_closing(invert(im), disk(12)))
plt.imshow(im1)
plt.title('closing with disk size ' + str(12), size=20)
plt.axis('off')
plt.show()



As can be seen the output images obtained are exactly same.

## Floyd-Steinberg Dithering (to convert a grayscale to a binary image)

The next figure shows the algorithm for error diffusion dithering.


def find_closest_palette_color(oldpixel):
return int(round(oldpixel / 255)*255)

im = rgb2gray(imread('../my images/godess.jpg'))*255
pixel = np.copy(im)
w, h = im.shape

for x in range(w):
for y in range(h):
oldpixel = pixel[x][y]
newpixel = find_closest_palette_color(oldpixel)
pixel[x][y] = newpixel
quant_error = oldpixel - newpixel
if x + 1 &amp;amp;amp;amp;amp;lt; w-1:
pixel[x + 1][y] = pixel[x + 1][y] + quant_error * 7 / 16
if x &amp;amp;amp;amp;amp;gt; 0 and y &amp;amp;amp;amp;amp;lt; h-1:
pixel[x - 1][y + 1] = pixel[x - 1][y + 1] + quant_error * 3 / 16
if y &amp;amp;amp;amp;amp;lt; h-1:
pixel[x ][y + 1] = pixel[x ][y + 1] + quant_error * 5 / 16
if x &amp;amp;amp;amp;amp;lt; w-1 and y &amp;amp;amp;amp;amp;lt; h-1:
pixel[x + 1][y + 1] = pixel[x + 1][y + 1] + quant_error * 1 / 16

plt.figure(figsize=(10,20))
plt.imshow(pixel, cmap='gray')
plt.axis('off')
plt.show()



The input image (gray-scale)

The output Image (binary)

The next animation shows how an another input grayscale image gets converted to output binary image using the error diffusion dithering.

## Sharpen a color image

1. First blur the image with an LPF (e.g., Gaussian Filter).
2. Compute the detail image as the difference between the original and the blurred image.
3. Now the sharpened image can be computed as a linear combination of the original image and the detail  image. The next figure illustrates the concept.

The next python code shows how this can be implemented in python:


from scipy import misc, ndimage
import matplotlib.pyplot as plt
import numpy as np

def rgb2gray(im):
return np.clip(0.2989 * im[...,0] + 0.5870 * im[...,1] + 0.1140 * im[...,2], 0, 1)

im = misc.imread('../my images/me.jpg')/255
im_blurred = ndimage.gaussian_filter(im, (5,5,0))
im_detail = np.clip(im - im_blurred, 0, 1)
fig, axes = plt.subplots(nrows=2, ncols=3, sharex=True, sharey=True, figsize=(15, 15))
axes = axes.ravel()
axes[0].imshow(im)
axes[0].set_title('Original image', size=15)
axes[1].imshow(im_blurred)
axes[1].set_title('Blurred image, sigma=5', size=15)
axes[2].imshow(im_detail)
axes[2].set_title('Detail image', size=15)
alpha = [1, 5, 10]
for i in range(3):
im_sharp = np.clip(im + alpha[i]*im_detail, 0, 1)
axes[3+i].imshow(im_sharp)
axes[3+i].set_title('Sharpened image, alpha=' + str(alpha[i]), size=15)
for ax in axes:
ax.axis('off')
fig.tight_layout()
plt.show()



The next figure shows the output of the above code block. As cane be seen, the output gets more sharpened as the value of alpha gets increased.

The next animation shows how the image gets more and more sharpened with increasing alpha.

## Edge Detection with LOG and Zero-Crossing Algorithm by Marr and Hildreth

The following figure shows LOG filter and its DOG approximation.

In order to detect edges as a binary image, finding the zero-crossings in the LoG-convolved image was proposed by Marr and Hildreth. Identification of the edge pixels can be done by viewing the sign of the LoG-smoothed image by defining it as a binary image, the algorithm is as follows:

Algorithm to compute the zero-crossing

1. First convert the LOG-convolved image to a binary image, by replacing the pixel values by 1 for positive values and 0 for negative values.
2. In order to compute the zero crossing pixels, we need to simply look at the boundaries of the non-zero regions in this binary image.
3. Boundaries can be found by finding any non-zero pixel that has an immediate neighbor which is is zero.
4. Hence, for each pixel, if it is non-zero, consider its 8 neighbors, if any of the neighboring pixels is zero, the pixel can be identified as an edge.

The next python code and the output images / animations generated show how to detect the edges from the  zebra image with LOG + zero-crossings:


from scipy import ndimage, misc
import matplotlib.pyplot as plt
from scipy.misc import imread
from skimage.color import rgb2gray
def any_neighbor_zero(img, i, j):
for k in range(-1,2):
for l in range(-1,2):
if img[i+k, j+k] == 0:
return True
return False
def zero_crossing(img):
img[img > 0] = 1
img[img  0 and any_neighbor_zero(img, i, j):
out_img[i,j] = 255
return out_img

fig = plt.figure(figsize=(25,15))
plt.gray() # show the filtered result in grayscale

for sigma in range(2,10, 2):
plt.subplot(2,2,sigma/2)
result = ndimage.gaussian_laplace(img, sigma=sigma)
plt.imshow(zero_crossing(result))
plt.axis('off')
plt.title('LoG with zero-crossing, sigma=' + str(sigma), size=30)

plt.tight_layout()
plt.show()



Original Input Image

Output with edges detected with LOG + zero-crossing at different sigma scales

With another input image

Output with edges detected with LOG + zero-crossing at different sigma scales

## Constructing the Gaussian Pyramid with scikit-image transform module’s reduce function and Laplacian Pyramid from the Gaussian Pyramid and the expand function

The Gaussian Pyramid can be computed with the following steps:

2. Iteratively compute the image at each level of the pyramid first by smoothing the image (with gaussian filter) and then downsampling it .
3. Stop at a level where the image size becomes sufficiently small (e.g., 1×1).

The Laplacian Pyramid can be computed with the following steps:

1. Start with the Gaussian Pyramid and with the smallest image.
2. Iteratively compute the difference image in between the image at the current level and the image obtained by first upsampling and then smoothing the image (with gaussian filter) from the previous level of the Gaussian Pyramid.
3. Stop at a level where the image size becomes equal to the original image size.

The next python code shows how to create a Gaussian Pyramid from an image.

import numpy as np
import matplotlib.pyplot as plt
from skimage.io import imread
from skimage.color import rgb2gray
from skimage.transform import pyramid_reduce, pyramid_expand, resize

def get_gaussian_pyramid(image):
rows, cols, dim = image.shape
gaussian_pyramid = [image]
while rows&amp;amp;gt; 1 and cols &amp;amp;gt; 1:
image = pyramid_reduce(image, downscale=2)
gaussian_pyramid.append(image)
print(image.shape)
rows //= 2
cols //= 2
return gaussian_pyramid

def get_laplacian_pyramid(gaussian_pyramid):
laplacian_pyramid = [gaussian_pyramid[len(gaussian_pyramid)-1]]
for i in range(len(gaussian_pyramid)-2, -1, -1):
image = gaussian_pyramid[i] - resize(pyramid_expand(gaussian_pyramid[i+1]), gaussian_pyramid[i].shape)
laplacian_pyramid.append(np.copy(image))
laplacian_pyramid = laplacian_pyramid[::-1]
return laplacian_pyramid

gaussian_pyramid = get_gaussian_pyramid(image)
laplacian_pyramid = get_laplacian_pyramid(gaussian_pyramid)

w, h = 20, 12
for i in range(3):
plt.figure(figsize=(w,h))
p = gaussian_pyramid[i]
plt.imshow(p)
plt.title(str(p.shape[0]) + 'x' + str(p.shape[1]), size=20)
plt.axis('off')
w, h = w / 2, h / 2
plt.show()

w, h = 10, 6
for i in range(1,4):
plt.figure(figsize=(w,h))
p = laplacian_pyramid[i]
plt.imshow(rgb2gray(p), cmap='gray')
plt.title(str(p.shape[0]) + 'x' + str(p.shape[1]), size=20)
plt.axis('off')
w, h = w / 2, h / 2
plt.show()


## Blending images with Gaussian and Laplacian pyramids

Here is the algorithm:

Blending the following input images A, B with mask image M

Input Image A (Goddess Durga)

Input Image B (Lord Shiva)

with the following python code creates the output image I shown below


# get the Gaussian and Laplacian pyramids, implement the functions
pyramidA = get_laplacian_pyramid(get_gaussian_pyramid(A))
pyramidB = get_laplacian_pyramid(get_gaussian_pyramid(B))
pyramidM = get_gaussian_pyramid(M)

pyramidC = []
for i in range(len(pyramidM)):
im = pyramidM[i]*pyramidA[i] + (1-pyramidM[i])*pyramidB[i]
pyramidC.append(im)
# implement the following function to construct an image from its laplacian pyramids
I = reconstruct_image_from_laplacian_pyramid(pyramidC)



Output Image I (Ardhanarishwar)
The following animation shows how the output image is formed:

Another blending (horror!) example (from prof. dmartin)

# Detection of a Human Object with HOG Descriptor Features using SVM (Primal QuadProg implementation using CVXOPT) in Python

In this article, first how to extract the HOG descriptor from an image will be discuss. Then how a support vector machine binary classifier can be trained on a dataset containing labeled images (using the extracted HOG descriptor features) and later how the SVM model can be used (along with a sliding window) to predict whether or not a human object exists in a test image will be described.  How SVM can be represented as a Primal Quadratic Programming problem and can be solved with CVXOPT that will also be discussed. This problem appeared as an assignment problem in this Computer Vision course from UCF.

## Problem 1: Compute HOG features

Let’s first Implement Histogram of Orientated Gradients (HOG). The dataset to be used is the INRIA Person Dataset from here. The dataset consists of positive and negative examples for training as well as testing images. Let us do the following:

i. Take 2003 positive training images of size 96×160
ii. Take 997 negative training images of size 96×160
iii. Compute HOG for positive and negative examples.
iv. Show the visualization of HOG for some positive and negative examples.

The Histograms of Oriented Gradients for Human Detection (HOG) is a very heavily cited paper by N. Dalal and B. Triggs from CVPR 2005. The following figure shows the  algorithm proposed by them can be used to compute the HOG features for a 96×160 image:

The next python code snippet shows some helper functions to compute the hog features:


import numpy as np
from scipy import signal
import scipy.misc

def s_x(img):
kernel = np.array([[-1, 0, 1]])
imgx = signal.convolve2d(img, kernel, boundary='symm', mode='same')
return imgx
def s_y(img):
kernel = np.array([[-1, 0, 1]]).T
imgy = signal.convolve2d(img, kernel, boundary='symm', mode='same')
return imgy

imgx = s_x(img)
imgy = s_y(img)
s = np.sqrt(imgx**2 + imgy**2)
theta = np.arctan2(imgx, imgy) #imgy, imgx)
theta[theta<0] = np.pi + theta[theta<0]
return (s, theta)


The following figures animations show some positive and negative training examples along with the HOG features computed using the algorithm.

### Positive Example 1

The next animation shows how the HOG features are computed using the above algorithm.

### Positive Example 2

The next animation shows how the HOG features are computed using the above algorithm.

### Positive Example 3

The next animation shows how the HOG features are computed using the above algorithm.

### Negative Example 1

The next animation shows how the HOG features are computed using the above algorithm.

## Problem 2: Use sklearn’s SVC and 80-20 validation to compute accuracy on the held-out training images dataset using the extracted HOG features.

Before implementing SVC on our own with primal quadratic programming solver, let’s use the scikit-learn SVC implementation (with linear kernel) to train a support vector classifier on the training positive and negative examples using the HOG features extracted  from the training images with 80-20 validation and compute accuracy of classification on the held-out images.

The following python code does exactly that, with the X matrix containing the 1620 HOG features extracted from each image and the corresponding label (pos/neg, depending on whether human is present or not), with 98.5% accuracy on the held-out dataset.


import time
from sklearn.metrics import accuracy_score
from sklearn.cross_validation import train_test_split
from sklearn.svm import SVC
Xtrain, Xtest, ytrain, ytest = train_test_split(X, y, train_size=0.8, random_state=123)
timestamp1 = time.time()
clf = SVC(C=1, kernel='linear')
clf.fit(Xtrain, ytrain)
print("%d support vectors out of %d points" % (len(clf.support_vectors_), len(Xtrain)))
timestamp2 = time.time()
print "sklearn LinearSVC took %.2f seconds" % (timestamp2 - timestamp1)
ypred = clf.predict(Xtest)
print('accuracy', accuracy_score(ytest, ypred))


430 support vectors out of 2400 points
sklearn LinearSVC took 3.40 seconds
accuracy 0.985

The next figures show the confusion matrices for the prediction on the held-out dataset with the SVC model learnt.

## Problem 3: Implement SVM by solving the Primal form of the problem using Quadratic Programming

Let’s implement Support Vector Machine (SVM) using Quadratic Programming. We shall use python’s CVXOPT package for this purpose. Let’s do the following:

i. Try to understand each input term in cvxopt.solvers.qp.
ii. Formulate soft- margin primal SVM in term of inputs of cvxopt.solvers.qp
iii. Show ‘P’, ‘Q’, ‘G”, ‘h’, ‘A’ and ‘b’ Matrices.
iv. Obtain parameter vector ‘w’ and bias term ‘b’ using cvxopt.solvers.qp

To be done

## Problem 4: Detect Human in testing images using trained model (‘w’, ‘b’) from the last problem

Let’s use the coefficients learnt by the SVM model from the training dataset and do the following:

i. Take at least 5 testing images from Test/pos.
ii. Test the trained model over testing images. Testing can be performed using
w*feature vector + b.
iii. Use sliding window approach to obtain detection at each location in the image.
iv. Perform non-maximal suppression and choose the highest scored location.
v. Display the bounding box at the final detection.

To be done

# Implementing Lucas-Kanade Optical Flow algorithm in Python

In this article an implementation of the Lucas-Kanade optical flow algorithm is going to be described. This problem appeared as an assignment in this computer vision course from UCSD. The inputs will be sequences of images (subsequent frames from a video) and the algorithm will output an optical flow field (u, v) and trace the motion of the moving objects. The problem description is taken from the assignment itself.

## Problem Statement

#### Single-Scale Optical Flow

• Let’s implement the single-scale Lucas-Kanade optical flow algorithm. This involves finding the motion (u, v) that minimizes the sum-squared error of the brightness constancy equations for each pixel in a window.  The algorithm will be implemented as a function with the following inputs:

def optical_flow(I1, I2, window_size, tau) # returns (u, v)

• Here, u and v are the x and y components of the optical flow, I1 and I2 are two images taken at times t = 1 and t = 2 respectively, and window_size is a 1 × 2 vector storing the width and height of the window used during flow computation.
• In addition to these inputs, a theshold τ should be added, such that if τ is larger than the smallest eigenvalue of A’A, then the the optical flow at that position should not be computed. Recall that the optical flow is only valid in regions where

has rank 2, which is what the threshold is checking. A typical value for τ is 0.01.

• We should try experimenting with different window sizes and find out the tradeoffs associated with using a small vs. a large window size.
• The following figure describes the algorithm, which considers a nxn (n>=3) window around each pixel and solves a least-square problem to find the best flow vectors for the pixel.

• The following code-snippet shows how the algorithm is implemented in python for a gray-level image.
import numpy as np
from scipy import signal
def optical_flow(I1g, I2g, window_size, tau=1e-2):

kernel_x = np.array([[-1., 1.], [-1., 1.]])
kernel_y = np.array([[-1., -1.], [1., 1.]])
kernel_t = np.array([[1., 1.], [1., 1.]])#*.25
w = window_size/2 # window_size is odd, all the pixels with offset in between [-w, w] are inside the window
I1g = I1g / 255. # normalize pixels
I2g = I2g / 255. # normalize pixels
# Implement Lucas Kanade
# for each point, calculate I_x, I_y, I_t
mode = 'same'
fx = signal.convolve2d(I1g, kernel_x, boundary='symm', mode=mode)
fy = signal.convolve2d(I1g, kernel_y, boundary='symm', mode=mode)
ft = signal.convolve2d(I2g, kernel_t, boundary='symm', mode=mode) +
signal.convolve2d(I1g, -kernel_t, boundary='symm', mode=mode)
u = np.zeros(I1g.shape)
v = np.zeros(I1g.shape)
# within window window_size * window_size
for i in range(w, I1g.shape[0]-w):
for j in range(w, I1g.shape[1]-w):
Ix = fx[i-w:i+w+1, j-w:j+w+1].flatten()
Iy = fy[i-w:i+w+1, j-w:j+w+1].flatten()
It = ft[i-w:i+w+1, j-w:j+w+1].flatten()
#b = ... # get b here
#A = ... # get A here
# if threshold τ is larger than the smallest eigenvalue of A'A:
nu = ... # get velocity here
u[i,j]=nu[0]
v[i,j]=nu[1]

return (u,v)



## Some Results

• The following figures and animations show the results of the algorithm on a few image sequences. Some of these input image sequences / videos are from the course and some are collected from the internet.
• As can be seen, the algorithm performs best if the motion of the moving object(s) in between consecutive frames is slow. To the contrary, if the motion is large, the algorithm fails and we should implement / use multiple-scale version Lucas-Kanade with image pyramids.
• Finally,  with small window size,  the algorithm captures subtle motions but not large motions. With large size it happens the other way.

Input Sequences

Output Optical Flow with different window sizes

window size = 15

window size = 21

Input Sequences

Output Optical Flow

Input Sequences (hamburg taxi)

Output Optical Flow

Input Sequences

Output Optical Flow

Input Sequences

Output Optical Flow

Input Sequences

Output Optical Flow

Input Sequences

Output Optical Flow

Input Sequences

Output Optical Flow

Input Sequences
Output Optical Flow

Input Sequences

Output Optical Flow

Output Optical Flow

Input Sequences

Output Optical Flow with window size 45

Output Optical Flow with window size 10

Output Optical Flow with window size 25

Output Optical Flow with window size 45

# Efficient Graph-Based Image Segmentation in Python

In this article, an implementation of an efficient graph-based image segmentation technique will be described, this algorithm was proposed by Felzenszwalb et. al. from MIT in this paper.  The slides on this paper can be found from this link from the Stanford Vision Lab too. The algorithm is closely related to Kruskal’s algorithm for constructing a minimum spanning tree of a graph, as stated by the author and hence can  be implemented to run in O(m log m) time, where m is the number of edges in the graph.

## Problem Definition and the basic idea (from the paper)

• Let G = (V, E) be an undirected graph with vertices vi ∈ V, the set of elements to be segmented, and edges (vi, vj ) ∈ E corresponding to pairs of neighboring vertices. Each edge (vi, vj ) ∈ E has a corresponding weight w((vi, vj)), which is a non-negative measure of the dissimilarity between neighboring elements vi and vj.

• In the case of image segmentation, the elements in V are pixels and the weight of an edge is some measure of the dissimilarity between the two pixels connected by that edge (e.g., the difference in intensity, color, motion, location or some other local attribute).

• Particularly for the implementation described here, an edge weight function based on the absolute intensity difference (in the yiq space) between the pixels connected by an edge, w((vi, vj )) = |I(pi) − I(pj )|.

• In the graph-based approach, a segmentation S is a partition of V into components
such that each component (or region) C ∈ S corresponds to a connected component
in a graph G0 = (V, E0), where E0 ⊆ E.

• In other words, any segmentation is induced by a subset of the edges in E. There are different ways to measure the quality of a segmentation but in general we want the elements in a component to be similar, and elements in different components to be dissimilar.

• This means that edges between two vertices in the same component should have relatively low weights, and edges between vertices in different components should have higher weights.

• The next figure shows the steps in the algorithm. The algorithm is very similar to Kruskal’s algorithm for computing the MST for an undirected graph.

• The threshold function τ controls the degree to which the difference between two
components must be greater than their internal differences in order for there to be
evidence of a boundary between them.

• For small components, Int(C) is not a good estimate of the local characteristics of the data. In the extreme case, when |C| = 1, Int(C) = 0. Therefore, a threshold function based on the size of the component, τ (C) = k/|C| is needed to be usedwhere |C| denotes the size of C, and k is some constant parameter.

• That is, for small components we require stronger evidence for a boundary. In practice k sets a scale of observation, in that a larger k causes a preference for larger components.

• In general, a Gaussian filter is used to smooth the image slightly before computing the edge weights, in order to compensate for digitization artifacts. We always use a Gaussian with σ = 0.8, which does not produce any visible change to the image but helps remove artifacts.

• The following python code shows how to create the graph.

import numpy as np
from scipy import signal
import matplotlib.image as mpimg

def gaussian_kernel(k, s = 0.5):
# generate a (2k+1)x(2k+1) gaussian kernel with mean=0 and sigma = s
probs = [exp(-z*z/(2*s*s))/sqrt(2*pi*s*s) for z in range(-k,k+1)]
return np.outer(probs, probs)

def create_graph(imfile, k=1., sigma=0.8, sz=1):
# create the pixel graph with edge weights as dissimilarities
gauss_kernel = gaussian_kernel(sz, sigma)
for i in range(3):
rgb[:,:,i] = signal.convolve2d(rgb[:,:,i], gauss_kernel, boundary='symm', mode='same')
yuv = rgb2yiq(rgb)
(w, h) = yuv.shape[:2]
edges = {}
for i in range(yuv.shape[0]):
for j in range(yuv.shape[1]):
#compute edge weight for nbd pixel nodes for the node i,j
for i1 in range(i-1, i+2):
for j1 in range(j-1, j+2):
if i1 == i and j1 == j: continue

if i1 >= 0 and i1 = 0 and j1 < h:
wt = np.abs(yuv[i,j,0]-yuv[i1,j1,0])
n1, n2 = ij2id(i,j,w,h), ij2id(i1,j1,w,h)
edges[n1, n2] = edges[n2, n1] = wt
return edges


## Some Results

• The images are taken from the paper itself or from the internet. The following figures and animations show the result of segmentation as a result of iterative merging of the components (by choosing least weight edges), depending on the internal difference of the components.

• Although in the paper the author described the best value of the parameter k to be around 300, but  since in this implementation the pixel RGB values are normalized (to have values in between 0 – 1) and then converted to YIQ values and the YIQ intensities are used for computing the weights (which are typically very small), the value of k that works best in this scenario is 0.001-0.01.

• As we can see from the below results, higher the value of the parameter k, larger the size of the final component and lesser the number of components in the result.

• The minimum spanning tree creation is also shown, the red edges shown in the figures are the edges chosen by the algorithm to merge the components.

## Input Image

Output Images for two different values of the parameter k

## Input Image

Output Images for two different values of the parameter k

## Input Image

Output Segmented Images

## Input Image

Output Images for two different values of the parameter k

## Input Image

Segmented Output Image

# Interactive Image Segmentation with Graph-Cut in Python

In this article, interactive image segmentation with graph-cut is going to be discussed. and it will be used to segment the source object from the background in an image. This segmentation technique was proposed by Boycov and Jolli in this paper. This problem appeared as a homework assignment here., and also in this lecture video from the Coursera image processing course by Duke university.

## Problem Statement: Interactive graph-cut segmentation

Let’s implement “intelligent paint” interactive segmentation tool using graph cuts algorithm on a weighted image grid. Our task will be to separate the foreground object from the background in an image.

Since it can be difficult sometimes to automatically define what’s foreground and what’s background for an image, the user is going to help us with a few interactive scribble lines using which our algorithm is going to identify the foreground and the background, after that it will be the algorithms job to obtain a complete segmentation of the foreground from the background image.

The following figures show how an input image gets scribbling from a user with two different colors (which is also going to be input to our algorithm) and the ideal segmented image output.

Scribbled Input Image                                       Expected Segmented Output Image

## The Graph-Cut Algorithm

The following describes how the segmentation problem is transformed into a graph-cut problem:

1. Let’s first define the Directed Graph G = (V, E) as follows:

1. Each of the pixels in the image is going to be a vertex in the graph. There will be another couple of special terminal vertices: a source vertex (corresponds to the foreground object) and a sink vertex (corresponds to the background object in the image). Hence, |V(G)| = width x height + 2.

2. Next, let’s defines the edges of the graph. As obvious, there is going to be two types of edges: terminal (edges that connect the terminal nodes to the non-terminal nodes) and non-terminal (edges that connect the non-terminal nodes only).

3. There will be a directed edge from the terminal node source to each of non-terminal nodes in the graph. Similarly,  a directed edge will be there from each non-terminal node (pixel) to the other terminal node sink. These are going to be all the terminal edges and hence, |E_T(G)| =  2 x width x height.

4. Each of the non-terminal nodes (pixels) are going to be connected by edges with the nodes corresponding to the neighboring pixels (defined by 4 or 8 neighborhood of a pixel).  Hence, |E_N(G)| =  |Nbd| x width x height.

2. Now let’s describe how to compute the edge weights in this graph.

1. In order to compute the terminal edge weights, we need to estimate the feature distributions first, i.e., starting with the assumption that each of the nodes corresponding to the scribbled pixels have the probability 1.0 (since we want the solution to respect the regional hard constraints marked by the user-seeds / scribbles) to be in foreground or background object in the image (distinguished by the scribble color, e.g.), we have to compute the probability that a node belongs to the foreground (or background) for all the other non-terminal nodes.

2. The simplest way to compute $P_F$ and $P_B$ is to first fit a couple of  Gaussian distributions on the scribbles by computing the parameters (μ, ∑)
with MLE from the scribbled pixel intensities and then computing the (class-conditional) probabilities from the individual pdfs (followed by a normalization) for each of the pixels as shown in the next figures. The following code fragment show how the pdfs are being computed.

import numpy as np
from collections import defaultdict

def compute_pdfs(imfile, imfile_scrib):
'''
# Compute foreground and background pdfs
# input image and the image with user scribbles
'''
yuv = rgb2yiq(rgb)
yuv_s = rgb2yiq(rgb_s)
# find the scribble pixels
scribbles = find_marked_locations(rgb, rgb_s)
imageo = np.zeros(yuv.shape)
# separately store background and foreground scribble pixels in the dictionary comps
comps = defaultdict(lambda:np.array([]).reshape(0,3))
for (i, j) in scribbles:
imageo[i,j,:] = rgbs[i,j,:]
# scribble color as key of comps
comps[tuple(imageo[i,j,:])] = np.vstack([comps[tuple(imageo[i,j,:])], yuv[i,j,:]])
mu, Sigma = {}, {}
# compute MLE parameters for Gaussians
for c in comps:
mu[c] = np.mean(comps[c], axis=0)
Sigma[c] = np.cov(comps[c].T)
return (mu, Sigma)


3. In order to compute the non-terminal edge weights, we need to compute the similarities in between a pixel node and the nodes corresponding to its neighborhood pixels, e.g., with the formula shown in the next figures (e.g., how similar neighborhood pixels are in RGB / YIQ space).

3. Now that the underlying graph is defined, the segmentation of the foreground from the background image boils down to computing the min-cut in the graph or equivalently computing the max-flow (the dual problem) from the source to sink.

4. The intuition is that the min-cut solution will keep the pixels with high probabilities to belong to the side of the source (foreground) node and likewise the background pixels on the other side of the cut near the sink (background) node, since it’s going to respect the (relatively) high-weight edges (by not going through the highly-similar pixels).

5. There are several standard algorithms, e.g., Ford-Fulkerson (by finding an augmenting path with O(E max| f |) time complexity) or Edmonds-Karp (by using bfs to find the shortest augmenting path, with O(VE2) time complexity) to solve the max-flow problem, typical implementations of these algorithms run pretty fast, in polynomial time in V, E.  Here we are going to use a different implementation (with pymaxflow) based on Vladimir Kolmogorov, which is shown to run faster on many images empirically in this paper.

## Results

The following figures / animations show the interactive-segmentation results (computed probability densities, subset of the flow-graph & min-cut, final segmented image) on a few images,  some of them taken from the above-mentioned courses / videos, some of them taken from Berkeley Vision dataset.

Input Image

Input Image with Scribbles

Fitted Densities from Color Scribbles

A Tiny Sub-graph with Min-Cut

Input Image

Input Image with Scribbles

Fitted Densities from Color Scribbles

A Tiny Sub-graph with Min-Cut

Input Image (liver)

Input Image with Scribbles

Fitted Densities from Color Scribbles

Input Image

Input Image with Scribbles

Fitted Densities from Color Scribbles

A Tiny Sub-graph of the flow-graph with Min-Cut

Input Image

Input Image with Scribbles

Input Image

Input Image with Scribbles

Fitted Densities from Color Scribbles

A Tiny Sub-graph of the flow-graph with Min-Cut

Input Image

Input Image with Scribbles

A Tiny Sub-graph of the flow-graph with Min-Cut

Input Image                                                         Input Image with Scribbles

Input Image

Input Image with Scribbles

A Tiny Sub-graph of the flow-graph with Min-Cut

Input Image

Input Image with Scribbles

Fitted Densities from Color Scribbles

Input Image

Input Image with Scribbles

Fitted Densities from Color Scribbles

A Tiny Sub-graph of the flow-graph with Min-Cut

Input Image (UMBC Campus Map)

Input Image with Scribbles

Input Image

Input Image with Scribbles

A Tiny Sub-graph of the flow-graph with Min-Cut (with blue foreground nodes)

## Changing the background of an image (obtained using graph-cut segmentation) with another image’s background with cut & paste

The following figures / animation show how the background of a given image can be replaced by a new image using cut & paste (by replacing the corresponding pixels in the new image corresponding to foreground), once the foreground in the original image gets identified after segmentation.

Original Input Image

New Background

# Image Colorization Using Optimization in Python

This article is inspired by this SIGGRAPH paper by Levin et. al, for which they took this patent , the paper was referred to in the course CS1114 from Cornell.  This method is also discussed in the coursera online image processing course by NorthWestern University. Some part of the problem description is taken from the paper itself. Also, one can refer to the implementation provided by the authors in matlab, the following link and the following python implementation in github.

## The Problem

Colorization is a computer-assisted process of adding color to a monochrome image or movie. In the paper the authors presented an optimization-based colorization method that is based on a simple premise: neighboring pixels in space-time that have similar intensities should have similar colors.

This premise is formulated using a quadratic cost function  and as an optimization problem. In this approach an artist only needs to annotate the image with a few color scribbles, and the indicated colors are automatically propagated in both space and time to produce a fully colorized image or sequence.

In this article the optimization problem formulation and the way to solve it to obtain the automatically colored image will be described for the images only.

## The Algorithm

YUV/YIQ color space is chosen to work in, this is commonly used in video, where Y is the monochromatic luminance channel, which we will refer to simply as intensity, while U and V are the chrominance channels, encoding the color.

The algorithm is given as input an intensity volume Y(x,y,t) and outputs two color volumes U(x,y,t) and V(x,y,t). To simplify notation the boldface letters are used (e.g. r,s) to denote $\left(x,y,t \right)$ triplets. Thus, Y(r) is the intensity of a particular pixel.

Now, One needs to impose the constraint that two neighboring pixels r,s should have similar colors if their intensities are similar. Thus, the problem is formulated as the following optimization problem that aims to minimize the difference between the  colorU(r) at pixel r and the weighted average of the colors at neighboring pixels, where w(r,s) is a weighting function that sums to one, large when Y(r) is similar to Y(s), and small when the two intensities are different.

When the intensity is constant the color should be constant, and when the intensity is an edge the color should also be an edge. Since the cost functions are quadratic and
the constraints are linear, this optimization problem yields a large, sparse system of linear equations, which may be solved using a number of standard methods.

As discussed in the paper, this algorithm is closely related to algorithms proposed for other tasks in image processing. In image segmentation algorithms based on normalized cuts [Shi and Malik 1997], one attempts to find the second smallest eigenvector of the matrix D − W where W is a npixels×npixels matrix whose elements are the pairwise affinities between pixels (i.e., the r,s entry of the matrix is w(r,s)) and D is a diagonal matrix whose diagonal elements are the sum of the affinities (in this case it is always 1). The second smallest eigenvector of any symmetric matrix A is a unit norm vector x that minimizes $x^{T}Ax$ and is orthogonal to the first eigenvector. By direct  inspection, the quadratic form minimized by normalized cuts is exactly the cost function J, that is $x^{T}(D-W)x =J(x)$. Thus, this algorithm minimizes the same cost function but under different constraints. In image denoising algorithms based on anisotropic diffusion [Perona and Malik 1989; Tang et al. 2001] one often minimizes a  function
similar to equation 1, but the function is applied to the image intensity as well.

The following figures show an original gray-scale image and the marked image with color-scribbles that are going to be used to compute the output colored image.

Original Gray-scale Image Input

Gray-scale image Input Marked with Color-Scribbles

## Implementation of the Algorithm

Here are the the steps for the algorithm:

1. Convert both the original gray-scale image and the marked image (marked with color scribbles for a few pixels by the artist) to from RGB to YUV / YIQ color space.
2. Compute the difference image from the marked and the gray-scale image. The pixels that differ  are going to be pinned and they will appear in the output, they are directly going to be used as stand-alone constraints in the minimization problem.
3. We need to compute the weight matrix W that depends on the similarities in the neighbor intensities for a pixel from the original gray-scale image.
4. The optimization problem finally boils down to solving the system of linear equations of the form $WU = b$ that has a closed form least-square solution
$U = W^{+}b = {(W^{T}W)}^{-1}W^{T}b$. Same thing is to be repeated for the V channel too.
5. However the W matrix is going to be very huge and sparse, hence sparse-matrix based implementations must be used to obtain an efficient solution. However, in this python implementation in github, the scipy sparse lil_matrix was used when constructing the sparse matrices, which is quite slow, we can construct more efficient scipy csc matrix rightaway, by using a dictionary to store the weights initially. It is much faster. The python code in the next figure shows my implementation for computing the weight matrix W.
6. Once W is computed it’s just a matter of obtaining the least-square solution, by computing the pseudo-inverse, which can be more efficiently computed with LU factorization and a sparse LU solver, as in this  python implementation in github.
7. Once the solution of the optimization problem is obtained in YUV / YIQ space, it needs to be converted back to RGB. The following formula is used for conversion.

import scipy.sparse as sp
from collections import defaultdict

def compute_W(Y, colored):

(w, h) = Y.shape
W = defaultdict()
for i in range(w):
for j in range(h):
if not (i, j) in colored: # pixel i,j in color scribble
(N, sigma) = get_nbrs(Y, i, j, w, h) #r = (i, j)
Z = 0.
id1 = ij2id(i,j,w,h) # linearized id
for (i1, j1) in N: #s = (i1, j1)
id2 = ij2id(i1,j1,w,h)
W[id1,id2] = np.exp(-(Y[i,j]-Y[i1,j1])**2/(sigma**2)) if sigma > 0 else 0.
Z += W[id1,id2]
if Z > 0:
for (i1, j1) in N: #s = (i1, j1)
id2 = ij2id(i1,j1,w,h)
W[id1,id2] /= -Z
for i in range(w):
for j in range(h):
id = ij2id(i,j,w,h)
W[id,id] = 1.

rows, cols = zip(*(W.keys())) #keys
data = W.values() #[W[k] for k in keys]
return sp.csc_matrix((data, (rows, cols)), shape=(w*h, w*h)) #W



## Results

The following images and animations show the results obtained with the optimization algorithm. Most of the following images are taken from the paper itself.

Original Gray-scale Image Input                 Gray-scale image Input Marked

Color image Output

The next animations show how the incremental scribbling results in better and better color images.

Original Gray-scale Image Input

As can be seen from the following animation, the different parts of the building get colored as more and more color-hints are scribbled / annotated.

Gray-scale image Input Marked

Color image Output

Original Gray-scale Image Input

Gray-scale image Input Marked

Color image Output

Original Gray-scale Image Input (me)

Gray-scale image Input Marked incrementally

Color image Output

Original Gray-scale Image Input

Gray-scale image Input Marked

Color image Output

Original Gray-scale Image Input

Gray-scale image Input Marked

Color image Output

# Recursive Graphics, Bi/Tri-linear Interpolation, Anti-aliasing and Image Transformation in Python

The following problem appeared in an assignment in the Princeton course COS 126 . The problem description is taken from the course itself.

## Recursive Graphics

Write a program that plots a Sierpinski triangle, as illustrated below. Then develop a program that plots a recursive patterns of your own design.

Part 1.

The Sierpinski triangle is an example of a fractal pattern. The pattern was described by Polish mathematician Waclaw Sierpinski in 1915, but has appeared in Italian art since the 13th century. Though the Sierpinski triangle looks complex, it can be generated with a short recursive program.

Examples. Below are the target Sierpinski triangles for different values of order N.

Our task is to implement a recursive function sierpinski(). We need to think recursively: our function should draw one black triangle (pointed downwards) and then call itself recursively 3 times (with an appropriate stopping condition). When writing our program, we should exercise modular design.

The following code shows an implementation:

class Sierpinski:
#Height of an equilateral triangle whose sides are of the specified length.
def height (self, length):
return sqrt(3) * length / 2.
#Draws a filled equilateral triangle whose bottom vertex is (x, y)
#of the specified side length.
def filledTriangle(self, x, y, length):
h = self.height(length)
draw(np.array([x, x+length/2., x-length/2.]), np.array([y, y+h, y+h]), alpha=1)
#Draws an empty equilateral triangle whose bottom vertex is (x, y)
#of the specified side length.
def emptyTriangle(self, x, y, length):
h = self.height(length)
draw(np.array([x, x+length, x-length]), np.array([y+2*h, y, y]), alpha=0)
# Draws a Sierpinski triangle of order n, such that the largest filled
# triangle has bottom vertex (x, y) and sides of the specified length.
def sierpinski(self, n, x, y, length):
self.filledTriangle(x, y, length)
if n > 1:
self.sierpinski(n-1, x-length/2., y, length/2.)
self.sierpinski(n-1, x+length/2., y, length/2.)
self.sierpinski(n-1, x, y+self.height(length), length/2.)


The following animation shows how such a triangle of order 5 is drawn recursively.

The following animation shows how such a triangle of order 6 is drawn recursively.

A diversion: fractal dimension.

Formally, we can define the Hausdorff dimension or similarity dimension of a self-similar figure by partitioning the figure into a number of self-similar pieces of smaller size. We define the dimension to be the log (# self similar pieces) / log (scaling factor in each spatial direction). For example, we can decompose the unit square into 4 smaller squares, each of side length 1/2; or we can decompose it into 25 squares, each of side length 1/5. Here, the number of self-similar pieces is 4 (or 25) and the scaling factor is 2 (or 5). Thus, the dimension of a square is 2 since log (4) / log(2) = log (25) / log (5) = 2. We can decompose the unit cube into 8 cubes, each of side length 1/2; or we can decompose it into 125 cubes, each of side length 1/5. Therefore, the dimension of a cube is log(8) / log (2) = log(125) / log(5) = 3.

We can also apply this definition directly to the (set of white points in) Sierpinski triangle. We can decompose the unit Sierpinski triangle into 3 Sierpinski triangles, each of side length 1/2. Thus, the dimension of a Sierpinski triangle is log (3) / log (2) ≈ 1.585. Its dimension is fractional—more than a line segment, but less than a square! With Euclidean geometry, the dimension is always an integer; with fractal geometry, it can be something in between. Fractals are similar to many physical objects; for example, the coastline of Britain resembles a fractal, and its fractal dimension has been measured to be approximately 1.25.

Part 2.

Drawing a tree recursively, as described here:

The following shows how a tree of order 10 is drawn:

The next problem appeared in an assignment in the Cornell course CS1114 . The problem description is taken from the course itself.

## Bilinear Interpolation

Let’s consider a 2D matrix of values at integer grid locations (e.g., a grayscale image). To interpolate values on a 2D grid, we can use the 2D analogue of linear interpolation: bilinear interpolation. In this case, there are four neighbors for each possible point we’d like to interpolation, and the intensity values of these four neighbors are all combined to compute the interpolated intensity, as shown in the next figure.

In the ﬁgure, the Q values represent intensities. To combine these intensities, we perform linear interpolation in multiple directions: we ﬁrst interpolate in the x direction (to get the value at the blue points), then in the y direction (to get the value at the green points).

## Image transformations

Next, we’ll use the interpolation function to help us implement image transformations.

A 2D afﬁne transformation can be represented with a 3 ×3 matrix T:

Recall that the reason why this matrix is 3×3, rather than 2 ×2, is that we operate in homogeneous coordinates; that is, we add an extra 1 on the end of our 2D coordinates (i.e., (x,y) becomes (x,y,1)), in order to represent translations with a matrix multiplication. To apply a transformation T to a pixel, we multiply T by the pixel’s location:

The following figure shows a few such transformation matrices:

To apply a transformation T to an entire image I, we could apply the transformation to each of I’s pixels to map them to the output image. However, this forward warping procedure has several problems. Instead, we’ll use inverse mapping to warp the pixels of the output image back to the input image. Because this won’t necessarily hit an integer-valued location, we’ll need to use the (bi-linear) interpolation to determine the intensity of the input image at the desired location, as shown in the next figure.

To demo the transformation function, let’s implement the following on a gray scale bird image:

1. Horizontal ﬂipping

2. Scaling by a factor of 0.5

3. Rotation by 45 degrees around the center of the image

The next animations show rotation and sheer transformations with the Lena image:

Next, let’s implement a function to transform RGB images. To do this, we need to simply call transform image three times, once for each channel, then put the results together into a single image.  Next figures and animations show some results on an RGB image.

## Some non-linear transformations

The next figure shows the transform functions from here:

The next figures and animations show the application of the above non-linear transforms on the Lena image.

Wave1

Wave2

Swirl

Warp

Some more non-linear transforms:

## Anti-aliasing

There is a problem with our interpolation method above: it is not very good at shrinking images, due to aliasing. For instance, if let’s try to down-sample the following bricks image by a factor of 0.4, we get the image shown in the following figure: notice the strange banding effects in the output image.

Original Image

Down-sampled Image with Bilinear Interpolation

The problem is that a single pixel in the output image corresponds to about 2.8 pixels in the input image, but we are sampling the value of a single pixel—we should really be averaging over a small area.

To overcome this problem, we will create a data structure that will let us (approximately) average over any possible square regions of pixels in the input image: an image stack. An image stack is a 3D matrix that we can think of as, not surprisingly, a stack of images, one on top of the other. The top image in the cube will be the original input image. Images further down the stack will be the input image with progressively larger amounts of blur. The size of the matrix will be rows × cols × num levels, where the original (grayscale) image has size rows×cols and there are num levels images in the stack.

Before we use the stack, we must write a function to create it, which takes as input a (grayscale) image and a number of levels in the stack, and returns a 3D matrix stack corresponding to the stack. Again, the ﬁrst image on the stack will be the original image. Every other image in the stack will be a blurred version of the previous image. A good blur kernel to use is:

Now, for image k in the stack, we know that every pixel is a (weighted) average of some number of pixels (a k × k patch, roughly speaking) in the input image. Thus, if we down-sample the image by a factor of k, we want to sample pixels from level k of the stack.

⇒  Let’s write the following function to create image stack that takes a grayscale image and a number max levels, and returns an image stack.

from scipy import signal

def create_image_stack(img, max_level):
K = np.ones((3,3)) / 9.
image_stack = np.zeros((img.shape[0], img.shape[1], max_level))
image_stack[:,:,0] = img
for l in range(1, max_level):
image_stack[:,:,l] = signal.convolve2d(image_stack[:,:,l-1], K,
boundary=’symm’, mode=’same’)
return image_stack

The next animation shows the image stack created from the bricks image.

## Trilinear Interpolation

Now, what happens if we down-sample the image by a fractional factor, such as 3.6? Unfortunately, there is no level 3.6 of the stack. Fortunately, we have a tool to solve this problem: interpolation. We now potentially need to sample a value at position (row,col,k) of the image stack, where all three coordinates are fractional. We therefore something more powerful than bilinear interpolation: trilinear interpolation! Each position we want to sample now has eight neighbors, and we’ll combine all of their values together in a weighted sum.

This sounds complicated, but we can write this in terms of our existing functions. In particular, we now interpolate separately along different dimensions: trilinear interpolation can be implemented with two calls to bilinear interpolation and one call to linear interpolation.

Let’s implement a function trilerp like the following that takes an image stack, and a row, column, and stack level k, and returns the interpolated value.

def trilerp (img_stack, x, y, k):

if k < 1: k = 1
if  k == int(k):
return bilerp(img_stack[:,:,k-1], x, y)
else:
f_k, c_k = int(floor(k)), int(ceil(k))
v_f_k = bilerp(img_stack[:,:,f_k-1], x, y)
v_c_k = bilerp(img_stack[:,:,c_k-1], x, y)
return linterp(k, f_k, c_k, v_f_k, v_c_k)

Now we can ﬁnally implement a transformation function that does proper anti-aliasing. In order to do this, let’s implement a function that will

• First compute the image stack.
• Then compute, for the transformation T, how much T is scaling down the image. If T is deﬁned by the six values a,b,c,d,e,f above, then, to a ﬁrst approximation, the downscale factor is:

However, if k < 1 (corresponding to scaling up the image), we still want to sample from level 1. This situation reverts to normal bilinear interpolation.
• Next call the trilerp function on the image stack, instead of bilerp on the input image.

The next figure shows the output image obtained image transformation with proper anti-aliasing:

Down-sampled Image with Anti-aliasing using Trilinear Interpolation

As we can see from the above output, the aliasing artifact has disappeared.

The same results are obtained on the color image, as shown below, by applying the trilerp function on the color channels separately and creating separate image stacks for different color channels.

Original Image

Down-sampled Image with Bilinear Interpolation

Down-sampled Image with Anti-aliasing

The following animation shows the branding artifacts created when using bilinear interpolation for  different scale factors and how they are removed with anti-aliasing.

Down-sampled Images with Bilinear Interpolation

Down-sampled Images with Anti-aliasing

# Deep Learning & Art: Neural Style Transfer – An Implementation with Tensorflow (using Transfer Learning with a Pre-trained VGG-19 Network) in Python

This problem appeared as an assignment in the online coursera course Convolution Neural Networks by Prof Andrew Ng, (deeplearing.ai).  The description of the problem is taken straightway from the assignment.

Neural Style Transfer algorithm was created by Gatys et al. (2015) , the paper can be found here .

In this assignment, we shall:

• Implement the neural style transfer algorithm
• Generate novel artistic images using our algorithm

Most of the algorithms we’ve studied optimize a cost function to get a set of parameter values. In Neural Style Transfer, we  shall optimize a cost function to get pixel values!

## Problem Statement

Neural Style Transfer (NST) is one of the most fun techniques in deep learning. As seen below, it merges two images, namely,

1. a “content” image (C) and
2. a “style” image (S),

to create a “generated” image (G). The generated image G combines the “content” of the image C with the “style” of image S.

In this example, we are going to generate an image of the Louvre museum in Paris (content image C), mixed with a painting by Claude Monet, a leader of the impressionist movement (style image S).

Let’s see how we can do this.

## Transfer Learning

Neural Style Transfer (NST) uses a previously trained convolutional network, and builds on top of that. The idea of using a network trained on a different task and applying it to a new task is called transfer learning.

Following the original NST paper, we shall use the VGG network. Specifically, we’ll use VGG-19, a 19-layer version of the VGG network. This model has already been trained on the very large ImageNet database, and thus has learned to recognize a variety of low level features (at the earlier layers) and high level features (at the deeper layers). The following figure (taken from the google image search results) shows how a VGG-19 convolution neural net looks like, without the last fully-connected (FC) layers.

We run the following code to load parameters from the pre-trained VGG-19 model serialized in a matlab file. This takes a few seconds.

import pprint
pprint.pprint(model)

{‘avgpool1’: <tf.Tensor ‘AvgPool_5:0’ shape=(1, 150, 200, 64) dtype=float32>,
‘avgpool2’: <tf.Tensor ‘AvgPool_6:0’ shape=(1, 75, 100, 128) dtype=float32>,
‘avgpool3’: <tf.Tensor ‘AvgPool_7:0’ shape=(1, 38, 50, 256) dtype=float32>,
‘avgpool4’: <tf.Tensor ‘AvgPool_8:0’ shape=(1, 19, 25, 512) dtype=float32>,
‘avgpool5’: <tf.Tensor ‘AvgPool_9:0’ shape=(1, 10, 13, 512) dtype=float32>,
‘conv1_1’: <tf.Tensor ‘Relu_16:0’ shape=(1, 300, 400, 64) dtype=float32>,
‘conv1_2’: <tf.Tensor ‘Relu_17:0’ shape=(1, 300, 400, 64) dtype=float32>,
‘conv2_1’: <tf.Tensor ‘Relu_18:0’ shape=(1, 150, 200, 128) dtype=float32>,
‘conv2_2’: <tf.Tensor ‘Relu_19:0’ shape=(1, 150, 200, 128) dtype=float32>,
‘conv3_1’: <tf.Tensor ‘Relu_20:0’ shape=(1, 75, 100, 256) dtype=float32>,
‘conv3_2’: <tf.Tensor ‘Relu_21:0’ shape=(1, 75, 100, 256) dtype=float32>,
‘conv3_3’: <tf.Tensor ‘Relu_22:0’ shape=(1, 75, 100, 256) dtype=float32>,
‘conv3_4’: <tf.Tensor ‘Relu_23:0’ shape=(1, 75, 100, 256) dtype=float32>,
‘conv4_1’: <tf.Tensor ‘Relu_24:0’ shape=(1, 38, 50, 512) dtype=float32>,
‘conv4_2’: <tf.Tensor ‘Relu_25:0’ shape=(1, 38, 50, 512) dtype=float32>,
‘conv4_3’: <tf.Tensor ‘Relu_26:0’ shape=(1, 38, 50, 512) dtype=float32>,
‘conv4_4’: <tf.Tensor ‘Relu_27:0’ shape=(1, 38, 50, 512) dtype=float32>,
‘conv5_1’: <tf.Tensor ‘Relu_28:0’ shape=(1, 19, 25, 512) dtype=float32>,
‘conv5_2’: <tf.Tensor ‘Relu_29:0’ shape=(1, 19, 25, 512) dtype=float32>,
‘conv5_3’: <tf.Tensor ‘Relu_30:0’ shape=(1, 19, 25, 512) dtype=float32>,
‘conv5_4’: <tf.Tensor ‘Relu_31:0’ shape=(1, 19, 25, 512) dtype=float32>,
‘input’: <tensorflow.python.ops.variables.Variable object at 0x7f7a5bf8f7f0>}

The next figure shows the content image (C) – the Louvre museum’s pyramid surrounded by old Paris buildings, against a sunny sky with a few clouds.

For the above content image, the activation outputs from the convolution layers are visualized in the next few figures.

## How to ensure that the generated image G matches the content of the image C?

As we know, the earlier (shallower) layers of a ConvNet tend to detect lower-level features such as edges and simple textures, and the later (deeper) layers tend to detect higher-level features such as more complex textures as well as object classes.

We would like the “generated” image G to have similar content as the input image C. Suppose we have chosen some layer’s activations to represent the content of an image. In practice, we shall get the most visually pleasing results if we choose a layer in the middle of the network – neither too shallow nor too deep.

First we need to compute the “content cost” using TensorFlow.

• The content cost takes a hidden layer activation of the neural network, and measures how different a(C) and a(G) are.
• When we minimize the content cost later, this will help make sure G
has similar content as C.

def compute_content_cost(a_C, a_G):
“””
Computes the content cost

Arguments:
a_C — tensor of dimension (1, n_H, n_W, n_C), hidden layer activations representing content of the image C
a_G — tensor of dimension (1, n_H, n_W, n_C), hidden layer activations representing content of the image G

Returns:
J_content — scalar that we need to compute using equation 1 above.
“””

# Retrieve dimensions from a_G
m, n_H, n_W, n_C = a_G.get_shape().as_list()

# Reshape a_C and a_G
a_C_unrolled = tf.reshape(tf.transpose(a_C), (m, n_H * n_W, n_C))
a_G_unrolled = tf.reshape(tf.transpose(a_G), (m, n_H * n_W, n_C))

# compute the cost with tensorflow
J_content = tf.reduce_sum((a_C_unrolled – a_G_unrolled)**2 / (4.* n_H * n_W *  \
n_C))

return J_content

## Computing the style cost

For our running example, we will use the following style image (S). This painting was painted in the style of impressionism, by  Claude Monet .

def gram_matrix(A):
“””
Argument:
A — matrix of shape (n_C, n_H*n_W)

Returns:
GA — Gram matrix of A, of shape (n_C, n_C)
“””

GA = tf.matmul(A, tf.transpose(A))
return GA

def compute_layer_style_cost(a_S, a_G):
“””
Arguments:
a_S — tensor of dimension (1, n_H, n_W, n_C), hidden layer activations representing style of the image S
a_G — tensor of dimension (1, n_H, n_W, n_C), hidden layer activations representing style of the image G

Returns:
J_style_layer — tensor representing a scalar value, style cost defined above by equation (2)
“””

# Retrieve dimensions from a_G
m, n_H, n_W, n_C = a_G.get_shape().as_list()

# Reshape the images to have them of shape (n_C, n_H*n_W)
a_S = tf.reshape(tf.transpose(a_S), (n_C, n_H * n_W))
a_G = tf.reshape(tf.transpose(a_G), (n_C, n_H * n_W))

# Computing gram_matrices for both images S and G (≈2 lines)
GS = gram_matrix(a_S)
GG = gram_matrix(a_G)

# Computing the loss
J_style_layer = tf.reduce_sum((GS – GG)**2 / (4.* (n_H * n_W * n_C)**2))

return J_style_layer

• The style of an image can be represented using the Gram matrix of a hidden layer’s activations. However, we get even better results combining this representation from multiple different layers. This is in contrast to the content representation, where usually using just a single hidden layer is sufficient.
• Minimizing the style cost will cause the image G to follow the style of the image S.

## Defining the total cost to optimize

Finally, let’s create and implement a cost function that minimizes both the style and the content cost. The formula is:

def total_cost(J_content, J_style, alpha = 10, beta = 40):
“””
Computes the total cost function

Arguments:
J_content — content cost coded above
J_style — style cost coded above
alpha — hyperparameter weighting the importance of the content cost
beta — hyperparameter weighting the importance of the style cost

Returns:
J — total cost as defined by the formula above.
“””

J = alpha * J_content + beta * J_style
return J

• The total cost is a linear combination of the content cost J_content(C,G) and the style cost J_style(S,G).
• α and β are hyperparameters that control the relative weighting between content and style.

## Solving the optimization problem

Finally, let’s put everything together to implement Neural Style Transfer!

Here’s what the program will have to do:

• Create an Interactive Session
• Load the content image
• Load the style image
• Randomly initialize the image to be generated
• Load the VGG19 model
• Build the TensorFlow graph:
• Run the content image through the VGG19 model and compute the content cost.
• Run the style image through the VGG19 model and compute the style cost
Compute the total cost.
• Define the optimizer and the learning rate.
• Initialize the TensorFlow graph and run it for a large number of iterations, updating the generated image at every step.

Let’s first load, reshape, and normalize our “content” image (the Louvre museum picture) and “style” image (Claude Monet’s painting).

Now, we initialize the “generated” image as a noisy image created from the content_image. By initializing the pixels of the generated image to be mostly noise but still slightly correlated with the content image, this will help the content of the “generated” image more rapidly match the content of the “content” image. The following figure shows the noisy image:

Next, let’s load the pre-trained VGG-19 model.

To get the program to compute the content cost, we will now assign a_C and a_G to be the appropriate hidden layer activations. We will use layer conv4_2 to compute the content cost. We need to do the following:

• Assign the content image to be the input to the VGG model.
• Set a_C to be the tensor giving the hidden layer activation for layer “conv4_2”.
• Set a_G to be the tensor giving the hidden layer activation for the same layer.
• Compute the content cost using a_C and a_G.

Next, we need to compute the style cost and compute the total cost J by taking a linear combination of the two. Use alpha = 10 and beta = 40.

Then we are going to  set up the Adam optimizer in TensorFlow, using a learning rate of 2.0.

Finally, we need to initialize the variables of the tensorflow graph, assign the input image (initial generated image) as the input of the VGG19 model and runs the model to minimize the total cost J for a large number of iterations.

## Results

The following figures / animations show the generated images (G) with different content (C) and style images (S) at different iterations in the optimization process.

Content

Style (Claud Monet’s The Poppy Field near Argenteuil)

Generated

Content

Style

Generated

Content

Style

Generated

Content

Style (Van Gogh’s The Starry Night)

Generated

Content

Style

Generated

Content (Victoria Memorial Hall)

Style (Van Gogh’s The Starry Night)

Generated

Content (Taj Mahal)

Style (Van Gogh’s Starry Night Over the Rhone)

Generated

Content

Style (Claud Monet’s Sunset in Venice)

Generated

Content (Visva Bharati)

Style (Abanindranath Tagore’s Rabindranath in the role of  blind singer )

Generated

Content (Howrah Bridge)

Style (Van Gogh’s The Starry Night)

Generated

Content (Leonardo Da Vinci’s Mona Lisa)

Style (Van Gogh’s The Starry Night)

Generated

Content (My sketch: Rabindranath Tagore)

Style (Abanindranath Tagore’s Rabindranath in the role of  blind singer )

Generated

Content (me)

Style (Van Gogh’s Irises)

Generated

Content

Style

Generated

Content

Style (Publo Picaso’s Factory at Horto de Ebro)

Generated

The following animations show how the generated image changes with the change in VGG-19 convolution layer used for computing content cost.

Content

Style (Van Gogh’s The Starry Night)

Generated

convolution layer 3_2 used

convolution layer 4_2 used

convolution layer 5_2 used

# Some Computational Photography: Image Quilting (Texture Synthesis) with Dynamic Programming and Texture Transfer (Drawing with Textures) in Python

The following problems appeared as a programming assignment in the Computation Photography course (CS445) at UIUC. The description of the problem is taken from the assignment itself. In this assignment, a python implementation of the problems will be described instead of matlab, as expected in the course.

## The Problems

• The goal of this assignment is to implement the image quilting algorithm for
texture synthesis and transfer, described in this SIGGRAPH 2001 paper by Efros
and Freeman.
• Texture synthesis is the creation of a larger texture image from a small sample.
• Texture transfer is giving an object the appearance of having the
same texture as a sample while preserving its basic shape.
• For texture synthesis, the main idea is to sample patches and lay them down in overlapping patterns, such that the overlapping regions are similar.
• The overlapping regions may not match exactly, which will result in noticeable
edges artifact. To fix this, we need to compute a path along pixels with similar intensities through the overlapping region and use it to select which overlapping patch from which to draw each pixel.
• Texture transfer is achieved by encouraging sampled patches to have similar appearance to a given target image, as well as matching overlapping regions of already sampled patches.
• In this project, we need to apply important techniques such as template matching, finding seams, and masking. These techniques are also useful for image stitching, image completion, image retargeting and blending.

### Randomly Sampled Texture

First let’s randomly samples square patches of size patchsize from sample in order to create an output image of size outsize. Start from the upper-left corner, and tile samples until the image is full. If the patches don’t fit evenly into the output image, we can leave black borders at the edges. This is the simplest but least effective method. Save a result from a sample image to compare to the next two methods.

### Overlapping Patches

Let’s start by sampling a random patch for the upper-left corner. Then sample new patches to overlap with existing ones. For example, the second patch along the top row will overlap by patchsize pixels in the vertical direction and overlap pixels in the horizontal direction. Patches in the first column will overlap by patchsize pixels in the horizontal direction and overlap pixels in the vertical direction. Other patches will have two overlapping regions (on the top and left) which should both be taken into account. Once the cost of each patch has been computed, randomly choose on patch whose cost is
less than a threshold determined by some tolerance value.

As described in the paper, the size of the block is the only parameter controlled by the user and it depends on the properties of a given texture; the block must be big enough to capture the relevant structures in the texture, but small enough so that the interaction between these structures is left up to the algorithm. The overlap size is taken to be one-sixth of the block size (B/6) in general.

### Seam Finding

Next we need to find the min-cost contiguous path from the left to right side of the patch according to the cost. The cost of a path through each pixel is the square differences (summed over RGB for color images) of the output image and the newly
sampled patch. Use dynamic programming to find the min-cost path.

The following figure describes the algorithm to be implemented for image quilting.

### Texture Transfer

The final task is to implement texture transfer, based on the quilt implementation for creating a texture sample that is guided by a pair of sample/target correspondence images (section 3 of the paper). The main difference between this function and
quilt function is that there is an additional cost term based on the difference between
the sampled source patch and the target patch at the location to be filled.

## Image quilting (texture synthesis) results

The following figures and animations show the results of the outputs obtained with the quilting algorithm. The input texture images are mostly taken from the paper .

Input sample Texture

100 sampled blocks of a fixed size (e.g. 50×50) from the input sample

The next animation shows how the large output texture gets created (100 times larger than the input sample texture) with the quilting algorithm.

Output Texture (10×10 times larger than the input) created with texture synthesis (quilting)

Input Texture

Output Texture (25 times larger than the input) created with texture synthesis (quilting) with the minimum cost seams (showed as red lines) computed with dynamic programming

Output Texture (25 times larger than the input) created with quilting

Input Texture

Output Texture (25 times larger than the input) created with quilting

Input Texture

Output Texture (25 times larger than the input) created with quilting

Input Texture

Output Texture (12 times larger than the input) created with quilting

Input Texture

Output Texture (25 times larger than the input) created with quilting

Input Texture

Output Texture (25 times larger than the input) created with quilting

Input Texture

Output Texture (36 times larger than the input) created with quilting

Input Texture

Output Texture (9 times larger than the input) created with quilting

Input Texture

Output Texture (25 times larger than the input) created with quilting

Input Texture

Output Texture (9 times larger than the input) created with quilting along with the min-cost seams (shown as red lines) computed with dynamic programming

Output Texture (9 times larger than the input) created with quilting

## Texture transfer results

The following figures and animations show how the texture from an input image can be transferred to the target image using the above-mentioned simple modification of the quilting algorithm. Again, some of the images are taken from the paper.

Input Texture (milk)

Target Image

Output Image after Texture Transfer

Input Texture (milk)

Target Image

Output Image after Texture Transfer

The following figures show the output image obtained when a few textures were transferred to my image.

Target Image (me)

Input Texture (fire)

Output Image after Texture Transfer  (with small block size)

Input Texture (cloud)

Output Image after Texture Transfer  (with small block size)

Input Texture (toast)

Output Image after Texture Transfer  (with small block size)

Now applying some gradient mixing such as Poisson blending  on the original image and the the one after texture transfer with the target toast image we get the following two images respectively.

Input Texture

Output Image after Texture Transfer  (with small block size)

Input Texture

Output Image after Texture Transfer  (with small block size)

## Drawing with Textures

The following figures show the output image obtained when a few textures were transferred to another image of mine.

Target Image (me)

### Some textures from a few famous painting-masterpieces (obtained from here, here, here, here and here)

Input Texture (Van Gogh’s The Starry Night)

Output Images after Texture Transfer  (with 3 different patch sizes)

Input Texture (Van Gogh’s Wheatfield with Cypresses)

Output Image after Texture Transfer

Input Texture (Van Gogh’s The Mulberry Tree)

Output Image after Texture Transfer

Input Texture (Van Gogh’s Wheatfield with Crows)

Output Images after Texture Transfer  (with 2 different patch sizes)

Input Texture (Van Gogh’s Vase with fifteen Sunflowers)

Output Images after Texture Transfer  (with 2 different patch sizes)

Input Texture (Van Gogh’s Sunflowers (F452))

Output Image after Texture Transfer

Input Texture (Van Gogh’s Irises)

Output Image after Texture Transfer

Input Texture (Van Gogh’s Almond Blossom)

Output Image after Texture Transfer

Input Texture (Van Gogh’s Starry Night Over the Rhone)

Output Image after Texture Transfer

Input Texture (Pablo Picasso’s Mediterranean Landscape)

Output Images after Texture Transfer  (with 2 different patch sizes)

Input Texture (Pablo Picasso’s Minotauromachy)

Output Image after Texture Transfer

Input Texture (Pablo Picasso’s Mother and Child 1921)

Output Image after Texture Transfer

Input Texture (Pablo Picasso’s Factory at Horto de Ebro)

Output Images after Texture Transfer  (with 2 different patch sizes)

Input Texture (Pablo Picasso’s Carafe and Three Bowls)

Output Image after Texture Transfer

Input Texture (Pablo Picasso’s Bullfight 3)

Output Image after Texture Transfer

Input Texture (Pablo Picasso’s Accordionist)

Output Image after Texture Transfer

Input Texture (Pablo Picasso’s Las Meninas)

Output Image after Texture Transfer

Input Texture (Claude Monet’s The Artist’s Garden at Giverny

Output Image after Texture Transfer

Input Texture (Claude Monet’s The Poppy Field near Argenteuil

Output Image after Texture Transfer

Input Texture (Claude Monet’s The Magpie

Output Image after Texture Transfer

Input Texture (Claude Monet’s The Garden of Monet at Argenteuil

Output Images after Texture Transfer  (with 2 different patch sizes)

Input Texture (Claude Monet’s Sunset in Venice

Output Image after Texture Transfer

Input Texture (Claude Monet’s Waterloo Bridge

Output Image after Texture Transfer

Input Texture (Claude Monet’s Water Lilies

Output Image after Texture Transfer

Input Texture (Claude Monet’s Impression Sunrise)

Output Image after Texture Transfer

Input Texture (Claude Monet’s Sunflowers)

Output Image after Texture Transfer

Input Texture (Abanindranath Tagore’s Canal in Shahjadpur)

Output Image after Texture Transfer

Input Texture (Abanindranath Tagore’s The Victory of Buddha)

Output Image after Texture Transfer

Input Texture (Abanindranath Tagore’s Rabindranath in the role of  blind singer)

Output Image after Texture Transfer

Input Texture (Abanindranath Tagore’s Slaying the Tornado Demon)

Output Image after Texture Transfer

Input Texture (Nandalal Bose’s Village Huts)

Output Image after Texture Transfer

### Some more Textures

Input Texture

Output Images after Texture Transfer  (with 2 different patch sizes)

Input Texture

Output Images after Texture Transfer  (with 2 different patch sizes)

Input Texture

Output Image after Texture Transfer

Input Texture

Output Images after Texture Transfer  (with 2 different patch sizes)

Input Texture

Output Images after Texture Transfer  (with 2 different patch sizes)

Input Texture

Output Images after Texture Transfer (with 2 different patch sizes)

Input Texture

Output Image after Texture Transfer

Input Texture

Output Image after Texture Transfer

Input Texture

Output Image after Texture Transfer

Input Texture

Output Image after Texture Transfer

Input Texture

Output Image after Texture Transfer

Input Texture

Output Images after Texture Transfer  (with 2 different patch sizes)

Input Texture

Output Image after Texture Transfer

Target Image (from The Mummy 1999)

Input Texture (sand)

Output Image after Texture Transfer

The following animation shows how the milk texture is being transformed to the target image of mine with the quilting algorithm with modified code.

Input Texture

Target Image (me)

Output Image after Texture Transfer  (with small block size)

The next figures and animations show the output image obtained after milk texture gets transferred to the target image of mine, for different block size of the samples (shown in red). As can be seen from the following outputs, the target image gets more and more prominent as the sampling block size gets smaller and smaller.