Solving Some Image Processing and Computer Vision Problems with Python libraries

In this article, a few image processing / computer vision problems and their solutions with Python libraries (scikit-image, cv2) will be discussed. Some of the problems are taken from the exercises of this book (available on amazon). This blog will be continued here.

Removing Gaussian Noise from images by computing mean and median images with scikit-image

  1. Start with an input image.
  2. Create n (e.g., n=100) noisy images by adding i.i.d. Gaussian noise (with zero mean) to the original image, with scikit-image.
  3. Compute the mean (median) of the noisy images.
  4. Compare PSNR with the original image.
  5. Vary n and compare the results.
from skimage import img_as_float
from skimage.util import random_noise
from skimage.measure import compare_psnr
from skimage.io import imread
import matplotlib.pylab as plt
import numpy as np

im = img_as_float(imread('../new images/parrot.jpg')) # original image
np.random.seed(0)
# generate n noisy images from the original image by adding Gaussian noise
n = 25
images = np.zeros((n, im.shape[0], im.shape[1], im.shape[2]))
sigma = 0.2
for i in range(n):
    images[i,...] = random_noise(im, var=sigma**2)

im_mean = images.mean(axis=0)
im_median = np.median(images, axis=0)
plt.figure(figsize=(20,16))
plt.subplots_adjust(left=.02, right=.98, bottom=.001, top=.96, wspace=.05, hspace=.01)
plt.subplot(221), plt.imshow(im), plt.axis('off'), plt.title('Original image', size=20)
plt.subplot(222), plt.imshow(images[0]), plt.axis('off'), plt.title('Noisy PSNR: ' + str(compare_psnr(im, images[0])), size=20)
plt.subplot(223), plt.imshow(im_mean), plt.axis('off'), plt.title('Mean PSNR: ' + str(compare_psnr(im, im_mean)), size=20)
plt.subplot(224), plt.imshow(im_median), plt.axis('off'), plt.title('Median PSNR: ' + str(compare_psnr(im, im_median)), size=20)
plt.show()

The next figure shows the original image, a noisy image generated by adding Gaussian noise (with 0 mean) to it, and the images obtained by taking the mean / median over all the n noisy images generated. As can be seen, the Gaussian noise in the images gets cancelled out by taking the mean / median.

with n = 25

p25.png

with n=100
p100.png


plt.hist(images[:,100,100,0], color='red', alpha=0.2, label='red')
plt.hist(images[:,100,100,1], color='green', alpha=0.2, label='green')
plt.hist(images[:,100,100,2], color='blue', alpha=0.2, label='blue')
plt.legend()
plt.grid()
plt.show()

The next figure shows how a pixel value (that can be considered a random variable) for a particular location in different noisy images follows approximately a Gaussian distribution.

Distribution of a pixel value at location (100,100) in the noisy images

pg.png


ns = [25, 50, 100, 200]
# mean_psnrs contain the PSNR values for different n
plt.plot(ns, mean_psnrs, '.--', label='PSNR (mean)')
plt.plot(ns, median_psnrs, '.--', label='PSNR (median)')
plt.legend()
plt.xlabel('n'),  plt.ylabel('PSNR')
plt.show()
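The lists mean_psnrs and median_psnrs used above are assumed to be precomputed; a minimal sketch of how they could be populated (reusing im, sigma, random_noise and compare_psnr from the first code block) is shown below:

mean_psnrs, median_psnrs = [], []
for n in ns:
    # generate n noisy versions of the image and aggregate them with mean / median
    noisy = np.array([random_noise(im, var=sigma**2) for _ in range(n)])
    mean_psnrs.append(compare_psnr(im, noisy.mean(axis=0)))
    median_psnrs.append(compare_psnr(im, np.median(noisy, axis=0)))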

The following figure shows that the PSNR improves with larger n (since, by the SLLN / WLLN, the sample mean converges to the population mean 0 of the Gaussian noise). Also, for the median the improvement in image quality is higher for larger values of n.

psnr

Tracking Pedestrians with HOG-SVM with OpenCV / scikit-image

  1. Start with a video with pedestrians.
  2. Capture the video / extract frames from the video.
  3. For each frame
    1. Create HOG scale pyramid of the frame image.
    2. At each scale, use a sliding window to extract the corresponding block from the frame, compute the HOG descriptor features.
    3. Use cv2's HOGDescriptor_getDefaultPeopleDetector() – a pre-trained SVM classifier on HOG descriptors – to classify whether the corresponding block contains a pedestrian or not.
    4. Run non-max suppression to get rid of multiple detections of the same person.
    5. Use cv2's detectMultiScale() function to implement steps 3-4.

The code is adapted from the code here and here.


import cv2
import numpy as np
from imutils.object_detection import non_max_suppression # non-max suppression helper (e.g., from the imutils package)

# HOG descriptor using the default people (pedestrian) detector
hog = cv2.HOGDescriptor()
hog.setSVMDetector(cv2.HOGDescriptor_getDefaultPeopleDetector())

# run detection, using a spatial stride of 4 pixels,
# a scale stride of 1.02, and zero grouping of rectangles
# (to demonstrate that HOG will detect at potentially
# multiple places in the scale pyramid)
(foundBoundingBoxes, weights) = hog.detectMultiScale(frame, winStride=(4, 4), padding=(8, 8), scale=1.02, finalThreshold=0, useMeanshiftGrouping=False)

# convert bounding boxes from format (x1, y1, w, h) to (x1, y1, x2, y2)
rects = np.array([[x, y, x + w, y + h] for (x, y, w, h) in foundBoundingBoxes])

# run non-max suppression on the boxes based on an overlap threshold of 65%
nmsBoundingBoxes = non_max_suppression(rects, probs=None, overlapThresh=0.65)

cv2 functions are used to extract the HOG descriptor features and to detect pedestrians with the SVM, whereas scikit-image functions are used to visualize the HOG features. The animations below display the original video, what HOG sees and the detected pedestrians after non-max suppression. Notice that there are a few false positive detections.
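The HOG visualization frames ("what HOG sees") can be produced with scikit-image; a minimal sketch for one grayscale frame (the variable frame_gray and the cell / block parameters are assumptions, not the exact settings used for the animations) is:

from skimage.feature import hog
from skimage import exposure
# compute the HOG descriptor of a grayscale frame together with a visualization image
fd, hog_image = hog(frame_gray, orientations=9, pixels_per_cell=(8, 8),
                    cells_per_block=(2, 2), visualize=True)
# rescale the intensities of the visualization for better contrast before display
hog_image = exposure.rescale_intensity(hog_image, in_range=(0, 10))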

Original Video
ped1

HOG-descriptor features video (what HOG sees)
pedh1

Original Video with detected Pedestrians
ped1o

Face Detection with HaarCascade pre-trained AdaBoost classifiers with OpenCV

  1. Capture video with webcam with cv2.VideoCapture().
  2. For each frame, use the pre-trained Adaboost cascade classifiers (the haarcascade_frontalface_default classifier for face detection and the haarcascade_eye_tree_eyeglasses classifier for better detection of the eyes with glasses, loaded from the corresponding xml files that come with cv2’s installation) with Haar-like features, via cv2.CascadeClassifier().
  3. First detect the face(s) with the detectMultiScale() function and draw a bounding box. Then detect the eyes inside a detected face with the same function.
  4. The following python code snippet shows how to detect faces and eyes with cv2. The code is adapted from here.

 


import cv2

# read the cascade classifiers from the xml files (shipped with cv2) into face_cascade and eye_cascade
face_cascade = cv2.CascadeClassifier(cv2.data.haarcascades + 'haarcascade_frontalface_default.xml')
eye_cascade = cv2.CascadeClassifier(cv2.data.haarcascades + 'haarcascade_eye_tree_eyeglasses.xml')
gray = cv2.cvtColor(frame, cv2.COLOR_BGR2GRAY)
frame = cv2.cvtColor(frame, cv2.COLOR_BGR2RGB)
# return bounding box of the face(s) if one is detected
faces = face_cascade.detectMultiScale(gray, 1.03, 5)
for (x,y,w,h) in faces:
   frame = cv2.rectangle(frame,(x,y),(x+w,y+h),(255,0,0),2)
   roi_gray = gray[y:y+h, x:x+w]
   roi_color = frame[y:y+h, x:x+w]
   eyes = eye_cascade.detectMultiScale(roi_gray)
   for (ex,ey,ew,eh) in eyes:
      cv2.rectangle(roi_color,(ex,ey),(ex+ew,ey+eh),(0,255,0),2)
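The face / eye detection snippet above is assumed to run once per frame inside a standard cv2 capture loop; a minimal sketch of such a loop is:

cap = cv2.VideoCapture(0) # 0 for the default webcam, or a video file path
while True:
    ret, frame = cap.read()
    if not ret:
        break
    # ... run the face and eye detection code above on frame ...
    cv2.imshow('detections', cv2.cvtColor(frame, cv2.COLOR_RGB2BGR)) # frame was converted to RGB above
    if cv2.waitKey(1) & 0xFF == ord('q'):
        break
cap.release()
cv2.destroyAllWindows()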

The next animation shows the results of face detection when a scale factor of 1.03 was used to create the scale pyramid. As can be seen, the eyes with the glasses on and some small faces from the photos are not detected at this scale.

fcr.gif

The next animation shows the results of face detection when a scale factor of 1.3 was used to create the scale pyramid. As can be seen, the eyes with / without the glasses on as well as most of the small faces from the photos are detected most of the time at this scale.

me

kiff

Object Tracking with OpenCV trackers


Object Saliency Detection with OpenCV

saliency

Linear / QR Barcode Generation and Detection with OpenCV

me_barcode

me_barcode_detected

bard

Image Segmentation with Random Walk with scikit-image

rw_segmentation1.png

 

Image Segmentation with Grab-Cut with OpenCV

gr_whale.png

Segmentation with SLIC + RAG in scikit-image

me_seg

Homography with scikit-image / OpenCV

books

me10.jpg
homo1

shutterstock

me12.jpg

me_shutter

 

boimela

 

 

homo.gif

Face Morphing (Beier-Neely morphing) with Pystasm

me14     sheldon

fk.png

ad

Object Detection with YOLO DarkNet / Keras / OpenCV (Deep Learning model)

yolo_me

Semantic Segmentation with ENet / DeepLab (Deep Learning  model)

Input video and the segmented Output video
result

Input video and the segmented Output video
cycle

Neural Style Transfer with OpenCV / Torch (Deep Learning Model)

Input & Output with style (Starry Night)
me_style

Text Detection with EAST (Deep Learning Model)

st

result

 

Image Colorization with Deep Learning (OpenCV / Caffe)

pcolor

result

OCR + Text Recognition with EAST + Tesseract

tes

 


Solving Some Image Processing Problems with Python libraries

In this article, a few popular image processing problems along with their solutions are going to be discussed. Python image processing libraries are going to be used to solve these problems. Some of the problems are taken from the exercises of this book (available on amazon).

 

Image Transformations and Warping

 

0. Display RGB image color channels in 3D

  1. A gray-scale image can be thought of as a 2-D function f(x,y) of the pixel locations (x,y), which maps each pixel to its corresponding gray level (e.g., an integer in [0,255]).
  2. For an RGB image there are 3 such functions, f_R(x,y), f_G(x,y), f_B(x,y).
  3. matplotlib’s 3-D plot functions can be used to plot each function.

The following python code shows how to plot the RGB channels separately in 3D:

import numpy as np
import matplotlib.pylab as plt
from mpl_toolkits.mplot3d import Axes3D
from skimage.io import imread
def plot_3d(X, Y, Z, title, cmap):
    # implement this function to plot the channel pixel values in 3D
    plt.show()

im = imread('../images/parrot.jpg')
Y = np.arange(im.shape[0])
X = np.arange(im.shape[1])
Z1 = im[...,0]
Z2 = im[...,1]
Z3 = im[...,2]
plot_3d(X, Y, Z1, cmap='Reds', title='3D plot for the Red Channel')
plot_3d(X, Y, Z2, cmap='Greens', title='3D plot for the Green Channel')
plot_3d(X, Y, Z3, cmap='Blues', title='3D plot for the Blue Channel')
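A minimal sketch of the plot_3d helper left unimplemented above (using matplotlib's plot_surface; the stride values are arbitrary choices) could be:

def plot_3d(X, Y, Z, title, cmap):
    # plot the pixel values of one channel as a surface over the (X, Y) pixel grid
    fig = plt.figure(figsize=(15, 10))
    ax = fig.add_subplot(111, projection='3d')
    Xs, Ys = np.meshgrid(X, Y)
    ax.plot_surface(Xs, Ys, Z, cmap=cmap, rstride=5, cstride=5, linewidth=0)
    ax.set_title(title, size=20)
    plt.show()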

The RGB image
parrot.jpg

parrot_redparrot_greenparrot_blue

1. Wave Transform

  1. Use scikit-image’s warp() function to implement the wave transform.
  2. Note that the wave transform can be expressed with the following equations (consistent with the code below): x' = x, y' = y + 20 sin(2πx / 64).

We shall use the mandrill image to implement the wave transform. The next python code fragment shows how to do it:


def wave(xy):
    xy[:, 1] += 20*np.sin(2*np.pi*xy[:, 0]/64)
    return xy

from skimage.io import imread
from skimage.transform import warp
import matplotlib.pylab as plt
im = imread('images/mandrill.jpg')
im = warp(im, wave)
plt.imshow(im)
plt.show()

The next figure shows the original mandrill input image and the output image obtained after applying the wave transform.

mandrill      mandrill_w.png

mw.gif

2. Swirl Transform

  1. Use scikit-image’s warp() function to implement the swirl transform.
  2. Note that the swirl transform can be expressed with the following equations (consistent with the code below), where (x0, y0) is the center of the swirl and R controls its extent:

     r = √((x − x0)² + (y − y0)²),  a = πr / R
     x' = (x − x0) cos(a) + (y − y0) sin(a) + x0
     y' = −(x − x0) sin(a) + (y − y0) cos(a) + y0

We shall use the mandrill image to implement the swirl transform. The next python code fragment shows how to do it:


def swirl(xy, x0, y0, R):
    r = np.sqrt((xy[:,1]-x0)**2 + (xy[:,0]-y0)**2)
    a = np.pi * r / R
    # work on copies of the shifted coordinates, so that the second assignment
    # does not see the already-modified first coordinate
    x, y = xy[:, 1] - x0, xy[:, 0] - y0
    xy[:, 1] = x*np.cos(a) + y*np.sin(a) + x0
    xy[:, 0] = -x*np.sin(a) + y*np.cos(a) + y0
    return xy

im = imread('../images/mandrill.jpg')
im = warp(im, swirl, map_args={'x0':112, 'y0':112, 'R':512})
plt.imshow(im)
plt.axis('off')
plt.show()

The next figure shows the original mandrill input image and the output image obtained after applying the swirl transform.

mandrill   ms.png

ms.gif

Compare this with the output of the scikit-image swirl() function.
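For reference, a roughly comparable output can be obtained directly with scikit-image's built-in swirl() function; the strength and radius values below are rough guesses rather than an exact equivalent of the transform above:

from skimage.transform import swirl as sk_swirl
im_sk = sk_swirl(imread('../images/mandrill.jpg'), center=(112, 112), strength=5, radius=512)
plt.imshow(im_sk), plt.axis('off'), plt.show()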

3. Very simple Face morphing with α-blending

  1. Start from one face image (e.g., let image1 be the face of Messi) and end at another image (let image2 be the face of Ronaldo) iteratively, creating some intermediate images in between.
  2. At each iteration create an image as a linear combination (α-blend) of the two image numpy ndarrays, given by I = (1 − α)·image1 + α·image2.
  3. Iteratively increase α from 0 to 1.

The following code block shows how to implement it using matplotlib’s image and pylab modules.

import numpy as np
import matplotlib.image as mpimg
import matplotlib.pylab as plt

im1 = mpimg.imread("../images/messi.jpg") / 255 # scale RGB values in [0,1]
im2 = mpimg.imread("../images/ronaldo.jpg") / 255
i = 1
plt.figure(figsize=(18,15))
for alpha in np.linspace(0,1,20):
 plt.subplot(4,5,i)
 plt.imshow((1-alpha)*im1 + alpha*im2)
 plt.axis('off')
 i += 1
plt.subplots_adjust(wspace=0.05, hspace=0.05)
plt.show()

The next animation shows the simple face morphing:
fm.gif
There are more sophisticated techniques to improve the quality of morphing, but this is the simplest one.

4. Creating Instagram-like Gotham Filter

The Gotham filter

The Gotham filter is computed by applying the following operations on an image (the steps are taken from here). For each operation, the corresponding python code and the input and output images are shown (starting with the following input image):

city2.jpg

  1. A mid-tone red contrast boost
    from PIL import Image
    import numpy as np
    import matplotlib.pylab as plt
    im = Image.open('../images/city.jpg') # pixel values in [0,255]
    r, g, b = im.split()
    red_levels = [0., 12.75, 25.5, 51., 76.5, 127.5, 178.5, 204., 229.5, 242.25, 255.]
    r1 = Image.fromarray((np.reshape(np.interp(np.array(r).ravel(), np.linspace(0,255,len(red_levels)), red_levels), (im.height, im.width))).astype(np.uint8), mode='L')
    plt.figure(figsize=(20,15))
    plt.subplot(221)
    plt.imshow(im)
    plt.title('original', size=20)
    plt.axis('off')
    plt.subplot(222)
    im1 = Image.merge('RGB', (r1, g, b))
    plt.imshow(im1)
    plt.axis('off')
    plt.title('with red channel interpolation', size=20)
    plt.subplot(223)
    plt.hist(np.array(r).ravel(), density=True)
    plt.subplot(224)
    plt.hist(np.array(r1).ravel(), density=True)
    plt.show()
    

    gotham1.png

  2. Make the blacks a little bluer
    plt.figure(figsize=(20,10))
    plt.subplot(121)
    plt.imshow(im1)
    plt.title('last image', size=20)
    plt.axis('off')
    b1 = Image.fromarray(np.clip(np.array(b) + 7.65, 0, 255).astype(np.uint8))
    im1 = Image.merge('RGB', (r1, g, b1))
    plt.subplot(122)
    plt.imshow(im1)
    plt.axis('off')
    plt.title('with transformation', size=20)
    plt.tight_layout()
    plt.show()
    

    gotham2.png

  3. A small sharpening
    from PIL.ImageEnhance import Sharpness
    plt.figure(figsize=(20,10))
    plt.subplot(121)
    plt.imshow(im1)
    plt.title('last image', size=20)
    plt.axis('off')
    im2 = Sharpness(im1).enhance(3.0)
    plt.subplot(122)
    plt.imshow(im2)
    plt.axis('off')
    plt.title('with transformation', size=20)
    plt.tight_layout()
    plt.show()
    

    gotham3.png

  4. A boost in blue channel for lower mid-tones
  5. A decrease in blue channel for upper mid-tones
    blue_levels = [0., 11.985, 30.09, 64.005, 81.09, 99.96, 107.1, 111.945, 121.125, 143.055, 147.9, 159.885, 171.105, 186.915, 215.985, 235.875, 255.]
    b2 = Image.fromarray((np.reshape(np.interp(np.array(b1).ravel(), np.linspace(0,255,len(blue_levels)), blue_levels), (im.height, im.width))).astype(np.uint8), mode='L')
    plt.figure(figsize=(20,15))
    plt.subplot(221)
    plt.imshow(im2)
    plt.title('last image', size=20)
    plt.axis('off')
    plt.subplot(222)
    im3 = Image.merge('RGB', (r1, g, b2))
    plt.imshow(im3)
    plt.axis('off')
    plt.title('with blue channel interpolation', size=20)
    plt.subplot(223)
    plt.hist(np.array(b1).ravel(), density=True)
    plt.subplot(224)
    plt.hist(np.array(b2).ravel(), density=True)
    plt.show()
    

gotham4.png

The output image obtained after applying the Gotham filter is shown below:

gotham_out

Down-sampling with anti-aliasing using Gaussian Filter

  1. Start with a large gray-scale image and reduce the image size 16 times, by reducing both height and width by 4 times.
  2. Select every 4th pixel in the x and the y direction from the original image to compute the values of the pixels in the smaller image.
  3. Before down-sampling apply a Gaussian filter (to smooth the image) for anti-aliasing.
  4. Compare the quality of the output image obtained by down-sampling without a Gaussian filter (with aliasing).

The next code block performs the above steps. Since the Gaussian blur is a low-pass filter, it removes the high frequencies from the original input image, so that the fixed sampling rate becomes higher than the Nyquist rate of the blurred image (by the sampling theorem), which avoids aliasing.


import numpy as np
import matplotlib.pylab as plt
from scipy.ndimage import gaussian_filter
from skimage.io import imread
from skimage.color import rgb2gray

im = rgb2gray(imread('images/umbc.png'))
print(im.shape)
plt.figure(figsize=(20,20))
plt.imshow(im)
plt.show()
plt.figure(figsize=(20,20))
im_blurred = gaussian_filter(im, sigma=2.5) #(5,5,1)
plt.imshow(im_blurred)
plt.show()
n = 4 # create an image 16 times smaller in size
w, h = im.shape[0] // n, im.shape[1] // n
im_small = np.zeros((w,h))
for i in range(w):
   for j in range(h):
      im_small[i,j] = im[n*i, n*j]
plt.figure(figsize=(20,20))
plt.imshow(im_small)
plt.show()
im_small = np.zeros((w,h))
for i in range(w):
   for j in range(h):
      im_small[i,j] = im_blurred[n*i, n*j]
plt.figure(figsize=(20,20))
plt.imshow(im_small)
plt.show()

Original Image
orig_umbc.png

Image blurred with a Gaussian Filter (LPF)
blur_umbc.png

Down-sampled Image from the original image (with aliasing)

alias_umbc.png

Down-sampled Image from the blurred image (with anti-aliasing)
anti_alias_umbc

Some Applications of DFT

 

0. Fourier Transform of a Gaussian Kernel is another Gaussian Kernel

Also, the spread in the frequency domain is inversely proportional to the spread in the spatial domain (a consequence of the Heisenberg uncertainty inequality). Here is the proof:

f9.png

The following animation shows an example visualizing the Gaussian contours in spatial and corresponding frequency domains:

gaussian.gif
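This can also be checked numerically; the following minimal sketch builds a 2D Gaussian kernel and looks at the magnitude of its centered DFT, which is again Gaussian-shaped with an inversely proportional spread (the kernel size and sigma are arbitrary choices):

import numpy as np
import numpy.fft as fp
import matplotlib.pylab as plt
from scipy import signal

kernel = np.outer(signal.gaussian(64, 3), signal.gaussian(64, 3)) # sigma = 3 in the spatial domain
spectrum = np.abs(fp.fftshift(fp.fft2(kernel)))                   # centered magnitude spectrum
plt.subplot(121), plt.imshow(kernel), plt.title('spatial domain')
plt.subplot(122), plt.imshow(spectrum), plt.title('frequency domain')
plt.show()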

 

1. Using DFT to up-sample an image

  1. Let’s use the lena gray-scale image.
  2. First double the size of the image by padding zero rows / columns at every alternate position.
  3. Use FFT followed by an LPF.
  4. Finally use IFFT to get the output image.

 

The following code block shows the python code for implementing the steps listed above:


import numpy as np
import numpy.fft as fp
import matplotlib.pyplot as plt
from skimage.io import imread

im = np.mean(imread('images/lena.jpg'), axis=2)
im1 = np.zeros((2*im.shape[0], 2*im.shape[1]))
print(im.shape, im1.shape)
for i in range(im.shape[0]):
    for j in range(im.shape[1]):
        im1[2*i,2*j] = im[i,j]

def padwithzeros(vector, pad_width, iaxis, kwargs):
    vector[:pad_width[0]] = 0
    vector[-pad_width[1]:] = 0
    return vector

# the LPF kernel
kernel = [[0.25, 0.5, 0.25], [0.5, 1, 0.5], [0.25, 0.5, 0.25]]
# enlarge the kernel to the shape of the image
kernel = np.pad(kernel, (((im1.shape[0]-3)//2,(im1.shape[0]-3)//2+1), ((im1.shape[1]-3)//2,(im1.shape[1]-3)//2+1)), padwithzeros) 

plt.figure(figsize=(15,10))
plt.gray() # show the filtered result in grayscale

freq = fp.fft2(im1)
freq_kernel = fp.fft2(fp.ifftshift(kernel))
freq_LPF = freq*freq_kernel # by the Convolution theorem
im2 = fp.ifft2(freq_LPF)
freq_im2 = fp.fft2(im2)

plt.subplot(2,3,1)
plt.imshow(im)
plt.title('Original Image', size=20)
plt.subplot(2,3,2)
plt.imshow(im1)
plt.title('Padded Image', size=20)
plt.subplot(2,3,3)
plt.imshow( (20*np.log10( 0.1 + fp.fftshift(freq))).astype(int), cmap='jet')
plt.title('Original Image Spectrum', size=20)
plt.subplot(2,3,4)
plt.imshow( (20*np.log10( 0.1 + fp.fftshift(freq_kernel))).astype(int), cmap='jet')
plt.title('Image Spectrum of the LPF', size=20)
plt.subplot(2,3,5)
plt.imshow( (20*np.log10( 0.1 + fp.fftshift(freq_im2))).astype(int), cmap='jet')
plt.title('Image Spectrum after LPF', size=20)
plt.subplot(2,3,6)
plt.imshow(im2.astype(np.uint8)) # the imaginary part is an artifact
plt.title('Output Image', size=20)

The next figure shows the output. As can be seen from the next figure, the LPF removed the high frequency components from the Fourier spectrum of the padded image and with a subsequent inverse Fourier transform  we get a decent enlarged image.

lena_LPF_fft.png

 

2. Frequency Domain Gaussian Filter

  1. Use an input image and use DFT to create the frequency 2D-array.
  2. Create a small Gaussian 2D Kernel (to be used as an LPF) in the spatial domain and pad it to enlarge it to the image dimensions.
  3. Use DFT to obtain the Gaussian Kernel in the frequency domain.
  4. Use the Convolution theorem to convolve the LPF with the input image in the frequency domain.
  5. Use IDFT to obtain the output image.
  6. Plot the frequency spectrum of the image, the gaussian kernel and the image obtained after convolution in the frequency domain, in 3D.

The following code block shows the python code:


import numpy as np
import matplotlib.pyplot as plt
from matplotlib import cm
from scipy import signal
from skimage.color import rgb2gray
from skimage.io import imread
import scipy.fftpack as fp

im = rgb2gray(imread('images/temple.jpg'))
kernel = np.outer(signal.gaussian(im.shape[0], 10), signal.gaussian(im.shape[1], 10))
freq = fp.fft2(im)
assert(freq.shape == kernel.shape)
freq_kernel = fp.fft2(fp.ifftshift(kernel))
convolved = freq*freq_kernel # by the Convolution theorem
im_blur = fp.ifft2(convolved).real
im_blur = 255 * im_blur / np.max(im_blur)

# center the frequency response
plt.imshow( (20*np.log10( 0.01 + fp.fftshift(freq_kernel))).real.astype(int), cmap='coolwarm')
plt.colorbar()
plt.show()

plt.figure(figsize=(20,20))
plt.imshow(im, cmap='gray')
plt.show()

from mpl_toolkits.mplot3d import Axes3D
from matplotlib.ticker import LinearLocator, FormatStrFormatter
# ... code for 3D visualization of the spectrums
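The 3D visualization left as a comment above can be sketched as follows (plotting the log-magnitude of the centered LPF spectrum as a surface; freq_kernel is reused from the block above):

Y = np.arange(freq_kernel.shape[0])
X = np.arange(freq_kernel.shape[1])
X, Y = np.meshgrid(X, Y)
Z = 20*np.log10(0.01 + np.abs(fp.fftshift(freq_kernel))) # log-magnitude spectrum
fig = plt.figure(figsize=(10, 10))
ax = fig.add_subplot(111, projection='3d')
surf = ax.plot_surface(X, Y, Z, cmap=cm.coolwarm, linewidth=0, antialiased=False)
ax.zaxis.set_major_locator(LinearLocator(10))
ax.zaxis.set_major_formatter(FormatStrFormatter('%.02f'))
fig.colorbar(surf, shrink=0.5, aspect=5)
plt.show()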

The original color temple image (time / spatial domain)

temple.JPG

The temple image (frequency domain)

temple_freq

The Gaussian Kernel LPF in 2D (frequency domain)

gaussian_freq2d

The Gaussian Kernel LPF (frequency domain)
gaussian_freq

The smoothed temple image with the LPF (frequency domain)

temple_freq_LPF

If we set the standard deviation of the LPF Gaussian kernel to be 10 we get the following output as shown in the next figures. As can be seen, the frequency response value drops much quicker from the center.

The smoothed temple image with the LPF with higher s.d. (frequency domain)

temple_freq_LPF2

The output image after convolution (spatial / time domain)

temple_blurred

3. Using the inverse filter to restore a motion-blurred image

  1. First create a motion blur kernel of a given shape.
  2. Convolve the kernel with an input image in the frequency domain.
  3. Get the motion-blurred image in the spatial domain with IDFT.
  4. Compute the inverse filter kernel and convolve with the blurred image in the frequency domain.
  5. Get the convolved image back in the spatial domain.
  6. Plot all the images and kernels in the frequency domain.

The following code block shows the python code:


import numpy as np
import numpy.fft as fp
import matplotlib.pylab as plt
from skimage.io import imread
from skimage.color import rgb2gray

im = rgb2gray(imread('../images/madurga.jpg'))
# create the motion blur kernel
size = 21
kernel = np.zeros((size, size))
kernel[int((size-1)/2), :] = np.ones(size)
kernel = kernel / size
# pad the kernel to the image shape (padwithzeros is defined in the up-sampling example above)
kernel = np.pad(kernel, (((im.shape[0]-size)//2,(im.shape[0]-size)//2+1), ((im.shape[1]-size)//2,(im.shape[1]-size)//2+1)), padwithzeros)

freq = fp.fft2(im)
freq_kernel = fp.fft2(fp.ifftshift(kernel))
convolved = freq*freq_kernel # blur in the frequency domain (by the Convolution theorem)
im_blur = fp.ifft2(convolved).real
im_blur = im_blur / np.max(im_blur)

epsilon = 10**-6

freq_blur = fp.fft2(im_blur)
freq_inv_kernel = 1 / (epsilon + freq_kernel) # the inverse filter kernel (epsilon avoids division by zero)

convolved = freq_blur*freq_inv_kernel
im_restored = fp.ifft2(convolved).real
im_restored = im_restored / np.max(im_restored)

plt.figure(figsize=(18,12))
plt.subplot(221)
plt.imshow(im)
plt.title('Original image', size=20)
plt.axis('off')
plt.subplot(222)
plt.imshow(im_blur)
plt.title('Blurred image with motion blur kernel', size=20)
plt.axis('off')
plt.subplot(223)
plt.imshow(im_restored)
plt.title('Restored image with inverse filter', size=20)
plt.axis('off')
plt.subplot(224)
plt.imshow(im_restored - im)
plt.title('Diff restored & original image', size=20)
plt.axis('off')
plt.show()

# Plot the surface of the frequency responses here

madurga_inverse

Frequency response of the input image

madurga_freq.png

(log) Frequency response of the motion blur kernel (LPF)

motion_blur_freq.png

Input image convolved with the motion blur kernel (frequency domain)

madurga_convolved1

(log) Frequency response of the inverse frequency filter kernel (HPF)

motion_blur_freq

Motion-blurred image convolved with the inverse frequency filter kernel (frequency domain)

madurga_convolved.png

 

4. Impact of noise on the inverse filter

  1. Add some random noise to the Lena image.
  2. Blur the image with a Gaussian kernel.
  3. Restore the image using inverse filter.

With the original image

Let’s first blur the noiseless image and then apply the inverse filter on the blurred image. The following figures show the outputs:

lena_inverse

(log) Frequency response of the input image
lena_freq.png

(log) Frequency response of the Gaussian blur kernel (LPF)
lena_gaussian_freq

(log) Frequency response of the blurred image

lena_blurred_spectrum.png

(log) Frequency response of the inverse kernel (HPF)

gauss_inverse_kernel.png

Frequency response of the output image
lena_inverse_spectrum

Adding noise to the original image

The following python code can be used to add Gaussian noise to an image:


from skimage.util import random_noise
im = random_noise(im, var=0.1)


The next figures show the noisy lena image, the blurred image with a Gaussian Kernel and the restored image with the inverse filter. As can be seen, being a high-pass filter, the inverse filter enhances the noise, typically corresponding to high frequencies.

 

lena_noisy_inverse

5. Use a notch filter to remove periodic noise from the following half-toned car image.

halftone.png

  1. Use DFT to obtain the frequency spectrum of the image.
  2. Block the frequency components that are most likely responsible for the periodic noise.
  3. Use IDFT to come back to the spatial domain.

 


import numpy as np
import matplotlib.pylab as plt
from scipy import fftpack
from skimage.io import imread

im = imread('images/halftone.png')
F1 = fftpack.fft2((im).astype(float))
F2 = fftpack.fftshift(F1)
w, h = im.shape
# zero out the spectral peaks responsible for the periodic noise (the center / DC region is skipped)
for i in range(60, w, 135):
   for j in range(100, h, 200):
     if not (i == 330 and j == 500):
        F2[i-10:i+10, j-10:j+10] = 0
for i in range(0, w, 135):
   for j in range(200, h, 200):
     if not (i == 330 and j == 500):
        F2[max(0,i-15):min(w,i+15), max(0,j-15):min(h,j+15)] = 0
plt.figure(figsize=(6.66,10))
plt.imshow( (20*np.log10( 0.1 + F2)).astype(int), cmap=plt.cm.gray)
plt.show()
im1 = fftpack.ifft2(fftpack.ifftshift(F2)).real
plt.figure(figsize=(10,10))
plt.imshow(im1, cmap='gray')
plt.axis('off')
plt.show()


Frequency response of the input image
car_freq

Frequency response of the input image with blocked frequencies with notch

car_notch_blocked

Output image

car_notch

With a low-pass-filter (LPF):

Frequency response of the input image with blocked frequencies with LPF
car_freq_notch

Output image
car_notch

Histogram Matching with color images

As described here, here is the algorithm:

  1. The cumulative histogram (CDF) is computed for each image: G for the input image to be adjusted and H for the reference (template) image, see the figure below.
  2. Any particular value (xi) in the input image data has a cumulative histogram value given by G(xi).
  3. This in turn is matched to the cumulative distribution value in the reference (template) image dataset, namely H(xj). The input data value xi is replaced by xj.

 

import numpy as np
import matplotlib.pylab as plt
from skimage.io import imread

im = imread('images/lena.jpg')
im_t = imread('images/vstyle.png')
im1 = np.zeros(im.shape).astype(np.uint8)
plt.figure(figsize=(20,10))
for i in range(3):
    c = cdf(im[...,i])
    c_t = cdf(im_t[...,i])
    im1[...,i] = hist_matching(c, c_t, im[...,i]) # implement this function with the above algorithm
    c1 = cdf(im1[...,i])
    col = 'r' if i == 0 else ('g' if i == 1 else 'b')
    plt.plot(np.arange(256), c, col + ':', label='input ' + col.upper(), linewidth=5)
    plt.plot(np.arange(256), c_t, col + '--', label='template ' + col.upper(), linewidth=5)
    plt.plot(np.arange(256), c1, col + '-', label='output ' + col.upper(), linewidth=2)
plt.title('CDF', size=20)
plt.legend(prop={'size': 15})
plt.show()

plt.figure(figsize=(10,10))
plt.imshow(im1[...,:3])
plt.axis('off')
plt.show()
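The cdf and hist_matching helpers referenced above are left to the reader; a minimal sketch of one possible implementation (for 8-bit channels) is:

def cdf(channel):
    # normalized cumulative histogram (CDF) of a single 8-bit channel
    hist, _ = np.histogram(channel.ravel(), bins=256, range=(0, 256))
    c = np.cumsum(hist).astype(float)
    return c / c[-1]

def hist_matching(c, c_t, im_channel):
    # map every input gray level x_i (with CDF value G(x_i)) to the template gray
    # level x_j whose CDF value H(x_j) is closest to G(x_i)
    pixel_values = np.arange(256)
    new_values = np.interp(c, c_t, pixel_values)
    return new_values[im_channel.ravel()].reshape(im_channel.shape).astype(np.uint8)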

Input image                                                 
lena

Template Image
vstyle

Output Image
lena_starry

The following figure shows how the histogram of the input image is matched with the histogram of the template image.

lena_starry_cdf

Another example:

Input image                                                 

me1

Template Image

treebirds

Output Image

me_t

me_hm

Mathematical Morphology

1. Automatically cropping an image

  1. Let’s use the following image. The image has an unnecessary white background outside the molecule of the organic compound.
L_2d
  2. First convert the image to a binary image and compute the convex hull of the molecule object.
  3. Use the convex hull image to find the bounding box for cropping.
  4. Crop the original image with the bounding box.

The next python code shows how to implement the above steps:


import numpy as np
import matplotlib.pylab as plt
from PIL import Image
from skimage.io import imread
from skimage.color import rgb2gray
from skimage.morphology import convex_hull_image

im = imread('../images/L_2d.jpg')
plt.imshow(im)
plt.title('input image')
plt.show()
# binarize the image (the molecule in the foreground becomes 1, the white background 0)
im1 = 1 - rgb2gray(im)
threshold = 0.5
im1[im1 < threshold] = 0
im1[im1 >= threshold] = 1
chull = convex_hull_image(im1)
plt.imshow(chull)
plt.title('convex hull in the binary image')
plt.show()
imageBox = Image.fromarray((chull*255).astype(np.uint8)).getbbox()
cropped = Image.fromarray(im).crop(imageBox)
cropped.save('L_2d_cropped.jpg')
plt.imshow(cropped)
plt.title('cropped image')
plt.show()

chem1chem2chem3

This can also be found here.

2. Opening and Closing are Dual operations in mathematical morphology

  1. Start with a binary image and apply opening operation with some structuring element (e.g., a disk) on it to obtain an output image.
  2. Invert the image (to change the foreground to background and vice versa) and apply closing operation on it with the same structuring element to obtain another output image.
  3. Invert the second output image obtained and observe that it’s the same as the first output image.
  4. Thus applying opening operation to the foreground of a binary image is equivalent to applying closing operation to the background of the same image with the same structuring element.

The next python code shows the implementation of the above steps.


import numpy as np
import matplotlib.pylab as plt
from skimage.io import imread
from skimage.color import rgb2gray
from skimage.morphology import binary_opening, binary_closing, disk
from skimage.util import invert

im = rgb2gray(imread('../new images/circles.jpg'))
# binarize the image
im[im < 0.5] = 0
im[im >= 0.5] = 1
plt.gray()
plt.figure(figsize=(20,10))
plt.subplot(131)
plt.imshow(im)
plt.title('original', size=20)
plt.axis('off')
plt.subplot(1,3,2)
im1 = binary_opening(im, disk(12))
plt.imshow(im1)
plt.title('opening with disk size ' + str(12), size=20)
plt.axis('off')
plt.subplot(1,3,3)
im1 = invert(binary_closing(invert(im), disk(12)))
plt.imshow(im1)
plt.title('closing with disk size ' + str(12), size=20)
plt.axis('off')
plt.show()

As can be seen, the output images obtained are exactly the same.

opening_closing_dual

Floyd-Steinberg Dithering (to convert a grayscale to a binary image)

The next figure shows the algorithm for error diffusion dithering.

fsa.png


import numpy as np
import matplotlib.pylab as plt
from skimage.io import imread
from skimage.color import rgb2gray

def find_closest_palette_color(oldpixel):
    # quantize to the 2-level (0 / 255) palette
    return int(round(oldpixel / 255)*255)

im = rgb2gray(imread('../my images/godess.jpg'))*255
pixel = np.copy(im)
w, h = im.shape

# quantize each pixel and diffuse the quantization error to the not-yet-processed neighbors
for x in range(w):
    for y in range(h):
        oldpixel = pixel[x][y]
        newpixel = find_closest_palette_color(oldpixel)
        pixel[x][y] = newpixel
        quant_error = oldpixel - newpixel
        if x < w-1:
          pixel[x + 1][y] = pixel[x + 1][y] + quant_error * 7 / 16
        if x > 0 and y < h-1:
           pixel[x - 1][y + 1] = pixel[x - 1][y + 1] + quant_error * 3 / 16
        if y < h-1:
           pixel[x][y + 1] = pixel[x][y + 1] + quant_error * 5 / 16
        if x < w-1 and y < h-1:
           pixel[x + 1][y + 1] = pixel[x + 1][y + 1] + quant_error * 1 / 16

plt.figure(figsize=(10,20))
plt.imshow(pixel, cmap='gray')
plt.axis('off')
plt.show()

The input image (gray-scale)

godess_gray.jpg

The output Image (binary)

godess_binary.png

The next animation shows how another input grayscale image gets converted to the output binary image using error-diffusion dithering.

swan.gif

Sharpen a color image

  1. First blur the image with an LPF (e.g., Gaussian Filter).
  2. Compute the detail image as the difference between the original and the blurred image.
  3. Now the sharpened image can be computed as a linear combination of the original image and the detail image. The next figure illustrates the concept.

f8

The next python code shows how this can be implemented in python:


from scipy import misc, ndimage
import matplotlib.pyplot as plt
import numpy as np

def rgb2gray(im):
    return np.clip(0.2989 * im[...,0] + 0.5870 * im[...,1] + 0.1140 * im[...,2], 0, 1)

im = misc.imread('../my images/me.jpg')/255
im_blurred = ndimage.gaussian_filter(im, (5,5,0))
im_detail = np.clip(im - im_blurred, 0, 1)
fig, axes = plt.subplots(nrows=2, ncols=3, sharex=True, sharey=True, figsize=(15, 15))
axes = axes.ravel()
axes[0].imshow(im)
axes[0].set_title('Original image', size=15)
axes[1].imshow(im_blurred)
axes[1].set_title('Blurred image, sigma=5', size=15)
axes[2].imshow(im_detail)
axes[2].set_title('Detail image', size=15)
alpha = [1, 5, 10]
for i in range(3):
    im_sharp = np.clip(im + alpha[i]*im_detail, 0, 1)
    axes[3+i].imshow(im_sharp)
    axes[3+i].set_title('Sharpened image, alpha=' + str(alpha[i]), size=15)
for ax in axes:
    ax.axis('off')
fig.tight_layout()
plt.show()

The next figure shows the output of the above code block. As can be seen, the output gets more sharpened as the value of alpha increases.

me_unsharp2.png

The next animation shows how the image gets more and more sharpened with increasing alpha.

me_sharp

Edge Detection with LOG and Zero-Crossing Algorithm by Marr and Hildreth

The following figure shows LOG filter and its DOG approximation.

f1.png

LOG.png

In order to detect edges as a binary image, Marr and Hildreth proposed finding the zero-crossings in the LoG-convolved image. The edge pixels can be identified by looking at the sign of the LoG-smoothed image after converting it to a binary image; the algorithm is as follows:

Algorithm to compute the zero-crossing 

  1. First convert the LOG-convolved image to a binary image, by replacing the pixel values by 1 for positive values and 0 for negative values.
  2. In order to compute the zero crossing pixels, we need to simply look at the boundaries of the non-zero regions in this binary image.
  3. Boundaries can be found by finding any non-zero pixel that has an immediate neighbor which is zero.
  4. Hence, for each pixel, if it is non-zero, consider its 8 neighbors, if any of the neighboring pixels is zero, the pixel can be identified as an edge.

The next python code and the output images / animations generated show how to detect the edges from the  zebra image with LOG + zero-crossings:


import numpy as np
from scipy import ndimage
import matplotlib.pyplot as plt
from skimage.io import imread
from skimage.color import rgb2gray

def any_neighbor_zero(img, i, j):
    for k in range(-1,2):
      for l in range(-1,2):
         if img[i+k, j+l] == 0:
            return True
    return False

def zero_crossing(img):
  img[img > 0] = 1
  img[img < 0] = 0
  out_img = np.zeros(img.shape)
  # a non-zero pixel with at least one zero neighbor is a zero-crossing (edge) pixel
  for i in range(1, img.shape[0]-1):
    for j in range(1, img.shape[1]-1):
      if img[i,j] > 0 and any_neighbor_zero(img, i, j):
        out_img[i,j] = 255
  return out_img

img = rgb2gray(imread('../images/zebras.jpg'))

fig = plt.figure(figsize=(25,15))
plt.gray() # show the filtered result in grayscale

for sigma in range(2, 10, 2):
    plt.subplot(2, 2, sigma//2)
    result = ndimage.gaussian_laplace(img, sigma=sigma)
    plt.imshow(zero_crossing(result))
    plt.axis('off')
    plt.title('LoG with zero-crossing, sigma=' + str(sigma), size=30)

plt.tight_layout()
plt.show()

Original Input Image
zebras.jpg

Output with edges detected with LOG + zero-crossing at different sigma scales
zera_log_zc.png

zebra_log

With another input image
me6.jpg

Output with edges detected with LOG + zero-crossing at different sigma scales

me6_log_zc.png

me_log.gif

Constructing the Gaussian Pyramid with scikit-image transform module’s reduce function and Laplacian Pyramid from the Gaussian Pyramid and the expand function

The Gaussian Pyramid can be computed with the following steps:

  1. Start with the original image.
  2. Iteratively compute the image at each level of the pyramid, first by smoothing the image (with a Gaussian filter) and then downsampling it.
  3. Stop at a level where the image size becomes sufficiently small (e.g., 1×1).

The Laplacian Pyramid can be computed with the following steps:

  1. Start with the Gaussian Pyramid and with the smallest image.
  2. Iteratively compute the difference image in between the image at the current level and the image obtained by first upsampling and then smoothing the image (with gaussian filter) from the previous level of the Gaussian Pyramid.
  3. Stop at a level where the image size becomes equal to the original image size.

The next python code shows how to create a Gaussian Pyramid from an image.

import numpy as np
import matplotlib.pyplot as plt
from skimage.io import imread
from skimage.color import rgb2gray
from skimage.transform import pyramid_reduce, pyramid_expand, resize

def get_gaussian_pyramid(image):
    rows, cols, dim = image.shape
    gaussian_pyramid = [image]
    while rows > 1 and cols > 1:
        image = pyramid_reduce(image, downscale=2)
        gaussian_pyramid.append(image)
        print(image.shape)
        rows //= 2
        cols //= 2
    return gaussian_pyramid

def get_laplacian_pyramid(gaussian_pyramid):
    laplacian_pyramid = [gaussian_pyramid[len(gaussian_pyramid)-1]]
    for i in range(len(gaussian_pyramid)-2, -1, -1):
        image = gaussian_pyramid[i] - resize(pyramid_expand(gaussian_pyramid[i+1]), gaussian_pyramid[i].shape)
        laplacian_pyramid.append(np.copy(image))
    laplacian_pyramid = laplacian_pyramid[::-1]
    return laplacian_pyramid

image = imread('../images/antelops.jpeg')
gaussian_pyramid = get_gaussian_pyramid(image)
laplacian_pyramid = get_laplacian_pyramid(gaussian_pyramid)

w, h = 20, 12
for i in range(3):
    plt.figure(figsize=(w,h))
    p = gaussian_pyramid[i]
    plt.imshow(p)
    plt.title(str(p.shape[0]) + 'x' + str(p.shape[1]), size=20)
    plt.axis('off')
    w, h = w / 2, h / 2
    plt.show()

w, h = 10, 6
for i in range(1,4):
    plt.figure(figsize=(w,h))
    p = laplacian_pyramid[i]
    plt.imshow(rgb2gray(p), cmap='gray')
    plt.title(str(p.shape[0]) + 'x' + str(p.shape[1]), size=20)
    plt.axis('off')
    w, h = w / 2, h / 2
    plt.show()

 

Some images from the Gaussian Pyramid

a1
a2
a3
a4

Some images from the Laplacian Pyramid

a5a6

a7

Blending images with Gaussian and Laplacian pyramids

Here is the algorithm:

f6.png

Blending the following input images A, B with mask image M

Input Image A (Goddess Durga)
madurga1

Input Image B (Lord Shiva)
mahadeb

Mask Image M
mask1

 

The following python code creates the output image I shown below:


A = imread('../images/madurga1.jpg')/255
B = imread('../images/mahadeb.jpg')/255
M = imread('../images/mask1.jpg')/255
# get the Gaussian and Laplacian pyramids, implement the functions
pyramidA = get_laplacian_pyramid(get_gaussian_pyramid(A))
pyramidB = get_laplacian_pyramid(get_gaussian_pyramid(B))
pyramidM = get_gaussian_pyramid(M)

pyramidC = []
for i in range(len(pyramidM)):
   im = pyramidM[i]*pyramidA[i] + (1-pyramidM[i])*pyramidB[i]
   pyramidC.append(im)
# implement the following function to construct an image from its laplacian pyramids
I = reconstruct_image_from_laplacian_pyramid(pyramidC)
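A minimal sketch of reconstruct_image_from_laplacian_pyramid, consistent with the pyramid ordering produced by get_laplacian_pyramid above (largest level first, the smallest Gaussian image last), could be:

def reconstruct_image_from_laplacian_pyramid(pyramid):
    # start from the smallest image and repeatedly expand it, adding back the detail levels
    image = pyramid[-1]
    for i in range(len(pyramid)-2, -1, -1):
        image = resize(pyramid_expand(image), pyramid[i].shape) + pyramid[i]
    return np.clip(image, 0, 1)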

Output Image I (Ardhanarishwar)
ardhanariswara_output.png

The following animation shows how the output image is formed:

ardhanariswar

 

ardhanariswar1

Another blending (horror!) example (from prof. dmartin)

horrorhorror1

 

Detection of a Human Object with HOG Descriptor Features using SVM (Primal QuadProg implementation using CVXOPT) in Python

In this article, first how to extract HOG descriptor features from an image will be discussed. Then it will be described how a support vector machine binary classifier can be trained on a dataset containing labeled images (using the extracted HOG descriptor features) and later how the SVM model can be used (along with a sliding window) to predict whether or not a human object exists in a test image. How SVM can be represented as a primal quadratic programming problem and solved with CVXOPT will also be discussed. This problem appeared as an assignment problem in this Computer Vision course from UCF.

Problem 1: Compute HOG features

Let’s first implement Histograms of Oriented Gradients (HOG). The dataset to be used is the INRIA Person Dataset from here. The dataset consists of positive and negative examples for training as well as testing images. Let us do the following:

i. Take 2003 positive training images of size 96×160
ii. Take 997 negative training images of size 96×160
iii. Compute HOG for positive and negative examples.
iv. Show the visualization of HOG for some positive and negative examples.

Histograms of Oriented Gradients for Human Detection (HOG) is a very heavily cited paper by N. Dalal and B. Triggs from CVPR 2005. The following figure shows how the algorithm proposed by them can be used to compute the HOG features for a 96×160 image:

f1.png

The next python code snippet shows some helper functions to compute the hog features:


import numpy as np
from scipy import signal
import scipy.misc

def s_x(img):
    kernel = np.array([[-1, 0, 1]])
    imgx = signal.convolve2d(img, kernel, boundary='symm', mode='same')
    return imgx
def s_y(img):
    kernel = np.array([[-1, 0, 1]]).T
    imgy = signal.convolve2d(img, kernel, boundary='symm', mode='same')
    return imgy

def grad(img):
    imgx = s_x(img)
    imgy = s_y(img)
    s = np.sqrt(imgx**2 + imgy**2)
    theta = np.arctan2(imgx, imgy) # note: the conventional definition uses np.arctan2(imgy, imgx)
    theta[theta<0] = np.pi + theta[theta<0]
    return (s, theta)
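Building on the gradient helpers above, a minimal sketch of the per-cell orientation-histogram step (8×8 cells and 9 orientation bins, without the block-normalization step of the full algorithm) could look like this:

def hog_cells(img, cell_size=8, n_bins=9):
    # accumulate gradient magnitudes into orientation bins for every cell
    s, theta = grad(img)
    h, w = img.shape
    hists = np.zeros((h // cell_size, w // cell_size, n_bins))
    bins = np.minimum((theta / np.pi * n_bins).astype(int), n_bins - 1)
    for i in range(h // cell_size):
        for j in range(w // cell_size):
            cell = (slice(i*cell_size, (i+1)*cell_size), slice(j*cell_size, (j+1)*cell_size))
            for b in range(n_bins):
                hists[i, j, b] = s[cell][bins[cell] == b].sum()
    return hists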

 

The following figures / animations show some positive and negative training examples along with the HOG features computed using the algorithm.

 

Positive Example 1

grad_phog_p
The next animation shows how the HOG features are computed using the above algorithm.

p.gif

Positive Example 2

grad_p1
hog_p1

The next animation shows how the HOG features are computed using the above algorithm.

p1

p1_.gif

Positive Example 3

grad_me
hog_me

The next animation shows how the HOG features are computed using the above algorithm.
me

Negative Example 1

grad_c.png
hog_c.png

The next animation shows how the HOG features are computed using the above algorithm.

c

Problem 2: Use sklearn’s SVC and 80-20 validation to compute accuracy on the held-out training images dataset using the extracted HOG features.

Before implementing SVC on our own with primal quadratic programming solver, let’s use the scikit-learn SVC implementation (with linear kernel) to train a support vector classifier on the training positive and negative examples using the HOG features extracted  from the training images with 80-20 validation and compute accuracy of classification on the held-out images.

The following python code does exactly that, with the X matrix containing the 1620 HOG features extracted from each image and y containing the corresponding labels (pos / neg, depending on whether a human is present or not); it achieves 98.5% accuracy on the held-out dataset.


import time
from sklearn.metrics import accuracy_score
from sklearn.model_selection import train_test_split # sklearn.cross_validation in older versions
from sklearn.svm import SVC
Xtrain, Xtest, ytrain, ytest = train_test_split(X, y, train_size=0.8, random_state=123)
timestamp1 = time.time()
clf = SVC(C=1, kernel='linear')
clf.fit(Xtrain, ytrain)
print("%d support vectors out of %d points" % (len(clf.support_vectors_), len(Xtrain)))
timestamp2 = time.time()
print("sklearn LinearSVC took %.2f seconds" % (timestamp2 - timestamp1))
ypred = clf.predict(Xtest)
print('accuracy', accuracy_score(ytest, ypred))

430 support vectors out of 2400 points
sklearn LinearSVC took 3.40 seconds
accuracy 0.985

The next figures show the confusion matrices for the prediction on the held-out dataset with the SVC model learnt.

f2

f3

 

Problem 3: Implement SVM by solving the Primal form of the problem using Quadratic Programming

Let’s implement Support Vector Machine (SVM) using Quadratic Programming. We shall use python’s CVXOPT package for this purpose. Let’s do the following:

i. Try to understand each input term in cvxopt.solvers.qp.
ii. Formulate the soft-margin primal SVM in terms of the inputs of cvxopt.solvers.qp.
iii. Show the ‘P’, ‘q’, ‘G’, ‘h’, ‘A’ and ‘b’ matrices.
iv. Obtain the parameter vector ‘w’ and the bias term ‘b’ using cvxopt.solvers.qp.

 

To be done
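Until this part is filled in, here is a minimal sketch (under the assumption of a standard soft-margin primal formulation, not necessarily the assignment's exact setup) of how the QP could be passed to cvxopt.solvers.qp, with the variable vector z = [w; b; ξ]:

import numpy as np
from cvxopt import matrix, solvers

def svm_primal_qp(X, y, C=1.0):
    # X: (n, d) feature matrix, y: (n,) labels in {-1, +1}
    n, d = X.shape
    y = y.astype(np.double)
    # variables z = [w (d), b (1), xi (n)]; objective (1/2)||w||^2 + C * sum(xi)
    P = np.zeros((d + 1 + n, d + 1 + n))
    P[:d, :d] = np.eye(d)
    q = np.hstack([np.zeros(d + 1), C * np.ones(n)])
    # constraints: -y_i (w.x_i + b) - xi_i <= -1   and   -xi_i <= 0
    G1 = np.hstack([-y[:, None] * X, -y[:, None], -np.eye(n)])
    G2 = np.hstack([np.zeros((n, d + 1)), -np.eye(n)])
    G, h = np.vstack([G1, G2]), np.hstack([-np.ones(n), np.zeros(n)])
    sol = solvers.qp(matrix(P), matrix(q), matrix(G), matrix(h))
    z = np.ravel(sol['x'])
    return z[:d], z[d]   # the weight vector w and the bias b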

 

Problem 4: Detect Human in testing images using trained model (‘w’, ‘b’) from the last problem

Let’s use the coefficients learnt by the SVM model from the training dataset and do the following:

i. Take at least 5 testing images from Test/pos.
ii. Test the trained model over testing images. Testing can be performed using
w*feature vector + b.
iii. Use sliding window approach to obtain detection at each location in the image.
iv. Perform non-maximal suppression and choose the highest scored location.
v. Display the bounding box at the final detection.

 

To be done

 

 

 

Implementing Lucas-Kanade Optical Flow algorithm in Python

In this article an implementation of the Lucas-Kanade optical flow algorithm is going to be described. This problem appeared as an assignment in this computer vision course from UCSD. The inputs will be sequences of images (subsequent frames from a video) and the algorithm will output an optical flow field (u, v) and trace the motion of the moving objects. The problem description is taken from the assignment itself.

 


Problem Statement



Single-Scale Optical Flow

  • Let’s implement the single-scale Lucas-Kanade optical flow algorithm. This involves finding the motion (u, v) that minimizes the sum-squared error of the brightness constancy equations for each pixel in a window.  The algorithm will be implemented as a function with the following inputs:

     def optical_flow(I1, I2, window_size, tau) # returns (u, v)

  • Here, u and v are the x and y components of the optical flow, I1 and I2 are two images taken at times t = 1 and t = 2 respectively, and window_size is a 1 × 2 vector storing the width and height of the window used during flow computation.
  • In addition to these inputs, a threshold τ should be added, such that if τ is larger than the smallest eigenvalue of A'A, then the optical flow at that position should not be computed. Recall that the optical flow is only valid in regions where

f18.png
has rank 2, which is what the threshold is checking. A typical value for τ is 0.01.

  • We should try experimenting with different window sizes and find out the tradeoffs associated with using a small vs. a large window size.
  • The following figure describes the algorithm, which considers a nxn (n>=3) window around each pixel and solves a least-square problem to find the best flow vectors for the pixel.

f19.png

  • The following code-snippet shows how the algorithm is implemented in python for a gray-level image.
import numpy as np
from scipy import signal
def optical_flow(I1g, I2g, window_size, tau=1e-2):

    kernel_x = np.array([[-1., 1.], [-1., 1.]])
    kernel_y = np.array([[-1., -1.], [1., 1.]])
    kernel_t = np.array([[1., 1.], [1., 1.]])#*.25
    w = window_size//2 # window_size is odd, all the pixels with an offset in [-w, w] are inside the window
    I1g = I1g / 255. # normalize pixels
    I2g = I2g / 255. # normalize pixels
    # Implement Lucas Kanade
    # for each point, calculate I_x, I_y, I_t
    mode = 'same'
    fx = signal.convolve2d(I1g, kernel_x, boundary='symm', mode=mode)
    fy = signal.convolve2d(I1g, kernel_y, boundary='symm', mode=mode)
    ft = (signal.convolve2d(I2g, kernel_t, boundary='symm', mode=mode) +
          signal.convolve2d(I1g, -kernel_t, boundary='symm', mode=mode))
    u = np.zeros(I1g.shape)
    v = np.zeros(I1g.shape)
    # within window window_size * window_size
    for i in range(w, I1g.shape[0]-w):
        for j in range(w, I1g.shape[1]-w):
            Ix = fx[i-w:i+w+1, j-w:j+w+1].flatten()
            Iy = fy[i-w:i+w+1, j-w:j+w+1].flatten()
            It = ft[i-w:i+w+1, j-w:j+w+1].flatten()
            #b = ... # get b here
            #A = ... # get A here
            # if threshold τ is larger than the smallest eigenvalue of A'A:
            nu = ... # get velocity here
            u[i,j]=nu[0]
            v[i,j]=nu[1]

    return (u,v)
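One possible way to fill in the placeholders inside the double loop above (A, b and the thresholding on the smallest eigenvalue of A'A) is sketched below:

# inside the double loop over (i, j), with Ix, Iy, It as computed above:
A = np.vstack((Ix, Iy)).T                    # N x 2 matrix of spatial gradients
b = -It                                      # N-vector of (negated) temporal gradients
AtA = A.T @ A
if np.min(np.linalg.eigvalsh(AtA)) >= tau:   # skip unreliable (rank-deficient) windows
    nu = np.linalg.pinv(AtA) @ (A.T @ b)     # least-squares flow vector (u, v)
    u[i, j], v[i, j] = nu[0], nu[1]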

 


Some Results


  • The following figures and animations show the results of the algorithm on a few image sequences. Some of these input image sequences / videos are from the course and some are collected from the internet.
  • As can be seen, the algorithm performs best if the motion of the moving object(s) between consecutive frames is slow. On the contrary, if the motion is large, the algorithm fails and we should implement / use the multi-scale version of Lucas-Kanade with image pyramids.
  • Finally, with a small window size the algorithm captures subtle motions but not large motions; with a large window size it happens the other way around.


Input Sequences

sphere

shpere_cmap_15

Output Optical Flow with different window sizes

window size = 15

shpere_opt_15

window size = 21

shpere_opt_21

 



Input Sequences
rubic

Output Optical Flow
rubic_opt

rubic_cmap



Input Sequences (hamburg taxi)
taxi

taxi_cmap

Output Optical Flow
taxi_opt

 


Input Sequences
box

box_cmap

Output Optical Flow
box_opt


Input Sequences
seq

seq_cmap

Output Optical Flow
seq_opt


Input Sequences    fount3.gif

fount_cmap

Output Optical Flow
fount_opt


Input Sequences
corridor

Output Optical Flow
corridor_optc


Input Sequences
synth

synth_cmap

Output Optical Flow
synth_opt


Input Sequences
cars1

Output Optical Flow
cars1_opt
cars1_cmap


Input Sequences
cars2

Output Optical Flow
cars2_opt

Output Optical Flow
cars2_opt2
cars2_cmap



Input Sequences

carsh.gif

cars3_cmap

Output Optical Flow with window size 45
cars3_opt.gif

Output Optical Flow with window size 10

cars3_opt2_10
Output Optical Flow with window size 25
cars3_opt2_25
Output Optical Flow with window size 45
cars3_opt2_45


 

Efficient Graph-Based Image Segmentation in Python

In this article, an implementation of an efficient graph-based image segmentation technique will be described; this algorithm was proposed by Felzenszwalb et al. from MIT in this paper. The slides on this paper can be found from this link from the Stanford Vision Lab too. The algorithm is closely related to Kruskal's algorithm for constructing a minimum spanning tree of a graph, as stated by the authors, and hence can be implemented to run in O(m log m) time, where m is the number of edges in the graph.


Problem Definition and the basic idea (from the paper)



  • Let G = (V, E) be an undirected graph with vertices vi ∈ V, the set of elements to be segmented, and edges (vi, vj ) ∈ E corresponding to pairs of neighboring vertices. Each edge (vi, vj ) ∈ E has a corresponding weight w((vi, vj)), which is a non-negative measure of the dissimilarity between neighboring elements vi and vj.

  • In the case of image segmentation, the elements in V are pixels and the weight of an edge is some measure of the dissimilarity between the two pixels connected by that edge (e.g., the difference in intensity, color, motion, location or some other local attribute).


  • Particularly for the implementation described here, an edge weight function based on the absolute intensity difference (in the YIQ space) between the pixels connected by an edge is used: w((vi, vj)) = |I(pi) − I(pj)|.


  • In the graph-based approach, a segmentation S is a partition of V into components
    such that each component (or region) C ∈ S corresponds to a connected component
    in a graph G0 = (V, E0), where E0 ⊆ E.


  • In other words, any segmentation is induced by a subset of the edges in E. There are different ways to measure the quality of a segmentation but in general we want the elements in a component to be similar, and elements in different components to be dissimilar.


  • This means that edges between two vertices in the same component should have relatively low weights, and edges between vertices in different components should have higher weights.



  • The next figure shows the steps in the algorithm. The algorithm is very similar to Kruskal’s algorithm for computing the MST for an undirected graph.


f17.png


  • The threshold function τ controls the degree to which the difference between two
    components must be greater than their internal differences in order for there to be
    evidence of a boundary between them.


  • For small components, Int(C) is not a good estimate of the local characteristics of the data. In the extreme case, when |C| = 1, Int(C) = 0. Therefore, a threshold function based on the size of the component, τ(C) = k/|C|, needs to be used, where |C| denotes the size of C, and k is some constant parameter.


  • That is, for small components we require stronger evidence for a boundary. In practice k sets a scale of observation, in that a larger k causes a preference for larger components.


  • In general, a Gaussian filter is used to smooth the image slightly before computing the edge weights, in order to compensate for digitization artifacts. We always use a Gaussian with σ = 0.8, which does not produce any visible change to the image but helps remove artifacts.


  • The following python code shows how to create the graph.


import numpy as np
from scipy import signal
import matplotlib.image as mpimg
from math import exp, sqrt, pi
from skimage.color import rgb2yiq

def gaussian_kernel(k, s = 0.5):
    # generate a (2k+1)x(2k+1) gaussian kernel with mean=0 and sigma = s
    probs = [exp(-z*z/(2*s*s))/sqrt(2*pi*s*s) for z in range(-k,k+1)]
    return np.outer(probs, probs)

def create_graph(imfile, k=1., sigma=0.8, sz=1):
    # create the pixel graph with edge weights as dissimilarities
    rgb = mpimg.imread(imfile)[:,:,:3]
    gauss_kernel = gaussian_kernel(sz, sigma)
    for i in range(3):
        rgb[:,:,i] = signal.convolve2d(rgb[:,:,i], gauss_kernel, boundary='symm', mode='same')
    yuv = rgb2yiq(rgb)
    (w, h) = yuv.shape[:2]
    edges = {}
    for i in range(yuv.shape[0]):
        for j in range(yuv.shape[1]):
            # compute edge weights for the neighboring pixel nodes of the node (i,j)
            for i1 in range(i-1, i+2):
                for j1 in range(j-1, j+2):
                    if i1 == i and j1 == j: continue
                    if i1 >= 0 and i1 < w and j1 >= 0 and j1 < h:
                        wt = np.abs(yuv[i,j,0]-yuv[i1,j1,0])
                        n1, n2 = ij2id(i,j,w,h), ij2id(i1,j1,w,h)
                        edges[n1, n2] = edges[n2, n1] = wt
    return edges
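The rgb2yiq conversion can be taken directly from skimage.color (as imported above), while the ij2id helper simply maps a pixel's (row, column) coordinates to a single integer node id; a minimal sketch (with its inverse) is:

def ij2id(i, j, w, h):
    # flatten the (i, j) pixel coordinates into a single graph node id
    return i*h + j

def id2ij(node_id, w, h):
    # the inverse mapping, useful when painting the segmentation back onto the image
    return node_id // h, node_id % h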

 


Some Results


  • The images are taken from the paper itself or from the internet. The following figures and animations show the result of segmentation as a result of iterative merging of the components (by choosing least weight edges), depending on the internal difference of the components.

  • Although in the paper the authors described the best value of the parameter k to be around 300, since in this implementation the pixel RGB values are normalized (to have values in between 0 and 1) and then converted to YIQ values, and the YIQ intensities are used for computing the weights (which are typically very small), the value of k that works best in this scenario is 0.001-0.01.

  • As we can see from the results below, the higher the value of the parameter k, the larger the size of the final components and the smaller the number of components in the result.

  • The minimum spanning tree creation is also shown, the red edges shown in the figures are the edges chosen by the algorithm to merge the components.


Input Image



            player.png

Output Images for two different values of the parameter k

player_k_0.001.gif

player_k_0.01.gif

out_ 0.001player.png

out_ 0.010player.png


The forest created after a few iterations


mst1


Input Image


hill

Output Images for two different values of the parameter k

hill_k_0.01

out_ 0.010hill.png

hill_k_0.001

out_ 0.001hill.png


The forest created after a few iterations


mst3.png


Input Image


parrot

Output Segmented Images

parrot_k_0.001out_ 0.001parrot_Dark2.pngout_ 0.001parrot_hot.png



Input Image


road2

Segmented Output Images

road2_k_0.01
out_ 0.001road2_Set1.png



Input Image


road


Output Images for two different values of the parameter k


road_k_0.001

out_ 0.001road.png
road_k_0.01out_ 0.010road.png


The forest created after a few iterations


mst2.png

 


  Input Image (UMBC)


             umbc.png


Segmented Output Images


umbc_k_0.001out_ 0.001umbc.png


    Input Image


nj.png



Segmented Output Image with k=0.001



out_ 0.001nj.png


    Input Image


hand


Segmented Output Image with k=0.001


ring_k_0.001



    Input Image


flowers2

Segmented Output Image with k=0.001

out_ 0.001flowers2.png


    Input Image (liver)


liver

Segmented Output Images 

out_ 0.001liver.pngout_ 0.010liver.png



    Input Image


building

Segmented Output Images with different values of k

building_k_0.01

out_ 0.001building.pngout_ 0.010building.png


    Input Image


frame056

Segmented Output Image

out_ 0.001frame056.png


    Input Image


vic

Segmented Output Images for different k

vic


 

Interactive Image Segmentation with Graph-Cut in Python

In this article, interactive image segmentation with graph-cut is going to be discussed, and it will be used to segment the source object from the background in an image. This segmentation technique was proposed by Boykov and Jolly in this paper. This problem appeared as a homework assignment here, and also in this lecture video from the Coursera image processing course by Duke University.

Problem Statement: Interactive graph-cut segmentation

Let’s implement an “intelligent paint” interactive segmentation tool using the graph-cut algorithm on a weighted image grid. Our task will be to separate the foreground object from the background in an image.

Since it can sometimes be difficult to automatically define what’s foreground and what’s background in an image, the user is going to help us with a few interactive scribble lines, from which our algorithm will identify the foreground and the background; after that, it will be the algorithm’s job to obtain a complete segmentation of the foreground from the background.

The following figures show how an input image gets scribbled by a user with two different colors (which is also going to be an input to our algorithm) and the ideal segmented output image.

Scribbled Input Image                                       Expected Segmented Output Image       bj

 


The Graph-Cut Algorithm


 

The following describes how the segmentation problem is transformed into a graph-cut problem:


  1. Let’s first define the Directed Graph G = (V, E) as follows:

    1. Each of the pixels in the image is going to be a vertex in the graph. There will be another couple of special terminal vertices: a source vertex (corresponds to the foreground object) and a sink vertex (corresponds to the background object in the image). Hence, |V(G)| = width x height + 2.

    2. Next, let’s define the edges of the graph. Obviously, there are going to be two types of edges: terminal (edges that connect the terminal nodes to the non-terminal nodes) and non-terminal (edges that connect the non-terminal nodes only).



    3. There will be a directed edge from the terminal source node to each of the non-terminal nodes in the graph. Similarly, there will be a directed edge from each non-terminal node (pixel) to the other terminal node, the sink. These are going to be all the terminal edges and hence, |E_T(G)| = 2 x width x height.


    4. Each of the non-terminal nodes (pixels) is going to be connected by edges to the nodes corresponding to its neighboring pixels (defined by the 4- or 8-neighborhood of a pixel). Hence, |E_N(G)| = |Nbd| x width x height.



  2. Now let’s describe how to compute the edge weights in this graph.



    1. In order to compute the terminal edge weights, we need to estimate the feature distributions first. Starting with the assumption that each of the nodes corresponding to the scribbled pixels belongs to the foreground or the background object (distinguished by the scribble color, e.g.) with probability 1.0 (since we want the solution to respect the regional hard constraints marked by the user seeds / scribbles), we have to compute the probability of belonging to the foreground (or background) for all the other non-terminal nodes.

    2. The simplest way to compute P_F and P_B is to first fit a couple of Gaussian distributions on the scribbles by computing the parameters (μ, Σ) with MLE from the scribbled pixel intensities, and then to compute the (class-conditional) probabilities from the individual pdfs (followed by a normalization) for each of the pixels, as shown in the next figures. The following code fragment shows how the pdfs are computed.


      import numpy as np
      import matplotlib.image as mpimg
      from collections import defaultdict
      
      def compute_pdfs(imfile, imfile_scrib):
          '''
          Compute the foreground and background pdfs from the
          input image and the image with user scribbles
          '''
          rgb = mpimg.imread(imfile)[:,:,:3]
          yuv = rgb2yiq(rgb)
          rgb_s = mpimg.imread(imfile_scrib)[:,:,:3]
          yuv_s = rgb2yiq(rgb_s)
          # find the scribble pixels (helper assumed to be defined elsewhere)
          scribbles = find_marked_locations(rgb, rgb_s)
          imageo = np.zeros(yuv.shape)
          # separately store background and foreground scribble pixels in the dictionary comps
          comps = defaultdict(lambda: np.array([]).reshape(0,3))
          for (i, j) in scribbles:
              imageo[i,j,:] = rgb_s[i,j,:]
              # scribble color as key of comps
              comps[tuple(imageo[i,j,:])] = np.vstack([comps[tuple(imageo[i,j,:])], yuv[i,j,:]])
          # compute the MLE parameters for the Gaussians
          mu, Sigma = {}, {}
          for c in comps:
              mu[c] = np.mean(comps[c], axis=0)
              Sigma[c] = np.cov(comps[c].T)
          return (mu, Sigma)
      


      3. In order to compute the non-terminal edge weights, we need to compute the similarities between a pixel node and the nodes corresponding to its neighboring pixels, e.g., with the formula shown in the next figures (measuring how similar the neighboring pixels are in RGB / YIQ space).


  3. Now that the underlying graph is defined, the segmentation of the foreground from the background image boils down to computing the min-cut in the graph or equivalently computing the max-flow (the dual problem) from the source to sink.


  4. The intuition is that the min-cut solution will keep the pixels with high probabilities to belong to the side of the source (foreground) node and likewise the background pixels on the other side of the cut near the sink (background) node, since it’s going to respect the (relatively) high-weight edges (by not going through the highly-similar pixels).


  5. There are several standard algorithms to solve the max-flow problem, e.g., Ford-Fulkerson (finding an augmenting path, with O(E max|f|) time complexity) or Edmonds-Karp (using BFS to find the shortest augmenting path, with O(VE^2) time complexity); typical implementations of these algorithms run pretty fast, in time polynomial in V and E. Here we are going to use a different implementation (with pymaxflow), based on the algorithm by Boykov and Kolmogorov, which is shown empirically to run faster on many images in this paper. A minimal sketch of the graph construction and min-cut computation with pymaxflow is shown below.
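
The following is a minimal sketch (not the exact implementation used for the results below) of how the flow graph could be built and the min-cut computed with the pymaxflow library. The arrays P_F and P_B (per-pixel foreground / background probabilities obtained from the fitted Gaussians) and the pairwise_weight similarity function are assumptions introduced here for illustration, as is the use of negative log-probabilities for the terminal capacities.

import numpy as np
import maxflow

def graph_cut_segment(P_F, P_B, pairwise_weight):
    # P_F, P_B: (h, w) arrays with per-pixel foreground / background probabilities
    # pairwise_weight((i,j), (i1,j1)): similarity weight between two neighboring pixels
    # returns a boolean (h, w) mask, True for foreground
    h, w = P_F.shape
    g = maxflow.Graph[float]()
    nodeids = g.add_grid_nodes((h, w))
    # terminal edges: capacities to the source (foreground) and sink (background)
    # terminals, here taken as negative log-probabilities
    eps = 1e-10
    g.add_grid_tedges(nodeids, -np.log(P_B + eps), -np.log(P_F + eps))
    # non-terminal edges: 4-neighborhood, weight = similarity between the two pixels
    for i in range(h):
        for j in range(w):
            for (i1, j1) in [(i+1, j), (i, j+1)]:
                if i1 < h and j1 < w:
                    wt = pairwise_weight((i, j), (i1, j1))
                    g.add_edge(nodeids[i, j], nodeids[i1, j1], wt, wt)
    g.maxflow()
    # get_grid_segments returns True for the nodes on the sink side of the cut;
    # with the capacities above, the source side corresponds to the foreground
    return ~g.get_grid_segments(nodeids)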


 

f13

f15f14

f16


Results


 

The following figures / animations show the interactive-segmentation results (computed probability densities, subset of the flow-graph & min-cut, final segmented image) on a few images,  some of them taken from the above-mentioned courses / videos, some of them taken from Berkeley Vision dataset.


Input Imagedeer

Input Image with Scribblesdeer_scribed

Fitted Densities from Color Scribblesgaussian_deer.png

maxflow_deerdeer_seg

A Tiny Sub-graph with Min-Cut

flow_deer.png


Input Image
eagle

Input Image with Scribbleseagle_scribed

Fitted Densities from Color Scribblesgaussian_eagle

maxflow_eagle

eagle.gif

A Tiny Sub-graph with Min-Cut flow_eagle


Input Image (liver)                                             
liver

Input Image with Scribblesliver_scribed

Fitted Densities from Color Scribbles
gaussian_liver

maxflow_liver



Input Imagefishes2

Input Image with Scribbles
fishes2_scribed

Fitted Densities from Color Scribblesgaussian_fishes2

maxflow_fishes2

A Tiny Sub-graph of the flow-graph with Min-Cut
flow_fishes2


Input Image
panda

Input Image with Scribblespanda_scribed

maxflow_panda


Input Imagebunny

Input Image with Scribblesbunny_scribed

Fitted Densities from Color Scribbles
gaussian_bunny

maxflow_bunny

seg_bunny

A Tiny Sub-graph of the flow-graph with Min-Cut
flow


Input Image
flowers

Input Image with Scribbles
flowers_scribedmaxflow_flowers

A Tiny Sub-graph of the flow-graph with Min-Cut

flow


Input Image                                                         Input Image with Scribbles
me3    me3_scribed

maxflow_me3


Input Imageinput1

Input Image with Scribblesinput1_scribed

maxflow_input1

A Tiny Sub-graph of the flow-graph with Min-Cut

flow.png


Input Imagerose

Input Image with Scribbles
rose_scribed

Fitted Densities from Color Scribbles
gaussian_rose

maxflow_rose



Input Image
whale

Input Image with Scribbleswhale_scribed

Fitted Densities from Color Scribblesgaussian_whale

A Tiny Sub-graph of the flow-graph with Min-Cut

flow_whalemaxflow_whalebgc_whale


Input Image (UMBC Campus Map)umbc_campus

Input Image with Scribbles
umbc_campus_scribed

maxflow_umbc_campus


Input Imagehorses

Input Image with Scribbles
horses_scribed

maxflow_horses

A Tiny Sub-graph of the flow-graph with Min-Cut (with blue foreground nodes)

flow_horses


Changing the background of an image (obtained using graph-cut segmentation) with another image’s background with cut & paste


The following figures / animation show how the background of a given image can be replaced by a new image using cut & paste (by replacing the corresponding pixels in the new image corresponding to foreground), once the foreground in the original image gets identified after segmentation.

Original Input Imagehorses

New Backgroundsky.png

bg_1.0000horses.png

bgc_horses


 

Image Colorization Using Optimization in Python

 

This article is inspired by this SIGGRAPH paper by Levin et al. (for which they took this patent); the paper was also referred to in the course CS1114 from Cornell. This method is also discussed in the coursera online image processing course by Northwestern University. Some part of the problem description is taken from the paper itself. Also, one can refer to the implementation provided by the authors in matlab, the following link and the following python implementation in github.

 

The Problem

Colorization is a computer-assisted process of adding color to a monochrome image or movie. In the paper the authors presented an optimization-based colorization method that is based on a simple premise: neighboring pixels in space-time that have similar intensities should have similar colors.

This premise is formulated as an optimization problem using a quadratic cost function. In this approach an artist only needs to annotate the image with a few color scribbles, and the indicated colors are automatically propagated in both space and time to produce a fully colorized image or sequence.

In this article the optimization problem formulation and the way to solve it to obtain the automatically colored image will be described for the images only.

 

The Algorithm

We choose to work in the YUV/YIQ color space, which is commonly used in video, where Y is the monochromatic luminance channel (which we will refer to simply as intensity), while U and V are the chrominance channels, encoding the color.

The algorithm is given as input an intensity volume Y(x,y,t) and outputs two color volumes U(x,y,t) and V(x,y,t). To simplify notation, boldface letters (e.g., r, s) are used to denote (x,y,t) triplets. Thus, Y(r) is the intensity of a particular pixel.

Now, one needs to impose the constraint that two neighboring pixels r, s should have similar colors if their intensities are similar. Thus, the problem is formulated as the following optimization problem, which aims to minimize the difference between the color U(r) at pixel r and the weighted average of the colors at the neighboring pixels, where w(r,s) is a weighting function that sums to one, large when Y(r) is similar to Y(s), and small when the two intensities are different.

f12

When the intensity is constant the color should be constant, and when the intensity is an edge the color should also be an edge. Since the cost functions are quadratic and
the constraints are linear, this optimization problem yields a large, sparse system of linear equations, which may be solved using a number of standard methods.

As discussed in the paper, this algorithm is closely related to algorithms proposed for other tasks in image processing. In image segmentation algorithms based on normalized cuts [Shi and Malik 1997], one attempts to find the second smallest eigenvector of the matrix D − W where W is a npixels×npixels matrix whose elements are the pairwise affinities between pixels (i.e., the r,s entry of the matrix is w(r,s)) and D is a diagonal matrix whose diagonal elements are the sum of the affinities (in this case it is always 1). The second smallest eigenvector of any symmetric matrix A is a unit norm vector x that minimizes x^{T}Ax and is orthogonal to the first eigenvector. By direct  inspection, the quadratic form minimized by normalized cuts is exactly the cost function J, that is x^{T}(D-W)x =J(x). Thus, this algorithm minimizes the same cost function but under different constraints. In image denoising algorithms based on anisotropic diffusion [Perona and Malik 1989; Tang et al. 2001] one often minimizes a  function
similar to equation 1, but the function is applied to the image intensity as well.

The following figures show an original gray-scale image and the marked image with color-scribbles that are going to be used to compute the output colored image.

Original Gray-scale Image Input
baby

Gray-scale image Input Marked with Color-Scribbles 
baby_marked

Implementation of the Algorithm

Here are the steps of the algorithm:

  1. Convert both the original gray-scale image and the marked image (marked with color scribbles for a few pixels by the artist) from RGB to the YUV / YIQ color space.
  2. Compute the difference image from the marked and the gray-scale image. The pixels that differ are going to be pinned and will appear unchanged in the output; they are directly used as stand-alone constraints in the minimization problem.
  3. We need to compute the weight matrix W that depends on the similarities of the neighboring intensities of each pixel in the original gray-scale image.
  4. The optimization problem finally boils down to solving a system of linear equations of the form WU = b, which has the closed-form least-square solution
    U = W^{+}b = {(W^{T}W)}^{-1}W^{T}b. The same is to be repeated for the V channel too.
  5. However, the W matrix is going to be very huge and sparse, hence sparse-matrix based implementations must be used to obtain an efficient solution. In this python implementation in github, the scipy sparse lil_matrix was used when constructing the sparse matrices, which is quite slow; we can construct the more efficient scipy csc matrix right away, by using a dictionary to store the weights initially, which is much faster. The python code below shows my implementation for computing the weight matrix W.
  6. Once W is computed, it’s just a matter of obtaining the least-square solution by computing the pseudo-inverse, which can be computed more efficiently with LU factorization and a sparse LU solver, as in this python implementation in github.
  7. Once the solution of the optimization problem is obtained in the YUV / YIQ space, it needs to be converted back to RGB. The following formula is used for the conversion.

rgbyiq.png

import numpy as np
import scipy.sparse as sp
from collections import defaultdict

def compute_W(Y, colored):

    (w, h) = Y.shape
    W = defaultdict()
    for i in range(w):
        for j in range(h):
            if not (i, j) in colored: # pixel (i,j) is not in a color scribble
                (N, sigma) = get_nbrs(Y, i, j, w, h) # r = (i, j), helper assumed defined elsewhere
                Z = 0.
                id1 = ij2id(i, j, w, h) # linearized id
                for (i1, j1) in N: # s = (i1, j1)
                    id2 = ij2id(i1, j1, w, h)
                    W[id1, id2] = np.exp(-(Y[i,j]-Y[i1,j1])**2/(sigma**2)) if sigma > 0 else 0.
                    Z += W[id1, id2]
                if Z > 0:
                    for (i1, j1) in N: # s = (i1, j1)
                        id2 = ij2id(i1, j1, w, h)
                        W[id1, id2] /= -Z
    for i in range(w):
        for j in range(h):
            id = ij2id(i, j, w, h)
            W[id, id] = 1.

    rows, cols = zip(*(W.keys())) # keys
    data = list(W.values())
    return sp.csc_matrix((data, (rows, cols)), shape=(w*h, w*h)) # W
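
Once W is available, each chrominance channel can be recovered with a direct sparse solve (W is square, so an explicit pseudo-inverse is unnecessary). The following is a minimal sketch assuming a dictionary pinned_values holding the U (or V) values of the scribbled pixels, the set colored of scribbled locations, and the same ij2id linearization helper used above.

import numpy as np
from scipy.sparse.linalg import spsolve

def colorize_channel(W, pinned_values, colored, w, h):
    # solve W u = b for one chrominance channel (U or V); the right-hand side b is
    # zero everywhere except at the pinned (scribbled) pixels, which act as hard constraints
    b = np.zeros(w * h)
    for (i, j) in colored:
        b[ij2id(i, j, w, h)] = pinned_values[(i, j)]
    u = spsolve(W, b)   # sparse LU solve, much cheaper than forming a pseudo-inverse
    return u.reshape(w, h)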

 

Results

The following images and animations show the results obtained with the optimization algorithm. Most of the following images are taken from the paper itself.

Original Gray-scale Image Input                 Gray-scale image Input Markedbaby  baby_marked

Color image Output
baby_col.png


The next animations show how the incremental scribbling results in better and better color images.

Original Gray-scale Image Input 

monaco_gray

As can be seen from the following animation, the different parts of the building get colored as more and more color-hints are scribbled / annotated.

Gray-scale image Input Markedhouse_marked

Color image Output

house_col


Original Gray-scale Image Input 

waterfall_gray

Gray-scale image Input Marked
water_marked

Color image Output
water_col


Original Gray-scale Image Input (me)me3_gray

Gray-scale image Input Marked incrementally
me_marked

Color image Output
me_colored


Original Gray-scale Image Input
roses_gray

Gray-scale image Input Marked
roses_gray_marked
Color image Output
roses_col


Original Gray-scale Image Input

madurga_gray

Gray-scale image Input Marked

madurga_gray_marked

Color image Output

madurga_gray_col


 

Recursive Graphics, Bi/Tri-linear Interpolation, Anti-aliasing and Image Transformation in Python

The following problem appeared in an assignment in the Princeton course COS 126. The problem description is taken from the course itself.

Recursive Graphics

Write a program that plots a Sierpinski triangle, as illustrated below. Then develop a program that plots a recursive pattern of your own design.

sierpinski9.png

Part 1. 

The Sierpinski triangle is an example of a fractal pattern. The pattern was described by Polish mathematician Waclaw Sierpinski in 1915, but has appeared in Italian art since the 13th century. Though the Sierpinski triangle looks complex, it can be generated with a short recursive program.

Examples. Below are the target Sierpinski triangles for different values of order N.

sierpinski2.png

Our task is to implement a recursive function sierpinski(). We need to think recursively: our function should draw one black triangle (pointed downwards) and then call itself recursively 3 times (with an appropriate stopping condition). When writing our program, we should exercise modular design.

The following code shows an implementation:

from math import sqrt
import numpy as np

class Sierpinski:
    # Height of an equilateral triangle whose sides are of the specified length.
    def height(self, length):
        return sqrt(3) * length / 2.
    # Draws a filled equilateral triangle whose bottom vertex is (x, y)
    # of the specified side length (draw() is a plotting helper assumed to be defined elsewhere).
    def filledTriangle(self, x, y, length):
        h = self.height(length)
        draw(np.array([x, x+length/2., x-length/2.]), np.array([y, y+h, y+h]), alpha=1)
    # Draws an empty equilateral triangle whose bottom vertex is (x, y)
    # of the specified side length.
    def emptyTriangle(self, x, y, length):
        h = self.height(length)
        draw(np.array([x, x+length, x-length]), np.array([y+2*h, y, y]), alpha=0)
    # Draws a Sierpinski triangle of order n, such that the largest filled
    # triangle has bottom vertex (x, y) and sides of the specified length.
    def sierpinski(self, n, x, y, length):
        self.filledTriangle(x, y, length)
        if n > 1:
            self.sierpinski(n-1, x-length/2., y, length/2.)
            self.sierpinski(n-1, x+length/2., y, length/2.)
            self.sierpinski(n-1, x, y+self.height(length), length/2.)

The following animation shows how such a triangle of order 5 is drawn recursively.
sier.gif

The following animation shows how such a triangle of order 6 is drawn recursively.

sier6.gif

A diversion: fractal dimension.

Formally, we can define the Hausdorff dimension or similarity dimension of a self-similar figure by partitioning the figure into a number of self-similar pieces of smaller size. We define the dimension to be the log (# self similar pieces) / log (scaling factor in each spatial direction). For example, we can decompose the unit square into 4 smaller squares, each of side length 1/2; or we can decompose it into 25 squares, each of side length 1/5. Here, the number of self-similar pieces is 4 (or 25) and the scaling factor is 2 (or 5). Thus, the dimension of a square is 2 since log (4) / log(2) = log (25) / log (5) = 2. We can decompose the unit cube into 8 cubes, each of side length 1/2; or we can decompose it into 125 cubes, each of side length 1/5. Therefore, the dimension of a cube is log(8) / log (2) = log(125) / log(5) = 3.

We can also apply this definition directly to the (set of white points in) Sierpinski triangle. We can decompose the unit Sierpinski triangle into 3 Sierpinski triangles, each of side length 1/2. Thus, the dimension of a Sierpinski triangle is log (3) / log (2) ≈ 1.585. Its dimension is fractional—more than a line segment, but less than a square! With Euclidean geometry, the dimension is always an integer; with fractal geometry, it can be something in between. Fractals are similar to many physical objects; for example, the coastline of Britain resembles a fractal, and its fractal dimension has been measured to be approximately 1.25.

 

Part 2.

Drawing a tree recursively, as described here:

treed.png

The following shows how a tree of order 10 is drawn:

tree.gif


 

The next problem appeared in an assignment in the Cornell course CS1114. The problem description is taken from the course itself.

Bilinear Interpolation

Let’s consider a 2D matrix of values at integer grid locations (e.g., a grayscale image). To interpolate values on a 2D grid, we can use the 2D analogue of linear interpolation: bilinear interpolation. In this case, there are four neighbors for each possible point at which we’d like to interpolate, and the intensity values of these four neighbors are all combined to compute the interpolated intensity, as shown in the next figure.

f6.png

In the figure, the Q values represent intensities. To combine these intensities, we perform linear interpolation in multiple directions: we first interpolate in the x direction (to get the value at the blue points), then in the y direction (to get the value at the green points).
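
Here is a minimal sketch of such a bilinear interpolation routine for a grayscale image; the function name bilerp and the (row, col) coordinate convention are assumptions made for illustration (the later code fragments in this article assume a helper with this name).

import numpy as np

def bilerp(img, x, y):
    # bilinearly interpolate a grayscale image at a (possibly fractional)
    # location (x, y) = (row, col), assumed to lie inside the image
    x0, y0 = int(np.floor(x)), int(np.floor(y))
    x1, y1 = min(x0 + 1, img.shape[0] - 1), min(y0 + 1, img.shape[1] - 1)
    dx, dy = x - x0, y - y0
    # interpolate along the column direction first, then along the row direction
    top = (1 - dy) * img[x0, y0] + dy * img[x0, y1]
    bottom = (1 - dy) * img[x1, y0] + dy * img[x1, y1]
    return (1 - dx) * top + dx * bottom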

Image transformations

Next, we’ll use the interpolation function to help us implement image transformations.

A 2D affine transformation can be represented with a 3 ×3 matrix T:

f4.png

Recall that the reason why this matrix is 3×3, rather than 2 ×2, is that we operate in homogeneous coordinates; that is, we add an extra 1 on the end of our 2D coordinates (i.e., (x,y) becomes (x,y,1)), in order to represent translations with a matrix multiplication. To apply a transformation T to a pixel, we multiply T by the pixel’s location:

f5

The following figure shows a few such transformation matrices:

f8.png

To apply a transformation T to an entire image I, we could apply the transformation to each of I’s pixels to map them to the output image. However, this forward warping procedure has several problems. Instead, we’ll use inverse mapping to warp the pixels of the output image back to the input image. Because this won’t necessarily hit an integer-valued location, we’ll need to use the (bi-linear) interpolation to determine the intensity of the input image at the desired location, as shown in the next figure.

f7.png
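
A minimal sketch of such an inverse-mapping warp for a grayscale image is shown below, using the bilerp routine sketched earlier; the function name, the (row, col) coordinate convention and the handling of out-of-range pixels (left black) are assumptions made for illustration.

import numpy as np

def transform_image(img, T):
    # warp a grayscale image with a 3x3 affine matrix T using inverse mapping:
    # for every output pixel, map it back through inv(T) and sample the input
    # image with bilinear interpolation
    h, w = img.shape
    Tinv = np.linalg.inv(T)
    out = np.zeros_like(img)
    for i in range(h):
        for j in range(w):
            x, y, _ = np.dot(Tinv, [i, j, 1.])
            if 0 <= x < h - 1 and 0 <= y < w - 1:
                out[i, j] = bilerp(img, x, y)
    return out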

To demo the transformation function, let’s implement the following on a gray scale bird image:

1. Horizontal flipping

out_flip

2. Scaling by a factor of 0.5

out_scale
3. Rotation by 45 degrees around the center of the image

out_rot

The next animations show rotation and sheer transformations with the Lena image:

rot.gif

wave1_.gif
sheer.gif

Next, let’s implement a function to transform RGB images. To do this, we simply need to call transform_image three times, once for each channel, then put the results together into a single image. The next figures and animations show some results on an RGB image.

duck.gif


Some
non-linear transformations

The next figure shows the transform functions from here:

f9.png

The next figures and animations show the application of the above non-linear transforms on the Lena image.

Wave1
out_f1
wave1.gif
     Wave2
out_f2

wave2.gif           

Swirl
out_swirl

swirl.gif

me.gif

         Warp

out_warp

warp.gif

Some more non-linear transforms:

syncsync1

 

Anti-aliasing

There is a problem with our interpolation method above: it is not very good at shrinking images, due to aliasing. For instance, if we try to down-sample the following bricks image by a factor of 0.4, we get the image shown in the following figure: notice the strange banding effects in the output image.

Original Image

bricks_gray.png

Down-sampled Image with Bilinear Interpolation
out_scale_b.png

The problem is that a single pixel in the output image corresponds to about 2.8 pixels in the input image, but we are sampling the value of a single pixel—we should really be averaging over a small area.

To overcome this problem, we will create a data structure that will let us (approximately) average over any possible square regions of pixels in the input image: an image stack. An image stack is a 3D matrix that we can think of as, not surprisingly, a stack of images, one on top of the other. The top image in the cube will be the original input image. Images further down the stack will be the input image with progressively larger amounts of blur. The size of the matrix will be rows × cols × num levels, where the original (grayscale) image has size rows×cols and there are num levels images in the stack.

Before we use the stack, we must write a function to create it, which takes as input a (grayscale) image and a number of levels in the stack, and returns a 3D matrix stack corresponding to the stack. Again, the first image on the stack will be the original image. Every other image in the stack will be a blurred version of the previous image. A good blur kernel to use is:

f10.png

Now, for image k in the stack, we know that every pixel is a (weighted) average of some number of pixels (a k × k patch, roughly speaking) in the input image. Thus, if we down-sample the image by a factor of k, we want to sample pixels from level k of the stack.

⇒  Let’s write the following function to create image stack that takes a grayscale image and a number max levels, and returns an image stack.

import numpy as np
from scipy import signal

def create_image_stack(img, max_level):
    K = np.ones((3,3)) / 9.   # 3x3 box blur kernel
    image_stack = np.zeros((img.shape[0], img.shape[1], max_level))
    image_stack[:,:,0] = img
    for l in range(1, max_level):
        image_stack[:,:,l] = signal.convolve2d(image_stack[:,:,l-1], K,
                                               boundary='symm', mode='same')
    return image_stack

The next animation shows the image stack created from the bricks image.

ims.gif

 

Trilinear Interpolation

Now, what happens if we down-sample the image by a fractional factor, such as 3.6? Unfortunately, there is no level 3.6 of the stack. Fortunately, we have a tool to solve this problem: interpolation. We now potentially need to sample a value at position (row, col, k) of the image stack, where all three coordinates are fractional. We therefore need something more powerful than bilinear interpolation: trilinear interpolation! Each position we want to sample now has eight neighbors, and we’ll combine all of their values together in a weighted sum.

This sounds complicated, but we can write this in terms of our existing functions. In particular, we now interpolate separately along different dimensions: trilinear interpolation can be implemented with two calls to bilinear interpolation and one call to linear interpolation.

Let’s implement a function trilerp like the following that takes an image stack, and a row, column, and stack level k, and returns the interpolated value.

from math import floor, ceil

def trilerp(img_stack, x, y, k):
    # sample the image stack at a (possibly fractional) position (x, y) and level k,
    # using two bilinear interpolations and one linear interpolation
    if k < 1: k = 1
    if k == int(k):
        return bilerp(img_stack[:,:,int(k)-1], x, y)
    else:
        f_k, c_k = int(floor(k)), int(ceil(k))
        v_f_k = bilerp(img_stack[:,:,f_k-1], x, y)
        v_c_k = bilerp(img_stack[:,:,c_k-1], x, y)
        return linterp(k, f_k, c_k, v_f_k, v_c_k)   # linterp: 1D linear interpolation helper

Now we can finally implement a transformation function that does proper anti-aliasing. In order to do this, let’s implement a function that will

  • First compute the image stack.
  • Then compute, for the transformation T, how much T is scaling down the image. If T is defined by the six values a,b,c,d,e,f above, then, to a first approximation, the downscale factor is:
    f11.png
    However, if k < 1 (corresponding to scaling up the image), we still want to sample from level 1. This situation reverts to normal bilinear interpolation.
  • Next call the trilerp function on the image stack, instead of bilerp on the input image (a sketch combining these steps is shown right after this list).
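
The following is a minimal sketch of such an anti-aliased transform, combining create_image_stack and trilerp from above; the way the downscale factor k is approximated here (from the determinant of the 2x2 linear part of T) is an assumption standing in for the exact formula from the assignment shown in the figure above.

import numpy as np

def transform_image_antialiased(img, T, max_level=8):
    # like transform_image, but samples from a blurred image stack with
    # trilinear interpolation to avoid aliasing when T shrinks the image
    stack = create_image_stack(img, max_level)
    # approximate the downscale factor from the linear part of T
    # (assumption: k ~ 1 / sqrt(|det|)), clamped to the range of the stack
    k = 1. / np.sqrt(abs(np.linalg.det(T[:2, :2])))
    k = min(max(k, 1.), max_level)
    h, w = img.shape
    Tinv = np.linalg.inv(T)
    out = np.zeros_like(img)
    for i in range(h):
        for j in range(w):
            x, y, _ = np.dot(Tinv, [i, j, 1.])
            if 0 <= x < h - 1 and 0 <= y < w - 1:
                out[i, j] = trilerp(stack, x, y, k)
    return out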

The next figure shows the output image obtained after image transformation with proper anti-aliasing:

Down-sampled Image with Anti-aliasing using Trilinear Interpolationout_scale_t.png

As we can see from the above output, the aliasing artifact has disappeared.

The same results are obtained on the color image, as shown below, by applying the trilerp function on the color channels separately and creating separate image stacks for different color channels.

Original Image
bricks_rgb

Down-sampled Image with Bilinear Interpolation
out_scale_b

Down-sampled Image with Anti-aliasing
out_scale_t

The following animation shows the banding artifacts created when using bilinear interpolation for different scale factors, and how they are removed with anti-aliasing.

Down-sampled Images with Bilinear Interpolation

a

Down-sampled Images with Anti-aliasing

aa

Deep Learning & Art: Neural Style Transfer – An Implementation with Tensorflow (using Transfer Learning with a Pre-trained VGG-19 Network) in Python

 

This problem appeared as an assignment in the online coursera course Convolutional Neural Networks by Prof. Andrew Ng (deeplearning.ai). The description of the problem is taken straight from the assignment.

The Neural Style Transfer algorithm was created by Gatys et al. (2015); the paper can be found here.

In this assignment, we shall:

  • Implement the neural style transfer algorithm
  • Generate novel artistic images using our algorithm

Most of the algorithms we’ve studied optimize a cost function to get a set of parameter values. In Neural Style Transfer, we  shall optimize a cost function to get pixel values!

Problem Statement

Neural Style Transfer (NST) is one of the most fun techniques in deep learning. As seen below, it merges two images, namely,

  1. a “content” image (C) and
  2. a “style” image (S),

to create a “generated” image (G). The generated image G combines the “content” of the image C with the “style” of image S.

In this example, we are going to generate an image of the Louvre museum in Paris (content image C), mixed with a painting by Claude Monet, a leader of the impressionist movement (style image S).

louvre_generated.png

Let’s see how we can do this.

Transfer Learning

Neural Style Transfer (NST) uses a previously trained convolutional network, and builds on top of that. The idea of using a network trained on a different task and applying it to a new task is called transfer learning.

Following the original NST paper, we shall use the VGG network. Specifically, we’ll use VGG-19, a 19-layer version of the VGG network. This model has already been trained on the very large ImageNet database, and thus has learned to recognize a variety of low level features (at the earlier layers) and high level features (at the deeper layers). The following figure (taken from the google image search results) shows what a VGG-19 convolutional neural net looks like, without the last fully-connected (FC) layers.

vg-19

We run the following code to load parameters from the pre-trained VGG-19 model serialized in a matlab file. This takes a few seconds.

model = load_vgg_model("imagenet-vgg-verydeep-19.mat")
import pprint
pprint.pprint(model)

{'avgpool1': <tf.Tensor 'AvgPool_5:0' shape=(1, 150, 200, 64) dtype=float32>,
'avgpool2': <tf.Tensor 'AvgPool_6:0' shape=(1, 75, 100, 128) dtype=float32>,
'avgpool3': <tf.Tensor 'AvgPool_7:0' shape=(1, 38, 50, 256) dtype=float32>,
'avgpool4': <tf.Tensor 'AvgPool_8:0' shape=(1, 19, 25, 512) dtype=float32>,
'avgpool5': <tf.Tensor 'AvgPool_9:0' shape=(1, 10, 13, 512) dtype=float32>,
'conv1_1': <tf.Tensor 'Relu_16:0' shape=(1, 300, 400, 64) dtype=float32>,
'conv1_2': <tf.Tensor 'Relu_17:0' shape=(1, 300, 400, 64) dtype=float32>,
'conv2_1': <tf.Tensor 'Relu_18:0' shape=(1, 150, 200, 128) dtype=float32>,
'conv2_2': <tf.Tensor 'Relu_19:0' shape=(1, 150, 200, 128) dtype=float32>,
'conv3_1': <tf.Tensor 'Relu_20:0' shape=(1, 75, 100, 256) dtype=float32>,
'conv3_2': <tf.Tensor 'Relu_21:0' shape=(1, 75, 100, 256) dtype=float32>,
'conv3_3': <tf.Tensor 'Relu_22:0' shape=(1, 75, 100, 256) dtype=float32>,
'conv3_4': <tf.Tensor 'Relu_23:0' shape=(1, 75, 100, 256) dtype=float32>,
'conv4_1': <tf.Tensor 'Relu_24:0' shape=(1, 38, 50, 512) dtype=float32>,
'conv4_2': <tf.Tensor 'Relu_25:0' shape=(1, 38, 50, 512) dtype=float32>,
'conv4_3': <tf.Tensor 'Relu_26:0' shape=(1, 38, 50, 512) dtype=float32>,
'conv4_4': <tf.Tensor 'Relu_27:0' shape=(1, 38, 50, 512) dtype=float32>,
'conv5_1': <tf.Tensor 'Relu_28:0' shape=(1, 19, 25, 512) dtype=float32>,
'conv5_2': <tf.Tensor 'Relu_29:0' shape=(1, 19, 25, 512) dtype=float32>,
'conv5_3': <tf.Tensor 'Relu_30:0' shape=(1, 19, 25, 512) dtype=float32>,
'conv5_4': <tf.Tensor 'Relu_31:0' shape=(1, 19, 25, 512) dtype=float32>,
'input': <tensorflow.python.ops.variables.Variable object at 0x7f7a5bf8f7f0>}

The next figure shows the content image (C) – the Louvre museum’s pyramid surrounded by old Paris buildings, against a sunny sky with a few clouds.

louvre_small.jpg

For the above content image, the activation outputs from the convolution layers are visualized in the next few figures.

1

3

2

4

5

6

7.png

 

How to ensure that the generated image G matches the content of the image C?

As we know, the earlier (shallower) layers of a ConvNet tend to detect lower-level features such as edges and simple textures, and the later (deeper) layers tend to detect higher-level features such as more complex textures as well as object classes.

We would like the “generated” image G to have similar content as the input image C. Suppose we have chosen some layer’s activations to represent the content of an image. In practice, we shall get the most visually pleasing results if we choose a layer in the middle of the network – neither too shallow nor too deep.

8.png

First we need to compute the “content cost” using TensorFlow.

  • The content cost takes a hidden layer activation of the neural network, and measures how different a(C) and a(G) are.
  • When we minimize the content cost later, this will help make sure G
    has similar content as C.

import tensorflow as tf

def compute_content_cost(a_C, a_G):
    """
    Computes the content cost

    Arguments:
    a_C -- tensor of dimension (1, n_H, n_W, n_C), hidden layer activations representing content of the image C
    a_G -- tensor of dimension (1, n_H, n_W, n_C), hidden layer activations representing content of the image G

    Returns:
    J_content -- scalar that we need to compute using equation 1 above.
    """

    # Retrieve dimensions from a_G
    m, n_H, n_W, n_C = a_G.get_shape().as_list()

    # Reshape a_C and a_G
    a_C_unrolled = tf.reshape(tf.transpose(a_C), (m, n_H * n_W, n_C))
    a_G_unrolled = tf.reshape(tf.transpose(a_G), (m, n_H * n_W, n_C))

    # compute the cost with tensorflow
    J_content = tf.reduce_sum((a_C_unrolled - a_G_unrolled)**2 / (4. * n_H * n_W * n_C))

    return J_content

 

Computing the style cost

For our running example, we will use the following style image (S). This painting was painted in the style of impressionism, by  Claude Monet .

claude-monet

9.png

def gram_matrix(A):
    """
    Argument:
    A -- matrix of shape (n_C, n_H*n_W)

    Returns:
    GA -- Gram matrix of A, of shape (n_C, n_C)
    """

    GA = tf.matmul(A, tf.transpose(A))
    return GA

10.png

def compute_layer_style_cost(a_S, a_G):
    """
    Arguments:
    a_S -- tensor of dimension (1, n_H, n_W, n_C), hidden layer activations representing style of the image S
    a_G -- tensor of dimension (1, n_H, n_W, n_C), hidden layer activations representing style of the image G

    Returns:
    J_style_layer -- tensor representing a scalar value, style cost defined above by equation (2)
    """

    # Retrieve dimensions from a_G
    m, n_H, n_W, n_C = a_G.get_shape().as_list()

    # Reshape the images to have them of shape (n_C, n_H*n_W)
    a_S = tf.reshape(tf.transpose(a_S), (n_C, n_H * n_W))
    a_G = tf.reshape(tf.transpose(a_G), (n_C, n_H * n_W))

    # Computing gram_matrices for both images S and G (≈2 lines)
    GS = gram_matrix(a_S)
    GG = gram_matrix(a_G)

    # Computing the loss
    J_style_layer = tf.reduce_sum((GS - GG)**2 / (4. * (n_H * n_W * n_C)**2))

    return J_style_layer

11.png

  • The style of an image can be represented using the Gram matrix of a hidden layer’s activations. However, we get even better results by combining this representation from multiple different layers (see the sketch after this list). This is in contrast to the content representation, where usually using just a single hidden layer is sufficient.
  • Minimizing the style cost will cause the image G to follow the style of the image S.
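
The following is a minimal sketch of how the per-layer style costs could be combined over several VGG-19 layers. The particular layers and equal coefficients below follow a common choice in the assignment but are listed here as assumptions, and the sess / model arguments assume the TensorFlow 1.x session and the VGG-19 model dictionary loaded earlier.

# layers and coefficients for the weighted sum of per-layer style costs (assumed choice)
STYLE_LAYERS = [('conv1_1', 0.2), ('conv2_1', 0.2), ('conv3_1', 0.2),
                ('conv4_1', 0.2), ('conv5_1', 0.2)]

def compute_style_cost(sess, model, style_layers=STYLE_LAYERS):
    # total style cost = weighted sum of the per-layer style costs; a_S is evaluated
    # on the style image currently assigned to the model input, while a_G stays a
    # tensor so that it tracks the generated image during optimization
    J_style = 0
    for layer_name, coeff in style_layers:
        out = model[layer_name]
        a_S = sess.run(out)
        a_G = out
        J_style += coeff * compute_layer_style_cost(a_S, a_G)
    return J_style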

 

Defining the total cost to optimize

Finally, let’s create and implement a cost function that minimizes both the style and the content cost. The formula is:

 

12.png

def total_cost(J_content, J_style, alpha = 10, beta = 40):
    """
    Computes the total cost function

    Arguments:
    J_content -- content cost coded above
    J_style -- style cost coded above
    alpha -- hyperparameter weighting the importance of the content cost
    beta -- hyperparameter weighting the importance of the style cost

    Returns:
    J -- total cost as defined by the formula above.
    """

    J = alpha * J_content + beta * J_style
    return J

  • The total cost is a linear combination of the content cost J_content(C,G) and the style cost J_style(S,G).
  • α and β are hyperparameters that control the relative weighting between content and style.

 

Solving the optimization problem

Finally, let’s put everything together to implement Neural Style Transfer!

Here’s what the program will have to do:

  • Create an Interactive Session
  • Load the content image
  • Load the style image
  • Randomly initialize the image to be generated
  • Load the VGG19 model
  • Build the TensorFlow graph:
    • Run the content image through the VGG19 model and compute the content cost.
    • Run the style image through the VGG19 model and compute the style cost.
    • Compute the total cost.
    • Define the optimizer and the learning rate.
  • Initialize the TensorFlow graph and run it for a large number of iterations, updating the generated image at every step.

Let’s first load, reshape, and normalize our “content” image (the Louvre museum picture) and “style” image (Claude Monet’s painting).

Now, we initialize the “generated” image as a noisy image created from the content_image. By initializing the pixels of the generated image to be mostly noise but still slightly correlated with the content image, we help the content of the “generated” image more rapidly match the content of the “content” image. The following figure shows the noisy image:

13.png

Next, let’s load the pre-trained VGG-19 model.

To get the program to compute the content cost, we will now assign a_C and a_G to be the appropriate hidden layer activations. We will use layer conv4_2 to compute the content cost. We need to do the following:

  • Assign the content image to be the input to the VGG model.
  • Set a_C to be the tensor giving the hidden layer activation for layer “conv4_2”.
  • Set a_G to be the tensor giving the hidden layer activation for the same layer.
  • Compute the content cost using a_C and a_G.

Next, we need to compute the style cost and compute the total cost J by taking a linear combination of the two. Use alpha = 10 and beta = 40.

Then we are going to  set up the Adam optimizer in TensorFlow, using a learning rate of 2.0.

Finally, we need to initialize the variables of the tensorflow graph, assign the input image (the initial generated image) as the input of the VGG19 model, and run the model to minimize the total cost J for a large number of iterations. A minimal sketch of these steps is shown below.
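
The following is a minimal sketch of the optimization loop in TensorFlow 1.x, put together from the steps above; the helpers load_vgg_model, the preprocessed content_image / style_image, the noisy generated_image, and compute_style_cost (sketched earlier) are assumed to be available, so this is an illustration rather than the exact assignment code.

import tensorflow as tf

sess = tf.InteractiveSession()

# content cost on the conv4_2 activations of the content image
sess.run(model['input'].assign(content_image))
out = model['conv4_2']
a_C = sess.run(out)
a_G = out
J_content = compute_content_cost(a_C, a_G)

# style cost over several layers of the style image
sess.run(model['input'].assign(style_image))
J_style = compute_style_cost(sess, model)

# total cost and Adam optimizer with learning rate 2.0
J = total_cost(J_content, J_style, alpha=10, beta=40)
train_step = tf.train.AdamOptimizer(2.0).minimize(J)

# iteratively update the generated image (the model input variable)
sess.run(tf.global_variables_initializer())
sess.run(model['input'].assign(generated_image))
for i in range(200):
    sess.run(train_step)
    generated_image = sess.run(model['input'])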

Results

The following figures / animations show the generated images (G) with different content (C) and style images (S) at different iterations in the optimization process.


Content

louvre_small

Style (Claude Monet’s The Poppy Field near Argenteuil)

claude-monet

Generated

l



Content

cat.jpg

Style

paint.jpg

Generated

cat


Content

circle.jpg

Style

sea.jpg

Generated

circle


Content

content1.jpg

Style (Van Gogh’s The Starry Night)

vstyle.jpg

Generated

content1


Content

content2.jpg

Style

style2.jpg

Generated

content2.gif


Content (Victoria Memorial Hall)

vic.jpg

Style (Van Gogh’s The Starry Night)

vstyle.jpg

Generated

vic


Content (Taj Mahal)

taj.jpg

Style (Van Gogh’s Starry Night Over the Rhone)

van8.jpg

Generated

taj.gif


Content

in

Style (Claude Monet’s Sunset in Venice)

monet5.png

Generated

in.gif


Content (Visva Bharati)biswa.jpg

Style (Abanindranath Tagore’s Rabindranath in the role of  blind singer )
aban2.jpg

Generated

biswa.gif

 


Content (Howrah Bridge)

hwhbr.jpg

Style (Van Gogh’s The Starry Night)

vstyle.jpg

Generated

hwhbr.gif

 


Content (Leonardo Da Vinci’s Mona Lisa)

monalisa

Style (Van Gogh’s The Starry Night)
vstyle.jpg

Generatedmonalisa


Content (My sketch: Rabindranath Tagore)

rabi.png

Style (Abanindranath Tagore’s Rabindranath in the role of  blind singer )
aban2.jpg

Generated
rabi.gif


Content (me)

me.jpg

Style (Van Gogh’s Irises)

van.jpg

Generated

me.gif

 


Content

me.jpg

Style

stars.jpg

Generated

stars.gif


Content

me2

Style (Pablo Picasso’s Factory at Horto de Ebro)

picaso3.jpg

Generated

me2.gif


The following animations show how the generated image changes with the change in VGG-19 convolution layer used for computing content cost.

Content

content1.jpg

Style (Van Gogh’s The Starry Night)

vstyle.jpg

Generated

convolution layer 3_2 used

content1_32

convolution layer 4_2 used

content1_42.gif

convolution layer 5_2 used

content1_52

 

Some Computational Photography: Image Quilting (Texture Synthesis) with Dynamic Programming and Texture Transfer (Drawing with Textures) in Python

 

The following problems appeared as a programming assignment in the Computational Photography course (CS445) at UIUC. The description of the problems is taken from the assignment itself. In this assignment, a python implementation of the problems will be described instead of matlab, as expected in the course.

 

The Problems

  • The goal of this assignment is to implement the image quilting algorithm for
    texture synthesis and transfer, described in this SIGGRAPH 2001 paper by Efros
    and Freeman.
  • Texture synthesis is the creation of a larger texture image from a small sample.
  • Texture transfer is giving an object the appearance of having the
    same texture as a sample while preserving its basic shape.
  • For texture synthesis, the main idea is to sample patches and lay them down in overlapping patterns, such that the overlapping regions are similar.
  • The overlapping regions may not match exactly, which will result in noticeable edge artifacts. To fix this, we need to compute a path along pixels with similar intensities through the overlapping region and use it to select from which overlapping patch to draw each pixel.
  • Texture transfer is achieved by encouraging sampled patches to have similar appearance to a given target image, as well as matching overlapping regions of already sampled patches.
  • In this project, we need to apply important techniques such as template matching, finding seams, and masking. These techniques are also useful for image stitching, image completion, image retargeting and blending.

 

Randomly Sampled Texture

First let’s randomly sample square patches of size patchsize from sample in order to create an output image of size outsize. Start from the upper-left corner, and tile samples until the image is full. If the patches don’t fit evenly into the output image, we can leave black borders at the edges. This is the simplest but least effective method. Save a result from a sample image to compare to the next two methods.
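
Here is a minimal sketch of this random tiling step for a grayscale sample; the function and argument names are illustrative (not the course starter code), and color textures would simply carry an extra channel dimension.

import numpy as np

def quilt_random(sample, outsize, patchsize):
    # tile the output image with randomly sampled square patches from the sample texture,
    # leaving black borders wherever a full patch does not fit
    out = np.zeros((outsize, outsize))
    H, W = sample.shape[:2]
    for i in range(0, outsize - patchsize + 1, patchsize):
        for j in range(0, outsize - patchsize + 1, patchsize):
            r = np.random.randint(H - patchsize + 1)
            c = np.random.randint(W - patchsize + 1)
            out[i:i+patchsize, j:j+patchsize] = sample[r:r+patchsize, c:c+patchsize]
    return out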

 

Overlapping Patches

Let’s start by sampling a random patch for the upper-left corner. Then sample new patches to overlap with existing ones. For example, the second patch along the top row will overlap by patchsize pixels in the vertical direction and overlap pixels in the horizontal direction. Patches in the first column will overlap by patchsize pixels in the horizontal direction and overlap pixels in the vertical direction. Other patches will have two overlapping regions (on the top and left) which should both be taken into account. Once the cost of each candidate patch has been computed, randomly choose one patch whose cost is less than a threshold determined by some tolerance value.

As described in the paper, the size of the block is the only parameter controlled by the user and it depends on the properties of a given texture; the block must be big enough to capture the relevant structures in the texture, but small enough so that the interaction between these structures is left up to the algorithm. The overlap size is taken to be one-sixth of the block size (B/6) in general.
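
A minimal sketch of the overlap cost and of the tolerance-based random choice is shown below; the helper names and the exact form of the tolerance threshold are assumptions made for illustration.

import numpy as np

def ssd_patch_cost(out_patch, candidate, mask):
    # sum of squared differences over the overlapping region only
    # (mask is 1 where the output already has synthesized pixels, 0 elsewhere)
    return np.sum(mask * (out_patch - candidate) ** 2)

def choose_sample(costs, tol=0.1):
    # randomly pick one candidate patch whose cost is within (1 + tol) of the minimum cost
    costs = np.asarray(costs)
    idx = np.where(costs <= (1 + tol) * costs.min())[0]
    return np.random.choice(idx)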

 

Seam Finding

Next we need to find the min-cost contiguous path from the left to the right side of the patch according to the cost. The cost of a path through each pixel is the squared difference (summed over RGB for color images) between the output image and the newly sampled patch. Use dynamic programming to find the min-cost path.

The following figure describes the algorithm to be implemented for image quilting.

f20.png
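
The following is a minimal dynamic-programming sketch for the left-to-right min-cost seam (the top-to-bottom case is just the transpose); the function name and the backtracking details are illustrative.

import numpy as np

def min_cost_horizontal_seam(cost):
    # cost: (h, w) array of squared differences over a horizontal overlap region;
    # returns, for each column, the row index of the min-cost contiguous
    # left-to-right path through the overlap
    h, w = cost.shape
    dp = cost.astype(float).copy()
    for j in range(1, w):
        for i in range(h):
            i0, i1 = max(i - 1, 0), min(i + 1, h - 1)
            dp[i, j] += min(dp[i0, j - 1], dp[i, j - 1], dp[i1, j - 1])
    # backtrack from the minimum-cost entry in the last column
    seam = np.zeros(w, dtype=int)
    seam[-1] = np.argmin(dp[:, -1])
    for j in range(w - 2, -1, -1):
        i = seam[j + 1]
        i0, i1 = max(i - 1, 0), min(i + 1, h - 1)
        seam[j] = i0 + np.argmin(dp[i0:i1 + 1, j])
    return seam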

Texture Transfer

The final task is to implement texture transfer, based on the quilt implementation, for creating a texture sample that is guided by a pair of sample/target correspondence images (section 3 of the paper). The main difference between this function and the quilt function is that there is an additional cost term based on the difference between the sampled source patch and the target patch at the location to be filled.
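
A minimal sketch of this modified patch cost is given below, blending the overlap term with a correspondence term against the target image; the blending weight alpha, the ssd_patch_cost helper sketched earlier and the function name are assumptions for illustration.

import numpy as np

def transfer_patch_cost(out_patch, candidate, mask, target_patch, alpha=0.5):
    # total cost = alpha * (overlap SSD with the already-synthesized output)
    #            + (1 - alpha) * (SSD between the candidate patch and the target
    #              image patch at the location being filled)
    overlap_cost = ssd_patch_cost(out_patch, candidate, mask)
    correspondence_cost = np.sum((candidate - target_patch) ** 2)
    return alpha * overlap_cost + (1 - alpha) * correspondence_cost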

Image quilting (texture synthesis) results

The following figures and animations show the output textures obtained with the quilting algorithm. The input texture images are mostly taken from the paper.

Input sample Texture
q2.png

100 sampled blocks of a fixed size (e.g. 50×50) from the input sample
samples_q2.png

The next animation shows how the large output texture gets created (100 times larger than the input sample texture) with the quilting algorithm.

quilt.gif

Output Texture (10×10 times larger than the input) created with texture synthesis (quilting)
quilt_q2.png

 

Input Texture

q6.png

Output Texture (25 times larger than the input) created with texture synthesis (quilting) with the minimum cost seams (showed as red lines) computed with dynamic programming

quilt_boundary_q6

Output Texture (25 times larger than the input) created with quiltingquilt_q6

 

Input Texture

q1

Output Texture (25 times larger than the input) created with quilting

quilt_q1_.png

Input Texture

q3

Output Texture (25 times larger than the input) created with quilting

quilt_q3.png

Input Texture

q4

Output Texture (12 times larger than the input) created with quilting

quilt_q4.png

Input Texture

q5

Output Texture (25 times larger than the input) created with quilting

quilt_q5.png

Input Texture

q7

Output Texture (25 times larger than the input) created with quilting

quilt_q7_.png

Input Texture

q9

Output Texture (36 times larger than the input) created with quilting

quilt_q9.png

Input Texture

q11

Output Texture (9 times larger than the input) created with quilting

quilt_q11_.png

Input Texture

q12

Output Texture (25 times larger than the input) created with quilting

quilt_q12.png

Input Texture

q13

Output Texture (9 times larger than the input) created with quilting along with the min-cost seams (shown as red lines) computed with dynamic programming 

quilt_boundary_q13

Output Texture (9 times larger than the input) created with quilting

quilt_q13

 

Texture transfer results

The following figures and animations show how the texture from an input image can be transferred to the target image using the above-mentioned simple modification of the quilting algorithm. Again, some of the images are taken from the paper.

Input Texture (milk)

q14.png

Target Image

qt1.png

Output Image after Texture Transfer

text_tran_q14

 

Input Texture (milk)

q14.png

Target Image

mona

Output Image after Texture Transfer

text_tran_q14_

 

The following figures show the output image obtained when a few textures were transferred to my image.

Target Image (me)

me1.png

 

Input Texture (fire)

fire.png

Output Image after Texture Transfer  (with small block size)

text_tran_fire

 

 

Input Texture (cloud)

cloud1.png

Output Image after Texture Transfer  (with small block size)

text_tran_cloud1

 

Input Texture (toast)
toast.png

Output Image after Texture Transfer  (with small block size)text_tran_toast

Now, applying some gradient mixing such as Poisson blending on the original image and on the one after texture transfer with the target toast image, we get the following two images respectively.

pe_toast1

pe_toast

Input Texture
q7.png

Output Image after Texture Transfer  (with small block size)

text_tran_q7

 

Input Texture

draw.png

Output Image after Texture Transfer  (with small block size)text_tran_draw

 

 

Drawing with Textures

 

The following figures show the output image obtained when a few textures were transferred to another image of mine.

 

Target Image (me)

me6.png

 

Some textures from a few famous painting-masterpieces (obtained from here, here, here, here and here)



Input Texture (Van Gogh’s The Starry Night)

van.png

Output Images after Texture Transfer  (with 3 different patch sizes)

text_tran_012_van

text_tran_024_van
text_tran_050_van


Input Texture (Van Gogh’s Wheatfield with Cypresses)

van1.png

Output Image after Texture Transfer

text_tran_012_van1.png


Input Texture (Van Gogh’s The Mulberry Tree)

van2

Output Image after Texture Transfer

text_tran_050_van2.png


Input Texture (Van Gogh’s Wheatfield with Crows)

van3

Output Images after Texture Transfer  (with 2 different patch sizes)

text_tran_024_van3text_tran_050_van3


Input Texture (Van Gogh’s Vase with fifteen Sunflowers)

van4

Output Images after Texture Transfer  (with 2 different patch sizes)

text_tran_024_van4text_tran_050_van4


Input Texture (Van Gogh’s Sunflowers (F452))

van5

Output Image after Texture Transfer

text_tran_024_van5


Input Texture (Van Gogh’s Irises)

van6

Output Image after Texture Transfer

text_tran_024_van6.png


Input Texture (Van Gogh’s Almond Blossom)

van7

Output Image after Texture Transfer

text_tran_024_van7



Input Texture (Van Gogh’s Starry Night Over the Rhone)

van8

Output Image after Texture Transfer

text_tran_024_van8.png


Input Texture (Pablo Picasso’s Mediterranean Landscape)

picaso

Output Images after Texture Transfer  (with 2 different patch sizes)

text_tran_024_picasotext_tran_050_picaso


Input Texture (Pablo Picasso’s Minotauromachy)

picaso1

Output Image after Texture Transfer

text_tran_024_picaso1


Input Texture (Pablo Picasso’s Mother and Child 1921)

picaso2

Output Image after Texture Transfer

text_tran_024_picaso2.png


Input Texture (Pablo Picasso’s Factory at Horto de Ebro)

picaso3

Output Images after Texture Transfer  (with 2 different patch sizes)

text_tran_024_picaso3text_tran_050_picaso3.png


Input Texture (Pablo Picasso’s Carafe and Three Bowls) picaso4

Output Image after Texture Transfer
text_tran_024_picaso4



Input Texture (Pablo Picasso’s Bullfight 3) 

picaso6
Output Image after Texture Transfer
text_tran_024_picaso6


Input Texture (Pablo Picasso’s Accordionist) 

picaso7

Output Image after Texture Transfertext_tran_024_picaso7


Input Texture (Pablo Picasso’s Las Meninas) picaso5

Output Image after Texture Transfertext_tran_024_picaso5.png


Input Texture (Claude Monet’s The Artist’s Garden at Giverny

monet

Output Image after Texture Transfertext_tran_024_monet


Input Texture (Claude Monet’s The Poppy Field near Argenteuil

monet1

Output Image after Texture Transfer

text_tran_024_monet1


Input Texture (Claude Monet’s The Magpie

monet2

Output Image after Texture Transfer

text_tran_024_monet2


Input Texture (Claude Monet’s The Garden of Monet at Argenteuil

monet3

Output Images after Texture Transfer  (with 2 different patch sizes)text_tran_050_monet3text_tran_024_monet3


Input Texture (Claude Monet’s Sunset in Venice

monet5

Output Image after Texture Transfer

text_tran_024_monet5


Input Texture (Claude Monet’s Waterloo Bridge

monet4

Output Image after Texture Transfer text_tran_024_monet4.png


Input Texture (Claude Monet’s Water Lilies     monet6

Output Image after Texture Transfer

text_tran_024_monet6


Input Texture (Claude Monet’s Impression Sunrise)

monet7

Output Image after Texture Transfer

text_tran_024_monet7


Input Texture (Claude Monet’s Sunflowers)

monet8

Output Image after Texture Transfer

text_tran_024_monet8



Input Texture (Abanindranath Tagore’s Canal in Shahjadpur)

aban

Output Image after Texture Transfer

text_tran_024_aban


Input Texture (Abanindranath Tagore’s The Victory of Buddha)

aban1

Output Image after Texture Transfer

text_tran_024_aban1

 



Input Texture (Abanindranath Tagore’s Rabindranath in the role of  blind singer)

aban2

Output Image after Texture Transfer

text_tran_024_aban2


Input Texture (Abanindranath Tagore’s Slaying the Tornado Demon)

aban3

Output Image after Texture Transfer

text_tran_024_aban3


Input Texture (Nandalal Bose’s Village Huts)

nandalal

Output Image after Texture Transfer

text_tran_024_nandalal


Some more Textures

Input Texture

epii

Output Images after Texture Transfer  (with 2 different patch sizes)

text_tran_012_epiitext_tran_024_epii

 

Input Texture

ds

Output Images after Texture Transfer  (with 2 different patch sizes)

text_tran_012_dstext_tran_024_ds

 

Input Texture

alphabet.png

Output Image after Texture Transfer

text_tran_024_alphabet.png

 

Input Texture

rabi

Output Images after Texture Transfer  (with 2 different patch sizes)

text_tran_012_rabitext_tran_024_rabi

 

Input Texture

notes

Output Images after Texture Transfer  (with 2 different patch sizes)

text_tran_012_notestext_tran_024_notes

 

Input Texture

india

Output Images after Texture Transfer (with 2 different patch sizes)

text_tran_012_india.png

text_tran_024_india.png

 

 

Input Texture

bangla2.png

Output Image after Texture Transfer

text_tran_024_bangla2.png

 

 

Input Texture

r1

Output Image after Texture Transfer

text_tran_012_r1

 

 

Input Texture

ens

Output Image after Texture Transfer

text_tran_012_ens

 

Input Texture

rabi2

Output Image after Texture Transfer

text_tran_012_rabi2

 

 

Input Texture

toast

Output Image after Texture Transfer

text_tran_012_toast_12.png

 

 

Input Texture

q14

Output Images after Texture Transfer  (with 2 different patch sizes)

text_tran_012_q14_12text_tran_024_q14

 

Input Texture
 stars.png

Output Image after Texture Transfer
star.gif

stars2.gif

 

 

Target Image (from The Mummy 1999)
mm.png

Input Texture (sand)

msky.png

Output Image after Texture Transfer

text_tran_012_msky_1.png

 

 

The following animation shows how the milk texture gets transferred to the target image of mine with the quilting algorithm, using the modified code.

Input Texture

q14

Target Image (me)

me4

Output Image after Texture Transfer  (with small block size)

ttmea

 

The next figures and animations show the output images obtained after the milk texture gets transferred to the target image of mine, for different block sizes of the samples (shown in red). As can be seen from the following outputs, the target image gets more and more prominent as the sampling block size gets smaller and smaller.

text_tran_036_q14

text_tran_804_q14

ttme