# Some Deep Learning with Python, TensorFlow and Keras

The following problems are taken from assignments in the Coursera courses Introduction to Deep Learning (by Higher School of Economics) and Neural Networks and Deep Learning (by Prof. Andrew Ng, deeplearning.ai). The problem descriptions are taken directly from the assignments.

# 1. Linear models, Optimization

In this assignment a linear classifier will be implemented and trained with numpy using stochastic gradient descent.

## Two-dimensional classification

To make things more intuitive, let’s solve a 2D classification problem with synthetic data.

## Features

As we can see, the data above isn't linearly separable, hence we should add features (or use a non-linear model). Note that the decision boundary between the two classes has the form of a circle, so we can add quadratic features to make the problem linearly separable. The idea behind this is displayed in the image below:

Here are some test results for the implemented expand function, that is used for adding quadratic features:

# simple test on random numbers

dummy_X = np.array([
    [0, 0],
    [1, 0],
    [2.61, -1.28],
    [-0.59, 2.1]
])

# call expand function
dummy_expanded = expand(dummy_X)

# what it should have returned:   x0       x1       x0^2     x1^2     x0*x1    1
dummy_expanded_ans = np.array([[ 0.    ,  0.    ,  0.    ,  0.    ,  0.    ,  1.    ],
                               [ 1.    ,  0.    ,  1.    ,  0.    ,  0.    ,  1.    ],
                               [ 2.61  , -1.28  ,  6.8121,  1.6384, -3.3408,  1.    ],
                               [-0.59  ,  2.1   ,  0.3481,  4.41  , -1.239 ,  1.    ]])
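The body of expand itself is not shown above; here is one possible implementation that reproduces the expected output (a sketch, since the assignment leaves it to the reader):

```python
import numpy as np

def expand(X):
    """
    Adds quadratic features: [x0, x1] -> [x0, x1, x0^2, x1^2, x0*x1, 1].

    :param X: matrix of features, shape [n_samples, 2]
    :returns: expanded feature matrix of shape [n_samples, 6]
    """
    x0, x1 = X[:, 0], X[:, 1]
    return np.column_stack([x0, x1, x0 ** 2, x1 ** 2, x0 * x1,
                            np.ones(X.shape[0])])
```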

## Logistic regression

To classify objects we will estimate the probability that an object belongs to class ‘1’. To predict this probability we will pass the output of a linear model through the logistic function:

def probability(X, w):
    """
    Given input features and weights,
    return predicted probabilities of y==1 given x, P(y=1|x), see description above.

    :param X: feature matrix X of shape [n_samples,6] (expanded)
    :param w: weight vector w of shape [6] for each of the expanded features
    :returns: an array of predicted probabilities in [0,1] interval.
    """
    return 1. / (1 + np.exp(-np.dot(X, w)))

In logistic regression the optimal parameters w are found by cross-entropy minimization:

def compute_loss(X, y, w):
    """
    Given feature matrix X [n_samples,6], target vector [n_samples] of 1/0,
    and weight vector w [6], compute scalar loss function using formula above.
    """
    return -np.mean(y*np.log(probability(X, w)) + (1-y)*np.log(1-probability(X, w)))

Since we train our model with gradient descent, we must compute gradients. To be specific, we need the derivative of the loss function with respect to each weight:

Here is the derivation:

def compute_grad(X, y, w):
    """
    Given feature matrix X [n_samples,6], target vector [n_samples] of 1/0,
    and weight vector w [6], compute vector [6] of derivatives of L over each weight.
    """
    return np.dot((probability(X, w) - y), X) / X.shape[0]
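A quick sanity check for compute_grad (which implements the gradient (1/n)·Xᵀ(p − y)) is to compare it against a central finite-difference approximation of the loss. This sketch re-defines the three functions above so it stands alone:

```python
import numpy as np

def probability(X, w):
    return 1.0 / (1.0 + np.exp(-np.dot(X, w)))

def compute_loss(X, y, w):
    p = probability(X, w)
    return -np.mean(y * np.log(p) + (1 - y) * np.log(1 - p))

def compute_grad(X, y, w):
    return np.dot(probability(X, w) - y, X) / X.shape[0]

# compare the analytic gradient to central finite differences
rng = np.random.RandomState(0)
X = rng.randn(20, 6)
y = rng.randint(0, 2, 20).astype(float)
w = rng.randn(6)
eps = 1e-6
numeric = np.array([
    (compute_loss(X, y, w + eps * np.eye(6)[j]) -
     compute_loss(X, y, w - eps * np.eye(6)[j])) / (2 * eps)
    for j in range(6)
])
assert np.allclose(numeric, compute_grad(X, y, w), atol=1e-5)
```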

## Training

In this section we’ll use the functions we wrote to train our classifier with stochastic gradient descent. We shall try changing hyper-parameters such as batch size and learning rate to find the best settings.

## Mini-batch SGD

Stochastic gradient descent just takes a random mini-batch on each iteration, calculates the gradient of the loss on it and makes a step:

w = np.array([0, 0, 0, 0, 0, 1])  # initialize

eta = 0.05  # learning rate
n_iter = 100
batch_size = 4
loss = np.zeros(n_iter)

for i in range(n_iter):
    ind = np.random.choice(X_expanded.shape[0], batch_size)
    loss[i] = compute_loss(X_expanded, y, w)
    dw = compute_grad(X_expanded[ind, :], y[ind], w)
    w = w - eta * dw

The following animation shows how the decision surface and the cross-entropy loss function changes with different batches with SGD where batch-size=4.

## SGD with momentum

Momentum is a method that helps accelerate SGD in the relevant direction and dampens oscillations, as can be seen in the image below. It does this by adding a fraction α of the update vector of the past time step to the current update vector.

eta = 0.05   # learning rate
alpha = 0.9  # momentum
nu = np.zeros_like(w)
n_iter = 100
batch_size = 4
loss = np.zeros(n_iter)

for i in range(n_iter):
    ind = np.random.choice(X_expanded.shape[0], batch_size)
    loss[i] = compute_loss(X_expanded, y, w)
    dw = compute_grad(X_expanded[ind, :], y[ind], w)
    nu = alpha * nu + eta * dw
    w = w - nu

The following animation shows how the decision surface and the cross-entropy loss function changes with different batches with SGD + momentum  where batch-size=4. As can be seen, the loss function drops much faster, leading to a faster convergence.

## RMSprop

We also need to implement the RMSProp algorithm, which uses a moving average of squared gradients to adjust the learning rate as follows:

eta = 0.05   # learning rate
alpha = 0.9  # moving-average decay rate
G = np.zeros_like(w)
eps = 1e-8
n_iter = 100
batch_size = 4
loss = np.zeros(n_iter)

for i in range(n_iter):
    ind = np.random.choice(X_expanded.shape[0], batch_size)
    loss[i] = compute_loss(X_expanded, y, w)
    dw = compute_grad(X_expanded[ind, :], y[ind], w)
    G = alpha * G + (1 - alpha) * dw ** 2
    w = w - eta * dw / np.sqrt(G + eps)

The following animation shows how the decision surface and the cross-entropy loss function changes with different batches with SGD + RMSProp where batch-size=4. As can be seen again, the loss function drops much faster, leading to a faster convergence.

# 2. Planar data classification with a neural network with one hidden layer, an implementation from scratch

In this assignment a neural net with a single hidden layer will be trained from scratch. We shall see a big difference between this model and the one implemented using logistic regression.

We shall learn how to:

• Implement a 2-class classification neural network with a single hidden layer
• Use units with a non-linear activation function, such as tanh
• Compute the cross entropy loss
• Implement forward and backward propagation

## Dataset

The following figure visualizes a “flower” 2-class dataset that we shall work on; the colors indicate the class labels. We have m = 400 training examples.

## Simple Logistic Regression

Before building a full neural network, let’s first see how logistic regression performs on this problem. We can use sklearn’s built-in functions for that, running the code below to train a logistic regression classifier on the dataset.

# Train the logistic regression classifier
clf = sklearn.linear_model.LogisticRegressionCV();
clf.fit(X.T, Y.T);

We can now plot the decision boundary of the model and accuracy with the following code.

# Plot the decision boundary for logistic regression
plot_decision_boundary(lambda x: clf.predict(x), X, Y)
plt.title("Logistic Regression")

# Print accuracy
LR_predictions = clf.predict(X.T)
print('Accuracy of logistic regression: %d ' % float((np.dot(Y, LR_predictions) +
      np.dot(1 - Y, 1 - LR_predictions)) / float(Y.size) * 100) +
      '% ' + "(percentage of correctly labelled datapoints)")

Accuracy of logistic regression: 47 % (percentage of correctly labelled datapoints)


Interpretation: The dataset is not linearly separable, so logistic regression doesn’t perform well. Hopefully a neural network will do better. Let’s try this now!

## Neural Network model

Logistic regression did not work well on the “flower dataset”. We are going to train a Neural Network with a single hidden layer, by implementing the network with python numpy from scratch.

Here is our model:

The general methodology to build a Neural Network is to:

1. Define the neural network structure ( # of input units, # of hidden units, etc.)
2. Initialize the model's parameters
3. Loop:
- Implement forward propagation
- Compute loss
- Implement backward propagation to get the gradients
- Update parameters (gradient descent)


### Defining the neural network structure

Define three variables and the function layer_sizes:

- n_x: the size of the input layer
- n_h: the size of the hidden layer (set this to 4)
- n_y: the size of the output layer

def layer_sizes(X, Y):
    """
    Arguments:
    X -- input dataset of shape (input size, number of examples)
    Y -- labels of shape (output size, number of examples)

    Returns:
    n_x -- the size of the input layer
    n_h -- the size of the hidden layer
    n_y -- the size of the output layer
    """

### Initialize the model’s parameters

Implement the function initialize_parameters().

Instructions:

• Make sure the parameters’ sizes are right. Refer to the neural network figure above if needed.
• We will initialize the weights matrices with random values.
• Use: np.random.randn(a,b) * 0.01 to randomly initialize a matrix of shape (a,b).
• We will initialize the bias vectors as zeros.
• Use: np.zeros((a,b)) to initialize a matrix of shape (a,b) with zeros.

def initialize_parameters(n_x, n_h, n_y):
    """
    Argument:
    n_x -- size of the input layer
    n_h -- size of the hidden layer
    n_y -- size of the output layer

    Returns:
    params -- python dictionary containing the parameters:
    W1 -- weight matrix of shape (n_h, n_x)
    b1 -- bias vector of shape (n_h, 1)
    W2 -- weight matrix of shape (n_y, n_h)
    b2 -- bias vector of shape (n_y, 1)
    """


### The Loop

Implement forward_propagation().

Instructions:

• Look above at the mathematical representation of the classifier.
• We can use the function sigmoid().
• We can use the function np.tanh(). It is part of the numpy library.
• The steps we have to implement are:
1. Retrieve each parameter from the dictionary “parameters” (which is the output of initialize_parameters()) by using parameters[".."].
2. Implement Forward Propagation. Compute Z[1], A[1], Z[2] and A[2] (the vector of all the predictions on all the examples in the training set).
• Values needed in the backpropagation are stored in “cache“. The cache will be given as an input to the backpropagation function.

def forward_propagation(X, parameters):
    """
    Argument:
    X -- input data of size (n_x, m)
    parameters -- python dictionary containing the parameters (output of initialization function)

    Returns:
    A2 -- The sigmoid output of the second activation
    cache -- a dictionary containing "Z1", "A1", "Z2" and "A2"
    """


Implement compute_cost() to compute the value of the cost J.  There are many ways to implement the cross-entropy loss.

def compute_cost(A2, Y, parameters):
    """
    Computes the cross-entropy cost given in equation (13)

    Arguments:
    A2 -- The sigmoid output of the second activation, of shape (1, number of examples)
    Y -- "true" labels vector of shape (1, number of examples)
    parameters -- python dictionary containing the parameters W1, b1, W2 and b2

    Returns:
    cost -- cross-entropy cost given equation (13)
    """


Using the cache computed during forward propagation, we can now implement backward propagation.

Implement the function backward_propagation().

Instructions: Backpropagation is usually the hardest (most mathematical) part of deep learning. The following figure is taken from the lecture slide on backpropagation. We’ll want to use the six equations on the right of this slide, since we are building a vectorized implementation.

def backward_propagation(parameters, cache, X, Y):
    """
    Implement the backward propagation using the instructions above.

    Arguments:
    parameters -- python dictionary containing our parameters
    cache -- a dictionary containing "Z1", "A1", "Z2" and "A2".
    X -- input data of shape (2, number of examples)
    Y -- "true" labels vector of shape (1, number of examples)

    Returns:
    grads -- python dictionary containing the gradients with respect to different parameters
    """


Implement the update rule. Use gradient descent. We have to use (dW1, db1, dW2, db2) in order to update (W1, b1, W2, b2).

General gradient descent rule: θ = θ − α (∂J/∂θ), where α is the learning rate and θ represents a parameter.

Illustration: The gradient descent algorithm with a good learning rate (converging) and a bad learning rate (diverging). Images courtesy of Adam Harley.

def update_parameters(parameters, grads, learning_rate = 1.2):
    """
    Arguments:
    parameters -- python dictionary containing our parameters
    grads -- python dictionary containing our gradients

    Returns:
    parameters -- python dictionary containing our updated parameters
    """


### Integrate previous parts in nn_model()

Build the neural network model in nn_model().

Instructions: The neural network model has to use the previous functions in the right order.

def nn_model(X, Y, n_h, num_iterations = 10000, print_cost=False):
    """
    Arguments:
    X -- dataset of shape (2, number of examples)
    Y -- labels of shape (1, number of examples)
    n_h -- size of the hidden layer
    num_iterations -- Number of iterations in gradient descent loop
    print_cost -- if True, print the cost every 1000 iterations

    Returns:
    parameters -- parameters learnt by the model. They can then be used to predict.
    """


### Predictions

Use the model to predict by building predict(). Use forward propagation to predict results.

def predict(parameters, X):
    """
    Using the learned parameters, predicts a class for each example in X

    Arguments:
    parameters -- python dictionary containing our parameters
    X -- input data of size (n_x, m)

    Returns:
    predictions -- vector of predictions of our model (red: 0 / blue: 1)
    """


It is time to run the model and see how it performs on a planar dataset. Run the following code to test our model with a single hidden layer of n_h hidden units.

# Build a model with a n_h-dimensional hidden layer
parameters = nn_model(X, Y, n_h = 4, num_iterations = 10000, print_cost=True)

# Plot the decision boundary
plot_decision_boundary(lambda x: predict(parameters, x.T), X, Y)
plt.title("Decision Boundary for hidden layer size " + str(4))

Cost after iteration 0: 0.693048
Cost after iteration 1000: 0.288083
Cost after iteration 2000: 0.254385
Cost after iteration 3000: 0.233864
Cost after iteration 4000: 0.226792
Cost after iteration 5000: 0.222644
Cost after iteration 6000: 0.219731
Cost after iteration 7000: 0.217504
Cost after iteration 8000: 0.219471
Cost after iteration 9000: 0.218612

# Print accuracy
predictions = predict(parameters, X)
print('Accuracy: %d' % float((np.dot(Y, predictions.T) +
      np.dot(1 - Y, 1 - predictions.T)) / float(Y.size) * 100) + '%')

Accuracy: 90%

Accuracy is really high compared to Logistic Regression. The model has learnt the leaf patterns of the flower! Neural networks are able to learn even highly non-linear decision boundaries, unlike logistic regression.

Now, let’s try out several hidden layer sizes. We can observe different behaviors of the model for various hidden layer sizes. The results are shown below.

### Tuning hidden layer size

Accuracy for 1 hidden units: 67.5 %
Accuracy for 2 hidden units: 67.25 %
Accuracy for 3 hidden units: 90.75 %
Accuracy for 4 hidden units: 90.5 %
Accuracy for 5 hidden units: 91.25 %
Accuracy for 20 hidden units: 90.0 %
Accuracy for 50 hidden units: 90.25 %


Interpretation:

• The larger models (with more hidden units) are able to fit the training set better, until eventually the largest models overfit the data.
• The best hidden layer size seems to be around n_h = 5. Indeed, a value around here seems to fit the data well without incurring noticeable overfitting.
• We shall also learn later about regularization, which lets us use very large models (such as n_h = 50) without much overfitting.

# 3. Getting deeper with Keras

• Tensorflow is a powerful and flexible tool, but coding large neural architectures with it is tedious.
• There are plenty of deep learning toolkits that work on top of it, like Slim, TFLearn, Sonnet and Keras.
• The choice is a matter of taste and the particular task.
• We’ll be using Keras to predict handwritten digits with the MNIST dataset.
• The following figure shows 225 sample images from the dataset.

## The pretty keras

Using only the following few lines of code we can build and train a simple deep neural net with 3 dense hidden layers with ReLU activations, and dropout 0.5 after each dense layer.

import keras
from keras.models import Sequential
import keras.layers as ll

model = Sequential(name="mlp")

model.add(ll.InputLayer([28, 28]))
model.add(ll.Flatten())

# network body: three dense hidden layers, each followed by ReLU and dropout
for _ in range(3):
    model.add(ll.Dense(128))
    model.add(ll.Activation('relu'))
    model.add(ll.Dropout(0.5))

# output layer: 10 neurons for each class with softmax
model.add(ll.Dense(10, activation='softmax'))

# categorical_crossentropy is our good old crossentropy,
# but applied to one-hot-encoded vectors
model.compile("adam", "categorical_crossentropy", metrics=["accuracy"])

The following shows the summary of the model:

_________________________________________________________________
Layer (type)                 Output Shape              Param #
=================================================================
input_12 (InputLayer)        (None, 28, 28)            0
_________________________________________________________________
flatten_12 (Flatten)         (None, 784)               0
_________________________________________________________________
dense_35 (Dense)             (None, 128)               100480
_________________________________________________________________
activation_25 (Activation)   (None, 128)               0
_________________________________________________________________
dropout_22 (Dropout)         (None, 128)               0
_________________________________________________________________
dense_36 (Dense)             (None, 128)               16512
_________________________________________________________________
activation_26 (Activation)   (None, 128)               0
_________________________________________________________________
dropout_23 (Dropout)         (None, 128)               0
_________________________________________________________________
dense_37 (Dense)             (None, 128)               16512
_________________________________________________________________
activation_27 (Activation)   (None, 128)               0
_________________________________________________________________
dropout_24 (Dropout)         (None, 128)               0
_________________________________________________________________
dense_38 (Dense)             (None, 10)                1290
=================================================================
Total params: 134,794
Trainable params: 134,794
Non-trainable params: 0
_________________________________________________________________


### Model interface

Keras models follow Scikit-learn's interface of fit/predict with some notable extensions. Let's take a tour.

# fit(X,y) ships with a neat automatic logging.
#          Highly customizable under the hood.
model.fit(X_train, y_train,
          validation_data=(X_val, y_val), epochs=13);

Train on 50000 samples, validate on 10000 samples
Epoch 1/13
50000/50000 [==============================] - 14s - loss: 0.1489 - acc: 0.9587 - val_loss: 0.0950 - val_acc: 0.9758
Epoch 2/13
50000/50000 [==============================] - 12s - loss: 0.1543 - acc: 0.9566 - val_loss: 0.0957 - val_acc: 0.9735
Epoch 3/13
50000/50000 [==============================] - 11s - loss: 0.1509 - acc: 0.9586 - val_loss: 0.0985 - val_acc: 0.9752
Epoch 4/13
50000/50000 [==============================] - 11s - loss: 0.1515 - acc: 0.9577 - val_loss: 0.0967 - val_acc: 0.9752
Epoch 5/13
50000/50000 [==============================] - 11s - loss: 0.1471 - acc: 0.9596 - val_loss: 0.1008 - val_acc: 0.9737
Epoch 6/13
50000/50000 [==============================] - 11s - loss: 0.1488 - acc: 0.9598 - val_loss: 0.0989 - val_acc: 0.9749
Epoch 7/13
50000/50000 [==============================] - 11s - loss: 0.1495 - acc: 0.9592 - val_loss: 0.1011 - val_acc: 0.9748
Epoch 8/13
50000/50000 [==============================] - 11s - loss: 0.1434 - acc: 0.9604 - val_loss: 0.1005 - val_acc: 0.9761
Epoch 9/13
50000/50000 [==============================] - 11s - loss: 0.1514 - acc: 0.9590 - val_loss: 0.0951 - val_acc: 0.9759
Epoch 10/13
50000/50000 [==============================] - 11s - loss: 0.1424 - acc: 0.9613 - val_loss: 0.0995 - val_acc: 0.9739
Epoch 11/13
50000/50000 [==============================] - 11s - loss: 0.1408 - acc: 0.9625 - val_loss: 0.0977 - val_acc: 0.9751
Epoch 12/13
50000/50000 [==============================] - 11s - loss: 0.1413 - acc: 0.9601 - val_loss: 0.0938 - val_acc: 0.9753
Epoch 13/13
50000/50000 [==============================] - 11s - loss: 0.1430 - acc: 0.9619 - val_loss: 0.0981 - val_acc: 0.9761

As we can see, with a simple model without any convolution layers we can obtain more than 97.5% accuracy on the validation dataset.

The following figures show the weights learnt at different layers.

# Some Tips & tricks to improve accuracy

Here are some tips on what we can do to improve accuracy:

• Network size
  • More neurons
  • More layers (docs)
  • Larger networks may take more epochs to train, so don’t discard the net just because it didn’t beat the baseline in 5 epochs.
• Nonlinearities in the hidden layers
  • tanh, relu, leaky relu, etc.
• Early stopping
  • Training for 100 epochs regardless of anything is probably a bad idea.
  • Some networks converge over 5 epochs, others over 500.
  • Way to go: stop when the validation score stops improving, e.g. 10 iterations past its maximum.
• Faster optimization
  • Converge faster and sometimes reach better optima.
  • It might make sense to tweak the learning rate/momentum, other learning parameters, batch size and number of epochs.
• Regularize to prevent overfitting
• Data augmentation: getting a 5x larger dataset for free is a great deal
  • https://keras.io/preprocessing/image/
  • Zoom-in + slice = move
  • Rotate + zoom (to remove black stripes)
  • Any other perturbations
  • A simple way to do that (if we have PIL/Image): from scipy.misc import imrotate, imresize, plus a bit of slicing.
  • Stay realistic: there’s usually no point in flipping dogs upside down, as that is not the way we usually see them.

# Some Data Processing and Analysis with Python

The following problems appeared as assignments in the edX course Analytics for Computing (by Gatech). The descriptions are taken from the assignments.

# 1. Association rule mining

First we shall implement the basic pairwise association rule mining algorithm.

## Problem definition

Let’s say we have a fragment of text in some language. We wish to know whether there are association rules among the letters that appear in a word. In this problem:

• Words are “receipts”
• Letters within a word are “items”

We want to know whether there are association rules of the form a ⟹ b, where a and b are letters, for a given language, by calculating for each rule its confidence, conf(a ⟹ b), which is an estimate of the conditional probability of b given a, i.e. Pr[b|a].

## Sample text input

Let’s carry out this analysis on a “dummy” text fragment, which graphic designers refer to as the lorem ipsum:

latin_text= """
Sed ut perspiciatis, unde omnis iste natus error sit voluptatem accusantium doloremque laudantium, totam
rem aperiam eaque ipsa, quae ab illo inventore veritatis et quasi architecto beatae vitae dicta
sunt, explicabo. Nemo enim ipsam voluptatem, quia voluptas sit, aspernatur aut odit aut fugit, sed
quia consequuntur magni dolores eos, qui ratione voluptatem sequi nesciunt, neque porro quisquam est,
qui dolorem ipsum, quia dolor sit amet consectetur adipisci[ng] velit, sed quia non numquam [do] eius
modi tempora inci[di]dunt, ut labore et dolore magnam aliquam quaerat voluptatem. Ut enim ad minima
veniam, quis nostrum exercitationem ullam corporis suscipit laboriosam, nisi ut aliquid ex ea commodi
consequatur? Quis autem vel eum iure reprehenderit, qui in ea voluptate velit esse, quam nihil molestiae
consequatur, vel illum, qui dolorem eum fugiat, quo voluptas nulla pariatur?

At vero eos et accusamus et iusto odio dignissimos ducimus, qui blanditiis praesentium voluptatum
deleniti atque corrupti, quos dolores et quas molestias excepturi sint, obcaecati cupiditate non
provident, similique sunt in culpa, qui officia deserunt mollitia animi, id est laborum et dolorum
fuga. Et harum quidem rerum facilis est et expedita distinctio. Nam libero tempore, cum soluta nobis est
eligendi optio, cumque nihil impedit, quo minus id, quod maxime placeat, facere possimus, omnis voluptas
assumenda est, omnis dolor repellendus. Temporibus autem quibusdam et aut officiis debitis aut rerum
necessitatibus saepe eveniet, ut et voluptates repudiandae sint et molestiae non recusandae. Itaque
earum rerum hic tenetur a sapiente delectus, ut aut reiciendis voluptatibus maiores alias consequatur
aut perferendis doloribus asperiores repellat.
"""

## Data cleaning

Like most data in the real world, this dataset is noisy. It has both uppercase and lowercase letters, words have repeated letters, and there are all sorts of non-alphabetic characters. For our analysis, we should keep all the letters and spaces (so we can identify distinct words), but we should ignore case and ignore repetition within a word.

For example, the eighth word of this text is “error.” As an itemset, it consists of the three unique letters {e, o, r}. That is, we treat the word as a set, keeping only the unique letters. This itemset has three possible item pairs: {e,o}, {e,r}, and {o,r}.

• We need to start by “cleaning up” (normalizing) the input, with all characters converted to lowercase and all non-alphabetic, non-space characters removed.
• Next, we need to convert each word into an itemset like the following examples:

consequatur –> {‘a’, ‘e’, ‘c’, ‘s’, ‘n’, ‘o’, ‘u’, ‘t’, ‘q’, ‘r’}
voluptatem –> {‘l’, ‘a’, ‘e’, ‘o’, ‘p’, ‘u’, ‘m’, ‘t’, ‘v’}
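A sketch of this normalization step (the names normalize_string and make_itemsets are chosen here for illustration, not given by the assignment):

```python
import re

def normalize_string(s):
    # lowercase, then keep only letters and whitespace
    return re.sub(r'[^a-z\s]', '', s.lower())

def make_itemsets(text):
    # one set of unique letters per word
    return [set(word) for word in normalize_string(text).split()]
```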

## Implementing the basic algorithm

The following algorithm is implemented:

First, all item pairs within an itemset are enumerated, and a table that tracks the counts of those item pairs is updated in place.

• Next, given the table of item-pair counts and the table of individual item counts, as well as a confidence threshold, the rules that meet the threshold are returned.
• The returned rules are in the form of a dictionary whose key is the tuple (a, b), corresponding to the rule a ⇒ b, and whose value is the confidence of the rule, conf(a ⇒ b).
• The following functions were implemented to compute the association rules.

from collections import defaultdict
from itertools import combinations

def update_pair_counts(pair_counts, itemset):
    """
    Updates a dictionary of pair counts for all pairs of items
    in a given itemset.
    """
    assert type(pair_counts) is defaultdict
    for item in combinations(itemset, 2):
        pair_counts[item] += 1
        pair_counts[item[::-1]] += 1
    return pair_counts

def update_item_counts(item_counts, itemset):
    for item in itemset:
        item_counts[item] += 1
    return item_counts

def filter_rules_by_conf(pair_counts, item_counts, threshold):
    rules = {}  # (item_a, item_b) -> conf (item_a => item_b)
    for (item_a, item_b) in pair_counts:
        conf = pair_counts[(item_a, item_b)] / float(item_counts[item_a])
        if conf >= threshold:
            rules[(item_a, item_b)] = conf
    return rules

def find_assoc_rules(receipts, threshold):
    pair_counts = defaultdict(int)
    item_counts = defaultdict(int)
    for receipt in receipts:
        update_pair_counts(pair_counts, receipt)
        update_item_counts(item_counts, receipt)
    return filter_rules_by_conf(pair_counts, item_counts, threshold)

• For the Latin string, latin_text, the function find_assoc_rules() was used to compute the rules whose confidence is at least 0.75, obtaining the following rules as a result.

conf(q => u) = 1.000
conf(x => e) = 1.000
conf(x => i) = 0.833
conf(h => i) = 0.833
conf(v => t) = 0.818
conf(r => e) = 0.800
conf(v => e) = 0.773
conf(f => i) = 0.750
conf(b => i) = 0.750
conf(g => i) = 0.750

• Now, let’s look at the rules common to the above Latin text and the English text obtained by a translation of the lorem ipsum, shown below:

english_text = """
But I must explain to you how all this mistaken idea of denouncing of a pleasure and praising pain was
born and I will give you a complete account of the system, and expound the actual teachings of the great
explorer of the truth, the master-builder of human happiness. No one rejects, dislikes, or avoids
pleasure itself, because it is pleasure, but because those who do not know how to pursue pleasure
rationally encounter consequences that are extremely painful. Nor again is there anyone who loves or
pursues or desires to obtain pain of itself, because it is pain, but occasionally circumstances occur in
which toil and pain can procure him some great pleasure. To take a trivial example, which of us
ever undertakes laborious physical exercise, except to obtain some advantage from it? But who has any
right to find fault with a man who chooses to enjoy a pleasure that has no annoying consequences, or
one who avoids a pain that produces no resultant pleasure?

On the other hand, we denounce with righteous indignation and dislike men who are so beguiled and
demoralized by the charms of pleasure of the moment, so blinded by desire, that they cannot foresee the
pain and trouble that are bound to ensue; and equal blame belongs to those who fail in their duty
through weakness of will, which is the same as saying through shrinking from toil and pain. These
cases are perfectly simple and easy to distinguish. In a free hour, when our power of choice is
untrammeled and when nothing prevents our being able to do what we like best, every pleasure is to
be welcomed and every pain avoided. But in certain circumstances and owing to the claims of duty or
the obligations of business it will frequently occur that pleasures have to be repudiated and
annoyances accepted. The wise man therefore always holds in these matters to this principle of
selection: he rejects pleasures to secure other greater pleasures, or else he endures pains to
avoid worse pains.
"""

• Again, for the English string, english_text, the function find_assoc_rules()  was used to compute the rules whose confidence is at least 0.75, with the following rules obtained as result.

conf(z => a) = 1.000
conf(j => e) = 1.000
conf(z => o) = 1.000
conf(x => e) = 1.000
conf(q => e) = 1.000
conf(q => u) = 1.000
conf(z => m) = 1.000
conf(z => r) = 1.000
conf(z => l) = 1.000
conf(z => e) = 1.000
conf(z => d) = 1.000
conf(z => i) = 1.000
conf(k => e) = 0.778
conf(q => n) = 0.750

• Let’s consider any rule with a confidence of at least 0.75 to be a “high-confidence rule”. The common_high_conf_rules are all the high-confidence rules appearing in both the Latin text and the English text. They are shown below.

High-confidence rules common to _lorem ipsum_ in Latin and English:

q => u
x => e

• The following table and figure show the high-confidence rules for the Latin and English texts.

| index | rule   | confidence | language |
|-------|--------|------------|----------|
| 0     | z => o | 1.000000   | English  |
| 1     | z => l | 1.000000   | English  |
| 2     | z => m | 1.000000   | English  |
| 3     | q => u | 1.000000   | English  |
| 4     | q => e | 1.000000   | English  |
| 5     | x => e | 1.000000   | English  |
| 6     | z => e | 1.000000   | English  |
| 7     | j => e | 1.000000   | English  |
| 8     | z => a | 1.000000   | English  |
| 9     | z => d | 1.000000   | English  |
| 10    | q => u | 1.000000   | Latin    |
| 11    | z => i | 1.000000   | English  |
| 12    | x => e | 1.000000   | Latin    |
| 13    | z => r | 1.000000   | English  |
| 14    | x => i | 0.833333   | Latin    |
| 15    | h => i | 0.833333   | Latin    |
| 16    | v => t | 0.818182   | Latin    |
| 17    | r => e | 0.800000   | Latin    |
| 18    | k => e | 0.777778   | English  |
| 19    | v => e | 0.772727   | Latin    |
| 20    | g => i | 0.750000   | Latin    |
| 21    | q => n | 0.750000   | English  |
| 22    | f => i | 0.750000   | Latin    |
| 23    | b => i | 0.750000   | Latin    |

## Putting it all together: Actual baskets!

Let’s take a look at some real data  from this link. First few lines of the transaction data is shown below:

tropical fruit,yogurt,coffee
whole milk
other vegetables,whole milk,condensed milk,long life bakery product
whole milk,butter,yogurt,rice,abrasive cleaner
rolls/buns
other vegetables,UHT-milk,rolls/buns,bottled beer,liquor (appetizer)
pot plants
whole milk,cereals
citrus fruit,tropical fruit,whole milk,butter,curd,yogurt,flour,bottled water,dishes
beef
frankfurter,rolls/buns,soda
chicken,tropical fruit
butter,sugar,fruit/vegetable juice,newspapers
fruit/vegetable juice
packaged fruit/vegetables
chocolate
specialty bar
other vegetables
butter milk,pastry
whole milk
tropical fruit,cream cheese ,processed cheese,detergent,newspapers
tropical fruit,root vegetables,other vegetables,frozen dessert,rolls/buns,flour,sweet spreads,salty snack,waffles,candy,bathroom cleaner
bottled water,canned beer
yogurt
sausage,rolls/buns,soda,chocolate
other vegetables
yogurt,beverages,bottled water,specialty bar

• Our task is to mine this dataset for pairwise association rules to produce a final dictionary, basket_rules, that meets these conditions:
1. The keys are pairs (a, b), where a and b are item names.
2. The values are the corresponding confidence scores, conf(a ⇒ b).
3. Only include rules a ⇒ b where item a occurs at least MIN_COUNT times and conf(a ⇒ b) is at least THRESHOLD.
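A minimal sketch of how such pairwise rules can be mined from a list of baskets (find_assoc_rules is an assumed name; the actual assignment code may differ):

```python
from collections import defaultdict
from itertools import permutations

def find_assoc_rules(baskets, threshold, min_count=1):
    """Mine pairwise rules a => b with conf(a => b) = count(a, b) / count(a)."""
    item_counts = defaultdict(int)
    pair_counts = defaultdict(int)
    for basket in baskets:
        items = set(basket)               # ignore duplicates within a basket
        for a in items:
            item_counts[a] += 1
        for a, b in permutations(items, 2):
            pair_counts[(a, b)] += 1
    rules = {}
    for (a, b), n_ab in pair_counts.items():
        if item_counts[a] >= min_count:
            conf = n_ab / item_counts[a]
            if conf >= threshold:
                rules[(a, b)] = conf
    return rules
```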

The result is shown below:

Found 19 rules whose confidence exceeds 0.5.
Here they are:

conf(honey => whole milk) = 0.733
conf(frozen fruits => other vegetables) = 0.667
conf(cereals => whole milk) = 0.643
conf(rice => whole milk) = 0.613
conf(rubbing alcohol => whole milk) = 0.600
conf(cocoa drinks => whole milk) = 0.591
conf(pudding powder => whole milk) = 0.565
conf(jam => whole milk) = 0.547
conf(cream => sausage) = 0.538
conf(cream => other vegetables) = 0.538
conf(baking powder => whole milk) = 0.523
conf(tidbits => rolls/buns) = 0.522
conf(rice => other vegetables) = 0.520
conf(cooking chocolate => whole milk) = 0.520
conf(frozen fruits => whipped/sour cream) = 0.500
conf(specialty cheese => other vegetables) = 0.500
conf(ready soups => rolls/buns) = 0.500
conf(rubbing alcohol => butter) = 0.500
conf(rubbing alcohol => citrus fruit) = 0.500

# 2. Simple string processing with Regex

## Phone numbers

• Write a function to parse US phone numbers written in the canonical “(404) 555-1212” format, i.e., a three-digit area code enclosed in parentheses followed by a seven-digit local number in three-hyphen-four digit format.
• It should also ignore all leading and trailing spaces, as well as any spaces that appear between the area code and local numbers.
• However, it should not accept any spaces in the area code (e.g., in ‘(404)’) nor should it in the local number.
• It should return a triple of strings, (area_code, first_three, last_four).
• If the input is not a valid phone number, it should raise a ValueError.
The following function implements the regex parser.

import re

def parse_phone(s):
    pattern = re.compile(r"\s*\((\d{3})\)\s*(\d{3})-(\d{4})\s*")
    m = pattern.match(s)
    if not m:
        raise ValueError('not a valid phone number!')
    return m.groups()

#print(parse_phone('(404) 201-2121'))
#print(parse_phone('404-201-2121'))

• Implement an enhanced phone number parser that can handle any of these patterns.
• (404) 555-1212
• (404) 5551212
• 404-555-1212
• 404-5551212
• 404555-1212
• 4045551212
• As before, it should not be sensitive to leading or trailing spaces. Also, for the patterns in which the area code is enclosed in parentheses, it should not be sensitive to the number of spaces separating the area code from the remainder of the number.

The following function implements the enhanced regex parser.

import re

def parse_phone2(s):
    pattern = re.compile(r"\s*\((\d{3})\)\s*(\d{3})-?(\d{4})\s*")
    m = pattern.match(s)
    if not m:
        pattern2 = re.compile(r"\s*(\d{3})-?(\d{3})-?(\d{4})\s*")
        m = pattern2.match(s)
    if not m:
        raise ValueError('not a valid phone number!')
    return m.groups()
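The same six formats can also be handled with a single compiled pattern; a sketch (parse_phone_alt is a hypothetical name, not part of the assignment):

```python
import re

# Either a parenthesized area code followed by optional spaces, or a bare
# three-digit area code with an optional hyphen; then the local number
# with an optional hyphen between its two halves.
PHONE_RE = re.compile(r"\s*(?:\((\d{3})\)\s*|(\d{3})-?)(\d{3})-?(\d{4})\s*$")

def parse_phone_alt(s):
    m = PHONE_RE.match(s)
    if not m:
        raise ValueError('not a valid phone number!')
    area = m.group(1) or m.group(2)   # whichever alternative matched
    return (area, m.group(3), m.group(4))
```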


# 3. Tidy data and the Pandas

“Tidying data” is all about cleaning up tabular data for analysis purposes.

Definition: Tidy datasets. More specifically, Wickham defines a tidy data set as one that can be organized into a 2-D table such that

1. each column represents a variable;
2. each row represents an observation;
3. each entry of the table represents a single value, which may come from either categorical (discrete) or continuous spaces.

Definition: Tibbles. If a table is tidy, we will call it a tidy table, or tibble, for short.

### Apply functions to data frames

Given the following pandas DataFrame (the first few rows are shown in the next table), compute the prevalence, i.e., the ratio of cases to population, using the apply() function, without modifying the original DataFrame.

country year cases population
0 Afghanistan ’99 745 19987071
1 Brazil ’99 37737 172006362
2 China ’99 212258 1272915272
3 Afghanistan ’00 2666 20595360
4 Brazil ’00 80488 174504898
5 China ’00 213766 1280428583

The following function does exactly that.

def calc_prevalence(G):
    assert 'cases' in G.columns and 'population' in G.columns
    H = G.copy()
    H['prevalence'] = H.apply(lambda row: row['cases'] / row['population'], axis=1)
    return H
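For this row-wise arithmetic, apply() is not actually needed; a vectorized column division gives the same prevalence. A sketch on the first two rows of the table above:

```python
import pandas as pd

df = pd.DataFrame({'country': ['Afghanistan', 'Brazil'],
                   'year': ["'99", "'99"],
                   'cases': [745, 37737],
                   'population': [19987071, 172006362]})

H = df.copy()
# element-wise division of whole columns; no per-row lambda required
H['prevalence'] = H['cases'] / H['population']
```

Vectorized column arithmetic is also considerably faster than apply() on large frames.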

### Tibbles and Bits

Now let’s start creating and manipulating tibbles.

Write a function, canonicalize_tibble(X), that, given a tibble X, returns a new copy Y of X in canonical order. We say Y is in canonical order if it has the following properties.

1. The variables appear in sorted order by name, ascending from left to right.
2. The rows appear in lexicographically sorted order by variable, ascending from top to bottom.
3. The row labels (Y.index) go from 0 to n-1, where n is the number of observations.

The following code does exactly that:

def canonicalize_tibble(X):
    # Enforce Property 1: variables in sorted order
    var_names = sorted(X.columns)
    Y = X[var_names].copy()
    # Enforce Property 2: rows in lexicographically sorted order
    Y = Y.sort_values(by=var_names, ascending=True)
    # Enforce Property 3: row labels from 0 to n-1
    Y.reset_index(drop=True, inplace=True)
    return Y
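As a quick sanity check, two tibbles with the same contents but shuffled rows and columns should canonicalize to identical frames (the function above is repeated so the sketch is self-contained):

```python
import pandas as pd

def canonicalize_tibble(X):
    var_names = sorted(X.columns)           # Property 1: sorted columns
    Y = X[var_names].copy()
    Y = Y.sort_values(by=var_names)         # Property 2: sorted rows
    Y.reset_index(drop=True, inplace=True)  # Property 3: labels 0..n-1
    return Y

# same contents, different column order and row order
A = pd.DataFrame({'b': [2, 1], 'a': ['y', 'x']})
B = pd.DataFrame({'a': ['x', 'y'], 'b': [1, 2]})
assert canonicalize_tibble(A).equals(canonicalize_tibble(B))
```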

## Basic tidying transformations: Implementing Melting and Casting

Given a data set and a target set of variables, there are at least two common issues that require tidying.

## Melting

First, values often appear as columns. Table 4a is an example. To tidy up, we want to turn columns into rows:

Because this operation takes columns into rows, making a “fat” table taller and skinnier, it is sometimes called melting.

To melt the table, we need to do the following.

1. Extract the column values into a new variable. In this case, columns "1999" and "2000" of table4 need to become the values of the variable "year".
2. Convert the values associated with the column values into a new variable as well. In this case, the values formerly in columns "1999" and "2000" become the values of the "cases" variable.

In the context of a melt, let’s also refer to "year" as the new key variable and
"cases" as the new value variable.

Implement the melt operation as a function,

def melt(df, col_vals, key, value):
    ...


It should take the following arguments:

• df: the input data frame, e.g., table4 in the example above;
• col_vals: a list of the column names that will serve as values;
• key: name of the new variable, e.g., year in the example above;
• value: name of the column to hold the values.

The next function implements the melt operation:

def melt(df, col_vals, key, value):
    assert type(df) is pd.DataFrame
    # the fixed columns, i.e., everything that is not being melted
    other_cols = [c for c in df.columns if c not in col_vals]
    df2 = pd.DataFrame()
    for col in col_vals:
        df1 = pd.DataFrame(df[col].tolist(), columns=[value])
        df1[key] = col
        for col1 in other_cols:
            df1[col1] = df[col1]
        df2 = pd.concat([df2, df1], ignore_index=True)  # DataFrame.append() is deprecated
    df2 = df2[other_cols + [key, value]]
    return df2

with the following output

=== table4a ===

country 1999 2000
0 Afghanistan 745 2666
1 Brazil 37737 80488
2 China 212258 213766
=== melt(table4a) ===

country year cases
0 Afghanistan 1999 745
1 Brazil 1999 37737
2 China 1999 212258
3 Afghanistan 2000 2666
4 Brazil 2000 80488
5 China 2000 213766
=== table4b ===

country 1999 2000
0 Afghanistan 19987071 20595360
1 Brazil 172006362 174504898
2 China 1272915272 1280428583
=== melt(table4b) ===

country year population
0 Afghanistan 1999 19987071
1 Brazil 1999 172006362
2 China 1999 1272915272
3 Afghanistan 2000 20595360
4 Brazil 2000 174504898
5 China 2000 1280428583
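For comparison, pandas ships this operation built in as pd.melt(); a sketch on table4a (reconstructed inline), with id_vars as the fixed columns and value_vars as the columns to unpivot:

```python
import pandas as pd

table4a = pd.DataFrame({'country': ['Afghanistan', 'Brazil', 'China'],
                        '1999': [745, 37737, 212258],
                        '2000': [2666, 80488, 213766]})

# the built-in equivalent of the hand-rolled melt() above
tidy = pd.melt(table4a, id_vars=['country'], value_vars=['1999', '2000'],
               var_name='year', value_name='cases')
```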

## Casting

The second most common issue is that an observation might be split across multiple rows. Table 2 is an example. To tidy up, we want to merge rows:

Because this operation is the moral opposite of melting, and “rebuilds” observations from parts, it is sometimes called casting.

Melting and casting are Wickham’s terms from his original paper on tidying data. In his more recent writing, on which this tutorial is based, he refers to melting as gathering and to casting as spreading.

The signature of a cast is similar to that of melt. However, we only need to know the key, which is the column of the input table containing new variable names, and the value, which is the column containing the corresponding values.

Implement a function to cast a data frame into a tibble, given a key column containing new variable names and a value column containing the corresponding cells.

Observe that we are asking your cast() to accept an optional parameter, join_how, that may take the values 'outer' or 'inner' (with 'outer' as the default).

The following function implements the casting operation:

def cast(df, key, value, join_how='outer'):
    """Casts the input data frame into a tibble,
    given the key column and value column.
    """
    assert type(df) is pd.DataFrame
    assert key in df.columns and value in df.columns
    assert join_how in ['outer', 'inner']

    fixed_vars = df.columns.difference([key, value]).tolist()
    tibble = pd.DataFrame(columns=fixed_vars)  # empty frame

    cols = []
    for k, df1 in df.groupby(key):
        tibble = tibble.merge(df1[fixed_vars + [value]],
                              on=fixed_vars, how=join_how)
        cols.append(str(k))
    tibble.columns = fixed_vars + cols

    return tibble

with the following output:

=== table2 ===

country year type count
0 Afghanistan 1999 cases 745
1 Afghanistan 1999 population 19987071
2 Afghanistan 2000 cases 2666
3 Afghanistan 2000 population 20595360
4 Brazil 1999 cases 37737
5 Brazil 1999 population 172006362
6 Brazil 2000 cases 80488
7 Brazil 2000 population 174504898
8 China 1999 cases 212258
9 China 1999 population 1272915272
10 China 2000 cases 213766
11 China 2000 population 1280428583
=== tibble2 = cast (table2, "type", "count") ===

country year cases population
0 Afghanistan 1999 745 19987071
1 Afghanistan 2000 2666 20595360
2 Brazil 1999 37737 172006362
3 Brazil 2000 80488 174504898
4 China 1999 212258 1272915272
5 China 2000 213766 1280428583
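pandas offers a built-in counterpart via pivot_table (aggfunc='first' is used since each (country, year, type) combination is unique); a sketch on a slice of table2:

```python
import pandas as pd

table2 = pd.DataFrame({
    'country': ['Afghanistan', 'Afghanistan', 'Brazil', 'Brazil'],
    'year':    [1999, 1999, 1999, 1999],
    'type':    ['cases', 'population', 'cases', 'population'],
    'count':   [745, 19987071, 37737, 172006362]})

tibble2 = (table2
           .pivot_table(index=['country', 'year'], columns='type',
                        values='count', aggfunc='first')
           .reset_index())
tibble2.columns.name = None   # drop the leftover 'type' axis label
```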

## Separating variables

Consider the following table.

country year rate
0 Afghanistan 1999 745/19987071
1 Afghanistan 2000 2666/20595360
2 Brazil 1999 37737/172006362
3 Brazil 2000 80488/174504898
4 China 1999 212258/1272915272
5 China 2000 213766/1280428583

In this table, the rate variable combines what had previously been the cases and population data. This example is an instance in which we might want to separate a column into two variables.

Write a function that takes a data frame (df) and separates an existing column (key) into new variables (given by the list of new variable names, into).  How will the separation happen? The caller should provide a function, splitter(x), that given a value returns a list containing the components.

The following code implements the function:

import re

def default_splitter(text):
    """Searches the given string for all integer and floating-point
    values, returning them as a list _of strings_.

    E.g., the call

        default_splitter('Give me $10.52 in exchange for 91 kitten stickers.')

    will return ['10.52', '91'].
    """
    fields = re.findall(r'(\d+\.?\d*)', text)
    return fields

def separate(df, key, into, splitter=default_splitter):
    """Given a data frame, separates one of its columns, the key,
    into new variables.
    """
    assert type(df) is pd.DataFrame
    assert key in df.columns
    new_cols = df[key].apply(
        lambda s: pd.Series({into[i]: splitter(s)[i] for i in range(len(into))}))
    return df.merge(new_cols, left_index=True, right_index=True).drop(key, axis=1)

with the following output:

=== table3 ===

country year rate
0 Afghanistan 1999 745/19987071
1 Afghanistan 2000 2666/20595360
2 Brazil 1999 37737/172006362
3 Brazil 2000 80488/174504898
4 China 1999 212258/1272915272
5 China 2000 213766/1280428583
=== tibble3 = separate (table3, ...) ===

country year cases population
0 Afghanistan 1999 745 19987071
1 Afghanistan 2000 2666 20595360
2 Brazil 1999 37737 172006362
3 Brazil 2000 80488 174504898
4 China 1999 212258 1272915272
5 China 2000 213766 1280428583
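The same separation is essentially a one-liner with pandas' .str accessor; a sketch on a slice of table3:

```python
import pandas as pd

table3 = pd.DataFrame({'country': ['Afghanistan', 'Brazil'],
                       'year': [1999, 1999],
                       'rate': ['745/19987071', '37737/172006362']})

# split 'rate' on '/' into two string columns, then drop the original
parts = table3['rate'].str.split('/', expand=True)
parts.columns = ['cases', 'population']
tibble3 = pd.concat([table3.drop(columns='rate'), parts], axis=1)
```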

# Some Computational Photography: Image Quilting (Texture Synthesis) with Dynamic Programming and Texture Transfer (Drawing with Textures) in Python

The following problems appeared as a programming assignment in the Computational Photography course (CS445) at UIUC. The description of the problem is taken from the assignment itself. In this assignment, a python implementation of the problems will be described instead of the matlab implementation expected in the course.

## The Problems

• The goal of this assignment is to implement the image quilting algorithm for
texture synthesis and transfer, described in this SIGGRAPH 2001 paper by Efros
and Freeman.
• Texture synthesis is the creation of a larger texture image from a small sample.
• Texture transfer is giving an object the appearance of having the
same texture as a sample while preserving its basic shape.
• For texture synthesis, the main idea is to sample patches and lay them down in overlapping patterns, such that the overlapping regions are similar.
• The overlapping regions may not match exactly, which will result in noticeable edge artifacts. To fix this, we need to compute a path along pixels with similar intensities through the overlapping region and use it to decide which overlapping patch each pixel should be drawn from.
• Texture transfer is achieved by encouraging sampled patches to have similar appearance to a given target image, as well as matching overlapping regions of already sampled patches.
• In this project, we need to apply important techniques such as template matching, finding seams, and masking. These techniques are also useful for image stitching, image completion, image retargeting and blending.

### Randomly Sampled Texture

First let’s randomly sample square patches of size patchsize from sample in order to create an output image of size outsize. Start from the upper-left corner, and tile samples until the image is full. If the patches don’t fit evenly into the output image, we can leave black borders at the edges. This is the simplest but least effective method. Save a result from a sample image to compare to the next two methods.
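A minimal numpy sketch of this random tiling (quilt_random is an assumed name; the course starter code may differ):

```python
import numpy as np

def quilt_random(sample, outsize, patchsize, seed=0):
    """Tile a square output image with randomly sampled square patches.

    Patches are laid down left-to-right, top-to-bottom; if patches don't
    fit evenly, the leftover border stays black."""
    rng = np.random.default_rng(seed)
    out = np.zeros((outsize, outsize, sample.shape[2]))
    H, W = sample.shape[:2]
    for i in range(0, outsize - patchsize + 1, patchsize):
        for j in range(0, outsize - patchsize + 1, patchsize):
            r = rng.integers(0, H - patchsize + 1)
            c = rng.integers(0, W - patchsize + 1)
            out[i:i + patchsize, j:j + patchsize] = \
                sample[r:r + patchsize, c:c + patchsize]
    return out
```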

### Overlapping Patches

Let’s start by sampling a random patch for the upper-left corner. Then sample new patches to overlap with existing ones. For example, the second patch along the top row will overlap by patchsize pixels in the vertical direction and overlap pixels in the horizontal direction. Patches in the first column will overlap by patchsize pixels in the horizontal direction and overlap pixels in the vertical direction. Other patches will have two overlapping regions (on the top and left) which should both be taken into account. Once the cost of each patch has been computed, randomly choose one patch whose cost is less than a threshold determined by some tolerance value.

As described in the paper, the size of the block is the only parameter controlled by the user and it depends on the properties of a given texture; the block must be big enough to capture the relevant structures in the texture, but small enough so that the interaction between these structures is left up to the algorithm. The overlap size is taken to be one-sixth of the block size (B/6) in general.
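The tolerance-based random choice among low-cost patches can be sketched as follows (choose_sample is a hypothetical helper; costs is assumed to be a flat array of SSD overlap costs, one per candidate patch):

```python
import numpy as np

def choose_sample(costs, tol=0.1, seed=None):
    """Pick one candidate whose cost is within (1 + tol) of the minimum."""
    rng = np.random.default_rng(seed)
    min_cost = costs.min()
    # small epsilon keeps zero-cost candidates eligible
    candidates = np.flatnonzero(costs <= min_cost * (1 + tol) + 1e-12)
    return int(rng.choice(candidates))
```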

### Seam Finding

Next we need to find the min-cost contiguous path from the left to the right side of the patch according to the cost. The cost of a path through each pixel is the squared difference (summed over RGB for color images) between the output image and the newly sampled patch. Use dynamic programming to find the min-cost path.
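The dynamic program for this min-cost left-to-right path can be sketched as follows (min_cost_path is an assumed name; cost is the per-pixel squared-difference map of the overlap region):

```python
import numpy as np

def min_cost_path(cost):
    """Minimum-cost contiguous left-to-right path through a 2-D cost map.

    Returns one row index per column; each step moves to the same,
    upper, or lower row in the next column."""
    cost = np.asarray(cost, dtype=float)
    H, W = cost.shape
    M = cost.copy()                      # cumulative minimum cost
    for j in range(1, W):
        up = np.roll(M[:, j - 1], 1)     # up[i] = M[i - 1, j - 1]
        up[0] = np.inf
        down = np.roll(M[:, j - 1], -1)  # down[i] = M[i + 1, j - 1]
        down[-1] = np.inf
        M[:, j] += np.minimum(np.minimum(up, M[:, j - 1]), down)
    # backtrack from the cheapest cell in the last column
    path = [int(np.argmin(M[:, -1]))]
    for j in range(W - 1, 0, -1):
        i = path[-1]
        lo = max(i - 1, 0)
        window = M[lo:min(i + 2, H), j - 1]
        path.append(lo + int(np.argmin(window)))
    return path[::-1]
```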

The following figure describes the algorithm to be implemented for image quilting.

### Texture Transfer

The final task is to implement texture transfer, based on the quilt implementation for creating a texture sample that is guided by a pair of sample/target correspondence images (section 3 of the paper). The main difference between this function and
quilt function is that there is an additional cost term based on the difference between
the sampled source patch and the target patch at the location to be filled.

## Image quilting (texture synthesis) results

The following figures and animations show the results of the outputs obtained with the quilting algorithm. The input texture images are mostly taken from the paper.

Input sample Texture

100 sampled blocks of a fixed size (e.g. 50×50) from the input sample

The next animation shows how the large output texture gets created (100 times larger than the input sample texture) with the quilting algorithm.

Output Texture (10×10 times larger than the input) created with texture synthesis (quilting)

Input Texture

Output Texture (25 times larger than the input) created with texture synthesis (quilting) with the minimum cost seams (showed as red lines) computed with dynamic programming

Output Texture (25 times larger than the input) created with quilting

Input Texture

Output Texture (25 times larger than the input) created with quilting

Input Texture

Output Texture (25 times larger than the input) created with quilting

Input Texture

Output Texture (12 times larger than the input) created with quilting

Input Texture

Output Texture (25 times larger than the input) created with quilting

Input Texture

Output Texture (25 times larger than the input) created with quilting

Input Texture

Output Texture (36 times larger than the input) created with quilting

Input Texture

Output Texture (9 times larger than the input) created with quilting

Input Texture

Output Texture (25 times larger than the input) created with quilting

Input Texture

Output Texture (9 times larger than the input) created with quilting along with the min-cost seams (shown as red lines) computed with dynamic programming

Output Texture (9 times larger than the input) created with quilting

## Texture transfer results

The following figures and animations show how the texture from an input image can be transferred to the target image using the above-mentioned simple modification of the quilting algorithm. Again, some of the images are taken from the paper.

Input Texture (milk)

Target Image

Output Image after Texture Transfer

Input Texture (milk)

Target Image

Output Image after Texture Transfer

The following figures show the output image obtained when a few textures were transferred to my image.

Target Image (me)

Input Texture (fire)

Output Image after Texture Transfer  (with small block size)

Input Texture (cloud)

Output Image after Texture Transfer  (with small block size)

Input Texture (toast)

Output Image after Texture Transfer  (with small block size)

Now, applying some gradient mixing such as Poisson blending on the original image and on the one after texture transfer with the target toast image, we get the following two images respectively.

Input Texture

Output Image after Texture Transfer  (with small block size)

Input Texture

Output Image after Texture Transfer  (with small block size)

### Drawing with Textures

The following figures show the output image obtained when a few textures were transferred to another image of mine.

Target Image (me)

Input Texture

Output Images after Texture Transfer  (with 2 different patch sizes)

Input Texture

Output Images after Texture Transfer  (with 2 different patch sizes)

Input Texture

Output Image after Texture Transfer

Input Texture

Output Images after Texture Transfer  (with 2 different patch sizes)

Input Texture

Output Images after Texture Transfer  (with 2 different patch sizes)

Input Texture

Output Images after Texture Transfer (with 2 different patch sizes)

Input Texture

Output Image after Texture Transfer

Input Texture

Output Image after Texture Transfer

Input Texture

Output Image after Texture Transfer

Input Texture

Output Image after Texture Transfer

Input Texture

Output Image after Texture Transfer

Input Texture

Output Image after Texture Transfer  (with 2 different patch sizes)

Target Image (from The Mummy 1999)

Input Texture (sand)

Output Image after Texture Transfer

The following animation shows how the milk texture is transferred to the target image of mine with the modified quilting algorithm.

Input Texture

Target Image (me)

Output Image after Texture Transfer  (with small block size)

The next figures and animations show the output image obtained after the milk texture gets transferred to the target image of mine, for different block sizes of the samples (shown in red). As can be seen from the following outputs, the target image gets more and more prominent as the sampling block size gets smaller and smaller.

# Feature Detection with Harris Corner Detector and Matching images with Feature Descriptors in Python

The following problem appeared in a project in this Computer Vision Course (CS4670/5670, Spring 2015) at Cornell. In this article, a python implementation is going to be described. The description of the problem is taken (with some modifications) from the project description. The same problem appeared in this assignment problem as well. The images used for testing the algorithms implemented are mostly taken from these assignments / projects.

## The Problem

In this project, we need to detect discriminating features in an image and find the best matching features in other images.  The features should be reasonably invariant to translation, rotation, and illumination.  The slides presented in class can be used as reference.

## Description

The project has three parts:  feature detection, feature description, and feature matching.

### 1. Feature detection

In this step, we need to identify points of interest in the image using the Harris corner detection method.  The steps are as follows:

• For each point in the image, consider a window of pixels around that point.
• Compute the Harris matrix H for (the window around) that point, defined as

H = Σ_p w_p [ I_x(p)²  I_x(p)I_y(p) ; I_x(p)I_y(p)  I_y(p)² ]

where the summation is over all pixels p in the window, I_x(p) is the x derivative of the image at point p, and I_y(p) is the y derivative.
• The weights w_p are chosen to be circularly symmetric; a 9×9 Gaussian kernel with sigma 0.5 is chosen to achieve this.
• Note that H is a 2×2 matrix.  To find interest points, first we need to compute the corner strength function

c(H) = det(H) − κ (trace(H))²

• Once c is computed for every point in the image, we need to choose points where c is above a threshold.
• We also want c to be a local maximum in a 9×9 neighborhood (with non-maximum suppression).
• In addition to computing the feature locations, we need to compute a canonical orientation for each feature, and then store this orientation (in degrees) in each feature element.
• To compute the canonical orientation at each pixel, we need to compute the gradient of the blurred image and use the angle of the gradient as orientation.
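The detection steps above can be sketched with scipy's filters (harris_response is an assumed name; the course starter code differs, and a real implementation would also threshold and apply non-maximum suppression):

```python
import numpy as np
from scipy import ndimage

def harris_response(gray, sigma=0.5, kappa=0.04):
    """Per-pixel corner strength c(H) = det(H) - kappa * trace(H)^2.

    The entries of the Harris matrix H are Gaussian-weighted window sums
    of Ix^2, Iy^2 and Ix*Iy."""
    Ix = ndimage.sobel(gray, axis=1)                 # x derivative
    Iy = ndimage.sobel(gray, axis=0)                 # y derivative
    Ixx = ndimage.gaussian_filter(Ix * Ix, sigma)    # window sums of the
    Iyy = ndimage.gaussian_filter(Iy * Iy, sigma)    # products, weighted
    Ixy = ndimage.gaussian_filter(Ix * Iy, sigma)    # by a Gaussian
    det = Ixx * Iyy - Ixy ** 2
    trace = Ixx + Iyy
    return det - kappa * trace ** 2
```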

### 2. Feature description

• Now that the points of interest are identified,  the next step is to come up with a descriptor for the feature centered at each interest point.  This descriptor will be the representation to be used to compare features in different images to see if they match.
• In this article, we shall implement a simple descriptor: an 8×8 square window without orientation. This should be very easy to implement and should work well when the images we’re comparing are related by a translation. We also normalize the window to have zero mean and unit variance, in order to obtain illumination invariance.
• In order to obtain rotational invariance, the MOPS descriptor, which takes the orientation into account, could be used; it is not discussed in this article for the time being.
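A sketch of this descriptor (simple_descriptor is a hypothetical name; keypoints are (row, col) pairs assumed to lie far enough from the image border):

```python
import numpy as np

def simple_descriptor(gray, keypoints, size=8):
    """size x size window around each keypoint, normalized to zero mean
    and unit variance for illumination invariance (no orientation)."""
    half = size // 2
    descriptors = []
    for r, c in keypoints:
        window = gray[r - half:r + half, c - half:c + half].astype(float)
        window = window - window.mean()
        std = window.std()
        if std > 1e-8:                 # avoid dividing a flat patch by ~0
            window = window / std
        descriptors.append(window.ravel())
    return np.array(descriptors)
```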

### 3. Feature matching

• Now that the features in the image are detected and described, the next step is to write code to match them, i.e., given a feature in one image, find the best matching feature in one or more other images.
• The simplest approach is the following:  write a procedure that compares two features and outputs a distance between them.  For example, we simply sum the absolute value of differences between the descriptor elements.
• We then use this distance to compute the best match between a feature in one image and the set of features in another image by finding the one with the smallest distance. The distance used here is the Manhattan distance.
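The matching step above reduces to a nearest-neighbour search under the Manhattan distance; a broadcasting sketch (match_features is an assumed name):

```python
import numpy as np

def match_features(desc1, desc2):
    """For each row of desc1, index of the closest row of desc2 under the
    sum-of-absolute-differences (Manhattan) distance."""
    # pairwise distance matrix of shape (n1, n2) via broadcasting
    D = np.abs(desc1[:, None, :] - desc2[None, :, :]).sum(axis=2)
    return D.argmin(axis=1)
```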

The following theory and math for the Harris Corner Detection will be used that’s taken from this youtube video.

The following figure shows the structure of the python code to implement the algorithm.

## Feature Detection (with Harris Corner Detection): Results on a few images

• The threshold used for the Harris Corner Detection is varied (as shown in red in the following animations), with the value of the threshold being 10^x, where x (the common logarithm of the threshold) is displayed.
• The corner strength function with kappa=0.04 is used instead of the minimum eigenvalue (since it’s faster to compute).
• As can be seen from the following animations, fewer and fewer corner features are detected as the threshold is increased.
• The direction and magnitude of the feature is shown by the bounding (green) square’s angle with the horizontal and the size of the square respectively, computed from the gradient matrices.

Input Image

Harris Corner Features Detected for different threshold values (log10)

Input Image

The following figure shows the result of thresholding on

1. the Harris corner strength R values and
2. the minimum eigenvalue of the Harris matrix respectively,

for each pixel, before applying non-maximum suppression (computing the local maximum).

The next animation shows the features detected after applying non-maximum suppression, with different threshold values.

Harris Corner Features Detected for different threshold values (log10)

Input Image

Harris Corner Features Detected for different threshold values (log10)

Input Image

Harris Corner Features Detected for different threshold values (log10)

Input Image

Harris Corner Features Detected for different threshold values (log10)

## Computing Feature descriptors

• In this article, we shall implement a very simple descriptor: an 8×8 square window without orientation. This is expected to work well when the images being compared are related by a translation.
• We also normalize the window to have zero mean and unit variance, in order to obtain illumination invariance.
• In order to obtain rotational invariance, the MOPS descriptor, which takes the orientation into account, could be used; it is not discussed in this article for the time being.

## Matching Images with Detected Features: Results on a few images

• First the Harris corner features and the simple descriptors are computed for each of the images to be compared.
• Next, the distance between each pair of corner feature descriptors is computed, by simply summing the absolute values of the differences between the descriptor elements.
• This distance is then used to compute the best match between a feature in one image and the set of features in another image by finding the one with the smallest distance.
• The following examples show how the matching works with simple feature descriptors around the Harris corners for images obtained using mutual translations.

Input images (one is a translation of the other)

Harris Corner Features Detected for the images

Matched Features with minimum sum of absolute distance

Input images

Harris Corner Features Detected for the images

Matched Features with minimum sum of absolute distance

The following example shows input images to be compared that were created with more complex transformations (not only translation); as expected, the simple feature descriptor does not work well in this case, as shown. We need feature descriptors like SIFT to obtain robustness against rotation and scaling too.

Input images

Harris Corner Features Detected for the images

Matched Features with minimum sum of absolute distance

# Seam Carving: Using Dynamic Programming to implement Content-Aware Image Resizing in Python

The following problem appeared as an assignment in the Algorithm Course (COS 226) at Princeton University taught by Prof. Sedgewick.  The following description of the problem is taken from the assignment itself.

## The Seam Carving Problem

• Seam-carving is a content-aware image resizing technique where the image is reduced in size by one pixel of height (or width) at a time.
• A vertical seam in an image is a path of pixels connected from the top to the bottom with one pixel in each row.
• A horizontal seam is a path of pixels connected from the left to the right with one pixel in each column.
• Unlike standard content-agnostic resizing techniques (such as cropping and scaling), seam carving preserves the most interesting features (aspect ratio, set of objects present, etc.) of the image.
• In this assignment, a data type needs to be created that resizes a W-by-H image using the seam-carving technique.
• Finding and removing a seam involves three parts:
1. Energy calculation. The first step is to calculate the energy of a pixel, which is a measure of its importance: the higher the energy, the less likely that the pixel will be included as part of a seam (as you will see in the next step). In this assignment, we shall use the dual-gradient energy function, under which the energy of pixel (x, y) is given by

e(x, y) = Δx²(x, y) + Δy²(x, y)

where the square of the x-gradient Δx²(x, y) = Rx(x, y)² + Gx(x, y)² + Bx(x, y)², with Rx, Gx and Bx the differences in the red, green and blue components between pixel (x + 1, y) and pixel (x − 1, y); the square of the y-gradient Δy²(x, y) is defined analogously using pixels (x, y + 1) and (x, y − 1).
2. Seam identification. The next step is to find a vertical seam of minimum total energy. (Finding a horizontal seam is analogous.) This is similar to the classic shortest path problem in an edge-weighted digraph, but there are three important differences:
• The weights are on the vertices instead of the edges.
• The goal is to find the shortest path from any of the W pixels in the top row to any of the W pixels in the bottom row.
• The digraph is acyclic: there is a downward edge from pixel (x, y) to pixels (x − 1, y + 1), (x, y + 1), and (x + 1, y + 1), assuming that the coordinates are in the prescribed ranges.
• Also, seams cannot wrap around the image (e.g., a vertical seam cannot cross from the leftmost column of the image to the rightmost column).
• As described in the paper, the optimal seam can be found using dynamic programming. The first step is to traverse the image from the second row to the last row and compute the cumulative minimum energy M for all possible connected seams for each pixel (i, j):

M(i, j) = e(i, j) + min( M(i − 1, j − 1), M(i − 1, j), M(i − 1, j + 1) )
3. Seam removal. The final step is to remove from the image all of the pixels along the vertical or horizontal seam.
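The energy and cumulative-energy steps above can be sketched in numpy (dual_gradient_energy and cumulative_energy are assumed names; borders are handled by wraparound here for brevity, whereas the assignment treats border pixels specially):

```python
import numpy as np

def dual_gradient_energy(img):
    """Dual-gradient energy: squared RGB differences between each pixel's
    x- and y-neighbours (np.roll wraps around at the borders)."""
    dx = np.roll(img, -1, axis=1) - np.roll(img, 1, axis=1)
    dy = np.roll(img, -1, axis=0) - np.roll(img, 1, axis=0)
    return (dx ** 2).sum(axis=2) + (dy ** 2).sum(axis=2)

def cumulative_energy(e):
    """M(i, j) = e(i, j) + min of the three upper neighbours."""
    M = e.astype(float).copy()
    H, W = M.shape
    for i in range(1, H):
        left = np.roll(M[i - 1], 1)    # left[j] = M[i - 1, j - 1]
        left[0] = np.inf
        right = np.roll(M[i - 1], -1)  # right[j] = M[i - 1, j + 1]
        right[-1] = np.inf
        M[i] += np.minimum(np.minimum(left, M[i - 1]), right)
    return M
```

The minimum-energy vertical seam is then recovered by backtracking from the smallest entry in the last row of M.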

## Results with a few images

The following image is the original 507×284 pixel input image taken from the same assignment. The next few images and animations show the outputs of the seam carving algorithm implementation. The shape (the transpose of the image shape is reported) and size of the image (in terms of the memory taken by python np array as float, to store the image) is also reported for each iteration of the algorithm. As can be seen, the algorithm resizes the image by removing  the minimum energy vertical (and horizontal seams) iteratively one at a time, by keeping the main objects as is.

Input Original Image

Removing the Vertical Seams

After Removing 200 Vertical Seams

Removing the Vertical and the Horizontal Seams in alternating manner

After Removing 100 Vertical Seams and 100 Horizontal Seams

The following is the original 1024×576 image of the Dakshineswar Temple, Kolkata along with the removed vertical seams with content-aware resizing.

Output image after removing 500 Vertical Seams

The next animation again shows how the content-aware resizing preserves the objects in the original image.

The next image is the original dolphin 239×200 image taken from the paper, along with the removed vertical seams with content-aware resizing.

After removing 112 Vertical Seams

The next animation shows how the seams are deleted from the image in the reverse order of deletion.

The next image is the original 750×498 bird image taken from the paper, along with the removed vertical seams with content-aware resizing.

After Removing 297 Vertical Seams

The next image is the original 450×299 sea image taken from the paper, along with the removed vertical seams with content-aware resizing.

The next image is the original 628×413 cycle image taken from the paper, along with the removed vertical seams with content-aware resizing.

After Removing 99 Vertical Seams

The next image is the original 697×706 Fuji image again taken from the paper, along with the removed vertical seams with content-aware resizing.

After Removing 282 Vertical Seams

## Object Removal

The same technique can be applied with a mask to remove objects from an image. For example, consider the following image of the shoes taken from the same paper.

Let’s use a black mask to remove a shoe that we don’t want, as shown in the next figure.

Finally, the vertical seams can be forced to go through the masked object, as shown in the next animation, in order to remove the masked object completely just by using content-aware resizing.

Output after removing the shoe with content-aware image resize algorithm

# Measuring Semantic Relatedness using the Distance and the Shortest Common Ancestor and Outcast Detection with Wordnet Digraph in Python

The following problem appeared as an assignment in the Algorithms course (COS 226) at Princeton University taught by Prof. Sedgewick. The description of the problem is taken from the assignment itself. However, while the assignment expects a Java implementation, this article describes a Python implementation instead, built from scratch rather than using nltk.

## The Problem

• WordNet is a semantic lexicon for the English language that computational linguists and cognitive scientists use extensively. For example, WordNet was a key component in IBM’s Jeopardy-playing Watson computer system.
• WordNet groups words into sets of synonyms called synsets. For example, { AND circuit, AND gate } is a synset that represents a logical gate that fires only when all of its inputs fire.
• WordNet also describes semantic relationships between synsets. One such relationship is the is-a relationship, which connects a hyponym (more specific synset) to a hypernym (more general synset). For example, the synset { gate, logic gate } is a hypernym of { AND circuit, AND gate } because an AND gate is a kind of logic gate.
• The WordNet digraph. The first task is to build the WordNet digraph: each vertex v is an integer that represents a synset, and each directed edge v→w represents that w is a hypernym of v.
• The WordNet digraph is a rooted DAG: it is acyclic and has one vertex—the root— that is an ancestor of every other vertex.
• However, it is not necessarily a tree because a synset can have more than one hypernym. A small subgraph of the WordNet digraph appears below.

### The WordNet input file formats

The following two data files will be used to create the WordNet digraph. The files are in comma-separated values (CSV) format: each line contains a sequence of fields, separated by commas.

• List of synsets: The file synsets.txt contains all noun synsets in WordNet, one per line. Line i of the file (counting from 0) contains the information for synset i.
• The first field is the synset id, which is always the integer i;
• the second field is the synonym set (or synset); and
• the third field is its dictionary definition (or gloss), which is not relevant to this assignment.

For example, line 36 means that the synset { AND_circuit AND_gate } has an id number of 36 and its gloss is a circuit in a computer that fires only when all of its inputs fire. The individual nouns that constitute a synset are separated by spaces. If a noun contains more than one word, the underscore character connects the words (and not the space character).
• List of hypernyms: The file hypernyms.txt contains the hypernym relationships. Line i of the file (counting from 0) contains the hypernyms of synset i.
• The first field is the synset id, which is always the integer i;
• subsequent fields are the id numbers of the synset’s hypernyms.

For example, line 36 means that synset 36 (AND_circuit AND_gate) has 42338 (gate logic_gate) as its only hypernym. Line 34 means that synset 34 (AIDS acquired_immune_deficiency_syndrome) has two hypernyms: 47569 (immunodeficiency) and 56099 (infectious_disease).

### The WordNet data type

Implement an immutable data type WordNet with the following API:

• The Wordnet digraph contains 76066 nodes and 84087 edges; it’s very difficult to visualize the entire graph at once, hence small subgraphs will be displayed as and when relevant to the context of the later examples.

• The sca() and the distance() between two nodes v and w are implemented using BFS (breadth-first search) starting from the two nodes separately and combining the computed distances.
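A minimal sketch of that idea (hypothetical names; the digraph is represented as a plain dict mapping each synset id to the list of its hypernym ids):

```python
from collections import deque

def bfs_distances(adj, src):
    """BFS over the hypernym digraph; adj[v] lists the hypernyms of v.
    Returns {reachable vertex: number of edges from src}."""
    dist = {src: 0}
    q = deque([src])
    while q:
        v = q.popleft()
        for w in adj.get(v, []):
            if w not in dist:
                dist[w] = dist[v] + 1
                q.append(w)
    return dist

def sca_and_distance(adj, v, w):
    """Shortest common ancestor of v and w, and the ancestral-path length:
    the common reachable vertex minimizing dist_from_v + dist_from_w."""
    dv, dw = bfs_distances(adj, v), bfs_distances(adj, w)
    common = set(dv) & set(dw)
    best = min(common, key=lambda x: dv[x] + dw[x])
    return best, dv[best] + dw[best]
```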

Performance requirements

• The data type must use space linear in the input size (size of synsets and hypernyms files).
• The constructor must take time linearithmic (or better) in the input size.
• The method isNoun() must run in time logarithmic (or better) in the number of nouns.
• The methods distance() and sca() must make exactly one call to the length() and ancestor() methods in ShortestCommonAncestor, respectively.

### The Shortest Common Ancestor

• An ancestral path between two vertices v and w in a rooted DAG is a directed path from v to a common ancestor x, together with a directed path from w to the same ancestor x.
• A shortest ancestral path is an ancestral path of minimum total length.
• We refer to the common ancestor in a shortest ancestral path as a shortest common ancestor.
• Note that a shortest common ancestor always exists because the root is an ancestor of every vertex. Note also that an ancestral path is a path, but not a directed path.

• The following animation shows how the shortest common ancestor node 1 for the nodes 3 and 10 in the following rooted DAG is found at distance 4 with BFS, along with the ancestral path 3-1-5-9-10.
• The following animation shows how the shortest common ancestor node 5 for the nodes 8 and 11 in the following rooted DAG is found at distance 3 with BFS, along with the ancestral path 8-5-9-11.
• The following animation shows how the shortest common ancestor node 0 for the nodes 2 and 6 in the following rooted DAG is found at distance 4 with BFS, along with the ancestral path 6-3-1-0-2.

• We generalize the notion of shortest common ancestor to subsets of vertices. A shortest ancestral path of two subsets of vertices A and B is a shortest ancestral path over all pairs of vertices v and w, with v in A and w in B.
• The figure (digraph25.txt) below shows an example in which, for two subsets, red and blue, we have computed several (but not all) ancestral paths, including the shortest one.

• The following animation shows how the shortest common ancestor node 3 for the sets of nodes {13, 23, 24} and {6, 16, 17} in the following rooted DAG is found, with the associated length (distance) computed with BFS, along with the ancestral path 13-7-3-9-16.

### Shortest common ancestor data type

Implement an immutable data type ShortestCommonAncestor with the following API:

Basic performance requirements

The data type must use space proportional to E + V, where E and V are the number of edges and vertices in the digraph, respectively. All methods and the constructor must take time proportional to E + V (or better).

### Measuring the semantic relatedness of two nouns

Semantic relatedness refers to the degree to which two concepts are related. Measuring semantic relatedness is a challenging problem. For example, we consider George W. Bush and John F. Kennedy (two U.S. presidents) to be more closely related than George W. Bush and chimpanzee (two primates). It might not be clear whether George W. Bush and Eric Arthur Blair are more related than two arbitrary people. However, both George W. Bush and Eric Arthur Blair (a.k.a. George Orwell) are famous communicators and, therefore, closely related.

Let’s define the semantic relatedness of two WordNet nouns x and y as follows:

• A = set of synsets in which x appears
• B = set of synsets in which y appears
• distance(x, y) = length of shortest ancestral path of subsets A and B
• sca(x, y) = a shortest common ancestor of subsets A and B

This is the notion of distance that we need to use to implement the distance() and sca() methods in the WordNet data type.

### Finding semantic relatedness for some example nouns with the shortest common ancestor and the distance method implemented

apple and potato (distance 5 in the Wordnet Digraph, as shown below)

As can be seen, the noun entity is the root of the Wordnet DAG.

beer and diaper (distance 13 in the Wordnet Digraph)

beer and milk (distance 4 in the Wordnet Digraph, with SCA as drink synset), as expected since they are more semantically closer to each other.

bread and butter (distance 3 in the Wordnet Digraph, as shown below)

cancer and AIDS (distance 6 in the Wordnet Digraph, with SCA as disease as shown below, bfs computed distances and the target distance between the nouns are also shown)

car and vehicle (distance 2 in the Wordnet Digraph, with SCA as vehicle as shown below)

cat and dog (distance 4 in the Wordnet Digraph, with SCA as carnivore as shown below)

cat and milk (distance 7 in the Wordnet Digraph, with SCA as substance as shown below, here cat is identified as Arabian tea)

Einstein and Newton (distance 2 in the Wordnet Digraph, with SCA as physicist as shown below)

Leibnitz and Newton (distance 2 in the Wordnet Digraph, with SCA as mathematician)

Gandhi and Mandela (distance 2 in the Wordnet Digraph, with SCA as national_leader synset)

laptop and internet (distance 11 in the Wordnet Digraph, with SCA as instrumentation synset)

school and office (distance 5 in the Wordnet Digraph, with SCA as construction synset as shown below)

bed and table (distance 3 in the Wordnet Digraph, with SCA as furniture synset as shown below)

Tagore and Einstein (distance 4 in the Wordnet Digraph, with SCA as intellectual synset as shown below)

Tagore and Gandhi (distance 8 in the Wordnet Digraph, with SCA as person synset as shown below)

Tagore and Shelley (distance 2 in the Wordnet Digraph, with SCA as author as shown below)

text and mining (distance 12 in the Wordnet Digraph, with SCA as abstraction synset as shown below)

milk and water (distance 3 in the Wordnet Digraph, with SCA as food, as shown below)

### Outcast detection

Given a list of WordNet nouns x1, x2, …, xn, which noun is the least related to the others? To identify an outcast, compute the sum of the distances between each noun and every other one:

di = distance(xi, x1) + distance(xi, x2) + … + distance(xi, xn)

and return a noun xt for which dt is maximum. Note that distance(xi, xi) = 0, so it will not contribute to the sum.

Implement an immutable data type Outcast with the following API:
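The outcast computation is a direct translation of the formula above (a sketch, assuming a distance(x, y) function as defined in the previous section):

```python
def outcast(nouns, distance):
    """Return the noun with the maximum summed WordNet distance to all
    the other nouns in the list (distance(x, x) = 0 contributes nothing)."""
    sums = {x: sum(distance(x, y) for y in nouns) for x in nouns}
    return max(sums, key=sums.get)
```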

### Examples

As expected, potato is the outcast in the list of nouns shown below (the noun with the maximum total distance from the rest of the nouns, all of which are fruits except potato). This can be seen from the WordNet distance heatmap in the next plot, as well as from the sum-of-distances plot following it.

Again, as expected, table is the outcast in the list of nouns shown below (the noun with the maximum total distance from the rest of the nouns, all of which are mammals except table). This can be seen from the WordNet distance heatmap in the next plot, as well as from the sum-of-distances plot following it.

Finally, as expected, bed is the outcast in the list of nouns shown below (the noun with the maximum total distance from the rest of the nouns, all of which are drinks except bed). This can be seen from the WordNet distance heatmap in the next plot, as well as from the sum-of-distances plot following it.

# Some more Variational Image Processing: Diffusion, TV Denoising, TV Image Inpainting in Python

In this article, a few variational image processing techniques will be described along with their application to some images; most of the problems are taken from the assignments of this course.

## Some preliminaries: The Calculus of Images – Computing Curvature and TV

• Let’s first compute the Euclidean curvature of a few images. The curvature measures the rate at which the unit gradient vector is changing and is given by
• The following couple of images are used to compute the curvature. As can be seen from the below figures, the curvature is zero in flat regions and along straight edges and non-zero along the rounded edges of the circles, as expected.
• Now, let’s compute the total variation (TV), which is given by the following.
• First we need to approximate the partial derivatives using a forward difference.
• Let’s compute the TV for the grayscale Cameraman image. Now let’s add more and more salt-and-pepper noise (by increasing the probability threshold p) to the image and see from the following figure how the norm of the gradient matrix and the TV value change.
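The forward-difference TV computation can be sketched as follows (a minimal version; the differences at the last row and column are taken as zero):

```python
import numpy as np

def total_variation(u):
    """TV(u) = sum over pixels of |grad u|, with the partial derivatives
    approximated by forward differences (zero at the last row/column)."""
    ux = np.zeros_like(u, dtype=float)
    uy = np.zeros_like(u, dtype=float)
    ux[:, :-1] = u[:, 1:] - u[:, :-1]   # forward difference along x
    uy[:-1, :] = u[1:, :] - u[:-1, :]   # forward difference along y
    return np.sqrt(ux ** 2 + uy ** 2).sum()
```

On a constant image the TV is zero; adding salt-and-pepper noise creates many sharp jumps, which is why the TV grows with the noise probability p.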

## The Heat Equation and Diffusion

Let’s implement the isotropic and anisotropic diffusion by solving PDEs numerically.
The following figure shows the math.

The following shows the isotropic diffusion output with Δt = 0.1, with gradient descent. As can be seen, the result is the same as applying a Gaussian blur kernel to the image.

The following shows the anisotropic diffusion output with Δt = 0.1, with gradient descent, with a = 5, 20, 100 respectively.  As can be seen, unlike isotropic diffusion, the anisotropic diffusion preserves the edges.
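A minimal sketch of the anisotropic diffusion step, in the Perona-Malik style, assuming the conductance g(s) = 1/(1 + (s/a)^2) (the assignment may use a different discretization; np.roll gives periodic borders here, purely for brevity):

```python
import numpy as np

def anisotropic_diffusion(u, n_iter=50, dt=0.1, a=20.0):
    """Perona-Malik style diffusion u_t = div(g(|grad u|) grad u) with
    conductance g(s) = 1 / (1 + (s/a)^2): strong edges (large gradients)
    get a small conductance and are preserved; flat regions get smoothed."""
    g = lambda d: 1.0 / (1.0 + (d / a) ** 2)
    u = u.astype(float).copy()
    for _ in range(n_iter):
        # differences with the four neighbours (periodic border via np.roll)
        dn = np.roll(u, 1, axis=0) - u
        ds = np.roll(u, -1, axis=0) - u
        dw = np.roll(u, 1, axis=1) - u
        de = np.roll(u, -1, axis=1) - u
        u += dt * (g(dn) * dn + g(ds) * ds + g(dw) * dw + g(de) * de)
    return u
```

A smaller a makes g drop faster, so weaker edges also survive, which is what produces the flatter, more cartoon-like results shown for a = 5 below.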

## Creating Cartoon / flat-texture Images with Anisotropic Diffusion

As can be seen from the following figures and animations, we can create cartoons from real images with anisotropic diffusion, this time applying the diffusion to each channel of the color images (the barbara image and my image).

Original image

Cartoon Image with anisotropic diffusion (a=5)

Original Image

Cartoon Image with anisotropic diffusion (a=5)

## Total Variation Denoising

The following math is going to be used for TV denoising; the Euler-Lagrange equation is used to find the minimum of the functional, as shown in the following figures with proof.

• First a noisy grayscale image is prepared by adding Gaussian noise to the cameraman image.

Original Cameraman

Noisy Cameraman

• Let’s first denoise this image with linear TV denoising. The next animations show the results obtained, using the fidelity weight λ=1. As can be seen, even with the fidelity term, this model blurs the edges.
• Now let’s denoise this image with nonlinear TV denoising. The next animations show the results obtained, using the fidelity weights λ=0.01 and λ=1 respectively.
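A gradient-descent sketch of the nonlinear TV denoising update (assuming the Euler-Lagrange descent u_t = div(∇u/|∇u|) − λ(u − f); eps regularizes |∇u| near zero, and np.roll again gives periodic borders for brevity):

```python
import numpy as np

def tv_denoise(f, lam=0.1, dt=0.1, n_iter=100, eps=1.0):
    """Gradient descent on E(u) = TV(u) + lam/2 ||u - f||^2 via the
    Euler-Lagrange descent u_t = div(grad u / |grad u|) - lam (u - f)."""
    u = f.astype(float).copy()
    for _ in range(n_iter):
        ux = np.roll(u, -1, axis=1) - u          # forward differences
        uy = np.roll(u, -1, axis=0) - u
        mag = np.sqrt(ux ** 2 + uy ** 2 + eps)   # regularized |grad u|
        px, py = ux / mag, uy / mag
        # backward-difference divergence of (px, py): the curvature term
        div = (px - np.roll(px, 1, axis=1)) + (py - np.roll(py, 1, axis=0))
        u += dt * (div - lam * (u - f))
    return u
```

Larger λ keeps the result closer to the noisy input f; smaller λ lets the TV term dominate and flatten more, matching the λ=0.01 vs λ=1 comparison above.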

## Image Inpainting

Inpainting is the process of restoring damaged or missing parts of an image. Suppose we have a binary mask D that specifies the location of the damaged pixels in the input image f as shown below:

The following theory is going to be used for TV inpainting.

Damaged image

Damaged image

Damaged image

Damaged image

# Some Variational Image Processing: Poisson Image Editing and its applications in Python

## Poisson Image Editing

The goal of Poisson image editing is to perform seamless blending of an object or a texture from a source image (captured by a mask image) onto a target image. We want to create a photomontage by pasting an image region onto a new background using Poisson image editing. This idea is from Pérez et al.'s SIGGRAPH 2003 paper Poisson Image Editing.

The following figures describe the basic theory. As can be seen, the problem is first expressed in the continuous domain as a constrained variational optimization problem (Euler-Lagrange equation is used to find a solution) and then can be solved using a discrete Poisson solver.

As described in the paper and also in this assignment from this MIT course on Computational Photography, the main task of Poisson image editing is to solve a huge linear system Ax = b (where the unknown x corresponds to the new image I, and b is computed from the known source S and target T images).
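A minimal gray-scale sketch of assembling and solving that linear system with SciPy's sparse Conjugate Gradient solver (the function name is mine; it assumes the mask does not touch the image border):

```python
import numpy as np
from scipy.sparse import lil_matrix
from scipy.sparse.linalg import cg

def poisson_blend(source, target, mask):
    """For each masked pixel p, enforce 4*x[p] - sum of its masked
    neighbours = Laplacian of the source at p, with the target values
    acting as Dirichlet boundary conditions outside the mask."""
    h, w = target.shape
    idx = -np.ones((h, w), dtype=int)
    pts = np.argwhere(mask)
    idx[mask] = np.arange(len(pts))
    A = lil_matrix((len(pts), len(pts)))
    b = np.zeros(len(pts))
    for k, (i, j) in enumerate(pts):
        A[k, k] = 4.0
        b[k] = 4.0 * source[i, j]
        for di, dj in ((-1, 0), (1, 0), (0, -1), (0, 1)):
            ni, nj = i + di, j + dj
            b[k] -= source[ni, nj]          # source Laplacian (guidance)
            if mask[ni, nj]:
                A[k, idx[ni, nj]] = -1.0    # interior unknown
            else:
                b[k] += target[ni, nj]      # boundary value from target
    x, _ = cg(A.tocsr(), b)                 # A is sparse, symmetric PD
    out = target.astype(float).copy()
    out[mask] = x
    return out
```

For color images the same solve is simply repeated per channel, which is what the per-channel animations below show.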

## Seamless Cloning

The following images are taken from an assignment from the same MIT course, where Poisson image editing had to be used to blend the masked region of the source image into the target image. The next few figures show the results obtained.

Source Image

Target Image

Output Gray-Scale Image with Poisson Image Editing

The next animation shows the solution gray-scale images obtained at different iterations using the Conjugate Gradient method while solving the linear system of equations.

Output Color Image with Poisson Image Editing

The next animation shows the solution color images obtained at different iterations using the Conjugate Gradient method to solve the linear system of equations, applying the discrete Poisson solver on each channel.

The following images are taken from this UNC course on computational photography. Again, the same technique is used to blend the octopus from the source image to the target image.

Source Image

Target Image

Output Image

The next animation shows the solution color images obtained at different iterations using the Conjugate Gradient method to solve the linear system of equations, applying the discrete Poisson solver on each channel.

Again, Poisson gradient-domain seamless cloning was used to blend the penguins’ image into the following target image with an appropriate mask.

Source Image

Target Image

Output Image

The next animation again shows the solution color images obtained at different iterations using the Conjugate Gradient method to solve the linear system of equations, applying the discrete Poisson solver on each channel.

The next figures show how a source bird image is blended into the target cloud image with seamless cloning.

Source Image

Target Image

Output gray-scale image

The next animation shows the solution gray-scale images obtained at the first few iterations using the Conjugate Gradient method while solving the linear system of equations.

Finally, the next figures show how the idol of the Goddess Durga is blended into the target image of the city of Kolkata with seamless cloning. As can be seen, since the source mask had its own texture and there are a lot of variations in the background texture, seamless cloning does not work that well here.

Source Image

Target Image

Output Image

The next animation shows the solution gray-scale images obtained at the first few iterations using the Conjugate Gradient method while solving the linear system of equations.

The next animation again shows the solution color images obtained at different iterations using the Conjugate Gradient method while solving the linear system of equations, applying the discrete Poisson solver on each channel.

## Feature Cloning: Inserting objects

The next figures are taken from the same paper; here the eyes, nose and lips from the source face image are inserted into the target Mona Lisa face.

Source image          Target image         Output image

The next animation again shows the solution color images obtained at different iterations using the Conjugate Gradient method while solving the linear system of equations, applying the discrete Poisson solver on each channel.

## Texture Swapping: Feature exchange with Seamless Cloning

As discussed in the paper, seamless cloning allows the user to easily replace certain features of one object with alternative features. The next figure shows how the texture of the other fruit was transferred to the orange; the images are taken from the same paper.

Source Image        Target Image        Mask Image

Output Image

The next animation again shows the solution color images obtained at different iterations using the Conjugate Gradient method while solving the linear system of equations, applying the discrete Poisson solver on each channel.

## Gradient Mixing: Inserting objects with holes

Again, the next figures are taken from the same paper; this time the source objects contain holes. As can be seen from the following results, seamless cloning does not work well for inserting objects with holes into the target: the target texture is lost inside the mask after blending.

Source Image                                            Target Image

Output Image with Poisson Seamless Cloning

The next animation again shows the solution color images obtained at the first few iterations using the Conjugate Gradient method while solving the linear system of equations for seamless cloning, applying the discrete Poisson solver on each channel.

Using the mixing-gradients method, the blending result obtained is far better, as shown below: it preserves the target texture.

Output Image with Poisson Mixing Gradients

The next animation again shows the solution color images obtained at the first few iterations using the Conjugate Gradient method while solving the linear system of equations for mixing gradients, applying the discrete Poisson solver on each channel.

## Mixing Gradients: Inserting transparent objects

The next example shows the insertion of a rainbow into a target image; the images are again taken from the paper. As can be seen, seamless cloning wrongly places the rainbow in front of the coconut tree in the target image. With gradient mixing, the stronger of the two gradients is used as the image gradient, and this solves the issue.
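The gradient-mixing rule can be sketched per component as below (a simplification: the paper compares the magnitudes of the full gradient vectors, while this toy version compares component-wise):

```python
import numpy as np

def mixed_gradients(gs, gt):
    """Guidance field for mixing gradients: at every location keep
    whichever of the source (gs) and target (gt) gradients is stronger,
    so transparent or hollow source regions let the target show through."""
    return np.where(np.abs(gt) > np.abs(gs), gt, gs)
```

This mixed field simply replaces the source Laplacian as the right-hand side of the same discrete Poisson system used for seamless cloning.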

Source Image                              Target Image                         Mask Image

Output Image with Seamless Cloning

The next animation again shows the solution color images obtained at the first few iterations using the Conjugate Gradient method while solving the linear system of equations for mixing gradients, applying the discrete Poisson solver on each channel.

The next few figures show the results obtained using mixing gradients on another set of images; seamless cloning does not work well in this case, but mixing gradients works just fine.

Source Image

Target Image

The next animation again shows the solution color images obtained at the first few iterations using the Conjugate Gradient method while solving the linear system of equations for mixing gradients, applying the discrete Poisson solver on each channel.

## Texture Flattening: Creating Flat-texture Cartoon-like images

As illustrated in the paper, by retaining only the gradients at edge locations, before integrating with the Poisson solver, one washes out the texture of the selected region, giving its contents a flat aspect.

The following figures show the output cartoon-like image obtained using texture flattening, using the Canny edge detector to generate the mask.

Source Image

Mask Image created with Canny edge detector

Output cartoon obtained with texture flattening from the source with the mask

The next animation shows the solution color images obtained at the first few iterations using the Conjugate Gradient method while solving the linear system of equations for texture flattening, applying the discrete Poisson solver on each channel.

Again, the next figures show the output cartoon-like image obtained using texture flattening on my image, using the Canny edge detector to generate the mask.

Source Image

Output image obtained with texture flattening

The next animation again shows the solution color images obtained at the first few iterations using the Conjugate Gradient method while solving the linear system of equations for texture flattening, applying the discrete Poisson solver on each channel.

# Some more Social Network Analysis with Python: Centrality, PageRank/HITS, Random Network Generation Models, Link Prediction

In this article, some more social networking concepts will be illustrated with a few problems. The problems appeared in the programming assignments in the coursera course Applied Social Network Analysis in Python. The descriptions of the problems are taken from the assignments. The analysis is done using NetworkX.

The following theory is going to be used to solve the assignment problems.

# 1. Measures of Centrality

In this assignment, we explore measures of centrality on the following two networks:

1. A friendship network
2. A political blog network.

• First let’s do some analysis with different centrality measures using the friendship network, which is a network of friendships at a university department. Each node corresponds to a person, and an edge indicates friendship. The following figure visualizes the network, with the size of the nodes proportional to the degree of the nodes.

### Degree Centrality

The next figure shows the distribution of the degree centrality of the nodes in the friendship network graph.

The following figure visualizes the network, with the size of the nodes again proportional to the degree centrality of the nodes.

### Closeness Centrality

The next figure shows the distribution of the closeness centrality of the nodes in the friendship network graph.

The following figure again visualizes the network, with the size of the nodes being proportional to the closeness centrality of the nodes.

### Betweenness Centrality

The next figure shows the distribution of the betweenness centrality of the nodes in the friendship network graph.

The following figure again visualizes the network, with the size of the nodes being proportional to the betweenness centrality of the nodes.
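The three centrality measures used above are one-liners in NetworkX; the sketch below runs them on a built-in stand-in graph rather than the actual friendship network:

```python
import networkx as nx

# a small built-in stand-in for the friendship network
G = nx.karate_club_graph()

deg = nx.degree_centrality(G)       # fraction of the other nodes each node touches
clo = nx.closeness_centrality(G)    # inverse of the average shortest-path distance
bet = nx.betweenness_centrality(G)  # fraction of all shortest paths through a node

# the most central node under each measure
print(max(deg, key=deg.get), max(clo, key=clo.get), max(bet, key=bet.get))
```

The three measures often disagree, which is exactly why the three visualizations above size the nodes differently.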

• Now, let’s consider another network, which is a directed network of political blogs, where nodes correspond to a blog and edges correspond to links between blogs.
• This network will be used to compute  the following:
• PageRank of the nodes using random walk with damping factor.
• Authority and Hub Score of the nodes using the HITS.
• The blog network looks like the following:
url                                source                  value
tsrightdominion.blogspot.com       Blogarama               1
rightrainbow.com                   Blogarama               1
gregpalast.com                     LabeledManually         0
younglibs.com                      Blogarama               0
blotts.org/polilog                 Blogarama               0
marylandpolitics.blogspot.com      BlogCatalog             1
thesakeofargument.com              Blogarama               1
joebrent.blogspot.com              Blogarama               0
thesiliconmind.blogspot.com        Blogarama               0
40ozblog.blogspot.com              Blogarama,BlogCatalog   0
progressivetrail.org/blog.shtml    Blogarama               0
sonsoftherepublic.com              Blogarama               1
rightvoices.com                    CampaignLine            1
blogs.salon.com/0002874            LeftyDirectory          0
notgeniuses.com                    CampaignLine            0
democratreport.blogspot.com        BlogCatalog             0
• First let’s visualize the network; the next figure shows the visualization. The network has nodes (blogs) from 47 different unique sources, and each node is colored with a color unique to its source. The gray lines denote the edges (links) between the nodes (blogs).

• The next figure shows the same network graph, but without the node labels (blog urls).

## Page-Rank and HITS

• Now, let’s apply the Scaled Page Rank algorithm to this network, with damping value 0.85. The following figure visualizes the graph with the node size proportional to the page rank of the node.
• The next animations show how the page rank of the nodes in the network changes over the first 25 iterations of the power-iteration algorithm.

• The top 5 nodes with the highest page rank values after 25 iterations of the power-iteration page-rank algorithm are shown below, along with their rank scores.
• (u’dailykos.com’, 0.020416972184975967)
• (u’atrios.blogspot.com’, 0.019232277371918939)
• (u’instapundit.com’, 0.015715941717833914)
• (u’talkingpointsmemo.com’, 0.0152621016868163)
• (u’washingtonmonthly.com’, 0.013848910355057181)

• The top 10 nodes with the highest page rank values after the convergence of the page-rank algorithm are shown below, along with their rank scores.
• (u’dailykos.com’, 0.01790144388519838)
• (u’atrios.blogspot.com’, 0.015178631721614688)
• (u’instapundit.com’, 0.01262709066072975)
• (u’blogsforbush.com’, 0.012508582138399093)
• (u’talkingpointsmemo.com’, 0.012393033204751035)
• (u’michellemalkin.com’, 0.010918873519905312)
• (u’drudgereport.com’, 0.010712415519898428)
• (u’washingtonmonthly.com’, 0.010512012551452737)
• (u’powerlineblog.com’, 0.008939228332543162)
• (u’andrewsullivan.com’, 0.00860773822610682)

• The following figure shows the distribution of the page-ranks of the nodes after the convergence of the algorithm.

• Next, let’s apply the HITS algorithm to the network to find the hub and authority scores of the nodes.
• The following couple of figures visualize the network graph, where the node size is proportional to the hub score and the authority score of the node respectively, once the HITS algorithm converges.
• The next couple of figures show the distributions of the hub scores and the authority scores of the nodes once HITS converges.
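Both algorithms are available directly in NetworkX; the sketch below runs them on a tiny hypothetical digraph rather than the blog network:

```python
import networkx as nx

# a tiny hypothetical digraph standing in for the blog network
G = nx.DiGraph([("a", "b"), ("b", "c"), ("c", "a"), ("d", "c")])

pr = nx.pagerank(G, alpha=0.85)  # scaled page rank, damping factor 0.85
hubs, auths = nx.hits(G)         # hub and authority scores

# "c" is linked by both b and d, so it collects the most rank;
# "d" has no in-links, so it keeps only the teleportation mass
print(sorted(pr, key=pr.get, reverse=True))
```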

# 2. Random Graph Identification

Given a list containing 5 networkx graphs, with each of these graphs generated by one of three possible algorithms:

• Preferential Attachment ('PA')
• Small World with low probability of rewiring ('SW_L')
• Small World with high probability of rewiring ('SW_H')

Analyze each of the 5 graphs and determine which of the three algorithms generated the graph.

The following figures visualize all the graphs along with their degree distributions. Since the Preferential Attachment model generates graphs whose node degrees follow a power-law distribution (the rich get richer), the graphs with this pattern in their degree distributions are most likely generated by this model.

On the other hand, the Small World model generates graphs whose node degrees do not follow a power-law distribution; instead, the degrees are concentrated around the mean. If this pattern is seen, we can identify the network as generated by this model.
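The two generative models can be contrasted directly in NetworkX (the parameters below are hypothetical, chosen only to expose the two degree-distribution shapes):

```python
import networkx as nx

# hypothetical parameters, just to contrast the degree distributions
pa = nx.barabasi_albert_graph(1000, 3, seed=0)       # preferential attachment
sw = nx.watts_strogatz_graph(1000, 6, 0.05, seed=0)  # small world, low rewiring

pa_degs = [d for _, d in pa.degree()]
sw_degs = [d for _, d in sw.degree()]

# PA: a few hubs with very high degree (power-law tail);
# SW: degrees concentrated tightly around the mean (here, 6)
print(max(pa_degs), max(sw_degs))
```

Distinguishing the low- from the high-rewiring Small World graphs then comes down to clustering coefficient and average shortest path length rather than the degree distribution.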

# 3. Prediction using Machine Learning models with the graph-centrality and local clustering features

For this assignment we need to work with a company’s email network, where each node corresponds to a person at the company, and each edge indicates that at least one email has been sent between the two people.

The network also contains the node attributes Department and ManagementSalary.

Department indicates the department in the company the person belongs to, and ManagementSalary indicates whether that person is receiving a management position salary. The email-network graph has

• Number of nodes: 1005
• Number of edges: 16706
• Average degree: 33.2458

The following figure visualizes the email network graph.

### Salary Prediction

Using network G, identify the people in the network with missing values for the node attribute ManagementSalary and predict whether or not these individuals are receiving a management position salary.

To accomplish this, we shall need to

• Create a matrix of node features,
• Train a (sklearn) classifier on the nodes that have ManagementSalary data, and
• Predict the probability of receiving a management salary for the nodes where ManagementSalary is missing.

The predictions will need to be given as the probability that the corresponding employee is receiving a management position salary.

The evaluation metric for this assignment is the Area Under the ROC Curve (AUC).

A model needs to achieve an AUC of 0.75 on the test dataset.

Using the trained classifier, return a series with the data being the probability of receiving a management salary, and the index being the node id (from the test dataset).

The next table shows the first few rows of the dataset with the degree and clustering features computed. A few ManagementSalary values are missing (NaN); the corresponding tuples form the test dataset, for which we need to predict the missing ManagementSalary values. The rest form the training dataset.

Now a few classifiers will be trained on the training dataset; they are:

• RandomForest
• SVM

with 3 input features for each node:

• Department
• Clustering (local clustering coefficient for the node)
• Degree

in order to predict the output variable, the indicator of receiving a management salary.

| Index | Department | ManagementSalary | clustering | degree |
|---|---|---|---|---|
| 0 | 1 | 0.0 | 0.276423 | 44 |
| 1 | 1 | NaN | 0.265306 | 52 |
| 2 | 21 | NaN | 0.297803 | 95 |
| 3 | 21 | 1.0 | 0.384910 | 71 |
| 4 | 21 | 1.0 | 0.318691 | 96 |
| 5 | 25 | NaN | 0.107002 | 171 |
| 6 | 25 | 1.0 | 0.155183 | 115 |
| 7 | 14 | 0.0 | 0.287785 | 72 |
| 8 | 14 | NaN | 0.447059 | 37 |
| 9 | 14 | 0.0 | 0.425320 | 40 |

Typical 70-30 validation is used for model selection. The next 3 tables show the first few rows of the train, validation and test datasets respectively.

| Index | Department | clustering | degree | ManagementSalary |
|---|---|---|---|---|
| 421 | 14 | 0.227755 | 52 | 0 |
| 972 | 15 | 0.000000 | 2 | 0 |
| 322 | 17 | 0.578462 | 28 | 0 |
| 431 | 37 | 0.426877 | 25 | 0 |
| 506 | 14 | 0.282514 | 63 | 0 |
| 634 | 21 | 0.000000 | 3 | 0 |
| 130 | 0 | 0.342857 | 37 | 0 |
| 140 | 17 | 0.394062 | 41 | 0 |
| 338 | 13 | 0.350820 | 63 | 0 |
| 117 | 6 | 0.274510 | 20 | 0 |
| 114 | 10 | 0.279137 | 142 | 1 |
| 869 | 7 | 0.733333 | 6 | 0 |
| 593 | 5 | 0.368177 | 42 | 0 |
| 925 | 10 | 0.794118 | 19 | 1 |
| 566 | 14 | 0.450216 | 22 | 0 |
| 572 | 4 | 0.391304 | 26 | 0 |
| 16 | 34 | 0.284709 | 74 | 0 |
| 58 | 21 | 0.294256 | 126 | 1 |
| 57 | 21 | 0.415385 | 67 | 1 |
| 207 | 4 | 0.505291 | 30 | 0 |

| Index | Department | clustering | degree | ManagementSalary |
|---|---|---|---|---|
| 952 | 15 | 0.533333 | 8 | 0 |
| 859 | 32 | 0.388106 | 74 | 1 |
| 490 | 6 | 0.451220 | 43 | 0 |
| 98 | 16 | 0.525692 | 25 | 0 |
| 273 | 17 | 0.502463 | 31 | 0 |
| 784 | 28 | 0.000000 | 3 | 0 |
| 750 | 20 | 0.000000 | 1 | 0 |
| 328 | 8 | 0.432749 | 21 | 0 |
| 411 | 28 | 0.208364 | 106 | 1 |
| 908 | 5 | 0.566154 | 28 | 0 |
| 679 | 29 | 0.424837 | 20 | 0 |
| 905 | 1 | 0.821429 | 10 | 0 |
| 453 | 6 | 0.427419 | 34 | 1 |
| 956 | 14 | 0.485714 | 15 | 0 |
| 816 | 13 | 0.476190 | 23 | 0 |
| 127 | 6 | 0.341270 | 28 | 0 |
| 699 | 14 | 0.452899 | 26 | 0 |
| 711 | 21 | 0.000000 | 2 | 0 |
| 123 | 13 | 0.365419 | 36 | 0 |
| 243 | 19 | 0.334118 | 53 | 0 |
| Index | Department | clustering | degree |
|---|---|---|---|
| 1 | 1 | 0.265306 | 52 |
| 2 | 21 | 0.297803 | 95 |
| 5 | 25 | 0.107002 | 171 |
| 8 | 14 | 0.447059 | 37 |
| 14 | 4 | 0.215784 | 80 |
| 18 | 1 | 0.301188 | 56 |
| 27 | 11 | 0.368852 | 63 |
| 30 | 11 | 0.402797 | 68 |
| 31 | 11 | 0.412234 | 50 |
| 34 | 11 | 0.637931 | 31 |
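The split, training and prediction steps described above can be sketched as follows (on synthetic stand-in data rather than the assignment's network; the column names follow the tables above):

```python
import numpy as np
import pandas as pd
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import train_test_split

# Synthetic stand-in for the node-feature table; in the assignment the rows
# with NaN ManagementSalary form the test set
rng = np.random.RandomState(0)
df = pd.DataFrame({
    'Department': rng.randint(0, 40, 200),
    'clustering': rng.rand(200),
    'degree': rng.randint(1, 150, 200),
})
df['ManagementSalary'] = (df['degree'] > 80).astype(float)
df.loc[df.sample(30, random_state=1).index, 'ManagementSalary'] = np.nan

features = ['Department', 'clustering', 'degree']
train_df = df[df['ManagementSalary'].notnull()]
test_df = df[df['ManagementSalary'].isnull()]

# 70-30 split of the labelled rows for model selection
X_tr, X_val, y_tr, y_val = train_test_split(
    train_df[features], train_df['ManagementSalary'],
    test_size=0.3, random_state=0)

clf = RandomForestClassifier(n_estimators=100, random_state=0).fit(X_tr, y_tr)

# Probability of receiving a management salary, indexed by node id
preds = pd.Series(clf.predict_proba(test_df[features])[:, 1],
                  index=test_df.index)
```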

The following table shows the first few predicted probabilities  by  the RandomForest classifier on the test dataset.

| Index | P(class 0) | P(class 1) |
|---|---|---|
| 0 | 1.0 | 0.0 |
| 1 | 0.2 | 0.8 |
| 2 | 0.0 | 1.0 |
| 3 | 1.0 | 0.0 |
| 4 | 0.5 | 0.5 |
| 5 | 1.0 | 0.0 |
| 6 | 0.7 | 0.3 |
| 7 | 0.7 | 0.3 |
| 8 | 0.3 | 0.7 |
| 9 | 0.7 | 0.3 |
| 10 | 0.9 | 0.1 |
| 11 | 0.8 | 0.2 |
| 12 | 1.0 | 0.0 |
| 13 | 0.6 | 0.4 |
| 14 | 0.7 | 0.3 |
| 15 | 0.5 | 0.5 |
| 16 | 0.0 | 1.0 |
| 17 | 0.2 | 0.8 |
| 18 | 1.0 | 0.0 |
| 19 | 1.0 | 0.0 |

The next figure shows the ROC curve to compare the performances (AUC) of the classifiers on the validation dataset.

As can be seen, the GradientBoosting classifier performs the best (it has the highest AUC on the validation dataset).
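The validation-set AUC underlying this comparison can be computed with sklearn along these lines (toy labels and scores, purely illustrative, not the assignment's values):

```python
from sklearn.metrics import roc_auc_score, roc_curve

# Toy validation labels and predicted probabilities of class 1
y_val = [0, 0, 1, 1, 0, 1, 0, 1]
p_val = [0.1, 0.4, 0.35, 0.8, 0.2, 0.9, 0.5, 0.7]

# Area under the ROC curve, the assignment's evaluation metric
auc = roc_auc_score(y_val, p_val)

# Points of the ROC curve itself, for plotting
fpr, tpr, thresholds = roc_curve(y_val, p_val)
```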

The following figure shows the Decision Surface for the Salary Prediction learnt by the RandomForest Classifier.

# 4. New Connections Prediction (Link Prediction with ML models)

For the last part of this assignment, we shall predict future connections between employees of the network. The future connections information has been loaded into the variable future_connections. The index is a tuple indicating a pair of nodes that currently do not have a connection, and the Future Connection column indicates whether an edge between those two nodes will exist in the future, where a value of 1.0 indicates a future connection. The next table shows the first few rows of the dataset.

| Index | Future Connection |
|---|---|
| (6, 840) | 0.0 |
| (4, 197) | 0.0 |
| (620, 979) | 0.0 |
| (519, 872) | 0.0 |
| (382, 423) | 0.0 |
| (97, 226) | 1.0 |
| (349, 905) | 0.0 |
| (429, 860) | 0.0 |
| (309, 989) | 0.0 |
| (468, 880) | 0.0 |

Using network G and future_connections, identify the edges in future_connections with missing values and predict whether or not these edges will have a future connection.

To accomplish this, we shall need to

1. Create a matrix of features for the edges found in future_connections
2. Train a (sklearn) classifier on those edges in future_connections that have Future Connection data
3. Predict a probability of the edge being a future connection for those edges in future_connections where Future Connection is missing.

The predictions will need to be given as the probability of the corresponding edge being a future connection.

The evaluation metric for this assignment is again the Area Under the ROC Curve (AUC).

Using the trained classifier, return a series with the data being the probability of the edge being a future connection, and the index being the edge as represented by a tuple of nodes.

Now, a few classifiers will be trained on the training dataset, among them:

• RandomForest

with 2 input features for each edge:

• Preferential attachment
• Common Neighbors

in order to predict the output variable Future Connection.
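Both edge features can be computed directly with NetworkX, as sketched below (on a small stand-in graph and a hypothetical list of candidate pairs; in the assignment the pairs come from the future_connections index):

```python
import networkx as nx
import pandas as pd

G = nx.karate_club_graph()  # stand-in for the email network

# Candidate node pairs (hypothetical; in the assignment these are the
# tuples in the future_connections index)
pairs = [(0, 33), (5, 16), (24, 27)]

# Preferential attachment: product of the degrees of the two endpoints
pref = {(u, v): p for u, v, p in nx.preferential_attachment(G, pairs)}

# Number of common neighbors of the two endpoints
cn = {(u, v): len(list(nx.common_neighbors(G, u, v))) for u, v in pairs}

edge_df = pd.DataFrame({'preferential attachment': pref,
                        'Common Neighbors': cn})
```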

The next table shows the first few rows of the dataset with the preferential attachment and Common Neighbors features computed.

| Index | Future Connection | preferential attachment | Common Neighbors |
|---|---|---|---|
| (6, 840) | 0.0 | 2070 | 9 |
| (4, 197) | 0.0 | 3552 | 2 |
| (620, 979) | 0.0 | 28 | 0 |
| (519, 872) | 0.0 | 299 | 2 |
| (382, 423) | 0.0 | 205 | 0 |
| (97, 226) | 1.0 | 1575 | 4 |
| (349, 905) | 0.0 | 240 | 0 |
| (429, 860) | 0.0 | 816 | 0 |
| (309, 989) | 0.0 | 184 | 0 |
| (468, 880) | 0.0 | 672 | 1 |
| (228, 715) | 0.0 | 110 | 0 |
| (397, 488) | 0.0 | 418 | 0 |
| (253, 570) | 0.0 | 882 | 0 |
| (435, 791) | 0.0 | 96 | 1 |
| (711, 737) | 0.0 | 6 | 0 |
| (263, 884) | 0.0 | 192 | 0 |
| (342, 473) | 1.0 | 8140 | 12 |
| (523, 941) | 0.0 | 19 | 0 |
| (157, 189) | 0.0 | 6004 | 5 |
| (542, 757) | 0.0 | 90 | 0 |

Again, typical 70-30 validation is used for model selection. The next 3 tables show the first few rows of the train, validation and the test datasets respectively.

| Index | preferential attachment | Common Neighbors | Future Connection |
|---|---|---|---|
| (7, 793) | 360 | 0 | 0 |
| (171, 311) | 1620 | 1 | 0 |
| (548, 651) | 684 | 2 | 0 |
| (364, 988) | 18 | 0 | 0 |
| (217, 981) | 648 | 0 | 0 |
| (73, 398) | 124 | 0 | 0 |
| (284, 837) | 132 | 1 | 0 |
| (748, 771) | 272 | 4 | 0 |
| (79, 838) | 88 | 0 | 0 |
| (207, 716) | 90 | 1 | 1 |
| (270, 928) | 15 | 0 | 0 |
| (201, 762) | 57 | 0 | 0 |
| (593, 620) | 168 | 1 | 0 |
| (494, 533) | 18212 | 40 | 1 |
| (70, 995) | 18 | 0 | 0 |
| (867, 997) | 12 | 0 | 0 |
| (437, 752) | 205 | 0 | 0 |
| (442, 650) | 28 | 0 | 0 |
| (341, 900) | 351 | 0 | 0 |
| (471, 684) | 28 | 0 | 0 |

| Index | preferential attachment | Common Neighbors | Future Connection |
|---|---|---|---|
| (225, 382) | 150 | 0 | 0 |
| (219, 444) | 594 | 0 | 0 |
| (911, 999) | 3 | 0 | 0 |
| (363, 668) | 57 | 0 | 0 |
| (161, 612) | 2408 | 4 | 0 |
| (98, 519) | 575 | 0 | 0 |
| (59, 623) | 636 | 0 | 0 |
| (373, 408) | 2576 | 6 | 0 |
| (948, 981) | 27 | 0 | 0 |
| (690, 759) | 44 | 0 | 0 |
| (54, 618) | 765 | 0 | 0 |
| (149, 865) | 330 | 0 | 0 |
| (562, 1001) | 320 | 1 | 1 |
| (84, 195) | 4884 | 10 | 1 |
| (421, 766) | 260 | 0 | 0 |
| (599, 632) | 70 | 0 | 0 |
| (814, 893) | 10 | 0 | 0 |
| (386, 704) | 24 | 0 | 0 |
| (294, 709) | 75 | 0 | 0 |
| (164, 840) | 864 | 3 | 0 |

| Index | preferential attachment | Common Neighbors |
|---|---|---|
| (107, 348) | 884 | 2 |
| (542, 751) | 126 | 0 |
| (20, 426) | 4440 | 10 |
| (50, 989) | 68 | 0 |
| (942, 986) | 6 | 0 |
| (324, 857) | 76 | 0 |
| (13, 710) | 3600 | 6 |
| (19, 271) | 5040 | 6 |
| (319, 878) | 48 | 0 |
| (659, 707) | 120 | 0 |

The next figure shows the ROC curve to compare the performances (AUC) of the classifiers on the validation dataset.

As can be seen, the GradientBoosting classifier again performs the best (it has the highest AUC on the validation dataset).

The following figure shows the Decision Boundary for the Link Prediction learnt by the RandomForest Classifier.

# Some Social Network Analysis with Python

The following problems appeared in the programming assignments in the coursera course Applied Social Network Analysis in Python.  The descriptions of the problems are taken from the assignments. The analysis is done using NetworkX.

The following theory is going to be used to solve the assignment problems.

# 1. Creating and Manipulating Graphs

• Eight employees at a small company were asked to choose 3 movies that they would most enjoy watching for the upcoming company movie night. These choices are stored in a text file Employee_Movie_Choices; the following table shows the content of the file.
| Index | Employee | Movie |
|---|---|---|
| 0 | Andy | Anaconda |
| 1 | Andy | Mean Girls |
| 2 | Andy | The Matrix |
| 3 | Claude | Anaconda |
| 4 | Claude | Monty Python and the Holy Grail |
| 5 | Claude | Snakes on a Plane |
| 6 | Frida | The Matrix |
| 7 | Frida | The Shawshank Redemption |
| 8 | Frida | The Social Network |
| 9 | Georgia | Anaconda |
| 10 | Georgia | Monty Python and the Holy Grail |
| 11 | Georgia | Snakes on a Plane |
| 12 | Joan | Forrest Gump |
| 13 | Joan | Kung Fu Panda |
| 14 | Joan | Mean Girls |
| 15 | Lee | Forrest Gump |
| 16 | Lee | Kung Fu Panda |
| 17 | Lee | Mean Girls |
| 18 | Pablo | The Dark Knight |
| 19 | Pablo | The Matrix |
| 20 | Pablo | The Shawshank Redemption |
| 21 | Vincent | The Godfather |
| 22 | Vincent | The Shawshank Redemption |
| 23 | Vincent | The Social Network |
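Loading these choices as a bipartite graph and projecting it onto the employees can be sketched as follows (a few rows are inlined instead of reading the file, and the column names are taken from the table above):

```python
import networkx as nx
import pandas as pd
from networkx.algorithms import bipartite

# A few rows of the movie-choices table, inlined for the sketch; in the
# assignment these come from the Employee_Movie_Choices file
choices = pd.DataFrame({
    'Employee': ['Andy', 'Andy', 'Claude', 'Georgia'],
    'Movie': ['Anaconda', 'Mean Girls', 'Anaconda', 'Anaconda'],
})

# Bipartite graph: employees on one side, movies on the other
B = nx.from_pandas_edgelist(choices, 'Employee', 'Movie')
employees = set(choices['Employee'])
movies = set(choices['Movie'])
nx.set_node_attributes(B, {n: 0 for n in employees}, 'bipartite')
nx.set_node_attributes(B, {n: 1 for n in movies}, 'bipartite')

# Weighted projection: edge weight = number of movies a pair has in common
P = bipartite.weighted_projected_graph(B, employees)
```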
• A second text file, Employee_Relationships, has data on the relationships between different coworkers. The relationship score ranges from -100 (enemies) to +100 (best friends). A value of zero means the two employees haven't interacted or are indifferent. The next table shows the content of this file.

| Index | Employee1 | Employee2 | Score |
|---|---|---|---|
| 0 | Andy | Claude | 0 |
| 1 | Andy | Frida | 20 |
| 2 | Andy | Georgia | -10 |
| 3 | Andy | Joan | 30 |
| 4 | Andy | Lee | -10 |
| 5 | Andy | Pablo | -10 |
| 6 | Andy | Vincent | 20 |
| 7 | Claude | Frida | 0 |
| 8 | Claude | Georgia | 90 |
| 9 | Claude | Joan | 0 |
| 10 | Claude | Lee | 0 |
| 11 | Claude | Pablo | 10 |
| 12 | Claude | Vincent | 0 |
| 13 | Frida | Georgia | 0 |
| 14 | Frida | Joan | 0 |
| 15 | Frida | Lee | 0 |
| 16 | Frida | Pablo | 50 |
| 17 | Frida | Vincent | 60 |
| 18 | Georgia | Joan | 0 |
| 19 | Georgia | Lee | 10 |
| 20 | Georgia | Pablo | 0 |
| 21 | Georgia | Vincent | 0 |
| 22 | Joan | Lee | 70 |
| 23 | Joan | Pablo | 0 |
| 24 | Joan | Vincent | 10 |
| 25 | Lee | Pablo | 0 |
| 26 | Lee | Vincent | 0 |
| 27 | Pablo | Vincent | -20 |
• First, let’s load the bipartite graph from Employee_Movie_Choices file, the following figure visualizes the graph. The blue nodes represent the employees and the red nodes represent the movies.

• The following figure shows yet another visualization of the same graph, this time with a different layout.
• Now, let’s find a weighted projection of the graph which tells us how many movies different pairs of employees have in common. We need to compute an L-bipartite projection for this, the projected graph is shown in the next figure.
• The following figure shows the same projected graph with another layout and with weights. For example, Lee and Joan have weight 3 on their connecting edge, since they share 3 movies among their choices.

• Next, let’s load the graph from the Employee_Relationships file; the following figure visualizes the graph. The nodes represent the employees and the edge colors and widths (weights) represent the relationships. Green edges denote friendship, red edges enmity and blue edges neutral relations. Also, the thicker an edge, the stronger the positive or negative relation.
• Suppose we would like to find out whether people that have a high relationship score also like the same types of movies.
• Let’s find the Pearson correlation between employee relationship scores and the number of movies they have in common. If two employees have no movies in common it should be treated as a 0, not a missing value, and should be included in the correlation calculation.
• The following data frame is created from the graphs and will be used to compute the correlation.
| Index | Employee1 | Employee2 | Relationship Score | Weight in Projected Graph |
|---|---|---|---|---|
| 0 | Andy | Claude | 0 | 1.0 |
| 1 | Andy | Frida | 20 | 1.0 |
| 2 | Andy | Georgia | -10 | 1.0 |
| 3 | Andy | Joan | 30 | 1.0 |
| 4 | Andy | Lee | -10 | 1.0 |
| 5 | Andy | Pablo | -10 | 1.0 |
| 6 | Andy | Vincent | 20 | 0.0 |
| 7 | Claude | Frida | 0 | 0.0 |
| 8 | Claude | Georgia | 90 | 3.0 |
| 9 | Claude | Joan | 0 | 0.0 |
| 10 | Claude | Lee | 0 | 0.0 |
| 11 | Claude | Pablo | 10 | 0.0 |
| 12 | Claude | Vincent | 0 | 0.0 |
| 13 | Frida | Georgia | 0 | 0.0 |
| 14 | Frida | Joan | 0 | 0.0 |
| 15 | Frida | Lee | 0 | 0.0 |
| 16 | Frida | Pablo | 50 | 2.0 |
| 17 | Frida | Vincent | 60 | 2.0 |
| 18 | Georgia | Joan | 0 | 0.0 |
| 19 | Georgia | Lee | 10 | 0.0 |
| 20 | Georgia | Pablo | 0 | 0.0 |
| 21 | Georgia | Vincent | 0 | 0.0 |
| 22 | Joan | Lee | 70 | 3.0 |
| 23 | Joan | Pablo | 0 | 0.0 |
| 24 | Joan | Vincent | 10 | 0.0 |
| 25 | Lee | Pablo | 0 | 0.0 |
| 26 | Lee | Vincent | 0 | 0.0 |
| 27 | Pablo | Vincent | -20 | 1.0 |
| 28 | Claude | Andy | 0 | 1.0 |
| 29 | Frida | Andy | 20 | 1.0 |
| 30 | Georgia | Andy | -10 | 1.0 |
| 31 | Joan | Andy | 30 | 1.0 |
| 32 | Lee | Andy | -10 | 1.0 |
| 33 | Pablo | Andy | -10 | 1.0 |
| 34 | Vincent | Andy | 20 | 0.0 |
| 35 | Frida | Claude | 0 | 0.0 |
| 36 | Georgia | Claude | 90 | 3.0 |
| 37 | Joan | Claude | 0 | 0.0 |
| 38 | Lee | Claude | 0 | 0.0 |
| 39 | Pablo | Claude | 10 | 0.0 |
| 40 | Vincent | Claude | 0 | 0.0 |
| 41 | Georgia | Frida | 0 | 0.0 |
| 42 | Joan | Frida | 0 | 0.0 |
| 43 | Lee | Frida | 0 | 0.0 |
| 44 | Pablo | Frida | 50 | 2.0 |
| 45 | Vincent | Frida | 60 | 2.0 |
| 46 | Joan | Georgia | 0 | 0.0 |
| 47 | Lee | Georgia | 10 | 0.0 |
| 48 | Pablo | Georgia | 0 | 0.0 |
| 49 | Vincent | Georgia | 0 | 0.0 |
| 50 | Lee | Joan | 70 | 3.0 |
| 51 | Pablo | Joan | 0 | 0.0 |
| 52 | Vincent | Joan | 10 | 0.0 |
| 53 | Pablo | Lee | 0 | 0.0 |
| 54 | Vincent | Lee | 0 | 0.0 |
| 55 | Vincent | Pablo | -20 | 1.0 |
• The correlation score is 0.788396222173, which is a pretty strong correlation. The following figure shows the association between the two variables with a fitted regression line.
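The correlation itself is a one-liner with pandas, sketched below on a few inlined rows of the table above (the full computation would use all 56 rows):

```python
import pandas as pd

# A few rows of the combined table above, inlined for the sketch
df = pd.DataFrame({
    'Relationship Score': [0, 20, -10, 30, 90, 50, 60, 70, -20],
    'Weight in Projected Graph': [1.0, 1.0, 1.0, 1.0, 3.0, 2.0, 2.0, 3.0, 1.0],
})

# Pearson correlation between relationship score and shared-movie count
corr = df['Relationship Score'].corr(df['Weight in Projected Graph'])
```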

# 2. Network Connectivity

• In this assignment we shall go through the process of importing and analyzing an internal email communication network between employees of a mid-sized manufacturing company.

Each node represents an employee and each directed edge between two nodes represents an individual email. The left node represents the sender and the right node represents the recipient, as shown in the next figure.

• First let’s load the email network as a directed multigraph and visualize it in the next figure. The graph contains 167 nodes (employees) and 82927 edges (emails sent). The size of a node in the figure is proportional to the out-degree of the node.

• The next couple of figures visualize the same network with different layouts.

• Assume that information in this company can only be exchanged through email. When an employee sends an email to another employee, a communication channel is created, allowing the sender to provide information to the receiver, but not vice versa. Based on the emails sent in the data, is it possible for information to go from every employee to every other employee?

This will only be possible if the graph is strongly connected, but it is not. The following figure shows the 42 strongly-connected components (SCCs) of the graph, each shown in a distinct color.

As can be seen from the following histogram, one SCC contains 126 nodes, while each of the remaining 41 SCCs contains exactly one node.

• Now assume that a communication channel established by an email allows information to be exchanged both ways. Based on the emails sent in the data, is it possible for information to go from every employee to every other employee? This is possible, since the graph is weakly connected.
• The following figure shows the subgraph  induced by the largest SCC with 126 nodes.
• The next figure shows the distribution of the (shortest-path) distances between the node-pairs in the largest SCC.

As can be seen from above, inside the largest SCC all the nodes are reachable from one another within at most 3 hops, the average distance between node pairs belonging to the SCC being 1.6461587301587302.
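The connectivity checks and SCC analysis above map onto NetworkX calls like these (sketched on a tiny directed stand-in, not the email network):

```python
import networkx as nx

# A small directed stand-in: a 3-cycle plus one dangling node
G = nx.DiGraph([(1, 2), (2, 3), (3, 1), (3, 4)])

# Information can reach everyone only if the graph is strongly connected
strongly = nx.is_strongly_connected(G)  # False: node 4 cannot reach anyone
weakly = nx.is_weakly_connected(G)      # True: connected if direction is ignored

# Largest strongly-connected component and the subgraph it induces
sccs = list(nx.strongly_connected_components(G))
largest = max(sccs, key=len)
H = G.subgraph(largest)

# Shortest-path distances between node pairs inside the largest SCC
dist = dict(nx.shortest_path_length(H))
```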

• Diameter of the largest SCC: The largest possible distance between two employees, which is 3.
• Find the set of nodes in the subgraph with eccentricity equal to the diameter: these are exactly the nodes on the periphery. As can be seen from the next figure (the largest component is shown along with a few other nodes from other components; not all nodes and edges of the graph are shown, to avoid over-crowding), there are exactly 3 nodes on the periphery of the SCC, namely nodes 97, 129 and 134. Each of the following 3 shortest paths shown in the next figure
• 97->14->113->133
• 129->1->38->132 and
• 134->1->38->132

has length equal to the diameter of this SCC.

• Center of the largest SCC: the set of node(s) in the subgraph induced by the largest SCC with eccentricity equal to the radius (which is 1). There is exactly one such node (namely, node 38); as shown in the next figure, all the nodes belonging to the largest SCC are reachable from the center node 38 within distance 1 (again, the largest component is shown along with a few other nodes from other components; not all nodes and edges are shown, to avoid over-crowding).
• The following figure shows the distribution of eccentricity in the largest SCC.
• Which node in the largest SCC has the most shortest paths to other nodes whose distance equals the diameter? How many of these paths are there? As can be seen from the following figure, the desired node is 97 and there are 63 such shortest paths with length equal to the diameter of the SCC; 5 of these paths (each of length 3) are shown in the next figure, they are:
• 97->14->113->133
• 97->14->113->130
• 97->14->113->136
• 97->14->45->131
• 97->14->45->132
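The diameter, periphery, center and eccentricity computations above map onto NetworkX functions as sketched below (on a small undirected path graph as a stand-in; the assignment applies them to the directed subgraph induced by the largest SCC):

```python
import networkx as nx

# A small undirected stand-in: nodes 0-4 in a line
H = nx.path_graph(5)

diameter = nx.diameter(H)    # largest eccentricity over all nodes
periphery = nx.periphery(H)  # nodes with eccentricity equal to the diameter
center = nx.center(H)        # nodes with eccentricity equal to the radius
ecc = nx.eccentricity(H)     # per-node eccentricity
```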
• Suppose we want to prevent communication from flowing to node 97 (the node with the most diameter-length shortest paths to other nodes) from any node in the center of the largest component. What is the smallest number of nodes we would need to remove from the graph (we’re not allowed to remove node 97 or the center node(s))?

Obviously, the minimum number of nodes to remove equals the size of the minimum node cut between the source node (center node 38) and the target node (node 97), shown in red in the next figure. The size of the min-cut is 5, hence 5 nodes (shown in pale-goldenrod color) need to be removed; the corresponding node numbers are 14, 32, 37, 45 and 46. As in the earlier figures, not all the nodes and edges of the email-net graph are shown, to avoid over-crowding.

The min-cut is separately shown in the following figure from source node 38 to target node 97.
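The min-cut computation itself is a single NetworkX call, sketched here on a tiny stand-in with two internally disjoint paths from source to target:

```python
import networkx as nx

# Tiny directed stand-in: two internally disjoint paths from 's' to 't'
G = nx.DiGraph([('s', 'a'), ('a', 't'), ('s', 'b'), ('b', 't')])

# Smallest set of intermediate nodes whose removal disconnects 't' from 's'
cut = nx.minimum_node_cut(G, 's', 't')
```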

• Construct an undirected graph from the subgraph induced by the largest component of the email-net directed multigraph.

The next figure shows the undirected graph constructed. As before, the node size is proportional to the degree of the node.

• What are the transitivity and average clustering coefficient of the undirected graph? They are 0.570111160700385 and 0.6975272437231418, respectively.

The following figure shows the distribution of the local clustering coefficients.

The following figure shows the undirected graph, this time the node size being proportional to the local clustering coefficient of the node.

The next figure shows the degree distribution of the undirected graph.

Since there are more nodes with lower degree than with higher degree, and transitivity weights the high-degree nodes more heavily, the undirected graph has lower transitivity and higher average clustering coefficient.
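The gap between the two measures can be reproduced on a tiny example: in a star with one extra edge, the low-degree nodes have clustering 0 or 1 while the high-degree hub, which dominates the triple count, has low clustering and drags transitivity down:

```python
import networkx as nx

# A 5-spoke star (hub 0) plus one extra edge closing a single triangle
G = nx.Graph([(0, 1), (0, 2), (0, 3), (0, 4), (0, 5), (1, 2)])

# Transitivity: 3 * triangles / connected triples (hub contributes 10 triples)
transitivity = nx.transitivity(G)

# Average clustering: unweighted mean of per-node clustering coefficients
avg_clustering = nx.average_clustering(G)
```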