# Hand-Gesture Classification using Deep Convolution and Residual Neural Network (ResNet-50) with Tensorflow / Keras in Python

This article first discusses how to classify a set of hand-sign images with a convolutional network, and then improves the accuracy of that classifier with a deep residual network (ResNet). Both problems appeared as assignments in the Coursera course Convolutional Neural Networks (part of the Deep Learning specialization) by Stanford Prof. Andrew Ng (deeplearning.ai). The problem descriptions are taken directly from the course.

## 1. Hand-gesture Classification with Convolution Neural Network

In this assignment, the following tasks are going to be accomplished:

• Implement a fully functioning ConvNet using TensorFlow.
• Build and train a ConvNet in TensorFlow for a classification problem.

This assignment is going to be done using TensorFlow.

First the necessary packages are loaded:

import math
import numpy as np
import h5py
import matplotlib.pyplot as plt
import scipy
from PIL import Image
from scipy import ndimage
import tensorflow as tf
from tensorflow.python.framework import ops
from cnn_utils import *

%matplotlib inline
np.random.seed(1)

Next we load the “SIGNS” dataset that we are going to use. It is a collection of images of 6 hand signs representing the numbers from 0 to 5, as shown in the next figure. The output classes are represented with one-hot encoding.

# Loading the data (signs)
X_train_orig, Y_train_orig, X_test_orig, Y_test_orig, classes = load_dataset()

The next figures show a few randomly sampled images for each class label from the training dataset. There are 180 images for each class and a total of 1080 images in the training dataset.

number of training examples = 1080
number of test examples = 120
X_train shape: (1080, 64, 64, 3)
Y_train shape: (1080, 6)
X_test shape: (120, 64, 64, 3)
Y_test shape: (120, 6)
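To make the one-hot label shape (1080, 6) concrete, here is a minimal NumPy sketch of the encoding. The helper name `one_hot` and the three label values are made up for illustration; this is not the course's own `convert_to_one_hot` helper.

```python
import numpy as np

def one_hot(labels, num_classes):
    """Return an array of shape (len(labels), num_classes) with a single 1 per row."""
    labels = np.asarray(labels).reshape(-1)   # flatten to a 1-D vector of class indices
    return np.eye(num_classes)[labels]        # row k of the identity matrix encodes class k

y = np.array([0, 5, 2])    # three made-up sign labels
Y = one_hot(y, 6)
print(Y.shape)             # (3, 6)
print(Y[1])                # [0. 0. 0. 0. 0. 1.]
```

Each row contains exactly one 1, in the column of the corresponding class.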


The following steps train a conv-net model with TensorFlow on the training dataset and then use the model to classify the images from the test dataset.

### Create placeholders

TensorFlow requires that we create placeholders for the input data that will be fed into the model when running the session.

Let’s implement the function below to create placeholders for the input image X and the output Y. We should not fix the number of training examples for the moment; using “None” as the batch size gives us the flexibility to choose it later. Hence X should be of dimension [None, n_H0, n_W0, n_C0] and Y should be of dimension [None, n_y].

def create_placeholders(n_H0, n_W0, n_C0, n_y):
    """
    Creates the placeholders for the tensorflow session.

    Arguments:
    n_H0 -- scalar, height of an input image
    n_W0 -- scalar, width of an input image
    n_C0 -- scalar, number of channels of the input
    n_y -- scalar, number of classes

    Returns:
    X -- placeholder for the data input, of shape [None, n_H0, n_W0, n_C0] and dtype "float"
    Y -- placeholder for the input labels, of shape [None, n_y] and dtype "float"
    """

    X = tf.placeholder(tf.float32, shape=(None, n_H0, n_W0, n_C0))
    Y = tf.placeholder(tf.float32, shape=(None, n_y))

    return X, Y

### Initialize parameters

Let’s initialize the weights/filters W1 and W2 using xavier_initializer.
We don’t need to worry about bias variables, as we will soon see that the TensorFlow functions take care of the bias. Note also that we only initialize the weights/filters for the conv2d functions; TensorFlow initializes the layers for the fully connected part automatically.

def initialize_parameters():
    """
    Initializes weight parameters to build a neural network with tensorflow. The shapes are:
    W1 : [4, 4, 3, 8]
    W2 : [2, 2, 8, 16]
    Returns:
    parameters -- a dictionary of tensors containing W1, W2
    """

    tf.set_random_seed(1)                              # fixed seed so that the "random" numbers are reproducible

    W1 = tf.get_variable("W1", (4, 4, 3, 8), initializer = tf.contrib.layers.xavier_initializer(seed = 0))
    W2 = tf.get_variable("W2", (2, 2, 8, 16), initializer = tf.contrib.layers.xavier_initializer(seed = 0))

    parameters = {"W1": W1,
                  "W2": W2}

    return parameters
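For intuition about what a Xavier (Glorot) uniform initializer does with these filter shapes, here is a small NumPy sketch. It assumes the usual uniform rule, limit = sqrt(6 / (fan_in + fan_out)); the function name `glorot_uniform_filter` is hypothetical, and the sampled values will not match TensorFlow's because the random number generators differ.

```python
import numpy as np

def glorot_uniform_filter(shape, seed=0):
    """Sample a conv filter of shape (f_h, f_w, n_in, n_out) from U(-limit, limit)."""
    f_h, f_w, n_in, n_out = shape
    fan_in = f_h * f_w * n_in       # number of inputs feeding one output unit
    fan_out = f_h * f_w * n_out     # number of outputs fed by one input unit
    limit = np.sqrt(6.0 / (fan_in + fan_out))
    rng = np.random.default_rng(seed)
    return rng.uniform(-limit, limit, size=shape), limit

W1_demo, limit1 = glorot_uniform_filter((4, 4, 3, 8))    # fan_in = 48, fan_out = 128
W2_demo, limit2 = glorot_uniform_filter((2, 2, 8, 16))   # fan_in = 32, fan_out = 64
print(W1_demo.shape, W2_demo.shape)
print(limit2)   # 0.25, since sqrt(6 / 96) = 0.25
```

The larger the filter's fan-in and fan-out, the narrower the range the weights are drawn from.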


### Forward propagation

Next we need to implement the forward_propagation function below to build the following model:
CONV2D -> RELU -> MAXPOOL -> CONV2D -> RELU -> MAXPOOL -> FLATTEN -> FULLYCONNECTED.

We need to use the following built-in tensorflow functions:

• tf.nn.conv2d(X, W1, strides = [1,s,s,1], padding = ‘SAME’): given an input X and a group of filters W1, this function convolves W1’s filters on X. The third input ([1,s,s,1]) represents the strides for each dimension of the input (m, n_H_prev, n_W_prev, n_C_prev).
• tf.nn.max_pool(A, ksize = [1,f,f,1], strides = [1,s,s,1], padding = ‘SAME’): given an input A, this function uses a window of size (f, f) and strides of size (s, s) to carry out max pooling over each window.
• tf.nn.relu(Z1): computes the elementwise ReLU of Z1 (which can be any shape).
• tf.contrib.layers.flatten(P): given an input P, this function flattens each example into a 1D vector while maintaining the batch size. It returns a flattened tensor with shape [batch_size, k].
• tf.contrib.layers.fully_connected(F, num_outputs): given the flattened input F, it returns the output computed using a fully connected layer.

In detail, we will use the following parameters for all the steps:

- Conv2D: stride 1, padding is "SAME"
- ReLU
- Max pool: Use an 8 by 8 filter size and an 8 by 8 stride, padding is "SAME"
- Conv2D: stride 1, padding is "SAME"
- ReLU
- Max pool: Use a 4 by 4 filter size and a 4 by 4 stride, padding is "SAME"
- Flatten the previous output.
- FULLYCONNECTED (FC) layer: Apply a fully connected layer without a non-linear activation function. Do not call the softmax here. This will result in 6 neurons in the output layer, which then get passed later to a softmax. In TensorFlow, the softmax and cost function are lumped together into a single function, which we'll call in a different function when computing the cost.
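The layer settings listed above determine the tensor shape at every step. The sketch below traces those shapes for a single 64×64×3 image, assuming the standard rule that "SAME" padding gives an output size of ceil(n / stride). The helper `same_out` is a hypothetical name, and the filter counts (8 and 16) are taken from the shapes of W1 and W2.

```python
import math

def same_out(n, stride):
    """Spatial output size under 'SAME' padding: ceil(n / stride)."""
    return math.ceil(n / stride)

shapes = []
h, w, c = 64, 64, 3
h, w, c = same_out(h, 1), same_out(w, 1), 8     # CONV2D with W1 (8 filters), stride 1
shapes.append((h, w, c))
h, w = same_out(h, 8), same_out(w, 8)           # MAXPOOL 8x8 window, stride 8
shapes.append((h, w, c))
h, w, c = same_out(h, 1), same_out(w, 1), 16    # CONV2D with W2 (16 filters), stride 1
shapes.append((h, w, c))
h, w = same_out(h, 4), same_out(w, 4)           # MAXPOOL 4x4 window, stride 4
shapes.append((h, w, c))
flattened = h * w * c                           # FLATTEN
outputs = 6                                     # FULLYCONNECTED, one unit per class

print(shapes)      # [(64, 64, 8), (8, 8, 8), (8, 8, 16), (2, 2, 16)]
print(flattened)   # 64
```

Under these assumptions the fully connected layer maps 64 flattened features to the 6 class scores.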

def forward_propagation(X, parameters):
    """
    Implements the forward propagation for the model:
    CONV2D -> RELU -> MAXPOOL -> CONV2D -> RELU -> MAXPOOL -> FLATTEN -> FULLYCONNECTED

    Arguments:
    X -- input dataset placeholder, of shape (number of examples, n_H0, n_W0, n_C0)
    parameters -- python dictionary containing your parameters "W1", "W2"
                  the shapes are given in initialize_parameters

    Returns:
    Z3 -- the output of the last LINEAR unit
    """


### Compute cost

The next step is to implement the compute_cost function using the following TensorFlow functions:

• tf.nn.softmax_cross_entropy_with_logits(logits = Z3, labels = Y): computes the softmax cross-entropy loss. This function computes both the softmax activation and the resulting loss.
• tf.reduce_mean: computes the mean of elements across dimensions of a tensor. Use this to average the losses over all the examples to get the overall cost.
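To show what these two functions compute together, here is a NumPy sketch of a mean softmax cross-entropy. It is an illustration under the usual definition of the loss, not TensorFlow's implementation; the function name `softmax_cross_entropy` and the toy inputs are made up.

```python
import numpy as np

def softmax_cross_entropy(logits, labels):
    """Mean softmax cross-entropy for logits and one-hot labels of shape (m, n_classes)."""
    shifted = logits - logits.max(axis=1, keepdims=True)    # subtract the row max for numerical stability
    log_probs = shifted - np.log(np.exp(shifted).sum(axis=1, keepdims=True))   # log-softmax
    per_example = -(labels * log_probs).sum(axis=1)         # one loss value per example
    return per_example.mean()                               # average over the batch

Z3_demo = np.zeros((4, 6))             # all-zero logits: every class gets probability 1/6
Y_demo = np.eye(6)[[0, 1, 2, 3]]       # four made-up one-hot labels
cost_demo = softmax_cross_entropy(Z3_demo, Y_demo)
print(cost_demo)                       # ln(6), about 1.7918
```

With uniform predictions over 6 classes the cost equals ln(6), which gives a reference value for an untrained classifier.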

def compute_cost(Z3, Y):
    """
    Computes the cost

    Arguments:
    Z3 -- output of forward propagation (output of the last LINEAR unit), of shape (number of examples, 6)
    Y -- "true" labels vector placeholder, same shape as Z3

    Returns:
    cost - Tensor of the cost function
    """

### Model

Finally we need to merge the helper functions we implemented above to build a model and train it on the SIGNS dataset.

The model should:

• create placeholders
• initialize parameters
• forward propagate
• compute the cost
• create an optimizer

To train it, we create a session and run a for loop over num_epochs; in each epoch we get the mini-batches and run the optimizer on each mini-batch.
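The mini-batch step can be sketched in NumPy as follows. The function name `mini_batches_sketch` is hypothetical and is not the helper from the course's `cnn_utils`; the stand-in arrays only mimic the number of training examples.

```python
import numpy as np

def mini_batches_sketch(X, Y, batch_size=64, seed=0):
    """Shuffle X and Y with the same permutation and cut them into consecutive mini-batches."""
    m = X.shape[0]
    perm = np.random.default_rng(seed).permutation(m)    # one shuffle per call, e.g. once per epoch
    X_shuffled, Y_shuffled = X[perm], Y[perm]
    return [(X_shuffled[i:i + batch_size], Y_shuffled[i:i + batch_size])
            for i in range(0, m, batch_size)]

# Stand-in data with as many examples as the SIGNS training set
X_demo = np.arange(1080).reshape(1080, 1)
Y_demo = np.arange(1080).reshape(1080, 1)
batches = mini_batches_sketch(X_demo, Y_demo)
print(len(batches), batches[-1][0].shape[0])   # 17 56
```

With 1080 examples and a mini-batch size of 64 this gives 16 full mini-batches plus a final one of 56 examples.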

def model(X_train, Y_train, X_test, Y_test, learning_rate = 0.009,
          num_epochs = 100, minibatch_size = 64, print_cost = True):
    """
    Implements a three-layer ConvNet in Tensorflow:
    CONV2D -> RELU -> MAXPOOL -> CONV2D -> RELU -> MAXPOOL -> FLATTEN -> FULLYCONNECTED

    Arguments:
    X_train -- training set, of shape (None, 64, 64, 3)
    Y_train -- training labels, of shape (None, n_y = 6)
    X_test -- test set, of shape (None, 64, 64, 3)
    Y_test -- test labels, of shape (None, n_y = 6)
    learning_rate -- learning rate of the optimization
    num_epochs -- number of epochs of the optimization loop
    minibatch_size -- size of a minibatch
    print_cost -- True to print the cost every 5 epochs

    Returns:
    train_accuracy -- real number, accuracy on the train set (X_train)
    test_accuracy -- real number, testing accuracy on the test set (X_test)
    parameters -- parameters learnt by the model. They can then be used to predict.
    """

Then let’s train the model for 100 epochs.

_, _, parameters = model(X_train, Y_train, X_test, Y_test)

with the following output:
Cost after epoch 0: 1.918487
Cost after epoch 5: 1.875008
Cost after epoch 10: 1.813409
Cost after epoch 15: 1.667654
Cost after epoch 20: 1.444399
Cost after epoch 25: 1.203926
Cost after epoch 30: 1.028009
Cost after epoch 35: 0.887578
Cost after epoch 40: 0.791803
Cost after epoch 45: 0.712319
Cost after epoch 50: 0.655244
Cost after epoch 55: 0.597494
Cost after epoch 60: 0.556236
Cost after epoch 65: 0.525260
Cost after epoch 70: 0.484548
Cost after epoch 75: 0.477365
Cost after epoch 80: 0.451908
Cost after epoch 85: 0.415393
Cost after epoch 90: 0.386501
Cost after epoch 95: 0.373167

Tensor(“Mean_1:0”, shape=(), dtype=float32)
Train Accuracy: 0.894444
Test Accuracy: 0.841667
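Accuracy figures like these are typically the fraction of examples whose highest-scoring class matches the true label. The model code is not shown here, so the following NumPy sketch is an assumption about how such a number is obtained; the function name and the toy values are made up.

```python
import numpy as np

def accuracy(logits, labels_one_hot):
    """Fraction of rows where the arg-max of the logits equals the arg-max of the one-hot label."""
    predicted = np.argmax(logits, axis=1)
    actual = np.argmax(labels_one_hot, axis=1)
    return float(np.mean(predicted == actual))

logits_demo = np.array([[2.0, 0.1, 0.3],
                        [0.2, 0.1, 3.0],
                        [0.5, 0.4, 0.1],
                        [0.0, 1.0, 0.2]])
labels_demo = np.eye(3)[[0, 2, 1, 1]]     # true classes: 0, 2, 1, 1
acc_demo = accuracy(logits_demo, labels_demo)
print(acc_demo)                           # 0.75 (3 of 4 predictions are correct)
```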

## 2. Improving the Accuracy of the Hand-Gesture Classifier with Residual Networks

Now we shall learn how to build very deep convolutional networks using Residual Networks (ResNets). In theory, very deep networks can represent very complex functions; but in practice, they are hard to train. Residual Networks, introduced by He et al., make it possible to train much deeper networks than were previously practically feasible.

In this assignment, we are going to accomplish the following tasks:

• Implement the basic building blocks of ResNets.
• Put together these building blocks to implement and train a state-of-the-art neural network for image classification.

This assignment will be done in Keras.

Let’s first load the following required packages.

import numpy as np
from keras import layers
from keras.layers import Input, Add, Dense, Activation, ZeroPadding2D, BatchNormalization, Flatten, Conv2D, AveragePooling2D, MaxPooling2D, GlobalMaxPooling2D
from keras.models import Model
from keras.preprocessing import image
from keras.utils import layer_utils
from keras.utils.data_utils import get_file
from keras.applications.imagenet_utils import preprocess_input
import pydot_ng as pydot
from IPython.display import SVG
from keras.utils.vis_utils import model_to_dot
from keras.utils import plot_model
from resnets_utils import *
from keras.initializers import glorot_uniform
import scipy.misc
from matplotlib.pyplot import imshow
%matplotlib inline
import keras.backend as K
K.set_image_data_format('channels_last')
K.set_learning_phase(1)

### The problem of very deep neural networks

In recent years, neural networks have become deeper, with state-of-the-art networks going from just a few layers (e.g., AlexNet) to over a hundred layers.

The main benefit of a very deep network is that it can represent very complex functions. It can also learn features at many different levels of abstraction, from edges (at the lower layers) to very complex features (at the deeper layers). However, using a deeper network doesn’t always help. A huge barrier to training them is vanishing gradients: very deep networks often have a gradient signal that goes to zero quickly, thus making gradient descent unbearably slow.

During training, we might therefore see the magnitude (or norm) of the gradient for the earlier layers decrease to zero very rapidly as training proceeds.
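A toy calculation illustrates why the gradient can shrink so quickly with depth. The sketch below assumes a chain of scalar sigmoid layers with the same weight and the same pre-activation in every layer, which is a deliberate simplification and not the network used in this article; the function names are hypothetical.

```python
import math

def sigmoid_grad(z):
    """Derivative of the sigmoid at z; its maximum value is 0.25, reached at z = 0."""
    s = 1.0 / (1.0 + math.exp(-z))
    return s * (1.0 - s)

def gradient_scale(num_layers, weight=1.0, z=0.0):
    """Factor by which backpropagation scales a gradient across a chain of scalar sigmoid layers."""
    scale = 1.0
    for _ in range(num_layers):
        scale *= weight * sigmoid_grad(z)   # chain rule: one factor per layer
    return scale

for depth in (2, 10, 50):
    print(depth, gradient_scale(depth))     # 0.25 ** depth
```

Even in this best case for the sigmoid derivative, the factor after 10 layers is below one millionth, so the earliest layers receive almost no gradient signal.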

We are now going to solve this problem by building a Residual Network!

### Building a Residual Network

In ResNets, a “shortcut” or a “skip connection” allows the gradient to be directly back-propagated to earlier layers:

The image on the left shows the “main path” through the network. The image on the right adds a shortcut to the main path. By stacking these ResNet blocks on top of each other, we can form a very deep network.

Two main types of blocks are used in a ResNet, depending mainly on whether the input/output dimensions are same or different. We are going to implement both of them.
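The effect of a skip connection can be sketched with a tiny NumPy example. It uses two dense layers in place of the convolutions and leaves out BatchNorm, so it is only an illustration of the idea; the function names are made up.

```python
import numpy as np

def relu(x):
    return np.maximum(x, 0.0)

def residual_block(x, W1, W2):
    """Two-layer main path plus an identity shortcut: relu(W2 @ relu(W1 @ x) + x)."""
    main = W2 @ relu(W1 @ x)     # main path
    return relu(main + x)        # add the shortcut, then apply the final ReLU

x_demo = np.array([1.0, 2.0, 3.0])
W_zero = np.zeros((3, 3))
out_demo = residual_block(x_demo, W_zero, W_zero)
print(out_demo)                  # [1. 2. 3.]
```

When the main-path weights are zero, the block simply passes its non-negative input through, which is one way to see why adding such blocks makes a network deeper without making the identity mapping hard to learn.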

### 1 – The identity block

The identity block is the standard block used in ResNets, and corresponds to the case where the input activation (say a[l]) has the same dimension as the output activation (say a[l+2]). To flesh out the different steps of what happens in a ResNet’s identity block, here is an alternative diagram showing the individual steps:

The upper path is the “shortcut path.” The lower path is the “main path.” In this diagram, we have also made explicit the CONV2D and ReLU steps in each layer. To speed up training we have also added a BatchNorm step.

In this exercise, we’ll actually implement a slightly more powerful version of this identity block, in which the skip connection “skips over” 3 hidden layers rather than 2 layers. It looks like this:

Here are the individual steps.

First component of main path:

• The first CONV2D has F1 filters of shape (1,1) and a stride of (1,1). Its padding is “valid” and its name should be conv_name_base + '2a'. Use 0 as the seed for the random initialization.
• The first BatchNorm is normalizing the channels axis. Its name should be bn_name_base + '2a'.
• Then apply the ReLU activation function. This has no name and no hyperparameters.

Second component of main path:

• The second CONV2D has F2 filters of shape (f,f) and a stride of (1,1). Its padding is “same” and its name should be conv_name_base + '2b'. Use 0 as the seed for the random initialization.
• The second BatchNorm is normalizing the channels axis. Its name should be bn_name_base + '2b'.
• Then apply the ReLU activation function. This has no name and no hyperparameters.

Third component of main path:

• The third CONV2D has F3 filters of shape (1,1) and a stride of (1,1). Its padding is “valid” and its name should be conv_name_base + '2c'. Use 0 as the seed for the random initialization.
• The third BatchNorm is normalizing the channels axis. Its name should be bn_name_base + '2c'. Note that there is no ReLU activation function in this component.

Final step:

• The shortcut and the output of the main path are added together.
• Then apply the ReLU activation function. This has no name and no hyperparameters.

Now let’s implement the ResNet identity block.

• To implement the Conv2D step, see the Keras Conv2D reference.
• To implement BatchNorm, see the Keras BatchNormalization reference (axis: Integer, the axis that should be normalized (typically the channels axis)).
• For the activation, use: Activation('relu')(X)
• To add the value passed forward by the shortcut, see the Keras Add reference.

def identity_block(X, f, filters, stage, block):
    """
    Implementation of the identity block as defined in Figure 3

    Arguments:
    X -- input tensor of shape (m, n_H_prev, n_W_prev, n_C_prev)
    f -- integer, specifying the shape of the middle CONV's window for the main path
    filters -- python list of integers, defining the number of filters in the CONV layers of the main path
    stage -- integer, used to name the layers, depending on their position in the network
    block -- string/character, used to name the layers, depending on their position in the network

    Returns:
    X -- output of the identity block, tensor of shape (n_H, n_W, n_C)
    """
    ### The first Component ###
    # defining name basis
    conv_name_base = 'res' + str(stage) + block + '_branch'
    bn_name_base = 'bn' + str(stage) + block + '_branch'

    # Retrieve Filters
    F1, F2, F3 = filters

    # Save the input value. You'll need this later to add back to the main path.
    X_shortcut = X

    # First component of main path
    X = Conv2D(filters = F1, kernel_size = (1, 1), strides = (1,1), padding = 'valid', name = conv_name_base + '2a', kernel_initializer = glorot_uniform(seed=0))(X)
    X = BatchNormalization(axis = 3, name = bn_name_base + '2a')(X)
    X = Activation('relu')(X)

    ### The second Component ###

    # ...

    ### The third Component ###

    # ...

    return X
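To get a feel for the size of one identity block, the sketch below counts the Conv2D parameters of its three components, assuming the usual formula k·k·C_in·C_out weights plus C_out biases. The helper names are hypothetical; the example uses filters [64, 64, 256] on a 256-channel input, and the resulting numbers agree with the Param # column for res2b_branch2a, res2b_branch2b and res2b_branch2c in the model summary printed at the end of this article.

```python
def conv2d_params(kernel, c_in, c_out):
    """Weights plus biases of a Conv2D layer with a square kernel."""
    return kernel * kernel * c_in * c_out + c_out

def identity_block_conv_params(f, c_in, filters):
    """Parameter counts of the three CONV2D layers on the main path of an identity block."""
    F1, F2, F3 = filters
    return [
        conv2d_params(1, c_in, F1),   # first component: 1x1 conv
        conv2d_params(f, F1, F2),     # second component: fxf conv
        conv2d_params(1, F2, F3),     # third component: 1x1 conv
    ]

counts = identity_block_conv_params(3, 256, [64, 64, 256])
print(counts)        # [16448, 36928, 16640]
print(sum(counts))   # 70016
```

Note that F3 has to equal the number of input channels (256 here), otherwise the shortcut could not be added to the main path.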

### 2 – The convolutional block

Next, the ResNet “convolutional block” is the other type of block. We can use this type of block when the input and output dimensions don’t match up. The difference with the identity block is that there is a CONV2D layer in the shortcut path:

The CONV2D layer in the shortcut path is used to resize the input x to a different dimension, so that the dimensions match up in the final addition needed to add the shortcut value back to the main path. For example, to reduce the activation’s height and width by a factor of 2, we can use a 1×1 convolution with a stride of 2. The CONV2D layer on the shortcut path does not use any non-linear activation function. Its main role is to just apply a (learned) linear function that reduces the dimension of the input, so that the dimensions match up for the later addition step.

The details of the convolutional block are as follows.

First component of main path:

• The first CONV2D has F1 filters of shape (1,1) and a stride of (s,s). Its padding is “valid” and its name should be conv_name_base + '2a'.
• The first BatchNorm is normalizing the channels axis. Its name should be bn_name_base + '2a'.
• Then apply the ReLU activation function. This has no name and no hyperparameters.

Second component of main path:

• The second CONV2D has F2 filters of shape (f,f) and a stride of (1,1). Its padding is “same” and its name should be conv_name_base + '2b'.
• The second BatchNorm is normalizing the channels axis. Its name should be bn_name_base + '2b'.
• Then apply the ReLU activation function. This has no name and no hyperparameters.

Third component of main path:

• The third CONV2D has F3 filters of shape (1,1) and a stride of (1,1). Its padding is “valid” and its name should be conv_name_base + '2c'.
• The third BatchNorm is normalizing the channels axis. Its name should be bn_name_base + '2c'. Note that there is no ReLU activation function in this component.

Shortcut path:

• The CONV2D has F3 filters of shape (1,1) and a stride of (s,s). Its padding is “valid” and its name should be conv_name_base + '1'.
• The BatchNorm is normalizing the channels axis. Its name should be bn_name_base + '1'.

Final step:

• The shortcut and the main path values are added together.
• Then apply the ReLU activation function. This has no name and no hyperparameters.

Let’s now implement the convolutional block.

• To implement the Conv2D step, see the Keras Conv2D reference.
• To implement BatchNorm, see the Keras BatchNormalization reference (axis: Integer, the axis that should be normalized (typically the features axis)).
• For the activation, use: Activation('relu')(X)

def convolutional_block(X, f, filters, stage, block, s = 2):
    """
    Implementation of the convolutional block as defined in Figure 4

    Arguments:
    X -- input tensor of shape (m, n_H_prev, n_W_prev, n_C_prev)
    f -- integer, specifying the shape of the middle CONV's window for the main path
    filters -- python list of integers, defining the number of filters in the CONV layers of the main path
    stage -- integer, used to name the layers, depending on their position in the network
    block -- string/character, used to name the layers, depending on their position in the network
    s -- Integer, specifying the stride to be used

    Returns:
    X -- output of the convolutional block, tensor of shape (n_H, n_W, n_C)
    """

    # defining name basis
    conv_name_base = 'res' + str(stage) + block + '_branch'
    bn_name_base = 'bn' + str(stage) + block + '_branch'

    # Retrieve Filters
    F1, F2, F3 = filters

    # Save the input value
    X_shortcut = X

    ##### MAIN PATH #####
    # First component of main path
    X = Conv2D(F1, (1, 1), strides = (s,s), name = conv_name_base + '2a', kernel_initializer = glorot_uniform(seed=0))(X)
    X = BatchNormalization(axis = 3, name = bn_name_base + '2a')(X)
    X = Activation('relu')(X)

    # Second component of main path
    # ...
    # Third component of main path
    # ...
    ##### SHORTCUT PATH ####
    # ...
    # Final step: Add shortcut value to main path, and pass it through a RELU activation
    # ...
    return X
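The role of the strided 1×1 convolution on the shortcut path can be illustrated in NumPy: with “valid” padding it amounts to keeping every s-th pixel and applying the same linear map to the channels of each kept pixel. The function name `conv1x1_strided`, the random inputs and the chosen sizes (15×15×256 in, 512 filters) are assumptions made for this sketch, not the Keras implementation.

```python
import numpy as np

def conv1x1_strided(x, kernel, bias, s=2):
    """1x1 'valid' convolution with stride s on x of shape (m, H, W, C_in)."""
    sub = x[:, ::s, ::s, :]        # keep every s-th row and column
    return sub @ kernel + bias     # per-pixel linear map from C_in to C_out channels

rng = np.random.default_rng(0)
x_demo = rng.standard_normal((1, 15, 15, 256))
kernel_demo = rng.standard_normal((256, 512)) * 0.01
bias_demo = np.zeros(512)

shortcut_demo = conv1x1_strided(x_demo, kernel_demo, bias_demo, s=2)
print(shortcut_demo.shape)         # (1, 8, 8, 512)
```

The shortcut now has a smaller height and width and a different number of channels, so it can be added to a main path that ends in 512 filters.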

### 3 – Building our first ResNet model (50 layers)

We now have the necessary blocks to build a very deep ResNet. The following figure describes in detail the architecture of this neural network. “ID BLOCK” in the diagram stands for “Identity block,” and “ID BLOCK x3” means we should stack 3 identity blocks together.

The details of this ResNet-50 model are:

• Stage 1:
    • The 2D Convolution has 64 filters of shape (7,7) and uses a stride of (2,2). Its name is “conv1”.
    • BatchNorm is applied to the channels axis of the input.
    • MaxPooling uses a (3,3) window and a (2,2) stride.
• Stage 2:
    • The convolutional block uses three sets of filters of size [64,64,256], “f” is 3, “s” is 1 and the block is “a”.
    • The 2 identity blocks use three sets of filters of size [64,64,256], “f” is 3 and the blocks are “b” and “c”.
• Stage 3:
    • The convolutional block uses three sets of filters of size [128,128,512], “f” is 3, “s” is 2 and the block is “a”.
    • The 3 identity blocks use three sets of filters of size [128,128,512], “f” is 3 and the blocks are “b”, “c” and “d”.
• Stage 4:
    • The convolutional block uses three sets of filters of size [256, 256, 1024], “f” is 3, “s” is 2 and the block is “a”.
    • The 5 identity blocks use three sets of filters of size [256, 256, 1024], “f” is 3 and the blocks are “b”, “c”, “d”, “e” and “f”.
• Stage 5:
    • The convolutional block uses three sets of filters of size [512, 512, 2048], “f” is 3, “s” is 2 and the block is “a”.
    • The 2 identity blocks use three sets of filters of size [512, 512, 2048], “f” is 3 and the blocks are “b” and “c”.
• The 2D Average Pooling uses a window of shape (2,2) and its name is “avg_pool”.
• The flatten doesn’t have any hyperparameters or name.
• The Fully Connected (Dense) layer reduces its input to the number of classes using a softmax activation. Its name should be 'fc' + str(classes).
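The stage settings above fix the spatial size of the feature maps throughout the network. The sketch below traces the height/width for a 64×64 input using the standard “valid” formula floor((n − k) / s) + 1. Two things are assumptions: a zero padding of 3 pixels before conv1 (not mentioned in the list, but consistent with the 32×32 conv1 output in the model summary printed further below), and an average-pooling stride equal to its window size. The helper name `valid_out` is hypothetical.

```python
def valid_out(n, k, s):
    """Output size of a 'valid' convolution or pooling with window k and stride s."""
    return (n - k) // s + 1

trace = []
n = 64 + 2 * 3                       # assumed zero padding of 3 on each side
trace.append(("zero_pad", n))
n = valid_out(n, 7, 2)               # conv1: 7x7 filters, stride 2
trace.append(("conv1", n))
n = valid_out(n, 3, 2)               # max pooling: 3x3 window, stride 2
trace.append(("max_pool", n))
for stage, s in [(2, 1), (3, 2), (4, 2), (5, 2)]:
    n = valid_out(n, 1, s)           # first 1x1 conv of the convolutional block, stride s
    trace.append(("stage" + str(stage), n))   # identity blocks keep the size unchanged
n = valid_out(n, 2, 2)               # average pooling: 2x2 window, stride assumed to be 2
trace.append(("avg_pool", n))
flattened = n * n * 2048             # 2048 channels after stage 5

print(trace)
print(flattened)                     # 2048
```

Under these assumptions the sizes run 70, 32, 15, 15, 8, 4, 2, 1, so the Dense layer receives 2048 features per image.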

Let’s implement the ResNet with 50 layers described in the figure above.

We’ll need the Keras AveragePooling2D layer for the pooling step, along with the other layers imported earlier (Conv2D, BatchNormalization, ZeroPadding2D, MaxPooling2D, Dense and Add).

def ResNet50(input_shape = (64, 64, 3), classes = 6):
    """
    Implementation of the popular ResNet50 with the following architecture:
    CONV2D -> BATCHNORM -> RELU -> MAXPOOL -> CONVBLOCK -> IDBLOCK*2 -> CONVBLOCK -> IDBLOCK*3
    -> CONVBLOCK -> IDBLOCK*5 -> CONVBLOCK -> IDBLOCK*2 -> AVGPOOL -> TOPLAYER

    Arguments:
    input_shape -- shape of the images of the dataset
    classes -- integer, number of classes

    Returns:
    model -- a Model() instance in Keras
    """

    # Define the input as a tensor with shape input_shape
    X_input = Input(input_shape)

    # Zero-padding (assumed to be (3, 3), which is consistent with the 32x32 output of conv1 in the model summary)
    X = ZeroPadding2D((3, 3))(X_input)

    # Stage 1
    X = Conv2D(64, (7, 7), strides = (2, 2), name = 'conv1', kernel_initializer = glorot_uniform(seed=0))(X)
    X = BatchNormalization(axis = 3, name = 'bn_conv1')(X)
    X = Activation('relu')(X)
    X = MaxPooling2D((3, 3), strides=(2, 2))(X)

    # Stage 2
    X = convolutional_block(X, f = 3, filters = [64, 64, 256], stage = 2, block='a', s = 1)
    X = identity_block(X, 3, [64, 64, 256], stage=2, block='b')
    X = identity_block(X, 3, [64, 64, 256], stage=2, block='c')

    # ...

    # ...

    # output layer
    X = Flatten()(X)
    X = Dense(classes, activation='softmax', name='fc' + str(classes), kernel_initializer = glorot_uniform(seed=0))(X)

    # Create model
    model = Model(inputs = X_input, outputs = X, name='ResNet50')

    return model

Next, let’s build the model’s graph. We have 6 output classes for the hand-signs dataset.

model = ResNet50(input_shape = (64, 64, 3), classes = 6)


We need to configure the learning process by compiling the model.

model.compile(optimizer='adam', loss='categorical_crossentropy', metrics=['accuracy'])


The model is now ready to be trained. The only thing we need is the same hand-signs dataset that we used earlier, so let’s load it again.

X_train_orig, Y_train_orig, X_test_orig, Y_test_orig, classes = load_dataset()

# Normalize image vectors
X_train = X_train_orig/255.
X_test = X_test_orig/255.

# Convert training and test labels to one hot matrices
Y_train = convert_to_one_hot(Y_train_orig, 6).T
Y_test = convert_to_one_hot(Y_test_orig, 6).T

print ("number of training examples = " + str(X_train.shape[0]))
print ("number of test examples = " + str(X_test.shape[0]))
print ("X_train shape: " + str(X_train.shape))
print ("Y_train shape: " + str(Y_train.shape))
print ("X_test shape: " + str(X_test.shape))
print ("Y_test shape: " + str(Y_test.shape))
number of training examples = 1080
number of test examples = 120
X_train shape: (1080, 64, 64, 3)
Y_train shape: (1080, 6)
X_test shape: (120, 64, 64, 3)
Y_test shape: (120, 6)


Now let’s train our ResNet model for 20 epochs with a batch size of 32.

model.fit(X_train, Y_train, epochs = 20, batch_size = 32)
Epoch 1/20
1080/1080 [==============================] - 173s - loss: 2.0610 - acc: 0.3435
Epoch 2/20
1080/1080 [==============================] - 149s - loss: 1.8561 - acc: 0.4259
Epoch 3/20
1080/1080 [==============================] - 147s - loss: 2.0284 - acc: 0.4343
Epoch 4/20
1080/1080 [==============================] - 151s - loss: 1.7140 - acc: 0.4500
Epoch 5/20
1080/1080 [==============================] - 134s - loss: 1.4401 - acc: 0.5676
Epoch 6/20
1080/1080 [==============================] - 128s - loss: 1.1950 - acc: 0.6481
Epoch 7/20
1080/1080 [==============================] - 129s - loss: 0.9886 - acc: 0.7426
Epoch 8/20
1080/1080 [==============================] - 133s - loss: 1.2155 - acc: 0.6843
Epoch 9/20
1080/1080 [==============================] - 131s - loss: 0.8536 - acc: 0.8185
Epoch 10/20
1080/1080 [==============================] - 132s - loss: 0.9502 - acc: 0.7565
Epoch 11/20
1080/1080 [==============================] - 129s - loss: 0.8180 - acc: 0.8111
Epoch 12/20
1080/1080 [==============================] - 130s - loss: 0.7060 - acc: 0.8343
Epoch 13/20
1080/1080 [==============================] - 130s - loss: 0.8687 - acc: 0.8148
Epoch 14/20
1080/1080 [==============================] - 130s - loss: 0.8282 - acc: 0.8509
Epoch 15/20
1080/1080 [==============================] - 130s - loss: 0.9303 - acc: 0.7972
Epoch 16/20
1080/1080 [==============================] - 146s - loss: 1.1211 - acc: 0.7870
Epoch 17/20
1080/1080 [==============================] - 143s - loss: 0.9337 - acc: 0.7824
Epoch 18/20
1080/1080 [==============================] - 150s - loss: 0.3976 - acc: 0.8870
Epoch 19/20
1080/1080 [==============================] - 143s - loss: 0.2532 - acc: 0.9407
Epoch 20/20
1080/1080 [==============================] - 133s - loss: 0.2528 - acc: 0.9556

Let’s see how this model performs on the test set.

preds = model.evaluate(X_test, Y_test)
print ("Loss = " + str(preds[0]))
print ("Test Accuracy = " + str(preds[1]))
Loss = 0.36906948487
Test Accuracy = 0.891666662693

We can also print a summary of our model by running the following code.

model.summary()

____________________________________________________________________________________________________
Layer (type)                     Output Shape          Param #     Connected to
====================================================================================================
input_1 (InputLayer)             (None, 64, 64, 3)     0
____________________________________________________________________________________________________
conv1 (Conv2D)                   (None, 32, 32, 64)    9472
____________________________________________________________________________________________________
bn_conv1 (BatchNormalization)    (None, 32, 32, 64)    256
____________________________________________________________________________________________________
activation_4 (Activation)        (None, 32, 32, 64)    0
____________________________________________________________________________________________________
max_pooling2d_1 (MaxPooling2D)   (None, 15, 15, 64)    0
____________________________________________________________________________________________________
res2a_branch2a (Conv2D)          (None, 15, 15, 64)    4160
____________________________________________________________________________________________________
bn2a_branch2a (BatchNormalizatio (None, 15, 15, 64)    256
____________________________________________________________________________________________________
activation_5 (Activation)        (None, 15, 15, 64)    0
____________________________________________________________________________________________________
res2a_branch2b (Conv2D)          (None, 15, 15, 64)    36928
____________________________________________________________________________________________________
bn2a_branch2b (BatchNormalizatio (None, 15, 15, 64)    256
____________________________________________________________________________________________________
activation_6 (Activation)        (None, 15, 15, 64)    0
____________________________________________________________________________________________________
res2a_branch1 (Conv2D)           (None, 15, 15, 256)   16640
____________________________________________________________________________________________________
res2a_branch2c (Conv2D)          (None, 15, 15, 256)   16640
____________________________________________________________________________________________________
bn2a_branch1 (BatchNormalization (None, 15, 15, 256)   1024
____________________________________________________________________________________________________
bn2a_branch2c (BatchNormalizatio (None, 15, 15, 256)   1024
____________________________________________________________________________________________________
activation_7 (Activation)        (None, 15, 15, 256)   0
____________________________________________________________________________________________________
res2b_branch2a (Conv2D)          (None, 15, 15, 64)    16448
____________________________________________________________________________________________________
bn2b_branch2a (BatchNormalizatio (None, 15, 15, 64)    256
____________________________________________________________________________________________________
activation_8 (Activation)        (None, 15, 15, 64)    0
____________________________________________________________________________________________________
res2b_branch2b (Conv2D)          (None, 15, 15, 64)    36928
____________________________________________________________________________________________________
bn2b_branch2b (BatchNormalizatio (None, 15, 15, 64)    256
____________________________________________________________________________________________________
activation_9 (Activation)        (None, 15, 15, 64)    0
____________________________________________________________________________________________________
res2b_branch2c (Conv2D)          (None, 15, 15, 256)   16640
____________________________________________________________________________________________________
bn2b_branch2c (BatchNormalizatio (None, 15, 15, 256)   1024
____________________________________________________________________________________________________
activation_10 (Activation)       (None, 15, 15, 256)   0
____________________________________________________________________________________________________
res2c_branch2a (Conv2D)          (None, 15, 15, 64)    16448
____________________________________________________________________________________________________
bn2c_branch2a (BatchNormalizatio (None, 15, 15, 64)    256
____________________________________________________________________________________________________
activation_11 (Activation)       (None, 15, 15, 64)    0
____________________________________________________________________________________________________
res2c_branch2b (Conv2D)          (None, 15, 15, 64)    36928
____________________________________________________________________________________________________
bn2c_branch2b (BatchNormalizatio (None, 15, 15, 64)    256
____________________________________________________________________________________________________
activation_12 (Activation)       (None, 15, 15, 64)    0
____________________________________________________________________________________________________
res2c_branch2c (Conv2D)          (None, 15, 15, 256)   16640
____________________________________________________________________________________________________
bn2c_branch2c (BatchNormalizatio (None, 15, 15, 256)   1024
____________________________________________________________________________________________________
activation_13 (Activation)       (None, 15, 15, 256)   0
____________________________________________________________________________________________________
res3a_branch2a (Conv2D)          (None, 8, 8, 128)     32896
____________________________________________________________________________________________________
bn3a_branch2a (BatchNormalizatio (None, 8, 8, 128)     512
____________________________________________________________________________________________________
activation_14 (Activation)       (None, 8, 8, 128)     0
____________________________________________________________________________________________________
res3a_branch2b (Conv2D)          (None, 8, 8, 128)     147584
____________________________________________________________________________________________________
bn3a_branch2b (BatchNormalizatio (None, 8, 8, 128)     512
____________________________________________________________________________________________________
activation_15 (Activation)       (None, 8, 8, 128)     0
____________________________________________________________________________________________________
res3a_branch1 (Conv2D)           (None, 8, 8, 512)     131584
____________________________________________________________________________________________________
res3a_branch2c (Conv2D)          (None, 8, 8, 512)     66048
____________________________________________________________________________________________________
bn3a_branch1 (BatchNormalization (None, 8, 8, 512)     2048
____________________________________________________________________________________________________
bn3a_branch2c (BatchNormalizatio (None, 8, 8, 512)     2048
____________________________________________________________________________________________________
activation_16 (Activation)       (None, 8, 8, 512)     0
____________________________________________________________________________________________________
res3b_branch2a (Conv2D)          (None, 8, 8, 128)     65664
____________________________________________________________________________________________________
bn3b_branch2a (BatchNormalizatio (None, 8, 8, 128)     512
____________________________________________________________________________________________________
activation_17 (Activation)       (None, 8, 8, 128)     0
____________________________________________________________________________________________________
res3b_branch2b (Conv2D)          (None, 8, 8, 128)     147584
____________________________________________________________________________________________________
bn3b_branch2b (BatchNormalizatio (None, 8, 8, 128)     512
____________________________________________________________________________________________________
activation_18 (Activation)       (None, 8, 8, 128)     0
____________________________________________________________________________________________________
res3b_branch2c (Conv2D)          (None, 8, 8, 512)     66048
____________________________________________________________________________________________________
bn3b_branch2c (BatchNormalizatio (None, 8, 8, 512)     2048
____________________________________________________________________________________________________
activation_19 (Activation)       (None, 8, 8, 512)     0
____________________________________________________________________________________________________
res3c_branch2a (Conv2D)          (None, 8, 8, 128)     65664
____________________________________________________________________________________________________
bn3c_branch2a (BatchNormalizatio (None, 8, 8, 128)     512
____________________________________________________________________________________________________
activation_20 (Activation)       (None, 8, 8, 128)     0
____________________________________________________________________________________________________
res3c_branch2b (Conv2D)          (None, 8, 8, 128)     147584
____________________________________________________________________________________________________
bn3c_branch2b (BatchNormalizatio (None, 8, 8, 128)     512
____________________________________________________________________________________________________
activation_21 (Activation)       (None, 8, 8, 128)     0
____________________________________________________________________________________________________
res3c_branch2c (Conv2D)          (None, 8, 8, 512)     66048
____________________________________________________________________________________________________
bn3c_branch2c (BatchNormalizatio (None, 8, 8, 512)     2048
____________________________________________________________________________________________________
activation_22 (Activation)       (None, 8, 8, 512)     0
____________________________________________________________________________________________________
res3d_branch2a (Conv2D)          (None, 8, 8, 128)     65664
____________________________________________________________________________________________________
bn3d_branch2a (BatchNormalizatio (None, 8, 8, 128)     512
____________________________________________________________________________________________________
activation_23 (Activation)       (None, 8, 8, 128)     0
____________________________________________________________________________________________________
res3d_branch2b (Conv2D)          (None, 8, 8, 128)     147584
____________________________________________________________________________________________________
bn3d_branch2b (BatchNormalizatio (None, 8, 8, 128)     512
____________________________________________________________________________________________________
activation_24 (Activation)       (None, 8, 8, 128)     0
____________________________________________________________________________________________________
res3d_branch2c (Conv2D)          (None, 8, 8, 512)     66048
____________________________________________________________________________________________________
bn3d_branch2c (BatchNormalizatio (None, 8, 8, 512)     2048
____________________________________________________________________________________________________
activation_25 (Activation)       (None, 8, 8, 512)     0
____________________________________________________________________________________________________
res4a_branch2a (Conv2D)          (None, 4, 4, 256)     131328
____________________________________________________________________________________________________
bn4a_branch2a (BatchNormalizatio (None, 4, 4, 256)     1024
____________________________________________________________________________________________________
activation_26 (Activation)       (None, 4, 4, 256)     0
____________________________________________________________________________________________________
res4a_branch2b (Conv2D)          (None, 4, 4, 256)     590080
____________________________________________________________________________________________________
bn4a_branch2b (BatchNormalizatio (None, 4, 4, 256)     1024
____________________________________________________________________________________________________
activation_27 (Activation)       (None, 4, 4, 256)     0
____________________________________________________________________________________________________
res4a_branch1 (Conv2D)           (None, 4, 4, 1024)    525312
____________________________________________________________________________________________________
res4a_branch2c (Conv2D)          (None, 4, 4, 1024)    263168
____________________________________________________________________________________________________
bn4a_branch1 (BatchNormalization (None, 4, 4, 1024)    4096
____________________________________________________________________________________________________
bn4a_branch2c (BatchNormalizatio (None, 4, 4, 1024)    4096
____________________________________________________________________________________________________
activation_28 (Activation)       (None, 4, 4, 1024)    0
____________________________________________________________________________________________________
res4b_branch2a (Conv2D)          (None, 4, 4, 256)     262400
____________________________________________________________________________________________________
bn4b_branch2a (BatchNormalizatio (None, 4, 4, 256)     1024
____________________________________________________________________________________________________
activation_29 (Activation)       (None, 4, 4, 256)     0
____________________________________________________________________________________________________
res4b_branch2b (Conv2D)          (None, 4, 4, 256)     590080
____________________________________________________________________________________________________
bn4b_branch2b (BatchNormalizatio (None, 4, 4, 256)     1024
____________________________________________________________________________________________________
activation_30 (Activation)       (None, 4, 4, 256)     0
____________________________________________________________________________________________________
res4b_branch2c (Conv2D)          (None, 4, 4, 1024)    263168
____________________________________________________________________________________________________
bn4b_branch2c (BatchNormalizatio (None, 4, 4, 1024)    4096
____________________________________________________________________________________________________
activation_31 (Activation)       (None, 4, 4, 1024)    0
____________________________________________________________________________________________________
res4c_branch2a (Conv2D)          (None, 4, 4, 256)     262400
____________________________________________________________________________________________________
bn4c_branch2a (BatchNormalizatio (None, 4, 4, 256)     1024
____________________________________________________________________________________________________
activation_32 (Activation)       (None, 4, 4, 256)     0
____________________________________________________________________________________________________
res4c_branch2b (Conv2D)          (None, 4, 4, 256)     590080
____________________________________________________________________________________________________
bn4c_branch2b (BatchNormalizatio (None, 4, 4, 256)     1024
____________________________________________________________________________________________________
activation_33 (Activation)       (None, 4, 4, 256)     0
____________________________________________________________________________________________________
res4c_branch2c (Conv2D)          (None, 4, 4, 1024)    263168
____________________________________________________________________________________________________
bn4c_branch2c (BatchNormalizatio (None, 4, 4, 1024)    4096
____________________________________________________________________________________________________
activation_34 (Activation)       (None, 4, 4, 1024)    0
____________________________________________________________________________________________________
res4d_branch2a (Conv2D)          (None, 4, 4, 256)     262400
____________________________________________________________________________________________________
bn4d_branch2a (BatchNormalizatio (None, 4, 4, 256)     1024
____________________________________________________________________________________________________
activation_35 (Activation)       (None, 4, 4, 256)     0
____________________________________________________________________________________________________
res4d_branch2b (Conv2D)          (None, 4, 4, 256)     590080
____________________________________________________________________________________________________
bn4d_branch2b (BatchNormalizatio (None, 4, 4, 256)     1024
____________________________________________________________________________________________________
activation_36 (Activation)       (None, 4, 4, 256)     0
____________________________________________________________________________________________________
res4d_branch2c (Conv2D)          (None, 4, 4, 1024)    263168
____________________________________________________________________________________________________
bn4d_branch2c (BatchNormalizatio (None, 4, 4, 1024)    4096
____________________________________________________________________________________________________
activation_37 (Activation)       (None, 4, 4, 1024)    0
____________________________________________________________________________________________________
res4e_branch2a (Conv2D)          (None, 4, 4, 256)     262400
____________________________________________________________________________________________________
bn4e_branch2a (BatchNormalizatio (None, 4, 4, 256)     1024
____________________________________________________________________________________________________
activation_38 (Activation)       (None, 4, 4, 256)     0
____________________________________________________________________________________________________
res4e_branch2b (Conv2D)          (None, 4, 4, 256)     590080
____________________________________________________________________________________________________
bn4e_branch2b (BatchNormalizatio (None, 4, 4, 256)     1024
____________________________________________________________________________________________________
activation_39 (Activation)       (None, 4, 4, 256)     0
____________________________________________________________________________________________________
res4e_branch2c (Conv2D)          (None, 4, 4, 1024)    263168
____________________________________________________________________________________________________
bn4e_branch2c (BatchNormalizatio (None, 4, 4, 1024)    4096
____________________________________________________________________________________________________
activation_40 (Activation)       (None, 4, 4, 1024)    0
____________________________________________________________________________________________________
res4f_branch2a (Conv2D)          (None, 4, 4, 256)     262400
____________________________________________________________________________________________________
bn4f_branch2a (BatchNormalizatio (None, 4, 4, 256)     1024
____________________________________________________________________________________________________
activation_41 (Activation)       (None, 4, 4, 256)     0
____________________________________________________________________________________________________
res4f_branch2b (Conv2D)          (None, 4, 4, 256)     590080
____________________________________________________________________________________________________
bn4f_branch2b (BatchNormalizatio (None, 4, 4, 256)     1024
____________________________________________________________________________________________________
activation_42 (Activation)       (None, 4, 4, 256)     0
____________________________________________________________________________________________________
res4f_branch2c (Conv2D)          (None, 4, 4, 1024)    263168
____________________________________________________________________________________________________
bn4f_branch2c (BatchNormalizatio (None, 4, 4, 1024)    4096
____________________________________________________________________________________________________
activation_43 (Activation)       (None, 4, 4, 1024)    0
____________________________________________________________________________________________________
res5a_branch2a (Conv2D)          (None, 2, 2, 512)     524800
____________________________________________________________________________________________________
bn5a_branch2a (BatchNormalizatio (None, 2, 2, 512)     2048
____________________________________________________________________________________________________
activation_44 (Activation)       (None, 2, 2, 512)     0
____________________________________________________________________________________________________
res5a_branch2b (Conv2D)          (None, 2, 2, 512)     2359808
____________________________________________________________________________________________________
bn5a_branch2b (BatchNormalizatio (None, 2, 2, 512)     2048
____________________________________________________________________________________________________
activation_45 (Activation)       (None, 2, 2, 512)     0
____________________________________________________________________________________________________
res5a_branch1 (Conv2D)           (None, 2, 2, 2048)    2099200
____________________________________________________________________________________________________
res5a_branch2c (Conv2D)          (None, 2, 2, 2048)    1050624
____________________________________________________________________________________________________
bn5a_branch1 (BatchNormalization (None, 2, 2, 2048)    8192
____________________________________________________________________________________________________
bn5a_branch2c (BatchNormalizatio (None, 2, 2, 2048)    8192
____________________________________________________________________________________________________
activation_46 (Activation)       (None, 2, 2, 2048)    0
____________________________________________________________________________________________________
res5b_branch2a (Conv2D)          (None, 2, 2, 512)     1049088
____________________________________________________________________________________________________
bn5b_branch2a (BatchNormalizatio (None, 2, 2, 512)     2048
____________________________________________________________________________________________________
activation_47 (Activation)       (None, 2, 2, 512)     0
____________________________________________________________________________________________________
res5b_branch2b (Conv2D)          (None, 2, 2, 512)     2359808
____________________________________________________________________________________________________
bn5b_branch2b (BatchNormalizatio (None, 2, 2, 512)     2048
____________________________________________________________________________________________________
activation_48 (Activation)       (None, 2, 2, 512)     0
____________________________________________________________________________________________________
res5b_branch2c (Conv2D)          (None, 2, 2, 2048)    1050624
____________________________________________________________________________________________________
bn5b_branch2c (BatchNormalizatio (None, 2, 2, 2048)    8192
____________________________________________________________________________________________________
activation_49 (Activation)       (None, 2, 2, 2048)    0
____________________________________________________________________________________________________
res5c_branch2a (Conv2D)          (None, 2, 2, 512)     1049088
____________________________________________________________________________________________________
bn5c_branch2a (BatchNormalizatio (None, 2, 2, 512)     2048
____________________________________________________________________________________________________
activation_50 (Activation)       (None, 2, 2, 512)     0
____________________________________________________________________________________________________
res5c_branch2b (Conv2D)          (None, 2, 2, 512)     2359808
____________________________________________________________________________________________________
bn5c_branch2b (BatchNormalizatio (None, 2, 2, 512)     2048
____________________________________________________________________________________________________
activation_51 (Activation)       (None, 2, 2, 512)     0
____________________________________________________________________________________________________
res5c_branch2c (Conv2D)          (None, 2, 2, 2048)    1050624
____________________________________________________________________________________________________
bn5c_branch2c (BatchNormalizatio (None, 2, 2, 2048)    8192
____________________________________________________________________________________________________
activation_52 (Activation)       (None, 2, 2, 2048)    0
____________________________________________________________________________________________________
avg_pool (AveragePooling2D)      (None, 1, 1, 2048)    0
____________________________________________________________________________________________________
flatten_1 (Flatten)              (None, 2048)          0
____________________________________________________________________________________________________
fc6 (Dense)                      (None, 6)             12294
====================================================================================================
Total params: 23,600,006.0
Trainable params: 23,546,886.0
Non-trainable params: 53,120.0
_____________________________
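
The parameter counts in the summary can be checked by hand. The sketch below is only an illustration, not part of the original notebook: the helper names are made up here, and the kernel sizes (1x1 for the `branch2a`, `branch2c` and `branch1` convolutions, 3x3 for `branch2b`) are assumptions that happen to reproduce the reported numbers.

```python
def conv2d_params(kernel, in_ch, out_ch):
    """Weights of a kernel x kernel convolution plus one bias per output channel."""
    return kernel * kernel * in_ch * out_ch + out_ch


def batchnorm_params(channels):
    """Four values per channel: gamma and beta (trainable),
    moving mean and moving variance (non-trainable)."""
    return 4 * channels


def dense_params(in_units, out_units):
    """Weight matrix plus one bias per output unit."""
    return in_units * out_units + out_units


# Rows taken from the summary above; kernel sizes are assumed (see lead-in).
checks = {
    "res2c_branch2a": (conv2d_params(1, 256, 64), 16448),
    "res2c_branch2b": (conv2d_params(3, 64, 64), 36928),
    "res2c_branch2c": (conv2d_params(1, 64, 256), 16640),
    "bn2c_branch2a": (batchnorm_params(64), 256),
    "res3a_branch1": (conv2d_params(1, 256, 512), 131584),
    "res5a_branch2b": (conv2d_params(3, 512, 512), 2359808),
    "fc6": (dense_params(2048, 6), 12294),
}

for name, (computed, reported) in checks.items():
    print(f"{name:16s} computed={computed:8d} reported={reported:8d}")

# The two sub-totals reported at the bottom of the summary add up to the total.
total_from_parts = 23_546_886 + 53_120
print("trainable + non-trainable =", total_from_parts)
```

Every computed value matches the corresponding row, which suggests that each bottleneck uses 1x1 convolutions to shrink and then restore the channel count, with the 3x3 convolution applied to the narrower tensor in between.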

Finally, the next figure visualizes the architecture of our ResNet50 model.

### Key points

• Very deep “plain” networks don’t work in practice because they are hard to train due to vanishing gradients.
• Skip connections help address the vanishing-gradient problem. They also make it easy for a ResNet block to learn an identity function.
• There are two main types of blocks: the identity block and the convolutional block.
• Very deep Residual Networks are built by stacking these blocks together.
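
To make the identity-function point concrete, here is a minimal NumPy sketch. It is a toy with small dense layers rather than the convolutional blocks used in the article, and all names in it are hypothetical. It compares a block with a skip connection against a plain block when every weight in the main path is zero.

```python
import numpy as np


def relu(z):
    return np.maximum(z, 0.0)


def residual_block(x, W1, b1, W2, b2):
    """Two-layer main path F(x) with a skip connection: relu(F(x) + x)."""
    hidden = relu(W1 @ x + b1)
    main_path = W2 @ hidden + b2
    return relu(main_path + x)


def plain_block(x, W1, b1, W2, b2):
    """The same two layers without the skip connection: relu(F(x))."""
    hidden = relu(W1 @ x + b1)
    return relu(W2 @ hidden + b2)


# Non-negative input, as it would be after a previous ReLU activation.
x = np.array([0.5, 1.0, 2.0])
zero_W = np.zeros((3, 3))
zero_b = np.zeros(3)

out_res = residual_block(x, zero_W, zero_b, zero_W, zero_b)
out_plain = plain_block(x, zero_W, zero_b, zero_W, zero_b)

print("residual block:", out_res)    # same values as x
print("plain block:   ", out_plain)  # all zeros
```

With the main path switched off, the residual block passes its input through unchanged, while the plain block loses it entirely. This is why adding residual blocks to a network does not have to hurt: a block only needs to push its weights toward zero to behave like the identity.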

### References

This article presents the ResNet algorithm due to He et al. (2015). The implementation here also took significant inspiration from, and follows the structure of, the GitHub repository of Francois Chollet: