# Unsupervised Deep Learning with Autoencoders on the MNIST Dataset (with TensorFlow in Python)

• Deep learning, although primarily used for supervised classification / regression problems, can also be used as an unsupervised ML technique, the autoencoder being a classic example. As explained here, the aim of an autoencoder is to learn a representation (encoding) for a set of data, typically for the purpose of dimensionality reduction.
• Deep learning can be used to learn a different representation of the data (typically a set of input features in a low-dimensional space) that can be used for pre-training, for example in transfer learning.
• In this article, a few vanilla autoencoder implementations will be demonstrated for the MNIST dataset.
• The following figures describe the theory (ref: Coursera course Neural Networks for Machine Learning, 2012, by Prof. Hinton, University of Toronto). As explained, the autoencoder with a back-propagation-based implementation can be seen as a generalization of linear dimension-reduction techniques such as PCA, since the hidden layers can learn non-linear manifolds with non-linear activation functions (e.g., ReLU, sigmoid).

• The input and output units of an autoencoder are identical; the idea is to learn the input itself, as a different representation, with one or more hidden layer(s).
• The MNIST images are of size 28×28, so the number of nodes in the input and the output layer is always 784 for the autoencoders shown in this article.
• The left side of an autoencoder network is typically a mirror image of the right side, and the weights are tied (the weights learnt in the left-hand side of the network are reused, to better reproduce the input at the output).
• The next figures and animations show the outputs of the following simple autoencoder with just 1 hidden layer, on the input MNIST data. A ReLU activation function is used in the hidden layer, along with L2 regularization on the learnt weights. As can be seen from the next figure, the inputs are reproduced with some variation at the output layer, as expected.

• The next animations visualize the hidden layer weights learnt (for the 400 hidden units) and the output of the autoencoder with the same input training dataset, with a different value of the regularization parameter.

• The next figure visualizes the hidden layer weights learnt with yet another regularization parameter value.

• The next animation visualizes the output of the autoencoder with the same input training dataset, but this time with no activation function used at the hidden layer.

• The next animations show the results with a deeper autoencoder with 3 hidden layers (the architecture shown below). As before, the weights are tied and in this case no activation function is used, with L2 regularization on the weights.

• Let’s implement an even deeper autoencoder. The next animations show the results with an autoencoder with 5 hidden layers (the architecture shown below). As before, the weights are tied and no activation function is used, with L2 regularization on the weights.

# Diffusion, PDE and Variational Methods in Image Processing and Computer Vision (Python implementation)

This article is inspired by the lecture videos by Prof. Daniel Cremers and also by the coursera course Image and Video Processing: From Mars to Hollywood with a Stop at the Hospital (by Duke University).

## 1. The Diffusion Equation and Gaussian Blurring

• Diffusion is a physical process that gradually evens out spatial differences in the concentration u(x,t) of a substance over time.
• The following figure shows the PDE of general diffusion (from Fick’s law); when the diffusivity g is a constant, the diffusion process becomes linear, isotropic and homogeneous.

• Here we shall concentrate mainly on the linear (Gaussian blur) and non-linear (e.g., edge-preserving) diffusion techniques.
• The following figures show that Gaussian blurring of an image can be thought of as heat flow, since the solution of the (linear) diffusion equation is a convolution with a Gaussian kernel. As shown for the 1-D case, the diffusion equation has a unique solution obtained by convolving the input with the Gaussian kernel G(0, σ=√(2t)), where the bandwidth of the kernel grows in proportion to the square root of the time.
• We can verify that this solution satisfies the equation, but more generally, the solution of the diffusion equation can be obtained analytically using the Fourier transform, as described here in this post.
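The relation can also be checked numerically: running the explicit finite-difference scheme for the 1-D heat equation on a unit impulse should reproduce the Gaussian kernel G(0, σ=√(2t)). A small sketch (the grid size and step sizes are illustrative):

```python
import numpy as np

# 1-D unit impulse: after diffusing for time t, it should match the Gaussian kernel itself
n = 201
u = np.zeros(n)
u[n // 2] = 1.0

dt, steps = 0.1, 200              # explicit scheme; stable for dt <= 0.5 with unit grid spacing
v = u.copy()
for _ in range(steps):
    lap = np.roll(v, 1) + np.roll(v, -1) - 2 * v   # discrete Laplacian (periodic ends)
    v = v + dt * lap

t = dt * steps
sigma = np.sqrt(2 * t)            # bandwidth sigma = sqrt(2t), as stated above
x = np.arange(n) - n // 2
g = np.exp(-x ** 2 / (2 * sigma ** 2)) / (sigma * np.sqrt(2 * np.pi))
```

After 200 steps the diffused impulse `v` agrees with the sampled Gaussian `g` to within a small discretization error, and the total mass is conserved, as expected for a diffusion process.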

• Since linear diffusion can be implemented more efficiently with a Gaussian filter, let’s focus on the non-linear diffusion techniques. One interesting example of non-linear diffusion is the edge-preserving diffusion from the seminal paper by Perona and Malik, as shown in the next figure.
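A hedged NumPy sketch of the Perona-Malik scheme, using the exponential diffusivity g(s) = exp(−(s/λ)²) from the paper; the time step and iteration count here are illustrative, and a noisy step edge stands in for a real image:

```python
import numpy as np

def perona_malik(u, lam=10.0, dt=0.2, iters=50):
    """Edge-preserving (Perona-Malik) diffusion with diffusivity g(s) = exp(-(s/lam)**2)."""
    u = u.astype(float).copy()
    g = lambda d: np.exp(-(d / lam) ** 2)
    for _ in range(iters):
        # finite differences towards the four neighbours
        dn = np.roll(u, -1, axis=0) - u
        ds = np.roll(u, 1, axis=0) - u
        de = np.roll(u, -1, axis=1) - u
        dw = np.roll(u, 1, axis=1) - u
        # the flux is damped where the gradient is large, so edges survive
        u = u + dt * (g(dn) * dn + g(ds) * ds + g(de) * de + g(dw) * dw)
    return u

# a noisy step edge: the flat parts get smoothed while the edge is preserved
rng = np.random.default_rng(1)
img = np.zeros((64, 64))
img[:, 32:] = 100.0
noisy = img + rng.normal(0, 5, img.shape)
out = perona_malik(noisy, lam=10.0)
```

Small λ makes the diffusivity collapse at even moderate gradients (strong edge preservation), while large λ lets almost everything diffuse, approaching linear Gaussian blurring — matching the behaviour seen in the animations below.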

• The next animations show the results of a non-linear diffusion implementation with gradient descent, with 100 iterations, using the edge-preserving diffusivity defined as above on the lena image with different λ (called kappa in the Perona-Malik paper) values: the first one with λ=10 and the second one with λ=100.

• The next animation shows blurring the same image by simple filtering with a Gaussian kernel (equivalent to linear diffusion). As can be seen, the edge-preserving non-linear diffusion does a much better job of preserving edges, for small values of the parameter λ.

• The next animations show the results of a non-linear diffusion implementation with gradient descent, this time with 50 iterations, using the edge-preserving diffusivity on yet another image (monument) with different λ (called kappa in the Perona-Malik paper) values: the first one with λ=10 and the second one with λ=100.

• The next image shows the difference between the output after GD and the input image.

The above image shows how edge detection can be done with edge-preserving non-linear diffusion. The next results compare the edge detection result obtained using edge-preserving non-linear diffusion with the one obtained with the Canny edge detector on my image.

The next image shows the edges detected with the difference image obtained with non-linear diffusion, which are much subtler than Canny’s.

## 2. Denoising Images with Variational Methods, Euler-Lagrange equation for the functional and Gradient Descent

• The following figures describe the theory required to de-noise an image with  variational methods, using gradient descent.
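As a concrete instance, here is a gradient-descent sketch for the quadratic (Tikhonov-style) energy E(u) = ∫ ((u−f)²/2 + λ|∇u|²/2), whose Euler-Lagrange equation is (u−f) − λΔu = 0; this is a simplified stand-in for the functional in the figures, with assumed parameter values and a synthetic noisy image:

```python
import numpy as np

def denoise(f, lam=1.0, dt=0.1, iters=200):
    """Gradient descent on E(u) = integral of ((u-f)^2/2 + lam*|grad u|^2/2).
    The Euler-Lagrange equation is (u - f) - lam*Laplacian(u) = 0, so the
    descent flow is u_t = (f - u) + lam*Laplacian(u)."""
    u = f.astype(float).copy()
    for _ in range(iters):
        lap = (np.roll(u, 1, axis=0) + np.roll(u, -1, axis=0) +
               np.roll(u, 1, axis=1) + np.roll(u, -1, axis=1) - 4 * u)
        u = u + dt * ((f - u) + lam * lap)
    return u

rng = np.random.default_rng(0)
x = np.linspace(0, 2 * np.pi, 64)
clean = np.tile(np.sin(x), (64, 1))          # smooth synthetic "image"
noisy = clean + rng.normal(0, 0.3, clean.shape)
out = denoise(noisy)
```

The data-fidelity term (f − u) keeps the result close to the noisy input, while the regularizer λΔu smooths it; the descent output ends up closer to the clean signal than the noisy input is.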

• The following animations and figures show the implementation results of the above technique, for a noisy ship image taken from the book Variational Methods in Image Processing by Luminita A. Vese and Carole Le Guyader.

• The following figure shows the difference image in between the original input image and the gradient descent output image.

# Dogs vs. Cats: Image Classification with Deep Learning using TensorFlow in Python

## The problem

Given a set of labeled images of cats and dogs, a machine learning model is to be learnt and later used to classify a set of new images as cats or dogs. This problem appeared in a Kaggle competition and the images are taken from this Kaggle dataset.

• The original dataset contains a huge number of images (25,000 labeled cat/dog images for training and 12,500 unlabeled images for test).
• Only a few sample images are chosen (1100 labeled cat/dog images for training and 1000 images from the test dataset), just for the sake of a quick demonstration of how to solve this problem using deep learning (motivated by the Udacity course Deep Learning by Google), which is going to be described (along with the results) in this article.
• The sample test images chosen are manually labeled to compute model accuracy later with the model-predicted labels.
• The accuracy on the test dataset is not going to be good in general for the above-mentioned reason. In order to obtain good accuracy  on the test dataset using deep learning, we need to train the models with a large number of input images (e.g., with all the training images from the kaggle dataset).
• A few sample labeled images from the training dataset are shown below.

Dogs

Cats

• As a pre-processing step, all the images are first resized to 50×50 pixel images.

## Classification with a few off-the-shelf classifiers

• First, each image from the training dataset is flattened and represented as a 2500-length vector (one for each channel).
• Next, a few sklearn models are trained on this flattened data. Here are the results:

As shown above, the test accuracy is quite poor, even with a few sophisticated off-the-shelf classifiers.
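A sketch of this pipeline with a couple of off-the-shelf sklearn classifiers; the data below is synthetic (random vectors with a planted signal in one feature) standing in for the flattened 2500-dimensional image vectors:

```python
import numpy as np
from sklearn.ensemble import RandomForestClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split

rng = np.random.default_rng(0)
X = rng.random((400, 2500))                # stand-in for flattened 50x50 image vectors
y = (X[:, 0] > 0.5).astype(int)            # planted signal in one feature
X[y == 1, 0] += 1.0                        # widen the class gap so the toy task is learnable

X_tr, X_te, y_tr, y_te = train_test_split(X, y, random_state=0)
scores = {}
for clf in (LogisticRegression(max_iter=1000), RandomForestClassifier(random_state=0)):
    clf.fit(X_tr, y_tr)
    scores[type(clf).__name__] = clf.score(X_te, y_te)
print(scores)
```

With real image pixels the planted signal is of course absent, which is part of why these flat-vector classifiers struggle: the raw pixel representation carries no translation invariance.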

## Classifying images using Deep Learning with Tensorflow

Now let’s first train a logistic regression and then a couple of neural network models by introducing L2 regularization for both the models.

• First, all the images are converted to gray-scale images.
• The following figures visualize the weights learnt for the cat vs. the dog class during training the logistic regression  model with SGD with L2-regularization (λ=0.1, batch size=128).

Test accuracy: 53.6%
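The SGD-with-L2 training loop can be sketched in plain NumPy (synthetic stand-in data; the learning rate is an assumed value, while λ=0.1 and batch size 128 follow the experiment above):

```python
import numpy as np

rng = np.random.default_rng(0)
n, d = 512, 2500                   # flattened 50x50 gray-scale images (stand-in data)
lam, lr, batch = 0.1, 0.01, 128    # lambda and batch size as above; lr assumed

X = rng.random((n, d))
y = (X[:, 0] > 0.5).astype(float)  # synthetic labels, just to exercise the optimizer

w, b = np.zeros(d), 0.0

def full_loss():
    """L2-regularized log-loss over the whole dataset."""
    p = 1 / (1 + np.exp(-(X @ w + b)))
    p = np.clip(p, 1e-12, 1 - 1e-12)
    return -np.mean(y * np.log(p) + (1 - y) * np.log(1 - p)) + lam * np.sum(w ** 2)

before = full_loss()
for _ in range(500):
    idx = rng.integers(0, n, batch)            # sample a minibatch
    xb, yb = X[idx], y[idx]
    p = 1 / (1 + np.exp(-(xb @ w + b)))        # sigmoid predictions
    w -= lr * (xb.T @ (p - yb) / batch + 2 * lam * w)   # L2-regularized SGD step
    b -= lr * (p - yb).mean()
after = full_loss()
```

The `2 * lam * w` term is the gradient of the L2 penalty, which is what shrinks the weight images seen in the visualizations.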

• The following animation visualizes the weights learnt for 400 randomly selected hidden units using a neural net with a single hidden layer with 4096 hidden nodes, by training the neural net model with SGD with L2-regularization (λ1=λ2=0.05, batch size=128).

Minibatch loss at step 0: 198140.156250
Minibatch accuracy: 50.0%
Validation accuracy: 50.0%

Minibatch loss at step 500: 0.542070
Minibatch accuracy: 89.8%
Validation accuracy: 57.0%

Minibatch loss at step 1000: 0.474844
Minibatch accuracy: 96.9%
Validation accuracy: 60.0%

Minibatch loss at step 1500: 0.571939
Minibatch accuracy: 85.9%
Validation accuracy: 56.0%

Minibatch loss at step 2000: 0.537061
Minibatch accuracy: 91.4%
Validation accuracy: 63.0%

Minibatch loss at step 2500: 0.751552
Minibatch accuracy: 75.8%
Validation accuracy: 57.0%

Minibatch loss at step 3000: 0.579084
Minibatch accuracy: 85.9%
Validation accuracy: 54.0%

Test accuracy: 57.8%


Clearly, the model learnt above overfits the training dataset; the test accuracy improved a bit, but it is still quite poor.

• Now, let’s train a deeper neural net with two hidden layers, the first one with 1024 nodes and the second one with 64 nodes.
Minibatch loss at step 0: 1015.947266
Minibatch accuracy: 43.0%
Validation accuracy: 50.0%

Minibatch loss at step 500: 0.734610
Minibatch accuracy: 79.7%
Validation accuracy: 55.0%

Minibatch loss at step 1000: 0.615992
Minibatch accuracy: 93.8%
Validation accuracy: 55.0%

Minibatch loss at step 1500: 0.670009
Minibatch accuracy: 82.8%
Validation accuracy: 56.0%

Minibatch loss at step 2000: 0.798796
Minibatch accuracy: 77.3%
Validation accuracy: 58.0%

Minibatch loss at step 2500: 0.717479
Minibatch accuracy: 84.4%
Validation accuracy: 55.0%

Minibatch loss at step 3000: 0.631013
Minibatch accuracy: 90.6%
Validation accuracy: 57.0%

Minibatch loss at step 3500: 0.739071
Minibatch accuracy: 75.8%
Validation accuracy: 54.0%

Minibatch loss at step 4000: 0.698650
Minibatch accuracy: 84.4%
Validation accuracy: 55.0%

Minibatch loss at step 4500: 0.666173
Minibatch accuracy: 85.2%
Validation accuracy: 51.0%

Minibatch loss at step 5000: 0.614820
Minibatch accuracy: 92.2%
Validation accuracy: 58.0%

Test accuracy: 55.2%

• The following animation visualizes the weights learnt for 400 randomly selected hidden units from the first hidden layer, by training the neural net model with SGD with L2-regularization (λ1=λ2=λ3=0.1, batch size=128, dropout rate=0.6).
• The next animation visualizes the weights learnt for all the 64 hidden units of the second hidden layer.
• Clearly, the second, deeper neural net model learnt above overfits the training dataset even more; the test accuracy decreased a bit.

## Classifying images with Deep Convolution Network

Let’s use the following conv-net shown in the next figure.

As shown above, the ConvNet uses:

• Convolution layers, each with
• 5×5 kernel
• 16 filters
• 1×1 stride
• Max pooling layers each with
• 2×2 kernel
• 2×2 stride
• 64 hidden nodes
• 128 batch size
• 5K iterations
• 0.7 dropout rate
• No learning decay
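The two basic building blocks above can be sketched in NumPy: a stride-1 "valid" convolution with a 5×5 kernel and a 2×2 max pooling with 2×2 stride (like tf.nn.conv2d, this computes a cross-correlation, i.e., the kernel is not flipped):

```python
import numpy as np

def conv2d(x, k):
    """Stride-1 'valid' 2-D convolution of a single-channel image with kernel k
    (a cross-correlation, i.e. no kernel flip, as in tf.nn.conv2d)."""
    kh, kw = k.shape
    h, w = x.shape[0] - kh + 1, x.shape[1] - kw + 1
    out = np.zeros((h, w))
    for i in range(h):
        for j in range(w):
            out[i, j] = np.sum(x[i:i + kh, j:j + kw] * k)
    return out

def maxpool(x, size=2, stride=2):
    """2x2 max pooling with 2x2 stride (assumes size == stride)."""
    h, w = x.shape[0] // stride, x.shape[1] // stride
    return x[:h * stride, :w * stride].reshape(h, size, w, size).max(axis=(1, 3))

img = np.arange(50 * 50, dtype=float).reshape(50, 50)   # stand-in 50x50 input
fm = conv2d(img, np.ones((5, 5)) / 25.0)                # 5x5 kernel -> (46, 46) feature map
pooled = maxpool(fm)                                    # 2x2 max pool -> (23, 23)
```

On a 50×50 input, the 5×5 valid convolution yields a 46×46 feature map and the 2×2 pooling halves each spatial dimension, which is the shrinking seen in the feature-map animations below.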

Results

Minibatch loss at step 0: 1.783917
Minibatch accuracy: 55.5%
Validation accuracy: 50.0%

Minibatch loss at step 500: 0.269719
Minibatch accuracy: 89.1%
Validation accuracy: 54.0%

Minibatch loss at step 1000: 0.045729
Minibatch accuracy: 96.9%
Validation accuracy: 61.0%

Minibatch loss at step 1500: 0.015794
Minibatch accuracy: 100.0%
Validation accuracy: 61.0%

Minibatch loss at step 2000: 0.028912
Minibatch accuracy: 98.4%
Validation accuracy: 64.0%

Minibatch loss at step 2500: 0.007787
Minibatch accuracy: 100.0%
Validation accuracy: 62.0%

Minibatch loss at step 3000: 0.001591
Minibatch accuracy: 100.0%
Validation accuracy: 63.0%

Test accuracy: 61.3%


The following animations show the features learnt (for the first 16 images of each SGD batch) at the different convolution and max-pooling layers:

• Clearly, the simple convolution neural net outperforms all the previous models in terms of test accuracy, as shown below.

• Only 1100 labeled images (randomly chosen from the training dataset) were used to train the model and predict 1000 test images (randomly chosen from the test dataset). Clearly the accuracy can be improved a lot if a large number of images are used for training with deeper / more complex networks (with more parameters to learn).

# Deep Learning with TensorFlow in Python: Convolution Neural Nets

The following problems appeared in the assignments in the Udacity course Deep Learning (by Google). The descriptions of the problems are taken from the assignments (continued from the last post).

## Classifying the alphabets with notMNIST dataset with Deep Network

Here is how some sample images from the dataset look:

Let’s try to get the best performance using a multi-layer model! (The best reported test accuracy using a deep network is 97.1%).

• One avenue you can explore is to add multiple layers.
• Another one is to use learning rate decay.
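One common form of learning rate decay is the exponential schedule implemented by tf.train.exponential_decay, sketched here as a plain function:

```python
def decayed_lr(base_lr, global_step, decay_steps, decay_rate):
    """Exponential decay: the rate drops by a factor of decay_rate every
    decay_steps steps (the schedule tf.train.exponential_decay implements)."""
    return base_lr * decay_rate ** (global_step / decay_steps)

# with base rate 0.1 and a 0.96 decay every 1000 steps:
lrs = [decayed_lr(0.1, s, 1000, 0.96) for s in (0, 1000, 2000)]   # 0.1, 0.096, 0.09216
```

Decaying the rate lets SGD take large steps early and settle into a minimum later, which usually improves the final validation accuracy.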

### Learning L2-Regularized  Deep Neural Network with SGD

The following figure recapitulates the neural network with 3 hidden layers, the first one with 2048 nodes, the second one with 512 nodes and the third one with 128 nodes, each with ReLU intermediate outputs. The L2 regularizations applied on the loss function for the weights learnt at the input and the hidden layers are λ1, λ2, λ3 and λ4, respectively.

The next 3 animations visualize the weights learnt for 400 randomly selected nodes from hidden layer 1 (out of 2048 nodes), then another 400 randomly selected nodes from hidden layer 2 (out of 512 nodes) and finally all 128 nodes from hidden layer 3, at different steps using SGD and the L2-regularized loss function (with λ1=λ2=λ3=λ4=0.01). As can be seen below, the weights learnt gradually capture (as the SGD step increases) the different features of the alphabets at the corresponding output neurons.

Results with SGD

Initialized
Minibatch loss at step 0: 4.638808
Minibatch accuracy: 7.8%
Validation accuracy: 27.6%

Minibatch loss at step 500: 1.906724
Minibatch accuracy: 86.7%
Validation accuracy: 86.3%

Minibatch loss at step 1000: 1.333355
Minibatch accuracy: 87.5%
Validation accuracy: 86.9%

Minibatch loss at step 1500: 1.056811
Minibatch accuracy: 84.4%
Validation accuracy: 87.3%

Minibatch loss at step 2000: 0.633034
Minibatch accuracy: 93.8%
Validation accuracy: 87.5%

Minibatch loss at step 2500: 0.696114
Minibatch accuracy: 85.2%
Validation accuracy: 87.5%

Minibatch loss at step 3000: 0.737464
Minibatch accuracy: 86.7%
Validation accuracy: 88.3%

Test accuracy: 93.6%

Batch size = 128, number of iterations = 3001 and drop-out rate = 0.8 for the training dataset are used for the above set of experiments, with learning decay. We can play with the hyper-parameters to get better test accuracy.
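The drop-out used during training can be sketched as inverted dropout, which rescales the surviving activations so no change is needed at test time:

```python
import numpy as np

def dropout(h, keep_prob, rng):
    """Inverted dropout: keep each activation with probability keep_prob and
    rescale the survivors by 1/keep_prob, so the expected activation is
    unchanged and no rescaling is needed at test time."""
    mask = rng.random(h.shape) < keep_prob
    return h * mask / keep_prob

rng = np.random.default_rng(0)
h = np.ones((1000, 100))
out = dropout(h, 0.8, rng)   # about 20% of entries zeroed, the rest scaled to 1.25
```

With a keep probability of 0.8, roughly a fifth of the hidden activations are zeroed at each training step, which discourages co-adaptation between hidden units.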

## Convolution Neural Network

Previously we trained fully connected networks to classify notMNIST characters. The goal of this assignment is to make the neural network convolutional.

Let’s build a small network with two convolutional layers, followed by one fully connected layer. Convolutional networks are more expensive computationally, so we’ll limit its depth and number of fully connected nodes. The below figure shows the simplified architecture of the convolution neural net.

As shown above, the ConvNet uses:

• 2 convolution layers each with
• 5×5 kernel
• 16 filters
• 2×2 strides
• 64 hidden nodes
• 16 batch size
• 1K iterations

### Results

Initialized
Minibatch loss at step 0: 3.548937
Minibatch accuracy: 18.8%
Validation accuracy: 10.0%

Minibatch loss at step 50: 1.781176
Minibatch accuracy: 43.8%
Validation accuracy: 64.7%

Minibatch loss at step 100: 0.882739
Minibatch accuracy: 75.0%
Validation accuracy: 69.5%

Minibatch loss at step 150: 0.980598
Minibatch accuracy: 62.5%
Validation accuracy: 74.5%

Minibatch loss at step 200: 0.794144
Minibatch accuracy: 81.2%
Validation accuracy: 77.6%

Minibatch loss at step 250: 1.191971
Minibatch accuracy: 62.5%
Validation accuracy: 79.1%

Minibatch loss at step 300: 0.441911
Minibatch accuracy: 87.5%
Validation accuracy: 80.5%

Minibatch loss at step 350: 0.605005
Minibatch accuracy: 81.2%
Validation accuracy: 79.3%

Minibatch loss at step 400: 1.032123
Minibatch accuracy: 68.8%
Validation accuracy: 81.5%

Minibatch loss at step 450: 0.869944
Minibatch accuracy: 75.0%
Validation accuracy: 82.1%

Minibatch loss at step 500: 0.530418
Minibatch accuracy: 81.2%
Validation accuracy: 81.2%

Minibatch loss at step 550: 0.227771
Minibatch accuracy: 93.8%
Validation accuracy: 81.8%

Minibatch loss at step 600: 0.697444
Minibatch accuracy: 75.0%
Validation accuracy: 82.5%

Minibatch loss at step 650: 0.862341
Minibatch accuracy: 68.8%
Validation accuracy: 83.0%

Minibatch loss at step 700: 0.336292
Minibatch accuracy: 87.5%
Validation accuracy: 81.8%

Minibatch loss at step 750: 0.213392
Minibatch accuracy: 93.8%
Validation accuracy: 82.6%

Minibatch loss at step 800: 0.553639
Minibatch accuracy: 75.0%
Validation accuracy: 83.3%

Minibatch loss at step 850: 0.533049
Minibatch accuracy: 87.5%
Validation accuracy: 81.7%

Minibatch loss at step 900: 0.415935
Minibatch accuracy: 87.5%
Validation accuracy: 83.9%

Minibatch loss at step 950: 0.290436
Minibatch accuracy: 93.8%
Validation accuracy: 84.0%

Minibatch loss at step 1000: 0.400648
Minibatch accuracy: 87.5%
Validation accuracy: 84.0%

Test accuracy: 90.3%

The following figures visualize the feature representations at different layers for the first 16 images for the last batch with SGD during training:

The next animation shows how the features learnt at convolution layer 1 change with iterations.

## Convolution Neural Network with Max Pooling

The convolutional model above uses convolutions with stride 2 to reduce the dimensionality. Replace the strides by a max pooling operation of stride 2 and kernel size 2. The below figure shows the simplified architecture of the convolution neural net with MAX Pooling layers.

As shown above, the ConvNet uses:

• 2 convolution layers each with
• 5×5 kernel
• 16 filters
• 1×1 stride
• 2×2 Max-pooling
• 64 hidden nodes
• 16 batch size
• 1K iterations

Results

Initialized
Minibatch loss at step 0: 4.934033
Minibatch accuracy: 6.2%
Validation accuracy: 8.9%

Minibatch loss at step 50: 2.305100
Minibatch accuracy: 6.2%
Validation accuracy: 11.7%

Minibatch loss at step 100: 2.319777
Minibatch accuracy: 0.0%
Validation accuracy: 14.8%

Minibatch loss at step 150: 2.285996
Minibatch accuracy: 18.8%
Validation accuracy: 11.5%

Minibatch loss at step 200: 1.988467
Minibatch accuracy: 25.0%
Validation accuracy: 22.9%

Minibatch loss at step 250: 2.196230
Minibatch accuracy: 12.5%
Validation accuracy: 27.8%

Minibatch loss at step 300: 0.902828
Minibatch accuracy: 68.8%
Validation accuracy: 55.4%

Minibatch loss at step 350: 1.078835
Minibatch accuracy: 62.5%
Validation accuracy: 70.1%

Minibatch loss at step 400: 1.749521
Minibatch accuracy: 62.5%
Validation accuracy: 70.3%

Minibatch loss at step 450: 0.896893
Minibatch accuracy: 75.0%
Validation accuracy: 79.5%

Minibatch loss at step 500: 0.610678
Minibatch accuracy: 81.2%
Validation accuracy: 79.5%

Minibatch loss at step 550: 0.212040
Minibatch accuracy: 93.8%
Validation accuracy: 81.0%

Minibatch loss at step 600: 0.785649
Minibatch accuracy: 75.0%
Validation accuracy: 81.8%

Minibatch loss at step 650: 0.775520
Minibatch accuracy: 68.8%
Validation accuracy: 82.2%

Minibatch loss at step 700: 0.322183
Minibatch accuracy: 93.8%
Validation accuracy: 81.8%

Minibatch loss at step 750: 0.213779
Minibatch accuracy: 100.0%
Validation accuracy: 82.9%

Minibatch loss at step 800: 0.795744
Minibatch accuracy: 62.5%
Validation accuracy: 83.7%

Minibatch loss at step 850: 0.767435
Minibatch accuracy: 87.5%
Validation accuracy: 81.7%

Minibatch loss at step 900: 0.354712
Minibatch accuracy: 87.5%
Validation accuracy: 83.8%

Minibatch loss at step 950: 0.293992
Minibatch accuracy: 93.8%
Validation accuracy: 84.3%

Minibatch loss at step 1000: 0.384624
Minibatch accuracy: 87.5%
Validation accuracy: 84.2%

Test accuracy: 90.5%

As can be seen from the above results, with max pooling the test accuracy increased slightly.

The following figures visualize the feature representations at different layers for the first 16 images during training with Max Pooling:

The convnets we have tried so far are quite small, and we did not obtain high enough accuracy on the test dataset. Next we shall make our convnet deeper to increase the test accuracy.

## Deep Convolution Neural Network with Max Pooling

Let’s try with a few convnets:

1. The following ConvNet uses:

• 2 convolution layers (with Relu) each using
• 3×3 kernel
• 16 filters
• 1×1 stride
• 2×2 Max-pooling
• all weights initialized with truncated normal distribution with sd 0.01
• single hidden layer (fully connected) with 1024 hidden nodes
• 128 batch size
• 3K iterations
• 0.01 (=λ1=λ2) for regularization
• No dropout
• No learning decay
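The truncated-normal weight initialization (sd 0.01 above) can be sketched as rejection sampling, redrawing any value that falls outside two standard deviations; this mirrors the behaviour of tf.truncated_normal:

```python
import numpy as np

def truncated_normal(shape, sd, rng):
    """Sample N(0, sd^2) and redraw any value beyond 2 standard deviations,
    mimicking tf.truncated_normal."""
    out = rng.normal(0.0, sd, shape)
    mask = np.abs(out) > 2 * sd
    while mask.any():
        out[mask] = rng.normal(0.0, sd, mask.sum())
        mask = np.abs(out) > 2 * sd
    return out

rng = np.random.default_rng(0)
w = truncated_normal((3, 3, 1, 16), 0.01, rng)   # e.g. a 3x3 conv kernel with 16 filters
```

Truncation avoids the occasional large initial weight that a plain normal draw can produce, which helps keep the early loss values from blowing up.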

Results

Minibatch loss at step 0: 2.662903
Minibatch accuracy: 7.8%
Validation accuracy: 10.0%

Minibatch loss at step 500: 2.493813
Minibatch accuracy: 11.7%
Validation accuracy: 10.0%

Minibatch loss at step 1000: 0.848911
Minibatch accuracy: 82.8%
Validation accuracy: 79.6%

Minibatch loss at step 1500: 0.806191
Minibatch accuracy: 79.7%
Validation accuracy: 81.8%

Minibatch loss at step 2000: 0.617905
Minibatch accuracy: 85.9%
Validation accuracy: 84.5%

Minibatch loss at step 2500: 0.594710
Minibatch accuracy: 83.6%
Validation accuracy: 85.7%

Minibatch loss at step 3000: 0.435352
Minibatch accuracy: 91.4%
Validation accuracy: 87.2%

Test accuracy: 93.4%

As we can see, by introducing a couple of convolution layers, the test accuracy increased from 90% (refer to the earlier blog) to 93.4% under the same settings.

Here is how the hidden layer weights (400 out of 1024, chosen randomly) change; the features don’t clearly resemble the alphabets anymore, which is quite expected.

2. The following ConvNet uses:

• 2 convolution layers (with Relu) each using
• 3×3 kernel
• 32 filters
• 1×1 stride
• 2×2 Max-pooling
• all weights initialized with truncated normal distribution with sd 0.1
• hidden layers (fully connected) both with 256 hidden nodes
• 128 batch size
• 6K iterations
• 0.7 dropout
• learning decay starting with 0.1

Results

Minibatch loss at step 0: 9.452210
Minibatch accuracy: 10.2%
Validation accuracy: 9.7%
Minibatch loss at step 500: 0.611396
Minibatch accuracy: 81.2%
Validation accuracy: 81.2%
Minibatch loss at step 1000: 0.442578
Minibatch accuracy: 85.9%
Validation accuracy: 83.3%
Minibatch loss at step 1500: 0.523506
Minibatch accuracy: 83.6%
Validation accuracy: 84.8%
Minibatch loss at step 2000: 0.411259
Minibatch accuracy: 89.8%
Validation accuracy: 85.8%
Minibatch loss at step 2500: 0.507267
Minibatch accuracy: 82.8%
Validation accuracy: 85.9%
Minibatch loss at step 3000: 0.414740
Minibatch accuracy: 89.1%
Validation accuracy: 86.6%
Minibatch loss at step 3500: 0.432177
Minibatch accuracy: 85.2%
Validation accuracy: 87.0%
Minibatch loss at step 4000: 0.501300
Minibatch accuracy: 85.2%
Validation accuracy: 87.1%
Minibatch loss at step 4500: 0.391587
Minibatch accuracy: 89.8%
Validation accuracy: 87.7%
Minibatch loss at step 5000: 0.347674
Minibatch accuracy: 90.6%
Validation accuracy: 88.1%
Minibatch loss at step 5500: 0.259942
Minibatch accuracy: 91.4%
Validation accuracy: 87.8%
Minibatch loss at step 6000: 0.392562
Minibatch accuracy: 85.9%
Validation accuracy: 88.4%

Test accuracy: 94.6%

3. The following ConvNet uses:

• 3 convolution layers (with Relu) each using
• 5×5 kernel
• with 16, 32 and 64 filters, respectively
• 1×1 stride
• 2×2 Max-pooling
• all weights initialized with truncated normal distribution with sd 0.1
• hidden layers (fully connected) with 256, 128 and 64 hidden nodes respectively
• 128 batch size
• 10K iterations
• 0.7 dropout
• learning decay starting with 0.1

Results

Minibatch loss at step 0: 6.788681
Minibatch accuracy: 12.5%
Validation accuracy: 9.8%
Minibatch loss at step 500: 0.804718
Minibatch accuracy: 75.8%
Validation accuracy: 74.9%
Minibatch loss at step 1000: 0.464696
Minibatch accuracy: 86.7%
Validation accuracy: 82.8%
Minibatch loss at step 1500: 0.684611
Minibatch accuracy: 80.5%
Validation accuracy: 85.2%
Minibatch loss at step 2000: 0.352865
Minibatch accuracy: 91.4%
Validation accuracy: 85.9%
Minibatch loss at step 2500: 0.505062
Minibatch accuracy: 84.4%
Validation accuracy: 87.3%
Minibatch loss at step 3000: 0.352783
Minibatch accuracy: 87.5%
Validation accuracy: 87.0%
Minibatch loss at step 3500: 0.411505
Minibatch accuracy: 88.3%
Validation accuracy: 87.9%
Minibatch loss at step 4000: 0.457463
Minibatch accuracy: 84.4%
Validation accuracy: 88.1%
Minibatch loss at step 4500: 0.369346
Minibatch accuracy: 89.8%
Validation accuracy: 88.7%
Minibatch loss at step 5000: 0.323142
Minibatch accuracy: 89.8%
Validation accuracy: 88.5%
Minibatch loss at step 5500: 0.245018
Minibatch accuracy: 93.8%
Validation accuracy: 89.0%
Minibatch loss at step 6000: 0.480509
Minibatch accuracy: 85.9%
Validation accuracy: 89.2%
Minibatch loss at step 6500: 0.297886
Minibatch accuracy: 92.2%
Validation accuracy: 89.3%
Minibatch loss at step 7000: 0.309768
Minibatch accuracy: 90.6%
Validation accuracy: 89.3%
Minibatch loss at step 7500: 0.280219
Minibatch accuracy: 92.2%
Validation accuracy: 89.5%
Minibatch loss at step 8000: 0.260540
Minibatch accuracy: 93.8%
Validation accuracy: 89.7%
Minibatch loss at step 8500: 0.345161
Minibatch accuracy: 88.3%
Validation accuracy: 89.6%
Minibatch loss at step 9000: 0.343074
Minibatch accuracy: 87.5%
Validation accuracy: 89.8%
Minibatch loss at step 9500: 0.324757
Minibatch accuracy: 92.2%
Validation accuracy: 89.9%
Minibatch loss at step 10000: 0.513597
Minibatch accuracy: 83.6%
Validation accuracy: 90.0%

Test accuracy: 95.5%

To be continued…