*Deep learning*, although primarily used for *supervised classification / regression* problems, can also be used as an *unsupervised ML* technique, the **autoencoder** being a classic example. As explained here, the aim of an autoencoder is to learn a *representation* (encoding) for a set of data, typically for the purpose of *dimensionality reduction*.

- Deep learning can be used to learn a different representation of the data (typically a set of input features in a low-dimensional space) that can be used for *pre-training*, for example in *transfer learning*.

- In this article, a few vanilla autoencoder implementations will be demonstrated on the MNIST dataset.

- The following figures describe the theory (ref: *Coursera course Neural Networks for Machine Learning, 2012, by Prof. Hinton, University of Toronto*). As explained there, an autoencoder with a back-propagation-based implementation can be used to generalize *linear dimensionality-reduction* techniques such as *PCA*, since the *hidden layers* can learn *non-linear manifolds* with *non-linear activation functions* (e.g., *ReLU*, *sigmoid*).
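
As a point of comparison (a sketch not taken from the article, using random data in place of the MNIST digits), PCA's rank-k reconstruction is the best *linear* k-dimensional encoding; a linear autoencoder can at most match it, while a non-linear one can improve on it:

```python
import numpy as np

rng = np.random.default_rng(0)
# Toy data standing in for flattened MNIST digits.
X = rng.random((100, 784))
mu = X.mean(axis=0)
Xc = X - mu

# Rank-k PCA reconstruction via SVD: project onto the top-k principal
# directions, then map back and re-add the mean.
k = 30
U, s, Vt = np.linalg.svd(Xc, full_matrices=False)
X_rec = (Xc @ Vt[:k].T) @ Vt[:k] + mu
err = np.linalg.norm(X - X_rec)
print(X_rec.shape)  # (100, 784)
```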

- The input and output units of an *autoencoder* are identical; the idea is to learn the input itself, as a different representation, with one or more hidden layer(s).

- The MNIST images are of size *28×28*, so the number of nodes in the input and the output layer is always *784* for the autoencoders shown in this article.
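
The flattening step can be sketched in NumPy (random values stand in for the actual pixel data here):

```python
import numpy as np

# Toy stand-in for an MNIST batch: 5 grayscale images of size 28x28.
batch = np.random.rand(5, 28, 28)

# Flatten each image into a 784-dimensional vector, as the
# autoencoder's input and output layers expect.
flat = batch.reshape(len(batch), -1)
print(flat.shape)  # (5, 784)
```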

- The left half of an autoencoder network is typically a mirror image of the right half, and the weights are tied (the weights learnt in the left half of the network are reused, to better reproduce the input at the output).
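
The tying can be sketched in NumPy (a minimal forward pass with illustrative sizes 784 and 400; the decoder simply reuses the transposed encoder matrix):

```python
import numpy as np

rng = np.random.default_rng(0)
W = rng.normal(scale=0.01, size=(784, 400))  # encoder weights
b_h = np.zeros(400)                          # hidden-layer bias
b_o = np.zeros(784)                          # output-layer bias

x = rng.random((1, 784))
h = np.maximum(0.0, x @ W + b_h)   # encoder: 784 -> 400 (ReLU)
x_hat = h @ W.T + b_o              # decoder reuses W transposed (tied weights)
print(x_hat.shape)  # (1, 784)
```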

- The next figures and animations show the outputs of the following simple *autoencoder* with just *1 hidden layer*, on the input *MNIST* data. A *ReLU activation* function is used in the *hidden* layer, along with *L2 regularization* on the learnt *weights*. As can be seen from the next figure, the inputs are roughly reproduced, with some variation, at the output layer, as expected.
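
A minimal NumPy sketch of such a network (not the article's actual code: random data stands in for MNIST, and the L2 strength and learning rate are illustrative values) trains the 784→400→784 tied-weight autoencoder with plain gradient descent:

```python
import numpy as np

rng = np.random.default_rng(0)
n_in, n_hid = 784, 400
lam, lr = 1e-4, 0.01          # illustrative L2 strength and learning rate

# Toy batch standing in for flattened MNIST digits scaled to [0, 1].
X = rng.random((64, n_in))

W = rng.normal(scale=0.05, size=(n_in, n_hid))  # tied encoder/decoder weights
b_h = np.zeros(n_hid)                            # hidden-layer bias
b_o = np.zeros(n_in)                             # output-layer bias

losses = []
for epoch in range(30):
    # Forward pass: ReLU encoder, linear decoder with the transposed weights.
    h_pre = X @ W + b_h
    h = np.maximum(0.0, h_pre)
    X_hat = h @ W.T + b_o
    losses.append(np.mean((X_hat - X) ** 2))

    # Backward pass for squared error + L2 penalty on W.
    d_o = (X_hat - X) / len(X)
    d_h = (d_o @ W) * (h_pre > 0)
    # Tied weights: W collects gradients from both the encoder and decoder.
    gW = X.T @ d_h + d_o.T @ h + 2 * lam * W
    W -= lr * gW
    b_h -= lr * d_h.sum(axis=0)
    b_o -= lr * d_o.sum(axis=0)

print(losses[0], losses[-1])  # reconstruction error should shrink
```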

- The next animations visualize the *hidden layer weights* learnt (for the *400* hidden units) and the output of the *autoencoder* on the same input training dataset, with a different value of the *regularization parameter*.
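
This kind of weight visualization can be built in outline as follows (a sketch: random values stand in for the learnt weights; each of the 400 hidden units' 784-dimensional weight vectors is reshaped to 28×28 and tiled into a 20×20 mosaic):

```python
import numpy as np

rng = np.random.default_rng(0)
W = rng.normal(size=(784, 400))  # stand-in for the learnt encoder weights

# Arrange the 400 hidden units' weight vectors (each 784-dim) as a
# 20x20 grid of 28x28 tiles.
tiles = W.T.reshape(20, 20, 28, 28)                # (row, col, h, w)
mosaic = tiles.transpose(0, 2, 1, 3).reshape(20 * 28, 20 * 28)
print(mosaic.shape)  # (560, 560)
```

The `mosaic` array can then be rendered with, e.g., matplotlib's `imshow` using a grayscale colormap.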

- The next figure visualizes the *hidden layer weights* learnt with yet another *regularization* parameter value.

- The next animation visualizes the output of the *autoencoder* on the same input training dataset, but this time with *no activation function* at the hidden layer.

- The next animations show the results with a **deeper** *autoencoder* with *3 hidden layers* (the architecture is shown below). As before, the weights are tied and, in this case, *no activation function* is used, with *L2 regularization* on the weights.
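
The forward pass of such a network can be sketched as follows (the layer sizes 784→256→64→256→784 are hypothetical, chosen only to illustrate the 3-hidden-layer, tied-weight, linear structure):

```python
import numpy as np

rng = np.random.default_rng(0)
# Only the encoder matrices are stored; the decoder reuses their transposes.
W1 = rng.normal(scale=0.05, size=(784, 256))
W2 = rng.normal(scale=0.05, size=(256, 64))

x = rng.random((2, 784))
code = (x @ W1) @ W2           # linear encoder (no activation function)
x_hat = (code @ W2.T) @ W1.T   # mirrored decoder with tied (transposed) weights
print(x_hat.shape)  # (2, 784)
```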

- Let’s implement a still deeper *autoencoder*. The next animations show the results with a **deeper** *autoencoder* with *5 hidden layers* (the architecture is shown below). As before, the weights are tied and, in this case, *no activation function* is used, with *L2 regularization* on the weights.
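
A generic tied-weight forward pass covers this deeper case too (a sketch with hypothetical sizes 784→512→128→32, which yields 5 hidden layers overall — 512, 128, 32, 128, 512 — once the decoder mirrors the encoder):

```python
import numpy as np

def tied_autoencoder_forward(x, weights):
    """Linear tied-weight autoencoder: apply each encoder matrix on the
    way down to the code, then the transposes in reverse on the way up."""
    h = x
    for W in weights:              # encoder half
        h = h @ W
    for W in reversed(weights):    # mirrored decoder half (tied weights)
        h = h @ W.T
    return h

rng = np.random.default_rng(0)
sizes = [784, 512, 128, 32]        # hypothetical layer sizes
weights = [rng.normal(scale=0.02, size=(a, b))
           for a, b in zip(sizes, sizes[1:])]

x = rng.random((2, 784))
x_hat = tied_autoencoder_forward(x, weights)
print(x_hat.shape)  # (2, 784)
```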