Sandipan Dey, April 10 2017:
In this article a semi-supervised classification algorithm implementation will be described using Markov Chains and Random Walks. We have the following 2D circles dataset (with 1000 points) with only 2 points labeled (as shown in the figure, colored red and blue respectively, for all others the labels are unknown, indicated by the color black).
Now the task is to predict the labels of the other (unlabeled) points. From each of the unlabeled points (states) a random walk with Markov transition matrix (computed from the row-stochastic kernelized distance matrix) will be started that will end in one labelled state, which will be an absorbing state in the Markov Chain.
This problem was discussed as an application of Markov Chain in a lecture from the edX course ColumbiaX: CSMM.102x Machine Learning. The following theory is taken straightway from the slides of the same course:
Since here the Markov chain does not satisfy the connectedness condition of the Perron-Frobenius theorem, it will not have a single stationary distribution, instead for each of the unlabeled points the random walk will end in exactly one of the terminal (absorbing) state and we can directly compute (using the above formula, or use power-iteration method for convergence, which will be used in this implementation) the probability on which absorbing state it’s more likely to terminate starting from any unlabeled point and accordingly label the point with the class of the most probable absorbing state in the Markov Chain.
The following animation shows how the power-iteration algorithm (which is run upto 100 iterations) converges to an absorbing state for a given unlabeled point and it gets labeled correctly. The bandwidth b for the Gaussian Kernel used is taken to be 0.01.
As can be seen, with increasing iterations, the probability that the state ends in that particular red absorbing state with state index 323 increases, the length of a bar in the second barplot represents the probability after an iteration and the color represents two absorbing and the other unlabeled states, where the w vector shown contains 1000 states, since the number of datapoints=1000.
Now, for each of the unlabeled datapoints, 500 iterations of the power-iteration algorithm will be used to obtain the convergence to one of the absorbing states, as shown. Each time a new unlabeled (black) point is selected, a random walk is started with the underlying Markov transition matrix and the power-iteration is continued until it terminates to one of the absorbing states with high probability. Since only two absorbing states are there, finally the point will be labeled with the label (red or blue) of the absorbing state where the random walk is likely to terminate with higher probability. The bandwidth b for the Gaussian Kernel used is taken to be 0.01. Off course the result varies with kernel bandwidth. The next animation shows how the unlabeled points are sequentially labeled with the algorithm.