*Sandipan Dey, April 10, 2017*

In this article, the implementation of a **semi-supervised classification algorithm** using **Markov Chains** and **Random Walks** is described. We have the following **2D circles dataset** with 1000 points, only 2 of which are labeled (colored **red** and **blue** respectively, as shown in the figure); the labels of all the other points are unknown, indicated by the color **black**.

Now the task is to predict the labels of the other (unlabeled) points. From each unlabeled point (state), a **random walk** with a **Markov transition matrix** (computed from the **row-stochastic kernelized distance matrix**) is started; the walk ends in a labeled state, which is an **absorbing state** of the **Markov Chain**.
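The transition matrix described above can be sketched as follows. This is a minimal illustration, not the course's code: the function name `transition_matrix` and the kernel form `exp(-d²/b)` are assumptions based on the Gaussian kernel with bandwidth **b** mentioned later in the article.

```python
import numpy as np

def transition_matrix(X, labeled_idx, b=0.01):
    """Row-stochastic transition matrix from a Gaussian kernel on pairwise
    squared distances. Rows of labeled points are replaced by self-loops,
    turning them into absorbing states of the Markov chain."""
    # pairwise squared Euclidean distances between all points
    d2 = ((X[:, None, :] - X[None, :, :]) ** 2).sum(axis=-1)
    K = np.exp(-d2 / b)                    # Gaussian kernel affinities (assumed form)
    P = K / K.sum(axis=1, keepdims=True)   # normalize each row to sum to 1
    for i in labeled_idx:
        P[i] = 0.0                         # absorbing state: the walk never
        P[i, i] = 1.0                      # leaves a labeled point
    return P
```

With this construction, a random walk that reaches a labeled point stays there forever, which is exactly the absorbing behavior the algorithm relies on.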

This problem was discussed as an application of **Markov Chains** in a lecture from the **edX course ColumbiaX: CSMM.102x Machine Learning**. The following theory is taken directly from the slides of that course:

Since this **Markov chain** does not satisfy the **connectedness** condition of the **Perron-Frobenius theorem**, it does not have a single **stationary distribution**. Instead, from each unlabeled point the **random walk** ends in exactly one of the terminal (**absorbing**) states. The probability of terminating in each **absorbing** state, starting from any unlabeled point, can be computed directly (using the above formula) or via the **power-iteration** method (which is used in this implementation); the point is then **labeled** with the **class** of the **most probable absorbing** state in the **Markov Chain**.
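The power-iteration step can be sketched as below: repeatedly left-multiplying a point-mass distribution by the transition matrix makes the probability mass accumulate on the absorbing states. The function name `power_iterate` is illustrative, not from the course.

```python
import numpy as np

def power_iterate(P, start, n_iter=100):
    """Propagate a distribution through the chain: w <- w P at each step.
    With absorbing labeled states, w converges to the absorption
    probabilities for a walk started at `start`."""
    w = np.zeros(P.shape[0])
    w[start] = 1.0            # the random walk starts at the unlabeled point
    for _ in range(n_iter):
        w = w @ P             # one step of the Markov chain
    return w
```

After enough iterations, the entries of `w` at the labeled indices approximate the probabilities of being absorbed by each labeled state; the largest one determines the predicted label.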

The following animation shows how the **power-iteration** algorithm (run for up to 100 iterations) **converges** to an **absorbing** state for a given **unlabeled** point, which then gets labeled correctly. The **bandwidth** **b** for the **Gaussian Kernel** used is taken to be **0.01**.

As can be seen, with increasing iterations the **probability** that the walk ends in the particular **red** absorbing state with state index **323** increases. The **length** of a bar in the second **barplot** represents the **probability** after an iteration, and the **colors** distinguish the two **absorbing** states from the **unlabeled** states; the **w** vector shown contains 1000 entries, since the number of datapoints is 1000.

Now, for each of the **unlabeled** datapoints, 500 iterations of the **power-iteration** algorithm are used to obtain **convergence** to one of the **absorbing** states, as shown. Each time a new **unlabeled** (**black**) point is **selected**, a **random walk** is started with the underlying Markov transition matrix, and the **power-iteration** is continued until it terminates in one of the **absorbing states** with high probability. Since there are only two **absorbing** states, the point is finally labeled with the **label** (**red** or **blue**) of the **absorbing state** where the **random walk** is more likely to **terminate**. The **bandwidth** **b** for the **Gaussian Kernel** used is again taken to be **0.01**. Of course, the result varies with the kernel **bandwidth**. The next animation shows how the **unlabeled** points are **sequentially labeled** by the algorithm.
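The whole sequential-labeling procedure described above can be sketched end to end as follows. This is an illustrative reconstruction under the stated assumptions (Gaussian kernel `exp(-d²/b)`, 500 power iterations); the function name `label_points` and the `-1`-for-unlabeled convention are choices made here, not from the course.

```python
import numpy as np

def label_points(X, labels, n_iter=500, b=0.01):
    """Label every unlabeled point (labels[i] == -1) with the class of the
    absorbing state its random walk terminates in with higher probability."""
    labeled = np.flatnonzero(labels >= 0)
    # row-stochastic kernelized distance matrix (Gaussian kernel, bandwidth b)
    d2 = ((X[:, None, :] - X[None, :, :]) ** 2).sum(axis=-1)
    P = np.exp(-d2 / b)
    P /= P.sum(axis=1, keepdims=True)
    P[labeled] = np.eye(len(labels))[labeled]   # labeled rows become absorbing
    out = labels.copy()
    for i in np.flatnonzero(labels < 0):
        w = np.zeros(len(labels))
        w[i] = 1.0                              # start the walk at point i
        for _ in range(n_iter):                 # power iteration
            w = w @ P
        # class of the most probable absorbing state
        out[i] = labels[labeled[np.argmax(w[labeled])]]
    return out
```

On the circles dataset of the article, this loop would be called with the 1000 points and a `labels` array containing the two known classes and `-1` everywhere else.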