(*Sandipan Dey, 2 Jan 2017*)

- In this article, the basic concepts of the *Google PageRank Algorithm* will be discussed, along with a few examples.
- The web can be represented as a directed graph, with nodes representing web pages and edges representing the links between them.
- Probabilistic point of view: since the importance of a web page is measured by its popularity (how many incoming links it has), the importance of page *i* can be viewed as the probability that a *random surfer* on the Internet, who opens a browser to any page and starts following hyperlinks, visits page *i*.
- The process can be modeled as a random walk on the web graph.
- A small but positive fraction of the time, the surfer abandons the current page, chooses an arbitrary page from the web, and “teleports” there. The damping factor *p* reflects this probability.
- The PageRank vector for a web graph with transition matrix *A* and damping factor *p* is the *unique probabilistic eigenvector* of the matrix *M* (as shown in the following figure), corresponding to the *dominant eigenvalue 1*.
- By the *Perron-Frobenius theorem*, if the underlying *web graph* is *connected* and *aperiodic*, then the *power-iteration algorithm* used to compute the page ranks is guaranteed to converge to a *steady-state vector*, which is precisely the vector of *page ranks* of all the nodes.
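The random-surfer process described above can be illustrated with a direct simulation. This is a minimal sketch, not the actual algorithm: the 4-page graph and the `simulate_surfer` helper are made up for illustration.

```python
import random

# A hypothetical 4-page web graph as an adjacency list:
# each page maps to the list of pages it links to.
graph = {0: [1, 2], 1: [2], 2: [0], 3: [2]}

def simulate_surfer(graph, steps=100_000, seed=0):
    """Estimate the importance of each page as the fraction of
    time a random surfer following hyperlinks spends on it."""
    rng = random.Random(seed)
    pages = list(graph)
    visits = {node: 0 for node in graph}
    current = rng.choice(pages)
    for _ in range(steps):
        links = graph[current]
        # A page with no outgoing links forces a jump to a random page.
        current = rng.choice(links) if links else rng.choice(pages)
        visits[current] += 1
    return {node: count / steps for node, count in visits.items()}

freqs = simulate_surfer(graph)
```

Over many steps, the visit frequencies approximate the stationary distribution of the walk; page 3, which no other page links to, is (almost) never visited.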
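The power iteration on *M* can be sketched in NumPy. The construction M = (1 - p)·A + (p/n)·J (with J the all-ones matrix) is one standard way to combine the link matrix and the teleport step; the 4-page column-stochastic matrix `A` below is a hypothetical example.

```python
import numpy as np

def pagerank_power(A, p=0.2, tol=1e-12, max_iter=1000):
    """Power iteration on M = (1 - p) * A + (p / n) * J, where A is
    the column-stochastic link matrix, J is the all-ones matrix, and
    p is the probability of teleporting to a uniformly random page."""
    n = A.shape[0]
    M = (1 - p) * A + (p / n) * np.ones((n, n))
    v = np.full(n, 1.0 / n)  # start from the uniform distribution
    for _ in range(max_iter):
        v_next = M @ v
        if np.abs(v_next - v).sum() < tol:
            return v_next
        v = v_next
    return v

# Column-stochastic A for a hypothetical 4-page graph:
# A[j, i] = 1 / outdegree(i) if page i links to page j, else 0.
A = np.array([[0.0, 0.0, 1.0, 0.0],
              [0.5, 0.0, 0.0, 0.0],
              [0.5, 1.0, 0.0, 1.0],
              [0.0, 0.0, 0.0, 0.0]])
ranks = pagerank_power(A)
```

At convergence `ranks` satisfies M·ranks ≈ ranks, i.e. it is the probabilistic eigenvector for the dominant eigenvalue 1.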

- The next figure and animations show an example problem for computing page ranks that appeared in the final exam of the same *edX (CornellX)* course *Networks, Crowds and Markets*.

- The following animation shows the page ranks computed using the *power-iteration algorithm* without a damping factor. As can be seen, the nodes with *larger sizes* have *higher page ranks*.

- The next animation shows the page ranks computed using the *power-iteration algorithm* with a *damping factor of 0.2*.

- The next animations show the page ranks computed for an *Erdos-Renyi random graph* with *n=100* nodes and edge probability p=1/√n, using the *power-iteration* algorithm and also directly as *a dominant eigenvector of the matrix M*. As can be seen, the nodes with ids in *larger fonts* have *higher page ranks*.

- The next animation shows how fast the page ranks computed by the power-iteration algorithm converge to the true page ranks (computed directly as the *normalized dominant eigenvector*) for the same *E-R* random graph.
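The comparison above can be sketched with plain NumPy: compute the page ranks of a random graph both by power iteration and directly as the normalized dominant eigenvector of *M*, and check that they agree. The seed and the damping factor of 0.2 are assumptions (the article does not state them for this example).

```python
import numpy as np

rng = np.random.default_rng(42)  # hypothetical seed
n = 100
p_edge = 1 / np.sqrt(n)  # edge probability 1/sqrt(n), as in the article

# Directed Erdos-Renyi random graph: adj[j, i] = 1 if i links to j.
adj = (rng.random((n, n)) < p_edge).astype(float)
np.fill_diagonal(adj, 0.0)

# Column-stochastic transition matrix A; a dangling column
# (a node with no out-links) is replaced by the uniform distribution.
out_deg = adj.sum(axis=0)
safe = np.where(out_deg > 0, out_deg, 1.0)
A = np.where(out_deg > 0, adj / safe, 1.0 / n)

damping = 0.2  # assumed, as in the earlier animation
M = (1 - damping) * A + (damping / n) * np.ones((n, n))

# Page ranks via power iteration.
v = np.full(n, 1.0 / n)
for _ in range(200):
    v = M @ v

# Page ranks directly, as the normalized dominant eigenvector of M.
eigvals, eigvecs = np.linalg.eig(M)
dominant = np.real(eigvecs[:, np.argmax(np.real(eigvals))])
dominant /= dominant.sum()
```

After 200 iterations the two vectors coincide to within floating-point noise, since the error shrinks geometrically at each step.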

- The *power-iteration algorithm's convergence rate* is governed by the ratio λ2/λ1 of the two largest eigenvalues (in absolute value) of the matrix *M*. Since λ1=1, the page-rank computation *converges faster* when the *second eigenvalue* λ2 is *smaller*.
- The next figures show the result of an experiment: starting with an *Erdos-Renyi random graph* with *n=100* nodes and edge probability *p=1/2*, 10 edges were removed from the graph at random in each iteration; after each removal step, the power-iteration algorithm was run, the number of iterations needed to converge with an error of at most 1e-5 was recorded, and the ratio λ2/λ1 was noted.
- As can be seen, the ratio of the dominant eigenvalues (i.e., λ2) decreases as edges are removed, and as expected, the number of iterations taken to converge (with the same accuracy threshold) also decreases (i.e., convergence becomes faster) as the *second eigenvalue* becomes *smaller*.
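The experiment can be sketched as follows. The seed and helper names are illustrative, and no damping is applied here (the article does not specify one for this experiment), so *M* is just the transition matrix.

```python
import numpy as np

def transition_matrix(adj):
    """Column-stochastic matrix; dangling columns become uniform."""
    n = adj.shape[0]
    out_deg = adj.sum(axis=0)
    safe = np.where(out_deg > 0, out_deg, 1.0)
    return np.where(out_deg > 0, adj / safe, 1.0 / n)

def iterations_to_converge(M, tol=1e-5, max_iter=10_000):
    """Number of power-iteration steps until successive page-rank
    vectors differ by at most tol."""
    n = M.shape[0]
    v = np.full(n, 1.0 / n)
    for k in range(1, max_iter + 1):
        v_next = M @ v
        if np.abs(v_next - v).max() < tol:
            return k
        v = v_next
    return max_iter

rng = np.random.default_rng(0)  # hypothetical seed
n = 100
adj = (rng.random((n, n)) < 0.5).astype(float)  # E-R graph, p = 1/2
np.fill_diagonal(adj, 0.0)

results = []  # (lambda2 / lambda1, iterations) after each removal step
for _ in range(10):
    M = transition_matrix(adj)
    lams = np.sort(np.abs(np.linalg.eigvals(M)))[::-1]
    results.append((lams[1] / lams[0], iterations_to_converge(M)))
    # Remove 10 randomly chosen existing edges.
    edges = np.argwhere(adj > 0)
    for j, i in edges[rng.choice(len(edges), size=10, replace=False)]:
        adj[j, i] = 0.0
```

Plotting the recorded (λ2/λ1, iterations) pairs reproduces the qualitative trend described above: a smaller ratio goes with fewer iterations.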
