(Sandipan Dey, 2 Jan 2017)
- In this article, the basic concepts of the Google PageRank algorithm will be discussed, along with a few examples.
- The web can be represented as a directed graph, with nodes representing web pages and edges representing the hyperlinks between them.
- Probabilistic point of view: since the importance of a web page is measured by its popularity (how many incoming links it has), the importance of page i can be viewed as the probability that a random surfer, who opens a browser to an arbitrary page and then keeps following hyperlinks, visits page i.
- The process can be modeled as a random walk on the web graph.
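The random-surfer model described above can be sketched directly as a simulation. The sketch below is illustrative, not from the original article: the toy 4-page graph and the function name `random_surfer` are my own, and the teleport step anticipates the damping factor discussed next.

```python
import random
from collections import Counter

def random_surfer(graph, steps=100_000, p=0.2, seed=0):
    """Estimate page importance by simulating a random walk on the web graph.

    graph: dict mapping node -> list of out-neighbors (a toy web graph).
    p: damping factor, i.e. the probability of teleporting to a random page.
    """
    rng = random.Random(seed)
    nodes = list(graph)
    current = rng.choice(nodes)
    visits = Counter()
    for _ in range(steps):
        visits[current] += 1
        # With probability p (or if the page is a dead end), teleport;
        # otherwise follow a random outgoing hyperlink.
        if rng.random() < p or not graph[current]:
            current = rng.choice(nodes)
        else:
            current = rng.choice(graph[current])
    total = sum(visits.values())
    return {node: count / total for node, count in visits.items()}

# A tiny 4-page web: page 'd' is linked to by every other page.
toy_web = {'a': ['b', 'd'], 'b': ['d'], 'c': ['d'], 'd': ['a']}
ranks = random_surfer(toy_web)
```

The visit frequencies converge to the stationary distribution of the walk, so the heavily linked page `d` ends up with the largest estimated importance.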
- Most of the time the surfer follows outgoing hyperlinks, but a smaller (positive) percentage of the time the surfer abandons the current page, chooses an arbitrary page from the web, and "teleports" there. The damping factor p is the probability of this teleport step.
- The PageRank vector of a web graph with transition matrix A and damping factor p is the unique probabilistic eigenvector (non-negative entries summing to 1) of the matrix M (shown in the following figure), corresponding to the dominant eigenvalue 1.
- By the Perron-Frobenius theorem, if the underlying web graph is strongly connected and aperiodic, then the power-iteration algorithm used to compute the PageRanks is guaranteed to converge to a steady-state vector, which is precisely the vector of the PageRanks of all the nodes.
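As a concrete sketch of the two points above, the snippet below builds the damped matrix using the standard construction M = (1 - p)A + (p/n)J (where J is the all-ones matrix and A is column-stochastic) and runs power iteration on it. The function name and the toy matrix are my own illustrations, not from the article.

```python
import numpy as np

def pagerank_power(A, p=0.15, tol=1e-10, max_iter=1000):
    """Power iteration on M = (1 - p) * A + (p / n) * J.

    A: column-stochastic transition matrix, A[i, j] = 1/outdeg(j) if j links to i.
    p: damping (teleport) probability.
    Returns the probabilistic eigenvector of M for the dominant eigenvalue 1.
    """
    n = A.shape[0]
    M = (1 - p) * A + p * np.full((n, n), 1.0 / n)
    v = np.full(n, 1.0 / n)              # start from the uniform distribution
    for _ in range(max_iter):
        v_next = M @ v                   # one step of the random walk
        if np.abs(v_next - v).sum() < tol:
            return v_next
        v = v_next
    return v

# Toy web with pages (a, b, c, d) and links a->b, a->d, b->d, c->d, d->a,
# written as a column-stochastic matrix (columns are source pages).
A = np.array([[0.0, 0.0, 0.0, 1.0],
              [0.5, 0.0, 0.0, 0.0],
              [0.0, 0.0, 0.0, 0.0],
              [0.5, 1.0, 1.0, 0.0]])
v = pagerank_power(A)
```

Because M is column-stochastic, each iteration preserves the total probability mass, so the result stays a valid distribution; page `d` (index 3), which collects links from all other pages, receives the highest rank.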
- The next figure and animations show an example problem for computing page ranks that appeared in the final exam of the same edX (CornellX) course Networks, Crowds and Markets.
- The following animation shows the pageranks computed using the power-iteration algorithm without a damping factor. As can be seen, the nodes with larger sizes have higher pageranks.
- The next animation shows the pageranks computed using the power-iteration algorithm with a damping factor 0.2.
- The next animations show the pageranks computed for an Erdos-Renyi random graph with n=100 nodes, with edge probability p, using the power-iteration algorithm and also directly as the dominant eigenvector of the matrix M. As can be seen, the nodes with ids in larger fonts have higher pageranks.
- The next animation shows how fast the page ranks computed by the power-iteration algorithm converge to the true page ranks (computed directly as the normalized dominant eigenvector) for the same E-R random graph.
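The comparison above can be reproduced with a short script. This is a sketch under my own assumptions: a directed E-R graph with n=100 and edge probability 1/2 (adjacency oriented so that columns are source pages), damping factor 0.2, and a fixed random seed, none of which are taken from the article's exact setup.

```python
import numpy as np

rng = np.random.default_rng(42)
n, p_edge, damp = 100, 0.5, 0.2

# Directed Erdos-Renyi graph: adj[i, j] = 1 means an edge from page j to page i.
adj = (rng.random((n, n)) < p_edge).astype(float)
np.fill_diagonal(adj, 0.0)

# Column-stochastic transition matrix A and damped matrix M.
A = adj / adj.sum(axis=0)
M = (1 - damp) * A + damp / n

# PageRanks via power iteration.
v = np.full(n, 1.0 / n)
for _ in range(200):
    v = M @ v

# PageRanks directly as the normalized dominant eigenvector of M.
w, V = np.linalg.eig(M)
u = np.real(V[:, np.argmax(np.real(w))])
u = u / u.sum()

max_diff = np.abs(v - u).max()
```

By Perron-Frobenius the dominant eigenvalue of M is 1 with a real, positive eigenvector, so after normalization the two computations agree to numerical precision.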
- The power-iteration algorithm's convergence rate is governed by the ratio |λ2|/|λ1| of the two dominant eigenvalues of the matrix M. Since λ1 = 1, the page rank computation converges faster when the second-largest eigenvalue |λ2| is smaller.
- The next figures show the result of an experiment done starting with an Erdos-Renyi random graph with n=100 nodes, with probability p=1/2, and then iteratively removing 10 edges randomly from the graph, running the power-iteration algorithm, and recording the # iterations to converge with an error of at most 1e-5, also noting the second eigenvalue each time.
- As can be seen, the ratio of the dominant eigenvalues (equivalently |λ2|, since λ1 = 1) decreases as we go on removing the edges, and as expected, the #iterations taken to converge (with the same accuracy threshold) decreases (which means a faster convergence) as the second eigenvalue becomes smaller.
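The edge-removal experiment can be sketched as follows. The helper name `pagerank_iters`, the seed, the number of removal rounds, and the damping factor 0.2 are my own choices for illustration; the article's exact parameters may differ.

```python
import numpy as np

def pagerank_iters(adj, damp=0.2, tol=1e-5, max_iter=10_000):
    """Power iteration on the damped matrix; return (#iterations, |lambda_2|).

    adj: adjacency matrix with adj[i, j] = 1 for an edge from page j to page i.
    """
    n = adj.shape[0]
    A = adj / adj.sum(axis=0)
    M = (1 - damp) * A + damp / n
    # Second-largest eigenvalue magnitude, which governs the convergence rate.
    lam2 = np.sort(np.abs(np.linalg.eigvals(M)))[-2]
    v = np.full(n, 1.0 / n)
    for k in range(1, max_iter + 1):
        v_next = M @ v
        if np.abs(v_next - v).sum() < tol:
            return k, lam2
        v = v_next
    return max_iter, lam2

rng = np.random.default_rng(0)
n = 100
adj = (rng.random((n, n)) < 0.5).astype(float)
np.fill_diagonal(adj, 0.0)

# Iteratively remove 10 random edges, re-running power iteration each round
# and recording the iteration count and |lambda_2|.
results = []
for _ in range(5):
    edges = np.argwhere(adj > 0)
    drop = edges[rng.choice(len(edges), size=10, replace=False)]
    adj[drop[:, 0], drop[:, 1]] = 0.0
    results.append(pagerank_iters(adj))
```

Plotting the recorded `(#iterations, |lambda_2|)` pairs over many more removal rounds would reproduce the trend the figures describe.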