Google Page Rank and the impact of the Second EigenValue of the Google Matrix on Power Iteration Convergence in R

(Sandipan Dey, 2 Jan 2017)

  • In this article, the basic concepts of the Google Page Rank Algorithm will be discussed along with a few examples.
  • Web net can be represented as a directed graph, with nodes represented by web pages and edges represented by the links between them.
  • Probabilistic point of view: Since the importance of a web page is measured by its popularity (how many incoming links it has), the importance of page i can be viewed as the probability that a random surfer on the Internet that opens a browser to any page and starts following hyperlinks, visits the page i.
  • The process can be modeled as a random walk on the web graph.
  • A smaller, but positive percentage of the time, the surfer will dump the current page and choose arbitrarily a different page from the web and “teleport” there. The damping factor p reflects the probability.
  • The PageRank vector for a web graph with transition matrix A, and damping factor p, is the unique probabilistic eigenvector of the matrix M (as shown in the following figure), corresponding to the dominant eigenvalue 1.
  • By Perron-Frobenius theorem, if the underlying web-graph is connected and aperiodic, then the power-iteration algorithm used to compute the page ranks is guaranteed to converge to a steady state vector, which is precisely the vector with the page ranks of all the nodes.

pr.png

  • The next figure and animations show an example problem for computing page ranks that appeared in the final exam of the same edX (CornellX) course Networks, Crowds and Markets.

prprob.png

  • The following animation shows the pageranks computed using the power-iteration algorithm without a damping factor. As can be seen, the nodes with larger sizes have higher pageranks.

pr1.gif

  • The next animation shows the pageranks computed using the power-iteration algorithm with a damping factor 0.2.

pr2.gif

  • The next animations show the pageranks computed for an Erdos-Renyi random graph with n=100 nodes, with probability p=1/n using the power-iteration algorithm and also directly as a dominant eigenvector of the matrix M. As can be seen, the nodes with ids in larger fonts have higher pageranks.

er.gif

  • The next animation shows how fast the page ranks computed by the power-iteration algorithm converges to the true page ranks (computed directly as normalized dominant eigenvector) for the same E-R random graph.

er2.gif

  • The power-iteration algorithm convergence rate is directly proportional to the ratio of the dominant eigenvalues λ2/λ1 of the matrix M. Since λ1=1, the page rank computation algorithm converges faster if the second eigenvalue is smaller.
  • The next figures show the result of an experiment done starting with an Erdos-Renyi random graph with n=100 nodes, with probability p=1/2 and then iteratively removing 10 edges randomly from the graph and running the power iteration algorithm and recording the # iterations to converge with an error at most 1e-5, also noting the λ2/λ1 each time.
  • As can be seen, with ratio of the dominant eigenvalues (or λ2) decreases as we go on removing the edges and as expected, #iterations taken to converge (with the same accuracy threshold) decreases (which means a faster convergence) as the second eigenvalue becomes smaller.

er1.png

erconv

Advertisements

Leave a Reply

Fill in your details below or click an icon to log in:

WordPress.com Logo

You are commenting using your WordPress.com account. Log Out / Change )

Twitter picture

You are commenting using your Twitter account. Log Out / Change )

Facebook photo

You are commenting using your Facebook account. Log Out / Change )

Google+ photo

You are commenting using your Google+ account. Log Out / Change )

Connecting to %s