Google’s PageRank idea is a marvel of power and simplicity. An introduction to the basic version of this algorithm was provided in a Numerical Linear Algebra course and is reproduced here.


Google runs spiders to collect information from web pages and these are indexed in a humongous database of phrases. When one of the phrases is googled, the results need to be rank ordered based on some metric, so as to be meaningful, which is where PageRank comes in. PageRank judges the importance of each page containing the phrase is judged as follows. (Note : this is a very simplistic explanation of the algorithm, which is definitely more complex and efficient)

P_j : page j
I(P_j) : importance of page j (needs to be determined)
l_j : number of links going out of page j

If P_j links to P_i, then P_j confers \frac{1}{l_j} of importance to P_i.

Let B_i be the set of all pages linking to P_i.

I(P_i)= \displaystyle \sum_{P_j \in B_i} \frac{I(P_j)}{l_j}

This presents a chicken-and-egg problem, as the importance of each page is calculated from the importance other pages whose importance is in turn … so on.

Let us define H, the hyperlink matrix as :

H_{ij}= \Big\{ \begin{array}{c} \frac{1}{l_j}, \qquad if P_j \in B_i  \\ 0, \qquad otherwise \end{array}

  • H has all non-negative entries
  • Columns of H sum to 1 (unless the page has no outgoing links).
  • Set I vector of page ranks, I_j=I(P_j). Then,

    HI=I

    So, I is the eigenvector associated with H associated with eigen value 1.

    Goal: Given H, find I.

    H is a huge matrix (around 25 billion x 25 billion !), but several entries are 0. Still, each page has an average of 10 links, so it is a very hard problem.

    Solution: Power Method

    1. Pick an initial vector I^{(0)}
    2. Iterate, I^{(k+1)}=HI^{(k)}

    In general, such an iterative systems converges, I^{(k)} \rightarrow I. Of course there are interesting questions which arise :

    • Does it always converge ?
    • Does the limit of the sequence depend of the initial vector?
    • Does the limit of the sequence provide desired information ?

    This exercise kind of gives some appreciation for what Google has to do constantly in order to provide relevant search results.