Google’s PageRank idea is a marvel of power and simplicity. An introduction to the basic version of this algorithm was provided in a Numerical Linear Algebra course and is reproduced here.
Google runs spiders to collect information from web pages and these are indexed in a humongous database of phrases. When one of the phrases is googled, the results need to be rank ordered based on some metric, so as to be meaningful, which is where PageRank comes in. PageRank judges the importance of each page containing the phrase is judged as follows. (Note : this is a very simplistic explanation of the algorithm, which is definitely more complex and efficient)
: page j
: importance of page j (needs to be determined)
: number of links going out of page j
If links to
, then
confers
of importance to
.
Let be the set of all pages linking to
.
This presents a chicken-and-egg problem, as the importance of each page is calculated from the importance other pages whose importance is in turn … so on.
Let us define H, the hyperlink matrix as :
H has all non-negative entries Columns of H sum to 1 (unless the page has no outgoing links).
Set I vector of page ranks, . Then,
So, I is the eigenvector associated with H associated with eigen value 1.
Goal: Given H, find I.
H is a huge matrix (around 25 billion x 25 billion !), but several entries are 0. Still, each page has an average of 10 links, so it is a very hard problem.
Solution: Power Method
- Pick an initial vector
- Iterate,
In general, such an iterative systems converges, . Of course there are interesting questions which arise :
- Does it always converge ?
- Does the limit of the sequence depend of the initial vector?
- Does the limit of the sequence provide desired information ?
This exercise kind of gives some appreciation for what Google has to do constantly in order to provide relevant search results.

No comments yet
Comments feed for this article