# Web Search Algorithms and PageRank

Thesis

## Degree Name

Master of Science (MSc)

Mathematics

## Faculty/School

Faculty of Science

Anthony Bonato

Thesis Supervisor

## Abstract

The mathematical theory underlying the Google search engine is the PageRank algorithm, first introduced by Sergey Brin and Lawrence Page, the founders of Google. A ranking of web pages is made considering many criteria. PageRank exploits the graph structure of the web. The web’s hyperlink structure forms a massive directed graph, where the web pages are presented as nodes and hyperlinks as edges. The PageRank equation finds a score by solving a recursive equation which calculates the PageRank vector. The PageRank vector is the stationary distribution of an ergodic Markov chain. The Perron-Frobenius theorem ensures that the primitive matrix produced by this massive Markov chain will converge to a unique stationary distribution. The PageRank vector existence is guaranteed since the so-called Google matrix is stochastic and has all entries positive.

In a recent work by Litvak, Scheinhardt and Volkovich [14], a mathematical model is presented that explains an interesting relation between PageRank values and in-degrees in power law graphs. They analytically prove that in power law graphs, the tail distributions of PageRank and in-degree differ only by a multiplicative factor.

We survey the mathematics of the PageRank algorithm, and study the work of Litvak et. al. We implement a PageRank calculator and expose different graphs to our calculator. For various power law graphs, we show that the ranking of the nodes by PageRank will be the same as the ranking given by in-degree. We give a counterexample for graphs which are not power law. For these graphs, the ranking derived from PageRank is different from the ranking derived from the in-degree values.

2008

COinS