The search engine Google could soon be cataloguing the world wide web up to five times faster, thanks to new software tweaks developed by Stanford University computer scientists.



Such a turbo boost should allow web searches to be tailored for particular topics and, in future, personalised for every user.



At the heart of Google’s software is the PageRank algorithm. It ranks how important a web page is by counting the number of links that lead to it, with links from a page that itself receives a lot of links weighing more heavily. Its fast, high-quality results have made Google the world’s most popular search engine.
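
The link-counting idea can be illustrated with a minimal sketch in Python. It is not Google's implementation: the `damping` value of 0.85, the stopping tolerance and the toy four-page web are assumptions made for illustration only.

```python
def pagerank(links, damping=0.85, tol=1e-10, max_iter=200):
    """Rank pages by repeatedly passing each page's score along its links.

    links: dict mapping a page name to the list of pages it links to.
    damping, tol and max_iter are illustrative assumptions.
    """
    pages = sorted(set(links) | {q for outs in links.values() for q in outs})
    n = len(pages)
    rank = {p: 1.0 / n for p in pages}  # start with every page equal
    for _ in range(max_iter):
        # Score held by pages with no outgoing links is shared out evenly.
        dangling = sum(rank[p] for p in pages if not links.get(p))
        new = {p: (1.0 - damping) / n + damping * dangling / n for p in pages}
        for p in pages:
            outs = links.get(p, [])
            for q in outs:
                # A link from a highly ranked page passes on more score.
                new[q] += damping * rank[p] / len(outs)
        change = sum(abs(new[p] - rank[p]) for p in pages)
        rank = new
        if change < tol:
            break
    return rank


# Toy web: three pages link to "c", and "c" links only to "a".
toy_web = {"a": ["c"], "b": ["c"], "c": ["a"], "d": ["c"]}
for page, score in sorted(pagerank(toy_web).items()):
    print(page, round(score, 4))
```

In this toy web, "c" comes out on top because it receives the most links, and "a" outranks "b" and "d" because its single link comes from the heavily linked "c".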


Ranking the more than three billion web sites now online can currently take days with PageRank. An individually customised ranking “would now take 5000 computers five days to do. It’s not feasible,” says Stanford researcher Sepandar Kamvar.



But even speeding up the process enough to allow ranking based on broad topics could be useful, he says. For example, a comedy buff looking for Marx would find much more about Groucho than Karl, making the search even more accurate than before.


The first of the Stanford team’s software tweaks is called BlockRank and would give the greatest boost, tripling PageRank’s speed. The researchers found that 80 per cent of the pages on any given web site link to other pages on the same site. Such relatively closed systems are much simpler for PageRank to deal with. By running PageRank only a few times on each of these networks and then “gluing” the results together before scanning the entire web, BlockRank gives PageRank a head start and so saves it time.
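
The "rank each site, then glue" idea can be sketched roughly as follows. This is a simplified illustration, not the Stanford team's code: the helper names `power_iterate`, `blockrank_start` and `blockrank`, the `site_of` mapping, the damping value and the unweighted site-to-site graph are all assumptions.

```python
from collections import defaultdict


def power_iterate(links, pages, start, damping=0.85, tol=1e-9, max_iter=500):
    """Plain PageRank iteration from a given starting guess.

    Returns (ranks, iterations_used).
    """
    n = len(pages)
    rank = dict(start)
    for it in range(1, max_iter + 1):
        dangling = sum(rank[p] for p in pages if not links.get(p))
        new = {p: (1.0 - damping) / n + damping * dangling / n for p in pages}
        for p in pages:
            outs = links.get(p, [])
            for q in outs:
                new[q] += damping * rank[p] / len(outs)
        change = sum(abs(new[p] - rank[p]) for p in pages)
        rank = new
        if change < tol:
            break
    return rank, it


def blockrank_start(links, site_of):
    """Build a starting guess by ranking each site separately, then gluing."""
    pages = sorted(site_of)
    sites = defaultdict(list)
    for p in pages:
        sites[site_of[p]].append(p)
    # Step 1: rank pages inside each site, ignoring links that leave the site.
    local = {}
    for site, members in sites.items():
        inner = {p: [q for q in links.get(p, []) if site_of[q] == site]
                 for p in members}
        uniform = {p: 1.0 / len(members) for p in members}
        ranks, _ = power_iterate(inner, members, uniform)
        local.update(ranks)
    # Step 2: rank the sites themselves on a much smaller site-to-site graph
    # (each page link counts as one site link, a simplifying assumption).
    site_links = {s: [] for s in sites}
    for p in pages:
        for q in links.get(p, []):
            site_links[site_of[p]].append(site_of[q])
    names = sorted(sites)
    site_rank, _ = power_iterate(site_links, names,
                                 {s: 1.0 / len(names) for s in names})
    # Step 3: glue the two together, scaling local ranks by the site's rank.
    return {p: local[p] * site_rank[site_of[p]] for p in pages}


def blockrank(links, site_of):
    """Run the full-web iteration from the glued starting guess."""
    pages = sorted(site_of)
    return power_iterate(links, pages, blockrank_start(links, site_of))
```

The final pass is ordinary PageRank, so the rankings it settles on are the same as before; the saving, where there is one, comes from starting that pass with a guess that is already close.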



Next, the scientists found the rankings of many low-rated pages are calculated early in the PageRank process since they receive few links, but PageRank keeps reanalysing them as it continues its work on other pages. In a method called Adaptive PageRank, these redundant computations are eliminated, for a 50 per cent increase in speed.
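
A rough sketch of that idea: once a page's score has stopped moving, it is frozen and no longer recomputed. The function name `adaptive_pagerank`, the tolerance, and the `patience` rule (a page must sit still for several rounds in a row before it is frozen) are assumptions added for illustration, not details reported by the researchers.

```python
def adaptive_pagerank(links, damping=0.85, tol=1e-10, patience=3, max_iter=500):
    """PageRank that stops recomputing pages whose score has settled.

    Returns (ranks, page_updates_performed, iterations_used).
    """
    pages = sorted(set(links) | {q for outs in links.values() for q in outs})
    n = len(pages)
    out_degree = {p: len(links.get(p, [])) for p in pages}
    in_links = {p: [] for p in pages}
    for p, outs in links.items():
        for q in outs:
            in_links[q].append(p)

    rank = {p: 1.0 / n for p in pages}
    calm = {p: 0 for p in pages}  # consecutive rounds with a negligible change
    active = set(pages)           # pages still being recomputed
    updates = 0
    iterations = 0
    while active and iterations < max_iter:
        iterations += 1
        dangling = sum(rank[p] for p in pages if out_degree[p] == 0)
        base = (1.0 - damping) / n + damping * dangling / n
        new = dict(rank)  # frozen pages simply keep their last score
        for p in active:
            new[p] = base + damping * sum(
                rank[q] / out_degree[q] for q in in_links[p])
            updates += 1
        for p in list(active):
            calm[p] = calm[p] + 1 if abs(new[p] - rank[p]) < tol else 0
            if calm[p] >= patience:
                active.discard(p)  # settled: skip this page from now on
        rank = new
    return rank, updates, iterations


toy_web = {"a": ["c"], "b": ["c"], "c": ["a"], "d": ["c"]}
ranks, updates, iterations = adaptive_pagerank(toy_web)
print(updates, "page updates instead of", len(ranks) * iterations)
```

The `patience` rule is there because a score can stand still for one round by coincidence; freezing a page too eagerly would lock in a wrong value. Pages with few or no incoming links, such as "b" and "d" here, settle almost immediately and drop out of the loop.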



Finally, new “extrapolation methods” make an assumption that the link structure of the web is much simpler than it actually is. This simplification permits PageRank to work about 50 per cent faster. The extrapolated results are not fully correct, but they are close and can then be refined by the original PageRank algorithm.
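
One classical way to extrapolate is Aitken's delta-squared formula, used in the sketch below as a stand-in; the article does not say which formulas the team used. It takes three successive sets of scores, assumes each score is approaching its final value in a simple geometric pattern (the "simpler than it really is" assumption), jumps ahead to the predicted value, and then lets ordinary PageRank refine the guess. The function names, `warmup`, `eps` and the damping value are illustrative assumptions.

```python
def step(rank, links, damping=0.85):
    """One ordinary PageRank update of every page's score."""
    pages = list(rank)
    n = len(pages)
    dangling = sum(rank[p] for p in pages if not links.get(p))
    new = {p: (1.0 - damping) / n + damping * dangling / n for p in pages}
    for p in pages:
        outs = links.get(p, [])
        for q in outs:
            new[q] += damping * rank[p] / len(outs)
    return new


def aitken(x0, x1, x2, eps=1e-12):
    """Predict where each score is heading from three successive rounds."""
    guess = {}
    for p in x2:
        denom = x2[p] - 2.0 * x1[p] + x0[p]
        if abs(denom) < eps:
            guess[p] = x2[p]  # score already stable: nothing to extrapolate
        else:
            guess[p] = x0[p] - (x1[p] - x0[p]) ** 2 / denom
    # The prediction is only approximate: clamp negatives and renormalise.
    guess = {p: max(v, 0.0) for p, v in guess.items()}
    total = sum(guess.values())
    return {p: v / total for p, v in guess.items()}


def pagerank_with_extrapolation(links, warmup=3, damping=0.85,
                                tol=1e-10, max_iter=500):
    """A few plain rounds, one extrapolation jump, then plain refinement."""
    pages = sorted(set(links) | {q for outs in links.values() for q in outs})
    history = [{p: 1.0 / len(pages) for p in pages}]
    for _ in range(max(warmup, 2)):  # the jump needs three sets of scores
        history.append(step(history[-1], links, damping))
    rank = aitken(history[-3], history[-2], history[-1])
    iterations = len(history) - 1
    while iterations < max_iter:
        new = step(rank, links, damping)
        iterations += 1
        change = sum(abs(new[p] - rank[p]) for p in pages)
        rank = new
        if change < tol:
            break
    return rank, iterations
```

On a web with very simple link structure the jump can land almost exactly on the final scores; on a realistic web it lands only nearby, which is why the refinement rounds afterwards are still needed.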
