A Study on PageRank Values of Paid-Result Search Engines
PPC PageRank Values: Theoretical, Ethical and Business Issues.
Dr E. Garcia, Mi Islita.com
admin@mail.miislita.com
November 25, 2002
Last Edited: November 28, 2002
XHTML 1.1 Conversion: March 25, 2004
CSS Formatted: March 29, 2004
Topics
Available Now » Report 2
WPSS: A General Framework for Web Page Scoring Systems
Another independent study finds more flaws! Entire theoretical framework is questioned.
CONCLUSION
Our research reports can be reproduced by anyone with access to Google's Toolbar, unless those PageRank values suddenly change. We have tried to present the facts to the best of our abilities and as they occurred. The question of whether Google manipulated PageRank results to make an example out of SearchKing is something to be elucidated in the corresponding legal forum.
A different but closely related issue is whether Google and Google-friendly search engines in fact intend to control the PPC or paid-result industry via PageRank. One can certainly raise the obvious "1 google-dollar question" (1 x 10^100): whether SearchKing's legal case and the results presented herein could be used to describe a pattern of conduct by Google against Overture or against the entire PPC and paid-results industry.
We don't know if that is the case. However, based on our research findings and on the fact that Google has stated that PageRank is all about bringing order to the Web (3), that possibility cannot be excluded. The results presented herein speak for themselves, are available online and can be used or verified by any interested party, unless PageRank values change suddenly.
We welcome any feedback from the search engines studied herein, legal departments, PPC advertisers or industry experts. This Study on PageRank Values of Paid-Result Search Engines is just the first in a series of research reports on PageRank and other link metrics conducted at Mi Islita.com.
APPENDIX 1: A Final Word on Collusion and Deception
We would like to repeat an "absolute" previously stated: in the interest of fairness, competitors that try to regulate other people's businesses (or the key components of a free market or industry) using their own standards are inherently biased enterprises. Such enterprises are more likely to collude with friendly peers from the same industry or market space. This "absolute" goes beyond Google, Overture and SearchKing, and transcends any business, industry or marketing environment.
Collusion is a strong word in the business scene, but deception is worse because it involves misrepresentation of facts. Deception seems to be the norm on Wall Street and Nasdaq, e.g., the SEC probes of AOL, or Merrill Lynch being fined $100M this year in a case in which Infospace stock was described as a "powder keg" in private, but differently in public, and the like (6).
Nobody is accusing Google and its partners of business collusion or of deceiving the industry, yet. Whether publicly traded search engine companies may attract the SEC's attention for using any sort of vehicle or technology to collude or to deceive their market space or industry remains to be seen, and will be exciting to watch for years to come.
APPENDIX 2: The Icing
This work was completed and first published on November 25, 2002. Between November 25 and 28 it was pre-released to a selected group of web properties. During that period it was reviewed and edited several times (Is it ready yet? Surely more grammar corrections lie ahead.) Since then, we have added more content to reinforce the central facts of the study. On November 28 it was brought to our attention that Google had quietly published what may eventually be viewed by many as a "Google Code of Ethics for SEOs". We don't know if the timing was just a coincidence or if they are laying out strategies for the SearchKing-Google case (7). For the most part we agree with Google's position against unethical SEOs. We also understand their position on "free-for-all" schemes. Some of those schemes are, indeed, deceptive link scams.
Whether or not that banning is necessary, our position against the banning of ANY type of link connectivity structure (prefabricated or not, patterned or non-patterned, Google-friendly or not) by Google or any search engine rests on theoretical grounds. As previously mentioned, a valid metric for link connectivity across the Web should account for ANY sort of link connectivity structure. Such structures should not be banned simply because a link connectivity metric or framework is not good enough to account for them. Nor can banning actions be selective.
It should be pointed out that an intrusive, irritating mechanism called pop-up windows is part of, or contributes to the creation of, many link connectivity structures on the Web. Many pop-ups are part of "free-for-all" and "link-farm" schemes, too. Although the way these insidious link structures propagate link weights across the Web is hard to quantify, they cannot be ignored by any candidate link metric just because the metric is imperfect.
The pop-up window mechanism is used by many, including some Google partners, many prestigious web properties and many web pages listed in Google and other search engines. Still, we haven't seen Google or any search engine banning pages with such irritating features. Should the banning apply to some pop-up windows but not to others? If so, on what grounds? What about pop-up ads from search engine investors, or investors that own web properties promoting pop-up windows or "link farms"? What about search engine pop-up ads? The Search Engine World is indeed one with too many vested interests.
ACKNOWLEDGMENTS
We would like to thank friends and readers for their insightful discussions, recommendations and critiques. PageRank is a Trademark of Google, Inc.
REFERENCES
1. http://searchenginewatch.com/links/paid.html
2. http://www.payperclicksearchengines.com
3. http://www7.scu.edu.au/programme/fullpapers/1921/com1921.htm
4. http://www.google.com/press/pressrel/askjeeves.html
5. http://www.searchking.com/news/complaint3.htm
6. http://www.pbs.org/now/politics/wallstreet.html
7. http://www.google.com/webmasters/seo.html
FEEDBACK FROM READERS (This section last edited: December 3, 2002)
We have received some friendly and unfriendly reviews and feedback from readers. We are using their input to try to improve the content of this article or to clarify some misunderstandings. The following is just a sample of some reviews. When required, we have added/edited our comments. Here we go.
Some readers seem to shoot blanks by calling this study just another "moan" or "conspiracy theory" against Google. We cannot control how readers may interpret the content of a document. As a friend pointed out, "People are very strange when it comes to protecting SE's." Others simply play the "calling names game". Still others prefer to discuss typos or dismiss the entire work based on bad "grammer". We don't know what a "grammer" is, so we queried Google for the term. A list of search results was obtained, with some entries relevant to Kelsey Grammer. Google also displayed the following message: "Did you mean: grammar ?" We apologize for the typos and for the "grammer".
We have placed this "unrefined" version online since we believe the issues herein presented are too relevant for the industry to wait for. However, this is not the final version of the manuscript, so any well intentioned corrections are welcome and appreciated. In the meantime, some readers have been unable to discuss the issues we have raised, both from the practical and theoretical standpoints.
One reader argued that PageRank is just a numerical metric, easy to compute with a calculator script, and therefore disconnected from content quality or topic sensitivity. This is a valid argument. Unfortunately, few readers are familiar with research work in the area of topic-sensitive PageRank. Needless to say, the subject is not new. Few SEO specialists know that this year, at the Eleventh International World Wide Web Conference (WWW2002, May 7-11, 2002, Honolulu, Hawaii, USA), Taher H. Haveliwala from Stanford received two well-earned awards for excellent work on Topic-Sensitive PageRank (http://www2002.org/CDROM/refereed/127/). SEO specialists are invited to research the topic. Topic-Sensitive PageRank (TSPR) is an improved version of the regular PageRank. We don't know if TSPR is part of the "more than average" changes allegedly implemented by Google. Only Google knows whether they are already using the improved version or key components of the new framework, which can also be used for scoring page content and for improving the quality of ranking results.
A reader commented that we could have used other terms in some portions of this document. We agree, and for that we apologize. We could have used "personalization" instead of "discrimination", "selective fine-tuning" instead of "manipulation" and the like. Still, these and other terms do not improve the picture. Some readers have suggested that the sampling period of our study was too short. We agree. We would like to conduct a time series analysis of PageRank in a given industry over several months or years, that is, to see how link structures and their PageRank values evolve over time. Is any R&D center interested in such a study?
Others have suggested that we didn't know Overture was formerly known as GoTo, or that we don't know about redirections. Not true. We are quite familiar with both issues. A reader commented that we could have added the expression "(formerly known as GoTo)" after the sentence "PPC as a revenue model was pioneered by Overture". We agree, but that is fairly well understood. According to a review by PayPerClickSearchEngines.com, "Overture maintains a highly professional, well maintained site. It pioneered the pay-per-click search engine model in 1997, after seeing a need for a more focused search engine." An excellent review of Overture (formerly known as GoTo) is given in reference 1.
Others seem to be displeased and argue that the PageRank of Overture, as reported in this study, corresponded to a redirected page, not to Overture's index page. Certainly, and we clearly stated that in the body of the document. Still others have suggested that Overture's index page has an actual PageRank of 8 units. Let us address that issue. First, we were not able to confirm the PageRanks of redirected pages; we mentioned that too. What about the content of such redirection pages? Before proceeding any further, let us split the issue into two parts: the PageRank of "blind" index pages and the redirection mechanism itself.
For the Overture case, we typed "view-source:http://www.overture.com" in the browser's location bar and obtained the source code of an index page with no content, only redirection instructions. Even assuming Overture's index page does have a PageRank of 8 units, why give such a high value to a document with no content at all? Who in his right mind would cast or count a vote for a blank document or for "John Doe"? How many pages on the Web with high PageRank values (10, 9, 8) are pointing (giving a "vote") to Overture.com's blank index page (in vain?)? How could link weights involving "blind" pages be measured? How can the votes a landing page receives from a "blind" page be quantified? If PageRank assigns "votes", link weight or importance to documents with no content at all, or no real or significant content (the Overture, AskJeeves and WebCrawler index pages and the like), or if PageRank spreads weights ("votes") across link structures involving "blind" pages (empty nodes), then the algorithm behind the metric has a serious flaw.
A reader claimed that Overture.com's index page PageRank value is actually GoTo.com's index page PageRank value. Not so fast. These two index pages are two different nodes in the link structure of the Web. Isn't this equivalent to stating that link weights can be mirrored or "diffused" through different nodes across the Web, one node "full" of links and the other "empty"? To grasp the actual picture, let's think globally and locally: link structures with empty redirection nodes receiving and spreading link weights (!?). Could this be rationalized as link-weight redirection (a spamming mechanism)? Interesting. Could this be a different kind of flaw in the PageRank framework?
Let's address the redirection mechanism itself. Overture's redirection page, which has no real content, is not that different from a doorway or splash page. The same goes for the AskJeeves and WebCrawler index pages. If an average site used the same redirection mechanism on its index page, it would surely get in trouble with Google, AltaVista or other search engines. Why, then, from the PageRank and ranking-results standpoint, not apply the same rule to other search engine index pages, e.g., AskJeeves, WebCrawler and the like? (Will Google include these observations in their code of ethics?) Without trying to justify "blind" doorways or redirection pages at all (we hate them), isn't this a double standard? These are additional reasons why the current PageRank metric is imperfect; needless to say, a link metric divorced from content makes no sense. If PageRank cannot discriminate against index pages with no content at all, yet still gives "high citation importance" to the link, then the PageRank algorithm has serious flaws. Why, then, use that metric or follow it as a "standard"? If content is King, then something is wrong with this picture. Add Topic-Sensitive PageRank to the picture and the situation gets worse (your current PageRank calculator? Forget about it!)
One reader asked how his current PageRank calculator could be modified to account for TSPR. A very good question. We recommended that the reader study Haveliwala's paper, keeping in mind that we don't know whether Google is already implementing TSPR or its key components. Calculators using the regular (original) PageRank algorithm may not work for TSPR calculations, since the equations involved are different. Certainly, these calculators may need to be modified.
Some readers have suggested that since the PageRank name is a trademark, no one should profit from metrics derived from the name (PR values and ranking results), or display online calculators with the name "PageRank" or derivative names (e.g., "Page Rank", "pagerank", "Download Joe's PageRank JavaScript" and the like). Excellent observations. These are valid issues to be decided in a court of law. Personally, we do not recommend the use of the "PageRank" name in connection with online calculators without a disclaimer or without consulting with Google.
One reader suggests that TSPR can account for link structures or resolve issues in connection with dynamic links. A very good suggestion. We don't know if that is the case. Issues in connection with dynamic links, deterministic traffic or prefabricated link structures are not addressed in Haveliwala's paper, since they are not its central thesis. Also, Haveliwala's paper is based on data from the ODP (Open Directory Project), which is maintained by human editors self-regulated by ODP editorial policies.
Let us restate an important point, mentioned several times in this document. Like some readers, we are also fascinated with Google's technology. This is not about Google, SearchKing or Overture. This is about an imperfect metric that an industry perceives as a "standard" and tries to follow. Too many SEOs are paranoid about PageRank values. Others don't even want to touch what Google considers a "bad link". That attitude is precisely what is wrong with the current scene in the industry. Google and many SEO "gurus" may have contributed to that paranoia. Having said that, we do believe there is room for argument without playing the "calling names game". We do believe there is room for a better standard. Whether that standard comes from Google, Vivisimo, Teoma or other cutting-edge search engines is irrelevant.
A reader suggested that we could have included more main page nodes (index and redirection pages) in the study. We agree. The first line in the PROCEDURE section states "For this study, we did not include all PPC engines available across the entire Web ...". Should this argument be used to undermine the theoretical issues raised herein, or to ignore the potential implications of the SearchKing-Google case for the entire SEO and search engine industry?
We also believe that "As Is" we are dealing with a very imperfect metric designed to quantify a very complex dynamical system (banning in the process some link structures but not others). Perhaps something good may come out of the current discussion. Perhaps some researchers may be willing to tackle the theoretical arguments we have stated in this report with regard to random walks, link structures and deterministic traffic. Perhaps a messy victory in the SearchKing-Google case is ahead of us. Perhaps a federal judge may decide that first-page rankings (like rankings in general) and PageRank values, although two different "animals", are indeed metrics derived from Google's technology, and that no one has the right to profit from search engine metrics. Then, say "adios" to the SEO industry (a metric is a metric is a metric). Perhaps this should only apply to the Google search engine, since a trademark is involved. Perhaps, and only perhaps, a better, stronger metric is ahead of us, from Google (or Haveliwala's future work) or from who knows whom: "John Doe"? As of today, we don't know the answer(s). In the meantime, let us stick to the real issues and call things as they are.
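The reader's point that the original PageRank is computed from links alone can be made concrete. The sketch below is a minimal power-iteration version of a normalized variant of the formula in Brin and Page's paper (reference 3): PR(p) = (1 - d)/N + d · Σ PR(q)/C(q), summed over the pages q that link to p, where C(q) is the number of outlinks of q and d = 0.85 is the damping value that paper suggests. Spreading the score of pages with no outlinks uniformly over all pages is our assumption (one common convention among several). This is not Google's production code, and the 0-10 Toolbar figures are not raw scores like these.

```python
def pagerank(links, d=0.85, tol=1e-10, max_iter=200):
    """Power-iteration PageRank.

    links maps each page to the list of pages it links to. Only the link
    structure enters the computation; page content is never read.
    Pages with no outlinks ("dangling" pages) spread their score uniformly
    over all pages (an assumed convention).
    """
    pages = set(links)
    for targets in links.values():
        pages.update(targets)
    pages = sorted(pages)
    n = len(pages)
    rank = {p: 1.0 / n for p in pages}
    for _ in range(max_iter):
        # Total score sitting on dangling pages, redistributed evenly.
        dangling = sum(rank[p] for p in pages if not links.get(p))
        new = {p: (1.0 - d) / n + d * dangling / n for p in pages}
        # Each page splits its damped score equally among its outlinks.
        for src, targets in links.items():
            if targets:
                share = d * rank[src] / len(targets)
                for t in targets:
                    new[t] += share
        delta = sum(abs(new[p] - rank[p]) for p in pages)
        rank = new
        if delta < tol:
            break
    return rank


# Hypothetical four-page graph used only for illustration.
web = {"A": ["B", "C"], "B": ["C"], "C": ["A"], "D": ["C"]}
for page, score in sorted(pagerank(web).items(), key=lambda kv: -kv[1]):
    print(page, round(score, 4))
```

Note that page "D", which nobody links to, ends up with only the baseline (1 - d)/N, while "C" ranks highest purely because of its inlinks; nothing in the calculation looks at what the pages say.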
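Under the classic formula, the question of how a landing page's votes from a "blind" page are quantified has a mechanical answer: the blind page passes d · PR(blind)/C(blind) to each page it links to, whatever its content. The sketch below uses a hypothetical seven-page graph (all page names invented for illustration), skips dangling-page handling for brevity, and treats a redirect as an ordinary single link. That last point is an assumption: how Google actually handles redirects is not public.

```python
def pagerank(links, d=0.85, iters=100):
    """Plain power-iteration PageRank; links maps page -> list of outlinks.

    Assumes every page has at least one outlink (no dangling pages).
    """
    pages = sorted(set(links) | {t for ts in links.values() for t in ts})
    n = len(pages)
    rank = dict.fromkeys(pages, 1.0 / n)
    for _ in range(iters):
        new = dict.fromkeys(pages, (1.0 - d) / n)
        for src, targets in links.items():
            for t in targets:
                new[t] += d * rank[src] / len(targets)
        rank = new
    return rank


# Hypothetical graph: five "hub" sites link to an empty redirect page,
# which in turn links only to a landing page. Content plays no role.
web = {f"hub{i}": ["redirect"] for i in range(5)}
web["redirect"] = ["landing"]
web["landing"] = ["hub0"]

ranks = pagerank(web)
for page, score in sorted(ranks.items(), key=lambda kv: -kv[1]):
    print(f"{page:9s} {score:.4f}")

# Vote the landing page receives through the blank page, per the classic formula.
passed = 0.85 * ranks["redirect"] / len(web["redirect"])
print(f"vote passed from 'redirect' to 'landing': {passed:.4f}")
```

In this toy graph the content-free "redirect" node collects the highest score of all, and because it has a single outlink it hands almost all of that weight on to "landing".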
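For the calculator question, assuming one follows the approach described in Haveliwala's paper, the main structural change is to replace the uniform teleportation term (1 - d)/N with a vector concentrated on the pages of one topic (the paper builds these from ODP categories), compute one biased vector per topic, and blend the vectors at query time using the query's estimated topic probabilities. The sketch below is only an illustration of that idea: the topics, pages and query weights are hypothetical, and the step that estimates topic probabilities from a query is omitted entirely.

```python
def biased_pagerank(links, teleport, d=0.85, iters=100):
    """PageRank with a non-uniform teleport vector.

    teleport maps page -> probability (values should sum to 1). A uniform
    teleport reduces this to ordinary PageRank; concentrating it on one
    topic's pages yields a topic-biased vector in the spirit of TSPR.
    Assumes every page has at least one outlink.
    """
    pages = sorted(set(links) | {t for ts in links.values() for t in ts})
    rank = {p: teleport.get(p, 0.0) for p in pages}
    for _ in range(iters):
        new = {p: (1.0 - d) * teleport.get(p, 0.0) for p in pages}
        for src, targets in links.items():
            for t in targets:
                new[t] += d * rank[src] / len(targets)
        rank = new
    return rank


def topic_sensitive_score(topic_ranks, query_topic_weights, page):
    """Blend per-topic vectors using (hypothetical) query-topic weights."""
    return sum(w * topic_ranks[t][page] for t, w in query_topic_weights.items())


# Hypothetical graph with two topical clusters joined by a general page.
web = {
    "sports1": ["sports2", "news"],
    "sports2": ["sports1"],
    "tech1": ["tech2", "news"],
    "tech2": ["tech1"],
    "news": ["sports1", "tech1"],
}
topics = {"sports": ["sports1", "sports2"], "tech": ["tech1", "tech2"]}

# One biased vector per topic, computed offline.
topic_ranks = {
    name: biased_pagerank(web, {p: 1.0 / len(members) for p in members})
    for name, members in topics.items()
}

query_weights = {"sports": 0.8, "tech": 0.2}  # hypothetical P(topic | query)
for page in sorted(web):
    print(page, round(topic_sensitive_score(topic_ranks, query_weights, page), 4))
```

The link-following part of the iteration is unchanged from a regular calculator; what differs is where the random surfer "teleports" to, and the extra query-time blending step.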

