Mi Islita

A Study on PageRank Values of Paid-Result Search Engines

PPC PageRank Values: Theoretical, Ethical and Business Issues.

Dr E. Garcia, Mi Islita.com
admin@mail.miislita.com
November 25, 2002
Last Edited: November 28, 2002
XHTML 1.1 Conversion: March 25, 2004
CSS Formatted: March 29, 2004

Topics

Available Now » Report 2
WPSS: A General Framework for Web Page Scoring Systems
Another independent study finds more flaws! Entire theoretical framework is questioned.

METHODOLOGY

Our study was conducted from November 15, 2002 to November 21, 2002. We began by downloading the most recent version of the PageRank Toolbar from the Google.com site and installing it according to the procedure recommended by Google. The search engines studied were selected by visiting two well-known destination sites on the Web:

  1. SearchEngineWatch.com, which lists and reviews search engines that offer free and paid services. According to SearchEngineWatch, some search engines sell listings on a cost-per-click (CPC) basis, also called pay-per-click (PPC). Advertisers pay for each click the search engine sends them. Those who pay the most generally get listed higher. These search engines are known as PPC search engines. PPC as a revenue model was pioneered by Overture, which offers free listings and PPC services. Overture's model has been replicated across the Web to the point that it comes in "different shapes and flavors". Like Overture and other search engines, Google offers free listings and a variety of paid-result services (1).
  2. PayPerClickSearchEngines.com, recognized by SearchEngineWatch as a long-standing guide to paid-search engines. It lists new and old PPC engines and links to articles, strategies and resources about dealing with PPC search engines (2).

We became aware of the web presence of other search engines by querying Google directly and by visiting URLs listed in the above two seed sites. At the time of the study, the investigated search engines offered three types of search results: paid, free and combined (free and paid result services). We use the term paid-results as a generic term to describe engines that offer some sort of pay-per-click (PPC), paid inclusion, paid-listing or sponsored link service, reserving the term PPC for engines whose core service is pay-per-click.

The business relationship of PPC search engines with Google, i.e., whether or not they have some type of revenue-driven strategic alliance with Google, was also noted. Based on this criterion, the engines were divided into two tables. PPC search engines that do not maintain, or do not publicly acknowledge, a business relationship with Google were placed in one table (Table 1). The rest were placed in a separate table (Table 2), regardless of their relationship with Google or the type of services provided. This divide was constructed for three main reasons:

  1. Like Overture, most PPC search engines do not maintain a business relationship with Google, which itself offers paid-result services of some sort.
  2. We wanted to compare the PageRank values of PPC engines with the values obtained by other search engines, by engines associated with Google, by relatively new engines and by small or lesser-known engines.
  3. Overture's PPC model can hardly be labeled as one of the parasitic link-farm models punished by Google with low PageRank values. It should be pointed out that Overture's model is a valid, successful business model, replicated by many portals, that existed before Google entered the search engine scene. The link structure of Overture across the Web predates PageRank, Google and many other search engines, regardless of the type of services they provide, free or paid.

THEORETICAL CONSIDERATIONS

With regard to point 3, above, we need to mention that link-farm models seem to be banned by Google because they were not well accounted for when PageRank's theoretical framework was first developed. Many search engine optimization specialists (SEOs) and webmasters have stated, in many discussion forums, that link-farms are link connectivity structures designed exclusively to deceive PageRank. Without trying to justify link-farms, we disagree vehemently with this position for four important reasons:

  1. A framework, metric, tool or probe of the link structure of the Web as a graph of links can only be valid if it can account for ANY type of link structure on the Web (link-farms, web rings, tree-like structures, hubs and the like).
  2. The fact that link-farm structures existed before Google entered the Web scene disputes the argument that these are structures simply designed to deceive PageRank.
  3. Link-farms, web rings, tree-like structures, hub structures and structures of any other kind are all part of the Web and will remain part of it for years to come. Any connectivity metric or "link probe" designed to accurately describe the dynamics of the Web in terms of a connected graph of links cannot ignore or penalize these structures in order to fit metric results to a theory.
  4. A valid framework for link connectivity across the Web cannot make arbitrary assumptions with regard to user behavior or web traffic, either. Users can behave as "random walkers", "deterministic surfers" or in many unpredictable ways. According to Google, "PageRank can be thought of as a model of user behavior. We assume there is a "random surfer" who is given a web page at random and keeps clicking on links, never hitting "back" but eventually gets bored and starts on another random page. The probability that the random surfer visits a page is its PageRank" (3).
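
The random-surfer description quoted in reason 4 can be made concrete with a short sketch. The Python below is an illustration only, not Google's implementation. It assumes the normalized textbook form of the iteration, in which each page receives (1 - d)/N from random jumps plus d times the rank passed along its inlinks. The function name `pagerank`, the toy graph `toy_web`, the value d = 0.85 and the even spreading of rank from pages without outlinks are all assumptions made for the example.

```python
def pagerank(links, d=0.85, tol=1e-10, max_iter=200):
    """Power iteration for a random-surfer model (illustrative sketch).

    links maps each page to the list of pages it links to.
    d is the damping factor: the assumed probability of following a link
    instead of jumping to a random page.
    """
    pages = sorted(set(links) | {t for ts in links.values() for t in ts})
    n = len(pages)
    rank = {p: 1.0 / n for p in pages}
    for _ in range(max_iter):
        # Assumption: rank held by pages with no outlinks is spread evenly.
        dangling = sum(rank[p] for p in pages if not links.get(p))
        new = {p: (1.0 - d) / n + d * dangling / n for p in pages}
        for p in pages:
            targets = links.get(p, [])
            for t in targets:
                # Each page splits its rank equally among its outlinks.
                new[t] += d * rank[p] / len(targets)
        delta = sum(abs(new[p] - rank[p]) for p in pages)
        rank = new
        if delta < tol:
            break
    return rank


# Hypothetical four-page web: D links out but nobody links to D.
toy_web = {"A": ["B", "C"], "B": ["C"], "C": ["A"], "D": ["C"]}
ranks = pagerank(toy_web)
for page, score in sorted(ranks.items(), key=lambda kv: -kv[1]):
    print(page, round(score, 4))
```

Note that the surfer in this sketch only ever follows a link or jumps to a random page. Nothing in the iteration represents a click on "back", a closed browser window or a stay on the current page.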

A Word on Link Structures, Dynamic Links and PageRank

The quotation in reason 4, above, is one of the first descriptions of the PageRank framework. Since Google does not publish changes made to the framework or their "secret formulae", we need to go by what is available online from them. Certainly, users do click "back", and very often. Therefore the above is a fallacy embedded into the PageRank framework from the start. There is no such thing as a damping factor, "d", that can be manipulated at will to account for how often users click "back", get bored, stay on the current page, close the current browser window, or follow dynamic links or links listed in pop-up windows and the like.

Dynamic links seem to be an "irritating" thing for the PageRank framework: they are pervasive, and their citation importance, their propagation across the Web and the weighted values they derive from other dynamic links are hard to quantify. This, however, is a limitation of the metric used to grasp the behavior of a system. The problem here is caused by the limitations of a method that purports to quantify a dynamical system, not by the dynamical system itself, which clearly predates the method.

To merely write off prefabricated link structures or dynamic links as "noise" in the "propagation signal" is preposterous, but comprehensible. What is incomprehensible is that some academics, the search engine industry and prestigious research institutions have embraced metrics based on fallacies just because they are "the least imperfect thing we have". The real shame is to see an entire search engine industry trying to adopt a "standard" that is based on fallacies and vested interests. To say "Well, maybe you do not understand our theory" is preposterous, too.

It is also preposterous to assume that penalty functions have to be increased (or modified) because of the presence of prefabricated link structures. Needless to say, a web crawler does not understand what link-farms are, nor can it discriminate against them, at least not without human intervention or pre-embedded selectivity rules. Whether or not users "land" in a predetermined link structure (a link-farm, a web ring and the like) while navigating the Web, either by clicking "back" or "forward", is just part of the dynamical nature of the Web.

Nobody is suggesting a return to the "stone age" of the Web, which consisted of nonsensical business models and "free-for-all" strategic alliances, and for which we are still suffering a dot-com and dot-telecom meltdown and SEC probes. Those business structures were all about "the money trail", while prefabricated link structures are all about traffic enhancement.

It should be pointed out that web traffic consists of two components: random and deterministic. Carefully thought-out, prefabricated link structures are on the Web simply to induce deterministic traffic (traffic not by chance), and traffic is a key component of any ebusiness enterprise. In the interest of fairness, competitors that try to regulate other people's businesses (or the key components of a free market or industry) using their own standards are inherently biased enterprises. When those standards are imperfect, or can be tweaked, the "free market" becomes a pathetic concept.

Let us stress an important point. We are neither for nor against those "link-farms" that have been designed exclusively to deceive PageRank; nor are we "anti-PageRankers" or against Google as a research center. We understand Google's position (made public during the editing of this paper) with respect to some specific "free-for-all" link schemes designed by unethical SEOs (7). Still, we argue from the theoretical standpoint that ANY valid link metric of the Web as a graph of links should account for ALL types of link structures, Google-friendly or not. A metric that is not "nearly impossible" to deceive is not a good metric or a valid standard.

On the other hand, we recognize the business success of Google's founders. They have also earned recognition and a good crowd of intellectual "PageRankies" (PageRank fans and researchers). What Google's founders have accomplished in the business scene cannot be ignored. We are not against standards on the Web or in any industry, either. They are necessary. We simply argue that, at least at the present time, November 2002, PageRank, "as is", can hardly be considered an accurate metric for link connectivity or link citation in the presence of predetermined link structures.

With regard to the damping factor, "d", which Google manipulates, they have stated: "One important variation is to only add the damping factor d to a single page, or a group of pages. This allows for personalization and can make it nearly impossible to deliberately mislead the system in order to get a higher ranking." (3) This statement is hurting Google. Once they realized that some link connectivity structures can indeed mislead the PageRank framework, they found themselves in a difficult position. They could acknowledge that the above was an overstatement or misrepresentation of the facts, improve the framework to include all sorts of link connectivity structures, or simply penalize or ban what they consider "noisy link structures". We now know Google's position with regard to these structures. Consequently, this seems to be a case in which an imperfect metric is tweaked to match the behavior of a dynamical system in order to accommodate metric results to a probabilistic theory.
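
To make the quoted variation concrete, the sketch below restricts the random jump to a chosen group of pages instead of spreading it over every page. It is a minimal illustration under stated assumptions, not Google's implementation: the function `personalized_pagerank`, the `teleport` weights, the hypothetical seven-page graph `web` and the value d = 0.85 are all invented for the example. It shows only the mechanics of the variation. Whether such a restriction is an adequate answer to every kind of link structure is precisely the question raised in this paper.

```python
def personalized_pagerank(links, teleport, d=0.85, iters=200):
    """Random-surfer iteration with a restricted random jump (sketch).

    teleport maps pages to non-negative weights; the random jump lands
    only on pages with a positive weight.
    """
    pages = sorted(set(links) | {t for ts in links.values() for t in ts})
    total = sum(teleport.get(p, 0.0) for p in pages)
    e = {p: teleport.get(p, 0.0) / total for p in pages}
    rank = dict(e)
    for _ in range(iters):
        # Assumption: rank on pages without outlinks follows the jump vector.
        dangling = sum(rank[p] for p in pages if not links.get(p))
        new = {p: (1.0 - d + d * dangling) * e[p] for p in pages}
        for p in pages:
            targets = links.get(p, [])
            for t in targets:
                new[t] += d * rank[p] / len(targets)
        rank = new
    return rank


# Hypothetical web: H, A, B link among themselves; T and F1..F3 form a
# closed structure that only links to itself.
web = {
    "H": ["A", "B"], "A": ["H", "B"], "B": ["H"],
    "T": ["F1", "F2", "F3"], "F1": ["T"], "F2": ["T"], "F3": ["T"],
}
uniform = personalized_pagerank(web, {p: 1.0 for p in web})
personalized = personalized_pagerank(web, {"H": 1.0})
print("T, jump to any page:", round(uniform["T"], 4))
print("T, jump only to H:  ", round(personalized["T"], 4))
```

In this toy graph no page outside the closed structure links into it, so restricting the jump to H leaves T with no rank at all. A structure that receives even one link from the H, A, B group would not be reduced to zero.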

From the theoretical standpoint and for the reasons previously presented, a true "random walker" (a web user) cannot behave within, or be described by, the current PageRank framework. Furthermore, the Results section of this study shows serious flaws when PageRank values are used as descriptors of link citation quality in a real environment consisting of businesses or enterprises with vested interests and with strategic alliances of all sorts. What works in a computer lab, or an R&D environment, often does not work on Wall Street or "main street".

A Word on Penalty Functions and Deterministic Traffic

It can be argued that not penalizing link-farms with low PageRank scores may destabilize the entire PageRank framework. And the point is?

Certainly, this is a valid scientific problem, worth studying with an unbiased eye and without vested interests. In the process, several issues with regard to the theory of random walk behavior and deterministic traffic across the Web as a connected graph of links can be examined without preconceived ideas. These studies can be conducted in the presence of prefabricated link structures by not assigning selective penalty functions to such structures when they are detected. Perhaps the outcome of such studies may even produce a better, stronger framework, with valid noise reduction techniques and free from blatant policies, often presented or masked as penalty functions or "filters". Such policies are far removed from the Scientific Method. We believe this is an important scientific issue, more relevant than any derivative issue or emarketing issue.

Let us stress that when first presented, PageRank was described as a framework to describe a "random surfer" (random traffic), leaving deterministic traffic, or traffic induced by pre-patterned link structures, out of the picture. Furthermore, the behavior of "random surfers" entering, moving through and leaving a patterned structure of links is still an open question. Does the size and shape of the patterned structure matter? How about patterns within patterned structures (fractal patterns)? How about "link-farm" structures that to the untrained eye may look completely random but exhibit elements of scaled symmetry? Does it matter to a link metric if these structures are located at the core of the Web as a connected graph or near the moving boundaries of that graph? If so, how is metric performance affected? These, among others, are valid questions that neither PageRank nor any other candidate metric for link connectivity can dismiss by means of penalty functions.
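
One of these questions, whether the size of a patterned structure matters, can at least be posed numerically on a toy graph with no penalty function applied. The sketch below is an assumption-laden illustration, not a result of this study: the normalized random-surfer iteration, the helper `web_with_farm`, the ten-page ring used as a stand-in "core", the target page "T" and the farm sizes 0, 5 and 20 are all hypothetical choices.

```python
def pagerank(links, d=0.85, iters=300):
    """Normalized random-surfer iteration (illustrative sketch)."""
    pages = sorted(set(links) | {t for ts in links.values() for t in ts})
    n = len(pages)
    rank = {p: 1.0 / n for p in pages}
    for _ in range(iters):
        # Assumption: rank on pages without outlinks is spread evenly.
        dangling = sum(rank[p] for p in pages if not links.get(p))
        new = {p: (1.0 - d) / n + d * dangling / n for p in pages}
        for p in pages:
            targets = links.get(p, [])
            for t in targets:
                new[t] += d * rank[p] / len(targets)
        rank = new
    return rank


def web_with_farm(farm_size, core_size=10):
    """Ring of core pages, one target page T, and farm pages linking to T."""
    links = {f"c{i}": [f"c{(i + 1) % core_size}"] for i in range(core_size)}
    links["c0"].append("T")   # the core links to the target once
    links["T"] = ["c0"]       # the target links back to the core
    for i in range(farm_size):
        links[f"f{i}"] = ["T"]  # every farm page links only to the target
    return links


# Score of the target page as the prefabricated structure grows,
# with no penalty function applied.
scores = {k: pagerank(web_with_farm(k))["T"] for k in (0, 5, 20)}
for k, score in scores.items():
    print(f"farm size {k:2d}: T = {score:.4f}")
```

In this particular toy graph the unpenalized score of the target rises with the size of the structure. The sketch says nothing about shape, nested patterns or position within the Web graph, which would each require their own experiments.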

Copyright © 2006 Mi Islita.com -