Link Antispam

Theory and Practice. Why do search engines fight link spam?

Because any type of spam reduces the quality of search results. Link antispam targets sites and pages whose rankings were obtained unnaturally, by manipulating search engine algorithms.

PageRank

One of the first algorithms designed to combat spam (not link spam but text spam) was Google’s PageRank.

PR(A)=(1-d)+d(PR(T1)/C(T1)+…+PR(Tn)/C(Tn)), where

PR(A) – weight of page A;

PR(T1)…PR(Tn) – weights of the pages T1…Tn that link to A;

C(Tn) – number of outbound links on page Tn;

d – damping factor, usually 0.85;

1-d – teleportation term (the weight a page receives from random jumps).

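Principle: a "Random Surfer" moves through the Web at random. At each step the surfer either follows one of the links on the current page (with probability d) or "teleports" to an arbitrary page (with probability 1-d). A page's PageRank reflects the probability that the surfer ends up on that page.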
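
To make the formula concrete, here is a minimal sketch that applies it repeatedly until the values settle. The function name `pagerank`, the three-page graph and the fixed number of iterations are assumptions made for illustration; this is not Google's implementation.

```python
def pagerank(links, d=0.85, iterations=50):
    """Iterate PR(A) = (1 - d) + d * sum(PR(T) / C(T)).

    `links` maps each page to the list of pages it links to.
    """
    pages = set(links) | {t for targets in links.values() for t in targets}
    pr = {page: 1.0 for page in pages}  # assumed starting weight for every page
    for _ in range(iterations):
        new_pr = {}
        for page in pages:
            # Sum PR(T) / C(T) over every page T that links to `page`.
            inbound = sum(
                pr[src] / len(targets)
                for src, targets in links.items()
                if page in targets
            )
            new_pr[page] = (1 - d) + d * inbound
        pr = new_pr
    return pr


# Hypothetical graph: A links to B and C, B links to C, C links to A.
graph = {"A": ["B", "C"], "B": ["C"], "C": ["A"]}
ranks = pagerank(graph)
```

In this toy graph C collects links from both A and B, so it ends up with the highest weight, while B, which receives only half of A's weight, ends up with the lowest.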

From the formula we can see how PageRank can be manipulated. Page A receives the most weight when:

  • the donor page has high weight;
  • the donor page has few outbound links;
  • many donor pages link to A;
  • the distance to A is 1 link (each additional step multiplies the passed weight by 0.85).
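
The effect of these factors can be checked with simple arithmetic. The sketch below uses two hypothetical helpers, `passed_weight` and `weight_at_distance`; the second one assumes a chain in which every intermediate page has exactly one outbound link, so that only the damping factor reduces the weight.

```python
def passed_weight(donor_pr, outbound_links, d=0.85):
    """Contribution of one donor T to PR(A): d * PR(T) / C(T)."""
    return d * donor_pr / outbound_links


def weight_at_distance(weight, steps, d=0.85):
    """Weight left after `steps` hops, assuming one outbound link per hop."""
    return weight * d ** steps


# The same donor (weight 4.0) passes far more through 2 links than through 10.
few_links = passed_weight(4.0, 2)
many_links = passed_weight(4.0, 10)

# Each extra hop multiplies the weight by d once more.
one_hop = weight_at_distance(1.0, 1)
two_hops = weight_at_distance(1.0, 2)
```

With these numbers the donor passes 1.7 per link when it has 2 outbound links and 0.34 when it has 10; a weight of 1.0 shrinks to 0.85 after one hop and to 0.7225 after two.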

Conclusions about links on your page:

  • links do not remove weight from the hosting page;
  • the more links there are, the less weight each passes on.

Ironically, PageRank, created to fight text spam, started the age of link spam.

TrustRank

The goal of the algorithm is to separate trustworthy pages from spam pages.

It is based on semi-automatic detection of good pages, relying on these principles:

  • good documents rarely link to bad ones;
  • the more outbound links a page has, the less carefully each of them was selected.

Working principle:

1. Compute inverse PageRank (based on outbound links rather than inbound ones) to pick candidate sites for evaluation.
2. Evaluate the selected sites manually (~200 sites are enough to assess the Web), giving each a score:
   0 – spam
   1 – good


3. TrustRank spreads as follows:

  • the further from the source, the lower the score;

  • TrustRank is evenly divided among all outbound links.
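
The propagation rules above can be sketched as a variant of the PageRank iteration in which the teleportation weight goes only to manually approved pages. The function name `trustrank`, the sample graph and the normalisation of seed scores are assumptions made for illustration, not the exact procedure used by any search engine.

```python
def trustrank(links, seed_scores, d=0.85, iterations=50):
    """Spread trust from manually rated seeds along outbound links.

    `links` maps each page to the pages it links to.
    `seed_scores` maps manually rated pages to 1 (good) or 0 (spam).
    """
    pages = set(links) | set(seed_scores)
    pages |= {t for targets in links.values() for t in targets}
    good = {page for page, score in seed_scores.items() if score == 1}
    if not good:
        raise ValueError("at least one seed must be rated 1 (good)")
    # Assumption: initial trust is shared equally among the good seeds.
    seed = {page: (1.0 / len(good) if page in good else 0.0) for page in pages}
    trust = dict(seed)
    for _ in range(iterations):
        new_trust = {}
        for page in pages:
            # Each source divides its trust evenly among its outbound links.
            inbound = sum(
                trust[src] / len(targets)
                for src, targets in links.items()
                if page in targets
            )
            # Damping lowers the score with every step away from the seed.
            new_trust[page] = (1 - d) * seed[page] + d * inbound
        trust = new_trust
    return trust


# Hypothetical graph: a good seed links to a and b, a links to c;
# a page rated as spam links to spam2.
web = {"seed": ["a", "b"], "a": ["c"], "spam": ["spam2"]}
scores = trustrank(web, {"seed": 1, "spam": 0})
```

In this example a and b receive equal trust from the seed, c receives less because it is one step further away, and spam2 receives none because no trusted page leads to it.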


TrustRank was developed in 2004.

Note: this describes the algorithm used by Yahoo. Other search engines may use similar algorithms; Google, for example, acquired the Hilltop algorithm in 2003.

Factors associated with high TrustRank:

  • sites listed in the Yandex Catalog (YaCa) and DMOZ directories;
  • old sites;
  • sites with unique content;
  • sites that carefully select outbound links.

Example: Wikipedia ranks high because it has been manually assigned a very high trust level.

Topic-sensitive PageRank

This algorithm calculates link weight taking into account the topic of the donor page. Each topic is represented by its own vector, and topic similarity is measured by how close the vectors are.
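
As an illustration of "vector closeness", the sketch below compares two topic vectors with cosine similarity and scales a link's weight by the result. Both the choice of cosine similarity and the helper `topical_link_weight` are assumptions made for this example; the text does not specify how search engines actually measure closeness or apply it.

```python
import math


def cosine_similarity(u, v):
    """Closeness of two topic vectors: 1.0 = same direction, 0.0 = unrelated."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    if norm_u == 0 or norm_v == 0:
        return 0.0  # an empty topic vector matches nothing
    return dot / (norm_u * norm_v)


def topical_link_weight(base_weight, donor_topics, target_topics):
    """Hypothetical rule: scale the link weight by topic similarity."""
    return base_weight * cosine_similarity(donor_topics, target_topics)


# Vectors over two assumed topics, e.g. (travel, finance).
same_topic = topical_link_weight(0.5, [1.0, 0.0], [1.0, 0.0])
close_topic = topical_link_weight(0.5, [1.0, 1.0], [1.0, 0.0])
other_topic = topical_link_weight(0.5, [0.0, 1.0], [1.0, 0.0])
```

Under this rule a link from a donor with the same topic keeps its full weight, a thematically close donor passes part of it, and an unrelated donor passes nothing.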

Possible manipulation:

  • buying links from donors with the same topic and high PR;
  • buying links from thematically close donors with high PR.

BrowseRank

BrowseRank is another algorithm used by search engines.
