Robots.txt – site indexing management

You can manage how search engines index your site with the robots.txt file, located in the root directory of the site. This file tells search engine crawlers (bots) which files they may index and which they may not.

The robots.txt file consists of records. Each record has at least two lines: a User-agent line naming the client application, followed by one or more lines starting with the Disallow directive. Empty lines are significant: they separate records with different User-agent values.
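
The record structure can be sketched in Python. This is a minimal illustration under stated assumptions, not a complete parser: the function name parse_records and the sample file are invented for this example, and only User-agent and Disallow lines are handled.

```python
def parse_records(text):
    """Split robots.txt text into records separated by blank lines."""
    records = []
    current = None
    for raw in text.splitlines():
        if not raw.strip():
            current = None  # a blank line ends the current record
            continue
        line = raw.split("#", 1)[0].strip()  # drop comments
        if not line or ":" not in line:
            continue
        field, value = (part.strip() for part in line.split(":", 1))
        field = field.lower()
        if field == "user-agent":
            # A User-agent line after Disallow lines starts a new record.
            if current is None or current["disallow"]:
                current = {"user_agents": [], "disallow": []}
                records.append(current)
            current["user_agents"].append(value)
        elif field == "disallow" and current is not None:
            current["disallow"].append(value)
    return records


# Hypothetical sample file with two records.
SAMPLE = """\
User-agent: googlebot
Disallow: /cgi-bin/

User-agent: *
Disallow: /cgi-bin/
Disallow: /catalog
"""

records = parse_records(SAMPLE)
for record in records:
    print(record["user_agents"], record["disallow"])
```

Each record ends up as one dictionary, which mirrors the rule that a blank line closes one record and the next User-agent line opens another.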

User-agent

The User-agent line specifies the name of the crawler. For example, the following line specifies Google’s bot – “googlebot”:

User-agent: googlebot

Names of some other crawlers:

Yandex bot – “Yandex”
Rambler bot – “StackRambler”
Yahoo! bot – “Yahoo! Slurp”
MSN bot – “msnbot”

Other crawler names can be found in your server logs.
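
One way to look for crawler names in the logs is to count the User-Agent strings they contain. The sketch below assumes the common "combined" access log format, where the User-Agent is the last quoted field on each line. The function name count_user_agents, the sample lines, and "ExampleBot" are invented for illustration.

```python
import re
from collections import Counter

# In the combined log format the User-Agent is the last quoted field.
UA_PATTERN = re.compile(r'"([^"]*)"\s*$')


def count_user_agents(lines):
    """Count how often each User-Agent string appears in the log lines."""
    counts = Counter()
    for line in lines:
        match = UA_PATTERN.search(line)
        if match:
            counts[match.group(1)] += 1
    return counts


# Hypothetical log lines; a real log would be read from a file.
SAMPLE_LOG = [
    '203.0.113.5 - - [10/Oct/2023:13:55:36 +0000] "GET /robots.txt HTTP/1.1" 200 68 "-" "ExampleBot/1.0"',
    '203.0.113.5 - - [10/Oct/2023:13:55:37 +0000] "GET /index.html HTTP/1.1" 200 512 "-" "ExampleBot/1.0"',
    '198.51.100.7 - - [10/Oct/2023:13:56:01 +0000] "GET / HTTP/1.1" 200 512 "-" "Mozilla/5.0 (compatible; Googlebot/2.1; +http://www.google.com/bot.html)"',
]

counts = count_user_agents(SAMPLE_LOG)
for agent, hits in counts.most_common():
    print(hits, agent)
```

The logged string is usually longer than the name used in robots.txt, so the short token (for example "Googlebot") still has to be picked out by hand.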

If you want to block indexing of files and/or folders for all search engines, use the wildcard “*” in place of a crawler name:

User-agent: *

Disallow

The second part of the record contains Disallow lines. These directives tell the bot which files and/or folders must not be indexed. Paths in Disallow are written relative to the site root and start with “/”, so you do not need to enter the domain name.

Example: the following directive forbids indexing of the file “download.htm” located in the root directory:

Disallow: /download.htm

The directive can also specify a folder. Example: forbid indexing of the “cgi-bin” directory:

Disallow: /cgi-bin/

A Disallow value matches every path that begins with it, so the next directive blocks both the file “catalog.html” and the “catalog” folder:

Disallow: /catalog
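
The effect of this rule can be checked with Python's standard urllib.robotparser module. This is a sketch under stated assumptions: the host example.com, the crawler name "ExampleBot", and the test paths are placeholders, and other parsers may differ in details.

```python
from urllib.robotparser import RobotFileParser

ROBOTS_TXT = "User-agent: *\nDisallow: /catalog\n"

parser = RobotFileParser()
parser.parse(ROBOTS_TXT.splitlines())  # parse text directly, no network access

paths = ["/catalog.html", "/catalog/item.html", "/index.html"]

# A path is blocked when can_fetch() returns False for it.
blocked = [
    path for path in paths
    if not parser.can_fetch("ExampleBot", "http://example.com" + path)
]
print(blocked)
```

Both "/catalog.html" and "/catalog/item.html" start with "/catalog", so this parser reports them as blocked, while "/index.html" stays allowed.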

If the Disallow line is empty, the bot may index everything. At least one Disallow directive must be present for each User-agent, otherwise robots.txt may not be read correctly. A completely empty robots.txt is equivalent to its absence.

Allow full indexing for all search engines:

User-agent: *
Disallow:

Block all indexing for all search engines:

User-agent: *
Disallow: /
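
The difference between an empty Disallow, "Disallow: /", and an empty file can be verified with Python's standard urllib.robotparser. This is a minimal sketch: the helper name allowed, the host example.com, and the crawler name "ExampleBot" are placeholders chosen for this example.

```python
from urllib.robotparser import RobotFileParser


def allowed(robots_txt, path, agent="ExampleBot"):
    """Return True if `agent` may fetch `path` under the given robots.txt text."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return parser.can_fetch(agent, "http://example.com" + path)


ALLOW_ALL = "User-agent: *\nDisallow:\n"   # empty Disallow: nothing is blocked
BLOCK_ALL = "User-agent: *\nDisallow: /\n"  # "/" is a prefix of every path
EMPTY = ""                                  # treated like a missing file

results = {
    "allow_all": allowed(ALLOW_ALL, "/page.html"),
    "block_all": allowed(BLOCK_ALL, "/page.html"),
    "empty": allowed(EMPTY, "/page.html"),
}
print(results)
```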

Block indexing of the “cgi-bin” folder:

User-agent: *
Disallow: /cgi-bin/

Block indexing of the “download.htm” file:

User-agent: *
Disallow: /download.htm

Block indexing of “download.htm” and the “cgi-bin” folder:

User-agent: *
Disallow: /cgi-bin/
Disallow: /download.htm

Block indexing of “download.htm” only for Google’s bot – “googlebot”:

User-agent: googlebot
Disallow: /download.htm
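
A record addressed to a single crawler leaves the others unaffected. The sketch below checks this with Python's standard urllib.robotparser, writing the path with a leading slash as "/download.htm". The host example.com is a placeholder, and matching crawler names the way this module does is an assumption about one parser, not a statement about every search engine.

```python
from urllib.robotparser import RobotFileParser

ROBOTS_TXT = "User-agent: googlebot\nDisallow: /download.htm\n"

parser = RobotFileParser()
parser.parse(ROBOTS_TXT.splitlines())

url = "http://example.com/download.htm"

# The record names googlebot, so only that crawler is restricted.
googlebot_allowed = parser.can_fetch("googlebot", url)
msnbot_allowed = parser.can_fetch("msnbot", url)

print("googlebot:", googlebot_allowed)
print("msnbot:", msnbot_allowed)
```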

Comments

Any text following the “#” symbol to the end of the line is considered a comment and ignored by bots. Example:

# Yahoo! No index.
User-agent: Yahoo! Slurp
Disallow: /
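
Comment handling can be checked with Python's standard urllib.robotparser. In this sketch a trailing comment is added to the Disallow line to show that text after "#" is dropped there as well. The host example.com is a placeholder, and the crawler names are passed exactly as they would appear in robots.txt.

```python
from urllib.robotparser import RobotFileParser

ROBOTS_TXT = """\
# Yahoo! No index.
User-agent: Yahoo! Slurp
Disallow: /   # trailing comment, also ignored
"""

parser = RobotFileParser()
parser.parse(ROBOTS_TXT.splitlines())

url = "http://example.com/index.html"

# The comments do not change the rule: Slurp is blocked everywhere.
slurp_allowed = parser.can_fetch("Yahoo! Slurp", url)
# No record applies to other crawlers, so they are not restricted.
other_allowed = parser.can_fetch("googlebot", url)

print("Yahoo! Slurp:", slurp_allowed)
print("googlebot:", other_allowed)
```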
