Robots.txt – site indexing management

You can manage how all search engines index your site with the robots.txt file, located in the root directory of the server. This file tells search engine crawlers (bots) which files they may index and which they may not.

The robots.txt file consists of records. Each record has at least two lines: a User-agent line naming the client application, and one or more lines starting with the Disallow directive. Empty lines are significant: they separate records with different User-agent values.
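The record structure described above can be illustrated with a short Python sketch. The function name `split_records` and the sample content are hypothetical, chosen only for illustration; real crawlers may treat edge cases differently.

```python
def split_records(text):
    """Split robots.txt text into records of (user_agents, disallow_paths)."""
    records = []
    agents, paths = [], []
    for raw in text.splitlines():
        if not raw.strip():
            # A blank line closes the record that is currently open, if any.
            if agents:
                records.append((agents, paths))
                agents, paths = [], []
            continue
        line = raw.split("#", 1)[0].strip()  # drop a trailing comment
        if not line:
            continue  # comment-only line, does not end the record
        field, _, value = line.partition(":")
        field, value = field.strip().lower(), value.strip()
        if field == "user-agent":
            agents.append(value)
        elif field == "disallow" and agents:
            paths.append(value)
    if agents:
        records.append((agents, paths))
    return records


# Hypothetical sample: two records separated by an empty line.
SAMPLE = """\
User-agent: googlebot
Disallow: /cgi-bin/

User-agent: *
Disallow: /download.htm
Disallow: /catalog
"""

for agents, paths in split_records(SAMPLE):
    print(agents, paths)
```

Each tuple pairs the User-agent values of one record with its Disallow paths, which makes the role of the separating empty line visible.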

User-agent

The User-agent line specifies the name of the crawler. For example, the following line specifies Google’s bot – “googlebot”:

User-agent: googlebot

Names of some other bots:

Yandex bot – “Yandex”
Rambler bot – “StackRambler”
Yahoo! bot – “Yahoo! Slurp”
MSN bot – “msnbot”

Other crawler names can be found in your server logs.

If you want the rules for files and/or folders to apply to all search engines, use the wildcard “*” in place of a bot name:

User-agent: *

Disallow

The second part of the record contains Disallow lines. These directives tell the bot which files and/or folders must not be indexed. The paths in Disallow are relative to the site root, not full URLs, so you do not need to enter the domain name. Each path starts with “/”.

Example: the following directive forbids indexing of the file “download.htm” located in the root directory:

Disallow: /download.htm

The directive can also specify a folder. Example: forbid indexing of the “cgi-bin” directory:

Disallow: /cgi-bin/

The next directive blocks both the file “catalog.html” and the “catalog” folder, because it applies to every path that begins with “/catalog”:

Disallow: /catalog
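The prefix behaviour of this directive can be checked with Python's standard `urllib.robotparser` module. This is only a sketch: the helper `is_allowed`, the bot name `examplebot` and the tested paths are assumptions made for the example, and individual search engines may implement matching with their own extensions.

```python
from urllib.robotparser import RobotFileParser


def is_allowed(robots_txt, path, bot="examplebot"):
    """Return True if `bot` may fetch `path` under the given robots.txt text."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return parser.can_fetch(bot, path)


RULES = "User-agent: *\nDisallow: /catalog\n"

# Hypothetical paths: the first two begin with "/catalog", the third does not.
for path in ("/catalog.html", "/catalog/item.html", "/contacts.html"):
    verdict = "allowed" if is_allowed(RULES, path) else "blocked"
    print(path, verdict)
```

Writing the directive as “/catalog/” instead would limit the rule to the folder and leave “catalog.html” open to indexing.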

If the Disallow value is empty, the bot may index everything. At least one Disallow directive must be present for each User-agent; otherwise robots.txt may not be read correctly. A completely empty robots.txt is equivalent to having no robots.txt at all.

Allow full indexing for all search engines:

User-agent: *
Disallow:

Block all indexing for all search engines:

User-agent: *
Disallow: /
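The difference between an empty Disallow value and “Disallow: /” can be verified with a small sketch based on Python's standard `urllib.robotparser`. The helper `can_index`, the bot name `examplebot` and the path `/index.html` are hypothetical and serve only as an illustration.

```python
from urllib.robotparser import RobotFileParser


def can_index(robots_txt, path, bot="examplebot"):
    """Return True if `bot` may fetch `path` under the given robots.txt text."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return parser.can_fetch(bot, path)


ALLOW_ALL = "User-agent: *\nDisallow:\n"    # empty value: nothing is blocked
BLOCK_ALL = "User-agent: *\nDisallow: /\n"  # "/" is a prefix of every path

print(can_index(ALLOW_ALL, "/index.html"))
print(can_index(BLOCK_ALL, "/index.html"))
```

The two files differ by a single character, so it is worth checking them before publishing.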

Block indexing of the “cgi-bin” folder:

User-agent: *
Disallow: /cgi-bin/

Block indexing of the “download.htm” file:

User-agent: *
Disallow: /download.htm

Block indexing of “download.htm” and the “cgi-bin” folder:

User-agent: *
Disallow: /cgi-bin/
Disallow: /download.htm

Block indexing of “download.htm” only for Google’s bot – “googlebot”:

User-agent: googlebot
Disallow: /download.htm
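A record addressed to a single bot leaves other bots unrestricted. The sketch below checks this with Python's standard `urllib.robotparser`; the helper `allowed_for` is hypothetical, and the path is written with a leading “/”, which the matching relies on.

```python
from urllib.robotparser import RobotFileParser

ROBOTS_TXT = """\
User-agent: googlebot
Disallow: /download.htm
"""


def allowed_for(bot, path, robots_txt=ROBOTS_TXT):
    """Return True if `bot` may fetch `path` under the given robots.txt text."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return parser.can_fetch(bot, path)


print(allowed_for("googlebot", "/download.htm"))  # the record names this bot
print(allowed_for("msnbot", "/download.htm"))     # no record applies to this bot
print(allowed_for("googlebot", "/index.html"))    # path is not listed in the record
```

A bot that finds no record for its name, and no “*” record, is treated here as unrestricted.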

Comments

Any text following the “#” symbol to the end of the line is considered a comment and ignored by bots. Example:

# Yahoo! No index.
User-agent: Yahoo! Slurp
Disallow: /
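Comment handling can be sketched in a few lines of Python. The function `strip_comment` is hypothetical, and the trailing comment on the Disallow line is added here only to show that text after “#” is dropped in the middle of a line as well.

```python
def strip_comment(line):
    """Return the line without its '#' comment and surrounding whitespace."""
    return line.split("#", 1)[0].strip()


LINES = [
    "# Yahoo! No index.",
    "User-agent: Yahoo! Slurp",
    "Disallow: /  # whole site (comment added for illustration)",
]

# Keep only the lines that still contain a directive after stripping.
directives = [text for text in map(strip_comment, LINES) if text]
print(directives)
```

Only the two directive lines remain, which matches what a bot acts on.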
