A text file kept in the root directory of a website. It tells robots, spiders, and crawlers which parts of the site they are allowed to visit and which they must avoid.
See Also:
SEO, PageRank, Deep Linking, Linkage
How robots.txt works:
Suppose a robot wants to visit a website URL, e.g. http://www.example.com/welcome.html. Before doing so, it first fetches http://www.example.com/robots.txt and looks for rules such as:
- User-agent: *
- Disallow: /
- 'User-agent: *' means this section applies to all robots.
- 'Disallow: /' instructs the robot that it should not visit any pages on the site.
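This check can be sketched with Python's standard urllib.robotparser module (the robot name SomeBot is a hypothetical example):

```python
import urllib.robotparser

# Parse the rules a robot might find at http://www.example.com/robots.txt.
rp = urllib.robotparser.RobotFileParser()
rp.parse([
    "User-agent: *",
    "Disallow: /",
])

# "Disallow: /" applies to every robot, so no page may be fetched.
print(rp.can_fetch("SomeBot", "http://www.example.com/welcome.html"))  # False
```

A well-behaved crawler performs exactly this test before requesting any page from the site.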
Some important features of robots.txt:
- It is put in the top-level directory of your web server.
- A "/robots.txt" file is a plain text file containing one or more records, separated by blank lines; each record consists of one or more User-agent lines followed by Disallow lines.
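Because the file must live in the top-level directory, its URL can be derived from any page URL on the same site. A minimal sketch using Python's urllib.parse (the helper name robots_url is our own):

```python
from urllib.parse import urlsplit, urlunsplit

def robots_url(page_url: str) -> str:
    """Return the /robots.txt URL for the site hosting page_url."""
    parts = urlsplit(page_url)
    # Keep only the scheme and host; the path is always /robots.txt.
    return urlunsplit((parts.scheme, parts.netloc, "/robots.txt", "", ""))

print(robots_url("http://www.example.com/welcome.html"))
# http://www.example.com/robots.txt
```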
Important robots.txt syntax:
- To exclude all robots from the entire server, the syntax is:
- User-agent: *
- Disallow: /
- To allow all robots complete access, the syntax is an empty Disallow line:
- User-agent: *
- Disallow:
- To exclude all robots from part of the server, the syntax is:
- User-agent: *
- Disallow: /cgi-bin/
- Disallow: /tmp/
- Disallow: /junk/
- To exclude a single robot, the syntax is:
- User-agent: BadBot
- Disallow: /
- To allow a single robot and exclude all others, the syntax is:
- User-agent: Googlebot
- Disallow:
- User-agent: *
- Disallow: /
- To exclude all files except one: the original standard has no Allow field, so the simplest way is to move all files to be disallowed into a separate directory, leave the one allowed file in the level above it, and disallow that directory:
- User-agent: *
- Disallow: /~joe/stuff/
- Alternatively, you can explicitly disallow each unwanted page:
- User-agent: *
- Disallow: /~joe/junk.html
- Disallow: /~joe/foo.html
- Disallow: /~joe/bar.html
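The "exclude all files except one" record above can be checked with urllib.robotparser; this sketch assumes hypothetical file names under /~joe/ and a hypothetical robot called SomeBot:

```python
import urllib.robotparser

# The "exclude all files except one" record: everything under
# /~joe/stuff/ is blocked, anything left one level above is not.
rp = urllib.robotparser.RobotFileParser()
rp.parse([
    "User-agent: *",
    "Disallow: /~joe/stuff/",
])

print(rp.can_fetch("SomeBot", "http://www.example.com/~joe/stuff/junk.html"))  # False
print(rp.can_fetch("SomeBot", "http://www.example.com/~joe/index.html"))       # True
```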