Thinkbiz Solutions

Robots.txt

A text file kept in the root directory of a website. It tells robots, spiders and crawlers which parts of the site they may and may not visit.

See Also:

SEO, Page Rank, Deep Linking, Linkage

Robots.txt

Robots.txt is also known as the Robots Exclusion Standard or the Robots Exclusion Protocol. It is a convention for preventing cooperating web spiders and other web robots from accessing all or part of a website that is otherwise publicly viewable. The file works as a request that specified robots ignore specified files or directories when crawling; it is not enforced, so only cooperating robots honor it. When a website has multiple subdomains, each subdomain must have its own robots.txt file. For example, if example.com had a robots.txt file but a.example.com did not, the rules that apply to example.com would not apply to a.example.com.
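
The per-subdomain rule can be illustrated with a short Python sketch. It assumes only the standard library; the helper name robots_txt_url is hypothetical and not part of any standard. It shows that the robots.txt location is derived from the exact host of the page being requested, so two subdomains never share a file.

```python
from urllib.parse import urlsplit, urlunsplit


def robots_txt_url(page_url: str) -> str:
    """Return the robots.txt URL for the host that serves page_url.

    Hypothetical helper for illustration: it keeps the scheme and host
    of the page and replaces the path with /robots.txt.
    """
    parts = urlsplit(page_url)
    # The file lives at the root of the exact host, subdomain included.
    return urlunsplit((parts.scheme, parts.netloc, "/robots.txt", "", ""))


print(robots_txt_url("http://example.com/welcome.html"))
print(robots_txt_url("http://a.example.com/docs/page.html"))
```

The two calls yield different URLs, one under example.com and one under a.example.com, which is why rules written for one host do not carry over to the other.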

A web robot is a program that traverses the Web automatically. Search engines such as Google use them to index web content, whereas spammers use them to scan for email addresses; they have many other uses as well. Website owners use the robots.txt file to give web robots instructions about their site.

How Robots.txt works:

Suppose a robot wants to visit a website URL (e.g. http://www.example.com/welcome.html). Before doing so, it first checks for http://www.example.com/robots.txt. Suppose it finds the two lines 'User-agent: *' and 'Disallow: /':

  • 'User-agent: *' means this section applies to all robots.
  • 'Disallow: /' instructs the robot that it should not visit any pages on the site.
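
As a rough illustration of this check, the sketch below uses Python's standard urllib.robotparser module. To stay self-contained it parses the two rule lines from a string rather than downloading them, and the robot name ExampleBot is a made-up placeholder.

```python
from urllib.robotparser import RobotFileParser

# The two lines a robot might find at http://www.example.com/robots.txt.
rules = """\
User-agent: *
Disallow: /
"""

parser = RobotFileParser()
# parse() takes an iterable of lines; a real crawler would fetch the
# file first (for example with set_url() followed by read()).
parser.parse(rules.splitlines())

# "ExampleBot" is a hypothetical robot name used only for this sketch.
allowed = parser.can_fetch("ExampleBot", "http://www.example.com/welcome.html")
print(allowed)  # False: every robot is asked to stay away from every page
```

A cooperating robot would skip welcome.html here; nothing in the protocol stops a non-cooperating robot from requesting it anyway.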

Some Important features of Robots.txt:

  • It is put in the top-level directory of your web server.
  • A "/robots.txt" file is a text file with one or more records.

Important Syntaxes of Robots.txt:

  • To exclude all robots from the entire server, the syntax is:
    • User-agent: *
    • Disallow: /
  • To allow all robots complete access, the syntax is (an empty Disallow value excludes nothing):
    • User-agent: *
    • Disallow:
  • To exclude all robots from part of the server, the syntax is:
    • User-agent: *
    • Disallow: /cgi-bin/
    • Disallow: /tmp/
    • Disallow: /junk/
  • To exclude a single robot, the syntax is:
    • User-agent: BadBot
    • Disallow: /
  • To allow a single robot and exclude all others, the syntax is:
    • User-agent: Google
    • Disallow:
    • User-agent: *
    • Disallow: /
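  • To exclude all files except one, note that the original standard has no "Allow" field, so the easy way is to move every file to be disallowed into a separate directory (say "stuff") and leave the one permitted file in the level above it. The syntax is then:
    • User-agent: *
    • Disallow: /~joe/stuff/
  • Alternatively, to explicitly disallow each disallowed page, the syntax is: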
    • User-agent: *
    • Disallow: /~joe/junk.html
    • Disallow: /~joe/foo.html
    • Disallow: /~joe/bar.html
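
Two of the rule sets listed above can be checked with a small Python sketch built on the standard urllib.robotparser module. The helper build_parser, the robot name AnyBot and the page paths are hypothetical and exist only for illustration; the rules themselves are copied from the list.

```python
from urllib.robotparser import RobotFileParser


def build_parser(text: str) -> RobotFileParser:
    """Hypothetical helper: load robots.txt rules from a string."""
    parser = RobotFileParser()
    parser.parse(text.splitlines())
    return parser


# Exclude all robots from part of the server.
partial = build_parser("""\
User-agent: *
Disallow: /cgi-bin/
Disallow: /tmp/
Disallow: /junk/
""")

# Allow a single robot and exclude every other one.
single_allowed = build_parser("""\
User-agent: Google
Disallow:

User-agent: *
Disallow: /
""")

site = "http://www.example.com"
# "AnyBot" is a made-up robot name; the page paths are examples only.
print(partial.can_fetch("AnyBot", site + "/tmp/cache.html"))        # False
print(partial.can_fetch("AnyBot", site + "/index.html"))            # True
print(single_allowed.can_fetch("Google", site + "/index.html"))     # True
print(single_allowed.can_fetch("AnyBot", site + "/index.html"))     # False
```

Records are separated by a blank line, and a robot follows the record that names it before falling back to the 'User-agent: *' record.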