A text file kept in the root directory of a website. It tells robots, spiders and crawlers which parts of the website they are and are not allowed to visit.
Robots.txt is also known as the Robots Exclusion Standard or the Robots Exclusion Protocol. It is a convention for asking cooperating web spiders and other web robots not to access all or part of a website that is otherwise publicly viewable. The file works as a request that specified robots ignore specified files or directories when crawling the site. It does not enforce anything, so a robot that does not cooperate can simply ignore it. If a website has multiple subdomains, each subdomain must have its own robots.txt file. For example, if example.com had a robots.txt file but a.example.com did not, the rules that apply to example.com would not apply to a.example.com.
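A minimal sketch of both ideas, using Python's standard `urllib.robotparser` module rather than any particular crawler's logic. The rules, and the robot names "BadBot" and "GoodBot", are hypothetical. The helper `robots_url` (also hypothetical) shows that the governing robots.txt is always looked up on the same host as the page, which is why a.example.com needs its own file.

```python
from urllib.parse import urlsplit, urlunsplit
from urllib.robotparser import RobotFileParser


def robots_url(page_url: str) -> str:
    """Return the robots.txt URL on the same scheme and host as page_url."""
    parts = urlsplit(page_url)
    return urlunsplit((parts.scheme, parts.netloc, "/robots.txt", "", ""))


# Hypothetical robots.txt for www.example.com: "BadBot" is asked to skip
# /private/, and every other robot is asked to skip only /tmp/.
RULES = """\
User-agent: BadBot
Disallow: /private/

User-agent: *
Disallow: /tmp/
"""

parser = RobotFileParser()
parser.parse(RULES.splitlines())  # parse local text instead of fetching it

# Each host has its own robots.txt location.
print(robots_url("http://www.example.com/welcome.html"))
print(robots_url("http://a.example.com/page.html"))

# Specified robots are asked to ignore specified directories.
print(parser.can_fetch("BadBot", "http://www.example.com/private/a.html"))
print(parser.can_fetch("GoodBot", "http://www.example.com/private/a.html"))
print(parser.can_fetch("GoodBot", "http://www.example.com/tmp/x.html"))
```

Here "BadBot" matches its own record, while any other robot falls through to the `User-agent: *` record. Nothing stops a robot from skipping the `can_fetch()` check; the file only works when robots cooperate.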
A web robot is a program that traverses the Web automatically. Search engines such as Google use them to index web content, whereas spammers use them to scan for email addresses, and they have many other uses. Website owners use the robots.txt file to give web robots instructions about their site.
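Suppose a robot wants to visit the URL http://www.example.com/welcome.html. Before doing so, it first checks http://www.example.com/robots.txt. Suppose it finds the following record:

User-agent: *
Disallow: /

"User-agent: *" means the record applies to all robots, and "Disallow: /" tells them not to visit any page on the site.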
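As a rough sketch of the check a cooperating robot performs, Python's standard `urllib.robotparser` module can evaluate the `User-agent: *` / `Disallow: /` record directly. The robot name "ExampleBot" is hypothetical. A real robot would typically call `set_url()` and `read()` to download the live robots.txt; this sketch parses the record from a list of lines so it runs without network access.

```python
from urllib.robotparser import RobotFileParser

# The record from the example: every robot is asked to stay away
# from every path on www.example.com.
RECORD = ["User-agent: *", "Disallow: /"]

parser = RobotFileParser()
parser.parse(RECORD)

for url in ("http://www.example.com/welcome.html", "http://www.example.com/"):
    # can_fetch() answers the question a polite robot asks before each request.
    # Prints False for both URLs, because "/" is a prefix of every path.
    print(url, parser.can_fetch("ExampleBot", url))
```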