What is a robots.txt File?
The robots.txt file is a plain text file placed on your server that tells search engine crawlers which pages they may or may not crawl and index. It can be created in any text editor, such as Notepad.exe, and must be saved to the root directory of your site (the same directory as your home or index page).
The robots.txt file gives you more control than the meta robots tag, which only some search engines support. You can use it to prevent indexing entirely, keep certain areas of your site from being indexed, or issue separate indexing instructions to specific search engines. Some engines have also honoured non-standard extensions such as Crawl-delay (wait a set time between requests), Request-rate (fetch only so many pages per time period), and Visit-time (crawl only during certain hours of the day), though support for these varies by engine.
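As a sketch, a robots.txt using these features might look like the following. The crawler and directory names are placeholders, and Crawl-delay is a non-standard extension that not every engine honours:

```
# Block one specific crawler entirely (ExampleBot is a placeholder name)
User-agent: ExampleBot
Disallow: /

# All other robots: keep them out of /private/ and ask them to
# wait 10 seconds between requests (non-standard directive)
User-agent: *
Disallow: /private/
Crawl-delay: 10
```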
Why do I need a robots.txt file?
• to prevent search engine spiders from consuming excessive bandwidth on your server
• to prevent potential copyright infringements
• to control crawling per subdomain: on websites with multiple subdomains, each subdomain needs its own robots.txt file
• to exclude selected directories whose content might be misleading or irrelevant to the categorization of the site as a whole
Creating a robots.txt file.
• Open any text editor, such as Notepad.exe.
• Type “User-agent: *”. This line specifies which search engine spiders the rules that follow apply to; the * means all robots.
• To disallow all robots from crawling a directory, add the line Disallow: /yourdirectoryname/ (using your actual directory name). To disallow a specific file from being indexed, add the line Disallow: /directory/file.html.
• Save the file as robots.txt and upload it to your Web root directory.
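Before uploading, you can sanity-check the rules locally with Python's standard-library robots.txt parser. This is just a sketch: the rules mirror the Disallow examples from the steps above, and the directory and file names are placeholders.

```python
# Parse a robots.txt and test which paths a crawler may fetch,
# using Python's built-in urllib.robotparser module.
from urllib.robotparser import RobotFileParser

# The same example rules as in the steps above (placeholder names)
rules = """\
User-agent: *
Disallow: /yourdirectoryname/
Disallow: /directory/file.html
"""

parser = RobotFileParser()
parser.parse(rules.splitlines())

# Paths under the disallowed directory are blocked for all robots...
print(parser.can_fetch("*", "/yourdirectoryname/page.html"))  # False
# ...as is the individually disallowed file...
print(parser.can_fetch("*", "/directory/file.html"))  # False
# ...while everything else remains crawlable.
print(parser.can_fetch("*", "/index.html"))  # True
```

Well-behaved crawlers perform essentially this check against your live robots.txt before fetching each page.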