Robots.txt – Telling bots where to go and where not to
What is the robots.txt file and why is it important?
The robots.txt file lives in the root folder of your website and tells search engine bots which parts of the site they may and may not crawl. All you need to do is define rules for what is open to bots and what is off-limits.
Examples:
Below you will see “User-agent: *”, which applies the rules to all bots, and “Allow: /”, which tells them the entire site is open for crawling.
User-agent: *
Allow: /
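For what it is worth, the original robots exclusion standard did not define Allow at all; the classic way to permit everything is an empty Disallow line, which blocks nothing:
User-agent: *
Disallow: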
Here is the direct opposite, which tells all bots to stay out of the entire site.
User-agent: *
Disallow: /
Blocking bots from your whole website is generally not recommended, but there are special situations where you would want to do exactly that. For example, a ‘beta’ website that you do not want showing up in search results yet: if the beta site is not behind an authentication process, a blanket Disallow is the way to keep well-behaved bots out.
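If you would rather use authentication instead of (or alongside) robots.txt, here is a minimal sketch of HTTP basic auth in an Apache .htaccess file; the “Beta Site” label and the .htpasswd path are assumptions you would adjust for your own server:
# .htaccess in the beta site's root folder
# Prompts every visitor (human or bot) for credentials before serving pages.
# The AuthUserFile path below is an assumption; point it at your real htpasswd file.
AuthType Basic
AuthName "Beta Site"
AuthUserFile /path/to/.htpasswd
Require valid-user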
A few years ago, when bandwidth was costly, I ended up having to create a special rule to block my portfolio image folder. This example shows how to disallow a folder for all bots.
User-agent: *
Disallow: /images/
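Rules can also be scoped to a single crawler, and major bots like Googlebot honor a more specific Allow as an exception inside a blocked folder. The Googlebot-Image user agent is real, but the logo.png path is just an illustration:
User-agent: Googlebot-Image
Disallow: /images/
Allow: /images/logo.png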
For more information regarding robots.txt, two resources worth checking out are The Web Robots Pages at robotstxt.org and Google Search Central's robots.txt documentation.
Lastly, it should be noted that robots.txt is merely a recommendation, not an absolute rule. A bot can completely ignore the rules you have laid out, and a page you disallow can still end up indexed if other sites link to it. So if you have content that truly needs to stay out of search results, the best practice is to put login credentials on the folder storing that content rather than relying on robots.txt to keep the bots away.
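To see what honoring robots.txt looks like from the bot's side, here is a short Python sketch using the standard library's urllib.robotparser; example.com stands in for your own domain:
from urllib.robotparser import RobotFileParser

# Load and parse the site's robots.txt (example.com is a placeholder).
rp = RobotFileParser()
rp.set_url("https://example.com/robots.txt")
rp.read()

# A well-behaved bot asks before fetching a URL; a rude one skips this check.
print(rp.can_fetch("*", "https://example.com/images/photo.jpg"))
If /images/ is disallowed for all user agents, can_fetch returns False and a compliant crawler moves on.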
If you need help with your robots.txt or your SEO, please feel free to give Force 5 a call.