
Chat With Search Engine Spiders


We are living in an age where robots and spiders are crawling all over your Web site. No, this isn't a tag line from an old 1950s horror movie; this is the way things are. Don't be frightened, though. Having robots and spiders on your Web site is a good thing, and a very good thing if you care about your own success online. How can you make the most of the robots and spiders? It all starts with a little file called "robots.txt".

Before I get into what the robots.txt file is all about, there is something I have to cover. If you have been around the proverbial Webmaster block a few times, you have heard about search engine spiders. They are small "robots" (automated programs) that search engines send out across the Internet to look for content. Just about every major search engine uses them.

Now let us start with what it is. The robots.txt file is a small text file that sits in your root directory, so it can be reached at an address like www.yourdomain.com/robots.txt. When search engines send out spiders to roam the Internet looking for content to pick up, a well-behaved spider reads the robots.txt file before anything else on your site. Think of it as your way to talk directly to the search engines.
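Because spiders only look for the file at the top level of a site, its address can be worked out from any page on that site. The Python sketch below shows one way to do that with the standard library's `urllib.parse.urljoin`; the helper name `robots_url` and the `example.com` addresses are placeholders chosen for illustration, not anything a search engine publishes.

```python
from urllib.parse import urljoin


def robots_url(page_url: str) -> str:
    """Return the robots.txt address for the site that hosts page_url.

    The leading slash in "/robots.txt" makes urljoin discard the page's
    folders and query string, keeping only the scheme and host (and port).
    """
    return urljoin(page_url, "/robots.txt")


# Placeholder addresses, used only for illustration.
print(robots_url("http://www.example.com/images/photos/cat.jpg"))
# prints http://www.example.com/robots.txt
print(robots_url("https://example.com:8080/a/b?x=1"))
# prints https://example.com:8080/robots.txt
```

Since the result always points at the root, a robots.txt file saved in some subfolder is never requested by the spiders and has no effect.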

Spidering is how your Web site ends up on a search engine like Google. When you "submit" your Web site to a search engine, you are putting your domain on a list of Web sites for its spiders to visit. So which is better: letting the search engine spider find you on its own, or submitting your site to the search engine yourself? There are arguments on both sides, so I will not get too deep into that.

So now you know that a search engine sends out spiders to pick up content on the Internet, and that you can talk to those spiders by putting instructions in the robots.txt file in your root directory. Now comes the fun stuff.

Once you have a robots.txt file in your root directory, you can figure out what you want to tell the search engine spiders. This time, "Hey, how you doing?!" isn't going to cut it. You have to learn how to speak their language. First there is the User-agent line, which names the search engine spider you wish to speak to. Each search engine spider has a name. For example, Google's spider is called "googlebot". Other search engines use other names.

Here are a few good Web sites to check out if you are curious about the names that particular search engines use for their spiders:

Search Engine Dictionary - Spider Names

The Web Robots Database

Search Engine Dictionary


To address your rules to one specific search engine spider, use the User-agent line like this:
User-agent: googlebot

This tells Google's spider that the rules following this line in your robots.txt file are meant for it.
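To see how a spider decides whether a group of rules is meant for it, Python's standard `urllib.robotparser` module can parse a robots.txt file and answer "may this agent fetch this URL?". This is a minimal sketch of that one parser's behavior, not of Google's own crawler. It borrows the Disallow: command covered further down so the rule has something to block; the URL is a placeholder and `bingbot` simply stands in for some other spider's name.

```python
from urllib.robotparser import RobotFileParser

# A group that names only Google's spider. The Disallow line gives the
# group a visible effect; it is explained later in the article.
ROBOTS_TXT = """\
User-agent: googlebot
Disallow: /photos/
"""

parser = RobotFileParser()
parser.parse(ROBOTS_TXT.splitlines())

url = "http://www.example.com/photos/cat.jpg"  # placeholder URL

# This parser keeps only the part of the agent string before "/" and checks,
# ignoring case, whether the group's name appears in it, so "Googlebot/2.1"
# falls under the googlebot group.
print(parser.can_fetch("Googlebot/2.1", url))  # False
# A spider not named anywhere in the file finds no rules, so it is allowed.
print(parser.can_fetch("bingbot", url))        # True
```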

To address your rules to all search engine spiders at once, use an asterisk:
User-agent: *

This tells all search engine spiders that the rules following this line in your robots.txt file are meant for them.
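A file can hold a group for one named spider and a `*` group for everyone else. In Python's `urllib.robotparser`, a spider that finds a group carrying its own name follows that group and ignores the `*` group; only spiders with no group of their own fall back to `*`. The Robots Exclusion Protocol standard (RFC 9309) describes the same fallback. A minimal sketch, again borrowing the Disallow: command explained later and using a placeholder URL:

```python
from urllib.robotparser import RobotFileParser

# One group for Google's spider and a catch-all group for everyone else.
# An empty Disallow value means nothing is off limits for that group.
ROBOTS_TXT = """\
User-agent: googlebot
Disallow:

User-agent: *
Disallow: /photos/
"""

parser = RobotFileParser()
parser.parse(ROBOTS_TXT.splitlines())

url = "http://www.example.com/photos/cat.jpg"  # placeholder URL

print(parser.can_fetch("googlebot", url))  # True: its own group applies
print(parser.can_fetch("bingbot", url))    # False: falls back to the * group
```

Because of this fallback, rules under `User-agent: *` do not reach a spider that also has its own named group, so a rule you want every spider to obey has to be repeated in each named group.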

Now, instead of telling them, "It is okay for you to get content from here, here and here," it is much easier to tell the spiders where not to go. That is where the robots.txt file is most helpful, and where the Disallow: command comes in handy. Using it, you can tell a search engine spider not to get anything inside your "photos" folder, for example. How does that look in the robots.txt file?

Let's make this command tell all search engine spiders to stay out of my "photos" folder, located in my root directory.

User-agent: *
Disallow: /photos/


That is it! Now I don't have to worry about any well-behaved search engine spider looking inside my "photos" folder and indexing what is inside. The thing to remember is that your paths always start from your root directory. What does that mean?

If your domain name is Mitchkeeler.com, then in the above example I just told the spider to stay out of my folder at Mitchkeeler.com/photos/. If my "photos" folder were inside my "images" folder (Mitchkeeler.com/images/photos/), then the above example wouldn't have worked.

It would have had to be changed to this:

User-agent: *
Disallow: /images/photos/
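Disallow: values are read from the root directory and matched against the start of a path, so a rule also covers everything in subfolders of "photos" but says nothing about a "photos" folder somewhere else. A short check with Python's `urllib.robotparser`, where `blocked` is a hypothetical helper and `anybot` a made-up spider name:

```python
from urllib.robotparser import RobotFileParser


def blocked(rules: str, path: str, agent: str = "anybot") -> bool:
    """Return True if `agent` may not fetch `path` under the given robots.txt text."""
    parser = RobotFileParser()
    parser.parse(rules.splitlines())
    return not parser.can_fetch(agent, path)


top_level = "User-agent: *\nDisallow: /photos/\n"
nested = "User-agent: *\nDisallow: /images/photos/\n"

# Each Disallow value is compared with the beginning of the requested path.
print(blocked(top_level, "/photos/cat.jpg"))         # True
print(blocked(top_level, "/photos/2009/trip.jpg"))   # True: subfolders too
print(blocked(top_level, "/images/photos/cat.jpg"))  # False: path starts with /images/
print(blocked(nested, "/images/photos/cat.jpg"))     # True
```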


For an example of a robots.txt file in action, let's take a trip to the White House! Here is the robots.txt file for the White House's Web site:

Whitehouse.gov Robots.txt.

Now you are ready to open your favorite text editor and create a robots.txt file for yourself. If you still have any questions, feel free to shoot me an e-mail or check out these handy links:

Search Engine World - Robots.txt Exclusion Standard Information.

Robotstxt.org.
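If you would rather generate the file than type it, a few lines of Python can assemble the same User-agent and Disallow lines. This is only a sketch: `build_robots_txt` is a hypothetical helper, and you would still save its output as robots.txt and upload it to your root directory.

```python
def build_robots_txt(groups: dict[str, list[str]]) -> str:
    """Turn {user_agent: [disallowed paths]} into robots.txt text.

    Each group gets its own User-agent line followed by its Disallow
    lines, and groups are separated by a blank line. An empty list
    produces a bare "Disallow:", which blocks nothing.
    """
    blocks = []
    for agent, paths in groups.items():
        lines = [f"User-agent: {agent}"]
        lines += [f"Disallow: {p}" for p in paths] or ["Disallow:"]
        blocks.append("\n".join(lines))
    return "\n\n".join(blocks) + "\n"


# Keep all spiders out of both photo folders used in this article.
text = build_robots_txt({"*": ["/photos/", "/images/photos/"]})
print(text)
```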

Author: Mitch Keeler