Building a Robots.txt File and Implementing It Correctly

Search engines use web spiders (bots) to reach your website and extract data. The extracted data then goes through a series of transformations before your website is displayed in the SERP (search engine results page). A robots.txt file tells these crawlers which pages they may crawl and which they should skip. It controls crawling rather than indexing: a blocked URL can still appear in results if other pages link to it.

What Happens when Web Spiders Visit Your Site?

Before crawling your pages, a web spider requests your website’s “/robots.txt” file and looks for a “User-agent:” line that refers to it specifically, or to all robots with “*”. To tell a robot where it cannot go, rules for a user-agent are set up as “Disallow:” statements. Here is how one looks:

“Disallow: /test”

This rule tells a web spider not to crawl any URL whose path begins with /test. That covers the /test directory and everything inside it, but also paths such as /testing.html, because the value is matched as a prefix. To block only the directory, write “Disallow: /test/” with a trailing slash.
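A “Disallow:” value is compared against the start of the URL path, and a quick way to see what a rule covers is to test it locally. The following is a minimal sketch using Python's standard urllib.robotparser module; the bot name ExampleBot and the sample paths are assumptions made up for illustration.

```python
from urllib.robotparser import RobotFileParser


def build_parser(robots_txt: str) -> RobotFileParser:
    """Parse robots.txt content from a string instead of fetching it."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return parser


# One group that applies to every crawler and blocks the /test prefix.
rules = build_parser("User-agent: *\nDisallow: /test\n")

# "ExampleBot" is a hypothetical crawler name used only for this sketch.
print(rules.can_fetch("ExampleBot", "/test/page.html"))  # False: inside /test
print(rules.can_fetch("ExampleBot", "/testing.html"))    # False: path starts with /test
print(rules.can_fetch("ExampleBot", "/blog/post.html"))  # True: no rule matches
```

Real crawlers may interpret edge cases differently from this parser, so treat the result as a sanity check rather than a guarantee.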

“Disallow: /”

This rule tells a web spider to ignore the whole site. Webmasters and site admins usually reserve it for specific cases, for example a site whose content is duplicated or irrelevant and should not be displayed in the SERP.

“Disallow:”

When you leave the value after “Disallow:” blank, the rule tells a web spider that it may crawl the entire site with no restrictions.
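The difference between “Disallow: /” and an empty “Disallow:” is easy to mix up, so it can help to compare them side by side. This is a small sketch, again assuming Python's standard urllib.robotparser; the helper is_allowed, the bot name ExampleBot and the sample path are hypothetical.

```python
from urllib.robotparser import RobotFileParser


def is_allowed(robots_txt: str, path: str, agent: str = "ExampleBot") -> bool:
    """Return True if the given robots.txt text lets `agent` crawl `path`."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return parser.can_fetch(agent, path)


block_all = "User-agent: *\nDisallow: /\n"  # every path starts with "/"
allow_all = "User-agent: *\nDisallow:\n"    # empty value: nothing is blocked

print(is_allowed(block_all, "/any/page.html"))  # False
print(is_allowed(allow_all, "/any/page.html"))  # True
```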

Examples of a Robots.txt File

The following examples show how to write robots.txt rules correctly.

Assume we want all web spiders from all search engines not to crawl the /uploads folder. The rules would look like this:

User-agent: *
Disallow: /uploads

This second example applies only to Google’s web search bot (called Googlebot) and tells it not to crawl the /uploads folder:

User-agent: Googlebot
Disallow: /uploads
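To confirm that a group written for one crawler does not affect the others, the Googlebot example can be checked with Python's standard urllib.robotparser. This is only a sketch: the file name photo.jpg and the second crawler name, Bingbot, are assumptions chosen for illustration, and the module's user-agent matching is simpler than what real search engines do.

```python
from urllib.robotparser import RobotFileParser

# The Googlebot-only rule from the example, with the two lines kept together.
robots_txt = "User-agent: Googlebot\nDisallow: /uploads\n"

parser = RobotFileParser()
parser.parse(robots_txt.splitlines())

# Googlebot matches the group, so /uploads is off limits for it.
print(parser.can_fetch("Googlebot", "/uploads/photo.jpg"))  # False

# No group names Bingbot and there is no "*" group, so it is not restricted.
print(parser.can_fetch("Bingbot", "/uploads/photo.jpg"))    # True
```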

Observations

You can write the rules in any plain-text editor, such as Notepad, and save the file as robots.txt.
To check that web crawlers can access robots.txt, open yourdomain.com/robots.txt in a browser.
To check that the rules in robots.txt are written correctly, you can use an online checker such as Google Webmaster Tools (now Google Search Console) under Crawl -> Blocked URLs, and follow the instructions on the page.
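The accessibility check can also be scripted. The sketch below, using only Python's standard library, builds the robots.txt address for a site and downloads it; the helper names robots_url and fetch_robots and the example.com address are hypothetical, and the download step needs network access.

```python
from urllib.parse import urljoin
from urllib.request import urlopen


def robots_url(site_url: str) -> str:
    """Build the robots.txt address at the root of the site's host."""
    return urljoin(site_url, "/robots.txt")


def fetch_robots(site_url: str, timeout: float = 10.0) -> str:
    """Download robots.txt; urlopen raises HTTPError if it is missing (404)."""
    with urlopen(robots_url(site_url), timeout=timeout) as response:
        return response.read().decode("utf-8", errors="replace")


# Any page address on the site resolves to the same root-level file.
print(robots_url("https://example.com/blog/post"))  # https://example.com/robots.txt
```

Calling fetch_robots("https://yourdomain.com") with your own address and printing the result shows the file as a crawler would receive it.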
