Go to Robots.txt Generator Page
One way to create a robots.txt file is to visit the robots.txt generator page. On that page, you can set the commands you will give the web crawler.
Figure 1: The robots.txt generator page view from cmlabs
Select Access Permission For Default Robot
Specify whether the default web crawlers are allowed to crawl your URLs. There are two options to choose from: allow and disallow.
Figure 2: Dropdown view of the permission options granted to the default robot
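As an illustration, the allow and disallow choices for the default robot correspond to rules like the following in the generated file (an empty Disallow value permits everything; the rules shown are a sketch, not the tool's exact output):

```
# Allow every crawler to access the whole site
User-agent: *
Disallow:

# Or: block every crawler from the whole site
User-agent: *
Disallow: /
```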
Set Crawl Delay
You can set how long the crawl delay will be for the web crawler. If you set a crawl delay, the web crawler will wait for that amount of time before crawling your URL. The robots.txt generator lets you choose either no crawl delay or a delay of 5 to 120 seconds.
Figure 3: A dropdown view of the crawl delay options provided to the default robot
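The crawl delay setting maps to a Crawl-delay line in the generated file. A sketch with an assumed 10-second delay; note that support varies by crawler (for example, Bingbot honors Crawl-delay, while Googlebot ignores it):

```
User-agent: *
Crawl-delay: 10
```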
Enter Sitemap (If Any)
A sitemap is a file that lists the URLs of your website; it helps web crawlers crawl and index your site more efficiently. You can enter the sitemap path in the field provided.
Make sure you enter the correct sitemap path, because this command is case sensitive (e.g., “/Sitemap.xml” and “/sitemap.xml” are considered different paths).
Figure 4: The display field for entering the sitemap path associated with your URL
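In the generated file, the sitemap field becomes a Sitemap line. A sketch with a hypothetical domain; note that the Sitemap directive is conventionally written as a full URL:

```
Sitemap: https://www.example.com/sitemap.xml
```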
Add Directive In Robots.txt
You can add directives to the robots.txt file by pressing the "Add Directive" button. Directives are commands that tell web crawlers whether they are allowed or denied to crawl certain URLs.
Figure 5: Button for adding commands to be executed by the web crawler
In the robots.txt generator, there are three settings you need to adjust in the directive section, namely:
Set Access Permission
You can set the access permissions granted to web crawlers, whether you allow or disallow them from crawling your web pages. The options that can be used are allow and disallow.
Figure 6: Choice of access permissions to be granted to web crawlers
Select User-Agent
A user-agent is the type of web crawler you will instruct to crawl. The choice of web crawler depends on the search engine used, such as Baiduspider, Bingbot, Googlebot, and others. You can select the web crawler from the available user-agent dropdown.
Figure 7: User-agent options available in cmlabs robots.txt generator
Enter Directory / File Path
A directory or file path is the specific location of a page that web crawlers may or may not crawl. You must pay close attention when writing the path, because this command distinguishes between upper and lower case letters (e.g., "/File" and "/file" are considered different paths).
Figure 8: Field to add the path to be crawled by the crawler
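Putting the three settings together, each directive combines an access permission, a user-agent, and a path. A hypothetical example that blocks Bingbot from one directory while explicitly allowing Googlebot (the directory name is an assumption):

```
# Paths are case sensitive: /Private/ and /private/ are different
User-agent: Bingbot
Disallow: /private/

User-agent: Googlebot
Allow: /private/
```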
After entering the commands for the web crawler in the fields provided, you will see a preview of the robots.txt file on the right. You can copy the generated syntax and paste it into the robots.txt file you have created.
Figure 9: Syntax copy options in the robots.txt generator.
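After copying the syntax, you may want to check that the file behaves as expected before deploying it. A minimal sketch using Python's standard-library `urllib.robotparser`; the file content, domain, and paths below are hypothetical stand-ins for whatever the generator produced:

```python
from urllib.robotparser import RobotFileParser

# Hypothetical generator output (domain and paths are assumptions)
robots_txt = """\
User-agent: *
Crawl-delay: 10
Disallow: /private/
Allow: /

Sitemap: https://www.example.com/sitemap.xml
"""

parser = RobotFileParser()
parser.parse(robots_txt.splitlines())

# The Disallow rule matches this path, so fetching is refused
print(parser.can_fetch("Googlebot", "https://www.example.com/private/page.html"))
# Any other path falls through to Allow: /
print(parser.can_fetch("Googlebot", "https://www.example.com/blog/"))
# The configured crawl delay is exposed as well
print(parser.crawl_delay("Googlebot"))
```

This is a quick sanity check only; individual search engines may interpret some directives (such as Crawl-delay) differently from Python's parser.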
Export Robots.txt Syntax
If you don't know how to create your own robots.txt file, you can export the file that cmlabs has generated. Downloading the file is quite easy: select the "Export" option in the robots.txt generator tool. The tool will then start the download, and you will receive a robots.txt file.
Figure 10: Data export options in the robots.txt generator.
Remove Unnecessary Directives
If you want to delete unneeded directives, click the cross icon to the right of the directive field. Please note that deleted fields cannot be recovered.
Figure 11: The delete data directive option in the robots.txt generator
Reset Robots.txt Generator
The tool also makes it easy to start over and create another robots.txt file. Click the "Reset" option to delete all the commands you set in the robots.txt earlier; you can then build a new robots.txt configuration from scratch.
Figure 12: Data reset options in the robots.txt generator.