What Is robots.txt and Why It Matters

The robots.txt file is a plain text file placed at the root of your website that tells search engine crawlers which pages or sections they can and cannot access. It follows the Robots Exclusion Protocol, a standard that has been in use since 1994.

Every time a crawler visits your site, it checks https://yoursite.com/robots.txt first. If the file exists, the crawler reads the rules and respects them before crawling any pages. However, robots.txt is advisory, not a security measure — well-behaved bots follow its rules, but malicious bots can ignore it entirely.
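A minimal robots.txt illustrating this might look as follows (the blocked path is a placeholder; substitute your own low-value directories):

```
# Applies to all crawlers
User-agent: *
# Keep bots out of the admin area
Disallow: /admin/
```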

Proper robots.txt configuration is important for several reasons. First, it prevents crawlers from wasting your crawl budget on low-value pages like admin panels, search result pages, or duplicate content. Second, it keeps private staging areas and internal tools out of search indexes. Third, with the rise of AI crawlers like GPTBot and CCBot, robots.txt has become the primary mechanism for controlling AI training access to your content.

That said, robots.txt does not remove pages from Google's index. If Google already knows about a URL (through links, sitemaps, or prior crawling), blocking it in robots.txt prevents crawling, but the URL may still appear in search results. To truly deindex a page, use the noindex meta tag instead, and make sure the page is not blocked in robots.txt, since Google must be able to crawl it to see the tag. For more on how these directives interact, see our guide on crawl budget optimization.


Frequently Asked Questions

What is robots.txt?

robots.txt is a plain text file placed at the root of a website (e.g., https://example.com/robots.txt) that instructs search engine crawlers which URLs they are allowed or not allowed to crawl. It follows the Robots Exclusion Protocol and is checked by crawlers before they access any page on your site.

Does robots.txt block pages from appearing in Google?

No. Blocking a URL in robots.txt prevents crawling, but the URL can still appear in search results if Google discovers it through external links or sitemaps. The listing will show the URL without a snippet. To remove a page from search results entirely, use a noindex meta tag or X-Robots-Tag HTTP header instead.
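Because Google must crawl a page to see a noindex directive, the page should not also be blocked in robots.txt. A minimal example of the meta-tag option, placed in the page's head:

```html
<!-- Tells crawlers not to include this page in search results -->
<meta name="robots" content="noindex">
```

The equivalent HTTP header, useful for non-HTML files such as PDFs, is X-Robots-Tag: noindex.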

How do I block AI crawlers like GPTBot?

Add a separate User-agent group for the AI crawler and disallow all paths. For example: User-agent: GPTBot followed by Disallow: /. Common AI crawlers include GPTBot (OpenAI), ChatGPT-User, Google-Extended (Gemini training), CCBot (Common Crawl), and anthropic-ai (Anthropic). This tool includes all of these in the User-agent dropdown.
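As a sketch, one group can cover several AI crawlers at once, since consecutive User-agent lines share the rules that follow them:

```
# Block common AI training crawlers from the entire site
User-agent: GPTBot
User-agent: ChatGPT-User
User-agent: Google-Extended
User-agent: CCBot
User-agent: anthropic-ai
Disallow: /
```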

What is the difference between Disallow and Allow?

Disallow tells crawlers not to access a URL path. Allow explicitly permits crawling of a path, which is useful for allowing a subdirectory inside a disallowed parent. When both rules match a URL, the rule with the longest (most specific) matching path wins. For example, Disallow: /private/ with Allow: /private/public-page blocks the entire /private/ directory except for /private/public-page.
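You can check rules like these with Python's standard-library urllib.robotparser. One caveat: Python's parser applies rules in file order (first match wins) rather than using longest-match precedence, so in this sketch the more specific Allow line is listed before the Disallow line to keep both interpretations in agreement:

```python
from urllib import robotparser

# Hypothetical rules: block /private/ but allow one page inside it.
# The Allow line comes first because Python matches rules in file order.
rules = """\
User-agent: *
Allow: /private/public-page
Disallow: /private/
""".splitlines()

rp = robotparser.RobotFileParser()
rp.parse(rules)

print(rp.can_fetch("MyBot", "https://example.com/private/secret"))       # False
print(rp.can_fetch("MyBot", "https://example.com/private/public-page"))  # True
```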

Should I add my sitemap to robots.txt?

Yes, it is a best practice. Adding Sitemap: https://yoursite.com/sitemap.xml to your robots.txt helps crawlers discover your XML sitemap even if it is not submitted through Google Search Console or Bing Webmaster Tools. You can include multiple Sitemap directives.
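For example (the URLs are placeholders), Sitemap lines take absolute URLs and can sit anywhere in the file, outside any User-agent group:

```
Sitemap: https://yoursite.com/sitemap.xml
Sitemap: https://yoursite.com/news-sitemap.xml
```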

What is Crawl-delay?

Crawl-delay is a directive that tells crawlers to wait a specified number of seconds between requests. It is respected by some crawlers (Bing, Yandex, Baidu) but ignored by Googlebot. Google recommends using the crawl rate settings in Google Search Console instead. Use Crawl-delay only if your server has limited resources.
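For example, to ask Bing's crawler to pause ten seconds between requests (the value here is purely illustrative):

```
User-agent: Bingbot
Crawl-delay: 10
```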

Is this robots.txt generator free?

Yes, completely free with no limits. No registration required. Everything runs in your browser — your robots.txt content is never sent to any server. You can also use the tester tab to validate existing robots.txt files and test URL access rules.