How to Generate a robots.txt File

Create a properly formatted robots.txt file for your website with our free Robots.txt Generator. Control how search engines crawl your site.

Steps

1

Add your sitemap URL

Enter the full URL of your XML sitemap (e.g., https://example.com/sitemap.xml). Including the Sitemap directive in robots.txt ensures search engine crawlers discover it even if it has not been submitted via Search Console.
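A minimal robots.txt that allows all crawling and declares the sitemap (the domain here is a placeholder) might look like:

```text
User-agent: *
Disallow:

Sitemap: https://example.com/sitemap.xml
```

The Sitemap line is independent of user-agent groups and can appear anywhere in the file.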

2

Define rules for Googlebot

Add Allow and Disallow rules for Googlebot specifically. Use Disallow: /admin/ to block the admin panel, Disallow: /private/ for private sections, and Disallow: /*? to block all URLs containing query strings (useful for preventing crawling of search results pages). Note that paths are matched from the start of the URL path, so a pattern like ?* on its own would never match.
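Combined, a Googlebot group with these rules (the paths are illustrative) could look like:

```text
User-agent: Googlebot
Disallow: /admin/
Disallow: /private/
Disallow: /*?
```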

3

Add rules for other user agents

Add separate sections for other major crawlers: Bingbot, DuckDuckBot, Yandex, and AhrefsBot. You can also add rules for AI training crawlers like GPTBot and Google-Extended if you want to block them from training AI models on your content.
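Each crawler gets its own User-agent group; a blanket Disallow: / in a group blocks that crawler entirely. For example (which bots you block is a policy choice, not a recommendation):

```text
User-agent: Bingbot
Disallow: /admin/

User-agent: GPTBot
Disallow: /

User-agent: Google-Extended
Disallow: /
```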

4

Set crawl delay (optional)

Add a Crawl-delay directive to slow down aggressive crawlers if your server cannot handle the default crawl rate. Note that Googlebot ignores Crawl-delay; Google adjusts Googlebot's crawl rate automatically based on how your server responds, slowing down when it sees errors or slow responses.
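Crawlers that honour the directive, such as Bingbot, read the value as a number of seconds to wait between requests. A sketch:

```text
User-agent: Bingbot
Crawl-delay: 10
```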

5

Test and deploy

Test your robots.txt rules using Search Console's robots.txt report (which replaced the standalone robots.txt Tester) before deploying. Upload the file to the root of your domain (https://yourdomain.com/robots.txt). It must be at the exact root path to be recognised.
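You can also sanity-check rules locally before uploading. A minimal sketch using Python's standard-library robots.txt parser (the rules and URLs are illustrative):

```python
from urllib.robotparser import RobotFileParser

# Parse rules from a string instead of fetching them over HTTP,
# so the file can be checked before it is deployed.
rules = """
User-agent: *
Disallow: /admin/
Disallow: /private/
"""

parser = RobotFileParser()
parser.parse(rules.splitlines())

# A blocked path and an allowed path under the wildcard group.
print(parser.can_fetch("*", "https://example.com/admin/settings"))  # False
print(parser.can_fetch("*", "https://example.com/blog/post"))       # True
```

Note that `urllib.robotparser` follows the original exclusion standard and does not support every extension Google recognises (such as `$` anchors), so treat it as a first pass, not a full Googlebot simulation.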

robots.txt Syntax and Common Patterns

A robots.txt file consists of groups of directives. Each group starts with a User-agent line specifying which crawler the rules apply to (* means all crawlers), followed by Allow and Disallow directives with URL paths. Paths are prefix-matched: Disallow: /admin/ blocks all URLs starting with /admin/, Disallow: / blocks everything, and Disallow: with an empty value allows everything. The Allow directive overrides Disallow for more specific paths: you can Disallow: /api/ while allowing Allow: /api/public/.

Common sections to disallow include /admin/, /login/, /wp-admin/ (WordPress), /private/, /checkout/, /cart/, /*.pdf$ (to save crawl budget on PDFs), and /search? (to block internal search results pages that create duplicate content).
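Putting those rules together, a group like the following (the paths are illustrative) blocks /api/ while carving out /api/public/:

```text
User-agent: *
Disallow: /api/
Allow: /api/public/
Disallow: /*.pdf$
Disallow: /search?
```

Crawlers that support both directives apply the most specific matching rule, so /api/public/docs remains crawlable while /api/internal/ stays blocked.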

Crawl Budget and Why robots.txt Matters for Large Sites

Crawl budget is the number of pages Googlebot crawls on your site within a given time period. For small sites (under a few thousand pages), crawl budget is rarely a concern — Google crawls all pages promptly. For large sites with hundreds of thousands of URLs, managing crawl budget becomes important: you want Google spending its crawl time on your most valuable pages rather than on thin parameter pages, duplicate content, pagination, or filtered views. Proper robots.txt configuration, combined with a clean XML sitemap, ensures crawlers prioritise your best content. Signs of crawl budget waste include duplicate pages from URL parameters (sort, filter, tracking parameters), session ID URLs, infinite scroll or calendar navigation, and search results pages.
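The parameter-driven waste described above is typically addressed with wildcard patterns. A sketch (the parameter names sort, filter, and sessionid are illustrative; substitute your site's own):

```text
User-agent: *
Disallow: /*?sort=
Disallow: /*&sort=
Disallow: /*?filter=
Disallow: /*?sessionid=
```

Blocking both the ?param= and &param= forms catches the parameter whether it appears first or later in the query string.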
