Robots.txt Generator

Easily create a robots.txt file to control search engine and AI crawler access to your website.

    What is Robots.txt?

    robots.txt is a plain-text file placed at the root of your website that tells search engine crawlers which pages they may or may not access. It must live at the root of the host (e.g. https://example.com/robots.txt) and be encoded in UTF-8.

    Key Directives

    User-agent: Specifies the crawler the rule applies to. * means all crawlers.
    Allow: Permits crawling of the specified path. When Allow and Disallow both match a URL, the more specific (longer) rule wins, and ties are resolved in favor of Allow.
    Disallow: Blocks crawling of the specified path.
    Sitemap: Tells crawlers where your sitemap is located.
    Crawl-delay: Sets seconds between requests (ignored by Google; supported by Bing, Yandex, etc.).
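
    A quick way to sanity-check directives like these is Python's standard urllib.robotparser, which parses a robots.txt body and answers allow/block queries per user agent. The rules below are purely illustrative:

```python
from urllib import robotparser

# Illustrative rules: block /admin/ for everyone, block BadBot entirely.
rp = robotparser.RobotFileParser()
rp.parse("""\
User-agent: *
Disallow: /admin/

User-agent: BadBot
Disallow: /
""".splitlines())

print(rp.can_fetch("AnyBot", "https://example.com/admin/login"))  # False
print(rp.can_fetch("AnyBot", "https://example.com/blog/post"))    # True
print(rp.can_fetch("BadBot", "https://example.com/blog/post"))    # False
```

    Note that the standard-library parser matches paths by simple prefix; it does not implement the * and $ wildcard extensions described further below.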

    Real-world patterns

    These are configurations commonly used on production sites. Before copying, adjust the paths to match your own site structure.

    Personal blog or basic site

    User-agent: *
    Allow: /
    
    Sitemap: https://example.com/sitemap.xml

    This is sufficient for most personal blogs and small sites. Simply declaring a Sitemap URL improves how quickly your pages are discovered.

    WordPress standard setup

    User-agent: *
    Allow: /
    Allow: /wp-admin/admin-ajax.php
    Disallow: /wp-admin/
    Disallow: /wp-includes/
    Disallow: /feed/
    Disallow: /trackback/
    Disallow: /?s=
    
    Sitemap: https://example.com/sitemap_index.xml

    admin-ajax.php needs to be allowed because it is the public endpoint called by the front end. /?s= produces internal search result pages that can trigger duplicate-content issues, so blocking it is recommended.
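
    The interplay of Allow and Disallow above relies on longest-match precedence. Here is a minimal sketch of that resolution logic (is_allowed is a hypothetical helper for illustration, not how any particular crawler is implemented):

```python
# Longest-match precedence: collect every rule whose path is a prefix of
# the URL path; the longest match wins, and Allow beats Disallow on a tie.
RULES = [
    ("allow", "/"),
    ("allow", "/wp-admin/admin-ajax.php"),
    ("disallow", "/wp-admin/"),
]

def is_allowed(path, rules=RULES):
    matches = [(len(p), kind == "allow", kind) for kind, p in rules
               if path.startswith(p)]
    if not matches:
        return True  # no matching rule means crawling is allowed
    matches.sort(reverse=True)  # longest path first; Allow first on ties
    return matches[0][2] == "allow"

print(is_allowed("/wp-admin/admin-ajax.php"))  # True: the longer Allow wins
print(is_allowed("/wp-admin/options.php"))     # False: /wp-admin/ applies
print(is_allowed("/blog/hello-world/"))        # True
```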

    Block AI training crawlers, allow search engines

    User-agent: *
    Allow: /
    
    User-agent: GPTBot
    Disallow: /
    
    User-agent: ClaudeBot
    Disallow: /
    
    User-agent: Google-Extended
    Disallow: /
    
    User-agent: CCBot
    Disallow: /
    
    Sitemap: https://example.com/sitemap.xml

    Regular search crawlers (Googlebot, Bingbot, Naver's Yeti) retain full access while generative-AI training crawlers are blocked. This became practical around 2023-2024, as major AI companies began publishing distinct User-Agent strings for training crawlers, separate from search and real-time browsing.
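
    You can verify a split configuration like this before deploying, again with urllib.robotparser (abbreviated here to two of the blocked agents):

```python
from urllib import robotparser

# Abbreviated version of the configuration above: search engines keep
# full access while two AI training crawlers are blocked.
ROBOTS = """\
User-agent: *
Allow: /

User-agent: GPTBot
Disallow: /

User-agent: ClaudeBot
Disallow: /
"""

rp = robotparser.RobotFileParser()
rp.parse(ROBOTS.splitlines())

print(rp.can_fetch("Googlebot", "https://example.com/post/1"))  # True
print(rp.can_fetch("GPTBot", "https://example.com/post/1"))     # False
print(rp.can_fetch("ClaudeBot", "https://example.com/post/1"))  # False
```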

    Staging or test server - full block

    User-agent: *
    Disallow: /

    One line is all it takes to keep well-behaved crawlers off a pre-production server, so it does not compete with the live site as duplicate content. For stronger protection, combine it with HTTP authentication. Just remember to swap it out for the allow-all version before going live.

    What robots.txt can and cannot do

    robots.txt can steer well-behaved crawlers away from specified paths, reduce crawl load, and advertise your sitemap. It cannot enforce anything: compliance is voluntary, so malicious bots may ignore it, and a blocked URL can still be indexed if other sites link to it. It is also not access control - the file is publicly readable, so never use it to hide sensitive URLs; protect those with authentication or a noindex tag instead.

    Pre-deployment checklist

    1. Reachability - Open https://yourdomain/robots.txt in a browser and confirm it returns HTTP 200 with the correct content. Each subdomain needs its own file.
    2. Google Search Console robots.txt report - Search Console shows the parsed state, any syntax errors, the cached version, and the date it was last fetched.
    3. Bing Webmaster Tools robots.txt Tester - Instantly checks whether a specific URL is allowed or blocked for Bingbot.
    4. Case sensitivity and trailing slashes - URLs are case-sensitive. /Admin/ and /admin/ are different paths. The presence or absence of a trailing slash also matters.
    5. Wildcards - * matches any sequence of characters; $ anchors the end of a string. For example, Disallow: /*.pdf$ blocks all PDF files.
    6. Comments - Lines starting with # are ignored by crawlers. Annotate your rules so future you can debug them without guessing.
    7. Propagation time - Crawlers typically cache robots.txt for up to 24 hours. For urgent changes, request a manual re-crawl via Search Console.
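
    Wildcard rules from step 5 are easy to mis-write; translating a pattern into a regular expression makes the matching behavior explicit. This is a sketch of those semantics (pattern_to_regex is a hypothetical helper for illustration):

```python
import re

def pattern_to_regex(pattern):
    # '*' matches any character sequence; a trailing '$' anchors the end
    # of the URL. Without '$', robots.txt rules match by prefix, which
    # re.match already provides.
    anchored = pattern.endswith("$")
    body = pattern[:-1] if anchored else pattern
    regex = ".*".join(re.escape(part) for part in body.split("*"))
    return re.compile("^" + regex + ("$" if anchored else ""))

pdf_rule = pattern_to_regex("/*.pdf$")
print(bool(pdf_rule.match("/files/report.pdf")))      # True
print(bool(pdf_rule.match("/files/report.pdf?v=2")))  # False: '$' anchors
print(bool(pattern_to_regex("/admin").match("/admin/login")))  # True: prefix
```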

    Common mistakes

    Blocking CSS and JavaScript files: Google renders pages, and blocked assets can hurt how your pages are evaluated.
    Using Disallow to deindex an already-indexed page: if the page cannot be crawled, a noindex tag on it will never be seen.
    Relying on a Noindex: directive inside robots.txt: Google stopped honoring it in 2019.
    Typos such as Dissallow or a missing colon: invalid lines are silently ignored, leaving the path unprotected.
    Blocking the sitemap file itself, so crawlers can no longer fetch it.

    To strengthen your SEO setup alongside robots.txt, add structured data with the JSON-LD Generator and use the Schema.org Types Reference to choose markup that matches the page. You can also make your site recognizable with the Favicon Generator.

    FAQ

    Is robots.txt required?

    No. Without one, crawlers assume they can access all pages. However, it helps prevent indexing of admin pages, internal search results, and other unwanted content.

    Can robots.txt completely prevent indexing?

    No. robots.txt is a recommendation, not enforcement. URLs can still be indexed via external links. For full blocking, use <meta name="robots" content="noindex"> on the page - and do not block that page in robots.txt, or crawlers will never see the tag.

    Can I block AI crawlers like GPTBot or ClaudeBot?

    Yes. Major AI crawlers (GPTBot, ChatGPT-User, ClaudeBot, Google-Extended, CCBot) respect robots.txt. Set their User-agent with Disallow: / to block training data collection.

    Do all crawlers support Crawl-delay?

    Google ignores Crawl-delay and manages its crawl rate automatically; Bing, Yandex, and Naver (Yeti) support the directive.

    Where should I place the robots.txt file?

    It must be at the domain root, e.g. https://example.com/robots.txt. Placing it in a subdirectory (e.g. /blog/robots.txt) will not be recognized by crawlers.