// free tool · 12 AI crawlers · verdicts you can verify
AI Crawler Checker
Plenty of sites blocked AI bots in 2023 with a copy-pasted robots.txt snippet and forgot about it — then wondered in 2026 why ChatGPT never cites them. Enter a domain; get the per-bot verdict, the exact rule that caused it, and the fix block if you want one.
Quick answer
To check whether AI bots can read a site, fetch its robots.txt and test each crawler's User-agent against the rules — most specific group first, longest path rule wins. This tool does that for 12 AI crawlers including GPTBot, ClaudeBot, PerplexityBot, and Google-Extended, and shows the matched rule per verdict.
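The check described above can be sketched in a few dozen lines. This is a minimal illustration, not the checker's actual code: the names `Group`, `parse_groups`, `select_rules`, and `verdict` are hypothetical, user-agent tokens are matched by exact name only, and path rules are treated as plain prefixes (no `*` or `$` wildcard handling).

```python
from dataclasses import dataclass, field


@dataclass
class Group:
    agents: list = field(default_factory=list)  # lowercased User-agent tokens
    rules: list = field(default_factory=list)   # ("allow" | "disallow", path) pairs


def parse_groups(robots_txt: str) -> list:
    """Split robots.txt text into User-agent groups with their rules."""
    groups, current, in_rules = [], None, False
    for raw in robots_txt.splitlines():
        line = raw.split("#", 1)[0].strip()  # drop comments and whitespace
        if ":" not in line:
            continue
        key, value = (part.strip() for part in line.split(":", 1))
        key = key.lower()
        if key == "user-agent":
            # Consecutive User-agent lines share a group; one that follows
            # a rule line starts a new group.
            if current is None or in_rules:
                current, in_rules = Group(), False
                groups.append(current)
            current.agents.append(value.lower())
        elif key in ("allow", "disallow") and current is not None:
            in_rules = True
            if value:  # an empty Disallow restricts nothing
                current.rules.append((key, value))
    return groups


def select_rules(groups: list, bot: str) -> list:
    """Use groups that name the bot; fall back to '*' only if none do."""
    bot = bot.lower()
    if any(bot in g.agents for g in groups):
        return [r for g in groups if bot in g.agents for r in g.rules]
    return [r for g in groups if "*" in g.agents for r in g.rules]


def verdict(robots_txt: str, bot: str, path: str = "/"):
    """Return (allowed, matched_rule). Longest rule wins; Allow wins ties."""
    best = None
    for kind, rule_path in select_rules(parse_groups(robots_txt), bot):
        if path.startswith(rule_path):
            candidate = (len(rule_path), kind == "allow", kind, rule_path)
            if best is None or candidate[:2] > best[:2]:
                best = candidate
    if best is None:
        return True, None  # no rule matched: allowed by default
    return best[1], f"{best[2].capitalize()}: {best[3]}"


SAMPLE = """
User-agent: *
Disallow: /

User-agent: GPTBot
Allow: /
Disallow: /private/

User-agent: CCBot
Disallow:
"""

for name in ("GPTBot", "ClaudeBot", "CCBot"):
    print(name, verdict(SAMPLE, name))
```

In the sample file, GPTBot has its own group and is allowed at `/`, ClaudeBot is not named and inherits the `*` block, and CCBot's empty `Disallow:` leaves it unrestricted. A production matcher would also need wildcard support and product-token matching, both left out here for brevity.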
Checker
Checks the homepage path (/) against the site's robots.txt. Rate limit: 5 checks per minute. Bot list current as of 2026-06-10.
// allow or block
The two-list strategy most sites should run
The blanket choices are both wrong for most businesses. Allow everything and your content trains models that never send you a visitor. Block everything and you vanish from the AI answers your customers increasingly read instead of clicking. The split that works: treat search-and-citation bots (OAI-SearchBot, ChatGPT-User, PerplexityBot, Perplexity-User, Claude-User) as marketing channels and allow them; treat pure training bots (GPTBot, CCBot, Bytespider, meta-externalagent) as a licensing question and decide based on whether exposure or protection is worth more to your business.
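As a rough sketch of what the two-list split looks like in a robots.txt, the snippet below renders one group per list. The bot names are the ones listed in the paragraph above. The helper `build_robots_txt` is hypothetical, and blocking the training list is only one possible outcome of the licensing decision, shown here for illustration.

```python
SEARCH_BOTS = ["OAI-SearchBot", "ChatGPT-User", "PerplexityBot",
               "Perplexity-User", "Claude-User"]
TRAINING_BOTS = ["GPTBot", "CCBot", "Bytespider", "meta-externalagent"]


def build_robots_txt(allow: list, block: list) -> str:
    """Render one group per list: 'Allow: /' for allow, 'Disallow: /' for block."""
    lines = []
    if allow:
        lines += [f"User-agent: {bot}" for bot in allow]
        lines += ["Allow: /", ""]
    if block:
        lines += [f"User-agent: {bot}" for bot in block]
        lines += ["Disallow: /", ""]
    return "\n".join(lines)


print(build_robots_txt(SEARCH_BOTS, TRAINING_BOTS))
```

Bots that appear in neither list fall under whatever `User-agent: *` group the file already has, or are unrestricted if it has none. To allow everything instead, pass an empty block list.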
For a shop like ours the answer was easy — we allow all twelve, because an AI assistant recommending our toolkits is free distribution, and our content's value is in being found, not in being scarce. A newsroom or a course business selling the content itself can rationally choose the opposite. What is never rational is not knowing your current state: run the check, read the matched rules, and make the robots.txt say what you actually mean. Then give the bots something worth reading — starting with an llms.txt.
// common questions
AI crawlers and robots.txt — common questions
- Which AI crawlers should I care about in 2026?
- Twelve cover the landscape: GPTBot, OAI-SearchBot, and ChatGPT-User (OpenAI), ClaudeBot and Claude-User (Anthropic), PerplexityBot and Perplexity-User, Google-Extended (Gemini training), CCBot (Common Crawl), Bytespider (ByteDance), meta-externalagent (Meta), and Applebot-Extended. This checker tests all twelve against your robots.txt.
- What is the difference between training bots and search bots?
- Training bots (GPTBot, ClaudeBot, CCBot, Bytespider) collect text to train future models. Blocking them keeps your content out of training data but does not hide you from AI search. Search and user-fetch bots (OAI-SearchBot, ChatGPT-User, PerplexityBot) retrieve pages to answer live questions with citations. Blocking these removes you from AI answers and costs you the traffic those citations send. That is why the split matters: many sites allow search bots and block training bots.
- Does blocking Google-Extended hurt my Google rankings?
- No. Google-Extended only controls whether your content trains Gemini models — Google has documented that it is not a Search ranking signal and does not affect inclusion in AI Overviews, which use Googlebot. You can block Google-Extended and lose nothing in classic search.
- If I have no robots.txt, are AI bots allowed?
- Yes. No robots.txt — or a 404 at that path — means every compliant crawler assumes full access. That is the default state of most small sites, and for sellers who want AI visibility it is actually the correct one. The checker reports this case explicitly instead of treating it as an error.
- Do AI companies actually respect robots.txt?
- The major ones documented here do: OpenAI, Anthropic, Google, and Apple publish their user agents and honor the protocol. Reports of non-compliance have centered on smaller or undisclosed scrapers. Keep in mind that robots.txt is a published preference, not an enforcement mechanism; for hard guarantees you need WAF or CDN-level bot rules.
- How does this checker decide allowed vs blocked?
- It fetches the site’s robots.txt and applies the Robots Exclusion Protocol the way Google documents it: the most specific matching User-agent group wins, then the longest matching path rule, with Allow beating Disallow on ties. The matched rule is shown per bot so you can verify every verdict against the file yourself.
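The missing-file case from the questions above can be sketched as a mapping from the HTTP status of the robots.txt fetch to a default policy. The function name `robots_fetch_policy` and its return labels are hypothetical. The 4xx branch follows the answer above; the redirect and 5xx branches are assumptions taken from RFC 9309, not behavior this page states for the checker.

```python
def robots_fetch_policy(status: int) -> str:
    """Map the HTTP status of a robots.txt fetch to how a crawler proceeds."""
    if 200 <= status < 300:
        return "parse"         # file exists: apply its rules
    if 300 <= status < 400:
        return "follow"        # redirect: fetch the target and classify again
    if 400 <= status < 500:
        return "allow_all"     # no usable file (e.g. 404): nothing is restricted
    return "disallow_all"      # 5xx or anything else: assumption from RFC 9309


for code in (200, 301, 404, 503):
    print(code, robots_fetch_policy(code))
```

The practical point is that a 404 and a server error are not equivalent: a missing file leaves compliant crawlers unrestricted, while an erroring one can cause them to back off.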
// access is the prerequisite, not the strategy
Crawlable is step zero. Citable is the goal.
The AI Search Visibility Toolkit audits whether AI engines actually cite you once they can read you — 11 skills, citability scoring included.
See the toolkit →