Question 1

Which AI crawlers should I care about in 2026?

Accepted Answer

Twelve user agents cover the landscape: GPTBot, OAI-SearchBot, and ChatGPT-User (OpenAI), ClaudeBot and Claude-User (Anthropic), PerplexityBot and Perplexity-User (Perplexity), Google-Extended (Google, Gemini training), CCBot (Common Crawl), Bytespider (ByteDance), meta-externalagent (Meta), and Applebot-Extended (Apple). This checker tests all twelve against your robots.txt.

Question 2

What is the difference between training bots and search bots?

Accepted Answer

Training bots (GPTBot, ClaudeBot, CCBot, Bytespider) collect text to train future models. Blocking them keeps your content out of training data but does not hide you from AI search. Search and user-fetch bots (OAI-SearchBot, ChatGPT-User, PerplexityBot) retrieve pages to answer live questions with citations. Blocking these removes you from AI answers and forfeits the traffic those citations send. That split is why many sites allow search bots and block training bots.
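The split described above can be sketched with Python's standard-library robots.txt parser. The robots.txt content and the example.com URL below are hypothetical, chosen only for illustration. Note that `urllib.robotparser` applies the first matching rule rather than the longest one, which makes no difference for a file this simple.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical robots.txt: training bots blocked, search and user-fetch bots allowed.
ROBOTS_TXT = """\
User-agent: GPTBot
User-agent: ClaudeBot
User-agent: CCBot
User-agent: Bytespider
Disallow: /

User-agent: OAI-SearchBot
User-agent: ChatGPT-User
User-agent: PerplexityBot
Allow: /
"""

TRAINING_BOTS = ["GPTBot", "ClaudeBot", "CCBot", "Bytespider"]
SEARCH_BOTS = ["OAI-SearchBot", "ChatGPT-User", "PerplexityBot"]

parser = RobotFileParser()
parser.parse(ROBOTS_TXT.splitlines())

# True means the bot may fetch the (hypothetical) URL, False means it is blocked.
verdicts = {
    bot: parser.can_fetch(bot, "https://example.com/products/")
    for bot in TRAINING_BOTS + SEARCH_BOTS
}

for bot, allowed in verdicts.items():
    print(f"{bot:15} {'allowed' if allowed else 'blocked'}")
```

Grouping several User-agent lines above one rule keeps the two lists easy to audit: one block to deny, one block to allow.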

Question 3

Does blocking Google-Extended hurt my Google rankings?

Accepted Answer

No. Google-Extended only controls whether your content is used to train Gemini models and to ground their answers. Google has documented that it is not a Search ranking signal and does not affect inclusion in Search, and that includes AI Overviews, which rely on Googlebot. You can block Google-Extended and lose nothing in classic search.

Question 4

If I have no robots.txt, are AI bots allowed?

Accepted Answer

Yes. No robots.txt, or a 404 at that path, means every compliant crawler assumes full access. That is the default state of most small sites, and for sellers who want AI visibility it is the correct one. The checker reports this case explicitly instead of treating it as an error.
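As a rough sketch of that rule, a checker might map the HTTP status of the robots.txt request to an access policy before parsing anything. The names `Access` and `access_from_status` are hypothetical, and this is not necessarily how this checker does it. The 5xx branch is not covered in the answer above; it follows RFC 9309, which tells crawlers to assume a complete disallow when the file is unreachable because of server errors.

```python
from enum import Enum


class Access(Enum):
    PARSE_RULES = "parse the file and apply its rules"
    ALLOW_ALL = "no usable robots.txt, full access assumed"
    DISALLOW_ALL = "robots.txt unreachable, assume nothing may be fetched"


def access_from_status(status: int) -> Access:
    """Map the final HTTP status of /robots.txt (redirects already followed)."""
    if 200 <= status < 300:
        return Access.PARSE_RULES
    if 400 <= status < 500:
        # Missing file, including a plain 404: compliant crawlers assume full access.
        return Access.ALLOW_ALL
    if 500 <= status < 600:
        # Assumption taken from RFC 9309, not from the answer above.
        return Access.DISALLOW_ALL
    raise ValueError(f"unexpected status for robots.txt: {status}")


print(access_from_status(404).name)
```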

Question 5

Do AI companies actually respect robots.txt?

Accepted Answer

The major ones documented here do: OpenAI, Anthropic, Google, and Apple publish their user agents and honor the protocol. Reports of non-compliance have centered on smaller or undisclosed scrapers. That points to the real limitation: robots.txt is a published preference, not an enforcement mechanism. For hard guarantees you need WAF or CDN-level bot rules.
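To illustrate the difference between a preference and enforcement, here is a minimal application-level stand-in for a WAF rule, written as WSGI middleware. The function name `block_ai_crawlers` and the token list are hypothetical. A real WAF or CDN rule would sit in front of the application, and because the User-Agent header can be spoofed, this sketch stops only crawlers that identify themselves honestly.

```python
# Hypothetical deny list; adjust to the bots you actually want to refuse.
BLOCKED_TOKENS = ("GPTBot", "ClaudeBot", "CCBot", "Bytespider")


def block_ai_crawlers(app, blocked=BLOCKED_TOKENS):
    """Wrap a WSGI app and answer 403 when the User-Agent names a blocked bot."""
    lowered = tuple(token.lower() for token in blocked)

    def middleware(environ, start_response):
        user_agent = environ.get("HTTP_USER_AGENT", "").lower()
        if any(token in user_agent for token in lowered):
            start_response("403 Forbidden", [("Content-Type", "text/plain")])
            return [b"Forbidden\n"]
        # Everything else passes through to the wrapped application untouched.
        return app(environ, start_response)

    return middleware


def demo_app(environ, start_response):
    """Tiny placeholder application used only to exercise the middleware."""
    start_response("200 OK", [("Content-Type", "text/plain")])
    return [b"ok\n"]


guarded_app = block_ai_crawlers(demo_app)
```

Unlike a Disallow line, the 403 is returned whether or not the crawler chooses to read robots.txt.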

Question 6

How does this checker decide allowed vs blocked?

Accepted Answer

It fetches the site’s robots.txt and applies the Robots Exclusion Protocol the way Google documents it: the most specific matching User-agent group wins, then the longest matching path rule, with Allow beating Disallow on ties. The matched rule is shown per bot so you can verify every verdict against the file yourself.
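The precedence order described above can be sketched in a few dozen lines. This is an illustrative approximation, not the checker's actual code: the names `parse_groups`, `pattern_matches`, and `verdict` are hypothetical, and "most specific group" is simplified to an exact, case-insensitive token match with `*` as the fallback.

```python
import re
from typing import NamedTuple, Optional


class Rule(NamedTuple):
    allow: bool
    pattern: str


def parse_groups(robots_txt: str) -> dict[str, list[Rule]]:
    """Map each lowercased User-agent token to the rules of its group(s)."""
    groups: dict[str, list[Rule]] = {}
    current_agents: list[str] = []
    reading_agents = False  # True while consecutive User-agent lines stack up
    for raw in robots_txt.splitlines():
        line = raw.split("#", 1)[0].strip()  # drop comments
        if ":" not in line:
            continue
        field, value = (part.strip() for part in line.split(":", 1))
        field = field.lower()
        if field == "user-agent":
            if not reading_agents:
                current_agents = []  # a new group starts here
            current_agents.append(value.lower())
            groups.setdefault(value.lower(), [])
            reading_agents = True
        elif field in ("allow", "disallow"):
            reading_agents = False
            if value:  # an empty Disallow restricts nothing
                for agent in current_agents:
                    groups[agent].append(Rule(field == "allow", value))
    return groups


def pattern_matches(pattern: str, path: str) -> bool:
    """Prefix match with support for '*' wildcards and a trailing '$' anchor."""
    anchored = pattern.endswith("$")
    body = pattern[:-1] if anchored else pattern
    regex = ".*".join(re.escape(piece) for piece in body.split("*"))
    if anchored:
        regex += "$"
    return re.match(regex, path) is not None


def verdict(robots_txt: str, bot: str, path: str) -> tuple[bool, Optional[Rule]]:
    """Return (allowed, matched_rule); matched_rule is None when nothing matches."""
    groups = parse_groups(robots_txt)
    # 1. The bot's own group wins over the '*' group, even if it has no rules.
    rules = groups.get(bot.lower(), groups.get("*", []))
    matching = [rule for rule in rules if pattern_matches(rule.pattern, path)]
    if not matching:
        return True, None
    # 2. Longest pattern wins; 3. on equal length, Allow (True) beats Disallow.
    best = max(matching, key=lambda rule: (len(rule.pattern), rule.allow))
    return best.allow, best


# Hypothetical robots.txt used only to exercise the three precedence steps.
SAMPLE = """\
User-agent: *
Disallow: /

User-agent: GPTBot
Disallow: /drafts/
Allow: /drafts/public/
"""

for bot, path in [("GPTBot", "/drafts/public/post"), ("GPTBot", "/drafts/secret"),
                  ("GPTBot", "/about"), ("CCBot", "/about")]:
    print(bot, path, verdict(SAMPLE, bot, path))
```

Returning the matched rule alongside the verdict mirrors the per-bot display described above: a reader can trace each result back to one line of the file.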

AI Crawler Checker

This tool is a chapter of Found by AI