Google's crawlers and fetchers fall into three main types:
1. Common Crawlers #
- Standard bots like Googlebot for Search, Google Images, etc.
- Always respect robots.txt.
- Automatic crawling.
2. Special-Case Crawlers #
- Used for specific products with agreements in place.
- Example: AdsBot (for checking ad landing pages).
- May bypass User-agent: * rules with permission.
3. User-Triggered Fetchers #
- Triggered by a user action (not automatic).
- Example: Google Site Verifier (checks ownership).
- Fetch happens on demand.
2️⃣ Technical Properties #
Distributed Crawling #
- Google crawls from many IPs worldwide (mostly US).
- May crawl from other countries if US IPs are blocked.
Protocols Supported #
- HTTP/1.1 (default)
- HTTP/2 (faster, saves resources; opt-out with HTTP 421)
- FTP / FTPS (rare use)
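The HTTP/2 opt-out works by answering Google's HTTP/2 crawl requests with status 421 (Misdirected Request), after which Googlebot falls back to HTTP/1.1. A minimal sketch of that decision; the handler shape is an assumption for illustration, not a real API:

```python
def crawl_protocol_status(negotiated_protocol: str, allow_h2: bool = True) -> int:
    """Pick the status code for a crawl request based on the negotiated protocol.

    Responding 421 (Misdirected Request) to HTTP/2 requests tells Googlebot
    to retry this site over HTTP/1.1. (Hypothetical handler shape.)
    """
    if negotiated_protocol == "HTTP/2" and not allow_h2:
        return 421  # opt out of HTTP/2 crawling
    return 200
```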
Compression Supported #
- gzip
- deflate
- Brotli (br)
(Advertised by the crawler in the Accept-Encoding request header.)
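On the server side, the response encoding is picked from whatever the crawler offers in Accept-Encoding. A sketch of that negotiation; the function and the Brotli-first preference order are illustrative assumptions:

```python
def pick_encoding(accept_encoding, preference=("br", "gzip", "deflate")):
    """Choose a Content-Encoding from an Accept-Encoding request header."""
    # Split "gzip, deflate;q=0.5, br" into the bare encoding names offered.
    offered = {token.split(";")[0].strip().lower()
               for token in accept_encoding.split(",") if token.strip()}
    # Return the first supported encoding in our preference order.
    for enc in preference:
        if enc in offered:
            return enc
    return None  # fall back to an uncompressed response

print(pick_encoding("gzip, deflate, br"))  # -> br
```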
3️⃣ Crawl Rate & Host Load #
- Goal: Crawl maximum pages without overloading servers.
- If overloaded → temporarily return 503/429 to signal Googlebot to slow down (the legacy Search Console crawl-rate limiter has been retired).
- Incorrect HTTP status codes can affect crawl behavior.
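One concrete way to handle overload: answer with 503 plus a Retry-After header while the server is under pressure; Google treats 503/429 as a signal to back off and retry later. The load metric and handler shape below are hypothetical:

```python
def overload_response(server_load, threshold=0.9, retry_after_s=120):
    """Return (status, headers) for a crawl request under load.

    `server_load` is a hypothetical 0..1 utilization metric; when it
    exceeds `threshold`, a temporary 503 asks the crawler to back off.
    """
    if server_load > threshold:
        return 503, {"Retry-After": str(retry_after_s)}
    return 200, {}

print(overload_response(0.95))  # -> (503, {'Retry-After': '120'})
```

Serve 503 only transiently; long-running 503s can cause pages to drop out of the index.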
4️⃣ HTTP Caching Support #
Google crawlers support caching using:
- ETag & If-None-Match (preferred)
- Last-Modified & If-Modified-Since
💡 Tip:
- Use ETag (no date format issues).
- Correct Last-Modified format: Fri, 04 Sep 1998 19:15:56 GMT
- Optionally set Cache-Control: max-age=<seconds> to hint when to recrawl.
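Put together, the caching tip looks like a conditional-GET handler: derive an ETag from the body and answer a matching If-None-Match with an empty 304. A sketch under an assumed handler shape:

```python
import hashlib

def conditional_response(body, if_none_match=None):
    """Return (status, headers, body) for an ETag-based conditional GET.

    A 304 with no body lets Google reuse its cached copy instead of
    re-downloading unchanged content.
    """
    etag = '"%s"' % hashlib.sha256(body).hexdigest()[:16]
    if if_none_match == etag:
        return 304, {"ETag": etag}, b""
    return 200, {"ETag": etag, "Cache-Control": "max-age=3600"}, body
```

On the first crawl the handler serves 200 with the ETag; on a recrawl the crawler echoes it back in If-None-Match and gets a cheap 304.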
5️⃣ Key Best Practices #
✅ Use a correctly configured robots.txt to control crawling.
✅ Implement ETag or Last-Modified for efficient recrawls.
✅ Ensure server handles HTTP/2 (unless opting out).
✅ Compress responses (gzip, br) to save resources.
✅ Monitor crawl activity in Search Console → Crawl Stats.
📌 Google’s Common Crawlers (Reference Table) #
| Crawler Name | User Agent (Example) | Robots.txt Token | Affected Products |
| --- | --- | --- | --- |
| Googlebot Smartphone | Mozilla/5.0 (Linux; Android 6.0.1; Nexus 5X…) Chrome/W.X.Y.Z Mobile Safari/537.36 (compatible; Googlebot/2.1; +http://www.google.com/bot.html) | Googlebot | Google Search (Mobile), Discover, Images, Video, News |
| Googlebot Desktop | Mozilla/5.0 AppleWebKit/537.36 (KHTML, like Gecko; compatible; Googlebot/2.1; +http://www.google.com/bot.html) | Googlebot | Google Search (Desktop), Discover, Images, Video, News |
| Googlebot Image | Googlebot-Image/1.0 | Googlebot-Image | Google Images, Search features with images/logos/favicons |
| Googlebot Video | Googlebot-Video/1.0 | Googlebot-Video | Video features in Google Search, video indexing |
| Googlebot News | Uses Googlebot UA strings | Googlebot-News | Google News, news.google.com, Google News App |
| Google StoreBot | Mozilla/5.0 (X11; Linux x86_64; Storebot-Google/1.0) Chrome/W.X.Y.Z Safari/537.36 | Storebot-Google | Google Shopping (Shopping tab, Shopping surfaces) |
| Google-InspectionTool | Mozilla/5.0 (compatible; Google-InspectionTool/1.0;) | Google-InspectionTool | Search Console tools (URL Inspection, Rich Result Test) |
| GoogleOther | Mozilla/5.0 (Linux; Android 6.0.1; Nexus 5X…) (compatible; GoogleOther) | GoogleOther | Generic fetcher for internal research; not used for Search |
| GoogleOther-Image | GoogleOther-Image/1.0 | GoogleOther-Image | Fetching publicly accessible images (non-Search) |
| GoogleOther-Video | GoogleOther-Video/1.0 | GoogleOther-Video | Fetching publicly accessible videos (non-Search) |
| Google-CloudVertexBot | Contains Google-CloudVertexBot in UA | Google-CloudVertexBot | Vertex AI Agents (site-owner requested crawls) |
| Google-Extended | Uses existing Google UA; token used for permissions | Google-Extended | Controls whether site content can be used to train Gemini models |
Key Notes:
- Chrome/W.X.Y.Z is a placeholder for the Chrome version; match it with a wildcard, not an exact number.
- All Googlebot variants obey robots.txt unless otherwise agreed (special cases).
- Google-Extended does not affect Search rankings; only AI model training permissions.
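The robots.txt tokens in the table are what User-agent lines match against. For instance, Python's stdlib `urllib.robotparser` can sanity-check a policy that blocks Gemini training (Google-Extended) while leaving Search crawling (Googlebot) open; the rules below are a hypothetical example:

```python
from urllib.robotparser import RobotFileParser

# Hypothetical policy: refuse AI-training use, keep Search crawling open.
rules = """\
User-agent: Google-Extended
Disallow: /

User-agent: Googlebot
Allow: /
""".splitlines()

rp = RobotFileParser()
rp.parse(rules)

print(rp.can_fetch("Googlebot", "https://example.com/page"))        # True
print(rp.can_fetch("Google-Extended", "https://example.com/page"))  # False
```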
📌 Google’s Special-Case Crawlers #
| Crawler Name | User Agent (Example) | Robots.txt Token | Notes / Products Affected |
| --- | --- | --- | --- |
| APIs-Google | APIs-Google (+https://developers.google.com/webmasters/APIs-Google.html) | APIs-Google | Push notification delivery via Google APIs (Ignores *) |
| AdsBot Mobile Web | Mozilla/5.0 (… Mobile Safari/537.36) (compatible; AdsBot-Google-Mobile; +http://www.google.com/mobile/adsbot.html) | AdsBot-Google-Mobile | Google Ads ad quality checks for mobile pages (Ignores *) |
| AdsBot | AdsBot-Google (+http://www.google.com/adsbot.html) | AdsBot-Google | Google Ads ad quality checks (Ignores *) |
| AdSense | Mediapartners-Google | Mediapartners-Google | Google AdSense crawler to deliver relevant ads (Ignores *) |
| Google-Safety | Google-Safety | (Ignores robots.txt) | Malware/abuse discovery for links on Google properties |
| (Retired) AdsBot Mobile Web (iPhone) | Mozilla/5.0 (iPhone; CPU iPhone OS…) (compatible; AdsBot-Google-Mobile…) | AdsBot-Google-Mobile | Used for iPhone ad quality checks (retired) |
| (Retired) Duplex on the Web | Mozilla/5.0 (Linux; Android 11; Pixel 2; DuplexWeb-Google/1.0) | DuplexWeb-Google | Supported “Duplex on the Web” service (retired) |
| (Retired) Google Favicon | Mozilla/5.0 (X11; Linux x86_64) … Google Favicon | Googlebot-Image | Favicon fetching (retired; handled by Googlebot-Image) |
| (Retired) Mobile Apps Android | AdsBot-Google-Mobile-Apps | AdsBot-Google-Mobile-Apps | Checked Android app page ad quality (retired) |
| (Retired) Web Light | Mozilla/5.0 (… googleweblight) Chrome/… Mobile Safari/… | googleweblight | Served lightweight pages under slow network (retired) |
Key Points for FSIDM Students:
- Special-case crawlers may ignore robots.txt (unlike common crawlers).
- They operate from different IP ranges (special-crawlers.json) and have rate-limited-proxy-* hostnames.
- Mostly tied to Google Ads, AdSense, APIs, and safety/security checks.
- Retired crawlers are useful to know for log analysis and historical SEO audits.
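For the log analysis mentioned above, a claimed Google crawler IP can be verified with a reverse-DNS lookup plus a forward-confirming lookup: genuine hosts resolve under googlebot.com, google.com, or googleusercontent.com. A sketch; the full check needs network access, and the suffix list reflects Google's published verification domains:

```python
import socket

GOOGLE_HOST_SUFFIXES = (".googlebot.com", ".google.com", ".googleusercontent.com")

def looks_like_google_host(hostname):
    """True if a reverse-DNS hostname matches Google's crawler domains,
    e.g. crawl-66-249-66-1.googlebot.com or rate-limited-proxy-*.google.com."""
    return hostname.lower().endswith(GOOGLE_HOST_SUFFIXES)

def verify_google_crawler(ip):
    """Reverse lookup, then forward-confirm the name back to the same IP."""
    try:
        hostname, _aliases, _ips = socket.gethostbyaddr(ip)
        if not looks_like_google_host(hostname):
            return False
        return ip in socket.gethostbyname_ex(hostname)[2]
    except (socket.herror, socket.gaierror):
        return False
```

The forward-confirmation step matters: anyone can fake a User-Agent string, and even reverse DNS alone can be spoofed by whoever controls the IP's PTR record.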
📌 Google User-Triggered Fetchers #
| Fetcher Name | User Agent (Example) | Purpose / Product |
| --- | --- | --- |
| Feedfetcher | FeedFetcher-Google; (+http://www.google.com/feedfetcher.html) | Crawls RSS/Atom feeds for Google News & PubSubHubbub |
| Google Publisher Center | GoogleProducer; (+https://developers.google.com/search/docs/crawling-indexing/google-producer) | Fetches publisher-supplied feeds for Google News landing pages |
| Google Read Aloud | Mobile: Mozilla/5.0 (Linux; Android 10; K) … (compatible; Google-Read-Aloud; +https://support.google.com/webmasters/answer/1061943) Desktop: Mozilla/5.0 (X11; Linux x86_64) … (compatible; Google-Read-Aloud; +https://support.google.com/webmasters/answer/1061943) (Former: google-speakr) | On user request, fetches and reads webpages aloud using TTS |
| Google Site Verifier | Mozilla/5.0 (compatible; Google-Site-Verification/1.0) | Fetches Search Console verification tokens |
Key Notes for FSIDM Students #
- These fetchers are triggered by a user’s action (not automated bulk crawling).
- They generally ignore robots.txt because the fetch was explicitly requested by a user.
- Operate from user-triggered-fetchers.json IP ranges with hostnames like:
- ***.gae.googleusercontent.com (Google App Engine)
- google-proxy-***.google.com (Google proxy servers)
- Common in server logs during site verification, feed submission, or Google services use.
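When scanning server logs, the two hostname shapes listed above can be matched with a simple pattern. The regex below is an illustrative assumption derived from those shapes, not an official allowlist:

```python
import re

# Match *.gae.googleusercontent.com and google-proxy-*.google.com hostnames.
FETCHER_HOST_RE = re.compile(
    r"^(?:[\w.-]+\.gae\.googleusercontent\.com"
    r"|google-proxy-[\w.-]+\.google\.com)$",
    re.IGNORECASE,
)

def is_user_triggered_fetcher_host(hostname):
    """True if a reverse-DNS hostname fits the user-triggered fetcher shapes."""
    return bool(FETCHER_HOST_RE.match(hostname))
```

For anything security-sensitive, cross-check candidate IPs against the published user-triggered-fetchers.json ranges rather than trusting hostnames alone.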