Web Crawler User Agent

Google Crawler (User Agent) Overview | Google Search Central
“Crawler” is a generic term for any program (such as a robot or spider) that is used to
automatically discover and scan websites by following links from one webpage to another.
Google’s main crawler is called
Googlebot. This table lists information
about the common Google crawlers you may see in your referrer logs, and how to
specify them in robots.txt, the
robots meta tags, and the
X-Robots-Tag HTTP directives.
The following table shows the crawlers used by various products and services at Google:
The user agent token is used in the User-agent: line in robots.txt
to match a crawler type when writing crawl rules for your site. Some crawlers have more than
one token, as shown in the table; you need to match only one crawler token for a rule to
apply. This list is not complete, but covers most of the crawlers you might see on your
website.
The full user agent string is a full description of the crawler, and appears in
the request and your web logs.
Crawlers

APIs-Google
User agent token: APIs-Google
Full user agent string: APIs-Google (+)

AdSense
User agent token: Mediapartners-Google

AdsBot Mobile Web Android. Checks Android web page ad quality.
User agent token: AdsBot-Google-Mobile
Full user agent string: Mozilla/5.0 (Linux; Android 5.0; SM-G920A) AppleWebKit (KHTML, like Gecko) Chrome Mobile Safari (compatible; AdsBot-Google-Mobile; +)

AdsBot Mobile Web. Checks iPhone web page ad quality.
User agent token: AdsBot-Google-Mobile
Full user agent string: Mozilla/5.0 (iPhone; CPU iPhone OS 9_1 like Mac OS X) AppleWebKit/601.1.46 (KHTML, like Gecko) Version/9.0 Mobile/13B143 Safari/601.1 (compatible; AdsBot-Google-Mobile; +)

AdsBot. Checks desktop web page ad quality.
User agent token: AdsBot-Google
Full user agent string: AdsBot-Google (+)

Googlebot Image
User agent tokens: Googlebot-Image, Googlebot
Full user agent string: Googlebot-Image/1.0

Googlebot News
User agent token: Googlebot-News

Googlebot Video
User agent token: Googlebot-Video
Full user agent string: Googlebot-Video/1.0

Googlebot Desktop
User agent token: Googlebot
Full user agent strings:
Mozilla/5.0 (compatible; Googlebot/2.1; +)
Mozilla/5.0 AppleWebKit/537.36 (KHTML, like Gecko; compatible; Googlebot/2.1; +) Chrome/W.X.Y.Z Safari/537.36
Googlebot/2.1 (+)

Googlebot Smartphone
User agent token: Googlebot
Full user agent string: Mozilla/5.0 (Linux; Android 6.0.1; Nexus 5X Build/MMB29P) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/W.X.Y.Z Mobile Safari/537.36 (compatible; Googlebot/2.1; +)

Mobile AdSense
User agent token: Mediapartners-Google
Full user agent string: (Various mobile device types) (compatible; Mediapartners-Google/2.1; +)

Mobile Apps Android. Checks Android app page ad quality; obeys AdsBot-Google robots rules.
User agent token: AdsBot-Google-Mobile-Apps

Feedfetcher
User agent token: FeedFetcher-Google
Full user agent string: FeedFetcher-Google; (+)

Google Read Aloud
User agent token: Google-Read-Aloud
Current agents:
Desktop agent: Mozilla/5.0 (X11; Linux x86_64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/41.2272.118 Safari/537.36 (compatible; Google-Read-Aloud; +)
Mobile agent: Mozilla/5.0 (Linux; Android 7.0; SM-G930V Build/NRD90M) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/59.3071.125 Mobile Safari/537.36 (compatible; Google-Read-Aloud; +)
Former agent (deprecated): google-speakr

Duplex on the web
User agent token: DuplexWeb-Google
Full user agent string: Mozilla/5.0 (Linux; Android 11; Pixel 2; DuplexWeb-Google/1.0) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/86.4240.193 Mobile Safari/537.36

Google Favicon
Full user agent string: Mozilla/5.0 (X11; Linux x86_64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/49.2623.75 Safari/537.36 Google Favicon

Web Light
User agent token: googleweblight
Full user agent string: Mozilla/5.0 (Linux; Android 4.2.1; en-us; Nexus 5 Build/JOP40D) AppleWebKit/535.19 (KHTML, like Gecko; googleweblight) Chrome/38.1025.166 Mobile Safari/535.19

Google StoreBot
User agent token: Storebot-Google
Desktop agent: Mozilla/5.0 (X11; Linux x86_64; Storebot-Google/1.0) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/79.3945.88 Safari/537.36
Mobile agent: Mozilla/5.0 (Linux; Android 8.0; Pixel 2 Build/OPD3.170816.012; Storebot-Google/1.0) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/81.4044.138 Mobile Safari/537.36
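The distinction between the token and the full string matters in practice: robots.txt rules match the user agent token, while your server logs record the full user agent string. As an illustration (not from the source), here is a small Python sketch that maps a logged full string back to its token; the token list is abridged from the table above, and longer tokens are checked first so that Googlebot-Image is not misreported as plain Googlebot:

```python
# Map a full user agent string (as seen in server logs) back to the
# crawler token that robots.txt rules match against.
TOKENS = [
    "Googlebot-Image", "Googlebot-News", "Googlebot-Video", "Googlebot",
    "AdsBot-Google-Mobile-Apps", "AdsBot-Google-Mobile", "AdsBot-Google",
    "Mediapartners-Google", "Storebot-Google", "Google-Read-Aloud",
]

def crawler_token(user_agent: str) -> str | None:
    ua = user_agent.lower()
    # Longest tokens first, so "Googlebot-Image" wins over "Googlebot".
    for token in sorted(TOKENS, key=len, reverse=True):
        if token.lower() in ua:
            return token
    return None

ua = ("Mozilla/5.0 AppleWebKit/537.36 (KHTML, like Gecko; compatible; "
      "Googlebot/2.1; +) Chrome/W.X.Y.Z Safari/537.36")
print(crawler_token(ua))  # -> Googlebot
```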
User agents in robots.txt
Where several user agents are recognized in the robots.txt file, Google will follow the most
specific. If you want all of Google to be able to crawl your pages, you don’t need a
robots.txt file at all. If you want to block or allow all of Google’s crawlers from accessing
some of your content, you can do this by specifying Googlebot as the user agent. For example,
if you want all your pages to appear in Google Search, and if you want AdSense ads to appear
on your pages, you don’t need a robots.txt file. Similarly, if you want to block some pages
from Google altogether, blocking the Googlebot user agent will also block all of
Google’s other user agents.
But if you want more fine-grained control, you can get more specific. For example, you might
want all your pages to appear in Google Search, but you don’t want images in your personal
directory to be crawled. In this case, use robots.txt to disallow the
Googlebot-Image user agent from crawling the files in your personal directory
(while allowing Googlebot to crawl all files), like this:
User-agent: Googlebot
Disallow:
User-agent: Googlebot-Image
Disallow: /personal
To take another example, say that you want ads on all your pages, but you don’t want those
pages to appear in Google Search. Here, you’d block Googlebot, but allow the
Mediapartners-Google user agent, like this:
User-agent: Googlebot
Disallow: /

User-agent: Mediapartners-Google
Disallow:
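Google applies these groups with a "most specific match wins" rule rather than reading them top to bottom: among all groups whose User-agent token matches the crawler, the longest (most specific) token is chosen and the rest are ignored. A minimal Python sketch of that selection logic, using hypothetical group data mirroring the examples above:

```python
# Toy model of Google's group selection: among the groups whose
# User-agent name matches the crawler token, pick the most specific
# (longest) one; only that group's rules apply.
def pick_group(crawler_token: str, groups: dict[str, list[str]]) -> list[str]:
    token = crawler_token.lower()
    candidates = [name for name in groups
                  if name == "*" or token.startswith(name.lower())]
    if not candidates:
        return []
    best = max(candidates, key=len)  # most specific = longest name
    return groups[best]

rules = {
    "Googlebot": [],                   # Disallow: (nothing blocked)
    "Googlebot-Image": ["/personal"],  # Disallow: /personal
}
print(pick_group("Googlebot", rules))        # -> [] (may crawl everything)
print(pick_group("Googlebot-Image", rules))  # -> ['/personal']
```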
Some pages use multiple robots meta tags to specify directives for different crawlers, like
this:
<meta name="robots" content="nofollow">
<meta name="googlebot" content="noindex">
In this case, Google will use the sum of the negative directives, and Googlebot will follow
both the noindex and nofollow directives.
More detailed information about controlling how Google crawls and indexes your site is available in the Search Central documentation.
Except as otherwise noted, the content of this page is licensed under the Creative Commons Attribution 4.0 License, and code samples are licensed under the Apache 2.0 License. For details, see the Google Developers Site Policies. Java is a registered trademark of Oracle and/or its affiliates.
Last updated 2021-09-09 UTC.
Web Crawlers and User Agents – Top 10 Most Popular

By Brian Jackson. Updated on June 6, 2017.
When it comes to the world wide web, there are both bad bots and good bots. The bad bots you definitely want to avoid, as they consume your CDN bandwidth, take up server resources, and steal your content. Good bots (also known as web crawlers), on the other hand, should be handled with care, as they are a vital part of getting your content indexed by search engines such as Google, Bing, and Yahoo. Read more below about some of the top 10 web crawlers and user agents to ensure you are handling them correctly.

What are web crawlers?

Web crawlers, also known as web spiders or internet bots, are programs that browse the web in an automated manner for the purpose of indexing content. Crawlers can look at all sorts of data, such as content, links on a page, broken links, sitemaps, and HTML code validation.

Search engines like Google, Bing, and Yahoo use crawlers to properly index downloaded pages so that users can find them faster and more efficiently when they are searching. Without web crawlers, there would be nothing to tell them that your website has new and fresh content. Sitemaps can also play a part in that process. So web crawlers, for the most part, are a good thing. However, there are sometimes issues with scheduling and load, as a crawler might be constantly polling your site. This is where a robots.txt file comes into play. This file can help control the crawl traffic and ensure that it doesn’t overwhelm your server.

Web crawlers identify themselves to a web server by using the User-Agent request header in an HTTP request, and each crawler has its own unique identifier. Most of the time you will need to examine your web server referrer logs to view web crawler traffic.

By placing a robots.txt file at the root of your web server, you can define rules for web crawlers, such as allowing or disallowing certain assets from being crawled. Web crawlers must follow the rules defined in this file. You can apply generic rules which apply to all bots, or get more granular and target a specific User-Agent string.

Example 1

This example instructs all search engine robots not to index any of the website’s content. This is defined by disallowing the root / of your website.

User-agent: *
Disallow: /
Example 2

This example achieves the opposite of the previous one. In this case, the instructions are still applied to all user agents; however, there is nothing defined within the Disallow instruction, meaning that everything can be crawled.

User-agent: *
Disallow:
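Before deploying rules like these, you can sanity-check how they will be interpreted with Python’s standard-library urllib.robotparser (the bot name and URL below are just placeholders):

```python
# Test Example 1 (block everything) and Example 2 (allow everything)
# against a sample URL using the standard-library robots.txt parser.
from urllib.robotparser import RobotFileParser

examples = {
    "Example 1": ["User-agent: *", "Disallow: /"],
    "Example 2": ["User-agent: *", "Disallow:"],
}

for name, lines in examples.items():
    parser = RobotFileParser()
    parser.parse(lines)
    allowed = parser.can_fetch("SomeBot", "https://example.com/blog/post")
    print(f"{name}: can_fetch -> {allowed}")

# Example 1: can_fetch -> False
# Example 2: can_fetch -> True
```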
To see more examples, make sure to check out our in-depth post on how to use a robots.txt file.

Top 10 web crawlers and bots

There are hundreds of web crawlers and bots scouring the internet, but below is a list of 10 popular web crawlers and bots that we have collected based on ones that we see on a regular basis within our web server logs.

1. GoogleBot

Googlebot is obviously one of the most popular web crawlers on the internet today, as it is used to index content for Google’s search engine. Patrick Sexton wrote a great article about what a Googlebot is and how it pertains to your website indexing. One great thing about Google’s web crawler is that they give us a lot of tools and control over the process.
Full User-Agent string: Mozilla/5.0 (compatible; Googlebot/2.1; +)
Googlebot example in robots.txt

This example displays a little more granularity pertaining to the instructions defined. Here, the instructions are only relevant to Googlebot. More specifically, it is telling Google not to index a specific page (/no-index/).

User-agent: Googlebot
Disallow: /no-index/
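Keep in mind that anything can claim to be Googlebot simply by sending its User-Agent string. Google’s documented way to verify a crawler is a reverse DNS lookup on the requesting IP followed by a forward lookup to confirm the result; here is a minimal sketch using Python’s standard library (it needs network access, and the sample IP is just an address from a range commonly used by Googlebot):

```python
# Verify that an IP claiming to be Googlebot really belongs to Google:
# 1. reverse-resolve the IP to a hostname,
# 2. check that the hostname ends in googlebot.com or google.com,
# 3. forward-resolve the hostname and confirm it maps back to the IP.
import socket

def is_real_googlebot(ip: str) -> bool:
    try:
        hostname, _, _ = socket.gethostbyaddr(ip)    # reverse DNS
    except socket.herror:
        return False
    if not hostname.endswith((".googlebot.com", ".google.com")):
        return False
    try:
        forward_ip = socket.gethostbyname(hostname)  # forward DNS
    except socket.gaierror:
        return False
    return forward_ip == ip

print(is_real_googlebot("66.249.66.1"))  # sample address, may vary
```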
Besides Google’s web search crawler, they actually have 9 additional web crawlers:

Googlebot News: Googlebot-News
Googlebot Images: Googlebot-Image/1.0
Googlebot Video: Googlebot-Video/1.0
Google Mobile (feature phone): SAMSUNG-SGH-E250/1.0 Profile/MIDP-2.0 Configuration/CLDC-1.1 UP.Browser/6.2.3.c.101 (GUI) MMP/2.0 (compatible; Googlebot-Mobile/2.1; +)
Google Smartphone: Mozilla/5.0 (Linux; Android 6.0.1; Nexus 5X Build/MMB29P) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/41.2272.96 Mobile Safari/537.36 (compatible; Googlebot/2.1; +)
Google Mobile AdSense: (compatible; Mediapartners-Google/2.1; +)
Google AdSense: Mediapartners-Google
Google AdsBot (PPC landing page quality): AdsBot-Google (+)
Google app crawler (fetch resources for mobile): AdsBot-Google-Mobile-Apps

You can use the Fetch tool in Google Search Console to test how Google crawls or renders a URL on your site. See whether Googlebot can access a page on your site, how it renders the page, and whether any page resources (such as images or scripts) are blocked to Googlebot. You can also see the Googlebot crawl stats per day, the amount of kilobytes downloaded, and the time spent downloading a page.

See Googlebot documentation.

2. Bingbot

Bingbot is a web crawler deployed by Microsoft in 2010 to supply information to their Bing search engine. It is the replacement of what used to be the MSN bot.
Full User-Agent string: Mozilla/5.0 (compatible; Bingbot/2.0; +)
Bing also has a tool very similar to Google’s, called Fetch as Bingbot, within Bing Webmaster Tools. Fetch as Bingbot allows you to request that a page be crawled and shown to you as their crawler would see it. You will see the page code as Bingbot would see it, helping you to understand if they are seeing your page as you intended.

See Bingbot documentation.

3. Slurp Bot

Yahoo Search results come from the Yahoo web crawler Slurp and Bing’s web crawler, as a lot of Yahoo is now powered by Bing. Sites should allow Yahoo Slurp access in order to appear in Yahoo Mobile Search results.

Additionally, Slurp does the following:

- Collects content from partner sites for inclusion within sites like Yahoo News, Yahoo Finance, and Yahoo Sports.
- Accesses pages from sites across the web to confirm accuracy and improve Yahoo’s personalized content for their users.
Full User-Agent string: Mozilla/5.0 (compatible; Yahoo! Slurp;)
See Slurp documentation.

4. DuckDuckBot

DuckDuckBot is the web crawler for DuckDuckGo, a search engine that has become quite popular lately, as it is known for privacy and not tracking you. It now handles over 12 million queries per day. DuckDuckGo gets its results from over four hundred sources. These include hundreds of vertical sources delivering niche Instant Answers, DuckDuckBot (their crawler), and crowd-sourced sites (Wikipedia). They also have more traditional links in the search results, which they source from Yahoo!, Yandex, and Bing.
Full User-Agent string: DuckDuckBot/1.0; (+)
It respects WWW::RobotRules and originates from these IP addresses:

72.94.249.34
72.94.249.35
72.94.249.36
72.94.249.37
72.94.249.38

5. Baiduspider

Baiduspider is the official name of the Chinese Baidu search engine’s web crawling spider. It crawls web pages and returns updates to the Baidu index. Baidu is the leading Chinese search engine, taking an 80% share of the overall search engine market of China.
Full User-Agent string: Mozilla/5.0 (compatible; Baiduspider/2.0; +)
Besides Baidu’s web search crawler, they actually have 6 additional web crawlers:

Image Search: Baiduspider-image
Video Search: Baiduspider-video
News Search: Baiduspider-news
Baidu wishlists: Baiduspider-favo
Baidu Union: Baiduspider-cpro
Business Search: Baiduspider-ads
Other search pages: Baiduspider

See Baidu documentation.

6. Yandex Bot

YandexBot is the web crawler of one of the largest Russian search engines, Yandex. According to LiveInternet, for the three months ended December 31, 2015, they generated 57.3% of all search traffic in Russia.
Full User-Agent string: Mozilla/5.0 (compatible; YandexBot/3.0; +)
There are many different User-Agent strings that the YandexBot can show up as in your server logs. See the full list of Yandex robots and the Yandex documentation.

7. Sogou Spider

Sogou Spider is the web crawler for Sogou.com, a leading Chinese search engine that was launched in 2004. As of April 2016 it had a rank of 103 in Alexa’s internet rankings.

Note: The Sogou web spider does not respect the robots exclusion standard, and is therefore banned from many websites because of excessive crawling.

Full User-Agent strings:
Sogou Pic Spider/3.0()
Sogou head spider/3.0()
Sogou web spider/4.0(+)
Sogou Orion spider/3.0()
Sogou-Test-Spider/4.0 (compatible; MSIE 5.5; Windows 98)
8. Exabot

Exabot is a web crawler for Exalead, which is a search engine based out of France. It was founded in 2000 and now has more than 16 billion pages currently indexed.

Full User-Agent strings:
Mozilla/5.0 (compatible; Konqueror/3.5; Linux) KHTML/3.5.5 (like Gecko) (Exabot-Thumbnails)
Mozilla/5.0 (compatible; Exabot/3.0; +)
See Exabot documentation.

9. Facebook external hit

Facebook allows its users to send links to interesting web content to other Facebook users. Part of how this works on the Facebook system involves the temporary display of certain images or details related to the web content, such as the title of the webpage or the embed tag of a video. The Facebook system retrieves this information only after a user provides a link.

One of their main crawling bots is Facebot, which is designed to help improve advertising performance.

Full User-Agent strings:
facebookexternalhit/1.0 (+)
facebookexternalhit/1.1 (+)
See Facebot documentation.

10. Alexa crawler

ia_archiver is the web crawler for Amazon’s Alexa internet rankings. As you probably know, they collect information to show rankings for both local and international sites.
Full User-Agent string: ia_archiver (+;)
See ia_archiver documentation.

Bad bots

As we mentioned above, most of those are actually good web crawlers. You generally don’t want to block Google or Bing from indexing your site unless you have a good reason. But what about the thousands of bad bots? KeyCDN released a new feature back in February 2016 that you can enable in your dashboard called Block Bad Bots. KeyCDN uses a comprehensive list of known bad bots and blocks them based on their User-Agent string.

When a new Zone is added, the Block Bad Bots feature is set to disabled. This setting can be set to enabled instead if you want bad bots to be automatically blocked.

Bot resources

Perhaps you are seeing some user-agent strings in your logs that have you concerned. Here are a couple of good resources in which you can look up popular bad bots, crawlers, and scrapers. Caio Almeida also has a pretty good list on his crawler-user-agents GitHub project.

Summary

There are hundreds of different web crawlers out there, but hopefully you are now familiar with a couple of the more popular ones. Again, you want to be careful when blocking any of these, as they could cause indexing issues. It is always good to check your web server logs to see how often they are actually crawling your site.

Did we miss any important ones? If so, please let us know below and we will add them.
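As the summary suggests, a quick way to see which of these crawlers actually visit you is to tally their User-Agent tokens in your access log. A minimal sketch, assuming a standard combined-format log at a hypothetical path:

```python
# Count hits from well-known crawler tokens in a web server access log.
# The log path and token list are illustrative; adjust them for your setup.
from collections import Counter

CRAWLER_TOKENS = [
    "Googlebot", "Bingbot", "Slurp", "DuckDuckBot", "Baiduspider",
    "YandexBot", "Sogou", "Exabot", "facebookexternalhit", "ia_archiver",
]

hits = Counter()
with open("/var/log/nginx/access.log", encoding="utf-8", errors="replace") as log:
    for line in log:
        for token in CRAWLER_TOKENS:
            if token.lower() in line.lower():
                hits[token] += 1
                break  # count each request once

for token, count in hits.most_common():
    print(f"{token}: {count}")
```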

Frequently asked questions about web crawler user agents

What is a crawler user agent?

“Crawler” is a generic term for any program (such as a robot or spider) that is used to automatically discover and scan websites by following links from one webpage to another. Google’s main crawler is called Googlebot. Crawlers can be specified in robots.txt, the robots meta tags, and the X-Robots-Tag HTTP directives.

Is a user agent a bot?

Search engine crawlers also have a user-agent. Given that the user-agent identifies bots as what they are (that is, bots), web servers can give them special “privileges”. For example, a web server can walk Googlebot through a sign-up page. (Feb 8, 2021)

What is a web crawler used for?

A web crawler, or spider, is a type of bot that is typically operated by search engines like Google and Bing. Their purpose is to index the content of websites all across the Internet so that those websites can appear in search engine results.
