Google updates crawlers and user-triggered fetchers documentation


Google has made a series of updates to its crawlers and user-triggered fetchers documentation, mostly breaking the single-page document out into multiple pages. Google also added a new section for each crawler explaining which products it affects, along with a robots.txt snippet for each crawler that demonstrates how to use its user agent tokens.
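
To show what a per-crawler group does in practice, here is a minimal sketch that parses a small robots.txt with Python's standard-library `urllib.robotparser` and checks two tokens against it. The file contents and the `example.com` URL are made up for illustration. Python's parser is not Google's: it uses first-match, substring-based user agent matching, so treat it as a rough approximation of how a token-specific group behaves, not a model of how Googlebot evaluates rules.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical robots.txt: one group addressed only to the
# Googlebot-Image token, plus a catch-all group for everyone else.
ROBOTS_TXT = """\
User-agent: Googlebot-Image
Disallow: /photos/

User-agent: *
Disallow:
"""

def build_parser(robots_txt: str) -> RobotFileParser:
    """Parse robots.txt text directly, without fetching anything over the network."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return parser

parser = build_parser(ROBOTS_TXT)
url = "https://example.com/photos/cat.jpg"

# The Googlebot-Image group applies only to that token...
print(parser.can_fetch("Googlebot-Image", url))  # False
# ...while Googlebot falls through to the catch-all (*) group.
print(parser.can_fetch("Googlebot", url))        # True
```

The empty `Disallow:` line in the catch-all group means "nothing is disallowed," so any token without its own group may fetch the URL.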

What Google said. Google posted about these changes saying:

“Reorganized the documentation for Google’s crawlers and user-triggered fetchers. We also added explicit notes about what product each crawler affects, and added a robots.txt snippet for each crawler to demonstrate how to use the user agent tokens. There were no meaningful changes to the content otherwise.”

“The documentation grew very long which limited our ability to extend the content about our crawlers and user-triggered fetchers.”

What is new. Besides moving a lot of content around to organize it, Google added the “affected products” sections and the “Example robots.txt group” sections. Here is a screenshot of these sections for the Googlebot crawler; Google added them to each individual crawler:

I pulled out each one for you:

  • Googlebot: Crawling preferences addressed to the Googlebot user agent affect Google Search (including Discover and all Google Search features), as well as other products such as Google Images, Google Video, Google News, and Discover.
  • Googlebot Image: Crawling preferences addressed to the Googlebot-Image user agent affect Google Images, Discover, Google Video, and all features in Google Search where images, logos, and favicons are presented.
  • Googlebot Video: Crawling preferences addressed to the Googlebot-Video user agent affect video-related Google Search features and other products dependent on videos.
  • Googlebot News: Crawling preferences addressed to the Googlebot-News user agent affect all surfaces of Google News (for example, the News tab in Google Search and the Google News app).
  • Google StoreBot: Crawling preferences addressed to the Storebot-Google user agent affect all surfaces of Google Shopping (for example, the Shopping tab in Google Search and Google Shopping).
  • Google-InspectionTool: Crawling preferences addressed to the Google-InspectionTool user agent affect Search testing tools such as the Rich Result Test and URL inspection in Search Console. It has no effect on Google Search or other products.
  • GoogleOther: Crawling preferences addressed to the GoogleOther user agent don’t affect any specific product. GoogleOther is the generic crawler that may be used by various product teams for fetching publicly accessible content from sites. For example, it may be used for one-off crawls for internal research and development. It has no effect on Google Search or other products.
  • GoogleOther-Image: Crawling preferences addressed to the GoogleOther-Image user agent don’t affect any specific product, similar to GoogleOther. GoogleOther-Image is the version of GoogleOther optimized for fetching publicly accessible image URLs.
  • GoogleOther-Video: Crawling preferences addressed to the GoogleOther-Video user agent don’t affect any specific product, similar to GoogleOther. GoogleOther-Video is the version of GoogleOther optimized for fetching publicly accessible video URLs.
  • Google-CloudVertexBot: Crawling preferences addressed to the Google-CloudVertexBot user agent affect crawls requested by site owners for building Vertex AI Agents. It has no effect on Google Search or other products.
  • Google-Extended: Google-Extended is a standalone product token that web publishers can use to manage whether their sites help improve Gemini Apps and Vertex AI generative APIs, including future generations of models that power those products. Google-Extended does not impact a site’s inclusion or ranking in Google Search.
  • APIs-Google: Crawling preferences addressed to the APIs-Google user agent affect the delivery of push notification messages by Google APIs.
  • AdsBot Mobile Web: Crawling preferences addressed to the AdsBot-Google-Mobile user agent affect Google Ads’ ability to check web page ad quality.
  • AdsBot: Crawling preferences addressed to the AdsBot-Google user agent affect Google Ads’ ability to check web page ad quality.
  • AdSense: Crawling preferences addressed to the Mediapartners-Google user agent affect Google AdSense. The AdSense crawler visits participating sites in order to provide them with relevant ads.
  • Google-Safety: The Google-Safety user agent handles abuse-specific crawling, such as malware discovery for publicly posted links on Google properties. As such it’s unaffected by crawling preferences.
  • Feedfetcher: Feedfetcher is used for crawling RSS or Atom feeds for Google News and PubSubHubbub.
  • Google Publisher Center: Google Publisher Center fetches and processes feeds that publishers explicitly supplied for use in Google News landing pages.
  • Google Read Aloud: Upon user request, Google Read Aloud fetches and reads out web pages using text-to-speech (TTS).
  • Google Site Verifier: Google Site Verifier fetches Search Console verification tokens.
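
Several of these tokens can share a single robots.txt group. The sketch below, again using Python's `urllib.robotparser` as a stand-in for a real crawler, lists `GoogleOther` and `Google-Extended` in one group while leaving `Googlebot` unrestricted. The choice of tokens and the `example.com` URL are hypothetical; per the descriptions above, neither of those two tokens affects Google Search.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical robots.txt: consecutive User-agent lines form one group,
# so the Disallow rule below applies to both tokens.
ROBOTS_TXT = """\
User-agent: GoogleOther
User-agent: Google-Extended
Disallow: /

User-agent: Googlebot
Allow: /
"""

parser = RobotFileParser()
parser.parse(ROBOTS_TXT.splitlines())

url = "https://example.com/articles/post.html"
for token in ("GoogleOther", "Google-Extended", "Googlebot"):
    # can_fetch() returns True when the group matching this token allows the URL.
    status = "allowed" if parser.can_fetch(token, url) else "blocked"
    print(f"{token}: {status}")
```

With this file, `GoogleOther` and `Google-Extended` print as blocked and `Googlebot` as allowed. Note that Google-Safety is described above as unaffected by crawling preferences, so a group like this would not apply to it.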

Why we care. Reading through these affected product sections may help you better understand how each crawler affects various aspects of Google. Some don’t impact Google Search at all, while others are fundamental to how Google Search works.
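
One way to keep the “affected products” notes close to your robots.txt work is a small lookup table. The sketch below encodes a few entries from the list above (abbreviated and paraphrased from this article, not Google's official data) and reports which products a blocked token would touch for a given URL. `AFFECTED_PRODUCTS` and `affected_products` are hypothetical names for illustration, and the matching relies on Python's `urllib.robotparser`, which only approximates Google's own rules.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical, abbreviated summary of the "affected products" notes above.
AFFECTED_PRODUCTS = {
    "Googlebot": ["Google Search", "Google Images", "Google Video", "Google News", "Discover"],
    "Googlebot-Image": ["Google Images", "Discover", "Google Video", "image features in Google Search"],
    "Googlebot-News": ["Google News"],
    "Storebot-Google": ["Google Shopping"],
    "GoogleOther": [],  # generic crawler, no specific product
    "Google-Extended": ["Gemini Apps", "Vertex AI generative APIs"],
}

def affected_products(robots_txt: str, url: str) -> dict[str, list[str]]:
    """Return {token: products} for every listed token that may not fetch url."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return {
        token: products
        for token, products in AFFECTED_PRODUCTS.items()
        if not parser.can_fetch(token, url)
    }

robots_txt = """\
User-agent: Googlebot-News
Disallow: /drafts/
"""
print(affected_products(robots_txt, "https://example.com/drafts/story.html"))
# {'Googlebot-News': ['Google News']}
```

Blocking only `Googlebot-News` from `/drafts/` shows up as touching Google News alone, while a URL outside `/drafts/` returns an empty result.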

Also, the new robots.txt examples may be very useful to you and your development teams.
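
For development teams, one way to put the example groups to work is a small regression check that runs whenever robots.txt changes. The sketch below compares expected allow/block decisions per token against the parsed file; the `EXPECTATIONS` list, the `find_mismatches` helper, and the URLs are hypothetical. Python's `urllib.robotparser` picks the first matching group and rule rather than the most specific one, so results for overlapping groups or mixed Allow/Disallow rules may differ from how Google's crawlers behave. Treat it as a sanity check, not a substitute for Google's own tools.

```python
from urllib.robotparser import RobotFileParser

ROBOTS_TXT = """\
User-agent: Google-Extended
Disallow: /

User-agent: Googlebot
Disallow: /admin/
"""

# Hypothetical expectations: (user agent token, URL, should it be fetchable?)
EXPECTATIONS = [
    ("Googlebot", "https://example.com/blog/post.html", True),
    ("Googlebot", "https://example.com/admin/login", False),
    ("Google-Extended", "https://example.com/blog/post.html", False),
]

def find_mismatches(robots_txt, expectations):
    """Return every expectation that the parsed robots.txt does not satisfy."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return [
        (token, url, expected)
        for token, url, expected in expectations
        if parser.can_fetch(token, url) != expected
    ]

mismatches = find_mismatches(ROBOTS_TXT, EXPECTATIONS)
if mismatches:
    for token, url, expected in mismatches:
        print(f"{token} {url}: expected {'allowed' if expected else 'blocked'}")
else:
    print("robots.txt matches all expectations")
```

Wired into a CI job, a non-empty `mismatches` list could fail the build before a robots.txt edit reaches production.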



About the author

Barry Schwartz

Barry Schwartz is a technologist and a Contributing Editor to Search Engine Land and a member of the programming team for SMX events. He owns RustyBrick, a New York-based web consulting firm. He also runs Search Engine Roundtable, a popular search blog on very advanced SEM topics.

In 2019, Barry was awarded the Outstanding Community Services Award from Search Engine Land; in 2018 he was named “US Search Personality Of The Year” at the US Search Awards; and in 2023 he was listed as a top 50 most influential PPCer by Marketing O’Clock.

Barry can be followed on X, and you can learn more about Barry Schwartz on his personal site.





Source link: Searchengineland.com
