Apple made significant changes to its Applebot documentation after WWDC, where it announced Apple Intelligence. The update adds more detail about Applebot, reverse DNS verification, the new Applebot-Extended user agent and much more.
To be clear, Applebot is not new; it is about a decade old. But with Apple Intelligence, I guess Apple is getting more serious about it? The change to the document was made on June 11th, the day after the Apple keynote.
The big item on the AI side of Applebot is that Apple added Applebot-Extended, similar to Google-Extended, for AI purposes. As Glenn Gabe noted on X on Friday, “You can block Applebot-Extended. So you can opt out via robots.txt -> Apple says it doesn’t train its models on users’ private data or user interactions, and instead relies on licensed materials and publicly available online data.”
There is a lot that changed, but here is the Applebot-Extended portion:
In addition to following all robots.txt rules and directives, Apple has a secondary user agent, Applebot-Extended, that gives web publishers additional controls over how their website content can be used by Apple.
With Applebot-Extended, web publishers can choose to opt out of their website content being used to train Apple’s foundation models powering generative AI features across Apple products, including Apple Intelligence, Services, and Developer Tools.
You can add a rule in robots.txt to disallow Applebot-Extended, as follows:
User-agent: Applebot-Extended
Disallow: /private/
Applebot-Extended does not crawl webpages. Webpages that disallow Applebot-Extended can still be included in search results. Applebot-Extended is only used to determine how to use the data crawled by the Applebot user agent.
Allowing Applebot-Extended will help improve the capabilities and quality of Apple’s generative AI models over time.
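To sanity-check how a rule like the one above parses, here is a minimal sketch using Python's standard urllib.robotparser module. It only shows how Python's parser reads the rule, which may differ from how Apple evaluates robots.txt; the example.com URLs and the build_parser helper are placeholders for illustration.

```python
from urllib.robotparser import RobotFileParser

# The rule quoted above, as it would appear in a site's robots.txt.
ROBOTS_TXT = """\
User-agent: Applebot-Extended
Disallow: /private/
"""


def build_parser(robots_txt: str) -> RobotFileParser:
    """Parse robots.txt text directly, without fetching anything."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return parser


parser = build_parser(ROBOTS_TXT)

# example.com is a placeholder host, not a real site being tested.
for agent in ("Applebot-Extended", "Applebot"):
    for path in ("/private/notes.html", "/blog/post"):
        allowed = parser.can_fetch(agent, "https://example.com" + path)
        print(f"{agent:18} {path:22} allowed={allowed}")
```

Under Python's matching, the rule applies only to the Applebot-Extended token, so the regular Applebot crawler remains unrestricted, which matches the documentation's point that disallowing Applebot-Extended does not remove pages from search results.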
Apple also added these new sections:
Learn about Applebot, the web crawler for Apple.
The data crawled by Applebot is used to power various features, such as the search technology that is integrated into many user experiences in Appleʼs ecosystem including Spotlight, Siri, and Safari. Enabling Applebot in robots.txt allows website content to appear in search results for Apple users around the world in these products.
Applebot accesses many kinds of resources from web servers, including but not limited to robots.txt, sitemaps, RSS feeds, HTML, subresources needed to render pages such as JavaScript, Ajax requests, images, and more.
Besides the reverse DNS check described below, another way to identify Applebot is to match the IP address with a CIDR prefix contained in the following JSON file: Applebot IP CIDRs.
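The CIDR check can be sketched with Python's standard ipaddress module. This is an illustration only: the prefixes below are made up for the example and are not taken from Apple's JSON file, whose layout is not shown here, so loading the real prefixes is left to the reader. The name ip_matches_prefixes is likewise hypothetical.

```python
import ipaddress

# Hypothetical prefixes for illustration only. Replace them with the
# prefixes published in Apple's "Applebot IP CIDRs" JSON file.
EXAMPLE_PREFIXES = ["17.58.0.0/16", "2001:db8::/32"]


def ip_matches_prefixes(ip: str, prefixes: list[str]) -> bool:
    """Return True if the IP address falls inside any of the CIDR prefixes."""
    address = ipaddress.ip_address(ip)
    for prefix in prefixes:
        network = ipaddress.ip_network(prefix)
        # Only compare addresses and networks of the same IP version.
        if address.version == network.version and address in network:
            return True
    return False


print(ip_matches_prefixes("17.58.101.179", EXAMPLE_PREFIXES))
print(ip_matches_prefixes("192.0.2.10", EXAMPLE_PREFIXES))
```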
Reverse DNS
In macOS, the host command can be used to determine if an IP address is part of Applebot. These examples show the host command and its result:
$ host 17-58-101-179.applebot.apple.com
17-58-101-179.applebot.apple.com has address 17.58.101.179
The host command can also be used this way to verify that the DNS name points to the same IP address.
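The same verification can be scripted. Below is a rough sketch of a forward-confirmed reverse DNS check in Python's standard socket module: look up the hostname for the IP, require that it sits under applebot.apple.com, then confirm the hostname resolves back to the same IP. The function name is_applebot_ip and the injectable lookup parameters are assumptions made so the logic can be exercised without network access.

```python
import socket


def is_applebot_ip(
    ip: str,
    reverse_lookup=socket.gethostbyaddr,
    forward_lookup=socket.gethostbyname_ex,
) -> bool:
    """Forward-confirmed reverse DNS check for an IPv4 address."""
    try:
        # gethostbyaddr returns (hostname, aliases, addresses).
        hostname = reverse_lookup(ip)[0]
    except OSError:
        return False

    # The reverse name must be inside the applebot.apple.com domain.
    if not hostname.rstrip(".").lower().endswith(".applebot.apple.com"):
        return False

    try:
        # gethostbyname_ex returns (hostname, aliases, addresses).
        addresses = forward_lookup(hostname)[2]
    except OSError:
        return False

    # The forward lookup must point back to the original IP.
    return ip in addresses
```

Called as is_applebot_ip("17.58.101.179"), it performs live DNS lookups; in a log-processing job it is usually worth caching the results per IP.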
User agents
A user agent helps webmasters identify crawler traffic, so that they can get accurate access log reports of crawler activity and control access to the site via robots.txt.
Applebot powers several user agents, including Search and Podcasts.
Search
For search web crawling and rendering, the user-agent string contains “Applebot” and other information. The following is the general format:
Mozilla/5.0 (Device; OS_version) AppleWebKit/WebKit_version (KHTML, like Gecko) Version/Safari_version [Mobile/Mobile_version] Safari/WebKit_version (Applebot/Applebot_version; +http://www.apple.com/go/applebot)
Apple Podcasts
iTMS traffic may also come from applebot.apple.com hosts, and will be identified by the following user agent:
User-Agent: iTMS
The iTMS user agent does not follow robots.txt, as it is not a general search crawler. It only crawls URLs associated with registered content on Apple Podcasts.
Like I said, a lot has changed between the old version and the new version.
You can compare the two documents in your favorite text comparison tool.
Forum discussion at X.
Source link : Seroundtable.com