On March 27th, 2024, Google accidentally pushed code to a public GitHub repository, setting the stage for one of its most significant leaks. Based on the commit history, the code stayed on GitHub until May 7th, 2024.
That more than month-long window gave EA Eagle Digital founder and SEO practitioner Erfan Azimi enough time to bring it to the public's attention.
As he stated himself, his motive wasn't financial; he wanted to alert the SEO community to the company's misstatements while exposing purported search ranking factors.
What Did the Google Search Leak Reveal?
The March Google search leak revealed roughly 2,500 internal documents containing 14,014 attributes believed to function as search ranking factors.
After a series of analyses, SparkToro co-founder and SEO veteran Rand Fishkin and iPullRank CEO Mike King released detailed breakdowns and takeaways from the documents.
But in case you missed that, here’s a recap of what was inside.
Google’s Claims on Search Ranking Factors Debunked
The leak exposed many misstatements about search ranking factors made by Googlers on social media and in hangout sessions, as highlighted below.
Claim: Google Doesn’t Use Domain Authority
Over the past few years, Google employees have been quick to refute any suggestion that domain authority plays a role in their rankings.
In fact, webmaster trends analyst John Mueller has repeatedly denied the existence of such a metric. We first saw him address it in a 2016 Google Hangout, saying, “We don’t have anything like a website authority score.”
A week later, his colleague Gary Illyes responded to a question about image links as follows:
Later in 2018, someone asked Mueller via Reddit whether domain authority existed. The analyst commented, “Of course, it exists; it’s a tool by Moz.”
Then, in 2022, Mueller reiterated his stance, adding asterisks for emphasis. He responded to a Redditor that they “do not need DA” since “Google doesn’t use it *at all*…”
However, this leak exposed a metric called “siteAuthority” believed to affect rankings.
Claim: Google Doesn’t Consider Clicks in Ranking
In 2008, John Mueller commented, “Anyone visiting your site a few dozen times and hitting the back button on their browser is not going to impact your site’s crawling, indexing, or ranking at Google.”
Later in a 2018 Reddit AMA (Ask Me Anything), Gary Illyes termed theories of Google using UX metrics like CTR and dwell time as “made up crap.”
Duane Forrester, who used to work at Bing, first introduced the concept of dwell time during a session I attended at Pubcon about 10 years ago. Since the community generally considers Bing less sophisticated than Google, the logical assumption is that Google likely uses this metric too.
But despite the company’s constant denial, the impact of clicks on ranking has been a hotly debated topic for over a decade.
Of course, contradictory statements by Google employees eroded people’s faith in those denials.
Journalist Danny Sullivan took to Twitter (now X) in 2015 to highlight a statement from Udi Manber, Google’s former head of search quality: “The ranking itself is affected by the click data.”
The same report that included Manber’s remarks also captured the position Google had publicly maintained on the topic.
The report stated, “According to Marissa Mayer, Google did not use click-through rates to determine the position of the Universal Search properties…”
Fast forward to 2019, when Moz’s Britney Muller found something interesting on a Google developer page:
And in 2023, former Googler Eric Lehman stated, “Pretty much everyone knows we’re using clicks in rankings.”
As such, the latest revelation in this GitHub leak only confirms what many already suspected.
Listed within the NavBoost module are a number of click-focused metrics (sketched conceptually after the list), including:
- Unsquashed last longest clicks
- Unsquashed clicks
- Last longest clicks
- Good clicks
- Bad clicks
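To make those attribute names a little more concrete, here's a purely hypothetical Python sketch. The field names mirror the leaked NavBoost attributes listed above, but the weighting logic is invented for illustration and says nothing about Google's actual formula.

```python
from dataclasses import dataclass


# Hypothetical sketch only: the field names echo attributes reported in the
# leaked NavBoost module, but the math below is invented for illustration.
@dataclass
class NavBoostClickSignals:
    good_clicks: float = 0.0
    bad_clicks: float = 0.0
    last_longest_clicks: float = 0.0
    unsquashed_clicks: float = 0.0
    unsquashed_last_longest_clicks: float = 0.0


def toy_click_score(signals: NavBoostClickSignals) -> float:
    """Combine click signals into one score (illustrative weights only)."""
    total = signals.good_clicks + signals.bad_clicks
    if total == 0:
        return 0.0
    # Reward clicks that ended the search session ("last longest") and
    # penalize clicks that bounced straight back to the results page.
    satisfaction = (signals.good_clicks + signals.last_longest_clicks) / (
        total + signals.last_longest_clicks
    )
    dissatisfaction = signals.bad_clicks / total
    return satisfaction - dissatisfaction


if __name__ == "__main__":
    page = NavBoostClickSignals(good_clicks=120, bad_clicks=30, last_longest_clicks=45)
    print(round(toy_click_score(page), 3))  # roughly 0.65
```

In this toy model, clicks that end a search session count extra toward satisfaction, while clicks that bounce straight back to the results page drag the score down.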
Claim: There’s No Sandbox
That was John Mueller’s exact response in 2019 when asked how long it takes new websites to emerge from the sandbox. His reply merely repeated what Google has said every year since 2004, when users first coined the term.
Gary Illyes also refuted the existence of a sandbox in a 2016 tweet.
Matt Cutts’ 2005 admission of a sandbox-like effect for select industries is probably one of the reasons this debate never died down.
Now, documentation in the leaked PerDocData module highlights a “hostAge” attribute, which supposedly protects the search engine from poor content by sandboxing fresh, spammy websites.
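For intuition only, here's a minimal sketch of how a hostAge-style gate might work in principle. The attribute name comes from the leak, but the 90-day window, the spam_score input, and the decision rule are all assumptions made up for this example.

```python
from datetime import date, timedelta

# Illustrative guess at a "hostAge"-style gate. The attribute name appears in
# the leaked PerDocData module; everything else here is assumed.
SANDBOX_WINDOW = timedelta(days=90)  # hypothetical probation period
SPAM_THRESHOLD = 0.7                 # hypothetical spam-likelihood cutoff


def should_sandbox(host_first_seen: date, spam_score: float, today: date) -> bool:
    """Sandbox hosts that are both new and spammy; established hosts pass through."""
    host_age = today - host_first_seen
    return host_age < SANDBOX_WINDOW and spam_score >= SPAM_THRESHOLD


if __name__ == "__main__":
    print(should_sandbox(date(2024, 5, 1), spam_score=0.9, today=date(2024, 6, 1)))  # True
    print(should_sandbox(date(2023, 1, 1), spam_score=0.9, today=date(2024, 6, 1)))  # False
```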
Claim: Chrome Data Doesn’t Affect Ranking
In a 2012 WebmasterWorld discussion, Bill Hartzer shared that Matt Cutts denied using Chrome data in ranking.
In the post, the internet marketer said, “Matt told me, in person, that Google’s organic algorithm does not use any Google Chrome data.” He continued, “The same goes for the Google Toolbar, as well.”
More recently, in 2022, John Mueller again denied the use of Chrome data during a Google Hangout, stating, “I don’t think we use anything from Google Chrome for ranking.”
Mueller also discussed the Chrome User Experience Report (CrUX), saying, “That’s the only thing that we use from Chrome within ranking.”
However, a few of the modules in this latest leak indicate that additional Chrome data may impact ranking. One such module related to page quality score is ChromeInTotal, which assesses site-level Chrome views.
Newly Exposed Google Ranking Features
The document highlighted numerous ranking factors, which helped illuminate the functions and relationships of different systems. It revealed over a hundred ranking systems.
King’s comprehensive breakdown showed that these might be part of Google’s 200 ranking signals.
Some of the systems exposed include:
- Indexing systems, such as SegIndexer, which places documents into tiers, and TeraGoogle, which stores documents long-term on disk, both operating under the primary indexing system, Alexandria.
- Rendering systems like HtmlrenderWebkitHeadless deal with JavaScript pages.
- Processing systems like LinkExtractor and WebMirror extract page links and manage duplication and canonicalization.
- Ranking systems, including the primary scoring system, Mustang, the feature-name definer WebChooserScorer, and re-ranking systems like NavBoost and FreshnessTwiddler.
- Serving systems like Google Web Server, SuperRoot, SnippetBrain, Glue, and Cookbook.
Twiddlers
The leaked document had various functions featuring a Boost suffix like QualityBoost, WebImageBoost, RealTimeBoost, and NavBoost.
These functions run using the Twiddler framework.
Unlike the universal packer, which handles multiple corpora, twiddlers are SuperRoot components that re-rank results from a single corpus.
King’s breakdown of the leak also noted that twiddlers run after Ascorer, the core scoring algorithm within the primary ranking system, Mustang.
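Conceptually, you can picture a twiddler as a small function that nudges scores the core ranker has already produced, rather than rescoring documents from scratch. The sketch below is a toy model under that assumption; the twiddler names echo the leak, but the multipliers and data shapes are invented.

```python
from typing import Callable

# Toy re-ranking pass in the spirit of the Twiddler description above.
# Each twiddler inspects a document and returns a multiplier for its score.
Doc = dict  # e.g. {"url": ..., "base_score": ..., "age_days": ..., "good_clicks": ...}
Twiddler = Callable[[Doc], float]


def freshness_twiddler(doc: Doc) -> float:
    return 1.1 if doc.get("age_days", 999) < 30 else 1.0


def navboost_twiddler(doc: Doc) -> float:
    return 1.2 if doc.get("good_clicks", 0) > 100 else 1.0


def rerank(docs: list[Doc], twiddlers: list[Twiddler]) -> list[Doc]:
    """Re-rank already-scored documents by applying each twiddler's multiplier."""
    for doc in docs:
        score = doc["base_score"]  # produced upstream (think Ascorer)
        for twiddler in twiddlers:
            score *= twiddler(doc)
        doc["final_score"] = score
    return sorted(docs, key=lambda d: d["final_score"], reverse=True)


if __name__ == "__main__":
    results = [
        {"url": "a.example", "base_score": 0.80, "age_days": 10, "good_clicks": 300},
        {"url": "b.example", "base_score": 0.85, "age_days": 400, "good_clicks": 20},
    ]
    for doc in rerank(results, [freshness_twiddler, navboost_twiddler]):
        print(doc["url"], round(doc["final_score"], 3))
```

The key design idea is that twiddlers never recompute the base score; they only boost or demote what the upstream scorer has already ranked.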
Demotions
The leak also highlighted various signals that can trigger content demotions (a conceptual example follows the list). Some of the key ones include:
- Panda: Tied to patents such as “Ranking search results,” which adjusts rankings based on the number of independent links and NavBoost reference queries, and “Site quality score,” which rates sites using the ratio between reference queries and visitor clicks.
- Nav demotion: Penalization for poor user experience and navigation.
- Anchor Mismatch: Penalization when the anchor text doesn’t match the linked document.
- SERP demotion: Penalization based on click-measured user satisfaction.
- Exact Match demotion: Penalization of exact match domains.
- Product Review demotion: An unexplained demotion that may relate to 2023’s product review update.
- Location demotions: A disregard for super global and global content in favor of location-based material.
- Porn demotions: As the name suggests.
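As a thought experiment, demotions can be modeled as multipliers applied to an already-computed score. The demotion names below mirror the leak's terminology, but the factor values and the multiplication approach are assumptions for illustration, not anything the documents spell out.

```python
# Purely illustrative demotion factors; the names come from the leak,
# the numbers do not.
DEMOTION_FACTORS = {
    "nav_demotion": 0.90,
    "anchor_mismatch": 0.85,
    "serp_demotion": 0.80,
    "exact_match_domain": 0.95,
    "product_review": 0.88,
    "location": 0.92,
}


def apply_demotions(base_score: float, flags: set[str]) -> float:
    """Multiply the base score by each triggered demotion factor."""
    score = base_score
    for flag in flags:
        score *= DEMOTION_FACTORS.get(flag, 1.0)
    return score


if __name__ == "__main__":
    # A page flagged for anchor mismatch and poor navigation.
    print(round(apply_demotions(0.9, {"anchor_mismatch", "nav_demotion"}), 3))  # roughly 0.69
```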
Other Key Revelations for SEO
Additional revelations from this leak which may impact SEO are as follows:
- Links have tiers that dictate their value, based on where the linking content is stored: pages kept in flash memory are the freshest and most valuable, solid-state drives hold less valuable content, and standard hard drives store irregularly updated, lower-value material.
- There are various link spam velocity signals like phraseAnchorSpamDays. These can identify spamming sites and help Google disregard such links.
- Google stores all page changes. However, they only consider the last 20 to analyze links. So, you can’t earn link equity by redirecting pages to irrelevant targets.
- Google associates homepage PageRank with every page on a site, so this and siteAuthority may act as proxies for fresh pages until they earn their own PageRank.
- Google values links based on homepage trust.
- Google assesses the font size of phrases and links when assigning importance: the font size of anchor text and the avgTermWeight of document terms signify different weights.
- The droppedLocalAnchorCount element may disregard some internal links.
- Each document has a token threshold for Google’s consideration; anything beyond it is truncated, which highlights the need to place important information at the top (see the short sketch after this list).
- OriginalContentScore analyzes the originality of short content.
- There’s a keyword stuffing measure.
- The titlematchScore attribute is believed to measure how well a page title matches the query.
- There isn’t a specific metadata character count. However, the snippetPrefixCharCount measure determines the constituents of a snippet.
- Attributes like bylineDate, syntacticDate, and semanticDate indicate that Google favors fresh content.
- Websites with videos on at least half their pages are video sites, which receive different treatment.
- YMYL has a unique rating system.
- Google features a gold standard attribute that may denote human-generated content.
- There’s a “smallPersonalSite” attribute whose function remains unclear. King noted that Google could assign a twiddler to this attribute to promote or demote small personal sites.
- There are whitelists for the most controversial topics, such as politics, as well as Covid and travel.
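And here's the short sketch promised for the token-threshold point above: if only the first N tokens of a document are considered, anything past the cutoff is effectively invisible to scoring. The 500-token limit is an arbitrary placeholder, not a figure from the leak.

```python
# Hypothetical illustration of a token threshold; the limit is made up.
TOKEN_LIMIT = 500


def tokens_considered(text: str, limit: int = TOKEN_LIMIT) -> list[str]:
    """Naive whitespace tokenization, truncated at the limit."""
    return text.split()[:limit]


if __name__ == "__main__":
    long_page = "important point " * 10 + "filler " * 1000
    kept = tokens_considered(long_page)
    print(len(kept))  # only 500 tokens survive
    print(kept[:4])   # the copy placed at the top makes the cut
```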
Does Google Use All Elements in the Leak for Search Ranking?
Given how frequently Google Search changes, we can’t be sure that every element exposed in the leak still contributes to search rankings.
Based on the latest dates it contains, though, we believe the documentation was up to date as of August 2023.
However, there’s no mention of the recent AI Overviews. Whether that’s because they’re too new or simply omitted from the leaked documents is unknown.
Given how radically Google updates its systems, it’s unwise to assume that every one of these attributes currently affects search rankings.
We also don’t know how many of these are legacy and no longer used, or whether they’re used for other internal systems.
Is This Leak Really a Big Surprise?
None of these denials or this misinformation from Google should be surprising. Around 2006, Google hired multiple key engineers from Firefox, yet Eric Schmidt, who was then CEO, denied the company was building a browser. The beta version of Chrome was then released in September 2008. It’s just the way business works: companies mislead to maintain a competitive edge or uphold their brand and reputation.
Will SEO Change Now?
The March 2024 Google search leak potentially exposed many secrets, such as content whitelists, link tiers, and demotions. It also debunked a few misstatements, like the disregard for site authority in ranking and the absence of a sandbox.
Still, there’s no major change to SEO strategies for those of us who’ve been paying attention, running our own tests, and always taking Google’s word with a grain of salt.
At Vizion Interactive, we have the expertise, experience, and enthusiasm to get results and keep clients happy! Learn more about how our SEO Audits, Local Listing Management, Website Redesign Consulting, and B2B digital marketing services can increase sales and boost your ROI. But don’t just take our word for it, check out what our clients have to say, along with our case studies.