You’ve probably heard about the recent Google document leak. It’s all over the major industry sites and social media.
Where do the docs come from?
My understanding is that a bot called yoshi-code-bot leaked documents related to the Content API Warehouse on GitHub on March 13, 2024. They may have appeared earlier in some other repos, but this is the first one I’ve found.
They were discovered by Erfan Azimi, who shared them with Rand Fishkin, who shared them with Mike King. The docs were removed on May 7.
I appreciate everyone involved in sharing their findings with the community.
Google’s response
There’s some debate about whether these documents are real, but they reference a lot of internal systems and link to internal documentation, and they certainly seem real.
A Google spokesperson released the following statement to Search Engine Land:
We would caution against making inaccurate assumptions about Search based on out-of-context, outdated, or incomplete information. We’ve shared extensive information about how Search works and the types of factors that our systems weigh, while also working to protect the integrity of our results from manipulation.
SEOs interpret things based on their own experiences and biases
Many SEOs are saying that ranking factors leaked. I haven’t seen any code or weights, just what appear to be descriptions and storage info. Unless one of the descriptions says an item is used for ranking, I think it’s dangerous for SEOs to assume that all of these are used in ranking.
Having some features or information stored doesn’t mean they’re used in ranking. For our search engine, Yep.com, we store all kinds of things that might be used for crawling, indexing, ranking, personalization, testing, or feedback. We keep a lot of stuff we haven’t used yet but may in the future.
What’s more likely is that SEOs are making assumptions that support their own opinions and biases.
It’s the same for me. I may not have the full context or knowledge, and I may have inherent biases that affect my interpretation, but I try to be as fair as I can. If I’m wrong, that means I’ll learn something new, and that’s a good thing! Other SEOs can, and do, interpret things differently.
Gael Breton said it well:
What I learned from the Google leak:
Everyone sees what they want to see.
🔗 Link sellers tell you it proves links are still important.
📕 Semantic SEO folks tell you it proves they were right.
👼 Niche site owners tell you this is why their sites dropped.
👩‍💼 Agencies tell you…
– Gael Breton (@GaelBreton) May 28, 2024
I’ve been around long enough to see a lot of SEO myths created over the years, and I can tell you who started most of them and what they misunderstood. We’ll likely see a lot of new myths from this leak unfold over the next decade or so.
Let’s look at a few things that, in my opinion, are being misconstrued or where conclusions are being drawn that shouldn’t be.
SiteAuthority
As far as I can tell, we don’t know that Google has a Site Authority score that’s used in ranking the way DR is. The section it appears in is dedicated to compressed quality signals, and it does talk about quality.
I believe DR is more of an effect that emerges from having a lot of pages with strong PageRank, not necessarily something Google would use directly. Lots of pages with higher PageRank linking to each other make it more likely that you create stronger pages.
- Do I believe that PageRank can be part of what Google calls quality? Yes, sure.
- Do I think that’s all it is? No.
- Could Site Authority be something like DR? Maybe. It fits into the bigger picture.
- Can I prove that, or even that it’s used in ranking? No, not from this.
From some of Google’s testimony in the US Department of Justice trial, we learned that quality is often measured with an Information Satisfaction (IS) score from raters. It’s not used directly in ranking, but it is used for feedback, testing, and fine-tuning models.
We know that quality raters have the concept of EEAT, but EEAT itself isn’t what Google uses. They use signals that align with EEAT.
Some of the EEAT signals that Google has mentioned are:
- PageRank
- Mentions on authoritative sites
- Site queries. This could be a query like “site:ahrefs.com EEAT” or a search like “ahrefs EEAT”
So could some kind of PageRank score extrapolated to the domain level, called Site Authority, be used by Google as part of what makes up a quality signal? I’d say it makes sense, but this leak doesn’t prove it.
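To make that speculation concrete, here’s a minimal sketch, not anything from the leak or Google’s actual method: compute page-level PageRank on a toy link graph, then average it per host into a crude domain-level score. The graph, the hosts, and the mean aggregation are all assumptions for illustration.

```python
import networkx as nx

# Hypothetical toy link graph; none of these URLs or scores come from the leak.
G = nx.DiGraph()
G.add_edges_from([
    # strong-site.com: several interlinked pages that also earn outside links
    ("blog.example/post", "strong-site.com/a"),
    ("news.example/story", "strong-site.com/a"),
    ("strong-site.com/a", "strong-site.com/b"),
    ("strong-site.com/b", "strong-site.com/c"),
    ("strong-site.com/c", "strong-site.com/a"),
    # weak-site.com: a single page with one external link
    ("blog.example/post", "weak-site.com/x"),
])

pagerank = nx.pagerank(G, alpha=0.85)  # classic damping factor

def host(url: str) -> str:
    """Crude host extraction, good enough for this toy example."""
    return url.split("/")[0]

# Roll page-level scores up to a host-level "authority"-style score.
host_scores: dict[str, list[float]] = {}
for url, score in pagerank.items():
    host_scores.setdefault(host(url), []).append(score)

for h, scores in sorted(host_scores.items()):
    print(f"{h}: mean page PageRank = {sum(scores) / len(scores):.4f}")
```

In a toy graph like this, the host whose pages interlink and attract links ends up with the higher aggregate, which is the effect described above: DR-style scores emerging from page-level PageRank rather than being a separate input.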
I can recall three Google patents I’ve seen related to quality scores. One of them is in line with the site-query signal above.
I should point out that just because something is patented doesn’t mean it’s used. The site-query patent was co-written by Navneet Panda. Want to guess who the Panda algorithm, which is all about quality, was named after? I’d say there’s a decent chance this one is used.
Another is around the use of n-grams, seemingly to calculate quality scores for new websites, and another is around time on the site.
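For a rough sense of how an n-gram-based quality model could work, here’s a toy sketch under my own assumptions, not the patent’s actual method: project the quality scores of known sites onto the n-grams they contain, then estimate a new site’s score from the n-grams it shares. All the text and scores here are made up.

```python
from collections import defaultdict

def ngrams(text: str, n: int = 2) -> set:
    """Return the set of word n-grams in a piece of text."""
    words = text.lower().split()
    return {tuple(words[i:i + n]) for i in range(len(words) - n + 1)}

# Hypothetical "known" sites with made-up quality scores.
known_sites = {
    "detailed guide with original research and clear sourcing": 0.9,
    "buy cheap pills best price buy now cheap pills": 0.1,
}

# Average each known site's score onto the n-grams it contains.
gram_scores = defaultdict(list)
for text, score in known_sites.items():
    for gram in ngrams(text):
        gram_scores[gram].append(score)

def estimate_quality(text: str) -> float:
    """Estimate a new site's quality from n-grams it shares with known sites."""
    matched = [sum(s) / len(s) for g in ngrams(text) if (s := gram_scores.get(g))]
    return sum(matched) / len(matched) if matched else 0.5  # 0.5 = no evidence

print(estimate_quality("original research with clear sourcing"))  # scores high
print(estimate_quality("buy cheap pills best price"))             # scores low
```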
Sandbox
I think this has been misinterpreted as well. The docs have a field called hostAge and refer to the sandbox, but specifically say that it’s used “to sandbox fresh spam during serving.”
To me, that doesn’t confirm a sandbox in the way SEOs talk about it, where new sites can’t rank. It reads more like a spam-protection measure.
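Read that way, hostAge looks more like one condition in a serving-time spam filter than a blanket penalty on new sites. Here’s a hypothetical sketch of the distinction; the field names, thresholds, and demotion logic are all my assumptions, not anything from the docs.

```python
from dataclasses import dataclass

@dataclass
class Result:
    url: str
    score: float
    host_age_days: int       # stands in for the leaked hostAge attribute
    spam_probability: float  # hypothetical spam-classifier output

def apply_fresh_spam_sandbox(results: list[Result]) -> list[Result]:
    """Demote results only when a host is BOTH very new and likely spam."""
    for r in results:
        if r.host_age_days < 30 and r.spam_probability > 0.8:
            r.score *= 0.1  # heavy demotion at serving time
    return sorted(results, key=lambda r: r.score, reverse=True)
```

The point of the sketch is the conjunction: a brand-new but legitimate site would pass through untouched, which matches the “fresh spam” wording better than a blanket new-site sandbox does.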
Clicks
Are clicks used in ranking? Well, yes, and no.
We know Google uses clicks for things like personalization, timely events, testing, and suggestions. We know they have systems trained on click data, including Navboost. But is click data accessed directly and used in ranking? Nothing I’ve seen confirms that.
The problem is that SEOs interpret this as CTR being a ranking factor. Navboost is built to predict which pages and features will be clicked on. It’s also used to cut down the number of returned results, which we learned from the DOJ trial.
As far as I can tell, nothing confirms that it uses click data on individual pages to re-order results, or that getting more people to click on an individual result will raise its ranking.
It should be easy enough to prove at a small scale, and it’s been tried many times. I tried it last year using the Tor network. My friend Russ Jones (may he rest in peace) tried it using residential proxies.
I’ve never seen a successful version of this test, and people have been buying and trading clicks on various sites for years. I’m not trying to discourage you, though. Test it yourself and, if possible, publish the study.
Rand Fishkin’s tests, where he asked a conference audience to search for a term and click a specific result, showed that Google uses click data for trending events and will temporarily boost whatever gets clicked. The results went right back to normal after the experiments ended. That’s not the same as clicks being used for normal rankings.
Authors
We know Google matches authors with entities in the Knowledge Graph and uses them in Google News.
There seems to be a good amount of author info in the documents, but nothing confirms it’s used in ranking, as some SEOs are speculating.
Is Google lying to us?
What I wholeheartedly disagree with is SEOs being angry with the Google Search Advocates and calling them liars. They’re good people just doing their jobs.
If they told us something that turned out to be wrong, it’s likely because they didn’t know, they were misinformed, or they were instructed not to share details to prevent abuse. They don’t deserve the hate the SEO community is giving them right now. We’re lucky that they share information with us at all.
If you think something they said is wrong, run a test to prove it. Or if there’s a test you want me to run, let me know. Just being mentioned in the docs isn’t proof that something is used in ranking.
A final thought
Whether I agree or disagree with other SEOs’ interpretations, I appreciate everyone who’s willing to share their analysis. It’s not easy to put yourself or your thoughts out there for public scrutiny.
I also want to reiterate that unless the docs specifically say a feature is used in ranking, that information could easily be used for something else. We definitely don’t need any posts about Google’s 14,000 ranking factors.
If you want my thoughts on a particular topic, message me on X or LinkedIn.