A trove of leaked Google documents offers an unprecedented look inside Google Search and reveals many of the elements that appear to be used to rank content. The documents shed light on how Google may use clicks, links, content, entities, Chrome data, and more for ranking.
What the leaked Google Search documents reveal
Thousands of documents, which appear to come from Google’s internal Content API Warehouse, were released on March 13 on GitHub by an automated bot called yoshi-code-bot. These documents were shared with Rand Fishkin, SparkToro co-founder, earlier this month. This leak offers a glimpse into how Google’s ranking algorithm may work, providing invaluable insights for SEOs.
Key findings from the Google Search documents
According to Fishkin and Michael King, iPullRank CEO, who reviewed and analyzed the documents, here’s what we know:
- Ranking features: The API documentation includes 2,596 modules with 14,014 attributes. However, the documents do not specify how any of these ranking features are weighted.
- Twiddlers: These are re-ranking functions that can adjust the information retrieval score of a document or change its ranking.
- Demotions: Content can be demoted for various reasons, including a link that does not match its target site, SERP signals indicating user dissatisfaction, and signals tied to product reviews, location, exact match domains, and pornographic content.
- Change history: Google keeps a copy of every version of every page it has ever indexed, but only uses the last 20 changes of a URL when analyzing links.
- Links: Link diversity and relevance remain key factors, and PageRank is still very much alive within Google’s ranking features.
- Clicks: Google uses various click measurements, including badClicks, goodClicks, lastLongestClicks, and unsquashedClicks.
- Content length: Longer documents may get truncated, while shorter content receives a score based on originality.
- Brand importance: Building a notable, popular, well-recognized brand is crucial for improving organic search rankings and traffic.
- Entities and authorship: Google stores author information associated with content and tries to determine whether an entity is the author of the document.
- SiteAuthority: Google uses a concept called “siteAuthority,” which suggests that low-quality content on part of a site can impact the site’s overall ranking.
- Chrome data: A module called ChromeInTotal indicates that Google uses data from its Chrome browser for ranking.
- Whitelists: Some modules indicate that Google whitelists certain domains related to elections and COVID-19.
- Small sites: A feature called smallPersonalSite may allow Google to boost or demote small personal sites or blogs, though the weighting of this feature remains uncertain.
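To make the idea of a twiddler more concrete, here is a minimal Python sketch of a re-ranking step that adjusts an initial retrieval score. It illustrates the general concept only and is not taken from the leaked documents: the `Result` class, the `demote_domain` helper, the 0.5 factor, and the example URLs are all hypothetical.

```python
from dataclasses import dataclass
from typing import Callable, List


@dataclass
class Result:
    url: str
    score: float  # initial information retrieval score (hypothetical scale)


# In this sketch, a "twiddler" is any function that returns an adjusted score.
Twiddler = Callable[[Result], float]


def demote_domain(domain: str, factor: float) -> Twiddler:
    """Build a twiddler that scales the score of results from one domain."""
    def twiddle(result: Result) -> float:
        if domain in result.url:
            return result.score * factor
        return result.score
    return twiddle


def rerank(results: List[Result], twiddlers: List[Twiddler]) -> List[Result]:
    """Apply each twiddler in order, then sort by the adjusted score."""
    adjusted = []
    for r in results:
        score = r.score
        for t in twiddlers:
            score = t(Result(r.url, score))
        adjusted.append(Result(r.url, score))
    return sorted(adjusted, key=lambda r: r.score, reverse=True)


results = [
    Result("https://a.example/page", 0.9),
    Result("https://b.example/page", 0.8),
]
reranked = rerank(results, [demote_domain("a.example", 0.5)])
for r in reranked:
    print(r.url, round(r.score, 2))
```

In this toy example the page that started on top ends up second once the demotion is applied. How Google actually chains or weights such functions is not specified in the documents.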
Implications for SEOs from the Google leak
The leak highlights how complex and multifaceted search ranking is. For SEOs and marketers, understanding these elements can help refine strategies and improve search performance. Other details in the documents that may be relevant:
- Freshness matters: Google looks at dates in the byline, URL, and on-page content.
- Core topics: Google vectorizes pages and sites, comparing page embeddings to site embeddings to determine if a document is a core topic of the website.
- Domain registration information: Google stores this information as part of its ranking considerations.
- Page titles: The titlematchScore feature measures how well a page title matches a query.
- Font size: Google measures the average weighted font size of terms in documents and anchor text.
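The core-topics idea, comparing a page embedding to a site embedding, can be sketched with cosine similarity. The Python example below is an assumption-laden illustration, not Google's method: the three-dimensional vectors are made up, and averaging page embeddings to get a site embedding is a simplification chosen for this sketch.

```python
import math
from typing import Dict, List, Sequence


def cosine_similarity(a: Sequence[float], b: Sequence[float]) -> float:
    """Cosine of the angle between two vectors; 0.0 if either has zero length."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


def mean_vector(vectors: List[Sequence[float]]) -> List[float]:
    """Element-wise average, used here as a stand-in for a site embedding."""
    n = len(vectors)
    return [sum(column) / n for column in zip(*vectors)]


# Hypothetical page embeddings for a site that mostly covers SEO.
page_embeddings: Dict[str, List[float]] = {
    "/seo-guide": [0.9, 0.1, 0.0],
    "/link-building": [0.8, 0.2, 0.1],
    "/banana-bread": [0.0, 0.1, 0.9],
}

site_embedding = mean_vector(list(page_embeddings.values()))
scores = {
    path: cosine_similarity(vector, site_embedding)
    for path, vector in page_embeddings.items()
}

# Pages with the lowest similarity are the furthest from the site's main topic.
for path, score in sorted(scores.items(), key=lambda item: item[1]):
    print(f"{path}: {score:.2f}")
```

Under these made-up numbers, the off-topic page scores noticeably lower than the two SEO pages, which is the kind of signal a core-topic comparison would rely on.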
The leaked documents offer a detailed look at the factors that may influence Google Search rankings. Because they do not say how any feature is weighted, they show what Google can measure rather than how much each signal counts.