Guest essay by Leo Goldstein
A Method of Google Search Bias Quantification and Its Application in Climate Debate and General Political Discourse
The percentage of a domain's traffic referred by Google Search, net of brand searches (PGSTN), tends to fall in or around the range 25%-30% for a broad class of web domains. This hypothesis is tested by calculating the correlation between the popularity of news and opinion websites and their PGSTN, which is found to be near zero. Thus, PGSTN can be used rigorously to detect and even quantify intentional bias in Google Search. Intentional bias is bias that has been introduced by internal Google decisions and is unrelated to external factors, such as the dominance of particular viewpoints on the web. Here, the PGSTN method is applied to detect intentional bias in the climate debate and in general political discourse.
Google Search is found to be extremely biased in favor of climate alarmism and against climate realism. The PGSTN ranges for climate realism and climate alarmism do not even overlap! Some of the most important climate realist domains, including the relatively uncontroversial judithcurry.com, have such a low PGSTN that they can be considered blacklisted by Google.
Google Search is found to be biased in favor of left/liberal domains and against conservative domains with a confidence of 95%. Further, certain hard-Left domains have such a high PGSTN that their standing raises suspicions that they have been hand-picked for prominent placement. Certain respected conservative domains are blacklisted.
Left-liberal political bias in Google Search has been noticed for years. See Robert Epstein et al., A Method for Detecting Bias in Search Rankings, with Evidence of Systematic Bias Related to the 2016 Presidential Election; Todd Dunning’s 2015-09-15 comment; and Leo Goldstein, Why are Search Engines so Hostile to Climate Realism?
These observations have not completely resolved the question of whether the bias was intentional or merely reflected biased web content. Recently, Google’s official Search Evaluation Guidelines have shown intentional bias against climate realism. At least one former Google employee has claimed intentional demotion of “anything non-PC” by the Google Search team.
This paper uses published SEO data from multiple sources, including BrightEdge Research, 2017: Organic Search Is Still the Largest Channel, updating its 2014 report. From here on, the term ‘bias’ means intentional bias. This paper formulates, substantiates, and applies a quantitative method of bias detection in Google Search.
It is known that Google Search provides 25%-30% of the user traffic to an average website. As Google executives and PR have repeated many times, the Google Search service exists to provide the most relevant and useful results for the user’s queries. Google Chairman Eric Schmidt even joked that there should be only one result for each query: the result that the user seeks. Google servers crawl the whole web, extracting text, links, and other data from trillions of pages. Google constantly and successfully fights attempts to promote websites artificially through collusive linking and other search engine optimization techniques. In this undertaking, Google also uses an enormous amount of off-web information, which it collects through the Chrome browser, other Google applications and services, analytics beacons, domain registrar status, and so on. This information includes domain popularity and ownership. Google also processes immediate feedback from users in the form of the frequency of clicks on the results, the bounce rate, the frequency of repeated searches with modified terms, etc.
Google is very good at its job. Sites and domains that are less popular with visitors tend to be less likely to receive traffic from Google, and vice versa. The effect is that the percentage of net traffic that domains receive from Google Search tends to be similar across web domains! This is illustrated by the nearly zero correlation between a domain's popularity and the percentage of net traffic it receives from Google within each of the two sets, the left/liberal media and the conservative media. The correlation is near zero even though domain popularity (the Alexa.com rank, where lower values mean higher popularity) varies from 24 to 1,469 in the left/liberal media set and from 56 to 12,795 in the conservative media set. Traffic from Google ads is about 5% of the total Google traffic, so it is not a factor. “Net traffic,” as used throughout this research, excludes traffic from users who intentionally search for the website by its name (i.e., searches for ‘foxnews’ and ‘fox news’ are excluded from the net traffic for foxnews.com). Net traffic better reflects Google's intent, because Google Search does not have much choice when the user searches for a website by its name or brand. Alexa.com provides information that allows PGSTN to be calculated for hundreds of thousands of web domains.
Given the robustness of PGSTN, I conclude that a statistically significant difference in PGSTN between a priori defined sets of comparable domains is due to intentional bias by Google, unless there is another good explanation.
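The near-zero correlation claim above can be checked with a plain Pearson coefficient between Alexa rank and PGSTN within one set of domains. The following is a minimal sketch, not the author's actual calculation: the function name `pearson` is my own, and the sample numbers are placeholders for illustration only (only the rank endpoints 24 and 1,469 come from the text; the PGSTN values are invented and are not taken from the spreadsheet).

```python
from math import sqrt
from typing import Sequence


def pearson(xs: Sequence[float], ys: Sequence[float]) -> float:
    """Pearson correlation coefficient of two equal-length samples."""
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("need two samples of equal length, at least 2 points")
    mean_x = sum(xs) / len(xs)
    mean_y = sum(ys) / len(ys)
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    if var_x == 0 or var_y == 0:
        raise ValueError("correlation is undefined for a constant sample")
    return cov / sqrt(var_x * var_y)


# Placeholder values for illustration only; NOT the essay's data.
alexa_rank = [24, 130, 410, 900, 1469]
pgstn_percent = [27.0, 31.5, 24.0, 29.5, 26.0]

r = pearson(alexa_rank, pgstn_percent)
print(f"r = {r:+.2f}")
```

A value of r close to zero within a set would be consistent with the claim that PGSTN does not depend on popularity. Because ranks span several orders of magnitude, correlating against the logarithm of the rank is a reasonable variant, though the essay does not say which form was used.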
All the data in this research is based on Alexa (free version) snapshots from September 4, 2017. For each domain, Google Search Total was taken from the “Upstream Sites | Which sites did people visit immediately before this site?” table. Branded traffic was taken from the “Top Keywords from Search Engines | Which search keywords send traffic to this site?” table. Only the top five values, which are the ones shown in the free Alexa snapshots, were used. All Google search domains shown in the table were included (google.com, google.ca, google.co.uk, google.co.in, etc.). If the total of the branded traffic was less than 5%, the value 1% was entered. PGSTN was calculated by deducting branded search traffic from the total Google Search traffic.
PGSTN is not expected to provide sufficient certainty for individual domains, because multiple factors influence it, including possible error in Alexa data. Nevertheless, Google's attitude toward a domain has been provisionally noted and color coded in the attached spreadsheet PGSTN-Domains.xlsx as follows:
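The calculation described above can be sketched as a small function. This is a literal reading of the paragraph, not the author's spreadsheet logic: the function name `pgstn`, the input shapes, and the rule that an upstream site counts as Google Search when its name starts with "google." are all assumptions made for illustration, and the example figures are hypothetical.

```python
from typing import Iterable, Mapping

BRANDED_THRESHOLD = 5.0  # percent; below this the text says 1% "was entered"
BRANDED_FALLBACK = 1.0   # percent


def pgstn(upstream: Mapping[str, float], branded_shares: Iterable[float]) -> float:
    """PGSTN in percent: total Google Search share minus branded search share.

    upstream       -- upstream site name -> share of visits in percent
    branded_shares -- shares (percent) of the top keywords judged to be branded
    """
    # Assumption: every Google search domain appears as "google.<tld>".
    google_total = sum(
        share for site, share in upstream.items() if site.startswith("google.")
    )
    branded = sum(branded_shares)
    if branded < BRANDED_THRESHOLD:
        branded = BRANDED_FALLBACK
    return google_total - branded


# Hypothetical snapshot values, for illustration only.
upstream = {"google.com": 30.0, "google.co.uk": 2.0, "facebook.com": 8.0}
print(pgstn(upstream, [4.0, 2.5]))  # 25.5
```

Deciding which keywords count as branded is a manual judgment in the essay, so the sketch takes the branded shares as an input rather than trying to detect them.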
Whitelist / Green Light: >36%
Normal: 20%-36%
Grey Area: 12%-20%
Blacklist: <=12%
Most domains were expected (based on the cited SEO research) to have PGSTN in the 20%-36% range. This expectation has been met. PGSTN <= 12% provisionally indicates that the domain is blacklisted by Google. Everything between the blacklist and the normal range is considered a grey area. Finally, PGSTN > 36% provisionally indicates unusual favoritism by Google.
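The provisional thresholds above can be written as a simple classifier. This is a sketch under one stated assumption: the text does not say which band a value of exactly 20% belongs to, so it is assigned to the normal range here. The function name and the label strings are my own.

```python
def classify_pgstn(pgstn_percent: float) -> str:
    """Map a PGSTN value (in percent) to the essay's provisional bands."""
    if pgstn_percent > 36.0:
        return "whitelist"   # unusual favoritism
    if pgstn_percent >= 20.0:
        return "normal"      # assumption: exactly 20% counts as normal
    if pgstn_percent > 12.0:
        return "grey"        # between the blacklist and the normal range
    return "blacklist"       # PGSTN <= 12%


for value in (6.3, 17.4, 23.5, 52.4):
    print(value, classify_pgstn(value))
```

As the essay notes, a label for a single domain is only provisional, so the classifier is best applied to sets of domains rather than read as a verdict on any one site.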
Google Bias in Climate Debate
The domains were selected mostly according to Alexa classification. Detecting the extreme bias of Google Search against climate realism did not require statistical methods.
There is a huge gap between the PGSTN of climate realism domains (6.3%-17.4%) and the PGSTN of climate alarmism domains (23.5%-52.4%). The gap between the two ranges is 6.1 percentage points. Except for drroyspencer.com, all climate realism domains are blacklisted by Google (their PGSTN is 6.3%-11.0%).
On the other hand, self-appointed “fact checkers,” including snopes.com and politifact.com, have a PGSTN of about 50%. That gives grounds for the suspicion that they have been hand-picked by Google for prioritization. Two other sites with suspiciously high PGSTN are sourcewatch.org (PGSTN = 50.1%) and prwatch.org (PGSTN = 40.9%). These two sites heavily cross-link (each refers to the other as a source), have overlapping content, and are known to Google to belong to the same organization, the Center for Media and Democracy. These are well-known signs of spam, yet Google has not only failed to downrank them as spam but has likely prioritized them manually.
This section includes netrootsnation.org, the site of a radical left conference that is not specifically geared toward climate alarmism. Its PGSTN is 44.5%. This domain could have been hand-picked, or its owners could have been advised by Google insiders on gaming the rankings. Google has funded the conference, and Google representatives have attended it and given presentations on relevant subjects, like this one. A quote:
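The non-overlap of the two ranges, and the size of the gap between them, follows directly from the range endpoints quoted above. A minimal sketch, with a helper name (`range_gap`) of my own choosing:

```python
from typing import Sequence


def range_gap(lower_set: Sequence[float], upper_set: Sequence[float]) -> float:
    """Distance from the top of lower_set to the bottom of upper_set.

    A positive result means the two sets do not overlap;
    a negative result means they do.
    """
    return min(upper_set) - max(lower_set)


realist_range = (6.3, 17.4)    # PGSTN range endpoints quoted in the text, percent
alarmist_range = (23.5, 52.4)

gap = range_gap(realist_range, alarmist_range)
print(f"gap = {gap:.1f} percentage points")  # gap = 6.1 percentage points
```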
“We’ll share some ways to leverage the power of online video and how to integrate Google and YouTube’s tools with other advocacy efforts.”
All other alarmist domains have PGSTN in the whitelisted or normal range.
Google Bias in General Political Discourse
To quantify Google general political bias, I selected top U.S. news and opinions sites by their ranking in Alexa, then added some lower ranking conservative sites based on my personal knowledge and/or Alexa suggestions. There was an element of subjectivity in selection and classification, and I omitted some domains that I could not classify. Nevertheless, the most popular domains in both left/liberal (including Left, Mainstream Liberal, and Mainstream Center) and conservative (including Conservative and Mainstream Conservative) categories have been selected and classified rigorously, and use of weighted statistics minimized the element of subjectivity in the results.
The results show that Google Search is heavily biased against conservative domains, and some respectable conservative domains seem to be blacklisted:
There might be an alternative or additional explanation for low PGSTN of the Drudge Report – the site mostly consists of links to articles on other sites, a practice Google looks down on.
On average, the PGSTN of the conservative domains is little more than half that of the left/liberal ones: conservative 15.5% (standard deviation 5.1%) vs. left/liberal 27.4% (standard deviation 4.9%). The hypothesis of left/liberal bias in Google Search is confirmed with a confidence of 95%.
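The group averages above are described earlier as weighted statistics. The essay does not state the weighting scheme or which standard deviation formula was used, so the following is only a sketch: it assumes non-negative per-domain weights (for example, some measure of traffic) and a population-style weighted variance. The function name and the sample numbers are hypothetical.

```python
from math import sqrt
from typing import Sequence, Tuple


def weighted_mean_std(
    values: Sequence[float], weights: Sequence[float]
) -> Tuple[float, float]:
    """Weighted mean and population-style weighted standard deviation."""
    if len(values) != len(weights) or not values:
        raise ValueError("values and weights must be non-empty and equal in length")
    total = sum(weights)
    if total <= 0:
        raise ValueError("weights must sum to a positive number")
    mean = sum(w * v for v, w in zip(values, weights)) / total
    variance = sum(w * (v - mean) ** 2 for v, w in zip(values, weights)) / total
    return mean, sqrt(variance)


# Hypothetical PGSTN values (percent) and weights, for illustration only.
mean, std = weighted_mean_std([10.0, 20.0], [3.0, 1.0])
print(f"mean = {mean:.1f}%, std = {std:.1f}%")
```

With equal weights this reduces to the ordinary mean and population standard deviation, which makes it easy to see how much the choice of weights moves the group figures.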
Although PGSTN of individual domains is not sufficient for conclusions, I cannot avoid noticing that extremist websites, such as dailystormer.com (PGSTN = 13.6%; ceased to exist by the time of the research) and dailykos.com (PGSTN = 20.2%) are preferred by Google over many conservative and climate realist domains.
Google Search is biased in favor of left/liberal websites against conservative websites, and is extremely biased in favor of climate alarmism against climate realism.
I hold short positions in Google stock.
The references are in the body of the article.
Alexa snapshots are available from https://defyccc.com/data/PGSTN-Snapshots.7z (compressed with 7-Zip).
Contact Author: Leo Goldstein, DefyCCC.com, firstname.lastname@example.org