Guest Post by Roman Mureika
It was bound to happen eventually. We could see it coming – a feeding frenzy from “really, it is still getting warmer” to “we told you so: this is proof positive that the science is settled and we will all boil or fry!” The latest numbers are in and they show the “hottest” year since temperature data became available, depending on which data you look at.
The cheerleader this time around seems to have been AP science correspondent Seth Borenstein. Various versions of his essay on the topic have permeated most of America’s newspapers as well as my own hometown Canadian paper. In his articles, e.g. here and here, he throws enormous numbers at us involving probabilities actually calculated (and checked!) by real statisticians which purport to show that the temperatures are still rising and spiraling out of control:
Nine of the 10 hottest years in NOAA global records have occurred since 2000. The odds of this happening at random are about 650 million to 1, according to University of South Carolina statistician John Grego. Two other statisticians confirmed his calculations.
I was duly impressed by this and other numbers in Seth’s article and asked myself what else of this extremely unlikely nature one might find in the NOAA data. With a little bit of searching I was able to locate an interesting tidbit that they clearly missed. If we wind the clock back to 1945 and look at the temperatures up to that point, we notice that they also rose somewhat rapidly and new “hot” records were created. In fact, the graphic below shows that the highest 8 temperatures of the 65-year series to that point in time all belonged to the years 1937 to 1944. Furthermore, in that span of eight years, five of these were each a new record! How unlikely is that?
Using the techniques of the AP statisticians, a simple calculation indicates that the chance of all eight years being the highest is 1 in 5047381560 (the number of ways of choosing 8 years out of 65) – almost 8 times as unlikely as what occurred in the most recent years! Not to mention the five records…
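For readers who want to check the arithmetic, here is a minimal sketch of that calculation. It assumes, as the AP-style calculation does, that every choice of 8 years out of the 65 is equally likely to be the warmest set, and it takes the AP figure as exactly 650 million; the variable names are mine.

```python
import math

n_years = 65   # length of the series through 1944 (1880 to 1944)
n_top = 8      # the eight warmest years

# Number of ways to choose which 8 of the 65 years are the warmest.
# If yearly temperatures were independent, each choice would be equally likely.
n_subsets = math.comb(n_years, n_top)

# Ratio to the AP figure of about 650 million to 1 (taken here as exactly 650e6).
ratio_to_ap = n_subsets / 650e6

print(n_subsets)              # 5047381560
print(round(ratio_to_ap, 2))  # 7.77
```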
By now, most of the readers will be mumbling “Nonsense, all these probabilities are meaningless and irrelevant to real-world temperature series” … and they would be absolutely correct! The above calculations were done under the assumption that the temperature in any one year is independent of the temperature in any other year. If that were genuinely the case in the real world, a plot of the NOAA series would look like the gray curve in the plot shown below, which was produced by randomly re-ordering the actual temperatures (in red) from the NOAA data.
For a variety of physical reasons, measured real-world global temperatures have a strong statistical persistence. They do not jump up and down erratically by large amounts and, because of this property, they are strongly auto-correlated over a considerable period of time. Annual changes are relatively small and when the series has reached a particular level, it may tend to stay around that level for a period of years. If the initial level is a record high then subsequent levels will also be similarly high even if the cause for the initial warming is reduced or disappears. For that reason, making the assumption that yearly temperatures are “independent” leads to probability calculations whose results can bear absolutely no relationship to reality. Mr. Borenstein (along with some of the climate scientists he quoted) was unable to understand this and touted the resulting numbers as having enormous importance. The statisticians would probably have indicated to him what assumptions they had made, but he would very likely not have recognized the impact of those assumptions.
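The effect of random re-ordering on persistence can be illustrated with a small sketch. The series here is a hypothetical stand-in (a cumulative sum of small random annual changes), not the NOAA data, and the helper `lag1_autocorr` is my own; the point is only that shuffling a persistent series removes most of its lag-1 autocorrelation.

```python
import random

def lag1_autocorr(x):
    """Sample lag-1 autocorrelation of a sequence."""
    n = len(x)
    mean = sum(x) / n
    num = sum((x[i] - mean) * (x[i + 1] - mean) for i in range(n - 1))
    den = sum((v - mean) ** 2 for v in x)
    return num / den

rng = random.Random(0)

# Hypothetical persistent series (NOT the NOAA data): each year's level is
# last year's level plus a small random change.
series = []
level = 0.0
for _ in range(135):
    level += rng.gauss(0.006, 0.098)
    series.append(level)

# Randomly re-order the same values, which is what independence would imply.
shuffled = series[:]
rng.shuffle(shuffled)

r_original = lag1_autocorr(series)
r_shuffled = lag1_autocorr(shuffled)

print("lag-1 autocorrelation, original:", round(r_original, 2))
print("lag-1 autocorrelation, shuffled:", round(r_shuffled, 2))
```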
How would I have considered the problem of modelling the behaviour of the temperature series? My starting point would be to first look at the behaviour of the changes from year to year rather than the original temperatures themselves to see what information that might provide.
Plot the annual difference series:
Make a histogram:
Calculate some statistics:
Mean = 0.006 = (Temp_2014 – Temp_1880)/134
Median = 0.015
SD = 0.098
# Positive = 71, # Negative = 59, # Equal to 0 = 4
Autocorrelations: Lag1 = -0.225, Lag2 = -0.196, Lag3 = -0.114, Lag4 = 0.217
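A minimal sketch of how such statistics could be computed from a list of annual temperatures follows. The `temps` list is a synthetic placeholder, not the NOAA series, so the printed values will not match the ones above; the `autocorr` helper is my own. Note that the mean change always equals the end-to-end difference divided by the number of changes, since the intermediate terms cancel.

```python
import random
import statistics

def autocorr(x, lag):
    """Sample autocorrelation of a sequence at the given lag."""
    n = len(x)
    mean = sum(x) / n
    num = sum((x[i] - mean) * (x[i + lag] - mean) for i in range(n - lag))
    den = sum((v - mean) ** 2 for v in x)
    return num / den

# Placeholder annual values (NOT the NOAA data): 135 years, 1880 to 2014.
rng = random.Random(1)
temps = [0.0]
for _ in range(134):
    temps.append(temps[-1] + rng.gauss(0.006, 0.098))

# Year-to-year changes: 134 differences from 135 annual values.
changes = [b - a for a, b in zip(temps, temps[1:])]

mean_change = statistics.mean(changes)
endpoint_slope = (temps[-1] - temps[0]) / len(changes)  # same as the mean change
median_change = statistics.median(changes)
sd_change = statistics.stdev(changes)

n_pos = sum(c > 0 for c in changes)
n_neg = sum(c < 0 for c in changes)
n_zero = sum(c == 0 for c in changes)

acf = {lag: autocorr(changes, lag) for lag in range(1, 5)}

print("mean:", round(mean_change, 3), "median:", round(median_change, 3),
      "SD:", round(sd_change, 3))
print("positive:", n_pos, "negative:", n_neg, "zero:", n_zero)
print({lag: round(r, 3) for lag, r in acf.items()})
```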
The autocorrelations could use some further looking into; however, the plots indicate that it might not be unreasonable to assume that the annual changes are independent of each other and of the initial temperature. Now, one can examine the structure of the waiting time from one record year to the next. This can be done with a Monte Carlo procedure using the observed set of 134 changes as a “population” of values to estimate the probability distribution of that waiting time. In that procedure, we randomly sample the change population (with replacement) and continue until the cumulative total of the selected values is greater than zero for the first time. The number of values selected is the number of years it has taken to set a new record, and the total tells us the amount by which the record would be broken. This is repeated a very large number of times (in this case, 10000) to complete the estimation process.
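The procedure just described can be sketched roughly as follows. This is an illustration under stated assumptions, not the code used for the results in this post: the `changes` population is a synthetic placeholder (normal draws shifted to have mean 0.006) rather than the 134 observed NOAA changes, and the function name and the `max_years` safety cap are my own additions.

```python
import random

def waiting_time(changes, rng, max_years=100_000):
    """Resample changes with replacement until the cumulative total first
    exceeds zero. Returns (years, excess), or (None, None) if the safety
    cap max_years is reached before a new record is set."""
    total = 0.0
    for years in range(1, max_years + 1):
        total += rng.choice(changes)
        if total > 0:
            return years, total
    return None, None

rng = random.Random(42)

# Placeholder population (NOT the observed NOAA changes): 134 normal draws
# with scale 0.098, shifted so that their mean is exactly 0.006.
raw = [rng.gauss(0.0, 0.098) for _ in range(134)]
shift = 0.006 - sum(raw) / len(raw)
changes = [r + shift for r in raw]

n_trials = 10_000
results = [waiting_time(changes, rng) for _ in range(n_trials)]
waits = [years for years, _ in results if years is not None]
excesses = [excess for years, excess in results if years is not None]

# Estimated probability that the wait for a new record is exactly k years.
p_wait = {k: sum(w == k for w in waits) / n_trials for k in range(1, 10)}

for k, p in p_wait.items():
    print(k, round(p, 3))
print("longest wait:", max(waits))
```

The `excesses` list holds the amount by which each new record exceeds the old one, so it can be used for a histogram of record margins.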
The results are interesting. The probability of a new record in the year following a record temperature will obviously be the probability that the change between the two years is positive (71 / 134 = 0.530). A run of three or more consecutive record years would then occur about 28% of the time and a run of four or more about 15% of the time given an initial record year.
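The run probabilities quoted above follow from multiplying the one-year probability by itself, which is valid only under the assumption that successive changes are independent. A short sketch (the function name is mine):

```python
# Share of positive annual changes among the 134 observed changes.
p_up = 71 / 134

def prob_run_at_least(k, p=p_up):
    """Probability of k or more consecutive record years, given that the
    first year is a record and assuming independent annual changes:
    each of the next k - 1 changes must be positive."""
    return p ** (k - 1)

print(round(p_up, 3))                  # 0.53
print(round(prob_run_at_least(3), 2))  # 0.28
print(round(prob_run_at_least(4), 2))  # 0.15
```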
The first nine values of the probability distribution of the waiting time for a return to a new record as estimated by the Monte Carlo procedure look like this:
Years to new record    Probability
1                      0.520
2                      0.140
3                      0.064
4                      0.039
5                      0.027
6                      0.022
7                      0.016
8                      0.012
9                      0.012
Note the rapid drop in the probabilities. After the occurrence of a global record, the next annual temperature is also reasonably likely to be a record; however, when the temperature series drops down, it can often take a very long time for it to return to the record level. The probability that it will take at least 5 years is 0.24, at least 18 years is 0.10 and for 45 years or more it is 0.05. The longest return time in the 10000-trial MC procedure was 1661 years! This is due to the persistence characteristics inherent in the model, which are similar to those of a simple random walk or a Wiener process. However, unlike these stochastic processes, the temperature changes contain a positive “drift” of about 0.6 degrees per century because the mean change is not zero, which makes the return time to a new record somewhat shorter. A duplication of the same MC analysis using changes taken from a normal distribution with mean equal to zero (i.e. no “warming drift”) and standard deviation equal to that of the observed changes produces results very similar to those above.
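The “at least 5 years” figure can be checked directly from the waiting-time probabilities listed earlier: it is one minus the total probability of waits of 1 to 4 years. A small sketch using only the values given in this post (the function name is mine):

```python
# Waiting-time probabilities as listed above (years to new record: probability).
pmf = {1: 0.520, 2: 0.140, 3: 0.064, 4: 0.039, 5: 0.027,
       6: 0.022, 7: 0.016, 8: 0.012, 9: 0.012}

def prob_wait_at_least(k, pmf):
    """Probability that the wait is k years or longer, computed as one minus
    the probability of all shorter waits. Only valid for k up to one more
    than the largest wait listed in pmf."""
    return 1.0 - sum(p for years, p in pmf.items() if years < k)

print(round(prob_wait_at_least(5, pmf), 2))   # 0.24
print(round(prob_wait_at_least(10, pmf), 2))  # 0.15
```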
The following graph shows the probabilities that the wait for a new record will be a given number of years or longer.
This shows the distribution of the amount by which the old record would be exceeded:
For a more complete analysis of the situation, one would need to take into account the relationships within the change sequence as well as the possible correlation between the current temperature and the subsequent change to the next year (correlation = -0.116). The latter could be a partial result of the autocorrelation in the changes or an indication of negative feedbacks in the earth system itself.
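The correlation between the current temperature and the subsequent change can be computed by pairing each year's level with the change to the following year. Here is a minimal sketch on a synthetic placeholder series (not the NOAA data, so the value will differ from the -0.116 quoted above); the `pearson` helper is my own.

```python
import random

def pearson(x, y):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / (sxx * syy) ** 0.5

# Placeholder annual values (NOT the NOAA data): 135 years.
rng = random.Random(2)
temps = [0.0]
for _ in range(134):
    temps.append(temps[-1] + rng.gauss(0.006, 0.098))

# Pair each year's level with the change to the following year.
levels = temps[:-1]
next_changes = [b - a for a, b in zip(temps, temps[1:])]

r_level_change = pearson(levels, next_changes)
print(round(r_level_change, 3))
```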
Despite these caveats, it should be very clear that the probabilities calculated for the propaganda campaign to hype the latest record warming are pure nonsense with no relationship to reality. The behaviour of the global temperature series from NOAA in the 21st century is probabilistically unremarkable and consistent with the persistence characteristics of the temperature record as observed in the previous century. Assertions such as “the warmest x of y years were in the recent past” or “there were z records set”, made when the temperatures had already reached their pre-2000s starting level, do not provide evidence of the continuation of previous warming and show a lack of understanding of the character of the underlying situation. Any claims of an end to the “hiatus” based on a posited 0.04 C increase (which is smaller than the statistical uncertainty of the measurement process) are merely unscientifically motivated assertions with no substantive support. That these claims also come from some noted climate scientists indicates that their science takes a back seat to their activism and reduces their credibility on other matters as a result.
I might add that this time around I was pleased to see some climate scientists who were willing to publicly question the validity of the propaganda probabilities in social media such as Twitter. As well, the (sometimes reluctant) admissions that the 2014 records of other temperature agencies are in a “statistical tie” with their earlier records seem to be a positive step towards a more honest future discussion of the world of climate science.
Note: AP has added a “clarification” of various issues in the Seth Borenstein article:
In a story Jan. 16, The Associated Press reported that the odds that nine of the 10 hottest years have occurred since 2000 are about 650 million to one. These calculations, as the story noted, treated as equal the possibility of any given year in the records being one of the hottest. The story should have included the fact that substantial warming in the years just prior to this century could make it more likely that the years since were warmer, because high temperatures tend to persist.