Guest Post by Willis Eschenbach
Anthony Watts, Lucia Liljegren, and Michael Tobis have all done a good job blogging about Jeff Masters’ egregious math error. His error was to claim that a run of high US temperatures had only a 1 in 1.6 million chance of being a natural occurrence. Here’s his claim:
U.S. heat over the past 13 months: a one in 1.6 million event
Each of the 13 months from June 2011 through June 2012 ranked among the warmest third of their historical distribution for the first time in the 1895 – present record. According to NCDC, the odds of this occurring randomly during any particular month are 1 in 1,594,323. Thus, we should only see one more 13-month period so warm between now and 124,652 AD–assuming the climate is staying the same as it did during the past 118 years. These are ridiculously long odds, and it is highly unlikely that the extremity of the heat during the past 13 months could have occurred without a warming climate.
All of the other commenters pointed out reasons why he was wrong … but they didn’t get to what is right.
Let me propose a different way of analyzing the situation … the old-fashioned way, by actually looking at the observations themselves. There are a couple of oddities to be found there. To analyze this, I calculated, for each year of the record, how many of the months from June to June inclusive were in the top third of the historical record. Figure 1 shows the histogram of that data, that is to say, it shows how many June-to-June periods had one month in the top third, two months in the top third, and so on.
Figure 1. Histogram of the number of June-to-June months with temperatures in the top third (tercile) of the historical record, for each of the past 116 years. Red line shows the expected number if they have a Poisson distribution with lambda = 5.206, and N (number of 13-month intervals) = 116. The value of lambda has been fit to give the best results.
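For readers who want to reproduce the counting step just described, here is a minimal Python sketch (my reconstruction, not the actual worksheet used for the post). It assumes a hypothetical NumPy array `monthly` of shape (n_years, 12) holding the NOAA monthly US temperatures, and for simplicity it flags each month against the warmest third of that calendar month’s full record; the post itself compares each month only to the record up to that date.

```python
import numpy as np

def june_to_june_counts(monthly):
    """monthly: array of shape (n_years, 12), one row per year, Jan..Dec.
    For every complete June-to-June span, return how many of those 13
    months fall in the warmest third of their calendar month's record."""
    n_years, _ = monthly.shape
    # Warmest-third (top tercile) threshold for each calendar month
    thresholds = np.percentile(monthly, 100 * 2 / 3, axis=0)
    in_top_third = (monthly > thresholds).ravel()  # flattened, Jan of year 0 first
    counts = []
    for y in range(n_years - 1):
        start = y * 12 + 5                         # June of year y
        counts.append(int(in_top_third[start:start + 13].sum()))
    return np.array(counts)

# Histogram like Figure 1: how many windows had 0, 1, ..., 13 "winning" months
# hist = np.bincount(june_to_june_counts(monthly), minlength=14)
```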
The first thing I noticed when I plotted the histogram is that it looked like a Poisson distribution. This is a very common distribution for data that represent discrete occurrences, as in this case. Poisson distributions cover things like how many people you’ll find in line at a bank at any given instant, for example. So I overlaid the data with a Poisson distribution, and I got a good match.
Now, looking at that histogram, the finding of one period in which all thirteen were in the warmest third doesn’t seem so unusual. In fact, with the number of years that we are investigating, the Poisson distribution gives an expected value of 0.2 occurrences. In this case, we find one occurrence where all thirteen were in the warmest third, so that’s not unusual at all.
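As a rough check on that 0.2 figure, here is a sketch of the expectation using the lambda quoted in the Figure 1 caption. The `fit_lambda` helper is hypothetical and simply shows one way such a fit could be done by least squares (the Poisson maximum-likelihood value would just be the sample mean of the counts); it assumes the `counts` array from the sketch above.

```python
import numpy as np
from scipy.optimize import minimize_scalar
from scipy.stats import poisson

def fit_lambda(counts, k_max=13):
    """Least-squares fit of a Poisson curve to the histogram of the counts."""
    hist = np.bincount(counts, minlength=k_max + 1)[:k_max + 1]
    ks = np.arange(k_max + 1)
    sse = lambda lam: np.sum((hist - len(counts) * poisson.pmf(ks, lam)) ** 2)
    return minimize_scalar(sse, bounds=(1.0, 10.0), method="bounded").x

# Expected number of June-to-June windows with all 13 months in the top third,
# using lambda = 5.206 and N = 116 windows:
print(116 * poisson.pmf(13, 5.206))  # roughly 0.2
```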
Once I did that analysis, though, I thought “Wait a minute. Why June to June? Why not August to August, or April to April?” I realized I wasn’t looking at the full universe from which we were selecting the 13-month periods. I needed to look at all of the 13-month periods, from January-to-January to December-to-December.
So I took a second look, and this time I looked at all of the possible contiguous 13-month periods in the historical data. Figure 2 shows a histogram of all of the results, along with the corresponding Poisson distribution.
Figure 2. Histogram of the number of months with temperatures in the top third (tercile) of the historical record for all possible contiguous 13-month periods. Red line shows the expected number if they have a Poisson distribution with lambda = 5.213, and N (number of 13-month intervals) = 1374. Once again, the value of lambda has been fit to give the best results.
Note that the total number of periods is much larger (1374 instead of 116) because we are looking, not just at June-to-June, but at all possible 13-month periods. Note also that the fit to the theoretical Poisson distribution is better, with Figure 2 showing only about 2/3 of the RMS error of the first dataset.
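Only the windowing step changes for the all-periods version; a minimal sketch (again assuming the flattened tercile flags from the first snippet) takes a running sum over every contiguous 13-month span.

```python
import numpy as np

def all_window_counts(in_top_third_flat):
    """in_top_third_flat: flattened boolean array of monthly top-third flags.
    Returns the top-third count for every contiguous 13-month window."""
    hits = in_top_third_flat.astype(int)
    return np.convolve(hits, np.ones(13, dtype=int), mode="valid")

# A full 116-year monthly record gives 116 * 12 - 12 = 1380 such windows,
# the same order as the 1374 quoted above (the exact count depends on where
# the record starts and ends).
```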
The most interesting thing to me is that in both cases, I used an iterative fit (Excel solver) to calculate the value for lambda. And despite there being 12 times as much data in the second analysis, the values of the two lambdas agreed to two decimal places. I see this as strong confirmation that indeed we are looking at a Poisson distribution.
Finally, the sting in the end of the tale. With 1374 contiguous 13-month periods and a Poisson distribution, the number of periods with 13 winners that we would expect to find is 2.6 … so in fact, far from Jeff Masters’ claim that finding 13 in the top third is a one-in-a-million chance, my results show that finding only one case with all thirteen in the top third is actually below the number we would expect, given the size and the nature of the dataset …
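A quick back-of-the-envelope check of that figure, sketched with the Figure 2 values (the exact number depends on the fitted lambda and the window count):

```python
from scipy.stats import poisson

# Expected count of 13-for-13 windows among 1374 contiguous 13-month periods,
# with lambda = 5.213: roughly 2.5, the same ballpark as the 2.6 quoted above.
print(1374 * poisson.pmf(13, 5.213))
```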
w.
Data Source, NOAA US Temperatures, thanks to Lucia for the link.
KR says:
July 14, 2012 at 10:14 pm
I have no clue what Tamino did, nor do I care. I don’t deal with or visit web sites that ban people for asking scientific questions, or that censor those questions. If other people took the same action, those sites would wither and die. And in fact that’s what they seem to be doing: both Tamino’s site and RealClimate have fallen entirely out of the Alexa ratings because their readership is too low, and deservedly so … while ClimateAudit and WUWT are doing well. But I digress.
I said from the start that I was not answering the same question that Masters and Lucia were answering, viz:
I guess that wasn’t clear enough for you, so let me say it again. I’m not attempting to answer the question they asked, which relates to some imaginary climate with no trend. I try to avoid theoretical questions about imaginary climates. Instead I looked at what the real odds were of there being 13 out of 13 in the real climate.
Yes, they are two different discussions. I said that coming in. That’s what “let me propose a different way of looking at the situation” means.
w.
Bart says:
July 14, 2012 at 2:56 pm
Willis Eschenbach says:
July 14, 2012 at 2:40 pm
For month X, I compared it to the historical record at that point in time. I assumed that’s what Masters meant when he said:
Each of the 13 months from June 2011 through June 2012 ranked among the warmest third of their historical distribution …
I took the term “historical distribution” to mean that he was not going to use future temperatures, just the historical temperatures.
Why would I want to do that? It would totally distort the record, because all of the high numbers would be clustered in the recent times. The way I did it, the individual months are not compared to warmer months that might or might not happen in the future, but only to the actual record up to that date. In addition, it makes the entire record change when you add more months, so you don’t have a stable dataset to analyze using that method; it’s quite possible that in a few years this June-to-June will no longer have 13 months in the top third.

However, we are nothing if not a full service website:
Clearly, it’s not a binomial distribution …
I don’t recall suggesting that it will eliminate any 13-month sequence, just that this current one will soon no longer have 13 in the top third if the three-century warming trend continues.
Doing it the way you suggest means that the most recent year deals only with past temperatures, while previous years are measured against future temperatures that hadn’t even happened at that time. So you are judging different years by different metrics.
w.
Phil. says:
July 14, 2012 at 12:12 pm
Thanks, Phil. That part you bolded doesn’t mean that they have compared each one to the 1895-present record. They distinguish between the “historical record”, which I take to mean historical rather than future temperatures, and the 1895-present record.
All he said about the bolded part was that this was the first time in the 1895-present record that 13 months had been among the warmest third in the historical record. I read this as meaning that they were NOT in the warmest third of the 1895-present record; they were in the warmest third of the historical record.
w.
Willis Eschenbach says:
July 15, 2012 at 12:58 am
Thank you. Now, we see the mean of 4.33.
Is there actually even one stretch of 13 in the data now? Clearly, there is not a significant deviation from the binomial distribution overall. This is reasonable to expect because, as I have pointed out, the modest warming which was observed over the 20th century should have a relatively small impact on the distribution of the relatively wide 1/3 temperature bands.
So, in the end, we conclude that there is no evidence that what we have seen recently is in any way out of the ordinary, and the entire hullabaloo has been over a trivial matter of a singleton observation.
Oops… I meant there is not a significant deviation from the Poisson distribution.
1) Setting stats aside, if there is an uptrend in temp during the period, why should it be surprising if a recent period had more warm months in it than an earlier period? I know detrending has been discussed, but it would be interesting to determine the probability of occurrence of a string of 13 months (I’d make it 12 – I know this would be an issue for the CAGW group) in detrended data, to see the probability considering natural variation. One could add on the slope of 0.5C or whatever the forced warming is thought to be.
2) Regarding the 13 months, I’m sure somewhere you have accounted for the double counting that there would be for every June that is in the top 1/3. It is a ridiculous proposition.
But, under the assumption of no significant deviation from uniformly distributed events, it should be more binomial, shouldn’t it? That bothered me, so I set up a Monte Carlo run. I found that the histograms for this number of data points are fairly variable. Sometimes they look more binomial, sometimes they look more Poisson. Meh.
Gary Pearse says:
July 15, 2012 at 1:09 pm
“…if there is an uptrend in temp during the period, why should it be surprising if a recent period had more warm months in it than an earlier period?”
The uptrend has been very modest relative to the width of the bands. Hence, it should have very little effect at all. And, in fact, it doesn’t. The histogram is well within the range of variability for an order 13 binomial distribution with this many samples. This has been much ado about nothing.
Bart says:
July 15, 2012 at 1:23 pm
That’s why we have statistical tests. In this case (using all data rather than historical data) it strongly rejects binomial, and fails to reject Poisson (although not as decisively as in the prior case using historical data rather than past and future data).
Let me add that doing a Monte Carlo analysis is a very, very tricky thing to do, and is often done without enough prior thought. It is critical that you investigate the distribution of the observations very, very, closely, and you need to match your pseudo-data to whatever it is that you find.
w.
PS—Why are you making the assumption of “no significant deviation from uniformly distributed events”?
Bart says:
July 15, 2012 at 4:57 pm
Not true at all. The KS test strongly rejects a binomial distribution for this data. By strongly, I mean I swept all probabilities, and the largest p-value, for the binomial probability of 0.32, was 2e-05, which is about as strong as it gets. Bear in mind that in my method, the value of 2e-05 is the average of the Kolmogorov-Smirnov test comparing the test data to 1000 random binomial datasets with a probability of 0.32 … so no, it is not anywhere near a binomial distribution with this many samples. (A sketch of that resampling comparison follows below.)
w.
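For the curious, here is a minimal sketch of the resampling comparison just described (my reconstruction, not the original code), assuming a hypothetical `counts` array of the observed per-window tallies; the exact p-values will differ.

```python
import numpy as np
from scipy.stats import binom, ks_2samp

def mean_ks_pvalue(counts, p, n=13, reps=1000, seed=0):
    """Average two-sample KS p-value between the observed counts and `reps`
    synthetic binomial(n, p) datasets of the same length."""
    rng = np.random.default_rng(seed)
    pvals = [ks_2samp(counts, binom.rvs(n, p, size=len(counts), random_state=rng)).pvalue
             for _ in range(reps)]
    return float(np.mean(pvals))

# Sweep the binomial probability and keep the largest average p-value, as
# described above; a tiny best value means the binomial shape is rejected.
# best = max(mean_ks_pvalue(counts, p) for p in np.arange(0.20, 0.45, 0.01))
```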
Willis Eschenbach says:
July 15, 2012 at 11:35 pm
“That’s why we have statistical tests.”
Statistical tests are overrated. Their greatest function is confirming what you can usually see with your own eyes. Indeed, looking at your plot, it is clear that the binomial distribution with n = 13 does not look so good.
“Let me add that doing a Monte Carlo analysis is a very, very tricky thing to do…”
Try it yourself if you don’t believe me. Here’s one I made by creating a length 1392 sequence of uniformly distributed 0, 1, and 2’s and dividing it up into overlapping segments of 13 per your description of what you did, calling the 2’s the “upper 1/3”. Is it Binomial, or Poisson? Here’s a more usual sample run which is clearly more Binomial.
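A sketch of that experiment (my reconstruction of the run described, not the original code):

```python
import numpy as np

def monte_carlo_hist(length=1392, window=13, seed=None):
    """Draw a uniform sequence of 0/1/2 'terciles', call the 2's the upper
    third, and histogram the number of 2's in every overlapping window."""
    rng = np.random.default_rng(seed)
    terciles = rng.integers(0, 3, size=length)  # 0, 1 or 2, equally likely
    hits = (terciles == 2).astype(int)
    counts = np.convolve(hits, np.ones(window, dtype=int), mode="valid")
    return np.bincount(counts, minlength=window + 1)

# Run it with different seeds: some draws look closer to a binomial(13, 1/3),
# others closer to a Poisson with the same mean (13/3, about 4.33).
```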
“Why are you making the assumption of “no significant deviation from uniformly distributed events”?”
Because, as I keep saying, the bands are much wider than any trends. The data should be pretty random and the threshold exceedances should come at a roughly average rate.
With all that said, on further consideration, I think the overlapping of the intervals likely could indeed be skewing the distribution. With overlap, you are capturing every possible 13-point streak and, in fact, it captures streaks of anything greater than 13 as well and marks them all as 13. So, it kind of makes sense that, at least in the upper levels, you might more closely approach a binomial distribution with n = 1374 rather than n = 13, which would be pretty close to a Poisson distribution.
Maybe it is possible to derive the distribution for overlapping intervals. Or maybe it is so messy that this is precisely why people always talk about non-overlapping intervals when discussing the Poisson distribution; at least, that was true in every web reference I googled while looking for someone who addressed overlapping intervals.
Whatever. It’s a distribution with a mean of 4.33, which is something in the Poisson/binomial family. It looks pretty common and ordinary, and a singleton observation does not make or break it.
Willis Eschenbach says:
July 15, 2012 at 12:58 am
Bart says:
July 14, 2012 at 2:56 pm
Willis Eschenbach says:
July 14, 2012 at 2:40 pm
Now, you’re confusing me even more. It’s always the same, but it isn’t?
For month X, I compared it to the historical record at that point in time. I assumed that’s what Masters meant when he said:
Each of the 13 months from June 2011 through June 2012 ranked among the warmest third of their historical distribution …
I took the term “historical distribution” to mean that he was not going to use future temperatures, just the historical temperatures.
Which was an incorrect assumption.
“What I would like to see is a statistic where, for each month over the entire data set, you compute the range of temperatures, divide it into three bins, and assign the threshold to be the lower level of the top bin. That will be the threshold for that month, to be applied uniformly to all the data.”
Why would I want to do that?
Because it’s the right way to do it. If you were going to examine the possibility of a sequence of a certain length occurring in, say, 1000 throws of a die, you wouldn’t make the comparison with only the throws which preceded a particular throw. In the case considered here you have about 116 periods; let’s say we start at the 16th, because then you’ll have a reasonable value for the mean temperature, T1, and the threshold for the top third, Tt1. By the time you get to the last period you’ll have a different threshold, Tt100, so you’ve built a distribution as a composite of 100 Poissons (we’ll assume that they are Poisson processes for the sake of argument), each of which will have a mean of 4.333 (that is, 13 months × 1/3) but with different thresholds, Ttx. The composite will not have a mean of 4.333 because of the way you have compiled it. When you do it correctly, by using a single threshold, you might still get a Poisson, which should have a mean of 4.333.
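A small simulation sketch of the comparison being described here (purely illustrative, using a made-up trending series; how far the expanding-threshold mean drifts from 4.333 depends on how large the trend is relative to the monthly spread):

```python
import numpy as np

def window_means(series, window=13):
    """Mean top-third count per 13-month window under two threshold rules:
    (a) tercile of the record up to each date, (b) tercile of the full record."""
    expanding = np.array([series[i] > np.percentile(series[:i + 1], 200 / 3)
                          for i in range(len(series))])
    fixed = series > np.percentile(series, 200 / 3)
    kernel = np.ones(window, dtype=int)
    a = np.convolve(expanding.astype(int), kernel, mode="valid").mean()
    b = np.convolve(fixed.astype(int), kernel, mode="valid").mean()
    return a, b  # (b) sits near 13/3 = 4.33 by construction

rng = np.random.default_rng(1)
trending = rng.normal(size=1392) + np.linspace(0.0, 0.5, 1392)
print(window_means(trending))
```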
It would totally distort the record, because all of the high numbers would be clustered in the recent times. The way I did it, the individual months are not compared to warmer months that might or might not happen in the future, but only to the actual record up to that date. In addition, it makes the entire record change when you add more months, so you don’t have a stable dataset to analyze using that method; it’s quite possible that in a few years this June-to-June will no longer have 13 months in the top third.
See above
However, we are nothing if not a full service website:
Clearly, it’s not a binomial distribution …
No I wouldn’t expect it to be, N isn’t large enough. But upon examination, when done properly as you appear to have done here, the mean of the Poisson distribution looks very close to the theoretically expected 4.333. What was it exactly, you omitted to tell us?
“No I wouldn’t expect it to be, N isn’t large enough.”
Strike that. Reverse it. The Poisson distribution is the limit as N gets large.
Bart says:
July 16, 2012 at 1:19 pm
The reverse of that is
Not sure that’s what you mean. Or perhaps you mean
w.
Willis Eschenbach says:
July 16, 2012 at 1:31 pm
I was just trying to add some levity by channeling Willy Wonka.
July 16, 2012 at 2:00 pm
My bad, I missed the reference totally and completely … guess I should watch more movies.
w.
Willis Eschenbach says:
July 16, 2012 at 1:28 pm
Phil. says:
July 16, 2012 at 12:23 pm
“Because it’s the right way to do it. If you were going to examine the possibility of a sequence of a certain length occurring in, say, 1000 throws of a die, you wouldn’t make the comparison with only the throws which preceded a particular throw.”
“The right way to do it”? Well, aren’t you full of yourself. There are two ways to do it, and neither one can be claimed to be “the right way”. In particular, your example is not anywhere near a parallel to the question. Throws of a die are known to be stationary, whereas time series of temperature are not. So it doesn’t matter with a die if you include future and past events, but it most assuredly does matter with time series of temperature.
As pointed out above, by doing it your way you’re comparing each event to a different threshold, whereas I’m talking about making the comparison to a single threshold for the whole series. The latter gives a predictable mean based on the process, if it is indeed Poisson. If you wanted to estimate the probability of 13 events in the top third occurring in the next 13 months, your method gives a wrong value for the mean, which overestimates the probability, whereas the theoretical 4.333 will give the correct probability. This is the test that was initially proposed by Masters and which started this whole thing off.
Clearly, it’s not a binomial distribution …
“No I wouldn’t expect it to be, N isn’t large enough. But upon examination, when done properly as you appear to have done here, the mean of the Poisson distribution looks very close to the theoretically expected 4.333.”
The point was that a Poisson and a binomial are the same for sufficiently large values of N, so given that this data has a Poisson shape, it wouldn’t be a binomial as well because of the value of N.
If you had been paying attention, you would have noticed I was responding to someone claiming it was a binomial distribution.
“What was it exactly, you omitted to tell us?”
I haven’t a clue, but it might have been that you are acting like a puffed-up jerkwagon convinced of his own infallibility. Truly, it’s not necessary to act like that to make your point.
I assumed that since you’d fitted a Poisson distribution you’d know what the mean was, I was just asking what it was. By eye it looks fairly close to the theoretical value of 4.333, which would be interesting.