Hell and High Histogramming – Mastering an Interesting Heat Wave Puzzle

Guest Post by Willis Eschenbach

Anthony Watts, Lucia Liljegren, and Michael Tobis have all done a good job blogging about Jeff Masters’ egregious math error: his claim that a run of high US temperatures had only a 1 in 1.6 million chance of being a natural occurrence. Here’s his claim:

U.S. heat over the past 13 months: a one in 1.6 million event

Each of the 13 months from June 2011 through June 2012 ranked among the warmest third of their historical distribution for the first time in the 1895 – present record. According to NCDC, the odds of this occurring randomly during any particular month are 1 in 1,594,323. Thus, we should only see one more 13-month period so warm between now and 124,652 AD–assuming the climate is staying the same as it did during the past 118 years. These are ridiculously long odds, and it is highly unlikely that the extremity of the heat during the past 13 months could have occurred without a warming climate.
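The quoted odds are no mystery: 1,594,323 is exactly 3^13, i.e. the probability of thirteen independent one-in-three events in a row. A one-line check:

```python
# NCDC's "1 in 1,594,323" is just (1/3)**13: thirteen independent
# one-in-three events in a row.
odds = 3 ** 13
print(f"1 in {odds:,}")  # 1 in 1,594,323
```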

All of the other commenters pointed out reasons why he was wrong … but they didn’t get to what is right.

Let me propose a different way of analyzing the situation … the old-fashioned way, by actually looking at the observations themselves. There are a couple of oddities to be found there. To analyze this, I calculated, for each year of the record, how many of the months from June to June inclusive were in the top third of the historical record. Figure 1 shows the histogram of that data, that is to say, it shows how many June-to-June periods had one month in the top third, two months in the top third, and so on.
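That counting procedure is easy to sketch in code. Here is a minimal illustration, using synthetic Gaussian monthly values in place of the NOAA record, and a single fixed tercile threshold per calendar month for simplicity (the post compares each month only to the record up to that date, so this is the idea, not the exact calculation):

```python
import random

random.seed(0)

# Synthetic stand-in for the monthly US temperature record
n_years = 118
series = [random.gauss(0.0, 1.0) for _ in range(n_years * 12)]

def in_top_third(i):
    # All values for the same calendar month as index i
    same_month = series[i % 12::12]
    cutoff = sorted(same_month)[2 * len(same_month) // 3]
    return series[i] >= cutoff

flags = [in_top_third(i) for i in range(len(series))]

# Winners in each June-to-June window: 13 months starting at June (index 5)
counts = [sum(flags[y * 12 + 5: y * 12 + 18]) for y in range(n_years - 1)]

# Histogram: how many windows had 0, 1, ..., 13 winning months
hist = {k: counts.count(k) for k in range(14)}
```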

Figure 1. Histogram of the number of June-to-June months with temperatures in the top third (tercile) of the historical record, for each of the past 116 years. Red line shows the expected number if they have a Poisson distribution with lambda = 5.206, and N (number of 13-month intervals) = 116. The value of lambda has been fit to give the best results.

The first thing I noticed when I plotted the histogram is that it looked like a Poisson distribution. This is a very common distribution for data representing discrete occurrences, as in this case. Poisson distributions cover things like how many people you’ll find in line in a bank at any given instant, for example. So I overlaid the data with a Poisson distribution, and I got a good match.

Now, looking at that histogram, the finding of one period in which all thirteen were in the warmest third doesn’t seem so unusual. In fact, with the number of years that we are investigating, the Poisson distribution gives an expected value of 0.2 occurrences. In this case, we find one occurrence where all thirteen were in the warmest third, so that’s not unusual at all.
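That 0.2 figure follows directly from the Poisson formula: the expected count is N times the probability of drawing exactly 13 from a Poisson with lambda = 5.206. A quick check:

```python
import math

# Expected number of June-to-June windows scoring 13 out of 13,
# under the fitted Poisson(lambda = 5.206) with N = 116 windows
lam, N = 5.206, 116
p13 = math.exp(-lam) * lam**13 / math.factorial(13)
expected = N * p13
print(round(expected, 2))  # 0.21
```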

Once I did that analysis, though, I thought “Wait a minute. Why June to June? Why not August to August, or April to April?” I realized I wasn’t looking at the full universe from which we were selecting the 13-month periods. I needed to look at all of the 13-month periods, from January-to-January to December-to-December.

So I took a second look, and this time I looked at all of the possible contiguous 13-month periods in the historical data. Figure 2 shows a histogram of all of the results, along with the corresponding Poisson distribution.

Figure 2. Histogram of the number of months with temperatures in the top third (tercile) of the historical record for all possible contiguous 13-month periods. Red line shows the expected number if they have a Poisson distribution with lambda = 5.213, and N (number of 13-month intervals) = 1374. Once again, the value of lambda has been fit to give the best results.

Note that the total number of periods is much larger (1374 instead of 116) because we are looking, not just at June-to-June, but at all possible 13-month periods. Note also that the fit to the theoretical Poisson distribution is better, with Figure 2 showing only about 2/3 of the RMS error of the first dataset.

The most interesting thing to me is that in both cases, I used an iterative fit (Excel solver) to calculate the value for lambda. And despite there being 12 times as much data in the second analysis, the values of the two lambdas agreed to two decimal places. I see this as strong confirmation that indeed we are looking at a Poisson distribution.
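For readers without Excel’s Solver, the same fit can be reproduced with a simple least-squares search over lambda. This is a sketch of the idea, not the author’s exact spreadsheet setup:

```python
import math

def poisson_pmf(k, lam):
    return math.exp(-lam) * lam**k / math.factorial(k)

def fit_lambda(histogram, N, lo=1.0, hi=10.0, steps=9000):
    """Grid-search the lambda minimizing squared error between an
    observed histogram {count: frequency} and N * Poisson(lambda),
    mimicking an iterative Solver fit."""
    best_lam, best_sse = lo, float("inf")
    for i in range(steps + 1):
        lam = lo + (hi - lo) * i / steps
        sse = sum((histogram.get(k, 0) - N * poisson_pmf(k, lam)) ** 2
                  for k in range(14))
        if sse < best_sse:
            best_lam, best_sse = lam, sse
    return best_lam

# Sanity check: fitting a perfect Poisson(5.2) histogram recovers 5.2
N = 116
perfect = {k: N * poisson_pmf(k, 5.2) for k in range(14)}
lam_hat = fit_lambda(perfect, N)
```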

Finally, the sting in the end of the tale. With 1374 contiguous 13-month periods and a Poisson distribution, the number of periods with 13 winners that we would expect to find is 2.6 … so in fact, far from Jeff Masters’ claim that finding 13 in the top third is a one-in-a-million chance, my results show that finding only one case with all thirteen in the top third is actually below the number we would expect, given the size and the nature of the dataset …
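The same arithmetic as before, with the Figure 2 parameters, confirms the order of magnitude (this sketch gives about 2.5 with lambda = 5.213; the 2.6 quoted above is within the rounding of the fitted lambda):

```python
import math

# Expected number of the 1374 overlapping 13-month windows scoring
# 13 out of 13, under the fitted Poisson(lambda = 5.213)
lam, N = 5.213, 1374
p13 = math.exp(-lam) * lam**13 / math.factorial(13)
expected = N * p13
print(round(expected, 1))  # about 2.5 with this lambda
```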

w.

Data Source, NOAA US Temperatures, thanks to Lucia for the link.

268 Comments
Bart
July 15, 2012 12:10 pm

Willis Eschenbach says:
July 15, 2012 at 12:58 am
Thank you. Now, we see the mean of 4.33.
Is there actually even one stretch of 13 in the data now? Clearly, there is not a significant deviation from the binomial distribution overall. This is reasonable to expect because, as I have pointed out, the modest warming which was observed over the 20th century should have a relatively small impact on the distribution of the relatively wide 1/3 temperature bands.
So, in the end, we conclude that there is no evidence that what we have seen recently is in any way out of the ordinary, and the entire hullabaloo has been over a trivial matter of a singleton observation.

Bart
July 15, 2012 12:37 pm

Oops… I meant there is not a significant deviation from the Poisson distribution.

Gary Pearse
July 15, 2012 1:09 pm

1) Setting stats aside, if there is an uptrend in temp during the period, why should it be surprising if a recent period had more warm months in it than an earlier period? I know detrending has been discussed, but it would be interesting to determine the probability of occurrence of a string of 13 months (I’d make it 12 – I know this would be an issue for the CAGW group) in detrended data, to see the probability considering natural variation. One could add on the slope of 0.5C or whatever the forced warming is thought to be.
2) Regarding the 13 months, I’m sure somewhere you have accounted for the double counting that there would be for every June that is in the top 1/3. It is a ridiculous proposition.

Bart
July 15, 2012 1:23 pm

But, under the assumption of no significant deviation from uniformly distributed events, it should be more binomial, shouldn’t it? That bothered me, so I set up a Monte Carlo run. I found that the histograms for this number of data points are fairly variable. Sometimes they look more binomial, sometimes they look more Poisson. Meh.

Bart
July 15, 2012 4:57 pm

Gary Pearse says:
July 15, 2012 at 1:09 pm
“…if their is an uptrend in temp during the period, why should it be surprising if a recent period had more warm months in it than an earlier period.”
The uptrend has been very modest relative to the width of the bands. Hence, it should have very little effect at all. And, in fact, it doesn’t. The histogram is well within the range of variability for an order 13 binomial distribution with this many samples. This has been much ado about nothing.

Bart
July 16, 2012 9:53 am

Willis Eschenbach says:
July 15, 2012 at 11:35 pm
That’s why we have statistical tests.”
Statistical tests are overrated. Their greatest function is confirming what you can usually see with your own eyes. Indeed, looking at your plot, it is clear that the binomial distribution with n = 13 does not look so good.
“Let me add that doing a Monte Carlo analysis is a very, very tricky thing to do…”
Try it yourself if you don’t believe me. Here’s one I made by creating a length-1392 sequence of uniformly distributed 0’s, 1’s, and 2’s and dividing it up into overlapping segments of 13, per your description of what you did, calling the 2’s the “upper 1/3”. Is it Binomial, or Poisson? Here’s a more usual sample run which is clearly more Binomial.
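A sketch of that Monte Carlo as described (Python, with an arbitrary seed; not Bart’s original code):

```python
import random

random.seed(1)

# 1392 uniform draws from {0, 1, 2}; overlapping windows of 13;
# count the 2's ("upper 1/3") in each window
seq = [random.randrange(3) for _ in range(1392)]
windows = [seq[i:i + 13] for i in range(len(seq) - 12)]
counts = [w.count(2) for w in windows]

# Histogram of per-window counts
hist = {}
for c in counts:
    hist[c] = hist.get(c, 0) + 1

mean = sum(counts) / len(counts)  # should hover near 13/3, about 4.33
```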
“Why are you making the assumption of “no significant deviation from uniformly distributed events”?”
Because, as I keep saying, the bands are much wider than any trends. The data should be pretty random and the threshold exceedances should come at a roughly average rate.
With all that said, on further consideration, I think the overlapping of the intervals likely could indeed be skewing the distribution. With overlap, you are capturing every 13 point streak possible and, in fact, it captures streaks of anything greater than 13 as well and marks them all as 13. So, it kind of makes sense that, at least in the upper levels, you might more closely approach a binomial distribution with n = 1374 rather than n = 13, which would be pretty close to a Poisson distribution.
Maybe it is possible to derive the distribution for overlapping intervals. Or, maybe it is so messy, that is precisely why they always talk about non-overlapping intervals when discussing the Poisson distribution, at least in every web reference I googled looking for where someone addressed overlapping intervals.
Whatever. It’s a distribution with a mean of 4.33 which is something in the Poisson/Binomial family. It looks pretty common and ordinary, and a singleton observation does not make or break it.

Phil.
July 16, 2012 12:23 pm

Willis Eschenbach says:
July 15, 2012 at 12:58 am
Bart says:
July 14, 2012 at 2:56 pm
Willis Eschenbach says:
July 14, 2012 at 2:40 pm
Now, you’re confusing me even more. It’s always the same, but it isn’t?
For month X, I compared it to the historical record at that point in time. I assumed that’s what Masters meant when he said:
Each of the 13 months from June 2011 through June 2012 ranked among the warmest third of their historical distribution …
I took the term “historical distribution” to mean that he was not going to use future temperatures, just the historical temperatures.

Which was an incorrect assumption.
“What I would like to see is a statistic where, for each month over the entire data set, you compute the range of temperatures, divide it into three bins, and assign the threshold to be the lower level of the top bin. That will be the threshold for that month, to be applied uniformly to all the data.”
Why would I want to do that?
Because it’s the right way to do it. If you were going to examine the possibility of a sequence of a certain length occurring in, say, 1000 throws of a die, you wouldn’t make the comparison with only the throws which preceded a particular throw. In the case considered here you have about 116 periods. Let’s say we start at the 16th, because then you’ll have a reasonable value for the mean temperature, T1, and the threshold for the top third, Tt1. By the time you get to the last period you’ll have a different threshold, Tt100, so you’ve built a distribution as a composite of 100 Poissons (we’ll assume that they are Poisson processes for the sake of argument), each of which will have a mean of 4.333 but with different thresholds, Ttx. The composite will not have a mean of 4.333, because of the way you have compiled it. When you do it correctly, by using a single threshold, you might still get a Poisson, which should have a mean of 4.333.
It would totally distort the record, because all of the high numbers would be clustered in the recent times. The way I did it, the individual months are not compared to warmer months that might or might not happen in the future, but only to the actual record up to that date. In addition, your method makes the entire record change when you add more months, so you don’t have a stable dataset to analyze; it’s quite possible that in a few years this June-to-June will no longer have 13 months in the top third.
See above
However, we are nothing if not a full service website:
Clearly, it’s not a binomial distribution …

No, I wouldn’t expect it to be; N isn’t large enough. But upon examination, when done properly as you appear to have done here, the mean of the Poisson distribution looks very close to the theoretically expected 4.333. What was it exactly? You omitted to tell us.

Bart
July 16, 2012 1:19 pm

“No I wouldn’t expect it to be, N isn’t large enough.”
Strike that. Reverse it. The Poisson distribution is the limit as N gets large.

Bart
July 16, 2012 2:00 pm

Willis Eschenbach says:
July 16, 2012 at 1:31 pm
I was just trying to add some levity by channeling Willy Wonka.

Phil.
July 17, 2012 9:47 am

Willis Eschenbach says:
July 16, 2012 at 1:28 pm
Phil. says:
July 16, 2012 at 12:23 pm
“Because it’s the right way to do it. If you were going to examine the possibility of a sequence of a certain length occurring in say 1000 throws of a die you wouldn’t make the comparison with only the throws which preceded a particular throw.”
“The right way to do it”? Well, aren’t you full of yourself. There are two ways to do it, and neither one can be claimed to be “the right way”. In particular, your example is not anywhere near a parallel to the question. Throws of a die are known to be stationary, where time series of temperature are not. So it doesn’t matter with a die if you include future and past events, but it most assuredly does matter with time series of temperature.

As pointed out above, by doing it your way you’re comparing each event to a different threshold, whereas I’m talking about making the comparison to the single threshold for the whole series. The latter gives a predictable mean based on the process, if it is indeed Poisson. If you wanted to estimate the probability of 13 events in the top third occurring in the next 13 months, your method gives a wrong value for the mean which overestimates the probability, whereas the theoretical 4.333 will give the correct probability. This is the test that was proposed initially by Masters which started this whole thing off.
Clearly, it’s not a binomial distribution …
“No I wouldn’t expect it to be, N isn’t large enough. But upon examination when done properly as you appear to have done here the mean of the Poisson distribution looks very close to the theoretically expected 4.333.”

The point was that a Poisson and a binomial are the same for sufficiently large values of N, so given that this data has a Poisson shape, it wouldn’t be a binomial as well, because of the value of N.
If you had been paying attention, you would have noticed I was responding to someone claiming it was a binomial distribution.
“What was it exactly, you omitted to tell us?”
I haven’t a clue, but it might have been that you are acting like a puffed-up jerkwagon convinced of his own infallibility. Truly, it’s not necessary to act like that to make your point.
I assumed that since you’d fitted a Poisson distribution you’d know what the mean was, I was just asking what it was. By eye it looks fairly close to the theoretical value of 4.333, which would be interesting.
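The Poisson-versus-binomial convergence invoked in this thread, that Binomial(n, lambda/n) approaches Poisson(lambda) as n grows, is easy to check numerically. A generic illustration, not tied to the post’s data:

```python
import math

def binom_pmf(k, n, p):
    return math.comb(n, k) * p**k * (1 - p)**(n - k)

def poisson_pmf(k, lam):
    return math.exp(-lam) * lam**k / math.factorial(k)

lam, k = 4.333, 4
# Binomial(n, lam/n) with small n versus large n...
b13   = binom_pmf(k, 13, lam / 13)
b1374 = binom_pmf(k, 1374, lam / 1374)
pois  = poisson_pmf(k, lam)
# ...the n = 1374 value sits far closer to the Poisson limit
# than the n = 13 value does.
```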
