Hell and High Histogramming – Mastering an Interesting Heat Wave Puzzle

Guest Post by Willis Eschenbach

Anthony Watts, Lucia Liljegren, and Michael Tobis have all done a good job blogging about Jeff Masters' egregious math error: he claimed that a run of high US temperatures had only a 1 in 1.6 million chance of being a natural occurrence. Here's his claim:

U.S. heat over the past 13 months: a one in 1.6 million event

Each of the 13 months from June 2011 through June 2012 ranked among the warmest third of their historical distribution for the first time in the 1895 – present record. According to NCDC, the odds of this occurring randomly during any particular month are 1 in 1,594,323. Thus, we should only see one more 13-month period so warm between now and 124,652 AD–assuming the climate is staying the same as it did during the past 118 years. These are ridiculously long odds, and it is highly unlikely that the extremity of the heat during the past 13 months could have occurred without a warming climate.
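For reference, Masters' 1 in 1,594,323 is pure independent-trials arithmetic, thirteen one-in-three chances multiplied together (the very assumption the rest of this post disputes). A quick check:

```python
# Masters' figure: thirteen independent one-in-three chances multiplied together.
odds = 3 ** 13
print(odds)              # 1594323, i.e. 1 in ~1.6 million
print((1 / 3) ** 13)     # the corresponding probability per 13-month window
```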

All of the other commenters pointed out reasons why he was wrong … but they didn’t get to what is right.

Let me propose a different way of analyzing the situation … the old-fashioned way, by actually looking at the observations themselves. There are a couple of oddities to be found there. To analyze this, I calculated, for each year of the record, how many of the months from June to June inclusive were in the top third of the historical record. Figure 1 shows the histogram of that data, that is to say, it shows how many June-to-June periods had one month in the top third, two months in the top third, and so on.
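The counting procedure can be sketched as follows. This is a minimal illustration with random synthetic data standing in for the actual NOAA monthly series; the variable names and the tercile convention (strictly above the two-thirds percentile, per calendar month) are my assumptions, not taken from the post.

```python
import numpy as np

rng = np.random.default_rng(0)
n_years = 116

# Synthetic stand-in for the NOAA monthly series (illustration only).
monthly = rng.normal(size=(n_years, 12))   # rows = years, cols = Jan..Dec

# Flag, per calendar month, the years falling in that month's top tercile.
cutoffs = np.percentile(monthly, 100 * 2 / 3, axis=0)
top_third = (monthly > cutoffs).astype(int).ravel()

# Count top-tercile months in each 13-month June-to-June window
# (June of year y through June of year y+1; June is column index 5).
counts = [top_third[y * 12 + 5 : y * 12 + 18].sum() for y in range(n_years - 1)]
hist = np.bincount(counts, minlength=14)
print(hist)   # windows with 0, 1, ..., 13 top-tercile months
```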

Figure 1. Histogram of the number of June-to-June months with temperatures in the top third (tercile) of the historical record, for each of the past 116 years. Red line shows the expected number if they follow a Poisson distribution with lambda = 5.206 and N (number of 13-month intervals) = 116. The value of lambda has been fitted to give the best results.

The first thing I noticed when I plotted the histogram is that it looked like a Poisson distribution. This is a very common distribution for data representing discrete occurrences, as in this case. Poisson distributions cover things like how many people you'll find in line at a bank at any given instant, for example. So I overlaid the data with a Poisson distribution, and I got a good match.

Now, looking at that histogram, the finding of one period in which all thirteen were in the warmest third doesn’t seem so unusual. In fact, with the number of years that we are investigating, the Poisson distribution gives an expected value of 0.2 occurrences. In this case, we find one occurrence where all thirteen were in the warmest third, so that’s not unusual at all.
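That expected value falls straight out of the Poisson probability mass function; a quick check using the values from Figure 1:

```python
from math import exp, factorial

def poisson_pmf(k, lam):
    # P(X = k) for a Poisson random variable with mean lam
    return exp(-lam) * lam ** k / factorial(k)

lam, n_windows = 5.206, 116       # fitted lambda and window count from Figure 1
expected_13 = n_windows * poisson_pmf(13, lam)
print(round(expected_13, 2))      # ~0.21, the "0.2 occurrences" quoted above
```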

Once I did that analysis, though, I thought, "Wait a minute. Why June to June? Why not August to August, or April to April?" I realized I wasn't looking at the full universe from which we were selecting the 13-month periods. I needed to look at all of the 13-month periods, from January-to-January through December-to-December.

So I took a second look, and this time I looked at all of the possible contiguous 13-month periods in the historical data. Figure 2 shows a histogram of all of the results, along with the corresponding Poisson distribution.

Figure 2. Histogram of the number of months with temperatures in the top third (tercile) of the historical record for all possible contiguous 13-month periods. Red line shows the expected number if they follow a Poisson distribution with lambda = 5.213 and N (number of 13-month intervals) = 1374. Once again, the value of lambda has been fitted to give the best results.

Note that the total number of periods is much larger (1374 instead of 116) because we are looking, not just at June-to-June, but at all possible 13-month periods. Note also that the fit to the theoretical Poisson distribution is better, with Figure 2 showing only about 2/3 of the RMS error of the first dataset.

The most interesting thing to me is that in both cases, I used an iterative fit (Excel solver) to calculate the value for lambda. And despite there being 12 times as much data in the second analysis, the values of the two lambdas agreed to two decimal places. I see this as strong confirmation that indeed we are looking at a Poisson distribution.
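For readers without Excel's solver, the same least-squares fit of lambda can be sketched in Python. The histogram values below are hypothetical stand-ins for the real data, and the fine grid search is my substitute for the solver's iteration.

```python
import numpy as np
from math import exp, factorial

def poisson_expected(ks, lam, n):
    # Expected histogram heights for counts ks under Poisson(lam), n windows total
    return np.array([n * exp(-lam) * lam ** int(k) / factorial(int(k)) for k in ks])

def fit_lambda(hist_counts):
    """Least-squares lambda over a fine grid, mimicking the iterative
    Excel-solver fit described in the post."""
    n = hist_counts.sum()
    ks = np.arange(len(hist_counts))
    grid = np.arange(0.5, 13.0, 0.001)
    errs = [np.sqrt(np.mean((poisson_expected(ks, lam, n) - hist_counts) ** 2))
            for lam in grid]
    return grid[int(np.argmin(errs))]

# Hypothetical histogram of window counts 0..13 (a stand-in, not the real data):
hist = np.array([0, 1, 4, 12, 20, 24, 22, 15, 10, 5, 2, 0, 0, 1])
print(fit_lambda(hist))
```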

Finally, the sting in the end of the tale. With 1374 contiguous 13-month periods and a Poisson distribution, the number of periods with 13 winners that we would expect to find is 2.6 … so far from Jeff Masters' claim that finding 13 in the top third is a one-in-a-million chance, my results show that finding only one case with all thirteen in the top third is actually below the number we would expect given the size and the nature of the dataset …
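The closing arithmetic checks out the same way; a sketch using the Figure 2 values (small differences from the quoted 2.6 presumably come from rounding of the fitted lambda):

```python
from math import exp, factorial

lam, n_windows = 5.213, 1374     # fitted lambda and window count from Figure 2
expected_13 = n_windows * exp(-lam) * lam ** 13 / factorial(13)
print(expected_13)               # roughly 2.5
```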

w.

Data source: NOAA US temperatures. Thanks to Lucia for the link.

July 11, 2012 4:17 pm

Willis Eschenbach says:
July 11, 2012 at 3:54 pm
Dr Burns says:
July 11, 2012 at 2:32 pm
Willis,
You claim "it is indeed a Poisson process". Don Wheeler, one of the world's leading statisticians, points out that "The numbers you obtain from a probability model are not really as precise as they look." A Burr distribution can be made to look almost identical to a Poisson but give very different results in the tails.
Yes, I know, and there are other distributions that are similar as well … but in this particular case, the Poisson distribution gives very good results in the tails.

But gives a mean of 5.2 instead of the known mean for that process, if it were Poisson, of 4.33, an error of 20%.

son of mulder
July 11, 2012 4:21 pm

1 in 1.6 million is extremely common. Every time they draw the lottery of 6 numbers from 49 in the UK, the winning combination has about a 1 in 14 million chance, but it happens every week ;>)
Nice one Willis.

Bart
July 11, 2012 4:34 pm

KR says:
July 11, 2012 at 2:12 pm
“…Lucia’s (re-)estimate is less than a 1:100,000 chance … Masters made the mistake of not accounting for autocorrelation, and appears to be off by at least an order of magnitude as a result. Eschenbach is using the wrong model, and appears to be off by perhaps five orders of magnitude as a result.”
Eschenbach is 2.6 in 1374. Compared to 1 in 100,000, that is off by two orders of magnitude. However, Lucia said "less than", so assuming she is right, you still can't say for sure. I wouldn't have bothered commenting because, as I have stated previously, this is a tempest in a teapot. But the snark was kind of annoying.
Willis Eschenbach says:
July 11, 2012 at 2:47 pm
“I checked on that by seeing if my results were autocorrelated. They were not, in fact they were slightly negatively correlated (-.15).”
That said, this was painful to read. An autocorrelation is generally a multi-valued function comprised of expected values of lagged products.

Steve R
July 11, 2012 4:38 pm

Phil said: "But gives a mean of 5.2 instead of the known mean for that process, if it were Poisson, of 4.33, an error of 20%."
This depends…Was each month ranked into Terciles based only on the months preceding it? Or was it ranked based on all of the data?

cd_uk
July 11, 2012 4:54 pm

Willis
please don’t bite my head off I’m only trying to help.
On the method of autocorrelation can I suggest:
1) Create a data series where the number of months that satisfy your criteria (top third) are recorded for each year.
2) Run an autocorrelation function (e.g. http://en.wikipedia.org/wiki/Correlogram):
3) Alternatively, if you don't want to go through step 2 (it's long-winded, and otherwise you'd have to write your own code): if you're familiar with Excel, you could do an FFT of the data series outlined in 1. This will give you a series of complex numbers. Then, using the IMABS() function, get the power of each output. If you then take these cells and compute another FFT of them, you will get the correlogram, and again you'll be looking for the characteristic autocorrelated signature. See add-ins in Excel for FFT.
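The FFT route cd_uk describes is the Wiener-Khinchin theorem: the autocorrelation is the inverse transform of the power spectrum. A Python sketch rather than the Excel version (the sample series is made up for illustration; the zero-padding step is my addition, needed to avoid circular wrap-around):

```python
import numpy as np

def autocorrelation(x):
    """Correlogram via the Wiener-Khinchin route: inverse FFT of the
    power spectrum (|FFT|^2) of the demeaned, zero-padded series."""
    x = np.asarray(x, dtype=float)
    x = x - x.mean()
    n = len(x)
    f = np.fft.rfft(x, 2 * n)              # zero-pad to avoid circular wrap-around
    acf = np.fft.irfft(f * np.conj(f))[:n]
    return acf / acf[0]                    # normalize so lag 0 equals 1

# Hypothetical series: top-tercile month counts per year.
series = [4, 6, 5, 3, 7, 5, 4, 6, 8, 2, 5, 4]
print(autocorrelation(series)[:4])
```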
You put so much work into your posts and people seem to spend most of the time nit-picking so I’m just trying to help. I do try to give support but something always seems to get lost in translation.
Anyways off to bed. Good night. But look forward to hearing from you if this helps.

KR
July 11, 2012 4:56 pm

Willis Eschenbach – Again, the wrong model. A Poisson distribution is for the number of events occurring in independent sampling intervals, and you haven’t defined a sampling interval – in fact, you have different sampling intervals for each bin. A binomial distribution for successive runs would be closer, but still not account for autocorrelation.
Masters computed straight probabilities, without autocorrelation, for 13 successive months to be in the top 1/3 of temperatures – 1:1.6×10^6 is definitely too high. Tamino calculated for normalized distributions and got 1:5×10^5, which he notes is certainly a bit high due to inter-month correlations, but is probably at least close. Lucia ran a Monte Carlo simulation, and got values around 1:1×10^5, although if you give a generous helping of uncertainty to early temperature records she feels it might fall as low as 1:2×10^3 – and that’s almost certainly too low an estimate.
What you have done, essentially, is to state that the observations fall very close to a curve that is fit … to those very same observations, with a ratio near 1:1. That’s not a probability analysis, Willis, it’s a tautology. And it says exactly nothing.

cd_uk
July 11, 2012 5:03 pm

Forgot to mention that you’ll obviously need powers of two data series to run the FFT tool. You can just simply pad your series out to this if it isn’t. Or use a DFT instead. You’ll need to find one or I can write something for you if you supply the data (not on work time I hasten to add).

KR
July 11, 2012 5:05 pm

Bart – That 2.6 in 1374 means only slightly over a 1:1 probability during the period of observation.
Of course, since that’s a prediction of observations made from a curve fit to those observations, the fact that the observations fall close to that curve is totally unsurprising. What it is not, however, is an estimate of the probability of 13 months of successive top 1/3 range months in a row in a stationary process with stochastic variation.
I’m going to go with Lucia’s Monte Carlo estimates on this one – a 1:166,667 chance for this occurrence for evenly supported data, with a fairly hard lower bound of 1:2000 if you assume that all of the early data is rather horribly uncertain.

Windchaser
July 11, 2012 5:11 pm

Willis says:

Five orders of magnitude? Get real. LOOK AT THE ACTUAL DISTRIBUTION. We have a host of high values in the dataset, it’s not uncommon to find occurrences of ten and eleven and twelve months being in the warmest third. The idea that these are extremely uncommon, five orders of magnitude uncommon, doesn’t pass the laugh test.

To make sure I understand correctly:
The second plot above is for the number of months within a 13-month period that fall in the warmest 1/3, correct? Not the number of *contiguous* months which each fall into the top 1/3rd?
If so, then no, it doesn't show the occurrences of ten and eleven and twelve months being in the warmest third. Because those 10 warm months could be 3 warm months, 3 cool ones, then 7 more warm ones. (Etc.) Obviously, periods like that will be more common than runs of 10 strictly consecutive warm months.

My understanding is that the record is counted if it is in the top third of all records up to that date, not if it is in the top third of all historical records for all time. It doesn’t make sense any other way, to me at least.

Hmm. I think that if you’re measuring the number of months within the top 1/3rd of months so far, instead of within all months, that you’re going to get skewed numbers. Or, numbers with a different purpose than these, at least.
Here’s the simplified example. Let’s say we have a linear, positive trend with a small bit of noise. Then every few years, we’ll hit new records. Likewise, we’ll have a disproportionately high number of months within the top third. If the noise is small enough, then nearly *every* month will be in the top third, as the temperature trends higher and higher. And because of the positive trend, you’ll have an insanely high number of 5- or 10- or however-long periods of consecutive months within the top 1/3rd.
Obviously, you couldn’t look at the high frequency of, say, 9-month hot streaks in this scenario and say “this means that 10-month hot streaks would be uncommon if the temperature was flat”. Because those 9-month hot streaks came from an ever-rising trend (which distorts their probability), they tell you nothing about the probability of 10-month hot streaks within a flat trend.

ZP
July 11, 2012 5:43 pm

It’s rather common for people to underestimate the true probability of streaks. While it is true that for p = 1/3, the probability of any particular streak is (1/3)^13, that is not the cumulative probability for all possible streaks over the long run (refer to http://en.wikipedia.org/wiki/Gambler%27s_fallacy#Monte_Carlo_Casino). As Willis correctly points out, we cannot choose our start and end points arbitrarily.
The correct calculation algorithm for independent events is quite a bit more complicated and is described here: http://marknelson.us/2011/01/17/20-heads-in-a-row-what-are-the-odds/. We can use an on-line calculator here: http://www.pulcinientertainment.com/info/Streak-Calculator-enter.html.
If we define a "win" as an event in the top third historically (i.e. p = 1/3), then over 1392 consecutive trials (months), the probability of 13 consecutive wins would be 0.06%, or 1 in about 1730. Clearly, Jeff Masters vastly underestimates the streak probability. Remember, this approach assumes perfectly independent trials (akin to the expected probability of a win streak while making only column bets on a roulette wheel). Considering that weather patterns are not independent events, we should be able to safely conclude that this calculation provides the lower bound for the true probability.
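ZP's figure can be reproduced without the online calculator, using the standard first-occurrence recurrence for runs in Bernoulli trials; a sketch:

```python
def prob_streak(n, length, p):
    """Probability of at least one success-run of `length` in `n`
    independent Bernoulli(p) trials (first-occurrence recurrence)."""
    probs = [1.0] * length           # fewer than `length` trials: no run possible
    probs.append(1.0 - p ** length)  # exactly `length` trials: run iff all succeed
    for m in range(length + 1, n + 1):
        # A first run ending at trial m needs a failure at trial m-length,
        # `length` successes after it, and no earlier run.
        probs.append(probs[-1] - (1 - p) * p ** length * probs[m - length - 1])
    return 1.0 - probs[n]

p13 = prob_streak(1392, 13, 1 / 3)
print(round(1 / p13))   # about 1 in 1730, matching ZP's figure
```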

Richard Simons
July 11, 2012 5:59 pm

Using the Poisson distribution on these data tells us that there is a probability of about 0.07% of a random 13-month period having 14 months that are in the top 1/3. This is clearly not possible. Also, the calculated value of lambda (5.213) is wildly different from the mean (4.333). Both of these should have been sufficient taken alone to convince you that the Poisson distribution is not an appropriate model.

Toto
July 11, 2012 6:00 pm

There is an endless stream of alarmist climate stories in the MSM. What are the odds of that if the journalism is fair? You can fool all of the MSM journalists, all of the time.

Windchaser
July 11, 2012 6:09 pm

What we know is that the mean of the data is 5.2. But all that means is that p is not 1/3 as you claim, it's some larger number, to wit, 5.2/13. It doesn't mean that we are not looking at a Poisson distribution. It just means that your estimate of p is incorrect.
Why? Because the earth is warming, obviously, so the chances of being in the warmest third are greater than if it were stationary. But again, that doesn’t mean the distribution is not Poisson. It just means that your estimate of “p” is wrong.

Then I’m unsure that you’re calculating the same thing as Lucia and Masters. They were considering what the probability of this streak would be in a non-warming world. In such a world, p=1/3.
Again, you can’t fit data to a warming-world scenario, then use that for the probability in an untrended world.

KR
July 11, 2012 7:38 pm

Willis Eschenbach – If you have used 13 month intervals for each bin, then I would have to say I misinterpreted your post in that respect. But that’s really pretty irrelevant to the core problem.
You have fit a Poisson distribution (which is prima facie invalid, as what you are looking at is the expectation of a normal distribution of temperatures and the co-occurrence of 13 autocorrelated months in a row in a particular range, rather than collections of independent Poisson events in evenly sampled bins) to observations, and then used that fit to describe the observations.
Amazingly, the observations fit the curve that is matched to the observations – a tautology since any set of observations will closely match a curve fit directly to them. It doesn't matter if you have fit a Poisson distribution, a binomial distribution, the shape of your favorite baseball cap or, for that matter, a 1967 VW Beetle. This says exactly nothing about a stationary process with stochastic noise, which is what Masters compared the last 13 months to. You cannot fit observations to a descriptive curve and then make judgements about the observations without looking at those expectations of the observations and how they behave in respect to those expectations. Which is something you have not done.
Have you analyzed expectations of a stochastic process? No. You have only compared the observations to the observations, and come up with a nearly 1:1 relationship. Not surprising.
I hate to say it, but your analysis has absolutely nothing to do with a process with stochastic, normally distributed variations, such as the temperature record.
Word of the day: Tautology

RobertInAz
July 11, 2012 8:10 pm

I am inordinately fond of this thread. It is a microcosm of the model-versus-reality discussion. I think the streak post by ZP at 5:43 is very on-point. It has been 32 years since my last statistics class and I do not use statistics in my work. I finally settled on this analysis to form my own opinion on the "truth".
The N in M problem for a binomial distribution was pretty standard. So if you take a probability for an event (1 in 1.5 millionish) and run x number of trials (1374), then the probability of the event occurring increases with each trial. Cranking the numbers through the binomial formula gives 1 in 1161 (and I probably have an error somewhere) of the event occurring once and only once. Note that this answer describes a different problem from all of the other answers discussed, but it illustrates Dr. Masters' and NCDC's original error: they only considered one trial. The error is of course compounded by projecting out a gazillion years.
The streak calculator ZP points to is also a binomial view of the world and attacks the problem in a more sophisticated way than my sanity check, yielding a 1 in 1964 chance. I have no idea why the streak result is different from my sanity check.
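One likely reason the two differ: the sanity check treats the 1374 overlapping windows as independent trials, which they are not, while the streak calculation works at the level of individual months. The sanity check itself is easy to reproduce (a sketch):

```python
from math import comb

p = (1 / 3) ** 13      # Masters' 1 in 1,594,323 per-window probability
n = 1374               # 13-month windows, treated (naively) as independent trials

# Probability the event occurs exactly once in n trials:
exactly_once = comb(n, 1) * p * (1 - p) ** (n - 1)
print(round(1 / exactly_once))   # 1161, matching the comment's figure
```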
Lucia addresses different questions based in a couple of different models. The key point is that the calculations are models. In addition, they were based on a real world temperature trend of 0 and a modeled auto-correlation factor.
Willis addresses a somewhat different question – given the real world properties of this data, what is the probability of the event. He does not try to take out any real world temperature trend nor define an auto-correlation factor. He looks at the curve.
I would be interested in the curves for 12 in 12, 11 in 11, 10 in 10 and 9 in 9 to see if the 13 in 13 curve properties holds for them. Not interested enough to do the work myself of course…..

RobertInAz
July 11, 2012 8:17 pm

p.s. I think Willis’s analysis compared to Lucia’s reasonable model confirms a real world temperature trend.

Steve R
July 11, 2012 9:05 pm

Something just seems wrong. Tens of thousands left their farms and lives behind to escape the Dust Bowl conditions of the '30s. We've all seen the pictures of the total devastation. I find it extremely difficult to believe that the past 13 months are anywhere close to the conditions in those days. I mean, yes, it's been hot, but even in my 50 years' experience, I would hesitate to say this is the worst I've seen. Have we actually blown away all the record highs set back in those days? Somehow I doubt it.

July 11, 2012 9:10 pm

Willis,
Your choice of a Poisson distribution has been criticised, not least because it gives a finite probability for getting 14 months out of 13. And if it gets that tail value wrong, 13/13 is a worry too.
In fact, the Poisson is just the limiting form of the binomial for events of low probability. So the binomial for 13 would look quite like a Poisson anyway, and doesn’t have this issue. So you might as well use it.
In fact, that’s just what Masters did, with p=1/3. In effect, you’re regarding this p as a fittable parameter, rather than understood from first principles. And when fitted, it comes out to something different.
That discrepancy is an issue, but I think in any case if you do want to fit a distribution, the binomial is better.
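The tail point can be seen side by side; a sketch, with p fitted so the binomial mean matches the fitted Poisson mean rather than fixed at 1/3 (that choice is my assumption for the comparison):

```python
from math import comb, exp, factorial

def binom_pmf(k, n, p):
    # math.comb returns 0 when k > n, so impossible counts get probability 0
    return comb(n, k) * p ** k * (1 - p) ** (n - k)

def poisson_pmf(k, lam):
    return exp(-lam) * lam ** k / factorial(k)

lam = 5.213
p = lam / 13          # match means: 13 * p = lambda

for k in (5, 13, 14):
    print(k, binom_pmf(k, 13, p), poisson_pmf(k, lam))
# The binomial assigns zero probability to 14-of-13; the Poisson does not.
```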

rgbatduke
July 11, 2012 9:30 pm

Then I’m unsure that you’re calculating the same thing as Lucia and Masters. They were considering what the probability of this streak would be in a non-warming world. In such a world, p=1/3.
Again, you can’t fit data to a warming-world scenario, then use that for the probability in an untrended world.

This was discussed some on today’s thread on John N-G’s blog. I objected, fairly strenuously, to the claim, as being a lousy use of statistics. John explained to me that the reason it was published was that 10-35% of all (Americans? Humans?) still don’t believe that there has been a warming trend over the last 150 years at all. Lucia and Masters, as you say, assumed no warming trend — more or less straight up independent trials and no autocorrelation, which will then damn skippy make the result very unlikely — results that hold for an imaginary planet with temperatures per month that are pulled out of a hat around some mean from a distribution with some width, which is even more unlikely.
So the “point” is to convince those holdouts that the Earth is in a warming trend at all.
To me this is bizarre in so very many ways. I pointed out that Willis was if anything too kind. To even begin to estimate the correct probability of the outcome, one has to do many things — account for a monotonic or near monotonic warming or cooling trend, both of which would make runs in the top 1/3 more likely depending on the noise (at one or the other end of the trended data). At the moment, following 150 years of global warming post the Dalton minimum, of course it isn’t even close to as unlikely as a flat temperature plus noise estimate will produce. Then, just as Willis averaged over all possible starting months, one similarly has to average over all possible US sized patches of the Earth’s surface (and all possible starting points). The US is roughly 1/50 of the Earth, so even if you do a mutually exclusive partitioning, you get fifty chances in a year right there, and if you use sliding windows looking for any patch where it is true you get far more.
Then, there are places on the Earth’s surface that beat the flat odds all the time. The patch of ocean where El Nino occurs, for example, is roughly the area of the US. Very roughly once a decade it warms up by 0.5-0.9C (compared to the usual monthly temperature the rest of the time) on the surface, and typically stays that way for 1-2 years. It therefore produces this “unusual” event approximately once a decade, very probably almost independent of any superimposed warming or cooling trends.
Curiously, John agreed with me on basically everything, including the fact that the observation is basically meaningless except as proof that we are in a warming trend, which anybody that can actually read a graph can see anyway (and the ones that are going to “deny” that graph aren’t going to be convinced by a little thing like bad, almost deliberately misleading statistics).
Have we really reached the point in climate science where the ends justify the means? Should we be trying to convince young earth creationists that evolution is true and the Universe is old by making egregious and irrelevant claims now, or should we rely on things like radiometric dating and measuring distances to distant stars and galaxies?
This is really a lot more like their arguments with evolutionary biologists. If we shake a box full of “stuff”, it is absurdly improbable that a fully formed organism will fall out, therefore God is necessary. The former is true, and yet horribly misleading and certainly neither proves the consequent nor disproves the mechanism of evolution in any way, but it certainly does emphasize the surprising difference between randomness and structure.
Is this not the exact same argument? In conditions that everybody knows do not hold or pertain to the issue of climate we make an egregious but true statement that is phrased in such a way as to make one think that something important has been proven, that the event in question was really unlikely at the level indicated given the actual data of a near monotonic increase in temperature across the entire thermal record! As if it mattered.
So in retrospect, I will withdraw my earlier conclusion that the result was erroneous. It is perfectly correct.
Which is worse? Being mistaken is forgivable. Deliberately using statistics to mislead people in a political discussion is, well, less forgivable.
So Willis, a question. Suppose you take the data and (say) chop it off at the end of the flattened stretch from the 40’s through the 70s. That’s thirty years or so when the temperature was fairly uniformly in the top third of the shorter data set. You might get lucky and find a stretch of 13 consecutive months in there that are all in the top third too — just not in the current top third. I wonder what one could say to that — the same miracle occurring twice in one single dataset (and quite possibly in a stretch where the temperature was steady or decreasing from the 40’s peak).
rgb

RobertInAz
July 11, 2012 9:49 pm

“So Willis, a question. Suppose you take the data and (say) chop it off at the end of the flattened stretch from the 40′s through the 70s. That’s thirty years or so when the temperature was fairly uniformly in the top third of the shorter data set. ”
Well, the limiting case is to truncate the data to 13 months for a probability of 1. Also 1 for all 13 months in the middle and lower thirds. So the nature of the distribution changes over time even if the trend is flat. My head started to hurt so I dropped back to simple N of M analysis.
