Hell and High Histogramming – Mastering an Interesting Heat Wave Puzzle

Guest Post by Willis Eschenbach

Anthony Watts, Lucia Liljegren, and Michael Tobis have all done a good job blogging about Jeff Masters’ egregious math error. He claimed that a run of high US temperatures had only a 1 in 1.6 million chance of being a natural occurrence. Here’s his claim:

U.S. heat over the past 13 months: a one in 1.6 million event

Each of the 13 months from June 2011 through June 2012 ranked among the warmest third of their historical distribution for the first time in the 1895 – present record. According to NCDC, the odds of this occurring randomly during any particular month are 1 in 1,594,323. Thus, we should only see one more 13-month period so warm between now and 124,652 AD–assuming the climate is staying the same as it did during the past 118 years. These are ridiculously long odds, and it is highly unlikely that the extremity of the heat during the past 13 months could have occurred without a warming climate.

All of the other commenters pointed out reasons why he was wrong … but they didn’t get to what is right.
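For reference, the quoted 1 in 1,594,323 figure is what falls out if each of the 13 months is treated as an independent one-in-three draw. A minimal sketch of that arithmetic follows; the function name is hypothetical and the independence assumption is the one being criticized, not one endorsed here.

```python
from fractions import Fraction

def independent_run_probability(p_single: Fraction, n_months: int) -> Fraction:
    """Probability that n_months independent trials all succeed,
    each with success probability p_single."""
    return p_single ** n_months

# Assumption: every month independently has a 1/3 chance of landing
# in the top third of its historical distribution.
p13 = independent_run_probability(Fraction(1, 3), 13)
odds_against = p13.denominator  # 3**13

print(f"1 in {odds_against:,}")  # prints "1 in 1,594,323"
```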

Let me propose a different way of analyzing the situation … the old-fashioned way, by actually looking at the observations themselves. There are a couple of oddities to be found there. To analyze this, I calculated, for each year of the record, how many of the months from June to June inclusive were in the top third of the historical record. Figure 1 shows the histogram of that data, that is to say, it shows how many June-to-June periods had one month in the top third, two months in the top third, and so on.

Figure 1. Histogram of the number of June-to-June months with temperatures in the top third (tercile) of the historical record, for each of the past 116 years. Red line shows the expected number if they have a Poisson distribution with lambda = 5.206, and N (number of 13-month intervals) = 116. The value of lambda has been fit to give the best results. Photo Source.
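The counting step behind a histogram like Figure 1 can be sketched roughly as below. This is an illustration under stated assumptions, not the spreadsheet actually used: it assumes "top third" means the warmest third of all values recorded for the same calendar month, it runs on synthetic white-noise temperatures rather than the NOAA record, and every function name is hypothetical.

```python
import random

def top_tercile_flags(temps_by_year):
    """temps_by_year: list of 12-element lists (Jan..Dec), one per year.
    Returns a flat chronological list of booleans, True where a month is
    among the warmest third of all values for that calendar month."""
    n_years = len(temps_by_year)
    n_top = n_years // 3
    flags = [[False] * 12 for _ in range(n_years)]
    for m in range(12):
        # Rank the years from warmest to coolest for calendar month m.
        ranked = sorted(range(n_years),
                        key=lambda y: temps_by_year[y][m], reverse=True)
        for y in ranked[:n_top]:
            flags[y][m] = True
    return [flags[y][m] for y in range(n_years) for m in range(12)]

def june_to_june_counts(flat_flags):
    """Count top-tercile months in each 13-month window that starts in
    June (index 5 of a year) and ends the following June."""
    counts = []
    start = 5
    while start + 13 <= len(flat_flags):
        counts.append(sum(flat_flags[start:start + 13]))
        start += 12
    return counts

def histogram(counts, max_count=13):
    """hist[k] = number of windows with exactly k top-tercile months."""
    hist = [0] * (max_count + 1)
    for c in counts:
        hist[c] += 1
    return hist

# Synthetic stand-in data: 117 years of independent Gaussian "temperatures".
random.seed(0)
synthetic = [[random.gauss(0, 1) for _ in range(12)] for _ in range(117)]
flags = top_tercile_flags(synthetic)
counts = june_to_june_counts(flags)
hist = histogram(counts)
```

With 117 synthetic years this yields 116 June-to-June windows, matching the N in the caption; the shape of the resulting histogram says nothing about the real record, since the input is made-up noise.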

The first thing I noticed when I plotted the histogram is that it looked like a Poisson distribution. This is a very common distribution for data which represents discrete occurrences, as in this case. Poisson distributions cover things like how many people you’ll find in line in a bank at any given instant, for example. So I overlaid the data with a Poisson distribution, and I got a good match.

Now, looking at that histogram, the finding of one period in which all thirteen were in the warmest third doesn’t seem so unusual. In fact, with the number of years that we are investigating, the Poisson distribution gives an expected value of 0.2 occurrences. In this case, we find one occurrence where all thirteen were in the warmest third, so that’s not unusual at all.
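The expected-count figure can be checked with a few lines of code. The sketch below assumes the fitted values from the Figure 1 caption (lambda = 5.206, N = 116) and the standard Poisson probability mass function; the function names are hypothetical.

```python
import math

def poisson_pmf(k, lam):
    """Poisson probability of exactly k occurrences when the mean is lam."""
    return math.exp(-lam) * lam ** k / math.factorial(k)

def expected_periods(k, lam, n_periods):
    """Expected number of periods, out of n_periods, showing exactly k hits."""
    return n_periods * poisson_pmf(k, lam)

# Values taken from the Figure 1 caption.
expected_13 = expected_periods(13, 5.206, 116)
print(f"expected periods with 13 of 13 months: {expected_13:.2f}")
```

The result lands at roughly 0.2, in line with the figure quoted in the text.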

Once I did that analysis, though, I thought “Wait a minute. Why June to June? Why not August to August, or April to April?” I realized I wasn’t looking at the full universe from which we were selecting the 13-month periods. I needed to look at all of the 13-month periods, from January-to-January to December-to-December.

So I took a second look, and this time I looked at all of the possible contiguous 13-month periods in the historical data. Figure 2 shows a histogram of all of the results, along with the corresponding Poisson distribution.

Figure 2. Histogram of the number of months with temperatures in the top third (tercile) of the historical record for all possible contiguous 13-month periods. Red line shows the expected number if they have a Poisson distribution with lambda = 5.213, and N (number of 13-month intervals) = 1374. Once again, the value of lambda has been fit to give the best results. Photo Source
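Counting over every contiguous 13-month period amounts to a sliding-window sum over the monthly top-third flags. A minimal sketch, assuming the flags are already available as a chronological list of booleans; the function name is hypothetical, and the series length of 1386 months is picked only so that the window count comes out at the 1374 given in the caption, not because the post states it.

```python
import random

def sliding_window_counts(flags, window=13):
    """Number of True values in every contiguous run of `window` months,
    computed with a running sum so each step is O(1)."""
    if len(flags) < window:
        return []
    running = sum(flags[:window])
    counts = [running]
    for i in range(window, len(flags)):
        # Slide one month forward: add the new month, drop the oldest.
        running += flags[i] - flags[i - window]
        counts.append(running)
    return counts

# Hypothetical stand-in flags: each month independently True with p = 1/3.
random.seed(1)
flags = [random.random() < 1 / 3 for _ in range(1386)]
counts = sliding_window_counts(flags)
print(len(counts))  # a series of n months has n - 12 windows
```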

Note that the total number of periods is much larger (1374 instead of 116) because we are looking, not just at June-to-June, but at all possible 13-month periods. Note also that the fit to the theoretical Poisson distribution is better, with Figure 2 showing only about 2/3 of the RMS error of the first dataset.

The most interesting thing to me is that in both cases, I used an iterative fit (Excel solver) to calculate the value for lambda. And despite there being 12 times as much data in the second analysis, the values of the two lambdas agreed to two decimal places. I see this as strong confirmation that indeed we are looking at a Poisson distribution.
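The post used Excel Solver for the fit. A rough equivalent, swapping in a golden-section search that minimizes the RMS difference between observed and expected counts, might look like the sketch below. The search bounds, the tolerance, the function names and the demonstration histogram (exact Poisson counts for a made-up lambda of 5.2) are all assumptions for illustration.

```python
import math

def poisson_pmf(k, lam):
    """Poisson probability of exactly k occurrences when the mean is lam."""
    return math.exp(-lam) * lam ** k / math.factorial(k)

def rms_error(lam, hist, n_periods):
    """RMS difference between observed counts hist[k] and the expected
    counts n_periods * P(k; lam)."""
    sq = [(obs - n_periods * poisson_pmf(k, lam)) ** 2
          for k, obs in enumerate(hist)]
    return math.sqrt(sum(sq) / len(hist))

def fit_lambda(hist, n_periods, lo=0.5, hi=12.0, tol=1e-6):
    """Golden-section search for the lambda that minimizes rms_error.
    Assumes the error has a single minimum between lo and hi."""
    inv_phi = (math.sqrt(5) - 1) / 2
    a, b = lo, hi
    while b - a > tol:
        c = b - inv_phi * (b - a)
        d = a + inv_phi * (b - a)
        if rms_error(c, hist, n_periods) < rms_error(d, hist, n_periods):
            b = d  # minimum lies in [a, d]
        else:
            a = c  # minimum lies in [c, b]
    return (a + b) / 2

# Demonstration on an idealized histogram whose true lambda is known.
true_lambda = 5.2
n_periods = 1374
ideal_hist = [n_periods * poisson_pmf(k, true_lambda) for k in range(14)]
fitted = fit_lambda(ideal_hist, n_periods)
print(f"fitted lambda: {fitted:.3f}")
```

Running the same routine on the June-to-June histogram and on the all-windows histogram would give the two lambdas whose agreement is discussed above.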

Finally, the sting in the end of the tale. With 1374 contiguous 13-month periods and a Poisson distribution, the number of periods with 13 winners that we would expect to find is 2.6 … so in fact, far from Jeff Masters’ claim that finding 13 in the top third is a one in a million chance, my results show finding only one case with all thirteen in the top third is actually below the number that we would expect given the size and the nature of the dataset …

Data Source, NOAA US Temperatures, thanks to Lucia for the link.


268 Comments

Ric Werme

Editor

July 10, 2012 10:20 pm

I’m impressed at how nicely your 13 month samples fit the Poisson distribution. Nice review.

jorgekafkazar

July 10, 2012 10:23 pm

Lucia has an update based on low serial autocorrelation data for the 48 states. It’s apparently a lot closer to white noise than she thought previously.

Robert Brown

July 10, 2012 10:24 pm

As always, astounding reasoning, Willis. I find your conclusion flawless. I find that it also supports something that I have long suspected — few people are actually qualified to work with statistics or make statistical pronouncements. From what I recall, Jeff was only quoting some ass at NOAA, so perhaps it isn’t his fault. However, you really should communicate your reasoning to him. I think that there is absolutely no question that you have demonstrated that it is a Poisson process with significant autocorrelation — indeed, from the histogram (exactly as one would expect) and that as you say if anything it suggests that there have (probably) been other thirteen month stretches. It is also interesting to note that the distribution peaks at 5 months. That is, the most likely number of months in a year to be in the top 1/3 is between 1/3 and 1/2 of them!
Yet according to the reasoning of the unknown statistician at NOAA, the odds of having any interval of 5 months in the top third are $(1/3)^5 \approx 4/1000$. They seem to think that every month is an independent trial or something.
Sigh.
rgb

Murray Grainger

July 10, 2012 10:28 pm

1 in 2.6 is close enough to 1 in 1.6 million for the average climate alarmist; what’s your beef? Nothing that a little data adjustment won’t fix.

captainfish

July 10, 2012 10:34 pm

Willis, Can I please borrow your brain for a few days.. I could make a gazillion dollars with all that extra smarts and speed of thought. I can’t even comprehend the amount of work and effort that it even took to come up with the line of analysis, let alone sit down with the data. But then, I still don’t have your brain. But then, unfortunately, not many scientists do either.
Thank you, Sir.

John F. Hultquist

July 10, 2012 10:51 pm

Say you investigate a sample of teen age boys. You will find that each has a height. Do the measurements. Run the numbers.
Now investigate a sample of time intervals with lightning strikes. Not every time interval has a strike. There’s the rub!
Your example (people lined up at a bank teller’s window) is the same idea. Somewhere (long ago), I recall hearing or reading that this is exactly why the Poisson distribution was invented. Not that many folks make it that far in their education.

Robert

July 10, 2012 11:09 pm

Willis, I think you are one of the most clever people I have ever read……very well done and keep up the good work,
Robert

Steve R

July 10, 2012 11:14 pm

This whole 1 in 1.6 million issue has been great entertainment. It’s also been an eye opener, to see so many climate scientists struggling with a fairly basic statistical concept.

July 10, 2012 11:28 pm

Nice work, Willis – I can’t see a flaw in your work but it’s been a long day so I’ll look again for the sake of due diligence in the morning. The data has Poisson written all over as I head off to bed, and that doesn’t leave a lot of wiggle room.

Bart

July 10, 2012 11:32 pm

I’m not getting the controversy. We’re not dealing with a stationary process here. The guy said: “These are ridiculously long odds, and it is highly unlikely that the extremity of the heat during the past 13 months could have occurred without a warming climate.” No kidding. And, we are currently at a plateau in temperature which is the result of combining the steady warming since the LIA with the peak of the ~60 year temperature cycle. So, what’s the gripe? The globe has been warming. Everyone knows it’s been warming. The disagreement is over the cause.

Dave Wendt

July 10, 2012 11:44 pm

It would appear that inadvertently Mr. Masters, or whoever provided him with his numbers, has arrived at a ratio that is quite correct, the only problem being the ratio is applied to the wrong query. If you ask “what are the odds of a story about a human caused plague of horrendous heatwaves, which appears in any Lamestream Media source, NOT being complete BS?” the ratio of 1 in 1.6 million appears, to my eye at least, to be just about spot on.

Venter

July 10, 2012 11:48 pm

Brilliant work, Willis.

Steve R

July 10, 2012 11:50 pm

Bart: The point is that the claimed 1 in 1.6 million chance for this June-to-June “event” is BS, and regardless of whether there has been warming or not, this “event” is indistinguishable from random.

Michel

July 10, 2012 11:57 pm

as in the song:

S’il fait du soleil à Paris il en fait partout…

You may have it warm in the US. Here in Europe we have a rather cold and wet early summer.
Is there also a 1 in 2.6 probability of experiencing a series of the coldest 13 continuous months within a 116-year period? Of the wettest? Of the windiest? Of the cloudiest? Of …?
Anthropogenic warming seems to be more frequent in Northern America than elsewhere (or there are more blogs telling this there than elsewhere).

Weather is not climate. What climate would you like? let’s change it!

Willis Eschenbach

Author

July 11, 2012 12:02 am

Robert Brown says:
July 10, 2012 at 10:24 pm

As always, astounding reasoning, Willis. I find your conclusion flawless. I find that it also supports something that I have long suspected — few people are actually qualified to work with statistics or make statistical pronouncements. From what I recall, Jeff was only quoting some ass at NOAA, so perhaps it isn’t his fault. However, you really should communicate your reasoning to him.

Thanks as always for your comments, Robert. I actually feel sorry for Jeff Masters, because I’ve made foolish public errors myself. It’s not easy, it’s painful, and it is the risk we all take when we blog. So I’ll pass on contacting him, it would look like kicking a man when he’s down.
I suspect he will hear about my analysis in any case. Most people who are truly interested in climate science read WUWT, regardless of their position on the AGW supporter/skeptic continuum, if only to find out what fools these mortals be today.
However, I doubt that in general I am “actually qualified to work with statistics”, as my knowledge of statistics (as with many things) is quite wide but is only deep in some places. However, I am well served by my habit of starting (and perhaps even finishing) by using my Mark One Eyeballs. I get as much data as I can, stretching back as far as I can, and then I put it up on the silver screen and I just think about it. I give it the smell test. I re-plot it from some other angle, I’m a graphical thinker. I give it the laugh test. I use color to give it another dimension. If necessary, I look at and consider each and every case or station record or proxy of the data individually. The work is often very boring, but also has flashes of fire and insight.
I look at the data first, before theorizing about the data, or calculating the statistics of the data, or analyzing the situation using pseudo data or red noise, or doing a monte-carlo analysis. I just look at the data in as many graphical representations as I can dream up. Then I look at it again.
I look because at the core, I’m trying to understand the data, not to measure its waistline. Oh, I may get around to that, but before I pull out the cloth tape and start taking the measurements, I want to know the habits and the relationships and the linkages and the interactions and ultimately the very form and meaning of the data.
Because at the end of the day, statistics are a model of reality. As a result, picking the appropriate model for the situation is the central, crucial, indispensable, and often overlooked first step of any statistical analysis. If you don’t start out with the correct understanding of what’s going on, all the statistics in the world won’t help you. And for me, the only way to get that understanding is to look at the longest dataset I can find, from as many angles as I can, and to think about the data in as many ways as I can.
All the best to you,
w.

P. Solar

July 11, 2012 12:03 am

Damn, a poisson distribution. I always knew these climatologists were hiding something fishy!
Very good analysis Willis. It’s a shame that some of these cargo cult scientists are not capable of applying appropriate maths.

P. Solar

July 11, 2012 12:19 am

Doesn’t this analysis basically point out that the warming trend is far smaller than the magnitude of the variations? The fact that it peaks between 4 and 5 months tells us that the dominant variation is of that timescale. This is what we commonly call seasons.
None of this is surprising but it still does a very good job of pointing out how ridiculous and inappropriate Masters’ comment was.
Maybe he should have thought about it before using it. The fact he apparently got it from someone at NOAA does not excuse his need to think whether it makes sense before using it himself.

SasjaL

July 11, 2012 12:21 am

AWG: a chance of 1 in 1.6 million of being close to a correct analysis …

David

July 11, 2012 12:40 am

Six months of record recent heat wave was a once in 800,000 year event????
There have been 372,989 correctly recorded daily high temperature records in the US since 1895. 84% of them were set when CO2 was below 350ppm.
http://stevengoddard.wordpress.com/2012/07/08/heatwaves-were-much-worse-through-most-of-us-history/
http://stevengoddard.wordpress.com/2012/07/08/ushcn-thermometer-data-shows-no-warming-since-1900/
Lots of things are very rare….
http://stevengoddard.wordpress.com/2012/07/11/1970s-global-cooling-was-a-one-in-525000-event/
http://stevengoddard.wordpress.com/2012/07/11/last-ice-age-was-a-one-in-a-google-event/
sheesh!

omnologos

July 11, 2012 12:43 am

Poisson is by far the most important and least known mathematician for all aspects of real (non-relativistic) life.

Willis Eschenbach

Author

July 11, 2012 12:51 am

Bart says:
July 10, 2012 at 11:32 pm

I’m not getting the controversy. We’re not dealing with a stationary process here. The guy said: “These are ridiculously long odds, and it is highly unlikely that the extremity of the heat during the past 13 months could have occurred without a warming climate.”

The controversy is that the “ridiculously long odds” he refers to are wildly incorrect …
w.

tonyb

July 11, 2012 1:41 am

Robert Brown
I think you make a very good point when you say that few people are qualified to talk about statistics. That is so even on quite simple statistics, but at the sort of level of tree ring analysis and many other facets of climate science I think the maths is quite beyond most scientists as it is a special separate field that they are unlikely to have learnt in detail.
It would be interesting to know who is actually qualified to interpret statistics arising from their work or whether they get in genuine experts to check it through. I suspect the number that do is very small indeed.
tonyb

cd_uk

July 11, 2012 1:45 am

Willis
I have to be a pedant here but what you plotted is a bar chart not a histogram 😉
I think the main problem with pushing out the trite case of probability with a time series is that you have no base. For example, where, and at what length of time, would you determine a norm? He assumes that time series are not second-order stationary (why?) and hence you can get a change in the distribution with time (drift), which is implicitly suggested in his (paraphrasing) “…you can only get this if you’re in a warming world…”. This assumption is only the result of incomplete information, as it only appears as such when choosing a small portion of the time series. So absolute nonsense: there is no need for any more statistics in my view, as it is completely flawed because of incomplete data, although you make a more appropriate case here.

Brian H

July 11, 2012 2:02 am

Since, as you say, “one case with all thirteen in the top third is actually below the number that we would expect given the size and the nature of the dataset …” it follows that warming climate reduces the likelihood of severe weather. Which is what the “heat gradient flattening” POV (mine) predicts.

Mike

July 11, 2012 2:02 am

What Willis missed is that Jeff sold his website to the weather channel for some beaucoup buxes and he needs to deliver this sort of bs in a technical fashion so they can feel they are getting a good deal. Model that in your Poisson distribution, big guy!
