Hell and High Histogramming – Mastering an Interesting Heat Wave Puzzle

Guest Post by Willis Eschenbach

Anthony Watts, Lucia Liljegren, and Michael Tobis have all done a good job blogging about Jeff Masters’ egregious math error. His error was that he claimed that a run of high US temperatures had only a chance of 1 in 1.6 million of being a natural occurrence. Here’s his claim:

U.S. heat over the past 13 months: a one in 1.6 million event

Each of the 13 months from June 2011 through June 2012 ranked among the warmest third of their historical distribution for the first time in the 1895 – present record. According to NCDC, the odds of this occurring randomly during any particular month are 1 in 1,594,323. Thus, we should only see one more 13-month period so warm between now and 124,652 AD–assuming the climate is staying the same as it did during the past 118 years. These are ridiculously long odds, and it is highly unlikely that the extremity of the heat during the past 13 months could have occurred without a warming climate.
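The quoted 1 in 1,594,323 is what you get if every month independently has a fixed 1/3 chance of landing in the top third. Here is a minimal Python check under exactly that assumption; the variable names are mine, and the remark about the year is an inference from the arithmetic, not something the quote states.

```python
# Odds of 13 independent months all landing in the top third,
# assuming each month has a fixed 1/3 chance (no trend, no autocorrelation).
p_top_third = 1 / 3
n_months = 13

one_in = 3 ** n_months            # exact integer form of 1 / (1/3)**13
prob = p_top_third ** n_months

# Inference, not stated in the quote: the year 124,652 matches
# 2012 + 3**13 / 13, i.e. the odds divided by the 13-month window length.
implied_year = round(2012 + one_in / 13)

print(one_in, prob, implied_year)
```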

All of the other commenters pointed out reasons why he was wrong … but they didn’t get to what is right.

Let me propose a different way of analyzing the situation … the old-fashioned way, by actually looking at the observations themselves. There are a couple of oddities to be found there. To analyze this, I calculated, for each year of the record, how many of the months from June to June inclusive were in the top third of the historical record. Figure 1 shows the histogram of that data, that is to say, it shows how many June-to-June periods had one month in the top third, two months in the top third, and so on.

Figure 1. Histogram of the number of June-to-June months with temperatures in the top third (tercile) of the historical record, for each of the past 116 years. Red line shows the expected number if they have a Poisson distribution with lambda = 5.206, and N (number of 13-month intervals) = 116. The value of lambda has been fit to give the best results.
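For readers who want to reproduce the counting step, here is a rough Python sketch. It is not the spreadsheet used for the figures. It assumes a plain list of monthly temperatures in chronological order starting in January, ranks each value against the same calendar month in other years, and takes "top third" as the largest floor(n/3) values for that month (ties at the cutoff all count). The function names are made up for illustration.

```python
def top_third_flags(temps):
    """Flag each month that sits in the top third of its own calendar month.

    temps: monthly values in chronological order, starting in January
    (an assumption of this sketch).
    """
    flags = [False] * len(temps)
    for month in range(12):
        idx = list(range(month, len(temps), 12))
        values = sorted(temps[i] for i in idx)
        n_top = len(values) // 3
        if n_top == 0:
            raise ValueError("need at least three years of data per month")
        cutoff = values[len(values) - n_top]   # smallest value in the top third
        for i in idx:
            flags[i] = temps[i] >= cutoff
    return flags


def window_counts(flags, width=13):
    """Number of flagged months in every contiguous window of `width` months."""
    return [sum(flags[s:s + width]) for s in range(len(flags) - width + 1)]


def histogram(counts, width=13):
    """hist[k] = number of windows containing exactly k flagged months."""
    hist = [0] * (width + 1)
    for c in counts:
        hist[c] += 1
    return hist
```

Taking every twelfth window (the ones that start in June) gives the June-to-June subset; using all windows gives the full set of contiguous 13-month periods.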

The first thing I noticed when I plotted the histogram is that it looked like a Poisson distribution. This is a very common distribution for data which represents discrete occurrences, as in this case. Poisson distributions cover things like how many people you’ll find in line in a bank at any given instant, for example. So I overlaid the data with a Poisson distribution, and I got a good match.

Now, looking at that histogram, the finding of one period in which all thirteen were in the warmest third doesn’t seem so unusual. In fact, with the number of years that we are investigating, the Poisson distribution gives an expected value of 0.2 occurrences. In this case, we find one occurrence where all thirteen were in the warmest third, so that’s not unusual at all.

Once I did that analysis, though, I thought “Wait a minute. Why June to June? Why not August to August, or April to April?” I realized I wasn’t looking at the full universe from which we were selecting the 13-month periods. I needed to look at all of the 13-month periods, from January-to-January to December-to-December.

So I took a second look, and this time I looked at all of the possible contiguous 13-month periods in the historical data. Figure 2 shows a histogram of all of the results, along with the corresponding Poisson distribution.

Figure 2. Histogram of the number of months with temperatures in the top third (tercile) of the historical record for all possible contiguous 13-month periods. Red line shows the expected number if they have a Poisson distribution with lambda = 5.213, and N (number of 13-month intervals) = 1374. Once again, the value of lambda has been fit to give the best results.

Note that the total number of periods is much larger (1374 instead of 116) because we are looking, not just at June-to-June, but at all possible 13-month periods. Note also that the fit to the theoretical Poisson distribution is better, with Figure 2 showing only about 2/3 of the RMS error of the first dataset.

The most interesting thing to me is that in both cases, I used an iterative fit (Excel solver) to calculate the value for lambda. And despite there being 12 times as much data in the second analysis, the values of the two lambdas agreed to two decimal places. I see this as strong confirmation that indeed we are looking at a Poisson distribution.

Finally, the sting in the end of the tale. With 1374 contiguous 13-month periods and a Poisson distribution, the number of periods with 13 winners that we would expect to find is 2.6 … so in fact, far from Jeff Masters’ claim that finding 13 in the top third is a one in a million chance, my results show finding only one case with all thirteen in the top third is actually below the number that we would expect given the size and the nature of the dataset …
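The expected counts can be checked with a few lines of Python. This sketch takes the lambda and N values from the two figure captions and multiplies N by the Poisson probability of exactly 13. The function names are mine, and it takes the Poisson form as given rather than testing it.

```python
import math


def poisson_pmf(k, lam):
    """Probability of exactly k events under a Poisson distribution."""
    return math.exp(-lam) * lam ** k / math.factorial(k)


def expected_periods(k, lam, n_periods):
    """Expected number of periods, out of n_periods, showing exactly k."""
    return n_periods * poisson_pmf(k, lam)


# Parameters as quoted in the Figure 1 and Figure 2 captions.
june_to_june = expected_periods(13, 5.206, 116)
all_periods = expected_periods(13, 5.213, 1374)

print(f"June-to-June: {june_to_june:.2f}   all periods: {all_periods:.2f}")
```

With these rounded lambdas the first number comes out near 0.2 and the second near 2.5, a little under the 2.6 quoted above; the gap is presumably down to rounding in the fitted lambda.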

Data Source, NOAA US Temperatures, thanks to Lucia for the link.

268 Comments

rgbatduke

July 12, 2012 11:25 am

So no, Masters was NOT setting out to prove the climate was warming, that’s totally contradicted by his own words. He was claiming that in the current, warming climate, the odds were greatly against 13 being in the warmest third. They are not, it’s about a 50/50 bet.
And I almost agree, except that (as I explained in some detail) it’s more subtle than that. I don’t really object to your histogram and projected probability (as I pointed out in my very first post) I think it is actually very persuasive.
What I disagree with is that the observation is actually far more interesting than that — if it is properly analyzed. It can always be “p happens” (or “a black swan event”) but in truth it remains unlikely even in trended data with noise! unless the data either has substantial autocorrelation, substantial skew/kurtosis, or (almost the same thing) something happened to sigma. Or, of course, unless there is undetected bias in the underlying data set!
Random number generator testing is my thing. So here’s a formal null hypothesis.
a) The data being fit (shall we say GISS) to determine both trend and sigma is unbiased.
b) Given the trend and sigma from the data, the probability of obtaining 13 months in a row in the top 1/3 of all of those particular months in the dataset is small, say p = 0.001.
c) We observe a string of 13 months in a row in the very first/only experiment we conduct. The probability of obtaining this is 0.001.
Most people who do hypothesis testing would at least provisionally reject the null hypothesis, would they not? Or they would look at the data more carefully and recompute the probability, perhaps slapping themselves on the forehead and going “Doh!” at the same time. What they would not do is use this to conclude anything egregious based on their computation of 0.001, because the very smallness of the probability is strong Bayesian evidence that it is wrong!, especially when it happens in the one-trial sampling of 100 years.
Yes, sometimes random number generator testers produce results where p = 0.001 (or less) for good random number generators. My own tester sometimes does. Roughly one time in a 1000, for a good generator and a good test. But if it happened the first and only time I could run a known good test on a presumed good (null hypothesis) generator, I would hesitate to use that generator anywhere I really counted on the results being unbiased.
rgb

JJ

July 12, 2012 11:29 am

Willis,
So no, Masters was NOT setting out to prove the climate was warming, that’s totally contradicted by his own words. He was claiming that in the current, warming climate, the odds were greatly against 13 being in the warmest third.
That statement is completely false.
Quoting you, quoting Masters:
“These are ridiculously long odds, and it is highly unlikely that the extremity of the heat during the past 13 months could have occurred without a warming climate.”
You left that conclusory sentence out of your most recent post, though you did include it up top. Odd.
Masters’ point was that the recent 13 observations are so unlikely to have occurred in an unchanging climate that the climate must be warming. That is a non-sequitur built on a strawman, but it is what he meant to do. You don’t appear to understand what he was getting at, which likely explains the irrelevance of your post.
Masters was very wrong in the argument he made, but what you have presented above does not engage it.

Willis Eschenbach

Author

July 12, 2012 11:43 am

Nick Stokes says:
July 11, 2012 at 9:10 pm

Willis,
Your choice of a Poisson distribution has been criticised, not least because it gives a finite probability for getting 14 months out of 13. And if it gets that tail value wrong, 13/13 is a worry too.

Nick, as always good to hear from you. You are correct, but the difference is trivially small. The cumulative Poisson distribution for the lambda in question (5.17, the mean of the data) from 0 to 13 is 0.9990. As a result, the largest difference it could make is 0.001 …
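A quick Python check of that cumulative figure, summing the Poisson probabilities for 0 through 13 with lambda = 5.17. The function names are illustrative only.

```python
import math


def poisson_pmf(k, lam):
    """Probability of exactly k events under a Poisson distribution."""
    return math.exp(-lam) * lam ** k / math.factorial(k)


def poisson_cdf(k, lam):
    """Probability of k or fewer events."""
    return sum(poisson_pmf(i, lam) for i in range(k + 1))


lam = 5.17                        # mean of the data, as quoted in the comment
mass_0_to_13 = poisson_cdf(13, lam)
mass_above_13 = 1 - mass_0_to_13  # probability assigned to "14 or more of 13"

print(f"{mass_0_to_13:.4f} {mass_above_13:.4f}")
```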

In fact, the Poisson is just the limiting form of the binomial for events of low probability. So the binomial for 13 would look quite like a Poisson anyway, and doesn’t have this issue. So you might as well use it.

As I showed above, the Kolmogorov-Smirnov test resoundingly rejects the binomial distribution for the results, while it fails to reject it being a Poisson distribution.
In addition, the histogram of a binomial for 13 with 1374 trials looks nothing like that of a Poisson distribution for the same number of trials. In particular, the frequencies at the higher end are much greater for the Poisson case … here’s a typical random Poisson (red) vs. binomial (blue) (1374 trials, lambda = 5.17 for Poisson, p=5.17/13 for binomial):
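The same comparison can be made without random draws by putting the two probability mass functions side by side. This Python sketch uses the parameters quoted in the comment (lambda = 5.17, p = 5.17/13, 13 months); it compares theoretical probabilities only and says nothing about which one the temperature data actually follow.

```python
import math

N_MONTHS = 13
LAM = 5.17                 # Poisson mean quoted in the comment
P = LAM / N_MONTHS         # binomial p chosen to give the same mean


def poisson_pmf(k, lam):
    return math.exp(-lam) * lam ** k / math.factorial(k)


def binomial_pmf(k, n, p):
    return math.comb(n, k) * p ** k * (1 - p) ** (n - k)


# One row per possible count: (k, Poisson probability, binomial probability).
rows = [(k, poisson_pmf(k, LAM), binomial_pmf(k, N_MONTHS, P))
        for k in range(N_MONTHS + 1)]

for k, po, bi in rows:
    print(f"{k:2d}  Poisson {po:.6f}  binomial {bi:.6f}")
```

Multiplying either column by 1374 gives the expected histogram heights; in the upper tail (10 through 13) the Poisson column is the larger one.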

In fact, that’s just what Masters did, with p=1/3. In effect, you’re regarding this p as a fittable parameter, rather than understood from first principles. And when fitted, it comes out to something different.

No, I’m not. I’m using the mean of the data as the lambda in a Poisson distribution. I’m not doing anything with p.

That discrepancy is an issue, but I think in any case if you do want to fit a distribution, the binomial is better.

Kolmogorov and Smirnov beg to differ …
w.

Willis Eschenbach

Author

July 12, 2012 11:45 am

JJ says:
July 12, 2012 at 11:29 am

Willis,

So no, Masters was NOT setting out to prove the climate was warming, that’s totally contradicted by his own words. He was claiming that in the current, warming climate, the odds were greatly against 13 being in the warmest third.

That statement is completely false.

He said, and I quote:

Thus, we should only see one more 13-month period so warm between now and 124,652 AD–assuming the climate is staying the same as it did during the past 118 years.

So no, he is not trying to show that the climate is warming. He specifically said that those are the odds ASSUMING THAT THE CLIMATE IS WARMING.
w.

HenryP

July 12, 2012 12:13 pm

Gail says (quoting somebody-who)
“What we’re seeing is a long term trend, a steady decrease in pressure that began sometime in the mid-1990s,” explains Arik Posner, NASA’s Ulysses Program Scientist in Washington DC.
Henry says
Well, what did I tell you
http://wattsupwiththat.com/2012/07/10/hell-and-high-histogramming-an-interesting-heat-wave-puzzle/#comment-1030645
I am still puzzling about the connection with ozone
I am sure I will still find out

July 12, 2012 12:14 pm

Willis,
He said, and I quote:
You need to read what you quote. For your convenience, I have bolded the parts that don’t comport with your misunderstanding.
“Thus, we should only see one more 13-month period so warm between now and 124,652 AD–assuming the climate is staying the same as it did during the past 118 years.”
So no, he is not trying to show that the climate is warming. He specifically said that those are the odds ASSUMING THAT THE CLIMATE IS WARMING.
Masters was parroting NCDC’s talking point that the 13 recent observations demonstrate that the climate is not static, and thus must be warming. That was their whole point. I am at a loss to explain how a person whose mother tongue is English cannot understand this. Read the whole paragraph, in toto:
U.S. heat over the past 13 months: a one in 1.6 million event
Each of the 13 months from June 2011 through June 2012 ranked among the warmest third of their historical distribution for the first time in the 1895 – present record. According to NCDC, the odds of this occurring randomly during any particular month are 1 in 1,594,323. Thus, we should only see one more 13-month period so warm between now and 124,652 AD–assuming the climate is staying the same as it did during the past 118 years. These are ridiculously long odds, and it is highly unlikely that the extremity of the heat during the past 13 months could have occurred without a warming climate.
Lucia gets it. Here is how she summarized her replication of Masters’ calc, using improved stats. For your convenience, I have bolded the parts where she refers to the conclusion Masters draws from his calc, and the assumption his calc is based on:
So, what does the 10% probability this mean about global warming?
Nothing. Absolutely nothing. What this means is that trying to demonstrate global warming by estimating the odds of getting 13 months of temperatures in the top 1/3rd of historic records under the assumption that the climate has not changed is often a stoooopid way of proving or disproving global warming.
Once again, Masters is wrong but your post does not engage his thesis.

Willis Eschenbach

Author

July 12, 2012 12:22 pm

rgbatduke says:
July 12, 2012 at 11:09 am (Edit)

As said by multiple posters here and elsewhere – the 13 month period of high temperatures is extremely unlikely without a climate trend. With the warming trend, it goes from a 5-6 sigma event to a 2-3 sigma. And that is the point that Masters was making.

And I agree (although what the sigma is depends, as noted, on parameters of the model estimation process and their best interpretation is that the data fails the null hypothesis of unbiased data, BTW, precisely because it is a 2-3 sigma event, infinitely more so as a 5-6 sigma event).

I always hate to disagree with you, Robert, because your science-fu is strong. But no, that’s not the point he was making. He specifically said that those odds of 1 in 1.6 million are “assuming the climate is staying the same as it did during the past 118 years”. His full quote:

According to NCDC, the odds of this occurring randomly during any particular month are 1 in 1,594,323. Thus, we should only see one more 13-month period so warm between now and 124,652 AD–assuming the climate is staying the same as it did during the past 118 years.

Since virtually everyone agrees that the climate has warmed over the past 118 years, he is specifically stating that those are the odds assuming a warming climate, and thus he is not claiming that those odds show that the climate is warming.
Finally, I wish to make it clear that the issue is not the autocorrelation, which is quite small (0.15). It is the non-stationarity of the dataset that has tripped him up.
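For anyone wanting to check an autocorrelation figure of this kind, here is a minimal lag-1 estimator in Python. It is a generic textbook form, not necessarily the calculation behind the 0.15 quoted above, and the function name is made up.

```python
def lag1_autocorr(x):
    """Lag-1 autocorrelation: covariance of neighbours over total variance."""
    n = len(x)
    mean = sum(x) / n
    num = sum((x[i] - mean) * (x[i + 1] - mean) for i in range(n - 1))
    den = sum((v - mean) ** 2 for v in x)
    return num / den


# A steadily rising series shows positive lag-1 autocorrelation purely
# because of its trend, which is why trend and autocorrelation are easy
# to confuse in a non-stationary record.
trend_only = lag1_autocorr([1, 2, 3, 4, 5])
print(trend_only)
```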
w.

Willis Eschenbach

Author

July 12, 2012 12:29 pm

Phil. says:
July 12, 2012 at 11:12 am

Willis Eschenbach says:
July 12, 2012 at 10:33 am

That was what I objected to. You and KR and other folks say that he was using his calculation to show the climate was warming. But he specifically made the claim that he was talking about the odds in a warming climate, not that he was using those odds to show that the climate was warming.
Now, what I have done is show that the odds, not in your claimed theoretical world but in the current warming climate that he himself specified (or as he said, “assuming the climate is staying the same as it did during the past 118 years”), were nothing like what he claimed. If we assume (as he did) the climate is as it was in the last 118 years, then my result gives the correct odds for it happening.

No it doesn’t because as pointed out before your assumption that it is the result of a Poisson process is wrong because you can’t use a Poisson process when there is a trend.

Since by every measure that I can find the results have a Poisson distribution, I fear you are going to have to take that claim up with Mother Nature. I’m just following the observations, and as near as I can tell, they have a Poisson distribution.
I have said several times that if folks think that the results have a different distribution, they need to say what that distribution is … no takers so far. However, you seem convinced that it’s not a Poisson distribution, so how about you give us some idea of what distribution we’re looking at.

Not only that but your own results show that it’s inappropriate because the mean for the statistic is defined to be 4.33 not the arbitrary fitted 5.2 that you found. So even if it were a Poisson process you don’t get the right odds because you use the wrong data.

The “unbiased estimator” for the variable lambda in a Poisson distribution is known to be the mean of the distribution. That is what I have used. It is not “arbitrarily fitted”, although an iterative fit gives the same answer … which is further evidence that it is in fact a Poisson distribution.
But heck, if you think it is something else, let us know what you think it is. I have shown above that the Kolmogorov-Smirnov test rules out a normal distribution and a binomial distribution … so what do you think the distribution is?
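For reference, the Kolmogorov-Smirnov statistic itself is easy to compute from a histogram of counts. The Python sketch below is a hand-rolled version against a Poisson CDF, not the routine used for the results quoted in this thread. One caveat that matters here: the standard K-S critical values assume a continuous distribution with parameters fixed in advance, so with count data and a lambda estimated from the same data the usual p-values are only approximate. The counts in the example are invented for illustration.

```python
import math


def poisson_cdf(k, lam):
    """Probability of k or fewer events under a Poisson distribution."""
    return sum(math.exp(-lam) * lam ** i / math.factorial(i)
               for i in range(k + 1))


def ks_statistic(hist, lam):
    """Largest gap between the empirical CDF and the Poisson CDF.

    hist[k] = number of observations equal to k. Both CDFs are step
    functions that jump only at integers, so checking the integers is enough.
    """
    n = sum(hist)
    cumulative = 0
    worst = 0.0
    for k, count in enumerate(hist):
        cumulative += count
        worst = max(worst, abs(cumulative / n - poisson_cdf(k, lam)))
    return worst


# Invented histogram, purely to show the call; NOT the temperature data.
made_up_hist = [1, 4, 9, 14, 17, 17, 14, 10, 7, 4, 2, 1, 0, 0]
lam_hat = sum(k * c for k, c in enumerate(made_up_hist)) / sum(made_up_hist)
d = ks_statistic(made_up_hist, lam_hat)
print(f"lambda = {lam_hat:.3f}, D = {d:.4f}")
```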
w.

Willis Eschenbach

Author

July 12, 2012 12:36 pm

Bart says:
July 12, 2012 at 9:11 am

Nigel Harris says:
July 12, 2012 at 1:53 am

“As several commenters have pointed out (with greater or lesser degrees of condescension), your analysis is tautologous. “

That’s not quite right either, though. IF these data fit the requirements for the particular distribution, it would be quite possible to estimate a non-trivial probability for an event which had not been observed, and the mean frequency of such events in any case.

Thank you, Bart. At least someone gets it. And indeed, as you point out it is “quite possible to estimate a non-trivial probability for an event which had not been observed”. We know this because it is possible to estimate the non-trivial probability of finding 12 out of 13 in the full dataset, merely by looking at the June-to-June data, despite the fact that such an event had not been observed in the June-to-June data.
So your theoretical claim is borne out by the observations.
w.

Phil.

July 12, 2012 3:22 pm

Willis Eschenbach says:
July 12, 2012 at 11:43 am
Nick Stokes says:
July 11, 2012 at 9:10 pm
“In fact, that’s just what Masters did, with p=1/3. In effect, you’re regarding this p as a fittable parameter, rather than understood from first principles. And when fitted, it comes out to something different.”
No, I’m not. I’m using the mean of the data as the lambda in a Poisson distribution. I’m not doing anything with p.
That’s right, you’re using an arbitrary value for p obtained from fitting a distribution as the parameter governing a controlling Poisson process, which it can’t be since the required conditions for a Poisson process aren’t met. If they were, p for the process is 1/3 and the mean it gives is 4.33, not 5.2. When the correct value is used the probability for 13 out of 13 is approx. 1/2500. Masters’ statement that, using a binomial distribution, the odds of it happening again were about 1/1.5 million in any given month, hence in an unchanging climate not likely to occur for a long time, was overestimated because of the failure to account for autocorrelation, although as shown by Lucia only by about a factor of ten. As I posted before but apparently got lost, the reason you got a false mean is because of the trend, so your fitted value has no predictive value.
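The 1/2500 figure appears to correspond to a Poisson probability of exactly 13 with lambda = 13/3; that reading is an inference from the arithmetic, since the comment does not show its working. A short Python sketch, with illustrative names, puts it next to the binomial value for p = 1/3:

```python
import math

N_MONTHS = 13
P_TERCILE = 1 / 3
lam = N_MONTHS * P_TERCILE          # 4.33, the mean implied by p = 1/3

# Poisson probability of exactly 13 with that mean.
poisson_13 = math.exp(-lam) * lam ** N_MONTHS / math.factorial(N_MONTHS)

# Binomial probability of 13 out of 13 with p = 1/3 (independent months).
binomial_13 = P_TERCILE ** N_MONTHS

print(f"Poisson: 1 in {1 / poisson_13:.0f}   binomial: 1 in {1 / binomial_13:.0f}")
```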
A simple illustration is if the data can be divided into two parts, the early part with a mean temperature of say 15° which is governed by a Poisson process the mean of which is 4.33, the second part with a mean temperature of say 15.5° which is also governed by a Poisson process with a mean of 4.33. If you look at the resultant composite distribution produced it is still a Poisson distribution but with a mean of 8.67, however that parameter has no predictive value!

Willis Eschenbach

Author

July 12, 2012 3:25 pm

Nigel Harris says:
July 12, 2012 at 5:42 am

cd_uk says

“you’ll see a lot worse in peer reviewed literature”.

I challenge you to find a single example of peer reviewed literature in any non-vanity journal that includes an analysis that is as bad (on so many levels) as this is. This is cargo cult science at its finest.
It would appear that most commenters on WUWT really have no critical faculties at all. The thought process seems to go: Willis seems like a good bloke and he writes lots of sciency-looking stuff that always comes to the conclusions I want to hear, so everything he writes must be great, and anyone pointing out the glaring flaws in his circular argument should “lighten up”.

Nigel, I believe I have replied in detail to every single issue that you have raised. Are you right? Am I right? That question is still not answered. Some people have agreed with you, and some with me.
As a result, your claim that people here have “no critical faculties” has nothing to support it … other than the fact that you are not showing much in that line, I suppose.
What’s not clear to me is why you have decided to go on a rant abusing almost everyone’s critical faculties … I thought we were discussing distributions.
Unilaterally declaring victory and insulting “most commenters” doesn’t raise your reputation in anyone’s opinion, it just makes you look like a sore loser.
I have asked several times for people who do not think this is a Poisson distribution to identify what kind of distribution it is, and to verify that statistically. I have shown that K-S rejects normal and binomial distributions, and fails to reject Poisson. So if you’re so damn smart, how about you tell us what kind of distribution it is, and give us the Kolmogorov-Smirnov results that support your claim?
Because so far, all you’ve shown us is a smart mouth … and that’s a whole lot different than a smart mind.
w.

Willis Eschenbach

Author

July 12, 2012 4:15 pm

KR says:
July 12, 2012 at 6:31 am

Willis Eschenbach – The question Masters was investigating was how likely the 13 months in a row of top 1/3 temperatures was absent a trend?

Masters said:

Thus, we should only see one more 13-month period so warm between now and 124,652 AD–assuming the climate is staying the same as it did during the past 118 years.

That means that he is giving the odds assuming the climate is warming, unless you are claiming that Masters thinks the climate was not warming over the past 118 years.
However, that is a very peripheral issue to the question of the correct odds of finding the 13 in the warmest third.

[Incidentally, insofar as the Shapiro-Wilk test goes, monthly anomalies standardized by their SD (which is reasonable considering that the top 1/3 check is on a monthly basis) do follow the normal distribution.]

I’m supposed to be impressed because you can transform something which is not a normal distribution into a normal distribution? I’m not. But in any case, the question is the distribution of the results, not the distribution of the data.

The question you asked (and answered) is how much do the observations look like the observations? You fit a Poisson distribution – you might as well have fit a skewed Gaussian, a spline curve, or a Nth order polynomial; each would be in that case descriptions of the observations.

How many times do I have to say it? I looked to see if the data had the form of a standard Poisson distribution. It does have that form, with lambda equal to the mean of the data just as you would expect. I didn’t tweak the data, I didn’t skew a gaussian to make it agree, I didn’t fit a polynomial. I looked to see if it fit a bog-standard Poisson distribution, and it does fit it to a T. Not only that, but the K-S test rejected both normal and binomial distributions, but it failed to reject a Poisson distribution.
Your claim that I can’t apply Poisson statistics to these results is like looking at results that follow a bog-standard binomial distribution, say results from flipping a coin, and saying “Hey, you can’t apply binomial statistics to coin flipping! You’ve fit your results to a binomial distribution”.
No, flipping coins is not “fit” to a binomial distribution, any more than the data in this case is “fit” to a Poisson distribution. As near as I can tell, that’s what the distribution actually is, or at least is indistinguishable from.
If you think it is following another distribution, what distribution is it following, and what does the K-S test say about your claim?
w.

Phil.

July 12, 2012 4:32 pm

Willis Eschenbach says:
July 12, 2012 at 12:36 pm
Bart says:
July 12, 2012 at 9:11 am
Nigel Harris says:
July 12, 2012 at 1:53 am
“As several commenters have pointed out (with greater or lesser degrees of condescension), your analysis is tautologous. “
That’s not quite right either, though. IF these data fit the requirements for the particular distribution, it would be quite possible to estimate a non-trivial probability for an event which had not been observed, and the mean frequency of such events in any case.
Thank you, Bart. At least someone gets it. And indeed, as you point out it is “quite possible to estimate a non-trivial probability for an event which had not been observed”. We know this because it is possible to estimate the non-trivial probability of finding 12 out of 13 in the full dataset, merely by looking at the June-to-June data, despite the fact that such an event had not been observed in the June-to-June data.
The most important word in Bart’s post is “IF”. Unfortunately, as pointed out before, the requirements for a Poisson process are not met and the probability estimate you make will not be accurate. Regardless of the form of the distribution, it’s trivial to predict that in the full dataset there must be at least two 12 out of 13 samples.

Steve R

July 12, 2012 5:09 pm

I’m so freakin mixed up. Is Masters really right? Are we really not going to see another 13 month heat wave for 1.6 million months? WUWT?

Bart

July 12, 2012 5:10 pm

Phil. says:
July 12, 2012 at 4:32 pm
‘The most important word in Bart’s post being “IF”’
Indeed it is. I am not taking sides in this debate. That would require me to do work of my own to investigate the issues, and I’m not motivated to do so due to the triviality of its impact on the larger AGW debate. So much heat generated for so little light in this thread…
“…unfortunately as pointed out before the requirements for a Poisson process are not met and the probability estimate you make will not be accurate.”
Poisson or not, the general morphology is reasonably close. It could easily be accessible to the field of non-parametric statistical methods, which I’d imagine might well yield similar conclusions.

Phil.

July 12, 2012 5:39 pm

Bart, I’m interested that you think the ‘morphology is reasonably close’ since Willis’s fit of a Poisson says that there is an approximately 40% probability of an event being in the top third of its historical range!
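The 40% figure follows from dividing the fitted mean by the window length. A two-line Python check, using the rounded 5.2 quoted in the thread (variable names are mine):

```python
n_months = 13
fitted_mean = 5.2                       # rounded fitted lambda from the post

implied_p = fitted_mean / n_months      # per-month chance implied by the fit
tercile_mean = n_months / 3             # mean you would get if p were 1/3

print(f"implied p = {implied_p:.2f}, mean for p = 1/3 is {tercile_mean:.2f}")
```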

KR

July 12, 2012 7:10 pm

Willis Eschenbach
KR: Masters said:
“Thus, we should only see one more 13-month period so warm between now and 124,652 AD–assuming the climate is staying the same as it did during the past 118 years.”
WE: “That means that he is giving the odds assuming the climate is warming, unless you are claiming that Masters thinks the climate was not warming over the past 118 years.”
Masters quoted 1:1,594,323, which is the value given by 1/3^13, or the chance of 13 successive months being in the top 1/3 of their historic range assuming no auto-correlation. Not their recently trending range, but the range over the last 117 years. Those are the odds for a non-trending climate.
He then stated (as you quoted in the opening post!!!): “These are ridiculously long odds, and it is highly unlikely that the extremity of the heat during the past 13 months could have occurred without a warming climate.”
Masters quoted the odds for a non-trending climate as an illustration of the trend. I’m really scratching my head over how anyone could interpret his words otherwise.
—
The other issue I have with this thread is that your Poisson fit is purely descriptive – the observations fit a curve which predicts the observations, in a dog-chasing-tail fashion. I got roughly the same quality of fit with a cubic spline, and with a skewed Gaussian. In each and every case that description of the data has a close to 1:1 match to the observations it’s derived from.
But the whole discussion is about how likely those observations would be given the full record and the observed variance. For that you need a prediction (not a derivation) from the statistical qualities of the data, and you have not done that half of the investigation. The only thing you have stated is The observations closely resemble… the observations. That’s not a probability test.
Have you looked at Lucia’s Monte Carlo tests? The ones that from the data variance predict odds of ~1:150,000 of this 13 month streak occurring without a trend?

joeldshore

July 12, 2012 7:11 pm

Willis says:

Since virtually everyone agrees that the climate has warmed over the past 118 years, he is specifically stating that those are the odds assuming a warming climate, and thus he is not claiming that those odds show that the climate is warming.

That is rather a tortured reading of what Masters actually said. Furthermore, if virtually everyone agrees on this, then why does Anthony regularly post stuff claiming that the heat wave cannot in any way be related to global warming or that the U.S. was just as hot back in the 1930s or other such stuff.
And, furthermore, if that was what Masters was trying to show, why would he argue that this likelihood is so small in a warming climate? Is he trying to prove it is not warming? Your interpretation basically makes no sense at all.

Bart

July 12, 2012 7:41 pm

joeldshore says:
July 12, 2012 at 7:11 pm
“… why does Anthony regularly post stuff claiming that the heat wave cannot in any way be related to global warming or that the U.S. was just as hot back in the 1930s or other such stuff.”
A) We’re talking extreme weather events in that case, not the fractions of a degree of observed warming according to the global temperature metric.
B) Why do people on your side regularly post stuff claiming extreme cold weather we experience in no way refutes AGW? If extreme hot proves AGW, surely extreme cold refutes it.
But, thanks for crystallizing the debate for me. I now realize that, for categorizing temperatures into bins, the modest warming we had in the early and latter thirds of the 20th century is relatively small with little impact on extreme weather, and Willis is probably on the right track after all.

July 12, 2012 8:32 pm

KR says:
July 12, 2012 at 7:10 pm
Masters quoted 1:1,594,323, which is the value given by 1/3^13, or the chance of 13 successive months being in the top 1/3 of their historic range assuming no auto-correlation. Not their recently trending range, but the range over the last 117 years. Those are the odds for a non-trending climate.

Not quite. The 1:1.6 million corresponds to the probability that this particular 13-month stretch will be in the top 1/3 of historical temperatures. However, the real question that we want to answer is what is the probability that we will observe at least one 13-month stretch that is in the top 1/3 of the historical range. To answer this question, you must evaluate the probability of observing this streak against all possible outcomes. For independent trials, the probability of this occurring is 1 in 1730.
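The comment does not show how the 1 in 1730 was obtained. One way to get a number of that kind is an exact dynamic program over the current streak length, for independent months with p = 1/3. In the Python sketch below the record length of 1386 months (the post's 1374 windows plus 12) is my assumption, so the result should only be expected to land in the same neighborhood as the quoted figure.

```python
def prob_streak(n_trials, streak, p):
    """Probability of at least one run of `streak` successes in n_trials
    independent trials, each succeeding with probability p."""
    # state[k] = probability the current run has length k and no full
    # streak has been completed so far.
    state = [0.0] * streak
    state[0] = 1.0
    hit = 0.0
    for _ in range(n_trials):
        new = [0.0] * streak
        new[0] = sum(state) * (1 - p)      # a failure resets the run
        for k in range(streak - 1):
            new[k + 1] = state[k] * p      # a success extends the run
        hit += state[streak - 1] * p       # run reaches full length
        state = new
    return hit


# 1386 months is an assumed record length, not a figure from the thread.
p_any = prob_streak(1386, 13, 1 / 3)
print(f"about 1 in {1 / p_any:.0f}")
```

Under these assumptions the answer is on the order of 1 in 1,700, several hundred times more likely than the single-window figure, simply because there are many places a streak could start.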

Willis Eschenbach

Author

July 12, 2012 9:30 pm

KR says:
July 12, 2012 at 7:10 pm

… your Poisson fit is purely descriptive – the observations fit a curve which predicts the observations, in a dog-chasing-tail fashion. I got roughly the same quality of fit with a cubic spline, and with a skewed Gaussian. In each and every case that description of the data has a close to 1:1 match to the observations it’s derived from.

Thanks, KR. Since you have neglected to give us the Kolmogorov-Smirnov results for your distributions, I can only assume that you haven’t calculated them or you aren’t saying. Until you do, I won’t comment on your claims, they’re purely anecdotal. In any case, I was unaware that “cubic spline” was a distribution …

But the whole discussion is about how likely those observations would be given the full record and the observed variance. For that you need a prediction (not a derivation) from the statistical qualities of the data, and you have not done that half of the investigation.

That’s the theoretical way to find out “how likely those observations would be given the full record and the observed variance”, and it’s a good way to do it. In that method, you look at the distribution of the data, and from that you draw your conclusions about what results you might find.
But it’s not the only way to find out “how likely those observations would be given the full record and the observed variance”. In the other method, the one I’m using here, you look at the distribution of the results, and from that you draw your conclusions about what further results you might find.
You keep saying I can’t look at the distribution of the results and draw conclusions, that somehow that is “fitting” the results. But you are advising me to do the same thing with the data—to look at the distribution of the data and draw conclusions.

The only thing you have stated is The observations closely resemble… the observations. That’s not a probability test.

Yes, and the only thing that you have stated is that The data closely resembles … the data. Here’s the difference in our methods.
You are looking at the distribution of the underlying data.
I am looking at the distribution of the actual results.
Perhaps it would make more sense if you think of it as a “black box” type of analysis. In that type of analysis, you have a black box, which has outputs, but you don’t know what goes on in the black box. All you know are the outputs of the black box. You have to study the outputs because you don’t know the details of what’s in there. The goal is to figure out what kind of process is going on inside the black box.
So for example if we determine that what comes out of the black box are numbers in a Gaussian distribution, we can say that the mystery process is Gaussian. And based on that fact alone, we can make predictions about what numbers will come out of the black box in the future. Now, we don’t know what the process is in the black box. It might be a speck of nuclear material with a counter that spits out a “1” when it detects a nuclear decay. It might be a computer generating random numbers.
But regardless of the process, once we have observed a thousand or so outcomes, we can make a very good guess about what the odds are of a given number coming up … without understanding the guts of the black box in the slightest.
That is the method that I am using. I understand that it is anathema to theoreticians, but I assure you, that doesn’t mean it is wrong or weak. In fact, it is a very powerful technique when used wisely.
Now, I don’t know why it is that the results of this particular mathematical operation on this particular climate dataset have a Poisson distribution … but that’s the nature of black boxes. That doesn’t stop me from calculating the odds of finding a given outcome in the output of this particular black box.
As with any technique, it has to be used judiciously. You can’t, as you point out, just fit it to an arbitrary shape, or use a cubic spline. You need to use actual distributions, and use the usual statistical tests to determine whether it actually is the distribution that you think it might be. You need to sub-sample it and see if the statistical tests are still valid, or if it’s just an oddity.
Once you do know what the distribution is, though, then you should be able to establish the odds of any given result.
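As a rough illustration of that last step, here is a minimal sketch of reading odds off a fitted Poisson distribution. It assumes the lambda = 5.206 and N = 116 values quoted in the Figure 1 caption. The helper names are made up for this example, and the Poisson distribution has no upper limit while a 13-month period can contain at most 13 warm months, so the tail figure is only an approximation.

```python
import math


def poisson_pmf(k, lam):
    """P(X = k) for a Poisson distribution with mean lam."""
    return math.exp(-lam) * lam ** k / math.factorial(k)


def poisson_sf(k, lam):
    """P(X >= k) for a Poisson distribution with mean lam."""
    return 1.0 - sum(poisson_pmf(i, lam) for i in range(k))


LAMBDA = 5.206     # fitted value quoted in the Figure 1 caption
N_PERIODS = 116    # number of June-to-June periods quoted in the caption

p13 = poisson_pmf(13, LAMBDA)
print(f"P(exactly 13 of 13 months in the top third) = {p13:.5f}")
print(f"P(13 or more) under the unbounded Poisson   = {poisson_sf(13, LAMBDA):.5f}")
print(f"expected count over {N_PERIODS} periods = {N_PERIODS * p13:.3f}")
```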
All the best,
w.
PS: Is this a “real” Poisson distribution, generated by a “real” Poisson process? Here’s the thing. If it is statistically indistinguishable from a real Poisson distribution, in both the whole and the parts, it doesn’t matter … the statistics of the Poisson distribution are applicable to it.
In that regard, here are the Kolmogorov-Smirnov results for the months individually:
Month, p-value
Jan, 0.79
Feb, 0.80
Mar, 0.79
Apr, 0.79
May, 0.86
Jun, 0.90
Jul, 0.86
Aug, 0.84
Sep, 0.81
Oct, 0.74
Nov, 0.83
Dec, 0.76
Note that in all cases the K-S test strongly fails to reject the Poisson distribution. As further evidence of the stability of the distribution, the average for the individual months (which is the unbiased estimator of lambda for the Poisson distribution of each individual month’s results) is as follows.
Jan, 5.07
Feb, 5.12
Mar, 5.17
Apr, 5.16
May, 5.11
Jun, 5.12
Jul, 5.18
Aug, 5.16
Sep, 5.12
Oct, 5.10
Nov, 5.18
Dec, 5.22
Average, 5.14
Std. Dev, 0.04
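For readers who want to try this kind of check themselves, the sketch below estimates lambda as the sample mean and computes the Kolmogorov-Smirnov distance between the sample and the fitted Poisson distribution. It is not the calculation behind the tables above: the sample is synthetic, drawn from a Poisson distribution with lambda = 5.14 purely for illustration, and all function names are made up for this example.

```python
import math
import random
from collections import Counter


def poisson_cdf(k, lam):
    """P(X <= k) for a Poisson distribution with mean lam."""
    return sum(math.exp(-lam) * lam ** i / math.factorial(i) for i in range(k + 1))


def sample_poisson(lam, rng):
    """Draw one Poisson variate using Knuth's multiplication method."""
    limit = math.exp(-lam)
    k, prod = 0, 1.0
    while True:
        prod *= rng.random()
        if prod <= limit:
            return k
        k += 1


def ks_distance(sample, lam):
    """Largest gap between the empirical CDF and the Poisson CDF."""
    n = len(sample)
    counts = Counter(sample)
    ecdf, worst = 0.0, 0.0
    for k in range(max(sample) + 1):
        ecdf += counts[k] / n
        worst = max(worst, abs(ecdf - poisson_cdf(k, lam)))
    return worst


rng = random.Random(0)                                       # fixed seed
sample = [sample_poisson(5.14, rng) for _ in range(2000)]    # SYNTHETIC data
lam_hat = sum(sample) / len(sample)    # sample mean estimates lambda
d_stat = ks_distance(sample, lam_hat)
print(f"lambda estimate = {lam_hat:.3f}, K-S distance = {d_stat:.4f}")
```

Turning the distance into a p-value needs some care here, because the standard K-S tables assume a continuous distribution whose parameters were not estimated from the same sample.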
As I said above, I don’t know why the results from this particular climate black box have a Poisson distribution … but assuredly, they do.

Ron Broberg

July 12, 2012 9:33 pm

Phil: Bart I’m interested that you think the ‘morphology is reasonably close’ since Willis’s fit of a Poisson says that there is an approximately 40% probability of an event being in the top third of its historical range!
While I agree that Willis has inappropriately used a model which requires independent events, I do agree with the inference that there is an approximately 40% chance that an event will be in the top third of its historical range given that the previous month was also in its top third.
http://rhinohide.wordpress.com/2012/07/12/eschenbach-poisson-pill/
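The effect of month-to-month dependence on that conditional probability can be illustrated with a toy simulation. This is a sketch under stated assumptions, not an analysis of the U.S. record: it compares white noise with a first-order autoregressive series whose coefficient (0.5) is picked arbitrarily, and all names are made up for this example.

```python
import random


def ar1_series(n, phi, rng):
    """Generate n values of x[t] = phi * x[t-1] + Gaussian noise."""
    x, out = 0.0, []
    for _ in range(n):
        x = phi * x + rng.gauss(0.0, 1.0)
        out.append(x)
    return out


def top_third_flags(series):
    """True where a value falls in the top third of its own series."""
    cutoff = sorted(series)[(2 * len(series)) // 3]
    return [value >= cutoff for value in series]


def prob_top_given_previous_top(flags):
    """Share of top-third months among those that follow a top-third month."""
    followers = [cur for prev, cur in zip(flags, flags[1:]) if prev]
    return sum(followers) / len(followers)


rng = random.Random(42)    # fixed seed
N = 20000
PHI = 0.5                  # HYPOTHETICAL coefficient, not estimated from data

cond_white = prob_top_given_previous_top(top_third_flags(ar1_series(N, 0.0, rng)))
cond_ar1 = prob_top_given_previous_top(top_third_flags(ar1_series(N, PHI, rng)))
print(f"white noise: {cond_white:.3f}   AR(1), phi={PHI}: {cond_ar1:.3f}")
```

With independent months the conditional probability stays near 1/3. Any positive autocorrelation pushes it higher, which is why streak odds computed as (1/3)^13 are too long for a persistent series.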

July 12, 2012 11:58 pm

Willis,
“That means that he is giving the odds assuming the climate is warming, unless you are claiming that Masters thinks the climate was not warming over the past 118 years. “
Don’t be silly.
Masters clearly thinks that the climate has warmed over the past 118 years, and that is why the odds he gave assume that the climate has not warmed. His whole point (parroted from the NCDC original) is that the long odds of the current 13 month streak assuming the climate has not warmed are proof that the climate is warming. How can you be blind to this?
The method that NCDC/Masters used (and the method that Lucia replicated) is the standard method of statistical hypothesis testing:
Step 1 – Assume that the opposite of your favored hypothesis is true. In statistics, this opposite is called the null hypothesis.
Step 2 – Under the assumption that the null hypothesis is true, calculate the odds that some observed phenomenon could have occurred.
Step 3 – If those odds are very small, then declare the null hypothesis to be rejected. Declare support for your favored hypothesis (in statistics called the alternate hypothesis).
This is exactly what NCDC/Masters did:
Step 1 – Being flaming warmists, their favored hypothesis is that the climate is warming, so they assumed that climate has not changed whatsoever in 118 years.
Step 2 – They calculated (incorrectly) the odds that the current 13 month streak of warm temps could have occurred, assuming that the climate has not changed whatsoever in 118 years.
Step 3 – The odds that they calculated (incorrectly) were very small, so they claimed that this disproves the assumption that the climate has not changed whatsoever in 118 years. They then claim that this proves their favored hypothesis – that the climate has warmed, and (by implicit over-reaching) that it is all our fault, and we are all going to die if we don’t sign over our lives to GreenPeace.
You’re a bright guy. Having had this pointed out to you several times now, you have to understand your error. Isn’t it about time you fessed up?

pjie2

July 13, 2012 2:39 am

As with any technique, it has to be used judiciously. You can’t, as you point out, just fit it to an arbitrary shape, or use a cubic spline. You need to use actual distributions, and use the usual statistical tests to determine whether it actually is the distribution that you think it might be.
Unless you know the underlying process is a Poisson one (successive independent events), then the Poisson curve is an “arbitrary shape”. As you say yourself, you have no idea “why the results from this particular climate black box have a Poisson distribution”. More correctly, you should say you have no idea why they resemble a Poisson distribution – the key word being “resemble”, because there is no reason whatsoever to suppose that they necessarily are a Poisson distribution. It could easily be “Poisson-like, except at the extreme tails”, for example.
So, in summary, you have:
1) Missed Jeff’s entire point, which was to prove that the climate is warming, by showing how unlikely a given streak is if you assume the climate is not warming.
2) Fitted an inappropriate model by noticing that the distribution of results looks somewhat like a Poisson distribution.
3) Applied it incorrectly. If you want to test how unusual the recent string of 13 months is, then you have to fit a distribution to the rest of the data set excluding the most recent 13 months. Then, having generated your prediction from the Poisson model, you would at least have a properly-derived expected value to which you could compare the current streak.
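Point 3 describes a hold-out fit, which can be sketched as follows. The counts below are made-up placeholders, not the real June-to-June tallies, and the helper names are hypothetical. The sketch only shows the mechanics of fitting on the earlier record and then scoring the latest period against that fit.

```python
import math


def fit_lambda(counts):
    """Estimate the Poisson mean as the sample mean of the counts."""
    return sum(counts) / len(counts)


def poisson_sf(k, lam):
    """P(X >= k) for a Poisson distribution with mean lam."""
    below = sum(math.exp(-lam) * lam ** i / math.factorial(i) for i in range(k))
    return 1.0 - below


# HYPOTHETICAL counts of top-third months per period, for illustration only.
history = [3, 5, 4, 6, 7, 2, 5, 8, 4, 6, 5, 3, 9, 5, 4, 6]
latest = 13    # the period being tested, held out of the fit

lam_holdout = fit_lambda(history)            # fit WITHOUT the tested period
lam_all = fit_lambda(history + [latest])     # fit WITH it, for comparison

p_holdout = poisson_sf(latest, lam_holdout)
p_all = poisson_sf(latest, lam_all)
print(f"hold-out fit: lambda = {lam_holdout:.3f}, P(X >= {latest}) = {p_holdout:.5f}")
print(f"full fit:     lambda = {lam_all:.3f}, P(X >= {latest}) = {p_all:.5f}")
```

Including the tested period in the fit raises lambda and therefore makes that same period look less unusual, which is the circularity the hold-out approach avoids.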

Willis Eschenbach

Author

July 13, 2012 3:04 am

JJ says:
July 12, 2012 at 11:58 pm

You’re a bright guy. Having had this pointed out to you several times now, you have to understand your error. Isn’t it about time you fessed up?

So you are saying that he did all of that just to prove that the climate is warming? That’s it? That whole prediction was just to establish warming? That interpretation has seemed so incredible to me that I have resisted it, I thought no one could seriously be doing that.
But I suppose anything’s possible. OK, Masters has set out to conclusively prove what everyone else accepted long, long ago—the earth has been warming, in fits and starts, for at least the last two and perhaps three centuries.
To do so he has assumed a white-noise Gaussian temperature distribution, with no Hurst long-term persistence, no auto-correlation or ARIMA structure, and no non-stationarity.
And to no one’s shock, he has shown that those assumptions are false.
You were right, I was wrong, and to my surprise, Masters is foolishly proving what is well established.
Got it.
Thanks,
w.