Guest Post by Willis Eschenbach
Anthony Watts, Lucia Liljegren, and Michael Tobis have all done a good job blogging about Jeff Masters' egregious math error. His error was claiming that a run of high US temperatures had only a 1 in 1.6 million chance of being a natural occurrence. Here's his claim:
U.S. heat over the past 13 months: a one in 1.6 million event
Each of the 13 months from June 2011 through June 2012 ranked among the warmest third of their historical distribution for the first time in the 1895 – present record. According to NCDC, the odds of this occurring randomly during any particular month are 1 in 1,594,323. Thus, we should only see one more 13-month period so warm between now and 124,652 AD–assuming the climate is staying the same as it did during the past 118 years. These are ridiculously long odds, and it is highly unlikely that the extremity of the heat during the past 13 months could have occurred without a warming climate.
All of the other commenters pointed out reasons why he was wrong … but they didn’t get to what is right.
Let me propose a different way of analyzing the situation … the old-fashioned way, by actually looking at the observations themselves. There are a couple of oddities to be found there. To analyze this, I calculated, for each year of the record, how many of the months from June to June inclusive were in the top third of the historical record. Figure 1 shows the histogram of that data, that is to say, it shows how many June-to-June periods had one month in the top third, two months in the top third, and so on.
Figure 1. Histogram of the number of June-to-June months with temperatures in the top third (tercile) of the historical record, for each of the past 116 years. Red line shows the expected number if they have a Poisson distribution with lambda = 5.206, and N (number of 13-month intervals) = 116. The value of lambda has been fit to give the best results. Photo Source.
The first thing I noticed when I plotted the histogram is that it looked like a Poisson distribution. This is a very common distribution for data which represents discrete occurrences, as in this case. Poisson distributions cover things like how many people you'll find in line in a bank at any given instant, for example. So I overlaid the data with a Poisson distribution, and I got a good match.
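For readers who want to reproduce the counting, here is a minimal R sketch of one plausible reading of the procedure. The variable names, and the use of whole-record terciles per calendar month, are my assumptions rather than anything taken from Willis's spreadsheet (a comment further down the thread suggests he actually used the record up to each date):

```r
# Hypothetical sketch (not Willis's actual code): count, for each June-to-June
# window, how many of the 13 months fall in the warmest third of that calendar
# month's record. 'temps' is assumed to be the NOAA monthly CONUS mean
# temperature series as a numeric vector in date order, starting January 1895.

months <- rep(1:12, length.out = length(temps))

# TRUE where a month is in the warmest third of its calendar month's record
top_third <- logical(length(temps))
for (m in 1:12) {
  idx <- which(months == m)
  top_third[idx] <- temps[idx] > quantile(temps[idx], 2/3)
}

# June-to-June windows: start at each June and take 13 consecutive months
june_starts <- which(months == 6)
june_starts <- june_starts[june_starts + 12 <= length(temps)]
counts <- sapply(june_starts, function(i) sum(top_third[i:(i + 12)]))

# Histogram of the counts with a Poisson overlay (lambda taken here as the
# mean of the counts; the post instead fits it iteratively, see below)
obs <- tabulate(counts + 1, nbins = 14)                 # frequencies of 0..13
expected <- length(counts) * dpois(0:13, mean(counts))
round(cbind(k = 0:13, observed = obs, poisson = expected), 2)
```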
Now, looking at that histogram, the finding of one period in which all thirteen were in the warmest third doesn’t seem so unusual. In fact, with the number of years that we are investigating, the Poisson distribution gives an expected value of 0.2 occurrences. In this case, we find one occurrence where all thirteen were in the warmest third, so that’s not unusual at all.
Once I did that analysis, though, I thought “Wait a minute. Why June to June? Why not August to August, or April to April?” I realized I wasn’t looking at the full universe from which we were selecting the 13-month periods. I needed to look at all of the 13 month periods, from January-to-January to December-to-December.
So I took a second look, and this time I looked at all of the possible contiguous 13-month periods in the historical data. Figure 2 shows a histogram of all of the results, along with the corresponding Poisson distribution.
Figure 2. Histogram of the number of months with temperatures in the top third (tercile) of the historical record for all possible contiguous 13-month periods. Red line shows the expected number if they have a Poisson distribution with lambda = 5.213, and N (number of 13-month intervals) = 1374. Once again, the value of lambda has been fit to give the best results. Photo Source
Note that the total number of periods is much larger (1374 instead of 116) because we are looking, not just at June-to-June, but at all possible 13-month periods. Note also that the fit to the theoretical Poisson distribution is better, with Figure 2 showing only about 2/3 of the RMS error of the first dataset.
The most interesting thing to me is that in both cases, I used an iterative fit (Excel solver) to calculate the value for lambda. And despite there being 12 times as much data in the second analysis, the values of the two lambdas agreed to two decimal places. I see this as strong confirmation that indeed we are looking at a Poisson distribution.
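Continuing the sketch above, the iterative fit can be mimicked in R by minimizing the RMS error between the observed histogram and N·dpois(k, lambda), which is roughly what the Excel solver is described as doing. This is an illustrative reconstruction, not the original workbook:

```r
# All contiguous 13-month windows, not just June-to-June
starts <- 1:(length(temps) - 12)
all_counts <- sapply(starts, function(i) sum(top_third[i:(i + 12)]))

# Least-squares lambda, mimicking the Excel-solver fit described in the post
obs_all <- tabulate(all_counts + 1, nbins = 14)        # frequencies of 0..13
rms <- function(lambda) {
  sqrt(mean((obs_all - length(all_counts) * dpois(0:13, lambda))^2))
}
fit <- optimize(rms, interval = c(1, 10))
c(fitted_lambda = fit$minimum, mean_count = mean(all_counts))
```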
Finally, the sting in the end of the tale. With 1374 contiguous 13-month periods and a Poisson distribution, the number of periods with 13 winners that we would expect to find is 2.6 … so in fact, far from Jeff Masters' claim that finding 13 in the top third is a one-in-a-million chance, my results show that finding only one case with all thirteen in the top third is actually below the number we would expect given the size and the nature of the dataset …
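As a rough check of that 2.6 figure, using the Figure 2 values (a sketch; the exact number depends on whether the folded-back tail beyond 13 is counted in the top bin):

```r
1374 * dpois(13, lambda = 5.213)                      # ~2.5 windows with exactly 13
1374 * ppois(12, lambda = 5.213, lower.tail = FALSE)  # ~3.9 if 13-or-more is counted
```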
w.
Data Source, NOAA US Temperatures, thanks to Lucia for the link.
As Willis has apparently unequivocally established that this particular distribution is *in fact* a Poisson distribution, I am looking forward with great anticipation to the first period of 13 consecutive months within which FOURTEEN of the months fall into the top 1/3 of their historical temperature distributions. His Poisson distribution tells us this is not very improbable, so we shouldn’t have too long to wait.
I could really use the extra time that having fourteen warm months in a 13-month period would give me. And think what a boost to the US economy it would be! Good to see a desirable outcome emerging from the warming temperature series.
Willis Eschenbach says:
So you are saying that he did all of that just to prove that the climate is warming?
Yes. That is what warmists do.
But I suppose anything’s possible.
And blatantly obvious.
OK, Masters has set out to conclusively prove what everyone else accepted long, long ago—the earth has been warming, in fits and starts, for at least the last two and perhaps three centuries.
To be completely accurate, it wasn’t Masters who cooked up this erroneous statistic to prove that the climate is warming. It was NCDC. They are the ones who set out to give the statistically illiterate some “statistical proof” of global warming. Masters is just one of the idiots who bought it, and passed it on to a wider audience. That is his role.
To do so he has assumed a white-noise Gaussian temperature distribution, with no Hurst long-term persistence, no auto-correlation or ARIMA structure, and no non-stationarity.
Yes, NCDC did that. And more. They also assumed the strawman argument of no change whatsoever in surface temp over 118 years. And they invited overreach of conclusion – an invitation that Masters happily accepted when he erroneously claimed that the long odds necessitate warming. And the NCDC/Masters team also conflate “has warmed” with “is warming” – their favorite charade since it became inconveniently apparent that warming ceased more than a decade ago.
And to no one’s shock, he has shown that those assumptions are false.
No.
Others have shown that those assumptions are false. To his credit, Masters accepted the criticism and admitted the error. Unfortunately, because the critics spent so much time perseverating over the (as it turns out, relatively minor) errors in NCDC assumptions and methods (and on irrelevant Poisson distribution schemes intended to counter a grossly misunderstood position), they failed to call NCDC/Masters on the egregious errors in their conclusions. This hands NCDC the propaganda win.
You were right, I was wrong, and to my surprise, Masters is foolishly proving what is well established.
Not foolish. If people walk away from this believing that NCDC/Masters have given “statistical proof” of ongoing ‘global warming’, and that this is “proving the well established” then they have accomplished their goal. Deceitful. Evil. Not foolish.
JJ
Willis Eschenbach – You claim support from the Kolmogorov-Smirnov test, which is one way to evaluate the distance between the measured distribution function of the sample and the reference distribution.
It should be noted, however, that “If either the form or the parameters of F(x) [reference distribution] are determined from the data Xi the critical values determined in this way are invalid. In such cases, Monte Carlo or other methods may be required…” (http://en.wikipedia.org/wiki/Kolmogorov%E2%80%93Smirnov_test#Test_with_estimated_parameters for easy reference). Hence your K-S test is not a valid one.
This is a critical issue with your Poisson distribution, and quite frankly cuts to the core of the problems with your post. You generated your reference (Poisson) distribution directly from the sample itself, and hence comparing that reference to the sample is an invalid, self-referential exercise.
Masters took a reference distribution (normal distributions, consistent with the physics of the issue, which Poisson statistics notably are not), and examined how closely the observations meet the reference (with the error of not including auto-correlation). From that he was able to consider the reference distribution as the null hypothesis, and rejected it. Tamino incorporated auto-correlation in the sequential statistics, although not a consideration of a full 12-D normal distribution (hence an estimate he considers too small), and again considers the behavior of that reference distribution as his null. Lucia ran Monte Carlo distributions as her null.
You, on the other hand, are making invalid tests of sample versus a distribution generated directly from the sample. This means you have no null hypothesis to test against, you are comparing your sample with your sample, and (as said before) your consideration of these observations against the observations is statistical tautology.
KR says:
July 13, 2012 at 9:14 am
Thanks, KR. Here’s my problem. You look at the data, and you say “It has all the characteristics of a normal distribution”. You test it statistically to see if it has the form of a normal distribution, and if it passes the tests, you draw conclusions from that fact.
I look at the results, and I say “It has all the characteristics of a Poisson distribution”. I test it statistically to see if it has the form of a Poisson distribution, and if it passes the tests, I draw conclusions from that fact.
You keep saying that your procedure is legitimate, and mine is not … I’m not following that argument.
In this case, if I understand your quoted material, you (and Wikipedia) claim that if I use the K-S test and I choose to test the data against any kind of Poisson reference distribution, the test is invalid.
Why? Because, according to you, I’ve looked at the results and used it to choose the form (Poisson) of the reference distribution.
I fear that I’m not seeing the logic of that. Perhaps you can explain it. I can understand the part about the parameters, but not the part about the form.
In any case, I'm using the two-sample K-S test, and according to Wikipedia that consideration applies to the one-sample K-S test … and the same statement is made in the documentation for the R function that I am actually using.
However, that statement says nothing about the form of the distribution, only about the parameters, so I'm gonna say that Wikipedia got that one wrong.
Next, I have used the same two-sample K-S test, using parameters estimated from the results, to test whether the distribution is Gaussian or is Binomial. In both cases the test strongly rejected the distributions.
w.
PS—Despite your objections to the tests that I have done, I note that you have not submitted the K-S or any other tests for the distributions that you claimed were of the “same quality of fit” as the Poisson distribution. I also note that you have not found any test that rejects the idea that the results have the form of a Poisson distribution.
Nigel Harris says:
July 13, 2012 at 6:33 am
You may be right, or you may be wrong. But, you haven't demonstrated anything, either analytically or through simulation, to prove it. Personally, I see that the statistic is closely related to others which are Poisson distributed, so it isn't much of a stretch to expect that this variable should follow something at least semi-closely related, and it therefore might well give results which are close to reality. So, at the least, you can consider it a sort of parametric curve fit to the actual distribution. It may or may not be a particularly accurate curve fit, but I have not seen anyone demonstrate it one way or the other. It certainly looks reasonably close to the histogram.
It is fairly easy to do a monte carlo with independent samples (which may not be quite appropriate for the real world, but at least is a starting point, and can be used to check how far off Masters was). Just generate a 1392 by 3 grid of random numbers in the range zero to 3 and take the integer value. Assign the values of “2” the distinction of being in the upper 1/3. Repeat Willis’ procedure, and check and see if the probabilities you get for 14 contiguous, 15 contiguous, etc… are reasonable.
BTW, it is very easy to give a very conservative lower bound on the probability of 13 consecutive temperatures in the upper 1/3 for independent samples, which at least shows Masters' estimate to be wildly inaccurate. You have something like 9 non-overlapping independent 13 point windows. The odds of all values being high within any one window are p = 1/3^13 (Masters' estimate). The odds of that not happening are q = 1 – 1/3^13. The odds of none of the 9 having all high values are q^9. So, the odds of having at least one with all high values are 1 – q^9, which is approximately 9/3^13, almost ten times greater than Masters' estimate. And, that is ignoring streaks which occur across the non-overlapping window boundaries and looking only at 13 long streaks, not 13+.
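Here is a sketch of how Bart's Monte Carlo and his back-of-envelope bound might look in R, under the independence assumption he states (the trial count and seed are arbitrary, and the estimate is noisy because the event is rare):

```r
# How often does a run of 13+ consecutive "top third" months appear somewhere
# in 1392 independent months?
set.seed(1)
n_months <- 1392
n_trials <- 10000                      # increase for a less noisy estimate
hits <- replicate(n_trials, {
  top <- runif(n_months) < 1/3         # each month independently in the top third
  r <- rle(top)
  any(r$lengths[r$values] >= 13)       # any run of 13 or more?
})
mean(hits)                             # empirical probability of at least one run

# The conservative bound from the comment: 9 non-overlapping 13-month windows
1 - (1 - 1/3^13)^9                     # approximately 9 / 3^13
```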
Willis Eschenbach says:
July 13, 2012 at 3:04 am
“…Masters is foolishly proving what is well established.”
I'm no longer entirely of that opinion. We are talking about extreme weather events here. I was originally of the opinion that JJ (July 12, 2012 at 6:42 am) had the right idea, of regressing out the long term components of known warming. But, that warming signal is fairly small, and I think it should not change the range of the 1/3 bands very significantly.
The key thing about the actual distribution is that a Poisson distribution assumes that events occur at an average rate, independently of the time since the last event. But we are surely looking at a variable which is correlated in time, such that the occurrence of an event makes succeeding such events more likely. So, perhaps a more appropriate distribution is a Conway–Maxwell–Poisson type.
“You have something like 9 non-overlapping independent 13…”
‘Scuse me. You have over 100. So, that’s about 100/3^13 or 100 times less than Master’s estimate.
“…or 100 times greater than Masters' estimate.”
Willis,
I look at the results, and I say “It has all the characteristics of a Poisson distribution”. I test it statistically to see if it has the form of a Poisson distribution, and if it passes the tests, I draw conclusions from that fact.
The problem is, you are drawing conclusions that are not supported by your methods. If you find that you can fit a Poisson distribution to the observations, the only conclusion that you can legitimately draw from that is: that you can fit a Poisson distribution to the observations. When you attempt to draw other conclusions above, all you are really doing is comparing the observations to themselves, and finding that they are comparable. The professional term for this phenomenon is: Well, duh.
You keep saying that your procedure is legitimate, and mine is not … I’m not following that argument.
Because you are not recognizing the difference between what you are doing and what others are doing. Look at what NCDC did (and what Masters blindly parroted, and Lucia and Tamino reanalyzed, etc.). NCDC did this:
1. They assumed something that they wished to disprove. In this case they assumed the climate for the last 118 years has not changed one tiny little bit.
2. They made a statistical model of their assumption. In this case, their statistical model is stochastic variation about an average climate that for the last 118 years has not changed one tiny little bit.
3. They compared observations against the expectations of the statistical model of their assumption, found them to be incompatible (very long odds), and on that basis rejected the assumption. This is standard hypothesis testing, which is a variant of the “proof by contradiction” method of standard logical reasoning.
That is what they did. This is what you did:
1. You attempted to disprove a position that you misunderstood 100%.
2. You made a statistical model of some observations. In this case a jiggered Poisson distribution.
3. You compared the observations to the descriptive statistics derived from those same observations, found them to be compatible (well, duh) and on that basis rejected the position that you didn’t understand in the first place. In the discourse of formal logic, the term for this is WTF.
The fundamental problem is that you completely misunderstood what NCDC/Masters were saying. Everything that follows – this entire blog post – is rendered irrelevant or invalid by that misunderstanding. Having been made aware of the error, you should return to first principles and begin again under the proper understanding of what it is you are responding to. There is plenty of crap wrong with what NCDC/Masters actually said, and thus far none of the bloggers has managed to address most of it.
JJ
“Just generate a 1392 element array of random numbers in the range zero to 3 and take the integer value.”
I’m in a hurry, please forgive my sloppiness.
Ron Broberg says:
July 12, 2012 at 9:33 pm
Phil: Bart, I'm interested that you think the 'morphology is reasonably close' since Willis's fit of a Poisson says that there is an approximately 40% probability of an event being in the top third of its historical range!
While I agree that Willis has inappropriately used a model which requires independent events, I do agree with the inference that there is an approximately 40% chance that an event will be in the top third of its historical range given that the previous month was also in its top third.
http://rhinohide.wordpress.com/2012/07/12/eschenbach-poisson-pill/
Agreed, and a similarly enhanced chance that an event will be in the bottom two-thirds if the previous month is in the bottom two-thirds, but Willis's analysis implies that 40% of all the events will be in the top third. Overall for the entire dataset the probability that an event will be in the top third is one-third, i.e. p=0.333.
What we are looking at here is the probability distribution of the number of occurrences of an event being in the top third of its range, taken 13 at a time, so p=0.333 and N=13. If this is a Poisson process, the mean of the distribution function will be N*p, in this case 4.333. This is fixed; you can't arbitrarily fit a value to it. This is what Willis doesn't understand: the mean of the generated Poisson distribution is not a free parameter, it's defined by the process.
To test this properly Willis should have superimposed a Poisson distribution with a mean of 4.333 on the dataset. Having done so he would see that the fit was not good, realize that this was not a Poisson process, and, since there was a higher probability of longer sequences, perhaps come to the conclusion that there was a degree of autocorrelation present (i.e. the data are not independent, a requirement of a Poisson process).
Yes, you can fit a Poisson-like distribution to the data and let the mean be a free parameter, with the result that you conclude that there is a 40% chance of being in the top third of the distribution. That should also be a clue that you're doing something wrong!
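A sketch of the comparison Phil is describing: under independence the window count is Binomial(13, 1/3), whose Poisson approximation has its mean pinned at N*p = 4.333, and neither resembles a Poisson with a freely fitted lambda of about 5.21:

```r
# Compare the pinned-mean distributions with the freely fitted Poisson
k <- 0:13
round(cbind(k,
            binomial         = dbinom(k, size = 13, prob = 1/3),
            poisson_4.333    = dpois(k, lambda = 13/3),
            poisson_fit_5.21 = dpois(k, lambda = 5.21)), 4)
```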
Nigel Harris says:
July 13, 2012 at 6:33 am
As Willis has apparently unequivocally established that this particular distribution is *in fact* a Poisson distribution, I am looking forward with great anticipation to the first period of 13 consecutive months within which FOURTEEN of the months fall into the top 1/3 of their historical temperature distributions. His Poisson distribution tells us this is not very improbable, so we shouldn’t have too long to wait.
I could really use the extra time that having fourteen warm months in a 13-month period would give me. And think what a boost to the US economy it would be! Good to see a desirable outcome emerging from the warming temperature series.
Well, I can't promise you that, but I will say that there is about a 40% chance that by the end of this month we will have the first period in the record of 14 consecutive months all of which fall into the top 1/3 of their historical temperature distributions. 😉
Willis Eschenbach – “Despite your objections to the tests that I have done, I note that you have not submitted the K-S or any other tests for the distributions that you claimed were of the “same quality of fit” as the Poisson distribution.”
The tests I performed were of least-squares error fit, as you apparently did in your initial Poisson fitting. I have not performed K-S tests, as the form and parameters of my fits (and yours) are derived from the observational data, and hence a K-S test of the observations, which requires comparison against an independent reference distribution, would be wholly inappropriate.
“I also note that you have not found any test that rejects the idea that the results have the form of a Poisson distribution.”
Poisson: Used for counts of events in fixed sampling periods that are independent of previous events. This is inappropriate due to the same auto-correlation that so many (including you) have noted, as that violates the successive-independence criterion. Your distribution also predicts a non-zero probability of 14 events in 13 months, which is absurd – another indication of an inappropriate distribution, one that cannot describe the data. And, as noted before by JJ, myself, and others, you have generated the Poisson distribution from the sample data to be tested, hence it is not an independent reference distribution, and it provides no null hypothesis for comparison.
A more appropriate (although not exact) distribution would be the 13 of 13 appearance in a binomial distribution with autocorrelation dependence. That ends up (see http://tamino.wordpress.com/2012/07/11/thirteen/ as he’s already done the work) as 1:458,000. And it is clearly independent, meaning it provides a reference distribution for a null hypothesis.
You simply have not performed proper hypothesis testing.
Willis Eschenbach – Additional note: while the K-S test can be performed against two samples and used to check consistency between them, they must be two independent samples. Performing a two-sample test between a set of observations and a curve fit directly to those observations again provides no independent null hypothesis – you are testing the data against itself.
Bart,
I was originally of the opinion that JJ (July 12, 2012 at 6:42 am) had the right idea, of regressing out the long term components of known warming. But, that warming signal is fairly small, and I think it should not change the range of the 1/3 bands very significantly.
Here’s how I think about it:
NCDC/Masters' odds vs. the null hypothesis were calc'd from this model: weather without any change at all in climate. In other words, stochastic variation around a flat line.
To conceptualize those odds, consider the two components separately. Start with a flat line climate only, no weather. What are the odds that the final 13 months of 118 years of unchanging climate temp observations are going to be “in the upper third of the distribution”?
Zero. No chance at all. The question doesn't even really make sense, as there isn't an "upper third" that is distinct from a "middle third" or a "lower third" of those observations.
Now, add in stochastic variation, i.e. weather. Keep the flat line. Now what are the odds that the final 13 months of 118 years of unchanging climate temp observations are going to be "in the upper third of the distribution"?
Pretty damn close to zero. Almost no chance at all. This is the NCDC/Masters statistic. 1,600,000:1, if you ignore persistence, etc.
Now, start over with just the flat line. Then change it ever so slightly. Give it a warming trend of 0.00000000001C per century. Given that teeny, tiny warming trend, what are the odds that the final 13 months of 118 years of *almost* unchanging climate temp observations are going to be “in the upper third of the distribution”?
100%. Given any slope at all the final 13 months of 118 years worth of observations will not only all be in the upper third, they will be the thirteen highest observations.
Now add stochastic variation to a climate with a moderate (natural, non-catastrophic) warming trend. Now what are the odds that the final 13 months of 118 years of moderately changing climate temp observations are going to be "in the upper third of the distribution"?
Pretty damn good. Not the 100% of the “no weather” state, but certainly not zero. The actual value depends on the relationship between the magnitude of the trend and the magnitude of the “weather” variation. If the overall trend in degrees per century is in the same ballpark as the magnitude of the same-month annual variation, such “extreme” events will be quite common after a century.
The detrended annual variation of same-month temps is what? A couple of degrees C? And we're aiming for a probability that produces 1 qualifying 13-month event in 118 years? Kids' games, and that doesn't even factor in the substantial bump in the odds that would accompany any cyclic component to climate.
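JJ's thought experiment can be sketched numerically. The trend and noise magnitudes below are made-up illustration values, not estimates for the actual CONUS record:

```r
# Probability that the final 13 months of a simulated record all land in the
# top third of their calendar month's history, for a given trend and noise
set.seed(1)
prob_final_13 <- function(trend_per_century, sigma, years = 118, n_sim = 2000) {
  n <- years * 12
  trend <- trend_per_century * (seq_len(n) - 1) / 1200   # degC accumulated per month
  month <- rep(1:12, years)
  mean(replicate(n_sim, {
    x <- trend + rnorm(n, sd = sigma)
    top <- ave(x, month, FUN = function(v) v > quantile(v, 2/3))
    all(top[(n - 12):n] == 1)
  }))
}
prob_final_13(0, 1)   # flat climate: essentially never (needs far more sims to resolve)
prob_final_13(1, 1)   # small trend relative to the weather noise: still rare
prob_final_13(3, 1)   # trend in the same ballpark as the noise: no longer unusual
```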
There is a reason that those twits stick to their strawman…
JJ
JJ says:
July 13, 2012 at 12:35 pm
“100%. Given any slope at all the final 13 months of 118 years worth of observations will not only all be in the upper third, they will be the thirteen highest observations.”
You can’t go from effectively 0% to 100% with a tiny change like that, though. Let me give an example, a simple analogy, if you will. I have stationary random data normally distributed about zero with some uncertainty parameter “sigma”. I calculate the sample mean, and find generally that it is non-zero with a standard deviation of sigma/sqrt(N), where N is the number of points. Now, I take another data set from the ensemble, and add in a small positive bias much less than sigma. Are the odds going to change greatly from 50/50 that I will estimate a negative mean value?
No. In fact, the delta likelihood should be approximately equal to the bias divided by the sigma divided by sqrt(2*pi) (additional x-axis displacement times the peak of the probability distribution is basically a rectangular integration of the additional area of the distribution displaced to the positive side). Thus, if bias/sigma is small, the change in probability is small.
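A quick numeric check of that first-order approximation; sigma and the bias below are arbitrary illustration values:

```r
sigma <- 1
b <- 0.05                                        # small bias relative to sigma
exact  <- 0.5 - pnorm(0, mean = b, sd = sigma)   # actual change in P(estimate < 0)
approx <- b / (sigma * sqrt(2 * pi))             # Bart's rectangular approximation
c(exact = exact, approx = approx)                # both come out at about 0.02
```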
Thus, I have decided I do not even believe that the result necessarily indicates warming at all, because the warming has been quite small relative to the range of the 1/3 bands.
“What are the odds that the final 13 months of 118 years of unchanging climate temp observations are going to be “in the upper third of the distribution”?”
That is the wrong question. The right question is, how likely is a 13 month stretch to be in the top 1/3 at some time within the data record? As I showed previously, it is at least 100X the value Masters got, and probably more like 1000X, and that is if you consider each point to be independent of all the others. Add in the correlation between adjacent samples (if you can!), and I expect it will go higher, still.
But, this is a difficult problem, because we do not know if the process is even stationary in time (it probably isn’t), so even trying to estimate an autocorrelation function is hazardous. I’d bet the odds are actually quite reasonable and, indeed, the fact that we have observed such a stretch suggests it may not be particularly unlikely at all.
JJ says:
July 13, 2012 at 11:11 am
Emphasis mine
First, I did not “fit a Poisson distribution to the observations”. I note that the results, not the observations but the results, have the form of a Poisson distribution. This is no different than you noting that the underlying observations have (or don’t have) a Gaussian distribution. It is not a “fit” of any kind.
Next, my analysis of the June-to-June results allowed me to accurately estimate the number of instances of “12 of 13 in the warmest third” in the full dataset, despite the fact that there were no instances of “12 of 13” in the June-to-June dataset.
How is this not a “conclusion that I can legitimately draw from that”?
You clearly understand that if you know the distribution of the underlying data and the operations being done to them, you can draw conclusions about the results.
What you still don’t seem to have grasped is that if you know the distribution of the results, you can draw conclusions about the underlying data and/or the operations being done to them.
For example. Suppose a guy hands you a die, and wants to know if it is loaded. You throw the die 10,000 times, and get the following results for the faces 1 through 6:
[1] 1320
[2] 2842
[3] 2748
[4] 1779
[5] 811
[6] 490
Is the die loaded? Bear in mind that if you say “yes”, that means that you have drawn a conclusion from the results regarding the underlying process generating the numbers … and yet you have claimed above that I can’t do that by just analyzing the results.
Which is my point. From analyzing the distribution of the results, we can draw valid conclusions about the underlying process, as well as using the analysis of results to accurately calculate the probability of events that have not yet occurred.
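For reference, the die question can be put through a standard goodness-of-fit test in R, with the fair-die uniform distribution as the independent reference (which is the point KR and Phil make in their replies):

```r
throws <- c(1320, 2842, 2748, 1779, 811, 490)
chisq.test(throws)   # tests against equal probabilities; huge X-squared, so: loaded
```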
Finally, the underlying question in this thread is “is the occurrence of 13 out of 13 an unexpected, unpredictable, unusual event”. Suppose the question had been asked last year before it actually occurred, “if it hits 13 of 13, is that an anomaly or an expected result”?
If I had analyzed the records in this manner last year, I would have gotten essentially the same answer I get now—that it would not be unusual or unexpected in any way.
How is that not a valid conclusion?
All the best,
w.
People,
The fact that Willis's distributions have a mean around 5.15 instead of the "expected" 4.33 has (as he explained to me in a comment above) nothing to do with distribution shapes or fat tails. It's because he didn't do the analysis that you think he did. His definition of a month that is "in the top third of its historical record" is a month that is in the top third of observations that occurred *prior to* (and presumably including) that point in the record.
I have no idea how he handled the first two years of data but from the third year on, he presumably counted each June as top 1/3 if it was above 2/3 of previous June temperatures. And in a steadily rising dataset, this is going to result in a rather large number of months being counted as in the top 1/3 of their historical records. Around 40% of them, it seems, in the case of US lower 48 temps.
If the data were a rising trend with no noise, then 100% of months would be in the top 1/3 of their historical record by Willis’s criterion.
In the comment where he revealed this method, he stated “It doesn’t make sense any other way, to me at least”. However, it has apparently not occurred to any other observers to treat the data this way.
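A toy R sketch of the "top third of the record so far" criterion Nigel describes, applied to a single made-up calendar-month series (the trend and noise values are illustrative only):

```r
set.seed(1)
x <- 0.02 * (1:118) + rnorm(118)     # 118 "Junes": slow rise plus weather noise

top_so_far <- sapply(3:118, function(i) x[i] > quantile(x[1:i], 2/3))
mean(top_so_far)                     # well above 1/3 for a rising series

# With no noise at all, every year beats the record to date
y <- 0.02 * (1:118)
mean(sapply(3:118, function(i) y[i] > quantile(y[1:i], 2/3)))   # exactly 1
```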
Willis Eschenbach – “Is the die loaded?”
From the data you provided, and an independent reference distribution with the expectation of uniform random numbers, you can conclude yes. Your example ignores the fact that most people have sufficient experience to expect a uniform random distribution. ‘Tho there are always those with wishful thinking or poor statistical knowledge who continue to get into dice games…
You need both a null hypothesis (uniform random values) from a reference distribution and the observations to perform hypothesis testing (how likely are the observations given the reference distribution). Your Poisson distribution is simply a smoothed version of the observations. The two are not independent, and you cannot use self-referential data for hypothesis testing.
Nigel Harris says:
July 13, 2012 at 2:22 pm
People,
The fact that Willis's distributions have a mean around 5.15 instead of the "expected" 4.33 has (as he explained to me in a comment above) nothing to do with distribution shapes or fat tails. It's because he didn't do the analysis that you think he did. His definition of a month that is "in the top third of its historical record" is a month that is in the top third of observations that occurred *prior to* (and presumably including) that point in the record.
A truly bizarre way to do it!
I have no idea how he handled the first two years of data but from the third year on, he presumably counted each June as top 1/3 if it was above 2/3 of previous June temperatures. And in a steadily rising dataset, this is going to result in a rather large number of months being counted as in the top 1/3 of their historical records. Around 40% of them, it seems, in the case of US lower 48 temps.
No, for the ones near the end of the series the results will be the same as if the whole series had been chosen, i.e. 4.333. In fact once you've reached about 100 months in, the end effect should have disappeared, so you should still get 4.333 from there on. It is, inter alia, the autocorrelation that leads to the increased number of long sequences.
Bart says:
“100%. Given any slope at all the final 13 months of 118 years worth of observations will not only all be in the upper third, they will be the thirteen highest observations.”
You can’t go from effectively 0% to 100% with a tiny change like that, though.
You can (and do) under the “no weather” assumption. The purpose of that case is to demonstrate the dramatic effect of any trend whatsoever on the odds. Many find that result counter intuitive.
Adding back in the weather brings the odds down from 100%, in a manner related to the relative magnitude of the "weather" variation vs the magnitude of the trend. If the two are similar (say, same order of magnitude) the odds can be quite high for some periods on the trend.
Let me give an example, a simple analogy, if you will. I have stationary random data normally distributed about zero with some uncertainty parameter “sigma”. I calculate the sample mean, and find generally that it is non-zero with a standard deviation of sigma/sqrt(N), where N is the number of points. Now, I take another data set from the ensemble, and add in a small positive bias much less than sigma. Are the odds going to change greatly from 50/50 that I will estimate a negative mean value?
No. In fact, the delta likelihood should be approximately equal to the bias divided by the sigma divided by sqrt(2*pi) (additional x-axis displacement times the peak of the probability distribution is basically a rectangular integration of the additional area of the distribution displaced to the positive side). Thus, if bias/sigma is small, the change in probability is small.
Yes! But a change in the mean of stationary data is harder to effect than a change in the same data subject to a trend over time. Add a small positive bias once, the change will be hard to detect. Add it 118 times, and the change will be two orders of magnitude higher. 🙂
“What are the odds that the final 13 months of 118 years of unchanging climate temp observations are going to be “in the upper third of the distribution”?”
That is the wrong question. The right question is, how likely is a 13 month stretch to be in the top 1/3 at some time within the data record?
I disagree. We are not concerned with the odds that any 13 month event could occur under the null hypothesis. We are interested in the odds that the observed 13 month event could occur. In any climate other than the completely unrealistic “climate that doesn’t change at all” strawman, the probability that an upper third 13 month event could occur is not uniform over time.
For example, given any net trend the odds of such an event occurring are higher at the point along the trend where the accumulated trend effect approximates the magnitude of the “weather” variation, and lower earlier in the trend.
The same is true of cyclic components and other sources of auto-correlation not considered by the "climate that doesn't change at all" strawman assumption used by NCDC. The odds of a qualifying "upper third" 13 month event are higher near the peaks of a cycle. The observed event occurred near the peak of a cycle. Any calculation of the odds of that particular event occurring would underestimate those odds if it included the probability of similar events occurring near the low spots in the cycle.
Removing the contributory effects on the odds of the temporal component of an actual climate is one of the tricks of the NCDC “climate that doesn’t change at all” strawman.
As I showed previously, it is at least 100X the value Masters got, and probably more like 1000X, and that is if you consider each point to be independent of all the others. Add in the correlation between adjacent samples (if you can!), and I expect it will go higher, still.
Exactly!
In constructing their strawman, NCDC has eliminated any and every component of a natural climate system that tends to increase the odds of a 13 month “upper third” event occurring, thus skewing the odds waaaaaaaaayyyyyyyyyyy low. Add those components back in… consider the universe of all ‘non-catastrophic-global warming’ null hypotheses … and the odds aren’t so long.
JJ says:
July 13, 2012 at 11:11 am
Thanks, JJ. Please read the head post again. Right near the start you’ll find:
Note that this means I’m not attempting to disprove anything. I’m not going to advance “reasons why he was wrong”.
To the contrary, I was clearly setting out to do something different—to answer the question of whether the 13-out-of-13 result was an anomaly, a low-odds result, something different from the past, a highly unlikely event, something unexpected or out of the ordinary, a cause for concern … or on the other hand whether it was a ho-hum, expected event. That is to say, I wanted to establish the true odds of the occurrence of 13-out-of-13 coming up in the US temperature record.
So no, I was not attempting to “disprove a position”, to show reasons why someone was wrong, and I said so quite clearly … but clearly not clearly enough.
All the best,
w.
Willis Eschenbach says:
July 13, 2012 at 2:20 pm
For example. Suppose a guy hands you a die, and wants to know if it is loaded. You throw the die 10,000 times, and get the following results for the faces 1 through 6:
[1] 1320
[2] 2842
[3] 2748
[4] 1779
[5] 811
[6] 490
Is the die loaded? Bear in mind that if you say “yes”, that means that you have drawn a conclusion from the results regarding the underlying process generating the numbers … and yet you have claimed above that I can’t do that by just analyzing the results.
No, you know that if the die is fair then it will result in a uniform distribution, i.e. about 1667 for each score; clearly the die is loaded, but that's all we know.
In the case of the number of months in the top third of 13 months taken at a time, we know that if the events are the result of a Poisson process then the PDF will be a Poisson distribution with a mean of 4.333. Therefore, by comparison with the observations, we can see that they were not generated by a Poisson process; the observation that the probability of longer sequences is higher than expected could lead you to deduce that there might be some autocorrelation.
Which is my point. From analyzing the distribution of the results, we can draw valid conclusions about the underlying process, as well as using the analysis of results to accurately calculate the probability of events that have not yet occurred.
Mostly we can deduce what it isn’t! We still have no way to make accurate predictions about future events because we don’t know what the generating process is.
JJ says:
July 13, 2012 at 2:51 pm
“If the two are similar (say, same order of magnitude) the odds can be quite high for some periods on the trend.”
But, they’re not. The warming is on the order of 0.1 degC. The 1/3 bands are what, maybe 20 deg or so wide? That’s more than two orders of magnitude.
“We are interested in the odds that the observed 13 month event could occur.”
I’m not interested in that. The question is whether it is an ordinary or extraordinary event. And, determining whether it is ordinary or not requires establishing just what is ordinary.
Phil. says:
July 13, 2012 at 2:46 pm
That's what "in the historical record" means: it means you're not comparing them to future years. There's no other way to do it than to compare it to the historical record that existed at that point, unless you want to compare events that have actually occurred with events that haven't happened. For example, consider the most recent month … can we compare it to the future months? No, not possible, we can only compare it to previous months … nor should we compare previous months to the future.
The other thing you seem to have overlooked is that if we do it your way, and temperatures continue on their centuries-long slow rise … then very soon this current June-to-June won't have all 13 in the top third …
Think about it in terms of the oft-repeated claim that “this is the warmest year in the historical record” … they are not comparing this year to future years. Nor should we in this case.
w.
KR says:
July 13, 2012 at 2:41 pm
Thanks for the answer, KR. With the die, the null hypothesis is that the outcome has the form of a uniform distribution. As you point out, we can reject that hypothesis.
My null hypothesis is that this outcome has the form of a Poisson distribution. I am testing how likely the observations are given that particular reference distribution. I have not been able to reject that hypothesis.
When I take an alternate null hypothesis, that this outcome has the form of a Gaussian distribution, I am able to reject that. In other words, I can say that these dice are loaded. This is important information if I wish to establish probabilities of a given occurrence.
When I take another alternate null hypothesis, that it has the form of a binomial distribution, I am able to reject that one as well.
So I have a null hypothesis, actually several … where is the problem?
A mathematical distribution is not a “smoothed version” of a given dataset. A Gaussian distribution is not a “smoothed version” of any aspect of reality. It is a mathematical construct describing one of many ways that data can be distributed, and it exists independent of any given set of observations.
The same is true of a Poisson distribution. It is not a “smoothed representation of the observations”. It is a mathematical description of a particular type of a dataset, which some actual datasets resemble (to a greater or lesser degree) and some datasets do not resemble.
This one does resemble a Poisson distribution, to a very good degree, both in aggregate and also each and every one of the 12 monthly subsamples. Not only that, but the theoretical value for lambda (the mean of the observations) is almost identical to the value for lambda I get from an iterative fit, which strongly supports the idea that the data very, very closely resembles a Poisson distribution. In fact, it strikes me that you should be able to use the difference between the mean, and lambda determined by an iterative fit, to do hypothesis testing for a Poisson distribution … but I digress. I do plan to look into that, however.
Is it actually a Poisson distribution? It can’t be, because a Poisson distribution is open ended. What happens is that the very final part of the tail of the Poisson distribution is folded back in, because a run of 14 gets counted as a run of 13. However, this is only about one thousandth of the data, and for the current purposes it is a third-order effect that can safely be ignored.
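A quick check of that "about one thousandth" figure, using the Figure 2 lambda:

```r
ppois(13, lambda = 5.213, lower.tail = FALSE)   # P(count >= 14) is roughly 0.001
```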
w.
PS—What are my “current purposes”? Let me quote from above: