Guest Post by Willis Eschenbach
Anthony Watts, Lucia Liljegren, and Michael Tobis have all done a good job blogging about Jeff Masters’ egregious math error. His error was that he claimed that a run of high US temperatures had only a 1 in 1.6 million chance of being a natural occurrence. Here’s his claim:
U.S. heat over the past 13 months: a one in 1.6 million event
Each of the 13 months from June 2011 through June 2012 ranked among the warmest third of their historical distribution for the first time in the 1895 – present record. According to NCDC, the odds of this occurring randomly during any particular month are 1 in 1,594,323. Thus, we should only see one more 13-month period so warm between now and 124,652 AD–assuming the climate is staying the same as it did during the past 118 years. These are ridiculously long odds, and it is highly unlikely that the extremity of the heat during the past 13 months could have occurred without a warming climate.
All of them pointed out reasons why he was wrong … but they didn’t get to what is right.
Let me propose a different way of analyzing the situation … the old-fashioned way, by actually looking at the observations themselves. There are a couple of oddities to be found there. To analyze this, I calculated, for each year of the record, how many of the months from June to June inclusive were in the top third of the historical record. Figure 1 shows the histogram of that data, that is to say, it shows how many June-to-June periods had one month in the top third, two months in the top third, and so on.
Figure 1. Histogram of the number of June-to-June months with temperatures in the top third (tercile) of the historical record, for each of the past 116 years. Red line shows the expected number if they have a Poisson distribution with lambda = 5.206, and N (number of 13-month intervals) = 116. The value of lambda has been fit to give the best results. Photo Source.
The first thing I noticed when I plotted the histogram is that it looked like a Poisson distribution. This is a very common distribution for data that represent counts of discrete occurrences, as in this case. Poisson distributions cover things like how many people you’ll find in line at a bank at any given instant, for example. So I overlaid the data with a Poisson distribution, and I got a good match.
Now, looking at that histogram, the finding of one period in which all thirteen were in the warmest third doesn’t seem so unusual. In fact, with the number of years that we are investigating, the Poisson distribution gives an expected value of 0.2 occurrences. In this case, we find one occurrence where all thirteen were in the warmest third, so that’s not unusual at all.
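A minimal sketch of this tercile-count-and-fit procedure (not the original Excel workbook). The NOAA file and column names are placeholders, months are scored against their calendar month’s full-period tercile for simplicity, and lambda is fitted by least squares to mirror the solver approach described below.

```python
# Sketch: count top-tercile months in each June-to-June window and fit a Poisson
# lambda by least squares. File name and column names (year, month, temp) are
# hypothetical placeholders for the NOAA US monthly data.
import numpy as np
import pandas as pd
from scipy.optimize import minimize_scalar
from scipy.stats import poisson

df = pd.read_csv("noaa_us_monthly.csv")  # hypothetical file: year, month, temp

# Flag months in the top third of their own calendar month's history
df["top_third"] = df.groupby("month")["temp"].transform(lambda x: x >= x.quantile(2 / 3))

# Count top-tercile months in each June-to-June (13-month) window
counts = []
for yr in range(df["year"].min(), df["year"].max()):
    window = df[((df["year"] == yr) & (df["month"] >= 6)) |
                ((df["year"] == yr + 1) & (df["month"] <= 6))]
    if len(window) == 13:
        counts.append(int(window["top_third"].sum()))
counts = np.array(counts)

hist = np.bincount(counts, minlength=14)  # how many windows had 0, 1, ... 13 hot months

# Least-squares fit of lambda to the histogram (mirroring the Excel-solver fit)
def sse(lam):
    expected = len(counts) * poisson.pmf(np.arange(14), lam)
    return np.sum((hist - expected) ** 2)

lam = minimize_scalar(sse, bounds=(0.1, 13.0), method="bounded").x
print("fitted lambda:", round(lam, 3))
print("expected windows with all 13 in top third:", len(counts) * poisson.pmf(13, lam))
```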
Once I did that analysis, though, I thought “Wait a minute. Why June to June? Why not August to August, or April to April?” I realized I wasn’t looking at the full universe from which we were selecting the 13-month periods. I needed to look at all of the 13 month periods, from January-to-January to December-to-December.
So I took a second look, and this time I looked at all of the possible contiguous 13-month periods in the historical data. Figure 2 shows a histogram of all of the results, along with the corresponding Poisson distribution.
Figure 2. Histogram of the number of months with temperatures in the top third (tercile) of the historical record for all possible contiguous 13-month periods. Red line shows the expected number if they have a Poisson distribution with lambda = 5.213, and N (number of 13-month intervals) = 1374. Once again, the value of lambda has been fit to give the best results. Photo Source
Note that the total number of periods is much larger (1374 instead of 116) because we are looking, not just at June-to-June, but at all possible 13-month periods. Note also that the fit to the theoretical Poisson distribution is better, with Figure 2 showing only about 2/3 of the RMS error of the first dataset.
The most interesting thing to me is that in both cases, I used an iterative fit (Excel solver) to calculate the value for lambda. And despite there being 12 times as much data in the second analysis, the values of the two lambdas agreed to two decimal places. I see this as strong confirmation that indeed we are looking at a Poisson distribution.
Finally, the sting in the end of the tale. With 1374 contiguous 13-month periods and a Poisson distribution, the number of periods with 13 winners that we would expect to find is 2.6 … so in fact, far from Jeff Masters’ claim that finding 13 in the top third is a one-in-a-million chance, my results show that finding only one case with all thirteen in the top third is actually below the number we would expect, given the size and the nature of the dataset …
w.
Data Source: NOAA US Temperatures (thanks to Lucia for the link).
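As a quick cross-check of the figures in the post, the expected counts can be read straight off the Poisson pmf; a minimal check using the quoted lambda and N (small rounding differences aside):

```python
# Cross-check of the numbers quoted in the post, using the fitted lambda.
from scipy.stats import poisson

lam, n_windows = 5.213, 1374
p13 = poisson.pmf(13, lam)
print("P(13 of 13) under Poisson(5.213):", p13)             # ~0.0018
print("expected windows with 13 of 13:", n_windows * p13)   # ~2.5 (the post quotes 2.6)

# Masters' independence assumption, for comparison
print("(1/3)**13 = 1 in", 3 ** 13)                          # 1 in 1,594,323
```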
John@EF says:
July 11, 2012 at 9:54 am
Nope. I’ll leave them to their methods. In Tamino’s case, I’ve been banned from his blog for years for asking inconvenient questions, so he can rot for all I care; I wouldn’t increase his page view count by one.
w.
Mark from Los Alamos says:
July 11, 2012 at 10:32 am
Absolutely not. I’m not looking to find what the odds are of finding 13 in some imaginary detrended world. I’m interested in finding the odds in this world, the real world.
w.
To assess the probability of a 13-month run of high temperatures, you also have to consider that a continuous period of warmth in the USA is actually no more surprising than a continuous period of warmth in any other 8,000,000 square km of land area. Masters is using sampling bias with his post-hoc choice of the USA as his area of study.
The sheer number of data points we track means that, statistically, records will be broken more frequently than the average person would guess. Is your town having its hottest ever day? What about the next town over, or another in the area? Your county? Your region / state? Your country? What about the hottest week? Hottest month? Hottest year? What about the coldest? Wettest? Driest? Windiest? Sunniest? Cloudiest? That list gives 112 statistics that could apply just to you, at this time.
Willis E says:
I don’t follow. A Poisson distribution is (by definition) for independent events, right? And we know that the temperature series is auto-correlated, so, not independent?
So how can an auto-correlated series follow a Poisson distribution?
(Sure, you might approximate a lightly auto-correlated series by a Poisson distribution, but that approximation is going to give you some healthy-sized errors on the fringes).
>Willis Eschenbach:
>
>Thanks, mb. The model is not predicting how many months will come up out of 13. It is predicting how many months will come up. You are correct that there will be an “edge effect”, since we are only looking at 13-month intervals. But since it only affects ~ one case in 1400, the effect will be trivially small.
I don’t agree. The data you want to describe, and which you have graphed, is: for each n, the number of 13-month intervals with n months in the top third. This gives a number C(n), which you label as “count” in your graph. C(n) is certainly zero if n is greater than or equal to 14. You claim that C(n) is approximated by a Poisson distribution P(n), and finally use this to estimate the expected frequency C(13) by P(13). Actually, you estimate the inverse of C(13) by the inverse of P(13). You do not graph or estimate how many months will come up.
The edge is the number 13. My argument is that, since it is obviously not a good idea to estimate the frequency C(14) by P(14), I doubt that it’s a good idea to approximate C(13) by P(13), even if C(n) is approximated well by P(n) for n less than, say, 7.
>Suppose that in fact there is one run of 14 in the data. Since we are counting in 13-month intervals, in the first case (June to June only) it will be counted as a run of 13. And in the second case (all 13-month intervals) it will be counted as two runs of 13 … but in neither case does that materially affect the results shown above.
>So in practice, the edge effect slightly increases the odds of finding a run of 13.
I agree, but it is irrelevant to my argument. The point is not that some 14-month sequences show up as pairs of 13-month sequences. The point is that the model definitely breaks down for n equal to 14, so why should we believe it for 13, 13 being so close to 14?
>w.
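mb’s edge argument can be put in numbers. Using the lambda and N from the post, the same Poisson model also “expects” roughly one window with 14 or more qualifying months, which is impossible in a 13-month window, so the model is necessarily wrong at the upper edge; a minimal check:

```python
# The Poisson(5.213) model over 1374 windows also predicts a nonzero count of
# windows with 14 or more top-tercile months, which cannot happen in a 13-month
# window -- the observed distribution is truncated at 13.
from scipy.stats import poisson

lam, n_windows = 5.213, 1374
print("expected windows with 14+ hot months:", n_windows * poisson.sf(13, lam))  # ~1.4
```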
Willis,
Thanks for your responses.
I still can’t see how, if you’ve done what you say you’ve done, you can have 597 out of 1508 months that fall into the top tercile. Your sample set consists of essentially all 116 years of data, barring possibly a handful of months at the start of the series. Surely, by definition, 1/3 of all months will fall into the top tercile for their month. The fact that all the Junes are counted twice shouldn’t matter. And no matter how they’re distributed across the groups of 13 months, I think the mean should be close to 4.33, not 5.15.
Am I being thick?
Nigel
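For reference, Nigel’s figures can be checked directly from the 597-out-of-1508 tally he quotes:

```python
# Share of top-tercile months implied by 597/1508, and the corresponding mean
# count per 13-month window, versus the one-third expectation.
share = 597 / 1508
print("share of months counted as top tercile:", round(share, 3))   # ~0.396
print("implied mean per 13-month window:", round(13 * share, 2))    # ~5.15
print("mean if exactly one third qualified:", round(13 / 3, 2))     # 4.33
```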
Anthony
The point regarding the histogram was a rather lighthearted throwaway, hence the smiley, but Willis seems to have thrown his toys out of the pram at this.
As for the clip, how did you know I worked with dirt people? [SNIP: Right in one, but let’s not be giving away our trade secrets, Dr. D. I, for one, am very impressed with the caliber of our commenters. -REP] Anyway, I’ll take it on the chin, but as a geophysicist aren’t you running the risk of being tarred with the same brush 😉
So, if I understood correctly, ‘One man’s mean is another man’s Poisson’?
Windchaser – wrong – according to NOAA the continental US has cooled over the last decade, in all zones but one, I seem to remember. That the world is warmer this decade than last is not surprising, as we are still probably recovering from the Little Ice Age and will be until we are not. No controversy therefore, and no surprise that “extreme events” are happening.
If you believe that trees are thermometers, the recent paper (“This is what global cooling really looks like – new tree ring study shows 2000 years of cooling – previous studies underestimated temperatures of Roman and Medieval Warm Periods”, as seen on WUWT) suggests that the medieval and Roman warm events were more “extreme” than our present warm period. But why trust proxies when we have lots of evidence that these warm periods were real and global? Unfortunately we had no idiot MSM to record those events, but if you read Roman accounts of the time they also had ridiculous and unscientific beliefs about what was driving their weather. Plus ça change …
Jim
Fair enough, a bit sloppy – but then it is a blog.
Upon reflection, I see that you are right, that I shouldn’t have fit the value of lambda. I should have used the mean of the actual data.
No, you should have used the known probability of any given month being in the top 1/3, i.e. 33.3%, times the number of events (13), i.e. a lambda of 4.3333. A Poisson distribution is only appropriate if the probabilities for non-overlapping time intervals are independent.
In fact, your analysis is even weaker than I thought, since all you’ve shown is that Poisson is the wrong model, i.e. that hot months “clump” together more than would be expected by random chance. That could be for several different reasons, most notably (1) if there is a trend over time, or (2) if there is autocorrelation between successive months. Note that those are independent – you could have a non-stationary dataset without autocorrelation, or a data set with autocorrelation but no net trend.
In point of fact, for temperature data both (1) and (2) are already known to be true, so a Poisson model is doubly wrong. The fact that the combination of an upward trend and a certain degree of autocorrelation has resulted in something that looks a bit like a different Poisson distribution with a larger mean is irrelevant.
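A toy simulation of point (1), with all numbers assumed (series length, trend size, noise level): independent monthly noise plus a steady warming trend, scored against the full-period tercile. The overall hit rate stays at one third, but windows packed with hot months become several times more common than the trend-free binomial expectation.

```python
# Toy simulation: a linear trend with independent noise produces an excess of
# "hot streak" windows relative to the no-trend binomial expectation, even
# though exactly one third of all months are in the top tercile by construction.
import numpy as np
from scipy.stats import binom

rng = np.random.default_rng(0)
n_months = 116 * 12
trend = np.linspace(0.0, 1.0, n_months)     # assumed: ~1 sigma of warming over the record

def windows_with_10_plus(temps):
    top = temps >= np.quantile(temps, 2 / 3)
    counts = np.convolve(top.astype(int), np.ones(13, dtype=int), mode="valid")
    return int((counts >= 10).sum())

n_sims = 500
with_trend = np.mean([windows_with_10_plus(trend + rng.normal(0.0, 1.0, n_months))
                      for _ in range(n_sims)])
n_windows = n_months - 12
no_trend = n_windows * binom.sf(9, 13, 1 / 3)   # expected windows with >= 10 of 13, p = 1/3

print("avg windows with >= 10 of 13 hot months (with trend):", round(with_trend, 1))
print("expected with no trend (binomial, p = 1/3):          ", round(no_trend, 1))
```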
@Nick in Vancouver:
Even if the US has cooled over the last decade (which it hasn’t, or at least I doubt it after the last 13 months), it could still be significantly warmer than average for the period we’re looking at (the last 110 years). And even last year, when the US was “cooling”, this was the case. Obviously, we have a much greater chance of hitting hot records after the temperature has gone up than not.
As a side note, if your cooling or warming trend is weak enough that a couple hot or cold years can completely upset it, then it’s not really very useful. So I’m kind of skeptical of things like 10-year plots that show we’re cooling, and then the next year we’re warming, and then the year after that, we’re cooling again. That’s noise, not a real, statistically-significant trend.
So looking at longer-term trends: Yes, the US is warmer than average, and has been for at least the last decade. Moreover, the warming is big enough that we see things happening that we wouldn’t expect to see if the US had not been warming.
Nigel Harris says:
July 11, 2012 at 1:17 pm
My understanding is that the record is counted if it is in the top third of all records up to that date, not if it is in the top third of all historical records for all time. It doesn’t make sense any other way, to me at least.
w.
The maximum likelihood estimator for lambda for a Poisson population is:
lambda(MLE) = (1/n) * sum(k_i, i = 1..n), i.e. the sample mean of the counts (http://en.wikipedia.org/wiki/Poisson_distribution#Maximum_likelihood),
not a least squares Excel fit as Eschenbach apparently performed. In addition, Poisson populations are by definition collections of independent events, while temperatures display autocorrelation – meaning that a Poisson distribution is the wrong model to start with. A Poisson-like shape can also arise from summing normal distributions whose mean and standard deviation change over time, as in Hansen et al. 2012 (http://www.columbia.edu/~jeh1/mailings/2012/20120105_PerceptionsAndDice.pdf), Figure 4, which shows both that the mean temperature has risen and that the standard deviation has increased. The resulting distribution has a longer tail on the high end, but is most definitely not Poisson.
Monte Carlo estimation using observed statistics is a reasonable (and quite robust) method to use here – Lucia’s (re-)estimate is less than a 1:100,000 chance for the 13 month period being entirely in the upper 1/3, using an AR(1) noise model and the autocorrelation seen in US records.
Masters made the mistake of not accounting for autocorrelation, and appears to be off by at least an order of magnitude as a result. Eschenbach is using the wrong model, and appears to be off by perhaps five orders of magnitude as a result.
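A rough Monte Carlo sketch in the spirit of the AR(1) approach described here; the lag-1 coefficient (0.5) and the simulation count are assumptions, not Lucia’s actual settings, and the answer is quite sensitive to the assumed autocorrelation:

```python
# Monte Carlo estimate of the chance that all 13 months of a window fall in the
# top tercile, for trend-free AR(1) monthly noise. phi = 0.5 is an assumed
# lag-1 autocorrelation, not the value used in Lucia's calculation.
import numpy as np
from scipy.signal import lfilter

rng = np.random.default_rng(42)
phi, n_months, n_sims = 0.5, 118 * 12, 10000

hits, windows = 0, n_months - 12
for _ in range(n_sims):
    eps = rng.normal(0.0, 1.0, n_months)
    x = lfilter([1.0], [1.0, -phi], eps)     # AR(1): x[t] = phi * x[t-1] + eps[t]
    top = x >= np.quantile(x, 2 / 3)
    counts = np.convolve(top.astype(int), np.ones(13, dtype=int), mode="valid")
    hits += int((counts == 13).sum())

print("estimated P(all 13 in top third) per window:", hits / (n_sims * windows))
```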
Windchaser says:
July 11, 2012 at 1:59 pm
Why on earth are you guys arguing over what you think has happened when there is a link to the actual data at the end of my post? Go get the numbers so you can cease your endless speculation on what the US might have done …
w.
PS: The actual US trend for the last 120 months is 0.3°C ± 1.4°C (95% CI) … so not statistically different from zero. As a result, we cannot say whether the US is warming or cooling over the last decade. From the first of last year up to the middle of last year, the previous 120 months were in fact cooling (statistically significant), but this latest warm 13 months has pushed it back into neutral. A decade is a short time span in which to find statistically significant results.
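A minimal sketch of how a trend-plus-uncertainty figure like that can be computed: an OLS fit to the last 120 monthly values with a 95% confidence interval on the slope. The input file is a placeholder, and a serious version would also account for autocorrelation of the residuals, which widens the interval.

```python
# OLS trend over the most recent 120 months, with a 95% confidence interval.
# The data file is a hypothetical placeholder for a monthly US temperature series.
import numpy as np
import statsmodels.api as sm

temps = np.loadtxt("us_monthly_anomalies.txt")   # hypothetical monthly values, deg C
last10 = temps[-120:]
X = sm.add_constant(np.arange(120))

fit = sm.OLS(last10, X).fit()
slope = fit.params[1]                     # deg C per month
lo, hi = fit.conf_int(alpha=0.05)[1]      # 95% CI on the slope

print("trend over 120 months: %.2f C [%.2f, %.2f] (95%% CI, no autocorrelation correction)"
      % (slope * 120, lo * 120, hi * 120))
```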
Willis Eschenbach says:
July 11, 2012 at 11:30 am
Nigel Harris says:
July 11, 2012 at 6:39 am
Willis,
I’m still puzzled why your distribution has such a high mean value.
I have assumed all along that it has a high mean value because the data is autocorrelated. This pushes the distribution to be “fat-tailed”, increasing the probability that we will find larger groups and decreasing the probability of smaller groups.
pjie2 says:
July 11, 2012 at 6:21 am
Willis has this whole thing upside down. He’s fitting lambda to the data, rather than comparing the data to the known lambda (lambda is simply the probability of success times the number of events, and thus has by definition to be 13/3). That means his conclusion is exactly backwards.
Reasoning correctly, we know that if there is no autocorrelation between hot months, then we should get a Poisson distribution with lambda = 4.3333. We don’t; instead, we have a significant excess of hot streaks. All this proves that it’s a non-Poisson process, i.e. that there is some autocorrelation, and the temperature in a given month is not independent of the temperature of the surrounding months! Having thus proved it’s non-Poisson, you can’t then draw further conclusions using the Poisson distribution.
I disagree. We have not shown it is not a Poisson distribution. We have shown that it is a special kind of Poisson distribution, a “fat-tailed” Poisson distribution where all results are shifted to somewhat higher values.
No, you’ve shown that it isn’t a Poisson distributed variable.
Strictly, the Poisson distribution is a limiting case of the binomial distribution where p is very small (approaching zero) and N is very large (where p is the probability of success and N the number of trials). If that were the case, the mean would be Np and the variance Np, which in this case would both be 4.33.
You’ve shown that it is not. This could mean that there is autocorrelation, but Lucia has shown that this is low, so a more likely explanation is an increase in temperature over the course of the trials (either way, it’s not a Poisson-distributed variable).
Since it strictly doesn’t meet the criteria for a Poisson, for a binomial distribution the mean is still Np = 4.33 but the variance is Np(1-p) = 13*(1/3)*(2/3) ≈ 2.89; in that case the probability of 13 successes out of 13 would be 13!/(13!*0!)*p^13*(1-p)^0 = (1/3)^13.
However, it doesn’t meet the criteria for a binomial distribution either: the probability of success varies with where you are in the series, i.e. p isn’t constant.
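For completeness, the binomial quantities in this argument, computed directly (N = 13 trials, p = 1/3):

```python
# Mean, variance and the all-13 probability for a binomial with N = 13, p = 1/3.
from math import comb

N, p = 13, 1 / 3
print("mean Np:", N * p)                                   # 4.33
print("variance Np(1-p):", N * p * (1 - p))                # ~2.89
print("P(13 of 13):", comb(13, 13) * p**13 * (1 - p)**0,
      "= 1 in", 3 ** 13)                                   # 1 in 1,594,323
```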
Each of the 13 months from June 2011 through June 2012 ranked among the warmest third of their historical distribution for the first time in the 1895 – present record.
If we assume that this particular 13-month period was the tenth warmest on record globally, I would think that almost no months would be colder than one of the warmest 39 in the 118-year global record.
Willis,
You claim “it is indeed a Poisson process”. Don Wheeler, one of the world’s leading statisticians, points out: “The numbers you obtain from a probability model are not really as precise as they look.” A Burr distribution can be made to look almost identical to a Poisson but give very different results in the tails.
KR says:
July 11, 2012 at 2:12 pm
Thanks, KR. As I pointed out above, but apparently you didn’t read, the answer when doing it your way is only trivially different from the answer when I do it as a least squares fit. Least squares gave me lambda = 5.21 for both methods. Your way gives me 5.15 for June-to-June and 5.17 for all 13-month intervals.
I checked on that by seeing if my results were autocorrelated. They were not; in fact, they were slightly negatively correlated (-0.15). That’s my general method for checking to see if autocorrelation is an issue; do you have another method?
I also hold that the excellent agreement between the theoretical lambda and the lambda obtained by an iterative fit is strong evidence that the data actually has a Poisson distribution.
Finally, I’m not sure why you think that autocorrelation is a problem for a Poisson population. I say this because the order in which the Poisson events occur does not affect the calculations.
For example, suppose I take the number of people standing in the line at the bank, which is known to be a Poisson variable. I measure it at 10 minute intervals, and I get the following values for the numbers of people in the line:
1 2 3 2 4 3 3 3 4 3 4 4 2
The lag-1 autocorrelation of these is about 0.1. Now suppose I measure the lines again, and business happens to be steadily picking up, and I get the following results:
1 2 2 2 3 3 3 3 3 4 4 4 4
Note that the distribution of this group is identical to the previous group … but the autocorrelation of this group is 0.65.
So should I throw out my second set of data, or say that the distribution of the second set is not Poisson? The second set is identical to the first set, just in a different order … how can the second one not be a Poisson distribution, while the other one is a Poisson distribution?
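The two bank-line sequences can be checked directly; a minimal calculation confirming that they contain exactly the same values (identical histograms) while their lag-1 autocorrelations differ roughly as stated:

```python
# Same histogram, very different lag-1 autocorrelation.
import numpy as np

a = np.array([1, 2, 3, 2, 4, 3, 3, 3, 4, 3, 4, 4, 2])
b = np.array([1, 2, 2, 2, 3, 3, 3, 3, 3, 4, 4, 4, 4])

def lag1(x):
    d = x - x.mean()
    return np.sum(d[:-1] * d[1:]) / np.sum(d * d)

print("identical histograms:", np.array_equal(np.bincount(a), np.bincount(b)))  # True
print("lag-1 autocorrelation, first sequence: ", round(lag1(a), 2))   # ~0.1
print("lag-1 autocorrelation, second sequence:", round(lag1(b), 2))   # ~0.64
```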
Perhaps … me, I’m always more partial to looking at the real dataset rather than depending on pseudo data.
Five orders of magnitude? Get real. LOOK AT THE ACTUAL DISTRIBUTION. We have a host of high values in the dataset; it’s not uncommon to find occurrences of ten and eleven and twelve months being in the warmest third. The idea that these are extremely uncommon, five orders of magnitude uncommon, doesn’t pass the laugh test.
w.
pjie2 says:
July 11, 2012 at 1:44 pm
Why on earth would I do it that way? That assumes a whole host of things about the dataset that obviously aren’t true, since the mean of the data is not 4.333. I’m not investigating your imaginary data, I’m investigating this actual dataset.
w.
How do we know that we have not seen a Black Swan? Isn’t it possible that we have seen an event that completely changes our knowledge of the distribution of temperatures in the US?
If this event was indeed a “Black Swan”, then it seems to me that if the temperatures next year are exactly the same as the temperatures this year, a lot of the statistics being applied should actually predict that the second occurrence was more likely than the first. Of course, then you could start doing the math for a 26-month period … but the example still holds, because we could add an intervening year with non-record temperatures.
Mr Masters may be able to tell me otherwise, but I’ve seen no hint that this is the case from the literature. But I’ve seen many times that low solar activity is linked with increased jet stream blocking.

Where? That is a fascinating datum, if true!

Jet stream blocking reduces atmospheric mixing. Reduced mixing causes hot spots to get hotter and cold spots to not be warmed by the hot spots. Less mixing increases cooling efficiency, as one radiates energy at T^4 times the area at that temperature. This is part of why the moon is cooler on average than the Earth, because its hot side is very hot when it is hot and its cold side never receives any part of the hot side heat to radiate away more slowly.

The next question (again, if true) is why decreased solar activity leads to increased jet stream blocking, hotter hots and colder colds, and overall cooling. This alone could be an undiscovered mechanism for why periods of low solar activity seem to be net global cooling periods.

rgb
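The radiative point is a convexity effect: because emitted power scales as T^4, a surface with a hot part and a cold part radiates more than a uniform surface at the same mean temperature. A toy calculation with assumed temperatures:

```python
# Two surfaces with the same 288 K mean temperature: one uniform, one split into
# a 388 K half and a 188 K half (numbers assumed). The uneven one radiates more.
SIGMA = 5.670e-8   # Stefan-Boltzmann constant, W m^-2 K^-4

uniform = SIGMA * 288**4
split = 0.5 * SIGMA * 388**4 + 0.5 * SIGMA * 188**4

print("uniform 288 K surface: %.0f W/m^2" % uniform)   # ~390
print("388 K / 188 K split:   %.0f W/m^2" % split)     # ~678
```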
Willis Eschenbach says:
July 11, 2012 at 2:51 pm
pjie2 says:
July 11, 2012 at 1:44 pm
Upon reflection, I see that you are right, that I shouldn’t have fit the value of lambda. I should have used the mean of the actual data.
No, you should have used the known probability of any given month being in the top 1/3, i.e. 33.3%, times the number of events (13), i.e. a lambda of 4.3333.
Why on earth would I do it that way? That assumes a whole host of things about the dataset that obviously aren’t true, since the mean of the data is not 4.333. I’m not investigating your imaginary data, I’m investigating this actual dataset.
Which is precisely why it is not Poisson. We know p for the dataset: it’s 1/3, and for a Poisson distribution p must be constant. We know the number of events N: it’s 13. So if the dataset is Poisson, the mean must be Np, i.e. 4.33.
Your dataset, if it’s Poisson, is for a process where the overall probability of being in the top third is ~0.40!
Dr Burns says:
July 11, 2012 at 2:32 pm
Yes, I know, and there are other distributions that are similar as well … but in this particular case, the Poisson distribution gives very good results in the tails.
w.
It seems to me that if the previous 13 months were really such a rare event, it would stick out like a sore thumb on a time series chart. Would someone be willing to put up the monthly averages of the data under discussion over the past 116 years?